Labelia assessment: FAQ

FAQ

Find the answers to your questions about the Labelia assessment.

The assessment

How was the evaluation developed?

  • It is the result of participatory work initiated in mid-2019 and carried out by the Labelia Labs association. This approach is described in this blog article, which we recommend!

    But let's review here some of the contextual elements described in the article. First of all, there is a growing tension between the potential and appeal of AI techniques on the one hand, and the difficulty of trusting these techniques or their implementation on the other (whether by private actors such as Apple with the Apple Card or Tesla in this astonishing example, or by public actors such as States, cf. COMPAS for parole decisions in the USA, the controversies each year around Parcoursup in France, unemployment benefits in the Netherlands, and many others). In this context, it is becoming increasingly difficult for an organisation to implement data science approaches in its products and services and to stand behind them publicly.

    Obviously this tension is not new, certain risks are very real, and it seems to us that there is a general consensus that structuring and reassuring frameworks need to be developed. You only have to type 'AI and ethics' or 'responsible AI' into a search engine to see the significant number of initiatives in this field. However, many of them are lists of cardinal principles that do not offer a concrete, operational way in. How can one position oneself? How can one evaluate one's organisation? What should one work on to comply with these principles?

    It is based on these observations that we wanted to develop a tool intended for practitioners, useful and actionable as early as possible. Give it a try and tell us what you think!

Who is this evaluation intended for?

  • The self-assessment tool has been developed to suit (and hopefully bring something to!) all organisations (companies, university laboratories, start-ups, specialised consultancies, etc.) with activities in data science, AI, ML, etc. A data scientist, a team leader or a technical director, for example, can complete the assessment. The tool also enables several users from the same organisation to fill in an assessment together, for example to split up the different topics.

How is the evaluation structured?

  • It is composed of 6 thematic sections. We chose not to reuse the 7 themes of the EU high-level expert group's report and its ALTAI tool, preferring a breakdown that we hope is more pragmatic and closer to the life cycle of a data science project.

Is the assessment framework 'finished' or will it keep evolving?

  • Yes, it will continue to evolve. From the very beginning of this project it was clear that it would be an iterative process: it seemed unimaginable to work for a period of time, publish the result and move on to something else. The field evolves quickly and the perspectives are multiple (large companies, public organisations, small start-ups, specialised consultants, regulators...), so we had to start somewhere and improve over time. Now that the platform is online, however, it is not a question of making changes every week, otherwise assessments in progress or just completed would constantly become obsolete. We have therefore set ourselves an update cadence on the order of a quarter or a semester. To accompany these updates and make them a positive thing for users and organisations that have already evaluated themselves, the platform includes a migration functionality: a given assessment is migrated to the most recent version of the assessment framework, and all responses to unchanged elements are retained (see the sketch below).
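
To illustrate, here is a minimal, hypothetical sketch of what such a migration can look like. The data structures and the `migrate` function below are assumptions for illustration, not the platform's actual implementation: responses are carried over only for elements whose definition is identical in both versions of the framework.

```python
# Hypothetical sketch of an assessment migration between framework versions.
# Data structures and names are illustrative assumptions, not the real platform code.

def migrate(responses, old_framework, new_framework):
    """Carry over answers only for elements unchanged between the two versions."""
    migrated = {}
    for element_id, answer in responses.items():
        old_def = old_framework.get(element_id)
        new_def = new_framework.get(element_id)
        if old_def is not None and old_def == new_def:
            # Unchanged element: the response is retained.
            migrated[element_id] = answer
        # Changed or removed elements are dropped and must be answered again.
    return migrated
```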

Score

The synthetic score is out of a theoretical maximum of 100 points for a full assessment. It provides an indication of the organisation's maturity level on responsible and trustworthy data science practices. At the end of 2020, the 50/100 threshold can be considered to reflect a very advanced maturity level.


The mechanism for calculating the score is relatively simple (a short sketch is given after the list below):

  • with each version of the assessment we define a number of points for each response item of each evaluation element, as well as a so-called importance weighting calibrated to ensure that the theoretical maximum total is exactly 100.
  • for single-response evaluation elements, the points of the selected item are retained, while for multiple-response evaluation elements, the points of all selected items are summed.
  • the total score is the sum of the points obtained for each evaluation element, weighted by its importance weighting.
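
As an illustration, here is a minimal sketch of this calculation in Python. The element identifiers, point values and weights below are hypothetical assumptions chosen so that the theoretical maximum is 100; they are not taken from the actual framework.

```python
# Hypothetical mini-framework illustrating the scoring mechanism described above.

def element_points(element, selected_items):
    """Points obtained for one evaluation element."""
    if element["type"] == "single":
        # Single-response element: keep the points of the selected item.
        return element["items"][selected_items[0]]
    # Multiple-response element: sum the points of all selected items.
    return sum(element["items"][item] for item in selected_items)

def total_score(elements, answers):
    """Weighted sum of the points obtained for each evaluation element."""
    return sum(
        element["weight"] * element_points(element, answers[element["id"]])
        for element in elements
    )

# Weights calibrated so that the theoretical maximum is exactly 100:
# 2.0 * 30 (E1) + 1.0 * (10 + 10 + 20) (E2) = 100.
elements = [
    {"id": "E1", "type": "single", "weight": 2.0, "items": {"a": 0, "b": 15, "c": 30}},
    {"id": "E2", "type": "multiple", "weight": 1.0, "items": {"x": 10, "y": 10, "z": 20}},
]
answers = {"E1": ["b"], "E2": ["x", "z"]}
print(total_score(elements, answers))  # 2.0 * 15 + 1.0 * (10 + 20) = 60.0
```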

There is, however, a subtlety for cases where an organisation is not concerned by certain evaluation elements and the risk universes they correspond to. On the one hand, it would be illogical to deprive the organisation of the points attached to evaluation elements that do not concern it, points which other organisations concerned by that risk can obtain. On the other hand, it would be equally illogical to award all the possible points straight away, at the risk of automatically producing a very high score even when little is actually done. The mechanism for dealing with this point is as follows (a sketch follows the list):

  • If you are not concerned by an evaluation element, you are automatically awarded half of the maximum number of points for that element. The other half is added to a temporary variable: the number of points that cannot be obtained.
  • Once all the evaluation elements that do not concern you have been dealt with, an intermediate score is calculated by summing the points for each element. This intermediate score is therefore not out of 100, but out of an intermediate maximum = (100 - the number of points that cannot be obtained).
  • This intermediate score is then dilated to bring it back to a /100 scale, by a factor of (100 / intermediate maximum).
  • This mechanism is a compromise to ensure: (i) that not being affected by certain risks is taken into account; (ii) that the score of any assessment is always out of 100.
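
Continuing the hypothetical sketch above, here is how this compromise can be expressed in code (again with illustrative element definitions, not the actual framework):

```python
# Hypothetical sketch of the "not concerned" mechanism: half points awarded,
# half set aside as unobtainable, then dilation back to a /100 scale.

def element_points(element, selected_items):
    pts = element["items"]
    if element["type"] == "single":
        return pts[selected_items[0]]
    return sum(pts[item] for item in selected_items)

def max_element_points(element):
    pts = element["items"].values()
    return max(pts) if element["type"] == "single" else sum(pts)

def score_with_dilation(elements, answers, not_concerned):
    intermediate = 0.0
    unobtainable = 0.0  # points that cannot be obtained
    for element in elements:
        max_pts = element["weight"] * max_element_points(element)
        if element["id"] in not_concerned:
            intermediate += max_pts / 2    # half awarded automatically
            unobtainable += max_pts / 2    # other half cannot be obtained
        else:
            intermediate += element["weight"] * element_points(element, answers[element["id"]])
    intermediate_max = 100 - unobtainable
    return intermediate * (100 / intermediate_max)  # dilation factor

# Same hypothetical elements as above; suppose E2 does not concern the organisation.
elements = [
    {"id": "E1", "type": "single", "weight": 2.0, "items": {"a": 0, "b": 15, "c": 30}},
    {"id": "E2", "type": "multiple", "weight": 1.0, "items": {"x": 10, "y": 10, "z": 20}},
]
print(score_with_dilation(elements, {"E1": ["b"]}, not_concerned={"E2"}))
# intermediate = 30 + 20 = 50, intermediate maximum = 80, final = 50 * 100 / 80 = 62.5
```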

Finally, here is some additional information in the form of answers to frequently asked questions:

  • Why is the value of each response item not shown during the assessment? In studying several systems for evaluating professional practices in different sectors, it became apparent that it is good practice not to show these values. The aim is to give priority to the content, and to limit the risk of distracting users by constantly putting in front of them numerical values that could lead them to try to optimise their answers.

About Labelia Labs

Since 2019, Labelia Labs has brought together data science practitioners through a Meetup of over 300 members, to explore in a concrete and operational way the best practices, resources and tools that enable a positive practice of artificial intelligence, limiting risks and negative externalities.

Thanks to this community, a digital commons has been created: the Responsible and Trustworthy Data Science Assessment. Evolving biannually and overseen by an independent committee, this commons identifies assessment points, best practices, resources and technical tools for responsible AI.

To help practitioners in a concrete way, Labelia Labs has set up an evaluation and rating platform allowing any organisation to assess its level of maturity regarding its practices.

Since October 2021, Labelia Labs has offered the most mature organisations the opportunity to obtain the Labelia label and join a community of companies applying high standards in their data science practices.