AI Labelling Tools @ Michelin

Michelin's journey to enhance computer vision data enrichment

Illustration from Robert Laursoo (Unsplash) with SAM3 processing

As a leading tire manufacturer, Michelin has always been at the forefront of integrating new technologies into its industrial operations. Artificial Intelligence (AI) is one of these technologies. However, we quickly realized that it was necessary to collect and enrich data before we could leverage it for AI image processing.

Are we in pain?

The first question we had to consider was:

Should we use AI for industrial computer vision?

To answer this question, we need to reflect on the types of problems we may encounter in our industry related to computer vision.
We can categorize them into two different types: Machine Vision and Wild Vision.

Machine vision

In machine vision, the goal is to make the acquisition as reproducible as possible. This approach allows us to control many influential parameters, but it requires:

  • Highly Reproducible Acquisition & Illumination
  • Pattern Matching / Shape Detection for realignment before measurement

Machine Vision - Development Pipelines

There are several benefits:

  • Can be Embedded in Sensors
  • Accessible Licensed Toolbox
  • High Precision Measurement
  • (New) Embedded AI for Complex Shapes Detection/Classification

For these types of topics, we can rely on suppliers, and the hardware complexity is low because most of the time we do not need a dedicated computer.

Before 2018, most of our vision applications were designed this way.

A limitation is that this approach relies on many hyper-parameters such as patterns, thresholds, and ROIs (regions of interest), which makes it inflexible by design.
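To make this rigidity concrete, here is a minimal Python sketch of a rule-based measurement "recipe". The `Recipe` fields, the `measure_width` function, and the toy image are hypothetical illustrations, not our actual tooling. The point is only that a fixed ROI and threshold give the right answer as long as the acquisition stays reproducible, and fail silently when it does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    """Hand-tuned hyper-parameters of a classical vision check (hypothetical)."""
    roi: tuple[int, int, int, int]  # (row_start, row_end, col_start, col_end)
    threshold: int                  # grey level separating the part from the background


def measure_width(image: list[list[int]], recipe: Recipe) -> int:
    """Return the largest number of above-threshold pixels found in any ROI row."""
    r0, r1, c0, c1 = recipe.roi
    widths = [
        sum(1 for px in row[c0:c1] if px > recipe.threshold)
        for row in image[r0:r1]
    ]
    return max(widths, default=0)


# Toy 4x6 grey-level image: a bright, 3-pixel-wide part on a dark background.
image = [
    [10, 10, 200, 210, 205, 10],
    [10, 10, 198, 202, 207, 10],
    [10, 10, 201, 199, 203, 10],
    [10, 10, 10, 10, 10, 10],
]
recipe = Recipe(roi=(0, 3, 0, 6), threshold=128)
width = measure_width(image, recipe)

# Same part, same recipe, but the illumination is halved:
# every pixel now falls below the threshold and the part is missed.
darker = [[px // 2 for px in row] for row in image]
width_dark = measure_width(darker, recipe)

print(width, width_dark)
```

With the nominal image the recipe measures a width of 3 pixels; with the darker image it measures 0, which is why each change in conditions tends to require a new recipe.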

Vision in the Wild

On the other hand, we have vision in the wild. Sometimes, it is very difficult to control the acquisition due to the diversity of environments and products to analyze.

Wild Vision - Usually requires a PC-based architecture

We may multiply recipes that are specific to what we want to measure at time T, but this leads to a high operational complexity.

We relied heavily on human eyes for this kind of verification, but with AI we can significantly enhance this process.

That is where we were in pain before the AI era.

As you can see, there are three ways to address these projects: OEM, licensed, and custom image processing.
All of these fields have been challenged by computer vision AI, and we are just at the beginning of the journey.
In this article, we'll focus on tooling for AI Custom Processing.


AI Development Pipeline

Now that we have identified where we are in pain, we must explore common AI development pipelines.

AI Development Pipeline @ Michelin 2018 -> 2025

Not every step is necessary for every project.

  • DataAcquisition/Gathering - To collect a representative set of images.
  • DataLabelling - To enrich images with information about what we want to do.
  • Training - To train a model that is capable of reproducing the annotations.
  • Deployment - To deploy the model in production.

Note that there are two backward processes, which we call the Finetuning Loop and the Feedback Loop.

  • Finetuning Loop - Use the trained model to assess the relevance of our labels / data.
  • Feedback Loop - Collect new data to improve the model in further iterations.
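As an illustration of the Finetuning Loop, the sketch below compares human labels with the predictions of a trained model and flags the images where they disagree, so that a labeler can review them. It is a simplified assumption, not our production code: it supposes one bounding box per image, and the names `iou`, `flag_for_review`, and the 0.5 threshold are hypothetical.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def flag_for_review(
    labels: dict[str, Box],
    predictions: dict[str, Box],
    min_iou: float = 0.5,
) -> list[str]:
    """Return the ids of images whose label and prediction disagree.

    An image is flagged when the model predicted nothing for it, or when
    the overlap between the label and the prediction is below `min_iou`.
    """
    return sorted(
        image_id
        for image_id, box in labels.items()
        if image_id not in predictions or iou(box, predictions[image_id]) < min_iou
    )


# Hypothetical labels and model predictions, keyed by image id.
labels = {
    "img_001": (10, 10, 50, 50),
    "img_002": (0, 0, 20, 20),
    "img_003": (5, 5, 15, 15),
}
predictions = {
    "img_001": (12, 11, 50, 52),  # close to the label
    "img_002": (60, 60, 90, 90),  # far from the label
    # no prediction at all for img_003
}

to_review = flag_for_review(labels, predictions)
print(to_review)
```

A disagreement does not say who is wrong: it may reveal a labelling mistake, an ambiguous specification, or a weakness of the model. Either way, it tells us where to look first.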

In this process, multiple competencies and individuals work together. For example, a central data scientist may work with technicians in a factory, potentially in a different country.
Because not everybody is in the same place, it can be challenging to work together on the same data.

That's exactly the purpose of a Collaborative AI Labelling Tool.


Michelin Collaborative Labeling Journey

Michelin AI Labeling Journey

Non Collaborative Era (2017 - 2021)

During the non-collaborative era, we used multiple labelling tools, each very specific to the project at hand.
Most of the time, images and labels were stored locally. Data scientists labelled images independently based on the specifications they received in order to ultimately design algorithms and train models to meet business requirements.

To share their results, they used metrics and notebooks to display several predictions/annotations representative of performance.

Advantages

  • Data scientists were very close to the data and could label the way they believed was best for the model
  • Few disagreements among labelers (because there was only one)

Drawbacks

  • Wasted development time to:
    • Develop/adapt the labelling tool
    • Develop/adapt algorithms to fit non-standard label formats
  • Wasted data scientist time, as they were not focused on their data science expertise, which positioned them as project bottlenecks
  • Difficulties in merging datasets from different sources when acquisitions are performed in multiple factories

It was this last need that initially drove our decision to look for a collaborative labelling tool.
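The merging difficulty can be illustrated with a small sketch. Assume two factories exported bounding boxes in two different home-made formats: pixel corners for one, normalized center/size for the other. Both formats, the field names, and the `UnifiedBox` schema below are hypothetical. Before any training can start, each source needs its own converter to a common schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedBox:
    """Common schema (hypothetical): pixel corners, one record per box."""
    image: str
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def from_site_a(record: dict) -> UnifiedBox:
    """Site A (hypothetical) stores pixel corners as [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = record["bbox"]
    return UnifiedBox(record["file"], record["class"],
                      float(x1), float(y1), float(x2), float(y2))


def from_site_b(record: dict) -> UnifiedBox:
    """Site B (hypothetical) stores a normalized center and size."""
    w_px, h_px = record["img_w"], record["img_h"]
    half_w, half_h = record["w"] / 2, record["h"] / 2
    return UnifiedBox(
        record["image"],
        record["name"],
        (record["cx"] - half_w) * w_px,
        (record["cy"] - half_h) * h_px,
        (record["cx"] + half_w) * w_px,
        (record["cy"] + half_h) * h_px,
    )


site_a = [{"file": "a/tire_01.png", "class": "scratch", "bbox": [10, 20, 110, 70]}]
site_b = [{"image": "b/tire_07.png", "name": "scratch",
           "cx": 0.5, "cy": 0.5, "w": 0.25, "h": 0.125,
           "img_w": 1000, "img_h": 800}]

# One converter per source: the cost grows with every new tool or factory.
merged = [from_site_a(r) for r in site_a] + [from_site_b(r) for r in site_b]
for box in merged:
    print(box)
```

A shared labelling tool removes this step, because every site exports the same format from the start.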

Single Solution Era (2021 - 2024)

After comparing different suppliers, in 2021 we decided to purchase Labelbox, a SaaS platform with a hybrid cloud architecture, as our collaborative labeling tool.

This architecture allows us to store data (mainly images) on Michelin's side, with very strict access control on the storage account. By uploading data from different sites around the world to our VPC (Virtual Private Cloud), we can access images from various sources, allowing us to share data among multiple stakeholders worldwide and scale easily.

Labelbox - Architecture and capabilities


In Labelbox, we can configure multiple ontologies, which was flexible enough to implement the vast majority of our projects at the time.

This architecture lets us control who has access to the images we put on the platform, in line with our security requirements.

Advantages

  • Collaboration across the world
  • Handle multilayered images
  • Unified Labeling Format / Exports

Drawbacks

  • Cost of the SaaS, which scales with the volume of data/labels
  • Rigidity of the platform on some use cases
  • Model Assisted Labeling schema is complex and difficult to manipulate
  • Read-only capabilities that require IT skills for synchronization
  • Rights/permission management that requires global administrators, who have access to everything, to upload new data, which is not ideal in terms of security

In 2024, we decided to continue working with this supplier. However, we also decided to implement another solution to handle new use cases.

Multiple Solutions Era (2024 - Today)

To have a playground for proofs of concept and for users in factories, we decided in 2024-2025 to implement an open-source solution in parallel with Labelbox.

This solution responds to multiple needs:

  • Provide a solution for smaller-scale projects
  • Limit the number of global administrators of Labelbox
  • Simplify the data ingestion process

To do that, we forked the open-source version of Labelstudio (Apache 2.0).

Initially, we decided to fork because we wanted to use service principal authentication instead of the account key to manage read/write rights on the storage account, and to fix some UI problems when dealing with multi-layered images.
We attempted to contribute these changes to the open-source project, but without success:
  • Service Principal PR - Labelstudio
  • MultiLayered UI PR - Labelstudio

This forked version is built with a CI pipeline and deployed on demand, at the request of factories, with a dedicated URL in Michelin's infrastructure.

This solution gives us more flexibility and allows more creative labelling pipelines thanks to its easy synchronisation mechanism, which is not possible in Labelbox, where images must be declared through the SDK.

Labelstudio - Architecture & Usage
Historical way to label data without pre-labelling
Capability using a pre-labelling pipeline
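As a rough idea of what a pre-labelling pipeline produces, the sketch below converts pixel bounding boxes from a model into a task in the Label Studio JSON format, where pre-annotations live under `predictions` and rectangle coordinates are expressed as percentages of the image size. Several elements are assumptions for illustration: the `to_label_studio_task` helper, the `label` and `image` tag names (they must match the `from_name` and `to_name` of the project's labelling configuration), the `model_version` string, and the example URL.

```python
import json


def to_label_studio_task(
    image_url: str,
    boxes: list[tuple[str, float, float, float, float]],
    image_width: int,
    image_height: int,
    model_version: str = "prelabel-sketch",  # hypothetical version tag
) -> dict:
    """Build one task with pre-annotations from (label, x_min, y_min, x_max, y_max) boxes."""
    results = []
    for label, x_min, y_min, x_max, y_max in boxes:
        results.append({
            "type": "rectanglelabels",
            "from_name": "label",  # assumed name of the labels tag in the config
            "to_name": "image",    # assumed name of the image tag in the config
            "original_width": image_width,
            "original_height": image_height,
            "value": {
                # Rectangle coordinates are percentages of the image size.
                "x": 100.0 * x_min / image_width,
                "y": 100.0 * y_min / image_height,
                "width": 100.0 * (x_max - x_min) / image_width,
                "height": 100.0 * (y_max - y_min) / image_height,
                "rotation": 0,
                "rectanglelabels": [label],
            },
        })
    return {
        "data": {"image": image_url},
        "predictions": [{"model_version": model_version, "result": results}],
    }


task = to_label_studio_task(
    "https://example.invalid/tire_01.png",  # placeholder URL
    [("scratch", 100, 200, 300, 400)],
    image_width=1000,
    image_height=800,
)
print(json.dumps(task, indent=2))
```

Once such tasks are imported, labelers start from the model's proposal and only correct it, instead of drawing every box from scratch.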

Conclusion

As this article shows, we had to experiment with many combinations of tools to arrive at our current tooling catalog.

Concerning our BCM, we definitely aim to return to the latest well-maintained open-source version of Labelstudio, so that we can benefit from cutting-edge features.

In this article we are only focusing on custom image processing...

Focus of current article

But licensed and OEM solutions exist as well, with their own strengths and weaknesses. They have improved a lot since 2018 and are competitive in a growing number of scenarios, especially OEM processing, which reduces hardware complexity.