BMW Publishes Largest Open-Source Dataset For Production AI Applications

The BMW Group is publishing the world’s largest dataset to streamline and significantly accelerate the training of artificial intelligence in production. The synthesized AI dataset – known as SORDI (Synthetic Object Recognition Dataset for Industries) – consists of more than 800,000 photorealistic images. These are divided into 80 categories of production resources, from pallets and pallet cages to forklifts, and include objects of particular relevance to the core technologies of automotive engineering and logistics.

By publishing SORDI, the BMW Group together with its partners Microsoft, NVIDIA and idealworks is making available the world’s largest reference dataset for artificial intelligence in the field of manufacturing. The visual data is of particularly high quality, and the integrated digital labels enable basic image processing tasks to be carried out, such as classification, object detection or segmentation for relevant areas of production in general.

“The BMW Group has been using artificial intelligence since 2019. AI has already been utilized in various quality assurance applications in production at the plants. SORDI, the new synthetic dataset, makes AI models much faster to train and AI considerably more cost-efficient in production,” says Michele Melchiorre, Senior Vice President of BMW Group Production System, Planning, Tool and Plant Engineering.

To create the synthesized AI training data non-manually, the simulated environment for robotics, the digital twin of the production system and the AI training environment were all fused within the NVIDIA Omniverse. The rendering pipeline from the BMW Tech Office in Munich allows any number of photos, including labels, to be synthesized in sufficient photorealistic HD quality for them to be used in the creation of highly robust AI models. SORDI can be utilized by IT professionals to develop and tailor AI solutions for manufacturing, and by production employees to maintain mature AI systems for validation purposes ready for the start of production.

The publication of the innovative dataset, which is freely available to software developers, represents the next targeted step in the BMW Group’s systematic expansion of activities to democratize artificial intelligence. The publications of no-code AI and SORDI complement each other: on the one hand, the BMW Labelling Tool Lite and published AI training tools explicitly allow users to use AI intuitively, even if they lack sound IT expertise. On the other, SORDI’s synthesis significantly accelerates and simplifies the training of AI models for production applications.
