{"id":8810,"date":"2024-01-28T22:06:55","date_gmt":"2024-01-28T22:06:55","guid":{"rendered":"https:\/\/modeling-languages.com\/?p=8810"},"modified":"2024-01-28T22:15:54","modified_gmt":"2024-01-28T22:15:54","slug":"data-models-and-ai","status":"publish","type":"post","link":"https:\/\/modeling-languages.com\/data-models-and-ai\/","title":{"rendered":"The perfect three-way: data, models and AI"},"content":{"rendered":"

In the last decade, we have witnessed an explosion of research on new architectures, training methods, fine-tuning strategies and more for machine learning (ML). But we are now entering a new phase in which all these new approaches are becoming a commodity. Platforms like HuggingFace<\/a> do an outstanding job of making all the latest results accessible to everyone. Its exponential growth<\/a> confirms this.<\/p>\n

Therefore, using the latest ML architectures alone is no longer a competitive advantage<\/strong>. Instead, companies need to turn their attention to the data<\/strong> used to train their ML models. Better data translates into better ML<\/strong>. It is as simple as that. And the only way to evaluate the quality of the data is to understand it: both its underlying structure and its gathering and annotation process.<\/a><\/p>\n

Data and data models are back in fashion<\/strong> thanks to the AI fever. But this also means that classical problems such as data annotation, data mining, data fusion and data composition must be revisited in an ML context. For instance, ML often relies on big data sources that seem to be schemaless<\/a>. But this is not really true. At most, we can say that such data is \u201cless-schema<\/strong>\u201d than other data, and we may first need to infer the implicit schema behind it to be able to interpret it. We may even discover more than one possible schema, since data is not always a static artefact. A schema is rather a partial, dynamic and temporal view of the data that facilitates its manipulation at a specific instant.<\/p>\n
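To make the idea of inferring an implicit schema a bit more concrete, here is a minimal Python sketch, assuming the data arrives as JSON-like records (Python dicts). The function names (`type_name`, `record_schema`, `schema_variants`, `merged_schema`) and the coarse type vocabulary are hypothetical choices for illustration, not the API of any particular tool. It shows how each record exposes a partial schema, how one source can yield several schema variants, and how those variants can be merged into a single union schema with optional fields.

```python
from collections import Counter
from typing import Any, Dict, List, Set, Tuple

# Hypothetical representation: a record schema is a sorted tuple of (field, type) pairs.
Schema = Tuple[Tuple[str, str], ...]


def type_name(value: Any) -> str:
    # Describe a JSON-like value with a coarse, JSON-style type name.
    if value is None:
        return 'null'
    if isinstance(value, bool):  # checked before numbers, since bool is a subclass of int
        return 'boolean'
    if isinstance(value, (int, float)):
        return 'number'
    if isinstance(value, str):
        return 'string'
    if isinstance(value, list):
        return 'array'
    if isinstance(value, dict):
        return 'object'
    return type(value).__name__


def record_schema(record: Dict[str, Any]) -> Schema:
    # The implicit schema of a single record: its top-level fields and their types.
    return tuple(sorted((field, type_name(value)) for field, value in record.items()))


def schema_variants(records: List[Dict[str, Any]]) -> Counter:
    # Count how many records follow each distinct implicit schema.
    return Counter(record_schema(record) for record in records)


def merged_schema(records: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    # Union of all variants: for each field, the observed types and whether
    # it appears in every record (required) or only in some (optional).
    field_types: Dict[str, Set[str]] = {}
    field_counts: Counter = Counter()
    for record in records:
        for field, value in record.items():
            field_types.setdefault(field, set()).add(type_name(value))
            field_counts[field] += 1
    return {
        field: {
            'types': sorted(types),
            'required': field_counts[field] == len(records),
        }
        for field, types in sorted(field_types.items())
    }


if __name__ == '__main__':
    # Hypothetical records from a source that looks schemaless.
    records = [
        {'id': 1, 'name': 'Ada', 'email': 'ada@example.org'},
        {'id': 2, 'name': 'Alan'},
        {'id': '3', 'name': 'Grace', 'tags': ['admin']},
    ]
    for schema, count in schema_variants(records).items():
        print(count, schema)
    for field, info in merged_schema(records).items():
        print(field, info)
```

In these hypothetical records, `id` shows up both as a number and as a string, while `email` and `tags` are present only in some records, so three schema variants are found and the merged schema marks those two fields as optional. Real schema-inference approaches for JSON or NoSQL data go further (nested objects, array element types, type conflict resolution); this sketch only looks at top-level fields.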

I believe that the three-way relationship resulting from the interweaving of data, data models and AI reinforces each of them. Let’s look at a representative scenario for each combination:<\/p>\n