In the last decade, we have witnessed an explosion of research on new architectures, training methods, fine-tuning strategies, etc. for machine learning (ML). But we are now entering a new phase where all these new approaches are becoming a commodity. Platforms like HuggingFace do an outstanding job at making the latest results accessible to everyone. And the platform's exponential growth confirms it.

Therefore, using the latest ML architectures alone is no longer a competitive advantage. Instead, companies need to turn their attention to the data used to train the ML models. Better data turns into better ML. It's as simple as that. And the only way to evaluate the quality of the data is to understand it: both its underlying structure and its gathering and annotation process.

Data and data models are back in fashion thanks to the AI fever. But this also means that classical problems like data annotation, data mining, data fusion and data composition, now in an ML context, must be revisited. For instance, ML often relies on big data sources that seem to be schemaless. But this is not really true. At most, we can say the data has “less schema” than other data, and we may need to first infer the implicit schema behind it to be able to interpret it. And we could discover more than one possible schema, as a schema is not always a static artefact: it is rather a partial, dynamic and temporal view of the data that facilitates manipulating it at that specific instant.
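To make the idea of inferring an implicit schema concrete, here is a minimal sketch in plain Python (no ML involved); the `infer_schema` function and the sample records are illustrative assumptions, not part of any specific tool:

```python
from collections import defaultdict

def infer_schema(records):
    """Infer a rough implicit schema from a list of JSON-like records."""
    field_types = defaultdict(set)   # field -> set of observed type names
    field_counts = defaultdict(int)  # field -> number of records containing it
    for record in records:
        for field, value in record.items():
            field_types[field].add(type(value).__name__)
            field_counts[field] += 1
    total = len(records)
    return {
        field: {"types": sorted(types), "optional": field_counts[field] < total}
        for field, types in field_types.items()
    }

# Supposedly "schemaless" data: fields appear and disappear, types drift.
records = [
    {"name": "Ada", "age": 36},
    {"name": "Grace", "age": "47"},                 # age stored as a string here
    {"name": "Alan", "email": "alan@example.org"},  # age missing, email appears
]
print(infer_schema(records))
# {'name': {'types': ['str'], 'optional': False},
#  'age': {'types': ['int', 'str'], 'optional': True},
#  'email': {'types': ['str'], 'optional': True}}
```

Even this naive pass immediately surfaces optional fields and type drift, i.e. the “less schema” hiding behind supposedly schemaless data. And running it on different slices or time windows of the data would yield different schemas, which is exactly the partial and temporal nature discussed above.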

I believe that the three-way relationship resulting from the interweaving of data, data models and AI reinforces each of them. Let's look at a representative scenario for each combination:

  • Data + AI -> Models. We can use AI techniques to infer the data models representing the structure of the dataset and the behavioral models that could be used to Create/Read/Update/Delete instances of the dataset. This is the key idea behind our low-modeling approach at the core of BESSER.
  • Data + Models -> AI. Properly annotated data (including ethical aspects) can improve the quality of AI components trained on it and prevent possible ethical biases. Or, at least, it can make data users aware of the limitations of such a dataset.
  • AI + Models -> Data. We can use AI techniques to synthesize new artificial data compliant with a certain model structure (see the sketch after this list). This can be used to test the software. But, even more, to generate enough data to train ML models in domains (such as the health domain) where it's difficult to collect real data.
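As a toy illustration of the AI + Models -> Data direction, here is a minimal sketch that generates artificial instances conforming to a given model structure; the `PATIENT_MODEL` entity and the per-type generators are made-up assumptions (a real setup would use richer conceptual models and smarter, possibly ML-based, generators):

```python
import random
import string

# A toy "conceptual model": attribute name -> attribute type.
# PATIENT_MODEL is an illustrative assumption, not a real schema.
PATIENT_MODEL = {
    "id": "int",
    "name": "str",
    "age": "int",
    "diagnosis_code": "str",
}

# One naive value generator per type; real generators could be ML-based.
GENERATORS = {
    "int": lambda: random.randint(0, 99),
    "str": lambda: "".join(random.choices(string.ascii_uppercase, k=6)),
}

def synthesize(model, n):
    """Generate n artificial instances compliant with the model's structure."""
    return [{attr: GENERATORS[t]() for attr, t in model.items()} for _ in range(n)]

# Every record conforms to PATIENT_MODEL by construction, so it can be
# used to test software or to bootstrap ML training in data-scarce domains.
for record in synthesize(PATIENT_MODEL, 3):
    print(record)
```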

These scenarios impose new requirements on the conceptual modeling field. In this new AI age, models are no longer a static element in the development process: they become dynamic, as they often need to change and evolve to remain aligned with the data (which itself drifts). They are also partial (as they may represent only parts of the data) and uncertain (as we may not be completely sure of how accurate they are, e.g. when they are automatically inferred).

But despite these challenges, conceptual models remain a key asset. A good example is the promotion of the common European data spaces to facilitate data exchange among partners within a data domain. This exchange requires the partners to agree on a unified conceptual data model to ensure interoperability. It's no surprise that modeling languages such as the Entity-Relationship language are experiencing a revival.

And let’s not forget, ML models are also models!

Everything is a model – Jean Bézivin (On the unification power of models)

This means that the conceptual modeling community has the chance to bring its expertise to the AI world, helping the AI community improve the way it represents, transforms, reuses and deploys ML artefacts. I'm looking forward to seeing how we can bring AI-based engineering to a whole new level thanks to our decades of expertise in conceptual modeling.

This reflection is part of the panel discussion AI-Driven Software Engineering – The Role of Conceptual Modeling we had at ICSOFT 2023. The panel was coordinated by Hans-Georg Fill, and my co-panelists were Wolfgang Maass and Marten Van Sinderen.
