{"id":8810,"date":"2024-01-28T22:06:55","date_gmt":"2024-01-28T22:06:55","guid":{"rendered":"https:\/\/modeling-languages.com\/?p=8810"},"modified":"2024-01-28T22:15:54","modified_gmt":"2024-01-28T22:15:54","slug":"data-models-and-ai","status":"publish","type":"post","link":"https:\/\/modeling-languages.com\/data-models-and-ai\/","title":{"rendered":"The perfect three-way: data, models and AI"},"content":{"rendered":"<p>In the last decade, we have witnessed an explosion of research on new architectures, training methods, fine-tuning strategies, etc., for machine learning (ML). But we are now entering a new phase where all these new approaches are becoming a commodity. Platforms like <a href=\"https:\/\/huggingface.co\/\" target=\"_blank\" rel=\"noopener\">HuggingFace<\/a> do an outstanding job of making all the latest results accessible to everyone. And <a href=\"https:\/\/livablesoftware.com\/hfcommunity-huggingface-community-opensource\/\" target=\"_blank\" rel=\"noopener\">its exponential growth<\/a> confirms it.<\/p>\n<p>Therefore, <strong>using the latest ML architectures alone is not a competitive advantage<\/strong> anymore. Instead, <strong>companies need to turn their attention to the data<\/strong> used to train their ML models. <strong>Better data turns into better ML<\/strong>. It&#8217;s as simple as that. And the only way to evaluate the quality of the data is to understand it: both its underlying structure and its <a href=\"https:\/\/modeling-languages.com\/describeml-machine-learning-datasets\/\" target=\"_blank\" rel=\"noopener\">gathering and annotation process<\/a>.<\/p>\n<p><strong>Data and data models are back in fashion<\/strong> thanks to the AI fever. But this also means that classical problems like data annotation, data mining, data fusion, data composition, etc., must now be revisited in an ML context. 
For instance, ML often relies on <a href=\"https:\/\/modeling-languages.com\/modeling-big-data-compatible\/\" target=\"_blank\" rel=\"noopener\">big data sources that seem to be schemaless<\/a>. But this is not really true. At most, we can say such data is \u201c<strong>less-schema<\/strong>\u201d than other data, and we may first need to infer the implicit schema behind it to be able to interpret it. We could even discover more than one possible schema, as a schema is not a static artefact. It is rather a partial, dynamic and temporal view of the data that facilitates manipulating it at a specific instant.<\/p>\n<p>I believe that the three-way relationship resulting from the interweaving of data, data models and AI reinforces each of them. Let&#8217;s look at a representative scenario for each combination:<\/p>\n<ul>\n<li><strong>Data + AI -&gt; Models<\/strong>. We can use AI techniques to infer both the data models representing the structure of the dataset and the behavioral models that could be used to Create\/Read\/Update\/Delete instances of the dataset. This is the key idea behind our <a href=\"https:\/\/modeling-languages.com\/welcome-to-the-low-modeling-revolution\/\" target=\"_blank\" rel=\"noopener\">low-modeling approach<\/a> at the core of <a href=\"https:\/\/modeling-languages.com\/lowcode-opensource-besser\/\" target=\"_blank\" rel=\"noopener\">BESSER<\/a>.<\/li>\n<li><strong>Data + Models -&gt; AI<\/strong>. Properly annotated data (including its ethical aspects) can improve the quality of the AI components trained on it and prevent <a href=\"https:\/\/modeling-languages.com\/automating-bias-testing-llm\/\" target=\"_blank\" rel=\"noopener\">possible ethical biases<\/a>, or at least make data users aware of the limitations of the dataset.<\/li>\n<li><strong>AI + Models -&gt; Data<\/strong>. We can use AI techniques to synthesize new artificial data compliant with a certain model structure. 
This can be used to test the software models and, even more importantly, to generate enough data to train ML models in domains (such as health) where real data is difficult to collect.<\/li>\n<\/ul>\n<p>These scenarios impose new requirements on the conceptual modeling field. In this new AI age, <strong>models are not a static element<\/strong> in the development process. They become dynamic, as they often need to change and evolve to remain aligned with the data (and its drifts). They are also <strong>partial<\/strong> (as they may represent only parts of the data) and <strong>uncertain<\/strong> (as we may not be completely sure of how accurate they are, e.g. when they are automatically inferred).<\/p>\n<p>But despite these challenges, conceptual models remain a key asset. A good example of this is the promotion of the common <a href=\"https:\/\/gaia-x.eu\/what-is-gaia-x\/deliverables\/data-spaces\/\" target=\"_blank\" rel=\"noopener\">European data spaces<\/a> to facilitate data exchange among partners within a data domain. This exchange requires the partners to agree on a unified conceptual data model to ensure interoperability. It&#8217;s no surprise that modeling languages such as the <a href=\"https:\/\/modeling-languages.com\/entity-relationship-language-er-stronger-than-ever\/\" target=\"_blank\" rel=\"noopener\">Entity Relationship language are experiencing a revival<\/a>.<\/p>\n<p>And let&#8217;s not forget: <strong>ML models are also models!<\/strong><\/p>\n<blockquote><p>Everything is a model &#8211; Jean B\u00e9zivin (On the unification power of models)<\/p><\/blockquote>\n<p>This means that <strong>the conceptual modeling community has the chance to bring its expertise to the AI world<\/strong>, helping the AI community improve the way it represents, transforms, reuses and deploys ML artefacts. 
I look forward to seeing how we can take AI-based engineering to a whole new level thanks to our decades of expertise in conceptual modeling.<\/p>\n<p>This reflection is part of the panel discussion <a href=\"https:\/\/emisa-journal.org\/emisa\/article\/view\/328\" target=\"_blank\" rel=\"noopener\">AI-Driven Software Engineering \u2013 The Role of Conceptual Modeling<\/a> we held at <a href=\"https:\/\/icsoft.scitevents.org\/\" target=\"_blank\" rel=\"noopener\">ICSOFT<\/a> 2023. The panel was coordinated by <span class=\"name\">Hans-Georg Fill<\/span>, and my co-panelists were Wolfgang Maass and Marten Van Sinderen.<\/p>\n<span class=\"et_bloom_bottom_trigger\"><\/span>","protected":false},"excerpt":{"rendered":"<p>In the last decade, we have witnessed an explosion of research on new architectures, training methods, fine-tuning strategies, etc. for machine learning (ML). But we are now entering a new phase where all these new approaches are becoming a commodity. Platforms like HuggingFace do an outstanding job in making all the latest results accessible to 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":8813,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[609,17,49,78],"tags":[741,856],"hashtags":[],"_links":{"self":[{"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/posts\/8810"}],"collection":[{"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/comments?post=8810"}],"version-history":[{"count":7,"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/posts\/8810\/revisions"}],"predecessor-version":[{"id":8818,"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/posts\/8810\/revisions\/8818"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/media\/8813"}],"wp:attachment":[{"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/media?parent=8810"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/categories?post=8810"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/tags?post=8810"},{"taxonomy":"hashtags","embeddable":true,"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/hashtags?post=8810"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}