Coupling metamodeling with machine learning (ML) approaches is a relatively recent trend. One of the first works introduced the term model cognification, which can be informally described as the extraction of knowledge from models and metamodels for use in different tasks; that work also presented existing challenges and possible research opportunities. Our work focuses on one of these challenges.

We are working on the specific task of classifying unstructured models into metamodels. In other words, given a set of schema-less JSON documents and a set of metamodels, we want to discover whether the JSON documents could be categorized as (partially) conforming to one of the available metamodels.

This classification is useful for identifying the domain of unstructured models, for model discovery tasks, and as initial feedback for model interoperability (for instance, to migrate JSON models into statically typed models).

The approach's execution flow is illustrated in the figure below. It is inspired by classification tasks in machine learning, but adapts the artifacts and algorithms to the modeling / MDE ecosystem. While this description may seem straightforward, this adaptation has several difficult implications that need to be taken into consideration (as reported as well in other applications of ML to MDE).

Overview of our model classification approach

As depicted in the figure, the main tasks in the approach are:

  • extracting the metamodels and entering them into the neural network;
  • training the network;
  • extracting the JSON documents;
  • classifying the documents.

There are several kinds of neural networks (NN) and plenty of (hyper)parameters to tune. A given NN may be well suited to image processing, another to text processing, and so on, so choosing and configuring the right neural network was the first challenge. In our case, it was particularly difficult because we did not yet have much experience with ML techniques. To keep things simple, we treated this as exploratory work and chose Multi-Layer Perceptrons (MLPs) with 3 and 5 hidden layers.
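As a rough illustration of this setup, the sketch below trains a scikit-learn MLP with 3 hidden layers on toy one-hot-style vectors. The layer sizes, data, and labels are invented for the example; they are not the hyperparameters or data used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Toy 0/1 feature vectors standing in for encoded metamodel elements.
X = rng.integers(0, 2, size=(200, 40))
# Synthetic label: which half of the vector dominates (illustrative only).
y = (X[:, :20].sum(axis=1) > X[:, 20:].sum(axis=1)).astype(int)

# MLP with 3 hidden layers; a 5-layer variant would simply add two more sizes.
clf = MLPClassifier(hidden_layer_sizes=(64, 32, 16),
                    max_iter=500, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy on the toy data
```

Swapping `hidden_layer_sizes` is all it takes to compare 3- and 5-layer configurations, which is what makes the MLP convenient for this kind of exploratory comparison.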

Second, it is necessary to define a feature vector to represent the input metamodels. A metamodel can be encoded in several different ways. Here, too, we started with a simple solution: One-Hot Encoding (OHE), in which each known characteristic gets a 0-or-1 slot in the feature vector. We encoded each model element as present or missing (1 or 0); any other feature of an Ecore-based metamodel could be encoded in the same way. However, we foresee that such an encoding will produce very high-dimensional feature vectors, raising dimensionality problems.
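The encoding just described can be sketched in a few lines. The helper name and the element vocabulary below are hypothetical stand-ins for the real metamodel elements:

```python
def one_hot_encode(elements, vocabulary):
    """Return a 0/1 vector marking which vocabulary elements are present."""
    present = set(elements)
    return [1 if name in present else 0 for name in vocabulary]

# Made-up vocabulary mixing MySQL-like and KM3-like element names.
vocabulary = ["Table", "Column", "Key", "Class", "Attribute", "Reference"]

vector = one_hot_encode(["Table", "Column"], vocabulary)
print(vector)  # → [1, 1, 0, 0, 0, 0]
```

The vector length equals the vocabulary size, which is why a rich Ecore-based metamodel quickly drives the dimensionality up.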

Once the encoding is ready, the network is trained with part of the input metamodels. We used the MySQL, KM3, UML, and Java metamodels, extracted from the ATL transformations site. After training, we can finally start the classification. To verify the classifier's behavior, we automatically generated JSON models with 50 and 100 elements by mixing the elements of the input metamodels in different proportions. For instance, one generated JSON document has 80% of its elements from the MySQL metamodel and 20% from the KM3 metamodel. The table below shows the evaluation of an MLP with 5 hidden layers on input models with 50 and 100 elements. The classification accuracy was quite good in this controlled setting.
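A mixed test model of the kind described above could be generated along these lines. The helper and the element names are hypothetical; only the 80/20 split and the 50-element size mirror the example in the text:

```python
import json
import random

def mixed_model(primary, secondary, size, ratio=0.8, seed=0):
    """Generate a JSON model drawing `ratio` of its elements from `primary`
    and the rest from `secondary` (sampling with replacement)."""
    rnd = random.Random(seed)
    n_primary = round(size * ratio)
    elements = (rnd.choices(primary, k=n_primary)
                + rnd.choices(secondary, k=size - n_primary))
    rnd.shuffle(elements)
    return json.dumps({"elements": elements})

mysql_elems = ["Table", "Column", "Key"]         # stand-ins for real elements
km3_elems = ["Class", "Attribute", "Reference"]  # stand-ins for real elements

# 50-element document: 40 MySQL-style elements, 10 KM3-style elements.
doc = mixed_model(mysql_elems, km3_elems, size=50)
```

Sweeping `ratio` and `size` is enough to produce the whole grid of controlled test documents.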

This first work on the classification of unstructured models into metamodels has shown to be very promising, opening several paths that could be followed. The three most interesting (for us) are: 1) the use or adaptation of different kinds of neural networks that could be better suited to metamodels; 2) the definition of dedicated techniques for encoding metamodels into feature vectors; 3) better integration with an MDE framework.

This work has been published at MODELSWARD 2020, where the approach is described in detail:

  • Walmir Couto, Emerson Morais, Marcos Didonet Del Fabro. Classifying Unstructured Models into Metamodels Using Multi Layer Perceptrons, 8th MODELSWARD, v.1, pages 271-278, Valletta, Malta. February 2020
  • The implementation of the extraction, training, and classification scripts is available for download at https://github.com/walmircouto/MLPTraining.