Javier Cánovas talks today about ScheMoL, his approach (co-authored with Oscar Díaz, Gorka Puente and Jesús García Molina) for the reverse engineering of relational databases. Note that, unlike other approaches, the goal is not to get a model representing the database schema but to transform the database data into a model (conforming to a metamodel aligned with the database schema). Enter Javier.
Extracting models from both source code and data is an essential task in model-driven software reengineering, since code and data are commonly the main assets of an application. Several approaches address model-driven reverse engineering of code; however, little attention has been paid to model-driven reverse engineering of data. In fact, the extraction of models from relational data is normally performed by means of ad hoc solutions or object-relational mappers, which require hard and time-consuming work. Bearing these problems in mind, we created ScheMoL, a Domain-Specific Language (DSL) for extracting models from relational data.
In ScheMoL, a model extraction process is considered as a database-to-model transformation, so mappings between data elements (i.e., tables) and metamodel elements (i.e., metaclasses) are explicitly specified. Figure 1 shows a ScheMoL transformation process, which has three inputs: (1) data conforming to the relational database schema D, (2) a target metamodel MMT and (3) a transformation definition T. The output of the process is a model which conforms to the target metamodel.
Figure 1. ScheMoL transformation process
The language is deeply inspired by Gra2MoL, which is a code-to-model transformation language. Like Gra2MoL, ScheMoL is a rule-based transformation language and, like ATL, it uses the concept of binding, which has been adapted here to deal with tuples. In this sense, the source element of a rule is a database element (i.e., a table) rather than a metamodel element. Moreover, ScheMoL incorporates a query language which has been specially adapted to collect information from databases. A ScheMoL transformation is therefore composed of a set of transformation rules and, optionally, a preamble. The preamble of a transformation definition allows developers to specify ad hoc foreign keys and views.
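To give a rough intuition of what a binding does in a rule-based language whose rules take tables as source elements, here is a minimal Python sketch. It is not ScheMoL syntax and not how the ScheMoL engine is implemented; the RULES registry, the rule decorator, the resolve_binding helper and the sample rows are all hypothetical names and data introduced only for illustration. The idea it shows is that each query result is handed to the rule whose source table matches, and the elements those rules create become the value of the bound feature.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]                    # one database tuple, keyed by column name
Rule = Callable[[Row], Dict[str, Any]]  # a rule turns a tuple into a model element

RULES: Dict[str, Rule] = {}             # hypothetical registry: source table -> rule


def rule(source_table: str) -> Callable[[Rule], Rule]:
    """Register a function as the rule whose source element is `source_table`."""
    def register(fn: Rule) -> Rule:
        RULES[source_table] = fn
        return fn
    return register


def resolve_binding(results: List[Tuple[str, Row]]) -> List[Dict[str, Any]]:
    """For each (table, tuple) query result, apply the rule registered for that table."""
    return [RULES[table](row) for table, row in results]


@rule("StudentTable")
def map_student(row: Row) -> Dict[str, Any]:
    # Model elements are plain dicts here to keep the sketch short.
    return {"type": "Student", "name": row["name"]}


# Two query results (made-up data) trigger two executions of map_student.
students = resolve_binding([
    ("StudentTable", {"name": "Ann"}),
    ("StudentTable", {"name": "Bob"}),
])
```

In this sketch the caller never names map_student directly: the lookup by source table is what stands in for the binding resolution.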
Figure 2a shows a simplified example of a ScheMoL transformation process to extract models from a database storing student data. The inputs of the process are: (1) the database schema defining the UniversityTable and StudentTable tables, where StudentTable has a foreign key to UniversityTable (i.e., the university_fk_id column); (2) data conforming to that database schema; (3) a simple target metamodel to represent universities (University metaclass) and their students (Student metaclass); and (4) the ScheMoL mapping definition, which we explain below. The result of the ScheMoL transformation process is a model conforming to the target metamodel.
Figure 2. Simple example of a ScheMoL transformation process. (a) The inputs and outputs of the process and (b) the ScheMoL mapping definition used in this example.
The ScheMoL mapping definition used in this example is composed of two rules, which are listed in Figure 2b. The first rule, called mapUniversity, starts the transformation process and creates an instance of the University metaclass from the only tuple of the UniversityTable table in the example. Its first mapping initializes the id attribute by accessing the id column of the UniversityTable tuple. Its second mapping is a binding: the right-hand side is a query which collects every tuple in StudentTable referring to the current UniversityTable tuple, and the left-hand side refers to the students reference of the University metaclass. Since StudentTable contains two tuples, the query returns two results and the binding causes the mapStudent rule to be executed twice. Each execution creates an instance of the Student metaclass and initializes its name attribute by accessing the name column of the StudentTable tuple received by the rule.
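For readers who prefer code to figures, the behaviour of the two rules can be approximated in plain Python with the standard sqlite3 module. This is only a sketch of the semantics described above, not ScheMoL itself: the in-memory database, the column types and the sample student names are assumptions, and the binding is hard-wired as a direct call to map_student instead of being resolved by a transformation engine.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class Student:                      # stands in for the Student metaclass
    name: str


@dataclass
class University:                   # stands in for the University metaclass
    id: int
    students: List[Student] = field(default_factory=list)


def map_student(row: sqlite3.Row) -> Student:
    """mapStudent: create a Student and copy the name column."""
    return Student(name=row["name"])


def map_university(conn: sqlite3.Connection, row: sqlite3.Row) -> University:
    """mapUniversity: create a University, then fill the students reference."""
    university = University(id=row["id"])
    # Right-hand side of the binding: every StudentTable tuple that
    # refers to the current UniversityTable tuple.
    results = conn.execute(
        "SELECT name FROM StudentTable WHERE university_fk_id = ? ORDER BY rowid",
        (row["id"],),
    )
    # Left-hand side: the students reference, filled by one rule run per result.
    university.students = [map_student(r) for r in results]
    return university


# Assumed sample data: one university and two students (names are made up).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE UniversityTable (id INTEGER PRIMARY KEY);
    CREATE TABLE StudentTable (
        name TEXT,
        university_fk_id INTEGER REFERENCES UniversityTable(id)
    );
    INSERT INTO UniversityTable (id) VALUES (1);
    INSERT INTO StudentTable (name, university_fk_id) VALUES ('Ann', 1);
    INSERT INTO StudentTable (name, university_fk_id) VALUES ('Bob', 1);
""")

model = [
    map_university(conn, row)
    for row in conn.execute("SELECT id FROM UniversityTable")
]
```

With this data, model holds a single University whose students list contains two Student objects, mirroring the two executions of mapStudent in the example.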
The language was extensively tested by the Onekin research group in several case studies aimed at extracting models from Web 2.0 data stored in relational databases. The development of these case studies led us to devise an extension mechanism for defining new operators in the mappings part. New operators were then defined as language extensions in order to support the extraction of models from Web 2.0 data (e.g., support for annotated data).
If you want to know more, you can read the full paper:
Oscar Díaz, Gorka Puente, Javier Luis Cánovas Izquierdo, Jesús García Molina: Harvesting models from web 2.0 databases. Software and System Modeling 12(1): 15-34 (2013)
(official link, preprint version)