Have you ever developed a model-based tool but lacked the instances to test and benchmark it? Then this post may be for you.
In recent years, the AtlanMod team has focused on developing core technologies to support scalable modeling processes (see for instance Neo4EMF – Big Models made possible with EMF and Neo4j). However, after the initial testing, we sometimes lacked adequate models to test and evaluate our tools. This is especially true when the models need to be of considerable size and therefore cannot be created manually.
To solve this, we were sometimes able to reuse existing models from other case studies, or we obtained them by reverse engineering large codebases. However, this is not always possible, and this is where the EMF random instantiator comes into play.
An EMF (pseudo) random instantiator
The EMF random instantiator is an open source (EPL) utility that produces sets of pseudorandom instances for EMF (Ecore) metamodels. The instantiator has been developed with three main goals:
- No domain knowledge is needed to generate the instances.
- The size of the resulting model can be controlled in terms of number of elements.
- The generation should be deterministic, providing the same set of result instances when a seed is specified.
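The third goal, determinism, can be illustrated with a minimal Java sketch. It is not the instantiator's actual code: the class and method names are hypothetical, and it only shows the general idea that a pseudorandom generator created from a fixed seed always yields the same sequence of choices.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration, not part of the EMF random instantiator.
final class SeededGenerationSketch {

    // Draws `count` pseudorandom picks in [0, bound) from a generator seeded with `seed`.
    static int[] draw(long seed, int count, int bound) {
        Random random = new Random(seed);
        int[] picks = new int[count];
        for (int i = 0; i < count; i++) {
            picks[i] = random.nextInt(bound);
        }
        return picks;
    }

    public static void main(String[] args) {
        int[] first = draw(42L, 5, 10);
        int[] second = draw(42L, 5, 10);
        // Same seed, same sequence of picks.
        System.out.println(Arrays.equals(first, second)); // prints "true"
    }
}
```

If every random decision in a generator (which metaclass to instantiate, which value to assign) is drawn from one such seeded source in a fixed order, rerunning with the same seed reproduces the same set of models.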
A default configuration using uniform probability distributions for each metaclass and structural feature is provided. This configuration is ready to use and can be invoked from the command line. It treats any non-abstract EClass without a required containing EReference as a valid type for a root EObject.
A generation configuration holds information such as (i) the metaclasses that should (not) be involved in the generation; and (ii) the probability distributions to establish how many instances should be generated for each metaclass (and which values should be assigned to structural features). Details of the configuration can be found here, and of course it can be tweaked for your own needs.
The instantiation process is guided by a goal number of EObjects (i.e., expected size of the result instance in terms of number of elements). The generation stops once this number of elements is reached and no multiplicity constraint in the containment references is violated.
The current implementation uses XMI as the persistence format for the generated models. As a result, although models of up to 2 billion elements are theoretically supported, the actual maximum is limited by the hardware configuration. In our experiments (with suitable hardware), we have generated models of up to 30M elements, producing XMI files of up to approximately 20 GB.
Since the default configuration is metamodel-agnostic, generated models are not guaranteed to successfully pass an EMF diagnostic. This could happen for several reasons: the configuration parameters do not allow the creation of instances of a specific class, the creation of a cross reference modifies the opposite side, etc. Nevertheless, the tool can (optionally) run a diagnosis on the set of generated models and provide a detailed report if any error is found.
Running the EMF random instantiator
The EMF random instantiator can be executed directly using the provided fat JAR, which contains all the required dependencies:
$ java -jar dist/instantiator.jar <program arguments>
The only required argument is the file containing the metamodel to instantiate. Nevertheless, several additional parameters can be configured as shown in the usage information. Some of the most interesting are:
- Average model size, i.e., the goal number of elements (defaults to 1000)
- Variation ([0..1]) in the model size (defaults to 0.1)
- Average number of references per EObject (defaults to 8). Actual sizes may vary +/- 10%.
- Average size of variable-length attributes, e.g., Strings and arrays (defaults to 64). Actual sizes may vary +/- 10%.
- Seed number (random by default)
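To get a feeling for how the size and variation parameters interact, here is a small Java sketch. It rests on an assumption that the post does not confirm: that the variation defines a symmetric band around the average size and that each model's goal size is drawn uniformly from that band. The class and method names are hypothetical and do not come from the tool.

```java
import java.util.Random;

// Hypothetical illustration; the instantiator may interpret "variation" differently.
final class ModelSizeSketch {

    // Smallest goal size allowed by the assumed band: average * (1 - variation).
    static long minSize(long averageSize, double variation) {
        checkVariation(variation);
        return Math.round(averageSize * (1.0 - variation));
    }

    // Largest goal size allowed by the assumed band: average * (1 + variation).
    static long maxSize(long averageSize, double variation) {
        checkVariation(variation);
        return Math.round(averageSize * (1.0 + variation));
    }

    // Picks a goal size uniformly from [minSize, maxSize], both ends included.
    static long sampleSize(long averageSize, double variation, Random random) {
        long min = minSize(averageSize, variation);
        long max = maxSize(averageSize, variation);
        return min + (long) (random.nextDouble() * (max - min + 1));
    }

    private static void checkVariation(double variation) {
        if (variation < 0.0 || variation > 1.0) {
            throw new IllegalArgumentException("variation must be in [0..1]");
        }
    }

    public static void main(String[] args) {
        // With the defaults listed above: average size 1000, variation 0.1.
        System.out.println(minSize(1000, 0.1) + ".." + maxSize(1000, 0.1)); // prints "900..1100"
        // A fixed seed makes the sampled size reproducible across runs.
        System.out.println(sampleSize(1000, 0.1, new Random(42L)));
    }
}
```

Under this reading, the default settings would produce models of roughly 900 to 1100 elements each.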
The next figure shows two models generated automatically using the EMF random instantiator.
The command to generate those instances is:
$ java -Djava.util.logging.config.file=logging.default.properties -jar instantiator-fatjar.jar -m Grafcet.ecore -n 2 -s 100 -g
It generates a set of 2 random models with an average size of 100 elements each, using the default configuration and a random seed. The file logging.default.properties controls the verbosity of the log messages. The produced log messages can be seen on our GitHub page.