We report about a new model generator which automatically creates well-formed instance models that satisfy consistency constraints. Our model generator is based on a graph solver that natively operates over EMF-based (graph) models and combines advanced graph algorithms (like partial models, graph shapes, incremental graph queries) with rule-based exploration driven by traditional SAT solving techniques. As a result, the generator can derive consistent models with several thousands of objects. We believe that **tackling consistency is the first key step towards a more general class of model generators**.

## The CORE DISC model generation challenge: consistent, realistic, diverse, scalable

Automatically generated graph models can be key artifacts in various application scenarios such as testing or benchmarking design tools, autonomous smart cyber-physical systems, graph databases or object-oriented programs. However, the actual model generation challenge can be substantially different. We identified **four desirable properties for an ideal model generator**:

*Consistent*model generators need to derive models which satisfy (or intentionally violate) a set of well-formedness constraints or design rules;*Diverse*model generators need to derive models which are substantially different from each other (e.g. each model is from a different equivalence class);*Realistic*model generators need to synthesize models which resemble real model instances created by engineers or taken from the real system;*Scalable*model generators need to create models which are growing exponentially in size. This way, they are appropriate for performance benchmarking of techniques and tools.

Some of these desirable properties of models may sound self-contradictory. For example, a real system model created by an engineer is dominantly consistent as it satisfies most well-formedness constraints. But it is not scalable and not diverse as real models typically use only a small subset of model elements and possible structures. As such, it is not useful for testing or benchmarking purposes.

On the other hand, random generators may be scalable and diverse, but they provide no consistency guarantees at all. In fact, randomly generated models will violate many well-formedness constraints – as such, they are hardly realistic at all. Therefore, we need to tackle the problem of generating consistent models if we ever wish to generate realistic models.

## Model generators for consistent models

Existing generators for consistent models (see [1] for a detailed overview of existing approaches) dominantly follow a similar pattern. First, they map the metamodel, the set of constraints, and optionally, an initial seed model into an underlying logic solver. They use the sophisticated decision procedures of the logic solver to derive models. The range of logic solvers used in model generation includes SAT solvers (e.g. SAT4J, MiniSAT), SMT solvers (e.g. Z3), CSP solvers (e.g. Choco), or model finders (like Alloy).

**While using Alloy and Z3 for over 6 years, we failed to ever generate models for industrial modeling tools no matter how hard we tried to fine tune the mapping**. For example, an auto-generated Yakindu statechart model with synchronization never had more than 50 objects. And Yakindu statecharts have a reasonably simple metamodel and set of constraints compared to any real languages and tools like Capella, AUTOSAR, SysML or UML.

The Alloy Analyzer has been a popular choice as a backend solver in various model generators. But we took a closer look at mappings to Alloy and identified some major weaknesses:

**Too many Boolean variables for graph structure**. Existing approaches represent each potential edge (link) between a pair of nodes (objects) as a separate Boolean variable. As such, it blows up the state space even when the graph is sparse with only linear number of edges.**Checking of well-formedness constraints over graphs**. Existing logic solvers spend much time on repeatedly evaluating well-formedness constraints on instance model candidates.**Isomorphism of graphs as states**. Existing logic solvers frequently have problems in efficiently detecting if two (graph) states are identical or symmetric as it may necessitate complex graph isomorphism checks.**Lack of control over generation process**. Once the mapping is completed, logic solvers are dominantly handled as black-box components. As such, back-end solvers cannot exploit domain-specific information as hints or strategies.

## A graph solver for generating consistent models

In the last 18 months, we have started to take a different path with Oszkár Semeráth by designing a *solver that operates directly over graphs* to derive consistent models, which will be presented at ICSE 2018 [2]. Essentially, our approach exploits four key ideas:

**Refinement over partial models**. Consistent instance models are generated by gradually refining an abstract initial model. For internal model representation, we adapted and generalized uncertain partial models (from M. Chechik, R. Salay and M. Famelis). We proposed a precise refinement calculus in [1] over partial models which can detect unrepairable inconsistencies to remove infeasible partial solutions early.**Checking graph constraints**. To enforce the checking of consistency constraints as early as possible, we carry out an uncertain (3-valued) evaluation, which may reveal a certain or a potential inconsistency in one or all possible refinements of a partial solution (in the style of M. Sagiv, T. Reps and R. Wilhelm). A constraint rewriting technique [4] enables to use the efficient incremental graph query engine of VIATRA for 3-valued constraint evaluation.**Handling of graph states**. Since checking graph isomorphism is computationally complex, logic solvers use different (symmetry breaking) heuristics to identify surely isomorphic states. Instead, we use graph shapes and neighborhood abstraction (from A. Rensink) to identify equivalence classes for graphs [3], and to automatically remove model candidates that fall into the same equivalence class.**Model generation process**. Finally, we combined core SAT solving techniques with rule-based design space exploration over graphs [5] to terminate search along inconsistent paths early. Separating decision and unit propagation rules enables to keep model candidates in a close to consistent state during model generation.

## An open source graph solver tool

Our approach is implemented in an open source graph solver tool (to be called **VIATRA Solver** once it gets approval from Eclipse Foundation) which operates over metamodels captured using the Eclipse Modeling Framework (EMF). Currently, well-formedness constraints can be specified in the VQL graph query language of the VIATRA query and transformation framework, but the solver framework is designed in a way to incorporate OCL constraints in the future.

The solver framework is available as an Eclipse plugin (tightly integrated with Ecore metamodels and the VIATRA query language) or in a stand-alone mode run from the command line. The tool offers a configuration DSL (developed in Xtext) to easily customize the model generation process with content assist. For example, one can select what backend solver to use (currently, Alloy, Z3 and our graph solver) or how many new elements (and of which type) can be used for generating a new model, etc.

Finally, we really hope that some of the readers will successfully use our graph solver for future research. For testing and benchmarking purposes, **our tool provides a set of case studies used in some our papers and projects**. The visualization of graphs (in GML, formatted by Yed or Gephy) and the recording of runtime measurements as CSV files may speed up experimental evaluations in your research.

## Sample auto-generated models

Altogether, our graph solver can generate consistent models with thousands of model elements. Hence we include two examples of auto-generated graph models: a small statechart model for Yakindu, and a large Functional Architecture Model (with 6,250 nodes and 13,722 edges).

## References

- Varró, O. Semeráth, G. Szárnyas, Á. Horváth: Towards the Automated Generation of Consistent, Diverse, Scalable and Realistic Graph Models. Graph Transformation, Specifications, and Nets 2018: 285-312
- Semeráth, A. Nagy, D. Varró: A Graph Solver for the Automated Generation of Consistent Domain-Specific Models, 40th International Conference on Software Engineering (ICSE 2018), Gothenburg, Sweden, ACM, 2018.
- Semeráth, D. Varró: Iterative Generation of Diverse Models for Testing Specifications of DSL Tools, 21st International Conference on Fundamental Approaches to Software Engineering (FASE), Thessaloniki, Greece, Springer, 2018.
- Semeráth, D. Varró: Graph Constraint Evaluation over Partial Models by Constraint Rewriting, Proceedings of the 10th International Conference on Theory and Practice of Model Transformation, Marburg, Germany, Springer, 2017.
- Á. Hegedüs, Á. Horváth, D. Varró: A model-driven framework for guided design space exploration. Autom. Softw. Eng. 22(3): 399-436 (2015)

I am a professor of software engineering and a former research chair of the MTA-BME Lendület Cyber-Physical Systems Research Group. My research focuses on model-based software and systems engineering techniques and tools for smart and safe cyber-physical systems. I am co-founder of IncQuery Labs Ltd. and the VIATRA open source model query and transformation framework.