We are pleased to announce the B-UML Dataset, a large dataset of B-UML models designed for anyone interested in research on modeling languages, model-driven engineering, and AI-assisted modeling. You can access it on this GitHub repository.
This dataset has 5410 models; each is accompanied by an image graphically depicting the model, a generated Python implementation of the model, a simple (deterministic) description of the model in English, the model category, and metadata about the model size. In total, the dataset contains 37870 files. Each B-UML model in the dataset is directly editable in the BESSER online editor.
Why yet another model repository?
We have seen many attempts to collect models. Ours doesn’t aim to reinvent the wheel. In fact, the dataset is systematically created based on the Ecore ModelSet by José Antonio Hernández López, Javier Luis Cánovas Izquierdo, and Jesús Sánchez Cuadrado. We would like to sincerely thank Jesús for guiding us through the original database.
The novel part of our dataset is that it’s linked to our BESSER open source platform. This lets it go beyond a simple model repository: we can easily manipulate each model and generate, for instance, an implementation of it using the variety of code generators in BESSER. As an example, a Python implementation of every model is provided. But you could go further and enrich the dataset, increasing its value for empirical analyses or for some types of AI training.
What Is in the B-UML Dataset?
As of today, the dataset has 5410 models and spans 220 distinct categories, reflecting the diversity of modeling domains. Some of the most common categories include statemachine (388 models), library (230 models), and modelling (209 models). Here are the top 10 categories in the dataset:
| Category | Number of Models |
|---|---|
| dummy | 719 |
| statemachine | 388 |
| petrinet | 234 |
| library | 230 |
| modelling | 209 |
| class-diagram | 180 |
| gpl | 172 |
| metamodelling | 161 |
| unknown | 156 |
| workflow | 109 |
Every B-UML model in the dataset comes with the following artifacts:
🧩 B-UML Model (Editable in BESSER WME)
The B-UML representation, directly usable and modifiable in BESSER WME.
🖼️ Model Image
A rendered visual diagram for quick inspection and documentation purposes.
📊 Structured Metadata
Each model is annotated with precise metadata, including:
- Number of classes
- Number of associations
- Number of attributes
- Number of functions (operations)
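To make these four counts concrete, here is a minimal sketch of how they could be computed. The `ModelClass` and `Model` types and the `size_metadata` function are hypothetical stand-ins written for this example; they are not the BESSER metamodel API nor the script used to build the dataset.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ModelClass:
    """Hypothetical, simplified stand-in for a class in a model."""
    name: str
    attributes: list[str] = field(default_factory=list)
    operations: list[str] = field(default_factory=list)


@dataclass
class Model:
    """Hypothetical container: classes plus (source, target) associations."""
    classes: list[ModelClass]
    associations: list[tuple[str, str]]


def size_metadata(model: Model) -> dict[str, int]:
    """Return the four size metrics listed above for one model."""
    return {
        "classes": len(model.classes),
        "associations": len(model.associations),
        "attributes": sum(len(c.attributes) for c in model.classes),
        "operations": sum(len(c.operations) for c in model.classes),
    }


# Toy example model (invented for illustration, not taken from the dataset).
library = Model(
    classes=[
        ModelClass("Library", attributes=["name"]),
        ModelClass("Book", attributes=["title", "pages"], operations=["borrow"]),
    ],
    associations=[("Library", "Book")],
)
print(size_metadata(library))
```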
📝 Deterministic Textual Description
Each model includes a deterministic natural-language description of its classes, attributes, and associations.
The description is generated using a simple set of rules. A more “requirements-like” description is in the works.
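As an illustration of what "a simple set of rules" can mean, the sketch below turns a toy model into text with fixed sentence templates. The `describe` function, its input shape, and the sentence wording are assumptions made for this example; the actual rules used for the dataset may differ. Sorting the names is what keeps the output deterministic: the same model always yields the same text.

```python
from __future__ import annotations


def describe(classes: dict[str, list[str]],
             associations: list[tuple[str, str]]) -> str:
    """Build a description from fixed templates (hypothetical rules).

    `classes` maps a class name to its attribute names;
    `associations` is a list of (source, target) class-name pairs.
    """
    sentences = []
    # Iterate in sorted order so the result does not depend on input order.
    for name in sorted(classes):
        attrs = sorted(classes[name])
        if attrs:
            sentences.append(f"The class {name} has attribute(s): {', '.join(attrs)}.")
        else:
            sentences.append(f"The class {name} has no attributes.")
    for source, target in sorted(associations):
        sentences.append(f"{source} is associated with {target}.")
    return " ".join(sentences)


# Toy input invented for illustration.
text = describe(
    {"Library": ["name"], "Book": ["title", "pages"]},
    [("Library", "Book")],
)
print(text)
```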
🐍 Python Code
Each B-UML model is also available as Python code, where the model classes are transformed into Python classes mirroring the model structure. As mentioned above, you could easily plug in any of our other generators (for SQL, for Java, for APIs…).
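To give a feel for what "Python classes mirroring the model structure" can look like, here is a hand-written sketch for a hypothetical two-class model (a `Library` that holds `Book` objects). It is not actual output of the BESSER generator, whose code layout may differ; the class names, attributes, and the `add_book` helper are invented for this example.

```python
from __future__ import annotations


class Book:
    """Mirrors a hypothetical model class 'Book' with two attributes."""

    def __init__(self, title: str, pages: int):
        self.title = title
        self.pages = pages


class Library:
    """Mirrors a hypothetical model class 'Library'."""

    def __init__(self, name: str):
        self.name = name
        # One-to-many association end Library -> Book, kept as a list.
        self.books: list[Book] = []

    def add_book(self, book: Book) -> None:
        """Link a Book to this Library through the association."""
        self.books.append(book)


# Usage: build a small object graph that conforms to the toy model.
city_library = Library("City Library")
city_library.add_book(Book("Dune", 412))
print([b.title for b in city_library.books])
```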
Availability
The B-UML Dataset is publicly available on GitHub to support easy access and issue tracking. You are welcome to browse the dataset and submit your own issues and improvement suggestions. Check it out here: https://github.com/BESSER-PEARL/BESSER-Dataset

