We are pleased to announce the B-UML Dataset, a large dataset of B-UML models designed for anyone interested in research on modeling languages, model-driven engineering, and AI-assisted modeling. You can access it on this GitHub repository.

This dataset has 5410 models. Each one is accompanied by an image graphically depicting the model, a generated Python implementation of the model, a simple (deterministic) description of the model in English, the model category, and metadata about the model size. In total, the dataset contains 37870 files. Each B-UML model in the dataset is directly editable in the BESSER online editor.

Why yet another model repository?

We have seen many attempts to collect models. Ours doesn’t aim to reinvent the wheel. In fact, the dataset is systematically created based on the Ecore ModelSet by José Antonio Hernández López, Javier Luis Cánovas Izquierdo, and Jesús Sánchez Cuadrado. We would like to sincerely thank Jesús for guiding us through the original database.

The novel part of our dataset is that it is linked to our BESSER open source platform. This enables going beyond a simple model repository: we can easily manipulate each model and generate, for instance, an implementation of it using the variety of code generators in BESSER. As an example, a Python implementation of every model is provided. But you could go further and enrich the dataset, increasing its value for empirical analyses or for some types of AI training.

What Is in the B-UML Dataset?

As of today, the dataset has 5410 models and spans 220 distinct categories, reflecting the diversity of modeling domains. Some of the most common categories include statemachine (388 models), library (230 models), and modelling (209 models). Here are the top 10 categories in the dataset:

Category | Number of Models
dummy | 719
statemachine | 388
library | 230
modelling | 209
petrinet | 234
class-diagram | 180
gpl | 172
metamodelling | 161
unknown | 156
workflow | 109
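
To put these counts in perspective, here is a minimal sketch that computes each category's share of the 5410 models. The numbers are copied from the table above; the function and variable names are our own choices for illustration and are not part of the dataset or of BESSER.

```python
# Counts copied from the top-10 table above; the total is the dataset size
# stated in the post.
TOTAL_MODELS = 5410
top_categories = {
    "dummy": 719,
    "statemachine": 388,
    "library": 230,
    "modelling": 209,
    "petrinet": 234,
    "class-diagram": 180,
    "gpl": 172,
    "metamodelling": 161,
    "unknown": 156,
    "workflow": 109,
}


def category_shares(counts, total):
    """Return each category's share of the dataset as a percentage."""
    return {name: 100 * n / total for name, n in counts.items()}


shares = category_shares(top_categories, TOTAL_MODELS)
top10_total = sum(top_categories.values())

# Print categories from largest to smallest share.
for name, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:15s} {pct:5.1f}%")
print(f"top 10 combined: {top10_total} models "
      f"({100 * top10_total / TOTAL_MODELS:.1f}% of the dataset)")
```

Together, the ten categories cover 2558 models, a little under half of the dataset, so the remaining 210 categories account for the rest.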

Every B-UML model in the dataset comes with the following artifacts:

🧩 B-UML Model (Editable in BESSER WME)

The B-UML representation, directly usable and modifiable in BESSER WME

An example model in BESSER WME

🖼️ Model Image

A rendered diagram of the model for quick inspection and documentation purposes. For example:

A generated image of an Ecore model

📊 Structured Metadata

Each model is annotated with precise metadata, including:

  • Number of classes

  • Number of associations

  • Number of attributes

  • Number of functions (operations)
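
To make these four counts concrete, the sketch below derives them from a small in-memory class model. The `ModelClass`, `Association`, and `Model` types and the dictionary keys are hypothetical stand-ins written for this example. They are not the BESSER API and do not reflect the dataset's actual metadata file format.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified stand-ins for a class model (NOT the BESSER API).
@dataclass
class ModelClass:
    name: str
    attributes: list[str] = field(default_factory=list)
    operations: list[str] = field(default_factory=list)


@dataclass
class Association:
    source: str
    target: str


@dataclass
class Model:
    name: str
    classes: list[ModelClass] = field(default_factory=list)
    associations: list[Association] = field(default_factory=list)


def size_metadata(model: Model) -> dict[str, int]:
    """Count classes, associations, attributes, and operations in a model."""
    return {
        "classes": len(model.classes),
        "associations": len(model.associations),
        "attributes": sum(len(c.attributes) for c in model.classes),
        "operations": sum(len(c.operations) for c in model.classes),
    }


# A made-up example model, used only to exercise the function.
library = Model(
    name="Library",
    classes=[
        ModelClass("Library", attributes=["name", "address"]),
        ModelClass("Book", attributes=["title", "pages"], operations=["borrow"]),
        ModelClass("Author", attributes=["name"]),
    ],
    associations=[Association("Library", "Book"), Association("Book", "Author")],
)

metadata = size_metadata(library)
print(metadata)
```

Counts like these make it easy to filter the dataset by size, for instance to keep only models with at least a handful of classes.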

📝 Deterministic Textual Description

Each model includes a deterministic natural-language description of its classes, attributes, and associations.

The description is generated using a simple set of rules. A more “requirements-like” description is in the works. 
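
As an illustration of what a rule-based, deterministic description can look like, here is a minimal sketch. The sentence templates, the dictionary layout of the input, and the `describe_model` function are assumptions made for this example. The actual rules used to build the dataset may differ.

```python
def describe_model(model: dict) -> str:
    """Render a model as English text using fixed sentence templates.

    The same model always yields the same text: there is no randomness,
    and elements are sorted so input ordering does not matter.
    """
    sentences = []
    for cls in sorted(model["classes"], key=lambda c: c["name"]):
        attrs = sorted(cls.get("attributes", []))
        if attrs:
            sentences.append(
                f"The class {cls['name']} has the attributes: {', '.join(attrs)}."
            )
        else:
            sentences.append(f"The class {cls['name']} has no attributes.")
    associations = sorted(
        model.get("associations", []),
        key=lambda a: (a["source"], a["target"]),
    )
    for assoc in associations:
        sentences.append(f"{assoc['source']} is associated with {assoc['target']}.")
    return " ".join(sentences)


# A made-up example model, used only to exercise the function.
example = {
    "classes": [
        {"name": "Library", "attributes": ["name"]},
        {"name": "Book", "attributes": ["title", "pages"]},
    ],
    "associations": [{"source": "Library", "target": "Book"}],
}

description = describe_model(example)
print(description)
```

Because the text comes from templates rather than from a language model, it can serve as a reproducible reference when comparing against AI-generated descriptions.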

🐍 Python Code

Each B-UML model is also available as Python code, where the model classes are transformed into Python classes mirroring the model structure. As mentioned above, you could easily plug in any of our other generators (for SQL, for Java, for APIs…).
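
To give a feel for what "Python classes mirroring the model structure" can mean, here is a hand-written sketch for a made-up two-class model with one association. It is not output from the BESSER generator, whose generated code may be organized differently. The class names, attributes, and the `add_author` helper are all invented for illustration.

```python
class Author:
    """Mirrors a model class 'Author' with a single attribute."""

    def __init__(self, name: str):
        self.name = name
        # Association end: an author can be linked to many books.
        self.books: list["Book"] = []


class Book:
    """Mirrors a model class 'Book' with two attributes."""

    def __init__(self, title: str, pages: int):
        self.title = title
        self.pages = pages
        # Association end: a book can be linked to many authors.
        self.authors: list[Author] = []

    def add_author(self, author: Author) -> None:
        """Link an author to this book, keeping both association ends in sync."""
        if author not in self.authors:
            self.authors.append(author)
            author.books.append(self)


# Usage: instantiate the classes and connect them through the association.
book = Book("Model-Driven Engineering Basics", 250)
writer = Author("Ada")
book.add_author(writer)
book.add_author(writer)  # a repeated link is ignored

print([a.name for a in book.authors])
print([b.title for b in writer.books])
```

Model classes become Python classes, attributes become constructor parameters, and associations become references held on both ends.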

Availability

The B-UML Dataset is publicly available on GitHub to support easy access and issue tracking. You are welcome to browse the dataset and submit your own issues and improvement suggestions. Check it out here: https://github.com/BESSER-PEARL/BESSER-Dataset
