Let me tell you a story:
In the beginning there were humans.
Some of these humans were engineers. Logic dictates that they had to create models and share those models with other engineers… and eventually with the world!
So a bunch of engineers got together and thought:
> What if we persist the models in this cool XML-based format, which we can then send around, so that the models can be recreated at the other end?

It actually worked really well.
Shortly after, another group of engineers started to validate and transform models, or use them to generate code and whatnot: activities known as model management. This group of engineers usually grouped elements by type in order to analyse or transform the models, or were only interested in the model demographics (total number of elements, number of elements per type, etc.). As such, these engineers were more interested in the internal characteristics of the models than in the format used to share them.
Unfortunately, the needs of the two groups did not align. The XML-based format forces models to be loaded as a whole prior to processing, whereas the model management engineers were only interested in parts of the model.
So that is how JSOI came to be: to provide a format that is model-management friendly!
Why JSOI
I’ve been working in Model-Driven Engineering for over 10 years. But it was not until I started working on incremental language execution that I personally felt the pain of working with large models stored in XMI format (probably the most common format for storing EMF models) – but that is a story for another time. Suffice to say that it got me thinking about how we persist models.
Some history
The Object Management Group (OMG) was looking for a model exchange format for the Unified Modeling Language back in 1997. At that time, XML was well established and its tree-based serialisation format was found to be a good fit for the model exchange format. In 1998 the XML Metadata Interchange (XMI) Specification (version 2.5.1 as of writing this post) was born. The XMI specification defines how models (graphs) can be represented as trees, and then how these trees can be persisted in text files using XML.
The Eclipse Modeling Framework (EMF) was released in 2007; the Atlas Transformation Language dates back to 2006; the languages from the Epsilon Modelling Framework appeared between 2006 and 2010; others have come after. [1]
Model management languages came late to the party; otherwise, in my opinion, the modelling language community would have sent a proposal to the OMG back in 1997, one that was designed with model management tasks in mind in addition to the exchange requirements.
JSOI in a nutshell
JSOI is an Interchange Format for Efficient Model Management. Before getting into the technical details behind the language, let’s get hands-on. Using the Railway Benchmark metamodel in the figure above, a conformant JSOI model will look like this:

    {
      ...
      "railway:RailwayContainer": {
        "size": 1,
        "elements": [
          {
            "routes": [ "jsoi:$.railway:Route.elements[0]" ],
            "regions": [ "jsoi:$.railway:Region.elements[0]" ]
          }
        ]
      },
      "railway:TrackElement": {
        "subtypes": [ "railway:Segment", "railway:Switch" ]
      },
      "railway:Route": {
        "size": 2,
        "elements": [
          {
            "id": "R.7A-T.7A",
            "active": true,
            "follows": [ "jsoi:$.railway:SwitchPosition.elements[0]" ],
            "requires": [
              "jsoi:$.railway:Sensor.elements[0]",
              "jsoi:$.railway:Sensor.elements[1]"
            ],
            "entry": "$.railway:Semaphore.elements[0]",
            "exit": "$.railway:Semaphore.elements[1]"
          }
          ...
        ]
      },
      "railway:Segment": {
        "size": 1986,
        "elements": [ { ... }, ... ]
      },
    }

The two main differences between JSOI and XMI are that JSOI does not use nesting to represent containment references (graph vs tree format) and that elements are stored in arrays grouped by type (model management vs exchange). The purpose of the elements-by-type structure is to facilitate retrieval of elements of a given type and the gathering of model statistics. For this, the elements-by-type structure not only holds all elements of the type, but also stores information about the type’s subtypes and the number of elements. The number of elements is stored so that model statistics can be retrieved directly from the model without having to load any elements.[1]
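To make the tree vs graph difference concrete, here is a minimal Python sketch that flattens a nested, tree-shaped model into an elements-by-type layout. It is only an illustration, not the actual JSOI implementation: the `flatten` function and the input shape (dicts with a `"type"` key and lists of child dicts for containment) are my own assumptions, and the reference strings simply mimic the ones in the sample above.

```python
from collections import defaultdict


def flatten(root):
    """Turn a nested (tree-shaped) model into an elements-by-type layout.

    Assumption: every element is a dict with a "type" key, plain attribute
    values, and lists of child dicts for containment references.
    """
    by_type = defaultdict(list)

    def visit(node):
        flat = {}
        bucket = by_type[node["type"]]
        index = len(bucket)
        bucket.append(flat)  # reserve the slot so the index stays stable
        for key, value in node.items():
            if key == "type":
                continue
            is_containment = (
                isinstance(value, list)
                and value
                and all(isinstance(v, dict) for v in value)
            )
            if is_containment:
                # Store reference strings instead of nesting the children.
                flat[key] = [visit(child) for child in value]
            else:
                flat[key] = value
        return f"jsoi:$.{node['type']}.elements[{index}]"

    visit(root)
    return {t: {"size": len(es), "elements": es} for t, es in by_type.items()}
```

Each contained element ends up in the array of its own type, and the container only keeps a path to it, which is what allows a reader to jump straight to one type without walking the tree.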
JSOI was designed with three goals in mind:
- Ability to retrieve sub-sets of elements by type.
- Lazy evaluation of references.
- Element demographics without element loading.
Elements by type
As my introductory tale told, model management tasks often target only a subset of the elements in a model, and in particular elements of a given type (or types). For example, a validation script might only be interested in validating Semaphores, a transformation script might only be interested in transforming Segments, and so on. With the XMI format, the complete model has to be loaded and then filtered to find elements of the desired type (usually done by the famous `<type>.all()`). With JSOI, retrieving elements of a specific type can be delegated to the model instead, which can rapidly find and load them. Even better, the elements are not loaded until they are requested. This saves the loading time of, for example, a second rule (with another type) that is invoked conditionally.
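As a rough sketch of what retrieval by type could look like, the Python generator below yields the elements of a type and of its declared subtypes, one at a time. The `all_of_type` name is hypothetical, and the document is assumed to be already parsed into a dict; a real implementation would read from the file on demand instead.

```python
def all_of_type(doc, type_name):
    """Lazily yield the elements of type_name and of its declared subtypes.

    Assumption: doc is a parsed JSOI-like dict keyed by type name, where each
    entry may have "elements" and/or "subtypes", as in the sample above.
    """
    entry = doc.get(type_name, {})
    # Elements stored directly under this type.
    yield from entry.get("elements", [])
    # Then descend into the subtypes listed for this type.
    for subtype in entry.get("subtypes", []):
        yield from all_of_type(doc, subtype)
```

Because it is a generator, nothing is visited until the caller asks for the next element, so a rule that is never invoked never touches its type's array.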
Lazy evaluation of references
As well as not being interested in all the types, model management scripts might not be interested in all the properties of the elements of a given type. This becomes important when the uninteresting properties are references. For example, I might be interested in the *length* of **Segments** but not in their *semaphores*, *monitoredBy* and *connectsTo* values… so why load all those referenced elements into memory if we are not going to use them?
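One way to picture lazy references is a small proxy that keeps the reference string and only resolves it the first time someone asks for the target. This is a sketch under my own assumptions: `LazyRef` and the `resolve` callable are hypothetical names, not part of any JSOI or EMF API.

```python
class LazyRef:
    """Keeps a reference path and resolves it only on first use."""

    def __init__(self, path, resolve):
        self.path = path
        self._resolve = resolve  # hypothetical callable: path -> element
        self._loaded = False
        self._target = None

    def get(self):
        # Resolve once, then reuse the cached target.
        if not self._loaded:
            self._target = self._resolve(self.path)
            self._loaded = True
        return self._target
```

A script that only reads `length` never calls `get()` on the *semaphores* references, so the referenced elements are never brought into memory.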
Element statistics without element loading
Face it, sometimes we only want to know the model demographics: how many of each type. The demographics can result in the model being skipped (i.e. not interesting enough) or trigger different scripts depending on the distribution/presence of elements. Please refer to the paper[1] that introduced JSOI if you want further details.
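For illustration, this is roughly how demographics could be read off the `size` fields alone. The `demographics` helper is hypothetical and, for simplicity, works on an already parsed dict; to really avoid loading elements, a reader would need to skip the `elements` arrays while parsing.

```python
def demographics(doc):
    """Return {type name: element count} using only the stored "size" fields.

    Assumption: doc is a JSOI-like dict; entries without a "size" (for
    example, types that only list subtypes) are left out.
    """
    return {
        name: entry["size"]
        for name, entry in doc.items()
        if isinstance(entry, dict) and "size" in entry
    }
```

With the sizes from the sample model (1, 2 and 1986), the counts sum to 1989 without ever looking inside an `elements` array.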
Other approaches
I know, I know, I am not the first one to propose alternatives to XMI. NeoEMF persists EMF models in a DB, and so do Teneo-Hibernate, CDO, EMFStore, and others. However, JSOI is different in that it uses a text-based format. And what the heck, new programming languages are created frequently, why can’t we create new persistence/format languages too?
Development
After an initial proof-of-concept prototype that stored EMF models in JSOI format, development has continued to also support loading and all the features of EMF’s Resource, so that existing tools can use models persisted in JSOI format. You can follow the development here.
Note: Implementing a full-fledged EMF Resource turned out to be a bit harder than anticipated and life also happened, so development has been slow :).
Let me know in the comments below if you have any questions or comments about JSOI, or drop by Kinori Tech to discuss your modelling projects.
References
JSOI was introduced in the 2019 ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C) workshop.
[1] https://ieeexplore.ieee.org/document/8904481
This is a great idea! I think one way to evaluate the efficiency of a model persistence format is to look at its compression. For example, I’ve seen a 1.2 GB XMI file being only 41 MB when zipped. That means there’s A LOT of waste! Scalability is one of the main downsides to XML-based formats. Would also be interesting to see which one fares better: JSON or YAML (in terms of file size for a given model). I’m guessing performance experiments (both in terms of time and memory) are future work?
Definitely, pure text formats are not ideal and one would probably want to go to a binary format after a certain model size (number of elements). However, JSOI is more about data access than data size. Performance experiments are indeed future work, focusing on loading/persistence, element access and demographics. A comparison against YAML is not planned for the moment as, IMHO, for the JSOI objectives there would be little gain in using YAML.
Wouldn’t it be possible to use JSON Pointers instead of defining a string pattern? I guess that JSON Pointers could be problematic across files, but I’m not sure.
Also, is there a performance benefit in encoding the object’s type and reference within a string instead of using an object with type and reference fields? My first approach would be to do the former and, maybe, include IDs to have a “JSON:API-like” approach, but you know what people say about the first ideas that come to our minds: usually, they are not the best ones.
The references? Those are actually JSON Path expressions. Given that the JSON path points to another JSOI document, the path will always start with the referred object’s type, since JSOI is organised by types. As in XMI, index-based paths can be replaced by ID-based paths, but not all EMF metamodels define IDs and not all EMF Resources use generated IDs.
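To illustrate the index-based form, here is a small Python sketch that splits a reference into its type and index and looks the element up. It is not a full JSON Path evaluator: the pattern only covers the shape seen in the model sample earlier in the post (an optional `jsoi:` prefix followed by `$.<type>.elements[<index>]`), and `parse_ref` and `resolve` are hypothetical helper names.

```python
import re

# Assumed reference shape: optional "jsoi:" prefix, then
# "$.<type>.elements[<index>]". ID-based paths are not handled here.
_REF = re.compile(r"^(?:jsoi:)?\$\.(?P<type>.+)\.elements\[(?P<index>\d+)\]$")


def parse_ref(ref):
    """Split an index-based reference into (type name, element index)."""
    match = _REF.match(ref)
    if match is None:
        raise ValueError(f"not an index-based JSOI reference: {ref!r}")
    return match["type"], int(match["index"])


def resolve(doc, ref):
    """Look up the referenced element in a parsed JSOI-like dict."""
    type_name, index = parse_ref(ref)
    return doc[type_name]["elements"][index]
```

Since the path starts with the type, the lookup goes directly to that type's array instead of searching the whole document.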