There was a time when I thought the Entity Relationship (ER) language was past its prime. But I now realize I was wrong, really wrong. The ER language feels stronger than it has been for a long time.
In this post I try to write down why I have changed my mind. Let’s see if you agree!
Brief introduction to the ER language
Originally proposed by Peter Chen in 1976 in the paper The entity-relationship model—toward a unified view of data, the ER language quickly gained massive adoption in the emerging (at that time) database community. Indeed, the ER model became the de facto standard for designing databases with many vendors (beginning with most of the companies selling relational database management systems) offering modeling environments for ER diagrams with the possibility to automatically translate such models to SQL DDL scripts.
As the name suggests, the key elements of the ER language are entities and relationships. Entities are the equivalent of objects in the Object-Oriented world (i.e. a specific person, product or event) and relationships are the links between these entities. Similar entities are grouped into entity sets, also known as entity types. Similarly, relationships are grouped into relationship types. Clearly, entity types resemble the concept of classes in OO terminology, while relationship types would be the equivalent of associations. Entity types can also have attributes. And both attributes and relationship types may be constrained by multiplicity constraints.
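To make these concepts concrete, here is a minimal sketch of how they could be written down as plain data structures. All the class and field names (`Multiplicity`, `Attribute`, `EntityType`, `RelationshipType`, `ends`) are my own illustrative choices, not part of any ER standard or tool, and the Person/Company example is made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Multiplicity:
    """A lower..upper bound; upper=None means unbounded ("*")."""
    lower: int
    upper: Optional[int]

    def allows(self, count: int) -> bool:
        # True when `count` falls inside the lower..upper range.
        return count >= self.lower and (self.upper is None or count <= self.upper)


@dataclass
class Attribute:
    name: str
    datatype: str
    # Attributes are single-valued and mandatory (1..1) unless stated otherwise.
    multiplicity: Multiplicity = field(default_factory=lambda: Multiplicity(1, 1))


@dataclass
class EntityType:
    name: str
    attributes: list[Attribute] = field(default_factory=list)


@dataclass
class RelationshipType:
    name: str
    # One (entity type, multiplicity) pair per end of the relationship type.
    ends: list[tuple[EntityType, Multiplicity]]


# Hypothetical example: a person works for at most one company,
# while a company may employ any number of persons.
person = EntityType("Person", [Attribute("name", "string")])
company = EntityType("Company", [Attribute("name", "string")])
works_for = RelationshipType(
    "WorksFor",
    ends=[(person, Multiplicity(0, None)), (company, Multiplicity(0, 1))],
)
```

Individual entities (a specific person) would then be instances conforming to an `EntityType`, in the same way objects conform to their class.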
Several extensions to the original ER language were proposed, commonly known as the EER family of languages (“Extended Entity Relationship”). One such key extension added inheritance relationships between entity types.
Given this limited set of modeling concepts, it is easy to see that the mapping from such models to relational database schemas was mostly a straightforward generation. At a time when many vendors were not yet strictly following the SQL standard, the main challenge for ER editors was being able to generate SQL scripts for all the different flavors of SQL that existed.
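As a rough illustration of why this mapping is mostly mechanical, the sketch below turns a tiny ER model into SQL DDL. It is a simplification under my own assumptions, not how any particular tool works: every entity type has a single-attribute key, a "many-to-one" relationship becomes a foreign key column, a "many-to-many" relationship becomes a join table, and entities are listed so that referenced tables come first. The names `Entity`, `Relationship`, `entity_ddl`, `join_table_ddl` and `to_ddl` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    key: str                     # name of the identifying attribute
    attributes: dict[str, str]   # attribute name -> SQL type (key included)


@dataclass
class Relationship:
    name: str
    source: Entity
    target: Entity
    many_to_many: bool = False   # False: many source rows point to one target row


def entity_ddl(entity: Entity, relationships: list[Relationship]) -> str:
    cols = []
    for attr, sql_type in entity.attributes.items():
        suffix = " PRIMARY KEY" if attr == entity.key else ""
        cols.append(f"  {attr} {sql_type}{suffix}")
    # Many-to-one relationships become a foreign key on the "many" side.
    for rel in relationships:
        if not rel.many_to_many and rel.source is entity:
            t = rel.target
            cols.append(
                f"  {rel.name}_{t.key} {t.attributes[t.key]} "
                f"REFERENCES {t.name}({t.key})"
            )
    return f"CREATE TABLE {entity.name} (\n" + ",\n".join(cols) + "\n);"


def join_table_ddl(rel: Relationship) -> str:
    # Many-to-many relationships get their own table with a composite key.
    s, t = rel.source, rel.target
    s_col, t_col = f"{s.name}_{s.key}", f"{t.name}_{t.key}"
    return (
        f"CREATE TABLE {rel.name} (\n"
        f"  {s_col} {s.attributes[s.key]} REFERENCES {s.name}({s.key}),\n"
        f"  {t_col} {t.attributes[t.key]} REFERENCES {t.name}({t.key}),\n"
        f"  PRIMARY KEY ({s_col}, {t_col})\n);"
    )


def to_ddl(entities: list[Entity], relationships: list[Relationship]) -> str:
    stmts = [entity_ddl(e, relationships) for e in entities]
    stmts += [join_table_ddl(r) for r in relationships if r.many_to_many]
    return "\n\n".join(stmts)


# Hypothetical model: persons work for one company and are assigned to many projects.
company = Entity("company", "id", {"id": "INTEGER", "name": "TEXT"})
person = Entity("person", "id", {"id": "INTEGER", "name": "TEXT"})
project = Entity("project", "id", {"id": "INTEGER", "title": "TEXT"})
works_for = Relationship("works_for", person, company)
assigned_to = Relationship("assigned_to", person, project, many_to_many=True)

ddl = to_ddl([company, person, project], [works_for, assigned_to])
print(ddl)
```

The hard part for real tools was not this translation but the details the sketch ignores: composite keys, weak entities, inheritance, and the type names and constraint syntax of each SQL dialect.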
The dark years of ER
ER always had its community, especially in the database world, but it lost its appeal in the broader software development community.
IMHO for two reasons:
- The arrival of UML. With UML you can model (almost) whatever you can model with the ER language, and much more, since UML lets you define all the dimensions of your software project and not “just” your domain data (even if, to me, this is still the core element).
- The NoSQL trend. With NoSQL databases, there is no fixed schema, at least in theory. I like to say that NoSQL is not schemaless; at most, it has “less schema” than other data stores. But still, creating database schemas went out of fashion.
The resurrection of the Entity Relationship language
After a few years, people understood that UML had many qualities but came at a price: as a language, it was far more complex than ER. So if you just wanted to create a database, or build an application that was basically a web wrapper on top of a database (which could be easily generated from the database itself), maybe it was a better idea to drop the complexity of UML and stick to the simpler ER language.
And it was also soon obvious that all NoSQL vendors were adding SQL support as SQL was the only language that everybody knew. So even if internally the data was stored using a variety of NoSQL strategies, they brought back the “illusion” of a partial schema that could, again, be modeled.
Moreover, a clear sign that a language is getting more and more interest is the creation of new modeling tools for it.
And I have witnessed this around the ER language. After a period in which we had only some of the classical database design tools (e.g. PowerDesigner, which I already used over 20 years ago), we are now seeing plenty of new tooling initiatives, including online ER tools, textual modeling ER tools and even an ER plugin for VS Code.
I would even dare to add that the explosion of Machine Learning (ML) has also helped, as it has given a lot of importance to the datasets required to train the ML models. And where you have datasets you have the need to describe such datasets (and nothing more, so, again, no need to use a full-fledged modeling language like UML for that).
All in all, given the clear role of data and domain models in all types of initiatives and the better understanding we all have about the strengths and drawbacks of ER compared to other languages, I think ER will remain popular for a long time. True, the hardcore software engineering community will probably never adopt it but we sometimes tend to forget there are many other communities around us that have different needs and for which the ER language could be a perfect fit.
FNR Pearl Chair. Head of the Software Engineering RDI Unit at LIST. Affiliate Professor at University of Luxembourg.
Hello Jordi,
Nice article, and I totally agree with you that data modeling is more necessary than ever. NoSQL, Data Mining, Machine Learning and data-driven organisations will need data models, especially for applying data in data-driven organisations and getting the semantics and syntax right in your organisation.
But the ER language, I have my questions on that. I am sixty years old and learned this language at university a long time ago. I considered it lucky that new data modeling languages emerged and had better implementations than ER. I think of UML class diagrams, OWL and even the passive structure of ArchiMate.
Hello Bert,
Could you explain which parts of UML class diagrams you prefer over ER?
I think this could also be linked to what flavour of ER you used. In my case, my EER was very very similar to what later I found in UML. Even graphically, there were only a few variations on how to express the cardinalities but not much more than that.
Hello Jordi,
I mean the advantage of a compact notation with the attributes within the element rectangle instead of the attributes in ellipses modeled separately. My diagrams become cluttered really easily with the ellipse notation. Furthermore, I often have to create different models with ArchiMate, UML Class and XSD, and then it is nice that the notation is in all cases based on the same concepts and setup.
As already mentioned I grew up with OMT and loved the introduction of the notation and the concepts inheritance, aggregation and composition;-).
Furthermore, the usage of specialisations, aggregations and compositions, and eventually the usage of operations to add some behavioural description within the model.
Regards
Bert
Ah yes, my EER versions were all already with the attributes inside the class shape. I agree it’s awful otherwise.
And of course, I also agree that if we want to model a complete system it makes sense to use a notation that covers all the dimensions you need to model. But I guess there are still quite a few scenarios where people just want to model the data.
As you say, with the world realising that many noSQL options are just “less SQL”, I think it’s time that designers, architects and developers recognise the importance and often the simplicity of ER diagrams. Data models are the heart of all applications. With the current trend towards reevaluating and reestablishing good coding practices and ideas like domain-driven design, keeping things simple (cf. the C4 framework) by using simple techniques is now more valuable than ever. ERDs play a crucial role here.
Congratulations on posting a provocative topic. As you said, there are several communities in the modeling world. In this world, the database community prefers to use EER rather than the UML class diagram (I also have restrictions on using UML – this topic has potential for an interesting article/post :D). I’m from the Database community and an MDD enthusiast (I take this opportunity to thank you for modeling-languages.com and your book, Jordi Cabot). In this context, our team has been using the Epsilon framework to develop the EERCASE tool (https://cin.ufpe.br/~eercase), which follows the notation proposed by Elmasri & Navathe (Fundamentals of Database Systems) and supports the structural validations by DULLEA, SONG and LAMPROU [1], as well as those by Calvanese and Lenzerini [2]. EERCASE has been used in DB conceptual design classes and the validations reported by the tool have helped students design better EER diagrams. For more information on EERCASE, see [3, 4, 5] – Thanks again for your time, Jordi Cabot.
[1] https://www.sciencedirect.com/science/article/pii/S0169023X03000491
[2] https://ieeexplore.ieee.org/abstract/document/283032/
[3] https://link.springer.com/chapter/10.1007/978-3-642-34002-4_40
[4] https://periodicos.ufmg.br/index.php/jidm/article/view/224
[5] https://sol.sbc.org.br/journals/index.php/jidm/article/view/2537
I think the persistence of (E)ER modeling is mostly the result of the inertia of change (and identity politics) in siloed scientific/professional communities. The DB community seeks to defend its own traditions (and identity) against the progress made in the SE community.
For university teachers, I think it’s irresponsible to create the unnecessary cognitive burden for students of requiring them to learn two notations, EER and UML Class Diagrams, for the same task. In addition, it’s confusing for students that there is no standard for EER.
I agree that asking students to learn the two notations, at least without an introductory explanation of the relationship between the two, is really, really bad.
It happened to me when I was doing my CS degree! And it took me a while to understand the overlap between the two.
I kind of agree with Gerd. A subset of UML class diagrams can serve the same purpose as (E)ER diagrams. UML class diagrams resolve the problem of diverging variants of ER diagrams, e.g. where to place the cardinalities.
ER diagrams were the first conceptual models. UML class diagrams can cover it but can cover more than just the conceptual model view. This is a potential weakness of UML class diagrams.
Diagrams are visualizations of models. So, whether to visualize as ER or UML is a secondary consideration. We should perhaps teach how to model, e.g. with triplet statements.