A few days ago Jonas Elfström tweeted the page counts of several language specifications (based on a quick check of the specification documents, so we could argue about exactly what he is counting, but let’s assume these numbers are more or less correct):
- C++ 865 (300ish without the standard libs)
- Java 644
- C 540
- C# 511
- Ruby 311
- Smalltalk 303 (only 35 pages without stdlibs)
- SystemVerilog 1315
- SQL 762
- Haskell 329
- Forth 309
- Erlang 31
- F# 300
- JavaScript 258
- Dart 118
- Go 109(46)
- Scheme 90
So, how do you think UML relates to these other specifications? UML is a modeling language, more abstract than those programming languages, so its specification should also be smaller, right?
Well, the page count for the UML specification is 748 + 230 = 978. In other words, going by the list above, UML beats every language except SystemVerilog (1315), at least regarding specification length.
This number is the sum of the two parts of the UML specification (the infrastructure and the superstructure). From the OMG page: “Beginning with UML 2.0, the UML Specification was split into two complementary specifications: Infrastructure and Superstructure. The UML infrastructure specification defines the foundational language constructs required for UML 2.4.1. It is complemented by UML Superstructure, which defines the user level constructs required for UML 2.4.1. The two complementary specifications constitute a complete specification for the UML 2 modeling language”
The good news is that one of the major goals of the upcoming 2.5 version is a spec simplification.
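For anyone who wants to check the comparison, here is a minimal Python sketch. It only uses the page counts quoted in the list above (taking the full-document figure where two are given) plus the two UML 2.4.1 parts; the names `PAGE_COUNTS`, `UML_PARTS` and `ranked` are made up for this illustration.

```python
# Page counts as quoted in the list above (full-document figures).
PAGE_COUNTS = {
    "C++": 865, "Java": 644, "C": 540, "C#": 511,
    "Ruby": 311, "Smalltalk": 303, "SystemVerilog": 1315, "SQL": 762,
    "Haskell": 329, "Forth": 309, "Erlang": 31, "F#": 300,
    "JavaScript": 258, "Dart": 118, "Go": 109, "Scheme": 90,
}

# The two parts of the UML 2.4.1 specification, as quoted in the post.
UML_PARTS = (748, 230)
uml_total = sum(UML_PARTS)

# Add UML to the table and sort from longest to shortest.
all_counts = dict(PAGE_COUNTS, UML=uml_total)
ranked = sorted(all_counts.items(), key=lambda item: item[1], reverse=True)

for position, (name, pages) in enumerate(ranked, start=1):
    print(f"{position:2d}. {name:<14} {pages:>5}")
```

With these figures, UML comes out at 978 pages, behind only SystemVerilog.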
FNR Pearl Chair. Head of the Software Engineering RDI Unit at LIST. Affiliate Professor at University of Luxembourg.
> UML is a modeling language, more abstract
> than those programming languages,
> so should be also smaller, right?
There are many things that play a role in the size of a specification, but all other things being equal, I’d expect the opposite: since higher level languages provide more concepts out of the box (as language constructs) than lower level languages, they tend to have lengthier specifications. In lower level languages, the same concepts are recreated outside of the language by means of implementation techniques, design patterns, idioms, libraries, frameworks, etc., leading to smaller languages. For instance, an assembly language does not need to explain how to do for loops, the Java spec does not need to explain state machines, etc.
The latest draft of the UML 2.5 specification document (under finalization) is 831 pages (one document). As Rafael notes, UML includes a lot of constructs not found in a typical programming language (interaction models, state machines, use cases, deployment models, etc., etc.). In addition, the UML specification has diagrams, examples and descriptions of every abstract syntax metamodel class.
Note also that the Alf UML action language specification (which is much closer to a programming language) is just 449 pages, and only 224 if you include only the clauses that correspond to a typical language reference manual, not the abstract syntax clauses, detailed mapping to fUML and annexes.
I have a preliminary question on this.
Besides questioning how sensible it is to compare such different specs (on this I’m with Ed), I also wonder whether you (or rather the author of the study) ever asked yourselves why we should bother.
I mean, who is the intended reader of such specs?
For sure, neither the end user, nor the software developer, and possibly not even the MDD or soft.eng. expert. How many of you have ever read the C++ specs?
The specs are intended for the implementors of the specs, i.e., the builders of the compilers, IDEs, interpreters, and so on. And for them, the more precise and thorough the specs are, the better.
For the rest of the people: there are plenty of manuals, online materials, tutorials and books for any of the mentioned technologies.
Bottom line: studying and comparing the size of specs doesn’t make any sense and is completely useless.
Marco
Yes and no. I use the UML specification (and the OCL one even more) every time I have a doubt regarding the specific semantics of a UML element. Not that the spec always gives the answer but, in contrast with programming languages, I don’t think any UML tutorial or book does a good job on this. They all seem to stay at a very superficial level (i.e. the concrete syntax part).
Granted. But:
1) you are not the average UML guy
2) still, your usage of the specs is more like a lookup than a reading (you search for the item of interest or jump there through the TOC). So, size is not really relevant. Actually, what you look for is completeness and precision (and spec writers need space for that).
I have been doing the same with BPMN, but I can’t think of a sensible scenario where you would read the spec document from cover to cover. However, I agree it’s sad that books and materials are often at the level of “modeling for dummies”.
Anyway, IMHO: size is not really the issue. Instead, we should be grateful that most of the specs in our field are available online and free to read (which is not the case in other fields).