Greetings, everyone! I’m Antonio, one of the developers of the Eclipse Epsilon project. A few weeks ago, Jordi tweeted about an article that used Roslyn to generate UML diagrams from C# code, and when I replied that we had something similar for Java, he invited me to write about it on his blog – and here I am.
To provide more context, Epsilon provides a family of languages for working with models. They all share a base language (the Epsilon Object Language, or EOL) and a common API for reading and writing models in various formats (the Epsilon Model Connectivity layer, or EMC). The nice bit is that you can write EOL code and reuse it from your EGL code generator, your EVL validator or your ETL transformation, and you don’t need to learn four completely different languages. In the Epsilon project, we work from the idea that everything is a model: EMF/UML models, spreadsheets, plain XML files, and many more.
More recently, we developed a module that allows Epsilon to query Java code as if it were a model, using the internal representations of the Eclipse Java Development Tools. It is the EMC JDT driver, which you can grab from GitHub.
But, wait a second, why should I want to do this?
There are many situations in which you want to check things about your code. Often, a simple text-based search or the code navigation facilities in your IDE are enough. For instance, finding a class by name or listing its inherited methods is something we do all the time.
However, what if you want to check something very particular which is not supported by your IDE, and which involves “understanding” Java? As an example, suppose that you want to find all the places in your code where a new programmer may have used == to compare floating-point numbers. This is a very common mistake that novices make.
Text search won’t help you: floating-point expressions and variables can be arbitrarily complex. You will need to parse the Java code, find these == comparisons and reimplement the bits of the Java Language Specification needed to find if one of the two sides is a floating-point expression. Too much work for a quick check you wanted to run on your code!
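To make the mistake concrete, here is a small Java sketch of the kind of comparison meant here. The class and method names are made up for illustration. It shows two == comparisons whose float operand is not a plain variable (a method invocation and an array access), plus the usual tolerance-based alternative.

```java
// Illustrative only: the class and method names are invented for this sketch.
class FloatEqualityDemo {

    // A method invocation whose result is a float.
    static float tenth() {
        return 0.1f;
    }

    // Tolerance-based comparison, a common alternative to ==.
    static boolean nearlyEqual(float a, float b, float epsilon) {
        return Math.abs(a - b) <= epsilon;
    }

    public static void main(String[] args) {
        float[] readings = {0.1f, 0.2f};

        // In both cases the float operand is widened to double before the
        // comparison, and the float closest to 0.1 is not the double
        // closest to 0.1, so the comparison is false.
        boolean viaCall = tenth() == 0.1;
        boolean viaArray = readings[0] == 0.1;

        System.out.println("tenth() == 0.1     -> " + viaCall);
        System.out.println("readings[0] == 0.1 -> " + viaArray);
        System.out.println("nearlyEqual        -> "
                + nearlyEqual(tenth(), 0.1f, 1e-6f));
    }
}
```

Neither comparison contains the word "float" anywhere near the ==, which is why a textual search has little to hold on to.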
A simpler approach would be to have a tool give you a representation of the code that is close to how the compiler thinks about it, and that you can go through easily. This is what we mean by having “a model”. In particular, this example would be solved with our tool by writing this snippet of EOL code:
for (expr in InfixExpression.all) {
    if (expr.operator.toString == '==') {
        var typeLExpr =
            expr.leftOperand.resolveTypeBinding().qualifiedName;
        var typeRExpr =
            expr.rightOperand.resolveTypeBinding().qualifiedName;
        if (typeLExpr == 'float' or typeRExpr == 'float') {
            var cUnit = expr.root.javaElement.elementName;
            ('WARNING: in ' + cUnit
                + ', tried to use == with a float: '
                + expr).println();
        }
    }
}
This query finds the == expressions in your code, reuses the Eclipse Java Development Tools to find out whether one of the two sides is a “float” (we’re ignoring “double” to simplify things), and then reports any problems. It can handle non-trivial cases like method invocations, array accesses and so on. And all in about a dozen lines of code.
How does it differ from other tools?
The usual approach when exposing code as a model is to parse the code, dump it as a model (e.g. in XMI) and then treat it as usual. This approach is followed by popular tools such as MoDisco, and it works well in “software modernization” situations in which you have a “frozen” legacy code base. This is the bottom path shown in this figure, starting from “Java code” and going to the cyan nodes:
However, if you have an active codebase, having to extract a full model every time you make a change is tedious and slows you down. Instead, it’d be better to just have something running in the background keeping such a model up to date. The good news is that many IDEs already do this for their code navigation facilities, so we can piggyback on it without adding yet another background process to the user experience.
Our EMC JDT driver is exactly that – we don’t do any big extraction work in advance, so the query starts running almost immediately. The driver exposes the indices maintained by the Eclipse Java Development Tools, so you can quickly find a class and go through its methods, for instance. If at some point you need more information than the indices provide, we’ll transparently use the JDT parser (based on the Java compiler) to fetch things for you. This is represented as the top path in the above figure, starting at “Java code” and going through the orange nodes.
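The "index first, parse only when needed" idea can be sketched in plain Java as follows. This is a conceptual illustration under my own assumptions, not the driver's actual implementation: every name in it is hypothetical, and "parsing" is simulated by a lookup plus a counter.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: none of these names come from Epsilon or JDT.
class LazyCodeModel {
    // What the always-available index knows: type name -> source file.
    private final Map<String, String> index = new HashMap<>();
    // Stand-in for file contents: source file -> methods declared in it.
    private final Map<String, List<String>> files = new HashMap<>();
    // Results of on-demand "parsing", so each file is parsed at most once.
    private final Map<String, List<String>> parsed = new HashMap<>();
    private int parseCount = 0;

    void addType(String typeName, String file, List<String> methods) {
        index.put(typeName, file);
        files.put(file, methods);
    }

    // Answered from the index alone: no parsing involved.
    boolean hasType(String typeName) {
        return index.containsKey(typeName);
    }

    // Needs detail the index lacks, so the file is "parsed" on first use.
    List<String> methodsOf(String typeName) {
        String file = index.get(typeName);
        if (file == null) {
            return List.of();
        }
        return parsed.computeIfAbsent(file, f -> {
            parseCount++; // simulated cost of running the parser
            return files.get(f);
        });
    }

    int parseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        LazyCodeModel model = new LazyCodeModel();
        model.addType("Chart", "Chart.java", List.of("draw", "resize"));
        model.addType("Axis", "Axis.java", List.of("tick"));

        System.out.println(model.hasType("Chart"));   // index only
        System.out.println(model.parseCount());       // nothing parsed yet
        System.out.println(model.methodsOf("Chart")); // triggers one parse
        System.out.println(model.methodsOf("Chart")); // served from cache
        System.out.println(model.parseCount());
    }
}
```

The point of the design is that a query touching one type pays for one parse, and a query that only needs names pays for none.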
How is it used?
With Epsilon and the EMC JDT driver installed, we create a new “query.eol” file with our query. For instance, this one-liner prints how many types we have in our program:
TypeDeclaration.all.size.println('number of types: ');
To run it, we create a standard EOL launch configuration and then select the new “Java” model type:
We then select a set of Eclipse Java projects to expose as a model:
You can select multiple projects through Ctrl+click – these will all be exposed as a single model. Here I have code for various versions of the JFreeChart library, and I have selected the code for the 1.0.19 version in particular.
Click on OK, then Run, and you’ll get your answer:
number of types: 1041
What other things can I do?
While the previous example was very simple, EOL is a fully-featured language, with support for loops, user-defined operations, built-in data structures and full access to any Java library. In our OCL’16 paper we showed how to use it to validate your real Java code against a UML diagram, checking if perhaps your UML diagram had gone “stale”. We found that using the EMC JDT driver would be faster than using MoDisco if you just wanted to do this check repeatedly across multiple releases.
Essentially, we expose the JDT document object model (DOM) directly through Epsilon, so if you want to access all instances of the JDT DOM TypeDeclaration class, you write “TypeDeclaration.all” as we did above. We also provide a few convenient shorthands. For a TypeDeclaration t, you can use these:
- t.public, t.protected, t.private, t.static, t.abstract, t.final: these are true/false depending on whether the underlying type has this modifier.
- t.name: this exposes the name of the underlying type (which usually requires going through multiple fields).
We also expose the JDT index so you can quickly find a type by name: in fact, it’s the same index you use when pressing Ctrl+Shift+T in Eclipse. To do so, you can use one of these:
- TypeDeclaration.all.select(td|td.name='someClass') finds a type by name and returns it as a collection of TypeDeclarations with access to every detail within those types.
- TypeDeclaration.all.search(td|td.name='someClass') does the same, but it only returns the raw index entry (an instance of IType), which is much faster but has less information.
Conclusion
Just as with Roslyn, the EMC JDT driver shows that you don’t need to extract a full model to start querying your code. You can just reuse whichever representations your IDE offers and then fetch the extra bits that you may need with some on-the-fly parsing. The time savings can be considerable if you need to do this frequently!
If you’d like to know more about this, feel free to contact us at @epsilonews – we’ll be happy to help! We also welcome contributions through GitHub, in case you’d like to pitch in – the JDT indices offer many other ways to search for types, for instance, and we’d like to hear your thoughts on which types of JDT search you would like to see integrated.
I am a Lecturer at Aston University (UK), interested in model-driven software engineering, search-based methods for software engineering (especially testing) and better approaches for engineering education. I also contribute to the Eclipse Epsilon family of model management languages and to the Hawk NoSQL model indexer.
It is very interesting to read about Eclipse Epsilon and this comparison with the C# approach published on my blog.
One of my goals is to build something similar, but focused on working outside the IDE, either as part of automated jobs or as a library in different projects. I think that the Eclipse JDT is great, but it is a bit heavy and difficult to use outside Eclipse. For this reason I contribute to JavaParser, and I have built JavaSymbolSolver (which resolves symbols and builds a model of compiled code).
We have much work to do to reach the maturity and number of features of the JDT, of course, but I think we could learn a lot from Epsilon.
Interesting! Actually, Epsilon can run outside Eclipse – in fact, we have plain JAR distributions of it on our website [1]. Obviously, the EMC JDT driver in this example would have to run inside Eclipse (even if it’s a headless one), but perhaps your approach could be exposed as another EMC driver (this time, one that can run outside Eclipse).
I’m only concerned about its use in large code bases – the indices JDT builds by default are very important to speed up certain things, as we don’t want to parse everything all the time. Do you think JavaSymbolSolver could work incrementally, resolving references on the fly, or does it need to parse all the source code at once? How large are the codebases that you have tried it with?
[1]: https://www.eclipse.org/epsilon/download/
JavaSymbolSolver works fully incrementally. It is a bit harder than simply building models for everything, but the idea is that you can parse a Java file using JavaParser, then ask JavaSymbolSolver for information on one piece (like a name in the middle of some expression), and it will load everything it needs in order to answer that question. It then caches a lot of stuff, so related questions will be answered much faster. Obviously, if that name is a local name, no external file will need to be loaded. But if it is, let’s say, an inherited field, then the parent class may have to be loaded to answer that question.
The reason for this approach is that JavaSymbolSolver was born to complement JavaParser. Our goal was to add the “give me the type of this” feature. It meant having to load compiled classes, use type inference in lambdas, recognize which variant of an overloaded method is invoked, etc. So in the end we could build a model for every class, method, annotation, etc., but we do that on demand.
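As a rough illustration of that on-demand behaviour, here is a plain Java sketch. It is not the JavaSymbolSolver API: all class and method names are invented, and "loading a class" is simulated by copying an entry into a cache. A local name is resolved without loading anything, while an inherited field causes the class and its parent to be loaded, and nothing else.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Conceptual sketch only: these names are invented for illustration.
class OnDemandResolver {

    static final class ClassInfo {
        final String parent;              // null when there is no superclass
        final Map<String, String> fields; // field name -> type name

        ClassInfo(String parent, Map<String, String> fields) {
            this.parent = parent;
            this.fields = fields;
        }
    }

    // Stand-in for everything that could be loaded from disk.
    private final Map<String, ClassInfo> available = new HashMap<>();
    // Cache of what has actually been loaded so far.
    private final Map<String, ClassInfo> loaded = new HashMap<>();

    void register(String name, String parent, Map<String, String> fields) {
        available.put(name, new ClassInfo(parent, fields));
    }

    private ClassInfo load(String className) {
        return loaded.computeIfAbsent(className, available::get);
    }

    // Resolves the type of a name used inside className:
    // locals first, then fields, walking up the hierarchy as needed.
    String resolveType(String name, Map<String, String> locals,
                       String className) {
        String local = locals.get(name);
        if (local != null) {
            return local; // no class had to be loaded
        }
        String current = className;
        while (current != null) {
            ClassInfo info = load(current);
            if (info == null) {
                return null; // unknown class
            }
            String type = info.fields.get(name);
            if (type != null) {
                return type;
            }
            current = info.parent;
        }
        return null; // name not found
    }

    Set<String> loadedClasses() {
        return loaded.keySet();
    }

    public static void main(String[] args) {
        OnDemandResolver resolver = new OnDemandResolver();
        resolver.register("Base", null, Map.of("id", "long"));
        resolver.register("Derived", "Base", Map.of("label", "String"));
        resolver.register("Unrelated", null, Map.of("x", "int"));

        System.out.println(resolver.resolveType("i", Map.of("i", "int"), "Derived"));
        System.out.println(resolver.loadedClasses());
        System.out.println(resolver.resolveType("id", Map.of(), "Derived"));
        System.out.println(resolver.loadedClasses());
    }
}
```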
About dependencies: JavaParser has zero dependencies, while JavaSymbolSolver has a few. Both work outside OSGi.
Configuring JavaSymbolSolver means doing something like this:
CombinedTypeSolver combinedTypeSolver = new CombinedTypeSolver();
combinedTypeSolver.add(new JreTypeSolver());
combinedTypeSolver.add(new JavaParserTypeSolver(new File("src/test/resources/javaparser_src/proper_source")));
combinedTypeSolver.add(new JavaParserTypeSolver(new File("src/test/resources/javaparser_src/generated")));
So no need to create an Eclipse project or something like that.
While JavaParser has been tested for years on all sorts of projects, JavaSymbolSolver is much younger and has not been tested on huge projects.
It could be interesting to see if we could turn our stuff into a driver for Epsilon. Where could I find information about doing that?
It sounds very nice indeed. That TypeSolver setup code could probably be part of the loading process for an EMC driver, and then we could expose the JavaParser / JavaSymbolSolver document object model for easier querying, as we did with JDT.
We ran a tutorial at MoDELS’16 on developing EMC drivers, among other things. The materials are here:
https://www.eclipse.org/epsilon/doc/articles/developing-a-new-emc-driver/
If you have any questions, feel free to ask!