IEEE Software is constantly trying to reinvent itself to better serve the needs of its target audience: software practitioners. As part of the new initiatives team, I get to be involved in some of these efforts. Among them, I especially like (and contribute to) the Practitioners’ digest, a column that highlights conference papers that could be of special interest to practitioners. The goal is to help disseminate research work that would otherwise be mostly ignored. I have yet to run into a software practitioner who enjoys reading conference proceedings 🙂!
The column works by proposing summaries of papers the team finds interesting, coming from conferences we have recently attended. Then a discussion led by the column leader (Jeffrey C. Carver) follows to choose the few papers that fit into the column length.
The latest issue includes a couple of papers from the Models 2015 conference:
- On the Use of UML Documentation in Software Maintenance: Results from a Survey in Industry. Ana M. Fernández-Sáez, Danilo Caivano, Marcela Genero, and Michel R. V. Chaudron. Models 2015: 292-301
- A Model-Based Framework for Probabilistic Simulation of Legal Policies. Ghanem Soltana, Nicolas Sannier, Mehrdad Sabetzadeh, and Lionel C. Briand. Models 2015: 70-79
but I think there were quite a few others (from Models but also from the ER conference) that could have made it. I’m leaving here the summaries of all the papers from Models and ER that I thought practitioners could especially enjoy.
Il-Yeol Song, Yongjun Zhu, Hyithaek Ceong, Ornsiri Thonggoom:
Methodologies for Semi-automated Conceptual Data Modeling from Requirements. ER 2015: 18-31
To model or not to model. This is a frequent topic of discussion among software developers. The cost of building the models of the software to be built is high, and the benefits of taking the time to specify those models are still a matter of opinion. This paper aims to give some arguments to the “pro-modeling” side by simplifying the task of creating quality models, combining six different techniques that can assist you in the creation process.
The basic idea is to start with natural language processing techniques and general domain ontologies to understand textual requirements and re-express them as conceptual models. The linguistic part identifies the key concepts in the text, while the ontology then helps to contextualize those concepts and identify the relationships among them. Model repositories, pattern-based approaches and metamodeling techniques are then used to refactor and improve the initial models. To make sure the generated models actually represent the customer requirements, a reverse transformation verbalizes the models for validation purposes.
The paper stays at a rather high-level description of the process, so it’s too early to know whether the framework they propose can effectively semi-automate the modeling process. Still, it’s a good starting point.
Diego Sevilla Ruiz, Severino Feliciano Morales, Jesús García Molina:
Inferring Versioned Schemas from NoSQL Databases and Its Applications. ER 2015: 467-480
Typically, NoSQL databases are schemaless. This lack of schema offers a great deal of flexibility, specifically regarding the recording of non-uniform data and data evolution. Still, this schemaless characteristic is only partially true, in the sense that the data does have a schema even if it’s implicit. And this implicit schema needs to be “discovered” at some point in order to effectively manipulate that data (e.g. when writing the code that must retrieve and manipulate that data in any non-trivial way).
This is exactly the problem tackled in this paper, where the authors describe a reverse engineering approach to infer the schema of NoSQL databases. The approach takes into account the different versions of the data in the database, therefore generating not one schema but a set of “Versioning Schemas” to accommodate the different data snapshots. The discovery process itself starts with an initial map-reduce phase aimed at collecting a minimal set of examples of JSON objects (minimal in the sense that each data version is covered by a single representative JSON object) to be used in the inference stage. Then, a traversal of all these objects infers the types, attributes and relationships of the schema based on the JSON structures.
This process makes it easier for companies to work with NoSQL databases and to know the number of instances they have for each version (useful, for instance, when planning data migration projects).
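To make the idea more concrete, here is a minimal sketch of the "one representative per version" step in plain Python. It is not the authors' implementation: there is no real map-reduce here, the function names are mine, and I'm assuming each JSON object carries a `type` field naming its entity, which is only a convention chosen for this example.

```python
from collections import Counter


def type_signature(value):
    """Return a hashable description of the structure of a JSON value."""
    if isinstance(value, dict):
        # Attribute names plus the signature of each attribute value.
        return ("object", tuple(sorted((k, type_signature(v)) for k, v in value.items())))
    if isinstance(value, list):
        # Distinct element signatures, in a stable order.
        return ("array", tuple(sorted({type_signature(v) for v in value}, key=repr)))
    if isinstance(value, bool):  # must come before the number check
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    raise TypeError(f"not a JSON value: {value!r}")


def infer_versions(documents, type_field="type"):
    """Keep one representative object per (entity, structure) pair and count instances.

    `type_field` is an assumption of this sketch, not something the database enforces.
    """
    representatives = {}
    counts = Counter()
    for doc in documents:
        key = (doc.get(type_field, "unknown"), type_signature(doc))
        representatives.setdefault(key, doc)  # the first object seen stands for the version
        counts[key] += 1
    return representatives, counts


def summarize(counts):
    """Group the versions by entity: entity -> list of (attributes, instance count)."""
    summary = {}
    for (entity, signature), n in counts.items():
        attributes = dict(signature[1])  # attribute name -> type signature
        summary.setdefault(entity, []).append((attributes, n))
    return summary


if __name__ == "__main__":
    docs = [
        {"type": "movie", "title": "A", "year": 1999},
        {"type": "movie", "title": "B", "year": 2001},
        {"type": "movie", "title": "C", "year": 2010, "genres": ["drama"]},
        {"type": "director", "name": "X"},
    ]
    _, counts = infer_versions(docs)
    for entity, versions in summarize(counts).items():
        for attributes, n in versions:
            print(entity, sorted(attributes), n)
```

In this toy data the `movie` entity ends up with two versions (with and without `genres`), and the per-version counts are the kind of figure you would want before planning a migration.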
Roman Lukyanenko, Jeffrey Parsons:
Principles for Modeling User-Generated Content. ER 2015: 432-440
Traditional Information Systems development has assumed that data comes either from the organization’s employees or customers in a stable and well-defined setting. This is not true anymore, and many organizations now rely on digital information produced by members of the general public (who often are just casual contributors) to operate successfully. This is known as user-generated content (UGC).
To be effectively managed, this information must have some kind of structure, but not one so rigid that it becomes very difficult for casual end-users to use and understand. As an example, the authors are participating in an online citizen research project that relies on UGC to collect bird sighting data from ordinary people to support scientific research on bird conservation efforts. A strict data-entry form forcing people to enter all kinds of details about the birds would be ideal for the biologists who have to classify that information, but it would deter the participation of non-experts who may be unable to answer (or even understand) some of those questions.
The authors propose a set of principles to take into account when modeling UGC problems, mainly reducing the abstraction level and considering instances as primary constructs. Under the proposed principles, crowd volunteers are able to provide information according to their own conceptualization of reality without having to conform to a particular structure. Obviously, this makes the data analysis part more difficult than before, but still possible.
Valerio Cosentino, Javier Luis Cánovas Izquierdo, Jordi Cabot:
Gitana: A SQL-Based Git Repository Inspector. ER 2015: 329-343
Git plays a key role in all software development projects nowadays. We all know about its benefits but also about its challenges. And one of the main challenges for many of us is how to unveil interesting metrics on the project evolution (e.g. number of deleted files or number of modifications on a given file) using the limited Git command line interface in combination with shell commands. In contrast, anybody with a minimal exposure to software development is most likely familiar with SQL.
Therefore, the goal of this work is to provide a (relational) database representation of Git repositories. A database schema is created and populated with the data coming from a given Git repository. Incremental synchronization optimizes the time to update the database with the latest changes once the initial load has been completed. The authors then show how SQL can be used to easily query the database to uncover interesting information on the project status. Available analytical tools could also be plugged into the database to generate reports on its temporal evolution.
This relational representation also facilitates the integration of Git data with other project management tools at the data level, enabling queries that mix data from several tools.
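To give a flavour of what "querying Git with SQL" looks like, here is a small sketch using Python's built-in `sqlite3` module. The two tables (`commits` and `file_changes`) are a schema I made up for illustration, not Gitana's actual one, and the commit data is hand-written instead of being imported from a real repository.

```python
import sqlite3


def build_database(commits):
    """Load (sha, author, date, [(path, status), ...]) tuples into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE commits (sha TEXT PRIMARY KEY, author TEXT, committed_at TEXT);
        CREATE TABLE file_changes (
            sha TEXT REFERENCES commits(sha),
            path TEXT,
            status TEXT  -- 'added', 'modified' or 'deleted'
        );
        """
    )
    for sha, author, committed_at, changes in commits:
        conn.execute("INSERT INTO commits VALUES (?, ?, ?)", (sha, author, committed_at))
        conn.executemany(
            "INSERT INTO file_changes VALUES (?, ?, ?)",
            [(sha, path, status) for path, status in changes],
        )
    conn.commit()
    return conn


def modifications_per_file(conn):
    """Number of modifications of each file, most modified first."""
    return conn.execute(
        """
        SELECT path, COUNT(*) AS modifications
        FROM file_changes
        WHERE status = 'modified'
        GROUP BY path
        ORDER BY modifications DESC, path
        """
    ).fetchall()


def deleted_file_count(conn):
    """Number of distinct files that were deleted at some point."""
    return conn.execute(
        "SELECT COUNT(DISTINCT path) FROM file_changes WHERE status = 'deleted'"
    ).fetchone()[0]


if __name__ == "__main__":
    # Hand-written sample history (hypothetical data).
    history = [
        ("a1", "ana", "2015-01-10", [("README.md", "added"), ("src/main.py", "added")]),
        ("b2", "bob", "2015-01-12", [("src/main.py", "modified")]),
        ("c3", "ana", "2015-02-01", [("src/main.py", "modified"), ("README.md", "modified"),
                                     ("old.txt", "added")]),
        ("d4", "bob", "2015-02-03", [("old.txt", "deleted")]),
    ]
    conn = build_database(history)
    print(modifications_per_file(conn))
    print(deleted_file_count(conn))
```

Once the data sits in tables like these, joining it with data exported from an issue tracker or any other project tool is just another SQL query.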
Enhancing the Communication Value of UML Models with Graphical Layers
Yosser El Ahmar, Sébastien Gérard, Cédric Dumoulin, and Xavier Le Pallec. Models 2015: 64-69
Probably one of the few things everybody agrees on when it comes to modeling is that it’s useful for communication. Still, as a communication mechanism, the current UML standard and the tools implementing it could do better, especially when it comes to visualizing large models covering information relevant for a variety of roles. This paper enriches UML with auxiliary visual variables (e.g. color, position, size, font visibility, border thickness,…) that stakeholders can use to better express their specific viewpoints. For instance, stakeholders can indicate which classes are the most important for them, which elements have been recently modified, which main concerns the diagram deals with, etc. As an example, they show how brightness could be used to express the progress of the project, or how different colors could represent the area of responsibility of each designer.
This information is then expressed as a visual view that can be superimposed on the original UML diagram. To that end, the authors extend the concept of layer from classical drawing tools (e.g., GIMP) to UML diagrams. This approach is implemented in the FlipLayers tool, a plugin for the Papyrus modeling environment in Eclipse.
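The layer idea can be sketched in a few lines of Python. This is only my own toy illustration of superimposing visual variables on a diagram, not how FlipLayers works internally: the `Layer` class, the `render_styles` function and the style dictionaries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """A named set of visual overrides: element name -> {visual variable: value}."""
    name: str
    styles: dict = field(default_factory=dict)
    visible: bool = True


def render_styles(base_styles, layers):
    """Superimpose the visible layers, in order, on top of the diagram's base styles."""
    # Copy the base styles so the original diagram is left untouched.
    result = {element: dict(style) for element, style in base_styles.items()}
    for layer in layers:
        if not layer.visible:
            continue
        for element, overrides in layer.styles.items():
            result.setdefault(element, {}).update(overrides)
    return result


if __name__ == "__main__":
    base = {
        "Order": {"color": "white", "border": 1},
        "Customer": {"color": "white", "border": 1},
    }
    # One layer per concern: who owns each class, and what changed recently.
    ownership = Layer("ownership", {"Order": {"color": "blue"}, "Customer": {"color": "green"}})
    recent = Layer("recently modified", {"Order": {"border": 3}})
    print(render_styles(base, [ownership, recent]))
```

Hiding a layer (setting `visible` to `False`) brings back the underlying look of the diagram, which is the same behaviour you get when toggling layers in a drawing tool.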