Adoption of model-driven engineering in open source projects

Guest post by Nicholas Matragkas, Dimitris Kolovos and Yannis Korkontzelos (read the author profiles at the bottom of the post), reporting on their evaluation of the actual use of model-driven technologies in current open-source projects. Enter Nicholas, Dimitris and Yannis.

The level of adoption of Model-Driven Engineering (MDE) principles and technologies in a range of industry sectors has been the subject of several recent studies (e.g. [1], [2], [3]). However, to our knowledge, there are no studies on the adoption of MDE in open-source projects. To shed some light on this unexplored area, we conducted an empirical study on the adoption of Eclipse-based modelling technologies in open-source projects on GitHub. The results of this study were presented at the OSS4MDE’15 workshop and published in the workshop’s proceedings (open-access version of the paper).

The questions that we were interested in answering through this study included the following:

  • Which of the technologies under consideration demonstrate significant adoption in open-source projects?
  • Is the activity related to these technologies increasing or decreasing over time?
  • What is the size of the community of open-source developers who are familiar with these technologies?

For this work, we chose to focus on publicly accessible GitHub repositories because of GitHub's dominant role in open-source software development and the comprehensive API it provides for extracting information of interest. We also chose to focus on Eclipse-based MDE technologies because the Eclipse Modeling Project arguably fosters the most active open-source MDE community. Twenty-two technologies were considered for this study. The complete list of these technologies, as well as a discussion of how we extracted the data from GitHub, can be found in the paper.

Results

Number of Relevant Files per Technology: To begin with, we wanted to measure the number of MDE-related files found across GitHub. The figure below provides an overview of the obtained measurements. We can observe the dominance of Ecore, which accounts for more than twice as many files as the next technology (Xpand). We also observe that files related to model-to-text transformation languages (Xpand, Acceleo and JET) appear much more frequently than files related to model-to-model transformation languages (ATL, QVTo, Henshin, Kermeta) and model query languages (OCL, IncQuery).

Fig1. – Number of Files per Technology

Number of Repositories per Technology: Next, we wanted to find out how many repositories contained MDE-related files. Conceivably, a small number of repositories that contain a large number of files related to a particular technology could lead to a misleading interpretation of the results above. Since the GitHub search API does not return the total number of repositories in which files related to each technology appear, we had to estimate this number.

To compute this estimate for each technology, we assumed that the files related to a technology for which we had been able to retrieve details formed a representative sample of the entire population of files related to this technology. Consequently, we could assume that the ratio of repositories to files in the sample generalises to the entire population. To estimate the error in this computation, we used binomial proportion confidence intervals.

We modelled the occurrence of a new, previously unseen repository in each of the files in the sample as a Bernoulli trial, since files are independent as far as the repository that they belong to is concerned. We computed normal approximation (Wald), exact binomial (Clopper-Pearson), Wilson score, adjusted Wald (Agresti-Coull) and Jeffreys intervals for 90%, 95% and 99% confidence levels. We observed that in all cases the normal approximation, exact binomial, Wilson score and Jeffreys intervals coincide or differ by 1% or less. The figure below (Fig2) shows the results of this analysis, using the normal approximation (Wald) intervals at the 95% confidence level.
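The estimation described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the script used in the study: the function names are hypothetical, the sample figures in the usage lines are invented for demonstration, and only the Wald and Wilson score intervals are shown.

```python
import math

# z-scores for the confidence levels mentioned in the text
Z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}


def wald_interval(successes, trials, z=Z[0.95]):
    """Normal approximation (Wald) interval for a binomial proportion."""
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def wilson_interval(successes, trials, z=Z[0.95]):
    """Wilson score interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)
    ) / denom
    return centre - half_width, centre + half_width


def estimate_repositories(sample_files, sample_repos, total_files, z=Z[0.95]):
    """Scale the repositories-per-file ratio of the sample to all files.

    Each sampled file is treated as a Bernoulli trial whose "success" is
    the appearance of a previously unseen repository.
    Returns (point estimate, lower bound, upper bound).
    """
    ratio = sample_repos / sample_files
    low, high = wald_interval(sample_repos, sample_files, z)
    return total_files * ratio, total_files * low, total_files * high


# Hypothetical figures, for illustration only (not data from the study):
# 1,000 sampled files spread over 150 repositories, 20,000 files in total.
point, lower, upper = estimate_repositories(1000, 150, 20000)
print(f"estimated repositories: {point:.0f} ({lower:.0f} to {upper:.0f})")
```

Comparing `wald_interval(150, 1000)` with `wilson_interval(150, 1000)` for these made-up figures gives bounds that differ by well under one percentage point, which is the kind of agreement reported above.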

Fig2. – Estimated Number of Repositories per Technology

While Ecore continues to dominate the chart, there are a few interesting differences. First, Xtext and GMF (6th and 11th in Fig1) climb to 2nd and 4th place in this chart. This is expected, as model-to-text transformations typically consist of many files (templates), while graphical and textual syntaxes are often limited to one file per language. In the same spirit, while model-to-text transformation languages still appear to be more popular than M2M languages, the gap is substantially smaller in this chart.

We also observe that OCL and Henshin (5th and 9th in Fig1) slip down to the 15th and 17th positions in this chart. This suggests that there are a few repositories that contain a large number of OCL constraints and Henshin transformations. Further analysis revealed that for OCL, two repositories (dresden-ocl/dresdenocl and regressiontesttool/rtt) account for nearly 50% of the OCL files on GitHub. Similarly, a single repository (CoWolf/CoWolf) accounts for more than 80% of all Henshin files on GitHub.

Number of MDE-literate Developers: Overall, we identified 2,195 developers (counted by unique email address) who are responsible for at least one commit related to files of interest across all technologies included in this study. The breakdown for each technology is presented in Fig3. Compared to Fig2, we don’t observe any major changes in the relative order of technologies. Estimates and confidence intervals were computed in the same way as for Fig2.

Fig3. – Estimated Number of Developers

Activity per Year: Fig4 illustrates the number of commits that involve MDE-related files (across all technologies) over time. We observe a steep increase, from 4.4K commits in 2010 to 24K commits in 2014. This can be attributed, to different extents, to the increasing popularity of Git and GitHub and to an increase in the use of Eclipse MDE technologies in open-source software development over the last few years.

Fig4. – Commits per Year

Last-updated Files per Year: As a related measurement, Fig5 illustrates the number of files that were last updated in each year (as with the previous chart, we omit data for the current year). In this figure we observe that more than 40% of the files of interest can be considered active, as they have been updated at least once over the last year.

Fig5. – Updated Files per Year

Fig6. – Estimated Number of Active GitHub Repositories per Technology

Key Findings

This study was an initial investigation into the use of Eclipse-based MDE technologies in open-source software development projects. In future iterations of this work, we plan to extend the selection of participating technologies to include languages, frameworks and tools beyond the bounds of the Eclipse Modelling Framework (e.g. commercial UML tools, Simulink, Microsoft’s T4, MetaEdit+, JetBrains MPS, MediniQVT, Modelmorf, USE). Moreover, we plan to analyse open-source projects hosted in other public repositories such as SourceForge. The key findings of our study follow:

  • We have identified a substantial number of GitHub repositories (1,928) which make use of the technologies included in this study;
  • We have identified a substantial community of 2,195 developers who are responsible for at least one commit related to these technologies;
  • We have observed an increasing number of commits on files related to these technologies over the last decade;
  • We have observed that model-to-text transformation languages (Xpand, Acceleo and JET) appear to be more widely used than model-to-model transformation languages (ATL, QVTo, Henshin);
  • We have identified that technologies that are led by industrial organisations (Ecore, Xtext, Xpand, GMF, Acceleo and JET) are more widely used than technologies predominantly developed in academia (Epsilon, ATL, Kermeta, Henshin).

For more information about this study, readers can refer to the full paper and to the accompanying deck of slides.

References

  1. Whittle, J., Hutchinson, J. and Rouncefield, M., 2014. The state of practice in model-driven engineering. IEEE Software, 31(3), pp. 79-85.
  2. Hutchinson, J., Whittle, J. and Rouncefield, M., 2014. Model-driven engineering practices in industry: Social, organizational and managerial factors that lead to success or failure. Science of Computer Programming, 89, pp. 144-161.
  3. Mohagheghi, P. and Dehlen, V., 2008, January. Where is the proof? - A review of experiences from applying MDE in industry. In Model Driven Architecture – Foundations and Applications (pp. 432-443). Springer Berlin Heidelberg.

Authors

Nicholas Matragkas is a Lecturer in the Department of Computer Science, University of Hull. His research interests include Model-Driven Engineering, Domain-Specific Languages, and software analytics.

Dimitris Kolovos is a Senior Lecturer in the Department of Computer Science, University of York. He has co-authored more than 100 scientific papers in international journals, conferences and workshops in the broader field of software engineering. He has been an Eclipse Foundation committer, leading the development of the Epsilon open-source project since 2006 and the Emfatic project since 2010.

Yannis Korkontzelos is a Senior Lecturer in the Department of Computing, Edge Hill University. His research focus is on Natural Language Processing and Text Mining.
