Full research proposal on the study of Open Source communities
Following up on our introductory post on the topic, I’m now “releasing” the full B1 Research Proposal document I submitted to the ERC Consolidator Grant 2016.
If you want full details of the proposal (either because you like the topic or you are just interested in collecting some examples of ERC proposals to help prepare your own) keep reading. If you just want the short version of our research ideas on this topic, the following presentation (or this short roadmap paper) can be good enough:
(UPDATE: this research proposal was rejected. Still, we continue to believe this research line is worth investigating, so we’re going ahead with some of the sublines while looking for funding to go full steam ahead with it)
My goals with this public posting of the proposal are:
- Help other researchers going through a similar “ERC experience”. Obviously, this is just a proposal so I’m not saying this is a good example of an ERC proposal. It could be a terrible one but, still, it’s an example and, unfortunately, many people talk about open science but few practice it, so I’m sure some of you will find it useful when writing yours. Also, this research proposal is aimed at studying open source development so I’d find it counterintuitive not to have the proposal itself in the open
- Hope to find other researchers interested in this research line to collaborate with
- Find practitioners/contributors to OSS projects that would be open to helping with our research by agreeing to be contacted/interviewed to learn more about how OSS is developed, committing to read the results we produce and (maybe) trying them in their projects. If you’d like to help, please fill in this form
(and no, I’m not scared to death that somebody decides to “steal” any of these ideas, read this for a longer explanation)
Now, without further ado, my ERC proposal (B1 file):
Very Large COmmunity-based Software DEvelopment (CODE)
- Name of the Principal Investigator (PI): Jordi Cabot
- Name of the PI’s host institution for the project: ICREA – Universitat Oberta de Catalunya
- Proposal duration in months: 48
We live in a software-enabled world. Software is everywhere: in your laptop, your phone, your car and even (sooner rather than later) your toaster. The global cost of software development is estimated to be over one trillion dollars, making it a crucial market for Europe’s ICT initiatives.
Much of this software is critical for the daily activities of our society and has a large community behind it, comprising thousands of contributors but also millions of users that must be listened to as well. This should be especially true for software built following the principles of Open Source Software (OSS), typically developed in a collaborative manner via online code hosting platforms like GitHub.
In theory, OSS is of better quality thanks to this higher community involvement (at different levels: submitting bug reports, feature requests, giving feedback, contributing code…). Luckily, most of the crucial software for our society is OSS (like Apache Server, Firefox, Linux or WordPress). In practice, though, many OSS projects suffer from a lack of transparency and democracy, fail to attract and manage contributors and, in general, are unable to properly respond to their users’ needs. This hampers their future success and will impact the growth of Europe’s ICT.
The goal of this project is to transform software development into a real community-driven process by providing an online collaborative platform where a software community at large (i.e. including its users) can effectively participate and be managed in order to make joint decisions in the open to ensure the long-term sustainability of the project. This will require solving a number of research challenges around the human and social aspects of software development. Therefore, the project will build a unified interdisciplinary framework combining techniques from software mining and analytics with methods borrowed from political science, sociology and economics.
Section a: Extended Synopsis of the scientific proposal (max. 5 pages)
We live in a software-enabled world and open source software is a key player in it: “Software is everywhere today, yet its instrumental role in the modern digital economy is often overlooked. With market revenues of over €200 billion in Europe software is the largest and the fastest growing segment of the ICT market … Open source software (OSS) is now playing a significant role in this Software economy. A number of OSS specific actions could contribute to growth in Europe, jobs creation and improvement of the European Software imbalance ” – European Software Strategy Report.
These numbers and vision clearly convey the importance of software development and, in particular, OSS development in the European economy (and, in fact, in our daily life: each of us interacts with OSS every single day, even if inadvertently). According to the Open Source Initiative: “OSS development is a development method that harnesses the power of distributed peer review and transparency… The promise of OSS is better quality, higher reliability, more flexibility, lower cost, and an end to predatory vendor lock-in”. This level of quality is due to the active participation of the community. This is also the key proposal of the well-known essay “The Cathedral and the Bazaar”  where the author contrasts two development models: the Cathedral model, where code is developed by a restricted set of developers, and the Bazaar model, where development is a collaborative endeavor and users are co-developers, constituting altogether a very large global community of people with different profiles. Indeed, this “co-developer” role doesn’t mean users contribute code; it highlights the fact that users are key members of the software community, have a say in it and can contribute in any form or shape they can, e.g. submitting bug reports, feature requests or just giving feedback on any aspect of the software. This is different from end-user development approaches  that aimed to turn users into semi-developers who adapt the software on their own.
Unfortunately, this does not reflect the reality of OSS development and therefore the potential benefits of OSS to the European society may never materialize. Reality shows that many OSS projects are closer to the Cathedral model than the Bazaar one. I manually analyzed the twenty-five most popular projects in GitHub and found that only one (4%) explicitly described how user contributions would be managed (with another 28% giving partial hints). This means that 68% had no explicit governance model. Absolutely none of them were democratic (i.e. end users could not vote in any way, not even to elect people to represent them). In fact, the only one describing its decision-making process stated that “this project follows the timeless, highly efficient and totally unfair system known as Benevolent dictator for life”. Clearly, this is not common practice in other community aspects of our society. And this is not the only problem. Most projects struggle to attract contributors and to properly manage their massive communities of developers and users. In fact, we can conclude the OSS model is broken, with many projects failing and getting abandoned in the very early stages (see  for some statistics). Therefore, alternative software production models deserve to be explored now.
I argue in this proposal that to improve software quality (in the broadest sense of the word, i.e. including product-market fit) we need to shift the focus of our software engineering research from a code-centric focus to a people-centric one. This shift will be achieved by implementing an ambitious multi-dimensional and cross-disciplinary research agenda that will bring to the software field expertise available in other academic disciplines. This is obviously a challenging task since it will involve transforming the way software is developed, making the process more open (now for real!) and community-driven. Still, software has largely contributed to make our world more social (e.g. enabling the social networks or the sharing economy services) and democratic (e.g. e-democracy and voting systems). I believe it is time we explore how these aspects can benefit software development itself.
State of the art
The software research community has been chasing forever the silver bullet that will fix all problems in software engineering . Recently, the availability of a massive dataset of software project data in repositories like GitHub (with over 30 million projects, even if data needs to be taken with a grain of salt – ) has opened new research opportunities focusing on mining such repositories for valuable insights on good software development practices, especially with respect to open source projects. We have performed a systematic literature review of these papers resulting in the selection of over 100 papers that have been analyzed and classified to detect the open research challenges in the software domain. Herein, we present a summary of this work, validating the need for this research proposal.
Published papers analyze software projects from different angles but mostly with a code-centric view, meaning that they focus their analysis on the projects’ source code by analyzing, for instance, (1) the use of programming languages (e.g., , ), (2) the type of license they apply (e.g., , ), (3) the folder structure of the project  or (4) the potential vulnerabilities and complexity of the code (e.g., , ). Others focus on more methodological aspects covering testing practices (e.g., , ), refactoring (e.g., ) or pull requests (e.g., , ). This is also true for several European funded projects on OSS-related areas like MANCOOSI, OSSMETER or MARKOS.
Only a few works analyze the social part of the software development process, trying to understand how developers are internally organized and work together in the project. There are studies on the team diversity (e.g., , ) and composition (e.g., , ). Community dynamics are analyzed looking at the interactions between community members and the project or among members themselves. The former category includes works that analyze the first impression formation (e.g., ), using projects for hiring new people (e.g., ), onboarding (e.g., ) and social coding (e.g., use of the social services of GitHub to track activity in projects of interest ). The latter includes works studying the social and technical factors that motivate people to contribute to a given project (e.g., ), algorithms that recommend developers to open tasks (e.g., ) and their role in promoting together the project itself (e.g., ).
Based on the gaps detected in this literature review, evidence from existing projects and discussions with members of the OSS community, we can conclude that (open-source) software development faces the following open challenges:
- It is not as open as you would expect (the code is open, but the management and decision-making of the project are not, and we do not yet know why)
- It struggles to attract contributors, with most projects having only one or two.
- It is unable to manage its community efficiently
In this proposal we aim at developing original research contributions for each one of these challenges.
Disrupting (open source) software development implies shifting our main focus of attention from the analysis of code aspects in the software repository to the analysis of the people behind that code, either as developers, owners or users. Therefore, the main goal of this project can be stated as building:
A unified framework to transform software development into a real community-driven development process
with the benefits of a faster and higher-quality software production and, importantly, a better alignment with the needs of the community at large. The following figure tries to illustrate this change of perspective, highlighting how we go from the current developer centric view (kind of a meritocracy where only core developers have the right to decide) to a community that now collaborates together and has the tools it needs to manage this collaboration in an optimal way.
Developer-based vs Community-driven software development
This community-driven process will be enabled by borrowing and adapting to the software development field techniques from the domains of political science, sociology (e.g. social/behavioural informatics), economics and ecology that have been studying a diverse range of communities for centuries, and combining them with core software techniques for mining of software repositories, constraint solving  and language design, among several others.
More precisely, this main goal will be implemented through the following specific subgoals aimed at helping projects to: (G1) open all aspects of the project, defining a precise governance model setting up the foundations of this participative process, (G2) bring more participants in and diversify their profiles and (G3) optimize how they all collaborate together, regardless of their role. All this considering that (G4) projects do not thrive in isolation but are part of a project network. The final goal (G5) is to integrate all these techniques in one single unified community-driven development platform built as an extension of current code hosting services. A more detailed description and decomposition of each subgoal follows:
G1: Bring Transparency and Democracy to OSS development
Open source communities are not as open as they seem, as discussed above. Indeed, lack of transparency and anti-democratic practices can scare away potential contributors/users and hamper the project’s alignment with their needs. To overcome this situation we propose to:
- Employ software mining techniques to conduct a systematic study of current governance models in OSS projects. Complement it with interviews to project members to better understand the reasons behind those choices.
- Develop a domain-specific language to enable OSS projects to precisely define their governance model, extending the basic strategies covered in . Given their explicit definition, rules could even be automatically enforced and their execution registered for future traceability (e.g. who voted for this at that moment in time?).
- Adapt different democracy models (representative, direct, liquid, …) and other political systems to the specific context of OSS to empirically test the best model for OSS projects, depending on the project characteristics.
- Assist projects in transitioning to more democratic practices, if so desired by them. This may involve for instance the automatic suggestion of possible internal leaders (based on their repository activity) to represent groups of users in elections for intermediate technical committees in a representative democracy scenario. Aspects like the Gini index  for equality distribution and the quality of the online deliberation, inspired from will also play a role.
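As a rough illustration of the Gini index mentioned in the last item above, the sketch below computes it over per-member activity counts. Applying it to commit counts (rather than votes or any other measure) is only an assumption made for this example, and the numbers are invented.

```python
def gini(values):
    """Gini index of non-negative numbers: 0.0 means perfectly equal,
    values close to 1.0 mean activity is concentrated in very few members."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total == 0:
        return 0.0
    # Standard closed form on sorted data, with ranks starting at 1:
    # G = 2 * sum(rank * x) / (n * total) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical commit counts per contributor in two projects
balanced_project = [5, 5, 5, 5]
one_person_project = [0, 0, 0, 10]
print(gini(balanced_project), gini(one_person_project))
```

A project could track this value over time: a steadily rising index would suggest that participation is concentrating in fewer hands.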
G2: Attract new contributors to OSS projects
OSS projects need contributors to progress . A few large projects, like Linux, may rely on paid contributors but most depend on convincing external people to volunteer their time. Given that simpler strategies, like making the project more popular, are not enough , we propose to:
- Develop goal models  for each participant profile in OSS to better understand their motivations.
- Propose innovative contribution models. We believe OSS can be regarded as an example of a matching market (markets where money is not the main factor ) and therefore we can adapt retribution strategies successful in other matching markets to the OSS one. Examples would be to replicate the idea of time banks or donor chains (I help you if you help somebody that can help me).
- Apply gamification principles to OSS to increase the level of contribution of current members.
- Identify potential new contributors that have the skills an OSS project is looking for by analyzing and cross-profiling people’s public profiles and behavior in social networks, reusing expert finding techniques like   , . This may also be used to reduce the gender gap  and increase team diversity.
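One possible reading of the donor-chain idea in the list above (I help you if you help somebody that can help me) is a cycle in a "who can help whom" graph. The sketch below is only an illustration under that assumption; the `can_help` mapping, the function name and the people in it are hypothetical.

```python
def find_help_cycle(can_help, start):
    """Return a list of people forming a chain start -> ... -> start,
    where each person helps the next one, or None if no chain closes."""
    def dfs(node, path, visited):
        for nxt in can_help.get(node, []):
            # The chain closes when someone on it can help the starting person
            if nxt == start and len(path) > 1:
                return path
            if nxt not in visited:
                visited.add(nxt)
                found = dfs(nxt, path + [nxt], visited)
                if found:
                    return found
        return None

    return dfs(start, [start], {start})


# Hypothetical data: "alice": ["bob"] means alice has skills bob's project needs
can_help = {
    "alice": ["bob"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["alice"],
}
print(find_help_cycle(can_help, "alice"))
print(find_help_cycle(can_help, "dave"))
```

Here alice, bob and carol form a closed chain, while dave can help alice but nobody in the graph can help dave back, so no chain starting from him closes.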
G3: Optimize internal project collaborations
Effective collaboration requires more than setting up theoretical good conditions for it. A continuous monitoring of the community structure and the exchanges taking place among its members would allow detecting and fixing early on possible bottlenecks in the communication. In particular we propose to:
- Visualize the community network as a typed directed multigraph (where edges would denote several kinds of interactions between the members) and adapt well-known graph-based algorithms to identify subcommunities, leaders, low density areas and so on. Then project owners can react to address the issues detected, e.g. by “building bridges” between the subcommunities or inviting people to especially sparse areas in the project.
- Define acceptable thresholds and ranges for some social metrics in OSS (e.g. bus factor  or the ratio between external and internal contributors) depending on the project size and domain to evaluate the “health” of the community. The ranges would come from the analysis of a representative set of “successful” projects and typical values in other fields like human ecology.
- Adapt review aggregator and sentiment analysis techniques to summarize long conversational exchanges so that everybody can easily follow relevant project discussions.
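Two of the ideas in the list above can be sketched in a few lines. This is a minimal illustration, not the proposal's method: subcommunities are approximated here by plain connected components (ignoring edge direction), and the bus factor is taken to be the smallest number of authors covering more than half of the commits, which is just one of several definitions in use. All names and data are made up.

```python
from collections import defaultdict


def subcommunities(edges):
    """Group members that are connected by any interaction.
    Each edge is (source, target, interaction_type); parallel edges are allowed."""
    neighbours = defaultdict(set)
    for source, target, _kind in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen, groups = set(), []
    for member in sorted(neighbours):
        if member in seen:
            continue
        group, stack = set(), [member]
        while stack:  # iterative depth-first traversal of one component
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        groups.append(group)
    return groups


def bus_factor(commits_per_author, share=0.5):
    """Smallest number of top authors whose commits exceed `share` of the total."""
    total = sum(commits_per_author.values())
    covered = 0
    ranked = sorted(commits_per_author.values(), reverse=True)
    for position, commits in enumerate(ranked, start=1):
        covered += commits
        if covered > share * total:
            return position
    return 0


# Hypothetical interactions: two edges of different types between ana and ben
edges = [
    ("ana", "ben", "comment"),
    ("ana", "ben", "review"),
    ("cy", "dee", "mention"),
]
print(subcommunities(edges))
print(bus_factor({"ana": 60, "ben": 30, "cy": 10}))
```

In this toy network the two groups never interact, which is the kind of gap a project owner might want to bridge.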
G4: Take Cross-project dependencies into account
Projects do not grow in isolation. All the dimensions described above need to be extended to deal with cross-project interactions since project dependencies take place not only at the technical level but at the human level : projects compete for the same resources (e.g. developers’ time) and have cascade effects on each other. I will model this as a constraint optimization problem  aimed at finding an optimal assignment of resources to projects.
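To make the constraint optimization framing a bit more concrete, here is a toy sketch in which each developer is assigned to exactly one project, every project needs a minimum number of developers, and the objective is the total "fit" score. The exhaustive search stands in for a real constraint solver, and the developers, projects, scores and function name are all assumptions made for the example.

```python
from itertools import product


def best_assignment(developers, projects, fit, min_staff):
    """Try every developer-to-project assignment and keep the feasible one
    with the highest total fit. Only practical for very small instances."""
    best_score, best_plan = None, None
    for choice in product(projects, repeat=len(developers)):
        # Constraint: each project must reach its minimum staffing level
        if any(choice.count(p) < min_staff[p] for p in projects):
            continue
        score = sum(fit[(dev, proj)] for dev, proj in zip(developers, choice))
        if best_score is None or score > best_score:
            best_score, best_plan = score, dict(zip(developers, choice))
    return best_score, best_plan


# Hypothetical instance: three developers competing across two projects
developers = ["ana", "ben", "cy"]
projects = ["parser", "ui"]
min_staff = {"parser": 1, "ui": 1}
fit = {
    ("ana", "parser"): 3, ("ana", "ui"): 1,
    ("ben", "parser"): 2, ("ben", "ui"): 2,
    ("cy", "parser"): 3, ("cy", "ui"): 1,
}
print(best_assignment(developers, projects, fit, min_staff))
```

A realistic model would split developers' time across projects and account for cascade effects between them, which is where a dedicated solver becomes necessary.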
G5: Building a community-driven software development platform
All techniques described above will be implemented and released as part of an online collaborative platform. Once built, this platform will enable a software community at large to effectively participate in the development process according to the practices and principles developed in the project. The platform will be built by ourselves as part of the project but following the “eat your own dog food” principle, it will also be released as an open source project in itself and therefore open to contributions and suggestions from the open source community. To avoid reinventing the wheel, the platform will be built on top of GitHub (or another similar hosting platform) and provide connectors with external add-ons (e.g. forums, mailing lists, external bug trackers) to be used as additional information sources for the analysis tasks of the project.
Timing and adequacy of the proposal
Open source is reaching its tipping point where, more than ever, even the most powerful tech companies and entrepreneurs are embracing open source  while the number of projects grows exponentially (GitHub went from 10M projects to over 30M in two years) alongside their impact on the global economy and society. And the OSS community itself is quickly realizing that at this scale, better collaboration is a must (e.g. see this open letter  to GitHub promoted by a group of maintainers of OSS projects frustrated with the limited collaboration capabilities of the platform).
This justifies the importance of this research proposal even if it is a challenging one due to its multidimensional and cross-disciplinary perspective, that requires mixing a wide variety of research techniques coming from both the software realm and social sciences. This increases the risk of the project but at the same time opens the door to promising novel research works in the intersection of several areas. I believe I am in a unique position to take this opportunity given:
- My broad range of research interests and background (in software modeling , including goal modeling , formal methods , software analysis and mining , domain specific languages  and different kinds of empirical studies e.g. , to give a few examples ) covering the skill set required by the project.
- My preliminary work on some of the research topics, e.g. the first version of a specific language for governance of OSS projects  or our study of the problems in attracting contributors , plus expertise on conducting research on software mining and the GitHub platform (e.g. , ).
- My long term interest in several open source communities. Beyond GitHub, we are deeply involved in the Eclipse open source community (see ) and I am personally involved in the WordPress ecosystem .
- My research environment, which is especially suited to conducting interdisciplinary research (see the risks section)
Achieving the above goals in CODE will benefit the whole software development community and our society in general. Users/citizens are empowered to have a more active participation and influence in the project evolution; contributors know in advance how their effort will be evaluated and dealt with; and project owners get the tools to attract more contributors and better manage the community to speed up the development process. But CODE will also benefit other communities. Here we describe the potential impact of CODE in and beyond OSS development:
Scientific impact: Transforming software development. The techniques developed in the project will have a substantial impact in the way that software projects are developed, analyzed and evaluated and will shed some light on the reasons why some projects are successful while others are not. I am confident that this project can open a new area of research where more and more knowledge from other completely different fields is deemed useful in Software Engineering and brought to it, something that so far has been done only occasionally.
Impact in proprietary software development. Private companies can benefit from many of the techniques developed as part of this, e.g. to evaluate the performance of their employees or get feedback from users. In fact, it has been shown that adopting OSS practices, a process called inner source, is beneficial for companies .
Outside the software world: impact on organizations. The work on formalization and monitoring of governance models (goal 1) is of interest for any kind of organization that wants to be transparent. Moreover, many of the social analysis techniques (goal 3) could be easily redefined to be applied on other communication platforms (e.g. forums, email threads) and not just on software-specific repositories. For instance, modeling the governance of NPO/NGO organizations could help us evaluate and compare their openness. Same for political parties and even countries.
Helping other research projects. A key long-term impact of the project should be its contribution to accelerating the advance of research in the field. Therefore, as part of the project, I will have as an explicit goal the development of a series of artefacts useful to other research teams. For example, we will develop a representative sample builder  of projects in GitHub to be used as a benchmark when comparing results of different research works.
Methodology & risk assessment
CODE will adhere to the Design-Science Research (DSR) paradigm . DSR is a problem-solving paradigm for activities dealing with the construction and evaluation of technology artifacts as well as the development of their associated research theories. Besides, CODE will make extensive use of empirical research methods both quantitative (e.g. in the automatic mining of repositories) and qualitative (e.g. semi-structured interviews to gather the motivation and requirements of participants in OSS projects and validate the results). The project will be conducted in an incremental and iterative manner  where at each iteration new advances in each of the project goals will be achieved. Validation of project advancement will be performed at the end of each iteration via the practitioners board (see “Resources” section) and via the automatic measurement of pre and post values of a number of metrics for a set of benchmark projects (both existing and created from scratch to be used as guinea pigs) monitored during the full duration of CODE.
Sketch of the work plan.
This four-year project will be divided as follows. An initial work package (WP0) will set up the project infrastructure and compile the initial set of projects to be used as benchmark. WP1-5 will focus on goals 1-5 above, respectively. Dissemination of results (WP6) will be an ongoing activity. This simplified Gantt diagram summarizes the work plan:
This research project has an interdisciplinary nature and covers a broad spectrum of techniques, which clearly increases its inherent risks. Nonetheless, my profile and that of my research environment make us a good fit for this project (see sect. 4) and will contribute to mitigating those risks and ensuring the project’s viability. Main risks and mitigation measures:
Broad range of research techniques required to accomplish the project goals (Probability: Low / Impact: Low). I have some previous experience with all the required techniques. Other members of the team will contribute also their strong technical skills in some of these areas minimizing this risk.
Cross-disciplinary nature of the project (Probability: Low / Impact: Medium). My institution’s name is “Internet Interdisciplinary Institute”, meaning that it has interdisciplinarity at its heart and favours as much as possible cross-domain scientific exchanges. A project like this is, then, a perfect fit for the institution and its strengths, and will have its complete endorsement and network of researchers to complement our skills and knowledge.
Dependency on open source repositories to get the data needed for the analysis (Probability: Low / Impact: Low). The project has a technical dependence on GitHub as the dominant code hosting platform nowadays. However, if GitHub decides to close down or change its business model, others (Bitbucket, Google Code,…) will immediately take the opportunity to fill this market and we could easily adapt to their platforms to continue the project.
Little engagement of the OSS community, especially to test and validate the results of our research (Probability: Low / Impact: Medium). I have been able to recruit industrial participants in the past using my blog as a medium. We can also ensure the involvement of our many contacts in the GitHub, WordPress and Eclipse communities. Besides, we are already discussing (e.g.) these research ideas in the open to gauge the interest of the community (also clearly expressed in this kind of initiatives, e.g. ) and learn their main concerns.
Resources & budget
I, as PI, will dedicate 70% of my time to CODE during the whole length of the project and will benefit from the support of my research team (ten members right now). Additionally, and given the cross-disciplinary nature of the project, I have assembled a scientific advisory board with experts from the areas of political science, sociology, psychology and ecology to have regular discussions on the project status and evolution. These are local experts from my affiliated institutions with whom I have already discussed this proposal and have confirmed their interest in joining the advisory board. Also, a professional advisory board with participants with different roles in relevant OSS projects will be constituted with over 20 volunteers recruited already. Beyond monitoring the evolution of the project and giving their opinion on it, their mission will be to validate and apply on their projects the outcomes of CODE.
The total budget requested is 1.599.697,53€, covering the hiring of 3 postdocs and 3 PhD students (mixing computer science and social science profiles in both categories) and 2 technicians for the duration of the project, plus funding for research stays, trips for presenting results, event organization and equipment.
 Report of an industry expert group invited by the European Commission to give their advice on the European software strategy ftp://ftp.cordis.europa.eu/pub/fp7/ict/docs/ssai/European_Software_Strategy.pdf
 This is also a key principle of agile methodologies that have been massively adopted by software teams in the last years but at a small scale.
 GitHub is the most used web-based collaborative development platform for OSS projects, offering a series of services, like issue trackers and access-control user management, on top of free Git repository for version control and now hosting over 30 million projects
 Full list of analyzed projects: https://docs.google.com/spreadsheets/d/1q4z6Z1iNcHCuBbznFK3xZ-fDu8UXp5-sjHF2IqWgmq0/edit?usp=sharing
 A governance model describes the roles that project participants can take on and the process for decision making within the project (OSS watch)
 We are not implying that all OSS projects should be democratic but we strongly believe that this is an aspect that deserves attention.
 A fork happens when a group of developers take a copy of the source code of a project and use it to create an independent version of the original project, evolving independently (and therefore at the risk of causing a split in the community behind the project if not merged back later on).
 Even if, for whatever reason, a certain project is NOT looking for contributors, stating this clearly (transparency) would avoid misunderstandings.
 A domain-specific language (DSL) is a language specifically designed to express solutions to problems in a specific domain. This is in contrast with general languages (like Java or UML) that aim to be used in any domain.
 Gamification: Use of game elements (like badges, points or levels) in serious environments
 Tipping point: a point in time when a group rapidly and dramatically changes its behavior by widely adopting a previously rare practice 
 Number of citations, h-index and i10 index data taken from Google Scholar. Citations include self-citations.
 Only 11 of those 143 publications co-authored with my thesis supervisor
 Our tools publicly available on GitHub: https://github.com/SOM-Research
 R. Schuwer, M. van Genuchten, and L. Hatton, “On the Impact of Being Open,” IEEE Software, vol. 32, no. 5, pp. 81–83, Sep. 2015.
 E. S. Raymond, The Cathedral and the Bazaar. O’Reilly Media, 2001.
 A. Sutcliffe and N. Mehandjiev, “End-user development: tools that empower users to create their own software solutions – Special issue,” Communications of the ACM, vol. 47, no. 9, p. 31, Sep. 2004.
 C. M. Schweik and R. C. English, Internet Success: A Study of Open-Source Software Commons. The MIT Press, 2012.
 F. P. Brooks Jr., “No Silver Bullet: Essence and Accidents of Software Engineering,” Computer, vol. 20, no. 4, pp. 10–19, Apr. 1987.
 C. Bird, P. Rigby, and E. Barr, “The promises and perils of mining git,” in 6th International Working Conference on Mining Software Repositories, 2009, pp. 1–10.
 J. Howison and K. Crowston, “The perils and pitfalls of mining SourceForge,” in Proc. of Workshop on Mining Software Repositories, 2004, pp. 7–11.
 E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D. M. German, and D. Damian, “The promises and perils of mining GitHub,” in 11th Working Conference on Mining Software Repositories, 2014, pp. 92–101.
 B. Vasilescu, A. Serebrenik, and V. Filkov, “A Data Set for Social Diversity Studies of GitHub Teams,” in 12th Working Conference on Mining Software Repositories, 2015, pp. 514–517.
 T. F. Bissyande, F. Thung, D. Lo, L. Jiang, and L. Reveillere, “Popularity, Interoperability, and Impact of Programming Languages in 100,000 Open Source Projects,” in 37th Annual IEEE Computer Software and Applications Conference, 2013, pp. 303–312.
 P. Mayer and A. Bauer, “An empirical analysis of the utilization of multiple programming languages in open source projects,” in 19th International Conference on Evaluation and Assessment in Software Engineering, 2015, no. November, pp. 1–10.
 C. Vendome, “A Large Scale Study of License Usage on GitHub,” in 37th IEEE/ACM International Conference on Software Engineering, Volume 2, 2015, pp. 2–4.
 C. Vendome, M. Linares-Vásquez, G. Bavota, M. Di Penta, D. German, and D. Poshyvanyk, “License usage and changes: A largescale study of java projects on github,” in ICPC conf., 2015, pp. 218–228.
 J. Zhu, M. Zhou, and A. Mockus, “The Relationship Between Folder Use and the Number of Forks: A Case Study on Github Repositories,” in 2014 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, 2014, p. 30.
 R. Coleman and M. A. Johnson, “Power-Laws and Structure in Functional Programs,” in 2014 International Conference on Computational Science and Computational Intelligence, 2014, pp. 168–172.
 K. Achuthan, S. Sudharavi, R. Kumar, and R. Raman, “Security Vulnerabilities in Open Source Projects: An India Perspective,” in 2nd International Conference on Information and Communication Technology, 2014, pp. 18–23.
 P. S. Kochhar, T. F. Bissyande, D. Lo, and L. Jiang, “Adoption of Software Testing in Open Source Projects–A Preliminary Study on 50,000 Projects,” in 17th European Conference on Software Maintenance and Reengineering, 2013, pp. 353–356.
 R. Pham, L. Singer, O. Liskin, F. F. Filho, and K. Schneider, “Creating a shared understanding of testing culture on a social coding site,” in 35th International Conference on Software Engineering, 2013, pp. 112–121.
 G. Destefanis and M. Ortu, “Position Paper: Are Refactoring Techniques Used by Developers? A Preliminary Empirical Analysis,” in REFTEST workshop, 2014.
 G. Gousios, M. Pinzger, and A. van Deursen, “An Exploratory Study of the Pull-based Software Development Model,” in 36th International Conference on Software Engineering, 2014, pp. 345–355.
 Y. Yu, H. Wang, V. Filkov, P. Devanbu, and B. Vasilescu, “Wait For It: Determinants of Pull Request Evaluation Latency on GitHub,” in 12th IEEE/ACM Working Conference on Mining Software Repositories, 2015, pp. 367–371.
 A. Lima, L. Rossi, and M. Musolesi, “Coding together at scale: GitHub as a collaborative social network,” in 8th AAAI International Conference on Weblogs and Social Media, 2014, pp. 295–304.
 B. Vasilescu, V. Filkov, and A. Serebrenik, “Perceptions of Diversity on GitHub: A User Survey,” CHASE Workshop, 2015.
 M. Y. Allaho and W.-C. Lee, “Trends and behavior of developers in open collaborative software projects,” in 2014 International Conference on Behavior, Economic and Social Computing, 2014, pp. 1–7.
 P. Loyola and I.-Y. Ko, “Biological Mutualistic Models Applied to Study Open Source Software Development,” in 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, 2012, vol. 1, pp. 248–253.
 E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D. M. German, and D. Damian, “An in-depth study of the promises and perils of mining GitHub,” Empirical Software Engineering, Sep. 2015.
 M. Y. Allaho and W.-C. Lee, “Trends and behavior of developers in open collaborative software projects,” in 2014 International Conference on Behavioral, Economic, and Socio-Cultural Computing (BESC2014), 2014, pp. 1–7.
 J. Marlow, L. Dabbish, and J. Herbsleb, “Impression Formation in Online Peer Production: Activity Traces and Personal Profiles in GitHub,” in 16th ACM Conference on Computer Supported Cooperative Work, 2013, pp. 117–128.
 J. Marlow and L. Dabbish, “Activity traces and signals in software developer recruitment and hiring,” in 16th ACM Conference on Computer Supported Cooperative Work, 2013, pp. 145–156.
 F. Fagerholm, A. Sanchez Guinea, J. Borenstein, and J. Munch, “Onboarding in Open Source Projects,” IEEE Software, vol. 31, no. 6, pp. 54–61, Nov. 2014.
 F. Thung, T. F. Bissyande, D. Lo, and L. Jiang, “Network Structure of Social Coding in GitHub,” in 17th European Conference on Software Maintenance and Reengineering, 2013, pp. 323–326.
 J. Tsay, L. Dabbish, and J. Herbsleb, “Influence of social and technical factors for evaluating contribution in GitHub,” in 36th International Conference on Software Engineering, 2014, pp. 356–366.
 J. Xavier and A. Macedo, “Understanding the popularity of reporters and assignees in the Github,” in 26th International Conference on Software Engineering and Knowledge Engineering, 2014, pp. 484–489.
 J. Jiang, L. Zhang, and L. Li, “Understanding project dissemination on a social coding site,” in 20th Working Conference on Reverse Engineering, 2013, pp. 132–141.
 K. Apt, Principles of Constraint Programming. Cambridge University Press, 2003.
 J. L. Canovas Izquierdo and J. Cabot, “Enabling the Definition and Enforcement of Governance Rules in Open Source Systems,” in 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, 2015, vol. 2, pp. 505–514.
 L. Ceriani and P. Verme, “The origins of the Gini index: extracts from Variabilità e Mutabilità (1912) by Corrado Gini,” The Journal of Economic Inequality, vol. 10, no. 3, pp. 421–443, Jun. 2011.
 D. Friess and C. Eilders, “A model for assessing online deliberation. Towards a more complex approach to measure and explain deliberativeness online,” in The Internet, Policy & Politics Conferences, 2014.
 L. Dabbish, C. Stuart, J. Tsay, and J. Herbsleb, “Social coding in github: transparency and collaboration in an open software repository,” in 15th ACM Conference on Computer Supported Cooperative Work, 2012, pp. 1277–1286.
 R. Padhye, S. Mani, and V. S. Sinha, “A study of external community contribution to open-source projects on GitHub,” in Proceedings of the 11th Working Conference on Mining Software Repositories – MSR 2014, 2014, pp. 332–335.
 J. L. Cánovas Izquierdo, V. Cosentino, and J. Cabot, “Popularity will NOT bring more contributions to your OSS project,” Journal of Object Technology, vol. 14, no. 4, 2015.
 A. van Lamsweerde, “Goal-oriented requirements engineering: a guided tour,” in 5th IEEE International Symposium on Requirements Engineering, 2001, pp. 249–262.
 A. E. Roth, Who Gets What — and Why: The New Economics of Matchmaking and Market Design. Eamon Dolan/Houghton Mifflin Harcourt, 2015.
 A. Bozzon, M. Brambilla, S. Ceri, M. Silvestri, and G. Vesci, “Choosing the right crowd,” in Proceedings of the 16th International Conference on Extending Database Technology – EDBT ’13, 2013, pp. 637–648.
 F. Wiedemann, R. Sontag, and M. Gaedke, “NeLMeS: Finding the Best Based on the People Available Leveraging the Crowd,” in 15th International Conference on Web Engineering, 2015, vol. 9114, pp. 687–690.
 B. Vasilescu, V. Filkov, and A. Serebrenik, “StackOverflow and GitHub: Associations between Software Development and Crowdsourced Knowledge,” in 2013 International Conference on Social Computing, 2013, pp. 188–195.
 L. Singer, F. Figueira Filho, and M.-A. Storey, “Software engineering at the speed of light: how developers stay current using twitter,” in 36th International Conference on Software Engineering, 2014, pp. 211–221.
 D. N. Beede, T. A. Julian, D. Langdon, G. McKittrick, B. Khan, and M. E. Doms, “Women in STEM: A Gender Gap to Innovation,” Economics and Statistics Administration, no. Issue Brief No. 04–11, Aug. 2011.
 V. Cosentino, J. L. C. Izquierdo, and J. Cabot, “Assessing the bus factor of Git repositories,” in 22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering, 2015, pp. 499–503.
 T. Mens and P. Grosjean, “The Ecology of Software Ecosystems,” Computer, vol. 48, no. 10, pp. 85–87, Oct. 2015.
 K. Apt, “Principles of Constraint Programming,” Sep. 2003.
 C. Metz, “Open Source Software Went Nuclear This Year | WIRED,” Wired, 2015.
 “Dear GitHub – An open letter from the maintainers of open source projects.” [Online]. Available: https://github.com/dear-github/dear-github.
 M. Brambilla, J. Cabot, and M. Wimmer, Model-Driven Software Engineering in Practice, vol. 1. Morgan & Claypool Publishers, 2012.
 H. C. Esfahani, J. Cabot, and E. Yu, “Adopting agile methods: Can goal-oriented social modeling help?,” Research Challenges in Information Science (RCIS), 2010 Fourth International Conference on, 2010.
 J. Cabot, R. Clarisó, and D. Riera, “On the verification of UML/OCL class diagrams using constraint programming,” Journal of Systems and Software, vol. 93, pp. 1–23, Jul. 2014.
 C. A. González and J. Cabot, “Formal verification of static software models in MDE: A systematic review,” Information and Software Technology, vol. 56, no. 8, pp. 821–838, Aug. 2014.
 V. Cosentino, J. L. C. Izquierdo, and J. Cabot, “Assessing the bus factor of Git repositories,” in 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), 2015, pp. 499–503.
 V. Cosentino, J. L. Cánovas Izquierdo, and J. Cabot, “Gitana: A SQL-Based Git Repository Inspector,” in 34th International Conference on Conceptual Modeling, ER 2015, 2015, vol. 9381, pp. 329–343.
 R. Tairas and J. Cabot, “Corpus-based analysis of domain-specific languages,” Software & Systems Modeling, vol. 14, no. 2, pp. 889–904, Jun. 2013.
 D. Ameller, C. Ayala, J. Cabot, and X. Franch, “Non-functional Requirements in Architectural Decision Making,” IEEE Software, vol. 30, no. 2, pp. 61–67, Mar. 2013.
 H. Brunelière and J. Cabot, “On Developing Open Source MDE Tools: Our Eclipse Stories and Lessons Learned,” in OSS4MDE@MoDELS 2014, 2014, pp. 9–19.
 J. Cabot, “Looking at WordPress through the eyes of a Software Researcher.” WordCamp Europe, 2015.
 K.-J. Stol and B. Fitzgerald, “Inner Source–Adopting Open Source Development Practices in Organizations: A Tutorial,” IEEE Software, vol. 32, no. 4, pp. 60–67, Jul. 2015.
 M. Nagappan, T. Zimmermann, and C. Bird, “Diversity in software engineering research,” in Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2013, p. 466.
 A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design science in information systems research,” MIS Quarterly, vol. 28, no. 1, pp. 75–105, Mar. 2004.
 C. Larman and V. R. Basili, “Iterative and incremental developments. a brief history,” Computer, vol. 36, no. 6, pp. 47–56, Jun. 2003.
 M. Gladwell, The Tipping Point: How Little Things Can Make a Big Difference. Back Bay Books, 2002.
 T. C. Schelling, Micromotives and Macrobehavior. W. W. Norton & Company, 2006.
Section b: Curriculum vitae (max. 2 pages)
Family name, First name: Cabot, Jordi Date of birth: 11th September, 1978 – Nationality: Spanish
URL for web site: http://jordicabot.com Full CV: http://modeling-languages.com/FullCV
Researcher unique identifier(s): Blog, DBLP, ResearchGate, ORCID, Twitter, LinkedIn, MetaScience, ResearcherID
My research falls into the broad area of software engineering, with emphasis on model-based engineering, formal methods and software analytics. After finishing my PhD in 2006 in Spain, I did a postdoc in Canada and later became one of the youngest researchers ever to lead an INRIA team in France. After five years there, I am now back in Spain, again as one of the youngest ICREA research professors, after winning a highly competitive selection process (6.2% of 159 applicants were offered a position in my year). After leading the AtlanMod research team (2010-2015, 15-20 people overall), I am now building my new group, SOM, at UOC (around 10 people at the moment).
Some basic (quantitative) research KPIs summarizing my research career:
|Total peer-reviewed publications:
||Number of citations:
||Number of co-authors:
2012 French Habilitation (HdR – Habilitation à diriger des recherches, “accreditation to supervise research”). Dissertation: “MDE 2.0 : Pragmatical formal model verification and other challenges“. École des Mines de Nantes (France)
2006 Ph.D in Computer Science (European Mention) from the Technical University of Catalonia (Software program, LSI Department). Dissertation: “Incremental Integrity Checking in UML/OCL Conceptual Schemas”. Advisor: Dr. Ernest Teniente
2002 BSc Degree in Informatics Engineering. Technical University of Catalonia
- CURRENT and PREVIOUS POSITION(S)
2015 – present ICREA Research Professor at Internet Interdisciplinary Institute (IN3-UOC), Spain. Leader of the SOM Research team, currently composed of 10 people.
2010 – 2015 INRIA Research Chair and Associate Professor at the École des Mines de Nantes (EMN, France). Leader of the AtlanMod joint research team (around 15-20 members) since July 2010
2008 – 2010 Post-doctoral fellow at the Software Engineering Group, University of Toronto (Canada)
2004 – 2008 Senior lecturer at the Open University of Catalonia, Spain.
2002 – 2004 Associate lecturer at the “Caixa d’Estalvis de Terrassa” Business College.
2000 – 2002 Associate lecturer at the Mataró School of Engineering.
- FELLOWSHIPS AND PERSONAL GRANTS
2011-2014 Grant from the Pays de la Loire Region (France) to build a new research team. Call: Soutenir et accompagner la constitution de nouvelles équipes sur des thématiques émergentes. Total funding: 196.000 euros
2008-2009. Catalan Government grant “Beatriu de Pinós”. Grant covering two years of salary corresponding to my post-doctoral stay at the University of Toronto.
2006 Catalan Government travel grant to cover the expenses of my pre-doctoral research stay at the Politecnico di Milano.
- SUPERVISION OF GRADUATE STUDENTS AND POSTDOCTORAL FELLOWS
I have supervised seven PhD students (four completed, three ongoing at the moment) and five postdoctoral fellows.
- TEACHING ACTIVITIES (highlights)
In my career I have taught and/or coordinated undergrad courses on a number of topics (programming, databases, software engineering) and environments (elearning, blended learning and face-to-face). I have also co-authored an introductory book on model-driven engineering used in around 80 institutions around the world and participated in three other course-oriented books. I have been the director of the International post-graduate specialization Diploma in Model Driven Engineering (MDE) (2010-2011) and PI of the École des Mines node in the Lifelong learning European project “Exchanging knowledge, techniques and experiences in Model Driven Engineering education”.
- INTERNATIONAL AND NATIONAL COLLABORATIVE PROJECTS
I have participated in a considerable number of research projects in different calls (FP7 IP, FP7 STREPs, Lifelong learning, ARTEMIS,…). The following table summarizes the number of projects and the funding received during the last 5 years. Column Budget-global shows the total budget of the project, Budget-team/me indicates the amount corresponding to the team's funding, and man-month shows how that amount translated into new hires for the team. Only projects where I was either the PI or the PI of the national node are included.
|Type of Project
||405 (33.75 yrs)
- MAJOR COLLABORATIONS AND NETWORKS
Among my more than 100 different co-authors, I would like to highlight the following four long-running collaborations:
- Database Group. Politecnico di Milano (Italy). We have a long-term relationship collaborating on web engineering topics. We have been partners in two European projects plus transfer actions
- Business Informatics Group at TU Wien (Austria). We have worked together on a number of core modeling challenges, co-authored a teaching book and participated in the ARTIST IP European Project
- AtlanMod at Inria (France). I led the team for five years. We are still in close contact and keep collaborating on several research lines and projects/proposals.
- Miso group at Autonomous University of Madrid (Spain). We have collaborated for over 8 years on model verification topics. We are also partners in the FP7 STREP Mondo project and national networks.
- ORGANISATION OF SCIENTIFIC MEETINGS
PC Chair: PC Chair for three major conferences specialized in the field of model-driven engineering, including the most important one: the 18th Int. Conf. on Model Driven Engineering Languages and Systems (Models) in 2015, the European Conference on Modeling Foundations and Applications (ECMFA) in 2014 and the Int. Conf. on Model Transformation (ICMT) in 2011.
Other roles: Social Media chair at Models 2014, Tutorial Chair at Models 2013, Posters and Demo Chair at ICWE 2014, Publicity Chair and PhD Workshop chair at ER 2008, co-organizer of the first ever Doctoral Symposium in the UML conference (2004)
Workshops: Started the CloudMDE (Cloud computing meets MDE) series of workshops (three editions so far) and the MELO workshop (MDE meets logic programming, one edition so far, second to come this year). Co-organizer of the OCL workshop for five years
- COMMISSIONS OF TRUST (if applicable)
Occasional reviewer for the following national research councils: Luxembourg, France, Spain, Netherlands, UK, Argentina, Israel and Austria
Member of the Editorial board of IEEE Software (from 2015, Initiatives team), the Computer Languages, Systems & Structures Journal (from 2015), the Software and Systems Modeling Journal (from 2013), the Journal of Object Technology (from 2012) and the Journal of Information Modeling and Design (2011-2013)
Member of the Steering Committee of the Models Conference and the Int. Conference on Model Transformation
Reviewer for all major software engineering journals.
Program Committee member in many software engineering conferences like ASE, Models, CAiSE, ER, WWW, ICWE, ICMT, ECMFA,… plus a number of related workshops
Frequent member of PhD Thesis Juries (e.g. seven in 2015 for French, Italian and Spanish PhD Students)
Section c: Early achievements track-record (max. 2 pages)
My research career started with a PhD Thesis in the field of model-driven engineering (in short, a paradigm that promotes the rigorous use of software models as key elements in all software engineering activities), a core domain in which I still work. In addition, at the end of my thesis I opened a new research line combining software models and constraint programming to provide a “lightweight” approach to the software formal verification challenge. Recently, I have developed a growing interest in analyzing not only the software itself but also the community behind it as a key aspect to improve productivity and quality in software engineering. In the following I summarize my achievements in these three areas, including links to a selection of research publications at the end.
AREA 1 – MODEL-DRIVEN ENGINEERING AREA
Models allow the specification of complex systems at a high level of abstraction. In my role as leader of the AtlanMod team, I helped to build and grow a complete set of techniques for all kinds of model manipulation operations (language definition, model transformation, merging, code generation,…). I have also co-authored the most popular introductory book on MDE, now used in around 80 teaching institutions all over the world.
Main achievements in this area would be:
– Growing some of these techniques to make them the reference approach/tool in the area, like MoDisco [2,4] for model-based reverse engineering.
– Serving as PC Chair of the most important conference specialized in MDE (Models) plus two other conferences in the area (ICMT and ECMFA). Currently also serving on the Steering Committee of Models and ICMT.
– Organizing a number of international workshops in this domain.
– Participating in several European funded projects where we contributed our MDE knowledge to the consortium.
– Maintaining the most popular modeling blog, with over 1000 visits per day
– Coordinating a monograph on MDE for the magazine of the Spanish association of computer professionals
AREA 2 – BOUNDED VERIFICATION
Initially proposed by Hoare, the goal of automatically verified software is a well-known grand scientific challenge for computer science due to the exponential complexity of the problem. In our case, this correctness analysis starts at the model level, since errors in the models will propagate (and multiply) into errors in the final code. My goal, together with my collaborators, was to propose a solution that was at the same time automatic, expressive and easy to use by people with no formal methods expertise.
Main results and achievements in this area would be:
– Proposal of a bounded verification and testing approach as the ideal trade-off for model-based verification and validation  and its adaptation to various model manipulation techniques like model or graph transformations (e.g. ).
– Participation in a summer school on formal methods where I talked about the Object Constraint Language and how to verify OCL expressions. The talk has been viewed over 30.000 times on SlideShare.
– Development of the EMFtoCSP (and previously, UMLtoCSP) tool, reference tools in the UML/EMF verification field. For instance, in a 2013 paper in IEEE Transactions on Software Engineering, UMLtoCSP was described as the most widely used and referenced OCL constraint solver in the literature.
AREA 3 – SOFTWARE ANALYSIS
In my new team, SOM, I have recently added a new research line on software analytics to the two previous ones. I believe that understanding how successful (and failed) software projects are developed (and especially how the community behind those projects interacts to develop them) is a key factor in improving the productivity and quality of development teams.
Initial achievements in this area would be:
– A set of tools to collect metrics from open source projects hosted on the GitHub platform (like a tool to calculate the Bus Factor of a project or visualizations of community interactions grouped by the kind of issues they are working on).
– Empirical studies, like , to better understand how practitioners make decisions relevant to the design and evolution of a project
– Helping (open-source) software communities to better organize themselves, starting with .
- SELECTED PUBLICATIONS (citations from Google Scholar, excluding self-citations from all authors)
- Javier Luis Cánovas Izquierdo, Jordi Cabot: Enabling the Definition and Enforcement of Governance Rules in Open Source Systems. ICSE – Software Engineering in Society track: 505-514 (2015).
Citations: 0. Description: Proposal of a domain-specific language to help open source projects to be more transparent and clarify their governance policies
- Hugo Brunelière, Jordi Cabot, Javier Luis Cánovas Izquierdo, Leire Orue-Echevarria Arrieta, Oliver Strauß, Manuel Wimmer: Software Modernization Revisited: Challenges and Prospects. IEEE Computer 48(8): 76-80 (2015).
Citations: 0. Description: Article summarizing the results of the ARTIST IP EU Project, explaining to companies how to approach the modernization of their legacy applications.
- Robert Tairas, Jordi Cabot: Corpus-based analysis of domain-specific languages. Software and System Modeling 14(2): 889-904 (2015).
Citations: 7. Description: Empirical analysis of language usage by end-users as a way to improve the language specification itself
- Hugo Brunelière, Jordi Cabot, Grégoire Dupé, Frédéric Madiot: MoDisco: A model driven reverse engineering framework. Information & Software Technology 56(8): 1012-1032 (2014)
Citations: 25. Description: Methodology and tool support to perform a variety of reverse engineering tasks thanks to the decomposition of the problem in a set of model discovery plus model understanding activities.
- Jordi Cabot, Robert Clarisó, Daniel Riera: On the verification of UML/OCL class diagrams using constraint programming. Journal of Systems and Software 93: 1-23 (2014)
Citations: 15 (plus 163 for the original workshop paper of which this journal article is the extended version). Description: Bounded verification approach for the effective quality analysis of UML models using state-of-the-art constraint solvers
- David Ameller, Claudia P. Ayala, Jordi Cabot, Xavier Franch: Non-functional Requirements in Architectural Decision Making. IEEE Software 30(2): 61-67 (2013)
Citations: 19. Description: Empirical study by means of semi-structured interviews to analyze how software architects deal with non-functional requirements in the design of software architectures
- Marco Brambilla, Jordi Cabot, Manuel Wimmer: Model-Driven Software Engineering in Practice. Synthesis Lectures on Software Engineering, Morgan & Claypool Publishers 2012
Citations: 185. Description: Introductory book to the field of model-based engineering for both professionals and educators.
- Jordi Cabot, Martin Gogolla: Object Constraint Language (OCL): A Definitive Guide. Int. School on Formal Methods 2012: 58-90 (2012).
Citations: 46. Description: Extensive but didactic guide covering all aspects of this OMG standard language for constraint/rule specification on software models.
- Jordi Cabot, Raquel Pau, Ruth Raventós: From UML/OCL to SBVR specifications: A challenging transformation. Information Systems 35(4): 417-440 (2010)
Citations: 65. Description: Expressing software models in natural language (including business rules) to facilitate the validation of those models by the involved stakeholders.
- Jordi Cabot, Robert Clarisó, Esther Guerra, Juan de Lara: Verification and validation of declarative model-to-model transformations through invariants. Journal of Systems and Software 83(2): 283-302 (2010)
Citations: 116. Description: Formal definition of correctness properties for model transformations plus analysis of those properties for popular transformation languages
Featured image by Ron Rothbart