We already discussed that releasing a tool as open source does not guarantee that people will start contributing. In fact, far from it.
And this is not new. A few years ago, SourceForge representatives interviewed for “Tools for Teams: A Survey of Web-Based Software Project Portals” already mentioned that most projects hosted on SourceForge involved only one or two people.
What I wondered was whether this was a “social problem” or more of a “technical problem”. Maybe with better collaborative tools and new development methods this would change? If we make contributing to a project easier, would more people do it? In theory, contributing to an open-source project is now easier than ever, thanks to Git with its pull-based development method (in which you can fork any public repository, modify it and then request that your changes be merged back into the original project) and to code hosting environments like GitHub that facilitate setting up and monitoring Git-based projects. So the question is: has this helped to increase the number of contributions?
I’m not going to answer this question (I don’t have the answer), but I’ll share with you some data showing that, for whatever reason, if you see your open source project being completely ignored by everybody, you should not worry too much: this is in fact the norm.
We (myself, Javier and Valerio) have checked how many times each of the 7,388,726 public projects in GitHub (as of today, and counting only projects created after January 2012) has been forked, using GitHub Archive to get the data. As mentioned above, under the pull-based development method, forking is a necessary step for contributing to a project (though not every fork ends up as a contribution). There are other ways to contribute to a GitHub project without forking it, but forking is by far the most common method, so we will focus on it and ignore the rest (yes, I can hear you yelling: Threat to validity!!!).
The results are quite shocking (they follow the trend we discussed at the beginning but the numbers are “worse” than I anticipated).
The following two graphics show the distribution of GitHub projects according to the number of times they have been forked. The raw data behind these graphics is available in CSV and Excel.
The first one shows how quickly the number of projects decreases as the number of forks increases.
The second shows the same data as percentages of the total number of projects.
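For readers who want to try a similar count themselves, here is a minimal Python sketch of the idea, not the script we actually used. It assumes each GitHub Archive event is already parsed into a dictionary with a `type` field and a `repo.name` field (older archive files may use different field names), and that you have a separate list of all project names, since projects with zero forks never show up in fork events. The function names and the sample data are hypothetical.

```python
from collections import Counter


def forks_per_project(events, all_projects):
    """Count how many ForkEvent records each project has.

    `events` is an iterable of already-parsed event dicts (assumed shape:
    {"type": ..., "repo": {"name": ...}}). `all_projects` lists every
    project name, so never-forked projects are kept with a count of 0.
    """
    counts = Counter({name: 0 for name in all_projects})
    for event in events:
        if event.get("type") == "ForkEvent":
            counts[event["repo"]["name"]] += 1
    return counts


def fork_distribution(counts):
    """Map number of forks -> (number of projects, percentage of projects)."""
    total = len(counts)
    by_forks = Counter(counts.values())
    return {
        forks: (projects, 100.0 * projects / total)
        for forks, projects in sorted(by_forks.items())
    }


# Tiny made-up sample, only to show the shape of the input and output.
sample_events = [
    {"type": "ForkEvent", "repo": {"name": "a/x"}},
    {"type": "ForkEvent", "repo": {"name": "a/x"}},
    {"type": "PushEvent", "repo": {"name": "b/y"}},
    {"type": "ForkEvent", "repo": {"name": "c/z"}},
]
sample_projects = ["a/x", "b/y", "c/z", "d/w"]

for forks, (projects, pct) in fork_distribution(
    forks_per_project(sample_events, sample_projects)
).items():
    print(f"{forks} forks: {projects} projects ({pct:.1f}%)")
```

The first function corresponds to the absolute counts in the first graphic and the second to the percentages in the second one. Keeping the zero-fork projects in the counter matters: without them the percentages would be computed only over projects that were forked at least once.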
What are your thoughts? Are you surprised by these results? How would you improve these percentages?
UPDATE: see also the Reddit discussion around this data.