85% of projects in GitHub have never been forked


We already discussed that releasing a tool as open source does not guarantee that people will start contributing. In fact, far from it.

And this is not new. A few years ago, when interviewing SourceForge representatives for “Tools for Teams: A Survey of Web-Based Software Project Portals”, they already mentioned that most projects hosted on SourceForge involved only one or two people.

What I wondered was whether this was a “social problem” or more of a “technical problem”. Maybe with better collaborative tools and new development methods this would change? If we make contributing to a project easier, would more people do it? In theory, contributing to an open-source project is now easier than ever, thanks to Git with its pull-based development method (in which you can fork any public repository, modify it and then request that your changes be merged back into the main branch) and to code hosting environments like GitHub that facilitate setting up and monitoring Git-based projects. So the question is: has this helped to increase the number of contributions?

I’m not going to answer this question (I don’t have the answer), but I’ll share with you some data showing that, whatever the reason, if your open source project is completely ignored by everybody, you should not worry too much: this is in fact the norm.

We (myself, Javier and Valerio) used GitHub Archive to check how many times each of the 7,388,726 public projects in GitHub (as of today, counting only projects created after January 2012) has been forked. As mentioned above, under the pull-based development method, forking is a necessary step for contributing to a project (although not every fork ends up in a contribution). There are other ways to contribute to a GitHub project without forking it, but forking is by far the most common method, so we will focus on it and ignore the rest (yes, I can hear you yelling: Threat to validity!!!).
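For readers who want to try something similar, here is a minimal sketch of the counting step. It is not the script we used. It assumes the events are available as one JSON object per line, that fork events carry `"type": "ForkEvent"`, and that the project name sits under `repo.name`; those field names are assumptions and may differ depending on the period of the archive you download. The function name `count_forks` and the `all_projects` argument are made up for this illustration.

```python
import json
from collections import Counter


def count_forks(event_lines, all_projects):
    """Count fork events per project.

    event_lines: iterable of strings, one JSON event per line (assumed layout).
    all_projects: iterable of project names to report on, so that projects
                  that were never forked show up with a count of 0.
    """
    forks = Counter()
    for line in event_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("type") != "ForkEvent":
            continue  # only fork events matter here
        # Assumed field layout: {"type": ..., "repo": {"name": "owner/project"}}
        name = event.get("repo", {}).get("name")
        if name:
            forks[name] += 1
    # Forks of projects outside all_projects are ignored in the result.
    return {project: forks.get(project, 0) for project in all_projects}
```

Passing the full list of projects separately matters: a project that was never forked produces no fork event at all, so it would otherwise be missing from the counts.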

The results are quite shocking (they follow the trend we discussed at the beginning but the numbers are “worse” than I anticipated).


The following two graphics show the distribution of GitHub projects according to the number of times they have been forked. The raw data behind these graphics is available in Excel (GitHubProjectsForks) and CSV.

The first one shows the quickly decreasing number of projects as soon as we start increasing the number of forks.

[Figure: Distribution of GitHub projects based on the number of forks]

The second shows the same data as percentages.

[Figure: Percentage of GitHub projects by number of forks]
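If you have a list with the number of forks of each project, turning it into percentages like the ones plotted here takes only a few lines. This is a rough sketch, not our actual code; the function names `fork_distribution` and `share_with_at_least` are hypothetical, and the sample list in the usage example is toy data, not the real dataset.

```python
from collections import Counter


def fork_distribution(fork_counts):
    """Map each fork count to the percentage of projects with exactly that count."""
    total = len(fork_counts)
    if total == 0:
        return {}
    histogram = Counter(fork_counts)
    return {n: 100.0 * k / total for n, k in sorted(histogram.items())}


def share_with_at_least(fork_counts, threshold):
    """Percentage of projects forked at least `threshold` times."""
    total = len(fork_counts)
    if total == 0:
        return 0.0
    return 100.0 * sum(1 for n in fork_counts if n >= threshold) / total


# Toy data for illustration only: five projects and their fork counts.
sample = [0, 0, 0, 1, 12]
print(fork_distribution(sample))
print(share_with_at_least(sample, 10))
```

The percentage of never-forked projects is simply the entry for 0 in the distribution.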

Only 0.35% of projects in GitHub have been forked 10 times or more.

What are your thoughts? Are you surprised by these results? How would you improve these percentages?

UPDATE: see also the Reddit discussion around this data.
