85% of projects in GitHub have never been forked

We already discussed that releasing a tool as open source does not guarantee that people will start contributing. In fact, far from it.

And this is not new. A few years ago, SourceForge representatives interviewed for “Tools for Teams: A Survey of Web-Based Software Project Portals” already mentioned that most projects hosted on SourceForge involved only one or two people.

What I wondered was whether this was a “social problem” or more of a “technical problem”. Maybe with better collaborative tools and new development methods this would change? If we make contributing to a project easier, would more people do it? In theory, contributing to an open-source project is now easier than ever, thanks to Git and its pull-based development method (in which you can fork any public repository, modify it and then request that your changes be merged back into the original repository) and to code hosting environments like GitHub that make it easy to set up and monitor Git-based projects. So the question is: has this helped to increase the number of contributions?

I’m not going to answer this question (I don’t have the answer), but I’ll share some data showing that, for whatever reason, if your open source project is completely ignored by everybody, you should not worry too much: this is in fact the norm.

We (myself, Javier and Valerio) have used GitHub Archive data to check how many times each of the 7,388,726 public projects in GitHub (as of today, and counting only projects created after January 2012) has been forked. As mentioned above, under the pull-based development method, forking is a necessary step for contributing to a project (although not every fork ends up in a contribution). There are other ways to contribute to a GitHub project without forking it, but forking is by far the most common method, so we will focus on it and ignore the rest (yes, I can hear you yelling: Threat to validity!!!).
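To make the counting step concrete, here is a minimal sketch of how forks per project could be tallied from a stream of event records. It is not the query we actually ran: the function name, the sample records and the field names (`type`, `repo.name`) are assumptions for illustration, modelled on the JSON event format that GitHub Archive publishes, and older archive files may use different field names.

```python
import json
from collections import Counter


def count_forks(event_lines, all_repos):
    """Return {repo_name: fork_count}, keeping repos that were never forked.

    event_lines: iterable of JSON strings, one event per line (assumed format).
    all_repos:   names of every project under study, so zero-fork projects
                 still appear in the result.
    """
    forks = Counter({name: 0 for name in all_repos})
    for line in event_lines:
        event = json.loads(line)
        # Only fork events matter here; pushes, stars, etc. are ignored.
        if event.get("type") != "ForkEvent":
            continue
        name = event.get("repo", {}).get("name")
        if name is not None:
            forks[name] += 1
    return dict(forks)


# Hypothetical sample data, not real GitHub Archive records.
sample_events = [
    '{"type": "ForkEvent", "repo": {"name": "alice/tool"}}',
    '{"type": "PushEvent", "repo": {"name": "alice/tool"}}',
    '{"type": "ForkEvent", "repo": {"name": "alice/tool"}}',
    '{"type": "ForkEvent", "repo": {"name": "bob/lib"}}',
]
sample_repos = ["alice/tool", "bob/lib", "carol/notes"]

fork_counts = count_forks(sample_events, sample_repos)
```

Passing the full list of projects separately matters: a project that was never forked produces no fork events at all, so it would silently vanish from the count if we only looked at the event stream.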

The results are quite shocking (they follow the trend we discussed at the beginning but the numbers are “worse” than I anticipated).

The following two graphics show the distribution of GitHub projects according to the number of times they have been forked. The graphics are based on raw data available in Excel (GitHubProjectsForks) and CSV formats.

The first one shows the quickly decreasing number of projects as soon as we start increasing the number of forks.

Distribution of GitHub projects based on the number of forks

The second shows the same data as percentages.
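For readers who want to reproduce this kind of breakdown on their own data, the following is a small sketch of how per-project fork counts can be turned into the two views above (absolute numbers and percentages). The function name and the toy input are hypothetical; the real figures come from the raw data mentioned earlier, not from this example.

```python
from collections import Counter


def fork_distribution(fork_counts):
    """Map number-of-forks -> (number of projects, percentage of projects)."""
    total = len(fork_counts)
    if total == 0:
        return {}
    # How many projects have 0 forks, 1 fork, 2 forks, ...
    histogram = Counter(fork_counts.values())
    return {
        n_forks: (n_projects, 100.0 * n_projects / total)
        for n_forks, n_projects in sorted(histogram.items())
    }


# Toy input: five made-up projects and their fork counts.
toy_counts = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 3}

distribution = fork_distribution(toy_counts)
never_forked_pct = distribution[0][1]
```

In this toy input three of the five projects have zero forks, so `never_forked_pct` is 60.0; the headline number of this post is the same quantity computed over all projects in the dataset.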


What are your thoughts? Are you surprised by these results? How would you improve these percentages?

UPDATE: check also the reddit discussion around this data


9 Responses to 85% of projects in GitHub have never been forked

  1. Tassilo Horn says:

    Forking is not necessary to contribute to git (or GitHub) projects. You can just clone the original repository, make whatever changes you like, commit, and then send a patch created with “git format-patch” to the devs. They can then use “git am” to apply it upstream if they like.

    I’m usually using that method for one-shot fixes and only create a fork when I’m planning to contribute more frequently.

  2. Jan says:

    I think your numbers don’t just miss an important practice of contributing. With me and my projects, if someone wants to contribute, he or she just enters the team. Thus, there is no need for forking anymore. Well, I know the contributors personally. Obviously that’s the reason for the trust in the contributors. I think that’s a common practice of contributing, too, but it’s hard to figure that out with numbers.

    • modelinglang says:

      I’m not sure this is common practice. I mean, for sure, if somebody starts contributing to the project s/he may end up being invited to join the team. But how frequently would somebody join the team before contributing anything? As you say, this only happens when you personally know the owners of the project.

  3. Elywn Sykes says:

    The problem is that most Github projects should never have been started. The damage has been caused by the meme that says you need to publish your projects in order to get a job.

  4. abetusk says:

    Could you plot a semi-log plot and a log-log plot to get some more insight into the data? Barring that, could you make available the data you used to generate the plot?

  5. Artem says:

    The results are not particularly striking. Speaking of my friends and myself, we tend to push almost every little project to GitHub/Bitbucket, even university homework. The purpose of using git for these is not team development but rather 1) commit history, allowing e.g. the use of git bisect to find bugs; and 2) being able to access the code from any machine.
    Of course, that’s somewhat of an abuse. Dropbox or its analogues might be a better choice for this type of thing, but that’s how GitHub is used nowadays.
