Wednesday, April 29, 2009

Git, BitKeeper and the Power of Open Source

Update April 2012: The comparison page that I reference now just mentions "other SCM", but a side-bar continues to compare their product only to non-distributed, circa 1980s and 1990s offerings.


Back in the mists of 2002, debate ran hot in the Linux development community over BitKeeper, a proprietary source code management (SCM) tool that was used as the primary SCM for the Linux kernel. When a dispute with the vendor resulted in a schism between the Linux developer community and BitKeeper in 2005, the tool was dropped in favor of a replacement written by Linus Torvalds over a one-month period. To appreciate the importance of this achievement, consider that BitKeeper was written by eight developers over the course of three years, and that Larry McVoy, its primary architect and original developer, estimated that it would cost $12 million to do it again in an ordinary, non-startup company.

Instead, Torvalds sat down behind his keyboard and set out to replace it. How successful was he? If you look at BitKeeper's comparison page with other SCM tools today, you'll notice that it compares itself to many other tools (and makes quite a few rather large errors along the way), but none of them is git. In fact, the list includes none of the next-generation tools that have followed in git's wake, such as Bazaar or Mercurial. Why? Well, git is simpler, easier and better. It also happens to be radically faster. There's no point in comparing yourself with such a tool in public, since admitting that the free tool is radically better is only going to make you look bad.

McVoy also made the claim that a replacement for BitKeeper wouldn't be possible because it was too hard and programmers capable of doing the work wouldn't do it for free. Why too hard? Well, it comes down to graph theory and its application to text revisions. Recognizing text differences is hard enough, but extending that to maintaining a directed acyclic graph of revision histories and branches in a distributed way... well, that's downright hard. Sure, it's hard, but then so is writing a POSIX-compatible kernel. There are now three excellent options out there for distributed source code management (git, Bazaar and Mercurial) that excel at doing just what McVoy said would be impossible to reproduce. That fact should go a long way toward demonstrating that free and open source software development is one of the most powerful new paradigms of engineering to come along since the invention of the functional specification.
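To give a feel for the graph problem mentioned above, here is a minimal Python sketch of a revision history stored as a directed acyclic graph, along with a search for the common ancestor that a merge would start from. This is an illustration only, not how git or BitKeeper actually implement it: the `HISTORY` table, the revision names and the `ancestors` and `merge_bases` helpers are all made up for this example.

```python
from collections import deque

# Hypothetical toy history: each revision maps to its parent revisions.
# The root has no parents; a merge has two.
HISTORY = {
    "a": [],
    "b": ["a"],
    "c": ["b"],       # work on one branch
    "d": ["b"],       # work on another branch
    "e": ["c", "d"],  # merge of the two branches
}


def ancestors(history, rev):
    """Return rev plus every revision reachable from it through parent links."""
    seen = {rev}
    queue = deque([rev])
    while queue:
        current = queue.popleft()
        for parent in history[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def merge_bases(history, left, right):
    """Return the common ancestors that are not a strict ancestor of another common ancestor."""
    common = ancestors(history, left) & ancestors(history, right)
    return {
        rev
        for rev in common
        if not any(rev in ancestors(history, other) - {other} for other in common)
    }


if __name__ == "__main__":
    # The two branches "c" and "d" diverged at "b".
    print(merge_bases(HISTORY, "c", "d"))
```

Even this toy version recomputes ancestor sets again and again, which would be far too slow on a history the size of the kernel's. Doing it quickly, and across repositories that never share a central server, is where the real difficulty lies.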