On distributed revision control and project forking

One of those myths about distributed revision control systems that has grown legs and acquired a heartbeat is that they make forking a project easier.

For the uninitiated, a “fork” occurs when some contributors to a project become disgruntled, take the code, make their own changes, and start a new project based on the existing one.

Forking is a fairly common occurrence. It happens every few years to the BSD operating systems; it happened to GCC in the late 1990s; and it happened to Emacs almost fifteen years ago.

If you’re going to fork, it’s nice to be able to bring the complete history of the project’s source code along with you. While it’s true that this is trivial with a distributed revision control system, it is very nearly as easy with a centralised system, even for people who don’t have commit access to a server.

  • For years, SourceForge has published tarballs of every CVS repository it hosts. Want to fork? Download a CVS tarball.
  • CVS users can also use cvsps to pull history out of CVS and play it back elsewhere, while Subversion users can use SVK.
  • The Swiss Army chainsaw of tools in this area is Tailor, which lets you pull history from any of a large number of tools and play it back into almost any other.

Distributed tools can make it somewhat easier for forked projects to exchange changes with each other. A fork can “leech” off its parent project, taking changes from it on an ongoing basis, and the parent can just as easily take changes from the fork. As the two projects diverge, though, it becomes more difficult to absorb changes in either direction using any revision control tool’s automatic merge capabilities, and things are likely to descend into manual conflict resolution hell.

The big deal, though, is that distributed tools make it easier to reconcile a fork.

Consider how you’d do this with a centralised system. The easiest approach is not to preserve history at all, but to pull in all of one project’s changes en masse, then fix things up. If you want to make the fork look like it was a branch and preserve its change history, good luck; you’ll be doing it all by hand.

By contrast, with a distributed tool, a fork is just another branch. Branches get merged all the time, and the history of each side of a branch is always preserved. There may be the one-time pain of a giant merge, but after that, you have a unified history, reflecting every change made on both sides of the split while it lasted.
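
To make the "a fork is just another branch" point concrete, here is a minimal sketch of a change history as a graph of changesets. It is a toy model, not how any particular tool stores its data; the `Commit` class, the `history` function, and all the changeset names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """One changeset: a label plus the changesets it was built on (hypothetical model)."""
    name: str
    parents: tuple = ()


def history(head):
    """Return the names of every changeset reachable from head."""
    seen = set()
    stack = [head]
    while stack:
        commit = stack.pop()
        if commit.name in seen:
            continue
        seen.add(commit.name)
        stack.extend(commit.parents)
    return seen


# Shared history before the split.
root = Commit("initial import")
shared = Commit("last shared change", (root,))

# After the fork, each project carries on from the same changeset.
parent_head = Commit("parent: fix build", (shared,))
fork_head = Commit("fork: new feature", (shared,))

# Reconciling the fork is a merge: one changeset with two parents.
merged = Commit("merge fork back", (parent_head, fork_head))

print(sorted(history(merged)))
```

In this model the merge changeset is the only new thing the reconciliation adds. Everything either project did during the split is still reachable from it, which is the unified history described above.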

Posted in scm, software
