More about Subversion

[I originally posted this about two weeks ago, but something caused it to disappear. Here’s a lightly edited repost.]

Karl Fogel posted a comment on my entry about the future of free distributed SCM systems, asking for some more detail on what it is that I don’t like about Subversion.

That’s a tall order. Since Karl is one of the primary authors of Subversion, I can’t just go making wild scattershot accusations. Well, I suppose I could, and I did, in that earlier posting. But now for some actual information, or at least opinions posing in that form.

Specifically, Karl said “That Subversion is non-distributed cannot be denied, but fragile, overengineered and sketchy entanglements are pretty serious statements”.

I won’t talk about the non-distributed nature of Subversion here, because that is an axiomatic property. Either you want a distributed system or you don’t.

Fragile: Subversion was originally built on top of Berkeley DB. BDB may be a fine piece of software, but for whatever reason, it is easy to provoke SVN and BDB into catatonia, forcing some administrator to step in and clean up the database droppings. The major classes of problem seem to involve running out of transaction locks and running out of disk space.

In addition, Berkeley DB is not shy about changing its database format on a frequent basis. Since Subversion builds against an external Berkeley DB, you have to manually dump and reload all BDB-backed Subversion repositories if your system’s BDB installation gets upgraded. This makes a bother of keeping up to date with rolling Linux distribution upgrades.

In response to these concerns, there’s now a filesystem-based back end. I have not used it, so I have no idea whether it is any more robust.
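The dump-and-reload chore described above looks roughly like the following shell sketch. It uses the real `svnadmin dump`, `svnadmin create --fs-type` and `svnadmin load` commands, but the `migrate_repo` helper and the example paths are hypothetical. It also assumes the dump step is run with an `svnadmin` that can still read the old Berkeley DB files, i.e. before the system library is upgraded. The sketch targets the filesystem (FSFS) back end. Passing `--fs-type bdb` instead would recreate the repository on Berkeley DB under the new library.

```shell
# Hypothetical helper: copy every revision of repository $1 into a new
# repository $2, using $3 as the intermediate dump file.
migrate_repo() {
    local old="$1" new="$2" dumpfile="$3"
    # Serialize all revisions (and their properties) to a portable dump stream.
    svnadmin dump "$old" > "$dumpfile" || return 1
    # Create the target repository on the filesystem (FSFS) back end;
    # "--fs-type bdb" would put it on Berkeley DB instead.
    svnadmin create --fs-type fsfs "$new" || return 1
    # Replay the dumped revisions into the new repository.
    svnadmin load "$new" < "$dumpfile"
}

# Example usage (paths are assumptions):
# migrate_repo /srv/svn/project /srv/svn/project-fsfs /tmp/project.dump
```

Going through a dump file rather than copying database files is what makes the result independent of the Berkeley DB version; the cost is replaying every revision, which can take a while on a large repository.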

In contrast, Perforce, a commercial SCM that has many features similar to Subversion (and with which I have many happy years of experience), also uses Berkeley DB, but more or less never falls over due to the database getting screwed up. It either transparently upgrades its database when the schema changes, or (sometimes) requires the server to be started with a flag to tell it to upgrade. So Perforce acts as an existence proof that building on top of Berkeley DB need not be a stability or maintenance problem.

Overengineered: This is more a matter of taste than anything else, perhaps. I can understand building on top of APR (the Apache Portable Runtime) for portability (but see below for why I don’t like it). But I have more trouble with the use of DAV and integration with Apache, as they add a lot of bloat without solving any problem that I can point to as real.

Regarding DAV, I know that it means that people can mount a Subversion repository as a network drive and interact with it, but I wonder if such people are numerous, or use any of Subversion’s other features.

As for integration with Apache, I can somewhat understand the desire to use Apache’s authentication, network transport, and access control mechanisms. But Apache is painful to administer at the best of times, and administering a Subversion repository that is integrated into Apache and using such features as authentication and access control is not for the faint of heart.

It would be interesting to know how many people actually make use of the DAV and Apache integration, versus ssh-based access. I would not be surprised if deployments lean heavily towards the latter. It would also be interesting to get a sense of how secure the former two are, and how resistant they are to being made insecure through misconfiguration.

Entanglements: I will admit that this is more of a holdover grudge from pre-1.0-beta days, when almost every other build of Subversion required an upgrade of an external dependency. Since then, Subversion has shipped with specific versions of some of the external packages to which it is most tightly bound.

However, I had no idea how large APR was until I started researching this posting. It’s huge: almost 270,000 lines of code. And neon (the DAV library that Subversion uses) is almost another 80,000 lines of code. It’s hard to feel comfortable with 350,000 lines of support code.

And then there’s the external Berkeley DB issue that I’ve already mentioned.

But enough of the trash talking. There are some properties of Subversion that are highly positive.

  • They have striven to develop a piece of software that is usable in the real world. For example, where persistent problems have arisen, such as with Apache and Berkeley DB integration, they have provided more usable and reliable mechanisms.
  • The documentation for users is excellent.
  • The “atmosphere”, for want of a better word, within which Subversion is developed is highly positive towards contributors. This is entirely the result of conscious work by people like Karl, Mike Pilato, and other core developers, and it makes the process of contribution a pleasant experience. The project can only benefit as a result.
  • There might be a vast quantity of code, but it is heavily commented. The commentary is a lot more voluminous than I’m used to, but it helps to make the sheer amount of code less daunting.
  • The boundaries and dependencies between modules are well delineated. This reduces the risk of bits of code getting dropped into the wrong place for expediency’s sake.

In the end, I think I can boil my objections down to this. Subversion is “only” a better CVS, but look how much you have to wade through to be able to contribute:

  $ find subversion-1.2.0 -type f -print0 | xargs -0r wc -l | awk '/total/{a+=$1}END{print a}'
  945090

Whereas Mercurial is, I think, in some ways already more capable and interesting, at a cognitive cost of less than 1.5% that of Subversion:

  $ find hg \( -name .hg -prune \) -o \( -type f -print0 \) | xargs -0r wc -l | awk '/total/{print $1}'
  12498
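The “less than 1.5%” figure is just the ratio of the two totals above. A small shell sketch to check it (the `pct_of` helper is hypothetical, not part of either project):

```shell
# Hypothetical helper: print $1 as a percentage of $2, to two decimal places.
pct_of() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.2f\n", 100 * part / whole }'
}

# Mercurial's line count relative to Subversion 1.2.0's, from the totals above.
pct_of 12498 945090   # prints 1.32
```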

I think extremely highly of the Subversion developers; they are smart, thorough, and plain nice folks. There’s just a part of me that is disappointed that they aimed as low as a mere replacement for CVS, and put so much work and code into the result, when the same amount of work could have yielded something vastly different and (to my mind) better.

Posted in scm, software
