Monday, March 1, 2010

Git or Mercurial as a replacement for SVN

Are DVCS tools like Git and Mercurial better alternatives than SVN for a company using a central repository? Does a DVCS make sense for a company where all development is done from the same location? These were the main questions I had on my mind as I investigated alternatives to SVN at Overstock.com.

The short answer is that, yes, Git and Mercurial can function as a central repository. And yes, there are benefits to using DVCS even if the IT department is centrally located. But there are costs as well. The long answer is, well, long and that's what I'll try to address in this post.

Before I go into the tools, let me give you a little background about Overstock.com. We have a few public-facing applications (e.g. www.overstock.com) and dozens of internal applications. Our applications are mostly Java. Our projects are organized as multi-module Maven projects. We use Hudson for builds and Nexus for a Maven repository. We have a handful of SVN repos, each containing a couple dozen projects. Most developers use Eclipse, though a few use IntelliJ. Developers are organized into teams of 4-6, and all work from the same office. Each team works on a handful of projects.

Our development staff has been growing rapidly. 4 years ago we had fewer than 25 developers but now we have over 100. When we had a small number of developers, we generally put features on branches and merged them into trunk periodically. Since there weren't that many branches, there weren't that many surprises at merge time. But this doesn't scale well. 100 developers with lots of branches makes for some painful merging with SVN. SVN is pretty lousy at merging. If you move a file on a branch (think refactoring), and that file has been modified on trunk, you will have problems when you merge that branch into trunk. And SVN forces you to keep track of the "to" and "from" revisions, which is error-prone. SVN 1.5 was supposed to have better merging capabilities but from what I've seen, it's still broken.

OK, so we all agree SVN the tool is pretty lousy for large scale development. But what about a central repository? For Overstock.com, a central repository makes a lot of sense. It is automatically backed up and highly available. Everyone knows where to find the latest version of the code. Everyone has visibility into what other people are working on.

It would seem the perfect solution for Overstock is a next-generation VCS tool with a central repository. Everyone seems to agree that the next-generation tools are all better than SVN. Of the next-generation VCS tools, Git and Mercurial have the best industry support and momentum at the moment. Git has the Linux community. Mercurial has some big names supporting it too, like Google.

We could just choose Git based on all the buzz, but we don't change VCS often so it's worth investigating Mercurial too. Looking at a feature comparison, they seem pretty similar. Google Code seems to think that Mercurial is more efficient but Git is more powerful and complicated.

From my investigation I found that both tools offer vastly superior command-line features to SVN, and the two are very similar in both features and performance. Their philosophies are very different though. One blog used the analogy that Git is MacGyver and Mercurial is James Bond. I would spin that a bit and say that MacGyver would use Git and James Bond would use Mercurial.

At first glance, Git has more features. This is because Git enables just about everything by default. Git also feels more geeky in its syntax. It took me longer to get used to.

Mercurial takes a different approach. Instead of enabling everything, you enable only what you need. By default, only the commonly used features are enabled; the rest are provided as extensions. Extensions like hgk (analogous to Git's gitk) are packaged with Mercurial but need to be enabled. Some extensions also need to be downloaded before they can be enabled. Enabling extensions is pretty painless from what I found. I also found the defaults more sensible for cloning, branching, merging and reverting.
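To give a feel for how painless enabling an extension is: a bundled extension is switched on by adding a line such as "hgk =" under the [extensions] section of your hgrc file, and a downloaded one by pointing the name at its file path. The sketch below does that edit programmatically. The function name enable_extension is my own invention for illustration, and it assumes the hgrc is simple enough for Python's configparser to read (it will not cope with things like %include lines, and it drops comments when it rewrites the text).

```python
import configparser
import io


def enable_extension(hgrc_text: str, name: str, path: str = "") -> str:
    """Return hgrc text with `name` listed under [extensions].

    Hypothetical helper, not part of Mercurial. An empty `path` means the
    extension ships with Mercurial; otherwise `path` is the location of a
    downloaded extension file.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(hgrc_text)

    if not parser.has_section("extensions"):
        parser.add_section("extensions")
    parser.set("extensions", name, path)

    # Note: configparser discards comments from the original text.
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    original = "[ui]\nusername = Jane Doe <jane@example.com>\n"
    print(enable_extension(original, "hgk"))
```

In practice most people just open the file in an editor and add the line by hand, which is really the point: it is a one-line change.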

I suspect that if you enabled all the features of Git and Mercurial, you'd end up with nearly identical feature sets. Both tools are still growing, so any features lacking in one or the other will likely be added within a few weeks or months.

By far the most compelling features of Git and Mercurial for me are the merge capabilities. In this area, neither tool disappointed. I created many branches, moved files in one branch and modified them in another. Merging them together was simple and worked correctly. I didn't have to do all that revision bookkeeping like I have to with Subversion. Both Git and Mercurial merged changes across moved files beautifully. At last, merging that works like it's supposed to.
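If you want to try the moved-file experiment yourself, here is a rough sketch of the Git half of it, driven from Python. It is not the exact procedure I followed; the file names, branch names and the function merge_across_rename are made up for illustration. It assumes a git executable is on your PATH (it returns None otherwise) and works in a throwaway temporary directory.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

# Identity and signing are set per command so the sketch neither depends on
# nor modifies the user's global Git configuration.
GIT = ["git", "-c", "user.name=Demo", "-c", "user.email=demo@example.com",
       "-c", "commit.gpgsign=false"]


def git(repo: Path, *args: str) -> None:
    """Run one git command inside `repo`, raising if it fails."""
    subprocess.run(GIT + list(args), cwd=repo, check=True, capture_output=True)


def merge_across_rename() -> Optional[str]:
    """Rename a file on one branch, edit it on another, then merge.

    Returns the merged file's content, or None if git is not installed.
    """
    if shutil.which("git") is None:
        return None

    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        lines = [f"line {i}" for i in range(1, 11)]

        git(repo, "init")
        (repo / "util.txt").write_text("\n".join(lines) + "\n")
        git(repo, "add", "util.txt")
        git(repo, "commit", "-m", "initial version")
        git(repo, "branch", "-M", "trunk")  # name the main branch explicitly

        # On the feature branch, move the file (think refactoring).
        git(repo, "checkout", "-b", "feature")
        git(repo, "mv", "util.txt", "helper.txt")
        git(repo, "commit", "-m", "rename util.txt to helper.txt")

        # Meanwhile, on trunk, modify the file under its old name.
        git(repo, "checkout", "trunk")
        lines[4] = "line 5, modified on trunk"
        (repo / "util.txt").write_text("\n".join(lines) + "\n")
        git(repo, "commit", "-am", "edit util.txt on trunk")

        # Merge: the trunk edit should follow the file to its new name.
        git(repo, "merge", "-m", "merge feature into trunk", "feature")
        return (repo / "helper.txt").read_text()


if __name__ == "__main__":
    print(merge_across_rename())
```

Notice that nowhere do you tell the merge which revisions to merge "from" and "to"; you just name the branch. A Mercurial version of the experiment would follow the same steps with the corresponding hg commands.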

Now for some bad news. Perhaps you caught on to my wording above where I said Git and Mercurial have better command-line features than SVN. When it comes to Eclipse plugins, SVN is better than Mercurial and vastly better than Git. For most developers at Overstock, the majority of their interaction with the code repository is via Eclipse. Perhaps this is because the Eclipse plugins (Subversive and Subclipse) have spoiled us. I think it's important to understand how to use SVN from the command line, and sometimes it's actually easier. But there are some functions that just make more sense in the IDE. Generally these are tasks which involve looking at multiple files or revisions, such as synchronize with repository, browse repository, show annotations, merge and show history.

At the time of review, the Git Eclipse plugin is very unstable and missing many of the features. This is likely because the Git plugin developers have their hands full with reimplementing Git in Java. At the moment, you cannot synchronize with repository or browse repositories. I was also unable to merge if there were conflicts. As soon as a conflict was encountered it just aborted. For those using multi-module Maven projects with the M2E plugin, you're out of luck. The Git plugin cannot be enabled for these projects. I also found that the plugin crashed Eclipse nearly every session.

Mercurial has at least a decent Eclipse plugin. There are at least 2 that I found and I can definitely recommend HgEclipse. It has most of the features of the SVN plugins. It works with multi-module Maven and M2E. It annoyingly asks for my password multiple times when pushing or pulling to a remote repo. It was so annoying I had to set up passwordless SSH to the remote repo. But it was stable and usable.

Eclipse plugins aside, the next issue I encountered with both Git and Mercurial was dealing with our SVN repos that contain lots of projects. With SVN, we only have a few repos. One repo is for internal applications, another for the website and related services, and so on. Generally, when you are working on a feature you only check out a small subset of projects from a repo. This doesn't map well to Git or Mercurial. Neither tool supports partial clones/checkouts. You have to check out the whole repo even if you just want to modify one file. There are 3 alternatives for migrating. 1) Convert every project to a repo. 2) Keep the same structure. 3) Use an extension to roll your own repo of repos.

Option 1 means that our 5-10 repos become several dozen repos. Now we'll need a tool for hosting all these repos. There are a few public tools like GitHub or Google Code, but not so many options for private hosting.

Option 2 means cloning an entire repo just to work on one of its projects. This might not be so bad for smaller repos but it does not scale.

Option 3 involves using an extension like NestedRepositories for Mercurial or Submodules for Git. Git's submodules are not well-suited for making a repo of repos. They felt akin to hard links in Unix and are not really intended for this purpose. Mercurial's NestedRepositories look more promising but I have not had time to play with them. The NestedRepositories description mentions the following relevant use case: "Partial views: a developer who only needs to work with two out of twelve modules should not have to download or deal with the other ten."

Now the choice does not seem so obvious. If I had to choose between Mercurial and Git at Overstock right now, I'd pick Mercurial, with emphasis on right now. As I said before, both tools are still expanding and evolving so I could easily see Git becoming a better choice in 6 months to a year. At the moment, neither tool is compelling enough for me to recommend abandoning SVN. For our situation, my recommendation is to just wait and see. From a personal interest, I'd really like to use Git or Mercurial, but it just doesn't make sense right now for the company. Hopefully in the next year or two the tools will evolve in such a way that it does make sense. Or our needs will evolve in such a way that SVN no longer makes any sense.
