
Shrink your HG repository

Posted by fabriziogiudici on February 22, 2010 at 7:47 AM PST

When I used Subversion and Ant for my projects, I had the habit of committing the required libraries together with the sources. I think that solution still makes sense with those two tools, since you can check out a given version of a project and have everything you need to compile it on the local disk. Things change with Mercurial: a clone carries the whole history of the project, that is, every version of every library that has been used in the past, so a Mercurial repo can quickly grow huge. For instance, when I converted the blueMarine repos from Subversion to Mercurial, still using Ant, I ended up with repositories of several hundred megabytes. This is an annoyance for people who want to quickly clone the repo and try compiling the application. With Maven, of course, the repository is smaller because it doesn't contain the libraries; they are downloaded as artifacts, and only in the specific versions needed for the current version of the project, not for the whole history.

One of the extra advantages of Mercurial (and, generally speaking, I think the concept applies to distributed SCMs) is that you get administration utilities for the repository as first-class tools. For instance, Mercurial has a command named 'convert' (provided by the bundled convert extension, which has to be enabled in your hgrc) that converts an existing local repository from Subversion, Git, Bazaar and others... and from Mercurial itself.

What's the point of converting from Mercurial to Mercurial? It's that you can process files along the way, for instance dropping or renaming them. In my case, the libraries were committed to the lib directory (and let's also consider a tools directory containing all the extra Ant build tools that I needed). Yesterday I created a configuration file named filemap:

exclude lib
exclude tools

and then performed the command:

hg convert blueMarine-core/ blueMarine-core-cleanedup --filemap filemap

This created a new repository from which all the files in the specified directories have been stripped; it shrank the repo from several hundred megabytes to just fifty. Then I went to Kenai, deleted and recreated the existing repo, and performed a fresh push, which replaced the original contents.
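
As a rough sketch, the steps above could be scripted as follows. The repository and directory names are the ones used in this post; write_filemap is a hypothetical helper of mine, not a Mercurial command, and the --config switch is only there on the assumption that the convert extension is not already enabled in your hgrc. The hg part is skipped when the source repository is not present.

```shell
#!/bin/sh
# Sketch only: write_filemap is a hypothetical helper, not part of Mercurial.

# Write one "exclude" directive per directory into the given filemap file.
write_filemap() {
    out=$1; shift
    : > "$out"                      # start from an empty file
    for dir in "$@"; do
        printf 'exclude %s\n' "$dir" >> "$out"
    done
}

write_filemap filemap lib tools

# Run the conversion only if hg and the source repository are available.
if command -v hg >/dev/null 2>&1 && [ -d blueMarine-core/.hg ]; then
    # Enable the bundled convert extension for this single invocation.
    hg --config extensions.convert= convert --filemap filemap \
        blueMarine-core blueMarine-core-cleanedup
    # Compare the on-disk size of the two histories.
    du -sh blueMarine-core/.hg blueMarine-core-cleanedup/.hg
fi
```

Besides exclude, a filemap also accepts include and rename directives, which is what makes the renaming mentioned earlier possible.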

Of course, there are two prices to pay:

  1. All the changeset ids have changed. In Mercurial they are a hash of the changeset contents and of its ancestors, so you get the point. You can no longer refer to arbitrary past changesets by the ids under which they were documented until yesterday. Of course, tags still work; and in any case, if you have to rebuild an untagged past version, you can use the commit date.
  2. Of course, it is now impossible to build the old versions of the project that used Ant, since the required libraries have been stripped. To cover this case, I created a binary bundle of the old repository with the command

    hg bundle --all blueMarine-core-repo-archive-20100221_1450.hg 

    and uploaded the file to the Kenai download area. Anyone who wants to reconstruct a version older than yesterday just needs to run, inside an empty repository created with hg init:

    hg unbundle http://<url-of-the-archived-repo>
    hg update -C <changesetId>
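
To illustrate the first point, here is a toy model of why filtering files out of an early commit changes every later id as well. It is not Mercurial's real algorithm (the real id also covers both parents, the manifest and the commit metadata); it only assumes that each id is the SHA-1 of the parent id plus the commit content. The file names are made up, and sha1sum from GNU coreutils is assumed to be installed.

```shell
#!/bin/sh
# Toy model, NOT Mercurial's actual hashing: id = SHA-1(parent id + content).

toy_id() {
    # $1 = parent id, $2 = commit content (hypothetical file listing)
    printf '%s%s' "$1" "$2" | sha1sum | cut -d' ' -f1
}

root=0000000000000000000000000000000000000000

# Original history: the first commit also carries a library under lib/.
a1=$(toy_id "$root" "src/Main.java lib/foo.jar")
a2=$(toy_id "$a1" "src/Main.java v2")

# Converted history: same commits, with lib/ filtered out of the first one.
b1=$(toy_id "$root" "src/Main.java")
b2=$(toy_id "$b1" "src/Main.java v2")

echo "original : $a1 $a2"
echo "converted: $b1 $b2"
```

The second commit has the same content in both histories, yet its id differs because its parent's id differs; that is why the change ripples through the whole converted repository.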

In the end, it's a reasonable trade-off, as it's still possible to reconstruct arbitrarily old versions, and you get a much smaller footprint.
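
For reference, the restore path could be wrapped in a small function like the one below. This is only a sketch: restore_archive and the HG override variable are names I made up, and the bundle location, target directory and revision are whatever applies in your case.

```shell
#!/bin/sh
# Sketch only: restore_archive and the HG variable are hypothetical names.
# HG can be set to another executable (e.g. "echo") for a dry run.

restore_archive() {
    bundle=$1   # local path or URL of the archived .hg bundle
    dest=$2     # directory for the restored repository
    rev=$3      # changeset id or tag to check out
    "${HG:-hg}" init "$dest" &&
    "${HG:-hg}" -R "$dest" unbundle "$bundle" &&
    "${HG:-hg}" -R "$dest" update -C "$rev"
}

# Example, with the archive already downloaded:
#   restore_archive blueMarine-core-repo-archive-20100221_1450.hg old-repo <changesetId>
#
# In the cleaned-up repository, an untagged version can be picked by date:
#   hg update -d "<2010-02-21"
```

The unbundle step needs an existing repository to pull the changesets into, hence the hg init before it.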



I use ivy to keep the footprint small ....

I usually migrate the Ant-based project to use Ivy, with an Ivy resolver that fetches the artifacts from an internal repository. If I need to publish the current artifacts back to the repository, I define a custom publish target which pushes the artifacts to an NFS mount, and I add this mount as one of the search locations for the repository. I guess each project build system has its own workarounds :)