
Using mercurial to maintain local changes to a 3rd party library

Posted by kohsuke on November 1, 2007 at 12:22 AM PDT

So here is the scenario. You are working on a project that depends on a 3rd party library XYZ, and you need to patch XYZ. Maybe you can't wait for the upstream to fix a bug, or you need to implement a new feature, or maybe you just need to tweak things so that it works nicely with your application. At the same time, you know that XYZ is still evolving, so when the upstream makes the next version available, you want to incorporate its changes without losing your own.

To do this, I use two mercurial repositories, one called "incoming" and the other called "patched." The former is used to store the unmodified source drops from the upstream, and the latter is used to maintain local changes.

The first step is to set up those two mercurial repos:

$ hg init incoming
$ hg init patched

I'll then seed the incoming repository. In my case, I take upstream source drops from the Subversion repository of the upstream project, so it looks something like this:

$ cd incoming
$ svn export --force http://svn.svnkit.com/repos/svnkit/tags/1.1.4/ .
$ ls
COPYING     build.xml      contrib  ...

I'll then put the whole thing as-is into mercurial:

$ hg addremove
$ hg commit -m "imported http://svn.svnkit.com/repos/svnkit/tags/1.1.4/"

Now I'll switch to the "patched" repository:

$ cd ../patched
$ hg pull ../incoming && hg update
  (this brings in the upstream source tree as-is)

At this point I can start making changes on this "patched" workspace. I can run "hg addremove", "hg commit", and so on so that I can logically group my changes.

So far those two mercurial repositories only exist locally, but I can create remote repositories for them, so that I don't lose my data even if my laptop dies. This is also a crucial step in a team environment. The following steps illustrate this for the "incoming" repository, and I need to repeat them for the "patched" repository:

$ ssh server hg init /export/home/hg/svnkit/incoming
$ cd incoming
$ vi .hg/hgrc
  ...
$ cat .hg/hgrc
[paths]
default = ssh://kohsuke@server//export/home/hg/svnkit/incoming
$ hg push
pushing to ssh://kohsuke@server//export/home/hg/svnkit/incoming
...

Now, if the upstream releases a new version, I'll import that into the incoming repository like this:

$ cd incoming
$ hg pull && hg update
  make sure my local workspace is up to date
$ rm -rf *
$ svn export --force http://svn.svnkit.com/repos/svnkit/tags/1.1.5/ .
$ hg addremove
$ hg commit -m "importing http://svn.svnkit.com/repos/svnkit/tags/1.1.5/"
$ hg push

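The key part is "rm -rf *", which removes all the old source files but leaves the .hg control directory alone, because the shell's "*" does not match names that start with a dot. Unlike subversion/CVS, mercurial only creates .hg at the top level directory, so I can wipe out local files very easily like this. This is a necessary step in case the upstream removed some files: without it, "hg addremove" would never notice the deletions. Note that for the same reason, any other top-level dot-files from the previous drop survive the wipe.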
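If the upstream tree contains top-level dot-files, a slightly more careful wipe can be sketched as below. This is only an illustration, not part of the original procedure: the function name wipe_working_copy is made up, and it assumes a find that supports -mindepth and -maxdepth (GNU and BSD find do, but they are not in POSIX).

```shell
#!/bin/sh
# Hypothetical helper: delete every top-level entry of a working copy
# except the .hg control directory, including dot-files that "rm -rf *" skips.
wipe_working_copy() {
    dir=${1:-.}
    # Refuse to run anywhere that is not the top of a mercurial working copy.
    [ -d "$dir/.hg" ] || { echo "no .hg directory in $dir" >&2; return 1; }
    # -mindepth 1 skips "$dir" itself; -maxdepth 1 stops find from descending,
    # so each top-level entry is handed to rm exactly once.
    find "$dir" -mindepth 1 -maxdepth 1 ! -name .hg -exec rm -rf {} +
}
```

It would be called as "wipe_working_copy ." from inside the incoming repository, in place of "rm -rf *", right before the next "svn export".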

Next I'll merge this into my local changes:

$ cd patched
$ hg pull && hg update
  make sure my local workspace is up to date
$ hg pull ../incoming
  this brings in upstream changes
$ hg merge
$ hg commit
$ hg push

Things I like

  1. I do have a lot of experience of doing this with CVS. The gist is outlined here, but it's missing a lot of details, and I think it suffices to say that it was very painful. This is much better.
  2. I get to retain full history of my local changes.
  3. This set up works in a team environment where multiple developers maintain the shared local changes to the upstream project.

Things I don't like

  1. Setting up a new mercurial repository properly in a team environment is not very easy, yet if you start doing this for your dependencies, you can quickly get a large number of repositories.
  2. I do retain full history, but my favorite IDE doesn't recognize mercurial, so I have to fall back to the CLI for any VCS operation. No change bar, no visual diff, no annotated source view. This really hurts.
  3. Merge in mercurial sucks, mostly because I'm used to good 3-way merge support in modern Java IDEs.

Things I haven't figured out

  1. I use two repositories, but this should also be doable with just one repository. I don't know which is better: as I said above, having a large number of repositories is painful, but doing this in one repo would complicate the steps.
  2. I have a feeling that using mercurial branches might let me label changesets so that I can easily tell whether a particular change is a local change or from the upstream.