Repository cohesion is another plus of Mercurial

Posted by fabriziogiudici on September 18, 2009 at 2:00 PM PDT

During my last years before getting my master's degree, I worked on a free flight simulator. It ran under DOS and was named FGFLY. It was written in C++, initially Borland C++ and later Watcom C++, in order to use a memory extender to bypass the infamous 640k limit. At the time I was just a student able to earn a little money with programming, and couldn't afford to spend a lot on hardware - so my computer was never at the leading edge; I remember that compiling the whole project took more than one hour.

Today computers are faster and I thankfully own a very fast one - still, my favourite project blueMarine takes quite a long time to compile. Two years ago I ran some performance tests for my build environment, and on Mac OS X a build with Ant took 1'44" (Linux was definitely faster because of a faster file system). Today the project has grown and takes more than 7' - on a faster machine. blueMarine has long been split into subprojects, so the biggest component takes less than 3'30". For the record, these are the times needed to perform an ant clean nbms on a unibody MacBook Pro, 2.4GHz, running Leopard:

Semantic: 42"
Metadata: 58"
blueMarine-core: 3' 22"
blueMarine: 2' 10"

Compiling an application is clearly a disk-bound operation, so a possible way to make things faster is to use a faster disk. The fastest disk on earth is a RAM disk, but unfortunately it's also unreliable. Mac OS X and Linux are very stable, but occasionally a crash can make a RAM disk vanish, and I really can't tolerate the feeling of throwing away a few hours of work.

With Mercurial, it seems I've found a good way to balance speed and reliability. While Subversion and CVS create service directories (.svn and CVS) in every folder of your project, thus mixing the local portion of the repository with your working area, Mercurial keeps a cohesive repository in a single directory (.hg). This means that with just a symbolic link the repository and the workspace can live on two separate volumes, and thus on two different filesystems. Furthermore, since a Mercurial repository is a complete local clone, re-creating the working area from scratch takes only a few seconds and doesn't need a network connection. Last but not least, commits are local, so you can commit frequently, and in case of failure you're likely to lose no more than a few minutes of work.

Bingo! This script does all the magic on Mac OS X:

fritz% cat makeramdisk 
#!/bin/bash

VolumeName="blueMarineBuild"
SizeInMB=512
NumSectors=$((2*1024*SizeInMB))
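# Allocate a RAM disk of SizeInMB megabytes (ram:// sectors are 512 bytes) and format it as HFS+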
DeviceName=`hdid -nomount ram://$NumSectors`
echo $DeviceName
diskutil eraseVolume HFS+ $VolumeName $DeviceName

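# Create the working directories on the RAM disk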
mkdir /Volumes/$VolumeName/blueMarine
mkdir /Volumes/$VolumeName/blueMarine/Metadata
mkdir /Volumes/$VolumeName/blueMarine/Semantic
mkdir /Volumes/$VolumeName/blueMarine/blueMarine-core
mkdir /Volumes/$VolumeName/blueMarine/blueMarine

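# The Mercurial repositories safely live on the regular filesystem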
HgRepo=$HOME/Business/Tidalwave/MercurialRepos/blueMarine

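# Symlink each repository's .hg into the corresponding working directory on the RAM disk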
ln -s $HgRepo/Metadata/.hg /Volumes/$VolumeName/blueMarine/Metadata
ln -s $HgRepo/Semantic/.hg /Volumes/$VolumeName/blueMarine/Semantic
ln -s $HgRepo/blueMarine-core/.hg /Volumes/$VolumeName/blueMarine/blueMarine-core
ln -s $HgRepo/blueMarine/.hg /Volumes/$VolumeName/blueMarine/blueMarine

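# Re-create each working copy from its local repository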
cd /Volumes/$VolumeName/blueMarine/Metadata && hg update -C default
cd /Volumes/$VolumeName/blueMarine/Semantic && hg update -C default
cd /Volumes/$VolumeName/blueMarine/blueMarine-core && hg update -C default
cd /Volumes/$VolumeName/blueMarine/blueMarine && hg update -C default

It first creates and mounts a RAM disk large enough (512MB in the script above) to contain all the blueMarine working areas, including compilation artifacts; then it creates empty directories for the working areas and creates the relevant links to the Mercurial repositories, which safely live on a ZFS filesystem; finally, it invokes Mercurial to re-create the working directories.

Running it takes about 35 seconds, which makes it practical to set things up before each working session.
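
For instance, a working session could start with something like this (assuming makeramdisk has been made executable; the paths come from the script above):

fritz% ./makeramdisk
fritz% cd /Volumes/blueMarineBuild/blueMarine/blueMarine-core
fritz% ant clean nbms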

Repeating the same compilation tests shows that you save around 50% of the time:

Semantic: 22" (50% saved)
Metadata: 29" (50% saved)
blueMarine-core: 1' 25" (45% saved)
blueMarine: 1' 2" (50% saved)

I also expect better performance from NetBeans when it needs to scan the source files.

Comments

Working area in the ramdisk, even sources?

Hi Fabrizio, citing your statement:

...you can frequently commit...

I suppose you are committing very very (VERY) frequently! :) How frequently? At every file change? If you don't, I think the risk of losing your work is very high. There's even the risk of deliberately shutting down your PC with pending commits... and thus deliberately trashing hours of work.

IMHO the only thing that can benefit from a move to a ramdisk without risk is the "target" directory (in Maven terms) of a project. I tried this on Linux with a Maven project. It's very simple, and it doesn't even require executing a script before starting to work. Briefly:
  1. Add a ramdisk mounted on the (already existing) target directory of the project, by putting the following line in /etc/fstab:

    tmpfs_myprj /home/lucio/myprj/target tmpfs defaults 0 0

    It will be mounted automatically at boot. You can mount it now with mount -a.
  2. A mounted directory can't be deleted, so a clean will fail. My solution is to configure the maven-clean-plugin not to fail on error. In the project pom.xml:

    <plugin>
      <artifactId>maven-clean-plugin</artifactId>
      <configuration>
        <failOnError>false</failOnError>
      </configuration>
    </plugin>
I tried this on a simple project I'm working on. My speed gain is not significant: only 2-3 seconds out of a total of 33-37 seconds. But that build process is already mostly in-memory (even the test database is in-memory). Maybe on your project the gain would be bigger.

Hi Lucio.

I think this approach must be heavily customized to fit everybody's needs, depending on the way of working, the reliability of the operating system and the cost/benefit trade-off one wants to accept. In my case, I usually commit after a test passes, which usually means every few minutes. Sometimes hard bugs need more time and can last hours (or days). At this point one should define an acceptable trade-off - I'd say that my Mac OS X and Linux boxes don't crash and/or fail hibernating more than once every two weeks, so it's acceptable to lose 30 minutes of work in those cases. This means that I need to make sure that every 30 minutes everything is committed or saved (note that 30 minutes is a "pomodoro", and the end of a pomodoro is a nice moment to make a check). A save point can be easily created by a command such as hg diff > /safedir/blah - one might just write an alias so that the file name reflects the current timestamp.
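
For instance, a minimal sketch of such an alias, written as a small bash function (the function name, the directory and the file naming scheme here are just examples):

    # hypothetical helper: save pending changes as a timestamped patch
    # (the target directory is just an example)
    hgsave() {
        hg diff > $HOME/safedir/`date "+%Y%m%d-%H%M%S"`.diff
    }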

There's also another approach, which so far I've only tried once. Since Hg commits are local, you can undo them by means of the hg strip command. I could even commit more frequently than I do, because if what I'm doing doesn't lead to the solution I'm looking for, I can strip those commits. Note that hg strip only drops the commits, but leaves your files untouched - this means that you could change something and then re-commit. Also, there are some hg commands that make it possible to re-arrange commits that haven't been pushed yet, e.g. to fold several commits into a single one if one wants "atomic" commits.
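
For instance, hg strip comes with the mq extension, so it has to be enabled in ~/.hgrc:

    [extensions]
    mq =

after which a local, unpushed revision can be dropped together with its descendants (the revision number here is just an example):

    hg strip 1234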

Of course, this is added complexity and one should measure the benefits. Since I started using the RAM disk again only yesterday, at the moment I can only see the obvious gains in compilation time, while any potential troubles will need about a month of continuous use to be evaluated. Also, I'm not using pomodoros with blueMarine yet, so it will take some time for this new workflow to consolidate.

OK... looking forward to your comments once your workflow has consolidated.