
Hudson usage analysis

Posted by kohsuke on January 21, 2009 at 5:42 PM PST

Starting with Hudson 1.264, Hudson has an option to send usage statistics. That version was released on December 16 last year, so the feature has been out for about a month, and I ran some analysis over the data collected so far.

First, I filtered the data to eliminate one-off installations that don't appear to be long-running. That is, I only counted installations that kept sending usage data over a span of more than 3 days within the last 2 weeks.
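One way to read this filtering rule: keep an installation if the reports it sent within the last 14 days span more than 3 days. Below is a minimal Java sketch under that reading. It assumes each report carries a hypothetical stable installation ID and a timestamp; the post doesn't say how reports are actually keyed.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of the "long-running installation" filter. The installation ID key is
// a hypothetical stand-in; the post does not describe how reports are identified.
final class InstallationFilter {
    static final Duration WINDOW = Duration.ofDays(14);  // "within the last 2 weeks"
    static final Duration MIN_SPAN = Duration.ofDays(3); // "more than 3 days"

    static Set<String> longRunning(Map<String, List<Instant>> reportsById, Instant now) {
        Instant windowStart = now.minus(WINDOW);
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, List<Instant>> e : reportsById.entrySet()) {
            Instant first = null;
            Instant last = null;
            for (Instant t : e.getValue()) {
                // Ignore reports outside the two-week window.
                if (t.isBefore(windowStart) || t.isAfter(now)) {
                    continue;
                }
                if (first == null || t.isBefore(first)) first = t;
                if (last == null || t.isAfter(last)) last = t;
            }
            // Keep the installation only if its in-window reports span more than 3 days.
            if (first != null && Duration.between(first, last).compareTo(MIN_SPAN) > 0) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }
}
```

An installation that reported once 30 days ago and once yesterday is dropped under this reading: only one of its reports falls inside the window, so its in-window span is zero.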

This filtering left me with 2127 installations.

So how big is the installation base?

Since not everyone wants to participate in the usage survey, and not everyone runs a version of Hudson that supports this feature, the first question is: how representative is this data?

These 2127 participating installations came from 1865 unique IPs. In comparison, the Hudson update center saw 11706 unique IPs during the same time period. If we use that as an estimate of the real installation base, the survey represents about 16% of all installations (1865/11706), which puts the total at about 13350 installations. (The update center was implemented in early June last year, so this estimate ignores everyone who hasn't upgraded Hudson in more than 6 months.)
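The estimate assumes that the survey's ratio of installations to IP addresses holds for the whole user base. Under that assumption, the surveyed installation count is scaled up by the share of update-center IPs that participated. Here is a small Java sketch of that arithmetic using the numbers from the post:

```java
import java.util.Locale;

// Reproduces the installation-base estimate from the numbers in the post.
// Assumes survey participants behave like the update-center population
// (same number of installations per IP address).
final class InstallBaseEstimate {
    /** Share of update-center IPs that also sent usage data. */
    static double participationRate(int surveyIps, int updateCenterIps) {
        return (double) surveyIps / updateCenterIps;
    }

    /** Scale the surveyed installation count up by the participation rate. */
    static long estimatedTotal(int surveyInstallations, int surveyIps, int updateCenterIps) {
        return Math.round(surveyInstallations / participationRate(surveyIps, updateCenterIps));
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "participation: %.1f%%%n",
                100 * participationRate(1865, 11706));                            // participation: 15.9%
        System.out.println("estimated installations: " + estimatedTotal(2127, 1865, 11706)); // 13350
    }
}
```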

The update center also lets us track the version numbers of installations, as shown in the following chart. The X-axis shows version numbers (discovered from update center pings), and the blue bars show the number of installations on each version (Y-axis). The red line shows the cumulative total. About half of the installations run versions older than 1.264 (54%, to be more precise). Only the remaining 46% can send usage data at all, so about 2/3 of the users whose Hudson is capable of participating chose not to.
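The "about 2/3" figure comes from combining the two data sources. If 54% of the 11706 update-center IPs run a version older than 1.264, roughly 5,385 IPs remain that could send usage data, and only 1865 of them did. The following Java sketch shows that step. It again assumes unique IPs are a fair proxy for installations.

```java
import java.util.Locale;

// Estimates the share of survey-capable Hudson masters (1.264 or later)
// that did not send usage data. Uses unique IPs as a proxy for installations.
final class OptOutEstimate {
    static double optOutRate(double shareBelow1264, int updateCenterIps, int surveyIps) {
        double capableIps = (1.0 - shareBelow1264) * updateCenterIps; // IPs able to participate
        return 1.0 - surveyIps / capableIps;                          // share of those that did not
    }

    public static void main(String[] args) {
        double rate = optOutRate(0.54, 11706, 1865);
        System.out.printf(Locale.ROOT, "opted out: %.0f%%%n", 100 * rate); // opted out: 65%
    }
}
```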

[Chart: installations per Hudson version, with cumulative line (versions.png)]

How many jobs do people run?

So now we are back to the survey data.

The following graph shows how many jobs people run on Hudson. The X-axis shows the number of jobs, and the Y-axis shows the number of installations. 90% of installations run fewer than 30 jobs.

On the other hand, there are some really large Hudson installations. Among the 33 installations with more than 140 jobs, the average is 273 jobs.

I should note that there's likely some bias pushing this number down: new users are better represented here than existing users, because the survey only counts existing users who upgraded in the last month.

One of my concerns with Hudson is that its UI tends to break down when you have a large number of jobs, so it's good to know that this problem is hurting fewer people than I thought.
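Both job-count figures are simple summaries of the per-installation job counts. The first is the share of installations below a threshold, and the second is the mean of the tail above another threshold. The Java sketch below computes both. The counts in its example are made up for illustration and are not survey data.

```java
import java.util.Arrays;

// Summary statistics over the number of jobs reported by each installation.
final class JobCountStats {
    /** Fraction of installations with fewer than `threshold` jobs. */
    static double shareBelow(int[] jobsPerInstallation, int threshold) {
        long n = Arrays.stream(jobsPerInstallation).filter(j -> j < threshold).count();
        return (double) n / jobsPerInstallation.length;
    }

    /** Mean job count among installations with more than `threshold` jobs (NaN if none). */
    static double meanAbove(int[] jobsPerInstallation, int threshold) {
        return Arrays.stream(jobsPerInstallation)
                .filter(j -> j > threshold)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        int[] jobs = {1, 5, 10, 29, 200, 300}; // illustrative values, not survey data
        System.out.println(shareBelow(jobs, 30)); // 0.6666666666666666
        System.out.println(meanAbove(jobs, 140)); // 250.0
    }
}
```

Applied to the real data with thresholds 30 and 140, these would give the 90% share and the 273-job average quoted above.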

[Chart: number of installations by number of jobs (job-size.png)]

Do people run distributed builds?

The following graph answers this question: 87% of the installations don't do distributed builds, and the remaining 13% do. But I don't know how to interpret this.

[Chart: distributed build cluster size (cluster-size.png)]

What job types do people use?

The 2127 installations reported 33496 jobs in total, and the following pie chart breaks them down by job type.

Predictably, the versatile freestyle job is the most commonly used, but I wasn't expecting this many Maven jobs, which make up about 24% of all jobs.
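The pie chart shows each job type's share of all jobs. At about 24%, Maven accounts for roughly 8,000 of the 33496 jobs. Here is a Java sketch of that breakdown. It assumes a hypothetical map from job type name to job count; the counts in its example are illustrative only.

```java
import java.util.Map;
import java.util.TreeMap;

// Turns per-type job counts into percentages of all jobs.
final class JobTypeBreakdown {
    static Map<String, Double> percentages(Map<String, Integer> countsByType) {
        int total = countsByType.values().stream().mapToInt(Integer::intValue).sum();
        Map<String, Double> pct = new TreeMap<>(); // sorted by type name
        countsByType.forEach((type, n) -> pct.put(type, 100.0 * n / total));
        return pct;
    }

    public static void main(String[] args) {
        // Illustrative counts only; the real per-type numbers are in the chart.
        Map<String, Integer> counts = Map.of("freestyle", 75, "maven", 25);
        System.out.println(percentages(counts)); // {freestyle=75.0, maven=25.0}
    }
}
```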

[Pie chart: jobs by job type (jobtypes.png)]

Plugins

Next, I looked at which plugins people use; the data is shown below. The most popular plugin is the FindBugs plugin, which is installed on about 1/3 of all installations. Ulli Hafner's static code analysis plugins are all very popular.

To a certain degree, the data is also interesting for inferring the popularity of different SCMs (clearcase=88, git=88, perforce=88, mercurial=75, vss=33, accurev=26, tfs=14, ...) and of different code coverage tools (cobertura=511, emma=281).

While this graph only shows plugins that are installed in more than 10 places, the analysis found 211 distinct plugins, whereas the Hudson update center hosts only 103. So a significant number of plugins appear to be written in places we can't see.

It's good to see that the effort we put into the plugin development environment is paying off.

Also, 83% of installations have at least one plugin, which confirms my assumption that the plugin system is a real draw for Hudson users.
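All of these plugin figures can be derived from the plugin list each installation reports:

- count installations per plugin,
- apply the more-than-10 cutoff used for the chart,
- compare the plugin names seen against what the update center hosts,
- check how many installations have any plugin at all.

The Java sketch below covers these four steps. `pluginsByInstallation` is a hypothetical stand-in for the collected reports, not the actual survey format.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Plugin statistics from the set of plugin names each installation reported.
final class PluginTally {
    /** Number of installations that have each plugin installed. */
    static Map<String, Integer> installsPerPlugin(List<Set<String>> pluginsByInstallation) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> plugins : pluginsByInstallation) {
            for (String p : plugins) {
                counts.merge(p, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Plugins installed in more than `minInstalls` places (the chart uses 10). */
    static Set<String> popular(Map<String, Integer> counts, int minInstalls) {
        Set<String> result = new TreeSet<>();
        counts.forEach((p, n) -> { if (n > minInstalls) result.add(p); });
        return result;
    }

    /** Reported plugins that the update center does not host. */
    static Set<String> notOnUpdateCenter(Set<String> reported, Set<String> hosted) {
        Set<String> result = new TreeSet<>(reported);
        result.removeAll(hosted);
        return result;
    }

    /** Fraction of installations with at least one plugin. */
    static double shareWithPlugins(List<Set<String>> pluginsByInstallation) {
        long n = pluginsByInstallation.stream().filter(s -> !s.isEmpty()).count();
        return (double) n / pluginsByInstallation.size();
    }
}
```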

[Chart: plugins installed in more than 10 places (plugins.png)]

Operating Systems

It looks like I messed up the data collection here, so I only have data about the OS that the Hudson master runs on. The breakdown is below. Linux, Windows, and Solaris together cover 95%.

This is good to know, because there are many OS-specific features in Hudson, and now I know I basically only need to worry about 3 OSes.
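Sorting the masters into those three families comes down to normalizing the reported OS name. The Java sketch below does this. It assumes the report carries the master's `os.name` system property; the post doesn't say which field was collected. Note that Java on Solaris reports `os.name` as "SunOS".

```java
import java.util.Locale;

// Groups os.name values into the OS families discussed in the post.
// Assumes the usage report includes the master's os.name property.
final class OsFamily {
    static String family(String osName) {
        String n = osName.toLowerCase(Locale.ROOT);
        if (n.startsWith("windows")) return "Windows";             // "Windows XP", "Windows Server 2003", ...
        if (n.startsWith("linux")) return "Linux";
        if (n.startsWith("sunos") || n.startsWith("solaris")) return "Solaris";
        return "Other";                                             // e.g. Mac OS X, AIX, HP-UX
    }

    public static void main(String[] args) {
        System.out.println(family(System.getProperty("os.name")));
    }
}
```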

[Chart: operating systems of Hudson masters (os.png)]

JDK version


Here I also made a mistake: I collected the system property "java.vm.version" instead of "java.version", and most VMs report weird values like "11.0-b16". These values appear to come from JDK6, so I counted them as JDK6, but if that's wrong, the picture would be very different.

In any case, JDK6 penetration on Hudson masters is about 65%, so Hudson needs to continue supporting JDK5.
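The two properties measure different things. "java.vm.version" identifies the VM build, so on HotSpot it reports values like "11.0-b16". "java.version" reports the Java release, for example "1.5.0_16" or "1.6.0_11". Here is a Java sketch that extracts the major release from "java.version", the property the post says should have been collected. The parsing rule is an assumption. It covers the old "1.x" scheme and the single-number scheme used by later Java releases.

```java
// Extracts the major Java release from a "java.version" string.
// "1.6.0_11" -> 6 (old 1.x scheme); "11.0.2" -> 11 (later scheme).
final class JdkVersion {
    static int major(String javaVersion) {
        String[] parts = javaVersion.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        return first == 1 ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        String release = System.getProperty("java.version");
        String vmBuild = System.getProperty("java.vm.version"); // what the survey actually collected
        System.out.println("java.version=" + release + " (major " + major(release)
                + "), java.vm.version=" + vmBuild);
    }
}
```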

[Chart: JDK versions of Hudson masters (jvm.png)]

[Chart: 32-bit vs. 64-bit (32-vs-64.png)]


Future Work

So that's about it for the "stop the world" analysis. I call it "stop the world" because it looks at the current state of installations across users without regard to how each of them evolved.

The other interesting analysis that I'd like to do eventually is a timeline analysis, to better understand how a Hudson installation evolves. The idea is to build a model of a typical Hudson installation from the time it's installed. This would allow me to say things like "on average, one new job is added every X days" or "on average, people upgrade Hudson every Y days." For this to be useful, however, we need to keep collecting more data.
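As a hypothetical example of the kind of metric a timeline analysis could produce, "one new job every X days" for a single installation can be approximated from its first and last reports. The Java sketch below uses only those two endpoints. A real model would more likely fit a trend across all of an installation's reports and then aggregate across installations.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical timeline metric: average days between job additions for one
// installation, estimated from its first and last usage reports only.
final class GrowthEstimate {
    static double daysPerNewJob(LocalDate firstDay, int firstJobs, LocalDate lastDay, int lastJobs) {
        long days = ChronoUnit.DAYS.between(firstDay, lastDay);
        int added = lastJobs - firstJobs;
        // With no net growth, jobs are not being added "every X days" at all.
        return added > 0 ? (double) days / added : Double.POSITIVE_INFINITY;
    }

    public static void main(String[] args) {
        // Illustrative numbers: 10 jobs on Jan 1, 16 jobs on Jan 31.
        System.out.println(daysPerNewJob(LocalDate.of(2009, 1, 1), 10,
                LocalDate.of(2009, 1, 31), 16)); // 5.0
    }
}
```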

Until then, thank you very much for participating in the survey.


Comments

brianthewise -- that is not something we collect. There was an idea on the dev list to develop a more elaborate data collection mechanism as a plugin, so that people who are willing to participate can do so. Perhaps SCM usage would fit better there.

This is great stuff. Is there any data on how the numbers for the native SCMs (SVN and CVS) compare to those supported by plugins?

jhm -- building a project on multiple platforms is well supported in Hudson; it's called a "matrix project". It's also often very useful for running tests on a large number of different combinations. As for the JVM, this is the JDK version that the Hudson master runs on, which can differ from the JDKs used to build projects.

emilian -- Knowing users is harder for open-source projects than for commercial software (where a license purchase creates a contact), and knowing users helps guide development. So thank you for making an exception for open-source projects.

Very nice to see, especially the plugin list. If my evaluation of CI systems lands on Hudson (and I think it will ;) this would be a starting point for installing extensions. I think distributed builds are a good thing for large installations, since they work as a load balancer. But I have another scenario: building (or just testing) a project on multiple OSes. We have a large number of VB projects and I want to integrate them too. Regarding the JVM version: is this the VM Hudson uses or the VM the project requires? I could imagine Hudson running on an up-to-date Java version (1.6.*) while projects have to be built in an old environment (we have a 1.4 requirement in one project). Anyway, good to see an objective measurement that Hudson is a product you can count on. BTW: Apache hosts a Hudson instance at http://hudson.zones.apache.org/hudson/ (as well as a Continuum installation at http://vmbuild.apache.org/continuum/groupSummary.action)

Those are some pretty nice statistics. I always disliked sending usage statistics because of the big-brother gloom attached to it. But after seeing your charts here and the charts for NetBeans ( http://statistics.netbeans.org/analytics/ ), it occurred to me that maybe for open-source projects I can make an exception and tick that "phone home usage info" checkbox. Anyhow, congrats on Hudson; I've been using it for maybe 2 years now, at 2 companies. And my former employer from 2006 is also using Hudson now.