Mustang's HotSpot Client gets 58% faster!

Posted by opinali on November 10, 2005 at 9:13 AM PST

[FIXED: I originally posted wrong data for JSE 6.0 Client b59. The average score was correct, so it stands that the Client VM became 58% faster, but the scores for the individual tests were identical to another VM's, because I copied and pasted the HTML without changing the values. Sorry for the confusion. (I re-checked all the other numbers.) I also deleted the comments about an apparently big impact on array-intensive code, because with the new detailed data this trend doesn't look so big anymore: e.g. the FFT test improves more than matmult and LU although it isn't array-intensive, while SOR is array-intensive but doesn't improve substantially.]

Mustang is adding new optimizations to HotSpot. That's not news; we expect the JVM to learn new tricks with every release. But the great news is that Sun just added a big performance enhancement to HotSpot Client. Yep, the other HotSpot... the one that never appears in benchmarks aimed at making Java or particular JVMs look good. Unfortunately, you can't realistically use the Server VM in most client apps, from simple database front-ends to action games, as it loads slower, eats more RAM, and isn't included in the more ubiquitous JRE package.

Well, good news: Build 59 includes an improvement recorded as Bug 6320351: new register allocator for C1. ("C1" means HotSpot Client; C2 is Server.) The bug description is short, so here it is in full: "The existing register allocator for C1 is very primitive and doesn't take very good advantage of the registers available. It also doesn't allow values to live across blocks. Along with the high level IR models locals in such an explicit manner that the code quality for inlining is fairly bad." A good register allocator is even more important on the Intel x86 platform, because it has so few architectural registers.
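To make the register-pressure point concrete, here is a minimal sketch of the kind of loop that benefits. It is written in the style of a successive over-relaxation (SOR) sweep, similar in spirit to SciMark2's SOR test but not copied from it; the class name SorSketch and the method relax are made up for illustration. The inner loop keeps several values live at once (the index, the loop bound, three row references, two constants), so an allocator that can't hold them in registers presumably ends up reloading them from memory on every iteration.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the SciMark2 source.
class SorSketch {

    // Runs the given number of relaxation sweeps over the interior of grid, in place.
    static void relax(double omega, double[][] grid, int sweeps) {
        int rows = grid.length;
        int cols = grid[0].length;
        // Loop-invariant constants: good candidates to stay in registers.
        double omegaOverFour = omega * 0.25;
        double oneMinusOmega = 1.0 - omega;

        for (int s = 0; s < sweeps; s++) {
            for (int i = 1; i < rows - 1; i++) {
                // Three row references that stay live across the whole inner loop.
                double[] above = grid[i - 1];
                double[] row = grid[i];
                double[] below = grid[i + 1];
                for (int j = 1; j < cols - 1; j++) {
                    row[j] = omegaOverFour * (above[j] + below[j] + row[j - 1] + row[j + 1])
                            + oneMinusOmega * row[j];
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][] grid = new double[100][100];
        for (double[] r : grid) {
            Arrays.fill(r, 1.0);
        }
        relax(1.25, grid, 10);
        // A constant grid is a fixed point of the sweep, so this value should stay at 1.0.
        System.out.println(grid[50][50]);
    }
}
```

Running the same class under the Client and Server VMs (and timing many sweeps on a larger grid) would be one rough way to see the difference yourself, though a real benchmark like SciMark2 handles warm-up and measurement far more carefully.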

How cool is this new optimization? I tested it with SciMark2, a good benchmark for this kind of thing because it runs tight loops of complex arithmetic expressions, array accesses and similar operations that benefit enormously from good register allocation. Results (on Windows / Pentium-IV 2.25GHz) are:

                      HotSpot Client    HotSpot Server

J2SE 5.0 Update 5
  FFT                 87                232
  SOR                 308               563
  MonteCarlo          24                70
  Sparse matmult      116               333
  LU                  303               853

JSE 6.0 build 58
  FFT                 87                274
  SOR                 308               560
  MonteCarlo          40                126
  Sparse matmult      119               289
  LU                  291               905

JSE 6.0 build 59
  FFT                 190 (+118%)       272
  SOR                 354 (+15%)        560
  MonteCarlo          50 (+25%)         126
  Sparse matmult      250 (+110%)       289
  LU                  480 (+65%)        910

Notice the remarkable improvement in HotSpot Client: the average score gets a 58% boost over 5.0 (the best stable HotSpot), and about the same over Mustang b58 (the previous build, without the new optimization). The Server VM is also improving in Mustang, but only by a modest 5% over 5.0. (Server 5.0 is already good enough to produce near-ideal code for this benchmark; SciMark2 doesn't need most of the new tricks of Server 6.0, from lock coarsening to the [still upcoming] benefits of escape analysis.)
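For anyone who wants to check the arithmetic, here is a small sketch that recomputes those percentages from the table above, assuming the overall score is just the plain average of the five test scores. The class and method names (SciMarkSummary, mean, gainPercent) are mine, made up for this example.

```java
// Hypothetical helper; the scores are copied from the results table,
// in the order FFT, SOR, MonteCarlo, Sparse matmult, LU.
class SciMarkSummary {
    static final double[] CLIENT_50  = {87, 308, 24, 116, 303};
    static final double[] CLIENT_B58 = {87, 308, 40, 119, 291};
    static final double[] CLIENT_B59 = {190, 354, 50, 250, 480};
    static final double[] SERVER_50  = {232, 563, 70, 333, 853};
    static final double[] SERVER_B59 = {272, 560, 126, 289, 910};

    // Arithmetic mean of the individual test scores (assumed to be the overall score).
    static double mean(double[] scores) {
        double sum = 0.0;
        for (double s : scores) {
            sum += s;
        }
        return sum / scores.length;
    }

    // Improvement of the mean score, in percent, going from 'before' to 'after'.
    static double gainPercent(double[] before, double[] after) {
        return (mean(after) / mean(before) - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        System.out.printf("Client b59 vs 5.0: +%.0f%%%n", gainPercent(CLIENT_50, CLIENT_B59));
        System.out.printf("Client b59 vs b58: +%.0f%%%n", gainPercent(CLIENT_B58, CLIENT_B59));
        System.out.printf("Server b59 vs 5.0: +%.0f%%%n", gainPercent(SERVER_50, SERVER_B59));
    }
}
```

With the numbers above, the Client averages work out to 167.6 (5.0), 169.0 (b58) and 264.8 (b59), and the Server averages to 410.2 (5.0) and 431.4 (b59), which gives roughly +58%, +57% and +5% respectively.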

This all spells good news for Java clients. The improved code generation is good not just for action games with sophisticated geometry calculations. Complex algorithms like Java2D's blit loops, Swing's painting and layout management code, chart renderers, XML parsers, middleware stacks and others lie beneath our vanilla CRUD & reporting applications. HotSpot Client is doomed to always lag behind Server, but Moore's Law makes it possible to keep packing more tricks even into the JVM for low-end machines and user-time-sensitive apps.