Mustang's HotSpot Client gets 58% faster!
Posted by opinali on November 10, 2005 at 09:13 AM | Comments (17)
[FIXED: I originally posted wrong data for JSE 6.0 Client b59. The average
score was correct, so it stands that the Client VM became 58% faster,
but the scores for the individual tests were identical to another VM's
because I copied and pasted the HTML and didn't change the values. Sorry
for the confusion. (I re-checked all other numbers.) I also deleted the
comments about an apparently big impact on array-intensive code,
because with the new detailed data this trend doesn't look so big
anymore: the FFT test improves more than matmult and LU even though it
isn't array-intensive, and SOR is array-intensive but doesn't improve
substantially.]
Mustang is adding new optimizations to HotSpot. That's not news: we
expect the JVM to learn new tricks with every release. The great news
is that Sun just added a major performance enhancement to HotSpot
Client.
Yep, the other HotSpot... the one that never appears in benchmarks
meant to make Java or particular JVMs look good. Unfortunately, you
can't realistically use the Server VM in most client apps, from simple
database front-ends to action games, as it loads slower, eats more RAM,
and isn't included in the more ubiquitous JRE package.
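To check which of the two VMs a program is actually running on, one option is to read the standard `java.vm.name` system property. The sketch below is only illustrative: the class name `WhichVm` is made up for this example, and the exact string reported varies by vendor and release (on Sun JVMs of this era it typically mentions "Client VM" or "Server VM", selected with the `-client` or `-server` launcher flags).

```java
public class WhichVm {
    // Returns the VM name reported by the runtime, e.g. a string
    // containing "Client VM" or "Server VM" on HotSpot. The exact
    // wording is vendor- and version-specific (assumption: HotSpot).
    public static String vmName() {
        return System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println("Running on: " + vmName());
    }
}
```

Running it once with `java -client WhichVm` and once with `java -server WhichVm` shows whether both VMs are installed and which one is the default.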
Well, good news: Build 59 includes an improvement recorded as Bug
6320351: new register allocator for C1. ("C1" means HotSpot Client; C2
is Server.) The bug description is short, so here it is in full: "The
existing register allocator for C1 is very primitive and doesn't take
very good advantage of the registers available. It also doesn't
allow values to live across blocks. Along with the high level IR
models locals in such an explicit manner that the code quality for
inlining is fairly bad." A good register allocator is even more
important on the Intel x86 platform because it has so few
architectural registers.
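To make the "values live across blocks" point concrete, here is a small sketch of the kind of code where it matters. The method `smooth` and the class name are hypothetical, not taken from HotSpot or SciMark2. The locals `sum`, `prev`, `scale` and `i` are all needed on both sides of the branch and on every iteration, so an allocator that can't keep values in registers across basic blocks would have to reload them from the stack each time.

```java
public class RegisterPressureSketch {
    // Hypothetical kernel: sum, prev, scale and i stay live across the
    // loop header, both branch arms, and the loop back-edge.
    public static double smooth(double[] data, double scale) {
        double sum = 0.0;
        double prev = 0.0;
        for (int i = 0; i < data.length; i++) {
            double cur = data[i] * scale;
            if (cur > prev) {
                sum += cur - prev;          // rising step counts fully
            } else {
                sum += (prev - cur) * 0.5;  // falling step counts half
            }
            prev = cur;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 3.0, 2.0, 5.0};
        System.out.println(smooth(data, 2.0)); // prints 13.0
    }
}
```

Whether a given JIT build actually keeps these values in registers can only be confirmed by inspecting the generated code; the example just shows the code shape the bug description is talking about.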
How cool is this new optimization? I tested with SciMark2, a good
benchmark for this kind of thing because it runs tight loops of
complex arithmetic expressions, array accesses and similar operations
that benefit enormously from good register allocation. Results (on
Windows / Pentium-IV 2.25GHz) are:
| Release           | Score          | HotSpot Client | HotSpot Server |
| J2SE 5.0 Update 5 | Average        | 168            | 410            |
|                   | FFT            | 87             | 232            |
|                   | SOR            | 308            | 563            |
|                   | MonteCarlo     | 24             | 70             |
|                   | Sparse matmult | 116            | 333            |
|                   | LU             | 303            | 853            |
| JSE 6.0 build 58  | Average        | 169            | 431            |
|                   | FFT            | 87             | 274            |
|                   | SOR            | 308            | 560            |
|                   | MonteCarlo     | 40             | 126            |
|                   | Sparse matmult | 119            | 289            |
|                   | LU             | 291            | 905            |
| JSE 6.0 build 59  | Average        | 265            | 432            |
|                   | FFT            | 190 (+118%)    | 272            |
|                   | SOR            | 354 (+15%)     | 560            |
|                   | MonteCarlo     | 50 (+25%)      | 126            |
|                   | Sparse matmult | 250 (+110%)    | 289            |
|                   | LU             | 480 (+65%)     | 910            |
Notice the remarkable improvement in HotSpot Client: a 58% boost over
5.0 (the best stable HotSpot) and about 57% over Mustang b58 (the
previous build, without the new optimization). The Server VM is also
improving in Mustang, but only by a modest 5% over 5.0 (which is
already good enough to produce near-ideal code; SciMark2 doesn't need
most of the new tricks in Server 6.0, from lock coarsening to the
[still upcoming] benefits of escape analysis).
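The percentages quoted here can be re-derived from the table. The sketch below assumes the overall score is the plain arithmetic mean of the five kernel scores, which matches the published numbers to within rounding; the class and method names are made up for illustration.

```java
public class ScoreCheck {
    // Overall score as the rounded arithmetic mean of the kernel scores
    // (assumption: this is how the table's average was obtained).
    public static long average(int... kernelScores) {
        double sum = 0.0;
        for (int s : kernelScores) {
            sum += s;
        }
        return Math.round(sum / kernelScores.length);
    }

    // Relative gain of 'after' over 'before', as a rounded percentage.
    public static long percentGain(double before, double after) {
        return Math.round((after / before - 1.0) * 100.0);
    }

    public static void main(String[] args) {
        // Client b59 kernels: FFT, SOR, MonteCarlo, Sparse matmult, LU
        System.out.println(average(190, 354, 50, 250, 480)); // 265
        System.out.println(percentGain(168, 265)); // Client b59 vs 5.0u5: 58
        System.out.println(percentGain(169, 265)); // Client b59 vs b58:   57
        System.out.println(percentGain(410, 432)); // Server b59 vs 5.0u5: 5
    }
}
```

The same `percentGain` call reproduces the per-test figures in the build 59 Client column when each score is compared with its build 58 counterpart, for example 87 to 190 for FFT gives +118%.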
This all spells good news for Java clients. The improved code
generation is good not just for action games with sophisticated
geometry calculations. Complex code like Java2D's blit loops, Swing's
painting and layout management code, chart renderers, XML parsers,
middleware stacks and more lies beneath our vanilla CRUD & reporting
applications. HotSpot Client is doomed to always lag behind Server,
but Moore's Law allows more and more tricks to be packed even into the
JVM for low-end machines and user-time sensitive apps.
Comments
Comments are listed in date ascending order (oldest first)
-
One interesting thing to note is that the client JVM is actually faster than the server JVM on the "Sparse matmult" benchmark. Any ideas why?
Posted by: mayhem on November 10, 2005 at 11:18 PM
-
You say that escape analysis is "still upcoming" but the link to escape analysis (http://www-128.ibm.com/developerworks/java/library/j-jtp09275.html) says "the current builds of Mustang (Java SE 6) can do escape analysis"...?
Posted by: marc_ on November 11, 2005 at 01:17 AM
-
I believe escape analysis is performed in the latest HotSpot builds, but no escape analysis related optimizations are done. Some benchmarks will see a huge improvement (10x) when these optimizations are implemented.
Posted by: mayhem on November 11, 2005 at 03:56 AM
-
Absolutely awesome!
Congratulations to all involved. May many more 58%'s be coming soon :)
When is the beta due out?
Posted by: profiler on November 11, 2005 at 04:05 AM
-
Compare the data "Hotspot Client - JSE 6.0 build 59" and "HotSpot Server - J2SE 5.0 Update 5": the detailed data are identical, while the "total performance index" is different.
How should I read these numbers?
Posted by: insac on November 11, 2005 at 05:52 AM
-
insac: the LU results differ (910 != 905), but it's marginal.
Posted by: danielmd on November 11, 2005 at 06:08 AM
-
insac: sorry, my mistake. insac is right, the tables are equal. Is it a typo?
Posted by: danielmd on November 11, 2005 at 06:10 AM
-
Hi insac and all: I fixed the table, see the comment at the top. Sorry about that, I must have been sleepy. But the overall conclusions still hold, as the most important number, the average score, was correct.
Posted by: opinali on November 11, 2005 at 08:58 AM
-
this is great, apart from some typos maybe?
i ran a quantum mechanical calculation using https://quantumj.dev.java.net/, and it seems to be really fantastic:
HF/sto3g, water, 7 contractions, single point energy
5.0 : 2129 ms
6.0b59: 1057 ms
so i see a good possibility of using more java in numerical algos.
Posted by: myjinic on November 11, 2005 at 09:56 PM
-
thats cool.
i ran a quantum chemical code https://quantumj.dev.java.net/
with HF/sto3g, for water, single point energy:
5.0: 1475 ms
6.0b59: 930ms
so this appears to be promising.
Posted by: myjinic on November 11, 2005 at 11:23 PM
-
sorry, the earlier posting was the first-time run; the second is the average time of 10 runs.
Posted by: myjinic on November 11, 2005 at 11:24 PM
-
Both the 1.5.0_05 Server and the 1.6b59 "unoptimize" the Monte Carlo code. It starts out at a value in the 90s, then falls off by 50% as the tests are repeated in a loop.
Monte Carlo : 98.72
Monte Carlo : 90.41
Monte Carlo : 60.92
Any idea why this happens?
Nice speed improvement otherwise!
Posted by: schnied on November 12, 2005 at 03:15 PM
-
schnied: I'm now on b60, but I also tested with 5.0u5, and in both releases I don't see this degradation of MonteCarlo. If I change the benchmark code to run kernel.measureMonteCarlo() in a loop, the scores are very stable. This is the single individual test where Mustang's Server improves over Tiger (+80%), so it may depend on some recent optimization that may still be in the works, buggy or badly tuned, but even this wouldn't explain this behavior for 5.0u5. What you see is quite odd, because HotSpot Server doesn't recompile anything after the first run of MonteCarlo, at least here and with default options.
Posted by: opinali on November 14, 2005 at 05:18 AM
-
It means that optimizations for P4/SSE2 were implemented in HotSpot Client / Java SE 6.0 as well (before, they lived only in HotSpot Server).
They greatly affect floating-point computations, but other code does not benefit from them.
The hardware must have support for SSE2.
Posted by: doublebass on November 14, 2005 at 09:59 PM
-
doublebass: Good observation, I failed to remember that HSC didn't use SSE previously. This is certainly a big factor for the benchmarks that do FP... and this is 100% of the tests in SciMark2. But I guess that the new register allocator should benefit non-FP code too; see Sun's description of the problems in the old allocator, like poor interaction with inlining. Any routine that has many local variables (including those that "steal" locals from inlined methods) should see some benefit, but we'd need more targeted benchmarks to see this.
Posted by: opinali on November 15, 2005 at 07:48 AM
-
I'm curious as to how the five values shown in the Java 6 build 59 client column are EXACTLY the same as the five values shown in the Java 5 server column. Yet the overall values given (267 client, 410 server) are different. Could there have been a transcription error?
Posted by: jxt on November 18, 2005 at 06:07 AM
-
Oh well, sorry about that previous post. I would retract it if I could. I got to the original blog entry from a link, and it showed only the old numbers. After I posted my comment, I was taken to the updated entry. Weird. But the previous post should be disregarded as it has already been pointed out and corrected.
Posted by: jxt on November 18, 2005 at 06:13 AM