
First look at JavaFX 1.2, Part II

Posted by opinali on June 10, 2009 at 8:02 AM PDT

Check the first part here. By just adding -server, I got the following results (standard runs, without removing the toolbar or any other tricks):

  • 16 Balls @ 980fps (1% CPU): 1.47X faster than HotSpot Client;
  • 128 Balls @ 460fps (14% CPU): 1.39X faster;
  • Adaptive mode / 285 Balls @ 200fps (20% CPU): 1.28X more load;
  • Adaptive mode / 610 Balls @ 60fps (24% CPU): 1.08X more load;

HotSpot Server is not adequate for internet deployments or client software: it is not included in the public JRE, and it has longer loading times and a bigger memory footprint. Typical Server warmup behavior also applies: the 16 Balls benchmark, for example, only reaches its maximum score after ~45s of execution. In a real application with a "normal" frame rate, Server might take even more time to fully optimize JavaFX, unless the lower fps is compensated by a more complex scene.
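
The warmup effect is easy to reproduce outside JavaFX. Here's a minimal sketch (the workload is my own illustration, not the JavaFX Balls benchmark): it times successive batches of the same hot method, and under -server the later batches typically run faster once HotSpot has compiled and optimized the loop.

```java
// Warmup probe: times successive batches of the same hot method.
// Run with -client vs. -server and compare how quickly batch times stabilize.
public class WarmupProbe {
    // The hot method the JIT will eventually compile; illustrative workload.
    static double work(int n) {
        double acc = 0;
        for (int i = 1; i <= n; i++) acc += Math.sqrt(i);
        return acc;
    }

    public static void main(String[] args) {
        for (int batch = 0; batch < 5; batch++) {
            long t0 = System.nanoTime();
            double r = work(2000000);
            long ms = (System.nanoTime() - t0) / 1000000;
            System.out.println("batch " + batch + ": " + ms + " ms (result " + r + ")");
        }
    }
}
```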

But these results are surprising, because I'd have thought that after JavaFX 1.2's scene graph improvements, JavaFX Balls' performance would already be dominated by native code (FX sits on top of a good chunk of native libs, from its own runtime and also from Java2D's accelerated pipelines). So we can see interesting potential: Java code in the runtime is still very significant, and maybe the new scene graph code (and also Java2D) can be further optimized... or replaced by even more native code. Sun has already abandoned the PureJava dogma for the JavaFX runtime (the Mobile runtime, apparently, is even more native), so I wouldn't write that off. HotSpot's JIT rules and all, and it often delivers C-level code speed, but no bytecode/JIT technology will ever compete with native code for near-zero loading time and minimal memory usage.

[The only reason why a competitor like Flash has that split-second loading time is that its runtime is basically a monolithic blob of native code written in C/C++. Flash's managed code (AS3) is not significantly better than Java in loading time, is just as bad in resource usage, and is worse in execution performance. Just look at AIR applications: they are every bit as slow-loading and memory-hungry as Swing applets, because AIR's frameworks are written in ActionScript.]

Another possibility is actually using HotSpot Server's superior optimizations... how? Recent builds of HotSpot (in JDK 7 and JDK 6u10+) have a "tiered" execution mode that should combine the best of both worlds: fast startup and lean footprint, because most methods are compiled by the Client JIT, but top performance, as the really hot methods are (re)compiled by the Server JIT. If this works well, it's another promise of an important boost to the performance of Java desktop software. You can try this with -server -XX:+TieredCompilation, but the implementation is incomplete. This recent OpenJDK thread says it's "in the back burner", apparently with no progress since Steve Goldman, the engineer who was doing it (and to whom we owe other things, like the current x64 support), passed away one year ago. The JVM switch works, showing very different compilation logs, but the performance is not there; it's not significantly different from pure Server. Several bugs track this feature, and I hope it can really get done in JDK 7, but I think that won't be for FCS, as it's clearly not a priority (release driver). Besides that, other improvements in JDK 7, notably the Jigsaw modularization, may further improve the loading time of JavaFX (and also Swing) applets.
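
If you want to confirm which compiler configuration a given VM is actually running, the standard java.lang.management API reports it. A small sketch (the exact name strings vary across VM versions):

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Reports the JIT compiler the running VM is using, e.g.
// "HotSpot Client Compiler", "HotSpot 64-Bit Server Compiler",
// or, in tiered mode, a name mentioning "Tiered Compilers".
public class JitInfo {
    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        System.out.println("JIT: " + jit.getName());
        if (jit.isCompilationTimeMonitoringSupported()) {
            System.out.println("Total compile time: "
                    + jit.getTotalCompilationTime() + " ms");
        }
    }
}
```

Launching this with -client, -server, and -server -XX:+TieredCompilation shows whether the switch was actually honored.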


Comments

Don't know if this will help with tiered compilation, but I've found that -XX:Tier2CompileThreshold=35000, which makes the server compiler not kick in so soon (it defaults to 10000, I think), gives better performance in some cases.

Osvaldo, thanks for the answer :)

@rael_gc: Now, with a more straight answer to your question... :-) In a way, Sun already abandoned excess purism years ago (JDK 1.2?). If you peek inside the API sources, you'll find abundant use of internal "magic" classes like the famous sun.misc.Unsafe, which enables all sorts of C-level features, like fully unrestricted memory access or really low-level synchronization operations. HotSpot optimizes out calls to Unsafe's methods by replacing them with equivalent intrinsic operations. The net result is the same as having extended, unsafe bytecodes, except that you don't need to create new bytecodes. Of course, access to such APIs is strictly controlled by security policies, so you can't normally use them in application code. There are also other higher-level "impure" optimizations, like java.nio's direct buffers, and the extensive optimization of Java2D to take advantage of the OpenGL and Direct3D pipelines (and also XRender in JDK 7).
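
As an illustration of the kind of C-level capability involved (not something application code should do, and blocked by the security manager in sandboxed deployments), sun.misc.Unsafe can be reached via reflection on HotSpot-based JDKs:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Off-heap memory access through sun.misc.Unsafe -- the kind of
// C-level feature the JDK internals use. Reflection is needed because
// Unsafe.getUnsafe() rejects ordinary application callers.
public class UnsafeDemo {
    public static void main(String[] args) throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        long addr = unsafe.allocateMemory(8);      // raw malloc-style allocation
        try {
            unsafe.putLong(addr, 42L);             // unchecked write
            System.out.println(unsafe.getLong(addr)); // prints 42
        } finally {
            unsafe.freeMemory(addr);               // manual free -- no GC here
        }
    }
}
```

HotSpot replaces the putLong/getLong calls with single machine instructions, which is why the JDK internals can use Unsafe without paying a method-call penalty.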

@rael_gc: Good question. I don't expect major changes in the JRE architecture, simply because it's already a very stable, tested, tuned codebase, and it just doesn't make sense to rewrite it ("if it ain't broken, don't fix it"), unless the rewards are VERY high to compensate for the risk. My theory is that Sun could reimplement, with native code, a "kernel" part of the JavaSE API - i.e., those few hundred classes that every app loads (java.lang, java.util, java.util.jar, part of java.io, etc.) - and this would result in significant savings in loading time: at least for simpler apps, only classes from the application itself would require regular classloading. But there are other problems... if we convert existing Java classes into native code, we increase the Java/native boundaries (and JNI invocations are very expensive), and we reduce optimization opportunities (HotSpot cannot inline calls from Java classes to native functions). So there is a very clear tradeoff between loading time and performance (not to mention the maintenance cost of native code). The JNI overhead could be reduced if the JNI standard were somehow "fixed", but part of that overhead comes from advanced VM features; other VMs with lighter native interfaces seem to have less advanced GCs and JITs (when they have a JIT at all), so it's not as easy as saying that JNI is a bad design because Perl or Ruby or whatever has a much simpler and lighter native interface. A better idea (probably) is using AOT (ahead-of-time) compilation of those "kernel" APIs: just keep the existing code, but have HotSpot precompile and cache the binary code. Something along the lines of CDS, but caching JITted code, not only pre-linked/verified bytecodes.
This probably means that the cached code must be compiled without context-specific optimizations, like aggressive inlining that depends on speculative devirtualization - but this tradeoff would be OK for HotSpot Client; I think most developers would be fine with a modest loss of performance (say, 10% less throughput) if this is the cost of "instant" loading of applets. Notice that we don't need to completely disable these aggressive optimizations; it's only for a subset of the core APIs that are critical for app loading. The IBM JDK already implements a very extensive JIT cache, but unfortunately their VM is not well tuned for client apps; the cache serves mostly to save memory (through sharing) when you load multiple instances of a monster app like WebSphere on the same machine. IBM's JIT cache can even store code that uses advanced optimizations, but the result is that the cache is complex and big and hard to share between different apps (on my system I have RAD and WAS, and they use separate cache files - respectively 50MB and 90MB, both quite big, once again because the VM doesn't care for client-side needs like memory efficiency; it's only tuned for throughput). A real AOT compiler like Excelsior's produces more compact code, so we could imagine some combination of the two: an AOT compiler for just the kernel APIs, and a top-class JIT like HotSpot for the rest...

Osvaldo, do you know if "the PureJava dogma" will be abandoned for the Java VM too? Or only for the JavaFX runtime?