
Programming bitmapped graphics with JavaFX

Posted by opinali on October 29, 2009 at 8:36 AM PDT

In my latest attempt to stress the JavaFX platform, I ported the Strange Attractor demo/benchmark. Unlike JavaFX Balls, this is not scenegraph-driven animation but old-school "pixel by pixel" drawing… still, it makes for another batch of interesting findings, including a few issues in the JavaFX Script language and its compiler, and other topics like fractal maths, BigDecimal, and JDK 7's stack allocation.

UPDATE: All webstart apps here are now updated for JavaFX 1.3, so their performance may differ from what this article describes.

I found Strange Attractor in Miguel de Icaza's blog, which lists three implementations: Canvas/JavaScript, Flash/AS3, and Silverlight, in order of increasing performance. The general point seems to be that statically-typed languages will wipe the floor with dynamic languages when it comes to performance. I happen to agree… except that one important RIA platform is missing from that list ;-) so I went to Joa Ebert's blog, fetched the Silverlight code, and ported it over to JavaFX.

The porting was easy once I found my way around JavaFX's limitations. It happens that JavaFX doesn't let you "paint" a component, as AWT/Swing and most other 2D toolkits do; you can only "draw" things by composing scenegraph nodes. But this wouldn't work here: Strange Attractor is a particle animation demo, and it uses 300K particles to render a 3D fractal. I could use a tiny, pixel-sized rectangle for each particle, but this would very likely come dead last in the performance race. Even if JavaFX's scenegraph scales very well, the memory weight of all those nodes and the rendering overhead would certainly kill it.

But the solution is simple. First, I create an ImageView node for the animation. It contains an Image object that is initialized from a blank image. So far, standard stuff. Now, in order to "paint" the fractal in this Image, I do this:

 

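import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;

function move (deltaX:Float, deltaY:Float)
{
    // Desktop profile only: the Image's platformImage is a BufferedImage;
    // grab the int[] that backs its pixels.
    var pixels = ((img.platformImage as BufferedImage).getRaster()
        .getDataBuffer() as DataBufferInt).getData();
    // Clear the frame to black.
    java.util.Arrays.fill(pixels, 0x000000);
    ...
            // Brighten the hit pixel; R, G and B stay equal, so clamping the
            // whole int saturates at white.
            pixels[index] = min(pixels[index] + 0x202020, 0xFFFFFF);
    ...
    // Internal API: forces the ImageView to refresh from the backing pixels.
    imgView.impl_transformsChanged();
}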

 

I have to resort to two "internal" tricks. First, I access the Image's platformImage property; its declared type is opaque (Object), as the actual type is platform-dependent. For the desktop profile, implemented on top of Java SE APIs like Java2D, that type is BufferedImage. So I just need to cast, then use standard Java SE APIs to put my dirty hands on the int[] array that contains the pixels. I can fill this array with black with Arrays.fill(), read and write individual pixels by just indexing its positions, etc.

In the second trick, as soon as the frame is complete, I call ImageView.impl_transformsChanged(). This is another internal method; the runtime invokes it automatically when the node's transforms change. Normal apps never need to call it, so it's not an official API. But it has the side effect of forcing the ImageView node to refresh itself from the backing pixels. Notice that my ImageView has no transforms at all, so this should not perform any other redundant work.
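For readers who want to try the pixel-array technique outside JavaFX, here is a minimal plain Java2D sketch of the same idea. It assumes a TYPE_INT_RGB BufferedImage created directly (in the JavaFX version the image instead comes from the internal platformImage property), and the names PixelBufferSketch, pixelsOf, brighten and plot are hypothetical. There is no impl_transformsChanged() equivalent here; in plain Java2D you would typically just repaint the component that draws the image.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;
import java.util.Arrays;

/** Sketch: plot particles straight into a BufferedImage's int[] pixel array. */
final class PixelBufferSketch {
    static final int STEP = 0x202020;   // brightness added per particle hit
    static final int WHITE = 0xFFFFFF;

    /** Returns the live int[] backing an INT_RGB image; writes show up in the image. */
    static int[] pixelsOf(BufferedImage img) {
        return ((DataBufferInt) img.getRaster().getDataBuffer()).getData();
    }

    /** One more hit on a gray pixel; channels move in lock-step, so clamping the int saturates at white. */
    static int brighten(int rgb) {
        return Math.min(rgb + STEP, WHITE);
    }

    static void plot(int[] pixels, int width, int x, int y) {
        int index = y * width + x;
        pixels[index] = brighten(pixels[index]);
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        int[] pixels = pixelsOf(img);
        Arrays.fill(pixels, 0x000000);                        // clear the frame to black
        for (int i = 0; i < 10; i++) plot(pixels, 4, 1, 2);  // ten hits on one pixel
        System.out.printf("%06X%n", img.getRGB(1, 2) & WHITE);
    }
}
```

Running main() prints FFFFFF: seven hits take a black pixel to 0xE0E0E0, and the eighth saturates it at white.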

In an ideal world, we'd have an official ImageView.invalidate() method. There are some issues with my hacks, so I filed the bug RT-5548: Provide (official) support for bitmapped rendering. I explore the issue in more depth in that bug report, so read, comment, or vote there if you are interested. This is all we can do to influence/lobby a project that's not open source. I will just paste my final comment here: "Right now JavaFX is pretty hard for third-party extension developers. Suppose I want to create a new Control that really demands custom (non scenegraph-based) rendering, what should I do? Full source code is not available so I can't consult it; in-depth technical documentation does not exist at all; the platform still misses important functionality for some people. If the team at least provides some guideline about JavaFX-to-native-2D integration, at least we can work around these limitations while they exist."

The performance

So, just how fast can JavaFX move these particles? I've found that this depends on several factors, so I actually created four variants of the program, identified below by their class names (I've bundled each in a single .fx file, mostly to make the variants easier to manage). In these names, Float = float precision; Double = double precision; List = particles are stored in an ad-hoc singly linked list with a next pointer in each particle; Seq = particles don't have that pointer, but are stored in a JavaFX Script sequence. The version that matches the other ports of Strange Attractor is MainListDouble.

I tested with the early-access build of JDK 6u18, yet another important update for client-side Java in general and JavaFX in particular; for this specific benchmark, 6u18 brings an updated version of HotSpot, so CPU-bound code should benefit. (Click each program's name to launch it; source code here.)

Program           HotSpot Client 6u18ea-b01   HotSpot Server 6u18ea-b01
MainListFloat     78 fps                      111 fps
MainListDouble    74 fps                      95 fps
MainSeqFloat      68 fps                      92 fps
MainSeqDouble     60 fps                      75 fps

 

Some pretty interesting results here. First, the score seems to be strongly influenced by memory access. The major difference between the four variants is the size and layout of particle data. Each Particle is a simple object with x, y, z fields, plus the Java object header and an extra int VFLGS$0 field that's used by JavaFX Script's properties (every 32 properties share one such field, which is a bitmap; classes with more than 32 properties need additional bitmap fields). We have 300K particles, so a Float particle is 24 bytes = 7.2MB for the entire fractal, and a Double particle is 36 bytes, likely 40 due to alignment = 12MB total (estimates for a 32-bit JVM). Even the lower value doesn't fit entirely in my Q6600 CPU's 4MB L2 cache, and there are other memory pages involved in rendering (the 880KB pixel array; code from the app, JVM, OS…), so the rendering will hit the FSB hard.
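The footprint figures above are simple arithmetic; the sketch below redoes them under the same assumptions the text states (32-bit JVM, 8-byte object header, 8-byte alignment). FootprintEstimate and its methods are hypothetical names, and real layouts vary by JVM; 64-bit JVMs, for example, use larger headers.

```java
/** Back-of-the-envelope heap footprint estimates, assuming a 32-bit JVM. */
final class FootprintEstimate {
    static final int HEADER = 8;                 // object header (assumed)
    static final int INT = 4, FLOAT = 4, DOUBLE = 8;
    static final int ALIGN = 8;                  // objects padded to a multiple of 8 bytes (assumed)

    static int align(int bytes) {
        return (bytes + ALIGN - 1) / ALIGN * ALIGN;
    }

    /** JavaFX Script Particle: header + one VFLGS$0 bitmap int + x, y, z. */
    static int fxParticle(int coordSize) {
        return align(HEADER + INT + 3 * coordSize);
    }

    static long total(int particles, int bytesEach) {
        return (long) particles * bytesEach;
    }

    public static void main(String[] args) {
        int n = 300_000;
        for (int size : new int[] {FLOAT, DOUBLE}) {
            int each = fxParticle(size);
            System.out.println(each + " bytes/particle, " + total(n, each) + " bytes total");
        }
    }
}
```

This prints 24 bytes per Float particle (7,200,000 bytes in total) and 40 bytes per Double particle after padding 36 up to the next multiple of 8 (12,000,000 bytes in total).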

Why are the List variants also faster? The datasets are the same size: there is an additional reference field in each Particle, but then I don't need a sequence object with one reference per particle. JavaFX's sequences are well optimized; they are backed by native arrays just like Java SE's ArrayList. (The real story is more complex: there are many concrete sequence implementations, and the compiler picks and changes the most adequate one as needed; sequences of value types can map to optimized sequences without boxing overhead, so it's even better than ArrayList and closer to a growable version of primitive arrays.) But there is some small overhead to iterate the sequence, and once again there's worse memory locality. In the MainSeq* programs, the heap will contain one huge array with at least 300K references = 1.2MB, plus 300K Particle objects somewhere else. The JVM treats the sequence's backing array as a "large object", which by itself may have some performance consequences. But the major problem is that a sequential iteration through all particles demands a non-sequential memory access pattern, alternating between the sequence's backing array pages and other pages containing the particles.
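To make the two layouts concrete, here is a rough Java analogue. It is a sketch under assumptions: a plain Particle[] stands in for the JavaFX Script sequence's backing array, and all class and method names are hypothetical. Both loops do identical arithmetic; what differs is where the next particle's address comes from: its predecessor's next field, or a separate array of references.

```java
/** Rough Java analogue of the List and Seq particle layouts. */
final class ParticleLayouts {
    /** "List" layout: each particle carries a pointer to the next one. */
    static final class LinkedParticle {
        final float x, y, z;
        LinkedParticle next;
        LinkedParticle(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
    }

    /** "Seq" layout: plain particles, reached through a separate array of references. */
    static final class Particle {
        final float x, y, z;
        Particle(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
    }

    static LinkedParticle buildList(int n) {
        LinkedParticle head = null;
        for (int i = n - 1; i >= 0; i--) {       // prepend in reverse to keep index order
            LinkedParticle p = new LinkedParticle(i, 2 * i, 3 * i);
            p.next = head;
            head = p;
        }
        return head;
    }

    static Particle[] buildArray(int n) {
        Particle[] ps = new Particle[n];
        for (int i = 0; i < n; i++) ps[i] = new Particle(i, 2 * i, 3 * i);
        return ps;
    }

    /** Pointer chasing: the next address comes from the particle just read. */
    static double sumList(LinkedParticle head) {
        double s = 0;
        for (LinkedParticle p = head; p != null; p = p.next) s += p.x + p.y + p.z;
        return s;
    }

    /** Indirection: read a reference from the array, then jump to the particle it points at. */
    static double sumArray(Particle[] ps) {
        double s = 0;
        for (Particle p : ps) s += p.x + p.y + p.z;
        return s;
    }
}
```

In the List layout a particle is often allocated right next to its successor, so the walk tends to touch memory in order; the array layout adds a second stream of memory accesses for the references themselves.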

The JIT compiler also appears as a very important performance factor. HotSpot Server shows a whopping 42% better frame rate in the easiest test, MainListFloat; in the hardest, MainSeqDouble, it still produces a very large advantage of 25%. This result is very interesting because the animation's inner loop is relatively simple: it just performs a few multiplications for coordinate transformation and plots a pixel at the resulting position. The particles are all constant data, the transform matrix is calculated only once per frame, and the inner loop contains no expensive operations like allocation. (There is one call to a tiny method that's trivial to inline even for HotSpot Client; performance didn't change after I refactored some code to introduce this method.) I guess HotSpot Server is just smarter about memory access, e.g. with prefetching instructions.

Then I said to myself: What a wonderful… no; I said: how could I make this code even faster? One obvious target is avoiding the cost of JavaFX Script objects, which are a bit larger than similar Java objects due to those property bitmap fields. The new variant MainListFloatJava implements the Particle class in Java. And while I'm at it, why not eliminate all object model overhead completely and just store all particle data in a raw float[], with 3 consecutive positions holding each particle's (x, y, z) data? The variant MainListFloatRaw does this.
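The Raw layout can be sketched in Java as follows. This is an illustration, not the actual MainListFloatRaw code: the particular transform (a yaw rotation around Y, then a pitch rotation around X, followed by an orthographic projection) and all names are assumptions. It shows the two points the text makes: the transform terms are computed once per frame outside the hot loop, and each particle is read as xyz[i], xyz[i + 1], xyz[i + 2] from one contiguous float[].

```java
import java.util.Arrays;

/** Sketch of the Raw layout: all particles in one float[] as x0, y0, z0, x1, y1, z1, ... */
final class RawParticles {
    /**
     * Renders one frame into an INT_RGB pixel array and returns how many particles
     * landed inside the image. The yaw/pitch rotation plus orthographic projection
     * is an illustrative choice, not the demo's actual transform.
     */
    static int render(float[] xyz, int[] pixels, int width, int height,
                      double yaw, double pitch, double scale) {
        // Transform terms: computed once per frame, outside the hot loop.
        float cosYaw = (float) Math.cos(yaw), sinYaw = (float) Math.sin(yaw);
        float cosPitch = (float) Math.cos(pitch), sinPitch = (float) Math.sin(pitch);
        float s = (float) scale;
        int centerX = width / 2, centerY = height / 2;

        Arrays.fill(pixels, 0x000000);                     // clear to black
        int plotted = 0;
        for (int i = 0; i + 2 < xyz.length; i += 3) {
            float x = xyz[i], y = xyz[i + 1], z = xyz[i + 2];
            float rx = x * cosYaw + z * sinYaw;            // rotate around the Y axis
            float rz = z * cosYaw - x * sinYaw;
            float ry = y * cosPitch - rz * sinPitch;       // then around the X axis
            int px = centerX + (int) (rx * s);
            int py = centerY + (int) (ry * s);
            if (px >= 0 && px < width && py >= 0 && py < height) {
                int index = py * width + px;
                pixels[index] = Math.min(pixels[index] + 0x202020, 0xFFFFFF);
                plotted++;
            }
        }
        return plotted;
    }
}
```

There are no Particle objects at all here, so there are no headers and no references, and a full pass over the particles is one sequential sweep through a single array.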

Program              HotSpot Client 6u18ea-b01   HotSpot Server 6u18ea-b01
MainListFloatJava    81 fps                      115 fps
MainFloatRaw         142 fps                     170 fps

 

Once again, very interesting speedups. The Java variant is almost 4% better for both HotSpot Client and Server, a nice although not vast advantage; but the Raw variant is an incredible 82% better for HotSpot Client, and 53% better for HotSpot Server. (The fact that Server gets a smaller speedup reinforces the thesis that its advantage in the previous tests was mostly related to more efficient memory access: Server had already caught most of the low-hanging fruit in the previous test.)

Whining/Wishing Dept.

The fact that I could make this program a full 4% faster just by rewriting a trivial class in Java means that the overhead of the binding bitmaps inserted by javafxc is pretty annoying. I did my best to help the compiler: my Particle class is script-private, and its properties have no triggers and are never involved in binding expressions, which allows javafxc to optimize out some of the generated code. But that was not enough. Checking the bytecode, I identified several optimization opportunities, so I filed bug JFXC-3456: Optimize handling of VFLGS$N bitmaps and other property-related code. (This blog post was a long time in the making, so this bug is already fixed for SoMa and marked as a dupe; it seems the Compiled bind rework of the next release, mentioned in my investigation of binding performance, is making fast progress.)

And I was also horrified by the way javafxc (or rather, the JavaFX Script language) handles null values: all nulls are masked out, so we never get a NullPointerException. I failed to notice this in previous explorations of JavaFX (it's not documented in the Language Reference). I reported this as a bug in JFXC-3447: Support NullPointerException. Yes, I'm asking to have my NPEs (and also a few other important exceptions) back, or at least to have some control over this critical behavior; please check the bug report before you think I have some fetish for stack traces.

And yes, these results are yet more evidence that the Java platform would benefit from value types, so I could have a headerless Particle class (ok, struct) and put it in a by-value array (objects stored directly in the array, without references). This would produce exactly the same memory layout as MainListFloatRaw, except that my code wouldn't need several changes for the worse (low-level array manipulation like particles[i + 1] instead of particles[i].y, etc.). This memory layout requires 300K*4*3 = 3.6MB, just half the footprint of MainListFloat, and it's even better because all particles are laid out in perfectly sequential order. We're still overflowing the L2 cache, but much less than before, so the performance gain is huge.

The Java community has been asking for some kind of value type support for ages; the latest attempt came from John Rose's Tuples proposal, a relatively modest and easy change, but still left out of JDK 7. The Java language is basically frozen when it comes to fundamental capabilities… but not the Java platform. See the great JSR-292, which basically "fixes" the JVM for all dynamically-typed languages. This is a good precedent because this huge platform enhancement is basically useless for the Java language. The DaVinci project is also working hard on all sorts of cool features to support immediate (headerless) objects, including fixnums, tuples, structs, inline arrays in tail position, etc.; and also tail calls, continuations and other fundamental techniques that are worth gold for many languages; see this presentation. These enhancements from the DaVinci Machine will most probably not come to a future version of the Java language, but they will eventually be adopted by other JVM languages like Scala, Clojure, JRuby etc.; and obviously JavaFX Script, if Sun gets its act together.

Meanwhile, JDK 7 is also making great progress in the Escape Analysis-based stack allocation optimization, which was recently turned on by default. Some days ago, Slava Pestov was happily tweeting how Factor kills HotSpot Server in a Mandelbrot program (we all love fractal calculation microbenchmarks…). I found not only that Java got faster (from 160ms to 46ms) after eliminating a Complex class that caused 300MB of allocation per run of the benchmark, but that even keeping this class, JDK 7b72 could run 2.15X faster (74ms) thanks to reducing the churn to 110MB per run. Keep in mind, however, that Escape Analysis is by definition only good for temporary objects that don't "escape" a single method (or basic block, trace, or whatever the optimization unit is); this optimization won't be any help for long-lived data like Strange Attractor's particles.
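To illustrate what Escape Analysis can help with, here is a generic escape-time Mandelbrot kernel written two ways. It is not the benchmark mentioned above, just a sketch with hypothetical names: in iterationsBoxed, the temporary Complex objects never leave the method, so an Escape Analysis-enabled JIT may eliminate their allocation (in HotSpot this is governed by the -XX:+DoEscapeAnalysis option), while iterationsPrimitive is the hand-written equivalent that needs no such help.

```java
/** Escape-time Mandelbrot kernel written two ways. */
final class MandelbrotKernels {
    /** Immutable complex number; every operation allocates a new instance. */
    static final class Complex {
        final double re, im;
        Complex(double re, double im) { this.re = re; this.im = im; }
        Complex times(Complex o) { return new Complex(re * o.re - im * o.im, re * o.im + im * o.re); }
        Complex plus(Complex o) { return new Complex(re + o.re, im + o.im); }
        double norm2() { return re * re + im * im; }
    }

    /** Allocating version: two temporaries per iteration, none of which escape this method. */
    static int iterationsBoxed(double cr, double ci, int max) {
        Complex c = new Complex(cr, ci);
        Complex z = new Complex(0, 0);
        int n = 0;
        while (n < max && z.norm2() <= 4.0) {
            z = z.times(z).plus(c);
            n++;
        }
        return n;
    }

    /** Hand-scalarized version: the same recurrence z = z*z + c on plain double locals. */
    static int iterationsPrimitive(double cr, double ci, int max) {
        double zr = 0, zi = 0;
        int n = 0;
        while (n < max && zr * zr + zi * zi <= 4.0) {
            double newRe = zr * zr - zi * zi + cr;
            zi = 2 * zr * zi + ci;
            zr = newRe;
            n++;
        }
        return n;
    }
}
```

Timing iterationsBoxed with and without -XX:-DoEscapeAnalysis on a JVM that supports the option is one way to observe the allocation churn the text describes; iterationsPrimitive corresponds to the "eliminate the Complex class" rewrite.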

JavaFX vs. Other RIAs

I didn't formally compare JavaFX to the other versions of Strange Attractor; some comments, though. The JavaScript/Canvas version is dog slow, max ~7 fps here even in the latest browsers with post-modern JavaScript JITs; not surprising, because dynamic typing sucks (ok, I'm repeating myself). The AS3/Flash version is better but not stellar at ~25 fps, thanks to its optional static typing (which is for real, and not a joke like in Groovy). Silverlight is easily the best of these three, and my JavaFX version appears to be even faster (the apples-to-apples, fair comparison is the MainListDouble version with HotSpot Client = 74 fps). But the Silverlight program is missing an FPS counter, and even if one were available, the animation would probably be capped by the display frequency or some standard rate like 60Hz; JavaFX usually does that too, but in my code I resorted to the same tricks used by JavaFX Balls to reach the maximum possible fps.

Also, you cannot compare CPU usage, because my JavaFX version is smart enough to render new frames only when the 3D image's position changes; the other versions don't do that, so they will peg one CPU core at 100% all the time, even if you keep your mouse parked so all frames are identical.
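The idea of rendering only when something changed can be sketched like this; the class and method names are hypothetical, and this is not the demo's actual code. Each animation pulse compares the current mouse-driven deltas with those of the last rendered frame and skips the work when they match.

```java
/** Sketch: skip rendering when the view parameters are unchanged since the last frame. */
final class DirtyRenderer {
    // NaN never compares equal, so the very first pulse always renders.
    private double lastDx = Double.NaN, lastDy = Double.NaN;
    private int framesRendered;

    /** Called on every animation pulse; returns true if a new frame was rendered. */
    boolean onPulse(double deltaX, double deltaY) {
        if (deltaX == lastDx && deltaY == lastDy) return false;  // identical frame: no work
        lastDx = deltaX;
        lastDy = deltaY;
        render(deltaX, deltaY);
        return true;
    }

    private void render(double deltaX, double deltaY) {
        framesRendered++;  // placeholder for the real per-pixel work
    }

    int framesRendered() {
        return framesRendered;
    }
}
```

With the mouse parked, every pulse after the first returns immediately, so CPU usage drops to almost nothing while the image stays on screen.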

The .NET platform does support value types, so the Silverlight version could potentially be optimized to use this feature, and (unless the .NET JIT compiler is really poor) Java's only hope to match it would be resorting to a low-level implementation like MainListFloatRaw (or waiting for the fruits of the DaVinci Machine project).

Fractal Mystery

Now, the most interesting comparison is not the performance but the actual image produced by each program. The three original versions produce basically the same image, modulo details like color and the FPS display. But my JavaFX version is distinctly different; check this:

Silverlight:

JavaFX:

I captured the images in similar positions (mouse parked at the lower-right corner); the difference is very noticeable. My program produces a bigger image, notably in the outer "corona" of the fractal; the deviation seems to grow as a function of the distance from the center. Now, I'm just a rookie in the maths involved in these graphics: since the old times of FractInt for DOS (most awesomest fractal platform evar!), I've been content to code formulas that I find somewhere else and amuse myself with the result without really understanding it in depth. In this case I didn't even code anything; I just ported C# code to JavaFX Script. The JavaFX image looks better and more complete, but this may be just my bias. Can anybody explain this difference?

While investigating this, I changed the code that calculates the color for each rendered pixel, so particles closest to the observer are brighter, the figure looks solid, and the object is easier to inspect. Performance goes down ~4% in HotSpot Server, ~10% in Client; but the 3D effect is pretty nice (especially when animated).
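Here is one possible way to do this kind of depth cueing, as a hedged sketch rather than the demo's actual formula: the linear mapping from depth to gray, the 255..64 brightness range, the assumption that smaller z means nearer, and the rule of keeping the brightest particle per pixel are all illustrative choices, and the names are hypothetical.

```java
/** Sketch of depth cueing: nearer particles get a brighter gray. */
final class DepthShade {
    /** Maps depth z in [near, far] linearly to a gray level from 255 (near) down to 64 (far). */
    static int gray(float z, float near, float far) {
        float t = (z - near) / (far - near);
        t = Math.max(0f, Math.min(1f, t));          // clamp depths outside the range
        return Math.round(255 - t * (255 - 64));
    }

    /** Packs one gray level into 0xRRGGBB with shifts and ORs. */
    static int packGray(int g) {
        return (g << 16) | (g << 8) | g;
    }

    /** Keeps the brightest (nearest) particle seen at each pixel; equal channels let us compare whole ints. */
    static void plot(int[] pixels, int index, float z, float near, float far) {
        int rgb = packGray(gray(z, near, far));
        if (rgb > pixels[index]) pixels[index] = rgb;
    }
}
```

Because the nearest particle wins at each pixel, surfaces facing the viewer hide what is behind them, which is what makes the figure look solid.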

JavaFX (3D enhanced, Float):


But the lack of bitwise operators in JavaFX Script is irritating (I have to use a clumsy Bits class with methods like shiftLeft(), etc.). JavaFX Script aims to be a high-level language, but come on: the extra operators would not add any significant complexity, they are bread-and-butter material, and even competing "languages for designers" like JavaScript and ActionScript have them. Perhaps it's just a leftover from JavaFX 1.0, which didn't even have integral numeric types. But there's more: the language omits the symbolic operators &&, || and !, forcing us to use the keywords and, or and not… which are much less readable (compare "aa and bb or cc" to "aa && bb || cc"). And it's not consistent either: not-equals is !=, so the exclamation point still survives with its meaning of negation. And the vertical bar in "[a | a > 5]" plays the same role as where in "for (a in b where a > 5)". IMHO, operator syntax is one area where JavaFX Script's design should be fixed.

Now, the most interesting discovery, facilitated by this enhancement, is that the fractal changes significantly with numeric precision. Compare the previous image with the following:

JavaFX (3D enhanced, Double) (click image to run):


The last image, created with Double precision, is noticeably different in the outer corona, where you can see series of bands, like in a snail's shell. Most of these bands are nonexistent, or very hard to see, in Float precision, because the particles that should form the boundaries are in slightly wrong positions. If you're a veteran fractal lover, this is not news: fractals are remarkably dependent on numeric precision; in fact, fractal rendering is one of the very few CG techniques where floats are not good enough to avoid severe artifacts. Most good fractal programs offer better-than-Double precision, which is necessary to render at deep zoom or high iteration levels. Strange Attractor performs 300K iterations over a single (x, y, z) position; this is an enormous number of iterations, so any imprecision will quickly escalate into noticeable artifacts.
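The sensitivity to precision is easy to reproduce with any chaotic iteration. The sketch below uses the logistic map x -> 4x(1 - x) as a deliberately simple stand-in (it is not the Strange Attractor formula) and runs it side by side in float and double from the same starting value. In this map small differences roughly double at every step, so after a few dozen iterations the two runs are unrelated; the same mechanism moves particles out of place over 300K iterations.

```java
/** Float vs double on a chaotic iteration (logistic map stand-in, not the attractor formula). */
final class PrecisionDrift {
    /** First step at which |float run - double run| exceeds tol, or -1 if it never does. */
    static int firstDivergence(int steps, double tol) {
        float xf = 0.1f;
        double xd = 0.1;
        for (int i = 0; i < steps; i++) {
            xf = 4f * xf * (1f - xf);
            xd = 4.0 * xd * (1.0 - xd);
            if (Math.abs(xf - xd) > tol) return i;
        }
        return -1;
    }

    /** Largest gap between the two runs over the given number of steps. */
    static double maxDivergence(int steps) {
        float xf = 0.1f;
        double xd = 0.1;
        double maxGap = 0;
        for (int i = 0; i < steps; i++) {
            xf = 4f * xf * (1f - xf);
            xd = 4.0 * xd * (1.0 - xd);
            maxGap = Math.max(maxGap, Math.abs(xf - xd));
        }
        return maxGap;
    }

    public static void main(String[] args) {
        System.out.println("runs differ by more than 0.001 at step " + firstDivergence(200, 1e-3));
        System.out.println("largest gap over 200 steps: " + maxDivergence(200));
    }
}
```

Both runs start from "0.1" and execute the same formula; only the width of the arithmetic differs, yet the results stop agreeing long before the 200th step.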

If Double is better than Float, wouldn't big decimals be even better? I recoded the calculation method with Java's BigDecimal. The resulting code is horrendous (as usual), which is bad enough in Java but definitely doesn't "fit" in JavaFX Script… you'd expect a language like that to offer a seamless arbitrary-precision decimal type. We could just have some syntax sugar over BigDecimal, to be able to use * instead of multiply(), etc. But the performance would still suck (as usual) because BigDecimal is immutable, and the churn of object allocation and GC burns more CPU than the actual calculations. The Java platform desperately needs mutable counterparts of BigDecimal and BigInteger (some implementations already have these for internal use in some operations, but the mutable classes are not public). Then, many high-level languages like JavaFX Script, Groovy, Scala etc. could offer a decimal type complete with operators and other special syntax and semantics, but reusing java.math's implementation and interoperable representation. The mutable objects wouldn't eliminate the advantages of immutability as long as programmers don't use them explicitly: the source compiler could use them automatically to compile expressions requiring temporary values (much like javac does for string concatenation since JDK 1.0); still, public mutable APIs would allow much further manual optimization (and a value-type BigDecimal would be even better…). Anyway, after waiting a few seconds for this calculation (limiting precision to IEEE 754 128-bit, i.e. 34 decimal digits, about 2X the precision of Double), the resulting image is no different to the naked eye, so Double was already good enough.
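For comparison, here is what the BigDecimal approach looks like on a toy iteration, the logistic map x -> 4x(1 - x), used only as a stand-in for the real attractor formula. MathContext.DECIMAL128 rounds every result to 34 significant digits, and every multiply() or subtract() call allocates a new immutable BigDecimal, which is the allocation churn described above. The class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.MathContext;

/** Sketch: the logistic-map stand-in in BigDecimal, rounded to 34 digits (DECIMAL128). */
final class DecimalIteration {
    static final MathContext MC = MathContext.DECIMAL128;
    static final BigDecimal FOUR = BigDecimal.valueOf(4);

    /** x -> 4x(1 - x); each call below allocates a fresh immutable BigDecimal. */
    static BigDecimal iterate(BigDecimal x, int steps) {
        for (int i = 0; i < steps; i++) {
            x = FOUR.multiply(x, MC).multiply(BigDecimal.ONE.subtract(x, MC), MC);
        }
        return x;
    }

    /** The same recurrence in plain double, for comparison. */
    static double iterateDouble(double x, int steps) {
        for (int i = 0; i < steps; i++) {
            x = 4.0 * x * (1.0 - x);
        }
        return x;
    }

    public static void main(String[] args) {
        BigDecimal big = iterate(new BigDecimal("0.1"), 10);
        System.out.println("BigDecimal: " + big);
        System.out.println("double:     " + iterateDouble(0.1, 10));
    }
}
```

The one-line formula 4x(1 - x) becomes a chain of method calls with an explicit MathContext at each step. For a handful of steps the two versions agree closely; with many more steps, the double run's rounding error would be amplified the same way it is for float.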

But after this digression, the conclusion of this experiment with numeric precision is that… I still don't know why the other languages produce different output even at the same precision. The Java platform is well known for its very strict math spec, but the fractal calculation uses extremely basic arithmetic (only multiplications and sums), so this should not be a factor.

Comments

What does the performance-test say?

What does the performance test say?
That the current JavaFX, together with the Java platform it sits on, is fast.

But the current JavaFX includes lots of third-party code. And Sun is rewriting that code. So it's very likely that the parts of JavaFX which are now fast and mature will later be slow and buggy.

And the current JavaFX implementation is, because of its license, unusable for anybody.

Read these three blogs, and you'll know why:
http://www.jroller.com/agoubard/entry/javafx_preview_release_read_the
http://lobobrowser.wordpress.com/2008/12/22/javafx-10-license-unreasonable/
http://gnu.wildebeest.org/diary/2008/08/01/the-javafx-trap/

I think there are three big groups of users who could potentially use technologies like JavaFX:
- Companies, which would use JavaFX for their marketplace, to give it a better look.
** But JavaFX isn't allowed to be used in a commercial context.
- Open source fans, who try out every new technology and new language.
** But JavaFX isn't open source. The compiler is GPL, but the runtime is not open source. An old version of the Scenegraph exists under the GPL without the GNU Classpath exception, but the Scenegraph was completely rewritten, and the new version is closed source.
- Freeware developers, who create little offline programs for free.
** But they aren't allowed to distribute the JavaFX runtime with their own programs. By downloading JavaFX, everybody only gets a license that allows making one copy of JavaFX for their own use.

I would be happy if the JavaFX team wrote more about its progress in rewriting the third-party code under an open source license (possibly like OpenJDK: GPL + GNU Classpath exception).

More FUD...

"But the actual JavaFX includes lots third-party-code. And Sun rewrites that code. So its very likely, that the parts of JavaFX, which are now fast and mature are later slow and buggy." - what do you mean by that?

The only third-party code that exists in JavaFX is media codecs (from On2). The next release will reportedly include another component (the Saffron font rendering engine). But Sun doesn't rewrite these components; they just integrate them. A "rewrite" only makes sense in cases where Sun may decide to remove the third-party component, but this rarely happens. For example, the OpenJDK project replaced a few closed-source pieces of Sun JDK with open source alternatives, but even now, Sun keeps using those third-party parts in the Sun JDK, for several reasons (superior performance, bug-for-bug compatibility, whatever).

Nothing is becoming slower and buggier; just the opposite: so far, every release of JavaFX has been significantly superior in quality, features, AND all aspects of performance. The next major release (SoMa) is once again expected to be another big leap forward. And the parts that are becoming better and faster are exactly the parts owned, controlled and developed by Sun, not the third-party parts (which are relatively stable tech).

On JavaFX Licensing

Please stop quoting that first blog (Anthony Goubard's); this is FUD. Those comments are specific to the JavaFX Preview release, whose licensing terms were obviously more limited. The license changed in JavaFX 1.0, so there's no expiration, no restrictions on commercial use, etc.

On the other blogs: yep, the JavaFX runtime is not yet open source. I have been a vocal critic of this fact, especially because Sun officially promised that JavaFX would eventually be fully open source; but that was years ago, when Sun started the project and before the decision to sell itself. Now it's possible that the license change is mostly blocked by the never-ending Oracle/Sun deal; hopefully this drama will end soon, and Oracle will make the right decision and move JavaFX to GPLv2, to join the OpenJDK ecosystem.

Anyway, until this happens, JavaFX's license is not any worse than most proprietary licenses(*). If your choice is one of JavaFX, Flash or Silverlight, you don't really have a choice. (Mono's Moonlight could be a good choice if you like .NET/Silverlight and if you don't mind Mono's possible issues with Microsoft's patents and historical behavior towards open source.)

(*) The runtime's distribution restriction is meaningless because JavaFX is not designed for a deployment model of bundling the runtime with the application. The runtime is automatically downloaded when a JavaFX app runs for the first time, and it's not a big download. There's no way to do a standalone install of the JavaFX runtime; it's just a bunch of jars and DLLs inside the JRE's resource cache.

'But there's more – the

Even the CAPTCHA expresses

Even the CAPTCHA expresses the same kind of dilemma as JavaFX, namely language inconsistency. CAPTCHA Math question: * + two = nine Solve this math question and enter the solution with digits. E.g. for "two plus four = ?" enter "6". Why not simply use either digits or words? Why can't it be simplified to something that is easy to understand and use? Surely some will be confused by the CAPTCHA. Hint: the best CAPTCHA simply does not need explanation and instructions. Would it be better to use one + three = four (this is verbose and uncommon) or 1 + 3 = 4 (this should be the preferred way, since programmers are the audience for this site and are familiar with numbers)?

Nice one

Having recently jumped from the Java world to .NET and WPF/Silverlight it is great to see that, even at this level, very advanced graphic rendering can be done. Nice work!

Impressed

I am very impressed by your work, and of course by JavaFX. I have a basic 2-year-old PC running Vista, and all the demos you showed us just work fine, and at lightning speed! I feel this is great news for the Java/JavaFX platform. I just hope that for the next big JavaFX "show-off" there will be many demos of this kind.

Performance

Like @theuserbl says, the performance you see here should actually be attributed to the core Java platform; this should be clear to any Java hacker who studies the code, as the entire rendering is performed by plain old Java2D programming. Even though function move() is written in the JavaFX Script language, I don't use any nontrivial JavaFX-specific features there. BTW, Marceli Narcyz wrote two other ports of this code, for Java and Scala, and their performance is also basically the same, as you should expect: Java, Scala and JavaFX Script are very similar in the aspects that matter for this code (e.g., they are statically typed and able to use primitive numeric types), so a function like move() can be coded with zero overhead, with an optimal mapping to the JVM's type system and bytecode instruction set.

Of course, an efficient JavaFX demo app requires more than efficient code. Deployment is also very important. The StrangeAttractor apps are a single 177KB jar, and it's only that big because I was too lazy to create separate jars for each variant: all JNLPs in this blog refer to a single jar that contains the compiled code for all the apps. The good thing is that once you load the first app, all others load instantly because the jar is already in your cache. ;-) Unfortunately I couldn't use pack200 compression due to a java.net bug; if I could, the jar would be just 21KB, which is in the "almost instantaneous" range even for dial-up users. And the apps don't load any graphics/media resources, web services, etc.; it's just code. Even the initial calculation of the fractal is fast enough not to create a perceivable delay... Otherwise I could fool the user somehow, for example by computing the fractal in parallel with the rendering, so the first few frames would show an incomplete fractal; but unless you have Time Warp-like slow-motion vision, you wouldn't notice. ;-)