
JavaFX Balls 2.2: Effects and more

Posted by opinali on July 9, 2009 at 6:15 AM PDT

In the last updates, I did a quick port to JavaFX 1.2 and evaluated its performance again (and again). But as I keep playing with this benchmark and learning JavaFX, I added a few extra enhancements:

  • New options of 512 Balls (desktop) / 128 Balls (mobile), and Adaptive 60fps. These make it easier to compare with some other versions of Bubblemark.
  • Binding-related enhancements recommended here.
  • More benchmarking-friendly behaviors. The animation is off when the applet starts; this allows you to change the options (with preview!) before the animation starts, important to measure warm-up performance. The status is updated more often; logged to the console; more precise (2 decimal digits), and more correct (when some option changes, the status would previously mix new configuration info with a score related to the old config).
  • (desktop) New rendering option to enable use of JavaFX's Effects framework. I chose the BoxBlur effect, because it's a simple, popular effect that I'd expect any competing animation package to offer or to be able to implement easily. Other Bubblemark ports don't have this option, but I'd be thrilled if they'd implement it.
  • (mobile) A help screen that documents the keypad controls, and disappears only when the animation is started.
  • (desktop) The control toolbar was removed, because its cost is not insignificant even with caching. I adopted keyboard controls similar to the mobile version, just with a different selection of keys. The status line shows a short help text, similar to the help screen from the new mobile version, but always-on.

Check the source code. The applet below shows the desktop version. I still refuse to offer a WebStart deployment because the java.net site doesn't serve JNLP files with the correct content-type and that would force me to host the JNLP somewhere else. (The kind of problem that would never happen in a site from Adobe or Microsoft.)

UPDATE: Applet now updated for JavaFX 1.3, so its performance may be different from what is described by the article.

A portability glitch

The new FPS counter adds two decimal digits of precision, so in a very hard test like 512 Balls + Effect, when the performance drops to <10fps, we still have 2-3 digits of total precision. But this uncovered a subtle portability bug. I used JavaFX Script's string formatting syntax, e.g. "{%02.2f frames}fps", where %02.2f is the printf()-style format specifier for the value of frames. But that code failed for JavaFX Mobile, because floating-point formatting is not supported there. In fact, not even zero padding is supported. I had to add some manual formatting code as a workaround.
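A minimal sketch of what such a manual workaround could look like, written in plain Java rather than JavaFX Script. The class and method names are made up for illustration, it assumes non-negative values (which is all an FPS counter needs), and it is not the code actually used in JavaFX Balls:

```java
class FpsFormat {
    /**
     * Formats a non-negative value with exactly two decimal digits,
     * using only a cast, integer arithmetic and string concatenation
     * (no printf()-style formatter involved).
     */
    static String twoDecimals(double value) {
        // Scale to hundredths and round half up via truncation.
        long hundredths = (long) (value * 100.0 + 0.5);
        long whole = hundredths / 100;
        long frac = hundredths % 100;
        // Zero-pad the fraction by hand, since padding is not available either.
        return whole + "." + (frac < 10 ? "0" : "") + frac;
    }

    public static void main(String[] args) {
        System.out.println(twoDecimals(2.15) + "fps"); // prints 2.15fps
    }
}
```

The point is only that the rounding and the zero padding are done by hand, so nothing depends on a formatter being present in the target profile.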

JavaFX Mobile supports formatting with internal packages, because there are no formatting APIs in Java ME, neither printf() nor older APIs like DecimalFormat. The JavaFX Language Reference (which btw is in pretty bad shape - very incomplete and not significantly updated since JavaFX 1.0) doesn't document the formatting options; it only has a placeholder: "[To do: describe portable Formatter that handles the common subset]". Hopefully a future version of javafxc will have an option to generate warnings for code that uses syntax specific to the desktop profile, because this kind of error is not directly related to missing APIs, so the compiler is currently silent even if the project is configured for mobile.

Benchmarking Effects

Enabling the BoxBlur effect makes the animation fuzzy, but not warm. The effect has a severe impact on performance:

  • 1 Ball: 665fps (from 995fps)
  • 16 Balls: 116fps (from 665fps)
  • 32 Balls: 60fps (from 665fps)
  • 128 Balls: 17fps / 15% CPU (from 400fps / 22% CPU: 23X slower)
  • 512 Balls: 2.15fps / 25% CPU (from 82fps / 23% CPU: 38X slower)

Notice that the base 1 Ball score is meaningless because in that test the animation is capped by the animation engine's 1kHz "pulse"; in fact even 10 Balls score the same 995fps on my system. Also, the base scores for 16 and 32 Balls are identical; both tests have minimal CPU usage, so once again the FPS rate should only be limited by timing artifacts. Because of this, I can't compute with any meaningful precision how much slower the effect makes these tests. I did that calculation only for 128 and 512 Balls, where the score seems to be limited by CPU.
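For reference, the "X slower" factors are simply the baseline score divided by the score with the effect enabled. A tiny Java sketch of that arithmetic, using the 128 and 512 Balls numbers from the list above (the class and method names are just illustrative):

```java
class Slowdown {
    /** How many times slower the run with the effect is, given both FPS scores. */
    static double slowdown(double baselineFps, double effectFps) {
        return baselineFps / effectFps;
    }

    public static void main(String[] args) {
        // 128 Balls: 400fps without the effect, 17fps with it (about 23.5X).
        System.out.println("128 Balls: " + slowdown(400, 17) + "X slower");
        // 512 Balls: 82fps without the effect, 2.15fps with it (about 38.1X).
        System.out.println("512 Balls: " + slowdown(82, 2.15) + "X slower");
    }
}
```

The same division would give misleading results for the low Ball counts, because there the baseline is capped by the timer and not by the CPU.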

I was expecting a noticeable cost in performance, but not in the range of 20-40X slower. Looking at CPU usage in the same test, it's <1% in the standard run (Windows's Task Manager shows a stable "00"), but it jumps to ~8% (i.e., 32% of a single core of my Q6600). For the 128 Balls test the CPU usage costs an extra 7%, but this must be normalized to performance, so it's 11.62% CPU/frame, which is 13X worse than 0.88% CPU/frame without Effects.

The program dumps the acceleration used by JavaFX for this effect, and the result on my system is "Direct3D", so I suppose the BoxBlur is fully implemented with shaders. There is some bottleneck in the activation of the Effects, perhaps because they impose additional steps in the pipeline, extra buffers, or something similar.

As it is today, the performance of Effects is not viable for higher-end animations, such as action games. It's perfectly fine for RIA apps, e.g. to cast a drop shadow on some internal frame that can be dragged. Notice however that I didn't test on machines with onboard graphics. I tested on my main test system, which has an NVidia Quadro 1700, and also on a laptop with an NVidia GeForce 8400M. The Effects framework can also be accelerated by OpenGL, or by an x86/SSE pipeline for machines without shading-capable GPUs.

Dude, where are my frames?

In the bullet list above, I didn't list the score that the program actually displayed for the 512 Balls test with Effects. The program reported ~5.5fps, but that was obviously false. This is very odd, because my animation timeline uses the canSkip:true option, so it should only be executed as often as the animation engine can actually produce frames. This is important to keep the FPS score honest. But something was not working as advertised. I added code to print nanoTime() at each execution of the timeline's action:

244768994737808 (delta: 467ms)
244769462624160 (delta: 13ms)
244769475647323 (delta: 13ms)
244769941612132 (delta: 465ms)
244769954661486 (delta: 13ms)

The first and fourth deltas above reveal that frames are being produced every ~465ms, so the score is a little north of 2fps. The ~13ms deltas don't correspond to real frames; the runtime seems to be skipping frames but still repeating the keyframe. The reason is linked to the bugs RT-2943 and RT-4052; both report problems with 0-duration timelines. My bug has different symptoms, so I filed a new bug, RT-5024: Animation drops frames but repeats keyframes. The JavaFX team responded quickly and I learned a few more things about the animation runtime.
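The kind of instrumentation described above can be sketched in plain Java as follows. This is an illustrative stand-in, not the JavaFX Script code from the benchmark; FrameDeltaLogger and its methods are hypothetical names, and the loop in main() merely simulates a timeline action with Thread.sleep():

```java
class FrameDeltaLogger {
    private long lastNanos;
    private boolean hasLast = false;

    /** Returns the milliseconds elapsed since the previous call, or -1 on the first call. */
    long tick(long nowNanos) {
        long deltaMs = hasLast ? (nowNanos - lastNanos) / 1_000_000L : -1;
        lastNanos = nowNanos;
        hasLast = true;
        return deltaMs;
    }

    /** Converts a frame period in milliseconds to frames per second. */
    static double fpsFromPeriodMs(double periodMs) {
        return 1000.0 / periodMs;
    }

    public static void main(String[] args) throws InterruptedException {
        FrameDeltaLogger logger = new FrameDeltaLogger();
        for (int i = 0; i < 5; i++) {
            long now = System.nanoTime();
            // Same layout as the log above: raw timestamp, then the delta.
            System.out.println(now + " (delta: " + logger.tick(now) + "ms)");
            Thread.sleep(13); // stand-in for the work done between two actions
        }
        // A ~465ms period corresponds to roughly 2.15 frames per second.
        System.out.println(fpsFromPeriodMs(465) + "fps");
    }
}
```

Logging the raw deltas, instead of trusting a counter of how many times the action ran, is what exposes the difference between real frames and repeated keyframes.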

I can avoid the 0-duration bugs by using a tiny duration, like 1ms (the smallest possible). I tested that and it fixed the FPS status, making it exact again. But there was a severe disadvantage: top performance dropped to 500fps, even for the most trivial test with 1 Ball. This turns out to be caused by OS scheduling latency (in my case Vista SP2, so I'm curious to know if the same behavior happens in other OSes, especially one with a realtime scheduler like Solaris).

Then, I can gain my high scores back by enabling the option com.sun.scenario.animation.fullspeed=true. With this option, the animation will not yield to the OS when there's nothing to do, so any impact from scheduling (delays, jitter) is eliminated or at least greatly reduced. This worked wonderfully, so I could reach a round 1000fps score with low Ball counts, and even at 512 Balls the performance improved from 82fps to 90fps, a very significant boost of almost 10%.

The bad news: fullspeed=true will cause JavaFX to 0wn your CPU, sucking it at 100% all the time, regardless of the amount of work done, even 1 Ball. (In fact I have a quad-core machine so this translates to "only" 25% CPU usage, so my system didn't get any less responsive.) This high and unconditional CPU usage happens because JavaFX will busy-wait instead of yielding. Notice that this doesn't disable multitasking; Windows can still preempt the JavaFX process to run other apps if there are no other CPU resources available. Still, it's a significant impact on the system in a bad scenario, like a single-core machine or even a multicore box that's loaded.

The busy-wait, or "spinning", technique is common in animation and game engines, as it can extract the last ounce of system performance for a few extra FPS. For example, I noticed that the LWJGL/Slick version of Bubblemark, with V-Sync off, uses 20% of my CPU with 1 Ball - almost a full core, and it would probably reach 25% / 1 full core if run outside the browser. That's an insane CPU overhead for a single Ball, even at 3070fps - JavaFX Balls can do 1000fps at <1% CPU, so it could theoretically do 3000fps at <3% CPU if JavaFX got rid of its 1000fps cap. I'm pretty sure that LWJGL/Slick, and perhaps other Bubblemark competitors, resort to busy waiting. But I decided not to enable this technique in JavaFX Balls, as the cost/benefit is certainly not good for most potential apps; I think this only makes sense for advanced action games.
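The difference between yielding to the OS and busy-waiting can be illustrated with a generic Java sketch. This is not how the JavaFX animation runtime is implemented (I haven't seen that code); the class and method names are hypothetical, and only standard System.nanoTime() and Thread.sleep() calls are used:

```java
class WaitStrategies {
    /** Busy-waits ("spins") for the given time; burns a full core, but wakes up with little latency. */
    static long spinWait(long nanos) {
        long start = System.nanoTime();
        while (System.nanoTime() - start < nanos) {
            // Intentionally empty: the thread never yields to the OS scheduler.
        }
        return System.nanoTime() - start;
    }

    /** Yields to the OS for the given time; cheap on CPU, but the wake-up depends on the scheduler. */
    static long sleepWait(long nanos) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(nanos / 1_000_000L, (int) (nanos % 1_000_000L));
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        long oneMs = 1_000_000L;
        // Requested 1ms in both cases; the actual elapsed time is what matters.
        System.out.println("spin:  " + spinWait(oneMs) + "ns");
        System.out.println("sleep: " + sleepWait(oneMs) + "ns");
    }
}
```

On a system with coarse scheduling granularity, the sleeping variant would typically overshoot the requested 1ms by a visible margin, which is consistent with the drop to 500fps described above, while the spinning variant stays close to the target at the price of a fully loaded core.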

Deployment issues

After all updates since JDK 6u10 (I'm using 6u14) and JavaFX 1.0, Java applets still suffer deployment problems that make it difficult to keep the faith in Java for the desktop.

On my Vista SP2 PC, JavaFX presents a dialog saying that it must install the JavaFX Desktop Runtime. I click Accept and the applet runs, but this stupid dialog returns every time. I checked the JPI cache and the JavaFX runtime is there all the time. This happens even for the JavaFX applets in javafx.com/samples, but not for all applets; for example, the old version of JavaFX Balls in my blog doesn't suffer from this - but that one uses an older JavaFX version. It's probably some bug in the JavaFX Deployment Toolkit v1.2. I noticed this bug before but I was ignoring it because I blamed it on my messed-up developer's environment. So I uninstalled all my JDKs and JREs (as I keep everything from 1.3.1 up installed), manually cleaned up Java's entries in the Windows Registry, installed everything again, and the problem persists. On my laptop running Windows 7 RC, this problem doesn't happen.

I still suffer from the problem of multiple JavaPlugIn icons appearing in the Windows system tray. This bug was supposedly fixed, but it happens in my dev/test sessions because I typically run several "Java host" programs: at least one Java IDE, multiple browsers (usually the latest FF and Chrome), and sometimes Windows Live Writer which also shows embedded applets. The problem is that each "host" needs its own jp2launcher.exe process. That sucks, because I'm probably not getting all resource sharing I should get with a single launcher.

Conclusions

JavaFX 1.2 was a much needed update that fixed important holes and performance bottlenecks. But as they fix the ugliest issues, we, developers / enthusiasts / early adopters, just move to find smaller issues so we still have something to complain about. :-) The relatively poor performance of Effects is certainly something Sun should investigate, although it's certainly not a disastrous performance problem like the scene graph scalability that I complained about for JavaFX 1.0/1.1. I reported this new problem as issue RT-5035.

Moving from severe bugs / bottlenecks / missing functionality to less severe ones certainly shows progress in the platform; but perhaps this progress could be faster. I've seen other people criticizing JavaFX 1.2 for bugs in the new APIs like controls, and I think Sun deserves all the flak they might receive, simply because the development of JavaFX is still disgustingly closed - how do you expect the highest quality when such a major release is shipped without any public beta? In practice, all JavaFX "FCS" releases are betas, and if you intend to ship some app for JavaFX 1.2, my advice is to wait for the first maintenance update, as usual (1.2.1 or whatever). At least the project has an open issue tracker and the dev team is very responsive.

UPDATE: I refreshed the applet (and sources), fixing a bug with the Adaptive mode and also adding a new feature to scale the balls (keys UP/DOWN cycle from 1X to 4X scale in each axis, i.e. 1X to 16X in area). Larger balls add virtually no cost for bitmap rendering, but the cost is significant for vector rendering and/or effects. Notice that I didn't bother to adjust the bouncing against walls and other balls, so the animation (especially with multiple balls) will look a little wrong, but I didn't want to make even more changes in the moving/collision code, which should be kept as close as possible to the reference Bubblemark. Finally, I also changed some key bindings to make the desktop and mobile controls more similar and natural.


Comments

I performed a simple profiling of Effects, using the JVM's -Xrunhprof:cpu=samples (the NetBeans Profiler doesn't work for JavaFX projects). Below, all entries with >1% self time:

CPU SAMPLES BEGIN (total = 5504) Wed Jul 15 14:14:07 2009
rank   self  accum   count  trace method
   1 41.01% 41.01%    2257 300234 sun.awt.windows.WToolkit.eventLoop
   2 11.34% 52.34%     624 301500 sun.java2d.d3d.D3DRenderQueue.flushBuffer
   3  7.94% 60.28%     437 301497 sun.java2d.d3d.D3DRenderQueue.flushBuffer
   4  7.58% 67.86%     417 301356 sun.java2d.d3d.D3DRenderQueue.flushBuffer
   5  6.61% 74.47%     364 301485 sun.java2d.d3d.D3DRenderQueue.flushBuffer
   6  3.72% 78.20%     205 301472 java.net.SocketInputStream.socketRead0
   7  3.05% 81.25%     168 301510 com.sun.scenario.effect.impl.hw.d3d.D3DRendererDelegate.drawTexture
   8  3.02% 84.27%     166 301461 java.net.SocketInputStream.socketRead0
   9  1.07% 85.34%      59 301436 java.net.PlainSocketImpl.socketConnect

The huge accounting of CPU for sun.awt.windows.WToolkit.eventLoop should be hiding the native rendering pipeline. So the most interesting entries are those for D3DRenderQueue.flushBuffer() and com.sun.scenario.effect.impl.hw.d3d.D3DRendererDelegate.drawTexture(). In a similar test w/o Effects, only the flushBuffer() methods appear, and they have <0.75% CPU usage.

The bottleneck's pattern seems obvious: Effects demand some costly management of extra rendering buffers/textures. Either this part of the pipeline is not well tuned (too much allocation and/or data copying costs), or it's not sufficiently accelerated (not everything allocated in the VRAM and/or computed in the GPU).

@liquid: I didn't interpret your explanations as justification, I guess I was just a little frustrated with this problem... :) Anyway, it's good to know that this performance limitation is something that can be fixed as an optimal implementation is possible but just "way more complicated". So I guess it's just a matter of demand. Even in the bug RT-5035 that I filed, I realize that this problem is not a priority for most uses of JavaFX. Even for action games, I think these stock effects wouldn't be of much use - what would you do, a Super Mario clone where every sprite is blurred? :-D but if JavaFX eventually develops a more general shading facility that's another matter.

@opinali: i was explaining how the javafx script nodes created java nodes; so the bytecode reading i was mentioning was the one from the javafx runtime, not your applet 's - those were not arguments per se, just remarks on how javafx and scenario/decora worked the last time i checked, and which could *explain* your findings, not *justify* them.

JDK 7b64 is out with 6u10-6u14 merge, so I tested JavaFX Balls on it. The test with Effects is significantly better: for 16 Balls, the score is 30% better at 152fps; for 128 Balls, it's 35% better at 23fps. There was no change in the execution of native code, GPU-accelerated code or JavaFX runtime, so all this improvement is evidence that JavaFX's Effect bottleneck is due to execution of too much Java code. JDK 7 runs this code MUCH faster, so the overall score is greatly improved. The real fix of course should depend on properly accelerating the effects.

@liquid: No need to read bytecode, the source code is linked in the beginning of the article. :) I tried to understand your arguments, but if you check the source you'll see that the animation's scene graph model is really very simple: Scene [ Group [ Group [ {balls} ], Text ] ] where each ball in {balls} is a ImageView (assuming image rendering). Its only bound properties are translateX and translateY; everything else is fixed at initialization, including the Effect if there is any. When the user changes rendering option, I recreate the balls' view nodes, so there's no binding complexity for anything other than translation. I use binding also for the content of the Text node that shows the status (and the status only changes once per second). So in summary, we have a scene graph that is extremely stable, the only data that changes from frame to frame is the translation of the ball nodes. If JavaFX's animation engine cannot optimize this to something that requires minimum Java/native and CPU/GPU transition overheads, there's something really wrong IMHO. I can understand that vector rendering makes animation much slower, because the vector drawing of this ball shape is pretty difficult as it uses several curves and a RadialGradient, and we don't get slim shady here. :) But I wouldn't expect a large cost from a simple, shader-accelerated convolution kernel like BoxBlur.

@ngthomas: On the security/EULA dialog issue, I think I already added a clarification of this problem to a JavaFX bug (that's not yet public). It seems that JavaFX was being confused because it was trying to write the file .javafx_eula_accepted to the incorrect location for my user profile. The root cause was Java's bug 4787931 (System property "user.home" does not correspond to "USERPROFILE" (win)), which is a major problem IMHO but it's been open since 2002. The bug only happened at my workplace, where we migrated the Windows domain server. I won't repeat all the messy details here, but all Java apps that use user.home for config files will default to the wrong place until I override it. See also http://forums.sun.com/thread.jspa?threadID=5390774

BTW, this whole EULA thing is funny because (if it doesn't hit a userprofile problem like mine) the JavaFX Runtime will automatically create the .javafx_eula_accepted file, without anything being accepted by the user!! So the whole mess seems to be a waste. I see that the runtime contains, and always loads, a few classes related to EULA/licensing:

[Loaded com.sun.javafx.runtime.eula.Eula from file:/C:/Java/NetBeans.67/javafx2/javafx-sdk/lib/desktop/javafx-ui-common.jar]
[Loaded com.sun.javafx.eula.EulaImpl from file:/C:/Java/NetBeans.67/javafx2/javafx-sdk/lib/desktop/eula.jar]
[Loaded com.sun.javafx.eula.EulaImpl$3 from file:/C:/Java/NetBeans.67/javafx2/javafx-sdk/lib/desktop/eula.jar]
[Loaded com.sun.javafx.eula.Ping from file:/C:/Java/NetBeans.67/javafx2/javafx-sdk/lib/desktop/eula.jar]
[Loaded com.sun.javafx.eula.Ping$1 from file:/C:/Java/NetBeans.67/javafx2/javafx-sdk/lib/desktop/eula.jar]
[Loaded com.sun.javafx.eula.EulaImpl$1 from file:/C:/Java/NetBeans.67/javafx2/javafx-sdk/lib/desktop/eula.jar]

Why should all this garbage load for every JavaFX app? :-(
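For anyone who wants to check whether their machine is affected by the user.home mismatch, here is a small diagnostic sketch in Java. The class and helper names are made up; it only relies on the standard System.getProperty("user.home") and System.getenv("USERPROFILE") calls, and USERPROFILE will simply be null outside Windows:

```java
import java.io.File;

class UserHomeCheck {
    /** True if both values are present and point to the same absolute path. */
    static boolean sameDirectory(String userHome, String userProfile) {
        if (userHome == null || userProfile == null) {
            return false;
        }
        return new File(userHome).getAbsoluteFile()
                .equals(new File(userProfile).getAbsoluteFile());
    }

    public static void main(String[] args) {
        String userHome = System.getProperty("user.home");
        String userProfile = System.getenv("USERPROFILE"); // Windows only, null elsewhere
        System.out.println("user.home   = " + userHome);
        System.out.println("USERPROFILE = " + userProfile);
        if (!sameDirectory(userHome, userProfile)) {
            // One possible override is to pass -Duser.home=<path> to the JVM.
            System.out.println("Mismatch: files written under user.home may land in the wrong profile.");
        }
    }
}
```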

Comments on your deployment issues section: 1. On windows, there should not be security dialog displayed because of the JavaFX runtime. I tried on XP/Vista, running FX applets on your blog or on javafx.com (for samples that are unsigned) will not pop up any security dialog. If you can consistently reproduce the problem, can you file a bug please ? We would like to work offline with you and investigate the problem further. Thanks! 2. Yes, if you have multiple different browsers running java applets, we will have multiple java tray icons displayed currently. This is good feedback and I will create a RFE for this to investigate this. Thanks for your input!

@osvaldo: as far as i can tell from reading bytecode - :) - your javafx scene creates javafx nodes and javafx effects, which have parameters and binding, etc. Those objects create in turn scenario (java) nodes and decora (java) effects and pass them the parameters. You can see that as a peer model however decora's effects (so javafx effects' peer objects if you will) also have a peer model that abstracts the low level effect processing. Same concept, but not universal to all objects, and a different usage here. getAccelType is definitely not lying to you, a direct3d shader does most, if not all, the work with the effect. It may be an issue of time, however in your case, computing parameters on the gpu would require multiple shaders that depend on one another and storing the temporary and final results in textures. That's way more complicated to write than a java loop, In my experience, even now they perform "well enough" blurring (which is not a totally trivial shader by itself) bubblemark running at full speed or doing real games might not be a realistic requirement yet. Prism will probably help with that.

I am very happy to see great idea!! go ahead Dude.

@greeneyed: Yes the JNLP works for dev.java.net, but it doesn't work for any URL from weblogs.java.net, these are different servers (ping reveals different IPs). I don't have any dev.java.net project whose download area I could use to host my JNLPs; and I won't create a dummy project, or some other site, just for that purpose. It's a pathetic problem if you ask me. The correct content-type for JNLP should be configured for all servers, all contexts, of any site even remotely related to Java, and especially for any site dedicated to Java and maintained or associated with Sun.

No problem here with 6u14. Please note that JDK 7 is still a very poor choice for applets, because Sun has yet to merge the 6u10+ client-side enhancements (JDK 7 was branched off JDK 6 much before 6u10). These include not only general improvements like plugin2 and javakernel, but also many low-level AWT/Java2D fixes that target specifically JavaFX (that's why in the Windows platform, 6u10 is mandatory as a minimum runtime for JavaFX). JDK 7's Milestone 4, due 07/23, will include "Forward-port 6u10 features" so by that time you can early-adopt JDK 7 and eat your JavaFX cake too. But before M4, it's better to stick with the latest Java 6. Even after M4, of course, JDK 7 may suffer from general bugginess, which is expected. My guess is that M4 will have all 6u10+ stuff forward-ported (plus the new "platform APIs" that expose it), but with some regressions that may force you to wait a few additional builds, or even another milestone or two, until you can reliably run JavaFX on it.

About the JNLP issue... I asked the same a long time ago and it was supposed to be fixed. For example, see the two JNLP files at my project: https://mw4serverseeker.dev.java.net/

Both are recognised by my firefox as proper JNLP files and Java Web Start is selected to open them. Is that my browser being smart and the issue was not really fixed?

If that's the case, let me know and I'll move the issue again as it was supposed to be fixed.

S!

Some more bad news... About 50% of the time I hit this page with Java 1.7 (nightly build), the Java plugin hangs FireFox and it needs to be killed. CPU usage and hard-drive usage are at zero. Does anyone else see this with Java6 update 14?

@liquid: Thanks for these details. It's surprising for me that such a simple and important effect like BoxBlur, that btw is considered an optimization of GaussianBlur, is not fully accelerated. When I hit this performance problem I did try a couple other effects but with similar results; perhaps I should try all others. But notice that Effect.getAccelType() tells me that BoxBlur has Direct3D rendering, is this API lying to me?? Isn't the peer model something universal for all the scene graph objects? For any node you must collect info from JavaFX objects... I suppose that you cache that info in the native pipeline by building a "mirror" scene, so you don't hit fine-grained JNI overheads to query again every single node property from the pipeline at every frame. Also, all the rest of the rendering is accelerated, so I suppose there's no architectural problem, just another case of not having had time to write a fully accelerated impl so far, right?

The last time i looked into decora internals it worked like this: generic java effect you or the javafx runtime uses, w/ methods to get and set the effect's parameters. This is backend independent (java, sse2, opengl, d3d, prism). This effect has a backend renderer and effect peer, that are the ones ultimately responsible for the image processing, ie the java2d RSL buffers, jlsl/glsl shaders, etc loops + processing. Those are also in java as it's the way to control the effects processing parameters and pass those to opengl/d3d. Some of those are more complex to compute than say pass an int to a glsl shader, and since you're talking about blurs, the blur kernels were computed in java if i remember correctly. So stuff has to be done on the cpu, not everything can be done on the gpu unfortunately, not till everyone runs some cuda capable or larrabee :) I think that might be the cause of the cpu usage you're seeing, whether it can be lowered is a question i don't have the answer to. Not yet, but shhh i'm working on it.

UPDATE: The JIRA bug tracker is now public. Notice that the information in this blog is more up to date and complete, although there is some extra info there too. Support for shading-based effects seems to be the latest cool trend in RIA runtimes. Silverlight 3.0 (currently in beta) adds this feature, but Microsoft went with a very different idea: just expose HLSL pixel-shading programming. I suppose the stock Silverlight runtime includes some standard effect APIs, because HLSL is prohibitively above the skill level of most RIA developers. It's a nice feature for advanced programmers though. The only problem is that this architecture probably ties Silverlight's effects to the Direct3D platform, i.e. to Windows - I wonder if Mono's Moonlight can support that. Since HLSL is analogous to GLSL, I guess cross-compilation should be possible, but I wonder if one can distribute a single binary that will run on both D3D and OpenGL platforms. HLSL also demands machines with DX9+ capable GPUs; here, JavaFX has an advantage, as in addition to D3D & OGL it offers an SSE2 pipeline that should be a life-saver for older PCs with very crappy integrated graphics but slightly less crappy CPUs (designs from 2001 and later: Pentium-IV, Celeron M or Athlon, etc.). There are still a few hundred million such PCs in use out there, esp. in business environments.

Nice showcase, I'm in a hurry to start playing with JavaFX... It's a shame that java.net isn't able to serve JNLP correctly!!! Please Sun, do something, you created the technology so let us use it on your infrastructure!