
STR-Crazy: Improving the OpenGL-based Java 2D Pipeline

Posted by campbell on March 11, 2005 at 7:22 PM PST

One Thread To Rule Them All

As most developers are already aware, an OpenGL-based Java 2D pipeline (henceforth known as "the OGL pipeline") was included in JDK 5.0 for improved rendering performance. While the OGL pipeline was a big step forward for rendering performance of complex operations (think transforms, compositing, gradients, etc.), it was not nearly as robust as our existing X11- and DirectX-based pipelines. This meant that users evaluating their apps with the OGL pipeline enabled would see frequent crashes and rendering artifacts. Five steps forward, three steps back...

What was causing these crashes? If you're familiar with OpenGL, you probably know that it's important to do all your rendering from one (and only one) thread. While it is possible to render from other threads (if you're careful), OpenGL drivers are optimized for the single-threaded case. Games almost always play by these rules; they may use other threads for things like AI and physics calculations, but they do all their rendering from one display thread.

Well, we're not quite as lucky in Java 2D... Rendering requests can come from any number of threads (the EDT, a user thread, etc.) even though we try to teach developers to avoid doing heavy rendering to hardware surfaces from many threads. In 5.0, we dealt with this by taking precautions, such as:

  • ensuring only one thread is calling an OpenGL method at any given time
  • using OpenGL rendering contexts only from the thread on which they were created

This makes OpenGL drivers reasonably happy, but it requires more labor on our part (meaning reduced performance), and despite all our efforts, drivers can still crash when there are many rendering threads in use by an application.
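
To make those two precautions a bit more concrete, here is a rough sketch of the general idea. It is not the actual JDK 5.0 code: `GuardedRenderer`, `GL_LOCK`, and `FakeContext` are hypothetical names, and a simple counter stands in for the real native OpenGL calls.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only: every name here is hypothetical, and a counter
// stands in for the real native OpenGL calls.
final class GuardedRenderer {
    // One process-wide lock, so only one thread is "in OpenGL" at a time.
    private static final ReentrantLock GL_LOCK = new ReentrantLock();

    // Stand-in for a rendering context. Each thread lazily creates its own,
    // so a context is only ever used by the thread that created it.
    static final class FakeContext {
        final Thread owner = Thread.currentThread();
        int opsIssued;
    }

    private static final ThreadLocal<FakeContext> CONTEXT =
            ThreadLocal.withInitial(FakeContext::new);

    // Returns how many operations the calling thread has issued so far.
    static int fillRect(int x, int y, int w, int h) {
        GL_LOCK.lock(); // this cost is paid on every single operation
        try {
            FakeContext ctx = CONTEXT.get();
            if (ctx.owner != Thread.currentThread()) {
                throw new IllegalStateException("context used off its owner thread");
            }
            // A real pipeline would make ctx current and call native code here.
            return ++ctx.opsIssued;
        } finally {
            GL_LOCK.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> fillRect(0, 0, 10, 10), "user-thread");
        worker.start();
        worker.join();
        // The main thread has its own context, so its count is independent.
        System.out.println("ops issued by main: " + fillRect(0, 0, 10, 10));
    }
}
```

Even in this toy version you can see where the extra labor comes from: every operation takes the lock and looks up a per-thread context before any rendering happens.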

Clearly, something needed to be done if we ever wanted to see the OGL pipeline become reliable enough for day-to-day use by end users. For the past couple years, Chet and others on our team have discussed the idea of "single-threaded rendering" (STR), which would allow us to interact with native graphics libraries (such as OpenGL) from a single thread, thus making those libraries and drivers happy.

The idea originally came up as a possible solution to a number of threading problems we've dealt with in the past on the Windows side. But what better way to test these ideas than to implement them using the OGL pipeline? So that's what we set out to do in Mustang (JDK 6.0). Instead of this mess:


we'll now have the following, more elegant solution:


As you have probably noticed, I could go on and on about this project, but I'll spare you further pain. (If you really want more details, see this RFE.) The bottom line is that the project was a big success (from our perspective). The OGL pipeline is now more robust since we only interact with the native OpenGL libraries from a single "queue flushing" thread.
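
For the curious, the shape of the idea looks roughly like the sketch below. It is only an illustration under my own assumptions, not the Mustang implementation: `SingleThreadedRenderQueue`, `enqueue()`, and `sync()` are hypothetical names, and each "operation" just records which thread executed it instead of calling OpenGL.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch only: hypothetical names, no real OpenGL calls.
final class SingleThreadedRenderQueue {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    // Records "op@thread" for every operation that was actually executed.
    final List<String> executed = Collections.synchronizedList(new ArrayList<>());

    SingleThreadedRenderQueue() {
        Thread flusher = new Thread(() -> {
            try {
                while (true) {
                    queue.take().run(); // the only place operations ever run
                }
            } catch (InterruptedException e) {
                // Interrupted: let the flusher thread exit quietly.
            }
        }, "queue-flusher");
        flusher.setDaemon(true);
        flusher.start();
    }

    // May be called from any thread; the work itself runs on the flusher.
    void enqueue(String opName) {
        queue.add(() -> executed.add(opName + "@" + Thread.currentThread().getName()));
    }

    // Blocks until everything enqueued before this call has been executed.
    void sync() {
        CountDownLatch done = new CountDownLatch(1);
        queue.add(done::countDown);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for flush", e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SingleThreadedRenderQueue q = new SingleThreadedRenderQueue();
        Thread a = new Thread(() -> q.enqueue("fillRect"), "thread-a");
        Thread b = new Thread(() -> q.enqueue("drawLine"), "thread-b");
        a.start();
        b.start();
        a.join();
        b.join();
        q.sync();
        System.out.println(q.executed); // every entry ends in "@queue-flusher"
    }
}
```

Requests can arrive from any thread, but the code that stands in for the native library only ever sees the one flusher thread.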

Going into this project, we were a bit unsure of the performance implications. We had our reservations, but surprisingly, there are some significant performance gains attributed to these changes, mostly because we can avoid going down through JNI on each and every rendering operation. Instead, rendering operations are buffered up on a NIO buffer, which is flushed periodically (or as necessary). Another benefit of STR is that it makes it much easier to "batch up" similar operations. OpenGL prefers to see lots of similar operations (like lines or rectangles) batched up within a glBegin()/glEnd() pair, so it is now easier for us to track which operations are being enqueued, and therefore easier for us to batch.
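
Here is a small sketch of what buffering and batching can look like. Treat it as an assumption-laden toy rather than the real pipeline: the opcodes, the buffer size, and the `RenderBufferSketch` class are all made up for illustration, and the flush step writes to a log where the real thing would make a single trip into native code and issue glBegin()/glEnd() calls.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: opcodes, sizes, and names are hypothetical.
final class RenderBufferSketch {
    static final int OP_DRAW_LINE = 1;
    static final int OP_FILL_RECT = 2;
    private static final int BYTES_PER_OP = 5 * Integer.BYTES; // opcode + 4 ints

    private final ByteBuffer buf = ByteBuffer.allocateDirect(4096);

    // Stands in for the calls that would be made on the native side.
    final List<String> log = new ArrayList<>();

    void drawLine(int x1, int y1, int x2, int y2) { put(OP_DRAW_LINE, x1, y1, x2, y2); }
    void fillRect(int x, int y, int w, int h)     { put(OP_FILL_RECT, x, y, w, h); }

    // Enqueueing is just a few writes into the buffer; no native call yet.
    private void put(int op, int a, int b, int c, int d) {
        if (buf.remaining() < BYTES_PER_OP) {
            flush(); // buffer full: flush "as necessary"
        }
        buf.putInt(op).putInt(a).putInt(b).putInt(c).putInt(d);
    }

    // Walks the buffer once, opening a new batch only when the opcode changes.
    void flush() {
        buf.flip();
        int current = 0; // 0 means no batch is open
        while (buf.hasRemaining()) {
            int op = buf.getInt();
            int a = buf.getInt();
            int b = buf.getInt();
            int c = buf.getInt();
            int d = buf.getInt();
            if (op != current) {
                if (current != 0) {
                    log.add("end");
                }
                log.add(op == OP_DRAW_LINE ? "begin(LINES)" : "begin(QUADS)");
                current = op;
            }
            log.add((op == OP_DRAW_LINE ? "line " : "rect ") + a + " " + b + " " + c + " " + d);
        }
        if (current != 0) {
            log.add("end");
        }
        buf.clear();
    }

    public static void main(String[] args) {
        RenderBufferSketch r = new RenderBufferSketch();
        r.drawLine(0, 0, 10, 10);
        r.drawLine(10, 10, 20, 0);
        r.fillRect(5, 5, 3, 3);
        r.flush();
        r.log.forEach(System.out::println);
    }
}
```

In this example the two lines share one begin/end pair, and the rectangle gets its own, which is the kind of grouping that becomes easy once every operation passes through one buffer.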

I'll save the detailed performance numbers for another day, but here are some quick numbers from my Linux configuration (JDS, 2x 2.6GHz P4, Nvidia GeForce FX 5600); the numbers look similar on Solaris and Windows:

  • SwingMark (internal Swing benchmark) is approximately 15% faster
  • drawString() is about 250% faster (according to J2DBench)
  • fillRect() and drawLine() are up to 2500% faster (according to J2DBench)
  • with FireStarter (an internal demo), we can render about 2600 transformed/filtered/blended sprites per frame at 30 fps (as opposed to only 1600 sprites before STR)

(Note: these results are from microbenchmarks, so take them with a grain of salt...)

Now, why did I choose today to post this blog entry? If you said, "well, because the STR changes have just been integrated into Mustang build 27", give yourself a big pat on the back. It's true... The b27 snapshot was just posted today. Please download the binaries for your favorite platform and take an app or two for a test drive with the OGL pipeline enabled (-Dsun.java2d.opengl=True). Let me know how it goes... But first, a couple caveats...

Every time I write about the OGL pipeline, the first question that is asked (without fail) is "Will the OGL pipeline be enabled by default?"... Well, the answer is still "no". It will never be enabled by default on the Windows platform (the DirectX-based pipelines are a better bet on Windows). However, I certainly hope that someday (Mustang? Dolphin?) the OGL pipeline will be enabled by default on Solaris and Linux, but only if we detect a "compatible" system, meaning hardware accelerated drivers are installed, bug-free, and performant.

Even with STR in place for the OGL pipeline, there are still just a few outstanding driver issues (more so with ATI's drivers than with Nvidia's) that prevent us from enabling the OGL pipeline by default. We've filed bugs with the respective companies, and we're still working with their driver teams to get the bugs fixed. I hope to post a follow-up soon to report progress on the remaining driver issues. For now, if you're using a video card from ATI, you may see some inverted colors and/or gradients being rendered incorrectly, and maybe even a crash on exit. These issues have been filed with ATI. If you're using a video card from Nvidia, you may see reduced performance when rendering software-based images on Windows, but otherwise, things should be looking pretty good.

If you're one of those people hoping to see hardware accelerated Java 2D on Solaris and Linux someday, please download the Mustang snapshots and continue to provide feedback on the OGL pipeline. The more feedback we get, the better the chances that we can enable the OGL pipeline by default in a future release.

Finally, A Micro-Benchmark For Java 2D...

A number of folks on the JavaLobby forums and elsewhere have requested a way to benchmark Java 2D performance on various platforms and configurations. We've had an internal microbenchmarking application for quite some time called J2DBench that we use on a daily basis. In response to requests from developers, we wanted to make J2DBench available to the public, but it took quite a while to get permission from our lawyers to do so (you know how it goes).

Finally, I'm happy to say that J2DBench is available as part of the JRL source bundle for Mustang build 27. To access the J2DBench sources, download the source bundle and cd to j2se/src/share/demo/java2d/J2DBench. In there, you'll find a README file, Ant build.xml file, and NetBeans 4.0 project file to get you started. In the future, we should be able to accept J2DBench fixes and improvements (e.g. new tests) through the JDK Collaboration (or similar) project. I'll blog with more details in the future once we figure out how best to handle that issue.

Thanks to Dmitri Trembovetski for taking the time to prepare the J2DBench source files for public consumption. I hope it will be useful for developers trying to performance tune their application, or as a convenient way to submit performance bugs to us on the Java 2D team. (For example, use the sample options file included with J2DBench to compare the performance of the OGL pipeline in JDK 5.0 with the improved pipeline in JDK 6.0-b27.) Good luck!

Our Work Is Never Done

Another improvement coming soon in a Mustang snapshot near you: a fix for the dreaded "gray rect" problem. This is one of Chet's favorite battles, so I'll let him describe the problem, and the solution, in an upcoming blog. However, I wanted to quickly mention that due to the way this fix was implemented, Swing is able to use BufferStrategy for its double-buffering needs.

What makes this interesting is that it opens the door for the Swing backbuffer to be stored in the native OpenGL backbuffer, which is usually pre-allocated for us anyway, and often goes unused in the case of Swing applications. Making use of the OpenGL backbuffer means improved Swing performance (when the OGL pipeline is enabled), and also results in a big reduction in VRAM usage.

Prior to this change, the Swing backbuffer would be allocated in an OpenGL pbuffer. Pbuffers allow for hardware accelerated offscreen rendering, but they can be expensive in terms of VRAM footprint (they often use 12-20 bytes of VRAM per pixel!). So by eliminating the need for pbuffers for the Swing backbuffer, there will be plenty more VRAM available for accelerating things like icons and text. I'll provide more details about this change in a future blog entry.
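
To put the 12-20 bytes per pixel figure in perspective, here is a quick back-of-the-envelope calculation. The 1280x1024 window size is just an assumption picked for illustration, and `PbufferFootprint` is a hypothetical helper, not part of any API.

```java
// Back-of-the-envelope arithmetic only; the window size is an assumed example.
final class PbufferFootprint {
    // VRAM in bytes for a width x height surface at a given per-pixel cost.
    static long bytes(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    public static void main(String[] args) {
        int width = 1280;
        int height = 1024;
        double mib = 1024.0 * 1024.0;
        System.out.printf("12 bytes/pixel: %.1f MiB%n", bytes(width, height, 12) / mib);
        System.out.printf("20 bytes/pixel: %.1f MiB%n", bytes(width, height, 20) / mib);
    }
}
```

For that window size, the range works out to roughly 15 to 25 MiB for a single backbuffer.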

There are plenty of other projects on my plate for Mustang, including:

  • Fullscreen/DisplayMode changing on Linux (finally)
  • OGL pipeline improvements (of course)
  • Hardware acceleration for LCD-optimized text (important!)
  • Wide line performance improvements (maybe)
  • Applying STR techniques to our DirectX-based pipelines on Windows (Dolphin?)
  • Other miscellanea (as always)

In my ears: Robert Pollard, "Speak Kindly Of Your Volunteer Fire Department"

In my eyes: Cao Xueqin, "The Story Of The Stone (Vol. 2)"
