Chris Campbell

Chris Campbell's Blog

Faster Java 2D Via Shaders

Posted by campbell on April 07, 2007 at 07:58 PM | Comments (18)

We've been doing a lot of work over the past couple years to accelerate many complex Java 2D operations on the GPU via OpenGL fragment shaders. (Fragment shaders are little programs that operate on each pixel being rendered on the screen; they're infinitely more flexible than the fixed functionality that's historically been available in OpenGL.) The sky's the limit when it comes to the kind of effects one can achieve by writing shader programs.

The first GPUs with fully programmable shader support were made available from Nvidia and ATI in 2002. It took a couple years for the new hardware to penetrate into the consumer market, but now shader-capable GPUs are extremely prevalent (thanks in part to the hefty graphical requirements of Mac OS X and Windows Vista), so much so that we can reliably take advantage of shaders to accelerate complex Java 2D operations. Even those first-generation boards are capable of providing huge performance gains, and each new generation of hardware seems to give an order of magnitude improvement over the last. This should be quite evident from the charts that follow. While CPU speeds seem to be nearing an asymptote, GPU performance continues to rocket, and now Java 2D is able to benefit from that power. Not only does this mean improved performance for your Swing or Java 2D application, but also reduced CPU consumption, thus freeing up your CPU to crunch on application logic rather than getting bogged down with rendering tasks.

[This is one of those blog entries that could be novel length, but no one would actually read the words, because there are too many pretty bar charts to distract the reader. Blah blah blah, words words words. See? No one's reading. So let's skip the prose and get on with it... Oh, but first, I have to tell you how to read these charts. I generated these numbers on a couple different machines, using J2DBench on Windows XP with the latest graphics drivers (ATI Catalyst 7.3 and Nvidia 93.71). Since the machines vary slightly in processor performance and bus speed (the GeForce 7800 is a PCI-E board, the rest are AGP), I decided to use our software pipeline as a baseline, and then compare the OGL pipeline numbers to that baseline. For example, if you see a result that lines up with the number 2000, it means that test is 2000% of baseline, or in other words, it is about 20 times faster on the GPU than on the CPU. Your mileage may vary, but the big takeaway is that most operations are many times faster when executed on the GPU...]


Text Rendering (Bug 6274813: Available in JDK 6)

LCD-Optimized Text
I already discussed this a bit in a blog entry from about a year ago. Since then, we've enabled this by default in JDK 7 (and soon in a JDK 6 update) when the OGL pipeline is enabled. It's cool to see that while software performance improves only a little over time, each new generation of GPUs brings big performance improvements.

lcdtext.png


BufferedImageOps (Bug 6514990: Available in JDK 7 b08)

ConvolveOp
ConvolveOp is commonly used for modern UI effects such as blurs and drop shadows. Due to limitations in first-generation shader-level hardware, we are currently only accelerating ConvolveOp for 3x3 and 5x5 sized Kernels. These are fairly common kernel sizes, but most drop shadow and glow effects require larger kernels, so we are working to loosen these restrictions and accelerate a wider range of kernel sizes.

convolve.png
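
For readers who want to try a kernel of the size discussed above, here is a minimal sketch of a 3x3 box blur built with ConvolveOp. The class and method names (ConvolveSketch, boxBlur3x3, blur) are made up for illustration, and the sketch calls filter() directly so that it runs on any machine; the post does not spell out which call paths the OGL pipeline accelerates, so treat this as a demonstration of the op setup only.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ConvolveOp;
import java.awt.image.Kernel;
import java.util.Arrays;

// Hypothetical helper class, not part of the JDK or of the author's code.
class ConvolveSketch {

    // Builds a 3x3 box blur: nine equal weights that sum to 1.
    static ConvolveOp boxBlur3x3() {
        float[] weights = new float[9];
        Arrays.fill(weights, 1f / 9f);
        // EDGE_NO_OP copies edge pixels from the source unchanged.
        return new ConvolveOp(new Kernel(3, 3, weights), ConvolveOp.EDGE_NO_OP, null);
    }

    // Applies the blur and returns a new image; the source is left untouched.
    static BufferedImage blur(BufferedImage src) {
        return boxBlur3x3().filter(src, null);
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(5, 5, BufferedImage.TYPE_INT_RGB);
        src.setRGB(2, 2, 0xFFFFFF); // one white pixel on a black background
        BufferedImage dst = blur(src);
        int red = (dst.getRGB(2, 2) >> 16) & 0xFF;
        System.out.println("center red channel after blur: " + red);
    }
}
```

The OGL pipeline itself is normally switched on with the -Dsun.java2d.opengl=true system property.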

LookupOp
LookupOp is often used to perform simple brightness and contrast adjustments on images. To simplify our code, we are currently only accelerating LookupOp for ByteLookupTables and ShortLookupTables with a maximum length of 256 elements, and for 1-, 3-, and 4-band sRGB images only.

lookup.png
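
As an illustration of the kind of LookupOp described above, the following sketch builds a 256-entry ByteLookupTable that brightens an image. The names LookupSketch, brightnessTable, and brighten are hypothetical, and the clamped "add a constant" curve is just one simple choice of brightness adjustment.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ByteLookupTable;
import java.awt.image.LookupOp;

// Hypothetical helper class for illustration only.
class LookupSketch {

    // Builds a 256-entry table that adds delta to each sample, clamped to 0..255.
    static byte[] brightnessTable(int delta) {
        byte[] table = new byte[256];
        for (int i = 0; i < 256; i++) {
            table[i] = (byte) Math.max(0, Math.min(255, i + delta));
        }
        return table;
    }

    // A single table is applied to every color band of the image.
    static BufferedImage brighten(BufferedImage src, int delta) {
        LookupOp op = new LookupOp(new ByteLookupTable(0, brightnessTable(delta)), null);
        return op.filter(src, null);
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        src.setRGB(0, 0, 0x102030);
        BufferedImage dst = brighten(src, 50);
        System.out.printf("before: %06X after: %06X%n",
                src.getRGB(0, 0) & 0xFFFFFF, dst.getRGB(0, 0) & 0xFFFFFF);
    }
}
```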

RescaleOp
RescaleOp is basically a degenerate case of LookupOp that can be accelerated very efficiently in shaders; after all, it's just a multiply and an add. We are currently accelerating RescaleOp for 1-, 3-, and 4-band sRGB images only.

rescale.png
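
The "multiply and an add" above maps directly onto the two arguments of the RescaleOp constructor. Here is a small sketch, with made-up names (RescaleSketch, rescale) and arbitrary example values for the scale factor and offset.

```java
import java.awt.image.BufferedImage;
import java.awt.image.RescaleOp;

// Hypothetical helper class for illustration only.
class RescaleSketch {

    // Each color sample becomes sample * scale + offset, clamped to the valid range.
    static BufferedImage rescale(BufferedImage src, float scale, float offset) {
        return new RescaleOp(scale, offset, null).filter(src, null);
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        src.setRGB(0, 0, 0x204060);
        BufferedImage dst = rescale(src, 2f, 10f);
        System.out.printf("before: %06X after: %06X%n",
                src.getRGB(0, 0) & 0xFFFFFF, dst.getRGB(0, 0) & 0xFFFFFF);
    }
}
```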


Multi-stop Gradient Paints (Bug 6521533: Available in JDK 7 b10)

LinearGradientPaint
For 2-stop linear gradients (NO_CYCLE or REFLECT), we can delegate to our existing GradientPaint codepath, which is already ridiculously fast via OpenGL's fixed functionality. For all other linear gradients (for all CycleMethods and ColorSpaceTypes, up to a maximum of 12 color stops), we can accelerate the operation using shaders, for both antialiased and non-antialiased rendering.

linear.png
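
To make the multi-stop case concrete, here is a minimal sketch of a 3-stop horizontal LinearGradientPaint filled into an offscreen image. The class name and the choice of colors and stop positions are illustrative assumptions; with three stops it falls into the "all other linear gradients" category described above, well under the 12-stop limit.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.LinearGradientPaint;
import java.awt.MultipleGradientPaint.CycleMethod;
import java.awt.image.BufferedImage;

// Hypothetical helper class for illustration only.
class LinearGradientSketch {

    // Fills a w x h image with a red -> green -> blue gradient running left to right.
    static BufferedImage render(int w, int h) {
        float[] fractions = {0f, 0.5f, 1f};
        Color[] colors = {Color.RED, Color.GREEN, Color.BLUE};
        LinearGradientPaint paint = new LinearGradientPaint(
                0f, 0f, w, 0f, fractions, colors, CycleMethod.NO_CYCLE);

        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setPaint(paint);
            g.fillRect(0, 0, w, h);
        } finally {
            g.dispose();
        }
        return img;
    }

    public static void main(String[] args) {
        BufferedImage img = render(101, 4);
        System.out.printf("left: %06X mid: %06X right: %06X%n",
                img.getRGB(0, 1) & 0xFFFFFF,
                img.getRGB(50, 1) & 0xFFFFFF,
                img.getRGB(100, 1) & 0xFFFFFF);
    }
}
```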

RadialGradientPaint
The same restrictions for linear gradients also apply to radial gradients (maximum of 12 color stops, etc). For gradients with more than 12 stops, we simply fall back on our existing software routines.

radial.png
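
The radial case looks much the same from the application side. The sketch below uses made-up names throughout; in particular, MAX_ACCELERATED_STOPS and fitsShaderPath are not JDK APIs, they simply restate the 12-stop limit quoted above so an application could check a gradient before relying on the accelerated path.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.MultipleGradientPaint.CycleMethod;
import java.awt.RadialGradientPaint;
import java.awt.image.BufferedImage;

// Hypothetical helper class for illustration only.
class RadialGradientSketch {

    // Limit quoted in the post; gradients with more stops fall back to software.
    static final int MAX_ACCELERATED_STOPS = 12;

    // Assumption: stop count is the only condition checked here.
    static boolean fitsShaderPath(int stopCount) {
        return stopCount <= MAX_ACCELERATED_STOPS;
    }

    // Fills a size x size image with a white -> orange -> black radial gradient.
    static BufferedImage render(int size) {
        float[] fractions = {0f, 0.5f, 1f};
        Color[] colors = {Color.WHITE, Color.ORANGE, Color.BLACK};
        float center = size / 2f;
        RadialGradientPaint paint = new RadialGradientPaint(
                center, center, size * 0.4f, fractions, colors, CycleMethod.NO_CYCLE);

        BufferedImage img = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setPaint(paint);
            g.fillRect(0, 0, size, size);
        } finally {
            g.dispose();
        }
        return img;
    }

    public static void main(String[] args) {
        BufferedImage img = render(101);
        System.out.printf("center: %06X corner: %06X fits shader path: %b%n",
                img.getRGB(50, 50) & 0xFFFFFF, img.getRGB(0, 0) & 0xFFFFFF, fitsShaderPath(3));
    }
}
```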


Extended Composite Modes (Bugs 6531647, 5094232: On the way)

Antialiased Painting (with Non-SrcOver AlphaComposite)
Historically the OGL pipeline has been able to accelerate the compositing step of an antialiased painting operation only when AlphaComposite.SrcOver (or AlphaComposite.Src, if the paint is opaque) is in use, due to the math involved. (This is described in the quasi-official guide to the OpenGL-based Java 2D pipeline, but that document could use a refresh for JDK 6 and beyond.) But now with shaders we're able to accelerate antialiased rendering for any arbitrary AlphaComposite mode.
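
As an example of the kind of operation this covers, the sketch below fills an antialiased ellipse using AlphaComposite.DstOut, which erases the destination wherever the shape has coverage. The class and method names are hypothetical, and the code renders into a BufferedImage so it runs anywhere; whether it hits the shader path depends on the pipeline and the destination surface.

```java
import java.awt.AlphaComposite;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.Ellipse2D;
import java.awt.image.BufferedImage;

// Hypothetical helper class for illustration only.
class CompositeSketch {

    // Paints an opaque blue square, then punches a soft-edged round hole in it.
    static BufferedImage punchHole(int size) {
        BufferedImage img = new BufferedImage(size, size, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(Color.BLUE);
            g.fillRect(0, 0, size, size);

            // Antialiased fill with a non-SrcOver rule.
            g.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                    RenderingHints.VALUE_ANTIALIAS_ON);
            g.setComposite(AlphaComposite.DstOut);
            g.setColor(Color.BLACK); // only the source alpha matters for DstOut
            g.fill(new Ellipse2D.Float(size / 4f, size / 4f, size / 2f, size / 2f));
        } finally {
            g.dispose();
        }
        return img;
    }

    public static void main(String[] args) {
        BufferedImage img = punchHole(40);
        System.out.println("alpha at center: " + (img.getRGB(20, 20) >>> 24));
        System.out.println("alpha at corner: " + (img.getRGB(0, 0) >>> 24));
    }
}
```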

Coming Soon: PhotoComposite
It's not officially approved (or integrated) yet, but I've been working on adding more blending modes to Java 2D, in addition to those already provided by AlphaComposite. Many of these modes come from traditional photography techniques, thus the name "PhotoComposite". Some modes are simple (Add, Multiply) and can be accelerated easily using OpenGL's built-in blending rules; others are more involved (ColorBurn, SoftLight) and benefit greatly from the use of shaders for efficient rendering.
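
Since PhotoComposite is not public, its API cannot be shown here. The sketch below only illustrates the commonly used per-channel formulas for the two simple modes named above, Add and Multiply, on opaque 8-bit samples. The class and method names are invented, and the actual PhotoComposite implementation may define these modes differently (for example, with alpha taken into account).

```java
// Hypothetical helper class; this is NOT the PhotoComposite API.
class BlendMathSketch {

    // Add: sum the two samples and clamp to the 8-bit maximum.
    static int add(int src, int dst) {
        return Math.min(255, src + dst);
    }

    // Multiply: scale one sample by the other, treating 255 as 1.0 (rounded).
    static int multiply(int src, int dst) {
        return (src * dst + 127) / 255;
    }

    public static void main(String[] args) {
        System.out.println("add(200, 100) = " + add(200, 100));
        System.out.println("multiply(255, 128) = " + multiply(255, 128));
    }
}
```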


What's Next?

There are plenty more optimizations that can be made to common Java 2D operations by leveraging shader technology; we'll keep working on this. Also, there has been some discussion on the interest list recently about the use of shaders in Java 2D. Some folks would like to be able to write arbitrary shaders and have them work on Java 2D content. I think it would be hard to come up with a general solution (in the public API) to make this work everywhere, and it would shift an unreasonable burden onto Java 2D (which is designed with WORA in mind).

However, I do recognize that it would be great if it were easy for developers to make use of shaders in their applications (as Romain has demonstrated) without getting bogged down in the OpenGL/GLSL learning curve. To that end, I've been working on a few utility classes for JOGL in the vein of TextureIO and related classes. This is another way to make it easier for existing Swing developers to leverage JOGL in their applications. More on that in a future blog entry.

Finally, it's worth mentioning that all the shader-based optimizations I've described above are currently only available for the OpenGL-based Java 2D pipeline, but that will soon change. For JDK 7 (maybe sooner?), we have a newly redesigned Direct3D-9-based pipeline in the works that will share much of the architecture (and code) of the OpenGL-based pipeline. We fully expect that all of these shader-based optimizations will be available for most Windows users in the near future. Stay tuned.



In my ears: Panda Bear, "Person Pitch"
In my eyes: JPG, Issue 9


Comments

  • I can't wait to get my hands on the first Java 7 builds :) In the meantime, writing a rather general solution to filter images with OpenGL shaders could easily be done outside of the JDK using JOGL and MGS' excellent Shader node.

    Posted by: gfx on April 07, 2007 at 10:45 PM

  • Chris for President!

    Would it be possible to get the engineer that's doing the Direct3D pipeline to write a little teaser blog, with some nice bars?

    Posted by: mikaelgrev on April 08, 2007 at 01:35 AM

  • Well done on the optimisations! Good news.

    But may I be a pedant and ask you not to use "infinitely" and "order of magnitude" when you really mean "a lot" since I had to read your graphs thinking "that's NOT an order of magnitude", etc! Those are "art" terms in mathematics and you are abusing them in the way that only cheap print journalists are allowed to...

    Rgds

    Damon

    Posted by: damonhd on April 08, 2007 at 05:10 AM

  • @mikaelgrev: I agree that would be nice (a blog about D3D, not me running for President of the Universe), although it's worth noting that the charts would probably end up looking suspiciously similar to the ones above (same hardware after all, just a different API)...

    @damonhd: Never a good idea to fight a pedant with more pedantry, but if you're going to compare me to a cheap journalist, then I have no choice... I used the word "infinitely" to describe the flexibility of shaders, which is perfectly acceptable usage considering that with the fixed function pipeline there is a finite number of possible operations, while with vertex/geometry/pixel shading languages, the possibilities truly are infinite.

    And on your second point, while most of the charts above don't demonstrate it, a number of operations are an order of magnitude faster compared to the last generation (e.g. compare GeForce 6-series to GeForce FX-series for the BufferedImageOp charts). My statement was also intended to be more general; there are more cases that aren't reflected in the charts. For example, there wasn't a hardware-vectorized pow() instruction on older GPUs, so using the GLSL pow() instruction was eventually made much faster on newer hardware. Also, while not directly applicable to performance, it's interesting to see the large jumps in capabilities of each new generation of hardware (e.g. maximum number of texture instructions went from 32 to 512 to 4096 in three successive generations of Nvidia hardware).

    Finally, I didn't do a very good job of explaining why some of the numbers look worse on newer hardware (e.g. why RescaleOp looks slower on GeForce 7 than on GeForce 6). As I explained, those numbers were generated on different machines with different bus characteristics. If I ran the tests again on the same machine, with one PCI-E GeForce 7 board and one PCI-E GeForce 6 board, and compared the numbers, you would see that GeForce 7 would beat out GeForce 6 handily (again, not necessarily an order of magnitude, but still a big improvement). That's why I'm not terribly happy with the charts above, but at least they do show how much faster (on average) these Java 2D operations can be when executed on the GPU. It wasn't my intention to do a general performance comparison of shaders; there are already plenty of comparisons/reviews out there that are more comprehensive.

    Thanks, Chris.

    Posted by: campbell on April 08, 2007 at 10:43 AM

  • Cool, very nice work :) Can't wait to try this out at home!

    Are you allowed to post the code you used to run the benchmarks? If yes, it would be very interesting to see. I've done some benches on various image processing algorithms, but was never very happy with the benchmarking procedure itself and would like to learn :)

    Posted by: iluetkeb on April 08, 2007 at 10:52 AM

  • @iluetkeb: All of these performance results were generated using our standard benchmarking suite, known as J2DBench. The source code for J2DBench can be found in the JDK 7 source bundle. Unfortunately there's still no definitive page out there that describes J2DBench; the closest thing we have is my STR-Crazy blog, but it should be enough to get you started.

    Posted by: campbell on April 08, 2007 at 11:41 AM

  • Were the bar charts done using Java2D?

    Posted by: bellbux on April 09, 2007 at 07:58 AM

  • Sounds great. Java needs this to stay competitive, so great work. Two questions: (1) You mentioned Windows support. Can we expect to see this for Linux and Mac, too? (2) For ConvolveOp, I've found it awkward that it can't affect pixels near the border of the image. I understand that there's no source data to work with, but isn't there some way to deal with this (such as by changing the normalization near the borders)? For example, notice the blur going all the way to the edges here: "http://www.simplesystems.org/RMagick/doc/image1.html#blur_image". Also, can shaders deal with such issues?

    Posted by: tompalmer on April 09, 2007 at 08:45 AM

  • @tompalmer: There are ways to do so. You can either wrap the convolve op around the edges (grabbing pixels from the left edge when you reach the right edge, for instance) or simply use fewer sample values. There's no reason why shaders couldn't deal with it.

    Posted by: gfx on April 09, 2007 at 08:56 AM

  • @gfx, Thanks for the info. Any really quick code pointers on that? All I see in the docs (http://java.sun.com/javase/6/docs/api/java/awt/image/ConvolveOp.html) and from my testing are EDGE_NO_OP and EDGE_ZERO_FILL. So, I could code my own convolution to follow your recommendation, but is there anything built into Java2D that would make it easy (and fast once the shaders are automatically there)? Any chance for adding EDGE_WRAP or EDGE_IGNORE_AND_RENORMALIZE (whatever the best term is for that) in Java 7? Or any way to cheat by extending the image before convolving in some way? (Seems like that could be killer on RAM for some cases even if it works, though.)

    Posted by: tompalmer on April 09, 2007 at 10:11 AM


  • @bellbux: No, the charts were created in StarOffice. But I've been hacking a bit on J2DBench so that it uses JFreeChart (which is built on Java 2D) to spit out pretty looking charts automatically.

    @tompalmer: 1) All the shader enhancements I described are for the OGL pipeline only right now, and that works on all platforms (Windows, Linux, Solaris, and Mac too once Apple takes this code). When I mentioned Windows support I was specifically talking about the new D3D9 pipeline, which of course is for Windows only. But the benefit of the D3D pipeline is that we can enable it more easily by default on Windows than we could with the OGL pipeline, and therefore Windows users should get all of these shader-based improvements for free in an upcoming release. And of course, the OGL pipeline offers great performance for Linux/Solaris/Mac, or as an alternative to the D3D pipeline on Windows.

    2) Yeah, the two built-in edge conditions for ConvolveOp are a bit awkward. Romain points out a couple different alternatives, but all of these options have their strengths and weaknesses. We've been considering adding new edge modes for some time, and in fact, there's already an RFE filed for it (see 6473061 and feel free to add a vote). We can of course implement these fairly easily, both in our software implementation and in our shader-based implementation. And regarding your last question, yes, many people get around these issues today by either a) cropping the edge pixels (if they don't care about losing data) or b) creating an image padded with transparent pixels, which has an effect similar to a wrap mode where the pixels outside the edges contribute zero values.

    Thanks, Chris

    Posted by: campbell on April 09, 2007 at 10:33 AM

  • Thanks much for the pointers.

    Posted by: tompalmer on April 09, 2007 at 11:18 AM

  • I must be a minority but can I vote for more words less graphs ;-)
    I'd love to get a more in-depth understanding of how the shaders fit in with everything since I was recently involved in a project related to shaders.
    Personally I hope that Java would build a generic "GPU" JSR which would look somewhat like GWT converting on the fly java bytecode (with some restrictions) into the appropriate shader language. This can be used both for 2D and for heavy processing applications (e.g. scheduling, math software) and would make the development of such shaders really easy for any Java programmer.

    Posted by: vprise on April 09, 2007 at 11:48 AM

  • Ah, is Chris hinting that the new D3D pipeline may eventually be enabled by default? That would be sweet. I still can't get the OGL pipeline to work on either my XP or Vista laptops (though it works great on my XP desktop). The D3D pipeline has worked every time, but it just never felt as fast. I can't wait to see the new one.

    Posted by: benloud on April 09, 2007 at 06:58 PM

  • @gfx What do you mean by "MGS' excellent Shader node"? I googled but didn't find any relevant hit (or is it Metal Gear Solid?)

    Posted by: saxer_de on April 10, 2007 at 02:09 PM

  • @saxer_de: Romain meant "MSG" instead of "MGS". MSG stands for Minimal Scene Graph, and the code can be found in the joglutils project on java.net. I'll blog more about the Shader convenience classes I've been working on when I return from vacation in a couple weeks, but in the meantime you can browse around the code in the project if you want a sneak peek (e.g. see the net.java.joglutils.msg.misc.Shader class).

    Posted by: campbell on April 10, 2007 at 02:17 PM

  • Hi Chris, what a wonderful news ! I have been waiting for this for at least 2 years now... voting for related RFEs and posting on the java2D forum ! I am really looking forward to trying the PhotoComposite when it's out ! Java rocks ! as usual !
    Cheers.
    Vince.

    Posted by: vync79 on April 13, 2007 at 05:18 AM



