Chet Haase

Chet Haase's Blog

Graphics Acceleration Geeks: Rejoice!

Posted by chet on May 03, 2005 at 03:19 PM | Comments (8)

If you are interested in hardware acceleration for Java2D on Windows, check out the latest bits on the Mustang site (http://mustang.dev.java.net). Dmitri Trembovetski has been working tirelessly to implement functionality similar to what Chris Campbell did with our OpenGL rendering pipeline, and it's pretty stunning. There is now (as of build 33) acceleration for everything from the standard image copies to translucent image operations to lines to transforms to complex clips to text (AA and non-AA).

Note: This rendering pipeline is disabled by default for now; there are various issues we are working through to make this renderer as good in quality as the default renderer. That quote from Spiderman comes to mind: "With great power comes great responsibility." Except in our case, the quote runs more like this: "With great power comes great driver quality issues"; as we enable more features in Direct3D, we expose more quality and robustness issues in graphics hardware and drivers that we need to work around. This driver quality issue is a ripe topic for another article or more; suffice it to say that the hardware and driver manufacturers tend to have a lower bar for "quality" than people tend to expect for Java. To enable the Direct3D pipeline in the current Mustang builds, use the -Dsun.java2d.d3d=true runtime flag.
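For anyone wiring the flag into a launcher script, here is a minimal sketch of how an application might confirm that the flag actually reached the JVM. The property name is the one given above; the class and method names are made up for illustration. Note that this only reports whether Direct3D was requested, not whether the pipeline ended up enabled on a given machine.

```java
// Sketch: report whether the Direct3D pipeline was requested on the command line.
// The property name comes from the post; D3DFlagCheck and d3dRequested are hypothetical names.
final class D3DFlagCheck {
    /** Returns true only if -Dsun.java2d.d3d=true was passed to the JVM. */
    static boolean d3dRequested() {
        // parseBoolean returns false for null, so an unset property reads as "not requested".
        return Boolean.parseBoolean(System.getProperty("sun.java2d.d3d"));
    }

    public static void main(String[] args) {
        System.out.println("Direct3D pipeline requested: " + d3dRequested());
    }
}
```

Run it as `java -Dsun.java2d.d3d=true D3DFlagCheck` to see the requested state.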

I thought it might help to dive into one area of acceleration, to explain why we're doing this, and what benefits you might expect to see. In both the OpenGL and Direct3D pipelines, we accelerate text by caching individual characters (glyphs) as tiles in a texture. The first time you render a glyph (à la drawString()) to an accelerated destination (such as the Swing back buffer), we will rasterize that glyph into the texture and then execute a texture-mapped quad operation to get that glyph into the right place. The next time you draw that same glyph, we already have it cached and can simply perform the texturing operation.
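The caching idea can be modeled in a few lines. This is a conceptual sketch only, not the pipeline's actual code: every name in it (GlyphCacheSketch, Tile, drawGlyph) is hypothetical, the tile size and layout are assumed, characters stand in for glyph codes, and there is no eviction. It simply counts how often the slow path (rasterizing into the texture) and the fast path (drawing a textured quad) would run.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual model only: the real pipeline keeps glyph tiles in a GPU texture.
// All names and sizes here are hypothetical.
final class GlyphCacheSketch {
    /** Position of one cached glyph inside the (imaginary) cache texture. */
    static final class Tile {
        final int x, y;
        Tile(int x, int y) { this.x = x; this.y = y; }
    }

    private static final int TILE_SIZE = 16;      // assumed fixed tile size in pixels
    private static final int TILES_PER_ROW = 16;  // assumed texture layout

    private final Map<Integer, Tile> tiles = new HashMap<>();
    int rasterizations = 0;  // cache misses: glyph rasterized into the texture
    int quadsDrawn = 0;      // textured quads issued (one per glyph drawn)

    /** Rasterizes the glyph into the texture on a miss, then always draws a quad. */
    Tile drawGlyph(int glyphCode) {
        Tile tile = tiles.get(glyphCode);
        if (tile == null) {
            // First use of this glyph: claim the next free tile and "rasterize" into it.
            int index = tiles.size();
            tile = new Tile((index % TILES_PER_ROW) * TILE_SIZE,
                            (index / TILES_PER_ROW) * TILE_SIZE);
            tiles.put(glyphCode, tile);
            rasterizations++;
        }
        // Cached or not, placing the glyph is one texture-mapped quad.
        quadsDrawn++;
        return tile;
    }

    /** Simplification: treats each char as its own glyph code. */
    void drawString(String text) {
        for (int i = 0; i < text.length(); i++) {
            drawGlyph(text.charAt(i));
        }
    }

    public static void main(String[] args) {
        GlyphCacheSketch cache = new GlyphCacheSketch();
        cache.drawString("hello");
        cache.drawString("hello");
        // prints "4 rasterizations, 10 quads"
        System.out.println(cache.rasterizations + " rasterizations, " + cache.quadsDrawn + " quads");
    }
}
```

Drawing "hello" twice rasterizes only the four distinct characters once each; the other draws go straight to the quad step.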

It might not be obvious why this is a Good Thing; after all, doesn't it sound like a lot more effort to do a full-on 3D texture-mapped quad operation than to simply draw a few pixels for a character into a buffer? Yes ... and no. In terms of raw instructions executed, that's probably correct; rasterizing a glyph is a pretty simple operation. And we already have a software cache for glyphs, so all we really do on repeat operations is to copy the pixels down from that cache into the destination. Meanwhile, a texture-map operation requires possible setup of the rendering destination in Direct3D, possible transformation setup, creation of appropriate vertex and texture-coordinate information for the glyph quad, passing down the call to Direct3D, then the stuff that the Direct3D driver does before handing it off to hardware, which then rasterizes the textured quad. This definitely sounds like a whole pile of work...

But there are two keys here that make the performance win more understandable: VRAM and parallel processing.

  • VRAM: Using video memory is all a matter of getting better performance by locality of memory. Basically, things happen faster if they are located more closely together.

    Let me try a sports analogy. This is a first for me; anyone that knows me would be shocked that I'd try this. Sports is one of those things that never really "took" with me. I'm apt to start talking about runs and goals and tackles in the same metaphor and the whole analogy would fall flat. But I like to try new things, so here goes:

    Imagine a play in baseball (that's the one with hits and runs and outs, right?). Let's say that the batter hits a grounder that the fielders need to get to quickly to try to throw the player out at first. If one of the infielders can manage to get to the ball before it passes out of the infield, then they can wing it over to first base and have a hope of throwing the person out. But if the ball goes into the outfield, then whoever gets the ball has to throw the ball farther, and thus has less chance of throwing out the batter at first. Here we see the dynamic of locality; if the play can be kept completely within the infield, then there is a greater chance of making the out because the ball can travel much quicker to first base.

    Whew! Okay, that was a (7th inning?) stretch, but I made it out the other side at least. Let's take this back into more familiar territory of computers.

    The screen exists in video memory (that's where the data lives that the monitor inputs read from). The Swing back buffer (as of J2SE 1.4) also lives in video memory (I'm talking about Windows here, since this article is about our Direct3D pipeline; other platforms have different screen/buffer/rendering dynamics). This means fast copies from the back buffer to the screen; if they are both in VRAM, then the operation is going to happen faster. This is partly because the bits don't have to travel as far, but mostly because there is a faster data path from VRAM to VRAM than there is from system memory to VRAM; pixels don't need to go through the CPU or over the PCI/AGP/PCI-Express bus, they just go through the faster/wider video card bus.

    (Note: The observant reader may notice that my baseball analogy breaks down here somewhat. VRAM operations are not faster just because of locality, but also because there is a faster path for local data. If I were to overload the analogy to account for this, it would be as if the infield players were the really good players on the team that could throw a whole lot faster than the outfielders. This is maybe not too far off-base; when I played little league it was certainly the case that the person playing right field (that'd be me) was far slower and less capable than the people closer to the batter.)

    The dynamic between the back buffer and the screen also applies to operations going to the back buffer itself; anything that can happen from VRAM to that back buffer has the advantages of locality and a faster/wider data path. In the case of texture-mapping operations, it may be that there is more happening to copy each individual pixel into place, but these pixels are being copied from a better location (VRAM) to the back buffer than the previous approach of rasterizing or copying from system memory to the back buffer.

  • Parallel Processing: Another important factor here that makes all of this possible is that the graphics chip is a completely separate processor. So when we're talking about the work involved in rasterizing a texture-mapped quad, this is all happening on the GPU, not on the CPU with the rest of the Java software stack. In addition to running in parallel, the GPU is also highly tuned for doing these sorts of operations, so it can probably do a much better/faster job of them than the CPU could.

    I could try to overextend the strained baseball analogy here, where the fielders operate asynchronously to the pitcher, but that would probably result in the next play starting while the current play was still happening. Baseball is confusing enough without throwing multi-threading into the mix.

Between these two factors, using data in VRAM and using the capabilities of the GPU, it is no longer the case that more complicated operations necessarily result in slower performance.

Another side benefit of this approach is that more interesting text approaches, such as anti-aliasing, can be supported with basically no additional performance hit. Typically, in a software rendering solution, text anti-aliasing causes a significant performance hit. This is because of the increased amount of stuff happening to rasterize these characters; there is now a read from the destination pixel and a blending operation to get the smooth edges of each glyph. Beyond the extra calculations involved here, that simple read can be quite expensive, especially when the destination is in VRAM. Graphics chips are really good at doing things in VRAM. They are pretty good at doing things from the CPU down into VRAM. But they really stink at doing things from VRAM to the CPU; the read speed of VRAM is really abysmal. So if a software rasterizer must read from VRAM in order to draw an anti-aliased glyph, performance will usually suffer.
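To make the extra read concrete, here is a rough sketch of what a software rasterizer has to do per pixel for anti-aliased text. It is an illustration under assumptions, not the Java2D loops themselves: the names (AABlendSketch, blendChannel, drawCoverageRow) are hypothetical, pixels are plain 0xRRGGBB ints, and coverage is an 8-bit value per pixel. With non-AA text the coverage is all-or-nothing, so the pixel is either written or skipped and the destination never needs to be read.

```java
// Sketch of the per-pixel work for software anti-aliased text. All names are hypothetical.
final class AABlendSketch {
    /** Blends one 8-bit channel: coverage 0 keeps dst, coverage 255 replaces it with src. */
    static int blendChannel(int src, int dst, int coverage) {
        // +127 rounds to nearest before the integer divide.
        return (src * coverage + dst * (255 - coverage) + 127) / 255;
    }

    /** Blends an opaque text color over a destination pixel, channel by channel (0xRRGGBB). */
    static int blendPixel(int srcRgb, int dstRgb, int coverage) {
        int r = blendChannel((srcRgb >> 16) & 0xFF, (dstRgb >> 16) & 0xFF, coverage);
        int g = blendChannel((srcRgb >> 8) & 0xFF, (dstRgb >> 8) & 0xFF, coverage);
        int b = blendChannel(srcRgb & 0xFF, dstRgb & 0xFF, coverage);
        return (r << 16) | (g << 8) | b;
    }

    /** Draws one row of glyph coverage values into dest, starting at offset. */
    static void drawCoverageRow(int[] dest, int offset, int[] coverage, int textRgb) {
        for (int i = 0; i < coverage.length; i++) {
            // This read is the expensive part when dest lives in VRAM.
            int dst = dest[offset + i];
            dest[offset + i] = blendPixel(textRgb, dst, coverage[i]);
        }
    }

    public static void main(String[] args) {
        int[] row = {0xFFFFFF, 0xFFFFFF, 0xFFFFFF};  // white background
        drawCoverageRow(row, 0, new int[] {0, 128, 255}, 0x000000);  // black text
        for (int pixel : row) {
            System.out.println(String.format("%06X", pixel));
        }
    }
}
```

In the textured-quad approach the same blend still happens, but the read and the arithmetic both stay on the graphics card.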

But with the texture-mapped quad approach to text rendering, there is basically no extra work going on when the glyphs are translucent. The same operations occur under the hood, but now they are all happening on the GPU and in VRAM, which have all the benefits so eloquently and inappropriately laid out in the baseball analogy above.

So enough about the low-level details. Download the bits, try them out, let us know what you find. We are continuing to work on it (various performance, quality, and robustness issues) and will enable Direct3D rendering by default when we are confident that this rendering pipeline is at least as good as the default one. In the meantime, you can force it on by using the -Dsun.java2d.d3d=true flag.

Dmitri has just informed me of three bugs that are currently being fixed on our side (not driver issues, actual implementation bugs if you can believe it):

  • 6255408: PIT: D3D: Animation freezes when pushing the console to FS mode and restoring it, WinXP
  • 6255346: PIT: D3D: VolatileImage is distorted when lowering the color depth at runtime
  • 6255836: PIT: ClassCastException thrown when ALT+TABing a FullScreen-page flipping app, Win32

In addition, the pipeline may not get enabled (even when you force it on) in 16-bit color depth; some graphics chips (such as the GeForce MX products from nVidia) have hardware limitations that force us to back off of acceleration in that depth.

If you do see any "issues" on your system, let us know. Be sure to tell us your platform details (especially your OS, graphics chip, resolution, bit depth, and driver version) so that we can chase down and fix the problems.


Comments
Comments are listed in date ascending order (oldest first)

  • Hi Chet, I ran a quick test on a text editor that I am working on, and text rendering seems to be significantly slower with Direct3D enabled. I notice this when scrolling through large documents. With no flags set there is no lag (the scrollbar thumb keeps up with the mouse pointer), but with Direct3D there is a noticeable lag.

    All I do is create tons of GlyphVector objects using the graphics FontRenderContext and then draw them using Graphics2D.drawGlyphVector(...). I know this is probably a very inefficient way to draw text and it will probably be changed, but maybe you are interested in hearing that this painting operation is slower with Direct3D enabled.

    BTW, with all this direct text drawing enabled, can we still be sure that a GlyphVector that was created using the graphics FontRenderContext has the same metrics as the glyphs on the screen that get painted with drawString(...)?

    Posted by: jansan on May 04, 2005 at 01:11 AM

  • I just changed the code to draw the text using Graphics2D.drawString(...) and this gave back the old performance.

    Posted by: jansan on May 04, 2005 at 01:29 AM

  • jansan,

    Thanks for the information; I'll forward it to our font folks and see if there is something known or unknown about this issue. Can you try it with opengl instead (-Dsun.java2d.opengl=true) and see if you get the same effect as you do with d3d? (I don't know how well openGL will run on your system, but it's worth a try to see if the problem is in our d3d implementation or in the overall font code).

    Chet.

    Posted by: chet on May 04, 2005 at 07:21 AM

  • drawGlyphVector ought NOT to be a slow way to draw text.

    In fact in theory it should be the fastest because the char->glyph processing
    doesn't need to be done each time as it must be for drawString.

    We'd need to look into why the D3D pipe can't accelerate this case.

    Posted by: philrace on May 04, 2005 at 09:04 AM

  • jansan, could you please contact me (tdv at sun dot com), I'd like to get more information on how you use drawGlyphList(). It is accelerated by the pipeline, so something is not right.

    Thanks,
    Dmitri
    Java2D Team

    Posted by: trembovetski on May 04, 2005 at 10:43 AM

  • Tests: for the Java2D demo (in the JDK 1.5 demos), this looks very good--either the D3D enhancements or just the latest Mustang improvements show much cleaner rendering in the demo, pretty smooth, with almost no jerkiness. The J2D demo was always a little disappointing on my machine, and now it actually impresses me. Nice work.

    But...the Flying Saucer XHTML renderer again has problems. Using build 33 with d3d enabled, the painting absolutely breaks on scrolling--it appears that it just draws on top of what was there without clearing out first, so you get etch-a-sketch. If I try to enable OGL, OGL is rejected, and there are no painting problems--also no problems if no flag is specified at all.

    Machine: HP Laptop, video ATI Mobility M6, driver should be ATI as well--this machine is a 4-5 year old design/model at this point.

    If you want to try and set this up, write me pdoubleya at dev.java.net.

    Patrick

    Posted by: pdoubleya on May 05, 2005 at 02:28 AM

  • Hi Patrick,

    I tried to send you email at the address you specified, but it bounced. Could you please contact me at tdv at sun dot com.

    Thanks,
    Dmitri
    Java2D Team

    Posted by: trembovetski on May 05, 2005 at 12:58 PM

  • Hello!

    What's happening with "No more Gray Rectangle fix"? Will it be enabled on Linux too?

    Peter

    Posted by: 5er_levart on May 31, 2005 at 04:06 AM




