


Chris Campbell

Chris Campbell's Blog

Behind the Graphics2D: The OpenGL-based Pipeline

Posted by campbell on November 10, 2004 at 01:28 AM | Comments (8)

[Update (2004/11/12): This blog entry has transitioned into a full-fledged java.net article. The content in the article is very similar to that below, but contains a few clarifications and slightly better formatting. Therefore, I suggest you visit the new article page instead.]


1) Introduction

Ever since the new OpenGL-based Java 2D pipeline became available in J2SE 5.0, developers have been asking the same question: "Which rendering operations are accelerated by OpenGL?" While I've tried my best to answer it clearly, I know that my answers never tell the whole story. There is no simple way to answer that question in a few sentences, or with a "matrix of supported operations", or anything like that. Even my colleagues will tell you that I usually resort to wild handwaving and whiteboard diagramming (verging on interpretive dance at times) when I try to explain this stuff in the office.

Therefore, I compiled this document to help answer the hot question and explain all the caveats that developers might encounter when they run their application with the OpenGL-based pipeline enabled. Even this one (long!) document is probably not sufficient. There are at least two more topics that I would like to cover in the near future: a performance comparison of the OpenGL-based pipeline, and a roadmap describing some of the features and performance improvements we would like to implement for it in the future.

[This document describes the current state of the OpenGL-based pipeline as of J2SE 5.0. Keep in mind that this story may change a bit in future releases as we find ways to accelerate more operations using OpenGL.]


2) General Comments

  • This document does not discuss how to enable the OpenGL-based pipeline for your application. For more information on that topic, see the release notes for the OpenGL-based pipeline, which also include detailed platform-specific information.
  • In this document, the term "OpenGL surface" refers to hardware surfaces, such as an AWT Frame (the screen), the Swing backbuffer, or a VolatileImage (which is backed by an OpenGL "pbuffer"). Rendering operations to an OpenGL surface are accelerated in hardware, as described below (e.g. fillRect(), drawString(), drawImage(), copyArea(), etc). In some cases, operations from an OpenGL surface are accelerated in hardware (e.g. copying a VolatileImage to the screen will result in a fast VRAM->VRAM operation).
  • The term "OpenGL texture" is used to differentiate OpenGL texture objects from the surfaces described above, since one cannot render to a texture as one would, say, a pbuffer. (This all gets a little muddied if you start talking about the "render-to-texture" extension, which is a bit complex and outside the scope of this document.) Operations from an OpenGL texture are accelerated in hardware (e.g. transforming a managed image to, say, a BufferStrategy backbuffer will result in a fast VRAM->VRAM operation).
  • Remember that VRAM (memory located on the graphics hardware) is a finite resource. Even if the OpenGL-based pipeline can be enabled on a given graphics device, there may not be enough VRAM available to hold all your images. As mentioned above, we attempt to back VolatileImages using OpenGL pbuffers, but pbuffers can be very resource-hungry objects since they often contain 8-bit stencil buffers, depth buffers, accumulation buffers, etc., in addition to the 24- or 32-bit color buffer that one would expect. We try to choose the least resource-demanding pbuffer format, but even so, some drivers return a pbuffer that requires 20 bytes (or more) per pixel! (Just one 1024x768 VolatileImage could require more than 15 MB of VRAM!) If we are unable to fit a VolatileImage in VRAM, we will always fall back and create the image in system memory so your application will still work properly, albeit more slowly than the ideal.
  • OpenGL textures only store color information, so they do not require as much VRAM as pbuffers. For example, a 1024x512 INT_ARGB managed image that is cached in an OpenGL texture will require only about 2 MB of VRAM. However, OpenGL requires that textures have power-of-two dimensions. This means that if your managed image does not have power-of-two dimensions, we will create an OpenGL texture with power-of-two dimensions large enough to hold your image. The downside of this approach is that it can waste VRAM. Consider a 129x257 managed image: we will cache the image in a 256x512 texture, which requires about four times as much VRAM as one would expect. (Graphics hardware manufacturers are beginning to support non-power-of-two sized textures in their latest products via the GL_ARB_texture_non_power_of_two extension, and our pipeline is already prepared to use it, so that we are not required to create power-of-two sized textures. Sadly, this extension is only supported on the very latest hardware, so the above caveats still apply for current hardware.) As with pbuffers, if we are unable to cache a managed image in an OpenGL texture (due to limited VRAM, or because the image dimensions exceed the maximum texture size allowed by OpenGL), your application will still work properly, but copying from that image will likely be slower than if it were cached in texture memory.
  • For most rendering operations described below, clipping is fully hardware accelerated by OpenGL. For rectangular clip regions, we use glScissor() which provides extremely fast rectangular clipping in hardware. For complex (shape) clip regions we use the OpenGL stencil buffer, which is also a very fast way to clip out non-rectangular regions.
  • To determine which operations are being accelerated by OpenGL in your application, you can enable tracing with the following system property:
    -Dsun.java2d.trace=log
    For more information on tracing (and other system properties), see the Java 2D system properties documentation.
  • The Java 2D API can be divided into three general categories of rendering operations: shapes, text, and images. Whether a particular rendering operation can be accelerated by OpenGL depends on the type of operation and the current Graphics2D state. (Read on for more than you ever wanted to know about the OpenGL-based pipeline...)
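The VRAM figures in the bullets above (a pbuffer at 20 bytes per pixel, a texture padded to power-of-two dimensions) are easy to check with a little arithmetic. The helper below is a hypothetical sketch, not part of Java 2D, and it assumes 4 bytes per texel for an INT_ARGB texture and a caller-supplied bytes-per-pixel cost for pbuffers.

```java
// Rough VRAM estimates for the examples in the General Comments.
// Assumptions (hypothetical helper, not part of Java 2D): a texture stores
// 4 bytes per texel (INT_ARGB) and is padded to power-of-two dimensions;
// the pbuffer bytes-per-pixel figure is supplied by the caller.
final class VramEstimate {
    // Smallest power of two >= n (for n >= 1).
    static int nextPowerOfTwo(int n) {
        return n <= 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    // Bytes for a texture holding a w x h image, after power-of-two padding.
    static long textureBytes(int w, int h) {
        return (long) nextPowerOfTwo(w) * nextPowerOfTwo(h) * 4L;
    }

    // Bytes for a pbuffer of w x h pixels at the given per-pixel cost.
    static long pbufferBytes(int w, int h, int bytesPerPixel) {
        return (long) w * h * bytesPerPixel;
    }

    public static void main(String[] args) {
        System.out.println(pbufferBytes(1024, 768, 20)); // 15728640 (~15 MB)
        System.out.println(textureBytes(1024, 512));     // 2097152 (2 MB)
        System.out.println(textureBytes(129, 257));      // 524288 (padded to 256x512)
    }
}
```

The padded 129x257 case needs 524288 bytes, while the unpadded pixels would need only 132612 bytes, which is where the "about four times" figure comes from.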


3) Shape Rendering

Operations in this category include drawLine(), fillRect(), draw(Shape), etc. The way that each operation is handled largely depends on whether antialiasing is enabled (via the RenderingHints.KEY_ANTIALIASING hint), in addition to the other relevant Graphics2D state.

3.1) Non-antialiased Rendering (ANTIALIAS_DEFAULT/OFF)
Some basic operations can be rendered directly by OpenGL simply by passing down the coordinates of the operation. Specifically, these basic operations include drawLine(), drawRect(), drawPolygon(), drawPolyline(), and fillRect(). More complex operations, such as drawArc() and fill(Shape) are converted to easily digestible spans, which are then rendered by OpenGL. The Graphics2D state determines how the operation is handled by OpenGL:

Paint

  • If the current Paint is a simple Color (either opaque or translucent), then we set the current OpenGL color state using the value from the Color object. Geometry that is rendered subsequently will be drawn with this solid color value, according to the current Composite state (see below).
  • If the current Paint is a GradientPaint, we can use OpenGL's texture coordinate generation mechanism to dynamically apply a GradientPaint to the geometry being rendered. The process used here is fairly complex and outside the scope of this document, but it is safe to say that the technique is very fast even on old graphics hardware. This GradientPaint technique works equally well for all AlphaComposite rules, but we have to punt to software loops in the case of XOR mode.
  • If the current Paint is a TexturePaint, the approach is very similar to that described for GradientPaint above. However, there are two caveats to be aware of. First, the BufferedImage used for the TexturePaint must be cached in texture memory (see "Image Rendering" section below). Second, the BufferedImage used for the TexturePaint must have power-of-two dimensions (unless the new GL_ARB_texture_non_power_of_two extension is available, as discussed in the "General Comments" section). The texture coordinate generation mechanism that we use will only tile the texture image properly if it has power-of-two dimensions. If either of these two restrictions is not met, we will just fall back on the existing software-based TexturePaint implementation.
  • For custom Paint implementations, we will simply fall back on our software pipelines to complete the operation.

Composite
All 12 Porter-Duff rules defined by the AlphaComposite class can be accelerated by OpenGL. Likewise, if XOR mode is set, then we will use OpenGL's XOR logic operation to accelerate XOR rendering. For custom Composite implementations, we will fall back on our software pipelines to complete the operation.

Stroke
For simple draw operations (such as drawLine()), the geometry can be sent directly to OpenGL only when there is a thin stroke (i.e. a default BasicStroke with width=1.0) installed on the Graphics2D object. If the stroke state is any more complex, then the shape will be sent to the software rasterizer and converted into spans, which will then be rendered by OpenGL as a list of simple quads. (The composite and paint operations will still be accelerated by OpenGL as described above when rendering the spans.)

Transform
If the current AffineTransform represents a simple translation (no scale, shear, or rotation), then the translation factors will be applied to the parameters of the operation and the operation will be performed by OpenGL. If the current AffineTransform is more complex, then the shape will be sent to the software rasterizer and converted into spans, which will then be rendered by OpenGL as a list of simple quads. (The composite and paint operations will still be accelerated by OpenGL as described above when rendering the spans.)
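The rules in Section 3.1 can be restated as a small classifier over public Graphics2D state. This is a hypothetical helper, not an API in Java 2D, and it only mirrors the rules above: it cannot see whether a TexturePaint image is actually cached in texture memory, and because XOR mode cannot be told apart from a custom Composite through the public API, it conservatively reports both as a software fallback (even though XOR with a solid Color is accelerated). To see what the real pipeline does, use the -Dsun.java2d.trace=log property described earlier.

```java
import java.awt.*;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

// Hypothetical helper: approximates which path Section 3.1 describes for a
// simple non-antialiased draw such as drawLine(). It does not query the
// real pipeline; it only restates the Paint/Composite/Stroke/Transform rules.
final class ShapePathSketch {
    enum Path { DIRECT_GEOMETRY, SPANS_RENDERED_BY_OPENGL, SOFTWARE_FALLBACK }

    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    static Path classifySimpleDraw(Graphics2D g) {
        // XOR mode and custom Composites are both non-AlphaComposite objects;
        // this sketch conservatively treats both as a software fallback.
        if (!(g.getComposite() instanceof AlphaComposite)) {
            return Path.SOFTWARE_FALLBACK;
        }
        Paint paint = g.getPaint();
        if (paint instanceof TexturePaint) {
            BufferedImage tile = ((TexturePaint) paint).getImage();
            // Assumes GL_ARB_texture_non_power_of_two is unavailable.
            if (!isPowerOfTwo(tile.getWidth()) || !isPowerOfTwo(tile.getHeight())) {
                return Path.SOFTWARE_FALLBACK;
            }
        } else if (!(paint instanceof Color) && !(paint instanceof GradientPaint)) {
            return Path.SOFTWARE_FALLBACK; // custom Paint
        }
        // Thin stroke: a default BasicStroke (width 1.0).
        boolean thinStroke = new BasicStroke().equals(g.getStroke());
        // Translation-only transform: no scale, shear, rotation, or flip bits.
        int type = g.getTransform().getType();
        boolean translateOnly = (type & ~AffineTransform.TYPE_TRANSLATION) == 0;
        return (thinStroke && translateOnly)
                ? Path.DIRECT_GEOMETRY
                : Path.SPANS_RENDERED_BY_OPENGL;
    }
}
```

For example, a fresh Graphics2D from a BufferedImage classifies as DIRECT_GEOMETRY, still does after g.translate(5, 5), and moves to SPANS_RENDERED_BY_OPENGL after g.rotate(0.5) or g.setStroke(new BasicStroke(3f)).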

3.2) Antialiased Rendering (ANTIALIAS_ON)
When antialiasing is enabled, shape rendering operations go through the software geometry rasterizer, which knows how to optimally apply the current transform, stroke, and clip state in order to produce something easily digestible by OpenGL. Specifically, the geometry is converted into a series of alpha mask tiles. (There are actually a ton of things going on here, but for the sake of simplicity I'll just talk about this process from the perspective of the OpenGL-based pipeline, which only knows how to take these alpha tiles and turn them into something visible on the screen.)

Even though the software rasterizer is heavily involved when antialiasing is enabled, I would still argue that the operation can be considered "accelerated", since OpenGL can be used to apply the mask to the current Paint and composite the result to the destination OpenGL surface.

Due to the way the operation is defined, OpenGL will only accelerate the alpha mask operation if either of the following conditions holds:

  • the current Paint is a Color object (either opaque or translucent) AND
    the current Composite is of type AlphaComposite.SRC_OVER
  • the current Paint is an opaque Color object AND
    the current Composite is of type AlphaComposite.SRC AND
    the "extra alpha" value of the AlphaComposite is 1.0

If the above restrictions are not met (e.g. a GradientPaint is installed), we will use a slower path, but rest assured that we will use OpenGL whenever possible to render the antialiased shape to the destination surface.
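The two conditions listed above can be written as a predicate over the Graphics2D state. The class and method names are hypothetical; this only restates the rules from this section and does not query the pipeline.

```java
import java.awt.AlphaComposite;
import java.awt.Color;
import java.awt.Graphics2D;

// Hypothetical helper: restates the two conditions from Section 3.2 under
// which the OpenGL pipeline accelerates the antialiased alpha-mask operation.
final class MaskFastPathSketch {
    static boolean isMaskFastPath(Graphics2D g) {
        if (!(g.getPaint() instanceof Color) || !(g.getComposite() instanceof AlphaComposite)) {
            return false;
        }
        Color color = (Color) g.getPaint();
        AlphaComposite ac = (AlphaComposite) g.getComposite();
        // Condition 1: any Color (opaque or translucent) with SRC_OVER.
        if (ac.getRule() == AlphaComposite.SRC_OVER) {
            return true;
        }
        // Condition 2: an opaque Color with SRC and an extra alpha of 1.0.
        return ac.getRule() == AlphaComposite.SRC
                && color.getAlpha() == 255
                && ac.getAlpha() == 1.0f;
    }
}
```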


4) Text Rendering

Operations in this category include drawString(), drawGlyphVector(), etc.

Rendering of text, both antialiased and non-antialiased, is accelerated by the OpenGL-based pipeline. We maintain an OpenGL texture that acts as a hardware glyph cache, so commonly used glyphs can simply be texture mapped to the destination surface, taking advantage of the hardware accelerated compositing offered by OpenGL. The heuristics used by the OpenGL glyph cache are subject to change, but in J2SE 5.0, we attempt to cache a glyph if its width and height are each less than or equal to 16 pixels. If the glyph cannot fit in the OpenGL glyph cache (which can hold approximately 1024 16x16 glyphs), we render each glyph individually using a process very similar to that described in Section 3.2 (including the same restrictions on the current Paint and Composite).
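A toy model can make the 16-pixel limit and the roughly 1024-glyph capacity concrete. Everything beyond those two numbers is an assumption for illustration only: the 512x512 texture (chosen because it holds exactly 1024 16x16 cells), the row-major cell assignment, and the absence of any eviction are not a description of the J2SE 5.0 implementation, and the class is hypothetical.

```java
// Hypothetical model of the heuristic described above: glyphs up to 16x16
// pixels get a cell in a fixed-size cache texture. Texture size, cell layout,
// and "no eviction" are assumptions of this sketch.
final class GlyphCacheSketch {
    static final int CELL = 16;
    private final int columns;
    private final int capacity;
    private int used = 0;

    GlyphCacheSketch(int textureWidth, int textureHeight) {
        this.columns = textureWidth / CELL;
        this.capacity = columns * (textureHeight / CELL);
    }

    int capacity() {
        return capacity;
    }

    // Returns the {x, y} texel offset of the cell assigned to the glyph, or
    // null if the glyph is too large or the cache is full. A null result
    // corresponds to rendering the glyph individually (the Section 3.2 path).
    int[] tryCache(int glyphWidth, int glyphHeight) {
        if (glyphWidth > CELL || glyphHeight > CELL || used >= capacity) {
            return null;
        }
        int cell = used++;
        return new int[] { (cell % columns) * CELL, (cell / columns) * CELL };
    }
}
```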


5) Image Rendering

Operations in this category include all the drawImage() variants. If you are unfamiliar with the concepts of VolatileImages and "managed images", I highly suggest you read through Chet's blogs on those subjects.

Imaging operations are usually accelerated in hardware by OpenGL, even if one of the 12 AlphaComposite rules is installed on the Graphics2D. Generally speaking, the OpenGL-based pipeline will accelerate the following operations:

  • simple copies (e.g. drawImage(img, x, y, null))
  • simple scales (e.g. drawImage(img, x, y, w, h, null))
  • arbitrary transforms (e.g. drawImage(img, xform, null))
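The three kinds of drawImage() calls listed above, plus the ten-argument sub-image variant (which can also flip), look like this in code. The sketch draws into a BufferedImage so that it runs without the OpenGL pipeline; on an OpenGL surface, these same calls are what the pipeline handles. The class name and the output layout are hypothetical, and the layout assumes a square source image.

```java
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

// Illustrates the drawImage() variants discussed in Section 5 by drawing a
// square source image several ways into one BufferedImage.
final class DrawImageVariants {
    static BufferedImage render(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        BufferedImage dst = new BufferedImage(4 * w, 2 * h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.drawImage(img, 0, 0, null);               // simple copy
            g.drawImage(img, w, 0, 2 * w, 2 * h, null); // simple scale (2x)
            // Arbitrary transform: rotate 90 degrees about the image center,
            // then translate into the fourth column.
            AffineTransform xform = AffineTransform.getTranslateInstance(3 * w, 0);
            xform.quadrantRotate(1, w / 2.0, h / 2.0);
            g.drawImage(img, xform, null);
            // Sub-image copy with a horizontal flip: source x runs right-to-left.
            g.drawImage(img, 0, h, w, 2 * h, w, 0, 0, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```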

Exactly how the image data is rendered to an OpenGL surface depends on the types of images involved. Each type of imaging operation is described below.

5.1) System Memory Surface --> OpenGL Surface
System memory surfaces (e.g. a BufferedImage that has not yet been cached in an OpenGL texture) of the following types can be rendered directly by OpenGL:

  • IntArgb
  • IntArgbPre
  • IntRgb
  • IntRgbx
  • IntBgr
  • IntBgrx
  • Ushort565Rgb
  • Ushort555Rgb
  • Ushort555Rgbx
  • ByteGray
  • UshortGray
If an image is not of one of the above types, we can still use OpenGL to render the image, but we will first convert the image into an intermediate type that OpenGL can handle, such as IntArgbPre.
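At the Java level, the rule above amounts to "use the image as-is if its type is directly supported, otherwise copy it into an IntArgbPre image first". The sketch below is hypothetical: the mapping from the internal type names to BufferedImage.TYPE_* constants is this sketch's own, IntRgbx, IntBgrx and Ushort555Rgbx are left out because they have no public TYPE_* constant, and the real conversion happens inside the pipeline rather than through drawImage().

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch: images of a directly supported type are used as-is; anything else
// is first copied into an intermediate TYPE_INT_ARGB_PRE image.
final class SysmemTypeSketch {
    static final Set<Integer> DIRECT_TYPES = new HashSet<>(Arrays.asList(
            BufferedImage.TYPE_INT_ARGB,       // IntArgb
            BufferedImage.TYPE_INT_ARGB_PRE,   // IntArgbPre
            BufferedImage.TYPE_INT_RGB,        // IntRgb
            BufferedImage.TYPE_INT_BGR,        // IntBgr
            BufferedImage.TYPE_USHORT_565_RGB, // Ushort565Rgb
            BufferedImage.TYPE_USHORT_555_RGB, // Ushort555Rgb
            BufferedImage.TYPE_BYTE_GRAY,      // ByteGray
            BufferedImage.TYPE_USHORT_GRAY));  // UshortGray

    static BufferedImage toDirectlyUsable(BufferedImage src) {
        if (DIRECT_TYPES.contains(src.getType())) {
            return src;
        }
        BufferedImage tmp = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_ARGB_PRE);
        Graphics2D g = tmp.createGraphics();
        try {
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return tmp;
    }
}
```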

The glDrawPixels() operation can handle simple copies and simple scales (in conjunction with glPixelZoom()), so these operations should be relatively performant. However, glDrawPixels() is known to be somewhat slow, especially on graphics hardware in the x86 world, so this is not the optimal path.

There is no direct way in OpenGL to transform system memory surfaces (barring the "pixel transform" extension, which is either not available or not performant on most graphics hardware). Therefore, the OpenGL-based pipeline uses a special tiled approach, with an intermediate OpenGL texture object, to transform the system memory surface:

sysmem --> texture --> OpenGL surface


This approach is reasonably fast since the intermediate texture operations are handled in hardware, but note that it is currently defined only for NEAREST_NEIGHBOR interpolation. (We have an RFE open that would make this work for BILINEAR as well, but for now BILINEAR and BICUBIC hints are handled by our software transform loops in this case.)
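The geometry behind a tiled transform can be sketched as follows: split the source into tiles, and draw each tile with the overall transform pre-offset by the tile's origin, so the tiles land exactly where the untiled image would. The tile size is a parameter because the article does not state the real one; the class and methods are hypothetical, and the texture upload and quad drawing themselves are omitted.

```java
import java.awt.Rectangle;
import java.awt.geom.AffineTransform;
import java.util.ArrayList;
import java.util.List;

// Geometry of a tiled transform. Each tile would be uploaded into the
// intermediate texture and drawn as a quad using tileTransform(xform, tile).
final class TiledTransformSketch {
    static List<Rectangle> tiles(int width, int height, int tileSize) {
        List<Rectangle> result = new ArrayList<>();
        for (int y = 0; y < height; y += tileSize) {
            for (int x = 0; x < width; x += tileSize) {
                result.add(new Rectangle(x, y,
                        Math.min(tileSize, width - x), Math.min(tileSize, height - y)));
            }
        }
        return result;
    }

    // Maps tile-local coordinates (0,0 at the tile's corner) through the
    // overall image transform: first offset by the tile origin, then xform.
    static AffineTransform tileTransform(AffineTransform xform, Rectangle tile) {
        AffineTransform t = new AffineTransform(xform);
        t.translate(tile.x, tile.y);
        return t;
    }
}
```

For a 600x300 image and 256-pixel tiles, tiles() returns six rectangles, the last one being 88x44 at (512, 256).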

5.2) Managed Image (OpenGL Texture) --> OpenGL Surface
Managed images of all types can be cached in an OpenGL texture (there are direct loops defined for the types mentioned in Section 5.1, but generally speaking we can cache any image type by first going through an intermediate surface). Once an image has been cached in an OpenGL texture object, that image can be rendered to an OpenGL surface by mapping the texture to an OpenGL quad. The texture-mapped quad will respect the current AffineTransform state, and will therefore be transformed.

For example, if there is a rotation transform set on the Graphics2D object, the texture will be rotated by the graphics hardware. Likewise, the variants of drawImage() that take scaling parameters will scale the texture mapped quad before rendering to the destination OpenGL surface. Transforming a managed image with either NEAREST_NEIGHBOR or BILINEAR interpolation RenderingHints will be accelerated by OpenGL in hardware. Unfortunately, OpenGL does not support BICUBIC interpolation for textures, so we fall back on our software transform loops for the BICUBIC case.
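Selecting the interpolation mode is done with the standard KEY_INTERPOLATION rendering hint. The sketch below rotates an image a half turn about its center with the caller's choice of hint; per this section, for a managed image drawn to an OpenGL surface, NEAREST_NEIGHBOR and BILINEAR stay in hardware while BICUBIC falls back to software. The class name is hypothetical, and the sketch targets a BufferedImage so it runs without the OpenGL pipeline.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

// Draws src rotated 180 degrees about its center using the given
// RenderingHints.VALUE_INTERPOLATION_* value.
final class InterpolationSketch {
    static BufferedImage rotateHalfTurn(BufferedImage src, Object interpolation) {
        int w = src.getWidth(), h = src.getHeight();
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, interpolation);
            g.drawImage(src, AffineTransform.getQuadrantRotateInstance(2, w / 2.0, h / 2.0), null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```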

5.3) VolatileImage (OpenGL Pbuffer) --> OpenGL Surface
Simple copies and scaled copies from a pbuffer-backed VolatileImage to an OpenGL surface will be accelerated by the VRAM->VRAM glCopyPixels() operation, and should be relatively performant. There is no direct way in OpenGL to transform pbuffers (barring the render-to-texture approach, which is not discussed here). Therefore, copying a pbuffer-backed VolatileImage with an arbitrary transform will use a tiled approach similar to that described in Section 5.1:

pbuffer --> texture --> OpenGL surface


This approach is reasonably fast since the intermediate texture operations are handled in hardware, but note that it is currently defined only for NEAREST_NEIGHBOR interpolation, as mentioned in Section 5.1.
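On the application side, a VolatileImage (which the pipeline tries to back with a pbuffer) is used through the usual validate()/contentsLost() loop. The sketch below follows that documented pattern; the class name and the blue fill are placeholders. It needs a real GraphicsConfiguration (for example from a visible Component), so it cannot run in headless mode.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.GraphicsConfiguration;
import java.awt.image.VolatileImage;

// Standard VolatileImage rendering loop. With the OpenGL pipeline enabled the
// image may be pbuffer-backed; if VRAM is short it lives in system memory.
final class VolatileSketch {
    static VolatileImage renderOffscreen(GraphicsConfiguration gc, VolatileImage vimg,
                                         int w, int h) {
        do {
            // Recreate the image if it is missing or no longer matches gc.
            if (vimg == null || vimg.validate(gc) == VolatileImage.IMAGE_INCOMPATIBLE) {
                vimg = gc.createCompatibleVolatileImage(w, h);
            }
            Graphics2D g = vimg.createGraphics();
            try {
                g.setColor(Color.BLUE);
                g.fillRect(0, 0, w, h);
            } finally {
                g.dispose();
            }
            // Contents can be lost (e.g. when VRAM is reclaimed); if so, redraw.
        } while (vimg.contentsLost());
        return vimg;
    }
}
```

Copying the result with g.drawImage(vimg, x, y, null) is the simple copy discussed above, while drawing it with a rotation installed goes through the pbuffer --> texture path.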


6) Miscellaneous

  • The Graphics2D.copyArea() operation is accelerated for OpenGL surfaces, using the very fast VRAM->VRAM glCopyPixels() operation.
  • The BufferStrategy.show() method (for "flip" strategies) will result in a native SwapBuffers() operation, which causes the contents of the hardware OpenGL backbuffer to be "flipped" to the front buffer (i.e. the screen). Depending on the platform and graphics drivers, this operation may be synchronized with the vertical refresh of the monitor.
  • The Image.flush() method will delete any OpenGL textures in use by a managed image, or any OpenGL pbuffer in use by a VolatileImage.
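Both copyArea() and flush() are ordinary public API calls. The sketch below exercises them on a BufferedImage so it runs anywhere; per the bullets above, on an OpenGL surface copyArea() becomes a VRAM->VRAM glCopyPixels(), and flush() releases any texture or pbuffer behind the image. The class and method names are hypothetical.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Demonstrates copyArea() and flush() on a small BufferedImage.
final class MiscSketch {
    static BufferedImage scrollDemo() {
        BufferedImage img = new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB);
        img.setRGB(0, 0, 0xFFFF0000);
        Graphics2D g = img.createGraphics();
        try {
            // Copy the 1x1 area at (0,0) by dx=2, dy=2; the source is kept.
            g.copyArea(0, 0, 1, 1, 2, 2);
        } finally {
            g.dispose();
        }
        // Release any cached accelerated copies; the pixel data itself remains.
        img.flush();
        return img;
    }
}
```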


7) Conclusion

I hope this article answers most of the questions developers have been asking for the past few months. If you see any glaring omissions, something you would like clarified, or topics for a future "Behind the Graphics2D" article, please post a comment. I'll try to incorporate your suggestions into this document so that it can be the "definitive source" for this topic (if that's possible).



In my ears: Wire, "154"
In my eyes: Kobo Abe, "The Woman in the Dunes"


Comments

  • Just for giggles, I figured I'd try running netbeans with the openGL based pipeline.

    The splash screen came up and then the process segfaulted in libGL.so while making a call from sun.java2d.DefaultDisposerRecord.invokeNativeDispose(JJ)

    Posted by: alanstange on November 11, 2004 at 06:48 AM

  • Re: crash in NetBeans... Are you on Linux with an Nvidia graphics card? If so, I would recommend that you upgrade to the recently released 6629 drivers, which are more stable and performant when using the OpenGL-based Java 2D pipeline.

    Posted by: campbell on November 11, 2004 at 11:14 AM

  • First off, many thanks -- this is a very useful document.

    A question: since you specifically note several drawImage calls that are accelerated, are the others not? For example, one of the drawImage versions allows for subimage blits as well as flipping and scaling:
    drawImage(img, dx1, dy1, dx2, dy2, sx1, sy1, sx2, sy2, null)

    Also, for future consideration:
    It would be nice if there was a way to enable/disable acceleration after AWT was started, primarily if we wanted to prompt users to enable/disable it, or if it could be made available to applets. But maybe these would require hefty rewrites.
    Is there an equivalent to AlphaComposite that could be used to independently scale all three/four channels, not just alpha?


    Too bad it isn't yet available for Mac OS X.

    Posted by: miles on November 11, 2004 at 12:56 PM


  • miles:


    All drawImage() calls are accelerated. I can see that the way I wrote it wasn't entirely clear about that. I'll try to fix that...


    I would rather enable the OpenGL-based pipeline by default on the platforms where it makes sense, rather than leaving the decision up to users, since our implementation has a better idea of the impact (good or bad) of enabling it. That way applets and applications would automatically benefit on conformant systems.


    Re: "equivalent to AlphaComposite"... AlphaComposite does a lot more than just scaling the alpha channel. It performs some calculations that take into account both the source and destination color/alpha components. I think the operation you're looking for is a rescale, which is available in the java.awt.image.RescaleOp class. In my upcoming roadmap document, I'll mention that we could accelerate some BufferedImageOps (like RescaleOp) using OpenGL in the future. More to come.


    Thanks for your feedback.

    Chris

    Posted by: campbell on November 11, 2004 at 01:26 PM

  • Hi Chris,

    I know it's been a long time since you made this blog, but I've got an important scenario that I'd like you to comment on: applications switching between OpenGL and the default rendering, potentially at runtime. This is a feature which quite a lot of games allow and it would be great if Java supported it as well.

    I think the relevant questions are:

    1. Is it possible to set the value of 'sun.java2d.opengl' at runtime, even as the first statement in an application, or does it HAVE to be on the command line?

    2. Is there a way to find out which pipeline is in use? (Other than checking the value of 'sun.java2d.opengl')

    3. Is it possible - at all - to change the pipeline in use at runtime?

    4. If yes, then what are the constraints? I would assume all windows have to be disposed before the change can be made?

    5. If no, are you going to be able to make it possible to do this some time in the future?

    Thanks very much,

    Graham.

    Posted by: grlea on April 10, 2005 at 09:54 PM

  • Hi Graham,

    1. Yes, you can call System.setProperty("sun.java2d.opengl", "true") at runtime if you are careful to call it before any other 2D/AWT initialization steps in your main() method. (Not recommended, see below.)

    2. There is no standard way, besides hacks like looking at the toString() of the default GraphicsConfiguration (if it starts with GLX or WGL, that's a good indication that the OGL pipeline was enabled successfully). (Not recommended, see below.)

    3, 4. No.

    5. It's not likely that we will ever make this change. In fact, I would highly recommend that you avoid my hacks mentioned in points 1 and 2. The decision to use the OGL pipeline versus the existing (default) pipelines is not something that developers should be making. We currently have no code in place to detect buggy or slow drivers, so if you specify sun.java2d.opengl=true on such a system, we make no attempt to avoid using OGL. In Mustang and Dolphin, we will be investigating which platforms (and driver/hardware combos) make sense for enabling the OGL pipeline by default. This will be the ideal solution. Our job in Java 2D is to provide developers and end users the fastest (and most correct) 2D graphics, depending on the capabilities of the user's machine. Developers should not have to worry about which pipeline would work best for their application.

    Hope this helps.

    Chris

    Posted by: campbell on April 11, 2005 at 12:51 PM

  • Thanks for that, Chris.
    Those answers are pretty much what I expected.
    I would have been very surprised (not to mention impressed) if you told me I could change pipelines after starting 2D work. : )

    I should point out that my intention isn't to try and force the application to always use OpenGL, but to empower users to turn on OpenGL.
    Your average user isn't going to want to, or know how to, set a system property of a java process.
    I was intending to implement a feature where the application gives the user an option to "Use OpenGL".
    The option would:
    - be adjacent to an explanation of how OpenGL may improve performance on some systems but may produce incorrect results or errors on others, and that users should try it but turn it off if they see degraded performance or ... (what will they see, by the way?);
    - be stored in the application's preferences;
    - require the user to restart the application.

    Then, as part of the initialisation, I'd have some code like this:
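    import java.util.prefs.Preferences;
    // Read the saved choice; default to the standard pipeline.
    Preferences preferences = Preferences.userRoot().node("myApplication");
    boolean useOpenGl = preferences.getBoolean("openGL", false);
    if (useOpenGl) {
        // Must run before any AWT or Java 2D initialization.
        System.setProperty("sun.java2d.opengl", "true");
    }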


    I would probably also create a facility for changing the OpenGL flag outside of the application, in case their OpenGL problems were so bad the user couldn't get to the settings.

    So, in essence, I just want to make it easier for the user to set the flag.
    Does this seem like a reasonable solution?
    Is it in line with how you intend the OpenGL pipeline to be used?

    Thanks,
    Graham.

    Posted by: grlea on April 11, 2005 at 03:35 PM

  • Hi Graham,

    I see now what you are proposing. It seems like a reasonable solution, if your users understand that there may be some risk of crashes or rendering artifacts (e.g. if they do not have the latest graphics drivers installed, or if those drivers are buggy). That situation will improve over time. Since you have given your users an out, your approach should be a good stopgap until a future release, when they will get better performance out-of-the-box (from either the D3D or OGL-based pipeline).

    Thanks,

    Chris

    Posted by: campbell on April 11, 2005 at 03:50 PM
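Following up on the RescaleOp suggestion in the reply to miles above: java.awt.image.RescaleOp can scale all four channels independently when given one factor per band. A minimal sketch, assuming a non-premultiplied TYPE_INT_ARGB source, whose bands are ordered R, G, B, A; the class name is hypothetical, and this runs through the regular imaging code rather than the OpenGL pipeline.

```java
import java.awt.image.BufferedImage;
import java.awt.image.RescaleOp;

// Scales each channel of an INT_ARGB image independently with RescaleOp.
// For this image type the factors apply in R, G, B, A band order.
final class RescaleSketch {
    static BufferedImage scaleChannels(BufferedImage src, float r, float g, float b, float a) {
        RescaleOp op = new RescaleOp(new float[] { r, g, b, a }, new float[4], null);
        return op.filter(src, null);
    }
}
```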




