Behind the Graphics2D: The OpenGL-based Pipeline

Posted by campbell on November 10, 2004 at 1:28 AM PST

[Update (2004/11/12): This blog entry has transitioned into a full-fledged java.net article. The content in the article is very similar to that below, but contains a few clarifications and slightly better formatting. Therefore, I suggest you visit the new
article page instead.]


1) Introduction

Ever since the new OpenGL-based Java 2D pipeline became available in
J2SE 5.0, developers have been asking the same question: "Which rendering
operations are accelerated by OpenGL?"... While I've tried my best to answer
these questions clearly, I know that my answers never tell the whole story.
There is just no simple way to answer that question with just a few
sentences or a "matrix of supported operations" or anything like that. Even
my colleagues will tell you that I usually resort to wild handwaving and
whiteboard diagramming (that verges on interpretive dance at times) when
I try to explain this stuff in the office.

Therefore, I compiled this document to help answer the hot question and
explain all the caveats that developers might encounter when they run their
application with the OpenGL-based pipeline enabled. Even this one (long!)
document is probably not sufficient. There are at least two more topics that
I would like to cover in the near future: a performance comparison of the
OpenGL-based pipeline, and a roadmap describing some of the features and
performance improvements we would like to implement for it in the future.

[This document describes the current state of the OpenGL-based pipeline as of
J2SE 5.0. Keep in mind that this story may change a bit in future releases
as we find ways to accelerate more operations using OpenGL.]


2) General Comments

  • This document does not discuss how to enable the OpenGL-based pipeline
    for your application. For more information on that topic, as well as
    detailed platform-specific release notes, click here.

  • In this document, the term "OpenGL surface" refers to hardware surfaces,
    such as an AWT Frame (the screen), the Swing backbuffer, or a VolatileImage
    (which is backed by an OpenGL "pbuffer"). Rendering operations to an
    OpenGL surface are accelerated in hardware, as described below (e.g.
    fillRect(), drawString(), drawImage(), copyArea(), etc). In some cases,
    operations from an OpenGL surface are accelerated in hardware (e.g.
    copying a VolatileImage to the screen will result in a fast VRAM->VRAM
    operation).
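
    For example, creating a pbuffer-backed VolatileImage requires nothing
    OpenGL-specific in application code; a minimal sketch (assuming the
    default screen device):


        import java.awt.*;
        import java.awt.image.VolatileImage;

        GraphicsConfiguration gc = GraphicsEnvironment
            .getLocalGraphicsEnvironment()
            .getDefaultScreenDevice()
            .getDefaultConfiguration();

        // With the OpenGL-based pipeline enabled, this image will be
        // backed by an OpenGL pbuffer (when resources permit).
        VolatileImage vi = gc.createCompatibleVolatileImage(640, 480);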

  • The term "OpenGL texture" is used to differentiate OpenGL texture objects
    from the surfaces described above, since one cannot render to a
    texture as one would, say, a pbuffer. (This all gets a little muddied if
    you start talking about the "render-to-texture" extension, which is a bit
    complex and outside the scope of this document.) Operations from an
    OpenGL texture are accelerated in hardware (e.g. transforming a managed image
    to, say, a BufferStrategy backbuffer will result in a fast VRAM->VRAM
    operation).

  • Remember that VRAM (memory located on the graphics hardware) is a
    finite resource. Even if the OpenGL-based pipeline can be enabled on a given
    graphics device, there may not be enough VRAM available to hold all your
    images. As mentioned above, we attempt to back VolatileImages using OpenGL
    pbuffers, but pbuffers can be very resource hungry objects since they often
    contain 8-bit stencil buffers, depth buffers, accumulation buffers, etc, in
    addition to the 24- or 32-bit color buffer that one would expect. We try
    to choose the least resource-demanding pbuffer format, but even so, some
    drivers return a pbuffer that requires 20 bytes (or more) per pixel! (Just
    one 1024x768 VolatileImage could require more than 15 MB of VRAM!) If we
    are unable to fit a VolatileImage in VRAM, we will always fall back and
    create the image in system memory so your application will still work
    properly, albeit more slowly than the ideal.
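
    Whether the image lands in VRAM or in system memory, your rendering
    code looks the same; here is a minimal sketch of the standard
    validate/render loop (vi and gc are the image and configuration from
    the sketch above):


        do {
            // Revalidate; recreate the image if it is no longer
            // compatible with the current configuration.
            if (vi.validate(gc) == VolatileImage.IMAGE_INCOMPATIBLE) {
                vi = gc.createCompatibleVolatileImage(640, 480);
            }
            Graphics2D g = vi.createGraphics();
            try {
                g.setColor(Color.blue);
                g.fillRect(0, 0, vi.getWidth(), vi.getHeight());
            } finally {
                g.dispose();
            }
        } while (vi.contentsLost());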

  • OpenGL textures only store color information, so they do not require as
    much VRAM as pbuffers. For example, a 1024x512 INT_ARGB managed image that is
    cached in an OpenGL texture will require only about 2 MB of VRAM. However,
    OpenGL requires that textures have power-of-two dimensions. This means that
    if your managed image does not have power-of-two dimensions, we will
    create an OpenGL texture with power-of-two dimensions so that your image
    will fit. The downside of this approach is that it is potentially wasteful
    of VRAM. Consider a 129x257 managed image: we will cache the image in a
    256x512 texture, which requires about four times as much VRAM as one would expect.
    (Graphics hardware manufacturers are beginning to support non-power-of-two
    sized textures in their latest products, and our pipeline is already prepared
    for this extension, so that we are not required to create power-of-two sized
    textures. Sadly, this extension is only supported on the very latest
    hardware, so the above caveats still apply for current hardware.)
    As with pbuffers, if we are unable to cache a managed image in an OpenGL
    texture (due to limited VRAM, or because the image dimensions exceed the
    maximum texture size allowed by OpenGL), your application will still work
    properly, but copying from that image will likely be slower than if it
    were cached in texture memory.
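
    For the curious, here is the padding arithmetic from the 129x257
    example above, written out as a small sketch (illustration only, not
    the pipeline's actual code; 4 bytes per pixel assumed):


        static int nextPowerOfTwo(int n) {
            int highest = Integer.highestOneBit(n);
            return (highest == n) ? n : highest << 1;
        }

        int texW = nextPowerOfTwo(129);      // 256
        int texH = nextPowerOfTwo(257);      // 512
        int imageBytes   = 129 * 257 * 4;    // ~132 KB of useful data
        int textureBytes = texW * texH * 4;  // ~512 KB actually allocated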

  • For most rendering operations described below, clipping is fully hardware
    accelerated by OpenGL. For rectangular clip regions, we use glScissor()
    which provides extremely fast rectangular clipping in hardware. For complex
    (shape) clip regions we use the OpenGL stencil buffer, which is also a very
    fast way to clip out non-rectangular regions.

  • To determine which operations are being accelerated by OpenGL in your
    application, you can enable tracing with the following system property:


        -Dsun.java2d.trace=log


    For more information on tracing (and other system properties),
    click here.
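
    For example, to run your application with both the OpenGL-based
    pipeline and tracing enabled (MyApp is just a placeholder for your
    application's main class):


        java -Dsun.java2d.opengl=true -Dsun.java2d.trace=log MyApp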

  • The Java 2D API can be divided into three general categories of rendering
    operations: shapes, text, and images. Whether a particular rendering
    operation can be accelerated by OpenGL depends on the type of operation and
    the current Graphics2D state. (Read on for more than you ever wanted to
    know about the OpenGL-based pipeline...)


3) Shape Rendering

Operations in this category include drawLine(), fillRect(), draw(Shape), etc.
The way that each operation is handled largely depends on whether the
ANTIALIASING RenderingHint is turned on, in addition to the other relevant
Graphics2D state.


3.1) Non-antialiased Rendering (ANTIALIAS_DEFAULT/OFF)



Some basic operations can be rendered directly by OpenGL simply by passing
down the coordinates of the operation. Specifically, these basic operations
include drawLine(), drawRect(), drawPolygon(), drawPolyline(), and fillRect().
More complex operations, such as drawArc() and fill(Shape), are converted to
easily digestible spans, which are then rendered by OpenGL. The Graphics2D
state determines how the operation is handled by OpenGL:

Paint

  • If the current Paint is a simple Color (either opaque or translucent), then
    we set the current OpenGL color state using the value from the Color object.
    Geometry that is rendered subsequently will be drawn with this solid color
    value, according to the current Composite state (see below).

  • If the current Paint is a GradientPaint, we can use OpenGL's texture
    coordinate generation mechanism to dynamically apply a GradientPaint to
    the geometry being rendered. The process used here is fairly complex and
    outside the scope of this document, but it is safe to say that the technique
    is very fast even on old graphics hardware. This GradientPaint technique
    works equally well for all AlphaComposite rules, but we have to punt to
    software loops in the case of XOR mode.

  • If the current Paint is a TexturePaint, the approach is very similar to that
    described for GradientPaint above. However, there are two caveats to be
    aware of. First, the BufferedImage used for the TexturePaint must be
    cached in texture memory (see "Image Rendering" section below). Second, the
    BufferedImage used for the TexturePaint must have power-of-two dimensions
    (unless the new GL_ARB_texture_non_power_of_two extension is available, as
    discussed in the "General Comments" section). The texture coordinate
    generation mechanism that we use will only tile the texture image properly
    if it has power-of-two dimensions. If either of these two restrictions is
    not met, we will just fall back on the existing software-based TexturePaint
    implementation (a short sketch of the fast paths follows this list).

  • For custom Paint implementations, we will simply fall back on our software
    pipelines to complete the operation.
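
To make the fast paths above concrete, here is a minimal sketch (g2d is
assumed to be the Graphics2D of an OpenGL surface; the 64x64 texture
dimensions are chosen to satisfy the power-of-two restriction; classes
come from java.awt, java.awt.geom, and java.awt.image):


    // Simple Color: sets the OpenGL color state directly.
    g2d.setPaint(new Color(0, 128, 255, 128));
    g2d.fillRect(10, 10, 100, 100);

    // GradientPaint: applied via OpenGL texture coordinate generation.
    g2d.setPaint(new GradientPaint(0, 0, Color.red, 100, 100, Color.yellow));
    g2d.fillRect(120, 10, 100, 100);

    // TexturePaint: stays on the fast path only if the image can be
    // cached in texture memory and has power-of-two dimensions.
    BufferedImage tile = new BufferedImage(64, 64, BufferedImage.TYPE_INT_RGB);
    g2d.setPaint(new TexturePaint(tile, new Rectangle2D.Float(0, 0, 64, 64)));
    g2d.fillRect(230, 10, 100, 100);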

Composite


All 12 Porter-Duff rules defined by the AlphaComposite class can be
accelerated by OpenGL. Likewise, if XOR mode is set, then we will use
OpenGL's XOR logic operation to accelerate XOR rendering. For custom
Composite implementations, we will fall back on our software pipelines to
complete the operation.

Stroke


For simple draw operations (such as drawLine()), the geometry can be sent
directly to OpenGL only when there is a thin stroke (i.e. a default
BasicStroke with width=1.0) installed on the Graphics2D object. If the
stroke state is any more complex, then the shape will be sent to the software
rasterizer and converted into spans, which will then be rendered by OpenGL
as a list of simple quads. (The composite and paint operations will still
be accelerated by OpenGL as described above when rendering the spans.)
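
A quick illustration of the distinction (again, g2d is assumed to target
an OpenGL surface):


    // Thin stroke: the line geometry is passed directly to OpenGL.
    g2d.setStroke(new BasicStroke(1.0f));
    g2d.drawLine(0, 0, 100, 100);

    // Wide stroke: the software rasterizer produces spans, which are
    // then rendered by OpenGL as simple quads.
    g2d.setStroke(new BasicStroke(5.0f));
    g2d.drawLine(0, 0, 100, 100);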

Transform


If the current AffineTransform represents a simple translation (no scale,
shear, or rotation), then the translation factors will be applied to the
parameters of the operation and the operation will be performed by OpenGL.
If the current AffineTransform is more complex, then the shape will be sent
to the software rasterizer and converted into spans, which will then
be rendered by OpenGL as a list of simple quads. (The composite and paint
operations will still be accelerated by OpenGL as described above
when rendering the spans.)
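
For example:


    // Simple translation: folded into the coordinates and rendered
    // directly by OpenGL.
    g2d.translate(50, 50);
    g2d.fillRect(0, 0, 100, 100);

    // Rotation (or scale/shear): the shape goes through the software
    // rasterizer, but the resulting spans are still rendered by OpenGL.
    g2d.rotate(Math.PI / 4);
    g2d.fillRect(0, 0, 100, 100);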


3.2) Antialiased Rendering (ANTIALIAS_ON)



When antialiasing is enabled, shape rendering operations go through the
software geometry rasterizer, which knows how to optimally apply the current
transform, stroke, and clip state in order to produce something easily
digestible by OpenGL. Specifically, the geometry is converted into a series
of alpha mask tiles. (There is actually a ton of things going on here, but
for the sake of simplicity I'll just talk about this process from the
perspective of the OpenGL-based pipeline, which only knows how to take these
alpha tiles and turn them into something visible on the screen.)

Even though the software rasterizer is heavily involved when antialiasing
is enabled, I would still argue that the operation can be considered
"accelerated", since OpenGL can be used to apply the mask to the current
Paint and composite the result to the destination OpenGL surface.

Due to the way the operation is defined, OpenGL will only accelerate
the alpha mask operation if:

  • the current Paint is a Color object (either opaque or translucent) AND
    the current Composite is of type AlphaComposite.SRC_OVER

  • the current Paint is an opaque Color object AND
    the current Composite is of type AlphaComposite.SRC AND
    the "extra alpha" value of the AlphaComposite is 1.0

If the above restrictions are not met (e.g. a GradientPaint is installed), we
will use a slower path, but rest assured that we will use OpenGL whenever
possible to render the antialiased shape to the destination surface.
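
In code, the fully accelerated antialiasing case looks something like this
(a minimal sketch; Ellipse2D is from java.awt.geom):


    g2d.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                         RenderingHints.VALUE_ANTIALIAS_ON);
    // Simple (possibly translucent) Color + SrcOver: OpenGL applies the
    // alpha mask tiles and composites the result in hardware.
    g2d.setComposite(AlphaComposite.SrcOver);
    g2d.setColor(new Color(255, 0, 0, 200));
    g2d.fill(new Ellipse2D.Float(10, 10, 200, 100));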


4) Text Rendering

Operations in this category include drawString(), drawGlyphVector(), etc.

Rendering of text, both antialiased and non-antialiased, is accelerated by
the OpenGL-based pipeline. We maintain an OpenGL texture that acts as a
hardware glyph cache, so commonly used glyphs can simply be texture mapped
to the destination surface, taking advantage of the hardware accelerated
compositing offered by OpenGL. The heuristics used by the OpenGL glyph cache
are subject to change, but in J2SE 5.0, we attempt to cache a glyph if
its width and height are each less than or equal to 16 pixels. If the glyph
cannot fit in the OpenGL glyph cache (which can hold approximately 1024 16x16
glyphs), we render each glyph individually using a process very similar to
that described in Section 3.2 (including the same restrictions on the current
Paint and Composite).
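
If you are curious whether a particular font's glyphs fall within that 16x16
limit, you can measure them yourself; a rough diagnostic sketch (this is not
something the pipeline exposes directly; Font and Rectangle are from
java.awt, the rest from java.awt.font):


    Font font = new Font("SansSerif", Font.PLAIN, 12);
    FontRenderContext frc = g2d.getFontRenderContext();
    GlyphVector gv = font.createGlyphVector(frc, "g");
    Rectangle r = gv.getPixelBounds(frc, 0, 0);
    boolean likelyCached = (r.width <= 16 && r.height <= 16);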


5) Image Rendering

Operations in this category include all the drawImage() variants. If you
are unfamiliar with the concepts of VolatileImages and "managed images",
I highly suggest you read through Chet's blogs on those subjects.

Imaging operations are usually accelerated in hardware by OpenGL, even if one
of the 12 AlphaComposite rules is installed on the Graphics2D.
Generally speaking, the OpenGL-based pipeline will accelerate the following
operations:

  • simple copies (e.g. drawImage(img, x, y, null))
  • simple scales (e.g. drawImage(img, x, y, w, h, null))
  • arbitrary transforms (e.g. drawImage(img, xform, null))
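
In code, the three flavors look like this (img is assumed to be a managed
image or a VolatileImage; AffineTransform is from java.awt.geom):


    g2d.drawImage(img, 10, 10, null);                  // simple copy
    g2d.drawImage(img, 10, 10, 200, 150, null);        // simple scale
    g2d.drawImage(img,                                 // arbitrary transform
                  AffineTransform.getRotateInstance(Math.PI / 6), null);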

Exactly how the image data is rendered to an OpenGL surface depends on the
types of images involved. Each type of imaging operation is described below.


5.1) System Memory Surface --> OpenGL Surface



System memory surfaces (e.g. a BufferedImage that has not yet been cached
in an OpenGL texture) of the following types can be rendered directly by
OpenGL:

  • IntArgb
  • IntArgbPre
  • IntRgb
  • IntRgbx
  • IntBgr
  • IntBgrx
  • Ushort565Rgb
  • Ushort555Rgb
  • Ushort555Rgbx
  • ByteGray
  • UshortGray

If an image is not of one of the above types, we can still use OpenGL to render
the image, but we will first convert the image into an intermediate type that
OpenGL can handle, such as IntArgbPre.
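
For example, an image created as follows has the IntArgbPre layout, so it
can be handed to OpenGL without any intermediate conversion:


    BufferedImage img =
        new BufferedImage(300, 200, BufferedImage.TYPE_INT_ARGB_PRE);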

The glDrawPixels() operation can handle simple copies and simple scales
(in conjunction with glPixelZoom()), so these operations should be reasonably
fast. However, glDrawPixels() is known to be somewhat slow, especially on
consumer graphics hardware in the x86 world, so this is not the fastest path.

There is no direct way in OpenGL for transforming system memory surfaces
(barring the "pixel transform" extension, which is either not available or
not performant on most graphics hardware). Therefore, the OpenGL-based
pipeline will use a special tiled approach that uses an intermediate OpenGL
texture object to transform the system memory surface:

sysmem --> texture --> OpenGL surface


This approach is reasonably fast since the intermediate texture operations
are handled in hardware, but note that it is currently defined only for
NEAREST_NEIGHBOR interpolation. (We have an RFE open that would make this
work for BILINEAR as well, but for now BILINEAR and BICUBIC hints are handled
by our software transform loops in this case.)


5.2) Managed Image (OpenGL Texture) --> OpenGL Surface



Managed images of all types can be cached in an OpenGL texture (there are
direct loops defined for the types mentioned in Section 5.1, but generally
speaking we can cache any image type by first going through an intermediate
surface).
Once an image has been cached in an OpenGL texture object, that image can be
rendered to an OpenGL surface by mapping the texture to an OpenGL quad.
The texture-mapped quad will respect the current AffineTransform state, and
the image will therefore be transformed by the graphics hardware.

For example, if there is a rotation transform
set on the Graphics2D object, the texture will be rotated by the graphics
hardware. Likewise, the variants of drawImage() that take scaling parameters
will scale the texture mapped quad before rendering to the destination
OpenGL surface. Transforming a managed image with either NEAREST_NEIGHBOR or
BILINEAR interpolation RenderingHints will be accelerated by OpenGL in
hardware. Unfortunately, OpenGL does not support BICUBIC interpolation for
textures, so we fall back on our software transform loops for the BICUBIC
case.
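
A short sketch tying these pieces together (gc is a GraphicsConfiguration
and g2d the Graphics2D of an OpenGL surface, as in the earlier sketches):


    // A compatible image is a good candidate for becoming "managed"
    // and therefore cached in an OpenGL texture.
    BufferedImage img = gc.createCompatibleImage(128, 128);

    // NEAREST_NEIGHBOR and BILINEAR transforms of the cached texture
    // are accelerated; BICUBIC falls back to the software loops.
    g2d.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                         RenderingHints.VALUE_INTERPOLATION_BILINEAR);
    g2d.drawImage(img, AffineTransform.getRotateInstance(Math.PI / 4), null);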


5.3) VolatileImage (OpenGL Pbuffer) --> OpenGL Surface



Simple copies and scaled copies from a pbuffer-backed VolatileImage to an
OpenGL surface will be accelerated by the VRAM->VRAM glCopyPixels() operation,
and should be relatively performant. There is no direct way in OpenGL for
transforming pbuffers (barring the render-to-texture approach, which is not
discussed here). Therefore, copying a pbuffer-backed VolatileImage
with an arbitrary transform will use a tiled approach similar to that
described in Section 5.1:

pbuffer --> texture --> OpenGL surface


This approach is reasonably fast since the intermediate texture operations
are handled in hardware, but note that it is currently defined only for
NEAREST_NEIGHBOR interpolation, as mentioned in Section 5.1.


6) Miscellaneous

  • The Graphics2D.copyArea() operation is accelerated for OpenGL surfaces,
    using the very fast VRAM->VRAM glCopyPixels() operation.

  • The BufferStrategy.show() method (for "flip" strategies) will result in a
    native SwapBuffers() operation, which causes the contents of the hardware
    OpenGL backbuffer to be "flipped" to the front buffer (i.e. the screen).
    Depending on the platform and graphics drivers, this operation may be
    synchronized with the vertical refresh of the monitor.
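
    A minimal flip-strategy sketch (canvas is assumed to be a Canvas
    already displayed in a Frame; BufferStrategy is from java.awt.image):


        canvas.createBufferStrategy(2);
        BufferStrategy bs = canvas.getBufferStrategy();
        do {
            Graphics g = bs.getDrawGraphics();
            g.setColor(Color.black);
            g.fillRect(0, 0, canvas.getWidth(), canvas.getHeight());
            g.dispose();
            bs.show();  // native SwapBuffers() on the OpenGL pipeline
        } while (bs.contentsLost());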

  • The Image.flush() method will delete any OpenGL textures in use by a
    managed image, or any OpenGL pbuffer in use by a VolatileImage.


7) Conclusion

I hope this article answers most of the questions developers have been asking for the past few months. If you see any glaring omissions, something you would like clarified, or topics for a future "Behind the Graphics2D" article, please post a comment. I'll try to incorporate your suggestions into this document so that it can be the "definitive source" for this topic (if that's possible).



In my ears: Wire, "154"


In my eyes: Kobo Abe, "The Woman in the Dunes"
