Chet Haase's Blog

Performance Archives

"Freebird!"

Posted by chet on August 23, 2004 at 11:13 AM

I'm a little afraid of posting this blog, thinking that it could result in either
Nevertheless, I'll forge ahead. After all, the whole point of my blogs/articles is to talk about stuff that developers want or need to know more about. So, in the words of the unforgettable southern rock band and Java desktop client developers Lynyrd Skynyrd:
In other words, what would you like to see articles on? I have a few ideas kicking around in my head that I'd like to cover, and people have suggested a few more. But if you have other ideas that you would like to have considered, please tell me in the feedback section below. Here are a few topics that I would like to cover sometime soon, to give you a feel for where things are headed. If you have opinions on these, feel free to post those below as well:
That's all I can think of for now, although I tend to add to my internal list of blog TODOs frequently.
Threadaches

Posted by chet on August 19, 2004 at 03:42 PM

Metaphorical Introduction
I find myself trying to multithread my life constantly. I've got so many things to do; surely there's a way I can multiplex the chores to get all of them done faster, right? For example, I'll be brushing my teeth and realize I also need to comb my hair. I'm only using one hand to hold the toothbrush, so I reach for the comb with the other hand. Then I'll start combing my hair, at which point the other hand with the toothbrush stops moving, or (even worse) keeps moving, but in such a way that toothpaste starts running all over the place. Now I've gone from two actions that would have been simple to perform serially to one interleaved complex action that's resulted in a toothpaste-and-spittle mess and an unruly mop on my head.

The problem here is that, despite the speed of our processors and the simplicity of our actions, some things in life simply cannot be multithreaded.

This is not to say that multithreading itself is unachievable or not worthwhile for some operations. As an example, consider breathing or blinking; if I had to stop any other action to make my eyes blink or to breathe, I'd never get anything done. Consider multiplexing actions while driving; if we couldn't do many things at once in this environment, we'd all be dead. (Of course, many people assume that this multiplexing-while-driving capability is universal and extends to complex conversations on phones while driving at 80 on the highway; I believe these folks will eventually be evolved out of our society, although they may take a few of us with them along the way).

So there are clearly some operations which are better done on separate threads; the trick is figuring out which ones are which if you don't want to be wiping up toothpaste off your clothes all the time. Or, in the case of your applications, if you don't want to be debugging thread deadlocks or performance problems due to thread abuse.

That reminds me; I was going to write about software in this article, not toothpaste and driving.
I knew there was a reason I was posting this to a developer site...

Now, to get Technical

I've spent the better part of the last year thinking about multithreading problems, and I could probably spend the rest of my career doing the same (although it probably wouldn't be a long career since my head would pop off before too long). There are so many issues in this ugly space that to write about all of them would take too long. So for the purposes of a focused blog (although it's probably too late for that goal), let's just focus on one area of trouble for Java client developers: multi-threaded graphics.

When developers first discover the joys of multithreaded programming, it's like opening a new present on Christmas; Java makes it so easy to create and use new threads that you can start taking advantage of multithreaded processing far more easily than you ever could in previous programming languages. And the knowledge that your application can perform multiple tasks simultaneously, especially on systems with multiple processors or hyper-threaded cores, is huge. You no longer need to block on IO, waiting for the system to open a file. You no longer need to hang in your display code waiting for an image to be downloaded over the network. You no longer need to hang the GUI while calculating a complex set of equations. In all of these cases, it is trivial to spawn a new thread to perform the bottlenecking operation, and use the result of that operation later as appropriate.
In fact, Java builds in much of this functionality for you, so you don't even need to
do the work of multithreading some of your operations which may be blocking. For example,
the old Toolkit Image loading methods automatically load images on a separate thread. The
following code:
"Well heck," the Happy Developer says. "If two threads make my application that much faster, imagine how fast it'll be with ten!"
For example, suppose a developer finds that one of the more costly operations in their
application is some rendering operation, like drawing some complex Shape.
Every frame they have to draw numShapes of these shapes, like so:
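A minimal sketch of that single-threaded loop, rendering into whatever Graphics2D the caller supplies. The class name SerialShapePainter, the helper makeShapes(), and the use of small ellipses as stand-ins for the "complex Shape" are all assumptions made for illustration; only numShapes comes from the text.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Shape;
import java.awt.geom.Ellipse2D;

// Hypothetical helper; class and method names are illustrative only.
class SerialShapePainter {
    // Builds numShapes simple shapes to stand in for the "complex Shape".
    static Shape[] makeShapes(int numShapes) {
        Shape[] shapes = new Shape[numShapes];
        for (int i = 0; i < numShapes; i++) {
            shapes[i] = new Ellipse2D.Double(i * 10, 0, 8, 8);
        }
        return shapes;
    }

    // The work a paintComponent(Graphics) override might do every frame:
    // draw each shape in turn, all on the calling thread.
    static void paintShapes(Graphics2D g, Shape[] shapes) {
        g.setColor(Color.WHITE);
        for (Shape shape : shapes) {
            g.fill(shape);
        }
    }
}
```

In a Swing component, paintComponent() would cast its Graphics argument to Graphics2D and pass it to paintShapes().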
"Golly!", the Happy Developer says, "What if I use that cool
Thread mechanism to speed this up?! Then it'll go way faster."
They might write something like the following:
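A sketch of what such a thread-per-shape attempt could look like. The name ShapeRenderer follows the class mentioned later in the text, but its body, the ThreadPerShapePainter wrapper, the per-thread Graphics copy via create(), and the join() at the end are assumptions; the cast to Graphics2D assumes a Java 2D Graphics object, as Swing and BufferedImage provide.

```java
import java.awt.Color;
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.Shape;

// Hypothetical Runnable that renders a single shape.
class ShapeRenderer implements Runnable {
    private final Graphics2D g;
    private final Shape shape;

    ShapeRenderer(Graphics g, Shape shape) {
        // Each renderer gets its own Graphics copy so threads share no state.
        this.g = (Graphics2D) g.create();
        this.shape = shape;
    }

    @Override
    public void run() {
        try {
            g.setColor(Color.WHITE);
            g.fill(shape);
        } finally {
            g.dispose();
        }
    }
}

class ThreadPerShapePainter {
    // Creates and starts one Thread per shape on every call (the costly part).
    static void paintShapes(Graphics g, Shape[] shapes) throws InterruptedException {
        Thread[] threads = new Thread[shapes.length];
        for (int i = 0; i < shapes.length; i++) {
            threads[i] = new Thread(new ShapeRenderer(g, shapes[i]));
            threads[i].start();
        }
        // Wait for every renderer before returning; the Graphics passed to a
        // paint method should not be used after that method returns.
        for (Thread t : threads) {
            t.join();
        }
    }
}
```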
They compile the code, run it hopefully ... and discover that it didn't fix their performance problem. In fact, they discover that the app is actually much slower than the original code. What gives? There are actually several different factors that can contribute to the performance of this particular approach, ranging from things that add no benefit to those factors that actually make multi-threading this application slower. Let's go through some of these factors, one by one:
1) Thread/object creation overhead
However, that doesn't mean to say that creating temporary objects is actually free. In particular, it doesn't mean that you want to create and initialize objects in your inner loop if you don't have to. In the above example, we are not only asking for temporary memory for the Thread and ShapeRenderer objects (a cheap operation), but we are also asking that those objects get created and initialized, which may not be so cheap, depending on the complexity of the objects involved and whatever initialization process they need to go through. In this case, the creation of a thread will probably involve a fair amount of processing at either/both the Java and native levels in order to create the underlying thread object.

The Happy Developer, realizing this, will of course take steps to minimize the temporary object creation. In this situation, they may realize that since the same thing happens every time through the loop, there is no reason that the application needs to create the Thread and ShapeRenderer objects every time through; they can just incur the overhead of creation one time and then reuse these objects whenever they need them. I won't bother with an example here, just picture a variation of the above where the Thread and the ShapeRenderer objects are created only once. Then, inside the paintComponent() method, we need only update the Graphics object of each ShapeRenderer and then tell each Thread to do its thing.

Once again, the Happy Developer (this time with a slightly less huge smile of anticipation on their face) awaits the stunning results ... but discovers that this new variation is still worse than the original approach. Things may be a bit better here than before; at least the application is not going through the contortions of creating and initializing the Thread and ShapeRenderer objects every time through the painting loop. But there's still something amiss in the messy threading details.
2) Thread Swapping

One of the hidden details of multithreaded programming is that the operating system has to go through a fair amount of work in order to run a separate thread. This is not much in the whole scheme of things (less than a millisecond, certainly), but it can add up when there are several threads involved. For example, if you have ten threads all trying to do similar tasks at the same time and at the same priority, then the system will keep swapping the threads in and out trying to get the work done. This may not be as bad as swapping out each thread after just a couple of instructions in some round-robin fashion; we may get a fair chunk of work done in any given thread before we are swapped out. But the amount of work accomplished on that thread must be weighed against the work done to swap threads to know whether it was worthwhile having multiple threads to accomplish the task.

I wrote a test to see what thread-swapping overhead was. I ran in a tight loop, calling wait/notify to swap the threads back and forth. For 100,000 swaps, it took 1.3 seconds on that particular test system (about 13 microseconds per swap). This doesn't sound like much, but if you can imagine each thread trying to perform something simple like drawing a single line, the fact that we could only do 100,000 of these operations in that 1.3 seconds makes the thread swap overhead seem pretty significant. (For comparison purposes, I also timed calling a function 100,000 times on the same system, which took only about 10 ms).

The cost of thread swapping overhead comes into play especially on systems where there are fewer computing resources available than there are threads that want those resources. This is an excellent, if obvious, segue into my next point...

3) Limited Thread Resources

The ideal case for multithreaded systems is having one processor per thread, or at least one processor available whenever a new thread needs processing power.
For example, if you have a four CPU system and there are four threads all trying to run at the same time, they can each have a processor to themselves and get along swimmingly. There need be no thread-swapping overhead, as mentioned in point #2 above, because the threads do not have to be swapped out; they have full control over their CPU (at least while the process is running).

There are now hybrid systems where single chips can have multiple resources available for threads, such as the Hyper Threaded CPUs of various chips today. While these systems cannot dedicate an entire CPU to a particular thread, they have enough resources on any CPU to dedicate some of those resources to separate threads. There is still some overhead of thread swapping and contention here, but it is at least better than the single-CPU model.

The problem with the sample application above is that it does not take into account anything about the system when it creates a thread per Shape. Unless the application is running on a system that has as many CPUs (or at least as many thread resources, in the hyper threading case) as there are threads, then there is going to be contention in thread processing and thus overhead for swapping threads in and out.

Our Happy Developer might realize that the available thread processing resources could be a bottleneck. Suppose they know that, in general, their application will run on systems with at least 2 processors or one hyper-threaded processor, and they would still like to take advantage of that capability in multithreading their application. They may then change their app to use a model of exactly two rendering threads instead of the one-per-Shape model above. Now, instead of sending each Shape rendering operation through its own dedicated thread, it will queue up these operations on two separate threads. The code will be more complex for this approach, but still fairly straightforward.
For brevity, I'll skip the example, but hopefully it's easy to picture this thread-sharing approach.

Again, the Happy Developer awaits patiently the results they know will stun the world. There is a slight faltering of their smile this time; they have met defeat too many times in the past. Still, they look forward to ... more failure. Once again, the application fails to improve upon the original single-threaded approach. This time, they have eliminated much of the overhead in dealing with threads. And when running on a system that can process multiple threads simultaneously, they may even have eliminated thread-swapping overhead (or at least reduced it significantly). Given the power of doing multiple tasks simultaneously, and the ability of the system to handle this simultaneity, what happened?

4) Graphics is Inherently Single-Threaded

Here's the sad reality of today's computing platforms; graphics hardware is inherently single-threaded. This single-threaded approach is so ingrained in today's platforms that it is an assumption at the hardware, the driver, and even the API level (although some APIs are written to handle multi-threaded programming, they do not do a good job of compensating for the limitations below them at the hardware and driver level). While the hardware architectures, processors, operating systems, and languages have evolved to allow and encourage multi-threaded programming, the underlying graphics engines simply cannot do it.

This means that you may be able to easily write an application that performs graphics operations in multiple threads. And the system you are using (e.g., Java) may turn around and issue those graphics calls in separate threads. And the underlying systems that Java depends upon (e.g., X, DirectX, GDI, whatever) may be able to receive those calls from multiple threads.
But when it finally gets down to the hardware, it has all been funneled through one pipe and there is no way to get any advantage by trying to use multiple threads at a higher level. Just like the chain that is only as strong as its weakest link, an application is only as multithreaded as the systems it depends upon; in this case, if the graphics subsystem is single-threaded, then there is nothing you can do at the upper layers to make that system more multi-threaded-friendly.

Let's take a look at an example. I was working on this one just this week, playing around with various options in double-buffering. I wanted to play with the idea of writing to a buffer on one thread and copying that buffer to the onscreen window on a totally different thread. There were various reasons for this (and various possible gotchas), but it was at least worth an experiment.

I wrote my rendering loop to fill the buffer as fast as it could, with simple calls that mimicked a scrolling operation (copy part of the buffer to itself in a different location, fill in the rest with some color). After each operation, it would update a flag that told the system that the buffer contents were new (and should thus be copied to the onscreen window at some time). I wrote my buffer-copying thread to occasionally wake up (every few milliseconds) and copy the buffer to the screen if the buffer had been changed since the last copy. There are a few implementation details here (such as synchronizing on the update flag variable), but I have described the essential bits.

What I saw confused me at first. It ran pretty well on my development system. So I had someone else run it on their system. In that new environment, it basically froze the window for several seconds. It looked like we were not even getting our screen-updating loop, as if the system was not even kicking off the timer I had set. In digging into it further, I found the artifact was more disturbing.
We were being woken up correctly based on the timer, and were then attempting to update the screen. But the buffer-rendering loop had so completely filled the graphics pipeline with scroll/fill calls that we basically froze the system while those were being worked on by the underlying driver and hardware. Here was a situation where:
The upshot of this whole diatribe on the graphics subsystem is that there is basically no gain to be had in changing the original sample application in the way that we did; since the underlying system is inherently single-threaded, we have nothing to gain and everything to lose by introducing potential thread overhead when everything will just end up in a single thread in the final rendering step in the hardware.

Conclusions, Thoughts, Powertool Accidents

The point of this article was to raise some issues to be aware of in multithreaded programming. Note that I am specifically not saying "Don't Do It!". There are lots of advantages to multithreaded approaches, some of them mentioned in the introduction above. For example, it is still a huge win to do time-consuming non-graphics tasks in a separate thread (such as IO or image loading) if you do not want to block the main thread (a big example in my desktop client world is the canonical "don't block the GUI thread" example; all blocking operations should happen elsewhere so that the user still sees a snappy GUI).

And there are even cases where graphics operations can be effectively multi-threaded. For example, in a system that is using all software rendering (such as rendering destinations that are not hardware-accelerated, such as BufferedImage objects) running on systems that support multiple threads, there could be a huge win in having separate rendering threads. Imagine a multi-chip server that is producing separate images all in parallel, rendering to each with software loops, running each of those loops on separate processors; this is a major win for parallelism.

But what I am saying is: be aware of the issues in the platform and underlying systems when taking a multithreaded approach. And always test your application to see if you actually got the speedup you were anticipating.
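The parallel software-rendering case can be sketched roughly as follows. This is only an illustration under stated assumptions: the class ParallelImageRenderer and its methods are hypothetical, the "rendering" is a trivial fill, the pool is sized with Runtime.availableProcessors(), and the lambda syntax assumes Java 8 or later (much newer than the releases discussed here).

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: each task renders into its own BufferedImage,
// so no two threads ever touch the same destination.
class ParallelImageRenderer {
    // Software-renders one image; stands in for a real rendering loop.
    static BufferedImage renderOne(int size, Color color) {
        BufferedImage img = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(color);
            g.fillRect(0, 0, size, size);
        } finally {
            g.dispose();
        }
        return img;
    }

    // Renders count images on a pool with one worker per available processor.
    static List<BufferedImage> renderAll(int count, int size) throws Exception {
        int workers = Math.max(1, Runtime.getRuntime().availableProcessors());
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<BufferedImage>> futures = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                final Color color = new Color(i % 256, 0, 0);
                futures.add(pool.submit(() -> renderOne(size, color)));
            }
            List<BufferedImage> images = new ArrayList<>();
            for (Future<BufferedImage> f : futures) {
                images.add(f.get()); // blocks until that image is finished
            }
            return images;
        } finally {
            pool.shutdown();
        }
    }
}
```

Timing renderAll() against a plain loop over renderOne() on the target machine is the only way to know whether the pool actually helps for a given workload.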
Think of multithreaded programming kind of like a chainsaw. It can be an incredibly
powerful tool that can dramatically reduce the time needed to perform some tasks.
Or it can chop your hand off and cause a huge mess that's impossible to recover from.
You need to know how to use it effectively to determine which one it will do for you.
ToolkitBufferedVolatileManagedImage Strategies

Posted by chet on August 11, 2004 at 05:11 AM

A question that often arises from Java graphics developers is which image type or creation method to use. When exactly should you use VolatileImage? What is BufferedImage appropriate for? What about the old Toolkit images? And when is BufferStrategy more appropriate than one of these image types?

It's a pretty big topic, and the answer (like all truly great answers) is probably "It depends". But there are some general guidelines that can come in handy. And perhaps a description of what these different kinds of images and methods are all about might help.

1) Image Types

First of all, perhaps a short dictionary of image types might help:
That's it for the basic image types. Now let's talk about how we actually create and use these image objects.

2) Who you gonna call?

Whenever I want to give myself a fright about the complexity of our APIs, I simply ponder the vast array of choices that face developers who simply want to create an image. I'm sure I'm missing some here, but let's see...
I'm sure there are more out there, especially using things like ImageIO (which is all about reading and writing images, as you might guess from the name...). But this list will do for now.

So it's a wrap. This article's pretty much finished; just use the above API calls to create your images. Left as an exercise to the reader. Q.E.D. It's obvious, isn't it?

Okay, so maybe it isn't obvious; there are a lot of methods above that all seem to need different parameters or that create different types of images. Here's the trick: All of the above image creation methods (and any others that are not on the list) can be broken down into just a few categories. Then the plethora of ways of creating an image in one of those categories can just be seen as utility methods; different ways of getting the same result. The convenience methods may be because of logic (why do I have to get the GraphicsConfiguration to create an image associated with a Component? Why not use the Component directly?), or convenience (instead of using some InputStream mechanism for all image readers, we provide several ways to read the image directly including from filenames, URLs, and streams; just call the method appropriate for your situation).

So the real work in this article is to break down the categories of image types and describe which types of images and methods you may want to use in which situations. Once you get that down, the rest, as they say, is just implementation details.

3) Image Loading or Creation?

First of all, are you loading existing image data? Or are you creating an image buffer in memory? Image loading means that you have image data (either locally or across the network) that you want to load into your application, possibly to copy that image onto the screen or to read and operate on the data.
Image creation means that you want some arbitrary image memory created for your application; perhaps you want to create a buffer for double-buffered animations, or you want a place to cache intermediate filtering results.

3.1) Image Loading
In the above method list, all of the methods that take filenames, URLs,
streams, producers, and data arrays are those intended for loading existing
images. In particular, all of the methods listed above for Applet,
ImageIO, ImageIcon, and Toolkit are intended for image loading:
There are at least four major things that differentiate these methods:
When I'm talking about location, I'm mainly concerned with whether the file is local or across a network. Also, if it's packed into some resource file, such as a jar file, that also comes into play here.

Loading across the network
If you are accessing the data across a network, it's probably easiest
to use the URL variations:
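A minimal sketch of the two URL-based variations, using only Toolkit.getImage(URL) and ImageIO.read(URL). The wrapper class UrlImageLoading and its method names are hypothetical, and any actual URL you pass in is your own.

```java
import java.awt.Image;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.net.URL;
import javax.imageio.ImageIO;

// Hypothetical wrapper around the two URL-based loading calls.
class UrlImageLoading {
    // Toolkit variation: returns right away; pixel data is fetched later,
    // when the image is first needed.
    static Image loadWithToolkit(URL url) {
        return Toolkit.getDefaultToolkit().getImage(url);
    }

    // ImageIO variation: blocks until the image is decoded.
    // Returns null if no installed reader understands the data.
    static BufferedImage loadWithImageIO(URL url) throws IOException {
        return ImageIO.read(url);
    }
}
```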
Now suppose you have another image that you have saved locally in a file;
just use the filename variation of the above. For example, let's say
you loved one particular instantiation of the lovable-yet-quirky duke.gif
file above so much that you downloaded and saved it for use in your application
(see the above note on scary lawyers). Then you could use the following code
to load that file from the directory where the program was launched:
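As a sketch, the filename variations might look like this. The class FileImageLoading is hypothetical; a relative name such as "duke.gif" is resolved against the directory the program was launched from.

```java
import java.awt.Image;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical wrapper around the filename-based loading calls.
class FileImageLoading {
    // Toolkit variation: returns immediately and loads lazily.
    static Image loadWithToolkit(String filename) {
        return Toolkit.getDefaultToolkit().getImage(filename);
    }

    // ImageIO variation: blocks until decoded and throws IOException
    // if the file cannot be read.
    static BufferedImage loadWithImageIO(String filename) throws IOException {
        return ImageIO.read(new File(filename));
    }
}
```

For the example in the text, the call would be FileImageLoading.loadWithImageIO("duke.gif").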
Another consideration is the format of your stored image. The old Toolkit/Applet loaders only understand GIF, JPEG, and PNG format files. (Okay, they also understand XBM and XPM2, old X11 image formats, but those are probably not formats you are terribly concerned about). These loaders work well for most web applications since these image types are traditional web image formats.

But what if you have an image in some other format that the Toolkit/Applet loaders do not understand? ImageIO currently has built-in readers for GIF, JPEG, and PNG. In addition, it will have BMP and WBMP capability in the jdk1.5 release. Moreover, there will be more image readers/writers for ImageIO going forward, whereas there are no specific plans to support more formats for the old Toolkit/Applet loaders. And finally, ImageIO has a pluggable reader API, so if you have a custom image format, or some other format not yet supported by the core library, you can write your own loader for that format within ImageIO. In fact, the JAI team has made available a package with additional ImageIO readers/writers at http://java.sun.com/products/java-media/jai/downloads/download-iio.html if you have requirements beyond the current ImageIO defaults.

So ImageIO could also be the right choice if you need to deal with formats beyond the basic web image formats.

3.1.3) Synchronicity
The Applet and Toolkit image loading methods came from the old days
of Java 1.0, when Java was seen primarily as a networked application
API and image data might come from any source, potentially
one on an unreliable or slow network connection. To make networked
applications more robust, it is reasonable to put network-dependent
operations in separate threads to ensure that an application's
main or GUI threads do not hang while waiting for a slow download.
Because this was a common pattern for Java GUI applications at that time,
the image loading operations were all created to run on a separate
image loading thread. Thus when an application calls:
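A sketch of what the asynchronous behavior looks like from the caller's side: Toolkit.getImage() hands back an Image at once, and an ImageObserver is told when the pixels actually arrive. The class AsyncToolkitLoad, the use of a CountDownLatch, and the lambda syntax (Java 8 or later) are assumptions for illustration.

```java
import java.awt.Image;
import java.awt.Toolkit;
import java.awt.image.ImageObserver;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper that kicks off loading without blocking the caller.
class AsyncToolkitLoad {
    // Starts loading img and returns a latch that opens once loading has
    // finished or failed. The calling thread is free to do other work.
    static CountDownLatch startLoading(Image img) {
        CountDownLatch done = new CountDownLatch(1);
        int finalFlags = ImageObserver.ALLBITS | ImageObserver.ERROR | ImageObserver.ABORT;
        ImageObserver observer = (image, flags, x, y, w, h) -> {
            boolean finished = (flags & finalFlags) != 0;
            if (finished) {
                done.countDown();
            }
            return !finished; // returning false means no further updates are needed
        };
        // prepareImage() returns true if the image is already fully loaded.
        if (Toolkit.getDefaultToolkit().prepareImage(img, -1, -1, observer)) {
            done.countDown();
        }
        return done;
    }
}
```

An application would typically call Toolkit.getDefaultToolkit().getImage(...) early, pass the result to startLoading(), and only consult the latch when the image is actually required.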
Note that this model of asynchronous loading does not apply solely to networked applications, or even to image loading specifically; any operation that takes a significant amount of time should not be done on the GUI thread, lest you run the chance of making your application appear hung while the operation is taking place. So, for example, if you are loading in a huge image from a local file, you may want that non-networked operation to happen in a separate worker thread to ensure that your GUI has no pauses during image loading. This model works well enough for applications that create their images early for later use. The application simply may need to check whether the image has been loaded whenever it is required in the application.
When applications do need the data (for example, if they need image sizes in
order to determine layout correctly, or if they need to display images in
their final form), they may need to synchronize on the image
loader and wait until the image loading is done. For example,
an application may want to load local image data and be willing to
wait for that data to load before proceeding (knowing that a local
load will usually not take very long). In that case, the application might do
something similar to the following:
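One way to do that waiting is with java.awt.MediaTracker. This is a sketch only: the class BlockingToolkitLoad is hypothetical, and the bare Container passed to MediaTracker is just a stand-in for whatever Component the application already has.

```java
import java.awt.Container;
import java.awt.Image;
import java.awt.MediaTracker;
import java.awt.Toolkit;

// Hypothetical helper that loads through Toolkit but waits for completion.
class BlockingToolkitLoad {
    // Returns the loaded image, or null if loading failed.
    static Image loadAndWait(String filename) throws InterruptedException {
        Image img = Toolkit.getDefaultToolkit().getImage(filename);
        // MediaTracker needs a Component to act as the image observer.
        MediaTracker tracker = new MediaTracker(new Container());
        tracker.addImage(img, 0);
        tracker.waitForID(0); // blocks until the image is loaded or has failed
        if (tracker.isErrorID(0)) {
            return null; // missing file, unknown format, and so on
        }
        return img;
    }
}
```

After loadAndWait() returns a non-null image, calls such as getWidth(null) and getHeight(null) return real sizes rather than -1.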
Meanwhile, ImageIO has synchronous loading methods that do not return until the image has been loaded and is ready to go. Note that some applications and situations may still need asynchronous loading behavior (for long image loads or to more efficiently multitask). For example, it does not take a huge amount of time to affect perceived GUI performance, so if an image load will take even as long as a tenth of a second, you may want to avoid loading that image synchronously on the Event Dispatch Thread (so don't load it in your paint() method). You can always spawn a new Thread yourself to call the ImageIO loading methods if necessary.

3.1.4) Resulting Java Image Type

Part of the decision over which image creation API you use lies in which image type you want to get back from the creation method. In particular, do you want a Toolkit Image or a BufferedImage?
Toolkit Images are created by the Applet, Toolkit, and ImageIcon methods
listed above. The resulting images are easy to use for display purposes
(just call

BufferedImage objects are created by the ImageIO methods listed above. These objects offer a more powerful API, albeit with potentially more work involved to do some operations (such as displaying an animating GIF image).

Image or BufferedImage: What's in a Name?

Although both Image and BufferedImage have similar properties in terms of being displayable, BufferedImage has many more capabilities. For one thing, the Image objects created by the Toolkit, Applet, and ImageIcon load methods are read-only; you cannot get the Graphics of those Images and render to them. So if you want to modify the image data, you will need to do more work (such as creating another image that is modifiable and copying the loaded Image into that new image).

Image has some very simple methods and is mostly intended to be a simple object that holds image data. But BufferedImage has many methods for modifying and extracting all kinds of data from an image; color models, pixel data, and more. Given a choice between the two, I would always opt for the one that gave me more power and flexibility.

But doesn't that increased capability mean increased overhead? Not at all; there is no extra processing involved in BufferedImages when these other powerful methods are not used. If all you do is load an image and display it, BufferedImage can do this just as easily as the more streamlined Image object. So go ahead and use BufferedImage. It is, after all, better than butter.

Dirty Laundry

One good (and not entirely obvious) reason for using the ImageIO API for loading images is the unfortunate reality that the code is simply newer, cleaner, and more maintained (both now and in the future). Much of the old Applet and Toolkit image code was written years ago and has many assumptions and situations that it must account for and is therefore tricky to maintain and upgrade.
Our future image reading/writing direction is with ImageIO; yours should be too, because that's where the focus of our efforts will be in the future. Having said all that wonderful stuff about ImageIO, there could be situations in which the old Toolkit/Applet/ImageIcon approach makes more sense for your particular application, including:
Note also that if you need to use the old APIs for some reason but
you still want the power and flexibility of BufferedImage, it is
easy enough to load the images in through whatever methods are
appropriate, create a new BufferedImage object, and then simply
copy the loaded images into the BufferedImage. For example:
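A sketch of that copy. The class ToBufferedImage is hypothetical, TYPE_INT_ARGB is an arbitrary choice of destination format, and ImageIcon is used here only as a convenient way to wait until the source image has finished loading.

```java
import java.awt.Graphics2D;
import java.awt.Image;
import java.awt.image.BufferedImage;
import javax.swing.ImageIcon;

// Hypothetical helper that copies any Image into a writable BufferedImage.
class ToBufferedImage {
    static BufferedImage copy(Image source) {
        // ImageIcon blocks until the image is loaded, so the size is valid.
        Image loaded = new ImageIcon(source).getImage();
        int w = loaded.getWidth(null);
        int h = loaded.getHeight(null);
        if (w <= 0 || h <= 0) {
            throw new IllegalArgumentException("image failed to load");
        }
        BufferedImage copy = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = copy.createGraphics();
        try {
            g.drawImage(loaded, 0, 0, null);
        } finally {
            g.dispose();
        }
        return copy;
    }
}
```

The returned image can then be rendered to through createGraphics() or inspected pixel by pixel with getRGB().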
3.1.5) Hey! What about the other loading methods above?

The approaches above cover most of the loading methods I listed, but some are notably skipped. The *Stream methods of ImageIO are simply variations on a theme; if you happen to have your data in that format (versus a URL or file), go for it; it's just a convenience to use these alternatives.
As for the other skipped methods (one using an ImageProducer and
some using data arrays), I hoped you wouldn't notice....
As far as reading the image data from an array of data (see the methods above with the imageData[] parameter), this is really only appropriate if you've already read the data into the array to begin with. This could be necessary if you have some custom image storage mechanism, such as a database. But if the image exists in a regular file/URL/stream format, you should probably be using one of the other loading methods instead.

3.2) Image Creation

What if you do not have an existing image on the network or file system? What if you just want a buffer of pixel data that you can use in your application? This could be for creating sprites or icons with rendering calls instead of loaded image information (perhaps you've found this to be faster in your situation than reading image files). Or it could be a buffer that you can use for caching intermediate results or for providing double-buffered rendering for an animation.

For the purposes of this discussion, I'll break down this category of images into three types:
3.2.1) Static Images

Static images are ones that are created and rendered to once (or infrequently) but probably copied from often. Examples of this type of image include icons for a GUI or sprites for a game. The best approach for this type of image is to create an image that is in the same format as the image or window that the image will be copied to; this ensures the most straightforward copy mechanism since the underlying software will not have to perform a conversion on the image data while copying to the destination.
You could, of course, create a BufferedImage object manually through one
of its constructors; you could query the GraphicsDevice for its display
information and then create a BufferedImage of the appropriate type:
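A rough sketch of the manual route. The class ManualDisplayImage is hypothetical, and the depth-to-type mapping is a deliberate simplification: it looks only at the bits per pixel reported by the default screen's ColorModel and ignores details such as channel order. When no display is available, it assumes 32 bits.

```java
import java.awt.GraphicsEnvironment;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;

// Hypothetical helper that picks a BufferedImage type from the screen depth.
class ManualDisplayImage {
    // Simplified mapping from pixel depth to a BufferedImage type.
    static int typeForDepth(int bitsPerPixel) {
        switch (bitsPerPixel) {
            case 15: return BufferedImage.TYPE_USHORT_555_RGB;
            case 16: return BufferedImage.TYPE_USHORT_565_RGB;
            default: return BufferedImage.TYPE_INT_RGB;
        }
    }

    static BufferedImage create(int width, int height) {
        int depth = 32; // assumed default when there is no screen to query
        if (!GraphicsEnvironment.isHeadless()) {
            ColorModel cm = GraphicsEnvironment.getLocalGraphicsEnvironment()
                    .getDefaultScreenDevice()
                    .getDefaultConfiguration()
                    .getColorModel();
            depth = cm.getPixelSize();
        }
        return new BufferedImage(width, height, typeForDepth(depth));
    }
}
```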
But why go to the hassle of all of that when there are convenience
mechanisms that do all of this for you? Specifically, take a look
at:
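For instance, GraphicsConfiguration.createCompatibleImage() picks the format for you; Component.createImage(width, height) is a similar shortcut when you already have a displayable component. In the sketch below, the class CompatibleImages and the headless fallback are assumptions added so the code also runs without a display.

```java
import java.awt.GraphicsConfiguration;
import java.awt.GraphicsEnvironment;
import java.awt.Transparency;
import java.awt.image.BufferedImage;

// Hypothetical helper around GraphicsConfiguration.createCompatibleImage().
class CompatibleImages {
    static BufferedImage create(int width, int height, boolean translucent) {
        if (GraphicsEnvironment.isHeadless()) {
            // No screen to be compatible with; fall back to a generic format.
            int type = translucent ? BufferedImage.TYPE_INT_ARGB : BufferedImage.TYPE_INT_RGB;
            return new BufferedImage(width, height, type);
        }
        GraphicsConfiguration gc = GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getDefaultScreenDevice()
                .getDefaultConfiguration();
        return translucent
                ? gc.createCompatibleImage(width, height, Transparency.TRANSLUCENT)
                : gc.createCompatibleImage(width, height);
    }
}
```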
The best part about static images is that you can use very simple means to create the images and then we will try very hard internally to see that you get any available hardware acceleration for these images when they get copied around. We call these "managed images", because we manage the acceleration details for you. For more information on managed images, please see my blog on BufferedImage performance.

Note that we currently (in all jdk1.4.* releases) manage images that are created with the above APIs and some of the Toolkit image loading methods described previously, but in jdk 5.0 (available in Beta form now, and full release soon) we manage nearly all types of images and take advantage of hardware acceleration if it exists. So go ahead and create the type of image that is most convenient for you and we'll try to do the right thing under the hood.

3.2.2) Dynamic Images

This kind of image may be rendered to quite often, as in an animating icon, or a sprite that is modified on a frequent basis. You could certainly use the same image-creation APIs listed above for static images; these will work fine in most situations and are certainly the easiest way to go in general. However, some developers interested in maximizing performance may want to know more about image management and how dynamic images can affect it.

We manage images by detecting when the application is copying from an image to a destination (either another image or an onscreen window) that lives in accelerated memory. If this copy is done successively when the source image has not changed, then we may decide to cache a copy of that image in accelerated memory and perform future copies from this cached version. In the case of a dynamic image, if that image is being updated one or more times for every copy to the destination, then we will never create an accelerated version of it, and thus the image will never benefit from any hardware acceleration that we could otherwise provide.
(Aside: For the insatiably curious, the reason for this oddity in acceleration comes from "surface loss", where an accelerated version of an image may simply go away at any time due to operating-system or user-caused situations. To keep the original image data intact, we store the main image data (that which is modified by the application) in an unaccelerated location, and only accelerate a mirror copy of that image. That way, if the accelerated version gets wiped out, we still have the original data from which we can create a new accelerated copy. The problem here, in terms of performance, is that an "unaccelerated image" means that all rendering to and from that image is unaccelerated. And if an application is constantly modifying the image, all of that rendering will be unaccelerated and it is never appropriate for us to create and use an accelerated version of that image.)

Developers who care about top performance for these types of images may want to look into using VolatileImages instead. These images store the data in accelerated memory (when possible) and thus rendering to and from that image may be accelerated automatically. The downside is that these images require a bit more care and feeding, due to "surface loss" issues that arise with current video memory architectures and operating systems. Note that not all types of rendering to these images are accelerated, either, but simple types of rendering like lines and rectangular fills and copies can usually be accelerated, depending on the platform configuration.
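As a rough picture of what that "care and feeding" looks like, here is a minimal sketch of a render loop that copes with surface loss. The class and method names (VolatileImageSketch, renderTo) are hypothetical; the calls it relies on are validate(), contentsLost() and createCompatibleVolatileImage(). Treat it as one plausible shape for the loop rather than the required one.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.GraphicsConfiguration;
import java.awt.image.VolatileImage;

/** Illustrative sketch: render to a VolatileImage and survive surface loss. */
final class VolatileImageSketch {

    /**
     * Fills the image with a color, redoing the work if the surface was lost.
     * Returns the image that was actually rendered to, which may be a new
     * instance if the old one was missing or incompatible with the configuration.
     */
    static VolatileImage renderTo(GraphicsConfiguration gc, VolatileImage image,
                                  int width, int height, Color fill) {
        do {
            if (image == null || image.validate(gc) == VolatileImage.IMAGE_INCOMPATIBLE) {
                // No usable image for this configuration: release it and start over.
                if (image != null) {
                    image.flush();
                }
                image = gc.createCompatibleVolatileImage(width, height);
                image.validate(gc);
            }
            // The whole image is redrawn here, so a restored (emptied) surface is fine.
            Graphics2D g = image.createGraphics();
            try {
                g.setColor(fill);
                g.fillRect(0, 0, width, height);
            } finally {
                g.dispose();
            }
            // If the surface went away while we were drawing, do it all again.
        } while (image.contentsLost());
        return image;
    }
}
```

The caller keeps the returned reference and passes it back in on the next frame, so the image is only recreated when validation says it has to be.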
I've already written about VolatileImages in past blogs
(Part I and
Part II), so I will not
go into the details of their usage here; please check out those
other articles for more information. But it is worth covering the
APIs used to create the images, just for consistency's sake in this
article:
The use of the ImageCapabilities object in these methods gives you the ability to require certain attributes (such as hardware acceleration) from any image created with that method. In general, you probably will not need to use that variation, although as we enable more hardware acceleration features in our platform, we may expand the ImageCapabilities API to be more powerful and useful. (Note, too, that ImageCapabilities can be used effectively as a means of inquiring what capabilities an existing image has).

3.2.3) Back Buffers

By "back buffer" I mean an arbitrary offscreen image that is created for use in a double-buffering situation. Typically, an application that wishes to have smooth graphics, especially animations, will draw to a back buffer and then copy that buffer onto the screen instead of drawing directly to the screen. Swing does this by default, so that you do not see the various GUI elements in a Swing app flash as they are drawn to the screen. The buffer copy in these applications typically happens so fast that the graphics in the application are perceptibly smoother than if they were drawn one-by-one directly to the screen.

A developer could use any of the above static or dynamic image APIs that I listed for creating a back buffer. However, the following things should be taken into account when doing so:
BufferStrategy: the preferred way of buffering in Java

In jdk1.4, we introduced the BufferStrategy API, which is a wrapper around VolatileImages. This API allows you to ask for an accelerated back buffer and avoid having to manage the details of surface loss associated with VolatileImages. It also ensures that you will get a buffer of the optimal type for your application. In particular, you will get either a FlipBuffer (which can only be used in fullscreen-exclusive mode on Windows) or a BltBuffer (which is used by default for windowed applications). A FlipBuffer performs a swap of the front and back buffers in video memory when you request BufferStrategy.show(). A BltBuffer copies the contents from the back buffer to the front (just as you would if you called drawImage() from a VolatileImage back buffer to the front buffer). With this API, there is little need to create and manage VolatileImages directly; just let us manage the details for you inside the BufferStrategy implementation. For more information on BufferStrategy, check out the javadocs; they're pretty clear on how the system works.
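For a feel of what drawing through a BufferStrategy looks like, here is a minimal sketch of a single frame, modeled on the render loop shown in the BufferStrategy javadocs. The class and method names (BufferStrategySketch, renderFrame) are mine, and a real application would draw its scene where this one just fills a rectangle.

```java
import java.awt.Color;
import java.awt.Graphics;
import java.awt.image.BufferStrategy;

/** Illustrative sketch: draw one frame through a BufferStrategy. */
final class BufferStrategySketch {

    /** Renders one frame into the back buffer and makes it visible. */
    static void renderFrame(BufferStrategy strategy, int width, int height, Color fill) {
        do {
            do {
                // Ask for a fresh graphics context each time around the loop.
                Graphics g = strategy.getDrawGraphics();
                try {
                    g.setColor(fill);
                    g.fillRect(0, 0, width, height);
                } finally {
                    g.dispose();
                }
                // Redraw if the back buffer was restored (and therefore emptied).
            } while (strategy.contentsRestored());

            // Flip or blit the back buffer to the screen, whichever the strategy uses.
            strategy.show();

            // Run the whole frame again if the buffer was lost along the way.
        } while (strategy.contentsLost());
    }
}
```

In a windowed application you would typically call createBufferStrategy(2) on a visible Window or Canvas, fetch the result with getBufferStrategy(), and hand it to a method like this once per frame. Note that the code never needs to know whether it got a flip or a blit strategy.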
The APIs you will need when creating a BufferStrategy are:
4) Wrap-Up

So that's pretty much it. You have image loading methods and image (or buffer) creation methods. And in each category, you have various flavors depending on the location and type of the data, and the type of image you want returned to you. So even though there are a lot of methods listed at the top of this article, they all break down into just a few comprehensible categories and can be used effectively, once you understand the implications of each variation.

Although there is certainly more complexity here than we can cover with a simple table, it might help to break down some of the basic attributes of the image types we have talked about and the reasons to consider one type over another when writing your application:
4.1) Hey! You forgot some methods!
There are still a couple of the creation methods up top that I have not
covered yet:
4.1) What About Performance?

It is difficult or impossible for me to write a long block of text or code without thinking about performance. And since some of the users of the APIs above, and image operations in general, care a great deal about performance, I should spend a few words discussing some performance issues to be aware of. Again, check out my blogs on BufferedImage (Part I and Part II); I go into much more detail on image management there.

Some important things to keep in mind with respect to managed images (and making sure they are benefiting from available acceleration):

4.2) What about 5.0?

Most of the APIs I discussed above are pre-5.0, so you can use everything above (except where noted) in the current releases available for download. If you are looking forward to using 5.0 (available in beta form today, in full release Real Soon Now), then I'll mention a couple of tweaks to the above:
BufferedImage as Good as Butter, Part II

Posted by chet on August 21, 2003 at 01:36 PM

Part II, in which we discuss the internal performance implications of said image type

Let's dive briefly into some of the performance implications with BufferedImages (because I'm writing this and I tend to wind up in the performance arena no matter what I'm talking about).

There is currently only one image type that we guarantee acceleration on (if possible on the runtime platform): VolatileImage. When you create one of these images, we try to stash it in accelerated memory (such as VRAM on Windows, or an accelerated pixmap on Unix) and then perform rendering operations to and from that image using any available hardware acceleration.

VolatileImages work well for things like back buffers (ala the Swing back buffer, which is now a VolatileImage), where you obviously want to render to them frequently and copy from them as fast as possible. But for your average image, managing a VolatileImage can be tiresome (you have to make sure it's there before and after you use it), and you can't get all the flavors of images you want (currently only opaque volatiles exist).

But think about your average application: there's the back buffer and screen that you write to often, so you want those rendering-to operations to go really really fast. But then there's a bunch of other images like icons, sprites, whatever which you really only write to once or occasionally, but from which you would like to copy often.

This is where Managed Images come in (our new catch-phrase which, roughly translated, means "we will try our darnedest to accelerate this for you"). Here, you create an image however you need to, start working with it, and internally we will recognize that these copying operations can go much faster using an accelerated version, so we will just create that cached version for you.
You don't need to manage the image and you don't need to know how these operations are happening; you just keep calling your rendering operations and let us take care of the pesky details.
Now, the fine print: as of 1.4.*, we hooked up only certain parts of
the API to create these managed images. Specifically, you need to
create an image either by calling one of the create*Image methods:
So in the current implementation of Java2D, the advice posted by ajsutton in response to Part I of this BufferedImage article is well-taken: if you want to take advantage of possible acceleration for your image, use a compatible image (or one of the other means above). Then we will attempt to accelerate this for you. Another benefit of using a compatible image is that you will get an image that is "compatible" (thus the name of the method above) with the display device you are rendering to, which saves pixel format conversion during the copy loops.

(A further caveat is that not all image types that you get from the above methods are acceleratable. For example, if you create an image with the flag Transparency.TRANSLUCENT then we do not currently accelerate that image and you end up going through software rendering loops regardless. Look for this to change as the library evolves and we try to accelerate more and more standard yet nifty features of the API).

Okay, so that's the state of things now: use BufferedImage for all of your condiment image needs, and if performance is particularly important to you, then using one of the variants mentioned above is the way to go.

But what about the future? Gosh, I'm glad you asked that. What a great question.

You may already be wondering, in reading the above explanations and caveats: "Why can't they just accelerate everything? Why are only portions of the API managed?" In fact, this is totally correct; there is nothing preventing this from happening (other than the most obvious of reasons: time to implement and lots of other stuff that we've been working on in the meantime). For example, let's say you have a 16-bit BufferedImage you created from scratch and you want to copy it to a 32-bit display. This means that we have to do a pixel-format conversion, so we can't cache the 16-bit version, right?
But we can cache a new 32-bit version; we just copy the 16-bit version to our new 32-bit cached version, and then simply use the cached version thereafter.

Starting in jdk1.5 (currently in the oven, baking for a while, available at some unspecified (by me) date in the future), we will manage a much wider array of images. In fact, most of the images you can create or load will be managed for you. The code is in there, I've seen it working with my own eyes: BufferedImage objects running as fast as compatible images. It's pretty sweet...
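The 16-bit to 32-bit caching idea described above can be imitated at the application level, which may help show what is going on. The sketch below is not the Java2D internals; it is an assumed, simplified analogue in which the application itself converts the image once and then reuses the converted copy. The names (ConvertedCopySketch, toRgb32) are hypothetical, and TYPE_INT_RGB stands in for "whatever 32-bit format the display uses".

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** Illustrative sketch: pay the pixel-format conversion once, then reuse the copy. */
final class ConvertedCopySketch {

    /** Returns a 32-bit RGB copy of the source; the source itself is left untouched. */
    static BufferedImage toRgb32(BufferedImage source) {
        BufferedImage copy = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = copy.createGraphics();
        try {
            // The format conversion happens here, once, during this single copy.
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return copy;
    }
}
```

An application would hold on to the returned image and draw from it from then on, going back to the 16-bit original only when its contents change.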
That's all for now. I've glossed over many of the details of images and acceleration, but hopefully I've given a taste of how accelerated images work today and in the future. And hopefully you will be able to use this information to get the fastest and tastiest BufferedImage applications possible.