


Mark Lam

Mark Lam's Blog

Performance: Too much of a good thing?

Posted by mlam on November 22, 2006 at 05:52 PM | Comments (7)

This article continues with esoteric knowledge about the phoneME Advanced VM and the JavaME space that developers will need.

If you've looked at the phoneME Advanced VM source code, you'll have noticed that many function and data structure names are prefixed with CVM. CVM is the informal name of Sun's CDC VM, and prefixing labels (especially global functions and data structures) with CVM is a standard coding convention in this VM code base. This is probably common knowledge to most people who already work with Sun's CDC technology, but I thought I'd mention it just in case. Plus, now I can simply say CVM instead of phoneME Advanced VM.

So, on to this entry's topic ...

Performance
Users rarely complain when you offer them more performance in their software. However, performance comes at a price. Usually, it means more complex code that makes better use of the hardware, and that can require a larger memory footprint. For a JavaME VM, which is targeted at resource-constrained embedded devices, this is a serious concern. Hence, any performance work needs to be justified against its cost. In practice, this means platform developers can't just go wild with every optimization trick in the book.
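
To make the speed-space tradeoff concrete, here is a generic Java illustration (not taken from CVM): counting the set bits in an int either with a loop, which needs no extra memory, or with a 256-entry lookup table, which spends 256 bytes of footprint to do fewer operations per call. The class and method names are made up for this example.

```java
/**
 * Generic illustration (not CVM code) of a speed-space tradeoff:
 * two ways to count the set bits in an int.
 */
public class SpeedSpaceTradeoff {

    // Extra footprint: a 256-entry table with the bit count of every byte value.
    private static final byte[] BITS_IN_BYTE = new byte[256];

    static {
        for (int i = 0; i < 256; i++) {
            BITS_IN_BYTE[i] = (byte) loopCount(i);
        }
    }

    /** No table needed, but up to 32 loop iterations per call. */
    static int loopCount(int x) {
        int count = 0;
        while (x != 0) {
            count += x & 1;
            x >>>= 1; // unsigned shift so negative values terminate too
        }
        return count;
    }

    /** Four table lookups per call, paid for with the 256-byte table above. */
    static int tableCount(int x) {
        return BITS_IN_BYTE[x & 0xFF]
                + BITS_IN_BYTE[(x >>> 8) & 0xFF]
                + BITS_IN_BYTE[(x >>> 16) & 0xFF]
                + BITS_IN_BYTE[(x >>> 24) & 0xFF];
    }

    public static void main(String[] args) {
        int value = 0xF0F01234;
        System.out.println("loop: " + loopCount(value) + ", table: " + tableCount(value));
    }
}
```

Which version is faster on a given device is something to measure rather than assume: on a part with a tiny cache, the table also competes with everything else for cache lines.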

Having said that, I'm not saying this because CVM's performance is anything to be embarrassed about. As far as we know, CVM is one of the fastest VMs in this space, if not the fastest. To give you an idea of CVM's performance: a few years back, we benchmarked it against the JavaSE 1.3 client VM on a subset of SPEC JVM98. We had to use a subset because SPEC JVM98 uses deprecated APIs that have been removed from CDC, so we had to do an internal "port" of the benchmark for this comparison. The comparison was done on a PowerPC PowerMac and a Solaris SPARC machine. CVM delivered around 80-90% of JavaSE's performance with only 10% of its static footprint. Bear in mind that this is old data: JavaSE has improved significantly since then, and so has CVM. Note: I'm only sharing this comparison to give you an idea of the level of performance that can be achieved in JavaME. I'm not saying anything about which VM is better. That would be like comparing apples and oranges. More on that later.

So, when people talk about performance, the VM component they usually think of first is the dynamic adaptive compiler, commonly known as the JIT. Below, I'll talk about some performance issues around compilation. I'll also touch on other areas that are not JIT related but are important as well.

Static Compilers vs JITs
One classic mistake engineers make is to implement optimizations from their compiler textbook in the JIT without due consideration. While some of those optimization techniques are meaningful, others are not. For one thing, textbooks usually teach about static compilers. JITs, on the other hand, are dynamic compilers. Factor in JavaME's resource constraints, and the contrast between the two becomes even greater. Here's a comparison of the two technologies:

Static compiler | JavaME JIT compiler
Can afford a lot of working memory to do the compilation work. | Must minimize / limit the amount of working memory used.
Can afford more time / CPU cycles to do compilation work. | Must minimize / limit consumption of CPU cycles.
Typically assumes all methods are available to the compiler. | Works with only a subset.
Typically compiles all methods. | Compiles only hot methods.
Must be able to compile all types of code. | Only needs / wants to compile commonly used types of code; lets the interpreter handle the uncommon cases.
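
The "compiles only hot methods" row can be sketched roughly as follows. This is a hypothetical, simplified model, not CVM's actual mechanism: the interpreter bumps a per-method invocation counter, and only methods that cross a threshold are handed to the compiler, so cold code such as one-time initialization never costs compiler time or compiled-code space. The class name and the threshold of 1000 are assumptions made for illustration; real VMs typically use more refined heuristics.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of counter-based hot-method detection. The class name,
 * the threshold value, and the "compile" step are illustrative only; they are
 * not taken from CVM.
 */
public class HotMethodDetector {

    private final int threshold;
    private final Map<String, Integer> invocationCounts = new HashMap<>();
    private final Set<String> compiledMethods = new HashSet<>();

    public HotMethodDetector(int threshold) {
        this.threshold = threshold;
    }

    /** Called by the (simulated) interpreter each time it invokes a method. */
    public void onInvoke(String method) {
        if (compiledMethods.contains(method)) {
            return; // already compiled: no more counting needed
        }
        int count = invocationCounts.merge(method, 1, Integer::sum);
        if (count >= threshold) {
            compile(method);
        }
    }

    public boolean isCompiled(String method) {
        return compiledMethods.contains(method);
    }

    private void compile(String method) {
        // A real JIT would generate native code here; this sketch only records the decision.
        compiledMethods.add(method);
        invocationCounts.remove(method);
    }

    public static void main(String[] args) {
        HotMethodDetector jit = new HotMethodDetector(1000);
        for (int i = 0; i < 5000; i++) {
            jit.onInvoke("Loop.inner");   // hot: called in a tight loop
        }
        jit.onInvoke("Startup.init");     // cold: called once
        System.out.println("Loop.inner compiled: " + jit.isCompiled("Loop.inner"));
        System.out.println("Startup.init compiled: " + jit.isCompiled("Startup.init"));
    }
}
```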

Note how static compilers may assume the availability of more memory and CPU resources. Hence, their compilation techniques may have similar assumptions. Obviously, that means that some of those techniques may not be suitable for JavaME.

Note also that the Java platform is a dynamic environment where it is normal to expect some code to be downloaded at runtime. Very late binding of code is expected in the Java VM and language specifications. This doesn't match the static compiler's assumption that the entire application code base will be available as input to the compilation process.
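
As a small illustration of that late binding, the Java snippet below instantiates a class whose name is only supplied at run time (here via a command-line argument, defaulting to java.util.ArrayList). A real application might instead obtain the name from downloaded content and load the bytes with a network class loader such as java.net.URLClassLoader. The class and method names are illustrative.

```java
/**
 * Illustration of late binding: the class to instantiate is chosen at run
 * time, so no compiler running ahead of time can know which code will execute.
 */
public class LateBinding {

    /** Loads and instantiates a class whose name is only known at run time. */
    static Object instantiate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // In a real application this name might come from a downloaded
        // descriptor, and the class itself might be fetched over the network.
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        Object obj = instantiate(name);
        System.out.println("Created an instance of " + obj.getClass().getName());
    }
}
```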

The JIT's ability to let the interpreter handle uncommon cases also reduces the resources (both compiled code and compiler footprint) that would otherwise be spent compiling code which is not critical to performance.
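
A simplified model of that division of labour: the "compiled" version of a method handles only the common case (every element is an Integer) and bails out to a slower, fully general "interpreter" path when it sees anything else. Real VMs do this at the machine-code level; here both paths are ordinary Java methods with hypothetical names, chosen just to show the control flow.

```java
/**
 * Simplified model of "compile the common case, let the interpreter handle
 * the rest". Both methods are plain Java; the names are hypothetical.
 */
public class UncommonCaseFallback {

    static int fallbacks = 0;

    /** Stand-in for the interpreter: slower, but handles every kind of element. */
    static long interpretSum(Object[] values) {
        long sum = 0;
        for (Object v : values) {
            if (v instanceof Number) {
                sum += ((Number) v).longValue();
            }
            // In this toy example, nulls and non-numbers contribute nothing.
        }
        return sum;
    }

    /** Stand-in for compiled code: only handles arrays containing Integers. */
    static long compiledSum(Object[] values) {
        long sum = 0;
        for (Object v : values) {
            if (!(v instanceof Integer)) {
                // Uncommon case: bail out and let the interpreter redo the whole call.
                // This is safe here because the computation has no side effects.
                fallbacks++;
                return interpretSum(values);
            }
            sum += (Integer) v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(compiledSum(new Object[] {1, 2, 3}));      // common case
        System.out.println(compiledSum(new Object[] {1, 2L, null}));  // uncommon case
        System.out.println("fallbacks: " + fallbacks);
    }
}
```

The compiled path stays small because it never has to contain code for Long, null, or any other rare input.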

Critics may say that the claims in my table above are broad generalizations that may not hold for some state-of-the-art static compilers today. Why, yes, they are. For one, I am assuming that static compilation also comes with static linking. But bear in mind that your compiler textbooks probably won't cover the state of the art either. I am also using a strict definition of static compilation, i.e. I'm expecting it to compile static code. Granted, real-world implementations may have added capabilities that deal with dynamic code (which may be downloaded), but those aren't strictly static compilers anymore. The point here is that you should not apply classic compiler techniques blindly. Those techniques are usually targeted at, and optimized for, a different kind of system (one that does not necessarily resemble the Java platform), and hence may not be suitable here.

Another misconception people may have is that code generated by static compilers will be faster than JITted code. This is not always true. In some cases, JITted code will actually outperform statically compiled code. The key reason lies in the fact that the Java platform is dynamic and that late binding occurs. I'll leave the details of that discussion for another day (it's not a short discussion either).
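
The details are left for another day, but one commonly cited example of the effect is the inline cache. At run time, a virtual call site often only ever sees one receiver class, something a static compiler cannot assume when more classes may be loaded later. The sketch below simulates only the bookkeeping of a one-entry (monomorphic) inline cache; the class names are hypothetical, and it is not a description of how CVM implements calls.

```java
/**
 * Simulated bookkeeping of a monomorphic inline cache at one call site.
 * Hypothetical names; a real VM does this in generated machine code.
 */
public class InlineCacheSketch {

    interface Shape {
        double area();
    }

    static final class Square implements Shape {
        private final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        private final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    /** Models the call site "shape.area()" with a one-entry cache of the receiver class. */
    static final class AreaCallSite {
        private Class<?> cachedClass; // receiver class seen most recently
        int hits;
        int misses;

        double invoke(Shape receiver) {
            if (receiver.getClass() == cachedClass) {
                // Fast path: a JIT could emit a direct (even inlined) call here.
                hits++;
            } else {
                // Slow path: full virtual dispatch, then re-cache the new receiver class.
                misses++;
                cachedClass = receiver.getClass();
            }
            return receiver.area(); // in this simulation both paths make the same Java call
        }
    }

    public static void main(String[] args) {
        AreaCallSite site = new AreaCallSite();
        for (int i = 1; i <= 1000; i++) {
            site.invoke(new Square(i));    // call site stays monomorphic
        }
        site.invoke(new Circle(1.0));      // a new receiver type shows up later
        System.out.println("hits=" + site.hits + " misses=" + site.misses);
    }
}
```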

Hence, there are many reasons why static compilation techniques may not be suitable, even if you discount the resource constraint issue and performance is all you care about.

JavaSE Hotspot vs CVM
OK, so we can't just pick tricks from a compiler book. How about tricks from the JavaSE VM then? The answer is, again, "maybe". First of all, there is the resource constraint issue. JavaSE is tuned for significantly larger systems with a lot more resources. It is entirely reasonable and expected that it will make use of those resources to give you the best performance for your money. But when those resources are not available on your device, those techniques may be a no-go.

Another point that may not be obvious to the average developer is that a JavaME implementation (like CVM) is not just a smaller JavaSE. The devices that JavaSE targets are different beasts from those of JavaME. CVM is not smaller than JavaSE's Hotspot only because it has less functionality. CVM was architected with different design goals in mind to enable it to work well in embedded devices. At each level of its design, a different choice was made for the speed-space tradeoff. For this reason, techniques used in JavaSE may not apply to CVM, because they are tuned for a different tradeoff.

To give you a concrete example of how JavaME devices differ from JavaSE ones: some time back, a colleague of mine on the JavaSE side discovered that when he applied a certain technique to improve cache locality, he got a performance gain of about 20% in one benchmark. This caught my attention; 20% is nothing to sneeze at. So, I applied the technique in CVM. To my surprise, the same benchmark yielded a jaw-dropping 70% gain. What happened? The JavaSE run was on a server-class machine with a huge amount of cache (possibly hundreds of KBs, or maybe even a few MBs). I was running mine on an ARM device with only a 32KB cache. With so little cache, the improved locality had a much greater impact.
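
The technique itself isn't named here, so the following is only a generic Java illustration of why locality matters: both methods compute the same sum, but one walks each row array sequentially while the other jumps to a different row array on every access. The 1024x1024 size is an arbitrary choice, and a single timed run like this is only a rough indication; the gap between the two can differ a lot from one machine to another, which is the point of the anecdote.

```java
/**
 * Generic illustration of cache locality (not the specific technique from
 * the anecdote): the same sum computed in two traversal orders.
 */
public class CacheLocality {

    /** Walks each row array sequentially: consecutive accesses hit adjacent memory. */
    static long sumRowMajor(int[][] m) {
        long sum = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                sum += m[i][j];
            }
        }
        return sum;
    }

    /** Jumps to a different row array on every access: poor locality for large matrices. */
    static long sumColumnMajor(int[][] m) {
        long sum = 0;
        int cols = m.length == 0 ? 0 : m[0].length; // assumes a rectangular matrix
        for (int j = 0; j < cols; j++) {
            for (int i = 0; i < m.length; i++) {
                sum += m[i][j];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1024;
        int[][] m = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                m[i][j] = i ^ j;
            }
        }
        long t0 = System.nanoTime();
        long a = sumRowMajor(m);
        long t1 = System.nanoTime();
        long b = sumColumnMajor(m);
        long t2 = System.nanoTime();
        System.out.println("row-major:    " + a + " in " + (t1 - t0) / 1000 + " us");
        System.out.println("column-major: " + b + " in " + (t2 - t1) / 1000 + " us");
    }
}
```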

Hey, but doesn't that demonstrate the exact opposite, i.e. that it's good to import JavaSE techniques into CVM? Well, in this case, it worked out. But what if the JavaSE technique relied on the target device having a large cache, and optimized code to take advantage of that? Applied to CVM, such a technique could actually cause a significant performance degradation, because JavaME devices don't have that much cache.
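
To make the "tuned for a large cache" worry concrete, here is a hypothetical Java sketch of a cache-blocked (tiled) matrix transpose, a standard locality technique whose tile size is normally chosen to suit a particular cache. The sizing heuristic and the 4MB figure are assumptions for illustration. With this rule, a tile picked for a 4MB cache (512x512 ints) has a working set of about 2MB, far more than a 32KB cache can hold, so hard-coding the large-machine value on a small device would defeat the purpose of the technique.

```java
/**
 * Hypothetical example of an optimization tuned to a cache size: a blocked
 * (tiled) matrix transpose. The sizing rule below is an assumption for
 * illustration, not a rule from CVM or JavaSE.
 */
public class TiledTranspose {

    /**
     * Largest power-of-two tile such that one source tile plus one destination
     * tile of ints (roughly 2 * tile * tile * 4 bytes) fits in half the cache.
     */
    static int tileSizeFor(int cacheBytes) {
        long budget = cacheBytes / 2L;
        int tile = 1;
        while (2L * (2L * tile) * (2L * tile) * Integer.BYTES <= budget) {
            tile *= 2;
        }
        return tile;
    }

    /** Transposes the square matrix src into dst, one tile x tile block at a time. */
    static void transpose(int[][] src, int[][] dst, int tile) {
        int n = src.length;
        for (int ii = 0; ii < n; ii += tile) {
            for (int jj = 0; jj < n; jj += tile) {
                int iEnd = Math.min(ii + tile, n);
                int jEnd = Math.min(jj + tile, n);
                for (int i = ii; i < iEnd; i++) {
                    for (int j = jj; j < jEnd; j++) {
                        dst[j][i] = src[i][j];
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        int smallCache = 32 * 1024;        // e.g. the 32KB ARM cache mentioned above
        int largeCache = 4 * 1024 * 1024;  // an assumed server-class cache size
        System.out.println("tile for 32KB: " + tileSizeFor(smallCache));
        System.out.println("tile for 4MB:  " + tileSizeFor(largeCache));

        int n = 8;
        int[][] src = new int[n][n];
        int[][] dst = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                src[i][j] = i * n + j;
            }
        }
        transpose(src, dst, tileSizeFor(smallCache));
        System.out.println("dst[2][5] = " + dst[2][5]); // equals src[5][2]
    }
}
```

The design choice worth noticing is that the tile size is a parameter derived from the target's cache, rather than a constant that silently encodes one machine's hardware.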

Hence, the point is that we should not import techniques from JavaSE blindly either. Note that the above illustration also shows that a JavaSE VM may not actually run faster than CVM on a JavaME device, even if you give it enough RAM (but not system cache) to fit. It would be like trying to power your car with a rocket engine. It sounds like a good idea, but your fuel system won't be able to handle it. And the result is not a faster car, but something that may not move at all.

JavaME is not just a smaller JavaSE. It is a different beast. This is why comparing JavaSE and JavaME may be like comparing apples and oranges.

Benchmarking
But in the end, the best way to know whether a certain optimization will work is to try it out. We have often done that ourselves in the past. There have been ideas that failed and were not incorporated into the code base. One important criterion for deciding whether an optimization gets incorporated is, of course, how much it costs in terms of resource consumption. Another is how much performance it buys you.

To measure performance gains, you will need to run benchmarks of some sort. One common mistake is to run micro-benchmarks that only test the one area improved by the optimization. The issue is that real-world applications probably don't sit in a tight loop exercising that one area of code all day. Hence, benchmarks based on real-world applications are more reliable performance indicators. For JavaME, we like SPEC JVM98, but as mentioned earlier, it won't run on CDC without modification because it uses deprecated methods. Another one we like is GrinderBench by EEMBC.
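
Whatever workload you choose, how you time it matters too. Below is a minimal, hypothetical harness sketch: it runs untimed warm-up iterations first (so a JIT has a chance to compile the hot code) and reports the median of several measured runs, which is less sensitive to outliers than a single run or the mean. It uses Java SE 8 features (lambdas, System.nanoTime()) for brevity; on a CDC 1.1 stack, which is based on the older J2SE 1.4 API level, you would likely use an anonymous Runnable and System.currentTimeMillis() with longer runs instead.

```java
import java.util.Arrays;

/**
 * Minimal timing harness sketch (names are hypothetical): warm-up runs
 * first, then the median of the measured runs.
 */
public class BenchHarness {

    // Results are stored here so the VM cannot treat the workload as dead code.
    static volatile long sink;

    /** Returns the median duration in nanoseconds (upper median for even counts). */
    static long medianNanos(Runnable workload, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            workload.run();
        }
        long[] samples = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            workload.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[measuredRuns / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload; ideally this would drive a realistic application scenario.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i % 10);
            }
            sink = sb.length();
        };
        long median = medianNanos(workload, 20, 11);
        System.out.println("median: " + median / 1000 + " us");
    }
}
```

Note that a harness like this does not solve the micro-benchmark problem described above; it only makes the timing of whatever workload you feed it less noisy.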

If possible, run your benchmark on a JavaME-type device. As noted above, JavaME is different from JavaSE. Benchmarking your changes on a JavaSE-type desktop / server machine will give you an indication of the results your changes will yield, but not necessarily the same results you will get on a JavaME device. Exercise proper engineering discretion.

Performance Elsewhere
In general, the performance of a Java platform is not dictated only by its execution engine (the interpreter and the JIT). The quality of the VM runtime and class libraries also plays a big part. Sometimes, they are even more important than the VM's execution engine. We have typically seen this in graphics/GUI applications that spend most of their time in native code (as opposed to Java code). However, this doesn't mean there's no work for the VM to do here. Strictly speaking, the VM runtime libraries are part of the VM implementation, and need to be optimal as well.

Also, the VM can provide mechanisms that help the class libraries perform better. The two need to cooperate; it's not an either-or thing.

Lastly, there's native code. Some people think their code will always run faster if they implement all the major pieces as native code. This is actually a fallacy. For various reasons, using native code can actually result in worse performance than implementing some or all of the functionality in Java bytecode.

All of these will be discussed in detail at a later date.

What does this mean to you?
Performance is a complex topic, and we are only scratching the surface here. And as I've tried to point out above, things aren't always what they seem. When trying to implement performance enhancements for a JavaME system, it is prudent to always think in the frame of mind of an embedded systems developer. Each optimization technique needs to be evaluated individually for its viability in this space.

Have a nice day. :-)


Comments

  • Hi,
    Thanks very much for writing interesting articles!
    I have a question. I was thinking about the fact that devices keep getting more processing power and more RAM, and I wondered when JavaSE will become a better choice than JavaME/CDC 1.1.
    How much CPU, RAM, cache... do you need?
    Regards,
    Ove

    Posted by: ovjo122 on November 23, 2006 at 07:18 AM

  • Hi Ove, your question is a good one. Unfortunately, I don't have a precise answer for you. The decision is an engineering one, and as I've been trying to point out, engineering decisions are about tradeoffs. Having said that, I can point out a few things to help with that decision. I sense that this could be a lengthy write-up (though I could be wrong) ... so, I'll make it my next blog entry. Please look for that next article later today. Thanks for your question. - Mark

    Posted by: mlam on November 23, 2006 at 08:51 AM

  • Thanks a lot for blogging so much about CVM and the embedded Java stuff. Until now it has been really hard to get information about those embedded JVMs.
    Furthermore, your articles are very interesting, thanks a lot :-)

    Posted by: linuxhippy on November 25, 2006 at 08:42 AM

  • You might want to check out the articles at javolution.org, which have a more detailed explanation of some of the optimizations that can be made, such as altering the collection classes. (aka shareme, http://www.jroller.com/page/shareme/Weblog)

    Posted by: shareme on November 26, 2006 at 04:19 PM

  • What are the java.net people doing, are they sleeping? I reported a broken link in the
    Today on java.net November 26, 2006

    How Come U Don't Call Me Anymore?: What's the best way to make sense of an API? » Read more

    and someone from collab.net mailed me back that it would be fixed, but after five days it is still not fixed and I still see a 404 page. I know this is not the right place to say this, but where should I say it?

    Posted by: javaniraj on November 26, 2006 at 09:37 PM

  • Hi javaniraj.
    Please be patient. This weekend is a 4 day holiday (Thanksgiving) weekend for some people in the US. Perhaps the folks are just away from their computers during this time. Please allow them a little more time to respond. And yes, this is the wrong place to post your request. I'm sorry, but I don't know where the right place to make the request is. I would start with maybe an email to the java.net community manager. See the java.net feedback page for contact info.
    Regards, Mark

    Posted by: mlam on November 26, 2006 at 10:10 PM

  • Regarding when small devices will be powerful enough to run JavaSE, keep in mind that as the devices get more memory and faster processors, JavaSE also gets bigger and requires more of both. The same is true of CDC and CLDC stacks, both of which have grown over the years. You can view this as CDC and CLDC taking advantage of the increased capabilities of the embedded devices they target, as JavaSE for the most part continues to be out of reach (last I heard a headless JavaSE was something like 10x the static and dynamic footprint of CDC).

    Posted by: cjplummer on November 27, 2006 at 03:23 PM




