Performance: Too much of a good thing?
Posted by mlam on November 22, 2006 at 5:52 PM PST
This article continues with esoteric knowledge about the phoneME Advanced VM and the JavaME space that developers will need. If you've looked at the phoneME Advanced VM source code, you'll have noticed that many function and data structure names are prefixed with CVM. CVM is the informal name of Sun's CDC VM, and prefixing labels (especially global functions and data structures) with CVM is a standard coding convention in this code base. This is probably common knowledge to most people who already work with Sun's CDC technology, but I thought I'd mention it just in case. Plus, now I can simply say CVM instead of phoneME Advanced VM. So, on to this entry's topic ...

Performance

Having said that, I want you to know that I am not saying this because CVM's performance is anything to be embarrassed about. As far as we know, CVM is one of the fastest VMs in this space, if not the fastest. To give you an idea of CVM's performance: a few years back, we benchmarked it against the JavaSE 1.3 client VM on a subset of SPEC JVM98. We had to use a subset because SPEC JVM98 uses deprecated APIs which have been removed from CDC, so we had to do an internal "port" of the benchmark for this comparison. The comparison was done on a PowerPC PowerMac and a Solaris SPARC machine. CVM achieved around 80-90% of JavaSE's performance with only 10% of its static footprint. Bear in mind that this is old data: JavaSE has improved significantly since then, and so has CVM.

Note: I'm only sharing this comparison to give you an idea of the level of performance that can be achieved in JavaME. I'm not saying anything about which VM is better. That would be like comparing apples and oranges. More on that later.

So, when we talk about performance, one of the VM components that people think of first is the dynamic adaptive compiler, also commonly known as the JIT. Below, I will talk about some performance issues around compilation.
I will also touch on other areas and topics that are not JIT-related but are important as well.

Static Compilers vs JITs
Note how static compilers may assume the availability of more memory and CPU resources, and their compilation techniques may carry the same assumptions. Obviously, that means some of those techniques may not be suitable for JavaME. Note also that the Java platform is a dynamic environment where it is normal for some code to be downloaded at runtime. Very late binding of code is expected by the Java VM and language specifications. This doesn't match the static compiler's assumption that the entire application code base will be available as input to the compilation process. The JIT's ability to let the interpreter handle execution of uncommon cases also reduces the resources (both compiled code size and compiler footprint) spent on compiling code that is not critical to performance.

Critics may say that I'm making broad generalizations in my table above that may not be true of some state-of-the-art static compilers today. Why, yes, I am. For one, I am assuming that static compilation also comes with static linking. But bear in mind that your compiler textbooks will probably not cover the state of the art either. I am also using a strict definition of static compilation, i.e. I expect it to compile static code. Granted, real-world implementations may have added capabilities to deal with dynamic code (which may be downloaded), but those aren't strictly static compilers anymore.

The point here is that you should not apply classic compiler techniques blindly. Those techniques are usually targeted and optimized for a different kind of system (one that does not necessarily resemble the Java platform), and hence may not be suitable here.

Another misconception that people may have is that code generated by static compilers will be faster than JITted code. This is not always true. In some cases, JITted code will actually outperform statically compiled code.
The key reason for this lies in the fact that the Java platform is dynamic and that late binding occurs. I'll leave the details of that discussion for another day (it's not a short discussion either). Hence, there are many reasons why static compilation techniques may not be suitable, even when you discount the resource constraint issue and performance is all that you care about.

JavaSE HotSpot vs CVM

Another point that may not be obvious to the average developer is that a JavaME implementation (like CVM) is not just a smaller JavaSE. The devices that JavaSE targets are different beasts from those of JavaME. CVM is not smaller than JavaSE's HotSpot merely because it has less functionality. CVM was architected with different design goals in mind to enable it to work well in embedded devices. At each level of its design, a different choice was made for the speed-space tradeoff. For this reason, techniques used in JavaSE may not apply in CVM, because they are tuned for a different tradeoff.

To give you a concrete example of how JavaME devices differ from JavaSE ones: some time back, a colleague of mine from the JavaSE side discovered that when he applied a certain technique to improve cache locality, he was able to get a performance gain of about 20% in one benchmark. This caught my attention. 20% is nothing to sneeze at. So, I applied the technique in CVM. To my surprise, the same benchmark yielded a jaw-dropping 70% gain in performance. What happened? The difference was that the JavaSE run was on a server-class machine with a huge amount of cache (possibly hundreds of KBs or even a few MBs). I was running mine on an ARM device with only a 32KB cache, so the improved cache locality had a much greater impact there.

Hey, but doesn't that demonstrate the exact opposite, that it's good to import JavaSE techniques into CVM? Well, in this case, it worked out.
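The post doesn't name the locality technique involved, so the sketch below is only a generic illustration of cache locality, not that technique. It sums the same 2D int array twice: once walking each row's contiguous int[] in order, and once jumping across rows column by column. Both do identical arithmetic, but on a device with a small data cache the column-by-column walk is expected to be noticeably slower. The class and method names are hypothetical, and the single timed pass after a short warm-up is a rough indication rather than a real benchmark. It uses System.currentTimeMillis() because System.nanoTime() only arrived in J2SE 5.0.

```java
// Generic cache-locality illustration with hypothetical names;
// NOT the unnamed technique described in the post.
public class CacheLocalityDemo {

    // Row by row: each row is one contiguous int[], so consecutive reads
    // hit adjacent memory and mostly stay within already-loaded cache lines.
    static long sumRowMajor(int[][] m) {
        long sum = 0;
        for (int r = 0; r < m.length; r++) {
            int[] row = m[r];
            for (int c = 0; c < row.length; c++) {
                sum += row[c];
            }
        }
        return sum;
    }

    // Column by column: every read lands in a different row array, so a
    // small data cache keeps evicting lines before they are reused.
    // Assumes a rectangular, non-empty array.
    static long sumColumnMajor(int[][] m) {
        long sum = 0;
        int cols = m[0].length;
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < m.length; r++) {
                sum += m[r][c];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1024; // 1024 x 1024 ints = 4 MB, far larger than a 32KB cache
        int[][] m = new int[n][n];
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) {
                m[r][c] = r ^ c;
            }
        }

        // Untimed warm-up so both methods have a chance to be JIT-compiled.
        long check = 0;
        for (int i = 0; i < 3; i++) {
            check += sumRowMajor(m) + sumColumnMajor(m);
        }

        long t0 = System.currentTimeMillis();
        long a = sumRowMajor(m);
        long t1 = System.currentTimeMillis();
        long b = sumColumnMajor(m);
        long t2 = System.currentTimeMillis();

        System.out.println("row-major:    sum=" + a + ", " + (t1 - t0) + " ms");
        System.out.println("column-major: sum=" + b + ", " + (t2 - t1) + " ms");
        System.out.println("(warm-up checksum " + check + ")");
    }
}
```

The size of the gap depends on the cache of the machine you run it on, which is exactly why the same change can give 20% on one machine and 70% on another.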
But what if the JavaSE technique was one that relied on the target device having a large cache, and optimized code to take advantage of that? Applied to CVM, such a technique may actually cause a significant performance degradation, because JavaME devices don't have that amount of cache. Hence, the point is that we should not import techniques from JavaSE blindly either.

Note that the above illustration also shows that a JavaSE VM may not actually run faster than CVM on a JavaME device, even if you give it enough RAM (but not system cache) to fit. It would be like trying to power your car with a rocket engine. It sounds like a good idea, but your fuel system won't be able to handle it. The result is not a faster car, but something that may not move at all. JavaME is not just a smaller JavaSE. It is a different beast. This is why comparing JavaSE and JavaME may be like comparing apples and oranges.

Benchmarking

To measure performance gains, you will need to run benchmarks of some sort. One common mistake that people make is to run micro-benchmarks that only test the one area improved by the optimization. The issue here is that real-world applications would probably not sit in a tight loop exercising that one area of code all day. Hence, benchmarks based on real-world applications are more reliable performance indicators. For JavaME, we like SPEC JVM98, but as indicated earlier, it won't run on CDC without modification because it uses deprecated methods. Another one that we like is GrinderBench by EEMBC.

If possible, try to run your benchmark on a JavaME-type device. As indicated above, JavaME is different from JavaSE. Benchmarking your changes on a JavaSE-type desktop or server machine will give you an indication of the results your changes will yield, but not necessarily the same results you will get on a JavaME device. Exercise proper engineering discretion.
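Whatever workload you choose, a few mechanics matter when timing code on a VM with a JIT. The sketch below is a hypothetical minimal harness (the SimpleHarness class, the Workload interface and the run counts are made up for illustration and are not part of SPEC JVM98 or GrinderBench). It runs the workload a few times untimed so hot methods have a chance to be compiled, times several further runs, reports the median, and folds every result into a static field, which makes it harder for the JIT to discard the work as dead code. It sticks to J2SE 1.4-level APIs (no generics, no System.nanoTime()), which is closer to what CDC provides.

```java
import java.util.Arrays;

// Hypothetical harness; names and run counts are illustrative only.
public class SimpleHarness {

    // A unit of work to time. run() returns a value so the caller can
    // consume it instead of letting the work look dead to the JIT.
    interface Workload {
        long run();
    }

    // Every result is accumulated here so no run's work goes unused.
    static long sink;

    // Runs the workload warmupRuns times untimed, then timedRuns times
    // (timedRuns must be >= 1), and returns the median elapsed time in ms.
    static long medianMillis(Workload w, int warmupRuns, int timedRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            sink += w.run();
        }
        long[] times = new long[timedRuns];
        for (int i = 0; i < timedRuns; i++) {
            long start = System.currentTimeMillis();
            sink += w.run();
            times[i] = System.currentTimeMillis() - start;
        }
        Arrays.sort(times);
        return times[timedRuns / 2];
    }

    public static void main(String[] args) {
        // Stand-in workload; in practice, plug in something that
        // resembles a real application.
        Workload w = new Workload() {
            public long run() {
                StringBuffer sb = new StringBuffer();
                for (int i = 0; i < 20000; i++) {
                    sb.append(i % 10);
                }
                return sb.toString().hashCode();
            }
        };
        long median = medianMillis(w, 5, 9);
        System.out.println("median: " + median + " ms (sink=" + sink + ")");
    }
}
```

A harness like this does not fix the micro-benchmark problem described above; it only makes the timing of whatever you plug into it less noisy. The workload itself should still look like a real application, and the numbers that count are the ones measured on the target device.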
Performance Elsewhere

Also, the VM can provide mechanisms that help the class libraries perform better. The two need to cooperate; it's not an either/or thing.

Lastly, there's the matter of native code. Some people think that their code will always run faster if they implement all the major pieces as native code. This is actually a fallacy. For various reasons, using native code can actually result in worse performance than implementing some or all of the functionality in Java bytecodes. All of these will be discussed in detail at a later date.

What does this mean to you? Have a nice day. :-)