Mark Lam's Blog

February 2007 Archives


JIT Performance: Defying Physics?

Posted by mlam on February 21, 2007 at 05:56 PM | Permalink | Comments (6)

A few days ago, I came across a few blog entries that referenced my previous article. They are: When is software faster than hardware? by Matthew Schmidt, and Can JIT'ed Code be Faster than Hardware Accelleration by Kirk Pepperdine. These blog entries had received some comments that I thought deserved a response. So below, I will try to address issues raised in some of those comments, as well as provide an intuitive understanding of why you would expect a JIT to outperform a JPU.

Resources: When is Software faster than Hardware?, Software Territory: Where Hardware can't go!

Let's start with ...

Physics Shmeesics
How can software possibly run faster than hardware? I've made this statement myself numerous times in the past. What was I thinking when I made that statement? Well, basically, the thought goes: if a piece of software is running on some given hardware, the performance of that software is ultimately gated by the hardware it runs on. This can be illustrated with an analogy as follows ...

Continue Reading...



Software Territory: Where Hardware can't go!

Posted by mlam on February 16, 2007 at 02:27 AM | Permalink | Comments (15)

In response to my previous article, some folks have been asking about the JIT optimizations I listed, as well as a lot of other interesting questions. I'm not sure I can address all of the questions here. But on the topic of JIT optimizations, I can provide more insight on what they are as well as why hardware cannot implement them.

Before I get started, just to be clear, I'm not personally against hardware Java processors. I certainly think that they fit nicely in some domains. I am also not against any vendors who make Java processors out there. I applaud them for serving the needs of a market that a JIT may not fit. Also, just because a JIT fits doesn't mean that it is always the best solution to deploy. In a previous article, I made the case that engineering decisions should always be made on a case-by-case basis. A "one size fits all" mentality can work, but may not always yield the best solution.

However, I do want to debunk the myth that a hardware processor can be faster than an optimizing JIT. But, of course, the JIT isn't free. There is some cost to it in terms of CPU cycles and memory, though it is often a lot less than most people believe. I will address the JIT cost issue in a future article. For today, let's look at JIT optimizations. Since I work on the phoneME Advanced VM for CDC (aka CVM), along the way, I'll point out if these optimizations are available in CVM as it exists today (for those who are interested in CVM details).

Resources: When is Software faster than Hardware?

JIT Optimizations
In my last entry, I rattled off a random list of JIT compiler optimizations. The list is by no means comprehensive, nor is it necessarily indicative of the most desirable optimizations to have in a JIT. Previously, I have explained how more performance isn't always a good thing. Each optimization comes with a cost of some sort. The VM/JIT engineer must weigh the cost against the benefits in choosing to include or leave out an optimization. That said, let's go over the optimizations I've already mentioned as examples to illustrate why a JIT has the advantage over Java processors when performance is the criterion of comparison.

The list again is:

  1. inlining
  2. constant folding
  3. loop unrolling
  4. loop invariant hoisting
  5. common subexpression elimination
  6. use of intrinsic methods
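
To make the list a little more concrete, here is a rough sketch in Java of what several of these transformations amount to when applied by hand. It is only an illustration: the class and method names are made up for this example, it is not taken from CVM, and a real JIT performs these rewrites on its internal representation rather than on source code. Intrinsic methods are not shown because they depend on what a particular VM chooses to recognize.

```java
// Hypothetical illustration only; names are invented for this sketch.
final class JitOptimizationSketch {

    static int square(int x) {
        return x * x;
    }

    // The code as a programmer might naturally write it.
    static int naive(int[] data, int a, int b) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            int scale = 60 * 60 * 24;         // constant expression
            int offset = a * b + square(a);   // does not depend on i
            sum += data[i] * (a * b) + offset + scale; // a * b appears again
        }
        return sum;
    }

    // Roughly what the same method looks like after the optimizations.
    static int optimized(int[] data, int a, int b) {
        final int scale = 86400;        // constant folding: 60 * 60 * 24
        final int ab = a * b;           // common subexpression elimination
        final int offset = ab + a * a;  // inlining of square(a), hoisted out of the loop
        final int n = data.length;      // loop invariant hoisting
        int sum = 0;
        int i = 0;
        // Loop unrolling: two elements per iteration, fewer branches.
        for (; i + 1 < n; i += 2) {
            sum += data[i] * ab + offset + scale;
            sum += data[i + 1] * ab + offset + scale;
        }
        // Handle the leftover element when n is odd.
        for (; i < n; i++) {
            sum += data[i] * ab + offset + scale;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(naive(data, 2, 3) == optimized(data, 2, 3));
    }
}
```

Both methods compute the same result. The difference is that the second one does far less work per element, and each of these rewrites depends on seeing more than one bytecode at a time.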

Continue Reading...



When is Software faster than Hardware?

Posted by mlam on February 13, 2007 at 02:44 AM | Permalink | Comments (6)

I decided that I'll take a break from the bug fix track that I've been on, and have a little diversion to spice things up. I'll resume the bug fix (and JIT internals) discussion soon. For today, I would like to clarify a common misconception that hardware Java processors are faster than dynamic adaptive compilers / just-in-time compilers (i.e. JITs). I'll take you through some analysis to prove my point. The analysis will be based on examples from the phoneME Advanced VM for CDC (aka CVM), but this reasoning should apply to other VMs as well. Let's dive in ...

Hardware Acceleration
Hardware acceleration is a technique that is commonly employed to get better software performance in terms of speed. This approach has been successful with graphics, sound, and DSP processing. In those cases, the hardware acceleration offloads the graphics, sound, and DSP work onto co-processors and frees up the main CPU to do other stuff. This parallelism is one reason we get improved performance out of hardware accelerators.

Another reason is that the hardware accelerators can provide special instructions that do work that is traditionally done by software routines. Of course, these special instructions are specific to the types of algorithms (i.e. graphics, sound, DSP) that use them. Hence, if your application doesn't do much graphics, sound, and/or DSP, then such hardware accelerators won't be able to make your application run any faster.

Due to the known success of these hardware accelerators in their respective applications, we have come to generalize this success to think that all hardware acceleration will beat software solutions. In the case of Java processors in comparison to JITs, this generalization turns out to be untrue.

Java Processors
The Java VM specification comes with its own instruction set. Some of these instructions look a lot like those one would find in a typical CPU's instruction set. Hence, the idea is that by adding the Java VM's instruction set to a CPU's instruction decoder, one can improve the performance of Java code execution. This observation is valid. However, the misconception is that this hardware acceleration will also out-perform or even match VM JITs. In the case of modern JIT compilers, a hardware Java processor will be hard-pressed to beat the performance of a JIT. Note: In the following discussion, I will abbreviate the hardware Java processor simply as JPU for brevity.
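
As a small illustration of how closely some Java VM instructions resemble ordinary CPU instructions, consider the trivial method below. The class name is made up for this sketch. The bytecodes in the comments are what a standard javac typically emits for such a method; you can check your own compiler's output with javap -c.

```java
// Hypothetical example class; only the bytecode listing matters here.
final class BytecodeSketch {

    // Typical bytecode for this static method:
    //   iload_0    // push the first int argument onto the operand stack
    //   iload_1    // push the second int argument
    //   iadd       // pop both, push their sum
    //   ireturn    // return the int on top of the stack
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

Apart from using an operand stack instead of registers, iadd does the same job as an integer add instruction on a conventional CPU, which is what makes decoding it directly in hardware look attractive.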

Disclaimer: I am not commenting on the quality of any specific hardware Java processor implementations in the market, but merely looking at this issue from a purely theoretical viewpoint.

OK, now let's look at a specific example ...

Continue Reading...



Bug Fix Part III: List of Changes

Posted by mlam on February 02, 2007 at 01:41 AM | Permalink | Comments (0)

The problem with having a real job is that I don't always have time to blog. =p And I am also looking forward to wrapping up this thread of discussion so that I can move on to some other topics as well. Unfortunately, because of the sheer amount of information, it will take a few more entries. While I'm still very busy, this discussion will never end if I keep putting it off. So, here's a bit more for today. Let's dive in ...

Map of CVM JIT Architecture

Resources:
1. Part I: A Field Get Experience
2. Part II: In a bit of a Volatile Fix!
3. the JIT Architecture Map (1024x768 JPG format)
4. the JIT Architecture Map (PDF format)

By the way, I call this entry Part III (as opposed to Part IV) because I didn't count my last entry on the JIT Architecture Map as being directly about this bug fix (though its content is relevant).

Status of the fix
As of 2 weeks ago (relative to this writing), I have actually completed this fix and committed the changes to the phoneME Advanced repository already. You should be able to find the details if you check the repository's commit log for revision 1270. I'll discuss that revision log here.

Summary of changes
Here is the summary of changes exactly as it appears in the revision log:

  1. Fix for CR#5080490:
     Added support for compiling with volatile 64bit field accesses (includes
     potential volatile 64bit field accesses due to unresolved CP entries).
     The IR is changed to mark the fieldref nodes with a VOLATILE flag when
     appropriate.  The JIT backend is changed to emit calls to helpers for the
     cases of potential or known 64-bit volatile field accesses.  The current
     implementation uses CCM helper functions to achieve field atomicity in the
     same way that the interpreter does it i.e. using a microlock.
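
To give a feel for the idea behind the helper approach, here is a minimal sketch written in Java rather than in the VM's own implementation language. Everything in it is an assumption made for illustration: the class, the helper names, and the single shared lock object standing in for the microlock are all hypothetical and are not the actual CCM helpers. The 64-bit field is modeled as two 32-bit halves to mimic a platform that cannot load or store 64 bits in one atomic operation.

```java
// Hypothetical sketch; not the real CVM helper code.
final class Volatile64Sketch {

    // Stand-in for the VM's microlock: one small lock shared by all
    // 64-bit volatile field accesses.
    private static final Object MICROLOCK = new Object();

    // The 64-bit field, stored as two 32-bit words.
    private int hi;
    private int lo;

    // Helper the compiled code would call for a 64-bit volatile field read.
    static long getField64Volatile(Volatile64Sketch obj) {
        synchronized (MICROLOCK) {
            // Both halves are read while holding the lock, so a reader
            // never sees half of an old value and half of a new one.
            return ((long) obj.hi << 32) | (obj.lo & 0xFFFFFFFFL);
        }
    }

    // Helper the compiled code would call for a 64-bit volatile field write.
    static void putField64Volatile(Volatile64Sketch obj, long value) {
        synchronized (MICROLOCK) {
            obj.hi = (int) (value >>> 32);
            obj.lo = (int) value;
        }
    }
}
```

The trade-off in this style of fix is that every such access pays for a helper call and a lock, in exchange for not needing any platform-specific atomic 64-bit instructions in the emitted code.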

The above essentially describes the strategy that I used to fix this bug. I will discuss this in greater detail in my next entry. There were also a few other related items that I took care of while working on this fix:

Continue Reading...




