


Mark Lam

Mark Lam's Blog

Software Territory: Where Hardware can't go!

Posted by mlam on February 16, 2007 at 02:27 AM | Comments (15)

In response to my previous article, some folks have been asking about the JIT optimizations I listed, as well as a lot of other interesting questions. I'm not sure I can address all of the questions here. But on the topic of JIT optimizations, I can provide more insight on what they are as well as why hardware cannot implement them.

Before I get started, just to be clear, I'm not personally against hardware Java processors. I certainly think that they fit nicely in some domains. I am also not against any vendors who make Java processors out there. I applaud them for serving the needs of a market that a JIT may not fit. Also, just because a JIT fits doesn't mean that it is always the best solution to deploy. In a previous article, I've made the case that engineering decisions should always be made on a case by case basis. A "one size fits all" mentality can work, but may not always yield the best solution.

However, I do want to debunk the myth that a hardware processor can be faster than an optimizing JIT. But, of course, the JIT isn't free. There is some cost to it in terms of CPU cycles and memory, though it is often a lot less than most people believe. I will address the JIT cost issue in a future article. For today, let's look at JIT optimizations. Since I work on the phoneME Advanced VM for CDC (aka CVM), along the way, I'll point out if these optimizations are available in CVM as it exists today (for those who are interested in CVM details).

Resources: When is Software faster than Hardware?

JIT Optimizations
In my last entry, I rattled off a random list of JIT compiler optimizations. The list is by no means comprehensive, nor is it necessarily indicative of the most desirable optimizations to have in a JIT. Previously, I have explained how more performance isn't always a good thing. Each optimization comes with a cost of some sort. The VM/JIT engineer must weigh the cost against the benefits in choosing to include or leave out an optimization. That said, let's go over the optimizations I've already mentioned as examples to illustrate why a JIT has the advantage over Java processors when performance is the criterion of comparison.

The list again is:

  1. inlining
  2. constant folding
  3. loop unrolling
  4. loop invariant hoisting
  5. common subexpression elimination
  6. use of intrinsic methods

Inlining
Consider this example:

    public class MyProperty {
        protected int value;
        public int getValue() {
            return value;
        }
    }

    public class User {
        public void doStuff(MyProperty p) {
            System.out.println(p.getValue());
        }
    }

This example shows a common coding pattern in the Java programming language, i.e. the use of getter/setter methods to access non-public data. This is done to achieve better encapsulation. We use getter methods like getValue() because accessing fields like value directly would introduce a whole slew of software engineering problems which I won't go into here.

While using a getter method is good for encapsulation, it is bad for performance because you incur the cost of a method call. The cost of a method call includes pushing arguments (e.g. the this pointer), setting up and tearing down a stack frame for the target method (getValue() in this case), and popping the return value off the stack. Inside the target method, there's also the added cost of more pushing and popping of operands and results. In this trivial example, we only need to push the result inside getValue(). In a more complex example, there can be other costs not shown here. This cost adds up to anywhere from tens to 100+ machine instructions. Note that these instructions are all method overhead. The getValue() method will still have to do the real work of accessing the field, which can take as little as 2 machine instructions.

To deal with this, when compiling doStuff(), a JIT compiler would inline the call to getValue() to effectively get the following code:

        public void doStuff(MyProperty p) {
            System.out.println(p.value);
        }

In so doing, you still get the benefits of encapsulation (in the source code, at development time) for good software engineering practice, but still get optimal performance (at runtime) as if you accessed the field directly. All the cost of method invocation is removed. The access to p.value takes less than 10 instructions. Compare this with the extra 10s to 100+ instructions to do the method invocation.

Note that I said earlier that the getValue() could do the work of accessing the field in as little as 2 instructions. But in the inlined case, I said it will take less than 10 instructions. Why the discrepancy?

Well, getValue() is a virtual method. Hence, there may be some added cost to check if we're actually going to end up invoking MyProperty.getValue() as opposed to an overriding method in a subclass. This is the reason for the 10 or so instruction estimate. However, in the case where this method is not overridden, the JIT can truly optimize this down to the minimum 2 instructions.
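To make that check a little more concrete, here is a rough sketch of it written as Java source. This is only an illustration: a real JIT does this on the code it generates, not on source, and the form of the guard varies from one VM to another. The constructor on MyProperty, the DoubledProperty subclass, and the GuardedInlineSketch class are all hypothetical names I made up for the example.

```java
// Hypothetical sketch of guarded inlining, written as Java source for illustration.
// A real JIT applies this to generated code, and its guard may look different.
class MyProperty {
    protected int value;
    MyProperty(int value) { this.value = value; }   // constructor added for the demo
    public int getValue() { return value; }
}

// Hypothetical subclass that overrides the getter, so the inlined
// field read is not valid for every receiver.
class DoubledProperty extends MyProperty {
    DoubledProperty(int value) { super(value); }
    @Override public int getValue() { return value * 2; }
}

class GuardedInlineSketch {
    // What the programmer wrote: an ordinary virtual call.
    static int viaCall(MyProperty p) {
        return p.getValue();
    }

    // Roughly what the compiled code does after inlining MyProperty.getValue():
    // a cheap type check guards the inlined field read, and any other
    // receiver type falls back to the real virtual call.
    static int viaGuardedInline(MyProperty p) {
        if (p.getClass() == MyProperty.class) {
            return p.value;      // inlined body of MyProperty.getValue()
        }
        return p.getValue();     // slow path: normal virtual dispatch
    }

    public static void main(String[] args) {
        System.out.println(viaGuardedInline(new MyProperty(21)));
        System.out.println(viaGuardedInline(new DoubledProperty(21)));
    }
}
```

When no overriding subclass exists, the guard itself can be dropped, which is the 2 instruction case described above.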

I pointed out the added complexity of dealing with virtual methods because I want you to understand that there is more to doing inlining correctly than meets the eye. There are many other details to the implementation of inlining that I can't go into here.

Hardware Method Invocation
Now, let's consider the Java processor (JPU). When executing doStuff(), the JPU will encounter an invokevirtual bytecode where it tries to call getValue(). By definition, the JPU will treat the invokevirtual bytecode as its machine instruction and execute it. However, the JPU won't know how a VM structures its stack. Hence, it will need to trap to software to do all the work that I pointed out above as overhead.

One might argue that a really advanced JPU will define the stack structure and the VM software will just have to conform to that so that hardware will know how to push and pop a frame itself. But even without the stack issues, there are a few other things that make it really hard for the JPU to do a method invocation purely in hardware.

For one, the invokevirtual bytecode specifies an index into the class constant pool (CP). The JPU will also need to be able to understand the structure of the CP. But the class constant pool has symbolic references to the method to be invoked. This will need to be resolved first. Resolution will trigger class lookup. In the case of invoking static methods, resolution can trigger classloading, class initialization, garbage collection, and exceptions being thrown. As you can see, invoking a method is not a trivial thing. It would take a seriously advanced and extremely complex JPU to do method invocations in hardware.

Note, you don't actually have to do classloading, garbage collection, etc. in hardware in order to do method invocations in hardware. You just need to be able to find some way to trap to these when the hardware can't handle it. If the JPU can just execute the common invocation cases in hardware (and leave the rest to software), then that's a big win. However, in order to achieve this, it will require that in addition to having to specify the stack structure, the JPU will at least also have to specify a constant pool structure that the hardware can understand.

Using Miraculous Hardware
Now, let's grant you that the hardware designer is relentless and gives you all that. With that, the JPU will still have to execute the method invocation which involves all the overhead I pointed out. Executing it in hardware doesn't mean that the overhead is gone. The work done in the overhead incurs a lot of memory accesses. What is the chance that you will never have a cache miss? And if you have a large enough cache to make cache misses improbable, then what would it take to be able to move multiple words of data (for the arguments, stack frame values, and result) around the cache without incurring multiple machine cycles? Chances are, the number of cycles incurred by the JPU will be non-zero. Now compare that with the JIT where that cost can be 0. There's no beating inlining when it comes to performance.

If you're still an optimist for the JPU, the next thing you may ask is if we can have the JPU do inlining too. But remember what I said about having to do a check in some cases when we're dealing with inlining virtual method calls (not to mention the other complexities that I did not talk about)? It will be a whole lot of extra work to be able to handle all those cases in hardware.

Yes, theoretically, anything you can do in software, you can also do in hardware. But doing it in hardware is significantly more difficult and costly (in terms of hardware design, manufacturing, etc.) than a software solution. So, a real world JPU would probably trap to software to do method invocation. At best, it can do something to help the software do less work, but not reduce the work to 0 as a JIT can in this case.

Inlining is available in the CVM JIT.

Constant Folding
Consider this example:

    public class O1 {
        public static final int OFFSET = 5;
    }

    public class O2 {
        public static final int OFFSET = 3;
    }

    public class MyClass {
        int calcValue(int v1, int v2) {
            return (v1 + O1.OFFSET) + (v2 + O2.OFFSET);
        }
    }

The JIT can effectively compile calcValue() into:

        int calcValue(int v1, int v2) {
            return (v1 + v2 + 8);
        }

Constant folding is basically an optimization where we fold the constants together to reduce the amount of work that needs to be done to compute a result. In this case, the JIT takes advantage of the algebraic properties of addition and pre-adds the 2 constants together at compile time instead of having to add them every time this method is called. Hence, only 2 add operations are needed when the method is called.

A JPU by definition will execute its instructions, which are the bytecodes. In this case, the bytecodes for calcValue() will include pushing 2 constants and doing 3 additions. With the possible hardware feature where the top N operands of the stack are mirrored in registers, the JPU can avoid some of the pushing and popping cost. However, it still needs to initialize the values of those registers. Compared to the JIT, the JPU will incur these additional register initialization costs plus one extra addition. The JIT can not only eliminate the add, but also encode the constant (in this case, the value 8), if it is not too big, into one of the add instructions. This allows it to avoid the register initialization altogether.
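To see the difference in the number of additions, here is a toy stack machine that executes a program verbatim and counts the adds it performs. It is only a sketch: the opcodes, their encoding, and the ToyStackMachine class are invented for this illustration and are not real JVM bytecode or any actual JPU design.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy stack machine for illustration only. The opcodes and encoding are made up;
// this is not JVM bytecode and does not model any real Java processor.
class ToyStackMachine {
    static final int LOAD = 0;   // push argument number <operand>
    static final int CONST = 1;  // push the literal <operand>
    static final int ADD = 2;    // pop two values, push their sum (operand unused)

    int addsExecuted = 0;

    // The program is a flat array of (opcode, operand) pairs, executed as given.
    int run(int[] program, int... args) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int pc = 0; pc < program.length; pc += 2) {
            int op = program[pc];
            int operand = program[pc + 1];
            switch (op) {
                case LOAD:
                    stack.push(args[operand]);
                    break;
                case CONST:
                    stack.push(operand);
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    addsExecuted++;
                    break;
                default:
                    throw new IllegalArgumentException("bad opcode " + op);
            }
        }
        return stack.pop();
    }

    // (v1 + 5) + (v2 + 3): two constant pushes and three adds.
    static final int[] VERBATIM = {
        LOAD, 0, CONST, 5, ADD, 0,
        LOAD, 1, CONST, 3, ADD, 0,
        ADD, 0
    };

    // v1 + v2 + 8: what is left after folding the two constants.
    static final int[] FOLDED = {
        LOAD, 0, LOAD, 1, ADD, 0,
        CONST, 8, ADD, 0
    };
}
```

Running both programs on the same arguments gives the same result, with 3 adds counted for VERBATIM and 2 for FOLDED. The toy machine still pushes the 8 as a separate step, whereas native code could carry it as an immediate operand of the add.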

OK, you may ask: won't javac be smart enough to do the constant folding when the Java source code is compiled into bytecode? Maybe. I didn't check. In practice, constant folding usually becomes more meaningful when used in conjunction with inlining. Inlining may yield opportunities for constant folding that don't exist at the source level. For example:

    int adjustValue(int value) {
        return value + 5;
    }

    int adjustMore(int value) {
        return adjustValue(value) + 3;
    }

After inlining adjustValue() into adjustMore(), the JIT can also fold the constants as follows:

    int adjustMore(int value) {
        return value + 8;
    }

Some types of constant folding are available in the CVM JIT. In practice, constant folding has not yielded a lot of performance gains in real world benchmarks. Accordingly, we didn't put a lot of effort into implementing every possible type.

Loop Unrolling
Consider this example:

    int a = ... // some value.
    for (int i = 0; i < 3; i++) {
        a = a + i;
    }

The anatomy of the above loop includes the following operations:
     1. initialize the iterator i to 0.
     2. check to see if the iterator has reached the limit (i.e. 3).
     3. execute the addition within the loop.
     4. increment the iterator.
     5. branch back to the top of the loop.

Again, by definition, a JPU will execute the bytecode as its own native instruction set. Since the bytecode basically expresses the above operations, the JPU will execute step 1 once and then steps 2 through 5 three times, plus one final limit check that exits the loop.

With loop unrolling, the JIT can compile the above code fragment into the following:

    int a = ... // some value.
    int i = 0;
    a = a + i;
    i++;
    a = a + i;
    i++;
    a = a + i;

With a little extra smarts, the JIT can further optimize the above to:

    int a = ... // some value.
    a = a + 0;  // i is 0.
    a = a + 1;  // i is 1.
    a = a + 2;  // i is 2.

Add constant folding:

    int a = ... // some value.
    a = a + 3;  // 0 + 1 + 2.

Loop unrolling, in and of itself, works to remove the loop overhead like the branch back to the top of the loop, and possibly the iterator incrementing, as well as the limit check. But when combined with other optimizations, as we can see, the performance gains can be dramatic. It's not possible for the JPU to implement this optimization either because, by contract, the JPU needs to execute the bytecodes as specified.

In practice, loop unrolling is not as trivial as the example shown above. Consider what happens if the loop iterator limit is a variable (as opposed to a constant) that is passed into the method. How many iterations do we unroll the loop into then? Alternatively, what if the limit is a very large constant? Unrolling all the way to the limit can result in some serious code bloat, which in turn reduces cache locality and can hurt performance. What if the code inside the loop can throw an exception, e.g. indexing into an array beyond its bounds? I won't go into the details of how a JIT deals with all these. I just want to point out that there is a lot more complexity to this optimization than is initially apparent.
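For the variable limit case, one standard approach is to unroll by a fixed factor and finish with a small cleanup loop for the leftover iterations. Here is a sketch of that idea in source form. The factor of 4, the UnrollSketch class, and the method names are arbitrary choices for illustration, and I'm not claiming this is what any particular JIT emits.

```java
// Sketch of partial loop unrolling with a cleanup loop, written as Java source.
// The unroll factor (4) and all names here are arbitrary choices for illustration.
class UnrollSketch {
    // Straightforward loop: one limit check and one backward branch per element.
    static int sumRolled(int[] data) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Unrolled by 4: one limit check and one backward branch per 4 elements,
    // followed by a cleanup loop for the 0 to 3 elements left over.
    static int sumUnrolledBy4(int[] data) {
        int sum = 0;
        int n = data.length;
        int limit = n - (n % 4);   // largest multiple of 4 that is <= n
        int i = 0;
        for (; i < limit; i += 4) {
            sum += data[i];
            sum += data[i + 1];
            sum += data[i + 2];
            sum += data[i + 3];
        }
        for (; i < n; i++) {       // cleanup loop for the remainder
            sum += data[i];
        }
        return sum;
    }
}
```

Note that the code size grows with the unroll factor no matter how large the limit is, which is the trade-off against code bloat mentioned above.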

Loop unrolling is not currently available in the CVM JIT. It is not easy to implement, and it is not an important nor cost-effective optimization to implement for the CDC space based on our previous experience. That's not to say that things won't change in the future.

Loop Invariant Hoisting
Consider this example:

    void foo(int[] data) {
        int a = ... // some value.
        for (int i=0; i < data.length; i++) {
            ...;
        }
    }

In the above example, the length of the array is fetched in every iteration of the loop. If the JIT can determine that the array data won't change in length inside the loop, we can hoist the fetching of its length outside of the loop so that we don't incur the cost repeatedly for each iteration. The JIT effectively emits code that does the following:

    void foo(int[] data) {
        int a = ... // some value.
        // pre-fetch the array length into a register:
        int tempReg = data.length;
        for (int i=0; i < tempReg; i++) {
            ...;
        }
    }

This type of optimization is called loop invariant hoisting. In the JIT's case, fetching the array length requires accessing the array's data structure in memory (and memory accesses are expensive). Prefetching it into a register will allow the JIT to avoid this cost on every loop iteration. The JPU on the other hand has to execute the bytecode verbatim. As a result, it will fetch the array length on every loop iteration.

More advanced cases of loop invariant hoisting include interactions with inlining. Let's say the body of the loop invokes some method that gets inlined. If the method happens to perform some operation that is invariant, that operation can be hoisted out of the loop to avoid unnecessary redundant work. This is, of course, not possible for the JPU to implement because of the inlining issues.
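Here is a sketch of that interaction in source form. The helper method offset(), the HoistSketch class, and the arithmetic inside are hypothetical and exist only to give the loop body something invariant to compute.

```java
// Sketch of loop invariant hoisting combined with inlining, written as Java source.
// All names and the arithmetic are hypothetical, chosen only for illustration.
class HoistSketch {
    // The result depends only on scale and bias, never on the loop index.
    static int offset(int scale, int bias) {
        return scale * 31 + bias;
    }

    // As written: the call (and the array length fetch) happens on every iteration.
    static void addOffsetNaive(int[] data, int scale, int bias) {
        for (int i = 0; i < data.length; i++) {
            data[i] += offset(scale, bias);
        }
    }

    // Roughly the effect after inlining offset() and hoisting:
    // the invariant work is done once, before the loop.
    static void addOffsetHoisted(int[] data, int scale, int bias) {
        int len = data.length;           // hoisted array length fetch
        int off = scale * 31 + bias;     // inlined body of offset(), hoisted
        for (int i = 0; i < len; i++) {
            data[i] += off;
        }
    }
}
```

The hoisting only becomes visible once offset() is inlined. As long as it is an opaque call, the compiler has to prove separately that calling it once is equivalent to calling it on every iteration.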

Loop invariant hoisting is not currently available in the CVM JIT. It isn't easy to implement in a generic way. Again, it isn't the most important optimization to have for applications in the CDC space.

Common Subexpression Elimination
Consider this example:

    int a = p.value + p.value;

The bytecodes for the above include 2 fetches of the field value from the object p. Field accesses will result in memory accesses which can be expensive. The JIT recognizes that the above code can be expressed as follows:

    int tempReg = p.value;
    int a = tempReg + tempReg;

In this case, the fetching of the field is a subexpression of the addition expression. The JIT eliminated one subexpression by fetching the field only once and reusing its value as the second operand in the addition. In this case, it saves one memory access. This optimization is called common subexpression elimination (aka CSE). In contrast, a JPU will have to execute the bytecode verbatim and do the field access twice.

The above is only a very simple form of CSE. More complex forms exist, and those take a lot more effort to implement in the JIT. Some block local types of CSE are available in CVM's JIT.
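One reason the more complex forms take more effort is that the compiler has to be sure nothing changes the value between the two uses. The sketch below shows the simple case next to a case where the two field reads are not common subexpressions. The Point class, the bump() method, and the CseSketch class are hypothetical names made up for this example.

```java
// Sketch of when a repeated field read can and cannot be treated as a
// common subexpression. All names here are hypothetical.
class CseSketch {
    static class Point {
        int value;
        Point(int value) { this.value = value; }
    }

    // As written: two reads of p.value.
    static int twiceNaive(Point p) {
        return p.value + p.value;
    }

    // After CSE: one read, reused for both operands.
    static int twiceCse(Point p) {
        int tempReg = p.value;
        return tempReg + tempReg;
    }

    // Writes the field, so it invalidates any cached copy of p.value.
    static void bump(Point p) {
        p.value++;
    }

    // Here the second read must NOT be replaced by the first:
    // bump() changes the field between the two reads.
    static int notCommon(Point p) {
        int first = p.value;
        bump(p);
        int second = p.value;    // has to be reloaded from memory
        return first + second;
    }
}
```

Within a single basic block with no intervening calls or stores, as in twiceCse(), the reuse is easy to justify. Once calls or branches sit between the two uses, more analysis is needed.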

Intrinsic Methods
Consider the following example:

     ...
     time = System.currentTimeMillis();
     ...

The JPU will execute the above as a method invocation to a native method that gets the system's millisecond timer value.

Let's say we have a system in which the millisecond timer is a 64 bit hardware timer/counter that is memory mapped. In other words, software can read from it directly at some address in memory. A JIT can take advantage of this knowledge. Instead of emitting code that invokes the System.currentTimeMillis() method, it emits a single memory load from the location of the hardware timer. The gain here is that we need not incur all the method call overhead, as well as the other costs of invoking a native method (see Beware of the Natives). In other words, the JIT can reduce many hundreds of machine cycles to a single 64-bit memory access.

This optimization is called intrinsifying the method, or using intrinsic methods. The idea is basically that there are certain standard library methods that the JIT knows the semantics/behavior of. This special knowledge allows the JIT to emit code that implements the semantics of the method without doing an actual method call, or alternatively, to do the method call in a less expensive manner.

Intrinsics are also one way that the JIT can make use of special hardware features instead of calling a software method. For example, Math.cos() can be replaced with a cos instruction if the hardware provides such a feature.

A JPU can't implement this optimization because it has to execute the invoke bytecode as specified. There's also the hurdle of needing to understand the VM's constant pool structure, and having to deal with resolution, class initialization, etc. that I mentioned earlier. At the least, a JPU cannot afford to implement as many intrinsics (in number and types) as a JIT.

Intrinsics are available in CVM's JIT.

Closing Thoughts
Again, theoretically it is possible to implement any software feature in hardware. However, the cost of doing so makes it impractical, and therefore effectively impossible.

Also, so far, I've been saying that a JPU can't implement all these optimizations because it has to execute the bytecodes verbatim. You might ask: why can't the JPU solution employ some sort of code transformation like the JIT so that it doesn't have to execute bytecode in a simple minded way, i.e. verbatim? Well, if you do that, then what you have is a JIT. Code transformation is what a JIT does. It transforms bytecodes into a form that is optimal for the CPU to execute. Hence, by definition, a JPU (without a JIT) must execute the bytecode verbatim, and consequently, will not be able to implement JIT type optimizations.

Another reminder: the above is only a sampling of possible JIT optimizations. This list is neither exhaustive nor representative of all the most important / cost-effective optimizations that a JIT can implement, though some of these are really important. Inlining is one that yields a lot of performance gain without too much cost when applied in a JIT.

Ok, time to stop. I hope this article helps shed some additional light on this topic. Have a nice day. :-)


BTW, regarding JavaOne, I will probably be there on one or more days. If folks are interested in getting together to have a little technical discussion, I'd be happy to oblige (assuming schedules will allow it).


Comments
Comments are listed in date ascending order (oldest first)

  • Hi,

    Many of the optimizations you present, apart from inlining (which requires knowledge of the runtime environment because new classes could have been loaded), are classical and are already implemented in static compilers like gcc.

    Could we think that Sun, for unknown reasons, had not implemented them in javac to get simpler bytecode? I imagine that the JIT compiler uses pattern matching to decide where it can apply a JIT optimization, and the simpler the bytecode is, the more easily it finds instruction patterns to match.

    If these kinds of optimizations (loop unrolling, constant folding...) had been applied in the bytecode generation by javac, the JPU would have a simpler job. And on other platforms, I think that the JIT compiler would not spend 5% of the CPU time but only 3%, to apply the remaining inlining optimization. JIT compilers would be simpler to write, so use less runtime memory, and Java applications would start more quickly...

    A problem with JIT compilers is that they do the same optimizations again and again, at every new run. If they had some kind of memory of the previous runs, the optimization decisions would be more straightforward.

    Posted by: genepi on February 16, 2007 at 09:47 AM

  • Great and very clear explanation of some difficult subjects!
    About loop unrolling with a variable I understood Java SE can do something like this (which is probably not an option for ME since it increases code size):

    for(int i = 0 ; i < x ; i++) {
        doSomething with i..;
    }

    So the JIT would convert it to something like this which can
    potentially reduce conditional statements by a factor of 3 + 1:
    if(x % 3 == 0) {
        for(int i = 0 ; i < x ; i++) {
            doSomething with i..;
            i++;
            doSomething with i..;
            i++;
            doSomething with i..;
        }
    } else {
        for(int i = 0 ; i < x ; i++) {
            doSomething with i..;
        }
    }


    I also noticed this Sun announcement which gives the best of both worlds Jazelle acceleration and a JIT. Does the CVM support something like this?

    Posted by: vprise on February 16, 2007 at 10:01 AM

  • Dear genepi,

    Yes, most of those are classical optimizations (discussed in most compiler text books). I don't know the reason why javac does not apply these. One reason may be to preserve symbolic information in the bytecodes. But there are other cases where javac could have optimized the bytecodes but did not. I'm sure the javac team had reasons for doing so. I'm just not aware of them.

    But I think you missed an important point: even an optimizing javac can only go so far in applying these classical optimizations. Inlining is determined at runtime, and inlining yields opportunities for applying these classic optimizations where javac could not apply it before. Hence, there can be a good reason to support these in the JIT.

    As for the complexity of JITs and the amount of work they do, there are different tradeoffs made in each JIT implementation (for CLDC, CDC, and JavaSE). It is true that it is possible to write really simple JITs, but such JITs may not yield as much performance. It's all a tradeoff.

    As for caching the JIT output for subsequent runs, that's an argument that proponents of static compilers like to make. It's also based on a static compilation view of the world. The issue here is that JIT output is based on dynamic behavior. There are inherent problems with caching the output. Once mechanisms are put in place to solve all these issues, you essentially end up with a static compiler in the VM. And with that, you will get all the benefits (which you alluded to) and all the disadvantages as well (which usually is not talked about). I will try to talk about these issues one day as it is also another major myth that I would like to debunk. That is not to say that a JIT can't cache some of its output. But again, there is some cost involved, and therefore some trade-offs need to be made.

    Regards, Mark

    Posted by: mlam on February 16, 2007 at 12:24 PM

  • Dear vprise,

    Thanks for your more complete example of how loop unrolling is done. I was aware of it. As for Jazelle support on CVM, I am not at liberty to comment on that. Sorry. If there are parties who are interested in this feature, they should inquire with their Sun sales rep.

    Regards, Mark

    Posted by: mlam on February 16, 2007 at 12:39 PM

  • I would also add devirtualization and speculative devirtualization to the list.

    Posted by: olegpliss on February 16, 2007 at 04:31 PM

  • Dear Mark,
    I would also be very interested in an explanation of the drawbacks of caching JITs. If you could go into that in a future post, that would be great! I recall Symantec had something like that in the early days of Java and I still don't understand why this hasn't caught on. My only guess is that it is due to space constraints in devices and complexity on the PC.

    Posted by: vprise on February 16, 2007 at 10:50 PM

  • Hi Mark

    Thanks for continuing this thread--I love these blog entries, both informative and useful. You write well, please keep it up!

    A question that's interested me for awhile now is the impact of the final keyword in VM optimizations. I believe that early on it was recommended to use final liberally, because it allowed the VM to optimize certain algorithms (e.g. no need to worry about a new subclass appearing). However, I believe that this is no longer recommended best practice--can you comment on how/if final actually helps the CVM in optimizing code?

    Thanks
    Patrick

    Posted by: pdoubleya on February 19, 2007 at 07:37 AM

  • Hi Patrick
    Since Mark didn't answer this, I'll take the liberty. I have no idea whether CVM implements it, but Java SE has a feature where Hotspot automatically marks classes as final. So if at runtime a class is detected to have no subclasses, it is marked as final without much of a performance penalty. A JavaOne presentation from a couple of years back (I forget which) showed that using final doesn't get you much of anything in terms of performance. Maybe this is different for Java ME though.

    Posted by: vprise on February 19, 2007 at 11:11 PM

  • For JavaOne you could grab some time in the java.net pavilion. They've got chairs and a couch -- good for discussion. Or you could take over part of a local bar.

    Posted by: dwalend on February 20, 2007 at 09:55 AM

  • Hi Patrick (pdoubleya),

    Regarding the final keyword, I think vprise is correct. I haven't thought through the issues in great detail, but on the surface, for methods, I don't think the final keyword adds much. The devirtualization and speculative devirtualization optimizations that my colleague Oleg mentioned will make non-final methods look like final ones. CVM implements these optimizations as well (and so does JavaSE Hotspot). Hence, whether you specify final or not, it may not make a big difference.

    In general, my advice to folks who are writing Java code is that they should primarily be concerned with writing good code based on sound software design (i.e. judicious use of OO principles and good design patterns, avoid anti-patterns, etc) instead of worrying about whether the VM or JIT will optimize something or not. You can never be sure what VM your code will be deployed on. So, it's unwise to make tradeoffs based on expected behavior of one VM or another. That's not to say that one should write bad code and just expect the VM to fix its performance problems. But something fine grain like final is probably not going to make a lot of difference. So, use final when your software design intends the method to be final, and not because of any potential performance gains.

    Thanks for the good question.

    Regards, Mark

    Posted by: mlam on February 21, 2007 at 09:54 AM

  • Constant folding is supported by the JLS, since modifying a static final field is not binary compatible.

    Posted by: konrad_schwarz on March 02, 2007 at 06:39 AM

  • Could you comment on the usefulness of the "Jazelle" instruction set extension for ARM?

    Posted by: konrad_schwarz on March 02, 2007 at 06:40 AM

  • Hi Konrad,

    Thanks for your clarification about the JLS. I recall that that was the case, but was too lazy to look it up when I was writing the entry.

    Regarding Jazelle, I'm not sure I am at liberty to comment about details due to various restrictions. However, in this article (as well as the previous and the next), I tried to outline the technical considerations that one would make (from the perspective of constraints in any Java implementation) when considering any JPU. I also talked about the downside as well as some upsides based on the general principles of a JPU. I hope that this will give you (the developer) one side of the information you will need to make an informed decision. Of course, the other bit of information you will need is the JPU's specific features and if/how they overcome some of these constraints. The usefulness of any JPU will depend on these factors as well as how well it addresses your requirements in terms of performance, startup time, memory usage, power conservation, etc. I'll have to leave that determination to the individual developer as requirements can vary.

    Regards, Mark

    Posted by: mlam on March 02, 2007 at 12:59 PM

  • Hi Mark,


    As for caching the JIT output for subsequent runs, that's an argument that proponents of static compilers like to make. It's also based on a static compilation view of the world. The issue here is that JIT output is based on dynamic behavior. There are inherent problems with caching the output. Once mechanisms are put in place to solve all these issues, you essentially end up with a static compiler in the VM. And with that, you will get all the benefits (which you alluded to) and all the disadvantages as well (which usually is not talked about). I will try to talk about these issues one day as it is also another major myth that I would like to debunk. That is not to say that a JIT can't cache some of its output. But again, there is some cost involved, and therefore some trade-offs need to be made.


    Sorry if my thoughts were not correctly interpreted. I don't want to reproduce a static compiler in the JVM. I think that the JVM optimizer could make better decisions if it remembered the context, and not necessarily the compiled result, from previous runs. Much more like profiling information but for runtime use.

    For instance, when it starts, if the JVM can obtain from a previous run the list of hot spot candidate methods, it won't have to wait a few thousand CPU cycles to decide which methods it should optimize in priority.

    I know that we can't naively keep an image of the previous compilation result (or JVM dump) and start a new run from it. There are so many parameters, like static initializers or changes in the classpath/class loader for instance, that make the process more difficult than with a static compiler. But there are also patterns which can be detected, like some final classes for which the JVM could reuse compiled code from the previous run...

    I think the static Intel C++ compiler has such an option. You start your application with a special runtime flag and it generates a profile dump. And then you submit this dump to the compiler, which recompiles and optimizes your application code for real life. So even static compilers need to have dynamic information to do the best optimizations!

    Is there some work done in the JVM team to share dynamic and static information for optimization decisions?

    Posted by: genepi on March 05, 2007 at 11:02 AM

  • Hi genepi,

    Thanks for the clarification. You are correct that such information can be used as compilation hints for the JIT. I am still wary of the potential problem of over-eager compilation though. By over-eager, I mean that compilation may take place before the method has had adequate time to warm up. Warming up in this case, can mean more than a single run. It requires that all critical code paths be exercised before compilation. Otherwise, those code paths will get sub-optimal code generated for them. Regardless, I can see how your idea can help. It is at least worthy of some exploration (for possible refinement) and experimentation to get empirical data on how beneficial it is. There's always the chance that this may not make any noticeable difference. But the idea is interesting enough to warrant an investigation.

    As for work done for this, I am not personally aware of such work in CVM, though it is entirely possible that others at Sun have already attempted this. I would be tempted to explore this as soon as my schedule frees up a little though. But since this is open source, anyone who is able and willing is welcome to try this if I don't get to it first (which may not be for a long time yet).

    Regards, Mark

    Posted by: mlam on March 05, 2007 at 11:29 AM


