
JIT me up, Scotty

Posted by mlam on December 6, 2006 at 7:03 PM PST

Map of CVM Data Structures

The phoneME Advanced VM (CVM) comes with a dynamic adaptive compiler (JIT) which generates compiled code. Today's article will talk about how JIT compiled code uses the runtime execution stacks. I will also point out a few other tidbits about efficiency and performance as pertaining to the runtime stacks.

Resources: start of CVM data structure discussion, start of stacks discussion, copy of the map, PDF of map for printing, .h files in src/share/javavm/include, and .c files in src/share/javavm/runtime.

Let's get started ...

Bouncing on the Compiled Code Trampoline

A colleague of mine once described the compiled code generated by CVM's JIT as co-routines. Unlike C functions, they don't have their own native stack frame. Instead, compiled code is bootstrapped using a piece of assembler glue called CVMgoNative (which I will abbreviate as goNative for the purpose of this discussion). goNative sets up a stack frame and reserves a small amount of scratch memory on the native stack for the use of any compiled code. After that, it branches into the appropriate entry point in the compiled code of the method to be invoked. For example, when we have an interpreted method (mI) call a compiled method (mC), the runtime stacks look like this:

native stack: ... executeJava -> goNative

Java stack: ... mI -> mC

Let's take a look at another case. Here's compiled (mCa) to compiled (mCb):

native stack: ... executeJava -> goNative

Java stack: ... mCa -> mCb

Note that even if the very first Java method executed is a compiled method, we still have an instance of executeJava (i.e. the interpreter loop) on the native stack. This is because, as with native code, the interpreter does the bootstrapping via transition methods (see previous discussion for details).

Once we're executing in compiled code, the VM will tend to stay in compiled code until there is a need to exit. Here's interpreted (mIa) to compiled (mCa) to compiled (mCb) to compiled (mCc):

native stack: ... executeJava -> goNative

Java stack: ... mIa -> mCa -> mCb -> mCc

Note that there is only one instance of executeJava and goNative on the native stack even though there are 1 interpreted and 3 compiled methods being executed. CVM first enters executeJava and interprets method mIa. When mIa calls mCa, the interpreter detects that mCa is compiled. So, it pushes a compiled frame (see CVMCompiledFrame in interpreter.h) onto the Java stack instead. Next, it calls goNative with the appropriate entry point into mCa.

When mCa calls mCb, the compiled code makes use of some assembler code called the invoke glue (see examples in src/arm/javavm/runtime/jit/ccminvokers_cpu.S). The invoke glue then branches to the entry point for mCb. mCb does not get another frame on the native stack, but reuses the same scratch memory allocated by goNative that was previously used by mCa. Hence, the scratch memory on the native stack is not used to hold state information across method call boundaries. All method state that needs to persist across this boundary is kept in the compiled frame on the Java stack instead (see here for the structure of the compiled frame).

This is what we mean by "staying in compiled code". Though execution goes from one method to another, no new native frame is pushed or popped, and code flow does not go through the interpreter loop. Overhead between method calls is kept to a minimum.

Each compiled method looks like a routine that we branch to instead of calling. The code flow execution pattern looks like bouncing on a trampoline. We bounce off the glue trampoline to jump into a compiled method. To call another method, we fall out of that method back into the glue, and bounce into another compiled method. The same is done for returns as well as invocations. Hence, the trampoline analogy. Sometimes, the glue code is referred to as trampoline code.
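To make the trampoline pattern concrete, here is a toy sketch in Python. It is not CVM code: the glue loop, the (state, arg) calling convention, and the action tuples are all invented for illustration. The point it tries to show is that no compiled method ever calls another one directly. Each method hands an action back to the glue, the glue saves the caller's resume point on a list standing in for the Java stack, and then bounces into the next method.

```python
# Toy model of the "trampoline". All names are illustrative, not CVM identifiers.

def run_compiled(entry, methods):
    """Glue loop: bounce between methods; the glue is the only active caller."""
    java_stack = []                      # saved (method, resume_state) per caller
    current, state, arg = entry, 0, None
    trace = []                           # order in which methods receive control
    while True:
        trace.append(current)
        action = methods[current](state, arg)
        if action[0] == "invoke":
            _, callee, resume_state = action
            java_stack.append((current, resume_state))  # persist caller state
            current, state, arg = callee, 0, None       # branch to callee entry
        else:                                           # ("return", value)
            _, value = action
            if not java_stack:
                return value, trace
            current, state = java_stack.pop()           # branch to return point
            arg = value


def mCa(state, arg):
    if state == 0:
        return ("invoke", "mCb", 1)      # fall back into the glue to call mCb
    return ("return", arg + 1)           # resumed after mCb returned


def mCb(state, arg):
    if state == 0:
        return ("invoke", "mCc", 1)
    return ("return", arg * 2)


def mCc(state, arg):
    return ("return", 20)


METHODS = {"mCa": mCa, "mCb": mCb, "mCc": mCc}
value, trace = run_compiled("mCa", METHODS)
print(value, trace)
```

Note that the resume state plays the role of the return point: when a callee returns, the glue re-enters the caller at the point after the invocation rather than at the start of the method.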

But what if we need to call an interpreted method from compiled code?

Falling off the trampoline

Here's a case where interpreted mIa calls compiled mCb, which calls compiled mCc, which calls interpreted mId:

Before calling mId:

native stack: ... executeJava -> goNative

Java stack: ... mIa -> mCb -> mCc



After calling mId:

native stack: ... executeJava

Java stack: ... mIa -> mCb -> mCc -> mId

Unlike with native code, to invoke an interpreted method from a compiled method, we do not recurse into the interpreter. Instead, the invoke glue detects that we're calling an interpreted method and simply returns to the previous interpreter loop, and lets it interpret the method. That's why goNative is not on the native stack anymore.

But don't the compiled methods need to retain their state information? How can we pop the goNative stack frame without losing this information? Well, as mentioned above, the goNative native stack frame is only used for scratch data. It is not used to persist method state across method calls. Persisted method state is held in the Java stack. Note that the stack frames in the Java stack still show you the order of method calls in the call chain.

When we return from mId, we pop the mId frame off the Java stack, and then simply push a new goNative frame on the native stack, and resume executing in mCc. This time, instead of branching to the start of the method, we branch to the return point after the method invocation. Similarly, if we needed to invoke a compiled method mCe from mId, we would have left mId on the Java stack, pushed a goNative frame on the native stack, and branched to the start of mCe to continue executing. There is no need for more than one interpreter loop frame and one goNative frame at any time (except for ... see below).

Here are some more examples to illustrate the above: interpreted mIa calls compiled mCb, which calls compiled mCc, which calls interpreted mId, which calls compiled mCe:

Before calling mCe:

native stack: ... executeJava

Java stack: ... mIa -> mCb -> mCc -> mId



After calling mCe:

native stack: ... executeJava -> goNative

Java stack: ... mIa -> mCb -> mCc -> mId -> mCe

mCe returns to mId:

native stack: ... executeJava

Java stack: ... mIa -> mCb -> mCc -> mId

mId returns to mCc:

native stack: ... executeJava -> goNative

Java stack: ... mIa -> mCb -> mCc
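The four snapshots above can be reproduced with a small simulation. This is only a sketch of the rule described in this section, written in Python with made-up names (DualStacks, invoke, ret); it is not how CVM is implemented. The rule it encodes: goNative sits on top of executeJava exactly when the method currently executing is compiled, and it is pushed or popped whenever an invoke or a return crosses between interpreted and compiled code.

```python
# Toy model of the interpreted/compiled transitions. Names are illustrative.

class DualStacks:
    def __init__(self):
        self.native = ["executeJava"]    # the single interpreter loop frame
        self.java = []                   # list of (method name, kind)

    def _switch_to(self, kind):
        """Push or pop goNative so it matches the kind of code about to run."""
        on_go_native = self.native[-1] == "goNative"
        if kind == "compiled" and not on_go_native:
            self.native.append("goNative")
        elif kind == "interpreted" and on_go_native:
            self.native.pop()            # fall back to the interpreter loop

    def invoke(self, name, kind):
        self.java.append((name, kind))   # every method gets a Java frame
        self._switch_to(kind)

    def ret(self):
        self.java.pop()
        if self.java:
            self._switch_to(self.java[-1][1])   # resume in the caller

    def snapshot(self):
        return (" -> ".join(self.native),
                " -> ".join(name for name, _ in self.java))


s = DualStacks()
s.invoke("mIa", "interpreted")
s.invoke("mCb", "compiled")
s.invoke("mCc", "compiled")
s.invoke("mId", "interpreted")
before_mCe = s.snapshot()
s.invoke("mCe", "compiled")
after_mCe = s.snapshot()
s.ret()                                  # mCe returns to mId
back_in_mId = s.snapshot()
s.ret()                                  # mId returns to mCc
back_in_mCc = s.snapshot()
for snap in (before_mCe, after_mCe, back_in_mId, back_in_mCc):
    print("native stack: ... %s | Java stack: ... %s" % snap)
```

The model deliberately leaves out native methods, which break the "one executeJava, one goNative" rule.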

The Native Monkey Wrench

This design is another way in which CVM reduces native recursion. It does not need to continually recurse into the interpreter loop in order to transition between compiled and interpreted code. However, compiled code must be able to call native methods. For CVM, the native code can be bootstrapped directly from the compiled code's assembler glue. Hence, we won't have to fall back to the interpreter loop to invoke the native method. For example, compiled mCa calls native mNb:

native stack: ... executeJava -> goNative -> mNb

Java stack: ... mCa -> mNb

But unlike compiled code, native methods do rely on their native stack frame to store method state information that is meant to be persisted while it calls another method. Hence, when a native method calls another method (interpreted, native, or compiled), the VM needs to recurse into the interpreter loop. For example, compiled mCa calls native mNb, which calls compiled mCc:

native stack: ... executeJava -> goNative -> mNb -> executeJava -> goNative

Java stack: ... mCa -> mNb -> mCc

Hence, as we pointed out yesterday, having native methods in your call chain can be expensive and bad for performance ... and memory consumption (think "mirrored stack frames").
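Here is a rough Python sketch of how the native stack grows once native methods enter the call chain. The function name and the kind labels are invented for this illustration. It also makes one assumption that this article does not spell out: a native method's C frame is simply pushed on top of whatever native frame is current, including when the caller is interpreted.

```python
# Toy model of native stack growth along a call chain (calls only, no returns).
# Illustrative names; not CVM code.

def native_stack_for(chain):
    """chain: list of (name, kind); kind is 'interpreted', 'compiled' or 'native'."""
    native = ["executeJava"]
    prev_kind = None
    for name, kind in chain:
        if prev_kind == "native":
            # A native caller keeps live state in its C frame, so the VM
            # recurses into a fresh interpreter loop on top of it.
            native.append("executeJava")
        if kind == "compiled":
            if native[-1] != "goNative":
                native.append("goNative")
        elif kind == "interpreted":
            if native[-1] == "goNative":
                native.pop()             # return to the interpreter loop
        else:
            native.append(name)          # native method gets its own C frame
        prev_kind = kind
    return native


chain = [("mCa", "compiled"), ("mNb", "native"), ("mCc", "compiled")]
print(" -> ".join(native_stack_for(chain)))
```

Compare this with a purely compiled chain, where the native stack stays at executeJava -> goNative no matter how deep the Java stack gets.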

Compiled Code Performance

Note also that since all compiled code resides in the code cache (see section on the code cache in the BIG picture), we are mostly executing code that is contained within a small contiguous region (small as in the size of the code cache relative to all code in the VM and libraries). This yields better cache locality, which means better performance.

I mentioned the invoke glue earlier. There are also other pieces of assembler glue code. These may be there for connective purposes or for performance reasons. In fact, much of this glue logic could be implemented in a minimal fashion to make use of C code that is already provided in the shared code. This is part of the strategy that enables ports of CVM (and its JIT) to be up and running quickly (see section on portability in this previous discussion). However, in the code that is open-sourced, this glue logic has already been optimized with full (or nearly full) assembler implementations. At VM initialization time, this glue code is copied into the code cache. As a result, compiled code normally does not have to branch far away when using the glue. This further improves cache locality and performance. There are other reasons for copying the glue code into the code cache, but we'll leave that for another day.

In yesterday's discussion, I explained that we save code space by letting the interpreter loop do all the method dispatching. By method dispatching, I mean the operation of checking the type of the callee method and invoking it in the proper manner (i.e. pushing the appropriate frame, and dispatching to the proper invoker to execute the code). Besides the interpreter, the only other piece of code in the VM that can do method dispatching is the compiled code glue. This is done for performance reasons so that we can stay in compiled code as long as possible.

Unlike the interpreter, the compiled code glue dispatcher does not handle every type of method. Only the ones that have performance advantages are dispatched directly from compiled code. For all other cases, it fails to dispatch and falls back to the interpreter. In fact, falling back to the interpreter is the default approach for an initial JIT port. Adding a dispatcher in the compiled code glue is a performance optimization that can be done in a later phase of the JIT port.
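The fallback idea can be sketched in a few lines of Python. Everything here is hypothetical: the function names, the string results, and in particular the set of method kinds the glue handles, which I have simply taken from the examples in this article (compiled and native callees handled by the glue, interpreted callees left to the interpreter). The real dispatcher distinguishes more cases than this.

```python
# Toy model of the two dispatchers. Names and the handled set are assumptions.

GLUE_HANDLED = frozenset({"compiled", "native"})


def interpreter_dispatch(kind):
    """The interpreter loop can dispatch every kind of method."""
    return f"interpreter dispatches {kind} method"


def glue_dispatch(kind, handled=GLUE_HANDLED):
    """Fast path for the kinds worth optimizing; otherwise fall back."""
    if kind in handled:
        return f"glue dispatches {kind} method"
    return interpreter_dispatch(kind)    # fail to dispatch, fall back


# A tuned port handles the profitable cases in the glue ...
print(glue_dispatch("compiled"))
print(glue_dispatch("interpreted"))
# ... while an initial port can start with an empty set and always fall back.
print(glue_dispatch("compiled", handled=frozenset()))
```

The design choice is that correctness never depends on the glue dispatcher: shrinking the handled set only costs speed, which is what lets a new port start with the fallback alone.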

Efficiency

We've been talking about the dual stacks in CVM. A natural question is whether this approach costs us significant efficiency. The initial thought would be that 2 stacks mean twice the amount of frame pushing and popping, and therefore twice the memory consumption as well. However, when we dig deeper, we see that this isn't always true.

For interpreted methods, we only push a single frame for executeJava on the native stack. For the methods themselves, we only push frames on the Java stack. For compiled methods, we only push a frame for goNative once (in addition to executeJava). All the frames for the compiled methods are pushed on the Java stack. Therefore, if we stay in interpreted mode or compiled mode most of the time, we are effectively only pushing and popping one frame per method.

The problem comes when we include native methods. JNI native methods incur the penalty of 2 frames per method: one on the native stack, and one on the Java stack. This is why we discourage the use of native methods. Well, this is one reason. However, it is a fact of life that native methods will need to be used at some point to access native resources. And, this leads me to ...
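As a back-of-the-envelope check, the frame counts can be tallied for a chain of n nested methods that are all of one kind. This is a simplified Python model with an invented function name, not a measurement. For the native case it assumes the picture from the native section: each native method has its own native frame, and each native-to-native call recurses into a fresh executeJava.

```python
# Toy frame count for n nested methods of a single kind. Simplified model.

def frames_pushed(kind, n):
    """Return (native_frames, java_frames) for a chain of n methods."""
    java = n                             # every method gets one Java frame
    if kind == "interpreted":
        native = 1                       # one shared executeJava
    elif kind == "compiled":
        native = 2                       # shared executeJava + shared goNative
    elif kind == "native":
        # first executeJava + n method frames + (n - 1) recursed executeJava
        native = 1 + n + (n - 1)
    else:
        raise ValueError(f"unknown method kind: {kind}")
    return native, java


for kind in ("interpreted", "compiled", "native"):
    print(kind, frames_pushed(kind, 1000))
```

Under these assumptions the native stack cost stays constant for interpreted and compiled chains, and grows linearly only for native chains.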

Coming Soon ...

In my next article, I'll talk about why native methods can be bad for performance (other than the stack issues that I've already told you about so far). I'll also talk about ways to minimize that impact, and other JIT tricks that help us avoid the "native" penalty.

A Commentary / Side Track

Incidentally, just earlier today, a colleague of mine made a friendly comment on my verbose style of blogging, and how it almost appears as if I cut-and-paste huge sections out of a text-book for each entry. Funny. That's sort of what I was going for. Not a text-book per se, but more like a book for a guided tour through the phoneME Advanced code, with a focus on esoteric tidbits that you may not learn immediately from reading code and comments. In case you're curious ... no, I'm not copying this stuff out of a text book. My colleague knows this because there is no such text-book here inside Sun. It takes a lot of time to write all this (and, I need to learn how to do this faster so that I can get back to my day job).

With all this effort, I do hope that you are getting something out of this ... be it entertainment or enlightenment. Hey, and if you think that the info in this blog is useful, please tell your friends about it so that they can benefit from it too.

When Sun open-sourced the code for the Java platform, I believe we intended for you, our community, to be able to benefit from it, to learn from it, to use it, to make money with it even ... and of course, to help us make money with it too. Disclaimer: I am not a spokesman for Sun. My opinions are my own. Code that can't be or is difficult to understand won't be very useful to a developer who is new to it. But the Java platform is not your average HelloWorld program. It is intrinsically complex. So, with each article I write, I hope that I will be able to give you the sense of "the BIG Picture" ... the very same one that allows me, as an employee of Sun, to have this "text-book" worth of information in my head in order to do my day job effectively. Hopefully, that will make the code seem a lot less complex to you too (see my entry on containing complexity), and allow you to enjoy its benefits sooner ... make money, or whatever you hope to get from it.

And, of course, if you like my blogs, please tell my boss too. ;-)

Till the next entry, have a nice day. :-)
