
CVM Stacks and Code Execution

Posted by mlam on November 30, 2006 at 5:13 PM PST

Map of CVM Data Structures

Welcome to a continuation of the discussion on the internals of the phoneME Advanced VM (CVM). If you missed the beginning of this discussion, look here, where I gave a high-level introduction to some of the major VM data structures using the CVM map. Today, I'll get into the execution of Java methods and how it shows up in the runtime stacks. By stacks, I mean the thread stacks that hold activation records for methods ... not stacks as in container APIs, or stacks as in API layers. This discussion will give you insight into the control flow of Java code execution in CVM (i.e. who has the CPU at any time). If you want to bring up a copy of the map for reference while you read on, click here (or here for a PDF to print).

All the source files that will be referenced below can be found in the src/share/javavm/include (see here) or src/share/javavm/runtime (see here) folders of the phoneME Advanced project. You will find the .h files in the include folder, and .c files in the runtime folder.



The Execution Engines

In CVM, there is the interpreter, and there is the dynamic adaptive compiler (commonly known as the JIT). Conceptually, the interpreter is just a big switch statement with one case for each bytecode to be executed (see CVMgcUnsafeExecuteJavaMethod() in executejava_standard.c). The interpreter loops around this switch statement until there are no more bytecodes to execute. Methods that are executed frequently (commonly referred to as hot) are compiled by the JIT into native machine code. The compiled methods are then executed in place of bytecode interpretation.
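To make the "big switch statement" idea concrete, here is a toy interpreter loop. It is only a sketch of the technique, not CVM's CVMgcUnsafeExecuteJavaMethod(): it supports five real JVM opcodes, uses a fixed-size operand stack, and all names (interpret, OP_*) are hypothetical. Note how pc and sp live in local variables of the loop.

```c
#include <assert.h>
#include <stdint.h>

/* A few real JVM opcode values; this toy only handles these. */
enum {
    OP_ICONST_1 = 0x04,
    OP_BIPUSH   = 0x10,
    OP_IADD     = 0x60,
    OP_IMUL     = 0x68,
    OP_IRETURN  = 0xac
};

/* Toy interpreter: a loop around a switch, one case per bytecode.
 * Returns the int left on the operand stack by ireturn. */
int32_t interpret(const uint8_t *code)
{
    int32_t stack[16];          /* operand stack (fixed size for the sketch) */
    int32_t *sp = stack;        /* next free operand slot */
    const uint8_t *pc = code;   /* next bytecode to execute */

    for (;;) {
        switch (*pc) {
        case OP_ICONST_1:
            *sp++ = 1;
            pc += 1;
            break;
        case OP_BIPUSH:
            *sp++ = (int8_t)pc[1];   /* sign-extended byte operand */
            pc += 2;
            break;
        case OP_IADD:
            sp--;
            sp[-1] = sp[-1] + sp[0];
            pc += 1;
            break;
        case OP_IMUL:
            sp--;
            sp[-1] = sp[-1] * sp[0];
            pc += 1;
            break;
        case OP_IRETURN:
            return sp[-1];
        default:
            assert(!"unsupported opcode in this sketch");
            return 0;
        }
    }
}
```

For example, the bytecode sequence bipush 6, bipush 7, imul, iconst_1, iadd, ireturn evaluates 6 * 7 + 1 and returns 43.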

There are many ways to measure the hotness of a method. The CLDC VM (phoneME Feature) uses a timer based sampling mechanism. As of this writing, CVM uses invocation counts that are sampled during interpretation. Upon reaching some threshold of hotness, the method gets compiled. The issue now is how to go from interpreting the bytecodes to executing the compiled method. To understand this (and all the other nuances of Java code execution), we need to take a look at what happens in the runtime stacks when Java code is executed ...
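A counter-based hotness check can be sketched as below. This is a simplified illustration under stated assumptions: the ToyMethod struct, the toyNoteInvocation() hook, and the threshold value are all hypothetical, and unlike the sampled counts the text describes, this sketch counts every single invocation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-method state; CVM's real CVMMethodBlock differs. */
typedef struct {
    unsigned invokeCount;   /* bumped by the interpreter on each call */
    bool     isCompiled;    /* set once the "JIT" has produced code */
} ToyMethod;

/* Hypothetical threshold; not CVM's actual value or policy. */
#define TOY_COMPILE_THRESHOLD 1000u

/* Stand-in for handing the method to the JIT. */
static void toyCompile(ToyMethod *m)
{
    m->isCompiled = true;
}

/* Called on each invocation of an interpreted method. Returns true
 * if this call pushed the method over the compile threshold. */
bool toyNoteInvocation(ToyMethod *m)
{
    if (m->isCompiled)
        return false;
    if (++m->invokeCount >= TOY_COMPILE_THRESHOLD) {
        toyCompile(m);
        return true;
    }
    return false;
}
```

Once the method is marked compiled, later invocations stop counting; the open question the text raises next is what happens to an activation that is already being interpreted.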

The Runtime Stacks

As mentioned previously, each Java thread in CVM has two stacks: a native stack and a Java stack. The Java stack is also commonly referred to as the interpreter stack. Every thread in CVM is identified by a pointer to its CVMExecEnv record, which we commonly refer to as the ee. Given the ee, you can always find the Java stack as follows:

CVMStack *currentStack = &ee->interpreterStack;

The Java Stack

The Java stack is of type CVMStack (see stacks.h and stacks.c). The stack is organized as a list of stack chunks. When the stack is initialized in CVMinitStack(), it mallocs one stack chunk. As more stack memory is needed, additional chunks are allocated. Hence, the Java stack grows one chunk at a time. In theory, the stack could also shrink, but the code does not currently do this.
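A chunked, growable stack can be sketched as follows. The ToyChunk and ToyStack types and their fields are hypothetical and do not reproduce CVM's definitions in stacks.h; the sketch only shows the idea of starting with one malloc'ed chunk and adding a new one when a request does not fit in the space left.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef size_t Slot;                 /* one word-sized stack slot */

/* Hypothetical chunk layout, not CVM's actual definition. */
typedef struct ToyChunk {
    struct ToyChunk *prev;           /* older chunk, NULL for the first */
    Slot *limit;                     /* one past the last usable slot */
    Slot data[];                     /* the slots themselves */
} ToyChunk;

typedef struct {
    ToyChunk *current;               /* chunk holding the top of stack */
    Slot *top;                       /* next free slot in that chunk */
    size_t chunkSlots;               /* capacity of every new chunk */
} ToyStack;

static ToyChunk *toyNewChunk(ToyChunk *prev, size_t nslots)
{
    ToyChunk *c = malloc(sizeof(ToyChunk) + nslots * sizeof(Slot));
    if (c == NULL)
        return NULL;
    c->prev = prev;
    c->limit = c->data + nslots;
    return c;
}

/* Similar in spirit to CVMinitStack(): start with a single chunk. */
int toyStackInit(ToyStack *s, size_t chunkSlots)
{
    s->chunkSlots = chunkSlots;
    s->current = toyNewChunk(NULL, chunkSlots);
    if (s->current == NULL)
        return -1;
    s->top = s->current->data;
    return 0;
}

/* Reserve n contiguous slots, adding a chunk if the current one cannot
 * hold them. Returns NULL on failure or if n exceeds a whole chunk. */
Slot *toyStackReserve(ToyStack *s, size_t n)
{
    if (n > s->chunkSlots)
        return NULL;                 /* sketch: no oversized chunks */
    if ((size_t)(s->current->limit - s->top) < n) {
        ToyChunk *c = toyNewChunk(s->current, s->chunkSlots);
        if (c == NULL)
            return NULL;
        s->current = c;
        s->top = c->data;
    }
    Slot *base = s->top;
    s->top += n;
    return base;
}

size_t toyStackChunkCount(const ToyStack *s)
{
    size_t n = 0;
    for (const ToyChunk *c = s->current; c != NULL; c = c->prev)
        n++;
    return n;
}

void toyStackDestroy(ToyStack *s)
{
    ToyChunk *c = s->current;
    while (c != NULL) {
        ToyChunk *prev = c->prev;
        free(c);
        c = prev;
    }
    s->current = NULL;
    s->top = NULL;
}
```

This sketch only grows; popping back into an older chunk would also require remembering where the top was in that chunk, which is left out here.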

Method activation records are stored in frames. The base class frame is CVMFrame (see stacks.h). There are also CVMFreeListFrame (see stacks.h), CVMJavaFrame, CVMTransitionFrame, and CVMCompiledFrame (see interpreter.h). All of these frames are polymorphic with respect to CVMFrame. Note: though CVM is written in C, its design makes heavy use of object-oriented paradigms. Many data structures are polymorphic where it makes sense.
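One common way to get this kind of polymorphism in C is to embed the base struct as the first member of each "subclass", since C guarantees that a pointer to a struct, suitably converted, points to its first member. The sketch below uses that idiom with an explicit type tag; the ToyFrame types, fields, and tag values are hypothetical and are not CVM's actual frame definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tag; how CVM actually tags frame kinds may differ. */
typedef enum { TOY_FRAME_JAVA, TOY_FRAME_FREELIST, TOY_FRAME_COMPILED } ToyFrameType;

/* "Base class": every frame kind starts with these fields. */
typedef struct ToyFrame {
    struct ToyFrame *prev;       /* next-older frame */
    ToyFrameType type;           /* which "subclass" this really is */
} ToyFrame;

/* "Subclasses" embed the base as their FIRST member, so a pointer to the
 * subclass can be converted to a pointer to the base and back. */
typedef struct {
    ToyFrame base;
    const void *mb;              /* stand-in for a CVMMethodBlock * */
    const unsigned char *pc;     /* next bytecode to execute */
} ToyJavaFrame;

typedef struct {
    ToyFrame base;
    const void *mb;
    void *freeList;              /* head of released local-ref slots */
} ToyFreeListFrame;

/* Checked "downcast" from the base type. */
ToyJavaFrame *toyAsJavaFrame(ToyFrame *f)
{
    assert(f->type == TOY_FRAME_JAVA);
    return (ToyJavaFrame *)f;
}

/* Code that only needs base fields works on any frame kind. */
int toyIsJavaFrame(const ToyFrame *f)
{
    return f->type == TOY_FRAME_JAVA;
}
```

Generic code (such as a stack walker) can then handle every frame through a ToyFrame *, and only code that needs kind-specific fields performs the checked downcast.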

CVMFrames also form a linked list of frames that can span stack chunks. The head of the list (i.e. the bottom-most frame in the stack) is always known because it is at the start of the first chunk. The last frame in the list (i.e. the topmost frame in the stack) is pointed to by the currentFrame pointer in CVMStack.
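Because frames are linked by pointers, it does not matter which chunk each frame lives in. The following sketch shows pushing, popping, and walking such a list from the top frame down; the ToyFrame and ToyFrameStack types are hypothetical, and only the currentFrame field name is borrowed from the text.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyFrame {
    struct ToyFrame *prev;       /* caller's frame; NULL for the bottom frame */
    int methodId;                /* stand-in for the frame's contents */
} ToyFrame;

typedef struct {
    ToyFrame *currentFrame;      /* topmost frame */
} ToyFrameStack;

void toyPushFrame(ToyFrameStack *s, ToyFrame *f)
{
    f->prev = s->currentFrame;
    s->currentFrame = f;
}

ToyFrame *toyPopFrame(ToyFrameStack *s)
{
    ToyFrame *f = s->currentFrame;
    if (f != NULL)
        s->currentFrame = f->prev;
    return f;
}

/* Walk from the top frame down to the bottom one. */
size_t toyFrameDepth(const ToyFrameStack *s)
{
    size_t depth = 0;
    for (const ToyFrame *f = s->currentFrame; f != NULL; f = f->prev)
        depth++;
    return depth;
}
```

In a usage like the test below, the two frames sit in two unrelated arrays (standing in for two chunks), and the walk still visits both.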

CVMJavaFrame

The CVMJavaFrame is the frame used for methods that are interpreted from bytecode. Before invoking such a method, the VM pushes a CVMJavaFrame on the stack. The frame is initialized with information such as the CVMMethodBlock * of the method to be invoked. Method metadata are stored in a data structure called the CVMMethodBlock (commonly referred to as the mb or MB). The address of the MB serves as the universal identifier of the method, which is why it is what gets stored in the frame. The frame also contains a program counter (PC) value. In this case, the PC is a pointer to the next bytecode to be executed (as in the return PC for method calls). The current PC is not always flushed to the frame; instead, it is kept in the local state of the interpreter loop.

This frame structure looks like this:

                          |-----------------|
      start of frame ---> | locals ...      |
                          |-----------------|
                          | frame info      |
                          | (CVMJavaFrame)  |
                          |-----------------|
       top of stack ----> | operand stack   |
                          | ...             |
                          |-----------------|

The locals area holds the Java locals (as defined in the VM spec). The operand stack area is where the VM pushes and pops operands, which serve as the arguments of bytecode computations, as outgoing arguments for a method to be invoked, or as the return value from a method that was just invoked.

The VM spec says that the number of locals and the maximum operand stack depth are known ahead of time for any given bytecode method. Hence, we know whether there is enough room left in the stack chunk before we push the frame. If there isn't, a new stack chunk is allocated, and the frame is pushed on the new chunk instead.
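Since max_locals and max_stack come from the class file, the whole frame size can be computed up front, and the frame areas laid out with simple pointer arithmetic. The sketch below assumes a hypothetical 4-slot frame header (TOY_FRAME_INFO_SLOTS); the real size of the CVMJavaFrame record is not given here, and it ignores the overlap with the caller's outgoing arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef size_t Slot;

typedef struct {
    unsigned maxLocals;   /* max_locals from the method's Code attribute */
    unsigned maxStack;    /* max_stack from the method's Code attribute */
} ToyMethodInfo;

/* Hypothetical size of the frame-info header, in slots. */
#define TOY_FRAME_INFO_SLOTS 4u

typedef struct {
    Slot *locals;         /* start of frame */
    Slot *frameInfo;      /* header follows the locals */
    Slot *operandBase;    /* operand stack follows the header */
    Slot *operandLimit;   /* one past the deepest operand slot */
} ToyFrameLayout;

/* Lay out a frame at 'start' if all of it fits before 'chunkLimit'.
 * Returns false if it does not fit; the caller would then move on
 * to a fresh chunk. */
bool toyLayoutJavaFrame(const ToyMethodInfo *m, Slot *start,
                        Slot *chunkLimit, ToyFrameLayout *out)
{
    size_t need = (size_t)m->maxLocals + TOY_FRAME_INFO_SLOTS + m->maxStack;
    if ((size_t)(chunkLimit - start) < need)
        return false;
    out->locals = start;
    out->frameInfo = start + m->maxLocals;
    out->operandBase = out->frameInfo + TOY_FRAME_INFO_SLOTS;
    out->operandLimit = out->operandBase + m->maxStack;
    return true;
}
```

For a method with 3 locals and a maximum operand depth of 5, this frame needs 3 + 4 + 5 = 12 slots, so it fits in a chunk with 32 free slots but not in one with only 7 left.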

Since outgoing arguments (for the next method to be invoked) are stored on the operand stack, that part of the operand stack becomes the start of the locals area for the next frame like this:

                          |-----------------|
      start of frame ---> | locals ...      |
                          |-----------------|
                          | Method 1        |
                          | frame info      |
                          |-----------------|
                          | operand stack   |
                          |                 |-----------------|
 start of next frame ---> |   outgoing args = incoming locals |
                          |                 |                 |
                          |-----------------|                 |
                                            |-----------------|
                                            | Method 2        |
                                            | frame info      |
                                            |-----------------|
        top of stack ---------------------> | operand stack   |
                                            |                 |
                                            |-----------------|

This conforms to the VM spec, which says that incoming arguments start at local 0 of the frame's locals area.
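The overlap boils down to one subtraction: once the caller has pushed its outgoing arguments, the callee's locals begin argSlots slots below the caller's top of stack, so no copying is needed. The sketch below shows that, plus one plausible return path for a one-slot result. Both helper names are hypothetical, and the return path is an assumption for illustration rather than a description of CVM's code.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t Slot;

/* callerTop points just past the argSlots words of outgoing arguments
 * the caller pushed. The callee's locals start at the first argument,
 * so local 0 is the first argument pushed. */
Slot *toyCalleeLocals(Slot *callerTop, size_t argSlots)
{
    return callerTop - argSlots;
}

/* Hypothetical return path for a one-slot result: the arguments are
 * consumed, the result takes the place of the first argument, and the
 * caller's new top of stack is one slot past it. */
Slot *toyReturnOneSlot(Slot *calleeLocals, Slot result)
{
    calleeLocals[0] = result;
    return calleeLocals + 1;
}
```

For example, if the caller has two operands already on its stack and then pushes arguments 10 and 20, the callee sees local 0 = 10 and local 1 = 20 in the very same slots.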

Note: In CVM, the locals and operand stack areas consist of word-sized slots. On a 32-bit system, each slot is 32 bits, and the stack pointer moves in word increments. A slot can hold a Java primitive value (64-bit values take up 2 slots) or an object pointer.
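On a 32-bit target, a Java long or double therefore spans two adjacent slots. A portable way to sketch this is memcpy into a pair of uint32_t slots, as below. The helper names are hypothetical, and the word order inside the pair is simply whatever the host's memory layout produces; CVM may define its own convention and macros for this.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Slot32;   /* one slot on a 32-bit target */

_Static_assert(sizeof(int64_t) == 2 * sizeof(Slot32),
               "a 64-bit value must occupy exactly two slots");

/* Store a Java long into two consecutive slots. */
void toyStoreLong(Slot32 *slots, int64_t value)
{
    memcpy(slots, &value, sizeof value);
}

/* Load a Java long back from two consecutive slots. */
int64_t toyLoadLong(const Slot32 *slots)
{
    int64_t value;
    memcpy(&value, slots, sizeof value);
    return value;
}
```

Using memcpy avoids relying on the two slots being 8-byte aligned, which a word-aligned stack does not guarantee.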

CVMFreeListFrame

One use of freelist frames is as the frames of JNI methods. The frame structure looks like this:

                          |--------------------|
      start of frame ---> | frame info         |
                          | (CVMFreeListFrame) |
                          |--------------------|
                          | operand stack      |
       top of stack ----> | ...                |
                          |--------------------|

One difference is that there are no incoming locals, or locals of any sort. For JNI methods, the incoming arguments are stored in the native stack frame of the native method. These arguments were copied from the outgoing arguments in the operand stack of the caller frame. This copying is part of the marshalling work done in the assembler invokeNative glue (see, e.g., CVMjniInvokeNative in invokeNative_arm.S here).

Another difference is that the operand stack area is only used to store object pointers. Other operands are stored in CPU registers or on the native stack (depending on the C compiler that compiled the native method). In JNI, when you allocate a local ref using NewLocalRef, the stack slot that holds the object pointer is allocated from the freelist frame. When you release the ref using DeleteLocalRef, the stack slot is chained into a linked list called the freelist (hence the name freelist frame). The head of this list is kept in the CVMFreeListFrame record, along with a copy of the JNI method's MB pointer. When you allocate a local ref, we first check the freelist for an available slot. If one is available, it is removed from the list and returned. If not, we bump the top of stack pointer and allocate from the top of the operand stack.

Note that unlike Java bytecode methods, we don't know the max number of operands that can be allocated. Fortunately we don't have to. Unlike the Java frame, freelist frames can span across stack chunks. If we run out of space in the current chunk, we simply add another chunk to the stack and allocate from the new chunk.
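The allocate and release policy described above can be sketched with a union slot that holds either an object pointer or, once released, the link to the next free slot. Everything here (ToyRefSlot, ToyFreeListFrame, toyNewLocalRef, toyDeleteLocalRef) is hypothetical; the sketch uses a single fixed area instead of growing into new chunks, and it does not model how CVM lets the GC tell free slots apart from live references.

```c
#include <assert.h>
#include <stddef.h>

/* A slot holds an object pointer while in use, or the link to the
 * next free slot once released. */
typedef union ToyRefSlot {
    void *obj;
    union ToyRefSlot *nextFree;
} ToyRefSlot;

typedef struct {
    ToyRefSlot *freeList;   /* head of released slots */
    ToyRefSlot *top;        /* next never-used slot */
    ToyRefSlot *limit;      /* end of the (single) area in this sketch */
} ToyFreeListFrame;

/* Reuse a released slot if possible, else bump the top pointer. */
ToyRefSlot *toyNewLocalRef(ToyFreeListFrame *f, void *obj)
{
    ToyRefSlot *s;
    if (f->freeList != NULL) {
        s = f->freeList;
        f->freeList = s->nextFree;
    } else if (f->top < f->limit) {
        s = f->top++;
    } else {
        return NULL;        /* CVM would continue in a new chunk here */
    }
    s->obj = obj;
    return s;
}

/* Chain the released slot onto the head of the freelist. */
void toyDeleteLocalRef(ToyFreeListFrame *f, ToyRefSlot *s)
{
    s->nextFree = f->freeList;
    f->freeList = s;
}
```

A ref released with toyDeleteLocalRef is handed out again by the next toyNewLocalRef before any fresh slot is taken from the top.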

The other use of the freelist frame is to implement the GC root stacks that I spoke of in a previous article. A GC root stack is implemented as a CVMStack with only one freelist frame in it. Since a GC root stack is meant to be a list of object references that serve as GC roots and that can be allocated and released (e.g. when you call JNI's NewGlobalRef() and DeleteGlobalRef()), the freelist frame implements it nicely.

CVMTransitionFrame

The transition frame mechanism is a clever trick for getting the interpreter to invoke a method for us without writing a lot more glue code. It works by simulating a bytecode method with a special constant pool entry that points to the target method to be invoked. That constant pool entry is not actually constant, but is a variable in the interpreter loop. The interpreter sets the constant pool entry to point to the MB of the target method. Next, it pretends to invoke one of 4 artificial methods called transition methods (see CVMinvokeStaticTransitionCode, CVMinvokeVirtualTransitionCode, CVMinvokeNonVirtualTransitionCode, and CVMinvokeInterfaceTransitionCode in executejava_standard.c). The choice of transition method depends on the type of invocation we are trying to do.

One use of this mechanism is for invoking static initializer (<clinit>) methods. Another is bootstrapping into the first method to be called in Java code.
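The core of the trick, a fixed bytecode body whose constant pool entry is really a variable, can be sketched as below. The opcode values for invokestatic (0xb8) and return (0xb1) are the real JVM ones, but the transition code array, ToyTransitionPool, and the mini interpreter are hypothetical simplifications; CVM's actual transition code and interpreter are in executejava_standard.c and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    int timesRun;               /* stand-in for actually running the method */
} ToyMethod;

/* Real JVM opcode values for the two instructions used. */
enum { OP_INVOKESTATIC = 0xb8, OP_RETURN = 0xb1 };

/* Fixed "transition method" body: invokestatic cp#1; return. */
static const unsigned char toyInvokeStaticTransitionCode[] = {
    OP_INVOKESTATIC, 0x00, 0x01, OP_RETURN
};

/* The "constant pool" is an ordinary variable; entry 1 gets patched
 * with the target method (entry 0 is unused, as in real class files). */
typedef struct { ToyMethod *cp[2]; } ToyTransitionPool;

static void toyInvoke(ToyMethod *m) { m->timesRun++; }

/* Mini interpreter that understands just the transition code. */
void toyRunTransition(ToyTransitionPool *pool, const unsigned char *pc)
{
    for (;;) {
        switch (*pc) {
        case OP_INVOKESTATIC: {
            unsigned idx = ((unsigned)pc[1] << 8) | pc[2];
            assert(idx < 2);
            toyInvoke(pool->cp[idx]);
            pc += 3;
            break;
        }
        case OP_RETURN:
            return;
        default:
            assert(!"unexpected opcode in transition code");
            return;
        }
    }
}

/* Patch the variable "constant" and let the interpreter do the call. */
void toyInvokeViaTransition(ToyMethod *target)
{
    ToyTransitionPool pool = { { NULL, target } };
    toyRunTransition(&pool, toyInvokeStaticTransitionCode);
}
```

The design payoff is that the ordinary invokestatic path of the interpreter does all the real work, so no separate calling glue is needed for, say, running a <clinit> method.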

CVMCompiledFrame

Lastly, we have the compiled frame. Its frame structure looks like this:

                          |--------------------|
      start of frame ---> | locals ...         |
                          |--------------------|
                          | frame info         |
                          | (CVMCompiledFrame) |
                          |--------------------|
                          | spill/temp area    |
                          |--------------------|
                          | ...                |
       top of stack ----> | operand stack      |
                          | ...                |
                          |--------------------|

Like the Java frame, the CVMCompiledFrame contains a MB pointer, and a PC. The PC in this case is a pointer to the compiled code instruction to be executed next (also the return PC typically). When the VM is about to invoke a method, it first checks if the method is compiled, not compiled, or is native. For native methods, a freelist frame is pushed before the method is invoked through the invokeNative glue assembler. "Not compiled" or bytecode methods will have a Java frame pushed, and execution continues in the interpreter loop. If the method is compiled, a compiled frame will be pushed and the VM will jump to the entry point of the compiled method to continue execution.
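The three-way decision at invocation time can be sketched as a small function that picks which kind of frame to push. The ToyMethod fields (isNative, compiledEntry) and the result enum are hypothetical; in CVM this information comes from the MB, and the follow-up actions (invokeNative glue, interpreter loop, jump to compiled code) are only noted in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    TOY_PUSH_FREELIST_FRAME,   /* native: then call through the invokeNative glue */
    TOY_PUSH_JAVA_FRAME,       /* bytecode: stay in the interpreter loop */
    TOY_PUSH_COMPILED_FRAME    /* compiled: jump to the compiled entry point */
} ToyInvokeAction;

typedef struct {
    bool isNative;
    const void *compiledEntry; /* non-NULL once the JIT has compiled it */
} ToyMethod;

ToyInvokeAction toyChooseFrame(const ToyMethod *m)
{
    if (m->isNative)
        return TOY_PUSH_FREELIST_FRAME;
    if (m->compiledEntry != NULL)
        return TOY_PUSH_COMPILED_FRAME;
    return TOY_PUSH_JAVA_FRAME;
}
```

Checking for a native method first keeps the compiled check from ever applying to JNI methods, which have no bytecode to compile.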

On-Stack Replacement

But what if the method was being interpreted for a very long time in a loop, and we decided to compile it ... how do we continue executing the compiled version of that method when we were already interpreting it half way? What we need here is a feature called On-Stack Replacement (OSR). OSR allows us to replace the Java frame on the stack with an equivalent compiled frame.

Note that the shape of the compiled frame is very similar to that of the Java frame. The only visible difference here is the addition of the spill/temp area. This area has a fixed size that is known once the method is compiled (i.e. before the frame is pushed). There are other differences as well. A compiled frame could have more locals than its Java frame counterpart due to method inlining. Also, the size of the operand stack area could be different. By design, the CVM JIT keeps the same locals mapping for the compiled method as for its bytecode equivalent; locals that come from inlined methods are added at higher indexes. This makes it easy to map the locals over from the Java frame to the compiled frame. As for the frame info of the new compiled frame, we can compute it from the Java frame info without too much effort.

That leaves the spill area and the operand stack. One observation we made is that, because of the nature of the bytecodes generated when compiling the Java language, the operand stack tends to be empty at the beginning of a loop. This holds for 99% of the cases with today's javac compilers (that's a wild guess, but probably a good one). Hot loops are where we want OSR to occur. This means that there will be nothing on the operand stack to map when we are at the start of a loop, and we can take advantage of that to do our OSR.

As for the spill area, the CVM JIT also does not generate spill content at the start of loops. Hence, the only thing that needs to be done there is to reserve some space for it in the new frame. And with that, we can replace hot loops that are partially interpreted with the compiled equivalent.
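Under the assumptions above (same local numbering, empty operand stack at the loop head, spill area only reserved), the data transfer part of OSR can be sketched as follows. The ToyJavaFrameView and ToyCompiledFrameView types and toyOsrTransfer() are hypothetical; computing the compiled frame info and actually jumping into compiled code are not shown. Clearing the extra inlined locals is a choice made by this sketch, not necessarily what CVM does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef size_t Slot;

typedef struct {
    Slot *locals;
    size_t numLocals;
    size_t operandDepth;        /* current operand stack depth */
} ToyJavaFrameView;

typedef struct {
    Slot *locals;
    size_t numLocals;           /* >= Java frame's: inlined locals appended */
    size_t spillSlots;          /* fixed size, known after compilation */
    const void *pc;             /* compiled-code address to resume at */
} ToyCompiledFrameView;

/* Returns false if the preconditions this OSR scheme relies on fail. */
bool toyOsrTransfer(const ToyJavaFrameView *jf, ToyCompiledFrameView *cf,
                    const void *compiledLoopEntry)
{
    if (jf->operandDepth != 0)
        return false;           /* only at loop heads with an empty stack */
    if (cf->numLocals < jf->numLocals)
        return false;
    /* Same local numbering: Java local i becomes compiled local i.
     * memmove tolerates the two areas overlapping. */
    memmove(cf->locals, jf->locals, jf->numLocals * sizeof(Slot));
    /* Locals from inlined methods sit at higher indexes; this sketch
     * clears them. The spill area (spillSlots) is only reserved. */
    memset(cf->locals + jf->numLocals, 0,
           (cf->numLocals - jf->numLocals) * sizeof(Slot));
    cf->pc = compiledLoopEntry;
    return true;
}
```

The early return on a non-empty operand stack is what keeps this simple: anywhere other than a clean loop head, the sketch just declines and interpretation continues.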

Note: CVM only supports OSR of Java frames with their equivalent compiled frames, and not in the other direction. OSR in the other direction is significantly more difficult and may incur a cost that may not be justified for a JavaME system. This will be one area that the community can choose to investigate in the future if it wishes. There are some interesting advanced things that can be done with reverse OSR, but I'll leave that for another day if we ever get to it.

What Next?

With that, you have now been introduced to the Java stack structure of CVM. Tomorrow, I'll talk briefly about native stack frames (which any embedded programmer should be very familiar with already), and more interestingly, I'll talk about the interplay between the two. So, look for Part II of this discussion.

Till then, have a nice day. :-)


Comments


Very good material to understand jvm. Thank you, mlam.


mlam, thank you for your article, I am learning phoneme feature .