CVM Stacks and Code Execution
Posted by mlam on November 30, 2006 at 5:13 PM PST
Welcome to a continuation of the discussion on the internals of the phoneME Advanced VM (CVM). If you missed the beginning of this discussion, look here, where I gave a high-level introduction to some of the major VM data structures using the CVM map. Today, I'll get into the execution of Java methods and how it appears in the runtime stacks. By stacks, I mean the thread stacks that hold activation records for methods ... not stacks as in container APIs, or stacks as in API layers. This discussion will give you insight into the control flow of Java code execution in CVM (i.e. who has the CPU at any time). If you want to bring up a copy of the map for reference while you read on, click here (or here for a PDF to print). All the source files referenced below can be found in the src/share/javavm/include (see here) or src/share/javavm/runtime (see here) folders of the phoneME Advanced project. You will find the .h files in the include folder, and the .c files in the runtime folder.
The Execution Engines
There are many ways to measure the hotness of a method. The CLDC VM (phoneME Feature) uses a timer-based sampling mechanism. As of this writing, CVM uses invocation counts that are sampled during interpretation. When a method reaches a certain threshold of hotness, it gets compiled. The question then is how to switch from interpreting the method's bytecodes to executing its compiled code. To understand this (and all the other nuances of Java code execution), we need to look at what happens in the runtime stacks when Java code is executed ...
The Runtime Stacks
The Java Stack
Method activation records are stored in frames. The base class frame is CVMFrame (see stacks.h). There are also CVMFreeListFrame (see stacks.h), CVMJavaFrame, CVMTransitionFrame, and CVMCompiledFrame (see interpreter.h). All of these frames are polymorphic with respect to CVMFrame. Note: though CVM is written in C, its design makes extensive use of object-oriented paradigms, and many of its data structures are polymorphic where it makes sense.
CVMFrames also form a linked list of frames that can span stack chunks. The head of the list (i.e. the bottom-most frame in the stack) is always known because it is at the start of the first chunk. The last frame in the list (i.e. the top-most frame in the stack) is pointed to by the currentFrame pointer in CVMStack.
CVMJavaFrame
This frame structure looks like this:
                    |-----------------|
start of frame ---> | locals ...      |
                    |-----------------|
                    | frame info      |
                    | (CVMJavaFrame)  |
                    |-----------------|
top of stack -----> | operand stack   |
                    | ...             |
                    |-----------------|
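To make the idea of polymorphic frames in C concrete, here is a minimal sketch. It uses simplified, hypothetical type and function names (Frame, JavaFrame, Stack, pushFrame, countFrames, asJavaFrame) rather than CVM's actual definitions in stacks.h and interpreter.h. Each frame kind embeds a base struct as its first member, so a pointer to any frame can be treated as a pointer to the base. Frames are chained through a prev link, and the stack records only the top-most frame, which plays the role of the currentFrame pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: these are NOT CVM's real definitions. */
typedef uintptr_t Slot;                /* one word-sized stack slot */

typedef enum { FRAME_JAVA, FRAME_FREELIST, FRAME_COMPILED } FrameKind;

/* Base "class": every frame kind embeds this as its first member. */
typedef struct Frame {
    struct Frame *prev;                /* next frame down the stack */
    FrameKind kind;                    /* tag used to check downcasts */
} Frame;

/* "Subclass": a Java frame adds its own fields after the base. */
typedef struct JavaFrame {
    Frame base;                        /* must be first so upcasts are valid */
    const void *mb;                    /* method block pointer */
    const uint8_t *pc;                 /* next bytecode to execute */
    Slot *locals;                      /* locals sit below the frame info */
    Slot *topOfStack;                  /* operand stack grows above it */
} JavaFrame;

typedef struct Stack {
    Frame *currentFrame;               /* top-most frame (end of the list) */
} Stack;

/* Push: link the new frame on top of the current one. */
static void pushFrame(Stack *s, Frame *f) {
    f->prev = s->currentFrame;
    s->currentFrame = f;
}

/* Walk from the top-most frame down to the bottom-most one. */
static int countFrames(const Stack *s) {
    int n = 0;
    for (const Frame *f = s->currentFrame; f != NULL; f = f->prev) {
        n++;
    }
    return n;
}

/* Downcast from the base type; only valid when the tag says so. */
static JavaFrame *asJavaFrame(Frame *f) {
    assert(f->kind == FRAME_JAVA);
    return (JavaFrame *)f;
}
```

Following the prev links from the current frame visits every frame from the top of the stack down to the bottom-most frame, regardless of which chunk each one lives in.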
The locals area holds the Java locals (as defined in the VM spec). The operand stack area is where the VM pushes and pops operands: the arguments for opcode computations, the outgoing arguments for a method about to be invoked, or the return value from a method that was just invoked. The VM spec says that the number of locals and the maximum operand stack depth are known ahead of time for any given bytecode method. Hence, we know whether there is enough room left in the current stack chunk before we push the frame. If there isn't, a new stack chunk is allocated, and the frame is pushed on the new chunk instead. Since outgoing arguments (for the next method to be invoked) are stored on the operand stack, that part of the operand stack becomes the start of the locals area for the next frame, like this:
                         |-----------------|
start of frame --------> | locals ...      |
                         |-----------------|
                         | Method 1        |
                         | frame info      |
                         |-----------------|
                         | operand stack   |
                         |                 |-----------------|
start of next frame ---> | outgoing args = incoming locals   |
                         |                 |                 |
                         |-----------------|                 |
                                           |-----------------|
                                           | Method 2        |
                                           | frame info      |
                                           |-----------------|
top of stack ----------------------------> | operand stack   |
                                           |                 |
                                           |-----------------|
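The overlap pictured above comes down to simple pointer arithmetic. The following is a minimal sketch that uses hypothetical names and sizes (MethodSizes, FRAME_INFO_SLOTS, Chunk, calleeLocals, frameFits), not CVM's own. The callee's locals start argsSize slots below the caller's top of stack. Because the number of locals and the maximum operand stack depth are known ahead of time, the check against the current chunk can happen before the frame is pushed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t Slot;                /* one word-sized slot */

/* Hypothetical per-method sizes, all counted in slots. */
typedef struct {
    int argsSize;    /* incoming args; long/double take 2 slots each */
    int maxLocals;   /* includes the args, which start at local 0 */
    int maxStack;    /* max operand stack depth, known ahead of time */
} MethodSizes;

#define FRAME_INFO_SLOTS 4             /* hypothetical frame info size */

typedef struct {
    Slot *base;                        /* first slot of this chunk */
    Slot *limit;                       /* one past the last usable slot */
} Chunk;

/* The callee's locals begin where the caller's outgoing args begin,
 * i.e. argsSize slots below the caller's current top of stack. */
static Slot *calleeLocals(Slot *callerTopOfStack, const MethodSizes *m) {
    return callerTopOfStack - m->argsSize;
}

/* All sizes are known before the push, so we can check up front whether
 * the whole callee frame fits in the remainder of the current chunk. */
static bool frameFits(const Chunk *c, Slot *callerTopOfStack,
                      const MethodSizes *m) {
    Slot *locals = calleeLocals(callerTopOfStack, m);
    assert(locals >= c->base);         /* args must already be on this chunk */
    ptrdiff_t needed = m->maxLocals + FRAME_INFO_SLOTS + m->maxStack;
    return needed <= c->limit - locals;
}
```

For example, a static method taking (int, long) has argsSize 3 (one slot for the int, two for the long). When frameFits returns false, the VM allocates a new stack chunk and pushes the frame there, as described above. That path is not modeled in this sketch.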
This conforms to the VM spec, which says that incoming args start at local 0 of the frame's locals area. Note: in CVM, the locals and operand stack areas are made up of word-sized slots. On a 32-bit system, a slot is 32 bits of memory, and the stack pointer is incremented in word increments. These words can contain Java primitive values (64-bit long and double values take up 2 slots) or object pointers.
CVMFreeListFrame
                    |--------------------|
start of frame ---> | frame info         |
                    | (CVMFreeListFrame) |
                    |--------------------|
                    | operand stack      |
top of stack -----> | ...                |
                    |--------------------|
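The local ref scheme that this frame supports (reuse a released slot from the freelist first, otherwise bump the top of stack) can be sketched as follows. The names (Slot, FreeListFrame, allocRef, freeRef) are hypothetical, and the sketch models a single chunk only. As an assumption of this sketch, a released slot stores the freelist link itself, so the list needs no extra memory. A real implementation must also let the GC tell free slots apart from live references.

```c
#include <assert.h>
#include <stddef.h>

typedef void *Object;                  /* stand-in for an object pointer */

/* Hypothetical slot: holds a live ref, or, once released, a freelist link. */
typedef union Slot {
    Object obj;
    union Slot *nextFree;
} Slot;

typedef struct {
    Slot *freeList;                    /* head of the released slots */
    Slot *topOfStack;                  /* next never-used slot */
    Slot *limit;                       /* end of the current chunk */
} FreeListFrame;

/* NewLocalRef-style allocation: reuse a freed slot if there is one,
 * otherwise bump the top of stack. Returns NULL when the chunk is full;
 * a real VM would continue in a new stack chunk (not modeled here). */
static Slot *allocRef(FreeListFrame *f, Object obj) {
    Slot *s = f->freeList;
    if (s != NULL) {
        f->freeList = s->nextFree;     /* pop the head of the freelist */
    } else if (f->topOfStack < f->limit) {
        s = f->topOfStack++;           /* allocate a fresh slot */
    } else {
        return NULL;
    }
    s->obj = obj;
    return s;
}

/* DeleteLocalRef-style release: chain the slot onto the freelist. */
static void freeRef(FreeListFrame *f, Slot *s) {
    assert(s < f->topOfStack);         /* slot must have been allocated */
    s->nextFree = f->freeList;
    f->freeList = s;
}
```

Since the frame never needs a precomputed maximum size, the same structure also suits a GC root stack: a stack holding a single freelist frame whose slots are allocated and released on demand.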
One difference from the Java frame is that there are no incoming locals, or locals of any sort. For JNI methods, the incoming args are stored in the native stack frame of the native method. These args are copied from the outgoing args in the operand stack of the caller frame. This copying is part of the marshalling work done in the invokeNative assembler glue (see CVMjniInvokeNative, e.g. in invokeNative_arm.S here). Another difference is that the operand stack area is only used to store object pointers. Other operands are stored in CPU registers or on the native stack (depending on the C compiler that compiled the native method).
In JNI, when you allocate a local ref using NewLocalRef, the stack slot that holds the object pointer is allocated from the freelist frame. When you release the ref using DeleteLocalRef, that stack slot gets chained into a linked list called the freelist (hence the name freelist frame). The head of the list is stored in the CVMFreeListFrame record, along with a copy of the JNI method's MB pointer. When you allocate a local ref, we first check the freelist for an available slot. If one is available, it is removed from the list and returned. If not, we bump the top of stack pointer and allocate from the top of the operand stack. Note that, unlike with Java bytecode methods, we don't know the maximum number of operands that will be allocated. Fortunately, we don't have to: unlike Java frames, freelist frames can span stack chunks. If we run out of space in the current chunk, we simply add another chunk to the stack and allocate from the new chunk.
The other use of the freelist frame is for implementing the GC root stacks that I spoke of in a previous article. A GC root stack is implemented as a CVMStack with only one freelist frame in it. Since a GC root stack is supposed to be a list of object references that serve as GC roots and that can be allocated and released (e.g. when you call JNI's NewGlobalRef() and DeleteGlobalRef()), the freelist frame implements that nicely.
CVMTransitionFrame
One use of this mechanism is for invoking static initializer (<clinit>) methods. Another is for bootstrapping into the first method to be called in Java code.
CVMCompiledFrame
                    |--------------------|
start of frame ---> | locals ...         |
                    |--------------------|
                    | frame info         |
                    | (CVMCompiledFrame) |
                    |--------------------|
                    | spill/temp area    |
                    |--------------------|
                    | ...                |
top of stack -----> | operand stack      |
                    | ...                |
                    |--------------------|
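When a method is invoked, the VM picks one of three frame kinds: freelist, Java, or compiled. Here is a minimal sketch of that choice. It assumes a hypothetical Method record and a hypothetical COMPILE_THRESHOLD. A plain per-invocation counter stands in for CVM's sampled invocation counts, and setting isCompiled stands in for running the JIT.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PUSHED_FREELIST, PUSHED_JAVA, PUSHED_COMPILED } FramePushed;

/* Hypothetical method record; a real CVM method block holds far more. */
typedef struct {
    bool isNative;
    bool isCompiled;
    int invokeCount;                   /* simplified hotness counter */
} Method;

#define COMPILE_THRESHOLD 3            /* hypothetical hotness threshold */

/* Decide which kind of frame to push for this invocation. */
static FramePushed invoke(Method *m) {
    assert(!(m->isNative && m->isCompiled)); /* native methods aren't JITed */
    if (m->isNative) {
        return PUSHED_FREELIST;        /* then call via the invokeNative glue */
    }
    if (m->isCompiled) {
        return PUSHED_COMPILED;        /* then jump to the compiled entry point */
    }
    /* Bytecode method: count the invocation; once it is hot, "compile" it
     * so that later invocations take the compiled path. */
    if (++m->invokeCount >= COMPILE_THRESHOLD) {
        m->isCompiled = true;          /* stands in for running the JIT */
    }
    return PUSHED_JAVA;                /* then continue in the interpreter loop */
}
```

In this sketch, the invocation that crosses the threshold still runs in the interpreter, and only later invocations get a compiled frame. That gap is exactly what on-stack replacement addresses for methods stuck in a long-running loop.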
Like the Java frame, the CVMCompiledFrame contains an MB pointer and a PC. The PC in this case points to the compiled code instruction to be executed next (which is also typically the return PC). When the VM is about to invoke a method, it first checks whether the method is compiled, not compiled, or native. For native methods, a freelist frame is pushed before the method is invoked through the invokeNative assembler glue. For "not compiled" (bytecode) methods, a Java frame is pushed, and execution continues in the interpreter loop. If the method is compiled, a compiled frame is pushed, and the VM jumps to the entry point of the compiled method to continue execution.
On-Stack Replacement
On-stack replacement (OSR) lets a method that is already executing in the interpreter switch over to its compiled code: its Java frame is replaced in place by an equivalent compiled frame. Note that the shape of the compiled frame is very similar to that of the Java frame. The only visible difference here is the addition of the spill/temp area. This area has a fixed size, which is known once the method is compiled (i.e. before the frame is pushed). There are actually other differences. A compiled frame could have more locals than its Java frame counterpart due to method inlining. Also, the size of the operand stack area could be different. By design, the CVM JIT keeps the same locals mapping for the compiled method as for its bytecode equivalent, and locals added from inlined methods are placed at higher indexes. This makes it easy to map the locals over from the Java frame to the compiled frame. As for the frame info of the new compiled frame, we can compute it from the Java frame info without too much effort. That leaves the spill area and the operand stack. One observation we made is that, because of the nature of the bytecodes generated when compiling Java source, the operand stack tends to be empty at the beginning of a loop. This is true in 99% of cases for today's javac compilers (that's a wild guess, but probably a good one). Hot loops are where we want OSR to occur.
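Here is a minimal sketch of converting a Java frame to a compiled frame at a loop header. It uses hypothetical, heavily simplified frame views (JavaFrameView, CompiledFrameView, osrReplace). It relies on the properties described in this section: the locals keep the same mapping, with inlined locals at higher indexes; the operand stack is empty at the loop header; and the CVM JIT generates no spill content there, so the spill area only needs space reserved. Clearing the inlined locals is a choice made for this sketch, and computing the new frame info is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t Slot;                /* one word-sized slot */

/* Hypothetical, heavily simplified views of the two frames. */
typedef struct {
    Slot *locals;
    int numLocals;                     /* from the bytecode method */
    int operandDepth;                  /* current operand stack depth */
} JavaFrameView;

typedef struct {
    Slot *locals;
    int numLocals;                     /* >= Java count; inlined locals come last */
    int spillSlots;                    /* fixed size, known after compilation */
} CompiledFrameView;

/* Move a Java frame's state into a compiled frame at a loop header.
 * Returns false if the operand stack is not empty, since this scheme
 * only handles the case where there is nothing on it to map. */
static bool osrReplace(const JavaFrameView *jf, CompiledFrameView *cf) {
    if (jf->operandDepth != 0) {
        return false;
    }
    assert(cf->numLocals >= jf->numLocals);
    /* Same locals mapping: local i in bytecode is local i when compiled. */
    memcpy(cf->locals, jf->locals, (size_t)jf->numLocals * sizeof(Slot));
    /* Locals from inlined methods live at higher indexes; clear them. */
    memset(cf->locals + jf->numLocals, 0,
           (size_t)(cf->numLocals - jf->numLocals) * sizeof(Slot));
    /* The spill area (cf->spillSlots) is only reserved; nothing is copied. */
    return true;
}
```

Because the operand stack and spill area need no translation, the whole conversion reduces to copying locals. This is why forward OSR is cheap compared to the reverse direction.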
Since the operand stack is empty at the start of a loop in practice, there is nothing on it to map there, and we can take advantage of that to do our OSR. As for the spill area, the CVM JIT does not generate spill content at the start of loops either. Hence, the only thing that needs to be done there is to reserve space for it in the new frame. And with that, we can replace hot loops that are partially interpreted with their compiled equivalent. Note: CVM only supports OSR from Java frames to their equivalent compiled frames, not in the other direction. OSR in the other direction is significantly more difficult and may incur a cost that is not justified for a JavaME system. This is one area that the community can choose to investigate in the future if it wishes. There are some interesting advanced things that can be done with reverse OSR, but I'll leave that for another day if we ever get to it.
What Next?
Till then, have a nice day. :-)