|
|
||
Mark Lam's BlogA Tale of Two StacksPosted by mlam on December 05, 2006 at 01:10 AM | Comments (0)
So today, I'll continue talking about the runtime stacks in the phoneME Advanced VM / CVM from my last entry. Resources: start of CVM data structure discussion, copy of the map, PDF of map for printing, .h files in src/share/javavm/include, and .c files in src/share/javavm/runtime. Let's get started ... Why use 2 Stacks? In the early days of CVM, linux was still uncommon in embedded systems. Most embedded OSes in those days (and still today) do not have processes, and do not provide access to a virtual memory manager (even if the hardware supports it). Hence, when you create a native thread, you will have to malloc a chunk of memory for the use of the native stack. Remember that CVM's threads are mapped directly onto native threads. One issue with having to allocate memory for the stacks is knowing exactly how much memory to allocate. If you allocate too little, applications won't run. Allocate too much, and you'll have wasted resources. For embedded devices, wasting resources is highly undesirable. Note that without virtual memory, malloc'ing the memory here means a fixed sized allocation and the memory is committed upon allocation. In turn, that means no other thread can use the memory even if the current thread doesn't need it. This is why over-allocating stack sizes results in wastage for most embedded OSes. For classical embedded software written in C, one could do some analysis to determine max usage of stack space and get an optimum allocation of memory. For a Java platform, this is typically not possible. One of the most desirable features of the Java platform is its ability to enhance the device by allowing new or updated versions of applications to be downloaded and executed without having to re-ROM/flash the core firmware in the device. This means that the stack requirement of the application(s) is not known / computable at the time the Java VM is built. And the solution is ... The Separate Java stack The stack chunk size, and min and max Java stack sizes are specified in globals.c. Search for stackChunkSize, stackMinSize, and stackMaxSize respectively in the file. These 3 are specified in an example of some VM command-line options. Hence, similarly, these options can be changed on the VM command-line using -Xopt:stackChunkSize=<size>, -Xopt:stackMinSize=<size>, and -Xopt:stackMaxSize=<size> respectively. Note that the stack chunk size specification is only used to determine the typical chunk size, and not as a limit on chunk sizes. In the event that there is a Java method which requires an abnormally large frame that doesn't fit in the size of a typical chunk (e.g. has a lot of locals, and/or needs a very deep operand stack), the VM will allocate a special chunk just for that method frame and insert it into the stack where it is needed. The old chunk which isn't big enough will be freed. The min size will be used as the size of the first stack chunk. The max size is the upper bound on how much memory can be allocated for each Java thread. What happens to the Native Stack? the Interpreter's Native Frames For example, when methods mIa calls mIb which in turn calls mIc (where all 3 methods are interpreted bytecode methods), the frames on the stacks will look as follows: native stack: ... executeJava Note: executeJava is the abbreviation of the name of the interpreter loop function (see CVMgcUnsafeExecuteJavaMethod() in executejava_standard.c). Note also that the executeJava frame only appears once on the native stack even though there are 3 Java methods on the Java stack. This is part of the C recursion elimination feature that is designed into CVM to make it embedded friendly. Eliminating the need for C recursion in the interpreter now enables us to shrink the usage of the C stack significantly. Another example of C recursion elimination in CVM is in the classloading system where the highly recursive part of the classloading process is moved into Java code to make use of the Java stack instead. the Native Code's Native Frames When an interpreted method (mI) calls a native method (mN), the stacks will look as follows: native stack: ... executeJava -> mN But native methods can invoke another native or interpreted method through JNI. Here's what the stack looks like for native to native invocation: native stack: ... executeJava -> mNa -> executeJava -> mNb Here's what the stack looks like for native (mN) to interpreted (mI) invocation: native stack: ... executeJava -> mN -> executeJava Here's native (mNa) to interpreted (mIa) to native (mNb) to interpreted (mIb) invocation: native stack: ... executeJava -> mNa -> executeJava -> mNb -> executeJava Note that the interpreter loop must be on the native stack once for each native methods. This is because the interpreter loop is used to invoke a transition method which bootstraps the native method. See yesterday's discussion for details on transition frames and methods. Therefore, invoking any method from a native method (whether native or interpreted) requires recursing into the interpreter loop i.e. executeJava. Hence, it's not 100% accurate to say that CVM never recurses into the interpreter loop. But rather that CVM eliminates a lot of interpreter loop recursion i.e. those for bytecode methods. CVM will recurse into the interpreter loop for each layer of native methods called in the call chain. This is one reason why native methods aren't necessarily good for performance as mentioned previously. Recursion into the interpreter loop adds some overhead. However, this isn't the only reason native code can be bad for performance. More on that another day. Why recurse the Interpreter Loop? Calling a C function that is not a Java method, e.g. malloc(), does not require recursing into the interpreter loop. Detecting Stack Overflows For native functions, it would be painful to have to implement a stack bounds check on every function to be called. Even if we can do that for the VM code, we won't be able to enforce it on any native JNI methods that a third party middleware vendor may write. JNI also does not require nor have any mechanisms to achieve this. Another issue is that the performance impact will be high. So, for native code, we used a red zone mechanism. Though the Java application code is unpredictable at VM build time, the VM code itself is not. Hence, we can apply some static analysis to determine the max stack usage of native code (in most cases). But instead of inserting a stack check before every function call, we only insert stack checks at major recursion points. For example, see the call to CVMCstackCheckSize() (in stacks.c) from the interpreter loop, CVMgcUnsafeExecuteJavaMethod() (in executejava_standard.c). Instead of ensuring enough stack capacity for one function call, each checkpoint ensures enough capacity for any possible call chain from that checkpoint till another checkpoint is encountered (including till the same function recurses). This checked capacity is called a red zone. An example of a red zone value is CVM_REDZONE_ILOOP which is the stack capacity value which is checked at the checkpoint in the interpreter loop. All red zone values are defined in the platform specific threads_md.h (for example, see the linux one here). The defined values are conservative estimates that should work for any CPU architecture. If you look at the example source, you will see that all the red zone values are paded with a CVM_REDZONE_Lib_Overhead value. This value is the estimate stack usage for any given JNI method and its possible call chains (up to another red zone checkpoint, of course). The only case where we can't compute native stack usage is JNI code that is provided by a third party. Hence, we use a reasonably conservative estimate for this padding. A VM porting engineer should adjust this value if it is not conservative enough for the actual target device. However, it is always possible to write a JNI method that is pathological and will break any limit we set. But then again, when you have untrusted native methods, you'll have a lot more to worry about than unreasonable stack usage. Note: This is yet another reason to avoid using native code. Hence, the red zone padding here is meant only to provide for protection against overflow due to reasonable stack usage. It is not a guarantee against misbehaving native code. So, don't overdo the padding. Note that since every native method is bootstrapped by an instance of the interpreter loop, adding the padding to the interpreter loop red zone value should cover the JNI method's native stack usage. But, as a cautionary (possibly paranoid) measure, the padding was added to all the other red zone values as well. Note that the red zone values are only used for checking the capacity of potential stack usage. It only ensures that we have at least as much memory left in the native stack as specified by the red zone value before embarking on a native call chain. Hence, any wastage in stack memory (due to conservative guessing of red zone values) is bounded by the size of the max red zone value. The amount of wastage is not cummulative at each red zone checkpoint. Other Issues So, How do other VMs "stack" up? The phoneME Feature / CLDC VM implements its own pseudo-thread library for Java threads. In terms of native threads, there is only one for the VM (theoretically speaking). Instead, each Java thread get its own Java stack. The Java thread does not have a separate native stack. Disclaimer: I am not an expert on these other VMs. I'm only telling you what I've heard i.e. they both use one stack per thread. Time for Bed Have a nice day. :-) Bookmark blog post: CommentsComments are listed in date ascending order (oldest first) | Post Comment | ||
|
|