The BIG Picture: a Map of CVM
Posted by mlam on November 27, 2006 at 1:17 PM PST
Personally, when I dive into a new system, one of the first things that I try to figure out is how everything fits together. If you are a visual thinker like me, one of the best ways to do that is to draw a diagram of all the things that you think are important and see how they relate to one another. In the case of embedded systems, in my experience, it is also important to know what goes where in memory, and to get a feel for how system resources are being used. Hence, I prefer to map out the data structures. Here is my map of CVM ...

the WORLD according to CVM

And here's how to read the map ...

the Root Data Structure

GC and the Java Heap

CVM has a pluggable GC architecture. Pluggable as in build-time pluggable, not runtime pluggable. This allows experimental GCs to be tried out with CVM. Currently, the only product-quality GC for CVM is the generational GC (see here and here for GC-specific implementation files).

All Java objects, i.e. anything that extends from java.lang.Object, are allocated from the Java heap. The only exception is ROMized Java objects, which reside in global data. The Java heap itself is allocated from the C heap. All other data structures are allocated either from global data (i.e. .bss, .data, or their equivalents) or from the C heap.

the JIT and Compiled Code

When a Java method gets compiled by the JIT, the compiler-generated bits (commonly referred to as the compiled method) will reside in the code cache. The compiled method's meta-data (generated by the JIT) will also be stored in the code cache. Hence, the size of the code cache will dictate, indirectly, how many methods can be compiled.

Java Objects and Classes

Every Java object in CVM will have 2 words of header. The first word usually contains a pointer to the classblock. However, this header is not visible to Java code. It is only visible to the C side of the VM.
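As a rough illustration of the 2-word header, here is a minimal C sketch. The names (SketchClassBlock, SketchObjectHeader, variousWord, SketchPoint, sketchGetClass) are hypothetical and made up for this example; the real definitions live in objects.h and classes.h and will differ. The contents of the second word are not covered here, so the sketch treats it as opaque.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a classblock (the real one is in classes.h). */
typedef struct SketchClassBlock {
    const char *name;
} SketchClassBlock;

/* Hypothetical 2-word header that sits at the start of every object. */
typedef struct SketchObjectHeader {
    uintptr_t clas;        /* word 1: usually a pointer to the classblock */
    uintptr_t variousWord; /* word 2: treated as opaque in this sketch */
} SketchObjectHeader;

/* The header is exactly two machine words. */
static_assert(sizeof(SketchObjectHeader) == 2 * sizeof(uintptr_t),
              "header must be 2 words");

/* Hypothetical object layout: Java code would only ever see `value`. */
typedef struct SketchPoint {
    SketchObjectHeader hdr;
    int32_t value;
} SketchPoint;

/* C-side helper: recover the classblock from any object pointer.
   Assumes word 1 holds a plain pointer, which the text says is only
   "usually" the case. */
static const SketchClassBlock *sketchGetClass(const void *obj) {
    const SketchObjectHeader *hdr = (const SketchObjectHeader *)obj;
    return (const SketchClassBlock *)hdr->clas;
}
```

Because the header comes first in every object, C code in the VM can cast any object pointer to the header type and reach the class without knowing the object's concrete layout.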
Note: since java.lang.Class extends java.lang.Object, instances of Class will also have these 2-word headers. Key files to look at are objects.h and classes.h. See here for the files.

Java Threads

There is a one-to-one mapping between the ee (execution environment) and the java.lang.Thread instance. Once the thread is properly initialized, the 2 will always exist as a pair. There is also a one-to-one mapping between the ee and a JNIEnv. The JNIEnv is embedded as a field within the ee, so mapping between the ee and JNIEnv addresses basically requires only an offset adjustment.

All ees are chained together in a linked list. The head of this list is CVMglobals.threadList. The ee of the main thread is allocated as an embedded field in CVMglobals. The others are malloc'ed.

System Mutexes

Each sysMutex has a dedicated purpose (e.g. CVMglobals.threadLock is for synchronizing thread operations), and is ranked. In order to prevent deadlock, sysMutexes can only be locked in increasing rank order. When CVM is built with assertions enabled, this rank order will be asserted.

Java Execution Stack

The Java stack (also known as the interpreter stack) is used to hold the activation records of Java methods. For each Java method that is executed, a frame will be pushed on this stack. Stack and frame data structures are defined in stacks.h here and stacks.c here.

If you dump a trace of the native stack when executing several Java methods, you will see stack frames for C code and the interpreter loop. If you dump a trace of the Java stack, you will only see stack frames for the Java methods that have been invoked. If you have a native method in the invocation chain, you will see a stack frame in both the native and Java stacks. This is because the native method is both a C function and a Java method at the same time.

GC Roots and Root Stacks

All reachable (and therefore live) objects in the VM can be found by tracing a tree (or trees) of object references called the GC root tree.
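The idea of finding live objects by tracing object references can be sketched in C as follows. This is only a minimal mark-phase illustration, not CVM's actual GC code: SketchObj, sketchMark, and sketchMarkFromRoots are hypothetical names, and each object is assumed to hold at most two references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_REFS 2

/* Hypothetical object: a mark bit plus a few outgoing references. */
typedef struct SketchObj {
    bool marked;
    struct SketchObj *refs[SKETCH_MAX_REFS];
} SketchObj;

/* Mark obj and everything reachable from it (recursive for brevity). */
static void sketchMark(SketchObj *obj) {
    if (obj == NULL || obj->marked) {
        return; /* null reference, or already visited (handles cycles) */
    }
    obj->marked = true;
    for (size_t i = 0; i < SKETCH_MAX_REFS; i++) {
        sketchMark(obj->refs[i]);
    }
}

/* Trace from every root reference; whatever stays unmarked is unreachable. */
static void sketchMarkFromRoots(SketchObj **roots, size_t numRoots) {
    assert(roots != NULL || numRoots == 0);
    for (size_t i = 0; i < numRoots; i++) {
        sketchMark(roots[i]);
    }
}
```

An object that is never reached from any root keeps its mark bit clear. A real collector on an embedded device would likely avoid unbounded recursion, so treat the recursive walk as a simplification.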
The tree starts from a root reference. These root references are essentially globals, and are usually stored in data structures called root stacks. An example of this is CVMglobals.globalRoots. Strictly speaking, these data structures need not be stacks. They are actually used as lists. However, our Java stack data structures have properties that fulfill the needs of GC root stacks nicely, and don't require us to write additional code (good for code efficiency). So, we just use the stacks.

If an object cannot be found by tracing the root trees, then that object is unreachable and therefore can be reclaimed by the GC.

Note that in traversing a tree, at any point in the traversal, a node can be the root of a new subtree. Hence, the term root or GC root is sometimes used to refer to object pointers / references that are found along the way in a root scan. GC roots can be found in the root stacks, in thread execution stacks, and in object and class fields.

the End

In the above, I left out many juicy details like ... why allocate a data structure from the C heap vs the Java heap. I'll leave that for subsequent discussions. So, in the next few days (or weeks), I will zoom in on the CVM subsystems and/or data structures (one at a time), and talk about them in detail. This will include mechanical details as well as design philosophies for why things are the way they are (when relevant, of course). Again, feel free to ask questions or make requests for topics. I will try to accommodate as much as I can.

Have a nice day. :-)