Mark Lam

Mark Lam's Blog

The BIG Picture: a Map of CVM

Posted by mlam on November 27, 2006 at 01:17 PM | Comments (8)

Personally, when I dive into a new system, one of the first things that I try to figure out is how everything fits together. If you are a visual thinker like me, one of the best ways to do that is to draw a diagram of all the things that you think are important and see how they relate to one another. In the case of embedded systems, in my experience, it is also important to know what goes where in memory, and to get a feel for how system resources are being used. Hence, I prefer to map out the data structures.

Here is my map of CVM ...

the WORLD according to CVM

Map of CVM Data Structures
Click on the map to get a popup window with a 1024 x 768 res bitmap of the map (if you want to view it in a separate window). Or click here to view the map in a PDF file. I highly recommend using the PDF if you plan to do a printout of the map.

And here's how to read the map ...

the Root Data Structure
One of CVM's design criteria is to be restartable even when you run it on an OS that is not process based. Restartability without processes requires that we are able to release all malloc'ed memory. To make life easier (and it is good practice anyway), we make sure that all data is reachable from the root of a single tree of data structures in memory. This root data structure is CVMglobals, which you will find at the left side of the map. You will find CVMglobals defined in globals.h here (also look for CVMGlobalState in this file) and globals.c here. Looking in CVMglobals, you will find that it is an aggregation of system global data structures. Keeping the globals in one location also makes it easier to restore them to a known initial state, i.e. by memsetting the whole thing to 0 (after we have done proper clean up of all the subtrees, of course).
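To make the idea concrete, here is a minimal sketch in C of a single root structure that owns every allocation and is reset with memset on shutdown. All the names (DemoGlobalState, demoInitGlobals, and so on) are hypothetical stand-ins for illustration; the real CVMGlobalState in globals.h is far larger and is cleaned up by the VM's own shutdown code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the per-subsystem records inside the root. */
typedef struct {
    size_t heapSize;
    void  *heapStart;       /* malloc'ed subtree owned by the root */
} DemoGCGlobals;

typedef struct {
    size_t codeCacheSize;
    void  *codeCacheStart;  /* malloc'ed subtree owned by the root */
} DemoJITGlobals;

typedef struct {
    DemoGCGlobals  gc;
    DemoJITGlobals jit;
} DemoGlobalState;

static DemoGlobalState demoGlobals;  /* the single root, in global data */

/* Boot: hang every allocation off the root so it can be found later.
 * Returns 1 on success, 0 if memory could not be allocated. */
int demoInitGlobals(size_t heapSize, size_t codeCacheSize) {
    demoGlobals.gc.heapStart = malloc(heapSize);
    demoGlobals.jit.codeCacheStart = malloc(codeCacheSize);
    if (!demoGlobals.gc.heapStart || !demoGlobals.jit.codeCacheStart) {
        free(demoGlobals.gc.heapStart);
        free(demoGlobals.jit.codeCacheStart);
        memset(&demoGlobals, 0, sizeof demoGlobals);
        return 0;
    }
    demoGlobals.gc.heapSize = heapSize;
    demoGlobals.jit.codeCacheSize = codeCacheSize;
    return 1;
}

/* Shutdown: free the subtrees first, then reset the root itself. */
void demoDestroyGlobals(void) {
    free(demoGlobals.gc.heapStart);
    free(demoGlobals.jit.codeCacheStart);
    memset(&demoGlobals, 0, sizeof demoGlobals);
}
```

Because nothing is allocated outside the root, calling demoDestroyGlobals() followed by demoInitGlobals() again behaves like a fresh start without needing the OS to tear down a process.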

GC and the Java Heap
From the globals, you can find an embedded struct which holds GC configuration and management information (CVMglobals.gc). From this, you will be able to get to the Java heap eventually.

CVM has a pluggable GC architecture. Pluggable as in build-time pluggable, not runtime pluggable. This allows for experimental GCs to be tried out with CVM. Currently, the only product quality GC for CVM is the generational GC (see here and here for GC specific implementation files).

All Java objects, i.e. anything that extends java.lang.Object, are allocated from the Java heap. The only exception is ROMized Java objects, which reside in global data. The Java heap itself is allocated from the C heap. All other data structures are allocated either from global data (i.e. .bss, .data, or their equivalents) or from the C heap.

the JIT and Compiled Code
CVMglobals also holds the configuration and management records for the JIT (CVMglobals.jit). Traversing that tree, you will eventually find the JIT code buffer (also commonly known as the code cache). The code cache is currently fixed-size (though its size is runtime configurable) and is allocated at VM boot time. Once it has been allocated, its size cannot be changed.

When a Java method gets compiled by the JIT, the compiler generated bits (commonly referred to as the compiled method) will reside in the code cache. The compiled method's meta-data (generated by the JIT) will also be stored in the code cache. Hence, the size of the code cache will dictate, indirectly, how many methods can be compiled.
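As a rough illustration of why the cache size limits the number of compiled methods, here is a sketch of a fixed-size buffer from which both the generated code and its metadata are carved. The DemoCodeCache type and its functions are hypothetical, and the sketch deliberately ignores any reclamation of old methods that the real code cache may do.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned char *start;  /* allocated once at boot, never resized */
    size_t         size;
    size_t         used;
} DemoCodeCache;

/* Allocate the cache once. Returns 1 on success, 0 on failure. */
int demoCodeCacheInit(DemoCodeCache *cc, size_t size) {
    cc->start = malloc(size);
    cc->size = cc->start ? size : 0;
    cc->used = 0;
    return cc->start != NULL;
}

/* Reserve room for one compiled method: its code plus its metadata.
 * Returns NULL when the remaining space is too small. */
void *demoCodeCacheAlloc(DemoCodeCache *cc, size_t codeBytes, size_t metaBytes) {
    size_t need = codeBytes + metaBytes;
    if (need < codeBytes || need > cc->size - cc->used) {
        return NULL;  /* overflow in the sum, or cache is full */
    }
    void *p = cc->start + cc->used;
    cc->used += need;
    return p;
}

void demoCodeCacheDestroy(DemoCodeCache *cc) {
    free(cc->start);
    cc->start = NULL;
    cc->size = 0;
    cc->used = 0;
}
```

In this model, a failed demoCodeCacheAlloc() simply means the method is not compiled and keeps running in the interpreter.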

Java Objects and Classes
When a classfile is loaded into memory, its contents are basically parsed and organized into an optimal structure which is allocated from the C heap. This structure is called the CVMClassBlock, and it holds all the metadata of the class. The metadata includes the constantpool, class attributes, field and method information, bytecodes, etc. For each CVMClassBlock, there is one instance of java.lang.Class which will be allocated from the Java heap. Once a class has been properly loaded, these will always exist as a pair. The classblock will have a reference to the class, and vice versa. When the class is unloaded, they will effectively be freed together.

Every Java object in CVM will have 2 words of header. The first word usually contains a pointer to the classblock. However, this header is not visible to Java code. It is only visible to the C side of the VM. Note: since java.lang.Class extends java.lang.Object, instances of Class will also have these 2 word headers.
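The relationship between an object header, a classblock, and its java.lang.Class instance can be sketched as follows. The type and field names are hypothetical and much simpler than what objects.h and classes.h define; in particular, the second header word is shown only as an opaque placeholder because its contents are not covered here.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct DemoClassBlock DemoClassBlock;

/* Two-word header in front of every object's fields. */
typedef struct {
    DemoClassBlock *clas;     /* word 1: pointer to the classblock */
    uintptr_t       various;  /* word 2: opaque placeholder in this sketch */
} DemoObjectHeader;

/* Stand-in for a java.lang.Class instance: an object like any other,
 * so it starts with the same header. */
typedef struct {
    DemoObjectHeader hdr;
    DemoClassBlock  *classBlock;    /* Class instance -> classblock */
} DemoClassObject;

/* Stand-in for the C heap metadata of a loaded class. */
struct DemoClassBlock {
    const char      *name;
    DemoClassObject *javaInstance;  /* classblock -> Class instance */
};

/* Pair a classblock with its Class instance. classOfClass is the
 * classblock of java.lang.Class itself. */
void demoLinkClass(DemoClassBlock *cb, DemoClassObject *classObj,
                   DemoClassBlock *classOfClass) {
    classObj->hdr.clas = classOfClass;
    classObj->hdr.various = 0;
    classObj->classBlock = cb;
    cb->javaInstance = classObj;
}

/* From any object, the header leads to the class metadata. */
const char *demoClassNameOf(const DemoObjectHeader *obj) {
    return obj->clas->name;
}
```

Note that the header of a Class instance points at the classblock for java.lang.Class, while its classBlock field points at the class it describes.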

Key files to look at are objects.h and classes.h. See here for the files.

Java Threads
In order to execute anything, the VM must have threads. Each Java thread is represented by a CVMExecEnv (also commonly referred to as an ee). In the VM, the ee is essentially the token identifier of the thread. All thread operations require the ee of the currently executing thread as a parameter. See interpreter.h here and interpreter.c here.

There is a one-to-one mapping between the ee and the java.lang.Thread instance. Once the thread is properly initialized, the 2 will always exist as a pair.

There is also a one-to-one mapping between the ee and a JNIEnv. The JNIEnv is embedded as a field within the ee. Mapping between the ee and JNIEnv addresses basically requires only an offset adjustment.
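The offset adjustment can be sketched with offsetof. The struct layouts and function names below are hypothetical; only the technique (the JNIEnv embedded as a field of the ee) comes from the description above.

```c
#include <stddef.h>

/* Stand-in for a JNIEnv: just a pointer to a function table. */
typedef struct {
    const void *functions;
} DemoJNIEnv;

/* Stand-in for a CVMExecEnv with the JNIEnv embedded, not pointed to. */
typedef struct DemoExecEnv {
    int                 threadID;
    DemoJNIEnv          jniEnv;
    struct DemoExecEnv *next;
} DemoExecEnv;

/* ee -> JNIEnv: take the address of the embedded field. */
DemoJNIEnv *demoEE2JNIEnv(DemoExecEnv *ee) {
    return &ee->jniEnv;
}

/* JNIEnv -> ee: subtract the field's offset within the ee. */
DemoExecEnv *demoJNIEnv2EE(DemoJNIEnv *env) {
    return (DemoExecEnv *)((char *)env - offsetof(DemoExecEnv, jniEnv));
}
```

Neither direction needs a lookup table or a stored back-pointer, which is why the conversion is essentially free.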

All ees are chained together in a linked list. The head of this list is CVMglobals.threadList. The ee of the main thread is allocated as an embedded field in CVMglobals. The others are malloc'ed.
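Here is a minimal sketch of such a list, with the main thread's ee embedded in the globals and the rest malloc'ed. The names are hypothetical, and the locking that a real VM needs around list manipulation is left out.

```c
#include <stdlib.h>

typedef struct DemoEE {
    int            threadID;
    struct DemoEE *next;
} DemoEE;

typedef struct {
    DemoEE  mainEE;      /* embedded: lives as long as the globals do */
    DemoEE *threadList;  /* head of the list of all ees */
} DemoGlobals;

void demoInitThreads(DemoGlobals *g) {
    g->mainEE.threadID = 0;
    g->mainEE.next = NULL;
    g->threadList = &g->mainEE;
}

/* Create an ee for a new thread and push it on the front of the list.
 * Returns NULL if memory could not be allocated. */
DemoEE *demoAttachThread(DemoGlobals *g, int threadID) {
    DemoEE *ee = malloc(sizeof *ee);
    if (ee == NULL) {
        return NULL;
    }
    ee->threadID = threadID;
    ee->next = g->threadList;
    g->threadList = ee;
    return ee;
}

/* Unlink an ee. Only malloc'ed ees are freed, never the embedded one. */
void demoDetachThread(DemoGlobals *g, DemoEE *ee) {
    for (DemoEE **link = &g->threadList; *link != NULL; link = &(*link)->next) {
        if (*link == ee) {
            *link = ee->next;
            if (ee != &g->mainEE) {
                free(ee);
            }
            return;
        }
    }
}

int demoCountThreads(const DemoGlobals *g) {
    int n = 0;
    for (const DemoEE *ee = g->threadList; ee != NULL; ee = ee->next) {
        n++;
    }
    return n;
}
```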

System Mutexes
Manipulation of the VM thread list needs to be synchronized. The same is true for many other subsystems and resources in the VM. This synchronization is normally done by using a CVMSysMutex (see sync.h here and sync.c here). There are several sysMutexes allocated at VM boot time. These mutexes are visible to, and used by, VM C code only, not Java code.

Each sysMutex has a dedicated purpose (e.g. the CVMglobals.threadLock is for synchronizing the thread operations), and is ranked. In order to prevent deadlock, sysMutexes can only be locked in increasing rank order. When CVM is built with assertions enabled, this rank order will be asserted.
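The rank rule can be sketched as follows. This models only the rank bookkeeping and the assertion, not a real OS lock, and the types, the per-thread bitmask, and the rank values are all hypothetical choices made for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    const char *name;
    unsigned    rank;   /* 0..31 in this sketch; unique per mutex */
} DemoSysMutex;

typedef struct {
    uint32_t heldRanks; /* bit r is set while this thread holds rank r */
} DemoThread;

/* True if locking m keeps this thread's locks in increasing rank order,
 * i.e. the thread holds nothing of rank m->rank or higher. */
int demoRankOrderOK(const DemoThread *t, const DemoSysMutex *m) {
    return (t->heldRanks >> m->rank) == 0;
}

void demoSysMutexLock(DemoThread *t, DemoSysMutex *m) {
    assert(m->rank < 32);
    assert(demoRankOrderOK(t, m));  /* compiled out when NDEBUG is defined */
    /* A real implementation would acquire the underlying OS lock here. */
    t->heldRanks |= (uint32_t)1 << m->rank;
}

void demoSysMutexUnlock(DemoThread *t, DemoSysMutex *m) {
    assert(t->heldRanks & ((uint32_t)1 << m->rank));
    /* A real implementation would release the underlying OS lock here. */
    t->heldRanks &= ~((uint32_t)1 << m->rank);
}
```

If every thread takes its locks in increasing rank order, no cycle of threads waiting on each other can form, which is what rules out deadlock. As in the description above, the check costs nothing in builds where assertions are disabled.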

Java Execution Stack
Any thread of execution must have an execution stack. In CVM, each Java thread has 2 physical stacks: a native stack, and a Java stack. The native stack is the one that is allocated by the OS, and is used for C code execution. It holds the activation records (i.e. stack frames) of native code, and VM code including the interpreter loop function. It also holds activation frames for JIT compiled code (with a twist).

The Java stack (also known as the interpreter stack) is used to hold the activation records of Java methods. For each Java method that is executed, a frame will be pushed on this stack. Stack and frame data structures are defined in stacks.h here and stacks.c here .

If you dump a trace of the native stack when executing several Java methods, you will see stack frames for C code and the interpreter loop. If you dump a trace of the Java stack, you will only see stack frames for the Java methods that have been invoked. If you have a native method in the invocation chain, you will see a stack frame in both the native and Java stack. This is because the native method is both a C function and a Java method at the same time.

GC Roots and Root Stacks
In GC terms, CVM is called an exact VM. This means that at the time of GC, we will be able to know definitely where all the object pointers are in the system. This is in contrast with conservative GC systems, which require you to guess whether some piece of memory contains an object pointer or just some random data that resembles an object pointer.

All reachable (and therefore live) objects in the VM can be found by tracing the tree (or trees) of object references called the GC root tree. The tree starts from a root reference. These root references are essentially globals, and are usually stored in data structures called root stacks. An example of this is CVMglobals.globalRoots. Strictly speaking, these data structures need not be stacks. They are actually used as lists. However, our Java stack data structures have properties that fulfill the needs of GC root stacks nicely, and don't require us to write additional code (good for code efficiency). So, we just use the stacks.

If an object cannot be found by tracing the root trees, then that object is unreachable and therefore can be reclaimed by the GC.

Note that in traversing a tree, at any point in the traversal, a node can be the root of a new subtree. Hence, the term root or GC root is sometimes used to refer to object pointers / references that are found along the way in a root scan. GC roots can be found in the root stacks, in thread execution stacks, and in object and class fields.
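The reachability idea can be sketched as a simple recursive trace from a list of roots. This is a generic mark-style traversal written for illustration, not CVM's generational collector, and the object layout (a fixed number of reference fields) and all names are hypothetical.

```c
#include <stddef.h>

#define DEMO_MAX_REFS 2

typedef struct DemoObj {
    int             marked;
    struct DemoObj *refs[DEMO_MAX_REFS];  /* reference fields, NULL if unset */
} DemoObj;

/* A root stack used as a plain list of root references. */
typedef struct {
    DemoObj *slots[8];
    size_t   count;
} DemoRootStack;

/* Mark obj and everything reachable from it. Each object reached acts
 * as the root of its own subtree. */
void demoMark(DemoObj *obj) {
    if (obj == NULL || obj->marked) {
        return;  /* the marked check also terminates cycles */
    }
    obj->marked = 1;
    for (size_t i = 0; i < DEMO_MAX_REFS; i++) {
        demoMark(obj->refs[i]);
    }
}

/* Trace from every root. Anything left unmarked is unreachable. */
void demoScanRoots(const DemoRootStack *roots) {
    for (size_t i = 0; i < roots->count; i++) {
        demoMark(roots->slots[i]);
    }
}
```

An exact VM can run a trace like this with confidence because every slot it visits is known to hold either an object reference or NULL, never arbitrary data.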

the End
That should be enough to give you an overall idea of how the major data structures are laid out in CVM. Note: most of the things I told you above are meant to give you a good conceptual model of the lay of the land. In practice, there will be exceptions in some cases for various reasons. Sometimes, these exceptions will break the rules. Other times, they are more like extensions to the rules. To keep things simple, I left out the exceptions. I may get into those when I talk about each subsystem and/or data structure specifically.

In the above, I also left out many juicy details like ... why allocate a data structure from the C heap vs the Java heap. I'll leave that for subsequent discussions.

So, in the next few days (or weeks), I will zoom in on the CVM subsystems and/or data structures (one at a time), and talk about them in detail. This will include mechanical details as well as design philosophies for why things are the way they are (when relevant, of course). Again, feel free to ask questions or make requests for topics. I will try to accommodate as much as I can.

Have a nice day. :-)


Comments
Comments are listed in date ascending order (oldest first)

  • Mark--this is fascinating and well-written to boot. I have one question as regards threading--what requirements are there as regards the host device for threading, what threading library/approach do you depend on?

    Keep it up! Maybe the content of these blogs should go in a Wiki related to the project? It's a great introduction to the code.

    Cheers
    Patrick

    Posted by: pdoubleya on November 28, 2006 at 02:32 AM

  • Hi Patrick, thanks for your comment.

    Regarding threading, CVM basically makes use of the underlying OS thread library, i.e. native threads with full pre-emption. Technically, it should work with green (simulated) threads too. However, I don't think that it has ever been tested with them. If the green threads library requires regular check points where each thread yields the CPU to another thread, then there is no explicit support for that currently. But usually, the thread library works with the synchronization library. Hence, yields will occur at sync points (like when waiting on locks and condition variables). The thread scheduling may seem weird, but it is not incorrect. Java threads make no guarantees about how threads are scheduled. The one thing CVM does not do is provide its own thread library.

    Regarding putting this content in a twiki, I do have plans for that but I'm not sure when I'll get around to it (as I would prefer to reformat things to make it more suitable for a twiki). The phoneME Advanced twiki is here. There's not much on it yet, but we're only just starting.

    Regards, Mark

    Posted by: mlam on November 28, 2006 at 09:41 AM

  • Hello Mark!

    Thank you very much for your blogs. I've been working on J2ME applications for some time now and, while I've been wanting to know about what really happens under the hood, I've not had an opportunity until now. However, with the help of your blogs and by asking lot of questions, I hope to gain an understanding of the inner mysteries. So here are the first two questions:

    1) with reference to the cvm map, would I be right in assuming that the platform specific codes are localised in the JNI mechanism or are there other areas of the cvm that are also affected by architectural features of platforms?

    2) it would be wonderful if a sort of "cvm prototyping kit" was available so that one could play around with vm code and study the effects of modifications. It'd be especially great for people like me who don't work for vm "manufacturers". This, of course, is really not a question but an expression of wishful thinking and I'd love to have your comments.

    Thanks once again

    Biswajit Sarkar

    Posted by: bsarkar on December 04, 2006 at 08:16 AM

  • Hi Biswajit,
    Thanks for your feedback. Regarding your first question, see my first blog entry. That will tell you about platform specific code in the VM. There are, of course, some JNI native methods for the class libraries. A lot of these are platform independent at the source code level. Some, which use specific libraries / toolkits / OS features, are platform specific.

    Regarding your second question, phoneME Advanced is open-sourced. You can download the code, build it, and try it yourself. Others have done this already. Just go to the phoneME project and follow the links.

    *EDIT*: I forgot to mention that you'll be able to reach more experts (including class library engineers, etc) at the phoneME Advanced forum. Feel free to ask questions there as well. Of course, I love getting comments too.

    Regards, Mark

    Posted by: mlam on December 04, 2006 at 10:40 AM

  • http://www.jnode.org/search/node/jvm

    Posted by: robinpaul on February 27, 2007 at 11:58 PM

  • I'd love to hear the explanations on why specific things are on the Java heap vs. the malloc heap. In particular it seems like there are a lot of things outside the Java heap that need to refer to things inside the Java heap (eg jitted code) resulting in a potentially large number of roots when collecting garbage.

    Posted by: erikcorry on August 09, 2007 at 05:41 AM

  • Erik, the answer to your question is here: CVM: Why use the C or Java heap?. -- regards, Mark

    Posted by: mlam on August 09, 2007 at 06:54 PM


