Mark Lam's Blog

CVM: Why use the C or Java heap?

Posted by mlam on August 09, 2007 at 06:49 PM

A comment in a previous blog asks ...
For those of you who haven't been following my blogs before, Erik is asking a specific question regarding the memory layout of data structures in the CVM Java virtual machine (aka the phoneME Advanced VM). Erik, do you mean why specific things are in the C heap instead of the Java heap? Or do you mean why specific things are in the Java heap instead of the C heap? Well, let me answer both ...

Why are some things in the Java heap?

In general, objects that are instantiated in the Java programming language all reside in the Java heap.

Why are some things in the C heap?

But you're probably asking about VM data structures like the CVMClassBlock (cb), CVMMethodBlock (mb), CVMFieldBlock (fb), and the CVMExecEnv (ee) data structure. These are just a few of the more prominent examples of VM data structures that have logical equivalents in the Java world, i.e. Class, Method, Field, and Thread. These data structures are called meta-data.

Technically, you can choose to implement a VM that keeps all of this meta-data in the Java heap as well. For example, the SE HotSpot VM allocates its class meta-data (class, methods, fields) from the Java heap. I don't know what they do with threads, though I am quite sure that at least some part of the thread (the native stack) resides in the malloc heap (or mmap'ed memory). The CLDC VM (aka phoneME Feature VM) also allocates its meta-data in the Java heap. For both these VMs, the reason for doing so is to be able to get memory compaction when the meta-data is no longer needed.

CVM chose to allocate these from the C heap because these data structures tend to be accessed a lot during Java code execution. For example, an invocation bytecode specifies a constant pool entry that refers to the method to be invoked in terms of a String that names the method. The interpreter would quicken this into a pointer to the method itself. Similarly, JIT generated code that needs to access class meta-data is given direct pointers to the data itself.
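To make the quickening idea concrete, here is a minimal sketch in C. It is not CVM's actual code: `MethodBlock`, `CpEntry`, `lookup_method`, `invoke`, and `lookup_count` are hypothetical names invented for illustration, and the real CVMMethodBlock and constant pool layouts are far richer. The sketch only shows the general shape of the technique: the first invocation resolves a symbolic name to a pointer and caches that pointer in the constant pool entry, and later invocations follow the pointer directly. This only works because the method table here never moves, which is the property the C heap provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a method block; not CVM's CVMMethodBlock. */
typedef struct MethodBlock {
    const char *name;
    int (*code)(int);
} MethodBlock;

/* A constant pool entry is either still symbolic or already quickened. */
typedef struct CpEntry {
    int resolved;            /* 0: holds a name, 1: holds a direct pointer */
    union {
        const char  *name;   /* symbolic reference before quickening */
        MethodBlock *mb;     /* direct pointer after quickening */
    } u;
} CpEntry;

int twice(int x) { return 2 * x; }
int succ(int x)  { return x + 1; }

/* Statically allocated, so entries never move (like C-heap meta-data). */
MethodBlock method_table[] = {
    { "twice", twice },
    { "succ",  succ  },
};

/* Counts slow-path lookups so the effect of quickening is observable. */
int lookup_count = 0;

/* Slow path: find a method block by name. */
MethodBlock *lookup_method(const char *name) {
    size_t n = sizeof method_table / sizeof method_table[0];
    lookup_count++;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(method_table[i].name, name) == 0) {
            return &method_table[i];
        }
    }
    return NULL;
}

/* Invoke through a constant pool entry, quickening it on first use. */
int invoke(CpEntry *e, int arg) {
    if (!e->resolved) {
        MethodBlock *mb = lookup_method(e->u.name);
        assert(mb != NULL);  /* a real VM would throw a linkage error */
        e->u.mb = mb;        /* overwrite the name with a direct pointer */
        e->resolved = 1;
    }
    return e->u.mb->code(arg);
}
```

Calling `invoke` twice on the same entry performs the name lookup only once; the second call goes straight through the cached pointer. If `method_table` could be relocated by a garbage collector, every quickened entry would have to be found and patched, which is the cost discussed next.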
Having a direct pointer results in higher performance from not having to go through levels of indirection. Of course, direct pointers are only possible for objects that don't move. And that's why CVM allocates them from the C heap.

But, but, but ...

They do this by keeping pointer relocation tables of where all such pointers exist in the meta-data objects. When GC runs and these objects get moved, the pointers that point to them will be updated. That includes pointers that reside in the constant pool (due to quickening) and in JIT compiled code. Hence, they use direct pointers as well. The only difference is that they need to incur the footprint cost for the pointer relocation tables, and the GC time cost to relocate these pointers.

Now, Wait a Minute!

The difference is this: for CLDC, we're dealing with extremely small libraries and applications, and therefore an extremely small heap. The number of classes (and therefore the number of pointer relocation tables) for CLDC is far smaller than for CDC, which is what CVM is primarily targeted at. Hence, relocating the meta-data isn't as expensive for the CLDC VM. CLDC is also extremely tight for space, hence they need to compact as much as they can.

As for the SE HotSpot VM, we have JavaSE, which has a lot more classes to deal with than CDC. However, JavaSE is traditionally targeted at relatively more capable machines with a lot more memory and computing power, i.e. desktops and servers. Hence, the cost of relocating the meta-data is more tolerable there too.

CVM services the space in between, where the number of classes is much larger than CLDC's but the VM must run on machines that are not as capable as JavaSE's typical targets. Hence, the tradeoff decision was made early on to allocate these data structures in the C heap.

What about Fragmentation?

What about the JIT and roots?
Erik, when you referred to the JIT compiled code, I presume you meant the generated code and not the references on the Java stack that it operates on. CVM's JIT compiled code doesn't have any such references to the Java heap. Instead, there are references to the meta-data in the C heap, e.g. the cb, mb, and fb's. On the contrary, allocating this meta-data from the Java heap would require a lot of additional root traversals and reference fixups due to the pointers to the meta-data. Hence, CVM's approach actually results in fewer GC roots to scan.

Or were you asking about allocating the compiled code buffer itself from the Java heap? The CLDC VM does that. As a result, the compiled code can be relocated during GC. Note that allocating the compiled code buffer from the Java heap doesn't actually reduce the number of roots the GC has to scan. You might be thinking of roots in terms of the root of a reference tree, and in this case, it appears that the JIT compiled code can hold a bunch of these roots. If there are references to objects in compiled code (which there aren't in CVM), then yes, you will increase the number of "roots" pointing into the Java heap from the outside. However, roots are just object references that the GC knows about. They are no different from any other object reference, e.g. fields inside an object. It does not matter much whether they are outside or inside the Java heap. The GC still has to scan them. So, that aspect of it doesn't really make any difference. And again, I remind you that CVM's compiled code actually does not have any GC roots in it.

Last Thoughts

Whether those incentives will prove compelling enough to motivate the work, only time will tell.

Regards,