<?xml version="1.0" encoding="utf-8"?>
<feed version="0.3" xmlns="http://purl.org/atom/ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xml:lang="en">
<title>Mark Lam&apos;s Blog</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/mlam/" />
<modified>2008-06-17T06:29:31Z</modified>
<tagline></tagline>
<id>tag:weblogs.java.net,2008:/blog/mlam/356</id>
<generator url="http://www.movabletype.org/" version="3.01D">Movable Type</generator>
<copyright>Copyright (c) 2008, mlam</copyright>
<entry>
<title>CVM Object Allocation</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/mlam/archive/2008/06/cvm_object_allo.html" />
<modified>2008-06-17T06:29:31Z</modified>
<issued>2008-06-17T06:29:20Z</issued>
<id>tag:weblogs.java.net,2008:/blog/mlam/356.9978</id>
<created>2008-06-17T06:29:20Z</created>
<summary type="text/plain">Why does the CVM GC stop the world for object allocations?  The answer: for performance.  Here&apos;s how it works ...</summary>
<author>
<name>mlam</name>

<email>Mark.Lam@Sun.COM</email>
</author>
<dc:subject>Virtual Machine</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/mlam/">
<![CDATA[<p>In a previous <a href="http://weblogs.java.net/blog/mlam/archive/2008/03/cvm_jit_constan.html#43853">comment</a>, Jamsheed asked, ...

<blockquote><p>"In CDC we have garbage collection invocation for fast lock contention case (From my understanding this is done for rolling the object allocation unsafe thread to gcsafe).  My question is why should we invoke a gc call for reaching safe point while this can be achieved by simply making try heap lock a blocking lock in gc safe window(with slight modification to gc safe window).  Or by polling try heaplock with safe point after every iteration."</blockquote>

<p>Jamsheed, I presume that you are referring to the piece of allocation code that requests that all threads reach a GC safe state.  You're probably thinking that this is a rather slow operation, and that there are cheaper alternatives.  So, why do this?

<p>Here's why ...]]>
<![CDATA[<p><b>Some background ...</b><br>
For those who aren't clued in to what we're talking about here, the piece of code in question can be found in a few places.  One of these is in <code>gc_common.c</code> in the function that allocates memory for new objects.  There, you'll see a test for a microlock.  If the current thread cannot acquire the microlock, then the allocation code will request that all threads reach a GC safe point.  This is commonly referred to as "stop the world" in GC discussions.

<p>In a lot of common GC algorithms (as is the case with the current CVM GC), the GC needs to scan all object pointers in the entire VM in order to determine which objects are still alive and cannot be collected.  In order to do this, it needs to ensure that the threads which are doing work won't be doing any work that can move these pointers around in a way that the GC won't know about.  These threads are called <i>mutator</i> threads because they <i>mutate</i> (i.e. change) the state of pointers in threads.  This is an over-simplified description of what the mutation is, but it's enough to illustrate the point.

<p>A GC safe state is a state in which the thread agrees to NOT mutate any object pointers.  Hence, when we need to GC, we must first ask all threads to reach a GC safe point.  When the threads reach their respective GC safe points, they are said to have entered a GC safe state.  And by definition, they won't be mutating object pointers (at least, not in any way that gets in the way of the GC).  So, effectively, the GC has "stopped the world" ... at least, stopped it from doing any more mutation until further notice.  The threads can still run and do work ... just not any work that mutates object pointers.  If a thread needs to do any mutation, it will block until the GC gives it permission to proceed.

<p>So, what has this got to do with object allocation?

<p><b>Fast Allocation</b><br>
The fastest way to do allocation from a region of contiguous free heap memory is by simply bumping a top of heap pointer.  Basically, the GC/heap keeps a top of heap pointer.  This pointer points to the highest memory in the heap that has been allocated.  When a new allocation is needed, it simply bumps the pointer up by the size of the needed memory.  The previous value of the pointer would be the address of the newly allocated memory.
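
<p>To make the pointer bumping concrete, here's a minimal sketch in Python.  This is a toy model and not CVM code: the <code>BumpHeap</code> class, its field names, and the addresses are all made up for illustration, and it assumes a single allocating thread.

```python
class BumpHeap:
    """Toy model of a contiguous heap region (hypothetical names, not CVM's)."""

    def __init__(self, base, size):
        self.top = base            # address of the next free byte
        self.limit = base + size   # one past the last usable byte

    def alloc(self, nbytes):
        """Return the address of a new block, or None if the region is full."""
        old_top = self.top
        new_top = old_top + nbytes
        if new_top > self.limit:
            return None            # a real VM would need some other path here
        self.top = new_top         # "bump" the top of heap pointer
        return old_top             # the previous value is the new block's address


heap = BumpHeap(base=0x1000, size=64)
first = heap.alloc(16)    # 0x1000
second = heap.alloc(32)   # 0x1010
```

<p>Note that the whole allocation is one addition and one comparison, which is why this is the path you want to be on most of the time.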

<p>However, that only works if there's only one thread that does all the allocation.  If you can have more than one thread, then we need to make sure that those threads don't try to bump the pointer at the same time.  In order to do this, CVM uses a spinlock microlock (on most target platforms).  The spinlock is implemented using a single atomic swap instruction.  The atomic swap is used to check a flag and at the same time mark the flag as being locked.

<p>Most of the time, different threads aren't trying to allocate at the same time.  Hence, the thread that wants the microlock flag will almost always succeed in acquiring it.  That thread then quickly bumps the top of heap pointer to do its allocation, and thereafter, releases the microlock flag.
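
<p>Here's a rough sketch of that fast path in Python.  Python has no atomic swap instruction, so the <code>AtomicFlag</code> class below only emulates one (using a <code>threading.Lock</code> internally).  All the names are hypothetical, the heap is just a dictionary holding a <code>top</code> value, and the out-of-memory check is left out to keep the sketch short.

```python
import threading

LOCKED, UNLOCKED = 1, 0


class AtomicFlag:
    """Stand-in for a word updated by an atomic swap instruction.
    The internal Lock exists only to make swap() atomic in this model."""

    def __init__(self):
        self._value = UNLOCKED
        self._guard = threading.Lock()

    def swap(self, new_value):
        """Store new_value and return the value that was there before."""
        with self._guard:
            old = self._value
            self._value = new_value
            return old


def try_fast_alloc(flag, heap, nbytes):
    """Attempt the fast path once.  Returns an address, or None on contention."""
    if flag.swap(LOCKED) == LOCKED:
        return None                  # someone else holds the flag: take the slow path
    try:
        addr = heap["top"]
        heap["top"] = addr + nbytes  # bump the top of heap pointer
        return addr
    finally:
        flag.swap(UNLOCKED)          # release the flag


flag = AtomicFlag()
heap = {"top": 100}
addr = try_fast_alloc(flag, heap, 8)   # uncontended case
```

<p>The swap does the check and the lock in one step: if the old value was already "locked", nothing was changed, and the caller knows it lost the race.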

<p>OK, that sounds nice ... but what happens in the case when the threads do try to allocate at the same time (infrequent as it may be)?

<p><b>The Slow Path</b><br>
When more than one thread is contending to do memory allocation at the same time, then the second thread, T2, who tried to acquire the microlock flag will need to block until the first thread, T1, releases the flag.  Note: the microlock flag is just a flag field.  It is not a mutex.  Hence, there is no blocking functionality associated with it.  And we need T2 to block until T1 is done with its allocation, at which point, we wish T2 to be woken up and to take control of the heap to do its allocation.

<p>One way to do this is to have T2 request a "stop the world" on all threads.  After initiating this request, T2 will be blocked waiting for all other threads to reach their GC safe points.  Meanwhile T1 is doing its memory allocation in a GC unsafe state.  After its allocation is done, T1 sees the "stop the world" request, and responds by entering a GC safe state.  Eventually, all threads will have entered their respective GC safe states, and this will wake T2 up.  At this point, T2 can safely proceed with its memory allocation without having to worry about contention from other threads because ... they are all "stopped".

<p>To recap: the fast case for memory allocation can only occur while a thread is in a GC unsafe state.  If T2 requested a "stop the world" that puts all threads into a GC safe state, then it is guaranteed that no one else will be trying to do a fast allocation at the same time.  And since T2 is the one who successfully requested a "stop the world", no other threads can request a "stop the world" at the same time.  This guarantees that T2 will be the only one who can do the slow path of the memory allocation.

<p>Hence, the "stop the world" request is used in this case as a mechanism to synchronize threads to handle contention for heap resources during memory allocation.
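
<p>The T1 / T2 interaction can be modelled with a small Python sketch.  This is only an illustration of the idea and not how CVM implements it: the <code>SafepointCoordinator</code> class and its method names are invented, and a condition variable stands in for whatever the VM really uses to park and wake threads.

```python
import threading


class SafepointCoordinator:
    """Toy model that tracks how many threads are currently GC unsafe."""

    def __init__(self):
        self._cond = threading.Condition()
        self._unsafe = 0
        self._stop_requested = False

    def enter_unsafe(self):
        with self._cond:
            while self._stop_requested:   # block while the world is stopped
                self._cond.wait()
            self._unsafe += 1

    def leave_unsafe(self):
        with self._cond:
            self._unsafe -= 1
            self._cond.notify_all()

    def stop_the_world(self):
        with self._cond:
            while self._stop_requested:   # only one requester at a time
                self._cond.wait()
            self._stop_requested = True
            while self._unsafe > 0:       # wait for everyone to become GC safe
                self._cond.wait()

    def resume_the_world(self):
        with self._cond:
            self._stop_requested = False
            self._cond.notify_all()


def demo():
    coord = SafepointCoordinator()
    events = []

    coord.enter_unsafe()                  # T1 (this thread) starts a fast allocation

    def t2():
        coord.stop_the_world()            # T2 lost the race for the microlock flag
        events.append("T2 slow-path alloc")
        coord.resume_the_world()

    worker = threading.Thread(target=t2)
    worker.start()
    events.append("T1 fast alloc done")
    coord.leave_unsafe()                  # T1 reaches its GC safe point
    worker.join()
    return events
```

<p>In this model, T2 can never record its allocation before T1 has finished, no matter how the two threads happen to be scheduled.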

<p>So, back to Jamsheed's question ...

<p><b>Why not just use a mutex?</b><br>
The reason is that a mutex is much more heavy-weight than testing the microlock spinlock flag.  This may not be as evident if you are just looking at the allocation C code in <code>gc_common.c</code>.  However, there is an assembly version of the fast path allocation code that is used by JIT compiled code.  Having this fast path allows JIT compiled code to stay executing in compiled code.  Transitioning out to C code to do the allocation will be expensive as it will force a lot of overhead code to be executed.  JIT compiled code can test the spinlock flag easily for the fast case, but it cannot lock a system mutex (which is target platform implementation specific) without transitioning out to C code.

<p>Hence, we use the spinlock flag instead of a real mutex in order to get better allocation performance for JIT compiled code.  And as I've pointed out above, most of the time, there will be no contention and we can continue to use the fast path.  In the rarer case when contention occurs, the JIT code will transition out to C code to run the slow path which uses a "stop the world" request to synchronize all threads.

<p><b>In Summary ...</b><br>
The "stop the world" mechanism may be more heavy-weight than a mutex, but it is only used in the allocation slow path which happens infrequently.  Meanwhile, the "stop the world" mechanism allows us to use the spinlock flag as a simple contention checker that allows us to do fast allocation most of the time.  In the overall scheme of things, this approach yields better performance than using a mutex directly which incurs slower execution (compared to the spinlock flag check) in the majority of allocation cases.

<p>I hope that helps. =)

<p>Regards,<br>
Mark]]>
</content>
</entry>
<entry>
<title>Not at Sun anymore</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/mlam/archive/2008/06/not_at_sun_anym.html" />
<modified>2008-06-16T09:06:49Z</modified>
<issued>2008-06-16T09:06:44Z</issued>
<id>tag:weblogs.java.net,2008:/blog/mlam/356.9971</id>
<created>2008-06-16T09:06:44Z</created>
<summary type="text/plain">Tonight, I noticed that there were a few inquiries posted (back in May) as comments on some of my old entries. I apologize for not being able to respond since I didn&apos;t know about them until now. Well, I left...</summary>
<author>
<name>mlam</name>

<email>Mark.Lam@Sun.COM</email>
</author>

<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/mlam/">
<![CDATA[<p>Tonight, I noticed that there were a few <a href="http://weblogs.java.net/blog/mlam/archive/2008/03/cvm_jit_constan.html#43853">inquiries</a> posted (back in May) as comments on some of my old entries.  I apologize for not being able to respond since I didn't know about them until now.  Well, I left Sun back in April '08, and for some reason, the blog email notification wasn't redirected to my new email.

<p>Anyway, for what it's worth, Jamsheed and hkpottyn, when I get a chance in the next few days, I'll try to give you an answer to the extent that I can.  I don't work on CVM anymore, but I'd be happy to share my knowledge as before ... again, to the extent that I can, of course.  Since I don't work on CVM anymore, my knowledge may soon be obsolete.

<p>Regards,<br>
Mark
]]>

</content>
</entry>
<entry>
<title>JVMTI in Multi-tasking VMs (MVM)</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/mlam/archive/2008/03/jvmti_in_multit.html" />
<modified>2008-03-13T09:21:36Z</modified>
<issued>2008-03-13T09:21:30Z</issued>
<id>tag:weblogs.java.net,2008:/blog/mlam/356.9357</id>
<created>2008-03-13T09:21:30Z</created>
<summary type="text/plain">In a comment in a previous article, Steven North asks about JVMTI for an MVM.  Here are my brief thoughts on that subject.</summary>
<author>
<name>mlam</name>

<email>Mark.Lam@Sun.COM</email>
</author>
<dc:subject>Virtual Machine</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/mlam/">
<![CDATA[<p>Hmmmm ... two blog questions in the same day.  What's an over-worked and busy guy to do?  Oh well, I guess the day job can wait just a little while I respond with a few words. :)

<p>On March 12, 2008, in a <a href=http://weblogs.java.net/blog/mlam/archive/2007/07/cdc_and_jvmti.html#38738>blog comment</a>, Steven North asks ...

<blockquote><p><i>
"Mark, I have found your CVM blogs postings very interesting, but I am trying to track down information about MVM (Multi-tasking Virtual Machine) and JVMTI. I am investigating whether I can develop a JVMTI-based tool for the MVM. I can't find any blogs dealing with MVM and its availablity or functionality--this was the closest blog I have found. Would you be so kind as to point me to the right place? Thanks in advance..."
</i></blockquote>

<p>Hi Steven.  Thanks for your compliment and question.  Unfortunately, I don't have an authoritative answer for you.  But here are a few of my thoughts on this subject ...
]]>
<![CDATA[<p><b>What's in an MVM?</b><br>
In case you don't already know, I wrote an entry about MVMs previously (see <a href=http://weblogs.java.net/blog/mlam/archive/2006/11/multitasking_th_1.html>here</a>).  There, I talked about some of the issues of implementing an MVM solution.

<p>One thing to note is that there aren't actually any standards for MVM APIs today.  Well, that's not quite true.  There is <a href="http://jcp.org/en/jsr/detail?id=121">JSR 121</a>.  However, the existing CLDC and CDC implementations of MVM by Sun are not JSR 121 implementations.  Each implementation varies slightly in features and AMS interfaces.  Also, I can't speak for MVM implementations by other VM vendors.  Then again, I don't know of any MVM implementations by other VM vendors.  Maybe I'm just ignorant.

<p>Anyway, let's talk about the ones I do know about i.e. Sun MVMs.  Both the CLDC and CDC MVMs implement only the VM capabilities of MVMs (or some subset thereof depending on what makes sense).  They don't provide the JSR 121 Java APIs that will allow you to programmatically control the VM instances (or Isolates) from the Java programming language level.  The MVM implementations just provide isolation, concurrency, reliable termination, and memory usage efficiency ... you know, the usual MVM goodies that people want. :)

<p><b>JVMTI on MVMs?</b><br>
First of all, as of today, I don't think such a beast exists.  The CLDC VM (aka CLDC-HI aka PhoneME Feature VM) doesn't support JVMTI yet ... and I don't foresee it doing so in the future either.  The CDC VM (aka CVM aka CDC-HI aka PhoneME Advanced VM) supports JVMTI in the latest development code in the phoneME repository on java.net, but it has not been tested with the CDC MVM solution.

<p><b>JVMDI (D, not T) on MVMs?</b><br>
That said, both MVM solutions do provide debugging support via JVMDI (the precursor of JVMTI).  Well, sort of.

<p>For the CLDC VM, it uses KDWP ... a debugger wire protocol that a debugger front-end can talk to.  The CLDC VM doesn't actually implement JVMDI.  This is the case for single VMs as well as MVMs.  To my knowledge (which is not very first hand), the KDWP agent has been enhanced to support debugging multiple VM instances at the same time.  This is available today with Sun's CLDC MVM.

<p>For the CDC VM, it uses a JVMDI agent that talks the JDWP wire protocol.  This is how debugger front-ends can connect to the VM for a debugging session.  The connection is via a socket address and port.  For the MVM case, the CDC VM instances literally run in different processes.  Each of these can be assigned a different debugger port.  If I'm not mistaken, the CLDC MVM works the same way i.e. each VM instance is debugged via a unique socket port, though all VM instances exist within the same process in the CLDC case.

<p>The debugger socket port number serves as the unique identifier for the VM instances in the MVM.  This is how the debugger front-end tells them apart.

<p><b>An MVM Debugger Front-End?</b><br>
How would one connect to multiple VM instances using debugger front-ends?  Today, the only option I know of is to run multiple NetBeans or Eclipse (or name your favorite debugger) instances on the client machine, and have them connect to the different ports for the VM instances.  This is simply because the IDEs are not built to support debugging multiple VM instances at the same time.  Of course, I could just be ignorant about the capabilities of the IDEs too.

<p>However, I see no reason why an IDE can't be made to host multiple VM debug sessions all within the same debugger IDE instance.  It's just a matter of whether the IDE people want to add this capability or not.  Note that this capability does not depend on any knowledge of MVMs at all.  The debugger IDEs simply see the VM instances as if they are purely independent single VMs running on separate machines or processes.  The only difference here is that the MVM case necessarily has the same IP address for the socket connection, and only the port values are different.

<p>I have a sneaking suspicion that the CLDC Wireless ToolKit (WTK) from Sun already supports MVM debugging for the CLDC VM.  Just a suspicion.  Maybe one of my colleagues in the know can comment on whether this is true or not.

<p><b>A JVMTI tool for MVMs?</b><br>
But Steven was asking about developing a JVMTI based tool for MVMs ... not JVMDI.  Well, conceptually, I would say that you just have to think of the MVM VM instances as if they are just separate single VM instances.  Based on what I know of the JVMTI specification, I don't think that there's anything there that precludes a JVMTI agent written for a single VM from being used in an MVM as is without modification.  However, this is only based on my current limited knowledge of the inner workings of JVMTI.  Further analysis may uncover some obstacles in the fine print.  But right now, I know of none.

<p>However, that doesn't mean that you'll actually be able to test your JVMTI tool on an MVM implementation.  As I said earlier, I don't know of any such implementation being available yet ... at least, not as a tested, qualified product.  But again, I could just be ignorant on this.

<p>If by JVMTI-based tool, you are actually referring to a debugger front-end such as in the IDEs, then there is certainly a lot more interesting work that you can do there.  For this, you don't need an MVM solution to test your implementation.  You can run multiple single VM instances with different debugger socket ports that the IDE will connect to.  This kind of tool development is interesting if you are debugging some network distributed applications (i.e. the apps run across different VMs potentially on different machines, or the same machine in the MVM case).

<p><b>Final Words</b><br>
Steven, I'm sorry.  That's all the info I can offer right now with my limited knowledge on this subject to date.  I hope that this helps illustrate some of the issues a bit though.

<p>Good luck on your efforts. :)]]>
</content>
</entry>
<entry>
<title>CVM JIT Constant Pool Dumps</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/mlam/archive/2008/03/cvm_jit_constan.html" />
<modified>2008-03-13T07:52:40Z</modified>
<issued>2008-03-13T07:52:29Z</issued>
<id>tag:weblogs.java.net,2008:/blog/mlam/356.9356</id>
<created>2008-03-13T07:52:29Z</created>
<summary type="text/plain">In a comment in a previous article, Jamsheed asked why CVM&apos;s JIT dumps compiled code constants in a seemingly reverse order.  Well, here&apos;s a discussion about why.</summary>
<author>
<name>mlam</name>

<email>Mark.Lam@Sun.COM</email>
</author>
<dc:subject>Virtual Machine</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/mlam/">
<![CDATA[<p>Hello World!  It's been a long time ... ummm ... like 6 months since I last wrote an entry.  What can I say?  That's the problem with having a day job, and so far, all the ideas for things that I want to write about involve some heavy duty writing that will take up a lot of time.  So, I've been putting it off.  Sorry.

<p>However, this <a href=http://weblogs.java.net/blog/mlam/archive/2007/04/why_choose_java.html#38724>inquiry</a> came in today on one of my previous blog entries.  Now, this, I can answer without taking up a few days of writing time.  So, here you are ...

<p><b>The Question</b><br>
On March 12, 2008, Jamsheed Mohammed asks ... 

<blockquote>
<p><i>"hi lam, Why in the JIT constant pool is the last accessed constant first and the first accessed constant emitted last, while the other way around would be a more efficient usage of ARM architectural limitation (PC relative load limitation)?"</i>
</blockquote> 

<p>I took some liberty with editing the comment for clarity.  Jamsheed, I hope you don't mind.]]>
<![CDATA[<p><b>What's a Constant Pool?</b><br>
Anyone who has written any code would know that you will often need some constants in the course of writing code.  This is no different for code written in the Java programming language.  These Java constants are stored in a data structure called the <a href=http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html#20080>Constant Pool</a> which is part of the Java classfile.  The constant pool that Jamsheed is asking about is not that constant pool.  This might be obvious for some people, but just in case, it's better to be clear.

<p>Instead, Jamsheed is referring to constants that are referenced by code that the CVM JIT compiler generates.  These constants may be the same values that are fetched from the classfile constant pool, but that's beside the point.

<p>Now, there will be more than one of these constants used in the JIT generated code.  So, instead of spreading them all over the generated code, we "pool" them together i.e. we keep them in one (or a few) places.  We call these places (where we pool the constants) the constant pools.  This is the constant pool that Jamsheed was referring to.

<p><b>Why Pool Constants?</b><br>
Some CPUs (like the ARM) have separate instruction and data caches.  These are commonly referred to as the <i>i-cache</i> and <i>d-cache</i> respectively.  When loading data from memory, the data first gets cached in the d-cache before it is accessible by the CPU.  When loading instructions, the instruction first gets cached in the i-cache instead.  So, data gets stored in the d-cache, and code gets stored in the i-cache.  And constants are ... data!

<p>Now consider what happens when you have your constants interlaced throughout the JIT generated code.  If that occurs, when you execute code that is located adjacent to a constant, the constant may also be loaded into the i-cache simply because it is located near the needed code.

<p>The cache manager doesn't actually know if the bits in memory are code or data.  It just loads a few words of memory at a time into the respective cache.  If those few words include a constant in the midst of code being executed, the constant will get loaded into the i-cache as well.  When this happens, some space in the i-cache will be taken up by something (i.e. the constant) that will never be executed as code.  This is inefficient use of the limited and precious i-cache space.

<p>Similarly, when loading the constant as data, the cache manager will load a few words of memory around the constant into the d-cache.  If all the words around the constant are actually code and not data, then the d-cache will now contain wasted space for words that will never be used as data.

<p>The end result of all this is less cache locality, and that means that the code will run slower.  By pooling the constants together, we lessen the probability of these kinds of i-cache and d-cache inefficiencies occurring. :)
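
<p>To put some rough numbers on this, here's a toy Python model.  It is only a sketch under simplifying assumptions that I'm making up for illustration: a cache line holds 4 words, every line shown gets fetched into both caches, and <code>"I"</code> / <code>"C"</code> mark an instruction word and a constant word respectively.  It counts the cache slots that end up holding the wrong kind of word.

```python
def wasted_slots(layout, words_per_line=4):
    """Count constant words dragged into the i-cache and instruction words
    dragged into the d-cache, for lines that mix code and constants."""
    wasted_icache = 0
    wasted_dcache = 0
    for start in range(0, len(layout), words_per_line):
        line = layout[start:start + words_per_line]
        code_words = line.count("I")
        data_words = line.count("C")
        if code_words and data_words:      # only mixed lines waste space
            wasted_icache += data_words    # constants occupying i-cache slots
            wasted_dcache += code_words    # instructions occupying d-cache slots
    return wasted_icache, wasted_dcache


interleaved = list("IIICIIICIIICIIIC")   # a constant after every 3 instructions
pooled = list("IIIIIIIIIIIICCCC")        # same words, constants pooled at the end

interleaved_waste = wasted_slots(interleaved)   # (4, 12)
pooled_waste = wasted_slots(pooled)             # (0, 0)
```

<p>Real pools won't always line up this neatly with cache line boundaries, but the trend is the same: the fewer lines that mix code and constants, the less cache space is wasted.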

<p>But I digress.  Now, let's get back to Jamsheed's question ...

<p><b>The ARM Instruction Set</b><br>
In his question, Jamsheed mentioned the ARM architecture.  The reason is that in the ARM instruction set, PC-relative load instructions can only load from an address that is located within approximately 4K of the current PC of the load instruction itself.  Let's see what this means ...

<p>Let's say you (i.e. the JIT) just emitted a load instruction to load a constant.  Because you want to pool the constants together, you don't actually emit the constant yet.  Instead, you keep a record of where the load instruction was emitted, and later on when you emit the constant (and therefore, finally know where it is located), you'll come back and fix up this load instruction with the proper offset for that constant.
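
<p>Here's a small Python sketch of that record-now, fix-up-later idea.  It is not the CVM JIT's code: the <code>Emitter</code> class and its methods are hypothetical, the "code buffer" is just a list of words, and offsets are counted in words from the load instruction itself (a real ARM encoding measures bytes from the PC).  The constants are dumped in the order they were recorded, just to keep the sketch simple.

```python
class Emitter:
    """Toy code buffer that defers constants and patches loads afterwards."""

    def __init__(self):
        self.code = []      # emitted words: instruction tuples or raw constants
        self.pending = []   # (index of load instruction, constant value)

    def emit(self, word):
        self.code.append(word)

    def emit_load_const(self, reg, value):
        """Emit a load with an unknown offset and remember where it is."""
        self.pending.append((len(self.code), value))
        self.code.append(("load", reg, None))

    def dump_constants(self):
        """Emit the pooled constants and fix up every recorded load."""
        for load_index, value in self.pending:
            const_index = len(self.code)
            self.code.append(value)
            op, reg, _ = self.code[load_index]
            self.code[load_index] = (op, reg, const_index - load_index)
        self.pending = []


e = Emitter()
e.emit_load_const("r0", 42)
e.emit(("add",))
e.emit_load_const("r1", 7)
e.emit(("ret",))
e.dump_constants()
```

<p>After the dump, the first load carries offset 4 and the second carries offset 3, and the two constants sit together at the end of the buffer.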

<p>The question is ... how do you know when to actually emit (also commonly referred to as "dump") the constant?  If you dump it too early, then you may not be pooling as many constants as you possibly could, thereby increasing the cache inefficiency issue I described earlier.  If you dump it too late, then the constant may be out of reach of the 4K range of the load instruction that needs to reach it.

<p>The answer is to do periodic checks for a need to dump constants i.e. we'll dump them out into a pool whenever we feel that we may reach the 4K range limit soon.  See <i>CVMJITcpoolNeedDump()</i> in <i>src/share/javavm/runtime/jit/jitconstantpool.c</i>.

<p><b>Does the CVM JIT really do that?</b><br>
Well, actually ... we don't dump exactly at the border of the 4K limit.  This is because we can't arbitrarily dump whenever we like.  For example, there are some code sequences that need to stick together and cannot afford to have a constant pool suddenly show up in their midst.  Hence, the CVM JIT has to check for a need to dump whenever it is at a convenient place to dump.  Right now, such places include branch sites, method invocation sites, and one other place ... which I'll explain later.

<p>Hence, we can't actually wait till we reach the 4K limit before dumping the constants.  Note that there's also a chance that we may have collected a large number of constants.  When we dump the constants, each constant also further increases the offset for the next constant.  Also, if we don't dump right now, we don't know when the next opportunity to dump will show up.  If it shows up too late, then we'll have a JIT compilation failure.  To address this issue, the CVM JIT uses a heuristic and dumps whenever we reach a distance of 2K (half the limit) from the original load instruction i.e. we accept some cache inefficiency to make sure that we can reach the constants from the load instructions.
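
<p>In code, the heuristic boils down to something like the Python sketch below.  To be clear, this is not the real <i>CVMJITcpoolNeedDump()</i>: the function name, its parameters, and the exact constants are my own simplification for illustration.

```python
MAX_LOAD_RANGE = 4096                  # approximate reach of a PC-relative load
DUMP_THRESHOLD = MAX_LOAD_RANGE // 2   # the "half the range" heuristic


def need_dump(current_pc, pending_load_pcs, at_safe_site):
    """Return True if the pending constants should be dumped now.

    current_pc       -- address where the next word would be emitted
    pending_load_pcs -- addresses of loads still waiting for their constants
    at_safe_site     -- True at a place where a pool may be inserted
    """
    if not pending_load_pcs or not at_safe_site:
        return False
    oldest_load = min(pending_load_pcs)   # the load that runs out of range first
    return current_pc - oldest_load >= DUMP_THRESHOLD


dump_now = need_dump(2048, [0, 100], at_safe_site=True)     # True
dump_later = need_dump(2047, [0, 100], at_safe_site=True)   # False
```

<p>The check is just a subtraction and a comparison, so it costs next to nothing to run at every candidate dump site.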

<p>By now, you might be thinking ... what a dumb JIT compiler!  Surely it can do something more intelligent and eke out every last bit of offset possible between the load instruction and the constant pool dump.  Well, theoretically, the JIT can do that.  But in this case, we're talking about a JavaME VM JIT, and it needs to be fast and efficient i.e. the CVM JIT is not allowed to take up too much time and memory to do the compilation.  Using the above heuristic is a cheap but effective trick that gets the job done without sacrificing too much performance.  "Cheap" is good for embedded devices. :)

<p><b>More CVM JIT Details</b><br>
Well, actually ... the max offset range is even smaller than 4K.  That is, if you are using the VFP hardware float instructions of the ARM.  The VFP load instructions (for floating point constants) have an even smaller range ... something like 256 bytes, if I remember correctly.  So, the cache efficiency issue will be exacerbated.  Anyway, the JIT's gotta do what the JIT's gotta do.

<p>So, remember earlier when I said that there's one other place where we can dump constants?  Well, that place is in between every sequence of instructions that the JIT grammar rules may emit (with some proper code to branch around the constant pool dump, of course).  There is a logical break between each sequence of instructions emitted for each JIT grammar rule where we can insert a constant pool dump.  Because of the small offset range of ARM VFP constants, the CVM JIT is forced to allow dumps more frequently like this.

<p>This doesn't necessarily mean that there will always be a constant pool dump every 128 bytes of instructions or so.  It only means that when there are constants to dump, you may see them show up every 128 bytes or so in the worst case.  Fortunately, our benchmark data shows that performance is not impacted by this (or at least not significantly enough to be noticed).

<p>But I am still digressing ...

<p><b>The Question again</b><br>
Jamsheed was asking why the CVM JIT constants are dumped from last one accessed to the first one.  This obviously increases the offset distance between the first accessed constant and its corresponding load instruction.

<p><b>The Answer ... finally</b><br>
Well, it's simply because we were using a linked list to track the constants, and inserting new constants at the head instead of the tail.  There's no deeper reason for dumping the constants this way.  With very minimal work, we can change it so that the constants are dumped in a forward order rather than in the current reverse access order.
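
<p>If the effect of head insertion isn't obvious, this little Python sketch shows it.  The <code>Node</code> class and helper functions are hypothetical and have nothing to do with the actual CVM data structures.  They just build a singly linked list both ways and walk it from the head, which is the order a dump would follow.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def insert_at_head(head, value):
    """O(1): the newest constant becomes the first node."""
    return Node(value, head)


def insert_at_tail(head, value):
    """Walks to the end (a real list would keep a tail pointer instead)."""
    node = Node(value)
    if head is None:
        return node
    cur = head
    while cur.next is not None:
        cur = cur.next
    cur.next = node
    return head


def dump_order(constants, insert):
    """Track constants in access order, then list them in dump order."""
    head = None
    for c in constants:
        head = insert(head, c)
    order = []
    while head is not None:
        order.append(head.value)
        head = head.next
    return order


accessed = ["c1", "c2", "c3"]
reverse = dump_order(accessed, insert_at_head)   # ['c3', 'c2', 'c1']
forward = dump_order(accessed, insert_at_tail)   # ['c1', 'c2', 'c3']
```

<p>Head insertion is the lazy choice because it needs no tail pointer, which is probably how the reverse order crept in.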

<p>Having said that, if you take a look at the range limit issues and the heuristic that the CVM JIT has to employ to predict when the last possible opportunity to dump constants is, you may find that this optimization will have very little effect on the overall scheme of things.  Yes, dumping in forward order will help.  How it will help is perhaps to allow the CVM JIT to use a heuristic ratio that is greater than half the max offset range (currently it is half).  This will allow more constants to be pooled before we do a dump.

<p>However, I'm not sure <i>how much</i> it will help and how much to change the heuristic ratio.  That will be an interesting exercise to do when someone can find the time.  As I've said when I started this entry, the day job is not leaving us a lot of time to play. :(  Is anyone in the community willing to give this a try and report your findings?  Of course, I can provide a few tips on what to do if you are interested.

<p><b>Last Words</b><br>
So, Jamsheed, I hope that answers your question.  Thanks for your astute observation.  It gave me this opportunity to share a bit about how constant pool dumps work in the CVM JIT.

<p>Have a nice day. :)

<hr>

<p>Tags: <a href="http://technorati.com/tag/CDC" rel="tag">CDC</a> <a 
href="http://technorati.com/tag/CVM" rel="tag">CVM</a> <a 
href="http://technorati.com/tag/Java" rel="tag">Java</a> <a 
href="http://technorati.com/tag/J2ME" rel="tag">J2ME</a> <a 
href="http://technorati.com/tag/JavaME" rel="tag">JavaME</a> <a 
href="http://technorati.com/tag/JIT" rel="tag">JIT</a> <a 
href="http://technorati.com/tag/phoneME" rel="tag">phoneME</a> <a 
href="http://technorati.com/tag/phoneME+Advanced" rel="tag">phoneME 
Advanced</a> <a href="http://technorati.com/tag/embedded+systems" 
rel="tag">embedded systems</a>
]]>
</content>
</entry>
<entry>
<title>VM Inspector 0.1: Some new stuff</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/mlam/archive/2007/09/vm_inspector_01_1.html" />
<modified>2008-06-24T19:17:03Z</modified>
<issued>2007-09-22T03:37:21Z</issued>
<id>tag:weblogs.java.net,2007:/blog/mlam/356.8301</id>
<created>2007-09-22T03:37:21Z</created>
<summary type="text/plain">Some new features have been added to the CVM&apos;s VM Inspector.  This entry will give you a quick update on this.</summary>
<author>
<name>mlam</name>

<email>Mark.Lam@Sun.COM</email>
</author>
<dc:subject>Community: Mobile &amp; Embedded</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/mlam/">
<![CDATA[<p>You may or may not have noticed on the <a href="https://phoneme.dev.java.net/downloads_page.html#advanced">phoneME Advanced Downloads page</a>, that there is a phoneME Advanced MR2 binary for WinCE / Windows Mobile 5.  That's one of the projects that my esteemed colleagues and I have been busy working on in the past few months.  That is part of the reason I have not been able to post much.  So, I apologize for that.

<p>Well, after being spoiled with all the rich debugging features available in gdb on Linux, working with WinCE has been ... ummm ... challenging.  In the initial phases of bringing up the VM, any number of things could have gone wrong.  Without adequate debugging capability, it is hard to figure out what has gone wrong.  

<p>One of the common things that can occur when the system isn't stable yet is a hang.  When that happens, you really need to get hold of the thread stack traces in order to figure out where the hang is happening.  Maybe there's a way to do this on WinCE (and VS2005 i.e. Visual Studio 2005) and I'm just ignorant about it, but what I found was that when the threads are all hung, I can't just force a break on all threads whenever I like in VS2005.  Well, I can ... but I don't seem to be getting any thread stack info.  What I heard was that when a thread is blocked inside some WinCE API, then VS2005 won't be able to give us a stack trace.  Since hangs tend to be cases where the threads are all blocked in locks of some kind (i.e. using WinCE APIs), I was out of luck trying to get stack trace information.

<p>But all is not lost.  There's the <a href="http://weblogs.java.net/blog/mlam/archive/2007/07/cvms_vm_inspect.html">VM Inspector (and <i>cvmsh</i>)</a> and the <a href="http://weblogs.java.net/blog/mlam/archive/2007/06/async_thread_du.html">thread dump hack</a> for CVM Java threads that I'd introduced to you previously.  However, I can't just use them as is yet.  So, I added a few enhancements ...]]>
<![CDATA[<p><b>the <i>CVMSH</i> Server</b><br>
On WinCE, there is no default console application that will allow me to use <i>cvmsh</i> in its previous form.  So, I simply looked up some client-server Java examples, and modified <i>cvmsh</i> to support a server feature.  I'll admit the code in there isn't the most elegant stuff I have ever written, but it works.  As of rev 7328 in the phoneME Advanced repository, <i>cvmsh</i> now supports a server option.

<p>The client-server feature allows me to use a remote client as the source of command input to <i>cvmsh</i>.  Hence, I no longer need to rely on a local command shell.  There is now a <i>cvmclient</i> application that you can use to connect to <i>cvmsh</i>'s server.

<p>Enough talk.  Let's get into the useful stuff ...

<p><b>How do I use it?</b><br>
<ol>
<li> <p><b>Build the Inspector</b><br>
As explained <a href="http://weblogs.java.net/blog/mlam/archive/2007/07/cvms_vm_inspect.html">before</a>, build CVM with CVM_INSPECTOR=true.  This can work with non-debug builds (CVM_DEBUG=false) as well, but by default, the inspector is enabled for debug builds (CVM_DEBUG=true). 

<li> <p><b>Run with the Server</b><br>
At your command line, run CVM as follows:<br>
<pre>
> cvm -cp testclasses.zip cvmsh --X "startServer"
</pre>
The <code>--X</code> option says that the next set of arguments are <i>cvmsh</i> commands (like those that I've written about <a href="http://weblogs.java.net/blog/mlam/archive/2007/07/cvms_vm_inspect.html">previously</a>).  <code>startServer</code> is a new command that starts a background server thread.

<p>Normally, you would want to run your application.  Let's say this is how you would normally run your application:
<pre>
> cvm -cp yourJar.jar yourApp 1 2 3 yourArgs
</pre>

<p>To debug your app with <i>cvmsh</i> in server mode, this is how you would run it:
<pre>
> cvm -cp testclasses.zip:yourJar.jar cvmsh \
  --X "startServer" yourApp 1 2 3 yourArgs
</pre>

<p>This launches <i>cvmsh</i>'s server, and then has <i>cvmsh</i> launch your application.

<p>Another new command that I added to <i>cvmsh</i> is <code>verbose</code>, which tells <i>cvmsh</i> to be more talkative and tell you more personal stuff as it does its thing.  Here's how you can give it a try:
<pre>
> cvm -cp testclasses.zip:yourJar.jar cvmsh \
  --X "verbose;startServer" yourApp 1 2 3 yourArgs
</pre>

<p>Note that all I did was add it to the cvmsh command list with a semicolon (;) as the delimiter between each of the <i>cvmsh</i> commands (in this case, <code>verbose</code> and <code>startServer</code>).  In the same way, you can add as many <i>cvmsh</i> commands (along with their arguments) as you like.  Give it a try.  Go nuts!  Here's an example of me going nuts on it:
<pre>
> cvm -cp testclasses.zip:yourJar.jar cvmsh \
  --X "verbose; captureHeapState initialState; gc" \
  --X "disableGC;enableGC; port=2000;" \
  --X ";startServer" yourApp 1 2 3 yourArgs
</pre>

<p>Note that:
<ul>
<li><p>I had several <code>--X</code> arguments. 
<li><p>Sometimes I have spaces between the <i>cvmsh</i> commands and sometimes, I don't.
<li><p>I sometimes have an extra ';' at the end of one command list, and I have an extra one at the start of another.
<li><p>I sometimes have more than one command in a command list, and in another list, I have only one command.
<li><p>In one case (<code>captureHeapState</code>), the command is followed by an argument before the next command.  Note that the next command comes after the ';' delimiter.
</ul>

<p>It's all good.  <i>cvmsh</i> reads and executes those commands from left to right.  You can put them all in one <code>--X</code> command list or several as you like.  Extra ' ' and ';' delimiters are just ignored by <i>cvmsh</i>.  As I said, you can issue any sequence of <i>cvmsh</i> commands that makes sense to you.  It's just like typing them in at the <i>cvmsh</i> shell prompt, except that they are all typed out in a command list now delimited (i.e. separated) by ';'s.

<li><p><b>Connect to the Server</b><br>
    In your remote client machine, run <i>cvmclient</i> like so:
<pre>
> java -cp testclasses.zip cvmclient -host 123.45.67.89 -port=2000
</pre>

<p>This will bring up a prompt just like the <i>cvmsh</i> prompt.  If you don't specify the host, the localhost will be used.  If you don't specify the port, then port 4321 will be used.  4321 is also the default socket port that <i>cvmsh</i>'s server will wait on if you didn't specify the <code>port=2000</code> command earlier that changed the port to 2000 instead.

<p>And now, you are ready to use <i>cvmsh</i> just like you did before.  But wait a minute ... it's not quite the same.  <i>cvmclient</i> only acts as an input mechanism for <i>cvmsh</i>.  All the output for <i>cvmsh</i> still goes to stdout/err on the device where <i>cvmsh</i> is running.

<p>Note also that in the command line for running <i>cvmclient</i>, I used JavaSE's <code>java</code> instead of CVM.  This is intentional.  If you have a CVM port for the remote machine where you run <i>cvmclient</i>, then you can run it using CVM too.  But chances are, your remote machine is a desktop machine and not a JavaME device, so you're more likely to have JavaSE installed there than a CVM port.  That is why <i>cvmclient</i> is designed to not use any CVM proprietary APIs ... so that it can be run using JavaSE.  <i>cvmsh</i>, on the other hand, only runs on CVM.  This is, of course, because it needs to access proprietary APIs in order to get you the data about what's going on inside CVM.

</ol>
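<p>To make the command list rules from step 2 concrete, here is a minimal sketch of how such lists could be split.  This is only an illustration under my own assumptions, not the actual <i>cvmsh</i> source: the class name <code>CommandListSketch</code> and the method <code>parse</code> are hypothetical.  It treats ';' as the delimiter, trims surrounding spaces, and drops empty pieces, which is the behavior described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; NOT the real cvmsh implementation.
final class CommandListSketch {

    // Takes one string per --X argument and returns the individual
    // commands (with their arguments) in left-to-right order.
    static List<String> parse(String... commandLists) {
        List<String> commands = new ArrayList<String>();
        for (String commandList : commandLists) {
            for (String piece : commandList.split(";")) {
                String command = piece.trim();
                // Extra ';' delimiters yield empty pieces; skip them.
                if (!command.isEmpty()) {
                    commands.add(command);
                }
            }
        }
        return commands;
    }

    public static void main(String[] args) {
        // The three --X lists from the "going nuts" example.
        List<String> commands = parse(
            "verbose; captureHeapState initialState; gc",
            "disableGC;enableGC; port=2000;",
            ";startServer");
        for (String command : commands) {
            System.out.println(command);
        }
    }
}
```

<p>Run on the three lists from the example above, the sketch yields seven commands in order, with <code>captureHeapState initialState</code> kept together as one command plus its argument.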

<p><b>Limitations</b><br>
The <i>cvmsh</i> server feature works with Foundation Profile only and not plain CDC.  This is because it uses <code>java.net.ServerSocket</code> which isn't included in plain CDC.
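<p>For readers who haven't used <code>java.net.ServerSocket</code> before, here is a rough sketch of the input-only arrangement described above: a server thread that reads one command per line from a socket, and a client that sends those lines.  Everything in it is an assumption made for illustration (the class <code>CommandServerSketch</code>, its method names, and the one-line-per-command framing are hypothetical), and it is written against JavaSE rather than Foundation Profile.  It is not the actual <i>cvmsh</i> or <i>cvmclient</i> code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; NOT the real cvmsh / cvmclient implementation.
final class CommandServerSketch {

    // Server side: accept one client and collect every line it sends
    // until the client closes the connection.
    static List<String> serveOneClient(ServerSocket server) throws IOException {
        List<String> received = new ArrayList<String>();
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(new InputStreamReader(
                 client.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                received.add(line);
            }
        }
        return received;
    }

    // Client side: connect and send one command per line.
    static void sendCommands(InetAddress host, int port, String... commands)
            throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(new OutputStreamWriter(
                 socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            for (String command : commands) {
                out.println(command);
            }
        }
    }

    // Runs server and client in one process over the loopback interface.
    // Port 0 asks the OS for any free port (assumption for this demo only).
    static List<String> roundTrip(String... commands) {
        final List<String> received = new ArrayList<String>();
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try {
                    received.addAll(serveOneClient(server));
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.start();
            sendCommands(server.getInetAddress(), server.getLocalPort(), commands);
            serverThread.join();   // wait until the server has seen end-of-stream
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        for (String command : roundTrip("gc", "dumpAllThreads")) {
            System.out.println("server received: " + command);
        }
    }
}
```

<p>Note that the sketch only carries input from the client to the server, mirroring the limitation mentioned earlier: output stays on the server side.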

<p><b>Other Goodies</b><br>
When I started this blog entry, I said that I wanted a Java thread dump on WinCE.  I haven't gotten there yet.  To get there, I need to add a thread dump command to <i>cvmsh</i>.  I also added a few other things while I was at it.  Here are the new commands:
<table>
<tr><td> <p>listAllThreads
    <td> <p>list all live CVM threads
<tr><td> <p>dumpAllThreads
    <td> <p>dump stacks for all live CVM threads
<tr><td> <p>dumpStack
    <td> <p>dump the stack of the specified thread
<tr><td> <p>sysinfo
    <td> <p>dump some system configuration info
         e.g. the sizes and boundaries of the GC heap,
         the JIT configurations.
<tr><td> <p>verbose/quiet
    <td> <p>enable/disable verbose output e.g. error messages.
<tr><td> <p>port=<i>&lt;num&gt;</i>
    <td> <p>specifies the port number of the socket connection to use.
<tr><td> <p>startServer
    <td> <p>starts the cvmsh server which waits on the socket.
<tr><td> <p>stopServer
    <td> <p>stops the cvmsh server which waits on the socket.
</table>

<p>Give them a try.  In <i>cvmsh</i> or <i>cvmclient</i>, type <code>help</code> to get a list of all the commands and their usage info.

<p>Note that with these new features of <i>cvmsh</i>, we will no longer need the <a href="http://weblogs.java.net/blog/mlam/archive/2007/06/async_thread_du.html">thread dump hack</a> that I talked about previously.  One of the reasons that I had to enhance <i>cvmsh</i> to support a server mode was that I can't send signals to a WinCE process to trigger my stack dumps.  Now, I can request it via <i>cvmsh</i> instead.  By the way, I've added more useful info in the thread dumps than was available based on the code I posted previously.  So, check it out.

<p><b>Final Words</b><br>
In case you didn't read my previous introduction of <a href="http://weblogs.java.net/blog/mlam/archive/2007/07/cvms_vm_inspect.html"><i>cvmsh</i> and the VM Inspector</a>, I would like to remind you that <i>cvmsh</i> is intended as a poor man's debugger/profiler.  I intend to continue to add functionality to it over time as I encounter a need ... particularly debugging features that involve inspecting the internals of the VM.  However, I don't intend to make it into a full-fledged debugger/profiler.  That's what NetBeans and JVMTI are for.

<p>So, if you have your favorite feature that you would like me to add (or that you think is useful but cannot be gotten in other ways), please let me know.  Alternatively, <i>cvmsh</i> is a fun small sized project that anyone can work on with some effort.  Working on it will also expose you to some hands on CVM internals.  So, if you are looking for a good way to get motivated about learning the internals of CVM (aka the phoneME Advanced VM), then consider signing up and contributing to the development of <i>cvmsh</i>.

<p>As I find time, I intend to add more features, and I will also try to write a series of blogs on how to do useful stuff using <i>cvmsh</i>.  Examples of useful stuff would be: how to find out why my object is not getting GC'ed?  Or how to find out where the deadlock is in my Java code?  :-)  The first one can be done with <i>cvmsh</i> today, but the second one can't yet.  But with a few more enhancements, we should be able to do that too.

<p>Well, I hope this has been helpful.  Have a nice day. :-)

<hr>

<p>Tags: <a href="http://technorati.com/tag/CDC" rel="tag">CDC</a> <a 
href="http://technorati.com/tag/CVM" rel="tag">CVM</a> <a 
href="http://technorati.com/tag/Java" rel="tag">Java</a> <a 
href="http://technorati.com/tag/J2ME" rel="tag">J2ME</a> <a 
href="http://technorati.com/tag/JavaME" rel="tag">JavaME</a> <a 
href="http://technorati.com/tag/phoneME" rel="tag">phoneME</a> <a 
href="http://technorati.com/tag/phoneME+Advanced" rel="tag">phoneME 
Advanced</a>]]>
</content>
</entry>
<entry>
<title>What&apos;s the Diff?</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/mlam/archive/2007/08/whats_the_diff.html" />
<modified>2008-06-24T19:17:03Z</modified>
<issued>2007-09-01T03:22:05Z</issued>
<id>tag:weblogs.java.net,2007:/blog/mlam/356.8150</id>
<created>2007-09-01T03:22:05Z</created>
<summary type="text/plain">You&apos;ve been working with the phoneME Advanced code base (or one of the other projects) on java.net, and you see that someone has checked in some code with a certain revision number.  How do you find out what that change is for?  This entry will give you a clue.</summary>
<author>
<name>mlam</name>

<email>Mark.Lam@Sun.COM</email>
</author>
<dc:subject>Community: Mobile &amp; Embedded</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/mlam/">
<![CDATA[<p>In a <a href="http://weblogs.java.net/blog/mlam/archive/2007/07/cvms_vm_inspect.html#comments">comment</a> for a previous blog entry, I was asked ...
<blockquote><i>
<p>Hi Mark,

<p>Although my question does not directly have to do with <a href="http://weblogs.java.net/blog/mlam/archive/2007/07/cvms_vm_inspect.html">VM Inspector</a>, I have a question regarding phoneME advanced MR2.  This is regarding revision 5512 - Fix for CR 6554965: crash in pthread_getattr_np calling JNI AttachCurrentThread Reviewed by Chris.

<p>Can you disclose what's the investigation on CR 6554965?  I've seem cvm segfault occasionally without any further debug messages.  However, I do not have a reproducible case.  I would like to understand what is CR 6554965 is about and whether that applies to the problem I have experienced with the cvm.

<p>Thanks,<br>
Steven
</i></blockquote>

<p>This is how I get the answer for that ...]]>
<![CDATA[<p><b>What Files were Modified?</b><br>
First of all, let's get the revision info and a list of what files were changed for this revision.  I'm assuming that you have already set yourself up with subversion and can access the phoneME repository.  So here goes ...

<p>From a command prompt (I'm assuming <code>tcsh</code> ... pick your favorite):
<pre>
> setenv phoneME https://phoneme.dev.java.net/svn/phoneme/components
> svn log -v -r5512 $phoneME/cdc/trunk
</pre>

<p>I basically asked for the verbose (-v) log for revision 5512 (-r5512) which Steven asked about, and I asked for it on the phoneME Advanced trunk code ($phoneME/cdc/trunk).  This is what I get:

<pre>
------------------------------------------------------------------------
r5512 | xyzzy | 2007-05-08 12:32:32 -0700 (Tue, 08 May 2007) | 4 lines
Changed paths:
   M /components/cdc/trunk/src/linux/javavm/runtime/threads_md.c

Fix for CR 6554965: crash in pthread_getattr_np calling JNI AttachCurrentThread
Reviewed by Chris


------------------------------------------------------------------------
</pre>

<p>The revision info says that only one file was modified: <code>src/linux/javavm/runtime/threads_md.c</code>

<p><b>What's the Diff?</b><br>
From the command prompt, type:
<pre>
> svn diff -r5511:5512 $phoneME/cdc/trunk/src/linux/javavm/runtime/threads_md.c
</pre>

And svn tells me ...
<pre>
Index: threads_md.c
===================================================================
--- threads_md.c        (revision 5511)
+++ threads_md.c        (revision 5512)
@@ -129,7 +129,8 @@
 LINUXcomputeStackTop(CVMThreadID *self)
 {
     void *sp = &self;
-    if (pthread_self() == initial_thread_id) {
+    pthread_t myself = pthread_self();
+    if (myself == initial_thread_id) {
         self->stackTop = initial_stack_top;
 #ifdef LINUX_WATCH_STACK_GROWTH
        self->stackBottom = initial_stack_bottom;
@@ -137,13 +138,12 @@
 #endif
         return CVM_TRUE;
     } else if (pthreadGetAttr != NULL) {
-        pthread_t tid = POSIX_COOKIE(self);
         pthread_attr_t attr;
         int result;
         void *base;
         size_t size;
  
-        result = (*pthreadGetAttr)(tid, &attr);
+        result = (*pthreadGetAttr)(myself, &attr);
        if (result != 0) {
            return CVM_FALSE;
        }
</pre>

<p>If you aren't familiar with reading diffs, the lines that start with a '-' are lines that are removed from the old revision.  The lines that start with '+' are added in the new revision.  

<p>If you would like to see the whole file for context, you can check out the file, or just cat it like this below:
<pre>
> svn cat -r5511 $phoneME/cdc/trunk/src/linux/javavm/runtime/threads_md.c > threads_md.c.5511.c
> svn cat -r5512 $phoneME/cdc/trunk/src/linux/javavm/runtime/threads_md.c > threads_md.c.5512.c
</pre>

<p><b>Interpreting the Diff</b><br>
In this case, from the diffs, I see that the change was made in <code>LINUXcomputeStackTop()</code>.  <code>LINUXcomputeStackTop()</code> is responsible for computing the address of the top of the native stack for the current thread.  This value is used for stack bounds checks that are done later during VM runtime execution.

<pre>
-    if (pthread_self() == initial_thread_id) {
+    pthread_t myself = pthread_self();
+    if (myself == initial_thread_id) {
</pre>

<p>The first 3 lines of the diff basically cache the return value of <code>pthread_self()</code> in a variable <code>myself</code>, whereas the old code just uses it without caching.  No significant difference there.

<pre>
-        pthread_t tid = POSIX_COOKIE(self);
</pre>

<p>The next line of diff removes the fetching of tid.  Hmmm ....

<pre>
-        result = (*pthreadGetAttr)(tid, &attr);
+        result = (*pthreadGetAttr)(myself, &attr);
</pre>

<p>The last 2 lines of the diff show that we're calling <code>pthreadGetAttr()</code> with <code>myself</code> instead of <code>tid</code>.  If you look in the entire file for <code>LINUXcomputeStackTop()</code>, you will see that <code>pthreadGetAttr</code> is a function pointer to <code>pthread_getattr_np</code>.  We use it to get the attributes of the current thread.  Those attributes are then later used to get the native stack info of the current thread via <code>pthread_attr_getstack</code>.

<p>Both <code>tid</code> and <code>myself</code> are tokens that represent the current thread.  What happened is that under some circumstances, <code>POSIX_COOKIE(self)</code> will not return the token for the current thread because the CVMThreadID data structure has not been initialized yet.  Getting the token directly from the pthreads lib ensures that we have a valid token.

<p>The segfault that was reported was because of the use of a <code>tid</code> token that was not initialized yet.

<p><b>What does CR 6554965 say?</b><br>
Though I can't always fulfill a request like this to provide info on change requests that are internal to Sun (for various reasons), I will indulge this one.  CR 6554965 says:

<blockquote>
The code tries to use POSIX_COOKIE(self), but it hasn't been set yet.
</blockquote>

<p>And that is pretty much what we deduced from taking a look at the diffs for this revision change.

<p><b>Final Thoughts</b><br>
In case you really really want to know what a Sun change request says (regardless of what I've shown you above), and I am not available to give you the info, you have 2 options:
<ol>
<li><p>You can query the <b>Sun bug database</b> at <a href="http://bugs.sun.com/bugdatabase">http://bugs.sun.com/bugdatabase</a>.  But for some reason, I can't seem to query any info on CR 6554965.  Maybe the database is down at this moment.  Or maybe the CR is a highly confidential one ... which I know is not the case here.
<li><p>If you are a <b>Java licensee</b>, depending on your support level (I'm not too clear on how these support level things work since I've never worked in Sun's Java Licensee Engineering myself), you can ask your JLE engineer to get you the info on the CR / bug.
</ol>

<p>But if the above 2 options won't work for you, you can always do what I showed you in this blog entry, and take a look at the code changes.  Believe it or not, this is what I actually did before I even bothered to look up the bug database.  Often times, the revision log will give you enough data.

<p>One more thing: the problem that got you interested in CR 6554965 in the first place was a segfault.  I know that Steven said that he wasn't able to reproduce it readily, but in case you have a segfault that you can reproduce and want to know a bit about how to debug it, read this <a href="http://forums.java.net/jive/post!reply.jspa?messageID=231216">forum post</a> that I wrote a while ago.  It will give you an idea about how to get more info about a segfault.  One day, I may repost that in a blog (with some additional info), but for now, check out the forum post.

<p>Hope this has been helpful to you.

<p>Regards,<br>
Mark

<hr>

<p>Tags: <a href="http://technorati.com/tag/CDC" rel="tag">CDC</a> <a 
href="http://technorati.com/tag/CVM" rel="tag">CVM</a> <a 
href="http://technorati.com/tag/Java" rel="tag">Java</a> <a 
href="http://technorati.com/tag/J2ME" rel="tag">J2ME</a> <a 
href="http://technorati.com/tag/JavaME" rel="tag">JavaME</a> <a 
href="http://technorati.com/tag/phoneME" rel="tag">phoneME</a> <a 
href="http://technorati.com/tag/phoneME+Advanced" rel="tag">phoneME 
Advanced</a>]]>
</content>
</entry>
<entry>
<title>CVM: Why use the C or Java heap?</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/mlam/archive/2007/08/cvm_why_use_the.html" />
<modified>2008-06-24T19:17:03Z</modified>
<issued>2007-08-10T02:49:59Z</issued>
<id>tag:weblogs.java.net,2007:/blog/mlam/356.8002</id>
<created>2007-08-10T02:49:59Z</created>
<summary type="text/plain">A comment in a previous blog asks why CVM keeps some data structures in the C heap instead of the Java heap.  Here&apos;s the answer.</summary>
<author>
<name>mlam</name>

<email>Mark.Lam@Sun.COM</email>
</author>
<dc:subject>Community: Mobile &amp; Embedded</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/mlam/">
<![CDATA[<p>A <a href="http://weblogs.java.net/blog/mlam/archive/2006/11/the_big_picture.html#28524">comment</a> in a previous <a href="http://weblogs.java.net/blog/mlam/archive/2006/11/the_big_picture.html">blog</a> asks ...

<blockquote>
<p>I'd love to hear the explanations on why specific things are on the Java heap vs. the malloc heap.  In particular it seems like there are a lot of things outside the Java heap that need to refer to things inside the Java heap (e.g. jitted code) resulting in a potentially large number of roots when collecting garbage.<br>
-- erikcorry
</blockquote>

<p>For those of you who haven't been following my blogs before, Erik is asking a specific question regarding the memory layout of data structures in the CVM Java virtual machine (aka the phoneME Advanced VM).

<p>Erik, do you mean why specific things are in the C heap instead of the Java heap?  Or do you mean why are specific things in the Java heap instead of the C heap?  Well, let me answer both ...]]>
<![CDATA[<p><b>Why are some things in the Java heap?</b><br>
One of the features of the Java platform is automatic garbage collection (GC).  One reason for having this is so that we can compact the memory usage and avoid fragmentation.  The GC controls the Java heap and its layout.  It works in conjunction with allocators to allocate objects in different regions of the heap as appropriate.  Periodically, or as needed, the GC will free up memory that is no longer needed, and compact the rest to reduce the use of physical memory pages.

<p>In general, objects that are instantiated in the Java programming language all reside in the Java heap.

<p><b>Why are some things in the C heap?</b><br>
... or the malloc heap as you call it (more accurately so).  The virtual machine itself (in this case, CVM) is a piece of native code and is primarily written in C.  It follows that some of its data structures must necessarily reside in the C heap.

<p>But you're probably asking about VM data structures like the CVMClassBlock (cb), CVMMethodBlock (mb), CVMFieldBlock (fb), and the CVMExecEnv (ee) data structure.  These are just a few of the more prominent examples of VM data structures that have logical equivalents in the Java world i.e. Class, Method, Field, and Thread.  These data structures are called meta-data.

<p>Technically, you can choose to implement a VM that keeps all of these meta-data in the Java heap as well.  For example, the SE HotSpot VM allocates its class meta-data (class, methods, fields) from the Java heap.  I don't know what they do with thread though I am quite sure that some part of the thread (the native stack) at least resides in the malloc heap (or mmap'ed memory).  The CLDC VM (aka phoneME Feature VM) also allocates its meta-data in the Java heap.  For both these VMs, the reason for doing so is to be able to get memory compaction when the meta-data is no longer needed.

<p>CVM chose to allocate these from the C heap because these data structures tend to be accessed a lot during Java code execution.  For example, the invocation bytecodes specify a constant pool entry that refers to the method to be invoked in terms of a String that names the method.  The interpreter would quicken this into a pointer to the method itself.  Similarly, the JIT generated code which needs to access class meta-data is given direct pointers to the data itself.  Having a direct pointer results in higher performance from not having to go through levels of indirection.  Of course, direct pointers are only possible for objects that don't move.  And that's why CVM allocates them from the C heap.
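<p>If quickening is new to you, the following toy model may help.  It is a simplified sketch under my own assumptions, not CVM's real data structures: <code>QuickeningSketch</code>, <code>MethodBlock</code>, and <code>ConstantPool</code> here are hypothetical stand-ins, and real quickening also involves rewriting the bytecode, which is left out.  The point it shows is that the by-name lookup happens once, after which the constant pool entry holds a direct reference.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of quickening; NOT CVM's actual structures.
final class QuickeningSketch {

    // Stand-in for a method block (mb) that never moves in memory.
    static final class MethodBlock {
        final String name;
        MethodBlock(String name) { this.name = name; }
    }

    static final class ConstantPool {
        // Each entry is a String (symbolic name) before quickening
        // and a MethodBlock (direct reference) afterwards.
        private final Object[] entries;
        private final Map<String, MethodBlock> methodsByName;
        int symbolicLookups = 0;   // counts slow-path resolutions

        ConstantPool(Object[] entries, Map<String, MethodBlock> methodsByName) {
            this.entries = entries;
            this.methodsByName = methodsByName;
        }

        MethodBlock methodAt(int index) {
            Object entry = entries[index];
            if (entry instanceof MethodBlock) {
                return (MethodBlock) entry;      // fast path: direct pointer
            }
            symbolicLookups++;                   // slow path: resolve by name
            MethodBlock resolved = methodsByName.get((String) entry);
            if (resolved == null) {
                throw new IllegalStateException("no such method: " + entry);
            }
            entries[index] = resolved;           // quicken the entry in place
            return resolved;
        }
    }

    // Invokes the same call site repeatedly and reports how many
    // by-name lookups were needed in total.
    static int lookupsAfterInvoking(int times) {
        Map<String, MethodBlock> methods = new HashMap<String, MethodBlock>();
        methods.put("foo()V", new MethodBlock("foo()V"));
        ConstantPool pool = new ConstantPool(new Object[] { "foo()V" }, methods);
        for (int i = 0; i < times; i++) {
            pool.methodAt(0);
        }
        return pool.symbolicLookups;
    }

    public static void main(String[] args) {
        System.out.println(lookupsAfterInvoking(1000));   // prints 1
    }
}
```

<p>The direct reference stored by the quickened entry is only safe to keep if the thing it points to stays put, or if something patches the entry when the target moves.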

<p><b>But, but, but ...</b><br>
... but the SE HotSpot and CLDC VMs both allocate their class meta-data in the Java heap.  How do they avoid the cost of the indirection, and get good performance anyway?  

<p>They do this by keeping pointer relocation tables of where all such pointers exist in the meta-data objects.  When GC runs and these objects get moved, the pointers that point to them will be updated.  That includes pointers that reside in the constant pool (due to quickening) and in JIT compiled code.  Hence, they use direct pointers as well.  The only difference is that they need to incur the footprint cost for the pointer relocation tables, and the GC time cost to relocate these pointers.
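<p>As a rough illustration of the idea (and only that), here is a toy relocation table.  The class <code>RelocationSketch</code>, the integer "addresses", and the slot numbering are all hypothetical; I am not describing how HotSpot or the CLDC VM actually lay out their tables.  The sketch just shows the two costs mentioned above: the table itself takes space, and every move means walking it to patch pointers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a pointer relocation table.
final class RelocationSketch {

    // Locations that embed a direct pointer to a movable object, e.g. a
    // quickened constant pool entry or an address baked into compiled code.
    private final int[] pointerSlots;

    // The relocation table: which slots must be checked when objects move.
    // This list is the footprint cost.
    private final List<Integer> relocationTable = new ArrayList<Integer>();

    RelocationSketch(int slotCount) {
        pointerSlots = new int[slotCount];
    }

    // Store a direct pointer and remember where it was stored.
    void storePointer(int slot, int address) {
        pointerSlots[slot] = address;
        if (!relocationTable.contains(slot)) {
            relocationTable.add(slot);
        }
    }

    // Called by the "GC" after it moves an object.  Walking the table
    // is the GC time cost.
    void objectMoved(int oldAddress, int newAddress) {
        for (int slot : relocationTable) {
            if (pointerSlots[slot] == oldAddress) {
                pointerSlots[slot] = newAddress;
            }
        }
    }

    int pointerAt(int slot) {
        return pointerSlots[slot];
    }

    // Two slots point at one object, a third at another; move the first.
    static int[] demo() {
        RelocationSketch heap = new RelocationSketch(3);
        heap.storePointer(0, 0x1000);
        heap.storePointer(1, 0x1000);
        heap.storePointer(2, 0x2000);
        heap.objectMoved(0x1000, 0x0800);
        return new int[] { heap.pointerAt(0), heap.pointerAt(1), heap.pointerAt(2) };
    }

    public static void main(String[] args) {
        for (int address : demo()) {
            System.out.println("0x" + Integer.toHexString(address));
        }
    }
}
```

<p>After the move, both slots that referred to the relocated object hold its new address, and the unrelated slot is untouched.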

<p><b>Now, Wait a Minute!</b><br>
Did I just pull a fast one?  Did I not say that CVM allocates the meta-data in the C heap because it wanted the benefit of direct pointers and not have to deal with relocating these meta-data objects?  Why use the C heap when you can get the same benefit with allocating the meta-data from the Java heap?  You would get the benefit of memory compaction in addition too.

<p>The difference is this: for CLDC, we're dealing with extremely small libraries and applications, and therefore an extremely small heap.  The number of classes (and therefore the number of pointer relocation tables) for CLDC is far smaller than for CDC, which is what CVM is primarily targeted at.  Hence, relocating the meta-data isn't as expensive for the CLDC VM.  CLDC is also extremely tight for space, hence, they need to compact as much as they can.

<p>As for the SE HotSpot VM, we have JavaSE which has a lot more classes to deal with than CDC.  However, JavaSE is traditionally targeted at relatively more capable machines with a lot more memory and computing power i.e. desktops and servers.  Hence, the cost of relocating the meta-data is more tolerable there too.

<p>CVM services the space in between, where the number of classes is much larger than CLDC's but the VM must run on machines that are not as capable as JavaSE's typical targets.  Hence, the tradeoff decision was made early on to allocate these data structures in the C heap.

<p><b>What about Fragmentation?</b><br>
If it's in the C heap, it isn't compactible.  Wouldn't that cause a lot of fragmentation?  Yes, it will cause some fragmentation.  CVM deals with this by allocating (for the most part) only one large contiguous block per class for the class meta-data.  All the method, and field meta-data are contained in the same block allocation.  This reduces fragmentation due to classes in general.

<p><b>What about the JIT and roots?</b><br>
Erik's comment mentioned the JIT compiled code having references to objects in the Java heap, and that this creates a large number of roots (as in GC roots that need to be scanned) during GC.  This is not true.  

<p>Erik, when you referred to the JIT compiled code, I presume you meant the generated code and not the references on the Java stack that it operates on.  CVM's JIT compiled code doesn't have any such references to the Java heap.  Instead, there are references to the meta-data in the C heap e.g. the cb, mb, and fb's.  

<p>On the contrary, allocating these meta-data from the Java heap would require a lot of additional root traversals and reference fixups due to the pointers to the meta-data.  Hence, CVM's approach actually results in fewer GC roots to scan.

<p>Or were you asking about allocating the compiled code buffer itself from the Java heap?  The CLDC VM does that.  As a result, the compiled code can be relocated during GC.  

<p>Note that allocating the compiled code buffer from the Java heap doesn't actually reduce the number of roots the GC has to scan.  You might be thinking of roots in terms of the root of a reference tree, and in this case, it appears that the JIT compiled code can hold a bunch of these roots.  If there are references to objects in compiled code (which there aren't in CVM), then yes, you will increase the number of "roots" pointing into the Java heap from the outside.

<p>However, roots are just object references that the GC knows about.  They are no different than any other object reference e.g. fields inside an object.  It does not matter much whether they are outside or inside the Java heap.  The GC still has to scan them.  So, that aspect of it doesn't really make any difference.  And again, I remind you that CVM's compiled code actually does not have any GC roots in it.

<p><b>Last Thoughts</b><br>
Nowadays, CVM is being used in a very diverse range of devices, some of which might even match or outperform the desktops of the past.  Hence, in the future, it is possible that we (or members of the phoneME Advanced project on java.net) may choose to modify CVM to allocate class meta-data or compiled code from the Java heap instead of the C heap.  The benefits that may drive this include the ability to do more compaction of memory usage, as well as added performance from being able to embed direct object pointers in the compiled code.

<p>Whether those incentives will prove compelling enough to motivate the work, only time will tell.

<p>Regards,<br>
Mark

<hr>

<p>Tags: <a href="http://technorati.com/tag/CDC" rel="tag">CDC</a> <a 
href="http://technorati.com/tag/CVM" rel="tag">CVM</a> <a 
href="http://technorati.com/tag/JIT" rel="tag">JIT</a> <a 
href="http://technorati.com/tag/Java" rel="tag">Java</a> <a 
href="http://technorati.com/tag/J2ME" rel="tag">J2ME</a> <a 
href="http://technorati.com/tag/JavaME" rel="tag">JavaME</a> <a 
href="http://technorati.com/tag/phoneME" rel="tag">phoneME</a> <a 
href="http://technorati.com/tag/phoneME+Advanced" rel="tag">phoneME 
Advanced</a>]]>
</content>
</entry>
<entry>
<title>CVM&apos;s VM Inspector</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/mlam/archive/2007/07/cvms_vm_inspect.html" />
<modified>2008-06-24T19:17:03Z</modified>
<issued>2007-08-01T07:59:00Z</issued>
<id>tag:weblogs.java.net,2007:/blog/mlam/356.7951</id>
<created>2007-08-01T07:59:00Z</created>
<summary type="text/plain">A Java virtual machine is a complex piece of machinery.  How does one navigate its internal data structures and make sense of all those data bits?  Well, for CVM, there is help: the VM Inspector.</summary>
<author>
<name>mlam</name>

<email>Mark.Lam@Sun.COM</email>
</author>
<dc:subject>Community: Mobile &amp; Embedded</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/mlam/">
<![CDATA[<p>In a previous blog entry, I showed you a <a href="http://weblogs.java.net/blog/mlam/archive/2006/11/the_big_picture.html">map of CVM</a>.  If you are a VM engineer (or someone who is doing a port of the VM), and need to do some debugging, navigating all those data structures can be pretty daunting.  How do the CVM engineers do it?

<p><b>History</b><br>
Since the very early days, CVM has been built with a bunch of utility functions that allow us to dump information about commonly used VM data structures.  Two very popular examples of these are:
<blockquote>
<p>1. <code>CVMconsolePrintf()</code>, and<br>
2. <code>CVMdumpStack()</code>
</blockquote>

<p><code>CVMconsolePrintf()</code> is just like <code>printf</code> except that it adds some nice formatting options like <code>%O</code>, <code>%C</code>, <code>%M</code>, and <code>%F</code>, which print the details of a <code>CVMObject *</code>, <code>CVMClassBlock *</code>, <code>CVMMethodBlock *</code>, and <code>CVMFieldBlock *</code> respectively.  There are more, but this gives you the idea.  <code>CVMdumpStack()</code> is used to dump the contents of the Java stack.  The CVM engineers would call these utility functions from the <code>gdb</code> command prompt at runtime to get live information about the state of the VM and its data structures.

<p>However, there is a problem with using these utility functions.  That is, you will need to be careful how you use them.  For example, if you use <code>CVMconsolePrintf("%C", ...)</code> with a pointer that is not a <code>CVMClassBlock *</code>, then you may inadvertently cause a segfault that will crash the VM.  And this would mean that you could lose all the debugging state of the bug that you have spent hours or days to reproduce.

<p>Can't we get the VM utilities to do all the careful checks for us automatically, so that we don't make fools of ourselves by making the wrong call at the wrong time?

<p><b>the VM Inspector</b><br>
Why, yes we can ... ]]>
<![CDATA[well, to a certain extent at least.  The title of this blog entry suggests that there is a module inside CVM called the VM Inspector.  Actually, it isn't quite as grandiose as that.

<p>The VM Inspector is actually just a collection of those utilities that already existed before, plus some wrappers around some of them to make them safe.  It also includes some additional useful utilities that aren't normally available in a production VM build.
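
<p>As a rough illustration of what "making them safe" can mean, here is a minimal sketch in C of a validating wrapper.  It is not the inspector's actual code: the <code>Demo*</code> and <code>demo*</code> names, and the idea of checking a candidate pointer against a table of known classblocks, are assumptions made up for this example.  The point is only that the wrapper checks the pointer before anything dereferences it, and reports an error instead of crashing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a VM class block; NOT the real CVMClassBlock. */
typedef struct {
    const char *name;
} DemoClassBlock;

/* Hypothetical table of every class block the "VM" currently knows about. */
#define DEMO_MAX_CLASSES 16
static const DemoClassBlock *demoLoadedClasses[DEMO_MAX_CLASSES];
static size_t demoLoadedCount = 0;

/* Records a class block as valid (assumption: the VM can enumerate them). */
static void demoRegisterClass(const DemoClassBlock *cb)
{
    assert(demoLoadedCount < DEMO_MAX_CLASSES);
    demoLoadedClasses[demoLoadedCount++] = cb;
}

/* Returns 1 only if the pointer value matches a registered class block.
   The candidate is compared, never dereferenced, so a bogus address is
   harmless here. */
static int demoIsValidClassBlock(const void *candidate)
{
    size_t i;
    for (i = 0; i < demoLoadedCount; i++) {
        if ((const void *)demoLoadedClasses[i] == candidate) {
            return 1;
        }
    }
    return 0;
}

/* Safe wrapper: validate first, dump only when the pointer is known good.
   Returns 1 if the class block was dumped, 0 if it was rejected. */
static int demoSafeDumpClassBlock(const void *candidate)
{
    if (!demoIsValidClassBlock(candidate)) {
        printf("Address %p is not a valid classblock\n", (void *)candidate);
        return 0;
    }
    printf("classblock %p: %s\n", (void *)candidate,
           ((const DemoClassBlock *)candidate)->name);
    return 1;
}
```

<p>With a wrapper like this, passing a bogus address produces an error message rather than a segfault, which is the behavior you want when calling from the <code>gdb</code> prompt.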

<p>To use the VM inspector utilities, build CVM with <code>CVM_INSPECTOR=true</code>.  <code>CVM_INSPECTOR</code> is set to true by default when you build with <code>CVM_DEBUG=true</code>, but you can also enable it in a non-debug build without having to pull in all the other debug code in the system.

<p>  After that, you can start CVM in a <code>gdb</code> session, and call some of the inspector functions from the <code>gdb</code> command prompt.  Alternatively, you can call them from modified VM code.  For a list of the available functions, check out <code>src/share/javavm/include/inspector.h</code> in the phoneME Advanced VM codebase.

<p>However, with that said, you still don't know how to use these utility functions properly.  But rather than writing a big user's manual for you here, let's talk about the <code>cvmsh</code> utility shell instead.  You'll be able to get an idea of how to use these functions based on how they are used in <code>cvmsh</code> (see below).

<p><b>the <code>cvmsh</code> shell</b><br>
cvmsh is a shell program written in the Java programming language that is intended to help with diagnostics and inspection of VM internals when running applications.  In one regard, it is a poor man's debugger/profiler.  One use of it is to detect memory leaks when running applications.  This can be used by application developers or class library developers who want to see what happens in terms of the VM state when Java code is executed.  Because cvmsh is a Java program, it is not meant for debugging bugs that crash the VM.

<p>One advantage of using <code>cvmsh</code> instead of calling the inspector or other CVM utility functions directly from <code>gdb</code> is that it is a lot more user friendly.  User friendly in the sense that it is very difficult for you to accidentally crash the VM from cvmsh.  It isn't user friendly in terms of presenting you with a fancy GUI with nice graphics.  It's a low tech tool that I whipped up in previous years.  As I said earlier, it is a "poor man's debugger/profiler".

<p>Here is a quick user's manual for <code>cvmsh</code>:
<ol>
<li><p><b>Building cvmsh</b><br>
    To build <code>cvmsh</code>, build CVM with
    <code>CVM_INSPECTOR=true</code> added to the make command line.
    <code>CVM_INSPECTOR</code> is true by default for
    <code>CVM_DEBUG=true</code> builds.
    However, it is possible to build CVM with
    <code>CVM_INSPECTOR=true</code> independent of whether
    <code>CVM_DEBUG=true</code> or not.

   <p>The <code>CVM_INSPECTOR=true</code> option will add inspector
   code (Java and native) into the CVM binary (and possibly JAR
   files).  One of these classes will be the
   <code>sun.misc.VMInspector</code>.  This class is intended for
   private use only.

   <p><code>cvmsh</code> will be built into
   <code>testclasses.zip</code>.

<li><p><b>How to run cvmsh?</b><br>
   <code>cvmsh</code> only works with CVM because it relies on APIs
   in <code>sun.misc.VMInspector</code>.  It will not run with other
   VMs.  At the OS command prompt, run:<br>
   <code>&gt; cvm -cp testclasses.zip cvmsh</code>

<li><p><b>How to use cvmsh?</b><br>
   <code>cvmsh</code> launches into a command prompt: '&gt;'
   At the command prompt, you can enter these commands:

   <ul>
   <li><p><code>help</code><br>
       prints a list of commands that can be used.
   <li><p><code>gc</code><br>
       requests a full GC cycle.
   <li><p><code>memstat</code><br>
       prints the current memory statistics of the VM.

   <li><p><code>enableGC</code><br>
       enables the GC if it was previously disabled.
       Does nothing if GC is already enabled.

   <li><p><code>disableGC</code><br>
       disables the GC if it was previously enabled.
       Does nothing if GC is already disabled.

       <p>NOTE: Disabling the GC can have an adverse effect of
       causing the VM to lock up.  This is because GC cycles will
       be blocked until GC is re-enabled.  Under this condition,
       even <code>cvmsh</code> may not be able to continue to run
       for a long time.  It depends on how much free memory remains
       for <code>cvmsh</code>'s use without triggering a GC.

   <li><p><code>keepObjectsAlive true|false</code><br>
       forces the GC to keep all objects alive regardless of whether
       they are reachable or not, or revert to normal GC behavior.

       <p><code>true</code>: force GC to keep objects alive.<br>
       <code>false</code>: allow normal GC behavior to resume.

       <p><b>Dumpers:</b><br>
   <li><p><code>print <i>&lt;object address&gt;</i></code><br>
       invokes System.out.println() on the specified object.<br>
       NOTE: Can only be called while GC is disabled.

   <li><p><code>dumpObject <i>&lt;object address&gt;</i></code><br>
       dumps the contents (class,size,fields,etc) of the specified
       object.<br>
       NOTE: Can only be called while GC is disabled.<br>
       NOTE: Will report an error if the specified object is not a
             valid object.

   <li><p><code>dumpClassBlock <i>&lt;classblock address&gt;</i></code><br>
       dumps some info about the specified classblock.<br>
       NOTE: Can only be called while GC is disabled.<br>
       NOTE: Will report an error if the specified classblock is not a
             valid classblock.

   <li><p><code>dumpObjectReferences <i>&lt;object address&gt;</i></code><br>
       dumps a list of references to the specified object.<br>
       NOTE: Can only be called while GC is disabled.<br>
       NOTE: Will report an error if the specified object is not a
             valid object.<br>
       NOTE: If there are no references, the list will be empty.

   <li><p><code>dumpClassReferences <i>&lt;classname&gt;</i></code><br>
       dumps all references to all instances of the specified class.<br>
       NOTE: Can only be called while GC is disabled.<br>
       NOTE: If the specified class is not found, it will be reported
             as not loaded.

   <li><p><code>dumpClassBlocks <i>&lt;classname&gt;</i></code><br>
       dumps classblock addresses for the specified class.<br>
       NOTE: Can only be called while GC is disabled.<br>
       NOTE: If the specified class is not found, it will be reported
             as not loaded.

   <li><p><code>dumpHeap [simple|verbose|stats]</code><br>
       dumps the heap in the specified format.<br>
       If format is not specified, the default format 'simple' will
       be used.<br>

       <p><code>simple</code>: dumps the number of objects in the heap.<br>
       <code>verbose</code>: dumps the address of each object and their class.<br>
       <code>stats</code>: dumps statistics about objects in the heap. 
       The sizes of all instances of each class are added up, and a list
       of classes is printed in order of decreasing total memory
       consumption.  The more instances of a class there are, and the
       larger those instances are, the more memory the class will
       consume.  The total consumption is measured in bytes consumed by
       all instances of each class.

       <blockquote>
           <p>The statistics are organized in 3 columns:<br>
              Column 1: The total size in bytes of memory consumed by
                         instances of a class.<br>
              Column 2: The number of instances of that class.<br>
              Column 3: The class signature.<br>
       </blockquote>
       NOTE: Can be called with GC enabled or disabled.

      <p><b>Capturing and Comparing Heap
            states:</b><br>

      <p>A heap state is a snapshot of all objects
      that currently exist in the heap.  The
      objects are not copied into the snapshot.
      Only their addresses are copied.

      <p>If a GC occurs and an object is moved,
      its address in all the captured snapshots
      will be updated accordingly to reflect
      this movement.

      <p>If a GC occurs and an object is GCed, it
      will be marked as having been collected in
      all the snapshots.

      <p>NOTE: Hence, the contents of a snapshot
      can change with each GC cycle due to object
      movement or collection.

      <p>Examples of how heap snapshots can be
      used:
      <ol>
      <li><p>Comparing how many and what types of
          objects are created between two points
          of execution.  To do this, make sure to
          run the VM with a large young generation
          so as to allow the app to run without
          triggering a GC.

          <p><blockquote><code>
          &gt; gc<br>
          &gt; disableGC<br>
          &gt; captureHeapState Before running app<br>
          &gt; run <i>&lt;your app&gt;</i><br>
          &gt; captureHeapState After running app<br>
          &gt; listHeapStates<br>
          List of captured heap states:<br>
          &nbsp;&nbsp;hs 2:&nbsp; 2&nbsp;&nbsp;&nbsp;&nbsp;After running app<br>
          &nbsp;&nbsp;hs 1:&nbsp; 1&nbsp;&nbsp;&nbsp;&nbsp;Before running app<br>
          &gt; compareHeapState 1 2<br>
          </code></blockquote>

          <p>NOTE: This example takes a look at how
          much memory, how many objects, and what
          type of objects were created during the
          running of some application.

      <li><p>Looking for memory leakage through
          unintentional retention of objects even
          after GC cycles.

          <p><blockquote><code>
          &gt; gc<br>
          &gt; captureHeapState 1<br>
          &gt; run <i>&lt;your app&gt;</i><br>
          &gt; gc<br>
          &gt; captureHeapState 2<br>
          &gt; compareHeapState 1 2<br>
          </code></blockquote>

          <p>NOTE: This example takes a look at
          how much object and memory retention
          occurs across the execution of some
          application.

          <p>Technically, if the VM was in a
          steady state before and after the
          execution of the app, the difference
          should be 0 if there is no memory
          leakage or unexpected object retention.

          <p>However, be aware that running the
          application may cause more system
          classes to be loaded and initialized.
          These system classes and
          objects will not be unloaded, and will show
          up in the difference.  But
          after running the application several
          times, it is unlikely that
          more system classes will be loaded.  So,
          one way to mitigate this
          effect is to run the application several
          times before doing these
          measurements.
      </ol>

    <li><p><code>captureHeapState [<i>&lt;comment&gt;</i>]</code><br>
        captures the current heap state.  The user
        may provide a comment to
        label the heap state.  A captured heap
        state is also automatically
        assigned a numeric id.  Heap states are
        identified by their ids.

        <p>The comment is provided to help the
        user remember the context under
        which the heap state is captured.
        Comments are optional.  If a
        comment is not specified, a time stamp in
        milliseconds at the time
        the heap state is captured will be
        assigned.

    <li><p><code>releaseHeapState <i>&lt;id&gt;</i></code><br>
        release the specified heap state.

    <li><p><code>releaseAllHeapStates</code><br>
        release all heap states.

    <li><p><code>listHeapStates</code><br>
        list all captured heap states that have
        not been released.  The list
        will show the following columns:<br>
        Column 1: heap state id number.<br>
        Column 2: comment regarding the heap state.

    <li><p><code>dumpHeapState <i>&lt;id&gt;</i> [obj|class]</code><br>
        dumps the specified heap state sorted in
        one of the following orders:<br>
        <code>none</code>: this is the default if
           no sorting order is specified.<br>
        <code>obj</code>: sorts by object
           addresses in increasing order.<br>
        <code>class</code>: sorts by classblock
           addresses followed by object addresses
           in increasing order.

    <li><p><code>compareHeapState <i>&lt;id1&gt;</i> <i>&lt;id2&gt;</i></code><br>
        compares the specified heap states and
        lists the objects that
        appear in only one of the 2 heap states.  Some
        statistics are also listed.

        <p>For example:<br>
        <blockquote><code>
        &gt; captureHeapState 1<br>
        &gt; captureHeapState 2<br>
        &gt; compareHeapState 1 2<br>
        Comparing heapStates 1 and 2:<br>
        &nbsp;&nbsp; hs 2: size 20: 0x2e5d54 java.lang.String@0<br>
        &nbsp;&nbsp; hs 2: size 48: 0x2e5d68 [C@0<br>
        &nbsp;&nbsp; hs 2: size 12: 0x2e5d98 cvmsh$CmdStream@0<br>
        &nbsp;&nbsp; hs 2: size 20: 0x2e5da4 java.lang.String@0<br>
        &nbsp;&nbsp; hs 2: size 20: 0x2e5db8 java.lang.String@0<br>
        Number of mismatches in heapState 1: 0 (size 0)<br>
        Number of mismatches in heapState 2: 5 (size 120)<br>
        Total number of mismatches: 5 (size 120)<br>
        Size of heapState 1: 109908<br>
        Size of heapState 2: 110028<br>
        Size difference: 120<br>
        &gt;<br>
        </code></blockquote>

        <p>First a list of objects that appear in
        one heap state but not the
        other will be shown.  In this example,
        heap state 2 is captured
        after heap state 1.  Hence, it follows
        that heap state 2 has more
        objects than heap state 1.

        <p>NOTE: The extra objects that are
        contained in heap state 2 are due
        in this case to objects generated by
        command line input and parsing
        for cvmsh.

        <p>NOTE: There are no objects that appear
        in heap state 1 that aren't
        in heap state 2.  The only objects that
        could fall into this category are those
        that have been GCed.  For brevity,
        compareHeapState does not list
        objects which have been GCed.

        <p>After the list, some statistics follow:

        <blockquote>
        <p><code>Number of mismatches in heapState <i>&lt;id1&gt;</i>:</code><br>
           this indicates the number of objects
           that exist in the first
           heap state that aren't in the second.
           In this example, there
           are 0 such instances because there
           were no GCs between the
           capture of the 2 heap states.

        <p><code>Number of mismatches in heapState <i>&lt;id2&gt;</i>:</code><br>
           this indicates the number of objects
           that exist in the second
           heap state that aren't in the first.
           In this example, there
           are 5 such instances which are also
           listed above.

        <p><code>Total number of mismatches:</code><br>
           this is the sum of the 2 mismatch
           counts for heap states
           <i>&lt;id1&gt;</i>
           and <i>&lt;id2&gt;</i>.

        <p><code>Size of heapState <i>&lt;id1&gt;</i>:</code><br>
           this indicates the total size in bytes
           of all objects allocated
           in the heap at the time heap state
           <i>&lt;id1&gt;</i> was captured.

        <p><code>Size of heapState <i>&lt;id2&gt;</i>:</code><br>
           this indicates the total size in bytes
           of all objects allocated
           in the heap at the time heap state
           <i>&lt;id2&gt;</i> was captured.

        <p><code>Size difference:</code><br>
           this indicates the difference in total
           size in bytes of all objects
           between heap state <i>&lt;id1&gt;</i>
           and <i>&lt;id2&gt;</i>.
        </blockquote>

        <p>NOTE: It is possible for the list of
        mismatched objects to appear
        larger than the size difference shown in
        the statistics would suggest.  This is
        because the statistics are based on total
        heap sizes.  There may
        have been a lot of objects which were GCed
        after the first heap
        state and a lot more allocated before the
        second heap state.  The
        size difference can come close to 0, and
        yet the list of mismatched
        objects in the 2 heap states being
        compared could be large.
        Usually, the objects that appear in this
        list are transient objects
        that will go away in a subsequent GC.

      <p><b>Misc utilities:</b><br>

  <li><p><code>time <i>&lt;command&gt;</i></code><br>
      measures the time in milliseconds sampled
      around the execution of the
      specified cvmsh command.

  <li><p><code>run <i>&lt;Java app and arguments&gt;</i></code><br>
      synchronously runs the specified application
      with the specified
      arguments.  When this command returns to the
      prompt, the application
      will normally have completed.  Strictly
      speaking, it means that the
      <code>main()</code> method of the
      application has returned.

  <li><p><code>bg <i>&lt;Java app and arguments&gt;</i></code><br>
      asynchronously runs the specified
      application with the specified
      arguments.  The application will be run in a
      newly created thread.
      The fact that this command returns to the
      prompt is no indication of
      whether the app has started/completed or
      not.  The command prompt is
      independent of the execution of the app.

      <p>NOTE: This is not an MVM solution.  There
      is no application context isolation here.
      The app is merely running in a separate
      thread.  If you are running a Personal
      Profile app with a window, and you click
      on the Exit button on that app, it is very
      likely that the app will invoke
      <code>System.exit()</code>.  This will not
      only cause the app to
      terminate, but <code>cvmsh</code> as well.
      This is because both share the same
      VM instance.
   </ul>
</ol>
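
<p>To make the comparison logic concrete, here is a minimal sketch in C of how two heap states could be diffed.  It is not the actual <code>cvmsh</code> or VM Inspector implementation: the <code>DemoHeapEntry</code> and <code>DemoHeapDiff</code> types and the <code>demoCompareHeapStates()</code> function are hypothetical, and the sketch assumes that each snapshot is an array of (address, size) pairs sorted by increasing address, with no object movement between the two captures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical snapshot entry: an object's address and its size in bytes. */
typedef struct {
    unsigned long addr;
    size_t size;
} DemoHeapEntry;

/* Hypothetical result: mismatch counts and sizes for each snapshot. */
typedef struct {
    size_t count1, size1;   /* objects that appear only in snapshot 1 */
    size_t count2, size2;   /* objects that appear only in snapshot 2 */
} DemoHeapDiff;

/* Walks both snapshots in address order (a merge-style scan) and reports
   every object that appears in exactly one of them.
   Assumption: both arrays are sorted by increasing address. */
static DemoHeapDiff demoCompareHeapStates(const DemoHeapEntry *hs1, size_t n1,
                                          const DemoHeapEntry *hs2, size_t n2)
{
    DemoHeapDiff diff = { 0, 0, 0, 0 };
    size_t i = 0, j = 0;

    while (i < n1 || j < n2) {
        if (j == n2 || (i < n1 && hs1[i].addr < hs2[j].addr)) {
            /* Present in snapshot 1 only. */
            printf("  hs 1: size %lu: 0x%lx\n",
                   (unsigned long)hs1[i].size, hs1[i].addr);
            diff.count1++;
            diff.size1 += hs1[i].size;
            i++;
        } else if (i == n1 || hs2[j].addr < hs1[i].addr) {
            /* Present in snapshot 2 only. */
            printf("  hs 2: size %lu: 0x%lx\n",
                   (unsigned long)hs2[j].size, hs2[j].addr);
            diff.count2++;
            diff.size2 += hs2[j].size;
            j++;
        } else {
            /* Same address in both snapshots: not a mismatch. */
            i++;
            j++;
        }
    }
    return diff;
}
```

<p>Because both arrays are sorted, a single pass is enough, and the mismatch totals fall out of the same loop that prints the list.  The real tool also has to cope with objects being moved or collected by the GC, which this sketch deliberately ignores.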

<p><b>Last words</b><br>
The VM Inspector code is just a collection of utilities that can be used to browse VM data structures and inspect the state of the VM.  It is by no means exhaustive in functionality, and is not guaranteed to be bug free either.  This is because it isn't an official product feature.  Therefore, it has not undergone rigorous testing, and I don't get much time to work on it in my day job.  However, it can still be quite useful for debugging and profiling work in the absence of more advanced tools.  It was for me (which is why I put it together a few years ago).

<p>If anyone is so inclined, please give it a try.  Please also feel free to send me feedback on the tool, comments, bug fixes, and enhancements / contributions (subject to the open source governance rules of the phoneME project, of course).  

<p>At the very least, I hope it'll be of some help to you in your development efforts.

<hr>

<p>Tags: <a href="http://technorati.com/tag/CDC" rel="tag">CDC</a> <a 
href="http://technorati.com/tag/CVM" rel="tag">CVM</a> <a 
href="http://technorati.com/tag/Java" rel="tag">Java</a> <a 
href="http://technorati.com/tag/J2ME" rel="tag">J2ME</a> <a 
href="http://technorati.com/tag/JavaME" rel="tag">JavaME</a> <a 
href="http://technorati.com/tag/phoneME" rel="tag">phoneME</a> <a 
href="http://technorati.com/tag/phoneME+Advanced" rel="tag">phoneME 
Advanced</a>]]>
</content>
</entry>
<entry>
<title>CDC and JVMTI</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/mlam/archive/2007/07/cdc_and_jvmti.html" />
<modified>2008-06-24T19:17:03Z</modified>
<issued>2007-07-31T06:45:39Z</issued>
<id>tag:weblogs.java.net,2007:/blog/mlam/356.7946</id>
<created>2007-07-31T06:45:39Z</created>
<summary type="text/plain">The JVM Tools Interface (JVMTI) was introduced with JavaSE 1.5.  Are there issues with using it on CDC 1.1 which is based on JavaSE 1.4?  </summary>
<author>
<name>mlam</name>

<email>Mark.Lam@Sun.COM</email>
</author>
<dc:subject>Community: Mobile &amp; Embedded</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/mlam/">
<![CDATA[<p>In a <a href="http://weblogs.java.net/blog/mlam/archive/2007/04/meeting_up_java.html#28175">comment</a> in a previous blog entry, a friend asked a question about using the JVM<sup><small><small><small>TM</small></small></small></sup> Tools Interface (<a href="http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html">JVMTI</a>) with <a href="http://java.sun.com/products/cdc/">JavaME CDC</a> ...

<blockquote><p>... I am considering to use JVMTI instead of JVMPI.  However, I have one concern that does JVMTI applicable to CDC 1.1(HI)?
As you know, CDC 1.1 is based on JavaSE 1.4, but JVMTI is based on JavaSE 1.5.

<p>Kind regards<br>
Byungseon Shin
</blockquote>

<p>Here's what I think ...]]>
<![CDATA[<p><b>JVMTI and JavaSE 1.5</b><br>
JVMTI was introduced into JavaSE together with the 1.5 release, and replaces the old debugging (<a href="http://java.sun.com/j2se/1.5.0/docs/guide/jpda/jvmdi-spec.html">JVMDI</a>) and profiling (<a href="http://java.sun.com/j2se/1.5.0/docs/guide/jvmpi/index.html">JVMPI</a>) interfaces.  However, JVMTI is a VM native interface and, therefore, does not depend on the use of any Java class library APIs.  This means that it can be implemented in a VM without impacting or depending on the public Java APIs that are packaged with the VM.

<p>To be sure, I asked a colleague, Bill P., about this.  Bill is the man who implemented the JVMTI functionality in CVM (aka the phoneME Advanced VM).  Here's what Bill said ...

<blockquote>
<p>JVMTI is not tied to a specific SE release although some features of JVMTI are probably easier to do in SE just because that was the implementation vehicle <i>[for the JVMTI RI]</i>.  The <code>java.lang.instrument</code> API is totally separate and unrelated to JVMTI.

<p>We (CDC) are nearly complete in our implementation of JVMTI.   There are only a few APIs that we won't have because we either can't support it <i>[yet]</i> (e.g. <code>GetOwnedMonitorStackDepthInfo</code>, <code>IsFieldSynthetic</code>) or it was way down on the priority list (<code>SetNativeMethodPrefix</code> {for profiling native methods}).  In total there are only 17 out of 138 APIs we don't support right now.  This isn't a  problem in practice since JVMTI has an API that allows an agent to determine if some APIs are present or not.

<p>bill

<p><i>Note: The additional commentary in [] was added by me.</i>
</blockquote>

<p>Hence, it is not a problem to use JVMTI with CDC.  It is additional functionality that the VM supports, but does not impact the public Java APIs.

<p><b>Can I get it now?</b><br>
Having heard that JVMTI is being implemented in CVM, some of you may naturally ask if you can get it now.  Well, some of the code is already in the phoneME Advanced repository.  Other bits are being code reviewed right now, and of course, it may need to go through some additional testing cycles (though Bill has tested it quite a bit).  

<p>The part that is already in the repository is the debugging support.  The part to be added still is the profiling support.  As Bill pointed out, CVM's implementation will be a subset of the JVMTI APIs (which is allowed by the JVMTI specification).  However, I think Bill has been testing this with some of the big name IDEs and tools.  So, we're not expecting too many surprises.

<p>If you are interested in giving it a try, please check out the latest code from the phoneME Advanced repository.  And since this is all open source, we welcome feedback, comments, bug fixes, and contributions (subject to the project's governance rules, of course).

<p>For other folks who are new to JVMTI, ...

<p><b>What's wrong with JVMPI?</b><br>
Old timers may ask, "what's wrong with JVMPI?"  People in JavaSE land can probably tell you many reasons why JVMPI is bad, but I'll try to capture a few points here, plus some that are specifically relevant to CVM:

<ol>
<li><p>JVMPI is experimental i.e. not really a standard.

<li><p>JVMPI makes lots of assumptions based on how the old JavaSE classic VM works.

<li><p>JVMPI is badly documented.  Above assumptions can only be found out by reverse engineering an existing implementation.

<li><p>JVMPI implementations vary slightly from VM to VM (and not just in Sun's VMs).

<li><p>JVMPI performance is really bad, and can perturb the profile of your application significantly.  This is because of all the native event function calls that it relies on.  This has been eliminated in JVMTI.

<li><p>JVMPI profiling tools (because of the variations between VM implementations) will only work with certain VMs.<br>

<p>Technically, they could work with CVM too.  However, most commercial JVMPI tools would assume the presence of JavaSE libraries, and would also query the VM for its version information.  Using this information, a tool tweaks its behavior to match the differences in the JVMPI implementation in the various VMs.  Hence, your VM will only work with the tool if the tool vendor has you on their list.  Since most profiler tool vendors traditionally target server applications, CVM, being a (then new) JavaME VM, was not on the list.  That means those tools won't work with CVM's JVMPI even though they could.

<li><p>CVM's JVMPI implementation is supported by the interpreter only.  No JIT support.
</ol>

<p>That said, JVMPI served its purpose as an experiment.  The Java community was able to learn from it, and applied that knowledge when designing JVMTI.  To my knowledge, JVMTI resolved all of these problems.

<p>However, there are still limitations to using CVM's JIT with JVMTI.  This is because CVM doesn't support on-demand decompilation yet.  Why is this important?  Because functionality like inserting breakpoints and profiling instrumentation effectively requires that method bytecodes be redefined at runtime.  If the method being redefined is already compiled, it will need to be decompiled before the new version can be installed for use (or some such equivalent treatment).

<p>For profiling (which relies a lot on bytecode instrumentation), this can be done on CVM using <a href="http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html#bci">static or load-time instrumentation</a>.  Perhaps, when time permits in the future, support for on-demand decompilation can be added which will enable dynamic instrumentation as well.

<p>Another potential limitation of using JVMTI with CVM is CVM's romization feature.  ROMized classfiles and their bytecodes are by definition read-only.  Hence, they cannot be redefined (i.e. re-written) at runtime.  I am not sure if we have provided an implementation workaround for this yet.  But in general, this does not impact the debugging of your applications that aren't ROMized, nor profiling that relies on static or load-time instrumentation.

<p><b>What's wrong with JVMDI?</b><br>
Ummm ... I don't personally know of anything.  JVMTI is, to my knowledge, for the most part based on JVMDI.  However, the interface has been upgraded to be more general for more diverse VM tool implementations rather than just debuggers.

<p><b>Last words</b><br>
To recap, you can use JVMTI with CDC because JVMTI does not necessitate any changes in Java APIs.  It is purely a VM level interface.

<p>Well, Byungseon, I hope that answers your question.

<p>Regards,<br>
Mark

<hr>

<p>Tags: <a href="http://technorati.com/tag/CDC" rel="tag">CDC</a> <a 
href="http://technorati.com/tag/CVM" rel="tag">CVM</a> <a 
href="http://technorati.com/tag/Java" rel="tag">Java</a> <a 
href="http://technorati.com/tag/J2ME" rel="tag">J2ME</a> <a 
href="http://technorati.com/tag/JavaME" rel="tag">JavaME</a> <a 
href="http://technorati.com/tag/JIT" rel="tag">JIT</a> <a 
href="http://technorati.com/tag/phoneME" rel="tag">phoneME</a> <a 
href="http://technorati.com/tag/phoneME+Advanced" rel="tag">phoneME 
Advanced</a> <a href="http://technorati.com/tag/embedded+systems" 
rel="tag">embedded systems</a>]]>
</content>
</entry>
<entry>
<title>Async Thread Dumps on CVM</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/mlam/archive/2007/06/async_thread_du.html" />
<modified>2008-06-24T19:17:03Z</modified>
<issued>2007-06-22T09:45:33Z</issued>
<id>tag:weblogs.java.net,2007:/blog/mlam/356.7708</id>
<created>2007-06-22T09:45:33Z</created>
<summary type="text/plain">Sometimes, your application appears to be hanging, and you don&apos;t quite know where it&apos;s hanging.  If you&apos;re running your app on the phoneME Advanced VM (CVM), then here&apos;s a way to hack it to get a dump of the thread stacks so that you can get an idea of where your app is hung.</summary>
<author>
<name>mlam</name>

<email>Mark.Lam@Sun.COM</email>
</author>
<dc:subject>Community: Mobile &amp; Embedded</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/mlam/">
<![CDATA[<p>There are times in the course of your development effort when your application just seems to hang forever.  At those times, you wish you had some way of knowing where the hang is occurring.  If you're running on JavaSE, chances are you'll have a lot of advanced tools that make life easy for you.  But if you're running on an embedded device, your options are suddenly severely limited.  For the phoneME Advanced VM (CVM), there's a way to get help on this even when there is no advanced debugging support on your device.

<p>What I'll be showing here is an old trick to get an asynchronous dump of the stacks of all the threads that are currently alive in the VM.  First of all, you need to know that this is a <b>hack</b> i.e. it's not good and clean code.  That's why I haven't already committed it to the source repository, and won't be doing so.  The reason it is a hack will be explained below later under <a href="#why_this_is_a_hack">Why this is a Hack!!!</a>.  But even though it is a hack, it is useful when you need it.  Many of my colleagues as well as customers have often asked me for the code patch for this hack to help with debugging the hangs in their applications.  I figure you might find it helpful too.

<p>So, here it is ...]]>
<![CDATA[<p><b>the Code Patch</b><br>
<i>Step 1</i>: In <i>src/linux/javavm/runtime/sync_md.c</i>, add the following function:

<pre>
<p><i>/* BEGIN for Debug Use only */</i>
#include "javavm/include/interpreter.h"

<i>/* Warning: This thread dumper is only for the use of debugging code.
   There's a risk that it can potentially crash the VM if invoked at
   the wrong time.  Hence, this is not to be incorporated into a
   production build.  It is only for assisting in debugging efforts
   when needed.
*/</i>
static void <b>threadDumpHandler</b>(int sig)
{
    CVMExecEnv *ee = CVMgetEE();
    CVMBool success;
    int threadCount = 1;

    success = CVMsysMutexTryLock(ee, &CVMglobals.threadLock);
    if (!success) {
       return;
    }

    CVMconsolePrintf("\nStart thread dump:\n");
    CVMconsolePrintf("======================================\n");
    CVM_WALK_ALL_THREADS(ee, threadEE, {
       CVMconsolePrintf("Thread %d: ee 0x%x", threadCount, threadEE);
       CVMdumpStack(&threadEE->interpreterStack,0,0,0);
       CVMconsolePrintf("======================================\n");
       threadCount++;
    });
    CVMconsolePrintf("End thread dump\n\n");

    CVMsysMutexUnlock(ee, &CVMglobals.threadLock);
}
<i>/* END for Debug Use only */</i>
</pre>

<p><i>Step 2</i>: In linuxSyncInit(), add:
<pre>
<p>            <i>/* BEGIN for Debug Use only */</i>
            {SIGQUIT, <b>threadDumpHandler</b>, SA_RESTART},
            <i>/* END for Debug Use only */</i>
</pre>

<p>If you look in <i>src/linux/javavm/runtime/sync_md.c</i>, you'll note that this code is set up to use the same SIGQUIT signal that JVMPI also uses.  So, you need to make sure that there is no conflict i.e. either you aren't using JVMPI at the same time, or you call the <i>threadDumpHandler</i> function from the JVMPI signal handler function instead.  In this case, "using JVMPI at the same time" means that you built CVM with the CVM_JVMPI=true option.

<p><b>How Does it Work?</b><br>
Basically, in the above patch, we're setting up a signal handler in Linux for the signal SIGQUIT.  When CVM receives the SIGQUIT signal, it will call the <i>threadDumpHandler</i> function, and that function will iterate through all the live threads and dump their stacks.

<p>Note that the trigger mechanism used here is a signal on Linux.  You should be able to use this for other OSes as well provided that you can set up an asynchronous request handler whereby CVM can receive a request from the user.  That request handler should call <i>threadDumpHandler</i>.  Disclaimer: Some finessing may be necessary if you try to use this for other OSes.  I've only tested this code on Linux.
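<p>For readers who haven't set up a signal handler before, here is a minimal sketch of the general mechanism on a POSIX system.  It is not the code in <i>linuxSyncInit()</i> (which isn't shown here); it only illustrates registering a handler with <i>sigaction</i> and the SA_RESTART flag.  The names <i>install_dump_handler</i>, <i>demo_dump_handler</i>, and <i>dump_requests</i> are made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() and SA_RESTART */
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Counts how many times the handler has run.  volatile sig_atomic_t is
   a type that a signal handler may safely write to. */
static volatile sig_atomic_t dump_requests = 0;

/* Hypothetical stand-in for threadDumpHandler: it only counts calls. */
static void demo_dump_handler(int sig)
{
    (void)sig;
    dump_requests++;
}

/* Registers handler for the signal signo.
   Returns 0 on success and -1 on failure (same as sigaction). */
static int install_dump_handler(int signo, void (*handler)(int))
{
    struct sigaction sa;

    assert(handler != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */
    sigemptyset(&sa.sa_mask);   /* block no extra signals in the handler */
    return sigaction(signo, &sa, NULL);
}
```

<p>After calling <i>install_dump_handler(SIGQUIT, demo_dump_handler)</i>, every SIGQUIT delivered to the process runs the handler instead of terminating the process.  To trigger from a different signal, pass a different signal number.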

<p><b>How to Use it</b><br>
After building CVM with this hack added, you can run CVM as usual (with the arguments that you normally specify).  Whenever you would like a thread stack dump after that, open another terminal window and get the process ID for CVM.  One way to do this is by typing "ps -ef | grep cvm" at the command line.  That should list the processes that are running cvm.

<p>Then, issue a "kill -QUIT <i>&lt;pid&gt;</i>" where <i>&lt;pid&gt;</i> is the process ID of the CVM instance you want a thread dump from.  This will send a SIGQUIT signal to CVM and trigger the thread dump.  You can request this dump as many times as you like and at different times to see if there are changes in the stack traces.  Bear in mind that there is a chance that the request <b>may crash the VM because this is a <a href="#why_this_is_a_hack">hack</a></b> (and not a clean solution).

<p>If you don't want to use SIGQUIT as the trigger signal, you can choose a different signal in the code, and issue a different kill command from the terminal.

<a name="why_this_is_a_hack"></a>
<p><b>Why this is a HACK!!!</b><br>
As mentioned many times above, this trick is a hack i.e. it is not clean code that you would want to put into your production VM.  Only include this code for your personal debugging use.  Note that because it is a hack, it can crash the VM if you're not lucky when using it.

<p>Here are all the reasons why this trick is a hack:

<ol>
<li><p>This mechanism uses a signal handler.  The mechanism also requires that we lock the <i>threadLock</i> mutex.  This is because we need to iterate over the list of live threads and we can't have any of these threads dying on us (and being freed) while we're trying to dump their stack.

<p>However, according to the Linux man pages on <i>pthread_mutex_lock</i>, mutex functions are not async-signal safe, and calling these functions from a signal handler may deadlock the calling thread.

<li><p>The stack dump mechanism <i>CVMdumpStack</i> makes use of <i>CVMconsolePrintf</i> which prints to <i>stderr</i> using <i>fprintf</i>.  I didn't see anything in the man pages about whether <i>fprintf</i> can safely be called from a signal handler, but I suspect that it cannot: the stdio functions are not on the POSIX list of async-signal-safe functions.

<p>In general, it is not good practice to do a lot of I/O work (like printing to stderr) from signal handlers anyway.  Also, printing to stderr will be synchronized somewhere underneath <i>fprintf</i>.  Hence, the mutex locking problem also applies here.

<li><p>If you've built CVM with CVM_JIT=true, then chances are some (or all) of your threads are running JIT compiled code.  When running in JIT compiled mode, the threads do not always flush their stack context to the thread's stack data structure.  Some of the context values are simply kept in registers, and only flushed to the stack when absolutely needed.

<p>An asynchronous inspection of the thread's stack data structure (as is done by this dumper mechanism) will not necessarily see the topmost methods in the thread's execution.  This is because their information may not have been flushed to the stack.  Again, this is because the compiled code has no need to flush its context to the stack if it is able to operate out of registers.

<p>Hence, the stack dump may not be precise.  On the bright side, in practice, it will tend to get you close enough to where the thread is actually executing.  At the least, it takes you to its caller (or its caller's caller, etc).

</ol>
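<p>For contrast, here is a sketch of the pattern usually recommended to avoid the first two problems: the signal handler only records the request in a flag, and ordinary (non-signal) code does the locking and printing later.  This is <b>not</b> what the patch above does, and it does nothing for the JIT imprecision in the third point.  All the names here (<i>request_dump</i>, <i>service_dump_request</i>, <i>demo_dump</i>) are hypothetical, and where CVM would poll the flag is left open.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Set by the signal handler, cleared by ordinary code. */
static volatile sig_atomic_t dump_requested = 0;

/* Signal handler: only records the request.  No locks, no stdio. */
static void request_dump(int sig)
{
    (void)sig;
    dump_requested = 1;
}

/* Called from ordinary (non-signal) code, e.g. a polling thread or a
   main loop.  Runs dump_fn at most once per pending request.
   Returns 1 if the dump ran, 0 if nothing was pending. */
static int service_dump_request(void (*dump_fn)(void))
{
    assert(dump_fn != NULL);
    if (!dump_requested) {
        return 0;
    }
    dump_requested = 0;
    dump_fn();   /* taking mutexes and printing is fine in this context */
    return 1;
}

/* Demo stand-in for the real stack dumper: it only counts calls. */
static int dumps_done = 0;
static void demo_dump(void)
{
    dumps_done++;
}
```

<p>The trade-off is that the dump no longer happens at the instant the signal arrives.  It happens the next time some thread gets around to calling <i>service_dump_request</i>, which is of little help if that thread is itself part of the hang.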

<p><b>WARNING!</b> Again, I caution you:  DO NOT put this code in the production VM that you deploy in your products.  It can crash your VM.  That's why I don't want to commit it into the source repository (for fear that someone will enable it without knowing what the consequences are).

<p><b>Using the Thread Stack Dump Info</b><br>
Is this information enough to prove a deadlock, or solve your problem completely?  No, not necessarily.  All it is guaranteed to do is to give you more information about what your application is doing at that moment in time when you requested the thread dumps.

<p>To actually determine if you have a deadlock or not will require some additional info regarding the state of the monitors that a thread is blocked on.  You will also need to know who owns those monitors.  That is a topic for another day.  The thread dumps may be enough to suggest the existence of a deadlock as the source of your application's hang, and thereby justify further investigation in that direction.  Alternatively, they can show that you don't have a deadlock at all.

<p><b>Final Words</b><br>
Again, remember that this is a hack!  Use it with caution, and do not include it in your deployed products.  In spite of its imperfect (and hacky) nature, I do hope that it will help you out should you find yourself in the sticky situation of having to debug hangs.

<p>Have a nice day. =)

<hr>
<p><i>Personal Update</i><br>
I've been really busy with projects for work ... so much that I haven't had much time to sit back and think of more relevant subjects to write about.  As such, I guess I've been feeling a little bit uninspired in the blogging department.  Couple that with my now very limited to non-existent free time, and the result is ... very infrequent blog updates.  For this, I do sincerely apologize.

<p>While I often feel guilty about not updating the blog regularly, I also don't want to just write entries that wouldn't provide you with something useful.  I doubt you'll really want to hear about what I have for lunch each week (or some such mundane details).  So instead, if you have a question or a request for a discussion on any specific  subject that interests you, please ask me about it by entering a comment in my blog.  That will inspire me to write as I do prefer to talk about things that are relevant to you, the Java developer.

<p>Till next time then.

<hr>

<p>Tags: <a href="http://technorati.com/tag/CDC" rel="tag">CDC</a> <a 
href="http://technorati.com/tag/CVM" rel="tag">CVM</a> <a 
href="http://technorati.com/tag/Java" rel="tag">Java</a> <a 
href="http://technorati.com/tag/J2ME" rel="tag">J2ME</a> <a 
href="http://technorati.com/tag/JavaME" rel="tag">JavaME</a> <a 
href="http://technorati.com/tag/JIT" rel="tag">JIT</a> <a 
href="http://technorati.com/tag/phoneME" rel="tag">phoneME</a> <a 
href="http://technorati.com/tag/phoneME+Advanced" rel="tag">phoneME 
Advanced</a> <a href="http://technorati.com/tag/embedded+systems" 
rel="tag">embedded systems</a>]]>
</content>
</entry>
<entry>
<title>The Price of Speed</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/mlam/archive/2007/06/the_price_of_sp.html" />
<modified>2008-06-24T19:17:03Z</modified>
<issued>2007-06-07T07:40:02Z</issued>
<id>tag:weblogs.java.net,2007:/blog/mlam/356.7475</id>
<created>2007-06-07T07:40:02Z</created>
<summary type="text/plain">Java ME is typically deployed in resource constrained devices.  We like JIT compilers because they can make Java applications run fast on these devices.  But how much overhead do they incur?  Come find out.</summary>
<author>
<name>mlam</name>

<email>Mark.Lam@Sun.COM</email>
</author>
<dc:subject>Community: Mobile &amp; Embedded</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/mlam/">
<![CDATA[<p>I apologize for not writing in a while.  I've been trying to get some real work done (i.e. coding and designing solutions to improve the lives of our customers ... or at least, that's my goal).  Anyway, two weeks ago, an interesting comment was added to a previous <a href="http://weblogs.java.net/blog/mlam/archive/2007/02/jit_performance_1.html">article</a> I wrote on understanding JIT performance.  The comment says ...

<blockquote><p>
<i>"Very informative blog!! Is there any information/projection on what % of apps in handheld market is based on and is expected to be written in Java/J2ME? I hear that, since handhelds are very memory constrained, JIT has challenges wrt to space and energy consumption.  Is that too high to keep JIT technology in the darkages? How much more memory does JIT add over an interpreted version? Is there any study or white paper on why and how much such an overhead would be for JIT?"</i>
</blockquote>

<p>Thanks for the <a href="http://weblogs.java.net/blog/mlam/archive/2007/02/jit_performance_1.html#comments">comment</a> (and the compliment), Cochin.  I wanted to answer right away, but alas, I needed to gather some facts for it, and my day job also got in the way (needed to get some work done).  At this moment, while I'm waiting for my computer to crunch some major compilations, I'll take a few minutes to give you my answer ...]]>
<![CDATA[<p>Regarding a projection of what % of the handheld market is / will be based on Java ME (formerly known as J2ME) apps, unfortunately, I personally don't have such info.  But check out this <a href="http://blogs.sun.com/hinkmond/entry/blimey_finally_java_me_tech">blog entry</a> by my colleague Hinkmond Wong.  You can make your own judgement from that, but my guess is that the Java ME handheld apps market will only increase.  But hey, I'm biased.

<p>But regarding JIT overhead and device memory constraints, here are some perspectives ...

<p><b>The OS and Device Memory Budgets</b><br>
In recent years, most devices I've seen (in my limited survey of the embedded world) have a RAM capacity in the range of 16M to 32M.  More high-end devices have 64M.  Media devices like set top boxes may have even more memory (128M, 256M, and up).  Regardless of the capacity, the budget for the Java platform to run in is usually a small fraction of the total.  The majority of the memory budget usually goes towards heavy memory consumers like the underlying OS, and media data (image, video, and sound).

<p>For example, in one Linux device which comes with 64M of RAM, right after booting the OS (and its misc services) before the Java platform gets to run, the amount of memory reported by the OS to already be in use is around 42.7M.  Another 4M or so is presumably reserved by the OS (<i>/proc/meminfo</i> says that there is only a total of 60M though we know it's a 64M machine).  That leaves less than 18M for the Java platform and other native applications to work with.

<p><b>The Interpreted VM</b><br>
Next, we need to ask what the footprint of a basic interpreted Java ME platform is like.  Here are some numbers:

<ol>
<li><p><b>CLDC</b><br>
Numbers based on Sun's CLDC-HI aka phoneME Feature VM:<br>

<blockquote>
<p><i>code size:</i> 300K<br>
   <i>static memory (data + bss):</i> 9K<br>
   <i>Java heap:</i> 500K - 8M (typical, but can scale up or down)
</blockquote>

<p>System classes and ROMized classes are included in the size.  Their native methods are included too.  These do not include all the many Java ME JSRs that one can add nor MIDP.  Those are extra, of course, and can add significantly to the footprint depending on which JSRs.  My guess is that JSRs can add anywhere from 10s to 100s of Ks in footprint per JSR.

<li><p><b>CDC</b><br>
Numbers based on Sun's CDC-HI aka phoneME Advanced VM aka CVM (on ARM):<br>

<blockquote>
<p><i>code size:</i> 1522K<br>
   <i>static memory (data + bss):</i> 903K<br>
   <i>Java heap:</i> 2M - 8M (typical, but can scale up or down)
</blockquote>

<p>System classes (CDC/Foundation Profile 1.1) and ROMized classes are included in the size.  Their native methods are included too.  Again, these numbers do not include the many possible Java ME JSRs.

</ol>

<p>In the above memory measurements, native memory for threads, stacks, heap, and other OS constructs in memory is not included.  These can vary depending on the type of Java application that you run.  Usually these add less than 1M of extra RAM usage.  Of course, your mileage may vary.

<p>Note that in most deployments, an 8M heap is probably a very generous allowance.  Depending on the device and the typical applications, the heap size limits may be set differently.

<p><b>The JIT</b><br>
Next, let's look at the additional footprint that a JIT adds.  Here are the numbers:

<ol>
<li><p><b>CLDC</b><br>
Numbers based on Sun's CLDC-HI aka phoneME Feature VM:<br>

<blockquote>
<p><i>JIT code size:</i> 100K<br>
   <i>JIT working memory:</i> small<br>
   <i>JIT code cache:</i> 10% - 20% of Java heap (allocated from Java heap)
</blockquote>

<p>I don't have a measured number for the working memory but it is small (I'm guessing less than 10K, maybe significantly less) because the CLDC-HI JIT does not allocate a lot of intermediate data structures for its compilation process.

<p>CLDC-HI's code cache is allocated from the Java heap.  When the heap is under pressure (i.e. low on memory), the code cache can be shrunk to make room for object allocations.

<li><p><b>CDC</b><br>
Numbers based on Sun's CDC-HI aka phoneME Advanced VM aka CVM (on ARM):<br>

<blockquote>
<p><i>JIT code size:</i> 246K<br>
   <i>JIT static memory:</i> 111K<br>
   <i>JIT working memory:</i> up to 1M<br>
   <i>JIT code cache:</i> 512K (typical, but can scale up or down)
</blockquote>

<p>Hmmmm ... these code size and static numbers surprised me a little actually.  They are much larger than I expected.  However, these are computed by building the VM with the JIT enabled, and subtracting the interpreter only VM sizes from the new sizes.  Let's do a size measurement based on the JIT code modules alone:

<blockquote>
<p><i>JIT code size:</i> 207K<br>
   <i>JIT static memory:</i> 2K<br>
</blockquote>

<p>OK, now that's more like what I expected.  So, what happened is that when I enabled the JIT, the rest of the VM code and data structures also increased in size to support the JIT.  That accounts for the 39K in code size.  The 109K of static memory probably comes from ROMized class data structures that now have to be in RAM to support the JIT.  But I'll be fair and use the larger set of numbers for our computation below.

<p>The CDC-HI JIT working memory is allocated from the C <i>malloc</i> heap as needed.  Typical JIT compilations will not use that much memory (typical usage is in the low 100Ks).  The 1M is the default limit.  This limit can be set lower or higher from the VM command line as needed. 

<p>The JIT code cache (which is used to store the compiled code generated by the JIT) is allocated from the C heap as well.  512K is the default size.  I've seen that fairly large sized applications can perform optimally within a code cache size of less than 350K.  The default is set at 512K to allow for ease of use.  This size can be set lower or higher from the VM command line as needed as well.

</ol>

<p><b>The Overhead</b><br>
Given the above numbers, let's look at the overhead a JIT adds over an interpreted only VM:

<ol>
<li><p>CLDC code: 100K / 300K = ~33%<br>
    CLDC RAM (assume 8M heap): (20% x 8M) / (9K + 8M) = ~20%<br>
    <b>CLDC overall</b>: (100K + 20% x 8M) / (300K + 9K + 8M) = ~<b>20%</b>
<li><p>CDC code: 246K / 1522K = ~16%<br>
    CDC RAM (assume 8M heap): (1.5M + 111K) / (903K + 8M) = ~18%<br>
    <b>CDC overall</b>: (246K + 1.5M + 111K) / (1522K + 903K + 8M) = ~<b>18%</b>
</ol>

<p>Note that I'm assuming worst case JIT memory usage in both the CDC and CLDC cases.  For CDC, the 1.5M is the 1M working memory limit plus the 512K code cache.  Typical memory usage is significantly smaller than this (especially in the CLDC case).  I'm also ignoring other native memory usage (e.g. threads, native stack, etc).

<p>Now, let's look at what this overhead realistically looks like on a real device.  Assuming the 64M Linux device I mentioned earlier:

<ol>
<li><b>CLDC overall</b>:<br>
    (100K + 20% x 8M) / (300K + 9K + 8M + 42.7M) = <b>1738.4K</b> / 52225.8K = ~<b>3.3%</b>
<li><b>CDC overall</b>:<br>
    (246K + 1.5M + 111K) / (1522K + 903K + 8M + 42.7M) = <b>1893K</b> / 54341.8K = ~<b>3.5%</b>
</ol>
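<p>If you'd like to redo this arithmetic with your own numbers, here is a small sketch in C.  The figures are the ones quoted in this article (in K, with 1M = 1024K); the struct and function names are just ones I made up for the illustration.

```c
#include <assert.h>

#define KB_PER_MB 1024.0

/* All sizes are in kilobytes, taken from the figures quoted above. */
struct vm_footprint {
    double base_kb;  /* interpreter-only VM: code + static + 8M heap */
    double jit_kb;   /* worst case extra footprint with the JIT enabled */
};

/* CLDC: 100K of JIT code + 20% of the heap for the code cache. */
static const struct vm_footprint cldc = {
    300.0 + 9.0 + 8.0 * KB_PER_MB,
    100.0 + 0.20 * 8.0 * KB_PER_MB
};

/* CDC: 246K code + 111K static + 1M working memory + 512K code cache. */
static const struct vm_footprint cdc = {
    1522.0 + 903.0 + 8.0 * KB_PER_MB,
    246.0 + 111.0 + 1.0 * KB_PER_MB + 512.0
};

/* JIT overhead as a percentage of the VM footprint plus whatever else
   is already resident on the device (pass 0 for the VM alone). */
static double jit_overhead_percent(struct vm_footprint vm, double other_kb)
{
    double total_kb = vm.base_kb + other_kb;

    assert(total_kb > 0.0);
    return 100.0 * vm.jit_kb / total_kb;
}
```

<p>Calling <i>jit_overhead_percent(cldc, 0.0)</i> gives the roughly 20% figure for the VM alone, while passing <i>42.7 * KB_PER_MB</i> as the second argument adds the memory the OS already has in use and gives the device-level figure of roughly 3.3%.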

<p>Even with typical Java heap sizes that may be smaller, the memory overhead for the JIT will still be around the 3 - 4% range.

<p><b>The Bottom Line</b><br>
Depending on your device memory budget and other factors, a Java VM JIT may or may not represent a significant overhead.  In the above highly inflated numbers (not in favor of the JIT), the overhead is less than 2M (3% to 4% of overall memory consumption) in both the CLDC and CDC cases.  In both cases, typical memory usage is less than the above (and significantly so in the CLDC case).  As you can see from the above example, the bulk of the overhead comes from JIT working memory and the code cache size.  And if your memory budget is tight, these can be configured to be lower in order to reduce the overhead.

<p>To venture a guess, a typical CLDC deployment will have a heap of 4M or less.  Hence, its JIT overhead (which is a % of the heap size) will be reduced by ~819K from ~1738K to ~919K.  If you are only using a 2M heap, that overhead reduces to about <b>510K</b> only.  

<p>A typical CDC deployment will have a 5M heap.  But the working memory and code cache size are independent of that.  The typical working memory size is in the low 100Ks.  Let's say we limit it at 512K.  That brings the JIT overhead down to ~1381K.  As mentioned earlier, a good large sized application will already run optimally in less than 350K.  Assuming we limit the code cache to 400K, the overhead now reduces to about <b>1269K</b>.
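<p>The tuning estimates in the last two paragraphs boil down to two small formulas.  Here is a sketch of them in C, with all sizes in K.  The function names are made up for this illustration, and the constants are the code size and static memory numbers quoted earlier.

```c
#include <assert.h>

/* CLDC: 100K of JIT code plus a code cache of up to 20% of the Java
   heap (the cache is allocated from the heap itself). */
static double cldc_jit_overhead_kb(double heap_kb)
{
    assert(heap_kb >= 0.0);
    return 100.0 + 0.20 * heap_kb;
}

/* CDC: 246K of code and 111K of static memory are fixed costs; the
   working memory limit and the code cache size are the tunable parts. */
static double cdc_jit_overhead_kb(double working_kb, double code_cache_kb)
{
    assert(working_kb >= 0.0 && code_cache_kb >= 0.0);
    return 246.0 + 111.0 + working_kb + code_cache_kb;
}
```

<p>For example, <i>cldc_jit_overhead_kb(2048.0)</i> is the 2M heap case (about 510K), and <i>cdc_jit_overhead_kb(512.0, 400.0)</i> is the CDC case with a 512K working memory limit and a 400K code cache (1269K).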

<p>Before you go thinking that the CDC JIT is inferior, here are the reasons why it would use more memory than the CLDC JIT:
<ol>
<li><p>The CDC JIT performs <b>more advanced compilation</b>: this uses more working memory but generates higher performance code.
<li><p>A typical CDC <b>application is a lot more complex</b> than a CLDC app: this causes more Java code to be needed.  Therefore more code gets compiled, and a larger JIT code cache is needed.
<li><p>Typical CDC devices prefer a <b>higher performance to footprint tradeoff</b>: CDC devices usually have a larger memory budget than their CLDC counterparts.  Hence, it makes sense to trade off a bit more memory to get better performance.  For example, CDC inlining is more aggressive.  This can be reduced (by setting JIT options at VM boot time) if needed.
</ol>
<p>Hence, the difference in overhead isn't for nothing.  With CDC, you pay more because you get more and your apps need more.

<p>Note also that the JIT working memory is only used during the JIT compilation process to store intermediate results and data structures.  Once the compilation is complete, all this memory is freed up.  Hence, once the system reaches steady state, JIT compilation will seldom occur, and this overhead will not be incurred.  The above overhead estimates do not take this into account; they consider only the worst case, when a compilation is actually in progress.

<p><b>Final Word</b><br>
And so, considering the overall allocation of memory in your device, a JIT would actually use very little memory in a typical scenario.  My conclusion, unless your memory budget is extremely tight, is that a JIT is worth the relatively small overhead.  As usual, your mileage may vary.  Draw your own conclusions if you don't like how I pick my numbers.  But, I hope that this at least gives you a better feel for the relatively small size of the overhead.

<p>And, Cochin, for the record, JIT technology isn't from the dark ages.  The dark ages were back when people didn't realize that a JIT is so much more efficient, and they compiled all their code down to native code instead.  Talk about a major footprint overhead.  A resource-constrained environment like that of Java ME devices is precisely why we would want and need JIT technology.  It is one of a few technologies available today that enables us to get high performance at a smaller footprint price.

<p>Oops, I just realized that I didn't address the energy consumption part of your question yet.  That is a whole separate discussion.  Perhaps, another day.

<p>BTW, thanks to my colleagues Brandon Passanisi and Oleg Pliss for providing me with some of the above numbers which I used for this exercise.  Thanks, guys.

<p>Till next time, have a nice day. =)

<hr>

<p>Tags: <a href="http://technorati.com/tag/CDC" rel="tag">CDC</a> <a href="http://technorati.com/tag/CLDC" rel="tag">CLDC</a> <a 
href="http://technorati.com/tag/CVM" rel="tag">CVM</a> <a 
href="http://technorati.com/tag/Java" rel="tag">Java</a> <a 
href="http://technorati.com/tag/J2ME" rel="tag">J2ME</a> <a 
href="http://technorati.com/tag/JavaME" rel="tag">JavaME</a> <a 
href="http://technorati.com/tag/JIT" rel="tag">JIT</a> <a 
href="http://technorati.com/tag/phoneME" rel="tag">phoneME</a> <a 
href="http://technorati.com/tag/phoneME+Advanced" rel="tag">phoneME 
Advanced</a> <a href="http://technorati.com/tag/phoneME+Feature" rel="tag">phoneME 
Feature</a> <a href="http://technorati.com/tag/embedded+systems" 
rel="tag">embedded systems</a>]]>
</content>
</entry>
<entry>
<title>Meeting up @ JavaOne 2007</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/mlam/archive/2007/04/meeting_up_java.html" />
<modified>2008-06-24T19:17:03Z</modified>
<issued>2007-04-28T20:39:53Z</issued>
<id>tag:weblogs.java.net,2007:/blog/mlam/356.7180</id>
<created>2007-04-28T20:39:53Z</created>
<summary type="text/plain">Does anyone want to meet up at JavaOne?</summary>
<author>
<name>mlam</name>

<email>Mark.Lam@Sun.COM</email>
</author>
<dc:subject>Community: Mobile &amp; Embedded</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/mlam/">
<![CDATA[<p>Previously, some of you have suggested meeting up at JavaOne since we'll all be there.  So, here's my availability this year:

<pre>
tues 5/8  @ 12:00pm - 1:10pm - lunch time
             7:10pm - 8:50pm
weds 5/9  @ 12:00pm - 1:10pm - lunch time
thur 5/10 @ 12:00pm - 1:10pm - lunch time
             6:50pm - 7:30pm
             9:00pm - 9:50pm 
</pre>

<p>If anyone is interested in meeting up to talk about VM internals or implementation issues (particularly CVM's), embedding the Java platform in devices, or just want to say hi and introduce yourselves, please leave a comment about when you would like to meet.  If you are new to the Java platform (or not) and were hoping that someone could tell you about how the guts of it work, then here's your chance to talk to a Java virtual machine engineer and get some free one on one time for your questions.

<p>As for places to meet, I would prefer to stay around the JavaOne pavilion or meeting halls, but if you have suggestions for alternate good places to meet, let me know.  


<p>I'll leave a comment in this blog by the middle of or late next week on a finalized place and time to meet (assuming there is interest).  I want to leave some of my free time for checking out all the cool stuff that will be shown on the pavilion floor (and I heard that there will be lots of cool stuff to check out this year).

<p>I'm also one of the panelists at <b>BOF-5734 "Architecture and Implementation of Multitasking Java ME Systems Panel"</b> at 8pm on Tuesday, May 8.  So, come, listen, and join in the discussion.

<p>And in case you don't already know, some of my esteemed colleagues have also set up a <a href="http://wiki.java.net/bin/view/Mobileandembedded/JavaOne2007">social gathering at the Thirsty Bear</a> where you'll get to meet some folks from Sun as well as other mobile and embedded community members.  Make sure to check it out if you can.  Unfortunately, due to a prior engagement, I will not be able to attend that gathering myself.

<p>Hope to see you at JavaOne.
]]>

</content>
</entry>
<entry>
<title>Java and More Embedded Considerations</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/mlam/archive/2007/04/java_and_more_e_1.html" />
<modified>2008-06-24T19:17:03Z</modified>
<issued>2007-04-22T11:46:00Z</issued>
<id>tag:weblogs.java.net,2007:/blog/mlam/356.7115</id>
<created>2007-04-22T11:46:00Z</created>
<summary type="text/plain">This is a follow-up to my previous article &quot;Why Choose Java?&quot;.  This article will try to provide some answers about various things that embedded device developers are likely to ask when choosing a software platform to develop on.</summary>
<author>
<name>mlam</name>

<email>Mark.Lam@Sun.COM</email>
</author>
<dc:subject>Community: Mobile &amp; Embedded</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/mlam/">
<![CDATA[<p>Previously, I talked about why an embedded systems developer would choose to develop on the Java platform.  If you have read that <a href="http://weblogs.java.net/blog/mlam/archive/2007/04/why_choose_java.html">article</a> and are intrigued by the benefits that the Java platform offers, then the next step is probably to ask some deeper, more probing questions like ...

<p><b>Do I really need the Java platform?</b><br>
Sure, the Java platform offers many benefits.  But is it needed for my specific device?

<p>Well, if you want the benefits of a runtime interpreted scripting language (i.e. isolation, upgradeability, etc.), then, as I have explained previously, your best bet is with the Java platform.

<p>You may not need the Java platform if your device has the following characteristics:

<ol>

<li> <p><b>static functionality</b>: the software functionality never needs to be upgraded in the field, not even for bug fixes.  Or, it is cheaper to replace the device than to replace the software (although this isn't very eco-responsible).  Or you can live with the cost of providing the service and infrastructure to completely re-flash the software in deployed devices.  Under these conditions, you do not need the Java platform's dynamic class loading/unloading feature.

<li> <p><b>simple software</b>: the software application is extremely simple.  When the number of lines of code is low enough, the complexity of the software may be manageable, and not be overwhelming.  Hence, the likelihood of your programmers being able to understand the entire system is higher, and the number of details for them to remember is lower.  Under this condition, you can live without the Java platform's isolation property and language features (e.g. protection from stray pointers, automatic garbage collection, structured locking, etc.), and still be able to get a reasonable amount of developer productivity.

<p>I will also talk more about "simple software" from the perspective of performance and footprint below.

<li> <p><b>small and restricted developer group</b>: if the group of developers is small, then they are easier to manage.  The likelihood that their code will accidentally step on each other's code is less.  If there is only one developer group for the software and you will never need other groups or third parties to develop software for your device, then there is no risk of their code trampling on your code.  Under these circumstances, you may not need the Java platform's isolation property and security features.

<li> <p><b>software can always be trusted</b>: If there will never be any software deployed on your device that is from an untrusted source (i.e. can perform attacks on or crash your device), then you may not need the Java platform's isolation property and security features.  Or alternatively, if you don't care if they attack and crash your device, then you may not need the Java platform.

</ol>

<p>Generally, if your situation doesn't fit into one of the above profiles, then it is likely that you will benefit from developing on the Java platform.
]]>
<![CDATA[<p><b>How does Java fragmentation affect me?</b><br>
<i>Fragmentation</i> is the term that most people use to describe what happened in the mobile phone space with JavaME CLDC implementations.  The effect of fragmentation is that even though the Java platform aims to be <i>Write Once Run Anywhere</i> (WORA), in the mobile phone space, this isn't quite true.  As a result, application developers end up having to test their applications on many different devices (instead of just on one reference platform), and sometimes (or maybe, often times), they also have to write customizations in the application for each of the variations that they discover in those devices.

<p>Fragmentation is bad for the application developers (also known as Independent Software Vendors, or ISVs) because the extra coding and testing adds development costs.  Fragmentation is also bad for the phone carriers because it reduces the amount of available content (i.e. applications and services) for their phones, and increases deployment costs because of the version tracking that is needed to deal with the variations between devices on their network.  Bottom line, fragmentation is bad for everyone's business.

<p>So, does this affect you as an embedded device developer?

<p>Well, first of all, let's understand why there is fragmentation.  The reason for this fragmentation (i.e. differences in behavior) is that the mobile phone industry wanted to leave room in the specifications for differentiation of products.  Unfortunately, the amount of room available led to the type of variations that introduce incompatibilities between devices.  Hence, the incompatibilities in the Java implementations are there because the industry "wanted" it that way.  I put "wanted" in quotes because, in retrospect, most people would agree that this was a bad decision.  That's why there are efforts (e.g. JSR 248, Mobile Service Architecture, commonly referred to as MSA) to create a more unified Java implementation.

<p>But let's look at the embedded device developer's perspective.  Fragmentation does not affect you because you get to make sure that the hardware is adequately compatible between your devices.  You would have to do this anyway if you were coding with a native language like C/C++ and wanted to re-use code between your devices.

<p><b>Portability in C/C++?</b><br>
If the device hardware is adequately compatible, then why can't you just implement your software in C/C++ while keeping portability in mind?  Wouldn't you get just as much portability from your code because you designed it well?

<p>The answer is yes.  It is possible to implement a good porting abstraction in C/C++ so that it will minimize your effort to port to a new device.  Sun's CVM (aka phoneME Advanced VM) itself, which I work on, is for the most part C code that implements a well thought out porting abstraction.  That's why it is easy to port to different devices.
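<p>To make "porting abstraction" a bit more concrete, here is a minimal sketch of the idea in C.  It is <b>not</b> CVM's actual porting interface (which isn't shown in this article); <i>port_ops</i>, <i>hosted_port</i>, and <i>portable_strdup</i> are hypothetical names.  The point is only that the portable code calls through a small platform table and never calls the OS directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical porting layer: everything platform-specific sits behind
   this table of function pointers. */
struct port_ops {
    const char *platform_name;
    void *(*mem_alloc)(size_t bytes);
    void  (*mem_free)(void *ptr);
};

/* One possible port, for a hosted platform that has a C library.
   A port to another device would supply its own allocator here. */
static const struct port_ops hosted_port = {
    "hosted-libc", malloc, free
};

/* Portable code: duplicates a string using only the porting layer.
   Returns NULL if allocation fails.  The caller releases the copy
   with port->mem_free. */
static char *portable_strdup(const struct port_ops *port, const char *s)
{
    size_t len;
    char *copy;

    assert(port != NULL && s != NULL);
    len = strlen(s) + 1;            /* include the terminating NUL */
    copy = port->mem_alloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}
```

<p>Porting to a new device then means writing a new <i>port_ops</i> table, while <i>portable_strdup</i> and everything else written against the table stays untouched.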

<p>Why go with the Java platform then?  As I have explained previously, portability of software isn't the only benefit of the Java platform.  You also get lots of other Java platform perks like:
<ol>
<li> <p><b>isolation</b> of critical code: don't need to worry about stray pointers interfering with your critical code anymore, or crashing your system.
<li> <p><b>improved productivity</b>: because you don't have to worry about stray pointers interfering with your critical code, or any code for that matter.  There are no more stray pointer problems.
<li> <p>language and API <b>feature set</b>: fine-grain security, class loading and unloading, etc.
</ol>

<p><b>Development, Testing, and Deployment Costs</b><br>
Let's say we disregard the above benefits of the Java platform, and just compare development, testing, and deployment costs.  How would the C/C++ development route compare?  

<p><u><i>Development</i></u><br>
In terms of development, as a device developer, you won't be able to get away from C/C++ (or even assembly) completely.  Ultimately, you need to do some low level work that is not conveniently done in the Java language.  At the least, you need to port the Java VM and library native code.

<p>But consider this: if you develop your software (application and system code) entirely in C/C++, you may be subject to debugging problems like memory corruptions, stack overflows, and other such bugs which are extremely difficult to isolate.  If you develop most of your software in Java and implement only a small critical set in C/C++, then debugging time decreases because the percentage of code that can contribute to the problem has been reduced significantly.  

<p>An old proven method for managing the