
The Price of Speed

Posted by mlam on June 6, 2007 at 11:40 PM PDT

I apologize for not writing in a while. I've been trying to get some real work done (i.e. coding and designing solutions to improve the lives of our customers ... or at least, that's my goal). Anyway, two weeks ago, an interesting comment was added to a previous article I wrote on understanding JIT performance. The comment says ...

"Very informative blog!! Is there any information/projection on what % of apps in handheld market is based on and is expected to be written in Java/J2ME? I hear that, since handhelds are very memory constrained, JIT has challenges wrt to space and energy consumption. Is that too high to keep JIT technology in the darkages? How much more memory does JIT add over an interpreted version? Is there any study or white paper on why and how much such an overhead would be for JIT?"

Thanks for the comment (and the compliment), Cochin. I wanted to answer right away, but alas, I needed to gather some facts for it, and my day job also got in the way (needed to get some work done). At this moment, while I'm waiting for my computer to crunch some major compilations, I'll take a few minutes to give you my answer ...

Regarding a projection of what % of the handheld market is / will be based on Java ME (formerly known as J2ME) apps, unfortunately, I personally don't have such info. But check out this blog entry by my colleague Hinkmond Wong. You can make your own judgement from that, but my guess is that the Java ME handheld apps market will only increase. But hey, I'm biased.

But regarding JIT overhead and device memory constraints, here are some perspectives ...

The OS and Device Memory Budgets

In recent years, most devices I've seen (in my limited survey of the embedded world) have a RAM capacity in the range of 16M to 32M. More high-end devices have 64M. Media devices like set top boxes may have even more memory (128M, 256M, and up). Regardless of the capacity, the budget for the Java platform to run in is usually a small fraction of the total. The majority of the memory budget usually goes towards heavy memory consumers like the underlying OS, and media data (image, video, and sound).

For example, in one Linux device which comes with 64M of RAM, right after booting the OS (and its misc services) before the Java platform gets to run, the amount of memory reported by the OS to already be in use is around 42.7M. Another 4M or so is presumably reserved by the OS (/proc/meminfo says that there is only a total of 60M though we know it's a 64M machine). That leaves less than 18M for the Java platform and other native applications to work with.
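The budget arithmetic for this device can be written out as a small Python sketch. The figures are the ones quoted above; the names are only illustrative, and the 4M reserved figure is the approximate gap between the 64M of physical RAM and the 60M total that /proc/meminfo reports.

```python
# Rough memory budget for the 64M Linux device described above.
# All figures are in megabytes and come from the text; names are illustrative.
TOTAL_RAM_MB = 64.0           # physical RAM on the device
OS_RESERVED_MB = 4.0          # approx.: 64M physical minus the 60M total in /proc/meminfo
IN_USE_AFTER_BOOT_MB = 42.7   # reported in use before the Java platform starts


def remaining_budget_mb(total, reserved, in_use):
    """Memory left for the Java platform and other native applications."""
    return total - reserved - in_use


if __name__ == "__main__":
    left = remaining_budget_mb(TOTAL_RAM_MB, OS_RESERVED_MB, IN_USE_AFTER_BOOT_MB)
    print(f"Left for Java and native apps: about {left:.1f}M")  # about 17.3M
```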

The Interpreted VM

Next, we need to ask what the footprint of a basic interpreted Java ME platform is like. Here are some numbers:

  1. CLDC

    Numbers based on Sun's CLDC-HI aka phoneME Feature VM:

    code size: 300K

    static memory (data + bss): 9K

    Java heap: 500K - 8M (typical, but can scale up or down)

    System classes and ROMized classes are included in the size. Their native methods are included too. These do not include all the many Java ME JSRs that one can add nor MIDP. Those are extra, of course, and can add significantly to the footprint depending on which JSRs. My guess is that JSRs can add anywhere from 10s to 100s of Ks in footprint per JSR.

  2. CDC

    Numbers based on Sun's CDC-HI aka phoneME Advanced VM aka CVM (on ARM):

    code size: 1522K

    static memory (data + bss): 903K

    Java heap: 2M - 8M (typical, but can scale up or down)

    System classes (CDC/Foundation Profile 1.1) and ROMized classes are included in the size. Their native methods are included too. As with CLDC, these numbers do not include any of the many possible Java ME JSRs.

In the above memory measurements, native memory for threads, stacks, heap, and other OS constructs in memory is not included. These can vary depending on the type of Java application that you run. Usually these add less than 1M of extra RAM usage. Of course, your mileage may vary.

Note that in most deployments, an 8M heap is probably a very generous allowance. Depending on the device and the typical applications, the heap size limits may be set differently.

The JIT

Next, let's look at the additional footprint that a JIT adds. Here are the numbers:

  1. CLDC

    Numbers based on Sun's CLDC-HI aka phoneME Feature VM:

    JIT code size: 100K

    JIT working memory: small

    JIT code cache: 10% - 20% of Java heap (allocated from Java heap)

    I don't have a measured number for the working memory but it is small (I'm guessing less than 10K, maybe significantly less) because the CLDC-HI JIT does not allocate a lot of intermediate data structures for its compilation process.

    CLDC-HI's code cache is allocated from the Java heap. When the heap is under pressure (i.e. low on memory), the code cache can be shrunk to make room for object allocations.

  2. CDC

    Numbers based on Sun's CDC-HI aka phoneME Advanced VM aka CVM (on ARM):

    JIT code size: 246K

    JIT static memory: 111K

    JIT working memory: up to 1M

    JIT code cache: 512K (typical, but can scale up or down)

    Hmmmm ... these code size and static memory numbers surprised me a little actually. They are much larger than I expected. However, these are computed by building the VM with the JIT enabled, and subtracting the interpreter-only VM sizes from the new sizes. Let's do a size measurement based on the JIT code modules alone:

    JIT code size: 207K

    JIT static memory: 2K

    OK, now that's more like what I expected. So, what happened is that when I enabled the JIT, the rest of the VM code and data structures also increased in size to support the JIT. That accounts for the 39K in code size. The 109K of static memory probably comes from ROMized class data structures that now have to be in RAM to support the JIT. But I'll be fair and use the larger set of numbers for our computation below.

    The CDC-HI JIT working memory is allocated from the C malloc heap as needed. Typical JIT compilations will not use that much memory (typical usage is in the low 100Ks). The 1M is the default limit. This limit can be set lower or higher from the VM command line as needed.

    The JIT code cache (which is used to store the compiled code generated by the JIT) is allocated from the C heap as well. 512K is the default size. I've seen that fairly large applications can perform optimally within a code cache size of less than 350K. The default is set at 512K to allow for ease of use. This size can be set lower or higher from the VM command line as needed as well.
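The gap between the two sets of CDC size measurements above is just a subtraction, sketched here in Python. The dictionary keys and function name are made up for illustration; the numbers are the ones quoted in the text.

```python
# Sizes in K, taken from the CDC numbers above.
WHOLE_VM_DELTA = {"code": 246, "static": 111}  # JIT-enabled VM minus interpreter-only VM
JIT_MODULES_ONLY = {"code": 207, "static": 2}  # JIT code modules measured alone


def support_growth(delta, modules):
    """Growth in the rest of the VM that exists only to support the JIT."""
    return {key: delta[key] - modules[key] for key in delta}


if __name__ == "__main__":
    growth = support_growth(WHOLE_VM_DELTA, JIT_MODULES_ONLY)
    print(growth)  # {'code': 39, 'static': 109}
```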

The Overhead

Given the above numbers, let's look at the overhead a JIT adds over an interpreter-only VM:

  1. CLDC code: 100K / 300K = ~33%

    CLDC RAM (assume 8M heap): (20% x 8M) / (9K + 8M) = ~20%

    CLDC overall: (100K + 20% x 8M) / (300K + 9K + 8M) = ~20%

  2. CDC code: 246K / 1522K = ~16%

    CDC RAM (assume 8M heap): (1.5M + 111K) / (903K + 8M) = ~18%

    CDC overall: (246K + 1.5M + 111K) / (1522K + 903K + 8M) = ~18%

Note that I'm assuming worst-case JIT memory usage in both the CDC and CLDC cases. Typical memory usage is significantly smaller than this (especially in the CLDC case). I'm also ignoring other native memory usage (e.g. threads, native stack, etc).
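For anyone who wants to plug in their own numbers, here is a minimal Python sketch of the ratios above. It assumes 1M = 1024K (which is what the arithmetic in this article uses) and the worst-case figures: a 20% code cache for CLDC, and 1M of working memory plus a 512K code cache for CDC. The function names and parameters are illustrative only.

```python
K_PER_M = 1024  # the arithmetic above treats 1M as 1024K


def pct(jit_kb, baseline_kb):
    """JIT addition as a percentage of the interpreter-only baseline."""
    return 100.0 * jit_kb / baseline_kb


def cldc_overheads(heap_kb=8 * K_PER_M, cache_fraction=0.20):
    code, static = 300, 9                  # interpreter-only CLDC VM
    jit_code = 100
    jit_cache = cache_fraction * heap_kb   # carved out of the Java heap
    return {
        "code": pct(jit_code, code),
        "ram": pct(jit_cache, static + heap_kb),
        "overall": pct(jit_code + jit_cache, code + static + heap_kb),
    }


def cdc_overheads(heap_kb=8 * K_PER_M, working_kb=1024, cache_kb=512):
    code, static = 1522, 903               # interpreter-only CDC VM
    jit_code, jit_static = 246, 111
    jit_ram = working_kb + cache_kb + jit_static
    return {
        "code": pct(jit_code, code),
        "ram": pct(jit_ram, static + heap_kb),
        "overall": pct(jit_code + jit_ram, code + static + heap_kb),
    }


if __name__ == "__main__":
    for name, result in (("CLDC", cldc_overheads()), ("CDC", cdc_overheads())):
        print(name, {key: f"~{value:.0f}%" for key, value in result.items()})
```

Passing a smaller `heap_kb`, `working_kb`, or `cache_kb` shows how the percentages move when the limits are tightened.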

Now, let's look at what this overhead realistically looks like on a real device. Assuming the 64M Linux device I mentioned earlier:

  1. CLDC overall:

    (100K + 20% x 8M) / (300K + 9K + 8M + 42.7M) = 1738.4K / 52225.8K = ~3.3%

  2. CDC overall:

    (246K + 1.5M + 111K) / (1522K + 903K + 8M + 42.7M) = 1893K / 54341.8K = ~3.5%

Even with typical Java heap sizes that may be smaller, the memory overhead for the JIT will still be at most in the 3 - 4% range.
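The device-level figures for the 8M heap case can be reproduced with a short Python sketch. It simply adds the 42.7M already in use after boot to the denominator; the function name and its arguments are illustrative.

```python
K_PER_M = 1024
OS_IN_USE_KB = 42.7 * K_PER_M  # memory already in use after boot, in K


def device_overhead_pct(jit_kb, vm_kb, os_kb=OS_IN_USE_KB):
    """Worst-case JIT footprint as a percentage of (interpreted VM + OS usage)."""
    return 100.0 * jit_kb / (vm_kb + os_kb)


if __name__ == "__main__":
    heap_kb = 8 * K_PER_M
    # CLDC: 100K JIT code + 20% of the heap, over 300K code + 9K static + heap
    cldc = device_overhead_pct(100 + 0.20 * heap_kb, 300 + 9 + heap_kb)
    # CDC: 246K code + 1M working + 512K cache + 111K static,
    #      over 1522K code + 903K static + heap
    cdc = device_overhead_pct(246 + 1024 + 512 + 111, 1522 + 903 + heap_kb)
    print(f"CLDC: ~{cldc:.1f}%  CDC: ~{cdc:.1f}%")  # CLDC: ~3.3%  CDC: ~3.5%
```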

The Bottom Line

Depending on your device memory budget and other factors, a Java VM JIT may or may not represent a significant overhead. In the above highly inflated numbers (not in favor of the JIT), the overhead is less than 2M (3% to 4% of overall memory consumption) in both the CLDC and CDC cases. In both cases, typical memory usage is less than the above (and significantly so in the CLDC case). As you can see from the above example, the bulk of the overhead comes from JIT working memory and the code cache size. And if your memory budget is tight, these can be configured to be lower in order to reduce the overhead.

To venture a guess, a typical CLDC deployment will have a heap of 4M or less. Hence, its JIT overhead (which is a % of the heap size) will be reduced by ~819K from ~1738K to ~919K. If you are only using a 2M heap, that overhead reduces to about 510K only.
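Since the CLDC code cache scales with the heap, the worst-case overhead is a simple function of heap size. A minimal Python sketch, assuming the 20% upper bound for the code cache and 1M = 1024K (the function name is illustrative):

```python
def cldc_jit_overhead_kb(heap_mb, cache_fraction=0.20, jit_code_kb=100):
    """Worst-case CLDC JIT footprint in K: JIT code plus code cache share of the heap."""
    return jit_code_kb + cache_fraction * heap_mb * 1024


if __name__ == "__main__":
    for heap_mb in (8, 4, 2):
        print(f"{heap_mb}M heap: ~{cldc_jit_overhead_kb(heap_mb):.1f}K")
    # 8M heap: ~1738.4K
    # 4M heap: ~919.2K
    # 2M heap: ~509.6K
```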

A typical CDC deployment will have a 5M heap. But the working memory and code cache size are independent of that. The typical working memory size is in the low 100Ks. Let's say we limit it at 512K. That brings the JIT overhead down to ~1381K. As mentioned earlier, a good large sized application will already run optimally in less than 350K. Assuming we limit the code cache to 400K, the overhead now reduces to about 1269K.
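The CDC figures work the same way, except that the working memory and code cache limits are fixed sizes rather than a share of the heap. A Python sketch of the three scenarios above (the function and parameter names are illustrative, not actual VM options):

```python
def cdc_jit_overhead_kb(working_kb=1024, code_cache_kb=512,
                        jit_code_kb=246, jit_static_kb=111):
    """Worst-case CDC JIT footprint in K for the given memory limits."""
    return jit_code_kb + jit_static_kb + working_kb + code_cache_kb


if __name__ == "__main__":
    print(cdc_jit_overhead_kb())                                   # 1893 (defaults)
    print(cdc_jit_overhead_kb(working_kb=512))                     # 1381
    print(cdc_jit_overhead_kb(working_kb=512, code_cache_kb=400))  # 1269
```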

Before you go thinking that the CDC JIT is inferior, here are the reasons why it would use more memory than the CLDC JIT:

  1. The CDC JIT performs more advanced compilation: this uses more working memory but generates higher performance code.

  2. A typical CDC application is a lot more complex than a CLDC app: it contains more Java code. Therefore more code gets compiled, and a larger JIT code cache is needed.

  3. Typical CDC devices prefer a higher performance to footprint tradeoff: CDC devices usually have a larger memory budget than their CLDC counterparts. Hence, it makes sense to trade off a bit more memory to get better performance. For example, CDC inlining is more aggressive. This can be reduced (by setting JIT options at VM boot time) if needed.

Hence, the difference in overhead isn't for nothing. With CDC, you pay more because you get more and your apps need more.

Note also that the JIT working memory is only used during the JIT compilation process to store intermediate results and data structures. Once the compilation is complete, all this memory is freed up. Hence, once the system reaches steady state, JIT compilation will seldom occur, and this overhead will not be incurred. The above overhead estimates do not take this into account but instead consider only the worst case, when a compilation is actually in progress.
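To put a number on that, here is a rough Python sketch comparing the CDC worst case (a compilation in progress that uses the full default 1M working memory limit) with steady state, where the working memory has been freed. It assumes the default 512K code cache stays allocated throughout; the names are illustrative.

```python
JIT_CODE_KB = 246
JIT_STATIC_KB = 111
CODE_CACHE_KB = 512       # default code cache size
WORKING_LIMIT_KB = 1024   # default working memory limit, only used while compiling


def cdc_jit_footprint_kb(compiling):
    """CDC JIT footprint in K, with or without a compilation in progress."""
    resident = JIT_CODE_KB + JIT_STATIC_KB + CODE_CACHE_KB
    return resident + (WORKING_LIMIT_KB if compiling else 0)


if __name__ == "__main__":
    print("while compiling (worst case):", cdc_jit_footprint_kb(True))  # 1893
    print("steady state:", cdc_jit_footprint_kb(False))                 # 869
```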

Final Word

And so, considering the overall allocation of memory in your device, a JIT would actually use very little memory in a typical scenario. My conclusion, unless your memory budget is extremely tight, is that a JIT is worth the relatively small overhead. As usual, your mileage may vary. Draw your own conclusions if you don't like how I pick my numbers. But, I hope that this at least gives you a better feel for the relatively small size of the overhead.

And, Cochin, for the record, JIT technology isn't from the dark ages. The dark ages were back when people didn't realize that a JIT is so much more efficient, and they compiled all their code down to native code instead. Talk about a major footprint overhead. A resource-constrained environment like that of Java ME devices is precisely why we would want and need JIT technology. It is one of a few technologies available today that enables us to get high performance at a smaller footprint price.

Oops, I just realized that I didn't address the energy consumption part of your question yet. That is a whole separate discussion. Perhaps, another day.

BTW, thanks to my colleagues Brandon Passanisi and Oleg Pliss for providing me with some of the above numbers which I used for this exercise. Thanks, guys.

Till next time, have a nice day. =)


Tags: CVM, Java, J2ME, JavaME, JIT, phoneME, phoneME Advanced, embedded systems
