Mark Lam's Blog
J2ME Archives

JVMTI in Multi-tasking VMs (MVM)
Posted by mlam on March 13, 2008 at 01:21 AM

Hmmmm ... two blog questions in the same day. What's an over-worked and busy guy to do? Oh well, I guess the day job can wait just a little while I respond with a few words. :)

On March 12, 2008, in a blog comment, Steven North asks ...
Hi Steven. Thanks for your compliment and question. Unfortunately, I don't have an authoritative answer for you. But here are a few of my thoughts on this subject ...

CVM JIT Constant Pool Dumps
Posted by mlam on March 12, 2008 at 11:52 PM

Hello World! It's been a long time ... ummm ... like 6 months since I last wrote an entry. What can I say? That's the problem with having a day job, and so far, all the ideas for things that I want to write about involve some heavy duty writing that will take up a lot of time. So, I've been putting it off. Sorry.

However, this inquiry came in today on one of my previous blog entries. Now, this, I can answer without taking up a few days of writing time. So, here you are ...

The Question
I took some liberty with editing the comment for clarity. Jamsheed, I hope you don't mind.

VM Inspector 0.1: Some new stuff
Posted by mlam on September 21, 2007 at 07:37 PM

You may or may not have noticed on the phoneME Advanced Downloads page, that there is a phoneME Advanced MR2 binary for WinCE / Windows Mobile 5. That's one of the projects that my esteemed colleagues and I have been busy working on in the past few months. That is part of the reason I have not been able to post much. So, I apologize for that.

Well, after being spoiled with all the rich debugging features available in gdb on Linux, working with WinCE has been ... ummm ... challenging. In the initial phases of bringing up the VM, any number of things could have gone wrong. Without adequate debugging capability, it is hard to figure out what has gone wrong. One of the common things that can occur when the system isn't stable yet is a hang. When that happens, you really need to get hold of the thread stack traces in order to figure out where the hang is happening.

Maybe there's a way to do this on WinCE (and VS2005 i.e. Visual Studio 2005) and I'm just ignorant about it, but what I found was that when the threads are all hung, I can't just force a break on all threads whenever I like in VS2005. Well, I can ... but I don't seem to be getting any thread stack info. What I heard was that when a thread is blocked inside some WinCE API, then VS2005 won't be able to give us a stack trace. Since hangs tend to be cases where the threads are all blocked in locks of some kind (i.e. using WinCE APIs), I was out of luck trying to get stack trace information.

But all is not lost. There's the VM Inspector (and cvmsh) and the thread dump hack for CVM Java threads that I'd introduced to you previously. However, I can't just use them as is yet. So, I added a few enhancements ...
What's the Diff?
Posted by mlam on August 31, 2007 at 07:22 PM

In a comment for a previous blog entry, I was asked ...
This is how I get the answer for that ...

CVM: Why use the C or Java heap?
Posted by mlam on August 09, 2007 at 06:49 PM

A comment in a previous blog asks ...
For those of you who haven't been following my blogs before, Erik is asking a specific question regarding the memory layout of data structures in the CVM Java virtual machine (aka the phoneME Advanced VM). Erik, do you mean why specific things are in the C heap instead of the Java heap? Or do you mean why are specific things in the Java heap instead of the C heap? Well, let me answer both ...

CVM's VM Inspector
Posted by mlam on July 31, 2007 at 11:59 PM

In a previous blog entry, I showed you a map of CVM. If you are a VM engineer (or someone who is doing a port of the VM), and need to do some debugging, navigating all those data structures can be pretty daunting. How do the CVM engineers do it?

History
However, there is a problem with using these utility functions. That is, you will need to be careful how you use them. For example, if you use ...

Can't we get the VM utilities to do all the careful checks for us automatically so that we don't make a fool of ourselves by making the wrong call at the wrong time?

the VM Inspector

CDC and JVMTI
Posted by mlam on July 30, 2007 at 10:45 PM

In a comment in a previous blog entry, a friend asked a question about using the JVM Tools Interface (JVMTI) with JavaME CDC ...
Here's what I think ...

Async Thread Dumps on CVM
Posted by mlam on June 22, 2007 at 01:45 AM

There are times in the course of your development effort when your application just seems to hang forever. At those times, you wish you had some way of knowing where the hang is occurring. If you're running on JavaSE, chances are you'll have a lot of advanced tools that make life easy for you. But if you're running on an embedded device, suddenly, your options are now severely limited.

For the phoneME Advanced VM (CVM), there's a way to get help on this even when there is no advanced debugging support on your device. What I'll be showing here is an old trick to get an asynchronous dump of the stacks of all the threads that are currently alive in the VM.

First of all, you need to know that this is a hack i.e. it's not good and clean code. That's why I haven't already committed it to the source repository, and won't be doing so. The reason it is a hack will be explained later under "Why this is a Hack!!!". But even though it is a hack, it is useful when you need it. Many of my colleagues as well as customers have often asked me for the code patch for this hack to help with debugging the hangs in their applications. I figure you might find it helpful too. So, here it is ...

The Price of Speed
Posted by mlam on June 06, 2007 at 11:40 PM

I apologize for not writing in a while. I've been trying to get some real work done (i.e. coding and designing solutions to improve the lives of our customers ... or at least, that's my goal). Anyway, two weeks ago, an interesting comment was added to a previous article I wrote on understanding JIT performance. The comment says ...
Thanks for the comment (and the compliment), Cochin. I wanted to answer right away, but alas, I needed to gather some facts for it, and my day job also got in the way (needed to get some work done). At this moment, while I'm waiting for my computer to crunch some major compilations, I'll take a few minutes to give you my answer ...

Java and More Embedded Considerations
Posted by mlam on April 22, 2007 at 03:46 AM

Previously, I talked about why an embedded systems developer would choose to develop on the Java platform. If you have read that article and are intrigued by the benefits that the Java platform offers, then the next step is probably to ask some more deep probing questions like ...

Do I really need the Java platform?

Well, if you want the benefits of a runtime interpreted scripting language (i.e. isolation, upgradeability, etc.), then, as I have explained previously, your best bet is with the Java platform. You may not need the Java platform if your device has the following characteristics:
Generally, if your situation doesn't fit into one of the above profiles, then it is likely that you will benefit from developing on the Java platform.

Why choose Java?
Posted by mlam on April 17, 2007 at 03:09 AM

Or as the lawyers will probably correct me, the question in the title would more accurately be phrased as "Why Choose the Java platform?".

If you've been following my blog, you may notice that I haven't written in a while. This is because I've been really busy with my day job. One of the things that the job took me on recently was a road trip to meet with some customers. On this trip, I had the pleasure of having a conversation with an esteemed fellow embedded systems developer who was trying to understand the Java platform. He asked, "Why Java?". Such a simple question (simple only in its phrasing), but so pertinent. In his case, he was relatively new to Java, but had been doing embedded systems development for a long time. What he was asking was actually (1) why the Java platform is relevant for his projects (i.e. an embedded systems device), and (2) why he should choose it over available alternatives.

To be honest, I was a little caught off guard by this question. I had to pause for a while to think. On the spot, the only way I could think of to answer his question in the proper context was to tell him my personal story of how I came to choose the Java platform myself. This was the same event that motivated me to learn about the internals of a Java VM which got me hooked, and eventually led me to come work for Sun.

I thought that the answer to this question that he asked may also be pertinent and interesting to other embedded systems developers who may not yet know of the advantages of the Java platform. So, I decided to share it with you here.
JIT Performance: Defying Physics?
Posted by mlam on February 21, 2007 at 05:56 PM

A few days ago, I came across a few blog entries that referenced my previous article. They are: When is software faster than hardware? by Matthew Schmidt, and Can JIT'ed Code be Faster than Hardware Accelleration by Kirk Pepperdine. These blog entries had received some comments that I thought deserved a response. So below, I will try to address issues raised in some of those comments, as well as provide an intuitive understanding of why you would expect a JIT to outperform a JPU.

Resources: When is Software faster than Hardware?, Software Territory: Where Hardware can't go!

Let's start with ...

Physics Shmeesics

Software Territory: Where Hardware can't go!
Posted by mlam on February 16, 2007 at 02:27 AM

In response to my previous article, some folks have been asking about the JIT optimizations I listed, as well as a lot of other interesting questions. I'm not sure I can address all of the questions here. But on the topic of JIT optimizations, I can provide more insight on what they are as well as why hardware cannot implement them.

Before I get started, just to be clear, I'm not personally against hardware Java processors. I certainly think that they fit nicely in some domains. I am also not against any vendors who make Java processors out there. I applaud them for serving the needs of a market that a JIT may not fit. Also, just because a JIT fits doesn't mean that it is always the best solution to deploy. In a previous article, I've made the case that engineering decisions should always be made on a case by case basis. A "one size fits all" mentality can work, but may not always yield the best solution.

However, I do want to debunk the myth that a hardware processor can be faster than an optimizing JIT. But, of course, the JIT isn't free.
There is some cost to it in terms of CPU cycles and memory, though it is often a lot less than most people believe. I will address the JIT cost issue in a future article. For today, let's look at JIT optimizations. Since I work on the phoneME Advanced VM for CDC (aka CVM), along the way, I'll point out if these optimizations are available in CVM as it exists today (for those who are interested in CVM details).

Resources: When is Software faster than Hardware?

JIT Optimizations

The list again is:
When is Software faster than Hardware?
Posted by mlam on February 13, 2007 at 02:44 AM

I decided that I'll take a break from the bug fix track that I've been on, and have a little diversion to spice things up. I'll resume the bug fix (and JIT internals) discussion soon. For today, I would like to clarify a common misconception that hardware Java processors are faster than dynamic adaptive compilers / just-in-time compilers (i.e. JITs). I'll take you through some analysis to prove my point. The analysis will be based on examples from the phoneME Advanced VM for CDC (aka CVM), but this reasoning should apply to other VMs as well. Let's dive in ...

Hardware Acceleration

Another reason is that the hardware accelerators can provide special instructions that can do work that is traditionally done by software routines. Of course, these special instructions are specific to the types of algorithms (i.e. graphics, sound, DSP) that use them. Hence, if your application doesn't do much graphics, sound, and/or DSP, then such hardware accelerators won't be able to make your application run any faster.

Due to the known success of these hardware accelerators in their respective applications, we have come to generalize this success to think that all hardware acceleration will beat software solutions. In the case of Java processors in comparison to JITs, this generalization turns out to be untrue.

Java Processors

Disclaimer: I am not commenting on the quality of any specific hardware Java processor implementations in the market, but merely looking at this issue from a purely theoretical viewpoint.

OK, now let's look at a specific example ...

Bug Fix Part III: List of Changes
Posted by mlam on February 02, 2007 at 01:41 AM

The problem with having a real job is that I don't always have time to blog. =p And I am also looking forward to wrapping up this thread of discussion so that I can move on to some other topics as well.
Unfortunately, because of the sheer amount of information, it will take a few more entries. While I'm still very busy, this discussion will never end if I keep putting it off. So, here's a bit more for today. Let's dive in ...

Resources:

By the way, I call this entry Part III (as opposed to Part IV) because I didn't count my last entry on the JIT Architecture Map as being directly about this bug fix (though its content is relevant).

Status of the fix

Summary of changes
1. Fix for CR#5080490:
Added support for compiling with volatile 64bit field accesses (includes
potential volatile 64bit field accesses due to unresolved CP entries).
The IR is changed to mark the fieldref nodes with a VOLATILE flag when
appropriate. The JIT backend is changed to emit calls to helpers for the
cases of potential or known 64-bit volatile field accesses. The current
implementation uses CCM helper functions to achieve field atomicity in the
same way that the interpreter does it i.e. using a microlock.
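
To make the idea in this change summary concrete, here is a minimal sketch in Java. It is an illustration only and not the CVM code: the real fix lives in the JIT backend and the CCM helper functions, and every name below (Volatile64Sketch, MICROLOCK, putLongVolatile, getLongVolatile, needsAtomicHelper) is hypothetical. It assumes a 32-bit target where a 64-bit field is stored as two 32-bit words, and it uses an ordinary ReentrantLock as a stand-in for the VM's microlock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only: none of these names come from the CVM sources.
final class Volatile64Sketch {
    // Stand-in for the VM's microlock: one small lock guarding 64-bit volatile accesses.
    private static final ReentrantLock MICROLOCK = new ReentrantLock();

    // A 64-bit field modeled as two 32-bit words, as it might be laid out on a 32-bit target.
    private int hi;
    private int lo;

    // Helper that compiled code would call instead of emitting two plain 32-bit stores.
    void putLongVolatile(long value) {
        MICROLOCK.lock();
        try {
            hi = (int) (value >>> 32);
            lo = (int) value;
        } finally {
            MICROLOCK.unlock();
        }
    }

    // Helper that compiled code would call instead of emitting two plain 32-bit loads.
    long getLongVolatile() {
        MICROLOCK.lock();
        try {
            return ((long) hi << 32) | (lo & 0xFFFFFFFFL);
        } finally {
            MICROLOCK.unlock();
        }
    }

    // Route an access through the helper when it is a 64-bit access that is either
    // known to be volatile or only potentially volatile (constant pool entry unresolved).
    static boolean needsAtomicHelper(boolean is64Bit, boolean resolved, boolean isVolatile) {
        return is64Bit && (!resolved || isVolatile);
    }
}
```

Because both halves are read and written under the same lock, a reader can never observe the high word of one value paired with the low word of another, which is the atomicity property the change summary is after. The cost is a helper call per access, which is acceptable here because such accesses are relatively rare.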
The change summary above essentially describes the strategy that I used to fix this bug. I will discuss this in greater detail in my next entry. There were also a few other related items that I took care of while working on this fix:

CVM's JIT: Another BIG Picture
Posted by mlam on January 11, 2007 at 04:33 AM

In my last few entries, I've been talking about a bug I'm currently fixing. One of the reasons I haven't been updating daily is because said bug is taking a lot more of my time than expected. There is always more to the picture than meets the eye. Anyway, in my last entry, I briefly discussed the internals of the CVM (phoneME Advanced VM)'s JIT (officially, the dynamic adaptive compiler). Since the bug that I need to fix involved adding functionality to the JIT, we need to know in greater detail how the JIT works (or at least be able to know our way around the code). So today, we'll leave the bug fix alone for a while, and talk about the JIT's BIG Picture ...

Click on the map to get a popup window with a 1024 x 768 res bitmap of the map (if you want to view it in a separate window). Or click here to view the map in a PDF file. I highly recommend using the PDF if you plan to do a printout of the map.

And here's how to read the map ...

In a bit of a Volatile Fix!
Posted by mlam on January 05, 2007 at 01:38 AM

Sorry for not writing for a while. I've been really busy. In my last entry, I described a bug that needs to be fixed and all the background information behind it. Below, I will get into the details of how we'll fix the bug. Of course, we'll talk more about the internals of the phoneME Advanced VM (CVM) as we proceed with the fix.

Resources: start of CVM internals discussion, copy of the CVM map, PDF of map for printing, .h files in src/share/javavm/include, and .c files in src/share/javavm/runtime.

bug Update

Last time, I also said that volatile 64-bit field accesses are relatively rare.
But their presence in a method can still stop the method from being compiled by the JIT even if the method is hot. Hence, it would be nice to fix this so that volatile 64-bit field accesses won't prevent the JIT from doing its job.

Note that use of non-volatile 64-bit fields is more prevalent than their volatile counterparts. However, the code paths that exercise these accesses may be equally rare. If the code path has not been executed at least once before the JIT attempts to compile the method that contains it, then the field access opcode will remain in an unquickened state. This in turn means that the JIT won't know if the field is actually volatile or not, and must therefore treat it like a volatile field just to be safe and refuse to compile the method. Hence, the performance impact of this bug is exacerbated because it not only impacts code which uses 64-bit volatile fields but regular 64-bit fields which are unresolved as well.

So, let's get on with the fix ...

A Field Get Experience
Posted by mlam on December 14, 2006 at 06:41 PM

This article is a continuation of my series of discussions about the internals of the phoneME Advanced VM (commonly known as CVM) for JavaME CDC. Below, I'll work on fixing a bug in the VM. Along the way, I'll discuss more of CVM's internal mechanisms.

Note: for the purpose of this discussion, I will only focus on the coding aspects. The source code version control details will not be discussed here.

Resources: start of CVM internals discussion, copy of the CVM map, PDF of map for printing, .h files in src/share/javavm/include, and .c files in src/share/javavm/runtime.

What is the Bug?

The load, store, read, and write actions on volatile variables are atomic, even if the type of the variable is double or long.

Currently, the JIT handles this by refusing to compile methods that access these types of 64-bit variables.
Instead it defers execution of the method to the interpreter which already handles these 64-bit volatile field accesses in an atomic fashion. The bug is one of performance rather than of correctness.

Digging further ...

Beware of the Natives
Posted by mlam on December 07, 2006 at 02:11 PM

There are a lot of not so nice things about using native methods. Here are some:
But if these reasons aren't enough to deter you from using native methods, try this on for size:
This seems to go against most people's expectations, but it is the truth. First of all, there is the reason due to what goes on in the runtime stacks when you invoke native methods. I've talked about that in my previous articles (here, here, and here). There, I showed that using native methods incurs bootstrapping and extra frame pushing/popping overhead which results in degraded performance. But there are also many other reasons besides this.

To be fair, native code can be used to help improve performance when used in the right places. I will explain those cases as well. The key is to use native code "carefully".

Ok, let's go bust the "native" myth ...

JIT me up, Scotty
Posted by mlam on December 06, 2006 at 07:03 PM
The phoneME Advanced VM (CVM) comes with a dynamic adaptive compiler (JIT) which generates compiled code. Today's article will talk about how JIT compiled code uses the runtime execution stacks. I will also point out a few other tidbits about efficiency and performance as pertaining to the runtime stacks.

Resources: start of CVM data structure discussion, start of stacks discussion, copy of the map, PDF of map for printing, .h files in src/share/javavm/include, and .c files in src/share/javavm/runtime.

Let's get started ...

Bouncing on the Compiled Code Trampoline

native stack: ... executeJava -> goNative

Let's take a look at another case. Here's compiled (mCa) to compiled (mCb):

native stack: ... executeJava -> goNative

Note that even if the very first Java method executed is a compiled method, we still have an instance of executeJava (i.e. the interpreter loop) on the native stack. This is because like with native code, the interpreter is used to do the bootstrapping via transition methods (see previous discussion for details).

Once we're executing in compiled code, the VM will tend to stay in compiled code until there is a need to exit. Here's interpreted (mIa) to compiled (mCa) to compiled (mCb) to compiled (mCc):

native stack: ... executeJava -> goNative

Note that there is only one instance of executeJava and goNative on the native stack even though there are 1 interpreted and 3 compiled methods being executed. CVM first enters executeJava and interprets method mIa. When mIa calls mCa, the interpreter detects that mCa is compiled. So, it pushes a compiled frame (see CVMCompiledFrame in interpreter.h) onto the Java stack instead. Next, it calls goNative with the appropriate entry point into mCa. When mCa calls mCb, the compiled code will make use of some assembler code called the invoke glue (see examples in src/arm/javavm/runtime/jit/ccminvokers_cpu.S here). The invoke glue then branches to the entry point for mCb.
mCb does not get another frame on the native stack, but reuses the same scratch memory allocated by goNative that was previously used by mCa. Hence, the scratch memory on the native stack is not used to hold state information across method call boundaries. All method state that needs to persist across this boundary is kept in the compiled frame on the Java stack instead (see here for the structure of the compiled frame).

This is what we mean by "staying in compiled code". Though code execution goes from one method to another, there is no new native frame being pushed or popped, and code flow does not go through the interpreter loop. Overhead between method calls is kept to a minimum. Each compiled method looks like a routine that we branch to instead of calling.

The code flow execution pattern looks like bouncing on a trampoline. We bounce off the glue trampoline to jump into a compiled method. To call another method, we fall out of that method back into the glue, and bounce into another compiled method. The same is done for returns as well as invocations. Hence, the trampoline analogy. Sometimes, the glue code is referred to as trampoline code.

But what if we need to call an interpreted method from compiled code?

A Tale of Two Stacks
Posted by mlam on December 05, 2006 at 01:10 AM
So today, I'll continue talking about the runtime stacks in the phoneME Advanced VM / CVM from my last entry.

Resources: start of CVM data structure discussion, copy of the map, PDF of map for printing, .h files in src/share/javavm/include, and .c files in src/share/javavm/runtime.

Let's get started ...

Why use 2 Stacks?

In the early days of CVM, Linux was still uncommon in embedded systems. Most embedded OSes in those days (and still today) do not have processes, and do not provide access to a virtual memory manager (even if the hardware supports it). Hence, when you create a native thread, you will have to malloc a chunk of memory for the use of the native stack. Remember that CVM's threads are mapped directly onto native threads.

One issue with having to allocate memory for the stacks is knowing exactly how much memory to allocate. If you allocate too little, applications won't run. Allocate too much, and you'll have wasted resources. For embedded devices, wasting resources is highly undesirable. Note that without virtual memory, malloc'ing the memory here means a fixed sized allocation and the memory is committed upon allocation. In turn, that means no other thread can use the memory even if the current thread doesn't need it. This is why over-allocating stack sizes results in wastage for most embedded OSes.

For classical embedded software written in C, one could do some analysis to determine max usage of stack space and get an optimum allocation of memory. For a Java platform, this is typically not possible. One of the most desirable features of the Java platform is its ability to enhance the device by allowing new or updated versions of applications to be downloaded and executed without having to re-ROM/flash the core firmware in the device. This means that the stack requirement of the application(s) is not known / computable at the time the Java VM is built.

And the solution is ...
CVM Stacks and Code Execution
Posted by mlam on November 30, 2006 at 05:13 PM
Welcome to a continuation of the discussion on the internals of the phoneME Advanced VM (CVM). If you missed the beginning of this discussion, look here where I did a high level introduction of some of the major VM data structures using the CVM map. Today, I'll get into the execution of Java methods and how this appears in the runtime stacks. By stacks, I mean stacks as in the thread stacks that hold activation records for methods ... not stacks as in container APIs, or stacks as in API layers. This discussion will give you insight into the control flow of Java code execution in CVM (i.e. who has the CPU at any time). If you want to bring up a copy of the map for reference while you read on, click here (or here for a PDF to print). All the source files that will be referenced below can be found in the src/share/javavm/include (see here) or src/share/javavm/runtime (see here) folders of the phoneME Advanced project. You will find the .h files in the include folder, and .c files in the runtime folder.
The Execution Engines

There are many ways to measure the hotness of a method. The CLDC VM (phoneME Feature) uses a timer based sampling mechanism. As of this writing, CVM uses invocation counts that are sampled during interpretation. Upon reaching some threshold of hotness, the method gets compiled. The issue now is how to go from interpreting the bytecodes to executing the compiled method.

To understand this (and all the other nuances of Java code execution), we need to take a look at what happens in the runtime stacks when Java code is executed ...

Multi-tasking the Java platform: What's the Big Deal?
Posted by mlam on November 28, 2006 at 11:35 PM

Today, I started reading this thread on java.net forums. It made me wonder if people all mean the same thing when they talk about a multi-tasking Java platform. So, I decided to postpone my discussion of CVM internals for a day, and go over the topic of multi-tasking (which is also relevant to phoneME and CVM).

Disclaimer: Before getting into it, I should clarify that my opinions are my own and not necessarily that of Sun, my employer, nor my colleagues at Sun.

So, here goes ...

What is Multi-Tasking anyway?

Therefore, when people want multi-tasking, I would think that what they are actually asking for is a Java process, and the Java platform takes on the role of an OS relative to the Java applications.

Let's take a look at multi-tasking features in OSes and see how those should be manifested in the Java platform. We should also take a look at why people would want these features so that we don't end up over-engineering a solution. So first, OSes ...

The BIG Picture: a Map of CVM
Posted by mlam on November 27, 2006 at 01:17 PM

Personally, when I dive into a new system, one of the first things that I try to figure out is how everything fits together.
If you are a visual thinker like me, one of the best ways to do that is to draw a diagram of all the things that you think are important and see how they relate to one another. In the case of embedded systems, in my experience, it is also important to know what goes where in memory, and to get a feel of how system resources are being used. Hence, I prefer to map out the data structures. Here is my map of CVM ...

the WORLD according to CVM

And here's how to read the map ...

C further with CVM
Posted by mlam on November 25, 2006 at 01:19 PM

I've been talking a lot about esoteric knowledge about the phoneME Advanced VM (CVM), and thought that it is about time to feed you some really technical data. So, I spent most of yesterday rendering a Map of CVM to show you the lay of the land, but it is taking a lot longer than I thought. As a result, no blog entry yesterday. :-( Hopefully, I will get it done today, and be able to do a write up for Monday. Look for it. It'll be like CVM in a nutshell.

By the way, I'm using InkScape to do my rendering of the CVM map (a colleague pointed me to it). I don't know if it's the best, but it certainly does the job. So, I thought I'd give it a mention here in case others are looking for a tool like this too. I'm using InkScape because I wanted to render the CVM map in SVG, so that you'll get to scale it to match whatever resolution you need without sacrificing detail. But alas, I'm finding that my browsers aren't quite able to display the SVG format yet (or maybe I'm not exporting to the right format). If anyone has hints on what SVG format is supported by popular web browsers, please let me know. Otherwise, I will go with a bitmap for ease of viewing and a PDF for finer inspection.

Incidentally, I also want to thank the 2 people who have left comments for me so far. It's nice to know that I'm not just talking to a wall. So, on to today's topic(s) ...

Why is CVM written in C?
On a second note, we've found that some C++ compilers also generate very inefficient code in terms of footprint (2 to 3 times more footprint). This certainly is not good for any embedded software. Now, before you jump to conclusions, I don't think that this inefficiency necessarily had to do with the C++ language itself. Personally, I'm a fan of C++ as well, and I know how it can let you write really elegant and efficient code (assuming the compiler cooperates), as well as really bad bloated code. My guess at the time was that people in general didn't care enough about C++ to invest in its toolchain (in comparison with C) ... not to say that there aren't very good C++ tool chains out there. As a result, C++ is given a bad name ... which I think is unfortunate.

Mind you, the CVM decision was made some 7 years ago. The inefficient C++ code generation was observed about 3 to 4 years ago. Perhaps, these issues of availability and efficiency have been fixed since.

Some more of my thoughts on portability and performance below ...

When does JavaSE becomes a better choice than JavaME CDC?
Posted by mlam on November 23, 2006 at 02:17 PM

A comment from my last entry on performance, asked, "I was thinking about the fact that devices [increasingly] get more power and more RAM. I thought when will JavaSE be a better choice instead of JavaME/CDC1.1? How much CPU, RAM, cache....do you need?"

Before I answer this, I must first make the disclaimer that my opinions are my own as an engineer, and not necessarily that of my employer, Sun, or even other engineers at Sun. With that said, now let's get into the question ...

JavaSE or JavaME?

Device Capability

Device capability isn't only about the choice of processors. Take PowerPC for example. It has embedded variants as well as the more well known desktop and server versions. The processor core is mostly the same (i.e. will execute the same code), but other capabilities are different.
The most obvious would be differences in clock speed, and cache. And then, there are other hardware differences (e.g. board level) in capability: cache architecture, L2 cache, main RAM capacity, RAM speed, bus speed, memory and I/O bus architecture, I/O processors, MMU, DMA, TLB size, secondary storage (HardDisks, FlashDisks), etc. ... or, the lack thereof. The more of these features your device has, the more likely JavaSE is a better fit, and vice versa.

Just looking at memory capacity alone, I think JavaSE typically operates with a footprint in the order of 10s to 100s of MBs, or even GBs. CVM operates in the order of 1s to the low 10s of MBs. Of course, a lot of this depends on what your application is doing (for both JavaSE and CVM). But those numbers should give you an idea. So, if your device only has 16MB RAM, CVM will probably be your best bet. If you have 32MB of RAM, it gets a little gray, and depends on what you are trying to achieve. CVM is still usually your best bet for most embedded applications. Low 100s of MBs, it is still gray but tending more towards JavaSE now. If the device has 1 GB or more, I would be fairly confident that JavaSE is better suited here.

As for cache, it's a lot harder to tell. 0 to 10s of KBs, go with CVM. 10s to 100s of KBs, it's a gray area. MBs of cache, you can definitely run JavaSE now, but this doesn't mean that CVM isn't still the better choice in some cases.

For clock speeds, 10s to low 100s of MHz, CVM is your better choice. Low 100s of MHz to 1GHz, it's gray. More than 1GHz, JavaSE is the likely choice. But as Sun has shown not too long ago, CPU performance isn't all about clock speeds (see CoolThreads). So, take the above numbers with a grain of salt. In fact, all the ranges I've given above are just educated guesses based on my experience in this field. They can be used as hints, but a real world case can be different. That's why there's no hard and fast rule as to which fits better in any given case.
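
As a rough illustration, the RAM and clock speed ranges above can be written down as a tiny rule-of-thumb helper. This is only a sketch: the class and method names (PlatformFitSketch, byRamMegabytes, byClockMegahertz) are made up for this example, and the exact cutoffs are assumptions chosen to fit the ranges in the text (in particular, "low 100s of MHz" is arbitrarily taken as 200 MHz). Since the ranges themselves are educated guesses, treat the output as a hint and nothing more.

```java
// Hypothetical rule-of-thumb helper; names and exact cutoffs are assumptions.
final class PlatformFitSketch {
    enum Fit { CVM, GRAY_AREA, JAVASE }

    // RAM: about 16MB or less suggests CVM, 1GB or more suggests JavaSE,
    // and everything in between is a gray area that depends on the application.
    static Fit byRamMegabytes(long ramMb) {
        if (ramMb <= 16) {
            return Fit.CVM;
        }
        if (ramMb < 1024) {
            return Fit.GRAY_AREA;
        }
        return Fit.JAVASE;
    }

    // Clock speed: 10s to low 100s of MHz suggests CVM (cutoff assumed at 200 MHz),
    // up to 1GHz is gray, and more than 1GHz suggests JavaSE.
    static Fit byClockMegahertz(long mhz) {
        if (mhz < 200) {
            return Fit.CVM;
        }
        if (mhz <= 1000) {
            return Fit.GRAY_AREA;
        }
        return Fit.JAVASE;
    }

    public static void main(String[] args) {
        System.out.println("32MB RAM: " + byRamMegabytes(32));
        System.out.println("1.5GHz clock: " + byClockMegahertz(1500));
    }
}
```

Note how wide the gray area is in both methods. That is deliberate: most real devices land there, and the decision then comes down to what the application is trying to achieve.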
Now that we've talked about the obvious stuff, let's get into all the "gotchas" that people may not think about ...

Performance: Too much of a good thing?
Posted by mlam on November 22, 2006 at 05:52 PM | Permalink | Comments (7)

This article continues with esoteric knowledge about the phoneME Advanced VM and the JavaME space that developers will need. If you've looked at the phoneME Advanced VM source code, you'll see that a lot of the names of functions and data structures are prefixed with CVM. CVM is the informal name of Sun's CDC VM, and prefixing labels (especially for global functions and data structures) with CVM is a standard coding convention in this VM code base. This is probably common knowledge to most people who already work with Sun's CDC technology, but I thought I'd mention it anyway just in case. Plus, now I can simply refer to CVM directly instead of having to say phoneME Advanced VM. So, on to this entry's topic ...

Performance
Having said that, I want you to know that I am not saying this because CVM's performance is anything to be embarrassed about. As far as we know, CVM is one of the fastest VMs in this space, if not the fastest. To give you an idea of CVM's performance, a few years back, we benchmarked it against the JavaSE 1.3 client VM on a subset of SPEC JVM98. We had to use a subset because SPEC JVM98 uses deprecated APIs which have been removed from CDC. Hence, we had to do an internal "port" of the benchmark for this comparison. The comparison was done on a PowerPC PowerMac and a Solaris SPARC machine. CVM came out at around 80-90% of JavaSE's performance with only 10% of its static footprint. You should know that this is old data. JavaSE has improved significantly since, and so has CVM.

Note: I'm only sharing this comparison to give you an idea of the level of performance that can be achieved in JavaME. I'm not saying anything about which VM is better. That would be like comparing apples and oranges.
More on that later.

So, when we talk about performance, one of the VM's components that people think of first is the dynamic adaptive compiler, also commonly known as the JIT. Below, I will talk about some performance issues around compilation. I will also touch on other areas / topics that are not JIT related but are important as well.

Introduction to phoneME Advanced VM Internals
Posted by mlam on November 21, 2006 at 01:17 PM | Permalink | Comments (1)

If you are reading my blog, chances are that you already know about Sun open-sourcing its JavaME software stack in the phoneME project. If not, click here to read more about phoneME.

Some background info
Hence, I intend to write a series of blog entries (starting with this one) on these topics to make our mutual lives easier in the long run, as well as allowing everyone to get to the fun stuff sooner instead of having to waste time figuring out mundane things. I will also be writing about technical topics like how certain sub-systems work as I feel inspired to. Feel free to leave comments to request topics that you want me to talk about, or to ask for clarifications on things I will talk or have talked about. I will take demand into consideration when I choose the order of topics to write about.

Before I start, you might ask how I came into this knowledge that I will share (and why you should trust that I actually know what I am talking about). So, a bit about me: I work on the VM team that created and maintains the VM at the crux of the phoneME Advanced project. I have been with the team since before CDC 1.0 was released. Hence, I've been working with this code base for a long time. The way our team works, we don't have officially divided up parts of the VM that we work on. We basically go where we are needed and do the necessary work. Hence, each VM engineer's knowledge of the code is quite well rounded. However, each of us does have areas that we are more familiar with than others.
I should also point out that I am a VM engineer (as opposed to a class library engineer), and therefore, my expertise lies mostly in the VM and some very core system classes. While I am generally knowledgeable about the other classes in the standard libraries, I am not an expert on them. We have other engineers who focus on the libraries. Also, I will only be writing about the phoneME Advanced VM (as opposed to the phoneME Feature VM) because this is my area of expertise. OK, so let's get into our first topic.

The meat
Earlier, I said that the esoteric knowledge that I speak of is known to few, even amongst customers who use our technology. The reason for this is that those customers seldom have a need or incentive to modify the region of code we call shared code. But now with OSS, this will no longer be the case. The greatest degree of innovation and feature enhancement occurs in shared code, which historically has mostly been the domain of Sun engineers only. Our customers, on the other hand, are usually more focused on the region of code we call the HPI or Host Porting Interface. Look here for the CDC Porting Guide which will tell you how to get to the details of the HPI. The actual HPI is documented in the source code if you know where to look. You will also find other interesting documents on that webpage.

A Design Philosophy
The code reuse is achieved by keeping as much common code as possible within the shared umbrella. Only hardware or OS dependent bindings are kept out of the shared code. These hardware / OS dependent bindings are referred to as target or platform specific code. Click here to see a listing of the src folder of the phoneME Advanced project. In this specific example, there are arm, linux, and linux-arm folders. The linux folder contains code that is common to all linux ports. These are usually implementations of the HPI, which is called from the shared code in the shared directory.
The linux-arm folder contains additional customizations that either complete or override implementations in the linux directory. These customizations are, of course, only relevant to linux ARM ports. The arm folder contains code that is specific to ARM ports. Usually, it appears in the form of utility functions (which could be assembly code in some cases) that are called from various ARM ports.

You will find that the code density of the shared folder (and its children) is the highest, followed by the OS folders (e.g. linux), and followed lastly by a tie between the OS-CPU (e.g. linux-arm) and CPU (e.g. arm) folders. This fact also demonstrates how the VM is made more portable. The porting effort usually only requires implementation / modification of the target specific files (which are a significantly smaller portion of the total code).

The decision to do the majority of our work and innovation in the shared code as opposed to the target specific code also supports our decision to maximize performance for all ports. This way, every port can benefit from the bulk of the performance work that is done (in shared code). It is true that there are optimizations that are port specific that we may wish to apply. For these, we usually apply them in the OS-CPU or CPU folders as appropriate.

Another by-product of this code organization is that the code tends to be more readable. You will find that shared code is not littered with #ifdefs for customizations for various OS and CPU architectures. The #ifdefs you will typically see there are for enabling/disabling VM features instead. You will also see that the OS, CPU, and OS-CPU files are more readable because they will/should not have #ifdefs due to customizations for other architectures.

In the src folder, you will see a portlibs folder. portlibs is used to hold code which may be common to various ports but doesn't quite fit in the OS or CPU categories. Some examples of these are commonalities due to toolchains (e.g.
gcc) or libraries / standards (e.g. posix, ansi). Various ports (in the OS and OS-CPU files) may choose to make use of the code in portlibs, or not, as appropriate.

One way to conceptually understand the organization of the code is as follows: in terms of object-oriented terminology, there is a parent class for the VM. The parent class is expressed in the shared code. Each port of the VM to a specific OS and CPU target is a subclass that makes use of white-box reuse through inheritance. The OS code is the immediate subclass of the shared code. The OS-CPU code is the subclass of the OS code. The code in the CPU folders is a set of utility libraries that the OS-CPU class may choose to make use of. The portlibs code is another library that the OS-CPU class may choose to make use of.

To summarize, the VM is incarnated as a singleton for a given platform (OS and CPU). However, it is instantiated from an OS-CPU VM class which extends the OS VM class, which in turn extends the shared VM class. And this OS-CPU VM class may also reuse code by delegation to libraries in the CPU and portlibs code.

Mind you, this is only a conceptual model of the code organization. You will find no reference to a parent and subclass VM in the code. And the conceptual model is also not perfect in all aspects. You may find some areas where the code relates in ways that do not fit this abstraction. Yes, there are exceptions. But this model is the general rule.

What does this mean to you?
You will have to think similarly for code that you want to contribute. This code organization is one of the key reasons that this VM has achieved its great ease of portability (which is an important feature for VMs in the mobile and embedded space). The code review process for code contributions will certainly take this into consideration. As I've mentioned earlier, you may find some exceptions that don't follow this convention. Please don't use that as an excuse to further deviate from the convention.
Instead, either the existing exceptions should be fixed to conform (if possible), or there should be good technical reasons why those cases will/should not fit the desired mold. Those exceptions may be allowed if the reasons are compelling enough.

Hey, wait a minute!!!
And that leads me to another question: if we're not trying to squeeze out every bit of performance possible (because of the tradeoff we made in our design philosophy), how much performance is enough performance? That, I will answer in my next blog entry. Have a nice day. :-) | ||
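The conceptual model from the design philosophy discussion (a shared VM class, an OS subclass, an OS-CPU subclass, and CPU / portlibs code reused by delegation) can be sketched in Java roughly as follows. This is only an illustration of the analogy: every class and method name here is hypothetical, the real VM is not organized as Java classes, and, as noted in the text, you will find no parent or subclass VM in the actual code.

```java
// Hypothetical illustration of the shared / OS / OS-CPU layering analogy.
final class PortModelSketch {

    // Stand-in for a CPU folder (e.g. arm): utility code reused by delegation.
    static final class ArmUtils {
        static String flushInstructionCache() { return "arm:flushInstructionCache"; }
    }

    // Stand-in for portlibs (e.g. posix): library code a port may choose to use.
    static final class PosixLib {
        static String createThread() { return "posix:createThread"; }
    }

    // Stand-in for shared code: holds the common logic and declares the
    // porting hooks (the HPI in this analogy) that each target must supply.
    static abstract class SharedVm {
        protected abstract String hpiCreateThread();
        protected abstract String hpiFlushInstructionCache();

        // Common logic lives here once and is inherited by every port.
        final String boot() {
            return hpiCreateThread() + ", " + hpiFlushInstructionCache();
        }
    }

    // Stand-in for an OS folder (e.g. linux): implements what is common
    // to all ports on that OS, here by delegating to the portlibs stand-in.
    static abstract class LinuxVm extends SharedVm {
        @Override
        protected String hpiCreateThread() { return PosixLib.createThread(); }
    }

    // Stand-in for an OS-CPU folder (e.g. linux-arm): completes the port
    // and is the class the singleton VM is instantiated from.
    static final class LinuxArmVm extends LinuxVm {
        private static final LinuxArmVm INSTANCE = new LinuxArmVm();
        private LinuxArmVm() {}
        static LinuxArmVm getInstance() { return INSTANCE; }

        @Override
        protected String hpiFlushInstructionCache() {
            return ArmUtils.flushInstructionCache();
        }
    }

    public static void main(String[] args) {
        System.out.println(LinuxArmVm.getInstance().boot());
    }
}
```

In this sketch, adding another port would mean writing only new OS and OS-CPU stand-ins while `SharedVm.boot()` stays untouched, which mirrors the claim that porting usually only touches the target specific files.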
|
|