Introduction to phoneME Advanced VM Internals
If you are reading my blog, chances are that you already know about Sun open-sourcing its JavaME software stack in the phoneME project. If not, click here to read more about phoneME.
Some background info
The intent of open sourcing our code is basically to allow you to gain access to it, study it, and perhaps contribute changes of your own. While there is now a mechanism by which you can gain access to the code, gaining access is not the same as being able to understand it and contribute to it effectively. The latter requires some special knowledge that, until now, has for the most part been known only to Sun's employees, with a few exceptions (e.g. some hardcore VM engineers at customer companies who use our technology). The knowledge I refer to includes things like coding conventions, terminology/jargon, design philosophies, code organization, and design-tradeoff decisions, to name a few. While I trust that you are all intelligent folk who can figure all these things out in time, I'm guessing that such a task is not what attracted you to this project in the first place.
Hence, I intend to write a series of blog entries (starting with this one) on these topics to make our mutual lives easier in the long run, and to let everyone get to the fun stuff sooner instead of wasting time figuring out mundane things. I will also write about technical topics, like how certain sub-systems work, as I feel inspired to. Feel free to leave comments to request topics that you want me to talk about, or to ask for clarifications on things I have talked about or will talk about. I will take demand into consideration when choosing the order of topics to write about.
Before I start, you might ask how I came into this knowledge that I will share (and why you should trust that I actually know what I am talking about). So, a bit about me: I work on the VM team that created and maintains the VM at the crux of the phoneME Advanced project. I have been with the team since before CDC 1.0 was released, so I've been working with this code base for a long time. The way our team works, we don't have officially divided-up parts of the VM that we each work on. We basically go where we are needed and do the necessary work. Hence, each VM engineer's knowledge of the code is quite well rounded. However, each of us does have areas that we are more familiar with than others. I should also point out that I am a VM engineer (as opposed to a class library engineer), and therefore my expertise lies mostly in the VM and some very core system classes. While I am generally knowledgeable about the other classes in the standard libraries, I am not an expert on them. We have other engineers who focus on the libraries. Also, I will only be writing about the phoneME Advanced VM (as opposed to the phoneME Feature VM) because this is my area of expertise.
OK, so let's get into our first topic.
Earlier, I said that the esoteric knowledge I speak of is known to few, even amongst customers who use our technology. The reason is that those customers seldom have a need or an incentive to modify the region of code we call shared code. But now, with open source, this will no longer be the case. The greatest degree of innovation and feature enhancement occurs in shared code, which historically has mostly been the domain of Sun engineers only. Our customers, on the other hand, are usually more focused on the region of code we call the HPI, or Host Porting Interface. Look here for the CDC Porting Guide, which will tell you how to get to the details of the HPI. The actual HPI is documented in the source code if you know where to look. You will also find other interesting documents on that webpage.
A Design Philosophy
The VM is designed to be highly portable, and to maximize reuse of code between ports while maximizing performance as much as possible. This is a founding principle of the VM.
The code reuse is achieved by keeping as much common code as possible under the shared umbrella. Only hardware- or OS-dependent bindings are kept out of the shared code. These hardware/OS-dependent bindings are referred to as target or platform specific code. Click here to see a listing of the src folder of the phoneME Advanced project. In this specific example, there are arm, linux, and linux-arm folders. The linux folder contains code that is common to all linux ports. These are usually implementations of the HPI, which are called from the shared code in the shared directory.
The linux-arm folder contains additional customizations that either complete or override implementations in the linux directory. These customizations are, of course, only relevant to linux ARM ports.
The arm folder contains code that is specific to ARM ports. Usually, it appears in the form of utility functions (which could be assembly code in some cases) that are called upon from various ARM ports.
You will find that the code density of the shared folder (and its children) is the highest, followed by the OS folders (e.g. linux), followed lastly by a tie between the OS-CPU (e.g. linux-arm) and CPU (e.g. arm) folders. This also demonstrates how the VM is made more portable: the porting effort usually only requires implementing or modifying the target-specific files, which are a significantly smaller portion of the total code.
The decision to do the majority of our work and innovation in the shared code, as opposed to the target-specific code, also supports our decision to maximize performance for all ports: every port benefits from the bulk of the performance work, which is done in shared code. It is true that there are port-specific optimizations we may wish to apply. For those, we usually use the OS-CPU or CPU folders as appropriate.
Another by-product of this code organization is that the code tends to be more readable. You will find that shared code is not littered with #ifdefs customizing for various OS and CPU architectures; the #ifdefs you typically see there enable or disable VM features instead. You will also see that the OS, CPU, and OS-CPU files are more readable because they should not need #ifdefs for other architectures' customizations.
In the src folder, you will also see a portlibs folder. portlibs holds code that may be common to various ports but doesn't quite fit the OS or CPU categories. Some examples are commonality due to toolchains (e.g. gcc) or libraries / standards (e.g. posix, ansi). Various ports (in the OS and OS-CPU files) may choose to make use of the code in portlibs, or not, as appropriate.
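Putting the pieces together, the src layout for this example looks roughly like this (only the folders discussed above are shown; this is a simplified sketch, not the full listing):

```
src/
    shared/      common VM code, the bulk of the implementation
    linux/       HPI implementations common to all linux ports
    linux-arm/   completes or overrides linux/ for linux ARM ports
    arm/         utility code (some assembly) shared by ARM ports
    portlibs/    toolchain/library commonality (e.g. gcc, posix, ansi)
```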
One way to understand the organization of the code conceptually is as follows: in object-oriented terms, there is a parent class for the VM, expressed in the shared code. Each port of the VM to a specific OS and CPU target is a subclass that makes use of white-box reuse through inheritance. The OS code is the immediate subclass of the shared code, and the OS-CPU code is the subclass of the OS code. The code in the CPU folders is a set of utility libraries that the OS-CPU class may choose to make use of, and the portlibs code is another such library.
To summarize, the VM is incarnated as a singleton for a given platform (OS and CPU). It is instantiated from an OS-CPU VM class, which extends the OS VM class, which in turn extends the shared VM class. This OS-CPU VM class may also reuse code by delegating to libraries in the CPU and portlibs code.
Mind you, this is only a conceptual model of the code organization. You will find no reference to a parent or subclass VM in the code. The conceptual model is also not perfect in all respects: you may find some areas where the code relates in ways that do not fit this abstraction. Yes, there are exceptions. But this model is the general rule.
What does this mean to you?
When you plan to look for the code that achieves some functionality, think of where the functionality should belong (i.e. shared, OS, OS-CPU, or in the CPU or portlibs libraries). This will help you locate the code of interest faster.
You will have to think similarly about code that you want to contribute. This code organization is one of the key reasons this VM has achieved its ease of portability (an important feature for VMs in the mobile and embedded space). The code review process for contributions will certainly take this into consideration.
As I've mentioned earlier, you may find some exceptions that don't follow this convention. Please don't use that as an excuse to deviate further from it. Instead, existing exceptions should either be fixed to conform (if possible), or there are good technical reasons why those cases should not fit the desired mold. Such exceptions may be allowed if the reasons are compelling enough.
Hey, wait a minute!!!
All this stuff about portability sounds nice, but wouldn't all this layering have an impact on performance? The answer is NO! Well, "NO" for the cases we care about in this space. Is the VM as highly performant as it could be if it inlined everything and got rid of the layering? Maybe not, but that's a tradeoff we make for portability and maintainability, and in practice any performance difference is negligible. That said, the layering is not implemented naively: it uses various techniques (which I won't go into here) to prevent unnecessary performance loss where it matters. And, of course, the team has done measurements to ensure that this code is competitive ... very competitive ... in terms of performance.
And that leads me to another question: if we're not trying to squeeze out every bit of performance possible (because of the tradeoff we made in our design philosophy), how much performance is enough performance? That, I will answer in my next blog entry.
Have a nice day. :-)