Multi-tasking the Java platform: What's the Big Deal?
Today, I started reading this thread on java.net forums. It made me wonder if people all mean the same thing when they talk about a multi-tasking Java platform. So, I decided to postpone my discussion of CVM internals for a day, and go over the topic of multi-tasking (which is also relevant to phoneME and CVM).
Disclaimer: Before getting into it, I should clarify that my opinions are my own and not necessarily those of Sun, my employer, or of my colleagues at Sun.
So, here goes ...
What is Multi-Tasking anyway?
Strictly speaking, multi-tasking means to be able to do multiple things at the same time. For the Java platform, this would mean to be able to have concurrent execution of code. The Java platform already supports multi-threading. So, what's the issue? Well, people want to be able to run multiple applications concurrently as well. How is that different than just calling the main() method of the apps from different threads? The difference is that apps like to think that they own the world and all its resources. Simply running them in different threads may have side-effects where one application can interfere with the operation of another. So, a multi-application Java platform needs to be able to isolate these applications from one another. Now, where have I heard of this kind of feature / behavior before? Why, it is commonly implemented today as processes in operating systems.
Therefore, when people want multi-tasking, I would think that what they are actually asking for is a Java process, and the Java platform takes on the role of an OS relative to the Java applications. Let's take a look at multi-tasking features in OSes and see how those should be manifested in the Java platform. We should also take a look at why people would want these features so that we don't end up over-engineering a solution. So first, OSes ...
The Operating System perspective
If we have processes for the Java platform, they should probably have the following features that OS processes have:
|Feature||Implications for the Java platform|
|Concurrency||Multiple applications can run on the same device at the same time. The OS may also ensure fair allocation of resources between the applications.|
|Isolation||Each application has its own VM environment. An application must not be able to interfere with other concurrently running apps, and vice versa. Bugs and misbehaviors are quarantined in the bad app's environment.|
|Reliable Termination||When an application misbehaves, the application manager must be able to terminate the bad application and reclaim its resources reliably.|
|Efficiency||Reduce redundancy in resource usage and maximize sharing between applications. Also, reduce redundancy in initialization time of the Java VM for each application.|
The JSR 121 Application Isolation API defines APIs to manage these features, with an isolate being the analog of the OS process. However, having an API is not the same as having an implementation.
What People Want
Here are some scenarios of why people want multitasking. These examples probably aren't exhaustive by any measure, but I think that they are realistic, and will illustrate the issues at hand.
Case 1: You're working with a device that runs a primitive embedded OS that doesn't support processes, and you can't afford to upgrade the OS because of all the legacy native code investment that you've already made. Upgrading would mean porting all the legacy code to the new OS which can be costly. Besides, you've already paid for the license to the source code of that OS. But, you do have to make that device run multiple applications concurrently ... applications written by third party software vendors whom you have no control over. There may even be the need to run some random app downloaded from the net that the user wants. Unfortunately, if your device crashes or starts running miserably slow, your user will blame you instead of the app that his teen-aged son downloaded from a not-so-trustworthy website.
So, you figure ... the Java platform is a virtual execution environment anyway. Why can't you just have it keep the applications from doing bad things to one another or to the system? Problem solved!
Case 2: You're writing a browser type application, and have a need to run Java applications as plugins. When you don't need the plugin anymore, you want it to just go away and not consume resources. You want the Java VM to go away too. But right now, your native OS APIs don't give you a convenient way to do this. Since the Java VM is loaded as a library attached to your browser, if you use the OS features to blow away the VM, you'll end up blowing away your browser too, or leave it frozen in a hang. That would make you look bad.
So, you figure ... the Java VM was the one who loaded the app, right? Why can't it just blow the app away and clean itself up? Problem solved!
Case 3: You need to be able to run multiple Java applications at the same time on your device, but the OS isn't able to make the Java applications share resources ... or at least not enough.
You figure ... don't all Java applications run with the same underlying class libraries? Why can't the Java VM just make them share the same copy in memory? Problem solved!
NOT So Fast!
Consider this ...
For Case 1, why didn't the implementer consider implementing process isolation in his own application, or enhancing his embedded OS to do it? After all, the Java VM is also a piece of software that is launched from his app. If he was able to enhance his OS with an isolation capability, then each launch of the Java VM can be in its own process, and his problem will be solved.
For Case 2, why didn't the implementer consider implementing, in the browser, a resource tracking system that traps all resource usage by the Java VM and the application that it runs? If the browser can track all these resources, then it could kill the VM and application threads, and reclaim all the resources itself.
For Case 3: why didn't the implementer try to enhance the OS (or ask the OS vendor) to do a better job at sharing common resources instead?
The answer to all 3 is simply this: it is difficult to do ... prohibitively difficult. Moving the problem over to the Java platform doesn't make it easier. It just gets it off the implementer's plate. Now to be fair, there are certain bits of information that the Java VM has that may enable it to have a higher chance of achieving the goal. But in essence, the problem doesn't get significantly easier.
I just wanted you to think through that so that you'll have an appreciation for the complexity of the problem. When the problem is yours to solve, your appreciation for it changes. Multi-tasking is not a trivial feature that Sun or any Java platform vendors can throw in as just another bullet point in their product roadmap. Implementing multi-tasking is a major undertaking (investment and schedule-wise).
Having said that, Sun does offer 2 multi-tasking VM solutions for JavaME CLDC and CDC. Let's get into how they work ...
What's under the hood?
Each of the above 3 cases illustrates a need for 3 different features of multi-tasking: isolation, reliable termination, and efficiency. In real life, several or all of these features may be needed together.
I didn't bother talking about the feature of concurrency because it isn't the difficult part of the solution. The Java platform already supports concurrency in the form of threads, and it's not too much of a stretch to build concurrency of applications on top of thread concurrency. That is, unless you need some special scheduling properties for applications. I will leave that alone for this discussion.
CLDC does not support a native interface for native methods. In practice, most CLDC apps are MIDlets which are 100% pure Java code. Hence, the CLDC MVM only needs to handle isolation of Java state and Java bugs.
By Java state, I mean class static fields, and singleton instances used in the system libraries. The system libraries are used by all the applications. Without isolation, each application will be able to see the side-effects of the operations of another application on the system libraries.
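To make the shared-state problem concrete, here is a minimal sketch in plain Java. The names `SharedLibrary`, `defaultGreeting` and `SharedStateDemo` are hypothetical stand-ins for a system library class with a static field; they are not taken from any real library. Two "applications" run as ordinary threads in one VM, and the second one sees the change made by the first.

```java
// Hypothetical stand-in for a system library class that keeps static state.
final class SharedLibrary {
    static volatile String defaultGreeting = "hello";
}

final class SharedStateDemo {
    // Runs two "applications" as plain threads in the same VM and returns
    // what the second one observes in the shared static field.
    static String observedBySecondApp() {
        SharedLibrary.defaultGreeting = "hello"; // reset so the demo is repeatable
        final String[] seen = new String[1];
        Thread appA = new Thread(() -> SharedLibrary.defaultGreeting = "changed by app A");
        Thread appB = new Thread(() -> seen[0] = SharedLibrary.defaultGreeting);
        try {
            appA.start();
            appA.join();   // app A finishes before app B starts
            appB.start();
            appB.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(observedBySecondApp());
    }
}
```

Without isolation, app B has no way to tell that the value it read was put there by a different application.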
An example of a Java bug is a deadlock condition. For example, an application has 2 threads that synchronize on 2 different objects in opposite order. As a result, each thread will hold one lock while waiting for the other to release the other lock, and both threads will block forever waiting for the other thread. Without isolation, threads of another application may synchronize on the same object and be blocked perpetually as well. Hence, the bug in one misbehaved app is causing problems for another well-behaved app.
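The opposite-order locking bug can be sketched in a few lines of plain Java. The class and method names here are made up for illustration. A latch forces each thread to grab its first lock before asking for the second, so the deadlock happens every time rather than only occasionally. The threads are daemons so that the stuck pair doesn't keep the VM alive.

```java
import java.util.concurrent.CountDownLatch;

final class DeadlockDemo {
    // Starts two threads that take the same two locks in opposite order.
    // Returns true if both are still stuck after waitMillis milliseconds.
    static boolean bothThreadsStuck(long waitMillis) {
        if (waitMillis <= 0) {
            throw new IllegalArgumentException("waitMillis must be positive");
        }
        final Object lockA = new Object();
        final Object lockB = new Object();
        // Ensures each thread holds its first lock before trying the second.
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);

        Thread t1 = new Thread(() -> lockInOrder(lockA, lockB, bothHoldFirst));
        Thread t2 = new Thread(() -> lockInOrder(lockB, lockA, bothHoldFirst));
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();
        try {
            t1.join(waitMillis);
            t2.join(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t1.isAlive() && t2.isAlive();
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await();
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) {
                // Never reached: the other thread holds 'second' and wants 'first'.
            }
        }
    }
}
```

If `lockA` were an object visible to every application (a shared class or an interned string, say), any thread from another app that synchronized on it would get stuck behind this pair.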
The way the CLDC MVM solves this is by replicating all static fields in the system libraries. Each isolate gets its own copy of the statics. As a result, class / static initializers need to be run once for each isolate in order to initialize the copy of the statics of that isolate. By this mechanism, each isolate will also create its own copy of any singleton objects in the system.
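The idea of per-isolate statics can be modeled in ordinary Java. This is only a toy model of the concept, not how the CLDC MVM lays out its data internally. `LibraryStatics` and `IsolateStaticsTable` are hypothetical names, and isolates are identified by a plain integer id.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: the static fields of one library class.
final class LibraryStatics {
    int counter;

    LibraryStatics() {
        // Plays the role of the class's static initializer.
        counter = 100;
    }
}

final class IsolateStaticsTable {
    private final Map<Integer, LibraryStatics> copies = new HashMap<>();
    private int initializerRuns = 0;

    // Returns the given isolate's private copy of the statics, running the
    // "static initializer" the first time that isolate touches the class.
    synchronized LibraryStatics staticsFor(int isolateId) {
        LibraryStatics statics = copies.get(isolateId);
        if (statics == null) {
            statics = new LibraryStatics();
            initializerRuns++;
            copies.put(isolateId, statics);
        }
        return statics;
    }

    // How many times the initializer has run (once per isolate that used the class).
    synchronized int initializerRuns() {
        return initializerRuns;
    }
}
```

With this in place, `table.staticsFor(1).counter++` changes the value only for isolate 1. Isolate 2 still sees the freshly initialized value, at the cost of running the initializer a second time.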
CLDC isolation also involves some tricks to make sure that different apps get a different lock instance when synchronizing on ROMized class and string instances which are not replicated in JavaME VMs. Interned strings are treated similarly.
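I won't go into the actual tricks here, but the effect can be approximated in plain Java with a lookup table. The sketch below is an assumption made for illustration only and is not the VM's mechanism. `IsolateLockTable` is a hypothetical class that hands each isolate its own lock object for a given shared instance.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

final class IsolateLockTable {
    // isolate id -> (shared object, compared by identity -> that isolate's private lock)
    private final Map<Integer, Map<Object, Object>> locks = new HashMap<>();

    // Returns the lock that the given isolate should use in place of the shared object.
    synchronized Object lockFor(int isolateId, Object shared) {
        return locks
                .computeIfAbsent(isolateId, id -> new IdentityHashMap<>())
                .computeIfAbsent(shared, o -> new Object());
    }
}
```

Code running in isolate 1 would then do `synchronized (table.lockFor(1, sharedString)) { ... }`. A thread in isolate 2 synchronizing on the same shared string gets a different lock, so the two can never block each other.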
Having Java state isolation automatically results in isolation of Java bugs. If one app deadlocks, the others are not affected because the Java state of the deadlocked app is not observable to them.
There's another kind of Java bug: the deliberate kind that a malicious application might have for the purpose of perpetrating a denial of service attack on other applications. One example of this is to consume all available memory so that other apps can't run. The CLDC MVM solves this by having resource quotas for each isolate. If an isolate tries to pull that stunt, it will see an OutOfMemoryError while another app still gets to allocate memory.
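Here is a rough sketch of what a per-isolate quota check looks like in principle. The `IsolateQuotas` class, its method names, and the use of integer isolate ids are all assumptions made for this example. A real VM would do this accounting inside its allocator.

```java
import java.util.HashMap;
import java.util.Map;

final class IsolateQuotas {
    private final long quotaBytes;
    private final Map<Integer, Long> used = new HashMap<>();

    IsolateQuotas(long quotaBytesPerIsolate) {
        this.quotaBytes = quotaBytesPerIsolate;
    }

    // Charges an allocation to one isolate. Only that isolate fails when
    // its own quota is exhausted; the others are unaffected.
    synchronized void charge(int isolateId, long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must not be negative");
        }
        long current = used.getOrDefault(isolateId, 0L);
        if (bytes > quotaBytes - current) {
            throw new OutOfMemoryError("isolate " + isolateId + " exceeded its quota");
        }
        used.put(isolateId, current + bytes);
    }

    // Bytes currently charged to the given isolate.
    synchronized long usedBy(int isolateId) {
        return used.getOrDefault(isolateId, 0L);
    }
}
```

The greedy isolate hits its own ceiling and gets the error. A second isolate charging against the same `IsolateQuotas` instance still has its full quota to work with.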
CDC requires support for the Java Native Interface (JNI) for native methods. Hence, the CDC VM needs to provide isolation for native state and bugs as well. By native state, I mean global and static variables in native code. Native bugs would include things like stray pointers that trash memory, segmentation faults, illegal instruction faults, etc.
The only way to handle this is to isolate the native code in an OS process ... one for each application. And this is exactly what the CDC MVM does. It uses a special type of process fork to spawn off new isolates. Each isolate will run in its own process. The CDC MVM is currently only available for Linux and Solaris, because only these OSes offer that type of forking capability. Java level isolation is handled automatically by the processes as well.
As for denial of service attacks due to resource starvation, the CDC VM relies on the OS to enforce quotas if necessary. It is no different than how resource starvation issues are handled by native applications.
CLDC Isolate Termination
The CLDC MVM provides its own thread library. When it needs to terminate an application, it simply chooses to not schedule that app any more, and nullifies all references to that isolate. Since all applications are 100% pure Java code, their resources will get cleaned up by the garbage collector. Some resources have native counterparts. These will get cleaned up by private finalizers that are triggered by the GC. Note: finalizers are not part of the CLDC specification. They are a VM-private implementation detail that is used by the system libraries.
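Termination by "just not scheduling it any more" is easy to picture with a toy cooperative scheduler. This is a sketch under heavy simplification: `GreenScheduler` is a hypothetical class, each task is a single repeatable step instead of a real thread with a stack, and isolates are plain integer ids.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class GreenScheduler {
    // One schedulable unit: a step of work owned by an isolate.
    private static final class Task {
        final int isolateId;
        final Runnable step;

        Task(int isolateId, Runnable step) {
            this.isolateId = isolateId;
            this.step = step;
        }
    }

    private final List<Task> tasks = new ArrayList<>();
    private final Set<Integer> terminated = new HashSet<>();

    void addTask(int isolateId, Runnable step) {
        tasks.add(new Task(isolateId, step));
    }

    // Stops scheduling the isolate and drops every reference to its tasks,
    // leaving the rest of the clean-up to the garbage collector.
    void terminate(int isolateId) {
        terminated.add(isolateId);
        tasks.removeIf(task -> task.isolateId == isolateId);
    }

    // Runs one round-robin pass over the tasks of all live isolates.
    void runOnePass() {
        // Iterate over a copy so that a step may call terminate() safely.
        for (Task task : new ArrayList<>(tasks)) {
            if (!terminated.contains(task.isolateId)) {
                task.step.run();
            }
        }
    }
}
```

The point is that the scheduler owns the decision about who runs next. Once an isolate is off the list, its code never executes again, no matter what it was in the middle of doing.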
CDC Isolate Termination
Since isolates in the CDC MVM have their own process, termination is simply a matter of killing the respective process.
CLDC Common Resource Sharing
For the CLDC MVM, since all isolates run as threads in the same physical VM, they automatically share all constant data. This includes class metadata (e.g. the constant pool, and attributes), method bytecodes, ROMized objects, and interned strings.
Each isolate gets a quota from the Java heap. In general, resources allocated from the Java heap are not shared. However, there is less internal fragmentation because the isolates allocate from the same heap.
Starting a new isolate does not require a full re-initialization of the VM. However, class initialization of system libraries will need to be re-run.
CDC Common Resource Sharing
For the CDC MVM, since isolates are different processes that forked from a common process, they will share memory pages that are read-only as well as data that has not been written to (thanks to that special fork). The CDC MVM employs certain techniques to help the OS maximize this sharing.
Starting new isolates in the CDC MVM also does not require a full re-initialization of the child VM, but worker threads do need to be re-spawned because process forks don't work with threads. Similar to the CLDC case, class initialization will also need to be re-run.
Is the CDC MVM really an MVM?
If the CDC MVM is relying on OS processes for so much of its features, how is it different than simply running different instances of a regular CDC VM in processes? Well, the MVM has optimizations that allow for effective resource sharing that the regular VM does not. These optimizations also incur a little cost in terms of performance that is not incurred by the regular VM. What? MVM has performance costs? Why, yes. Did you think it would be free? The more you share between isolates, the more it will cost, and vice versa. But on average that cost is around 2-3% (if I remember correctly). This may or may not be significant depending on what your performance needs are.
The MVM also comes with an application manager to manage the isolates that it spawns. With the regular VM running in different processes, you're on your own.
In practice, using OS processes doesn't give you all isolation for free either. For example, some physical (hardware and OS) resources cannot be replicated. Hence, access to these resources needs to be arbitrated between the isolates so that they don't walk all over each other. An example of this is the graphics screen. The Java class libraries (or their native code) need to be modified to allow for meaningful arbitration between the isolates. This arbitration is also controlled by the application manager.
Arbitration of physical resources is also implemented in the CLDC MVM.
Can I have a CLDC style MVM for CDC?
Technically, yes ... but there are issues. For one, the CLDC style MVM can't do anything about native code isolation. OK, let's say we will only deploy the VM in an environment where we guarantee that there will be no native code in the applications (or we will reject any applications that have native state). Can we have the CLDC style MVM then?
Yes, but there are still additional challenges to overcome. CDC's VM (CVM) runs on fully pre-emptive native threads. The scheduler is in the OS, not the VM. Hence, the VM has no control over thread scheduling (unless the OS provides a mechanism for it ... which normally, it doesn't). Therefore, the CLDC approach to reliable termination won't work.
This is where we will remember the old Thread.stop() API. But we already know there are problems with that. However, we're talking about a 100% pure Java code application here. We said we won't let any native code run. And we will have Java state isolation ala CLDC style MVM. Would that make the Thread.stop() problems go away, and give us task termination now?
Yes, we can make it work. The VM will have to set a flag to indicate a termination condition in the isolate to be terminated. Checks for this termination condition will need to be inserted in various places: the VM interpreter, compiled code generated by the JIT compiler, and loops in native code. This is to ensure that the thread will terminate in a somewhat timely fashion.
At these checkpoints, if the termination condition is detected, an uncatchable exception will be thrown. The VM will ignore try-catch blocks when handling this exception. That will ensure that the application won't be able to catch the exception and prevent the thread from terminating. The exception will cause the Java and native stacks to be unwound.
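The flag-and-checkpoint scheme can be sketched at the Java level as follows. One big caveat: plain Java code cannot make an exception truly uncatchable, so this only models the control flow. In a real VM the check and the "ignore try-catch blocks" behavior live inside the interpreter and the JIT-compiled code. `IsolateTerminated`, `IsolateControl` and `CheckpointDemo` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical marker thrown when an isolate is being torn down.
final class IsolateTerminated extends Error {
    private static final long serialVersionUID = 1L;

    IsolateTerminated() {
        super("isolate terminated");
    }
}

final class IsolateControl {
    private final AtomicBoolean terminationRequested = new AtomicBoolean(false);

    // Called by the application manager to ask for termination.
    void requestTermination() {
        terminationRequested.set(true);
    }

    // The check a VM would insert at loop back-edges, method calls, and so on.
    void checkpoint() {
        if (terminationRequested.get()) {
            throw new IsolateTerminated();
        }
    }
}

final class CheckpointDemo {
    // Runs a loop that polls the checkpoint and returns how many iterations
    // completed before the termination request unwound the stack.
    static int runUntilTerminated(IsolateControl control, int terminateAt) {
        if (terminateAt < 0) {
            throw new IllegalArgumentException("terminateAt must not be negative");
        }
        int iterations = 0;
        try {
            while (true) {
                if (iterations == terminateAt) {
                    control.requestTermination();
                }
                control.checkpoint();
                iterations++;
            }
        } catch (IsolateTerminated e) {
            // Stands in for the VM: application code would never get to catch this.
        }
        return iterations;
    }
}
```

How quickly a thread dies depends entirely on how densely the checkpoints are placed. A long stretch of code without a checkpoint is a stretch during which the thread cannot be stopped.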
The VM will be able to unwind stack frames for Java bytecode methods. But native methods will require that the native method check for the exception and return. Since we're not allowing any application native code, we only have to deal with native code in the system libraries. Every native method will have to be modified to check for the termination condition and make sure it returns instead of clearing the exception.
Additionally, all native methods need to be checked to make sure that they free any resources that they allocate locally in the method. This resource clean-up needs to be done before the method returns due to the exception stack unwinding. Some native resources are not allocated and used locally within a single native method. For such native resources, we will need to make sure that they are associated with Java objects with finalizers so that they can be reclaimed when the corresponding Java objects get GCed.
Why can't I have my CLDC style MVM for CDC?
Are you feeling exhausted yet? The list of steps needed to implement the CLDC style MVM for CDC is certainly longer. Now consider the complexity and size of CDC libraries compared to CLDC libraries. Because all native libraries will have to be modified appropriately in order for the CLDC style MVM to work correctly, the implementation effort is significantly higher. And the risk of not getting it done right is equally high.
In addition, if the VM is deployed with any middleware stack (for example, MHP middleware for set-top boxes), all the native code of that middleware stack will also have to go through the same treatment in order for the CLDC style MVM to work.
Hence, the difficulty of implementing this type of MVM for CDC is significantly higher. Sun actually has experimental code in CVM that implements this kind of MVM. But because of the implementation cost and risks involved with the native code portion, that code was never completed. However, it will be open sourced soon in the next release of phoneME Advanced, and interested parties are welcome to drive it to completion.
An MVM for JavaSE?
A CDC style MVM may or may not be viable for JavaSE. Their VM is structured differently. Hence, they may or may not benefit from this approach. The JavaSE VM's class sharing feature is already a step in the same direction.
As for a CLDC style MVM, if you can understand how difficult it is to do it right for CDC, you will see that it is significantly more so for JavaSE. This is because JavaSE has a significantly larger set of class libraries. However, only the native methods in those libraries will be a contributing factor to the difficulty. Fortunately, the bulk of JavaSE's libraries are pure Java code. But is it enough to make this task less daunting?
What if I don't need all the features of an Isolate?
With CDC, you already have namespace isolation through the use of classloaders. There have certainly been systems deployed based on thread concurrency and classloader namespace isolation alone. However, these systems suffer from various weaknesses owing to the lack of true isolation. A common complaint is that one misbehaving app can crash the entire system, and all apps go down with it.
There are also vendors who offer MVM solutions. I always wonder whether they did as thorough a job as Sun did in analyzing the problem, or whether they are offering only a partial solution. Since I haven't played with their VMs, I don't know how complete their MVMs are.
What does this mean to you?
I hope you will now have an understanding of the daunting task involved in building an MVM solution. I understand that when you need an MVM and there just isn't one available, knowing that it's difficult to implement one is of no consolation at all.
However, since the Java platform is now open-sourced, if this is an important enough feature to you, please contribute in this area of the VM. If we work on this together a little at a time, we may get an MVM sooner rather than later.
Have a nice day. :-)