
In a bit of a Volatile Fix!

Posted by mlam on January 5, 2007 at 1:38 AM PST

Sorry for not writing for a while. I've been really busy. In my last entry, I described a bug that needs to be fixed and all the background information behind it. Below, I will get into the details of how we'll fix the bug. Of course, we'll talk more about the internals of the phoneME Advanced VM (CVM) as we proceed with the fix.

Resources: start of CVM internals discussion, copy of the CVM map, PDF of map for printing, .h files in src/share/javavm/include, and .c files in src/share/javavm/runtime.

Bug Update

Since last time, I discovered that there was already an earlier bug filed for this issue, bug 5080490. So, bug 6450163 will be closed as a duplicate of 5080490, and the fix will be applied to 5080490 instead.

Last time, I also said that volatile 64-bit field accesses are relatively rare. But their presence in a method can still stop the method from being compiled by the JIT even if the method is hot. Hence, it would be nice to fix this so that volatile 64-bit field accesses won't prevent the JIT from doing its job. Note that non-volatile 64-bit fields are more prevalent than their volatile counterparts. However, the code paths that exercise these accesses may be equally rare. If a code path has not been executed at least once before the JIT attempts to compile the method that contains it, then the field access opcode will remain in an unquickened state. This in turn means that the JIT won't know whether the field is actually volatile, so it must treat the field as volatile just to be safe and refuse to compile the method. Hence, the performance impact of this bug is exacerbated: it affects not only code that uses 64-bit volatile fields but also code that uses regular 64-bit fields that are still unresolved.

So, let's get on with the fix ...

The pesky JNI accessors

First, let's fix the JNI accessors that we found were not handling 64-bit volatile fields atomically. If you recall, the way we get atomicity of access to 64-bit volatile fields is by synchronizing such accesses with a lock called CVM_ACCESS_VOLATILE_LOCK. Actually, CVM_ACCESS_VOLATILE_LOCK is a macro defined in sync.h as follows:



CVM_ACCESS_VOLATILE_MICROLOCK is one instance of a set of non-reentrant mutexes that are used in the VM commonly called microlocks. Unlike other locks in the system, microlocks are expected to be held by a thread for only a short period of time. Also, typically, we don't call complex functions or do other synchronization while we're in a region of code synchronized with a microlock. So, the fix for the JNI accessors is to add calls to the above microlock macros into the JNI accessor functions (in jni_impl.c) as follows:

jType_ JNICALL \
CVMjniGet##elemType_##Field(JNIEnv* env, jobject obj, jfieldID fid) \
{ \
    CVMExecEnv* ee  = CVMjniEnv2ExecEnv(env); \
    jType_ v64; \
    CVMFieldBlock *fb = (CVMFieldBlock *)fid; \
    CVMjniSanityCheckFieldAccess(env, obj, fid); \
    if (CVMfbIs(fb, VOLATILE)) { \
        CVM_ACCESS_VOLATILE_LOCK(ee); \
    } \
    CVMID_fieldRead##elemType_(ee, obj, CVMfbOffset(fid), v64); \
    if (CVMfbIs(fb, VOLATILE)) { \
        CVM_ACCESS_VOLATILE_UNLOCK(ee); \
    } \
    return v64; \
}    \


A CVMFieldBlock is the VM data structure that defines the identity and attributes of fields. A CVMFieldBlock pointer (commonly referred to as fb or FB) is usually used as the identifier of the field. In CVM, the JNI jfieldID is actually implemented as a CVMFieldBlock pointer (as implied by the cast above).

Before accessing the field memory location, we need to lock the CVM_ACCESS_VOLATILE_MICROLOCK. However, we only do this if the field itself is volatile, which we find out using the CVMfbIs() macro. This macro can be used to check an fb for various field related attributes. CVMFieldBlock, CVMfbIs(), and the various field attribute values are defined in classes.h. After the field memory location is accessed, we unlock the microlock.

Of course, we need to do the same for the setter functions as well, which are also defined in jni_impl.c. I've left that detail out for brevity.
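To make the lock / access / unlock pattern concrete outside of the VM, here is a minimal standalone sketch. It is not the CVM code: DemoFieldBlock, demo_get_long_field and demo_set_long_field are hypothetical names, and a plain pthread mutex stands in for the microlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CVM_ACCESS_VOLATILE_MICROLOCK (not the CVM type). */
static pthread_mutex_t demo_volatile_microlock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical, much-reduced field descriptor; the real one is CVMFieldBlock. */
typedef struct {
    bool     is_volatile;  /* plays the role of CVMfbIs(fb, VOLATILE) */
    int64_t *location;     /* where the 64-bit field value lives */
} DemoFieldBlock;

/* Getter: take the lock only when the field is volatile. */
int64_t demo_get_long_field(const DemoFieldBlock *fb)
{
    int64_t v64;
    assert(fb != NULL && fb->location != NULL);
    if (fb->is_volatile) {
        pthread_mutex_lock(&demo_volatile_microlock);
    }
    v64 = *fb->location;
    if (fb->is_volatile) {
        pthread_mutex_unlock(&demo_volatile_microlock);
    }
    return v64;
}

/* Setter: the same pattern wrapped around the store. */
void demo_set_long_field(const DemoFieldBlock *fb, int64_t v64)
{
    assert(fb != NULL && fb->location != NULL);
    if (fb->is_volatile) {
        pthread_mutex_lock(&demo_volatile_microlock);
    }
    *fb->location = v64;
    if (fb->is_volatile) {
        pthread_mutex_unlock(&demo_volatile_microlock);
    }
}
```

Because every reader and writer of a volatile 64-bit field goes through the same lock, a reader can never observe half of an in-progress write, even on a machine that stores the value as two 32-bit words.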

The Deal with the JIT

To fix the JIT, we must first understand how it works. In CVM, the JIT is a 2-stage compiler with a front-end and a back-end. In the first stage, the front-end compiles Java bytecodes to an intermediate representation (commonly referred to as the IR). At its simplest, the IR is structured as a directed acyclic graph (DAG), i.e. a tree of nodes that has no cyclic references. Each node represents either a data item (leaf nodes), an operator, or an expression composed of these data items, operators, and/or other expressions. A method is represented as a list of blocks. A block is a list of DAGs. And a DAG is a tree of nodes.
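If it helps to see that nesting as data structures, here is a rough sketch. All of the Demo* types are hypothetical and far simpler than the real IR in CVM; in particular, sharing of nodes between expressions is not modeled, so each DAG here is a plain tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds: two leaf kinds and one operator. */
typedef enum { DEMO_NODE_CONST, DEMO_NODE_LOCAL, DEMO_NODE_ADD } DemoNodeKind;

typedef struct DemoNode {
    DemoNodeKind     kind;
    int              value;  /* constant value, or local variable index */
    struct DemoNode *left;   /* operands; NULL for leaf nodes */
    struct DemoNode *right;
} DemoNode;

/* One DAG in a block's list, identified by its root node. */
typedef struct DemoRoot {
    DemoNode        *expr;
    struct DemoRoot *next;
} DemoRoot;

/* A block is a list of DAGs. */
typedef struct DemoBlock {
    DemoRoot         *roots;
    struct DemoBlock *next;
} DemoBlock;

/* A method is a list of blocks. */
typedef struct {
    DemoBlock *blocks;
} DemoMethodIR;

/* Counts the nodes reachable from one root (a shared node would be counted twice). */
size_t demo_count_nodes(const DemoNode *node)
{
    if (node == NULL) {
        return 0;
    }
    return 1 + demo_count_nodes(node->left) + demo_count_nodes(node->right);
}

/* Walks method -> blocks -> DAGs -> nodes and totals the node count. */
size_t demo_count_method_nodes(const DemoMethodIR *method)
{
    size_t total = 0;
    const DemoBlock *b;
    const DemoRoot *r;
    assert(method != NULL);
    for (b = method->blocks; b != NULL; b = b->next) {
        for (r = b->roots; r != NULL; r = r->next) {
            total += demo_count_nodes(r->expr);
        }
    }
    return total;
}
```

For example, the expression "local 0 plus the constant 2" becomes one ADD node with two leaf operands, hanging off a single root in a single block.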

For those of you who may care to know more, each block here corresponds to a region of bytecodes in the method that forms an extended basic block. That means the only entry into this region is via the start of the region, but there can be multiple exits either via the end of the region or via branches into other blocks. The boundaries of these extended basic blocks are basically defined by branch targets in the method.
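As a rough illustration of how branch targets define those boundaries, the sketch below marks where blocks start in a made-up instruction array. DemoInsn and demo_mark_block_starts are hypothetical names, and the real front-end works on actual bytecodes rather than this simplified form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction: only its branch target matters here. */
typedef struct {
    int branch_target;  /* index of the target instruction, or -1 if not a branch */
} DemoInsn;

/*
 * Marks the first instruction and every branch target as the start of a
 * block. The instruction after a branch does NOT start a new block, which
 * is what lets a block have several exits but only one entry.
 * Returns the number of blocks found.
 */
size_t demo_mark_block_starts(const DemoInsn *code, size_t n, bool *is_start)
{
    size_t i;
    size_t count = 0;
    assert(code != NULL && is_start != NULL);
    for (i = 0; i < n; i++) {
        is_start[i] = false;
    }
    if (n > 0) {
        is_start[0] = true;  /* method entry */
    }
    for (i = 0; i < n; i++) {
        int target = code[i].branch_target;
        if (target >= 0 && (size_t)target < n) {
            is_start[target] = true;
        }
    }
    for (i = 0; i < n; i++) {
        if (is_start[i]) {
            count++;
        }
    }
    return count;
}
```

With six instructions where instruction 1 branches to 4 and instruction 5 branches back to 2, this yields three blocks starting at 0, 2 and 4.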

In the second stage, the JIT back-end compiles this IR into platform-specific machine code that can execute natively on the machine. The back-end is sometimes called the code generator (commonly referred to as codegen). For some bytecodes, the back-end will generate machine instructions directly, e.g. an addition instruction that adds two integers. For other, more complex bytecodes (e.g. new, which is used to allocate memory for objects), the code generator will generate calls to runtime helper functions instead.

These helper functions are commonly referred to as the Compiled Code Manager (CCM) runtime functions, or CCM helpers. CCM helpers are used when the operation to be performed is too complex to generate inline in the compiled method. First, the complexity makes it difficult to generate this code correctly. Second, the complexity means that implementing the operation takes a large number of instructions, so generating the code inline in every compiled method may not be space efficient.
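The inline-versus-helper choice can be pictured with a toy code generator like the one below. This is only a sketch: the opcodes, the pseudo-assembly text and the helper name demo_ccm_new are all made up for illustration and are not what CVM's codegen emits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical opcodes: one simple, one complex. */
typedef enum { DEMO_OP_IADD, DEMO_OP_NEW } DemoOpcode;

/*
 * Writes pseudo-assembly for one opcode into out.
 * Returns 1 if a call to a runtime helper was emitted, 0 if the
 * operation was emitted inline.
 */
int demo_emit(DemoOpcode op, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    switch (op) {
    case DEMO_OP_IADD:
        /* Simple operation: a single machine instruction, emitted inline. */
        snprintf(out, out_size, "add r0, r0, r1");
        return 0;
    case DEMO_OP_NEW:
        /* Complex operation: emit a call and let the helper do the work. */
        snprintf(out, out_size, "call demo_ccm_new");
        return 1;
    }
    out[0] = '\0';  /* unknown opcode: emit nothing */
    return 0;
}
```

The compiled method only pays for one call instruction per complex operation, while the bulk of the logic lives once in the helper.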

The JIT's status quo

Currently, while compiling, when the JIT front-end encounters an access to a 64-bit volatile field or an unresolved 64-bit field, it will abort compilation like so:

/* If the field is 64-bit volatile type, refuse to compile
 * the method. Since 64-bit volatile fields are rare,
 * failure to compile methods that access this type of
 * fields should not have noticeable impact on performance.
 */
if (CVMfbIs(fb, VOLATILE) && CVMfbIsDoubleWord(fb)) {
    CVMJITerror(con, CANNOT_COMPILE,
        "method access 64-bit volatile field");
}

... or like this:

/* When a field is unresolved, we don't know if it is volatile
 * or not. In this case, we just refuse to compile any method
 * that accesses unresolved 64-bit fields.
 */
if (typeTag == CVM_TYPEID_LONG ||
    typeTag == CVM_TYPEID_DOUBLE) {
    CVMJITerror(con, RETRY_LATER,
        "method access unresolved 64-bit field");
}

Look in this version of jitir.c here for the above code examples.

In the first case, the front-end discovered a 64-bit field access that has been resolved and is known to be volatile. Hence, the JIT will throw an error with status CANNOT_COMPILE, which aborts compilation and also marks the method as not compilable so that we won't attempt to compile it again in the future.

In the second case, the front-end encountered an unresolved field that is 64-bit in size. Since the VM can't know if the field is volatile or not until after it gets resolved, the JIT will throw an error with status RETRY_LATER which will allow the JIT to retry compilation later hoping that the field would have been resolved by then. Of course, if it doesn't get resolved (i.e. the codepath doesn't get executed) by the next compilation attempt, this error will be thrown again requesting yet another retry at a later time, and so on.

These errors are thrown because the JIT is not currently able to emit code that can access these 64-bit fields atomically. By refusing to compile the method, we let the interpreter continue to execute it and therefore get the correct behavior (since the interpreter already handles these accesses correctly using the microlocks).
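Put together, the front-end's current policy boils down to a small decision function. The sketch below restates it in standalone form; DemoStatus, DemoFieldRef and demo_check_field_access are hypothetical names rather than anything in jitir.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of inspecting one field access. */
typedef enum {
    DEMO_COMPILE_OK,      /* keep compiling */
    DEMO_CANNOT_COMPILE,  /* abort and never try this method again */
    DEMO_RETRY_LATER      /* abort now, try again on a later attempt */
} DemoStatus;

/* Hypothetical summary of what the front-end knows about the field. */
typedef struct {
    bool is_64bit;     /* long or double */
    bool resolved;     /* has the access been executed (quickened) yet? */
    bool is_volatile;  /* only meaningful when resolved is true */
} DemoFieldRef;

DemoStatus demo_check_field_access(const DemoFieldRef *field)
{
    assert(field != NULL);
    if (!field->is_64bit) {
        return DEMO_COMPILE_OK;      /* 32-bit accesses are already atomic */
    }
    if (!field->resolved) {
        return DEMO_RETRY_LATER;     /* volatility unknown: assume the worst */
    }
    if (field->is_volatile) {
        return DEMO_CANNOT_COMPILE;  /* known 64-bit volatile field */
    }
    return DEMO_COMPILE_OK;          /* resolved, plain 64-bit field */
}
```

Note that the unresolved check comes before the volatile check, because the volatile flag cannot be trusted until the field has been resolved.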

The Fix for the JIT

What's so difficult that the JIT can't emit the needed code? Remember that the approach to get atomicity for 64-bit volatile fields is to synchronize their accesses using a microlock. A microlock can be implemented in different ways for different target platforms. For example, for most platforms, microlocks are implemented using OS mutexes wrapped by VM data structures. For other platforms, they may be implemented using a flag in memory that is set with an atomic swap instruction, together with a call to a thread sleep mechanism within a polling loop to block threads when they fail to lock the microlock due to contention. Another implementation of the microlock is through the use of an OS system call to disable interrupts or the thread scheduler for the duration the lock is held.
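The flag-based variant is the easiest one to sketch in portable code. The version below uses C11 atomics for the atomic swap; DemoMicroLock and the demo_* functions are hypothetical, and demo_thread_sleep is a do-nothing placeholder for whatever thread sleep call a real port would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag-based microlock: one flag in memory. */
typedef struct {
    atomic_flag held;
} DemoMicroLock;

#define DEMO_MICROLOCK_INIT { ATOMIC_FLAG_INIT }

/* Placeholder: a real port would call its thread sleep primitive here. */
static void demo_thread_sleep(void)
{
}

/* One atomic test-and-set; returns true if we now own the lock. */
bool demo_microlock_trylock(DemoMicroLock *lock)
{
    assert(lock != NULL);
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

/* Polling loop: retry the swap, sleeping between failed attempts. */
void demo_microlock_lock(DemoMicroLock *lock)
{
    while (!demo_microlock_trylock(lock)) {
        demo_thread_sleep();
    }
}

/* Clearing the flag releases the lock. Not reentrant, like a microlock. */
void demo_microlock_unlock(DemoMicroLock *lock)
{
    assert(lock != NULL);
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

Even in this tiny form, locking involves a loop, an atomic operation and a call out to the platform, which hints at why emitting it inline from the JIT is unattractive.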

This potential difference in implementations makes it difficult for the JIT to be written in a portable way that emits the needed code for each target platform (recall that portability is an essential feature of CVM due to the needs of the JavaME market). But more importantly, note that the operation of locking and unlocking a microlock is not trivial whatever the implementation. It requires several calls to external functions in the VM or the OS amongst other things. In such a case, it is more efficient to use a CCM helper to do the field access instead of emitting the code inline.

Using a CCM helper also has the added benefit of being able to invoke the microlock APIs from C code, thereby allowing the VM HPI (host porting interface) to define the actual microlock implementation. The portability issue is solved with no hassle.
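Here is one way to picture that division of labor. In this sketch the port supplies its lock and unlock routines through a small table, and the helper is written once in C against that table. The table, the counting "port" and demo_ccm_get_volatile_long are all hypothetical; the real HPI is selected when the VM is built, not through function pointers at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical porting interface: each platform supplies these two routines. */
typedef struct {
    void (*lock)(void);
    void (*unlock)(void);
} DemoMicroLockOps;

/* A fake "port" that just counts calls so the behavior can be observed. */
static int demo_lock_calls;
static int demo_unlock_calls;

static void demo_port_lock(void)   { demo_lock_calls++; }
static void demo_port_unlock(void) { demo_unlock_calls++; }

static const DemoMicroLockOps demo_port_ops = {
    demo_port_lock,
    demo_port_unlock
};

/*
 * Hypothetical helper that compiled code would call instead of reading
 * the field inline. It is plain C and knows nothing about how the
 * platform implements the lock.
 */
int64_t demo_ccm_get_volatile_long(const DemoMicroLockOps *ops,
                                   const volatile int64_t *addr)
{
    int64_t value;
    assert(ops != 0 && addr != 0);
    ops->lock();
    value = *addr;
    ops->unlock();
    return value;
}
```

The JIT back-end then only needs to know how to emit a call to the helper, which it already does for other complex operations.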

Note: CCM helpers can also be written in assembly when appropriate, usually for performance reasons. But in this case, due to the rarity of accesses to 64-bit volatile fields, using CCM helpers written in C makes more sense especially since this gives us benefits in terms of portability.

To Be Continued ...

With that, we're now ready to write some code. Well, almost ready. So, next time, I'll try to show you a map of the JIT. In case you haven't noticed, I like pictures. And then, we'll go through and implement the changes step by step while I explain each part of the JIT that we'll visit during the fix. This fix will take us all the way from the bytecode parsing in the JIT front-end to the generated code in the JIT back-end, as well as through the implementation of a few CCM helpers. So, it'll be a good tour of the JIT's inner workings.

OK, till then, have a nice weekend. :-)
