A Field Get Experience

Posted by mlam on December 14, 2006 at 6:41 PM PST

This article is a continuation of my series of discussions about the internals of the phoneME Advanced VM (commonly known as CVM) for JavaME CDC. Below, I'll work on fixing a bug in the VM. Along the way, I'll discuss more of CVM's internal mechanisms. Note: for the purpose of this discussion, I will only focus on the coding aspects. The source code version control details will not be discussed here.

Resources: start of CVM internals discussion, copy of the CVM map, PDF of map for printing, .h files in src/share/javavm/include, and .c files in src/share/javavm/runtime.

What is the Bug?

The bug report can be found here (bug 6450163). Essentially, reads and writes to long and double volatile variables need to be atomic. This is specified in section 17.7 (Rules for Volatile Variables) of the Java Language Specification:

The load, store, read, and write actions on volatile variables are atomic, even if the type of the variable is double or long.

Currently, the JIT handles this by refusing to compile methods that access these 64-bit volatile variables. Instead, it defers execution of the method to the interpreter, which already handles 64-bit volatile field accesses atomically. The bug is therefore one of performance rather than of correctness.

Digging further ...

Other Relevant Information

Here (bug 4896358) is a related bug that was fixed a few years back. Before that fix, the VM wasn't handling 64-bit field accesses atomically (which is what the bug was reporting). The lack of atomicity arises because most instances of CVM are deployed on 32-bit systems where the memory bus is 32 bits wide. Hence, a 64-bit field value will be read or written using two 32-bit bus accesses. On a target system that supports full pre-emption in its thread implementation, a thread context switch can occur between the two bus accesses. If that happens, the field value may be only partially written before it is read by another thread, or partially read before another thread interrupts and overwrites the field (including the other half that is yet to be read). This means that the read/write access to the volatile fields wouldn't be atomic as required by the JLS.
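To make the torn access concrete, here is a small single-threaded simulation. It is only a sketch: the Field64 type, the helper names, and the choice to write the low word first are my assumptions for illustration, not CVM code. The "context switch" is modeled by performing a load between the two halves of a store.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit field stored as two 32-bit words,
 * as it would be accessed over a 32-bit memory bus. */
typedef struct {
    uint32_t lo;   /* low 32 bits  */
    uint32_t hi;   /* high 32 bits */
} Field64;

/* First bus access of a 64-bit store: only the low word is written. */
static void store_lo(Field64 *f, uint64_t v) { f->lo = (uint32_t)v; }

/* Second bus access of a 64-bit store: the high word is written. */
static void store_hi(Field64 *f, uint64_t v) { f->hi = (uint32_t)(v >> 32); }

/* A 64-bit load reassembled from the two words. */
static uint64_t load64(const Field64 *f) {
    return ((uint64_t)f->hi << 32) | f->lo;
}

/* Simulates a reader that runs between the two halves of a store.
 * Returns the value that the reader observes. */
static uint64_t torn_read_demo(uint64_t old_value, uint64_t new_value) {
    Field64 f;
    store_lo(&f, old_value);
    store_hi(&f, old_value);

    store_lo(&f, new_value);          /* writer: first bus access    */
    uint64_t seen = load64(&f);       /* context switch: reader runs */
    store_hi(&f, new_value);          /* writer: second bus access   */

    assert(load64(&f) == new_value);  /* the store does complete later */
    return seen;
}
```

Going from 0x00000000FFFFFFFF to 0x0000000100000000, the simulated reader sees 0, which is neither the old value nor the new one.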

The solution is to ensure that all 64-bit volatile field accesses are synchronized with a VM mutex. Yes, there may be other solutions like OS or hardware APIs to request an atomic 64-bit memory access. In CVM's case, we decided that 64-bit accesses are not commonly used. Hence, it wasn't worth it to create yet another porting API to access any such OS and/or hardware API that may not be available on all platform architectures. Doing so would increase the burden of porting. Certainly, this can be changed if this is found to be a performance hotspot. But for now, the use of a mutex is adequate.
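The idea can be sketched in a few lines of C. This is not CVM's implementation: I'm using a C11 atomic_flag spin lock as a stand-in for the VM mutex so that the example needs no threading library, and all the names here are hypothetical. The point is simply that both 32-bit halves are touched only while the lock is held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the VM mutex: a single global spin lock (an assumption
 * for this sketch; a real VM would use its own mutex type). */
static atomic_flag volatile_lock = ATOMIC_FLAG_INIT;

static void lock_volatile(void) {
    while (atomic_flag_test_and_set_explicit(&volatile_lock,
                                             memory_order_acquire)) {
        /* spin until the current holder releases the lock */
    }
}

static void unlock_volatile(void) {
    atomic_flag_clear_explicit(&volatile_lock, memory_order_release);
}

/* A 64-bit field held as two 32-bit words. */
typedef struct { uint32_t lo, hi; } Field64;

/* Writes both halves while holding the lock. */
static void write64_locked(Field64 *f, uint64_t v) {
    assert(f != NULL);
    lock_volatile();
    f->lo = (uint32_t)v;
    f->hi = (uint32_t)(v >> 32);
    unlock_volatile();
}

/* Reads both halves while holding the lock. */
static uint64_t read64_locked(Field64 *f) {
    assert(f != NULL);
    lock_volatile();
    uint64_t v = ((uint64_t)f->hi << 32) | f->lo;
    unlock_volatile();
    return v;
}
```

Because every reader and writer takes the same lock, a reader can never run between the two word writes of a store.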

What's the Story on the current Bug?

Part of bug 4896358's fix includes making sure that JIT compiled code complies with this requirement for atomic 64-bit volatile field accesses as well. Since 64-bit field accesses in general were considered to be rare for JavaME applications, the decision was to take the simplest approach i.e. to refuse to compile the method that has this kind of field access.

Why fix bug 6450163 now? Well, even though 64-bit field accesses may be rare, they may be used in a rarely traversed code path inside a hot method that does other things. Refusing to compile the method would mean that it will not be able to benefit from the acceleration that compilation yields. Secondly, the exercise of fixing the bug will give me the opportunity to tell you more about CVM internals. :-)

So, how extensive should the fix be? Don't non-volatile field accesses need to be atomic as well? Why not do it for all 64-bit field accesses? Well, there is nothing in the VM spec that says anything about the atomicity of field accesses. The language specification only added the statement above about atomicity for volatile fields. There is no such guarantee for non-volatile fields. This is not so unreasonable. Note that if you were coding in C/C++ or any other language, you would face the same issues there.

So, how do people normally ensure that, in a multi-threaded environment, atomicity problems won't show up and lead to the reading or writing of erroneous data? The answer is to use proper mechanisms to synchronize the field accesses between the threads. This is also true for Java code. However, the language specification does make an explicit guarantee for volatile fields. Hence, the VM must be able to handle this case.

Now that we have confirmed that the fix only needs to be implemented for volatile fields, the next step is to understand how the interpreter solves the problem. This is because the JIT solution will need to be compatible with that.

So, let's see what happens in the interpreter (since the fix for bug 4896358) ...

CVM's Object Layout

Every Java object in CVM will start with the following header (see object.h):

    struct CVMObjectHeader {
        volatile CVMClassBlock   *clas;
        volatile CVMAddr         various32;
        CVM_GC_SPECIFIC_WORDS
    };

From within VM code, all objects can be referred to in one of 2 ways: a direct pointer or an indirect pointer. The indirect pointer (commonly referred to as an ICell) is of type CVMObjectICell *, and is also used as a GC root. I'll discuss ICells when I get around to talking about the garbage collector. The direct pointer is of type CVMObject *. Both CVMObjectICell and CVMObject are defined in defs.h. But CVMObject essentially contains a CVMObjectHeader and nothing else. Hence, we always look at CVMObjectHeader when we want to see the shape of the top of the object. And we tend to use the terms CVMObject and CVMObjectHeader interchangeably.

In the CVMObject, the first word, clas, contains a pointer to the object's class meta-data. The second word, various32, contains bit encodings of various bits of information about the object. CVM_GC_SPECIFIC_WORDS is reserved so that a GC implementation can add extra object header information. But in practice, the current GC implementations don't add anything there. Hence the size of the object header is 2 words, and its first instance field will start immediately after the object header. All instance fields are word-aligned. This means that booleans, bytes, chars, and shorts will all take up one word even though they could fit in less. 64-bit types, double and long, will take up 2 words. Only instance fields are stored here. Static fields are stored with class data structures in the CVMClassBlock.
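As a rough illustration of these layout rules, the sketch below computes the word offset of a field from the top of the object. The FieldKind enum and the function names are made up for this example, and it assumes that fields are laid out in declaration order with no superclass fields in front of them, which the description above does not guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field kinds for this sketch; not CVM's own type codes. */
typedef enum {
    F_BOOLEAN, F_BYTE, F_CHAR, F_SHORT, F_INT, F_FLOAT, F_REF,
    F_LONG, F_DOUBLE
} FieldKind;

enum { HEADER_WORDS = 2 };   /* clas + various32, no GC-specific words */

/* Number of words a field occupies: 2 for long/double, otherwise 1. */
static size_t field_words(FieldKind k) {
    return (k == F_LONG || k == F_DOUBLE) ? 2 : 1;
}

/* Word offset (from the top of the object) of field number `index`,
 * assuming fields follow the header in declaration order. */
static size_t field_word_offset(const FieldKind *kinds, size_t count,
                                size_t index) {
    assert(index < count);
    size_t offset = HEADER_WORDS;
    for (size_t i = 0; i < index; i++) {
        offset += field_words(kinds[i]);
    }
    return offset;
}
```

For an object with a byte, a long, and an int (in that order), this gives word offsets 2, 3, and 5: the byte still consumes a full word, and the long consumes two.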

getfield

For the purpose of our discussion, we'll only look at the details of the getfield bytecode for reading from an instance field. While field accesses also include putfield (for writing to an instance field), getstatic (reading from a static field), and putstatic (writing to a static field), they are all handled in a similar fashion. Hence, discussing getfield alone is adequate to illustrate the solution for all cases.

Just like with C code accessing the field of a struct, a getfield requires 2 arguments: a CVMObject pointer (to point to the top of the object), and an offset into the object (to point to the location of the field). However, unlike C code, which has all the offsets figured out at compile time, a Java classfile will only record the field's symbolic information. The symbolic information will be stored in the class' constant pool (CP) table. A getfield bytecode will specify the index into the constant pool to get the entry which holds the symbolic info for the field.
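For comparison, here is what the C side of that analogy looks like when written out by hand. The Point struct and read_i32_at() are hypothetical names used only for this illustration. The offset comes from offsetof(), which the C compiler evaluates at compile time, whereas a Java VM has to discover the equivalent number at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C struct used only for illustration. */
typedef struct {
    int32_t x;
    int32_t y;
} Point;

/* Reads a 32-bit field given the object base and a byte offset,
 * which is roughly what a compiled `p->y` boils down to. */
static int32_t read_i32_at(const void *base, size_t byte_offset) {
    assert(base != NULL);
    int32_t value;
    memcpy(&value, (const unsigned char *)base + byte_offset, sizeof value);
    return value;
}
```

With a Point p = { 7, 42 }, calling read_i32_at(&p, offsetof(Point, y)) yields 42.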

A getfield bytecode looks like this (see spec):

    getfield  index1 index2

where index1 and index2 are unsigned 8-bit values, and ((index1 << 8) | index2) computes the CP index of the CP entry that contains the field's symbolic information.
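The index computation can be written as a tiny C helper. The function name is hypothetical; the opcode value 0xb4 (180) for getfield comes from the VM spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OPC_GETFIELD = 0xb4 };  /* getfield opcode value from the VM spec */

/* Given a pointer to a getfield instruction, returns the constant pool
 * index encoded in the two operand bytes that follow the opcode. */
static uint16_t getfield_cp_index(const uint8_t *pc) {
    assert(pc != NULL && pc[0] == OPC_GETFIELD);
    return (uint16_t)((pc[1] << 8) | pc[2]);
}
```

For the byte sequence b4 01 2c, the CP index is 0x012c, i.e. 300.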

Note: the CVMObject * for the getfield operation will come from the operand stack of the current thread. This is in contrast with the field offset, which has to be computed from the value that follows the getfield bytecode.

Quickening

Before the getfield bytecode can be executed by an interpreter, the CP entry that it references needs to be resolved first. This process is called constant pool (CP) resolution and is commonly done on first use/access of the CP entry. After resolution, an optimization can be done to speed up interpretation of the getfield bytecode. Basically, getfield needs the field offset, which we get from the CP resolution process. Therefore, instead of going to the CP table every time to fetch the field offset, we go ahead and overwrite the CP index value following the getfield bytecode with the field offset.

However, we need to change the bytecode to a new synthetic bytecode to indicate that we have done this, so that we'll know next time that the 2 following bytes contain the field offset and not the CP index. This bytecode re-writing process is called quickening, and the synthetic bytecodes are called quickened bytecodes. Note: quickened bytecodes are specific to VM implementations, and are not part of the VM spec.
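Here is a toy version of that rewriting step, just to show the mechanics. Everything in it is an assumption made for illustration: the ToyConstantPool type stands in for an already resolved CP, the quickened opcode value 0xf0 is made up, and the object is modeled as a plain array of 32-bit words indexed by word offset. It is not how quicken.c is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode numbers in this sketch are illustrative; quickened opcode
 * values are VM-specific and TOY_GETFIELD_QUICK is a made-up value. */
enum { TOY_GETFIELD = 0xb4, TOY_GETFIELD_QUICK = 0xf0 };

/* Hypothetical resolved constant pool: entry i holds a field's word offset. */
typedef struct {
    const uint16_t *field_offsets;
    size_t          count;
} ToyConstantPool;

/* Executes the getfield at `pc` against `object` (an array of 32-bit words).
 * An unquickened getfield rewrites itself in place, and then runs as the
 * quickened form. Returns the 32-bit field value. */
static uint32_t toy_getfield(uint8_t *pc, const ToyConstantPool *cp,
                             const uint32_t *object) {
    if (pc[0] == TOY_GETFIELD) {
        uint16_t cp_index = (uint16_t)((pc[1] << 8) | pc[2]);
        assert(cp_index < cp->count);
        uint16_t resolved = cp->field_offsets[cp_index]; /* "resolution"      */
        pc[1] = (uint8_t)(resolved >> 8);                /* operand now holds */
        pc[2] = (uint8_t)(resolved & 0xff);              /* the field offset  */
        pc[0] = TOY_GETFIELD_QUICK;                      /* mark as quickened */
    }
    assert(pc[0] == TOY_GETFIELD_QUICK);
    uint16_t offset = (uint16_t)((pc[1] << 8) | pc[2]);
    return object[offset];
}
```

The first call on a given instruction pays for the CP lookup and rewrites the three bytes. Every later call on the same instruction skips the lookup and reads the offset straight out of the bytecode stream.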

To see quickening in action, search for CASE_ND(opc_getfield) in the interpreter loop function (executeJava()) in executejava_standard.c. You will find that the code flows into an area that calls CVMquickenOpcode(). CVMquickenOpcode() is found in quicken.c. This is where the quickening of bytecodes happens. After the quickening, we return to the interpreter to execute the quickened bytecodes. Note that the original getfield bytecode never gets executed. It only triggers its own quickening, and then the quickened form is executed instead.

In CVM, there are 4 quickened bytecodes for getfield:

  • getfield_quick
  • getfield2_quick
  • agetfield_quick
  • getfield_quick_w

getfield_quick is for all primitive fields of size 32-bit or smaller. This is one reason why we round up boolean, byte, char, and short to a full 32-bit in the object fields. It allows us to use the same quickened bytecode. We also use this bytecode for int and float types.

getfield2_quick is for 64-bit fields i.e. double and long.

agetfield_quick is for object pointers. On a 32-bit system these are 32-bit in size, but we still need to distinguish it from getfield_quick because of other work that may be done when manipulating the object pointers.

And that leaves us with getfield_quick_w, which is the catch-all for other cases that aren't handled by the above 3. This bytecode will actually result in the field offset being fetched from the CP entry. It then uses the offset to access the field. The handler for this quickened bytecode (see CVMgetfield_quick_wHelper() in executejava_standard.c) is also where we enforce the atomicity of 64-bit volatile field gets.

Synchronizing Volatile Field Access

Looking in CVMgetfield_quick_wHelper(), we see:

    static CVMStackVal32*
    CVMgetfield_quick_wHelper(CVMExecEnv* ee, CVMFrame* frame,
                              CVMStackVal32* topOfStack, CVMConstantPool* cp,
                              CVMUint8* pc)
    {
        CVMFieldBlock* fb;
        CVMObject* directObj = STACK_OBJECT(-1);
        if (directObj == NULL) {
            return NULL;
        }
        fb = CVMcpGetFb(cp, GET_INDEX(pc+1));
        ...
        if (CVMfbIsDoubleWord(fb)) {
            /* For volatile type */
            if (CVMfbIs(fb, VOLATILE)) {
                CVM_ACCESS_VOLATILE_LOCK(ee);
            }
            CVMD_fieldRead64(directObj, CVMfbOffset(fb), &STACK_INFO(-1));
            if (CVMfbIs(fb, VOLATILE)) {
                CVM_ACCESS_VOLATILE_UNLOCK(ee);
            }
            ...
            topOfStack++;
        } else {
            ...
        }
        return topOfStack;
    }

At the top of the function, we basically do some necessary checks like ensuring that the object pointer that we will dereference is not NULL. Next, we fetch the CVMFieldBlock * from the CP entry. A CVMFieldBlock (commonly referred to as FB or fb) is a data structure that is used to store all the meta-data concerning a given field in a class. In CVM, the fb pointer is the universal identifier for a field. In this case, when we resolve the CP entry, we essentially obtain an fb pointer to the fb in the class that owns that field. Note: sometimes, we simply say fb instead of fb *. The context of use will disambiguate the true meaning behind each usage. When we say we're passing the fb, we don't actually mean that we're passing an fb around by value. We're only passing the fb *.

Next, from the fb, we can find out if the field is 64-bit or not. If it is 64-bit, we will check if the field is volatile. If so, we synchronize on the CVM_ACCESS_VOLATILE_LOCK before reading from the field using CVMD_fieldRead64(). After reading the field, we, of course, release the lock.

More Than Meets the Eye

There's another part of the picture that's needed in order for this to work. That is we have to make sure that all getfields for volatile 64-bit fields will be quickened into getfield_quick_w instead of getfield2_quick. Otherwise, we won't get to this helper function which does the synchronization. The quickening decision is made in CVMquickenOpcodeHelper() in quicken.c.

Note that we don't quicken all 64-bit getfields into getfield_quick_w. We only do that for the fields which are volatile. For the rest, we quicken them into getfield2_quick, which doesn't have the synchronization overhead, and embeds the field offset after the bytecode for quicker access (i.e. no need to look it up via the CP entry).
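The decision described above can be summarized in a small sketch. This is not the code in CVMquickenOpcodeHelper(): the ToyFieldInfo struct, the QuickOpcode enum, and the function name are hypothetical, and the other catch-all cases that also end up as getfield_quick_w are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* The four quickened forms of getfield named in this article. */
typedef enum {
    Q_GETFIELD_QUICK,     /* 32-bit or smaller primitives           */
    Q_GETFIELD2_QUICK,    /* non-volatile long and double           */
    Q_AGETFIELD_QUICK,    /* object pointers                        */
    Q_GETFIELD_QUICK_W    /* catch-all, including volatile 64-bit   */
} QuickOpcode;

/* Hypothetical summary of the field properties that matter here. */
typedef struct {
    bool is_double_word;  /* long or double        */
    bool is_reference;    /* object pointer        */
    bool is_volatile;     /* declared volatile     */
} ToyFieldInfo;

/* Picks the quickened opcode for a getfield on the given field. */
static QuickOpcode choose_quick_getfield(ToyFieldInfo f) {
    assert(!(f.is_double_word && f.is_reference));
    if (f.is_double_word) {
        /* Only volatile 64-bit fields take the slower, synchronized path. */
        return f.is_volatile ? Q_GETFIELD_QUICK_W : Q_GETFIELD2_QUICK;
    }
    if (f.is_reference) {
        return Q_AGETFIELD_QUICK;
    }
    return Q_GETFIELD_QUICK;
}
```

Note that a volatile int still maps to getfield_quick in this sketch, since 32-bit accesses don't need the lock.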

Other Volatile Fields

CVM's design assumes a 32-bit hardware architecture as a minimum. There are many other design decisions that depend on this. In this case, the assumption is that a 32-bit architecture will at least be able to do memory accesses 32 bits at a time. Hence, there will not be any atomicity issues with accesses to fields whose size is smaller than or equal to 32 bits.

JNI Field Accesses

What about JNI's GetLongField() and GetDoubleField()? That should be handled in jni_impl.c. Since the code for the field getters and setters looks so much alike, these functions are defined using macros to reduce the amount of redundant typing. Here's an excerpt of the JNI API function definitions:

    #define CVM_DEFINE_JNI_64BIT_FIELD_GETTER_AND_SETTER(jType_, elemType_) \
    \
    jType_ JNICALL \
    CVMjniGet##elemType_##Field(JNIEnv* env, jobject obj, jfieldID fid) \
    { \
        CVMExecEnv* ee  = CVMjniEnv2ExecEnv(env); \
        jType_ v64; \
    \
        CVMjniSanityCheckFieldAccess(env, obj, fid); \
        CVMID_fieldRead##elemType_(ee, obj, CVMfbOffset(fid), v64); \
        return v64; \
    }    \
    ...

    CVM_DEFINE_JNI_64BIT_FIELD_GETTER_AND_SETTER(jlong,   Long)
    CVM_DEFINE_JNI_64BIT_FIELD_GETTER_AND_SETTER(jdouble, Double)

Ooops! It looks like it doesn't check for volatile 64-bit fields and do the appropriate synchronization. Looks like a bug to me. Well, I guess we'll have to fix that too.

JIT Compiled Code

What about field accesses from JIT compiled code? I'll leave that for ...

To Be Continued ...

In my next entry (part 2), I'll discuss what is done in the JIT to ensure atomicity of accesses to volatile 64-bit fields in the current implementation. I will also talk about the solution to bug 6450163. Just a hint: we'll be going inside the JIT internals. So, come back and read if you're interested.

Have a nice day. :-)
