
"We need a 'dirty hack' (but a brilliant one)..."

Posted by simonis on February 20, 2009 at 9:37 AM PST

Usually it's not much fun to be "supporter of the week", but recently, when I
was on duty, I got a somewhat unusual request on our support queue. If you're
interested in bytecode instrumentation and rewriting, class loaders and
instrumentation agents, read on to hear the full story...


How everything started

So what was the problem that required such an unusual solution? An engineer
explained that they had delivered version 1.0 of their API with a method
void foo(String arg) (see Listing 1):

Listing 1: The original, old API
package api;
public class API {
  public static void foo(String arg) {
    System.out.println("Now in : void   API.foo(String)");
    System.out.println("arg = " + arg);
  }
}

Some time later, they delivered version 2.0 of the API, in which they accidentally changed the signature of foo to Object foo(String arg)
(see Listing 2):

Listing 2: The new version of the API
package api;
public class API {
  public static Object foo(String arg) {
    System.out.println("Now in : Object API.foo(String)");
    System.out.println("arg = " + arg);
    return null;
  }
}

Unfortunately, they didn't realize this until a client complained that one of their applications no longer worked, because a third-party library they were using (and for which they had no source code!) had been compiled against version 1.0 of the API. The situation was similar to the test program shown in Listing 3:

Listing 3: The test program (compiled against the old API)
import api.API;
public class Test {
  public static String callApiMethod() {
    System.out.println("Calling: void   API.foo(String)");
    API.foo("hello");
    System.out.println("Called : void   API.foo(String)");
    return "OK";
  }
  public static void main(String[] args) {
    System.out.println(callApiMethod());
  }
}

If compiled and run against the old API, the Test class produces the following output:

> javac -cp apiOld Test.java
> java  -cp apiOld Test
Calling: void   API.foo(String)
Now in : void   API.foo(String)
arg = hello
Called : void   API.foo(String)
OK

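However, if Test is compiled against the old API shown in Listing 1 and run against the new API from Listing 2, it fails with a NoSuchMethodError. The JVM resolves a method call by the method's name and its complete descriptor, which includes the return type, so a call to void foo(String) does not match Object foo(String):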

> javac -cp apiOld Test.java
> java  -cp apiNew Test
Calling: void   API.foo(String)
Exception in thread "main" java.lang.NoSuchMethodError: api.API.foo(Ljava/lang/String;)V
    at Test.callApiMethod(Test.java:11)
    at Test.main(Test.java:17)
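The error message already names the culprit: the call site refers to foo by the descriptor (Ljava/lang/String;)V, and the return type is part of that descriptor. As a small illustration (it assumes Java 7 or later for java.lang.invoke.MethodType, and DescriptorDemo is a hypothetical name), the descriptors of the old and the new foo can be computed and compared like this:

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {
  // Descriptor of the old method: void foo(String)
  public static String oldDescriptor() {
    return MethodType.methodType(void.class, String.class).toMethodDescriptorString();
  }

  // Descriptor of the new method: Object foo(String)
  public static String newDescriptor() {
    return MethodType.methodType(Object.class, String.class).toMethodDescriptorString();
  }

  public static void main(String[] args) {
    System.out.println(oldDescriptor()); // (Ljava/lang/String;)V
    System.out.println(newDescriptor()); // (Ljava/lang/String;)Ljava/lang/Object;
  }
}
```

Only the part after the closing parenthesis differs, and that is enough for the VM to treat the two methods as unrelated.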

Unfortunately, at this point it was already impossible to revert the change in foo's signature, because there already existed a considerable number of new client libraries which were compiled against version 2.0 and depended on the new signature.

Our engineer now asked us to "hack" the Java VM such that calls to the old version of foo get redirected to the new one when version 2.0 of the API is used. Hacking the VM for such a purpose is of course out of the question. But they asked so nicely, and I had heard of bytecode instrumentation and rewriting so many times in the past without ever having the time to try it out, that I finally decided to help them with the hack they requested.

Two possible solutions

There were two possible solutions I could think of. The first was to statically edit the offending class files and replace the calls to the old API with calls to the new one (remember that the client had no sources for the library which caused the problems). This solution had two drawbacks: first, it would result in two different libraries (one compatible with the old and one compatible with the new API), and second, it had to be repeated manually for each such library, and it was unknown which other libraries could cause this problem.

A better solution would be to dynamically rewrite the calls at runtime (i.e. at load time, to be more exact) only if needed (i.e. if a library which
was compiled against the old API is running with the new one). This solution is more general, but it has the drawback of introducing a small performance penalty because all classes have to be scanned for calls to the old API method at load time.
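One way to keep that load-time penalty small, sketched here as an assumption rather than as part of the original solution, is a cheap pre-check on the raw class file: any class that calls api.API.foo must contain the string api/API in its constant pool, so classes without it could skip the call to Transformer.transform and be used unchanged. QuickFilter is a hypothetical name. The check may report false positives (for example a string literal "api/API"), which is harmless because such classes simply go through the full transformer.

```java
import java.nio.charset.StandardCharsets;

public class QuickFilter {
  // Internal name of the API class as it appears in the constant pool
  private static final byte[] OWNER = "api/API".getBytes(StandardCharsets.US_ASCII);

  // Returns true if the class file bytes contain "api/API" anywhere,
  // i.e. if the class *might* reference the API class
  public static boolean mayReferenceApi(byte[] classFile) {
    outer:
    for (int i = 0; i <= classFile.length - OWNER.length; i++) {
      for (int j = 0; j < OWNER.length; j++) {
        if (classFile[i + j] != OWNER[j]) {
          continue outer;
        }
      }
      return true;
    }
    return false;
  }
}
```

A plain byte search never misses a real reference, because ASCII characters are encoded identically in the class file's modified UTF-8 constant pool entries.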

I decided to use dynamic instrumentation, but then again there were (at least) two ways to implement it. First, Java 5 introduced a new Instrumentation API which serves exactly our purpose, namely "..to instrument programs running on the JVM. The mechanism for instrumentation is modification of the byte-codes of methods". Second, it has always been possible to use a custom class loader which alters the bytecodes of classes while they are loaded. I'll detail both approaches here:

Using the Java Instrumentation API

The Java Instrumentation API is located in the java.lang.instrument package. To use it, we have to define a Java programming language agent which registers itself with the VM. During this registration, the agent receives an Instrumentation object as an argument, which, among other things, can be used to register class transformers (i.e. classes which implement the ClassFileTransformer interface) with the VM.

A Java agent can be loaded at VM startup with the special command line option -javaagent:jarpath[=options], where jarpath denotes the jar-file which contains the agent. The manifest of this jar-file must contain a special attribute called Premain-Class, which specifies the agent class within the jar-file. Similar to the main method of a simple Java program, an agent class has to define a so-called premain method with the following signature: public static void premain(String agentArgs, Instrumentation inst). This method is called when the agent is registered at startup (before the main method) and gives the agent a chance to register class transformers with the instrumentation API. The following listing shows the premain class of our instrumentation agent:

Listing 4: The instrumentation agent
package instrumentationAgent;
import java.lang.instrument.Instrumentation;

public class ChangeMethodCallAgent {
  public static void premain(String args, Instrumentation inst) {
    inst.addTransformer(new ChangeMethodCallTransformer());
  }
}

A class file transformer has to implement the ClassFileTransformer interface, which defines the transform method. The transform method takes quite a few arguments, of which we only need classfileBuffer, which contains the class file as a byte array. The class transformer is free to change the class definition as long as the returned byte array contains another valid class definition (returning null signals that no transformation was performed). Listing 5 shows our minimal ChangeMethodCallTransformer. It calls the real transformation method Transformer.transform, which operates on the bytecodes and replaces calls to the old API method with calls to the new version of the method. The Transformer class is described in a later section of this article (see Listing 8).

Listing 5: Our class file transformer
package instrumentationAgent;

import bytecodeTransformer.Transformer;
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.security.ProtectionDomain;

public class ChangeMethodCallTransformer implements ClassFileTransformer {
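  // Called for each class that is loaded (or redefined) after the agent has
  // registered this transformer; we only need the raw bytes in classfileBuffer
  public byte[] transform(ClassLoader loader, String className,
          Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
          byte[] classfileBuffer) throws IllegalClassFormatException {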
    return Transformer.transform(classfileBuffer);
  }
}
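Listing 5 hands every class, including the JDK's own classes, to Transformer.transform. Since the ClassFileTransformer contract allows a transformer to return null when it performs no transformation, a variant could skip packages that cannot possibly reference api.API. The sketch below assumes Java 8 or later; FilteringTransformer is a hypothetical name, the UnaryOperator stands in for Transformer.transform, and the list of skipped packages is an assumption, not part of the original agent.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.function.UnaryOperator;

public class FilteringTransformer implements ClassFileTransformer {
  // Package prefixes (internal form) assumed never to call api.API
  private static final String[] SKIPPED = {"java/", "javax/", "sun/", "jdk/"};

  // Stand-in for Transformer.transform from Listing 8
  private final UnaryOperator<byte[]> rewriter;

  public FilteringTransformer(UnaryOperator<byte[]> rewriter) {
    this.rewriter = rewriter;
  }

  @Override
  public byte[] transform(ClassLoader loader, String className,
                          Class<?> classBeingRedefined,
                          ProtectionDomain protectionDomain,
                          byte[] classfileBuffer) {
    // className uses slashes (e.g. "java/lang/String") and may be null
    if (className == null) {
      return null;
    }
    for (String prefix : SKIPPED) {
      if (className.startsWith(prefix)) {
        return null; // null means: leave this class unchanged
      }
    }
    return rewriter.apply(classfileBuffer);
  }
}
```

In the agent of Listing 4 this would be registered as inst.addTransformer(new FilteringTransformer(Transformer::transform)).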

For the sake of completeness, Listing 6 shows the manifest file which is used to create the instrumentation agent jar-file. ChangeMethodCallAgent is defined to be the premain class of the agent. Notice that we have to put asm-3.1.jar on the boot class path of the agent (the Boot-Class-Path attribute), because it is needed by our actual transform method.

Listing 6: The manifest file for our instrumentation agent
Manifest-Version: 1.0
Premain-Class: instrumentationAgent.ChangeMethodCallAgent
Boot-Class-Path: asm-3.1.jar
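Listing 6 is normally packaged with the jar tool. If you prefer to build the agent jar from Java code, the same manifest can be assembled with java.util.jar.Manifest. This is a sketch only (AgentManifest is a hypothetical name) that reproduces the three attributes of Listing 6:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class AgentManifest {
  public static Manifest create() {
    Manifest mf = new Manifest();
    Attributes main = mf.getMainAttributes();
    // Manifest-Version must be present, as in Listing 6
    main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
    main.putValue("Premain-Class", "instrumentationAgent.ChangeMethodCallAgent");
    main.putValue("Boot-Class-Path", "asm-3.1.jar");
    return mf;
  }

  // Renders the manifest exactly as it would be stored in the jar
  public static String asText() {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
      create().write(out);
      return out.toString("UTF-8");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.print(asText());
  }
}
```

The resulting Manifest can be passed to the JarOutputStream(OutputStream, Manifest) constructor when writing instrumentationAgent.jar.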

If we run our test application with the new instrumentation agent, we will not get an error anymore. You can see the output of this invocation in the following listing:

> java -cp apiNew:asm-3.1.jar:bytecodeTransformer.jar:. -javaagent:instrumentationAgent.jar Test
Calling: void   API.foo(String)
Now in : Object API.foo(String)
arg = hello
Called : void   API.foo(String)
OK


Using a custom class loader

Another way to take control of and alter the bytecodes of a class is to use a custom class loader. Dealing with class loaders is quite tricky, and there are numerous publications on this topic (e.g. References [2], [3], [4]). One important point is to find the class loader in the class loader hierarchy which is responsible for loading the classes we want to transform. Especially in Java EE scenarios, which can have many chained class loaders, this may not be an easy task. But once this class loader is identified, the changes needed to apply the bytecode transformations are trivial.
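To see which loaders sit between a given class and the bootstrap class loader, the parent chain can be walked with Class.getClassLoader and ClassLoader.getParent. This is a small diagnostic sketch (LoaderChain is a hypothetical name); getClassLoader returns null for classes loaded by the bootstrap class loader, which the code represents as <bootstrap>.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChain {
  // Lists the loader that defined c, then each parent, ending with the bootstrap loader
  public static List<String> chainOf(Class<?> c) {
    List<String> names = new ArrayList<>();
    ClassLoader cl = c.getClassLoader();
    while (cl != null) {
      names.add(cl.getClass().getName());
      cl = cl.getParent();
    }
    names.add("<bootstrap>"); // null loader = bootstrap class loader
    return names;
  }

  public static void main(String[] args) {
    System.out.println(chainOf(LoaderChain.class)); // application class
    System.out.println(chainOf(String.class));      // [<bootstrap>]
  }
}
```

In a Java EE container, calling chainOf on one of the affected library classes would show which loader has to be replaced or extended.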

For this example I will write a new system class loader. The system class loader is responsible for loading the application, and it is the default delegation parent for new class loaders. If the system property java.system.class.loader is defined at VM startup, its value is taken to be the name of the system class loader class. This class is loaded by the default system class loader (an implementation-dependent instance of ClassLoader), must define a public constructor that takes a single parameter of type ClassLoader, and is instantiated with the default system class loader as its delegation parent. The following listing shows our simple system class loader:
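Because the VM instantiates the class named by java.system.class.loader reflectively, a quick reflective check can tell whether a candidate class has the required public constructor taking a single ClassLoader. The sketch is illustrative only: SystemLoaderCheck and DemoLoader are hypothetical names, and main additionally prints the property and the class of the current system class loader.

```java
import java.net.URLClassLoader;

public class SystemLoaderCheck {
  // Hypothetical loader with the constructor shape the VM expects
  public static class DemoLoader extends ClassLoader {
    public DemoLoader(ClassLoader parent) {
      super(parent);
    }
  }

  // True if c declares a public constructor with a single ClassLoader parameter
  public static boolean usableAsSystemLoader(Class<? extends ClassLoader> c) {
    try {
      c.getConstructor(ClassLoader.class);
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("java.system.class.loader = "
        + System.getProperty("java.system.class.loader"));
    System.out.println("system class loader = "
        + ClassLoader.getSystemClassLoader().getClass().getName());
    System.out.println(usableAsSystemLoader(DemoLoader.class));     // true
    System.out.println(usableAsSystemLoader(URLClassLoader.class)); // false
  }
}
```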

Listing 7: A simple system class loader
package systemClassLoader;

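import bytecodeTransformer.Transformer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class SystemClassLoader extends ClassLoader {

  // Invoked by the VM with the default system class loader as parent
  public SystemClassLoader(ClassLoader parent) {
    super(parent);
  }

  @Override
  public synchronized Class<?> loadClass(String name, boolean resolve)
      throws ClassNotFoundException {
    if (name.startsWith("java.")) {
      // Only the bootstrap class loader can define classes in java.*
      return super.loadClass(name, resolve);
    }
    // Defining the same class twice would throw a LinkageError
    Class<?> c = findLoadedClass(name);
    if (c == null) {
      InputStream is = getResourceAsStream(name.replace('.', '/') + ".class");
      if (is == null) {
        // No class file found: let the parent handle (or reject) the request
        return super.loadClass(name, resolve);
      }
      try {
        ByteArrayOutputStream bs = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        int len;
        while ((len = is.read(buf)) != -1) {
          bs.write(buf, 0, len);
        }
        // Rewrite calls to the old API method, then define the class here
        byte[] bytes = Transformer.transform(bs.toByteArray());
        c = defineClass(name, bytes, 0, bytes.length);
      } catch (Exception e) {
        return super.loadClass(name, resolve);
      } finally {
        try {
          is.close();
        } catch (IOException ignored) {
          // nothing sensible to do here
        }
      }
    }
    if (resolve) {
      resolveClass(c);
    }
    return c;
  }
}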

In fact we only have to extend the abstract class java.lang.ClassLoader and override the loadClass method. Inside loadClass, we immediately bail out and return the result of the superclass version of loadClass if the class belongs to a java.* package, because only the bootstrap class loader is allowed to define such classes. Otherwise we read the bytecodes of the requested class (again using the superclass methods), transform them with our Transformer class (see Listing 8) and finally call defineClass with the transformed bytecodes to create the class. The transformer, which will be presented in the next section, takes care of intercepting all calls to the old API method and replacing them with calls to the method in the new API.

If we run our test application with the new system class loader, we will succeed again without any error. You can see the output of this invocation in the following listing:

> java -cp apiNew:asm-3.1.jar:bytecodeTransformer.jar:systemClassLoader.jar:. \
       -Djava.system.class.loader=systemClassLoader.SystemClassLoader Test
Calling: void   API.foo(String)
Now in : Object API.foo(String)
arg = hello
Called : void   API.foo(String)
OK


Finally: rewriting the bytecodes

Now that I have demonstrated two ways in which bytecode instrumentation can be applied to a Java application, it is finally time to show how the actual rewriting takes place. Fortunately, this is quite easy today, because there are several elaborate frameworks for Java bytecode rewriting, such as ASM [5], BCEL [7] and SERP [8], to name just a few. As detailed by Jari Aarniala in his excellent paper "Instrumenting Java bytecode" [1], ASM is the smallest and fastest of these libraries, so I decided to use it for this project.

ASM's architecture is based on the visitor pattern, which makes it not only very fast but also easy to extend. Listing 8 shows the Transformer class which is used by the instrumentation agent (see Listing 5) and by our custom class loader (see Listing 7).

Listing 8: the bytecode transformer
package bytecodeTransformer;

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassWriter;

public class Transformer {
  public static byte[] transform(byte[] cl) {
    ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES);
    ChangeMethodCallClassAdapter ca = new ChangeMethodCallClassAdapter(cw);
    ClassReader cr = new ClassReader(cl);
    cr.accept(ca, 0);
    return cw.toByteArray();
  }
}

The public static transform method takes a byte array with a Java class definition as its input argument. These bytecodes are fed into an ASM ClassReader object, which parses them and allows a ClassVisitor object to visit the class. In our case, this class visitor is an object of type ChangeMethodCallClassAdapter, which is derived from ClassAdapter. ClassAdapter is a convenience class visitor which delegates all visit calls to the class visitor object passed to its constructor. In our case we delegate the various visit methods to a ClassWriter, with the exception of the visitMethod method (see Listing 9).

Listing 9: ChangeMethodCallClassAdapter, the class visitor
package bytecodeTransformer;

import org.objectweb.asm.ClassAdapter;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;

public class ChangeMethodCallClassAdapter extends ClassAdapter {

  public ChangeMethodCallClassAdapter(ClassVisitor cv) {
    super(cv);
  }

  @Override
  public MethodVisitor visitMethod(int access, String name, String desc,
                                   String signature, String[] exceptions) {
    MethodVisitor mv;
    mv = cv.visitMethod(access, name, desc, signature, exceptions);
    if (mv != null) {
      mv = new ChangeMethodCallAdapter(mv);
    }
    return mv;
  }
}

We are only interested in the methods of a class, because our api.API.foo method can only be called from within another method. Notice that static initializers are grouped together in the compiler-generated <clinit> method, which is also visited by visitMethod. In the overridden method we get the MethodVisitor of the delegate (which is a plain ClassWriter) and return a new ChangeMethodCallAdapter which wraps it.

The ChangeMethodCallAdapter is finally the place where the bytecode rewriting takes place. Again, ChangeMethodCallAdapter extends the generic MethodAdapter, which by default passes all instructions on to its delegate (the method visitor obtained from the ClassWriter). The only exception here is visitMethodInsn, which is called for every bytecode instruction that invokes a method.

Listing 10: ChangeMethodCallAdapter, the method visitor
package bytecodeTransformer;

import org.objectweb.asm.MethodAdapter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class ChangeMethodCallAdapter extends MethodAdapter {

  public ChangeMethodCallAdapter(MethodVisitor mv) {
    super(mv);
  }

  @Override
  public void visitMethodInsn(int opcode, String owner, String name, String desc) {
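    if ("api/API".equals(owner) && "foo".equals(name) && "(Ljava/lang/String;)V".equals(desc)) {
      // Call the new foo, which returns an Object ...
      mv.visitMethodInsn(opcode, owner, name, "(Ljava/lang/String;)Ljava/lang/Object;");
      // ... and discard that Object: the caller was compiled against the void version
      mv.visitInsn(Opcodes.POP);
    } else {
      // All other method calls are passed through unchanged
      mv.visitMethodInsn(opcode, owner, name, desc);
    }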
  }
}

In visitMethodInsn (see Listing 10), we look for calls to methods named foo in the class api/API (the owner of the static method) with the descriptor (Ljava/lang/String;)V (i.e. a String argument and a void return value). These are exactly the calls to the old version of foo which we want to patch. To patch such a call, we call our delegate with the same opcode, owner and method name, but with the new descriptor. We also have to insert a POP instruction after the call: the new version of foo returns an Object, and the code that follows, which was compiled against the old API (see Listing 1), doesn't expect a value on the operand stack. This is the same invoke-then-pop sequence javac emits when a call to a method returning Object is used as a plain statement. That's it: all other method calls and bytecode instructions are copied verbatim by the class writer to the output byte array!
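Stripped of ASM, the delegation chain and the rewrite rule can be modeled in a few lines of plain Java, which makes the effect easy to test without a bytecode library. This is a hypothetical, self-contained model, not ASM's API: InsnVisitor mimics MethodVisitor, Recorder plays the role of the ClassWriter, Adapter plays the role of MethodAdapter, and ChangeCall applies the same rule as Listing 10. The constants are the JVM opcode values of invokestatic (184) and pop (87).

```java
import java.util.ArrayList;
import java.util.List;

public class RewriteModel {
  static final int INVOKESTATIC = 184; // JVM opcode value
  static final int POP = 87;           // JVM opcode value

  // Hypothetical, minimal stand-in for ASM's MethodVisitor
  interface InsnVisitor {
    void visitMethodInsn(int opcode, String owner, String name, String desc);
    void visitInsn(int opcode);
  }

  // Plays the role of the ClassWriter: records every instruction it receives
  static class Recorder implements InsnVisitor {
    final List<String> out = new ArrayList<>();
    public void visitMethodInsn(int opcode, String owner, String name, String desc) {
      out.add(opcode + " " + owner + "." + name + desc);
    }
    public void visitInsn(int opcode) {
      out.add(String.valueOf(opcode));
    }
  }

  // Plays the role of MethodAdapter: forwards everything by default
  static class Adapter implements InsnVisitor {
    protected final InsnVisitor mv;
    Adapter(InsnVisitor mv) { this.mv = mv; }
    public void visitMethodInsn(int opcode, String owner, String name, String desc) {
      mv.visitMethodInsn(opcode, owner, name, desc);
    }
    public void visitInsn(int opcode) { mv.visitInsn(opcode); }
  }

  // Same matching and rewriting rule as ChangeMethodCallAdapter
  static class ChangeCall extends Adapter {
    ChangeCall(InsnVisitor mv) { super(mv); }
    @Override
    public void visitMethodInsn(int opcode, String owner, String name, String desc) {
      if ("api/API".equals(owner) && "foo".equals(name) && "(Ljava/lang/String;)V".equals(desc)) {
        mv.visitMethodInsn(opcode, owner, name, "(Ljava/lang/String;)Ljava/lang/Object;");
        mv.visitInsn(POP);
      } else {
        mv.visitMethodInsn(opcode, owner, name, desc);
      }
    }
  }

  // Sends one static call through the adapter and returns what reached the "writer"
  public static List<String> rewrite(String owner, String name, String desc) {
    Recorder rec = new Recorder();
    new ChangeCall(rec).visitMethodInsn(INVOKESTATIC, owner, name, desc);
    return rec.out;
  }

  public static void main(String[] args) {
    System.out.println(rewrite("api/API", "foo", "(Ljava/lang/String;)V"));
  }
}
```

Feeding the old call through ChangeCall yields the call with the new descriptor followed by 87 (pop), while calls to any other method reach the recorder unchanged.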

Conclusion

This article should by no means encourage you to be lazy with your API design and specification. It's always better to prevent problems like the one described in this article through good design and even better testing (e.g. signature tests of all publicly exposed methods). I also don't claim that the "hack" presented here is a good solution for the above problem; it was just fun to see what's possible in Java today with very little effort!

You can download the complete source code of this example, together with a self-explanatory Ant file, from here: hack.zip [9]

Acknowledgments

I want to thank Jari Aarniala for his very interesting, helpful and concise article "Instrumenting Java bytecode" which helped me a lot to get started with this topic!

References


[1] Instrumenting Java bytecode by Jari Aarniala


[2] Internals of Java Class Loading by Binildas Christudas


[3] Inside Class Loaders by Andreas Schaefer


[4] Managing Component Dependencies Using ClassLoaders by Don Schwarz


[5] ASM homepage


[6] ASM 3.0 A Java bytecode engineering library (tutorial in PDF format)


[7] BCEL The Byte Code Engineering Library


[8] SERP framework for manipulating Java bytecode


[9] hack.zip - the source code from this article


Comments

the aspectj weaver might be of interest to those not familiar with byte code manipulation; it provides load-time weaving (byte code manipulation at load time)...

Very nice work, Volker. This is a really helpful intro to the various tools--kudos, you couldn't have made it any simpler without ruining the effect. Cheers! Patrick