Disassembling serialized Java objects
Presenting Serialysis, a library that allows you to disassemble the serial form of Java objects. This can allow you to retrieve information about an object that is not available through its public API. It is also a useful tool when testing the serialization of your classes.
When the public API is not enough
My reason for writing this library is that I encountered a couple of problems where I found that I needed information from an object that was not available through its public API, but that was available through its serial form.
One example is if you have a stub for a remote RMI object, and you want to know what address it will connect to, or what port, or using what socket factory. The standard RMI API doesn't give you any way to extract this information from the stub. But the information is there, and it must be included when the stub is serialized so that the stub is usable when it is later deserialized. So if we could somehow parse the serialized stub we could get the information we want.
A second example comes from the JMX API. Queries to the MBean Server are represented by the interface QueryExp. QueryExp instances are constructed using the methods of the Query class. If you have an object implementing QueryExp, how can you know what query it executes? The JMX API doesn't include any method to find out. The information must be present in the serial form, so that when a client sends a query to a remote server it can be reconstituted on the server. If we could look at the serial form, we could find out what the query was.
This second example is what prompted me to write this library. The existing standard JMX connectors are based on Java serialization, so they don't need to do anything special for QueryExps. But the new Web Services Connector being defined by JSR 262 uses XML for serialization. How can it analyze a QueryExp in order to convert it into XML? The answer is that the WS Connector uses a version of this library to look at the Java-serialized QueryExp.
What these examples have in common is that they illustrate gaps in the relevant APIs. There ought to be methods that allow you to extract the information contained in an RMI stub. There ought to be methods that convert back from a QueryExp object to the original Query methods that constructed it. (Even a standardized parseable toString() would be enough.) But those methods aren't there today, and if we want code that works with those APIs as they are now, we need another approach.
Grabbing the private fields of objects
If you have the source code of the classes you're interested
in, it's tempting just to barrel in and grab the information you
need. In the RMI stub example, we can find out by experiment that
the stub's getRef()
method returns a sun.rmi.server.UnicastRef, and by
studying the JDK source we might be able to figure out that this
class contains a field ref of type
sun.rmi.transport.LiveRef with the information we
need. So we might end up with code like this:
// This is NOT a good idea!!!
import sun.rmi.server.*;
import sun.rmi.transport.*;
import java.lang.reflect.Field;
import java.rmi.*;
import java.rmi.server.*;

public class StubDigger {
    public static int getPort(RemoteStub stub) throws Exception {
        RemoteRef ref = stub.getRef();
        UnicastRef uref = (UnicastRef) ref;
        // Reach into a private, undocumented field of a sun.* class
        Field refField = UnicastRef.class.getDeclaredField("ref");
        refField.setAccessible(true);
        LiveRef lref = (LiveRef) refField.get(uref);
        return lref.getPort();
    }
}
You might be satisfied with this, but you shouldn't be. The
sun.* imports and the reflection calls are full of horrors. First of all, you should
never depend on sun.* classes, because
there's no guarantee they won't change unrecognizably in any JDK
update, plus of course your code probably won't be portable to
platforms other than the JDK. Secondly, it's a huge red flag
when you see Field.setAccessible
being called. That means the code is depending on undocumented
fields, which again could change between releases, or, worse,
which might continue to exist but with subtly different
semantics.
(The above code was written for JDK 5. It turns out that in
JDK 6, LiveRef acquires a public getPort() method, so you no
longer need Field.setAccessible. But you still need to depend
on sun.* classes.)
Well, sometimes you can't do any better than this. But if the class you're interested in is serializable, often you can. The reason is that the serial form of a class is part of its public interface. If the API is any good at all then its public interfaces will evolve compatibly in every update. This is a very strong requirement on the JDK platform in particular.
So if the information you need isn't available through a class's public methods, but is part of the documented serial form, then you can rely on it remaining in the serial form in the future.
The serial form is included in the Javadoc output as part of the See Also for each serializable class. You can see the serial form of all public JDK classes in a single giant page.
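If you want a quick look at a class's serializable fields without opening the Javadoc, the standard java.io.ObjectStreamClass can list them at run time. The following is only a sketch: the class name SerialFieldLister and its method are made up for this example and are not part of Serialysis. It reports what the running JDK actually serializes, which should agree with the documented serial form.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Serialysis or the JDK.
final class SerialFieldLister {
    /** Returns "type name" for each serializable field declared by the class itself. */
    static List<String> serialFields(Class<?> c) {
        List<String> result = new ArrayList<>();
        ObjectStreamClass desc = ObjectStreamClass.lookup(c);
        if (desc == null) {
            return result; // the class is not serializable
        }
        for (ObjectStreamField f : desc.getFields()) {
            result.add(f.getType().getName() + " " + f.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(serialFields(Integer.class)); // prints [int value]
    }
}
```

Note that this only shows serializable fields. Anything a class writes from its own writeObject method does not appear here, so the @serialData documentation is still needed for that part.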
Enter Serialysis
My library to parse serialized objects is called Serialysis, the result of cramming the words "serial analysis" too close together.
Here's a simple example of what it looks like in action. This code...
SEntity sint = SerialScan.examine(new Integer(5));
System.out.println(sint);
...produces this output...
SObject(java.lang.Integer){
    value = SPrim(int){5}
}
This tells us that the java.lang.Integer that we
gave to SerialScan.examine serializes as an object
with a single field value of type int.
If we check out the documented
serialized form of java.lang.Integer we can see
that this is indeed what is expected.
If you check out the source code of
java.lang.Integer, you'll see that the class itself
also has a single field value of type
int:
/**
 * The value of the <code>Integer</code>.
 *
 * @serial
 */
private final int value;
But private fields are an implementation detail. An update
could rename this field, or replace it with a new field
inherited from the parent class java.lang.Number,
or whatever. There's no guarantee that that won't happen, but
there is a guarantee that the serial form will remain
the same. Serialization provides mechanisms
to keep the serial form the same even when the class's fields
change.
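One of those mechanisms is the serialPersistentFields declaration, combined with ObjectOutputStream.putFields and ObjectInputStream.readFields. Here is a minimal sketch, assuming an invented class Celsius whose serial form is a single double called degrees even though the value is really stored in a long. The class and its field names are purely illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;

// Hypothetical class: the serial form stays "double degrees" whatever the real fields are.
final class Celsius implements Serializable {
    private static final long serialVersionUID = 1L;

    // The declared serial form, independent of the actual fields below.
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("degrees", double.class)
    };

    // Implementation detail that is free to change: thousandths of a degree.
    private long millis;

    Celsius(double degrees) {
        this.millis = Math.round(degrees * 1000);
    }

    double degrees() {
        return millis / 1000.0;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        ObjectOutputStream.PutField fields = out.putFields();
        fields.put("degrees", degrees());
        out.writeFields();
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        ObjectInputStream.GetField fields = in.readFields();
        millis = Math.round(fields.get("degrees", 0.0) * 1000);
    }

    /** Serializes and deserializes the given object in memory. */
    static Celsius roundTrip(Celsius c) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(c);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Celsius) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Celsius(21.5)).degrees());
    }
}
```

If a later version of this class switched from millis to some other representation, only writeObject and readObject would need to change. The stream would still carry one double named degrees.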
Here's a more complicated example. Suppose that, for some reason, we want to know how big the array in an ArrayList is. The API doesn't allow us to find out, though it does allow us to force the array to be at least a certain size.
If we check the serial
form of ArrayList, we see that it does contain the information
we're looking for. There's a serialized field
size, which is the number of elements in the list.
That's not what we want. But the Serial Data in the
writeObject method does have what we want:
- Serial Data: The length of the array backing the ArrayList instance is emitted (int), followed by all of its elements (each an Object) in the proper order.
If we execute this code...
List<Integer> list = new ArrayList<Integer>();
list.add(5);
SObject slist = (SObject) SerialScan.examine(list);
System.out.println(slist);
...we get this output...
SObject(java.util.ArrayList){
    size = SPrim(int){1}
    -- data written by class's writeObject:
    SBlockData(blockdata){4 bytes of binary data}
    SObject(java.lang.Integer){
        value = SPrim(int){5}
    }
}
This is where we get into the gory details of serialization.
In addition to, or instead of, serializing an object's fields, its
class can declare a method
writeObject(ObjectOutputStream) that writes arbitrary
data to the serial stream using methods like ObjectOutputStream.writeInt.
It must declare a corresponding readObject that reads
the same data, and it should document via a @serialData
tag what the writeObject method writes, as
ArrayList does.
The writeObject data is accessible in Serialysis
through the method SObject.getAnnotations(), which
returns a List<SEntity>. Each Object that
was written via the method ObjectOutputStream.writeObject(Object)
appears as an SObject in this list. Each chunk of
data written by one or more consecutive calls to the methods
that ObjectOutputStream gets from DataOutput
(writeInt, writeUTF, etc.) appears as an SBlockData. The serial stream
doesn't include enough information to separate out individual
items within the chunk; that information is an agreement between
writer and reader that is documented by the
@serialData tag.
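To make the writer/reader agreement concrete, here is a small sketch of a class with its own writeObject and readObject. The class Tally and everything in it are invented for illustration. The point is that readObject has to read exactly what writeObject wrote, in the same order, and that the @serialData comment is the only place where that layout is spelled out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class used only to illustrate custom writeObject data.
final class Tally implements Serializable {
    private static final long serialVersionUID = 1L;

    // Both fields are transient, so nothing is serialized by default.
    private transient int[] counts;
    private transient String label;

    Tally(String label, int... counts) {
        this.label = label;
        this.counts = counts.clone();
    }

    String label() { return label; }

    int total() {
        int sum = 0;
        for (int c : counts) sum += c;
        return sum;
    }

    /**
     * @serialData The number of counts (int), followed by each count (int),
     *             followed by the label (a String, written with writeObject).
     */
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(counts.length);            // consecutive primitive writes...
        for (int c : counts) out.writeInt(c);   // ...are grouped together as block data
        out.writeObject(label);                 // an object is a separate item in the stream
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        counts = new int[in.readInt()];         // must mirror writeObject exactly
        for (int i = 0; i < counts.length; i++) counts[i] = in.readInt();
        label = (String) in.readObject();
    }

    /** Serializes and deserializes the given object in memory. */
    static Tally roundTrip(Tally t) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(t);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Tally) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Tally copy = roundTrip(new Tally("votes", 3, 4, 5));
        System.out.println(copy.label() + " " + copy.total());
    }
}
```

Going by the description above, a tool like Serialysis would be expected to show the ints written here as an SBlockData and the label as a separate entry, though that is an inference from the text rather than something this example checks.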
Based on the ArrayList documentation, we can find the size of the array like this:
SObject slist = (SObject) SerialScan.examine(list);
List<SEntity> writeObjectData = slist.getAnnotations();
SBlockData data = (SBlockData) writeObjectData.get(0);
DataInputStream din = data.getDataInputStream();
int alen = din.readInt();
System.out.println("Array length: " + alen);
How Serialysis solves my example problems
Without showing all the details of the code, here's the outline of the solution to the QueryExp problem I mentioned. Suppose I have a QueryExp constructed like this:
QueryExp query =
    Query.or(Query.gt(Query.attr("Version"), Query.value(5)),
             Query.eq(Query.attr("SupportsSpume"), Query.value(true)));
This means, "MBeans where the Version attribute is
greater than 5 or the SupportsSpume attribute is
true". The toString() of this query in the JDK looks like
this:
((Version) > (5)) or ((SupportsSpume) = (true))
The result of SerialScan.examine looks like this:
SObject(javax.management.OrQueryExp){
    exp1 = SObject(javax.management.BinaryRelQueryExp){
        relOp = SPrim(int){0}
        exp1 = SObject(javax.management.AttributeValueExp){
            attr = SString(String){"Version"}
        }
        exp2 = SObject(javax.management.NumericValueExp){
            val = SObject(java.lang.Long){
                value = SPrim(long){5}
            }
        }
    }
    exp2 = SObject(javax.management.BinaryRelQueryExp){
        relOp = SPrim(int){4}
        exp1 = SObject(javax.management.AttributeValueExp){
            attr = SString(String){"SupportsSpume"}
        }
        exp2 = SObject(javax.management.BooleanValueExp){
            val = SPrim(boolean){true}
        }
    }
}
You can imagine code that descends into this structure producing an XML equivalent. Every conformant implementation of the JMX API is required to produce this same serial form, so the code that parses it is guaranteed to work everywhere.
Now here's the code that solves the RMI stub port number problem:
public static int getPort(RemoteStub stub) throws IOException {
    SObject sstub = (SObject) SerialScan.examine(stub);
    List<SEntity> writeObjectData = sstub.getAnnotations();
    SBlockData sdata = (SBlockData) writeObjectData.get(0);
    DataInputStream din = sdata.getDataInputStream();
    String type = din.readUTF();
    if (type.equals("UnicastRef"))
        return getPortUnicastRef(din);
    else if (type.equals("UnicastRef2"))
        return getPortUnicastRef2(din);
    else
        throw new IOException("Can't handle ref type " + type);
}

private static int getPortUnicastRef(DataInputStream din) throws IOException {
    String host = din.readUTF();
    return din.readInt();
}

private static int getPortUnicastRef2(DataInputStream din) throws IOException {
    byte hasCSF = din.readByte();
    String host = din.readUTF();
    return din.readInt();
}
To understand this, you need to see the serial
form for RemoteObject. This code is
admittedly difficult, but it is portable and futureproof. It
should be fairly clear how to extract the other information I
mentioned from RMI stubs using the same approach.
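If you want to experiment with the parsing logic without a real stub or Serialysis on hand, you can fake the block data. The sketch below is an assumption-laden stand-in: the class RefBlockDemo and its methods are invented, and fakeRefData only writes the leading items that the code above reads (ref type name, optional socket-factory flag byte, host, port), not the rest of what a real stub contains.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo: simulates only the first few items of a stub's block data.
final class RefBlockDemo {
    /** Writes ref type, optional flag byte, host, and port in the order the reader expects. */
    static byte[] fakeRefData(String type, String host, int port) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(type);
            if (type.equals("UnicastRef2")) {
                out.writeByte(0); // flag byte: 0 here stands for "no client socket factory"
            }
            out.writeUTF(host);
            out.writeInt(port);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the items back in the same order and returns the port. */
    static int getPort(byte[] data) {
        try {
            DataInputStream din = new DataInputStream(new ByteArrayInputStream(data));
            String type = din.readUTF();
            if (type.equals("UnicastRef2")) {
                din.readByte();  // skip the socket-factory flag
            } else if (!type.equals("UnicastRef")) {
                throw new IOException("Can't handle ref type " + type);
            }
            din.readUTF();       // skip the host name
            return din.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(getPort(fakeRefData("UnicastRef", "localhost", 1099)));
        System.out.println(getPort(fakeRefData("UnicastRef2", "localhost", 2099)));
    }
}
```

The same pattern, reading the host string instead of skipping it, would be the starting point for extracting the address.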
Conclusions
You really don't want to get into disassembling serial forms unless you have to. But if you do have to, then Serialysis should make your task a little less painful.
It's also a good way to check that your own classes serialize the way you expect them to.
Download
You can download the Serialysis library at http://weblogs.java.net/blog/emcmanus/serialysis.zip.
[Tags: jmx, rmi, serialization.]
- emcmanus's blog





