
Cloning Java objects using serialization

Posted by emcmanus on April 4, 2007 at 8:43 AM PDT

Sometimes you need to clone objects, and sometimes you can't
use their clone method, and sometimes serialization provides an
alternative. Here's an explanation of when you might need this
exotic and expensive technique, and how you can use it.

When do you need to clone?

The commonest time that you need to clone an object is when it
is a parameter or return value of one of your public methods.
If it is a parameter that you save somewhere, then you don't
want the caller to be able to modify it later. So you save a
copy of the object. Likewise, if you are returning an object
that is part of your class's internal state, you need to return
a copy instead so that callers can't accidentally or
deliberately change that internal state.
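A minimal sketch of both defensive copies, assuming a hypothetical Meeting class that stores a java.util.Date (a mutable class that happens to have a public clone() method):

```java
import java.util.Date;

// Hypothetical class: keeps defensive copies of a mutable Date.
final class Meeting {
    private final Date start;

    Meeting(Date start) {
        // Save a copy, so the caller can't change our state later through its own reference.
        this.start = (Date) start.clone();
    }

    Date getStart() {
        // Return a copy, so callers can't change our internal state through the result.
        return (Date) start.clone();
    }
}
```

Without the two clone() calls, a caller could move the meeting by calling setTime on either the Date it passed in or the Date it got back.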

A second case when you might want to clone is when you need to
modify an object, but you don't know who else might have a
reference to it. So you make a copy and modify that.

[In both cases, if the object in question is of one of your
classes, you should ask yourself whether it could be immutable
(http://en.wikipedia.org/wiki/Immutable) instead. Immutability
has many advantages regarding performance, security, and
thread-safety.]
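As a hedged illustration, here is a hypothetical immutable class (the name CityList is made up). It copies its input once, then never changes, so it can hand out its state without any cloning:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable class: its state is fixed at construction time.
final class CityList {
    private final List<String> cities;

    CityList(List<String> cities) {
        // Copy once on the way in, then wrap the copy so nobody can modify it.
        this.cities = Collections.unmodifiableList(new ArrayList<String>(cities));
    }

    List<String> getCities() {
        // No copy needed: the returned view rejects modification with UnsupportedOperationException.
        return cities;
    }
}
```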

Can't I just use Object.clone()?

Happily, java.lang.Object defines a clone() method
(http://java.sun.com/javase/6/docs/api/java/lang/Object.html#clone())
whose intent is exactly to produce a copy of the object
on which it is called. Unhappily, this method, which dates from
the very earliest days of the Java platform, has some design
flaws.

The first problem is that the method is protected. The idea,
presumably, is that subclasses have to explicitly agree to be
cloneable by overriding this protected method with a public
method. The various Collections classes all do this, for
example ArrayList
(http://java.sun.com/javase/6/docs/api/java/util/ArrayList.html#clone()),
TreeSet
(http://java.sun.com/javase/6/docs/api/java/util/TreeSet.html#clone()),
and IdentityHashMap
(http://java.sun.com/javase/6/docs/api/java/util/IdentityHashMap.html#clone()).
The subclass also has to implement Cloneable
(http://java.sun.com/javase/6/docs/api/java/lang/Cloneable.html)
for the default cloning mechanism in Object.clone() to work.
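For illustration, here is a sketch of that opt-in idiom on a hypothetical Point class: implement Cloneable, override the protected method with a public one, and let Object.clone() do the field-by-field copy. (The covariant return type Point needs Java 5 or later.)

```java
// Hypothetical class showing the usual way to make a class cloneable.
class Point implements Cloneable {  // without Cloneable, Object.clone() throws CloneNotSupportedException
    int x, y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public Point clone() {  // widen access to public and narrow the return type
        try {
            return (Point) super.clone();  // shallow, field-by-field copy made by Object.clone()
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);  // can't happen: this class implements Cloneable
        }
    }
}
```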

If you have an object that you know has a public clone()
method, but you don't know the type of the object at compile time,
you are stuck. Say x is declared as an Object. You can't just
call x.clone(), because Object.clone() is protected. If Cloneable
defined a public clone() method, then you could use ((Cloneable)
x).clone(). But it doesn't. So you either have to enumerate all the classes that you think x could be...

Object copy;
if (x instanceof ArrayList)
    copy = ((ArrayList<?>) x).clone();
else if (x instanceof IdentityHashMap)
    copy = ((IdentityHashMap<?, ?>) x).clone();
else
    ...

...or you have to resort to reflection...

Object copy;
try {
    Method clone = x.getClass().getMethod("clone");
    copy = clone.invoke(x);
} catch (Exception e) {
    ...what?...
}

Both solutions are pretty nasty.

A second potential problem is that the default behaviour of
Object.clone() is to make a shallow copy of the object.
Most system classes that provide a clone() method do this too.
A shallow copy means that the copy object itself is different,
but if the original object referenced other objects, the copy
will reference those same objects, and not a copy of them. For
example, what does the following code print?

    HashMap<String, List<String>> cities = new HashMap<String, List<String>>();
    cities.put("France", Arrays.asList("Paris", "Grenoble"));
    HashMap<String, List<String>> citiesClone = (HashMap) cities.clone();
    citiesClone.get("France").set(0, "Dublin");
    System.out.println(cities.get("France"));

Of course, it prints "[Dublin, Grenoble]". The original
cities object has been modified through its clone,
because the clone operation does not clone the list in each
entry.

Cloning through serialization

One solution to these problems is to clone using serialization.
Usually, serialization is used to send objects off somewhere (into
a file or over the network) so that somebody else can reconstruct
them later. But you can abuse it to reconstruct the object
yourself immediately. If the object is serializable at all, then
the reconstruction should be a faithful copy. In normal uses of
serialization, the original object is nowhere near; it could be on
the other side of the world at the far end of a network
connection. So you can be sure that changing the copy will have
no effect on the original.
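The basic idea fits in a few lines. This is a minimal sketch with a hypothetical helper named copy: it uses the stock ObjectOutputStream and ObjectInputStream, so it has none of the class-loading refinements of the SerialClone class at the end of this article. Applied to the cities example above, the copy no longer shares its lists with the original.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;

class NaiveSerialClone {
    // Hypothetical helper: serialize x into memory, then immediately deserialize a new graph.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T x) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
            out.writeObject(x);  // writes x and everything reachable from it
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bout.toByteArray()))) {
            return (T) in.readObject();  // every object in the result is freshly created
        }
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, List<String>> cities = new HashMap<String, List<String>>();
        cities.put("France", Arrays.asList("Paris", "Grenoble"));
        HashMap<String, List<String>> citiesCopy = copy(cities);
        citiesCopy.get("France").set(0, "Dublin");
        System.out.println(cities.get("France"));      // prints [Paris, Grenoble]
        System.out.println(citiesCopy.get("France"));  // prints [Dublin, Grenoble]
    }
}
```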

Before going any further, I should caution that this technique
is not to be used lightly. First of all, serialization is
hugely expensive. It could easily be a hundred times more
expensive than the clone() method. Secondly, not all objects
are serializable. Thirdly, making a class serializable is
tricky, and not all classes can be relied on to get it right.
(You can assume that system classes are correct, though.)
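To see the second limitation concretely, here is a small sketch with a hypothetical Holder class. Holder is itself Serializable, but one of its fields refers to a plain java.lang.Object, which is not, so serialization fails with NotSerializableException before any copy can be made.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class: Serializable itself, but it references an object that is not.
class Holder implements Serializable {
    private static final long serialVersionUID = 1L;
    Object payload = new Object();  // java.lang.Object does not implement Serializable
}

class NotSerializableDemo {
    public static void main(String[] args) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
        try {
            out.writeObject(new Holder());
        } catch (NotSerializableException e) {
            // By default the exception message is the name of the offending class.
            System.out.println("cannot serial-clone: " + e.getMessage());
        }
    }
}
```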

Class-loading subtleties

When an object is being deserialized, the platform has to be
able to find its class in order to construct an instance of that
class. Imagine that you're deserializing an object you received
over the network. If it's a com.example.Foo, the
serialization framework is going to have to find that class
somehow. How does it do it?

The answer is that it uses the ClassLoader of the code that is
doing the deserialization. So if I define a class
SerialClone that serializes and deserializes an
object, then by default the class of that object, and the
classes of other objects it references, need to be known to the
ClassLoader of SerialClone.

In simple cases, this will always be true. Every class of
interest is on the classpath, including SerialClone
and the class of any object it might be asked to copy.

In more complicated environments, with several ClassLoaders,
this default behaviour is not necessarily what you want. In a
web server, for example, typically every web app has its own
ClassLoader. If the web server wanted to serial-clone an object
it got from a web app, it would need to reconstruct the object
using the web app's ClassLoader. How could it do that?

The answer is that it can use class annotations. RMI
uses these to write information into the serial stream that
tells the remote partner where it can find the classes of
objects that are being sent. The information is the appropriate
URL to download the class from.

We don't need anything nearly as complicated as that. We
already have the class locally. We know the class when we're
serializing. We just need to figure out how to recover it when
deserializing.

The first time any class is referenced from an object being serialized, ObjectOutputStream calls its
annotateClass method. Correspondingly, the first time a
class is referenced from an object being deserialized,
ObjectInputStream calls its resolveClass method
(http://java.sun.com/javase/6/docs/api/java/io/ObjectInputStream.html#resolveClass(java.io.ObjectStreamClass)).

So we can create subclasses of ObjectOutputStream and
ObjectInputStream that override these methods. Our
annotateClass method will simply record the class in a list; it
won't actually write anything into the stream. resolveClass
is called at the same point in deserialization as annotateClass
is called in serialization. So resolveClass can simply consume
the classes one by one from the list where annotateClass
recorded them.
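To watch the ordering that this scheme relies on, here is a sketch with a hypothetical ClassRecorder subclass that only records the classes passed to annotateClass. Writing a list of three Integers should record java.util.ArrayList first and java.lang.Integer only once, because a class descriptor is written the first time its class is met and referred to by a handle afterwards. (Strings never appear in the record: the stream writes them in a special form that has no class descriptor.)

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical subclass: remembers every class handed to annotateClass, in order.
class ClassRecorder extends ObjectOutputStream {
    final List<Class<?>> seen = new ArrayList<Class<?>>();

    ClassRecorder(OutputStream out) throws IOException {
        super(out);
    }

    @Override
    protected void annotateClass(Class<?> c) {
        seen.add(c);  // nothing extra is written to the stream
    }

    public static void main(String[] args) throws IOException {
        ClassRecorder out = new ClassRecorder(new ByteArrayOutputStream());
        out.writeObject(new ArrayList<Integer>(Arrays.asList(1, 2, 3)));
        System.out.println("recorded: " + out.seen);
        System.out.println("Integer recorded "
                + Collections.frequency(out.seen, Integer.class) + " time(s)");
    }
}
```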

Deep modifications in the cloned object

Another method that you can override in ObjectOutputStream is
href="http://java.sun.com/javase/6/docs/api/java/io/ObjectOutputStream.html#replaceObject(java.lang.Object)">replaceObject.
This allows you to replace one object with another. For
example, suppose you wanted to change the string "foo" into the
string "bar" everywhere it occurs within an object. Your
ObjectOutputStream subclass might look like this:

class MyObjectOutputStream extends ObjectOutputStream {
    MyObjectOutputStream(OutputStream out) throws IOException {
        super(out);
        enableReplaceObject(true);
    }

    protected Object replaceObject(Object obj) throws IOException {
        if (obj.equals("foo"))
            return "bar";
        else
            return super.replaceObject(obj);
    }

    ...
}

This will change "foo" into "bar" even if it occurs deep inside
the object being serialized, for example in any of the Strings in
a Map<String, List<String>>. Don't forget to call
enableReplaceObject(true), or nothing will happen!
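Here is a runnable sketch of the same substitution; the names ReplacingOutputStream and copyReplacing are hypothetical. It serializes a map whose key and list both contain "foo" and reads the result back with a plain ObjectInputStream, so it ignores the class-loading refinements described earlier.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FooToBar {
    // Same idea as MyObjectOutputStream: every "foo" in the object graph is written as "bar".
    static class ReplacingOutputStream extends ObjectOutputStream {
        ReplacingOutputStream(OutputStream out) throws IOException {
            super(out);
            enableReplaceObject(true);  // without this, replaceObject is never called
        }

        @Override
        protected Object replaceObject(Object obj) throws IOException {
            return obj.equals("foo") ? "bar" : super.replaceObject(obj);
        }
    }

    // Hypothetical helper: serial-clone x, applying the substitution on the way out.
    static Object copyReplacing(Object x) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        ObjectOutputStream out = new ReplacingOutputStream(bout);
        out.writeObject(x);
        out.flush();
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bout.toByteArray()));
        return in.readObject();
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> m = new HashMap<String, List<String>>();
        m.put("foo", new ArrayList<String>(Arrays.asList("foo", "baz")));
        System.out.println(copyReplacing(m));  // prints {bar=[bar, baz]}
        System.out.println(m);                 // prints {foo=[foo, baz]}: the original is untouched
    }
}
```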

Of course the ability to make arbitrary changes to object
contents is potentially very dangerous. Proceed with caution.
For this reason, unprivileged code cannot turn on object
replacement: if there is a SecurityManager, calling
enableReplaceObject(true) requires
SerializablePermission("enableSubstitution")
(http://java.sun.com/javase/6/docs/api/java/io/SerializablePermission.html).

The code

Here's a basic class that clones using serialization. Once
again, only use this as a last resort, for all the reasons
mentioned above.

Usage is copy = SerialClone.clone(object).

package serialclone;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.OutputStream;
import java.util.LinkedList;
import java.util.Queue;

public class SerialClone {
    public static <T> T clone(T x) {
        try {
            return cloneX(x);
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(e);
        }
    }

    private static <T> T cloneX(T x) throws IOException, ClassNotFoundException {
        // Serialize x into memory, recording each class as the stream meets it.
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        CloneOutput cout = new CloneOutput(bout);
        cout.writeObject(x);
        cout.flush();  // make sure every buffered byte has reached bout
        byte[] bytes = bout.toByteArray();

        // Deserialize from those bytes, resolving classes from the recorded queue.
        ByteArrayInputStream bin = new ByteArrayInputStream(bytes);
        CloneInput cin = new CloneInput(bin, cout);

        @SuppressWarnings("unchecked")  // thanks to Bas de Bakker for the tip!
        T clone = (T) cin.readObject();
        return clone;
    }

    // Records every class and proxy class in the order serialization meets it.
    private static class CloneOutput extends ObjectOutputStream {
        Queue<Class<?>> classQueue = new LinkedList<Class<?>>();

        CloneOutput(OutputStream out) throws IOException {
            super(out);
        }

        @Override
        protected void annotateClass(Class<?> c) {
            classQueue.add(c);  // nothing is written to the stream itself
        }

        @Override
        protected void annotateProxyClass(Class<?> c) {
            classQueue.add(c);
        }
    }

    // Resolves each class by taking the next one recorded by CloneOutput,
    // so the copy uses the very same Class objects (and ClassLoaders) as the original.
    private static class CloneInput extends ObjectInputStream {
        private final CloneOutput output;

        CloneInput(InputStream in, CloneOutput output) throws IOException {
            super(in);
            this.output = output;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass osc)
                throws IOException, ClassNotFoundException {
            Class<?> c = output.classQueue.poll();
            String expected = osc.getName();
            String found = (c == null) ? null : c.getName();
            if (!expected.equals(found)) {
                // Sanity check: reading should meet classes in the same order as writing did.
                throw new InvalidClassException("Classes desynchronized: " +
                        "found " + found + " when expecting " + expected);
            }
            return c;
        }

        @Override
        protected Class<?> resolveProxyClass(String[] interfaceNames)
                throws IOException, ClassNotFoundException {
            return output.classQueue.poll();
        }
    }
}

Comments


Very useful. Worked first time as written when applied to my existing "Serializable" objects. Was also able to make very good use of "Deep Modifications" via an override of the replaceObject method (looking for all instanceof String and replacing substrings that represent variables with the variable values), as follows:

Map<String,String> variables;
...
@Override
protected Object replaceObject(Object obj) throws IOException {
    if(obj instanceof String && variables != null && variables.size() > 0) {
        String newString = (String)obj;
        for (Map.Entry<String, String> variable : variables.entrySet()) {
            newString = newString.replace("[" + variable.getKey() + "]", variable.getValue());
        }
        return newString;
    }
    return super.replaceObject(obj);
}

Good article

Very useful article, albeit complicated for a programmer who hasn't serialized objects in the past. I'll definitely consider the serialization route next time I need to do clone().