Eamonn McManus

Eamonn McManus's Blog

Cloning Java objects using serialization

Posted by emcmanus on April 04, 2007 at 08:43 AM | Comments (9)

Sometimes you need to clone objects, and sometimes you can't use their clone method, and sometimes serialization provides an alternative. Here's an explanation of when you might need this exotic and expensive technique, and how you can use it.

When do you need to clone?

The commonest time that you need to clone an object is when it is a parameter or return value of one of your public methods. If it is a parameter that you save somewhere, then you don't want the caller to be able to modify it later. So you save a copy of the object. Likewise, if you are returning an object that is part of your class's internal state, you need to return a copy instead so that callers can't accidentally or deliberately change that internal state.
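As a rough illustration of both directions, here is a minimal sketch of defensive copying. The Meeting class and its fields are hypothetical, invented for this example. It uses copy constructors rather than clone(), which sidesteps the problems described below.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hypothetical class, for illustration only.
final class Meeting {
    private final Date start;
    private final List<String> attendees;

    Meeting(Date start, List<String> attendees) {
        // Copy the parameters so that later changes made by the caller
        // do not affect our saved state.
        this.start = new Date(start.getTime());
        this.attendees = new ArrayList<String>(attendees);
    }

    Date getStart() {
        // Return a copy so that the caller cannot change our internal state.
        return new Date(start.getTime());
    }

    List<String> getAttendees() {
        // Strings are immutable, so copying the list itself is enough here.
        return new ArrayList<String>(attendees);
    }
}
```

Modifying the Date or List that was passed in, or the ones that come back from the getters, leaves the Meeting unchanged.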

A second case when you might want to clone is when you need to modify an object, but you don't know who else might have a reference to it. So you make a copy and modify that.

[In both cases, if the object in question is of one of your classes, you should ask yourself if it couldn't be immutable instead. Immutability has many advantages regarding performance, security, and thread-safety.]

Can't I just use Object.clone()?

Happily, java.lang.Object defines a clone() method whose intent is exactly to produce a copy of the object on which it is called. Unhappily, this method, which dates from the very earliest days of the Java platform, has some design flaws.

The first problem is that the method is protected. The idea, presumably, is that subclasses have to explicitly agree to be cloneable by overriding this protected method with a public method. Many of the Collections classes do this, for example ArrayList, TreeSet, and IdentityHashMap. The subclass also has to implement Cloneable for the default cloning mechanism in Object.clone() to work.

If you have an object that you know has a public clone() method, but you don't know the type of the object at compile-time, you are stuck. Say x is declared as an Object. You can't just call x.clone(), because Object.clone() is protected. If Cloneable defined a public clone() method, then you could use ((Cloneable) x).clone(). But it doesn't. So you either have to enumerate all the classes that you think x could be...

Object copy;
if (x instanceof ArrayList)
    copy = ((ArrayList<?>) x).clone();
else if (x instanceof IdentityHashMap)
    copy = ((IdentityHashMap<?, ?>) x).clone();
else
    ...

...or you have to resort to reflection...

Object copy;
try {
    Method clone = x.getClass().getMethod("clone");
    copy = clone.invoke(x);
} catch (Exception e) {
    ...what?...
}

Both solutions are pretty nasty.
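For completeness, here is a runnable sketch of the reflective approach. The ReflectiveClone class and its method name are invented for this example, and the error handling is just one possible policy (wrap everything in IllegalArgumentException), not a recommendation.

```java
import java.lang.reflect.Method;

// Hypothetical helper, for illustration only.
final class ReflectiveClone {
    static Object cloneReflectively(Object x) {
        try {
            // getMethod only finds public methods, so this fails for
            // classes that have not made clone() public.
            Method clone = x.getClass().getMethod("clone");
            return clone.invoke(x);
        } catch (ReflectiveOperationException e) {
            // Assumption: treat "cannot be cloned" as a bad argument.
            throw new IllegalArgumentException("Cannot clone " + x.getClass(), e);
        }
    }
}
```

Calling cloneReflectively on an ArrayList returns a distinct but equal list; calling it on a plain new Object() throws, because Object.clone() is not public.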

A second potential problem is that the default behaviour of Object.clone() is to make a shallow copy of the object. Most system classes that provide a clone() method do this too. A shallow copy means that the copy object itself is different, but if the original object referenced other objects, the copy will reference those same objects, and not a copy of them. For example, what does the following code print?

    HashMap<String, List<String>> cities = new HashMap<String, List<String>>();
    cities.put("France", Arrays.asList("Paris", "Grenoble"));
    HashMap<String, List<String>> citiesClone = (HashMap) cities.clone();
    citiesClone.get("France").set(0, "Dublin");
    System.out.println(cities.get("France"));

Of course, it prints "[Dublin, Grenoble]". The original cities object has been modified through its clone, because the clone operation does not clone the list in each entry.
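If you know the exact shape of the object, you can of course write the deep copy by hand. A minimal sketch for this particular map type follows; the DeepCopyDemo class and deepCopy method are names invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class DeepCopyDemo {
    static Map<String, List<String>> deepCopy(Map<String, List<String>> original) {
        Map<String, List<String>> copy = new HashMap<String, List<String>>();
        for (Map.Entry<String, List<String>> e : original.entrySet()) {
            // Strings are immutable and can be shared; each list is copied.
            copy.put(e.getKey(), new ArrayList<String>(e.getValue()));
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, List<String>> cities = new HashMap<String, List<String>>();
        cities.put("France", Arrays.asList("Paris", "Grenoble"));
        Map<String, List<String>> copy = deepCopy(cities);
        copy.get("France").set(0, "Dublin");
        System.out.println(cities.get("France"));  // prints [Paris, Grenoble]
    }
}
```

This only works because the types are known in advance. It does not help when the object is an arbitrary graph of unknown classes.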

Cloning through serialization

One solution to these problems is to clone using serialization. Usually, serialization is used to send objects off somewhere (into a file or over the network) so that somebody else can reconstruct them later. But you can abuse it to reconstruct the object yourself immediately. If the object is serializable at all, then the reconstruction should be a faithful copy. In normal uses of serialization, the original object is nowhere near; it could be on the other side of the world at the far end of a network connection. So you can be sure that changing the copy will have no effect on the original.
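In its simplest form, ignoring the class-loading questions discussed later, the round trip might look like the sketch below. SimpleSerialCopy is a name invented for this example, and wrapping the checked exceptions in IllegalArgumentException is an assumption about how you would want failures reported.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;

// Hypothetical helper, for illustration only.
final class SimpleSerialCopy {
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T x) {
        try {
            // Serialize into memory...
            ByteArrayOutputStream bout = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bout);
            out.writeObject(x);
            out.close();

            // ...and immediately deserialize from the same bytes.
            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bout.toByteArray()));
            return (T) in.readObject();
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        HashMap<String, List<String>> cities = new HashMap<String, List<String>>();
        cities.put("France", Arrays.asList("Paris", "Grenoble"));
        HashMap<String, List<String>> copy = copy(cities);
        copy.get("France").set(0, "Dublin");
        System.out.println(cities.get("France"));  // prints [Paris, Grenoble]
    }
}
```

Unlike the HashMap.clone() example, the list inside the copy is a separate object, so the original is untouched.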

Before going any further, I should caution that this technique is not to be used lightly. First of all, serialization is hugely expensive. It could easily be a hundred times more expensive than the clone() method. Secondly, not all objects are serializable. Thirdly, making a class serializable is tricky and not all classes can be relied on to get it right. (You can assume that system classes are correct, though.)
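The second caution is easy to trip over, because a serializable container can hold non-serializable contents. Here is a small sketch that probes whether an object graph can be written at all; SerializableProbe and canSerialize are names invented for this example, and it pays the full cost of a serialization just to find out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

// Hypothetical helper, for illustration only.
final class SerializableProbe {
    static boolean canSerialize(Object x) {
        try {
            ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
            out.writeObject(x);
            out.close();
            return true;
        } catch (NotSerializableException e) {
            // Some object reachable from x does not implement Serializable.
            return false;
        } catch (IOException e) {
            // Any other failure while writing to memory is unexpected here.
            throw new IllegalStateException(e);
        }
    }
}
```

An ArrayList of Strings passes this check, but an ArrayList containing a plain new Object() does not, even though ArrayList itself is serializable.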

Class-loading subtleties

When an object is being deserialized, the platform has to be able to find its class in order to construct an instance of that class. Imagine that you're deserializing an object you received over the network. If it's a com.example.Foo, the serialization framework is going to have to find that class somehow. How does it do it?

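The answer is that, by default, it uses the ClassLoader of the code that is doing the deserialization. (More precisely, the default resolveClass looks for the nearest user-defined ClassLoader on the call stack, which is normally that of the code calling readObject.) So if I define a class SerialClone that serializes and deserializes an object, then by default the class of that object, and the classes of other objects it references, need to be known to the ClassLoader of SerialClone.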

In simple cases, this will always be true. Every class of interest is on the classpath, including SerialClone and the class of any object it might be asked to copy.

In more complicated environments, with several ClassLoaders, this default behaviour is not necessarily what you want. In a web server, for example, typically every web app has its own ClassLoader. If the web server wanted to serial-clone an object it got from a web app, it would need to reconstruct the object using the web app's ClassLoader. How could it do that?

The answer is that it can use class annotations. RMI uses these to write information into the serial stream that tells the remote partner where it can find the classes of objects that are being sent. The information is the appropriate URL to download the class from.

We don't need anything nearly as complicated as that. We already have the class locally. We know the class when we're serializing. We just need to figure out how to recover it when deserializing.

The first time any class is referenced from an object being serialized, ObjectOutputStream calls its annotateClass method. Correspondingly, the first time a class is referenced from an object being deserialized, ObjectInputStream calls its resolveClass method.

So we can create subclasses of ObjectOutputStream and ObjectInputStream that override these methods. Our annotateClass method will simply record the class in a list; it won't actually write anything into the stream. The resolveClass method is called at the same point in deserialization as annotateClass is called in serialization. So resolveClass can simply consume the classes one by one from the list where annotateClass recorded them.
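To convince yourself that the two hooks really are called in step, you can record the class names each one sees and compare them. The sketch below does only that; ClassOrderDemo and trace are names invented for this example, and resolveClass still delegates to the default lookup.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class ClassOrderDemo {
    /** Returns two lists: names seen by annotateClass, then by resolveClass. */
    static List<List<String>> trace(Serializable x) {
        final List<String> annotated = new ArrayList<String>();
        final List<String> resolved = new ArrayList<String>();
        try {
            ByteArrayOutputStream bout = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bout) {
                @Override
                protected void annotateClass(Class<?> c) {
                    annotated.add(c.getName());  // record, write nothing
                }
            };
            out.writeObject(x);
            out.close();

            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bout.toByteArray())) {
                @Override
                protected Class<?> resolveClass(ObjectStreamClass desc)
                        throws IOException, ClassNotFoundException {
                    resolved.add(desc.getName());
                    return super.resolveClass(desc);  // default lookup
                }
            };
            in.readObject();
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(e);
        }
        List<List<String>> result = new ArrayList<List<String>>();
        result.add(annotated);
        result.add(resolved);
        return result;
    }
}
```

For an ordinary object graph the two lists come out identical, which is what makes the queue-based approach possible.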

Deep modifications in the cloned object

Another method that you can override in ObjectOutputStream is replaceObject. This allows you to replace one object with another. For example, suppose you wanted to change the string "foo" into the string "bar" everywhere it occurs within an object. Your ObjectOutputStream subclass might look like this:

class MyObjectOutputStream extends ObjectOutputStream {
    MyObjectOutputStream(OutputStream out) throws IOException {
        super(out);
        enableReplaceObject(true);
    }

    @Override
    protected Object replaceObject(Object obj) throws IOException {
        if (obj.equals("foo"))
            return "bar";
        else
            return super.replaceObject(obj);
    }

    ...
}

This will change "foo" into "bar" even if it occurs deep inside the object being serialized, for example in any of the Strings in a Map<Integer, Set<String>>. Don't forget to call enableReplaceObject(true) or nothing will happen!
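Here is a self-contained sketch of that idea applied to a Map<Integer, Set<String>>. ReplaceDemo, FooToBarOutput and copyReplacingFoo are names invented for this example. It assumes there is no SecurityManager in the way, and it reads the result back with a plain ObjectInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class ReplaceDemo {
    private static class FooToBarOutput extends ObjectOutputStream {
        FooToBarOutput(OutputStream out) throws IOException {
            super(out);
            enableReplaceObject(true);  // without this, replaceObject is never called
        }

        @Override
        protected Object replaceObject(Object obj) {
            return "foo".equals(obj) ? "bar" : obj;
        }
    }

    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copyReplacingFoo(T x) {
        try {
            ByteArrayOutputStream bout = new ByteArrayOutputStream();
            ObjectOutputStream out = new FooToBarOutput(bout);
            out.writeObject(x);
            out.close();
            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bout.toByteArray()));
            return (T) in.readObject();
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        HashMap<Integer, Set<String>> map = new HashMap<Integer, Set<String>>();
        map.put(1, new HashSet<String>(Arrays.asList("foo", "baz")));
        Map<Integer, Set<String>> copy = copyReplacingFoo(map);
        System.out.println(copy.get(1).contains("bar"));  // prints true
        System.out.println(map.get(1).contains("foo"));   // prints true
    }
}
```

The replacement happens only in the serialized stream, so the original map still contains "foo".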

Of course the ability to make arbitrary changes to object contents is potentially very dangerous. Proceed with caution. For this reason, unprivileged code cannot enable replacement in this way; if there is a SecurityManager then calling enableReplaceObject(true) requires SerializablePermission("enableSubstitution").

The code

Here's a basic class that clones using serialization. Once again, only use this as a last resort, for all the reasons mentioned above.

Usage is copy = SerialClone.clone(object).

package serialclone;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.OutputStream;
import java.util.LinkedList;
import java.util.Queue;

public class SerialClone {
    public static <T> T clone(T x) {
        try {
            return cloneX(x);
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(e);
        }
    }

    private static <T> T cloneX(T x) throws IOException, ClassNotFoundException {
        // Serialize x into memory, recording each class as it is written.
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        CloneOutput cout = new CloneOutput(bout);
        cout.writeObject(x);
        byte[] bytes = bout.toByteArray();

        // Deserialize from the same bytes, reusing the recorded classes.
        ByteArrayInputStream bin = new ByteArrayInputStream(bytes);
        CloneInput cin = new CloneInput(bin, cout);

        @SuppressWarnings("unchecked")  // thanks to Bas de Bakker for the tip!
        T clone = (T) cin.readObject();
        return clone;
    }

    private static class CloneOutput extends ObjectOutputStream {
        // Classes in the order they were first referenced during serialization.
        Queue<Class<?>> classQueue = new LinkedList<Class<?>>();

        CloneOutput(OutputStream out) throws IOException {
            super(out);
        }

        @Override
        protected void annotateClass(Class<?> c) {
            classQueue.add(c);
        }

        @Override
        protected void annotateProxyClass(Class<?> c) {
            classQueue.add(c);
        }
    }

    private static class CloneInput extends ObjectInputStream {
        private final CloneOutput output;

        CloneInput(InputStream in, CloneOutput output) throws IOException {
            super(in);
            this.output = output;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass osc)
                throws IOException, ClassNotFoundException {
            // Take the next recorded class and check it is the one expected.
            Class<?> c = output.classQueue.poll();
            String expected = osc.getName();
            String found = (c == null) ? null : c.getName();
            if (!expected.equals(found)) {
                throw new InvalidClassException("Classes desynchronized: " +
                        "found " + found + " when expecting " + expected);
            }
            return c;
        }

        @Override
        protected Class<?> resolveProxyClass(String[] interfaceNames)
                throws IOException, ClassNotFoundException {
            return output.classQueue.poll();
        }
    }
}

Comments
Comments are listed in date ascending order (oldest first)

  • It should be fairly rare in a statically typed language that you don't know the type of an object. One solution is to declare a ReallyCloneable interface and make everything implement that, or shove a public Object clone() in an existing supertype. With clever generics tricks similar to how enums work, you could make it covariant too.

    Given the usual case, which is that you have an instance of some supertype, say, Animal, and its implementations make the clone method public, you could use the visitor pattern:


    Animal cloned = animal.accept(new AnimalVisitor<Animal>()
    {
        public Animal visit(Sheep sheep)
        {
            return sheep.clone(); //but it dies within a year!
        }

        public Animal visit(Dog dog)
        {
            return dog.clone();
        }
    });


    Adding it to a supertype, or adding a supertype, is definitely a better idea though.

    Posted by: ricky_clarkson on April 04, 2007 at 09:51 AM

  • The <Animal> in the previous post means the type that the visit methods return - generics make visitors more attractive.

    It's just a shame that methods that return Void actually have to return null, rather than being treated the same as methods that return void. Or maybe the shame is that generics type parameters can't be primitive.

    Posted by: ricky_clarkson on April 04, 2007 at 10:02 AM

  • Or you could use JBoss Serialization which was designed for solving this problem.

    Posted by: mister__m on April 10, 2007 at 11:43 AM

  • mister__m, JBoss Serialization was undoubtedly designed to solve some problem, but only peripherally and badly this one.

    "Smart cloning" is probably interesting in cases where you need to translate objects from classes defined by one ClassLoader to the same classes defined by another, but in the situation I describe I want to recreate objects using the same ClassLoader they had in the original. JBoss Serialization doesn't give you any way to do that; when using it you have to find a single ClassLoader that can load every class referenced from your object. In the general case this is not possible. Imagine cloning a Map containing lots of objects from different ClassLoaders, for example.

    I'd also hesitate to recommend the use of this project to anyone. It is very poorly documented and its code has many fragile dependencies on internal implementation details of the JDK.

    Posted by: emcmanus on April 12, 2007 at 01:39 AM

  • Another alternative is to use a dynamic cloning technique.

    Partial_Bean_Cloning_and_DTOs

    You can also find the javadoc to the reference implementation here:
    Bean Cloner

    Posted by: markfuturesft on April 19, 2007 at 06:51 PM

  • See the clone method in Apache Commons Lang SerializationUtils.

    Posted by: optidave on November 23, 2007 at 04:29 AM

  • optidave, if someone is already using the Apache Commons Lang package then this is certainly a possibility to look at. However, in its current version (560660) it doesn't do anything to address the ClassLoader problems I mentioned above. Of course nothing would stop somebody volunteering a contribution that adds the solution I describe to this method.

    Posted by: emcmanus on November 26, 2007 at 01:31 AM

  • "I'd also hesitate to recommend the use of this project to anyone. It is very poorly documented"

    What documentation do you need besides the Java API? It's just an extension of Object*Stream

    "has many fragile dependencies on internal implementation details of the JDK."

    Well... the JDK doesn't give you any opening for things that should be available through some nice API. There is no way to get around it.

    "Imagine cloning a Map containing lots of objects from different ClassLoaders"

    ObjectOutputStream doesn't care about class loading. There are no class-loading operations on ObjectOutputStream. On the input stream, however, the ClassLoader being used will be responsible for returning the correct implementation. For this there is no difference between Java Serialization and JBoss Serialization.

    Posted by: clebertsuconic on January 04, 2008 at 08:59 AM

  • Hi Clebert,
    My remark on the documentation was based on looking at the javadoc. Suppose I want to have a faster version of ObjectInputStream/ObjectOutputStream, which is one of the two things that JBoss Serialization is advertised as allowing me to do. I see from the project page that I need to make a JBossObjectOutputStream. So I go and look at the constructor for that class. In fact there are four constructors, of which the simplest is documented like this:

    Creates an OutputStream, that by default doesn't require

    The other constructors are no better, including parameters with no explanation whatever of what they do. I basically stopped reading at that point, because if something so basic isn't documented at all then I think I am justified in saying the documentation is poor.
    The other problem that JBoss Serialization addresses is "Smart Cloning", but nothing tells you how to use it. Even if you chance upon JBossObjectOutputStream.smartClone in the javadoc, you then need to decipher "Reuses every primitive value to recreate another object." I don't like calling methods whose documentation makes no sense because I am afraid that their behaviour will make no sense either.
    You are probably right that it would not be possible to implement this without introducing dependencies on implementation details, but that doesn't change my position that this project cannot be recommended because an application that uses it is liable to break when the underlying JDK is updated.
    I don't see how your last paragraph addresses the point I was making.
    Sorry to be so negative but this project pushed a hot button of mine, which I've written about in detail elsewhere.
    Regards,
    Éamonn

    Posted by: emcmanus on January 11, 2008 at 09:44 AM


