
Socket + XML = pitfall

Posted by kohsuke on July 15, 2005 at 12:04 PM PDT

Yesterday, one of the JAXB users sent me an e-mail asking how to solve a problem he had run into.

The scenario was like this: you have a client and a server, and you want the client to send an XML document to the server (through a good ol' TCP socket), and then the server sends back an XML document. A very simple use case that should just work.

The problem he had was that unless the client sends the "EOS" (end of stream) signal to the server, the server stays blocked. When he modified his code to send EOS by partial-closing the TCP socket (Socket.shutdownOutput), the server somehow couldn't send back the response; it failed with an error saying the socket was closed.

What's Happening?

So, what's happening and whose fault is this?

When you tell JAXB to unmarshal from an InputStream, it uses JAXP behind the scenes, in particular SAX, to parse the document. Normally in Java, the code that opened a stream is responsible for closing it, but SAX says the parser is responsible for closing the stream.

Call it a bug or a feature, but this was done for a reason. People often assume that a parser only reads a stream until it hits the end tag of the root element, but the XML spec actually requires a parser to read the stream until it hits EOS. This is because a parser needs to report an error if anything other than a comment, a PI, or whitespace shows up after the root element. Given that, I'd imagine the SAX developers thought "well, if a parser needs to read until EOS, why not have the parser close the stream? After all, the only thing you can do with a fully read stream is to close it!" In a way, it makes sense. In any case, it's too late to change now.

So, the net effect is that when you pass an InputStream from Socket.getInputStream to a JAXB unmarshaller, the underlying SAX parser will call InputStream.close automatically.

Now, what happens when a socket InputStream is closed? The JDK javadoc really doesn't seem to say definitively, but a little experiment reveals that it actually fully closes the TCP session. That explains why our guy couldn't write back the response: by the time he had read the input, his socket was already closed!
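A minimal sketch of such an experiment, if you want to see it for yourself (the host and port here are just placeholders; point it at anything that accepts a TCP connection):

import java.io.OutputStream;
import java.net.Socket;

public class CloseExperiment {
    public static void main(String[] args) throws Exception {
        Socket socket = new Socket("example.com", 80); // placeholder host/port
        OutputStream out = socket.getOutputStream();

        socket.getInputStream().close();        // close only the input stream...

        System.out.println(socket.isClosed());  // ...but this prints "true"
        out.write(42);                          // and this fails: the socket is already gone
    }
}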

This seems like surprising behavior. It would have been better if closing a stream only closed the socket partially, in that direction, and you needed to close both the InputStream and the OutputStream to fully shut down the socket. That would have made a lot of sense. I guess the reason it's not done this way is backward compatibility: the Socket class has been there since the very first JDK 1.0, but the notion of a partial close was only added in JDK 1.3, and JDK 1.3 of course couldn't change the behavior of earlier JDKs, no matter how undesirable it is.
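For reference, this is what a partial close (available since JDK 1.3) looks like on the writing side:

socket.shutdownOutput(); // half-close: the peer sees EOS, but we can still read
// ... keep reading the response from socket.getInputStream() ...
socket.close();          // now release the socket for real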

Putting those two behaviors together, we now know what happened: on the server, the SAX parser that was reading the request closed the connection prematurely.

How To Fix This?

So how do we fix this? To make it work, you don't let the parser close the stream. You can do this by writing a simple FilterInputStream that ignores the close method invocation:

import java.io.*;

public class NoCloseInputStream extends FilterInputStream {
    public NoCloseInputStream(InputStream in) { super(in); }
    public void close() {} // ignore the close() coming from the parser
}

Then on the server, you invoke JAXB like this:

unmarshaller.unmarshal(new NoCloseInputStream(socket.getInputStream()));

You can then do some processing, followed by a marshal method invocation like this:

marshaller.marshal( object, socket.getOutputStream() );

Finally you close the socket and you are done. On the client side, you'll do:

marshaller.marshal( object, socket.getOutputStream() );
socket.shutdownOutput(); // send EOS
Object response = unmarshaller.unmarshal( socket.getInputStream() );

This time you do want to close the socket after the response is read, so you don't need to use NoCloseInputStream.
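To put the whole server side together, a minimal sketch could look something like this (the package name, the port, and the process method are placeholders I made up for illustration, not anything from the original mail):

import java.net.ServerSocket;
import java.net.Socket;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;

public class XmlOverSocketServer {
    public static void main(String[] args) throws Exception {
        // "com.example.generated" and port 8080 are placeholders
        JAXBContext context = JAXBContext.newInstance("com.example.generated");
        ServerSocket serverSocket = new ServerSocket(8080);

        while (true) {
            Socket socket = serverSocket.accept();
            try {
                Unmarshaller unmarshaller = context.createUnmarshaller();
                Marshaller marshaller = context.createMarshaller();

                // wrap the stream so the SAX parser's close() is ignored
                Object request = unmarshaller.unmarshal(
                    new NoCloseInputStream(socket.getInputStream()));

                Object response = process(request); // whatever your server actually does

                marshaller.marshal(response, socket.getOutputStream());
            } finally {
                socket.close(); // now we close the socket for real
            }
        }
    }

    private static Object process(Object request) {
        return request; // placeholder: just echo the request back
    }
}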

Hird, I'm sorry Java let you down on this one, but hopefully this explains what's going on and why.
