
XML processing pitfall: InputStream

Posted by kohsuke on October 7, 2005 at 12:43 PM PDT

Many XML parser APIs accept an InputStream or a Reader. For example, the JAXB Unmarshaller has unmarshal(InputStream), StAX has XMLInputFactory.createXMLStreamReader(InputStream), and XStream has XStream.fromXML(Reader). So all too often you'd write something like:

XMLInputFactory xif = ...;
xif.createXMLStreamReader(new FileInputStream("data/foo.xml"));

Or maybe:

XMLInputFactory xif = ...;

The problem with this shows itself when you have references to other files in your XML file, such as:


Or maybe:


In general, it doesn't work if your XML file has relative references to other resources, because the parser (or the unmarshaller or whatever) doesn't know the base URI to resolve a relative reference with.
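To make the role of the base URI concrete, here is a minimal sketch using java.net.URI rather than any particular parser. The class name BaseUriSketch, the location file:/app/data/foo.xml, and the reference foo.dtd are all made up for illustration; a null base stands in for a parser that was handed only an InputStream.

```java
import java.net.URI;

class BaseUriSketch {
    // Resolves a reference found inside a document against the document's own URI.
    // A null documentUri models the "InputStream only" case: there is nothing to
    // resolve against, so the reference stays relative.
    static URI resolve(URI documentUri, String reference) {
        URI ref = URI.create(reference);
        return documentUri == null ? ref : documentUri.resolve(ref);
    }

    public static void main(String[] args) {
        URI base = URI.create("file:/app/data/foo.xml"); // hypothetical document location

        // With a base URI the reference becomes absolute.
        System.out.println(resolve(base, "foo.dtd"));              // file:/app/data/foo.dtd

        // Without one, the parser is left with a bare relative reference.
        System.out.println(resolve(null, "foo.dtd"));              // foo.dtd
        System.out.println(resolve(null, "foo.dtd").isAbsolute()); // false
    }
}
```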

To make the issue even more complicated, some parsers, such as Xerces (at least some versions of it), try to resolve the reference against the current directory, which sometimes works (and breaks as soon as you deploy your app in production!). Other parsers, such as Aelfred, do a better job by issuing a warning in this situation.

Another factor that makes the situation worse is poorly designed APIs. For example, XStream doesn't offer any version of the fromXML method that allows you to pass the URI of the document. So it's not only error-prone, it's actually impossible to make it resolve relative references correctly.

StAX is marginally better, as it offers XMLInputFactory.createXMLStreamReader(String,InputStream), which lets you pass in the URI. But unless you are an XML geek, it would probably never occur to you that you need to use this version, as opposed to the simpler createXMLStreamReader(InputStream). Besides, you need to turn a file name into a URL, so the code will look like:

File file = new File("data/foo.xml");
xif.createXMLStreamReader( file.toURI().toString(), new FileInputStream(file) );

... which isn't exactly the simplest code in the world.
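One way to keep that noise out of calling code is a small helper that takes a File and passes the system ID for you. The following is only a sketch: StaxFiles and rootElementName are hypothetical names, not part of StAX, and it builds the system ID with File.toURI(), which escapes characters such as spaces.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxFiles {
    // Hypothetical helper: opens the file with its system ID supplied and
    // returns the local name of the root element, or null if there is none.
    static String rootElementName(XMLInputFactory xif, File file)
            throws IOException, XMLStreamException {
        String systemId = file.toURI().toString(); // e.g. file:/.../data/foo.xml
        try (InputStream in = new FileInputStream(file)) {
            // The two-argument overload tells the parser where the document lives.
            XMLStreamReader reader = xif.createXMLStreamReader(systemId, in);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
                return null;
            } finally {
                reader.close(); // closes the reader, not the underlying stream
            }
        }
    }
}
```

A caller would then write something like StaxFiles.rootElementName(XMLInputFactory.newInstance(), new File("data/foo.xml")) and never touch the stream or the URL conversion directly.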

The SAX API does a much better job, since you'd naturally use XMLReader.parse(String). It is both the intuitive version and the correct one. The only small downside is that it's not type-safe, so at first glance you aren't sure whether to pass a URL or a file name (both actually work in most parsers; I don't know if that's required by the SAX API). JAXB does slightly better, as it exposes Unmarshaller.unmarshal(File), thereby eliminating the type-safety issue.
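As a rough illustration of the SAX case, the sketch below writes two files into a temporary directory, where main.xml pulls in part.xml through a relative external entity, and parses main.xml by system ID. The class, method, and file names are invented for this example, and it assumes a parser configured to load external entities (the JDK's bundled parser does so by default, while hardened configurations turn this off).

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxSystemIdSketch {
    // Parses the file by system ID and returns element names in document order.
    static List<String> elementNames(File file) throws Exception {
        List<String> names = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(qName);
            }
        });
        // The parser knows the document's location, so it can resolve "part.xml"
        // relative to main.xml instead of the current directory.
        reader.parse(file.toURI().toString());
        return names;
    }

    public static void main(String[] args) throws Exception {
        File dir = Files.createTempDirectory("sax-sketch").toFile();
        File part = new File(dir, "part.xml");
        File main = new File(dir, "main.xml");
        Files.write(part.toPath(), "<child/>".getBytes(StandardCharsets.UTF_8));
        Files.write(main.toPath(),
                ("<!DOCTYPE root [<!ENTITY part SYSTEM \"part.xml\">]>"
                        + "<root>&part;</root>").getBytes(StandardCharsets.UTF_8));

        // If the relative reference resolved, the list contains both elements.
        System.out.println(elementNames(main));
    }
}
```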

The other benefit of having an API that just asks you for the name of the XML file is that the implementation can choose the right buffering strategy without any redundancy. If an API accepts an InputStream, some implementations want you to do the buffering, while others do the buffering on their own. So you have this little guessing game of whether you should wrap your InputStream in a BufferedInputStream or not.

It's just one example of why it's hard to design an API that "just works."
