
XSLT + XHTML + JDK6 + JDK7 = madness

Posted by fabriziogiudici on February 12, 2012 at 3:27 PM PST

Java is great. But sometimes you get caught in a trap of mud and you don't see it coming.

Context: I'm working with XSLT to manipulate XHTML, and it works great. Well, 99% of it works great; the remaining 1% is problematic. Xalan (the XSLT processor in the JDK) can produce output in HTML or XML mode, selected by the <xsl:output .../> directive. HTML mode doesn't work, because XHTML is not HTML (for instance, HTML mode writes <br> with no closing slash, which is not well-formed XML). XML mode should work, because XHTML is XML. Really? Not quite. XHTML differs from XML in some serialization details. In particular, while in XML an empty element can always be written in the short form (e.g. <element/>), and Xalan's XML mode always does so, this is not the case for XHTML. Some elements, such as <br/>, must be serialized in the short form. Others must not, such as <a>, <p>, <script>, <textarea>, etc. This means that an empty anchor must always be serialized as <a></a>.
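The problem is easy to reproduce. The following is a minimal sketch, assuming the JDK's default `TransformerFactory`: it builds a tiny XHTML-like DOM with an empty anchor and a `<br>`, then serializes it with an identity `Transformer` in XML mode, which should go through the same XML serializer that `<xsl:output method="xml"/>` selects. The class name `XmlModeDemo` and the `placeholder` id are made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlModeDemo {

    /** Serializes a tiny XHTML-like DOM with the JDK's identity transformer in XML output mode. */
    static String serialize() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element anchor = doc.createElement("a");   // empty placeholder, meant to be filled by JavaScript
            anchor.setAttribute("id", "placeholder");
            doc.appendChild(html);
            html.appendChild(body);
            body.appendChild(anchor);
            body.appendChild(doc.createElement("br")); // legitimately empty in XHTML

            // Identity transform with method="xml", the same output mode a stylesheet would request.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Both empty elements come out in the short form, the anchor included.
        System.out.println(serialize());
    }
}
```

Running `main` should show the anchor written as `<a id="placeholder"/>`, right next to a `<br/>`: XML mode has no way to tell the two cases apart.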

Such empty elements are frequent when a JavaScript library is used, since they are usually placeholders to be populated dynamically. Even recent browsers get confused when they see a <script src="..."/> element, and some JavaScript tools that manipulate the DOM break when they see something like <a/>.
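To make the rule concrete, here is a minimal, hypothetical sketch of a hand-rolled DOM writer that applies it: only the elements declared `EMPTY` in the XHTML 1.0 DTDs are minimized (as `<br />`, with the space recommended by the XHTML 1.0 compatibility guidelines), while every other element, even an empty `<a>` or `<script>`, gets an explicit end tag. This is not XHTMLSerializer nor any Xalan API; the class `XhtmlWriter` and its methods are made up for illustration. It skips doctypes, comments and processing instructions, does no namespace fix-up, and escapes `<script>` content as ordinary XML text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

/** Hypothetical helper: writes a DOM tree following the XHTML 1.0 empty-element rules. */
public class XhtmlWriter {

    // Elements declared EMPTY in the XHTML 1.0 DTDs (strict, transitional, frameset).
    // Only these are minimized; every other empty element gets an explicit end tag.
    private static final Set<String> EMPTY_ELEMENTS = new HashSet<>(Arrays.asList(
            "area", "base", "basefont", "br", "col", "frame", "hr",
            "img", "input", "isindex", "link", "meta", "param"));

    public static String toXhtml(Node node) {
        StringBuilder sb = new StringBuilder();
        write(node, sb);
        return sb.toString();
    }

    private static void write(Node node, StringBuilder sb) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            sb.append(escape(node.getNodeValue()));
        } else if (type == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            String name = element.getTagName();
            sb.append('<').append(name);
            NamedNodeMap attributes = element.getAttributes();
            for (int i = 0; i < attributes.getLength(); i++) {
                Attr attr = (Attr) attributes.item(i);
                sb.append(' ').append(attr.getName())
                  .append("=\"").append(escape(attr.getValue())).append('"');
            }
            if (!element.hasChildNodes() && EMPTY_ELEMENTS.contains(name.toLowerCase(Locale.ROOT))) {
                sb.append(" />");                       // e.g. <br />
            } else {
                sb.append('>');                         // e.g. <a id="..."></a>, even when empty
                writeChildren(element, sb);
                sb.append("</").append(name).append('>');
            }
        } else if (type == Node.DOCUMENT_NODE) {
            writeChildren(node, sb);
        }
        // Comments, doctypes and processing instructions are skipped in this sketch.
    }

    private static void writeChildren(Node parent, StringBuilder sb) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            write(child, sb);
        }
    }

    // Escapes characters significant in text and double-quoted attributes ("&" must go first).
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Builds a small sample page with an empty anchor, an empty script and a line break. */
    public static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element anchor = doc.createElement("a");
            anchor.setAttribute("id", "placeholder");
            Element script = doc.createElement("script");
            script.setAttribute("src", "app.js");
            doc.appendChild(html);
            html.appendChild(body);
            body.appendChild(anchor);
            body.appendChild(script);
            body.appendChild(doc.createElement("br"));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the sample document, `XhtmlWriter.toXhtml(XhtmlWriter.sampleDocument())` returns `<html><body><a id="placeholder"></a><script src="app.js"></script><br /></body></html>`: the anchor and the script keep their end tags, and only `<br />` is minimized.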

In spite of this, Xalan is not able to serialize XHTML properly. There are a number of blog posts from annoyed people, and an official Xalan bug opened... in 2004 and never fixed. Clearly the Xalan community considers manipulating XHTML a niche activity (Sun was bashed for years for relevant bugs left unfixed for a long time, but clearly it wasn't alone).

Trying to patch Xalan's internal serialization classes didn't work, as they are tightly coupled with a lot of other code in the com.sun.* packages. Apache used to provide a number of serializers for XML, but they were deprecated in favour of TrAX (that is, Xalan) or LSSerializer. Unfortunately the latter doesn't seem to offer any control over how empty elements are serialized, so it can't be used for proper XHTML serialization.

So the only solution I've found so far is to revive an old deprecated class named XHTMLSerializer, which does almost everything right (the missing parts can easily be fixed by subclassing). Actually, looking at the source, specific care was paid to the XHTML issues, demonstrating that the Xalan authors understood the problem well. Somebody was probably too quick on the trigger when deciding to deprecate it without ensuring that all its features had found their way into the new classes.

Done? Not at all. XHTMLSerializer works fine with JDK6 but fails miserably with JDK7. It seems that the JRE7 is missing a resource used by some of its inner classes. Probably these classes were not tested with JDK7 because they are deprecated (but then, what's the point of including them in the runtime?).

On the second attempt I copied three other classes from JDK6 into my application, together with the missing resource. Fortunately they are not tightly coupled with other code and can live on their own.

Hours wasted for such a silly thing.

In case it helps, the details are in my project's issues NW-96 and NW-99 (they include links to patches).

