Sun StAX Parser at Java.net
The Sun Java Streaming XML Parser (SJSXP) FCS version 1.0 is now available in binary and source forms from Java.net. This parser is an implementation of JSR 173, submitted to the JCP by BEA. We liked this parser and the StAX API so much that we've made it a key component of our Web services stack in Glassfish.
The Streaming API for XML (StAX) essentially turns the SAX processing model upside down. Instead of the parser controlling the application's flow and the application reacting to parsing events, it is the application that controls the flow by "pulling" events from the parser. This parsing model has several advantages over SAX. First, it often makes the application logic easier to understand, because it is the application and not the parser that is in control of the process (stated differently, the application does not get "pushed around"). Second, if implemented correctly, it opens up a number of new optimizations when the application does not need to process the entire infoset. In particular, the parser can lazily wait until the application requests a certain infoset item before actually constructing it (Java strings are a good example of this).
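To make the "pull" model concrete, here is a minimal sketch using the standard javax.xml.stream cursor API from JSR 173. The class name, the method name and the sample document are made up for illustration; nothing here is specific to SJSXP, and any StAX implementation on the classpath should behave the same way. The application drives the loop and only asks for text when it reaches an element it cares about.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of SJSXP.
class PullExample {

    // Collects the text of every <name> element and ignores everything else.
    static List<String> itemNames(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            // The application, not the parser, decides when to advance.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    // Text is requested only for the elements we want;
                    // this call leaves the cursor on the matching end tag.
                    names.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up sample document.
        String xml = "<order>"
                   + "<item><name>widget</name><price>9.99</price></item>"
                   + "<item><name>gadget</name><price>19.50</price></item>"
                   + "</order>";
        System.out.println(itemNames(xml)); // prints [widget, gadget]
    }
}
```

Notice that the price elements are stepped over without their text ever being requested, which is exactly the kind of situation where a parser has room to avoid building strings nobody asked for.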
So, this is all great, but what about performance? Glad you asked. I picked up about 20 documents from industry-standard schemas like UBL, FPML and GAML and compared the single-threaded throughput performance of SJSXP vs. the version of Xerces in JDK 1.5. Here are the results:
These results were obtained using Japex and the Japex Standard Driver Library (JSDL), which is bundled with it.
As always, when it comes to performance "your mileage will vary". It should be pointed out that the version of Xerces in JDK 1.5 is not the latest. Despite that, we are very excited about this new parser and hope to continue improving it by building an SJSXP community at Java.net. So come and join us!