
What's next for the XPath API in JAXP?

Posted by spericas on January 5, 2007 at 1:14 PM PST

The XPath API has been part of JAXP since version 1.3, which ships with Java SE 5.0. Yet we've never received (or at least I haven't heard) much feedback about it or about the implementation in the RI (reference implementation). Well, that's not entirely true: I have heard a few times that the implementation is not very fast (which is true), but not much about the API itself.

In case you're not familiar with the XPath API in JAXP, see the package description for javax.xml.xpath. Here is a simple example borrowed from the Japex source code:


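  import java.util.Iterator;
  import javax.xml.XMLConstants;
  import javax.xml.namespace.NamespaceContext;
  import javax.xml.xpath.XPath;
  import javax.xml.xpath.XPathConstants;
  import javax.xml.xpath.XPathFactory;

  XPath query = XPathFactory.newInstance().newXPath();

  // Bind the "reg" prefix used in the expression to the Japex report namespace
  query.setNamespaceContext(new NamespaceContext() {
      public String getNamespaceURI(String prefix) {
          // Unbound prefixes map to the empty namespace, per the
          // NamespaceContext contract
          return "reg".equals(prefix) ?
            "http://www.sun.com/japex/regressionReport" : XMLConstants.NULL_NS_URI;
      }
      // Reverse lookups are not needed to evaluate the expression
      public String getPrefix(String namespaceURI) { return null; }
      public Iterator getPrefixes(String namespaceURI) { return null; }
  });

  // Find the value of the notify attribute; dom.getNode() supplies the
  // DOM node of the parsed report in Japex
  String notify = (String) query.evaluate("/reg:regressionReport/@notify",
                            dom.getNode(), XPathConstants.STRING);

  // Is notification needed?
  if ("true".equals(notify)) {
     // do something
  }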

Personally, I have found the API sufficient for most of my use cases. However, there is a bit of an impedance mismatch between the API and the RI's implementation, which is based on Xalan. Xalan (as well as XSLTC) uses an internal representation of the infoset known as DTM (Document Table Model), so every time you run a query, a new DTM must be built before the XPath expression can be evaluated. Thus, if you run a query in a loop (as a typical benchmark would), performance is dominated by DTM creation, not by the query evaluation per se. Moreover, even when you do need to run multiple queries on the same infoset, the current API has no way to express that, so the RI cannot amortize the cost of building a DTM across those queries.
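For reference, the multiple-query scenario can be written today roughly as follows: compile each expression once with `XPath.compile` and evaluate it against the same DOM `Document`. This is a minimal sketch; the sample report, its `testCase` elements, and the class name `ReportQueries` are hypothetical. Compiling avoids re-parsing the expression text, but each `evaluate` call still receives only a DOM node, so the API gives an implementation no explicit way to reuse whatever internal representation it built for the previous call.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ReportQueries {
    static final String NS = "http://www.sun.com/japex/regressionReport";

    // Hypothetical sample report: only the namespace URI and the notify
    // attribute come from the Japex example; testCase is made up
    static final String REPORT =
        "<reg:regressionReport xmlns:reg='" + NS + "' notify='true'>"
      + "<reg:testCase name='a'/><reg:testCase name='b'/>"
      + "</reg:regressionReport>";

    // Runs two queries against a single parsed document and returns
    // { value of @notify, number of testCase children }
    static String[] runQueries(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // prefixed steps only match a namespace-aware DOM
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "reg".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });

            // Compile each expression once (the namespace context must be set first)
            XPathExpression notify = xpath.compile("/reg:regressionReport/@notify");
            XPathExpression cases = xpath.compile("count(/reg:regressionReport/reg:testCase)");

            // Both calls take the same DOM node as context; the API offers no
            // explicit way for the second call to reuse state built by the first
            String flag = (String) notify.evaluate(doc, XPathConstants.STRING);
            Double count = (Double) cases.evaluate(doc, XPathConstants.NUMBER);
            return new String[] { flag, String.valueOf(count.intValue()) };
        } catch (Exception e) {
            throw new IllegalStateException("Query failed", e);
        }
    }

    public static void main(String[] args) {
        String[] results = runQueries(REPORT);
        System.out.println("notify=" + results[0] + ", testCases=" + results[1]);
    }
}
```

Running `main` should print `notify=true, testCases=2`.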

So how common is it to run multiple queries over the same infoset? I'm not certain, but I know of some Business Process Management tools that could use this feature, which is why it has made it onto our list of things to consider. What else is on our list? Support for streaming XPath, i.e., the ability to execute a query in constant space. We have heard about and read a number of articles on this technique, but again, I'm not sure how many people actually need it or are using it today.
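To make the streaming idea concrete, here is a minimal sketch of constant-space evaluation for a deliberately small XPath subset: absolute paths of child steps ending in an attribute, such as `/reg:regressionReport/@notify`. It uses the StAX `XMLStreamReader` (part of Java SE 6), not anything in the JAXP XPath API; the class `StreamingPath`, the method `firstAttribute`, and passing the steps as `QName`s instead of an expression string are all assumptions made to keep the sketch short.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingPath {
    // Evaluates /steps[0]/.../steps[n-1]/@attr in one forward pass and returns
    // the attribute value on the first matching element, or null.
    // Beyond the parser's own buffers, only two counters are kept; no tree is built.
    static String firstAttribute(String xml, QName[] steps, QName attr) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                int depth = 0;   // depth of the current element (root = 1)
                int matched = 0; // leading steps matched by the open ancestor chain
                while (r.hasNext()) {
                    int event = r.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        depth++;
                        // Extend the match only if every ancestor matched so far
                        if (matched == depth - 1 && depth <= steps.length
                                && steps[depth - 1].equals(r.getName())) {
                            matched = depth;
                            if (matched == steps.length) {
                                for (int i = 0; i < r.getAttributeCount(); i++) {
                                    if (attr.equals(r.getAttributeName(i))) {
                                        return r.getAttributeValue(i); // stop early
                                    }
                                }
                            }
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        if (matched == depth) {
                            matched--; // leaving an element that was part of the match
                        }
                        depth--;
                    }
                }
                return null;
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String ns = "http://www.sun.com/japex/regressionReport";
        String xml = "<reg:regressionReport xmlns:reg='" + ns + "' notify='true'>"
                   + "<reg:testCase name='a'/></reg:regressionReport>";
        // Equivalent to /reg:regressionReport/@notify
        QName[] path = { new QName(ns, "regressionReport") };
        System.out.println(firstAttribute(xml, path, new QName("notify")));
    }
}
```

Reverse axes such as `preceding::` and predicates that depend on content appearing later in the document are what make streaming general XPath hard; this subset sidesteps them entirely.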

As I hinted at the beginning, the main purpose of this blog entry is to collect feedback from you. So now it's your turn: we'd like to hear from you on ways to improve XPath support in the API and in the RI. I almost forgot: there's also XPath 2.0, and naturally we want to hear whether you need support for this new version as well.
