
XML Standards as Object-Oriented Code Part I

Posted by jive on July 24, 2005 at 2:44 PM PDT

In the open-spatial Java community we have a problem: XML.

I know that does not sound like much of a problem, at least if you are an XML god. Spatial data has a habit of being very large and of breaking existing XML tools and assumptions. This blog post is not about that.

There are three projects I am going to talk about today:

  • GeoTools - open source Java GIS toolkit
  • GeoAPI - interfaces for OGC/ISO standards (for cross-toolkit interoperability)

But wait, that is only two? The third is a quick hack I am working on that tries to combine their ideas of what a Filter is.

What is a Filter? The quick answer is this specification, Filter Encoding 1.0, which talks mostly about Features, and this one, Filter 1.1, which expands the domain of discourse to include metadata. (You can see why I am looking into Filter before trying my mad metadata plan.)

What does all that mean? A filter gives you a "pass/fail" for an Object (Features are the objects of the GIS world). In other words, it is a test of set membership; if you know SQL, this is the WHERE clause.
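As a rough illustration of the pass/fail idea, here is a minimal Java sketch. `Filter`, `Feature`, `PropertyIsGreaterThan` and `FilterDemo` are simplified, hypothetical stand-ins (not the actual GeoTools or GeoAPI types); the filter plays the role of `WHERE area > 100`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified Filter: a pass/fail test for one object.
interface Filter {
    boolean evaluate(Feature feature);
}

// Hypothetical Feature: a bag of named attribute values.
final class Feature {
    private final Map<String, Object> attributes = new HashMap<>();

    Feature with(String name, Object value) {
        attributes.put(name, value);
        return this;
    }

    Object get(String name) {
        return attributes.get(name);
    }
}

// Plays the role of "WHERE <property> > <value>" for numeric attributes.
final class PropertyIsGreaterThan implements Filter {
    private final String property;
    private final double value;

    PropertyIsGreaterThan(String property, double value) {
        this.property = property;
        this.value = value;
    }

    @Override
    public boolean evaluate(Feature feature) {
        Object actual = feature.get(property);
        return actual instanceof Number && ((Number) actual).doubleValue() > value;
    }
}

final class FilterDemo {
    // Set membership: keep exactly the features that pass.
    static List<Feature> select(List<Feature> features, Filter filter) {
        List<Feature> passed = new ArrayList<>();
        for (Feature feature : features) {
            if (filter.evaluate(feature)) {
                passed.add(feature);
            }
        }
        return passed;
    }

    public static void main(String[] args) {
        List<Feature> parcels = Arrays.asList(
                new Feature().with("id", "a").with("area", 50),
                new Feature().with("id", "b").with("area", 500));
        Filter large = new PropertyIsGreaterThan("area", 100);
        for (Feature feature : select(parcels, large)) {
            System.out.println(feature.get("id")); // prints b
        }
    }
}
```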

The specification is mostly worth knowing about so you can talk to Web Feature Servers and get real data (rather than the pictures you see on Google Maps).

Three Takes on the problem

The standard is defined in terms of XML, and we have two sets of Java interfaces to play off of: those from GeoTools and those from GeoAPI.

The GeoTools interfaces are mutable, and the GeoAPI ones are not.

Traversal: XSLT vs Visitor vs Iterator

Both GeoTools and GeoAPI define an accept( Visitor ) method allowing these constructs to be traversed. This is in marked contrast to the XML approach, where, thanks to a common model, a declarative language (XSLT) can be used to target and transform Filter expressions no matter where they appear in a document. The other object-oriented way to do things is to use an Iterator.

Iterator is best known these days for its role in the Java Collections API:


// STRUCTURE stands for any java.util.Collection
for( Iterator i = STRUCTURE.iterator(); i.hasNext(); ){
    Object obj = i.next();
    // do something with object
}

This construct provides traversal without revealing the internals of the object. Depending on the iterator, it may not even reveal the internal order.

A Visitor acts exactly like the body of the for loop above:


Visitor visitor = new Visitor(){
    public void visit( Object obj ){
        // do something with object
    }
};
STRUCTURE.accept( visitor );

The advantage is that you can reuse your visitor, and an implementation can cheat (for example, by running the visitor across cached copies) and is generally free to hack a little more.
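A small sketch of the difference, assuming hypothetical `Visitor`, `Structure` and `CountingVisitor` classes invented for this example: the structure keeps its storage private and decides the traversal order itself, while the same visitor instance is reused across two structures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Visitor: the body of the for loop, packaged as a reusable object.
interface Visitor {
    void visit(Object obj);
}

// A structure that hides how it stores things; callers only get accept().
final class Structure {
    private final List<Object> recent = new ArrayList<>();
    private final List<Object> archived = new ArrayList<>();

    void add(Object obj) {
        recent.add(obj);
        if (recent.size() > 2) {
            // Internal detail the caller never sees: older items move to an archive.
            archived.add(recent.remove(0));
        }
    }

    // The structure, not the caller, decides how (and in what order) to traverse.
    void accept(Visitor visitor) {
        for (Object obj : archived) {
            visitor.visit(obj);
        }
        for (Object obj : recent) {
            visitor.visit(obj);
        }
    }
}

// A visitor that can be reused across many structures.
final class CountingVisitor implements Visitor {
    private int count;

    @Override
    public void visit(Object obj) {
        count++;
    }

    int getCount() {
        return count;
    }
}

final class VisitorDemo {
    public static void main(String[] args) {
        Structure first = new Structure();
        first.add("a");
        first.add("b");
        first.add("c");
        Structure second = new Structure();
        second.add("d");

        CountingVisitor counter = new CountingVisitor();
        first.accept(counter);
        second.accept(counter);
        System.out.println(counter.getCount()); // prints 4
    }
}
```

Because `Structure` owns the loop, it could later swap its two lists for a cache or a database cursor without any visitor noticing.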

What it boils down to for GeoAPI:

  • Object Expression.accept( ExpressionVisitor visitor, Object extraData )
  • Object Filter.accept( FilterVisitor visitor, Object extraData )

Where the documentation reads: "Accepts a visitor. Subclasses must implement with a method whose content is the following: return visitor.visit(this, extraData);"

The description is clearly wrong: the definition leaves out the traversal part. They also designed their visitor to take an Object along for the traversal ride. Normally this is a collector object, much like the one you see in JUnit tests gathering all the passes and fails. The combination is powerful and lets implementers forgo traversal entirely and feed in cached results if they are known. As I said, visitor allows for more hacking.
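The following sketch shows the collecting-parameter style with simplified types shaped like the GeoAPI signatures above; the class names and the `PropertyNameCollector` visitor are hypothetical. Each `accept` does only what the documentation asks for, so the visitor has to do the traversal itself, and `extraData` is a list that gathers results along the way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins shaped like the GeoAPI signatures; not the real interfaces.
interface Filter {
    Object accept(FilterVisitor visitor, Object extraData);
}

interface FilterVisitor {
    Object visit(PropertyIsEqualTo filter, Object extraData);
    Object visit(And filter, Object extraData);
}

final class PropertyIsEqualTo implements Filter {
    final String property;
    final Object value;

    PropertyIsEqualTo(String property, Object value) {
        this.property = property;
        this.value = value;
    }

    @Override
    public Object accept(FilterVisitor visitor, Object extraData) {
        return visitor.visit(this, extraData); // exactly what the documentation asks for
    }
}

final class And implements Filter {
    final List<Filter> children;

    And(Filter... children) {
        this.children = Arrays.asList(children);
    }

    @Override
    public Object accept(FilterVisitor visitor, Object extraData) {
        return visitor.visit(this, extraData); // traversal is left to the visitor
    }
}

// extraData is a collector riding along; this visitor performs the traversal itself.
final class PropertyNameCollector implements FilterVisitor {
    @SuppressWarnings("unchecked")
    @Override
    public Object visit(PropertyIsEqualTo filter, Object extraData) {
        ((List<String>) extraData).add(filter.property);
        return extraData;
    }

    @Override
    public Object visit(And filter, Object extraData) {
        for (Filter child : filter.children) {
            child.accept(this, extraData);
        }
        return extraData;
    }
}

final class CollectorDemo {
    public static void main(String[] args) {
        Filter filter = new And(new PropertyIsEqualTo("road", "main"), new PropertyIsEqualTo("lanes", 2));
        List<String> names = new ArrayList<>();
        filter.accept(new PropertyNameCollector(), names);
        System.out.println(names); // prints [road, lanes]
    }
}
```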

Let's see how GeoTools does it. The implementation is a normal accept method, and usually the visitor acts as its own collector object.
Here is the description: "Used by FilterVisitors to perform some action on this filter instance. Typically used by Filter decoders, but may also be used by anything which needs information from filter structure. Implementations should always call: visitor.visit(this); It is important that this is not left to a parent class unless the parent's API is identical."
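For comparison, a sketch of the GeoTools style, again using simplified, hypothetical classes (not the real GeoTools signatures, and the catch-all `visit(Filter)` is an assumption of this example). The visitor keeps its results in its own fields. The comment in `CompareFilter.accept` shows one likely reason for the "not left to a parent class" warning: Java picks the `visit` overload from the compile-time type of `this`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins in the GeoTools style: accept(visitor), no extra argument.
interface Filter {
    void accept(FilterVisitor visitor);
}

interface FilterVisitor {
    void visit(Filter filter); // hypothetical fallback for anything else
    void visit(CompareFilter filter);
    void visit(LogicFilter filter);
}

final class CompareFilter implements Filter {
    final String property;
    final Object value;

    CompareFilter(String property, Object value) {
        this.property = property;
        this.value = value;
    }

    @Override
    public void accept(FilterVisitor visitor) {
        // Overloads are chosen from the compile-time type of 'this', so each concrete
        // class makes this call itself; a shared parent implementation would see
        // 'this' as the parent type and pick visit(Filter) instead.
        visitor.visit(this);
    }
}

final class LogicFilter implements Filter {
    final List<Filter> children;

    LogicFilter(Filter... children) {
        this.children = Arrays.asList(children);
    }

    @Override
    public void accept(FilterVisitor visitor) {
        visitor.visit(this);
    }
}

// The visitor is its own collector: results accumulate in its fields.
final class PropertyNameCollector implements FilterVisitor {
    final List<String> names = new ArrayList<>();
    int unknown;

    @Override
    public void visit(Filter filter) {
        unknown++;
    }

    @Override
    public void visit(CompareFilter filter) {
        names.add(filter.property);
    }

    @Override
    public void visit(LogicFilter filter) {
        for (Filter child : filter.children) {
            child.accept(this);
        }
    }
}

final class GeoToolsStyleDemo {
    public static void main(String[] args) {
        PropertyNameCollector collector = new PropertyNameCollector();
        new LogicFilter(new CompareFilter("road", "main"), new CompareFilter("lanes", 2)).accept(collector);
        System.out.println(collector.names); // prints [road, lanes]
    }
}
```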

So why should we have FilterVisitor at all?

The point of being Object-Oriented?

One of the points of having a Filter and Expression API (that is, having these as objects that client code can implement) is to allow client code to do its own thing.

That is, people will:

  • write custom expressions
  • write custom filters

Earlier I asked why we should have FilterVisitor at all. The answer is built into the motivation for the visitor pattern: Visitor lets people write a data structure and hide its implementation details while still offering traversal.

With the existing GeoAPI FilterVisitor and ExpressionVisitor this is not possible (the visitors only have methods for the standard Filter and Expression types), and hence there is no point in having the methods. As long as the set of Filters and Expressions is closed, you can just define a "Walker" interface that knows how to traverse, and it can take a visitor along for the ride.
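A minimal sketch of the Walker idea for a closed set of node types. `Node`, `Literal`, `Add`, `NodeVisitor` and `Walker` are hypothetical names for this example; the walker knows every type and how to traverse it, and the visitor just rides along.

```java
import java.util.ArrayList;
import java.util.List;

// A closed set of node types: Literal and Add are all there is.
interface Node {}

final class Literal implements Node {
    final Object value;

    Literal(Object value) {
        this.value = value;
    }
}

final class Add implements Node {
    final Node left;
    final Node right;

    Add(Node left, Node right) {
        this.left = left;
        this.right = right;
    }
}

// The visitor sees each node but never has to know how to traverse.
interface NodeVisitor {
    void visit(Node node);
}

// The walker knows every type in the closed set and how to traverse it.
final class Walker {
    void walk(Node node, NodeVisitor visitor) {
        visitor.visit(node);
        if (node instanceof Add) {
            Add add = (Add) node;
            walk(add.left, visitor);
            walk(add.right, visitor);
        } else if (!(node instanceof Literal)) {
            throw new IllegalArgumentException("Walker cannot traverse " + node.getClass());
        }
    }
}

final class WalkerDemo {
    public static void main(String[] args) {
        Node tree = new Add(new Literal(1), new Add(new Literal(2), new Literal(3)));
        List<Object> literals = new ArrayList<>();
        new Walker().walk(tree, node -> {
            if (node instanceof Literal) {
                literals.add(((Literal) node).value);
            }
        });
        System.out.println(literals); // prints [1, 2, 3]
    }
}
```

The final `else` branch is where an open system runs into trouble: the walker has no way to look inside a node type it was not written for.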

For known, closed domains, this is actually a good thing. As an example, see the GeoTools graph module, where the computer science construct of a graph of relationships can be built (Builder Pattern) based on a relationship (Strategy Pattern), traversed by Walkers using common algorithms (like shortest path), and visited in the manner described above.

Filter and Expression are not a closed system, so we need to indicate that at the GeoAPI interface level:

  • Object FilterVisitor.visit( Filter filter, Object extraData )
  • Object ExpressionVisitor.visit( Expression expression, Object extraData )
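To sketch why a catch-all method matters, assume simplified `Filter`/`FilterVisitor` types with one specific `visit` method plus the generic `visit(Filter, Object)` proposed above. `PropertyIsNull`, `IsPalindrome` and `SqlEncoder` are hypothetical. The client-written filter still has something to call, and the visitor can answer "I can't encode this" instead of the custom filter having no way in at all.

```java
// Simplified stand-ins; visit(Filter, Object) is the catch-all proposed above.
interface Filter {
    boolean evaluate(Object value);
    Object accept(FilterVisitor visitor, Object extraData);
}

interface FilterVisitor {
    Object visit(PropertyIsNull filter, Object extraData); // one per standard filter...
    Object visit(Filter filter, Object extraData);         // ...plus the catch-all
}

// A filter defined by the standard.
final class PropertyIsNull implements Filter {
    @Override
    public boolean evaluate(Object value) {
        return value == null;
    }

    @Override
    public Object accept(FilterVisitor visitor, Object extraData) {
        return visitor.visit(this, extraData); // resolves to visit(PropertyIsNull, ...)
    }
}

// A client-written filter the standard knows nothing about.
final class IsPalindrome implements Filter {
    @Override
    public boolean evaluate(Object value) {
        String text = String.valueOf(value);
        return new StringBuilder(text).reverse().toString().equals(text);
    }

    @Override
    public Object accept(FilterVisitor visitor, Object extraData) {
        return visitor.visit(this, extraData); // only visit(Filter, ...) applies here
    }
}

// Encodes what it can; extraData carries the column name.
final class SqlEncoder implements FilterVisitor {
    @Override
    public Object visit(PropertyIsNull filter, Object extraData) {
        return extraData + " IS NULL";
    }

    @Override
    public Object visit(Filter filter, Object extraData) {
        return null; // unknown filter: cannot be encoded, evaluate it in Java instead
    }
}

final class CatchAllDemo {
    public static void main(String[] args) {
        SqlEncoder encoder = new SqlEncoder();
        System.out.println(new PropertyIsNull().accept(encoder, "name")); // prints name IS NULL
        System.out.println(new IsPalindrome().accept(encoder, "name"));   // prints null
    }
}
```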

I am sure this is just an oversight: the current interfaces have a visit method for each and every specific subclass of Filter defined by the library, but no generic one.

If we do not have these methods, you "close the door" on people implementing custom work. Not all filters can or should be written in terms of the basic operations defined by the standard. Yes, that means you will not be able to traverse every Filter/Expression ever made and generate SQL for it. Some filters/expressions will require post-processing by actual code.

Visitor lets a custom implementation provide traversal of its internal data structure, even though not all of its nodes will be available to a Filter or Expression visitor. Deciding what to expose is something the actual implementation needs to do.
The wish to be open-ended is also the reason the door is shut on doing XSLT-style transformations for everything in the object model. A common wish is the ability to turn these annoying objects into XML so that this style of tool can be applied. The power of XSLT is the ability to transform known elements nested deeply in an unknown document.

This is a lot harder to accomplish in an Object-Oriented system.

Transformation and Traversal

As mentioned above, if everything were available as an XML document, there would be known ways to accomplish a transformation in the XML world. Let's consider a similar transformation in Objectland (which is like the best book ever).

This also brings us to our first real problem: how can we allow transformations, such as reprojecting a literal Geometry in a Filter/Expression structure, in one of these object-oriented systems?

In GeoTools this was (and is) easy. The expressions can be modified as part of their interface, so a visitor can just change things as it wanders around the data structure. Mutability has its own set of consequences, especially if the filter is just on "loan" from client code: any code calling a visitor must either make a copy first (to prevent random damage) or risk being broken by a badly constructed visitor.
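A sketch of the mutable approach and its defensive copy, using hypothetical `MutableLiteral`/`MutableFilter` classes. `UpperCaseVisitor` is a stand-in for a real transformation such as reprojection; it rewrites literals in place as it visits them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable literal, in the spirit of the mutable GeoTools interfaces.
final class MutableLiteral {
    private Object value;

    MutableLiteral(Object value) {
        this.value = value;
    }

    Object getValue() {
        return value;
    }

    void setValue(Object value) {
        this.value = value;
    }
}

interface LiteralVisitor {
    void visit(MutableLiteral literal);
}

// Hypothetical mutable filter holding a few literals.
final class MutableFilter {
    final List<MutableLiteral> literals = new ArrayList<>();

    MutableFilter add(Object value) {
        literals.add(new MutableLiteral(value));
        return this;
    }

    // Defensive copy: new literal objects, so visitors cannot reach the original ones.
    MutableFilter deepCopy() {
        MutableFilter copy = new MutableFilter();
        for (MutableLiteral literal : literals) {
            copy.add(literal.getValue());
        }
        return copy;
    }

    void accept(LiteralVisitor visitor) {
        for (MutableLiteral literal : literals) {
            visitor.visit(literal);
        }
    }
}

// Changes literals in place as it wanders around the structure.
final class UpperCaseVisitor implements LiteralVisitor {
    @Override
    public void visit(MutableLiteral literal) {
        literal.setValue(String.valueOf(literal.getValue()).toUpperCase());
    }
}

final class MutableDemo {
    public static void main(String[] args) {
        MutableFilter onLoan = new MutableFilter().add("road");
        MutableFilter copy = onLoan.deepCopy();
        copy.accept(new UpperCaseVisitor());
        System.out.println(onLoan.literals.get(0).getValue()); // prints road
        System.out.println(copy.literals.get(0).getValue());   // prints ROAD
    }
}
```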

In the GeoAPI interfaces we don't offer modification, which raises the question of how transformations should actually happen.

It was my understanding that the Object returned by the GeoAPI visitor was meant to *be* the Filter/Expression resulting from a transformation. And if no change was required, the immutable interface would let us safely return a branch without the need to copy.

So the visitor would be used to "build" the transformation as it walks across the object structure.
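Here is one way that could look, as a sketch with hypothetical immutable `Literal`/`Add` expressions and a `ClampNegatives` visitor (a stand-in transformation that turns negative literals into 0). The visitor returns the rebuilt tree, and any branch that did not change is returned as the very same instance rather than a copy.

```java
// Hypothetical immutable expressions; accept() returns whatever the visitor builds.
interface Expression {
    Object accept(ExpressionVisitor visitor, Object extraData);
}

interface ExpressionVisitor {
    Object visit(Literal literal, Object extraData);
    Object visit(Add add, Object extraData);
}

final class Literal implements Expression {
    final int value;

    Literal(int value) {
        this.value = value;
    }

    @Override
    public Object accept(ExpressionVisitor visitor, Object extraData) {
        return visitor.visit(this, extraData);
    }
}

final class Add implements Expression {
    final Expression left;
    final Expression right;

    Add(Expression left, Expression right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public Object accept(ExpressionVisitor visitor, Object extraData) {
        return visitor.visit(this, extraData);
    }
}

// Builds the transformed tree. Unchanged branches are returned as-is, which is
// only safe because nothing can modify them.
final class ClampNegatives implements ExpressionVisitor {
    @Override
    public Object visit(Literal literal, Object extraData) {
        return literal.value < 0 ? new Literal(0) : literal;
    }

    @Override
    public Object visit(Add add, Object extraData) {
        Expression left = (Expression) add.left.accept(this, extraData);
        Expression right = (Expression) add.right.accept(this, extraData);
        if (left == add.left && right == add.right) {
            return add; // nothing changed below: share the branch, no copy
        }
        return new Add(left, right);
    }
}

final class SharingDemo {
    public static void main(String[] args) {
        Add untouched = new Add(new Literal(1), new Literal(2));
        Add tree = new Add(untouched, new Literal(-5));
        Add result = (Add) tree.accept(new ClampNegatives(), null);
        System.out.println(result.left == untouched); // prints true
    }
}
```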

So what is the problem with that?

Two problems actually:

  • Being open-ended (again): the generic Filter or Expression visit methods don't know how to copy an implementation they have never seen
  • Transforming in the middle of a higher-level object model

In order to smoothly apply a transforming FilterVisitor/ExpressionVisitor to a higher-level object like a WFS Query, we may need to formalize what is expected of a transformation.

We can see the problem in the small, even with the API we have already defined: how do we apply an ExpressionVisitor transformation to a Filter?

If we had:

interface ExpressionTransform extends ExpressionVisitor {
   Expression visit( Expression expression, Object extraData );
}

Matched with a method such as: Object Filter.transform( ExpressionTransform transform )

This would be defined as a clone of the filter logic, with the transformation applied to the contained Expressions.
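A sketch of `Filter.transform` under stated assumptions: the types below are hypothetical, `ExpressionTransform` narrows the return type of the generic `visit` to `Expression` as above, and `ScaleTransform` is a fake "reprojection" that only scales coordinates and relabels the SRS (a real one would delegate to a coordinate transformation library). `Equals.transform` keeps the filter logic and applies the transform to each contained expression.

```java
// Hypothetical stand-ins for the interfaces discussed above.
interface Expression {}

interface ExpressionVisitor {
    Object visit(Expression expression, Object extraData);
}

// Narrows the generic visit so that a transform always yields an Expression.
interface ExpressionTransform extends ExpressionVisitor {
    @Override
    Expression visit(Expression expression, Object extraData);
}

interface Filter {
    Object transform(ExpressionTransform transform);
}

// Immutable point literal; the SRS is just a label here.
final class PointLiteral implements Expression {
    final double x;
    final double y;
    final String srs;

    PointLiteral(double x, double y, String srs) {
        this.x = x;
        this.y = y;
        this.srs = srs;
    }

    @Override
    public String toString() {
        return "POINT(" + x + "," + y + ", " + srs + ")";
    }
}

final class Equals implements Filter {
    final Expression left;
    final Expression right;

    Equals(Expression left, Expression right) {
        this.left = left;
        this.right = right;
    }

    // A clone of the filter logic (still Equals) with transformed expressions.
    @Override
    public Object transform(ExpressionTransform transform) {
        return new Equals(transform.visit(left, null), transform.visit(right, null));
    }

    @Override
    public String toString() {
        return left + " EQUALS " + right;
    }
}

// Fake "reprojection": scales coordinates and relabels the SRS. Not real geodesy.
final class ScaleTransform implements ExpressionTransform {
    private final double factor;
    private final String targetSrs;

    ScaleTransform(double factor, String targetSrs) {
        this.factor = factor;
        this.targetSrs = targetSrs;
    }

    @Override
    public Expression visit(Expression expression, Object extraData) {
        if (expression instanceof PointLiteral) {
            PointLiteral p = (PointLiteral) expression;
            return new PointLiteral(p.x * factor, p.y * factor, targetSrs);
        }
        return expression; // other expressions pass through unchanged
    }
}

final class TransformDemo {
    public static void main(String[] args) {
        Filter before = new Equals(new PointLiteral(1, 2, "3005"), new PointLiteral(1, 2, "3005"));
        Object after = before.transform(new ScaleTransform(10, "4326"));
        System.out.println(before); // prints POINT(1.0,2.0, 3005) EQUALS POINT(1.0,2.0, 3005)
        System.out.println(after);  // prints POINT(10.0,20.0, 4326) EQUALS POINT(10.0,20.0, 4326)
    }
}
```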

Perhaps this is best illustrated with an example:
POINT( "1187128,395268", "3005" ) EQUALS POINT( "1187128,395268", "3005" )

After an Expression transform (reprojecting from 3005 to 4326):
POINT( "-123.470095558323,48.5432615620081", "4326" ) EQUALS POINT( "-123.470095558323,48.5432615620081", "4326" )

Note that no change is required to the Expression interfaces to allow transformation to happen. This is good for future extensibility, as Filter is used in all manner of places these days, from FeatureType models to Styling.

The Payoff

Those of you who play with XML may be horrified by now; yes, XML buys you a lot.
Here is the point of all this OO madness: this object-oriented design lets you implement as much of your work in XSLT as you can get away with.

The visitor pattern above lets you hide the idea of traversal:
- you don't even need to do it in some cases (cached values from another run)
- it can be split over several processors and nobody will ever know (the main advantage over Iterator)
- you can recognize a branch as being entirely expressible as XML, run XSLT on it, and nobody will ever know

I will also point out that this technique has worked well over the years for turning Filter expressions into SQL. Each database has different capabilities, and many can handle a subset of the Filter standard, if not all of it. By splitting your filter into two (the part that can be transformed into SQL and the part that cannot), you can smoothly support the entire specification on top of a wide range of databases. Anything that cannot be encoded as SQL can be applied after the fact to the result set.
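A sketch of the split for the common case of a filter that is a conjunction (an AND of simpler filters). `PropertyEquals`, `JavaOnlyFilter` and `FilterSplitter` are hypothetical, and `toSql` does no escaping, so this is an illustration rather than production code. The SQL-friendly conjuncts become the WHERE clause, and the rest are evaluated in Java on the rows that come back.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical row-based filter; a query's filter is treated as an AND of these.
interface Filter {
    boolean evaluate(Map<String, Object> row);
}

// A simple comparison that any SQL database can handle.
final class PropertyEquals implements Filter {
    final String property;
    final String value;

    PropertyEquals(String property, String value) {
        this.property = property;
        this.value = value;
    }

    @Override
    public boolean evaluate(Map<String, Object> row) {
        return value.equals(row.get(property));
    }

    String toSql() {
        return property + " = '" + value + "'"; // no quoting/escaping: sketch only
    }
}

// A custom filter with no SQL equivalent; it can only run in Java.
final class JavaOnlyFilter implements Filter {
    private final Predicate<Map<String, Object>> test;

    JavaOnlyFilter(Predicate<Map<String, Object>> test) {
        this.test = test;
    }

    @Override
    public boolean evaluate(Map<String, Object> row) {
        return test.test(row);
    }
}

final class FilterSplitter {
    // Encodes what the database understands; everything else goes into postFilters.
    static String split(List<Filter> conjuncts, List<Filter> postFilters) {
        List<String> sql = new ArrayList<>();
        for (Filter filter : conjuncts) {
            if (filter instanceof PropertyEquals) {
                sql.add(((PropertyEquals) filter).toSql());
            } else {
                postFilters.add(filter);
            }
        }
        return sql.isEmpty() ? "1 = 1" : String.join(" AND ", sql);
    }

    // Applies the leftover filters to the rows the database returned.
    static List<Map<String, Object>> postProcess(List<Map<String, Object>> rows, List<Filter> postFilters) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            boolean passes = true;
            for (Filter filter : postFilters) {
                passes = passes && filter.evaluate(row);
            }
            if (passes) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Filter> conjuncts = Arrays.asList(
                new PropertyEquals("type", "road"),
                new JavaOnlyFilter(row -> String.valueOf(row.get("name")).length() > 4));
        List<Filter> post = new ArrayList<>();
        String where = split(conjuncts, post);
        System.out.println("SELECT * FROM roads WHERE " + where); // prints SELECT * FROM roads WHERE type = 'road'

        Map<String, Object> row = new HashMap<>();
        row.put("name", "Main");
        System.out.println(postProcess(Arrays.asList(row), post).size()); // prints 0
    }
}
```

Splitting this way is only valid for an AND: for an OR, dropping a branch from the SQL would wrongly exclude rows that only that branch would have matched.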

Where to next? How about the follow-up article: Supporting Hacks and Versions.
