Open Source: Digester

Posted by sayedh on September 21, 2005 at 7:18 PM PDT

So this is the first in a series of Open Source entries that I'll make. I would like to raise awareness of open source projects available to Java developers. To help achieve this I'll download and play with some projects, some that I've used before and some that I haven't, and then discuss my experience with them. The more feedback the better, good or bad! I'd also like to have a consistent format for these entries, so if you notice any changes that you think I should make, please let me know. On with this blog!

Apache Digester

In this blog we will examine the Apache Digester, which is included in the Jakarta Commons project.

Motivation

The reason I chose the Digester is that it seems to be a simple way
to read XML files without needing an XSD. I have also used the Digester before,
so I'm a little familiar with it, but I'd like to get to know it better. I'm interested to hear what you guys say about your experience
with the Digester.

Background (Appetizer)

As applications and services move towards adaptability and interoperability, the use of XML continues. When faced with the problem of handling
XML files, the Java developer has many options. Here is a brief listing of some of those options.

Name | Brief Description
Apache Digester | Provides a means to map between XML elements and Java classes without the requirement of an XSD.
JAXB | Java Architecture for XML Binding. Allows you to create Java classes to represent the contents of the XML document based on an XSD.
Castor | "Castor is an Open Source data binding framework for Java[tm]. It's the shortest path between Java objects, XML documents and relational tables. Castor provides Java-to-XML binding, Java-to-SQL persistence, and more." [NOTE: from castor.org]
SAX/DOM type parser | Low-level method of handling XML files.

The Apache Digester is targeted at reading these
XML files. Unlike some of the other options, the Digester provides no means of
creating or modifying XML files. A great scenario for the Digester is
XML configuration files. Let's consider this example: your team is developing an application which uses XML-based
configuration files. If you work in a relatively small shop like I do, you don't write schemas (XSD) ahead of time, so you are constantly
adding information to these XML files. I imagine that it may be a hassle to use JAXB because

  1. You don't have an XSD
  2. Even if you did it would change often so this means your XML binding classes would change as well

In this scenario you're not really interested in writing the XML files, so you don't have to worry about that.
I can't really discuss the Castor option because I have never used it. Maybe I will in a later blog :)


For a better comparison of this see the Related Resources.

Beef & Potatoes (Main course)

Now that we have come to the main course, I'll show you how to get the Digester and some simple example(s) of its use.
I'll also present some sample code that I've written to demonstrate the use of the Digester, and all of it is available to you.
As you continue to read, or while you look through the code, keep in mind that I'm new to the Digester, so I may have made, and probably did
make, some mistakes in its use, and I hope you point them out! I'd much rather make a mistake here than in production code :) How to get
the code is described at the end of this entry.


First of all you'll want to download the Digester; you can find it in the
Jakarta Commons project. I imagine I'll be looking at a bunch of these projects,
so you might as well get used to hearing that name. :) Many cool things are happening over there.


Ok so the project documentation page says:

Many Jakarta projects read XML configuration files to provide initialization of various Java objects within the system. There are several ways of doing this, and the Digester component was designed to provide a common implementation that can be used in many different projects.


Basically, the Digester package lets you configure an XML -> Java object mapping module, which triggers certain actions called rules whenever a particular pattern of nested XML elements is recognized.



So in a nutshell, if you need to read an XML file then the Digester may be a good choice for you. However, if you also need to, or
better yet ever may need to, write the contents back to an XML file, the Digester is NOT an option for you. It's kinda like a one-way
hash, no functionality to go the other way :)


Now the Digester uses a SAX-based parser under the hood, so keep that in mind as you are using it. It kinda explains some things
about how you are going to use it: the rules fire as elements are opened and closed while the file is read, in document order.
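
To give a feel for what the Digester saves you from, here is a minimal sketch of doing a tiny piece of the job with plain SAX from the JDK. It only collects the owner attribute of each entry element found under entries. The class name EntryOwnerHandler and the helper ownersOf are made up for this illustration and are not part of the sample code or of the Digester.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical hand-written SAX handler (not from the sample code): records
// the owner attribute of every <entry> nested directly under <entries>.
class EntryOwnerHandler extends DefaultHandler {
    // Currently open elements; push() puts the innermost element at the head.
    private final Deque<String> path = new ArrayDeque<>();
    final List<String> owners = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        path.push(qName);
        // Comparable to matching a Digester pattern such as "entries/entry".
        if (currentPath().equals("entries/entry")) {
            owners.add(attrs.getValue("owner"));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.pop();
    }

    // Joins the open elements, outermost first, into a slash-separated path.
    private String currentPath() {
        StringBuilder sb = new StringBuilder();
        for (Iterator<String> it = path.descendingIterator(); it.hasNext();) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(it.next());
        }
        return sb.toString();
    }

    // Parses the given XML text and returns the owners in document order.
    static List<String> ownersOf(String xml) throws Exception {
        EntryOwnerHandler handler = new EntryOwnerHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.owners;
    }
}
```

Every extra element or attribute you care about means more bookkeeping in startElement and endElement, which is exactly the part the Digester rules take over.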


Basically here are the steps to use the Digester:

  1. Initialize your digester - usually just call the default constructor
  2. Define rules that tell the digester when to do stuff and what to do
  3. Give it the file you are trying to parse by passing it into the parse() method
  4. Take the object returned by parse(), and you're done :)

You have 2 options for defining the rules:

  1. Inside your Java code
  2. Inside an XML file

I'll present both methods and comment on them.

So here is the XML file that we will be parsing:

<?xml version="1.0" encoding="ISO-8859-1"?>
<entries>
   <entry owner="sayedh" created="2005-09-19T08:30:00-05:00">
      <subject style="simple">Sample subject</subject>
      <body style="bodySimple">The contents of last nights dream goes right in here!</body>
      <permissions>
         <include>
            <user>mollyk</user>
            <user>mikem</user>
            <user>keelys</user>
            <user>gilbertoc</user>
            <group>friends0</group>
         </include>
         <exclude>
            <user>desireel</user>
         </exclude>
      </permissions>
   </entry>
   <entry owner="sayedh" created="2005-09-17T04:25:00-05:00">
      <subject style="simple">Subject 2</subject>
      <body style="bodySimple">Body 2 in here</body>
      <permissions>
         <include>
             <group>everyone</group>
         </include>
         <exclude>
            <user>desireel</user>
         </exclude>
      </permissions>
   </entry>
   <entry owner="sayedh" created="2005-09-16T01:25:22-05:00">
      <subject style="simple">Subject 3</subject>
      <body style="bodySimple">Body 3 in here</body>
      <permissions>
         <include>
             <group>everyone</group>
         </include>
         <exclude>
            <user>desireel</user>
         </exclude>
      </permissions>
   </entry>
</entries>

This is an overly simplified representation of entries in a journal. Note: I'm the owner of the
Dreamcatcher Project, which is an eXtensible forum. The example code will be available on that
project page, so I'll try to make most of it somehow related. But this XML file is obviously not what is going to be used :)


In the sample code you'll see that we have the following objects defined, in addition to the parsing classes.

Class name | Description
Entry | Holds an entry
EntryCollection | Holds Entry objects
Person | Represents a single user
Group | Represents a group of users
User | Interface to abstract Person and Group
UserType | Enumeration of the different user types
Permissions | Holds people who are included/excluded from viewing the entry
XMLAble | Just a method to implement that will return the XML format of the object

So you can see how these objects relate to the XML elements. Some elements are simply members of the parent element's object.
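
To make that relationship a bit more concrete, here is a rough sketch of what three of these classes could look like. The method names addEntry, setOwner, setSubject, setSubjectStyle, setPermissions, includeUser and excludeUser are the ones the rules later in this entry refer to. Everything else (the field types, the getters, and using plain Strings instead of the Person and Group classes) is an assumption made for illustration and is not the actual sample code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only, not the real sample code. In a real project each class would
// be public, in its own file, with a public no-argument constructor so the
// Digester can instantiate it.
class Permissions {
    // Assumption: users are kept as plain names rather than Person objects.
    private final List<String> includedUsers = new ArrayList<>();
    private final List<String> excludedUsers = new ArrayList<>();

    public void includeUser(String name) { includedUsers.add(name); }
    public void excludeUser(String name) { excludedUsers.add(name); }

    public List<String> getIncludedUsers() { return includedUsers; }
    public List<String> getExcludedUsers() { return excludedUsers; }
}

class Entry {
    private String owner;          // from the owner attribute of <entry>
    private String subject;        // from the text of <subject>
    private String subjectStyle;   // from the style attribute of <subject>
    private Permissions permissions;
    // created, body and bodyStyle would follow the same getter/setter pattern.

    public String getOwner() { return owner; }
    public void setOwner(String owner) { this.owner = owner; }

    public String getSubject() { return subject; }
    public void setSubject(String subject) { this.subject = subject; }

    public String getSubjectStyle() { return subjectStyle; }
    public void setSubjectStyle(String subjectStyle) { this.subjectStyle = subjectStyle; }

    public Permissions getPermissions() { return permissions; }
    public void setPermissions(Permissions permissions) { this.permissions = permissions; }
}

class EntryCollection {
    private final List<Entry> entries = new ArrayList<>();

    // Called once for each <entry> element that has been read.
    public void addEntry(Entry entry) { entries.add(entry); }

    public List<Entry> getEntries() { return entries; }
}
```

Notice that subject and its style attribute do not get a class of their own; they end up as two properties on Entry.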


In addition to this you'll find 2 classes that are responsible for parsing the files. Those are:

Class name | Description
Creator | Builds the EntryCollection by defining rules in the Java code
CreatorXML | Builds the EntryCollection by defining rules in an XML file

Here are the rules defined in the Java Creator class:

   // needs: java.io.IOException, org.xml.sax.SAXException, org.apache.commons.digester.Digester

   /**
    * Have the Digester read the XML input file and return all entries contained within it.
    *
    * @return the EntryCollection built from the parsed XML
    * @throws SAXException if the XML cannot be parsed
    * @throws IOException if the input file cannot be read
    */
   public EntryCollection buildFromXMLString() throws IOException, SAXException {

      Digester digester = new Digester();
      digester.setValidating(false);

      // the root element creates the collection that parse() returns
      digester.addObjectCreate("entries", EntryCollection.class);

      // now deal with the Entry class: create it, copy its attributes, add it to the collection
      digester.addObjectCreate("entries/entry", Entry.class);
      digester.addSetProperties("entries/entry", "owner", "owner");
      digester.addSetProperties("entries/entry", "created", "created");
      digester.addSetNext("entries/entry", "addEntry");

      // element text goes to a bean property, the style attribute to a second property
      digester.addBeanPropertySetter("entries/entry/subject", "subject");
      digester.addSetProperties("entries/entry/subject", "style", "subjectStyle");

      digester.addBeanPropertySetter("entries/entry/body", "body");
      digester.addSetProperties("entries/entry/body", "style", "bodyStyle");

      // a parameter count of 0 passes the element's text as the single argument
      digester.addObjectCreate("entries/entry/permissions", Permissions.class);
      digester.addCallMethod("entries/entry/permissions/include/user", "includeUser", 0);
      digester.addCallMethod("entries/entry/permissions/include/group", "includeGroup", 0);
      digester.addCallMethod("entries/entry/permissions/exclude/user", "excludeUser", 0);
      digester.addCallMethod("entries/entry/permissions/exclude/group", "excludeGroup", 0);
      digester.addSetNext("entries/entry/permissions", "setPermissions", "com.sedodream.blog.dig.Permissions");

      EntryCollection parsedValue = (EntryCollection) digester.parse(this.inputFile);
      return parsedValue;
   }


That's only 17 lines to create the mapping and to actually build those objects! That is the strong point of the Digester.
You don't have to create crazy methods to deal with startElement and you don't need to bind to a specific XSD. Not being bound to
an XSD is one thing that turned me on to the Digester, because many of the XML documents that I work with are a work in progress. It's quite
rare that their definition is set in stone, so I need a method of parsing the files that is not tied to one. Another cool thing: all the
parsing is in one place and easy to understand, after you get the hang of it that is. So if your XML file changes a lot, you still only have
17 lines of code to change! Now on to the XML file that describes these rules, here it is.

<?xml version="1.0"?>

<digester-rules>
   <object-create-rule pattern="entries" classname="com.sedodream.blog.dig.EntryCollection" />

   <object-create-rule pattern="entries/entry" classname="com.sedodream.blog.dig.Entry" />
  
   <set-properties-rule pattern="entries/entry">
      <alias attr-name="owner" prop-name="owner" />
      <alias attr-name="created" prop-name="created" />

   </set-properties-rule>

   <pattern value="entries/entry">
      <set-next-rule methodname="addEntry" />
      <call-method-rule methodname="setOwner" pattern="owner"/>
      <bean-property-setter-rule pattern="subject" propertyname="subject"/>

      <set-properties-rule pattern="subject">
         <alias attr-name="style" prop-name="subjectStyle"/>
      </set-properties-rule>

      <bean-property-setter-rule pattern="body" propertyname="body"/>
      <set-properties-rule pattern="body">
         <alias attr-name="style" prop-name="bodyStyle"/>

      </set-properties-rule>
     
      <object-create-rule pattern="permissions" classname="com.sedodream.blog.dig.Permissions" />
      <pattern value="permissions">
         <pattern value="include">
            <call-method-rule pattern="user" methodname="includeUser" paramcount="0" paramtypes="java.lang.String" />
            <call-method-rule pattern="group" methodname="includeGroup" paramcount="0" paramtypes="java.lang.String" />  
         </pattern>

         <pattern value="exclude">
            <call-method-rule pattern="user" methodname="excludeUser" paramcount="0" />
            <call-method-rule pattern="group" methodname="excludeGroup" paramcount="0" />
         </pattern>
        
         <set-next-rule methodname="setPermissions" paramtype="com.sedodream.blog.dig.Permissions" />
        
      </pattern>

   </pattern>  
</digester-rules>


Ok, so at this point I could dive into how you should create these rules and what is what, but I'm not going to do that, for a couple of reasons:

  1. I'd probably say something that was wrong cuz I haven't researched or used the Digester enough
  2. That's not the purpose of this blog. The purpose is to present it and give you my thoughts, with the hope of getting feedback from you!

Here is what I've found: the Digester is pretty cool for loading XML files that only need to be read. Now don't take this too
lightly; you must seriously think about it before you consider the Digester as an option, because how often do things change? Even in projects
where you don't expect to have to output the objects as XML, you may have to. I must admit that when I started writing the sample code, the thought
of actually printing the XML didn't occur to me. But then as I was working I decided to, so I had to do it all by hand. In this case it was
pretty easy, but in yours it may not be!
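
To show what "by hand" means, here is a small sketch of writing an object back out as XML without any library help. It is not the code from the sample project: the interface name XmlAble, the method name toXml and the class JournalEntry are stand-ins invented for this illustration, and only three fields are covered.

```java
// Hypothetical stand-in for the XMLAble interface in the sample code;
// the method name toXml is an assumption.
interface XmlAble {
    String toXml();
}

// Simplified, made-up entry class with just enough fields to show the idea.
class JournalEntry implements XmlAble {
    private final String owner;
    private final String subject;
    private final String body;

    JournalEntry(String owner, String subject, String body) {
        this.owner = owner;
        this.subject = subject;
        this.body = body;
    }

    // Escapes the five characters that are special in XML text and attribute values.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    @Override
    public String toXml() {
        // Mirrors the <entry> layout of the example file, without permissions.
        return "<entry owner=\"" + escape(owner) + "\">"
             + "<subject>" + escape(subject) + "</subject>"
             + "<body>" + escape(body) + "</body>"
             + "</entry>";
    }
}
```

Even in this toy version you have to remember the escaping yourself, and every change to the XML format now has to be made in two places: the Digester rules and the output code.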


When given the option to write the rules in code or in XML, always pick XML. I thought that it was much easier to see how the rules related
to my XML file when I was creating the XML rules file. In your case, if you don't have to use them, I would disregard the code-based rules altogether. I think
that you'll grasp the workings of the Digester much more easily when writing the XML files, and you'll be more likely to reduce the time you spend
writing those rules. Also, I didn't find much good documentation about the format of the XML rules, so I had to resort to the digester-rules.dtd
file which is included with the Digester distribution. If you can't find what you're looking for online after like 1 minute, I'd look there
for sure.

Conclusions

Despite liking the Digester and actually having fun learning it, I will continue to be cautious when faced with the decision to use it or not,
because there is no means to go from Java Object -> XML, only XML -> Java Object. So in that light I'll be looking for some other options
in the future. Most likely the next one will be Castor, but probably not for a while. Oh wait, I think I'd
be more likely to evaluate using Hibernate for this task first. It is closer to a project that
I'm working on currently. I'll be sure to let you know how that goes.


Please let me know what you think of this entry and what's wrong with it. I'm always looking for feedback. As for the source, you can always get the
latest version at my Dreamcatcher Project page. Just go to the 'Version control - CVS' link on the left
and you can either browse the source online or download it. I have also made all the source available in a zip at:
example source for your
convenience. You will find a build.xml file, which is an Ant build file. From that you can
compile and run the examples. Note you'll need Java 5 for this.


Related Resources


Relevant Environment Info

  • JDK: 1.5.0_01
  • Eclipse info: 3.1.0 Build id: I20050610-1757