
XMLDecoder improvements

Posted by malenkov on October 31, 2006 at 9:58 AM PST

I would like to start a discussion about XMLDecoder improvements. Some requests can be found in RFE 4864117. I don't want to discuss improvements of persistence delegation (XMLEncoder) here.

How to read objects

The following code is typically used to read an XML file that represents a JavaBeans archive:

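import java.beans.XMLDecoder;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public static Object[] readXML( InputStream stream ) {
    List<Object> list = new ArrayList<Object>();
    XMLDecoder decoder = new XMLDecoder( stream );
    try {
        // Read until the decoder reports that the archive is exhausted.
        while ( true ) {
            list.add( decoder.readObject() );
        }
    } catch ( ArrayIndexOutOfBoundsException exception ) {
        // Expected: readObject() throws this when no objects remain.
    } finally {
        decoder.close();
    }
    return list.toArray();
}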

This is the right approach when you don't know how many objects the JavaBeans archive contains. You can also customize the decoder with a class loader, an exception listener and an owner. However, you have to repeat this setup for every file you parse.
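To make the customization cost concrete, here is a minimal sketch that uses the four-argument XMLDecoder constructor (available since Java 5). The class name ConfiguredReader and the sample archive are only illustrative assumptions; note that a fresh decoder has to be configured for every stream.

```java
import java.beans.ExceptionListener;
import java.beans.XMLDecoder;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper, not part of the JDK.
class ConfiguredReader {
    // Reads every object from one archive with an explicitly configured decoder.
    static List<Object> readAll(InputStream stream, Object owner,
                                ExceptionListener listener, ClassLoader loader) {
        List<Object> objects = new ArrayList<>();
        XMLDecoder decoder = new XMLDecoder(stream, owner, listener, loader);
        try {
            while (true) {
                objects.add(decoder.readObject());
            }
        } catch (ArrayIndexOutOfBoundsException endOfArchive) {
            // readObject() signals the end of the archive this way.
        } finally {
            decoder.close();
        }
        return objects;
    }

    public static void main(String[] args) {
        String xml = "<java><string>text</string><int>42</int></java>";
        InputStream stream = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        List<Object> objects = readAll(stream, null,
                e -> System.err.println("Recoverable problem: " + e),
                ConfiguredReader.class.getClassLoader());
        System.out.println(objects);
    }
}
```

The owner, the listener and the class loader all have to be passed again for the next stream, which is the repetition the proposal below tries to avoid.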

I want to suggest the following code:

DocumentHandler handler = new DocumentHandler();
Object[] result1 = handler.parse( new InputSource( stream1 ) );
Object[] result2 = handler.parse( new InputSource( stream2 ) );

In this case you can parse each file with the same handler, so you don't need to customize it before each use. With InputSource you can read not only from a byte stream (InputStream), but also from a character stream (Reader) or from a file name given as a String.

DocumentHandler extends DefaultHandler, so you can use it with your own SAX parser.
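As a rough illustration of what reusing one DefaultHandler with character-stream sources looks like, here is a sketch built only on the standard SAX API. RecordingHandler is a hypothetical stand-in for DocumentHandler (it just records element names), and the class name SaxReuseSketch is an assumption made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxReuseSketch {
    // Hypothetical stand-in for DocumentHandler: it only records element names.
    static class RecordingHandler extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            names.add(qName);
        }
    }

    // Parses several character-stream sources with one handler instance.
    static List<String> parseAll(String... documents) {
        RecordingHandler handler = new RecordingHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            for (String document : documents) {
                // InputSource wraps a Reader here; it could also wrap an InputStream.
                InputSource source = new InputSource(new StringReader(document));
                factory.newSAXParser().parse(source, handler);
            }
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.names;
    }

    public static void main(String[] args) {
        System.out.println(parseAll("<java><string>a</string></java>", "<string>b</string>"));
    }
}
```

The handler is configured once and keeps its state across both documents, which is the usage pattern proposed for DocumentHandler.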

How to pass an argument

Each element that appears in the body of the outermost element (<java>) is evaluated in the context of the decoder itself. Typically this outer context is used to retrieve the owner of the decoder, which can be set before reading the archive. The owner is a property of the decoder and can be accessed in the usual way:

<?xml version="1.0" encoding="UTF-8"?>
<java version="1.4.0" class="java.beans.XMLDecoder">
  <void id="myController" property="owner"/>
  ...objects go here...
</java>
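The following sketch shows how an archive can reach the owner through the decoder context. It assumes an ArrayList as the owner and uses the two-argument XMLDecoder constructor; the class name OwnerSketch and the archive content are made up for illustration.

```java
import java.beans.XMLDecoder;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class OwnerSketch {
    // Sample archive: it fetches the decoder's owner and calls add() on it.
    static final String ARCHIVE =
            "<java version=\"1.4.0\" class=\"java.beans.XMLDecoder\">"
          + "<void property=\"owner\">"
          + "<void method=\"add\"><string>hello from the archive</string></void>"
          + "</void>"
          + "<string>done</string>"
          + "</java>";

    // Decodes the archive with a list as the owner and returns that list.
    static List<Object> decodeWithOwner(String xml) {
        List<Object> owner = new ArrayList<>();
        XMLDecoder decoder = new XMLDecoder(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), owner);
        try {
            decoder.readObject(); // reading the first object makes the decoder evaluate the archive
        } finally {
            decoder.close();
        }
        return owner;
    }

    public static void main(String[] args) {
        System.out.println(decodeWithOwner(ARCHIVE));
    }
}
```

The archive never creates the list itself; it only calls a method on the object that the application supplied as the owner.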

I think this is a bad idea for security reasons. Currently an archive can set any property of XMLDecoder and execute any of its methods. For example, an archive should never do the following:

<java>
  <object method="readObject"/>
</java>

So I think that the context of the <java> element should be the owner of DocumentHandler. This does not break backward compatibility, because we can set the appropriate owner depending on the value of the class attribute of the <java> element.

XMLDecoder was created to read JavaBeans archives generated by XMLEncoder, but some people prefer to write the XML manually. In many cases a JavaBeans archive contains only one element, which does not use the context. DocumentHandler does not require the <java> element. For example, the following code

<java>
  <string>text</string>
</java>

can be replaced with

<string>text</string>

How to use variables

When a graph contains cycles or multiple references to the same object, an identifier must be given to the object so that it can be referred to later. Identifiers are created using the id attribute, which binds a name to the expression value. The identifier has global scope extending from the last argument of the expression to the end of the file. The following expression creates an identifier button1, bound to an instance of the JButton class:

<void id="button1" class="javax.swing.JButton"/>

A reference to a named instance is made by using an idref attribute in the element. The following expression refers to the previously defined instance button1:

<object idref="button1"/>
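Here is a small sketch of id and idref working together in the existing XMLDecoder. It binds a plain java.lang.Object to the hypothetical identifier shared and adds it to a list twice; the class name SharedReferenceSketch is likewise only an assumption for this example.

```java
import java.beans.XMLDecoder;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

class SharedReferenceSketch {
    // The <void id=...> element creates the instance without returning it from readObject().
    static final String ARCHIVE =
            "<java>"
          + "<void id=\"shared\" class=\"java.lang.Object\"/>"
          + "<object class=\"java.util.ArrayList\">"
          + "<void method=\"add\"><object idref=\"shared\"/></void>"
          + "<void method=\"add\"><object idref=\"shared\"/></void>"
          + "</object>"
          + "</java>";

    // Decodes the archive and returns the list that holds the shared instance twice.
    static List<?> decode() {
        XMLDecoder decoder = new XMLDecoder(
                new ByteArrayInputStream(ARCHIVE.getBytes(StandardCharsets.UTF_8)));
        try {
            return (List<?>) decoder.readObject();
        } finally {
            decoder.close();
        }
    }

    public static void main(String[] args) {
        List<?> list = decode();
        // Both entries should refer to one and the same object.
        System.out.println(list.size() + " entries, same instance: " + (list.get(0) == list.get(1)));
    }
}
```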

DocumentHandler allows the programmer to set the value of a variable before parsing and to get the value of a variable after parsing. You can also parse several XML files with the same instance of DocumentHandler, and the same variable can be used in all of them. For example:

DocumentHandler handler = new DocumentHandler();
handler.setVariable("input1", "example");
handler.parse( new InputSource( stream1 ) );
handler.setVariable("input2", handler.getVariable( "output1" ));
handler.parse( new InputSource( stream2 ) );

This is a way to pass many arguments to the parser.

How to add your own elements

XMLDecoder parses all elements and attributes in one method, which is very complex and hard to maintain. DocumentHandler uses a separate ElementHandler for each element. This mechanism allows the programmer to create new elements easily. For example, the test application adds 4 custom elements:

DocumentHandler handler = new DocumentHandler();
handler.setElementHandler( "music", handler.getElementHandler( "java" ) );
handler.setElementHandler( "group", GroupElement.class );
handler.setElementHandler( "album", AlbumElement.class );
handler.setElementHandler( "track", TrackElement.class );

  1. The <music> element is the topmost element. It uses the same ElementHandler as the <java> element.
  2. The <group> element specifies the music band. It uses the GroupElement class to parse the attributes name and home.
  3. The <album> element specifies an album of the music band. It uses the AlbumElement class to parse the attributes year, name and time.
  4. The <track> element specifies a track of the album. It uses the TrackElement class to parse the attributes name and time.

Note that it is possible to mix these elements with all basic ones.
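The idea of one handler per element can be sketched with plain SAX. This is not the DocumentHandler implementation: the registry below maps an element name to a function instead of an ElementHandler class, and the class name ElementRegistrySketch, its setElementHandler signature and the sample values are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ElementRegistrySketch extends DefaultHandler {
    // Maps an element name to the code that turns its attributes into a value.
    private final Map<String, Function<Attributes, Object>> handlers = new HashMap<>();
    final List<Object> results = new ArrayList<>();

    // Hypothetical registration method; the proposed API takes a handler class instead.
    void setElementHandler(String name, Function<Attributes, Object> handler) {
        handlers.put(name, handler);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        // Dispatch by element name; elements without a registered handler are skipped.
        Function<Attributes, Object> handler = handlers.get(qName);
        if (handler != null) {
            results.add(handler.apply(attributes));
        }
    }

    // Registers two sample handlers and parses the given document.
    static List<Object> parse(String xml) {
        ElementRegistrySketch sketch = new ElementRegistrySketch();
        sketch.setElementHandler("group",
                a -> "group " + a.getValue("name") + " from " + a.getValue("home"));
        sketch.setElementHandler("track",
                a -> "track " + a.getValue("name") + " (" + a.getValue("time") + ")");
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), sketch);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return sketch.results;
    }

    public static void main(String[] args) {
        System.out.println(parse("<music><group name=\"A\" home=\"B\">"
                + "<album year=\"1\" name=\"C\" time=\"2\"><track name=\"D\" time=\"3\"/></album>"
                + "</group></music>"));
    }
}
```

Adding a new element then means registering one more entry, without touching the code that handles the existing elements.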

How to include another XML file

I am working on this feature now, and I have to resolve the following issues:

  1. What should I use: an element or a processing instruction?
    1. <include file="name.xml">

      This way is more flexible. We can use the attributes file, idref and others to specify how the InputSource for the included data should be created.

    2. <?include name.xml?>

      This approach takes only one argument (the file name in this case), and you can configure your XML creation tool to support such a processing instruction (if it has such a feature).

  2. How should I search for the file?

    It is simple when the file name is given as a full path. But I have to decide how to find a file by its short name using the system id from the InputSource. Note that the system id can be null.
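One possible answer to the second question is to resolve the short name against the system id as a URI. The sketch below uses java.net.URI; the class name IncludeResolverSketch and the fallback for a null system id (return the name unchanged) are assumptions, not a decision that has been made for DocumentHandler.

```java
import java.net.URI;

class IncludeResolverSketch {
    // Resolves an included file name against the system id of the including document.
    // Names containing spaces or other illegal URI characters would need escaping first.
    static String resolve(String systemId, String fileName) {
        URI target = URI.create(fileName);
        if (target.isAbsolute() || systemId == null) {
            // Nothing to resolve against, or the name already carries a scheme.
            return target.toString();
        }
        return URI.create(systemId).resolve(target).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("file:/data/archives/main.xml", "name.xml"));
        System.out.println(resolve(null, "name.xml"));
    }
}
```

With this approach a short name is looked up next to the including document, while a null system id leaves the decision to whoever opens the returned name.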

About performance

You can download the following files to test the performance:

performance.jar (142 887 bytes)

This is a test application. It contains the following files to test performance:

music.xml (162 815 bytes)

This file contains original data: the list of musical bands, their albums and songs.

music.old.xml (590 765 bytes)

This file is generated from the original one by an XSL transformation using the file music.old.xslt. It contains a set of XML elements for creating the corresponding JavaBeans. This file is compatible with the old XMLDecoder.

music.new.xml (598 833 bytes)

This file is generated from the original one by an XSL transformation using the file music.new.xslt. It contains a set of commands analogous to the commands in the file music.old.xml, but this file is optimized to be read by DocumentHandler.

decoder.jar (29 648 bytes)

This is the library needed for the application.

decoder.zip (64 244 bytes)

This file contains the API documentation for the library and a description of the supported tags, but it is not complete yet.

performance.zip (37 394 bytes)

This file contains the API documentation for the application, together with its source code.

There are 4 test cases:

  1. XMLDecoder parses music.old.xml: $ java -jar performance.jar old music.old.xml
  2. DocumentHandler parses music.old.xml: $ java -jar performance.jar new music.old.xml
  3. DocumentHandler parses music.new.xml: $ java -jar performance.jar new music.new.xml
  4. DocumentHandler parses music.xml using own ElementHandlers: $ java -jar performance.jar mod music.xml

The following table contains the test results (a shorter parsing time means better performance):

case    cold start    hot start
1.      703 ms        406 ms
2.      563 ms        281 ms
3.      484 ms        187 ms
4.      187 ms        31 ms

The result table shows that DocumentHandler has better performance than XMLDecoder on the same data set. Performance is even better when new features of DocumentHandler are used.

Conclusions

We cannot remove XMLDecoder because Java should be backward compatible, but we can rewrite it using DocumentHandler. I recommend putting DocumentHandler into the public domain, because it is much more flexible and usable than the old XMLDecoder.

I would appreciate your comments: what else should we add to DocumentHandler? I would like to make it more convenient overall. You can vote for RFE 4864117 here.
