Sergey Malenkov

Sergey Malenkov's Blog

XMLDecoder improvements

Posted by malenkov on October 31, 2006 at 09:58 AM | Comments (11)

I would like to start a discussion about improvements to XMLDecoder. Some requests can be found in RFE 4864117. I don't want to discuss improvements to persistence delegates (XMLEncoder) here.

How to read objects

Usually the following code is used to read an XML file that represents a JavaBeans archive:

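import java.beans.XMLDecoder;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public static Object[] readXML( InputStream stream ) {
    List<Object> list = new ArrayList<Object>();
    XMLDecoder decoder = new XMLDecoder( stream );
    try {
        // read until the decoder runs out of objects
        while ( true ) {
            list.add( decoder.readObject() );
        }
    } catch ( ArrayIndexOutOfBoundsException exception ) {
        // thrown by readObject() when the archive contains no more objects
    } finally {
        decoder.close();
    }
    return list.toArray();
}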

This is the right way when you don't know the number of objects in the JavaBeans archive. You can also customize the decoder with a class loader, an exception listener and an owner. But you have to repeat this setup for every file you want to parse.
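As an illustration of that per-file setup, here is a minimal sketch that passes the owner, exception listener and class loader through the four-argument XMLDecoder constructor. The class name ReadAllObjects and the helpers readAll and sampleArchive are made up for this example, and the try-with-resources form assumes Java 7 or later.

```java
import java.beans.ExceptionListener;
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ReadAllObjects {

    /** Reads every object from a JavaBeans archive with a customized decoder. */
    static List<Object> readAll(InputStream stream, Object owner, ClassLoader loader) {
        List<Object> objects = new ArrayList<>();
        // Recoverable problems in the archive are reported to this listener.
        ExceptionListener listener = e -> System.err.println("decoder warning: " + e);
        try (XMLDecoder decoder = new XMLDecoder(stream, owner, listener, loader)) {
            while (true) {
                objects.add(decoder.readObject());
            }
        } catch (ArrayIndexOutOfBoundsException endOfArchive) {
            // readObject() reports "no more objects" with this exception
        }
        return objects;
    }

    /** Builds a small archive in memory so the example needs no files. */
    static byte[] sampleArchive() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(buffer)) {
            encoder.writeObject("first");
            encoder.writeObject(Integer.valueOf(2));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        InputStream stream = new ByteArrayInputStream(sampleArchive());
        // A null owner and a null class loader select the defaults.
        System.out.println(readAll(stream, null, null)); // prints [first, 2]
    }
}
```

Every new stream needs a new XMLDecoder, so the three customizations have to be passed again each time, which is the repetition the proposal below tries to avoid.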

I want to suggest the following code:

DocumentHandler handler = new DocumentHandler();
Object[] result1 = handler.parse( new InputSource( stream1 ) );
Object[] result2 = handler.parse( new InputSource( stream2 ) );

In this case you can parse each file with the same handler, so you don't need to customize it before each use. With InputSource you can use not only a byte stream (InputStream), but also a character stream (Reader) or a file name given as a String.

DocumentHandler extends DefaultHandler. So you can use it within your own SAX parser.
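DocumentHandler is the class proposed in this post and is not assumed to be available here, so the sketch below uses a hypothetical stand-in, CountingHandler, to show the two points made above: one DefaultHandler instance can be reused for several sources, and an InputSource can wrap either a byte stream or a character stream.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ReusableHandlerDemo {

    /** Hypothetical stand-in for DocumentHandler: counts the elements it sees. */
    static class CountingHandler extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            elements++;
        }
    }

    /** Parses all sources with a single handler instance and returns the total element count. */
    static int countElements(InputSource... sources) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CountingHandler handler = new CountingHandler(); // configured once, reused for every source
        for (InputSource source : sources) {
            parser.parse(source, handler);
        }
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<java><string>text</string></java>";
        // Byte stream and character stream; new InputSource(String) would take a system id instead.
        InputSource bytes = new InputSource(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        InputSource chars = new InputSource(new StringReader(xml));
        System.out.println(countElements(bytes, chars)); // prints 4
    }
}
```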

How to pass an argument

Each element that appears in the body of the outermost element (<java>) is evaluated in the context of the decoder itself. Typically this outer context is used to retrieve the owner of the decoder, which can be set before reading the archive. The owner is a property of the decoder and can be accessed in the usual way:

<?xml version="1.0" encoding="UTF-8"?>
<java version="1.4.0" class="java.beans.XMLDecoder">
  <void id="myController" property="owner"/>
  ...objects go here...
</java>

I think this is a bad idea for security reasons: an archive can currently set any property of XMLDecoder and call any of its methods. For example, do not do the following:

<java>
  <object method="readObject"/>
</java>

So I think that the context of the <java> element should be the owner of the DocumentHandler. This does not break backward compatibility, because we can choose the appropriate owner depending on the value of the class attribute of the <java> element.
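For reference, the existing owner mechanism described at the start of this section can be exercised with the standard XMLDecoder roughly as follows. This is a sketch: the class name OwnerDemo, the archive text and the use of an ArrayList as the owner are assumptions made for the example.

```java
import java.beans.XMLDecoder;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class OwnerDemo {

    // The archive fetches the decoder's owner and calls add(...) on it.
    static final String ARCHIVE =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<java version=\"1.4.0\" class=\"java.beans.XMLDecoder\">"
        + "  <void property=\"owner\">"
        + "    <void method=\"add\"><string>called from the archive</string></void>"
        + "  </void>"
        + "  <string>payload</string>"
        + "</java>";

    /** Decodes the archive with a list as owner and returns that list afterwards. */
    static List<Object> decodeWithOwner() {
        List<Object> owner = new ArrayList<>();
        ByteArrayInputStream in = new ByteArrayInputStream(ARCHIVE.getBytes(StandardCharsets.UTF_8));
        try (XMLDecoder decoder = new XMLDecoder(in, owner)) {
            decoder.readObject(); // returns "payload"; the <void> elements run as a side effect
        }
        return owner;
    }

    public static void main(String[] args) {
        System.out.println(decodeWithOwner());
    }
}
```

The same route that reaches getOwner() here reaches every other public method of the decoder, which is the concern raised above.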

XMLDecoder was created to read JavaBeans archives generated by XMLEncoder, but some people prefer to write the XML by hand. In many cases a JavaBeans archive contains only one element, which does not use the context. DocumentHandler does not require the <java> element. For example, the following code

<java>
  <string>text</string>
</java>

can be replaced with

<string>text</string>

How to use variables

When a graph contains cycles or multiple references to the same object, an identifier must be given to the object so that it can be referred to later. Identifiers are created using the id attribute, which binds a name to the expression value. The identifier has global scope extending from the last argument of the expression to the end of the file. The following expression creates an identifier button1, bound to an instance of JButton class:

<void id="button1" class="javax.swing.JButton"/>

Reference is made to named instances by using an idref attribute in the <object> element. The following expression makes reference to a previously defined instance button1:

<object idref="button1"/>
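A small sketch of id and idref with the standard XMLDecoder follows. The class name IdRefDemo and the archive are made up for this example, and java.util.ArrayList replaces JButton so that it runs without a display.

```java
import java.beans.XMLDecoder;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class IdRefDemo {

    // The first element creates a list and names it; the second one refers back to it.
    static final String ARCHIVE =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<java version=\"1.4.0\" class=\"java.beans.XMLDecoder\">"
        + "  <object id=\"shared\" class=\"java.util.ArrayList\"/>"
        + "  <object idref=\"shared\"/>"
        + "</java>";

    /** Returns true when both decoded objects are the very same instance. */
    static boolean sameInstance() {
        ByteArrayInputStream in = new ByteArrayInputStream(ARCHIVE.getBytes(StandardCharsets.UTF_8));
        try (XMLDecoder decoder = new XMLDecoder(in)) {
            Object first = decoder.readObject();
            Object second = decoder.readObject();
            return first == second;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameInstance());
    }
}
```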

DocumentHandler allows the programmer to set the value of a variable before parsing and to get the value of a variable after parsing. You can also parse several XML files with the same instance of DocumentHandler, and the same variable can be used in all of the parsed files. For example:

DocumentHandler handler = new DocumentHandler();
handler.setVariable("input1", "example");
handler.parse( new InputSource( stream1 ) );
handler.setVariable("input2", handler.getVariable( "output1" ));
handler.parse( new InputSource( stream2 ) );

This is a way to pass many arguments to the parser.

How to add own elements

XMLDecoder parses all elements and attributes in one method, which is very complex and hard to maintain. DocumentHandler uses a separate ElementHandler for each element. This mechanism allows the programmer to create new elements easily. For example, the test application adds 4 custom elements:

DocumentHandler handler = new DocumentHandler();
handler.setElementHandler( "music", handler.getElementHandler( "java" ) );
handler.setElementHandler( "group", GroupElement.class );
handler.setElementHandler( "album", AlbumElement.class );
handler.setElementHandler( "track", TrackElement.class );
  1. The <music> element is the topmost element. It uses the same ElementHandler as the <java> element.
  2. The <group> element specifies the music band. It uses the GroupElement class to parse the attributes name and home.
  3. The <album> element specifies an album of the music band. It uses the AlbumElement class to parse the attributes year, name and time.
  4. The <track> element specifies a track of the album. It uses the TrackElement class to parse the attributes name and time.

Note that it is possible to mix these elements with all basic ones.
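The general idea of one handler per element name can be sketched on top of plain SAX. The code below is not the DocumentHandler implementation: RegistryHandler, its setElementHandler method and the sample data are hypothetical, and each handler is reduced to a function over the element's attributes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ElementRegistryDemo {

    /** Hypothetical dispatcher: looks up a small handler by element name. */
    static class RegistryHandler extends DefaultHandler {
        private final Map<String, Function<Attributes, Object>> handlers = new HashMap<>();
        final List<Object> results = new ArrayList<>();

        void setElementHandler(String name, Function<Attributes, Object> handler) {
            handlers.put(name, handler);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes)
                throws SAXException {
            Function<Attributes, Object> handler = handlers.get(qName);
            if (handler == null) {
                throw new SAXException("Unknown element: " + qName);
            }
            Object value = handler.apply(attributes);
            if (value != null) {
                results.add(value); // container elements return null and produce nothing
            }
        }
    }

    static List<Object> parse(String xml) throws Exception {
        RegistryHandler handler = new RegistryHandler();
        handler.setElementHandler("music", a -> null);
        handler.setElementHandler("group", a -> "group " + a.getValue("name"));
        handler.setElementHandler("album", a -> "album " + a.getValue("name") + " (" + a.getValue("year") + ")");
        handler.setElementHandler("track", a -> "track " + a.getValue("name"));
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        // Invented sample data using the attribute names listed above.
        String xml = "<music><group name='Example Band' home='Nowhere'>"
                   + "<album year='2000' name='Demo' time='1:00'>"
                   + "<track name='Intro' time='1:00'/></album></group></music>";
        System.out.println(parse(xml)); // prints [group Example Band, album Demo (2000), track Intro]
    }
}
```

Adding a new element means registering one more entry, without touching the code that handles the existing elements.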

How to include another XML file

I am working on this feature now, and I still have to resolve the following issues:

  1. What should I use: element or processing instruction?
    1. <include file="name.xml">

      This way is more flexible. We can use the attributes file, idref and others to specify how the InputSource for the included data should be created.

    2. <?include name.xml?>

      This approach uses only one argument (file name in this case) and you can configure your XML creation tool to support such processing instruction (if it has such a feature).

  2. How should I search for the file?

    It is simple when the file name is given as a full path. But I have to decide how to find a file by its short name using the system id from the InputSource. Note that the system id can be null.
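One possible answer to the second question, sketched under assumptions: resolve the short name against the system id of the including document when there is one, and fall back to the name itself when the system id is null. The class IncludeResolver and its resolve method are hypothetical and not part of the library.

```java
import java.net.URI;
import org.xml.sax.InputSource;

public class IncludeResolver {

    /**
     * Builds the InputSource for an included file.
     * Assumes the parent's system id, when present, is a syntactically valid URI;
     * URI.create throws IllegalArgumentException otherwise.
     */
    static InputSource resolve(InputSource parent, String name) {
        String base = parent.getSystemId();
        if (base == null) {
            // No base location is known: use the name as given.
            return new InputSource(name);
        }
        // A relative name is resolved against the base; an absolute URI is returned unchanged.
        return new InputSource(URI.create(base).resolve(name).toString());
    }

    public static void main(String[] args) {
        InputSource parent = new InputSource("file:/data/archives/main.xml");
        System.out.println(resolve(parent, "name.xml").getSystemId());
        // prints file:/data/archives/name.xml
    }
}
```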

About performance

You can download the following files to test the performance:

performance.jar (142 887 bytes)
This is a test application. It contains the following files to test performance:
music.xml (162 815 bytes)
This file contains original data: the list of musical bands, their albums and songs.
music.old.xml (590 765 bytes)
This file is generated from the original one by an XSL transformation using the file music.old.xslt. It contains a set of XML elements for creating the corresponding JavaBeans. This file is compatible with the old XMLDecoder.
music.new.xml (598 833 bytes)
This file is generated from the original one by an XSL transformation using the file music.new.xslt. It contains a set of commands analogous to those in the file music.old.xml, but this file is optimized to be read by DocumentHandler.
decoder.jar (29 648 bytes)
This is the library needed for the application.
decoder.zip (64 244 bytes)
This file contains documentation for the API of the library and for the supported tags, but it is not complete yet.
performance.zip (37 394 bytes)
This file contains documentation for the API of the application, with source code.

There are 4 test cases:

  1. XMLDecoder parses music.old.xml:

    $ java -jar performance.jar old music.old.xml

  2. DocumentHandler parses music.old.xml:

    $ java -jar performance.jar new music.old.xml

  3. DocumentHandler parses music.new.xml:

    $ java -jar performance.jar new music.new.xml

  4. DocumentHandler parses music.xml using own ElementHandlers:

    $ java -jar performance.jar mod music.xml

The following table contains the test results (lower parsing time is better):

case   cold start   hot start
1.     703 ms       406 ms
2.     563 ms       281 ms
3.     484 ms       187 ms
4.     187 ms        31 ms

The results show that DocumentHandler is faster than XMLDecoder on the same data set (cases 1 and 2): 563 ms versus 703 ms on a cold start and 281 ms versus 406 ms on a hot start. Performance is even better when the new features of DocumentHandler are used (cases 3 and 4).

Conclusions

We cannot remove XMLDecoder because Java should be backward compatible, but we can rewrite it using DocumentHandler. I recommend putting DocumentHandler into the public domain, because it is much more flexible and usable than the old XMLDecoder.

I would appreciate your comments: what else should we add to DocumentHandler? I would like to make it as convenient as possible. You can vote for RFE 4864117 here.


Comments
Comments are listed in date ascending order (oldest first)

  • Yes, this needs to be reworked to make it more usable. I ended up cutting and pasting a lot of the source to make this work in an SAX filter chain. I will see if I can get some cycles to look at this soon.

    Posted by: bruff on October 31, 2006 at 03:51 PM


  • Please add security restrictions: With XMLDecoder, an attacker can "inject" arbitrary object constructions and method calls at almost any part of the graph. In clear words: If you are using XMLDecoder, an attacker can easily call arbitrary code simply by modifying the data to be parsed. To make it secure, we would at least need to restrict the parsing to a defined set of classes and optionally, a defined set of methods for each class.

    Second, but probably a bit out of scope since you said you don't care for XMLEncoder, XMLEncoder lacks a clear definition of how to treat the listeners of a JavaBean: Should they be persisted too or not? The classes in the JDK which have XMLEncoder support are behaving differently on this subject: Some persist their listeners, some don't. This causes pain in the butt...

    Posted by: christian_schlichtherle on November 01, 2006 at 05:17 AM

  • I think for persistence of object graphs, Microsoft's XAML conventions are quite good. It would be a good option over current Java encodings.

    Posted by: vhi on November 01, 2006 at 06:34 AM

  • We should guarantee backward compatibility, so we can't replace the existing format with XAML. But my DocumentHandler allows you to create and use custom ElementHandlers that can support XAML elements. This is the next step. I'll think about it.

    Posted by: malenkov on November 01, 2006 at 07:45 AM

  • I don't want to discuss XMLEncoder here. Its refactoring requires a lot of time. XMLEncoder is rather a headache for me. I am thinking about it...

    Posted by: malenkov on November 01, 2006 at 07:54 AM

  • Christian, could you give me more information about security problems? Now you can't create private class or call private method:
    java.lang.IllegalAccessException: Class sun.reflect.misc.Trampoline can not access a member of class java.beans.ReflectionUtils with modifiers ""
    java.lang.NoSuchMethodException: =Class.initialize(); (for public class java.beans.PropertyEditorManager)
    I think it is enough.

    Posted by: malenkov on November 01, 2006 at 08:36 AM

  • XMLEncoder is dead, long live XStream: http://xstream.codehaus.org/

    Seriously, I see absolutely no reason for why one would *ever* want to use XMLEncoder instead of XStream.

    Gili

    Posted by: cowwoc on November 01, 2006 at 12:50 PM


  • Sergey, the problem is that XMLDecoder allows access to ALL public elements. For example I can easily place a File object into the list of persisted objects and call its delete() method.

    Again, if an application is using XMLDecoder, attackers could easily call malicious code. If they could also modify the class path, then they could even inject arbitrary code.

    To fix this, there should be an optional argument with a tree of allowed Class and Method objects. When deserializing, only those classes and methods should be allowed to be used.

    Posted by: christian_schlichtherle on November 04, 2006 at 06:16 AM

  • As an even simpler alternative, the decoder could call into a filter interface. The application's implementation would then check whether invoking this constructor or calling this method would be OK.

    Posted by: christian_schlichtherle on November 04, 2006 at 06:19 AM

  • Gili, XStream seems good, but we can't change the architecture the way other libraries do, because Java should be backward compatible. Also, XStream uses internal fields, so it can have problems when a security manager is installed and can break when the internal implementation changes.

    Posted by: malenkov on November 26, 2006 at 08:05 AM

  • Christian, I got it, but we can't change the behavior, because many people use Long-Term Persistence in their applications and don't have the ability to rebuild them. So you should find other ways to improve security. For example, check the source of the JavaBeans archive or set a custom security manager.
    A filter interface for XMLDecoder is an interesting idea. If it is implemented, we can set our own filter to check each method before it is executed, but all methods should still be allowed by default.

    Posted by: malenkov on November 26, 2006 at 08:58 AM
