Kirill Grouchnikov's Blog
Native XML support in Dolphin
Posted by kirillcool on July 16, 2005 at 01:10 AM
In the "Evolving the Java language" technical session at the last JavaOne, Mark Reinhold shed some light on the future of Java XML support. The slides for the technical sessions have finally been uploaded to the conference webpage, so download the PDF for session TS-7955. The executive summary of the relevant slides is:
And now, hoping the above list did not offend too many lawyers, let's proceed. As outlined in the first three items, the current state of affairs in working with XML is far from satisfactory. If you take a web designer, it usually means exceptional HTML and CSS skills, and good Javascript. Javascript is very far from Java, but it's fairly easy to use. How about throwing in an XML parser? We can all be excited about StAX, but the "ease of use" column is somewhat misleading. It's easy to use when you talk with your dog in JVM bytecodes, but it's certainly not easy for non-technical guys.
IBM continues to develop XJ - XML enhancements for Java. An XJ program treats XSD schemas as "first-class" citizens, allowing you to import them as regular Java classes, read and write attributes and elements, unmarshal strings, streams and files to "virtual" objects and marshal these objects back (a "virtual" object is an object of a class that directly corresponds to some schema artifact, without the need to explicitly generate this class). As Mark pointed out during his talk, this approach is too restrictive - you can work only on XML that is valid according to a predefined set of schemas. The same can be said about JAXB 2.0. It doesn't matter if you start with a schema and create classes, or if you start with your classes. As long as the input cannot be completely mapped to your classes, the unmarshaller will fail. In addition, you cannot add arbitrary elements during marshalling.
The approach that Mark outlined in his talk is the complete opposite - no schema, no class for the data, only working with XML tags (that in the proposed syntax cannot even be externalized). Complete freedom that comes at the cost of optional validation and typechecking, and with syntax that is far from readable (except for your JVM-compliant dog).
So, what am I looking for? Suppose I have two simple classes, Customer and Order, that look like this:
class Order {
// has get-set pair
private int id;
}
class Customer {
// has get-set pair
private int id;
// has get-set pair
private String name;
// has get-set pair
private List<Order> orders;
}
A simple way to annotate these classes with JAXB 2.0 is to put the following on each class:
@XmlAccessorType(AccessType.FIELD)
And maybe the following on Customer:
@XmlRootElement(name = "customer")
Taking a simple XML
<customer>
<id>1</id>
<name>Dan</name>
<order>
<id>1</id>
</order>
<order>
<id>1</id>
</order>
</customer>
I'd like to be able to simply write
String xml = ...; // contains the above XML
Customer cust = xml;
Here the auto-unboxing would call the JAXB 2.0 unmarshaller (which is already a part of Mustang). The same auto-unboxing should be provided for File, Reader and InputStream as well. If I want to change my customer, I simply change the field:
cust.setName("Arnold");
The marshalling should be as simple as unmarshalling
String newXml = cust;
Here, the auto-boxing should be called. In this case, the default behaviour of toString() can be marshalling using JAXB 2.0 (in case the Customer class does not override the default implementation of toString()). Auto-boxing should also be provided for Writer, OutputStream and File. Looping over elements in Customer should be kept as simple as possible:
for (Order order : cust.getOrders[id>3]) {
System.out.println(order.getId());
}
Here, the compiler knows the exact type of the id field and can invoke the getter function. The number of extra syntax elements (hash marks, slashes, apostrophes) should be zero. The code should be easy to read.
Now, the more interesting problem. What if we don't have a schema? What if we are working on XML that has extra elements or attributes that our functions should simply ignore? What if we need to add extra elements or attributes for subsequent modules? The answer is simple, was introduced long ago in Java, and was reinforced in 5.0 with generics - the extends keyword. Combined with the "implicit" properties and functions introduced on enums in 5.0, it can provide clean solutions to the problems stated above. Suppose that I get the following XML
<customer>
<id>1</id>
<name>Dan</name>
<age>32</age>
<order>
<id>1</id>
<extId>1000</extId>
</order>
<order>
<id>1</id>
<extId>2000</extId>
</order>
</customer>
The age and extId elements (marked in red in the original post) cannot be mapped to the Customer and Order classes. However, our code doesn't use these elements at all. How can we make our code work and the compiler happy? Make the compiler perform an implicit narrowing conversion:
String xml = ...; // contains the above XML
Customer cust = xml;
The code is exactly as it was. The unmarshaller should simply discard all the "irrelevant" information, just as is done with a regular upcast. Of course, a regular upcast doesn't really change the class, so you can always downcast back (at your own risk). This will be clarified in the following examples.
Suppose now that you wish to handle the new fields, but you can't change the Customer and Order classes (for example, they are part of an external jar). Now the code can look like this:
String xml = ...; // contains the above XML
extends Customer cust = xml;
The extends keyword instructs the compiler (and the unmarshaller) to keep the extra information (exactly as is done with enums and the name() function). This keyword applies to all internal elements (Order in our case). How can we access the new (undeclared) fields? The same way as regular fields:
System.out.println(cust.age);
Here, the unmarshaller stored the value of age in some internal map, and the compiler retrieves that value for us. Here, there are three possible cases:
// must extend Object as we don't know its type
for (extends Object age : cust.age) {
// implicit function provided by the compiler
if (age.isSimple()) {
// the cast will succeed
System.out.println((String)age);
}
}
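The "internal map" idea can be approximated in today's Java without any language change. What follows is only a sketch under stated assumptions: ExtendedCustomer, its undeclared map and the unmarshal helper are hypothetical names that belong neither to JAXB nor to the proposal, and the parsing uses the JDK's built-in DOM parser instead of JAXB. It handles simple child elements only; nested elements such as order are left out to keep it short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ExtendedCustomerSketch {
    // Hypothetical stand-in for "extends Customer":
    // declared fields plus a map that keeps undeclared elements.
    static class ExtendedCustomer {
        int id;
        String name;
        final Map<String, List<String>> undeclared = new LinkedHashMap<>();
    }

    // Maps known child elements to fields and stores everything else in the map.
    // Failures are rethrown as one unchecked exception.
    static ExtendedCustomer unmarshal(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            ExtendedCustomer cust = new ExtendedCustomer();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace and comments
                }
                String tag = n.getNodeName();
                String text = n.getTextContent().trim();
                switch (tag) {
                    case "id":
                        cust.id = Integer.parseInt(text);
                        break;
                    case "name":
                        cust.name = text;
                        break;
                    default:
                        // undeclared element: keep it instead of failing
                        cust.undeclared.computeIfAbsent(tag, k -> new ArrayList<>()).add(text);
                }
            }
            return cust;
        } catch (Exception e) {
            throw new IllegalStateException("Could not unmarshal customer", e);
        }
    }

    public static void main(String[] args) {
        ExtendedCustomer cust = unmarshal(
                "<customer><id>1</id><name>Dan</name><age>32</age></customer>");
        System.out.println(cust.name);
        System.out.println(cust.undeclared.get("age"));
    }
}
```

Storing each undeclared name as a list of values is one way to cover the case where the same undeclared element appears more than once; the proposal itself does not say how the map would be organized.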
What about adding new elements? Simply call
cust.weight = 180;
This can only compile on extends Customer. If the type of "cust" is Customer, the compiler should issue an error message. Continuing this line of thought, the regular rules for casting, narrowing and passing objects as parameters apply:
private Customer cust1;
private extends Customer cust2;
void test() {
// narrowing implicit cast - all undeclared
// attributes and elements are discarded
cust1 = cust2;
// widening implicit cast
cust2 = cust1;
}
Here we have a special case - although both cust1 and cust2 point to the same "base" object, changes to cust1 are seen in cust2, but changes to cust2 are seen in cust1 only on declared fields. If we have another extends Customer cust3 that points to cust2, it's the same object. In this case, all three are pointing to the same object in memory, but calling the marshaller on cust1 will emit only declared fields, while calling the marshaller on cust2 or cust3 will emit all fields. This way, the referencing model is preserved, and the compiler does not allow adding or retrieving undeclared fields from cust1.
Another example - Set
In this case, this function will print null on the cust1.age - it was implicitly widened based on the type of the Set, but doesn't contain information on age. Another example:
void foo(Customer cust) {
// widening cast - undeclared fields
// can appear after the call.
bar(cust);
}
void bar(extends Customer cust) {
// implicit narrowing cast - only declared fields
// can be changed.
foo(cust);
}
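The referencing model described earlier (one object in memory, with the marshaller emitting either only the declared fields or all fields depending on the static type) can be imitated in plain Java. This is a rough sketch, not the proposed compiler behaviour: CustomerData, marshalDeclared and marshalExtended are hypothetical names, the XML is built by hand, and no escaping of special characters is attempted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CustomerViewsSketch {
    // One shared object holds both the declared fields and the undeclared extras.
    static class CustomerData {
        int id;
        String name;
        final Map<String, String> undeclared = new LinkedHashMap<>();
    }

    // What marshalling through a plain "Customer" reference would emit:
    // declared fields only.
    static String marshalDeclared(CustomerData c) {
        return "<customer><id>" + c.id + "</id><name>" + c.name + "</name></customer>";
    }

    // What marshalling through an "extends Customer" reference would emit:
    // declared fields followed by every undeclared element.
    static String marshalExtended(CustomerData c) {
        StringBuilder sb = new StringBuilder("<customer><id>").append(c.id)
                .append("</id><name>").append(c.name).append("</name>");
        for (Map.Entry<String, String> e : c.undeclared.entrySet()) {
            sb.append('<').append(e.getKey()).append('>')
              .append(e.getValue())
              .append("</").append(e.getKey()).append('>');
        }
        return sb.append("</customer>").toString();
    }

    public static void main(String[] args) {
        CustomerData shared = new CustomerData();
        shared.id = 1;
        shared.name = "Dan";
        shared.undeclared.put("age", "32");

        // A change to a declared field shows up in both outputs,
        // because both methods read the same object.
        shared.name = "Arnold";
        System.out.println(marshalDeclared(shared));
        System.out.println(marshalExtended(shared));
    }
}
```

In the proposal the choice between the two outputs would follow from the declared type of the reference; here it is made explicit by calling one method or the other.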
The extended type may provide access to its elements, as enum does (with implicit functions):
Map
where each entry can be either a String or a List (for collections).
The last example is a recursive function that traverses the input XML and dumps its contents to the console. Arguably, this function is not much simpler than its counterpart for DOM. However, most of its logic is both straightforward and simple. The functions generated implicitly by the compiler, such as isSimple() and getElements(), were highlighted in bold red in the original post:
void dumpXml(extends Object xmlObj) {
// see if it is a simple (and implicitly single) element
if (xmlObj.isSimple()) {
System.out.println((String)xmlObj);
return;
}
// see if it is a single (and complex because of the previous
// check) element
if (xmlObj.isSingle()) {
// iterate over inner elements. The getElements()
// function is generated implicitly in the same way as
// name() is for enums
for (Map.Entry
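For comparison, the DOM counterpart mentioned above could look roughly like the sketch below. It is an illustration written against the JDK's standard DOM API, not code from the original post; DomDumpSketch, dump and dumpXml are hypothetical names, and the output format (tag name, then " = " and the text for leaf elements) is an arbitrary choice.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomDumpSketch {
    // Returns true if the element has at least one child element.
    static boolean isComplex(Element el) {
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    // Recursively appends one line per element, indented by depth.
    static void dump(Element el, String indent, StringBuilder out) {
        if (!isComplex(el)) {
            // simple element: print its name and text
            out.append(indent).append(el.getTagName()).append(" = ")
               .append(el.getTextContent().trim()).append('\n');
            return;
        }
        // complex element: print its name, then descend into child elements
        out.append(indent).append(el.getTagName()).append('\n');
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                dump((Element) n, indent + "  ", out);
            }
        }
    }

    // Parses the XML string and returns the dump as text.
    // Parser failures are rethrown as one unchecked exception.
    static String dumpXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            dump(root, "", out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(dumpXml(
                "<customer><id>1</id><order><extId>1000</extId></order></customer>"));
    }
}
```

The node-type checks and casts are the kind of noise that the implicit functions in the proposal are meant to hide.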
Answers to selected comments
The marshalling and unmarshalling exceptions (including I/O and XML format) should be declared as unchecked. A few new exception classes (maybe even one) should "envelop" the existing exceptions (there are too many of them already). If your code wishes to provide corresponding support, you will have to catch the new exceptions and deal with them accordingly.
The examples are based on attributes rather than accessors, but with JAXB 2.0 this remains purely a choice. You can go either way (and don't forget that XML attributes and elements do imply a straightforward field implementation). For undeclared elements (such as age in the examples above), there can be only public attribute-style access - after all, they are not declared on the corresponding data class.
XJ's support of undeclared attributes requires the same approach as working with DOM and XPath. The only option for parsing XML that is not valid according to some schema is to use the XMLElement class, which brings back the "ease-of-use" of DOM. The only example I could find of getting information out of XMLElement was using an XPath query (in SequenceInstanceOf.xj). All other examples (which are very scarce) use this class to wrap something inside some tag and either output it as XML or put it inside another tag. There are two problems with this approach: