Is binary XML an oxymoron?
Posted by jbob on March 23, 2005 at 12:53 PM | Comments (13)
news.com recently reported that a W3C committee is recommending that the group create a standard for a binary XML format. The problem they are trying to solve is the inherent inefficiency of text.
Is this a memory lapse?
It seems we've forgotten what the notion of a Markup Language is all about. XML, like other markup languages such as HTML and WML, tags portions of text documents for one reason or another. HTML marks up text for formatting purposes and XML marks up text to make data embedded in a text document more machine readable.
All of these things are about making documents more useful. Formatting documents, embedding data in documents, and so on: that is the purpose of markup languages.
The other thing we are forgetting is that binary formats are platform optimized. This optimization is a leading cause of incompatibility between dissimilar systems.
Finally, does anyone actually expect there to be a single binary standard if the W3C actually pursues this? Many in the industry, including Microsoft, are already calling for multiple binary standards for XML.
Multiple binary standards for XML?! This whole thing is becoming a mess before it gets out of the gate.
I like XML. I think it's useful for certain purposes and use it myself for configuration files and for storing offline data. The things that make XML particularly useful are that it's human readable and that it is a standard. Daniel Steinberg provides an excellent example of why human readable data is valuable in his 2003 article on transforming iCal files with Java on O'Reilly's Mac Dev Center site.
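As a minimal sketch of the configuration-file use mentioned above, the following reads a small, human-readable XML file with the standard JAXP DOM parser that ships with Java. The `<config>`/`<setting>` layout, the setting names, and the `XmlConfigDemo` class are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XmlConfigDemo {

    // Hypothetical configuration document; anyone can read or edit it in a text editor.
    static final String CONFIG =
        "<config>\n" +
        "  <setting name=\"cacheDir\" value=\"/tmp/cache\"/>\n" +
        "  <setting name=\"maxEntries\" value=\"500\"/>\n" +
        "</config>\n";

    // Parse the XML text and collect every <setting> element as a name/value pair.
    static Map<String, String> readSettings(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("setting");
            Map<String, String> settings = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                settings.put(e.getAttribute("name"), e.getAttribute("value"));
            }
            return settings;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse config", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readSettings(CONFIG));
    }
}
```

Because the file is plain text, the same settings can be inspected or fixed by hand without any tooling, which is the property a binary encoding gives up.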
I believe the problem with the binary XML movement is that, once again, we are looking for a silver bullet. There are no silver bullets, and XML is not one either. Rather than embracing a wonderful technology for what it's good for, we will wind up jeopardizing it as we try to get it to do things that it isn't well designed for. The Fast Infoset Project (FI) provides some immediate relief for document size and performance. I think FI is solving the problem correctly.
All of this reminds me of when the whole Web Services craze started. Everyone just stopped thinking. Everything needed to be XML and everything needed to be Web Services. It was crazy.
During the early years of web services I would give talks to people deciding when and if to adopt emerging technologies. I typically praised XML and warned against what I thought was inefficient use.
4+ years later, my position remains the same. Given the current state of Internet and Wireless bandwidth along with text processing performance, it just doesn't seem desirable to use text as the basis for high volume data transmissions. Text is fat and inefficient for high volume use. Additionally, to secure that text, you must encrypt it which adds additional bandwidth, processing, and memory overhead.
I think the FI project is fixing the problem in the right place and is better than pretending we can all agree on a single binary format for XML. Eduardo Pelegri describes Fast Infoset in his blog as "GZIP for XML" and I think this is the right approach.
Let's use XML for what it's good at and get better at using it. This includes more efficient document design. Don't put everything including the kitchen sink in your messages/documents, and learn to normalize your documents and messages. I believe the answer is a new efficient standard or improvements in text compression and processing.
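To make the text compression option concrete, here is a minimal sketch that gzips a repetitive XML document with the standard `java.util.zip` classes and compares byte counts. It illustrates ordinary gzip over text XML only; it is not Fast Infoset, which defines its own encoding. The order data and the `GzipXmlDemo` class are made up for the example, and real savings depend heavily on the document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipXmlDemo {

    // Build a deliberately repetitive XML document (hypothetical order data).
    static String sampleXml(int items) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<orders>\n");
        for (int i = 0; i < items; i++) {
            sb.append("  <order><id>").append(i)
              .append("</id><status>shipped</status></order>\n");
        }
        return sb.append("</orders>\n").toString();
    }

    // Compress bytes with gzip; the stream is closed before the buffer is read.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decompress gzip bytes back to the original text bytes.
    static byte[] gunzip(byte[] packed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = sampleXml(200).getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.println("plain text: " + raw.length + " bytes");
        System.out.println("gzipped:    " + packed.length + " bytes");
    }
}
```

The receiver still ends up with ordinary text XML after `gunzip`, so the document stays human readable at both ends; the cost is the extra CPU time spent compressing and decompressing.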
Whatever happens, I'm counting on Java to continue to make it easy for me to manage and process XML.
Thanks for reading.
Comments
-
Uhh, Microsoft is not in favor of multiple binary *standards*; it is highly skeptical that a single format will cover the range of use cases that the W3C binary characterization working group has identified. While reasonable people disagree within and between companies, many feel that 0 binary XML standards is better than 2 or more.
Fast Infoset (AFAIK from a distance) is pretty much the leading contender for the "single binary standard if the W3C actually pursues this." It will be interesting to see any analyses of how it performs for use cases other than web services.
Basically, I think the Microsoft party line is essentially: XML text for interoperability across systems, platform-specific tools (e.g. Fast Infoset in Java, various things in other environments) for faster communications within systems. Maybe in a few years we will understand all this well enough to devise interoperable binary XML standards, or maybe that's a pipedream. Nobody really knows enough to say right now, so the time is not ripe for any binary XML standards.
Posted by: mchampion on March 23, 2005 at 01:34 PM
-
Thanks for the correction. In the news.com article, Michael Rys, a program manager for Microsoft's SQL Server database and a member of the W3C's XML Query Working Group, stated that multiple binary formats would be needed, not multiple standards as I stated.
Thanks!
Posted by: jbob on March 23, 2005 at 02:06 PM
-
Check out my old blog on this, especially the comments.
Posted by: johnm on March 23, 2005 at 06:42 PM
-
Fast Infoset is not gzip for XML. It is a binary XML format with all the disadvantages that come with that idea. The only advantage "fast Infoset" has over similar proposals is that it doesn't actually use the acronym "XML" in its name. It's still a fundamentally wrong-headed idea that will harm interoperability, and lead to opaque, unintelligible data. It's hard to reconcile the first part of this article with the second part. If you detest binary XML, you'll detest Fast Infoset. If you like binary XML, you might like Fast Infoset. But Fast Infoset is substantially more than gzipping XML.
Posted by: elharo on March 24, 2005 at 06:04 AM
-
I also had some additional comments on this article, especially as to how this reflects against W3C XQuery.
Posted by: jonbruce on March 24, 2005 at 07:16 AM
-
Hi Mike.
Thanks for the kind words about Fast Infoset; we like it too (:-). Just one clarification on that area. The Fast Infoset standard is not Java-specific. From day 1 all parties involved in that project recognized that what the market needed was a standard that maintained the cross-platform value that XML has delivered; in particular, I am aware of C++ implementations of Fast Infoset.
From what I can see, Microsoft is a bit ambivalent in its position. I have heard the "no single binary standard" position before, but Indigo has one binary encoding that is targeted at many uses. Indigo does support pluggable encodings through its binding machinery, but the standard bindings are there to cover most people's needs, and there is one infoset-based encoding used in them. Since Indigo's encoding is proprietary, we do not really know what it is based on, but, from what we have learned from public forums, it seems very similar to Fast Infoset.
Posted by: pelegri on March 24, 2005 at 08:02 AM
-
Eduardo: Point taken about non-Java implementations of FastInfoset; I wasn't aware of that. I really don't have any knowledge or opinion on how FastInfoset stacks up to the Indigo equivalent, and I agree that in principle, someday, a standard web services binary format might be very useful for *efficient* interop across the .NET and Java worlds. I do think the burden of proof is on those who think that someday is now. The great thing about the work of the XBC WG at W3C is that we have the tools in place for someone to make that case. I *personally* would not be distressed if the case is made, but I can't speak for the Indigo people.
My sense (trying hard to present the MS collective position) is somewhere between Elliotte Rusty Harold's "XML is text, interop is everything" position and Eduardo's "one binary encoding can do it all" position. It's very clear that binary infoset encodings can offer substantial performance benefits in systems where both sides understand the same encoding and interop with arbitrary users and systems is not an issue. We all seem to find that in our internal work, I know from many conversations with people from all sorts of companies. MS uses binary encodings internally in a number of products that interact with the outside world via XML, so the *technological* benefit is not in dispute here.
What is in dispute -- within MS and probably a bunch of other companies -- is:
a) whether one binary encoding can meet a wide range of use cases, e.g. both those in which XML size is now a problem and those in which XML speed is now a problem. I recall learning in Computer Science 101 about the size/speed tradeoff, so I'm a bit skeptical myself, but then again XML is pretty sub-optimal on both dimensions.
b) whether the actual size/speed technology advantages of such a (hypothetical) encoding would outweigh the business/social advantages of having one and only one XML format. [Actually it's more complex than that because of the multiple character encodings already allowed, the de-facto subset of XML that SOAP uses, and so on].
That is pretty much where the MS people agree with Elliotte -- we are all scrambling to make XML *text* interop a reality today, and wildcards such as XML 1.1 and "binary XML" just make that all the harder.
My personal sense is that this area is very fruitful for experimentation and industry-specific standardization (e.g. web services maybe, wireless probably), but that W3C standardization would be premature. We don't want to repeat the XML Schema experience, where something meant to meet a lot of needs was standardized before the scope of the real problem and the range of alternative approaches were clear. Speaking ONLY for myself, in 20/20 hindsight a world where something like RELAX NG was the loosely-structured document schema standard and something more like XDR was the strongly-typed data schema standard would be preferable to the one we live in now. I suspect that premature standardization on any particular binary encoding of XML in the short term will lead to similar regrets in the long term.
Posted by: mchampion on March 24, 2005 at 10:42 AM
-
A "binary XML" is not an Extensible MARK-UP Language.
It is just a binary file, just like the files we used when computers were too slow to process human-readable information.
Posted by: aces on March 24, 2005 at 12:02 PM
-
Obviously computers are still too slow to process human readable information or we wouldn't be discussing binary XML! ;)
I think aces is making my point about the apparent contradiction in terms in "binary XML". XML is indeed a markup language (hence the title), and if binary XML is not, then we are radically deviating from what XML is supposed to be.
I just don't see everyone agreeing on a single binary implementation of XML. I do see the possibility of there being several XML compression standards that may emerge but I don't believe that is what binary XML is all about.
Posted by: jbob on March 24, 2005 at 01:35 PM
-
I agree with jbob! In fact, here was a take I posted on the Binary XML and
Web Services Forum:
Is Binary XML Good For Jini?
Posted: Jan 11, 2005 11:57 AM Reply
We created an “open” mechanism that essentially circumvents firewalls by executing
RPC over port 80 (i.e. XML Web Services). However, XML Web Services are very
verbose (read: bandwidth hog). Further, the inordinately large body of standards
trying to solidify XML Web Services is large enough to merit an abbreviation
that includes a universal regular expression character in its acronym (i.e.
WS-*)! WS-* will only compound the verbosity problem. This situation naturally
begs for a binary XML protocol to reduce the number of bits on the wire.
But whose “binary” will we use? Aren’t “binary” and “open” fundamentally contrary
in the context of computing? Binary XML may well lead to proprietary XML Web
Services, which will put us right back where we were. Even if it doesn’t, and
we all agree on an “open binary” XML standard, we are still executing code remotely
over port 80.
I have heard people suggest that RPC over port 80 is tolerated because the
messages are textual, and therefore, one can “read” them. How many System Administrators
are “reading SOAP messages”? If I am correct in my assumption that people don’t
read SOAP messages coming over port 80, then why would they care if it comes
over port 80 in a binary format? By the way, whose binary format will they be
“monitoring”? If IT security managers allow text-based RPC over port 80, it
follows they will allow binary XML over port 80 (read: XML Web Services inertia).
Now that we have established binary RPC over port 80, let’s talk about binary
RPC over other ports using say…RMI. If one entity “gets to use” binary XML RPC
over port 80, why can’t another use (binary) RMI over another port? It’s only
fair. If this is true, maybe XML Web Services’ true destiny is enabling large
scale execution of RPC using protocols such as RMI!
No longer will RMI be relegated to second class citizen status! In fact, I
think Jini – which has been in existence since 1999 – is superior to XML Web
Services anyway. (Supporting references available upon request!) It doesn’t
require a “regular expression” to describe the supporting standards either.
Using its default RPC protocol (RMI), it outperforms XML Web Services. Notice
that I said “default protocol.” Jini does not require RMI as the RPC protocol;
it can use virtually any protocol for RPC – including XML Web Services, and
probably the coming binary XML protocols (plural). By the way, Jini uses the
existing Java security model, yet allows extensions where required.
This all suggests to me that binary XML protocol(s) for XML Web Services is
good news for Jini!
Thanks, Sam
Posted by: sgchance on March 24, 2005 at 07:30 PM
-
Hi! Maybe look at this from the other side: why duplicate an existing (and usable) format such as ASN.1? See http://asn1.elibel.tm.fr/en/introduction/index.htm. This format was invented (described) to exchange data between different communication devices without any dependence on them; for example, ASN.1 is commonly used in SSL communication. Maybe it's time to choose: reinvent the wheel, or take one of the ready-to-use formats and extend it to fit our needs?
Posted by: andrzejros on April 01, 2005 at 07:34 AM
-
Hi andrzejros.
The Fast Infoset standard (X.891) is being developed by the same ISO/IEC/ITU-T committee that takes care of ASN.1.
One of the reasons why we undertook this standard development project was to address those use cases in which a schema is not available, or is difficult to agree upon, or exists but is likely to vary across space and time, and so on, yet one needs a compact and fast representation of the information in an XML document (or in an XML infoset).
The Fast Web Services standard (X.892), developed by the same committee, uses a combination of X.694 (XSD --> ASN.1 translation) and Fast Infoset depending on the characteristics of the particular Web service wrt. availability, variability, or stability of schemas for the body, header blocks, and fault detail within a SOAP message.
Broadly speaking, in Fast Web Services, one would choose the X.694 schema mapping (for any given piece of content) whenever a schema is available and fixed -- which usually leads to greater compactness and speed due to the use of ASN.1/PER. One would choose Fast Infoset whenever those conditions are not met.
Alessandro Triglia, OSS Nokalva
Posted by: alessandrot on April 03, 2005 at 06:30 PM
-
That is informative post for me.))
Posted by: frida1 on August 26, 2007 at 11:47 PM