


John D. Mitchell's Blog

Binary XML?

Posted by johnm on January 19, 2005 at 07:38 PM | Comments (21)

Well, there seems to be a fair bit of discussion lately about various approaches to making XML less of a bloated sack of protoplasm. Technically speaking, there's a Sun article talking about the Fast InfoSet draft specification. More generically speaking, here's a CNet article asking: How do we make XML faster?

Alas, I don't see anyone asking moderately important questions like:
  • If binary XML is the answer, what exactly is the question?
  • If XML is the answer, what is the question?
  • Why are people using XML for so many things that it's horrible for?

IMHO, all of this stems from the fact that people have been misled by XML's name into believing that XML is a language when XML is really just a data format. Very sad.


Comments
Comments are listed in date ascending order (oldest first)

  • The Jabber IM protocol is based on XML, probably for the same reason that most things that are XML, are XML: lots of data are hierarchical in a manner that matches XML's method of specifying a data hierarchy. The Jabber protocol would probably save a great deal of bandwidth using a binary XML protocol, without sacrificing any functionality.

    My answers to your questions:
    1. "How can I represent tree-like data in a well-supported, robust format which doesn't take up as much space, or take as long to parse, as XML?"
    2. "How can I represent tree-like data in a well-supported, robust format, which could be read and modified by people, if necessary?"
    3. XML is new, people are experimenting with it

    Posted by: keithkml on January 19, 2005 at 07:56 PM

  • Keith,

    I would debate your answer to 3.

    XML is not that new any more. The fact is that there is a lot of accumulated experience out there. Unfortunately I have yet to see it well presented and disseminated, which is in fact the answer to 3!

    Bob.

    Posted by: bob_boothby on January 19, 2005 at 11:35 PM

  • Philosophical question apart, "binary XML" sounds like an oxymoron in the first place.


    At a more practical level, if you are looking for a well specified binary format, why not simply use ASN.1 instead of reinventing the wheel once again?

    Posted by: zoe_info on January 19, 2005 at 11:36 PM

  • Since Apache (the webserver) automatically uses gzip compression in various very useful ways, I have always found the question a bit silly. Just make sure you request a .gz version from Apache or provide the right metadata, and the data generated by your servlet will automatically be compressed by Apache.

    So, I feel that people have been asking the wrong questions when they come up with bin-xml as the answer.

    Posted by: zander on January 20, 2005 at 12:18 AM

  • Perhaps it would be better called Compressed XML.

    Posted by: johnreynolds on January 20, 2005 at 06:44 AM

  • >If binary XML is the answer, what exactly is the question?
    If I were to invent a whole new specification for making XML smaller, rather than just using GZIPOutputStream, what would I call it?

    >If XML is the answer, what is the question?
    What technology should I use in my application to make people think I am cool?

    >Why are people using XML for so many things that it's horrible for?
    Because so many articles and books are using it for things that it's horrible for.

    Some serious answers:

    >If binary XML is the answer, what exactly is the question?
    Obviously, to make it smaller. I really don't see why GZIP isn't the answer to this, though.

    >If XML is the answer, what is the question?
    What's currently the most recommended way of encoding inter-application (especially inter-platform) communications?

    >Why are people using XML for so many things that it's horrible for?
    It does have lots of advantages:

    Don't have to write a parser (woohoo!)
    Specifications (DTDs, schemas) relatively easy to write and understand
    Parsers validate document against specification
    Character encoding specified in the file
    Easy to create mock data
    Multi-platform ("and binary isn't"? : )


    My pet hate with the proliferation of XML is people who believe XML is "easily readable and self-describing", and think that this means they can:

    Use XML for their custom configuration files
    Not use a DTD
    Not provide any documentation besides small descriptions in the "example" file
    Not provide a GUI!

    #4 there is the worst sin:
    Just because you've done something in XML doesn't mean it's so easy to understand that you shouldn't write a GUI.
    ANT has unfortunately led most developers to believe that editing XML is a normal, everyday thing to do. I think it should be a last resort.
    Sure, it has many benefits as a storage format for your app's configuration data, but that doesn't mean that users should have to hand-edit it!!

    I've been considering for some time starting a "Stop the XML!" campaign, encouraging people not to actually stop using XML, but to stop making their users edit it.
    What do people think of this idea?

    Posted by: grlea on January 20, 2005 at 04:00 PM
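
For readers who want to try the plain gzip route that several of the comments above mention, here is a minimal sketch using `java.util.zip.GZIPOutputStream`. The class name `GzipXmlSketch`, the helper methods, and the made-up roster document are all assumptions for illustration; none of them come from the post or from any binary XML proposal. It only shows what compressing textual XML looks like, not what a structure-aware binary encoding would do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, named here only for illustration.
public class GzipXmlSketch {

    /** Compresses an XML string (encoded as UTF-8) into gzip bytes. */
    public static byte[] compress(String xml) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the gzip stream writes the trailer, so close before reading the buffer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    /** Reverses compress(): gzip bytes back to the original XML string. */
    public static String decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Build a repetitive, made-up document; repeated tags are what gzip exploits.
        StringBuilder xml = new StringBuilder("<roster>");
        for (int i = 0; i < 200; i++) {
            xml.append("<item jid=\"user").append(i)
               .append("@example.org\" subscription=\"both\"/>");
        }
        xml.append("</roster>");

        byte[] packed = compress(xml.toString());
        System.out.println("plain bytes: " + xml.toString().getBytes(StandardCharsets.UTF_8).length);
        System.out.println("gzip bytes:  " + packed.length);
        System.out.println("round trip ok: " + decompress(packed).equals(xml.toString()));
    }
}
```

Note that the receiver still has to decompress and then parse the full text, which is the cost that the binary XML proposals claim to avoid.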

  • If binary XML is implemented well:
    you could open a binary XML file in an editor and read it just like a text XML file. You might need an Eclipse/Emacs/vi plugin, but otherwise no big deal.
    you could use the same parser code for binary and text XML. You might need to upgrade your parser, but the semantics of the document would be the same
    a binary XML file will be faster to access and smaller than a compressed text XML file. This is because we can exploit our understanding of the structure of the document.

    I'm psyched for a binary XML spec that meets these requirements!

    -Jesse

    Posted by: jessewilson on January 21, 2005 at 07:25 AM

  • Keith: Thanks for jumping in! Alas, I gotta say that I don't find your answers (the questions) compelling. What do "robust" and "well supported" actually mean? Have you seen much of the crap that's floating around that people are marketing with that sort of hype as yet another silver bullet just because it's "XML"?

    Posted by: johnm on January 21, 2005 at 03:47 PM

  • Zoe: Well, as the Sun article mentions, this Fast InfoSet stuff is based upon ASN.1 (and whether that's a good thing or not... :-). The answer to your question is that they want "XML".

    Posted by: johnm on January 21, 2005 at 03:51 PM

  • JohnR and Zander: There's arguably a big difference between compressing straight, textual XML and creating a binary format that's "XML". This difference seems to be felt most acutely by folks dealing with huge amounts of data in XML.

    Posted by: johnm on January 21, 2005 at 03:54 PM

  • grlea: Your discussion of the pros and cons reminds me to point out that all of that DTD/schema junk still does not a language make. IMHO, people have gotten so enamored of the XML data format (with these bells and whistles) that they are missing the point: what matters (be it in pure data interchange or in using the data format for insane things like Ant's configuration file) is the interchange of useful information (not data). That is to say, it's the semantics that matter, and XML is basically worthless in that regard since XML only really deals with the syntax of data.

    Posted by: johnm on January 21, 2005 at 04:03 PM


  • To answer in reverse order...


    Why are people using XML for so many things that it's horrible for?
    Because we all remember the pre-XML days, and while few would claim XML to be perfect, it was a huge improvement over the chaos of file formats which preceded it.


    If XML is the answer, what is the question?
    How do we make sense of all these wildly different ways of formatting data? Why is it I need software to extract the data out of a "dot whatever" file, before I can even get to the stage of trying to understand what that data means?


    If binary XML is the answer, what exactly is the question?
    Okay, so now we have an agreed way of holding data, how do we make it more practical from a storage and transport point of view?


    For any type of interchange between two parties there have to be two basic conditions: a shared understanding of the context of the conversation, and a shared language in which to express that conversation. English speakers may read this message, but devoid of the necessary background in computing they would not understand it. Likewise, a computer professional would be capable of understanding this message, but if I wrote it in Welsh they would be unable to penetrate the language.


    XML provides a shared language. It doesn't necessarily mean that I can look at any old XML document and understand it - I lack enough knowledge of the data (the 'context') - but at least it's expressed in a format (a 'language') I understand. And so in this regard it removes one of the barriers to working with the data.

    Posted by: javakiddy on January 24, 2005 at 03:19 AM

  • Javakiddy: Um, er, I don't see how your first answer answers the question. If XML is an improvement over the old days formats, how does it follow that the use of XML is justified for all of the uses that it's just plain horrible for? Thanks.

    Posted by: johnm on January 25, 2005 at 10:22 AM

  • Because the horrible-ness of XML is slightly less horrible than the horrible-ness of the proprietary formats it replaced. It's like democracy - flawed, but still better than the alternatives.

    Posted by: javakiddy on February 01, 2005 at 06:49 AM

  • >Why are people using XML for so many things that it's horrible for?
    People like familiarity, I suppose. If you start work on an existing project, where do you begin? If you see config files in XML, ah, there's a start. And then you can take that XML and put it on the web. And you used to be able to make eBook reader files out of XML (don't know if that's still freely available). And on and on . . .

    I guess it boils down to universalness. XML _can be_ used in everything. (hammer, everything looks like a nail, you get the picture . . . )

    Then there's always the fact that XML and tree-walking are cool and logically fun to do . . .

    Posted by: xoastorm on February 01, 2005 at 11:18 AM

  • javakiddy, so let me get this straight... Since people have used crappy solutions in the past and XML is less crappy because it's "non-proprietary" that means that it's okay to use XML for tasks that it's truly horrible for? That sounds like wishing rather than logic.

    Posted by: johnm on February 01, 2005 at 11:23 AM

  • xoastorm, indeed, familiarity is an easy crutch to use. Does that mean that XML is actually well-suited for the purpose or that it will be effective, modifiable, and otherwise maintainable in the long run?

    When all you know is a hammer, not only is it hard to pound screws, but it's impossible to paint.

    Posted by: johnm on February 01, 2005 at 11:28 AM

  • johnm,

    In a sense, XML is effective if you can give over files that are slightly commented with a well-chosen XML element naming scheme to someone in some context (i.e., Ant, app config files, book/journal raw data), and they can in a short time learn how to use the data.

    In the same vein, XML is easily modifiable by anyone who knows XML file formatting. Some people I've run up against don't know XML (or HTML, surprisingly) at all and fail miserably in keeping or modifying files. So, in those instances, I have written applications that just ask them questions in dialogs and hide the XML. And, yes, these are programmers.

    Actually, I believe that every programmer should know XML. It's useful for keeping information about, blogging on, and charting processes of projects, for instance. And you can always write a program that reads in XML data and does with it what you want. But how can you read in a proprietary file if you don't understand its structure? (i.e., mainframe files . . .)

    Posted by: xoastorm on February 01, 2005 at 12:11 PM
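
As a rough illustration of the "write a program that reads in XML data" point in the comment above, here is a minimal sketch that uses the JDK's built-in DOM parser to read a tiny document and walk its tree. The class name `ConfigWalkSketch` and the `<config>` document are invented for this example and are not taken from any real application.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical class name; the XML below is made up for the example.
public class ConfigWalkSketch {

    /** Parses an XML string into a DOM tree with the JDK's default parser. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Recursively counts element nodes at or below the given node. */
    public static int countElements(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            count += countElements(children.item(i));
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config><server port=\"8080\"><name>demo</name></server></config>";
        Document doc = parse(xml);

        // The parser hands back structure and strings; what "port" means is up to the program.
        Element server = (Element) doc.getElementsByTagName("server").item(0);
        System.out.println("port attribute: " + server.getAttribute("port"));
        System.out.println("element count: " + countElements(doc));
    }
}
```

The parser recovers the syntax for free, but the program still has to know on its own that `port` is a number and what it is for, which is the syntax-versus-semantics distinction argued over in this thread.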

  • xoastorm, alas, people seem to be still making this argument that XML is a priori better just because they can compare it favorably against old, crappy, and, most importantly, proprietary data formats.

    First off, just because the syntax is simplistic doesn't mean that the data contained therein isn't just as convoluted (semantically speaking) or proprietary (in the same sense as used in this thread). I.e., a readable system is arguably necessary but it's certainly not sufficient.

    Secondly, I still don't see any actual answers (other than laziness) as to why people are using XML in situations that XML is horrible for.

    Posted by: johnm on February 01, 2005 at 12:19 PM

  • johnm,

    Would you call familiarity "laziness"? I think it's fear, not laziness, that drives the desire for sameness.

    I think you might be running up against human nature here. That, and no one with gentle curiosity and calm intelligence has come along and seen an actual better solution in cases where XML is used by default, like a band-aid over a gun-shot wound.

    Posted by: xoastorm on February 01, 2005 at 12:54 PM

  • Yes, when "familiarity" is used as an excuse to do a crappy job, I'd call that laziness (or worse :-). However, I heartily concur with your point that the underlying issue is fear.

    Posted by: johnm on February 01, 2005 at 01:07 PM




