


Ben Galbraith

Ben Galbraith's Blog

How I Learned to Love Domain-Specific Languages (in Three Parts)

Posted by javaben on November 16, 2005 at 08:14 PM | Comments (20)

(cross-posted on Married... with children)

I've been watching the hype surrounding domain-specific languages (DSLs) with skepticism. At first I thought, "Why would I learn some custom syntax when I could use good old Java and XML?" And then, gradually, I saw the light.

Dave Thomas was the catalyst behind my change of heart. Over dinner a few years ago, he patiently explained to me why the Ruby community eschewed XML for YAML; that angle brackets and wearisome complexity make XML more pain than it's worth.

At the time, I sputtered various protests about how XML was a lingua franca, readily-understood, blah blah blah. But it got me thinking -- and the next time I edited an Ant build file, my discontent was a bit more pronounced than usual.

Shortly thereafter, James Duncan Davidson came out with his "I shudda used a scripting language" post (couldn't find a good link, sorry, but this fragment gives you an idea), and that got me thinking some more.

As the Rails machine started dominating the Ruby discussions, Dave's anti-XML chorus found a whole new host of supporters, this time advocating, in addition to YAML, the use of domain-specific languages instead of general-purpose configuration files. About the same time, my friend Neal Ford started banging his own domain-specific language (a.k.a. language-oriented programming) drum.

"Hey," I started thinking, "maybe there's something to this 'XML bad, DSL good' meme."

The funny thing was that all throughout this time period, though I didn't realize it, I was already preferring a DSL to XML. RELAX NG (a blissfully simple alternative to W3C XML Schema) comes in both XML and DSL flavors, and I'd been preferring the DSL for months. Have a look, starting with the XML version:

<?xml version="1.0"?>
<element name="contents"
         xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://galbraiths.org/myns">
    <oneOrMore>
        <element name="section">
            <attribute name="name"/>
            <element name="content">
                <zeroOrMore>
                    <choice>
                        <text />
                        <element name="p">
                            <empty />
                        </element>
                    </choice>
                </zeroOrMore>
            </element>
        </element>
    </oneOrMore>
</element>

That doesn't look so bad -- and since it's RELAX NG, and not W3C XML Schema, it's actually somewhat intuitive.

Now, check out the DSL:

default namespace = "http://galbraiths.org/myns"
element contents {
    element section {
        attribute name { text },
        element content {
            (text |
             element p { empty })*
        }
    }+
}

I think the XML version is clearly more intuitive for those with no previous exposure to RELAX NG, but for those who spend a lot of time working in the domain, my own experience (and anecdotal evidence) shows that the DSL is the clear favorite. By the way, the RELAX NG "DSL" is known as the "Compact Syntax". Until recently, I never thought of it as a DSL, but it sure is.
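To make concrete what both schemas accept, here is a minimal Java sketch that checks the same rules by hand with the JDK's DOM parser. It is an illustration only and not how a RELAX NG validator works: the class `ContentsChecker` and its methods are hypothetical names, malformed input is simply treated as invalid, and extra attributes are not rejected the way a real validator would reject them.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Comment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of RELAX NG or any validator.
final class ContentsChecker {
    static final String NS = "http://galbraiths.org/myns";

    // contents: one or more section elements and nothing else.
    static boolean isValid(String xml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false; // assumption: malformed input counts as invalid
        }
        if (!isNamed(root, "contents")) return false;
        int sections = 0;
        for (Node c = root.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (isIgnorable(c)) continue;
            if (!isNamed(c, "section") || !sectionOk((Element) c)) return false;
            sections++;
        }
        return sections >= 1;
    }

    // section: a name attribute and exactly one content element.
    static boolean sectionOk(Element section) {
        if (!section.hasAttribute("name")) return false;
        int contents = 0;
        for (Node c = section.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (isIgnorable(c)) continue;
            if (!isNamed(c, "content") || !contentOk((Element) c)) return false;
            contents++;
        }
        return contents == 1;
    }

    // content: any mix of text and empty p elements.
    static boolean contentOk(Element content) {
        for (Node c = content.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Text || c instanceof Comment) continue;
            if (!isNamed(c, "p") || !isEmpty((Element) c)) return false;
        }
        return true;
    }

    // Simplification: whitespace and comments inside p still count as empty.
    static boolean isEmpty(Element e) {
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (!isIgnorable(c)) return false;
        }
        return true;
    }

    // True when the node is an element with this local name in the schema's namespace.
    static boolean isNamed(Node n, String localName) {
        return n.getNodeType() == Node.ELEMENT_NODE
                && NS.equals(n.getNamespaceURI())
                && localName.equals(n.getLocalName());
    }

    // Comments and whitespace-only text carry no meaning between elements.
    static boolean isIgnorable(Node n) {
        if (n instanceof Comment) return true;
        return n instanceof Text && n.getNodeValue().trim().isEmpty();
    }

    public static void main(String[] args) {
        String doc = "<contents xmlns='" + NS + "'>"
                + "<section name='intro'><content>Hello<p/>world</content></section>"
                + "</contents>";
        System.out.println(isValid(doc));
    }
}
```

Roughly seventy lines of Java restate what the Compact Syntax says in ten, which is a fair measure of how much the notation is doing.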

So gosh, I guess I prefer domain-specific languages to general-purpose XML, too.

(I'll talk a bit about creating and using DSLs in Part 2 of this blog entry, and in Part 3 I'll give some examples of how Java's Swing GUI toolkit can benefit from a DSL or two.)


Comments

  • Have you looked at, and will you be writing about, JetBrains' MPS? (http://www.jetbrains.com/mps/)

    Posted by: grlea on November 16, 2005 at 09:07 PM

  • I wish we could go one step further:

    http://www.javarants.com/C1464297901/E20050710165403/index.html

    I'd put the implementation in the comment box but there isn't enough room...

    Posted by: spullara on November 16, 2005 at 09:34 PM


  • grlea: Yes, and yes.


    spullara: :-)

    Posted by: javaben on November 16, 2005 at 09:41 PM

  • I think the general problem with DSLs is that they form a language, whereas, in principle, any XML schema can be thought of as a data structure with plenty of metadata. Most programmers could probably not write a BNF or EBNF for a DSL they thought up. You could come up with a structure-only DSL but then, what's the point? I guess it will save you some typing. Most people will probably agree with you that a well-designed DSL is much more fun to work with than the equivalent XML. There just aren't as many well-designed DSLs out there as there are XML schemas that accomplish the same purpose, because DSLs are just harder to write.

    Patrick

    Posted by: pdoubleya on November 17, 2005 at 02:08 AM

  • Interesting. I am not convinced YAML is going to end up replacing XML, and I don't think it is really any easier, but I can write an XMLReader to read YAML so no problem.
    For a "real" DSL showing XML document structure, how about this:

    <contents xmlns="http://galbraiths.org/myns" xmlns:s="urn:simple-schema">
        <section name="text" s:cardinality="1..*">
            <content>
                <s:options s:cardinality="0..*">
                    <s:text/>
                    <p/>
                </s:options>
            </content>
        </section>
    </contents>

    although I would question your need to define most of the things a schema-language lets you define...

    There is some accumulated crap in our perception of what XML is or should be, I suppose the first thing to throw out is the concept of validity (which will be determined in the context anyway) and the second thing to throw out is the concept of "document" (at least as far as it being necessary for doing XML, it is still a useful concept in specific contexts)

    Posted by: tobega on November 17, 2005 at 04:57 AM

  • pdoubleya: This is exactly where I'm going with part 2 of the entry. Actually, I don't know if it's good or bad that the first three comments have basically reproduced the first few paragraphs of the next blog entry.


    tobega: I don't think YAML will ever replace XML. Certainly, if it ever does, it will be XML without angle brackets -- which, really, isn't all that interesting. I hope YAML stays within the niche of a simple declarative syntax for when you just want a simple declarative syntax.


    RE: necessity of schema languages. I'll sidestep this discussion for now, if that's okay, though I find it a little ironic that in a Java forum we debate the necessity of static validation of implementation syntax (i.e., validation:XML as static type-checking and bytecode verification:Java).

    Posted by: javaben on November 17, 2005 at 06:10 AM

  • Ben,
    What have you come across with respect to debugging applications that are written in domain specific languages?

    DSLs appeal to me, but I am a sucker for tool support.

    --JohnR

    Posted by: johnreynolds on November 17, 2005 at 07:27 AM

  • It's great that more people are finally starting to see the value of humane languages over silly "formats". Here are some links that approach the subject from various angles:

    Why Humans Should Not Have to Grok XML
    ITLS!
    Conversational Programming Languages
    Architecture: Abstract or Manifest?
    Design vs. Architecture
    The Poetry of Programming

    Posted by: johnm on November 17, 2005 at 07:40 AM

  • I've blogged my more complete response as: DSLs feelin' groovy (or, graduating from elementary school). Go wild. :-)

    Posted by: johnm on November 17, 2005 at 11:10 AM

  • Ben, sure we don't have to take the whole discussion here, it would not be possible, but I think there's a difference between code and data. And I am not saying "don't ever validate data", even though it might have sounded like it, I just meant that there are more appropriate ways, like in your business code, for example.

    Posted by: tobega on November 17, 2005 at 02:05 PM

  • Also want to say that I think a big reason for using YAML instead of XML is that nobody invented a schema language for it yet, not even a DTD :-)

    Posted by: tobega on November 17, 2005 at 03:15 PM


  • johnreynolds: I'll share what little I know in a subsequent entry -- very interesting question.


    johnm: Nice links! Aesthetics matter. Big time.


    tobega: RE: validation. "I would question your need to define most of the things a schema-language lets you define" seems kinda strong to me. ;-) There are dozens of well-understood use cases for static validation using an XML schema; I don't want to get into that discussion, but I do understand and agree with your point that if you are interrogating a DOM tree, you can write your code such that the act of reading the data from the tree is the validation and a pre-process static validation may be redundant.


    In the case of stream parsing (SAX/StAX), I disagree, as the low-level nature of those APIs makes it difficult to figure out just where your state machine got farked up, and makes it hard to communicate anything to the user other than "error." Much easier to rely on a nice validator to communicate the gaffes.


    Moving on to data-binding solutions, where you statically generate object proxies for XML elements, a schema is fairly important. In a more dynamic language, perhaps that wouldn't be the case.


    And then, of course, you've got the authoring tools which, when given a schema, can make it so much easier to create document instances.


    So, I guess I did wind up having some of that schema discussion. Dang.


    RE: YAML vs. XML. Yeah, I think that's the point. YAML is a quick-and-easy alternative when you just need a quick-and-easy alternative. Make no mistake, YAML could evolve into the same monster that is XML (just as XML evolved into the SGML monster) -- the key is to leave the simple solution for the simple problems and not muck it up by trying to make it an all-purpose solution.


    So, yeah, if you just need a simple format with existing parsers, and validation, etc. are not necessary, go with YAML.

    Posted by: javaben on November 17, 2005 at 05:03 PM

  • OK, then ;-)
    I would tend to think that the error message you can give from your business app would be much more informative than a schema validation error... Besides, stream parsing is a very low level API, surely we can do better (jakarta commons Digester comes to mind), and surely the one who created the document has a responsibility to make sure it can be understood (testing, anyone?).
    XML binding depends on some knowledge of structure, but what you really need is a mapping, not necessarily a Schema (see JiBX, XStream and Javolution or whatever it's called now). I can agree that we need to be able to express some rules about the data, but current schema languages focus too much on unimportant aspects and are too rigid.
    Schemas are focused on making the job of tool vendors/programmers easier, but they don't really improve the situation for business users, actually quite the opposite. I would say that we are remaking the mistakes made with EDI, which was supposed to solve the same kinds of problems.
    We're digressing from the main topic, and I do agree with you about DSLs. The only problem, I guess, is that it's possible to create so many of them...

    Posted by: tobega on November 18, 2005 at 02:24 AM


  • This is one of the benefits that Ruby brings to the table: it makes it easy to write DSLs *without* writing a BNF and parser for it.
    Look at Jim Weirich's talk on doing this:
    http://onestepback.org/articles/lingo/index.html


    Ruby allows this through a combination of having a nice blocks (=~ closures) syntax, dynamic typing (to intercept messages sent to an object, even unknown ones,...) and some other things.
    It might not be the perfect solution for *every* possible DSL, but it'll suffice for most I can think of. Not to mention that it's significantly easier to do than write BNF+Parser+AST/Model or writing XMLSchema+XMLDataBinding+Model which all involve a lot of libraries and technologies to get to the same point.

    Beyond Java Disclaimer:
    Just because mentioning Ruby in Java circles is a slightly dangerous thing at the moment, I'll say the following: With this post I do not imply that every Java dev should drop every Java related library like a hot potato and move to C-Ruby and C-Ruby only. Instead think about using JRuby http://jruby.sourceforge.net/ (or other dynamic languages) for that.
    Instead of ogling at everything that C# throws over the fence, let's look at languages that are actually a bit more interesting and how they solve problems.

    Posted by: murphee on November 18, 2005 at 05:16 AM

  • Well, XML is in fact a form of DSL despite its verbosity. I like the simplicity of a DSL, but it requires you to define a grammar and a parser, whereas there are tons of tools for XML.

    Posted by: shahzad on November 18, 2005 at 06:37 AM

  • It's silly to take this stance too far. XML has an awful lot of good points, but a lot of people have put it to uses for which it is not suitable.

    Defining a language is not something many people seem capable of doing competently. Many times, encountering a program that has defined its own language for scripting or configuration has made my heart sink, because you have to learn a whole new set of arbitrary and often ill-thought-out or irrelevant rules. Taking a specific example, imho, XML within Ant is fine, especially with IDE autocomplete on the task names and attributes. Using XML syntax or not in this case is much less important than resolving whether build scripts should be Turing-complete or declarative to capture sensible build processes.

    Another point, related to IDEs, is that XML might well be a good format for storing your language, even if you edit it in another way. To me this makes most sense for more graphically oriented languages.

    Posted by: asjf on November 18, 2005 at 06:42 AM

  • I came across this entry on Slashdot many months ago during a discussion on language design. It was so simple and clear that I wanted to, well, cry.


    Ingredients
    {
    2 egg;
    1 tsp vanilla;
    4 cups flour;
    }

    Equipment
    {
    mixing bowl;
    oven;
    12x6 inch pan;
    }

    Directions
    {
    preheat oven 450 degrees fahrenheit;

    break 2 egg into mixing bowl;
    pour 1 tsp vanilla into mixing bowl;
    pour 4 cups flour into mixing bowl;

    mix mixing bowl;
    pour mixing bowl into 12x6 inch pan;
    place 12x6 inch pan into oven;

    bake 15 minutes;
    }

    How can you possibly get that wrong? Wouldn't such a language be, well, beautiful?

    I think one argument in favor of DSLs is that in order to keep widening the accessibility of computers (and their uses), we need to remember the goals of BASIC, Pascal, Logo: make it easy for people to use, learn, play with. Make the language approachable. Somehow I think DSLs are like that. Instead of trying to force a general tool on a specific problem set (XML as a build-process structure, XML as a purchase-order structure, XML as a page-layout structure), make more specific tools that express themselves within the context of the user and their goals.

    Patrick

    Posted by: pdoubleya on November 18, 2005 at 07:33 AM
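A structure-only language like the recipe above does not necessarily need a formal grammar to be read. As a rough sketch, assuming the only rules are "a name, then statements in braces, each ending in a semicolon", a few lines of Java are enough to load it. The class `RecipeParser` is a hypothetical name, and the code does not try to understand verbs such as "pour" or "bake"; it only recovers the block structure.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for the recipe notation; structure only, no semantics.
final class RecipeParser {
    // Maps each block name to its statements, keeping source order.
    static Map<String, List<String>> parse(String source) {
        Map<String, List<String>> blocks = new LinkedHashMap<>();
        int pos = 0;
        while (true) {
            int open = source.indexOf('{', pos);
            if (open < 0) break;
            int close = source.indexOf('}', open);
            if (close < 0) throw new IllegalArgumentException("unclosed block");
            String name = source.substring(pos, open).trim();
            if (name.isEmpty()) throw new IllegalArgumentException("block without a name");
            List<String> statements = new ArrayList<>();
            for (String raw : source.substring(open + 1, close).split(";")) {
                // Collapse line breaks and runs of spaces inside a statement.
                String statement = raw.trim().replaceAll("\\s+", " ");
                if (!statement.isEmpty()) statements.add(statement);
            }
            blocks.put(name, statements);
            pos = close + 1;
        }
        if (!source.substring(pos).trim().isEmpty()) {
            throw new IllegalArgumentException("text outside of any block");
        }
        return blocks;
    }

    public static void main(String[] args) {
        String recipe = "Ingredients\n{\n2 egg;\n1 tsp vanilla;\n}\n"
                + "Directions\n{\nbreak 2 egg into mixing bowl;\nbake 15 minutes;\n}\n";
        parse(recipe).forEach((name, statements) ->
                System.out.println(name + " -> " + statements));
    }
}
```

Giving the statements meaning (checking that "mixing bowl" was declared under Equipment, say) is where the real language design work would start.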

  • I would definitely suggest that anyone interested in domain specific languages read some of the papers by Dr. Ousterhout, creator of the Tcl language. Tcl was created just for this purpose and has a long history of use as an extension language, configuration language, as well as a general purpose language in its own right. It's been used in telephone switches, Cisco equipment, Oracle's tools, IBM WebSphere. I even used it for extensible data formatting in a realtime weather application.

    Most of what Ousterhout says is language independent, applicable to any flexible language especially ones with flexible constructs like Tcl, Lisp/Scheme, Ruby, JavaScript, ...

    Recommended reading:
    http://home.pacbell.net/ouster/scripting.html

    History of the Tcl language itself and its reasons for existence:
    http://www.tcl.tk/about/history.html

    JVM Version (Jacl):
    http://tcljava.sourceforge.net/docs/website/index.html
    http://sourceforge.net/project/showfiles.php?group_id=13005

    Posted by: djhagberg on November 18, 2005 at 03:08 PM

  • In the midst of all this talk about Domain Specific Languages (DSL), and the need for them, shouldn't there also be a discussion of what exactly these domains that we want to target with these DSLs actually are? I think a big part of figuring out what a particular DSL should look like is knowing the domain at which it is being targeted. So the question is again: what constitutes a DOMAIN? And is there a formal list somewhere of all these domains?

    Posted by: mikeazzi on November 22, 2005 at 11:44 AM

  • mikeazzi, the "domain" can be anything. For example, the language used to specify string formatting with the printf family of functions. There's no a priori constraint on the size, scope, or extent of a DSL. I would agree that it's good to practice creating little DSLs before trying to create big ones. Unfortunately, many people are trying to start with something way too big.

    Posted by: johnm on November 23, 2005 at 02:12 PM
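The printf example is easy to see from Java, whose `String.format` accepts the same kind of format string. In this small sketch the class `PrintfDemo` and the method `row` are hypothetical names; the point is that the format string is a tiny language of its own, with a vocabulary for alignment, width and precision.

```java
import java.util.Locale;

// Hypothetical demo class; String.format itself is standard Java.
final class PrintfDemo {
    static String row(String item, int quantity, double price) {
        // %-8s  left-justify the text in 8 columns
        // %3d   right-justify the integer in 3 columns
        // %8.2f right-justify in 8 columns with 2 decimal places
        // Locale.ROOT keeps the decimal separator a dot on every machine.
        return String.format(Locale.ROOT, "%-8s|%3d|%8.2f", item, quantity, price);
    }

    public static void main(String[] args) {
        System.out.println(row("flour", 4, 1.5)); // prints "flour   |  4|    1.50"
    }
}
```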




