Building a fluent API (internal DSL) in Java

Posted by carcassi on February 4, 2010 at 1:53 PM PST

In this post I am going to sum up what I have learned while creating a fluent API (or internal DSL) in Java. I'll talk about the search API I created for my current position: it's not a toy problem but a real one, with a significant amount of complexity. Because of that complexity, you get to see techniques and ideas that you don't usually see in toy examples. I am not including the full source, which (if you really want it) you can access on the project's site.

The model

First, I need to tell you a little bit about the problem I am trying to solve. IRMIS, the tool that I am developing, needs to keep track of all the components of a particle accelerator and their relationships. For example, you want to keep track of where a particular card is, how it is getting power, and how messages from the control system are addressing it. The GUI, for instance, would display something like:

[Figure: IRMIS screenshot]

Each component can be part of multiple hierarchies (control, housing, power), and can have multiple children and multiple parents in each (though 99% of the time it has one parent). Each component also has a type and a set of properties:

public interface Component {
    // ... other members omitted
    ComponentType getComponentType();
    Set<ComponentRelationship> getParents(RelType relType);
    Collection<ComponentRelationship> getChildren(RelType relType);
    Map<String, String> getProperties();
}

It has a bunch of other methods too, but we can limit our attention to these. Each relationship has:

public interface ComponentRelationship {
    // ... other members omitted
    RelType getRelationshipType();
    Component getParent();
    Component getChild();
    String getDescription();
}

With this bean-like interface you can, at least in principle, access all the information. Yet it becomes hard to express conceptually simple queries. For example, say you want to know which components are powering all installed instances of a particular type of device. You would have to start from the housing hierarchy, iterate recursively (to get all the components that are installed somewhere), keep only the ones of the type you want, and then get all of their parents in the power hierarchy. This is quite involved: you have to write a script, while the specification is just one sentence.
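To make the contrast concrete, here is a minimal sketch of that hand-written version. It runs against a toy, in-memory stand-in for the model: the Component class, its maps, the link() helper and the two RelType values are assumptions for illustration, not the IRMIS interfaces above.

```java
import java.util.*;

/** Hypothetical toy model plus a hand-written version of the query. */
final class ImperativeQuery {
    enum RelType { HOUSING, POWER }

    static final class Component {
        final String name;
        final String type;
        final Map<RelType, Set<Component>> parents = new EnumMap<>(RelType.class);
        final Map<RelType, Set<Component>> children = new EnumMap<>(RelType.class);

        Component(String name, String type) {
            this.name = name;
            this.type = type;
        }

        Set<Component> parentsIn(RelType rel) {
            return parents.getOrDefault(rel, Collections.emptySet());
        }

        Set<Component> childrenIn(RelType rel) {
            return children.getOrDefault(rel, Collections.emptySet());
        }

        /** Records the relationship on both ends. */
        static void link(RelType rel, Component parent, Component child) {
            parent.children.computeIfAbsent(rel, r -> new LinkedHashSet<>()).add(child);
            child.parents.computeIfAbsent(rel, r -> new LinkedHashSet<>()).add(parent);
        }
    }

    /** Components that power the installed instances of the given type. */
    static Set<Component> powerSourcesOfInstalled(Component housingRoot, String type) {
        // 1. Walk the housing hierarchy to collect every installed component.
        Set<Component> installed = new LinkedHashSet<>();
        Deque<Component> toVisit = new ArrayDeque<>(housingRoot.childrenIn(RelType.HOUSING));
        while (!toVisit.isEmpty()) {
            Component c = toVisit.pop();
            if (installed.add(c)) {
                toVisit.addAll(c.childrenIn(RelType.HOUSING));
            }
        }
        // 2. Keep only the requested type, then 3. collect their power parents.
        Set<Component> result = new LinkedHashSet<>();
        for (Component c : installed) {
            if (c.type.equals(type)) {
                result.addAll(c.parentsIn(RelType.POWER));
            }
        }
        return result;
    }
}
```

Three separate steps, a work queue and two loops, all to express one sentence.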

What I have built is a fluent API that allows you to write that query as:

parents().in(POWER).of(allComponents().where(componentType().isEqualTo("My Type")));

It is not far from a verbal request, and it is similar enough to SQL. It took me more than a couple of months of incremental iterations to reach this design, but once I had it, it was trivial to implement, and I was amazed at how well things fell into place. Designing a similar API would take me significantly less time now. So, here are some things I have learned that I am now using to create other fluent APIs.

The design end goal

When I started, I didn't really know what the end product should look like. I went around to the usual places, but found only basic material. I looked at some libraries for ideas, but that was about it. I didn't have a good "recipe".

I came to realize that I was actually building two distinct APIs. One is the fluent API proper, which lets me express the query. The second is the execution API, which lets me run the query. The two are connected (whenever I add a new expression, I need to implement the search that uses it), but you have to decouple them, as you may want to completely change how queries are executed without changing how they are written, and vice versa.

The other key problem is that you will want to break the fluent expression down into chunks that you can recombine and reuse. For example, we may modify the previous query in the following ways:

children().in(POWER).of(allComponents().where(componentType().isEqualTo("My Type")));
parents().in(POWER).of(allComponents().where(componentType().isNotEqualTo("My Type")));
parents().in(POWER).of(allComponents().where(property("owner").isEqualTo("carcassi")));

What you want is to be able to define each piece of the expression independently, so that it can be reused and so that other pieces can be added or removed as the API evolves. Whenever you add a new expression, you add only the minimum code needed to execute that part of the query. It may sound very complicated, but surprisingly it is not!

Let's start by identifying these sub-expressions. For example, I need to describe component relationships (parent, child, descendant, ancestor, ...), filtering expressions, and the component attributes (the type, a key-value property, ...) that the filtering expressions test. Each of these elements will be one class, and I will first define the execution part of the API. For example:

public abstract class ComponentHierarchyRelationship {

    /**
     * Returns the components related to the given component according to
     * this hierarchical relationship.
     */
    public abstract Set<Component> getComponents(Component component);

    // ...
}

Each hierarchy relationship is something that, given a component, returns a set of components: the parents instance takes a component and returns its parents. To implement each relationship, we simply implement that abstract method:

    public static ComponentHierarchyRelationship children() {
        return new ComponentHierarchyRelationship() {
            @Override
            public Set<Component> getComponents(Component component) {
                return ...;
            }
        };
    }

Each instance should be exposed as public static, so that it can be brought in through a static import and becomes one of the tokens that your fluent API understands. You have a choice between methods and variables: they seem equivalent to me, so pick one and be consistent across the API. You may also create each instance once and cache it (if needed). You should also consider making the execution API private: it boils down to whether users should be able to define their own tokens and "extend" the language. Make it public only if you really need to allow that. To reiterate: you add and remove elements of the same type simply by adding and removing static factory methods in the same class.
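Putting these pieces together, here is a minimal runnable sketch of the execution side. The toy Component class, its per-hierarchy maps and the link() helper are assumptions standing in for the real model, and both tokens look across every hierarchy because the in() refinement is left out.

```java
import java.util.*;

/** Hypothetical toy model: each token implements the execution method. */
final class Relationships {
    enum RelType { HOUSING, POWER }

    static final class Component {
        final String name;
        final Map<RelType, Set<Component>> parents = new EnumMap<>(RelType.class);
        final Map<RelType, Set<Component>> children = new EnumMap<>(RelType.class);

        Component(String name) { this.name = name; }

        /** Records the relationship on both ends. */
        static void link(RelType rel, Component parent, Component child) {
            parent.children.computeIfAbsent(rel, r -> new LinkedHashSet<>()).add(child);
            child.parents.computeIfAbsent(rel, r -> new LinkedHashSet<>()).add(parent);
        }
    }

    /** Execution API: given a component, return the related components. */
    abstract static class ComponentHierarchyRelationship {
        public abstract Set<Component> getComponents(Component component);
    }

    /** Token: children in every hierarchy. */
    public static ComponentHierarchyRelationship children() {
        return new ComponentHierarchyRelationship() {
            @Override
            public Set<Component> getComponents(Component component) {
                return union(component.children.values());
            }
        };
    }

    /** Token: parents in every hierarchy. */
    public static ComponentHierarchyRelationship parents() {
        return new ComponentHierarchyRelationship() {
            @Override
            public Set<Component> getComponents(Component component) {
                return union(component.parents.values());
            }
        };
    }

    private static Set<Component> union(Collection<Set<Component>> sets) {
        Set<Component> result = new LinkedHashSet<>();
        for (Set<Component> set : sets) {
            result.addAll(set);
        }
        return result;
    }
}
```

Because these anonymous instances carry no state, each factory method could just as well return a single cached static final instance. That stops being safe as soon as a token can be modified after creation (as with the in() method discussed later), in which case every call must return a fresh object.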

Now that we have seen how to define and implement the execution part, let's concentrate on the fluent part. Consider the following class, which represents a component field:

public abstract class ReadOnlyComponentField {

    /**
     * Returns the value of this field for the given cmpnt.
     * @param cmpnt a component
     * @return the value of the field
     */
    public abstract String extractValue(Component cmpnt);

    // ...
}

And the following class, which represents a filtering condition, like "the type is foo" or "the owner is bar":

public abstract class ComponentFilter {

    /**
     * Returns true if the filter accepts the given component.
     * @param cmpnt a component
     * @return true if the component passes the filter
     */
    public abstract boolean accept(Component cmpnt);

    // ...
}

Now we add an operation on the field, so that we can express whether a particular field has a given value:

public abstract class ReadOnlyComponentField {
    // ...
    public ComponentFilter isEqualTo(final String value) {
        if (value == null)
            throw new NullPointerException("Value must not be null");
        return new ComponentFilter() {
            @Override
            public boolean accept(Component cmpnt) {
                return value.equals(extractValue(cmpnt));
            }
        };
    }
    // ...
}

A few things to note. The method is added to the abstract class, so all instances support this operation. The return type is another class that represents an element of the fluent API: this is how the pieces of the fluent API itself connect to each other. The null check is executed as part of the fluent API, so the exception is raised while the user is putting the expression together. The code within the inner class, instead, runs during the execution of the query itself. The inner class uses the abstract method to perform the operation: this is how the pieces of the execution API connect to each other.
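Here is a self-contained sketch of the two classes working together, against a toy Component that holds only a type and a property map (both assumptions). The componentType() and property(...) factories are assumed implementations of the tokens used in the earlier queries, and isNotEqualTo() is built on top of isEqualTo().

```java
import java.util.*;

/** Hypothetical sketch of fields producing filters. */
final class Fields {
    static final class Component {
        final String type;
        final Map<String, String> properties = new HashMap<>();
        Component(String type) { this.type = type; }
    }

    abstract static class ComponentFilter {
        public abstract boolean accept(Component cmpnt);
    }

    abstract static class ReadOnlyComponentField {
        public abstract String extractValue(Component cmpnt);

        public ComponentFilter isEqualTo(final String value) {
            // Fluent side: runs while the expression is being built.
            Objects.requireNonNull(value, "Value must not be null");
            return new ComponentFilter() {
                @Override
                public boolean accept(Component cmpnt) {
                    // Execution side: runs only when the query evaluates a component.
                    return value.equals(extractValue(cmpnt));
                }
            };
        }

        public ComponentFilter isNotEqualTo(final String value) {
            final ComponentFilter equal = isEqualTo(value);
            return new ComponentFilter() {
                @Override
                public boolean accept(Component cmpnt) {
                    return !equal.accept(cmpnt);
                }
            };
        }
    }

    /** Token for the component type. */
    public static ReadOnlyComponentField componentType() {
        return new ReadOnlyComponentField() {
            @Override
            public String extractValue(Component cmpnt) {
                return cmpnt.type;
            }
        };
    }

    /** Token for a key-value property; a missing key yields null. */
    public static ReadOnlyComponentField property(final String name) {
        return new ReadOnlyComponentField() {
            @Override
            public String extractValue(Component cmpnt) {
                return cmpnt.properties.get(name);
            }
        };
    }
}
```

For example, componentType().isEqualTo(null) fails immediately, before any component is looked at, while property("owner").isEqualTo("carcassi") only touches data when accept() is called.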

Now consider this:

public abstract class ComponentFilter {
    // ...

    /**
     * A helper class for boolean operations on {@code ComponentFilter}s.
     */
    private abstract static class BooleanOperation extends ComponentFilter {

        final ComponentFilter filter1;
        final ComponentFilter filter2;

        public BooleanOperation(ComponentFilter filter1, ComponentFilter filter2) {
            this.filter1 = filter1;
            this.filter2 = filter2;
        }

    }

    /**
     * Returns a filter that is the logical OR of this filter and the given
     * filter.
     * @param anotherFilter a {@code ComponentFilter}
     * @return a new {@code ComponentFilter}
     */
    public ComponentFilter or(ComponentFilter anotherFilter) {
        return new BooleanOperation(this, anotherFilter) {
            @Override
            public boolean accept(Component cmpnt) {
                return filter1.accept(cmpnt) || filter2.accept(cmpnt);
            }
        };
    }

    /**
     * Returns a filter that is the logical AND of this filter and the given
     * filter.
     * @param anotherFilter a {@code ComponentFilter}
     * @return a new {@code ComponentFilter}
     */
    public ComponentFilter and(ComponentFilter anotherFilter) {
        return new BooleanOperation(this, anotherFilter) {
            @Override
            public boolean accept(Component cmpnt) {
                return filter1.accept(cmpnt) && filter2.accept(cmpnt);
            }
        };
    }
}

As the comments say, this implements boolean operations. And remember: they now apply to all filters, however defined. So we can have:

    componentType().isEqualTo("My Type").and(property("owner").isNotEqualTo("carcassi"))

And so on. The rest is defined in much the same way: the whole query represents a set. A set can be filtered with a where() that accepts a filter. ComponentHierarchyRelationship.of() accepts a set itself, so we can have nested queries. To add or remove vocabulary without changing the overall grammar, you simply add or remove static factory methods, or add or remove methods of the class.
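To see how the pieces compose, here is a compact, self-contained sketch that can run the whole query parents().in(POWER).of(allComponents().where(componentType().isEqualTo("My Type"))). Everything in it is an assumption for illustration: the toy Component, the choice to pass the searched data to execute() rather than reading from a database, and the boolean operator capturing its operands in local variables instead of using the BooleanOperation helper (or() would follow the same pattern as and()).

```java
import java.util.*;

/** Hypothetical end-to-end sketch: build the expression first, run it later. */
final class Query {
    enum RelType { HOUSING, POWER }

    static final class Component {
        final String type;
        final Map<String, String> properties = new HashMap<>();
        final Map<RelType, Set<Component>> parents = new EnumMap<>(RelType.class);
        Component(String type) { this.type = type; }

        static void link(RelType rel, Component parent, Component child) {
            child.parents.computeIfAbsent(rel, r -> new LinkedHashSet<>()).add(parent);
        }
    }

    abstract static class ComponentFilter {
        abstract boolean accept(Component c);

        ComponentFilter and(ComponentFilter other) {
            ComponentFilter self = this;
            return new ComponentFilter() {
                @Override boolean accept(Component c) { return self.accept(c) && other.accept(c); }
            };
        }
    }

    abstract static class ReadOnlyComponentField {
        abstract String extractValue(Component c);

        ComponentFilter isEqualTo(String value) {
            Objects.requireNonNull(value, "Value must not be null"); // build time
            return new ComponentFilter() {
                @Override boolean accept(Component c) { return value.equals(extractValue(c)); }
            };
        }
    }

    /** A whole query is a set; execute() receives the data to search. */
    abstract static class ComponentSet {
        abstract Set<Component> execute(Collection<Component> all);

        ComponentSet where(ComponentFilter filter) {
            ComponentSet source = this;
            return new ComponentSet() {
                @Override Set<Component> execute(Collection<Component> all) {
                    Set<Component> result = new LinkedHashSet<>();
                    for (Component c : source.execute(all)) if (filter.accept(c)) result.add(c);
                    return result;
                }
            };
        }
    }

    abstract static class ComponentHierarchyRelationship {
        RelType relType; // null means every hierarchy
        abstract Set<Component> getComponents(Component c);

        ComponentHierarchyRelationship in(RelType rel) {
            if (relType != null) throw new IllegalStateException("Relationship type already set");
            relType = rel;
            return this;
        }

        /** Applies the relationship to every member of a (possibly nested) set. */
        ComponentSet of(ComponentSet set) {
            ComponentHierarchyRelationship rel = this;
            return new ComponentSet() {
                @Override Set<Component> execute(Collection<Component> all) {
                    Set<Component> result = new LinkedHashSet<>();
                    for (Component c : set.execute(all)) result.addAll(rel.getComponents(c));
                    return result;
                }
            };
        }
    }

    static ReadOnlyComponentField componentType() {
        return new ReadOnlyComponentField() { @Override String extractValue(Component c) { return c.type; } };
    }

    static ReadOnlyComponentField property(String name) {
        return new ReadOnlyComponentField() { @Override String extractValue(Component c) { return c.properties.get(name); } };
    }

    static ComponentSet allComponents() {
        return new ComponentSet() {
            @Override Set<Component> execute(Collection<Component> all) { return new LinkedHashSet<>(all); }
        };
    }

    static ComponentHierarchyRelationship parents() {
        return new ComponentHierarchyRelationship() {
            @Override Set<Component> getComponents(Component c) {
                Set<Component> result = new LinkedHashSet<>();
                c.parents.forEach((type, ps) -> { if (relType == null || relType == type) result.addAll(ps); });
                return result;
            }
        };
    }
}
```

Since of() takes any ComponentSet, parents().of(parents().of(...)) nests without any extra code.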

The end result of your design should be: the list of classes that represent the expressions of your fluent API, the list of static factory methods that represent the tokens of your API, and the list of operators you allow on each expression. This will take many iterations to get right, but knowing that these are the things you want to end up with is helpful.

Sentence structure

When designing the fluent API, I noticed it pays to be aware of the role each fragment plays and to be clear about the natural language grammar (not the Java or BNF grammar, that is). For example, a component field is a noun, a component filter is a subordinate clause, and the component set (which is also a whole query) is a noun phrase, which can contain subordinate clauses (e.g. "all components where the type is foo"). You need to understand what these are, be strict, and make sure that whatever prepositions or verbs you use to combine them always make sense.

A rule of thumb: subordinate clauses should always be passed as an argument, not chained. For example:

childrenOf().childrenOf().allComponents().where().componentType().isEqualTo("My type")

is a really bad idea (I know: I have tried). It couples the whole API together, and you can't easily break it into pieces and reuse them. You also lose the nested structure of the sentence, and that structure is what actually makes the API easier to implement.

Coordinate clauses should be joined with a conjunction or adverb, but they should still be passed as an argument. For example:

componentType().isEqualTo("My type").and().property("owner").isNotEqualTo("carcassi") 

is bad: pass the coordinate clause as an argument instead, as in and(property("owner").isNotEqualTo("carcassi")). Also make sure that your conjunctions really work in all cases. For example:

componentType().isEqualTo("My type").but(property("owner").isNotEqualTo("carcassi"))

works, but this:

componentType().isEqualTo("My type").but(property("owner").isEqualTo("carcassi")) 

does not read sensibly. Don't try to be too cute, or you'll get into these situations.

Another unexpected and interesting grammatical feature I encountered is the use of singular and plural. For example, I distinguish between parents() and parent(). The first returns all the parents, while the second makes sure that, during execution, it either returns one component or throws an exception. This is useful because multiple parents in a single hierarchy are fairly unusual (a redundant power supply, for example), so in many cases you simply expect components to have a single parent. By choosing parent() you can enforce that assumption.
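The plural/singular pair can be sketched as follows, assuming a simplified Component with a single hierarchy. This sketch also treats a component with no parent as an error, which follows the "either returns one component or throws" rule literally; a real API may want to handle root components differently.

```java
import java.util.*;

/** Hypothetical single-hierarchy model for the plural/singular pair. */
final class Singular {
    static final class Component {
        final String name;
        final Set<Component> parents = new LinkedHashSet<>();
        Component(String name) { this.name = name; }
    }

    abstract static class Relationship {
        abstract Set<Component> getComponents(Component c);
    }

    /** Plural token: every parent, however many there are. */
    static Relationship parents() {
        return new Relationship() {
            @Override
            Set<Component> getComponents(Component c) {
                return new LinkedHashSet<>(c.parents);
            }
        };
    }

    /** Singular token: same lookup, but insists on exactly one parent. */
    static Relationship parent() {
        Relationship all = parents();
        return new Relationship() {
            @Override
            Set<Component> getComponents(Component c) {
                Set<Component> result = all.getComponents(c);
                if (result.size() != 1) {
                    // Checked at execution time, when the actual data is known.
                    throw new IllegalStateException(
                            c.name + " has " + result.size() + " parents, expected exactly 1");
                }
                return result;
            }
        };
    }
}
```

The singular token reuses the plural one and only adds the check, so the two can never disagree about what a parent is.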

What should come across is that designing a fluent interface is not just about method chaining. It's really about building an ad hoc language inspired by the natural language of the domain. But, at the end of the day, you are still designing an API, so it's not prose either: you'll get sentences that sort of sound like English but aren't. Thinking about the structure of domain sentences usually gives you hints about how your design should proceed, in the same way that looking at domain knowledge often tells you what classes and objects you will need.

Restricting expression scope

Another useful trick I discovered is limiting the scope of an operation. For example, I do not want to allow chaining two where() calls in the expression:

allComponents().where(componentType().isEqualTo("My type")).where(property("owner").isEqualTo("carcassi"))

But I also want to be able to define operations on a set, regardless of whether a where clause is present. In this case, you can use subclassing and have each method return a superclass, so that fewer and fewer methods are available as the expression grows.

allComponents().where(componentType().isEqualTo("My type")).andTheir(descendents())
UnfilteredSet -> FilteredSet                              -> ComponentSet

The first method returns an UnfilteredSet, which extends FilteredSet to define the where() operation. where() returns a FilteredSet, which extends ComponentSet to define the andTheir() operation. ComponentSet has no fluent operations and only defines the method for the execution API. This means that one can't call where() twice, and can't invert the order of where() and andTheir(). The implementation is also easy, because the methods of the parent class can simply ignore the optional parameters of the child class: you implement each operation at the narrowest scope.
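A minimal sketch of this narrowing hierarchy follows. To keep it short it assumes components are represented by their names, and uses plain java.util.function Predicate and Function as stand-ins for ComponentFilter and the hierarchy relationships.

```java
import java.util.*;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch: components are plain names, filters are Predicates. */
final class Narrowing {
    /** Execution API only; no fluent operations remain. */
    abstract static class ComponentSet {
        abstract Set<String> execute(Collection<String> all);
    }

    /** Possibly filtered set: only andTheir(...) is still offered. */
    static class FilteredSet extends ComponentSet {
        final Predicate<String> filter; // null when where() was not used

        FilteredSet(Predicate<String> filter) { this.filter = filter; }

        @Override
        Set<String> execute(Collection<String> all) {
            Set<String> result = new LinkedHashSet<>();
            for (String c : all) {
                if (filter == null || filter.test(c)) result.add(c);
            }
            return result;
        }

        /** The set plus whatever the relationship returns for each member. */
        ComponentSet andTheir(Function<String, Set<String>> relationship) {
            FilteredSet source = this;
            return new ComponentSet() {
                @Override
                Set<String> execute(Collection<String> all) {
                    Set<String> base = source.execute(all);
                    Set<String> result = new LinkedHashSet<>(base);
                    for (String c : base) result.addAll(relationship.apply(c));
                    return result;
                }
            };
        }
    }

    /** Returned only by allComponents(): the one type that offers where(...). */
    static final class UnfilteredSet extends FilteredSet {
        UnfilteredSet() { super(null); }

        FilteredSet where(Predicate<String> filter) {
            return new FilteredSet(Objects.requireNonNull(filter));
        }
    }

    static UnfilteredSet allComponents() { return new UnfilteredSet(); }
}
```

With these types, allComponents().where(f).andTheir(r) compiles, while allComponents().where(f).where(g) and allComponents().where(f).andTheir(r).where(g) are rejected by the compiler, since neither FilteredSet nor ComponentSet declares where(). UnfilteredSet inherits execute() unchanged and simply leaves the filter empty.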

This technique works quite well, especially with the auto-complete features of IDEs. The problem is that it multiplies classes, so use it judiciously. But where you need it, it works great.

Another way of achieving this is to use exceptions instead of the type system. For example, every hierarchy relationship must let the user specify whether it applies to all hierarchies or to a specific one. I implemented this with the following method:

public abstract class ComponentHierarchyRelationship {
    /**
     * The hierarchy type that this relationship refers to. If null, refers to all
     * types.
     */
    protected RelType relType;
   
    /**
     * Limits the relationship to a particular hierarchy.
     * @param relType a hierarchy type
     * @return this
     */
    public ComponentHierarchyRelationship in(RelType relType) {
        if (this.relType != null) {
            throw new IllegalStateException("Relationship type already set");
        }
        this.relType = relType;
        return this;
    }
    // ...
}

By default, parents() refers to all parents, and it can be refined to parents().in(HOUSING), but calling in() twice results in an exception. You can extend this to require a particular ordering (if foo is already set, setting bar throws an exception, forcing bar to be set before foo). This strategy does not pollute the type system, but it does not give you compile-time safety, and it complicates the implementation (you need to check all possible combinations and switch between them, instead of having the type system do it for you). Again, the exception is thrown while the expression is being created, not during query execution.
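Here is a sketch of the ordering variant, using a hypothetical descendants() token with two refinements: in() and a made-up upTo(depth). Neither the token nor upTo() comes from the original API. As with in(), the checks run while the expression is being built, and because the token is mutable, the factory returns a fresh instance on every call.

```java
/** Hypothetical token with two optional refinements checked at build time. */
final class StateChecks {
    enum RelType { HOUSING, POWER }

    static final class Descendants {
        private RelType relType;   // null: every hierarchy
        private Integer maxDepth;  // null: no depth limit

        Descendants in(RelType relType) {
            if (this.relType != null) {
                throw new IllegalStateException("Relationship type already set");
            }
            if (maxDepth != null) {
                // Ordering rule: in() must come before upTo().
                throw new IllegalStateException("in() must be called before upTo()");
            }
            this.relType = relType;
            return this;
        }

        Descendants upTo(int depth) {
            if (maxDepth != null) {
                throw new IllegalStateException("Depth already set");
            }
            this.maxDepth = depth;
            return this;
        }

        RelType relType() { return relType; }
        Integer maxDepth() { return maxDepth; }
    }

    /** A fresh instance per call, because the token is mutable. */
    static Descendants descendants() { return new Descendants(); }
}
```

So descendants().in(POWER).upTo(2) is accepted, while descendants().upTo(2).in(POWER) fails as soon as the expression is written, before any query runs.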

Sentence order and ending problem

One well-known feature of fluent APIs is that they reverse the order of tokens, and it takes a while to get used to. In English you don't say:

sandwich.eat();

we say:

eat(sandwich);

A related problem is knowing where a method chain ends. For example, to convert from a FooBuilder to the actual Foo you may have:

Foo foo = foo("the name").owner("me").build();

Here are a couple of ways to avoid that. We are used to putting the most important attributes first: if the name is required, we pass it as an argument to the constructor, so it comes first. But precisely because the name is required, it gives us a natural way to end the chain:

Foo foo = newFoo().owner("me").andName("the name");
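A minimal builder following this pattern, assuming a Foo with a required name and an optional owner (both hypothetical): optional settings return the builder, and the required attribute is the call that returns the finished object.

```java
import java.util.Objects;

/** Hypothetical Foo with a required name and an optional owner. */
final class Foos {
    static final class Foo {
        final String owner;
        final String name;

        private Foo(String owner, String name) {
            this.owner = owner;
            this.name = name;
        }
    }

    static final class FooBuilder {
        private String owner; // optional, may stay null

        FooBuilder owner(String owner) {
            this.owner = owner;
            return this;
        }

        /** Required attribute last: supplying it also ends the chain. */
        Foo andName(String name) {
            return new Foo(owner, Objects.requireNonNull(name, "A Foo needs a name"));
        }
    }

    static FooBuilder newFoo() { return new FooBuilder(); }
}
```

Forgetting andName(...) leaves you holding a FooBuilder rather than a Foo, so the compiler flags an unfinished chain wherever a Foo is expected.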

The ending problem itself can also be seen as a sentence-order problem: we say "build this", not "this build". And "what is being built" is actually a subordinate clause, which can be arbitrarily complicated. So, in theory, it would be more proper to write:

Foo foo = build(aNewFoo().with(owner().equalTo("me").and(name().equalTo("the name"))));

with all the gradations in between:

Foo foo = aNewFoo().with(owner().equalTo("me").and(name().equalTo("the name")));
Foo foo = aNewFoo().with(owner("me").name("the name"));
...

Your taste and your requirements will decide what works best in your case, but keep these two alternatives in mind: put the required expression at the end, or use endMark(expression) instead of expression.endMark().
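The endMark(expression) form can be sketched as follows. The Assignments, Attribute and FooSpec types are hypothetical scaffolding, and the rule that a name is required only shows where a check at the end of the sentence would go. Nothing is constructed until build(...) receives the whole description.

```java
import java.util.*;

/** Hypothetical sketch of build(aNewFoo().with(...)). */
final class EndMark {
    /** The product: an immutable bag of named attributes. */
    static final class Foo {
        final Map<String, String> attributes;
        Foo(Map<String, String> attributes) {
            this.attributes = Collections.unmodifiableMap(new LinkedHashMap<>(attributes));
        }
    }

    /** One or more attribute assignments, combinable with and(). */
    static final class Assignments {
        final Map<String, String> values = new LinkedHashMap<>();

        Assignments and(Assignments other) {
            Assignments combined = new Assignments();
            combined.values.putAll(values);
            combined.values.putAll(other.values);
            return combined;
        }
    }

    /** An attribute name waiting for its value. */
    static final class Attribute {
        private final String name;
        Attribute(String name) { this.name = name; }

        Assignments equalTo(String value) {
            Assignments a = new Assignments();
            a.values.put(name, Objects.requireNonNull(value));
            return a;
        }
    }

    /** Describes what to build; nothing is built yet. */
    static final class FooSpec {
        final Assignments assignments;
        FooSpec(Assignments assignments) { this.assignments = assignments; }

        FooSpec with(Assignments more) { return new FooSpec(assignments.and(more)); }
    }

    static Attribute owner() { return new Attribute("owner"); }
    static Attribute name() { return new Attribute("name"); }
    static FooSpec aNewFoo() { return new FooSpec(new Assignments()); }

    /** The verb comes first and takes the whole description as its argument. */
    static Foo build(FooSpec spec) {
        if (!spec.assignments.values.containsKey("name")) {
            throw new IllegalStateException("A Foo needs a name");
        }
        return new Foo(spec.assignments.values);
    }
}
```

Because build() sees the complete description at once, it can validate the whole sentence in one place instead of spreading checks across the chain.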

Also, if your natural language allows verbs at the end (like German), you can do that. Or you can use the politeness API pattern:

Foo foo = foo("the name").owner("me").please();

Nothing happens if you forget to say please!!! ;-)

Conclusion

Designing a fluent API involves many different considerations; I hope this gives you a glimpse of them. It helps if you have studied languages and grammars (the computer kind, this time). But in the end it's just practice. I still doubt that a whole API can be made fluent. What I did was keep the basic data objects as immutable beans (which gives thread safety), so the read API is a standard bean API. The search and write APIs, though, are more like fluent APIs. It seems to work well.

What does not work well, and what I don't really know how to solve, is documentation: Javadoc is not at all suited to this task, and the documentation ends up too scattered. I have examples of the overall usage at the package level and in some of the key classes, but what one would really want is to have the language documented automatically, the way you document SQL or other special-purpose languages.

 

Comments

Very interesting, Gabriele! It's surely giving me a lot of hints.