Scott Oaks's Blog

Grizzly Protocol Parsers

Posted by sdo on December 19, 2007 at 01:01 PM | Comments (5)

[NOTE: The code in this blog was revised 2/11/08 due to some errors on my part the first time, and some changes as it was integrated into grizzly. And thanks to Erik Svensson for pointing out a few errors, it has been revised again on 2/13/08.]
I'm quite interested these days in parsing performance: much of what a Java appserver does is take bytes from a network stream (usually, but not always, in some 8-bit encoding) and convert them into Java strings (based on 16-bit characters). Because servlet and JSP APIs are written in terms of strings, much of that conversion is unavoidable, but parsing network protocols at the byte level is appropriate in some circumstances.
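To make the distinction concrete, here is a minimal sketch (not taken from grizzly; the class and method names are made up for illustration) that checks whether a request starts with a given ASCII token. One version decodes the bytes into a String first; the other compares the raw bytes in place.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; none of these names come from grizzly.
class ByteLevelMatch {
    // String-based check: every byte is decoded into a 16-bit char first.
    static boolean startsWithViaString(byte[] data, int off, int len, String token) {
        String s = new String(data, off, len, StandardCharsets.US_ASCII);
        return s.startsWith(token);
    }

    // Byte-level check: compares against a pre-encoded token,
    // with no decoding and no allocation.
    static boolean startsWithBytes(byte[] data, int off, int len, byte[] token) {
        if (len < token.length) {
            return false;
        }
        for (int i = 0; i < token.length; i++) {
            if (data[off + i] != token[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] request = "GET /index.html HTTP/1.1\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] get = "GET ".getBytes(StandardCharsets.US_ASCII);
        System.out.println(startsWithViaString(request, 0, request.length, "GET "));
        System.out.println(startsWithBytes(request, 0, request.length, get));
    }
}
```

Both calls give the same answer; the difference is only in how much work is done to get there, which is the kind of thing the later performance tests are meant to measure.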

As I prepared to prototype some tests around that, I realized I needed a good framework to test my changes, and of course that framework is grizzly. In fact, the newly-released grizzly 1.7 has a new protocol parser that exactly fit my needs (partly because I joined the grizzly project so that I could modify the parser as I needed; such are the joys of open source!).

I'll talk about some of my performance tests with network parsing in later blogs; for now, I wanted to write a quick entry on how to use grizzly 1.7's new protocol parser. In grizzly 1.7, the ProtocolParser interface was reimplemented to make it much easier to deal with the messages that the parser is expected to produce. This means that it is now possible to use standard grizzly filters to handle the data produced by a ProtocolParser, simply like this:
controller.setProtocolChainInstanceHandler(new DefaultProtocolChainInstanceHandler() {
      public ProtocolChain poll() {
          ProtocolChain protocolChain = protocolChains.poll();

          if (protocolChain == null) {
              protocolChain = new DefaultProtocolChain();
              ((DefaultProtocolChain) protocolChain).setContinuousExecution(true);
              protocolChain.addFilter(new MyParserProtocolFilter());
              protocolChain.addFilter(new MyProcessorFilter());
          }

          return protocolChain;
      }
});
The nice thing about this is that additional filters (like a debugging log filter) can be inserted anywhere along the chain; the protocol parser is completely integrated into the standard grizzly design. Note the call to setContinuousExecution -- it should be the default for protocol parsers (and will be eventually), but version 1.7 of grizzly needs that call. [Note that the standard LogFilter in grizzly is not appropriate in this case, since it tries to read directly from the socket as well; it's trivial to write your own if you like.]

Now it's a matter of implementing the two filters and the parser itself. The ParserProtocolFilter class will handle reading the requests and calling the parser, but in order for it to know which parser to use, you must extend it and override the newProtocolParser method:
public class MyParserProtocolFilter extends ParserProtocolFilter {
    public ProtocolParser newProtocolParser() {
        return new MyProtocolParser();
    }
}
What about the parser itself? That's the meat of the issue. The new protocol parser interface expects a basic flow like this: start processing a buffer, enumerate the messages in the buffer, and end processing the buffer. The buffer can contain 0 or more complete messages, and it's up to the protocol parser to make sense of that. Here's the outline of a simple protocol parser that parses a protocol where the first byte is the number of bytes in a string, followed by the bytes of the string itself:
public class MyProtocolParser implements ProtocolParser {
    byte[] data;
    int position;
    int limit;
    boolean partial;
    String savedString;
    ByteBuffer savedBuffer;
    int origLimit;
    public void startBuffer(ByteBuffer bb) {
        // We begin with a buffer containing data. Save the initial buffer
        // state information. The best thing here is to get the backing store
        // so that the bytes can be parsed directly. We also need to save the
        // original limit so that we can place the buffer in the correct state at the
        // end of parsing
        savedBuffer = bb;
        savedBuffer.flip();
        partial = false;
        origLimit = savedBuffer.limit();
        if (savedBuffer.hasArray()) {
            data = savedBuffer.array();
            position = savedBuffer.position() + savedBuffer.arrayOffset();
            limit = savedBuffer.limit() + savedBuffer.arrayOffset();
        } else {
            // ...maybe copy out the data, or use put/get when parsing...
        }
    }

    public boolean hasMoreBytesToParse() {
        // Indicate if there is unparsed data in the buffer
        return position < limit;
    }

    public boolean isExpectingMoreData() {
        // If there is a partial message remaining in the buffer, return true
        return partial;
    }

    public String getNextMessage() {
        // We already know this, but other protocols might parse here
        return savedString;
    }

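    public boolean hasNextMessage() {
        // In our case, it's easier to parse here
        if (!hasMoreBytesToParse()) {
            // Nothing left to look at, not even a length byte
            return false;
        }
        int length = data[position] & 0xFF;  // treat the length byte as unsigned
        if (position + 1 + length <= limit) {
            // The whole message is in the buffer
            savedString = new String(data, position + 1, length);
            // Narrow the buffer to just this message for downstream filters;
            // buffer coordinates do not include the array offset
            int offset = savedBuffer.arrayOffset();
            savedBuffer.limit(position + 1 + length - offset);
            savedBuffer.position(position + 1 - offset);
            position += length + 1;
            partial = false;
        } else {
            partial = true;
        }
        return !partial;
    }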

    public boolean releaseBuffer() {
        // If there's a partial message return true; else false
        if (!hasMoreBytesToParse()) {
            savedBuffer.clear();
        } else {
            // You could compact the buffer here if you're
            // concerned that there isn't enough space for
            // further messages, but compacting comes at a
            // performance price -- whether to compact or not
            // depends on your protocol.
            // Restore the limit first so the new position is always legal,
            // and convert the array index back into a buffer position.
            savedBuffer.limit(origLimit);
            savedBuffer.position(position - savedBuffer.arrayOffset());
        }
        return partial;
    }
}
The point of this is that the ParserProtocolFilter will repeatedly call hasNextMessage/getNextMessage to retrieve messages (Strings in this case) to pass to the next filter. When it's done, it will call releaseBuffer, which is responsible for setting the position and limit in the buffer to reflect the data consumed by the (possibly multiple) messages returned.
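To see that call sequence in isolation, here is a stand-alone sketch that mirrors the method names above but has no grizzly dependency. The class name ParserLoopSketch and the drain helper are made up for this example, and the loop in drain is only my approximation of what ParserProtocolFilter does, not its actual code. It assumes a heap ByteBuffer and ASCII message bodies.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Stand-alone mimic of the parser above; not a grizzly class.
class ParserLoopSketch {
    private byte[] data;
    private int position, limit, origLimit;
    private boolean partial;
    private String savedString;
    private ByteBuffer savedBuffer;

    void startBuffer(ByteBuffer bb) {
        savedBuffer = bb;
        savedBuffer.flip();
        partial = false;
        origLimit = savedBuffer.limit();
        data = savedBuffer.array();  // assumption: heap buffer with a backing array
        position = savedBuffer.position() + savedBuffer.arrayOffset();
        limit = savedBuffer.limit() + savedBuffer.arrayOffset();
    }

    boolean hasNextMessage() {
        if (position >= limit) {
            return false;
        }
        int length = data[position] & 0xFF;
        if (position + 1 + length > limit) {
            partial = true;  // length byte seen, body incomplete
            return false;
        }
        savedString = new String(data, position + 1, length, StandardCharsets.US_ASCII);
        position += length + 1;
        partial = false;
        return true;
    }

    String getNextMessage() {
        return savedString;
    }

    boolean isExpectingMoreData() {
        return partial;
    }

    boolean releaseBuffer() {
        if (position >= limit) {
            savedBuffer.clear();
        } else {
            // Leave the unconsumed bytes between position and limit
            savedBuffer.limit(origLimit);
            savedBuffer.position(position - savedBuffer.arrayOffset());
        }
        return partial;
    }

    // Approximation of the filter's loop: start, drain messages, release.
    static List<String> drain(ParserLoopSketch parser, ByteBuffer bb) {
        List<String> messages = new ArrayList<>();
        parser.startBuffer(bb);
        while (parser.hasNextMessage()) {
            messages.add(parser.getNextMessage());
        }
        parser.releaseBuffer();
        return messages;
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(64);
        bb.put((byte) 2).put("hi".getBytes(StandardCharsets.US_ASCII));
        bb.put((byte) 5).put("there".getBytes(StandardCharsets.US_ASCII));
        bb.put((byte) 4).put("pa".getBytes(StandardCharsets.US_ASCII)); // only 2 of 4 bytes
        ParserLoopSketch parser = new ParserLoopSketch();
        System.out.println(drain(parser, bb));            // prints [hi, there]
        System.out.println(parser.isExpectingMoreData()); // prints true
    }
}
```

The buffer holds two complete messages and the start of a third, so the loop returns two strings and the parser reports that it is still expecting data. After releaseBuffer, the three unconsumed bytes sit between the buffer's position and limit.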

So what about the downstream filters? You probably noticed that when we parsed the data, we also set the limit/position in the ByteBuffer to reflect the message boundaries. That's because not all grizzly filters will understand that the data is protocol based and has been separated into types. For instance, you could write a LogFilter that just prints out the data received; it doesn't know about the messages (and we wouldn't want it to -- we'd want it to print the raw data anyway, rather than information in the message).
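The core of such a raw-data logger might look like the sketch below. RawBufferDump and dump are hypothetical names, and how a filter obtains the ByteBuffer from grizzly is deliberately left out; the sketch only shows reading the bytes between position and limit without disturbing the buffer for the filters that follow.

```java
import java.nio.ByteBuffer;

// Hypothetical helper; not part of grizzly.
class RawBufferDump {
    // Hex-dump the bytes between position and limit. Working on a duplicate
    // leaves the original buffer's position and limit untouched.
    static String dump(ByteBuffer bb) {
        ByteBuffer view = bb.duplicate();
        StringBuilder sb = new StringBuilder();
        while (view.hasRemaining()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", view.get() & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two length-prefixed messages: "hi" and "there"
        ByteBuffer bb = ByteBuffer.wrap(new byte[] {2, 'h', 'i', 5, 't', 'h', 'e', 'r', 'e'});
        // Narrow the buffer to the body of the first message, as the parser does
        bb.position(1).limit(3);
        System.out.println(dump(bb)); // prints 68 69
    }
}
```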

But downstream filters can also understand what a message is and hence they can work like this:

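public class MyProcessorFilter implements ProtocolFilter {
    public boolean execute(Context ctx) throws IOException {
        String s = (String) ctx.getAttribute(ProtocolParser.MESSAGE);
        if (s == null) {
            // no message; just use the bytes in the buffer like a
            // normal filter
            s = getStringFromBuffer(ctx);
        }
        // .. do something with s ..
        return true;  // let the chain continue to the next filter
    }

    public boolean postExecute(Context ctx) throws IOException {
        // nothing to clean up in this example
        return true;
    }
}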
So, apart from writing the protocol parser (which could be quite complex, depending on the actual protocol and how it breaks into messages), using the new grizzly framework for protocol parsing is quite simple: you just set up the parser class, and then have a filter that processes the messages from the parser. And along the way, you can use any other grizzly filter or framework feature you need.

Comments
Comments are listed in date ascending order (oldest first)

  • Thanks for the sample code. One question--in hasNextMessages(), at the end of the conditional block, you're setting partial to false, then immediately outside the block, setting it to true. Are you missing a return or an else? Thanks, Patrick

    Posted by: pdoubleya on December 20, 2007 at 01:30 AM

  • Oops -- good catch; it was missing an else. I've corrected it above.

    Posted by: sdo on December 20, 2007 at 07:29 AM


  • What do I do if the Protocol String is 93119 bytes?
    The String will then span several ByteBuffers.
    So I need to store the Content of the ByteBuffers somewhere intelligently in between MyProtocolParser invocations?

    Posted by: johnmann on January 04, 2008 at 08:38 AM


  • Ok, I thought again about the problem. I now think that it is the responsibility of a client to limit the protocol string to maybe 8000 bytes. If the string/message is logically greater, then one should implement something like Corba Fragmentation. Anyway Scott, many thanks for your article. I am trying to implement a custom protocol and your entry is a nice kickstart into the framework.

    Posted by: johnmann on January 06, 2008 at 04:40 AM

  • I'm afraid the current protocol parser is difficult to use for parsing some streaming protocols. When I was developing a streaming server, I tried the ProtocolParser and ProtocolFilter first. But as a basic concept, a streaming socket stays open for a long time, and each socket should have one codec attached. I tried tracing the ProtocolFilter into the grizzly implementation. It's not easy to attach one parser to one connection, and there are many unnecessary steps I don't need at all. So I finally decided to attach a SelectionKeyAttachment implementing AttributeHolder and use this attachment as the parser.
    I think the current ProtocolParser is only meant for simple, short-lived connections. It's not suitable for use as a streaming protocol parser. This is just my two cents.

    Posted by: popeyelin on May 14, 2008 at 03:35 PM


