J2EE Architecture for Speech Applications

Posted by richunger on May 12, 2005 at 11:20 AM PDT

In my previous entry on J2EE, I made a somewhat deliberately inflammatory rant on the proliferation of web frameworks. In my online reading, I had not been able to wade through all the available information to come up with a coherent view of how they can all fit together. So, I used my soapbox here to provoke some discussion.

Boy, was that a good idea! The conversation drifted from the comments of the original post to some other folks' blogs. The best advice I received was from an old college friend, who told me to pick up Expert One-on-One J2EE Design and Development and Expert One-on-One J2EE Development without EJB.

These are fantastic books.

If you are trying to gain some perspective on J2EE architecture, and you do not read both of these books, you are doing yourself a tremendous disservice. I'm usually not big on books for particular technologies, as the online information is usually more up to date, and the quality of such books is often variable. But I'll say it one more time: these are fantastic books.

They are also very big sellers, so I'll presume many of you already know this, and I'm coming to the party very, very late. The rest of this entry will presume the reader is familiar with the issues Rod Johnson addresses in his books.

I'm really interested in taking the lessons I've learned, and applying them to VoiceXML applications. I'm convinced that the lightweight container architecture is the right way to go for the majority of voice applications. However, there need to be some tweaks in the way the UI tier utilizes it, compared to what we're used to seeing with HTML-based applications.

All of the UI tier libraries and frameworks out there kind of assume you're making HTML sites. Okay, that's not really true. They'll work with any presentation layer. But let's be really honest: a lot of the convenience classes that make a lot of these libraries a real pleasure to work with assume HTML. Maybe it's not a big leap to have a cell phone talking WAP, but the fundamental architecture is the same. The user has a computer or device with a browser running on it, which makes queries for pages from a server.

Not so with VoiceXML platforms. The standard platform architecture is to have 100 or so browser instances sitting on a server. The browser is not located on the client, because the client is a POT (Plain Old Telephone). The caller dials up a computer with a Dialogic or NMS card, or a SIP gateway, and the call is routed to one of the browser instances running on a server. This may be the same server that's running the app server, or it may just be on the same LAN.

So, what difference does this make to my J2EE architecture? Well, now the browser cache becomes much, much more important. If I have a few thousand phone calls coming in every hour to those 100 browsers, and they're all running the same application, the absolute worst thing I can do is a lot of dynamic page rendering. That would involve rendering the JSP into VoiceXML, and then parsing that VoiceXML into runnable code, on every request.

Think of a kiosk. Now think of 100 dumb-terminal kiosks sharing the same computer. With really high traffic.

Combine this with the fact that VoiceXML (for better or worse) actually has a temporal component and a "form interpretation algorithm", which specifies the order of execution for <field>s and <block>s, and the temptation to put business logic in the presentation layer is extreme.

I want static VoiceXML pages in the browser cache all the time. Some VoiceXML browsers actually have configuration parameters for pre-populating the browser cache!

So, component libraries like Tapestry or JSF are out. MVC frameworks like Struts could be okay, as long as they're redirecting to static VoiceXML pages, instead of forwarding to JSPs.

However, we're back to square one when it comes to exposing server-side data to the browser. In the HTML world, you might solve this with Ajax. The static, cached pages make calls to the service layer to get information when it's needed, not at page-render time, and modify the page DOM to deliver that information to the user.

One problem: VoiceXML doesn't support DOM rewriting. (Nor should it. That would be hideously confusing!) What VoiceXML does have, at least in version 2.1, is the <data> element. This allows static VoiceXML pages to make HTTP requests (like a <submit>), without incurring a page transition. The resulting document is expected to be XML, and the DOM is exposed as a JavaScript variable.

For example, you might say:

<data name="myJavascriptVariable" src="/springBeanServlet?bean=myBeanName&amp;method=myServiceInterfaceMethod"/>

If myBeanName.myServiceInterfaceMethod() returns a POJO, the springBeanServlet would be responsible for serializing it as a simple XML document. For example, if you had a class:

import java.util.Date;

class Flight {
    private Date arrivalTime, departureTime;
    private String arrivalAirport, departureAirport;

    // accessors...
}

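It might be serialized as something like:

<flight>
  <arrivalTime>...</arrivalTime>
  <departureTime>...</departureTime>
  <arrivalAirport>...</arrivalAirport>
  <departureAirport>...</departureAirport>
</flight>
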
This would allow static pages to invoke service layer interfaces, as well as access POJOs defined on the server. Obviously, some syntactic sugar would be needed to address collections.
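
To make the idea a bit more concrete, here is a rough sketch of the work such a servlet would do, minus the servlet itself. Everything in it is an assumption for illustration: PojoXml, FlightService and nextArrival are made-up names, the bean lookup is a plain Map standing in for the Spring context, and the element naming (one child element per getter, dates written in ISO-8601) is just one possible convention. It only handles zero-argument methods that return flat POJOs.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Date;
import java.util.Map;

// Hypothetical POJO mirroring the Flight class in the text.
class Flight {
    private final Date arrivalTime, departureTime;
    private final String arrivalAirport, departureAirport;

    Flight(Date departureTime, String departureAirport, Date arrivalTime, String arrivalAirport) {
        this.departureTime = departureTime;
        this.departureAirport = departureAirport;
        this.arrivalTime = arrivalTime;
        this.arrivalAirport = arrivalAirport;
    }

    public Date getArrivalTime() { return arrivalTime; }
    public Date getDepartureTime() { return departureTime; }
    public String getArrivalAirport() { return arrivalAirport; }
    public String getDepartureAirport() { return departureAirport; }
}

// Hypothetical service bean; in a real application the container would supply it.
class FlightService {
    public Flight nextArrival() {
        return new Flight(new Date(0L), "SFO", new Date(3600000L), "LAX");
    }
}

// Assumed helper, not part of Spring or of any servlet API.
class PojoXml {
    // Looks up a bean by name, calls a zero-argument method on it,
    // and serializes whatever comes back.
    static String invokeAsXml(Map<String, Object> beans, String beanName, String methodName)
            throws ReflectiveOperationException {
        Object bean = beans.get(beanName);
        if (bean == null) {
            throw new IllegalArgumentException("unknown bean: " + beanName);
        }
        Method method = bean.getClass().getMethod(methodName);
        return toXml(method.invoke(bean));
    }

    // Writes one child element per public zero-argument getter, sorted by
    // method name so the output does not depend on reflection order.
    static String toXml(Object pojo) throws ReflectiveOperationException {
        String root = decapitalize(pojo.getClass().getSimpleName());
        StringBuilder xml = new StringBuilder("<" + root + ">");
        Method[] methods = pojo.getClass().getMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        for (Method m : methods) {
            String name = m.getName();
            boolean getter = name.startsWith("get") && name.length() > 3
                    && m.getParameterCount() == 0 && !name.equals("getClass");
            if (!getter) {
                continue;
            }
            String property = decapitalize(name.substring(3));
            xml.append('<').append(property).append('>')
               .append(escape(format(m.invoke(pojo))))
               .append("</").append(property).append('>');
        }
        return xml.append("</").append(root).append('>').toString();
    }

    // Dates become ISO-8601 instants in UTC; everything else uses toString().
    private static String format(Object value) {
        if (value == null) return "";
        if (value instanceof Date) return ((Date) value).toInstant().toString();
        return value.toString();
    }

    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    private static String decapitalize(String s) {
        return Character.toLowerCase(s.charAt(0)) + s.substring(1);
    }
}
```

With a FlightService registered under the name "flightService", PojoXml.invokeAsXml(beans, "flightService", "nextArrival") returns a single <flight> element with arrivalAirport, arrivalTime, departureAirport and departureTime children. A real servlet would also want a whitelist of beans and methods, since calling anything named in a query string is not something you'd want to expose as-is.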

One downside to this approach is that now the VoiceXML file needs to contain a lot of JavaScript code for traversing the DOM:

myJavascriptVariable.documentElement.getElementsByTagName("arrivalTime").item(0).firstChild.data

...or something similar, just to get the arrival time!
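
To get a feel for that traversal outside a voice browser, here is a rough Java sketch of the same walk. Java's org.w3c.dom interfaces are a binding of the W3C DOM, so the chain of calls is comparable, though the spellings differ from the ECMAScript binding (getFirstChild() rather than firstChild). The class name ArrivalTimeLookup and the shape of the <flight> document are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class ArrivalTimeLookup {
    // Returns the text of the first <arrivalTime> element in the document.
    // Assumes the element exists and is non-empty; otherwise this throws
    // a NullPointerException.
    static String arrivalTime(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // root element -> first <arrivalTime> -> its text node -> the text
        return doc.getDocumentElement()
                .getElementsByTagName("arrivalTime")
                .item(0)
                .getFirstChild()
                .getNodeValue();
    }
}
```

Four hops for one value, and that's before any null checks.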

Luckily, the ECMAScript folks have standardized E4X (as ECMA-357). Unfortunately, that came too late for the W3C Voice Browser Working Group, who have decided not to allow E4X as an option for the <data> tag, at least for VoiceXML 2.1. Too bad, it would have reduced the above code to something like:

myJavascriptVariable.arrivalTime

Much better, no?

Well, if you've made it this far, I'm terribly impressed! I really don't know how big the VoiceXML developer community is. I do know it's nascent enough that these issues haven't begun to be addressed seriously. So, if you're still here, please post a comment, just to let me know if there's enough interest for me to keep blogging about these issues. And if I'm missing a trick here, I'd love to know about it.
