


Rich Unger

Rich Unger's Blog

J2EE Architecture for Speech Applications

Posted by richunger on May 12, 2005 at 11:20 AM | Comments (5)

In my previous entry on J2EE, I made a somewhat deliberately inflammatory rant on the proliferation of web frameworks. In my online reading, I had not been able to wade through all the available information to come up with a coherent view of how they can all fit together. So, I used my soapbox here to provoke some discussion.

Boy, was that a good idea! The conversation drifted from the comments of the original post to some other folks' blogs. The best advice I received was from an old college friend, who told me to pick up Expert One-on-One J2EE Design and Development and Expert One-on-One J2EE Development without EJB.

These are fantastic books.

If you are trying to gain some perspective on J2EE architecture, and you do not read both of these books, you are doing yourself a tremendous disservice. I'm usually not big on books for particular technologies, as the online information is usually more up to date, and the quality is often variable. But I'll say it one more time: these are fantastic books.

They are also very big sellers, so I'll presume many of you already know this, and I'm coming to the party very, very late. The rest of this entry will presume the reader is familiar with the issues Rod Johnson addresses in his books.

I'm really interested in taking the lessons I've learned and applying them to VoiceXML applications. I'm convinced that the lightweight container architecture is the right way to go for the majority of voice applications. However, there need to be some tweaks in the way the UI tier utilizes it, compared to what we're used to seeing with HTML-based applications.

All of the UI tier libraries and frameworks out there kind of assume you're making HTML sites. Okay, that's not really true. They'll work with any presentation layer. But let's be really honest: a lot of the convenience classes that make these libraries a real pleasure to work with assume HTML. Maybe it's not a big leap to have a cell phone talking WAP, but the fundamental architecture is the same. The user has a computer or device with a browser running on it, which makes queries for pages from a server.

Not so with VoiceXML platforms. The standard platform architecture is to have 100 or so browser instances sitting on a server. The browser is not located on the client, because the client is a POT (Plain Old Telephone). The caller dials up a computer with a Dialogic or NMS card, or a SIP gateway, and the call is routed to one of the browser instances running on a server. This may be the same server that's running the app server, or it may just be on the same LAN.

So, what difference does this make to my J2EE architecture? Well, now the browser cache becomes much, much more important. If I have a few thousand phone calls coming in every hour to those 100 browsers, and they're all running the same application, the absolute worst thing I can do is a lot of dynamic page rendering. That would involve rendering the JSP into VoiceXML, and parsing the VoiceXML into runnable code, on every request.

Think of a kiosk. Now think of 100 dumb-terminal kiosks sharing the same computer. With really high traffic.

Combine this with the fact that VoiceXML (for better or worse) actually has a temporal component and a "form interpretation algorithm", which specifies the order of execution for <form>s and <field>s, and the temptation to put business logic in the presentation layer is extreme.

I want static VoiceXML pages in the browser cache all the time. Some VoiceXML browsers actually have configuration parameters for pre-populating the browser cache!

So, component libraries like Tapestry or JSF are out. MVC frameworks like Struts could be okay, as long as they're redirecting to static VoiceXML pages, instead of forwarding to JSPs.

However, we're back to square one when it comes to exposing server-side data to the browser. In the HTML world, you might solve this with Ajax. The static, cached pages make calls to the service layer to get information when it's needed, not at page-render time, and modify the page DOM to deliver that information to the user.

One problem: VoiceXML doesn't support DOM rewriting. (Nor should it. That would be hideously confusing!) What VoiceXML does have, at least in version 2.1, is the <data> element. This allows static VoiceXML pages to make HTTP requests (like a <submit>), without incurring a page transition. The resulting document is expected to be XML, and its DOM is exposed as a JavaScript variable.

For example, you might say:

<data name="myJavascriptVariable" src="/springBeanServlet?bean=myBeanName&amp;method=myServiceInterfaceMethod"/>

If myBeanName.myServiceInterfaceMethod() returns a POJO, the springBeanServlet would be responsible for serializing it as a simple xml file. For example, if you had a class:

import java.util.Date;

// Simple POJO returned by the service layer.
class Flight {
	private Date arrivalTime, departureTime;
	private String arrivalAirport, departureAirport;
	
	// accessors...
}

It might be serialized as:

<flight>
<arrivalTime>{arrivalTime.getTime()}</arrivalTime>
<departureTime>{departureTime.getTime()}</departureTime>
<arrivalAirport>JFK</arrivalAirport>
<departureAirport>SFO</departureAirport>
</flight>
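
A minimal sketch of how a servlet like that might produce this kind of XML with plain reflection follows. Everything in it is an assumption made for illustration: the PojoXmlSketch class, its toXml and escape helpers, and the Flight constructor are hypothetical names, not part of Spring or any VoiceXML platform. It writes one element per getter, turns a Date into its getTime() milliseconds, and sorts elements by name so the output is stable (which means the order differs from the hand-written sample above).

```java
import java.lang.reflect.Method;
import java.util.Date;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only.
class PojoXmlSketch {

    // The Flight bean from the post, with accessors filled in for the sketch.
    public static class Flight {
        private final Date arrivalTime, departureTime;
        private final String arrivalAirport, departureAirport;

        public Flight(Date departureTime, String departureAirport,
                      Date arrivalTime, String arrivalAirport) {
            this.departureTime = departureTime;
            this.departureAirport = departureAirport;
            this.arrivalTime = arrivalTime;
            this.arrivalAirport = arrivalAirport;
        }

        public Date getArrivalTime() { return arrivalTime; }
        public Date getDepartureTime() { return departureTime; }
        public String getArrivalAirport() { return arrivalAirport; }
        public String getDepartureAirport() { return departureAirport; }
    }

    // Serializes the public no-arg getters of a bean as child elements.
    static String toXml(Object bean) throws ReflectiveOperationException {
        Map<String, String> fields = new TreeMap<>(); // sorted by element name
        for (Method m : bean.getClass().getMethods()) {
            String name = m.getName();
            boolean isGetter = name.startsWith("get") && name.length() > 3
                    && m.getParameterCount() == 0 && !name.equals("getClass");
            if (!isGetter) {
                continue;
            }
            Object value = m.invoke(bean);
            if (value == null) {
                continue; // skip unset properties
            }
            // getArrivalTime -> arrivalTime
            String element = Character.toLowerCase(name.charAt(3)) + name.substring(4);
            String text = (value instanceof Date)
                    ? Long.toString(((Date) value).getTime())
                    : escape(value.toString());
            fields.put(element, text);
        }
        // Flight -> flight
        String root = bean.getClass().getSimpleName();
        root = Character.toLowerCase(root.charAt(0)) + root.substring(1);

        StringBuilder xml = new StringBuilder("<" + root + ">\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            xml.append('<').append(e.getKey()).append('>')
               .append(e.getValue())
               .append("</").append(e.getKey()).append(">\n");
        }
        return xml.append("</").append(root).append('>').toString();
    }

    // Escapes the characters that would break element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws Exception {
        Flight flight = new Flight(new Date(0L), "SFO", new Date(3600000L), "JFK");
        System.out.println(toXml(flight));
    }
}
```

Nested beans and collections are deliberately left out of this sketch; it only handles flat properties.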

This would allow static pages to invoke service layer interfaces, as well as access POJOs defined on the server. Obviously, some syntactic sugar would be needed to address collections.
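
The lookup half of such a servlet (bean name plus method name in, return value out) can be sketched with reflection as well. This is only an illustration under assumptions: BeanDispatchSketch, its register and invoke methods, and the FlightService interface are hypothetical names, and a plain map stands in for the Spring application context. Methods are resolved against the registered service interface rather than the bean's concrete class, so only interface methods are reachable from a URL.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical dispatcher, for illustration only.
class BeanDispatchSketch {

    // Pairs a bean with the service interface it is exposed through.
    private static final class Registration {
        final Class<?> serviceInterface;
        final Object bean;

        Registration(Class<?> serviceInterface, Object bean) {
            this.serviceInterface = serviceInterface;
            this.bean = bean;
        }
    }

    // Stand-in for the application context: bean name -> registration.
    private final Map<String, Registration> beans = new HashMap<>();

    <T> void register(String name, Class<T> serviceInterface, T bean) {
        beans.put(name, new Registration(serviceInterface, bean));
    }

    // Invokes a no-argument method of the named bean's service interface.
    Object invoke(String beanName, String methodName) throws ReflectiveOperationException {
        Registration r = beans.get(beanName);
        if (r == null) {
            throw new IllegalArgumentException("No such bean: " + beanName);
        }
        // Throws NoSuchMethodException if the interface has no such no-arg method.
        Method method = r.serviceInterface.getMethod(methodName);
        return method.invoke(r.bean);
    }

    // Hypothetical service interface used by the demo below.
    public interface FlightService {
        String nextArrivalAirport();
    }

    public static void main(String[] args) throws Exception {
        BeanDispatchSketch dispatcher = new BeanDispatchSketch();
        dispatcher.register("myBeanName", FlightService.class, () -> "JFK");
        // Equivalent of ?bean=myBeanName&method=nextArrivalAirport
        System.out.println(dispatcher.invoke("myBeanName", "nextArrivalAirport"));
    }
}
```

A real servlet would also need to handle method arguments and decide which beans are safe to expose at all; the sketch ignores both.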

One downside to this approach is that now the VoiceXML file needs to contain a lot of JavaScript code for traversing the DOM:

myJavascriptVariable.documentElement.getElementsByTagName("arrivalTime").item(0).firstChild.data

...or something similar, just to get the arrival time!
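
To get a feel for how much ceremony the DOM adds, the same kind of lookup can be tried outside a voice browser with the W3C DOM classes that ship with the JDK. This is just a sketch under assumptions: DomTraversalSketch, parse, and childText are hypothetical names, and the XML is the flight example with made-up millisecond values.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, for illustration only.
class DomTraversalSketch {

    // Parses an XML string into a W3C DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns the text of the first descendant element with the given tag name.
    // Assumes the element exists and contains a single text node.
    static String childText(Document doc, String tag) {
        return doc.getDocumentElement()
                .getElementsByTagName(tag)
                .item(0)
                .getFirstChild()
                .getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<flight>"
                + "<arrivalTime>3600000</arrivalTime>"
                + "<departureTime>0</departureTime>"
                + "<arrivalAirport>JFK</arrivalAirport>"
                + "<departureAirport>SFO</departureAirport>"
                + "</flight>";
        Document doc = parse(xml);
        System.out.println(childText(doc, "arrivalTime")); // prints 3600000
    }
}
```

Four chained calls for one value is roughly the overhead every field lookup in the page would carry.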

Luckily, the ECMAScript folks have standardized E4X (ECMA-357). Unfortunately, that came too late for the W3C Voice Browser Working Group, which decided not to allow E4X as an option for the <data> tag, at least for VoiceXML 2.1. Too bad, because it would have reduced the above code to:

myJavascriptVariable.arrivalTime

Much better, no?

Well, if you've made it this far, I'm terribly impressed! I really don't know how big the VoiceXML developer community is. I do know it's nascent enough that these issues haven't begun to be addressed seriously. So, if you're still here, please post a comment, just to let me know if there's enough interest for me to keep blogging about these issues. And if I'm missing a trick here, I'd love to know about it.


Comments
Comments are listed in date ascending order (oldest first)

  • I think the Java VoiceXML community is much different from most other segments of the Java community in the ratio of professionals to hobbyists. This has to do with the expense and complexity of experimenting with VoiceXML versus JDBC, Swing, or many other aspects of Java. Tell Me Studio seems like an excellent resource for beginners. What resources would you recommend to beginners?

    Posted by: coxcu on May 13, 2005 at 07:47 AM

  • TellMe Studio is a fine choice. Also, community.voxeo.com has resources for beginners. Unfortunately, the place I work is one of those expensive places. We have a great development tool for beginners, but it's not free and we have no public hosting for you to try out your application.

    Posted by: richunger on May 13, 2005 at 11:04 AM

  • The VXML projects I've worked on did a lot of data prefetching.
    As VXML pages have one or more forms, we fetched the data for all forms of a single VXML page.

    Moreover, we "hid" the server-side fetching behind a welcome sound.
    I think producing VXML pages is very much like WML page generation. In WML you have one or more cards in a single WML page.

    Summarizing: the challenge for VXML is that the synchronous processing known from the HTML world becomes more asynchronous in the VXML world.

    Posted by: berni on May 27, 2005 at 02:10 AM

  • We're still here! Good writeup, Rich. This is definitely a problem we've faced when trying to write dynamic VoiceXML apps with good performance. The most compelling applications are inherently personal (MY emails, MY address book), which completely defeats the browser cache in a standard architecture, leading to voice applications that are not as "snappy" as you'd want.

    Other approaches to get better performance don't rely on the cache, but you always take the hit somewhere. You might trade off the performance and latency cost of requesting a larger page in order to make a page large enough that the majority of your interactions will not require leaving it. As an example, think of creating a page that has everything you want to do for your first 10 voice mail messages, and only transitioning to a new page when you get to message 11.

    Too bad about E4X... it sure would be a lot cleaner.

    Any idea of which browsers have 2.1 draft / candidate on their roadmaps now?

    Posted by: gpnewton on July 13, 2005 at 11:22 AM

  • You've hit the nail on the head in a lot of ways. The client-server architecture in place now with VoiceXML is really not a very good one. After all, the "client" is a server, and probably resides on the same LAN (maybe even the same box!). So why jump through the hoops of a client-server architecture?
    This is, in fact, the direction VoiceXML is headed. In 3.0, it will be possible to dynamically (on the client) specify a dialog strategy (your own form interpretation algorithm), meaning you'd code your transition logic in JavaScript. At that point, it really only makes sense to use the <data> tag for backend access, and just use the VoiceXML language to drive the dialog, as opposed to some web tier framework.
    WRT 2.1-compliant browsers, I'm fairly certain most of the 2.0 implementations out there will have 2.1 compliance very soon, if they don't already. It's a really small delta, and mostly it's stuff a lot of the vendors were already implementing as extension tags. Our ship dates at Nuance are obviously very up in the air right now (because of the merger), but I know that getting our browser to 2.1 compliance is taking less than one full-time engineer.

    Posted by: richunger on July 13, 2005 at 01:50 PM




