J2EE Architecture for Speech Applications
In my previous entry on J2EE, I went on a somewhat deliberately inflammatory rant about the proliferation of web frameworks. In my online reading, I had not been able to wade through all the available information and come up with a coherent view of how they all fit together. So, I used my soapbox here to provoke some discussion.
Boy, was that a good idea! The conversation drifted from the comments of the original post to some other folks' blogs. The best advice I received was from an old college friend, who told me to pick up Expert One-on-One J2EE Design and Development and Expert One-on-One J2EE Development without EJB.
These are fantastic books.
If you are trying to gain some perspective on J2EE architecture, and you do not read both of these books, you are doing yourself a tremendous disservice. I'm usually not big on books for particular technologies, as the online information is usually more up to date, and the quality is often variable. But I'll say it one more time: these are fantastic books.
They are also very big sellers, so I'll presume many of you already know this, and I'm coming to the party very, very late. The rest of this entry will presume the reader is familiar with the issues Rod Johnson addresses in his books.
I'm really interested in taking the lessons I've learned and applying them to VoiceXML applications. I'm convinced that the lightweight container architecture is the right way to go for the majority of voice applications. However, the way the UI tier uses it needs some tweaks compared to what we're used to seeing with HTML-based applications.
All of the UI tier libraries and frameworks out there kind of assume you're making HTML sites. Okay, that's not really true. They'll work with any presentation layer. But let's be really honest: a lot of the convenience classes that make these libraries a real pleasure to work with assume HTML. Maybe it's not a big leap to have a cell phone talking WAP, but the fundamental architecture is the same: the user has a computer or device with a browser running on it, which requests pages from a server.
Not so with VoiceXML platforms. The standard platform architecture is to have 100 or so browser instances sitting on a server. The browser is not located on the client, because the client is a POT (Plain Old Telephone). The caller dials up a computer with a Dialogic or NMS card, or a SIP gateway, and the call is routed to one of the browser instances running on a server. This may be the same server that's running the app server, or it may just be on the same LAN.
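To make the architecture concrete, here is a minimal sketch of the kind of VoiceXML 2.0 document one of those server-side browser instances would execute. The page names (`menu.grxml`, `sales.vxml`, `support.vxml`) are hypothetical, and a real application would need a proper grammar behind them:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="mainMenu">
    <field name="choice">
      <prompt>Say sales or support.</prompt>
      <!-- Hypothetical SRGS grammar defining the words "sales" and "support" -->
      <grammar type="application/srgs+xml" src="menu.grxml"/>
      <filled>
        <if cond="choice == 'sales'">
          <goto next="sales.vxml"/>
        <else/>
          <goto next="support.vxml"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

The browser instance on the server plays the prompt to the caller over the phone line, runs the recognizer against the grammar, and then fetches the next page, exactly the request/response cycle an HTML browser performs, just without the browser living on the client.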
So, what difference does this make to my J2EE architecture? Well, now the browser cache becomes much, much more important. If I have a few thousand phone calls coming in every hour to those 100 browsers, and they're all running the same application, the absolute worst thing I can do is a lot of dynamic page rendering. That would mean rendering the JSP into VoiceXML, and then parsing that VoiceXML into executable form, on every single request.
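One way to avoid re-rendering the same page for every call is to memoize the rendered output on the server side. This is a hypothetical sketch, not anything from Rod Johnson's books: `VoiceXmlPageCache` and its renderer function are names I've made up, and the renderer stands in for whatever JSP or template engine actually produces the VoiceXML.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch: cache rendered VoiceXML documents by page key, so
 * identical requests from the farm of browser instances are served from
 * memory instead of re-running the template engine on every call.
 */
public class VoiceXmlPageCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> renderer; // wraps the real JSP/template engine

    public VoiceXmlPageCache(Function<String, String> renderer) {
        this.renderer = renderer;
    }

    public String getPage(String pageKey) {
        // Render at most once per key; subsequent requests hit the cache.
        return cache.computeIfAbsent(pageKey, renderer);
    }

    public static void main(String[] args) {
        final int[] renders = {0};
        VoiceXmlPageCache pages = new VoiceXmlPageCache(key -> {
            renders[0]++; // stand-in for an expensive JSP render
            return "<vxml version=\"2.0\"><form id=\"" + key + "\"/></vxml>";
        });
        String first = pages.getPage("mainMenu");
        String second = pages.getPage("mainMenu"); // served from cache
        System.out.println(renders[0] + " render(s), identical=" + first.equals(second));
    }
}
```

Of course this only works for pages that are truly static per deployment; anything personalized per caller still has to be rendered dynamically, which is exactly why the split between static and dynamic pages matters so much more here than in a typical HTML site.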
Think of a kiosk. Now think of 100 dumb-terminal kiosks sharing the same computer. With really high traffic.
Combine this with the fact that VoiceXML (for better or worse) actually has a temporal component and a "form interpretation algorithm", which specifies the order of execution for