John Reynolds's Blog

Databases Archives

It's time for RDBMS Change Notification Services

Posted by johnreynolds on June 09, 2004 at 01:06 PM

It is a safe bet that relational databases will be around for a very long time. The relational model is well past middle age (it was introduced by E.F. Codd in 1970), but it exhibits no loss of vigor despite repeated challenges. Upstarts like Object and XML databases have garnered some support, but they haven't yet made a dent in the preeminence of the RDBMS. Layers may be erected between the RDBMS and the application logic, but the roots of most applications will still be in systems that execute SQL commands. As Jonathan Bruce points out in his recent blog, the Java community's interest in SQL (via JDBC) is far from stagnant.

Records managed by an RDBMS are often long lived and are seldom accessed by a single application. Applications written in different languages often manipulate the same records, and this shared access leads to a fundamental problem: How do you know that your data is in sync with the RDBMS?

There is a tendency to deal with shared data problems by building layers on top of the RDBMS. In the Java world, several very good object caches exist (like JBoss Cache and Turbine JCS), and many of these provide replication across clusters to ensure that applications operate on identical data. The problem with these solutions is fundamental: applications can bypass these mechanisms and go directly to the RDBMS. If a change is made directly to the database, the caching mechanism will remain ignorant of it until a change is attempted or the cache is refreshed.

I think that the fix belongs in the RDBMS itself. Change notification needs to become a standard RDBMS feature. The RDBMS should publish changes to registered subscribers.

Most popular RDBMS offerings already provide update triggers that execute commands when changes occur.
If your application needs to be aware of specific changes, then you can add a trigger to the RDBMS. The downside of this approach is that many triggers must be written to deal with very similar concerns (and improperly written triggers can severely impact DB performance).

RDBMS authors should build on the trigger technology to implement publish and subscribe change notification services. Applications (and object caches) would subscribe to change notifications by issuing statements very similar to update triggers with the addition of a callback. When the trigger is fired, the callback is invoked.

The advantage of this approach is that it cannot be bypassed. A change notification service that is integral to the RDBMS will catch any changes to the data (including those caused by stored procedures). Applications will still have to deal with the changes, but at least they will know about them.

The APIs for the RDBMS Change Notification Services should be standardized, and the callback mechanisms should be flexible enough to support multiple languages and protocols (like XML over HTTP). I'm probably naive, but as with SQL itself, agreement on a common standard will benefit all RDBMS vendors. There should be little incentive to implement proprietary APIs.
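
To make the subscribe-with-a-callback idea a little more concrete, here is a minimal sketch in Java. It is only an illustration under loud assumptions: no standard API like this exists, every name in it (ChangeEvent, ChangeListener, ChangeNotificationService, RowCache) is hypothetical, and the "database" is simulated in memory so that the example runs on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical event describing one change made to a table. */
final class ChangeEvent {
    final String table;
    final String operation;   // "INSERT", "UPDATE" or "DELETE"
    final Object primaryKey;

    ChangeEvent(String table, String operation, Object primaryKey) {
        this.table = table;
        this.operation = operation;
        this.primaryKey = primaryKey;
    }
}

/** Hypothetical callback that a subscriber registers with the service. */
interface ChangeListener {
    void onChange(ChangeEvent event);
}

/** In-memory stand-in for the notification service an RDBMS would provide. */
final class ChangeNotificationService {
    private final Map<String, List<ChangeListener>> listenersByTable = new HashMap<>();

    /** Plays the role of a trigger definition that names a callback. */
    void subscribe(String table, ChangeListener listener) {
        listenersByTable.computeIfAbsent(table, t -> new ArrayList<>()).add(listener);
    }

    /** Called by the simulated engine after any write, whatever its origin. */
    void publish(ChangeEvent event) {
        List<ChangeListener> listeners = listenersByTable.get(event.table);
        if (listeners == null) {
            return; // nobody subscribed to this table
        }
        for (ChangeListener listener : listeners) {
            listener.onChange(event);
        }
    }
}

/** A trivial object cache that evicts a row when the database reports a change. */
final class RowCache implements ChangeListener {
    private final Map<Object, String> entries = new HashMap<>();

    void put(Object key, String value) { entries.put(key, value); }

    boolean contains(Object key) { return entries.containsKey(key); }

    @Override
    public void onChange(ChangeEvent event) {
        entries.remove(event.primaryKey);
    }
}

class ChangeNotificationDemo {
    public static void main(String[] args) {
        ChangeNotificationService service = new ChangeNotificationService();
        RowCache cache = new RowCache();
        cache.put(42, "row 42 as last read");
        service.subscribe("LOAN", cache);

        // Simulates another application updating the row directly in the database.
        service.publish(new ChangeEvent("LOAN", "UPDATE", 42));

        System.out.println("Still cached: " + cache.contains(42)); // prints "Still cached: false"
    }
}
```

The point of the sketch is where publish() lives: because it sits inside the (simulated) database rather than in a layer above it, a write that bypasses the cache still reaches the cache's callback.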
Update: Thanks to a reader for a link to Oracle Streams. If Oracle integrates this functionality with Toplink, then Java developers will have something very close to what I envision. Oracle's efforts are a very good start, but we need standard APIs supported by many RDBMS offerings.

Data access sanity check

Posted by johnreynolds on April 26, 2004 at 06:24 AM

One of the first tasks that I performed for my employer was to diagnose and resolve a several-minutes-long CPU spike on the database server for one of our J2EE applications. All of our servers were well monitored, and without much ado we were able to pin the spike on a specific use case.

As it turns out, the culprit was a use case for exporting Loan Application records from our company to a client. To accomplish this task several thousand Entity Beans were instantiated, data was extracted from the objects, and a comma-delimited output file was generated. Adding insult to injury, on top of the instantiation of thousands of EJBs, the collection exceeded our cache size, resulting in the passivation and activation of beans (to and from the database) as we traversed the collection.

In retrospect it's hard to fathom how this implementation strategy got past a design review, but in the heat of battle all sorts of less-than-optimal solutions creep into most products.

The first tack that I took to resolve this issue was to pursue a JDBC rather than an Entity Bean approach (inspired by the Fast-Lane Reader pattern), and this resulted in a substantial performance gain (the use case executed in a third of the original time). Fortunately, my colleagues are way more SQL savvy than I am, and they suggested pursuing a stored procedure approach. The stored-procedure implementation of the use case executes in about 1/100th of the time required for the original EJB-centric solution.

This is one of those great "war stories" that can be used to make all sorts of points.
It speaks to inadequate design reviews, the need for system monitoring, the misuse of Entity Beans, the value of teams with diverse skill sets, and numerous other "soap box issues" that I've been known to pontificate about (a former co-worker coined the term "johntification" to refer to my frequent monologues). Today I would like to use my "Entity Bean Based Loan Export" war story to talk about optimizing data access, and how we really ought to code in a way that enables it.

The goal of our Loan Export use case was well defined: Produce an output file that contains data from Loans that meet specified criteria. Note that the use case concerns Loans; not Java objects; not database records. This is a key point to remember.

Depending on the current state of the system, the data that constitutes the "Loan" could be on a hard disk, in the cache of the database system, or in the application's memory (real or virtual). The best strategy for collecting data can vary wildly based on where the data currently resides. Using my war story as an example: if Entity Beans are already instantiated for all of the Loans to be exported, then producing an output file from the Entity Beans will generate no additional load on our database server and should be pretty zippy. If the Loan data is still exclusively on disk, then the stored procedure approach is the way to go (assuming that I'm using a single RDBMS).

I am not sure how to clearly express the point that I want to make, but it has something to do with optimizing for the present and planning for the future. One solution may be optimal if all objects can reside in memory, while another may be optimal if the number of objects exceeds some threshold. We need to code in a manner that allows an "optimized" solution to be injected without disrupting or confusing our intent.

These thoughts gel with the goals of SQL query optimization. In some database systems, the SQL that you submit is not the SQL that is executed.
Behind-the-scenes query optimizations are applied by the database engine, resulting in better overall performance. In the SQL research world, the goal is along these lines: The query that you specified is sufficient for the system to determine the records that you want to retrieve. The procedure by which those records are obtained is an implementation detail that you need not worry about.

Wouldn't it be delightful to write Java data access code along similar lines? Consider a "collection populator" service. Specify the type of objects the collection should hold, specify the criteria that the objects within the collection must meet, and let the service worry about the details of populating the collection.

Of course there's no such thing as a free lunch: You are going to have to write all of the methods of your "collection populator" service. The advantage will come later if your data sources change or you need to develop a more efficient implementation.
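
Here is one way the "collection populator" idea might look in Java. Treat it as a rough sketch rather than a recommended design: the Loan fields, LoanCriteria, CollectionPopulator, InMemoryLoanPopulator and LoanExporter are all names invented for this illustration, and the only strategy actually implemented is a filter over loans that are already in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical domain object; the fields are invented for the example. */
final class Loan {
    final int id;
    final String client;
    final long amount; // whole currency units

    Loan(int id, String client, long amount) {
        this.id = id;
        this.client = client;
        this.amount = amount;
    }
}

/** Hypothetical criteria: loans for one client at or above a minimum amount. */
final class LoanCriteria {
    final String client;
    final long minAmount;

    LoanCriteria(String client, long minAmount) {
        this.client = client;
        this.minAmount = minAmount;
    }
}

/** The "collection populator": callers state what they want, not how to fetch it. */
interface CollectionPopulator<T, C> {
    List<T> populate(C criteria);
}

/** One possible strategy: filter loans that already reside in memory. */
final class InMemoryLoanPopulator implements CollectionPopulator<Loan, LoanCriteria> {
    private final List<Loan> loans;

    InMemoryLoanPopulator(List<Loan> loans) {
        this.loans = loans;
    }

    @Override
    public List<Loan> populate(LoanCriteria criteria) {
        List<Loan> result = new ArrayList<>();
        for (Loan loan : loans) {
            if (loan.client.equals(criteria.client) && loan.amount >= criteria.minAmount) {
                result.add(loan);
            }
        }
        return result;
    }
}

/** The export use case depends only on the interface, so the strategy can be swapped. */
final class LoanExporter {
    private final CollectionPopulator<Loan, LoanCriteria> populator;

    LoanExporter(CollectionPopulator<Loan, LoanCriteria> populator) {
        this.populator = populator;
    }

    String exportCsv(LoanCriteria criteria) {
        StringBuilder csv = new StringBuilder();
        for (Loan loan : populator.populate(criteria)) {
            csv.append(loan.id).append(',')
               .append(loan.client).append(',')
               .append(loan.amount).append('\n');
        }
        return csv.toString();
    }
}

class LoanExportDemo {
    public static void main(String[] args) {
        List<Loan> loans = Arrays.asList(
                new Loan(1, "Acme", 250000),
                new Loan(2, "Acme", 90000),
                new Loan(3, "Globex", 400000));
        LoanExporter exporter = new LoanExporter(new InMemoryLoanPopulator(loans));
        System.out.print(exporter.exportCsv(new LoanCriteria("Acme", 100000))); // prints "1,Acme,250000"
    }
}
```

LoanExporter never learns where the loans came from. A JDBC-backed or stored-procedure-backed populator could be handed to the same constructor later without touching the export code, which is the kind of "injected optimization" I have in mind.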
Update: "A Brief Introduction to IoC" by Sam Newman provides a good example of using IoC to inject specific DAOs.

Make JDO the "P" in CMP

Posted by johnreynolds on March 12, 2004 at 07:53 AM

Bruce Tate's article "For JDO, the Time Is Now" brings up many good points, but it misses a key concern of mine: Solutions that already incorporate Entity Beans would be painfully expensive to rearchitect as JDO solutions.

The J2EE specification for Entity Bean CMP should dictate the interface for using JDO as a persistence mechanism. This would allow developers to reliably introduce JDO below the Entity Bean level without impacting the overlying layers of their applications.

I am involved with a project that uses Toplink as the CMP mechanism for Entity Beans and have experienced the pain of incorporating a non-standard persistence mechanism. Bruce is not completely candid when he states that TopLink can be used as a snap-in replacement for EJB CMP; it's more like a hack-in or pound-in replacement. The interface between the EJB container and an underlying persistence mechanism is not a part of the J2EE standard, so TopLink's vendor (Oracle, formerly WebGain) has to craft unique versions every time the container (in our case WebLogic) changes. At times we have been unable to apply WLS Service Packs because they would break Toplink. This is a real maintenance and upgrade hassle.

A standard mechanism for using JDO as the "P" in CMP would avoid versioning problems between EJB containers and JDO implementations, and it would also make it possible to truly snap in competing JDO implementations (competition is a good thing).

On another note, Bruce lobbies for the recognition of SQL in the JDO standard, and I heartily agree. EJBQL and JDOQL just aren't up to snuff, and from my perspective they just aren't worth bothering with. SQL is not as "Java friendly" as the new query languages, but it is much more comprehensive and widely understood.
It seems like a distraction to pursue EJBQL, JDOQL and JDBC rather than focusing on tools to help developers write good SQL.

All in all, I am delighted at the possibility of a JDO resurgence, but I hope that it expands to embrace fixing EJB CMP rather than just replacing it.

Update 1: The problems that we have experienced due to the tight coupling between WebLogic Server and Toplink CMP have crossed the pain threshold to the point where we have committed to fixing the problem.
After careful analysis, we have determined that EJB CMP 2.0 cannot meet our needs (primarily due to the limitations of EJBQL) and the best option for us is to eliminate Entity Beans and use Toplink for Java directly from our session beans. This looks like a rather straightforward conversion for us. I'll keep you posted.
Update 2 (July 2004):
The EJB 3.0 spec's radical overhaul of CMP has so confused us that we've slammed on the brakes. We're going to maintain our current code for the next few months and wait for the dust to settle.