
Transactions are your friend

Posted by marklittle on March 2, 2006 at 1:25 PM PST

Having been doing this for 20 years, I can say that transaction processing has got to be one of the most difficult middleware components to persuade developers to use. There are several reasons for this, but probably the most important is that, unlike something like caching or security, you don't see the benefits transactions bring until there's a failure. Unfortunately (or fortunately, depending on your perspective), failures don't happen that often, so actually demonstrating the utility of a transaction processing system is even more difficult. Furthermore, unlike security, you're unlikely to be refused access to a resource just because you're not using it within the scope of a transaction.

However, thanks to the inefficiencies of natural selection (humanity is not perfect yet) and the beauty of entropy (all things decay), failures will always happen and so transactions will always be needed: all we can ever hope to do as technology advances is reduce the probability of a failure occurring. Therefore, as a developer you've got to weigh up the likelihood of a failure (any failure) happening and corrupting your application versus the perceived cost (commercial, plus the overhead of restoring the system to good health) of using transactions. If you want to take the risk, then don't use transactions; but likewise, don't forget that they do exist to help you.

Now you may think that replication of resources/objects/servers could be used in place of transactions, but that isn't the case. Replication and transactions can be complementary, but they're not a replacement for one another. Transactions guarantee consistency even in the presence of complete system failures, but you won't necessarily get forward progress. Replication, on the other hand, offers (though cannot guarantee) forward progress in the presence of a finite number of failures. So I would argue that if you are replicating updatable data, then you should definitely consider transactions as well.

Which leads us to another problem with selling the idea of transactions, one that I've blogged on before: the notion that they slow your application down. Combined with the first problem I mentioned, I've often heard this referred to as the "I get nothing for something" syndrome: you get the overhead of using transactions, but you just don't see the benefits they bring (which, looked at from some perspectives, is an entirely logical conclusion to draw). Of course transactions slow down your application: I've discussed this before, but if you just think about what they have to do in order to guarantee consistency in the presence of failures, it makes sense: there really is no such thing as a free meal!

Transaction processing systems have been the backbone of significant areas of computing infrastructure for decades. A lot of these places (finance, telcos, etc.) accepted the trade-off between performance and reliability because, for them, there was really no choice to be made: if they corrupt data (e.g., lose updates to stock trades), they lose business. Now obviously that's not the case everywhere, and there are applications where failures really don't matter (e.g., stateless interactions). But in general you need to think about the effect of failures on your applications, and although transactions are just one of the techniques you could use to help tolerate them, with the JTA they are a core component of J2EE. So rather than come up with ad hoc solutions, it may be better to leverage tried-and-tested techniques and the implementations that go with them.
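To make that concrete, here's a minimal sketch of demarcating a JTA transaction around updates to two resources. It assumes the code runs in a J2EE container that binds UserTransaction at the standard JNDI name; the DataSource names (jdbc/ordersDS, jdbc/auditDS) and table names are hypothetical.

import java.sql.Connection;
import java.sql.PreparedStatement;

import javax.naming.InitialContext;
import javax.sql.DataSource;
import javax.transaction.UserTransaction;

public class TradeRecorder {
    public void recordTrade(String detail) throws Exception {
        InitialContext ctx = new InitialContext();

        // The container supplies the JTA transaction manager behind this lookup.
        UserTransaction utx =
                (UserTransaction) ctx.lookup("java:comp/UserTransaction");

        // Two container-managed, XA-capable DataSources (names are made up).
        DataSource orders = (DataSource) ctx.lookup("java:comp/env/jdbc/ordersDS");
        DataSource audit = (DataSource) ctx.lookup("java:comp/env/jdbc/auditDS");

        utx.begin();
        Connection c1 = null;
        Connection c2 = null;
        try {
            c1 = orders.getConnection();
            c2 = audit.getConnection();

            PreparedStatement ps1 =
                    c1.prepareStatement("INSERT INTO orders (detail) VALUES (?)");
            ps1.setString(1, detail);
            ps1.executeUpdate();
            ps1.close();

            PreparedStatement ps2 =
                    c2.prepareStatement("INSERT INTO audit_log (entry) VALUES (?)");
            ps2.setString(1, detail);
            ps2.executeUpdate();
            ps2.close();

            utx.commit();   // both inserts become durable together, or neither does
        } catch (Exception e) {
            utx.rollback(); // undo the work at both resources
            throw e;
        } finally {
            if (c1 != null) c1.close();
            if (c2 != null) c2.close();
        }
    }
}

If the process dies between the two inserts, or part-way through commit, the transaction manager's recovery processing is what brings both databases back to a consistent state; that bookkeeping is exactly what you're paying for.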

Following on from this is the oft-heard statement: "everything I do is within a single VM, so I don't need transactions". This is definitely an education issue, where, in the minds of many people, distributed transactions have become synonymous with transactions in general. Most people can see that if they're accessing resources/participants across physically distinct machines or processes, there's a need for transactions to coordinate updates to state. In a local (single-VM) environment, the need is often overlooked. But it is still there: in many cases, even within the same VM, applications use and modify data from multiple different sources, and in that case you need the benefits that transaction processing provides. Distribution just makes it more obvious that independent failures can cause problems. But they're still there in the local case; you may just have to look a little harder.
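As a deliberately naive illustration of that local case (all names here are hypothetical), consider a method that updates an in-memory map and then appends to an on-disk ledger: a failure between the two writes leaves the two resources disagreeing, even though no network is involved.

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class LocalUpdate {
    private final Map<String, Integer> balances = new HashMap<String, Integer>();

    public void debit(String account, int amount, File ledger) throws IOException {
        // 1. update the in-memory view
        Integer current = balances.get(account);
        balances.put(account, (current == null ? 0 : current.intValue()) - amount);

        // <-- a crash here leaves the map changed but the ledger untouched:
        //     two data sources in one VM, now inconsistent.

        // 2. append to the on-disk ledger
        FileWriter out = new FileWriter(ledger, true);
        try {
            out.write(account + " -" + amount + "\n");
        } finally {
            out.close();
        }
    }
}

Putting both updates behind transactional participants and wrapping them in a single transaction closes that window in exactly the same way it does in the distributed case.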

Plus, transactions get a lot of bad press for overheads that don't actually exist in all cases. All commercial-grade implementations support a number of significant optimisations to improve performance in the 80-20 case. For example, if there's only a single participant in a transaction, then the notorious two-phase commit protocol goes away and we run with a single phase. Then there's the read-only optimisation: if a participant didn't modify any data, then it can drop out of the transaction "early". Plus, there are some implementations that have evolved over decades to offer many other performance features, such as lightweight coordinators, nested transactions and non-durable participants. The intention (mirrored by Microsoft's work with Indigo transactions) is to make transaction implementations so lightweight and low-overhead that they become a natural part of the infrastructure (and in the case of Microsoft, that'll mean in the operating system). We've already seen them moving into hardware, so this makes a lot of sense too.
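From the resource's side, the read-only optimisation is visible in the XA interface itself: a participant that hasn't written anything can vote read-only during the first phase and drop out of the second. The sketch below assumes a hypothetical participant class that tracks whether its branch modified any data; the single-participant case shows up in the same API, where the coordinator skips prepare() altogether and calls commit(xid, true) with the one-phase flag set.

import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

public abstract class ReadMostlyParticipant implements XAResource {

    // Set to true whenever this branch modifies data (hypothetical bookkeeping).
    private volatile boolean dirty = false;

    protected void markDirty() {
        dirty = true;
    }

    public int prepare(Xid xid) throws XAException {
        if (!dirty) {
            // Nothing to make durable: vote read-only, so the coordinator can
            // drop this participant and never call commit() or rollback() on it.
            return XAResource.XA_RDONLY;
        }
        // Otherwise: write a prepare record, keep locks, and wait for phase two.
        return XAResource.XA_OK;
    }

    // commit(), rollback(), recover() and the rest are elided here;
    // a real XAResource implementation must provide them all.
}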

As I've said before, think of transactions like an insurance policy: compared to how much time, money and effort you may lose by not using them, the cost of using them may be well worth it. Obviously there's a tipping point on any graph of the cost of using transactions versus the advantage they bring, and that point is going to be very dependent on your application. But consider it nonetheless.
