
Oh, go ahead -- prematurely optimize

Posted by sdo on February 25, 2008 at 3:04 PM PST

Recently, I've been reading an article entitled
The Fallacy of Premature Optimization by Randall Hyde. I urge everyone to go read the full article, but I can't help
summarizing some of it here -- it meshes so well with some of my conversations with developers
over the past few years.



Most people can quote the line "Premature optimization is the root of all evil" (which was
popularized by Donald Knuth, but originally comes from Tony Hoare). Unfortunately, I (and
apparently Mr. Hyde) come across too many developers who have taken this to mean that they
don't have to care about the performance of their code at all, or at least not until the code
is completed. This is just wrong.



To begin, the complete quote is actually

We should forget about small efficiencies, say about 97% of the time: premature optimization
is the root of all evil.

I agree with the basic premise of what this says, and also with everything it does not say.
In particular, this quote is abused in three ways.



First, it is only talking about small efficiencies. If you're designing a multi-tier app
that uses the network a lot, you want to pay attention to the number of network calls you
make and the amount of data involved in them. Network calls are a large inefficiency. And
not to pick on network calls -- experienced developers know which things are inefficient,
and know to program them carefully from the start.
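
To make that concrete, here is a minimal sketch of the chatty-vs-batched difference. The
OrderService interface, the Order class, and their methods are hypothetical, invented
purely for illustration:

import java.util.ArrayList;
import java.util.List;

class Order { /* ... */ }

// Hypothetical remote interface -- these names are invented for illustration.
interface OrderService {
    Order fetchOrder(int id);                     // one network round trip per call
    List<Order> fetchOrders(List<Integer> ids);   // one round trip for the whole batch
}

class Client {
    // Chatty: N round trips for N orders -- the large inefficiency.
    List<Order> chatty(OrderService svc, List<Integer> ids) {
        List<Order> result = new ArrayList<Order>();
        for (Integer id : ids)
            result.add(svc.fetchOrder(id));
        return result;
    }

    // Batched: one round trip regardless of N.
    List<Order> batched(OrderService svc, List<Integer> ids) {
        return svc.fetchOrders(ids);
    }
}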



Second, Hoare is saying (and Hyde and I agree) that you can safely ignore the small
inefficiencies 97% of the time. That means the other 3% of the time -- roughly 1 out of
every 33 lines of code you write -- you should be paying attention to them.



Third, and only somewhat relatedly, this quote feeds the perception that since an
application spends 80% of its time in just 20% of its code, we don't have to worry about
our code's performance until we find out that our code is part of that hot 20%.



I'll present one example from GlassFish to highlight those last two points. One day, we
discovered that a particular GlassFish test case was bottlenecked on calls to Vector.size --
in particular, because of loops like this:
Vector v;
for (int i = 0; i < v.size(); i++)
     process(v.get(i));

This is a suboptimal way to process a vector, and one of the 3% of cases where you need to pay
attention. The key reason is the synchronization around Vector, which
turns out to be quite expensive when this loop is the hot loop in your program. I know,
you've been told that uncontended access to a synchronized block is almost free, but that's
not quite true -- crossing a synchronization boundary means that the JVM must flush all
instance variables presently held in registers to main memory. The synchronization boundary
also prevents the JVM from performing certain optimizations, because it limits how the JVM
can re-order code. So we got a big performance boost by re-writing this as
ArrayList v;
for (int i = 0, j = v.size(); i < j; i++)
     process(v.get(i));
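
If you want to see the effect on your own JVM, here is a rough timing sketch -- a plain
System.nanoTime() harness, not a rigorous benchmark, and on a JVM that can elide
uncontended locks the gap may shrink considerably:

import java.util.ArrayList;
import java.util.Vector;

public class LoopCost {
    static long sink; // accumulate results so the JIT can't discard the loops

    public static void main(String[] args) {
        Vector<Integer> vec = new Vector<Integer>();
        ArrayList<Integer> list = new ArrayList<Integer>();
        for (int i = 0; i < 100000; i++) {
            vec.add(i);
            list.add(i);
        }

        // Run several reps: the first few warm up the JIT.
        for (int rep = 0; rep < 10; rep++) {
            long t0 = System.nanoTime();
            for (int i = 0; i < vec.size(); i++)      // size() and get() are
                sink += vec.get(i);                   // both synchronized
            long t1 = System.nanoTime();
            for (int i = 0, j = list.size(); i < j; i++)
                sink += list.get(i);                  // no synchronization
            long t2 = System.nanoTime();
            System.out.println("Vector: " + (t1 - t0) / 1000
                    + " us, ArrayList: " + (t2 - t1) / 1000 + " us");
        }
    }
}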

Perhaps you're thinking that we needed to use a Vector because of threading issues, but
look at that first loop again: it is not thread-safe. If this code is accessed by multiple
threads, then it's buggy in both cases.
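
To see why, note that each Vector method is individually synchronized, but the pair of
calls is not atomic. A sketch of the race (continuing the snippet above), and one
conventional fix that holds the vector's own lock across the whole traversal:

// Thread A runs the loop; thread B removes elements concurrently.
for (int i = 0; i < v.size(); i++)   // lock acquired and released here...
    process(v.get(i));               // ...and again here; if B shrank the
                                     // vector in between, this can throw
                                     // ArrayIndexOutOfBoundsException

// One conventional fix: synchronize on the vector for the whole loop.
synchronized (v) {
    for (int i = 0, j = v.size(); i < j; i++)
        process(v.get(i));
}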



What about that 80/20 rule? It's true that we found this case because it was consuming a lot
(not 80%, but still a lot) of time in our program. [Which also means that fixing this case
is tardy optimization, but there it is.]
But the problem is that there wasn't just
one loop written like this in the code; there were (and still are...sigh) hundreds. We
fixed the few that were the worst offenders, but there are still many, many places in the
code where this construct lives on. It's considered "too hard" to go change all the places
where this occurs (NetBeans could refactor it all pretty quickly, but there's a
risk that subtle differences in the loops would mean they would need to be refactored
differently).



When we addressed performance in GlassFish V2 in order to get our excellent SPECjAppServer results,
we fixed a lot of little things like this, because we spent 80% of our time in about 50% of
our code. It's what I call performance death by a thousand cuts: it's great when you can
find a simple, CPU-intensive set of code to optimize. But it's even better if developers
pay some attention to writing good, performant code at the outset, so that you don't have to
track down hundreds of small things to fix.



Hyde's full article
has some excellent references for further reading, as well as other important points about
why, in fact, paying attention to performance as you're developing is a necessary part of
coding.


Comments

@prunge, there is only one rule: use the best algorithms, know how often your code is actually called, and benchmark YOUR program (not a snippet).
And please, reconsider what you're used to doing:

  1. "Try to use the collection interface rather than implementation wherever possible" -- stupid; don't forget about complexity. Replacing an ArrayList with a LinkedList in an indexed loop is far from free.
  2. Using for-each everywhere is stupid too; an indexed loop is more efficient than an iterator on an ArrayList (ArrayList implements RandomAccess). Furthermore, for-each on a collection allocates an Iterator, but nobody seems to care about that.
Rémi
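
Rémi's complexity point is easy to demonstrate: LinkedList.get() walks the list from one
end on every call, so an indexed loop over it does roughly O(n^2) work while for-each stays
O(n). A quick sketch (timings will vary; the shape of the difference won't):

import java.util.LinkedList;
import java.util.List;

public class IndexedTraversal {
    public static void main(String[] args) {
        List<Integer> list = new LinkedList<Integer>();
        for (int i = 0; i < 20000; i++)
            list.add(i);

        long t0 = System.nanoTime();
        long a = 0;
        for (int i = 0; i < list.size(); i++)
            a += list.get(i);            // each get() walks nodes: ~O(n^2) total
        long t1 = System.nanoTime();

        long b = 0;
        for (int x : list)               // iterator traversal: O(n) total
            b += x;
        long t2 = System.nanoTime();

        System.out.println("indexed: " + (t1 - t0) / 1000000 + " ms, "
                + "for-each: " + (t2 - t1) / 1000000 + " ms");
    }
}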

I think onyx makes an important point. It's only when an optimization hurts the complexity or clarity of the code that it should be avoided. If the optimized version of the code is just as clear, then you should start with it in the first place.

Hi Markus,

There is a cost difference irrespective of monitor contention. The additional synchronization cost naturally varies across platforms, runtimes (and benchmarks), but on my PowerBook G4 the overhead is approximately 0.05 microseconds.

William

A couple of things I always do.

- Try to use the collection interface rather than implementation wherever possible when declaring references. e.g. use List instead of ArrayList or Vector. It's easier to swap implementations later that way.

- Use the enhanced for loops that were introduced in 1.5 if I only need to reference the current element in the loop. It means there is one less variable in the code (either an Iterator or an index) I have to think about, therefore reducing complexity. It's probably not the most efficient way of iterating an ArrayList, but there are very few loops that I write where the loop itself is causing performance issues -- usually what is happening inside the loop is what needs to be optimized. If the loop itself were really causing performance issues, I'd consider changing over to arrays instead.

But I agree that in some cases we do need to think about performance before we write code. Examples are doing linear searches through large lists or doing many fine-grained remote calls over a network.
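
In code, prunge's two habits look something like this minimal sketch (the handle() method
is just a placeholder):

import java.util.ArrayList;
import java.util.List;

public class Habits {
    public static void main(String[] args) {
        // Declare against the interface: only this one line changes if the
        // implementation needs to be swapped later.
        List<String> names = new ArrayList<String>();
        names.add("a");
        names.add("b");

        // Enhanced for loop: no index or Iterator variable to manage.
        for (String name : names)
            handle(name);
    }

    static void handle(String name) { System.out.println(name); }
}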

Which JVM are we discussing? The code seems to play into the hands of the 1.6 optimisations. I ran a quick 10,000-object test on 1.5 and could see the difference in milliseconds; running again (same class) on 1.6, the advantage disappeared. The code was written in such a way that it was just asking for the 1.6 optimizations to kick in.

Your code snippets suggest no other threads can contend your locks; from your blog I suspect this is not how you use your code, i.e. your code (Vector/ArrayList) looks thread-local, though I appreciate that's not what you meant from the accompanying text. 1.6, in theory, could effectively remove the synchronization via escape analysis.

The statement "crossing a synchronization boundary means that the JVM must flush all instance variables" is a bit misleading at best; using the phrase 'memory barrier' is better. My understanding is that just because you use the keyword synchronized does NOT always mean it has to 'flush', i.e. generate a memory barrier, though it could well do so. This is important, as some people have tried to use synchronized to create a memory barrier with no intention of actual synchronization, which is flawed on the latest JVMs, in theory. Note also that in this case synchronization was free (admittedly because the JVM could see it was unnecessary): http://www.ibm.com/developerworks/java/library/j-jtp10185/index.html

I DO agree with moving the v.size() call out of the loop condition (and using ArrayList) and would always recommend that, but I have seen some cases of over-optimizing producing worse performance on later JVMs, so I would not recommend getting obsessed. Coding guidelines that are not well maintained can also produce some awful results.

This all looks like a case of "don't repeat yourself". Of course, you may have more stringent guidelines for how you could refactor, but it sounds like you should look at code that does the same looping and refactor that into a single object. Then, you can make changes to it as much as you want.

I have published some additional comments on my blog regarding the article linked by Scott.
Observations on Premature Optimization
http://blog.jinspired.com/?p=217

@remi

I don't consider using for-each stupid, especially since it removes a variable (the iterator or the index variable) from the code, therefore reducing complexity. Yes, I realize that this will allocate an Iterator every time which is more expensive than using an index, but usually this cost is insignificant compared to the processing done inside the loop.

As for using List vs ArrayList, there are other disadvantages to using ArrayList rather than the interface: you can't use the Collections wrappers any more without changing more than one line in your code. And this becomes even more important when designing a library and exposing these in public methods - should I force the users of my API to use an ArrayList? Or should I let them pass in any kind of list (or Collection or even Iterable)? And in this case I cannot assume it is efficient to do an indexed loop vs iterator. Maybe do an instanceof check with RandomAccess, but this adds more complexity to the code again for questionable gain.

Of course, if profiling after the code is written identifies a loop as degrading performance, then I will change it -- but not prematurely.
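
That instanceof RandomAccess check might look like the following sketch; the Visitor
callback interface and the Lists class here are hypothetical, invented for illustration:

import java.util.List;
import java.util.RandomAccess;

interface Visitor<T> { void visit(T item); }   // hypothetical callback

class Lists {
    // Choose the traversal style based on the list's marker interface.
    static <T> void forEach(List<T> list, Visitor<T> visitor) {
        if (list instanceof RandomAccess) {
            for (int i = 0, n = list.size(); i < n; i++)
                visitor.visit(list.get(i));    // indexed: cheap for ArrayList
        } else {
            for (T item : list)                // iterator: cheap for LinkedList
                visitor.visit(item);
        }
    }
}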

I actually look at this a different way.

First, as a disclaimer, I use ArrayList as the first structure I reach for out of the bag, rather than Vector. Basically because of the (years old) synchronization issue, and the fact that most of our stuff is in Session Beans or equally thread safe areas.

But what I've learned is that whoever was using Vector everywhere probably arrived at that default decision one of two ways.

One, they used the "damn the performance, speeding the system up later is more efficient than hunting down random thread safety bugs" rule, or, most probably, they used Vector solely because "everyone else used vector".

That is, when the first engineer started work on the system, they made, for whatever reason, the decision to use Vector as their Collection hammer.

Later, new folks coming on board saw old code with prevalent and pervasive use of Vector, so they just used it too. "We seem to be using it everywhere else; these are smart guys; they've chosen this for some reason not worth asking about -- I have a deadline to meet" kind of reasoning. And thus it propagated.

Many coders simply know not what or why they do. Mind, not out of malice, or laziness, or being "stupid", but rather just the enormity of most modern software systems and frameworks. But when they don't have a strong, compelling reason to not do something, they defer to authority about the "right way", whether that authority is a Senior colleague, a web post from someone they like, or, most likely, the authority of code written, tested, and vetted by the Powers That Be.

This is why, particularly on large projects, early decisions can have long term consequences as teams grow and "learn" from working code.

I always use

ArrayList a;
[...]
for (int i = a.size() - 1; i >= 0; --i)

unless a different behavior is absolutely required. If the order doesn't matter, I even use a Bag (an unordered list, which isn't included in Java's collections), because removing elements is a lot faster there.
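
Such a Bag can be sketched in a few lines: removal swaps the last element into the hole,
making it O(1) at the cost of ordering. This class is an illustration, not part of the JDK:

import java.util.ArrayList;

// Minimal unordered bag: O(1) removal by swapping in the last element.
class Bag<T> {
    private final ArrayList<T> items = new ArrayList<T>();

    void add(T item) { items.add(item); }

    T removeAt(int index) {
        T removed = items.get(index);
        T last = items.remove(items.size() - 1);  // O(1): no shifting
        if (index < items.size())
            items.set(index, last);               // fill the hole; order is lost
        return removed;
    }

    int size() { return items.size(); }
    T get(int index) { return items.get(index); }
}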

While this kind of thing is often in the realm of micro/low-level optimization, I do it everywhere regardless. In my opinion that's alright, because there isn't much of a readability or typing difference from the alternatives.

However, low-level optimizations generally aren't the best thing to do. Many of 'em can degrade the quality of the code drastically without improving performance all that much.

Whenever I throw the "premature optimization is the root of all evil" quote around I'm referring to low-level optimizations. High-level optimizations, however, are a different thing. If you have experience in the area it would be foolish to use a simpler algorithm instead of a better suited one.

Additionally, high-level optimizations usually offer the biggest performance gains. They're really something to think about beforehand. If the profiler still indicates problems, try another approach. Once that's done, you can eventually start with low-level optimizations -- that is, if they're still required. But don't scatter bit-ops and unrolled for-loops all over the place if there isn't a valid reason for doing so.

Hi, I doubt that in a single-threaded application going from Vector to ArrayList will buy you much. Uncontended synchronization is usually fast on modern VMs. Regards, Markus (java performance blog)

It's nice to see this stated so clearly. I work on scheduling/optimisation algorithms (Operations Research, etc) and we've conducted experiments on various Java language constructs. We're trying to ensure that, when we write our code, it is written the correct way from a performance perspective. The example you give, of caching a Collection's size, is a common one.

Some might consider our resulting code to be "prematurely optimised". In my opinion, knowing the fastest way to access an ArrayList, or avoiding HashSets unless you really need Set semantics, is just a good coding practice.

Both the original article and whartung's comment point to one deficiency: a lack of coding guidelines for the project (and their counterpart, a list of coding anti-patterns showing what the code base is trying to move away from).

It's also worth setting performance goals for different modules. A low-level module may be called so frequently that it has to avoid using Iterator and move even trivial loop invariants out (e.g. v.size() above).

Conversely, on a higher-level module, using Iterator makes sense because it results in code that is easy to understand.