Monday, May 19, 2008

Premature optimization is the root of all evil

Every once in a while I come across a code review where there is a small inefficiency in the code that could easily be corrected, but where the author invokes the ghost of "premature optimization" to justify keeping it as is.

Today's example (Java)...

Map map;
...
for ( ; ; ) {
    ...
    if (!map.containsKey(key))   // first hash lookup
        continue;
    Object x = map.get(key);     // second hash lookup
    ...
}

This does the hash lookup twice - first when checking whether it contains the key, and second when retrieving the value. In the case where objects stored in the map are never null, the time spent in this code can be cut in two by just doing this:

Map map;
...
for ( ; ; ) {
    ...
    Object x = map.get(key);     // single hash lookup
    if (x == null)
        continue;
    ...
}
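
(A side note: even if null actually were a legal value in the map, the single lookup could still be kept on the common path. A minimal sketch - falling back to containsKey() only when get() returns null:)

Map map;
...
for ( ; ; ) {
    ...
    Object x = map.get(key);                  // one lookup in the common case
    if (x == null && !map.containsKey(key))   // second lookup only for stored nulls
        continue;
    // x may still legitimately be null here
    ...
}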

Is this a premature optimization?

What about this (C++):

bool process(string &in, string *out) {
    *out = in;   // copies the input string even when it is overwritten below
    string x, y;
    if (!subprocess(in, &x, &y))
        return false;
    *out = x + '/' + y;
    return true;
}

"*out = in;" being something on the order of several hundred of instructions (http://1-800-magic.blogspot.com/2008/04/stl-strings.html), the same code can be rewritten, without the loss of readability, as follows:

bool process(string &in, string *out) {
    string x, y;
    if (!subprocess(in, &x, &y)) {
        *out = in;   // the copy now happens only on the failure path
        return false;
    }
    *out = x + '/' + y;
    return true;
}

Would this be a premature optimization as well?

The term "premature optimization", originally applied by Knuth and Hoare to design trade-offs made in pursuit of clock-level efficiency, is most often misapplied to mean that how you write the code does not matter - we'll figure out what's slow and optimize it later.

There are a few problems with this approach. First, as Hoare himself writes...

"I've always thought this quote has all too often led software designers into serious mistakes because it has been applied to a different problem domain to what was intended. The full version of the quote is "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." and I agree with this. Its usually not worth spending a lot of time micro-optimizing code before its obvious where the performance bottlenecks are. But, conversely, when designing software at a system level, performance issues should always be considered from the beginning. A good software developer will do this automatically, having developed a feel for where performance issues will cause problems. An inexperienced developer will not bother, misguidedly believing that a bit of fine tuning at a later stage will fix any problems."

(Emphasis mine.)

There are certain systemic decisions that a designer makes before writing the code (do I use STL? do I use Java?) that are extremely difficult to undo once the code is complete: the performance, memory, and code-size implications may end up spread across the entire system - perhaps across multiple layers of the software stack - and become very hard to localize and "fix".

What's worse, the "fix" usually is an even worse hack.

Consider a developer who applies Java to the wrong application domain (http://1-800-magic.blogspot.com/2007/11/domain-languages.html) - e.g., media processing - and ends up with a very slow system.

This developer might be compelled to rewrite parts of it in C++ and sprinkle native code through various parts of the otherwise managed system - wherever profiling has shown the most performance is to be gained.

Of course this makes the design more complicated and the code less readable - now we have some parts of the system implemented in one language, and other, seemingly random, parts in another. This flies right in the face of the originally stated goal of "cleanliness" of the design. It also makes debugging and correctness verification much harder.
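
To make the fragmentation concrete, here is a minimal sketch of what such "sprinkling" looks like on the Java side (the class and library names are hypothetical): the hot routine is declared native and implemented in a separate C++ library, while everything around it stays managed.

// Hypothetical media-processing hot spot pushed down into native code.
public class FrameFilter {
    static {
        // The C++ implementation is built and shipped separately
        // from the rest of the (managed) system.
        System.loadLibrary("framefilter");
    }

    // Implemented in C++ via JNI; nothing in the signature tells the caller
    // that stepping into this method now means crossing a language boundary.
    public static native void blur(int[] pixels, int width, int height);
}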

The second point I want to make is about implementation cleanliness. While one might argue that explicitly checking whether the map contains the key expresses the intent of the developer more clearly, that is really in the eye of the beholder.

When I look at the original Java code snippet, the first thing that comes to my mind is: "Oh, (s)he must be storing nulls as legitimate values, so the check needs to be done up front. But wait, (s)he then dereferences the value without checking for null. A bug, perhaps?.."

And when I look at the C++ snippet, I think that returning "false" rather than "true" is perhaps the common case, since the code is clearly optimized for it (it does more work than necessary in the "true" case, but that probably does not matter because we never follow that code path in reality).

So, as you can see, opinions on what's readable - explicit check and upfront initialization vs. null check and error-case initialization - may diverge. For me, doing the extra work makes the code less readable: I assume it was done for a reason and start searching for that reason. Another developer might like the more explicit but redundant code better.

Since we don't know who will be reading our code, or what his or her preferences might be, we should not trade the very real watt-hours burned by the CPU executing suboptimal software for ephemeral aesthetics.

The third reason for choosing the more efficient programming style is that it will make one a better developer. To reiterate the Hoare quote, albeit a bit out of context: "A good software developer will do this automatically, having developed a feel for where performance issues will cause problems."

The code above is worth noticing and correcting simply to make sure that the developer trains himself or herself to avoid writing the same code in the future (and perhaps, this time, in the critical path). After a correction or two, the engineer will start producing efficient implementations automatically.

I consider this the point of style that separates the men from the boys (and the women from the girls, to be politically correct). If you look at code written by great developers, it is efficient without sacrificing readability and non-redundant without being obscure. It implements the algorithm - even in pseudocode - in a way that makes it completely clear what the author wants to do, yet does not sacrifice a single clock cycle of CPU time.

Take a look at the code in CLR (http://www.amazon.com/Introduction-Algorithms-Thomas-H-Cormen/dp/0262032937), Sedgewick (http://www.amazon.com/Bundle-Algorithms-C%2B%2B-Parts-1-5/dp/020172684X), or David Hanson's "C Interfaces and Implementations" (http://www.amazon.com/Interfaces-Implementations-Techniques-Addison-Wesley-Professional/dp/0201498413). Even when written as pseudocode, it is amazingly efficient pseudocode - and it achieves this efficiency without sacrificing readability.

Learn to do this, and you're going to be a great developer. Take up the religion where writing effective code by default is a "premature optimization" - and consign yourself to a career, as Joel puts it, of copying and pasting a whole bunch of Java code :-).

More on premature optimization here: http://www.acm.org/ubiquity/views/v7i24_fallacy.html.

2 comments:

Илья Казначеев said...

The JVM might defeat most such optimizations, because it is going to optimize anyway.

I didn't try it, but I wouldn't bet my salary that the second snippet would be twice as fast.

Anyway, I think containsKey() before get() is very misguided semantically, and you should use the second example because it's shorter and more straightforward.

My approach is: "Never bother about the computational complexity of atomic operations; care only about O(x), x >> N. Or when in a profiled tight loop."
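
(A quick and dirty way to check that hunch - a sketch only, with arbitrary class name and sizes; such micro-benchmarks are notoriously sensitive to JIT warm-up and dead-code elimination:)

import java.util.HashMap;
import java.util.Map;

public class DoubleLookupBench {
    public static void main(String[] args) {
        Map<Integer, Integer> map = new HashMap<Integer, Integer>();
        for (int i = 0; i < 100000; i++)
            map.put(i, i);

        // A few rounds so the JIT has a chance to compile both loops.
        for (int round = 0; round < 5; round++) {
            long sum = 0;

            long t0 = System.nanoTime();
            for (int i = 0; i < 100000; i++) {
                if (!map.containsKey(i))
                    continue;
                sum += map.get(i);        // two hash lookups
            }

            long t1 = System.nanoTime();
            for (int i = 0; i < 100000; i++) {
                Integer x = map.get(i);   // one hash lookup
                if (x == null)
                    continue;
                sum += x;
            }
            long t2 = System.nanoTime();

            System.out.println("round " + round
                    + ": two lookups " + (t1 - t0) / 1000 + " us,"
                    + " one lookup " + (t2 - t1) / 1000 + " us"
                    + " (sum " + sum + ")");
        }
    }
}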

Илья Казначеев said...

Also: double lookup will introduce a race condition.
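
(For completeness: the race the commenter has in mind shows up when the map is shared between threads and modified concurrently - with containsKey() followed by get(), the key can be removed between the two calls. A minimal sketch, assuming a ConcurrentHashMap; the class name is illustrative:)

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CheckThenActRace {
    static final ConcurrentMap<String, String> map =
            new ConcurrentHashMap<String, String>();

    public static void main(String[] args) {
        map.put("key", "value");

        // A writer thread keeps removing and re-inserting the key.
        Thread writer = new Thread(new Runnable() {
            public void run() {
                while (true) {
                    map.remove("key");
                    map.put("key", "value");
                }
            }
        });
        writer.setDaemon(true);
        writer.start();

        for (int i = 0; i < 100000000; i++) {
            if (map.containsKey("key")) {      // check...
                String x = map.get("key");     // ...then act: the key can vanish
                if (x == null) {               // between the two calls
                    System.out.println("race observed on iteration " + i);
                    return;
                }
            }
        }
        System.out.println("no race observed this run (it is timing-dependent)");
    }
}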