Tuesday, February 26, 2008

Beware the STL my son! The jaws that bite, the claws that catch!

I used to like C++ a lot. It was a very simple, efficient language, and you could always say exactly how the code you're writing mapped to your CPU.

Of course that was at Microsoft and before, and the dialect that I used was more C than C++ - sprinkled with some syntactic sugar like ability to define variables at the site of the first use, and maybe some classes, and an occasional - very infrequent - template. But still C at the very core.

This is not how Google uses C++ though. Here, C++ means STL. Big time. None of this C style is tolerated (there isn't even readability process for C!). And everything - and I mean everything! is a class.

STL was spooking me for a long time. I never used it at Microsoft because things that I worked on - mostly in the core of the OS - had to be insanely fast, and, most importantly, long-running. So I had to care a lot about things like memory fragmentation (see here: http://1-800-magic.blogspot.com/2007/11/guerilla-guide-to-native-memory.html and here: http://1-800-magic.blogspot.com/2007/12/memory-is-not-free-more-on-vista.html).

And it's kinda hard - possible, but you have to bend over backwards - to make STL use custom allocators that were my bread and butter.

Anyway, at Google I had to learn and start using STL. This, unfortunately, made me like C++ (at least STL-ish C++) less and less.

The first problem that jumped at me was the way you learn things. Most of my career I was learning by reading the source.

With C runtime, even with its Microsoft implementation which has way more levels of indirection than necessary, it is easy - you step into the function, and look what's inside. The code that compiler generates maps directly to the source, so the performance is on the surface - it executes as written.

With STL, the file vector - what can be simpler than that?! - is 2424 lines (Visual Studio 2008). algorithm - basically, sorting, searching, and copying - is 5796 lines. This, by the way, is the longest quicksort implementation I've seen in my life :-)! When you step through the code, it's a level of indirection on top of level of indirection on top of level of indirection.

[As a side note - the description of level 62 in old Microsoft - the level where you stop being a junior developer - had, among others, this as a qualification: "Does not create unnecessary abstractions." I think this was written by a smart, smart person!]

Now, consider the following piece of code:

#include <stdio.h>
#include <vector>

using namespace std;

template<class T> class Outer {
template<class T> class Inner {
public:
int i;
Y t;
Inner(int a, T b) {
i = a;
t = b;
}
};

public:
void DoWork(T t) {
vector<Inner<T>> data;
for (int i = 0; i < 10; ++i)
data.push_back(Inner<T>(i, t));

for (vector<Inner<T>>::iterator
it = data.begin(); it != data.end(); ++it)
it->i += 10;
}
};

int main(int argc, char **argv) {
Outer<int> x;
x.DoWork(25);
return 0;
}

[Note, this is just an example. The original code was not it. Yes, I know about STL pair.]

This code compiles and runs just fine on Visual Studio compiler. It does not on GCC, for 3 reasons.

First, GCC requires a space between two closing '>' on a templatized template, so you can't write X<Y<Z>>. It has to be X<Y<Z> >.

Second, the template T in the Inner class declaration on GCC shadows the outer T, so it has to be named something else.

And finally, and the most ridiculous - GCC does not parse vector<Inner<T>>::iterator as a type. You have to prefix it with the keyword typename, like so:

typename vector<Inner<T>>::iterator it =...

If the compiler can't recognize the type as a type, there's something wrong with the code, don't you think?

Update 03/02/2008: Judging from commentrarry, I failed to communicate the intended point. Yes I know that in the three cases above the behavior is required by the standard, and VS2008 is not standards compliant. VS2008 is probably responsible for compiling at least 60% of the code written on this planet, and if the leading compiler is not compliant, this means only one thing - that the standard is too unwieldy. None of this happens to be the case with C or Java - there's only one way to parse them, and they are parsed the same by all tools.

Today, I started looking at the performance of STL code. But this post is getting too big for that, so the details are in the next articles. Here I'll just say that they are not encouraging.

12 comments:

Илья Казначеев said...

I never understood why would one use C++ in the first place.

If you want to use C++ as C with classes, like MFC uses it, then C++ is very very bad C with classes, it lacks everything you expect from OO model (reflection, for example). Also, C++ sucks badly in dynamic linking, which cause you to have all that 50 mfc libraries of 8 kinds all around the place, or, in case of linux, incompatibility between libs compiled with different versions of gcc.

As for STL, it's a very cool idea but it's no more ready for general adoption than, let's say, Haskell.

Thomas said...

You do realize that all those three issues are explicitly mentioned in the standard and that the VC++ compiler does compile it actually means that VC++ is per definition *not* a C++ compiler...?

All those three points are basic C++ std things explicitly mentioned in the std. I think e.g. the double angle-brackets too in fact are one of the first samples Bjarne gives in his C++ the Programming Language Book about templates.

And that VC++ manages to use some stupid non-Koenig lookup algo that manages to "guess" (that's what it's doing) that your typedef is a typename, is *NOT* a good thing...

C++ rules, and all though I can agree on the "do not create unnecessary abstractions" thing in your post, I think that's about the only thing I can agree with you about...

But C++ is "another language" and if you've done a lot of C (which I call the stuff you were doing before STL) then it is a learning curve, that's true. But to say stuff like you're doing now about C++, templates, classes and especially STL which probably is the closest thing you can come to "God's gift to developers" are extremely arrogant, especially when you've spent such a small amount of time (obviously) as you have...

PS!
Done correctly STL is FAR faster than any C version of the same algorithms...

.t

Anonymous said...

Please stop writing posts about C++ and the STL. Thank you.

Sergey Solyanik said...

Of course I'll be writing about STL and C++ - this is MY blog! If you don't like it, you have either of the two choices:

(1) Don't read the blog.
(2) Don't read the posts on this specific topic.

-S

Sorin said...

First, > > it has to have a space, according to the standard. Please see 14.2.3 section for a longer explanation why that is the case.

Second, the vector iterator is a dependent name. The standard requires that dependent names be prefixed with typename for disambiguation purposes. For most part, you don't have to understand all the complex rules about name lookup, you just have to remember to use typename for qualified dependent names, otherwise, the compiler parses them as non types.

Anonymous said...

i'm by far not this much into C or C++ but still i'm interested in your posts regarding this topic!
continue!

Anonymous said...

"Solution Domain Artifacts" -- http://antipattern.wordpress.com/2007/10/23/solution-domain-artifacts/ ?

Anonymous said...

This is becoming embarassing reading about MS and Google.

Good Heavens, I should start charging more as you people are obviously employed whilst you'd fail a basic test in modern C++ programming.

I suggest you revise the lot and delete this from internet, it is, again, pretty embarassing.

Anonymous said...

I was going to write a bigger comment but will just post to agree with the previous comment. This is embarrassing.

Mike G. said...

It's a long quicksort because, if I remember correctly, it isn't a quicksort :)

The standard doesn't require that sort be guaranteed O(n log n) worst case, so quicksort would work.

But I think most STL implementations use introsort instead, since it guarantees the worst case, AND is about as fast as median-of-3 quicksort:
http://www.cs.rpi.edu/~musser/gp/algorithms.html

Anonymous said...

I never understood why would one use C++ in the first place.

Piano, 鋼琴, 琴行
搬屋, 搬屋, 搬屋公司, 搬屋公司, 搬屋
English Course, 英語課程, Grammar Course, Learn English
傢俬, 辦公室傢俬, 寫字樓傢俬, 屏風, Office Furniture
廣告設計, Graphic Design, 廣告設計公司, 平面廣告設計, Graphic Design Company, Logo Design

handbag-mall said...

Bag2u.com - Most popular internet shopping malls for Height High Quality Louis Vuitton, Gucci, Prada, Balenciaga, Chanel Replica Handbags and Wallet / Purse . The mall is designed to be convenient and informative in helping today's busy shoppers find the best stores and deals online. We have many exclusive money saving offers only available on the Bag2u.com web site.