Monday, April 28, 2008

Warren Buffett says recession is going to be bad, invests in chewing gum!

"NEW YORK (Reuters) - Warren Buffett, the world's richest person, said on Monday the U.S. economy is in a recession that will be more severe than most people expect.

Buffett made his comments on CNBC television after his Berkshire Hathaway Inc (BRKa.N) (BRKb.N) agreed to invest $6.5 billion in the takeover of chewing gum maker Wm Wrigley Jr Co (WWY.N) by Mars Inc in a $23 billion transaction.

"This is not a field of specialty for me, but my general feeling is that the recession will be longer and deeper than most people think," Buffett said. "This will not be short and shallow.

"I think consumers are feeling gas and food prices," he added, "and not feeling they've got a lot of money for other things.""

Chewing gum prices are positively correlated with the prices of food - the more expensive the food is, the more chewing gum is used to replace it :-)...

The next development - George Soros shorting the weight-loss industry - to be announced later!

Computer Science lectures

This blog has a bunch of pointers to what looks like a nice collection of CS lectures.

Friday, April 25, 2008

Election criteria

One of my favorite sayings: "Smart people tend to hire people smarter than themselves, dumb people tend to hire people dumber than themselves".

How does this work in the US? Let's see...

I think THE prerequisite for an effective democracy is an educated demos. This we definitely do not have.

US Aid to Israel

A few interesting numbers here:

From Bill Maher

"If you think that Democrats are going to take your Bible away, you're an idiot.
If you think that they're going to take your gun away, you're an armed idiot.
And if you think that they're going to take your gun and give it to Mexicans to kill your god, you're Bill O'Reilly"

Thursday, April 24, 2008

STL strings

So, while reviewing someone's code, I ran into a place where it was possible to eliminate an unnecessary string assignment. Which got me thinking - is this even worth mentioning? How expensive is a string assignment, anyway?

So I wrote this simple program:

#define _SECURE_SCL 0

#include <string>

using namespace std;

void f(string a, string *b) {
    *b = a;
}

int main(int argc, char **argv) {
    string y = argv[0];
    string x;
    f(y, &x);
    return 0;
}
and stepped through it in the disassembler. The function f itself was inlined, so I only counted... prepare... the instructions in this call:

call dword ptr [__imp_std::basic_string<char,std::char_traits<char>,std::allocator<char> >::operator= (402048h)]

I skipped the call to new (argv[0] turns out to be 69 bytes, just above the default string buffer, so it does allocate), and memcpy itself (memcpy_s to be exact). The net result was - 270+ instructions. If there is no allocation, it is ~150!

That's right - anywhere between 150 and 270 instructions of STL goo per string assignment, IN ADDITION TO ACTUALLY DOING THE WORK!

I tried to do the same on Linux, and after 70-something instructions ddd hung while disassembling one of the functions...

Tuesday, April 22, 2008

Server 2008, first impressions

Short version

MUCH better than Vista.

Medium version

If you're a Microsoft employee and the goons from ITG(*) are trying to rip your favorite OS from your cold, dead hands and make you run Vista, don't - use Server 2008 instead.

There's no LUA/UAC/this idiotic thing that asks you whether you really intend to do what you just asked your computer to do. ipconfig works in a normal shell, instead of demanding one with elevated privileges. It takes literally half as many clicks/keystrokes to do everything.

There are manifestly fewer bugs. It did not fall apart within the first couple of weeks of use. And it seemed much snappier than Vista (although I did not do any real performance benchmarks).

Here is a blog post that has instructions on how to configure Server 2008 into a workstation:

And this one claims that it's 20% faster than Vista, but does not give details on how this number was arrived at:

(*) ITG - Microsoft's Information Technology Group. It changed names multiple times over the years, but its essence stayed the same - these are the people who prevent developers from doing their jobs in the name of security. But this probably merits a separate post...

Long version

I have a lot of storage in my house - not counting client computers, there is approximately 9TB of usable (after RAID) redundant space. Most of my data is on this storage - software, music, pretty much everything.

Every time I buy anything, I immediately copy it to the server, and put the originals in a big box in the basement, where they will stay until the time when BSA people show up at my door and demand the proof of license :-).

My own data is replicated to multiple servers - approximately 300GB of home videos and digital photos, plus other, smaller stuff that accumulated over the years.

Storing stuff requires storage. Storing terabytes of things requires redundant storage. Over the years, I sampled a few RAID-5 solutions, both at home and at work.

There are two major problems with RAID controllers.
(1) While they protect you from a disk failure, they do not protect you from the failure of the controller itself.

Since all of these controllers use proprietary information to describe the RAID array, stored on the disks themselves, they are not interchangeable - you can't take a bunch of disks that you used in RAID mode on LSI and move them to Adaptec (while preserving the data that's on them, that is).

In fact, there's no guarantee that you can move disks between the controllers from the same manufacturer. Or between controllers with different versions of firmware. Or...

So 5 years from now when the RAID card fails, one can very easily be stuck with trying to find an exact replacement of a controller that had been out of production for the last 4.5 years. eBay, anyone?

(2) Software that accompanies these cards is often crappy.

The UI is almost always some atrocious Java program obviously written by a contractor in 2 days right before the product shipped, rife with misspellings and terrible English usage.

I have had multiple problems where midrange cards corrupted data when used on machines with more cores than the manufacturers originally expected.

So unless you test the disk failure scenario right upfront, BEFORE you get any data on it, you may well discover that (a) the recovery mode does not work, or (b) because of an unobvious UI you did something that wiped your disks instead of recovering them.

And good luck finding drivers when you upgrade to the new OS. And since you can't easily move the disks to a new controller... see (1).

Luckily, the Microsoft Server family has a software RAID subsystem (I am sure Linux has something similar, but coming from Microsoft, I am more familiar with Windows software).

To use it, you have to make your disks dynamic; then you can combine multiple disks into RAID-0, 1, or 5. Volumes of different types can share the same set of physical disks - for example, parts of disks 1 and 2 plus the whole of disk 3 can form a RAID-5 volume, while the remainders of disks 1 and 2 form a RAID-0 temp volume.

The advantage of soft RAID is that it's hardware independent. You can take all these disks, shove them into any other computer (running Server), and it will still be a RAID volume with all your data intact. The same disk packs can be played on both Server 2003 and Server 2008.

It's an insanely cool idea. Unfortunately, in Server 2003, it was coupled with atrocious implementation.

Soft RAID-5 on Server 2003 is slow. Glacially slow. On my servers that feature dual Xeon 5130s (4 cores per server), with a very decent server motherboard (Supermicro X7DVL-E), and fairly decent mid-range SATA controllers, it was barely doing 20MBps writes, and sometimes would drop to 10MBps for extended periods of time. That on disks that are individually capable of 300MBps transfer rate.

RAID-5 works by partitioning disks into chunks, and then combining chunks from N disks to get N-1 chunks' worth of data and 1 chunk of parity. Which means that to write a single sector to a volume, the RAID would have to read the corresponding chunks from N disks, compute the parity, and write 2 sectors - one data and one parity.

A really terrible implementation would not cache the results of these reads, so if the next sector needs to be written, it would repeat all the operations anew instead of reusing the results of the previous reads.

The only way I can explain the RAID-5 write speed on Server 2003 is that it was this very terrible implementation, although I don't know for sure - an alternative explanation is that maybe they had sleep cycles in there :-).

So when Server 2008 came out, I could not wait to install it and check out its soft RAID implementation. I installed it first on my media server, and then on my data server.

Overall, I was quite impressed. Of course my expectations were very low to begin with because of Vista, but this thing was closer to Server 2003 than it was to Vista. I hit a few bugs right upfront - it hard-hung once within a couple of days of installation, and then lost a set of disks (but recovered after a reboot).

I am not quite ready to blame it on the server itself, though, because I added an unknown RAID controller to the machine, and it is more than likely that buggy drivers are to blame. The second server, which did not have that controller, did not (yet) exhibit this behavior.

Since then, it has been relatively quiet and everything has functioned the way it's supposed to.

The drivers for SATA controllers from Server 2003 worked on Server 2008. The chipset drivers for the motherboard did not. I found that chipset support for Server 2008 is still quite scarce.

What is unquestionably a bug in Server 2008 is that on RAID volumes the performance counters for logical disks are completely broken - everything except disk free space and idle time reads 0 even when the volume is reading or writing at full speed.

But most importantly, its soft RAID implementation is way faster than Server 2003's. I have copied a few terabytes of data so far, and on writes it sustains a throughput of ~80MBps - 4 times the peak performance Server 2003 could muster. The reads (comparing files between two servers) almost saturate the 1Gbps network.

So far this thing gets my stamp of approval :-).

I have yet to see if it has the long-term stability to last between Windows Update reboots (just in case, I preserved the original installations of Server 2003). I will report on this in a couple of months if everything goes well, earlier if it does not.

Henry Blodget agrees with me...

In today's Huffington Post piece he's basically saying the same thing I wrote about here: the CEO compensation structure in the US encourages a reckless, irresponsible attitude toward long-term shareholder value.

And he's very authoritative on the subject of irresponsible behavior :-)!

Friday, April 18, 2008

Who is flying this plane?

A while ago when I was in Technology Management MBA @ UW, they got former Qwest CEO Joseph Nacchio to come to the orientation meeting and talk about his tenure with the company.

This talk left me with two lasting impressions. The first was a sense that under absolutely no conditions save outright starvation would I want to work for Qwest; the second was bewilderment at the extent to which a CEO can be disconnected from the technology his company is using.

During his talk Nacchio confused GPRS, GSM, and other cell network terms. He professed ignorance of how the networks work, and did so with a distinct sense of smugness (as in: you don't need to know any of this crap to be a leader - that's for lowly engineers to understand).

What he did talk about at length, and where he became really animated and very lucid, was mergers and acquisitions. Apparently, at the time Qwest was making a lot of money by buying small Eastern European telecoms and then selling them at a profit. From what I gathered, that line of business was what really excited the management - not the pesky technological and operational hurdles of providing telecom services.

Over the last many years, as Ballmer's influence increased, I watched Microsoft leadership become less and less engineers, and more and more sales and marketing people.

Now, I don't think that engineering skills are required to head an engineering company. However, I do find that marketing and engineering organizations are culturally VERY different - almost stereotypically, Scott Adams-cartoonishly different.

And nothing affects a company's culture more than the CEO.

So when I ran into this "pearl" today, I felt embarrassed, but, sadly, not surprised:

There is often a big disconnect between the market's perception of a product and the sales force's perception of it. That in itself is not bad - to be an effective salesperson, you have to be excited about the product as it is, or you won't sell very much of it. Sadly, I fear that Microsoft's internal perception of Vista - that of the leadership team, anyway - is closer to the movie above than to the painful reality.

Thursday, April 17, 2008

Wednesday, April 16, 2008

Google Road Traffic Incidents ahoy!

I have not posted for a long, long while because the confluence of several events has been keeping me busy almost around the clock for the last 5 weeks.

First and foremost, the project I've been working on for the last 3 months - road traffic incidents - shipped today. It's my first experience building an entire pipeline at Google - downloading the data from a 3rd party provider, parsing it, storing the results in a bigtable, writing a spatial index for it, and, finally, publishing it through the maps frontend server. Here's the result:

It's actually kinda fun watching the data in various cities. Seattle is a really, really quiet place. The incidents are mostly construction (there are a few accident icons, but in reality they are all about slow traffic):

Though Seattle is a small place, real accidents (as in: collisions) are quite uncommon even in much bigger cities. Here's New York. You see a lot of road closures here (most of them are periodic, and eventually we'll figure out how to be more intelligent about displaying them - right now the data we're getting lacks a lot of the details that would allow us to - but we'll work it out with our provider). There are a few constructions and slowdowns, but only one of the icons shows a collision:

And most other US cities are somewhere between New York and Seattle. Except for one. If you're living in LA, my hat is off to you. The traffic there is a real zoo. Yes, there's even one place where almost every day the sign shows "Animals on the road". Here's a quiet night in LA (believe it or not, during the daytime it's a lot worse). Most non-construction icons are real collisions. A couple are hit-and-runs.

The insane number of construction icons you see here is a nightly event - the authorities deploy myriads of crews to do various road maintenance tasks in the evening, and it lasts through the night. And they report a construction incident for every one of them. During the day, the roadwork in LA clears, leaving mostly just collisions. But a lot of them!