Wednesday, May 19, 2010

Can we stuff 2000 calories in one drink? Yes, we can! (via Reddit)

This is really, truly horrible:

The worst of the worst has 2,010 calories, 131 g fat (68 g saturated), and 153 g sugars. That is about the number of calories a human should consume in a day, and more fat than one probably should have in a week.

#4 on the list has 1,210 calories, 19 g fat (10 g saturated), and 240 g sugars! Yes, this is really 1/4 of a kg. In one "drink"!!!

Monday, May 17, 2010

Server 200x RAID tips and tricks

Since 2003, Windows Server has shipped with a very cool feature: software RAID.

Software RAID has two major advantages over hardware.

First, hardware RAID protects you against disk failure, but not against controller failure. The drive array contains proprietary disk allocation information that varies from manufacturer to manufacturer and controller to controller, so disks are not easily movable between different controllers. So when your controller fails (potentially a long time in the future, when the same boards are no longer available), or a new release of the OS stops supporting your older driver, you may be very much out of luck with your data.

Second, software RAID is considerably more flexible than hardware RAID. Hardware RAID operates on disks as atomic units - you RAID whole disks together. Software RAID operates on volumes, and each volume can be configured with a different level of redundancy. For example, you can have two different layouts on the same pair of disks at the same time - an OS partition that is RAID-1 mirrored with an image on the second disk, and another volume that combines the rest of the space on the two drives as a single span or RAID-0. The level of redundancy is selected per volume, not per disk.

Also, for those of us who like coding, software RAID has a very nice software interface (and its undocumented managed counterpart in Microsoft.Storage.Vds.dll), which allows one to code simple things like checking the health of the storage and sending an email if something goes bad.

But what about performance? A while ago when we were designing Windows Home Server, we tested various hardware RAID implementations versus software RAID in Server 2003.

It turns out that both RAID-0 and RAID-1 exhibit very similar performance for both hardware and software solutions. If you think about what the system has to do (write the same data to two disks at the same time in the case of RAID-1), it quickly becomes obvious that a hardware implementation does not really add anything over software in this case: both can write the same data in two places at the same time with the same speed. Big surprise :-).

RAID-5 is a different beast though - there actually is a computation going on, and it is possible to build a specialized chip for doing vector XOR operations that would leave the general purpose x86 in the dust.
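To make the "computation" concrete, here is a minimal Python sketch of XOR parity for a single stripe. It is only an illustration of the idea, not how Windows implements it: the 4-disk layout, the tiny block contents, and the `xor_blocks` helper are all made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple per byte position, one entry per block.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-disk array: 3 data blocks plus 1 parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the disk holding data[1] died: XOR the survivors with the parity
# block and the missing block comes back.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

Every write to a RAID-5 volume has to keep that parity block up to date, which is the work a dedicated XOR engine can take off the CPU.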

A much bigger problem also exists in the lack of integration between the formatting and the RAID code. When you format a RAID drive, the default allocation unit that the UI presents is very small.

Due to the way software RAID is implemented, this leads to incredibly slow performance. On my relatively powerful system the writes clock at only 20-30MBps (and this is going to drives that are supposed to sustain 3Gbps, or 300MBps, transfers). Selecting a more reasonable allocation unit of 64k improves the write speed by a factor of four, to almost 120MBps.

The other format-related performance problem is that after you create a new RAID volume, format and resync happen at the same time by default. I covered it in the previous blog post.

In summary, here are two very simple rules that can make your RAID array much faster:
- Select 64k as the allocation unit when formatting a RAID-5 volume.
- When formatting any new RAID volume, use quick format first, wait for the volume to finish resyncing, then repeat with full format if you like (remember to keep the large allocation unit in the second format though!)
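For those who prefer the command line, the two rules above map onto the switches of the Windows `format` tool: `/FS:` for the file system, `/A:` for the allocation unit size, and `/Q` for quick format. The Python sketch below only builds the command lines and does not run anything; the drive letter `E:` and the `format_command` helper are placeholders made up for the example, and waiting for the resync between the two passes is still a manual step.

```python
def format_command(drive, quick, allocation_unit="64K", filesystem="NTFS"):
    """Build (but do not run) the argument list for the Windows format tool."""
    args = ["format", drive, f"/FS:{filesystem}", f"/A:{allocation_unit}"]
    if quick:
        args.append("/Q")  # quick format: only creates the file system
    return args

# "E:" is a placeholder drive letter for the new RAID volume.
first_pass = format_command("E:", quick=True)    # right after creating the volume
second_pass = format_command("E:", quick=False)  # optional, after resync finishes

print(" ".join(first_pass))   # format E: /FS:NTFS /A:64K /Q
print(" ".join(second_pass))  # format E: /FS:NTFS /A:64K
```

Both passes carry the same `/A:64K`, so the large allocation unit survives the second format.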

Happy RAIDing!

Solution to the slow formatting puzzle

A couple of weeks back I posted a puzzle about an experience that I just had with the Server 2008 R2 software RAID subsystem: somehow the speed of formatting a new, very large RAID-5 array was highly unpredictable. It advanced by a mere 12% a day for the first 4 days and then suddenly sped up 4x and finished the last 50% of the formatting in just one day.

Furthermore, I expected precisely this behavior. Why?

This weirdness occurs because of a typical Microsoft phenomenon that was best expressed by an acquaintance of mine who works in Windows Mobile. He thought that the biggest difference between Microsoft's and Apple's approaches to development is this:

Microsoft attacks problems horizontally: it builds a core system, then builds layers on top of it. When it's time to ship and something needs to get cut, it's the top of the stack that goes first - and more often than not it's the user experience.

Apple develops software vertically - it enables a user experience, from top to bottom. When they need to cut, they cut entire experiences (and also maybe the bug fixing, judging from the fact that my iPhone seems to crash considerably more often than most of the Windows Mobile phones I owned in the past). So for example, the iPhone would ship with Bluetooth support just for the mono phone headset, but no stereo profile. But the experiences that are left are implemented completely, to the maximum possible level of usability.

Back to our RAID problem. The reason the formatting is so slow in the very beginning is because two things happen at the same time: RAID resync and the formatting. Resync gets kicked off immediately after the RAID system is built, and it simply ensures that the parity volume (in the case of RAID-5) has correct checksums, or the mirrored volume (in the case of RAID-1) has a correct copy of the primary volume.

The other thing that happens once you create a RAID volume from the UI is, of course, formatting. One can select quick format, which only creates a file system. However, most people probably prefer to run a full format when a new drive is added, just to make sure that it is not full of bad sectors.

So we have two write operations going on - the first one is creating the redundant information, the other is filling the disk with zeroes.

Writing to the disk is very fast these days, but the seek time has barely improved in the last decade, and because the two writes happen in different places on the disk, the whole thing is completely dominated by the disk head moving from one track to another. And with a seek time of 15ms you can only have ~70 of these per second.
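The arithmetic behind that number is simple enough to sketch in Python. The 15ms figure comes from the paragraph above; the amount of data written between seeks is a purely hypothetical parameter, added only to show how quickly a seek-bound workload falls below the drive's sequential speed.

```python
def seeks_per_second(seek_time_ms):
    """How many head movements fit into one second at a given seek time."""
    return 1000.0 / seek_time_ms

def seek_bound_throughput_mb_s(seek_time_ms, mb_written_per_seek):
    """Rough upper bound on throughput if every write needs a seek first.

    mb_written_per_seek is a hypothetical knob, not a measured value, and
    the time spent actually transferring the data is ignored.
    """
    return seeks_per_second(seek_time_ms) * mb_written_per_seek

print(round(seeks_per_second(15)))  # 67, i.e. the "~70" above
print(seek_bound_throughput_mb_s(15, 0.25))
```

With the two writers constantly pulling the head in different directions, the disk spends its time seeking instead of writing, which is why the combined format and resync crawls.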

Of course, if the format is in progress, there is absolutely no sense in doing a resync at the same time - whatever redundancy data the resync creates is going to be instantly overwritten by the format.

But there was no Steve Jobs standing over the devs' shoulders, and getting the (filesystem) format in sync with the (block device) RAID required two different teams to do something together, so it probably got punted. I am sure it's in a readme somewhere... and now in at least one blog :-).

So to avoid unnecessary delays, select the quick format option when adding the RAID volume, wait for the resync to finish, THEN format it again with the full format.

Saturday, May 1, 2010

Formatting the large hard drive - continued

As mentioned in the previous post, one of my servers is formatting a 7.5TB RAID-5 volume.

During the first 24 hours, it completed 13%.
After the next 24 hours, the progress was at 26%.
The next day, it was at 38%, and the day after (last night) it clocked in at 50%.
However, this morning, 12 hours later, it was at 75%, and tonight I expect it to be completed.

So in the very last day the system's progress was the same as for the first 4 days. Moreover, I fully expected this to be the case. Can anyone guess why?
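For the curious, here is a small Python sketch that turns the checkpoints above into a rate per day. The hour marks are my reading of the timeline (one reading every 24 hours, then one 12 hours later), and the starting point of 0% at hour 0 is implied rather than measured.

```python
# (hours since the format started, percent complete)
checkpoints = [(0, 0), (24, 13), (48, 26), (72, 38), (96, 50), (108, 75)]

def rates_per_day(points):
    """Percent completed per 24 hours between consecutive checkpoints."""
    rates = []
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        rates.append((p1 - p0) * 24 / (t1 - t0))
    return rates

rates = rates_per_day(checkpoints)
print(rates)  # [13.0, 13.0, 12.0, 12.0, 50.0]

slow_average = sum(rates[:4]) / 4    # first four days
speedup = rates[-1] / slow_average   # last stretch vs. the first four days
print(slow_average, speedup)         # 12.5 4.0
```

So the last stretch ran at roughly four times the pace of the first four days.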