Tuesday, December 25, 2007

SBS 2003 - don't put your files on it!

I've been running Small Business Server 2003 at home since the day it shipped (which is also when every MSFT employee in the Windows Division got a copy). I installed it that same day, and I ***LOVE*** the product.

It's a wonderful integration of Windows Server the OS (normally $700), Exchange (normally $1000+), and a firewall, plus a nice setup package that handles all the nitty-gritty of setting up a domain and an Exchange server and configuring the firewall. These are all tasks you would normally need to be a sysadmin to know how to do, and you get them in one nice $300 package.

So I ran SBS 2003 forever as the edge computer on my network: a replacement for a router that also let me fully host my domain. For quite some time I used an old Dell workstation (circa 1999) that I picked up at a PC Recycle depot. I noticed that file sharing was kind of slow, but I attributed it to the old hardware.

Earlier this year I went through a major computer upgrade, and part of it was new hardware for SBS. This box cannot be called slow: a Xeon 3060 CPU, a very decent server motherboard, a fast SATA drive, and 4GB of RAM. Initially I split the hard drive into two partitions, one for the OS and the other for per-user file shares.

Then I tested copying files, and it was terribly, horribly slow. The disk was doing something on the order of 8 MB/s on writes. What's worse, if you copied a very big file (a couple of gigs), the computer, now that it had tons of RAM, would cache all of it and only then start writing. While it was writing, it would monopolize all of the disk bandwidth, so every paging operation got tucked at the end of this practically infinite queue and the whole OS ground to a halt. The mouse would move, but practically nothing else would, and that went on for 5, 10, even 15 minutes! WTF!?

After a lot of experimentation, I found out that if you write files to any partition on the system disk, the writes are excruciatingly slow. If you put them on any other disk, they are an order of magnitude faster.
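If you want to check a disk for this kind of slowdown yourself, one rough comparison is to time large sequential writes with and without forcing every chunk to stable storage. This is only a sketch under stated assumptions: the function name `write_throughput` and its parameters are hypothetical, and calling `os.fsync` after each chunk merely imitates, from user space, the cost of a drive whose write cache is off. It does not read or change the drive's actual write-cache setting.

```python
import os
import tempfile
import time


def write_throughput(directory, total_mb=64, chunk_kb=64, sync_each_chunk=False):
    """Hypothetical helper: return approximate write throughput in MB/s.

    Writes a temporary file of roughly `total_mb` megabytes into `directory`.
    With sync_each_chunk=True, os.fsync is called after every chunk, so each
    write must reach stable storage before the next one is issued (a user-space
    approximation of writing to a disk with its write cache disabled).
    """
    chunk = os.urandom(chunk_kb * 1024)
    chunks = (total_mb * 1024) // chunk_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(chunks):
                f.write(chunk)
                if sync_each_chunk:
                    f.flush()             # push Python's buffer to the OS
                    os.fsync(f.fileno())  # ask the OS to commit it to disk
            # One final flush in both modes, so the cached run is also timed
            # until its data has been handed to the disk.
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (chunks * chunk_kb / 1024) / elapsed
```

Calling `write_throughput(path)` and `write_throughput(path, sync_each_chunk=True)` for a directory on the system disk and for one on a second disk gives a rough side-by-side comparison; absolute numbers will vary with the hardware, the file system, and the OS's own caching.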

So I started asking around. The SBS team knew more or less nothing: the team had changed almost entirely since 2003 shipped, and they didn't really test file share performance back then, either. Finally I found a dev manager from one of the storage teams who had an answer. It turns out that Active Directory turns off disk write caching on the drive it is running on. The reason is that a lot of drives lie about flushing their cache: they tell the OS they did, but in reality the data may still be sitting in the disk's cache. That completely defeats the transaction support in AD, and if the power is lost at the wrong moment, the database can be corrupted.

Of course, no sane person would run a file server on an AD domain controller, right? So that limitation was probably OK for what they designed it to do: most of the writes on such a computer are writes into the directory, and those need to be flushed anyway.

But if you're running Small Business Server, all of it (AD, Exchange, file sharing) runs on the same machine, and most likely from the same hard drive, which leads to terrible performance for the storage-bound parts of the software stack.

Moral: if you configure user directories on SBS, or use any other kind of file sharing, put the shares on a separate physical disk.

3 comments:

Anonymous said...

Bottom line? Run Linux? :-)

Anonymous said...

If you enable the Domain Controller role, the operating system disables the write cache on the system drive by default. You don't want to lose any Active Directory data, do you? It's well documented on MS TechNet.

And SBS 2003 is a domain controller by default.

:-)

Anonymous said...

Hope you get some kind of notification that you've received comments, so that you're still reading this... :)

What's the resolution for this? Simply "don't do it"?

Is there some kind of override for this?

I'm running my server on a small-form-factor workstation-class box, and there's only room for one internal drive, so I really need to have write caching enabled permanently. It auto-disables any time the server is rebooted, which is incredibly aggravating.

Would installing a UPS and configuring the OS to recognize that it's on a UPS resolve this, or will the behavior continue?