We had just bought a digital camera and started accumulating tons of digital images that needed reliable storage. There was also software on CDs that kept getting lost; various random bits of data - code, text, Word documents - that spanned the last ten years and occasionally needed retrieving; and mp3 files.
Meanwhile, the number of computers in the household kept increasing. My wife and I had laptops; my daughters had their own PCs. All were networked using a Windows CE-based router which was also a print, web, and small file server.
After a brief attempt at using USB drives, I quickly came to the conclusion that they were not the answer. There were several problems.
First, it is impossible to say whether the data on a drive is good. There were bit errors introduced by copying. A bad sector in the middle of a file would suddenly make it unreadable, but you wouldn't know until you tried to read it. And of course, occasionally the whole drive would go bad. At least the latter problem is easier to discover, if not fix.
Second, I ended up with lots of them, fast. Because I wanted to protect against hardware failures, I would usually create multiple copies of the data. Then I would forget where the data was, and how many copies I had. Most of the time I would fail to update all copies. Towards the end I started treating disks as write-once media, creating a new backup every time I remembered to do it, on the theory that one of the copies would survive a disk/sector/bit error.
Finally, USB drives are slow. I don't know where they got the 480 megabit figure - I was getting 10 megabytes per second (100 megabits, if you account for the transport overhead), and even that only when the disk was directly connected to a PC, and there was only one disk. Plug a couple into a USB hub, and the transfer rate quickly drops by half.
So when Microsoft shipped Small Business Server 2003 and gave a copy of it to every employee of the Windows team, I pounced on the opportunity, bought a second-hand workstation from one of the PC-Recycle shops that had sprung up around Microsoft, and built my first home server.
It turned out that building various servers for homes quickly became my hobby. Over the next five years I tried almost every possible variant: from dirt cheap Fry's motherboard/CPU combinations to super-expensive Xeon/Supermicro performance monsters, and from SBS 2003 to Server 2008. I even helped start (as the very first developer, and then dev manager) the Windows Home Server project.
All in all, between home and work I've built perhaps 20 servers. I have 4 of them in my basement right now, totaling roughly 10TB of usable protected storage, doing my web site and email hosting.
The write-up below documents what I learned as a result.
The only caveat is that the software section is entirely Windows-based. Since I work at Microsoft, I can buy our own software at drastically discounted prices at the company store. If I couldn't, perhaps there would be an incentive for me to look at what's available in the Linux world.
Or maybe not - one can buy SBS 2003 for around $300 these days, and this comes with Exchange, a domain controller, a file server, and everything else in one package conveniently wrapped in one installer. Would it be worth it for me to learn Linux to avoid spending $300? I don't know. (http://1-800-magic.blogspot.com/2008/10/linux-vs-developers.html).
Anyway, let's get to the meat!
There are three priorities that you should follow when building a home server.
First, it must be reliable.
Unlike a personal computer, a home server is always on, and is rebooted extremely infrequently. As a result, it is prone to weird errors that are usually invisible on a typical personal computer - such as spurious bit flips in memory, and during data transfers.
When my home servers were cheap Fry's motherboard/CPU combos, they would usually hang every month or so, and would produce a bit flip in data transfers - usually one per 100-200GB of files copied. It was then that I got into the habit of diffing the source and destination every time I copied files around - and in many cases I discovered that I was lucky I had checked.
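For the curious, the "diff the source and destination" habit can be sketched in a few lines of Python. This is only an illustration, not the tool I actually used: the function names file_digest and compare_trees are made up for this example, and it compares SHA-256 hashes rather than doing a byte-by-byte diff.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(source, destination):
    """List files under source that are missing or different in destination.

    Returns (relative_path, problem) tuples, where problem is
    "missing" or "differs". Files that exist only in destination
    are not reported.
    """
    source, destination = Path(source), Path(destination)
    problems = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = destination / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif file_digest(src) != file_digest(dst):
            problems.append((rel.as_posix(), "differs"))
    return problems
```

Calling compare_trees on the source and destination folders after a big copy returns an empty list when everything matches. Hashing reads every byte on both sides, so on a multi-hundred-gigabyte copy expect it to take about as long as the copy itself.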
Therefore, a server motherboard and ECC memory are absolutely required. Say no to cheap desktop hardware - in most cases the motherboard should be in the $200+ range, and make sure it does take ECC memory before buying it.
Second, it must be able to accommodate many internal hard drives.
While you can add external drives, there are several problems with that. First, external drives usually cannot be combined in fault-tolerant arrays, which is the option you will most certainly want (there will be more discussion below). Second, they make the system fragile - they fall off, the cables spuriously disconnect. Third, they create an unsightly - and more importantly, hard to navigate - mess of cables and power supplies that makes the server impossible to move.
Definitely get a full tower case - not a mini- or mid-tower. Make sure that there are fans near the internal 3.5" hard drive bays, so the drives are properly cooled. Check the 5.25" bays - it is best if there are no rails in these slots. Smooth walls where the equipment is held in place only by the screws are the most versatile: many SATA enclosures do not have grooves and won't fit into multiple 5.25" bays without modifying the case.
Do buy the SATA enclosure that fits the case - you will want a minimum of 11 drives per server - and there are no cases with 11 internal drive bays.
For example: http://www.newegg.com/Product/Product.aspx?Item=N82E16817119404, or http://www.newegg.com/Product/Product.aspx?Item=N82E16816119014 (but note the absence of the grooves on the second one).
Also, hard drives are power-hungry, especially on boot when they spin up. The power consumption of my server (which has 11 drives) jumps to 500 watts on boot, before receding to around 270 watts in steady state. If 500 watts does not seem like much, consider that much of the power is consumed on the 12V rails, whereas many modern power supplies are optimized for maximum wattage on the 5V rail - for the video card.
A 750W PSU is barely enough for 9 drives - get a minimum of 850W for 11, and 1200W if you plan to have 16. Make sure that enough power is available on the 12V rails - allow 40W/drive.
And do get a brand name - this tends to be one of the more important components for the reliability of the system. I have used CoolMax and Thermaltake, and they were quite reliable.
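The 40W/drive rule of thumb is easy to turn into a quick check before you buy. The sketch below only multiplies it out. The function names and the 12V rating passed in are illustrative assumptions, so read the real 12V figure off the label of the power supply you are considering.

```python
WATTS_PER_DRIVE_12V = 40  # rule of thumb from the text: allow 40W per drive


def required_12v_watts(drive_count, watts_per_drive=WATTS_PER_DRIVE_12V):
    """Power the drives alone may need on the 12V rails at spin-up."""
    return drive_count * watts_per_drive


def psu_covers_drives(drive_count, psu_12v_watts):
    """True if the PSU's 12V rating covers the drives' budget.

    This ignores the CPU, fans and motherboard, which also draw
    from 12V, so treat a narrow pass as a fail.
    """
    return psu_12v_watts >= required_12v_watts(drive_count)


for drives in (9, 11, 16):
    print(drives, "drives:", required_12v_watts(drives), "W on 12V")
```

For 11 drives this gives 440W on the 12V rails for the disks alone, which is why a power supply that looks generous on paper can still come up short.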
Third, the server should be QUIET and power-efficient.
These two things go together.
Drives generate a lot of heat, and heat takes many fans to dissipate. Pick the quietest fans you can find (12cm or above) that still move plenty of air: if the temperature rises inside the case, the CPU fan (which is small, and therefore requires more RPM) tends to sound like a rocket engine. Even if the server is in the basement, you probably do not want it to be the most audibly prominent thing there. And forget the closet, unless it has proper ventilation :-).
Do get a CPU with decent power management. Currently, 3000-series Xeons seem to offer a decent tradeoff between power consumption and compute power.
Also keep in mind that a big server might dissipate ~250-300W of heat. This is equivalent to 1/2 of a portable oil heater, operated on low. And it's always on. Check the electricity prices in your neighborhood :-).
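To put a number on "always on", here is a small back-of-the-envelope calculation. The 270W figure is the steady-state draw of my server mentioned earlier. The price per kWh is a made-up placeholder, so substitute the rate from your own bill.

```python
HOURS_PER_YEAR = 24 * 365


def annual_kwh(watts):
    """Energy used in a year by a device drawing `watts` continuously."""
    return watts * HOURS_PER_YEAR / 1000


def annual_cost(watts, price_per_kwh):
    """Yearly electricity cost for a continuous load."""
    return annual_kwh(watts) * price_per_kwh


steady_state_watts = 270        # steady-state draw quoted in the text
assumed_price_per_kwh = 0.10    # hypothetical rate; check your own bill

print(annual_kwh(steady_state_watts), "kWh per year")
print(annual_cost(steady_state_watts, assumed_price_per_kwh), "per year")
```

A 270W server comes to roughly 2,365 kWh a year, so even a modest difference in idle power adds up over the life of the machine.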
What is less important?
Rejoice - there ARE a few things you do NOT need to care about. They are usually the things you care about most when building a desktop machine.
Unless you plan to run a virtual farm, CPU computing power is not important. But CPU I/O bandwidth is. Most desktop processors optimize for the former. Many server processors optimize for the latter.
As of this writing, I would recommend the Xeon 3xxx series. They are relatively cheap, have great power efficiency, and more than enough juice to run any Microsoft server OS. They do not support dual-socket configurations - but two cores are plenty for serving files.
Obviously, you don't care about graphics - a built-in video adapter is available on most server motherboards, and it's more than enough.
More controversially, I do not recommend spending money on hardware-based RAID controllers.
(A short aside about RAID.
There are three popular modes of combining multiple hard drives into one storage array for speed and/or reliability. All three can be accomplished in software - with the storage driver doing the work - or in hardware - with the controller card doing the work.
RAID0 places half of the data on one hard drive, and half on another, in stripes. As a result, you are writing your data to two disks simultaneously and effectively at twice the speed. But if either drive fails, all your data is gone - the probability of the array failing is roughly twice the probability of a single drive failing. Because the task can be accomplished by scheduling two DMA transfers in parallel, the best hardware is not any faster than a software implementation.
RAID1 places identical copies of the data on both hard drives. If one fails, the other survives, and the array can be repaired by replacing the failed drive with a new one. RAID1 arrays are very reliable - but you pay twice for protecting the data. Hardware and software speeds are the same for the same reason as above.
The best way to visualize RAID5 is to imagine a system of 9 drives where each of the first 8 stores 1 bit of every byte, and the 9th stores the XOR of those 8 bits. If any one of the drives fails, its contents can be regenerated from the other 7 data drives and the XOR drive (and if the XOR drive fails, the XORs can be recomputed from the data). Note that, excluding the cost of computing the XOR, such a system could be 8 times faster in transferring the data. This is not how a RAID5 system really works - the data is striped, and the XOR is computed from the corresponding stripes - but it's close enough. Real RAID5 arrays can consist of any number of disks greater than two - N-1 drives store the data, and the Nth drive stores the XORs (again, it's more complicated in reality, but we'll skip the gory details).
RAID5 protects data for much less than RAID1, because out of N drives only one is spent on redundancy. For this reason alone you will want to use RAID5 in your home server. Given a powerful CPU, a good software implementation could give RAID5 hardware a run for its money. As we will see below, the quality of the software RAID5 stack in Windows Server varies.)
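If the XOR trick in the aside feels like magic, a toy example may help. The sketch below follows the simplified picture above (N-1 data blocks plus one XOR block), not the real on-disk layout of any RAID5 implementation. The function name xor_blocks and the sample data are made up for the illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four "data drives", each holding one 4-byte block of a stripe.
data = [b"home", b"serv", b"er!!", b"1234"]

# The fifth "drive" stores the XOR of the four data blocks.
parity = xor_blocks(data)

# Pretend drive 2 died: rebuild it from the survivors plus the XOR block.
lost = 2
survivors = [block for i, block in enumerate(data) if i != lost] + [parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[lost]
```

The rebuild works because XOR-ing a value in twice cancels it out: every surviving data block appears once on its own and once inside the XOR block, so only the lost block is left.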
All versions of Windows Server support software RAID5. I am sure Linux supports it as well. Note that if your RAID controller is cheaper than $400, chances are that its RAID5 implementation is in software, not hardware, anyway.
The most important consideration about hardware RAID is that it protects you from a disk failure, but not from the failure of the controller itself. Because the format of the data on hard disks managed by hardware RAID depends on the RAID card, the RAID controller cannot be substituted for a different model. So if your controller dies, so does your data if you can't get an exact replacement - which is very likely if it dies after a few years.
For this reason I prefer using software RAID - it is standard, it is hardware independent, and if your computer dies, you can put the disks in another computer and the array will be recognized.
I typically have 2 arrays of 5 disks each per server, plus one small (200GB) system disk.
Speaking of disks - a larger number of smaller disks gives better reliability and better price/performance than a few very big, expensive disks. I usually go for whatever is available around $100, which is typically two sizes below the current maximum.
For example, as of this writing the biggest available disk is 1.5TB. 1TB retails for ~$150, and 750GB drives can be had for ~$90. 750GB would be what I'd use. For just below $1000 in drives, you'd get 6TB of usable protected space (10 drives in two 5-disk RAID5 arrays, 3TB of usable storage per array).
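The arithmetic behind that example is simple enough to redo whenever prices change. The sketch below uses the rounded figures from the text (750GB drives at about $90, two 5-disk arrays). The function name raid5_usable_tb is invented for this illustration, and it ignores formatting overhead.

```python
def raid5_usable_tb(disks_per_array, disk_tb):
    """Usable space of one RAID5 array: one disk's worth goes to parity."""
    if disks_per_array < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (disks_per_array - 1) * disk_tb


arrays = 2
disks_per_array = 5
disk_tb = 0.75           # 750GB drives
price_per_disk = 90      # approximate price quoted in the text, in dollars

usable_tb = arrays * raid5_usable_tb(disks_per_array, disk_tb)
total_cost = arrays * disks_per_array * price_per_disk

print(usable_tb, "TB usable for", total_cost, "dollars")
print(total_cost / usable_tb, "dollars per protected TB")
```

Plug in the current prices for each drive size and you can see quickly whether the "two sizes below the maximum" rule still gives the cheapest protected terabyte.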
As for memory: you don't need to stuff your computer with it. For any reasonable home usage, 2-4GB is enough. For a corporate environment where hundreds of people may be using the system concurrently, this is obviously not true, but for home, it is really enough.
So in short, hardware-wise, the rules are simple: big case, big power supply, server motherboard, 2-4GB of ECC memory, one midrange Xeon CPU, and a simple, run-of-the-mill SATA controller card, with as many cheap disks as possible in software RAID5 arrays.
Again, this only covers the Windows options. I do not know much about what's available on Linux.
Windows Server 2008
- Works really well as a file server.
- RAID5 implementation is very good.
- Lots of useful data-management features.
- Available to current MSDN subscribers at 0 incremental cost. You keep your license key even if your subscription ends.
- If you don't work at MSFT, it's $800, plus CALs.
Windows Server 2003
- Cheaper than Server 2008 - can be had for $500.
- Software RAID5 implementation is crappy. Expect ~15MBps throughput.
- $500 is still a lot of money.
Windows Home Server
Full disclosure: I was the first dev and then the dev manager for the v1 of WHS.
This is an interesting product, and its target market is, well, homes in need of a server. It is relatively cheap ($140 at Newegg, although it is an OEM version, which means that once registered, it is tied to a motherboard). It comes with a great backup solution, a UI custom-made for home server scenarios, and a nice remote access portal.
Its weakness is the storage system.
It is a conceptual variant of RAID1 - you can have some of your files duplicated. If you have a lot of data though, you're paying twice for its protection - for me, it would not just be prohibitively expensive, it would be physically impossible to cram that many disks into the server chassis. The recommended way to expand Windows Home Server is via external USB disks, but (see above) I do not recommend doing it because it makes the physical installation fragile.
And because the UI and all other subsystems in Windows Home Server are tied to its custom storage management solution, you can use native Server 2003 software RAID5, but you would then have to keep these disks on the side and leave them unconfigured in the Windows Home Server UI. You would not be able to put backups on them.
To do this, install Windows Home Server on a computer with a single big hard drive. Do not create any shares, and do not put any data on the shares created for the users. Then, AFTER the installation, add more disks, and configure them in RAID5 arrays using the native Server 2003 UI. Leave these disks unconfigured in the Windows Home Server UI. You will also need to use the native UI to share them out.
Another way to use Windows Home Server is by putting it on one big, hardware-managed RAID5 system. You do have to pay more for hardware and be mindful of possible controller failure, but given the substandard performance of 2003 software RAID, you will win on performance.
A relatively small disadvantage of Windows Home Server compared to other options is its inability to join a domain, or be a domain controller. This had to be done to justify the much lower price.
Small Business Server 2003
This is the product with potentially the best price/performance ratio that has ever come out of Microsoft. It is a full implementation of Server 2003 with Exchange server, Sharepoint, and a nice remote access portal, and you can buy a retail version of it for a mere $300 (go to www.pricegrabber.com; search for Small Business Server 2003 Standard; be careful - a lot of options are CAL packs, not the software itself; find the RETAIL version - OEM copies are only marginally cheaper, but tied to the motherboard once you register them).
It comes with an installer that sets it up as a domain controller, and configures its components. If a PC has two network interfaces, it can even be used as a router! It serves as a DHCP server, and has a DNS proxy, and even a VPN server built in, so you can access your home network from the outside.
And yes, you do get a real domain for your home. Your user accounts and passwords are now centralized, and any user can log in to any machine in the household using the same name and password. It's really cool!
In addition, you can host your own web site, your own email, share the calendar with the family members (Microsoft Outlook is included), etc. Highly recommended!
First, do not put any data on the system hard disk (the whole disk, not just the partition): the write cache on that disk is turned off, so accesses are SLOOOOOOOW. This is because of the Active Directory database.
Second, SBS is restricted to being a domain controller: it cannot join any other domain.
I use the following relatively simple rules to ensure that my data is healthy.
- I ALWAYS diff source and destination on multi-hundred-gigabyte data transfers. Bit flips do happen when you're dealing with a lot of data. A single bit flip can render a ZIP file useless.
- I keep all data on RAID5 arrays. I keep multiple copies of truly irreplaceable data, such as digital pictures, videos, and documents. There is one server for such data where two RAID5 arrays are mirrored by hand (and are periodically diffed).
- Once every few months, I back up the irreplaceable data to a large drive and store it offsite.
- Truth is always on servers. All other computers may periodically keep a cache, but all modifications are regularly transferred to servers.
If you've made it this far, congratulations! You can build servers for your home for fun and profit, and avoid all the mistakes I've made when building mine :-)!