Sunday, October 26, 2008

Startups in the down economy and venture financing

I've recently read an article by Paul Graham about starting a company in a bad economy (http://www.paulgraham.com/badeconomy.html). The gist of his position is that the founders and the business idea are far more relevant to a startup's success than the sources of financing, and that a lot of formidable companies (Apple, Microsoft) were founded when the economy was not exactly booming.

His position is logical of course, but it is interesting to note that a sound startup that is cash-positive on day one, or early enough, does not require venture capital. Neither Apple nor Microsoft had VCs as their shareholders when they went public, and the founders still owned the vast majority of the stock at IPO - something that happens very rarely today. I suspect - although I did not check - that the same is true for Sun, Oracle, Intel, SAP, and most other companies of the pre-dot-com generation.

And of course most of the crap VCs bring to market these days is companies without a revenue model (with the business based on attracting eyeballs, not dollars), and they get sold either to the unsuspecting public, or to a different company (i.e. the greater fool) before they make their first dime. No matter - one could always count on a greater fool being found, and separated from their money.

The more entertaining read, however, was not Graham's article itself, but the reaction it caused among the Silicon Valley "businessmen" (http://uncov.com/paul-graham-shut-your-face).

I was surprised by the level of vitriol, before realizing that the world of the 20-something founders must be crumbling: indeed, the immediate future is probably going to be tough for the Intels and Microsofts of the world, but for the "eyeballs"-centered world of the Facebook app vendors it will be nothing short of a mass extinction event.

People keep paying for the tools through the recessions, but the marketing budget - and with it, the Facebook ad revenue - gets cut first. Watch Google earnings for the next couple of quarters...

Anyway, Graham's post and the reactions to it are things of the not so distant past. The reason I remembered them is that today I read another article, this time by Thomas Friedman (http://www.nytimes.com/2008/10/26/opinion/26friedman.html), in which he argues for the government to play a passive role in the banks in which it takes ownership. The question he poses is - would a government official give Larry and Sergey a loan to found Google?

As he often is on economic issues, Friedman is full of it. The whole reason we're having a crisis is that the banks that are part of the country's monetary system are also engaged in extremely risky speculative activities, such as venture financing, derivatives trading, and so on. Specifically - because they are currently lending money to Larry and Sergey.

This was not always the case - before 1999 banks could not do any of these high-risk activities, which meant that the depositors' money - and the country's monetary system - were safe. Venture financing, complex financial instruments, and insurance were handled by organizations that had these functions - not banking - in their charter. All VCs could go out of business simultaneously, and nobody - outside the startup world - would have noticed.

Incidentally, this regulatory framework was introduced during the Great Depression, as a reaction to activities that caused banks to fail during that era.

The Gramm-Leach-Bliley Act of 1999 - heavily lobbied for by Greenspan, and written by McCain's former economic advisor - allowed banks to enter the businesses that were previously verboten. As a side effect, it poured a lot of money into the .COM startup market, and later into the subprime mortgages, and thus precipitated the current collapse.

So of course government officials should not be deciding whether to lend money to a young startup - because a bank should never be lending money to uncollateralized ventures in the first place. This is the business of a venture financing firm, an angel investor, the founders themselves, but not a bank.

A bank should be - and for most of recent history it was - an extension of the Fed, and of course it is perfectly reasonable for a government official to sit on something so close to the Fed and make decisions that impact the country's monetary system!

Saturday, October 25, 2008

Wassup 2008

Titanic

An ad in Micronews (Microsoft's internal newspaper) circa 1998: "Titanic for sale. Tape 1 watched once. Tape 2 never watched."

Movies to see

Since I've been mostly critical of the movies I've seen recently, I figure a counterbalancing post listing the movies I like is long overdue. Otherwise I am starting to sound like a curmudgeon.

So, in no particular order...

Fantasy


Comedy


Comedy/Drama/Romance - all in one


Animation


Adventures


Horror


War


Epics


Kurosawa


Sergio Leone


Teleseries


Russian Specifics (could be hard to understand if you were not born there before Perestroika)

Religulous: A review

We watched Bill Maher's "Religulous" this Friday. The movie had a few entertaining moments but overall it was a little bit heavy on Bill's own persona. Maher is great at editorializing, but this was a two-hour movie - and it felt like over half of it was about Bill himself.

I wish he had let his subjects talk more - I have a hunch that what they would have said then would be a lot funnier.

As it was, most of the interviews followed this template:
Interviewee: <some insanely crazy stuff>
Bill Maher: "You can't seriously believe this crap..."
Interviewee: Yes I can...
Bill Maher: C'mon... You can't.

The one moment I really enjoyed was the interview with the senator from Arkansas, Mark Pryor, where he says that one doesn't have to pass an IQ test to get into the Senate. The absolutely priceless part is watching him right after he says it, and seeing how he slowly realizes what he'd just said, as the smile fades away from his face...


The other highlight was a few quotes from the Founders, to illustrate what they actually thought about Christianity (that the United States of America was founded as a "Christian nation" has long been the working theory of the right wing, and a platform plank of the Texas Republican Party).

Here are the quotes:

Lighthouses are more useful than churches.
-- Benjamin Franklin


Christianity is the most perverted system
that ever shone on man.
-- Thomas Jefferson


This would be the best of all possible worlds,
if there were no religion in it.
-- John Adams


Of course, the movie will not win any converts. The people who believe in the "talking snake" will probably never even see it. But as far as the entertainment value for the people who don't - I would say wait for the DVD.

Thursday, October 23, 2008

When the Democrats are in power, their first action should be to break up Fox News...

AMEX Centurion

http://blogs.moneycentral.msn.com/smartspending/archive/2008/10/22/all-about-the-mysterious-black-card.aspx

So apparently there are people who, while paying a $2500 yearly membership fee for a credit card, do care about the reward points accrued when they rent a limo...

"Let's hope we are all wealthy and retired by the time this house of cards falters..."

Oct. 22 (Bloomberg) -- Employees at Moody's Investors Service and Standard & Poor's privately questioned the value of some mortgage-backed securities that were given creditworthy ratings, saying they created a ``monster,'' according to e-mails released by a U.S. House panel.

http://www.bloomberg.com/apps/news?pid=20601087&sid=a2EMlP5s7iM0&refer=worldwide

Dear Red States...

"We've decided we're leaving. We intend to form our own country, and
we're taking the other Blue States with us."

http://www.craigslist.org/about/best/sfo/80714812.html

Wednesday, October 22, 2008

Interviewing for fun and profit

I've been interviewing a lot lately - starting the Windows Home Server project, helping build the Servers and Tools division in China, serving on the Seattle hiring committee at Google, and now building my new team. All this meant spending anywhere from 2-3 hours a week at Google to 7-8 hours a day in China on recruiting activities.

From Seattle to Lahore, from the Google office in Fremont to the Microsoft campus in Shanghai, the candidates that end up getting offers look very similar. This post is about how to be that kind of a candidate.

Getting a job at a good software development company is like being a genius - it is 1% inspiration, and 99% perspiration. This means a lot of work, the vast majority of which must happen before the event. The good news is that this work is portable - once done, it opens a lot of doors, and puts you in a position where you choose where you want to work, not the other way around.

So without much ado, let's delve into the gory details.

Prework - long term


All interviewers look for basically the same thing - candidates that are "smart and get things done". There is no good way to determine one's intelligence, so we use an approximation. We ask the interviewees to solve complex algorithmic problems on the whiteboard. Luckily, unlike intelligence, this ability can be trained.

Unfortunately, most software development jobs, with very rare exceptions, do not provide on-the-job training for algorithmic skills. A colleague of mine says that there are two types of developers - people who can call APIs, and people who can build APIs. The vast majority of the jobs in this industry are about calling APIs. The most highly paid jobs, however, are about building them. Most teams at Microsoft, Google, and the like are looking for people of the second kind.

If you are not already working on a team that builds APIs, you need to learn to love computer science as science. Get interested in algorithm design. Learn about computer architecture. Start paying attention to how the functions you're calling are implemented.

Do you know what's actually inside the HashMap class you've been using all the time? What algorithm does STL use for sorting? Why? When is heap sort faster than merge sort? What does a modern computer really do when reading a byte out of memory?

If you live near a good university, take a class every few years to keep a connection with the scientific side of software development.

In an earlier article (http://1-800-magic.blogspot.com/2007/12/recipe-for-getting-employed-by-google.html) I claimed that reading and doing the exercises in three books will get you a job in any good company. I still believe this to be the case - but you do need to actually read these books, and do the exercises. Just buying and putting them on the shelf will not work!

The "getting things done" part during the interview is usually proxied by having the candidate write a lot of code on the white board. The better the code looks - syntactically correct, concise, professional - the more likely it is that the candidate has written a lot of code in his or her career, and has therefore got more things done.

One can see the difference when doing something as simple as copying a string:

while(*d++ = *s++) // Good!
    ;
vs.
// Does the same job,
// but too much text!
int i;
for(i = 0; s[i] != '\0'; ++i)
{
    d[i] = s[i];
}
d[i] = s[i];
(This is where STL with its "string s1 = s2;" really gets you :-).)
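As a minimal sketch (not part of the original whiteboard exercise), the two variants above can be wrapped into functions and checked against each other. The function names and the 64-byte buffer in copies_agree are assumptions made for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Terse version: the value of the assignment is the character just
   copied, so the loop ends right after the terminating '\0' is copied. */
void copy_terse(char *d, const char *s)
{
    assert(d != NULL && s != NULL);
    while ((*d++ = *s++) != '\0')
        ;
}

/* Verbose version: copy up to the terminator, then copy the terminator. */
void copy_verbose(char *d, const char *s)
{
    size_t i;
    assert(d != NULL && s != NULL);
    for (i = 0; s[i] != '\0'; ++i) {
        d[i] = s[i];
    }
    d[i] = s[i];
}

/* Returns 1 if both versions reproduce s exactly, 0 otherwise.
   The 64-byte buffers are an arbitrary limit chosen for this sketch. */
int copies_agree(const char *s)
{
    char a[64], b[64];
    assert(strlen(s) < sizeof a);
    copy_terse(a, s);
    copy_verbose(b, s);
    return strcmp(a, b) == 0 && strcmp(a, s) == 0;
}
```

Calling copies_agree on an empty string, a one-character string, and a longer string covers the interesting boundary cases for this loop.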

The only way to good code is to practice writing - a lot! Jump into the parts of your project that generate as much code as possible. If you're a developer, and you're not writing at least 20Kloc/year, your coding muscle does not get enough exercise. Get a coding project at home - write a piece of automation software for your light switches!

Prework - short term


The above suggestions are designed to get and keep you in shape as a software developer. They maintain the set of skills (AKA coding) that makes you employable. However, even great software engineers often do not get the job they want because there are other things that are required for a match.

I've been asking every person whom I interviewed for various management positions at Microsoft what they are looking for when interviewing people. Every time, passion for technology came out very close to the top. What this means is that the ideal candidate (1) wants to work in this general area, and (2) wants this job.

As an interviewee, you must have plausible answers for both.

The best (and the only) proof that you want to work in this general area is to demonstrate that you invested time to research deeply the technologies/customers/competitive landscape/standards/other subcurrents that shape this part of the industry. Buzzwords are not enough - deep knowledge to an extent where one knows enough to form an educated opinion is required. I've seen many a qualified candidate being rejected because they gave standard, cookie-cutter answers that the interviewer considered too shallow.

SOAP is a great technology? Why? Because it uses XML? And why is that a plus?

The best (and the only) proof that you want this job is that you've spent time to learn about it. A lot of information is available publicly. For example, if you're interviewing for a position of a program manager at Microsoft, it helps to know what this job really is.

Whenever I go on college recruiting trips for Microsoft, the most amusing thing is interviewing candidates for program management positions. A program manager's job does not actually involve management, nor is it any more senior compared to test and dev disciplines. Both of these aspects are described in detail both on the Microsoft recruiting web sites and on the application for the college interviews. Yet 90% of the college candidates that I interviewed on campus assumed that (1) they would be managing people, and (2) right out of school they were entitled to do this.

If it's an internal transfer within the same company, more often than not most of the information about the project is available. I once had a case where a candidate knew more about the history of the project, the market, and the competition - without ever having worked on it - than I did. Needless to say, she got the job :-).

More often than not, recruiters and interviewers will go out of their way to tell you directly what will be on the interview. An email that I got from Google when I interviewed there listed all the areas where the interviews focused. Microsoft recruiters give out no fewer details. I tell every candidate, before meeting them, which areas the interview will cover. You have no idea how many people completely ignore the advice!

If, for example, it's written on the web site, in the recruiter email, and in the interview packet that THE INTERVIEW WILL INVOLVE WRITING CODE ON THE WHITE BOARD, and when you show up at the company and are asked to write code on the white board you say that your coding is "rusty", what outcome do you expect? It's a double-whammy: not only do you not match an explicit qualification for the position, you have wasted the company's time and money by coming to the interview, because you apparently have not even bothered to read the description of the job!

Do prepare good, deep questions about things that you should want to know, but cannot. Avoid cliches - believe me, "Describe your typical day as a developer at X" does you no good. This question has been asked many times. Too many. The only reaction the interviewer has to a question like this - "Oh, here we go again..."

The day of the interview


Don't be late. It's unprofessional. Plus it cuts into the time that is available for you to solve the problem :-).

Don't use pseudocode, unless the problem requires it (many graph algorithms do). If you're a great coder, you should be writing in a real computer language with ease. Plus we warned you that (1) we're looking for great coders, and (2) you will be coding on a white board.

Avoid syntax errors when coding. Well-written, well-formatted code is classy and shows experience.

Do be VERY detail-oriented. Check and recheck what you're writing. Do not shrug off the "off-by-one" errors. If you're "always messing up some mundane detail" ((c) Office Space), you should not be in this trade. Messing up mundane details has in the past caused spaceships to miss their destination, to say nothing of expensive product recalls.

Do listen to what the interviewer says. This cannot be overemphasized! They are here to give you cues, to help you. Most of the time they genuinely want you to succeed. They know their problems. If an interviewer makes a suggestion, TAKE IT. (S)he knows!

Do test your code. After the code has been written, think about a small but exhaustive set of test inputs that would exercise all the code paths in your function, and elicit all the differences in behavior. For example, if you're writing a binary search, test it on inputs with odd and even numbers of elements. An algorithm that takes an array should work well on inputs of size 0, 1, 2 and 3. Test the inputs that might cause overflows and out-of-bounds conditions.
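To make the testing advice concrete, here is a small sketch of a binary search together with a check that runs it on arrays of size 0, 1, 2 and 3. The function names, the half-open search range, and the choice of even numbers as test data are my assumptions for this example, not a prescribed interview answer.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of key in the sorted array a[0..n-1], or -1 if absent. */
long binary_search(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;                /* search the half-open range [lo, hi) */
    assert(a != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;  /* cannot overflow, unlike (lo + hi) / 2 */
        if (a[mid] < key)
            lo = mid + 1;
        else if (a[mid] > key)
            hi = mid;
        else
            return (long)mid;
    }
    return -1;
}

/* Exercises sizes 0, 1, 2 and 3 (both odd and even lengths): every stored
   element must be found at its own index, and every value that falls
   before, between, or after the elements must be reported as absent.
   Returns 1 if all checks pass, 0 otherwise. */
int check_small_sizes(void)
{
    const int a[3] = {0, 2, 4};
    size_t n, i;
    int k;
    for (n = 0; n <= 3; ++n) {
        for (i = 0; i < n; ++i)
            if (binary_search(a, n, a[i]) != (long)i)
                return 0;
        for (k = -1; k <= 5; k += 2)      /* odd keys are never stored */
            if (binary_search(a, n, k) != -1)
                return 0;
    }
    return 1;
}
```

The midpoint is computed as lo + (hi - lo) / 2 because the more obvious (lo + hi) / 2 is exactly the kind of overflow that the advice above asks you to look for.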

Do step through the code as written, not as you think it should work. This is the most frequent error that I have been observing - a person would look at the data input and then reason through it with the algorithm (s)he had in mind - but not the one on the white board. Pretend you're a CPU instead. Write a sample on the board, create a column for every variable, and start iterating - mechanically.

Be humble. If you're saying "I am really good at X", this is a challenge to the interviewer. They will give you harder problems, give you less help, and treat the failure more harshly because you behaved as an arrogant braggart. It's much better to underpromise and overdeliver than the other way around.

And last but not least, be enthusiastic. Ask questions. Offer suggestions. Demonstrate that YOU WANT THIS JOB.

Speaking of wanting the job.

It helps to set up multiple interviews when you're thinking about changing jobs, and sequence them such that the more desirable jobs go last. Interviewing is a frame of mind, and it helps to put oneself into the right state. But on the day you go to the interview - even if you aren't 100% sure this is the job you'll take in the end - on this day you have only one goal - to get this job. WANT IT!

Happy interviewing!
-----

By the way - we are hiring!

http://1-800-magic.blogspot.com/2008/07/looking-for-great-devs.html

Tuesday, October 21, 2008

Joe the Millionaire

The stupidity of McCain's "Joe the Plumber"-dominated campaign - and of the people who fall for it - is surreal.

Under Obama's plan, the taxes are raised for people with $250k AGI (adjusted gross income).

For the small business owner to pay him- or herself $250k in salary, the small business must bring at least as much in profit - assuming no reinvestment in the business growth, and no other share- or bondholders.

At a very conservative P/E of 10, this means that the business would be worth $2.5M.

With $2.5M in assets, that would make Joe the Plumber a multimillionaire - exactly the constituency to which the Republican party caters. Well, that and of course the stupid people who take McCain's claims at face value...

So... is this really better than Soviet occupation?

KABUL, Afghanistan — An appeals court sentenced a young Afghan journalist to 20 years in prison for blasphemy on Tuesday, overturning a death sentence ordered by a provincial court but raising further concerns of judicial propriety in the case.

The defendant, Sayed Parwiz Kambakhsh, 23, was a journalism student in the northern city of Mazar-i-Sharif and worked for a daily newspaper there. He was arrested last October and accused of printing and distributing an article from the Internet about Islam and women’s rights, on which he had written some comments about the Prophet Muhammad’s failings on that issue.

(the emphasis is mine)

http://www.nytimes.com/2008/10/22/world/asia/22afghan.html?hp

I guess... "freedom" is messy? ((C) Rumsfeld)

Monday, October 20, 2008

McCain and the witchcraft!

Surprised when McCain looks "confused and like an idiot"? It's the witchcraft, of course!



'The occultists are "weaving lazy 8's around McCain's mind to make him look confused and like an idiot". Bree K. said we need to break these curses off of him that are being sent from Kenya.'

http://www.injesus.com/index.php?module=message&task=view&MID=CB007FA2&GroupID=2A004N9G&label=&paging=all

Sunday, October 19, 2008

How to build a server for your house

Five years ago there came a moment in my life when I realized that I was being overwhelmed by the amount of data in our household.

We had just bought a digital camera, and started accumulating tons of digital images that needed reliable storage. There was also software on CDs that kept getting lost; various random bits of data - code, text, Word documents - that spanned the last ten years, and occasionally required retrieving; and mp3 files.

Meanwhile, the number of computers in the household kept increasing. My wife and I had laptops; my daughters had their own PCs. All were networked using a Windows CE-based router which was also a print, web, and a small file server.

After a brief attempt at using USB drives, I quickly came to the conclusion that they are not the answer. There were several problems.

First, it is impossible to say if the data on a drive is good. There were bit errors introduced by copying. A bad sector in the middle of a file would suddenly make it unreadable, but you wouldn't know until you tried. And of course occasionally, the whole drive would go bad. At least the latter problem is easier to discover, if not fix.

Second, I ended up with lots of them, fast. Because I wanted to protect against hardware failures, I would usually create multiple copies of the data. Then I would forget where the data were, and how many copies I had. Most of the time I would fail to update all copies. Towards the end I started treating disks as write-once media, creating a new backup every time I remembered to do it, on the theory that one of the copies would survive a disk/sector/bit error.

Finally, USB drives are slow. I don't know where they got the 480 megabit figure - I was getting 10 megabytes per second (100 megabits, if you account for the transport overhead), and even that only when the disk was directly connected to a PC, and there was only one disk. Plug a couple of them into a USB hub, and the transfer rate quickly drops by half.

So when Microsoft shipped the Small Business Server 2003 and gave a copy of it to every employee of the Windows team, I pounced on the opportunity, bought a second-hand workstation from one of the PC-Recycle shops that sprang up around Microsoft, and built my first home server.

It turned out that building various servers for homes quickly became my hobby. Over the next five years I tried almost every possible variant. From dirt cheap Fry's motherboard/CPU combinations to super-expensive Xeon/Supermicro performance monsters. From SBS 2003 to Server 2008. I've even helped start (as the very first developer, and then dev manager) the Windows Home Server project.

All in all, between home and work I've built perhaps 20 servers. I have 4 of them in my basement right now, totaling roughly 10TB of usable protected storage, doing my web site and email hosting.

The write-up below documents what I learned as a result.

The only caveat is that the software section is entirely Windows-based. Since I work at Microsoft, I can buy our own software at drastically discounted prices at the company store. If I didn't, perhaps there would be an incentive for me to look at what's available in the Linux world.

Or maybe not - one can buy SBS 2003 for around $300 these days, and this comes with Exchange, domain controller, file server, and everything else in one package conveniently wrapped in one installer. Would it be worth it for me to learn Linux to avoid spending $300? I don't know. (http://1-800-magic.blogspot.com/2008/10/linux-vs-developers.html).

Anyway, let's get to the meat!

Hardware


There are three priorities that you should follow when building a home server.

First, it must be reliable.

Unlike a personal computer, a home server is always on, and is rebooted extremely infrequently. As a result, it is prone to weird errors that are usually invisible on a typical personal computer - such as spurious bit flips in memory, and during data transfers.

When my home servers were cheap Fry's motherboard/CPU combos, they would usually hang every month or so, and would produce a bit flip in data transfers - usually one per 100-200GB of files copied. It was then that I got into the habit of diffing the source and destination every time I copied files around - and in many cases I discovered that I was lucky I checked.
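The post does not say which tool was used for diffing. As a rough illustration of the idea only, the following sketch compares two files byte for byte; the function name and the 4096-byte buffer size are arbitrary choices, and it assumes regular files on disk (where a short read means end of file).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the two files have identical contents, 0 if they differ,
   and -1 if either file cannot be opened or a read error occurs. */
int files_identical(const char *path_a, const char *path_b)
{
    unsigned char buf_a[4096], buf_b[4096];
    FILE *fa, *fb;
    int result = 1;

    assert(path_a != NULL && path_b != NULL);
    fa = fopen(path_a, "rb");
    if (fa == NULL)
        return -1;
    fb = fopen(path_b, "rb");
    if (fb == NULL) {
        fclose(fa);
        return -1;
    }

    for (;;) {
        size_t na = fread(buf_a, 1, sizeof buf_a, fa);
        size_t nb = fread(buf_b, 1, sizeof buf_b, fb);
        if (ferror(fa) || ferror(fb)) {
            result = -1;                 /* I/O error while reading */
            break;
        }
        if (na != nb || memcmp(buf_a, buf_b, na) != 0) {
            result = 0;                  /* different length or content */
            break;
        }
        if (na == 0)
            break;                       /* both files ended together */
    }

    fclose(fa);
    fclose(fb);
    return result;
}
```

A single flipped bit anywhere in the copy makes memcmp report a mismatch, which is exactly the failure described above.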

Therefore, a server motherboard and ECC memory are absolutely required. Say no to cheap desktop hardware - in most cases the motherboard should be in the $200+ range, and make sure it does take ECC memory before buying it.

http://www.newegg.com/Store/SubCategory.aspx?SubCategory=302&name=Server-Motherboards

Second, it must be able to accommodate many internal hard drives.

While you can add external drives, there are several problems with that. First, external drives usually cannot be combined in fault-tolerant arrays, which is the option you will most certainly want (there will be more discussion below). Second, they make the system fragile - they fall off, the cables spuriously disconnect. Third, they create an unsightly - and more importantly, hard to navigate - mess of cables and power supplies that makes the server impossible to move.

Definitely get a full tower case - not a mini- or mid-tower. Make sure that there are fans near the internal 3.5" hard drive bays, so the drives are properly cooled. Check the 5.25" bays - it is best if there are no rails in these slots. Smooth walls where the equipment is held in place only by the screws are the most versatile: many SATA enclosures do not have grooves and won't fit into multiple 5.25" bays without modifying the case.

Do buy the SATA enclosure that fits the case - you will want a minimum of 11 drives per server - and there are no cases with 11 internal drive bays.

For example: http://www.newegg.com/Product/Product.aspx?Item=N82E16817119404, or http://www.newegg.com/Product/Product.aspx?Item=N82E16816119014 (but note the absence of the grooves on the second one).

Also, hard drives are power-hungry, especially on boot when they spin up. The power consumption of my server (which has 11 drives) jumps to 500 watts on boot, before receding to around 270 watts in steady state. If 500 watts does not seem like much, consider that much of the power is consumed on the 12V rails, whereas many modern power supplies are optimized for the maximum wattage on the 5V rail - the video card.

A 750W PSU is barely enough for 9 drives - get a minimum of 850W for 11, and 1200W if you plan to have 16. Do make sure that enough power is available on the 12V rails - allow 40W/drive.

And do get a brand name: this tends to be one of the more important components for the reliability of the system. I used CoolMax and Thermaltake and they were quite reliable.
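The 40W/drive rule of thumb is easy to turn into a quick check. This is only a sketch: the function names are made up for illustration, and it looks at the drives alone, ignoring whatever the CPU, motherboard, and fans also draw from the 12V rails.

```c
#include <assert.h>

/* Rule of thumb from the text: budget 40 W of 12V power per drive. */
#define WATTS_12V_PER_DRIVE 40

/* Returns the 12V power, in watts, to budget for the given number of drives. */
int drive_12v_budget_watts(int drives)
{
    assert(drives >= 0);
    return drives * WATTS_12V_PER_DRIVE;
}

/* Returns 1 if the PSU's combined 12V capacity (read off its label)
   covers the drive budget, 0 otherwise. */
int psu_12v_is_enough(int psu_12v_watts, int drives)
{
    return psu_12v_watts >= drive_12v_budget_watts(drives);
}
```

For 11 drives this gives a 440W budget on the 12V rails, and 640W for 16 drives, before counting any other component.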

Third, the server should be QUIET and power-efficient.

These two things go together.

Drives generate a lot of heat, and heat takes many fans to dissipate. Pick the quietest fans you can find (12cm or above) that are still powerful: if the temperature rises inside the case, the CPU fan (which is small, and therefore requires more RPM) tends to sound like a rocket engine. Even if the server is in the basement, you probably do not want it to be the most audibly prominent thing there. And forget the closet, unless it has proper ventilation :-).

Do get a CPU with decent power management. Currently, 3000-series Xeons seem to have a decent tradeoff between the power consumption and the compute power.

Also keep in mind that a big server might dissipate ~250-300W of heat. This is equivalent to 1/2 of a portable oil heater, operated on low. And it's always on. Check the electricity prices in your neighborhood :-).
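For a rough idea of what "always on" costs, the arithmetic can be sketched as below. The function names are hypothetical, and the electricity price is a parameter you have to supply yourself; no particular rate is implied by the post.

```c
#include <assert.h>

/* Energy used in one year by a device drawing a constant load, in kWh
   (24 hours a day, 365 days a year). */
double annual_kwh(double watts)
{
    assert(watts >= 0.0);
    return watts * 24.0 * 365.0 / 1000.0;
}

/* Yearly cost of that load at the given price per kWh
   (the price is whatever your utility charges; it is an input here). */
double annual_cost(double watts, double price_per_kwh)
{
    assert(price_per_kwh >= 0.0);
    return annual_kwh(watts) * price_per_kwh;
}
```

A steady 270W load, the steady-state figure mentioned earlier, works out to roughly 2365 kWh per year.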

What is less important?
Rejoice - there ARE a few things you do NOT need to care about. They are usually the things you care most when building a desktop machine.

Unless you plan to run a virtual farm, CPU computing power is not important. But CPU I/O bandwidth is. Most desktop processors optimize for the former. Many server processors optimize for the latter.

As of this writing, I would recommend the Xeon 3xxx series. They are relatively cheap, have great power efficiency, and more than enough juice to run any Microsoft server OS. They do not support dual-processor configurations - but two cores are plenty for serving files.

Obviously, you don't care about graphics - a built-in video adapter is available on most server motherboards, and it's more than enough.

More controversially, I do not recommend spending money on hardware-based RAID controllers.

(A short aside about RAID.

There are 3 most popular modes of combining multiple hard drives into one storage array for speed and/or reliability. All three can be accomplished in software - by the storage driver doing the work - or in hardware - by the controller card doing the work.

RAID0 places half of the data on one hard drive, and half on another, in stripes. As a result, you are writing your data on two disks simultaneously and effectively at twice the speed. But if any one drive fails, all your data is gone - the probability of the array failure is roughly twice the probability of a failure of a single drive. Because the task can be accomplished by scheduling two DMA transfers in parallel, the best hardware is not any faster than a software implementation.

RAID1 places identical copies of the data on both hard drives. If one fails, the other survives, and the array can be repaired by replacing the failed drive with a new one. RAID1 arrays are very reliable - but you pay twice for protecting the data. Hardware and software speeds are the same for the same reason as above.

The best way to visualize RAID5 is to imagine a system of 9 drives where the first 8 store 1 bit of every byte, and the 9th stores the XOR of all the bits. If any one of the data drives fails, its contents can be regenerated from the other 7 and the XORs (and if the XOR drive fails, the mask can be regenerated again from the data). Note that excluding the cost of generating a mask, such a system could be 8 times faster in transferring the data. This is not how the RAID5 system really works - the data is striped, and XOR is computed from the corresponding stripes - but it's close enough. Real RAID5 arrays can consist of any number of disks greater than two - N-1 drives store the data, and the Nth drive stores the XORs (again, it's more complicated in reality, but we'll skip the gory details).

RAID5 protects data for much less than RAID1, because out of N drives only one is spent on redundancy. For this reason alone you will want to use RAID5 in your home server. Given a powerful CPU, a good software implementation could give RAID5 hardware a run for the money. As we will see below, the quality of software RAID5 stack in Windows Server varies.)
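The XOR idea from the aside can be sketched in a few lines. This is a simplified model, not how any real RAID5 driver is written: the "disks" are small in-memory buffers, parity sits in a separate buffer rather than being rotated across drives, and all names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* XORs the corresponding bytes of ndisks data blocks into parity. */
void compute_parity(const unsigned char *const *disks, size_t ndisks,
                    size_t block_len, unsigned char *parity)
{
    size_t d, i;
    memset(parity, 0, block_len);
    for (d = 0; d < ndisks; ++d)
        for (i = 0; i < block_len; ++i)
            parity[i] ^= disks[d][i];
}

/* Rebuilds the block of one failed disk from the surviving disks and the
   parity block. The failed disk's own buffer is never read. */
void rebuild_block(const unsigned char *const *disks, size_t ndisks,
                   size_t failed, const unsigned char *parity,
                   size_t block_len, unsigned char *out)
{
    size_t d, i;
    assert(failed < ndisks);
    memcpy(out, parity, block_len);
    for (d = 0; d < ndisks; ++d) {
        if (d == failed)
            continue;
        for (i = 0; i < block_len; ++i)
            out[i] ^= disks[d][i];
    }
}

/* Demo: four 8-byte data "disks"; fail each in turn and verify that its
   contents can be regenerated. Returns 1 on success, 0 on any mismatch. */
int parity_roundtrip_ok(void)
{
    unsigned char d0[8] = "disk0..";
    unsigned char d1[8] = "DISK-1!";
    unsigned char d2[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    unsigned char d3[8] = {0};
    const unsigned char *disks[4] = {d0, d1, d2, d3};
    unsigned char parity[8], rebuilt[8];
    size_t failed;

    compute_parity(disks, 4, sizeof parity, parity);
    for (failed = 0; failed < 4; ++failed) {
        rebuild_block(disks, 4, failed, parity, sizeof rebuilt, rebuilt);
        if (memcmp(rebuilt, disks[failed], sizeof rebuilt) != 0)
            return 0;
    }
    return 1;
}
```

The rebuild works because XOR-ing a value in twice cancels it out: parity with every surviving block XOR-ed back in leaves only the missing block.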


All versions of Windows Server support software RAID5. I am sure Linux supports it as well. Note that if your RAID controller is cheaper than $400, chances are that its RAID5 implementation is in software, not hardware, anyway.

The most important consideration about hardware RAID is that it protects you from a disk failure, but not from the failure of the controller itself. Because the format of the data on hard disks managed by hardware RAID depends on the RAID card, the RAID controller cannot be substituted for a different model. So if your controller dies and you can't get the exact replacement - which is very likely if it dies after a few years - so does your data.

For this reason I prefer using software RAID - it is standard, it is hardware independent, and if your computer dies, you can put the disks in another computer and the array will be recognized.

I typically have 2 arrays of 5 disks each per server, plus one small (200GB) system disk.

Speaking of disks - many smaller disks give better reliability and better price/performance than a few very big, expensive ones. I usually go for whatever is available around $100, which is typically two sizes below the current maximum.

For example, as of this writing the biggest available disk is 1.5TB. 1TB retails for ~$150, and 750GB drives can be had for ~$90. 750GB would be what I'd use. For just below $1000 in drives, you'd get 6TB of usable protected space (10 drives in two 5-disk RAID5 arrays, 3TB of usable storage per array).
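The arithmetic is easy to redo for whatever disk sizes and prices are current when you read this. A small sketch in Python; the function name is my own, and the only inputs are the approximate figures quoted above:

```python
def raid5_usable_gb(disks_per_array, disk_size_gb, arrays=1):
    """Usable space: one disk's worth of each array goes to parity."""
    if disks_per_array < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return arrays * (disks_per_array - 1) * disk_size_gb


disk_size_gb = 750
price_per_disk = 90  # approximate price quoted above, in dollars
arrays, disks_per_array = 2, 5

usable_gb = raid5_usable_gb(disks_per_array, disk_size_gb, arrays)
total_cost = arrays * disks_per_array * price_per_disk
print(usable_gb, total_cost)  # 6000 900
```

By the same arithmetic, mirroring the same ten disks in RAID1 pairs would leave only 5 x 750GB = 3.75TB usable.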

The memory. You don't need to stuff your computer with memory. For any reasonable home usage, 2-4GB is enough. For a corporate environment where hundreds of people may be using the system concurrently, this is obviously not true, but for home, it is really enough.

So in short, hardware-wise, the rules are simple: big case, big power supply, server motherboard, 2-4GB of ECC memory, one midrange Xeon CPU, and a simple, run-of-the-mill SATA controller card, with as many cheap disks as possible in software RAID5 arrays.

Software


Again, this only covers the Windows options. I do not know much about what's available on Linux.

Windows Server 2008


Pro:

  • Works really well as a file server.

  • RAID5 implementation is very good.

  • Lots of useful data-management features.

  • Fast.

  • Secure.

  • Available to current MSDN subscribers at 0 incremental cost. You keep your license key even if your subscription ends.


Con:

  • If you don't work at MSFT, it's $800, plus CALs.



Windows Server 2003


Pro:

  • Cheaper than Server 2008 - can be had for $500.


Con:

  • Software RAID5 implementation is crappy. Expect ~15MBps throughput.

  • $500 is still a lot of money.



Windows Home Server


Full disclosure: I was the first dev and then the dev manager for the v1 of WHS.

This is an interesting product, and its target market is, well, the homes in need of a server. It is relatively cheap ($140 at Newegg, although it is an OEM version, which means that once registered, it is tied to a motherboard). It comes with a great backup solution, a UI custom-made for home server scenarios, and a nice remote access portal.

Its weakness is the storage system.

It is a conceptual variant of RAID1 - you can have some of your files duplicated. If you have a lot of data though, you're paying twice for its protection - for me, it would not just be prohibitively expensive, it would be physically impossible to cram that many disks into the server chassis. The recommended way to expand Windows Home Server is via external USB disks, but (see above) I do not recommend doing it because it makes the physical installation fragile.

You can use native Server 2003 software RAID5, but because the UI and all other subsystems in Windows Home Server are tied to its custom storage management solution, you would then have to keep these disks on the side and leave them unconfigured in the Windows Home Server UI. You would not be able to put backups on them.

To do this, install Windows Home Server on a computer with a single big hard drive. Do not create any shares, and do not put any data on the shares created for the users. Then, AFTER the installation, add more disks, and configure them in RAID5 arrays using the native Server 2003 UI. Leave these disks unconfigured in the Windows Home Server UI. You will also need to use the native UI to share them out.

Another way to use Windows Home Server is by putting it on one big, hardware-managed RAID5 system. You do have to pay more for hardware and be mindful of possible controller failure, but given the substandard performance of 2003 software RAID, you will win on performance.

A relatively small disadvantage of Windows Home Server compared to other options is its inability to join a domain, or be a domain controller. This had to be done to justify the much lower price.

Small Business Server 2003


This is the product with potentially the best price/performance ratio that has ever come out of Microsoft. It is a full implementation of Server 2003 with Exchange server, Sharepoint, and a nice remote access portal, and you can buy a retail version of it for a mere $300 (go to www.pricegrabber.com; search for Small Business Server 2003 Standard; be careful - a lot of options are CAL packs, not the software itself; find the RETAIL version - OEM copies are only marginally cheaper, but tied to the motherboard once you register them).

It comes with an installer that sets it up as a domain controller, and configures its components. If a PC has two network interfaces, it can even be used as a router! It serves as a DHCP server, and has a DNS proxy, and even a VPN server built in, so you can access your home network from the outside.

And yes, you do get a real domain for your home. Your user accounts and passwords are now centralized, and any user can log in to any machine in the household using the same name and password. It's really cool!

In addition, you can host your own web site, your own email, share the calendar with the family members (Microsoft Outlook is included), etc. Highly recommended!

Two gotchas.

First, do not put any data on the system hard disk (the whole disk, not just the partition): the write cache on that disk is turned off, so accesses are SLOOOOOOOW. This is because of the Active Directory database.

Second, SBS is restricted to being a domain controller: it cannot join any other domain.

Processes


I use the following relatively simple rules to ensure that my data is healthy.

  1. I ALWAYS diff source and destination on multi-hundred-gigabyte data transfers. Bit flips do happen when you're dealing with a lot of data. A single bit flip can render a ZIP file useless.

  2. I keep all data on RAID5 arrays. I keep multiple copies of truly irreplaceable data, such as digital pictures, videos, and documents. There is one server for such data where two RAID5 arrays are mirrored by hand (and are periodically diffed).

  3. Once every few months, I back up the irreplaceable data to a large drive and store it offsite.

  4. Truth is always on servers. All other computers may periodically keep a cache, but all modifications are regularly transferred to servers.
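
For rule 1, any tool that compares file contents will do. As an illustration only (this is not the tool I use, and the function names are made up), here is a minimal Python sketch that hashes every file under a source directory and reports the ones that are missing or different at the destination. It checks one direction only: files that exist only at the destination are not reported.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def diff_trees(source, destination):
    """Return relative paths that are missing or differ in destination."""
    source, destination = Path(source), Path(destination)
    mismatches = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = destination / rel
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            mismatches.append(rel.as_posix())
    return mismatches
```

An empty list from diff_trees(source, destination) means every source file arrived intact; anything else is the list of files to copy again.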



If you've made it this far, congratulations! You can build servers for your home for fun and profit, and avoid all the mistakes I've made when building mine :-)!

Linux vs. developers

I work at Microsoft, and from observing the workplace, I am pretty sure that every Microsoft developer is perfectly capable of installing a video card.

When I worked at Google, all my computer configuration tasks were handled by technicians from the internal IT service called the "Tech Stop" (*). They were smart, efficient, and knowledgeable Linux experts, and they were always on hand.

Coming from the Windows world, I of course needed a lot of help. At first, I thought that it was just me, and eventually I would accumulate enough know-how to handle Linux myself.

As time went on, I noticed that most other Googlers relied on the Tech Stop experts as well - for the tasks that I considered trivial, and which any developer should be able to do easily - on Windows.

I started asking around, and discovered that at Google, 80% of all engineers would NOT be able to replace a video card with a different model: that would require configuring X, and that was well beyond the vast majority of people.

Linux: 1, developers: 0 :-).

----
(*) http://blogoscoped.com/archive/2008-09-01-n53.html

Friday, October 17, 2008

So long, and thanks for all the fish!

...from a hedge fund manager who bet against the subprime mortgages...

Worth reading!

http://www.cnbc.com/id/27239479

Saturday, October 11, 2008

More breaking news...

The stock market crashed this week... what's on the reporters' minds?


Friday, Oct 10, 2008 at 3:55 pm PST.

Thursday, October 9, 2008

Why test matters



A picture is worth a thousand words...

http://gizmodo.com/5053734/how-many-google-engineers-does-it-take-to-tell-the-time

How low can it go? A back of an envelope calculation...

With the market pushing the 9000 mark from above, I was wondering how low is really low? Is it 9000? 8000? 4000? So here's my estimate.

The market price has two components - earnings, and the P/E (price to earnings) multiple. The former reflects the factual state of affairs, the latter - consisting of the bizarre goo of expectations, punditry, "animal spirits of the entrepreneurs", and the like - the psychological state of the investors.

P = (P/E) * E

Here's the P/E chart for the last 100+ years:


The recent P/E ratio was hovering around 25 - way below the absolute maximum of the .COM boom, and still below the short period right before the Great Depression, and yet much higher than during the rest of modern history. Just looking at the graph, it seems reasonable to expect that during an economic downturn the P/E ratio could return to a more typical 10-15 range.

Let's say it's 12.

Now what about earnings? A big difference in consumption that I observed while traveling outside the US was this: in the US, stores are full of cheap stuff, and people buy it profusely. For example, the average number of TV sets per US household: 2.24. The average size of household: 2.5. By comparison, prices for consumer goods in Europe are much higher (or at least were much higher until the dollar tanked), and the salaries are lower. So people buy less.

In a slow economy, can a US citizen get by with fewer than 3 TVs? 2 cars? a smaller house? You bet! I can easily see consumer spending dropping by 30% just to eliminate the senseless, excess consumption that does not really improve the quality of life.

Of course a 30% drop in consumer spending will probably result in a similar drop in business investment, etc. But let's say that the overall effect on the economy is a 20% drop in profits, just to be safe.

What we get with these numbers is

(Future Dow) = (Recent Dow) * (Future P/E) * (Future E) / ((Recent P/E) * (Recent E)) = 14000 * 12 * 0.8 / 25 = 5376, or roughly 5000 in round numbers.

Which is to say - it's NOT UNREASONABLE to expect that Dow can go down to 5000 - not that I KNOW that it will really happen. :-).
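
For anyone who wants to plug in their own guesses, the same back-of-the-envelope estimate fits in a few lines of Python. This is only a sketch of the arithmetic above; the function and parameter names are mine, and the inputs are the guesses made in this post, not data.

```python
def projected_index(recent_index, recent_pe, future_pe, earnings_change):
    """Scale the index by the ratio of P/E multiples and of earnings."""
    return recent_index * (future_pe / recent_pe) * (1 + earnings_change)


estimate = projected_index(
    recent_index=14000,     # recent Dow level used above
    recent_pe=25,           # recent P/E multiple
    future_pe=12,           # assumed P/E in a downturn
    earnings_change=-0.20,  # assumed 20% drop in profits
)
print(round(estimate))  # 5376
```

With a future P/E of 10 or 15 instead of 12, the same formula gives roughly 4500 or 6700, so the answer is only as good as the P/E guess.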

Friday, October 3, 2008

Breaking news!

The $700B bill has been approved. The stock market is down 800 in a week. The real estate market goes up in flames. Palin debated Biden yesterday.

But check out the breaking news...




This is from NY Times 10/4/2008 6:50 GMT.

Wednesday, October 1, 2008

What newspapers do you read?

Sarah Palin reads "most of them", according to her CBS interview.



COURIC: And when it comes to establishing your world view, I was curious, what newspapers and magazines did you regularly read, before you were tapped for this, to stay informed and to understand the world?

PALIN: I've read most of them again with a great appreciation for the press for the media, I mean...

COURIC: Like what ones specifically? I'm curious that you...

PALIN: Um, all of 'em, any of 'em that um have been in front of me over all these years, um...

COURIC: Can you name any of them?

PALIN: I have a vast variety of sources where we get our news too. Alaska isn't a foreign country where it's kind of suggested it seems like, wow how could you keep in touch with what the rest of Washington, DC may be thinking and doing, when you live up there in Alaska. Believe me, Alaska is like a microcosm of America.

(Transcription from http://www.americablog.com/2008/09/cbs-palin-cant-name-one-single.html)



Truly Alaska is a microcosm of America!