Sunday, December 30, 2007

America, the land of second (third, fourth) chance

How many times does one have to screw up before people stop taking him or her seriously?

As Prince Igor sings in Borodin's famous opera,

...give me back my freedom, God above,
And let me wipe my shame out on the field!
... - presumably so he can waste another army while pursuing personal glory in the most incompetent way.

A couple of months ago I ran into an article in The Atlantic Monthly by none other than Henry Blodget. The article's title was "The Conscientious Investor".

If you have survived the .com boom (and subsequent bust) in the US, the name Henry Blodget must sound familiar. He was one of the top cheerleaders for THE bubble of our time, and a manifestly dishonest one, at that. "In 2002, Eliot Spitzer published Merrill Lynch e-mails in which Blodget allegedly gave assessments about stocks, which conflicted with what was publicly published. In 2003, he was charged with civil securities fraud by the U.S. Securities and Exchange Commission. He settled without admitting or denying the allegations and was subsequently banned from the securities industry for life."

The word "conscientious" appearing in the same passage with the name "Henry Blodget" is an oxymoron. Yet one of the nation's most respected magazines gives him a pulpit again, to advise on investment strategies.

This is not really an isolated example. David Frum, formerly a Bush speechwriter, now has a comfortable job commenting on Marketplace, otherwise a very respectable market commentary program produced by APM/NPR. Again, a cheerleader for every excess of the Bush regime, he would - in the olden days - have had a seat at the Nuremberg trials, and a deserved noose at the end of them. But here, he dispenses advice on foreign and economic policy.

The list goes on and on and on. From Barry Minkow, an ex-convict turned "religious leader", to, ultimately, George W. Bush, an ex-alcoholic, a "C" student, and an across-the-board business failure turned "president". I bet even the Enron guys will come out on top eventually, just give them time.

The question is - how long can one keep screwing up in this country before being treated as what he or she is - a crazy moron, never to be listened to again?

Saturday, December 29, 2007

Risk vs. Reward

One of the chief premises of self-regulating markets is that a brand carries immense value, so the owner of a brand will work hard not to compromise its hard-earned reputation.

As Paul Krugman, a respected economist from Princeton and one of the best columnists for the New York Times writes, "In a 1963 essay for Ms. Rand’s newsletter, Mr. Greenspan dismissed as a “collectivist” myth the idea that businessmen, left to their own devices, “would attempt to sell unsafe food and drugs, fraudulent securities, and shoddy buildings.” On the contrary, he declared, “it is in the self-interest of every businessman to have a reputation for honest dealings and a quality product.”"

Of course, these days most big companies in the US are publicly held, and run not by owners but by hired executives. These executives are paid a lot of money, and the majority of their compensation is tied to the short-term performance of the company, usually as measured by its stock price.

An average executive stays on the job for 8.2 years. A typical vesting period of a stock package is between 4 and 6 years. Most stock option packages require the options to be exercised (sold) within 10-12 years from the date on which they were granted.

Most executive contracts also have provisions for a "golden parachute" - if the executive's employment is terminated for any reason, he or she is entitled to a large severance bonus, sometimes measured in tens of millions of dollars. "Golden parachute" clauses usually do not require the executive to meet any performance standards - he or she gets paid even when fired for gross incompetence.

So what does all this mean for an executive as a rational player?

Most importantly, there is a clear incentive to UNDERPRICE risk.

Suppose you have a risky scheme that, if successful, will bump your company's earnings significantly, even if temporarily. If unsuccessful, it could cause a major setback in earnings, one that could knock the company off its feet for years.

In other words, there is a choice between potentially unlimited upside, and a golden parachute as a downside.

Let's do some modeling - throw a bunch of random numbers at the problem. The numbers are fictitious, but the orders of magnitude are about right - I've been reading them in the newspapers for years.

Let's say we have a CEO who has a $5M golden parachute built into his contract, and a package of options based on a $10M investment (i.e., he gets the appreciation on $10M worth of stock, if the stock does appreciate).

Let's say we have a strategy of organic growth, where the company keeps doing whatever it was doing, and enjoys its historical growth rate of 10% per year for the next 5 years.

An alternative is a major product line restructuring. Let's say that if the project is successful, the stock appreciates by an order of magnitude (10 times) over the period of 5 years. However, if it fails, the stock price falls by 60%. Let's put the probability of success at a low 10%.

The expected payoff of the risky alternative is 9*0.1 + (-0.6)*0.9 = 0.36, i.e. a 36% expected gain. The expected payoff of the organic growth strategy is (1 + 0.10)^5 - 1 = 0.61, a 61% gain. For the owners, the risky strategy is a clear loser.

Let's see how the same calculation looks from the perspective of our CEO. In both cases our hero rules the company for 5 years, then heads for greener pastures.

Organic growth: ((1 + 0.10)^5 - 1) * $10M (stock options) + $5M (golden parachute) = $11M.
The risky project: 9 * $10M * 0.1 (stock options) + $5M (golden parachute) = $14M.

Hmmm... the risky strategy is a clear winner! The difference between the owners and the employees here is that the owners are hurt by the losing side of the bet, whereas the employees benefit from winning but do not suffer from losing.

This applies mostly to the top of the hierarchy, of course - if the company goes down, it will downsize the grunts, who will very much feel the negative effect, by not being able to pay their mortgages. The top, however, will walk away with a very big payoff either way, enough to pay for the private jet's fuel for quite some time.
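If you prefer it spelled out in code, here is the same back-of-the-envelope model, using the made-up numbers above (a minimal sketch, nothing more):

#include <stdio.h>
#include <math.h>

int main() {
    const double p_success = 0.1;   // the risky project succeeds 10% of the time
    const double win = 9.0;         // stock appreciates 10x, i.e. +900%, on success
    const double loss = -0.6;       // stock drops 60% on failure
    const double organic = pow(1.10, 5) - 1.0;  // 10%/year for 5 years, ~0.61

    // The owners care about the expected change in the stock price.
    double owner_risky = win * p_success + loss * (1.0 - p_success);
    printf("Owner, organic growth: %+.2f\n", organic);       // +0.61
    printf("Owner, risky project:  %+.2f\n", owner_risky);   // +0.36

    // The CEO gets a $10M option package plus a $5M parachute either way;
    // the options pay nothing if the stock falls.
    double ceo_organic = organic * 10.0 + 5.0;                // ~$11M
    double ceo_risky = win * 10.0 * p_success + 5.0;          // $14M
    printf("CEO, organic growth: $%.1fM\n", ceo_organic);
    printf("CEO, risky project:  $%.1fM\n", ceo_risky);
    return 0;
}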

"Wait a minute!" - you'll say. What about the CEO's reputation? After the failure he has inflicted on his company he will not be hired anywhere else? Turns out that this is not true. More about it tomorrow.

As a side note, ANY stock-option-based compensation package by definition has limited downside (you get nothing if the stock ends up below the exercise price) and unlimited upside. For a rational corporate player, this alone should encourage taking risks.

I think overall risk in the US economy is very much underpriced, because there is no incentive at the top to price it fairly. Let's look at the insurance business as our second example.

The insurance industry earns money by collecting premiums, paying claims, and keeping the difference. In a normal year the amount of claims is fairly stable and predictable. This allows actuaries to set the price of your home insurance very precisely. However, there are events that happen very infrequently (major hurricanes or earthquakes) that could potentially have an enormous effect on the claims, and a sizable effect on the cost of your premium.

However, NOT taking these events into account makes perfect sense from the perspective of a CEO - the likelihood that a major once-in-a-hundred-years event will hit on his watch is low, and the upside of not including it in premiums is high - it allows the company to undercut competitors, get more customers, and increase the stock price.

While not accounting for a probable event may be malicious and potentially criminal, there are plenty of legal ways of going a long way towards the same goal while still deriving sizable benefits. That's because the probability of an improbable event is extremely hard to estimate.

For example, geologists think that a devastating earthquake should hit the Pacific Northwest some time in the future, but it could be any time in the next 1000-10000 years. Well, an insurance company could figure in a conservative annual probability of 0.1%, or a liberal one of 0.01%. In either case, nobody could possibly claim foul play, but the financial incentives dictate the liberal estimate, regardless of the truth.

By the way, this premium will have to pay for a total loss of the house. A typical house in the Seattle area is worth around $600,000. If, say, $400,000 of this is the cost of the building (assuming the plot is not destroyed by the earthquake, which is actually NOT a given either, because the geological evidence shows swaths of land going underwater after the previous such event), a 0.1% annual probability works out to $400 a year in expected losses, and 0.01% to $40. It's a big difference.
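For the record, here is that arithmetic as a tiny program (a fair premium here just means the expected annual loss; a real premium would add overhead and profit on top):

#include <stdio.h>

int main() {
    const double building_value = 400000.0;  // the insured part of a $600K house
    const double p_conservative = 0.001;     // one such quake per ~1000 years
    const double p_liberal = 0.0001;         // one per ~10000 years

    // Fair premium = annual probability of total loss * insured value.
    printf("Conservative estimate: $%.0f/year\n", p_conservative * building_value); // $400
    printf("Liberal estimate:      $%.0f/year\n", p_liberal * building_value);      // $40
    return 0;
}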

Judging from my own insurance premium (which does have an earthquake rider), they are not figuring in even the liberal estimate.

Now, what would other insurance companies be forced to do to compete? Lower their own premiums, of course! Which would in turn require underpricing the risk by the entire industry.

If you keep in mind this propensity to underprice risk, and its overall effect on entire industries, what happened during the .COM and the subprime mortgage crises becomes perfectly clear. In both cases, execs at the bodies charged with evaluating the riskiness of the investments had short-term horizons and massive financial incentives for allowing very questionable investments to go through. And the competitiveness of the industry ensured that once a few companies chose to disregard the risk, the rest were forced to follow.

So where does this all lead us? My conclusion is that the principle of self-regulation is suspect, especially when taken as a religion by the crazy Ayn Rand followers. Hopefully, they will be wiped out by the next election, but the picture they paint is simple, easy to understand, can be packed into a 30-second TV commercial... and false. Which makes it very easy to sell to the American electorate, which has a notoriously short attention span.

We are yet to see how the mortgage crisis will ultimately turn out (I will blog my own predictions at some point). The question is - who is going to be next? My hope is that it's not going to be the airlines, where the price competition is extreme, the ridership is high, and the fleet is wearing out.

Friday, December 28, 2007

A Random Walk Down Wall Street

On average only about 20% of the cost of bringing a product to market is spent on research, development, and manufacturing. The rest is the sales and marketing expenses.

This varies by industry quite a bit, and nowhere is the sales and marketing share of the pie as pronounced as in the finance industry. Ever heard of the multi-million dollar Christmas bonuses on Wall Street? Yes, those go to the very same people who create all these wonderfully complicated schemes and push them on you in exchange for large commissions!

If I had to pick one thing that I learned in business school and forget everything else, I would choose the class on risk and derivatives - the financial instruments which allow factoring the trend out of a stock to focus on its purely random component, or factoring out the random component to focus purely on the trend, etc. And the most interesting thing I learned there is CAPM, the Capital Asset Pricing Model.

The theory is simple enough for a first-year college student to understand. It goes approximately as follows.

Let's say we have the universe of all stocks and other securities, i = 1..N, with expected returns E_i and standard deviations of those returns S_i. The weights with which they are present in a portfolio are w_i, with w_1 + w_2 + w_3 + ... + w_N = 1.

The risk of a stock or a portfolio is quantified by its standard deviation. The standard deviation of a portfolio is the square root of a quadratic form in its weights, so if we plot every possible portfolio on a plane with standard deviation on the horizontal axis and return on the vertical, the attainable set has a convex boundary.

Let's now consider a combination of a risk-free investment (cash; the theory assumes that you are a big player and can borrow and lend at very close rates), which pays interest rF, and some portfolio with a fixed set of weights. Using these two instruments, you can construct a combined position with any return between rF (all your money is in cash) and, in principle, infinity (you have borrowed an enormous amount of money and bought the aforementioned portfolio with it).

On our graph, all these combinations lie on the line connecting the point representing risk-free cash, (0, rF), with the point representing our portfolio, (S_p, E_p), where S_p and E_p are that portfolio's standard deviation and expected return. Because the set of all possible portfolios is convex, there exists a single portfolio - the point where such a line is tangent to the set - for which the lines through all other possible portfolios lie below it.

(Chart courtesy of Wikipedia.)


Points on this tangent line represent the most efficient investments - for a given risk they carry the maximum return, and they achieve any given level of return with the minimum risk.
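Here is a tiny numerical sketch of that line, with made-up numbers (a 4% risk-free rate, a market portfolio returning 10% with a 16% standard deviation); a is the fraction of your money, possibly borrowed, that sits in the portfolio:

#include <stdio.h>

int main() {
    const double rF = 0.04;   // risk-free rate (assumed)
    const double Em = 0.10;   // expected return of the portfolio (assumed)
    const double Sm = 0.16;   // standard deviation of the portfolio (assumed)

    // A fraction a in the portfolio and (1 - a) in cash; a > 1 means
    // you borrowed at rF to buy more of the portfolio.
    const double fractions[] = { 0.0, 0.5, 1.0, 1.5, 2.0 };
    for (int i = 0; i < 5; ++i) {
        double a = fractions[i];
        double ret = rF + a * (Em - rF);  // expected return of the combination
        double risk = a * Sm;             // its standard deviation
        printf("a = %.1f: risk = %4.1f%%, return = %4.1f%%\n",
               a, risk * 100, ret * 100);
    }
    return 0;
}

Both risk and return are linear in a, which is exactly why all these combinations lie on one straight line through (0, rF).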

The conjecture that won a Nobel in Economics is that this tangency point is the current "market" portfolio - i.e., the whole market, of which most people consider the SP500 a "good enough" representation.

What does this mean for the casual investor and the finance professor alike? That regardless of what your broker tells you, the best investment you can buy is an SP500 index fund. The rest is basically designed to maximize the salesman's commissions and the fund manager's income (the Vanguard SP500 fund's expense ratio is 0.19%, compared to 1.5% for an average actively "managed" fund).

An aside:
Dogbert is talking to Alice, who's going to a trade show.
Dogbert: "To be successful at the trade show, you'll need a trade show booth from the Dogbert Trade Show Booth Company. For maximum profts, I recommend the deluxe model."
Alice: "How will the deluxe model maximize my company's profits?"
Dogbert: "Oh, so now this is about *your* company?"

Another aside: The finance professors I personally know from UW's business school invest either exclusively or primarily in SP500.

"But wait!" you'd say. There are plenty of stocks and funds that outperform the SP500 by a large margin! AAPL! GOOG! The energy fund! Gold! The point here is that they do it at considerable risk - higher than what the efficient portfolio carries for the same return. Measuring return without the attendant risk is meaningless.

So if you want to achieve a greater return, you can borrow money from the bank and put it in the index fund - and for any level of return this will be less risky than any individual stock or mutual fund.

More on CAPM here: http://en.wikipedia.org/wiki/Capital_asset_pricing_model

The book after which this article is named, A Random Walk Down Wall Street, is fascinating reading. It gives an excellent historical overview of financial markets, investment vehicles, and instances of "irrational exuberance". I strongly recommend it; in fact, the late addition of this recommendation to this article is because I bought 3 copies of it to give to friends (some of whom read this blog) this holiday season, and I didn't want them to buy it before I could give it as a present.

Thursday, December 27, 2007

Up or Out

The US Navy has an "up or out" rule that guides its promotion system: once an officer is no longer promotable, he or she must leave the service.

Throughout its history, Microsoft frowned upon developers who were "just doing their jobs". Its review model used to have the following grades:

  1. 2.5 - "below expectations" (you're about to be fired)
  2. 3.0 - "at the level's expectations"
  3. 3.5 - "exceeds some level expectations"
  4. 4.0 - "exceeds most expectations"
  5. 4.5 - "one of a kind" (I've only seen this given out 3 times throughout my career as a manager - in all the teams that I've managed, and all the sister teams)
  6. 5.0 - "special service to the company" (I have never seen or heard of this given to a developer)

These numbers went on a curve. For levels below 65, 1/3 of all people would get 4.0, 1/3 - 3.0, and the rest 3.5. Ratings above and below this range were extremely rare.

Although the HR mantra was that grading on a curve was completely unacceptable for smaller teams, and only teams of 25 people and above should be fit to the curve, it was extremely difficult for managers to reconcile performance across groups (and worse, across disciplines), so in reality most teams at the level of a discipline manager (dev manager, test manager, or GPM) and below were forced onto the curve even if they were smaller.

Another HR mantra was that 3.0 was a good, passing grade, but in reality the bonus and merit raise numbers spoke for themselves. I've seen (although only once) a person being fired for being stuck at a level with a whole bunch of 3.0s. You couldn't get promoted until you got into 3.5-4.0 land, and stayed there for a while. And, of course, finding a new job was a lot easier if you were a 3.5-4.0 developer, rather than a 3.0 one.

The year before I left the company, the review system changed to remove any ambiguity. The stock ratings (which were originally A through D and not revealed to the employees, or even to the first-level managers) became "limited", "strong", and "outstanding", and focused strictly on promotability.

The distribution was 20-70-10: 20% of the people were supposed to be "outstanding", 70% "strong", and 10% "limited". The problem was that although these ratings were awarded for promotion potential, they were labeled "contribution", so if you had reached your (manager-perceived) zenith, you were told that your contribution was "limited".

I remember at the time I was absolutely furious - I was to have to tell some of my most senior (and most productive) people that their contribution was "limited". WTF!? Who invented this stupid review system? That person was definitely "limited"! (I was told - Ballmer invented these labels, but I do not know for sure).

So this review period (which, thankfully, I didn't have to do!) they renamed the buckets to just "20", "70", and "10" to eliminate the stigma. I don't know if it worked.

In other words, MSFT does not have a strict "up or out" rule saying that once you are no longer promotable you should get out, but it sends a strong hint. Which means that once you're at the top of your career, you're not particularly welcome there.

Incidentally, most full time test engineer (as opposed to developer in test) jobs were eliminated (and transferred to contract status) in part because there was not much career potential for engineers who were not developers at Microsoft.

The results of all this aren't particularly great. It saps the morale of both the most senior (and often the most productive) contributors and their immediate managers. The elimination of testers without question negatively impacted the quality of the products, as well as the productivity of the test developers, who now had to do both automation AND manual testing (at a much higher pay scale, of course, than the manual testers used to command).

I wonder what the top execs were thinking when they created this. The Peter Principle, perhaps? Something else? Did this effect just slip through the cracks when the company was young and most people retired before reaching their maximum level?

What do you think?

Windows 8, or 9, or 10?

After the dust settled around the Vista release, I could distinctly sense the smell of the decline and fall of the Roman Empire all around it. To wit: a luxurious, decadent external appearance (the UI) coupled with a declining substance, rotting from the inside.

Just to recap a few facts:

  • On the same hardware, 2x slower than XP
  • Incompatible with a large proportion of existing media
  • Big step backwards from MCE UI - try playing DVDs stored on a hard drive
  • Insanely buggy (read Mark Russinovich's blog to fully appreciate the scope)
  • Majority of device drivers and applications no longer work
  • No major, innovative features to compensate for it all

More on Vista here and here.

This is no laughing matter, ladies and gentlemen. While the freedom-loving hordes of northern barbarians might have rejoiced in the Roman collapse, civilization plunged into the darkness of the Middle Ages for the next millennium.

Time, especially computer time, moves much faster now, but I still would not want the Microsoft OS franchise to die, or be replaced by either Mac or Linux. This is still the only company that takes corporate users and developers seriously. Northern barbarians notwithstanding, there's still nothing that can replace Windows on my laptop or on my home network. Plus, I feel no desire to learn Objective C :-). (Not that I dislike learning languages - I've learned JavaScript, Python, and to a big extent Java just in the past 6 months; it's just that a language that is only relevant to one platform does not excite me.)

Anyway, is everything lost? Is the Windows code base beyond the point of no return? What would I do if I were Bill Gates and decided to return and reincarnate Microsoft, just like Steve Jobs did with Apple?

One of the things that is both an amazing blessing and a huge problem for Windows is its enormous legacy. Thousands of applications whose compatibility is ensured by millions of lines of source code that is otherwise unused. A huge barrier to entry for a single bug fix, or any other change - whatever you do inside Windows is sure to break somebody. Whatever you do in the Windows kernel is sure to break a LOT of people.

Look at what happened after they introduced LUA in Vista - the vast majority of even Microsoft's own applications stopped working.

Attempts to maintain backwards compatibility the way Windows does now lead to a huge code base that is terribly hard to maintain and extend (an average Windows dev now writes only 1500 lines of code per year - and this leads to a rapid decline of the quality of the development org as a whole), and increase memory pressure, which leads to overall sluggishness.

Meanwhile, Moore's law appears to have stopped contributing to the performance of an individual CPU, and started contributing to the parallelism of the system - instead of clock speed-ups we're now seeing a proliferation of cores. 4 years ago, when I started my new job as a dev manager in one of the Windows NT base teams, the spec I put together for a dev box had a dual 2GHz Opteron and cost $3500. The 4-way machine we used to build our code was 2GHz and ~$12000. Today, my home file servers are 4-ways still running at 2GHz, but they are only $2000, excluding the cost of the drives.

The proliferation of cores is bad for the current Windows architecture - at 16 CPUs Windows is actually SLOWER than a single-core machine. Luckily, many-core architectures create opportunities for virtualization, and this, I think, is the way out of the OS quagmire Microsoft finds itself in.

Here is what I would do if I were the new Dave Cutler.

At the lowest level it would start with a hypervisor that would host multiple copies of multiple NNT (new NT) subsystems, and enable extremely fast message-based communication between them. Each subsystem would host one or more applications, but no UI whatsoever.

There would be a separate subsystem for every compatibility layer (say, XP, or Vista, or Server 2003, or maybe even Windows 98), derived from the source code for that specific generation of the OS.

New subsystems (essentially, new versions of the OS runtime environment) could be developed without any regard to backwards compatibility, because the old apps would be run by the old subsystems. For example, there could be a pure .NET-based subsystem. A Singularity subsystem. A POSIX subsystem. Development of these subsystems could be an internal Microsoft affair, or it could be shared with the industry or research institutions, potentially starting a virtuous cycle of innovation.

The UI would be handled by a separate UI-only subsystem (which could potentially run wholly on a specialized UI core, like an Nvidia GPU), through a hypervisor-layer UI protocol that would work much like RDP (except it would be an RDP-like connection to a specific set of windows owned by an application; see Remote Applications Integrated Locally).

The file system could also live in a separate subsystem (or maybe a number of subsystems, say, one per volume), and be accessed through a hypervisor version of the redirector (over an SMB-like protocol).

This architecture is robust (if a subsystem crashes, much of the system is still alive and useful, unlike today, where the kernel may be operational but you can't do anything if the UI died). It is scalable not just to many cores but, with the hypervisor transport extended through the network, to a cluster or beyond. And it is not hard to implement in an iterative fashion - at first, the subsystems can run entire versions of the OSes, with drivers for UI and message passing. Later, they can be stripped down to the bare minimum required to run the subsystem.
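To make the idea a little more concrete, here is a purely hypothetical sketch of the only interface the hypervisor layer would really need to expose to the subsystems. None of these names exist in any real product; this is just what a minimal message transport might look like:

// A purely hypothetical sketch; every name here is made up.
#include <stddef.h>

typedef unsigned int SubsystemId;  // e.g. an XP-compat, Vista-compat, UI, or FS subsystem

struct Message {
    SubsystemId source;
    SubsystemId destination;
    unsigned int type;     // protocol-specific: an RDP-like UI op, an SMB-like FS op, ...
    size_t length;
    const void *payload;
};

// The only services the hypervisor itself needs to expose to subsystems:
// fast message passing and subsystem lifecycle.
class HypervisorTransport {
public:
    virtual ~HypervisorTransport() {}
    // Queue a message to the destination subsystem.
    virtual bool Send(const Message &msg) = 0;
    // Block until the next message addressed to 'self' arrives.
    virtual bool Receive(SubsystemId self, Message *out) = 0;
    // Boot a new subsystem from an image (an XP layer, a .NET layer, ...).
    virtual SubsystemId Spawn(const char *subsystem_image) = 0;
    // Crash containment: kill a wedged subsystem without taking the machine down.
    virtual void Kill(SubsystemId id) = 0;
};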

The question is

  1. Do Microsoft execs understand what a hole they dug for themselves in the last few years, and
  2. Are they going to rock this boat? Is there an incentive for a change now that Bill is gone?

Tuesday, December 25, 2007

SBS 2003 - don't put your files on it!

I have been running Small Business Server 2003 in my house since the day it shipped (when all MSFT employees in the Windows Division got a copy). I installed it that same day, and I ***LOVE*** the product.

It's a wonderful integration of Windows Server the OS (normally $700) with Exchange (normally $1000+), a firewall, and a nice setup package that handles all the nitty-gritty of setting up a domain and an Exchange server and configuring the firewall - all the tasks you'd otherwise need to be a sysadmin to know how to do - into a nice $300 package.

So I have been running SBS 2003 forever as the edge computer on my network - a replacement for a router that also lets me fully host my domain. For quite some time I used an old Dell workstation (circa 1999) that I picked up at a PC Recycle depot. I noticed that file sharing was kinda slow, but I attributed it to the old hardware.

Earlier this year I went through a major computer upgrade, and part of it was new hardware for SBS. This box cannot be called slow - a Xeon 3060 CPU, a very decent server motherboard, a fast SATA drive, and 4GB of RAM. Initially I broke the hard drive into two partitions - one for the OS, and the other for per-user file shares.

Then I tested copying files, and it was terribly, horribly slow. The disk was doing something on the order of 8MBps writes. What's worse, if you copy a very big file (a couple of gigs), now that the computer has tons of RAM, it caches it all, and then starts writing. And while it is writing, it monopolizes all of the disk bandwidth, so all paging operations are tucked at the end of this practically infinite queue, and the whole OS grinds to a halt - the mouse moves, but practically nothing else does, and that goes on for 5-10-15 minutes! WTF!?

After a lot of experimentation, I found out that if you write files on any partition on a system disk, the writes are excruciatingly slow. If you put them on any other disk, they are an order of magnitude faster.
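If you want to reproduce the measurement yourself, a quick-and-dirty sketch like this (write-through, so the OS cache does not mask the result) is enough to compare the system partition with a data disk:

#include <windows.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    const char *path = (argc > 1) ? argv[1] : "c:\\testfile.bin";
    const DWORD chunk = 1 << 20;              // write in 1MB chunks
    const int chunks = 256;                   // 256MB total
    char *buffer = new char[chunk];
    memset(buffer, 0xAB, chunk);

    // Write-through bypasses lazy writing, so we measure the disk, not the cache.
    HANDLE h = CreateFileA(path, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                           FILE_FLAG_WRITE_THROUGH, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        printf("Cannot create %s\n", path);
        return 1;
    }

    DWORD start = GetTickCount();
    for (int i = 0; i < chunks; ++i) {
        DWORD written = 0;
        WriteFile(h, buffer, chunk, &written, NULL);
    }
    DWORD elapsed = GetTickCount() - start;
    CloseHandle(h);
    delete[] buffer;

    printf("%d MB in %lu ms = %.1f MB/s\n",
           chunks, elapsed, chunks * 1000.0 / (elapsed ? elapsed : 1));
    return 0;
}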

So I started asking around. The SBS team knew more or less nothing - the team has changed almost entirely since 2003 shipped, and they didn't really test file share performance back then, either. Finally I found a dev manager from one of the storage teams who had the answer - it turns out that Active Directory turns off disk write caching for the drive it is running on. The reason is that a lot of drives lie about flushing the disk cache - they tell the OS that they did, but in reality the data may still be sitting in the drive's cache. This completely defeats the transaction support in AD, and if the power is lost at the wrong moment, the database can be corrupted.

Of course, no sane person would run a file server off the AD controller, right? So that limitation was probably OK for what they designed it to do - most of the writes on such a computer are writes into the directory, and those need to be flushed anyway.

Of course if you're running a Small Business Server, all of it - AD, Exchange, file sharing - runs from the same machine, and most likely from the same hard drive. Which leads to terrible performance of the storage-bound parts of the software stack.

Moral: if you do configure user directories on SBS, or use any other kind of file sharing, put the shares on a separate physical disk.

Bit rot, or maintaining a healthy Windows installation

My computer network at home has grown quite big (for a home). I have 2 massive storage servers (3 TB per server in protected usable storage - 2 RAID5 arrays with 4 500GB disks each, per server), 1 backup server with 2TB more, 2 Media Center PCs (I have no DVD players), 1 game computer, 4 personal computers (mostly laptops) belonging to the family members, a Small Business Server 2003 "to rule them all", and another PC running the phone system for the household.

That's 12 computers overall, 3 running Windows Server, 3 Windows MCE 2005, a few XP laptops, and in one instance (the game computer, because Halo 2 PC requires Vista) a dual boot between XP and Vista.

Over the years I tried to introduce Linux a few times, but never managed to integrate it in a usable way (that is, assign it a meaningful task). When I'm done with my current project (writing a Project Gutenberg proxy that automatically converts books into a SONY Reader format), I will try it again.

Except for the 4 servers that have to be separate because you just can't cram enough hard drives into one case (there are 9 hard drives per server), I could probably have consolidated the rest into less equipment.

The reason they are separate is that over my almost 15 years of using NT, I have learned one lesson - Windows computers become slower and less reliable over time. The cause is the constant installation and uninstallation of software, and the consequent fragmentation of the registry and other vital system files that normal disk defragmentation cannot touch, because it does not defragment open files.

An attendant problem is that software installation and uninstallation is often an afterthought - it is written either in a very short time right before the software ships, when people discover that nobody thought about the installer, or by developers who are less competent and therefore cannot be assigned to a "real" feature.

The uninstaller gets even less love, because it's an afterthought to an afterthought. This unfortunate trend results in piles of garbage in your system's registry - leftovers from software long gone. Most commercial installers do track the registry settings created during the installation process, so they can be removed during uninstallation. But most software creates registry entries in the course of its normal operation, and the installer has no way of knowing about those.

Case in point - install Visual Studio 2005, run it once, then uninstall it. Then search your registry for "Visual Studio". I have not yet done this experiment with 2008, maybe they have cleaned up their act by now, but I somehow doubt it.

The registry is both a file that grows, and thus gets fragmented on disk, and a file-system-like data structure that can get fragmented (unbalanced) internally.

The first problem can be solved with this wonderful tool by Mark Russinovich.

The second problem I don't think is solvable at all, short of maybe booting into WinPE, mounting the hive, and then copying the data from it into a different hive. But don't try this at home: some of the data in the hive, such as info fields, regedit cannot access, so if you actually do this, you will end up with an unbootable system.

So I used to reformat and reinstall Windows approximately yearly, because by the end of the year the PC would be noticeably slower. This phenomenon is widely known as "bit rot".

Then I tried a different usage pattern that turned out to be very successful - partitioning the tasks across many computers. You pre-install everything you need for a specific computing task on a single machine, and then leave this machine doing this task - and nothing else.

For example, my file servers have Windows Server 2003 installed on them, and nothing else. No Office. No codecs. The data lives on disks that are separate from OS. So the image on these boxes does not change much day to day, and the performance today is exactly the same as it was a year ago (yes, I did measure :-)).

The same is true for the Media Center computers, and most other machines in my household - they are single-purpose boxes. When I need to deploy a new application, I just go out and buy another box. You can usually build a generic PC from Fry's sale components for ~$200, without the monitor.

I install the OS, update, defragment, install all the supporting apps, defragment, then leave the box to run its task. The only things that change after that are Windows updates, and infrequent updates to this one application (e. g. my MCEs occasionally get their codecs refreshed, when I discover that they no longer play newer files).

Of course, I still install new things to try them out, but I do it on my laptop (or in a VM, if it looks really scary). Because all of the usual computer tasks have been farmed out to the rest of the network, my laptop doesn't run a lot - XP, Office 2007, Visual Studio 2008, VMWare, Adobe Reader, SONY Reader, X server, GnuWin32, SlickEdit, a couple of media conversion programs, and a pack of codecs. Reinstalling it is a matter of hours, and it's the only machine in the house that I will be redoing periodically.

Monday, December 24, 2007

How long does it take your book to boot?

Mine takes about 5 minutes, because my book is a Sony Reader PRS-500.

Overall, it's a nice device, and I use it all the time, except it can sometimes be very, very slow. I have about 600 books on an SD card, which could be the problem - they must have some N**2 algorithms hidden in there somewhere.

The quality of the electronic-ink screen is really decent, and once it boots, the device is reasonably responsive. The biggest problem is that there is no way to maintain a hierarchy of books - in my case all 600 of them are shown in one big list. You can page through it, but it's 60 pages long. You can sort these 60 pages by author, title, or the date the book was transferred to the device.

Even if they had a one-level hierarchy based on author, or genre, or anything, using the device with a lot of books on it would be much easier. As it stands, the fact that it can take a 2GB SD card is not very useful.

The ideal solution, of course, would be to just support a file system, with user-defined folder hierarchy. And this brings me to one of my favorite rants about all media playback devices and software that I have seen so far - the fact that they all support hard-coded hierarchies based on file metadata exclusively.

Configuring metadata is a pain. There's no easy way to set, let's say, a genre for a lot of files. Supposedly, there are internet databases that the ripping software can (and does) use to query metadata when it rips CDs. In reality, these databases are extremely inconsistent. One CD of a series could have the genre "Classical", and the next would be "Opera". There are 3 spellings for audiobooks - Audio book, AudioBook, and audiobook. And let's not even go into the various ways people misspell artist names.

That is, if you're lucky enough to have all your music in English. But if some of it is in Russian, Japanese, Chinese, or any other language that does not use a Latin-based alphabet, then there will be albums with artist names in 2-3 different encodings and 5 kinds of transliteration, making finding anything completely impossible.

On the other hand, putting files in directories is easy - everybody knows how to do it, operating on large groups of files is easy, and the user is in ultimate control. I have no idea why they don't use the old, proven model of a file system instead of inventing a new model, setting up databases, and creating synchronization protocols - all of which just makes the user's life harder.

So what I ended up doing at home is writing a program using the Windows Media SDK that allows setting metadata from the command line, putting all the music on the server in the following hierarchy: genre\author\album_name, and writing a script that enumerates all the files (it took quite a bit of effort to write it in the CMD batch language - I wish I had known Python back then :-)) and sets the metadata to the respective directory names.
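The idea of that script, sketched here in C++ rather than in batch (the genre\author\album layout is just my own convention, and the actual tag writing via the Windows Media SDK is omitted):

#include <stdio.h>
#include <string>
#include <vector>

// Split a path like "Classical\Prokofiev\Romeo and Juliet\track01.wma"
// (relative to the music root) into the metadata fields implied by the
// directory layout: genre, artist, album.
struct TrackMetadata {
    std::string genre, artist, album;
};

bool MetadataFromPath(const std::string &relative_path, TrackMetadata *meta) {
    std::vector<std::string> parts;
    std::string::size_type start = 0, slash;
    while ((slash = relative_path.find('\\', start)) != std::string::npos) {
        parts.push_back(relative_path.substr(start, slash - start));
        start = slash + 1;
    }
    if (parts.size() < 3)   // need genre\artist\album above the file name
        return false;
    meta->genre = parts[0];
    meta->artist = parts[1];
    meta->album = parts[2];
    return true;
}

int main() {
    TrackMetadata meta;
    if (MetadataFromPath("Classical\\Prokofiev\\Romeo and Juliet\\track01.wma", &meta))
        printf("genre=%s artist=%s album=%s\n",
               meta.genre.c_str(), meta.artist.c_str(), meta.album.c_str());
    return 0;
}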

It took a day or so to set up, but at least now my hardware devices sort of cooperate.

Saturday, December 22, 2007

Life by committee, or management Google-style

At Microsoft, the job of a manager is perfectly well defined.

A dev lead is responsible for productivity of the elementary organization unit - the development team. He or she owns a feature or a group of features in the product, and is primarily responsible for success of these features in the market.

To do their job well, dev leads have various tools at their disposal. For example, they write their employees' reviews, have weekly 1:1 meetings with the people who work for them, and are primary contributors to the team's hiring efforts.

A dev manager is responsible for the health of his or her development organization. He or she finds and appoints dev leads, makes sure that individual contributors are successful, ultimately owns all performance and underperformance issues, and is primarily responsible for the success of the product's implementation in the market.

A dev manager also has plenty of tools in his or her arsenal. They write reviews for their leads. They decide who gets hired on the team. And they mediate the agreement between the leads on the respective merits of individual developers when it comes to stack ranking at review time.

Test and PM leads and managers own identical roles in their disciplines.

A product unit manager is responsible for the smooth interoperation of all disciplines involved in bringing the product to market. He or she is also the primary point of contact for intergroup cooperation issues, and an advocate for the product company-wide.

A general manager owns the product lineup, and is responsible for building the leadership team and setting the overall division strategy and investments.

And so it goes...

After 7 months at Google, I am happy to report that I have no idea what exactly the role of a lower- to mid-level manager is here.

I totally get what Engineering VPs and above do - their role does not seem to be any different from Microsoft's. Below them there are engineering directors, who seem to own a lot of the nitty-gritty of basic Google site operations (like finding buildings to rent), and most of the low-level (at MSFT's PUM level) investment decisions.

Right below them there are line managers. A line manager is the person to whom most engineers report. There is some variation in their jobs - a few have fewer reports (~10) and are active engineers; many are not really engineers at all, but have lots of reports (30-60).

The role of a line manager is a big mystery to me. Before moving to my new job in GEO projects, I met my manager in Gmail maybe twice (before I started looking for a new job, that is) in 5 months.

The question is, if a manager's face time with an employee is less than an hour a month, how can this manager do the manager thing? As it turns out, (s)he doesn't, at least not in Microsoft sense of the word.

Let's start with building the team. At Microsoft the team would start with a dev manager, who would interview the candidates and make the hiring decision, along with another very senior interviewer called "an AA" (As Appropriate - a person whose job is to decide whether the interviewee has a career at Microsoft at large, as opposed to just a single team).

At Google, hiring decisions are made by a committee of mid-career to senior developers called Hiring Committee. HC makes hiring decision "for Google", irrespective of any particular team where the person might work. HC members do not actually interview people - they review the written feedback from the interview loop. The interview loop is different from Microsoft as well - the people interview "for Google", not for any particular team, and they come from a bunch of different teams at the company. In fact, the person is not placed with any particular team before he or she is hired. The placement is decided by an Allocation Committee, structured similarly to HC, except with larger proportion of Eng Directors.

As a side note, this means that when an engineer interviews with Google, by and large he or she does not know on which team he or she will end up working. This is usually not a problem because changing jobs at Google is very easy. Excited by something else? Just finish your commitments to the current team, and go to the next project.

Which means that managers don't do much of the team building. But surely, they do performance reviews? Not really...

At Google, performance review is fully peer-based, which means that your perf review is written mostly by the people who work with you. You nominate them yourself, although your manager can add to that pool. The manager's role is only to aggregate the feedback and deliver it. The manager's personal opinion about your work is also on your review, but it is worth maybe 10-15% of the total.

The results of the perf review written by your peers go to another committee, which does the salary review. This committee typically consists of engineers who do not know you (and for a smaller office like Seattle/Kirkland are likely to be outside your office; likewise, SEA/KIR perf committees get mostly people from Mountain View), but are a level or two higher.

Finally, promotions are handled by a committee as well. The employee is the one who decides whether to ask for a promotion, puts together a promotion package, and sends it to a promo committee, which functions very much like the perf review committee.

So a Google manager's tools are very limited. He or she doesn't decide who gets hired, doesn't directly influence people's growth, and does not influence the technology (since most managers are not practicing engineers). What do they do? Who knows...

More on assembly-level optimization here...

http://www.agner.org/optimize/

Integer multiplication and dynamic programming (Part 2 of 2)

Alright, now with code. Unlike previous code samples, this one is actually debugged :-).
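If you plan to paste all the snippets into a single file and compile them, these are the standard headers the code below relies on:

#include <stdio.h>   // printf, fgets
#include <string.h>  // memset, strchr, strcmp
#include <stdlib.h>  // atoi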

First, let's define a few data structures. We will need to define an operation somehow, and also the decomposition - a way of breaking up multiplication by a number into a sequence of operations that we define thusly:

// Possible operations
// F1: shl reg, #iconst ; * 2**iconst
// F2: lea reg, [reg + 2*reg] ; * 3
// F3: lea reg, [reg + 4*reg] ; * 5
// F4: lea reg, [reg + 8*reg] ; * 9
// F5: add reg, reg2 ; +1
// F6: sub reg, reg2 ; -1
// F7: lea reg, [reg + 2*reg2] ; +2
// F8: lea reg, [reg + 4*reg2] ; +4
// F9: lea reg, [reg + 8*reg2] ; +8
// F10: lea reg, [reg2 + 2*reg] ; *2 + 1
// F11: lea reg, [reg2 + 4*reg] ; *4 + 1
// F12: lea reg, [reg2 + 8*reg] ; *8 + 1

Here's my definition for the operation:

enum OPTYPE {
NONE, // Just a sentry
SHIFT, // F1
MULTIPLY, // F2-F4
ADDCONST, // F5-F9
INCRMUL // F10-F12
};

struct OP {
OPTYPE type;
int argument;
};

As is customary in dynamic programming, for every multiplication operation we're trying to decompose, we will actually decompose a whole bunch of numbers that the original multiple breaks into. There will be a lot of repetitions, so we will store all the intermediary results in a hash, to avoid recalculating them over and over and over again.

Therefore our decomposition structure will be a member in two linked lists - one will point to the next decomposition after applying the operation, and the other is just a linked list of all the elements in a hash bucket.

struct Decomposition {
int number;
int cost;
OP operation;
Decomposition *next;
Decomposition *next_in_bucket;
};

Let's now implement our hash structure, to get it out of the way. As you remember, there are two cases - when we have an extra register, we can use operations F5-F12, but one extra MOV needs to happen to populate that register first, so the cost is +1 (for the entire decomposition). If the extra register is not used, we are limited to operations F1-F4.

It turns out that the easiest way to deal with these cases is to do two decompositions in parallel - one with an extra register, and one without, and then compare the costs. Hence there will be two hashes.

// These hashes must have identical sizes
#define DECOMPOSITION_HASH_SIZE 511
Decomposition *
hash_with_extra_reg[DECOMPOSITION_HASH_SIZE];
Decomposition *
hash_no_extra_reg[DECOMPOSITION_HASH_SIZE];

Decomposition *FindInHash(int num,
bool has_extra_reg) {
Decomposition **hash = has_extra_reg ?
hash_with_extra_reg :
hash_no_extra_reg;
Decomposition *runner = hash[num %
DECOMPOSITION_HASH_SIZE];
while (runner && runner->number != num)
runner = runner->next_in_bucket;

return runner;
}

void AddToHash(Decomposition *dc,
bool has_extra_reg) {
Decomposition **hash = has_extra_reg ?
hash_with_extra_reg :
hash_no_extra_reg;
int bucket = dc->number %
DECOMPOSITION_HASH_SIZE;
dc->next_in_bucket = hash[bucket];
hash[bucket] = dc;
}

Simple, huh? Now let's write a few service functions that compute the result of an operation. Remember that an operation F is applicable to n iff n == F(F^-1(n)), so we will need functions to compute both F and F^-1:

int Compute(int num, OP *op) {
switch (op->type) {
case SHIFT:
return num << op->argument;
case MULTIPLY:
return num * (op->argument + 1);
break;
case ADDCONST:
return num + op->argument;
break;
case INCRMUL:
return num * op->argument + 1;
}
return 0;
}

int ComputeInverse(int num, OP *op) {
int val = 0;
switch (op->type) {
case SHIFT:
val = num >> op->argument;
break;
case MULTIPLY:
val = num / (op->argument + 1);
break;
case ADDCONST:
val = num - op->argument;
break;
case INCRMUL:
val = (num - 1) / op->argument;
break;
}

// The result can never be negative or 0.
// These values never have inverses.
if (val <= 0)
return 0;

// The operation must have the inverse...
// It is not always the case.
if (Compute(val, op) == num)
return val;
return 0;
}

And while we're at it, something to actually print the result:

char *GetOpFormat(OP *op) {
switch (op->type) {
case SHIFT:
return "shl eax, %d";
case MULTIPLY:
return "lea eax, [eax + %d*eax]";
case ADDCONST:
if (op->argument == 1)
return "add eax, ecx";
else if (op->argument == -1)
return "sub eax, ecx";
else
return "lea eax, [eax + %d*ecx]";
case INCRMUL:
return "lea eax, [ecx + %d*eax]";
}
return NULL;
}

void ReversePrint(Decomposition *dc) {
if (!dc)
return;

ReversePrint(dc->next);
printf(GetOpFormat(&dc->operation),
dc->operation.argument);
printf("\n");
}

And now the actual dynamic part. We need to enumerate the operations and, within each operation, its arguments. Let's write these functions first (for bloggish simplicity, they are not iterators).

bool NextOp(OP *op, bool has_extra_reg) {
if (op->type == NONE)
op->type = SHIFT;
else if (op->type == SHIFT)
op->type = MULTIPLY;
else if (op->type == MULTIPLY &&
has_extra_reg)
op->type = ADDCONST;
else if (op->type == ADDCONST &&
has_extra_reg)
op->type = INCRMUL;
else
return false;
op->argument = 0;
return true;
}

bool NextArg(OP *op) {
if (op->type == SHIFT)
op->argument++;
else if (op->type == MULTIPLY) {
if (op->argument == 0)
op->argument = 2;
else if (op->argument == 2)
op->argument = 4;
else if (op->argument == 4)
op->argument = 8;
else return false;
} else if (op->type == ADDCONST) {
if (op->argument == 0)
op->argument = -1;
else if (op->argument == -1)
op->argument = 1;
else if (op->argument == 1)
op->argument = 2;
else if (op->argument == 2)
op->argument = 4;
else if (op->argument == 4)
op->argument = 8;
else return false;
} else if (op->type == INCRMUL) {
if (op->argument == 0)
op->argument = 2;
else if (op->argument == 2)
op->argument = 4;
else if (op->argument == 4)
op->argument = 8;
else return false;
} else return false;
return true;
}

Now the main function. Note extensive use of sentries.

#define MAX_DEPTH 10
Decomposition *Decompose(int num,
bool has_extra_reg,
int recursion_depth) {
Decomposition *d = FindInHash(num,
has_extra_reg);
if (d) // We already have it!
return d;

// Limit level of recursion - this
// allows us to avoid loops
if (recursion_depth > MAX_DEPTH)
return NULL;

Decomposition temp_d;
memset(&temp_d, 0, sizeof(temp_d));
temp_d.number = num;
temp_d.operation.type = NONE;

OP temp_op;
temp_op.type = NONE;
temp_op.argument = 0;

for ( ; ; ) {
if (! NextOp(&temp_op, has_extra_reg))
break;
for ( ; ; ) {
if (! NextArg(&temp_op))
break;
int next_val = ComputeInverse(num,
&temp_op);
if (next_val == 0) {
if (temp_op.type == SHIFT)
break;
continue;
}

if (next_val == 1) { // Done!
temp_d.next = NULL;
temp_d.cost = 1;
temp_d.operation = temp_op;
break;
}

Decomposition *next = Decompose(
next_val, has_extra_reg,
recursion_depth + 1);
if (! next)
continue;
if (temp_d.operation.type == NONE
|| next->cost + 1
< temp_d.cost) {
temp_d.next = next;
temp_d.cost = next->cost + 1;
temp_d.operation = temp_op;
}
}
}

if (temp_d.operation.type == NONE)
return NULL;

d = new Decomposition;
*d = temp_d;

AddToHash(d, has_extra_reg);
return d;
}

We are practically done. Just need something to call Decompose with and without the extra register, and compare the results.

Decomposition *Decompose(int num) {
if (num <= 1)
return NULL;

Decomposition *with_reg = Decompose(
num, true, 0);
Decomposition *no_reg = Decompose(
num, false, 0);

if (! with_reg)
return no_reg;

if (! no_reg)
return with_reg;

// +1 is for mov operation to
// copy to extra register
return (with_reg->cost + 1 <
no_reg->cost) ? with_reg : no_reg;
}
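One more small helper: main below calls NeedsExtraReg to decide whether to emit the initial mov ecx, eax, and it was not shown above. A minimal version just walks the chain and checks whether any step uses the second register:

bool NeedsExtraReg(Decomposition *dc) {
// Operations F5-F12 (ADDCONST and INCRMUL) read the copy of the
// original value from ecx; shifts and F2-F4 do not.
for ( ; dc; dc = dc->next)
if (dc->operation.type == ADDCONST ||
dc->operation.type == INCRMUL)
return true;
return false;
}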

Alright! We're done. Let's write a trivial main to collect user input and print out the results.

int main(int argc, char **argv) {
for (;;) {
printf("Input number: ");
char input[128];
fgets(input, sizeof(input), stdin);
char *nl = strchr(input, '\n');
if (nl)
*nl = '\0';
if (strcmp(input, "exit") == 0)
break;

int num = atoi(input);
Decomposition *dc = Decompose(num);
if (!dc) {
printf ("Could not compute!\n");
continue;
}
if (NeedsExtraReg(dc))
printf("mov ecx, eax\n");
ReversePrint(dc);
}
}

Now you can copy all this into an editor of your choice, and enjoy multiplication without multiplication :-).

Friday, December 21, 2007

Integer multiplication and dynamic programming (Part 1 of 2)

Integer multiplication is one of the operations that benefited a lot from the transition to the Core architecture. Its latency dropped from 10 clocks on the Pentium 4 to 3 on the Core 2 Duo. As it happens, the Pentium 4 was a step backwards from the PIII - it was 4 clocks there. But those were slower clocks :-).

I happen to look at Microsoft Visual Studio-generated assembly a lot, and for a while I was noticing that whenever a program needed to multiply by an integer constant, the multiplication operation was almost always replaced by other instructions - and this was true not just for multiplying by a power of two, where it's just a shift.

(With VS 2008 this is no longer true - the 3-clock latency of imul beats the memory pressure of replacing it with 2 instructions.)

Intel has a number of other instructions that can be used in lieu of integer multiplication, and you can combine them to reach fairly large multipliers. For example, suppose you want to multiply EAX by an integer constant.

You can:
SHL EAX, n ; multiplies by 2 to the power of N
LEA EAX, [EAX + 2 * EAX] ; Multiply EAX by 3(*)
LEA EAX, [EAX + 4 * EAX] ; 5
LEA EAX, [EAX + 8 * EAX] ; 9

(*) LEA = load (really, compute) effective address. This is an indexing instruction designed to quickly add an index of an array element times its size (which can be 1, 2, 4, or 8) to the starting address of the array to get the address of the element. Usually the start of the array lives in one register and the index in another, so typical use is something like lea eax, [edx + 8*esi]. But with the right registers it can be used for multiplication as well.

Now, suppose you can use another register (say, the number you're multiplying can be copied to EDX).

Then you can do even more:
ADD EAX, EDX ; +1
SUB EAX, EDX ; -1
LEA EAX, [EAX + 2 * EDX] ; +2
LEA EAX, [EAX + 4 * EDX] ; +4
LEA EAX, [EAX + 8 * EDX] ; +8
LEA EAX, [EDX + 2 * EAX] ; *2 + 1
LEA EAX, [EDX + 4 * EAX] ; *4 + 1
LEA EAX, [EDX + 8 * EAX] ; *8 + 1


By using combinations of these operations, you can achieve amazing things:

Multiply by 45 (5 * 9):
lea eax, [eax + 8*eax]
lea eax, [eax + 4*eax]
Multiply by 78:
mov ecx, eax
lea eax, [ecx + 4*eax]
lea eax, [eax + 8*ecx]
lea eax, [eax + 2*eax]
shl eax, 1
Multiply by 13:
mov ecx, eax
lea eax, [ecx + 4*eax]
lea eax, [eax + 8*ecx]
Multiply by 241:
mov ecx, eax
lea eax, [ecx + 4*eax]
lea eax, [eax + 2*eax]
shl eax, 4
add eax, ecx
Multiply by 189:
mov ecx, eax
lea eax, [ecx + 4*eax]
lea eax, [ecx + 4*eax]
lea eax, [eax + 8*eax]
Multiply by 1023:
mov ecx, eax
shl eax, 10
sub eax, ecx


How does the compiler find the right decomposition of the multiple into these operations? The answer is dynamic programming.

The essence of dynamic programming is decomposing a problem into simpler subtasks, which in turn can be solved recursively by decomposing them into yet simpler subtasks, while caching the intermediate results.

In this case, if we have a multiple we want to decompose, we can try every possible operation from the list above. 'Trying an operation' actually means applying its inverse to the multiple - e.g., if we want to see whether we can use the 'multiply by 9' operation (lea eax, [eax + 8 * eax]), we divide our multiple by 9 (decomposing it into *9 and the rest), then call decompose on this new number (but only if it exists - i.e., is a whole number).

For every multiple we will go through all the inverse operations, and note which one resulted in the shortest chain. That will be the operation we end up using.

The algorithm is as follows:

Let F(n), n = 1..N, be all of the above operations, and F^-1(n) their inverses (so, for example, if F(3) = lea eax, [eax + 8 * eax] - multiplication by 9 - then F^-1(3) would be division by 9).

Decompose(multiple, recursion_level):
    if recursion_level > 10: return infinite (*)
    if multiple == 1: return 0
    min_cost = infinite
    best_op = none
    for n = 1..N:
        next_multiple = F^-1(n)(multiple)
        if next_multiple is a whole number:
            cost = 1 + Decompose(next_multiple, recursion_level + 1)
            if cost < min_cost:
                min_cost = cost
                best_op = F(n)
    decomposition[multiple] = best_op
    return min_cost

(*) This makes sure the recursion terminates. Because some of our inverse operations can actually make the result bigger (specifically, the inverse of -1 is +1), the recursion would otherwise be infinite.

The actual C code of course is much more involved. I will post it tomorrow.

Dreaming of a startup...

Horatio Alger was a writer in 19th-century America who made his name publishing rags-to-riches stories. His heroes, starting invariably poor, achieve spectacular success through hard work and determination. Here's a representative title: "The Errand Boy; or, How Phil Brent Won Success".

Horatio Alger's spirit is alive and well in America today - how else can one explain that half of the population reliably votes to give more and more money to the rich, very rich, and the richest through tax cuts, something that is absolutely unparalleled in the rest of the industrialized world?

But nowhere is it more alive than in the startup world of Silicon Valley. Thousands of new graduates from the best software engineering schools around the world stream into the Valley every year, hoping to strike it rich by joining the right company.

Well, let's see what the real chances are.

A typical new grad joining a new startup can expect to get somewhere around 0.1% of equity in the new company. If this does not sound like a lot, consider that a 100-person company would then be giving 10% of its equity to the employees - a very fair number, considering that there will be 2-3 founders and 3-4 rounds of financing in which VCs will get another 30-40% (or more) of the company.

So to make our new grad a millionaire (after taxes), a startup should become worth more than $2B. Looking at Google Finance, Software and Programming section, there are roughly 40 such companies. Most of them have been in business for at least 10 years (the average is probably closer to between 15 and 20).
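
A quick back-of-the-envelope check of that arithmetic in C (the 0.1% grant and the roughly 50% combined tax haircut are the assumptions above, not data):

#include <stdio.h>

int main(void) {
    double equity = 0.001;  /* 0.1% of the company           */
    double tax    = 0.5;    /* rough combined tax haircut    */
    double target = 1.0e6;  /* $1M in the bank, after taxes  */

    /* required valuation = target / (equity * (1 - tax)) */
    printf("The company must be worth about $%.1fB\n",
           target / (equity * (1.0 - tax)) / 1.0e9);
    return 0;
}

which prints "about $2.0B" - hence the number above.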

Some companies get acquired before reaching the public stage, of course, but there are fewer than 5 such high-profile acquisitions per year.

But does $1M really make you rich in Silicon Valley? Considering that a reasonable house there is $2-3M, it would make for a nice down payment, assuming that the base salary can carry the rest of the $5000/month mortgage. But it certainly would not make you financially independent.

What would it take to be really well off there? I'd say that would be $5M minimum - this way one can buy a house for cash, and then have enough money left to invest (or maybe start your own startup). Unfortunately, $5M after tax for our new grad means $10B in startup capitalization.

At this point you can forget about acquisitions - there aren't any at this price. So back to the Google Finance page we go, and the results are not encouraging: 12 companies in the software and programming section. With hundreds of new startups every year, and roughly one every couple of years that ends up being a hit of this size, the odds are about as good as winning the lottery.

Moral of the story: if you want to make money in a startup, be a founder.

More here:

Thursday, December 20, 2007

Top Ten: Why Carlyle Group Co-Founder Bought the Magna Carta

http://dealbook.blogs.nytimes.com/2007/12/20/top-ten-why-carlyle-group-co-founder-bought-the-magna-carta/


My favorite: "Rubenstein’s public statements to the contrary, freedom today does actually have a price. And just in case you were wondering, most of us can’t afford it."

Wednesday, December 19, 2007

Unix vs. Windows

Since I've done two Unix-hating posts, in the interest of being "fair and balanced", I thought I should talk about the weaker sides of Windows as well.

(Parenthetically, I did not actually diss Unix per se. I was writing about the quality of its dev tools. It is hard to argue with the fact that, of all software companies in the world, Microsoft has spent by far the most effort to support its developers - and reaped the biggest reward in the process.)

What I don't like about Windows is the API, or, rather, the meta-API - the general principles that guide the developers who create Windows APIs.

A short poll for Windows developers - how many of you remember all of the arguments to CreateFile? CreateWindowEx? Same Windows developers - how many of you remember arguments to fopen? Big difference, eh?

(Parenthetically, this is how Unix developers live without Intellisense - by having functions that take fewer arguments :-)).

But you probably still use the same arguments to CreateFile that you would use with fopen - a file name and an access mode. And the access mode implicitly dictates how the file is treated if it does not exist.

The difference is that the Windows API tries to cram every possible parameter into the function definition. The Unix APIs, by and large, provide for the most frequently used case, and if you want to do something non-standard, you accomplish it by other means.

For example, if once in a blue moon you want an open-for-write to fail when the file does not exist, the same thing can be accomplished by simply checking whether the file exists up front - there's really no need to cram every possible piece of functionality into one function!
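
Here is a minimal sketch of that check-first approach (the helper name is mine; note also that plain fopen(path, "r+") already refuses to create a missing file):

#include <stdio.h>

/* Open a file for writing, but fail if it does not already exist -
   the "check up front" approach described above. */
static FILE *open_existing_for_write(const char *path) {
    FILE *probe = fopen(path, "rb");
    if (probe == NULL)
        return NULL;            /* does not exist - fail                   */
    fclose(probe);
    return fopen(path, "r+");   /* exists - open for update, no truncation */
}

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    FILE *fp = open_existing_for_write(argv[1]);
    if (fp == NULL) {
        fprintf(stderr, "%s does not exist\n", argv[1]);
        return 1;
    }
    fputs("Hello, world", fp);
    fclose(fp);
    return 0;
}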

The problem is not just getting programmers to learn to type CreateEvent(NULL, FALSE, FALSE, NULL) when all they want is an event. It's that all these NULLs get pushed on the stack - megabytes of 'push 0' in any Windows app.

Here's a very simple program "in Windows" and what compiler generates for it:

int wmain(int argc, WCHAR **argv) {
HANDLE h = CreateFile(argv[1], GENERIC_WRITE, 0, NULL,
CREATE_ALWAYS, 0, NULL);
00401000 mov eax,dword ptr [esp+8]
00401004 mov ecx,dword ptr [eax+4]
00401007 push esi
00401008 push 0
0040100A push 0
0040100C push 2
0040100E push 0
00401010 push 0
00401012 push 40000000h
00401017 push ecx
00401018 call dword ptr [__imp__CreateFileW@28 (402004h)]
WriteFile(h, "Hello, world", sizeof("Hello, world"),
NULL, NULL);
0040101E push 0
00401020 push 0
00401022 push 0Dh
00401024 mov esi,eax
00401026 push offset string "Hello, world" (4020F4h)
0040102B push esi
0040102C call dword ptr [__imp__WriteFile@20 (402000h)]
CloseHandle(h);
00401032 push esi
00401033 call dword ptr [__imp__CloseHandle@4 (402008h)]
return 0;
00401039 xor eax,eax
0040103B pop esi
}
0040103C ret


And the same program "in Unix":

int main(int argc, char **argv) {
FILE *fp = fopen(argv[1], "w");
00401000 mov eax,dword ptr [esp+8]
00401004 mov ecx,dword ptr [eax+4]
00401007 push esi
00401008 push offset string "w" (4020F4h)
0040100D push ecx
0040100E call dword ptr [__imp__fopen (4020A0h)]
00401014 mov esi,eax
fputs("Hello, world", fp);
00401016 push esi
00401017 push offset string "Hello, world" (4020F8h)
0040101C call dword ptr [__imp__fputs (4020A8h)]
fclose(fp);
00401022 push esi
00401023 call dword ptr [__imp__fclose (40209Ch)]
00401029 add esp,14h
return 0;
0040102C xor eax,eax
0040102E pop esi
}
0040102F ret


The Windows executable code is 60 bytes, and the Unix code is 46 - almost 25% less. But memory is free, right? No, it's not - you pay for it in performance.

What else is bad about Windows? Kernel mode! It's a huge blob of millions of lines of code, all interdependent, all running in the same address space, with every component having the potential to trash the state of another.

That's where drivers live, too - so even if Microsoft only hires absolute wizards, there will always be a bunch of device drivers - written by contractors hired by hardware companies - to core-dump your computer for you.

On top of it, you get KD to debug it.

I worked in NT base for a year (I ran away because of the terrible tools - I just couldn't stand KD after 6 years of using the Windows CE kernel debugger, which is exactly like Visual Studio, except it can step in and out of the operating system), and I observed NT devs for all of my 9 years with Microsoft.

The only way to make any progress working with any kernel component in Windows is to know everybody else who works in kernel mode, because an average debug session involves having 2-3 people from across the stack - one level above you, one level below, and one to the side (security) - sitting together at your computer trying to figure out who corrupted the pool today.

Or, alternatively, you can send around remote sessions (remote kernel debugging is a skill the Windows NT team has perfected to the black-belt level).

What else? I think the recent metadata-based approach that Microsoft champions (where the code is not enough to describe what the application is doing - one has to supply a bunch of metadata describing the security level, signing, component dependencies, versions of those dependencies, etc.) is way too complicated.

Compare, for example, the learning curve for the regular Windows UI programming model (which I think was really well designed) with Avalon. The former requires maybe a day to get right, and maybe a week to become really proficient at. The latter - I have no idea, because after trying to grasp it for a couple of days, I gave up.

(Parenthetically, how does it all compare with X? Read about it here: http://tronche.com/gui/x/xlib-tutorial/ :-))

How to get hired by Google

I have realized that I have a perfect recipe for passing a Google interview. It's not easy, but if you follow it, you have a 95% or better chance of success. The good thing is that the result is portable - you will be equally employable by Microsoft, as well as by most other good software companies.

Here it is.

Read and do all exercises in the following books:

(1) Introduction to Algorithms

(Except the chapters on advanced data structures (including B-trees, binomial and Fibonacci heaps, and disjoint-set data structures), sorting networks, and polynomials and the Fast Fourier Transform (FFT).)

(2) Computer Architecture, Fourth Edition: A Quantitative Approach

and

(3) Hacker's Delight

If you do this and you're not hired (but you can prove that you've done all the exercises and tried to pass the interview in good faith), I will pay you $200 :-).

Rules for development on Unix

Two posts bashing Unix in a row - this is corny, I agree. But I ran into this quote and could not resist... It describes X perfectly.

Date: Wed, 10 Apr 91 08:31:33 EDT
From: Olin Shivers
To: UNIX-HATERS
Subject: Unix evolution
I was giving some thought to the general evolution (I use the term
loosely, here) of Unix since its inception at Bell Labs, and I think it
could be described as follows.
In the early PDP-11 days, Unix programs had the following design
parameters:
Rule 1. It didn’t have to be good, or even correct,
but:
Rule 2. It had to be small.
Thus the toolkit approach, and so forth.
Of course, over time, computer hardware has become progressively
more powerful: processors speed up, address spaces move from 16 to
32 bits, memory gets cheaper, and so forth.
So Rule 2 has been relaxed.

Sunday, December 16, 2007

C++ Development on Linux

My new team at Google uses C++. After spending the last 6 months in Java and JavaScript, I welcomed the opportunity to work again in a language I know well and like a lot.

The team is tiny, and at first the fact that there isn't really a setup for debugging struck me as odd, but explainable. I was coming from an environment where everything started with instructions (or the writing thereof) on how to build and run the project on the local machine. The road traffic team runs everything, including the dev builds, in the data center, but in a pipeline that is separate from production.

But how do you debug? Well, that has not been figured out yet. Currently with printfs and logs.

OK, weird... So I, too, used printfs while developing the new little feature I wrote first; I, too, set up a special pipeline for myself to run, and used the logs. Then I took a Python class, and that became the impetus to look into running everything locally - I wanted to use the new skills.

The very first stage of the pipeline worked just fine, but the second did not. There was no way I could debug thousands of lines of existing code with printfs, so I set out to investigate Linux debuggers.

What I discovered was truly horrible. The only debugger I could find was gdb - a command-line debugger with no UI whatsoever. If you come from the Windows environment, think ntsd, but without the extensions. On top of it there is ddd - a windbg-like shell, only an order of magnitude more primitive.

Used to the richness and power of Visual Studio? Tough luck, my friend, welcome to the best 1975 can offer! Back to the future...

To the developers who say that Linux is a better development environment - you are crazy. Patently insane! You just have no idea! You only use Linux because you've never been exposed to programming the way it is supposed to be done. And the people who do switch platforms never come back, so you do not hear from them.

Proof? There is Emacs emulation in Visual Studio, but there is no Visual Studio emulation in Emacs. You cannot claim that this is because Emacs keystrokes are more intuitive - it is probably because the people who leave the Unix environment to program Windows never come back...

It looks like Microsoft is really the only company that is spending big money on courting developers. It is unfortunate that with the Vista release it appears to have put a bullet through its own head. I don't know if it will recover, but I hope to God it does, because otherwise there will be no company left that really cares about developers. And the world will have no alternative to gdb.

Saturday, December 15, 2007

Scheduling a software project

For the first 6 years of my career I did not hear the word 'schedule' once.

I was working at Bentley Systems then, starting as a new graduate and working my way up to the founders' team (not in the sense that I became one of the founders, of course; but they were active developers, they worked on fun stuff, and I was working with them). There weren't many schedule reviews - most of the top managers in the company were engineers, and preferred to learn the state of the world by actively contributing code. In particular, the founders worked on every new project the company started.

Then I moved to the Windows CE team at Microsoft, and for my first 3 years there I did not hear much about schedules either - despite the fact that I became a dev lead almost immediately after joining the team. Later on we started drawing boxes in MS Project, but this was mostly to keep track of who was working on what. There was no attempt to hold individuals to a date.

The same was true in my next job, as a dev manager on an incubation team. We did have a schedule, and it was my job to make one, but it was a tool to help us track where we were. It was never a goal to 'hit the schedule'.

When I moved to Windows Home Server, it was a different world. For a consumer product, there is a huge incentive to be ready in time for the Christmas shopping season - consumer electronics stores do close to 70% of their yearly business during the three months leading up to the New Year. If you aren't ready for Christmas, you might as well skip the release altogether and wait for the next year.

In economics there is a well-known truism that a country cannot have a fixed exchange rate, free flow of capital across its borders, and an independent monetary policy all at the same time. One can have any two of the three, but not all three.

For example, suppose the government has pegged its exchange rate to a big currency (the dollar), and the flow of capital across the border is not restricted. The way a government usually conducts monetary policy (which is a way to use its control over the money supply to influence the economy) is by setting interest rates. Suppose it sets its interest rate above the one set by the Fed.

Well, that would be an arbitrage opportunity - someone can borrow an unlimited amount of money in the US at the Fed rate, move it into the country with the higher interest rate (free capital flow!), and have the other government pay the higher rate. Because the exchange rate is pegged, there is no risk - we're making something (the spread between the rate we borrow at and the rate we are paid) for nothing.

Arbitrage is the ability to make a non-zero amount of money from nothing, and without risk.
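
A toy illustration in C of how the money is made (the rates and the amount are made up for the example):

#include <stdio.h>

int main(void) {
    double principal   = 1.0e9;  /* borrow $1B in the US...       */
    double fed_rate    = 0.04;   /* ...at the Fed's 4%...         */
    double pegged_rate = 0.07;   /* ...and park it abroad at 7%.  */

    /* The peg removes the currency risk, so the spread is pure profit. */
    printf("Risk-free profit: $%.0fM a year\n",
           principal * (pegged_rate - fed_rate) / 1.0e6);
    return 0;
}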

In no time you will find an enormous mass of capital streaming into the poor country, wiping out its resources and transforming them into huge bonuses on Wall Street.

Well, a software equivalent of this principle does exist, and it goes as follows: it is impossible to have a fixed release date, a fixed feature set, and product quality all at the same time. This, of course, is true for relatively big projects - I can predict precisely how quickly I can write a "hello, world" program, and it will do exactly what it needs to do and be bug-free.

But most of the time in this industry people are working on a bunch of interdependent features, and the code is built on a very rich platform. At estimation time, the environment (the dependencies below and to the side of your project) is not known, and much of the time the information about the interfaces on the top side of your project (customer requirements) is incorrect. Add to this other eventualities - an engineer getting sick, or a key resource on another team leaving for a month-long vacation in Nepal - and you will understand why predicting the schedule, given a fixed feature set and quality, is an NP-complete task.

Of course, if you're interviewing a project manager, he or she will never admit this. The standard solution people use to address the problem is padding: add 10% for the meeting tax, 10% for expected sick days, 20% for re-engineering, 30% for unknown customer requirements.

The problem with padding is that its net effect is almost always huge - doubling or tripling the originally estimated project time. Plus, you don't really know how many sick days you will actually take, or how many new customer requirements will be discovered en route. So the result of padding is a grossly overestimated schedule, which is about as useless as the original underestimated one.

Because if the problem with underestimating the schedule is that you will miss it, the problem with overestimating it is that you will either cut features that did not need to be cut, or quote your customer a price that will be beaten by a competitor.

There is another insidious side to overestimating: you cannot fool your people. Suppose a developer quoted you a week for a particular task, and you, the manager, padded it by another week. Well, the dev now has 2 weeks on the schedule, and because he or she is absolutely sure the task is doable in just a week, the first one will be spent browsing the web or playing Sudoku. And then the dev will get the flu and miss the schedule anyway! I am quite convinced that this phenomenon is very common, because in my career I have seen many an overblown schedule missed anyway.

So with this in mind, we set out to build Windows Home Server with the understanding that the time was very important, as was the quality - so we would compromise on the feature set: everything that could not be done by the time we needed to ship would be mercilessly cut. The quality and the date would be held firm.

This sounds fantastic on paper. In fact, I read and re-read the last paragraph, and I absolutely love it. We were so smart! Why didn't anyone figure this out before us? There must be a Nobel prize in project management in it!

The problem is that the features making up the skeleton of your project usually have enough variability in and of themselves to make cutting secondary features a moot point. In reality you don't have enough time to do even what absolutely needs to be done for the project to ship. So in practice it is the feature set that is fixed, and the real trade-off is between quality and time.

And that was exactly the tradeoff that happened in the run-up to the release of Windows Home Server.

So what lesson did I learn from all this? I think that fixing the date leads to crappy quality, and that it is better to take the Halo route - the game is ready when it is ready, even if Bill Gates tells you otherwise. So now I am back to working on projects that are quality-driven, not release-date-driven.

Disagree? Post a comment to the contrary, and get yourself employed as a project manager for the next version of Windows. Microsoft will pay millions if you can get it to ship on time and at quality. :-)

Golden Compass: a movie review

Just came back from the movie "Golden Compass", and it was a disappointment. I took the kids out to see it because of a generally positive New York Times review, but the reviewer was too kind. My wife was smarter: she refused to go.

I must preface this by saying that my wife and I read the Pullman trilogy earlier this year and were a little underwhelmed. The first book I thought was OK to good, but it sort of went down from there. The plot was good (up until the point where it hit the nun turned physicist), but the characters were a little overblown (Lyra in particular was so rebellious, and cunning, and manipulative that she felt like a caricature of a normal person), and the narrative turned tedious in a number of places.

I actually felt that the book was a very good candidate for a film. The plot was good, there are plenty of opportunities for grand vistas, and the movie could easily dispense with the tedious parts and present the plot more dynamically.

Unfortunately, this was not to be. I cannot say that it was a total bust - it may even be better than average. It is definitely not like "In Her Shoes", which I caught a glimpse of on an airplane - after 10 minutes of it I felt my IQ had dropped 20 points, and in the 15th minute I turned it off and went to bed. "Golden Compass" is definitely not bad in that sense. I thought it was much better than Narnia, and my daughter thought it was better than Eragon.

The movie suffered on 3 major points. First, the photography was definitely not great. There was a lot of opportunity to make it so, as the plot was very conducive to it, but the team did not deliver. In one scene (where Lyra runs away from Mrs. Coulter) it looked like the background was superimposed on her running figure - like an old movie shot from inside a moving car, where the background does not look real. The northern vistas were very subdued. The bear was just not very impressive - a step back in time towards Jurassic Park, and away from the standards set by Lord of the Rings. The battle scene... let's not even get into that.

Second, the characters lost a lot of their dimensions. In the book, nobody was entirely good or entirely evil. It was the Master of Jordan College who tried to poison Lord Asriel, and Lord Asriel sacrificed a child to his experiment of opening a pathway to a parallel universe. The movie was sterilized of any attempt at the subplot of how pursuing the "greater good" can lead to a lot of local evil. That turns out to be a major - if not the biggest - component of the book, and omitting it made the movie really shallow.

Finally, the movie was way too short relative to what it was trying to bite off. The problem with the book is that there aren't that many plot lines that can be cut, especially if you plan the sequels, and 2 hours is just not enough time to present them all in any detail. What that led to was lots of people who were present in the movie, though it was entirely unclear why. For example, Lyra runs into Lee Scoresby, and the next thing you see is her flying in the balloon with him. The witches start fighting on Lyra's side after barely a 2-minute conversation. Who the witches are (or, for that matter, the gyptians, or the armoured bears) is never really explained.

To sum up: this is a movie I will not be buying on DVD. If you have not seen it, wait a while and then get it on Netflix if you must.

Setting up Slickedit to work with large code base on a server

I am writing code for Linux, but my editing environment is Windows - I use SlickEdit to edit stuff off a file server that hosts my home directory.

If you use this approach naively, as I did in the very beginning, there is a good chance you will end up with a configuration that is painfully slow - from opening files, to clicking through menus, to doing context switches.

Here are the steps one needs to take to get Visual SlickEdit to work well in this environment. The instructions are relevant for really big projects (tens of thousands of files); if yours is only a few files, you might not need them.

(1) Do NOT use UNC (\\server\share) names. Map a network drive instead. This makes the SMB connection persist longer than it does with UNC access, and makes switching away from and back to the editor much, much faster. This may depend on the server, too - in my case the server is a NetApp appliance, and it makes a huge difference.

(2) If menus take a long time to show up, turn off the makefile menus. What happens here is that whenever you click on a menu, SlickEdit re-reads all of the makefiles in your project. Here are instructions on how to do it, from SlickEdit's support:

Do you have a large number of makefiles in your project or workspace? Or any makefiles at all?

You can go to Macro-->Set Macro Variable and set the value of def_show_makefile_target_menu to either 2 or 0. This slowdown is usually seen when you have a large number of makefiles and/or source files in your workspace and stems from the Target menu being added to the Build Menu.

1) Bring up the Set Variable dialog (Macro-->Set Macro Variable).
2) Enter def_show_makefile_target_menu in the "Variable:" field.
3) Enter 2 in the "Value:" field.
4) Hit OK.

Here are the variable's valid values just in case you want to move to the 0 level:

0 - Disable all makefile submenus (Build menu and project toolbar)
1 - Enable all makefile submenus (Default)
2 - Enable makefile submenus only in the project toolbar (Build menu makefile targets are disabled)

(3) Remove all files named ltmain.sh from the project. SlickEdit reparses them every time it regains focus, resulting in more network traffic - and longer switch latency.

(4) [Added 02/06/2008] Disable Auto-list compatible parameters in Tools->Options->File Extensions Setup->Tagging (this needs to be done per extension). See http://1-800-magic.blogspot.com/2008/02/slow-slickedit-part-ii.html.

Have more tips about editing on the network? Share them here...

Tuesday, December 11, 2007

How to debug JavaScript using Visual Studio Express

http://www.berniecode.com/blog/2007/03/08/how-to-debug-javascript-with-visual-web-developer-express/

I used VS2005 for debugging JS. There's no way it is better than Firebug, but it's better than MSE for sure (and is really the only not entirely sadomasochistic option available for IE7). This article explains how to do it for free...

Monday, December 10, 2007

US electoral politics

Suppose you are hiring a software developer, and you get a guy in the interview raving about how computers are bad for humanity and should never, ever have been invented.

Or you're hiring a dev manager, and the interviewee tells you that all really cool software is written only by brilliant individuals, and that programming in teams is just plain stupid.

Would you hire these people for these positions? The answer is probably an emphatic "NO!" As in - "no, no way!"

Interestingly, when the American electorate hires its chief executive, half of the country reliably falls for people who say that government is fundamentally bad and incompetent, should never exist, and that everything should be done by the private sector.

Interestingly, the private sector does not, by and large, hire people like that. In the private sector we prefer people who believe in what they are doing, and strive to do their job well.

This becomes a self-fulfilling prophecy. If people do not expect the government to be competent, competent people do not end up in government. The government is then run by idiots, and becomes incompetent - as expected.

When I first came to the US, I was actually dumbstruck by how poorly the trivial things that usually take no time in Russia (or used to, really, because I left Russia a long, long time ago) are handled here.

Two examples. I needed to get a form from the INS for a college application. This was before the internet. I came to the office and asked for the form. I was told to wait in line. I waited in line - for 4 hours. Every 15 minutes to half an hour I would come up to the counter to remind them that they just needed to hand me a piece of paper, but I was sent back. 4 hours later, when my turn finally came, the officer reached behind the desk and handed me the form. The whole transaction lasted less than a minute.

Later that year I was waiting in line at a financial aid office. There were not that many people, but it took almost the whole day to get through. My college in Russia had 10 times the number of applicants, yet they were handled an order of magnitude faster.

Of course, it is hard to find a more incompetent bunch of morons than the current administration, where the only qualification for the job is your conservative credentials. There's been tons written on how these idiots have handled anything - ANYTHING! - they touched. I will not repeat this.

But just today I read a piece in the NYT about the White House press secretary not knowing what the Cuban Missile Crisis was.

Don't worry scrote. There are plenty of 'tards out there living really kick ass lives. My first wife was 'tarded. She's a pilot now.

Sunday, December 9, 2007

Memory is not free (more on Vista performance)

A while ago I was running the Windows CE part of the Microsoft MN-700 project, a joint project between Windows CE and Microsoft Hardware (it also happened to be my thesis for an entrepreneurship class in the TM MBA program). I always had a soft spot in my heart for cheap, pervasive smart devices...

Anyway, this particular board had a version of Linux that had been ported to it by Broadcom, so a direct comparison between Linux and Windows CE routing performance was possible... and it was not to Windows CE's advantage.

Far from it. Linux was routing packets, WAN-LAN, at ~30Mbps, and wireless at ~20. Windows CE was crawling at barely 12Mbps wired and 6Mbps wireless. Now, this was CE's debut in the router space, so, obviously, no perf tuning was done previously - this was our first attempt.

Be that as it may, we were past the date when the product could have been moved by ship in time to guarantee shelf space for Christmas, and dangerously close to the threshold for an airlift (it turns out you need to have it in stores around mid-to-late August, or you kiss your Christmas sales goodbye, because the shelf space will be given to other products).

After much profiling during many sleepless nights, we found that no particular part of our system was slow - it was slow as a whole. After eliminating the last data copy (so a packet went essentially from the DMA buffer of one network controller directly to the DMA buffer of the other), we were still slower than Linux, which was completely beyond us - you cannot copy data fewer than zero times!

Then we started looking at the CPU performance registers (as it turned out, we could read the Linux CPU state with a hardware probe). What we found was that Windows CE had a LOT more instruction cache misses than Linux. The CPU (a MIPS variant) had an 8K I-cache, and the entire Linux routing code path fit into it. The Windows CE path was 12K of code, so 50% of the cache was constantly being evicted.

After we changed the routing algorithm to be more cache-local (by routing batches of packets rather than individual packets), we started doing 35Mbps WAN-LAN and 25Mbps wireless (which is about the practical maximum for "g") - 20% better than Linux.
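
Here is a hypothetical sketch of the batching idea (not the actual CE code): instead of running the entire routing path for each packet - which evicts the small I-cache at every stage transition - each stage runs over a whole batch of packets, so its code stays cache-hot.

#include <stdio.h>
#include <stddef.h>

#define BATCH 32

struct packet { int id; int out_port; };

/* Stand-ins for the real stages (classification, NAT, forwarding). */
static void classify(struct packet *p, size_t n) {
    for (size_t i = 0; i < n; i++) p[i].out_port = p[i].id & 1;
}
static void forward(struct packet *p, size_t n) {
    for (size_t i = 0; i < n; i++)
        printf("packet %d -> port %d\n", p[i].id, p[i].out_port);
}

int main(void) {
    struct packet batch[BATCH];
    for (int i = 0; i < BATCH; i++) batch[i].id = i;

    /* Each stage over the whole batch, instead of all stages per packet. */
    classify(batch, BATCH);
    forward(batch, BATCH);
    return 0;
}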

So what does all this ancient history mean for modern computing? We have come to believe that memory is basically free - an inexhaustible resource. It is so plentiful that running out of it has been deemed grounds for a program abort rather than graceful handling - most programs die outright if "new" or "malloc" fails. So plentiful that it is pretty much impossible to say how much memory a program uses in Windows.

However, even though 4GB of RAM is now firmly under $100, and it is hard or impossible to buy a computer with less than 1GB, DRAM is very, very, glacially slow, and caches are still very limited.

Here is what it costs to access data on a modern CPU (data from http://www.nwcpp.org/Downloads/2007/Machine_Architecture_-_NWCPP.pdf):

Store       Size     Access cost (clocks)
Registers   64B      1
L1 cache    32KB     3
L2 cache    4MB      14
DRAM        4GB      200
Disk        500GB    15M


That's right - data that has to come from the L1 cache already costs 3 times as much as data in registers, and the moment your working set spills out of the 4MB L2 cache, an access becomes 15 times slower than L2 - or 60 times slower than L1.

If you're writing commercial software today, it probably isn't complicated math computations - the vast majority of it is exactly this: data accesses. Sorting stuff. Putting it in hashes. Computing hash codes. Retrieving. Enumerating, looking for matches. All of it either goes to memory or comes from it.

And as you can see, it only takes 1 miss in every 15 accesses to roughly double the time your program spends on memory: 14 L2 hits plus one trip to DRAM is 14*14 + 200 = 396 clocks, versus 210 clocks for 15 L2 hits - and that is only about a 7% data cache miss rate. At fifty percent cache misses, your program is slower by a decimal order of magnitude.
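
You don't have to take the table's word for it. Here is a little C program (mine, not from the slides above) that reads the same array once sequentially and once in a pseudo-random order; on typical hardware the random walk, which misses the cache on almost every access, should be several times slower:

/* Compare sequential (cache- and prefetcher-friendly) access with random
   (cache-hostile) access over an array much larger than the L2 cache. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64 * 1024 * 1024)    /* 64M ints = 256MB, far bigger than L2 */

int main(void) {
    int *a = malloc((size_t)N * sizeof(int));
    if (a == NULL) return 1;
    for (int i = 0; i < N; i++) a[i] = i;

    long long sum = 0;
    clock_t t0 = clock();
    for (int i = 0; i < N; i++) sum += a[i];          /* sequential walk */
    clock_t t1 = clock();

    unsigned idx = 1;
    for (int i = 0; i < N; i++) {
        idx = idx * 1664525u + 1013904223u;           /* cheap LCG       */
        sum += a[idx % N];                            /* random walk     */
    }
    clock_t t2 = clock();

    printf("sequential: %.2fs, random: %.2fs (sum=%lld)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    free(a);
    return 0;
}

The exact ratio depends on the machine, but the point stands: the same number of additions, wildly different run time, and the only difference is cache behavior.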

I am willing to bet the bank that Vista's overall sluggishness is directly traceable to memory abuse - code paths that spill out of the cache all too often, poor cache locality of memory accesses, and so on. You can actually see it: I have seen laptops with relatively slow CPUs and monster gaming desktops with extremely fast CPUs have a very similar "feel" under Vista. The one factor common to this otherwise very different hardware is memory latency. So a high rate of cache misses seems to be the only rational explanation.

Here's an interesting comparison of a Mac Plus (1987) vs an AMD desktop (2007) on a few office application tasks.

http://hubpages.com/hub/_86_Mac_Plus_Vs_07_AMD_DualCore_You_Wont_Believe_Who_Wins

It turns out that the Mac Plus is faster on some tasks (a little bit) and slower on others (but not by much). So in 20 years we have not seen much improvement in software responsiveness, while CPU power grew by many orders of magnitude. As did memory bandwidth - but not memory latency!

So if you think the times when software had to be efficient have long passed, you're wrong. Think about it this way: your code has 1MB to run in, tops (other programs and the OS have a claim on the L2 cache as well) - and yes, that includes the code itself. What would be different in the way you design your code?