Thursday, January 31, 2008

Project Gutenberg for SONY Reader!

Alright, our office went on a 2-day ski trip to Whistler (one of the Google perks), and since I'm not an enthusiastic skier, I stayed behind to finish up some work, play some Halo, and work on other fun projects.

One of my recent fun projects is a proxy for the Project Gutenberg web site that adds the ability to convert books to the SONY Reader format. I have been playing with it for a while now, and it has slowly grown into ~1 kloc of Java code that actually seems to work.

So I thought I'd throw it out here to see what happens. Note that this is ALPHA code. There is a good chance that I will take it down soon to fix bugs. It is also a standalone program running an embedded HTTP server - eventually it will migrate into a servlet on Apache. But right now it seems to be doing something useful, so feel free to play with it.

Several notes:

  1. It is not intended to replace the official Gutenberg web site. Please go there directly if you need to download anything other than SONY books.

  2. This is running off my DSL connection. Please please please do not be a bandwidth hog. Absolutely no crawlers! Robots.txt is set accordingly, and there are abuse counters, so if you crawl, YOU WILL BE BANNED. FOREVER! Worse, crawling may trigger Project Gutenberg's abuse logic, and then the web site will be down, because I won't be able to get to them.

  3. This is ALPHA. This means bugs, potentially a lot of them. Please be a good member of the community, and if you do find a bug, leave a note with repro steps, the URL that did not work, etc. in the comments section of this article. Please check the previous comments first; the problem may have already been reported.

  4. If you are a hacker - you can probably hack this machine. But its image gets wiped every so often, and there is absolutely nothing else on that PC. And it's physically outside my home network, on its own IP. So your triumph will be hollow and short-lived.

Alright, without further ado, here's the link:

Give it a whirl!

UPDATE 02/03/08

I've fixed a ton of bugs in it today - I am actually feeling quite good about the code now. So I am upgrading it to ALPHA! I could use some feedback from users though - so if you did use it, please tell me how it went!

Known issue: makelrf, which I use to convert the texts, crashes on some files. When this happens, you get a 403 page.

Next steps:
  1. Move to Apache running as a servlet from current embedded web server.

  2. Move to a newer translation technology from makelrf.

When this is done, I will call it a BETA.

UPDATE 02/05/08

I've found and fixed the problem with makelrf crashing. It had static buffers for the title, author, and description, which were not very big - 40-something characters for the title, for example. They frequently overflowed, crashing the program.

I've made my software limit the strings to the sizes that makelrf supports. If you tried to download a Reader book from the site but failed, try again. It should now work. In my own testing, the site has been quite reliable recently.
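The fix boils down to clamping each metadata field before handing it to makelrf. Here is a minimal sketch of that idea in Python (the site itself is Java). The buffer sizes are placeholders, since the only known number is the "40-something" title buffer, and UTF-8 as the target encoding is an assumption; fit_to_buffer and clamp_metadata are hypothetical names.

```python
# Placeholder buffer sizes in bytes, including the C string terminator.
# Assumption: only the "40-something" title size is known; the rest are guesses.
MAKELRF_LIMITS = {"title": 40, "author": 40, "description": 128}


def fit_to_buffer(text, buffer_size, encoding="utf-8"):
    """Trim text so its encoded bytes plus a trailing NUL fit in buffer_size."""
    raw = text.encode(encoding)
    if len(raw) < buffer_size:
        return text
    # Keep buffer_size - 1 bytes for the text; errors="ignore" drops a
    # multibyte character that the cut would otherwise split in half.
    return raw[: buffer_size - 1].decode(encoding, errors="ignore")


def clamp_metadata(metadata, limits=MAKELRF_LIMITS):
    """Return a copy of metadata with every limited field trimmed to fit."""
    return {
        key: fit_to_buffer(value, limits[key]) if key in limits else value
        for key, value in metadata.items()
    }
```

Trimming by encoded bytes rather than characters matters for a C program with fixed char buffers: a title full of accented letters takes up more bytes than characters.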

Don't be evil!

I am slowly being persuaded that marketing is evil. Not as a discipline, but as a division of the company (textbook definition of marketing is actually more about strategic research; practical implementation is mostly about sales and PR).

Read this pearl, for example.


Microsoft SQL Server 2008 Roadmap Clarification

The past few months have been an amazing time for the SQL Server team as we gear up for the start of the global launch wave on February 27. The response to SQL Server 2008 has been overwhelmingly positive – in fact, we now have more than 100,000 downloads of our CTPs. What is catching users’ eyes? Scalability improvements, Resource Governor, Filestream, spatial data support, data compression, policy-based management….the list goes on and on.

Simply put, SQL Server 2008 is a significant release for us – one that builds on all of the great things that we were able to deliver in SQL Server 2005. We see it as a critical step forward for our data platform and the foundation of our broader vision for business intelligence. Based on what we are hearing from customers, as well as the results of the latest benchmarks, it seems the industry agrees.

Not surprisingly, one of the top areas of focus for us is always to deliver a high quality product, and in a very predictable manner. This is vital for our customers and partners – which is why we’ve frequently discussed our goal of releasing SQL Server 2008 within 24-36 months after SQL Server 2005. We are on track to reach this goal.

To continue in this spirit of open communication, we want to provide clarification on the roadmap for SQL Server 2008. Over the coming months, customers and partners can look forward to significant product milestones for SQL Server. Microsoft is excited to deliver a feature complete CTP during the Heroes Happen Here launch wave and a release candidate (RC) in Q2 calendar year 2008, with final Release to manufacturing (RTM) of SQL Server 2008 expected in Q3. Our goal is to deliver the highest quality product possible and we simply want to use the time to meet the high bar that you, our customers, expect.

This does not in any way change our plans for the February 27 launch and we look forward to seeing many of you in Los Angeles and other events around the world. Please keep the great feedback coming and thank you again for your ongoing support of SQL Server!


What this really says is - we're late. We are moving the expected release date for SQL Server 2008 from Q2 to Q3. That is literally what it should have said. Instead, the message is a collection of atrocious doublespeak watered down by a bunch of idiotic self-congratulations.

After reading this, one cannot help being cynical about the writers and, by proxy, the company that produced this thing. The first thing that comes to mind is - why do they think the rest of humanity is THIS stupid???

Here's a parody of this PR pearl, thank you Vadim for pointing it out...

Tuesday, January 29, 2008

Life advice from Halo: Find your strengths and abuse them!

I've spent quite a bit of time since the release of the Xbox in 2001 playing what I consider one of the best games ever - Halo. I have a treadmill in front of a game machine (most recently a powerful PC built with one goal in mind - playing Halo), so every day I set it to 2 MPH at a 12% incline and spend 90 minutes playing the game. That adds up to about 2,000 ft of elevation gain and 1,000 calories burned.
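A quick sanity check of the elevation figure, as a minimal sketch. It assumes the incline is a percent grade (rise over run, the usual treadmill unit) and that the belt distance is measured along the slope. Under that reading, 2 MPH for 90 minutes lands close to the 2,000 ft mentioned above, whereas reading "12" as degrees would give roughly 3,300 ft. The function names are illustrative.

```python
import math

FEET_PER_MILE = 5280


def elevation_gain_ft(speed_mph, minutes, grade_percent):
    """Vertical feet climbed at a constant percent grade (rise over run)."""
    belt_distance_ft = speed_mph * (minutes / 60) * FEET_PER_MILE
    angle = math.atan(grade_percent / 100)  # grade -> slope angle
    return belt_distance_ft * math.sin(angle)


def elevation_gain_ft_degrees(speed_mph, minutes, degrees):
    """Same calculation if the incline were given in degrees instead."""
    belt_distance_ft = speed_mph * (minutes / 60) * FEET_PER_MILE
    return belt_distance_ft * math.sin(math.radians(degrees))


print(round(elevation_gain_ft(2, 90, 12)))          # about 1,900 ft
print(round(elevation_gain_ft_degrees(2, 90, 12)))  # about 3,300 ft
```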

The game is so good because it's realistic. Not only does it sport a universe where objects respond to your actions the way you'd expect them to in the real world; you also see groups of people behaving in ways not dissimilar (psychologically) from real life. The behaviors range from outright chivalry to deep sociopathy.

One of the principles I learned in the game that I apply in regular life can be formulated like this - "Find your strength and abuse it. Don't suck at everything else, but do find every opportunity to play to your strength."

Most Halo players are generalists - they are equally good (or bad) in all situations and with all weapons. Occasionally you meet a true professional - a sniper, a Banshee pilot, a Warthog driver. These people are REALLY good with a particular weapon or skill.

A sniper will pick you off from a remote corner of the playing field, but will also blow your brains out with his rifle from merely a few steps away. A master Banshee pilot can evade homing rockets, rendering useless the only weapon that is worth anything against vehicles. A driver can splatter a row of advancing infantry in one big swoop.

A team that has any one of them on their roster is unbeatable.

The only way to win against a team like this, as I learned, is to deny the professional player his or her tools. If it's a sniper, steal his team's sniper rifle. If it's a pilot, grab their Banshee before he does. You may not be as good with either of them as the adversary, but it still makes a huge difference, because most of the time the professional player is pretty bad with other weapons, vehicles, or situations.

Find your strength and abuse it.

In real life, this statement goes against the commonly held wisdom that the best person is a well-rounded person. Microsoft's newest review model, for example, details on the order of 5 competency classes, mostly orthogonal (yes, a 5-dimensional space!), in which a person must succeed to advance in his or her career.

I have had many a junior developer come to me and say "I love coding, but my manager tells me that to get to the next level I must find a way to work with other teams, so I need to drop what I am doing and find an intergroup project."

It goes the other way around, too - I have seen some reasonably good people managers being driven insane because they were "not technical enough".

I think that an insistence on a well-rounded set of competencies almost guarantees mediocrity in the long run.

Bill Gates has not become what he is because he is a well-rounded person. In fact, Bill is a very uninspiring leader, probably a terrible team player, and certainly would not have moved much beyond a dev lead within Microsoft's new review model. But he is a brilliant strategist, and a great technologist, and he found a way to exploit his strengths, while hiring other people to compensate for his weaknesses.

I don't think many people care much about Guido van Rossum's interpersonal or leadership skills (I am not saying he lacks them, just that they are unimportant). He got to where he is - the designer of a wildly popular programming language - by being a brilliant coder and a great computer scientist.

Neither did "the cookie man" Lou Gerstner achieve - against all odds - his amazing turnaround of IBM because he was a great technologist, or indeed because he knew anything about engineering or any other aspect of IBM's business.

Wherever you look - from Heinrich Schliemann to Albert Einstein - the people who left an imprint on civilization had glaring deficiencies in many, many, many things. They had one or two overwhelming strengths, and the courage to do what they do best rather than what they are told.

So for the individual contributors, I would repeat again and again. "Find your strength and abuse it. Don't suck at everything else, but do find every opportunity to play to your strength."

And for the managers - "Find what your people's strengths are, and put them in positions where they can abuse them for the good of the team."

Life advice from Paul Graham

"The most impressive people I know are all terrible procrastinators. So could it be that procrastination isn't always bad?"

Life advice from Joel Spolsky

"Without further ado, then, here are Joel's Seven Pieces of Free Advice for Computer Science College Students (worth what you paid for them):

  1. Learn how to write before graduating.

  2. Learn C before graduating.

  3. Learn microeconomics before graduating.

  4. Don't blow off non-CS classes just because they're boring.

  5. Take programming-intensive courses.

  6. Stop worrying about all the jobs going to India.

  7. No matter what you do, get a good summer internship.


(emphasis mine)

Life advice from Richard Hamming

"Over on the other side of the dining hall was a chemistry table. I had worked with one of the fellows, Dave McCall; furthermore he was courting our secretary at the time. I went over and said, ``Do you mind if I join you?'' They can't say no, so I started eating with them for a while. And I started asking, ``What are the important problems of your field?'' And after a week or so, ``What important problems are you working on?'' And after some more time I came in one day and said, ``If what you are doing is not important, and if you don't think it is going to lead to something important, why are you at Bell Labs working on it?'' I wasn't welcomed after that; I had to find somebody else to eat with! That was in the spring.

In the fall, Dave McCall stopped me in the hall and said, ``Hamming, that remark of yours got underneath my skin. I thought about it all summer, i.e. what were the important problems in my field. I haven't changed my research,'' he says, ``but I think it was well worthwhile.'' And I said, ``Thank you Dave,'' and went on. I noticed a couple of months later he was made the head of the department. I noticed the other day he was a Member of the National Academy of Engineering. I noticed he has succeeded. I have never heard the names of any of the other fellows at that table mentioned in science and scientific circles. They were unable to ask themselves, ``What are the important problems in my field?''"

Life advice series: Warren Buffett

"Going into the meeting, I was thinking that I would receive a great deal of advice about investing and how to quantify intrinsic value. I figured he'd tell us all about "Mr. Market" and how his favorite holding period is forever. Then I figured we'd get around to his bet on the Euro or his belief that the market is irrational and inefficient. Or perhaps the second richest man in the world would, I don't know, talk about money?

Total time spent talking about any of the above: Zero. Zilch. Nada."

--- Later update: the blog referenced above has become invite-only, but the cached copy was still there. Normally I hate reposts, but in this particular case the content merits it. So here goes. ---

I spent 6 hours last week in Omaha with Warren Buffett. As I walked into the meeting I was pleasantly surprised to find Mr. Buffett dressed more like a scroungy sophomore chemistry student than the greatest investor of all time. It was an open Q&A session with some of my colleagues and me for about 3 hours.

Going into the meeting, I was thinking that I would receive a great deal of advice about investing and how to quantify intrinsic value. I figured he'd tell us all about "Mr. Market" and how his favorite holding period is forever. Then I figured we'd get around to his bet on the Euro or his belief that the market is irrational and inefficient. Or perhaps the second richest man in the world would, I don't know, talk about money?

Total time spent talking about any of the above: Zero. Zilch. Nada.

Let's get to what he DID talk about. As a big fan of "Top" lists, I've compiled the "Top 5" things (prioritized) I learned from Warren Buffett that day:

1. Be Grateful -

There are roughly 6 billion people in the world. Imagine the world's biggest lottery, where every one of those 6 billion people was required to draw a ticket. Printed on each ticket were the circumstances in which they would be required to live for the rest of their lives.

Printed on each ticket were the following items:

- Sex
- Race
- Place of Birth (Country, State, City, etc.)
- Type of Government
- Parents names, income levels & occupations
- IQ (a normal distribution with a mean of 100 and a standard deviation of 20, so roughly a 68% chance of your IQ falling within one standard deviation of 100)
- Weight, height, eye color, hair color, etc.
- Personality traits, temperament, wit, sense of humor
- Health risks

If you are reading this blog right now, I'm guessing the ticket you drew when you were born wasn't too bad. The probability of drawing a ticket with the favorable circumstances you are in right now is incredibly small (say, 1 in 6 billion). The probability of being born as your preferred sex, in the United States, with an average IQ, good health, and supportive parents is minuscule.

Warren spent about an hour talking about how grateful we should all be for the circumstances we were born into and for the generous ticket we've been offered in life. He said that we should not take it for granted or think that it is the product of something we did - we just drew a lucky ticket. (He also pointed out that his skill of "allocating capital" would have been useless if he had been born into poverty in Bangladesh.)

2. Be Ethical & Fair

Continuing on the analogy above, consider this scenario:

Imagine that you were selected as the one person (out of 6 Billion) to create the systems of the world. This includes the type of government, social programs, tax systems, military systems, job markets, laws, regulations, etc.

The only catch was this: You had to come up with systems that you believed were fair and that you wanted to live with, before you were allowed to look at your ticket.

When Warren talked about this it made me reconsider the definition of ethical behavior - what type of system would you create if you didn't know what ticket you had drawn? Would you take a different position on some of the programs you are for or against if you were surrounded by a different set of circumstances?

3. Be Trustworthy

This may be a minor point that Mr. Buffett was trying to make, but he told a simple story that affected me greatly. He told of the founder of the Nebraska Furniture Mart, one of his companies, and how she came from a poor Jewish family and couldn't read, write, or speak English. She had survived the Holocaust, spent 16 years bringing her family to the U.S. (at $50 per person), and grew the Nebraska Furniture Mart from a $500 initial investment to $350 million in annual sales from a single location in Omaha.

She told Warren at one point that the way she evaluated people was simple: She simply asked herself, "Would they hide me?" What a great way to judge your instincts about whether to trust someone or not.

4. Invest in Your Circle of Competence

Warren talked at length about investing within your circle of competence. This applies as much to entrepreneurship as it does to investing in public securities. One thing that continually amazes me is how much discipline Warren has in never letting himself get excited about a deal that he doesn't understand. He understands his weaknesses, limitations, and the types of businesses that he gets.

He said that it is crucial that people clearly recognize what they don't understand, and place their effort and energy on businesses or career paths that allow them to bet big on themselves doing something that they do understand. He said that it's "not so important how big the circle is, but it's important that you know where the perimeter is, and when you're outside of it."

5. Do What You Love

Perhaps the reason that we've heard this a million times is that it's true. Warren talked at length about how excited he is to wake up in the morning and to do what he loves. He talked about how important it is to have the freedom in your life to paint your own canvas any way that you like. He said that many people talk about how they are going to just work at a high-paying job "for a little while" and then go do what they love - he equated that to "saving up sex for old age." He said to "never do something that doesn't excite you or that you dislike."

Not the advice you'd expect from somebody worth over $40 Billion?

Sunday, January 27, 2008

Absolutely hilarious...

PC World's "The 15 Biggest Tech Disappointments of 2007",140583-page,1-c,techindustrytrends/article.html

Apparently, Leopard can eat data...

What's wrong with Windows Mobile?

Just read this article on Gizmodo:

One could spend a whole day listing the things that are better on the iPhone than on Windows Mobile - the UI, the browser experience (speed), the media support, the web apps - or the things that are better on Windows Mobile than on the iPhone - the browser support (page rendering for the smaller screen), the business apps, and the email, calendar, and contacts support.

I am not going to. The point I want to make is that the iPhone has already won. Despite the fact that I have spent a third of my career working on Windows CE, despite the fact that I am still using a Windows Mobile phone after considering switching to the iPhone and playing with one, on and off, for about a month, despite all that - it's time to admit it.

All iPhone problems aside, it's version 1, and it is vastly superior to anything that we shipped up until version 5 of Windows Mobile. Hell, it took us 5 versions just to get it to the point where it would reliably receive calls - I was completely dumbstruck, I could not believe my lying eyes when it turned out that the iPhone is a reliable cell phone in version 1.

But the reason I believe it has truly, really won is because I do not think that Microsoft will be able to keep up. You see, to keep up, they would need to write a lot of code, cool code, exciting code, code that the customers will love. And do it quickly.

But the company culture no longer favors people who love to write code. It has become all about the release process, the planning, the scheduling. Fact: an average developer in the Windows organization at Microsoft produces about 1500 lines of code per year. Sure, some kick-ass devs are still there, but they are now a minority, and they are leaving in droves. The Force is no longer with them...

NB: I have not worked in Windows Mobile division for the last 3 years. My observations apply to the company as a whole. It is theoretically possible that WM is an oasis of green in the otherwise bleak picture. We will know for sure when the next version comes out :-).

Saturday, January 26, 2008

Smallest ever web server in Python

Suppose you want to compare Java to Python as far as writing a simple web server is concerned :-).

Without much ado, here's code that does the same thing in about half as many lines:

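# Python 2 code; in Python 3 this module became http.server.
import BaseHTTPServer

class SimplestWebHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    # Called for every GET request; self.path is the requested URL path.
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.end_headers()
        self.wfile.write('Hello, you requested ' + self.path)

def main():
    server = BaseHTTPServer.HTTPServer(('', 8000), SimplestWebHandler)
    print 'Server is looping.'
    server.serve_forever()

if __name__ == '__main__':
    main()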

Smallest ever web server in Java

Suppose you are writing a server, and you want it to expose a simple web page with, say, usage statistics. In JRE 1.6 Sun has implemented an HTTP server class that makes it remarkably simple.

Here's all you need to do, and your app will start responding to your browser on a given port (I chose 8000 for this example):



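import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;

class SmallestEverWebServer implements HttpHandler {
  // Called for every request that matches the context registered in main().
  public void handle(HttpExchange t) throws IOException {
    String response = "Hello! You asked for " + t.getRequestURI();
    // Content length is in bytes, not characters.
    byte[] body = response.getBytes("UTF-8");
    t.sendResponseHeaders(200, body.length);
    OutputStream os = t.getResponseBody();
    os.write(body);
    os.close();
  }

  public static void main(String[] args) throws IOException {
    HttpServer server = HttpServer.create(new InetSocketAddress(8000), 0);
    server.createContext("/", new SmallestEverWebServer());
    server.start();  // requests are served on the server's own thread

    System.out.println("Server is running.");
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));

    for (;;) {
      System.out.print("Enter 'exit' to exit> ");
      String line = in.readLine();
      if (line == null || line.compareToIgnoreCase("exit") == 0)
        break;
    }
    server.stop(0);
  }
}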
Friday, January 25, 2008

After 8 months, no longer a Noogler!

Today I got readability in the last major Google language - Python - so I am now officially a full-fledged member of the engineering team! The other three, which I already had, were C++ (well, obviously!), Java, and JavaScript, although I actually got them in the reverse order - first JavaScript, and last C++. And now, Python!

What's "readability", you might ask?

Since Google is growing insanely fast, its engineering culture is defined rather tightly, which helps it assimilate a lot of new people without dissimilating into complete bedlam. There are several very rigid requirements for writing source code:

  1. Every check-in (*) must be code reviewed.

  2. A person is only allowed to submit source code in a language, or to authorize submissions by other engineers, if (s)he was pre-approved by one of the few experts in that specific language through a process called "readability review".

  3. If a person does not have "readability", he or she must seek a code review (and an approval) from people who have the said readability.

  4. All Google code is written in a specific style (which varies by the language, but not by product). The style guides are published, and all code must conform to them.

  5. All new code must be accompanied by tests (this one is occasionally, but not frequently, violated).

(*) For non-engineers: to check in means to submit code to the central database. All other engineers, build machines, and test machines get it from there. When code is "checked out", it's a work in progress residing only on an engineer's computer that may, for example, not compile or work. When code is checked in, other people get it, so it had better work (and, in Google's case, be beautiful :-)).

Having readability in a language therefore allows one to approve check-ins by people who do not have it. This matters less for one's own check-ins, which have to be reviewed anyway, and more for the project, because there may be too few people (or none at all) with readability in a language that the team is using.

For example, I am the only one who has readability in Java and JavaScript in our team, and it helps a lot - before I joined, they had to go and seek people from the outside who would kindly agree to review their code and approve the check-in.
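The approval rules above boil down to a small predicate. Here is a minimal, hypothetical model in Python: Engineer, its readability field, and may_submit are illustrative names, not any real Google tool, and the model covers only rules 1-3 (review required; author or reviewer must hold readability in the change's language).

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Engineer:
    name: str
    # Languages this engineer holds readability in (hypothetical field).
    readability: Set[str] = field(default_factory=set)


def may_submit(author: Engineer, reviewer: Optional[Engineer], language: str) -> bool:
    """Hypothetical model of rules 1-3: a check-in needs a review by someone
    other than the author, and either the author or the reviewer must hold
    readability in the change's language."""
    if reviewer is None or reviewer is author:
        return False  # rule 1: every check-in must be code reviewed
    return language in author.readability or language in reviewer.readability
```

In this model, a teammate without Java readability can still submit Java code as long as the reviewer has it, which is exactly why a single readability holder on a team helps so much.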

To get readability, one has to produce a non-trivial body of code - usually several hundred lines - that uses a non-trivial amount of the Google platform and language facilities. It should be written in the accepted dialect and testify to a general command of the language. Most readability reviewers require it to be accompanied by unit tests.

After the code is written, it gets submitted to one of the few "readability reviewers" (when I joined the readability team for JavaScript, there were only 4 of them for the entire company, which was the reason I volunteered). The process often takes several weeks and a lot of back and forth between the reviewer and the reviewee. For example, my Python code was almost completely redone by the end of the review, which took over a month (although one week of it was the winter break).

Interestingly, most of the readability recommendations in my last three reviews were about tests (in the case of Java, just to write them - this is more or less my own approach when I review for readability), but in the case of C++ and Python the reviewers largely accepted the main code base and really took the tests to heart, making me more or less rewrite them completely.

Interesting, but not surprising. I have found that most of the arguments during code reviews at Google are about tests. The reason, of course, is that there are relatively few ways in which a competent developer can write the main code. Testing, however, is different - a typical unit test covers maybe 10% of everything there is to test, so selecting the right 10% provides excellent fodder for religious wars.

The other (strong) possibility is that we may be competent developers, but not competent testers :-)...

Thursday, January 24, 2008

Stories about Quality. 2. Why does software ship with bugs?

Bugs are an integral part of any software product. In my 20+ years of playing with computers I have yet to encounter a single software product that did not have them.

This seems to be a marked difference from the rest of the world - even from something as close to home as computer hardware. Your monitor may occasionally have "bugs", but that is an aberration rather than the norm. Yet the rules are somehow different for software. How come?

There are several roots to the problem.

The economics

We ship buggy code because we can. There are two reasons. First, the software industry is relatively young. Users still extend it a certain amount of forgiveness: as long as the software does something indispensable and solves their essential problem, they are willing to cope with the lack of quality - as long as there is no alternative.

Second, the software industry is not terribly competitive (which more or less guarantees that there is no viable alternative to a lot of software products). Unlike most physical goods, software has virtually no manufacturing and physical distribution costs - making and shipping 10 million copies is as easy as shipping 10 thousand. This leads naturally to monopolies and oligopolies (I am omitting a whole lot of discussion on how exactly this happens - there are volumes and volumes written on the subject).

Having little competition at all means that there is even less competition based on the quality of the product. So there is little incentive to improve it beyond the point where the product is usable.

The process

Most projects follow a relatively rigid model of product development: planning, followed by design, followed by development, followed by testing and stabilization.

The final dates are often inflexible, and even when they are not, the initial stages of the project tend to expand to fill the extra time.

The result is, when it comes time to ship, the last stage - which happens to be stabilization - is cut short. Bang! The product ships with a bunch of buggy features.

In Windows Home Server we modified the process to work per feature. Every feature had its own design-implement-stabilize pass, and the developer did not move on to the next feature before the previous one was done and stabilized.

This worked wonders on smaller features, but of course in every release there are a few features on the critical path - they are big enough to fill all the time allotted for the entire version. They also often define the product itself. For these features this model does not help at all - they still end up having their testing phase cut short.

The sheer magnitude of the problem

"The department's motto was, "Comprehending the infinity requires infinite time", from which they derived a curious result - why work?" - A & B Strugatsky, "Monday Starts on Saturday"

Vista has on the order of 50 million lines of code, which in turn depend on an amazing number of variations of the hardware it runs on and the software that runs on it.

While running QA for Windows Home Server early in the shipping cycle, I quickly learned (and my experience at Google, where almost all tests are written by engineers, corroborated it) that developing just the unit test code takes approximately twice the resources of developing the code under test.

If you tack on the costs of all other programmatic testing (stress, long-term, environmental, and integration testing, etc.), that adds roughly another 300% of the cost of the code under test. So truly exhaustive programmatic testing costs at least 200% + 300% = 5 times as much as writing the application in the first place.

This of course is in the cases where you can do it programmatically - the manual testing is cheaper upfront, but you have to repeat it over, and over, and over again, so it adds up quickly - probably to about the same grand total. Since the long-standing argument between the proponents of automation vs. manual testing stands unresolved, I suspect that there is no statistically proven cost difference between the two methods.

For the bigger projects like Windows this cost becomes truly monumental!

Why is testing so expensive?

It's because of what is called a test matrix - the set of tests that need to be written and run, which is the Cartesian product of all potential code paths (code coverage), all potential data inputs (data coverage), and all potential environments. The number of code paths alone grows combinatorially, and the input set is for all practical purposes infinite for all but the simplest applications.

Most test organizations survive by heroic efforts to establish equivalence relationships between test cases, which allow them to prune the test matrix. Then they fill the time they have with the most important of the test cases left over after the pruning.

This guarantees that the common case is relatively bug free, but when you step out of the common usage scenario - welcome to the bug farm!
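To make the test matrix and the pruning concrete, here is a toy sketch in Python. The three axes, their values, the equivalence rule ("the environment does not matter on the fast path"), the input priorities, and the budget are all hypothetical; the point is only to show the Cartesian product, equivalence pruning, and prioritization leaving the uncommon inputs untested.

```python
from itertools import product

# Hypothetical axes for one small feature; real products have many more
# values on each axis, and more axes.
CODE_PATHS = ["fast_path", "slow_path", "error_path"]
INPUTS = ["ascii", "empty", "unicode", "huge"]
ENVIRONMENTS = ["env_a", "env_b", "env_c"]

# The full test matrix is the Cartesian product of the axes: 3 * 4 * 3 cases.
full_matrix = list(product(CODE_PATHS, INPUTS, ENVIRONMENTS))


def representative(case):
    """Map a case to its equivalence class (assumed rule: the environment
    does not matter on the fast path, so env_a stands in for all of them)."""
    path, data, env = case
    return (path, data, "env_a") if path == "fast_path" else case


pruned = sorted(set(map(representative, full_matrix)))

# Prioritize the most common inputs first, then fill whatever time is left.
INPUT_PRIORITY = {"ascii": 0, "empty": 1, "unicode": 2, "huge": 3}
BUDGET = 10  # hypothetical number of cases the schedule allows
scheduled = sorted(pruned, key=lambda case: (INPUT_PRIORITY[case[1]], case))[:BUDGET]

print(len(full_matrix), len(pruned), len(scheduled))  # 36 28 10
untested = [case for case in pruned if case not in scheduled]
print(sorted({case[1] for case in untested}))  # ['empty', 'huge', 'unicode']
```

Every scheduled case uses a common input; the unicode and huge inputs never get run, which is the bug farm in miniature.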

Wednesday, January 23, 2008

Random permutations

Given an array, how do you shuffle its contents so every element is equally likely to be in any place?

This question has popped up on my horizon several times in the last few months: as an interview question, in real work, and in code reviews.

The answer is remarkably simple. Take the last element of the array, generate a random index from 0 to the end of the array - inclusive, very important - and swap the element at that index with the last one. Then consider the subarray [0...len-2], and repeat until only one element is left.

The reason that the range of the random index must include the index being swapped becomes obvious if you consider a two-element array. If the range excluded the last index, the random index would always be 0, so the two elements would always be swapped, and the array could never come out in its original order.
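Because the random draws are the only source of randomness, this can be checked exhaustively for small arrays: enumerate every possible sequence of draws (each equally likely with a perfect random generator) and count the permutations that come out. A minimal Python sketch; `shuffle_with_draws` and `outcome_counts` are names made up for this illustration. With the inclusive range every permutation appears exactly once; with the exclusive range a two-element array is always swapped, and in general only cyclic permutations come out (that variant is known as Sattolo's algorithm).

```python
from collections import Counter
from itertools import product


def shuffle_with_draws(n, draws):
    """Run the backwards shuffle on [0, 1, ..., n-1] using a fixed sequence of draws.

    draws[k] is the index drawn for position i = n - 1 - k.
    """
    a = list(range(n))
    for k, i in enumerate(range(n - 1, 0, -1)):
        j = draws[k]
        a[i], a[j] = a[j], a[i]
    return tuple(a)


def outcome_counts(n, inclusive=True):
    """Count the permutations produced by every possible sequence of draws.

    Each draw sequence is equally likely, so the shuffle is uniform exactly
    when every permutation shows up the same number of times.
    """
    # For i = n-1 down to 1, draw from 0..i (inclusive) or 0..i-1 (exclusive).
    ranges = [range(i + 1) if inclusive else range(i) for i in range(n - 1, 0, -1)]
    return Counter(shuffle_with_draws(n, draws) for draws in product(*ranges))


print(outcome_counts(2, inclusive=False))  # Counter({(1, 0): 1}): always swapped
print(outcome_counts(3))                   # all 6 permutations, once each
```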

Expressed in code, it would go like this:

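#include <stdlib.h>

void knuth_shuffle(int *array, int len) {
    // Walk backwards; position i receives a random element from [0..i].
    for (int i = len - 1; i > 0; --i) {
        // 0..i, inclusive. (rand() % n is slightly biased
        // unless n divides RAND_MAX + 1.)
        int index = rand() % (i + 1);
        int tmp = array[index];
        array[index] = array[i];
        array[i] = tmp;
    }
}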

A fairly loose proof that it actually generates a good shuffle is by induction. For len = 2, it's obvious. Assume that it holds for len, and let's prove that it holds for len + 1. The very last position, by the definition of the algorithm, has an equal probability of containing any element of the array. Then the problem is reduced to the case of len, which holds by the induction hypothesis.

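A stricter proof goes like this. The probability that an element originally at index a ends up at index len - 1 is

$$P(a \to \mathrm{len}-1) = \frac{1}{\mathrm{len}}$$

The probability that it ends up at index len - 2 is the probability that it was not picked on the first step, times the probability that it is picked on the second step, when len - 1 candidates remain:

$$P(a \to \mathrm{len}-2) = P(a \not\to \mathrm{len}-1) \cdot P(a \to \mathrm{len}-2 \mid a \not\to \mathrm{len}-1) = \frac{\mathrm{len}-1}{\mathrm{len}} \cdot \frac{1}{\mathrm{len}-1} = \frac{1}{\mathrm{len}}$$

... etc.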
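The "... etc." generalizes to any step. For an element to land in position len - k (k = 1, 2, ..., len), it has to escape each of the first k - 1 draws, where the draw at step j + 1 picks uniformly from len - j candidates, and then be picked from the len - k + 1 candidates that remain. The product telescopes (for k = 1 it is empty and equals 1):

$$P(a \to \mathrm{len}-k) = \left(\prod_{j=0}^{k-2} \frac{\mathrm{len}-j-1}{\mathrm{len}-j}\right) \cdot \frac{1}{\mathrm{len}-k+1} = \frac{\mathrm{len}-k+1}{\mathrm{len}} \cdot \frac{1}{\mathrm{len}-k+1} = \frac{1}{\mathrm{len}}$$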

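Simple, huh? But it still requires attribution: the algorithm is known as the Knuth shuffle, or the Fisher-Yates shuffle. It was first described by Ronald Fisher and Frank Yates in 1938, adapted for computers by Richard Durstenfeld in 1964, and popularized by Donald Knuth in The Art of Computer Programming.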

Tuesday, January 22, 2008

Lunatics are running the asylum

"Parsley has emerged as a leading figure in Christian conservative politics and is a frequent visitor to the Bush White House and Capitol Hill. Many credit him with the GOP victory in his native Ohio in 2004, a result that gave Bush the necessary electoral votes to capture the White House a second time. Although Parsley was well-known in Word of Faith circles for years from his church and his television program, Breakthrough, he became a nationally recognized name in 2004 for his relentless campaigning for Ohio's gay marriage ban.


Over the course of the evening, Parsley will slay people in the Holy Spirit, lay hands on them, and profess to heal their cancer, homosexuality, and financial problems. He will walk over the pews as people sway and fall to the floor. He will take credit for a woman's new job as a marketing and database manager, which she says she got after she sent Parsley her last $6. He will claim, with two members of his congregation as his witnesses, that he cured their adopted baby who was born without a brain. "His head was the size of his shoulders, nothing but water in that globe," Parsley boasts. "They brought him into service, we laid hands on him. The six o'clock news carried it; the eleven o'clock news carried it. Here are the brain scans. Here's the child with no brain. Here is the child after the prayer with a fully developed, completely normal functioning brain.""

Mortgage crisis

So there we go - it's now official - the chickens are coming home to roost.

The NY Times article "If Everyone’s Finger-Pointing, Who’s to Blame?" details a massive tangle of lawsuits now sweeping the mortgage industry - everybody is suing everybody else to recover part of the massive losses.

Funnily, the people who are getting sued (brokers, bankers, insurers) have absorbed but a fraction of the money that has been lost, of course, so the potential for redress is rather slim.

Seven years ago, at the end of the .com boom, the financial industry was facing the very same question: how come there was such a massive failure to see the obvious, by the very people who were specifically charged with watching for it? The answer is still the same - the underpricing of risk that is prevalent in the U.S. today.

I have a longish analysis here:

But the short story is this: execs are paid a lot if the company hits or exceeds profit targets. They are insulated from the negative effects of a risky project by golden parachutes. Projects that are very risky long term (like lending to unqualified people) can produce marvelous short-term results (a surge in loans -> a surge in short-term revenue -> higher stock price -> lots of money from stock options).

The effect is further amplified by the fact that once one company starts doing it, others are all but required to follow suit - otherwise THEIR earnings start looking bad by comparison, the stock goes below the exercise price, and execs are paid nothing. Well, except the salary. Funnily, this reminds me of a Soviet-era Russian curse, "damn you to live on the salary alone" (as opposed to salary + whatever one could steal, which often was a lot more).

A quote by Upton Sinclair: "It is difficult to get a man to understand something when his salary depends upon his not understanding it." Funny how history repeats itself...

Saturday, January 19, 2008

The History of the Decline and Fall of the Roman Empire

"The various modes of worship, which prevailed in the Roman world, were all considered by the people, as equally true; by the philosopher, as equally false; and by the magistrate, as equally useful."

Friday, January 18, 2008

Converting Project Gutenberg books to SONY Reader

UPDATE: I now have a proxy for Project Gutenberg web site that converts the books on the fly, here:

I am in the process of writing a proxy web site that would be a projection of, but would add an option to download the file in LRF format (which I would construct on the fly from the text files).

Today I finished the proxy part and started experimenting with the converter. Luckily, someone has already written a program to convert text into LRF, which is available here:

So all I need to do is preprocess the text file to remove unnecessary line breaks

(otherwise processed book ends up having a
jagged appearance
on the smaller screen, because they break
the line both at
the screen end, and at the end of line).

The simplest way to experiment is to just write a script. The simplest way to write a script is to use Python. So here it is; you can copy it from here. It assumes that all makelrf files are in c:\bin, and the books get output into c:\books.

It takes the book as a number (140), or a full URL to the text file (

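import os
import sys
import tempfile
import urllib

DESCR_TEXT = 'The Project Gutenberg EBook of '
AUTHOR_TEXT = 'Author: '
TITLE_TEXT = 'Title: '
AUTHOR_LEN = len(AUTHOR_TEXT)
TITLE_LEN = len(TITLE_TEXT)

MAKELRF = 'c:\\bin\\makelrf.exe'

# The URL template for a book number (two %s slots, both filled
# with the number) was lost from this post; fill it in before use.
BOOK_URL_TEMPLATE = ''


def main(argv):
    if len(argv) != 2:
        print 'Usage: number or URL'
        print 'The output goes into c:\\books'
        return 1

    url = argv[1]
    if not url.startswith('http://'):
        url = BOOK_URL_TEMPLATE % (argv[1], argv[1])

    # Binary mode, so the explicit \r\n line ends are written as-is
    # (text mode on Windows would turn them into \r\r\n).
    (fd, temp_file_name) = tempfile.mkstemp(suffix='.txt')

    url_file = urllib.urlopen(url)

    title = None
    author = None
    description = None

    # Read URL, and convert everything into
    # single-line paragraphs. Also parse out
    # title, author and description
    first_line = True
    for l in url_file:
        l = l.rstrip('\r\n')
        if l:
            if not description and l.startswith(DESCR_TEXT):
                description = l
            if not author and l.startswith(AUTHOR_TEXT):
                author = l[AUTHOR_LEN:]
            if not title and l.startswith(TITLE_TEXT):
                title = l[TITLE_LEN:]

            # Lines of the same paragraph are joined with a space
            if first_line:
                first_line = False
            else:
                os.write(fd, ' ')

            os.write(fd, l)

            # This could be a poetry stanza,
            # treat short lines differently
            if len(l) < 50:
                os.write(fd, '\r\n')
                first_line = True
        else:
            # An empty line ends the paragraph
            if first_line:
                os.write(fd, '\r\n')
            else:
                os.write(fd, '\r\n\r\n')
            first_line = True

    url_file.close()
    os.close(fd)

    if not (author and title and description):
        print 'Could not parse the file!'
        return 1

    target = 'c:\\books\\%s.lrf' % title

    if os.path.exists(target):
        # The body of this check was lost from the post; this
        # sketch assumes an old copy should simply be replaced.
        os.remove(target)

    print temp_file_name
    print 'Title: ' + title
    print 'Author: ' + author
    print 'Description: ' + description
    print 'Converting to LRF: ' + target

    # os.spawnv takes the whole argument vector as one list, starting
    # with the program name. The quotes are needed because on Windows
    # the arguments are simply joined with spaces. The last argument
    # (the input file) is presumably the temp file written above.
    os.spawnv(os.P_WAIT, MAKELRF,
              [MAKELRF,
               '-d', '"%s"' % description,
               '-a', '"%s"' % author,
               '-t', '"%s"' % title,
               '-o', '"%s"' % target,
               '"%s"' % temp_file_name])
    return 0


if __name__ == '__main__':
    sys.exit(main(sys.argv))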
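The heart of the script is the rule for undoing the hard line breaks. Pulled out of the I/O into a pure function, it can be tried on its own. This is a Python 3 sketch; `unwrap` is a name made up here, and it copies the script's 50-character heuristic for short (poetry) lines rather than anything makelrf requires.

```python
def unwrap(lines, short_line=50):
    """Join hard-wrapped lines into one line per paragraph.

    - A blank line ends the current paragraph and is kept as a blank line.
    - A line shorter than `short_line` characters also ends its output line,
      so short lines (e.g. poetry) keep their original breaks.
    """
    out = []
    current = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            if current:
                out.append(" ".join(current))
                current = []
            out.append("")
            continue
        current.append(line)
        if len(line) < short_line:
            out.append(" ".join(current))
            current = []
    if current:
        out.append(" ".join(current))
    return out


sample = ["a" * 60 + "\r\n", "b" * 60 + "\r\n", "end.\r\n", "\r\n", "Short line\r\n"]
print(unwrap(sample))
```

Writing the result out with one line per list element gives the same layout the script produces.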

Thursday, January 17, 2008

Anatomy of a web ad.

NY Times is running Apple's famous "Mac vs. PC" ads on its front page. Unlike normal ads, which are usually contained within a single rectangular region or an iFrame, these consist of TWO images - one in a horizontal banner at the top of the page, and the other in the rectangular block on the right side.

None of this would of course be worth writing about, except for the fact that the animated figure moves from the lower right box into the upper banner.

I was puzzled how one might do it, given the content isolation principles that guide web pages - the content of one frame has no access to the content of another frame, nor to the content of its parent. So I started poking around the content of the page.

The answer becomes obvious if you right-click inside the banner, around the middle, and select Zoom In, while the ad is playing.

The ad LOOKS like this, of course:

But the real composition is this, with the borders of the banner simply drawn on the right part of the ad:

Clever. I remember people were saying that Apple lost the PC wars not because MSFT had a superior product, but because Microsoft had better marketing. Bollocks! as Creedy used to say...

Stories about Quality. 1. Who owns it?

A while ago I was asked to post my thoughts about quality. I mulled over this request for quite a while, and realized that I did not spend enough time on the subject to produce anything really deep.

However, I have been working in this industry for almost 20 years now (next year will be exactly 20 years since I first made an appreciable amount of money on software, writing a piece of code for an artillery guidance system in 8080 assembly, but that's quite another story).

Not surprisingly, I have accumulated quite a few stories that might help someone learn from somebody else's experience, which is always the best way to learn. So I will be posting them here periodically, as they come to mind.

And so it goes...

My first real experience interacting with testers was at Microsoft. The company I worked for before did have a test team, but they were all black-box testers. The company produced CAD software, so the testers were part former drafters, part computer enthusiasts, but not engineers. They knew something about the product, but there was no process, no test plans, nor any accounting whatsoever. All testing was done ad hoc by just banging on the keyboard. And they clearly could not be considered owners of the quality of the product.

On my first day at Microsoft, I was presented the book "Writing Solid Code" by Steve Maguire. Steve was a tester during a tumultuous period at Microsoft around the late '80s, when the company was mired in bugs, and it looked like every new fix introduced two more regressions. But he was also a developer. And judging from the book, a very good one!

Unlike my previous company, Microsoft employed real devs to do testing. This was a big change.

In fact, the mantra at Microsoft was
(1) SDETs (software developers in test) should be as qualified as developers as SDEs (software design engineers)
(2) Testing is an equal participant in product development, it has the same competency and career ladder, and testers on level X are paid the same as devs on the same level.
(3) As a discipline, testers own equal share of responsibility for a product success. They do not report to developers, and are equals to developers in every respect.
(4) There should be 3 testers for every 2 developers.

This was 1998, the stock was dancing around $120, Microsoft was swimming in money and could hire the best of the best - so we could almost afford this in its full, unabridged by reality form!

The test manager in Windows CE, which was my first team at Microsoft, was Pat Copeland (he's now at Google as well, although on the Mountain View campus). He believed that the test team's job is TO PREVENT THE PRODUCT FROM SHIPPING. That is, the product ships when the test team can no longer prove that it is not shippable. Wow! Not only was the test team at Microsoft a co-equal branch, it held the keys to the ultimate goal of any software company - shipping the product.

This lasted for a few years; the test team was very strong, they kept the quality gates high, and I came to believe that this is how it really should be. To this day I think it was actually a good system, a system of perfect checks and balances. If there is an org that builds the product, there should be a different org to verify it - otherwise there are plenty of opportunities for conflicts of interest.

Then Pat left for greener pastures, and at about the same time Windows CE went through a reorganization. Windows Me had just shipped and was to be the last in the Windows 9x code line, so they needed to do something with the massive number of people who were now free to do something else. I have no idea what happened to the Windows PMs and devs, but we got all of their testers - lock, stock, and barrel.

Now, Windows CE at the time was strictly an arms vendor. Not to be confused with the Pocket PC line, which was handled by a different arm of the company, the Windows CE org produced the OS kernel (in the form of libraries to be linked together in various combinations) and dev tools, which could then be used, like a LEGO set, to produce more or less anything embedded. Pocket PC was one of our client orgs, as were Auto PC, Web TV, and a few other projects. The product was also available to 3rd party companies, of course.

The point is, we produced a dev kit, which had practically no UI (just a bunch of samples), and all testing was done at the API level. Which meant that all testers were SDETs.

Windows 9x was a consumer product. Almost all of its testing was UI-based, and so there was an enormous lab, and a lot of people who could bang on a keyboard, but could not program. The new test team was 3-4 times bigger than what we had before.

As it turns out, the new test manager also had a radically different approach to testing. He believed that every release cycle he is given a fixed time to test the product. He does his best to test whatever he can during this time, and then whatever is left, ships.

This was a major culture clash. Most of the original testers left, and the dev team was left to deal with the change. Most of us took this as a disaster - nobody owned the quality of the product any more!

I remember having yelling matches with the new test manager in the hallways. But to no avail - the new PUM was also from Win9x, this was life as they knew it, and that's how it was going to be. The ship date was set for January 4th, and it shipped right on schedule.

This was the worst release of Windows CE ever. Half of it didn't work, and the half that did was twice as slow as the previous version. That was version 4.0, so 4.1 shipped 6 months later and was basically a gigantic service pack. But its quality was actually OK.

And then 4.2 shipped another 6 months later, with more bug fixes and a new feature now and then, and it was very good - higher quality than any other release. Suddenly, life was pretty good.

What happened was that devs stopped thinking about quality as strictly the test team's business. They realized that the product would work IFF they produced code with few bugs, and wrote their own tests (or samples) to ensure it. Using the QA team as a safety net no longer worked, but people got it, and started working around it.

Of course, part of the test development team was still there - it was smaller, and mostly handled the tasks that could not be handled by engineers: long-term testing, stress, certain kinds of integration testing, and automation.

So what did I learn from this experience?

  1. Quality is a feature. If nobody owns it, it won't ship.

  2. In a great team, an owner will emerge and fill the void if it exists.

  3. The sky may be falling, but in a well-functioning team things will work themselves out. "As if by magic..."

To be continued...

Monday, January 14, 2008

A very interesting NYT article on moral instinct

"Which of the following people would you say is the most admirable: Mother Teresa, Bill Gates or Norman Borlaug? And which do you think is the least admirable? For most people, it’s an easy question. Mother Teresa, famous for ministering to the poor in Calcutta, has been beatified by the Vatican, awarded the Nobel Peace Prize and ranked in an American poll as the most admired person of the 20th century. Bill Gates, infamous for giving us the Microsoft dancing paper clip and the blue screen of death, has been decapitated in effigy in “I Hate Gates” Web sites and hit with a pie in the face. As for Norman Borlaug . . . who the heck is Norman Borlaug?

Yet a deeper look might lead you to rethink your answers. Borlaug, father of the “Green Revolution” that used agricultural science to reduce world hunger, has been credited with saving a billion lives, more than anyone else in history. Gates, in deciding what to do with his fortune, crunched the numbers and determined that he could alleviate the most misery by fighting everyday scourges in the developing world like malaria, diarrhea and parasites. Mother Teresa, for her part, extolled the virtue of suffering and ran her well-financed missions accordingly: their sick patrons were offered plenty of prayer but harsh conditions, few analgesics and dangerously primitive medical care."

Sunday, January 13, 2008

Game machine spec

Earlier this year I put together a very nice game PC. I don't play anything other than Halo 2 on it, but in that game it renders a full 1080p stream (1920x1200, actually) quite nicely. There isn't a trace of glitching, tearing, or delays of any kind, even on Vista. I am fairly sure that the video card can go even further, and if it ever can't, the power supply would support an SLI configuration with a second card.

XP just flies on this box - it boots in something like 10-15 seconds.

A friend asked me for advice on a game PC config, so I decided to put it here for future reference, too.

So here goes, with the recent prices, and links to Newegg where you can read the references:

Total - just below $1300.

You will need at least a mini-tower case for it, and a hard drive, obviously, which would add another $100 to the price.

(Note - if you do not expect to add another video card for SLI, you can save $100 and go with 750W power supply).

---- An update: it turns out that the Gigabyte motherboard does not support SLI - you have to have an Nvidia chipset for that. So go ahead and save the $100 on the power supply, or use a different motherboard.

Wednesday, January 9, 2008

I just couldn't keep this to myself...

"Not long after this, Jurgis, wearying of the risks and vicissitudes of miscellaneous crime, was moved to give up the career for that of a politician." - Upton Sinclair, "The Jungle"

Sunday, January 6, 2008

Technology for the rest of us, or why Microsoft is not winning in consumer space

My last job at Microsoft was building a consumer product - a server for an advanced household. As I wrote in my previous post, "The evil of lowered expectations", the most frustrating part of the job was killing really cool features because "consumer will not understand" them. Sometimes it got to the point of grotesqueness - some people thought that our customers don't know what a file share is.

I think this is actually a common trend at Microsoft. Almost every consumer product that ever came out of Redmond (*), from MSN to Windows Media Player looks like it was designed for complete idiots. On the other hand the best products that Microsoft shipped - Windows NT/2000/XP, Office, Visual Studio, SQL Server - are all products written for developers and enterprise customers.

It does not take a rocket scientist to see this trend and understand its roots. The people who make software at Microsoft are, well, software developers. The best products are, well, the products that they've built for themselves. They use mail, create documents and presentations, organize files, connect to the networks, etc. That's why the successful products are so successful - they know the customer very well. They ARE the customer.

On the other hand, take MSN. It is very clearly targeted to an average "soccer Mom". It looks like a tabloid. It has the tabloid content. Here's a snapshot of "Video Highlights" for today, for example:

  • 'Obama Girl' releases new video

  • Woman gives birth inside pants

  • Boy catches 551-pound shark

  • Watch the latest episode of 'Power of 10'

  • UFO spotted in Canada

It's not that Microsoft targets morons with its consumer products by design. It just subconsciously equates non-technical users with morons. This happens because most Microsoft engineers themselves ARE very technical of course, and they don't know very much about non-technical users. And because they don't know very much about them, they tend to go by the lowest common denominator, because they think that if a product can be used by an idiot, it can be used by a smart person as well. While technically it is true, it is also often the case that a smart person wouldn't want to use the product designed for a cretin.

Here's the way I see the market in this respect.

The reason Apple is so successful in the market in which Microsoft has failed so consistently is that they recognize that "non-technical users != idiots". They CAN learn to use technology, IF the technology is presented in a package that is (1) useful, and either (2) polished, or (3) essential. Dumbing down is not really required.

It's not all that different from the automotive market. Cars are complex machinery, and operating them is probably harder than operating computers. There are car enthusiasts, but the vast majority of people are not. I, for one, do not know what either of my two cars looks like under the hood, and have no desire to learn. Yet most drivers nevertheless spend a non-trivial amount of time learning how to operate a car and the rules of the road.

They also spend a lot of money on hardware. Car manufacturers understand this and most of the models they build are designed for people who are not car enthusiasts.

Microsoft's failure to distinguish people who are not technical enthusiasts from idiots costs the company exorbitant amounts of money. I would say that the whole MSN project is a big failure because the people it could attract (the same people who buy tabloids in supermarkets) do not spend much money online, so providing them with an advertising-backed service is a losing proposition. A $10B losing proposition :-(.

But wait, there's more. When Vista was still "Longhorn", it was a product for developers. It was supposed to be one of the biggest programming paradigm shifts in the history of Windows - from C-centric interfaces to .NET, from 2d graphics to 3d, from file systems to databases. It was to be an incredibly innovative product.

Then about midway through the project it turned out that revolutions are messy, bloody, and, most importantly, tend to run way behind the schedule. So Vista was refocused to compete with Apple in the UI arena.

Again, the hypothesis was that users care more about pretty looks than substance (performance, stability, functionality), i.e., that the users are, well, stupid. We have not yet seen the full cost of this one, but it would probably be on the order of $10B as well.

Moral of the story - know thy customers. Ideally, be them!


(*) There are two exceptions that just confirm the rule. I think that both Zune and Xbox are excellent consumer products. However, Xbox is clearly built for hardcore gamers (its casual game lineup is absolutely awful). A lot of Microsoft employees, and certainly many if not most people who worked on Xbox, are hardcore gamers. And I know that the people who designed Zune ARE digital media enthusiasts. So successful consumer products can be done at Microsoft. It just happens when people build them for themselves.

Saturday, January 5, 2008

Do you want to know how the sausage is made?

I am reading Upton Sinclair's "The Jungle" (available here:), and it's one of the most poignant descriptions of the turn of the last century, the height of the Gilded Age. It follows a family of Lithuanian immigrants to Chicago who work at the famous "Yards" - the meat packing plants.

There are surprising parallels with the turn of this century - products that are prohibited in Europe for safety reasons are sold here in the US; federal laws (written by industry lobbyists) that PROHIBIT certain types of product testing for (can you imagine a federal law designed to make something LESS SAFE? Does the hamburger law come to mind?), etc.

And the description of the food production pipeline, and working conditions may not be relevant in the US anymore, but somehow I suspect that they are very relevant for Russia, China, and India even today...

"It was only when the whole ham was spoiled that it came into the department of Elzbieta. Cut up by the two-thousand-revolutions-a-minute flyers, and mixed with half a ton of other meat, no odor that ever was in a ham could make any difference. There was never the least attention paid to what was cut up for sausage; there would come all the way back from Europe old sausage that had been rejected, and that was moldy and white--it would be dosed with borax and glycerine, and dumped into the
hoppers, and made over again for home consumption. There would be meat that had tumbled out on the floor, in the dirt and sawdust, where the workers had tramped and spit uncounted billions of consumption germs. There would be meat stored in great piles in rooms; and the water from leaky roofs would drip over it, and thousands of rats would race about on it. It was too dark in these storage places to see well, but a man could run his hand over these piles of meat and sweep off handfuls of the dried dung of rats. These rats were nuisances, and the packers would put poisoned bread out for them; they would die, and then rats, bread, and meat would go into the hoppers together. This is no fairy story and no joke; the meat would be shoveled into carts, and the man who did the shoveling would not trouble to lift out a rat even when he saw one--there were things that went into the sausage in comparison with which a poisoned rat was a tidbit. There was no place for the men to wash their hands before they ate their dinner, and so they made a practice of washing them in the water that was to be ladled into the sausage. There were the butt-ends of smoked meat, and the scraps of corned beef, and all the odds and ends of the waste of the plants, that would be dumped into old barrels in the cellar and left there. Under the
system of rigid economy which the packers enforced, there were some jobs that it only paid to do once in a long time, and among these was the cleaning out of the waste barrels. Every spring they did it; and in the barrels would be dirt and rust and old nails and stale water--and cartload after cartload of it would be taken up and dumped into the hoppers with fresh meat, and sent out to the public's breakfast. Some of it they would make into "smoked" sausage--but as the smoking took time, and was therefore expensive, they would call upon their chemistry department, and preserve it with borax and color it with gelatine to make it brown. All of their sausage came out of the same bowl, but when they came to wrap it they would stamp some of it "special," and for this they would charge two cents more a pound."

The book Fast Food Nation examines the modern day US fast food industry and has a chapter about meat packing plants as well.

Broadband woes

In the last couple of years broadband was nothing but trouble for me. It started with moving from Qwest 1.5Mbps to Speakeasy 6Mbps service in 2005. It cost a fortune (almost $150/month), but for the price it was the only way to get high connection speed AND static IP addresses.

I need static IP because I host my domain, and Exchange server, and web server at home.

It was exorbitantly expensive (you could get cable at 1/3 of the price then, but alas! no static IP). And it worked well - for a while. At the beginning of 2007 I started experiencing massive packet loss during business hours. The truth is, it is possible that the problem started earlier, and I just noticed it then. Replacing the modem did not help, nor did an onsite visit from a Covad engineer, who found no problems with my home network whatsoever.

I am fairly sure that the problem was actually with the equipment at the office - when they reprovisioned the line, it helped, although temporarily, and when they put the modem in safe mode, it worked well, but only at 4Mbps.

After I lost all hope of resolving this problem with Speakeasy, I moved to Qwest. Qwest promised a blazingly fast connection (7Mbps) at half the price of Speakeasy. And for a while, it was true - I never got the full 7 megabits, of course, but 6 was definitely there, both on DSL speed measuring apps like the Speakeasy speed test and on downloads.

However, connectivity deteriorated over the last few months - now I am barely getting 2.5Mbps down. Often the speed falls to 1.5, and it has never ever been above 3.5 in the last 3 months (I checked often).

The modem says that it is connected at 6Mbps, and the line does not show any drops, so I assume the bottleneck is probably an oversubscribed connection to the backbone. Maybe I should just go back to Speakeasy and hope that they've fixed the line...

Friday, January 4, 2008

How Google works: MapReduce

Here's a paper by two of Google's pre-eminent engineers on one of our core technologies.

If you're interested in processing huge amounts of data on a large number of computers, it's highly recommended reading.
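The programming model itself fits in a few lines if you leave out everything the paper is really about (distribution, fault tolerance, data locality). Here is a single-process Python sketch of word count, the classic example; the function names are made up, and it only shows the map, group-by-key, reduce shape of the computation, not how Google's implementation works.

```python
from collections import defaultdict


def map_fn(doc_name, text):
    """Map: emit (word, 1) for every word in one document (doc_name is unused here)."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Reduce: combine all the partial counts emitted for one word."""
    return word, sum(counts)


def map_reduce(inputs, mapper, reducer):
    """Single-process illustration: map every input, group by key (the shuffle), reduce."""
    intermediate = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in mapper(key, value):
            intermediate[out_key].append(out_value)
    return dict(reducer(k, vs) for k, vs in sorted(intermediate.items()))


docs = [("a.txt", "the quick brown fox"), ("b.txt", "The lazy dog and the fox")]
counts = map_reduce(docs, map_fn, reduce_fn)
print(counts["the"], counts["fox"])  # 3 2
```

In the real system the user still writes just the two functions; the library runs them across thousands of machines and does the grouping over the network.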

Tuesday, January 1, 2008

Exchange support in iPhone is garbage

So I caved in and bought my daughter an iPhone for the New Year. Today most of my time was spent trying to get it to work (i.e., send and receive email). There are plenty of instructions on the net on how to do it. The problem is, none of them work.

For example, there's this:
and this:
and this:
and finally this:

I was able to get Outlook Express to send and receive mail using every one of these instructions. But none of them actually worked for the iPhone.

When I first got my Windows Mobile phone, it just worked, with minimal configuration, out of the box. I guess despite all its media glory, the iPhone is not a smart phone yet.

Music metadata: the insanity continues

Ok, I've bought my wife an audio book of Upton Sinclair's "The Jungle", and one of my daughters the audio book collection of the Lord of the Rings trilogy. And of course, as I always do, I immediately tried to rip them to WMAs so they could be listened to on a variety of hardware I have at home.

And I immediately crashed full speed into the file metadata idiocy. Of course, CDDB does not know anything about Upton Sinclair. So I tried to edit it in WMP10. Apparently, in their continuing quest to make the UI accessible to idiots, they've made it completely unusable. I managed to set the album name for the whole CD, but it would only set the artist per track. I could absolutely not figure out how to either make it forget my changes, or apply them to all the tracks on the CD.

I ended up just ripping everything to Unknown Artist/Unknown Album XXX, then using my usual trick of creating the directory hierarchy Genre\Author\Album, and running a script to set the metadata in all the files to be derived from names of their ancestor directories.

Here's the script - for the occasion I rewrote it in Python:

import sys
import os


def _GetMetaDataFromDirName(dir_name):
    """Returns (genre, author, album) from the last three path components.

    Returns (None, None, None) if the path is too shallow to contain them.
    """
    all_dirs = os.path.normpath(dir_name).split(os.sep)
    if len(all_dirs) < 3:
        return (None, None, None)
    return (all_dirs[-3], all_dirs[-2], all_dirs[-1])


def main(argv):
    if len(argv) < 2:
        print('Usage: python %s dir_name' % argv[0])
        print('This will walk the directory tree that is presumed')
        print('to have *\\genre\\author\\album structure and set WMA')
        print('and MP3 file metadata accordingly')
        return 1

    base_dir = argv[1]
    for root, dirs, files in os.walk(base_dir):
        # Every file in this directory gets the same genre/author/album.
        (genre, author, album) = _GetMetaDataFromDirName(root)
        if not (genre and author and album):
            continue
        for file in files:
            full_file_name = os.path.join(root, file)
            cmdline = None
            if file.lower().endswith('.wma'):
                cmdline = ('c:\\bin\\meta.exe "WM/Genre=%s" "Artist=%s" '
                           '"Author=%s" "WM/Artist=%s" "WM/AlbumArtist=%s" '
                           '"WM/AlbumTitle=%s" "%s"' % (genre, author, author,
                                                        author, author, album,
                                                        full_file_name))
            elif file.lower().endswith('.mp3'):
                cmdline = ('c:\\bin\\meta.exe "WM/Genre=%s" '
                           '"Author=%s" "WM/AlbumArtist=%s" '
                           '"WM/AlbumTitle=%s" "%s"' % (genre, author, author,
                                                        album, full_file_name))
            if cmdline:
                # Echo the command, then run it (drop os.system for a dry run).
                print('\n\n')
                print(cmdline)
                os.system(cmdline)
    return 0


if __name__ == '__main__':
    sys.exit(main(sys.argv))
It uses the command-line metadata-manipulation program I wrote a while ago (meta), available here: Note that you need both Windows Media Encoder 9 and .NET 2.0 installed to use it. But once you have all the dependencies, it's quite convenient for scripting metadata changes.
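If you'd rather not hand-quote a command string, the same tagging step can be expressed as an argument list. This is a hypothetical refactoring, not the script above: `metadata_from_path` and `build_meta_args` are made-up names, and it assumes meta.exe accepts each `Tag=value` pair as a separate argument followed by the file name, which is how the command lines above are put together. It uses `ntpath` so the Windows-style path handling can be tried on any OS.

```python
import ntpath

# Path to meta.exe as hard-coded in the script above.
META_EXE = r'c:\bin\meta.exe'


def metadata_from_path(dir_name):
    """Split a ...\\Genre\\Author\\Album path into (genre, author, album).

    Returns None when the path has fewer than three components.
    """
    parts = [p for p in ntpath.normpath(dir_name).split('\\') if p]
    if len(parts) < 3:
        return None
    return parts[-3], parts[-2], parts[-1]


def build_meta_args(full_file_name, genre, author, album):
    """Return an argv list for meta.exe, or None for unsupported file types.

    Assumes (hypothetically) that meta.exe takes one Tag=value per argument.
    """
    ext = ntpath.splitext(full_file_name)[1].lower()
    if ext == '.wma':
        tags = ['WM/Genre=' + genre, 'Artist=' + author, 'Author=' + author,
                'WM/Artist=' + author, 'WM/AlbumArtist=' + author,
                'WM/AlbumTitle=' + album]
    elif ext == '.mp3':
        tags = ['WM/Genre=' + genre, 'Author=' + author,
                'WM/AlbumArtist=' + author, 'WM/AlbumTitle=' + album]
    else:
        return None
    return [META_EXE] + tags + [full_file_name]
```

To apply the tags, pass the returned list to `subprocess.run(args, check=True)`. On Windows, `subprocess` builds the command line from the list itself, so author or album names containing spaces need no hand-made quotes.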

Ok, meanwhile, my daughter was ripping the "Fellowship of the Ring". When she was done, the folders were named differently - some "Fellowship of the Ring Disk X", some "The Fellowship of the Ring Disk Y", and all the file names inside the directories were different as well.

If only that were the worst! On closer examination I found that:
(1) Disk 1 and Disk 16 both ripped into the same directory, "Fellowship of the Ring", so disk 16 overwrote disk 1.
(2) Disk 14 ripped into the directory corresponding to disk 12, and overwrote those files as well.

So I had to re-rip these CDs, and run my script to reset the metadata correctly.
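Inconsistent folder names like these are easy to miss by eye. A rough check, assuming the folders are named "[The] Title Disk N" as in this rip, is to normalize the titles and look for disc numbers that never showed up. `check_rip_folders` is a hypothetical helper: it cannot tell which disc ended up in a shared folder, only that a number is missing or that a folder has no number at all.

```python
import re
from collections import defaultdict

# Optional leading "The", a title, and an optional trailing "Disk N" / "Disc N".
DISK_RE = re.compile(
    r'^(?:the\s+)?(?P<title>.*?)(?:\s+dis[ck]\s+(?P<num>\d+))?\s*$',
    re.IGNORECASE)


def check_rip_folders(folder_names, expected_discs):
    """Group ripped folders by normalized title and find gaps.

    Returns {title: (sorted missing disc numbers, unnumbered folder names)}.
    """
    found = defaultdict(set)
    unnumbered = defaultdict(list)
    for name in folder_names:
        m = DISK_RE.match(name)
        title = m.group('title').lower()
        if m.group('num'):
            found[title].add(int(m.group('num')))
        else:
            unnumbered[title].append(name)

    report = {}
    for title in set(found) | set(unnumbered):
        missing = sorted(set(range(1, expected_discs + 1)) - found[title])
        report[title] = (missing, unnumbered[title])
    return report
```

Assuming the other folders followed the "Disk N" pattern, running it over the 16-disc Fellowship rip described above would flag discs 1, 14 and 16 as missing and "Fellowship of the Ring" as an unnumbered folder, which points straight at the CDs that need re-ripping.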

I think the person who invented media classification by metadata deserves a monument with the inscription "Spit here!".