Friday, December 4, 2009

Sr Oracle needed at Bank of America

...presumably so that they could predict the bursting of the next bubble ahead of time.

30000 new troops for Afghanistan - yeeeeeeehaw!

Tuesday, December 1, 2009

Web applications!

Two browsers (IE and Firefox) running the demo are maxing out the T9300 in my laptop - a Core 2 Duo running at 2.5 GHz. And that's with the laptop plugged in.

Try it yourself:

Wednesday, November 25, 2009

Modern capitalism

The top 5 execs at Lehman Brothers made $1B from 2000 to 2008. The top 5 execs at Bear Stearns made $1.4B during the same period.

"The whole idea of capitalism is that the people provide the capital and the executives take care of it for us. In this case, the people provided the capital, and the executives took it."

Tuesday, October 20, 2009

Matt Taibbi on naked short selling, or what killed Bear Stearns

Really awesome article on naked shorts. There are only a handful of news organizations that do real journalism these days, and, surprisingly, Rolling Stone seems to be one of these few. I am also very impressed with Bloomberg's reporting recently - it looks like after Murdoch bought the WSJ the best people went to Bloomberg.

Here's the really scary thing - we've lost the integrity of our financial markets.

The reason the US was so successful in attracting capital was the transparency with which financial markets here used to operate, and the strong regulations that ensured this transparency - for the most part, the legacy of the Great Depression. However, over the last couple of decades these regulations have been weakened, and now we're back to the turn-of-the-century stock swindles and off-the-books accounting.

All this - coupled with the dollar's fall - can mean only one thing: the US will certainly lose its status as the financial capital of the world within the next decade.

In other news, marijuana legislation has been relaxed. So instead of worrying about markets, botched wars, and a disappearing industrial base, the population can just get stoned! Finally, a government that knows how to rule...

Friday, October 16, 2009

Government rationing of health care

There were all these inane reports in the press about "Obama's suicide panels" and government rationing the health care - all widely covered.

Let's see what they say about this beauty - an insurance company says that they will only cover a woman if she gets sterilized - because she had a C-section in the past.

Under the hill

Check out where you can deliver a birthday present on!

Monday, October 12, 2009

Think pay for performance works? Think again...

This TED talk alludes to a number of social experiments where adding monetary incentive has significantly decreased the productivity of creative work.

Extremely enlightening, a must watch for every manager - and yet another proof that the claim that the Wall Streeters need to be paid exorbitant amounts of money to retain "the best and the brightest" is just a crapload of bull.

United States v. 8,800 Pounds, More or Less of Powdered White Egg Product

"United States v. 8,800 Pounds, More or Less of Powdered White Egg Product, et al., No. 07-3671, 2008 U.S. App. LEXIS 26098 (8th Cir. Dec. 24, 2008)(defendant’s importation of egg whites from Peru without first obtaining a pasteurization certificate from the Peruvian government and without obtaining authorization from the Food Safety Inspection Service violated the Egg Products Inspection Act, which could result in the condemnation and destruction of the egg whites; trial court’s granting of summary judgment upheld)."

It's hidden somewhere around the middle of the list on this page:

I kid you not!

Wednesday, October 7, 2009

Big Pharma and research

I love it when they say that the pharmaceuticals are so expensive in the US because the companies need this money to finance research. In reality, if you look at a typical big pharma income statement, you will discover that only ~20% of its expenses are research - the rest is mostly sales and marketing.

But here's another proof beyond any reasonable doubt that what big pharma is all about is sales - its donations to political parties. Researchers giving money to repubs? Basically, the same people who say that Earth is 10,000 years old, and dinosaurs walked the planet alongside people, radiocarbon dating be damned?

This political contributions profile is more in line with what you expect from used car salespeople than the research and development organizations...

By comparison, here's what the software sector looks like:

Tuesday, October 6, 2009

Bravo, Apple! You have class!

Apple quits the U.S. Chamber of Commerce over its ‘frustrating’ global warming denialism.

Friday, October 2, 2009

TFS 2010 will be much easier to install

I've spent a week trying to get TFS 2008 installed on my home infrastructure. It was the most painful piece of software that I've ever installed. And it looks like I was not the only one who felt that:

"Installing TFS has been a pain point for years. Although it’s gotten better, 2010 represents a quantum leap. The TFS installer now has 3 wizards: Basic, Standard and Advanced. The big innovation is the new “Basic” install wizard. It is a Next, Next, Next install experience that allows you to install and configure TFS in about 20 minutes or less (assuming .NET and SQL Express are already on your computer – a little longer if TFS has to install them for you). Both will already be there if you’ve installed VS 2010. The Basic wizard will install and configure IIS (if it’s not already there), install and configure SQL Express (if it’s not already there), and install and configure TFS. The only thing that really pains me is installing .NET 4.0 requires a reboot :(."

Sunday, September 20, 2009

Guns and morons

I am slowly building up my collection of World War II and former communist bloc firearms. This involves scouring the internet with searches on certain keywords.

Recently, I had a sad realization - most of the US gun-owning community is not mentally qualified to operate a fork, let alone own a gun. Which may explain the extremely high rate of gun-related deaths in the United States - 30,896 in 2006.

Here's a representative post:

"Should i buy an SKS, AK-47 or AR-15 rifle?

the cheaper the better, but not if it is total garbage. The purpose of owning one for me is to have a assault rifle before they are illegal. i want something that can pack a punch when the world goes to crap so i can defend my family. yes, the purpose would be to kill, so go somewhere else hippies, this is a hypothetical situation i would like to be prepared for if it ever happens."

"Tactical advantage", "firefight" are the terms that litter the gun boards, so instead of answering these questions in dozens of places, I decided to come up with one meta-response here.

"Dear Moron,

Let me try to answer your question from the liberal/hippie point of view, since this angle is typically not covered on the gun boards you are frequenting.

First of all, this is the stupidest thing I read on the internet today! Why, you might ask? Well, there are multiple reasons.

First, do you really expect that hippies will attack you and your family? Hippies? Seriously? Like this guy?

Second, if us hippies/liberals wanted to attack you, surely we wouldn't storm your house? Did you know that there is a strong positive correlation between education and liberal views? You might not realize that, but one skill that they teach in college is thinking. So if we really did want to get you, we would surely be able to devise a better approach.

For example, we could just wait outside your house, and spray you with bullets from a safe distance when you come out to buy groceries. Or if we wanted to force events, why not set your house on fire and then just shoot everything that comes out?

But in reality, we wouldn't even bother with this at all. We'd just send a black helicopter. As you know, being government/UN freaks, we have plenty at our command. It would take one missile to have your house look like this, from a safe distance:

Do you really think that AK-47 vs AR-15 would make a material difference here?

Finally, even if you really did get into one of the "firefights", I bet you would not have much luck in a "tactical combat situation". I've played with your kind in Halo. You will be the biggest target, in the center of the field, collecting all the bullets.

I bet like with any other profession, it takes years of hard work to train a soldier - not an act of buying a gun.

So take my advice - instead of wasting money on something that you aren't mentally qualified to operate and won't be able to use, buy a book. Or sign up for a history or biology class at your local community college. You really could use the extra IQ so that next time people won't look at your writing and say - Geez, this is the stupidest thing I've seen on the internet today!"

Sunday, September 13, 2009


The Queen Anne Blockbuster is going out of business, and, among other things, I picked up a copy of Beowulf for a couple of dollars.

Boy, was this a weird experience! So weird that, in a sense, it deserves to be a cult movie - a la Striptease. For most of the movie we couldn't tell if these guys were being serious, or trying to spoof something. I am still not entirely sure. We laughed through a big part of it.

From the really overdone dialog (think of Gimli’s boasting in The Lord of the Rings, multiply that by 800, and you have the most humble of Beowulf’s pronouncements), to nude Angelina Jolie's feet, which seamlessly merged a lack of shoes with high heels, to the really conspicuous way they hid Beowulf's genitalia as he fought Grendel in the nude (this was very, very similar to the scene where Bart rides through the streets of Springfield in The Simpsons Movie) - it looked more like a comedy than what the movie was trying to present.

As a comedy it might have been mildly funny, though not hysterical. If you enjoyed Striptease, watch this one. Otherwise, it’s probably not worth the hour and a half.

Saturday, September 12, 2009

If you don't like George Bush, you should take a look at his voters

"The film was chosen to open the Toronto Film Festival and has its British premiere on Sunday. It has been sold in almost every territory around the world, from Australia to Scandinavia.

However, US distributors have resolutely passed on a film which will prove hugely divisive in a country where, according to a Gallup poll conducted in February, only 39 per cent of Americans believe in the theory of evolution."

Friday, September 11, 2009

An oldie...

...but a goodie!

"Cheney Waits Until Last Minute Again To Buy Sept. 11 Gifts"

Sunday, September 6, 2009

How much for shipping, again?

Check out the Fedex Next Day rate on this $10 battery...

Think this is egregiously expensive? It is, but it still comes out way ahead of our local Best Buy where a similar (actually, less powerful) battery retails for $39.99 plus tax...

Wednesday, August 26, 2009

Classy Sarah Palin fans react to Kennedy's death

Just look at the spelling in these gems. The correlation between the level of education and political views is very real...

"thank you for maintaining my belief in you as a real american, however this country is now much better off, one less socialist, anti freedom senator."

"Now if we could just talk God into taking Arlin Spector, Harry Reid,and Nancy Pelosi America would be Eutopia!"

"good riddens"

"If he makes it into Heaven (& I doubt he will with his stance on abortion) I hope that God makes him babysit all the aborted children for eternity. God have mercy on his soul."

"Ted Kennedy dying has made my day...."

"He cannot fillibuster God. Good ridencance to a sorry person."

"It's about time, we can only hope Pelosi and Ried will be joining him very soon. All 3 of them should be buried in Moscow for whom they work so tirelessly."

"Brilliant" marketing

I love it when people say that Microsoft has terrible products which only succeed because of our amazing marketing.

Friday, August 21, 2009

Introducing Black Square, a MS Word document review system

Black Square is to specs what Malevich is to code. This release is 0.1 - treat it as a preview, it has not been field-tested at all. There are known important bugs. But if you do feel adventurous, give it a whirl!

Big thanks to Eric White, who coded the most important part of the system - the document merger.

For more background on Black Square, see the blog post where it was originally introduced:

Physics lessons in Alabama

Driving to work today I was listening to yesterday's Marketplace podcast on my Zune.

They were reading a piece about schools in Alabama having big trouble finding teachers in math and sciences. Imagine recruiting a biology teacher who'd have to teach "alternatives" to evolution! - must be super hard indeed.

So they are importing teachers from the Philippines. One school mentioned in the show had its first physics class in several years(!) and it had just 17 people enroll. No wonder such a staggering percentage of the US population is convinced that Earth is 6000 years old...

Wednesday, August 19, 2009

Just say no to political correctness

'Something strange has happened in America in the nine months since Barack Obama was elected. It has best been summarised by the comedian Bill Maher: "The Democrats have moved to the right, and the Republicans have moved to a mental hospital."'

We liberals brought this upon ourselves. We should stop acting like the idiots' point of view is entitled to respect and equal treatment in the media. It's not. The kooks, the crooks, and the cons - including the neocons - need to be called what they are, loud and clear.

In particular, we should stop trying to prove that science and religion are somehow compatible. They are not.

Science requires preponderance of evidence. For science, if something is unobservable in principle, it does not exist.

Religion requires faith in tales written thousands of years ago by semi-literate shepherds, claiming the miracles that have never been observed in practice.

Most Americans are big boys and girls - it's OK to have them face the hard choice.

Either you believe in science and enjoy the fruits of the labor of these godless scientists - including antibiotics, computers, jet travel, and sanitation. Or you choose the supernatural and go back to the Middle Ages and a 40-year average life span.

Trying to have it both ways is like trying to lose weight without a diet. It simply does not work.

Tuesday, August 18, 2009

Dude, have you heard of Netflix?

I am channeling the Fake Steve Jobs now, but damn it, two insane press stories a day...

> That's largely because they're not a darn thing worth watching or
> playing that uses Moonlight/Silverlight. Go ahead visit the
> Silverlight site; let me know when you find something compelling. I didn't.

Netflix of course uses Silverlight for streaming the part of its content that's available online.

A pundit tries OpenGoo. Hilarity ensues.

OpenGoo is an open source counterpart to Google Docs. The idea is that you download and host it on your own servers. This C|Net reporter gets really, really confused by this concept:

In case they remove this article, here's the screen shot (click on it to read):

This is even funnier than Investor's Business Daily's "Stephen Hawking would have no chance under British health care" piece... I am wondering, what kind of college degrees do these people have? And what kinds of colleges graduate these "journalists"?

Friday, August 14, 2009

British healthcare

Apparently there's been a bunch of ads on TV feeding the US populace horror stories about how terrible the NHS is.

This is what real Brits - you know, the ones actually living there and using British health care (where doctors by and large are government employees and the health care costs are almost entirely paid by the government) - have to say about it:

"Watching these debates is like reading National Geographic. It's just impossible, from a European perspective, to understand what these people are on about. Their political views seem as backwards and removed from the world we live in as a shaman casting magic spells."

Also, from Investor's Business Daily editorial on 8/3: "People such as scientist Stephen Hawking wouldn't have a chance in the U.K. where the National Health Service would say the quality of life of this brilliant man, because of his physical handicaps, is essentially worthless."

After much ridiculing on the interwebs, they removed the sentence. Pity, it was yet another proof that money != brain.

Monday, August 10, 2009

Bingin' Malevich

I've done a bit of egosurfing today and found that my code review system, Malevich, shows up as #2 when searching for Malevich in Bing.

Frankly, I was surprised.

By comparison, Google search has it towards the bottom of the second page. Mondrian - Google's own code review tool which inspired Malevich - is at the very bottom of the first page.

Now, there was a fresh batch of conspiracy theories fomented by the "technical" pundits who claim that Microsoft uses Bing to put down Apple. To me this is like a toddler pushing a nuclear submarine to help it go faster: Windows certainly does not need help from Bing to compete with Apple.

I think there is a very simple explanation for both the Malevich and the Mac vs PC phenomena. The search engine certainly uses clickthrough data to guide its searches. Bing's biggest market share in the geek population is Microsoft employees - there are tens of thousands of people here who use Bing. The overall market share of Bing is small enough that a few thousand users can swing the search results considerably - bringing Malevich-the-code-review-system (which is also extremely popular here) and articles critical of Apple to the top.

Admittedly, this is not nearly as juicy as the "evil Microsoft adjusts the search results to favor itself" conspiracy theories, but it appears to be the simplest one :-).

Saturday, August 8, 2009

GI Joe, the movie

No, I have not seen it. But I've read Roger Ebert's review of it, which is a masterpiece on its own.

"The two teams also each have a skilled Ninja fighter from Japan. Why is this, you might ask? Because Japan is a huge market for CGI animation and videogames, that's why. It also has a sequence set in the Egyptian desert, although there are no shots of dead robots or topless pyramids. And Cobra headquarters are buried within the miles-deep ice of Arctic. You think construction costs are high here. At one point the ice cap is exploded real good so it will sink and crush the G. I. Joe's submarine. We thought ice floated in water but, no, you can see big falling ice chunks real good here. It must be only in your Coke that it floats."

Tuesday, August 4, 2009

Developer Connection

If you thought (like I did) that TFS documentation was bad, check out this beauty:

As a mental exercise, try to guess what empty() does. Does it empty the selection? Or does it return true if the selection is empty?

Monday, August 3, 2009

Obama's birth certificate - a Kenyan angle!

FROM: Mr. James Thambo,
TO: Orly Taitz, Esq


Dear Mrs. Orly Taitz:

I am Mr. James Thambo, a Barrister to US President Barrak Hussein Obama's great great uncle Matimor Thambo Hussein. He has died 3 days ago after being sick with Cancer, and now I am in Charge for Executing his In-heritance. Before he died he gave me a Birth Certificate for US President Barrak Hussein Obama issued By Kenian Republic in 1961 that Proves that Barrak Hussein Obama is not a US Citizen. It is Numbered 47O44 and Executed with All Appropriate Authority. I will give you the name of my bank where said Certificate is stored and other important information if I receive a positive reply from you.

I want you to be my partner, to secretly transfer the Certificate of Birth to the United States where it would be Sold on ebay. My business Partners here has estimated the Auction Value for the US President Barrak Hussein Obama's Birth Certificate to be $100,000,000 (U. S. Dollars!). I cannot Sell it myself Because I am a Barrister to the late US President Barrak Hussein Obama's great great uncle Matimor Thambo Hussein and they would suspect me if I sell it myself. All you would have to do is place this Item for bid on Ebay, and receive the money into your account.

Your share will be 30% which is $30,000,000 (U. S. Dollars!). My own share will be 69%, which is $69,000,000. We shall keep 1% which is $1,000,000 for expenses. Reach me immediately by mail so that I can give you further details. Also provide me your direct tel/fax to reach you, and your bank account number, and your credit card so I can Certify the Authenticity.

Thank you and God Bless,

Mr. James Thambo.


Friday, July 31, 2009

Removing duplicate comments from a word document

As I wrote before, I am working on a Malevich-like system for reviewing specs in the same way we're reviewing code.

This work is based on Eric White's excellent blog post about merging comments from two identical files.

The idea is to have a web site where one uploads a Word document, the reviewers then download a locked copy of it which only allows adding comments. They then use Word to comment, and upload the files back. The server merges all comments (using Eric's code) back into the master copy. Every person who downloads the document afterwards gets the comments from all previous reviewers.

While working on this system, I had to add two things in terms of comment management.

First, I had to lock files so only adding comments is allowed. The code for this is here:

Second, Eric's code merges the comments by adding all comments from one document to the other. Unfortunately what this means is that after the very first reviewer has added his or her comments, every time someone else downloads the copy with these comments, adds more, and uploads the document back, the original set of comments gets duplicated. So I had to write code that cleans up this duplication.
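To make the duplication concrete, here is a minimal sketch of what merge-by-append does over two review rounds. It models comments as plain strings; the class, method names, and reviewer names are made up for illustration, and none of this is Eric's actual merge code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AppendMergeDemo
{
    // Naive merge: every comment in the uploaded copy is appended to the master.
    public static List<string> MergeByAppend(List<string> master, List<string> uploaded)
    {
        return master.Concat(uploaded).ToList();
    }

    // One review round: download the master, add one comment, upload, merge.
    public static List<string> ReviewRound(List<string> master, string newComment)
    {
        var copy = new List<string>(master);   // the reviewer's downloaded copy
        copy.Add(newComment);
        return MergeByAppend(master, copy);
    }

    static void Main()
    {
        var master = new List<string>();
        master = ReviewRound(master, "Alice: typo in section 2");
        master = ReviewRound(master, "Bob: missing error case");

        // Alice's comment now appears twice: once from the master and once
        // from Bob's uploaded copy, which already contained it.
        foreach (string comment in master)
            Console.WriteLine(comment);
    }
}
```

Each further round appends everything already in the master one more time, which is why the cleanup step is needed.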

The comments in the Word files live in a special section accessible through MainDocumentPart.WordprocessingCommentsPart.Comments of the WordprocessingDocument class. They can be enumerated as follows:

WordprocessingDocument doc = WordprocessingDocument.Open(args[0], true);
foreach (Comment c in doc.MainDocumentPart.WordprocessingCommentsPart.Comments)
    Console.WriteLine("{0} {1}:{2}", c.Id, c.Author, c.InnerText);

This section contains the comments themselves, but it does not have any information as to where the comments attach to the actual text in the Word document. Instead the comments attach via commentRangeStart, commentRangeEnd, and commentReference elements that are interspersed into the text of the paragraph:

<w:t xml:space="preserve">This is a test</w:t>
<w:commentRangeStart w:id="0" />
<w:commentRangeStart w:id="2" />
<w:commentRangeStart w:id="4" />
<w:rStyle w:val="CommentReference" />
<w:commentReference w:id="0" />
<w:commentRangeEnd w:id="0" />
<w:rStyle w:val="CommentReference" />
<w:commentReference w:id="2" />
<w:commentRangeEnd w:id="2" />
<w:rStyle w:val="CommentReference" />
<w:commentReference w:id="4" />
<w:commentRangeEnd w:id="4" />
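The way the w:id attribute ties the three element kinds together can be checked without the SDK at all, using plain LINQ to XML. This is a minimal sketch: the sample paragraph is a simplified stand-in that I made up (real Word output carries more run markup), and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class CommentAnchors
{
    // WordprocessingML main namespace.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Simplified, hypothetical paragraph; not copied from a real document.
    public const string SampleXml =
        "<w:p xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
        "<w:commentRangeStart w:id='0'/>" +
        "<w:commentRangeStart w:id='2'/>" +
        "<w:r><w:t>This is a test</w:t></w:r>" +
        "<w:commentRangeEnd w:id='0'/>" +
        "<w:r><w:commentReference w:id='0'/></w:r>" +
        "<w:commentRangeEnd w:id='2'/>" +
        "<w:r><w:commentReference w:id='2'/></w:r>" +
        "</w:p>";

    // For each comment id, lists the anchor elements found, in document order.
    public static SortedDictionary<int, List<string>> AnchorsById(string xml)
    {
        var result = new SortedDictionary<int, List<string>>();
        string[] names = { "commentRangeStart", "commentRangeEnd", "commentReference" };
        foreach (XElement el in XElement.Parse(xml).Descendants())
        {
            if (el.Name.Namespace != W || !names.Contains(el.Name.LocalName))
                continue;
            int id = (int)el.Attribute(W + "id");
            if (!result.ContainsKey(id))
                result[id] = new List<string>();
            result[id].Add(el.Name.LocalName);
        }
        return result;
    }

    static void Main()
    {
        foreach (var pair in AnchorsById(SampleXml))
            Console.WriteLine("{0}: {1}", pair.Key, string.Join(", ", pair.Value));
    }
}
```

A comment whose id is missing one of the three anchors would show up here as a shorter list, which is a cheap sanity check before touching the document.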

To the developer, these elements are accessible from the root element of the Word document's MainDocumentPart:

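// Assumption: the elements are enumerated with Descendants<T>() on the part's root element.
foreach (CommentReference cRef in
    doc.MainDocumentPart.RootElement.Descendants<CommentReference>())
    Console.WriteLine("Found reference for {0}", cRef.Id);

foreach (CommentRangeStart baseRs in
    doc.MainDocumentPart.RootElement.Descendants<CommentRangeStart>())
    Console.WriteLine("Found range start for {0}", baseRs.Id);

foreach (CommentRangeEnd baseRe in
    doc.MainDocumentPart.RootElement.Descendants<CommentRangeEnd>())
    Console.WriteLine("Found range end for {0}", baseRe.Id);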

Unlike the beauty of almost Lisp-like functional code that Eric wrote to merge comments, the code below goes through some contortions trying to determine that comments that have the same text and author really do start and end in the same place of the Word document. Location is important in determining the equivalence of comments because it is easy to imagine a whole bunch of separate, different comments with the same text, for example, "Here, too.", that would otherwise be considered equal.

To compile the code, you need to install Microsoft's OpenXML SDK 2.0 and add a reference to the DocumentFormat.OpenXml assembly, which the SDK installer puts in the GAC.

Here's the code. It is rather self-explanatory: it collects all the relevant elements from the document - comments, ranges, and comment reference points, determines which ones are duplicates, then removes the dupes.

There is a subtlety that this code relies upon which appears to be true, but technically does not have to be: for comments that are attached to the same location, the commentRangeStart and commentRangeEnd elements appear in the same sequence. E.g. if comment A's commentRangeStart precedes comment B's commentRangeStart, then comment A's commentRangeEnd should precede comment B's commentRangeEnd. While this seems to be true for Word, if you are adapting this code for general-purpose OpenXML, I would recommend changing the logic to remove this dependency.

// <copyright>
// Copyright (C) Sergey Solyanik.
// This file is subject to the terms and conditions of the Microsoft Public License (MS-PL).
// See for more details.
// </copyright>
using System;
using System.Collections.Generic;
using System.Xml.Linq;

using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace RemoveDuplicateComments
{
    /// <summary>
    /// Removes duplicate comments in an OpenXML document.
    /// </summary>
    class Program
    {
        /// <summary>
        /// Removes duplicate comment in an OpenXML document.
        /// </summary>
        /// <param name="args"> Command line arguments (file name). </param>
        static void Main(string[] args)
        {
            if (args.Length != 1)
            {
                Console.WriteLine("Usage: removeduplicatecomments filename");
                return;
            }

            Dictionary<int, Comment> comments =
                new Dictionary<int, Comment>();
            Dictionary<int, string> commentTexts =
                new Dictionary<int, string>();
            Dictionary<int, CommentRangeStart> commentRangeStarts =
                new Dictionary<int, CommentRangeStart>();
            Dictionary<int, CommentRangeEnd> commentRangeEnds =
                new Dictionary<int, CommentRangeEnd>();
            Dictionary<int, OpenXmlElement> commentReferenceParents =
                new Dictionary<int, OpenXmlElement>();
            HashSet<OpenXmlElement> commentReferenceParentsSet =
                new HashSet<OpenXmlElement>();
            HashSet<int> idsOfIdenticalStarts = new HashSet<int>();
            HashSet<int> idsOfIdenticalEnds = new HashSet<int>();

            WordprocessingDocument doc = WordprocessingDocument.Open(args[0], true);

            // Collect every comment, keyed by id, together with its "author : text" signature.
            foreach (Comment c in
                doc.MainDocumentPart.WordprocessingCommentsPart.Comments)
            {
                Console.WriteLine("{0} {1}:{2}", c.Id, c.Author, c.InnerText);
                int id = int.Parse(c.Id);
                comments.Add(id, c);
                commentTexts.Add(id, c.Author + " : " + c.InnerText);
            }

            // Remember the element that holds each comment reference.
            // Assumption: the three loops below enumerate via RootElement.Descendants<T>().
            foreach (CommentReference cRef in
                doc.MainDocumentPart.RootElement.Descendants<CommentReference>())
            {
                Console.WriteLine("Found reference for {0}", cRef.Id);
                commentReferenceParents.Add(int.Parse(cRef.Id), cRef.Parent);
                commentReferenceParentsSet.Add(cRef.Parent);
            }

            // A start is a duplicate if it directly follows another start
            // whose comment has the same author and text.
            foreach (CommentRangeStart baseRs in
                doc.MainDocumentPart.RootElement.Descendants<CommentRangeStart>())
            {
                Console.WriteLine("Found range start for {0}", baseRs.Id);

                int baseId = int.Parse(baseRs.Id);

                commentRangeStarts[baseId] = baseRs;

                string baseCommentText = commentTexts[baseId];

                CommentRangeStart rs = baseRs;
                for (; ; )
                {
                    CommentRangeStart next = rs.NextSibling() as CommentRangeStart;
                    if (next == null)
                        break;

                    rs = next;

                    int rsId = int.Parse(rs.Id);
                    if (baseCommentText == commentTexts[rsId])
                        idsOfIdenticalStarts.Add(rsId);
                }
            }

            // Same for ends, skipping over the elements that hold comment references.
            foreach (CommentRangeEnd baseRe in
                doc.MainDocumentPart.RootElement.Descendants<CommentRangeEnd>())
            {
                Console.WriteLine("Found range end for {0}", baseRe.Id);

                int baseId = int.Parse(baseRe.Id);

                commentRangeEnds[baseId] = baseRe;

                string baseCommentText = commentTexts[baseId];

                CommentRangeEnd re = baseRe;
                for (; ; )
                {
                    OpenXmlElement nextEl = re.NextSibling();
                    while (nextEl != null && commentReferenceParentsSet.Contains(nextEl))
                        nextEl = nextEl.NextSibling();

                    re = nextEl as CommentRangeEnd;
                    if (re == null)
                        break;

                    int reId = int.Parse(re.Id);
                    if (baseCommentText == commentTexts[reId])
                        idsOfIdenticalEnds.Add(reId);
                }
            }

            // A comment is a dupe only if both its start and its end are duplicates.
            // Assumption: the removal and save calls below are reconstructed from the
            // description in the text above.
            foreach (int id in idsOfIdenticalStarts)
            {
                if (idsOfIdenticalEnds.Contains(id))
                {
                    Console.WriteLine("Eliminating comment {0}", id);
                    comments[id].Remove();
                    commentRangeStarts[id].Remove();
                    commentRangeEnds[id].Remove();
                    commentReferenceParents[id].Remove();
                }
            }

            doc.MainDocumentPart.RootElement.Save();
            doc.MainDocumentPart.WordprocessingCommentsPart.Comments.Save();
            doc.Close();

            Console.WriteLine("All done!");
        }
    }
}
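As for the ordering dependency mentioned before the listing, one way to drop it is to compute a position for every anchor and group comments by (author and text, start position, end position), so the relative order of adjacent anchors no longer matters. The sketch below does this on a flattened stand-in for a paragraph, not on the SDK types. The Item class, the "position = number of text runs seen so far" rule, and all names are my assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OrderIndependentDedup
{
    // One flattened paragraph item: a text run, or a comment anchor with its id.
    public class Item
    {
        public string Kind;   // "text", "start" or "end"
        public int Id;        // comment id; unused for text
        public Item(string kind, int id = -1) { Kind = kind; Id = id; }
    }

    // Returns the ids of comments that duplicate a lower-id comment:
    // same author and text, and anchors that enclose the same text runs.
    public static List<int> FindDuplicates(
        IList<Item> items, IDictionary<int, string> authorAndText)
    {
        var startPos = new Dictionary<int, int>();
        var endPos = new Dictionary<int, int>();
        int textSeen = 0;   // anchors with no text between them share a position
        foreach (Item item in items)
        {
            if (item.Kind == "text") textSeen++;
            else if (item.Kind == "start") startPos[item.Id] = textSeen;
            else if (item.Kind == "end") endPos[item.Id] = textSeen;
        }

        return authorAndText.Keys
            .Where(id => startPos.ContainsKey(id) && endPos.ContainsKey(id))
            .GroupBy(id => Tuple.Create(authorAndText[id], startPos[id], endPos[id]))
            .SelectMany(g => g.OrderBy(id => id).Skip(1))   // keep the lowest id
            .OrderBy(id => id)
            .ToList();
    }

    static void Main()
    {
        // The ends deliberately appear in the opposite order from the starts.
        var items = new List<Item> {
            new Item("start", 0), new Item("start", 2), new Item("text"),
            new Item("end", 2), new Item("end", 0), new Item("text"),
        };
        var comments = new Dictionary<int, string> {
            { 0, "sergey : Here, too." }, { 2, "sergey : Here, too." },
        };
        Console.WriteLine(string.Join(",", FindDuplicates(items, comments)));
    }
}
```

With real documents the same idea would need a position rule that also copes with anchors nested at different levels of the element tree, which this flat model ignores.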

Apple is replacing Microsoft as a company Linux advocates love to hate

Of course, there's still plenty of hate for everyone... still, so much fun to watch!

Monday, July 27, 2009

Locking/unlocking Word doc files programmatically

My team is going through a planning milestone again, and this means reading, reviewing, and approving a lot of specs and design documents.

So for this weekend I was toying with the idea of setting up a clone of Malevich for document reviews.

Malevich is of course the tool we (and now a whole bunch of other teams inside and outside Microsoft) are using for code reviews. Its main target is to make commenting easy - you simply click on a line of source code, an edit box opens, you type your comment for that line, and that's it. You can read more about Malevich's inspirations and aspirations here:

Over the last 7 months Malevich has proven to be a big success. It streamlined code review process in the development team, involved many people in code reviews who otherwise would not be participating, and did wonders for the quality of our code base.

All this made me start thinking about introducing a similar process for spec reviews. After all, a review is a review, right?

The biggest problem with the spec reviews turns out to be the file format. Malevich operates on text files, and so rendering these files on the screen, showing a difference between the two versions of a file, and associating comments with the line turns out to be very simple. Specs (at Microsoft) are traditionally written as Microsoft Word documents.

Word turns out to have a very nice commenting mechanism, but rendering documents on a web page is not nearly as straightforward, and diffing them... that's a whole other project!

While pondering this idea, I ran into a blog post by Eric White which describes how to determine if two Word documents are the same (modulo comments). The post served as my first introduction to OpenXML, which is the format behind the Word document. Also, I read that Eric was planning a blog post about merging comments from two documents, and this led me to the following design for the spec review site.

I am going to put together a system very similar to Malevich (let's call it Black Square for now), but instead of text files, it would hold Word documents. To create a review request, a reviewee would upload a document to the server via a web site. Upon upload, the server will lock the Word file in a way that would prevent all modifications to it other than the comments. It will then make the document available for reviewers to download.

To perform a review, the reviewer downloads the document, comments on it using Office's review functionality, and uploads it back to the server. The server will then merge the comments back into the master document, making comments from everybody available to all subsequent reviewers as well as the reviewee.

I shot Eric an email, and as it turned out, he had already largely completed his merger, and he gave me a preliminary copy to beta test (the final version is now available).

Then I spent part of the weekend coding. After a few hours I had a skeleton web site and needed to code the first meaningful action - locking a Word document so only comments could be added.

When I have to deal with large new API sets, I tend to program by Google - search for a code snippet that best illustrates the use of the API. The Internet is a great resource for that (with one exception - reading is fine, copying code with unclear copyright into commercial products is not!), and Windows source is even better (although I cannot use that for the open source projects, for similar reasons).

Well, as it turned out, there is a dearth of samples when it comes to OpenXML programming. Unlike most .NET APIs, the OpenXML API documentation on MSDN has no examples of use. There are a few "How to" samples of solving an end-to-end problem, which primarily focus on processing the text, not the configuration options of the Word file. And the rest of the Internet is pretty much silent on the subject.

To make matters worse, the API is based on XML with a bunch of types derived from base XML elements, so Intellisense often does not work.

After some struggle (and help from Eric) I was able to make sense of the programming model. Here's what's going on.

The document has a bunch of sections. You can look at them by changing the docx extension of the file to zip and then opening it in your favorite archiver. You will find that the file is just a zipped archive of a bunch of XML files. To figure out which elements need to change to lock the file, I made a copy of the file, expanded it, then locked the file, expanded the result, and diffed the two.

This led me to two elements: documentSecurity in the properties of ExtendedFilePropertiesPart, and documentProtection. The first one was easy - it had a counterpart in the object model, "doc.ExtendedFilePropertiesPart.Properties.DocumentSecurity", and setting it was straightforward:

WordprocessingDocument doc = WordprocessingDocument.Open(args[1], true);
doc.ExtendedFilePropertiesPart.Properties.DocumentSecurity =
new DocumentFormat.OpenXml.ExtendedProperties.DocumentSecurity(isLock ? "8" : "0");

The second was a setting in MainDocumentPart. The hiccup for me (a very novice XML developer - remember, most of my life was spent deep in the guts of OS, I have not touched managed code and all attendant goo until a few months ago!) was that settings were a collection of OpenXML elements, and DocumentProtection, despite the existence of the type, was not addressable in the direct way, as a property of the settings. Instead, the settings needed to be interpreted as an XML record, e.g. via LINQ to XML:

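// The original lookup expression was lost from this listing;
// GetFirstChild<T>() is used here as a stand-in.
Settings settings = doc.MainDocumentPart.DocumentSettingsPart.Settings;
DocumentProtection dp = settings.GetFirstChild<DocumentProtection>();
if (dp != null)
    dp.Remove();    // assumed: drop any existing protection record

if (isLock)
{
    dp = new DocumentProtection();
    dp.Edit = DocumentProtectionValues.Comments;
    dp.Enforcement = DocumentFormat.OpenXml.Wordprocessing.BooleanValues.One;
    settings.AppendChild(dp);    // assumed: attach the new record
}

settings.Save();    // assumed: write the settings part back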

So here's a full code snippet. It gives you a command line utility to lock and unlock Word files (unlocking the file will - I think - also remove the password protection, although I did not try this).

You need OpenXML Format SDK 2.0 to run this, available here:, and a reference to DocumentFormat.OpenXml in your project.

// <copyright>
// Copyright (C) Sergey Solyanik.
// This file is subject to the terms and conditions of the Microsoft Public License (MS-PL).
// See for more details.
// </copyright>
using System;
using System.Xml.Linq;

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace LockDoc
{
    /// <summary>
    /// Manipulates modification permissions of an OpenXML document.
    /// </summary>
    class Program
    {
        /// <summary>
        /// Locks/Unlocks an OpenXML document.
        /// </summary>
        /// <param name="args"></param>
        static void Main(string[] args)
        {
            if (args.Length != 2)
            {
                Console.WriteLine("Usage: lockdoc lock|unlock filename.docx");
                return;
            }

            bool isLock = false;
            if (args[0].Equals("lock", StringComparison.OrdinalIgnoreCase))
                isLock = true;
            else if (!args[0].Equals("unlock", StringComparison.OrdinalIgnoreCase))
            {
                Console.Error.WriteLine("Wrong action!");
                return;
            }

            using (WordprocessingDocument doc = WordprocessingDocument.Open(args[1], true))
            {
                doc.ExtendedFilePropertiesPart.Properties.DocumentSecurity =
                    new DocumentFormat.OpenXml.ExtendedProperties.DocumentSecurity
                        (isLock ? "8" : "0");
                doc.ExtendedFilePropertiesPart.Properties.Save();

                // Several lines were lost from this listing. The lookup, removal,
                // append, and save calls below are a reconstruction, not the original.
                Settings settings = doc.MainDocumentPart.DocumentSettingsPart.Settings;
                DocumentProtection dp = settings.GetFirstChild<DocumentProtection>();
                if (dp != null)
                    dp.Remove();

                if (isLock)
                {
                    dp = new DocumentProtection();
                    dp.Edit = DocumentProtectionValues.Comments;
                    dp.Enforcement = DocumentFormat.OpenXml.Wordprocessing.BooleanValues.One;
                    settings.AppendChild(dp);
                }

                settings.Save();
            }
        }
    }
}

BTW, for the not faint-of-heart, here's the documentation for OpenXML format:

And here are the Microsoft SDK docs:

Wednesday, July 15, 2009

Freedom and the Bible

"Romans 13:1-7 (NLT): Everyone must submit to governing authorities. For all authority comes from God, and those in positions of authority have been placed there by God. 2 So anyone who rebels against authority is rebelling against what God has instituted, and they will be punished. 3 For the authorities do not strike fear in people who are doing right, but in those who are doing wrong. Would you like to live without fear of the authorities? Do what is right, and they will honor you. 4 The authorities are God’s servants, sent for your good. But if you are doing wrong, of course you should be afraid, for they have the power to punish you. They are God’s servants, sent for the very purpose of punishing those who do what is wrong. 5 So you must submit to them, not only to avoid punishment, but also to keep a clear conscience. 6 Pay your taxes, too, for these same reasons. For government workers need to be paid. They are serving God in what they do. 7 Give to everyone what you owe them: Pay your taxes and government fees to those who collect them, and give respect and honor to those who are in authority."

Wednesday, July 8, 2009

Among all the idiocy printed today about Chrome OS

...finally, the voice of reason! Ladies and Gentlemen, I give you... fake Steve Jobs!

The mother of all bull...

"Google Drops A Nuclear Bomb On Microsoft. And It’s Made of Chrome."

The idiots in the press are at it again, cooking a sensation by blowing up an interesting tidbit of information way out of proportion.

Let me point out two obvious facts.

(1) The entire consumer market is rather small as a share of Microsoft revenue (10%?). The netbooks most likely represent less than 1% of the company's revenue stream. You cannot possibly call a "nuclear bomb" something that targets so little money.

(2) The smart phone market will always be much bigger than the netbook market. So if the "nuclear bomb" metaphor made any sense, Apple dropped it years ago with the iPhone.

Here's another stupid quote of the day:

'"One of Google's major goals is to take Microsoft out, to systematically destroy their hold on the market," said Mr Enderle.

"Google wants to eliminate Microsoft and it's a unique battle. The strategy is good. The big question is, will it work?"'

When I was at Google, the last thing people there were thinking about was Microsoft. I heard Microsoft mentioned maybe a grand total of 10 times in my year plus there. What Googlers do care about is building cool things that attract attention and make customers come to their sites. THAT strategy clearly works. Destroying Microsoft - not so much (Netscape tried that approach).

My own take on this - thank you, Google! Windows 8/IE 9 will be better for your efforts. It often takes a competitor to persuade us that a segment of a market is important (unfortunate, but true). With this announcement Google did just that.

Do you have health insurance?

Don't be so sure. You might lose it when you actually need it. Apparently, insurance companies slap a $1M surcharge on corporate policies that carry expensive patients. The companies then face a choice of whether to essentially pay you a $1M+ salary or...

Incidentally, in 3/4 of all medical bankruptcies (which are half of all bankruptcies in the US) people had health insurance.

Monday, July 6, 2009

BMI is bogus... because it embarrasses USA

It was making sense up to the point where the author claimed that 200 years ago most people led sedentary lifestyles, although I had to ignore his quip on "if the formula does not describe the data, rig the formula" (this, of course, is what science - at least theoretical physics - is all about).

But when I got to the end, it was this: BMI does not make sense because...

"10. It embarrasses the U.S.

It is embarrassing for one of the most scientifically, technologically and medicinally advanced nations in the world to base advice on how to prevent one of the leading causes of poor health and premature death (obesity) on a 200-year-old numerical hack developed by a mathematician who was not even an expert in what little was known about the human body back then."

Come to think about it, an even more ridiculous fact is that our entire space program is based on a 300-year-old formula developed by a theologian!

This pearl of logical reasoning comes to you directly from a Stanford (!) Professor (!) of Mathematics (!) Keith Devlin...

P.S. The author of this blog takes no position on the validity of BMI as a measure of human obesity, only on the validity of the argument against it referenced above.

Tuesday, June 30, 2009

Scriptster: C# as a scripting language

I love Python! Unlike the vast majority of script languages that evolved ad-hoc, Python was built in a controlled way and has features, syntax, and runtime which click together.

I learned it at Google, got readability in it (, and have written a few thousand lines of code since. I must say that of all the scripting languages, Python is probably the most amenable to writing thousand-plus line programs (maybe with the exception of Ruby).

As far as I am concerned, it would be nice if all other command interpreter scripting died (sorry, PowerShell) and Python were integrated into shells everywhere.

There is one problem with Python, however - it is yet another language to learn, and yet another development/runtime environment to maintain. Its integration with Windows is good, but not nearly as good as C# which was literally made for Windows. And so after I came back to Microsoft ( and found myself doing most things in C# and C++, I started getting more and more rusty with Python.

Of course, being a dev manager does not help - my opportunities to code are few and far between.

What that meant was that scripting became harder and harder. Soon I found myself writing small executables in C# instead of scripts. Which was all good, except that you end up with two things - a source file and an executable - which now need to be maintained together, checked in together, etc. And for very small things, that's considerable overhead.

And then I had an idea. .NET, you see, ships with a C# compiler in the box - it is present on every (updated) Windows system. What if I were to write a small program, a C# "interpreter" of sorts, that would run C# programs directly from a command line as if they were batch files?

And so Scriptster was born.

Scriptster is a single executable that allows you to run C# programs directly, without manually compiling them. It automatically compiles C# code before running (and caches the compiled versions so the next execution is faster), but to the user it is completely transparent. It is fast, too - even the first, compiling, invocation takes fractions of a second. Subsequent runs are instantaneous.

Scriptster is copy-deployable: simply copy it into a directory of your choice, run
    scriptster --install
(from a command prompt started with administrator privileges - "Run as Administrator"), and close and reopen any CMD windows where you expect to be using it (CMD needs to pick up the updated PATHEXT environment variable, which it reads on load).

After this, you can author and run C# scripts. For example, open Notepad, and type the following:
using System;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("This program was invoked with the following command line parameters:");
        foreach (string s in args)
            Console.WriteLine("{0}", s);
    }
}
Save this file as testscript.csscript (the files need to have .csscript extension to work with Scriptster), and you can run it directly from the command line:
    c:\temp> testscript Blah blah blah
This program was invoked with the following command line parameters:
Blah
blah
blah

You can of course run considerably more involved scripts with Scriptster. In fact, any C# program can be run, as long as it is (1) in one file, and (2) only relies on assemblies in the GAC.

Here's an example of a program that queries LDAP for user alias:

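//#ref System.DirectoryServices.dll
using System;
using System.Collections.Generic;
using System.DirectoryServices;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        DirectorySearcher ds = new DirectorySearcher();
        // Two lines were lost from this listing here. Loading the "mail"
        // attribute is an assumption based on the output below.
        ds.PropertiesToLoad.Add("mail");

        foreach (string alias in args)
        {
            ds.Filter = "(SAMAccountName=" + alias + ")";
            SearchResult result = ds.FindOne();
            if (result == null)
            {
                Console.Error.WriteLine("Could not resolve {0}", alias);
                continue;
            }
            // The last argument was lost from the listing; reading "mail" is assumed.
            Console.WriteLine("{0}'s email is {1}", alias,
                result.Properties["mail"][0]);
        }
    }
}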
Notice "//#ref" at the top of the file? This is how you tell Scriptster about a referenced assembly.

You can edit scripts in Visual Studio, debug them, then rename the files to .csscript extensions and run them right from the command line. You can take existing small programs, and run them from the command line, too.

Interested? You can download Scriptster, including the source code, from CodePlex:

Give it a whirl, and let me know how it works out.

Saturday, June 27, 2009

Obama has the Ring...

  • Indefinite detentions: check!
  • Broad executive powers: check!
  • Secrecy: check!
  • Signing statements: check!
  • Escalation of foreign wars: check!
  • ...

Friday, June 26, 2009

Couple of consumer guides

A must read for information on how to use your credit card (fine print translated into human-readable form)...

... and on how to use your iPhone...

"A friend of mine used to work for AT&T customer service. He had a call one day from a guy working for a very small company. A company size of five people, in fact. All of them decided to get iPhones and get a shared plan. Then they all decided to go out of the country...apparently for over a month.

Well, they spent a little less than a month over seas when one of them called my friend to ask some innocuous support question. They had not yet seen their bill of over $300,000. My friend did not say a word to him, but hung up and laughed his ass off.

Originally I thought this story was unbelievable, and I doubted it. I found it was possible after some rough calculations, but still it was kind of an extreme case.

Now, I see more and more of these stories popping up. Five people on the same plan, out of country, and all using their iPhones extensively...I'm starting to believe."

Wednesday, June 24, 2009

ActiveDirectory and disk imaging: not a happy combination

When I first heard about Hyper-V snapshotting I was extremely excited. This feature allows one to freeze an image of the virtual machine's hard drive (take a snapshot), and revert back to it at any point in time.

Moreover, it supports a snapshot tree: you can install the OS, take a snapshot, install one application, take a snapshot, revert back to the original image, install another application, and, again, take a snapshot. As a result, you now have three images which you can boot at any point in time (although not simultaneously): clean OS, app1 install, and independent - and clean - app2 install.

If you ever had to test your software in multiple environments, this is an absolute Holy Grail.

So I did it and it worked - for a while.

Unfortunately, one of the security features of the NT domain is that machine accounts periodically (once a month) change their passwords. This is driven by the client, not AD server (as described here: - which is a good introduction on how machine passwords work), and can - in theory - be turned off. But it's on by default, and is probably on as a security policy at most actual corporate installations.

So within a month the currently running version of the VM changes its password, which renders all the other snapshots useless: they have the old password. If you boot any of the snapshots, your VM can no longer connect to the domain. If you disconnect and rejoin, it gets a new machine SID and a new password, which means that the password (and SID) of the version of the VM that was running previously - and of all the other snapshots, as a matter of fact - is now bad.

All this means that after one month, the snapshot tree that you've just invested so much time building becomes completely useless.

The problem is not limited to Hyper-V per se. It manifests in every imaging solution - Vista/Server 2008 backup, Norton Ghost, etc. The only way to fix it - if your domain policy allows it - is to disable password change. Which brings us back to the link I mentioned earlier.

The difference between United States and Soviet Union...

...might be that in the US most people wholeheartedly believe the propaganda, whereas in the SU most people didn't. As far as foreign policy goes, the actions of the two countries seem to be about the same.

There is a duality of 1984 and Brave New World. In 1984 the government rules because everybody is afraid. In Brave New World it rules because nobody cares. But the net result is still the same.

Monday, June 22, 2009

Open source in China

This place: says that 80% of Chinese use Open Source software.

Yet this impromptu Google survey says that only 8% of people around Times Square know what a browser is:

Doesn't compute, does it? People who don't know what a browser is would know about Open Source and be able to distinguish it from other forms of software licenses?

I think I know what's going on here: 80% of Chinese users think that Windows and Office are open source products because they got pirated copies of them for free :-)!

We're going to be proud of our OS again!

This is Win7 running on a VM with Visual Studio and an instance of SQL server. The CPU spikes are compiles. Notice 0% CPU and only 1.32GB of RAM.

Sunday, June 21, 2009

Cost of crapware in battery life

I watched a presentation about power last week. Among other interesting numbers in it - a clean install of Vista has less than 1% of CPU utilization on idle; an image from an OEM (including a bunch of 3rd party software) had ~7%. An increment of 10% CPU utilization leads to 8% less battery life...

Friday, June 19, 2009

A picture is worth 1000 words...

But 1000 words is roughly 5K, and a moderately-sized picture is at least 50K. Plus - unlike pictures - the text is searchable. Go figure...

Thursday, June 18, 2009

Talking to a wall... (via Reddit)

In Jerusalem, a female journalist heard about an old Jew who had been going to the Western Wall to pray, twice a day, everyday, for a long, long time. So she went to check it out. She goes to the Western Wall and there he is! She watches him pray and after about 45 minutes, when he turns to leave, she approaches him for an interview.
"I'm Rebecca Smith from CNN. Sir, how long have you been coming to the Western Wall and praying?"
"For about 50 years."
"50 years! That's amazing! What do you pray for?"
"I pray for peace between the Jews and the Arabs. I pray for all the hatred to stop and I pray for our children to grow up in safety and friendship."
"How do you feel after doing this for 50 years?"
"Like I'm talking to a fuckin' wall."

Wednesday, June 17, 2009

Monday, June 15, 2009

I kept maybe five textbooks from college. One of them was Ordinary Differential Equations (via Reddit)

I hate reposting, but this is too good to leave it up to Reddit's comments retention policy. Reproduced for posterity. But if you like it, do go to the link above and upvote...

By kleinbl00:
"The guy on the left.

He was a graduate from the University of Zagreb or something and he had an awesome accent. And he was beanpole tall and twitched around. He was like a cross between Cosmo Kramer and The Count from Sesame Street.

He was incredibly passionate about what he tought. He would bang on the chalkboard, breaking the chalk, and say "DoyounderstandTHIS? DoyouGETthis?" and look at us all intense. And then he'd roll on with what he was going.

The dude fucking loved math. He was teaching ordinary diff EQ and you'd think he was Beethoven explaining crescendos. And he really didn't give a shit about homework. He'd sit there and drill us through stuff as if our life depended on us understanding. I saw that guy tear up a couple times. More than once, the professor next door would step in and ask him to keep it down. He spent maybe an hour explaining Euler's identity - and I shit you not, he got us to tear up, too.

We had one homework assignment. It was given about halfway through the class. We had a week to do it. And it took that entire week, in groups of two or three, five to six hours a day to do it. And we handed it in, and he didn't even grade it for like three more weeks.

When we got it back, there was a pallor over the class (what was left of it - a third of the class had dropped out). I had a 23%. I'm sure I turned gray. I went to see him - what the hell could I do? I mean, I needed to pass -

"DoNotWorryaboudit. AveragewaseighTEEEN. Yooodooverygooood."

When it came down to the final, it was one, simple, benign sheet of paper. It had one problem on it. There were absolutely no numbers on it other than (1). It started with "imagine a function..."

We had two hours. At 1:15, nobody had handed in a thing. I was just sitting there stunned, grinding through the first half. At 1:30, nobody had handed in a thing. At 1:45, he said

"Eeef...youtakezeetest choam wityou and feenishit, I veel passyou."

Nobody got up. We sat there and cranked through the fucking test. The survivors, anyway. We started with 30 people in the class. We finished with twelve.


I kept maybe five textbooks from college. One of them was Ordinary Differential Equations. It was an expensive book - too expensive for me to afford. The first time I went to see him, I apologized for not having the book.

He gave me his."

Barack Hoover Obama

A very interesting article in this month's Harper's (, subscription required) contrasts Hoover - a deliberative, progressive, well-meaning technocrat - with Barack Obama.

The article is making the case that, although Obama is often compared with FDR, by trying to take the "middle road" and avoiding the open warfare with his detractors, he is actually emulating Herbert Hoover, and that this approach is doomed to failure.

"Franklin Roosevelt also took office imagining that he could bring all classes of Americans together in some big, mushy, cooperative scheme. Quickly disabused of this notion, he threw himself into the bumptious give-and-take of practical politics; lying, deceiving, manipulating, arraying one group after another on his side—a transit encapsulated by how, at the end of his first term, his outraged opponents were calling him a “traitor to his class” and he was gleefully inveighing against “economic royalists” and announcing, “They are unanimous in their hatred for me—and I welcome their hatred.”

Obama should not deceive himself into thinking that such interest-group politics can be banished any more than can the cycles of Wall Street. It is not too late for him to change direction and seize the radical moment at hand. But for the moment, just like another very good man, Barack Obama is moving prudently, carefully, reasonably toward disaster."

Sunday, June 14, 2009

More TFS blues - adding a user

The more I use this system, the more I get a suspicion that it was not designed by developers. Or maybe we found a totally clean room bunch of developers that have never used a source control system before? (Would that be... PMs?)

I wrote previously ( about the incredible amount of pain TFS is to install, and put together a simple step-by-step guide on how to install it here: (Incidentally, this is now one of the most popular articles on the blog.)

Yesterday I spent about an hour trying to figure out how to allow my daughter to use TFS server that I've set up for our home projects.

In Perforce (Microsoft uses a derivative of Perforce for quite a few of its internal projects) everything is simple: you type "p4 protect" (or "sd protect" at Microsoft), it opens a file in a notepad, and you add a line that looks like this:
write user sergey * //depot/...
And you are done.

Here's what you have to do in TFS:

(1) Add a user to a list of "licensed users". This list is not displayed by default when you navigate to the project's security settings, you have to click on a checkbox to make it display all groups.

I missed this step, and it was not mentioned on the TFS documentation page that deals with setting permissions ( I performed all the magic incantations from that page, but TFS still would not connect.

And of course the error that it was showing listed security among three other options, but gave no suggestions of what might go wrong.

Eventually, a Google search on the error code led me to the words "licensed user", and then here:

(2) You have to add the user to Contributor group in your project using TFS Explorer.

This is described here:

(3) Separately, you have to grant the user access to the sharepoint site.

Also here:

(4) Separately, you have to grant the user access to the reporting portal.

And again, here:

I understand cutting features to make the deadlines, but c'mon, ladies and gentlemen of TFS, does this thing really have to be such a pain in the butt for the administrator? Especially since we're competing with Perforce, where deployment is done by a single double-click?

Saturday, June 13, 2009

What is your number of nines?

Ran into an interesting page today - a list of scheduled down times for Blogger:

It looks like Blogger is down for roughly 10 minutes once a month (in addition to a Picasa downtime that impairs its ability to accept images).

10 minutes a month does not look like much, but it does amount to about 2 hours of downtime per year. Is two hours a year good or bad?

The system's availability is defined as the ratio of uptime to the total time:

                   MTTF
Availability = -----------
               MTTF + MTTR
where MTTF is the mean time to failure, and MTTR is the mean time to repair, i.e. the time it takes to bring the system back online. The "failure" here should be understood as a measure of the system's ability to process requests rather than a fault: a scheduled downtime is not a bug, but the system is not available nevertheless.

2 hours of downtime in a year yield an availability of 365.25 * 24 / (2 + 365.25 * 24) = 99.98%, which counts as "3 nines" and puts Blogger in the category of "Well-managed" systems.
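To make the arithmetic concrete, here is a minimal sketch in C of the availability formula plus a "nines" counter. The function names availability and count_nines are made up for this illustration, and both times are assumed to be in the same unit (hours here).

```c
#include <assert.h>

/* Availability = MTTF / (MTTF + MTTR). Both arguments must use the same
 * unit (hours here). Hypothetical helper, not part of any library. */
double availability(double mttf, double mttr)
{
    assert(mttf > 0.0 && mttr >= 0.0);
    return mttf / (mttf + mttr);
}

/* Counts the "nines": 0.9 -> 1, 0.99 -> 2, 0.999 -> 3, and so on.
 * The small tolerance absorbs floating-point rounding at exact powers
 * of ten; the result is capped at 15, about the limit of a double. */
int count_nines(double a)
{
    double unavailability = 1.0 - a;
    double threshold = 0.1;
    int nines = 0;

    assert(a >= 0.0 && a <= 1.0);
    while (nines < 15 && unavailability <= threshold * (1.0 + 1e-9)) {
        nines++;
        threshold /= 10.0;
    }
    return nines;
}
```

With the Blogger numbers, availability(365.25 * 24, 2) comes out at roughly 0.99977, and count_nines reports 3 for it.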

Here are the definitions of various levels of availability given in Jim Gray's famous book on transaction processing (

System type | Unavailability (min/year) | Availability | Class

As Blogger's example shows, it's fairly hard to create a fault-tolerant (or better) system - you have to account for things that range from OS and software patching to the maintenance of the power equipment in the data centers.

One might think that hardware failures and software bugs cause most of the availability problems, but it is actually scheduled maintenance that creates the majority of the work, because it causes a lot of downtime. Once you have figured out how to deal with maintenance, the unavailability due to bugs is probably already taken care of by the same measures.

And at server MTTF of roughly 14 years, one should only be worrying about hardware (assuming that the failure can be detected and the job reallocated within one hour) when availability starts approaching 5 nines.
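The hardware claim can be checked with a few lines of C. This is a rough sketch under the assumptions stated in the text (a 14-year MTTF and one hour to detect the failure and reallocate the job); the helper name downtime_minutes_per_year is invented for the example.

```c
#include <assert.h>

#define HOURS_PER_YEAR (365.25 * 24.0)

/* Expected downtime in minutes per year for a component that fails once
 * every mttf_hours on average and takes mttr_hours to detect and replace.
 * Hypothetical helper for illustration. */
double downtime_minutes_per_year(double mttf_hours, double mttr_hours)
{
    double failures_per_year;

    assert(mttf_hours > 0.0 && mttr_hours >= 0.0);
    failures_per_year = HOURS_PER_YEAR / (mttf_hours + mttr_hours);
    return failures_per_year * mttr_hours * 60.0;
}
```

downtime_minutes_per_year(14 * HOURS_PER_YEAR, 1.0) works out to about 4.3 minutes a year. Five nines leaves a budget of roughly 5.3 minutes a year (0.001% of 525,960 minutes), so hardware alone eats most of it, while against a three-nines budget of about 526 minutes it hardly registers.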

How many nines does your system have?

Friday, June 12, 2009

Navigating the Dell price labyrinth

Last week I bought a couple of big (8-core, 32GB, fast disk) boxes from Dell because they closely match the hardware that we're going to be running on in our data centers. For the speed and quality of the hardware the price ended up being a very reasonable $4k/box (when bought with the Microsoft discount).

While doing it, I discovered a curious thing - if you configure the box with 32GB of RAM upfront, the memory comes out quite costly. If you instead buy the server with the default 8GB and buy 32GB of RAM separately - from the peripherals section of the same Dell web site, as the Dell-recommended RAM upgrade for this very workstation - the total cost is over $800 less (and you end up with the 8GB of unused RAM that was originally there).

If you work at Microsoft, use this trick and watch our stock price go up 50 cents!

If, furthermore, you buy disks separately, you save another $150 or more over the price of pre-installed hard drives.

These are Microsoft-internal prices, which - for obvious reasons - I can not quote, but the problem is even bigger on the external Dell web site, because everything is even more expensive there.

Here, for example, is the price for memory - preconfigured - for Precision T7400.

You can see that the price for 32GB is $2960, and for 64GB it is a whopping $17870 (!).

Alternatively, you can buy the same RAM on Newegg. For 32GB you would buy four 8GB kits at $165-$240 each, for a total of $660 to $960:

Or, for 64GB you would buy 8 of these, at $420 each - $3360 - almost $15000 cheaper than on Dell's web site!

It goes beyond RAM.

Dell wants $550 for 1TB hard drive (although they give their small business buyer a break - a 1 TB drive for the same T7400 there is "only" $430).

The prices on Newegg for 1TB hard drive range from $110-150 for a retail box to $74-$90 for OEM packaging.

Moral - if you are buying Dell computers, getting the parts on the side will save you a bundle. It is much cheaper to buy the minimal configuration, throw away the memory and the hard drive it comes with, and buy the replacement RAM and disks from Newegg (or anywhere else).

Note that the same is not true with CPUs - Dell CPUs are ~$200 more expensive than the same parts on Newegg, BUT you have to have a non-standard Dell heatsink, which - when bought separately - is very pricey. Plus replacing CPUs is not as trivial as RAM and disks.

Another interesting observation is that prices on Dell's home/small business site are often - usually - considerably less than on corporate web site. Most likely Dell uses this sales tactic to give its corporate users a "discount". Recently I bought a laptop using Microsoft EPP program, just to discover that the 7% "discount" that Dell provides simply matches the price that is available on its small business site for all.

Finally, for peripherals - docking stations and the like - it pays to check eBay. A $199 (plus shipping, handling, and tax) advanced port replicator for Latitude can be easily had there - new - for $129 (reasonable shipping, and no tax).

Thursday, June 11, 2009

Overlapped I/O in Windows

One of the current puzzles that our team is dealing with is database performance. As part of our platform we're building a performance counter collection system, most of which lives in a datawarehouse-like structure in SQL server.

The specific problem we're facing is a lack of parallelism on data loads. Our usage does not quite fit standard database models, of which two are typical - OLAP (on-line analytical processing - essentially read-only database that is updated, say, nightly, contains tons of data, and is queried frequently using specific set of queries for which it was designed) and OLTP (on-line transaction processing - where the "hot" subset of the data is smaller, but is read and updated very frequently).

Our database is the worst of both worlds - it writes about 1 million rows a minute, and the reads are rather infrequent. (An argument that SQL server is not the right technology for this can be made, but this discussion is outside the context of this post).

There are two basic problems.

First is the lack of parallelism in bulk transfers on the SQL server - on an 8-core, 32GB machine with 3 10k RPM data drives (which is what $4k currently buys at Dell) the load (and index update) part is completely single-threaded. Which means that one core runs at 100% CPU, and the rest are doing nothing.

I wrote a very simple program that does all the data transforms in memory, and this part is completely parallel - the server runs at 60-70% CPU utilization - and is very fast. Unfortunately, the very last insert into SQL - Amdahl's law! - now controls the overall performance.

The second - bigger - problem is that eventually the index no longer fits in the RAM, and due to the nature of the data it seems to be rewritten almost entirely on every upload. If I restrict SQL memory to 6GB, this covers roughly 2 hours of input, the disk starts thrashing really badly, and the load times go to hell - what initially would take only 30 seconds becomes 3 minutes.

One potential solution is to partition the database, but once we do it, it starts to negatively affect the performance of our queries.

The data flow through the system is really quite small relative to the power of the hardware. The rows we're writing are only a few dozen bytes each, and the aggregate data flow is less than 1MBps. But because the entire index is being rewritten all the time, the box keeps writing at the rate of 20-30 MBps for several minutes.
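
The arithmetic behind those numbers can be sketched as follows. The 48-byte row size is an assumption standing in for "a few dozen bytes", and both function names are made up for illustration.

```c
#include <assert.h>

/* Payload rate in bytes per second for a given row rate and row size. */
double payload_bytes_per_sec(double rows_per_minute, double bytes_per_row)
{
    assert(rows_per_minute >= 0.0 && bytes_per_row >= 0.0);
    return rows_per_minute / 60.0 * bytes_per_row;
}

/* How many bytes hit the disk for every byte of payload. */
double write_amplification(double disk_bytes_per_sec, double payload_rate)
{
    assert(payload_rate > 0.0);
    return disk_bytes_per_sec / payload_rate;
}
```

With 1,000,000 rows a minute at an assumed 48 bytes per row, the payload is about 800 KB per second, which agrees with the "less than 1MBps" figure. A disk writing 25 MB per second is then moving roughly 30 bytes for every byte of actual data.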

This seemed kinda slow (although speeding it up is of course not going to be a part of the solution - we need to figure out a more global approach), so I wanted to check what the hardware is capable of doing in terms of disk throughput. I quickly typed up a piece of code for that, and it also makes a reasonable tutorial on how to use overlapped I/O on Windows. Hence this article.

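The essential steps are as follows.

First, open the file with the FILE_FLAG_OVERLAPPED flag - WriteFileEx only works on handles that were opened for overlapped I/O.
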
Second, you want the data to go to the disk by DMA straight from your buffer, without copying, and for that the data buffer should be aligned on a sector boundary. The easiest way is to VirtualAlloc it, since a fresh allocation is contiguous in the address space and starts on a 64k boundary (the allocation granularity):

buffer = VirtualAlloc(NULL, bufferSize, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
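
If the buffer comes from somewhere other than VirtualAlloc, the alignment can be checked by hand. A minimal portable sketch, assuming a power-of-two sector size such as 512 or 4096 bytes; is_power_of_two and is_aligned are hypothetical helpers, not Windows APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True if `value` is a nonzero power of two. */
int is_power_of_two(size_t value)
{
    return value != 0 && (value & (value - 1)) == 0;
}

/* True if `ptr` sits on an `alignment`-byte boundary.
   `alignment` must be a power of two, e.g. an assumed 512- or
   4096-byte sector size, or the 64 KB allocation granularity. */
int is_aligned(const void *ptr, size_t alignment)
{
    assert(is_power_of_two(alignment));
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

The same bit trick, value & (value - 1), is what the full program further down uses to validate the chunk size.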

Third, use WriteFileEx to schedule your I/O. WriteFileEx ignores the hEvent member of the OVERLAPPED structure, so a developer can use it to pass context to the completion routine. The API is quite weird this way - you'd expect that hEvent would take an event to signal on I/O completion - but that's not how it works.

overlapped.hEvent = (HANDLE)1;
overlapped.Offset = (DWORD)offset;
overlapped.OffsetHigh = (DWORD)(offset >> 32);
offset += bufferSize;
WriteFileEx(hFile, buffer, bufferSize, &overlapped, WriteFinished);

where WriteFinished gets called when the write is done:

static void CALLBACK WriteFinished(DWORD dwErrorCode,
    DWORD dwNumberOfBytesTransfered,
    LPOVERLAPPED lpOverlapped)
{
    // Runs on the thread that issued the write, during an alertable wait.
    lpOverlapped->hEvent = (HANDLE)2;
}
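
The Offset/OffsetHigh assignment is the only fiddly part of scheduling a write, so here it is in isolation. This is a portable sketch: OffsetParts is a hypothetical stand-in for the two OVERLAPPED fields, and split_offset and join_offset are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* The two 32-bit halves of a file offset, laid out the way
   OVERLAPPED.Offset and OVERLAPPED.OffsetHigh expect them.
   OffsetParts is a hypothetical stand-in for those two fields. */
typedef struct OffsetParts {
    uint32_t low;   /* goes into Offset */
    uint32_t high;  /* goes into OffsetHigh */
} OffsetParts;

/* Split a 64-bit file offset into its low and high 32-bit halves. */
OffsetParts split_offset(uint64_t offset)
{
    OffsetParts parts;
    parts.low  = (uint32_t)(offset & 0xFFFFFFFFu);
    parts.high = (uint32_t)(offset >> 32);
    return parts;
}

/* Reassemble the 64-bit offset from its two halves. */
uint64_t join_offset(OffsetParts parts)
{
    return ((uint64_t)parts.high << 32) | parts.low;
}
```

For a write that starts 5 GB into the file (0x140000000), the low half is 0x40000000 and the high half is 1. Forgetting the high half would silently wrap the writes around at the 4 GB mark.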

Finally, once the write is scheduled, the thread must be put into an alertable sleep or wait - use the SleepEx/WaitForSingleObjectEx/WaitForMultipleObjectsEx functions, which take the alertable flag:

DWORD dwRes = WaitForSingleObjectEx(hSomeEvent, INFINITE, TRUE);
// dwRes == WAIT_IO_COMPLETION means a completion routine has run,
// i.e. at least one write is done

Obviously, you will be scheduling multiple I/Os - using an array of OVERLAPPED structures and hEvent fields inside them to keep track of what has finished and what has not is handy.
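
The bookkeeping can be sketched without any Windows calls at all. In the sketch below each slot stands for one OVERLAPPED structure and its state for the value stashed in hEvent; pump_slots and the SlotState names are hypothetical, and the point where real code would call WriteFileEx is marked with a comment.

```c
#include <assert.h>

/* State of one in-flight slot; mirrors the values kept in hEvent. */
enum SlotState { SlotFree = 0, SlotScheduled = 1, SlotSucceeded = 2, SlotError = 3 };

/* One pass over the slots: harvest finished writes and refill free
   slots while there is work left. Returns the number of writes still
   in flight after the pass, or -1 if any slot reported an error. */
int pump_slots(enum SlotState *slots, int nSlots,
               long long *writesLeft, long long *writesDone)
{
    int inFlight = 0;
    for (int i = 0; i < nSlots; ++i) {
        if (slots[i] == SlotError)
            return -1;
        if (slots[i] == SlotSucceeded) {
            ++*writesDone;
            slots[i] = SlotFree;
        }
        if (slots[i] == SlotFree && *writesLeft > 0) {
            --*writesLeft;
            slots[i] = SlotScheduled;  /* real code calls WriteFileEx here */
        }
        if (slots[i] == SlotScheduled)
            ++inFlight;
    }
    return inFlight;
}
```

The caller alternates between pump_slots and an alertable wait, during which the completion routine flips slots from scheduled to succeeded or failed, and stops when pump_slots returns 0 (all done) or -1 (error).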

Below is the full text.

Oh, yes, and that server does ~110 MBps writes on a 10K RPM disk using the code below (and ~80 MBps writes on a 7200 RPM disk), with practically zero CPU utilization.

// <copyright>
// Copyright (C) Sergey Solyanik. All rights reserved.
// This software is in public domain and is "free as in beer". It can be
// redistributed in full or in parts for free and without any preconditions.
// </copyright>
#include <windows.h>
#include <stdio.h>


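// Upper bound on the number of writes in flight. The original value is
// not known; 64 is an assumption.
#define MAX_SIMULTANEOUS_WRITES 64

// State of each write, stored in the otherwise unused hEvent member.
enum WriteProgress
{
    WriteScheduled = 1,
    WriteSucceeded = 2,
    WriteError = 3
};

// Completion routine: runs on the issuing thread during an alertable wait.
static void CALLBACK WriteFinished(DWORD dwErrorCode,
    DWORD dwNumberOfBytesTransfered,
    LPOVERLAPPED lpOverlapped)
{
    if (dwErrorCode == 0)
    {
        lpOverlapped->hEvent = (HANDLE)(INT_PTR)WriteSucceeded;
        return;
    }

    wprintf(L"Error: %d\n", dwErrorCode);
    lpOverlapped->hEvent = (HANDLE)(INT_PTR)WriteError;
}

int wmain(int argc, WCHAR* argv[])
{
    if (argc != 5)
    {
        wprintf(L"Usage: writedata filename total_size chunk_size "
            L"number_of_writes\n");
        wprintf(L"Note: total_size is in megabytes\n");
        wprintf(L"      chunk_size is in bytes and must be a power of 2 "
            L"greater than 2048\n");
        wprintf(L"      number_of_writes is the number of writes "
            L"that are scheduled simultaneously\n");
        return 1;
    }

    if (GetFileAttributesW(argv[1]) != INVALID_FILE_ATTRIBUTES)
    {
        wprintf(L"%s already exists.\n", argv[1]);
        return 2;
    }

    int size = _wtoi(argv[2]);
    if (size <= 0)
    {
        wprintf(L"Size must be a positive number.\n");
        return 3;
    }

    LARGE_INTEGER bytes;
    bytes.QuadPart = (__int64)size * 1024L * 1024L;

    int bufferSize = _wtoi(argv[3]);
    if (bufferSize <= 0)
    {
        wprintf(L"Buffer size should be a positive number.\n");
        return 4;
    }

    // A power of 2 has exactly one bit set.
    if (bufferSize & (bufferSize - 1))
    {
        wprintf(L"Buffer size must be power of 2\n");
        return 4;
    }

    if (bufferSize < 4096)
    {
        wprintf(L"Buffer size is too small\n");
        return 4;
    }

    int simwrites = _wtoi(argv[4]);
    if (simwrites <= 0)
    {
        wprintf(L"Number of simultaneous writes should be a "
            L"positive number.\n");
        return 5;
    }

    if (simwrites > MAX_SIMULTANEOUS_WRITES)
    {
        wprintf(L"Number of simultaneous writes is too large.\n");
        return 5;
    }

    // FILE_FLAG_OVERLAPPED is required by WriteFileEx. CREATE_NEW and
    // FILE_FLAG_NO_BUFFERING are assumptions: the former matches the
    // existence check above, the latter the sector-aligned buffers.
    HANDLE hFile = CreateFileW(argv[1], GENERIC_WRITE, 0, NULL,
        CREATE_NEW, FILE_FLAG_OVERLAPPED | FILE_FLAG_NO_BUFFERING, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
    {
        wprintf(L"Cannot create %s, error %d\n", argv[1], GetLastError());
        return 6;
    }

    // Preallocate the file. The SetEndOfFile call is an assumption;
    // moving the file pointer alone does not change the file size.
    SetFilePointerEx(hFile, bytes, NULL, FILE_BEGIN);
    SetEndOfFile(hFile);

    OVERLAPPED overlappeds[MAX_SIMULTANEOUS_WRITES];
    void *buffers[MAX_SIMULTANEOUS_WRITES];
    memset(overlappeds, 0, sizeof(overlappeds));

    for (int i = 0 ; i < simwrites ; ++i)
    {
        buffers[i] = VirtualAlloc(NULL, bufferSize,
            MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        if (buffers[i] == NULL)
        {
            wprintf(L"Out of memory.\n");
            return 7;
        }

        memset(buffers[i], i, bufferSize);
    }

    DWORD tick = GetTickCount();

    __int64 totalScheduled = 0;
    __int64 totalWritten = 0;
    __int64 nWrites = bytes.QuadPart / bufferSize;
    for (;;)
    {
        int nScheduled = 0;
        for (int i = 0; i < simwrites; ++i)
        {
            // Still in flight: nothing to do for this slot.
            if ((INT_PTR)overlappeds[i].hEvent == WriteScheduled)
            {
                ++nScheduled;
                continue;
            }

            // Finished: account for it and free the slot.
            if ((INT_PTR)overlappeds[i].hEvent == WriteSucceeded)
            {
                totalWritten += bufferSize;
                wprintf(L"\r%I64d", totalWritten);
                memset(&overlappeds[i], 0, sizeof(OVERLAPPED));
            }

            if ((INT_PTR)overlappeds[i].hEvent == WriteError)
                goto finished;

            // The slot is free: schedule the next write, if any are left.
            if (nWrites > 0)
            {
                overlappeds[i].hEvent = (HANDLE)(INT_PTR)WriteScheduled;
                overlappeds[i].Offset = (DWORD)totalScheduled;
                overlappeds[i].OffsetHigh =
                    (DWORD)(totalScheduled >> 32);
                totalScheduled += bufferSize;

                if (!WriteFileEx(hFile, buffers[i], bufferSize,
                    &overlappeds[i], WriteFinished))
                {
                    DWORD dwErr = GetLastError();
                    wprintf(L"Write error %d\n", dwErr);
                    goto finished;
                }

                --nWrites;
                ++nScheduled;
            }
        }

        // Nothing in flight and nothing left to schedule: done.
        if (nScheduled == 0)
            break;

        // Alertable sleep: returns after completion routines have run.
        SleepEx(INFINITE, TRUE);
    }

finished:
    CloseHandle(hFile);

    for (int i = 0 ; i < simwrites; ++i)
        VirtualFree(buffers[i], 0, MEM_RELEASE);

    int seconds = (GetTickCount() - tick) / 1000;
    if (seconds <= 0)
        seconds = 1;

    wprintf (L" in %d seconds (%d MBps)\n", seconds,
        size / seconds);

    return 0;
}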