Tuesday, June 30, 2009

Scriptster: C# as a scripting language

I love Python! Unlike the vast majority of script languages that evolved ad-hoc, Python was built in a controlled way and has features, syntax, and runtime which click together.

I learned it at Google, got readability in it (http://1-800-magic.blogspot.com/2008/01/after-8-months-no-longer-noogler.html), and have written a few thousand lines of code since. I must say that of all the scripting languages, Python is probably the most amenable to writing thousand-plus line programs (with the possible exception of Ruby).

As far as I am concerned, it would be nice if all other command interpreter scripting died (sorry, PowerShell) and Python were integrated into shells everywhere.

There is one problem with Python, however - it is yet another language to learn, and yet another development/runtime environment to maintain. Its integration with Windows is good, but not nearly as good as that of C#, which was literally made for Windows. And so after I came back to Microsoft (http://1-800-magic.blogspot.com/2008/06/back-to-microsoft.html) and found myself doing most things in C# and C++, I started getting more and more rusty with Python.

Of course, being a dev manager does not help - my opportunities to code are few and far between.

What that meant was that scripting became harder and harder. Soon I found myself writing small executables in C# instead of scripts. That was all good, except that you end up with two things - a source file and an executable - which now need to be maintained together, checked in together, etc. For very small things, that is considerable overhead.

And then I had an idea. .NET, you see, ships with a C# compiler in the box - it is present on every (updated) Windows system. What if I were to write a small program, a C# "interpreter" of sorts, that would run C# programs directly from a command line as if they were batch files?

And so Scriptster was born.

Scriptster is a single executable that allows you to run C# programs directly, without manually compiling them. It automatically compiles C# code before running (and caches the compiled versions so the next execution is faster), but to the user it is completely transparent. It is fast, too - even the first, compiling, invocation takes fractions of a second. Subsequent runs are instantaneous.
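To make the compile-and-cache idea concrete, here is a minimal sketch of one way such a cache could work. It is written in Python for brevity and is not Scriptster's actual implementation: the cache layout, the use of a SHA-256 hash of the source as the cache key, and the names cached_exe_path and needs_compile are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def cached_exe_path(script_path, cache_dir):
    """Map a script to a cache file name derived from its contents.

    Assumption: the cache key is a SHA-256 hash of the source text, so any
    edit to the script produces a different key and forces a recompile.
    """
    with open(script_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return os.path.join(cache_dir, digest + ".exe")


def needs_compile(script_path, cache_dir):
    """Return True when no cached executable exists for the current source."""
    return not os.path.exists(cached_exe_path(script_path, cache_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "testscript.csscript")
        with open(script, "w") as f:
            f.write("class Program { static void Main() { } }")
        print(needs_compile(script, tmp))  # first run: nothing cached yet
        # A real runner would invoke the C# compiler here; this sketch only
        # creates an empty placeholder where the executable would go.
        open(cached_exe_path(script, tmp), "wb").close()
        print(needs_compile(script, tmp))  # second run: cache hit
```

Keying the cache on file contents rather than on timestamps means a script that is copied or checked out again does not trigger a recompile, at the cost of hashing the source on every run.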

Scriptster is copy-deployable: simply copy it into a directory of your choice, run
    scriptster --install
(from a command prompt started with administrator privileges - "Run as Administrator"), and then close and reopen any CMD windows where you expect to use it (CMD reads the PATHEXT environment variable on startup, so it needs a restart to pick up the update).

After this, you can author and run C# scripts. For example, open Notepad, and type the following:
using System;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("This program was invoked with the following command line parameters:");
        foreach (string s in args)
            Console.WriteLine("{0}", s);
    }
}
Save this file as testscript.csscript (the files need to have .csscript extension to work with Scriptster), and you can run it directly from the command line:
    c:\temp> testscript Blah blah blah
This program was invoked with the following command line parameters:
Blah
blah
blah
Of course, you can run considerably more involved scripts with Scriptster. In fact, any C# program can be run, as long as it (1) is in one file, and (2) relies only on assemblies in the GAC.

Here's an example of a program that queries LDAP for a user's email address, given the user's alias:

//#ref System.DirectoryServices.dll
using System;
using System.Collections.Generic;
using System.DirectoryServices;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        DirectorySearcher ds = new DirectorySearcher();

        ds.PropertiesToLoad.Add("mail");

        foreach (string alias in args)
        {
            ds.Filter = "(SAMAccountName=" + alias + ")";
            SearchResult result = ds.FindOne();
            if (result == null)
            {
                Console.Error.WriteLine("Could not resolve {0}", alias);
                continue;
            }
            Console.WriteLine("{0}'s email is {1}", alias,
                result.Properties["mail"][0].ToString());
        }
    }
}
Notice the "//#ref" at the top of the file? This is how you tell Scriptster about a referenced assembly.
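As an illustration of how a "//#ref" directive like the one above might be turned into compiler options, here is a small sketch in Python. Scriptster's real parsing rules are not described in this post, so the function names, the rule that directives must precede the first line of code, and the mapping to the C# compiler's /reference: option are all assumptions.

```python
def extract_references(source):
    """Collect assembly names from leading '//#ref <assembly>' lines.

    Assumption: directives are only recognized before the first line of
    code, one assembly per directive line.
    """
    refs = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("//#ref"):
            name = stripped[len("//#ref"):].strip()
            if name:
                refs.append(name)
        elif stripped and not stripped.startswith("//"):
            break  # first real line of code: stop scanning
    return refs


def compiler_args(source):
    """Build csc-style /reference: options for each referenced assembly."""
    return ["/reference:" + name for name in extract_references(source)]


if __name__ == "__main__":
    script = "//#ref System.DirectoryServices.dll\nusing System;\n"
    print(compiler_args(script))
```

Hiding the directive inside a comment keeps the file a valid C# source file, which is presumably why the same script can still be opened and built in Visual Studio.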

You can edit scripts in Visual Studio, debug them, then rename the files to .csscript extensions and run them right from the command line. You can take existing small programs, and run them from the command line, too.

Interested? You can download Scriptster, including the source code, from CodePlex: http://scriptster.codeplex.com.

Give it a whirl, and let me know how it works out.

Saturday, June 27, 2009

Obama has the Ring...

  • Indefinite detentions: check!
  • Broad executive powers: check!
  • Secrecy: check!
  • Signing statements: check!
  • Escalation of foreign wars: check!
  • ...

http://sipseystreetirregulars.blogspot.com/2009/04/obama-and-ring-of-power.html

Friday, June 26, 2009

Couple of consumer guides

A must read for information on how to use your credit card (fine print translated into human-readable form)...

http://www.mint.com/blog/finance-core/the-descent-into-credit-card-debt/

... and on how to use your iPhone...

http://www.reddit.com/r/WTF/comments/8w094/att_charges_adam_savage_of_mythbusters_11k/c0alsiz

"A friend of mine used to work for AT&T customer service. He had a call one day from a guy working for a very small company. A company size of five people, in fact. All of them decided to get iPhones and get a shared plan. Then they all decided to go out of the country...apparently for over a month.

Well, they spent a little less than a month over seas when one of them called my friend to ask some innocuous support question. They had not yet seen their bill of over $300,000. My friend did not say a word to him, but hung up and laughed his ass off.

Originally I thought this story was unbelievable, and I doubted it. I found it was possible after some rough calculations, but still it was kind of an extreme case.

Now, I see more and more of these stories popping up. Five people on the same plan, out of country, and all using their iPhones extensively...I'm starting to believe."

Wednesday, June 24, 2009

ActiveDirectory and disk imaging: not a happy combination

When I first heard about Hyper-V snapshotting I was extremely excited. This feature allows one to freeze an image of the virtual machine's hard drive (take a snapshot), and revert back to it at any point in time.

Moreover, it supports a snapshot tree: you can install the OS, take a snapshot, install one application, take a snapshot, revert back to the original image, install another application, and, again, take a snapshot. As a result, you now have three images which you can boot at any point in time (although not simultaneously): clean OS, app1 install, and independent - and clean - app2 install.

If you ever had to test your software in multiple environments, this is an absolute Holy Grail.

So I did it and it worked - for a while.

Unfortunately, one of the security features of the NT domain is that machine accounts periodically (once a month) change their passwords. This is driven by the client, not the AD server (as described here: http://blogs.technet.com/askds/archive/2009/02/13/machine-account-password-process.aspx - which is a good introduction to how machine passwords work), and can - in theory - be turned off. But it's on by default, and is probably on as a security policy at most actual corporate installations.

So after a month, the currently running version of the VM changes its password, which renders all the other snapshots useless: they have the old password. If you boot any of the snapshots, your VM can no longer connect to the domain. If you disconnect and rejoin, it gets a new machine SID and a new password, which means that the password (and SID) of the version of the VM that was running previously - and of all the other snapshots, for that matter - is now bad.

All this means that after one month, the snapshot tree that you've just invested so much time building becomes completely useless.

The problem is not limited to Hyper-V per se. It manifests in every imaging solution - Vista/Server 2008 backup, Norton Ghost, etc. The only way to fix it - if your domain policy allows it - is to disable password change. Which brings us back to the link I mentioned earlier.

http://blogs.technet.com/askds/archive/2009/02/13/machine-account-password-process.aspx
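The failure mode is easy to see in a toy model. The sketch below is deliberately simplified and is not how domain authentication really works: it assumes the directory stores exactly one password per machine, and the Domain and VM classes are invented for illustration only.

```python
import copy


class Domain:
    """Toy stand-in for the directory: one stored password per machine."""

    def __init__(self):
        self.passwords = {}

    def authenticate(self, name, password):
        return self.passwords.get(name) == password


class VM:
    def __init__(self, name, domain):
        self.name = name
        self.generation = 0
        self.password = "pw-0"
        domain.passwords[name] = self.password

    def rotate_password(self, domain):
        """The client picks a new password and registers it with the domain."""
        self.generation += 1
        self.password = "pw-%d" % self.generation
        domain.passwords[self.name] = self.password


if __name__ == "__main__":
    domain = Domain()
    vm = VM("testbox", domain)
    snapshot = copy.deepcopy(vm)   # freeze the disk image
    vm.rotate_password(domain)     # a month later, the running VM rotates
    print(domain.authenticate(vm.name, vm.password))              # running VM
    print(domain.authenticate(snapshot.name, snapshot.password))  # old snapshot
```

The running VM and the directory move forward together, while every frozen image keeps the secret it had at snapshot time.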

The difference between United States and Soviet Union...

...might be that in the US most people wholeheartedly believe the propaganda, whereas in the SU most people didn't. As far as foreign policy goes, the actions of the two countries seem to be about the same.

http://digbysblog.blogspot.com/2007/04/truths-consequences-by-digby-since.html

There is a duality of 1984 and Brave New World. In 1984 the government rules because everybody is afraid. In Brave New World it rules because nobody cares. But the net result is still the same.

Monday, June 22, 2009

Open source in China

This place: http://ostatic.com/blog/actuate-survey-open-source-booming-in-china-germany-and-other-regions says that 80% of Chinese use Open Source software.

Yet this impromptu Google survey says that only 8% of people around Times Square know what a browser is: http://1-800-magic.blogspot.com/2009/06/oh-customers.html.

Doesn't compute, does it? People who don't know what a browser is would know about Open Source and be able to distinguish it from other forms of software licenses?

I think I know what's going on here: 80% of Chinese users think that Windows and Office are open source products because they got pirated copies of them for free :-)!

We're going to be proud of our OS again!


This is Win7 running on a VM with Visual Studio and an instance of SQL server. The CPU spikes are compiles. Notice 0% CPU and only 1.32GB of RAM.

Sunday, June 21, 2009

Cost of crapware in battery life

I watched a presentation about power last week. Among other interesting numbers in it - a clean install of Vista has less than 1% of CPU utilization on idle; an image from an OEM (including a bunch of 3rd party software) had ~7%. An increment of 10% CPU utilization leads to 8% less battery life...

Friday, June 19, 2009

A picture is worth 1000 words...

But 1000 words is roughly 5K, and a moderately-sized picture is at least 50K. Plus - unlike pictures - the text is searchable. Go figure...

Thursday, June 18, 2009

Talking to a wall... (via Reddit)

In Jerusalem, a female journalist heard about an old Jew who had been going to the Western Wall to pray, twice a day, everyday, for a long, long time. So she went to check it out. She goes to the Western Wall and there he is! She watches him pray and after about 45 minutes, when he turns to leave, she approaches him for an interview.
"I'm Rebecca Smith from CNN. Sir, how long have you been coming to the Western Wall and praying?"
"For about 50 years."
"50 years! That's amazing! What do you pray for?"
"I pray for peace between the Jews and the Arabs. I pray for all the hatred to stop and I pray for our children to grow up in safety and friendship."
"How do you feel after doing this for 50 years?"
"Like I'm talking to a fuckin' wall."

http://www.reddit.com/r/atheism/comments/8tkf8/in_jerusalem_a_female_journalist_heard_about_an/

Monday, June 15, 2009

I kept maybe five textbooks from college. One of them was Ordinary Differential Equations (via Reddit)

http://www.reddit.com/r/AskReddit/comments/8sios/who_was_the_best_teacher_youve_ever_had_what_made/c0aamlv

I hate reposting, but this is too good to leave it up to Reddit's comments retention policy. Reproduced for posterity. But if you like it, do go to the link above and upvote...

By kleinbl00:
"The guy on the left.

He was a graduate from the University of Zagreb or something and he had an awesome accent. And he was beanpole tall and twitched around. He was like a cross between Cosmo Kramer and The Count from Sesame Street.

He was incredibly passionate about what he tought. He would bang on the chalkboard, breaking the chalk, and say "DoyounderstandTHIS? DoyouGETthis?" and look at us all intense. And then he'd roll on with what he was going.

The dude fucking loved math. He was teaching ordinary diff EQ and you'd think he was Beethoven explaining crescendos. And he really didn't give a shit about homework. He'd sit there and drill us through stuff as if our life depended on us understanding. I saw that guy tear up a couple times. More than once, the professor next door would step in and ask him to keep it down. He spent maybe an hour explaining Euler's identity - and I shit you not, he got us to tear up, too.

We had one homework assignment. It was given about halfway through the class. We had a week to do it. And it took that entire week, in groups of two or three, five to six hours a day to do it. And we handed it in, and he didn't even grade it for like three more weeks.

When we got it back, there was a pallor over the class (what was left of it - a third of the class had dropped out). I had a 23%. I'm sure I turned gray. I went to see him - what the hell could I do? I mean, I needed to pass -

"DoNotWorryaboudit. AveragewaseighTEEEN. Yooodooverygooood."

When it came down to the final, it was one, simple, benign sheet of paper. It had one problem on it. There were absolutely no numbers on it other than (1). It started with "imagine a function..."

We had two hours. At 1:15, nobody had handed in a thing. I was just sitting there stunned, grinding through the first half. At 1:30, nobody had handed in a thing. At 1:45, he said

"Eeef...youtakezeetest choam wityou and feenishit, I veel passyou."

Nobody got up. We sat there and cranked through the fucking test. The survivors, anyway. We started with 30 people in the class. We finished with twelve.


--------------------------------------------------------------------------------

I kept maybe five textbooks from college. One of them was Ordinary Differential Equations. It was an expensive book - too expensive for me to afford. The first time I went to see him, I apologized for not having the book.

He gave me his."

Barack Hoover Obama

A very interesting article in this month's Harper's (http://www.harpers.org/archive/2009/07/0082562?redirect=1022411470, subscription required) contrasts Hoover - a deliberative, progressive, well-meaning technocrat - with Barack Obama.

The article makes the case that, although Obama is often compared with FDR, by trying to take the "middle road" and avoiding open warfare with his detractors, he is actually emulating Herbert Hoover, and that this approach is doomed to failure.

"Franklin Roosevelt also took office imagining that he could bring all classes of Americans together in some big, mushy, cooperative scheme. Quickly disabused of this notion, he threw himself into the bumptious give-and-take of practical politics; lying, deceiving, manipulating, arraying one group after another on his side—a transit encapsulated by how, at the end of his first term, his outraged opponents were calling him a “traitor to his class” and he was gleefully inveighing against “economic royalists” and announcing, “They are unanimous in their hatred for me—and I welcome their hatred.”

Obama should not deceive himself into thinking that such interest-group politics can be banished any more than can the cycles of Wall Street. It is not too late for him to change direction and seize the radical moment at hand. But for the moment, just like another very good man, Barack Obama is moving prudently, carefully, reasonably toward disaster."

Sunday, June 14, 2009

More TFS blues - adding a user

The more I use this system, the more I get a suspicion that it was not designed by developers. Or maybe we found a totally clean room bunch of developers that have never used a source control system before? (Would that be... PMs?)

I wrote previously (http://1-800-magic.blogspot.com/2009/03/adventures-in-tfs-continued.html) about the incredible amount of pain TFS is to install, and put together a simple step-by-step guide on how to install it here: http://1-800-magic.blogspot.com/2009/03/how-to-install-tfs-on-single-domain.html. (Incidentally, this is now one of the most popular articles on the blog.)

Yesterday I spent about an hour trying to figure out how to allow my daughter to use the TFS server that I've set up for our home projects.

In Perforce (Microsoft uses a derivative of Perforce for quite a few of its internal projects) everything is simple: you type "p4 protect" (or "sd protect" at Microsoft), it opens a file in Notepad, and you add a line that looks like this:
    write user sergey * //depot/...
And you are done.

Here's what you have to do in TFS:

(1) Add the user to the list of "licensed users". This list is not displayed by default when you navigate to the project's security settings; you have to click on a checkbox to make it display all groups.

I missed this step, and it was not mentioned on the TFS documentation page that deals with setting permissions (http://msdn.microsoft.com/en-us/library/bb558971.aspx). I performed all the magic incantations from that page, but TFS still would not connect.

And of course the error that it was showing listed security among three other options, but gave no suggestions of what might go wrong.

Eventually, a Google search on the error code led me to the words "licensed user", and then here: http://msdn.microsoft.com/en-us/library/ms404880.aspx.

(2) You have to add the user to the Contributor group in your project using TFS Explorer.

This is described here: http://msdn.microsoft.com/en-us/library/bb558971.aspx.

(3) Separately, you have to grant the user access to the SharePoint site.

Also here: http://msdn.microsoft.com/en-us/library/bb558971.aspx.

(4) Separately, you have to grant the user access to the reporting portal.

And again, here: http://msdn.microsoft.com/en-us/library/bb558971.aspx.

I understand cutting features to make the deadlines, but c'mon, ladies and gentlemen of TFS, does this thing really have to be such a pain in the butt for the administrator? Especially given that we're competing with Perforce, where deployment is done with a single double-click?

Saturday, June 13, 2009

What is your number of nines?

Ran into an interesting page today - a list of scheduled down times for Blogger: http://status.blogger.com/.

It looks like Blogger is down for roughly 10 minutes once a month (in addition to a Picasa downtime that impairs its ability to accept images).

10 minutes a month does not look like much, but it does amount to about 2 hours of downtime per year. Is two hours a year good or bad?

The system's availability is defined as the ratio of uptime to the total time:

$$\text{Availability} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$$
where MTTF is the mean time to failure (how long the system stays up between outages, on average), and MTTR is the mean time to repair, i.e. the mean time it takes to bring the system back online. The "failure" here should be understood as a measure of the system's ability to process requests rather than as a fault: a scheduled downtime is not a bug, but the system is not available nevertheless.

2 hours of downtime in a year yield an availability of 365.25 * 24 / (2 + 365.25 * 24) ≈ 99.98%, or "3 nines", which puts Blogger in the category of "Well-managed" systems.
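For readers who want to plug in their own numbers, the calculation can be sketched in a few lines of Python. The function names are mine, and counting "nines" as the whole part of -log10(unavailability) is an assumption about how the classes are delimited.

```python
import math


def availability(uptime_hours, downtime_hours):
    """Fraction of total time the system is able to serve requests."""
    return uptime_hours / (uptime_hours + downtime_hours)


def nines(avail):
    """Number of leading nines, e.g. 0.9995 -> 3 (99.95% is 'three nines')."""
    unavailability = 1.0 - avail
    # Small tolerance so that exactly 99.9% counts as 3 nines despite
    # floating-point rounding.
    return int(math.floor(-math.log10(unavailability) + 1e-9))


if __name__ == "__main__":
    hours_per_year = 365.25 * 24
    a = availability(hours_per_year, 2)  # about 2 hours of downtime a year
    print("%.3f%% -> %d nines" % (a * 100, nines(a)))
```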

Here are the definitions of various levels of availability given in Jim Gray's famous book on transaction processing (http://www.amazon.com/Transaction-Processing-Concepts-Techniques-Management/dp/1558601902):


System type              Unavailability (min/year)   Availability   Class
Unmanaged                52560                       90%            1
Managed                  5256                        99%            2
Well-managed             526                         99.9%          3
Fault-tolerant           53                          99.99%         4
High-availability        5                           99.999%        5
Very-high-availability   0.5                         99.9999%       6
Ultra-availability       0.05                        99.99999%      7
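The unavailability column follows directly from the availability class: each extra nine divides the allowed downtime by ten. A short sketch, assuming a 365-day year (525,600 minutes), which is what the numbers in the table appear to be based on:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a 365-day year


def downtime_minutes_per_year(n_nines):
    """Allowed downtime per year for an availability of n nines."""
    return MINUTES_PER_YEAR * 10.0 ** (-n_nines)


if __name__ == "__main__":
    for n in range(1, 8):
        print("class %d: %.2f min/year" % (n, downtime_minutes_per_year(n)))
```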


As Blogger's example shows, it's fairly hard to create a fault-tolerant (or above) system - you have to account for things that range from OS and software patching to the maintenance of the power equipment in the data centers.

One might think that hardware failures and software bugs cause most of the availability problems, but it is actually scheduled maintenance that creates the majority of the work, because it causes a lot of downtime. Once you have figured out how to deal with maintenance, the unavailability due to bugs is probably already taken care of by the same measures.

And at a server MTTF of roughly 14 years, one should only start worrying about hardware failures (assuming that a failure can be detected and the job reallocated within one hour) when availability starts approaching 5 nines.
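The hardware claim can be checked with the same availability formula. This is a back-of-the-envelope sketch: the 14-year MTTF and one-hour recovery time come from the text, while the 365.25-day year and the function name are my assumptions.

```python
HOURS_PER_YEAR = 365.25 * 24


def hardware_availability(mttf_years, mttr_hours):
    """Availability = MTTF / (MTTF + MTTR), with both converted to hours."""
    mttf_hours = mttf_years * HOURS_PER_YEAR
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    a = hardware_availability(14, 1)
    print("availability: %.5f%%" % (a * 100))
    print("expected downtime: %.1f min/year" % ((1 - a) * HOURS_PER_YEAR * 60))
```

Under these assumptions, hardware failures alone cost a little over 4 minutes a year, which is roughly the entire downtime budget of a five-nines system.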

How many nines does your system have?

Friday, June 12, 2009

Navigating the Dell price labyrinth

Last week I bought a couple of big (8-core, 32GB, fast disk) boxes from Dell because they closely match the hardware that we're going to be running on in our data centers. For the speed and quality of the hardware the price ended up being a very reasonable $4k/box (when bought with the Microsoft discount).

While doing it, I discovered a curious thing - if you configure the box with 32GB of RAM upfront, the memory comes out quite costly. If you instead buy the server with the default 8GB, and buy 32GB of RAM separately (from the peripherals section of the same Dell web site, as the Dell-recommended RAM upgrade for this very workstation), the total cost is over $800 less (and you end up with the 8GB of unused RAM that originally came with the box).

If you work at Microsoft, use this trick and watch our stock price go up 50 cents!

If, furthermore, you buy disks separately, you save another $150 or more over the price of pre-installed hard drives.

These are Microsoft-internal prices, which - for obvious reasons - I can not quote, but the problem is even bigger on the external Dell web site, because everything is even more expensive there.

Here, for example, is the price for memory - preconfigured - for Precision T7400.


You can see that the price for 32GB is $2960, and for 64GB it is a whopping $17870 (!).

Alternatively, you can buy the same RAM on Newegg. For 32GB you would buy four 8GB kits at $165-$240 each, for a total of $660 to $960:


Or, for 64GB you would buy 8 of these, at $420 each - $3360 - about $14,500 cheaper than on Dell's web site!


It goes beyond RAM.

Dell wants $550 for a 1TB hard drive (although they give their small business buyers a break - a 1TB drive for the same T7400 there is "only" $430).


The prices on Newegg for a 1TB hard drive range from $110-$150 for a retail box to $74-$90 for OEM packaging.

Moral - if you are buying Dell computers, getting the parts on the side will save you a bundle. It is much cheaper to buy the minimal configuration, throw away the memory and the hard drive it comes with, and buy the replacement RAM and disks from Newegg (or anywhere else).

Note that the same is not true with CPUs - Dell CPUs are ~$200 more expensive than the same parts on Newegg, BUT you have to have a non-standard Dell heatsink, which - when bought separately - is very pricey. Plus replacing CPUs is not as trivial as RAM and disks.

Another interesting observation is that prices on Dell's home/small business site are often - usually - considerably lower than on its corporate web site. Most likely Dell uses this sales tactic to give its corporate users a "discount". Recently I bought a laptop through the Microsoft EPP program, only to discover that the 7% "discount" that Dell provides simply matches the price that is available to everyone on its small business site.

Finally, for peripherals - docking stations and the like - it pays to check eBay. A $199 (plus shipping, handling, and tax) advanced port replicator for Latitude can be easily had there - new - for $129 (reasonable shipping, and no tax).

Thursday, June 11, 2009

Overlapped I/O in Windows

One of the current puzzles that our team is dealing with is database performance. As part of our platform we're building a performance counter collection system, most of which lives in a datawarehouse-like structure in SQL server.

The specific problem we're facing is a lack of parallelism on data loads. Our usage does not quite fit standard database models, of which two are typical - OLAP (on-line analytical processing - essentially read-only database that is updated, say, nightly, contains tons of data, and is queried frequently using specific set of queries for which it was designed) and OLTP (on-line transaction processing - where the "hot" subset of the data is smaller, but is read and updated very frequently).

Our database is the worst of both worlds - it writes about 1 million rows a minute, and the reads are rather infrequent. (An argument can be made that SQL Server is not the right technology for this, but that discussion is outside the scope of this post.)

There are two basic problems.

First is the lack of parallelism in bulk transfers on the SQL server - on an 8-core, 32GB machine with 3 10k RPM data drives (which is what $4k currently buys at Dell) the load (and index update) part is completely single-threaded. Which means that one core runs at 100% CPU, and the rest are doing nothing.

I wrote a very simple program that does all the data transforms in memory, and this part is completely parallel - the server runs at 60-70% CPU utilization - and is very fast. Unfortunately, the very last step, the insert into SQL - Amdahl's law! - now controls the overall performance.
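As a reminder of what Amdahl's law predicts here, the sketch below computes the overall speedup when only part of a job can be spread across cores. The 60/40 split between parallel transforms and the single-threaded insert is a made-up number for illustration, not a measurement from our system.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Overall speedup when only part of the work can use multiple cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical split: 60% of the run is parallel transforms and
    # 40% is the single-threaded insert. These numbers are made up.
    for cores in (1, 2, 4, 8):
        print(cores, round(amdahl_speedup(0.6, cores), 2))
```

With a split like that, no number of cores can make the whole job more than 2.5 times faster, because the serial insert alone takes 40% of the original run time.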

The second - bigger - problem is that eventually the index no longer fits in the RAM, and due to the nature of the data it seems to be rewritten almost entirely on every upload. If I restrict SQL memory to 6GB, this covers roughly 2 hours of input, the disk starts thrashing really badly, and the load times go to hell - what initially would take only 30 seconds becomes 3 minutes.

One potential solution is to partition the database, but once we do it, it starts to negatively affect the performance of our queries.

The data flow through the system is really quite small relative to the power of the hardware. The rows we're writing are only a few dozen bytes each, and the aggregate data flow is less than 1 MBps. But because the entire index is being rewritten all the time, the box keeps writing at the rate of 20-30 MBps for several minutes.
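A quick calculation shows how large the gap is between the data coming in and the data hitting the disk. The row rate and the disk write rates are from the text; the 48 bytes per row is an assumed stand-in for "a few dozen bytes".

```python
def payload_rate_mb_per_s(rows_per_minute, bytes_per_row):
    """Raw data rate of incoming rows, in megabytes per second."""
    return rows_per_minute * bytes_per_row / 60.0 / 1e6


if __name__ == "__main__":
    # 1 million rows a minute is from the text; 48 bytes per row is an
    # assumed stand-in for "a few dozen bytes".
    payload = payload_rate_mb_per_s(1000000, 48)
    print("payload: %.2f MBps" % payload)
    for disk_rate in (20, 30):  # observed disk write rates, in MBps
        print("%d MBps on disk is about %.0fx the payload"
              % (disk_rate, disk_rate / payload))
```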

This seemed kinda slow (although speeding it up is of course not going to be part of the solution - we need to figure out a more global approach), so I wanted to check what the hardware is capable of in terms of disk throughput. I quickly typed up a piece of code for that, and it also makes a reasonable tutorial on how to use overlapped I/O on Windows. Hence this article.

The essential steps are as follows.

First, open the file using FILE_FLAG_OVERLAPPED | FILE_FLAG_NO_BUFFERING:

HANDLE hFile = CreateFileW(szFileName, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_FLAG_OVERLAPPED | FILE_FLAG_NO_BUFFERING, NULL);

Second, you want to use DMA for the data transfer, to avoid copying, and for that the data buffer should be aligned on a sector boundary. The easiest way is to VirtualAlloc it, since this will force the buffer to be contiguous and aligned on 64k:

void *buffer = VirtualAlloc(NULL, bufferSize, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

Use WriteFileEx to schedule your I/O. The hEvent part of the overlapped is not used, so a developer can use it to pass context to the completion routine. The API is quite weird this way - you'd expect that hEvent would take an event to signal on I/O completion - but that's not how it works.

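// hEvent is ignored by WriteFileEx, so use it as a "write scheduled" marker.
overlapped.hEvent = (HANDLE)1;
// The 64-bit file offset is split into two 32-bit halves.
overlapped.Offset = (DWORD)offset;
overlapped.OffsetHigh = (DWORD)(offset >> 32);
offset += bufferSize;
WriteFileEx(hFile, buffer, bufferSize, &overlapped, WriteFinished);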


where WriteFinished gets called when the write is done:

static void CALLBACK WriteFinished(DWORD dwErrorCode,
    DWORD dwNumberOfBytesTransfered,
    LPOVERLAPPED lpOverlapped)
{
    lpOverlapped->hEvent = (HANDLE)2;
}


Finally, once the write is scheduled, the thread must be put into an alertable sleep or wait - use the SleepEx/WaitForSingleObjectEx/WaitForMultipleObjectsEx functions, which take the alertable flag:

SleepEx(INFINITE, TRUE);

or

DWORD dwRes = WaitForSingleObjectEx(hSomeEvent, INFINITE, TRUE);
if (dwRes == WAIT_IO_COMPLETION)
{
    // write is done
    ...
}


Obviously, you will be scheduling multiple I/Os - using an array of OVERLAPPED structures and hEvent fields inside them to keep track of what has finished and what has not is handy.

Below is the full text.

Oh, yes, and that server does ~110 MBps writes on a 10K RPM disk using the code below (and ~80 MBps writes on 7200 RPM disk), with practically zero CPU utilization.

//-----------------------------------------------------------------------
// <copyright>
// Copyright (C) Sergey Solyanik. All rights reserved.
//
// This software is in public domain and is "free as in beer". It can be
// redistributed in full or in parts for free and without any preconditions.
// </copyright>
//-----------------------------------------------------------------------
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_OUTSTANDING_WRITES 64

// Per-slot state, stored in the otherwise unused OVERLAPPED::hEvent field.
// A zeroed OVERLAPPED (state 0) means the slot is free.
enum WriteProgress
{
    WriteScheduled = 1,
    WriteSucceeded = 2,
    WriteError = 3
};

// Completion routine: runs on the issuing thread during an alertable wait.
static void CALLBACK WriteFinished(DWORD dwErrorCode,
    DWORD dwNumberOfBytesTransfered,
    LPOVERLAPPED lpOverlapped)
{
    if (dwErrorCode == 0)
    {
        lpOverlapped->hEvent = (HANDLE)(INT_PTR)WriteSucceeded;
        return;
    }

    wprintf(L"Error: %lu\n", dwErrorCode);
    lpOverlapped->hEvent = (HANDLE)(INT_PTR)WriteError;
}

int wmain(int argc, WCHAR* argv[])
{
    if (argc != 5)
    {
        wprintf(L"Usage: writedata filename total_size chunk_size "
            L"number_of_writes\n");
        wprintf(L"Note: total_size is in megabytes\n");
        wprintf(L"      chunk_size is in bytes and must be a power of 2 "
            L"greater than 2048\n");
        wprintf(L"      number_of_writes is the number of writes "
            L"that are scheduled simultaneously\n");
        return 1;
    }

    if (GetFileAttributesW(argv[1]) != INVALID_FILE_ATTRIBUTES)
    {
        wprintf(L"%s already exists.\n", argv[1]);
        return 2;
    }

    int size = _wtoi(argv[2]);
    if (size <= 0)
    {
        wprintf(L"Size must be a positive number.\n");
        return 3;
    }

    LARGE_INTEGER bytes;
    bytes.QuadPart = (__int64)size * 1024L * 1024L;

    int bufferSize = _wtoi(argv[3]);
    if (bufferSize <= 0)
    {
        wprintf(L"Buffer size should be a positive number.\n");
        return 4;
    }

    if (bufferSize & (bufferSize - 1))
    {
        wprintf(L"Buffer size must be power of 2\n");
        return 4;
    }

    if (bufferSize < 4096)
    {
        wprintf(L"Buffer size is too small\n");
        return 4;
    }

    int simwrites = _wtoi(argv[4]);
    if (simwrites <= 0)
    {
        wprintf(L"Number of simultaneous writes should be a "
            L"positive number.\n");
        return 5;
    }

    if (simwrites > MAX_OUTSTANDING_WRITES)
    {
        wprintf(L"Number of simultaneous writes is too large.\n");
        return 5;
    }

    HANDLE hFile = CreateFileW(argv[1], GENERIC_WRITE, 0, NULL,
        CREATE_ALWAYS, FILE_FLAG_OVERLAPPED | FILE_FLAG_NO_BUFFERING,
        NULL);
    if (hFile == INVALID_HANDLE_VALUE)
    {
        wprintf(L"Could not create %s, error %lu\n", argv[1],
            GetLastError());
        return 6;
    }

    // Pre-size the file so that the writes do not have to extend it.
    SetFilePointerEx(hFile, bytes, NULL, FILE_BEGIN);
    SetEndOfFile(hFile);

    OVERLAPPED overlappeds[MAX_OUTSTANDING_WRITES];
    memset(overlappeds, 0, sizeof(overlappeds));

    void *buffers[MAX_OUTSTANDING_WRITES];
    for (int i = 0 ; i < simwrites ; ++i)
    {
        buffers[i] = VirtualAlloc(NULL, bufferSize,
            MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        memset(buffers[i], i, bufferSize);
    }

    DWORD tick = GetTickCount();

    __int64 totalScheduled = 0;
    __int64 totalWritten = 0;
    __int64 nWrites = bytes.QuadPart / bufferSize;
    for (;;)
    {
        int nScheduled = 0;
        for (int i = 0; i < simwrites; ++i)
        {
            // Still in flight: leave the slot alone.
            if ((INT_PTR)overlappeds[i].hEvent == WriteScheduled)
            {
                ++nScheduled;
                continue;
            }

            // Completed: account for it and free the slot.
            if ((INT_PTR)overlappeds[i].hEvent == WriteSucceeded)
            {
                totalWritten += bufferSize;
                wprintf(L"\r%I64d", totalWritten);
                memset(&overlappeds[i], 0, sizeof(OVERLAPPED));
            }

            if ((INT_PTR)overlappeds[i].hEvent == WriteError)
            {
                CancelIo(hFile);
                goto finished;
            }

            // Free slot: schedule the next chunk, if any are left.
            if (nWrites > 0)
            {
                overlappeds[i].hEvent = (HANDLE)(INT_PTR)WriteScheduled;
                overlappeds[i].Offset = (DWORD)totalScheduled;
                overlappeds[i].OffsetHigh =
                    (DWORD)(totalScheduled >> 32);
                totalScheduled += bufferSize;
                --nWrites;
                ++nScheduled;

                if (!WriteFileEx(hFile, buffers[i], bufferSize,
                    &overlappeds[i], WriteFinished))
                {
                    DWORD dwErr = GetLastError();
                    wprintf(L"Write error %lu\n", dwErr);

                    CancelIo(hFile);
                    goto finished;
                }
            }
        }

        if (nScheduled == 0)
            break;

        // Alertable sleep: completion routines run here.
        SleepEx(INFINITE, TRUE);
    }

finished:
    for (int i = 0 ; i < simwrites; ++i)
        VirtualFree(buffers[i], 0, MEM_RELEASE);

    CloseHandle(hFile);

    int seconds = (GetTickCount() - tick) / 1000;
    if (seconds <= 0)
        seconds = 1;

    wprintf (L" in %d seconds (%d MBps)\n", seconds,
        size / seconds);

    return 0;
}

Monday, June 8, 2009

Friday, June 5, 2009

Coding for Kim Jong-il

You are on a plane from Seattle to Beijing when your 747 makes an emergency landing at a military landing strip in North Korea. Korean security forces quickly dispose of the crew and the passengers (the "Dear Leader" (http://en.wikipedia.org/wiki/Kim_Jong-il) likes the plane and wants to keep it for himself by faking a crash), but you are spared because they learn that you are a software developer working at one of the best software companies in the world.

As you learn shortly from a personal audience with the "Dear Leader" himself, Kim Jong-il - protected by a nuclear shield - is now ready to branch into software development. His goal is to build North Korea into a technology powerhouse - with a market share as strong as that of Microsoft and Google, and with employees as tightly controlled as Apple's.

So you are given a choice of potential technologies, one of which you will need to build in the span of one year. At the end of the year, your work will be evaluated solely on technical success (i.e. you are not expected to win a competition with existing products, but you are expected to build something that is competitive purely from the technical point of view).

If your code passes this evaluation, you live and potentially even reap some undisclosed benefits (the "Dear Leader" is vague on this point). If it does not, a standard process for "people who know too much" is applied to you. If you decline to cooperate, the aforementioned process is applied to you right now.

If you do agree to work on the project, you are given all possible technical means to achieve your task - an unlimited number of computers (clients and servers), dedicated fiber to the Internet, etc. You are allowed to read research papers, but not source code - including, but not limited to, any open source project. Your code must be a 100% clean-room implementation!

No pre-existing software infrastructure is available, other than a C++/Java/C# compiler, a text editor of your choice, and your choice of a version of Windows or Linux as an operating system.

As a first step, you get to choose the project you will be working on. The options are:
(1) A C++ compiler
(2) A general-purpose database engine
(3) A general-purpose Internet search engine
(4) An operating system

Which project would you choose, and why?

Thursday, June 4, 2009

Today's installer WTF

We bought my parents-in-law a Zune as a present (MSFT people get a discount at the company store). They have a pretty slow DSL connection, so downloading the 131MB Zune software took almost 30 minutes.

But guess what happened next? When the setup started, it started downloading again - for another half an hour.


This begs two questions:
(1) What was in the original 131MB file - just a setup program? In 131MB???
(2) How big can a media player possibly be? Media Player Classic is less than 2MB, and VLC can play absolutely every file known to humanity in 16MB. Yes, Zune software also has transcoders, but I've got a whole bunch of these, too - and they are just a few megs in other packages. Why do we need a good quarter of a gigabyte?

Monday, June 1, 2009

LDAP, .NET, and Annual Review

It's that time of the year again - the annual performance review started at Microsoft today.

Annual review is an important part of every Microsoftie's career - it defines not only one's compensation, but, more importantly, reputation and mobility. It's much easier to move around the company if the review history is good. The more fun the project is, the bigger competition it attracts, and the more discriminating the hiring managers are when it comes to review history.

Good reviews are required (but, of course, not sufficient) for a great career at Microsoft.

Our team practices peer performance reviews. This is one of the Googleisms that I brought back from my year there. The theory is that peers know more than a manager about the work done by the employee, and are harder to fool, so the majority of the review feedback and the ultimate review grade come from them.

(See more here: http://1-800-magic.blogspot.com/2007/12/life-by-committee-or-management-google.html, and if you work at Microsoft and want to know more, drop me an email - I have written a whitepaper on the peer review system and will be glad to share it with you.)

One of the artifacts of the peer review model is the amount of mail every manager has to send to request it - say, 5 employees times 6 reviewers per employee - that's 30 emails.

With tons of cut and paste, it took me over an hour of frantic typing last time around when we started using peer reviews during the mid-year career discussion cycle.

Then of course there is the fun of verifying that the right mail goes to the right person. Each one has to be sent individually, because you don't want to share everybody's reviewer list with everybody else, and you have to check that a mail intended for one reviewer didn't go to another by mistake, that all names in the email are correct, etc.

This time I was not about to sit and cut and paste for hours again - the problem called for a programmatic solution.

I could probably have put together a web site easily, but ensuring the security of a bunch of super-important personal information for a few dozen people is not exactly my idea of fun - and can't be reasonably done in under 30 minutes. The data had to stay in email.

But at least I could send the invitations automatically.

A tool was pretty quickly born that took an alias of the reviewee, and a list of aliases of the reviewers, formatted email from a template, and sent it out. In the process, I found out how to extract information from Active Directory using LDAP - I needed to get the names of the users from their aliases.

The job is surprisingly easy using the DirectorySearcher class of the System.DirectoryServices namespace (you do need to add a reference to the System.DirectoryServices assembly before using it).

Here's a code snippet that extracts and prints a bunch of interesting information about a user from their AD record.

Happy LDAP'ing!

//-----------------------------------------------------------------------
// <copyright>
// Copyright (C) Sergey Solyanik. All rights reserved.
//
// This software is in public domain and is "free as in beer". It can be
// redistributed in full or in parts for free and without any preconditions.
// </copyright>
//-----------------------------------------------------------------------
namespace ldapquery
{
using System;
using System.Collections.Generic;
using System.DirectoryServices;
using System.Linq;
using System.Text;

/// <summary>
/// Sample for LDAP usage in C#. Looks up each alias, and then
/// prints the properties found in its directory record.
/// </summary>
class Program
{
/// <summary>
/// Prints out various interesting user properties from ActiveDirectory
/// </summary>
/// <param name="args"></param>
static void Main(string[] args)
{
if (args.Length == 0)
{
Console.WriteLine("Usage: ldapquery alias1 alias2...");
return;
}

DirectorySearcher ds = new DirectorySearcher();

ds.PropertiesToLoad.Add("displayName");
ds.PropertiesToLoad.Add("givenName");
ds.PropertiesToLoad.Add("telephoneNumber");
ds.PropertiesToLoad.Add("mobile");
ds.PropertiesToLoad.Add("homephone");
ds.PropertiesToLoad.Add("mail");
ds.PropertiesToLoad.Add("title");
ds.PropertiesToLoad.Add("department");
ds.PropertiesToLoad.Add("manager");

foreach (string alias in args)
{
ds.Filter = "(SAMAccountName=" + alias + ")";
SearchResult result = ds.FindOne();
if (result == null)
{
Console.Error.WriteLine("Could not resolve {0}", alias);
continue;
}

Console.WriteLine("{0}:", alias);

if (result.Properties["displayname"].Count > 0)
{
Console.WriteLine("Display name: {0}",
result.Properties["displayname"][0].ToString());
}

if (result.Properties["givenname"].Count > 0)
{
Console.WriteLine("Given name: {0}",
result.Properties["givenname"][0].ToString());
}

if (result.Properties["telephonenumber"].Count > 0)
{
Console.WriteLine("Office Phone: {0}",
result.Properties["telephonenumber"][0].ToString());
}

if (result.Properties["mobile"].Count > 0)
{
Console.WriteLine("Mobile Phone: {0}",
result.Properties["mobile"][0].ToString());
}

if (result.Properties["homephone"].Count > 0)
{
Console.WriteLine("Home Phone: {0}",
result.Properties["homephone"][0].ToString());
}

if (result.Properties["mail"].Count > 0)
{
Console.WriteLine("Email: {0}",
result.Properties["mail"][0].ToString());
}

if (result.Properties["title"].Count > 0)
{
Console.WriteLine("Title: {0}",
result.Properties["title"][0].ToString());
}

if (result.Properties["department"].Count > 0)
{
Console.WriteLine("Department: {0}",
result.Properties["department"][0].ToString());
}

if (result.Properties["manager"].Count > 0)
{
Console.WriteLine("Manager: {0}",
result.Properties["manager"][0].ToString());
}

Console.WriteLine();
}
}
}
}

SSL certs @ GoDaddy are ~$13 a year...

...up to 10 years.

I finally bit the bullet and bought a real cert for my server at home... And VPN to my home network now works from Windows 7. Yay!