Wednesday, January 23, 2008

Random permutations

Given an array, how do you shuffle its contents so every element is equally likely to be in any place?

This question has popped up on my horizon several times in the last few months, both as an interview question, in real work, and in code reviews.

The answer is remarkably simple. Take the last element of the array, generate a random index from 0 to the end of the array - inclusive, very important - and swap the element at that index with the last one. Then consider the subrarray [0...len-2]... etc.

The reason that the random number must include the index that is being swapped becomes obvious if you consider the case of two-element array. If the random number would not have included the whole interval, there would have been no swapping.

Expressed in code, it would go like this:

void knuth_shuffle(int *array, int len) {

for (int i = len - 1; i > 1; --i) {
int index = rand() % (i + 1); // 0..i
int tmp = array[index];
array[index] = array[i];
array[i] = tmp;

A fairly loose proof that it actually generates a good shuffle is inductive. In the case of len = 2, it's obvious. Assume that it holds for the case len, and let's prove that it holds for len + 1. The very last position, by the definition of the algorithm, has an equal probability of containing any element of the array. Then the problem is reduced to the case of len, which is the induction proposition.

A more strict proof is this:

The probability of an element originally at
index a to end up at index len - 1 is
p(a => len-1) = 1/len
The probability of an element at index a to
end up at index len - 2 is
p(a => len-2) = p(! a => len) p(a => len-2 on second step) =
(len-1)/len * 1/(len-1) = 1/len

... etc.

Simple, huh? But still requires attribution - the algorithm is called Knuth Shuffle, and was invented by Donald Knuth.


Anonymous said...

Good article, but misprint presents at the knuth_shuffle:

for (... ++i) // <= Should be --i

Sergey Solyanik said...

Thank you - fixed!

Anonymous said...

That's one Google interview question I forgot (even though it's a nice one) - maybe perhaps I was only asked it once during my 9-interviews (eventually failed) process at Google. The one I didn't forgot involve Bloom filters, which I was asked about twice - either I'm lucky, or it gets used ALOT in Google.

loser2007 said...

rand() % i doesn't output 0..i-1 with equal probabilities ... you need to use a different method.