This question has popped up on my horizon several times in the last few months, both as an interview question, in real work, and in code reviews.

The answer is remarkably simple. Take the last element of the array, generate a random index from 0 to the end of the array - inclusive, very important - and swap the element at that index with the last one. Then consider the subrarray [0...len-2]... etc.

The reason that the random number must include the index that is being swapped becomes obvious if you consider the case of two-element array. If the random number would not have included the whole interval, there would have been no swapping.

Expressed in code, it would go like this:

void knuth_shuffle(int *array, int len) {

srand(time());

for (int i = len - 1; i > 1; --i) {

int index = rand() % (i + 1); // 0..i

int tmp = array[index];

array[index] = array[i];

array[i] = tmp;

}

}

A fairly loose proof that it actually generates a good shuffle is inductive. In the case of len = 2, it's obvious. Assume that it holds for the case len, and let's prove that it holds for len + 1. The very last position, by the definition of the algorithm, has an equal probability of containing any element of the array. Then the problem is reduced to the case of len, which is the induction proposition.

A more strict proof is this:

The probability of an element originally at

indexato end up at indexlen - 1is

p(a => len-1) = 1/len

The probability of an element at indexato

end up at indexlen - 2is

p(a => len-2) = p(! a => len) p(a => len-2 on second step) =

(len-1)/len * 1/(len-1) = 1/len

... etc.

Simple, huh? But still requires attribution - the algorithm is called Knuth Shuffle, and was invented by Donald Knuth.

## 4 comments:

Good article, but misprint presents at the knuth_shuffle:

...

for (... ++i) // <= Should be --i

Thank you - fixed!

That's one Google interview question I forgot (even though it's a nice one) - maybe perhaps I was only asked it once during my 9-interviews (eventually failed) process at Google. The one I didn't forgot involve Bloom filters, which I was asked about twice - either I'm lucky, or it gets used ALOT in Google.

rand() % i doesn't output 0..i-1 with equal probabilities ... you need to use a different method.

Post a Comment