February 26, 2005

Choosing at random

Ok, somehow I drifted into posts about rather abstract concepts. Today I'll give you something more concrete to chew on, and tomorrow an extension of it.

When you want to pick an item at random from a set, it helps to know the size of the set first, so that you can generate a random number in the correct range. For example, if you want to pick a line R at random from a file (perhaps for a random quote generator) and you know the number of lines in the file, you only have to read in the first R lines of the file. Often, though, you don't know how many lines there are in the file.

When that happens, you can use the following algorithm:
sum = 0
while there's another line L:
sum += 1
pick a random number R from [0, sum)
if R < 1 then P = L
When this is finished, all the lines in the file have been read in and P contains a line selected at random.

You can convince yourself this works by noting that P always gets set to the first line. Then, if there's a second line, P will get set to it half the time. If there's a third line, P will get set to it one third of the time, and otherwise will remain as the first or second line at 50/50 probability. In other words, you can prove that this works inductively, by noting that it works for number of lines N = 1, and then noting that if it works for N = K it will also work for N = K + 1, where K is an arbitrary number.

0 Comments:

Post a Comment

<< Home