February 18, 2005

Continuous vs. Discrete

Almost all modern programming languages provide a function that returns a random real number between 0 (inclusive) and 1 (exclusive). You can write that more compactly as "a real number in the range [0, 1)". Most programmers have figured out how to convert to a random integer by doing something like: int(rand() * 100) + 1, which gets you an integer in the range [1, 100]. Unfortunately, this is often followed by a table like the following:
 1 -  10  Blue
11 -  30  Green
31 -  70  Yellow
71 - 100  Orange
in which Blue, Green, Yellow, and Orange are selected with 10%, 20%, 40%, and 30% probability respectively. An example I've run across more than once is a random star generated by picking a stellar type from OBAFGKM, then looking up statistics like size, temperature, mass, and luminosity from a table. This is a poor way to pick a random star: every star of a given type comes out with identical statistics, even though the underlying quantities vary continuously. Astronomy students and Sci-Fi RPGers take note!

Frequently the problem is that the programmer doesn't know offhand how to convert [0, 1) to a non-uniform distribution and instead constructs a lookup table. Knowing a few simple conversions to non-uniform distributions can save you a lot of typing, and make your program much more useful too.

One of the most commonly needed is the exponential distribution. The classic use for this is to generate an interarrival time (the time to the next arrival of a customer, or the time to the next radioactive decay, for example). The expression -a*ln(1 - x), where x is a uniform random number from [0, 1), will give you an exponentially distributed number in the range [0, ∞) with average value a. The average star in a globular cluster has a mass 3.333 times that of the Sun, following an exponential distribution, before aging eliminates the large stars. From the mass you can calculate all the other characteristics of a main-sequence star.
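As a minimal sketch of the formula above, here it is in Python. The function name `exponential` and its `mean` parameter are my own labels for illustration, and `random.random()` stands in for the generic rand() that returns a number in [0, 1).

```python
import math
import random

def exponential(mean):
    """Return an exponentially distributed number in [0, inf) with the given mean.

    Hypothetical helper: implements -a*ln(1 - x) with a = mean.
    """
    x = random.random()                 # uniform in [0, 1)
    # 1 - x lies in (0, 1], so the logarithm is always defined.
    return -mean * math.log(1.0 - x)

# One random stellar mass in solar masses, using the 3.333 average quoted above.
mass = exponential(3.333)
```

Using 1 - x rather than x keeps the argument of the logarithm away from zero. Python's standard library also offers random.expovariate(1 / mean), which should give the same distribution.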

The best known distribution is probably the normal distribution. Wikipedia has a short description of how to generate normally distributed numbers using the Box-Muller transform. Interestingly, you can also generate an approximately normally distributed number by adding together a large number of 0/1 coin tosses (or calls to rand); the more terms you add, the better the approximation. This explains why the normal distribution pops up so often: things like your height are the result of the sum of a large number of random factors (both genetic and environmental).
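Both approaches can be sketched in a few lines of Python. The function names and parameters below are hypothetical, chosen for illustration. The first function uses the basic (cosine) form of the Box-Muller transform. The second adds up n calls to the uniform generator, then shifts and scales the total using the fact that a uniform number on [0, 1) has mean 1/2 and variance 1/12.

```python
import math
import random

def normal_box_muller(mu=0.0, sigma=1.0):
    """Normally distributed number via the basic Box-Muller transform."""
    u1 = 1.0 - random.random()          # in (0, 1], keeps log(u1) finite
    u2 = random.random()                # in [0, 1)
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mu + sigma * z

def normal_by_summing(n=12):
    """Approximately standard normal number from a sum of n uniforms."""
    total = sum(random.random() for _ in range(n))
    # The sum has mean n/2 and variance n/12; rescale to mean 0, variance 1.
    return (total - n / 2.0) / math.sqrt(n / 12.0)
```

The summing version is only an approximation: with n = 12 it can never return a value beyond 6 standard deviations from the mean. For everyday use, random.gauss(mu, sigma) in the standard library is probably the simpler choice.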

Any time you find yourself asking the user to pick a number from a list, or picking from a set of radio buttons labelled "Low, Average, High", you might be making the same mistake. Internally, these buttons are probably converted to preset numbers, like "0.2, 0.5, 0.8". You should be allowing the user to pick any number between 0 and 1 in this case.
