February 27, 2005

Choosing at random 2

I mentioned earlier that there's a relatively well-known algorithm on the net for generating solar systems from scratch. The core of this algorithm, Accrete, simulates the depletion of a dust disk as proto-planets sweep up material in orbit around a star. The program begins with a dust disk starting at distance A from the star and ending at distance B. Then a location between A and B is selected at random and the amount of dust that would accrete onto a proto-planet orbiting there is calculated. Once this is done, the disk is divided into two dust bands, AC and DB.

Now a second proto-planet is created, sweeping up more dust and further subdividing the dust bands. This process continues until all the dust has been turned into planets.

At first glance, there's nothing wrong with this algorithm. It generates a solar system in negligible time and uses an algorithm sufficiently complex to generate interesting results. However, generating one solar system isn't enough. Like all simulations, this one has enough randomness in it that you need to run it thousands of times and compare the results if you want to find the interesting patterns. When you do this, you discover that it does not take a negligible amount of time to generate thousands of solar systems.

When I examined the standard implementation I discovered that a lot of time is wasted generating proto-planets in locations where the dust has already been swept. After subdividing AB into AC and DB, instead of picking a location in the remaining dust bands, the program picks locations between A and B until it finds one that isn't in a dust gap. By the time most of the planets are generated there are only a few tiny dust bands left, so most random insertions miss. I found that most runs need about 300 insertions to sweep all the dust, and some need 3000. The largest solar system I saw had 20 planets (i.e. 20 successful insertions).

This is a perfect example of a simulation written by a scientist (the astronomer S.H. Dole in this case, as well as various tweaks by, among others, Carl Sagan) that could have benefited a lot from the help of an experienced programmer. Keep in mind that this algorithm was published in 1969, and Sagan was working on it in 1977, when it cost him ~$1 of computer time per solar system (3 seconds).

I had just learned the trick I described in my last post, the one you use when you don't know the size of the set you're selecting from (the number of dust bands, in this case), so my first instinct was to try it out on this problem. I succeeded (how is an exercise left to the reader) and speeded up the generation of solar systems by a factor of 10 or so.

The smart thing to do, as I realized later, was to keep track of the total width of the remaining dust bands. The program was already calculating the bounds of the dust gaps being generated, so almost no extra overhead was introduced. This way, only one random number needed to be generated, instead of one for each dust band. The speed increase over my first approach was modest, but more importantly the code was simpler.


At March 01, 2005, Blogger Fraxas said...

This approach is called a Free List; it's a useful bin-packing heuristic in a lot of places. It gets used in garbage collectors a lot.

At March 01, 2005, Blogger JeremyHussell said...

Yes, I know, and I should have thought of it first, instead of trying out the temporarily novel and overly complex algorithm I tried first. I could have explained that by writing another two paragraphs and referring back to my earlier post on balancing simplicity and complexity, but I'm trying to learn succinctness.


Post a Comment

<< Home