March 31, 2005

Adding to the Compiler

Free book: On Lisp, by Paul Graham. For those of you who are annoyed by Graham's social commentary, this is purely technical. The title has a double meaning: it's about Lisp to some degree, but it's mostly about how to use macros to build on Lisp.

Macros find some particular pattern in your code and replace it with other (usually longer and less comprehensible) code at compile time. This can range from something as simple as changing the more readable max(a, b) into the less readable (a > b ? a : b), to building a whole set of functions with one command, to changing the syntax of the programming language. In effect, macros are user-defined extensions to the compiler. In theory, you could build a complete compiler out of macros that expand your code into simpler code, more macros that expand that into assembly language, and more to turn that into machine language.

Lisp is particularly well suited to macros because it has a very simple syntax, which makes it easy to parse, rearrange, and reassemble code. On the other hand, you lose the benefits of having a more complex syntax.

C and its closest relatives have a preprocessor with its own mini-language, including such commands as #define. Although these commands are powerful, you have to learn them and their pitfalls separately from the rest of the language.

Perl 5 doesn't have an easy way to make macros yet, although it can be done with Filter::Simple, and someone (inspired by On Lisp) has begun work on a Macro module. This is partly because Perl has a complex syntax, and partly because macros are hard to implement in a module if you aren't intimately familiar with the language and its compiler.
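To give you the flavour, here's the max example from above done as a source filter. This is just a sketch, not production code: InlineMax is a made-up module name, and the regexp won't handle nested parentheses.

package InlineMax;
use Filter::Simple;

# rewrite every max(A, B) in the caller's source into a ternary
# expression before Perl compiles it
FILTER {
    s/\bmax\(\s*([^,()]+)\s*,\s*([^()]+)\s*\)/(($1) > ($2) ? ($1) : ($2))/g;
};

1;

Save that as InlineMax.pm, and using it is just:

use InlineMax;
print max(3, 7), "\n";   # Perl actually compiles ((3) > (7) ? (3) : (7))

Note that this has exactly the pitfall of the C #define version: each argument appears twice in the expansion, so max($i++, $j) does the wrong thing.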

However, Perl 6 is going to have macros built right into the core language, so that macros are just special subroutines. (Down the page a bit.) From what I've read, this is going to be the least dangerous, most powerful implementation of macros since Lisp, which is a big achievement considering Perl's complex syntax. As an aside, Perl 6 is going to come with a built-in grammar (collection of regexps) that parses Perl 6. This is a subject so cool that I'll have to devote a post to it sometime soon.

Java, as far as I know, doesn't have macros.

March 24, 2005

Dream job?

The people at the Virtual Planetary Laboratory get paid to simulate extra-terrestrial planets, so that astronomers will have some idea of what spectra to expect when their instruments become sensitive enough to see planets in other solar systems.

There aren't very many details on the website, but their basic model looks sensible enough. Simulating a biotic world is probably taking things a bit too far, though. Until we have more data points (i.e. we've seen a few more worlds with life) the smart thing to do would be to concentrate on abiotic modeling, and search for spectra that deviate from dead worlds.

March 23, 2005

Off topic

I have a weakness for video game music. So when I noticed a reference to a web site that archives video game music (vgmusic.com), I couldn't resist.

While I'm on this binge, I'd like to point out this.

March 21, 2005

Brevity

I was inspired to start this blog by a few hackers who pointed out that knowing how to write well is valuable, even (or perhaps especially) for the traditionally uncommunicative programmer.

Paul Graham just published a short essay on writing, expressing the same sentiment but mostly giving his advice for actually writing better. His advice to cut everything unnecessary is spot on. I spend a lot of time removing extra adjectives and clauses from my sentences, particularly the word "probably", which I probably overuse. Brevity isn't just important in programming and writing, but in speech, and probably in all communication. It's a matter of efficiency.
I made this letter so long because I didn't
have the time to make it shorter.
- Blaise Pascal

Graham prefers to work in larger blocks of time and words than blogs allow, and he may have a point. Unless you're dealing with trivia, most subjects are too complicated to boil down to a paragraph. On the other hand, making the attempt is a good way to learn brevity.

My advice: if you want to learn to write, first read a lot. If you haven't done that, you won't have enough reference points to figure out which bits of your writing are good enough to keep.

March 20, 2005

Java improves

Yes, I know this is old news, but I was just fiddling around with some of the recent improvements to Java. Java isn't my favourite language, but I have to use it frequently anyway.

Anyway, I was getting sick of writing ArrayList<Point2D> points = new ArrayList<Point2D>(); until I noticed that you can do this:
class Points extends ArrayList<Point2D> {}
Points points = new Points();
This might seem like a pointless (no pun intended) way to save a few characters, but now that Points exists, you can start adding methods to it, like this:
import java.awt.geom.Point2D;
import java.util.ArrayList;

class Points extends ArrayList<Point2D> {
    // distance from p1 to the nearest point in this collection
    double distanceTo(Point2D p1) {
        double min = Double.POSITIVE_INFINITY;
        for (Point2D p2 : this) {
            double d = p1.distance(p2);
            if (d < min)
                min = d;
        }
        return min;
    }
}
If, every time you find yourself writing a loop that iterates over a collection of objects, you turn the loop into a method on the collection, you'll soon find that you've got a nice toolkit.

Of course, I'd still rather be using Perl.

March 17, 2005

Somebody else did the work first

So it finally occurred to me to check the Wikipedia to see what it had to say about simulation. What I found, of course, is that not only is there an article, but it neatly summarizes some of the concepts I was trying to write about earlier. Just what I should have expected from the Wikipedia :-)

The article classifies simulations into Stochastic (what I would call Probabilistic) vs. Deterministic. In practice, Deterministic simulations aren't very common, probably because random numbers are such a powerful tool. Much more useful is the distinction between Continuous and Discrete simulations. That's much more succinct terminology than time slice and critical event. The article also talks about agent-based simulation, which is basically what I described when I talked about my early science fair projects.

So, now I have to buckle down and figure out something non-obvious to tell you about simulations.

March 16, 2005

Saving the seed

Will Wright (by way of Fraxas) got me thinking about a neat technique I learned a while back for saving simulation runs, or certain other forms of data. The basic idea is to save the seed of your random number generator before you do anything. In some cases, like simulations where there is no user input except for a few initial parameters, you can exactly reproduce a particular simulation run just by setting your random number generator correctly and running the program again.
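In Perl, the whole trick is just srand. A minimal sketch, assuming the only source of randomness is the built-in rand():

my $seed = time() ^ $$;            # pick a seed however you like...
srand($seed);                      # ...but remember it
my @run1 = map { rand() } 1 .. 5;

# "loading" the saved run is just reseeding and re-running
srand($seed);
my @run2 = map { rand() } 1 .. 5;

print "identical\n" if "@run1" eq "@run2";   # always prints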

This is particularly useful for computationally inexpensive simulations. I once wrote a fairly complex simulation whose save files were a mere 24 bytes. The output was much, much bigger, of course, but it only took a second or so to calculate it all from the seed (4 bytes) and 5 initial parameters.

Game developers have also on occasion used this technique to save the static portions of a randomly generated game world. Sometimes the generated universe becomes so compelling that exploring it becomes the main focus, as in Noctis.

By static portions, I mean the bits whose state was completely determined by the program. For example, you could generate the terrain using a random seed and some parameters, but if the terrain is deformable, then you have to keep track of all the changes the user made. In some cases this is worth doing, but if the user makes enough changes then it eventually becomes more efficient just to store the final state.

Using this technique also requires that you have efficient algorithms for generating things. Using Will Wright's Spore as an example, if the algorithms used to figure out how your creature walks take too long to reach a stable final state, then users are going to be annoyed when their creatures have to spend 10 minutes learning to walk again every time the game is loaded. In this case, figuring out an efficient way to store or compute the walking behaviour is necessary. <rant>Efficiency is still important!</rant>

March 15, 2005

Where to look...

I'm lucky to be in computer science, because there's more detailed information about computer science on the web than there is about just about any other subject. Here are the most useful sources of general information about algorithms I've found:
  • MathWorld - highly technical, and doesn't describe algorithms, but provides the equations you need to implement some interesting algorithms.
  • Wikipedia - the good mix of quality and readability in Wikipedia's articles extends to articles about algorithms too.
  • Google Groups - the noise to content ratio may be a bit high, but this is the best place to go when you don't know what to search for elsewhere or need some gem of insight like "Figuring out the Voronoi diagram of points on a sphere is the same as finding the convex hull of those points."
Google Groups is often where I go first, to figure out what to search for on Wikipedia and MathWorld.

Of course, there are also hundreds of more specialized sites out there that are just as useful.

March 13, 2005

Artificial evolution

In some biology simulations, you don't even need to store the location of each critter. Surprisingly enough, evolution simulations are often of this type.
Each critter is either male or female.
Each critter has its own ratio of male to female offspring.

Each timestep:
    Select a male critter C1 at random
    Select a female critter C2 at random
    Delete 2 random critters
    Create 2 offspring from C1 and C2
When creating offspring from C1 and C2, one offspring inherits C1's male/female probability gene, the other inherits C2's, and they are male or female with their respective probabilities.

To provide a bit of variation, occasionally randomize a critter's male/female gene.
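Here's the whole thing as a Perl sketch. The population size, mutation rate, and number of timesteps are arbitrary values picked for illustration, and I've folded the occasional gene randomization into the birth step:

# each critter: its sex, and its gene (probability of a male offspring)
my @pop = map { { sex  => (rand() < 0.5 ? 'M' : 'F'),
                  gene => rand() } } 1 .. 1000;

sub offspring {
    my ($gene) = @_;
    $gene = rand() if rand() < 0.01;            # occasional mutation
    return { sex => (rand() < $gene ? 'M' : 'F'), gene => $gene };
}

for (1 .. 100_000) {
    my @males   = grep { $_->{sex} eq 'M' } @pop;
    my @females = grep { $_->{sex} eq 'F' } @pop;
    last unless @males && @females;
    my $dad = $males[int rand @males];
    my $mum = $females[int rand @females];
    splice @pop, int rand @pop, 1 for 1, 2;     # two random deaths
    push @pop, offspring($dad->{gene}), offspring($mum->{gene});
}

my $males = grep { $_->{sex} eq 'M' } @pop;
printf "%.1f%% male\n", 100 * $males / @pop;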

When you run this extremely simple simulation, the ratio of males to females in the population quickly approaches 50/50. The gene pool also changes significantly, from a set of random ratios to a pool entirely filled with genes near 50/50.

This happens because of negative feedback. When one sex becomes more numerous, genes that produce more of the other sex gain an advantage at reproducing themselves, thus rebalancing the population.

So, sorry guys, all those sci-fi stories with races of space-babes whose populations are 7/8ths female just aren't possible. Unless, of course, they've got the genetics of a hive insect, in which case most of them won't be interested.

March 12, 2005

Biology simulations

My first simulations were in the field of biology, which at first glance is a dumb place to start. Isn't biology more complex, and thus more difficult to simulate?

In truth, it's just as easy to write biology simulations as it is to write simulations in any other field. The only difference is that when you write a biology simulation, you abstract out a whole lot more underlying detail. Here's how the simulation I wrote for my grade 7 science fair project, way back in 1990, worked:

In an M by N world with no wraparound, a population of P critters was maintained, so that every time one died, a new one was born. Internally, the only data maintained was the location and other characteristics of each critter, which cut the memory requirements from M*N down to P, and made it easier to code (keep in mind I was using BASIC and I was only 12). Critters were given a lifetime of 75 timesteps, after which they would immediately die and a new critter would be created at a random position. During their life, critters would perform a random walk, with bounds checking so that they didn't wander off the edge of the world.
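Here's that skeleton as a Perl sketch (in Perl rather than the original BASIC; apart from the 75-timestep lifetime, the constants are arbitrary):

my ($M, $N, $P, $LIFETIME) = (40, 40, 50, 75);

sub new_critter {
    return { x => int rand $M, y => int rand $N, age => 0 };
}
my @critters = map { new_critter() } 1 .. $P;

for my $t (1 .. 1000) {
    for my $c (@critters) {
        # random walk, with bounds checking against the world's edges
        my $nx = $c->{x} + int(rand 3) - 1;     # -1, 0, or +1
        my $ny = $c->{y} + int(rand 3) - 1;
        $c->{x} = $nx if $nx >= 0 && $nx < $M;
        $c->{y} = $ny if $ny >= 0 && $ny < $N;
        # death by old age; the replacement keeps the population at P
        # ($c is an alias, so this swaps the critter out of @critters)
        $c = new_critter() if ++$c->{age} > $LIFETIME;
    }
}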

Starting with this very simple framework, you can add all kinds of details to test various theories. That year, I did a disease simulation, with each critter being either uninfected, infectious, or immune. Infectious critters would occasionally infect nearby critters, and would eventually either die or recover and become immune. By tweaking various parameters such as P, the lethality of the disease, the length of the infectious stage, and so on, I was able to create low-level persistent diseases that would always keep a small portion of the population infected, plagues that would rip through the population until practically all individuals were immune and then come close to dying out until the next generation of uninfected individuals was born, and diseases so lethal that they killed off all nearby hosts and died off completely.

By the next year's science fair project, the population was no longer being artificially maintained at P. Critters had an energy level, which was depleted by movement and reproduction, and increased by eating food. Critters would still die of old age, but they would also die if they ran out of energy. Instead of being immediately replaced, critters were only created when an existing critter accumulated enough energy (plus a reserve) to allow it to reproduce. Food consisted of unmoving "plankton" that was scattered across the world at a constant rate, and was eaten whenever a critter happened upon some. The experiment that year was to create two different "species" of critter with different characteristics (energy needs, lifespan, etc.) and have them compete for the food. The result was that no matter how carefully I tweaked the variables, one type would always die out. This turned out to support (in a way) a well-known hypothesis that only one species can occupy a niche at a time.

This very simplistic model can be extended to perform all kinds of little experiments, and with enough work can become a pretty complex ecology simulation. Tomorrow, I'll describe how to add a simple element of genetics, and therefore evolution, to the mix.

March 11, 2005

Explaining what you do

A standard conversation starter in the United States is "What's your job?", to which I usually reply "I'm a programmer at [current organization]." Inevitably, the next question is "So... what do you do, then?"

If the person I'm talking to doesn't know much about computers, I'm faced with the problem of explaining what, exactly, a programmer does, without going into too much technical detail. Here's how I usually deal with it:

"Programming is a little like dealing with a genie, or any other mythical creature that grants wishes, because a computer does exactly what you tell it to do, rather than what you want it to do. Since programmers have to deal with this problem every day, we created a whole new language in which it's much more difficult to say something you didn't mean, and we communicate our wishes to the computer using that language instead of English. So, most of my job involves translating from English to computerese, and working out exactly what was meant in excruciating detail so that the computer won't do the wrong thing."

March 10, 2005

Somebody else did the work today

Easy post today: Paul Graham has published another long essay. Go read it, and marvel at the parallels with what I've been writing.

March 09, 2005

Efficient operations

Ok, I was thinking about picking points in a sphere again. I put together a Perl program that timed picking a million points, then took the average time over 100 trials. Here are the results:
Exact method:   0.37092802 seconds
Discard method: 0.63045970 seconds
In other words, taking the time to figure out the exact method comes close to halving the time it takes to pick a point at random from inside a sphere, which means the extra calculations required for the exact method must not be very expensive.
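For the curious, the two methods look roughly like this with the sphere's radius fixed at 1 (a sketch: the exact method draws a uniform direction plus a cube-root-distributed radius, and the standard Benchmark module's cmpthese gives a quick way to compare them):

use Benchmark qw(cmpthese);
use constant PI => 4 * atan2(1, 1);

# discard method: pick points in the bounding cube until one
# lands inside the sphere
sub discard {
    my ($x, $y, $z);
    do {
        ($x, $y, $z) = map { 2 * rand() - 1 } 1 .. 3;
    } until $x*$x + $y*$y + $z*$z <= 1;
    return ($x, $y, $z);
}

# exact method: uniform random direction, radius drawn with a
# cube root so that density stays uniform over the volume
sub exact {
    my $z     = 2 * rand() - 1;         # uniform cos(polar angle)
    my $theta = 2 * PI * rand();
    my $r     = rand() ** (1/3);
    my $s     = sqrt(1 - $z * $z);
    return ($r * $s * cos($theta), $r * $s * sin($theta), $r * $z);
}

cmpthese(-1, { exact => \&exact, discard => \&discard });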

While I was at it, I did some very rough timings of the cost of various basic operations.
+     1.0
-     1.0
*     1.0
/     1.0
%     1.0
rand  0.85
cos   1.2
sin   1.2
log   1.2
sqrt  1.0
**    1.5
The test can't be made much more accurate due to my using an interpreted language on an OS that's constantly running other processes in the background, so don't take the numbers too seriously. All this shows is that rand is cheaper than +, -, *, /, %, and sqrt, which are cheaper than cos, sin, and log, which are cheaper than **, in Perl. Weird, huh?

March 08, 2005

Why program?

We all seek to do something rewarding and fulfilling with our lives, so naturally I must believe there is something rewarding and fulfilling about programming or else I would be in a different line of work. Actually, I believe there are many good things about being a programmer, including but not limited to that feeling you get when you finally solve a really tough problem in an elegant way, the opportunities to get involved in fields outside your specialty, and the general clarity of thought you win for yourself after having gone through several hundred of those mental debuggings I wrote about yesterday.

However, those are all personal benefits, and I'll admit to having a smidgen of social conscience. What is it, exactly, that a programmer does for the rest of us? I think it can be summed up in one word: automation. In any office you can find people doing repetitive, mindless little tasks that have long since been worked out in such detail that someone has written down a list of precise instructions that even a new hire who knows nothing about the organization can carry out. They may need to be carried out once a week or daily, or maybe some poor soul's entire job is to carry out the same task over and over again. In cases like this, a programmer's job is obvious. A detailed list of precise instructions is an algorithm. Somebody has already done half your job for you!

Automating the boring stuff is probably the single greatest service that programmers perform for society. Not only does it save time, but it's easier to pass on the accumulated wisdom encoded in that list of precise instructions. In much the same way that writing preserves knowledge from one generation to the next, so that people don't have to work things out from scratch every 30 years, programs preserve algorithms.

March 07, 2005

Mental Debugging

It's surprising how often my mental model of something is severely buggy. (Well, surprising to me.) I think I know exactly how to do something, but when I sit down to code it (or, in some cases, actually do it), I inevitably find that the situation is much more complicated than I thought. No matter how hard I think about it beforehand, I have to make major changes before I can get real results.

Plate tectonics is a perfect example. My first instinct was that I knew a simple way to make a plate tectonics simulation, so I was surprised not to find one out there already. But when I tried to write my own, I didn't get very far before running into big problems.

So far, the only effective way I've found to debug my mental models before beginning to code is to do a couple of iterations of a simulation beforehand with paper and pencil. In most cases this immediately makes obvious the questions you should have answered before you started coding, such as "What do I need to keep track of?", "What data structure am I going to use to represent this?", and "How is the program going to know where two plates are colliding?". This forces you to clarify the vague parts of your algorithm and identifies major structural changes needed, before you've wasted time writing code that can't be used.

But the ultimate way to debug your mental model is to write code that implements it. Before you do that, you've got no way to know for sure that your ideas don't have bugs, or even gaping holes, that you simply haven't noticed.

Maybe I should add a corollary to the third law of computer programming?
Third Law of Computer Programming:
Any programmer can find 90 percent
of his bugs simply by explaining his
program to any uninterested observer.

Corollary Two:
To find the other 10 percent,
you have to write the program.
Running a couple of iterations with paper and pencil beforehand is like explaining it to an uninterested observer. (In this case, the programmer.)

March 06, 2005

World Builder

World Builder uses a more complicated algorithm than SimEarth. It begins with a single, flat landmass in the center of a square world. At each timestep there is a chance a landmass will split in two, and the parts will move away from each other for a while, slow down, then reverse direction until they hit each other. Landmasses can split while they are already moving, and mountain ranges build up at the leading edge of any moving land. Also, at each timestep any existing mountain ranges erode a little, so that old mountain ranges eventually disappear.

The result is usually a world whose continents look like they could all fit together, the way South America and Africa look today. Mountain ranges, rather than single peaks, are created. It is also relatively simple to code, because faults are only used at the moment a landmass splits. After that, only the velocity of each landmass is tracked.

On the other hand, no island chains ever get generated, and erosion never progresses to the point that old land disappears, because there is no process to generate new land. Along with the lack of wraparound, which prevents pieces of the jigsaw puzzle supercontinent from ever changing places, this gives the generated worlds a look that always differs from Earth in the same characteristic way.

Of course, like SimEarth, there's a lot more being simulated than continental drift in World Builder. Once the user is satisfied with the continents, World Builder can calculate seasonal temperatures and rainfall, climate zones, rivers, and lakes, and then export the world into Nation Builder. It's a shame it's not available any more.

March 05, 2005

SimEarth

If you've read my last two posts, you've probably figured out why no simple plate tectonics algorithms are known. It's a complicated process, one that doesn't lend itself to simple code. That, however, hasn't stopped people from trying.

SimEarth has a very simple, highly unrealistic algorithm for continental drift. The world, which is cylindrical (i.e. wraps around horizontally but not vertically), is divided into a grid of cells, each of which is assigned one of the 8 compass directions or "not moving". Occasional random "earthquakes" assign the same direction to a circular area (a "plate"). Occasional random "volcanos" produce land. Land in a particular cell moves by one cell in the direction indicated every time unit, and if no land enters a cell it is set to ocean.
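The movement rule alone is only a few lines of code. Here's a toy Perl version (the grid size and random initial layout are mine, not SimEarth's; the earthquake and volcano events would be layered on top):

my ($W, $H) = (16, 8);
my @compass = ([1,0],[1,1],[0,1],[-1,1],[-1,0],[-1,-1],[0,-1],[1,-1]);

# a land flag and a compass direction (undef = not moving) per cell
my @land = map { [ map { rand() < 0.2 ? 1 : 0 } 1 .. $W ] } 1 .. $H;
my @dir  = map { [ map { rand() < 0.5 ? int rand 8 : undef } 1 .. $W ] } 1 .. $H;

sub drift_step {
    my @new = map { [ (0) x $W ] } 1 .. $H;    # default: ocean
    for my $y (0 .. $H - 1) {
        for my $x (0 .. $W - 1) {
            next unless $land[$y][$x];
            my ($dx, $dy) = defined $dir[$y][$x]
                          ? @{ $compass[ $dir[$y][$x] ] }
                          : (0, 0);
            my $ny = $y + $dy;
            next if $ny < 0 || $ny >= $H;      # no vertical wraparound
            $new[$ny][ ($x + $dx) % $W ] = 1;  # wraps horizontally
        }
    }
    @land = @new;
}

drift_step() for 1 .. 10;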

This algorithm captures the concept of moving land, and not much else. Faults are completely ignored, so that no island chains or mountain ranges are generated, just islands with single large peaks in the center.

On the other hand, SimEarth had to be computationally inexpensive enough to run quickly on a desktop computer in 1988, with a lot more than just continental drift being simulated. It's still a great simulation toy (like SimCity, only on the grandest scale), and it runs fine on WinXP.

March 04, 2005

Tectonics 2

Ok, take a look at this.

The histogram on the left has two distinct peaks. This is because there are two different types of rock making up the bulk of the surface of the Earth, one of which is less dense and "floats" on top of the other. It is for this reason that we have continents separated by deep oceans rather than a world entirely covered in shallow seas.

The cumulative frequency curve on the right could be used as a sanity check for plate tectonics simulations. Any simulation should produce a similar curve, if it's attempting to mimic an Earth-like world.

There are three types of fault: spreading, transform, and subducting. In a simulation we can pretty much ignore transform faults (unless we're interested in earthquakes), because they don't do much. Spreading faults are found at the borders of plates that are moving away from each other. The denser of the two types of rock is produced there, creating new ocean floor. If this were the only type of fault on Earth, we would have nothing but a world-wide shallow sea.

Subduction faults are found where two plates are moving towards each other. One plate inevitably wins the collision and forces the other to bend downwards into the mantle. This produces a deep ocean trench along one side of the fault, which explains the sharp downward spike at the right end of the cumulative frequency curve. The interesting thing is what happens on the other side of the fault. Through some bit of thermo-chemical magic I don't understand yet, the subducting plate triggers volcanic activity about 100km away from the trench. The volcanos produce the second, less dense type of rock, which forms island chains. The Indonesian islands of Java and Sumatra were formed like this, as were Japan and the Aleutian islands. If the activity goes on long enough, entire continents can be formed. South America is the best example of this, with a deep trench off the west coast, followed by a volcanic mountain chain (the Andes) just inland.

These mountains partly account for the spike on the left side of the cumulative frequency curve (they then quickly erode down to relatively flat continents). The rest of it is accounted for by continent-continent collision, where the subducting continental rock is buoyant enough that it doesn't so much subduct as get shoved underneath the other plate. The Himalayas were produced in this way.

So, from a simulation point of view, you not only have to keep track of the thicknesses of the two types of rock at every point on your world, but also the locations of spreading faults, and the locations and polarity of subduction faults.
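In code, the bookkeeping might look something like this (a guess at a representation in Perl, not a working simulation; all the numbers are placeholders):

my ($W, $H) = (360, 180);
my @oceanic     = map { [ (7) x $W ] } 1 .. $H;   # km of dense rock
my @continental = map { [ (0) x $W ] } 1 .. $H;   # km of buoyant rock

my @spreading  = ( { cells => [ [10, 40], [10, 41] ] } );
my @subducting = ( { cells => [ [200, 90] ], polarity => 'east' } );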

March 03, 2005

Tectonics

One neat thing about studying simulations is that you get to (have to) learn about fields well outside your specialty. It also means that when I don't have anything in particular to tell you about simulations, I get to talk about anything I want. Today, it's plate tectonics.

I mentioned earlier that I only knew of one plate tectonics simulation algorithm out there. I forgot about SimEarth. There are also lots of reconstructions of the past plate tectonics of Earth out there, but none that are clearly described for the average coder, and none (that I've found) that illustrate the principles of plate tectonics using randomly generated initial conditions.

So I did a little digging with Google to find out about plate tectonics for myself. From the web pages I found, I've got the impression that plate tectonics vastly simplifies geology, but that the details haven't been worked out yet. It's as though Copernicus has already pointed out that it's much simpler to think of the planets as orbiting the sun, but Kepler and Newton haven't yet shown that they orbit in ellipses and obey F = Gm₁m₂/d². We know that the surface of the earth is divided into plates and the directions and speeds at which they are currently moving, but we don't know how faults move relative to one another, how or why new faults are created and old ones destroyed, and so on. (Or if someone does, it hasn't filtered out to the web yet.)

Here's what we do know: the location of Earth's faults [1], earthquakes are almost all generated along faults [2], and there are three different types of faults that behave differently.

More tomorrow!

March 02, 2005

Pseudonyms considered harmful

I'm short on time again, so this will be short and off-topic.

You may have noticed that I use my real name online for most purposes, from my email address to the address of this blog. A lot of people prefer to use a pseudonym for their activity online; sometimes a single favourite, but often a different one for every activity.

One reason given for this is the desire for anonymity. People, for whatever reason, sometimes fear that their behaviour in one context (e.g. what they write on their blog) will hurt their reputation in another context (e.g. at work). Occasionally this is even true. I, on the other hand, prefer to keep myself honest by putting my own name on everything I do. This prevents conflicts of interest between different parts of my life from going unaddressed for too long, ensures that I'm polite at all times (as opposed to, e.g., obnoxious anonymous online gamers), and has the added benefit of making sure I get credit for all the things I do.

Another reason given for using pseudonyms is the desire to avoid identity theft. In my opinion, the larger a footprint you leave in terms of identity, the harder it will be for someone to steal it all. Identity thieves like to get the social security numbers of infants that died 30 years ago precisely because there isn't any other information about those people hanging around to contradict the thief's claims. Similarly, if someone steals an account on a forum that uses real names, so that the original owner can easily be contacted, the thief is much more likely to be noticed and caught than if a one-off pseudonym had been used.

March 01, 2005

Compiler writers vs. compiler users

Compilers can be written so that they can deal with ambiguous code, but that makes compiler writers' jobs harder, so they usually insist on precise languages. HTML parsers used to be able to handle the most distorted HTML imaginable, but XHTML (and XML) has swung things back the other way, insisting that everything be perfect, right down to not allowing quotes to be omitted around attribute values. This makes the HTML parser writers' job easier, and the HTML writers' job harder. Unfortunately, there are far more HTML writers than HTML parser writers, so this is a net loss. Since parser writers usually control the specs for the language, this kind of problem crops up all over the place.

Perl's TMTOWTDI philosophy is an example of how adding to compiler complexity (and annoying language theorists who would prefer an elegant language) can make the language more useful and easier to learn for the average programmer. Perl got this way partly because it was designed by enough programmers that they weren't tempted to cut any corners just to make writing the compiler a smaller job.