April 27, 2005

Daily WTF

For your amusement... or possibly horror:
http://thedailywtf.com/

April 24, 2005

Rebalancing

Haven't had any interesting thoughts to share lately, so:

The animal balancing algorithm works by finding the most unbalanced animal (the one with the most animals on the other side of the parent question) that the user can help with by answering one of the questions on that side.



Does an elephant have an exceptionally long neck? no



Does a giraffe have hooves? yes
Note that the program remembers that giraffes don't have trunks.



Does an elephant have hooves no



Does an elephant have retractable claws? no



I should find a way to generate these little tree gifs automatically. Hmmm.

April 21, 2005

Right idea, wrong time

A friend asked: "Why do all the good code ideas come when you should be asleep?" Good question. For me, they always come just after I leave work on Friday. A couple of times I've even turned around, gone back and spent another couple of hours coding, something you normally couldn't pay me to do. (Obviously not because I don't like coding, but because I won't give up my evenings and weekends for better pay.) This, after spending the entire day, and sometimes the entire week, trying to think of that one good idea.

My theory is that if you think hard about something, you get into a rut. You spend all your time thinking about the details of a particular set of ideas with clearly defined boundaries. Then, when you begin to relax as you walk to the parking lot (or get tired enough that your thinking gets fuzzy), your mind wanders a little bit outside the box and, if you're lucky, you find that idea that should have been inside the box before your stomach growls or you fall asleep.

April 18, 2005

Power corrupts...

Automation is a powerful tool. Also a great way to perpetrate a hoax! You should generate a sample paper and have a look for yourself. The program creates a table of contents, references, and even figures, charts, and graphs. The only thing missing is semantic content.

April 17, 2005

Animal improvements

Ever played Animal? You think of an animal, and the program tries to guess what it is by asking yes or no questions. At the heart of the algorithm is a binary tree of questions, where each leaf on the tree is an animal, like this:

When the program guesses wrong, something like this happens:

Are you thinking of a kitten? no
What animal are you thinking of? tiger
What is a question that distinguishes between a kitten and a tiger?
would you keep it as a pet
Would you keep a tiger as a pet? no


In this way new animals and questions are added to the tree and the program increases its chance of guessing correctly.

There are many possible improvements to the basic program. One early version varied its wording (to seem more natural), made pronoun substitutions (Would you keep a tiger as a pet? instead of How would you answer that question for a tiger?), accepted varied input (e.g. y|sure|of cou|ok|o.k.|all ri |someti|freque|certai|right|positiv for "Yes"), and had a primitive form of error correction. The error correction worked by checking to see if new animals were already on the tree and, if so, quizzing the user to find out whether to: discard the new entry because a question was answered incorrectly, delete the old entry because it was in the wrong place, or keep both because the branching point question could have been answered either way.

One signifcant problem is that the tree can easily get extremely unbalanced, leading to very long sequences of questions for some commonly picked animals. Ideally, the tree would be huffman coded so that the most commonly picked animals would be guessed with the fewest questions. However, it's much easier to just balance the tree as much as possible, so that every animal can be guessed with roughly the same number of questions. To do this, the program has to ask the user some new questions. For example:

Are you thinking of a human? yes
I win! Bwahahaha!
You must answer a question, mortal:
Does a giraffe have hooves?
yes


Using this answer and two others:

Does an elephant have hooves? no
Does an elephant have retractable claws no


The example tree from the beginning can be rebalanced to:

In practice the program won't always ask the optimal number of questions to rebalance the tree (because the answers aren't known ahead of time), but in general good rebalancings can be obtained from few questions. My version of this algorithm asks four questions (the other one is Does an elephant have an exceptionally long neck?) to get this result, which cuts the maximum number of questions from six to four.

Rebalancing helps solve problems caused by overly specific questions, by letting those questions sink to the bottom of the tree. Overly vague questions, overly vague animals, and overly specific animals are more difficult to fix. All of these problems can be solved by allowing users to delete bad questions and animals, but this is, IMHO, a poor choice. Better would be to find ways to make the program self-correcting. For example, vague questions will probably cause animals to end up on two different branches of the tree. As soon as this happens, the program should ask the user to replace the offending question with a new one.

Too-vague animals can cause an odd situation:

Are you thinking of a fish? yes
I win! Bwahahaha!
Play again?
I was thinking of rainbow trout...
Please answer yes or no.
Play again?


I'm not sure how the program can detect those, or too-specific animals (e.g. Garfield). Too-specific animals aren't too bad, they just add some extra questions before the program can guess cat.

"Advanced" versions of Animal and 20 questions allow answers besides "Yes" and "No" and use a matrix of answers and questions instead of a binary tree. However, they don't seem to be very good at asking a minimum number of questions.

Thoughts?

April 13, 2005

Entish as a computer-aided translation interlanguage

Ever wondered how a universal translator could work? Obviously it's impossible to instantly begin translating a previously unknown language, but the problem of translating between known languages is more than hard enough. The last time I checked, the EU had 20 official languages and ran the world's largest translation operation (2nd place was the UN, with 6 official languages). With 20 languages to translate, the EU has to deal with 190 language pairs, and pays over €500 million/year for translator salaries alone.

One way to simplify the problem would be to translate every language to the same intermediate language, then translate that to the target language. That would reduce the problem to 19 language pairs in the case of the EU. But which one? For political reasons, nobody would agree to using any of the current official languages (which is why the EU is in this situation in the first place). More importantly, because each translation introduces errors, the more intermediate steps there are, the worse the final translation will be, turning the whole thing into a giant game of telephone.

So, why not create a language meant solely to act as an intermediate, used only by computer translators, and designed to reduce translation errors to a minimum? (That's a rhetorical question, by the way.)

Some translation errors are caused by the addition of information that is mandatory in the target language but unspecified in the original language. But many more errors come from the loss of subtle meanings during the translation. An intermediate language would ideally lose no information at all, carefully recording every nuance of the original utterance. This would lead to a language that would be unacceptably verbose for use by humans, but computers should be able to handle it. Entish happens to be a perfect model for this kind of language, since every word in Entish is an exhaustive description of the attributes of the thing named. Unfortunately, only a few fragments of Entish are known, presumably because it is too boring to listen to Ents speak it.

April 12, 2005

Survival of the survivors

The central lesson of biology is that survival is at the core of just about every behaviour there is, even ones that at first glance appear to increase mortality. For example, people will sacrifice themselves to save their family. If survival of the individual was the driving force, people would happily save themselves at the expense of their family. However, survival in the biological sense is defined in terms of the survival of characteristics, not individuals. Because parents share most of their behaviours with their offspring, anytime a parent sacrifices him/herself to save offspring, the parent increases the survival rate of all of his or her behaviours, whether they were passed on through genes, memes, or something else, including the one that caused the parent to make the sacrifice. If the rescuer might survive the experience, the motivation is even stronger, in inverse proportion to the risk.

Although we are more used to thinking of evolution in terms of physical characteristics, like the peacock's tail, behavioural characteristics are possibly more important. For example, on its own, the male peacock's tail would decrease its survival rate, because it's easily visible to predators, slows down the bird, and takes a lot of energy to grow and maintain. However, female peacocks choose to mate with males that survive and prosper even with massive ornamental tails. Together, the physical characteristic and the behavioural characteristic act to increase survival, by reducing the breeding rate of individuals that can't maintain an expensive tail. Without the behavioural characteristic, the physical characteristic wouldn't be useful.

This way of thinking about behaviour leads to the conclusion that ideas that have survived for a long time, e.g. the various concepts called morals, must in some way increase survival. Either that, or the theory is wrong and something else is going on.

April 11, 2005

Missed Opportunity

Damn, I wish I had heard of this in time to participate. Contestants had 3 days (or 1 day for the lightning prize) to create an ant brain that could outcompete all the other contestants' ant brains. To do this, the winning contestants wrote an ant-world simulator that matched the judges' simulator exactly, wrote a visualizer so that they could observe and debug their ants' behaviour, invented a higher-level language to describe ant behaviour, and wrote a compiler to translate that to the finite state machine language required for a valid submission. Then, of course, they had to work out actual strategies for their ants. All in 3 days!

It reminds me of the ACM programming competition, where 3 people get 1 computer to solve 8 problems in 5 hours. I'll definitely keep an eye on this; maybe I'll have a chance to compete next year. Anyone interested in forming a team?

April 06, 2005

Reverse Polish Notation and Language

Ever heard of reverse polish notation (RPN)? It was a way to allow early calculators to parse complicated mathematical expressions, using a minimum amount of memory, without needing brackets. Lately I've been thinking about its applicaton to languages, and this time I'm not talking about programming languages. (Although Forth and Joy use RPN, and LISP and its relatives use PN.)

There are people out there who construct languages (conlangs) for a variety of purposes, from providing atmosphere for fiction (Klingon) to facilitating international communication (Esperanto). A subset of this group, often called loglangers (for "logical language makers"), is concerned (in part) with creating languages that are unambiguous. One type of ambiguity they want to eliminate is attachment ambiguity. Here's an example:

(John saw the lady who watched the crowd) with a telescope.

John saw (the lady who watched the crowd with a telescope).

We solve this problem in English by putting a comma after "crowd" if we want the first meaning, but English can't handle "X saw Y who saw Z who saw W with a telescope", "X saw Y and Z who saw W with a telescope", or any number of other complicated expressions.

Some loglangers have solved this problem by including bracket-like "elidible terminators" (optional words for marking the end of a phrase or other expresion) in their language. I'm specifically thinking of Lojban. However, if you treat verbs as operators and noun phrases as operands, you can use RPN or PN to solve this problem entirely with word order:

John the lady who the crowd watched with a telescope saw

John the lady who the crowd with a telescope watched saw

The only words that have moved are "watched" and "saw", switching from Subject Verb Object (SVO) order to SOV order. Actually, there's a fourth element, the prepositional phrase, so what I've really done is switch from SVOP to SOPV. SOVP would also solve the attachment ambiguity.

SOV is actually the most common word order in natural languages. Could that be because it allows speakers and listeners to parse the language using a minimum amount of memory, without attachment ambiguity?

April 03, 2005

Simulating a house

I was just reading this. (How did I end up there, you ask? Because of the other stuff Bob Jenkins has on his website.) Anyway, this got me to thinking about the Sims. I haven't actually played the Sims since before the first expansion pack was released, but the core of the game looks like it could act as a simulation for evaluating house designs. In particular, as the sims wander around on their daily routine you can figure out how much walking they had to do in each floor-plan. This is important both on a large scale, if two rooms that are often visited one after the other are on opposite sides of the house (or different floors!), and on a small scale, as in laying out an efficient kitchen.

Actually, most of the game is irrelevant for a simulation, since you don't need graphics, music, or user interaction if you just want to compile some statistics. For path-length evaluation all you really need is a set of paths for sims to follow from one service to another, a way to compute the distances from a floorplan, and a way to generate new floorplans.

Of course, there are lots of other factors to consider, like building cost, location, lot size constraints, typical weather, plumbing (so that hot water doesn't take 5 minutes to start flowing and flushing the downstairs toilet doesn't cause the upstairs shower to change temperature), light switch locations (I hate stumbling around in the dark looking for the switch; it should be right next to the entrance to the lighted area), and dozens of others. Just to make the problem harder, you can't consider any of them in isolation: the optimal solution to one may force a non-optimal solution to another. So you have to decide how to convert everything to a common system of measurement and try to optimize that.

All in all, it makes a very interesting problem.