April 17, 2005

Animal improvements

Ever played Animal? You think of an animal, and the program tries to guess what it is by asking yes or no questions. At the heart of the algorithm is a binary tree of questions, where each leaf on the tree is an animal, like this:

When the program guesses wrong, something like this happens:

Are you thinking of a kitten? no
What animal are you thinking of? tiger
What is a question that distinguishes between a kitten and a tiger?
would you keep it as a pet
Would you keep a tiger as a pet? no

In this way new animals and questions are added to the tree and the program increases its chance of guessing correctly.

There are many possible improvements to the basic program. One early version varied its wording (to seem more natural), made pronoun substitutions (Would you keep a tiger as a pet? instead of How would you answer that question for a tiger?), accepted varied input (e.g. y|sure|of cou|ok|o.k.|all ri |someti|freque|certai|right|positiv for "Yes"), and had a primitive form of error correction. The error correction worked by checking to see if new animals were already on the tree and, if so, quizzing the user to find out whether to: discard the new entry because a question was answered incorrectly, delete the old entry because it was in the wrong place, or keep both because the branching point question could have been answered either way.

One signifcant problem is that the tree can easily get extremely unbalanced, leading to very long sequences of questions for some commonly picked animals. Ideally, the tree would be huffman coded so that the most commonly picked animals would be guessed with the fewest questions. However, it's much easier to just balance the tree as much as possible, so that every animal can be guessed with roughly the same number of questions. To do this, the program has to ask the user some new questions. For example:

Are you thinking of a human? yes
I win! Bwahahaha!
You must answer a question, mortal:
Does a giraffe have hooves?

Using this answer and two others:

Does an elephant have hooves? no
Does an elephant have retractable claws no

The example tree from the beginning can be rebalanced to:

In practice the program won't always ask the optimal number of questions to rebalance the tree (because the answers aren't known ahead of time), but in general good rebalancings can be obtained from few questions. My version of this algorithm asks four questions (the other one is Does an elephant have an exceptionally long neck?) to get this result, which cuts the maximum number of questions from six to four.

Rebalancing helps solve problems caused by overly specific questions, by letting those questions sink to the bottom of the tree. Overly vague questions, overly vague animals, and overly specific animals are more difficult to fix. All of these problems can be solved by allowing users to delete bad questions and animals, but this is, IMHO, a poor choice. Better would be to find ways to make the program self-correcting. For example, vague questions will probably cause animals to end up on two different branches of the tree. As soon as this happens, the program should ask the user to replace the offending question with a new one.

Too-vague animals can cause an odd situation:

Are you thinking of a fish? yes
I win! Bwahahaha!
Play again?
I was thinking of rainbow trout...
Please answer yes or no.
Play again?

I'm not sure how the program can detect those, or too-specific animals (e.g. Garfield). Too-specific animals aren't too bad, they just add some extra questions before the program can guess cat.

"Advanced" versions of Animal and 20 questions allow answers besides "Yes" and "No" and use a matrix of answers and questions instead of a binary tree. However, they don't seem to be very good at asking a minimum number of questions.



Post a Comment

<< Home