May 13, 2010

XKCD Colors

Randall Munroe (creator of XKCD) recently ran a color name survey. Roughly 150 thousand people responded, naming ~3.4 million colors, an average of ~22 per respondent. About 180 thousand unique color names were given, although only about 4% of them were given by ten or more different people. Munroe analyzed the names given by more than 100 different people (1228 names) and produced a somewhat filtered list of 949 color names, each assigned an RGB value.

Some spelling corrections were made (gray vs grey, fuschia vs fuchsia), but many remain:
  • forrest vs forest
  • kelley vs kelly
  • lavendar vs lavender
  • liliac vs lilac
  • ocher vs ocre vs ochre
  • orange(-y/-ish) vs orang(-y/-ish)
  • perrywinkle vs periwinkle
  • purple(-y/-ish) vs purpl(-y/-ish)
  • robin egg vs robin's egg
  • siena vs sienna
  • terra cotta vs terracota vs terracotta
  • toupe vs taupe
Non-English names haven't been filtered out (rosa, azul), and at least one descriptive phrase made it in (blue with a hint of purple). Many names differing only in punctuation haven't been merged (yellow green vs yellow/green vs yellowgreen). Note that the correct form (yellow-green) has been removed, even though it's the most common form in the raw data.
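Merging names that differ only in punctuation could be sketched roughly as follows. This is an illustration, not the filtering Munroe used: the helper names `normalize_name` and `group_variants` are made up for this example, and the rule of dropping spaces, hyphens, and slashes is an assumption.

```python
import re
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Lower-case a color name and drop spaces, hyphens, and slashes."""
    # Assumption: these separators carry no meaning, so "yellow green",
    # "yellow/green", "yellow-green", and "yellowgreen" share one key.
    return re.sub(r"[\s\-/]+", "", name.strip().lower())


def group_variants(names):
    """Return only the groups in which several spellings share a key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


names = ["yellow green", "yellow/green", "yellowgreen", "green yellow", "navy"]
print(group_variants(names))
```

Word order is deliberately left alone, so "green yellow" stays separate from "yellow green". Spelling variants such as lavendar vs lavender would need a different approach (for example an edit-distance check), which this sketch does not attempt.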

Although there are no duplicate colors in the list, there are some very similar color names that aren't clearly different colors (navy vs navy blue, olive vs olive green, yellow green vs green yellow). Sometimes two clearly different names seem to be synonyms for the same color (burgundy vs maroon, lilac vs lavender). The top 10 most similar pairs are:
  1. very light blue and really light blue
  2. ice blue and very pale blue
  3. poop and shit
  4. light beige and creme
  5. mahogany and dried blood
  6. very light blue and very pale blue
  7. amber and saffron
  8. ecru and ivory
  9. banana and faded yellow
  10. bile and baby puke green
Runners-up include forest green and british racing green, slate blue and steel blue, and orangey brown and orangish brown.
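A ranking like this can be produced by comparing every pair of colors and sorting by distance. The sketch below assumes plain Euclidean distance in RGB, which is only one possible choice (a perceptual space such as CIELAB may order the pairs differently). The function name `closest_pairs` is hypothetical, and the sample colors are standard reference values chosen for illustration, not entries from the survey results.

```python
from itertools import combinations
from math import dist


def closest_pairs(colors, top=10):
    """Return the `top` closest pairs as (distance, name_a, name_b) tuples."""
    pairs = [
        (dist(rgb_a, rgb_b), name_a, name_b)
        for (name_a, rgb_a), (name_b, rgb_b) in combinations(colors.items(), 2)
    ]
    # Sorting tuples orders by distance first; names only break exact ties.
    return sorted(pairs)[:top]


# Illustrative values only, not taken from the survey results.
colors = {
    "black": (0, 0, 0),
    "near black": (10, 10, 10),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
}

for distance, name_a, name_b in closest_pairs(colors, top=2):
    print(f"{name_a} and {name_b}: {distance:.1f}")
```

With 949 names there are about 450 thousand pairs, so the brute-force comparison is still cheap.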

Despite all the flaws in the names, the method used to determine the RGB value for each color name is clearly superior to some of the other methods I've seen used in color name surveys: it avoids the edge problem (black ends up as #000000 instead of #1e1e20) and it is robust (in the sense that a median is more robust than an arithmetic mean because it is less affected by outliers). Also, the sheer volume of raw data is amazing, and it's all available for download and independent analysis.
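The edge problem and the robustness point can both be seen in a toy example. The sketch below is not Munroe's method; it only contrasts an arithmetic mean with a median on one color channel. The sample values are invented: most responses for "black" sit at 0, and a few outliers sit higher. Because a channel value can't go below 0, the outliers all lie on one side and drag the mean away from the edge.

```python
from statistics import mean, median

# Invented single-channel values (0-255) for responses labeled "black".
# No value can be negative, so every outlier pulls the mean upward.
samples = [0, 0, 0, 0, 0, 0, 10, 30, 60, 200]

channel_mean = mean(samples)
channel_median = median(samples)

print(f"mean:   {channel_mean:.0f} (hex {int(channel_mean):02x})")
print(f"median: {channel_median:.0f} (hex {int(channel_median):02x})")
```

Here the mean is 30 (hex 1e) while the median stays at 0, so averaging each channel would give a dark gray rather than true black.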

