May 31, 2010

Color Adjectives

Ever wondered what the difference between a light color and a bright color is? Ever wanted a systematic way to name variants of a color? Well, now we have data.

First, a bit of terminology. Intensity is the total quantity of light. In the RGB system, this is easy to calculate: just sum the three values, since each represents the intensity of one component of a color. Saturation is the distance between a color and the nearest grey (the grey of the same intensity).
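As a minimal sketch of those two definitions, assuming 0-255 RGB values and straight-line (Euclidean) distance in RGB space (the function names are made up for illustration): under that assumption, the nearest grey to any color is exactly the grey whose three channels each hold a third of the color's intensity.

```python
import math

def intensity(r, g, b):
    """Total quantity of light: the sum of the three RGB components."""
    return r + g + b

def saturation(r, g, b):
    """Distance from (r, g, b) to the grey with the same intensity.

    Assumption: straight-line (Euclidean) distance in RGB space.
    """
    grey = intensity(r, g, b) / 3  # each channel of the equal-intensity grey
    return math.sqrt((r - grey) ** 2 + (g - grey) ** 2 + (b - grey) ** 2)

# Pure red is far from its grey; a mid-grey is its own nearest grey.
print(intensity(255, 0, 0), saturation(255, 0, 0))
print(intensity(128, 128, 128), saturation(128, 128, 128))
```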

The most common color adjectives (appearing in the most color names) in the XKCD list are light, dark, pale, bright, deep, and dull. The most common color nouns are green, blue, pink, purple, brown, yellow, and red. Here's a labeled plot of the simple combinations of the common adjectives and nouns (plus teal and magenta):

There's a definite pattern. Bright colors are all saturated, and dull colors are their opposite, unsaturated. Dark colors are both unsaturated and unintense, while deep colors are slightly more saturated but equally unintense. Pale colors are both intense and unsaturated, while light colors are slightly more saturated but equally intense. The adjectives appear to denote directions within the color space, rather than absolute regions. For example, pale green and pale blue are nearly white, but pale red, orange, and yellow are quite saturated, since the unsaturated versions of those hues are called brown (if unintense) or tan (if intense).

The situation is not as neat as the little compass on the diagram would lead you to believe, but it serves as a useful first approximation. Some of the awkward details: dusty and faded are common synonyms for dull; pastel colors are commonly between the light and pale variants; and neon and fluorescent are (mostly) synonyms for bright. Deep might better be considered the opposite of pastel, leaving pale without an obvious opposite.

So, now you can hear mauve described as 'a dull magenta', or lavender as 'a pale purple', and actually have a way to work out what that means.

May 25, 2010

XKCD Color Name Maps

All of the colors named in XKCD's list of color names:

Here's a new projection, with light intensity on the X axis, and saturation (distance from the nearest grey) on the Y axis:

Black is in the bottom-left corner (though it can't be seen against the background), and white is in the bottom-right, with the greys running along the bottom edge. Colors with different hues but the same intensity and saturation end up at the same location, so the other six corners of the RGB color cube overlap at two places along the top edge.
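The corner overlap can be checked with a short sketch. It assumes 0-255 RGB values and Euclidean distance to the equal-intensity grey; `project` is a name chosen here for illustration, not something from the original plots.

```python
import math
from itertools import product

def project(r, g, b):
    """Map an RGB color to (intensity, saturation), assuming Euclidean distance."""
    total = r + g + b
    grey = total / 3
    sat = math.sqrt((r - grey) ** 2 + (g - grey) ** 2 + (b - grey) ** 2)
    return total, round(sat, 6)

# Group the eight cube corners by where they land in the projection.
locations = {}
for corner in product((0, 255), repeat=3):
    locations.setdefault(project(*corner), []).append(corner)

for place, corners in sorted(locations.items()):
    print(place, corners)
```

Black and white each get a location of their own at zero saturation. Red, green, and blue share one location, and cyan, magenta, and yellow share another at the same saturation but twice the intensity.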

Compare to the X11 color names:

Many colors sit on top of one another in the second image, but it's obvious the color space isn't nearly as well covered, especially among the dark colors. Also, it's clear several groups of colors were artificially generated, most likely by increasing the non-maximum RGB values, which moves a saturated color towards white or grey while increasing its intensity.

Here are the top 100 XKCD color names (ranked by number of survey participants who gave the exact name at least once), labeled. Click to view full size.

May 20, 2010

XKCD Color Maps

Gissehel made some maps from the XKCD Color Survey raw data. (Via XKCD; I don't know who Gissehel is.) It's quite difficult to see the grey dots on a white background, and it's difficult to see which color(s) each dot represents. In short, I was unimpressed, so I made my own.

The projection is just a rotation of the RGB color cube to put the black and white corners in the center and the red corner on the right. This places more "colorful" or "saturated" colors further away from the center, at an angle corresponding to the hue. Light and dark variants of the same color end up at the same location.
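One way to write such a projection is sketched below. It looks straight down the grey axis and keeps the two coordinates perpendicular to it. The scaling and the choice of which way is "up" are assumptions, and `hue_plane` is a name made up for this example.

```python
import math

def hue_plane(r, g, b):
    """Project an RGB color onto the plane perpendicular to the grey axis.

    The x axis points towards pure red. The y axis is perpendicular to x
    within the plane, with green above and blue below (an assumed orientation).
    """
    x = (2 * r - g - b) / math.sqrt(6)
    y = (g - b) / math.sqrt(2)
    return x, y

print(hue_plane(0, 0, 0), hue_plane(255, 255, 255))    # black and white: center
print(hue_plane(255, 0, 0))                            # red: on the positive x axis
print(hue_plane(200, 50, 50), hue_plane(250, 100, 100))  # darker and lighter red
```

Adding the same amount to all three channels moves a color along the grey axis only, which is why the last two colors project to the same point. The angle `math.atan2(y, x)` then plays the role of hue, and the distance from the center the role of saturation.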

The color maps usually show a tight group of dots confined to a single hue. Non-color (spam) terms are relatively easy to spot, because they have a high density of responses all over the map. E.g., compare two colors with roughly the same number of data points, eggplant and ugly:

On a few colors though, people are genuinely confused. By far the most striking example is puce, which according to the dictionary is "flea colored", or "brownish purple", but which many people seem to believe is a shade of yellow-brown.

Melon seems to vary between watermelon, cantaloupe, and honeydew. Even watermelon occasionally refers to the green rind rather than the red interior.

Apple combines apple green and apple red:

Topaz refers to a gem with many possible colors, but people are apparently only confused about whether the color is teal, yellow, or (rarely) blue or pink.

Maps of color names which are spelling variations of the same name strongly resemble each other, right down to the confused people. E.g., chartreuse vs chartruse:

This leads me to wonder whether there's a method of computing a similarity between any two of these color maps in a way that would make it possible to merge variations of the same color term automatically, or at least produce a good list of potential mergers to go over by hand.
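One candidate, offered only as a sketch and not tried against the survey data: bin each map's points into a coarse grid and take the cosine similarity of the two sets of bin counts. The function names, the cell size, and the sample points below are all made up for illustration, and each map is assumed to contain at least one point.

```python
import math
from collections import Counter

def density(points, cell=16):
    """Count map points per grid cell and scale the counts to unit length."""
    counts = Counter((math.floor(x / cell), math.floor(y / cell)) for x, y in points)
    norm = math.sqrt(sum(n * n for n in counts.values()))
    return {key: n / norm for key, n in counts.items()}

def similarity(points_a, points_b, cell=16):
    """Cosine similarity of two maps' densities: 1.0 identical, 0.0 no overlap."""
    a, b = density(points_a, cell), density(points_b, cell)
    return sum(weight * b.get(key, 0.0) for key, weight in a.items())

# Hypothetical (x, y) map positions, invented for this example.
chartreuse = [(-40, 90), (-42, 95), (-38, 88), (60, 10)]
chartruse = [(-41, 92), (-39, 94), (-44, 89), (58, 12)]
eggplant = [(50, -100), (55, -96), (48, -104)]

print(similarity(chartreuse, chartruse))
print(similarity(chartreuse, eggplant))
```

Because the counts are normalized, two maps with very different numbers of responses can still score as similar. The cell size trades sensitivity for noise tolerance, and pairs scoring above some threshold would still need to be reviewed by hand.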

May 13, 2010

XKCD Colors

Randall Munroe (creator of XKCD) recently ran a color name survey. Roughly 150 thousand people responded, naming ~3.4 million colors, an average of ~22 per respondent. ~180 thousand unique color names were given, although only about 4% of these were given by ten or more different people. Munroe analyzed the list of names given by more than 100 different people (1228 names), and produced a somewhat filtered list of 949 color names assigned to RGB values.

Some spelling corrections were made (gray vs grey, fuschia vs fuchsia), but many remain:
  • forrest vs forest
  • kelley vs kelly
  • lavendar vs lavender
  • liliac vs lilac
  • ocher vs ocre vs ochre
  • orange(-y/-ish) vs orang(-y/ish)
  • perrywinkle vs periwinkle
  • purple(-y/-ish) vs purpl(-y/ish)
  • robin egg vs robin's egg
  • siena vs sienna
  • terra cotta vs terracota vs terracotta
  • toupe vs taupe
Non-English-language names haven't been filtered out (rosa, azul), at least one descriptive phrase made it in (blue with a hint of purple), and many colors differing only in punctuation haven't been merged (yellow green vs yellow/green vs yellowgreen). Note that the correct form (yellow-green) has been removed, even though it's the most common form in the raw data.

Although there are no duplicate colors in the list, there are some very similar color names that aren't clearly different colors (navy vs navy blue, olive vs olive green, yellow green vs green yellow). Sometimes two clearly different names seem to be synonyms for the same color (burgundy vs maroon, lilac vs lavender). The top 10 most similar pairs are:
  1. very light blue and really light blue
  2. ice blue and very pale blue
  3. poop and shit
  4. light beige and creme
  5. mahogany and dried blood
  6. very light blue and very pale blue
  7. amber and saffron
  8. ecru and ivory
  9. banana and faded yellow
  10. bile and baby puke green
Runners-up include forest green and british racing green, slate blue and steel blue, and orangey brown and orangish brown.

Despite all the flaws in the names, the method used to determine the RGB value for each color name is clearly superior to some of the other methods I've seen used in color name surveys: it avoids the edge problem (black ends up as #000000 instead of #1e1e20) and it is robust (in the sense that a median is more robust than an arithmetic mean because it is less affected by outliers). Also, the sheer volume of raw data is amazing, and it's all available for download and independent analysis.
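The point about robustness can be illustrated with a toy example. This is not the method used to assign the survey colors, only a demonstration of how a median and an arithmetic mean respond to an outlier. The responses are invented, and `per_channel` is a helper made up for this sketch.

```python
from statistics import mean, median

# Hypothetical responses for one color name, made up for illustration:
# four near-black answers and one stray white.
responses = [(0, 0, 0), (2, 1, 3), (0, 0, 0), (1, 0, 2), (255, 255, 255)]

def per_channel(stat, colors):
    """Apply a statistic to the R, G, and B channels independently."""
    return tuple(stat(channel) for channel in zip(*colors))

print(per_channel(mean, responses))    # dragged towards grey by the outlier
print(per_channel(median, responses))  # stays close to black
```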