Tuesday, June 5, 2007

What do you Mean, Mean?

There are three kinds of lies: lies, damned lies, and statistics.
-Benjamin Disraeli

When I look at most statistics, I see phrases such as "the average x" or "the median y". Even though I know better, I tend to read them both as "average". This can give a false impression of what's happening. What's the difference and what does that difference mean?

If you haven't repressed your junior high math yet, you remember that the mean, or average, of a set of numbers is the sum of the numbers divided by how many numbers there are. For example, if you have a group of five people aged 10, 12, 14, 15, and 52, the mean would be (10+12+14+15+52)/5 = 103/5, or 20.6. This is accurate, but it means that four of the five people are below the mean. What's worse, if you looked at the mean by itself, you would get the impression that this is a group of young adults, probably college-aged, which is completely wrong.
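To make the arithmetic concrete, here is a small Python sketch that computes the mean of the five ages from the example and counts how many of them fall below it. The function name `mean` is just an illustrative choice; Python's standard `statistics.mean` does the same job.

```python
# Ages from the example above.
ages = [10, 12, 14, 15, 52]

def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

avg = mean(ages)
below = sum(1 for a in ages if a < avg)  # how many people sit below the mean

print(f"mean = {avg:.1f}")                          # mean = 20.6
print(f"{below} of {len(ages)} are below the mean")  # 4 of 5 are below the mean
```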

The median is the "middle" value when you order the values from least to greatest (if there is an even number of values, you take the average of the two middle ones). In the example above, the median would be 14, which gives the impression of a group of young teens. This gives a better picture of the group, even if it does ignore the oldest person (which a group of pre-teens and young teens would probably be doing anyway).
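The same rule, sort and take the middle value or average the two middle values, can be written out as a short Python sketch. The hand-written `median` function is illustrative; it is checked against the standard library's `statistics.median`.

```python
import statistics

def median(values):
    """Middle value of the sorted data; average the two middle values for an even count."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average of the middle two

ages = [10, 12, 14, 15, 52]
print(median(ages))              # 14
print(median([10, 12, 14, 15]))  # 13.0
print(statistics.median(ages))   # 14
```

Note that the 52-year-old could be 25 or 95 and the median would still be 14. That is exactly why it resists being dragged around by one outlier.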

There is additional information you can provide, such as the standard deviation (a measure of how spread out the values are around the mean), that will improve the picture for the reader, but it causes most people's eyes to glaze over. The question is, which one do you use?
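For readers whose eyes haven't glazed over yet, here is a minimal Python sketch of the population standard deviation for the same five ages: the square root of the average squared distance from the mean. The choice of the population form (dividing by n rather than n - 1) is an assumption made for simplicity; `statistics.stdev` gives the sample version.

```python
import math
import statistics

ages = [10, 12, 14, 15, 52]

def population_std_dev(values):
    """Square root of the average squared distance from the mean."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

sd = population_std_dev(ages)
print(f"mean = {statistics.mean(ages):.1f}, std dev = {sd:.1f}")  # mean = 20.6, std dev = 15.8
```

A standard deviation that large relative to the mean is a hint that the mean alone is hiding something.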

If the data can skew heavily in one direction, then the median may be the right choice. Income is one example of this. One investment banker's pay can raise the average wage of dozens of tellers, making the average earnings in the finance industry fairly meaningless by itself.
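To see how far one large value can pull the mean, here is a sketch with made-up numbers. The salaries below are hypothetical, invented only to illustrate the effect, and are not real wage data.

```python
from statistics import mean, median

# Hypothetical figures, invented purely for illustration (not real wage data):
# twelve tellers earning 30,000 and one investment banker earning 1,000,000.
salaries = [30_000] * 12 + [1_000_000]

print(f"mean   = {mean(salaries):,.0f}")    # mean   = 104,615
print(f"median = {median(salaries):,.0f}")  # median = 30,000
```

The mean suggests a typical worker in this group earns over 100,000, even though twelve of the thirteen people earn 30,000. The median reports what most of them actually take home.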

If the data is naturally restricted to a narrow range, then the mean may be more appropriate. There aren't any 1000-degree days in January, so the average high temperature is probably useful information.

The key here is to know the type of data being represented and to make your decision based on that. Even more important is to know what is appropriate for other people's data and to make sure they are providing an accurate picture.
