Foolish Workshop
By
Statistical Significance and You
Or, a lucky toss of the grapefruit
DES PLAINES, IL (Sept. 23, 1999) -- One of the questions that people have about the high-performing Workshop strategies is "What's the possibility that these results are just a fluke?"
From a mathematical standpoint, the question is "Are the returns from these screens statistically significant?"
There are several ways to test for statistical significance. This week we begin an attempt to describe what's being tested, and along the way we'll explain some of the mathematical jargon.
Let's say you chose a thousand people at random and asked them to throw a grapefruit as far as possible. Let's assume that the average person can lob a grapefruit 60 feet. If you kept track, you might find that most (about two-thirds) of the people tossed the grapefruit between 45 and 75 feet, about half of the rest chucked the citrus between 76 and 90 feet, and only about 25 or so managed to heave it over 90 feet. On the other hand, there were some people who could only throw a grapefruit 30-44 feet, and about 25 who didn't even make it that far. The graph of those results would look a lot like this:
[Graph: Grapefruit Distance]
This is a "normal distribution curve," also known as a "bell curve" because it looks like a bell. Not all phenomena are normally distributed, but many naturally occurring measurements follow this pattern pretty closely. Most of the measurements will cluster around the average. The frequency of occurrence drops as the results get farther and farther from the average.
When judging statistical significance, one of the main terms you'll want to understand is "standard deviation." This tells you how far your results tend to be from the average. I won't show the formula here, but suffice it to say that in a normal distribution, about two-thirds of the results will be within one standard deviation of the average, and about 95% (19 out of 20) will be within two standard deviations.
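For readers who want to check the "two-thirds" and "95%" figures, here is a minimal sketch in Python. It relies on the standard result that, for a normal distribution, the share of results within k standard deviations of the average equals erf(k / sqrt(2)). The function name fraction_within is just an illustrative choice.

```python
import math

def fraction_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the average."""
    return math.erf(k / math.sqrt(2))

# Compare with the rules of thumb: about two-thirds, and about 95%.
for k in (1, 2):
    print(f"within {k} standard deviation(s): {fraction_within(k):.1%}")
```

The exact values come out a little above two-thirds (roughly 68%) and a little above 95%, so the rules of thumb are close enough for everyday use.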
In the example above, the standard deviation for grapefruit tossing looks to be about 15 feet. The average distance was 60 feet and two-thirds of the tosses are within 15 feet of that average (i.e., between 45 and 75 feet), so they are within one standard deviation of the mean. Most of the rest are within 30 feet (i.e., between 30 and 90 feet), or within two standard deviations of the average. As you can see, the frequency trails off gradually. Nearly all of the remaining 5% will be within three standard deviations, and only a tiny remainder (about 0.3%) will be outside of three standard deviations, half on the long side and half on the short side.
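To see what an exact bell curve would predict for our thousand throwers, the sketch below uses Python's statistics.NormalDist with the 60-foot average and 15-foot standard deviation estimated above. The band labels and variable names are illustrative, and real grapefruit tossers would not fit the curve this neatly.

```python
from statistics import NormalDist

MEAN_FEET = 60.0   # average toss, from the example
SD_FEET = 15.0     # standard deviation estimated in the example
THROWERS = 1000    # size of the imagined sample

tosses = NormalDist(mu=MEAN_FEET, sigma=SD_FEET)

# Band boundaries sit at one and two standard deviations either side of the average.
edges = [30.0, 45.0, 75.0, 90.0]
labels = ["under 30 ft", "30 to 45 ft", "45 to 75 ft", "75 to 90 ft", "over 90 ft"]

# Cumulative share of throwers at each boundary, padded with 0 and 1 for the open ends.
cumulative = [0.0] + [tosses.cdf(edge) for edge in edges] + [1.0]

expected = {
    label: THROWERS * (high - low)
    for label, low, high in zip(labels, cumulative, cumulative[1:])
}

for label, count in expected.items():
    print(f"{label:>12}: about {count:.0f} throwers")

# Share of throwers more than three standard deviations from the average (both tails).
beyond_three_sd = 2 * tosses.cdf(MEAN_FEET - 3 * SD_FEET)
print(f"more than three standard deviations away: {beyond_three_sd:.2%}")
```

The middle band holds roughly 680 of the 1,000 throwers, and each outer band shrinks quickly, which is the "trailing off" visible on the graph.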
Why is this important? Imagine that we're trying to identify a group of people who would likely be great grapefruit throwers without heaving more citrus everywhere. Where would we look? Obviously we want to look at the group that threw the grapefruit the farthest -- the guys who scored way over there on the right side of the graph where the curve tails off. Mathematically, we want to look for throwers three or more positive standard deviations from the mean. And then we want to find more fruit lobbers who share the same characteristics as our champions.
That's rather obvious intuitively. But why? It's simple when you think about it. Nobody throws the same distance every time. An average thrower might toss a fruit 63 feet one time and 55 the next. But who is more likely to toss it 100 feet? The guy who tossed it 63 feet in our test or the guy who tossed it 90 feet? Statistics tells us how likely it is that an unusual event occurred by random chance rather than because the grapefruit tosser was 6 foot 7 and 350 pounds of muscle.
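The 63-foot versus 90-foot comparison can be put in rough numbers. The sketch below rests on two assumptions that are not in the article: each person's tosses vary normally around a personal average with a standard deviation of 8 feet, and the single test toss is treated as that personal average. Both are simplifications chosen only for illustration, and the names are hypothetical.

```python
from statistics import NormalDist

# ASSUMPTION (not from the article): toss-to-toss variation for one person
# is normal with a standard deviation of 8 feet.
PERSONAL_SD_FEET = 8.0
TARGET_FEET = 100.0

def chance_of_reaching(personal_average: float) -> float:
    """Chance that a single toss travels at least TARGET_FEET.

    ASSUMPTION: the thrower's one test toss is taken as the personal average.
    """
    thrower = NormalDist(mu=personal_average, sigma=PERSONAL_SD_FEET)
    return 1.0 - thrower.cdf(TARGET_FEET)

for test_toss in (63.0, 90.0):
    print(f"{test_toss:.0f}-foot thrower: {chance_of_reaching(test_toss):.6f}")
```

Under these assumptions the 90-foot thrower reaches 100 feet roughly one time in ten, while the 63-foot thrower almost never does. A different assumed spread would change the exact figures but not the ordering.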
Next week we'll explore the relationship between the standard deviation and probability. Until then, have a great week, and Fool On!
Acknowledgment -- This article was inspired by a piece done by Bill James in his 1984 Baseball Abstract. His common-sense explanation of bell curves and standard deviations has stuck with me ever since.