Friday, November 28, 1997

The Daily Workshop Report
Randy Befumo (TMF Templr)

Statistical Certainty and Relevance

Given that Robert has the day off today, instead of talking about the existing screens here in the Workshop, I thought I would give readers a look at how a statistician would analyze the relevance of this data.

All too often investors see a back-tested screen and assume that going forward the returns are a fait accompli. Although such results certainly should not be dismissed out of hand as "noise," back testing is more of an art than a science. Even if you compiled an incontrovertible record spanning 500 years of daily data, you could only have, say, 95% certainty that the pattern would continue. Statistics, much like life, never offer guarantees.

Even more interesting, a statistician looking at back-tested data would offer only one word to describe even the highest-return model -- "promising." This is because, for the statistician, proof lies in how predictive the model is after it has been discovered. Part of the problem with back-tested returns is that over any period of time, the numbers can be optimized and massaged to fit the past. As a result, proof comes not from what the model would have predicted in the past, but from what it does after it starts making real predictions. That, many statisticians would argue, is when the real data begins.

The key thing for readers to recognize is that the word "back-tested" indicates that some due diligence has been exercised, but in no way guarantees any level of return. Now, if a back-tested screen outperforms the market as a whole by a wide enough margin over a long enough time, you certainly might have something there. This is where tools like standard deviation come in.

By looking at the volatility of returns over the entire period, you can calculate a number called standard deviation, which measures how much the model's return typically swings above or below its average in a given year. The standard deviation helps tell you how statistically significant the outperformance or underperformance really is. The more standard deviations (or "sigmas") by which the back-tested screen beats the market, the more statistically significant the result is deemed.
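
For readers who like to see the arithmetic, here is a minimal sketch in Python that computes the average and standard deviation of a screen's yearly returns. The figures in annual_returns are purely hypothetical, made up to illustrate the calculation rather than taken from any actual screen.

    from statistics import mean, stdev

    # Hypothetical annual returns (in percent) for a back-tested screen.
    annual_returns = [22.0, 15.0, 31.0, 8.0, 27.0, 19.0, 12.0, 25.0]

    average = mean(annual_returns)   # the screen's average yearly return
    sigma = stdev(annual_returns)    # sample standard deviation, or "sigma"

    print(f"Average annual return: {average:.1f}%")
    print(f"Standard deviation:    {sigma:.1f}%")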

If a back-tested screen (known to some simply as a model) returns 20% per year with a standard deviation of 10% while the market averaged 18% over the same period, the outperformance cannot be deemed statistically significant. However, if the screen returns 24% per year with a standard deviation of 5% while the market does 13%, that is more than two standard deviations of outperformance and can be considered quite promising.
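
The same back-of-the-envelope check takes only a couple of lines: divide the outperformance by the screen's standard deviation to get a rough sigma count. The sketch below uses the figures from the second example in the paragraph above.

    # Figures from the second example above: a screen returning 24% per year
    # with a 5% standard deviation, versus a market averaging 13%.
    screen_return = 24.0
    screen_sigma = 5.0
    market_return = 13.0

    sigmas = (screen_return - market_return) / screen_sigma
    print(f"Outperformance: {sigmas:.1f} standard deviations")  # roughly 2.2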

Unfortunately for many investors, forward testing of back-tested screens takes time. When an investor sees a promising bit of data based on 10 to 20 years of back-testing, the tendency is to plunge right in. Although in the end what you do with your money is your decision, exercising some caution here might make sense. Particularly when the period covered is short, or when it falls in a stretch like the 1980s and 1990s, when market returns have run three or four standard deviations above the long-run average of equity returns over the past 200 years, drawing firm conclusions may be rash.

In the end, any "model" is as much a product of its user as it is a product of the market. Every time you tweak it based on subjective criteria, you further distance it from any relevance conferred by the back-testing. On the other hand, if you blindly implement a model that turns out to have no predictive value, you destroy wealth. A nice middle ground is to use any screen that has not yet accumulated much post-discovery data as an idea-generation vehicle rather than as a mechanical model.
