<FOOLISH FOUR PORTFOLIO>

What's Data Mining?
...and what isn't

by Chris Rugaber

(May 13, 1999) If you've been reading this space this week, you know that Ann Coleman has been responding to an article published in the March/April 1999 issue of the Financial Analysts Journal. Titled "Mining Fool's Gold," it essentially asserts that the Foolish Four is the result of "data mining" and therefore that its backtested results provide little basis for believing it will perform similarly in the future.

Today, I want to point you to a post on our message boards by Dr. Alan Christensen, a geneticist at the University of Nebraska who posts under the screen name "drosophilosopher." His message gives an excellent example of the difference between "data mining" and what we're doing with the Foolish Four. For every year from 1962 through 1997, his statistical research ranked the 30 Dow stocks by RP score and compared the return of each rank position with that of the Dow 30 as a whole. He found that the stocks ranked #2, #3, and #5 by RP had higher returns than the Dow 30 overall, and that this result was statistically significant, meaning it was highly unlikely to be due to chance. He also found that, on average, the #4 stock did not outperform the Dow at all.
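His exact method isn't described here, so the following is only a sketch of one way such a rank-by-rank comparison could be run, under stated assumptions. Each year's data is a hypothetical list of `(ticker, rp_score, total_return)` tuples; "the Dow 30 as a whole" is approximated by the equal-weighted average return of the stocks that year (the real DJIA is price-weighted); and significance is judged with a simple one-sample t-statistic on each rank's yearly excess returns. `excess_by_rank` is an illustrative name, not an existing tool.

```python
from math import sqrt
from statistics import mean, stdev

def excess_by_rank(years):
    """Average excess return and t-statistic for each RP rank position.

    years: hypothetical dict mapping year -> list of
           (ticker, rp_score, total_return) for that year's Dow stocks.
    Excess return = the rank's return minus the equal-weighted average
    return of all stocks that year (an assumed stand-in for "the Dow 30").
    """
    excess = {}  # rank -> list of yearly excess returns
    for stocks in years.values():
        benchmark = mean(ret for _, _, ret in stocks)
        ranked = sorted(stocks, key=lambda s: s[1], reverse=True)  # #1 = highest RP
        for rank, (_, _, ret) in enumerate(ranked, start=1):
            excess.setdefault(rank, []).append(ret - benchmark)

    results = {}
    for rank, xs in sorted(excess.items()):
        avg = mean(xs)
        spread = stdev(xs) if len(xs) > 1 else 0.0
        # One-sample t-statistic: is the mean excess return distinguishable from 0?
        t = avg / (spread / sqrt(len(xs))) if spread > 0 else float("nan")
        results[rank] = (avg, t)
    return results
```

With 36 years of data, a rank whose t-statistic is above roughly 2 in absolute value is the kind of result that would be called unlikely to be due to chance; a rank like #4, with an average excess near zero, would show a small t-statistic.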

If we were data mining, we would then drop the #4 stock and keep only stocks #2, #3, and #5 in our portfolio. After all, such a portfolio would have produced great returns in the past. But would that make sense?

There is no reason to think that the stocks ranked #2, #3, and #5 by RP have some intrinsic characteristic that will enable them to return significantly more than the #4 RP stock in the future. That they did so in the past is an artifact of the data, most likely due to chance.

But there are logical reasons why high RP stocks generally provide significantly higher returns than the Dow as a whole. Those reasons lie in their characteristics: their high yield; their relatively low price; their status as Dow stocks, meaning they are huge, reliable businesses; and their collective history of maintaining their dividends (American corporations rarely cut their dividends, even in troubled times, for fear of alarming Wall Street).

So if the Motley Fool were to say, "OK, we've got one more refinement for you. Backtesting shows that the #4 RP stock doesn't do very well, so drop it," that would be a clear example of data mining. There is no logical basis for it, and it can't be justified or explained without reference to the data. The outperformance of high RP stocks in general, by contrast, can be explained without the data, as I just did above.
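The trap is easy to reproduce with made-up numbers. The hypothetical simulation below gives five rank positions the same true expected return (zero excess return, with an assumed 15% yearly standard deviation), picks whichever combination of ranks did best over one 36-year sample, and then checks that same combination on a fresh 36-year sample. Because every difference between ranks is pure noise, the in-sample winner always looks at least as good as holding all five ranks, yet there is no reason to expect it to keep that edge on new data. The function name and all parameters are illustrative assumptions.

```python
import random
from itertools import combinations
from statistics import mean

def simulate(n_years=36, ranks=5, seed=1):
    rng = random.Random(seed)

    # Hypothetical world: every rank has the SAME true expected excess return (0),
    # so any difference between ranks in a sample is noise.
    def draw(n):
        return [[rng.gauss(0.0, 0.15) for _ in range(ranks)] for _ in range(n)]

    in_sample, out_sample = draw(n_years), draw(n_years)

    def portfolio_mean(data, subset):
        # Equal-weight the chosen ranks each year, then average across years.
        return mean(mean(year[r] for r in subset) for year in data)

    # "Mine" every non-empty combination of ranks and keep the best backtest.
    subsets = [c for k in range(1, ranks + 1) for c in combinations(range(ranks), k)]
    best = max(subsets, key=lambda s: portfolio_mean(in_sample, s))
    everything = tuple(range(ranks))
    return {
        "best_subset": [r + 1 for r in best],  # 1-based rank labels
        "best_in_sample": portfolio_mean(in_sample, best),
        "all_in_sample": portfolio_mean(in_sample, everything),
        "best_out_of_sample": portfolio_mean(out_sample, best),
        "all_out_of_sample": portfolio_mean(out_sample, everything),
    }

print(simulate())
```

Trying different seeds shows the point: the mined subset's in-sample advantage is guaranteed by the search itself, while its out-of-sample results scatter around zero like everything else.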

This may prompt some of you to ask about RP #1: what's the basis for dropping that stock? Isn't that just data mining? In short, yes and no. Dropping the #1 RP stock stems from an observation: in 5 of the 7 years in which the lowest-priced stock was also the highest yielder (which gives that stock the highest RP score), it lost more than 19%.
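The condition behind that observation can be checked mechanically. A minimal sketch, assuming a hypothetical data layout in which each year holds a list of `(ticker, price, dividend_yield)` tuples plus a dict of that year's total returns; it also assumes, as the parenthetical implies, that RP rises with yield and falls with price, so a stock that is both cheapest and highest-yielding must rank #1. Ties are resolved by list order, and the function names are illustrative.

```python
def flagged_stock(stocks):
    """Return the ticker if the lowest-priced stock is also the highest
    yielder, else None. stocks: list of (ticker, price, dividend_yield)."""
    cheapest = min(stocks, key=lambda s: s[1])
    highest_yield = max(stocks, key=lambda s: s[2])
    return cheapest[0] if cheapest[0] == highest_yield[0] else None

def tally(history, threshold=-0.19):
    """Count (big_losses, flagged_years) over a hypothetical history dict:
    year -> (stocks, {ticker: total_return})."""
    flagged_years = big_losses = 0
    for stocks, returns in history.values():
        ticker = flagged_stock(stocks)
        if ticker is not None:
            flagged_years += 1
            if returns[ticker] < threshold:  # "lost more than 19%"
                big_losses += 1
    return big_losses, flagged_years
```

Run over the full backtest, the article's observation would correspond to `tally` returning `(5, 7)`.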

There was a theory behind the rule: such a stock was likely to be in real financial trouble rather than merely floundering temporarily. We have since found that the rule does not hold up well in portfolios that start in months other than January, so it is suspect. But because the fifth-ranked stock is also a strong performer, replacing the first stock with the fifth is essentially a simple way to avoid potential trouble with no downside.
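Assuming, as this paragraph implies, that the portfolio holds four stocks and that the change amounts to buying RP ranks #2 through #5 instead of #1 through #4, the selection step is just a slice of the sorted list. The RP formula itself is not spelled out here, so scores are taken as given; `select_portfolio` and its parameters are hypothetical.

```python
def select_portfolio(scores, size=4, skip_top=True):
    """scores: dict ticker -> RP score (assumed precomputed).
    Returns `size` tickers in RP order, skipping the #1 stock if asked."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # #1 = highest RP
    start = 1 if skip_top else 0
    return ranked[start:start + size]

scores = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "E": 1.0, "F": 0.5}
print(select_portfolio(scores))                  # ranks #2 through #5
print(select_portfolio(scores, skip_top=False))  # ranks #1 through #4
```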

This is a complicated topic overall, so please read Ann Coleman's articles from earlier this week and check out the message board discussion. Tonight I have looked at only one small piece of the debate.

By the way, if all this talk of statistical analysis has left your eyes glazed over, check out our Star Wars special for some more light-hearted fare. Fool on!



05/13/99 Close
Stock  Change   Last
--------------------
CAT  -1  1/4   60.63
JPM  +7  5/8   146.75
MMM  -2 15/16  90.94
IP   -   7/8   53.38



                Day   Month    Year   History
        FOOL-4   -0.41%   0.89%  30.09%  32.02%
        DJIA     +0.97%   2.95%  21.36%  20.88%
        S&P 500  +0.26%   2.43%  11.57%  11.84%
        NASDAQ   -0.94%   1.54%  17.75%  19.37%

    Rec'd   #  Security     In At       Now    Change

 12/24/98   24 Caterpillar   43.08     60.63    40.73%
 12/24/98    9 JP Morgan    105.51    146.75    39.09%
 12/24/98   14 3M            73.57     90.94    23.61%
 12/24/98   22 Int'l Paper   43.55     53.38    22.56%


    Rec'd   #  Security     In At     Value    Change

 12/24/98   24 Caterpillar 1034.00   1455.00   $421.00
 12/24/98    9 JP Morgan    949.62   1320.75   $371.13
 12/24/98   14 3M          1030.00   1273.13   $243.13
 12/24/98   22 Int'l Paper  958.12   1174.25   $216.13

              Dividends Received      $29.45
                             Cash     $28.26
                            TOTAL   $5280.84
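In 1999, stock prices were still quoted in fractions of a dollar, which is why each Value above divides evenly by its share count (for example, $1,455.00 / 24 = 60 5/8). The sketch below reproduces the Value column, the dollar Change column and the TOTAL line from share counts, cost and closing price. The fractional prices are inferred from the table rather than stated in it, and rounding halves up to the cent (1273.125 becomes 1273.13) is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction

def cents(x):
    """Round an exact Fraction to dollars and cents, halves rounded up."""
    return (Decimal(x.numerator) / Decimal(x.denominator)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP)

# (ticker, shares, total cost in $, closing price in fractional dollars)
# Prices are inferred by dividing the Value column by the share count.
HOLDINGS = [
    ("CAT", 24, Fraction("1034.00"), 60 + Fraction(5, 8)),
    ("JPM", 9, Fraction("949.62"), 146 + Fraction(3, 4)),
    ("MMM", 14, Fraction("1030.00"), 90 + Fraction(15, 16)),
    ("IP", 22, Fraction("958.12"), 53 + Fraction(3, 8)),
]

def value_portfolio(holdings, dividends, cash):
    rows = {}
    total = Fraction(dividends) + Fraction(cash)
    for ticker, shares, cost, price in holdings:
        value = shares * price  # exact market value, no float error
        rows[ticker] = (cents(value), cents(value - cost))
        total += value
    return rows, cents(total)

rows, total = value_portfolio(HOLDINGS, "29.45", "28.26")
for ticker, (value, gain) in rows.items():
    print(f"{ticker:4} {value:>9} {gain:>8}")
print("TOTAL", total)
```

Working in exact fractions until the final rounding avoids the small binary floating-point errors that would otherwise creep into sums of sixteenths.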

</FOOLISH FOUR PORTFOLIO>