<FOOLISH FOUR PORTFOLIO>

Testing the Foolish 4
Part 2 of data mining

by Ann Coleman (TMF AnnC)

Alexandria, VA (May 11, 1999) -- One of the dangers of academic research is that it can encourage the finding of complications where none exist. Yesterday I described how the Foolish Four strategy originated in response to a charge of "data mining."

The authors of the article, Grant McQueen and Steven Thorley, are concerned that the Foolish Four is the 4th generation of "data mining." The first generation was the Dogs of the Dow (also known as the High Yield 10). If irrational conclusions are indeed being drawn, then drawing further conclusions based on them would be dumb.

They express concern because they do not know how many factors were looked at before the original correlation between high yield and high stock price appreciation was discovered, but, as I said yesterday, it wouldn't take a genius, or even a Fool with access to a computer and a database, to wonder if higher yielding Dow stocks might do better than the group as a whole.

Remember, "data mining" is not finding a correlation between two factors -- it is looking through a large number of factors and picking the one that correlates most highly with what you are looking for or adding factors that improve results but that have no logical reason for doing so.

Simply speculating that high yield and stock price appreciation are correlated (for obvious reasons -- many people and many mutual funds buy stocks for the yield) and then looking at those two factors over time and finding that they ARE correlated is not "data mining." I would call it formulating a hypothesis and testing it.

But even if the hypothesis tests out, it doesn't prove that high yield causes price appreciation or that all stocks with high yields will go up or even that in the future, Dow stocks with higher yields will tend to outperform lower yielding Dow stocks. It doesn't really prove anything except that the correlation existed in the past.

However, one way to test whether a correlation is sound or spurious is to look at it over different time periods. As it turns out, the higher performance of high yield Dow stocks holds all the way back to 1929. In fact, there is no 20-year period and only one 10-year period from 1929 through today where the High Yield 10 did not beat the Standard & Poor's 500 Index.

In my opinion, this connection between yield and price appreciation for Dow stocks and companies similar to the Dow companies (I can't believe that it would hold for all stocks) is as easily established and reasonable as the connection between earnings per share and stock price, which I don't believe anyone seriously questions. All other things being equal, companies that make money (high earnings per share, or EPS) are more valuable than companies that don't, and the more money the company makes, the more valuable its stock. Does anyone have a problem with that?

The correlation between EPS and share price appreciation could easily have been "discovered" through data mining (not that it would have gone unnoticed before the data explosion that so worries McQueen and Thorley). Having been discovered that way would NOT make the correlation less valid. There is a reason behind it. Therefore, it really doesn't make any difference to me whether the high yield correlation was discovered by someone running a million factors through a Cray or by a lone money manager who noticed that the stocks everyone had been badmouthing last year usually did pretty well the next year.

This is the first thing that puzzles me about this study. If McQueen and Thorley were really looking for a "case study in data mining," why didn't they find one that was actually created that way and that subsequently failed? Simply saying that they don't know how the strategy originated is hardly damning proof of a bad process.

If they had asked, I am sure they could have found many things about the way the Foolish Four was originally conceived and subsequently modified that didn't meet their academic standards, and that, looking back, I would have to say were naive. But we muddled through that and came to a sound and reasonable conclusion.

It's pretty obvious to me that what we did in the beginning wasn't data mining. It was far from academic-standard research, but it was not looking through many variables to find the one with the highest return. We only had one variable to look at -- rank -- and no one was contending that the price rank of a high yield stock was, in itself, significant. Mostly we were trying to understand what was up with Beating the Dow. Our discovery that the lowest-priced stock had a much lower average return than the others and the idea of leaving it out to improve the return is characteristic of data mining, however.

That bothered us. It bothered one Foolish staff member in particular a whole lot. Randy Befumo never accepted anything at face value. This "number voodoo" that the Beating the Dow message board was so excited about in 1995 bugged him so much that he got out and dug up the information that was missing. He looked up the dividends and prices of all 30 Dow stocks back to 1961. This was our original Dow Dividend Spreadsheet, which we have been selling (and updating) since 1996.

Having those numbers let us verify the actual returns of the High Yield 10, the Beating the Dow 5, the PPP, and the strategy variations that I had tested using just Michael O'Higgins's annual returns -- and it let us look at a time period that Michael O'Higgins had not mentioned. What we found was interesting and educational.

First, the general premise was still valid -- the High Yield stocks did do better. During the 12 additional years (1961-1972) that Randy dug up, the High Yield 10 beat the Dow seven times, but more importantly, when we looked at five-year rolling returns for that period, the High Yield 10 beat the Dow in six of eight five-year periods, and the other two were virtual ties -- you had to carry the return out to two decimal places before a difference showed up.
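For Fools who want to run this kind of check on their own numbers, the rolling-period comparison can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet calculation itself: the function names are made up for this example, returns are entered as fractions (0.12 for 12%), and each five-year return is assumed to be the compounded total over the window. Note that 12 annual returns produce eight overlapping five-year periods, which matches the count above.

```python
from math import prod


def rolling_returns(annual_returns, window=5):
    """Compounded total return for each overlapping window of years.

    annual_returns: yearly returns as fractions, oldest first.
    Assumption: a multi-year return is the product of (1 + r) minus 1.
    """
    return [
        prod(1 + r for r in annual_returns[i:i + window]) - 1
        for i in range(len(annual_returns) - window + 1)
    ]


def count_wins(strategy, benchmark, window=5):
    """Return (periods the strategy beat the benchmark, total periods)."""
    s = rolling_returns(strategy, window)
    b = rolling_returns(benchmark, window)
    wins = sum(x > y for x, y in zip(s, b))
    return wins, len(s)
```

Feeding in the 12 annual returns for the High Yield 10 and the Dow (not reproduced here) would give the number of winning five-year periods out of eight. Exact ties count as losses in this sketch, so near-ties like the two mentioned above would need a closer look.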

BUT, in the '60s, the High Yield 5 and not the Beating the Dow 5 had the best record, and the PPP was a disaster. It was the LOWEST priced high yielding stock, not the second lowest priced, that had the highest average returns.

Hummmmmm. What to make of this?

Did that mean that the high returns of the 5 lowest-priced stocks and the PPP from 1971 on were simply a statistical variation, or did it mean that something about the market changed in the early '70s? And if something had changed, what was it, and was it still in effect?

Good questions. I don't know the answers, folks. But until I find the answers I'm not going to toss a strategy that has worked and continues to work. What we see, when we look carefully at our data, is that prior to 1971, the high yield approach is still valid but the refinement of selecting for low price didn't seem to make any difference. If we carried the study back another 10-20 years, we might find the same thing, or we might find that the '60s were an anomalous period. What is important to me is that, now that we had data to work with, Fools got busy, and two things emerged.

Robert Sheard, who used to fill this space every day, noticed that the really bad years that were dragging down the average performance of the lowest-priced stocks were very strongly correlated with the times when the lowest-priced stock was also the highest yielding. (By high correlation I mean that in 5 out of the 7 times that this occurred, the return was lower than minus 19%.)

Now, since yield is driven up when price goes down, this is essentially saying the same thing -- when the lowest-priced stock also has the highest yield, the stock's price has been driven down really low. O'Higgins's contention that the lowest-priced stock was often a stock in real trouble as opposed to just temporarily out of favor is consistent with that.
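The arithmetic behind "yield is driven up when price goes down" is simple enough to show directly. In this small sketch the dividend and prices are hypothetical numbers chosen for illustration, and the function name is made up for the example; the only assumption is the usual definition of yield as annual dividend divided by share price.

```python
def dividend_yield(annual_dividend, price):
    """Yield as a fraction: annual dividend per share divided by share price."""
    return annual_dividend / price


# Hypothetical stock paying $2.00 a year in dividends.
before = dividend_yield(2.00, 50.00)  # price at $50
after = dividend_yield(2.00, 40.00)   # same dividend, price falls to $40

print(f"{before:.1%} -> {after:.1%}")  # prints "4.0% -> 5.0%"
```

With the dividend unchanged, a 20% drop in price lifts the yield from 4% to 5%, which is why a stock that is both the cheapest and the highest yielding has usually been beaten down hard.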

If that situation were predictive (and it was only speculation that it might actually be predictive), then we had a way to avoid that lowest-priced stock when it was in danger of tanking. Robert tested his theory (the only theory he had, because the correlation was pretty obvious) and found that it did, indeed, improve the average returns over the entire range of the database.

Because we had never felt completely comfortable with the idea of a stock being "bad" when it showed up as number 1 on the list but wonderful when it showed up as number 2 (thanks to Randy's hammering us about the dangers of data mining), we changed the way the Foolish Four was selected. The new trading rule was: buy the four lowest-priced stocks on the High Yield 10 list except when the lowest-priced stock is also the highest yielding.
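The rule is mechanical enough to write down as a short Python sketch. Treat it as an illustration under stated assumptions rather than the official screen: the `Stock` type and function names are invented for this example, the High Yield 10 is taken to be the ten highest-yielding of the 30 Dow stocks, and when the exception applies the sketch assumes you skip the lowest-priced stock and buy the next four lowest-priced instead (the rule above says only to avoid that stock). Ties in price or yield are not handled specially.

```python
from typing import NamedTuple


class Stock(NamedTuple):
    ticker: str
    price: float
    annual_dividend: float

    @property
    def dividend_yield(self) -> float:
        # Yield as a fraction: annual dividend divided by share price.
        return self.annual_dividend / self.price


def high_yield_10(dow):
    """Ten highest-yielding stocks, highest yield first."""
    return sorted(dow, key=lambda s: s.dividend_yield, reverse=True)[:10]


def foolish_four(dow):
    """Four lowest-priced High Yield 10 stocks, with the exception above."""
    hy10 = high_yield_10(dow)
    by_price = sorted(hy10, key=lambda s: s.price)
    highest_yielder = hy10[0]
    if by_price[0].ticker == highest_yielder.ticker:
        # Assumption: skip the lowest-priced stock, take the next four.
        return by_price[1:5]
    return by_price[:4]
```

Passing in a list of 30 `Stock` records with current prices and dividends would return the four picks, cheapest first.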

I suspect that McQueen and Thorley are thinking I've proved their point right about now -- but proving a point isn't my goal here. My goal is explaining exactly what happened, when, and why, so that readers can judge for themselves.

Tomorrow: We really start to mine the data -- the RP process.

Fool on and prosper!



05/11/99 Close
Stock   Change      Last
-------------------------
CAT     +1  7/16    63.19
JPM     -   3/8    135.31
MMM     +  13/16    95.00
IP      -2  1/4     53.88



                    Day    Month    Year   History
        FOOL-4    -0.13%   1.39%  30.74%  32.68%
        DJIA      +0.17%   2.20%  20.48%  20.00%
        S&P 500   +1.14%   1.53%  10.60%  10.87%
        NASDAQ    +1.59%   0.94%  17.06%  18.66%

    Rec'd   #  Security     In At       Now    Change

 12/24/98   24 Caterpillar   43.08     63.19    46.67%
 12/24/98   14 3M            73.57     95.00    29.13%
 12/24/98    9 JP Morgan    105.51    135.31    28.25%
 12/24/98   22 Int'l Paper   43.55     53.88    23.71%


    Rec'd   #  Security     In At     Value    Change

 12/24/98   24 Caterpillar 1034.00   1516.50   $482.50
 12/24/98   14 3M          1030.00   1330.00   $300.00
 12/24/98    9 JP Morgan    949.62   1217.81   $268.19
 12/24/98   22 Int'l Paper  958.12   1185.25   $227.13

              Dividends Received      $29.45
                             Cash     $28.26
                            TOTAL   $5307.27





</FOOLISH FOUR PORTFOLIO>