A research project is under way to determine whether the Foolish Four formula could choose non-Dow stocks that outperformed the market. We will apply the RP formula to different stocks in different timeframes, but keep all other factors exactly the same. The results will tell us about the past, but still won't tell us what the strategy will do in the future.
In case you just arrived on this planet: the Foolish Four has been under considerable attack since an article appeared in the Financial Analysts Journal more than a year ago (click here for a "lighthearted" Adobe Acrobat version -- it's a download), charging that it was a product of "data mining" (which is true) and, therefore, not valid (which is what we want to find out). I don't accept that conclusion, since data mining can be a very useful way to discover valid correlations as well as invalid ones.
The arguments made in the McQueen and Thorley article were answered in this space, as far as they could be answered at that time. (See Fool's Gold? and the 10 or so articles following it.) Our discussion board community also asked the same questions and expanded on the statistical problems that we faced.
Some participants became very passionate and vocal critics of the strategy. It was largely due to their persistence, and their many and often sensible arguments, that I decided to run another test. If it were just the McQueen and Thorley article, I would never have gone to all this trouble!
The crux of the matter, as I see it, is that there is no way to run any kind of statistical test that establishes the validity of the strategy using the data we have now. That's because of the multiple hypotheses problem: Say you were using historical data to look for a market-beating strategy. First, you tried a strategy that picked stocks with low P/Es (price-to-earnings ratios), but they didn't do as well as you hoped. So, you tried high P/Es and that didn't work, either. So, you tried low debt ratios combined with low P/Es, etc. You kept trying one strategy after another and one day -- bingo -- you found one that just killed the market.
You run a statistical test, and it shows your strategy outperforming the market by such a wide margin that there is only a 5% probability that the result was due to chance.
Is it time to mortgage your house and put everything into that baby? Don't bet on it.
You have to turn things around: The question to ask is, how many strategies did you test? Suppose you had tested 20 invalid strategies. Chances are that one of them would have performed that well simply by chance. That is all your statistics told you. That is something I didn't understand a couple of years ago when I ran a one-tailed t-test. (Thanks go to Repoonsatad -- a.k.a. Datasnooper -- and others on the discussion board who explained it to me many, many times until it sank in.)
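To put a rough number on that intuition, here is a minimal sketch in Python. The 5% significance level and the 20 strategies come from the example above; treating the 20 tests as independent is an added simplifying assumption.

```python
# Probability that at least one of several *invalid* strategies
# still passes a test at the 5% significance level purely by chance.
significance_level = 0.05   # the cutoff used in the example above
strategies_tested = 20      # hypothetical number of strategies tried

# Assuming the tests are independent, the chance that none of them
# produces a false positive is (1 - 0.05) raised to the 20th power.
p_no_false_positive = (1 - significance_level) ** strategies_tested
p_at_least_one = 1 - p_no_false_positive

print(f"Chance of at least one fluke 'winner': {p_at_least_one:.0%}")
# Prints roughly 64% -- so finding one "market-beating" strategy out of
# 20 tries is closer to a coin flip than to a 1-in-20 longshot.
```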
The fact that the test could not be considered valid because of the multiple hypotheses problem doesn't mean that the strategy is no good. It just means we are back to square one, looking at the historical record and judging whether the criteria make sense.
Now, the problem with the Foolish Four is that its parentage and history are very murky before 1994. We don't know how many strategies were tested by Michael O'Higgins -- author of Beating the Dow -- and others. I don't believe it was very many. Others think it could have been hundreds or even thousands. There is just no way to know and, therefore, no way to modify a statistical test to see if the results still come out statistically significant.
So, we punt. A different tactic for testing validity -- and, in my opinion, a more useful one -- is to test your strategy on a different data set. While the association in the first set of data could be due simply to luck, it would be very unlikely for that same bit of luck to be at work in the second, independent data set.
So, if the RP ratio, which combines yield and price in one number, selects Dow stocks that tend to outperform the Dow, then it should select outperforming stocks that are not on the Dow, provided the companies are more or less of the same type. (No one is arguing that the high-yield/low-price formula would apply to small caps or utilities, I hope.)
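For anyone who hasn't followed the earlier articles, here is a small illustrative sketch of that ranking step. It assumes the commonly published form of the RP ratio (dividend yield squared divided by share price) and uses made-up tickers and numbers, not real data.

```python
# Sketch of ranking stocks by an RP-style ratio (yield^2 / price).
# The formula and the sample data are illustrative assumptions,
# not taken from the article itself.

stocks = {
    # ticker: (dividend yield as a decimal, share price in dollars)
    "AAA": (0.045, 38.0),
    "BBB": (0.032, 55.0),
    "CCC": (0.051, 72.0),
    "DDD": (0.028, 24.0),
}

def rp_ratio(dividend_yield, price):
    """Combine yield and price into one number: higher means 'cheaper.'"""
    return dividend_yield ** 2 / price

ranked = sorted(stocks, key=lambda t: rp_ratio(*stocks[t]), reverse=True)
print(ranked)  # tickers listed with the highest RP ratio first
```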
The rationale for using the Dow has been that these are very big, dividend-paying companies, with huge resources behind them and the ability to weather market storms and bounce back from minor troubles. The Dow is made up (usually) of companies like that, but not entirely.
Gathering the data to retest the Foolish Four formula on a different set of stocks was the problem. Ethan Haskel provided some supporting evidence when he developed the "Beating the S&P" strategy. That strategy created an un-Dow, a list of 30 large-cap stocks that excluded Dow stocks. His original backtest only covered 10 years, with one January portfolio per year. That wasn't enough to convince anyone, although it certainly carried some weight with me.
Eventually, we decided to do it up right and spring for the gold standard database -- the same one used by McQueen and Thorley and most academic researchers -- which is maintained by the Center for Research in Security Prices (CRSP) at the University of Chicago Graduate School of Business.
So, here's the plan: To keep things very simple, and hopefully non-controversial (silly me!), we are going to test the Foolish Four over the last 50 years, decade by decade. Instead of using the Dow for our first cut, we will substitute the criterion of market cap -- we will simply use the 30 largest U.S. companies. (The database already excludes transportation and utilities because they have always been excluded by the Dow, and we want to change as few factors as possible.)
So, for each month of the last 50 years, we will choose the 30 largest companies by market cap and rank them by their RP ratio. The database conveniently provides total return figures for each stock, so all we have to do is average the total return after one year for the second- through fifth-ranked stocks each month. That will give us 594 Foolish Four portfolios. (The last portfolios with full 12-month returns are those formed in the first half of 1999, so technically we have 49.5 years of data.)
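In code, one month's worth of that procedure looks roughly like the sketch below. The field names and the stand-in `universe` list are assumptions for illustration; the real number crunching happens in Bob Price's macros against the CRSP database.

```python
# Rough sketch of one month's Foolish Four portfolio from a large-cap universe.
# `universe` stands in for one month of CRSP-style records; the field names
# are placeholders, not the actual database layout.

def foolish_four_return(universe):
    """Average 12-month total return of the 2nd- through 5th-ranked stocks."""
    # 1. Keep the 30 largest companies by market cap
    #    (transports and utilities already excluded upstream).
    largest_30 = sorted(universe, key=lambda s: s["market_cap"], reverse=True)[:30]

    # 2. Rank them by the RP ratio, highest first.
    ranked = sorted(largest_30, key=lambda s: s["rp_ratio"], reverse=True)

    # 3. Skip the top-ranked stock and average the next four
    #    (the second- through fifth-ranked stocks).
    picks = ranked[1:5]
    return sum(s["total_return_12m"] for s in picks) / len(picks)

# Repeating this for each of the 594 monthly start dates gives the full test.
```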
That will give us an exact analog of the Foolish Four strategy (current version), with only one criterion changed: the stocks on which it is used. And it will let us look at how the strategy did prior to the current database we have for Dow stocks, which only goes back to 1962. It will give us an out-of-sample test in two ways: time and stocks.
Where are we now? Last Thursday we got the final (I hope) version of the database from CRSP. Bob Price is lead geek on this -- he's been developing the macros that will do the number crunching.
We still have to remove the Dow stocks, which is quite labor-intensive, so we have no idea how the returns will play out. But I can tell you one thing: There's no "January effect." Returns follow no seasonal pattern at all. That is a big relief, although I was surprised to see it. I really bought the tax-loss selling and "window-dressing" arguments: money managers and corporations try to make their portfolios look better by buying the successful stocks near year-end, driving them up a bit, and selling the less-successful ones, driving them down a bit, or they sell their worst stocks near the end of the year to get the tax write-off.
For Foolish Four investors, that would have meant possibly getting the downtrodden stocks at a slightly better price. Apparently not. I'm happy we won't have that factor to kick around. It simplifies things, and I'm highly in favor of anything that simplifies this mass of data.
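For the curious, the seasonality check behind that conclusion amounts to something like this sketch: group the monthly portfolio returns by their starting month and compare the averages. The data structure here is a hypothetical stand-in, not the actual CRSP output.

```python
# Sketch of the "January effect" check: average 12-month portfolio returns
# grouped by the month in which each portfolio was formed.
# `portfolio_returns` is a hypothetical list of (start_month, return) pairs.
from collections import defaultdict

def average_by_start_month(portfolio_returns):
    """Return {month: average return} for whichever months appear in the data."""
    buckets = defaultdict(list)
    for start_month, ret in portfolio_returns:
        buckets[start_month].append(ret)
    return {month: sum(rets) / len(rets) for month, rets in sorted(buckets.items())}

# If tax-loss selling or window dressing mattered, portfolios formed near
# year-end should stand out from the rest; in the actual data, they didn't.
```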
Fool on and prosper!