We have ordered a database of the top 500 stocks by market cap from the University of Chicago Graduate School of Business' Center for Research in Security Prices, the creme de la creme of stock price databases. I'm trembling. Not with anticipation -- with fear. The database is over 300,000 rows long.
If my computer doesn't just curl up and die when I ask it to do the first sort, I should be able to do some simple tests myself. More complicated stuff will be farmed out to Bob Price (TMF Sandy). Tomorrow I will give you a more detailed description of what we will actually be getting. Today let's talk about why we are getting it.
This larger database will let us test the Foolish Four theory in ways that were not possible before. Hopefully it will put to rest some of the controversy about whether the Dow-based strategies are valid. That and the fact that I love to play with spreadsheets. At least, I love to play with little spreadsheets. Really big spreadsheets are a new kind of toy.
While it will never be possible to PROVE that the high-yield/low-price method of picking large cap stocks works going forward, there are a number of questions that can be answered with the new database.
One of the first things I want to look at is the time period of the original studies that showed high-yielding, low-priced Dow stocks outperformed the Dow over the subsequent 12 months. A lot of discussion has gone on about the validity of Michael O'Higgins' original study and our subsequent studies that were based on his. Were the findings a result of data mining?
To review, data mining is one of the dangers of using historical data to draw conclusions. The famous Super Bowl predictor is one simple example. For a number of years the stock market went up in years when the National Football Conference won the Super Bowl and down in years when the American Football Conference won -- or was it the other way around? Doesn't matter. The apparent relationship was just a random association. There was obviously no cause-and-effect.
The point is that there will always be random associations between various bits of data. If you have enough data to look through, it's not difficult to find something like the Super Bowl predictor. But if one factor doesn't actually influence the other, the association won't continue.
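To make that concrete, here is a minimal Python sketch of how searching through enough unrelated data turns up a "predictor." Everything in it is an assumption for illustration: the market's direction each year is a coin flip, each candidate signal is another coin flip, and the counts (1,000 signals, 20 years searched, 20 later years) are arbitrary. The function name best_random_predictor is made up.

```python
import random

def best_random_predictor(n_signals=1000, n_train=20, n_test=20, seed=0):
    """Search many random yes/no signals for the one that best 'predicts'
    a random market in the first n_train years, then score that same
    signal on the following n_test years."""
    rng = random.Random(seed)
    n_years = n_train + n_test
    # Market direction per year: True = up, False = down (pure coin flips).
    market = [rng.random() < 0.5 for _ in range(n_years)]

    best_hits, best_signal = -1, None
    for _ in range(n_signals):
        # Each candidate signal is also pure noise, unrelated to the market.
        signal = [rng.random() < 0.5 for _ in range(n_years)]
        hits = sum(s == m for s, m in zip(signal[:n_train], market[:n_train]))
        if hits > best_hits:
            best_hits, best_signal = hits, signal

    # Score the winning signal on years it was NOT selected on.
    test_hits = sum(
        s == m for s, m in zip(best_signal[n_train:], market[n_train:])
    )
    return best_hits / n_train, test_hits / n_test

in_sample, out_of_sample = best_random_predictor()
print(f"hit rate in the years searched: {in_sample:.0%}")
print(f"hit rate in the later years:    {out_of_sample:.0%}")
```

The best of the 1,000 signals will look impressive over the years that were searched, simply because it was picked for looking impressive. Nothing ties it to the market, so its hit rate in the later years should hover around a coin flip.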
The original study that led to the development of the Foolish Four noted that high-yield Dow stocks tended to outperform the Dow as a whole. Eventually this was dubbed the "Dogs of the Dow" strategy. ("Dogs" because a stock with a higher-than-average yield usually got that way because its price had dropped. The price drops when a stock is out of favor with investors.)
Michael O'Higgins studied returns for Dow stocks between 1973 and 1991 and found that if you started with the 10 highest-yielding Dow stocks, and then bought only the 5 lowest-priced from that bunch, the returns looked even better. Randy Befumo, an early Fool, didn't believe it. He developed his own database that went back to 1971 (later expanded to 1961) and confirmed O'Higgins' results for the original time period, although the period from 1961 to 1973 showed no outperformance. (Note: both of these studies looked only at strategies starting around January 1.)
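The two-step screen described above (highest yield first, then lowest price) can be sketched in a few lines of Python. This is only an illustration of the selection rule, not a reconstruction of how O'Higgins or Befumo actually ran their studies. Yield is taken as annual dividend divided by price, and the tickers, prices, and dividends are made up. The demo uses 4 and 2 instead of 10 and 5 so the toy list stays short.

```python
def high_yield_low_price(stocks, n_yield=10, n_price=5):
    """Return tickers of the n_price lowest-priced stocks among the
    n_yield highest-yielding ones.

    stocks: list of (ticker, price, annual_dividend) tuples.
    """
    # Step 1: rank by yield (dividend / price), highest first.
    by_yield = sorted(stocks, key=lambda s: s[2] / s[1], reverse=True)
    top_yielders = by_yield[:n_yield]
    # Step 2: from that group, keep the lowest-priced stocks.
    cheapest = sorted(top_yielders, key=lambda s: s[1])[:n_price]
    return [ticker for ticker, _price, _dividend in cheapest]

# Hypothetical data: (ticker, price, annual dividend).
toy_stocks = [
    ("AAA", 50.0, 2.0),   # yield 4%
    ("BBB", 20.0, 1.0),   # yield 5%
    ("CCC", 100.0, 1.0),  # yield 1%
    ("DDD", 40.0, 2.4),   # yield 6%
    ("EEE", 80.0, 2.4),   # yield 3%
    ("FFF", 30.0, 0.3),   # yield 1%
]

picks = high_yield_low_price(toy_stocks, n_yield=4, n_price=2)
print(picks)
```

In the toy list, the four highest yielders are DDD, BBB, AAA, and EEE, and the two cheapest of those are BBB and DDD.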
Much (and I do mean much) controversy has arisen over the nature of this association. Since Michael O'Higgins didn't go into a lot of detail in explaining how he did his original study, critics have said that the study was "data-mined," meaning that he may have searched through a great number of factors until he found two (price and yield) that were randomly associated with high returns.
I've never been persuaded that this is the case, although it is, of course, possible. I don't know how O'Higgins did his original study, but given that high yield is usually fairly attractive to investors, it doesn't seem to me that the association here is inexplicable. The low-price factor is less understandable. Low price will affect yield (yield = dividend/price) but should not otherwise affect the performance of a stock. The low-price factor does seem to have always been a weak point in the strategy, but I'm not convinced that it is a fatal flaw.
At any rate, I don't see any reason to assume that the original study was "data mining." It is very difficult (and expensive) to get the kind of database that would have many factors that one could test. Price and dividend information is easily available. You can get it from any newspaper (although checking it and calculating returns from such contemporaneous data can be a nightmare because of splits, spinoffs, mergers, etc.). However, even if it wasn't data mining per se, the association could still have been a random one.
Since we don't know how the original study was done, the way to determine if its results were valid is to conduct an "out of sample" test. We did that by assembling a monthly database that went back an additional 10 years. But the new data raised other questions (as always happens) and the criticism and controversy continued.
The fact that the Foolish Four has not beaten the market for the last several years has certainly fueled the controversy, even though such periods have existed in the past. I know that when your portfolio is down, the "Think Long Term" mantra is cold comfort.
We can't prove that the strategy will work going forward. But with this new data we can at the very least look at the period of the original study and see if it was flawed. Originally, the Dow stocks were used for convenience. Dow membership was never considered essential, although being an extremely large, financially sound company would be essential. So if we eliminate Dow membership as a factor, and instead use a database of large cap, dividend-paying U.S. stocks, we can test whether high yield combined with low price really does correlate with higher returns.
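As a rough sketch of what such a test might look like in Python: for each year, screen the dividend payers for high yield and then low price, and compare the picks' subsequent return with the average for all dividend payers. The data layout, the field names (price, dividend, ret), the choice of benchmark, and the numbers are all assumptions for illustration. They are not the layout of the database we ordered, and the detailed study design is still to come.

```python
from statistics import mean

def pick(universe, n_yield, n_price):
    """High-yield, then low-price screen over a list of stock dicts."""
    top_yield = sorted(
        universe, key=lambda s: s["dividend"] / s["price"], reverse=True
    )[:n_yield]
    return sorted(top_yield, key=lambda s: s["price"])[:n_price]

def excess_return_by_year(data, n_yield=10, n_price=5):
    """For each year, return (average return of the picks) minus
    (average return of all dividend payers) over the next 12 months.

    data: {year: [{"price": ..., "dividend": ..., "ret": ...}, ...]}
    where "ret" is the hypothetical subsequent 12-month return.
    """
    result = {}
    for year, universe in data.items():
        # Only dividend-paying stocks are eligible (assumed benchmark too).
        payers = [s for s in universe if s["dividend"] > 0]
        picks = pick(payers, n_yield, n_price)
        result[year] = mean(s["ret"] for s in picks) - mean(
            s["ret"] for s in payers
        )
    return result

# Made-up numbers, purely to show the mechanics.
toy_data = {
    1990: [
        {"price": 10.0, "dividend": 1.0, "ret": 0.20},
        {"price": 20.0, "dividend": 1.0, "ret": 0.10},
        {"price": 50.0, "dividend": 1.0, "ret": 0.00},
        {"price": 40.0, "dividend": 0.0, "ret": 0.50},  # non-payer, excluded
    ],
    1991: [
        {"price": 12.0, "dividend": 1.2, "ret": -0.10},
        {"price": 30.0, "dividend": 0.6, "ret": 0.05},
        {"price": 25.0, "dividend": 2.0, "ret": 0.05},
    ],
}

excess = excess_return_by_year(toy_data, n_yield=2, n_price=1)
for year, value in sorted(excess.items()):
    print(year, f"{value:+.1%}")
```

A positive number means the picks beat the average dividend payer that year, and a negative number means they lagged. With real data, the interesting question is whether the excess is positive more often than chance would explain across many years.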
If the association was random, it won't occur in the larger database. If there was a true causal effect, it should not be limited to the stocks of the Dow. So that will be my first study. Tomorrow I will describe the database that we will be getting and lay out the details of how I plan to do the first study. Suggestions are welcome, but for the moment, let's confine them to this first question. Please. At least until I have time to figure out how this monster works.
Fool on and prosper!