Database Dreamin'

By Ann Coleman (TMF AnnC)
June 30, 2000

I'm taking something of a chance today in describing a database that I haven't actually seen yet, but I really want to get input from our community on what we should be looking for and how we should go about looking for it. So today I will lay out for you what it is that we will have to work with and invite you to suggest ways in which we can make the best use of this resource.

Here is what we are getting from the Center for Research in Security Prices (CRSP).

For each month from January 1950 on, the database will list the 500 largest (by market cap) U.S. companies, excluding transportation and utility companies. (They are excluded because they were never included in the Dow.) Each company is identified by a unique code so that it can be tracked over time as names and tickers change, companies merge, go out of business, spin off divisions, etc. In addition to identifying information, for each company we will have:

Price as of the first trading day of the month

Regular cash dividend

Yield (most recently declared dividend * 4 / price)

RP (yield/square root of price)

Number of shares outstanding (from which market cap is calculated by multiplying the number of shares outstanding by the share price)

Returns for subsequent one to three years at three-month intervals

We will also have an adjusted share price and adjusted number of shares outstanding (adjusted for splits, spin-offs, etc.). These numbers are adjusted to their equivalent in today's values. All dividend and other payout information is included in a separate database. From this information we can calculate returns for other periods. CRSP is also ranking the stocks for us by price, yield, market cap, and RP.
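To make the derived fields concrete, here is a minimal sketch of the arithmetic behind yield, RP, and market cap as defined in the list above. The function names and sample numbers are made up for illustration and are not CRSP's field names. The sketch assumes the dividend is a quarterly per-share amount and that yield is expressed as a fraction rather than a percentage. The choice of scale multiplies every RP by the same constant, so it should not change how the stocks rank.

```python
import math


def annualized_yield(latest_dividend, price):
    """Most recently declared dividend * 4 / price (yield as a fraction)."""
    return latest_dividend * 4 / price


def rp(yield_value, price):
    """RP = yield / square root of price."""
    return yield_value / math.sqrt(price)


def market_cap(shares_outstanding, price):
    """Market cap = number of shares outstanding * share price."""
    return shares_outstanding * price


# Hypothetical stock: $0.50 quarterly dividend, $64 share price,
# 100 million shares outstanding. These are illustrative numbers only.
example_yield = annualized_yield(0.50, 64.0)
example_rp = rp(example_yield, 64.0)
example_cap = market_cap(100_000_000, 64.0)

print(f"yield={example_yield:.5f}  RP={example_rp:.8f}  cap={example_cap:,.0f}")
```

Because RP divides by the square root of price, two stocks with the same yield will rank differently if their share prices differ: the lower-priced one gets the higher RP.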

Naturally, the folks at CRSP are concerned about how their data is used. I have some idea how difficult this information is to assemble, and I would fully understand and support their concerns even if they weren't backed by law! The raw data is copyrighted material. Any sharing we do with members of the community must be in line with our contract, which rules out sharing raw price and dividend data right off the bat.

The results we get can be shared, however, and so can "derived" data -- like the RP, returns, and rankings. I am working on a way to make some parts of the database available to certain members of the community in a limited way that doesn't violate our contract with CRSP. Also, anyone who wishes to travel to Alexandria, Virginia, may view the raw database right here on the computer where it lives.

My plan for the first, quick-and-dirty test is to extract the 50 largest dividend-paying stocks (by market cap) for each month, sort them by RP, and compare the one-year returns of the stocks ranked 2-5 (the Foolish Four method) with those of the 500-stock database as a whole and also with those of the 50 largest-cap stocks. (By the way, when I say "quick," don't take that too literally!)
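For readers who think in code, the test described above might look something like the sketch below for a single month. It is a rough illustration under several assumptions, not the actual procedure. The dictionary keys are hypothetical rather than CRSP's field names. Non-payers are dropped before the largest stocks are taken. RP is sorted from highest to lowest, so that rank 1 is the highest RP. Each group's return is a simple equal-weighted average.

```python
from statistics import mean


def foolish_four_comparison(month_rows, universe_size=50):
    """Compare one-year returns for one month's snapshot of the database.

    month_rows is a list of dicts with hypothetical keys:
    'id', 'market_cap', 'dividend', 'rp', 'return_1y'.
    """
    # Keep only dividend payers, then take the largest by market cap.
    payers = [row for row in month_rows if row["dividend"] > 0]
    largest = sorted(payers, key=lambda row: row["market_cap"], reverse=True)
    largest = largest[:universe_size]

    # Sort by RP, highest first (assumed direction), and take ranks 2-5.
    by_rp = sorted(largest, key=lambda row: row["rp"], reverse=True)
    picks = by_rp[1:5]

    return {
        "picks": [row["id"] for row in picks],
        "foolish_four": mean(row["return_1y"] for row in picks),
        "largest_caps": mean(row["return_1y"] for row in largest),
        "whole_database": mean(row["return_1y"] for row in month_rows),
    }


# Tiny made-up month with seven stocks; universe_size is shrunk to 5 so the
# example stays readable. None of these numbers come from CRSP.
sample_month = [
    {"id": "A", "market_cap": 700, "dividend": 1.0, "rp": 0.010, "return_1y": 0.10},
    {"id": "B", "market_cap": 600, "dividend": 0.0, "rp": 0.000, "return_1y": 0.30},
    {"id": "C", "market_cap": 500, "dividend": 0.5, "rp": 0.030, "return_1y": 0.05},
    {"id": "D", "market_cap": 400, "dividend": 0.5, "rp": 0.020, "return_1y": 0.20},
    {"id": "E", "market_cap": 300, "dividend": 0.5, "rp": 0.025, "return_1y": -0.10},
    {"id": "F", "market_cap": 200, "dividend": 0.5, "rp": 0.015, "return_1y": 0.15},
    {"id": "G", "market_cap": 100, "dividend": 0.5, "rp": 0.050, "return_1y": 0.40},
]

result = foolish_four_comparison(sample_month, universe_size=5)
print(result)
```

Running this once per month from January 1950 on would give a series of paired returns (Foolish Four versus each benchmark), which is the raw material for whatever statistical comparison we settle on.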

Please feel free to e-mail me ([email protected]) with your suggestions for this first study or post them on the discussion board. In particular, I would love to hear what kind of statistical test readers would find valuable for comparing returns, and what you would feel is an acceptable confidence level to establish validity.

There are a number of other questions that can be addressed, but my first focus is going to be validating, or not, the original concept. As I keep saying so often that my fingers type it of their own accord, even if the new database confirms the results of our earlier work, that won't prove anything about the future performance of the Foolish Four. But it will give me, personally, a reason to hold on during uncomfortable periods like this one.

Fool on and prosper!