Longitudinal Studies > Snapshot Studies

Longitudinal studies like the ones Roberts described are very difficult to run, while snapshot comparisons are much easier to implement. But even longitudinal studies are done on wide demographic cohorts, and you may be interested in a much narrower subset of people: for example, the children of decamillionaires, the class Kylie was born into.

A study like that would probably call for a Bayesian experimental design. You'd take the base rates from the broader longitudinal studies and adjust them for the specific reference class you are studying. The question would sound like “how did children born in the 1980s do, given they had access to a PC?” or “how did they do, given they grew up in Beverly Hills?” The conditions after the word “given” comprise the reference class.
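One way to make “adjust the base rate for the reference class” concrete is a Beta-Binomial update. This is my choice of technique for illustration, not necessarily the design Roberts or anyone else would use. The broad longitudinal study supplies the prior, encoded as pseudo-counts, and a small sample from the narrower reference class updates it. The function names, the `strength` parameter, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BetaPrior:
    """Beta(alpha, beta) belief about a success rate."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def prior_from_base_rate(base_rate: float, strength: float) -> BetaPrior:
    """Encode a broad-cohort base rate as a Beta prior.

    `strength` is a hypothetical pseudo-sample size: how many observations
    the broad study is "worth" when applied to the narrower group.
    """
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must be strictly between 0 and 1")
    return BetaPrior(alpha=base_rate * strength, beta=(1.0 - base_rate) * strength)


def update(prior: BetaPrior, successes: int, trials: int) -> BetaPrior:
    """Conjugate Beta-Binomial update using data from the narrow reference class."""
    return BetaPrior(prior.alpha + successes, prior.beta + (trials - successes))


# Hypothetical numbers for illustration only.
broad = prior_from_base_rate(base_rate=0.10, strength=50)  # broad cohort: 10% "success"
narrow = update(broad, successes=6, trials=20)             # narrow group: 6 of 20
print(f"prior mean {broad.mean:.3f}, posterior mean {narrow.mean:.3f}")
```

The judgment call hides in `strength`: a large value says the broad cohort is a good stand-in for your reference class, so the narrow data barely moves the estimate; a small value lets the narrow data dominate. Arguing about the right reference class is, in effect, arguing about that number.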

In Kylie’s case, what was the right reference class? What’s the relevant condition? Being a decamillionaire to start? What about her IG followers? Not all decamillionaires enjoyed her fame. What about the fact that she became a billionaire so quickly?

The critical observation: experimental design has an enormous impact on the results of studies that show up in our feeds and then spread like truth at happy hour. A poorly constructed design or reference class has downstream effects that can swamp the conclusions.

A recurring Moontower theme is that the world is messy, so be critical of tidy takes. But don’t despair: here are some guides and concepts to keep in mind.

Making Better Comparisons

One of my favorite researchers, Michael Mauboussin, wrote a guide to making better comparisons. Since every decision that isn’t made in a split second requires a conscious comparison, we can all do better by getting meta about the process. Mauboussin decomposes the steps and highlights the pitfalls that await along the way. The link to the PDF and my summary notes can be found here.

Simpson’s Paradox

We learned how easy it can be to overreach for conclusions from snapshot data. Russ Roberts pointed out the danger of averaging averages: Simpson’s Paradox. A pattern that holds within every subgroup can reverse when the subgroups are pooled, because the pooled average weights each group by its size. Put loosely, the change in the average is not the same as the average of the changes. When I interviewed at Susquehanna coming out of college, one of the questions was a perfect example of the paradox, even if I didn’t know the term for it back then.

The question: If batter A has a better batting average than batter B for the first half of the season AND batter A has a better batting average for the second half of the season, is it possible for batter B to have a better batting average for the whole season?

The answer is yes.
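Here is a made-up pair of records that shows how it happens. Batter A leads in both halves, but most of A's at-bats fall in the weak half while most of B's fall in the strong half. The hit and at-bat counts are hypothetical; any numbers with the same shape work. Exact fractions avoid rounding noise.

```python
from fractions import Fraction

# Hypothetical (hits, at_bats) for each half of the season.
batter_a = {"first": (4, 10), "second": (25, 100)}
batter_b = {"first": (35, 100), "second": (2, 10)}


def avg(hits: int, at_bats: int) -> Fraction:
    """Batting average as an exact fraction."""
    return Fraction(hits, at_bats)


def season_avg(record: dict) -> Fraction:
    """Pool hits and at-bats across both halves (the real season average)."""
    hits = sum(h for h, _ in record.values())
    at_bats = sum(ab for _, ab in record.values())
    return Fraction(hits, at_bats)


def mean_of_half_avgs(record: dict) -> Fraction:
    """Naive average of the two half-season averages (ignores at-bat counts)."""
    return sum(avg(*hab) for hab in record.values()) / len(record)


for half in ("first", "second"):
    a, b = avg(*batter_a[half]), avg(*batter_b[half])
    print(f"{half} half: A {float(a):.3f} vs B {float(b):.3f}")  # A leads each half

print(f"season:      A {float(season_avg(batter_a)):.3f} vs B {float(season_avg(batter_b)):.3f}")
print(f"avg of avgs: A {float(mean_of_half_avgs(batter_a)):.3f} vs B {float(mean_of_half_avgs(batter_b)):.3f}")
```

A hits .400 and .250 to B's .350 and .200, yet over the season B bats about .336 to A's .264. Averaging the two half-season averages would wrongly crown A, which is exactly the averaging-averages trap.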

Life expectancy example

In 1900, life expectancy was about 47 for a US male. Does that mean you were middle-aged by the time you left college? Of course not. If you made it to 22, there was a good chance you’d live well into your 60s or older. Life expectancy is extremely sensitive to infant mortality rates. In 1900, infant mortality was around 20%. Today it’s closer to 0.5%.

Society 1:

Infant mortality = 20%
Survivors who make it past infancy live to 80
Life expectancy = 0.20 × 0 + 0.80 × 80 = 64 years old

Society 2:

Infant mortality = 0.5%
Survivors who make it past infancy live to 80
Life expectancy = 0.005 × 0 + 0.995 × 80 = 79.6 years old
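The two toy societies can be written as age-at-death distributions: die in infancy (age 0) or live to 80, with 20% and 0.5% infant mortality respectively, matching the arithmetic above. The sketch below computes the snapshot number (expected age at death from birth) and the conditional number (expected age at death given you already reached some age). The helper names are hypothetical, and the two-outcome model is a deliberate simplification.

```python
def expected_age_at_death(dist: list[tuple[float, float]]) -> float:
    """Snapshot life expectancy: sum of age * probability over all outcomes."""
    return sum(age * p for age, p in dist)


def expected_age_given_reached(dist: list[tuple[float, float]], age_reached: float) -> float:
    """Life expectancy conditional on having survived to `age_reached`."""
    survivors = [(age, p) for age, p in dist if age >= age_reached]
    total = sum(p for _, p in survivors)
    if total == 0:
        raise ValueError("no one in this distribution reaches that age")
    return sum(age * p for age, p in survivors) / total


societies = {
    "Society 1": [(0, 0.20), (80, 0.80)],
    "Society 2": [(0, 0.005), (80, 0.995)],
}

for name, dist in societies.items():
    at_birth = expected_age_at_death(dist)
    at_22 = expected_age_given_reached(dist, 22)
    print(f"{name}: at birth {at_birth:.1f}, given reached 22 {at_22:.1f}")
```

The snapshot figures differ (64 vs 79.6), but anyone who reached 22 expects to live to 80 in both societies. The gap lives entirely in infancy.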

Without tracking people’s lives, a snapshot of life expectancy can lead to all sorts of silly conclusions about the two periods. Had we studied them longitudinally, we would have seen that someone who makes it to 32 is still far from middle-aged.
