Regression to the Mean: Why Extremes Don't Last

Regression ≈ 4.5 points — extreme scores regress toward the mean

With r = 0.70, subjects who score above the 90th percentile at time 1 score about 4.5 points lower at time 2, on average. This drop is not a real decline; it is regression to the mean.

Formula

E[X₂ | X₁] = μ + r × (X₁ - μ)
Regression amount = (1 - r) × (X₁ - μ)
r = correlation between test and retest
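
As a worked example, the two formulas above can be written as small functions. This is a minimal sketch: the function names and the values μ = 100, r = 0.70 and X₁ = 130 are illustrative assumptions, not the simulator's settings.

```python
def expected_retest(x1: float, mu: float, r: float) -> float:
    """Expected second score given the first: mu + r * (x1 - mu)."""
    return mu + r * (x1 - mu)


def regression_amount(x1: float, mu: float, r: float) -> float:
    """Expected move back toward the mean: (1 - r) * (x1 - mu)."""
    return (1 - r) * (x1 - mu)


# Illustrative values (assumptions, not taken from the simulator).
mu, r, x1 = 100.0, 0.70, 130.0
print(f"expected retest score: {expected_retest(x1, mu, r):.1f}")
print(f"expected regression:   {regression_amount(x1, mu, r):.1f}")
```

With these values, a first score 30 points above the mean is expected to be followed by a score of 121, which is 9 points closer to the mean.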

The Invisible Force That Fools Everyone

Regression to the mean is perhaps the most pervasive statistical phenomenon in everyday life, and among the most consistently overlooked. Described by Francis Galton in 1886 in his study of the heights of parents and children, it explains why the children of very tall parents tend to be shorter than their parents, why sophomore slumps follow rookie sensations, and why "miracle cures" seem to work when they are administered just after symptoms peak.

Why Extremes Regress

Any measured outcome is a combination of a stable component (true ability, underlying condition) and a random component (luck, measurement error, daily variation). Extreme observations are extreme partly because both components aligned in the same direction. On remeasurement, the random component is just as likely to push in either direction — so the extreme total tends to moderate. This is not a causal force pulling things to the center; it is a simple consequence of randomness.
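
The stable-plus-random model described above can be sketched in a few lines. The code below is an illustration under assumed settings (standard-normal stable component, noise standard deviation of 0.65, top 10% selected); the function name and all parameters are hypothetical and are not the simulator's own code.

```python
import random
from statistics import mean


def simulate_top_group(n=20000, noise_sd=0.65, top_fraction=0.10, seed=0):
    """Return mean (test 1, test 2, stable component) for the top scorers on test 1."""
    rng = random.Random(seed)
    stable = [rng.gauss(0, 1) for _ in range(n)]            # true ability
    test1 = [s + rng.gauss(0, noise_sd) for s in stable]    # ability + luck
    test2 = [s + rng.gauss(0, noise_sd) for s in stable]    # same ability, fresh luck

    # Select the top scorers using test 1 only.
    k = int(n * top_fraction)
    top = sorted(range(n), key=lambda i: test1[i], reverse=True)[:k]

    return (
        mean(test1[i] for i in top),
        mean(test2[i] for i in top),
        mean(stable[i] for i in top),
    )


t1, t2, ability = simulate_top_group()
print(f"top group, test 1:  {t1:.2f}")
print(f"top group, test 2:  {t2:.2f}")
print(f"top group, ability: {ability:.2f}")
```

The top group's test 2 average lands near its average stable component: the group is still above the overall mean, but the favorable luck that helped it get selected does not repeat.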

Real-World Consequences

Regression to the mean has profound implications for evaluating interventions. Speed cameras are installed at accident hotspots, and accidents decline — but they would likely have declined anyway, because the hotspot was partly a statistical anomaly. Students who score lowest on a pretest improve the most on a posttest — not necessarily because they learned the most, but because their low scores partly reflected bad luck. Without a control group, regression to the mean masquerades as a treatment effect.

Galton's Original Discovery

Galton noticed that while tall parents tended to have tall children, the children were on average less extreme than their parents. He called this "regression towards mediocrity." Crucially, the same phenomenon works in reverse: children of short parents tend to be taller than their parents. The population distribution stays stable across generations because regression works symmetrically. This simulator lets you witness regression in action across repeated measurements.

FAQ

What is regression to the mean?

Regression to the mean is the statistical phenomenon where extreme observations on one measurement tend to be less extreme on a subsequent measurement. A student who scores in the 99th percentile on a test will likely score lower (though still high) on a retest. This happens because extreme scores are partly due to skill and partly due to luck — and the luck component is unlikely to repeat.

Why does regression to the mean fool people?

People naturally attribute the change to whatever happened between measurements. A coach punishes a player after a bad game, then the player improves. The coach credits the punishment, but regression to the mean alone would predict an improvement after an unusually bad game. A CEO implements a new policy after record profits, then profits fall. Critics blame the policy, but some decline was to be expected after a record year, with or without the policy.

How is regression to the mean related to correlation?

The amount of regression is directly determined by the correlation between measurements. With perfect correlation (r=1), there is no regression. With zero correlation (r=0), extreme scores regress completely to the mean, on average. Assuming both measurements have the same mean μ and the same standard deviation, the expected second score is: μ + r × (first_score - μ). This formula was discovered by Francis Galton in the 1880s.
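
The link between correlation and regression can be checked numerically. The sketch below assumes standardized scores (mean 0, standard deviation 1 at both times) and builds the second measurement so that its correlation with the first is r; the function name and parameters are hypothetical choices for illustration.

```python
import math
import random
from statistics import mean


def top_group_means(r, n=20000, top_fraction=0.10, seed=1):
    """Mean time 1 and time 2 scores for the top scorers at time 1."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    # x2 has the same mean and SD as x1, and correlation r with it.
    x2 = [r * a + math.sqrt(1 - r * r) * rng.gauss(0, 1) for a in x1]

    k = int(n * top_fraction)
    top = sorted(range(n), key=lambda i: x1[i], reverse=True)[:k]
    return mean(x1[i] for i in top), mean(x2[i] for i in top)


for r in (0.0, 0.7, 1.0):
    m1, m2 = top_group_means(r)
    # With mu = 0, the formula mu + r * (x1 - mu) reduces to r * x1.
    print(f"r={r:.1f}  time 1: {m1:.2f}  time 2: {m2:.2f}  formula: {r * m1:.2f}")
```

At r = 1 the top group repeats its time 1 average exactly, at r = 0 its time 2 average falls to roughly the overall mean, and in between the simulated time 2 average tracks r times the time 1 average.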

Can regression to the mean be prevented or corrected?

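It cannot be prevented because it is a mathematical consequence of imperfect correlation, not a bias that can be removed. However, it can be accounted for in study design by using a control group, ideally one assigned at random from the same pool of subjects. Regression to the mean then affects both groups equally, so a change that appears in the control group as well cannot be credited to the treatment. Only the difference between the two groups' changes estimates the treatment effect.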
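
A small simulation shows how a control group separates regression from a real effect. It is a sketch under assumed settings: subjects are enrolled because they scored in the lowest 10% on a pretest, then split at random into treated and control halves. The function name, the noise level and the effect sizes are hypothetical.

```python
import random
from statistics import mean


def run_trial(true_effect, n=20000, noise_sd=0.65, seed=2):
    """Return (mean gain of treated group, mean gain of control group)."""
    rng = random.Random(seed)
    stable = [rng.gauss(0, 1) for _ in range(n)]
    pre = [s + rng.gauss(0, noise_sd) for s in stable]

    # Enroll the lowest-scoring 10% on the pretest, then assign at random.
    enrolled = sorted(range(n), key=lambda i: pre[i])[: n // 10]
    rng.shuffle(enrolled)
    half = len(enrolled) // 2
    treated, control = enrolled[:half], enrolled[half:]

    def mean_gain(group, effect):
        # Posttest = stable part + treatment effect (if any) + fresh noise.
        return mean(stable[i] + effect + rng.gauss(0, noise_sd) - pre[i] for i in group)

    return mean_gain(treated, true_effect), mean_gain(control, 0.0)


for effect in (0.0, 0.5):
    treated_gain, control_gain = run_trial(effect)
    print(f"true effect {effect:.1f}: treated {treated_gain:+.2f}, "
          f"control {control_gain:+.2f}, "
          f"difference {treated_gain - control_gain:+.2f}")
```

Both groups improve even when the true effect is zero, because both were selected for low pretest scores. The treated-minus-control difference, not the raw gain, is what approximates the effect that was built into the simulation.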
