The Most Misunderstood Number in Science
The p-value is arguably the most influential — and most misinterpreted — number in all of science. Introduced by Ronald Fisher in 1925, it was meant as an informal measure of evidence against a null hypothesis. Nearly a century later, it has become a rigid threshold that determines what gets published, what drugs get approved, and what policies get enacted. Understanding what p-values actually measure, and what they do not, is essential statistical literacy.
How the Z-Test Works
The one-sample z-test asks: is the sample mean far enough from the hypothesized population mean to be unlikely under the null hypothesis? The test statistic z = (x̄ − μ₀) / (σ/√n) measures this distance in units of standard error, where σ is the (assumed known) population standard deviation. A larger |z| means the sample mean is farther from the null, making the null hypothesis less plausible. The p-value converts this distance into a probability under the null distribution: the chance of seeing a result at least this extreme if the null were true.
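The calculation above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a substitute for a statistics package; the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n):
    """One-sample z-test: returns (z, two-sided p-value).

    Assumes the population standard deviation sigma is known,
    as the z-test requires (otherwise a t-test is appropriate).
    """
    se = sigma / math.sqrt(n)                # standard error of the mean
    z = (sample_mean - mu0) / se             # distance from the null in SE units
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided tail probability
    return z, p

# Hypothetical example: n = 25, sample mean 103, null mean 100, sigma = 10
z, p = z_test(103, 100, 10, 25)
# z = 1.5, p ≈ 0.134 — not significant at the conventional alpha = 0.05
```

Note that the p-value here is a statement about the data given the null, not the probability that the null hypothesis is true.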
Power and Sample Size
Statistical power is the probability of correctly rejecting a false null hypothesis. It depends on three factors: the true effect size, the sample size, and the significance level α. Underpowered studies — which are alarmingly common — waste resources and produce unreliable results. This simulator lets you see exactly how increasing the sample size or effect size boosts power toward the conventional 80% target.
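The relationship between effect size, sample size, and power can be demonstrated with a small Monte Carlo simulation, in the spirit of the simulator described above. This sketch assumes a two-sided one-sample z-test with σ = 1 and null mean 0; the effect size is the true mean shift in SD units (Cohen's d).

```python
import math
import random
from statistics import NormalDist

def simulate_power(effect_size, n, alpha=0.05, trials=10_000, seed=0):
    """Estimate the power of a two-sided one-sample z-test by simulation.

    Each trial draws n observations from N(effect_size, 1), tests them
    against the null mean mu0 = 0, and counts how often the null is
    (correctly) rejected at significance level alpha.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # ≈ 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(effect_size, 1) for _ in range(n)) / n
        z = xbar * math.sqrt(n)                    # sigma = 1, mu0 = 0
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / trials

# A medium effect (d = 0.5) with n ≈ 32 lands near the conventional 80% target;
# quadrupling n to 64 pushes power well above 95%.
```

Running `simulate_power(0.5, 32)` versus `simulate_power(0.5, 8)` makes the cost of small samples concrete: the same real effect is detected far less often.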
The Replication Crisis
The widespread misuse of p-values contributed to science's replication crisis, where many published findings failed to reproduce. Researchers engaged in p-hacking — running multiple analyses until p < 0.05 appeared by chance. The American Statistical Association issued an unprecedented statement in 2016 warning against over-reliance on p-values. Modern best practice emphasizes effect sizes, confidence intervals, and pre-registration of hypotheses.