Sample Size Calculator: Statistical Power for Clinical Studies

n = 64 per group — total N = 128

To detect a medium effect size (d=0.5) with 80% power at the 5% significance level using a two-tailed test, 64 participants per group (128 total) are required.

Formula

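n = 2 · ((z_{α/2} + z_β) / d)² per group
d = (μ₁ − μ₂) / σ_pooled

For α = 0.05 (z_{α/2} ≈ 1.96), 80% power (z_β ≈ 0.84), and d = 0.5, this normal approximation gives n ≈ 62.8, which rounds up to 63. The t-distribution correction used for a t-test raises the result to 64.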
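As a rough sketch, the normal-approximation calculation for two equal groups, n = 2((z_{α/2} + z_β)/d)², can be reproduced with Python's standard library. The function name `n_per_group` and its defaults are illustrative assumptions, not the calculator's actual source.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison
    (normal approximation, equal group sizes, equal variances)."""
    if d == 0:
        raise ValueError("effect size d must be non-zero")
    z = NormalDist()                      # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    z_beta = z.inv_cdf(power)             # quantile matching the target power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# Medium effect, 80% power, 5% two-sided significance level.
print(n_per_group(0.5))  # 63 under the normal approximation
```

The normal approximation slightly understates the t-test requirement, which is why this returns 63 where a t-based calculation reports 64 per group.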

The Planning Phase

Sample size calculation is the single most important statistical step before launching a clinical trial. Too few participants and the study may fail to detect a real treatment benefit, wasting years of effort and millions of dollars. Too many participants and the study unnecessarily exposes patients to unproven treatments and consumes resources that could fund other research. The calculation balances four quantities: significance level (α), power (1−β), effect size (d), and sample size (n). Knowing any three determines the fourth.
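To illustrate how three of the quantities fix the fourth, the sketch below solves for the smallest detectable effect size given n, α, and power, under the same normal approximation for two equal groups. The helper name `detectable_effect` is hypothetical.

```python
import math
from statistics import NormalDist


def detectable_effect(n, alpha=0.05, power=0.80):
    """Smallest standardized effect d detectable with n participants per group
    (two-sided test, normal approximation, equal group sizes)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    # Rearranged from n = 2 * ((z_alpha + z_beta) / d) ** 2
    return (z_alpha + z_beta) * math.sqrt(2 / n)


# With 64 per group, the detectable effect is just under d = 0.5.
print(round(detectable_effect(64), 3))
```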

The Normal Distribution Visualization

The upper panel shows two overlapping normal distributions: the null hypothesis (no effect, centered at 0) and the alternative hypothesis (real effect, centered at d). The significance level α defines the rejection region in the tails of the null distribution (α/2 in each tail for a two-tailed test). Power (1−β) is the area of the alternative distribution that falls in the rejection region. As you increase sample size, both distributions become narrower because the standard error shrinks in proportion to 1/√n, which reduces their overlap and increases power.
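The shaded power area can be approximated numerically. This minimal sketch assumes a two-sided z-test with equal group sizes, so the alternative distribution of the test statistic is shifted by d·√(n/2). The function name is illustrative.

```python
import math
from statistics import NormalDist


def power_two_sample(n, d, alpha=0.05):
    """Approximate power of a two-sided, two-sample test with n per group."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # edge of the rejection region
    shift = d * math.sqrt(n / 2)        # center of the alternative, in SE units
    upper = 1 - z.cdf(z_crit - shift)   # area beyond the upper critical value
    lower = z.cdf(-z_crit - shift)      # area beyond the lower critical value
    return upper + lower


for n in (16, 32, 64, 128):
    print(f"n = {n:3d} per group -> power = {power_two_sample(n, 0.5):.3f}")
```

With d = 0 the function returns α, since the two distributions coincide and only the false-positive area remains.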

Effect Size: The Missing Ingredient

The most challenging aspect of sample size calculation is specifying the effect size. What difference between treatment and control is both clinically meaningful and realistic? This requires domain knowledge, pilot study data, or literature review. The simulation lets you explore how sensitive sample size is to effect size assumptions: because n is proportional to 1/d², halving the effect size quadruples the required sample. This inverse-square relationship catches many researchers off guard.
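The sensitivity to d is easy to check numerically. The sketch below uses the unrounded normal-approximation sample size for two equal groups, so the factor of four appears exactly. `n_exact` is a hypothetical helper name.

```python
from statistics import NormalDist


def n_exact(d, alpha=0.05, power=0.80):
    """Unrounded per-group sample size (normal approximation, two-sided test)."""
    z = NormalDist()
    return 2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d) ** 2


# Each step halves d, so each required n is four times the previous one.
for d in (0.8, 0.4, 0.2):
    print(f"d = {d:.1f} -> n per group ~ {n_exact(d):.1f}")

ratio = n_exact(0.25) / n_exact(0.5)
print(f"ratio when d is halved: {ratio:.2f}")
```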

Beyond the Basics

This calculator covers the two-sample t-test scenario. Real clinical trials often require more complex calculations accounting for survival endpoints (requiring event-driven sample sizes), binary outcomes (requiring different formulas based on proportions), multiple comparisons (Bonferroni or other adjustments to α), adaptive designs (allowing sample size re-estimation at interim analyses), and dropout rates (inflating the sample to account for losses to follow-up).
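Two of these adjustments are simple enough to sketch. The example below applies a Bonferroni-corrected α and then inflates the result for expected dropout. The 15% dropout rate and the three comparisons are made-up illustration values, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Per-group sample size (normal approximation, two-sided test)."""
    z = NormalDist()
    return math.ceil(2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d) ** 2)


def bonferroni_alpha(alpha, comparisons):
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / comparisons


def inflate_for_dropout(n, dropout_rate):
    """Enrollment needed so that about n participants remain after dropout."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n / (1 - dropout_rate))


# Assumed scenario: 3 comparisons, 15% loss to follow-up, d = 0.5.
n_adjusted = n_per_group(0.5, alpha=bonferroni_alpha(0.05, 3))
n_enrolled = inflate_for_dropout(n_adjusted, 0.15)
print(n_adjusted, n_enrolled)
```

Survival endpoints, binary outcomes, and adaptive designs need different formulas and are not covered by this sketch.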

FAQ

Why is sample size calculation important?

Sample size calculation ensures a study is large enough to detect a real effect (adequate power) but not wastefully large (unnecessarily exposing participants to experimental treatments). Ethics committees, regulatory agencies, and grant funders all require sample size justification before a study begins.

What is statistical power?

Statistical power (1−β) is the probability that a study will correctly detect a real treatment effect, that is, reject the null hypothesis when the alternative is true. Power of 0.80 means an 80% chance of obtaining a statistically significant result if the true effect equals the assumed effect size. The conventional minimum is 80%, though 90% is common for pivotal trials.

What is Cohen's d?

Cohen's d is a standardized measure of effect size: the difference between two group means divided by the pooled standard deviation. Cohen's conventional benchmarks are d=0.2 (small), d=0.5 (medium), and d=0.8 (large). Most clinical interventions produce effects in the d=0.3–0.7 range.
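As a minimal sketch, Cohen's d can be computed from two samples using the pooled standard deviation. The data values below are invented purely for illustration.

```python
import math
from statistics import mean, variance


def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)


treatment = [5, 6, 7, 8, 9]   # made-up outcome scores
control = [3, 4, 5, 6, 7]     # made-up outcome scores
print(round(cohens_d(treatment, control), 3))
```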

What happens if a study is underpowered?

An underpowered study has a high probability of missing real effects (Type II error). Worse, any significant results it does produce are likely to overestimate the true effect size (the 'winner's curse'), leading to inflated expectations that fail to replicate in subsequent studies.
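The overestimation can be seen in a small Monte Carlo sketch. It assumes a true effect of d = 0.3, only 20 participants per group, and a normal-approximation significance test. All parameter values are illustrative choices, not figures from the calculator.

```python
import math
import random
from statistics import mean, stdev


def simulate_winners_curse(true_d=0.3, n=20, studies=2000, seed=1):
    """Return (share of significant studies, mean observed d among them)."""
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided 5% critical value (normal approximation)
    significant = []
    for _ in range(studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(true_d, 1.0) for _ in range(n)]
        # Equal group sizes, so the pooled variance is the simple average.
        pooled_sd = math.sqrt((stdev(control) ** 2 + stdev(treated) ** 2) / 2)
        d_obs = (mean(treated) - mean(control)) / pooled_sd
        if abs(d_obs) * math.sqrt(n / 2) > z_crit:
            significant.append(d_obs)
    return len(significant) / studies, mean(significant)


rate, avg_d = simulate_winners_curse()
print(f"significant: {rate:.1%}, mean observed d when significant: {avg_d:.2f}")
```

With so few participants, a study only reaches significance when its observed d is far above 0.3, so the average among significant studies lands well above the true effect.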
