RNA-seq Differential Expression Simulator: Statistical Power & Experimental Design

Power ≈ 72% — for 2-fold change with 3 replicates

With 3 replicates per group, 20M reads, and a dispersion of φ = 0.1, the experiment has about 72% power to detect a 2-fold change in expression, with an estimated FDR of 5% using Benjamini-Hochberg correction.

Formula

Var(Y) = μ + φμ² (negative binomial variance)
log₂FC = log₂(μ_treatment / μ_control)
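q_(i) = min over j ≥ i of (p_(j) × m / j) (Benjamini-Hochberg adjusted p-value; p_(j) is the j-th smallest of the m p-values, so j is its rank)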
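As a quick worked check of the first two formulas, here is a small Python sketch. The function names and the example values (μ = 100, φ = 0.1, means of 200 vs. 100) are illustrative assumptions, not values taken from the simulator.

```python
import math

def nb_variance(mu, phi):
    """Negative binomial variance: Poisson term mu plus overdispersion term phi * mu^2."""
    return mu + phi * mu ** 2

def log2_fold_change(mu_treatment, mu_control):
    """log2 ratio of mean expression in treatment relative to control."""
    return math.log2(mu_treatment / mu_control)

# Hypothetical gene with mean count 100 and dispersion 0.1:
# 100 + 0.1 * 100^2 = 1100, versus 100 under a pure Poisson model.
variance = nb_variance(100, 0.1)

# A doubling of mean expression corresponds to a log2 fold change of 1.
lfc = log2_fold_change(200, 100)
```

At this mean, the overdispersion term φμ² accounts for most of the variance, which is why dispersion matters more than counting noise for well-expressed genes.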

From Reads to Expression

RNA-seq measures gene expression by sequencing the mRNA molecules in a biological sample. After alignment to a reference genome, the number of reads mapping to each gene serves as a proxy for its expression level. However, raw counts must be normalized before comparison: for sequencing depth when comparing the same gene across samples, and additionally for gene length when comparing different genes within a sample. TPM (transcripts per million) corrects for both, while the size factors employed by DESeq2 correct for sequencing depth and library composition.
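The two normalization ideas mentioned above can be sketched in a few lines of Python. This is a simplified illustration, not DESeq2's actual code: `tpm` and `size_factors` are hypothetical helper names, the size-factor function follows the median-of-ratios idea, and the toy counts are made up.

```python
import math
from statistics import median

def tpm(counts, lengths_kb):
    """Transcripts per million for one sample: length-normalize, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

def size_factors(count_matrix):
    """Median-of-ratios size factors; count_matrix[gene][sample] holds raw counts."""
    n_samples = len(count_matrix[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in count_matrix:
        if any(c == 0 for c in row):
            continue  # geometric mean would be zero, so the gene is skipped
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Raises StatisticsError if every gene has a zero count in some sample.
    return [math.exp(median(r)) for r in log_ratios]

# Toy data (assumed): sample 2 was sequenced exactly twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [50, 100]]
factors = size_factors(toy_counts)

# Toy data (assumed): counts proportional to gene length give equal TPM values.
tpm_values = tpm([10, 20, 30], [1.0, 2.0, 3.0])
```

Dividing each sample's counts by its size factor puts the samples on a common scale; in the toy data the second factor is twice the first, matching the doubled depth.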

The Negative Binomial Model

Gene expression counts are commonly modeled with a negative binomial distribution, which accounts for both the Poisson sampling noise of sequencing and the biological variability between replicates. The key parameter is the dispersion φ, which sets how far the variance exceeds the mean: the variance is μ + φμ², so the excess over the Poisson variance μ grows with the square of the mean. DESeq2 estimates dispersion for each gene using empirical Bayes shrinkage, borrowing information across genes to stabilize estimates when sample sizes are small.
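One way to see where the variance formula comes from is to simulate counts as a gamma-Poisson mixture: a gamma-distributed "true" expression level stands in for biological variability, and a Poisson draw on top stands in for sequencing noise. The sketch below uses only the Python standard library; the helper names, the seed, and the parameter values (μ = 50, φ = 0.1) are assumptions chosen for illustration.

```python
import math
import random
from statistics import mean, variance

def sample_poisson(lam, rng):
    """Knuth's multiplication method; fine for modest means, but exp(-lam)
    underflows for very large lam, where a library sampler should be used."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def sample_negative_binomial(mu, phi, rng):
    """Gamma-Poisson mixture: the gamma rate has mean mu and variance phi * mu^2."""
    rate = rng.gammavariate(1.0 / phi, mu * phi)  # shape = 1/phi, scale = mu * phi
    return sample_poisson(rate, rng)

rng = random.Random(0)
mu, phi = 50, 0.1
counts = [sample_negative_binomial(mu, phi, rng) for _ in range(20000)]

sample_mean = mean(counts)
sample_var = variance(counts)
expected_var = mu + phi * mu ** 2  # 50 + 0.1 * 2500 = 300
```

With this many draws, the sample variance should land close to μ + φμ² and far above the sample mean, which is the overdispersion a Poisson model cannot capture.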

Statistical Testing and Multiple Correction

For each gene, a statistical test compares expression between conditions, producing a p-value. With 20,000+ genes tested simultaneously, multiple testing correction is essential: at p < 0.05, we would expect about 1,000 false positives by chance alone (20,000 × 0.05) if no gene were truly differentially expressed. The Benjamini-Hochberg procedure controls the false discovery rate (FDR), so that among all genes called significant, the expected fraction of false positives is at most a specified level (typically 5%).
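The Benjamini-Hochberg adjustment itself is short enough to write out. The following is a minimal sketch in plain Python; the function name and the five example p-values are made up for illustration, and real analyses would normally rely on the adjustment built into the DE tool being used.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum of
    # p * m / rank over its own rank and all larger ranks (keeps q monotone).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p-values for five genes.
pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]
qvalues = benjamini_hochberg(pvalues)

# Genes called significant at a 5% FDR threshold.
significant = [q <= 0.05 for q in qvalues]
```

In this example the third and fourth genes have raw p-values below 0.05 but adjusted values just above it, so only the first two genes are called significant at a 5% FDR.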

Experimental Design Matters

The most impactful design choice in RNA-seq is the number of biological replicates. Adding replicates improves power far more than increasing sequencing depth beyond ~20 million reads. The volcano plot, which plots log₂ fold change against −log₁₀ p-value, reveals the tradeoff: large fold changes are easy to detect even with few replicates, while small but biologically important changes require substantial replication to distinguish from noise.
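The replicates-versus-depth argument can be made concrete with a rough closed-form approximation. The sketch below is not the simulator's method: it assumes a two-sided Wald test on the log fold change with a normal approximation, a delta-method variance of (1/μ + φ)/n for the log of each group mean, a hypothetical mean count of 100 for the gene, a per-gene α of 0.05 with no multiple testing adjustment, and that doubling depth simply doubles the mean count.

```python
import math
from statistics import NormalDist

def approx_power(fold_change, n_per_group, mean_count, dispersion, alpha=0.05):
    """Normal-approximation power for a two-sided Wald test on the log fold change.

    Assumes the log of each group's mean count has variance (1/mu + dispersion) / n
    under the negative binomial model (delta method).
    """
    mu_control = mean_count
    mu_treatment = mean_count * fold_change
    se = math.sqrt((1 / mu_control + dispersion) / n_per_group
                   + (1 / mu_treatment + dispersion) / n_per_group)
    effect = abs(math.log(fold_change)) / se
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of rejecting in either tail.
    return std_normal.cdf(effect - z_crit) + std_normal.cdf(-effect - z_crit)

# Hypothetical gene: mean count 100 in control, dispersion 0.1, 2-fold change.
baseline = approx_power(2.0, n_per_group=3, mean_count=100, dispersion=0.1)
double_depth = approx_power(2.0, n_per_group=3, mean_count=200, dispersion=0.1)
double_reps = approx_power(2.0, n_per_group=6, mean_count=100, dispersion=0.1)
```

Under these assumptions, doubling the replicates raises the approximate power much more than doubling the depth. The reason is visible in the standard error: more reads shrink only the 1/μ counting term, while the dispersion term φ/n shrinks only when n grows.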

FAQ

What is differential gene expression analysis?

Differential expression analysis identifies genes whose mRNA levels differ significantly between two or more conditions (e.g., disease vs. healthy). RNA-seq quantifies transcript abundance by counting mapped reads per gene; statistical tools such as DESeq2 and edgeR then identify genes with changes larger than expected by chance, controlling the false discovery rate.

Why is the negative binomial distribution used for RNA-seq?

RNA-seq count data shows overdispersion: the variance exceeds the mean, whereas a Poisson distribution forces the two to be equal. The negative binomial distribution has an extra dispersion parameter that models this biological variability between replicates. Tools like DESeq2 estimate gene-wise dispersions using empirical Bayes shrinkage.

What is statistical power in RNA-seq experiments?

Statistical power is the probability of correctly detecting a truly differentially expressed gene. It depends on fold change magnitude, sample size, sequencing depth, biological variability (dispersion), and the significance threshold. Low power means many real DE genes go undetected (false negatives).

How many replicates do I need for RNA-seq?

The answer depends on the expected fold change and biological variability. For detecting 2-fold changes with moderate variability, 3 replicates provide ~70% power. For small fold changes (1.5×), 6-10 replicates may be needed. Biological replicates (independent samples) are far more valuable than technical replicates (re-sequencing the same library).
