Forensic DNA Match Probability & Population Statistics Calculator

simulator intermediate ~10 min
RMP ≈ 10⁻²⁴: profile is astronomically rare

With 20 STR loci and an average allele frequency of 0.08, the simulator reports a random match probability of approximately 10⁻²⁴, so a coincidental match is vanishingly unlikely even in searches of the largest forensic databases.

Formula

RMP = ∏ P(genotype_i) for i = 1 to n loci
P(het) = 2p_i q_i; with the NRC II theta correction (Eq. 4.10b): P(het) = 2[θ + (1-θ)p_i][θ + (1-θ)q_i] / [(1+θ)(1+2θ)]
E(matches) = RMP × N_database; LR = 1 / RMP
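A minimal Python sketch of the three formula lines above. It is not the simulator's own code, so its output need not match the value displayed on this page. The function names (`genotype_prob`, `random_match_probability`, `expected_matches`, `likelihood_ratio`) are hypothetical. The heterozygote branch uses the full Balding-Nichols form, including the (1+θ)(1+2θ) denominator, and the homozygote branch uses the matching Balding-Nichols expression, which the formula box does not list.

```python
import math
from typing import Optional, Sequence, Tuple

# (p, q) allele frequencies at one locus; q is None for a homozygote
Locus = Tuple[float, Optional[float]]

def genotype_prob(p: float, q: Optional[float], theta: float = 0.0) -> float:
    """Per-locus genotype probability with the Balding-Nichols theta correction.

    theta = 0 reduces to Hardy-Weinberg: 2pq (heterozygote) or p**2 (homozygote).
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if q is None:  # homozygote: [2θ + (1-θ)p][3θ + (1-θ)p] / [(1+θ)(1+2θ)]
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
    # heterozygote: 2[θ + (1-θ)p][θ + (1-θ)q] / [(1+θ)(1+2θ)]
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / denom

def random_match_probability(loci: Sequence[Locus], theta: float = 0.0) -> float:
    """Product rule: multiply per-locus genotype probabilities (assumes linkage equilibrium)."""
    return math.prod(genotype_prob(p, q, theta) for p, q in loci)

def expected_matches(rmp: float, database_size: int) -> float:
    """E(matches) = RMP x N_database."""
    return rmp * database_size

def likelihood_ratio(rmp: float) -> float:
    """LR = 1 / RMP for a single-source profile."""
    return 1.0 / rmp

if __name__ == "__main__":
    # Hypothetical profile: 20 heterozygous loci with p = q = 0.08
    profile = [(0.08, 0.08)] * 20
    rmp = random_match_probability(profile, theta=0.01)
    print(f"RMP = {rmp:.2e}, LR = {likelihood_ratio(rmp):.2e}")
    print(f"E(matches), N = 20 million: {expected_matches(rmp, 20_000_000):.2e}")
```

Setting theta=0.0 recovers the plain Hardy-Weinberg product rule, which makes it easy to see how much the correction changes the result.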

Hardy-Weinberg and the Basis of Match Probability

The foundation of forensic DNA statistics is the Hardy-Weinberg principle: in a large, randomly mating population, genotype frequencies can be predicted from allele frequencies. For a heterozygous genotype with alleles A and B at frequencies p and q, the expected frequency is 2pq; for a homozygote AA, it is p². These per-locus probabilities are then multiplied across all tested loci under the assumption of linkage equilibrium, meaning that alleles at different loci are statistically independent in the population.
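A minimal sketch of the Hardy-Weinberg step for a single locus: every genotype frequency is p² or 2pq, and because the allele frequencies sum to 1, the genotype frequencies do too. The allele labels and frequencies are hypothetical, and `hw_genotype_frequencies` is an illustrative name.

```python
from itertools import combinations_with_replacement

def hw_genotype_frequencies(allele_freqs):
    """Hardy-Weinberg genotype frequencies at one locus: p**2 for homozygotes, 2pq for heterozygotes."""
    genotypes = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        p, q = allele_freqs[a], allele_freqs[b]
        genotypes[(a, b)] = p * p if a == b else 2 * p * q
    return genotypes

# Hypothetical four-allele STR locus; allele frequencies sum to 1
freqs = {"12": 0.1, "13": 0.3, "14": 0.4, "15": 0.2}
table = hw_genotype_frequencies(freqs)
print(table[("13", "14")])  # heterozygote 13/14: 2 * 0.3 * 0.4
print(sum(table.values()))  # equals (sum of allele frequencies)**2 = 1, up to rounding
```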

The Product Rule and Combined Rarity

The extraordinary discriminating power of DNA profiling comes from multiplying together many moderately informative loci. A single locus might have a genotype frequency of 5%, but 20 such loci combined give a profile frequency of 0.05²⁰ ≈ 10⁻²⁶. Even with more common alleles, modern 24-locus kits routinely achieve random match probabilities far smaller than the inverse of the world population, making coincidental matches among unrelated individuals extremely unlikely.
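The arithmetic above is easiest to check in log space, where the product rule becomes a sum of base-10 logarithms and the exponent can be read off directly. This sketch uses a hypothetical helper, `combined_log10_frequency`, and assumes a world population of about 8 billion for the comparison.

```python
import math

def combined_log10_frequency(genotype_freqs):
    """log10 of the product-rule profile frequency, summed term by term in log space."""
    return sum(math.log10(f) for f in genotype_freqs)

# Example from the text: 20 loci, each with genotype frequency 0.05
log10_rmp = combined_log10_frequency([0.05] * 20)
print(f"0.05^20 ≈ 10^{log10_rmp:.2f}")  # → 0.05^20 ≈ 10^-26.02

# Assumption: world population of roughly 8 billion
world_population = 8e9
print(log10_rmp < -math.log10(world_population))  # → True
```

Summing logarithms also avoids floating-point underflow if very many tiny frequencies are combined.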

Population Substructure Corrections

Real human populations do not mate entirely at random. Ethnic, geographic, and cultural barriers create substructure in which individuals within a group share more alleles than Hardy-Weinberg expectations predict. The theta (F_ST) correction, recommended in the 1996 NRC II report, inflates genotype frequency estimates to account for this extra allele sharing. Typical theta values of 0.01 to 0.03 are used, with higher values for more isolated populations. This correction is now standard practice in accredited forensic laboratories.
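A small sketch of how much the theta correction inflates a heterozygote frequency, assuming the Balding-Nichols form (NRC II Eq. 4.10b). The allele pairs are hypothetical and `het_prob` is an illustrative name. The relative inflation grows with θ and is much larger for rare alleles: about 1.2x for the common pair versus roughly 9x for the rare pair at θ = 0.03.

```python
def het_prob(p, q, theta):
    """Heterozygote probability with the Balding-Nichols (NRC II Eq. 4.10b) theta correction."""
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / ((1 + theta) * (1 + 2 * theta))

# Hypothetical allele pairs: one common, one rare
for p, q in [(0.20, 0.15), (0.02, 0.01)]:
    base = het_prob(p, q, 0.0)  # theta = 0 gives plain Hardy-Weinberg 2pq
    for theta in (0.01, 0.03):
        inflation = het_prob(p, q, theta) / base
        print(f"p={p}, q={q}, theta={theta}: x{inflation:.2f}")
```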

Database Searches and the Birthday Problem

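When a suspect is identified through a database search rather than independent investigation, the chance of finding a coincidental match grows with database size. The expected number of adventitious matches is RMP × N, where N is the number of profiles searched. The analogy to the birthday problem is loose: the birthday problem concerns a match between any two profiles within the database, and the number of such pairs grows as N(N − 1)/2 rather than N. For CODIS, which holds more than 20 million profiles, even an RMP of 10⁻²⁰ gives about 2 × 10⁻¹³ expected adventitious matches per search; keeping this number negligible as databases grow is one reason to use expanded STR panels.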
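A short sketch of the database arithmetic, with hypothetical helper names and inputs (an RMP of 10⁻²⁰ and 20 million profiles). It contrasts a single evidence-profile search, whose expected chance matches scale with N, against the birthday-problem count of matching pairs within the database, which scales with N(N − 1)/2. The "at least one match" probability assumes the database profiles are unrelated and independent.

```python
import math

def expected_adventitious_matches(rmp, n):
    """Expected chance matches when one evidence profile is searched against n profiles."""
    return rmp * n

def prob_at_least_one_match(rmp, n):
    """1 - (1 - RMP)^n, computed stably; assumes the n profiles are unrelated and independent."""
    return -math.expm1(n * math.log1p(-rmp))

def expected_pairwise_matches(rmp, n):
    """Birthday-problem analogue: expected matching pairs among all n(n-1)/2 pairs in the database."""
    return rmp * n * (n - 1) / 2

# Hypothetical inputs: RMP of 1e-20 and a database of 20 million profiles
rmp, n = 1e-20, 20_000_000
print(f"E(matches) for one search:    {expected_adventitious_matches(rmp, n):.1e}")
print(f"P(at least one chance match): {prob_at_least_one_match(rmp, n):.1e}")
print(f"E(matching pairs within DB):  {expected_pairwise_matches(rmp, n):.1e}")
```

Using `log1p` and `expm1` keeps 1 − (1 − RMP)^N accurate when RMP is far below machine epsilon, where the naive expression would round to zero.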

FAQ

What is the random match probability?

The random match probability (RMP) is the probability that a randomly selected unrelated individual from the reference population would have the same DNA profile as the evidence sample. It is calculated by multiplying per-locus genotype frequencies across all tested loci, using the product rule under the assumption of linkage equilibrium.

How does the product rule work for DNA profiles?

The product rule multiplies genotype probabilities across independent loci: RMP = P(locus1) × P(locus2) × ... × P(locusN). For a heterozygous locus with alleles of frequency p and q, the expected frequency is 2pq (Hardy-Weinberg). With 20 loci whose allele frequencies are around 5-15%, the combined RMP typically reaches 10⁻²⁰ or smaller.
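The claim can be bounded with Hardy-Weinberg alone (no theta correction). With allele frequencies restricted to 5-15%, the most common possible genotype at a locus is a heterozygote with p = q = 0.15, at 2pq = 0.045; a homozygote at the same frequency (p² = 0.0225) is rarer still. So 20 such loci give an RMP no larger than about 0.045²⁰. The helper name `log10_rmp_all_het` is hypothetical.

```python
import math

def log10_rmp_all_het(p, q, n_loci):
    """log10 RMP for n_loci heterozygous loci that each have alleles at frequencies p and q (HW 2pq)."""
    return n_loci * math.log10(2 * p * q)

# 20 heterozygous loci at the common and rare ends of the 5-15% range
upper = log10_rmp_all_het(0.15, 0.15, 20)  # largest per-locus 2pq in range: 0.045
lower = log10_rmp_all_het(0.05, 0.05, 20)  # all-heterozygous at the rare end: 2pq = 0.005
print(f"All-heterozygous RMP from 10^{lower:.1f} to 10^{upper:.1f}")  # → All-heterozygous RMP from 10^-46.0 to 10^-26.9
```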

What is the NRC II database search correction?

When a suspect is identified through a database search rather than independent investigation, some statisticians argue that the match probability should be adjusted. The NRC II Report recommended multiplying the RMP by the database size (RMP × N) to estimate the expected number of coincidental matches. This recommendation remains debated; other statisticians argue that a database search does not weaken the evidence against the matched individual.

Why do different populations give different match probabilities?

Allele frequencies vary among population groups due to different demographic histories. A genotype rare in one population may be more common in another. Forensic laboratories typically calculate RMP using multiple reference databases (e.g., Caucasian, African American, Hispanic, Asian) and report the most conservative (highest) value.
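A sketch of the multi-database comparison, using three invented reference populations ("Population A", "B", "C") and invented allele frequencies for a three-locus heterozygous profile; real casework would use a laboratory's validated databases. The helper `rmp_hw` is hypothetical and applies plain Hardy-Weinberg 2pq at each locus.

```python
import math

# Invented allele frequencies (p, q) for a 3-locus heterozygous profile in three databases
population_freqs = {
    "Population A": [(0.10, 0.20), (0.05, 0.12), (0.08, 0.30)],
    "Population B": [(0.15, 0.18), (0.09, 0.10), (0.11, 0.25)],
    "Population C": [(0.06, 0.22), (0.04, 0.14), (0.07, 0.28)],
}

def rmp_hw(loci):
    """Product-rule RMP using Hardy-Weinberg 2pq at each heterozygous locus."""
    return math.prod(2 * p * q for p, q in loci)

rmps = {pop: rmp_hw(loci) for pop, loci in population_freqs.items()}
# The most conservative value is the highest RMP (the least rare estimate)
most_conservative = max(rmps, key=rmps.get)
for pop, value in rmps.items():
    print(f"{pop}: RMP = {value:.2e}")
print(f"Most conservative (highest): {most_conservative}")  # → Most conservative (highest): Population B
```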
