Hardy-Weinberg and the Basis of Match Probability
Forensic DNA statistics rest on the Hardy-Weinberg principle: in a large, randomly mating population, genotype frequencies can be predicted from allele frequencies. For a heterozygous genotype with alleles A and B at frequencies p and q, the expected frequency is 2pq. For a homozygote carrying two copies of an allele at frequency p, it is p². These per-locus probabilities are then multiplied across all tested loci under the assumption of linkage equilibrium, that is, that alleles at different loci are statistically independent in the population.
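The per-locus calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a laboratory implementation: the function name `genotype_frequency` and the allele frequencies 0.1 and 0.2 are hypothetical, chosen only to show the arithmetic.

```python
def genotype_frequency(p, q=None):
    """Expected genotype frequency under Hardy-Weinberg equilibrium.

    p -- frequency of the first allele
    q -- frequency of the second allele, or None for a homozygote
    """
    if q is None:
        # Homozygote: two copies of the same allele.
        return p * p
    # Heterozygote: two orderings (A from one parent, B from the other, or the reverse).
    return 2 * p * q


# Hypothetical allele frequencies, for illustration only.
het = genotype_frequency(0.1, 0.2)  # 2pq
hom = genotype_frequency(0.1)       # p squared
print(het, hom)
```

The factor of 2 for heterozygotes reflects the two parental orderings in which the pair of alleles can be inherited.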
The Product Rule and Combined Rarity
The extraordinary discriminating power of DNA profiling comes from the multiplicative combination of moderately informative loci. A single locus might have a genotype frequency of 5%, but 20 such loci combined produce a profile frequency of approximately 0.05²⁰ ≈ 10⁻²⁶. Even when the individual genotypes are more common than this, modern 24-locus kits routinely achieve random match probabilities far smaller than the inverse of the world population, making a coincidental match between unrelated individuals extremely improbable.
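The product-rule arithmetic above is easy to check numerically. The sketch below works in log10 space, which is an implementation choice of this example (it keeps very small products readable and avoids underflow with many loci). The function name `profile_log10_frequency` is hypothetical, and the world population figure of 8 billion is an approximate assumption used only for comparison.

```python
import math


def profile_log10_frequency(genotype_freqs):
    """log10 of the profile frequency under the product rule.

    Assumes the loci are independent (linkage equilibrium), so the
    per-locus genotype frequencies multiply, i.e. their logs add.
    """
    return sum(math.log10(f) for f in genotype_freqs)


# The example from the text: 20 loci, each with a 5% genotype frequency.
log10_rmp = profile_log10_frequency([0.05] * 20)
print(f"profile frequency ~ 10^{log10_rmp:.1f}")  # prints: profile frequency ~ 10^-26.0

# Assumed, approximate world population, for comparison only.
WORLD_POPULATION = 8e9
print(log10_rmp < -math.log10(WORLD_POPULATION))  # prints: True
```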
Population Substructure Corrections
Real human populations do not mate entirely at random. Ethnic, geographic, and cultural barriers create substructure in which individuals within a group share more alleles than random mating would predict. The theta (Fst) correction, recommended by the NRC II in 1996, inflates genotype frequency estimates to account for this extra allele sharing. Theta values of 0.01 to 0.03 are typical, with higher values used for more isolated populations. This correction is now standard practice in accredited forensic laboratories.
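The text does not say which formula implements the theta correction. One widely used form is the pair of conditional match probabilities commonly attributed to Balding and Nichols, and the sketch below assumes that form. The function name `theta_corrected_frequency` and the allele frequencies are hypothetical.

```python
def theta_corrected_frequency(p, q=None, theta=0.01):
    """Genotype match probability with a theta (Fst) correction.

    Assumes the Balding-Nichols conditional formulas; the source text
    does not specify which correction formula a laboratory applies.

    p     -- frequency of the first allele
    q     -- frequency of the second allele, or None for a homozygote
    theta -- coancestry coefficient (Fst)
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if q is None:
        # Homozygote: [2θ + (1-θ)p][3θ + (1-θ)p] / [(1+θ)(1+2θ)]
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
    # Heterozygote: 2[θ + (1-θ)p][θ + (1-θ)q] / [(1+θ)(1+2θ)]
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / denom


# Hypothetical allele frequencies; theta = 0 recovers p squared and 2pq.
for theta in (0.0, 0.01, 0.03):
    hom = theta_corrected_frequency(0.1, theta=theta)
    het = theta_corrected_frequency(0.1, 0.2, theta=theta)
    print(f"theta={theta:.2f}  homozygote={hom:.5f}  heterozygote={het:.5f}")
```

Setting theta to zero reduces both expressions to the uncorrected Hardy-Weinberg values, which is a convenient sanity check. For the rare alleles used here, larger theta values give larger, more conservative frequencies.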
Database Searches and the Birthday Problem
When a suspect is identified through a database search rather than independent investigation, the probability of finding a coincidental match increases with database size, an effect analogous to the birthday problem. For a single searched profile, the expected number of adventitious matches is approximately RMP × N, where RMP is the random match probability and N is the database size. For CODIS, with 20+ million profiles, this underscores the importance of using expanded STR panels to maintain discriminating power even as databases grow.
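The RMP × N relationship can be illustrated with a short calculation. This is a sketch under simplifying assumptions: every database profile is treated as an independent, unrelated comparison with the same match probability, and the RMP values are hypothetical. The function names are likewise illustrative.

```python
import math


def expected_adventitious_matches(rmp, n):
    """Expected number of coincidental matches for one searched profile."""
    return rmp * n


def prob_at_least_one_match(rmp, n):
    """Probability of one or more coincidental matches: 1 - (1 - rmp)^n.

    Uses log1p/expm1 so that very small rmp values do not lose precision.
    """
    return -math.expm1(n * math.log1p(-rmp))


N = 20_000_000  # database size on the scale mentioned in the text

# Hypothetical random match probabilities, from a weak to a strong profile.
for rmp in (1e-6, 1e-9, 1e-15):
    expected = expected_adventitious_matches(rmp, N)
    p_any = prob_at_least_one_match(rmp, N)
    print(f"RMP={rmp:.0e}  expected matches={expected:.3g}  P(at least one)={p_any:.3g}")
```

When RMP × N is much smaller than 1, the probability of at least one adventitious match is close to the expected count. Once the product approaches or exceeds 1, coincidental hits become likely, which is why the discriminating power of the panel has to keep pace with database growth.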