STR Profiling: Allele Frequency & Discrimination Power Calculator

simulator intermediate ~10 min
Combined DP > 0.999999999 — virtually certain discrimination

With 16 STR loci averaging 12 alleles each, the combined discrimination power exceeds nine nines (0.999999999). In other words, the chance that two unrelated individuals picked at random are indistinguishable across all 16 loci is below 1 in a billion.

Formula

Heterozygosity: H = 1 - Σ(p_i²) where p_i is frequency of allele i
Discrimination Power: DP = 1 - Σ(g_j²) where g_j are genotype frequencies
Theta correction: P(A_iA_i) = θp_i + (1-θ)p_i² = p_i² + θp_i(1-p_i) (homozygote adjusted upward)
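The heterozygosity and discrimination power formulas above can be sketched for a single locus as follows. This is a minimal illustration, not the simulator's code: it assumes genotype frequencies follow Hardy-Weinberg proportions (p_i² for homozygotes, 2p_ip_j for heterozygotes), and the three allele frequencies are made up for the example.

```python
from itertools import combinations_with_replacement


def heterozygosity(freqs):
    """H = 1 - sum(p_i^2) over the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def genotype_frequencies(freqs):
    """Genotype frequencies under Hardy-Weinberg proportions (an assumption)."""
    result = []
    for i, j in combinations_with_replacement(range(len(freqs)), 2):
        if i == j:
            result.append(freqs[i] ** 2)            # homozygote: p_i^2
        else:
            result.append(2 * freqs[i] * freqs[j])  # heterozygote: 2 p_i p_j
    return result


def discrimination_power(freqs):
    """DP = 1 - sum(g_j^2) over the genotype frequencies at one locus."""
    return 1.0 - sum(g * g for g in genotype_frequencies(freqs))


# Hypothetical allele frequencies for one locus (must sum to 1).
example_freqs = [0.5, 0.3, 0.2]
print(f"H  = {heterozygosity(example_freqs):.4f}")
print(f"DP = {discrimination_power(example_freqs):.4f}")
```

Even with only three alleles, DP comes out higher than H, because two people can carry different genotypes while both being heterozygous.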

The Foundation of Forensic Identity

STR profiling is the gold standard of forensic human identification. Each locus in the human genome where a short sequence repeats a variable number of times provides an approximately independent axis of variation. By measuring repeat counts at multiple loci spread across different chromosomes, forensic scientists construct a multi-dimensional fingerprint whose rarity compounds with each additional marker: per-locus match probabilities multiply, so the combined match probability shrinks roughly exponentially with the number of loci.
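To see how quickly the numbers compound, here is a rough sketch of combining loci. It rests on two simplifying assumptions that real data only approximate: the loci are independent, and every locus has k equally common alleles in Hardy-Weinberg proportions, in which case the per-locus match probability Σ(g_j²) reduces to (2k - 1) / k³. The function names are illustrative.

```python
import math


def match_probability_equifrequent(k):
    """Sum of squared genotype frequencies for k equally common alleles.

    With p = 1/k there are k homozygotes at 1/k^2 and k(k-1)/2 heterozygotes
    at 2/k^2, which sums (after squaring) to (2k - 1) / k^3.
    """
    return (2 * k - 1) / k ** 3


def combined_match_probability(per_locus_pm):
    """Product of per-locus match probabilities (assumes independent loci)."""
    return math.prod(per_locus_pm)


pm = match_probability_equifrequent(12)              # one locus, 12 alleles
combined_pm = combined_match_probability([pm] * 16)  # 16 such loci
combined_dp = 1.0 - combined_pm

print(f"per-locus DP:               {1.0 - pm:.5f}")
print(f"combined match probability: {combined_pm:.2e}")
print(f"combined DP (as a float):   {combined_dp!r}")
```

Under these idealized assumptions the combined match probability lands at roughly 10^-30, so the combined DP is indistinguishable from 1 in double precision. That is why it is usually more practical to track the match probability (or its logarithm) than DP itself. Real loci have uneven allele frequencies, so their per-locus DP is lower than this equal-frequency upper bound.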

Allele Frequency Distributions

The evidential weight of a DNA profile depends entirely on how common its constituent alleles are in the relevant population. Rare alleles contribute more discrimination power than common ones. Population geneticists maintain reference databases for major ethnic groups, and forensic calculations use the most conservative frequency estimates to protect defendants' rights.

Population Substructure

Real populations are not perfectly random-mating. Ethnic groups, geographic isolates, and endogamous communities show elevated homozygosity relative to Hardy-Weinberg expectations. The theta correction adjusts genotype probability calculations upward to account for this substructure, ensuring that match statistics remain conservative even when the suspect's exact subpopulation is unknown.
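A small sketch of the homozygote adjustment, using the form P(A_iA_i) = p_i² + θp_i(1-p_i). The allele frequency of 0.1 is a made-up value, and leaving heterozygotes at 2p_ip_j is an assumption of this sketch; other formulations adjust heterozygotes as well.

```python
def homozygote_probability(p, theta=0.0):
    """Theta-corrected homozygote probability: p^2 + theta * p * (1 - p)."""
    return p * p + theta * p * (1.0 - p)


def heterozygote_probability(p_i, p_j):
    """Heterozygote probability, left at 2 * p_i * p_j in this sketch (assumption)."""
    return 2.0 * p_i * p_j


p = 0.1  # hypothetical allele frequency
for theta in (0.0, 0.01, 0.03):
    prob = homozygote_probability(p, theta)
    print(f"theta = {theta:.2f}: P(homozygote) = {prob:.4f}")
```

The correction matters most for rare alleles: the added term θp(1-p) is small in absolute terms but can be a large fraction of p² when p is small.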

Modern Multiplex Kits

Commercial kits like GlobalFiler (24 loci), PowerPlex Fusion (24 loci), and Investigator 24plex simultaneously amplify roughly two dozen markers from nanogram quantities of DNA. These large multiplex reactions push the combined random match probability down to the order of 10^-30 (equivalently, combined discrimination power within about 10^-30 of 1), making adventitious matches essentially impossible even when searching planetary-scale databases.

FAQ

What are STR loci in forensic genetics?

Short Tandem Repeats (STRs) are regions of DNA where a 2-6 base pair motif repeats a variable number of times. Different individuals carry different repeat counts (alleles), making STRs ideal markers for human identification. Forensic kits amplify 16-24 STR loci simultaneously.

How is allele frequency measured?

Allele frequencies are estimated from reference population databases. A sample of N individuals is genotyped at each locus, and the frequency of each allele equals its count divided by 2N (since each person has two alleles). Larger samples yield more precise frequency estimates.
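The counting rule can be sketched in a few lines. The five genotypes below are invented for illustration, and a real reference database would hold far more individuals.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies at one locus.

    genotypes: list of (allele_a, allele_b) pairs, one pair per individual.
    Each allele's frequency is its count divided by 2N.
    """
    counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * len(genotypes)
    return {allele: n / total_alleles for allele, n in sorted(counts.items())}


# Hypothetical sample of N = 5 individuals typed at a single locus
# (values are repeat counts).
sample = [(12, 14), (14, 14), (12, 15), (13, 14), (12, 12)]
estimated = allele_frequencies(sample)
for allele, freq in estimated.items():
    print(f"allele {allele}: {freq:.2f}")
```

With only five individuals, alleles 13 and 15 are each seen once, so their estimates are very imprecise, which illustrates why larger samples are needed.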

What is the theta correction?

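The theta (θ) correction, also called F_ST or the coancestry coefficient, accounts for population substructure. When subpopulations exist, allele frequencies within a group may differ from those of the overall population, so two members of the same subpopulation are more likely to share a genotype than database-wide frequencies suggest. Applying θ raises the computed match probability to compensate. Typical θ values range from 0.01 to 0.03.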

How many STR loci are needed for reliable identification?

The FBI CODIS system expanded from 13 to 20 core loci in 2017. European systems use 12-16 loci. With 16+ loci, the random match probability for unrelated individuals typically falls below 1 in 10^18, providing extremely high statistical confidence.
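As an illustration of how a random match probability is assembled, the sketch below multiplies per-locus genotype probabilities for one profile (the product rule). The locus names, allele frequencies, and profile are all hypothetical, only three loci are used to keep the arithmetic readable, and Hardy-Weinberg proportions with independent loci are assumed.

```python
import math

# Hypothetical allele frequency table: locus -> {allele: frequency}.
HYPOTHETICAL_FREQS = {
    "locus_A": {12: 0.30, 13: 0.25, 14: 0.45},
    "locus_B": {8: 0.10, 9: 0.60, 10: 0.30},
    "locus_C": {15: 0.20, 16: 0.50, 17: 0.30},
}


def genotype_probability(freqs, genotype):
    """Hardy-Weinberg genotype probability at one locus (no theta correction)."""
    a, b = genotype
    if a == b:
        return freqs[a] ** 2
    return 2.0 * freqs[a] * freqs[b]


def random_match_probability(profile, freq_table):
    """Product rule across loci; assumes the loci are independent."""
    return math.prod(
        genotype_probability(freq_table[locus], genotype)
        for locus, genotype in profile.items()
    )


# Hypothetical three-locus profile.
profile = {"locus_A": (12, 14), "locus_B": (9, 9), "locus_C": (15, 17)}
rmp = random_match_probability(profile, HYPOTHETICAL_FREQS)
print(f"RMP = {rmp:.6f} (about 1 in {1 / rmp:.0f})")
```

Three loci with common alleles give only about 1 in 86 here. Each further locus multiplies in another factor well below 1, which is how 16 or more loci reach the very small probabilities quoted above.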
