Language Family Tree: How Languages Diverge Over Time

simulator intermediate ~10 min
~20 living languages from one ancestor

Over 6,000 years of simulated evolution, a single proto-language typically splits into roughly 20 surviving daughter languages with decreasing mutual intelligibility, a small-scale analogue of how the real Indo-European family diversified.

Formula

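Lexical retention rate: R(t) = R_0 * Math.pow(1 - drift/100, t), where drift is the percentage of vocabulary replaced per generation and t is the number of generations
Expected branches at time t: E[B] = Math.pow(2, branch_rate * t), ignoring extinction
Swadesh similarity: S(A,B) = shared_cognates / total_items * 100, expressed as a percentage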
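
As a rough illustration, the three formulas above can be written as small JavaScript functions. This is a minimal sketch, not the simulator's actual source: the function names and the sample values are assumptions chosen for the example, and a Swadesh list is assumed to be an array of cognate-class labels with one entry per meaning.

```javascript
// Sketch of the three formulas above. Function names and sample values
// are illustrative assumptions, not the simulator's real code or defaults.

// Vocabulary retained after t generations, with `drift` percent lost per generation.
function lexicalRetention(r0, drift, t) {
  return r0 * Math.pow(1 - drift / 100, t);
}

// Expected number of branches at time t, with no extinction term.
function expectedBranches(branchRate, t) {
  return Math.pow(2, branchRate * t);
}

// Percentage of list items for which two languages share a cognate.
// Each list holds one cognate-class label per meaning, in the same order.
function swadeshSimilarity(listA, listB) {
  if (listA.length !== listB.length || listA.length === 0) {
    throw new Error("lists must be non-empty and of equal length");
  }
  let shared = 0;
  for (let i = 0; i < listA.length; i++) {
    if (listA[i] === listB[i]) shared++;
  }
  return (shared / listA.length) * 100;
}

console.log(lexicalRetention(1, 50, 2)); // 0.25
console.log(expectedBranches(0.5, 4)); // 4
console.log(swadeshSimilarity(["a", "b", "c", "d"], ["a", "b", "x", "y"])); // 50
```

Note that the branch formula grows without bound, so any realistic run also needs an extinction term to keep the family near the roughly 20 survivors described above.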

Languages Are Living Organisms

Every language alive today descends from an earlier form, branching and diverging like a biological tree of life. English, Hindi, Russian, and Persian all trace back to a single ancestor — Proto-Indo-European — spoken on the Pontic steppe roughly 6,000 years ago. This simulation models that process: watch a single proto-language fracture into a family of descendants, each drifting further from its siblings with every passing generation.

The Mechanics of Divergence

Language change is relentless and universal. Sound shifts transform pronunciation (Latin 'c', pronounced /k/, became French 'ch', pronounced /ʃ/, before 'a', as in cantare > chanter), grammatical structures simplify or grow more complex, and vocabulary is borrowed, invented, or lost. When two populations are separated by mountains, oceans, or politics, their speech drifts independently until mutual intelligibility vanishes. At that point, one language has become two.
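
The independent drift described above can be sketched with regular sound-change rules applied separately to two isolated populations. Everything here is a toy assumption: the proto-words are invented forms rather than real reconstructions, and both rule sets are hypothetical, loosely modelled on the kind of change in the Latin-to-French example.

```javascript
// Toy proto-vocabulary (invented forms, not real reconstructions).
const protoWords = ["kanta", "pater", "kapra", "tila"];

// Each population applies its own ordered list of regular sound changes.
// Rules are [pattern, replacement] pairs; both rule sets are hypothetical.
const rulesWest = [
  [/k(?=a)/g, "ch"], // k becomes ch before a
  [/a$/g, "e"], // word-final a becomes e
];
const rulesEast = [
  [/p/g, "f"], // p becomes f everywhere
  [/t(?=i)/g, "s"], // t becomes s before i
];

// Apply every rule in order to a single word.
function applyRules(word, rules) {
  return rules.reduce(
    (w, [pattern, replacement]) => w.replace(pattern, replacement),
    word
  );
}

// Apply a rule set to a whole vocabulary.
function evolve(words, rules) {
  return words.map((w) => applyRules(w, rules));
}

const west = evolve(protoWords, rulesWest);
const east = evolve(protoWords, rulesEast);

console.log(west); // [ 'chante', 'pater', 'chapre', 'tile' ]
console.log(east); // [ 'kanta', 'fater', 'kafra', 'sila' ]
```

After only two rules per population, no word is still identical across the two daughters, even though every form remains traceable to its proto-word through a regular rule.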

Extinction and the Endangered Majority

Of the roughly 7,000 languages spoken today, linguists estimate that 40–50% are endangered. A language dies when its last fluent speaker dies, taking with it a unique worldview, oral literature, and ecological knowledge. This simulation lets you adjust extinction rates to see how quickly diversity collapses when small languages are lost at the periphery of the tree.
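
To get a feel for how sensitive diversity is to the extinction rate, here is a minimal sketch of the expected number of living languages under a simple branching model. It is an assumption-laden simplification rather than the simulator's algorithm: each language is assumed to split with probability `splitProb` and die with probability `extinctProb` per generation, and 240 generations stands in for 6,000 years at an assumed 25 years per generation.

```javascript
// Expected number of living languages after `generations` steps, assuming
// each language independently splits with probability `splitProb` and dies
// with probability `extinctProb` per generation. All parameter names and
// values are hypothetical, chosen only for illustration.
function expectedLanguages(start, splitProb, extinctProb, generations) {
  return start * Math.pow(1 + splitProb - extinctProb, generations);
}

// Assumption: 6,000 years at 25 years per generation = 240 generations.
const GENERATIONS = 240;

for (const extinctProb of [0.0, 0.005, 0.01, 0.02]) {
  const n = expectedLanguages(1, 0.02, extinctProb, GENERATIONS);
  console.log(`extinction ${extinctProb}: ~${n.toFixed(1)} languages expected`);
}
```

Because growth is exponential in the difference between the split and extinction rates, a small increase in extinction removes a large share of the expected diversity. When the two rates are equal, the expected count stays at its starting value.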

Computational Phylogenetics

Modern historical linguists borrow phylogenetic algorithms from evolutionary biology. By coding cognate sets across languages and running Bayesian inference on the data, researchers can estimate when language families diverged. The resulting dates come with explicit uncertainty ranges and can be compared against archaeological evidence of population movements and cultural shifts.
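
Full Bayesian phylogenetic inference does not fit in a short example, so the sketch below swaps in a much simpler distance-based estimate in the spirit of classical glottochronology. It is derived from the retention formula in the Formula section under a strong assumption of constant drift: if two lineages each independently retain Math.pow(1 - drift/100, t) of the ancestral vocabulary, their expected shared fraction is Math.pow(1 - drift/100, 2 * t), which can be solved for t. The function name is hypothetical.

```javascript
// Estimate generations since two languages split, from their Swadesh
// similarity (a percentage) and an assumed constant drift (percent per
// generation). This is NOT Bayesian inference, only a constant-rate sketch:
//   S / 100 = (1 - drift/100)^(2t)  =>  t = ln(S/100) / (2 * ln(1 - drift/100))
function divergenceGenerations(similarityPercent, drift) {
  if (similarityPercent <= 0 || similarityPercent > 100) {
    throw new RangeError("similarityPercent must be in (0, 100]");
  }
  if (drift <= 0 || drift >= 100) {
    throw new RangeError("drift must be in (0, 100)");
  }
  const retention = 1 - drift / 100;
  return Math.log(similarityPercent / 100) / (2 * Math.log(retention));
}

// With 50% drift per generation, 25% similarity corresponds to
// one generation of separation (up to floating-point rounding).
console.log(divergenceGenerations(25, 50));
```

Real rates of lexical change vary between words and between lineages, which is one reason constant-rate estimates like this one were largely superseded by the model-based methods described above.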

FAQ

How do languages split into new languages?

Languages diverge when populations become geographically or socially isolated. Each group's speech drifts independently through sound changes, vocabulary shifts, and grammatical innovations. After enough time, mutual intelligibility is lost and we classify them as separate languages.

What was Proto-Indo-European?

Proto-Indo-European (PIE) is the reconstructed common ancestor of the Indo-European language family, most widely thought to have been spoken roughly 6,000 years ago on the Pontic steppe. It gave rise to most European languages plus Persian, Hindi, and many others, which together have about 3.2 billion native speakers today.

How do linguists reconstruct ancient languages?

Through the comparative method: systematically comparing cognate words across related languages to identify regular sound correspondences. For example, Latin 'p' regularly corresponds to Germanic 'f' (pater/father, pes/foot, piscis/fish), and from such recurring patterns linguists reconstruct the original PIE sound as *p.
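
The first step of that process, spotting recurring correspondences, can be sketched by tallying how initial sounds line up across cognate pairs. This is a deliberately naive illustration: it compares spellings rather than phonetic transcriptions, looks only at the first segment of each word, and treats 'th' as a single unit, whereas the real comparative method aligns whole words and accounts for conditioning environments. The function names are hypothetical; the word pairs are well-known Latin and English cognates.

```javascript
// Well-known Latin/English cognate pairs, in ordinary spelling.
const cognates = [
  ["pater", "father"],
  ["pes", "foot"],
  ["piscis", "fish"],
  ["tres", "three"],
  ["tu", "thou"],
  ["cornu", "horn"],
  ["centum", "hundred"],
];

// Spellings treated as one sound (a simplifying assumption).
const DIGRAPHS = ["th"];

// Return the first sound of a word, keeping listed digraphs together.
function initialSegment(word) {
  const digraph = DIGRAPHS.find((d) => word.startsWith(d));
  return digraph || word[0];
}

// Count how often each initial-sound pairing recurs across the cognates.
function countCorrespondences(pairs) {
  const counts = new Map();
  for (const [a, b] of pairs) {
    const key = `${initialSegment(a)}:${initialSegment(b)}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const table = countCorrespondences(cognates);
for (const [key, n] of table) {
  console.log(`${key} occurs ${n} times`);
}
// p:f occurs 3 times
// t:th occurs 2 times
// c:h occurs 2 times
```

A pairing that recurs across many unrelated meanings is unlikely to be coincidence, and that regularity is what licenses reconstructing a single ancestral sound behind each pattern.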

Why are some language families larger than others?

Family size depends on historical factors: geographic spread, population growth, conquest, trade, and technological advantages. Indo-European and Austronesian became enormous because their speakers expanded across vast territories, while some families remained small in isolated regions.
