The Tree of Life
Every living organism on Earth is connected by a single branching tree of descent — the Tree of Life. Phylogenetics is the science of reconstructing this tree from observable data: the shapes of bones, the sequences of DNA, the presence or absence of anatomical features. Since Darwin first sketched a branching diagram in his notebook in 1837, the methods have become enormously sophisticated, but the fundamental goal remains the same — to determine who is most closely related to whom.
Building Trees from Data
This simulation generates a random 'true' evolutionary tree, simulates trait evolution along its branches, and then lets you reconstruct the tree using three different algorithms. UPGMA (Unweighted Pair Group Method with Arithmetic Mean) assumes a constant molecular clock. Neighbor-Joining relaxes this assumption and handles variable rates. Maximum parsimony finds the tree that requires the fewest evolutionary changes. Each method has trade-offs between speed, accuracy, and assumptions about the evolutionary process.
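To make the distance-based approach concrete, here is a minimal Python sketch of UPGMA. It is not the simulation's own code: the input format (a dict of pairwise distances keyed by `frozenset` pairs), the nested-tuple output, and the names `upgma` and `clades` are assumptions chosen for illustration. Branch lengths are omitted and only the topology is returned.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA and return the topology as nested tuples.

    names: list of taxon labels (hypothetical input format).
    dist:  dict mapping frozenset({a, b}) -> distance between taxa a and b.
    """
    clusters = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)                          # working copy of the distances
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to every other cluster is the size-weighted average,
        # i.e. the arithmetic mean over all original leaf pairs.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


def clades(tree):
    """Return the leaf set of every subtree, useful for comparing trees."""
    if not isinstance(tree, tuple):
        return {frozenset([tree])}
    left, right = tree
    found = clades(left) | clades(right)
    found.add(frozenset().union(*found))  # all leaves below this node
    return found
```

Comparing `clades(upgma(...))` with the clades of the true tree is one simple way to score a reconstruction. Because UPGMA always merges the currently closest clusters, it can misplace lineages when rates differ across the tree, which is the assumption Neighbor-Joining relaxes.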
The Combinatorial Explosion
For just 10 species, there are over 34 million possible rooted tree topologies (and about 2 million unrooted ones). For 20 species, the number of rooted topologies exceeds 10^21. This combinatorial explosion means that exhaustive search, evaluating every possible tree, is feasible only for small datasets. Real phylogenetic analyses with hundreds or thousands of species rely on heuristic search algorithms that explore tree space without visiting every tree, guided by optimality criteria like maximum likelihood or minimum total branch length.
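The counts in this section follow from a standard closed-form result for fully resolved (bifurcating) trees with n labelled tips: there are (2n-5)!! unrooted topologies and (2n-3)!! rooted ones, where !! denotes the double factorial. A minimal Python sketch, with function names chosen here for illustration:

```python
def odd_double_factorial(k: int) -> int:
    """Product k * (k - 2) * ... * 3 * 1 for odd k; returns 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted(n: int) -> int:
    """Unrooted binary topologies on n >= 3 labelled tips: (2n - 5)!!"""
    return odd_double_factorial(2 * n - 5)


def count_rooted(n: int) -> int:
    """Rooted binary topologies on n >= 2 labelled tips: (2n - 3)!!"""
    return odd_double_factorial(2 * n - 3)


for n in (5, 10, 20):
    print(n, count_unrooted(n), count_rooted(n))
```

For 10 tips this gives 2,027,025 unrooted and 34,459,425 rooted topologies. Each added species multiplies the count by the next odd number, which is why the totals grow faster than exponentially.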
Confidence and Pitfalls
How do we know if a phylogenetic tree is correct? Bootstrap resampling provides a measure of statistical support for each node: the characters (for example, alignment columns) are resampled with replacement, a tree is rebuilt from each resampled dataset, and the fraction of those trees containing a given grouping is reported. If a grouping appears in 95% of resampled trees, the data support it strongly. But phylogenetics has pitfalls, and high support does not rule them out. Long-branch attraction can cause rapidly evolving lineages to be incorrectly grouped together. Convergent evolution creates misleading similarities. Horizontal gene transfer, common in bacteria, violates the tree model itself and calls for network approaches instead.
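The bootstrap idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production implementation: the character matrix is a plain dict of taxon to state list, `build_clades` is a hypothetical stand-in for any tree-building method that returns the groupings in its tree, and `closest_pair` is a deliberately trivial example of such a method.

```python
import random
from itertools import combinations


def bootstrap_support(matrix, build_clades, clade, replicates=200, seed=0):
    """Fraction of bootstrap replicates whose tree contains `clade`.

    matrix:       dict taxon -> sequence of character states (equal lengths).
    build_clades: callable(matrix) -> set of frozensets of taxa (assumed API).
    clade:        frozenset of taxa whose support is being measured.
    """
    rng = random.Random(seed)
    n_sites = len(next(iter(matrix.values())))
    hits = 0
    for _ in range(replicates):
        # Resample sites (columns) with replacement, keeping the same count.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        resampled = {
            taxon: [states[i] for i in cols] for taxon, states in matrix.items()
        }
        if clade in build_clades(resampled):
            hits += 1
    return hits / replicates


def closest_pair(matrix):
    """Toy 'tree builder': report only the pair with the fewest differences."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(matrix[a], matrix[b]))
    pair = min(combinations(matrix, 2), key=lambda p: hamming(*p))
    return {frozenset(pair)}


data = {"A": "AACCGGTT", "B": "AACCGGTA", "C": "TTGGCCAA"}
print(bootstrap_support(data, closest_pair, frozenset({"A", "B"})))
```

In a real analysis `build_clades` would wrap a full method such as Neighbor-Joining or maximum parsimony. Because every replicate reuses the same method, a systematic bias such as long-branch attraction can be reproduced in most replicates and still receive high support.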