Phylogenetic Trees: Reconstructing Evolutionary History

Neighbor-Joining tree — 8 taxa, 20 traits

A neighbor-joining tree built from 8 taxa and 20 morphological traits with mutation rate 0.1. The tree correctly recovers the simulated evolutionary relationships with high bootstrap support at most nodes.

Formula

Number of unrooted trees for n taxa: T(n) = (2n-5)!! = (2n-5)!/(2^(n-3) × (n-3)!)
Jukes-Cantor distance: d = -(3/4) × ln(1 - (4/3)p) where p = fraction of differing sites
Consistency index: CI = M_min / M_observed where M = number of character state changes
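The Jukes-Cantor formula can be applied directly to two aligned sequences. This is a minimal sketch that assumes gap-free nucleotide sequences of equal length. The function names are hypothetical. Once p reaches 3/4 the logarithm's argument drops to zero or below, so the correction is undefined (saturation), and the sketch returns infinity there. The model assumes four equally likely nucleotide states, so the simulator's morphological traits would need a different correction.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Fraction of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance d = -(3/4) * ln(1 - (4/3) * p)."""
    if p >= 0.75:
        # Saturated: the log argument is <= 0, so no finite distance exists.
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


p = p_distance("ACGTACGTAC", "ACGTTCGTAA")  # 2 of 10 sites differ
print(p, round(jukes_cantor(p), 4))  # 0.2 0.2326
```

The corrected distance comes out larger than the raw proportion because it allows for multiple substitutions at the same site that the raw count cannot see.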

The Tree of Life

Every living organism on Earth is connected by a single branching tree of descent — the Tree of Life. Phylogenetics is the science of reconstructing this tree from observable data: the shapes of bones, the sequences of DNA, the presence or absence of anatomical features. Since Darwin first sketched a branching diagram in his notebook in 1837, the methods have become enormously sophisticated, but the fundamental goal remains the same — to determine who is most closely related to whom.

Building Trees from Data

This simulation generates a random 'true' evolutionary tree, simulates trait evolution along its branches, and then lets you reconstruct the tree using three different algorithms. UPGMA (Unweighted Pair Group Method with Arithmetic Mean) assumes a constant molecular clock. Neighbor-Joining relaxes this assumption and handles variable rates. Maximum parsimony finds the tree that requires the fewest evolutionary changes. Each method has trade-offs between speed, accuracy, and assumptions about the evolutionary process.
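Neighbor-joining can be sketched in a few dozen lines using the usual Q-criterion: join the pair that minimises (n - 2)·d(i, j) - r_i - r_j, where r is a row sum of the distance matrix. This is a minimal sketch, not the simulator's code. It assumes a symmetric distance matrix given as a list of lists and names internal nodes by their Newick text. The function name `neighbor_joining` and the example matrix are hypothetical. The example distances were constructed from the tree ((A:1,B:2):3,(C:4,D:1)).

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Build an unrooted tree by neighbor-joining and return it as Newick.

    labels: taxon names; dist: symmetric distance matrix (list of lists).
    Branch lengths are rounded to 3 significant digits in the output only.
    """
    if len(labels) < 3:
        raise ValueError("neighbor-joining needs at least 3 taxa")
    # Distances keyed by node name; internal nodes are named by their Newick text.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Q-criterion: join the pair minimising (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to their new parent u.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:.3g},{j}:{lj:.3g})"
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes remain: solve the final star for its three branch lengths.
    a, b, c = nodes
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    return f"({a}:{la:.3g},{b}:{d[a][b] - la:.3g},{c}:{d[a][c] - la:.3g});"


labels = ["A", "B", "C", "D"]
dist = [[0, 3, 8, 5],
        [3, 0, 9, 6],
        [8, 9, 0, 5],
        [5, 6, 5, 0]]
print(neighbor_joining(labels, dist))  # (C:4,D:1,(A:1,B:2):3);
```

Unlike UPGMA, this procedure does not require the lineages to evolve at a common rate. Here A and B sit at unequal distances from their common ancestor, and the tree is still recovered exactly because the input distances are perfectly tree-like. With noisy real distances, neighbor-joining can return slightly negative branch lengths.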

The Combinatorial Explosion

For just 10 species, there are more than 2 million possible unrooted tree topologies, and over 34 million rooted ones. For 20 species, the number of unrooted topologies exceeds 10^20. This combinatorial explosion means that exhaustive search, which evaluates every possible tree, is feasible only for small datasets. Real phylogenetic analyses with hundreds or thousands of species rely on heuristic search algorithms that explore tree space selectively, guided by optimality criteria such as maximum likelihood, maximum parsimony, or minimum total branch length.
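The tree counts follow from the double-factorial formula T(n) = (2n-5)!! for unrooted trees. The rooted count, (2n-3)!!, is the standard companion result. The sketch below, with hypothetical helper names, evaluates both and cross-checks the unrooted count against the factorial form given in the Formula section.

```python
import math


def double_factorial(m: int) -> int:
    """m!! = m * (m - 2) * (m - 4) * ... ; defined as 1 for m <= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def unrooted_trees(n: int) -> int:
    """Unrooted binary topologies for n >= 3 labelled taxa: (2n-5)!!."""
    return double_factorial(2 * n - 5)


def rooted_trees(n: int) -> int:
    """Rooted binary topologies for n >= 2 labelled taxa: (2n-3)!!."""
    return double_factorial(2 * n - 3)


for n in (4, 8, 10, 20):
    # Cross-check against the closed form (2n-5)! / (2^(n-3) * (n-3)!).
    closed = math.factorial(2 * n - 5) // (2 ** (n - 3) * math.factorial(n - 3))
    assert unrooted_trees(n) == closed
    print(n, unrooted_trees(n), rooted_trees(n))
# 4 3 15
# 8 10395 135135
# 10 2027025 34459425
# 20 221643095476699771875 8200794532637891559375
```

Python integers have arbitrary precision, so the 20-taxon values are exact rather than floating-point approximations.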

Confidence and Pitfalls

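How do we know if a phylogenetic tree is correct? Bootstrap resampling provides statistical support for each node: if a grouping appears in 95% of resampled trees, it is strongly supported by the data. Bootstrap values measure how consistently the data support a grouping, not the probability that the grouping is true, so a well-supported node can still be wrong when a method's assumptions are violated. Phylogenetics has several such pitfalls. Long-branch attraction causes rapidly evolving lineages to be incorrectly grouped together, because their many independent changes produce chance similarities. Convergent evolution creates misleading similarities. Horizontal gene transfer, common in bacteria, violates the assumption of strictly tree-like descent, so network approaches are needed instead.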

FAQ

What is a phylogenetic tree?

A phylogenetic tree is a branching diagram that shows the evolutionary relationships among species or other taxa. Each branch point (node) represents a common ancestor, and branch lengths typically represent the amount of evolutionary change or time. Modern phylogenetics uses morphological traits, DNA sequences, or both to infer these relationships using statistical algorithms.

How is a phylogenetic tree constructed?

There are three main approaches. Distance methods (UPGMA, Neighbor-Joining) compute pairwise distances between taxa and cluster them hierarchically. Parsimony methods find the tree requiring the fewest evolutionary changes. Likelihood and Bayesian methods use explicit models of evolution to find the most probable tree. Each method has strengths: NJ is fast and often accurate, while Bayesian methods report posterior probabilities for each grouping but are computationally intensive.
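Scoring a single candidate tree under parsimony can be sketched with Fitch's algorithm for unordered characters. This is a minimal sketch. It assumes a rooted binary tree written as nested Python tuples and a character matrix with one string per taxon. The names `fitch` and `parsimony_score` and the toy data are hypothetical. The sketch only scores a given tree. A full maximum-parsimony analysis must still search tree space for the lowest-scoring topology.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one character.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    states: dict mapping each leaf name to its state for this character.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change at this node.
        return common, left_cost + right_cost
    # The children disagree: keep either state and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, matrix):
    """Total Fitch changes summed over every character (column) of the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )


matrix = {"A": "0011", "B": "0010", "C": "1100", "D": "1101"}
print(parsimony_score((("A", "B"), ("C", "D")), matrix))  # 5
print(parsimony_score((("A", "C"), ("B", "D")), matrix))  # 8
```

Parsimony prefers ((A,B),(C,D)) here because it explains the data with 5 changes instead of 8.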

What is bootstrap support in phylogenetics?

Bootstrap support measures confidence in each node of a tree. The characters (columns) of the original data matrix are resampled with replacement to create many pseudo-replicate datasets of the same size (typically 1000). A tree is built from each one, and the percentage of trees containing each grouping is reported. Values above 70% are generally considered good support. A node with 95% bootstrap support appears in 95% of the resampled trees.
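The resampling loop can be sketched as follows. UPGMA on p-distances is used as the tree builder only to keep the example short; any reconstruction method could take its place. The function names, the 0/1 data, and the fixed random seed are hypothetical choices for illustration. UPGMA breaks distance ties by taking the first closest pair it finds, so datasets with many ties can give arbitrary groupings.

```python
import random
from itertools import combinations


def upgma_clades(matrix):
    """Cluster taxa by UPGMA on p-distances; return every clade as a frozenset."""
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    clusters = {frozenset([t]): 1 for t in taxa}  # cluster -> number of taxa
    dist = {}
    for a, b in combinations(taxa, 2):
        p = sum(x != y for x, y in zip(matrix[a], matrix[b])) / n_chars
        dist[frozenset([frozenset([a]), frozenset([b])])] = p
    clades = set()
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair; first one wins ties
        c1, c2 = pair
        n1, n2 = clusters.pop(c1), clusters.pop(c2)
        merged = c1 | c2
        del dist[pair]
        for c in clusters:
            # Average linkage: size-weighted mean of the two old distances.
            d1 = dist.pop(frozenset([c1, c]))
            d2 = dist.pop(frozenset([c2, c]))
            dist[frozenset([merged, c])] = (n1 * d1 + n2 * d2) / (n1 + n2)
        clusters[merged] = n1 + n2
        clades.add(merged)
    return clades


def bootstrap_support(matrix, replicates=1000, seed=0):
    """Percentage of bootstrap replicates containing each clade of the original tree."""
    rng = random.Random(seed)
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    reference = upgma_clades(matrix)
    counts = {clade: 0 for clade in reference}
    for _ in range(replicates):
        # Resample characters (columns) with replacement, keeping the same count.
        cols = [rng.randrange(n_chars) for _ in range(n_chars)]
        pseudo = {t: [matrix[t][i] for i in cols] for t in taxa}
        for clade in upgma_clades(pseudo) & reference:
            counts[clade] += 1
    return {clade: 100.0 * c / replicates for clade, c in counts.items()}


data = {"A": "0011010", "B": "0011000", "C": "1100101", "D": "1100111"}
for clade, pct in bootstrap_support(data, replicates=200).items():
    print(sorted(clade), pct)
```

The clade containing every taxon always scores 100%, because every tree contains it. The informative values are those for the smaller groupings.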

What is homoplasy and why does it matter?

Homoplasy occurs when similar traits evolve independently in unrelated lineages (convergent evolution) or when traits reverse to an ancestral state. Wings evolved independently in bats and birds; that similarity does not indicate close kinship. Homoplasy confuses phylogenetic algorithms because it creates misleading similarity. The consistency index (CI) measures the ratio of minimum to observed changes — low CI values indicate significant homoplasy in the data.
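The ensemble consistency index can be computed once per-character change counts on a tree are known, for example from a parsimony program. This is a minimal sketch. It takes the minimum possible changes for a character to be its number of distinct states minus one. The names and toy data are hypothetical, and the observed counts were worked out by hand for the tree ((A,B),(C,D)). Characters that vary in only one taxon always fit perfectly and inflate CI, so they are often excluded before CI is reported.

```python
def min_changes(column):
    """Fewest changes any tree could need for one character: (distinct states) - 1."""
    return len(set(column)) - 1


def consistency_index(matrix, observed_changes):
    """Ensemble CI = sum of minimum changes / sum of observed changes on one tree.

    matrix: dict mapping taxon -> string of character states.
    observed_changes: per-character change counts on the tree being evaluated.
    """
    rows = list(matrix.values())
    n_chars = len(rows[0])
    if len(observed_changes) != n_chars:
        raise ValueError("need one observed count per character")
    m_min = sum(min_changes([row[i] for row in rows]) for i in range(n_chars))
    m_observed = sum(observed_changes)
    if m_observed == 0:
        raise ValueError("CI is undefined when no changes are observed")
    return m_min / m_observed


matrix = {"A": "0011", "B": "0010", "C": "1100", "D": "1101"}
# Counts for tree ((A,B),(C,D)): the last character needs two changes
# on this tree, one more than its minimum, which signals homoplasy.
print(consistency_index(matrix, [1, 1, 1, 2]))  # 0.8
```

A CI of 1.0 would mean that every character changed only the minimum number of times. The 0.8 here reflects the single extra change.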
