Memory-Free Randomness
A Markov chain is a random walk through states where each step depends only on where you are now, not how you got there. This "memoryless" property — formalized by Andrey Markov in 1906 and later illustrated by his analysis of vowel-consonant patterns in Pushkin's poetry — makes the math tractable and the applications vast. From PageRank to protein folding, Markov chains model systems where all relevant history compresses into the current state.
Transition Probabilities and the Matrix
The chain's behavior is fully encoded in its transition matrix P, where pᵢⱼ gives the probability of moving from state i to state j. Each row sums to 1. This simulation shows a 3-state chain with adjustable transitions — you control the arrows between states and watch how changing a single probability reshapes the entire long-run behavior. For an ergodic chain (irreducible and aperiodic), the matrix power Pⁿ converges to a rank-1 matrix whose rows are all equal to the stationary distribution π.
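A minimal sketch of this convergence, using a hypothetical 3-state transition matrix (the specific probabilities are illustrative, not taken from the simulation):

```python
import numpy as np

# Hypothetical 3-state transition matrix; each row sums to 1.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
])

# Raise P to a high power: for an ergodic chain, every row of P^n
# converges to the same vector — the stationary distribution pi.
P_inf = np.linalg.matrix_power(P, 50)
print(P_inf)  # all three rows are (numerically) identical
```

Note that the rows of `P_inf` remain valid probability distributions at every power n, since the product of stochastic matrices is stochastic.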
Convergence to Steady State
Start the chain in any state and run it forward: after enough steps, the probability distribution over states stabilizes at π, the stationary distribution satisfying πP = π. The convergence plot shows this process — initial transients decay exponentially, at a rate controlled by the second-largest eigenvalue modulus of P. Faster mixing (smaller second eigenvalue modulus) means faster convergence, which is critical for MCMC sampling algorithms.
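Both quantities can be read off the eigendecomposition of P. A sketch, reusing the same hypothetical matrix as above: π is the left eigenvector with eigenvalue 1 (normalized to sum to 1), and the second-largest eigenvalue modulus sets the convergence rate.

```python
import numpy as np

P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
])

# Left eigenvectors of P are right eigenvectors of P.T.
eigvals, eigvecs = np.linalg.eig(P.T)

# The stationary distribution is the eigenvector with eigenvalue 1,
# rescaled so its entries sum to 1.
i = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, i])
pi /= pi.sum()

# Second-largest eigenvalue modulus: transients decay like lam2**n.
lam2 = sorted(np.abs(eigvals), reverse=True)[1]
print(pi, lam2)
```

A quick check is that `pi @ P` returns `pi` unchanged, which is exactly the fixed-point equation πP = π.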
From Chains to Algorithms
Markov Chain Monte Carlo (MCMC) reverses the usual question: instead of analyzing a given chain, you design a chain whose stationary distribution equals your target distribution. The Metropolis-Hastings algorithm constructs such chains for arbitrary distributions, enabling Bayesian inference in complex models. This simulation builds intuition for how transition structure determines equilibrium — the same intuition that guides MCMC algorithm design.
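To make the reversal concrete, here is a minimal random-walk Metropolis sketch: the target density, proposal scale, and seed are all illustrative choices, not part of the simulation. The chain's transition rule is built so that its stationary distribution is the target.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Unnormalized target density: a two-mode Gaussian mixture
    # (illustrative — Metropolis only needs density ratios,
    # so the normalizing constant never appears).
    return np.exp(-0.5 * (x - 2) ** 2) + np.exp(-0.5 * (x + 2) ** 2)

def metropolis(n_steps, step=1.0):
    x = 0.0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.normal(scale=step)  # symmetric random-walk proposal
        # Accept with probability min(1, target(proposal) / target(x));
        # this rule makes the target the chain's stationary distribution.
        if rng.random() < target(proposal) / target(x):
            x = proposal
        samples.append(x)
    return np.array(samples)

samples = metropolis(20000)
```

After discarding an initial transient (the same exponential burn-in seen in the convergence plot), the histogram of `samples` approximates the target density.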