The Fundamental Tension
Machine learning walks a tightrope between two failure modes. Too simple a model (high bias) cannot capture the true relationship in the data — a straight line through a cubic trend. Too complex a model (high variance) captures everything, including the noise — a degree-15 polynomial that passes through every point but oscillates wildly. The simulation lets you watch this tradeoff unfold: increase polynomial degree and watch training error drop to zero while test error explodes.
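A minimal sketch of that experiment in NumPy, assuming a cubic ground truth observed with Gaussian noise (the function, noise level, and sample sizes here are illustrative stand-ins, not the simulation's actual parameters):

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Illustrative ground truth: a cubic trend observed with Gaussian noise.
def true_f(x):
    return x**3 - x

x_train = np.sort(rng.uniform(-1, 1, 20))
y_train = true_f(x_train) + rng.normal(0, 0.1, 20)
x_test = rng.uniform(-1, 1, 500)
y_test = true_f(x_test) + rng.normal(0, 0.1, 500)

for degree in (1, 3, 15):
    fit = Polynomial.fit(x_train, y_train, degree)   # least-squares polynomial fit
    train_mse = np.mean((fit(x_train) - y_train) ** 2)
    test_mse = np.mean((fit(x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

On a typical run, degree 1 underfits (both errors high), degree 3 tracks the truth, and degree 15 pushes training error toward zero while test error explodes.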
Training Error vs Test Error
The hallmark of overfitting is a gap between training and test performance. For nested model families, training error can only fall as complexity grows: given enough parameters, any dataset can be fit perfectly. But test error (on unseen data) follows a U-shaped curve: it first decreases as the model captures the true pattern, then rises as the model starts fitting noise. The bottom of this U is the optimal complexity, and finding it is the art of machine learning.
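Finding the bottom of the U amounts to sweeping complexity against held-out data. A hedged sketch under the same illustrative setup as above (real model selection would usually use cross-validation rather than a single validation split):

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

def true_f(x):
    return x**3 - x

x_train = rng.uniform(-1, 1, 30)
y_train = true_f(x_train) + rng.normal(0, 0.1, 30)
x_val = rng.uniform(-1, 1, 30)        # held-out split standing in for unseen data
y_val = true_f(x_val) + rng.normal(0, 0.1, 30)

val_mse = {}
for degree in range(1, 16):
    fit = Polynomial.fit(x_train, y_train, degree)
    val_mse[degree] = np.mean((fit(x_val) - y_val) ** 2)

best = min(val_mse, key=val_mse.get)
print(f"bottom of the U: degree {best} (validation MSE {val_mse[best]:.4f})")
```

Printing training error alongside would show it falling across the whole sweep; only the validation error traces the U.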
The Bias-Variance Decomposition
Under squared-error loss, the expected prediction error at any point decomposes into three components: bias squared (how far the average model prediction is from the truth), variance (how much predictions vary between different training sets), and irreducible noise. Simple models have high bias and low variance; complex models have low bias and high variance. This decomposition, formalized by Geman et al. in 1992, provides the theoretical foundation for understanding overfitting and guiding model selection.
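In symbols, for squared-error loss at a point x, E[(y − f̂(x))²] = Bias[f̂(x)]² + Var[f̂(x)] + σ². The terms can be estimated empirically by refitting the model on many freshly drawn training sets. A minimal Monte Carlo sketch, reusing the illustrative cubic truth from above (the evaluation point, noise level, and repetition count are arbitrary choices):

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2)

def true_f(x):
    return x**3 - x

x0, sigma = 0.5, 0.1          # evaluation point and noise level (arbitrary)
n_train, n_reps = 20, 500

for degree in (1, 3, 12):
    preds = np.empty(n_reps)
    for r in range(n_reps):
        # A fresh training set each repetition: the spread of preds across
        # repetitions estimates the variance term of the decomposition.
        x = rng.uniform(-1, 1, n_train)
        y = true_f(x) + rng.normal(0, sigma, n_train)
        preds[r] = Polynomial.fit(x, y, degree)(x0)
    bias_sq = (preds.mean() - true_f(x0)) ** 2
    print(f"degree {degree:2d}: bias^2 {bias_sq:.5f}, variance {preds.var():.5f}, "
          f"noise {sigma**2:.5f}")
```

On a typical run, degree 1 shows large bias² with small variance, and degree 12 the reverse, with the noise floor σ² fixed throughout.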
Modern Perspectives: Double Descent
Classical wisdom says test error follows a U-curve with complexity. But research by Belkin et al. (2019) revealed a surprising 'double descent' phenomenon: as models become massively overparameterized (far more parameters than data points), test error can decrease again after an initial spike. This occurs in neural networks, random forests, and even linear models. The interpolation threshold, where the model just barely fits the training data perfectly, is typically the worst point; going beyond it into the overparameterized regime can actually improve generalization.
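Double descent can be reproduced in a few lines with minimum-norm least squares on random ReLU features, in the spirit of (but not identical to) the Belkin et al. experiments; the target function, noise level, and feature counts below are illustrative, and the exact location and height of the spike vary with the random draw:

```python
import numpy as np

rng = np.random.default_rng(3)

n, sigma = 40, 0.2
x_train = rng.uniform(-1, 1, (n, 1))
y_train = np.sin(3 * x_train[:, 0]) + rng.normal(0, sigma, n)
x_test = rng.uniform(-1, 1, (1000, 1))
y_test = np.sin(3 * x_test[:, 0])

def relu_features(x, W, b):
    # Random ReLU features: hinge functions with random slopes and offsets.
    return np.maximum(0.0, x @ W + b)

for p in (5, 20, 40, 80, 400):        # p = n = 40 is the interpolation threshold
    W = rng.normal(size=(1, p))
    b = rng.normal(size=p)
    Phi = relu_features(x_train, W, b)
    # pinv gives the minimum-norm least-squares solution; past the threshold
    # it picks the smallest-norm weights among all interpolating solutions.
    w = np.linalg.pinv(Phi) @ y_train
    test_mse = np.mean((relu_features(x_test, W, b) @ w - y_test) ** 2)
    print(f"p = {p:3d}: test MSE {test_mse:.3f}")
```

On a typical run, test error peaks near p = n and descends again deep in the overparameterized regime; averaging over several draws of W and b makes the second descent clearer.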