The Birth of Neural Networks
In 1958, Frank Rosenblatt built the Mark I Perceptron — a machine that could learn to classify visual patterns by adjusting connection weights. The New York Times reported it as a machine that 'will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.' The reality was more modest but no less revolutionary: the perceptron was the first algorithm that could learn from examples, adjusting its behavior based on feedback rather than explicit programming. This simulation recreates that fundamental learning process.
How the Perceptron Learns
A perceptron takes two inputs (x1, x2), multiplies each by a weight (w1, w2), adds a bias (b), and applies a threshold: if w1*x1 + w2*x2 + b > 0, output 1; otherwise output 0. Training adjusts the weights using the perceptron learning rule: for each misclassified point with true label y and prediction ŷ, update w1 ← w1 + η(y − ŷ)x1, w2 ← w2 + η(y − ŷ)x2, and b ← b + η(y − ŷ). The learning rate η controls the nudge size. The decision boundary is the line w1*x1 + w2*x2 + b = 0 — you can watch it rotate and shift as training progresses.
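The learning process above can be sketched in a few lines of Python. This is a minimal illustration, not the simulation's actual code; it trains a perceptron on AND, a linearly separable problem, using the threshold unit and update rule just described.

```python
def predict(w1, w2, b, x1, x2):
    """Threshold unit: 1 if w1*x1 + w2*x2 + b > 0, else 0."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def train(points, eta=0.1, epochs=100):
    """points: list of ((x1, x2), label) pairs. Returns learned w1, w2, b."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), y in points:
            y_hat = predict(w1, w2, b, x1, x2)
            if y_hat != y:
                # Perceptron rule: nudge weights toward the correct class.
                w1 += eta * (y - y_hat) * x1
                w2 += eta * (y - y_hat) * x2
                b += eta * (y - y_hat)
                errors += 1
        if errors == 0:  # converged: every point classified correctly
            break
    return w1, w2, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w1, w2, b = train(AND)
print([predict(w1, w2, b, x1, x2) for (x1, x2), _ in AND])  # → [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop stops after finitely many updates; the same code would loop fruitlessly forever on XOR.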
The XOR Problem and Its Resolution
In 1969, Marvin Minsky and Seymour Papert published their devastating analysis: the perceptron cannot solve XOR (points at (0,0) and (1,1) are class A; (0,1) and (1,0) are class B). No single straight line can separate them. This proof, which applies to any linearly inseparable problem, triggered the first AI winter — funding for neural network research dried up for over a decade. The solution came in the 1980s: stack multiple perceptrons in layers, add nonlinear activation functions, and use backpropagation to train. These multi-layer networks can draw curved decision boundaries, solving XOR and much more complex problems.
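To make the resolution concrete, here is a sketch of a two-layer network that solves XOR. The weights are hand-picked for illustration rather than learned by backpropagation: one hidden unit computes OR, the other NAND, and the output unit ANDs them together, since XOR(x1, x2) = OR(x1, x2) AND NAND(x1, x2).

```python
def step(z):
    """Threshold activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    # Hand-wired weights (illustrative assumption, not trained values).
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)    # OR:   fires if either input is 1
    h2 = step(-1.0 * x1 - 1.0 * x2 + 1.5)   # NAND: fires unless both inputs are 1
    return step(1.0 * h1 + 1.0 * h2 - 1.5)  # AND of the two hidden units

print([xor_net(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# → [0, 1, 1, 0]
```

Each hidden unit draws one straight line; the output unit intersects the two half-planes, carving out the diagonal band that no single line could isolate.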
From Perceptron to Deep Learning
Every modern deep learning system — from GPT to AlphaFold — is fundamentally an enormous stack of perceptron-like units. A single perceptron separates points with a line. Two layers can carve out convex regions, and in fact a network with just one hidden layer of nonlinear units can approximate any continuous function to arbitrary accuracy (the Universal Approximation Theorem); deeper networks can represent many functions far more compactly. Today's language models contain billions of these simple units organized in transformer architectures. The perceptron learning rule evolved into backpropagation, then into AdaGrad, Adam, and other optimizers. But the core idea remains: adjust weights to reduce error, one step at a time.