Patterns as Machines
Regular expressions are the programmer's shorthand for describing text patterns, used billions of times daily in search, validation, and text processing. But behind every regex is a finite automaton. Ken Thompson's 1968 algorithm compiles a regex into a non-deterministic finite automaton (NFA) using simple structural rules: concatenation chains states, alternation forks paths, and the Kleene star creates loops. This simulation makes that hidden machine visible.
Thompson's Construction
Thompson's construction is elegant in its economy: each regex operator maps to a tiny NFA fragment with at most two new states and four transitions. Concatenation connects fragments end-to-start. Alternation creates a new start state with epsilon transitions to both sub-patterns. The star operator loops the fragment's accept state back to its start, and adds a bypass epsilon path so that zero repetitions (the empty string) also match. The total NFA never exceeds twice the pattern length in states, guaranteeing efficient construction.
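The operator-by-operator rules above can be sketched in Python. This is a minimal illustration, not a full regex compiler: it assumes the pattern is already in postfix form (with an explicit `.` for concatenation), and the helper names (`literal`, `alternate`, `star`, `compile_postfix`) are hypothetical, chosen for readability.

```python
# Sketch of Thompson's construction. A fragment is (start, accept, transitions);
# transitions maps (state, symbol) -> set of next states, with EPS = None for
# epsilon moves. Input is a postfix regex, e.g. "ab|*" for (a|b)*.

EPS = None
_counter = 0

def new_state():
    global _counter
    _counter += 1
    return _counter

def merge(t1, t2):
    t = {k: set(v) for k, v in t1.items()}
    for k, v in t2.items():
        t.setdefault(k, set()).update(v)
    return t

def literal(c):
    s, a = new_state(), new_state()          # one transition on c
    return s, a, {(s, c): {a}}

def concat(f1, f2):
    s1, a1, t1 = f1
    s2, a2, t2 = f2
    t = merge(t1, t2)
    t.setdefault((a1, EPS), set()).add(s2)   # chain end-to-start
    return s1, a2, t

def alternate(f1, f2):
    s1, a1, t1 = f1
    s2, a2, t2 = f2
    s, a = new_state(), new_state()          # two new states, four eps moves
    t = merge(t1, t2)
    t[(s, EPS)] = {s1, s2}                   # fork into either branch
    t.setdefault((a1, EPS), set()).add(a)
    t.setdefault((a2, EPS), set()).add(a)
    return s, a, t

def star(f):
    s1, a1, t1 = f
    s, a = new_state(), new_state()          # two new states, four eps moves
    t = {k: set(v) for k, v in t1.items()}
    t[(s, EPS)] = {s1, a}                    # enter the loop, or skip it entirely
    t.setdefault((a1, EPS), set()).update({s1, a})  # repeat, or exit
    return s, a, t

def compile_postfix(postfix):
    stack = []
    for ch in postfix:
        if ch == '.':
            f2, f1 = stack.pop(), stack.pop()
            stack.append(concat(f1, f2))
        elif ch == '|':
            f2, f1 = stack.pop(), stack.pop()
            stack.append(alternate(f1, f2))
        elif ch == '*':
            stack.append(star(stack.pop()))
        else:
            stack.append(literal(ch))
    return stack.pop()

# (a|b)* compiles to an 8-state NFA: 2 per literal, 2 for '|', 2 for '*'.
start, accept, trans = compile_postfix("ab|*")
```

Note how the state bound falls out directly: only `alternate` and `star` allocate new states beyond the two per literal, so the machine stays small no matter how the pattern nests.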
NFA Simulation
Rather than converting the NFA to a DFA (which can explode exponentially in states), Thompson's algorithm simulates the NFA directly by maintaining a set of currently active states. For each input character, it computes the next set of active states by following all valid transitions. If any active state after the last character is an accept state, the string matches. This approach guarantees O(n·m) time, where n is the input length and m is the pattern size; since the pattern is typically small and fixed, matching is effectively linear in the input.
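The state-set simulation is short enough to show in full. This sketch hand-builds the NFA for the pattern a b* (the state numbers and the `matches`/`eps_closure` names are illustrative, not from the original); the key invariant is that `current` always holds every state the machine could be in after the input consumed so far.

```python
# Thompson-style NFA simulation: track the set of live states, never backtrack.
EPS = None  # marker for epsilon transitions

# Hand-built NFA for the pattern a b*.
# transitions[(state, symbol)] -> set of next states
transitions = {
    (0, 'a'): {1},
    (1, EPS): {2, 4},   # enter the b-loop, or skip straight to accept
    (2, 'b'): {3},
    (3, EPS): {2, 4},   # repeat the loop, or exit to accept
}
START, ACCEPT = 0, 4

def eps_closure(states):
    """All states reachable from `states` via epsilon moves alone."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for nxt in transitions.get((s, EPS), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure

def matches(text):
    current = eps_closure({START})
    for ch in text:                       # one pass over the input...
        step = set()
        for s in current:                 # ...advancing every live state at once
            step |= transitions.get((s, ch), set())
        current = eps_closure(step)
        if not current:
            return False                  # no live states left: fail early
    return ACCEPT in current

print(matches("abbb"))  # True
print(matches("ba"))    # False
```

Each input character is processed exactly once, and the active set can never exceed the number of NFA states, which is where the O(n·m) bound comes from.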
The Backtracking Trap
Most modern regex engines (Perl, Python, JavaScript) use backtracking instead of Thompson's algorithm, because backtracking supports extensions like backreferences. But backtracking can take exponential time on pathological patterns, the so-called catastrophic backtracking behind ReDoS (regular expression denial of service) vulnerabilities. Google's RE2 engine returns to Thompson's approach, showing that the 1968 algorithm remains the gold standard for guaranteed-efficient matching.
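Catastrophic backtracking is easy to provoke. The classic demonstration below (the pattern and sizes are illustrative) matches `(a|a)*b` against a string of a's with no final b: Python's backtracking `re` engine must try roughly 2^n ways to split the a's between the two identical branches before it can report failure, so the time roughly doubles with each extra character.

```python
# Demonstrating catastrophic backtracking in Python's backtracking re engine.
# (a|a)*b against n a's (and no b) forces ~2^n failed alternation choices.
import re
import time

pattern = re.compile(r'(a|a)*b')

def time_match(n):
    text = 'a' * n
    t0 = time.perf_counter()
    result = pattern.match(text)  # always fails, but only after exhaustive search
    return result, time.perf_counter() - t0

for n in (10, 15, 20):
    result, secs = time_match(n)
    print(f"n={n:2d}: match={result} time={secs:.4f}s")
```

A Thompson-style simulator rejects the same inputs in a single linear pass, because it advances all alternation branches simultaneously instead of trying them one at a time.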