Regular Expression Visualizer: Watch NFA Pattern Matching Step by Step

Simulator · Beginner · ~10 min

The regular expression (a|b)*abb compiles to a 5-state NFA that matches strings ending in 'abb' over the alphabet {a, b}.

Formula

L(r₁|r₂) = L(r₁) ∪ L(r₂)
L(r*) = {ε} ∪ L(r) ∪ L(r)·L(r) ∪ ...
Thompson NFA: |Q| ≤ 2·|pattern|

Patterns as Machines

Regular expressions are the programmer's shorthand for describing text patterns, used billions of times daily in search, validation, and text processing. But behind every regex is a finite automaton. Ken Thompson's 1968 algorithm compiles a regex into a non-deterministic finite automaton (NFA) using simple structural rules: concatenation chains states, alternation forks paths, and the Kleene star creates loops. This simulation makes that hidden machine visible.

Thompson's Construction

Thompson's construction is elegant in its economy: each regex operator maps to a tiny NFA fragment with at most two new states and four transitions. Concatenation connects fragments end-to-start. Alternation creates a new start state with epsilon transitions to both sub-patterns. The star operator loops the fragment back to its start. The total NFA never exceeds twice the pattern length in states, guaranteeing efficient construction.
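The rules above can be sketched in a few lines. This is an illustrative toy implementation, not the visualizer's own code: the `Fragment` class, state numbering, and edge representation are assumed conventions chosen for clarity.

```python
# A minimal sketch of Thompson's construction for literals, concatenation,
# alternation, and Kleene star. Each fragment has exactly one start and
# one accept state; operators add at most two new states.

EPS = None  # label used for epsilon transitions

class Fragment:
    """An NFA fragment: one start state, one accept state, edge list."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

_counter = 0
def new_state():
    global _counter
    _counter += 1
    return _counter

def literal(ch):
    s, t = new_state(), new_state()
    return Fragment(s, t, [(s, ch, t)])

def concat(f1, f2):
    # Chain fragments end-to-start with an epsilon edge.
    return Fragment(f1.start, f2.accept,
                    f1.edges + f2.edges + [(f1.accept, EPS, f2.start)])

def alternate(f1, f2):
    # New start forks to both branches; both accepts join a new accept.
    s, t = new_state(), new_state()
    return Fragment(s, t, f1.edges + f2.edges + [
        (s, EPS, f1.start), (s, EPS, f2.start),
        (f1.accept, EPS, t), (f2.accept, EPS, t)])

def star(f):
    # Loop the fragment back to its start, and allow skipping it entirely.
    s, t = new_state(), new_state()
    return Fragment(s, t, f.edges + [
        (s, EPS, f.start), (f.accept, EPS, f.start),
        (s, EPS, t), (f.accept, EPS, t)])

# Build (a|b)*abb from the primitives.
nfa = concat(concat(concat(star(alternate(literal('a'), literal('b'))),
                           literal('a')), literal('b')), literal('b'))
```

Counting the distinct states in the result confirms the size bound from the formula box: the constructed NFA stays within 2·|pattern| states.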

NFA Simulation

Rather than converting the NFA to a DFA (which can explode exponentially in states), Thompson's algorithm simulates the NFA directly by maintaining a set of currently active states. For each input character, it computes the next set of active states by following all valid transitions. If any active state after the last character is an accept state, the string matches. This approach guarantees O(n·m) time for input length n and pattern length m — linear in the input for a fixed pattern.
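The active-state simulation can be shown with a small hand-written NFA. The states and transitions below are an assumed textbook-style machine for (a|b)*abb (state 0 loops on any character while nondeterministically guessing where the trailing "abb" begins), not output captured from the visualizer.

```python
# Simulate an NFA directly by tracking the set of active states.
# delta maps (state, character) to the set of possible next states.
delta = {
    (0, 'a'): {0, 1},   # stay in the loop, or guess "abb" starts here
    (0, 'b'): {0},
    (1, 'b'): {2},
    (2, 'b'): {3},
}
start, accepting = 0, {3}

def matches(s):
    active = {start}                       # currently active states
    for ch in s:
        # Follow every valid transition out of every active state.
        active = {q2 for q in active for q2 in delta.get((q, ch), set())}
    return bool(active & accepting)        # match if any accept state is active

print(matches("ababb"))  # True  — ends in "abb"
print(matches("abab"))   # False — wrong suffix
```

Each input character touches at most all m states once, which is where the O(n·m) bound comes from: no path is ever retried.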

The Backtracking Trap

Most modern regex engines (Perl, Python, JavaScript) use backtracking instead of Thompson's algorithm, because backtracking supports extensions like backreferences. But backtracking can take exponential time on pathological patterns — the so-called catastrophic backtracking or ReDoS vulnerability. Google's RE2 engine returns to Thompson's approach, proving that the 1968 algorithm remains the gold standard for guaranteed-efficient matching.

FAQ

What is a regular expression?

A regular expression is a formal notation for describing patterns in strings using concatenation, alternation (|), and Kleene star (*). Formally, they denote exactly the regular languages — the same class recognized by finite automata. In practice, programming language regex engines add extensions (backreferences, lookahead) that go beyond regular languages.

How does a regular expression become an automaton?

Thompson's construction (1968) converts any regular expression to an NFA with at most 2n states, where n is the pattern length. Each regex operator maps to a simple NFA fragment: concatenation chains fragments, alternation creates a branch, and Kleene star adds a loop. The resulting NFA can be simulated directly or converted to a DFA.

What is the difference between NFA and DFA regex matching?

NFA-based matching (Thompson/Pike) tracks a set of active states simultaneously, running in O(nm) time for pattern length m and string length n. DFA-based matching (like RE2) precompiles to a DFA for O(n) matching but may take exponential time/space to build. Backtracking engines (Perl, Python) can take exponential time on pathological patterns.

Why can regular expressions cause performance problems?

Backtracking regex engines try paths one at a time and backtrack on failure. Patterns with nested quantifiers like (a+)+ can cause catastrophic backtracking — exponential time on non-matching inputs. This is a real security concern (ReDoS attacks). NFA-based engines like RE2 avoid this by design, maintaining the O(nm) guarantee.

Embed

<iframe src="https://homo-deus.com/lab/automata-theory/regular-expressions/embed" width="100%" height="400" frameborder="0"></iframe>
View source on GitHub