From Articulation to Sound
Every vowel you speak is the product of a specific vocal tract shape. By raising or lowering the tongue, pushing it forward or back, opening or closing the jaw, and rounding or spreading the lips, you reshape a tube roughly 17 cm long (in an adult male) into an acoustic filter that selectively amplifies certain frequencies. This simulation lets you control these articulatory parameters and see how they map to formant frequencies and vowel identity in real time.
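Where do those amplified frequencies come from? For a neutral tract, a uniform tube closed at the glottis and open at the lips is a good first approximation: its resonances fall at odd quarter-wavelength frequencies, F_n = (2n - 1)c / 4L. A minimal sketch in Python (the constants are textbook approximations, not values from the simulation):

```python
# Resonances of a uniform tube closed at one end (glottis) and open
# at the other (lips): F_n = (2n - 1) * c / (4 * L).
C = 35000.0  # speed of sound in warm, humid air, cm/s (approximate)
L = 17.0     # vocal tract length, cm (typical adult male)

for n in range(1, 4):
    print(f"F{n} = {(2 * n - 1) * C / (4 * L):.0f} Hz")
# F1 = 515 Hz, F2 = 1544 Hz, F3 = 2574 Hz: the familiar ~500/1500/2500 Hz
# formant pattern of a neutral, schwa-like vowel.
```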
The Source-Filter Model
Speech production follows the source-filter model proposed by Gunnar Fant in 1960. The source is the quasi-periodic buzzing of the vocal folds (glottal source), producing a harmonic series. The filter is the vocal tract, which amplifies frequencies near its resonances (formants) and attenuates others. By separating source and filter, we can independently control pitch (source) and vowel quality (filter) — just as the human system does.
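A quick way to see the model's central claim, that pitch and vowel quality are independent, is to pass a crude source through a cascade of formant resonators. The sketch below uses an impulse train as a stand-in for glottal pulses and two-pole resonators as the filter; the formant frequencies and bandwidths are illustrative choices, not values from the simulation:

```python
import numpy as np
from scipy.signal import lfilter

FS = 16000   # sample rate, Hz
F0 = 120     # fundamental frequency of the source, Hz
DUR = 0.5    # duration, seconds

# Source: an impulse train, a crude stand-in for glottal pulses,
# rich in harmonics at multiples of F0.
source = np.zeros(int(FS * DUR))
source[::FS // F0] = 1.0  # pitch period in samples (approximately F0)

def resonator(x, freq, bw, fs=FS):
    """Filter x through a two-pole resonance at `freq` Hz, bandwidth `bw` Hz."""
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * freq / fs
    a = [1.0, -2.0 * r * np.cos(theta), r * r]
    b = [sum(a)]  # normalize for unity gain at DC
    return lfilter(b, a, x)

# Filter: a cascade of formant resonances (roughly /a/-like values).
signal = source
for freq, bw in [(700, 80), (1100, 90), (2600, 120)]:
    signal = resonator(signal, freq, bw)

# Changing F0 alone changes pitch; changing the (freq, bw) list alone
# changes the vowel -- exactly the independence the model predicts.
```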
Articulatory-Acoustic Mappings
The relationship between articulation and acoustics is nonlinear and many-to-one: different tract shapes can sometimes produce similar formant patterns (motor equivalence). However, the primary mappings are well established. Jaw opening and tongue lowering raise F1. Tongue fronting raises F2. Lip rounding lowers F2 and F3. The simulation computes these mappings using a simplified tube model, letting you discover the acoustic consequences of each articulatory gesture.
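The simulation derives formants from its tube model, but the signed trends above are easy to capture in a toy map. Everything in the sketch below is a hypothetical linear approximation chosen only to respect those directions; the base values and coefficients are not the simulation's:

```python
def formants(jaw_open, tongue_front, lip_round):
    """Toy articulatory-to-formant map; all inputs in [0, 1].

    Encodes only the trends stated in the text:
      jaw opening (tongue lowering) raises F1,
      tongue fronting raises F2,
      lip rounding lowers F2 and F3.
    """
    f1 = 300 + 500 * jaw_open
    f2 = 900 + 1300 * tongue_front - 300 * lip_round
    f3 = 2600 - 300 * lip_round
    return f1, f2, f3

print(formants(0.1, 0.9, 0.0))  # close front unrounded, /i/-like: (350, 2070, 2600)
print(formants(0.9, 0.1, 0.0))  # open back, /a/-like:            (750, 1030, 2600)
print(formants(0.1, 0.1, 1.0))  # close back rounded, /u/-like:   (350,  730, 2300)
```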
Building a Vocal Tract
Articulatory synthesis aims to generate speech by modeling the vocal tract as a series of concatenated tubes with varying cross-sectional areas. Advanced models simulate airflow, tissue compliance, and radiation from the lips. While modern text-to-speech systems primarily use neural networks, articulatory models remain essential for understanding speech motor control, simulating disorders, and teaching phonetics — because they reveal the causal chain from gesture to sound.
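To make "concatenated tubes" concrete: a waveguide (Kelly-Lochbaum) synthesizer reduces the area function to a reflection coefficient at each junction between adjacent sections. A minimal sketch, using one common sign convention and a hypothetical area function:

```python
import numpy as np

# A coarse area function (cm^2) from glottis to lips -- hypothetical
# values suggesting a narrow pharynx and a wide mouth cavity.
areas = np.array([0.6, 0.8, 1.0, 1.6, 3.0, 5.0, 6.5, 7.0])

# At each junction, part of the traveling wave reflects. One common
# convention for the Kelly-Lochbaum reflection coefficient is
#   k_i = (A_i - A_{i+1}) / (A_i + A_{i+1}).
k = (areas[:-1] - areas[1:]) / (areas[:-1] + areas[1:])
print(np.round(k, 3))

# Driving this lattice with a glottal pulse and propagating the delayed,
# scaled reflections is the core of waveguide articulatory synthesis;
# full models add lip radiation, wall losses, and a nasal branch on top.
```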