Vowel Formant Analysis Simulator: Map the Acoustic Vowel Space

simulator intermediate ~10 min
/ɛ/ region — mid front vowel

With F1 at 500 Hz and F2 at 1500 Hz, you are in the region of the mid front vowel /ɛ/ as in 'bed'. Mid front vowels of this kind are common across the world's languages.

Formula

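F_n ≈ (2n - 1) × c / (4L) (uniform tube model; c is the speed of sound, L is the vocal tract length)
L ≈ c / (4 × F1) (vocal tract length estimate)
|H(f)| = 1 / √((f² - F_n²)² + (BW × f)²) (unnormalized magnitude of the formant transfer function; BW is the formant bandwidth in Hz)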
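
As a rough illustration of the first two formulas, the sketch below evaluates the uniform tube model in Python. The speed of sound (350 m/s) and the 17.5 cm length are assumptions chosen so the results land on round numbers; the function names are hypothetical and are not part of the simulator.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed: about 350 m/s in warm, humid air


def tube_formants(length_cm, count=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4L)."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, count + 1)]


def tract_length_from_f1(f1_hz, c=SPEED_OF_SOUND_CM_S):
    """Invert the n = 1 case to estimate length: L = c / (4 * F1)."""
    return c / (4.0 * f1_hz)


print(tube_formants(17.5))          # [500.0, 1500.0, 2500.0]
print(tract_length_from_f1(500.0))  # 17.5
```

A real vocal tract is not a uniform tube, so these values are best read as a neutral starting point from which vowel-specific formants deviate.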

The Resonant Vocal Tract

The human vocal tract, from the glottis to the lips, acts as an acoustic resonator roughly 17 cm long in adult males. Like any tube closed at one end (the glottis) and open at the other (the lips), it amplifies certain frequencies while attenuating others. These amplified frequency bands are called formants, numbered F1, F2, F3, and so on. By reshaping the tract (moving the tongue, jaw, and lips), speakers continuously alter these resonances to produce different vowels.

F1 and F2: The Vowel Code

Decades of phonetics research have shown that the first two formants carry most of the information needed to identify a vowel; F3 matters mainly for distinctions such as lip rounding and r-coloring. F1 varies inversely with tongue height: close vowels like /i/ and /u/ have low F1 (~300 Hz), while open vowels like /a/ have high F1 (~800 Hz). F2 tracks tongue frontness: front vowels like /i/ have high F2 (~2200 Hz), while back vowels like /u/ have low F2 (~800 Hz). Plotting F1 against F2 gives the classic vowel space, with /i/, /a/, and /u/ at its corners.
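
One way to make the F1/F2 mapping concrete is a nearest-reference lookup. The sketch below is a minimal illustration, not the simulator's actual method. The /i/ and /u/ values and the F1 of /a/ come from the paragraph above, while the F2 of /a/ (1200 Hz) and the log-ratio distance are assumptions made for the example.

```python
import math

# (F1, F2) in Hz. The F2 for /a/ is an assumed illustrative value.
REFERENCE_VOWELS = {
    "i": (300.0, 2200.0),
    "u": (300.0, 800.0),
    "a": (800.0, 1200.0),
}


def nearest_vowel(f1, f2, references=REFERENCE_VOWELS):
    """Return the reference vowel closest to (f1, f2) on a log-frequency scale."""
    def distance(item):
        ref_f1, ref_f2 = item[1]
        return math.hypot(math.log(f1 / ref_f1), math.log(f2 / ref_f2))

    return min(references.items(), key=distance)[0]


print(nearest_vowel(320.0, 2100.0))  # i
print(nearest_vowel(350.0, 900.0))   # u
print(nearest_vowel(750.0, 1300.0))  # a
```

Measured formants vary with speaker and context, so a practical classifier would need per-speaker normalization and more reference vowels than these three corners.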

Formant Bandwidth and Quality

Each formant has a bandwidth: the width of its resonance peak, conventionally measured 3 dB below the peak. Narrow bandwidths produce sharp, ringing formants with clear vowel quality. Wider bandwidths, caused by increased damping from nasal coupling or breathy voice, produce more diffuse resonances. In pathological speech, abnormal bandwidths can indicate vocal tract dysfunction, which makes formant analysis useful in clinical assessment.
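
The effect of bandwidth can be checked numerically with the resonance magnitude from the Formula section. The sketch below compares how far a 500 Hz formant stands above a point 200 Hz higher for a narrow and a wide bandwidth. The 50 Hz and 200 Hz bandwidths and the 200 Hz offset are assumed values picked for illustration.

```python
import math


def resonance_magnitude(f, formant_hz, bandwidth_hz):
    """|H(f)| = 1 / sqrt((f^2 - F_n^2)^2 + (BW * f)^2), unnormalized."""
    return 1.0 / math.sqrt((f**2 - formant_hz**2) ** 2 + (bandwidth_hz * f) ** 2)


def peak_to_skirt_db(formant_hz, bandwidth_hz, offset_hz=200.0):
    """Level at the formant frequency relative to a point offset_hz above it, in dB."""
    peak = resonance_magnitude(formant_hz, formant_hz, bandwidth_hz)
    skirt = resonance_magnitude(formant_hz + offset_hz, formant_hz, bandwidth_hz)
    return 20.0 * math.log10(peak / skirt)


narrow = peak_to_skirt_db(500.0, 50.0)   # assumed narrow bandwidth
wide = peak_to_skirt_db(500.0, 200.0)    # assumed wide bandwidth
print(f"narrow: {narrow:.1f} dB, wide: {wide:.1f} dB")
```

The narrow-bandwidth formant rises further above its surroundings, which is the sharper, more clearly defined resonance described above.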

Applications in Technology

Formant analysis has a long history in speech technology. Some automatic speech recognition systems have used formant features to identify phonemes, although most modern systems rely on spectral features such as mel filterbanks that capture formant structure implicitly. Formant synthesizers manipulate formant tracks directly to generate speech. Forensic phonetics uses formant measurements as one source of evidence in speaker comparison. Even singing pedagogy uses formant tuning: opera singers learn to align formants with harmonics to project their voice over an orchestra.

FAQ

What are formants in speech?

Formants are resonant frequencies of the vocal tract. When sound produced by the vibrating vocal folds passes through the throat and mouth cavities, certain frequencies are amplified. The first two formants (F1 and F2) are sufficient to distinguish most vowels.

How do F1 and F2 relate to tongue position?

F1 correlates inversely with tongue height: low F1 means a close/high vowel, high F1 means an open/low vowel. F2 correlates with tongue advancement: high F2 means a front vowel, low F2 means a back vowel.

What is the vowel quadrilateral?

The vowel quadrilateral is an articulatory chart of tongue height and frontness, anchored by the cardinal vowels defined by Daniel Jones. An F1 vs F2 plot (the acoustic vowel space) approximates it when both axes are reversed, so that F1 increases downward and F2 increases to the left. In that layout /i/ sits at the top-left and /ɑ/ at the bottom-right.

Why do men and women have different formant frequencies?

Formant frequencies scale inversely with vocal tract length, in addition to depending on tract shape. Adult male tracts average about 17 cm and adult female tracts about 14 cm, so women's formants are roughly 15-20% higher. Children have even shorter tracts and correspondingly higher formants.
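
As a quick check on these figures, the uniform tube model says formants scale as 1/L. The sketch below applies that scaling to the 17 cm and 14 cm averages; the function name is hypothetical.

```python
def formant_scale_factor(reference_length_cm, other_length_cm):
    """Uniform-tube formants scale as 1/L, so the ratio is L_ref / L_other."""
    return reference_length_cm / other_length_cm


ratio = formant_scale_factor(17.0, 14.0)
print(f"{(ratio - 1) * 100:.0f}% higher")  # 21% higher
```

The idealized model predicts about 21%, slightly above the 15-20% range quoted above, a reminder that real vocal tracts do not differ by uniform scaling alone.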
