Voice Onset Time Simulator: Explore Stop Consonant Voicing

simulator intermediate ~10 min
Short-lag VOT — English voiced stop region

A VOT of 25 ms falls in the short-lag category, which English listeners perceive as a voiced stop (/b, d, g/). It sits close to the perceptual boundary at about 30 ms, where listeners' judgments switch from voiced to voiceless.

Formula

VOT = t_voicing_onset - t_burst_release
Aspiration duration = max(0, VOT - burst_duration)
Category boundary (English) ≈ 25-35 ms
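
The two formulas above translate directly into code. The following is a minimal sketch; the function names and the example landmark times are illustrative assumptions, not taken from the simulator's source.

```python
def voice_onset_time_ms(t_burst_release_ms, t_voicing_onset_ms):
    """VOT = t_voicing_onset - t_burst_release (negative when voicing leads the burst)."""
    return t_voicing_onset_ms - t_burst_release_ms


def aspiration_duration_ms(vot_ms, burst_duration_ms):
    """Aspiration duration = max(0, VOT - burst_duration)."""
    return max(0.0, vot_ms - burst_duration_ms)


# Hypothetical landmark times for one token, in ms from the start of the recording.
vot = voice_onset_time_ms(t_burst_release_ms=100.0, t_voicing_onset_ms=125.0)
aspiration = aspiration_duration_ms(vot, burst_duration_ms=10.0)
print(vot, aspiration)
```

The max(0, ...) clamp matters for prevoiced and very short-lag tokens, where VOT is smaller than the burst itself and there is no aspiration to report.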

The Timing of Voice

When you say 'pat' versus 'bat', the primary acoustic difference is not loudness or pitch; it is timing. Voice onset time (VOT) measures the interval between the burst of air released when the lips open and the moment the vocal folds begin vibrating. This timing difference, only tens of milliseconds long, is the main cue your brain uses to distinguish voiced from voiceless stop consonants.

Three Voicing Categories

Across the world's languages, VOT defines three broad categories. Prevoiced stops (VOT < 0 ms) have voicing that begins during the closure, before the burst — typical of /b, d, g/ in Romance languages. Short-lag stops (0-30 ms) have near-simultaneous voicing and release — these are English 'voiced' stops. Long-lag stops (>50 ms) have a clear aspiration gap between burst and voicing — English 'voiceless' stops like /p, t, k/ in word-initial position.
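
The three ranges above can be written as a small classifier. This is a sketch under one stated assumption: the text leaves values between 30 and 50 ms unassigned, so the code gives them a separate "intermediate" label (a name invented here) rather than forcing them into a category.

```python
def vot_category(vot_ms):
    """Map a VOT value in ms onto the three broad categories described above."""
    if vot_ms < 0:
        return "prevoiced"      # voicing starts during the closure
    if vot_ms <= 30:
        return "short-lag"      # near-simultaneous voicing and release
    if vot_ms > 50:
        return "long-lag"       # clear aspiration gap before voicing
    return "intermediate"       # assumption: 30-50 ms is not assigned in the text


for value in (-100, 25, 40, 70):
    print(value, vot_category(value))
```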

Categorical Perception

One of the most remarkable findings in speech science is that VOT is perceived categorically, not continuously. If you create a synthetic continuum from 0 ms to 80 ms VOT, listeners do not hear a gradual change — they hear a sharp switch from /b/ to /p/ around 25-30 ms. This boundary is not fixed: it shifts with speaking rate, phonetic context, and even the listener's native language, revealing deep connections between acoustics and cognition.
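
A common way to picture this sharp switch is a steep logistic identification curve along the VOT continuum. The sketch below is illustrative only: the logistic form, the 27.5 ms boundary (the midpoint of the 25-30 ms range above), and the 3 ms slope are assumptions chosen for the example, not fitted to listener data.

```python
import math


def prob_voiceless(vot_ms, boundary_ms=27.5, slope_ms=3.0):
    """Modelled probability that a listener labels the stimulus /p/ rather than /b/.

    boundary_ms is the 50% crossover; a smaller slope_ms gives a sharper switch.
    Both defaults are illustrative assumptions.
    """
    return 1.0 / (1.0 + math.exp(-(vot_ms - boundary_ms) / slope_ms))


# Walk the synthetic continuum from 0 to 80 ms in 10 ms steps.
for vot in range(0, 81, 10):
    print(f"{vot:2d} ms  P(/p/) = {prob_voiceless(vot):.3f}")
```

Moving boundary_ms models the shifts described above: the same 30 ms stimulus is more likely to be heard as /p/ with the default boundary than with a boundary of 35 ms.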

Clinical and Forensic Applications

VOT measurement is clinically important: children with speech disorders often show abnormal VOT distributions, and bilingual speakers show VOT patterns influenced by both languages. In forensic phonetics, VOT patterns help identify speakers and their language backgrounds. Speech synthesis systems must generate appropriate VOT values to sound natural — too short or too long, and the consonant sounds foreign or robotic.

FAQ

What is voice onset time (VOT)?

Voice onset time is the interval between the release burst of a stop consonant and the onset of vocal fold vibration. Positive VOT (voicing lag) means voicing starts after the burst, negative VOT (voicing lead, or prevoicing) means voicing starts before the burst, and zero VOT means voicing and release coincide.

How does VOT differ across languages?

Languages place VOT boundaries differently. English distinguishes short-lag (~15-30 ms, 'voiced') from long-lag (~60-100 ms, 'voiceless'). French and Spanish distinguish prevoiced (~-100 ms, 'voiced') from short-lag (~15 ms, 'voiceless'). Thai uses all three categories, and Hindi adds a fourth, breathy-voiced series that VOT alone does not separate from the others.
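
To see how the same stimulus can land in different categories, the sketch below assigns a VOT value to whichever category has the nearest typical value in a given language. The typical values are taken from the approximate figures in this answer (using the midpoint of each range); the nearest-value rule and all names are simplifying assumptions, not a model of real listeners.

```python
# Approximate typical VOT per category, in ms (midpoints of the ranges quoted above).
TYPICAL_VOT_MS = {
    "English": {"voiced": 22.5, "voiceless": 80.0},
    "Spanish": {"voiced": -100.0, "voiceless": 15.0},
}


def nearest_category(language, vot_ms):
    """Return the category whose typical VOT is closest to vot_ms (illustrative rule)."""
    typical = TYPICAL_VOT_MS[language]
    return min(typical, key=lambda label: abs(typical[label] - vot_ms))


# A 15 ms stop is 'voiced' by the English values but 'voiceless' by the Spanish ones.
for language in TYPICAL_VOT_MS:
    print(language, nearest_category(language, 15.0))
```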

What is categorical perception of VOT?

Listeners do not perceive VOT as a continuous scale; they hear distinct categories with a sharp boundary. In English, the boundary is around 25-30 ms: stimuli below it are heard as /b/ and stimuli above it as /p/, with ambiguity confined to a narrow region around the boundary.

How is VOT measured in the lab?

VOT is measured from wideband spectrograms or waveforms. The burst appears as a sharp spike in the waveform, and voicing onset is identified by the first appearance of regular periodic energy (the voicing bar) in the low frequencies. Software such as Praat is commonly used to place these two landmarks, either by hand or with scripts that partly automate the measurement.
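
The idea of locating the burst and then the first periodic energy can be illustrated on a toy signal. The sketch below is not how Praat works and would not be robust on real speech: it builds a synthetic token (silence, then noise, then a sine wave standing in for voicing) and uses frame energy plus zero-crossing rate as a crude stand-in for reading a spectrogram. It handles positive VOT only, and every name and threshold in it is an assumption.

```python
import math
import random


def synth_stop(vot_ms, sr=16000, closure_ms=50, vowel_ms=100, f0=120.0, seed=0):
    """Toy token: silent closure, noise (burst + aspiration) lasting vot_ms, then a sine 'vowel'."""
    rng = random.Random(seed)
    closure = [0.0] * int(sr * closure_ms / 1000)
    noise = [0.3 * rng.uniform(-1.0, 1.0) for _ in range(int(sr * vot_ms / 1000))]
    vowel = [math.sin(2 * math.pi * f0 * i / sr) for i in range(int(sr * vowel_ms / 1000))]
    return closure + noise + vowel


def estimate_vot_ms(signal, sr=16000, frame_ms=5, energy_floor=1e-4, zcr_voiced_max=0.1):
    """Estimate VOT from the first energetic frame (burst) to the first low-ZCR frame (voicing)."""
    n = int(sr * frame_ms / 1000)
    burst_start = None
    for start in range(0, len(signal) - n + 1, n):
        frame = signal[start:start + n]
        energy = sum(x * x for x in frame) / n
        if energy < energy_floor:
            continue  # still in the silent closure
        if burst_start is None:
            burst_start = start  # first frame with energy: treat as the release
        # Noise changes sign often; periodic voicing at a low f0 rarely does.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        if crossings / (n - 1) <= zcr_voiced_max:
            return 1000.0 * (start - burst_start) / sr
    return None  # no burst or no voicing found


print(estimate_vot_ms(synth_stop(60)), estimate_vot_ms(synth_stop(20)))
```

The resolution of the estimate is one frame (5 ms here), so measurements on real recordings are normally made with finer landmarks placed by eye or by a dedicated tool.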
