The Melody of Speech
Every utterance carries a melody: the pitch contour created by variations in the rate of vocal fold vibration. This fundamental frequency (f₀) contour is one of the richest channels of linguistic and paralinguistic information, encoding whether you are asking a question or making a statement, which word carries emphasis, and even your emotional state. Pitch tracking, the extraction of this f₀ contour from the acoustic signal, is one of the most important tasks in speech analysis.
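As a rough illustration of what pitch tracking involves, the sketch below estimates f₀ for a single frame by finding the lag at which the frame best matches a shifted copy of itself (autocorrelation). This is one common textbook approach, not necessarily the method used by this simulation. The function name `estimate_f0`, the 75–500 Hz search range, and the 0.3 voicing threshold are all assumptions chosen for the example.

```python
import math

def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate f0 (Hz) for one frame from its autocorrelation peak.

    Returns None when the frame is silent or shows no clear periodicity.
    The search range and the 0.3 threshold are illustrative assumptions.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]            # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None                          # silent frame

    lag_min = int(sample_rate / f0_max)      # shortest period considered
    lag_max = min(int(sample_rate / f0_min), n - 1)

    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Normalised autocorrelation at this lag
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r

    if best_lag is None or best_r < 0.3:
        return None                          # treat as unvoiced
    return sample_rate / best_lag

# Usage: a synthetic 200 Hz tone, 40 ms long, sampled at 16 kHz
sr = 16000
tone = [math.sin(2 * math.pi * 200.0 * i / sr) for i in range(640)]
f0 = estimate_f0(tone, sr)
```

For the synthetic tone the estimate should come out close to 200 Hz. Real speech is much harder: practical trackers add windowing, interpolation between lags, and frame-to-frame path smoothing, none of which is shown here.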
Intonation Contours
Languages use characteristic pitch patterns called intonation contours. In English, declarative statements typically have a falling contour: pitch peaks on the nuclear (most stressed) syllable and falls to the baseline. Yes/no questions end with a rise. Wh-questions often fall. These patterns are language-specific — in Bengali, statements rise, and in Sicilian Italian, questions fall. This simulation lets you compare these contour types.
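The English contour types described above can be mocked up as piecewise-linear f₀ curves over normalized time. The sketch below is a toy model only: the function names, the anchor points, and the 100–180 Hz values are assumptions invented for illustration, not measurements and not the simulation's actual code.

```python
def interpolate(anchors, t):
    """Piecewise-linear interpolation through (time, f0) anchor points."""
    for (t0, f0), (t1, f1) in zip(anchors, anchors[1:]):
        if t0 <= t <= t1:
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the anchor range")

def make_contour(kind, n_points=50, baseline=100.0, peak=180.0, nuclear=0.6):
    """Return a list of f0 values (Hz) for a stylised contour.

    nuclear is the normalised time (0..1) of the nuclear syllable.
    All default values are illustrative assumptions.
    """
    mid = (baseline + peak) / 2
    anchors = {
        # Peak on the nuclear syllable, then a fall to the baseline
        "statement": [(0.0, mid), (nuclear, peak), (1.0, baseline)],
        # Low on the nuclear syllable, then a final rise
        "yes_no_question": [(0.0, mid), (nuclear, baseline), (1.0, peak)],
    }[kind]
    return [interpolate(anchors, i / (n_points - 1)) for i in range(n_points)]

statement = make_contour("statement")
question = make_contour("yes_no_question")
```

Plotting `statement` and `question` against time shows the final fall and the final rise side by side. A falling wh-question could reuse the statement anchors under the same assumptions.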
Measuring Pitch Perturbation
No voice is perfectly periodic. Slight cycle-to-cycle variations in the vibration period (jitter) and in peak amplitude (shimmer) contribute to each voice's individual quality. Healthy voices typically show jitter below about 1%. Professional singers often have remarkably low jitter, while voices affected by pathology (such as vocal fold nodules or paralysis) show elevated jitter. Tracking these perturbations is essential for clinical voice assessment and for making synthetic speech sound natural.
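One widely used way to quantify these perturbations is the "local" measure: the mean absolute difference between consecutive cycles, divided by the mean value, expressed as a percentage. The sketch below applies that formula to periods (jitter) and peak amplitudes (shimmer). The helper name and the sample values are made up for illustration, and it assumes the individual glottal cycles have already been located.

```python
def local_perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference relative to the mean, in %.

    Pass per-cycle periods for local jitter, or per-cycle peak
    amplitudes for local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value

# Hypothetical per-cycle measurements (illustrative, not real data)
periods_s = [0.0100, 0.0101, 0.0099, 0.0100]   # cycle durations in seconds
amplitudes = [0.80, 0.82, 0.79, 0.81]          # peak amplitude per cycle

jitter_pct = local_perturbation_percent(periods_s)
shimmer_pct = local_perturbation_percent(amplitudes)
```

With only four cycles the numbers are not meaningful clinically. Real assessments use sustained vowels covering many cycles, and shimmer is often reported in dB rather than percent.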
From Acoustics to Meaning
Pitch tracking enables a wide range of applications: clinical voice assessment detects pathology from f₀ perturbation patterns; tonal language processing requires accurate f₀ for word recognition in languages such as Mandarin or Thai; emotion recognition systems use pitch range and contour shape to classify affect; and music information retrieval uses pitch tracking to transcribe melodies. In all of these, the tracking algorithm must cope with creaky voice, voice breaks, and background noise, each of which makes f₀ harder to estimate reliably.
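As a small example of a downstream feature, the sketch below computes pitch range in semitones from an f₀ track, skipping frames where no pitch was found (for instance during a voice break). The function name and the convention of marking unvoiced frames with `None` are assumptions made for this example. The semitone conversion itself is the standard 12 · log₂ of the frequency ratio.

```python
import math

def pitch_range_semitones(f0_track):
    """Range between the lowest and highest voiced f0, in semitones.

    f0_track is a list of f0 values in Hz, with None for unvoiced frames.
    Returns None if fewer than two voiced frames are available.
    """
    voiced = [f for f in f0_track if f is not None and f > 0]
    if len(voiced) < 2:
        return None
    return 12.0 * math.log2(max(voiced) / min(voiced))

# Hypothetical track with two unvoiced gaps
track = [100.0, None, 150.0, 200.0, None]
range_st = pitch_range_semitones(track)
```

A track spanning 100 Hz to 200 Hz covers one octave, so the result here is 12 semitones. Using the raw minimum and maximum is sensitive to tracking errors such as octave jumps. A more robust variant might use percentiles instead, which is left out to keep the sketch short.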