Pitch Perception Simulator: Basilar Membrane, Mel Scale & Place Theory

simulator intermediate ~10 min
549.6 mel — perceived pitch of A4 (440 Hz)

Concert A at 440 Hz maps to 549.6 mel, positioned about 60% along the basilar membrane from the base. The mel scale captures the perceptual reality that doubling frequency does not double perceived pitch.

Formula

m = 2595 × log₁₀(1 + f/700) (mel scale)
z = 13·arctan(0.00076·f) + 3.5·arctan((f/7500)²) (Bark scale)
ERB(f) = 24.7 × (4.37·f/1000 + 1) Hz
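
As a rough illustration, the three formulas above can be evaluated directly. The sketch below is a minimal Python transcription. The function names are chosen here for readability and are not taken from the simulator's source.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Mel scale: m = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_bark(f_hz: float) -> float:
    """Bark scale: z = 13*atan(0.00076*f) + 3.5*atan((f/7500)^2)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def erb_hz(f_hz: float) -> float:
    """Equivalent rectangular bandwidth in Hz: 24.7 * (4.37*f/1000 + 1)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


if __name__ == "__main__":
    # Successive octaves of A: each step doubles the frequency,
    # but the mel value grows by less than a factor of two.
    for f in (110.0, 220.0, 440.0, 880.0, 1760.0):
        print(f"{f:7.1f} Hz  {hz_to_mel(f):7.1f} mel  "
              f"{hz_to_bark(f):5.2f} Bark  ERB {erb_hz(f):6.1f} Hz")
```

For f = 440 Hz, hz_to_mel gives about 549.6 mel, matching the value shown at the top of this page.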

From Vibration to Pitch

Pitch, the perceptual correlate of frequency, is one of the most important attributes of sound. It lets us recognize melodies, distinguish voices, and parse tonal languages. Yet pitch is not simply frequency: the relationship is nonlinear (musical intervals follow frequency ratios, so an octave is always a doubling regardless of starting frequency) and level-dependent, and a pitch can even be heard from stimuli with no energy at the perceived frequency (the missing fundamental).

The Basilar Membrane: A Biological Spectrum Analyzer

Inside the cochlea, the basilar membrane performs a real-time Fourier-like decomposition of incoming sound. High frequencies excite the base; low frequencies excite the apex. Georg von Békésy won the 1961 Nobel Prize in Physiology or Medicine for his work on the cochlear mechanics behind this tonotopic organization. Inner hair cells at each position along the membrane respond to a narrow frequency band, creating a spatial 'frequency map' that the auditory nerve transmits to the brain.

Perceptual Pitch Scales

The mel and Bark scales were developed to quantify how perceived pitch relates to physical frequency. Both compress high frequencies relative to low, reflecting the roughly logarithmic spacing of frequencies along the basilar membrane. The mel scale (Stevens, Volkmann, and Newman, 1937) is widely used in speech technology for computing mel-frequency cepstral coefficients (MFCCs), long a standard feature representation in speech recognition systems.
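
To make the compression concrete, the sketch below places band centers at equal steps on the mel axis and converts them back to Hz, which is how mel filterbanks for MFCC-style features are commonly laid out. It assumes the mel formula given in the Formula section. The function names, the 0–8000 Hz range, and the band count are illustrative choices, not values taken from the simulator.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Mel scale: m = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of the mel formula: f = 700 * (10**(m/2595) - 1)."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_spaced_centers(f_min_hz: float, f_max_hz: float, n_bands: int) -> list:
    """Return n_bands center frequencies (Hz) equally spaced in mel.

    The two range endpoints are excluded, so the centers lie strictly
    between f_min_hz and f_max_hz.
    """
    m_lo, m_hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (m_hi - m_lo) / (n_bands + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(n_bands)]


if __name__ == "__main__":
    # Illustrative range and band count (assumptions, not simulator settings).
    for center in mel_spaced_centers(0.0, 8000.0, 10):
        print(f"{center:8.1f} Hz")
```

The centers are evenly spaced in mel, but in Hz the gap between neighbors widens toward high frequencies, so low frequencies get finer resolution.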

Place vs. Temporal Coding

Below about 500 Hz, auditory nerve fibers phase-lock to individual cycles of the sound wave, providing precise temporal pitch information. Above 4–5 kHz, phase-locking fails and the brain relies entirely on which place on the basilar membrane is activated. Between these extremes, both mechanisms contribute. This dual coding helps explain why pitch perception is most acute in the 500–4000 Hz speech range, where both cues overlap.

FAQ

How does the ear perceive pitch?

Pitch perception involves two complementary mechanisms: place coding (different frequencies activate different positions along the basilar membrane) and temporal coding (the auditory nerve fires in synchrony with waveform cycles, up to about 4–5 kHz). The brain integrates both cues to create the sensation of pitch.

What is the mel scale?

The mel scale is a perceptual pitch scale on which equal mel intervals sound equally spaced in pitch. It is anchored so that 1000 Hz corresponds to 1000 mel, and it compresses high frequencies: with the formula m = 2595 × log₁₀(1 + f/700), 2000 Hz ≈ 1521 mel and 4000 Hz ≈ 2146 mel. It is widely used in speech recognition and audio feature extraction (mel-frequency cepstral coefficients, MFCCs).

What is the missing fundamental?

If you play harmonics 2, 3, 4, and 5 of a fundamental frequency but omit the fundamental itself, listeners still perceive the pitch of the missing fundamental. This demonstrates that pitch is a central auditory computation, not simply the lowest frequency present. Telephone audio relies on this: the classic narrowband channel carries roughly 300–3400 Hz, yet male voices (F0 ≈ 120 Hz) are still heard at their natural pitch.
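
The effect can be sketched numerically. The example below builds a tone from harmonics 2 to 5 of a 200 Hz fundamental, checks that the signal has essentially no energy at 200 Hz, and then estimates its periodicity with a plain autocorrelation peak search. Autocorrelation is used here only as a simple stand-in for periodicity detection, not as a model of what the auditory system does. The sample rate, duration, and function names are assumptions made for illustration.

```python
import math

FS = 8000    # sample rate in Hz (illustrative)
F0 = 200.0   # fundamental that is deliberately left out
N = 800      # 0.1 s of signal


def harmonic_complex(f0, harmonics, fs, n):
    """Sum of equal-amplitude sine waves at the given harmonics of f0."""
    return [sum(math.sin(2 * math.pi * h * f0 * i / fs) for h in harmonics)
            for i in range(n)]


def tone_magnitude(x, freq, fs):
    """Normalized magnitude of a single DFT-style projection at freq."""
    re = sum(v * math.cos(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    return math.hypot(re, im) / len(x)


def autocorr_pitch(x, fs, f_lo=50.0, f_hi=500.0):
    """Return fs / lag for the lag that maximizes the raw autocorrelation."""
    lag_min, lag_max = int(fs / f_hi), int(fs / f_lo)

    def autocorr(lag):
        return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

    best_lag = max(range(lag_min, lag_max + 1), key=autocorr)
    return fs / best_lag


# Harmonics 2-5 only: components at 400, 600, 800, and 1000 Hz.
signal = harmonic_complex(F0, (2, 3, 4, 5), FS, N)

if __name__ == "__main__":
    print("magnitude at 200 Hz:", tone_magnitude(signal, 200.0, FS))
    print("magnitude at 400 Hz:", tone_magnitude(signal, 400.0, FS))
    print("estimated periodicity (Hz):", autocorr_pitch(signal, FS))
```

The waveform repeats every 5 ms even though no component sits at 200 Hz, so the periodicity estimate lands on the absent fundamental rather than on the lowest component present (400 Hz).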

How does the basilar membrane work?

The basilar membrane in the cochlea acts as a mechanical frequency analyzer. It is narrow and stiff at the base (responding to high frequencies) and wide and floppy at the apex (responding to low frequencies). Each point resonates at a characteristic frequency, creating a tonotopic map that the brain reads to determine pitch.
