From Vibration to Pitch
Pitch, the perceptual correlate of frequency, is one of the most important attributes of sound. It lets us recognize melodies, distinguish voices, and parse tonal languages. Yet pitch is not simply frequency. The relationship is nonlinear: musical intervals correspond to frequency ratios, so an octave is always a doubling regardless of the starting frequency. Perceived pitch also shifts somewhat with sound level, and a pitch can even arise from stimuli with no energy at the perceived frequency (the missing fundamental).
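The missing fundamental can be illustrated numerically. The sketch below is only an illustration, not a model of hearing: it assumes NumPy is available, picks an arbitrary 16 kHz sample rate and a 200 Hz fundamental, and uses autocorrelation as a stand-in for whatever periodicity analysis the auditory system performs. All variable names are hypothetical.

```python
import numpy as np

fs = 16000            # sample rate in Hz (assumed for this sketch)
f0 = 200.0            # fundamental that is deliberately left out
t = np.arange(int(0.1 * fs)) / fs   # 100 ms of signal

# Sum harmonics 2, 3, and 4 only: 400, 600, and 800 Hz.
x = sum(np.sin(2 * np.pi * k * f0 * t) for k in (2, 3, 4))

# Magnitude spectrum. With a 100 ms window the bins are 10 Hz apart,
# so 200 Hz and 400 Hz each fall exactly on a bin.
spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
energy_at_f0 = spectrum[np.argmin(np.abs(freqs - f0))]
energy_at_2f0 = spectrum[np.argmin(np.abs(freqs - 2 * f0))]

# Autocorrelation: the waveform still repeats every 1/f0 seconds,
# even though no component sits at f0 itself.
ac = np.correlate(x, x, mode="full")[len(x) - 1:]
min_lag = int(fs / 1000)   # shortest lag searched (1000 Hz)
max_lag = int(fs / 50)     # longest lag searched (50 Hz)
best_lag = min_lag + np.argmax(ac[min_lag:max_lag])
estimated_pitch = fs / best_lag

print(f"spectral magnitude at {f0:.0f} Hz: {energy_at_f0:.3g}")
print(f"spectral magnitude at {2 * f0:.0f} Hz: {energy_at_2f0:.3g}")
print(f"periodicity-based pitch estimate: {estimated_pitch:.1f} Hz")
```

The spectrum shows essentially nothing at 200 Hz, yet the strongest repetition period in the waveform corresponds to 200 Hz, which is the pitch listeners typically report for such a tone complex.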
The Basilar Membrane: A Biological Spectrum Analyzer
Inside the cochlea, the basilar membrane performs a real-time, Fourier-like decomposition of incoming sound. High frequencies excite the base; low frequencies excite the apex. Georg von Békésy won the 1961 Nobel Prize in Physiology or Medicine for his work on this tonotopic organization. Each inner hair cell along the membrane responds best to a narrow frequency band, creating a spatial 'frequency map' that the auditory nerve transmits to the brain.
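One common way to put numbers on this place-to-frequency map is Greenwood's function, which is not mentioned above and is introduced here only as an illustration. The sketch assumes NumPy and the commonly cited human constants (A = 165.4, a = 2.1, k = 0.88); the function name greenwood_frequency is hypothetical.

```python
import numpy as np

def greenwood_frequency(x):
    """Characteristic frequency (Hz) at relative position x along the membrane.

    x = 0 is the apex and x = 1 is the base. The constants are the commonly
    cited human fit and should be treated as assumptions of this sketch.
    """
    x = np.asarray(x, dtype=float)
    return 165.4 * (10.0 ** (2.1 * x) - 0.88)

# Sample five evenly spaced positions from apex to base.
positions = np.linspace(0.0, 1.0, 5)
for pos, freq in zip(positions, greenwood_frequency(positions)):
    print(f"x = {pos:.2f} -> {freq:8.1f} Hz")
```

Under these assumptions the map runs from roughly 20 Hz at the apex to a little over 20 kHz at the base, and equal steps along the membrane correspond to approximately equal frequency ratios over most of its length.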
Perceptual Pitch Scales
The mel and Bark scales were developed to quantify how perceived pitch relates to physical frequency. Both compress high frequencies relative to low ones, reflecting the basilar membrane's roughly logarithmic spacing. The mel scale (Stevens, Volkmann, and Newman, 1937) is widely used in speech technology for computing mel-frequency cepstral coefficients (MFCCs), long a standard feature representation in speech recognition systems.
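The compression both scales apply can be seen with a few lines of code. The sketch below uses two widely used closed-form approximations that are not given in the text above: the 2595 * log10(1 + f / 700) mel formula and Traunmüller's Bark approximation. Other variants exist, so the exact constants are an assumption, and the function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Hz to mel, using the common 2595 * log10(1 + f / 700) approximation."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def hz_to_bark(f_hz):
    """Hz to Bark, using Traunmüller's approximation (assumed variant)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

# Equal 1000 Hz steps shrink on both scales as frequency rises.
for lo, hi in [(1000.0, 2000.0), (7000.0, 8000.0)]:
    d_mel = hz_to_mel(hi) - hz_to_mel(lo)
    d_bark = hz_to_bark(hi) - hz_to_bark(lo)
    print(f"{lo:.0f}-{hi:.0f} Hz: {d_mel:6.1f} mel, {d_bark:4.2f} Bark")
```

With this formula 1000 Hz maps to approximately 1000 mel, and the same 1000 Hz step covers far fewer mels (and Barks) near 7 kHz than near 1 kHz, which is the compression described above.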
Place vs. Temporal Coding
Below about 500 Hz, auditory nerve fibers phase-lock to individual cycles of the sound wave, providing precise temporal pitch information. Phase-locking weakens as frequency rises, and above 4–5 kHz it fails, so the brain must rely entirely on which place on the basilar membrane is activated. Between these extremes, both mechanisms contribute. This dual coding is thought to explain why pitch perception is most acute in the 500–4000 Hz speech range, where both cues overlap.
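One intuition for why temporal coding degrades with frequency is that a fixed amount of timing jitter occupies a growing fraction of each cycle. The toy simulation below is a sketch under strong assumptions, not a physiological model: it assumes NumPy, places one spike per cycle at a fixed phase, adds Gaussian timing jitter with an arbitrarily chosen standard deviation of 0.1 ms, and measures synchrony with vector strength (1 means perfect phase-locking, 0 means none). The function name and all parameter values are hypothetical.

```python
import numpy as np

def vector_strength(freq_hz, jitter_s=1e-4, n_spikes=20000, seed=0):
    """Vector strength of simulated spikes locked to a tone at freq_hz.

    jitter_s is the standard deviation of the Gaussian timing jitter in
    seconds. The 0.1 ms default is an assumption chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    period = 1.0 / freq_hz
    # One spike per cycle at the same nominal phase, plus timing jitter.
    spike_times = np.arange(n_spikes) * period + rng.normal(0.0, jitter_s, n_spikes)
    # Map each spike to its phase within the stimulus cycle.
    phases = 2 * np.pi * freq_hz * spike_times
    # Length of the mean unit vector across all spike phases.
    return float(np.abs(np.mean(np.exp(1j * phases))))

for f in (200.0, 500.0, 1000.0, 2000.0, 5000.0):
    print(f"{f:6.0f} Hz: vector strength = {vector_strength(f):.3f}")
```

With these assumed values, synchrony is near 1 at a few hundred hertz and falls close to 0 by 5 kHz, qualitatively matching the transition from temporal to place coding described above. The frequency at which it collapses depends directly on the assumed jitter.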