Two Ears, Three Dimensions
With just two ears, the human auditory system localizes sounds in three dimensions with remarkable precision — about 1° accuracy for sources directly ahead. This feat relies on the physics of sound waves interacting with the head, pinnae (outer ears), and torso, which create direction-dependent modifications to the incoming signal that the brain has learned to decode over a lifetime of experience.
Interaural Time Difference (ITD)
When sound comes from the left, it arrives at the left ear before the right. For an average head (radius ~8.75 cm), the maximum ITD is about 690 microseconds at 90° azimuth. The brain's medial superior olive contains neurons that act as coincidence detectors, firing maximally when inputs from both ears arrive simultaneously after compensating for the delay — a biological delay line first proposed by Jeffress in 1948.
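The relationship between azimuth and ITD can be sketched with the Woodworth spherical-head approximation, ITD = (r/c)(θ + sin θ). The constants below (head radius from the text, speed of sound ~343 m/s) are assumptions; this simplified geometric model gives a maximum of roughly 660 microseconds, in the same ballpark as the ~690 µs figure, which comes from acoustic measurements on real heads.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, dry air at ~20 C (assumed)
HEAD_RADIUS = 0.0875     # m, average head radius cited in the text

def woodworth_itd(azimuth_deg: float) -> float:
    """Approximate interaural time difference in seconds for a
    rigid spherical head (Woodworth model):
        ITD = (r / c) * (theta + sin(theta))
    where theta is the azimuth in radians from straight ahead."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

# Maximum ITD occurs at 90 degrees azimuth (source directly to one side).
print(f"{woodworth_itd(90) * 1e6:.0f} microseconds")  # ~656 microseconds
```

A source straight ahead (0°) gives zero ITD, which is why localization blur is smallest there: near the midline, a 1° shift produces the largest change in ITD per degree.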
Interaural Level Difference (ILD)
At frequencies above about 1.5 kHz, the head casts an 'acoustic shadow' — the far ear receives less sound energy. This interaural level difference can exceed 20 dB at high frequencies and large angles. The lateral superior olive in the brainstem computes ILD by comparing excitatory input from one ear with inhibitory input from the other. Together with ITD, this provides the azimuthal component of localization.
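The ILD itself is just a level ratio expressed in decibels. A minimal sketch, assuming the near- and far-ear signals are summarized by their RMS amplitudes:

```python
import math

def ild_db(rms_near: float, rms_far: float) -> float:
    """Interaural level difference in dB (positive = near ear louder).
    20*log10 converts an amplitude ratio to decibels."""
    return 20.0 * math.log10(rms_near / rms_far)

# A far-ear amplitude one-tenth of the near-ear amplitude corresponds
# to a 20 dB ILD, the magnitude the text cites for high frequencies
# at large azimuths.
print(ild_db(1.0, 0.1))  # 20.0
```

At low frequencies the head is small relative to the wavelength, sound diffracts around it, and this ratio approaches 0 dB, which is why ILD is only informative above roughly 1.5 kHz.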
Beyond Left-Right: Elevation and Distance
ITD and ILD determine azimuth but cannot distinguish front from back or high from low (the cone of confusion). The brain resolves these ambiguities using spectral cues: the complex folds of the pinna create frequency-dependent filtering that varies with elevation. These head-related transfer functions (HRTFs) are unique to each individual, which is why generic spatial audio sometimes sounds 'wrong'; personalized HRTFs are the frontier of immersive audio technology.
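Spatial audio systems exploit this by convolving a mono source with a pair of direction-specific head-related impulse responses (HRIRs, the time-domain form of HRTFs), one per ear. The sketch below uses short, made-up HRIR coefficients purely for illustration; measured HRIRs are per-listener and typically hundreds of taps long.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution: how an HRIR filters a sound."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

# Hypothetical HRIRs for one source direction (not measured data):
# the near ear gets a strong, early response; the far ear a delayed,
# attenuated one, encoding ITD and ILD together with spectral shaping.
hrir_near = [0.9, 0.3, 0.1]
hrir_far = [0.0, 0.0, 0.4, 0.2]

mono = [1.0, 0.5, 0.25]
near_ear = convolve(mono, hrir_near)  # play on one headphone channel
far_ear = convolve(mono, hrir_far)    # play on the other
```

Playing the two filtered signals over headphones recreates the interaural and spectral cues for that direction, which is the basic rendering step behind binaural audio in VR and gaming.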