Binaural Localization Simulator: ITD, ILD & Spatial Hearing

Simulator · Intermediate · ~10 min

A sound source at 30° azimuth produces an interaural time difference of about 295 μs and a level difference of 4.2 dB at 1 kHz — both well above the just-noticeable thresholds (10 μs and 1 dB respectively).

Formula

ITD = (r/c) × (θ + sin θ)   (Woodworth spherical-head model; θ = azimuth in radians, r = head radius, c = speed of sound)
ILD ≈ f·a·sin(θ)/c   (simplified high-frequency proxy; a = head radius — a dimensionless shadowing measure, not calibrated dB)
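Both formulas can be evaluated directly. A minimal sketch in Python, assuming the head radius (~8.75 cm) and speed of sound values used elsewhere on this page; the function names are ours, and the ILD expression is only the dimensionless proxy above, not calibrated decibels:

```python
import math

R = 0.0875   # head radius in m (~8.75 cm, from the text)
C = 343.0    # speed of sound in m/s

def itd_woodworth(azimuth_deg):
    """Woodworth spherical-head ITD in seconds."""
    theta = math.radians(azimuth_deg)
    return (R / C) * (theta + math.sin(theta))

def ild_proxy(azimuth_deg, freq_hz):
    """Dimensionless high-frequency shadowing proxy f*a*sin(theta)/c."""
    theta = math.radians(azimuth_deg)
    return freq_hz * R * math.sin(theta) / C

print(f"ITD at 30°: {itd_woodworth(30) * 1e6:.0f} µs")  # ~261 µs with this model
print(f"ITD at 90°: {itd_woodworth(90) * 1e6:.0f} µs")  # ~656 µs
```

The spherical model lands slightly below the measured values quoted in the text (≈295 µs at 30°, up to ≈690 µs at 90°), which is expected: real heads are not spheres and torso reflections add to the path difference.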

Two Ears, Three Dimensions

With just two ears, the human auditory system localizes sounds in three dimensions with remarkable precision — about 1° accuracy for sources directly ahead. This feat relies on the physics of sound waves interacting with the head, pinnae (outer ears), and torso, which create direction-dependent modifications to the incoming signal that the brain has learned to decode over a lifetime of experience.

Interaural Time Difference (ITD)

When sound comes from the left, it arrives at the left ear before the right. For an average head (radius ~8.75 cm), the maximum ITD is about 690 microseconds at 90° azimuth. The brain's medial superior olive contains neurons that act as coincidence detectors, firing maximally when inputs from both ears arrive simultaneously after compensating for the delay — a biological delay line first proposed by Jeffress in 1948.
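The Jeffress coincidence-detector idea maps naturally onto cross-correlation: test a bank of candidate internal delays and pick the one where the two ear signals agree best. A toy sketch with synthetic signals (the sample rate, tone frequency, and delay are illustrative assumptions, not values from the text):

```python
import numpy as np

fs = 96_000                     # sample rate (Hz), fine enough to resolve small ITDs
t = np.arange(0, 0.01, 1 / fs)  # 10 ms of signal
left = np.sin(2 * np.pi * 500 * t) * np.hanning(t.size)  # windowed 500 Hz tone

delay_samples = 25              # simulate the right ear lagging by 25 samples (~260 µs)
right = np.roll(left, delay_samples)

# Jeffress-style search: each candidate lag is one "delay line"; the neuron
# whose delay compensates the acoustic ITD fires most (largest dot product).
lags = list(range(-50, 51))
scores = [np.dot(left, np.roll(right, -lag)) for lag in lags]
best_lag = lags[int(np.argmax(scores))]
print(f"estimated ITD: {best_lag / fs * 1e6:.0f} µs")  # ~260 µs
```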

Interaural Level Difference (ILD)

At frequencies above about 1.5 kHz, the head casts an 'acoustic shadow' — the far ear receives less sound energy. This interaural level difference can exceed 20 dB at high frequencies and large angles. The lateral superior olive in the brainstem computes ILD by comparing excitatory input from one ear with inhibitory input from the other. Together with ITD, this provides the azimuthal component of localization.
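The LSO's excitatory-versus-inhibitory comparison amounts to a level difference in decibels. A minimal way to measure the ILD between two ear signals (the 6 dB shadow applied below is an illustrative assumption):

```python
import numpy as np

def ild_db(left, right, eps=1e-12):
    """ILD in dB: positive when the left ear is louder."""
    rms_l = np.sqrt(np.mean(np.square(left)))
    rms_r = np.sqrt(np.mean(np.square(right)))
    return 20 * np.log10((rms_l + eps) / (rms_r + eps))

rng = np.random.default_rng(0)
near_ear = rng.standard_normal(48_000)       # broadband noise at the near ear
far_ear = near_ear * 10 ** (-6 / 20)         # far ear 6 dB quieter (head shadow)
print(f"ILD: {ild_db(near_ear, far_ear):.1f} dB")  # 6.0 dB
```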

Beyond Left-Right: Elevation and Distance

ITD and ILD determine azimuth but cannot distinguish front from back or high from low (the cone of confusion). The brain resolves these ambiguities using spectral cues: the complex folds of the pinna (outer ear) create frequency-dependent filtering that varies with elevation. These head-related transfer functions (HRTFs) are unique to each individual, which is why generic spatial audio sometimes sounds 'wrong' — personalized HRTFs are the frontier of immersive audio technology.

FAQ

How does the brain localize sound?

The brain uses three primary cues: interaural time difference (ITD) — the sound arrives at the near ear first; interaural level difference (ILD) — the head shadows the far ear at high frequencies; and spectral cues from pinna filtering that resolve front-back and elevation ambiguities. The medial superior olive processes ITD while the lateral superior olive processes ILD.

What is the duplex theory?

Lord Rayleigh's duplex theory (1907) states that localization uses ITD for low frequencies (<1.5 kHz) and ILD for high frequencies (>1.5 kHz). At the crossover frequency, localization accuracy dips — this 'hole' in localization ability has been confirmed experimentally and corresponds to the frequency where the head is about one wavelength in diameter.

What is the cone of confusion?

The cone of confusion is a set of points in space that produce the same ITD and ILD. For any azimuth angle, there is a corresponding cone of positions (varying in elevation and front-back) with identical interaural differences. The brain resolves this ambiguity using pinna spectral cues, head movements, and prior expectations.
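One concrete slice of the cone: under the simplest straight-path model, ITD depends only on sin(azimuth), so a front source at 30° and its rear mirror image at 150° produce identical ITDs. A quick check (this deliberately uses the simplified model rather than the Woodworth formula, to isolate the symmetry):

```python
import math

R, C = 0.0875, 343.0  # head radius (m), speed of sound (m/s)

def itd_simple(azimuth_deg):
    """Straight-path ITD model: depends only on sin(azimuth)."""
    return (2 * R / C) * math.sin(math.radians(azimuth_deg))

front, back = itd_simple(30), itd_simple(150)  # sin(30°) == sin(150°)
print(f"{front * 1e6:.1f} µs vs {back * 1e6:.1f} µs")  # same ITD: front/back ambiguous
```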

How is spatial audio technology related?

Virtual reality audio, surround sound, and spatial audio headphones (Apple Spatial Audio, Dolby Atmos for Headphones) work by synthesizing the correct ITD, ILD, and pinna filtering (via HRTF — head-related transfer functions) to create the illusion of sounds coming from specific directions. Accurate HRTFs are critical for convincing spatialization.
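In its crudest form, spatialization means imposing plausible ITD and ILD on a mono signal. Real systems convolve with full HRTFs; this sketch (our own illustrative function, with a toy ILD curve) captures only the two interaural cues, which is why it cannot convey elevation or front/back:

```python
import numpy as np

def pan_binaural(mono, azimuth_deg, fs=48_000, r=0.0875, c=343.0):
    """Crude binaural pan: ITD as an integer-sample delay, ILD as a gain.

    Real spatial audio convolves with measured HRTFs; this toy version
    leaves elevation and front/back ambiguous by construction.
    """
    theta = np.radians(azimuth_deg)
    itd = (r / c) * (theta + np.sin(theta))        # Woodworth ITD (s)
    delay = int(round(abs(itd) * fs))              # interaural delay in samples
    gain = 10 ** (-abs(4.2 * np.sin(theta)) / 20)  # illustrative ILD curve (dB -> gain)
    near = mono
    far = np.concatenate([np.zeros(delay), mono[:mono.size - delay]]) * gain
    # positive azimuth = source on the right, so the right ear is the near ear
    return (far, near) if azimuth_deg >= 0 else (near, far)

sig = np.sin(2 * np.pi * 440 * np.arange(48_000) / 48_000)
left, right = pan_binaural(sig, 30)  # right channel leads and is louder
```

Listening to the result over headphones gives clear lateralization but a flat, "inside the head" image — exactly the gap that measured HRTFs fill.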

Embed

<iframe src="https://homo-deus.com/lab/psychoacoustics/binaural-localization/embed" width="100%" height="400" frameborder="0"></iframe>
View source on GitHub