Stereo Matching Simulator: Depth from Disparity Maps

Z_min = 2.34 m — nearest reconstructable depth

With a 30 cm baseline, 50 mm focal length, and a maximum disparity of 64 px, the stereo system resolves depth starting at 2.34 m; at that near limit, one pixel of disparity corresponds to a few centimeters of depth, and precision degrades quadratically from there.

Formula

Z = f × B / d (depth from disparity; f and d in the same units, e.g. pixels)
ΔZ ≈ Z² × Δd / (f × B) (depth resolution per disparity step Δd)
SAD(d) = Σ |I_L(x, y) − I_R(x − d, y)| over a W×W block (matching cost)
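The formulas can be checked against the simulator's headline number. A minimal sketch, assuming a pixel pitch of 0.1 mm (not stated above, but inferred so that Z_min matches the quoted 2.34 m):

```python
# Worked check of the depth-from-disparity formulas using the simulator's
# parameters. The 0.1 mm pixel pitch is an ASSUMPTION inferred from Z_min.
f = 0.050          # focal length in meters (50 mm)
B = 0.30           # baseline in meters
pitch = 1e-4       # assumed pixel pitch in meters (0.1 mm)
d_max_px = 64      # maximum disparity searched, in pixels

d_max = d_max_px * pitch             # max disparity expressed in meters
z_min = f * B / d_max                # nearest reconstructable depth
dz = z_min**2 * pitch / (f * B)      # depth change per 1 px disparity step

print(f"Z_min = {z_min:.2f} m")
print(f"dZ at Z_min = {dz * 100:.1f} cm")
```

Note that dZ at the near limit simplifies to Z_min / d_max_px, a handy sanity check.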

Seeing Depth from Two Eyes

Stereo vision mimics human binocular perception: two cameras separated by a known baseline capture the same scene from slightly different angles. The horizontal offset (disparity) between corresponding points in the left and right images encodes depth — closer objects exhibit larger disparity shifts, while distant features show nearly zero displacement. This geometric principle underpins autonomous vehicle perception, robotic navigation, and aerial 3D mapping.

Block Matching Algorithms

The simplest dense stereo method slides a small window across each epipolar line, computing a matching cost (typically Sum of Absolute Differences) at every candidate disparity. The disparity minimizing cost is selected per pixel, producing a dense disparity map. Block size controls the smoothness-detail tradeoff: small blocks preserve edges but are noise-sensitive, while large blocks smooth results at the cost of blurred boundaries.
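The winner-take-all loop above can be sketched in a few lines. This is a pure-Python illustration on grayscale images stored as lists of lists; the function names are illustrative, not from any particular library:

```python
def sad(left, right, y, x, d, half):
    """Sum of absolute differences for a block centered at (y, x),
    comparing against the right image shifted by disparity d."""
    cost = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            cost += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
    return cost

def sad_disparity(left, right, max_d=16, block=3):
    """Per-pixel winner-take-all disparity via SAD block matching."""
    half = block // 2
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(half, h - half):
        for x in range(half + max_d, w - half):
            costs = [sad(left, right, y, x, d, half) for d in range(max_d + 1)]
            disp[y][x] = min(range(len(costs)), key=costs.__getitem__)
    return disp
```

On a synthetic pair where the right image is the left shifted by 2 px, textured pixels recover disparity 2, while the textureless background is ambiguous (all candidate costs are zero), illustrating the failure mode discussed below.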

Depth Precision and Range

The fundamental depth equation Z = fB/d reveals that depth precision degrades quadratically with distance — doubling range quadruples depth uncertainty. This means stereo systems excel at close-range reconstruction but struggle at long distances. Increasing baseline or focal length extends useful range, though wider baselines introduce more occlusion artifacts at object boundaries.
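A quick numeric sketch makes the quadratic degradation concrete. The pixel-unit focal length f_px = 500 px is an assumption consistent with the 50 mm lens above; the baseline is the simulator's 0.30 m:

```python
# Depth change caused by a one-pixel disparity step, dZ = Z^2 / (f_px * B).
# f_px = 500 px is an ASSUMED pixel-unit focal length for illustration.
f_px = 500.0   # focal length in pixels (assumption)
B = 0.30       # baseline in meters

steps = {z: z * z / (f_px * B) for z in (5.0, 10.0, 20.0)}
for z, dz in steps.items():
    print(f"Z = {z:4.1f} m -> dZ per pixel = {dz:.3f} m")
```

Each doubling of range quadruples the per-pixel depth step, matching the Z² dependence.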

Modern Advances

Semi-Global Matching (SGM) dramatically improved stereo quality by enforcing piecewise smoothness across multiple directions, achieving near-real-time dense reconstruction. Recent deep learning approaches like RAFT-Stereo and LEAStereo learn to match features directly, handling textureless regions and reflections that defeat traditional methods. These advances power LiDAR-free depth sensing in smartphones and self-driving cars.

FAQ

What is stereo matching in photogrammetry?

Stereo matching finds corresponding pixels between two images taken from slightly different viewpoints. The horizontal displacement (disparity) between matched points is inversely proportional to depth, enabling 3D reconstruction from 2D image pairs.

How does block matching work?

Block matching compares small patches (blocks) in the left image against candidate positions in the right image using a cost function like Sum of Absolute Differences (SAD). The disparity with lowest cost is selected as the match, producing a dense disparity map.

What determines stereo depth resolution?

Depth resolution depends on baseline length, focal length, and pixel size. Larger baselines and longer focal lengths provide finer depth discrimination. Resolution degrades quadratically with distance — objects twice as far carry four times the depth uncertainty.

What are common stereo matching challenges?

Textureless regions produce ambiguous matches, occlusions create unmatched areas, and repetitive patterns cause false matches. Semi-global matching (SGM) and deep-learning approaches address these by enforcing smoothness constraints and learning feature representations.
