PCA: Reducing Dimensions While Preserving Information

2 PCs capture ≈ 85% variance — effective compression

With correlated 5-dimensional data, the first 2 principal components typically capture about 85% of the total variance, enabling visualization of high-dimensional structure in 2D.
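To make that claim concrete, here is a minimal sketch (not the simulation's own code) that builds correlated 5-dimensional data from two latent factors and checks how much variance the first two components capture. The factor structure, noise level, random seed, and use of scikit-learn are all assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical setup (not taken from the simulation): 5 features driven
# mostly by 2 latent factors, the kind of correlation structure where a
# couple of PCs dominate the variance.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 2))                       # 2 underlying factors
X = latent @ rng.normal(size=(2, 5)) + 0.8 * rng.normal(size=(2000, 5))

pca = PCA().fit(X)
top2 = pca.explained_variance_ratio_[:2].sum()
print(f"first 2 PCs explain {100 * top2:.1f}% of the variance")
# The exact figure depends on the noise level and seed, but it sits well
# above what any 2 of the 5 raw features would cover on their own.
```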

Formula

Covariance Matrix: C = (1/n) Xᵀ X (where X is mean-centered)
Eigendecomposition: C v = λ v
Variance Explained = λᵢ / Σλⱼ × 100%
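A from-scratch NumPy sketch of these three steps, run on arbitrary demo data, might look like this:

```python
import numpy as np

# Arbitrary demo data: rows = samples, columns = features.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))

Xc = X - X.mean(axis=0)                     # mean-center each column
C = (Xc.T @ Xc) / len(Xc)                   # covariance matrix C = (1/n) Xᵀ X
eigvals, eigvecs = np.linalg.eigh(C)        # eigendecomposition C v = λ v
order = np.argsort(eigvals)[::-1]           # sort by eigenvalue, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = 100 * eigvals / eigvals.sum()   # variance explained per PC, in %
print(np.round(explained, 1))
```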

Finding the Essential Dimensions

In a world drowning in high-dimensional data — genomes with 20,000 genes, images with millions of pixels, customer profiles with hundreds of features — PCA answers a fundamental question: what are the few directions that matter most? Karl Pearson invented the method in 1901, and it remains one of the most widely used techniques in all of data science, from genomics to finance to natural language processing.

How PCA Works

PCA finds the principal components — orthogonal directions ordered by how much variance they capture. The first component points in the direction of maximum spread in the data. The second captures the most remaining spread perpendicular to the first. Mathematically, these are the eigenvectors of the data's covariance matrix, ordered by their eigenvalues. The simulation above visualizes these eigenvectors emerging from the data cloud.
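As a quick sanity check of those properties, a small scikit-learn sketch on synthetic correlated data (the data and the library choice are illustrative, not part of the simulation) could look like this:

```python
import numpy as np
from sklearn.decomposition import PCA

# Check the two properties just described: the principal directions are
# mutually orthogonal, and the variance they capture is non-increasing.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 6))    # correlated 6-D data

pca = PCA().fit(X)
V = pca.components_                                   # rows = principal directions
print(np.allclose(V @ V.T, np.eye(V.shape[0])))       # orthonormal -> True
print(np.all(np.diff(pca.explained_variance_) <= 0))  # sorted by variance -> True

Z = pca.transform(X)[:, :2]                           # project onto first 2 PCs
print(Z.shape)                                        # (300, 2)
```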

Compression Without Losing Meaning

The magic of PCA is that real-world data is often intrinsically low-dimensional. A dataset with 100 features might have 95% of its variance captured by just 3 principal components. By projecting onto these 3 components, you achieve roughly 33:1 compression while discarding only 5% of the variance. This enables visualization of high-dimensional data, reduces computational costs, and often improves model performance by removing noise.
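A hedged sketch of that workflow, with made-up sizes (1000 samples, 100 features, 3 latent factors), might look like this:

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data: 100 features that are mostly explained by 3 latent
# factors, compressed to 3 columns and then reconstructed.
rng = np.random.default_rng(3)
latent = rng.normal(size=(1000, 3))
X = latent @ rng.normal(size=(3, 100)) + 0.1 * rng.normal(size=(1000, 100))

pca = PCA(n_components=3).fit(X)
Z = pca.transform(X)                      # 1000 x 3 instead of 1000 x 100
X_hat = pca.inverse_transform(Z)          # approximate reconstruction

retained = pca.explained_variance_ratio_.sum()
print(f"compression ratio ~{X.shape[1] // Z.shape[1]}:1, "
      f"variance retained {100 * retained:.1f}%")
```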

Beyond Linear PCA

Standard PCA only captures linear relationships. For complex, nonlinear data, extensions like kernel PCA, t-SNE, and UMAP can discover curved manifolds and clusters that PCA misses. However, PCA remains the essential first step in any dimensionality reduction pipeline — it's fast, interpretable, and often surprisingly effective even for complex datasets.
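As a rough illustration of the linear limitation, the sketch below contrasts plain PCA with scikit-learn's KernelPCA on two concentric circles; the RBF kernel and gamma value are illustrative choices, not tuned recommendations.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles are a classic nonlinear structure that a linear
# projection cannot untangle.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

Z_lin = PCA(n_components=2).fit_transform(X)
Z_ker = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

# Scatter-plotting Z_lin and Z_ker colored by y would show the rings still
# interleaved after linear PCA, but typically separable in the kernel PCA
# embedding.
print(Z_lin.shape, Z_ker.shape)
```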

FAQ

What is PCA (Principal Component Analysis)?

PCA finds the directions of maximum variance in high-dimensional data and projects the data onto these directions. The first principal component captures the most variance, the second captures the most remaining variance orthogonal to the first, and so on.

When should you use PCA?

Use PCA when you have many correlated features and want to reduce dimensionality for visualization, noise reduction, or computational efficiency. PCA is most effective when the data lies near a low-dimensional subspace — that is, when a few directions capture most of the variance.

How do you choose the number of components?

Common methods include keeping enough components to explain 90-95% of the variance, applying the Kaiser criterion (keep components with eigenvalue > 1, most meaningful when PCA is run on standardized data), or looking for an 'elbow' in the scree plot. Cross-validation on downstream task performance is the most rigorous approach.
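For the variance-threshold rule, a small sketch (synthetic data, with the 95% threshold chosen arbitrarily) might look like this:

```python
import numpy as np
from sklearn.decomposition import PCA

# Pick the smallest k whose cumulative explained variance reaches 95%.
rng = np.random.default_rng(4)
X = rng.normal(size=(800, 10)) @ rng.normal(size=(10, 30))   # rank-10 data in 30-D

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cumulative, 0.95)) + 1
print(f"components needed for 95% of the variance: {k}")
```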

What are the limitations of PCA?

PCA only captures linear relationships, and it is sensitive to feature scaling, so standardize features first. Its components are orthogonal linear combinations of all the original features, which can make them hard to interpret and may not match the true underlying factors. For nonlinear dimensionality reduction, consider kernel PCA, t-SNE, or UMAP.
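To see the scaling sensitivity concretely, this sketch compares PCA on raw versus standardized features; the feature scales and their interpretations are invented for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# One feature measured in much larger units dominates unstandardized PCA.
rng = np.random.default_rng(5)
X = np.column_stack([
    rng.normal(scale=1.0, size=500),     # e.g. a ratio near 1
    rng.normal(scale=1.0, size=500),     # another small-scale feature
    rng.normal(scale=1000.0, size=500),  # e.g. income in dollars
])

raw = PCA().fit(X).explained_variance_ratio_
std = PCA().fit(StandardScaler().fit_transform(X)).explained_variance_ratio_
print("raw         :", np.round(raw, 3))   # first PC ≈ the large-scale feature
print("standardized:", np.round(std, 3))   # variance spread across components
```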
