K-Means Clustering: How Algorithms Discover Groups in Data

[Interactive simulator: intermediate, ~8 min. Caption: Silhouette ≈ 0.72, well-separated clusters]

With 3 clusters and moderate spread, K-means typically converges in 5-10 iterations and produces well-separated clusters with a silhouette score around 0.72.

Formula

Inertia (WCSS) = Σᵢ Σⱼ∈Cᵢ ||xⱼ - μᵢ||²
Silhouette(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other points in its own cluster and b(i) is the mean distance from i to the points in the nearest neighboring cluster
Centroid Update: μᵢ = (1/|Cᵢ|) Σⱼ∈Cᵢ xⱼ
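
To connect these formulas to working code, here is a minimal sketch (assuming NumPy and scikit-learn are available; the toy data and parameters are illustrative) that computes inertia by hand, checks it against scikit-learn's value, and reports the silhouette score:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data: 3 well-separated blobs (parameters chosen for illustration).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
labels, centroids = km.labels_, km.cluster_centers_

# Inertia (WCSS): sum of squared distances from each point to its centroid.
inertia = sum(np.sum((X[labels == i] - centroids[i]) ** 2) for i in range(3))
assert np.isclose(inertia, km.inertia_)

print(f"inertia = {inertia:.1f}, silhouette = {silhouette_score(X, labels):.2f}")
```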

Discovering Structure in Unstructured Data

K-means clustering is one of the most widely used unsupervised learning algorithms. Given a set of unlabeled data points, it discovers K groups (clusters) such that points within each group are similar to each other and different from points in other groups. Stuart Lloyd first described the algorithm in 1957 at Bell Labs, though it wasn't published until 1982. Today it powers customer segmentation, image compression, and anomaly detection.

The Algorithm: Assign and Update

K-means alternates between two simple steps. In the assignment step, each data point is assigned to the nearest centroid. In the update step, each centroid moves to the mean of all points assigned to it. These two steps repeat until convergence — when no points change their cluster assignment. The animation above lets you watch this process unfold in real time, seeing centroids shift and data points switch colors as clusters form.
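
The loop is short enough to sketch directly. The following is a minimal NumPy implementation of the two steps, not the code behind the animation; the function name and the simple random initialization are illustrative:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        # (an empty cluster keeps its old centroid).
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # converged: assignments stable
            break
        centroids = new_centroids
    return labels, centroids
```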

The Initialization Problem

K-means is guaranteed to converge, but not necessarily to the best solution. Different initial centroid positions can lead to dramatically different final clusterings. The K-means++ algorithm (Arthur and Vassilvitskii, 2007) addresses this by choosing initial centroids that are spread far apart, which typically produces much better results. Most modern implementations use K-means++ by default.

Choosing K and Evaluating Results

The biggest challenge in K-means is choosing the right number of clusters. The elbow method plots inertia (within-cluster sum of squares) against K and looks for a bend where adding clusters stops paying off. The silhouette score measures how well-separated clusters are on a scale from -1 to 1, with values near 1 indicating excellent partitioning. In practice, domain knowledge often matters more than any automated method — the right K is the one that produces actionable, interpretable groups.
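
As a sketch of how such a sweep might look (using scikit-learn; the toy data and range of K are arbitrary), the loop below collects both diagnostics so they can be inspected or plotted:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

# Sweep K, recording inertia (for the elbow plot) and the silhouette score.
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"K={k}: inertia={km.inertia_:8.1f}, "
          f"silhouette={silhouette_score(X, km.labels_):.3f}")
```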

FAQ

How does K-means clustering work?

K-means works in two alternating steps: (1) assign each data point to its nearest centroid, (2) move each centroid to the mean of its assigned points. These steps repeat until centroids stop moving. The algorithm always converges, though not necessarily to the global optimum.

How do you choose the right number of clusters (K)?

Common methods include the elbow method (plot inertia vs. K, look for an 'elbow'), silhouette analysis (choose K with highest silhouette score), and the gap statistic. Domain knowledge should also inform the choice.

What are the limitations of K-means?

K-means assumes spherical, equally-sized clusters. It struggles with non-convex shapes, varying densities, and outliers. It requires specifying K in advance and can converge to local optima depending on initialization. Alternatives like DBSCAN handle irregular shapes better.
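
One way to see the shape limitation concretely is a sketch like the following, which clusters scikit-learn's two-moons toy data with both algorithms; the DBSCAN eps value is a hand-tuned assumption for this particular dataset:

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-circles: non-convex clusters K-means cannot separate.
X, y = make_moons(n_samples=400, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# Agreement with the true moons (1.0 = perfect recovery).
print("K-means ARI:", adjusted_rand_score(y, km_labels))
print("DBSCAN  ARI:", adjusted_rand_score(y, db_labels))
```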

What is the K-means++ initialization?

K-means++ is an improved initialization method that spreads initial centroids apart. The first centroid is chosen randomly; subsequent centroids are chosen with probability proportional to their squared distance from the nearest existing centroid. This dramatically reduces the chance of poor convergence.
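
A minimal sketch of that selection rule in NumPy (the function name and seed handling are illustrative):

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    rng = np.random.default_rng(seed)
    # First centroid: a uniformly random data point.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen centroid.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        # Sample the next centroid with probability proportional to d2.
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)
```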
