Discovering Structure in Unstructured Data
K-means clustering is one of the most widely used unsupervised learning algorithms. Given a set of unlabeled data points, it discovers K groups (clusters) such that points within each group are similar to each other and different from points in other groups. Stuart Lloyd first described the algorithm in 1957 at Bell Labs, though it wasn't published until 1982. Today it powers customer segmentation, image compression, and anomaly detection.
The Algorithm: Assign and Update
K-means alternates between two simple steps. In the assignment step, each data point is assigned to its nearest centroid, usually measured by Euclidean distance. In the update step, each centroid moves to the mean of all points assigned to it. These two steps repeat until convergence — when no points change their cluster assignment. The animation above lets you watch this process unfold in real time, seeing centroids shift and data points switch colors as clusters form.
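The assign-and-update loop can be sketched in a few lines of NumPy. This is an illustrative implementation, not a production one; the function name, parameters, and defaults here are our own:

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    # Illustrative sketch of plain K-means; names and defaults are our own.
    rng = np.random.default_rng(seed)
    # Initialize centroids at k distinct, randomly chosen data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Convergence: stop when no point changes its assignment.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its assigned points.
        for j in range(k):
            mask = labels == j
            if mask.any():
                centroids[j] = points[mask].mean(axis=0)
    return centroids, labels
```

At convergence, every point is guaranteed to sit in the cluster of its nearest centroid, which is exactly the stopping condition described above.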
The Initialization Problem
K-means is guaranteed to converge, but only to a local optimum — not necessarily the best clustering. Different initial centroid positions can lead to dramatically different final clusterings. The K-means++ algorithm (Arthur and Vassilvitskii, 2007) mitigates this by choosing each new initial centroid with probability proportional to its squared distance from the nearest centroid already chosen, which spreads the seeds far apart and typically produces much better results. Most modern implementations use K-means++ by default.
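The K-means++ seeding step can be sketched as follows (a minimal NumPy version; the helper name is our own):

```python
import numpy as np

def kmeanspp_init(points, k, rng):
    # K-means++ seeding sketch: pick each new centroid with probability
    # proportional to its squared distance from the nearest centroid so far.
    centroids = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest chosen centroid.
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        probs = d2 / d2.sum()
        centroids.append(points[rng.choice(len(points), p=probs)])
    return np.array(centroids)
```

Because far-away points get high selection probability, the seeds tend to land in different regions of the data, avoiding the degenerate starts that hurt plain random initialization.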
Choosing K and Evaluating Results
The biggest challenge in K-means is choosing the right number of clusters. The elbow method plots inertia (the within-cluster sum of squared distances) against K and looks for the bend where adding more clusters stops paying off. The silhouette score measures how well each point fits its own cluster relative to the nearest other cluster: values near 1 indicate tight, well-separated clusters, while values near 0 suggest overlapping ones. In practice, domain knowledge often matters more than any automated method — the right K is the one that produces actionable, interpretable groups.
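As an illustration, the silhouette score can be computed directly from its definition — for each point, a is the mean distance to its own cluster and b is the smallest mean distance to any other cluster, and the score is (b − a) / max(a, b). This is a minimal O(n²) sketch assuming every cluster has at least two points; in practice, a library routine such as scikit-learn's silhouette_score does the same job:

```python
import numpy as np

def silhouette_score(points, labels):
    # Minimal sketch; assumes every cluster has at least two points.
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    scores = []
    for i in range(n):
        same = (labels == labels[i]) & (np.arange(n) != i)
        a = d[i, same].mean()             # mean distance within own cluster
        b = min(d[i, labels == c].mean()  # mean distance to nearest other cluster
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

A clustering that matches the true structure of the data scores near 1, while an arbitrary labeling of the same points scores near 0 — a quick sanity check when comparing candidate values of K.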