K-Means Clustering: Finding Structure in Unlabeled Data

Silhouette = 0.72 — 3 clusters converged in 5 iterations

K-means with k=3 on 120 points from 3 well-separated blobs converges in about 5 iterations. The silhouette score of 0.72 indicates well-defined clusters with good separation.

Formulas

Inertia: J = Σ_k Σ_{x∈C_k} ||x − μ_k||²
Centroid update: μ_k = (1/|C_k|) Σ_{x∈C_k} x
Silhouette: s(i) = (b(i) − a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other members of its own cluster and b(i) is the mean distance to the members of the nearest other cluster
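
To make the notation concrete, here is a minimal sketch (assuming NumPy and scikit-learn are available) that evaluates all three quantities on a toy two-cluster dataset; the points and labels are invented for illustration.

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Toy data: two tight clusters with a hand-made assignment
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
labels = np.array([0, 0, 1, 1])

# Centroid update: mu_k is the mean of the points assigned to cluster k
centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])

# Inertia: J sums squared distances from each point to its own centroid
inertia = sum(((X[labels == k] - centroids[k]) ** 2).sum() for k in range(2))

# Mean silhouette s(i), averaged over all points
print(inertia, silhouette_score(X, labels))
```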

Unsupervised Learning: No Labels Needed

Clustering is the flagship unsupervised learning task: finding natural groupings in data without labeled examples. K-means, proposed by Stuart Lloyd at Bell Labs in 1957 (though his report was not published until 1982) and named by James MacQueen in 1967, remains one of the most widely used clustering algorithms. Its simplicity (just two alternating steps) belies its effectiveness: k-means scales to millions of points, converges quickly, and produces interpretable results. From customer segmentation to image compression, it is often the first algorithm tried.

The Algorithm: Assign and Update

K-means starts by placing k centroids (cluster centers), ideally with k-means++ initialization, which spreads them out. It then alternates two steps: assign each point to its nearest centroid (carving the space into Voronoi regions), then move each centroid to the mean of its assigned points. The simulation animates this process: watch the centroids converge from their initial positions to stable locations that minimize within-cluster variance. Convergence is typically fast, often within 20 iterations; a minimal sketch of the loop follows.
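
A minimal NumPy sketch of the assign/update loop with a k-means++-style seeding might look like this; the function names are illustrative rather than taken from any library.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """Seed centroids far apart: each new centroid is sampled with
    probability proportional to its squared distance from the nearest
    centroid chosen so far."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min(((X[:, None] - np.array(centroids)) ** 2).sum(-1), axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        # Assign: each point goes to its nearest centroid (Voronoi regions)
        labels = ((X[:, None] - centroids) ** 2).sum(-1).argmin(axis=1)
        # Update: move each centroid to the mean of its assigned points;
        # keep an empty cluster's centroid where it is
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # converged: centroids stopped moving
            break
        centroids = new
    return centroids, labels
```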

Choosing K: The Eternal Question

K-means requires you to specify k in advance, but how do you know how many clusters exist? The elbow method runs k-means for k = 1, 2, 3, ... and plots inertia (the within-cluster sum of squares). The 'elbow' where the curve bends marks the point of diminishing returns and suggests a natural number of clusters. The silhouette method is more rigorous, measuring how well each point fits its own cluster versus the next-best cluster. The simulation lets you mismatch k against the true number of blobs to see what happens, as in the sweep sketched below.
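
With scikit-learn (assumed available), the sweep is a few lines; a sketch using a 3-blob setup similar to the simulation's (the silhouette score is undefined at k=1, so the loop starts at 2):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=120, centers=3, random_state=0)
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the within-cluster sum of squares; look for the elbow,
    # and for the k that maximizes the mean silhouette
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))
```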

Beyond K-Means

K-means assumes clusters are spherical and equally sized — assumptions that often fail on real data. DBSCAN discovers clusters of arbitrary shape by following density-connected regions. Gaussian Mixture Models allow soft (probabilistic) cluster assignments and handle elliptical clusters. Hierarchical clustering builds a tree of nested clusters without requiring k upfront. Despite these alternatives, k-means remains the go-to starting point because of its speed, simplicity, and surprising effectiveness.
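
As one example of soft assignment, scikit-learn's GaussianMixture exposes per-component membership probabilities directly; a brief sketch on blob data:

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=120, centers=3, random_state=0)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
# Each row sums to 1: the probability the point belongs to each component,
# unlike k-means' hard 0/1 assignments
print(gmm.predict_proba(X)[:3].round(3))
```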

FAQ

How does k-means clustering work?

K-means partitions data into k clusters by iterating two steps: (1) assign each point to the nearest centroid, and (2) recompute each centroid as the mean of its assigned points. This repeats until centroids stop moving. The algorithm minimizes within-cluster sum of squares (inertia). It is fast and simple but assumes spherical clusters and requires specifying k in advance.

How do you choose the right number of clusters?

The elbow method plots inertia versus k and looks for a 'bend' where adding clusters yields diminishing returns. The silhouette method measures how similar points are to their own cluster versus neighboring clusters; higher is better. The gap statistic compares the observed inertia to that expected under a null reference distribution, such as uniformly scattered data (a rough sketch follows). In practice, domain knowledge often guides the final choice.
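
A rough sketch of the gap statistic, under the simplifying assumptions of a uniform reference over the data's bounding box and no standard-error correction:

```python
import numpy as np
from sklearn.cluster import KMeans

def gap_statistic(X, k, n_refs=10, seed=0):
    """Gap(k) = E[log W_k(reference)] - log W_k(data); larger is better."""
    rng = np.random.default_rng(seed)
    log_wk = np.log(KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_)
    lo, hi = X.min(axis=0), X.max(axis=0)
    ref_logs = []
    for _ in range(n_refs):
        # Reference: points scattered uniformly over the data's bounding box
        R = rng.uniform(lo, hi, size=X.shape)
        ref_logs.append(np.log(KMeans(n_clusters=k, n_init=10,
                                      random_state=seed).fit(R).inertia_))
    return np.mean(ref_logs) - log_wk
```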

What is the silhouette score?

The silhouette score for each point measures how well it fits its assigned cluster. It ranges from -1 to 1: values near 1 mean the point is well-matched to its cluster and far from others; near 0 means it is on the boundary; near -1 means it is probably in the wrong cluster. The average silhouette score summarizes overall clustering quality.
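
scikit-learn's silhouette_samples returns the per-point values, which is handy for flagging boundary points; a short sketch:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples

X, _ = make_blobs(n_samples=120, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
s = silhouette_samples(X, labels)
# The lowest-scoring points sit on cluster boundaries (or in the wrong cluster)
print("worst-fitting points:", s.argsort()[:5], "min s(i):", s.min().round(3))
```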

When does k-means fail?

K-means struggles with: non-spherical clusters (elongated, ring-shaped), clusters of very different sizes, clusters with different densities, and outliers. For these cases, algorithms like DBSCAN (density-based), Gaussian Mixture Models (soft assignment), or spectral clustering are more appropriate. K-means also depends on initialization — k-means++ helps avoid poor starting positions.
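
A classic demonstration uses two interleaved half-moons: k-means cuts straight across them, while DBSCAN follows the density. A sketch (the eps and min_samples values are illustrative, not tuned):

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
print("k-means clusters:", set(km_labels))  # splits each moon across clusters
print("DBSCAN clusters:", set(db_labels))   # recovers the two curved shapes
```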

