Divide and Classify
A decision tree is one of the most intuitive machine learning models: it mirrors how humans make decisions through a series of questions. Given data with features and labels, the algorithm searches for the feature and threshold whose split yields the purest child groups, typically measured by Gini impurity or entropy. It then repeats recursively on each subset until a stopping criterion is met, such as a depth limit or a pure node. The result is a tree of if-then rules that classifies a new data point by traversing it from root to leaf.
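To make the recursion concrete, here is a minimal from-scratch sketch (a Python/NumPy illustration under the assumptions of numeric features and Gini impurity as the purity measure; names like best_split and grow are hypothetical, not from any library):

```python
import numpy as np

def gini(y):
    # Gini impurity: 1 minus the sum of squared class proportions.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # Try every feature and candidate threshold; keep the split whose
    # children have the lowest weighted impurity.
    best, best_score = None, gini(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            score = (left.sum() * gini(y[left]) +
                     (~left).sum() * gini(y[~left])) / len(y)
            if score < best_score:
                best, best_score = (j, t), score
    return best

def grow(X, y, depth=0, max_depth=3):
    # Stop when no split improves purity or the depth limit is reached;
    # a leaf predicts the majority class of its samples.
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        values, counts = np.unique(y, return_counts=True)
        return {"leaf": values[np.argmax(counts)]}
    j, t = split
    left = X[:, j] <= t
    return {"feature": j, "threshold": t,
            "left": grow(X[left], y[left], depth + 1, max_depth),
            "right": grow(X[~left], y[~left], depth + 1, max_depth)}

def predict_one(node, x):
    # Classify one point by following the if-then rules from root to leaf.
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]
```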
Axis-Aligned Partitions
Each split in a standard decision tree tests a single feature against a threshold, creating a boundary that is perpendicular to one feature axis. The simulation shows these rectangular partitions overlaid on the 2D data — you can see how the tree approximates complex boundaries (circles, XOR patterns) through many small rectangular regions. This axis-alignment is both a strength (simple, interpretable) and a limitation (requires many splits for diagonal boundaries).
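One way to see the axis-alignment directly is to print the rules of a fitted tree. The sketch below assumes scikit-learn (not necessarily what the simulation uses) and an XOR-style dataset generated on the fly; every printed rule compares a single feature to a single threshold:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# XOR-style data: class 1 when the two features disagree in sign.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = ((X[:, 0] > 0) != (X[:, 1] > 0)).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each rule tests one feature against one threshold, so the learned
# regions are axis-aligned rectangles in the (x0, x1) plane.
print(export_text(tree, feature_names=["x0", "x1"]))
```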
Growing and Pruning
An unpruned tree keeps splitting until every leaf is pure, perfectly classifying the training data but also memorizing its noise. The max_depth parameter caps this growth: shallower trees underfit (too simple) while deeper trees overfit (too complex). The sweet spot depends on the complexity of the data and its noise level. In practice, cross-validation is used to find a good depth, and ensemble methods aggregate many trees to obtain robust predictions.
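Here is a hedged sketch of that depth search, assuming scikit-learn and a synthetic noisy dataset (the exact scores will differ on other data):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Noisy two-moons data: deep enough trees will start memorizing the noise.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

# Score each candidate depth with 5-fold cross-validation and keep the
# depth with the best held-out accuracy; None means grow until pure.
for depth in [1, 2, 3, 5, 8, None]:
    scores = cross_val_score(
        DecisionTreeClassifier(max_depth=depth, random_state=0), X, y, cv=5)
    print(f"max_depth={depth}: mean CV accuracy = {scores.mean():.3f}")
```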
From Single Trees to Forests
The power of decision trees was amplified by ensemble methods. Random Forests (Breiman, 2001) train hundreds of trees on bootstrap samples with random feature subsets, then vote on predictions — dramatically reducing overfitting. Gradient Boosting (XGBoost, LightGBM) builds trees sequentially, with each tree correcting the errors of the previous ones. These ensemble methods consistently win machine learning competitions on structured/tabular data.
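As a rough illustration of the ensemble effect, the sketch below (again assuming scikit-learn and synthetic data, so the exact accuracies are not meaningful in themselves) compares one unpruned tree with a 200-tree random forest on the same noisy dataset:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One unpruned tree versus a forest of 200 trees, each fit on a bootstrap
# sample and restricted to a random subset of features at every split.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print("single tree test accuracy:  ", tree.score(X_test, y_test))
print("random forest test accuracy:", forest.score(X_test, y_test))
```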