The Most Interpretable Classifier
Decision trees are one of the few machine learning models that reason the way humans naturally do: by asking a sequence of yes/no questions. 'Is the email longer than 50 words? Does it contain the word "free"? Is the sender unknown?' Each answer leads to another question or to a final classification. This makes decision trees unusually interpretable: you can explain every prediction by tracing the path from root to leaf.
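To make that root-to-leaf path concrete, here is a minimal sketch of those three questions written as explicit nested conditionals. The feature names and the spam/not-spam labels at the leaves are hypothetical, chosen only to mirror the example above.

```python
def classify_email(word_count: int, contains_free: bool, sender_known: bool) -> str:
    # Root question: is the email longer than 50 words?
    if word_count > 50:
        # Next question on this branch: does it contain the word "free"?
        if contains_free:
            return "spam"
        return "not spam"
    # Other branch: is the sender unknown?
    if not sender_known:
        return "spam"
    return "not spam"

# Tracing the path explains the prediction: long email, contains "free" -> spam.
print(classify_email(word_count=120, contains_free=True, sender_known=False))
```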
Building the Tree: Greedy Splitting
The algorithm builds the tree top-down, greedily choosing the best split at each node. 'Best' means the split that most reduces impurity, typically measured by Gini impurity or by information gain (the reduction in entropy). Gini impurity is 1 minus the sum of squared class proportions, so a perfectly pure node scores 0. The algorithm considers every feature and every candidate threshold, evaluating how well each split separates the classes. This greedy approach is fast, but it does not guarantee the globally optimal tree.
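The sketch below shows this threshold scan for a single numeric feature: compute Gini impurity, try every distinct value as a threshold, and keep the split with the lowest weighted child impurity. The toy data is made up for illustration.

```python
import numpy as np

def gini(labels: np.ndarray) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

def best_split(x: np.ndarray, y: np.ndarray):
    """Scan every distinct value of one feature as a candidate threshold,
    keeping the split with the lowest size-weighted child impurity."""
    best_thresh, best_impurity = None, float("inf")
    for thresh in np.unique(x):
        left, right = y[x <= thresh], y[x > thresh]
        if len(left) == 0 or len(right) == 0:
            continue  # degenerate split: all points on one side
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if weighted < best_impurity:
            best_thresh, best_impurity = float(thresh), weighted
    return best_thresh, best_impurity

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_split(x, y))  # best threshold 3.0, weighted impurity 0.0
```

A real implementation repeats this scan over every feature at every node, which is why training is fast but only locally optimal.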
The Overfitting Problem
An unconstrained decision tree can classify every training point perfectly by growing a leaf for each one. This memorization of noise produces flawless training accuracy and poor real-world performance. The simulation above shows how increasing tree depth improves training accuracy while producing increasingly complex, jagged decision boundaries. Constraining depth is the key lever for balancing training accuracy against generalization.
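A minimal sketch of this trade-off, using scikit-learn; the dataset and the grid of depths are arbitrary choices for illustration, not part of the simulation above. The gap between train and test accuracy widens as depth grows.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy two-class data, so a perfect training fit must memorize noise.
X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (1, 3, 5, 10, None):  # None = grow until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"max_depth={depth}: train={tree.score(X_train, y_train):.2f}, "
          f"test={tree.score(X_test, y_test):.2f}")
```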
From Single Trees to Forests
Individual decision trees are simple but fragile: small changes in the training data can produce a completely different tree. Random Forests (Breiman, 2001) address this by growing hundreds of trees, each on a bootstrap sample of the data with a random subset of features considered at each split, and averaging their predictions. Gradient Boosted Trees (e.g., XGBoost, LightGBM) instead build trees sequentially, each one correcting the errors of those before it. These ensemble methods dominate Kaggle competitions and real-world tabular applications alike.
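As a rough sketch of the payoff, the snippet below compares a single tree against both ensemble styles on the same synthetic data. It uses scikit-learn's built-in implementations rather than XGBoost or LightGBM themselves, and the dataset parameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "random forest (100 trees)": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    # 5-fold cross-validation; the ensembles typically score noticeably higher.
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```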