Decision trees are the closest thing machine learning has to a crystal ball that explains its own predictions. Unlike black-box models that spit out answers without context, a decision tree breaks down a problem into a series of logical yes-or-no questions, much like the childhood game of 20 Questions. This transparency makes them invaluable for industries where understanding the "why" behind a prediction is just as important as the prediction itself.
From 20 Questions to AI: How Decision Trees Think
At their core, decision trees operate on a simple premise: they turn data into a flowchart of decisions. Each branch represents a question based on a feature like age, income, or petal length, while each leaf node represents a final classification or prediction. For example, a tree might ask, "Is income greater than $50,000?" If the answer is yes, it could then ask, "Is age over 30?" to determine whether a customer will buy a product.
This structure mirrors the way humans solve problems by narrowing down possibilities. The key difference is that decision trees don’t rely on guesswork—they learn the questions directly from data. By systematically evaluating every possible split in the data, the model identifies the questions that most effectively separate different classes, such as buyers from non-buyers.
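To make this concrete, here is a tiny hand-written sketch of the income-and-age example as plain Python. The thresholds and return values are illustrative stand-ins, not learned from data; a real decision tree discovers them automatically:

def will_buy(income, age):
    # Hand-coded stand-in for a learned tree; thresholds are illustrative
    if income > 50_000:        # First question: income
        if age > 30:           # Second question: age
            return "buyer"
        return "non-buyer"
    return "non-buyer"

print(will_buy(income=72_000, age=35))  # buyer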
The Science Behind the Splits: Entropy and Information Gain
So how does a decision tree decide which questions to ask? The answer lies in two critical concepts: entropy and information gain. Entropy measures the level of uncertainty or disorder in a dataset. A perfectly pure group—where all examples belong to the same class—has an entropy of 0, while a 50/50 split between two classes reaches the maximum entropy of 1.
import numpy as np

def entropy(p):
    # p is the proportion of samples in the positive class
    if p == 0 or p == 1:
        return 0  # No uncertainty
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

The tree’s goal is to reduce entropy by finding splits that make subgroups as pure as possible. This reduction is called information gain, which quantifies how much a split improves the model’s certainty. For instance, splitting a dataset of 10 samples (5 positive, 5 negative) into two perfectly pure groups—one with all positives and another with all negatives—yields an information gain of 1.0, the highest possible improvement.
def information_gain(parent_entropy, left_group, right_group):
    # left_group and right_group are lists of 0/1 labels after the split
    n_left = len(left_group)
    n_right = len(right_group)
    n_total = n_left + n_right
    p_left = sum(left_group) / n_left if n_left > 0 else 0
    p_right = sum(right_group) / n_right if n_right > 0 else 0
    weighted_entropy = ((n_left / n_total) * entropy(p_left) +
                        (n_right / n_total) * entropy(p_right))
    return parent_entropy - weighted_entropy

By repeatedly selecting the split with the highest information gain, the tree refines its questions until it reaches a stopping condition—either all leaf nodes are pure or a predefined depth limit is hit.
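As a quick sanity check of the worked example above, we can run both helpers on that 10-sample dataset, then sweep every observed value of a single feature as a candidate threshold to see how a tree picks its best question. The feature values and labels below are invented for illustration:

parent = entropy(5 / 10)  # 5 positives out of 10: maximum uncertainty, 1.0
print(information_gain(parent, [1] * 5, [0] * 5))  # 1.0, a perfect split

# Greedy split search: try each observed value as a threshold and keep
# the split with the highest information gain (made-up data)
feature = [1.2, 3.4, 2.0, 5.1, 4.8, 0.9, 3.9, 2.5, 4.2, 1.7]
labels = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0]
gain, threshold = max(
    (information_gain(parent,
                      [label for value, label in zip(feature, labels) if value <= t],
                      [label for value, label in zip(feature, labels) if value > t]), t)
    for t in feature
)
print(f"Best split: feature <= {threshold} (gain {gain:.2f})")  # feature <= 2.5 (gain 1.00)

This brute-force sweep over candidate thresholds is exactly what happens inside each node of a tree, repeated recursively on every subgroup it creates.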
Building Your First Decision Tree with Python
Implementing a decision tree in Python is straightforward using the scikit-learn library. Let’s walk through a practical example using the classic Iris dataset, which contains measurements of iris flowers and their species.
First, we load the dataset and split it into training and testing sets to evaluate the model’s performance. A shallow tree with a maximum depth of 3 is used to keep the model interpretable.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")

The output shows an accuracy of 0.967, meaning the model correctly classifies 96.7% of the test samples. But accuracy alone doesn’t tell the full story—the real value lies in the tree’s structure. We can extract the decision rules as plain text to see exactly how the model makes its predictions.
from sklearn.tree import export_text
rules = export_text(tree, feature_names=iris.feature_names, class_names=iris.target_names)
print(rules)

The output reveals the tree’s step-by-step logic:
|--- petal length (cm) <= 2.45
|   |--- class: setosa
|--- petal length (cm) > 2.45
|   |--- petal width (cm) <= 1.75
|   |   |--- petal length (cm) <= 4.95
|   |   |   |--- class: versicolor
|   |   |--- petal length (cm) > 4.95
|   |   |   |--- class: virginica
|   |--- petal width (cm) > 1.75
|   |   |--- class: virginica

This flowchart-like structure is the decision tree in action. Each line represents a question, and each indentation level shows the progression of decisions. For example, if a flower’s petal length is 2.3 cm, the model immediately classifies it as setosa without further questions.
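We can confirm this behavior by querying the trained model directly. The four measurements below (sepal length, sepal width, petal length, and petal width, in cm) are made up for illustration; only the 2.3 cm petal length matters here, because it settles the very first question:

# Hypothetical flower: a petal length of 2.3 cm falls below the 2.45 cm
# threshold, so the remaining features are never consulted
sample = [[5.0, 3.5, 2.3, 0.8]]
prediction = tree.predict(sample)
print(iris.target_names[prediction[0]])  # setosa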
Balancing Simplicity and Performance: Avoiding Overfitting
While decision trees are intuitive, they have a notorious weakness: overfitting. A tree that grows too deep can memorize noise in the training data instead of learning general patterns, leading to poor performance on unseen data. This is why techniques like limiting the tree’s depth, setting a minimum number of samples per leaf, or pruning branches are essential.
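Depth and sample-count limits are demonstrated in the snippet that follows this sketch. Pruning itself can also be automated: scikit-learn supports cost-complexity pruning, which collapses branches whose impurity reduction does not justify their added complexity. A minimal sketch, reusing the earlier iris training split; the ccp_alpha value of 0.01 is illustrative, not tuned:

# Post-pruning via cost-complexity pruning: larger ccp_alpha values
# prune more aggressively (0.01 is an illustrative value, not a tuned one)
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=42)
pruned.fit(X_train, y_train)
print(pruned.get_depth())  # usually shallower than an unconstrained tree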
# Example: Controlling overfitting
tree = DecisionTreeClassifier(
    max_depth=5,
    min_samples_split=10,
    min_samples_leaf=5,
    random_state=42
)
tree.fit(X_train, y_train)  # Refit so the constrained tree can be used below

Additionally, visualizing the tree can help identify areas where it might be overcomplicating its decisions. Tools like matplotlib can plot the entire tree, coloring nodes by the majority class to highlight regions of high confidence.
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
plt.figure(figsize=(15, 6))
plot_tree(
    tree,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True,
    rounded=True,
    fontsize=10
)
plt.title('Decision Tree for Iris Dataset')
plt.show()

The Future of Interpretable AI
Decision trees remain one of the most accessible yet powerful tools in machine learning, striking a balance between performance and transparency. As AI adoption grows in regulated industries like healthcare and finance, the ability to explain predictions isn’t just a nice-to-have—it’s a requirement. With libraries like scikit-learn, building and interpreting decision trees has never been easier, making them an ideal starting point for anyone exploring AI.
The next time you face a classification problem, consider letting a decision tree ask the right questions—literally.