Understanding Decision Trees in Machine Learning
1. Introduction
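Decision Trees are a type of supervised machine learning algorithm used for classification
and regression tasks. They make a prediction by applying a sequence of tests on feature
values, arranged as a tree: each test sends a sample down one branch until it reaches a leaf
that holds the prediction.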
2. Structure of a Decision Tree
A decision tree is composed of the following:
- Root Node: Represents the entire dataset and the first decision to be made.
- Internal Nodes: Represent a test on a feature (e.g., "petal length <= 2.45") that splits the data.
- Leaf Nodes: Represent the final output or decision (class or value).
- Branches: Show the outcome of a decision and connect nodes.
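To see these parts in a fitted model, the sketch below trains a shallow tree on the Iris dataset and prints it as text. It assumes scikit-learn is installed; max_depth=2 and random_state=0 are illustrative choices, not part of the original text. In the printout, the first test is the root node, nested tests are internal nodes, and lines containing "class:" are leaf nodes.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed structure small (the depth limit is an illustrative assumption).
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# export_text renders each split as an indented rule and each leaf as "class: ...".
structure = export_text(clf, feature_names=list(iris.feature_names))
print(structure)

# The fitted tree exposes its node and leaf counts directly.
n_nodes = clf.tree_.node_count
n_leaves = clf.get_n_leaves()
print(f"nodes: {n_nodes}, leaves: {n_leaves}, non-leaf nodes (including root): {n_nodes - n_leaves}")
```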
3. Types of Decision Trees
- Classification Trees: Output is a class label (e.g., "Yes" or "No").
- Regression Trees: Output is a continuous value (e.g., house price).
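The difference between the two types can be sketched with scikit-learn's two estimators. The data below is a made-up toy set (house size in square metres, a "Yes"/"No" label, and a price in thousands), used only for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical toy data: one feature, house size in square metres.
X = [[50], [60], [80], [100], [120], [150]]

# Classification target: a discrete label.
y_class = ["No", "No", "No", "Yes", "Yes", "Yes"]
# Regression target: a continuous value (price in thousands, invented for this sketch).
y_price = [150.0, 180.0, 240.0, 300.0, 360.0, 450.0]

clf = DecisionTreeClassifier(random_state=0).fit(X, y_class)
reg = DecisionTreeRegressor(random_state=0).fit(X, y_price)

label = clf.predict([[105]])[0]  # a class label
price = reg.predict([[105]])[0]  # a floating-point value
print(label, price)
```

A classification tree returns one of the labels seen during training, while a regression tree returns the mean target value of the training samples in the leaf that the input falls into.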
4. Key Terminologies
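- Gini Impurity: Measures how mixed the classes in a node are; it is 0 when every sample in the node belongs to one class. Used in classification.
- Entropy & Information Gain: Entropy measures the class uncertainty in a node; information gain is the reduction in entropy achieved by a split. Used in classification.
- Variance Reduction: The decrease in target variance achieved by a split. Used in regression trees.
- Pruning: Technique to reduce overfitting by removing branches that contribute little to predictive accuracy.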
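The split criteria above can be written out in a few lines. This is a minimal sketch using the standard textbook definitions, not the code of any particular library; the function names and the example labels are hypothetical.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variance of the two children."""
    def var(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)
    n = len(parent)
    return var(parent) - (len(left) / n) * var(left) - (len(right) / n) * var(right)

# Hypothetical node with 4 "Yes" and 4 "No" labels, split into two children.
parent = ["Yes"] * 4 + ["No"] * 4
left = ["Yes", "Yes", "Yes", "No"]
right = ["Yes", "No", "No", "No"]

print(gini(parent))                           # 0.5 for a 50/50 node
print(entropy(parent))                        # 1.0 bit for a 50/50 node
print(information_gain(parent, left, right))  # positive: the children are purer than the parent
```

A tree-building algorithm evaluates candidate splits with one of these measures and keeps the split that scores best.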
5. Advantages of Decision Trees
- Easy to interpret and visualize.
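- Handles both numerical and categorical data, although some implementations (scikit-learn, for example) expect categorical features to be numerically encoded first.
- Requires little data preprocessing: feature scaling or normalization is not needed.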
6. Disadvantages
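- Prone to overfitting, especially when the tree is grown to full depth.
- Unstable: small changes in the training data can produce a very different tree.
- Can be biased toward the majority class when the dataset is imbalanced.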
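The overfitting risk can be observed by comparing a fully grown tree with a pruned one. The sketch below assumes scikit-learn and its bundled breast cancer dataset; the 70/30 split and ccp_alpha=0.01 are arbitrary illustrative values, not tuned settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Fully grown tree: keeps splitting until the leaves are pure.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pruned tree: ccp_alpha > 0 applies cost-complexity pruning, which removes
# branches whose impurity reduction is too small to justify their size.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_train, y_train)

for name, model in [("full", full), ("pruned", pruned)]:
    print(
        f"{name:>6}: nodes={model.tree_.node_count:3d} "
        f"train_acc={model.score(X_train, y_train):.3f} "
        f"test_acc={model.score(X_test, y_test):.3f}"
    )
```

A large gap between training and test accuracy for the full tree is the usual sign of overfitting; the pruned tree gives up some training accuracy in exchange for a much smaller model. The exact numbers depend on the split and on the scikit-learn version.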
7. Popular Algorithms
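- ID3 (Iterative Dichotomiser 3): Selects splits by information gain; designed for categorical features.
- C4.5 / C5.0: Successors to ID3; C4.5 uses gain ratio, handles continuous features, and adds pruning.
- CART (Classification and Regression Trees): Builds binary trees using Gini impurity for classification and variance reduction for regression.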
8. Use Cases
- Customer churn prediction
- Credit scoring
- Medical diagnosis
- Marketing segmentation
9. Decision Tree in Python (Example Code)
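# Load the Iris dataset and fit a decision tree classifier
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
iris = load_iris()
# Fix random_state so the fitted tree is reproducible
clf = DecisionTreeClassifier(random_state=0)
clf.fit(iris.data, iris.target)
# Draw the fitted tree with readable feature and class names
plot_tree(clf, feature_names=iris.feature_names, class_names=list(iris.target_names), filled=True)
plt.show()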
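The example above fits the tree on the full dataset, which shows the API but not how well the tree predicts unseen samples. A minimal sketch of a held-out evaluation follows; the 75/25 split and random_state=0 are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Hold out 25% of the samples for testing, keeping class proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0, stratify=iris.target
)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Score the tree on samples it did not see during training.
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"held-out accuracy: {accuracy:.3f}")

# Map a predicted class index back to its species name.
predicted_name = iris.target_names[clf.predict(X_test[:1])[0]]
print(predicted_name)
```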
10. Conclusion
Decision Trees are a powerful and interpretable tool in machine learning. Despite their
simplicity, they are the basis of more advanced ensemble methods like Random Forests and
Gradient Boosted Trees.