1. Decision Trees
A Decision Tree is a machine learning model that is represented as a flowchart-like structure, where:
Internal nodes represent a decision or test on an attribute.
Branches represent the outcome of the decision or test.
Leaf nodes represent the result or output, such as a class label (in classification) or a value (in
regression).
The tree is structured in a way that helps decision-making based on the features (attributes) of the data.
The flow from the root to the leaf nodes provides a decision rule that helps predict the class or value for
a given set of features.
How Decision Trees Work:
1. Start at the Root Node: The root node represents the entire dataset. We begin by selecting a
feature (attribute) that best splits the data into different classes or outcomes. This split is
determined by specific criteria like Gini Impurity, Information Gain, or Variance Reduction.
2. Split Data Based on Features: At each internal node, the dataset is split based on the feature
that provides the best separation between classes or predicts the target value the best.
3. Continue Splitting: This process continues recursively at each internal node until we reach the
leaf nodes. These leaf nodes hold the final decision (class label for classification or value for
regression).
4. Make a Prediction: For new, unseen data, the prediction is made by following the tree structure
from the root to a leaf node, applying the decisions (tests) along the way.
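The four steps above can be sketched in a few lines of Python. The sketch below is only an illustration and assumes scikit-learn is available; the built-in iris dataset stands in for "the data".

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Steps 1-3: fit() performs the recursive splitting, here with the Gini
# criterion and a depth limit so the tree stays small.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)

# Step 4: predict() routes a new, unseen sample from the root to a leaf.
print(clf.predict([[5.1, 3.5, 1.4, 0.2]]))  # a setosa-like sample -> class 0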
Decision Rules:
Definition: A decision rule is a simple "if-then" condition derived from the decision tree.
Example: Consider a decision tree for classifying whether someone will buy a product based on
their age and income:
o If Age ≤ 30 and Income > 50,000, then "Buy Product" (Class 1).
o If Age > 30 and Income ≤ 50,000, then "Don't Buy Product" (Class 0).
These rules are extracted from the paths leading to the leaf nodes in the decision tree.
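One convenient way to obtain such rules is to print every root-to-leaf path of a fitted tree, for example with scikit-learn's export_text. The toy age/income data below is invented, so the printed thresholds will not match the two example rules verbatim.

from sklearn.tree import DecisionTreeClassifier, export_text

# Invented toy data: [Age, Income]; 1 = "Buy Product", 0 = "Don't Buy Product".
X = [[25, 60000], [45, 40000], [30, 52000], [50, 90000], [23, 20000]]
y = [1, 0, 1, 0, 0]

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each printed root-to-leaf path corresponds to one if-then decision rule.
print(export_text(clf, feature_names=["Age", "Income"]))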
Example of Decision Tree for Classification:
Let’s consider a small example to illustrate how a decision tree works for classification:
Problem:
Classify whether a person will play tennis based on the weather conditions (Outlook, Temperature,
Humidity, Wind).
Attributes: Outlook (Sunny, Overcast, Rain), Temperature (Hot, Mild, Cool), Humidity (High,
Low), Wind (Weak, Strong)
Target/Label: PlayTennis (Yes, No)
Dataset:
Outlook    Temperature  Humidity  Wind    PlayTennis
Sunny      Hot          High      Weak    No
Sunny      Hot          High      Strong  No
Overcast   Hot          High      Weak    Yes
Rain       Mild         High      Weak    Yes
Rain       Cool         Low       Weak    Yes
Rain       Cool         Low       Strong  No
Overcast   Cool         Low       Strong  Yes
Sunny      Mild         High      Weak    No
Sunny      Cool         Low       Weak    Yes
Rain       Mild         Low       Weak    Yes
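For readers who want to follow along in code, the same table can be written down directly. The sketch below assumes pandas, used here only as a convenient container for the rows.

import pandas as pd

df = pd.DataFrame(
    [["Sunny", "Hot", "High", "Weak", "No"],
     ["Sunny", "Hot", "High", "Strong", "No"],
     ["Overcast", "Hot", "High", "Weak", "Yes"],
     ["Rain", "Mild", "High", "Weak", "Yes"],
     ["Rain", "Cool", "Low", "Weak", "Yes"],
     ["Rain", "Cool", "Low", "Strong", "No"],
     ["Overcast", "Cool", "Low", "Strong", "Yes"],
     ["Sunny", "Mild", "High", "Weak", "No"],
     ["Sunny", "Cool", "Low", "Weak", "Yes"],
     ["Rain", "Mild", "Low", "Weak", "Yes"]],
    columns=["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"],
)

# How the label splits across Outlook values (relevant when choosing the root below).
print(df.groupby("Outlook")["PlayTennis"].value_counts())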
Building the Decision Tree:
1. Step 1: Select the Root Node: The root node is selected using the feature that best splits the
data, judged by a criterion such as Information Gain. After calculating the Information Gain for
each attribute, we find that Outlook is the best feature to split on, as it has the highest
Information Gain (a small calculation sketch follows this list).
2. Step 2: Split Data: The tree branches into three based on the possible values of Outlook (Sunny,
Overcast, Rain).
3. Step 3: Continue Splitting: Now, for each of these branches, we further split based on the next
best feature (say, Humidity or Wind).
o For Sunny, the tree might split based on Humidity: If Humidity = High, predict "No"
(Leaf node), otherwise "Yes".
o For Rain, the tree might split based on Wind: If Wind = Weak, predict "Yes" (Leaf node),
otherwise "No".
4. Step 4: Reach Leaf Nodes: The decision tree will keep splitting until it reaches leaf nodes with a
predicted label.
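The Information Gain calculation mentioned in Step 1 can be sketched in plain Python, using only the Outlook and PlayTennis columns of the 10-row table above (no libraries assumed).

from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain",
           "Rain", "Overcast", "Sunny", "Sunny", "Rain"]
play    = ["No", "No", "Yes", "Yes", "Yes",
           "No", "Yes", "No", "Yes", "Yes"]

base = entropy(play)
remainder = sum(
    (outlook.count(v) / len(play))
    * entropy([p for o, p in zip(outlook, play) if o == v])
    for v in set(outlook)
)
print("Information Gain(Outlook) =", base - remainder)

For this 10-row table the printed gain comes out to roughly 0.32, which is larger than the gain obtained for Temperature, Humidity, or Wind, and that is why Outlook ends up at the root.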
Decision Tree Diagram:
Below is a simplified decision tree for the above example.
                    Outlook
                  /    |    \
             Sunny  Overcast  Rain
               |        |       |
           Humidity    Yes     Wind
            /    \             /    \
         High    Low        Weak   Strong
           |      |           |       |
           No    Yes         Yes      No
Explanation of the Tree:
1. Root Node: The first decision is based on Outlook.
o If Outlook is Overcast, predict Yes (PlayTennis).
o If Outlook is Sunny, we move to the next test: Humidity.
If Humidity is High, predict No.
If Humidity is Low, predict Yes.
o If Outlook is Rain, the next test is Wind.
If Wind is Weak, predict Yes.
If Wind is Strong, predict No.
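The same tree can be written out as plain if-then code. The sketch below is only a restatement of the diagram above, not a general algorithm.

def play_tennis(outlook, humidity, wind):
    # Root test: Outlook.
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":                      # next test: Humidity
        return "Yes" if humidity == "Low" else "No"
    if outlook == "Rain":                       # next test: Wind
        return "Yes" if wind == "Weak" else "No"
    raise ValueError("unknown Outlook value")

print(play_tennis("Sunny", "High", "Weak"))    # No  (matches row 1 of the table)
print(play_tennis("Rain",  "Low",  "Strong"))  # No  (matches row 6 of the table)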
Advantages:
Easy to Interpret: The model is visual and intuitive, making it easy to explain to non-experts.
Minimal Data Preparation: Little preprocessing is required (e.g., no normalization or
scaling is needed).
Disadvantages:
Overfitting: Decision trees can easily overfit to training data, especially with deep trees.
Instability: Small changes in the data can result in a completely different tree.
Bias toward Dominant Classes: Decision trees can be biased if the dataset is imbalanced.
2. Generating Decision Trees
To construct a Decision Tree:
1. Choose the Best Attribute:
o Use measures like Information Gain or the Gini Index to identify the attribute that
splits the data most effectively.
2. Recursively Split Data:
o Apply the splitting process to each subset until the stopping criteria are met.
3. Assign Labels or Predictions:
o At the leaf nodes, assign the majority class label (for classification) or the average
value (for regression).
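A minimal sketch of this recursive procedure, in the style of ID3 (categorical attributes, Information Gain, majority label at the leaves), is shown below. It is written for clarity rather than efficiency and is one possible realisation of the three steps above, not the only one.

from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    n = len(labels)
    remainder = sum(
        (sum(1 for r in rows if r[attr] == v) / n)
        * entropy([lab for r, lab in zip(rows, labels) if r[attr] == v])
        for v in {r[attr] for r in rows}
    )
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    # Stopping criteria: pure node, or no attributes left -> leaf with majority label.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Step 1: choose the attribute with the highest Information Gain.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    # Step 2: split the data on each value of that attribute and recurse.
    tree = {best: {}}
    for v in {r[best] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        tree[best][v] = build_tree([rows[i] for i in idx],
                                   [labels[i] for i in idx],
                                   [a for a in attrs if a != best])
    return tree

# Applied to the PlayTennis table from Section 1 (Temperature dropped purely
# for brevity), this reproduces the hand-drawn tree: Outlook at the root,
# Humidity under Sunny, Wind under Rain (key order in the printed dict may vary).
cols = ("Outlook", "Humidity", "Wind")
raw = [("Sunny", "High", "Weak", "No"),     ("Sunny", "High", "Strong", "No"),
       ("Overcast", "High", "Weak", "Yes"), ("Rain", "High", "Weak", "Yes"),
       ("Rain", "Low", "Weak", "Yes"),      ("Rain", "Low", "Strong", "No"),
       ("Overcast", "Low", "Strong", "Yes"),("Sunny", "High", "Weak", "No"),
       ("Sunny", "Low", "Weak", "Yes"),     ("Rain", "Low", "Weak", "Yes")]
rows = [dict(zip(cols, r[:3])) for r in raw]
labels = [r[3] for r in raw]
print(build_tree(rows, labels, list(cols)))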
3. Pruning Decision Trees
Pruning is the process of reducing the size of a decision tree to prevent overfitting and improve
generalization.
Pre-Pruning: Stop the tree's growth early based on conditions like maximum depth or
minimum data at a node.
Post-Pruning: Grow the entire tree and then remove branches that do not improve
performance on a validation set.
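Both styles can be tried with scikit-learn. The sketch below is illustrative: the bundled breast-cancer dataset is only a stand-in, max_depth and min_samples_leaf serve as pre-pruning, and cost-complexity pruning (ccp_alpha) selected on a validation split serves as post-pruning.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Pre-pruning: limit growth up front with depth / leaf-size constraints.
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0).fit(X_tr, y_tr)

# Post-pruning: grow a full tree, compute the cost-complexity pruning path,
# then keep the pruned tree that scores best on the validation split.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
candidates = [
    DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
    for a in path.ccp_alphas if a >= 0.0
]
post = max(candidates, key=lambda t: t.score(X_val, y_val))

print("pre-pruned depth :", pre.get_depth(),  "val accuracy:", pre.score(X_val, y_val))
print("post-pruned depth:", post.get_depth(), "val accuracy:", post.score(X_val, y_val))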
4. Decision Rules
Decision Rules are IF-THEN conditions derived from Decision Trees. For example, a rule might
look like:
IF age > 30 AND income > 50K THEN approve loan.
These rules provide a straightforward way to represent the tree’s logic, offering interpretability
and flexibility in practical applications.
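The example rule above maps directly onto ordinary code, which is part of what makes rule representations convenient. The function and field names below are illustrative only.

# The rule "IF age > 30 AND income > 50K THEN approve loan" as a predicate.
def approve_loan(age, income):
    return age > 30 and income > 50_000

print(approve_loan(45, 80_000))  # True  -> approve
print(approve_loan(25, 80_000))  # False -> do not approve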
5. Limitations of Decision Trees and Rules
1. Overfitting:
o Decision Trees can grow excessively, capturing noise in the training data.
o Pruning helps mitigate this but may lead to underfitting if over-pruned.
2. Bias Towards Dominant Features:
o Trees can favor features with many levels (e.g., ID numbers) or numeric features
with high variance.
3. Instability:
o Small changes in the training data can lead to entirely different tree structures (a short demonstration follows this list).
4. Performance on Complex Relationships:
o Decision Trees struggle with datasets where features interact in complex, non-linear ways.
5. Scalability:
o For very large datasets, tree construction can become computationally expensive.
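The instability point (item 3) is easy to observe empirically: trees fit on two bootstrap resamples of the same data usually differ in structure. The sketch below is only a demonstration, with iris as a stand-in dataset.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Two bootstrap resamples of the same dataset.
idx_a = rng.choice(len(X), size=len(X), replace=True)
idx_b = rng.choice(len(X), size=len(X), replace=True)

tree_a = DecisionTreeClassifier(random_state=0).fit(X[idx_a], y[idx_a])
tree_b = DecisionTreeClassifier(random_state=0).fit(X[idx_b], y[idx_b])

# The two learned structures typically differ, even though both samples
# come from the same underlying data.
print(export_text(tree_a) == export_text(tree_b))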