
Decision Tree

A decision tree is a supervised machine learning algorithm used for both classification
and regression tasks. It works by splitting the data into subsets based on the value of
input features, creating a tree-like structure of decisions. Each internal node represents a
decision based on a feature, each branch represents the outcome of that decision, and
each leaf node represents a final outcome (class label or continuous value).
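
To make this structure concrete, here is a minimal sketch of how such a tree could be represented in Python. The Node class, its field names, and the predict helper are illustrative choices, not part of any particular library:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class Node:
    """One node of a decision tree (illustrative sketch)."""
    feature: Optional[str] = None        # feature tested at this node (None for a leaf)
    threshold: Optional[float] = None    # threshold for numerical splits, e.g. Age < 25
    branches: Dict[Any, "Node"] = field(default_factory=dict)  # outcome -> child node
    prediction: Optional[Any] = None     # class label or value stored at a leaf

def predict(node: Node, sample: Dict[str, Any]) -> Any:
    """Traverse the tree from the root to a leaf for one sample."""
    while node.prediction is None:                 # still at an internal node
        value = sample[node.feature]
        if node.threshold is not None:             # numerical split: branch on True/False
            value = value < node.threshold
        node = node.branches[value]                # follow the branch for this outcome
    return node.prediction                         # leaf node holds the final outcome
```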

Key Components of a Decision Tree:

1. Root Node: The topmost node that represents the entire dataset.
2. Internal Nodes: Nodes that split the data based on a feature.
3. Leaf Nodes: Terminal nodes that provide the final decision or prediction.
4. Branches: Paths from the root to the leaves, representing decision rules.

How a Decision Tree Works:

1. Feature Selection: The algorithm selects the best feature to split the data based
on criteria like Gini impurity, information gain, or variance reduction.
2. Splitting: The dataset is divided into subsets based on the selected feature.
3. Recursion: The process is repeated for each subset until a stopping condition is
met (e.g., maximum depth, minimum samples per leaf).
4. Prediction: For a new data point, the tree is traversed from the root to a leaf
node to make a prediction (a minimal scikit-learn sketch follows this list).
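
In practice these steps are handled by a library. The following is a minimal sketch, assuming scikit-learn is installed, using its built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Load a built-in toy dataset and hold out part of it for evaluation.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="entropy" (information gain) or "gini" matches the criteria described above.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X_train, y_train)            # feature selection, splitting, and recursion happen here

print(clf.score(X_test, y_test))     # accuracy on unseen data
print(export_text(clf, feature_names=load_iris().feature_names))  # readable decision rules
```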

1. How Decision Trees are Built

Decision trees are constructed using algorithms that aim to find the most informative feature to
split the data at each node. Two common approaches are described below; both splitting measures
are sketched in code after the list:

 ID3 (Iterative Dichotomiser 3): This algorithm uses entropy and information gain to
select the best feature for splitting.
o Entropy: Measures the impurity or randomness of a set of data. A set with equal
proportions of different classes has maximum entropy, while a set containing only
one class has zero entropy.
o Information Gain: Measures the reduction in entropy achieved by splitting the
data on a particular feature. The feature with the highest information gain is
chosen for the split.
 CART (Classification and Regression Trees): This algorithm uses the Gini index to
select the best feature for splitting.
o Gini Index: Measures the impurity of a set of data, similar to entropy. A lower
Gini index indicates higher purity.
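
Both splitting measures can be written in a few lines. The following is a minimal NumPy sketch; the function names are illustrative:

```python
import numpy as np

def entropy(labels):
    """Entropy: -sum(p * log2(p)) over the classes present in `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    """Gini index: 1 - sum(p^2) over the classes present in `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(labels, feature_values):
    """Entropy reduction obtained by splitting `labels` on one feature's values."""
    weighted = 0.0
    for v in np.unique(feature_values):
        subset = labels[feature_values == v]
        weighted += (len(subset) / len(labels)) * entropy(subset)
    return entropy(labels) - weighted

labels = np.array(["Yes"] * 9 + ["No"] * 5)      # 9 Yes / 5 No, as in the example later on
print(round(float(entropy(labels)), 2))          # about 0.94
print(round(float(gini(labels)), 2))             # about 0.46
```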

2. Splitting Criteria

 Numerical Features: For numerical features, the splitting condition usually involves a
threshold. For example, "Age < 25?" splits the data into two groups: those younger than
25 and those 25 or older.
 Categorical Features: For categorical features, the splitting condition can be based on
the values of the feature. For example, "Favorite Genre = Action?" splits the data into
groups based on their favorite genre (both split types are sketched in code after this list).
o Categorical data is a type of data that consists of labels or categories rather than
numerical values. It represents qualitative characteristics of an object or event.

Example:

o Colors: {Red, Blue, Green}
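
As a small illustration of the two split types above, here is a pandas sketch; the toy data and column names simply mirror the examples:

```python
import pandas as pd

# Toy data mirroring the examples above ("Age < 25?" and "Favorite Genre = Action?").
df = pd.DataFrame({
    "Age": [19, 31, 24, 45, 22],
    "Favorite Genre": ["Action", "Drama", "Action", "Comedy", "Drama"],
})

# Numerical split: a threshold divides the rows into two groups.
younger = df[df["Age"] < 25]
older = df[df["Age"] >= 25]

# Categorical split: rows are grouped by the feature's values.
action = df[df["Favorite Genre"] == "Action"]
by_genre = {genre: group for genre, group in df.groupby("Favorite Genre")}

print(len(younger), len(older), len(action), list(by_genre))
```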

3. Overfitting and Pruning

 Overfitting: Decision trees can become very complex and capture noise in the data,
leading to poor performance on unseen data. This is called overfitting.
 Pruning: To avoid overfitting, we can prune the tree by removing branches or nodes that
do not contribute significantly to the prediction accuracy. Pruning can be done by
limiting the depth of the tree, setting a minimum number of samples required at a node,
or using statistical measures to evaluate the importance of branches (a scikit-learn
sketch of these limits follows this list).
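
In practice these limits are usually set through the tree's hyperparameters. A minimal scikit-learn sketch (the parameter values are arbitrary):

```python
from sklearn.tree import DecisionTreeClassifier

# Pre-pruning: stop growing the tree early by limiting its size.
pre_pruned = DecisionTreeClassifier(
    max_depth=4,            # limit the depth of the tree
    min_samples_leaf=5,     # minimum number of samples required at a leaf
    min_samples_split=10,   # minimum number of samples required to split a node
)

# Post-pruning: grow a full tree, then cut it back with cost-complexity pruning.
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01)
```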

4. Handling Different Data Types

 Categorical Data: Decision trees can handle categorical data directly by creating
branches for each category.
 Numerical Data: Numerical data can be used directly or discretized into categories. For
example, age can be divided into age groups like "young," "middle-aged," and "old"
(a short discretization/encoding sketch follows this list).
 Missing Values: Decision trees can handle missing values by assigning them to the most
likely branch or creating a separate branch for missing values.
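
A short pandas sketch of the discretization and encoding mentioned above; the bin edges, labels, and column names are arbitrary, and treating missing values as their own category is just one option:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [19, 34, 52, 41, 67],
    "Color": ["Red", "Blue", "Green", "Red", None],   # categorical, with one missing value
})

# Discretize a numerical feature into categories.
df["AgeGroup"] = pd.cut(df["Age"], bins=[0, 30, 55, 120],
                        labels=["young", "middle-aged", "old"])

# Encode a categorical feature for libraries that expect numerical input.
encoded = pd.get_dummies(df["Color"], prefix="Color", dummy_na=True)

# One simple treatment of missing values: give them their own category.
df["Color"] = df["Color"].fillna("Missing")

print(df)
print(encoded)
```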

5. Advantages and Disadvantages (Expanded)

 Advantages:
o Interpretability: Decision trees are easy to understand and visualize, making
them useful for explaining decisions.
o Versatility: They can handle both classification and regression tasks, as well as
different data types.
o Minimal Data Preprocessing: Decision trees require less data preprocessing
compared to some other machine learning algorithms.
 Disadvantages:
o Overfitting: Decision trees are prone to overfitting, especially when they are very
complex.
o Instability: Small changes in the data can lead to significant changes in the tree
structure.
o Bias: Decision trees can be biased towards features with more levels or
categories.

6. Applications (Expanded)

 Customer Relationship Management (CRM): Predicting customer churn, identifying
potential customers, and personalizing marketing campaigns.
 Risk Assessment: Assessing credit risk, predicting loan defaults, and evaluating
insurance applications.
 Medical Diagnosis: Diagnosing diseases based on symptoms and medical history,
predicting patient outcomes, and personalizing treatment plans.
 Fraud Detection: Identifying fraudulent transactions in financial systems and detecting
suspicious activities on online platforms.

7. Ensemble Methods

To improve the performance and robustness of decision trees, ensemble methods can be used.
These methods combine multiple decision trees to make predictions. Two popular ensemble
methods are listed below, with a short scikit-learn sketch after the list:

 Random Forests: Create multiple decision trees on different subsets of the data and
combine their predictions through averaging or voting.
 Gradient Boosting: Build trees sequentially, where each tree tries to correct the errors of
the previous trees.
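
A minimal scikit-learn sketch of both methods on one of its built-in datasets (the hyperparameter values are arbitrary):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random Forest: many trees on bootstrap samples, predictions combined by voting.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Gradient Boosting: trees built sequentially, each correcting the previous ones' errors.
gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                random_state=0).fit(X_train, y_train)

print("Random Forest:", rf.score(X_test, y_test))
print("Gradient Boosting:", gb.score(X_test, y_test))
```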

Types of Decision Trees:

 Classification Trees: Predict categories (e.g., "likes movie" or "dislikes movie").
 Regression Trees: Predict continuous values (e.g., the price of a house).

ID3 Algorithm (Iterative Dichotomiser 3) - Step by Step

The ID3 (Iterative Dichotomiser 3) algorithm is a decision tree learning algorithm developed
by Ross Quinlan and used for classification tasks. It builds a tree by selecting the attribute
with the highest Information Gain at each step, using Entropy and Information Gain to determine
the best attribute to split the data. Information Gain measures how much a feature reduces the
uncertainty (entropy) in the dataset.

Note: ID3 is a foundational algorithm. More advanced decision tree algorithms such as C4.5 and
CART address some of its limitations (for example, handling numerical attributes and controlling
overfitting through pruning).

Step-by-Step Explanation of ID3 Algorithm

1. Start with the Entire Dataset:
   o Begin with the complete dataset and all available features.
2. Calculate the Entropy of the Target Attribute:
   o Entropy measures the impurity or uncertainty in the dataset. For a binary
     classification problem with a proportion p+ of positive and p- of negative
     examples, entropy is calculated as:

     Entropy(S) = -(p+) log2(p+) - (p-) log2(p-)

3. Calculate Information Gain for Each Feature:
   o Information gain measures how much a feature reduces the entropy. It is
     calculated as:

     IG(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) × Entropy(S_v)

     where S_v is the subset of S in which attribute A takes value v.
4. Select the Feature with the Highest Information Gain:
   o Choose the feature that maximizes information gain as the splitting criterion.
5. Split the Dataset:
   o Split the dataset into subsets based on the selected feature's values.
6. Repeat the Process Recursively:
   o Repeat steps 2–5 for each subset until:
      All instances in a subset belong to the same class (no further splitting needed).
      No more features are left to split on.
      A predefined stopping condition is met (e.g., maximum tree depth).
7. Create the Decision Tree:
   o The splits form the internal nodes of the tree, and the leaf nodes represent
     the class labels.

A compact Python sketch of this procedure follows.
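
The sketch below implements the procedure for purely categorical data, assuming each example is a dictionary mapping attribute names to values. The names id3, entropy, and information_gain are illustrative; a production implementation would add pruning and other refinements:

```python
import math
from collections import Counter

def entropy(rows, target):
    """Entropy of the target attribute over a list of example dictionaries."""
    counts = Counter(row[target] for row in rows)
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(rows, attr, target):
    """Reduction in entropy obtained by splitting `rows` on attribute `attr`."""
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [row for row in rows if row[attr] == value]
        remainder += (len(subset) / len(rows)) * entropy(subset, target)
    return entropy(rows, target) - remainder

def id3(rows, attributes, target):
    """Recursively build a decision tree as nested dictionaries."""
    classes = {row[target] for row in rows}
    if len(classes) == 1:                    # all instances share one class -> leaf
        return classes.pop()
    if not attributes:                       # no features left -> majority-class leaf
        return Counter(row[target] for row in rows).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    for value in {row[best] for row in rows}:              # split on the chosen attribute
        subset = [row for row in rows if row[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target)  # recurse on each subset
    return tree
```

Calling id3 on the example dataset in the next section should return a nested dictionary whose keys are the chosen attributes and whose leaves are the class labels.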

Example Problem
We will use the ID3 Algorithm to build a decision tree to determine if a person will play tennis
based on these features:

Outlook   Temperature  Humidity  Windy  Play Tennis?
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Compute Entropy for the Target Variable (Play Tennis?)

The dataset contains 9 "Yes" and 5 "No" examples, so:

Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.94

So, the entropy of the dataset is 0.94.

Compute Information Gain for Each Attribute


We now calculate the Information Gain for Outlook, Temperature, Humidity, and Windy.

Information Gain for "Outlook"

Outlook   Total  Play Tennis: Yes  Play Tennis: No
Sunny     5      2                 3
Overcast  4      4                 0
Rainy     5      3                 2

First, compute the entropy for each value of Outlook:

Entropy(Sunny) = -(2/5) log2(2/5) - (3/5) log2(3/5) ≈ 0.97
Entropy(Overcast) = 0 (all four examples are "Yes")
Entropy(Rainy) = -(3/5) log2(3/5) - (2/5) log2(2/5) ≈ 0.97

Now, compute the Information Gain (IG) for Outlook:

IG(Outlook) = 0.94 - ((5/14 × 0.97) + (4/14 × 0) + (5/14 × 0.97))
IG(Outlook) = 0.94 - (0.35 + 0 + 0.35)
IG(Outlook) = 0.24

Similarly, we compute the IG for Temperature, Humidity, and Windy and choose the highest one
(a short code sketch that reproduces these values follows the list).

Information Gains (IGs)

 Outlook: 0.247
 Temperature: 0.029
 Humidity: 0.151
 Windy: 0.048
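
These values can be checked with a short, self-contained script. The following is a sketch; the printed numbers should match the ones above up to rounding:

```python
import math
from collections import Counter

# The Play Tennis dataset from the table above.
data = [
    ("Sunny", "Hot", "High", False, "No"),      ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),  ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),  ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),  ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),   ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),("Rainy", "Mild", "High", True, "No"),
]
columns = ["Outlook", "Temperature", "Humidity", "Windy"]

def entropy(labels):
    """Entropy of a list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

target = [row[-1] for row in data]
print("Entropy(S) =", round(entropy(target), 3))       # expected: about 0.94

for i, name in enumerate(columns):
    remainder = 0.0
    for value in {row[i] for row in data}:             # weighted entropy of each subset
        subset = [row[-1] for row in data if row[i] == value]
        remainder += (len(subset) / len(data)) * entropy(subset)
    print(f"IG({name}) =", round(entropy(target) - remainder, 3))
```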

Select the Feature with the Highest Information Gain:

Since Outlook has the highest Information Gain, we split on it.

Recursively Build the Decision Tree:

 For the Overcast subset (all "Yes"), create a leaf node labeled "Yes".
 For the Sunny and Rainy subsets, repeat the process to find the best feature to
split on.

Final Decision Tree

 If Overcast, Play Tennis = Yes.
 If Sunny, check Humidity:
o If High, Play Tennis = No.
o If Normal, Play Tennis = Yes.
 If Rainy, check Windy:
o If False, Play Tennis = Yes.
o If True, Play Tennis = No.

Making Predictions

Now we can use the tree to classify new data (a small Python version of these rules follows
the examples).

Example:

 Outlook = Rainy, Windy = False → Play Tennis = Yes
 Outlook = Sunny, Humidity = High → Play Tennis = No
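
The final tree can also be written directly as a small Python function. This is a sketch; predict_play_tennis is an illustrative name:

```python
def predict_play_tennis(outlook, humidity=None, windy=None):
    """Apply the decision tree derived above to one example."""
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "No" if humidity == "High" else "Yes"
    if outlook == "Rainy":
        return "No" if windy else "Yes"
    raise ValueError(f"Unknown outlook: {outlook}")

print(predict_play_tennis("Rainy", windy=False))        # Yes
print(predict_play_tennis("Sunny", humidity="High"))    # No
```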

Conclusion
The ID3 algorithm:

1. Calculates Entropy for the dataset.
2. Finds Information Gain for each attribute.
3. Selects the attribute with the highest IG as the root node.
4. Recursively splits the dataset until all nodes are pure or other stopping criteria are met.
