Decision Tree Algorithm
• Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems.
• It is a tree-structured classifier, where:
• Internal nodes represent the features of a dataset,
• Branches represent the decision rules and
• Each leaf node represents the outcome.
• It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
• It is called a decision tree because, similar to a tree, it starts with the root node, which expands into further branches and constructs a tree-like structure.
• To build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
• A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
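As a concrete illustration of these ideas, the sketch below fits a CART-style tree with scikit-learn on its built-in Iris data. The library, dataset and parameter choices are illustrative assumptions, not part of the original slides.

```python
# A minimal CART-style decision tree, assuming scikit-learn is available.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()  # small, well-known classification dataset (illustrative choice)
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=42)

# DecisionTreeClassifier implements an optimized version of the CART algorithm.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(X_train, y_train)

print("Test accuracy:", clf.score(X_test, y_test))
# The printed rules show internal nodes (feature tests), branches and leaf outcomes.
print(export_text(clf, feature_names=iris.feature_names))
```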
Decision tree
Why use Decision Trees?
• Below are the two main reasons for using decision trees:
1. Decision trees usually mimic the way humans think while making a decision, so they are easy to understand.
2. The logic behind a decision tree can be easily understood because it shows a tree-like structure.
Decision Tree Terminologies
• Root Node: The root node is from where the decision tree starts. It represents the entire dataset, which further gets
divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output nodes, and the tree cannot be split further once a leaf node is reached.
• Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given
conditions.
• Branch/Sub Tree: A subtree formed by splitting the tree.
• Pruning: Pruning is the process of removing unwanted branches from the tree.
• Parent/Child node: A node that splits into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are its child nodes.
How does the Decision Tree algorithm Work?
• In a decision tree, to predict the class of a given record, the algorithm starts from the root node of the tree.
• The algorithm compares the value of the root attribute with the corresponding attribute of the record and, based on the comparison, follows the branch and jumps to the next node.
• At the next node, the algorithm again compares the record's attribute value with the node's attribute and moves further down the matching branch.
• It continues the process until it reaches the leaf node of the tree.
• The complete process can be better understood with the simple traversal sketch given below.
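The sketch below walks a hand-built tree from the root to a leaf by comparing the record's attribute values. The nested-dictionary representation and the toy attributes ("color", "size") are illustrative assumptions.

```python
# Hand-built toy tree: internal nodes name an attribute and map each value to a
# branch; plain strings are leaf outcomes. The structure is an assumption.
toy_tree = {
    "attribute": "color",
    "branches": {
        "red": "class_1",                          # leaf node
        "blue": {
            "attribute": "size",
            "branches": {"small": "class_1", "large": "class_2"},
        },
    },
}

def predict(node, record):
    """Start at the root node and follow branches until a leaf node is reached."""
    while isinstance(node, dict):                  # still at an internal node
        value = record[node["attribute"]]          # compare the record's attribute value
        node = node["branches"][value]             # follow the matching branch
    return node                                    # the leaf node holds the outcome

print(predict(toy_tree, {"color": "blue", "size": "large"}))  # -> class_2
```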
Example - 1
• Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not.
• To solve this problem, the decision tree starts with the root node (the Salary attribute, selected by ASM).
• The root node splits further into the next decision node (distance from the office) and one leaf node based
on the corresponding labels.
• The next decision node further gets split into one decision node (Cab facility) and one leaf node.
• Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the below diagram:
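The diagram itself is not reproduced here; as a rough stand-in, the same decision logic can be written as nested conditionals. Only the order of the questions (Salary, then distance from the office, then cab facility) follows the slide; the threshold and values below are hypothetical placeholders.

```python
def job_offer_decision(salary_ok, distance_km, cab_facility):
    """Mirror of the example tree: Salary -> Distance from office -> Cab facility.
    The 10 km threshold and boolean inputs are hypothetical placeholders."""
    if not salary_ok:                  # root node: salary is not acceptable
        return "Declined offer"        # leaf node
    if distance_km <= 10:              # decision node: distance from the office
        return "Accepted offer"        # leaf node
    if cab_facility:                   # decision node: is a cab facility provided?
        return "Accepted offer"        # leaf node
    return "Declined offer"            # leaf node

print(job_offer_decision(salary_ok=True, distance_km=25, cab_facility=True))  # Accepted offer
```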
Example - 2
Attribute Selection Measures
• While implementing a decision tree, the main issue is how to select the best attribute for the root node and for the sub-nodes.
• To solve this problem, there is a technique called the Attribute Selection Measure (ASM). Using this measure, we can easily select the best attribute for each node of the tree.
• There are two popular techniques for ASM, which are:
•Information Gain
•Gini Index
1. Information Gain
• Information gain measures the change in entropy after a dataset is split on an attribute.
• It calculates how much information a feature provides us about a class.
• According to the value of information gain, we split the node and build the decision tree.
• A decision tree algorithm always tries to maximize the value of information gain, and a node/attribute
having the highest information gain is split first.
• It can be calculated using the below formula:
Information Gain = Entropy(S) − Σv (|Sv| / |S|) × Entropy(Sv)
where S is the dataset before the split and Sv is the subset of S for which the chosen attribute takes value v.
• Entropy: Entropy is a metric to measure the impurity in a given
attribute. It specifies randomness in data.
• Entropy can be calculated as:
Entropy(S) = − Σi pi log2(pi)
where pi is the proportion of examples in S that belong to class i.
Example
Solution
Entropy is measured in bits.
If there are only two possible classes, entropy
values can range from 0 to 1.
For n classes, entropy ranges from 0 to log2(n).
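Putting the two formulas above into code, the sketch below computes entropy and information gain for lists of class labels. The helper names and the tiny example labels are illustrative assumptions.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the class proportions in `labels`."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(labels, groups):
    """IG = Entropy(parent) - weighted average entropy of the child `groups`."""
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted

parent = ["yes"] * 9 + ["no"] * 5            # two-class example: entropy is about 0.940
split = [["yes"] * 6 + ["no"] * 2,           # hypothetical split into two subsets
         ["yes"] * 3 + ["no"] * 3]
print(round(entropy(parent), 3))             # ~0.940
print(round(information_gain(parent, split), 3))
```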
Some Examples:
Entropy & Information Gain
Example - 1
Example 2
Solution
Example - 3
• Consider the following data, where the Y label is whether or not the child goes out to play.
Solution
• Step 1: Calculate the IG (information gain) for each attribute
(feature)
Step 2: Choose which feature to split on.
Step 3: Repeat for each level.
Step 4: Choose a feature for each node.
Step 5: Final Tree
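The steps above amount to a greedy, recursive procedure. Below is a hedged ID3-style sketch of that procedure; the helper names and the tiny "go out to play"-flavoured dataset are invented for illustration and are not the slides' actual table.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def best_feature(rows, target, features):
    """Steps 1-2: compute the information gain of each feature and pick the best."""
    base = entropy([r[target] for r in rows])
    def gain(f):
        groups = {}
        for r in rows:
            groups.setdefault(r[f], []).append(r[target])
        return base - sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return max(features, key=gain)

def build_tree(rows, target, features):
    """Steps 3-4: recurse on each branch until the labels are homogeneous."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not features:        # pure node or no features left
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    f = best_feature(rows, target, features)
    tree = {f: {}}
    for value in set(r[f] for r in rows):
        subset = [r for r in rows if r[f] == value]
        remaining = [x for x in features if x != f]
        tree[f][value] = build_tree(subset, target, remaining)
    return tree

# Tiny hypothetical dataset in the spirit of the 'go out to play' example.
data = [
    {"Weather": "sunny", "Homework": "no",  "Play": "yes"},
    {"Weather": "sunny", "Homework": "yes", "Play": "no"},
    {"Weather": "rainy", "Homework": "no",  "Play": "no"},
    {"Weather": "rainy", "Homework": "yes", "Play": "no"},
]
print(build_tree(data, "Play", ["Weather", "Homework"]))
```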
Example - 6
• Consider the table given. It represents factors that affect whether John will go out to play golf or not. Using the data in the table, build a decision tree model that can be used to predict whether John will play golf or not.
Step 1: Determine the Decision
Column
• Since decision trees are used for classification, you need to determine the classes that are the basis for the decision.
• In this case, it is the last column, that is, the Play Golf column, with classes Yes and No.
• To determine the root node, we need to compute the entropy. To do this, we create a frequency table for the classes (the Yes/No column).
Step 2: Calculating Entropy for the classes
(Play Golf)
• In this step, you need to calculate the entropy for the Play Golf column; the calculation is given below.
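If the Play Golf column contains 9 Yes and 5 No out of 14 rows (an assumption made here, matching the widely used version of this dataset), the calculation works out as:
Entropy(Play Golf) = −(9/14) log2(9/14) − (5/14) log2(5/14) ≈ 0.410 + 0.530 ≈ 0.940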
Step 3: Calculate Entropy for Other Attributes
After Split
• The easiest way to approach this calculation is to create a frequency table for the two variables, that is, Play Golf and Outlook.
• Now that we have the entropies for all four attributes, let's summarize them as shown below:
Step 4: Calculating Information Gain for
Each Split
• The next step is to calculate the information gain for each of the
attributes.
• The information gain is calculated from the split using each of the
attributes. Then the attribute with the largest information gain is
used for the split.
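Assuming the widely used 14-row version of this table, the gains work out to approximately IG(Outlook) ≈ 0.25, IG(Humidity) ≈ 0.15, IG(Windy) ≈ 0.05 and IG(Temperature) ≈ 0.03, so Outlook gives the largest information gain and is chosen for the first split.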
Step 5: Perform the First Split
• Draw the First Split of the Decision Tree
• Now that we have all the information gain, we then split the tree
based on the attribute with the highest information gain.
• From our calculation, the highest information gain comes from
Outlook. Therefore, the split will look like this:
From Table 3, we can see that the Overcast outlook requires no further split because it is already one homogeneous group, so it becomes a leaf node.
Step 6: Perform Further Splits
• The Sunny and the Rainy branches still need to be split.
• The Rainy outlook can be split using either Temperature, Humidity or
Windy.
• What attribute would best be used for this split? Why?
• Humidity, because it produces homogeneous groups.
• The Rainy branch can therefore be split on the High and Normal values of Humidity, which gives us the tree as shown.
• Let's now go ahead and do the same thing for the Sunny outlook.
• The Sunny outlook can be split using either Temperature, Humidity or Windy.
• What attribute would best be used for this split? Why?
• Windy, because it produces homogeneous groups.
• If we split using the Windy attribute, we get the final tree, which requires no further splitting. This is shown in the next figure.
Step 7: Complete the Decision Tree
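As a cross-check of the tree built by hand above, the sketch below encodes a play-golf table and fits an entropy-based scikit-learn tree. The 14 rows follow the widely used play-golf example, arranged so that Overcast is all Yes, Rainy is separated by Humidity and Sunny by Windy as in this walkthrough; they are an assumption about the table referenced in the slides, and the library and encoding choices are illustrative.

```python
# Fit an entropy-based tree on an (assumed) play-golf table with scikit-learn.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rows = [  # Outlook, Temperature, Humidity, Windy, PlayGolf (assumed 14-row table)
    ("Rainy", "Hot", "High", False, "No"),  ("Rainy", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"), ("Sunny", "Mild", "High", False, "Yes"),
    ("Sunny", "Cool", "Normal", False, "Yes"), ("Sunny", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"), ("Rainy", "Mild", "High", False, "No"),
    ("Rainy", "Cool", "Normal", False, "Yes"), ("Sunny", "Mild", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", True, "Yes"), ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"), ("Sunny", "Mild", "High", True, "No"),
]
df = pd.DataFrame(rows, columns=["Outlook", "Temperature", "Humidity", "Windy", "PlayGolf"])

# One-hot encode the categorical features so the tree can split on them.
X = pd.get_dummies(df.drop(columns="PlayGolf"))
y = df["PlayGolf"]

clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(clf, feature_names=list(X.columns)))  # the root split involves Outlook
```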
2. Gini Index
• Gini index is a measure of impurity or purity used
while creating a decision tree in the CART(Classification
and Regression Tree) algorithm.
• An attribute with a low Gini index should be preferred over one with a high Gini index.
• The CART algorithm uses the Gini index to create binary splits, so it only produces binary trees.
• Gini index can be calculated using the below formula:
Gini Index = 1 − Σj pj²
where pj is the proportion of samples at the node that belong to class j.
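A small sketch of the formula above; the function name and the example class proportions are illustrative.

```python
from collections import Counter

def gini_index(labels):
    """Gini = 1 - sum(p_j^2) over the class proportions at a node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini_index(["yes"] * 10))                        # 0.0 -> pure node (preferred)
print(round(gini_index(["yes"] * 5 + ["no"] * 5), 2))  # 0.5 -> maximally impure for 2 classes
```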
Pruning: Getting an Optimal Decision Tree
• Pruning is a process of deleting unnecessary nodes
from a tree to get the optimal decision tree.
• A tree that is too large increases the risk of overfitting, while a tree that is too small may not capture all the important features of the dataset.
• Therefore, a technique that decreases the size of the learning tree without reducing accuracy is known as pruning.
• There are mainly two types of tree pruning techniques used:
1. Cost Complexity Pruning
2. Reduced Error Pruning.
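As one concrete way to apply cost complexity pruning in practice, scikit-learn exposes a ccp_alpha parameter; the sketch below computes candidate alphas and keeps the pruned tree with the best validation accuracy. The dataset and the tie-breaking rule are illustrative assumptions.

```python
# Cost complexity pruning via scikit-learn's ccp_alpha (illustrative sketch).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Candidate effective alphas along the pruning path of the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = clf.score(X_val, y_val)        # validation accuracy of the pruned tree
    if score >= best_score:                # on ties, prefer the larger alpha (smaller tree)
        best_alpha, best_score = alpha, score

print(f"chosen ccp_alpha={best_alpha:.5f}, validation accuracy={best_score:.3f}")
```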
Advantages of the Decision Tree
• It is simple to understand because it follows the same process that a human follows while making a decision in real life.
• It can be very useful for solving decision-related problems.
• It helps to think about all the possible outcomes for a
problem.
• It requires less data cleaning compared to other algorithms.
Disadvantages of the Decision Tree
• A decision tree can contain many layers, which makes it complex.
• It may have an overfitting issue, which can be resolved
using the Random Forest algorithm.
• With more class labels, the computational complexity of the decision tree may increase.
Decision Tree: In Short