Decision Tree & ID3 Algorithm
Reference book:
R2.Tom Mitchell, Machine Learning, McGraw-Hill, 1997
Tree Versus ML Decision Tree
(Figure: a natural tree next to an ML decision tree, with the root and leaf nodes labelled in each.)
1. Does it satisfy your salary criteria?
2. Is it a dream company?
3. Is the commute/travel time less than an hour?
4. Does it offer free breakfast & coffee?
You need to buy apples…
How would you choose fresh apples in the market?
• A decision tree (DT) is a method for approximating discrete-valued functions that is robust to noisy data and capable of learning disjunctive expressions.
• Learned trees can also be re-represented as sets of if-then rules to improve human readability.
Definition – Decision Trees
1. Decision trees classify instances by sorting
them down the tree from the root to
some leaf node, which provides the
classification of the instance.
2. Each node in the tree specifies a test of
some attribute of the instance, and each branch descending from that node corresponds to one of the possible values for this attribute.
3. An instance is classified by starting at the root node of the tree, testing the attribute specified by this node, then moving down the tree branch corresponding to the value of the attribute.
4. This process is then repeated for the sub-tree rooted at the new node.
Types of Decision Trees
• Classification Tree: A classification tree is a decision tree for a categorical response (commonly known as the target) with many categorical or continuous predictors (factors). The categorical response can be binomial or multinomial (e.g. Pass/Fail; high, medium & low). It exposes important patterns and relationships between a categorical response and important predictors within highly complicated data, without using parametric methods. It also identifies groups in the data with desirable characteristics and predicts response values for new observations. For example, a credit card company can use a classification tree to predict whether or not a customer will take a credit card, based on several predictors.
• Regression Tree: A regression tree is a decision tree for a continuous response (commonly known as the target) with many categorical or continuous predictors (factors). The continuous response is a real number (e.g. piston diameter, blood pressure level). It likewise exposes the important patterns and relationships between a continuous response and predictors within highly complicated data, without using parametric methods, identifies groups in the data with desirable characteristics, and predicts response values for new observations. For example, a pharmaceutical company can use a regression tree to identify the potential predictors affecting a dissolution rate. (A small illustrative sketch of both tree types follows this list.)
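To make the distinction concrete, here is a minimal scikit-learn sketch (not from the slides); it fits one classification tree and one regression tree on tiny made-up datasets. The feature names, values and targets are invented purely for illustration.

```python
# A minimal sketch of the two tree types; the data below are hypothetical.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: categorical target (e.g. take credit card? 0 = no, 1 = yes)
X_cls = [[25, 40000], [45, 90000], [35, 60000], [52, 30000]]   # [age, income]
y_cls = [0, 1, 1, 0]
clf = DecisionTreeClassifier(criterion="entropy", max_depth=2).fit(X_cls, y_cls)
print(clf.predict([[40, 70000]]))        # predicted class for a new customer

# Regression tree: continuous target (e.g. dissolution rate)
X_reg = [[1.0, 20], [1.5, 25], [2.0, 30], [2.5, 35]]           # [dose, temperature]
y_reg = [12.1, 14.8, 17.9, 21.2]
reg = DecisionTreeRegressor(max_depth=2).fit(X_reg, y_reg)
print(reg.predict([[1.8, 28]]))          # predicted real value
```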
Decision Tree Algorithms
The three classic decision tree algorithms are ID3, C4.5 and CART.
A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar (homogeneous) values.
Decision Tree algorithms
1. ID3 (Iterative Dichotomiser 3): ID3 cannot handle continuous variables directly; it works only with categorical data. It is also prone to overfitting. (Splitting criterion: Information Gain.)
2. C4.5: an extension of ID3 that can handle both categorical and continuous attributes, by converting continuous attributes into categorical ones through thresholding. (Splitting criterion: Gain Ratio.)
3. CART (Classification and Regression Trees): CART splits nodes into exactly two branches. (Splitting criterion: Gini Index for classification trees, Variance Reduction for regression trees.)
Terminologies
• Root Node: represents the entire population or sample; it is further divided into two or more homogeneous sets.
• Leaf Node: a node that cannot be segregated into further nodes.
• Parent/Child node: the root node is the parent node, and all the nodes branched from it are known as child nodes.
• Branch/Sub-tree: formed by splitting the tree/node.
• Splitting: dividing the root node/sub-node into different parts on the basis of some condition.
• Pruning: the opposite of splitting; removing unwanted branches from the tree.
• Entropy: a measure of the purity/impurity of the samples.
• Information Gain: the decrease in entropy after a dataset is split on the basis of an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (useful in deciding which attribute to use as the root node).
• Reduction in variance: an algorithm used for continuous target variables (regression problems); the split with the lower variance is selected as the criterion to split the population.
• Gini index: the measure of purity or impurity used in building a decision tree in CART.
• Chi-square: an algorithm to find the statistical significance of the differences between sub-nodes and the parent node.
Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances.
DECISION TREE REPRESENTATION
• Consider the instance (Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong): the tree sorts it down its leftmost branch and classifies it as a negative instance (Play Tennis = No).
• In general, decision trees represent a disjunction of conjunctions of constraints on the attribute
values of instances.
• Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and the tree
itself to a disjunction of these conjunctions.
(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
If (Outlook = Sunny AND Humidity = Normal) OR (Outlook = Overcast) OR (Outlook = Rain AND Wind = Weak)
Then: Play Tennis = Yes
Else: Play Tennis = No
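As a quick sketch (not from the slides), this disjunction of conjunctions can be written directly as a boolean rule; the attribute values are passed as plain strings.

```python
def play_tennis(outlook, humidity, wind):
    """PlayTennis rule: (Sunny AND Normal) OR Overcast OR (Rain AND Weak)."""
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

# The instance (Sunny, Hot, High, Strong) satisfies none of the conjunctions,
# so the rule returns False, i.e. Play Tennis = No.
print(play_tennis("Sunny", "High", "Strong"))   # False
```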
Test instance: will we play tennis?
Instance: Outlook = Sunny, Humidity = High (Temperature and Wind not specified).
The Yes paths of the tree are (Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak).
Only "Outlook = Sunny ∧ Humidity = Normal" gives Play Tennis = Yes; since this instance has Humidity = High, the decision is Play Tennis = No.
Concept of Decision Trees
(Figure: a toy dataset with Attribute 1 to Attribute 4 and Class = {M, H}, and the corresponding decision tree that tests Attribute 1 at the root, then Attribute 2 and Attribute 3, ending in Class = M and Class = H leaves.)
Fig. 3.1 Representation of objects (samples) using features.
Fig. 3.2 Measuring features for the domain of interest: colour {green, brown, gray, other}, has wings?, abdomen length, thorax length, antennae length, mandible size, spiracle diameter, leg length.
Table 3.1 Instances, Features and Class

Insect ID   Abdomen length   Antennae length   Insect class
1           2.7              5.5               Grasshopper
2           8.0              9.1               Katydid
3           0.9              4.7               Grasshopper
4           1.1              3.1               Grasshopper
5           5.4              8.5               Katydid
6           2.9              1.9               Grasshopper
7           6.1              6.6               Katydid
8           0.5              1.0               Grasshopper
9           8.3              6.6               Katydid
10          8.1              4.7               Katydid
An Example from Medicine
The main purpose of the decision tree is to expose the structural information contained in the data.

Table 4.1 Medical Data

No   Gender   Age   BP       Drug
1    Male     20    Normal   A
2    Female   73    Normal   B
3    Male     37    High     A
4    Male     33    Low      B
5    Female   48    High     A
6    Male     29    Normal   A
7    Female   52    Normal   B
8    Male     42    Low      B
9    Male     61    Normal   B
10   Female   30    Normal   A
11   Female   26    Low      B
12   Male     54    High     A
Fig. 4.8 Feature space and the decision tree for the insect data: grasshoppers and katydids plotted by abdomen length (x-axis, 1 to 10) against antenna length (y-axis, 1 to 10). The tree first asks "Abdomen length > 7.1?" (yes: Katydid); otherwise it asks "Antenna length > 6.0?" (yes: Katydid, no: Grasshopper).
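The two threshold tests in Fig. 4.8 translate directly into a tiny classifier; the sketch below is my rendering of the figure's logic (not code from the source), checked against Table 3.1.

```python
def classify_insect(abdomen_length, antenna_length):
    """Decision tree of Fig. 4.8: two threshold tests on the insect features."""
    if abdomen_length > 7.1:
        return "Katydid"
    if antenna_length > 6.0:
        return "Katydid"
    return "Grasshopper"

# Check against Table 3.1: insect 1 (2.7, 5.5) is a grasshopper, insect 2 (8.0, 9.1) a katydid.
print(classify_insect(2.7, 5.5))   # Grasshopper
print(classify_insect(8.0, 9.1))   # Katydid
```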
Concept of Decision Tree ML
Training Dataset
• Input features are also called Attributes
• This example dataset has 2 attributes – Colour & Diameter
• Instances are defined by categorical levels/numerical
values of attributes
• A dataset is called a labelled dataset if a class is defined for each input instance. This is the setting of supervised classification: here the output is categorical.
Decision tree - procedure
1. Start with one of the best attributes available in the dataset.
2. Start with the full dataset in the root node. (If you take Diameter as the best attribute, ask a question: Is diameter >= 3?)
3. Based on the attribute values [True/False], the dataset is divided into 2 subsets, and those subsets become the input to 2 new child nodes.
4. On the False side, the data subset has a single label (Grape), so there is no uncertainty (no confusion in predicting the label); stop growing the tree on that side.
5. On the True side, the subset has a mixture of labels (Mango & Lemon), so uncertainty exists; continue splitting the dataset and the node.
(Figure: the full dataset is split into two subsets: one containing only Grape, where we stop growing the tree and make a leaf node, and one containing Mango & Lemon, where we continue growing the tree.)
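A sketch of that first split in Python: the rows below are made up (the slides only name the attributes Colour and Diameter and the labels Grape, Mango and Lemon), while the split question "Is diameter >= 3?" is the one from the slides.

```python
# Hypothetical fruit rows: (colour, diameter, label)
dataset = [
    ("Green",  3, "Mango"),
    ("Yellow", 3, "Lemon"),
    ("Red",    1, "Grape"),
    ("Red",    1, "Grape"),
    ("Yellow", 3, "Mango"),
]

true_side  = [row for row in dataset if row[1] >= 3]   # mixture of Mango & Lemon: keep splitting
false_side = [row for row in dataset if row[1] < 3]    # only Grape: pure subset, make it a leaf

print({row[2] for row in true_side})    # {'Mango', 'Lemon'}
print({row[2] for row in false_side})   # {'Grape'}
```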
Decision tree - procedure
Based on the characteristics of the attributes, identify the different set of possible questions to ask.
How do we identify whether a question is a good indicator for continuing to grow the tree? → the information gain metric.
Step I
• Identify the different set of possible questions to ask.
Method to identify the best attributes
Entropy – a metric to identify the best attribute
A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar (homogeneous) values. The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous the entropy is zero, and if the sample is equally divided it has an entropy of one.
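A quick check of those two extreme cases (a sketch, not from the slides), using the binary entropy formula given later in the deck:

```python
import math

def entropy(p_yes, p_no):
    """Binary entropy: -p(yes)*log2(p(yes)) - p(no)*log2(p(no)), treating 0*log2(0) as 0."""
    total = 0.0
    for p in (p_yes, p_no):
        if p > 0:
            total -= p * math.log2(p)
    return total

print(entropy(1.0, 0.0))   # 0.0 -> completely homogeneous sample
print(entropy(0.5, 0.5))   # 1.0 -> equally divided sample
```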
CLASSIFICATION METHODS
Challenges
How do we represent the entire information in the dataset using the minimum number of rules?
How do we develop the smallest tree?
Solution
Select the variable with the maximum information (the highest relation with Y) for the first split.
ID3 Vs C4.5 Vs CART
Decision Trees
▪ Decision trees are a type of supervised machine learning.
▪ They use well-labelled training data and, on the basis of that data, predict the output. This process can then be used to predict the results for unknown data.
▪ Decision trees can be applied to both regression and classification problems.
▪ A decision tree sorts data into groups based on the values of the features it is provided.
▪ Decision trees can be used for regression to get a real numeric value, or for classification to split data into different categories.
Decision Trees
A decision tree has three types of nodes:
▪ A root node that has no incoming edges and zero or more outgoing edges
▪ Internal nodes, each of which has exactly one incoming edge and two or more outgoing edges
▪ Leaf or terminal nodes, each of which has exactly one incoming edge and no outgoing edges
Decision Trees
▪ In a decision tree, each leaf node is assigned a class label.
▪ The non-terminal nodes, which include the root and other internal nodes, contain attribute test conditions to separate records that have different characteristics.
Decision Trees
▪ Classifying a test record is straightforward once a decision tree has been constructed.
▪ Starting from the root node, we apply the test condition to the record and follow the appropriate branch based on the outcome of the test.
▪ This leads us either to another internal node, for which a new test condition is applied, or to a leaf node.
▪ The class label associated with that leaf node is then assigned to the record.
Steps in Decision Trees
Step 1: Begin the tree with the root node, say S, which contains the complete dataset.
Step 2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step 3: Divide S into subsets that contain the possible values of the best attribute.
Step 4: Generate the decision tree node which contains the best attribute.
Step 5: Recursively make new decision trees using the subsets of the dataset created in Step 3. Continue this process until a stage is reached where you cannot further classify the nodes; call the final node a leaf node.
Decision Trees
Why use Decision Trees?
1. Decision trees usually mimic human thinking ability while making a decision, so they are easy to understand.
2. The logic behind a decision tree can be easily understood because it shows a tree-like structure.
How many attributes? Which attribute is significant?
Why do we need to find the significant attribute?
• Because to start constructing the decision tree, one of the best attributes has to be assigned as the root node.
Decision Trees
Attribute Selection Measures
While implementing a decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes. To solve such problems there is a technique called the Attribute Selection Measure, or ASM. With this measure, we can easily select the best attribute for the nodes of the tree. There are two popular techniques for ASM:
1. Information Gain
2. Gini Index
Decision Trees
1. Information Gain:
Information gain is the measurement of the change in entropy after the segmentation of a dataset based on an attribute.
It calculates how much information a feature provides about a class.
According to the value of information gain, we split the node and build the decision tree.
A decision tree algorithm always tries to maximize the value of information gain, and the node/attribute having the highest information gain is split first. It can be calculated using the formula below:
Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]
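A hedged Python sketch of this formula (the entropy measure it relies on is defined on the next slide); the helper functions and the toy split below are illustrative, not code from the slides.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class proportions in S."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent_labels, subsets):
    """Entropy(S) minus the weighted average entropy of the subsets produced by a split."""
    total = len(parent_labels)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent_labels) - weighted

# Toy check: a perfect split of a 50/50 node into two pure subsets gains a full bit.
print(information_gain(["Yes", "Yes", "No", "No"], [["Yes", "Yes"], ["No", "No"]]))   # 1.0
```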
Decision Trees
Entropy: Entropy is a metric to measure the impurity in a given attribute. It specifies the randomness in the data.
Entropy can be calculated as:
Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
where
• S = the set of samples
• P(yes) = the probability of yes
• P(no) = the probability of no
Decision Trees
2. Gini Index:
• The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
• An attribute with a low Gini index should be preferred over one with a high Gini index.
• CART creates only binary splits, and it uses the Gini index to create them.
• The Gini index can be calculated using the formula below:
Gini Index = 1 - Σj (Pj)^2
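As a small sketch (not from the slides), the Gini index of a node can be computed from its class counts; for the PlayTennis root node with 9 Yes and 5 No examples this gives about 0.459.

```python
def gini_index(class_counts):
    """Gini index of a node: 1 - sum_j p_j^2, where p_j is the fraction of class j."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

print(gini_index([9, 5]))   # ~0.459 for the PlayTennis root node (9 Yes, 5 No)
print(gini_index([4, 0]))   # 0.0 for a pure node such as Outlook = Overcast
```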
Information Gain and Entropy (illustrative figures, including a split in which only a few samples are mixed, and a numerical example of the entropy metric)
Concept of Decision Trees
ID3 ALGORITHM (Iterative Dichotomiser 3)
• The ID3 algorithm learns decision trees by constructing them top-down, beginning with the question "which attribute should be tested at the root of the tree?" To answer this question, each instance attribute is evaluated using a statistical test to determine how well it alone classifies the training examples.
• The best attribute is selected and used as the test at the root node of the tree.
• A descendant of the root node is then created for each possible value of this attribute, and the training examples are sorted to the appropriate descendant node.
• The entire process is then repeated using the training examples associated with each descendant node, to select the best attribute to test at that point in the tree.
• This forms a greedy search for an acceptable decision tree, in which the algorithm never backtracks to reconsider earlier choices.
• A simplified version of the algorithm, specialized to learning boolean-valued functions (i.e., concept learning), is given below.
WHICH ATTRIBUTE IS THE BEST CLASSIFIER?
• The central choice in the ID3 algorithm is selecting which attribute to
test at each node in the tree.
• What is a good quantitative measure of the worth of an attribute?
• We will define a statistical property, called information gain, that
measures how well a given attribute separates the training examples
according to their target classification.
• ID3 uses this information gain measure to select among the
candidate attributes at each step while growing the tree.
ID3(Examples, Target_attribute, Attributes)
Examples are the training examples. Target_attribute is the attribute whose value is to be predicted by the tree. Attributes is a list of other attributes that may be tested by the learned decision tree. Returns a decision tree that correctly classifies the given Examples.
• Create a Root node for the tree
• If all Examples are positive, return the single-node tree Root, with label = +
• If all Examples are negative, return the single-node tree Root, with label = -
• If Attributes is empty, return the single-node tree Root, with label = the most common value of Target_attribute in Examples
• Otherwise begin:
  • A <- the attribute from Attributes that best classifies Examples
  • The decision attribute for Root <- A
  • For each possible value vi of A:
    • Add a new tree branch below Root, corresponding to the test A = vi
    • Let Examples_vi be the subset of Examples that have value vi for A
    • If Examples_vi is empty, then below this new branch add a leaf node with label = the most common value of Target_attribute in Examples
    • Else below this new branch add the subtree ID3(Examples_vi, Target_attribute, Attributes - {A})
• End
• Return Root
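Below is a runnable Python sketch of this pseudocode; it is my own rendering, not code from Mitchell or the slides. The data table itself is not reproduced in the extracted text, so the 14 rows below are the standard Quinlan/Mitchell PlayTennis examples; their per-attribute counts match the frequency tables in the slides that follow.

```python
import math
from collections import Counter

def entropy(examples, target):
    """Entropy of the target attribute over a list of example dicts."""
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute, target):
    """Entropy(S) minus the weighted average entropy after splitting on `attribute`."""
    total = len(examples)
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, target, attributes):
    """Return a nested dict {attribute: {value: subtree_or_label}} or a class label."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                         # all positive or all negative
        return labels[0]
    if not attributes:                                # no attributes left: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, target, remaining)
    return tree

# The 14-example PlayTennis dataset used throughout these slides (standard Mitchell data).
data = [
    ("Sunny","Hot","High","Weak","No"),     ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"), ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"),  ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Strong","Yes"), ("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"), ("Rain","Mild","Normal","Weak","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"), ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"), ("Rain","Mild","High","Strong","No"),
]
cols = ["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]
examples = [dict(zip(cols, row)) for row in data]

tree = id3(examples, "PlayTennis", ["Outlook", "Temperature", "Humidity", "Wind"])
print(tree)   # Outlook at the root, Humidity under Sunny, Wind under Rain
```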
Numerical problem
Given a dataset of historical tennis-playing data, including features such as outlook, temperature, humidity and windy, design a decision tree classifier to predict the play-tennis decision from these input features. How would you determine the optimal split criterion at each node of the decision tree, so as to make accurate predictions about whether a player should play tennis or not under specific weather conditions?
DATASET FOR PLAYING TENNIS
(Table: the PlayTennis dataset with candidate attributes outlook, temperature, humidity and windy; + marks positive instances and - marks negative instances.)
Take the X1: outlook attribute, analyse what the sub-attributes in it are, and count the number of samples in each.
• There are 5 samples in the Sunny sub-attribute, with 2 Yes (positive) labels + 3 No (negative) labels.
• There are 4 samples in the Overcast sub-attribute, with 4 Yes (positive) labels + 0 No (negative) labels.
• There are 5 samples in the Rainy sub-attribute, with 3 Yes (positive) labels + 2 No (negative) labels.

Frequency Table for X1 (Outlook):
        Sunny  Overcast  Rainy
Yes       2       4        3
No        3       0        2
Total     5       4        5
Take the X2: temp attribute, analyse what the sub-attributes in it are, and count.

Frequency Table for X2 (Temp):
        hot  mild  cool
Yes      2    4     3
No       2    2     1
Total    4    6     4
Frequency Table for X3 (Humidity):
        high  normal
Yes       3     6
No        4     1
Total     7     7
Frequency Table for X4 (Windy):
        false  true
Yes       6     3
No        2     3
Total     8     6

Frequency Table for the entire dataset: 9 Yes + 5 No = 14 samples.
DATASET FOR PLAYING TENNIS
There are 4 attributes (outlook, temperature, humidity and windy) in the given dataset. Which one should be considered as the root node?
+ Positive instances = 9/14
- Negative instances = 5/14
Step 1
• Measure the entropy of the overall set of samples S in the given dataset, using the entropy formula.
Step 2
• For each attribute in the dataset, compute the entropy and information gain measures, to identify which attribute is significant enough to be the root node when starting to build the decision tree.
• In this dataset there are 4 attributes, namely outlook, temperature, humidity and windy.
• Let's start by considering outlook as the first choice in our computation of information gain.
For X1: Outlook, compute all the following measures (refer to the basic entropy formula). The sub-attribute counts are Sunny: 5, Overcast: 4, Rainy: 5.

Note: on a calculator, log2(x) = log(x) / log(2).

Step 1: Calculate the overall entropy value.
E(S) = -(9/14) log2 (9/14) - (5/14) log2 (5/14) = 0.940

Step 2: How many sub-attributes are in Outlook?
3 = Sunny, Overcast & Rainy

Step 3: Calculate the Outlook=Sunny entropy value (2/5 positive, 3/5 negative).
E(Outlook=Sunny) = -(2/5) log2 (2/5) - (3/5) log2 (3/5)
                 = -0.4(-1.3219) - 0.6(-0.7369)
                 = 0.52876 + 0.44214
                 = 0.971

Step 4: Calculate the Outlook=Overcast entropy value (4/4 positive, 0/4 negative).
E(Outlook=Overcast) = -(4/4) log2 (4/4) - (0/4) log2 (0/4) = 0 (taking 0 log2 0 = 0)

Step 5: Calculate the Outlook=Rainy entropy value (3/5 positive, 2/5 negative).
E(Outlook=Rainy) = -(3/5) log2 (3/5) - (2/5) log2 (2/5) = 0.971

Step 6: Information for Outlook (the weighted average of the sub-attribute entropies).
I(Outlook) = (5/14) * E(Outlook=Sunny) + (4/14) * E(Outlook=Overcast) + (5/14) * E(Outlook=Rainy)
           = (5/14) * 0.971 + (4/14) * 0 + (5/14) * 0.971
           = 0.693

Step 7: Calculate the Gain for Outlook.
Gain(Outlook) = E(S) - I(Outlook) [Step 1 - Step 6] = 0.940 - 0.693 = 0.247
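These seven steps can be verified in a few lines of Python (a sketch, not from the slides), reproducing the 0.940, 0.971, 0.693 and 0.247 values:

```python
import math

log2 = lambda x: math.log(x) / math.log(2)          # the calculator trick noted above

E_S        = -(9/14) * log2(9/14) - (5/14) * log2(5/14)    # overall entropy, ~0.940
E_sunny    = -(2/5)  * log2(2/5)  - (3/5)  * log2(3/5)     # ~0.971
E_overcast = 0.0                                            # pure node: 4 Yes, 0 No
E_rainy    = -(3/5)  * log2(3/5)  - (2/5)  * log2(2/5)     # ~0.971
I_outlook  = (5/14) * E_sunny + (4/14) * E_overcast + (5/14) * E_rainy   # ~0.693
print(round(E_S - I_outlook, 3))                            # Gain(Outlook) ~ 0.247
```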
Take the X2: temp attribute, analyse what the sub-attributes in it are, and count the samples in each (hot: 2 Yes / 2 No; mild: 4 Yes / 2 No; cool: 3 Yes / 1 No).

Step 1: Calculate the overall entropy value.
E(S) = -(9/14) log2 (9/14) - (5/14) log2 (5/14) = 0.940
Step 2: How many sub-attributes are in temp?
3 = hot, mild, cool
Step 3: Calculate the temp=hot entropy value.
E(temp=hot) = -(2/4) log2 (2/4) - (2/4) log2 (2/4) = 1
Step 4: Calculate the temp=mild entropy value.
E(temp=mild) = -(4/6) log2 (4/6) - (2/6) log2 (2/6) = 0.9183
Step 5: Calculate the temp=cool entropy value.
E(temp=cool) = -(3/4) log2 (3/4) - (1/4) log2 (1/4) = 0.8113
Step 6: Information for temp.
I(temp) = (4/14) * 1 + (6/14) * 0.9183 + (4/14) * 0.8113 = 0.911
Step 7: Calculate the Gain for temp.
Gain(Temp) = E(S) - I(Temp) = 0.940 - 0.911 = 0.029
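The same kind of check for the temperature attribute (again a sketch, not from the slides) reproduces the values above:

```python
import math

E_S    = 0.940                                                    # overall entropy from Step 1
E_hot  = 1.0                                                       # 2 Yes, 2 No
E_mild = -(4/6) * math.log2(4/6) - (2/6) * math.log2(2/6)          # ~0.918
E_cool = -(3/4) * math.log2(3/4) - (1/4) * math.log2(1/4)          # ~0.811
I_temp = (4/14) * E_hot + (6/14) * E_mild + (4/14) * E_cool        # ~0.911
print(round(E_S - I_temp, 3))                                      # Gain(Temp) ~ 0.029
```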
Take the X3: humidity attribute (high: 3 Yes / 4 No; normal: 6 Yes / 1 No) and repeat the same steps:
E(Humidity=high) = -(3/7) log2 (3/7) - (4/7) log2 (4/7) = 0.985
E(Humidity=normal) = -(6/7) log2 (6/7) - (1/7) log2 (1/7) = 0.592
I(Humidity) = (7/14) * 0.985 + (7/14) * 0.592 = 0.788
Gain(Humidity) = E(S) - I(Humidity) = 0.940 - 0.788 = 0.152
Take the X4: windy attribute (false: 6 Yes / 2 No; true: 3 Yes / 3 No) and repeat the same steps:
E(Windy=false) = -(6/8) log2 (6/8) - (2/8) log2 (2/8) = 0.811
E(Windy=true) = -(3/6) log2 (3/6) - (3/6) log2 (3/6) = 1
I(Windy) = (8/14) * 0.811 + (6/14) * 1 = 0.892
Gain(Windy) = E(S) - I(Windy) = 0.940 - 0.892 = 0.048
Step 3
We have to check which attribute has the highest information gain value: Gain(Outlook) = 0.247, Gain(Temp) = 0.029, Gain(Humidity) = 0.152, Gain(Windy) = 0.048.
The outlook attribute, which has the maximum value of information gain, is assigned as the root node to grow the decision tree.
Step 4
Having identified that the outlook attribute is the root node, we still need to make 3 branches from it: sunny, overcast and rain.
On checking, the overcast sub-attribute has only (Yes) positive labels with a high purity measure (the labels are similar and pure), so it is considered a leaf node: it has no possibility to grow further. Whereas sunny and rain have both Yes and No labels, which means impurity is there and they have the possibility to branch out as Yes and No.
Below the sunny attribute we will grow the tree by choosing one of the pending attributes (temp, humidity or wind).
Which one to choose? Measure the information gain and choose the one with the highest value.
Step 5: So, let's consider the data samples D1, D2, D8, D9 and D11 pertaining to the sunny sub-attribute, with respect to the other attributes, namely temp, humidity and windy.
Steps 5a, 5b, 5c: compute the entropy and information gain of temp, humidity and windy on the sunny subset (D1, D2, D8, D9, D11).
We have to check which of these attributes has the highest information gain value.
The humidity attribute, which has the maximum value of information gain on this subset, is assigned as the node at Level 1 (below sunny) to grow the decision tree further.
Step 6
To grow the tree further below the rain sub-attribute, let's consider the data samples pertaining to the rain sub-attribute with respect to the other main attributes, namely temp, humidity and windy.
(On checking, under the humidity node the High sub-attribute has only (No) negative labels and the Normal sub-attribute has only (Yes) positive labels, both with a high purity measure, so they are considered leaf nodes.)
Step 7
On the rain branch, the windy attribute is chosen next. On checking, the Strong sub-attribute has only (No) negative labels with a high purity measure and the Weak sub-attribute has only (Yes) positive labels with a high purity measure, so they are considered leaf nodes.
Finally, the decision tree is fully grown: outlook at the root, humidity below sunny, windy below rain, and overcast as a direct Yes leaf.
DECISION TREE REPRESENTATION
• (Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong)
• would be sorted down the leftmost branch of this decision tree and would therefore be classified
as a negative instance (i.e., the tree predicts that Play Tennis = no).
• In general, decision trees represent a disjunction of conjunctions of constraints on the attribute
values of instances.
• Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and the tree
itself to a disjunction of these conjunctions.
(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)