Classification: Decision Trees

This document discusses decision trees and their construction. It explains that decision trees classify examples by sorting them down the tree from root nodes to leaf nodes based on attribute tests. The document outlines techniques for choosing the best attribute to split on at each node, including information gain and gain ratio, which evaluate attributes based on how well they separate examples into distinct classes. It provides an example of building a decision tree to predict whether to "Play" using weather data, and calculates the information gain of different attributes on this data.

Uploaded by

Ashish Tiwari

Outline
• Top-Down Decision Tree Construction
• Choosing the Splitting Attribute
  ◦ Information Gain and Gain Ratio

DECISION TREE
• An internal node is a test on an attribute.
• A branch represents an outcome of the test, e.g., Color=red.
• A leaf node represents a class label or class label distribution.
• At each node, one attribute is chosen to split training examples into distinct classes as much as possible.
• A new case is classified by following a matching path to a leaf node.

Weather Data: Play or not Play?
Outlook Temperature Humidity Windy Play?
sunny hot high false No
sunny hot high true No
overcast hot high false Yes
rain mild high false Yes
rain cool normal false Yes
rain cool normal true No
overcast cool normal true Yes
sunny mild high false No
sunny cool normal false Yes
rain mild normal false Yes
sunny mild normal true Yes
overcast mild high true Yes
overcast hot normal false Yes
rain mild high true No
Note: Outlook is the forecast, no relation to the Microsoft email program.

Example Tree for “Play?”
Outlook = sunny:
    Humidity = high: No
    Humidity = normal: Yes
Outlook = overcast: Yes
Outlook = rain:
    Windy = true: No
    Windy = false: Yes

Building Decision Tree [Q93]
• Top-down tree construction
  ◦ At start, all training examples are at the root.
  ◦ Partition the examples recursively by choosing one attribute each time.
• Bottom-up tree pruning
  ◦ Remove subtrees or branches, in a bottom-up manner, to improve the estimated accuracy on new cases.

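The top-down step can be sketched in Python as a recursive, ID3-style procedure. This sketch assumes information gain (introduced on the following slides) as the goodness function, stops at pure nodes or when no attributes remain, and omits bottom-up pruning; the names `build_tree`, `info_gain` and the tuple encoding of the weather data are hypothetical.

```python
import math
from collections import Counter

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy"]
# Weather data: (Outlook, Temperature, Humidity, Windy, Play?)
ROWS = [
    ("sunny", "hot", "high", "false", "No"),
    ("sunny", "hot", "high", "true", "No"),
    ("overcast", "hot", "high", "false", "Yes"),
    ("rain", "mild", "high", "false", "Yes"),
    ("rain", "cool", "normal", "false", "Yes"),
    ("rain", "cool", "normal", "true", "No"),
    ("overcast", "cool", "normal", "true", "Yes"),
    ("sunny", "mild", "high", "false", "No"),
    ("sunny", "cool", "normal", "false", "Yes"),
    ("rain", "mild", "normal", "false", "Yes"),
    ("sunny", "mild", "normal", "true", "Yes"),
    ("overcast", "mild", "high", "true", "Yes"),
    ("overcast", "hot", "normal", "false", "Yes"),
    ("rain", "mild", "high", "true", "No"),
]


def entropy(labels):
    """Entropy in bits of a list of class labels (only non-zero counts occur)."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, col):
    """Information before the split minus the weighted information after it."""
    after = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - after


def build_tree(rows, attributes):
    """Top-down construction: choose the best attribute, partition, recurse."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1 or not attributes:
        # Pure node, or nothing left to test: return the majority class as a leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(rows, ATTRIBUTES.index(a)))
    col = ATTRIBUTES.index(best)
    rest = [a for a in attributes if a != best]
    return {best: {value: build_tree([r for r in rows if r[col] == value], rest)
                   for value in sorted({r[col] for r in rows})}}


print(build_tree(ROWS, ATTRIBUTES))
```

On the weather data this yields the same structure as the example tree for “Play?”: Outlook at the root, Humidity under sunny, Windy under rain. Ties in gain would be broken by attribute order, which is an arbitrary choice here.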
Choosing the Splitting Attribute
• At each node, the available attributes are evaluated on how well they separate the classes of the training examples. A goodness function is used for this purpose.
• Typical goodness functions:
  ◦ information gain (ID3/C4.5)
  ◦ information gain ratio
  ◦ gini index

witten&eibe
Which attribute to select?

A criterion for attribute selection
• Which is the best attribute?
  ◦ The one which will result in the smallest tree
  ◦ Heuristic: choose the attribute that produces the “purest” nodes
• Popular impurity criterion: information gain
  ◦ Information gain increases with the average purity of the subsets that an attribute produces
• Strategy: choose the attribute that results in the greatest information gain

Computing information
• Information is measured in bits
  ◦ Given a probability distribution, the info required to predict an event is the distribution’s entropy
  ◦ Entropy gives the information required in bits (this can involve fractions of bits!)
• Formula for computing the entropy:
$$\text{entropy}(p_1, p_2, \ldots, p_n) = -p_1 \log_2 p_1 - p_2 \log_2 p_2 - \cdots - p_n \log_2 p_n$$

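The entropy formula translates directly into Python. This sketch assumes base-2 logarithms, so results are in bits, and skips zero probabilities, following the convention that 0·log 0 is evaluated as zero; the function name is our own.

```python
import math


def entropy(*probs):
    """entropy(p1, ..., pn) in bits; terms with p = 0 are skipped (0*log 0 := 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


print(round(entropy(9 / 14, 5 / 14), 3))  # 0.94  (whole weather data: 9 Yes, 5 No)
print(entropy(0.5, 0.5))                  # 1.0   (two equally likely classes)
```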
*Claude Shannon, “Father of information theory”
Born: 30 April 1916
Died: 24 February 2001
Claude Shannon, who has died aged 84, perhaps more than anyone laid the groundwork for today’s digital revolution. His exposition of information theory, stating that all information could be represented mathematically as a succession of noughts and ones, facilitated the digital manipulation of data without which today’s information society would be unthinkable.
Shannon’s master’s thesis, obtained in 1940 at MIT, demonstrated that problem solving could be achieved by manipulating the symbols 0 and 1 in a process that could be carried out automatically with electrical circuitry. That dissertation has been hailed as one of the most significant master’s theses of the 20th century. Eight years later, Shannon published another landmark paper, A Mathematical Theory of Communication, generally taken as his most important scientific contribution.
Shannon applied the same radical approach to cryptography research, in which he later became a consultant to the US government.
Many of Shannon’s pioneering insights were developed before they could be applied in practical form. He was truly a remarkable man, yet unknown to most of the world.

Example: attribute “Outlook”, 1
Outlook Temperature Humidity Windy Play?
sunny hot high false No
sunny hot high true No
overcast hot high false Yes
rain mild high false Yes
rain cool normal false Yes
rain cool normal true No
overcast cool normal true Yes
sunny mild high false No
sunny cool normal false Yes
rain mild normal false Yes
sunny mild normal true Yes
overcast mild high true Yes
overcast hot normal false Yes
rain mild high true No

Example: attribute “Outlook”, 2
• “Outlook” = “Sunny”:
$$\text{info}([2,3]) = \text{entropy}(2/5, 3/5) = -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} = 0.971 \text{ bits}$$
• “Outlook” = “Overcast”:
$$\text{info}([4,0]) = \text{entropy}(1, 0) = -1\log_2 1 - 0\log_2 0 = 0 \text{ bits}$$
Note: log(0) is not defined, but we evaluate 0·log(0) as zero.
• “Outlook” = “Rainy”:
$$\text{info}([3,2]) = \text{entropy}(3/5, 2/5) = -\tfrac{3}{5}\log_2\tfrac{3}{5} - \tfrac{2}{5}\log_2\tfrac{2}{5} = 0.971 \text{ bits}$$
• Expected information for attribute:
$$\text{info}([2,3],[4,0],[3,2]) = \tfrac{5}{14}\times 0.971 + \tfrac{4}{14}\times 0 + \tfrac{5}{14}\times 0.971 = 0.693 \text{ bits}$$

Computing the information gain
• Information gain: (information before split) - (information after split)
$$\text{gain}(\text{Outlook}) = \text{info}([9,5]) - \text{info}([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247 \text{ bits}$$
• Compute for attribute “Humidity”

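The Outlook calculation can be reproduced from the class counts alone. In this sketch, which is an illustration rather than the slides' own code, a count list such as [2, 3] means 2 “Yes” and 3 “No”; `info`, `expected_info` and `gain` are hypothetical helper names.

```python
import math


def info(counts):
    """Entropy in bits of a class-count list such as [2, 3] (2 Yes, 3 No)."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def expected_info(*branches):
    """Information after a split: branch entropies weighted by branch size."""
    total = sum(sum(b) for b in branches)
    return sum(sum(b) / total * info(b) for b in branches)


def gain(before, *branches):
    """(information before split) - (information after split)."""
    return info(before) - expected_info(*branches)


outlook = ([2, 3], [4, 0], [3, 2])  # sunny, overcast, rainy as [Yes, No]
print(f"expected info: {expected_info(*outlook):.4f} bits")
print(f"gain(Outlook): {gain([9, 5], *outlook):.4f} bits")
```

The printed values agree with the 0.693 and 0.247 bits above to within rounding (the exact expected information is about 0.6935 bits).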
Example: attribute “Humidity”
• “Humidity” = “High”:
$$\text{info}([3,4]) = \text{entropy}(3/7, 4/7) = -\tfrac{3}{7}\log_2\tfrac{3}{7} - \tfrac{4}{7}\log_2\tfrac{4}{7} = 0.985 \text{ bits}$$
• “Humidity” = “Normal”:
$$\text{info}([6,1]) = \text{entropy}(6/7, 1/7) = -\tfrac{6}{7}\log_2\tfrac{6}{7} - \tfrac{1}{7}\log_2\tfrac{1}{7} = 0.592 \text{ bits}$$
• Expected information for attribute:
$$\text{info}([3,4],[6,1]) = \tfrac{7}{14}\times 0.985 + \tfrac{7}{14}\times 0.592 = 0.788 \text{ bits}$$
• Information gain:
$$\text{info}([9,5]) - \text{info}([3,4],[6,1]) = 0.940 - 0.788 = 0.152 \text{ bits}$$

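The Humidity figures can also be derived directly from the table rather than from pre-counted lists. This sketch keeps only the Humidity and Play? columns (a simplification for brevity) and groups the labels by attribute value before averaging the branch entropies; the names are illustrative.

```python
import math
from collections import Counter

# (Humidity, Play?) columns of the weather data, in table order
HUMIDITY_PLAY = [
    ("high", "No"), ("high", "No"), ("high", "Yes"), ("high", "Yes"),
    ("normal", "Yes"), ("normal", "No"), ("normal", "Yes"), ("high", "No"),
    ("normal", "Yes"), ("normal", "Yes"), ("normal", "Yes"), ("high", "Yes"),
    ("normal", "Yes"), ("high", "No"),
]


def info(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


# Group the Play? labels by Humidity value
branches = {}
for value, label in HUMIDITY_PLAY:
    branches.setdefault(value, []).append(label)

after = sum(len(b) / len(HUMIDITY_PLAY) * info(b) for b in branches.values())
gain = info([label for _, label in HUMIDITY_PLAY]) - after
print(f"high: {info(branches['high']):.3f}, normal: {info(branches['normal']):.3f}")
# high: 0.985, normal: 0.592
print(f"after split: {after:.3f}, gain: {gain:.3f}")  # after split: 0.788, gain: 0.152
```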
Computing the information gain
• Information gain for attributes from weather data:
  gain(“Outlook”) = 0.247 bits
  gain(“Temperature”) = 0.029 bits
  gain(“Humidity”) = 0.152 bits
  gain(“Windy”) = 0.048 bits

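A sketch that computes all four gains from the raw weather data and picks the attribute to split on; the row encoding and helper names are assumptions made for illustration.

```python
import math
from collections import Counter

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy"]
# Weather data: (Outlook, Temperature, Humidity, Windy, Play?)
ROWS = [
    ("sunny", "hot", "high", "false", "No"),
    ("sunny", "hot", "high", "true", "No"),
    ("overcast", "hot", "high", "false", "Yes"),
    ("rain", "mild", "high", "false", "Yes"),
    ("rain", "cool", "normal", "false", "Yes"),
    ("rain", "cool", "normal", "true", "No"),
    ("overcast", "cool", "normal", "true", "Yes"),
    ("sunny", "mild", "high", "false", "No"),
    ("sunny", "cool", "normal", "false", "Yes"),
    ("rain", "mild", "normal", "false", "Yes"),
    ("sunny", "mild", "normal", "true", "Yes"),
    ("overcast", "mild", "high", "true", "Yes"),
    ("overcast", "hot", "normal", "false", "Yes"),
    ("rain", "mild", "high", "true", "No"),
]


def info(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain(rows, col):
    """Information gain of splitting rows on column col."""
    after = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        after += len(subset) / len(rows) * info(subset)
    return info([r[-1] for r in rows]) - after


gains = {a: gain(ROWS, i) for i, a in enumerate(ATTRIBUTES)}
for attribute, g in gains.items():
    print(f"gain({attribute}) = {g:.3f} bits")
print("split on:", max(gains, key=gains.get))  # split on: Outlook
```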
Continuing to split
Within the “Outlook” = “sunny” branch:
  gain(“Humidity”) = 0.971 bits
  gain(“Temperature”) = 0.571 bits
  gain(“Windy”) = 0.020 bits

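The gains in the sunny branch can be checked by applying the same gain calculation to just the five Outlook = sunny rows. The row subset below is copied from the weather table; the helper names are hypothetical.

```python
import math
from collections import Counter

ATTRIBUTES = ["Temperature", "Humidity", "Windy"]
# The five Outlook = sunny rows: (Temperature, Humidity, Windy, Play?)
SUNNY = [
    ("hot", "high", "false", "No"),
    ("hot", "high", "true", "No"),
    ("mild", "high", "false", "No"),
    ("cool", "normal", "false", "Yes"),
    ("mild", "normal", "true", "Yes"),
]


def info(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain(rows, col):
    """Information gain of splitting rows on column col."""
    after = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        after += len(subset) / len(rows) * info(subset)
    return info([r[-1] for r in rows]) - after


gains = {a: gain(SUNNY, i) for i, a in enumerate(ATTRIBUTES)}
for attribute, g in gains.items():
    print(f"gain({attribute}) = {g:.3f} bits")
# gain(Temperature) = 0.571 bits
# gain(Humidity) = 0.971 bits
# gain(Windy) = 0.020 bits
```

Humidity separates the sunny rows perfectly, so its gain equals the full entropy of that subset (0.971 bits).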
The final decision tree
• Note: not all leaves need to be pure; sometimes identical instances have different classes
• Splitting stops when data can’t be split any further

Highly-branching attributes
• Problematic: attributes with a large number of values (extreme case: ID code)
• Subsets are more likely to be pure if there is a large number of values
  ⇒ Information gain is biased towards choosing attributes with a large number of values
  ⇒ This may result in overfitting (selection of an attribute that is non-optimal for prediction)

Weather Data with ID code
ID Outlook Temperature Humidity Windy Play?
A sunny hot high false No
B sunny hot high true No
C overcast hot high false Yes
D rain mild high false Yes
E rain cool normal false Yes
F rain cool normal true No
G overcast cool normal true Yes
H sunny mild high false No
I sunny cool normal false Yes
J rain mild normal false Yes
K sunny mild normal true Yes
L overcast mild high true Yes
M overcast hot normal false Yes
N rain mild high true No

Split for ID Code Attribute
Entropy of split = 0 (since each leaf node is “pure”, having only one case).
Information gain is maximal for ID code: 0.940 - 0 = 0.940 bits.

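A short check of why the ID code split looks so attractive: grouping the labels by ID code puts exactly one instance in each branch, so the information after the split is 0 and the gain equals info([9,5]). The (ID, Play?) pairs come from the ID-code table; the code is only an illustrative sketch.

```python
import math
from collections import Counter

# (ID code, Play?) pairs from the weather data with ID code
ID_PLAY = list(zip("ABCDEFGHIJKLMN",
                   ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                    "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]))


def info(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


# Group labels by ID code: every branch receives exactly one instance
branches = {}
for code, label in ID_PLAY:
    branches.setdefault(code, []).append(label)

after = sum(len(b) / len(ID_PLAY) * info(b) for b in branches.values())
gain = info([label for _, label in ID_PLAY]) - after
print(f"{len(branches)} branches, info after split: {after:.3f}, gain: {gain:.3f}")
# 14 branches, info after split: 0.000, gain: 0.940
```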
Gain ratio
• Gain ratio: a modification of the information gain that reduces its bias towards high-branch attributes
• The intrinsic information that the gain ratio divides by is
  ◦ Large when data is evenly spread across the branches
  ◦ Small when all data belong to one branch
• Gain ratio takes number and size of branches into account when choosing an attribute
• It corrects the information gain by taking the intrinsic information of a split into account (i.e., how much info we need to tell which branch an instance belongs to)

Gain Ratio and Intrinsic Info.
• Intrinsic information: entropy of distribution of instances into branches
$$\text{IntrinsicInfo}(S, A) = -\sum_i \frac{|S_i|}{|S|}\log_2\frac{|S_i|}{|S|}$$
where S_i is the subset of examples in S that take the i-th value of attribute A.
• Gain ratio (Quinlan ’86) normalizes info gain by:
$$\text{GainRatio}(S, A) = \frac{\text{Gain}(S, A)}{\text{IntrinsicInfo}(S, A)}$$

Computing the gain ratio
• Example: intrinsic information for ID code
$$\text{info}([1,1,\ldots,1]) = 14 \times \left(-\tfrac{1}{14}\log_2\tfrac{1}{14}\right) = 3.807 \text{ bits}$$
• Importance of attribute decreases as intrinsic information gets larger
• Gain ratio:
$$\text{gain\_ratio}(\text{Attribute}) = \frac{\text{gain}(\text{Attribute})}{\text{intrinsic\_info}(\text{Attribute})}$$
• Example:
$$\text{gain\_ratio}(\text{ID\_code}) = \frac{0.940 \text{ bits}}{3.807 \text{ bits}} \approx 0.247$$

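The intrinsic information and gain ratio definitions can be sketched as two small Python functions that work on branch sizes. Passing the gain in as a precomputed number is a simplification for this example; the names `intrinsic_info` and `gain_ratio` are our own.

```python
import math


def intrinsic_info(branch_sizes):
    """Entropy in bits of how the instances are distributed over the branches."""
    total = sum(branch_sizes)
    return -sum(s / total * math.log2(s / total) for s in branch_sizes if s > 0)


def gain_ratio(gain, branch_sizes):
    """Gain(S, A) / IntrinsicInfo(S, A); the gain is passed in precomputed."""
    return gain / intrinsic_info(branch_sizes)


id_code_split = [1] * 14  # 14 branches, one instance each
print(f"{intrinsic_info(id_code_split):.3f}")     # 3.807
print(f"{gain_ratio(0.940, id_code_split):.3f}")  # 0.247
```

With the same function, `intrinsic_info([5, 4, 5])` gives about 1.577 bits, the split information of Outlook.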
Gain ratios for weather data
Attribute    Info   Gain                 Split info             Gain ratio
Outlook      0.693  0.940-0.693 = 0.247  info([5,4,5]) = 1.577  0.247/1.577 = 0.156
Temperature  0.911  0.940-0.911 = 0.029  info([4,6,4]) = 1.557  0.029/1.557 = 0.019
Humidity     0.788  0.940-0.788 = 0.152  info([7,7]) = 1.000    0.152/1.000 = 0.152
Windy        0.892  0.940-0.892 = 0.048  info([8,6]) = 0.985    0.048/0.985 = 0.049

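The gain ratio table can be recomputed from the raw data. In this sketch the split information is simply the entropy of the attribute's own value distribution (for example info([4,6,4]) for Temperature); the encoding and names are illustrative assumptions.

```python
import math
from collections import Counter

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy"]
# Weather data: (Outlook, Temperature, Humidity, Windy, Play?)
ROWS = [
    ("sunny", "hot", "high", "false", "No"),
    ("sunny", "hot", "high", "true", "No"),
    ("overcast", "hot", "high", "false", "Yes"),
    ("rain", "mild", "high", "false", "Yes"),
    ("rain", "cool", "normal", "false", "Yes"),
    ("rain", "cool", "normal", "true", "No"),
    ("overcast", "cool", "normal", "true", "Yes"),
    ("sunny", "mild", "high", "false", "No"),
    ("sunny", "cool", "normal", "false", "Yes"),
    ("rain", "mild", "normal", "false", "Yes"),
    ("sunny", "mild", "normal", "true", "Yes"),
    ("overcast", "mild", "high", "true", "Yes"),
    ("overcast", "hot", "normal", "false", "Yes"),
    ("rain", "mild", "high", "true", "No"),
]


def entropy(values):
    """Entropy in bits of the empirical distribution of a list of values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def gain_and_ratio(rows, col):
    """Return (gain, split info, gain ratio) for splitting rows on column col."""
    after = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        after += len(subset) / len(rows) * entropy(subset)
    gain = entropy([r[-1] for r in rows]) - after
    split_info = entropy([r[col] for r in rows])  # intrinsic info of the split
    return gain, split_info, gain / split_info


results = {a: gain_and_ratio(ROWS, i) for i, a in enumerate(ATTRIBUTES)}
for attribute, (g, split_info, ratio) in results.items():
    print(f"{attribute:<12} gain {g:.3f}  split info {split_info:.3f}  ratio {ratio:.3f}")
print("best:", max(results, key=lambda a: results[a][2]))  # best: Outlook
```

Outlook has the highest gain ratio of the four attributes (about 0.156, just ahead of Humidity at about 0.152), and info([4,6,4]) for Temperature comes out at about 1.557 bits.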
More on the gain ratio
• “Outlook” still comes out top
• However: “ID code” has greater gain ratio
  ◦ Standard fix: ad hoc test to prevent splitting on that type of attribute
• Problem with gain ratio: it may overcompensate
  ◦ May choose an attribute just because its intrinsic information is very low
  ◦ Standard fix:
    - First, only consider attributes with greater than average information gain
    - Then, compare them on gain ratio

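The standard fix for overcompensation can be sketched as a two-stage selection. The gains and split-information values below are taken from the weather-data results (with Temperature's split information info([4,6,4]) ≈ 1.557); using “at least average” rather than strictly greater keeps the candidate list non-empty when all gains are equal, which is our own choice, and `select_attribute` is a hypothetical name.

```python
def select_attribute(gain, split_info):
    """Keep attributes whose gain is at least the average, then maximise gain ratio."""
    average = sum(gain.values()) / len(gain)
    candidates = [a for a in gain if gain[a] >= average]  # never empty
    return max(candidates, key=lambda a: gain[a] / split_info[a])


# Gains and split information for the weather data (bits)
GAIN = {"Outlook": 0.247, "Temperature": 0.029, "Humidity": 0.152, "Windy": 0.048}
SPLIT_INFO = {"Outlook": 1.577, "Temperature": 1.557, "Humidity": 1.000, "Windy": 0.985}

print(select_attribute(GAIN, SPLIT_INFO))  # Outlook
# With ID code included, its large gain lifts the average above every other gain
print(select_attribute({**GAIN, "ID code": 0.940},
                       {**SPLIT_INFO, "ID code": 3.807}))  # ID code
```

Among the four weather attributes the average gain is about 0.119, so only Outlook and Humidity are compared on gain ratio, and Outlook wins. If ID code is left in, its gain of 0.940 raises the average above Outlook's gain, so the heuristic alone still picks ID code; this is why the ad hoc test against ID-like attributes is still needed.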
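*CART Splitting Criteria: Gini Index
• If a data set T contains examples from n classes, the gini index gini(T) is defined as
$$\text{gini}(T) = 1 - \sum_{j=1}^{n} p_j^2$$
where p_j is the relative frequency of class j in T.
• gini(T) is minimized (reaching 0) when the classes in T are maximally skewed, i.e., when all examples in T belong to one class.
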
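A minimal sketch of the gini index for a class-count list, gini(T) = 1 − Σ_j p_j², with p_j taken as count/total. The function name is hypothetical.

```python
def gini(counts):
    """gini(T) = 1 - sum_j p_j**2, where p_j = count of class j / total."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)


print(f"{gini([9, 5]):.3f}")  # 0.459  (whole weather data: 9 Yes, 5 No)
print(gini([7, 7]))           # 0.5    (two classes, evenly mixed)
print(gini([14, 0]))          # 0.0    (one class only: the minimum)
```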
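*Gini Index
• After splitting T into two subsets T1 and T2 with sizes N1 and N2, the gini index of the split data is defined as
$$\text{gini}_{\text{split}}(T) = \frac{N_1}{N}\,\text{gini}(T_1) + \frac{N_2}{N}\,\text{gini}(T_2), \qquad N = N_1 + N_2$$
• The attribute providing the smallest gini_split(T) is chosen to split the node.
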
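A sketch of the split criterion, gini_split(T) = (N1/N)·gini(T1) + (N2/N)·gini(T2), applied to the two binary weather attributes. Only Humidity and Windy are compared because the definition here covers two subsets; grouping the values of three-valued attributes such as Outlook into two subsets, as CART does, is not shown. The counts are [Yes, No] pairs from the weather table, and the helper names are our own.

```python
def gini(counts):
    """gini(T) = 1 - sum_j p_j**2 for a class-count list."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)


def gini_split(counts1, counts2):
    """Size-weighted gini index of a two-way split of T into T1 and T2."""
    n1, n2 = sum(counts1), sum(counts2)
    n = n1 + n2
    return n1 / n * gini(counts1) + n2 / n * gini(counts2)


# [Yes, No] counts for the two binary attributes of the weather data
SPLITS = {
    "Humidity": ([3, 4], [6, 1]),  # high, normal
    "Windy": ([3, 3], [6, 2]),     # true, false
}
for name, (t1, t2) in SPLITS.items():
    print(f"{name}: {gini_split(t1, t2):.3f}")  # Humidity: 0.367, Windy: 0.429
print("choose:", min(SPLITS, key=lambda a: gini_split(*SPLITS[a])))  # choose: Humidity
```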
Discussion
• Algorithm for top-down induction of decision trees (“ID3”) was developed by Ross Quinlan
  ◦ Gain ratio is just one modification of this basic algorithm
  ◦ Led to development of C4.5, which can deal with numeric attributes, missing values, and noisy data
• Similar approach: CART (to be covered later)
• There are many other attribute selection criteria! (But almost no difference in accuracy of result.)

Summary
• Top-Down Decision Tree Construction
• Choosing the Splitting Attribute
  ◦ Information Gain is biased towards attributes with a large number of values
  ◦ Gain Ratio takes number and size of branches into account when choosing an attribute
