Machine Learning for Language Technology 2015
http://stp.lingfil.uu.se/~santinim/ml/2015/ml4lt_2015.htm
Decision Trees (2)
Entropy, Information Gain, Gain Ratio
Marina Santini
[email protected]
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Autumn 2015
Acknowledgements
• Weka’s slides
• Wikipedia and other websites
• Witten et al. (2011: 99-108; 195-203; 192-203)
Outline
• Attribute selection
• Entropy
• Surprisal
• Information Gain
• Gain Ratio
• Pruning
• Rules
Constructing decision trees
Strategy: top down
Recursive divide-and-conquer fashion
First: select attribute for root node
Create branch for each possible attribute value
Then: split instances into subsets
One for each branch extending from the node
Finally: repeat recursively for each branch, using
only instances that reach the branch
Stop if all instances have the same class
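A minimal sketch of this recursive strategy in Python; the dataset representation (a list of dicts with a "class" key) and the select_attribute hook are our assumptions, with the information-gain criterion of the following slides as the intended selector:

```python
from collections import Counter

def majority_class(instances):
    """Most frequent class among the given instances."""
    return Counter(inst["class"] for inst in instances).most_common(1)[0][0]

def build_tree(instances, attributes, select_attribute):
    """Top-down, divide-and-conquer decision-tree induction (ID3-style)."""
    classes = {inst["class"] for inst in instances}
    if len(classes) == 1:          # stop: all instances share one class
        return classes.pop()
    if not attributes:             # stop: nothing left to split on
        return majority_class(instances)
    best = select_attribute(instances, attributes)    # pick attribute for node
    node = {"attribute": best, "branches": {}}
    for value in {inst[best] for inst in instances}:  # one branch per value
        subset = [inst for inst in instances if inst[best] == value]
        rest = [a for a in attributes if a != best]
        node["branches"][value] = build_tree(subset, rest, select_attribute)
    return node
```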
Play or not?
• The weather dataset (the same table shown later under “Weather data with ID code”, minus the ID column)
Which attribute to select?
[Figure: candidate tree stumps, one for each of the four attributes]
Computing purity: the information measure
• Information is a measure of a reduction of
uncertainty.
• It represents the expected amount of
information that would be needed to “place”
a new instance in the branch.
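For the full weather dataset (9 yes, 5 no), this expected amount of information is info([9,5]) ≈ 0.940 bits, the figure that reappears in the information-gain computations below.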
Which attribute to select?
[Figure: the four tree stumps again, annotated with their information values]
Final decision tree
[Figure: the final decision tree for the weather data]
Splitting stops when the data can’t be split any further
Criterion for attribute selection
Which is the best attribute?
Goal: get the smallest tree
Heuristic: choose the attribute that produces the
“purest” nodes
• Information gain: increases with the average purity of the subsets
• Strategy: choose the attribute that gives the greatest information gain
How to compute Information Gain: Entropy
1. When the number of either yes or no instances
is zero (that is, the node is pure), the
information is zero.
2. When the numbers of yes and no instances are
equal, the information reaches its maximum,
because we are maximally uncertain about the
outcome.
3. Complex scenarios: the measure should also be
applicable to a multiclass situation, where a
multi-staged decision must be made.
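For instance, using the info([yes, no]) notation of the following slides: a pure node has info([4,0]) = 0 bits, while a maximally mixed node has info([2,2]) = 1 bit.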
Entropy
• Entropy (aka expected surprisal): the average surprisal of the possible outcomes, weighted by their probabilities
Surprisal: Definition
• Surprisal (aka self-information) is a measure of the information
content associated with an event in a probability space.
• The smaller the probability of an event, the larger the surprisal
associated with the information that the event occurred.
• By definition, the measure of surprisal is positive and additive. If an
event C is the intersection of two independent events A and B, then
the amount of information gained from knowing that C has happened
equals the sum of the amounts of information of event A and
event B respectively:
I(A ∩ B) = I(A) + I(B)
Surprisal: Formula
• The surprisal I of an event E with probability P(E) is:
I(E) = −log2 P(E)
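For example, one toss of a fair coin has surprisal −log2(1/2) = 1 bit; two independent tosses both landing heads have probability 1/4 and surprisal −log2(1/4) = 2 bits, i.e. 1 + 1, illustrating additivity.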
Entropy: Formulas
Formulas for computing entropy:
entropy(p1, p2, ..., pn) = −p1 log2 p1 − p2 log2 p2 ... − pn log2 pn
For a node with class counts [a, b]: info([a, b]) = entropy(a/(a+b), b/(a+b))
Entropy: Outlook, sunny
For Outlook = sunny (2 yes, 3 no):
info([2,3]) = entropy(2/5, 3/5) = −(2/5) log2(2/5) − (3/5) log2(3/5) ≈ 0.971 bits
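A small Python sketch of this computation; the helper name `info` is ours, mirroring the info([...]) notation of these slides:

```python
from math import log2

def info(counts):
    """Entropy of a class distribution given as raw counts, in bits."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

print(info([2, 3]))  # Outlook = sunny: 2 yes, 3 no -> ~0.971 bits
print(info([4, 0]))  # a pure node -> 0.0 bits
print(info([9, 5]))  # the whole weather dataset -> ~0.940 bits
```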
Measures: Information & Entropy
• Watch out: many statements in the literature
say that information is the same as entropy.
• Properly speaking: entropy is a probabilistic
measure of uncertainty or ignorance, and
information is a measure of a reduction of
uncertainty.
• However, in our context we use entropy (i.e.
the quantity of uncertainty) to measure the
purity of a node.
Example: Outlook
info([2,3]) = 0.971 bits (sunny), info([4,0]) = 0 bits (overcast), info([3,2]) = 0.971 bits (rainy)
Expected information after the split, weighted by branch size:
info([2,3],[4,0],[3,2]) = (5/14) × 0.971 + (4/14) × 0 + (5/14) × 0.971 = 0.693 bits
Computing Information Gain
Information gain: information before splitting –
information after splitting
gain(Outlook ) = info([9,5]) – info([2,3],[4,0],[3,2])
= 0.940 – 0.693
= 0.247 bits
Information gain for attributes from weather data:
gain(Outlook ) = 0.247 bits
gain(Temperature ) = 0.029 bits
gain(Humidity ) = 0.152 bits
gain(Windy ) = 0.048 bits
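A sketch of the same computation in Python, reusing the `info` helper above; the per-branch (yes, no) counts are read off the weather data table shown on the next slides:

```python
def weighted_info(branches):
    """Average entropy after a split, weighted by branch size."""
    total = sum(sum(b) for b in branches)
    return sum(sum(b) / total * info(b) for b in branches)

def gain(branches, before=(9, 5)):
    """Information gain = info before splitting - info after splitting."""
    return info(before) - weighted_info(branches)

print(gain([(2, 3), (4, 0), (3, 2)]))  # Outlook     -> ~0.247 bits
print(gain([(2, 2), (4, 2), (3, 1)]))  # Temperature -> ~0.029 bits
print(gain([(3, 4), (6, 1)]))          # Humidity    -> ~0.152 bits
print(gain([(6, 2), (3, 3)]))          # Windy       -> ~0.048 bits
```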
Information Gain Drawbacks
Problematic: attributes with a large number
of values (extreme case: ID code)
Weather data with ID code
ID code Outlook Temp. Humidity Windy Play
A Sunny Hot High False No
B Sunny Hot High True No
C Overcast Hot High False Yes
D Rainy Mild High False Yes
E Rainy Cool Normal False Yes
F Rainy Cool Normal True No
G Overcast Cool Normal True Yes
H Sunny Mild High False No
I Sunny Cool Normal False Yes
J Rainy Mild Normal False Yes
K Sunny Mild Normal True Yes
L Overcast Mild High True Yes
M Overcast Hot Normal False Yes
N Rainy Mild High True No
Tree stump for ID code attribute
Entropy of split (see Weka book 2011: 105-108): every branch contains a single
instance, so each subset is pure and contributes info([1,0]) = info([0,1]) = 0 bits.
Information gain is therefore maximal for ID code:
gain(ID code) = info([9,5]) − 0 = 0.940 bits
Information Gain Limitations
Problematic: attributes with a large number
of values (extreme case: ID code)
Subsets are more likely to be pure if there is
a large number of values
Information gain is biased towards choosing
attributes with a large number of values
This may result in overfitting (selection of an
attribute that is non-optimal for prediction)
(Another problem: fragmentation)
Gain ratio
Gain ratio: a modification of the information gain
that reduces its bias
Gain ratio takes number and size of branches into
account when choosing an attribute
It corrects the information gain by taking the intrinsic
information of a split into account
Intrinsic information (split info): the entropy of the distribution
of instances across the branches; information about the class is
disregarded
gain ratio(A) = gain(A) / intrinsic information(A)
Gain ratios for weather data
Outlook:
  Info:        0.693
  Gain:        0.940 − 0.693 = 0.247
  Split info:  info([5,4,5]) = 1.577
  Gain ratio:  0.247 / 1.577 = 0.157

Temperature:
  Info:        0.911
  Gain:        0.940 − 0.911 = 0.029
  Split info:  info([4,6,4]) = 1.557
  Gain ratio:  0.029 / 1.557 = 0.019

Humidity:
  Info:        0.788
  Gain:        0.940 − 0.788 = 0.152
  Split info:  info([7,7]) = 1.000
  Gain ratio:  0.152 / 1.000 = 0.152

Windy:
  Info:        0.892
  Gain:        0.940 − 0.892 = 0.048
  Split info:  info([8,6]) = 0.985
  Gain ratio:  0.048 / 0.985 = 0.049
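The same numbers in Python, again reusing the `info` and `gain` sketches above; the split info is simply the entropy of the branch sizes, with class labels ignored:

```python
def gain_ratio(branches, before=(9, 5)):
    """Gain ratio = information gain / split info (intrinsic information)."""
    sizes = [sum(b) for b in branches]
    return gain(branches, before) / info(sizes)

print(gain_ratio([(2, 3), (4, 0), (3, 2)]))     # Outlook -> ~0.157
# ID code: 14 singleton branches, each of them pure
print(gain_ratio([(1, 0)] * 9 + [(0, 1)] * 5))  # 0.940 / log2(14) -> ~0.247
```

Note that even after this correction, ID code’s gain ratio (about 0.247) still beats Outlook’s 0.157, which is the problem taken up on the next slide.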
More on the gain ratio
“Outlook” still comes out top
However: “ID code” has greater gain ratio
Standard fix: ad hoc test to prevent splitting on that
type of attribute
Problem with gain ratio: it may overcompensate
May choose an attribute just because its intrinsic
information is very low
Standard fix: only consider attributes with greater
than average information gain
Interim Summary
Top-down induction of decision trees: ID3,
algorithm developed by Ross Quinlan
Gain ratio just one modification of this basic
algorithm
C4.5: deals with numeric attributes, missing
values, noisy data
Similar approach: CART
There are many other attribute selection
criteria!
(But little difference in accuracy of result)
Pruning
Prevent overfitting to noise in the data
“Prune” the decision tree
Two strategies:
Postpruning
take a fully-grown decision tree and discard
unreliable parts
Prepruning
stop growing a branch when information
becomes unreliable
Postpruning is preferred in practice;
prepruning can “stop early”
Postpruning
First, build full tree
Then, prune it
Fully-grown tree shows all attribute interactions
Problem: some subtrees might be due to chance
effects
Two pruning operations:
Subtree replacement
Subtree raising
Subtree replacement
Bottom-up
Consider replacing a tree only
after considering all its subtrees
Subtree raising
Delete node
Redistribute instances
Slower than subtree
replacement
(Worthwhile?)
Prepruning
Based on statistical significance test
Stop growing the tree when there is no statistically
significant association between any attribute and the
class at a particular node
Most popular test: chi-squared test
ID3 used chi-squared test in addition to
information gain
Only statistically significant attributes were allowed to be
selected by information gain procedure
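As an illustration only (not ID3’s exact procedure), here is how such a test could be run with SciPy on the Outlook/Play contingency table; with just 14 instances the expected counts are small, so the chi-squared approximation is rough:

```python
from scipy.stats import chi2_contingency

# Rows = Outlook values; columns = (yes, no) class counts
observed = [[2, 3],   # sunny
            [4, 0],   # overcast
            [3, 2]]   # rainy
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
# Prepruning would stop growing this branch if p_value exceeded the threshold.
```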
From trees to rules
Easy: converting a tree into a set of rules
One rule for each leaf:
Produces rules that are unambiguous
Doesn’t matter in which order they are executed
But: resulting rules are unnecessarily complex
Pruning to remove redundant tests/rules
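For example, the tree commonly induced from the weather data (Outlook at the root, Humidity under sunny, Windy under rainy) yields one rule per leaf:
If outlook = sunny and humidity = high then play = no
If outlook = sunny and humidity = normal then play = yes
If outlook = overcast then play = yes
If outlook = rainy and windy = true then play = no
If outlook = rainy and windy = false then play = yes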
From rules to trees
More difficult: transforming a rule set into a tree
Tree cannot easily express disjunction between rules
From rules to trees: Example
Example: rules which test different attributes
If a and b then x
If c and d then x
Symmetry needs to be broken
Corresponding tree contains identical subtrees
( “replicated subtree problem”)
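A sketch of the resulting tree, breaking the symmetry by testing a first; the subtree that tests c and then d has to be copied wherever the first rule has already failed:

```
a?
├─ yes → b?
│        ├─ yes → x
│        └─ no  → c? → d? → x   (replicated subtree)
└─ no  → c? → d? → x            (replicated subtree)
```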
Topic Summary
• Attribute selection
• Entropy
• Surprisal
• Information Gain
• Gain Ratio
• Pruning
• Rules
• The quizzes are a little tricky, just to double-check
that your attention is still with me
Quiz 1: Regression and Classification
Which of these statements is correct in the context
of machine learning?
1. Classification is the process of computing a
model that predicts a numeric quantity.
2. Regression and classification mean the same.
3. Regression is the process of computing a model
that predicts a numeric quantity.
Quiz 2: Information Gain
What is the main drawback of the IG metric in
certain contexts?
1. It is biased towards attributes that have
many values.
2. It is based on entropy rather than surprisal.
3. None of the above.
Quiz 3: Gain Ratio
What is the main difference between IG and
GR?
1. GR disregards the information about the
class, and IG takes the class into account.
2. IG disregards the information about the class
and GR takes the class into account.
3. None of the above.
Quiz 4: Pruning
Which pruning strategy is commonly recommended?
1. Prepruning
2. Postpruning
3. Subtree raising
The End