Decision Trees
Decision tree representation
ID3 learning algorithm
Entropy, information gain
Overfitting
Introduction
Goal: Categorization
Given an event, predict its category. Examples:
Who won a given ball game?
How should we file a given email?
What word sense was intended for a given occurrence of a word?
Event = list of features. Examples:
Ball game: Which players were on offense?
Email: Who sent the email?
Disambiguation: What was the preceding word?
Introduction
Use a decision tree to predict categories for new events.
Use training data to build the decision tree.
[Diagram: training events and categories → decision tree; new event → decision tree → predicted category]
Decision Tree for PlayTennis
Outlook
  Sunny → Humidity
            High → No
            Normal → Yes
  Overcast
  Rain
Each internal node tests an attribute
Each branch corresponds to an attribute value
Each leaf node assigns a classification
Word Sense Disambiguation
Given an occurrence of a word, decide which sense, or meaning, was intended.
Example: "run"
run1: move swiftly (I ran to the store.)
run2: operate (I run a store.)
run3: flow (Water runs from the spring.)
run4: length of torn stitches (Her stockings had a run.)
etc.
Word Sense Disambiguation
Categories
Use word sense labels (run1, run2, etc.) to name the possible categories.
Features
Features describe the context of the word we want to disambiguate.
Possible features include (see the sketch below):
near(w): is the given word near an occurrence of word w?
pos: the word's part of speech
left(w): is the word immediately preceded by the word w?
etc.
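A minimal sketch of how such features might be collected for one occurrence of a target word, assuming a tokenized sentence and a target index; the window size and the caller-supplied part-of-speech tag are illustrative assumptions, and only the near(w), left(w), and pos features named above are built.

```python
# Hypothetical feature extractor for one occurrence of a target word.
# The window size and the caller-supplied pos tag are illustrative.
def extract_features(tokens, target_index, pos_tag, window=3):
    """Return a feature dict for the word at tokens[target_index]."""
    features = {"pos": pos_tag}
    # left(w): the word immediately preceding the target
    if target_index > 0:
        features["left(%s)" % tokens[target_index - 1]] = True
    # near(w): any word within `window` positions of the target
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    for i in range(lo, hi):
        if i != target_index:
            features["near(%s)" % tokens[i]] = True
    return features

tokens = "water runs from the spring".split()
print(extract_features(tokens, 1, pos_tag="verb"))
```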
Word Sense Disambiguation
Example decision tree:
pos
  noun → near(stocking)
            yes → run4
            no → run1
  verb → near(race)
            yes → run2
            no → near(river)
                    yes → run3
                    no → run1
(Note: decision trees for WSD tend to be quite large)
WSD: Sample Training Data
pos    near(race)   near(river)   near(stockings)   Word sense
noun   no           no            no                run4
verb   no           no            no                run1
verb   no           yes           no                run3
noun   yes          yes           yes               run4
verb   no           no            yes               run1
verb   yes          yes           no                run2
verb   no           yes           yes               run3
Decision Tree for Conjunction
Outlook=Sunny ∧ Wind=Weak
Outlook
  Sunny → Wind
            Strong → No
            Weak → Yes
  Overcast → No
  Rain → No
Decision Tree for Disjunction
Outlook=Sunny ∨ Wind=Weak
Outlook
  Sunny → Yes
  Overcast → Wind
            Strong → No
            Weak → Yes
  Rain → Wind
            Strong → No
            Weak → Yes
Decision Tree for XOR
Outlook=Sunny XOR Wind=Weak
Outlook
  Sunny → Wind
            Strong → Yes
            Weak → No
  Overcast → Wind
            Strong → No
            Weak → Yes
  Rain → Wind
            Strong → No
            Weak → Yes
Decision Tree
• Decision trees represent disjunctions of conjunctions:
Outlook
  Sunny → Humidity
            High → No
            Normal → Yes
  Overcast → Yes
  Rain → Wind
            Strong → No
            Weak → Yes
(Outlook=Sunny ∧ Humidity=Normal)
∨ (Outlook=Overcast)
∨ (Outlook=Rain ∧ Wind=Weak)
When to consider Decision Trees
Instances describable by attribute-value pairs
Target function is discrete valued
Disjunctive hypothesis may be required
Possibly noisy training data
Missing attribute values
Examples:
Medical diagnosis
Credit risk analysis
Object classification for robot manipulator (Tan 1993)
Top-Down Induction of Decision Trees (ID3)
1. A ← the "best" decision attribute for the next node
2. Assign A as the decision attribute for the node
3. For each value of A, create a new descendant
4. Sort training examples to the leaf nodes according to the attribute value of the branch
5. If all training examples are perfectly classified (same value of the target attribute), stop; else iterate over the new leaf nodes
(A recursive sketch of these steps follows below.)
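A compact recursive sketch of the five steps, assuming training examples are dicts mapping attribute names to values plus a "label" key; select_attribute stands in for the "best attribute" chooser (information gain, introduced on the following slides), and the dict-of-dicts tree shape is my own choice.

```python
from collections import Counter

# Recursive sketch of the five ID3 steps above. Examples are assumed to be
# dicts mapping attribute names to values, plus a "label" key; the
# select_attribute callback picks the "best" attribute (information gain).
def id3(examples, attributes, select_attribute):
    labels = [e["label"] for e in examples]
    if len(set(labels)) == 1:          # step 5: all examples agree -> leaf
        return labels[0]
    if not attributes:                 # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = select_attribute(examples, attributes)             # steps 1-2
    tree = {best: {}}
    for value in set(e[best] for e in examples):              # step 3
        subset = [e for e in examples if e[best] == value]    # step 4
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, select_attribute)
    return tree
```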
Which attribute is best?
A1: [29+,35-] splits into [21+,5-] (True) and [8+,30-] (False)
A2: [29+,35-] splits into [18+,33-] (True) and [11+,2-] (False)
Entropy
S is a sample of training examples
p+ is the proportion of positive examples
p- is the proportion of negative examples
Entropy measures the impurity of S
Entropy(S) = -p+ log2 p+ - p- log2 p-
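A small helper computing this two-class entropy, treating 0·log2(0) as 0 by the usual convention; the function name and count-based interface are my own.

```python
import math

# Entropy of a sample with `pos` positive and `neg` negative examples,
# following the formula above; 0*log2(0) is treated as 0 by convention.
def entropy(pos, neg):
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count > 0:
            p = count / total
            result -= p * math.log2(p)
    return result

print(round(entropy(29, 35), 2))  # 0.99, the [29+,35-] sample used later
```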
Entropy
Entropy(S) = expected number of bits needed to encode the class (+ or -) of a randomly drawn member of S (under the optimal, shortest-length code)
Why?
Information theory: an optimal-length code assigns -log2(p) bits to a message having probability p.
So the expected number of bits to encode (+ or -) of a random member of S is:
-p+ log2 p+ - p- log2 p-
Information Gain (S=E)
Gain(S,A): expected reduction in entropy due to sorting S on attribute A
Gain(S,A) = Entropy(S) - Σv∈Values(A) (|Sv|/|S|) Entropy(Sv)
Entropy([29+,35-]) = -29/64 log2(29/64) - 35/64 log2(35/64) = 0.99
A1: [29+,35-] splits into [21+,5-] (True) and [8+,30-] (False)
A2: [29+,35-] splits into [18+,33-] (True) and [11+,2-] (False)
Information Gain
Entropy([21+,5-]) = 0.71    Entropy([8+,30-]) = 0.74
Entropy([18+,33-]) = 0.94   Entropy([11+,2-]) = 0.62
Gain(S,A1) = Entropy(S) - (26/64) Entropy([21+,5-]) - (38/64) Entropy([8+,30-]) = 0.27
Gain(S,A2) = Entropy(S) - (51/64) Entropy([18+,33-]) - (13/64) Entropy([11+,2-]) = 0.12
A1: [29+,35-] splits into [21+,5-] (True) and [8+,30-] (False)
A2: [29+,35-] splits into [18+,33-] (True) and [11+,2-] (False)
(A short sketch reproducing these numbers follows below.)
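A minimal sketch that reproduces the A1/A2 comparison directly from the class counts; the count-pair interface is my own choice.

```python
import math

# Information gain written directly in terms of class counts; the two
# print lines reproduce the A1/A2 comparison on this slide.
def entropy(pos, neg):
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c)

def gain(parent, children):
    """parent and each child are (positive, negative) count pairs."""
    total = sum(parent)
    return entropy(*parent) - sum(
        (p + n) / total * entropy(p, n) for p, n in children)

print(round(gain((29, 35), [(21, 5), (8, 30)]), 2))   # A1: 0.27
print(round(gain((29, 35), [(18, 33), (11, 2)]), 2))  # A2: 0.12
```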
Training Examples
Day Outlook Temp. Humidity Wind Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Weak Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Strong Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Selecting the Next Attribute
Humidity: S=[9+,5-], E=0.940
  High → [3+,4-], E=0.985
  Normal → [6+,1-], E=0.592
Gain(S,Humidity) = 0.940 - (7/14)·0.985 - (7/14)·0.592 = 0.151
Wind: S=[9+,5-], E=0.940
  Weak → [6+,2-], E=0.811
  Strong → [3+,3-], E=1.0
Gain(S,Wind) = 0.940 - (8/14)·0.811 - (6/14)·1.0 = 0.048
Humidity provides greater information gain than Wind, w.r.t. the target classification.
Selecting the Next Attribute
Outlook: S=[9+,5-], E=0.940
  Sunny → [2+,3-], E=0.971
  Overcast → [4+,0-], E=0.0
  Rain → [3+,2-], E=0.971
Gain(S,Outlook) = 0.940 - (5/14)·0.971 - (4/14)·0.0 - (5/14)·0.971 = 0.247
Selecting the Next Attribute
The information gain values for the 4 attributes
are:
• Gain(S,Outlook) =0.247
• Gain(S,Humidity) =0.151
• Gain(S,Wind) =0.048
• Gain(S,Temperature) =0.029
where S denotes the collection of training examples (the sketch below recomputes these values from the table above).
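A sketch recomputing these gains from the 14 training examples; the row layout and attribute-to-column mapping are my own choices. The exact Humidity value is 0.152 before the slide's intermediate rounding.

```python
import math
from collections import Counter

# Recomputing the four gains from the 14 PlayTennis examples; the tuple
# layout (Outlook, Temperature, Humidity, Wind, PlayTennis) is my own.
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
attributes = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, column):
    labels = [r[-1] for r in rows]
    g = entropy(labels)
    for value in set(r[column] for r in rows):
        subset = [r[-1] for r in rows if r[column] == value]
        g -= len(subset) / len(rows) * entropy(subset)
    return g

for name, column in attributes.items():
    print(name, round(gain(rows, column), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048
# (0.152 vs. the slide's 0.151 is only intermediate rounding)
```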
ID3 Algorithm
Outlook: [D1,D2,…,D14], [9+,5-]
  Sunny → Ssunny = [D1,D2,D8,D9,D11], [2+,3-] → ?
  Overcast → [D3,D7,D12,D13], [4+,0-] → Yes
  Rain → [D4,D5,D6,D10,D14], [3+,2-] → ?
Gain(Ssunny, Humidity) = 0.970 - (3/5)·0.0 - (2/5)·0.0 = 0.970
Gain(Ssunny, Temp.) = 0.970 - (2/5)·0.0 - (2/5)·1.0 - (1/5)·0.0 = 0.570
Gain(Ssunny, Wind) = 0.970 - (2/5)·1.0 - (3/5)·0.918 = 0.019
ID3 Algorithm
Outlook
  Sunny → Humidity
            High → No [D1,D2]
            Normal → Yes [D8,D9,D11] [mistake: D8 has Humidity=High and PlayTennis=No, so it belongs under the High branch]
  Overcast → Yes [D3,D7,D12,D13]
  Rain → Wind
            Strong → No [D6,D14]
            Weak → Yes [D4,D5,D10]
Occam’s Razor
"If two theories explain the facts equally well, then the simpler theory is to be preferred."
Arguments in favor:
Fewer short hypotheses than long hypotheses
A short hypothesis that fits the data is unlikely to be a
coincidence
A long hypothesis that fits the data might be a
coincidence
Arguments opposed:
There are many ways to define small sets of
hypotheses
Overfitting
One of the biggest problems with decision trees is overfitting.
Avoid Overfitting
Stop growing when a split is not statistically significant
Grow the full tree, then post-prune (a reduced-error pruning sketch follows below)
Select the "best" tree:
measure performance over training data
measure performance over a separate validation data set
min(|tree| + |misclassifications(tree)|)
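A minimal sketch of reduced-error post-pruning on the dict-shaped tree built by the earlier id3() sketch, assuming examples are dicts with a "label" key and that a non-empty validation set is available; a subtree is replaced by its majority-class leaf whenever that does not hurt validation accuracy.

```python
from collections import Counter

# Sketch of reduced-error post-pruning on the dict-shaped tree built by the
# earlier id3() sketch: a subtree is replaced by its majority-class leaf
# whenever that does not hurt accuracy on a held-out validation set.
def classify(tree, example, default=None):
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        branch = tree[attribute].get(example.get(attribute))
        if branch is None:           # unseen value: fall back to a default
            return default
        tree = branch
    return tree                      # a leaf is just the class label

def accuracy(tree, examples):
    return sum(classify(tree, e) == e["label"] for e in examples) / len(examples)

def majority_label(examples):
    return Counter(e["label"] for e in examples).most_common(1)[0][0]

def prune(tree, validation, training):
    if not isinstance(tree, dict):
        return tree
    attribute = next(iter(tree))
    for value, subtree in tree[attribute].items():        # prune bottom-up
        subset = [e for e in training if e.get(attribute) == value]
        tree[attribute][value] = prune(subtree, validation, subset or training)
    leaf = majority_label(training)
    # Keep the subtree only if it beats the plain majority leaf on validation data
    if accuracy(leaf, validation) >= accuracy(tree, validation):
        return leaf
    return tree
```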
Effect of Reduced Error Pruning
Converting a Tree to Rules
Outlook
  Sunny → Humidity
            High → No
            Normal → Yes
  Overcast → Yes
  Rain → Wind
            Strong → No
            Weak → Yes
R1: If (Outlook=Sunny) ∧ (Humidity=High) Then PlayTennis=No
R2: If (Outlook=Sunny) ∧ (Humidity=Normal) Then PlayTennis=Yes
R3: If (Outlook=Overcast) Then PlayTennis=Yes
R4: If (Outlook=Rain) ∧ (Wind=Strong) Then PlayTennis=No
R5: If (Outlook=Rain) ∧ (Wind=Weak) Then PlayTennis=Yes
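A sketch of the conversion, enumerating root-to-leaf paths of the dict-shaped tree used in the earlier sketches; the hard-coded PlayTennis target and the AND keyword are presentation choices.

```python
# Sketch: enumerate the root-to-leaf paths of the dict-shaped tree as rules
# in the R1-R5 style above; the PlayTennis target name is hard-coded here.
def tree_to_rules(tree, conditions=()):
    if not isinstance(tree, dict):            # leaf: emit one rule
        body = " AND ".join("(%s=%s)" % c for c in conditions) or "True"
        return ["If %s Then PlayTennis=%s" % (body, tree)]
    attribute = next(iter(tree))
    rules = []
    for value, subtree in tree[attribute].items():
        rules += tree_to_rules(subtree, conditions + ((attribute, value),))
    return rules

play_tennis_tree = {"Outlook": {
    "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}
for rule in tree_to_rules(play_tennis_tree):
    print(rule)
```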
Continuous Valued Attributes
Create a discrete attribute to test a continuous one:
Temperature = 24.5°C
(Temperature > 20.0°C) = {true, false}
Where to set the threshold? (see the sketch below)
Temperature   15°C  18°C  19°C  22°C  24°C  27°C
PlayTennis    No    No    Yes   Yes   Yes   No
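One common way to pick the threshold (a standard technique, not spelled out on the slide) is to consider midpoints between adjacent sorted values where the class changes and keep the one with the highest information gain; the sketch below applies this to the temperature row above.

```python
import math
from collections import Counter

# Sketch: candidate thresholds are midpoints between adjacent sorted values
# where the class changes; keep the one with the highest information gain.
# The data below is the temperature row from this slide.
temps = [15, 18, 19, 22, 24, 27]
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    pairs = sorted(zip(values, labels))
    best = (None, -1.0)
    for (v1, l1), (v2, l2) in zip(pairs, pairs[1:]):
        if l1 == l2:
            continue                   # class unchanged: skip this midpoint
        t = (v1 + v2) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        g = (entropy(labels)
             - len(left) / len(pairs) * entropy(left)
             - len(right) / len(pairs) * entropy(right))
        if g > best[1]:
            best = (t, g)
    return best

print(best_threshold(temps, labels))  # picks 18.5 here; 25.5 is the other candidate
```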
Unknown Attribute Values
What if some examples have missing values of A?
Use the training example anyway, and sort it through the tree:
If node n tests A, assign the most common value of A among the other examples sorted to node n
Or assign the most common value of A among the other examples with the same target value
Or assign probability pi to each possible value vi of A, and assign fraction pi of the example to each descendant in the tree
Classify new examples in the same fashion (see the sketch below)
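A minimal sketch of the first two strategies, assuming examples are dicts in which None marks a missing value; the example node data is made up for illustration.

```python
from collections import Counter

# Sketch of the first two strategies above: fill a missing value of an
# attribute with its most common value at the node, optionally restricted
# to examples with the same target label. The node data is made up.
def most_common_value(examples, attribute, label=None):
    values = [e[attribute] for e in examples
              if e[attribute] is not None
              and (label is None or e["label"] == label)]
    return Counter(values).most_common(1)[0][0]

node_examples = [
    {"Wind": "Weak", "label": "Yes"},
    {"Wind": "Weak", "label": "Yes"},
    {"Wind": "Strong", "label": "No"},
    {"Wind": None, "label": "Yes"},     # example with a missing Wind value
]
print(most_common_value(node_examples, "Wind"))               # Weak
print(most_common_value(node_examples, "Wind", label="Yes"))  # Weak
```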
Cross-Validation
Estimate the accuracy of a hypothesis induced by a supervised learning algorithm
Predict the accuracy of a hypothesis over future unseen instances
Select the optimal hypothesis from a given set of alternative hypotheses:
Pruning decision trees
Model selection
Feature selection
Combining multiple classifiers (boosting)
(A k-fold cross-validation sketch follows below.)
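A minimal k-fold cross-validation sketch, assuming a train(examples) function that builds a classifier and a classify(model, example) function that applies it (for instance the id3 and classify sketches earlier); the shuffling, fold layout, and accuracy averaging are my own choices.

```python
import random

# Minimal k-fold cross-validation sketch; `train` builds a classifier from
# examples and `classify` applies it (e.g. the id3/classify sketches above).
# Assumes there are at least k examples, each a dict with a "label" key.
def cross_validate(examples, train, classify, k=10, seed=0):
    data = examples[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        model = train(training)
        correct = sum(classify(model, e) == e["label"] for e in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k            # average held-out accuracy
```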