CSE 422 Machine Learning

Tree Based Methods

Fall 2024
Contents
● Tree Based Methods
○ ID3, C4.5, CART
○ Mix of numeric and categorical attributes
○ Missing data
● Pruning
● Visualization
● Rule Generator

https://www.flickr.com/photos/wonderlane/2062184804/
Explainable Rules

https://heartbeat.fritz.ai/understanding-the-mathematics-behind-decision-trees-22d86d55906
Decision Tree Example
Handling Numeric Attributes
● Suppose y is the prediction variable and x is the feature
● The decision boundary is at value v
● If x > v, use the samples on the right
● If x < v, use the samples on the left
● For classification, predict by majority vote
● For regression, use the uniform or weighted average of the samples
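A minimal sketch of how the split value v can be chosen for a numeric feature (the function and variable names are illustrative, and squared error is used for the regression case mentioned above):

    import numpy as np

    def best_numeric_split(x, y):
        """Pick the threshold v that minimizes the total squared error of the two sides."""
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        order = np.argsort(x)
        x, y = x[order], y[order]
        best_v, best_err = None, float("inf")
        for i in range(1, len(x)):
            if x[i] == x[i - 1]:
                continue                    # only split between distinct values
            v = (x[i] + x[i - 1]) / 2.0     # candidate threshold: midpoint
            left, right = y[:i], y[i:]
            err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if err < best_err:
                best_v, best_err = v, err
        return best_v

    print(best_numeric_split([1, 2, 3, 10, 11], [1.0, 1.1, 0.9, 5.0, 5.2]))  # 6.5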
Decision Tree
● There are many trees possible
○ Always prefer the shortest one
● What is a good decision tree?
● For numeric attributes, it is important to
decide the value to split
○ binary vs multiway splits
● For categorical variables, it is the set of
different values to branch on
● How to select between multiple attributes?
● How many attributes should be selected?
○ Single or multiple?
Decision Tree for Regression and Classification
● Classification and Regression Trees
○ Breiman et al 1984
○ Only Binary Splits
○ Uses Gini “measure of impurity”
● Iterative Dichotomiser 3
○ Ross Quinlan, 1986
○ Uses Information Gain, Greedy Algorithm
● C4.5
○ Ross Quinlan, 1993
○ Improved version of ID3
■ Pruning, attributes with different costs, missing values, continuous attributes

Top 10 algorithms in data mining http://www.cs.umd.edu/~samir/498/10Algorithms-08.pdf


Good Split vs Bad Split
● What makes a split good?
● The case for classification
○ Entropy
○ Information Gain
○ Gain Ratio
○ Gini Index
● The case for regression
○ Squared Error
Entropy
● Measure of disorder in a set
● Find the entropy of each of
the rectangles in the figure
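For reference, the standard definition, with p_i the proportion of class i in the set S:

    H(S) = -\sum_i p_i \log_2 p_i

For example, a node holding the full golf dataset used later (9 yes, 5 no) has H = -(9/14)\log_2(9/14) - (5/14)\log_2(5/14) ≈ 0.94.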
Information Gain
● How much information is
gained by a split
● Before the split, a node has an
entropy H(q)
● After the split, the entropy is
distributed over the child sets. The
gain is the difference.
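Written out in the standard form used with ID3, for a split of S on attribute A into subsets S_v:

    IG(S, A) = H(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|} H(S_v)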
Gain Ratio
● IG is biased toward attributes with a
large number of distinct values
○ E.g. credit card number
● Gain Ratio normalizes Information
Gain by the Split Information
● Gain Ratio = Information Gain / Split Information
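In symbols:

    SplitInfo(S, A) = -\sum_{v \in values(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}
    GainRatio(S, A) = \frac{IG(S, A)}{SplitInfo(S, A)}

A 14-row credit-card-number attribute splits into 14 singleton subsets, so SplitInfo = \log_2 14 ≈ 3.81, which shrinks the ratio even though the raw gain is maximal.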
Gini Index
● Gini impurity is a measure of
how often a randomly chosen
element from the set would be
incorrectly labeled if it was
randomly labeled according to
the distribution of labels in the
subset
● Used by CART
● Gain is defined similarly
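With p_i the proportion of class i in S, and a split of S on attribute A into subsets S_v:

    Gini(S) = 1 - \sum_i p_i^2
    Gini_{split}(S, A) = \sum_{v} \frac{|S_v|}{|S|} Gini(S_v)

CART picks the split with the lowest weighted Gini of the children (equivalently, the largest drop from Gini(S)).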
An Example
We will work on the same dataset
used for ID3. There are 14 instances
of golf-playing decisions based on
the outlook, temperature, humidity
and wind features.

● We will use the Gini index

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Outlook
Outlook is a nominal
feature. It can be sunny,
overcast or rain. Let's
summarize the final decisions
for the outlook feature.

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Temperature
Temperature is a nominal
feature and it could have
3 different values: Cool,
Hot and Mild. Let's
summarize the decisions for
the temperature feature.

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Humidity
Humidity is a binary feature. It can be high or normal.

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Wind
Wind is a binary feature, similar to humidity. It can be weak or strong.

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
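A small sketch of the first-split computation. The [yes, no] counts below assume the standard 14-row play-golf dataset from the linked example, so treat them as an assumption:

    def gini(counts):
        """Gini impurity from class counts, e.g. gini([2, 3]) for 2 yes / 3 no."""
        total = sum(counts)
        return 1.0 - sum((c / total) ** 2 for c in counts)

    def weighted_gini(groups):
        """Weighted Gini of a split; groups holds one [yes, no] pair per feature value."""
        n = sum(sum(g) for g in groups)
        return sum(sum(g) / n * gini(g) for g in groups)

    print(weighted_gini([[2, 3], [4, 0], [3, 2]]))  # outlook (sunny/overcast/rain) ~= 0.343
    print(weighted_gini([[2, 2], [4, 2], [3, 1]]))  # temperature (hot/mild/cool)   ~= 0.440
    print(weighted_gini([[3, 4], [6, 1]]))          # humidity (high/normal)        ~= 0.367
    print(weighted_gini([[6, 2], [3, 3]]))          # wind (weak/strong)            ~= 0.429

Outlook has the lowest weighted impurity, so it becomes the root split.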
The first split

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The first split: Outlook

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The first split: Outlook

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Recursive Partitioning
A sub-dataset

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Outlook sunny & Temperature

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Outlook sunny & Humidity

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Outlook sunny & Wind

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
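Reusing gini() and weighted_gini() from the earlier sketch on the five sunny rows (counts again assume the standard dataset), humidity yields perfectly pure children, so it wins the second split:

    print(weighted_gini([[0, 2], [1, 1], [1, 0]]))  # temperature (hot/mild/cool) ~= 0.200
    print(weighted_gini([[0, 3], [2, 0]]))          # humidity (high/normal)       = 0.000
    print(weighted_gini([[1, 2], [1, 1]]))          # wind (weak/strong)          ~= 0.467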
The second split

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The second split

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The final tree

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Decision Tree Overfitting
● Pre-Pruning
○ Maximum number of leaf nodes
○ Maximum depth of the tree
○ Minimum number of training
instances at a leaf node
● Post-Pruning
○ Another strategy to avoid
overfitting in decision trees is to
first grow a full tree, and then
prune it based on a previously
held-out validation dataset (see
the scikit-learn sketch below)
○ Use statistical tests
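A minimal scikit-learn sketch of both strategies; the dataset choice and parameter values are illustrative assumptions, not from the slides:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    # Pre-pruning: limit growth while the tree is being built
    pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, max_leaf_nodes=20)
    pre.fit(X_train, y_train)

    # Post-pruning: grow the full tree, then keep the cost-complexity alpha
    # that scores best on the held-out validation set
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
    trees = [DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
             for a in path.ccp_alphas]
    best = max(trees, key=lambda t: t.score(X_val, y_val))
    print(best.get_n_leaves(), best.score(X_val, y_val))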
Tree Pruning: Validation Set
● Prune using a held-out validation dataset

https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15381-s06/www/DTs2.pdf
Detecting Useless Splits
● Try the chi-square test
● Check the statistic to see whether the
split achieved a significant gain
● Is the split any different from an
arbitrary one?

https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15381-s06/www/DTs2.pdf
Detecting Useless Splits

https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15381-s06/www/DTs2.pdf
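A small sketch of the test with scipy; the class counts are made-up numbers for illustration. Build a contingency table of class counts in the child nodes and check the p-value:

    from scipy.stats import chi2_contingency

    # rows = child nodes after the split, columns = [yes, no] class counts (illustrative)
    table = [[8, 7],
             [6, 9]]
    chi2, p_value, dof, expected = chi2_contingency(table)
    print(p_value)  # a large p-value: the split is no better than an arbitrary one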
Decision Tree
Pros
● Interpretable and simple
● Handles all types of data
● Handles missing values
● Less pre-processing required
● Fast computation
● Non-parametric
Cons
● Finding the optimal tree is NP-complete
● Not stable
● Often overfits
● High bias
● Not suitable for unstructured data
Multi-Variable Split?
Multi-Variable Split?
