CSE 422 Machine Learning

Tree Based Methods

Fall 2024
Contents
● Tree Based Methods
○ ID3, C4.5, CART
○ Mix of numeric and categorical attributes
○ Missing data
● Pruning
● Visualization
● Rule Generator

https://www.flickr.com/photos/wonderlane/2062184804/
Explainable Rules

https://heartbeat.fritz.ai/understanding-the-mathematics-behind-decision-trees-22d86d55906
Decision Tree Example
Handling Numeric Attributes
● Suppose y is the prediction variable and x is the feature
● The decision boundary is at value v
● If x > v, use the samples on the right
● If x < v, use the samples on the left
● For classification, predict by majority vote
● For regression, use the uniform or weighted average of the samples
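A minimal sketch of how the split value v can be chosen for a numeric feature (the function and variable names are illustrative, and squared error is used for the regression case mentioned above):

    import numpy as np

    def best_numeric_split(x, y):
        """Pick the threshold v that minimizes the total squared error of the two sides."""
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        order = np.argsort(x)
        x, y = x[order], y[order]
        best_v, best_err = None, float("inf")
        for i in range(1, len(x)):
            if x[i] == x[i - 1]:
                continue                    # only split between distinct values
            v = (x[i] + x[i - 1]) / 2.0     # candidate threshold: midpoint
            left, right = y[:i], y[i:]
            err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if err < best_err:
                best_v, best_err = v, err
        return best_v

    print(best_numeric_split([1, 2, 3, 10, 11], [1.0, 1.1, 0.9, 5.0, 5.2]))  # 6.5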
Decision Tree
● There are many trees possible
○ Always prefer the shortest one
● What is a good decision tree?
● For numeric attributes, it is important to
decide the value to split
○ binary vs multiway splits
● For categorical variables, it is the set of
different values to branch on
● How to select between multiple attributes?
● How many attributes should be selected?
○ Single or multiple?
Decision Tree for Regression and Classification
● Classification and Regression Trees
○ Breiman et al 1984
○ Only Binary Splits
○ Uses Gini “measure of impurity”
● Iterative Dichotomiser 3
○ Ross Quinlan, 1986
○ Uses Information Gain, Greedy Algorithm
● C4.5
○ Ross Quinlan, 1993
○ Improved version of ID3
■ Pruning, attributes with different costs, missing values, continuous attributes

Top 10 algorithms in data mining http://www.cs.umd.edu/~samir/498/10Algorithms-08.pdf


Good Split vs Bad Split
● What makes a split good?
● The case for classification
○ Entropy
○ Information Gain
○ Gain Ratio
○ Gini Index
● The case for regression
○ Squared Error
Entropy
● Measure of disorder in a set
● Find the entropy of each of
the rectangles in the figure
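For reference, the standard definition, with p_i the proportion of class i in the set S:

    H(S) = -\sum_i p_i \log_2 p_i

For example, a node holding the full golf dataset used later (9 yes, 5 no) has H = -(9/14)\log_2(9/14) - (5/14)\log_2(5/14) ≈ 0.94.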
Information Gain
● How much information is
gained by a split
● Before the split, a node has an
entropy H(q)
● After the split, the entropy is
distributed over the child sets. The
gain is the difference.
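Written out in the standard form used with ID3, for a split of S on attribute A into subsets S_v:

    IG(S, A) = H(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|} H(S_v)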
Gain Ratio
● IG is biased toward attributes with a
large number of distinct values
○ E.g. credit card number
● Gain Ratio normalizes Information
Gain by the Split Information
● Gain Ratio = Information Gain / Split Information
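In symbols:

    SplitInfo(S, A) = -\sum_{v \in values(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}
    GainRatio(S, A) = \frac{IG(S, A)}{SplitInfo(S, A)}

A 14-row credit-card-number attribute splits into 14 singleton subsets, so SplitInfo = \log_2 14 ≈ 3.81, which shrinks the ratio even though the raw gain is maximal.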
Gini Index
● Gini impurity is a measure of
how often a randomly chosen
element from the set would be
incorrectly labeled if it was
randomly labeled according to
the distribution of labels in the
subset
● Used by CART
● Gain is defined similarly
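With p_i the proportion of class i in S, and a split of S on attribute A into subsets S_v:

    Gini(S) = 1 - \sum_i p_i^2
    Gini_{split}(S, A) = \sum_{v} \frac{|S_v|}{|S|} Gini(S_v)

CART picks the split with the lowest weighted Gini of the children (equivalently, the largest drop from Gini(S)).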
An Example
We will work on the same dataset
used for ID3. There are 14 instances
of golf-playing decisions based on
the outlook, temperature, humidity
and wind features.

● We will use the Gini index

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Outlook
Outlook is a nominal
feature. It can be sunny,
overcast or rain. Let's
summarize the final decisions
for the outlook feature.

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Temperature
Temperature is a nominal
feature and it could have
3 different values: Cool,
Hot and Mild. Let's
summarize the decisions for
the temperature feature.

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Humidity
Humidity is a binary feature. It can be high or normal.

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Wind
Wind is a binary feature, similar to humidity. It can be weak or strong.

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
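A small sketch of the first-split computation. The [yes, no] counts below assume the standard 14-row play-golf dataset from the linked example, so treat them as an assumption:

    def gini(counts):
        """Gini impurity from class counts, e.g. gini([2, 3]) for 2 yes / 3 no."""
        total = sum(counts)
        return 1.0 - sum((c / total) ** 2 for c in counts)

    def weighted_gini(groups):
        """Weighted Gini of a split; groups holds one [yes, no] pair per feature value."""
        n = sum(sum(g) for g in groups)
        return sum(sum(g) / n * gini(g) for g in groups)

    print(weighted_gini([[2, 3], [4, 0], [3, 2]]))  # outlook (sunny/overcast/rain) ~= 0.343
    print(weighted_gini([[2, 2], [4, 2], [3, 1]]))  # temperature (hot/mild/cool)   ~= 0.440
    print(weighted_gini([[3, 4], [6, 1]]))          # humidity (high/normal)        ~= 0.367
    print(weighted_gini([[6, 2], [3, 3]]))          # wind (weak/strong)            ~= 0.429

Outlook has the lowest weighted impurity, so it becomes the root split.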
The first split

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The first split: Outlook

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The first split: Outlook

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Recursive Partitioning
A sub-dataset

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Outlook sunny & Temperature

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Outlook sunny & Humidity

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Outlook sunny & Wind

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
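Reusing gini() and weighted_gini() from the earlier sketch on the five sunny rows (counts again assume the standard dataset), humidity yields perfectly pure children, so it wins the second split:

    print(weighted_gini([[0, 2], [1, 1], [1, 0]]))  # temperature (hot/mild/cool) ~= 0.200
    print(weighted_gini([[0, 3], [2, 0]]))          # humidity (high/normal)       = 0.000
    print(weighted_gini([[1, 2], [1, 1]]))          # wind (weak/strong)          ~= 0.467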
The second split

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The second split

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The final tree

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Decision Tree Overfitting
● Pre-Pruning
○ Maximum number of leaf nodes
○ Maximum depth of the tree
○ Minimum number of training
instances at a leaf node
● Post-Pruning
○ Another strategy to avoid
overfitting in decision trees is to
first grow a full tree, and then
prune it based on a previously
held-out validation dataset (see
the scikit-learn sketch below)
○ Use statistical tests
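A minimal scikit-learn sketch of both strategies; the dataset choice and parameter values are illustrative assumptions, not from the slides:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    # Pre-pruning: limit growth while the tree is being built
    pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, max_leaf_nodes=20)
    pre.fit(X_train, y_train)

    # Post-pruning: grow the full tree, then keep the cost-complexity alpha
    # that scores best on the held-out validation set
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
    trees = [DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
             for a in path.ccp_alphas]
    best = max(trees, key=lambda t: t.score(X_val, y_val))
    print(best.get_n_leaves(), best.score(X_val, y_val))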
Tree Pruning: Validation Set
● Prune using a held-out validation dataset

https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15381-s06/www/DTs2.pdf
Detecting Useless Splits
● Try the chi-square test
● Check the statistic to see whether the
split achieved a significant gain
● Is the split any different from an
arbitrary one?

https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15381-s06/www/DTs2.pdf
Detecting Useless Splits

https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15381-s06/www/DTs2.pdf
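A small sketch of the test with scipy; the class counts are made-up numbers for illustration. Build a contingency table of class counts in the child nodes and check the p-value:

    from scipy.stats import chi2_contingency

    # rows = child nodes after the split, columns = [yes, no] class counts (illustrative)
    table = [[8, 7],
             [6, 9]]
    chi2, p_value, dof, expected = chi2_contingency(table)
    print(p_value)  # a large p-value: the split is no better than an arbitrary one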
Decision Tree
Pros
● Interpretable and simple
● Handles all types of data
● Handles missing values
● Less pre-processing required
● Fast computation
● Non-parametric
Cons
● Finding the optimal tree is NP-complete
● Not stable
● Often overfits
● High bias
● Not suitable for unstructured data
Multi-Variable Split?
Multi-Variable Split?
