Introduction to Tree Methods

Reading Assignment
Chapter 8 of Introduction to Statistical Learning, by Gareth James et al.
Tree Methods
Let's start off with a thought experiment to give some motivation behind using a decision tree method.
Tree Methods
Imagine that I play Tennis every Saturday and I always invite a
friend to come with me. Machine Math &
Learning
Sometimes my friend shows up, sometimes not.
Statistics
DS
For him it depends on a variety of factors, such as: weather,
Software Research
temperature, humidity, wind etc..
I start keeping track of these features and whether or not he
showed up to play with me. Domain
Knowledge
Tree Methods
I want to use this data to predict whether or not he will show up to play.
An intuitive way to do this is through a Decision Tree.
Tree Methods
In this tree we have:
● Nodes
○ Split for the value of a certain attribute
● Edges
○ Outcome of a split to the next node
Tree Methods
In this tree we have:
● Root
○ The node that performs the first split
● Leaves
○ Terminal nodes that predict the outcome
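To make that terminology concrete, here is a minimal, hypothetical sketch (not from the slides) that fits a decision tree to made-up data mirroring the tennis example; the column names and values are assumptions, and scikit-learn is used for illustration.

```python
# A hypothetical sketch: the data and column names are invented.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    'outlook':   ['sunny', 'sunny', 'overcast', 'rain', 'rain', 'overcast'],
    'windy':     [False, True, False, False, True, True],
    'showed_up': ['yes', 'no', 'yes', 'yes', 'no', 'yes'],
})

X = pd.get_dummies(df[['outlook', 'windy']])  # one-hot encode the categorical feature
y = df['showed_up']

# The fitted tree has a root split, internal nodes connected by edges,
# and leaves that predict the outcome.
tree = DecisionTreeClassifier().fit(X, y)
print(tree.predict(X))
```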
Intuition Behind Splits
Imaginary data with three features (X, Y, and Z) and two possible classes.
Intuition Behind Splits
Splitting on Y gives us a clear separation between classes.
Intuition Behind Splits
We could have also tried splitting on other features first.
Intuition Behind Splits
Entropy and information gain are the mathematical methods for choosing the best split. Refer to the reading assignment.
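As a rough illustration (a sketch, not the book's derivation), entropy and the information gain of a candidate split can be computed like this; the toy labels and the "sunny" mask are invented:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy: -sum(p * log2(p)) over class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, mask):
    """Reduction in entropy from splitting `labels` by the boolean `mask`."""
    left, right = labels[mask], labels[~mask]
    w_left = len(left) / len(labels)
    w_right = len(right) / len(labels)
    return entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))

# Hypothetical data: did the friend show up, split on "was it sunny?"
y = np.array([1, 1, 1, 0, 0, 1, 0, 0])
sunny = np.array([True, True, True, True, False, False, False, False])
print(information_gain(y, sunny))  # higher gain = better split
```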
Random Forests
To improve performance, we can use many trees, with a random sample of features chosen at each split.
● A new random sample of features is chosen for every single tree at every single split.
● For classification, m is typically chosen to be the square root of p (where m is the number of candidate features at each split and p is the total number of features).
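For instance, here is a minimal scikit-learn sketch (an illustration, not part of the slides; the synthetic data set is an assumption). Passing max_features='sqrt' applies the m = sqrt(p) rule at each split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=9, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# max_features='sqrt' samples m = sqrt(p) candidate features at every split.
rf = RandomForestClassifier(n_estimators=100, max_features='sqrt', random_state=42)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))
```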
Random Forests
What's the point?
● Suppose there is one very strong feature in the data set. When using "bagged" trees, most of the trees will use that feature as the top split, resulting in an ensemble of similar trees that are highly correlated.
Random Forests
What's the point?
● Averaging highly correlated quantities does not significantly reduce variance.
● By randomly leaving out candidate features at each split, random forests "decorrelate" the trees, so that the averaging process can reduce the variance of the resulting model.
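As a hedged sketch of this idea (not from the slides), one can compare plain bagging with a decorrelated forest in scikit-learn: max_features=None lets every split consider all p features (bagged trees), while max_features='sqrt' restricts each split to a random subset. The synthetic data set is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data with a few informative features (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=20, n_informative=3, random_state=0)

# max_features=None -> every split sees all p features (plain bagged trees).
bagged = RandomForestClassifier(n_estimators=100, max_features=None, random_state=0)
# max_features='sqrt' -> each split sees a random sqrt(p) subset (random forest).
forest = RandomForestClassifier(n_estimators=100, max_features='sqrt', random_state=0)

print('bagged trees :', cross_val_score(bagged, X, y).mean())
print('random forest:', cross_val_score(forest, X, y).mean())
```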