Introduction to Tree Methods

Reading Assignment
Chapter 8 of Introduction to Statistical Learning, by Gareth James et al.
Tree Methods
Let's start off with a thought experiment to give some motivation behind using a decision tree method.
Tree Methods
Imagine that I play Tennis every Saturday and I always invite a
friend to come with me. Machine Math &
Learning
Sometimes my friend shows up, sometimes not.
Statistics
DS
For him it depends on a variety of factors, such as: weather,
Software Research
temperature, humidity, wind etc..
I start keeping track of these features and whether or not he
showed up to play with me. Domain
Knowledge
Tree Methods
I want to use this data to predict whether or not he will show up to play.
An intuitive way to do this is through a Decision Tree.
Tree Methods
In this tree we have:
● Nodes
○ Split for the value of a certain attribute
● Edges
○ Outcome of a split to the next node
Tree Methods
In this tree we have:
● Root
○ The node that performs the first split
● Leaves
○ Terminal nodes that predict the outcome
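To make that terminology concrete, here is a minimal, hypothetical sketch (not from the slides) that fits a decision tree to made-up data mirroring the tennis example; the column names and values are assumptions, and scikit-learn is used for illustration.

```python
# A hypothetical sketch: the data and column names are invented.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    'outlook':   ['sunny', 'sunny', 'overcast', 'rain', 'rain', 'overcast'],
    'windy':     [False, True, False, False, True, True],
    'showed_up': ['yes', 'no', 'yes', 'yes', 'no', 'yes'],
})

X = pd.get_dummies(df[['outlook', 'windy']])  # one-hot encode the categorical feature
y = df['showed_up']

# The fitted tree has a root split, internal nodes connected by edges,
# and leaves that predict the outcome.
tree = DecisionTreeClassifier().fit(X, y)
print(tree.predict(X))
```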
Intuition Behind Splits
Imaginary data with three features (X, Y, and Z) and two possible classes.
Intuition Behind Splits
Splitting on Y gives us a clear separation between classes.
Intuition Behind Splits
We could have also tried splitting on other features first.
Intuition Behind Splits
Entropy and information gain are the mathematical methods for choosing the best split. Refer to the reading assignment.
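As a rough illustration (a sketch, not the book's derivation), entropy and the information gain of a candidate split can be computed like this; the toy labels and the "sunny" mask are invented:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy: -sum(p * log2(p)) over class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, mask):
    """Reduction in entropy from splitting `labels` by the boolean `mask`."""
    left, right = labels[mask], labels[~mask]
    w_left = len(left) / len(labels)
    w_right = len(right) / len(labels)
    return entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))

# Hypothetical data: did the friend show up, split on "was it sunny?"
y = np.array([1, 1, 1, 0, 0, 1, 0, 0])
sunny = np.array([True, True, True, True, False, False, False, False])
print(information_gain(y, sunny))  # higher gain = better split
```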
Random Forests
To improve performance, we can use many trees, with a random sample of features chosen at each split.
● A new random sample of features is chosen for every single tree at every single split.
● For classification, m is typically chosen to be the square root of p (where m is the number of candidate features at each split and p is the total number of features).
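For instance, here is a minimal scikit-learn sketch (an illustration, not part of the slides; the synthetic data set is an assumption). Passing max_features='sqrt' applies the m = sqrt(p) rule at each split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=9, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# max_features='sqrt' samples m = sqrt(p) candidate features at every split.
rf = RandomForestClassifier(n_estimators=100, max_features='sqrt', random_state=42)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))
```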
Random Forests
What's the point?
● Suppose there is one very strong feature in the data set. When using "bagged" trees, most of the trees will use that feature as the top split, resulting in an ensemble of similar trees that are highly correlated.
Random Forests
What's the point?
● Averaging highly correlated quantities does not significantly reduce variance.
● By randomly leaving out candidate features at each split, random forests "decorrelate" the trees, so that the averaging process can reduce the variance of the resulting model.
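As a hedged sketch of this idea (not from the slides), one can compare plain bagging with a decorrelated forest in scikit-learn: max_features=None lets every split consider all p features (bagged trees), while max_features='sqrt' restricts each split to a random subset. The synthetic data set is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data with a few informative features (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=20, n_informative=3, random_state=0)

# max_features=None -> every split sees all p features (plain bagged trees).
bagged = RandomForestClassifier(n_estimators=100, max_features=None, random_state=0)
# max_features='sqrt' -> each split sees a random sqrt(p) subset (random forest).
forest = RandomForestClassifier(n_estimators=100, max_features='sqrt', random_state=0)

print('bagged trees :', cross_val_score(bagged, X, y).mean())
print('random forest:', cross_val_score(forest, X, y).mean())
```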