ICS 603: Advanced Machine Learning
Lecture 2 Part 1
Ensemble Learning: Boosted Trees
Dr. Caroline Sabty
[email protected] Faculty of Informatics and Computer Science
German International University in Cairo
Acknowledgment
The course and the slides are based on the slides of Dr. Seif Eldawlatly and
based on the course created by Prof. Jose Portilla
Boosting
● We’ve learned about single Decision Trees and have sought to
improve upon them with Random Forest models.
● Let’s now explore another methodology for improving on the
single decision tree, known as boosting.
Boosting
● Outline:
○ Boosting and Meta-Learning
○ AdaBoost (Adaptive Boosting) Theory
○ Example of AdaBoost
○ Gradient Boosting Theory
● Related Reading:
○ ISLR: Section 8.2.3
○ Relevant Wikipedia Articles:
■ wikipedia.org/wiki/Boosting_(machine_learning)
■ wikipedia.org/wiki/AdaBoost
Boosting
• Boosting is an iterative procedure used to adaptively change the distribution
of training examples so that the base classifiers will focus on examples that
are hard to classify
• Initially, the examples are assigned equal weights, 1/N, so that they are
equally likely to be chosen for training
• A sample is drawn according to the sampling distribution of the training
examples to obtain a new training set
• Next, a classifier is induced from the training set and used to classify all the
examples in the original data
• The weights of the training examples are updated at the end of each
boosting round
Boosting
• Examples that are classified incorrectly will have their weights increased
• Those that are classified correctly will have their weights unchanged or
decreased
• This forces the classifier to focus on examples that are difficult to classify in
subsequent iterations
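● A minimal sketch of this procedure, assuming scikit-learn stumps as the
base classifiers and an illustrative up-weighting factor (the exact factor
comes from the AdaBoost formulas later):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boosting_round_sketch(X, y, weights, rng):
    N = len(y)
    # Draw a new training set according to the current sampling distribution
    idx = rng.choice(N, size=N, replace=True, p=weights)
    clf = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
    # Classify ALL original examples and up-weight the misclassified ones
    wrong = clf.predict(X) != y
    new_weights = weights * np.where(wrong, 2.0, 1.0)  # illustrative factor
    return clf, new_weights / new_weights.sum()         # re-normalize to sum to 1

# Usage: start from equal weights and call this once per boosting round, e.g.
#   weights = np.ones(len(y)) / len(y); rng = np.random.default_rng(0)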
Boosting
[Diagram: Bagging combines its classifiers by majority voting to produce a
prediction; Boosting combines its classifiers by weighted voting.]
Boosting
● The concept of boosting is not actually a machine learning algorithm;
it is a methodology applied to an existing machine learning algorithm,
most commonly the decision tree.
● Let’s explore this idea of a meta-learning algorithm by reviewing a
simple application and formula.
● Main formula for boosting (the overall boosted meta-learning model,
i.e., the process of aggregating a bunch of weak learners into a strong
ensemble learner):

F(x) = α1·h1(x) + α2·h2(x) + … + αT·hT(x) = Σt αt·ht(x)

● Implies that a combination of estimators, each with an applied
coefficient αt, can act as an effective ensemble estimator.
● Note h(x) can in theory be any machine learning algorithm
(estimator/learner), e.g., a decision tree.
● Can an ensemble of weak learners (very simple versions of the model)
be a strong learner when combined?
● For decision tree models, we can use simple trees in place of h(x)
and combine them with the coefficients on each model.
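● As a toy illustration of this weighted sum (hypothetical ±1 votes and
coefficients, not values from the slides):

import numpy as np

# Hypothetical +1/-1 votes of three weak learners h_t(x) on four points
h = np.array([[ 1,  1, -1, -1],   # h_1(x)
              [ 1, -1, -1,  1],   # h_2(x)
              [ 1,  1,  1, -1]])  # h_3(x)
alpha = np.array([0.9, 0.4, 0.6])  # coefficients alpha_t (assumed values)

# F(x) = sum_t alpha_t * h_t(x); the ensemble class is its sign
F = alpha @ h
print(F)           # weighted sum for each point: [ 1.9  1.1 -0.7 -1.1]
print(np.sign(F))  # ensemble prediction:         [ 1.  1. -1. -1.]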
AdaBoost
● AdaBoost (Adaptive Boosting) works by using an ensemble of weak
learners and then combining them through the use of a weighted
sum.
● AdaBoost adapts by using the results of previously created weak
learners to re-weight misclassified instances for the next weak
learner (trees are not created in parallel; they are adapted in series).
AdaBoost
● What is a weak learner?
○ A weak model is a model that is too simple to perform well on
its own.
○ The weakest decision tree possible is a stump: a single decision
node with two leaves (the weak learner).
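● A minimal stump sketch in scikit-learn (assuming a toy dataset from
make_classification, which is not part of the slides): a depth-1 tree
picks a single feature and threshold, so on its own it is only a weak
predictor.

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=42)

# A decision stump: one decision node, two leaves
stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
print("stump depth:", stump.get_depth())     # 1
print("stump accuracy:", stump.score(X, y))  # modest accuracy on its own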
AdaBoost
● Unlike a single decision tree which fits to all the data at once (fitting
the data hard), AdaBoost aggregates multiple weak learners,
allowing the overall ensemble model to learn slowly (tree by tree)
from the features.
● Let’s first understand how this works from a data perspective
● Imagine a classification task with two features, X1 and X2.
● What would a stump classification look like?
[Figure sequence: the data plotted on the X1–X2 plane. A single stump
splits on just one feature, so its decision boundary is a single vertical
line (a threshold on X1) or a single horizontal line (a threshold on X2).]
● How can we combine stumps? How to improve performance with
an ensemble (meta-learning)?
● AdaBoost Process:
○ Main Formulas
○ Algorithmic Steps
○ Visual Walkthrough of Algorithm
● AdaBoost Process: Main Formulas

H(x) = sign( Σt=1..T αt·ht(x) )

○ T is the number of weak learners (trees) in the sum.
○ αt is the weighting factor for weak learner ht (we want it high if
the weak learner is good at predicting).
● AdaBoost Process:

εt = Σi wi · [ ht(xi) ≠ yi ]

○ εt is the error term we want to minimize: the (weighted) sum of
training errors made by weak learner ht.
● AdaBoost Process:

wi ← wi · exp( −αt · yi · ht(xi) ) / Zt

○ Update the weight of each data point: misclassified points get
larger weights, correctly classified points get smaller weights,
and Zt normalizes the weights so they sum to 1.
● AdaBoost Process: Algorithm Steps
○ Initialize all example weights to wi = 1/N.
○ For t = 1, …, T:
■ Fit a weak learner (stump) ht to the weighted data.
■ Compute its weighted error εt.
■ Compute its influence αt.
■ Update and normalize the example weights.
○ Output the weighted ensemble H(x) = sign( Σt αt·ht(x) ).
● AdaBoost Process:

αt = ½ · ln( (1 − εt) / εt )

○ We calculate the actual influence of this classifier in classifying
the data points using this formula.
○ Recall: alpha is how much influence this stump will have in the
final classification.
● AdaBoost Process:
○ The new weak learner is added to the set of all weak learners
already created.
○ The updated weights are used by the next weak learner.
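● Putting the formulas together, a minimal NumPy/scikit-learn sketch of
the AdaBoost loop (labels assumed to be encoded as −1/+1; the helper names
are made up, and this is a simplified illustration, not a reference
implementation):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    """y must be encoded as -1/+1. Returns the stumps and their alphas."""
    N = len(y)
    w = np.ones(N) / N                              # equal weights 1/N
    stumps, alphas = [], []
    for t in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)            # weak learner on weighted data
        pred = stump.predict(X)
        eps = np.sum(w * (pred != y))               # weighted training error
        eps = np.clip(eps, 1e-10, 1 - 1e-10)        # guard against division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)       # influence of this stump
        w = w * np.exp(-alpha * y * pred)           # up-weight the mistakes
        w = w / w.sum()                             # normalize (Z_t)
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # H(x) = sign( sum_t alpha_t * h_t(x) )
    F = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(F)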
● AdaBoost Process: Visual Walkthrough
[Figure sequence: a first stump splits on X1 and receives weight αt. The
example weights are updated, and the updated set of weights is used to
build another weak learner: a second stump that splits on X2, with weight
αt+1. The weights are adjusted again based on the new predictions, and a
third stump (splitting on X1 at a different threshold) is built with
weight αt+2. Finally, we sum all the weighted stump predictions together
to form the ensemble prediction.]
● AdaBoost uses an ensemble of weak learners that learn slowly in
series.
● Certain weak learners have more “say” in the final output than
others due to the multiplied alpha parameter.
● Each subsequent weak learner t is built using the data set reweighted
by weak learner t−1.
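● In practice the whole loop is available off the shelf. A hedged
scikit-learn sketch on assumed toy data (the default base estimator of
AdaBoostClassifier is already a depth-1 decision tree, i.e., a stump):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_estimators is the number of weak learners built in series
ada = AdaBoostClassifier(n_estimators=50, random_state=42)
ada.fit(X_train, y_train)
print("test accuracy:", ada.score(X_test, y_test))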
● Intuition of Adaptive Boosting:
○ Each stump essentially represents the strength of a feature to
predict.
○ Building these stumps in series and adding in the alpha
parameter allows us to intelligently combine the importance
of each feature together.
● Notes on Adaptive Boosting:
○ Unlike Random Forest, it is possible to overfit with AdaBoost;
however, it takes many trees to do so.
○ Usually the error has already stabilized well before enough trees
have been added to cause overfitting.
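● One way to see this (a hedged sketch on assumed toy data, using
scikit-learn's staged_score to evaluate the ensemble after each boosting
round):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ada = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Test accuracy after each boosting round: it typically flattens out
# long before all 200 stumps have been added.
for i, acc in enumerate(ada.staged_score(X_test, y_test), start=1):
    if i % 50 == 0:
        print(f"{i:3d} stumps -> accuracy {acc:.3f}")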
Gradient Boosting
● Gradient Boosting is a very similar idea to AdaBoost, where weak
learners are created in series in order to produce a strong ensemble
model.
● Gradient Boosting makes use of the residual error for learning.
● Gradient Boosting vs. AdaBoost:
○ Larger trees are allowed in Gradient Boosting (not just stumps).
○ The learning rate coefficient is the same for all weak learners.
○ The gradual series learning is based on training on the residuals
of the previous model.
● Gradient Boosting Regression Example

Area m2 | Bedrooms | Bathrooms | Price
200     | 3        | 2         | $500,000
190     | 2        | 1         | $462,000
230     | 3        | 3         | $565,000
● Train a decision tree on data
● Note - not just a stump!
● Get predicted ŷ value
● Get predicted ŷ value

y        | ŷ
$500,000 | $509,000
$462,000 | $509,000
$565,000 | $509,000
● Calculate residual: e = y − ŷ

y        | ŷ        | e
$500,000 | $509,000 | -$9,000
$462,000 | $509,000 | -$47,000
$565,000 | $509,000 | $56,000
● Create new model to predict the error

y        | ŷ        | e        | f1
$500,000 | $509,000 | -$9,000  | -$8,000
$462,000 | $509,000 | -$47,000 | -$50,000
$565,000 | $509,000 | $56,000  | $50,000
● The new model is trained on the original features to predict the
error e; its predictions form the f1 column above.

Area m2 | Bedrooms | Bathrooms
200     | 3        | 2
190     | 2        | 1
230     | 3        | 3
● Update prediction using error prediction
● We can continue this process in series

y        | ŷ        | e        | f1       | F1 = ŷ + f1
$500,000 | $509,000 | -$9,000  | -$8,000  | $501,000
$462,000 | $509,000 | -$47,000 | -$50,000 | $459,000
$565,000 | $509,000 | $56,000  | $50,000  | $559,000
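● The arithmetic from the table above, reproduced as a quick NumPy check
(the f1 values are the ones shown on the slide):

import numpy as np

y     = np.array([500_000, 462_000, 565_000])  # true prices
y_hat = np.full(3, 509_000)                    # initial prediction (the mean)
e     = y - y_hat                              # residuals
f1    = np.array([-8_000, -50_000, 50_000])    # new model's prediction of e
F1    = y_hat + f1                             # updated prediction
print(e)   # [ -9000 -47000  56000]
print(F1)  # [501000 459000 559000]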
● Gradient Boosting Process

Fm(x) = Fm-1(x) + η·fm(x)

○ η is the learning rate, and it is the same across all the models.
● Gradient Boosting Process
○ Create initial model: f0
○ Train another model on the error
■ e = y − f0(x)
○ Create new prediction
■ F1(x) = f0(x) + η·f1(x)
○ Repeat as needed
■ Fm(x) = Fm-1(x) + η·fm(x)
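● A minimal sketch of these steps with scikit-learn regression trees
(toy usage assumed and helper names made up; the initial model f0 is taken
to be the mean of y, as in the worked example):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_trees=100, eta=0.1, max_depth=3):
    f0 = np.mean(y)                      # initial model: a constant
    F = np.full(len(y), f0)              # current ensemble prediction
    trees = []
    for _ in range(n_trees):
        residual = y - F                                  # e = y - F_{m-1}(x)
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        F = F + eta * tree.predict(X)                     # F_m = F_{m-1} + eta*f_m
        trees.append(tree)
    return f0, trees

def gradient_boost_predict(f0, trees, X, eta=0.1):
    F = np.full(len(X), f0)
    for tree in trees:
        F = F + eta * tree.predict(X)
    return F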
● Note, for classification we can use the logit, log( p / (1 − p) ),
as the error metric.
● Note, the learning rate is the same for each new model in the
series; it is not unique to each subsequent model (unlike
AdaBoost’s alpha coefficient).
● Gradient Boosting is fairly robust to overfitting, allowing the
number of estimators to be set high by default (~100).
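● The corresponding off-the-shelf estimator in scikit-learn, shown as a
hedged sketch on assumed toy data (its defaults already reflect the points
above: n_estimators=100 and a single shared learning_rate):

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=8, noise=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# learning_rate is shared by every tree; n_estimators defaults to 100
gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                max_depth=3, random_state=1)
gbr.fit(X_train, y_train)
print("test R^2:", gbr.score(X_test, y_test))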
● Gradient Boosting Intuition
○ We optimize the series of trees by learning on the
residuals, forcing subsequent trees to attempt to correct
for the error in the previous trees.
● Gradient Boosting Intuition
○ The trade-off is training time.
○ The learning rate is between 0 and 1; a very low value means each
subsequent tree has little “say”, so more trees need to be created,
which increases the computational training time.
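● A small sketch of that trade-off (assumed toy data): with a lower
learning_rate the test score typically keeps improving for many more
trees, so more estimators, and more training time, are needed.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=8, noise=10, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

for lr in (0.5, 0.05):
    gbr = GradientBoostingRegressor(n_estimators=300, learning_rate=lr,
                                    random_state=2).fit(X_train, y_train)
    # Test R^2 after each tree is added to the series
    scores = [r2_score(y_test, pred) for pred in gbr.staged_predict(X_test)]
    print(f"learning_rate={lr}: R^2 after 50 trees = {scores[49]:.3f}, "
          f"after 300 trees = {scores[299]:.3f}")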