Module 10- Part II
Boosting Models
AdaBoost, GBM, XGBoost
Prof. Pedram Jahangiry
Class Modules
• Module 1- Introduction to Machine Learning
• Module 2- Setting up Machine Learning Environment
• Module 3- Linear Regression (Econometrics approach)
• Module 4- Machine Learning Fundamentals
• Module 5- Linear Regression (Machine Learning approach)
• Module 6- Penalized Regression (Ridge, LASSO, Elastic Net)
• Module 7- Logistic Regression
• Module 8- K-Nearest Neighbors (KNN)
• Module 9- Classification and Regression Trees (CART)
• Module 10- Bagging and Boosting
• Module 11- Dimensionality Reduction (PCA)
• Module 12- Clustering (KMeans – Hierarchical)
Road map: ML Algorithms
• Supervised
  – Regression: Linear / Polynomial regression, Penalized regression, KNN, SVR, Tree-based regression models
  – Classification: Logistic regression, KNN, SVM / SVC, Tree-based classification models
• Unsupervised
  – Clustering: K-Means, Hierarchical
  – Dimensionality Reduction: Principal Component Analysis (PCA)
• Tree-based models:
  1. Decision Trees (DTs)
  2. Bagging, Random Forests
  3. Boosting
Topics
Part I
1. Bagging vs Boosting
2. AdaBoost
3. Gradient Boosting Machine (GBM)
4. XGBoost
Part II
Pros and Cons
Part I
1. Bagging vs Boosting
2. AdaBoost
3. Gradient Boosting Machine (GBM)
4. XGBoost
Bagging vs Boosting
• Bagging consists of creating many “copies” of the training data (each copy slightly different from the others), applying the weak learner to each copy to obtain multiple weak models, and then combining them.
• In bagging, the bootstrapped trees are independent of each other.
• Boosting consists of using the “original” training data and iteratively creating multiple models with a weak learner. Each new model differs from the previous ones in that the weak learner, in building each new model, tries to “fix” the errors that the previous models made.
• In boosting, each tree is grown using information from the previous trees (a sketch of the contrast follows below).
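A minimal scikit-learn sketch of this contrast, with illustrative settings (the dataset, tree depth, and number of estimators are assumptions, not from the slides). BaggingClassifier fits each weak tree on an independent bootstrap copy of the data, while AdaBoostClassifier grows each tree using information from the previous ones. Depending on the scikit-learn version, the base-learner argument is `estimator` (newer) or `base_estimator` (older).

```python
# Illustrative sketch: bagging (independent bootstrapped trees)
# vs. boosting (sequential trees that try to fix earlier errors).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: many bootstrap "copies" of the training data, one weak tree per copy.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),   # weak learner
    n_estimators=200, bootstrap=True, random_state=0,
).fit(X_train, y_train)

# Boosting: the original data, trees built iteratively to correct earlier mistakes.
boosting = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),   # stump
    n_estimators=200, random_state=0,
).fit(X_train, y_train)

print("Bagging accuracy :", accuracy_score(y_test, bagging.predict(X_test)))
print("Boosting accuracy:", accuracy_score(y_test, boosting.predict(X_test)))
```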
AdaBoost (Adaptive Boosting)
• A forest of weak learners (trees that split on only one feature: stumps).
• Each tree (stump) depends on the previous tree’s errors rather than being independent.
1) Start with the usual splitting criteria.
2) Each tree (stump) gets a different weight based on its prediction accuracy.
3) Each observation gets a weight based on how poorly it was predicted (e.g., misclassified observations get more weight).
4) Aggregation is done based on each weak learner’s weight (a step-by-step sketch follows below).
Source: Towards Data Science
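A hand-rolled sketch of steps 1–4 for discrete AdaBoost with labels in {-1, +1}; the dataset, number of rounds, and variable names are illustrative assumptions rather than the course code.

```python
# Step-by-step AdaBoost loop (discrete AdaBoost, labels in {-1, +1}).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y = np.where(y == 1, 1, -1)                          # relabel to {-1, +1}

n, n_rounds = len(y), 50
w = np.full(n, 1.0 / n)                              # equal observation weights to start
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)      # 1) weak learner: a single stump
    stump.fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y].sum() / w.sum()               # weighted error of this stump
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # 2) stump weight from its accuracy
    w *= np.exp(-alpha * y * pred)                   # 3) up-weight misclassified observations
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# 4) Aggregate: weighted vote of all stumps.
scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("Training accuracy:", np.mean(np.sign(scores) == y))
```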
AdaBoost
Key features:
• Adaptive: updates the weights of misclassified instances at each step.
• Tends to be sensitive to noise and outliers.
• Can be used with various base classifiers, but is most commonly used with decision stumps (see the sketch below).
• AdaBoost is old: it is a popular boosting technique introduced by Yoav Freund and Robert Schapire in 1996.
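A small sketch of swapping the base classifier, assuming scikit-learn’s AdaBoostClassifier (the argument is named `estimator` in recent versions and `base_estimator` in older ones); the depths and other settings are illustrative.

```python
# Illustrative: AdaBoost with different base classifiers (stump vs. deeper tree).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

for depth in (1, 3):                                  # depth 1 = classic decision stump
    model = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=depth),
        n_estimators=100, learning_rate=0.5, random_state=1,
    )
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"Base tree depth {depth}: CV accuracy = {score:.3f}")
```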
Gradient Boosting Machine (GBM)
Source: GeeksforGeeks
• In gradient boosting, each weak learner corrects its predecessor’s errors.
• Unlike AdaBoost, the weights of the training instances are not tweaked; instead, each predictor is trained using the residual errors of its predecessor as labels.
• Unlike AdaBoost, each tree can be larger than a stump. However, the trees are still small. By fitting a small tree to the residuals, the GBM slowly improves f̂ in areas where it does not perform well.
• The learning rate shrinks the contribution of each tree. There is a trade-off between the learning rate and the number of trees: a smaller learning rate slows the process down further, allowing more and differently shaped trees to attack the residuals.
• Aggregation is done by adding the first tree’s predictions and a scaled (shrunken) version of the following trees (a residual-fitting sketch follows below).
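A hand-rolled sketch of this residual-fitting loop for squared-error regression (the dataset, tree depth, learning rate, and number of trees are assumptions for illustration), followed by the equivalent call to scikit-learn’s built-in GradientBoostingRegressor.

```python
# GBM idea by hand: fit small trees to the current residuals and add a
# shrunken (learning-rate-scaled) correction at each step.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

learning_rate, n_trees = 0.1, 200
prediction = np.full_like(y, y.mean())               # start from a constant prediction
trees = []

for _ in range(n_trees):
    residuals = y - prediction                       # errors of the ensemble so far
    tree = DecisionTreeRegressor(max_depth=2)        # small tree, larger than a stump
    tree.fit(X, residuals)                           # residuals become the new "labels"
    prediction += learning_rate * tree.predict(X)    # add a shrunken correction
    trees.append(tree)

print("Training MSE:", np.mean((y - prediction) ** 2))

# The same trade-off (learning_rate vs. n_estimators) with scikit-learn:
from sklearn.ensemble import GradientBoostingRegressor
gbm = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, max_depth=2)
gbm.fit(X, y)
```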
Extreme Gradient Boosting (XGBoost)
• XGBoost is a refined and customized version of a gradient boosting decision tree system, created
with performance and speed in mind.
• Extreme refers to the fact that the algorithms and methods have been customized to push the limit
of what is possible for gradient boosting algorithms.
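A minimal sketch using the xgboost package’s scikit-learn-style wrapper (this assumes xgboost is installed; the hyperparameter values below are illustrative, not recommendations).

```python
# Minimal XGBoost classifier sketch with a few of its many knobs.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=300,       # number of boosted trees
    learning_rate=0.1,      # shrinkage applied to each tree's contribution
    max_depth=3,            # small trees, as in GBM
    subsample=0.8,          # row subsampling per tree
    colsample_bytree=0.8,   # column subsampling per tree
    reg_lambda=1.0,         # L2 regularization on leaf weights
    random_state=0,
)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```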
Put it all together!
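To put the three boosting models side by side, here is a rough comparison sketch on one synthetic dataset (settings are arbitrary; this is an illustration, not a benchmark).

```python
# AdaBoost vs. GBM vs. XGBoost on the same data, via 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=200, random_state=42),
    "GBM": GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                      max_depth=2, random_state=42),
    "XGBoost": XGBClassifier(n_estimators=200, learning_rate=0.1,
                             max_depth=2, random_state=42),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:>8}: CV accuracy = {score:.3f}")
```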
Part II
Pros and Cons
XGBoost’s Pros and Cons
Pros:
Cons:
• XGBoost is more difficult to understand, visualize, and tune than AdaBoost and random forests.
• There is a multitude of hyperparameters that can be tuned to increase performance (a tuning sketch follows below).
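A small sketch of tuning a handful of those hyperparameters with scikit-learn’s GridSearchCV (the grid below is deliberately tiny and purely illustrative).

```python
# Grid search over a few XGBoost hyperparameters.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

param_grid = {
    "n_estimators": [100, 300],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 4],
    "subsample": [0.8, 1.0],
}
search = GridSearchCV(XGBClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
print("Best parameters :", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```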
Class Modules
✓ Module 1- Introduction to Machine Learning
✓ Module 2- Setting up Machine Learning Environment
✓ Module 3- Linear Regression (Econometrics approach)
✓ Module 4- Machine Learning Fundamentals
✓ Module 5- Linear Regression (Machine Learning approach)
✓ Module 6- Penalized Regression (Ridge, LASSO, Elastic Net)
✓ Module 7- Logistic Regression
✓ Module 8- K-Nearest Neighbors (KNN)
✓ Module 9- Classification and Regression Trees (CART)
✓ Module 10- Bagging and Boosting
• Module 11- Dimensionality Reduction (PCA)
• Module 12- Clustering (KMeans – Hierarchical)