Week 7 - Tree-based models
1. Decision tree - 30m
- Used for both classification & regression.
- A decision tree is a series of questions with yes/no answers, where the resulting tree structure contains all
the combinations of responses. The goal is to create a model that predicts the value of a
target variable by learning simple decision rules inferred from the data features.
- Training algorithm: for each feature, construct a candidate threshold and choose the
feature/threshold pair that gives the greatest drop in the weighted error metric (the delta
before/after the split), then use this threshold to divide the node into two subsets, reducing
Gini impurity (classification) or MSE/variance (regression). The greater the node purity,
the lower the Gini metric (see the sketch below).
- This is a “greedy” algorithm: it makes the locally optimal choice at each step
(optimal for the current round only). Greedy algorithms do not always converge to the
global optimum, but they have better time complexity (faster computation).
For classification, the partitions are chosen to separate the different classes while in
regression, the partitions are picked to reduce the variance of sample labels.
https://github.com/manhitv/worldquant-university-unit-II/blob/master/ML_Tree_Based_Models.ipynb
(examples for classification - Gini impurity, predict the most common value; and for regression - MSE, predict the average value)
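A minimal sketch of the impurity-drop computation described above, using a made-up one-feature node purely for illustration:
```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# Toy node with two classes (illustrative data only)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])   # a single feature

threshold = 4.5                                  # candidate threshold
left, right = y[x <= threshold], y[x > threshold]

# Weighted impurity after the split, weighted by the sizes of the child nodes
after = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
drop = gini(y) - after                           # the "delta before/after" the notes mention
print(gini(y), after, drop)                      # 0.48, 0.0, 0.48 - a perfect split here
```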
- Hyperparameters (a leaf is a terminal node; a split happens at an internal node higher up the tree); see the sketch after this list:
+ max_depth: the key regularization parameter (maximum depth of the tree)
+ max_features: number of features to consider when looking for the best split
+ min_samples_split: minimum number of samples required to split an internal node
+ min_samples_leaf: minimum number of samples required at a leaf node.
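A minimal sketch of where these hyperparameters appear in scikit-learn's DecisionTreeClassifier (the dataset and values are arbitrary, not recommendations):
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

tree = DecisionTreeClassifier(
    criterion="gini",       # impurity measure (the regressor uses MSE/variance instead)
    max_depth=3,            # key regularization parameter
    max_features=None,      # number of features considered at each split
    min_samples_split=4,    # min samples needed to split an internal node
    min_samples_leaf=2,     # min samples required at a leaf
    random_state=42,
)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))
```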
- Time complexity: training takes roughly O(n·p·log(n)) (a balanced tree has about log(n)
levels, with roughly n·p work per level to evaluate splits over n samples and p features);
making a prediction takes O(log(n)), since an instance passes through about log(n) decisions
(e.g., ~20 decisions for n = 1,000,000).
- Tree-based models are popular because they mimic human decision-making processes,
work well for a large class of problems, naturally handle multiclass classification, and handle a mix
of categorical and numerical data.
- Pros and cons:
+ Advantages: easy to interpret and visualize, require less data preprocessing
from the user, can be used for feature engineering such as predicting missing
values, and make no assumptions about the data distribution.
The transparency of a model is often called its explicability. Models with low
explicability are often referred to as "black boxes", and it is difficult to derive
insight into the process they are modeling (the subject of the explainable ML field).
+ Disadvantages: sensitive to noisy data, unstable (sensitive to small variations in the
data; this can be reduced by bagging and boosting algorithms), and biased on
imbalanced datasets (balance the classes before building the tree).
*Question: For decision trees, there is no need to scale your data. Why is this?
2. Ensemble models - Tree-based algorithms
- ML models that use more than one predictor to arrive at a prediction. A group of
predictors forms an ensemble. In general, ensemble models perform better than a single
predictor.
- VotingClassifier: a simple ensemble model that aggregates different classifiers by majority vote (hard voting) or by averaging their class probabilities (soft voting).
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html
If all classifiers are able to estimate class probabilities (i.e., they all have a
predict_proba() method), then you can tell Scikit-Learn to predict the class with the highest class
probability, averaged over all the individual classifiers.
https://scikit-learn.org/stable/modules/ensemble.html#voting-classifier
Ensemble methods work best when the predictors are as independent from one another
as possible. One way to get diverse classifiers is to train them using very different algorithms.
This increases the chance that they will make very different types of errors, improving the
ensemble’s accuracy.
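A minimal sketch of soft voting with diverse classifiers (this also previews the make_moons practice in the exercise section; all settings are illustrative):
```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),  # probability=True enables predict_proba
    ],
    voting="soft",  # average class probabilities; "hard" uses a majority vote instead
)
voting_clf.fit(X_train, y_train)
print(voting_clf.score(X_test, y_test))
```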
- There are 3 types of ensemble models: bagging, boosting and blending
+ Bagging: e.g. random forest.
After training many predictors → aggregate their outputs with a statistical function (the average
for regression, the most frequent value, i.e. the mode, for classification).
Predictors can all be trained in parallel, via different CPU cores or even different servers.
Similarly, predictions can be made in parallel. This is one of the reasons bagging and pasting
are such popular methods: they scale very well.
Reference in sklearn (rarely used directly): sklearn.ensemble.BaggingClassifier — scikit-learn documentation
Bootstrapping is a general statistical technique to generate "new" data sets from a single data set
by random sampling with replacement.
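A tiny sketch of bootstrapping with NumPy, sampling with replacement from a single data set:
```python
import numpy as np

rng = np.random.default_rng(42)
data = np.arange(10)                                  # original data set
indices = rng.integers(0, len(data), size=len(data))  # sample indices with replacement
bootstrap_sample = data[indices]
print(bootstrap_sample)  # some values repeat, some are left out ("out-of-bag" instances)
```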
`n_estimators` - the number of trees in the forest; increasing it improves performance up to a plateau.
Random Forest: a random subset of the features is selected to determine which one to use for a
node split.
Extremely randomized trees: instead of considering the optimal split point for each selected
feature, a candidate split value for each feature is chosen at random. From these randomly chosen
values, the best is used to perform the split. (They might need more trees; run and compare the two classifiers, as in the sketch below.)
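A minimal sketch comparing the two classifiers on the same data, as suggested above (dataset and settings are illustrative):
```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

rf = RandomForestClassifier(n_estimators=200, random_state=42)
et = ExtraTreesClassifier(n_estimators=200, random_state=42)  # random thresholds per candidate feature

print("Random Forest:", cross_val_score(rf, X, y, cv=5).mean())
print("Extra Trees:  ", cross_val_score(et, X, y, cv=5).mean())
```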
+ Boosting: combine ("boost") multiple weak learners to form a strong learner.
The general idea of most boosting methods is to train predictors sequentially, each trying
to correct its predecessor.
*AdaBoost (adaptive boosting): pays a bit more attention to the training instances that the
predecessor underfitted. When training an AdaBoost classifier, the algorithm first trains a base
classifier (such as a Decision Tree) and uses it to make predictions on the training set. The
algorithm then increases the relative weight of misclassified training instances. Then it trains a
second classifier, using the updated weights, and again makes predictions on the training set,
updates the instance weights, and so on.
There is one important drawback to this sequential learning technique: it cannot be
parallelized (or only partially), since each predictor can only be trained after the previous
predictor has been trained and evaluated. It does not scale as well as bagging or pasting.
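A minimal AdaBoost sketch with a decision stump as the weak learner (the dataset and settings are illustrative):
```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # weak learner (a decision stump);
                                                    # older sklearn calls this base_estimator
    n_estimators=200,
    learning_rate=0.5,
    random_state=42,
)
ada.fit(X_train, y_train)
print(ada.score(X_test, y_test))
```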
*Gradient boosting: like AdaBoost, it adds predictors sequentially to the ensemble, but instead
of tweaking the instance weights at every iteration, this method tries to fit the new predictor to
the residual errors made by the previous predictor.
Effect of hyperparameters: learning_rate scales the contribution of each tree (shrinkage); lowering
it usually requires more trees (n_estimators) but generalizes better. A sketch of the residual-fitting
idea follows.
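A minimal sketch of the residual-fitting idea with three small regression trees (the quadratic toy data is made up for illustration; scikit-learn's GradientBoostingRegressor automates this and adds the learning_rate shrinkage):
```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-0.5, 0.5, size=(100, 1))
y = 3 * X[:, 0] ** 2 + 0.05 * rng.normal(size=100)   # noisy quadratic (made-up data)

# Each new tree is fit to the residual errors of the ensemble built so far
tree1 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y)
y2 = y - tree1.predict(X)                            # residuals of tree1
tree2 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y2)
y3 = y2 - tree2.predict(X)                           # residuals of tree1 + tree2
tree3 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y3)

# The ensemble prediction is the sum of the trees' predictions
X_new = np.array([[0.2]])
y_pred = sum(tree.predict(X_new) for tree in (tree1, tree2, tree3))
print(y_pred)  # close to 3 * 0.2**2 = 0.12
```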
+ Blending: train a blender (meta-learner) on the predictions of several base predictors.
*Splitting: split the training set into two subsets; train the first-layer predictors on the first
subset and use them to make predictions on the second (held-out) subset.
*Blender & predict: train the blender on those held-out predictions (inputs are the base
predictions, targets are the true labels); at prediction time, feed a new instance to all base
predictors and pass their outputs to the blender (see the sketch below).
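A minimal manual blending sketch following the splitting / blender-and-predict outline above (the dataset and base models are arbitrary choices for illustration; scikit-learn's StackingClassifier offers a built-in alternative):
```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Splitting: hold out part of the training set for the blender
X_base, X_hold, y_base, y_hold = train_test_split(X_train, y_train, random_state=42)

# Train the first-layer predictors on the first subset
base_models = [
    RandomForestClassifier(random_state=42).fit(X_base, y_base),
    SVC(probability=True, random_state=42).fit(X_base, y_base),
]

def base_predictions(models, X):
    # Stack each base model's positive-class probability as one input column
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])

# Blender & predict: train a meta-learner on the held-out predictions
blender = LogisticRegression().fit(base_predictions(base_models, X_hold), y_hold)
print(blender.score(base_predictions(base_models, X_test), y_test))
```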
- Feature importances: tree-based algorithms support measuring the importance of features via
the feature_importances_ attribute.
Scikit-Learn measures a feature’s importance by looking at how much the tree
nodes that use that feature reduce impurity on average (across all trees in the forest).
More precisely, it is a weighted average, where each node’s weight is equal to the
number of training samples that are associated with it. Scikit-Learn computes this score
automatically for each feature after training, then it scales the results so that the sum of
all importances is equal to 1.
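A minimal sketch of reading feature_importances_ from a trained forest (the iris dataset is used only for illustration):
```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(iris.data, iris.target)

# feature_importances_ is scaled so that the scores sum to 1 across all features
for name, score in zip(iris.feature_names, rf.feature_importances_):
    print(f"{name}: {score:.3f}")
```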
3. Exercise
- Questions:
+ If a Decision Tree is overfitting the training set, is it a good idea to try decreasing
max_depth?
+ If your AdaBoost ensemble underfits the training data, which hyperparameters
should you tweak and how?
+ If your Gradient Boosting ensemble overfits the training set, should you increase
or decrease the learning rate? → consider the typical case
- Practice: use an ensemble (Logistic Regression, SVM, Random
Forest) on the make_moons dataset → Homework: apply the same approach to the California housing dataset
4. Lab
- Re-introduce workflow of a Machine Learning project
B1. Define the scope of work and objective
+ Context: How will your solution be used?
+ Metrics, constraints?
+ How would the problem be solved manually?
+ Back test: list the available assumptions and verify them if possible.
B2. Get the data
+ Where, Format/Store data
+ Check the overview (size, type, sample, description, statistics)
+ Data cleaning
B3. EDA & Data transformation
+ Attribute and its characteristics (missing values, type of distribution, usefulness)
+ Visualize the data
+ Study the correlations between attributes
+ [Feature selection], Feature Engineering, [Feature scaling]
+ Write functions for all data transformations
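A minimal sketch of packaging the data transformations into a reusable pipeline (the column names below are hypothetical placeholders, not from the dataset used in class):
```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names - adapt to the actual dataset
num_features = ["median_income", "total_rooms"]
cat_features = ["ocean_proximity"]

num_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # feature scaling
])

preprocess = ColumnTransformer([
    ("num", num_pipeline, num_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_features),
])
# preprocess.fit_transform(train_df) can then be reused identically on new data
```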
B4. Train models
+ Automate as much as possible
+ Train promising models quickly using standard parameters. Measure and compare
their performance
+ Error analysis
+ Shortlist the top three to five most promising models, preferring models that make
different types of errors.
B5. Fine-tuning
+ Treat data transformation choices as hyperparameters, especially when you are not
sure about them (e.g., replace missing values with zeros or with the median value)
+ Unless there are very few hyperparameter values to explore, prefer random search
over grid search (see the sketch at the end of this list).
+ Try ensemble methods
+ Test your final model on the test set to estimate the generalization error. Don't tweak
your model after that, or you will start overfitting the test set.
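A minimal random-search sketch, as preferred over grid search above (the parameter ranges and dataset are illustrative):
```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

param_distributions = {
    "n_estimators": randint(50, 500),
    "max_depth": randint(2, 20),
    "min_samples_leaf": randint(1, 10),
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=20,          # number of random parameter combinations to try
    cv=5,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```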
- Fashion MNIST dataset