Tuning a CART's hyperparameters
MACHINE LEARNING WITH TREE-BASED MODELS IN PYTHON
Elie Kawerk
Data Scientist
Hyperparameters
Machine learning model:
parameters: learned from data
CART example: split-point of a node, split-feature of a node, ...
hyperparameters: not learned from data, set prior to training
CART example: max_depth, min_samples_leaf, splitting criterion, ...
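To make the distinction concrete, here is a minimal sketch (assuming training arrays X_train and y_train are available; the hyperparameter values are illustrative): the hyperparameters are fixed before fitting, while the split-features and split-points are learned from the data.

# Hyperparameters: chosen before training
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier(max_depth=2, min_samples_leaf=0.1)
# Parameters: learned from the data during fitting
dt.fit(X_train, y_train)
# The fitted tree stores the learned split-feature and split-point of each node
print(dt.tree_.feature)    # split-feature index per node (-2 marks a leaf)
print(dt.tree_.threshold)  # split-point (threshold) per node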
What is hyperparameter tuning?
Problem: search for a set of optimal hyperparameters for a learning algorithm.
Solution: find a set of optimal hyperparameters that results in an optimal model.
Optimal model: yields an optimal score.
Score: in sklearn, defaults to accuracy (classification) and R² (regression).
Cross validation is used to estimate the generalization performance.
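For instance, the generalization performance of a single hyperparameter setting can be estimated with cross-validation, as in this sketch (X_train and y_train assumed available):

# Estimate generalization performance of one hyperparameter setting
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier(max_depth=4, random_state=1)
# 10-fold CV accuracy: mean score over the held-out folds
cv_scores = cross_val_score(dt, X_train, y_train, cv=10, scoring='accuracy')
print(cv_scores.mean())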
Why tune hyperparameters?
In sklearn, a model's default hyperparameters are not optimal for all problems.
Hyperparameters should be tuned to obtain the best model performance.
Approaches to hyperparameter tuning
Grid Search
Random Search (see the sketch after this list)
Bayesian Optimization
Genetic Algorithms
....
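This chapter focuses on grid search; for comparison, random search is also available out of the box in sklearn. A minimal sketch, assuming a classifier dt and training data as in the following slides (the grid values are illustrative):

# Random search samples n_iter settings instead of trying them all
from sklearn.model_selection import RandomizedSearchCV
params_dt = {'max_depth': [2, 3, 4, 5, 6],
             'min_samples_leaf': [0.04, 0.06, 0.08, 0.1]}
random_dt = RandomizedSearchCV(estimator=dt,
                               param_distributions=params_dt,
                               n_iter=8,
                               cv=10,
                               scoring='accuracy',
                               random_state=1)
random_dt.fit(X_train, y_train)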
Grid search cross validation
Manually set a grid of discrete hyperparameter values.
Set a metric for scoring model performance.
Search exhaustively through the grid.
For each set of hyperparameters, evaluate the corresponding model's CV score.
The optimal hyperparameters are those of the model achieving the best CV score.
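Conceptually, grid search is just an exhaustive loop over the Cartesian product of the grids. A hand-rolled sketch of what GridSearchCV automates (X_train and y_train assumed available):

# Manual grid search with cross-validation
from itertools import product
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
best_score, best_params = -1.0, None
for depth, leaf in product([2, 3, 4], [0.05, 0.1]):
    dt = DecisionTreeClassifier(max_depth=depth, min_samples_leaf=leaf,
                                random_state=1)
    score = cross_val_score(dt, X_train, y_train, cv=10,
                            scoring='accuracy').mean()
    if score > best_score:
        best_score, best_params = score, (depth, leaf)
print(best_params, best_score)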
Grid search cross validation: example
Hyperparameter grids:
max_depth = {2, 3, 4}
min_samples_leaf = {0.05, 0.1}
hyperparameter space = {(2, 0.05), (2, 0.1), (3, 0.05), ...}
CV scores = {score(2, 0.05), ...}
optimal hyperparameters = set of hyperparameters corresponding to the best CV score.
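The hyperparameter space above can be enumerated explicitly; with grids of sizes 3 and 2, it contains 3 × 2 = 6 combinations:

# Enumerate the Cartesian product of the two grids
from itertools import product
space = list(product([2, 3, 4], [0.05, 0.1]))
print(space)  # [(2, 0.05), (2, 0.1), (3, 0.05), (3, 0.1), (4, 0.05), (4, 0.1)]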
Inspecting the hyperparameters of a CART in sklearn
# Import DecisionTreeClassifier
from sklearn.tree import DecisionTreeClassifier
# Set seed to 1 for reproducibility
SEED = 1
# Instantiate a DecisionTreeClassifier 'dt'
dt = DecisionTreeClassifier(random_state=SEED)
# Print out 'dt's hyperparameters
print(dt.get_params())

{'class_weight': None,
 'criterion': 'gini',
 'max_depth': None,
 'max_features': None,
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'presort': False,
 'random_state': 1,
 'splitter': 'best'}
# Import GridSearchCV
from sklearn.model_selection import GridSearchCV
# Define the grid of hyperparameters 'params_dt'
params_dt = {
    'max_depth': [3, 4, 5, 6],
    'min_samples_leaf': [0.04, 0.06, 0.08],
    'max_features': [0.2, 0.4, 0.6, 0.8]
}
# Instantiate a 10-fold CV grid search object 'grid_dt'
grid_dt = GridSearchCV(estimator=dt,
                       param_grid=params_dt,
                       scoring='accuracy',
                       cv=10,
                       n_jobs=-1)
# Fit 'grid_dt' to the training data
grid_dt.fit(X_train, y_train)
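Beyond the best result, the fitted grid object records the CV score of every hyperparameter combination. One way to inspect them, assuming pandas is available:

# Full per-combination CV results as a table
import pandas as pd
results_df = pd.DataFrame(grid_dt.cv_results_)
print(results_df[['params', 'mean_test_score', 'rank_test_score']])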
Extracting the best hyperparameters
# Extract best hyperparameters from 'grid_dt'
best_hyperparams = grid_dt.best_params_
print('Best hyperparameters:\n', best_hyperparams)
Best hyperparameters:
{'max_depth': 3, 'max_features': 0.4, 'min_samples_leaf': 0.06}
# Extract best CV score from 'grid_dt'
best_CV_score = grid_dt.best_score_
print('Best CV accuracy: {:.3f}'.format(best_CV_score))
Best CV accuracy: 0.938
Extracting the best estimator
# Extract best model from 'grid_dt'
best_model = grid_dt.best_estimator_
# Evaluate test set accuracy
test_acc = best_model.score(X_test, y_test)
# Print test set accuracy
print("Test set accuracy of best model: {:.3f}".format(test_acc))
Test set accuracy of best model: 0.947
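Note that GridSearchCV refits the best estimator on the whole training set by default (refit=True), so the fitted grid object can score and predict directly:

# Equivalent shortcuts through the fitted grid object
test_acc = grid_dt.score(X_test, y_test)  # scores using best_estimator_
y_pred = grid_dt.predict(X_test)          # predictions from the best model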
Let's practice!
Tuning an RF's hyperparameters
Random Forest Hyperparameters
CART hyperparameters
number of estimators
bootstrap
....
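To see exactly which hyperparameters the ensemble adds on top of a single CART, compare the two estimators' parameter names; a quick sketch (the exact output depends on the sklearn version):

# Hyperparameters specific to the forest (not present on a single tree)
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
tree_params = set(DecisionTreeRegressor().get_params())
forest_params = set(RandomForestRegressor().get_params())
print(sorted(forest_params - tree_params))
# e.g. ['bootstrap', 'n_estimators', 'n_jobs', 'oob_score', 'verbose', 'warm_start']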
Tuning is expensive
Hyperparameter tuning:
computationally expensive,
sometimes leads to only a very slight improvement.
Weigh the impact of tuning on the whole project.
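The cost multiplies quickly: the number of fits is the product of the grid sizes times the number of CV folds. A quick count for the RF grid used in the next slides:

# Count the fits a grid search will perform
import numpy as np
params_rf = {'n_estimators': [300, 400, 500],
             'max_depth': [4, 6, 8],
             'min_samples_leaf': [0.1, 0.2],
             'max_features': ['log2', 'sqrt']}
n_candidates = np.prod([len(v) for v in params_rf.values()])
print(n_candidates, n_candidates * 3)  # 36 candidates, 108 fits with 3-fold CV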
Inspecting RF Hyperparameters in sklearn
# Import RandomForestRegressor
from sklearn.ensemble import RandomForestRegressor
# Set seed for reproducibility
SEED = 1
# Instantiate a random forests regressor 'rf'
rf = RandomForestRegressor(random_state=SEED)
# Inspect rf's hyperparameters
rf.get_params()

{'bootstrap': True,
 'criterion': 'mse',
 'max_depth': None,
 'max_features': 'auto',
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 10,
 'n_jobs': -1,
 'oob_score': False,
 'random_state': 1,
 'verbose': 0,
 'warm_start': False}
# Basic imports
from sklearn.metrics import mean_squared_error as MSE
from sklearn.model_selection import GridSearchCV
# Define the grid of hyperparameters 'params_rf'
params_rf = {
    'n_estimators': [300, 400, 500],
    'max_depth': [4, 6, 8],
    'min_samples_leaf': [0.1, 0.2],
    'max_features': ['log2', 'sqrt']
}
# Instantiate 'grid_rf'
grid_rf = GridSearchCV(estimator=rf,
                       param_grid=params_rf,
                       cv=3,
                       scoring='neg_mean_squared_error',
                       verbose=1,
                       n_jobs=-1)
Searching for the best hyperparameters
# Fit 'grid_rf' to the training set
grid_rf.fit(X_train, y_train)
Fitting 3 folds for each of 36 candidates, totalling 108 fits
[Parallel(n_jobs=-1)]: Done 42 tasks | elapsed: 10.0s
[Parallel(n_jobs=-1)]: Done 108 out of 108 | elapsed: 24.3s finished
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=4,
max_features='log2', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=0.1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=400, n_jobs=1,
oob_score=False, random_state=1, verbose=0, warm_start=False)
Extracting the best hyperparameters
# Extract best hyperparameters from 'grid_rf'
best_hyperparams = grid_rf.best_params_
print('Best hyperparameters:\n', best_hyperparams)
Best hyperparameters:
{'max_depth': 4,
'max_features': 'log2',
'min_samples_leaf': 0.1,
'n_estimators': 400}
Evaluating the best model performance
# Extract best model from 'grid_rf'
best_model = grid_rf.best_estimator_
# Predict the test set labels
y_pred = best_model.predict(X_test)
# Evaluate the test set RMSE
rmse_test = MSE(y_test, y_pred)**(1/2)
# Print the test set RMSE
print('Test set RMSE of rf: {:.2f}'.format(rmse_test))
Test set RMSE of rf: 3.89
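As a cross-check, the best CV score can also be converted to an RMSE; since scoring='neg_mean_squared_error' maximizes the negative MSE, flip the sign before taking the square root:

# Best CV RMSE from the (negative) best CV score
import numpy as np
best_cv_rmse = np.sqrt(-grid_rf.best_score_)
print('Best CV RMSE: {:.2f}'.format(best_cv_rmse))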
Let's practice!
Congratulations!
How far you have come
Chapter 1: Decision-Tree Learning
Chapter 2: Generalization Error, Cross-Validation, Ensembling
Chapter 3: Bagging and Random Forests
Chapter 4: AdaBoost and Gradient-Boosting
Chapter 5: Model Tuning
Thank you!