
HYPERPARAMETER TUNING
Kapil Kumar Nagwanshi
PhD (CSE), Sr. Member IEEE, LMCSI, MIAENG

Associate Professor (CSE), SoS E&T


WHAT IS A PARAMETER IN AN ML MODEL?
 Machine learning models are basically mathematical functions that represent the relationship between
different aspects of data. For instance, a linear regression model uses a line to represent the relationship
between “features” and “target.” The formula looks like this:

y = wᵀx

 where x is a vector that represents the features of the data and y is a scalar variable that represents the
target (some numeric quantity that we wish to learn to predict).
 This model assumes that the relationship between x and y is linear.
 The variable w is a weight vector that represents the normal vector for the line; it specifies the slope of
the line.
 This is what’s known as a model parameter, which is learned during the training phase.
 “Training a model” involves using an optimization procedure to determine the best model parameter that
“fits” the data.

 So a model parameter is a configuration variable that is internal to the model and whose value can be
estimated from the given data.
 They are required by the model when making predictions.
 Their values define the skill of the model on your problem.
 They are estimated or learned from data.
 They are often not set manually by the practitioner.
 They are often saved as part of the learned model.
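As a concrete illustration of parameters being learned rather than set, here is a minimal sketch using sklearn's LinearRegression (the toy feature matrix and target below are made-up data, not from the slides):

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: the model parameters are estimated from these observations during fit()
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 4.0, 6.2, 7.9])

model = LinearRegression()
model.fit(X, y)                 # the optimization procedure estimates the parameters

print(model.coef_)              # learned weight vector w (the slope)
print(model.intercept_)         # learned intercept

Note that coef_ and intercept_ only exist after fitting; they are saved as part of the learned model.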
HYPERPARAMETERS: “NUISANCE PARAMETERS”
 In statistics, a hyperparameter is a parameter of a prior distribution; it captures the prior belief before
data is observed.
 In any machine learning algorithm, these parameters need to be initialized before training a model.
 These are values that must be specified outside of the training procedure.
 Vanilla linear regression doesn’t have any hyperparameters.
 But variants of linear regression do.
 Ridge regression and lasso both add a regularization term to linear regression; the weight for the
regularization term is called the regularization parameter.
 Decision trees have hyperparameters such as the desired depth and number of leaves in the tree.
Support vector machines (SVMs) require setting a misclassification penalty term.
 Kernelized SVMs require setting kernel parameters like the width for radial basis function (RBF) kernels.
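As a minimal sketch of how such hyperparameters are specified outside the training procedure, the following constructor arguments in sklearn correspond to the examples above (the particular values are arbitrary placeholders, not recommendations):

from sklearn.linear_model import Ridge, Lasso
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

ridge = Ridge(alpha=1.0)                   # regularization parameter for ridge
lasso = Lasso(alpha=0.1)                   # regularization parameter for lasso
tree = DecisionTreeClassifier(max_depth=5, max_leaf_nodes=20)   # desired depth and number of leaves
svm = SVC(C=1.0, kernel='rbf', gamma=0.5)  # misclassification penalty C and RBF kernel width gamma

None of these values are learned from the data; they are chosen before fit() is called.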

 Model hyperparameters are the properties that govern the entire training process. The variables below
are usually configured before training a model (illustrated in the sketch after this list).
• Learning Rate
• Number of Epochs
• Hidden Layers
• Hidden Units
• Activation Functions
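A minimal sketch of how these neural-network hyperparameters are configured before training, using sklearn's MLPClassifier (the values are arbitrary illustrations):

from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(
    hidden_layer_sizes=(64, 32),   # hidden layers and hidden units per layer
    activation='relu',             # activation function
    learning_rate_init=0.001,      # learning rate
    max_iter=200,                  # number of epochs (passes over the training data)
)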
WHY ARE HYPERPARAMETERS ESSENTIAL?
 Hyperparameters are important because they directly control the behavior of the
training algorithm and have a significant impact on the performance of the model being
trained.
 “A good choice of hyperparameters can really make an algorithm shine”.
 Choosing appropriate hyperparameters plays a crucial role in the success of our neural
network architecture, since it makes a huge impact on the learned model.
 For example:
 if the learning rate is too low, the model will miss the important patterns in the data.
 if it is too high, training may oscillate or diverge instead of converging.

 A good hyperparameter tuning process gives two benefits:
 Efficiently search the space of possible hyperparameters
 Easily manage a large set of experiments for hyperparameter tuning.
HYPERPARAMETER OPTIMISATION TECHNIQUES

The process of finding the most optimal hyperparameters in machine learning is called hyperparameter optimisation. Common algorithms include:
• Grid Search
• Random Search
• Bayesian Optimisation
GRID SEARCH
 Grid search is a very traditional technique for tuning hyperparameters: it brute-forces all
combinations. Grid search requires creating two sets of hyperparameters, for example:
 Learning Rate
 Number of Layers

 Grid search trains the algorithm for all combinations of the two sets of
hyperparameters (learning rate and number of layers) and measures the
performance using the “Cross-Validation” technique.
 This validation technique gives assurance that our trained
model got most of the patterns from the dataset.
 One of the best ways to perform this validation is “K-Fold Cross-Validation”,
which provides ample data for training the model and ample data for validation.
 The grid search method is a simple algorithm to use, but it suffers when the
hyperparameter space is high-dimensional (the curse of dimensionality); see the
sketch below.

https://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf
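A minimal sketch of what grid search does under the hood: brute-force every combination of the two hyperparameter sets and score each one with k-fold cross-validation (the grid values, the MLPClassifier estimator and the X_train/y_train names are illustrative assumptions):

from itertools import product
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

learning_rates = [0.001, 0.01, 0.1]
layer_options = [(32,), (32, 32), (32, 32, 32)]

best_score, best_params = -1.0, None
for lr, layers in product(learning_rates, layer_options):        # all combinations
    model = MLPClassifier(learning_rate_init=lr, hidden_layer_sizes=layers, max_iter=300)
    score = cross_val_score(model, X_train, y_train, cv=5).mean()  # 5-fold cross-validation
    if score > best_score:
        best_score, best_params = score, (lr, layers)

print(best_params, best_score)

With 3 learning rates and 3 layer configurations, this loop trains and validates 3 × 3 × 5 = 45 models, which is why the cost explodes as the number of hyperparameters grows.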
GRIDSEARCHCV
 It is difficult to manually change the hyperparameters and fit them on my training
data every time. Here’s why:
• it is time-consuming
• it is hard to keep track of the hyperparameters we have tried and those we still have to try

 So, I quickly asked Google if there was any solution to my problem and Google
showed me something called GridSearchCV from Sklearn. Let me share how I
took advantage of this GridSearchCV to solve my problem with a simple example.
 GridSearchCV is a library function that is a member of sklearn’s model_selection
package. It helps to loop through predefined hyperparameters and fit your
estimator (model) on your training set. So, in the end, you can select the best
parameters from the listed hyperparameters.
 In addition to that, you can specify the number of cross-validation folds to use
for each set of hyperparameters.
GridSearchCV takes the following arguments:
1. estimator: the estimator object you created
2. param_grid: the dictionary object that holds the hyperparameters you want to try
3. scoring: the evaluation metric you want to use; you can simply pass a valid string/object of an evaluation metric
4. cv: the number of cross-validation folds to use for each selected set of hyperparameters
5. verbose: set it to 1 to get a detailed print-out while you fit the data to GridSearchCV
6. n_jobs: the number of processes you wish to run in parallel for this task; if it is -1, it will use all available processors

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

kn = KNeighborsClassifier()

params = {
    'n_neighbors': [5, 25],
    'weights': ['uniform', 'distance'],
    'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute']
}

grid_kn = GridSearchCV(estimator=kn,
                       param_grid=params,
                       scoring='accuracy',
                       cv=5,
                       verbose=1,
                       n_jobs=-1)

grid_kn.fit(X_train, y_train)

That is pretty much all you need to define. Then you fit your training data as you normally do. The first printed line will look like this:

Fitting 5 folds for each of 16 candidates, totalling 80 fits

• Are you confused what it means? Simple! Since we have to try two options for n_neighbors, two for weights and four for algorithm, altogether there are 16 different combinations we should try out.
• And for each combination we have 5 CV fits, so 80 different fits will be tested by our GridSearchCV object.

The time for this fit depends on the number of hyperparameters you are trying out. Once everything is finished, you will get an output like this:

[Parallel(n_jobs=1)]: Done 80 out of 80 | elapsed: 74.1min finished
Then, to know what the best parameters are, you can simply print them:

# extract best estimator
print(grid_kn.best_estimator_)

Output:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=-1, n_neighbors=25, p=2,
                     weights='distance')

# to test the best fit
print(grid_kn.score(X_test, y_test))

Output:
0.9524753
RANDOM SEARCH
 Randomly samples the search space and evaluates sets from a
specified probability distribution.
 For example, instead of checking all 100,000 combinations, we can check 1,000
randomly sampled parameter sets.
 Drawback
 However, it doesn’t use information from prior experiments to select the next set,
and it is also very difficult to predict the next set of experiments.
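A minimal sketch of random search with sklearn's RandomizedSearchCV, which samples a fixed number of candidates from specified probability distributions instead of exhausting a grid (the RandomForestClassifier estimator, the distributions and the X_train/y_train names are illustrative assumptions):

from scipy.stats import randint, uniform
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    'n_estimators': randint(50, 500),     # sampled from a discrete uniform distribution
    'max_depth': randint(2, 20),
    'max_features': uniform(0.1, 0.9),    # sampled from a continuous uniform distribution
}

rand_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(),
    param_distributions=param_distributions,
    n_iter=50,        # evaluate only 50 random parameter sets instead of the full grid
    cv=5,
    n_jobs=-1,
)
rand_search.fit(X_train, y_train)
print(rand_search.best_params_)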
BAYESIAN OPTIMISATION
 The goal is to find the hyperparameter setting that maximizes the performance of the model on a
validation set.
 Machine learning algorithms frequently require fine-tuning of model hyperparameters.
 Unfortunately, that tuning is often treated as a ‘black-box’ function, because it cannot be written
as a formula and the derivatives of the function are unknown.
 A much more appealing way to optimize and fine-tune hyperparameters is an automated model
tuning approach using a Bayesian optimization algorithm.
 The model used for approximating the objective function is called the surrogate model.
 A popular surrogate model for Bayesian optimization is the Gaussian process (GP).
 Bayesian optimization typically works by assuming the unknown function was
sampled from a Gaussian Process (GP) and maintains a posterior distribution for this
function as observations are made.
BAYESIAN OPTIMIZATION
 There are two major choices that must be made when performing Bayesian
optimization.
 Select a prior over functions that expresses assumptions about the function being
optimized. For this, we choose a Gaussian process prior.
 Next, we must choose an acquisition function which is used to construct a
utility function from the model posterior, allowing us to determine the next point
to evaluate.
GAUSSIAN PROCESS
 A Gaussian process defines the prior distribution over functions which can
be converted into a posterior over functions once we have seen some data.
 The Gaussian process uses a covariance matrix to ensure that inputs which are
close together produce output values which are close together.
 The covariance matrix, along with a mean function µ that outputs the expected
value ƒ(x), defines a Gaussian process.
 1. The Gaussian process is used as a prior for Bayesian inference.
 2. The benefit of computing the posterior is that it can be used to make predictions for
unseen test cases.
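A minimal sketch of these two points using sklearn's GaussianProcessRegressor: the GP prior is conditioned on a few observed evaluations, and the resulting posterior returns a mean prediction and an uncertainty for unseen points (the toy objective and sample locations are made up for illustration):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# A few observed evaluations of an (otherwise unknown) objective function
X_obs = np.array([[0.1], [0.4], [0.9]])
y_obs = np.sin(3 * X_obs).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2))
gp.fit(X_obs, y_obs)            # condition the GP prior on the data -> posterior

X_new = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
mu, sigma = gp.predict(X_new, return_std=True)   # posterior mean and uncertainty
print(mu, sigma)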
GAUSSIAN PROCESS
 Acquisition Function
 Choosing which points of the search space to sample is done by acquisition functions:
the next sampling point is the one that maximizes the acquisition function.
Popular acquisition functions are
• Maximum Probability of Improvement (MPI)
• Expected Improvement (EI)
• Upper Confidence Bound (UCB)
 The Expected Improvement (EI) function seems to be a popular one. It is defined as
 EI(x) = 𝔼[max{0, ƒ(x) − ƒ(x̂)}]

 where x̂ is the current optimal set of hyperparameters and ƒ(x̂) is its objective value.
Maximising EI selects the hyperparameters that are expected to improve upon ƒ(x̂) the most.
1. EI is high when the posterior expected value µ(x) is higher than the current
best value ƒ(x̂).
2. EI is high when the uncertainty σ(x) around the point x is high.
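A minimal sketch of computing EI from a GP posterior, following the maximisation form of the formula above (it assumes a fitted GaussianProcessRegressor gp, an array of candidate points X_cand, and the current best observed value f_best; all three names are illustrative):

import numpy as np
from scipy.stats import norm

def expected_improvement(gp, X_cand, f_best):
    # EI(x) = E[max(0, f(x) - f(x_hat))] under the GP posterior at each candidate x
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-12)         # avoid division by zero
    improvement = mu - f_best
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)

# The candidate with the highest EI is evaluated next:
# x_next = X_cand[np.argmax(expected_improvement(gp, X_cand, f_best))]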
[email protected]
• “Random Search for Hyper-Parameter Optimization.” James Bergstra and Yoshua Bengio.
Journal of Machine Learning Research, 2012.
• “Algorithms for Hyper-Parameter Optimization.” James Bergstra, Rémi Bardenet, Yoshua
Bengio, and Balázs Kégl. Neural Information Processing Systems, 2011. See also a SciPy
2013 talk by the authors.
• “Practical Bayesian Optimization of Machine Learning Algorithms.” Jasper Snoek, Hugo
Larochelle, and Ryan P. Adams. Neural Information Processing Systems, 2012.
• “Sequential Model-Based Optimization for General Algorithm Configuration.” Frank Hutter,
Holger H. Hoos, and Kevin Leyton-Brown. Learning and Intelligent Optimization, 2011.
• “Lazy Paired Hyper-Parameter Tuning.” Alice Zheng and Mikhail Bilenko. International Joint
Conference on Artificial Intelligence, 2013.
• Introduction to Derivative-Free Optimization (MPS-SIAM Series on Optimization). Andrew R.
Conn, Katya Scheinberg, and Luis N. Vicente, 2009.
• “Gradient-Based Hyperparameter Optimization Through Reversible Learning.” Dougal Maclaurin,
David Duvenaud, and Ryan P. Adams. arXiv, 2015.
