Hyperparameter Tuning

The document discusses machine learning models, focusing on the concepts of model parameters and hyperparameters, which are crucial for training algorithms. It explains the importance of selecting appropriate hyperparameters and outlines methods for hyperparameter optimization, including grid search, random search, and Bayesian optimization. Additionally, it details the role of Gaussian processes and acquisition functions in optimizing hyperparameters to enhance model performance.


Kapil Kumar Nagwanshi

PhD (CSE), Sr. Member IEEE, LMCSI, MIAENG

Associate Professor (CSE), ASET


Amity University Rajasthan, Jaipur
▪ Machine learning models are basically mathematical functions that represent the relationship between different
aspects of data. For instance, a linear regression model uses a line to represent the relationship between
“features” and “target.” The formula looks like this:
y = wᵀx
▪ where x is a vector that represents features of the data and y is a scalar variable that represents the target (some
numeric quantity that we wish to learn to predict).
▪ This model assumes that the relationship between x and y is linear.
▪ The variable w is a weight vector that represents the normal vector for the line; it specifies the slope of the line.
▪ This is what’s known as a model parameter, which is learned during the training phase.

▪ “Training a model” involves using an optimization procedure to determine the best model parameter that “fits” the data:

y = Σᵢ₌₀ⁿ wᵢxᵢ
▪ So a model parameter is a configuration variable that is internal to the model and whose value can be estimated
from the given data.
▪ They are required by the model when making predictions.
▪ Their values define the skill of the model on your problem.
▪ They are estimated or learned from data.
▪ They are often not set manually by the practitioner.
▪ They are often saved as part of the learned model.
▪ In statistics, a hyperparameter is a parameter of a prior distribution; it captures the prior belief before data is observed.
▪ In any machine learning algorithm, these parameters need to be initialized before training a model.
▪ These are values that must be specified outside of the training procedure.
▪ Vanilla linear regression doesn’t have any hyperparameters.
▪ But variants of linear regression do.
▪ Ridge regression and lasso both add a regularization term to linear regression; the weight for the
regularization term is called the regularization parameter.
▪ Decision trees have hyperparameters such as the desired depth and number of leaves in the tree.
▪ Support vector machines (SVMs) require setting a misclassification penalty term.
▪ Kernelized SVMs require setting kernel parameters like the width for radial basis function (RBF)
kernels.
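▪ As a concrete illustration, here is a minimal sketch using scikit-learn (the synthetic data and the value alpha=1.0 are assumptions for illustration): the regularization strength alpha is a hyperparameter fixed before training, while the fitted coefficients are model parameters learned from the data.

```python
# Minimal sketch: hyperparameter (alpha) vs. learned model parameters (coef_).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                                   # features x
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0.0, 0.1, 100)  # target y

# alpha is a hyperparameter: set *before* training, not estimated from the data.
model = Ridge(alpha=1.0)
model.fit(X, y)

# coef_ and intercept_ are model parameters: estimated *from* the data.
print("learned weights w:", model.coef_)
print("learned intercept:", model.intercept_)
```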
▪ Model Hyperparameters are the properties that govern the entire training process. The following variables are usually configured before training a model:
• Learning Rate
• Number of Epochs
• Hidden Layers
• Hidden Units
• Activation Functions
▪ Hyperparameters are important because they directly control the behaviour of the training algorithm and have a significant impact on the performance of the model being trained.
▪ “A good choice of hyperparameters can really make an algorithm shine”.
▪ Choosing appropriate hyperparameters plays a crucial role in the success of our neural network architecture, since it makes a huge impact on the learned model.
▪ For example,
▪ if the learning rate is too low, the model will miss the important patterns in the data.
▪ If it is too high, the model may converge too quickly to a suboptimal solution or even diverge.

▪ Choosing good hyperparameters gives two benefits:


▪ Efficiently search the space of possible hyperparameters
▪ Easy to manage a large set of experiments for hyperparameter tuning.
The process of finding the optimal hyperparameters in machine learning is called hyperparameter optimisation. Common algorithms include:
• Grid Search
• Random Search
• Bayesian Optimisation
▪ Grid search is a very traditional technique for tuning hyperparameters. It brute-forces all combinations. Grid search requires creating two sets of hyperparameter values, for example:
▪ Learning Rate
▪ Number of Layers

▪ Grid search trains the algorithm for all combinations of the two sets of hyperparameters (learning rate and number of layers) and measures the performance using the “Cross Validation” technique.
▪ This validation technique gives assurance that our trained model got
most of the patterns from the dataset.
▪ One of the best ways to do validation is “K-Fold Cross Validation”, which helps to provide ample data for training the model and ample data for validation.
▪ The grid search method is a simple algorithm to use, but it suffers when the hyperparameter space is high-dimensional (the curse of dimensionality).
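▪ A minimal sketch of grid search with k-fold cross-validation, using scikit-learn's GridSearchCV (the digits dataset, the MLP model, and the candidate values below are illustrative assumptions, not prescriptions):

```python
# Minimal sketch: grid search over learning rate and number of layers,
# scored with 5-fold cross-validation (scikit-learn).
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

param_grid = {
    "learning_rate_init": [1e-3, 1e-2, 1e-1],        # learning-rate candidates
    "hidden_layer_sizes": [(32,), (64,), (64, 64)],  # "number of layers" candidates
}

search = GridSearchCV(
    MLPClassifier(max_iter=300, random_state=0),
    param_grid,
    cv=5,                  # 5-fold cross-validation
    scoring="accuracy",
)
search.fit(X, y)           # brute-forces 3 x 3 = 9 combinations, 5 folds each

print("best hyperparameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)
```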

https://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf
▪ Randomly samples the search space and evaluates sets from a
specified probability distribution.
▪ For example, instead of trying to check all 100,000 combinations, we can check 1,000 randomly sampled parameter settings.
▪ Drawback
▪ However, it doesn’t use information from prior experiments to select the next set of hyperparameters, and it is very difficult to predict the next set of experiments.
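▪ A corresponding sketch with scikit-learn's RandomizedSearchCV, which draws a fixed budget of configurations from specified distributions instead of enumerating the full grid (the distributions and the budget of 20 trials are assumptions for illustration):

```python
# Minimal sketch: random search samples hyperparameters from distributions
# rather than trying every combination (scikit-learn + scipy).
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

param_distributions = {
    "learning_rate_init": loguniform(1e-4, 1e-1),            # continuous log-uniform range
    "hidden_layer_sizes": [(32,), (64,), (64, 64), (128,)],  # sampled uniformly from this list
}

search = RandomizedSearchCV(
    MLPClassifier(max_iter=300, random_state=0),
    param_distributions,
    n_iter=20,             # evaluate only 20 random configurations
    cv=5,
    random_state=0,
)
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
```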
▪ Hyperparameter setting maximizes the performance of the model on
a validation set.
▪ Machine learning algorithms frequently require fine-tuning of model hyperparameters.
▪ Unfortunately, that tuning is often described as a ‘black-box’ function because it cannot be written as a formula and the derivatives of the function are unknown.
▪ A much more appealing way to optimize and fine-tune hyperparameters is an automated model-tuning approach using the Bayesian optimization algorithm.
▪ The model used for approximating the objective function is called the surrogate model. A popular surrogate model for Bayesian optimization is the Gaussian process (GP).
▪ Bayesian optimization typically works by assuming the unknown
function was sampled from a Gaussian Process (GP) and maintains a
posterior distribution for this function as observations are made.
▪ There are two major choices that must be made when performing Bayesian optimization.
▪ Select a prior over functions that will express assumptions about the function being optimized. For this, we choose a Gaussian Process prior.
▪ Next, we must choose an acquisition function which is used to construct a utility function
from the model posterior, allowing us to determine the next point to evaluate.
▪ A Gaussian process defines the prior distribution over functions which can be
converted into a posterior over functions once we have seen some data.
▪ The Gaussian process uses a covariance matrix to ensure that inputs that are close together produce function values that are close together.
▪ The covariance matrix, along with a mean function µ that outputs the expected value ƒ(x), defines a Gaussian process.
▪ 1. The Gaussian process will be used as a prior for Bayesian inference.
▪ 2. The advantage of computing the posterior is that it can be used to make predictions for unseen test cases.
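▪ A minimal sketch of this prior-to-posterior step with scikit-learn's GaussianProcessRegressor (the toy objective, the RBF kernel, and the observation points are assumptions; in real tuning, x would be a hyperparameter setting and ƒ(x) a validation score):

```python
# Minimal sketch: a Gaussian-process surrogate conditioned on a few observations.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):
    # Toy "black-box" function standing in for validation performance.
    return np.sin(3 * x) + 0.5 * x

# A few already-evaluated hyperparameter settings and their observed scores.
X_obs = np.array([[0.2], [1.0], [2.5]])
y_obs = objective(X_obs).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
gp.fit(X_obs, y_obs)       # GP prior + data -> posterior over functions

# Posterior mean and uncertainty at unseen candidate points.
X_new = np.linspace(0.0, 3.0, 50).reshape(-1, 1)
mu, sigma = gp.predict(X_new, return_std=True)
print(mu[:3], sigma[:3])
```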
▪ Acquisition Function
▪ Proposing new sample points in the search space is the job of acquisition functions: the next sampling point is chosen by maximizing the acquisition function. Popular acquisition functions are
• Maximum Probability of Improvement (MPI)
• Expected Improvement (EI)
• Upper Confidence Bound (UCB)
▪ The Expected Improvement (EI) function seems to be a popular one. It is defined as
▪ EI(x) = 𝔼[max{0, ƒ(x) − ƒ(x̂)}]

▪ where ƒ(x̂) is the objective value at x̂, the current optimal set of hyperparameters. Maximising EI selects the hyperparameters that are expected to improve most upon ƒ(x̂).
1. EI is high when the posterior expected value µ(x) is higher than the current best value ƒ(x̂).
2. EI is high when the uncertainty σ(x) around the point x is high.
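▪ A small sketch of EI computed in closed form from a GP posterior (numpy/scipy; the objective is assumed to be maximised, and mu and sigma are the posterior mean and standard deviation from a surrogate such as the one sketched above):

```python
# Minimal sketch: Expected Improvement, EI(x) = E[max(0, f(x) - f(x_hat))],
# computed from the GP posterior N(mu(x), sigma(x)^2) at candidate points.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    # mu, sigma : posterior mean and std at candidate points
    # f_best    : best objective value observed so far, f(x_hat)
    # xi        : small exploration bonus (an assumed default)
    sigma = np.maximum(sigma, 1e-12)              # avoid division by zero
    z = (mu - f_best - xi) / sigma
    return (mu - f_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# Usage: score candidate points and evaluate the objective next at the maximiser.
# ei = expected_improvement(mu, sigma, f_best=y_obs.max())
# x_next = X_new[np.argmax(ei)]
```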
