Lecture 1.
Closed-form Equations, Types of Gradient Descent
(Batch, Stochastic, Mini-batch) - Definitions, Properties
Dr. Mainak Biswas
Closed-form Equation
• Closed-form Equation: A closed-form equation is a mathematical expression that provides a direct way to compute a value without requiring iterative procedures or infinite series
– Example:
• Sum of an Arithmetic Series: The sum of the first n
terms of an arithmetic series with the first term a and
common difference d is:
$$S_n = \frac{n}{2}\bigl(2a + (n-1)d\bigr)$$
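As a quick illustration (not part of the original slide), the sketch below compares the closed-form sum with an explicit loop; the function names are hypothetical choices for this example.

```python
# Illustrative sketch: closed-form vs. iterative sum of an arithmetic series.
# Function names are hypothetical, chosen only for this example.

def series_sum_closed_form(a: float, d: float, n: int) -> float:
    """Closed form: S_n = (n / 2) * (2a + (n - 1)d), no loop required."""
    return n / 2 * (2 * a + (n - 1) * d)

def series_sum_iterative(a: float, d: float, n: int) -> float:
    """Iterative alternative: add the n terms one by one."""
    return float(sum(a + i * d for i in range(n)))

print(series_sum_closed_form(a=2, d=3, n=10))  # 155.0
print(series_sum_iterative(a=2, d=3, n=10))    # 155.0
```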
Gradient Descent
• Gradient Descent is an optimization algorithm used in
machine learning and deep learning to minimize the
loss function by updating the model's parameters in
the direction of the steepest descent
• The type of gradient descent depends on how much
data is used to compute the gradient at each iteration
• Gradient descent is also informally called "the steepest downward slope algorithm"
• It is very important in machine learning, where it is
used to minimize a cost function
Loss Function
$$E(w) = \frac{1}{2N}\sum_{i=1}^{N}\bigl(f(x_i) - y_i\bigr)^2$$
• Where $f(x_i) = w^T x_i$, then
$$\frac{\partial E}{\partial w} = \frac{1}{N}\sum_{i=1}^{N}\bigl(f(x_i) - y_i\bigr)\,x_i$$
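A minimal NumPy sketch (illustrative, not from the slides) of this loss and its gradient for the linear model; the names mse_loss and mse_gradient are assumptions made for this example.

```python
import numpy as np

# Illustrative sketch of E(w) and dE/dw for a linear model f(x_i) = w^T x_i.
# X has shape (N, d): one row per example; y has shape (N,).

def mse_loss(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    """E(w) = 1/(2N) * sum_i (f(x_i) - y_i)^2."""
    residuals = X @ w - y
    return float(np.sum(residuals ** 2) / (2 * len(y)))

def mse_gradient(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """dE/dw = 1/N * sum_i (f(x_i) - y_i) * x_i."""
    residuals = X @ w - y
    return X.T @ residuals / len(y)
```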
Mathematical Formulation of Gradient
Descent
$$w = w - \eta\,\nabla E(w)$$
• 𝑤 : Model parameters (weights)
• 𝜂 : Learning rate
• 𝛻𝐸(𝑤): Gradient of the loss function 𝐸(𝑤) with
respect to 𝑤
• It can also be written as:
$$w = w - \frac{\eta}{N}\sum_{i=1}^{N}\bigl(f(x_i) - y_i\bigr)\,x_i$$
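A minimal, self-contained NumPy sketch of this update rule; the name gd_step, the toy data, and the learning rate are illustrative assumptions, not part of the original slide.

```python
import numpy as np

# Illustrative single gradient-descent step for the loss above:
# w <- w - (eta / N) * sum_i (f(x_i) - y_i) * x_i, with f(x_i) = w^T x_i.

def gd_step(w: np.ndarray, X: np.ndarray, y: np.ndarray, eta: float) -> np.ndarray:
    grad = X.T @ (X @ w - y) / len(y)   # gradient of E(w) over the N examples
    return w - eta * grad               # move against the gradient

# Toy usage: repeat the step a fixed number of times.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.zeros(2)
for _ in range(500):
    w = gd_step(w, X, y, eta=0.1)
print(w)   # approaches the least-squares solution [1, 2]
```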
Numerical Problem
• Let $E(w) = (w - 3)^2 + 2$, $\eta = 0.1$, $w = 0$;
  find $w$ and $E(w)$ for five iterations
• Since $x_i = 1$ here, the iterations can be
  solved in terms of $w$ only
• $\dfrac{\partial E}{\partial w} = 2(w - 3)$
• $w_{\text{new}} = w_{\text{old}} - \eta\,\nabla E(w_{\text{old}}) = w_{\text{old}} - 0.1 \cdot 2(w_{\text{old}} - 3) = 0.8\,w_{\text{old}} + 0.6$
Sl   w        E(w)
1    0        11
2    0.6      7.76
3    1.0800   5.6864
4    1.4640   4.3597
5    1.7712   3.5099
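The short Python sketch below (illustrative, not from the slides) reproduces the table by iterating the update derived above.

```python
# Illustrative check of the table: iterate w <- w - 0.1 * 2 * (w - 3),
# i.e. w <- 0.8 * w + 0.6, for E(w) = (w - 3)^2 + 2, starting at w = 0.

def E(w: float) -> float:
    return (w - 3) ** 2 + 2

w, eta = 0.0, 0.1
for step in range(1, 6):
    print(f"{step}  w = {w:.4f}  E(w) = {E(w):.4f}")
    w -= eta * 2 * (w - 3)   # gradient step
```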
Batch Gradient Descent
• Batch Gradient Descent is an optimization algorithm used to minimize a loss function by iteratively updating the model's parameters using the entire dataset to calculate the gradient (see the sketch after this list)
• Advantages:
– Computes the gradient with high precision using the entire
dataset
– Converges steadily towards the minimum
– Suitable for smooth and convex loss functions
• Disadvantages:
– Memory-intensive when the dataset is large
– Requires processing the entire dataset for each iteration
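A minimal NumPy sketch of batch gradient descent for the squared-error loss used earlier; every update uses the full dataset. The function name and hyper-parameter values are assumptions for illustration.

```python
import numpy as np

# Illustrative batch gradient descent: each update uses all N examples.

def batch_gd(X: np.ndarray, y: np.ndarray,
             eta: float = 0.1, n_epochs: int = 100) -> np.ndarray:
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        grad = X.T @ (X @ w - y) / len(y)   # exact gradient over the whole dataset
        w -= eta * grad                     # one precise but costly update per epoch
    return w
```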
Stochastic Gradient Descent
• Stochastic Gradient Descent (SGD) is a variant of gradient descent where the model parameters are updated using only a single training example at a time, rather than the entire dataset (sketched after this list)
• This leads to faster updates and can help the algorithm
escape local minima, making it suitable for large datasets
• Advantages
– Faster Updates
– Escaping Local Minima
– Scalability
• Disadvantages
– Noisy Convergence
– Requires More Iterations
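A minimal NumPy sketch of SGD for the same squared-error loss; one randomly chosen example per update. The function name, learning rate, and epoch count are assumptions for illustration.

```python
import numpy as np

# Illustrative stochastic gradient descent: each update uses a single
# randomly chosen training example, giving cheap but noisy steps.

def sgd(X: np.ndarray, y: np.ndarray,
        eta: float = 0.01, n_epochs: int = 10, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for i in rng.permutation(len(y)):      # visit examples in random order
            grad_i = (X[i] @ w - y[i]) * X[i]  # gradient of one example's loss
            w -= eta * grad_i                  # noisy but cheap update
    return w
```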
Mini-Batch Gradient Descent
• Mini-batch Gradient Descent is a hybrid of Batch Gradient Descent and Stochastic Gradient Descent. It aims to combine the advantages of both by updating the model parameters using a subset (mini-batch) of the training data rather than the entire dataset (batch) or a single data point (stochastic); see the sketch after this list
– Mini-batch: The dataset is divided into small batches, each containing a fixed number of training examples; the size of each mini-batch, denoted $b$, is a hyper-parameter
– Gradient Calculation: For each mini-batch, the gradient is calculated based on
the average of the training examples in that batch
– Weight Update: The model parameters are updated using the computed
gradient for the mini-batch
– Repeat for all mini-batches until convergence
• Advantages: Faster than Batch GD, Less Noisy than SGD
• Disadvantages: Choosing the Right Batch Size, Memory Considerations
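A minimal NumPy sketch of mini-batch gradient descent for the same squared-error loss; the batch size b and the other values are assumptions made for illustration.

```python
import numpy as np

# Illustrative mini-batch gradient descent: each update averages the
# gradient over b examples drawn from a shuffled pass through the data.

def minibatch_gd(X: np.ndarray, y: np.ndarray, b: int = 32,
                 eta: float = 0.05, n_epochs: int = 20, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        idx = rng.permutation(len(y))                 # shuffle once per epoch
        for start in range(0, len(y), b):
            batch = idx[start:start + b]              # indices of one mini-batch
            Xb, yb = X[batch], y[batch]
            grad = Xb.T @ (Xb @ w - yb) / len(batch)  # average gradient over the batch
            w -= eta * grad
    return w
```

Setting b = 1 recovers stochastic gradient descent, while b = N recovers batch gradient descent.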