
Gradient Descent in Machine Learning: A Basic Introduction

Take a high-level look into gradient descent — one of ML's most popular algorithms.

Written by Niklas Donges
Updated by Jessica Powers | Mar 27, 2023
Reviewed by Parul Pandey

Image: Shutterstock / Built In

Gradient descent is by far the most popular optimization strategy used in machine learning and deep learning at the moment. It is used when training data models, can be combined with every algorithm and is easy to understand and implement. Everyone working with machine learning should understand its concept.

Source: https://builtin.com/data-science/gradient-descent

How to Solve Gradient Descent Challenges
Types of Gradient Descent: Batch, Stochastic, Mini-Batch

Gradient Descent: Data Science Concepts

A video overview of gradient descent. Video: ritvikmath

Introduction to Gradient Descent


Gradient descent is an optimization algorithm that’s used when training a machine learning
model. It’s based on a convex function and tweaks its parameters iteratively to minimize a
given function to its local minimum.
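As a concrete sketch of that idea (not from the article itself), here is gradient descent iteratively tweaking a single parameter to minimize a simple convex function; the function f(x) = (x - 3)^2, the learning rate and the step count are illustrative choices.

```python
# A minimal sketch of gradient descent minimizing the convex function
# f(x) = (x - 3)^2, whose derivative is f'(x) = 2 * (x - 3).
def gradient_descent(start, learning_rate=0.1, steps=100):
    x = start
    for _ in range(steps):
        grad = 2 * (x - 3)            # gradient of f at the current point
        x = x - learning_rate * grad  # move against the gradient
    return x

x_min = gradient_descent(start=0.0)
print(round(x_min, 4))  # converges toward the minimum at x = 3
```

Each iteration moves x a little in the direction that decreases f, which is the whole algorithm in miniature.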

What Is Gradient Descent in Machine Learning?

Gradient descent is an optimization algorithm for finding a local minimum of a differentiable function. To understand this concept fully, it’s important to know about gradients.

What Is a Gradient?

"A gradient measures how much the output of a function changes if


you change the inputs a little bit." — Lex Fridman (MIT)

A gradient simply measures the change in all weights with regard to the change in error. You can also think of a gradient as the slope of a function. The higher the gradient, the steeper the slope and the faster a model can learn. But if the slope is zero, the model stops learning. In mathematical terms, a gradient is the vector of a function’s partial derivatives with respect to its inputs.
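To make the definition concrete, a gradient can be estimated numerically by nudging each input a little bit and watching how the output changes, exactly as the quote describes. The function f below is an illustrative choice.

```python
# Estimating a gradient with central differences: how much does the output
# of f change when each input changes a little bit?
def numerical_gradient(f, x, eps=1e-6):
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += eps
        x_minus = list(x)
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad

# f(w) = w0^2 + 3*w1 has partial derivatives 2*w0 and 3.
f = lambda w: w[0] ** 2 + 3 * w[1]
print(numerical_gradient(f, [2.0, 5.0]))  # approximately [4.0, 3.0]
```

The result is one partial derivative per input, stacked into a vector.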


Image: Shutterstock

Imagine a blindfolded man who wants to climb to the top of a hill with the fewest steps along
the way as possible. He might start climbing the hill by taking really big steps in the steepest
direction, which he can do as long as he is not close to the top. As he comes closer to the top,
however, his steps will get smaller and smaller to avoid overshooting it. This process can be
described mathematically using the gradient.

Imagine the image below illustrates our hill from a top-down view and the red arrows are the
steps of our climber. Think of a gradient in this context as a vector that contains the direction
of the steepest step the blindfolded man can take and also how long that step should be.


Image: Niklas Donges

Note that the gradient vector running from X0 to X1 is much longer than the one running from X3 to X4. That’s because the steepness of the hill, which determines the length of the vector, is smaller near the top. This matches the hill example: the hill gets less steep the higher it’s climbed, so a reduced gradient goes along with a reduced slope and a reduced step size for the hill climber.

How Gradient Descent Works


Instead of climbing up a hill, think of gradient descent as hiking down to the bottom of a valley. This is a better analogy because gradient descent is a minimization algorithm that minimizes a given function.

The equation below describes what the gradient descent algorithm does:

b = a − γ∇f(a)

Here b is the next position of our climber, while a represents his current position. The minus sign refers to the minimization part of the algorithm, γ (gamma) is a weighting factor (the learning rate), and the gradient term ∇f(a) points in the direction of steepest ascent; subtracting it moves us in the direction of steepest descent.
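The update rule can be sketched directly in code; the toy function f(a) = a0^2 + a1^2, its hand-written gradient and the step count below are illustrative assumptions.

```python
# The update rule b = a - gamma * grad_f(a), applied repeatedly to the toy
# convex function f(a) = a0^2 + a1^2, whose gradient is (2*a0, 2*a1).
def grad_f(a):
    return [2 * a[0], 2 * a[1]]

def step(a, gamma=0.1):
    g = grad_f(a)
    return [a[i] - gamma * g[i] for i in range(len(a))]  # b = a - gamma * grad

a = [4.0, -2.0]  # current position of the climber
for _ in range(100):
    a = step(a)  # each result is the next position b
print([round(v, 4) for v in a])  # approaches the minimum at (0, 0)
```

Repeating the update walks the position downhill until the gradient, and with it the step size, shrinks toward zero.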

Imagine you have a machine learning problem and want to train your algorithm with gradient descent to minimize your cost function J(w, b) and reach its local minimum by tweaking its parameters (w and b). The image below shows the horizontal axes representing the parameters (w and b), while the cost function J(w, b) is represented on the vertical axis. The cost function shown here is convex.

Image: Niklas Donges

We know we want to find the values of w and b that correspond to the minimum of the cost
function (marked with the red arrow). To start finding the right values we initialize w and
b with some random numbers. Gradient descent then starts at that point (somewhere around
the top of our illustration), and it takes one step after another in the steepest downhill direction until it reaches the point where the cost function is as small as possible.

The Importance of the Learning Rate

How big the steps are that gradient descent takes toward the local minimum is determined by the learning rate, which figures out how fast or slow we will move toward the optimal weights.
For the gradient descent algorithm to reach the local minimum, we must set the learning rate to an appropriate value, which is neither too low nor too high. This is important because if the steps it takes are too big, it may never reach the local minimum as it bounces back and forth across the convex function (see the left image below). If we set the learning rate to a very small value, gradient descent will eventually reach the local minimum, but it may take a long time (see the right image).

Image: Niklas Donges

So the learning rate should never be too high or too low. You can check whether your learning rate is doing well by plotting the cost function on a graph as gradient descent runs.
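That check can be sketched in code by tracking the cost at every iteration; the toy cost f(x) = (x - 3)^2 and the two learning rates below are illustrative choices.

```python
# Tracking the cost per iteration is a quick check on the learning rate:
# a well-chosen rate drives the cost down, a too-high rate makes it blow up.
def cost_history(learning_rate, steps=50, start=10.0):
    x, history = start, []
    for _ in range(steps):
        history.append((x - 3) ** 2)       # cost f(x) = (x - 3)^2
        x -= learning_rate * 2 * (x - 3)   # one gradient descent step
    return history

good = cost_history(0.1)
bad = cost_history(1.1)    # steps too big: bounces across the bowl and diverges
print(good[-1] < good[0])  # cost decreased with the smaller rate
print(bad[-1] > bad[0])    # cost increased with the larger rate
```

Plotting these two histories would show exactly the two failure/success shapes the images describe.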

How to Solve Gradient Descent Challenges


Image: Niklas Donges

If the gradient descent algorithm is working properly, the cost function should decrease after
every iteration.

When gradient descent can’t decrease the cost function anymore and remains more or less on the same level, it has converged. The number of iterations gradient descent needs to converge can sometimes vary a lot. It can take 50 iterations, 60,000 or maybe even 3 million, making the number of iterations to convergence hard to estimate in advance.

There are some algorithms that can automatically tell you if gradient descent has converged,
but you must define a threshold for the convergence beforehand, which is also pretty hard to
estimate. For this reason, simple plots are the preferred convergence test.
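A minimal sketch of such a threshold-based test, using an illustrative toy cost; the epsilon of 1e-9 is an arbitrary pick, which is exactly the difficulty the text mentions.

```python
# A simple threshold-based convergence test: stop when the cost improves by
# less than a chosen epsilon between iterations.
def run_until_converged(learning_rate=0.1, epsilon=1e-9, max_steps=10_000):
    x, prev_cost = 10.0, float("inf")
    for step in range(max_steps):
        cost = (x - 3) ** 2               # toy cost f(x) = (x - 3)^2
        if prev_cost - cost < epsilon:
            return step                   # converged: cost barely moved
        prev_cost = cost
        x -= learning_rate * 2 * (x - 3)  # gradient descent step
    return max_steps

print(run_until_converged())  # iterations until the cost levels off
```

Set epsilon too large and it stops early; too small and it may never trigger, which is why a plot is often easier to trust.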

Another advantage of monitoring gradient descent via plots is that it allows us to easily spot when it isn’t working properly, for example if the cost function is increasing. Most of the time, the reason for an increasing cost function when using gradient descent is a learning rate that’s too high.

If the plot shows the learning curve just going up and down without really reaching a lower point, try decreasing the learning rate. Also, when starting out with gradient descent on a given problem, simply try a few learning rates spaced roughly logarithmically (for example 0.001, 0.01, 0.1) and see which one performs best.

Types of Gradient Descent

There are three popular types of gradient descent, which mainly differ in the amount of data they use.

Batch Gradient Descent


Batch gradient descent, also called vanilla gradient descent, calculates the error for each
example within the training dataset, but only after all training examples have been evaluated
does the model get updated. This whole process is like a cycle and it’s called a training epoch.

Batch gradient descent has some advantages: it is computationally efficient and produces a stable error gradient and stable convergence. A disadvantage is that the stable error gradient can sometimes converge to a state that isn’t the best the model can achieve. It also requires the entire training dataset to be in memory and available to the algorithm.

Stochastic Gradient Descent


By contrast, stochastic gradient descent (SGD) does this for each training example within the
dataset, meaning it updates the parameters for each training example one by one. Depending
on the problem, this can make SGD faster than batch gradient descent. One advantage is the
frequent updates allow us to have a pretty detailed rate of improvement.

The frequent updates, however, are more computationally expensive than the batch gradient
descent approach. Additionally, the frequency of those updates can result in noisy gradients,
which may cause the error rate to jump around instead of slowly decreasing.
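A minimal SGD sketch under illustrative assumptions (a tiny made-up linear dataset and a one-feature linear model, neither from the article):

```python
# Stochastic gradient descent: update w and b after every single
# training example, one by one.
import random

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # made-up (x, y) pairs

random.seed(0)
w, b, lr = 0.0, 0.0, 0.05
for _ in range(500):           # epochs
    random.shuffle(data)       # visit examples in a random order
    for x, y in data:
        err = w * x + b - y    # error on this one example only
        w -= lr * 2 * err * x  # immediate, per-example update
        b -= lr * 2 * err
print(round(w, 2), round(b, 2))  # noisy path, but heads toward w = 2, b = 1
```

Because every example triggers an update, the parameters move constantly, which gives the detailed-but-noisy progress described above.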

Mini-Batch Gradient Descent

Mini-batch gradient descent combines the concepts of SGD and batch gradient descent. It splits the training dataset into small batches and performs an update for each of those batches, creating a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent.

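Mini-batch gradient descent splits the training set into small batches and performs an update after each batch. A minimal sketch under illustrative assumptions (tiny made-up dataset, linear model, batch size of 2):

```python
# Mini-batch gradient descent: average the gradient over a small batch,
# then update, so each epoch produces several updates instead of one.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # made-up (x, y) pairs

w, b, lr, batch_size = 0.0, 0.0, 0.05, 2
for _ in range(1000):                          # epochs
    for i in range(0, len(data), batch_size):  # walk over the batches
        batch = data[i:i + batch_size]
        m = len(batch)
        # averaged partial derivatives of the squared error over this batch
        dw = (2 / m) * sum((w * x + b - y) * x for x, y in batch)
        db = (2 / m) * sum((w * x + b - y) for x, y in batch)
        w -= lr * dw
        b -= lr * db
print(round(w, 2), round(b, 2))  # approaches w = 2, b = 1
```

With a batch size of 1 this reduces to SGD, and with the batch size equal to the dataset it reduces to batch gradient descent, which is why mini-batch sits between the two.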