01 Univariate Linear Regression
Outline
1 - Packages
2 - Problem Statement
3 - Dataset
4 - Refresher on linear regression
5 - Compute Cost
Exercise 1
6 - Gradient descent
Exercise 2
7 - Learning parameters using batch gradient descent
Exercise 3
1 - Packages
First, let's run the cell below to import all the packages that you will need during this assignment.
- numpy is the fundamental package for working with matrices in Python.
- matplotlib is a popular library for plotting graphs in Python.
- utils.py contains helper functions for this assignment. You do not need to modify the code in this file.
In [1]: import numpy as np
import matplotlib.pyplot as plt
from utils import *
import copy
import math
%matplotlib inline
2 - Problem Statement
Suppose you are the CEO of a restaurant franchise and are considering different
cities for opening a new outlet.
You would like to expand your business to cities that may give your restaurant
higher profits.
The chain already has restaurants in various cities and you have data for profits
and populations from the cities.
You also have data on cities that are candidates for a new restaurant.
For these cities, you have the city population.
You will use the data to help you identify which cities may potentially give your
business higher profits.
3 - Dataset
You will start by loading the dataset for this task.
The load_data() function shown below loads the data into variables
x_train and y_train
x_train is the population of a city
y_train is the profit of a restaurant in that city. A negative value for profit
indicates a loss.
Both x_train and y_train are numpy arrays.
In [2]: # load the dataset
x_train, y_train = load_data()
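The cell numbering jumps from In [2] to In [4], so the cell that printed x_train appears to be missing from this export. A minimal reconstruction, mirroring the y_train cell below:
In [3]: # print x_train (reconstructed cell; the original is missing from this export)
print("Type of x_train:", type(x_train))
print("First five elements of x_train are:\n", x_train[:5])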
x_train is a numpy array that contains decimal values that are all greater than
zero.
These values represent the city population times 10,000
For example, 6.1101 means that the population for that city is 61,101
Now, let's print y_train
In [4]: # print y_train
print("Type of y_train:",type(y_train))
print("First five elements of y_train are:\n", y_train[:5])
Similarly, y_train is a numpy array that has decimal values, some negative, some
positive.
These represent your restaurant's average monthly profits in each city, in units of
$10,000.
For example, 17.592 represents $175,920 in average monthly profits for that
city.
-2.6807 represents an average monthly loss of $26,807 for that city.
Check the dimensions of your variables
Another useful way to get familiar with your data is to view its dimensions.
Print the shape of x_train and y_train and see how many training examples
you have in your dataset.
In [5]: print ('The shape of x_train is:', x_train.shape)
print ('The shape of y_train is: ', y_train.shape)
print ('Number of training examples (m):', len(x_train))
The city population array has 97 data points, and the average monthly profit array also has 97 data points. These are NumPy 1D arrays.
Visualize your data
It is often useful to understand the data by visualizing it.
For this dataset, you can use a scatter plot to visualize the data, since it has only
two properties to plot (profit and population).
Many other problems that you will encounter in real life have more than two properties (for example, population, average household income, monthly profits, monthly sales). When you have more than two properties, you can still use a scatter plot to see the relationship between each pair of properties (see the sketch after the plot below).
In [6]: # Create a scatter plot of the data. To change the markers to red "x",
# we used the 'marker' and 'c' parameters
plt.scatter(x_train, y_train, marker='x', c='r')
plt.ylabel('Profit in $10,000')
plt.xlabel('Population of City in 10,000s')
plt.show()
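When a dataset has more than two properties, you can loop over the feature pairs to build the pairwise scatter plots mentioned above. A minimal sketch using hypothetical random data and made-up property names (not this assignment's dataset):
In [ ]: # Pairwise scatter plots for a hypothetical 3-property dataset (illustration only)
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))              # 100 examples, 3 made-up properties
names = ['population', 'income', 'profit']    # hypothetical property names
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, (i, j) in zip(axes, [(0, 1), (0, 2), (1, 2)]):
    ax.scatter(data[:, i], data[:, j], marker='x', c='r')
    ax.set_xlabel(names[i])
    ax.set_ylabel(names[j])
plt.show()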
4 - Refresher on linear regression
The model function for linear regression, which is a function that maps from x (city population) to y (your restaurant's monthly profit for that city), is represented as

$$f_{w,b}(x) = wx + b$$

To train a linear regression model, you want to find the best (w, b) parameters that fit your dataset.
- To compare how one choice of (w, b) is better or worse than another choice, you can evaluate it with a cost function $J(w,b)$. $J$ is a function of (w, b). That is, the value of the cost $J(w,b)$ depends on the value of (w, b).
- The choice of (w, b) that fits your data the best is the one that has the smallest cost $J(w,b)$.

To find the parameter values (w, b) that achieve the smallest possible cost $J(w,b)$, you can use a method called gradient descent. With each step of gradient descent, your parameters (w, b) come closer to the optimal values that will achieve the lowest cost $J(w,b)$.

The trained linear regression model can then take the input feature x (city population) and output a prediction $f_{w,b}(x)$ (predicted monthly profit for a restaurant in that city).
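Since the model is a single line, you can evaluate it for every city at once with NumPy broadcasting. A minimal sketch, where w_example and b_example are arbitrary illustrative values, not trained parameters:
In [ ]: # Evaluate f_wb(x) = w*x + b over the whole dataset (illustrative w, b only)
w_example, b_example = 1.0, 2.0
f_wb = w_example * x_train + b_example   # broadcasts the scalar parameters over the array
print("First five predictions:", f_wb[:5])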
5 - Compute Cost
Gradient descent involves repeated steps to adjust the value of your parameters (w, b) to gradually get a smaller and smaller cost $J(w,b)$.
At each step of gradient descent, it will be helpful for you to monitor your progress by computing the cost $J(w,b)$ as (w, b) gets updated.
In this section, you will implement a function to calculate $J(w,b)$ so that you can check the progress of your gradient descent implementation.
For linear regression with one variable, the cost function $J(w,b)$ is defined as

$$J(w,b) = \frac{1}{2m} \sum_{i=0}^{m-1} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2$$

where the model's prediction for example $i$ is

$$f_{w,b}(x^{(i)}) = wx^{(i)} + b$$
Implementation
Complete the compute_cost() function below to compute the cost $J(w,b)$.
Exercise 1
Complete compute_cost below to:
- Iterate over the training examples, and for each example, compute:
  - The prediction of the model for that example: $f_{w,b}(x^{(i)}) = wx^{(i)} + b$
  - The cost for that example: $cost^{(i)} = (f_{w,b}(x^{(i)}) - y^{(i)})^2$
- Return the total cost over all examples: $J(w,b) = \frac{1}{2m} \sum_{i=0}^{m-1} cost^{(i)}$

Here, $m$ is the number of training examples and $\sum$ is the summation operator.
In [7]: # GRADED FUNCTION: compute_cost
def compute_cost(x, y, w, b):
    """
    Computes the cost function for linear regression.
    Args:
        x (ndarray): Shape (m,) Input to the model (Population of cities)
        y (ndarray): Shape (m,) Label (Actual profits for the cities)
        w, b (scalar): Parameters of the model
    Returns
        total_cost (float): The cost of using w, b as the parameters for linear regression
                            to fit the data points in x and y
    """
    # number of training examples
    m = x.shape[0]
    total_cost = 0
    for i in range(m):
        f_wb = w * x[i] + b           # prediction
        cost = (f_wb - y[i]) ** 2     # squared error
        total_cost += cost
    total_cost = total_cost / (2 * m) # average over dataset
    return total_cost
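As an optional sanity check, the same cost can be computed in a single vectorized expression. The helper below is our addition, not part of the graded assignment; it should return the same value as the loop-based version:
In [ ]: # Optional vectorized cross-check (a sketch, not the graded implementation)
def compute_cost_vectorized(x, y, w, b):
    errors = w * x + b - y                     # prediction error for every example
    return np.sum(errors ** 2) / (2 * len(x))  # sum of squared errors, averaged and halved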
You can check if your implementation was correct by running the following test code:
In [8]: # Compute cost with some initial values for parameters w, b
initial_w = 2
initial_b = 1
cost = compute_cost(x_train, y_train, initial_w, initial_b)
print(type(cost))
print(f'Cost at initial w: {cost:.3f}')
# Public tests
from public_tests import *
compute_cost_test(compute_cost)
<class 'numpy.float64'>
Cost at initial w: 75.203
All tests passed!
Expected Output:
Cost at initial w: 75.203
6 - Gradient descent
In this section, you will implement the gradient of the cost with respect to the parameters w, b for linear regression.
As described in the lecture videos, the gradient descent algorithm is:

$$\text{repeat until convergence: } \lbrace$$
$$b := b - \alpha \frac{\partial J(w,b)}{\partial b}$$
$$w := w - \alpha \frac{\partial J(w,b)}{\partial w} \tag{1}$$
$$\rbrace$$

where the parameters w, b are updated simultaneously and where

$$\frac{\partial J(w,b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) \tag{2}$$
$$\frac{\partial J(w,b)}{\partial w} = \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)} \tag{3}$$

Here, $m$ is the number of training examples and $\sum$ is the summation operator.

You will implement a function called compute_gradient which calculates $\frac{\partial J(w,b)}{\partial w}$ and $\frac{\partial J(w,b)}{\partial b}$.
Exercise 2
Complete the compute_gradient function to:
- Iterate over the training examples, and for each example, compute:
  - The prediction of the model for that example: $f_{w,b}(x^{(i)}) = wx^{(i)} + b$
  - The gradient of the cost from that example:
$$\frac{\partial J(w,b)}{\partial b}^{(i)} = f_{w,b}(x^{(i)}) - y^{(i)}$$
$$\frac{\partial J(w,b)}{\partial w}^{(i)} = \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)}$$
- Return the total gradient, averaged over all the examples:
$$\frac{\partial J(w,b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \frac{\partial J(w,b)}{\partial b}^{(i)}$$
$$\frac{\partial J(w,b)}{\partial w} = \frac{1}{m} \sum_{i=0}^{m-1} \frac{\partial J(w,b)}{\partial w}^{(i)}$$

Here, $m$ is the number of training examples and $\sum$ is the summation operator.
In [9]: # GRADED FUNCTION: compute_gradient
def compute_gradient(x, y, w, b):
    """
    Computes the gradient for linear regression
    Args:
        x (ndarray): Shape (m,) Input to the model (Population of cities)
        y (ndarray): Shape (m,) Label (Actual profits for the cities)
        w, b (scalar): Parameters of the model
    Returns
        dj_dw (scalar): The gradient of the cost w.r.t. the parameter w
        dj_db (scalar): The gradient of the cost w.r.t. the parameter b
    """
    # number of training examples
    m = x.shape[0]
    dj_dw = 0
    dj_db = 0
    for i in range(m):
        f_wb = w * x[i] + b            # prediction for example i
        dj_dw += (f_wb - y[i]) * x[i]  # per-example gradient w.r.t. w
        dj_db += f_wb - y[i]           # per-example gradient w.r.t. b
    dj_dw /= m
    dj_db /= m
    return dj_dw, dj_db
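As with the cost, an optional vectorized cross-check can confirm your loop produces the same gradients. This helper is a sketch of our own, not the graded implementation:
In [ ]: # Optional vectorized cross-check (a sketch, not the graded implementation)
def compute_gradient_vectorized(x, y, w, b):
    errors = w * x + b - y       # prediction errors for all examples
    dj_dw = np.mean(errors * x)  # (1/m) * sum(error_i * x_i)
    dj_db = np.mean(errors)      # (1/m) * sum(error_i)
    return dj_dw, dj_db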
Run the cell below to check your implementation of compute_gradient with the parameters initialized to zeros.
In [10]: # Compute and display gradient with w and b initialized to zeros
initial_w = 0
initial_b = 0
tmp_dj_dw, tmp_dj_db = compute_gradient(x_train, y_train, initial_w, initial_b)
print('Gradient at initial w, b (zeros):', tmp_dj_dw, tmp_dj_db)
compute_gradient_test(compute_gradient)
Expected Output:
Gradient at initial w, b (zeros): -65.32884975 -5.83913505154639
In [11]: # Compute and display cost and gradient with non-zero w and b
test_w = 0.2
test_b = 0.2
tmp_dj_dw, tmp_dj_db = compute_gradient(x_train, y_train, test_w, test_b)
print('Gradient at test w, b:', tmp_dj_dw, tmp_dj_db)
Expected Output:
Gradient at test w, b: -47.41610118 -4.007175051546391
7 - Learning parameters using batch gradient descent
You will now find the optimal parameters of a linear regression model by using batch gradient descent, where "batch" means each step uses all the training examples.
Assuming you have implemented the gradient and computed the cost correctly, and you have an appropriate value for the learning rate alpha, $J(w,b)$ should never increase and should converge to a steady value by the end of the algorithm.
Exercise 3
Complete the gradient_descent function below to repeatedly apply the parameter updates

$$w = w - \alpha \frac{\partial J(w,b)}{\partial w}$$
$$b = b - \alpha \frac{\partial J(w,b)}{\partial b}$$

for num_iters iterations, using the cost and gradient functions you implemented above.
In [12]: # GRADED FUNCTION: gradient_descent
def gradient_descent(x, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
    """
    Performs gradient descent to fit w, b. Updates w, b by taking
    num_iters gradient steps with learning rate alpha.
    Args:
        x : (ndarray): Shape (m,)
        y : (ndarray): Shape (m,)
        w_in, b_in : (scalar) Initial values of parameters of the model
        cost_function: function to compute cost
        gradient_function: function to compute the gradient
        alpha : (float) Learning rate
        num_iters : (int) number of iterations to run gradient descent
    Returns:
        w (scalar): Updated value of parameter after running gradient descent
        b (scalar): Updated value of parameter after running gradient descent
        J_history (list): History of cost values
        p_history (list): History of parameters [w, b]
    """
    w, b = w_in, b_in
    J_history = []
    p_history = []
    for i in range(num_iters):
        # compute gradients at current parameters
        dj_dw, dj_db = gradient_function(x, y, w, b)
        # parameter update
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        # record cost and parameters at each iteration for later inspection
        J_history.append(cost_function(x, y, w, b))
        p_history.append([w, b])
    return w, b, J_history, p_history
Now let's run the gradient descent algorithm above to learn the parameters for our
dataset.
In [13]: # initialize fitting parameters. Recall that the shape of w is (n,)
initial_w = 0.
initial_b = 0.
# some gradient descent settings
iterations = 1500
alpha = 0.01
w, b, J_history, p_history = gradient_descent(x_train, y_train, initial_w, initial_b,
                                               compute_cost, compute_gradient, alpha, iterations)
print("w, b found by gradient descent:", w, b)
Expected Output:
w, b found by gradient descent: 1.16636235 -3.63029143940436
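To verify the convergence behavior described above, you can plot the recorded cost against the iteration number; the cost should decrease steadily and flatten out. A minimal sketch, assuming J_history was captured as in the cell above:
In [ ]: # Plot cost versus iteration (assumes J_history from the previous cell)
plt.plot(J_history)
plt.ylabel('Cost J(w,b)')
plt.xlabel('Iteration')
plt.show()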
We will now use the final parameters from gradient descent to plot the linear fit.
Recall that we can get the prediction for a single example as $f(x^{(i)}) = wx^{(i)} + b$.
To calculate the predictions on the entire dataset, we can loop through all the training
examples and calculate the prediction for each example. This is shown in the code
block below.
In [14]: m = x_train.shape[0]
predicted = np.zeros(m)
for i in range(m):
predicted[i] = w * x_train[i] + b
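Since NumPy broadcasts scalar parameters over arrays, the loop above can also be replaced by a single vectorized expression; a one-line equivalent sketch:
In [ ]: # Vectorized equivalent of the prediction loop above (a sketch)
predicted = w * x_train + b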
We will now plot the predicted values to see the linear fit.
In [15]: # Plot the linear fit over the original data points
plt.plot(x_train, predicted, c="b")
plt.scatter(x_train, y_train, marker='x', c='r')
plt.show()
Your final values of w, b can also be used to make predictions on profits. Let's predict what the profit would be in areas with populations of 35,000 and 70,000 people. Remember that the population values are stored in units of 10,000 and the profit values in units of $10,000.
In [16]: predict1 = 3.5 * w + b
print('For population = 35,000, we predict a profit of $%.2f' % (predict1*10000))
predict2 = 7.0 * w + b
print('For population = 70,000, we predict a profit of $%.2f' % (predict2*10000))
Expected Output:
For population = 35,000, we predict a profit of $4519.77
For population = 70,000, we predict a profit of $45342.45
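Because both the inputs and outputs are scaled by 10,000, it can be convenient to wrap the unit conversions in a small helper. A minimal sketch; the name predict_profit is illustrative and not part of the assignment:
In [ ]: # Hypothetical helper (illustration only): predict profit in raw dollars
def predict_profit(population, w, b):
    x = population / 10000       # convert raw population to the dataset's units
    return (w * x + b) * 10000   # convert the model output back to dollars

print('For population = 35,000, we predict a profit of $%.2f' % predict_profit(35000, w, b))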