Linear Regression Model Using Gradient Descent
For this assignment, the objective is to implement a linear regression model of two variables, using the gradient descent algorithm to calculate the optimal parameters of the model. The implementation needs to be a basic one, written without any high-level libraries.
The first part is to implement gradient descent in Python. Below is the code for the gradient descent algorithm.
import numpy as np

def gradient_descent(self, X, y):
    num_samples, num_features = X.shape
    weights = np.zeros(num_features + 1)    # weights[0] is the bias term
    for i in range(self.iter):              # repeat for the chosen number of iterations
        for j in range(num_samples):        # update the weights one sample at a time
            prediction = weights[0]
            for k in range(num_features):
                prediction += weights[k + 1] * X[j, k]
            error = y[j] - prediction       # residual for this sample
            weights[0] += self.lr * error   # bias update
            for k in range(num_features):
                weights[k + 1] += self.lr * error * X[j, k]
    return weights
Here the method gradient_descent receives the parameters X (the input variables) and y (the target variable), while the learning rate lr and the number of iterations iter are read from the model instance. A learning rate of 0.001 and 1000 iterations are used as reasonably standard defaults.
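To make the per-sample update rule explicit, here is a small illustrative sketch (all numbers are made up) of what one pass of the inner loop does: the bias is nudged by lr * error and each weight by lr * error times its feature value.

import numpy as np

# One hypothetical per-sample update, mirroring the inner loop above.
lr = 0.001                              # assumed learning rate
weights = np.array([0.5, 1.0, -2.0])    # [bias, w1, w2], made-up starting values
x = np.array([3.0, 4.0])                # one sample with two features
y = 10.0                                # its target

prediction = weights[0] + weights[1] * x[0] + weights[2] * x[1]   # 0.5 + 3.0 - 8.0 = -4.5
error = y - prediction                                            # 10.0 - (-4.5) = 14.5

weights[0] += lr * error                # bias update
weights[1:] += lr * error * x           # per-feature updates
print(weights)                          # -> approximately [0.5145, 1.0435, -1.942]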
The next part is implementing the linear regression model from scratch using this gradient descent function. Below is the code for the model.
class LinRegGD:
    def __init__(self):
        self.lr = 0.001     # learning rate
        self.iter = 1000    # number of passes over the training data

    def train_model(self, X, y):
        # Fit the weights with gradient descent and keep them on the instance.
        self.weights = self.gradient_descent(X, y)
        return self

    def gradient_descent(self, X, y):
        num_samples, num_features = X.shape
        weights = np.zeros(num_features + 1)    # weights[0] is the bias term
        for i in range(self.iter):
            for j in range(num_samples):        # per-sample updates
                prediction = weights[0]
                for k in range(num_features):
                    prediction += weights[k + 1] * X[j, k]
                error = y[j] - prediction
                weights[0] += self.lr * error
                for k in range(num_features):
                    weights[k + 1] += self.lr * error * X[j, k]
        return weights

    def make_predictions(self, X):
        num_samples, num_features = X.shape
        predictions = []
        for j in range(num_samples):
            prediction = self.weights[0]        # start from the bias
            for k in range(num_features):
                prediction += self.weights[k + 1] * X[j, k]
            predictions.append(prediction)
        return np.array(predictions)
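As a quick sanity check, the class can be exercised on a tiny synthetic problem before touching the real data; the sketch below uses made-up inputs generated from y = 3 + 2*x1 - x2.

import numpy as np

# Tiny made-up two-feature problem (no noise), purely for illustration.
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_toy = 3 + 2 * X_toy[:, 0] - X_toy[:, 1]

model = LinRegGD()
model.train_model(X_toy, y_toy)
print(model.weights)                  # should move toward [3, 2, -1]
print(model.make_predictions(X_toy))  # should roughly track y_toy

With only 1000 iterations and a learning rate of 0.001 the weights will not have fully converged on this toy data; more iterations or a larger learning rate would tighten the fit.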
For the data set, load_diabetes from sklearn.datasets is used. Below is the code to load the data and split it into train, dev, and test sets.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

data = load_diabetes()
X, y = data.data, data.target
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3)
X_dev, X_test, y_dev, y_test = train_test_split(X_temp, y_temp, test_size=0.5)
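The diabetes data has 442 samples and 10 features, so the 70/30 split followed by a 50/50 split gives roughly 70% / 15% / 15% for train, dev, and test. A quick shape check (purely for verification) is:

# Sanity check on the split sizes; exact counts depend on rounding.
print(X_train.shape, X_dev.shape, X_test.shape)

Note that no random_state is passed to train_test_split, so the splits (and hence the MSE values below) change from run to run; passing a fixed random_state would make the results reproducible.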
Of the three approaches, I have chosen the one that evaluates all pairwise combinations of the 10 features, i.e. C(10, 2) = 45 pairs. For each pair, I train the model and record the MSE for that set of features. Once I have all the MSEs, I identify the best pair as the one with the minimal MSE among the 45.
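The selection loop below relies on a calculate_mse helper that is not shown elsewhere in this report; a minimal sketch of what it is assumed to compute (the plain mean of squared errors) is:

import numpy as np

def calculate_mse(y_true, y_pred):
    # Mean squared error: average squared difference between targets and predictions.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)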
from itertools import combinations

feature_names = data.feature_names   # the 10 diabetes feature names
mse_scores = {}

for feature_pair in combinations(range(len(feature_names)), 2):
    model = LinRegGD()
    model.train_model(X_train[:, feature_pair], y_train)
    predictions = model.make_predictions(X_test[:, feature_pair])
    mse = calculate_mse(y_test, predictions)
    mse_scores[feature_pair] = mse
    feature_pair_names = ', '.join(feature_names[i] for i in feature_pair)
    print(f'MSE for features {feature_pair_names}: {mse}')
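Once the loop has filled mse_scores, the best pair described above can be read off as the key with the smallest MSE; a minimal sketch of that final step:

# Pick the feature pair with the smallest MSE among the 45 candidates.
best_pair = min(mse_scores, key=mse_scores.get)
best_names = ', '.join(feature_names[i] for i in best_pair)
print(f'Best feature pair: {best_names} (MSE = {mse_scores[best_pair]:.2f})')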