22.08.2025
Statistical Methods in AI (CS7.403)
Lecture-6: Regression
Ravi Kiran S (CVIT)
Sai Kiran B (SPCRC)
IIIT Hyderabad
Figure: taxonomy of learning paradigms – Supervised Learning (Classification, Regression) and Reinforcement Learning.
Regression model
• Regression model
– Explanatory variables: independent variables
– Variables to be explained: dependent variables
Examples
• Independent variable: Price of crude oil
• Dependent variable: Retail price of petrol
• Independent variables: hours of work, education, occupation, sex, age, years of experience etc.
• Dependent variable: Employment income
• Independent variables: Area of house, Population Density
• Dependent variable: Rent or Price of house
• Price of a product and quantity produced or sold:
– Quantity sold affected by price: dependent variable is quantity of product sold; independent variable is price.
– Price affected by quantity offered for sale: dependent variable is price; independent variable is quantity sold.
Figure: Crude Oil price index (1997=100, left axis) and regular gasoline prices, Regina (cents per litre, right axis), monthly, 1981–2008.
Source: CANSIM II Database (Vectors v1576530 and v735048 respectively)
Linear Regression Model
1. Relationship between variables is a linear function

   Yᵢ = β₀ + β₁Xᵢ + εᵢ

   β₀ = Y-intercept, β₁ = slope, εᵢ = random error
   Yᵢ = dependent (response) variable (e.g. salary)
   Xᵢ = independent (explanatory) variable (e.g. years of experience)
Linear Regression Model
   Observed value: Yᵢ = β₀ + β₁Xᵢ + εᵢ
   Regression line: E(Y) = β₀ + β₁Xᵢ
   εᵢ = error (vertical distance of the observed value from the line)

   Figure: scatter of observed values around the line E(Y) = β₀ + β₁Xᵢ in the X–Y plane.
Estimating Parameters: Least Squares Method
Least Squares
1. 'Best fit' means the differences between the actual Y values and the predicted Y values are a minimum. But positive differences offset negative ones, so square the errors!

   ∑ᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)² = ∑ᵢ₌₁ⁿ ε̂ᵢ²

2. LS minimizes the sum of the squared differences (errors), i.e. the SSE.

   Figure: observed points Yᵢ = β₀ + β₁Xᵢ + εᵢ scattered around the line E(Y) = β₀ + β₁Xᵢ, with residuals ε̂₁, ε̂₂, ε̂₃, ε̂₄ marked.
Least Squares Graphically
   LS minimizes ∑ᵢ₌₁ⁿ ε̂ᵢ² = ε̂₁² + ε̂₂² + ε̂₃² + ε̂₄²

   e.g. for the second point: Y₂ = β̂₀ + β̂₁X₂ + ε̂₂

   Figure: the four residuals ε̂₁, ε̂₂, ε̂₃, ε̂₄ shown as vertical distances from the data points to the fitted line, plotted against X.
Derivation of Parameters (1)
Least Squares (L-S): minimize the squared error

   ∑ᵢ₌₁ⁿ εᵢ² = ∑ᵢ₌₁ⁿ (yᵢ − β₀ − β₁xᵢ)²

Setting the partial derivative with respect to β₀ to zero:

   ∂/∂β₀ ∑ εᵢ² = ∂/∂β₀ ∑ (yᵢ − β₀ − β₁xᵢ)² = 0
   −2(nȳ − nβ₀ − nβ₁x̄) = 0
   β̂₀ = ȳ − β̂₁x̄
Derivation of Parameters (2)
Least Squares (L-S): minimize the squared error. Setting the partial derivative with respect to β₁ to zero:

   ∂/∂β₁ ∑ εᵢ² = ∂/∂β₁ ∑ (yᵢ − β₀ − β₁xᵢ)² = 0
   −2 ∑ xᵢ(yᵢ − β₀ − β₁xᵢ) = 0
   −2 ∑ xᵢ(yᵢ − ȳ + β₁x̄ − β₁xᵢ) = 0        (substituting β₀ = ȳ − β₁x̄)
   β₁ ∑ xᵢ(xᵢ − x̄) = ∑ xᵢ(yᵢ − ȳ)
   β₁ ∑ (xᵢ − x̄)(xᵢ − x̄) = ∑ (xᵢ − x̄)(yᵢ − ȳ)
   β̂₁ = SS_xy / SS_xx
Coefficient Equations
Prediction equation:
   ŷᵢ = β̂₀ + β̂₁xᵢ
Sample slope:
   β̂₁ = SS_xy / SS_xx = ∑(xᵢ − x̄)(yᵢ − ȳ) / ∑(xᵢ − x̄)²
Sample Y-intercept:
   β̂₀ = ȳ − β̂₁x̄
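A minimal NumPy sketch of these coefficient equations (the data below are made-up illustrative values, not from the lecture):

```python
import numpy as np

# Illustrative data: years of experience (x) vs. salary (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.1, 4.0, 5.2, 5.9, 7.1, 7.8])

x_bar, y_bar = x.mean(), y.mean()

# SS_xy = sum (x_i - x_bar)(y_i - y_bar),  SS_xx = sum (x_i - x_bar)^2
ss_xy = np.sum((x - x_bar) * (y - y_bar))
ss_xx = np.sum((x - x_bar) ** 2)

beta1_hat = ss_xy / ss_xx               # sample slope
beta0_hat = y_bar - beta1_hat * x_bar   # sample Y-intercept

y_pred = beta0_hat + beta1_hat * x      # prediction equation
print(beta0_hat, beta1_hat)
```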
Regression – Error measures
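The figure for this slide is not reproduced in the text. As a hedged sketch, the standard error measures built on the SSE defined earlier can be computed as follows (the function name is my own):

```python
import numpy as np

def regression_errors(y_true, y_pred):
    """Common error measures built on the residuals (a sketch; the
    lecture's exact list of measures is not reproduced in the text)."""
    residuals = y_true - y_pred
    sse = np.sum(residuals ** 2)   # sum of squared errors (SSE)
    mse = sse / len(y_true)        # mean squared error
    rmse = np.sqrt(mse)            # root mean squared error
    return {"SSE": sse, "MSE": mse, "RMSE": rmse}
```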
Linear Regression – Matrix Form
Geometric Interpretation
• Ŷ = Xβ̂ lies in the subspace spanned by the columns of X, i.e. in C(X)
• r = Y − Ŷ = Y − Xβ̂ is orthogonal to C(X), i.e. Xᵀr = 0
• Colab Notebook: https://colab.research.google.com/drive/193OYEJ_-wh_p9Mv8idRruV3ZIbb6EkiP?usp=sharing
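A short NumPy sketch (separate from the linked Colab notebook) of the matrix form: solve the normal equations XᵀXβ̂ = Xᵀy on synthetic data and verify the orthogonality condition Xᵀr = 0:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept column + features
beta_true = np.array([2.0, 1.0, -0.5, 0.3])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Normal equations: (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
# Equivalent and numerically safer: beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat                          # lies in the column space C(X)
r = y - y_hat                                 # residual
print(np.allclose(X.T @ r, 0, atol=1e-8))     # orthogonality: X^T r ~ 0
```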
Linear Regression – Matrix Form – Issues
• N samples, p-dimensional features (what if p > N?)
• Complexity of matrix inversion (what if N is very large?)
Gradient Descent
1. Initialize the parameters β to some random values.
2. Update the parameters using the gradient descent rule:
   β(t+1) = β(t) − η ∇β L(β(t))
3. Repeat 2 until |∇β L(β(t))| is close to 0.
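A minimal sketch of this update rule for a squared-error loss; the 1/n scaling, step size η, and stopping tolerance are implementation assumptions, not from the slides:

```python
import numpy as np

def gradient_descent(X, y, eta=0.1, tol=1e-6, max_iters=10_000):
    """Gradient descent for L(beta) = ||y - X beta||^2 / n.
    The 1/n scaling is a choice that keeps the step size eta stable."""
    n = len(y)
    beta = np.zeros(X.shape[1])                  # 1. initialize (zeros here; random also works)
    for _ in range(max_iters):
        grad = -2.0 / n * X.T @ (y - X @ beta)   # gradient of L at beta(t)
        beta = beta - eta * grad                 # 2. beta(t+1) = beta(t) - eta * grad
        if np.linalg.norm(grad) < tol:           # 3. repeat until |grad| is close to 0
            break
    return beta
```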
Linear Regression
• Linear regression is linear in the coefficients, NOT in the variables (see the sketch below).
Careful: X may not be causing y!
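For instance, a quadratic model y = β₀ + β₁x + β₂x² is non-linear in x but still linear in the coefficients, so ordinary least squares applies directly (illustrative data):

```python
import numpy as np

x = np.linspace(-3, 3, 50)
y = 1.0 + 2.0 * x - 0.5 * x**2 + np.random.default_rng(1).normal(scale=0.3, size=x.size)

# The design matrix [1, x, x^2] makes the model linear in the coefficients.
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # approximately [1.0, 2.0, -0.5]
```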
Linear Regression – Outliers
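The slide's figure is not reproduced here; as a small hedged illustration of the issue, a single outlier can noticeably pull the least-squares line (made-up data):

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0                       # points exactly on a line
y_out = y.copy()
y_out[-1] += 30.0                       # one large outlier

def fit_line(x, y):
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

print(fit_line(x, y))      # ~ [1.0, 2.0]
print(fit_line(x, y_out))  # intercept and slope both shift because of the outlier
```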
Linear Regression is problematic in many other cases
Piecewise Linear Regression
See also: Multivariate Adaptive Regression Splines (MARS)
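One simple way to obtain a piecewise linear fit, in the spirit of MARS hinge basis functions, is to add features max(0, x − knot) and run ordinary least squares; the knot location below is an arbitrary assumption:

```python
import numpy as np

def hinge_design(x, knots):
    """Design matrix [1, x, max(0, x - k) for each knot] for a piecewise linear fit."""
    cols = [np.ones_like(x), x] + [np.maximum(0.0, x - k) for k in knots]
    return np.column_stack(cols)

x = np.linspace(0, 10, 200)
y = np.where(x < 4, 1.0 + 0.5 * x, 3.0 - 1.0 * (x - 4))   # piecewise linear ground truth
y = y + np.random.default_rng(2).normal(scale=0.2, size=x.size)

knots = [4.0]                                    # assumed knot location
X = hinge_design(x, knots)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat                             # piecewise linear prediction
```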
Bivariate and multivariate models
• Bivariate or simple regression model:
   (Education) x → y (Income)
• Multivariate or multiple regression model:
   (Education) x₁, (Gender) x₂, (Experience) x₃, (Age) x₄ → y (Income)
• Model with a simultaneous relationship:
   Price of wheat ⇄ Quantity of wheat produced
Types of Regression Models
• 1 explanatory variable → Simple regression (Linear or Non-Linear)
• 2+ explanatory variables → Multiple regression (Linear or Non-Linear)
Test Time
Overfitting
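A hedged sketch of the overfitting phenomenon referred to above: as the polynomial degree grows, training error keeps shrinking while test-time error eventually rises (data, noise level, and degrees are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
x_train, x_test = rng.uniform(-1, 1, 20), rng.uniform(-1, 1, 200)
f = lambda x: np.sin(3 * x)
y_train = f(x_train) + rng.normal(scale=0.2, size=x_train.size)
y_test = f(x_test) + rng.normal(scale=0.2, size=x_test.size)

for degree in [1, 3, 9, 15]:
    coeffs = np.polyfit(x_train, y_train, degree)               # least-squares polynomial fit
    mse_train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    mse_test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, round(mse_train, 3), round(mse_test, 3))
# Training MSE keeps decreasing with degree; test MSE typically rises at high degree (overfitting).
```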