
22.08.2025

Statistical Methods in AI (CS7.403)


Lecture-6: Regression

Ravi Kiran S (CVIT)


Sai Kiran B (SPCRC)

IIIT Hyderabad
[Diagram: learning paradigms. Supervised Learning branches into Classification and Regression; Reinforcement Learning is shown alongside.]
Regression model

• Regression model
– Explanatory variables: independent variables
– Variables to be explained: dependent variables
Examples
• Independent variable: Price of crude oil
• Dependent variable: Retail price of petrol

• Independent variables: hours of work, education, occupation, sex, age, years of experience etc.
• Dependent variable: Employment income

• Independent variables: Area of house, Population Density


• Dependent variable: Rent or Price of house

• Price of a product and quantity produced or sold:


– Quantity sold affected by price. Dependent variable is quantity of product sold – independent variable is price.
– Price affected by quantity offered for sale. Dependent variable is price – independent variable is quantity sold.
[Figure: monthly time series, 1981M01 to 2008M01. Crude oil price index (1997 = 100, left axis) and regular gasoline prices in Regina (cents per litre, right axis).
Source: CANSIM II Database (vectors v1576530 and v735048 respectively)]


Linear Regression Model

1. The relationship between the variables is a linear function:

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

– $Y_i$: dependent (response) variable (e.g. salary)
– $X_i$: independent (explanatory) variable (e.g. years of experience)
– $\beta_0$: Y-intercept
– $\beta_1$: slope
– $\varepsilon_i$: random error
Linear Regression Model

Each observed value $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ deviates from the regression line $E[Y] = \beta_0 + \beta_1 X_i$ by the random error $\varepsilon_i$.
Estimating Parameters: Least Squares Method

1. 'Best fit' means the differences between the actual Y values and the predicted Y values are a minimum. But positive differences offset negative ones, so we square the errors:

$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$

where $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ and $E[Y] = \beta_0 + \beta_1 X_i$.

2. Least Squares (LS) minimizes this sum of squared differences (errors), the SSE.
Least Squares Graphically

LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2$

e.g. for the second point, $Y_2 = \beta_0 + \beta_1 X_2 + \varepsilon_2$.
Derivation of Parameters (1)

Least Squares (L-S): minimize the squared error

$\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$

Setting the derivative with respect to $\beta_0$ to zero:

$0 = \frac{\partial}{\partial \beta_0} \sum \varepsilon_i^2 = \frac{\partial}{\partial \beta_0} \sum (y_i - \beta_0 - \beta_1 x_i)^2 = -2\,(n\bar{y} - n\beta_0 - n\beta_1\bar{x})$

$\Rightarrow \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
Derivation of Parameters (2)

Setting the derivative with respect to $\beta_1$ to zero:

$0 = \frac{\partial}{\partial \beta_1} \sum \varepsilon_i^2 = \frac{\partial}{\partial \beta_1} \sum (y_i - \beta_0 - \beta_1 x_i)^2 = \sum -2 x_i (y_i - \beta_0 - \beta_1 x_i)$

Substituting $\beta_0 = \bar{y} - \beta_1 \bar{x}$:

$\sum -2 x_i (y_i - \bar{y} + \beta_1 \bar{x} - \beta_1 x_i) = 0$

$\beta_1 \sum x_i (x_i - \bar{x}) = \sum x_i (y_i - \bar{y})$

$\beta_1 \sum (x_i - \bar{x})(x_i - \bar{x}) = \sum (x_i - \bar{x})(y_i - \bar{y})$

$\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}}$
Coefficient Equations

Prediction equation: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$

Sample slope: $\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

Sample Y-intercept: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
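A minimal NumPy sketch of these closed-form estimates (the data below is synthetic and purely illustrative):

```python
import numpy as np

# Hypothetical data: years of experience (x) vs. salary in thousands (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([30.0, 35.0, 41.0, 44.0, 52.0, 55.0])

# Sample slope: beta1_hat = SS_xy / SS_xx
ss_xy = np.sum((x - x.mean()) * (y - y.mean()))
ss_xx = np.sum((x - x.mean()) ** 2)
beta1_hat = ss_xy / ss_xx

# Sample Y-intercept: beta0_hat = y_bar - beta1_hat * x_bar
beta0_hat = y.mean() - beta1_hat * x.mean()

# Prediction equation: y_hat = beta0_hat + beta1_hat * x
y_hat = beta0_hat + beta1_hat * x
print(beta0_hat, beta1_hat)
```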
Regression – Error measures
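The content of the error-measure slides is not captured in this text extraction; as a hedged aside (standard measures, not necessarily the exact ones on the slides), the quantities most often reported for regression are MSE, RMSE, MAE and R²:

```python
import numpy as np

def regression_error_measures(y_true, y_pred):
    """Illustrative helper (not from the slides): common regression error measures."""
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)          # mean squared error
    rmse = np.sqrt(mse)                    # root mean squared error
    mae = np.mean(np.abs(residuals))       # mean absolute error
    sse = np.sum(residuals ** 2)           # sum of squared errors
    sst = np.sum((y_true - np.mean(y_true)) ** 2)
    r2 = 1.0 - sse / sst                   # fraction of variance explained
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}
```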
Linear Regression – Matrix Form
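The matrix-form slides themselves are not captured in this text extraction; below is a sketch of the standard closed-form (normal-equation) solution $\hat{\beta} = (X^T X)^{-1} X^T y$, assuming a design matrix with a leading column of ones (the data is synthetic):

```python
import numpy as np

# Synthetic data; X is the (N, p+1) design matrix with a bias column of ones
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
X = np.column_stack([np.ones_like(x), x])

# Normal equations: solve (X^T X) beta = X^T y
# (np.linalg.solve is preferred to forming the inverse explicitly)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat   # fitted values
print(beta_hat)
```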
Geometric Interpretation

• $\hat{Y}$ lies in the subspace spanned by the columns of $X$: $C(X)$

• $r = Y - \hat{Y} = Y - X\hat{\beta}$ is orthogonal to the columns of $X$, i.e. $X^T r = 0$

• Colab Notebook: https://colab.research.google.com/drive/193OYEJ_-wh_p9Mv8idRruV3ZIbb6EkiP?usp=sharing
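A quick numerical check of the orthogonality property (a sketch with synthetic data, independent of the Colab notebook):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
X = np.column_stack([np.ones_like(x), x])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat        # projection of Y onto C(X)
r = y - y_hat               # residual vector

# X^T r should be numerically zero: the residual is orthogonal to C(X)
print(X.T @ r)
```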
Linear Regression – Matrix Form – Issues

• N samples, p-dimensional features (what if p > N?)

• Complexity of matrix inversion (what if N is very large?)
Gradient Descent

1. Initialize the parameters $\beta$ to some random values.
2. Update the parameters using the gradient descent rule:
   $\beta(t+1) = \beta(t) - \eta \, \nabla_\beta L(\beta(t))$
3. Repeat step 2 until $\|\nabla_\beta L(\beta(t))\|$ is close to 0.
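A minimal sketch of this procedure for linear regression with a mean-squared-error loss (the learning rate, iteration budget and data below are illustrative assumptions, not values from the slides):

```python
import numpy as np

# Synthetic data and design matrix with a bias column
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=100)
X = np.column_stack([np.ones_like(x), x])

# L(beta) = (1/N) ||y - X beta||^2, so grad L = -(2/N) X^T (y - X beta)
def grad(beta):
    return -(2.0 / len(y)) * X.T @ (y - X @ beta)

beta = rng.normal(size=2)            # 1. random initialization
eta = 0.01                           # learning rate (illustrative choice)
for t in range(10_000):              # 2./3. repeat the update rule
    g = grad(beta)
    if np.linalg.norm(g) < 1e-6:     # stop once the gradient is close to 0
        break
    beta = beta - eta * g

print(beta)   # should end up near [3, 2]
```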
Linear Regression

• Linear regression means linear in the coefficients, NOT in the variables.

• Careful: X may not be causing y!
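For example, a quadratic model $y = \beta_0 + \beta_1 x + \beta_2 x^2$ is nonlinear in $x$ but linear in the coefficients, so ordinary least squares still applies; a sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 50)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.3, size=x.shape)

# Expand the design matrix with x^2: the model stays linear in the coefficients
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # approximately [1, -2, 0.5]
```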
Linear Regression – Outliers
Linear Regression is problematic in many other cases
Piecewise Linear Regression

Also ref: Multivariate Adaptive Regression Splines (MARS)
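One way to see the connection (a sketch, not the slides' implementation): MARS-style hinge functions max(0, x − c) turn a piecewise linear fit into ordinary least squares on an expanded design matrix. The knot locations below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 10, size=200))
# Synthetic piecewise linear ground truth with a kink at x = 5
y = np.where(x < 5, 2.0 * x, 10.0 - (x - 5)) + rng.normal(scale=0.5, size=x.shape)

def hinge(x, knot):
    """MARS-style hinge basis function max(0, x - knot)."""
    return np.maximum(0.0, x - knot)

# Expanded design: intercept, x, and hinges at hand-picked knots
knots = [2.5, 5.0, 7.5]
X = np.column_stack([np.ones_like(x), x] + [hinge(x, k) for k in knots])

# Piecewise linear in x, still linear in the coefficients -> ordinary least squares
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
print(beta_hat)
```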


Bivariate and multivariate models

• Bivariate or simple regression model: one independent variable explains one dependent variable, e.g. x (Education) → y (Income)

• Multivariate or multiple regression model: several independent variables explain one dependent variable, e.g. x1 (Education), x2 (Gender), x3 (Experience), x4 (Age) → y (Income)

• Model with simultaneous relationship: variables affect each other, e.g. Price of wheat ↔ Quantity of wheat produced
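A sketch of fitting a multiple regression model with the same least-squares machinery (the features and data here are hypothetical stand-ins for the education/experience/income example above):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
education = rng.uniform(8, 20, size=n)    # years of education (hypothetical)
experience = rng.uniform(0, 30, size=n)   # years of experience (hypothetical)
income = 5.0 + 2.0 * education + 1.2 * experience + rng.normal(scale=3.0, size=n)

# Multiple regression: one dependent variable, several independent variables
X = np.column_stack([np.ones(n), education, experience])
beta_hat, *_ = np.linalg.lstsq(X, income, rcond=None)
print(beta_hat)   # roughly [5, 2, 1.2]
```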
Types of Regression Models

• 1 explanatory variable → Simple regression (Linear or Non-Linear)

• 2+ explanatory variables → Multiple regression (Linear or Non-Linear)
Test Time
Overfitting
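The overfitting slide content is not captured in this text extraction; as an illustrative sketch of the phenomenon (assuming synthetic data, a held-out test split and polynomial feature expansion, none of which come from the slides), training error keeps falling as model complexity grows while test-time error eventually rises:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, size=40)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=x.shape)

# Simple train / test split
x_tr, y_tr = x[:25], y[:25]
x_te, y_te = x[25:], y[25:]

def poly_design(x, degree):
    """Design matrix [1, x, x^2, ..., x^degree]."""
    return np.column_stack([x**d for d in range(degree + 1)])

for degree in (1, 3, 9, 15):
    X_tr, X_te = poly_design(x_tr, degree), poly_design(x_te, degree)
    beta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    mse_tr = np.mean((y_tr - X_tr @ beta) ** 2)
    mse_te = np.mean((y_te - X_te @ beta) ** 2)
    # Typically train MSE keeps decreasing with degree while test MSE rises
    print(f"degree={degree:2d}  train MSE={mse_tr:.3f}  test MSE={mse_te:.3f}")
```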
