Introduction to Linear Regression
Focusing on Equations
Your Name
What is Linear Regression?
▶ A statistical method to model the relationship between variables
▶ Predicts a dependent variable (y) based on one or more independent variables (x)
▶ Assumes a linear relationship: changes in y are proportional to changes in x
▶ Goal: Find the line that best fits the observed data points
The Simple Linear Regression Equation
For simple linear regression with one independent variable:
y = β0 + β1 x + ε
where:
y = dependent variable
x = independent variable
β0 = y-intercept (value of y when x = 0)
β1 = slope (change in y for a unit change in x)
ε = error term (captures unexplained variation)
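As a concrete illustration, here is a minimal Python sketch that simulates data from this model (NumPy assumed; the parameter values β0 = 2.0, β1 = 0.5 and the noise scale are made up for the example):

import numpy as np

rng = np.random.default_rng(seed=0)

# Illustrative (made-up) true parameters
beta0, beta1 = 2.0, 0.5

n = 100
x = rng.uniform(0, 10, size=n)     # independent variable
eps = rng.normal(0, 1, size=n)     # error term ε
y = beta0 + beta1 * x + eps        # y = β0 + β1 x + ε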
Estimating the Regression Line
We estimate β0 and β1 to get the best-fitting line:
ŷ = b0 + b1 x
where:
ŷ = predicted value of y
b0 = estimate of β0
b1 = estimate of β1
The goal is to minimize the difference between observed y and predicted ŷ.
Ordinary Least Squares (OLS)
OLS minimizes the sum of squared residuals:
minimize Σᵢ₌₁ⁿ (yi − ŷi)² = Σᵢ₌₁ⁿ (yi − (b0 + b1 xi))²
where:
n = number of observations
yi = observed value of y for the i-th observation
xi = observed value of x for the i-th observation
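Before the closed-form solution on the next slide, a quick sketch showing that OLS really is just this minimization, done numerically (SciPy assumed; data simulated as in the earlier sketch):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=100)

# Sum of squared residuals for a candidate (b0, b1)
def ssr(params):
    b0, b1 = params
    return np.sum((y - (b0 + b1 * x)) ** 2)

# Numerically search for the (b0, b1) minimizing the SSR
result = minimize(ssr, x0=[0.0, 0.0])
b0_hat, b1_hat = result.x   # should land near the true 2.0 and 0.5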
Calculating the Slope and Intercept
The OLS estimates for b0 and b1 are:
b1 = Σᵢ₌₁ⁿ (xi − x̄)(yi − ȳ) / Σᵢ₌₁ⁿ (xi − x̄)²
b0 = ȳ − b1 x̄
where:
x̄ = mean of x values
ȳ = mean of y values
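These formulas translate directly into NumPy; a minimal sketch on simulated data:

import numpy as np

rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=100)

x_bar, y_bar = x.mean(), y.mean()

# Closed-form OLS estimates from the formulas above
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

The same estimates can be obtained from library routines such as np.polyfit(x, y, 1); the explicit form is shown here only to mirror the equations.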
Interpreting the Coefficients
▶ b0 (y-intercept): The predicted value of y when x = 0
▶ b1 (slope):
▶ The change in y for a one-unit increase in x
▶ If b1 > 0: positive relationship
▶ If b1 < 0: negative relationship
▶ If b1 = 0: no linear relationship
Assessing Model Fit: R-squared
R-squared measures the proportion of variance in y explained by x:
R² = 1 − SSR / SST
where:
SSR = Σᵢ₌₁ⁿ (yi − ŷi)²  (Sum of Squared Residuals)
SST = Σᵢ₌₁ⁿ (yi − ȳ)²  (Total Sum of Squares)
R-squared ranges from 0 to 1, with 1 indicating a perfect fit.
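Continuing the earlier fitting sketch, R-squared follows directly from these sums (simulated data again assumed):

import numpy as np

rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=100)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x                    # predicted values ŷ

ssr = np.sum((y - y_hat) ** 2)         # sum of squared residuals
sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
r_squared = 1 - ssr / sst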
Multiple Linear Regression
For multiple independent variables:
y = β0 + β1 x1 + β2 x2 + … + βk xk + ε
where:
y = dependent variable
x1, x2, …, xk = independent variables
β0, β1, …, βk = coefficients to be estimated
ε = error term
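With several predictors, the estimates are usually computed with linear algebra rather than scalar formulas; a minimal sketch using np.linalg.lstsq (the two made-up predictors and coefficients are illustrative only):

import numpy as np

rng = np.random.default_rng(seed=0)
n, k = 100, 2
X = rng.uniform(0, 10, size=(n, k))    # k independent variables
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0, 1, size=n)

# Design matrix with a leading column of ones for the intercept β0
X_design = np.column_stack([np.ones(n), X])

# Least-squares estimates (b0, b1, ..., bk)
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)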
Assumptions of Linear Regression
1. Linearity: The relationship between x and y is linear
2. Independence: Observations are independent of each other
3. Homoscedasticity: Constant variance of residuals
4. Normality: Residuals are normally distributed
5. No perfect multicollinearity: independent variables are not exact linear combinations of one another (multiple regression only)
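These assumptions are usually checked on the residuals after fitting; a rough sketch of two such checks (SciPy assumed; the Shapiro-Wilk test and the |residual|-vs-x correlation are illustrative choices, not the only diagnostics):

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=100)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Normality check: Shapiro-Wilk test (null hypothesis: residuals are normal)
stat, p_normal = stats.shapiro(resid)

# Rough homoscedasticity check: does |residual| grow with x?
r_hetero, p_hetero = stats.pearsonr(np.abs(resid), x)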