3. Theory of linear regression
In statistics, linear regression is a linear approach for modeling the relationship between a
scalar response and one or more explanatory variables (also known as dependent and
independent variables). The case of one explanatory variable is called simple linear
regression; for more than one, the process is called multiple linear regression. In this report,
we consider only simple linear regression.
3.1 Simple linear regression
As mentioned, simple linear regression is a linear regression with a single explanatory variable.
That is, it concerns two-dimensional sample points (conventionally, the x and y coordinates in a
Cartesian coordinate system) and finds a linear function (a non-vertical straight line) that, as
accurately as possible, predicts the dependent variable value as a function of the independent
variable. The adjective “simple” refers to the fact that the outcome variable is related to a
single predictor.
It is common to make the additional stipulation that the ordinary least squares (OLS) method
should be used: the accuracy of each predicted value is measured by its squared residual (the
vertical distance between the data point and the fitted line), and the goal is to make the sum of
these squared deviations as small as possible.
The response Y at each level of x is a random variable whose expected value is
E(Y | x) = α + βx
We assume that each observation Y_i can be described by the model:
Y_1 = α + βx_1 + ε_1
Y_2 = α + βx_2 + ε_2
⋮
Y_n = α + βx_n + ε_n
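As a minimal sketch of this assumption, the snippet below simulates n observations from the model with normally distributed errors; the values of α, β, σ and the x grid are illustrative choices, not quantities from the report.

import numpy as np

# Minimal sketch: simulate observations from Y_i = alpha + beta*x_i + eps_i.
# alpha, beta, sigma, n and the x values are illustrative choices only.
rng = np.random.default_rng(0)
alpha, beta, sigma = 2.0, 0.5, 1.0
n = 30
x = np.linspace(0.0, 10.0, n)
eps = rng.normal(0.0, sigma, size=n)   # random errors eps_i
y = alpha + beta * x + eps             # observed responses Y_i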
The fitted or estimated regression line is therefore:
ŷ_i = a + bx_i
Note that each pair of observations satisfies the relationship:
y_i = a + bx_i + e_i,   i = 1, 2, …, n
where e_i = y_i − ŷ_i is called the residual. The residual describes the error in the fit of the
model to the i-th observation y_i.
The estimate of E(Y | X = x_0) at a given value x_0 is:
ŷ_0 = a + bx_0
The least-squares estimates of the intercept and slope in the simple linear regression model are
b = S_xy / S_xx
and
a = ȳ − b·x̄,
where all sums run over i = 1, 2, …, n:
S_xx = Σ(x_i − x̄)²,
S_xy = Σ(x_i − x̄)(y_i − ȳ) = Σx_i y_i − (1/n)(Σx_i)(Σy_i),
S_yy = Σ(y_i − ȳ)².
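As a rough illustration of these formulas, the following sketch computes S_xx, S_xy, the estimates b and a, and a prediction ŷ_0 at a new point x_0 with NumPy; the data and the value of x_0 are made-up numbers used only for demonstration.

import numpy as np

# Minimal sketch of the least-squares formulas above (illustrative data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.6, 4.4, 5.2])

Sxx = np.sum((x - x.mean()) ** 2)              # S_xx
Sxy = np.sum((x - x.mean()) * (y - y.mean()))  # S_xy

b = Sxy / Sxx                # slope estimate
a = y.mean() - b * x.mean()  # intercept estimate

x0 = 6.0                     # a new x value (illustrative)
y0_hat = a + b * x0          # estimate of E(Y | X = x0)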
3.2 Sums of squares
- The sum of squares of the residuals:
SSE = Σ(y_i − ŷ_i)²
- The total sum of squares of the response variable:
SST = Σ(y_i − ȳ)²
- The sum of squares for regression:
SSR = Σ(ŷ_i − ȳ)²
- Fundamental identity:
SST = SSE + SSR
- Computational formulas:
SST = S_yy,   SSR = b·S_xy,   SSE = S_yy − b·S_xy
- It can be shown that:
E(SSE) = (n − 2)σ²
- An unbiased estimator of σ²:
S² = SSE / (n − 2)
- Coefficient of determination:
r² = 1 − SSE/SST
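A short sketch of these quantities, using the same illustrative data as above, computes the three sums of squares, checks the fundamental identity numerically, and evaluates S² and r².

import numpy as np

# Sketch of the sums of squares and r^2 (illustrative data, same as above).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.6, 4.4, 5.2])
n = len(x)
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x

SSE = np.sum((y - y_hat) ** 2)           # residual sum of squares
SST = np.sum((y - y.mean()) ** 2)        # total sum of squares
SSR = np.sum((y_hat - y.mean()) ** 2)    # regression sum of squares
assert np.isclose(SST, SSE + SSR)        # fundamental identity
S2 = SSE / (n - 2)                       # unbiased estimate of sigma^2
r2 = 1 - SSE / SST                       # coefficient of determination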
3.3 Properties of the Least-Squares Estimators I
*Slope properties:
E(b) = β,   V(b) = σ²/S_xx = σ_b²
- The estimated standard error of the slope:
S_b = √(S²/S_xx)
- Moreover:
T = (b − β)/S_b ∼ t(n − 2)
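A sketch of S_b and the statistic T follows; the value of β used here is only an illustrative hypothesised slope, not one taken from the report's data.

import numpy as np

# Sketch of S_b and T = (b - beta)/S_b (illustrative data and beta value).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.6, 4.4, 5.2])
n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
S2 = np.sum((y - (a + b * x)) ** 2) / (n - 2)

Sb = np.sqrt(S2 / Sxx)   # estimated standard error of the slope
beta = 0.5               # illustrative slope value
T = (b - beta) / Sb      # follows t(n - 2) if beta is the true slope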
3.4 Properties of the Least-Squares Estimators II
*Intercept properties:
E(a) = α,   V(a) = σ²·μ_xx/S_xx = σ_a²
- With μ_xx = (1/n)Σx_i². The estimated standard error of the intercept:
S_a = √(S²·μ_xx/S_xx) = √(S_b²·μ_xx)
- Moreover:
T = (a − α)/S_a ∼ t(n − 2)
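Analogously, a sketch of S_a on the same illustrative data, including a numerical check of the identity S_a = S_b·√(μ_xx):

import numpy as np

# Sketch of the intercept standard error S_a (illustrative data, as above).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.6, 4.4, 5.2])
n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
S2 = np.sum((y - (a + b * x)) ** 2) / (n - 2)

mu_xx = np.sum(x ** 2) / n                   # mean of x_i^2
Sa = np.sqrt(S2 * mu_xx / Sxx)               # estimated standard error of the intercept
Sb = np.sqrt(S2 / Sxx)
assert np.isclose(Sa, Sb * np.sqrt(mu_xx))   # S_a = S_b * sqrt(mu_xx)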
3.5 Confidence Intervals
- Confidence interval for the slope:
b ± t_{α/2, n−2}·S_b
- Confidence interval for the intercept:
a ± t_{α/2, n−2}·S_a
Here t_{α/2, n−2} is the upper α/2 critical value of the t distribution with n − 2 degrees of
freedom, where α denotes the significance level of the interval (not the intercept parameter).
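A sketch of 95% (significance level 0.05) two-sided intervals for both parameters, using scipy.stats for the t critical value and the same illustrative data as before:

import numpy as np
from scipy import stats

# Sketch of 95% confidence intervals for slope and intercept (illustrative data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.6, 4.4, 5.2])
n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
S2 = np.sum((y - (a + b * x)) ** 2) / (n - 2)
Sb = np.sqrt(S2 / Sxx)
Sa = np.sqrt(S2 * np.sum(x ** 2) / (n * Sxx))

t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)     # upper 2.5% quantile of t(n - 2)
ci_slope = (b - t_crit * Sb, b + t_crit * Sb)
ci_intercept = (a - t_crit * Sa, a + t_crit * Sa)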
3.6 Hypothesis Testing
- Slope: T = (b − β_0)/S_b is used to test H_0: β = β_0 against one of
H_1: β ≠ β_0,   H_1: β < β_0,   H_1: β > β_0
- Intercept: T = (a − α_0)/S_a is used to test H_0: α = α_0 against one of
H_1: α ≠ α_0,   H_1: α < α_0,   H_1: α > α_0
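As an illustrative sketch, the common two-sided test of H_0: β = 0 (no linear relationship) can be carried out as follows; β_0 = 0 is simply a typical choice, not a value prescribed by the report.

import numpy as np
from scipy import stats

# Sketch of the two-sided slope test H0: beta = beta0 (illustrative data, beta0 = 0).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.6, 4.4, 5.2])
n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
S2 = np.sum((y - (a + b * x)) ** 2) / (n - 2)
Sb = np.sqrt(S2 / Sxx)

beta0 = 0.0
T = (b - beta0) / Sb
p_value = 2 * stats.t.sf(abs(T), df=n - 2)   # two-sided p-value from t(n - 2)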