Chapter Two
The Simple Regression Model
2.1 Definition of the Simple Regression Model
Given two variables Y and X, simple linear regression explains Y in terms of X; that is, it shows how Y varies with changes in X. The equation that linearly relates Y to X, together with a disturbance or error term u representing the other factors that affect Y, is called the simple linear regression model:
Y = \beta_0 + \beta_1 X + u ..........................(2.1)
This equation is also called the two-variable linear regression model or the bivariate linear regression model.
To define each quantity in the equation: the variables Y and X go by several names that are used interchangeably. Y is called the dependent variable, explained variable, response variable, predicted variable, or regressand. X is called the independent variable, explanatory variable, control variable, predictor variable, or regressor. The variable u is called the disturbance or error term and represents factors other than X that affect Y. The equation also addresses the functional relationship between Y and X. If the other factors in u are held fixed, so that the change in u is zero (\Delta u = 0), then X has a linear effect on Y:

\Delta Y = \beta_1 \Delta X \quad \text{if} \quad \Delta u = 0

Thus the change in Y is simply \beta_1 multiplied by the change in X. This means that \beta_1 is the slope parameter in the relationship between Y and X, holding the other factors in u fixed.
Simple linear regression is best illustrated by a simple wage equation:

wage = \beta_0 + \beta_1 educ + u ..........................(2.2)

This equation indicates how a change in years of education affects the wage level. Thus, \beta_1 measures the change in hourly wage given one more year of education, holding all other factors, u, fixed. Some of those factors may be labor force experience, ability, work ethic, and so on. Although it is not realistic for many economic applications, the linearity feature of the model implies that a one-unit change in the independent variable (years of education) has the same effect on the dependent variable (wage) regardless of the initial level of education.
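For concreteness, here is a small worked example of this interpretation; the coefficient value 0.54 is invented purely for illustration:

```latex
% Hypothetical estimate: beta_1-hat = 0.54 (an invented number).
\[
  \Delta \widehat{wage} \;=\; \hat{\beta}_1 \, \Delta educ
  \;=\; 0.54 \times 1 \;=\; 0.54 ,
\]
% i.e. each additional year of education is predicted to raise the
% hourly wage by $0.54, holding the factors in u fixed.
```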
The most difficult issue addressed by the simple linear regression model is whether the model really allows us to draw ceteris paribus conclusions about how X affects Y. That is, we need to show how \beta_1 measures the effect of X on Y, holding all other factors (in u) fixed. In order to estimate the effect of X on Y, \beta_1, with the help of the ceteris paribus principle, we need to restrict how the unobserved variable u is related to X.
Before stating a key assumption about how X and u are related, we can make one assumption about the average value of u. As long as the intercept \beta_0 is included in the equation, nothing is lost by assuming that the average value of u in the population is zero, i.e.,

E(u) = 0 ..........................(2.3)

Importantly, this assumption says nothing about the relationship between X and u; it simply describes the distribution of the unobservable u in the population.
We now turn to the crucial assumption regarding how X and u are related. The natural measure of the association between two variables is the correlation coefficient. However, even if the correlation coefficient shows that X and u are uncorrelated, this does not guarantee that the two variables are unrelated: it is possible for u to be uncorrelated with X while being correlated with functions of X, such as X^2, because correlation measures only linear dependence between X and u. The better approach to this problem is to use the expected value of u given X, that is, the conditional distribution of u given any value of X.
The crucial assumption in this instance is that the expected value of u given X is zero, which says that the expected (or average) value of u does not depend on the value of X. Thus,

E(u|X) = 0 ..........................(2.4)

This equation says that the average value of the unobservable u is the same for any value of X. Equation (2.4) is the crucial zero conditional mean assumption, and it implies the earlier assumption that the average value of u over the entire population is zero, which is termed the zero mean assumption.
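The link between the two assumptions can be made explicit with the law of iterated expectations; a short derivation:

```latex
% Zero conditional mean implies the zero (unconditional) mean,
% by the law of iterated expectations:
\[
  \mathbb{E}(u) = \mathbb{E}\bigl[\,\mathbb{E}(u \mid X)\,\bigr]
               = \mathbb{E}(0) = 0 .
\]
% It also implies that u is uncorrelated with X, since
\[
  \operatorname{Cov}(X, u)
  = \mathbb{E}(Xu) - \mathbb{E}(X)\,\mathbb{E}(u)
  = \mathbb{E}\bigl[\,X \,\mathbb{E}(u \mid X)\,\bigr] - 0
  = 0 .
\]
```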
Since this crucial assumption says that u is unrelated to X, and hence allows us to draw ceteris paribus conclusions, we can use the simple linear regression model in econometric analysis.

Taking the expected value of the simple linear regression model conditional on X, and using the zero conditional mean assumption E(u|X) = 0 given above, we obtain a new equation:

E(Y|X) = \beta_0 + \beta_1 X ..........................(2.5)

This equation is the population regression function (PRF). If the zero conditional mean assumption (i.e., that the variables X and u are uncorrelated) is true, then it is useful to break the simple linear regression model into two components:

Y = (\beta_0 + \beta_1 X) + u ..........................(2.6)

The part \beta_0 + \beta_1 X is called the deterministic or systematic component, and the part u is called the stochastic or random component.
To find estimates of the population parameters (\beta_0 and \beta_1), we need to derive the sample regression function (SRF) from the population regression function (PRF). That is, we estimate the parameters from sample data, say a randomly selected sample of Y values for fixed X values. Because of sampling fluctuations, we may not be able to estimate the PRF accurately from a particular sample (i.e., from the SRF). To deal with this problem we try to identify an SRF that is as close an approximation to the PRF as possible, so that the estimates of the population parameters, the sample parameters \hat{\beta}_0 and \hat{\beta}_1, are as close as possible to the true population parameters \beta_0 and \beta_1, though we can never know the true values.

The sample counterpart of the earlier PRF can be written as:

\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i ..........................(2.7)

Where: \hat{Y} is read as "Y-hat" or "Y-cap";
\hat{Y}_i is the estimator of E(Y|X_i);
\hat{\beta}_0 and \hat{\beta}_1 are the estimators of \beta_0 and \beta_1.
To estimate the PRF from its SRF, we first need to explain the least squares principle underlying the method of OLS. Recall that the PRF was given by Y_i = \beta_0 + \beta_1 X_i + u_i, and its estimate from the SRF is:

Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + e_i = \hat{Y}_i + e_i, \quad \text{because} \quad \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i

so that e_i = Y_i - \hat{Y}_i.

This shows that the residuals e_i are simply the differences between the actual Y values (Y_i) and the estimated Y values (\hat{Y}_i).
The next task is to determine the SRF itself, given n pairs of observations on Y and X. To make the SRF as close as possible to the PRF, the sum of the differences between the actual and estimated values, that is, the sum of the residuals \sum e_i = \sum (Y_i - \hat{Y}_i), should be as small as possible; if it is, the estimated values \hat{Y}_i will, on average, be close to the actual values Y_i.
However, under the minimum \sum e_i criterion, the sum can be small even though the individual e_i are widely spread about the SRF. For instance, if the four residuals e_1, e_2, e_3, and e_4 in an SRF take the values 10, -2, +2, and -10 respectively, then their algebraic sum is zero, even though e_1 and e_4 are scattered widely around the graphical representation of the SRF.
[Fig 3.1 Least-squares criterion: the residuals e_1, e_2, e_3, e_4 scattered about the SRF \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i, with Y on the vertical axis and X on the horizontal axis.]
To avoid this problem we adopt another criterion, called the least-squares criterion, which states that the sum of the squared residuals (\sum e_i^2) should be as small as possible:

\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2 ..........................(2.9)
As noted previously, under the minimum \sum e_i criterion the sum can be small even though the e_i are widely spread about the SRF. This is not possible under the least-squares criterion, because squaring gives more weight to residuals such as e_1 and e_4 than to e_2 and e_3: the larger e_i is in absolute value, the larger e_i^2 is. A further justification for the least squares method lies in the fact that the estimators obtained by it have desirable statistical properties, which are discussed in later sections.
The method of ordinary least squares (OLS) provides unique estimates of \beta_0 and \beta_1 that give the smallest possible value of \sum e_i^2:

\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2 ..........................(2.10)
The partial derivative of \sum e_i^2 with respect to \hat{\beta}_0 gives:

\frac{\partial \sum e_i^2}{\partial \hat{\beta}_0} = \frac{\partial \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2}{\partial \hat{\beta}_0} = 0

-2 \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0
\sum Y_i - n\hat{\beta}_0 - \hat{\beta}_1 \sum X_i = 0
\sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_i ..........................(2.11)
Dividing equation (2.11) by n, the number of observations in the sample, we obtain:

\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X} ..........................(2.12)
Similarly, the partial derivative of \sum e_i^2 with respect to \hat{\beta}_1 gives the second condition:

\sum X_i Y_i = \hat{\beta}_0 \sum X_i + \hat{\beta}_1 \sum X_i^2 ..........................(2.13)

Equations (2.11) and (2.13) are known as the normal equations; solving these normal equations simultaneously, we obtain the least squares estimate of \beta_1 as:
\hat{\beta}_1 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - (\sum X_i)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum x_i y_i}{\sum x_i^2} ..........................(2.14)
where \bar{X} and \bar{Y} are the sample means of X and Y, and we define x_i = X_i - \bar{X} and y_i = Y_i - \bar{Y}. Henceforth we adopt the convention of letting lower-case letters denote deviations from the mean.
By simple algebraic manipulation of equation (2.11), the parameter \hat{\beta}_0 can also be solved for as:

\sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_i
\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} ..........................(2.16)
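To make the estimation formulas concrete, here is a minimal Python sketch of eq. (2.14) and (2.16); the function name and the data are invented for illustration:

```python
import numpy as np

def ols_simple(x, y):
    """OLS estimates for Y = b0 + b1*X + u, via eq. (2.14) and (2.16):
    b1 = sum(x_i*y_i) / sum(x_i^2), with lower case denoting deviations
    from the means, and b0 = Ybar - b1*Xbar."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    x_dev, y_dev = x - x.mean(), y - y.mean()
    b1 = (x_dev * y_dev).sum() / (x_dev ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Sanity check on invented data (true b0 = 2, b1 = 0.5):
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 50)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 50)
print(ols_simple(X, Y))
print(np.polyfit(X, Y, 1)[::-1])  # same numbers from numpy's fit
```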
The third algebraic property of OLS estimates is that the sample covariance between the estimated Y values (\hat{Y}_i) and the residuals (e_i) is zero.
Note: when we consider the sums of squares of the three variations discussed above, we obtain the three sums of squares as follows: the total sum of squares (TSS), the explained sum of squares (ESS), and the residual sum of squares (RSS):

TSS = \sum (Y_i - \bar{Y})^2 = \sum y_i^2
ESS = \sum (\hat{Y}_i - \bar{Y})^2 = \sum \hat{y}_i^2
RSS = \sum (Y_i - \hat{Y}_i)^2 = \sum e_i^2
The variance of u_i is constant, i.e., the same for all observations in each period, so that the error terms are homoscedastically distributed:

Var(u_i) = \sigma_u^2 ..........................(2.21)
Under these assumptions, the OLS estimators are unbiased estimators of the true population parameters (\beta_0 and \beta_1). So the expected or mean values of the estimated parameters are E(\hat{\beta}_0) = \beta_0 and E(\hat{\beta}_1) = \beta_1.

Variance of \hat{\beta}_0: the variance of \hat{\beta}_0 is the expected squared deviation of \hat{\beta}_0 from its mean:

Var(\hat{\beta}_0) = E(\hat{\beta}_0 - \beta_0)^2 = \frac{\sigma_u^2 \sum X_i^2}{n \sum x_i^2}

Variance of \hat{\beta}_1: the variance of \hat{\beta}_1 is the expected squared deviation of \hat{\beta}_1 from its mean, i.e.,

S^2(\hat{\beta}_1) = Var(\hat{\beta}_1) = \frac{\sigma_u^2}{\sum (X_i - \bar{X})^2} ..........................(2.32)
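As a sketch, these quantities can be computed from a sample once \sigma_u^2 is replaced by an estimate; the sketch below uses RSS/(n - 2), the usual unbiased estimator (an assumption here, since that estimator is not derived in this section):

```python
import numpy as np

def ols_std_errors(x, y):
    """OLS fit plus standard errors: Var(b1) = s2 / sum(x_dev^2) and
    Var(b0) = s2 * sum(X^2) / (n * sum(x_dev^2)), with s2 = RSS/(n-2)
    as the (assumed) estimate of sigma_u^2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    x_dev = x - x.mean()
    b1 = (x_dev * (y - y.mean())).sum() / (x_dev ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    rss = ((y - b0 - b1 * x) ** 2).sum()
    s2 = rss / (n - 2)                              # estimate of sigma_u^2
    se_b1 = np.sqrt(s2 / (x_dev ** 2).sum())
    se_b0 = np.sqrt(s2 * (x ** 2).sum() / (n * (x_dev ** 2).sum()))
    return b0, b1, se_b0, se_b1
```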
Having the null and alternative hypotheses, we need to compute the standard errors of the estimates, S(\hat{\beta}_i), in order to apply the standard error test. After computing them, the standard errors of the estimates are compared with the numerical values of the estimates, and we conclude as follows:

i. If the standard error is less than half of the numerical value of the parameter, i.e., S(\hat{\beta}_i) < \frac{1}{2}|\hat{\beta}_i|, then we reject H_0 (accept H_1) and conclude that the estimate \hat{\beta}_i is statistically significant.

ii. If the standard error of the estimate is greater than half of the numerical value, i.e., S(\hat{\beta}_i) > \frac{1}{2}|\hat{\beta}_i|, then we accept H_0 (reject H_1) and conclude that the estimate \hat{\beta}_i is not statistically significant.
iv. Find the theoretical or table value of t from the t-table, using the degrees of freedom and the chosen significance level. This t-value is termed t-tabulated, t_tab, or t-critical, t_cr.

v. Find the calculated value of t, t_cal, using the formula t = \hat{\beta}_i / S(\hat{\beta}_i). This t-value is also termed t-calculated, t_cal, t-computed, t_com, or the t-statistic, t_s.

vi. Finally, the t-calculated value is compared with the t-tabulated value, and we conclude as follows (a code sketch of this procedure follows the list):

a. If |t_cal| > t_tab, then we reject the null hypothesis (accept the alternative hypothesis) and conclude that the estimate \hat{\beta}_i is statistically significant.

b. If |t_cal| < t_tab, then we accept the null hypothesis (reject the alternative hypothesis), so the estimated parameter \hat{\beta}_i is not statistically significant.
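Here is a minimal sketch of steps iv-vi, assuming a two-tailed test with n - 2 degrees of freedom; the numeric inputs are invented:

```python
from scipy import stats

def t_test(b, se_b, n, alpha=0.05):
    """Two-tailed t-test of H0: beta_i = 0 for simple regression
    (df = n - 2).  Returns (t_cal, t_tab, significant?)."""
    t_cal = b / se_b                                  # step v
    t_tab = stats.t.ppf(1 - alpha / 2, df=n - 2)      # step iv
    return t_cal, t_tab, abs(t_cal) > t_tab           # step vi

print(t_test(b=0.54, se_b=0.12, n=30))   # invented estimate and SE
```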
Thus, the 95% confidence interval can be computed and presented using the following expression:

P[\hat{\beta}_i - t_{\alpha/2} \cdot S(\hat{\beta}_i) \le \beta_i \le \hat{\beta}_i + t_{\alpha/2} \cdot S(\hat{\beta}_i)] = 95\% ..........................(2.33)

where \alpha is the significance level; the equation shows that the probability that the true value \beta_i lies between the two limits (to the right and left of \hat{\beta}_i) is 95%. Alternatively, the CI can be rewritten as:

\beta_i = \hat{\beta}_i \pm t_{\alpha/2} \cdot SE(\hat{\beta}_i) ..........................(2.34)
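Equation (2.34) translates directly into code; a short sketch with invented inputs:

```python
from scipy import stats

def conf_interval(b, se_b, n, alpha=0.05):
    """CI for a simple-regression coefficient, eq. (2.34):
    b +/- t_{alpha/2} * SE(b), with df = n - 2."""
    t_half = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return b - t_half * se_b, b + t_half * se_b

print(conf_interval(b=0.54, se_b=0.12, n=30))   # invented numbers
```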
The coefficient of determination, r^2, measures the proportion of the total variation in the dependent variable Y that is explained by the explanatory variable X; that is, it measures the goodness of fit of the model in explaining the variation.

In order to compute r^2, we need to recall the relationship between the various sums of squares appearing in equation (2.20), which states that the total sum of squares (TSS) is the sum of the explained sum of squares (ESS) and the residual sum of squares (RSS), i.e.,

TSS = ESS + RSS ..........................(2.35)

This equation tells us that the total variation is the sum of the explained variation and the unexplained variation. Therefore, eq. (2.35) can be rewritten as:

\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2
Total variation (TSS) = Explained variation (ESS) + Unexplained variation (RSS) ..........................(2.36)

Dividing through by TSS, the coefficient of determination is

r^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} ..........................(2.37)

and the correlation coefficient is its square root:

r = \sqrt{r^2} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2} \sqrt{\sum y_i^2}} ..........................(2.39)

where the lower-case letters x and y refer to (X - \bar{X}) and (Y - \bar{Y}) respectively, i.e., the deviations of the variables X and Y from their respective means.
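A small sketch that computes the three sums of squares, verifies the identity (2.35), and forms r^2 (the data are invented):

```python
import numpy as np

def r_squared(x, y):
    """r^2 = ESS/TSS = 1 - RSS/TSS for a simple OLS fit."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    x_dev = x - x.mean()
    b1 = (x_dev * (y - y.mean())).sum() / (x_dev ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x
    tss = ((y - y.mean()) ** 2).sum()
    ess = ((y_hat - y.mean()) ** 2).sum()
    rss = ((y - y_hat) ** 2).sum()
    assert np.isclose(tss, ess + rss)   # the identity in eq. (2.35)
    return ess / tss

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 50)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 50)
print(r_squared(X, Y))
```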
An important point here is knowing what happens to the OLS estimates when the units of measurement of the dependent and independent variables change. For example, if the dependent variable is multiplied by 1,000, which means each value in the sample is multiplied by 1,000, then the OLS intercept and slope estimates are also multiplied by 1,000, provided nothing has changed about the independent variable. Therefore, the estimated equation above becomes (with salary now measured in dollars):

\widehat{salardol} = 963,191 + 18,501 \, roe ..........................(2.43)
Note that equations (2.42) and (2.43) give the same interpretation: if the return on equity (roe) increases by one unit, the predicted salary increases by $18,501. Generally, it is easy to figure out what happens to the intercept and slope estimates when the dependent variable changes its units of measurement: if the dependent variable is multiplied by a constant c, then the OLS intercept and slope estimates are also multiplied by c.
We can also use the CEO salary example to see what happens to the intercept and slope estimates when we change the units of measurement of the independent variable. For example, if we divide the independent variable by 100, which means each value in the sample is divided by 100, then the original regression model, in which salary is measured in thousands of dollars (i.e., eq. (2.42)), changes to the following estimated model:

\widehat{salary} = 963.191 + 1,850.1 \, roedec ..........................(2.44)

where roedec refers to the return on equity expressed in decimal form.
According to this estimated equation, in order to preserve the interpretation across eq. (2.42) and eq. (2.44), the slope estimate must be multiplied by 100 when the independent variable is divided by 100, while the intercept is unchanged. This is so because the coefficient on roedec is 100 times the coefficient on roe in eq. (2.42), implying that changing roe by one percentage point is equivalent to changing roedec by 0.01. Generally, if the independent variable is divided or multiplied by some nonzero constant c, then the OLS slope coefficient is multiplied or divided by c respectively, but the intercept is not affected.
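These rescaling rules can be verified numerically; a sketch on invented data mimicking the CEO salary example, fitting with numpy's least-squares routine:

```python
import numpy as np

# Invented sample: salary in thousands of dollars, roe in percentage points.
rng = np.random.default_rng(1)
roe = rng.uniform(0, 60, 40)
salary = 963.191 + 18.501 * roe + rng.normal(0, 200, 40)

fit = lambda x, y: np.polyfit(x, y, 1)[::-1]   # returns (b0, b1)
b0, b1 = fit(roe, salary)                      # salary in thousands
b0_d, b1_d = fit(roe, salary * 1000)           # salary in dollars
b0_c, b1_c = fit(roe / 100, salary)            # roe in decimal form

print(b0_d / b0, b1_d / b1)   # both ratios: 1000
print(b0_c / b0, b1_c / b1)   # intercept ratio: 1, slope ratio: 100
```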
Since a strictly linear relationship between Y and X is not general enough for all economic applications, it is often better to incorporate some nonlinearity into the simple regression model. Generally, two nonlinear functional forms are commonly used in regression models in the applied social sciences. These are:

1. The log-level (semi-log) form, for example the wage equation:

log(wage) = \beta_0 + \beta_1 educ + u ..........................(2.45)

This functional form captures a constant percentage increase of wage for more years of education: the percentage increase in wage is the same given one more year of education. Therefore, the main reason for using the log of wage is to impose a constant percentage effect of education on wage.

2. The log-log form, for example the CEO salary equation:

log(salary) = \beta_0 + \beta_1 log(sales) + u ..........................(2.46)

This is a constant elasticity model in which the coefficient of log(sales), \beta_1, is the estimated elasticity of salary with respect to sales. It implies that a 1 percent increase in firm sales increases CEO salary by \beta_1 percent, the usual interpretation of an elasticity.
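A sketch of estimating the constant elasticity form by regressing log(salary) on log(sales); the data are invented:

```python
import numpy as np

# Hypothetical firms: true elasticity of salary w.r.t. sales = 0.25.
rng = np.random.default_rng(2)
sales = rng.uniform(100, 10_000, 60)
log_salary = 4.0 + 0.25 * np.log(sales) + rng.normal(0, 0.2, 60)

b1, b0 = np.polyfit(np.log(sales), log_salary, 1)
print(f"estimated elasticity: {b1:.3f}")  # ~0.25: +1% sales -> +0.25% salary
```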
This type of simple linear regression model is a model without an intercept, so it is known as regression through the origin, and can be presented as:

Y = \beta_1 X ..........................(2.47)

Estimating eq. (2.47) is called regression through the origin because the line of this equation passes through the origin of the graph, the point (X = 0, Y = 0).
To obtain the slope estimate \hat{\beta}_1 we still rely on the OLS method, which in this case minimizes the sum of squared residuals:

\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\beta}_1 X_i)^2

Using calculus, it can be shown that \hat{\beta}_1 must solve the first-order condition:

\frac{\partial \sum e_i^2}{\partial \hat{\beta}_1} = -2 \sum (Y_i - \hat{\beta}_1 X_i) X_i = 0

\sum X_i Y_i - \hat{\beta}_1 \sum X_i^2 = 0

\hat{\beta}_1 = \frac{\sum X_i Y_i}{\sum X_i^2} ..........................(2.49)

Note that, unlike eq. (2.14), this formula uses the raw sums \sum X_i Y_i and \sum X_i^2 rather than deviations from the means, because no intercept is estimated.
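A minimal sketch of eq. (2.49); the function name and data are invented:

```python
import numpy as np

def ols_through_origin(x, y):
    """Slope for regression through the origin, eq. (2.49):
    b1 = sum(Xi*Yi) / sum(Xi^2), raw sums with no demeaning."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return (x * y).sum() / (x ** 2).sum()

# Check on invented data generated with no intercept (true b1 = 0.7):
rng = np.random.default_rng(3)
X = rng.uniform(0, 10, 50)
Y = 0.7 * X + rng.normal(0, 0.5, 50)
print(ols_through_origin(X, Y))
```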