
EES 400: FUNDAMENTALS OF ECONOMETRICS I

TOPIC THREE - SIMPLE REGRESSION MODELS

3.1 INTRODUCTION

Regression is the main tool of econometricians.

Regression analysis is concerned with the study of the dependence of one variable (the DEPENDENT VARIABLE) on one or more other variables (the EXPLANATORY VARIABLE(S)), with a view to estimating and/or predicting the average value of the dependent variable.

To illustrate, consider the scatter graph below which shows the relationship between two
variables x and y and a line fitted among the scatter points.

[Figure: scatter plot of Y against X with a fitted line; points above the line correspond to positive u, points below to negative u.]

From the scatter graph, we notice that despite variability between X and Y, there is a general "tendency" for the variables to move together; i.e., as X increases, so does Y, as shown by the line of best fit. The fitted line among the scatter points is actually the REGRESSION LINE. Thus regression analysis aims at finding the line of best fit among variables. The regression line is thus given as Y = α + βX, such that:

Y ≡ dependent variable; α = intercept

X ≡ explanatory variable; β = slope or gradient

The relationship Y = α + βX is called a DETERMINISTIC or functional relationship, since it shows us that X and Y have an exact relationship. However, in real life, not many variables have such an exact relationship. Indeed, not all points lie on the regression line as shown. Some points are above the line while others are below the line; only a few are on the line.

Thus a more realistic relationship between X and Y will actually be given as

Y = α + βX + u, where u is called the error term or the disturbance term. The function Y = α + βX + u is called a STOCHASTIC or STATISTICAL function. For points above the regression line, u is positive, while for points below the regression line, u is negative. A variable such as u, which can take on any set of values, positive or negative, with a given probability, is called a RANDOM variable or a STOCHASTIC variable. Thus the error term implies that not all points will lie on the line. It actually represents the variations in Y that cannot be explained by X. The reasons why we include an error term in a regression model include:

i) Measurement errors; i.e. errors in measuring any of the variables.


ii) Omitted variable bias; i.e., it is possible that we have omitted some very
important variables from the regression model.
iii) Human behavior, which is random and unpredictable, e.g. due to differences in
tastes and preferences, or shocks.
iv) Specification errors, i.e. assuming a linear relationship when it should actually be
non-linear, and so on.

A point to note about regression analysis is that any regression must be guided a priori by economic theory.

Thus, although the function Y = α + βX + u assumes causation, i.e. that X causes Y, this causation must be informed by economic theory. Thus, if we say:

 Consumption = α + β·Income + u, this is a true regression, since it is derived from the economic theory of consumption;

 Income = α + β·Income + u, this is an untrue regression, as no theory provides for such a relationship (a variable cannot meaningfully explain itself).

3.2 CORRELATION VERSUS REGRESSION

A distinction is always made between correlation and regression. While correlation


analysis aims at finding the strength of linear association between variables, regression
analysis, on the other hand, aims at finding the direction of relationship between
variables. There are thus two (2) primary differences between correlation and regression
analysis, as outlined below:

CORRELATION

i) Assumes symmetry between X and Y; i.e., there is no distinction as to which variable is dependent (causality is not important).

ii) Both X and Y are assumed to be statistical, random or stochastic.

REGRESSION

i) Assumes asymmetry between X and Y; i.e., distinguishes which variable is dependent and which is explanatory (causality is important).

ii) Only Y is assumed to be statistical; X is assumed to be fixed.

Thus, correlation does not imply causality, but regression does.

There are basically two types of regression analysis, i.e.,

i) Simple regression analysis

ii) Multiple regression analysis

In simple regression analysis, we study the effect of only one explanatory variable on the dependent variable; for example, how X affects Y in Y = α + βX + u. Thus, Y = f(X).

For this reason, simple regression analysis is also known as TWO-VARIABLE or BIVARIATE regression analysis.

In multiple regression analysis, we study the effect of more than one explanatory variable on the dependent variable; for example, how X1, X2 and X3 affect Y in:

Y = α + β1X1 + β2X2 + β3X3 + u. Thus Y = f(X1, X2, ..., Xn).

3.3 THE NATURE AND SOURCES OF DATA FOR ECONOMIC ANALYSIS

Economic analysis is mainly EMPIRICAL; i.e. we place a lot of emphasis on using data in any econometric analysis. There are basically 3 types of data we can use in economic analysis:

i) Time series data


ii) Cross section data, and
iii) Panel data
These are now discussed as follows:
i. Time series data

A time series is a set of observations on the value that a variable takes at different times; e.g.
the GDP of Kenya from 1963 to 2013.

The frequency at which time series data are observed varies, as shown below:


 Daily- e.g stock prices, interest rates
 Weekly- money supply figures
 Monthly- the unemployment rate, the consumer price index (CPI)
 Quarterly- the gross domestic product (GDP)
 Annually- the government Budget, GDP
 Decennially- e.g. the population census

Time series data are the most widely used data in economic analysis. If the data used in
regression analysis is a time-series, then the regression equation is expressed with subscript t
for time, as follows:

Y_t = α + βX_t + u_t

ii. Cross-section data

Cross section data are data of one or more variables collected at the same point in time e.g.
the GDP in Kenya, Uganda, and Tanzania for the year 2013.

If the data in regression analysis are cross-section data, then the regression equation is expressed with subscript (i) for individual observations, as follows:

Y_i = α + βX_i + u_i

iii. Panel data

Panel data is data in which the SAME cross-sectional unit is surveyed over time. For
example, the GDP of Kenya, Uganda and Tanzania between the years 1963-2013 will
constitute panel data.

Thus, panel data combines the characteristics of both time series data and cross-sectional
data. For this reason, panel data is usually more powerful than time series or cross sectional
data as it helps to bring out the dynamics of individual behavior.

Panel data is also known as pooled or longitudinal data.

If the data used in regression analysis are panel data, then the regression equation is expressed with subscript it, as follows:

Y_it = α + βX_it + u_it

Where:

(i) stands for the cross-sectional unit, e.g. Kenya = 1, Uganda = 2 and Tanzania = 3; and

(t) stands for the time-series identifier, e.g. 1963, 1964, 1965 ... 2013.

There are many sources of data which include:


i) Governmental agencies- e.g. the Kenya National Bureau of Statistics
ii) International agencies- e.g. the World Bank, the International Monetary Fund
(IMF), the African Development Bank,
iii) Private organizations- e.g. the Standard and Poor Corporation, the World Penn
Tables from the University of Pennsylvania, Private Sector Alliance -Kenya;
iv) Individuals- e.g. data collected by an individual by use of questionnaires,
interviews, experiments etc, and
v) The internet, and so on.
Whatever the source, ensure the data are accurate, reliable, credible, available and of good quality.

3.4 TWO-VARIABLE REGRESSION ANALYSIS

Two-variable regression analysis aims at fitting the line of best fit between two variables, X and Y.

In performing regression analysis, we can use either the population or a sample.

 The population regression is given by:

Y_i = β0 + β1X_i + u_i

 The sample regression is given by:

Ŷ_i = β̂0 + β̂1X_i + û_i

Where:

- Y_i is the actual value of the dependent/endogenous variable (for the population);

- Ŷ_i is the estimated value of the endogenous/dependent variable (for a sample);

- X_i is the independent or exogenous variable;

- β0 is the intercept term of the population regression line;

- β̂0 is the intercept term of the sample regression line. The term β̂0 is read as "beta hat subscript zero" and is an estimator of β0;

- β1 is the slope, or partial derivative, of the population regression line, i.e. ∂Y_i/∂X_i = β1;

- β̂1 is the slope coefficient, or partial derivative, of the sample regression line; β̂1 is an estimator or approximation of β1;

- u_i is the error or disturbance term of the population regression;

- û_i is the error or disturbance term of the sample regression.

POPULATION

[Figure: scatter plot of Y against X with the population regression line Y_i = β0 + β1X_i + u_i; points above the line have u > 0, points below have u < 0.]

SAMPLE

[Figure: scatter plot of Y against X with the sample regression line Ŷ_i = β̂0 + β̂1X_i + û_i; points above the line have û > 0, points below have û < 0.]
Thus, the error or disturbance term is the difference between each observation Y_i and the population regression line, while the residual is the difference between each observation and the estimated (sample) regression line Ŷ_i. Thus:

Y_i − (β0 + β1X_i) = population error/disturbance term

Y_i − Ŷ_i = sample residual

Recall that Ȳ = ΣY/n = E(Y). Notice that the mean of the fitted values Ŷ_i is also Ȳ, the expected value of Y, so the regression line reproduces the mean of Y.

The major difference between the error or disturbance term and the residual is that, whereas the residual can be computed from the sample, the error term cannot be observed or measured.
The intercept term (β0 or β̂0) and the slope coefficient (β1 or β̂1) are known as MODEL PARAMETERS.

In regression analysis, however, we prefer to use a sample rather than the population. This is because, in real life or practice, it is easier to observe a sample than to observe the whole population.

3.5 THE ORDINARY LEAST SQUARES ESTIMATORS

The ordinary least squares (OLS) estimators are the main technique used to estimate regression models.

The name OLS is derived from the fact that OLS aims at minimizing the sum of squared residuals. In so doing, OLS finds the values of the model parameters (β̂0 and β̂1) which give the line of best fit.

The ordinary least squares estimators (β̂0 and β̂1) are derived using the following eight steps:

STEP 1: Begin with the population and sample regression lines and obtain the residual as the difference of the two:

 Population regression line: Y_i = β0 + β1X_i + u_i

 Sample regression line: Ŷ_i = β̂0 + β̂1X_i

 Residual (e_i) = Y_i − Ŷ_i, so that:

e_i = Y_i − β̂0 − β̂1X_i

STEP 2: Square both sides of the equation and take summations:

e_i² = (Y_i − β̂0 − β̂1X_i)²

Σe_i² = Σ(Y_i − β̂0 − β̂1X_i)²

The result, i.e. Σe_i², is called the SUM OF SQUARED RESIDUALS (RSS).
STEP 3: Obtain the partial derivatives of the sum of squared residuals with respect to β̂0 and β̂1, as follows:

∂Σe_i²/∂β̂0 = 2Σ(Y_i − β̂0 − β̂1X_i)·(−1)

= −2ΣY_i + 2Σβ̂0 + 2β̂1ΣX_i, but Σβ̂0 = nβ̂0, so:

= −2ΣY_i + 2nβ̂0 + 2β̂1ΣX_i ..............................(1)

∂Σe_i²/∂β̂1 = 2Σ(Y_i − β̂0 − β̂1X_i)·(−X_i)

= −2ΣY_iX_i + 2β̂0ΣX_i + 2β̂1ΣX_i² ......................(2)

STEP 4: The first-order necessary condition for a maximum/minimum requires that each partial derivative equal zero. Thus we equate equations (1) and (2) to zero:

From equation (1):

−2ΣY_i + 2nβ̂0 + 2β̂1ΣX_i = 0
2nβ̂0 + 2β̂1ΣX_i = 2ΣY_i
nβ̂0 + β̂1ΣX_i = ΣY_i ...............................................(3)

From equation (2):

−2ΣY_iX_i + 2β̂0ΣX_i + 2β̂1ΣX_i² = 0
2β̂0ΣX_i + 2β̂1ΣX_i² = 2ΣY_iX_i
β̂0ΣX_i + β̂1ΣX_i² = ΣY_iX_i ......................................(4)

The two resulting equations (3) and (4) are the famous NORMAL EQUATIONS.

STEP 5: Check whether the normal equations maximize or minimize the residual sum of squares Σe_i². To do so, we use the second-order conditions on equations (1) and (2), as follows:

∂²Σe_i²/∂β̂0² = 2n > 0 (minimum)

∂²Σe_i²/∂β̂1² = 2ΣX_i² > 0 (minimum)

Since the second-order conditions are positive definite, β̂0 and β̂1 will indeed MINIMIZE the residual sum of squares.

STEP 6: Express the two normal equations (3) and (4) in matrix form, as follows:

[ n      ΣX_i  ] [ β̂0 ]   [ ΣY_i    ]
[ ΣX_i   ΣX_i² ] [ β̂1 ] = [ ΣY_iX_i ]

STEP 7: Solve for β̂1 using CRAMER'S RULE (replace the β̂1 column of the coefficient matrix with the right-hand-side vector and divide by the determinant of the coefficient matrix).

Thus, in a simpler way, the OLS estimator β̂1 is given by the formula:

β̂1 = (nΣY_iX_i − ΣY_i·ΣX_i) / (nΣX_i² − (ΣX_i)²)

STEP 8: Obtain the intercept parameter β̂0, which is given by the formula:

β̂0 = Ȳ − β̂1X̄

where Ȳ = ΣY/n and X̄ = ΣX/n, i.e. the mean values.
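Before turning to the worked example below, the formulas from Steps 7 and 8 can be expressed in a few lines of code. The following is a minimal sketch in Python (the function name and data layout are illustrative assumptions, not part of the course text):

```python
# A minimal sketch of the closed-form OLS solutions derived above.
def ols_estimates(x, y):
    """Return (b0_hat, b1_hat) from paired observations x and y."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    # Slope: (n*Sum(XY) - Sum(Y)*Sum(X)) / (n*Sum(X^2) - (Sum(X))^2)
    b1 = (n * sum_xy - sum_y * sum_x) / (n * sum_x2 - sum_x ** 2)
    # Intercept: Ybar - b1 * Xbar
    b0 = sum_y / n - b1 * (sum_x / n)
    return b0, b1

# Using the sales/profit data of the example that follows:
sales  = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
profit = [2, 3, 5, 7, 8, 9, 11, 12, 14, 19]
print(ols_estimates(sales, profit))  # approximately (-0.2667, 0.1685)
```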

Example: Estimating a Regression Model Using Ordinary Least Squares

Recall Example 1 (on the sales and profit of ABC Company Limited) provided under Correlation Analysis. From that data (n = 10), we got the following:

ΣX = 550, ΣY = 90, ΣXY = 6,340, ΣX² = 38,500, ΣY² = 1,054


Using this information, we can now obtain the OLS estimators β̂0 and β̂1 respectively, as follows:

β̂1 = (nΣY_iX_i − ΣY_i·ΣX_i) / (nΣX_i² − (ΣX_i)²) = (10×6,340 − 550×90) / (10×38,500 − (550)²)

β̂1 = (63,400 − 49,500) / (385,000 − 302,500) = 13,900/82,500 = 0.1685

Similarly, in deviation form, β̂1 = Σxy/Σx² = 1,390/8,250 = 0.1685.

For β̂0 = Ȳ − β̂1X̄:

Ȳ = ΣY/n = 90/10 = 9 and X̄ = ΣX/n = 550/10 = 55, so

β̂0 = 9 − (0.1685×55) = 9 − 9.2667 = −0.2667 (using the unrounded slope 13,900/82,500)

Thus the OLS regression equation is: Ŷ = −0.2667 + 0.1685X

Interpretation of the results:

β̂0 = −0.2667: when sales (X) are zero, the expected or mean profit (Y) is Ksh −0.2667 (a loss).

β̂1 = 0.1685: an increase in sales by one unit will lead to an increase in profit of 0.1685 units, ceteris paribus.

From the OLS regression equation, we can also predict or forecast the value of Y for any
given value of X.
For example, given that X= 150, we can now predict Y as follows:
Ŷ = −0.2667 + 0.1685X
Ŷ = −0.2667 + 0.1685(150)
Ŷ ≈ 25
Apart from prediction or forecasting, we can equally calculate the elasticity of profit with respect to sales, using either point elasticity or arc elasticity, as follows:

For point elasticity, the elasticity of profit with respect to sales is:

e_{p,s} = (∂p/∂s)·(s/p)

Thus, since Profit = −0.2667 + 0.1685(Sales), then at the mean values of profit and sales we shall obtain:

e_{p,s} = (∂p/∂s)·(S̄/P̄)

where ∂p/∂s = 0.1685, S̄ = ΣS/n = 550/10 = 55 and P̄ = ΣP/n = 90/10 = 9.

Thus, e_{p,s} = 0.1685 × (55/9) = 1.0297
Interpretation:

A 1% increase in the value of sales will lead to a 1.0297% increase in profit, ceteris paribus. Thus, profit is relatively elastic with respect to sales.

For arc elasticity, the elasticity of profit with respect to sales is:

e_{p,s} = (∂p/∂s)·[(S1 + S2)/(P1 + P2)]

For example, if S1 = 40 and S2 = 60, then:

At S1 = 40: profit = −0.2667 + 0.1685(40) = 6.4733

At S2 = 60: profit = −0.2667 + 0.1685(60) = 9.8433

Arc elasticity: e_{p,s} = 0.1685 × [(40 + 60)/(6.4733 + 9.8433)] = 1.0327
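Both elasticity calculations are easy to verify in code. A minimal sketch, assuming the estimated coefficients above and the same S1 = 40 and S2 = 60:

```python
# Point and arc elasticity of profit with respect to sales,
# using the estimated regression Profit = -0.2667 + 0.1685*Sales.
b0, b1 = -0.2667, 0.1685

# Point elasticity at the sample means: e = (dp/ds) * (S_bar / P_bar)
s_bar, p_bar = 55, 9
print(round(b1 * s_bar / p_bar, 4))          # 1.0297

# Arc elasticity between S1 = 40 and S2 = 60
s1, s2 = 40, 60
p1 = b0 + b1 * s1                            # predicted profit at S1 (6.4733)
p2 = b0 + b1 * s2                            # predicted profit at S2 (9.8433)
print(round(b1 * (s1 + s2) / (p1 + p2), 4))  # 1.0327
```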
Finally, we can also obtain Ŷ (i.e. the estimated value of profits), the residuals (e_i) and the squared residuals (Σe_i², or RSS), as follows:

Time   X     Y    Ŷ = −0.2667 + 0.1685X   e_i = Y_i − Ŷ_i   e_i²
1      10    2    1.4183                   0.5817            0.3384
2      20    3    3.1033                  −0.1033            0.0107
3      30    5    4.7883                   0.2117            0.0448
4      40    7    6.4733                   0.5267            0.2774
5      50    8    8.1583                  −0.1583            0.0251
6      60    9    9.8433                  −0.8433            0.7111
7      70    11   11.5283                 −0.5283            0.2791
8      80    12   13.2133                 −1.2133            1.4721
9      90    14   14.8933                 −0.8933            0.8069
10     100   19   16.5833                  2.4167            5.8404

Σe_i = −0.008                                                Σe_i² = 9.806

Thus, the expected value or mean of the residuals e_i is:

ē = E(e_i) = Σe_i/n = −0.008/10 = −0.0008

Actually, the expected value or mean of the residual or error term should be zero. In this case, it is not exactly zero due to rounding off. Thus E(e_i) = 0.

The value 9.806 is called the sum of squared residuals, i.e. Σe_i² = 9.806 (RSS).
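The fitted values, residuals and RSS in the table can be checked with a short sketch like the following (small discrepancies arise because the coefficients are rounded to four decimal places):

```python
# Reproducing the residual table: fitted values, residuals and RSS.
x = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
y = [2, 3, 5, 7, 8, 9, 11, 12, 14, 19]
b0, b1 = -0.2667, 0.1685

y_hat = [b0 + b1 * xi for xi in x]               # estimated profits
resid = [yi - yh for yi, yh in zip(y, y_hat)]    # e_i = Y_i - Y_hat_i

print(round(sum(resid), 3))                # -0.008: zero up to rounding
print(round(sum(e ** 2 for e in resid), 2))  # about 9.80 (9.806 in the table)
```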

3.6 ASSUMPTIONS OF THE ORDINARY LEAST SQUARES

The OLS methodology is based on the following assumptions.

i. The expected value or the mean of the error term is zero: E(e_i) = Σe_i/n = 0.

This concept was introduced in the table provided above.

ii. The variance of the error term is constant, i.e.:

Var(e_i) = E[e_i − E(e_i)]² = E(e_i²) = σ² (sigma squared)

When the variance of the error term is constant, this is the assumption of HOMOSKEDASTICITY; otherwise, if the variance is not constant, that is a case of HETEROSKEDASTICITY, which is actually a violation of the OLS assumption of homoskedasticity. Therefore the error term should be homoskedastic. The problem of heteroskedasticity is common in cross-sectional data.

iii. The assumption of normality:

The error term is assumed to follow a normal distribution with a mean of zero and a variance of σ²: e_i ~ N(0, σ²)

iv. There is a linear relationship between the dependent variable and the independent variables: Y = α + βX + e

Thus, the relationship between X and Y is linear in the OLS parameters α and β.

v. Assumption of no multicollinearity:

Multicollinearity is a situation in which the independent variables (X1, X2, X3) are correlated. It is also a violation of the OLS assumptions. If there is a problem of multicollinearity, it means that we cannot obtain the values of the OLS parameters (α̂, β̂1, β̂2, etc.). Thus, there should not be a problem of multicollinearity: corr(X1, X2) = 0.

vi. Assumption of zero correlation between the independent variable and the error term; i.e., the error term and the independent variable should not be correlated: Cov(X_i, e_i) = 0

vii. The assumption of zero autocorrelation:

The error term in period (i) and the error term in period (j) should not be correlated. Thus, there should be no autocorrelation, otherwise known as SERIAL CORRELATION: Cov(e_i, e_j) = E[(e_i)(e_j)] = 0 for all i ≠ j.

The problem of autocorrelation is therefore a violation of the OLS assumptions, and is common in time series data.

viii. No outliers in the data

An outlier is a value that is very large or very small relative to the rest of the observations.

3.7 THE VARIANCE OF THE ERROR TERM

The variance of the error term, Var(u_i), is given by:

Var(u_i) = σ̂² = Σu_i²/(n − 2)

N/B: (n − 2) is called the degrees of freedom (df); we subtract 2 because the regression model we obtained has 2 OLS estimators, α̂ and β̂.

Σu_i² is the sum of squared residuals (RSS), which we found earlier: Σe_i² = 9.806.

Thus Var(u_i) = σ̂² = Σu_i²/(n − 2) = 9.806/(10 − 2) = 9.806/8 = 1.22575

3.8 THE STANDARD ERROR OF THE REGRESSION MODEL

The standard error of the regression model (se) is obtained by taking the square root of the variance of the error term, that is:

se = √Var(u_i) = σ̂ = √(Σu_i²/(n − 2))

Hence se = √1.22575 = 1.10714

N/B: The standard error of the regression model is actually the standard deviation of the Y values about the estimated regression line.

3.9 THE STANDARD ERROR OF THE OLS COEFFICIENTS

[Recall we are dealing with a sample drawn from a target population.] This brings into consideration the sampling distribution of the estimators.

a) The standard error of the slope coefficient

The standard error of the slope coefficient β̂, which is denoted by se(β̂), is given by:

se(β̂) = σ̂/√Σx²

From our example, we notice that σ̂ = 1.10714 and Σx² = 8,250. Thus:

se(β̂) = 1.10714/√8,250 = 1.10714/90.8295 = 0.01219

b) The standard error of the intercept parameter

The standard error of the intercept parameter α̂, which is denoted by se(α̂), is given by:

se(α̂) = √(ΣX_i²/(nΣx²))·σ̂

From our example, we notice that σ̂ = 1.10714, ΣX_i² = 38,500, n = 10 and Σx² = 8,250. Thus:

se(α̂) = √(38,500/(10×8,250)) × 1.10714 = 0.68313 × 1.10714 = 0.75632
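A minimal sketch verifying the variance and the standard errors just computed, assuming the sums from the example:

```python
import math

# Standard error of the regression and of the OLS coefficients.
n = 10
rss = 9.806        # sum of squared residuals
sum_X2 = 38500     # Sum(X_i^2), raw values
sum_x2 = 8250      # Sum(x_i^2), deviations from the mean

sigma2 = rss / (n - 2)                                    # 1.22575
sigma = math.sqrt(sigma2)                                 # 1.10714
se_slope = sigma / math.sqrt(sum_x2)                      # 0.01219
se_intercept = sigma * math.sqrt(sum_X2 / (n * sum_x2))   # 0.75632

print(round(sigma2, 5), round(sigma, 5))
print(round(se_slope, 5), round(se_intercept, 5))
```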

3.10 THE T-STUDENT RATIO

The t-student ratio is an important test statistic that we use in determining whether a particular variable or parameter is significant or not. This process is actually referred to as HYPOTHESIS TESTING.

The t-student ratio is thus given by the formula:

t = OLS estimator / standard error of the OLS estimator

Therefore the t-value for the slope coefficient is given as:

t(β̂) = β̂/se(β̂) = 0.1685/0.01219 = 13.8228

This is actually the calculated t-statistic for β̂.

On the other hand, the t-value for the intercept parameter can be obtained in a similar way, as follows:

t(α̂) = α̂/se(α̂) = −0.2667/0.75632 = −0.35258

3.11 THE COMPLETE REGRESSION MODEL

By complete regression model, we mean a regression model that should, at a snapshot, show:

- the OLS estimates,


- the standard errors of the OLS estimates
- the t values for the OLS estimates and
- the goodness of fit or coefficient of determination

Hence, we can now present the complete regression model for ABC Company, where we regressed profit (Y) on sales (X), as follows:

Profit = −0.2667 + 0.1685(Sales)
se        (0.75632)   (0.01219)
t-values  −0.35258    13.8228        R² = 0.9598

Recall: in this example, we obtained R² by squaring the correlation coefficient.

More formally, these results can be presented in a table of regression results as follows:

Profit     Coefficient   Std. Error   t-value
Constant   −0.2667       0.75632      −0.35258
Sales       0.1685       0.01219      13.8228

R² = 0.9598
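The complete table can also be reproduced in one step with a standard regression routine. A sketch using Python's statsmodels library (its output should agree with the hand calculations up to rounding):

```python
import numpy as np
import statsmodels.api as sm

sales  = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
profit = np.array([2, 3, 5, 7, 8, 9, 11, 12, 14, 19])

X = sm.add_constant(sales)          # adds the intercept column
results = sm.OLS(profit, X).fit()

print(results.params)     # about [-0.2667, 0.1685]
print(results.bse)        # standard errors: about [0.7563, 0.0122]
print(results.tvalues)    # about [-0.3526, 13.82]
print(results.rsquared)   # about 0.9598
```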

3.12 THE ADJUSTED R-SQUARED

Although the coefficient of determination or goodness of fit (R²) was found to be r² = 0.9598, this statistic usually has problems, i.e.:

i. We cannot compare the r² computed from models which have different dependent variables. Thus, any re-arrangement of the model will yield different values for r².

ii. The value of r² usually tends to increase as the number of independent variables in the model increases. With this, r² loses its usefulness, since we cannot tell whether it is measuring the goodness of fit or the number of independent variables.

iii. r² also cannot discriminate among models, i.e. it cannot tell us which particular model to choose among 2 or more models.

Due to the above limitations of r², an alternative measure of goodness of fit, known as adjusted r² (commonly r̄²), has been developed to help overcome these limitations of the simple r².

The adjusted r² is modified or adjusted so as to accommodate the changes in degrees of freedom that result from the addition or removal of independent variables in a regression model.

The formula for adjusted R² is:

R̄² = 1 − [(n − 1)/(n − k)](1 − R²)

Thus, from our example above, we notice that n = 10, k = 2 and R² = 0.9598, so:

R̄² = 1 − [(10 − 1)/(10 − 2)](1 − 0.9598)
R̄² = 1 − (9/8)(0.0402)
R̄² = 1 − 0.04523
R̄² = 0.9548

Interpretation:

Holding all other factors constant, sales (X) explains or accounts for 95.48% of the changes in profit (Y), when adjusted for degrees of freedom.

Always R̄² < R².
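A one-line check of this calculation, using the example's n, k and R²:

```python
# Adjusted R-squared: 1 - [(n-1)/(n-k)] * (1 - R^2)
n, k, r2 = 10, 2, 0.9598
adj_r2 = 1 - ((n - 1) / (n - k)) * (1 - r2)
print(round(adj_r2, 4))  # 0.9548
```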

3.13 PROPERTIES OF THE LEAST SQUARES ESTIMATORS

3.13.1 THE GAUSS-MARKOV THEOREM

The Gauss-Markov Theorem states as follows: “in the class of all linear and unbiased
estimators, the OLS estimators are BLU – Best, Linear and Unbiased estimators”.

An OLS estimator such as β̂ is said to be BLUE, i.e. a best linear unbiased estimator, if it has the following properties:

i. LINEAR

The dependent variable Y should be linear in the parameters, as shown below:

Y = α + βX + e_i

Notice that Y is linear in β and α.

Recall that β̂ = Σx_iy_i/Σx_i² = Σw_iY_i, where w_i is a weight.

If indeed Y is linear in β and α, then we can write:

β̂ = Σw_iY_i = w1Y1 + w2Y2 + w3Y3 + ...

α̂ = Σh_iY_i = h1Y1 + h2Y2 + h3Y3 + ...

where w_i and h_i are simply weights.

ii. UNBIASEDNESS

The average or expected value of β̂, denoted E(β̂), is equal to its true value β. Thus E(β̂) = β, or also E(β̂) − β = 0. In such a case, we say β̂ is an unbiased estimator of β.

Similarly, E(α̂) = α, i.e. α̂ is an unbiased estimator of α. To demonstrate that β̂1 is an unbiased estimator of β1, we proceed as follows:

o Assume the true model: Y_i = β0 + β1X_i + e_i

o Write the estimator as a weighted sum of the observations, β̂1 = Σa_iY_i, and substitute for Y_i:

β̂1 = Σa_i(β0 + β1X_i + e_i) = β0Σa_i + β1Σa_iX_i + Σa_ie_i

o Use the properties of the weights and the error term:

Σa_i = 0, Σa_iX_i = 1 and E(Σa_ie_i) = 0

Thus, E(β̂1) = 0 + β1 + 0 = β1. Thus β̂1 is an unbiased estimator of β1.

Next, we now want to demonstrate that E(α̂) = α.

- We start from the formula α̂ = Ȳ − β̂X̄.

- However, recall that Ȳ = ΣY_i/n and β̂ = Σw_iY_i.

- If we substitute these into the formula α̂ = Ȳ − β̂X̄, we get:

α̂ = ΣY_i/n − X̄Σw_iY_i = Σ[(1/n) − w_iX̄]Y_i

Let (1/n) − w_iX̄ = h_i, a constant weight. Therefore α̂ = Σh_iY_i.

- Since α̂ = Σ[(1/n) − w_iX̄]Y_i and Y_i = α + βX_i + e_i, we substitute Y_i into α̂ as follows:

α̂ = Σ[(1/n) − w_iX̄][α + βX_i + e_i]

Expanding, and using Σw_i = 0, Σw_iX_i = 1 and E(e_i) = 0, the weights satisfy Σh_i = 1 and Σh_iX_i = X̄ − X̄ = 0, so that all the extra terms vanish in expectation:

E(α̂) = αΣh_i + βΣh_iX_i = α

Thus α̂ is an unbiased estimator of α.

In general, therefore, the OLS estimators are unbiased estimators of their actual or population values.

(iii) EFFICIENT ESTIMATOR:

 By an efficient estimator, we mean that the OLS estimators have minimum variance in the class of linear and unbiased estimators.

 To demonstrate this, we shall work out two variances:

i. The variance of the OLS estimator β̂.

ii. The variance of another estimator β*, which is obtained by another econometric method.

 First, we obtain the variance of the OLS estimator β̂:

Var(β̂) = [se(β̂)]² = [σ/√Σx²]² = σ²/Σx²

 Next, we obtain the variance of another linear unbiased estimator β* = Σc_iY_i, whose weights differ from the OLS weights by an arbitrary amount, c_i = w_i + d_i. Its variance works out to:

Var(β*) = σ²Σc_i² = σ²/Σx² + σ²Σd_i²

(using the fact that the cross term Σw_id_i = 0 when β* is unbiased, and recalling that E(e_i²) = Var(e_i) = σ²).

Since σ²Σd_i² ≥ 0, with equality only when β* coincides with β̂, we note that:

Var(β̂) < Var(β*)

Thus the OLS estimator β̂ has minimum variance when compared to the variance of another estimator β* obtained from another econometric method. Thus β̂ is an efficient estimator.

In summary, the Gauss-Markov theorem states as follows: "Given the assumptions of the classical linear regression model, the ordinary least squares (OLS) estimators, in the class of unbiased linear estimators, have minimum variance, i.e. they are BLUE."

3.14 GOODNESS OF FIT

By goodness of fit, we mean: "How well does the sample regression line fit the data?" The goodness of fit, otherwise known as the coefficient of determination, is denoted by r².

The value of r² ranges from 0 to 1, i.e. from no goodness of fit to a perfect goodness of fit. Therefore: 0 ≤ r² ≤ 1.

The following steps illustrate the derivation of r²:

Step 1: Begin with an OLS regression model: Y_i = α̂ + β̂X_i + e_i

Recall that the sample regression line is Ŷ_i = α̂ + β̂X_i. Thus: Y_i = Ŷ_i + e_i

Step 2: Subtract the mean value of Y from both sides:

Y_i − Ȳ = Ŷ_i − Ȳ + e_i

Step 3: Square both sides and take summations (the cross-product term vanishes under the OLS normal equations):

Σ(Y_i − Ȳ)² = Σ(Ŷ_i − Ȳ)² + Σe_i²

From the above equation, we obtain 3 important sums of squares:

Σ(Y_i − Ȳ)² = Σy² = Total Sum of Squares (TSS)

Σ(Ŷ_i − Ȳ)² = Σŷ² = Explained Sum of Squares (ESS)

Σe_i² = Residual Sum of Squares (RSS)

Therefore: TSS = ESS + RSS

Step 4: Divide both sides of the equation by TSS:

TSS/TSS = ESS/TSS + RSS/TSS, i.e. 1 = ESS/TSS + RSS/TSS, or ESS/TSS = 1 − RSS/TSS

Step 5: The goodness of fit.

Now, the ratio ESS/TSS is called the goodness of fit (r²). Therefore:

r² = ESS/TSS = Σŷ²/Σy², or r² = 1 − RSS/TSS = 1 − Σe_i²/Σy²

Another formula for r² is:

r² = (Σxy)²/(Σx²·Σy²), or r² = β̂²(Σx²/Σy²)

Recall:

r² = 1 − Σe_i²/Σy² = 1 − 9.806/244 = 0.9598

Or: r² = (Σxy)²/(Σx²·Σy²) = (1,390)²/(8,250×244) = 0.9598, or 95.98%

Or: r² = β̂²(Σx²/Σy²) = (0.1685)²×(8,250/244) = 0.9598, or 95.98%
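The three formulas can be cross-checked with a short sketch using the sums from the example (the third differs slightly because β̂ is rounded):

```python
# Three equivalent ways of computing the goodness of fit r^2.
rss, tss = 9.806, 244
sum_xy, sum_x2, sum_y2 = 1390, 8250, 244
b1 = 0.1685

print(round(1 - rss / tss, 4))                   # 0.9598
print(round(sum_xy ** 2 / (sum_x2 * sum_y2), 4)) # 0.9598
print(round(b1 ** 2 * sum_x2 / sum_y2, 4))       # about 0.9600 (rounded b1)
```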

3.15 CONFIDENCE INTERVAL ESTIMATION

Confidence interval estimation aims at constructing an interval around the OLS estimators.

The confidence interval for the slope coefficient β is given as follows:

β̂ − t_{α/2}·se(β̂) ≤ β ≤ β̂ + t_{α/2}·se(β̂), with confidence 1 − α

Where:

- β̂ is the estimated OLS estimator for β;

- t_{α/2} is the critical t value for a two-tailed test at n − k degrees of freedom;

- se(β̂) is the standard error of the slope coefficient β̂;

- α is the level of significance, e.g. 1%, 5% or 10%;

- 1 − α is the confidence level, e.g. 99%, 95% or 90%.

The figure below illustrates the confidence interval for β̂:

[Figure: the sampling distribution of β̂, with the interval from β̂ − t_{α/2}·se(β̂) to β̂ + t_{α/2}·se(β̂) marked; each shaded tail has area α/2 and the middle region has area 1 − α.]
In the diagram above, the shaded part is the rejection region, while the un-shaded part is
the acceptance region.

The following table shows the appropriate critical t values at various levels of significance for one-tail and two-tail tests:

Level of significance   One-tail   Two-tail   t-critical (one-tail)   t-critical (two-tail)   1−α
α = 1%                  0.01       0.005      2.326                   2.576                   99%
α = 5%                  0.05       0.025      1.645                   1.960                   95%
α = 10%                 0.10       0.05       1.282                   1.645                   90%

N/B: these are the asymptotic (large-sample) critical values; for small samples, the t table at n − k degrees of freedom is used instead (e.g. t_{0.025, 8df} = 2.306, used below).

For example, a 95% confidence interval for β at a two-tailed test is obtained as follows:

β̂ = 0.1685; n = 10; k = 2; n − k = (10 − 2) = 8 degrees of freedom; se(β̂) = 0.01219; and 1 − α = 0.95, hence α = 5% = 0.05.

Thus:

β̂ − t_{α/2, 8df}·se(β̂) ≤ β ≤ β̂ + t_{α/2, 8df}·se(β̂) = 1 − α

β̂ − t_{0.025, 8df}·se(β̂) ≤ β ≤ β̂ + t_{0.025, 8df}·se(β̂) = 95%

0.1685 − (2.306×0.01219) ≤ β ≤ 0.1685 + (2.306×0.01219)

0.1685 − 0.02811 ≤ β ≤ 0.1685 + 0.02811

0.1404 ≤ β ≤ 0.1966

Hence, the 95% confidence interval for β is 0.1404 ≤ β ≤ 0.1966.
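The interval can be reproduced with a few lines of code; the sketch below uses scipy for the critical t value:

```python
from scipy import stats

b1, se_b1 = 0.1685, 0.01219
n, k, alpha = 10, 2, 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df=n - k)   # about 2.306
half_width = t_crit * se_b1                     # about 0.02811
print(round(b1 - half_width, 4), round(b1 + half_width, 4))  # 0.1404 0.1966
```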

3.16 HYPOTHESIS TESTING

A hypothesis is a guess or a hunch about something.

By hypothesis testing, we mean: “can our regression results be trusted?” or also, “Do our
regression estimates matter?”

There are 2 types of hypotheses:

- The null hypothesis

- The alternative hypothesis

The null hypothesis is the hypothesis of interest. It is usually denoted by H0. For example, to test whether the slope coefficient is significant, we state: H0: β̂ = 0.

The alternative hypothesis is the hypothesis that is tested against the hypothesis of interest, i.e. the null hypothesis. The alternative hypothesis is denoted by H1 or HA. For example, to test whether the slope coefficient is significant, the alternative hypothesis is stated as follows:

- H1: β̂ ≠ 0 for the case of a two-tailed test

- H1: β̂ > 0 or H1: β̂ < 0 for the case of a one-tailed test.

Point to note:

The hypothesis H0: β̂ = 0 means as follows:
- The slope coefficient is equal to zero, or
- The slope coefficient is not statistically significant, or
- X does not influence Y

The hypothesis H1: β̂ ≠ 0 means as follows:
- The slope coefficient is different from zero,
- The slope coefficient is statistically significant,
- X does influence Y
In hypothesis testing, there are 2 possible types of errors that can be committed, i.e. Type I error and Type II error.

A Type I error occurs when we reject the null hypothesis when, in actual sense, it should not have been rejected; i.e. "killing an innocent man."

A Type II error occurs when we do not reject (accept) the null hypothesis when, in actual sense, it should have been rejected; i.e. "letting a guilty man go scot-free."

The aim of hypothesis testing is to reduce the chances of committing both Type I and Type II errors. This is the reason why, in hypothesis testing, we specify the level of significance (α = 1%, 5% or 10%).

There are 3 common approaches used in hypothesis testing:


1. The confidence interval approach
2. The test of significance approach
3. The probability-value (P-Value) approach
3.16.1 HYPOTHESIS TESTING USING CONFIDENCE INTERVAL APPROACH
The decision rule for hypothesis testing using the confidence interval approach states as
follows:
“If the OLS parameter of interest under the Null hypothesis falls within the constructed
confidence interval, we do not reject the Null hypothesis. However, if it falls outside the
confidence interval, then we reject the Null hypothesis.”
This decision rule is demonstrated as under:
[Figure: a distribution with rejection regions in both tails and the acceptance region in the middle.]

¿
Earlier on, we had constructed a 95% confidence interval for β and obtained the confidence

interval as: 0 . 1404≤ β≤0 .1966=95 %

From this confidence interval, we can test the following hypotheses:


¿ ¿
Ho: β =0 Ho: β =0 .16
¿ ¿
H : β ≠0
i) A
H : β ≠0 . 16
ii) A

¿
For the first set of hypothesis, we notice that the value β =0 does not lie within the
confidence interval, i.e. it lies in the REJECTION REGION. Thus, we reject the null
hypothesis or accept the alternative hypothesis.

¿ ¿
In conclusion, it means β is not equal to zero, or we could say, β is statistically different
from zero.

¿
For the second set of hypothesis, we notice that the value β =0 . 16 actually lies within the
confidence interval, i.e. it lies in the ACCEPTANCE REGION. Thus, we accept or (do not
reject) the null hypothesis.

¿ ¿
In conclusion, it means that β is statistically equal to 0.16 or that β is not statistically
different from 0.16.

3.16.2 HYPOTHESIS TESTING USING THE TEST OF SIGNIFICANCE APPROACH

The test of significance (t-test) approach is the one most commonly used for hypothesis testing in econometrics. In this approach, which is similar in spirit to the confidence interval approach, the null and alternative hypotheses are stated respectively as:

H0: β̂ = β*;  HA: β̂ ≠ β*

such that β̂ is the estimated OLS coefficient and β* is a hypothesized or guessed value of β.

The general formula for the t-test is as follows:

t-calculated = (β̂ − β*)/se(β̂)

where se(β̂) is the standard error of the OLS parameter β̂.

If β̂ > β*, then t-calculated will be positive; if β̂ < β*, then t-calculated will be negative. Irrespective of the sign of t-calculated, we always take its ABSOLUTE VALUE.

Having obtained t-calculated, we then proceed to obtain the critical value of the t statistic, i.e. t-critical, from the t tables.

The critical t is obtained as follows:

t-critical = t_{α/2, n−k df}, for a two-tailed test

t-critical = t_{α, n−k df}, for a one-tailed test

The decision rule for hypothesis testing using the test of significance approach states as follows:

"If |t-calculated| is greater than t-critical, reject the null hypothesis; but if |t-calculated| is less than t-critical, do not reject (accept) the null hypothesis."

For example, we can now test the following hypotheses using the t-test approach, assuming a level of significance α = 5%:

i. H0: β̂ = 0;  HA: β̂ ≠ 0

ii. H0: β̂ = 0.16;  HA: β̂ ≠ 0.16

For the first set of hypotheses, we can obtain t-calculated as follows:

t-calculated = (β̂ − β*)/se(β̂)

where β̂ = 0.1685, β* = 0, and se(β̂) = 0.01219. Thus:

t-calculated = (0.1685 − 0)/0.01219 = 13.8228

Then t-critical = t_{α/2, n−k}, where α = 5%, α/2 = 2.5% = 0.025, n = 10, k = 2, and n − k = 8 df. Thus, t-critical = t_{0.025, 8df} = 2.306.

Upon comparing t-calculated and t-critical, we notice that t-calculated > t-critical. Thus, according to our decision rule, we reject the null hypothesis and do not reject (accept) the alternative hypothesis.

In conclusion, we can therefore say that β̂ is not equal to zero; or, we could say, β̂ is statistically different from zero.

For the second set of hypotheses, we can obtain t-calculated as follows:

t-calculated = (β̂ − β*)/se(β̂) = (0.1685 − 0.16)/0.01219 = 0.6973

The value of t-critical remains the same: t-critical = 2.306. Upon comparing t-calculated and t-critical, we notice that t-calculated < t-critical. Thus, following the decision rule, we do not reject (accept) the null hypothesis. In conclusion, we can therefore say that β̂ is statistically equal to 0.16.

POINT TO NOTE: The conclusions from the confidence interval approach actually
resemble the conclusions from the test of significance approach and this must always be so.
Indeed, the confidence interval approach is simply a mirror image of the test of significance
approach.

3.16.3 HYPOTHESIS TESTING USING THE PROBABILITY (P) VALUE APPROACH

The probability (P) value approach is also an ideal way of testing hypotheses. The P-value states the smallest level of significance (α) at which the null hypothesis can be rejected.

The beauty of the P-value approach is that most computer software (Excel, SPSS, STATA, EViews, SHAZAM, RATS, etc.) automatically provides this P-value whenever you run a regression.

For example, if the software reports a P-value of 0.07, it means that the smallest significance level at which we can reject the null hypothesis is 7%. Thus, we can reject the null hypothesis at α = 10%, but we cannot reject it at α = 5% or α = 1%.

The table below summarizes some P-values and significance levels (the last three columns ask: is the coefficient significant at that level?):

P-value      Details                          α = 1%   α = 5%   α = 10%
P = 0.0000   β̂ is significant at all levels   Yes      Yes      Yes
P = 0.035    β̂ is significant at 3.5%         No       Yes      Yes
P = 0.074    β̂ is significant at 7.4%         No       No       Yes
P = 0.1025   β̂ is significant at 10.25%       No       No       No

In summary, the smaller the P-value, the more significant β̂ is.
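The P-value reported by such software is simply the two-tailed tail probability of the calculated t statistic. A sketch for the slope of our example:

```python
from scipy import stats

t_calc, df = 13.8228, 8                            # slope t-value, n - k df
p_value = 2 * (1 - stats.t.cdf(abs(t_calc), df))   # two-tailed p-value
print(p_value)   # roughly 7e-07: significant at 1%, 5% and 10%
```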

3.17 REGRESSION ANALYSIS AND ANALYSIS OF VARIANCE

Analysis of variance (ANOVA) is a study of the Total Sum of Squares (TSS) and its components, i.e., the Explained Sum of Squares (ESS) and the Residual Sum of Squares (RSS).

The concept here is that ESS + RSS = TSS, i.e.:

Σŷ² + Σû² = Σy²

By dividing the sums of squares (SS) by their associated degrees of freedom (df), we get the mean sums of squares (MSS). The ANOVA table therefore shows the source of variation, the sum of squares (SS), the degrees of freedom (df) and the mean sum of squares (MSS).

Source of variation        Sum of squares (SS)   df     Mean sum of squares (MSS)
Due to regression (ESS)    Σŷ² or β̂²Σx²          k−1    ESS/df = β̂²Σx²/(k−1) = MSS_reg
Due to residuals (RSS)     Σû²                   n−k    RSS/df = Σû²/(n−k) = MSS_res
Total (TSS)                Σy²                   n−1

From the ANOVA table, the F statistic is computed as follows:

F = MSS_reg/MSS_res = [ESS/(k−1)] / [RSS/(n−k)]

The F statistic follows the F distribution with (k−1) degrees of freedom in the numerator and (n−k) degrees of freedom in the denominator. The F statistic is used to test the overall significance of the model:

If F-calculated > F-critical, the model is statistically significant.

If F-calculated < F-critical, the model is not statistically significant.

EXAMPLE

Recall the example of the sales (X) and profit (Y) of ABC Company Limited for a period of 10 years. The following values were obtained:

Σx² = 8,250, Σy² = 244, β̂ = 0.1685, Σe_i² = 9.806, and n = 10, k = 2

Thus, Total Sum of Squares (TSS) = Σy² = 244

Explained Sum of Squares (ESS) = β̂²Σx² = (0.1685)²×8,250 = 234.236

Residual Sum of Squares (RSS) = Σû² = 9.806

Notice that 234.236 + 9.806 ≈ 244 (up to rounding).

Hence, the ANOVA table is as follows:

Source of variation        Sum of squares (SS)   df          Mean sum of squares (MSS)
Due to regression (ESS)    234.236               2−1 = 1     234.236/1 = 234.236
Due to residuals (RSS)     9.806                 10−2 = 8    9.806/8 = 1.2258
Total (TSS)                244.000               10−1 = 9

F = 234.236/1.2258 = 191.09

The critical value of F at 5% is given as follows:

Critical F = F_{k−1, n−k, α} = F_{1, 8, 5%} = 5.32

N.B: The F test is always a one-tailed test. We notice that F-calculated is greater than F-critical, i.e. F_cal > F_crit.

Conclusion: The overall model is statistically significant.
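The ANOVA table and the F test can be checked with the following sketch (scipy supplies the critical F value):

```python
from scipy import stats

n, k = 10, 2
ess, rss = 234.236, 9.806

mss_reg = ess / (k - 1)                    # 234.236
mss_res = rss / (n - k)                    # about 1.2258
f_calc = mss_reg / mss_res                 # about 191.1 (191.09 in the text)

f_crit = stats.f.ppf(0.95, k - 1, n - k)   # about 5.32
print(round(f_calc, 2), round(f_crit, 2))
print(f_calc > f_crit)                     # True: the model is significant overall
```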
