Master of Science (Economics)
Symbiosis School of Economics, Pune
Advanced Econometrics I
Unit I: Classical Linear Regression Model – two
variable model
(Para 2)
Professor Ajit Karnik
Basic Data Analysis
• All pieces of empirical work should begin with some
basic data analysis
– Eyeball the data
– Summarise the properties of the data series
– Examine the relationship between data series
• Most powerful analytic tools are your eyes and your
common sense
– Computers still suffer from “Garbage in - garbage out”
What to do when?
• Descriptive statistics (Summary Statistics)
– One variable:
    – Mean or average value
    – Minimum and maximum value
    – Mode and median
    – Variance and standard deviation
– Two variables (in addition):
    – Covariance
    – Correlation
    – Cross-plot (or scattergram or scatter plot)
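As an illustration of the one-variable summary statistics listed above, here is a minimal sketch using only the Python standard library. The data values are hypothetical and are not taken from the lecture's data set.

```python
import statistics

# Hypothetical sample, chosen only for illustration
x = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

summary = {
    "mean": statistics.mean(x),
    "min": min(x),
    "max": max(x),
    "mode": statistics.mode(x),          # most frequent value
    "median": statistics.median(x),      # middle value of the sorted data
    "variance": statistics.variance(x),  # sample variance, divisor n - 1
    "sd": statistics.stdev(x),           # square root of the sample variance
}

for name, value in summary.items():
    print(f"{name:>8}: {value:.4f}")
```

Note that `statistics.variance` and `statistics.stdev` use the sample (n - 1) divisor, which matches the covariance formula used later in these notes.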
Descriptive Statistics
Cross-country data for 75 countries over the 1970-2008 period
Covariance & Correlation
• Descriptive statistics for two variables
– $\operatorname{cov}(X,Y) = \dfrac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$
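• The sample correlation coefficient between X and Y is symbolised by r or $r_{xy}$:
$r_{xy} = \dfrac{\sum_{i}(Y_i - \bar{Y})(X_i - \bar{X})}{\sqrt{\sum_{i}(Y_i - \bar{Y})^2 \sum_{i}(X_i - \bar{X})^2}} = \dfrac{\operatorname{Cov}(X,Y)}{sd(X) \cdot sd(Y)}$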
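The two formulas above can be checked by hand on a small sample. The sketch below implements them directly. The function names `sample_cov` and `sample_corr` and the data are hypothetical, introduced only for this illustration.

```python
import math

def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    """Correlation = cov(X, Y) / (sd(X) * sd(Y)); cov(X, X) is the variance of X."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_xy = sample_cov(x, y)
r_xy = sample_corr(x, y)
print(f"cov(X, Y) = {cov_xy:.4f}")   # 1.5000
print(f"r_xy      = {r_xy:.4f}")     # 0.7746
```

The covariance depends on the units of X and Y, whereas the correlation always lies between -1 and 1, which is why the two numbers in the Dow Jones vs FTSE example differ so much in magnitude.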
EXAMPLE: CORRELATION
DOW JONES VS FTSE
[Figure: cross-plot of Dow Jones vs FTSE, in deviations from the mean]
COVARIANCE = 2111536.745
CORRELATION = 0.8939
Regression
Objectives
• Ordinary Least Squares (OLS) estimator
• How to derive the OLS estimates
• Assumptions of the Classical Linear Regression model
• Numerical properties of the OLS estimator
• Derivation of actual and fitted values
• Coefficient of determination
What Kind of Problems Do We Research?
• What is the effect of exchange rate on exports?
• What is the effect of the interest rate on inflation?
• What is the effect of minimum wages on
unemployment?
• What is the effect of additional police force on the
crime rate?
• What factors determine whether a person buys a car
or not? (Note: the dependent variable, buying a car, is
a binary variable)
Kinds of Data
• Time Series Data: Problem of non-stationarity
• Cross-section Data: we will assume that the data we
are using is cross-section data
• Time-series and Cross-section data: Panel Data
Regression analysis: the basic story
Regression analysis is largely concerned with estimating
and/or predicting the population mean value of the dependent
variable on the basis of the known or fixed values of the
explanatory variables.
y is a function of x
y depends on x
y is determined by x
“the spot exchange rate depends on relative price levels and interest
rates…”
Regression and Correlation
If we say y and x are correlated, it means that we
are treating y and x in a symmetric way.
In regression, we treat the dependent variable (y)
and the independent variable(s) (x’s) very
differently
◦ The y variable is assumed to be random or “stochastic” in
some way, i.e. to have a probability distribution.
◦ The x variables are assumed to have fixed (“non-
stochastic”) values in repeated samples.
Deterministic versus stochastic relationships
(1) y = 8 + 3x
– y is known exactly if x is known
– x is known exactly if y is known
• Which is the dependent variable here?
(2) y = 8 + 3x + u
– The term ‘u’ is the error or disturbance term and it contains
all factors affecting y other than x.
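To make the contrast concrete, the following sketch generates data from both relationships. The choice of standard normal disturbances and the seed are assumptions made only for this illustration; the lecture does not specify a distribution for u at this point.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

x = [float(i) for i in range(1, 6)]

# (1) Deterministic: y is an exact function of x
y_det = [8 + 3 * xi for xi in x]

# The relationship can be inverted exactly, so x is recovered from y
x_back = [(yi - 8) / 3 for yi in y_det]

# (2) Stochastic: add a disturbance u (assumed N(0, 1) here)
u = [random.gauss(0, 1) for _ in x]
y_sto = [8 + 3 * xi + ui for xi, ui in zip(x, u)]

for xi, yd, ys in zip(x, y_det, y_sto):
    print(f"x={xi:.0f}  deterministic y={yd:.2f}  stochastic y={ys:.2f}")
```

In case (1), knowing either variable pins down the other exactly. In case (2), the same x can be associated with different values of y, depending on the draw of u.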
Econometric Model Building
1. Understand the economic theory
2. Derive an estimable model
3. Collect data
4. Estimate the model
5. Evaluate estimation results
– Satisfactory: interpret and use the model
– Unsatisfactory: re-estimate the model with better data
Finding the Line of Best Fit
• We can use the general equation for a straight line,
y = α + βx
to get the line that best “fits” the data.
• But this equation (y = α + βx) is completely deterministic.
• Is this realistic? No. So we add a random disturbance term, u, to the equation:
$y_i = \alpha + \beta x_i + u_i$
where i = 1, 2, …, n
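One standard way to make "best fit" precise is the least-squares criterion named in the objectives. This part of the notes does not derive it, so the following is only a sketch of the usual result. The hats denote estimates, and $\bar{x}$ and $\bar{y}$ are sample means.

```latex
% Choose the intercept and slope that minimise the sum of squared residuals
\min_{\hat{\alpha},\,\hat{\beta}} \; \sum_{i=1}^{n} \left( y_i - \hat{\alpha} - \hat{\beta} x_i \right)^2

% Setting the two first derivatives to zero gives the familiar solution
\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}
```

Note that the numerator of $\hat{\beta}$ is (n - 1) times the sample covariance of x and y, and the denominator is (n - 1) times the sample variance of x.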
Going back a step or two: Relationship
We are talking about statistical relationships:
$Y = \alpha + \beta X + u$
The term ‘u’ is the error or disturbance term and it contains
all factors (omitted variables) affecting y other than x
– Measurement problem: data could be “noisy”
– Wrong functional form (mis-specification)
• The “true” model could be
𝑌 = 𝛼 + 𝛽𝑋 + 𝑢
Or
𝑌 = 𝛼 + 𝛽√𝑋 + 𝑢
Going back a step or two: Relationship (2)
• We want a model with as little ERROR as possible
• Suppose the TRUE model is
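$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + u$
• But we estimate
$Y = \alpha + \beta_1 X_1 + u$
• Now we will have a model with the following
disturbance term:
$Y = \alpha + \beta_1 X_1 + \varepsilon$; where $\varepsilon = \beta_2 X_2 + u$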
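The effect of leaving out X2 can be seen in a small simulation. This is a sketch under stated assumptions: the parameter values, the normal distributions and the independence of X1, X2 and u are all hypothetical choices made for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical "true" parameters
ALPHA, BETA1, BETA2 = 1.0, 0.5, 2.0
n = 1000

x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]

# True model: Y = alpha + beta1*X1 + beta2*X2 + u
y = [ALPHA + BETA1 * a + BETA2 * b + e for a, b, e in zip(x1, x2, u)]

# If X2 is omitted, everything not explained by X1 ends up in the disturbance
eps = [yi - ALPHA - BETA1 * a for yi, a in zip(y, x1)]

print(f"variance of u   : {statistics.variance(u):.3f}")
print(f"variance of eps : {statistics.variance(eps):.3f}")
```

The composite disturbance is noticeably noisier than u because it also carries the variation in X2. The simulation draws X2 independently of X1; if the two were correlated, the composite disturbance would also be correlated with the included regressor.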
Mean as OLS Estimate
• The mean is an Ordinary Least Squares (OLS)
estimate
• This is exciting because
– OLS estimators are BLUE
– Proven with Gauss-Markov Theorem
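The claim that the mean is a least-squares estimate can be checked numerically: among all constants c, the sample mean gives the smallest sum of squared deviations. The sketch below uses hypothetical data and a hypothetical helper `sse`, and compares the mean with the median and with a grid of candidate values.

```python
import statistics

# Hypothetical sample
y = [3.0, 5.0, 6.0, 10.0, 16.0]

def sse(c):
    """Sum of squared deviations of y from a constant c."""
    return sum((yi - c) ** 2 for yi in y)

mean_y = statistics.mean(y)      # 8.0
median_y = statistics.median(y)  # 6.0

# Search a grid of candidate constants 0.0, 0.1, ..., 20.0
grid = [k / 10 for k in range(0, 201)]
best_c = min(grid, key=sse)

print(f"SSE at mean   ({mean_y}): {sse(mean_y)}")      # 106.0
print(f"SSE at median ({median_y}): {sse(median_y)}")  # 126.0
print(f"grid minimiser: {best_c}")                     # 8.0
```

This is equivalent to regressing y on a constant only: the least-squares "intercept" is the sample mean.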
BLUE Estimators
• Best
– Minimum variance (of all possible unbiased estimators)
– Narrower distribution than other estimators
• e.g. median, mode
• Linear
– Linear predictions
– For the mean
– Linear (straight, flat) line
– Linearity in variables & linearity in parameters
• Non-Linearity:
𝑌 = 𝛽 + 𝛽 𝑋 + 𝛽 𝑋 + 𝜀 Non-linear in variable
𝑌 = 𝛽 + 𝛽 𝑋 + 𝛽 𝑋 + 𝑢 Non-linear in parameter
BLUE Estimators
• Unbiased
– Centred around true (population) values
– Expected value = population value
• Also, consistent
– As the sample size approaches infinity, the estimates get closer to the population
values
– Variance shrinks
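Unbiasedness and consistency can be illustrated with a small Monte Carlo experiment on the sample mean, which the notes treat as an OLS estimate. This is a sketch under assumptions: a normal population with mean 5 and standard deviation 2, 500 repeated samples per sample size, and a fixed seed are all arbitrary choices.

```python
import random
import statistics

random.seed(42)        # fixed seed so the run is reproducible
MU, SIGMA = 5.0, 2.0   # assumed population mean and standard deviation
REPS = 500             # repeated samples drawn for each sample size

def sample_mean(n):
    """Draw one sample of size n and return its mean."""
    return sum(random.gauss(MU, SIGMA) for _ in range(n)) / n

results = {}
for n in (10, 100, 1000):
    estimates = [sample_mean(n) for _ in range(REPS)]
    # Centre and spread of the estimator across repeated samples
    results[n] = (sum(estimates) / REPS, statistics.variance(estimates))

for n, (centre, spread) in results.items():
    print(f"n={n:>4}  mean of estimates={centre:.3f}  variance of estimates={spread:.5f}")
```

For every sample size the estimates are centred close to the population value (unbiasedness), and their variance falls as n grows (consistency).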
PRF and SRF
POPULATION REGRESSION FUNCTION (PRF):
$Y_i = \alpha + \beta X_i + u_i$, i = 1, 2, …, n
Our objective is to get estimates of the unknown parameters alpha and
beta, given ‘n’ observations on Y and X.
SAMPLE REGRESSION FUNCTION (SRF):
$Y_i = \hat{\alpha} + \hat{\beta} X_i + \hat{u}_i$
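The SRF splits each observed value into a fitted value and a residual. The sketch below computes the estimates with the standard OLS formulas (slope = sum of cross-deviations over sum of squared deviations of X, intercept = mean of Y minus slope times mean of X), which this part of the notes does not derive. The data are hypothetical.

```python
# Hypothetical sample of n = 5 observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Standard OLS estimates for the two-variable model
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Fitted values and residuals: actual = fitted + residual
fitted = [alpha_hat + beta_hat * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.2f}")
for yi, fi, ri in zip(y, fitted, resid):
    print(f"actual={yi:.1f}  fitted={fi:.2f}  residual={ri:+.2f}")
print(f"sum of residuals = {sum(resid):.10f}")
```

The residuals are the sample counterparts of the unobserved disturbances. With an intercept in the model they sum to zero, which is one of the numerical properties of OLS listed in the objectives.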
Population regression line, sample data points
and the associated error terms
[Figure: population regression line $E(y|x) = \beta_0 + \beta_1 x$, with sample points $(x_1, y_1)$ to $(x_4, y_4)$ and their error terms $u_1$ to $u_4$]
Sample regression line, sample data points
and the associated estimated error terms
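[Figure: the Sample Regression Function (SRF), $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, with slope $\hat{\beta}_1 = \Delta\hat{y}/\Delta x$, drawn through sample points $(x_1, y_1)$ to $(x_4, y_4)$ with residuals $\hat{u}_1$ to $\hat{u}_4$, alongside the Population Regression Function (PRF), $E(y|x) = \beta_0 + \beta_1 x$]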