Short Guides to Microeconometrics
Kurt Schmidheiny, University of Basel
Fall 2023

Clustering in the Linear Model
(matrix-free version)
1 Introduction
This handout extends the handout on “The Multiple Linear Regression
model” and refers to its definitions and assumptions in section 2. It relaxes
the homoscedasticity assumption (OLS4a) and allows the error terms to
be heteroscedastic and correlated within groups or so-called clusters. It
shows in what situations the parameters of the linear model can be consis-
tently estimated by OLS and how the standard errors need to be corrected.
The canonical example (Moulton 1986, 1990) for clustering is a regres-
sion of individual outcomes (e.g. wages) on explanatory variables of which
some are observed on a more aggregate level (e.g. employment growth on
the state level).
Clustering also arises when the sampling mechanism first draws a ran-
dom sample of groups (e.g. schools, households, towns) and then surveys
all (or a random sample of) observations within that group. Stratified
sampling, where some observations are intentionally under- or oversam-
pled, calls for more sophisticated techniques.
2 The Econometric Model
Consider the multiple linear regression model
ygi = β0 + β1 xgi1 + ... + βK xgiK + ugi
where observations belong to a cluster g = 1, ..., G and observations are
indexed by i = 1, ..., M within their cluster. G is the number of clusters,
Version: 3-1-2024, 14:10
M is the number of observations per cluster, and N = Σg M = GM is
the total number of observations. For notational simplicity, M is assumed
constant in this handout. It is easily generalized to a cluster specific
number Mg . ygi is the dependent variable, xgi1 , ..., xgiK are K explanatory
variables, β0 , ..., βK are K + 1 parameters, and ugi is the error term.
The data generation process (dgp) is fully described by:
CL1: Linearity
ygi = β0 + β1 xgi1 + ... + βK xgiK + ugi and E[ugi ] = 0
CL2: Independence
{xg11, ..., xgMK, yg1, ..., ygM} for g = 1, ..., G
i.i.d. (independent and identically distributed)
CL2 assumes that the observations in one cluster are independent from
the observations in all other clusters. It does not assume independence of
the observations within clusters.
CL3: Strict Exogeneity
a) ugi | xg11, ..., xgMK ∼ N(0, σ²gi)
b) ∀ j, k: ugi ⊥ xgjk (independent)
c) E[ugi | xg11, ..., xgMK] = 0 (mean independent)
d) ∀ k, j: Cov[xgjk, ugi] = 0 (uncorrelated)
CL3 assumes that the error term ugi is unrelated to all explanatory vari-
ables of all observations within its cluster.
CL4: Clustered Errors
V[ugi | xg11, ..., xgMK] = σ²gi, with 0 < σ²gi < ∞
Cov[ugi, ugj | xg11, ..., xgMK] = ρgij σgi σgj < ∞, for all i ≠ j
CL4 means that the error terms are allowed to have different variances and
to be correlated within clusters conditional on all explanatory variables
of all observations within the cluster.
Under CL2, CL3c and CL4, the conditional variances and covariances
across all error terms are
V(ugi | xg11, ..., xgMK) = σ²gi
Cov(ugi, ugj | xg11, ..., xgMK) = ρgij σgi σgj, i ≠ j
Cov(ugi, uhj | xg11, ..., xgMK, xh11, ..., xhMK) = 0, g ≠ h
CL5: Identifiability
(1, xgi1, ..., xgiK) are not linearly dependent
0 < V[xgik] < ∞ and 0 < V̂[xgik]
CL5 assumes that the regressors have identifying variation (non-zero vari-
ance) and are not perfectly collinear.
3 A Special Case: Random Cluster-specific Effects
Suppose, as in Moulton (1986), that the error term ugi consists of a
cluster-specific random effect cg and an individual effect vgi

ugi = cg + vgi
Assume that the individual error term is strictly exogenous, homoscedastic
and independent across all observations
E[vgi | xg11, ..., xgMK] = 0
V[vgi | xg11, ..., xgMK] = σ²v
Cov[vgi, vgj | xg11, ..., xgMK] = 0, i ≠ j
and that the cluster specific effect is exogenous, homoscedastic and un-
correlated with the individual effect
E[cg | xg11, ..., xgMK] = 0
V[cg | xg11, ..., xgMK] = σ²c
Cov[cg, vgi | xg11, ..., xgMK] = 0
The resulting variances and covariances of the combined error term
ugi = cg + vgi are then within each cluster g
V[ugi | xg11, ..., xgMK] = σ²u
Cov[ugi, ugj | xg11, ..., xgMK] = ρu σ²u, i ≠ j

where σ²u = σ²c + σ²v and ρu = σ²c/(σ²c + σ²v). This structure is called
equicorrelated errors. In a less restrictive version, σ²u and ρu are allowed
to be cluster specific as functions of xg11, ..., xgMK.
Note: this structure is formally identical to a random effects model for
panel data with many “individuals” g observed over few “time periods” i.
The cluster-specific random effect is also called an unrelated effect.
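To make the equicorrelated structure concrete, here is a minimal R simulation sketch (my own illustration, not part of the handout; all parameter values are made up). It draws ugi = cg + vgi with σc = 1 and σv = 2 and checks that two observations from the same cluster have correlation close to ρu = σ²c/(σ²c + σ²v) = 0.2.

# simulate equicorrelated errors u_gi = c_g + v_gi (illustrative values)
set.seed(123)
G <- 1000; M <- 10                  # clusters and observations per cluster
sigma_c <- 1; sigma_v <- 2          # standard deviations of c_g and v_gi
c_g  <- rep(rnorm(G, 0, sigma_c), each = M)
v_gi <- rnorm(G * M, 0, sigma_v)
u    <- c_g + v_gi
u_mat <- matrix(u, nrow = M)        # one column per cluster
cor(u_mat[1, ], u_mat[2, ])         # close to sigma_c^2/(sigma_c^2+sigma_v^2) = 0.2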
4 Estimation with OLS
The parameter β can be estimated with OLS by regressing ygi on a con-
stant and on xgi1 , · · · , xgiK . In the special case with one regressor xgi ,
the resulting OLS estimators of β0 and β1 are:
β̂1 = [ Σg Σi (xgi − x̄)(ygi − ȳ) ] / [ Σg Σi (xgi − x̄)² ]

β̂0 = ȳ − β̂1 x̄

where the sums run over g = 1, ..., G and i = 1, ..., M, ȳ = (1/GM) Σg Σi ygi, and x̄ = (1/GM) Σg Σi xgi.
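As a quick sanity check of these formulas, the following R sketch (simulated data; my own illustration, not from the handout) computes β̂0 and β̂1 by hand and compares them with lm(). Clustering affects the standard errors, not the point estimates.

# OLS point estimates by hand vs. lm() on simulated clustered data
set.seed(42)
G <- 50; M <- 20
x <- rnorm(G * M) + rep(rnorm(G), each = M)   # regressor with a cluster component
u <- rep(rnorm(G), each = M) + rnorm(G * M)   # error u_gi = c_g + v_gi
y <- 1 + 2 * x + u                            # true beta0 = 1, beta1 = 2
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)
c(b0, b1)
coef(lm(y ~ x))                               # identical point estimates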
The OLS estimator of β remains unbiased in small samples under
CL1, CL2, CL3c, CL4, and CL5, and it is normally distributed if CL3a
is additionally assumed. It is consistent and approximately normally distributed
under CL1, CL2, CL3d, CL4, and CL5 in samples with a large number
of clusters. However, the OLS estimator is no longer efficient. More
importantly, the usual standard errors of the OLS estimator, and tests (t-,
F-, z-, Wald-) based on them, are no longer valid.
5 Estimating Correct Standard Errors
Under CL3c and CL4, the small-sample variance V(β̂k | x111, ..., xGMK) of β̂k
differs from the usual OLS one. It cannot be expressed easily
without matrix notation, even for the bivariate regression model. Con-
sequently, the usual estimator V̂(β̂k | x111, ..., xGMK) is incorrect, and the
usual small-sample test procedures, such as the F- or t-test, based on it
are therefore not valid.
With the number of clusters G → ∞ and fixed cluster size M =
N/G, the OLS estimator is asymptotically normally distributed under
CL1, CL2, CL3d, CL4, and CL5
√G (β̂k − βk) →d N(0, ς²)
where ς² is not easily expressed without matrix notation. The OLS esti-
mator is therefore approximately normally distributed in samples with a
large number of clusters
β̂k ∼ N(βk, Avar(β̂k))   (approximately)
where Avar(β̂k) = ς²/N can be consistently estimated with some addi-
tional assumptions on higher-order moments of xg11, ..., xgMK. For the
bivariate regression, the robust variance estimator is calculated as
Avar̂(β̂1) = [ Σg Σi Σj ûgi ûgj (xgi − x̄)(xgj − x̄) ] / [ Σg Σi (xgi − x̄)² ]²

with sums over g = 1, ..., G and i, j = 1, ..., M.
This so-called cluster-robust covariance matrix estimator is a gener-
alization of Huber (1967) and White (1980).1 It does not impose any re-
strictions on the form of either the heteroscedasticity or the correlation within
clusters (though we assumed independence of the error terms across clus-
ters). We can perform the usual z- and Wald-tests in large samples using
the cluster-robust covariance estimator.

1 Note: the cluster-robust estimator is not clearly attributed to a specific author.
Note: the cluster-robust covariance matrix estimator is consistent as the num-
ber of clusters G → ∞. In practice, we should have at least 50 clusters.
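The following R sketch (simulated data; my own illustration, not part of the handout) evaluates the robust variance estimator above “by hand” and compares it with sandwich::vcovCL, switching off the small-sample adjustments of vcovCL (type "HC0", cadjust = FALSE) so that the two numbers coincide.

# cluster-robust s.e. of the slope "by hand" vs. sandwich::vcovCL
library(sandwich)
set.seed(7)
G <- 60; M <- 10
g <- rep(1:G, each = M)                        # cluster index
x <- rnorm(G * M) + rep(rnorm(G), each = M)
y <- 1 + 2 * x + rep(rnorm(G), each = M) + rnorm(G * M)
ols  <- lm(y ~ x)
uhat <- resid(ols)
xd   <- x - mean(x)
num  <- sum(tapply(uhat * xd, g, sum)^2)       # equals the triple sum in the numerator
sqrt(num / sum(xd^2)^2)                        # by-hand cluster-robust s.e.
sqrt(vcovCL(ols, cluster = g, type = "HC0", cadjust = FALSE)[2, 2])   # same value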
Bootstrapping is an alternative method to estimate a cluster-robust
covariance matrix under the same assumptions. See the handout on “The
Bootstrap”. Clustering is addressed in the bootstrap by randomly draw-
ing clusters g (rather than individual observations gi) and taking all M
observations of each drawn cluster. This so-called block bootstrap pre-
serves all within-cluster correlation. With 20 to 50 clusters, a wild block
residual bootstrap-t should be used instead (Cameron and Miller, 2015).
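As an illustration of the resampling scheme, here is a hand-coded pairs (block) cluster bootstrap in R on simulated data (my own sketch; the wild block residual bootstrap-t recommended for few clusters is a different procedure and is shown with boottest in sections 8 and 9).

# pairs (block) cluster bootstrap by hand: draw whole clusters with replacement
set.seed(7)
G <- 60; M <- 10
dat <- data.frame(g = rep(1:G, each = M))
dat$x <- rnorm(G * M) + rep(rnorm(G), each = M)
dat$y <- 1 + 2 * dat$x + rep(rnorm(G), each = M) + rnorm(G * M)
B  <- 999
b1 <- numeric(B)
for (b in 1:B) {
  drawn   <- sample(unique(dat$g), replace = TRUE)                 # resample clusters
  bootdat <- do.call(rbind, lapply(drawn, function(k) dat[dat$g == k, ]))
  b1[b]   <- coef(lm(y ~ x, data = bootdat))[2]
}
sd(b1)   # block-bootstrap standard error of the slope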
6 Efficient Estimation with GLS
In some cases, for example with cluster specific random effects, we can es-
timate β efficiently using feasible GLS (see the handout on “Heteroscedas-
ticity in the Linear Model” and the handout on “Panel Data”). In prac-
tice, we can rarely rule out additional serial correlation beyond the one
induced by the random effect. It is therefore advisable to always use
cluster-robust standard errors in combination with FGLS estimation of
the random effects model.
7 Special Case: Estimating Correct Standard Errors
with Random Cluster-specific Effects
Moulton (1986, 1990) studies the bias of the usual OLS standard errors
for the special case with random cluster-specific effects. Assume cluster-
specific random effects in a bivariate regression:
ygi = β0 + β1 xgi + ugi
where ugi = cg + vgi with σ²u = σ²c + σ²v and ρu = σ²c/(σ²c + σ²v). Then the
(cluster-robust) asymptotic variance can be estimated as

Avar_cluster[β̂1] = [ σ̂²u / Σg Σi (xgi − x̄)² ] · [1 + (M − 1) ρ̂x ρ̂u]

where σ̂²u is the usual OLS estimator of the error variance, ρx is the within-cluster correlation of
x, and σ̂²u, ρ̂u and ρ̂x are consistent estimators of σ²u, ρu and ρx, respectively.
The robust standard error for the slope coefficient is accordingly

ŝe_cluster(β̂1) = ŝe_ols(β̂1) · √(1 + (M − 1) ρ̂x ρ̂u)

where ŝe_ols(β̂1) is the usual OLS standard error.
The factor √(1 + (M − 1) ρx ρu) ≥ 1 is called the Moulton factor and measures
how much the usual OLS standard errors understate the correct standard
errors. For example, with cluster size M = 500 and intracluster correla-
tions ρu = 0.1 and ρx = 0.1, the correct standard errors are 2.45 times
the usual OLS ones.
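A short R check of this arithmetic (my own illustration, not part of the handout):

# Moulton factor for the example in the text
moulton <- function(M, rho_x, rho_u) sqrt(1 + (M - 1) * rho_x * rho_u)
moulton(M = 500, rho_x = 0.1, rho_u = 0.1)    # approx. 2.45
# a corrected standard error would be se_ols * moulton(M, rho_x, rho_u)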
Lessons from the Moulton factor
1. If either the within cluster correlation of the combined error term u
is zero (ρu = 0) or the within cluster correlation of x is zero (ρx = 0),
then the Moulton factor is 1 and the usual OLS standard errors are
correct. Both situations generalize to K explanatory variables.
2. If the variable of interest is an aggregate variable on the level of the
cluster (hence ρx = 1), the Moulton factor is maximal. This case
generalizes to K aggregate explanatory variables:
ŝe_cluster(β̂k) = ŝe_ols(β̂k) · √(1 + (M − 1) ρ̂u)
In this situation, we need to correct the standard errors. Alterna-
tively, we could aggregate (average) all variables and run the regres-
sion on the collapsed data (see the sketch after this list).
3. If only control variables are aggregated, it is better to include cluster
fixed effects (i.e. dummy variables for the groups), which take
care of the cluster-specific effect. See also the handout on “Panel
Data: Fixed and Random Effects”.
4. If the variable of interest is not aggregated but has an important
cluster-specific component (large ρx), then including cluster fixed
effects may destroy valuable information, and it is better not to
include them. However, we still need to correct the standard
errors.
5. If only control variables have an important cluster-specific compo-
nent, it is better to include cluster fixed effects.
6. If the variable of interest has only a small cluster specific component
(i.e. a lot of within-cluster variation and very little between-cluster
variation), it is better to include cluster fixed effects.
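For lesson 2, collapsing the data to cluster means can be sketched as follows in R (simulated data; my own illustration, not part of the handout):

# with a purely cluster-level regressor, run OLS on one observation per cluster
set.seed(1)
G <- 50; M <- 20
dat <- data.frame(g = rep(1:G, each = M))
dat$xg <- rep(rnorm(G), each = M)              # aggregate regressor, constant within cluster
dat$y  <- 1 + 2 * dat$xg + rep(rnorm(G), each = M) + rnorm(G * M)
agg <- aggregate(cbind(y, xg) ~ g, data = dat, FUN = mean)   # one row per cluster
summary(lm(y ~ xg, data = agg))                # inference based on G cluster-level observations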
Standard errors are in practice most easily corrected using the Eicker-
Huber-White cluster-robust covariance from section 5 and not via the
Moulton factor. Note that we should have at least G = 50 clusters to
justify the asymptotic approximation.
In the context of panel and time series data, serial correlation beyond
that induced by a random effect becomes very important. See the handout
on “Panel Data: Fixed and Random Effects”. In this case, standard errors
need to be corrected even when fixed effects are included.
8 Implementation in Stata 17
Load example data
webuse auto7.dta
Stata reports the cluster-robust covariance estimator clustered by
manufacturer with the vce(cluster) option, e.g.2
regress price weight, vce(cluster manufacturer)
matrix list e(V)
Note: Stata multiplies V̂ by (N − 1)/(N − K − 1) · G/(G − 1) to “cor-
rect” for degrees of freedom in small samples. This practice is not based
on asymptotic theory but often produces better small-sample properties.
Stata reports p-values for the t- and F-statistics with G − 1 degrees of
freedom.
We can also estimate a cluster robust covariance using a nonparametric
block bootstrap. For example with either of the following,
regress price weight, vce(bootstrap, reps(999) cluster(manufacturer))
bootstrap, reps(999) cluster(manufacturer): regress price weight
The cluster specific random effects model is efficiently estimated by
FGLS. For example,
xtset manufacturer_grp
xtreg price weight, re
In addition, cluster-robust standard errors are reported with
xtreg price weight, re vce(cluster manufacturer)
The wild block residual bootstrap-t for the slope coefficient of the
variable weight is reported by David Roodman’s command boottest
ssc install boottest
regress price weight, vce(cluster manufacturer)
boottest weight=0, reps(99999)
2 There are only 23 clusters in this example dataset used by the Stata manual. This
is not enough to justify using large-sample approximations.
9 Implementation in R 4.3.1
Load example data
library(haven)
auto <- read_dta("http://www.stata-press.com/data/r17/auto7.dta")
First, we estimate the regression with the usual command
ols <- lm(price~weight, data=auto)
summary(ols)
The cluster-robust covariance estimator clustered by manufacturer is
calculated and reported with the packages sandwich and lmtest3
library(sandwich)
library(lmtest)
coeftest(ols, vcov = vcovCL, cluster = ~manufacturer)
The following commands are equivalent
coeftest(ols, vcov = vcovCL, cluster = ~manufacturer, cadjust=TRUE)
coeftest(ols, vcov = vcovCL(ols, cluster = ~manufacturer))
coeftest(ols, vcov = vcovCL(ols, type="HC1", cluster = ~manufacturer))
Note: The above commands multiply V̂ by (N − 1)/(N − K − 1) · G/(G −
1) to “correct” for degrees of freedom in small samples. R reports p-values
for the t- and F-statistics with N − K − 1 degrees of freedom.
We can also estimate a cluster robust covariance using a nonparametric
block bootstrap
coeftest(ols, vcov = vcovBS, cluster = ~manufacturer, R=999)
The wild block residual bootstrap-t for the slope coefficient of the
variable weight is calculated by David Roodman’s algorithm in boottest
library(fwildclusterboot)
wild <- boottest(ols, param="weight", clustid=c("manufacturer"),
B=99999, type="rademacher", impose_null=TRUE,
p_val_type="two-tailed")
summary(wild)
3 There are only 23 clusters in this example dataset used by the Stata manual. This
is not enough to justify using large-sample approximations.
References
Advanced textbooks
Cameron, A. Colin and Pravin K. Trivedi (2005), Microeconometrics:
Methods and Applications, Cambridge University Press. Section 24.5.
Wooldridge, Jeffrey M. (2002), Econometric Analysis of Cross Section and
Panel Data, MIT Press. Sections 7.8 and 11.5.
Companion textbooks
Angrist, Joshua D. and Jörn-Steffen Pischke (2009), Mostly Harmless
Econometrics: An Empiricist’s Companion, Princeton University Press.
Chapter 8.
Articles
Cameron, A. Colin and Douglas L. Miller (2015), A Practitioner’s Guide
to Cluster-Robust Inference, Journal of Human Resources, 50(2), 317-372.
Moulton, B. R. (1986), Random Group Effects and the Precision of Re-
gression Estimates, Journal of Econometrics, 32(3), 385-397.
Moulton, B. R. (1990), An Illustration of a Pitfall in Estimating the Ef-
fects of Aggregate Variables on Micro Units, The Review of Economics
and Statistics, 72, 334-338.