DUMMY VARIABLE
REGRESSION MODELS
In Chapter 1 we discussed briefly the four types of variables that one generally
encounters in empirical analysis. These are: ratio scale, interval scale,
ordinal scale, and nominal scale. The types of variables that we have
encountered in the preceding chapters were essentially ratio scale. But this
should not give the impression that regression models can deal only with
ratio scale variables. Regression models can also handle other types of variables
mentioned previously. In this chapter, we consider models that may
involve not only ratio scale variables but also nominal scale variables. Such
variables are also known as indicator variables, categorical variables,
qualitative variables, or dummy variables.¹
9.1 THE NATURE OF DUMMY VARIABLES
In regression analysis the dependent variable, or regressand, is frequently
influenced not only by ratio scale variables (e.g., income, output, prices,
costs, height, temperature) but also by variables that are essentially qualitative,
or nominal scale, in nature, such as sex, race, color, religion, nationality,
geographical region, political upheavals, and party affiliation. For example,
holding all other factors constant, female workers are found to earn less
than their male counterparts or nonwhite workers are found to earn less
than whites.² This pattern may result from sex or racial discrimination, but
whatever the reason, qualitative variables such as sex and race seem to
¹We will discuss ordinal scale variables in Chap. 15.
²For a review of the evidence on this subject, see Bruce E. Kaufman and Julie L. Hotchkiss,
The Economics of Labor Markets, 5th ed., Dryden Press, New York, 2000.
CHAPTER NINE: DUMMY VARIABLE REGRESSION MODELS
influence the regressand and clearly should be included among the explanatory
variables, or the regressors.
Since such variables usually indicate the presence or absence of a
"quality" or an attribute, such as male or female, black or white, Catholic or
non-Catholic, Democrat or Republican, they are essentially nominal scale
variables. One way we could "quantify" such attributes is by constructing
artificial variables that take on values of 1 or 0, 1 indicating the presence (or
possession) of that attribute and 0 indicating the absence of that attribute.
For example, 1 may indicate that a person is a female and 0 may designate a
male; or 1 may indicate that a person is a college graduate, and 0 that the
person is not, and so on. Variables that assume such 0 and 1 values are
called dummy variables.³ Such variables are thus essentially a device to classify
data into mutually exclusive categories such as male or female.
Dummy variables can be incorporated in regression models just as easily
as quantitative variables. As a matter of fact, a regression model may contain
regressors that are all exclusively dummy, or qualitative, in nature.
Such models are called Analysis of Variance (ANOVA) models.⁴
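
As a minimal sketch of how such 0/1 variables can be constructed from category labels, consider the following. The helper name `make_dummy` and the sample labels are hypothetical and are not taken from the text; NumPy is assumed to be available.

```python
import numpy as np

def make_dummy(labels, category):
    """Return a 0/1 array: 1 where the label equals `category`, 0 otherwise."""
    return (np.asarray(labels) == category).astype(int)

# Hypothetical observations (illustrative labels, not data from the text).
sex = ["female", "male", "female", "male", "male"]
degree = ["college", "none", "none", "college", "college"]

d_female = make_dummy(sex, "female")        # 1 = female, 0 = male
d_college = make_dummy(degree, "college")   # 1 = college graduate, 0 = not

print(d_female)    # [1 0 1 0 0]
print(d_college)   # [1 0 0 1 1]
```

Each observation receives exactly one of the two values, which is what makes the categories mutually exclusive.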
9.2 ANOVA MODELS
To illustrate the ANOVA models, consider the following example.
EXAMPLE 9.1
PUBLIC SCHOOL TEACHERS' SALARIES BY GEOGRAPHICAL REGION
Table 9.1 gives data on average salary (in dollars) of public school teachers in 50 states and
the District of Columbia for the year 1985. These 51 areas are classified into three geographical
regions: (1) Northeast and North Central (21 states in all), (2) South (17 states
in all), and (3) West (13 states in all). For the time being, do not worry about the format of the
table and the other data given in the table.
Suppose we want to find out if the average annual salary (AAS) of public school teachers
differs among the three geographical regions of the country. If you take the simple arithmetic
average of the average salaries of the teachers in the three regions, you will find that
these averages for the three regions are as follows: $24,424.14 (Northeast and North Central),
$22,894 (South), and $26,158.62 (West). These numbers look different, but are they
(Continued)
³It is not absolutely essential that dummy variables take the values of 0 and 1. The pair (0, 1)
can be transformed into any other pair by a linear function such that Z = a + bD (b ≠ 0), where
a and b are constants and where D = 1 or 0. When D = 1, we have Z = a + b, and when D = 0,
we have Z = a. Thus the pair (0, 1) becomes (a, a + b). For example, if a = 1 and b = 2, the
dummy variables will be (1, 3). This expression shows that qualitative, or dummy, variables do not
have a natural scale of measurement. That is why they are described as nominal scale variables.
⁴ANOVA models are used to assess the statistical significance of the relationship between a
quantitative regressand and qualitative or dummy regressors. They are often used to compare
the differences in the mean values of two or more groups or categories, and are therefore
more general than the t test, which can be used to compare the means of two groups or
categories only.
PART ONE: SINGLE-EQUATION REGRESSION MODELS
EXAMPLE 9.1 (Continued)
[FIGURE 9.1 Average salary (in dollars) of public school teachers in three regions:
West, β₁ = $26,158; Northeast and North Central, (β₁ + β₂) = $24,424; South, (β₁ + β₃) = $22,894.]
Differences in educational levels, in cost of living indexes, in gender and race may have
some effect on the observed differences. Therefore, unless we take into account all the other
variables that may affect a teacher's salary, we will not be able to pin down the causes of
the differences.
From the preceding discussion, it is clear that all one has to do is see if the coefficients
attached to the various dummy variables are individually statistically significant. This example
also shows how easy it is to incorporate qualitative, or dummy, regressors in the regression
models.
Caution in the Use of Dummy Variables
Although they are easy to incorporate in the regression models, one must use
the dummy variables carefully. In particular, consider the following aspects:
1. In Example 9.1, to distinguish the three regions, we used only two
dummy variables, D₂ and D₃. Why did we not use three dummies to distinguish
the three regions? Suppose we do that and write the model (9.2.1) as:

$$Y_i = \alpha + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 D_{3i} + u_i \qquad (9.2.6)$$

where D₁ᵢ takes a value of 1 for states in the West and 0 otherwise. Thus, we
now have a dummy variable for each of the three geographical regions.
Using the data in Table 9.1, if you were to run the regression (9.2.6), the computer
will "refuse" to run the regression (try it).⁵ Why? The reason is that in the
setup of (9.2.6), where you have a dummy variable for each category or
group and also an intercept, you have a case of perfect collinearity, that is,
exact linear relationships among the variables. Why? Refer to Table 9.1.
Imagine that now we add the D₁ column, taking the value of 1 whenever a
state is in the West and 0 otherwise. Now if you add the three D columns
horizontally, you will obtain a column that has 51 ones in it. But since the value
of the intercept α is (implicitly) 1 for each observation, you will have a
column that also contains 51 ones. In other words, the sum of the three D
columns will simply reproduce the intercept column, thus leading to perfect
collinearity. In this case, estimation of the model (9.2.6) is impossible.

⁵Actually you will get a message saying that the data matrix is singular.
The message here is: If a qualitative variable has m categories, introduce
only (m − 1) dummy variables. In our example, since the qualitative
variable "region" has three categories, we introduced only two dummies. If
you do not follow this rule, you will fall into what is called the dummy variable
trap, that is, the situation of perfect collinearity or perfect multicollinearity,
if there is more than one exact relationship among the variables.
This rule also applies if we have more than one qualitative variable in
the model, an example of which is presented later. Thus we should restate
the preceding rule as: For each qualitative regressor the number of
dummy variables introduced must be one less than the categories of that
variable. Thus, if in Example 9.1 we had information about the gender
of the teacher, we would use an additional dummy variable (but not two)
taking a value of 1 for female and 0 for male or vice versa.
2. The category for which no dummy variable is assigned is known as
the base, benchmark, control, comparison, reference, or omitted category.
And all comparisons are made in relation to the benchmark category.
3. The intercept value (β₁) represents the mean value of the benchmark
category. In Example 9.1, the benchmark category is the Western region.
Hence, in the regression (9.2.5) the intercept value of about 26,159 represents
the mean salary of teachers in the Western states.
4. The coefficients attached to the dummy variables in (9.2.1) are known
as the differential intercept coefficients because they tell by how much
the value of the intercept that receives the value of 1 differs from the intercept
coefficient of the benchmark category. For example, in (9.2.5), the value
of about −1734 tells us that the mean salary of teachers in the Northeast or
North Central is smaller by about $1734 than the mean salary of about
$26,159 for the benchmark category, the West.
5. If a qualitative variable has more than one category, as in our illustrative
example, the choice of the benchmark category is strictly up to the
researcher. Sometimes the choice of the benchmark is dictated by the particular
problem at hand. In our illustrative example, we could have chosen
the South as the benchmark category. In that case the regression results
given in (9.2.5) will change, because now all comparisons are made in relation
to the South. Of course, this will not change the overall conclusion of
our example (why?). In this case, the intercept value will be about $22,894,
which is the mean salary of teachers in the South.
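6. We warned above about the dummy variable trap. There is a way to
circumvent this trap by introducing as many dummy variables as the number
of categories of that variable, provided we do not introduce the intercept in
such a model. Thus, if we drop the intercept term from (9.2.6), and consider
the following model,

$$Y_i = \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 D_{3i} + u_i \qquad (9.2.7)$$

we do not fall into the dummy variable trap, as there is no longer perfect
collinearity. But make sure that when you run this regression, you use the
no-intercept option in your regression package.
How do we interpret regression (9.2.7)? If you take the expectation of
(9.2.7), you will find that:

β₁ = mean salary of teachers in the West
β₂ = mean salary of teachers in the Northeast and North Central
β₃ = mean salary of teachers in the South

The results of (9.2.7) for our illustrative example are as follows:

$$\hat{Y}_i = 26{,}158.62\,D_{1i} + 24{,}424.14\,D_{2i} + 22{,}894\,D_{3i}$$
se = ( … )   ( … )   (986.8645)
t  = ( … )*  (27.5072)*  (23.1987)*
R² = 0.0901

where * indicates that the p values of these t ratios are very small.
As you can see, the dummy coefficients give directly the mean (salary)
values in the three regions, West, Northeast and North Central, and South.
7. Which is a better method of introducing a dummy variable: (1) introduce
a dummy for each category and omit the intercept term or (2) include the
intercept term and introduce only (m − 1) dummies, where m is the number
of categories of the dummy variable? As Kennedy notes:

Most researchers find the equation with an intercept more convenient because
it allows them to address more easily the questions in which they usually
have the most interest, namely, whether or not the categorization makes a
difference, and if so, by how much. If the categorization does make a difference,
by how much is measured directly by the dummy variable coefficient estimates.
Testing whether or not the categorization is relevant can be done by running a
t test of a dummy variable coefficient against zero (or, to be more general, an
F test on the appropriate set of dummy variable coefficient estimates).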
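
The cautions above can be checked numerically. The sketch below uses six made-up salary observations (two per region), not the data of Table 9.1, and assumes NumPy. It illustrates three points: the design matrix with an intercept plus all three dummies is rank-deficient, the intercept-plus-(m − 1)-dummies form returns the benchmark mean and the differential intercepts, and the no-intercept form returns the three group means directly.

```python
import numpy as np

# Hypothetical salaries for six states, two per region (illustrative numbers only).
region = np.array(["West", "West", "NE_NC", "NE_NC", "South", "South"])
y = np.array([26000.0, 27000.0, 24000.0, 25000.0, 22000.0, 23000.0])

d_west = (region == "West").astype(float)
d_ne_nc = (region == "NE_NC").astype(float)
d_south = (region == "South").astype(float)
ones = np.ones_like(y)

# (a) Intercept plus a dummy for every category: the three dummy columns
#     sum to the intercept column, so the matrix has rank 3 rather than 4.
X_trap = np.column_stack([ones, d_west, d_ne_nc, d_south])
rank_trap = np.linalg.matrix_rank(X_trap)

# (b) Intercept plus (m - 1) dummies, with the West as the benchmark category.
X_base = np.column_stack([ones, d_ne_nc, d_south])
b_base = np.linalg.lstsq(X_base, y, rcond=None)[0]

# (c) No intercept, one dummy per category.
X_noint = np.column_stack([d_west, d_ne_nc, d_south])
b_noint = np.linalg.lstsq(X_noint, y, rcond=None)[0]

print(rank_trap)   # 3
print(b_base)      # benchmark mean, then the two differential intercepts
print(b_noint)     # the three group means
```

In this toy data the group means are 26,500 (West), 24,500 (Northeast and North Central), and 22,500 (South), so form (b) yields 26,500, −2,000, and −4,000, while form (c) yields the three means themselves.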
9.3 ANOVA MODELS WITH TWO QUALITATIVE VARIABLES
In the previous section we studied an ANOVA model with one qualitative
variable with three categories. In this section we consider another ANOVA
model, but with two qualitative variables, and bring out some additional
points about dummy variables.
EXAMPLE 9.2
HOURLY WAGES IN RELATION TO MARITAL STATUS AND REGION OF RESIDENCE

The regression results, with the coefficients rounded to two decimal places, are:

$$\hat{Y}_i \approx 8.81 + 1.10\,D_{2i} - 1.67\,D_{3i} \qquad (9.3.1)$$
R² = 0.0322

where Y = hourly wage ($)
D₂ = marital status; 1 = married, 0 = otherwise
D₃ = region of residence; 1 = South, 0 = otherwise

In this example we have two qualitative regressors, each with two categories.
Hence we have assigned a single dummy variable for each category.
Which is the benchmark category here? Obviously, it is unmarried,
non-South residence. In other words, unmarried persons who do not live in the
South are the omitted category. Therefore, all comparisons are made in relation
to this group. The mean hourly wage in this benchmark is about $8.81. Compared
with this, the average hourly wage of those who are married is higher
by about $1.10, for an actual average wage of $9.91 (= 8.81 + 1.10). By contrast,
for those who live in the South, the average hourly wage is lower by about $1.67,
for an actual average hourly wage of $7.14.
Are the preceding average hourly wages statistically different compared to
the base category? They are, for all the differential intercepts are statistically
significant, as their p values are quite low.
The point to note about this example is this: Once you go beyond one
qualitative variable, you have to pay close attention to the category that is
treated as the base category, since all comparisons are made in relation to
that category. This is especially important when you have several qualitative
regressors, each with several categories. But the mechanics of introducing
several qualitative variables should be clear by now.

9.4 REGRESSION WITH A MIXTURE OF QUANTITATIVE AND
QUALITATIVE REGRESSORS: THE ANCOVA MODELS
ANOVA models of the type discussed in the preceding two sections, although
common in fields such as sociology, psychology, education, and
market research, are not that common in economics. Typically, in most economic
research a regression model contains some explanatory variables
that are quantitative and some that are qualitative. Regression models
containing a mix of quantitative and qualitative variables are called analysis of
covariance (ANCOVA) models. ANCOVA models are an extension of the
ANOVA models in that they provide a method of statistically controlling the
effects of quantitative regressors, called covariates or control variables,
The data are obtained from the data disk in Arthur S. Goldberger, Introductory Econometrics,
Harvard University Press, Cambridge, Mass., 1998. We have already considered these data
in Chap. 2.
in a model that includes both quantitative and qualitative, or dummy,
regressors. We now illustrate the ANCOVA models.
To motivate the analysis, let us reconsider Example 9.1 by maintaining
that the average salary of public school teachers may not be different in the
three regions if we take into account any variables that cannot be standardized
across the regions. Consider, for example, the variable expenditure
on public schools by local authorities, as public education is primarily a
local and state question. To see if this is the case, we develop the following
model:
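
$$Y_i = \beta_1 + \beta_2 D_{2i} + \beta_3 D_{3i} + \beta_4 X_i + u_i \qquad (9.4.1)$$

where Yᵢ = average annual salary of public school teachers in state i (in dollars)
Xᵢ = spending on public school per pupil (in dollars)
D₂ᵢ = 1 if the state is in the Northeast or North Central
    = 0, otherwise
D₃ᵢ = 1 if the state is in the South
    = 0, otherwise

The data on X are given in Table 9.1. Keep in mind that we are treating the
West as the benchmark category. Also, note that besides the two qualitative
regressors, we have a quantitative variable, X, which in the context of the
ANCOVA models is known as a covariate, as noted earlier.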
EXAMPLE 9.3
TEACHER'S SALARY IN RELATION TO REGION AND
SPENDING ON PUBLIC SCHOOL PER PUPIL
From the data in Table 9.1, the results of the model (9.4.1) are as follows:

$$\hat{Y}_i = 13{,}269.11 - 1673.514\,D_{2i} - 1144.157\,D_{3i} + 3.2889\,X_i$$
se = (1395.056)  (801.1703)  (861.1182)  (0.3176)
t  = (9.5115)*   (−2.0889)*  (−1.3286)** (10.3539)*
R² = 0.7266

where * indicates p values less than 5 percent, and ** indicates p values greater than 5 percent.
As these results suggest, ceteris paribus: as public expenditure goes up by a dollar, on
average, a public school teacher's salary goes up by about $3.29. Controlling for spending on
education, we now see that the differential intercept coefficient is significant for the Northeast
and North Central region, but not for the South. These results are different from those of
(9.2.5). But this should not be surprising, for in (9.2.5) we did not account for the covariate,
differences in per pupil public spending on education. Diagrammatically, we have the situation
shown in Figure 9.2.
Note that although we have shown three regression lines for the three regions, statistically
the regression lines are the same for the West and the South. Also note that the three
regression lines are drawn parallel (why?).
CHAPTER NINE: DUMMY VARIAOLE REG
_xanirteoo (cominued)
y
FIGURE 92
FIGURE teacher’ salary (Yin relation to per pupil expenditure on education %. \
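
To make the parallel-lines picture concrete, the sketch below computes the three regional intercepts implied by the estimates of model (9.4.1) as read from Example 9.3, and evaluates each line at one spending level. The function name `predicted_salary` and the spending figure of $4,000 per pupil are hypothetical choices for illustration only.

```python
# Point estimates as read from Example 9.3 for model (9.4.1).
b1, b2, b3, b4 = 13269.11, -1673.514, -1144.157, 3.2889

# Each region has its own intercept; the slope on X is common to all three,
# which is why the regression lines are parallel.
intercepts = {
    "West": b1,                               # benchmark category
    "Northeast and North Central": b1 + b2,
    "South": b1 + b3,
}

def predicted_salary(region, spending_per_pupil):
    """Fitted mean salary for a region at a given per-pupil spending level."""
    return intercepts[region] + b4 * spending_per_pupil

x = 4000.0  # hypothetical per-pupil spending, in dollars
for name in intercepts:
    print(name, round(intercepts[name], 3), round(predicted_salary(name, x), 2))
```

The vertical gap between any two lines is the same at every value of X, namely the difference between the corresponding intercepts.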
9.5 THE DUMMY VARIABLE ALTERNATIVE TO THE CHOW TEST
In Section 8.8 we discussed the Chow test to examine the structural stability
of a regression model. The example we discussed there related to the
relationship between savings and income in the United States over the
period 1970–1995. We divided the sample period into two, 1970–1981 and
1982–1995, and showed on the basis of the Chow test that there was a difference
in the regression of savings on income between the two periods.
However, we could not tell whether the difference in the two regressions
was because of differences in the intercept terms or the slope coefficients or
both. Very often this knowledge itself is very useful.
Referring to Eqs. (8.8.1) and (8.8.2), we see that there are four possibilities,
which we illustrate in Figure 9.3.
1. Both the intercept and the slope coefficients are the same in the two
regressions. This, the case of coincident regressions, is shown in Figure 9.3a.
2. Only the intercepts in the two regressions are different but the slopes
are the same. This is the case of parallel regressions, which is shown in
Figure 9.3b.
The material in this section draws on the author's articles, "Use of Dummy Variables in
Testing for Equality between Sets of Coefficients in Two Linear Regressions: A Note," and "Use
of Dummy Variables . . . A Generalization," both published in the American Statistician, vol. 24,
nos. 1 and 5, 1970, pp. 50–52 and 18–21.
[FIGURE 9.3 Plausible savings-income regressions (savings plotted against income):
(a) coincident regressions; (b) parallel regressions; (c) concurrent regressions;
(d) dissimilar regressions.]
3. The intercepts in the two regressions are the same, but the slopes are
different. This is the situation of concurrent regressions (Figure 9.3c).
4. Both the intercepts and slopes in the two regressions are different.
This is the case of dissimilar regressions, which is shown in Figure 9.3d.
The multistep Chow test procedure discussed in Section 8.8, as noted earlier,
tells us only if two (or more) regressions are different without telling us
what is the source of the difference. The source of difference, if any, can be
pinned down by pooling all the observations (26 in all) and running just one
multiple regression as shown below:

$$Y_t = \alpha_1 + \alpha_2 D_t + \beta_1 X_t + \beta_2 (D_t X_t) + u_t \qquad (9.5.1)$$

where Y = savings
X = income
t = time
D = 1 for observations in 1982–1995
  = 0, otherwise (i.e., for observations in 1970–1981)
As in the Chow test, the pooling technique assumes homoscedasticity, that is, σ₁² = σ₂² = σ².

TABLE 9.2
SAVINGS AND INCOME DATA, UNITED STATES, 1970–1995
[Table body (year, savings, income, and dummy columns) not reproduced.]
Note: Dum = 1 for observations beginning in 1982; 0 otherwise.
Source: Economic Report of the President, 1997, Table B-28.
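Table 9.2 shows the structure of the data matrix.
To see the implications of (9.5.1), and, assuming, as usual, that E(uₜ) = 0,
we obtain:

Mean savings function for 1970–1981:
$$E(Y_t \mid D_t = 0, X_t) = \alpha_1 + \beta_1 X_t \qquad (9.5.2)$$

Mean savings function for 1982–1995:
$$E(Y_t \mid D_t = 1, X_t) = (\alpha_1 + \alpha_2) + (\beta_1 + \beta_2) X_t \qquad (9.5.3)$$

In (9.5.1), α₂ is the differential intercept, as previously, and β₂ is the
differential slope coefficient (also called the slope drifter), indicating by
how much the slope coefficient of the second period's savings function (the
category that receives the dummy value of 1) differs from that of the first
period. Notice how the introduction of the dummy variable D in the interactive,
or multiplicative, form (D multiplied by X) enables us to differentiate
between slope coefficients of the two periods, just as the introduction of the
dummy variable in the additive form enabled us to distinguish between the
intercepts of the two periods.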
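
A minimal sketch of the pooled regression (9.5.1) follows, assuming NumPy. The data are hypothetical and noise-free with a known break; they are not the U.S. savings-income series. The point is only to show how the additive column D and the multiplicative column D·X recover the two periods' intercepts and slopes from a single fit.

```python
import numpy as np

# Hypothetical, noise-free data with a known break (not the U.S. savings data).
x = np.arange(1.0, 13.0)                    # 12 observations of the regressor
d = (np.arange(12) >= 6).astype(float)      # D = 0 in the first period, 1 in the second
y = np.where(d == 0, 2.0 + 0.5 * x, 10.0 + 0.1 * x)

# Columns: intercept, D, X, D*X  ->  alpha1, alpha2, beta1, beta2 of (9.5.1)
X = np.column_stack([np.ones_like(x), d, x, d * x])
a1, a2, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

first_period = (a1, b1)               # intercept and slope when D = 0, as in (9.5.2)
second_period = (a1 + a2, b1 + b2)    # intercept and slope when D = 1, as in (9.5.3)

print(first_period, second_period)
```

Here the differential intercept is 8 (= 10 − 2) and the differential slope is −0.4 (= 0.1 − 0.5), so the second-period line is recovered as (α₁ + α₂) + (β₁ + β₂)X.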
EXAMPLE 9.4
STRUCTURAL DIFFERENCES IN THE U.S. SAVINGS-INCOME REGRESSION,
THE DUMMY VARIABLE APPROACH
Before we proceed further, let us first present the regression results of model (9.5.1) applied
to the U.S. savings-income data.

$$\hat{Y}_t = 1.0161 + 152.4786\,D_t + 0.0803\,X_t - 0.0655\,(D_t X_t) \qquad (9.5.4)$$
se = (20.1648)  (33.0824)  (0.0144)  (0.0159)
t  = (0.0504)** (4.6090)*  (5.5413)* (−4.0963)*
R² = 0.8819

where * indicates p values less than 5 percent and ** indicates p values greater than 5 percent.
As these regression results show, both the differential intercept and slope coefficients are
statistically significant, strongly suggesting that the savings-income regressions for the two
time periods are different, as in Figure 9.3d.
From (9.5.4), we can derive equations (9.5.2) and (9.5.3), which are:
Savings-income regression, 1970–1981:
$$\hat{Y}_t = 1.0161 + 0.0803\,X_t \qquad (9.5.5)$$
Savings-income regression, 1982–1995:
$$\hat{Y}_t = (1.0161 + 152.4786) + (0.0803 - 0.0655)\,X_t = 153.4947 + 0.0148\,X_t \qquad (9.5.6)$$
These are precisely the results we obtained in (8.8.1a) and (8.8.2a), which should not be
surprising. These regressions are already shown in Figure 8.3.
The advantages of the dummy variable technique [i.e., estimating (9.5.1)] over the Chow
test [i.e., estimating the three regressions (8.8.1), (8.8.2), and (8.8.3)] can now be seen
readily:
1. We need to run only a single regression because the individual regressions can easily be
derived from it in the manner indicated by equations (9.5.2) and (9.5.3).
2. The single regression (9.5.1) can be used to test a variety of hypotheses. Thus if the
differential intercept coefficient α₂ is statistically insignificant, we may accept the hypothesis
that the two regressions have the same intercept, that is, the two regressions are concurrent
(see Figure 9.3c). Similarly, if the differential slope coefficient β₂ is statistically
insignificant but α₂ is significant, we may not reject the hypothesis that the two regressions
have the same slope, that is, the two regression lines are parallel (cf. Figure 9.3b). The test
of the stability of the entire regression (i.e., α₂ = β₂ = 0, simultaneously) can be made by
the usual F test (recall the restricted least-squares F test). If this hypothesis is not rejected,
the regression lines will be coincident, as shown in Figure 9.3a.
(Continued)
EXAMPLE 9.4 (Continued)
3. The Chow test does not explicitly tell us which coefficient, intercept or slope, is different,
or whether (as in this example) both are different in the two periods. That is, one can
obtain a significant Chow test because the slope only is different or the intercept only is
different, or both are different. In other words, we cannot tell, via the Chow test, which one of
the four possibilities depicted in Figure 9.3 exists in a given instance. In this respect, the
dummy variable approach has a distinct advantage, for it not only tells if the two are
different but also pinpoints the source(s) of the difference, whether it is due to the intercept
or the slope or both. In practice, the knowledge that two regressions differ in this or that
coefficient is as important as, if not more than, the plain knowledge that they are different.
4. Finally, since pooling (i.e., including all the observations in one regression) increases the
degrees of freedom, it may improve the relative precision of the estimated parameters. Of
course, keep in mind that every addition of a dummy variable will consume one degree of
freedom.
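
The joint test of α₂ = β₂ = 0 mentioned in point 2 can be sketched with the restricted least-squares F statistic, F = [(RSS_R − RSS_UR)/q] / [RSS_UR/(n − k)], where the restricted model is the pooled regression of Y on X alone. The code below is an illustration under stated assumptions: the function names `rss` and `stability_f` are hypothetical, the data are simulated with a built-in break and a fixed random seed, and NumPy is assumed.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return float(resid @ resid)

def stability_f(y, x, d):
    """F statistic for H0: alpha2 = beta2 = 0 in a model of the form (9.5.1)."""
    n = y.size
    ones = np.ones(n)
    X_restricted = np.column_stack([ones, x])               # pooled regression
    X_unrestricted = np.column_stack([ones, d, x, d * x])   # with D and D*X
    rss_r, rss_ur = rss(X_restricted, y), rss(X_unrestricted, y)
    q, k = 2, 4   # restrictions tested; parameters in the unrestricted model
    return ((rss_r - rss_ur) / q) / (rss_ur / (n - k))

# Simulated data with a built-in break (not the U.S. savings series).
rng = np.random.default_rng(0)
x = np.linspace(10.0, 60.0, 26)
d = (np.arange(26) >= 12).astype(float)
y = 1.0 + 0.8 * x + d * (15.0 - 0.5 * x) + rng.normal(0.0, 1.0, 26)

f_stat = stability_f(y, x, d)
print(f_stat)   # compare with the F critical value for (2, n - 4) degrees of freedom
```

A large F value leads to rejection of coincident regressions; the individual t ratios on D and D·X then indicate whether the intercept, the slope, or both are responsible.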
9.6 INTERACTION EFFECTS USING DUMMY VARIABLES
Dummy variables are a flexible tool that can handle a variety of interesting
problems. To see this, consider the following model:

$$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i \qquad (9.6.1)$$

where Y = hourly wage in dollars
X = education (years of schooling)
D₂ = 1 if female, 0 otherwise
D₃ = 1 if nonwhite and non-Hispanic, 0 otherwise

In this model gender and race are qualitative regressors and education is
a quantitative regressor.¹¹ Implicit in this model is the assumption that the
differential effect of the gender dummy D₂ is constant across the two categories
of race and the differential effect of the race dummy D₃ is also constant
across the two sexes. That is to say, if the mean salary is higher for
males than for females, this is so whether they are nonwhite/non-Hispanic
or not. Likewise, if, say, nonwhite/non-Hispanics have lower mean wages,
this is so whether they are females or males.
In many applications such an assumption may be untenable. A female
nonwhite/non-Hispanic may earn lower wages than a male nonwhite/non-Hispanic.
In other words, there may be interaction between the two qualitative
variables D₂ and D₃. Therefore their effect on mean Y may not be simply
additive as in (9.6.1) but multiplicative as well, as in the following model.

$$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 (D_{2i} D_{3i}) + \beta X_i + u_i \qquad (9.6.2)$$

where the variables are as defined for model (9.6.1).
From (9.6.2), we obtain:

$$E(Y_i \mid D_{2i} = 1, D_{3i} = 1, X_i) = (\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4) + \beta X_i \qquad (9.6.3)$$
¹¹If we were to define education as less than high school, high school, and more than high
school, we could then use two dummies to represent the three classes.
which is the mean hourly wage function for female nonwhite/non-Hispanic
workers. Observe that

α₂ = differential effect of being a female
α₃ = differential effect of being a nonwhite/non-Hispanic
α₄ = differential effect of being a female nonwhite/non-Hispanic

which shows that the mean hourly wages of female nonwhite/non-Hispanics
is different (by α₄) from the mean hourly wages of females or nonwhite/non-
Hispanics. If, for instance, all the three differential dummy coefficients are
negative, this would imply that female nonwhite/non-Hispanic workers earn
much lower mean hourly wages than female or nonwhite/non-Hispanic
workers as compared with the base category, which in the present example
is male white or Hispanic.
Now the reader can see how the interaction dummy (i.e., the product of
two qualitative or dummy variables) modifies the effect of the two attributes
considered individually (i.e., additively).
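
For reference, the four conditional mean functions implied by (9.6.2) can be written side by side. This is a worked restatement under the usual assumption that the error term has zero conditional mean; only the last line appears explicitly in the text as (9.6.3).

```latex
\begin{align*}
E(Y_i \mid D_{2i}=0,\, D_{3i}=0,\, X_i) &= \alpha_1 + \beta X_i
  && \text{(male, white or Hispanic: base category)} \\
E(Y_i \mid D_{2i}=1,\, D_{3i}=0,\, X_i) &= (\alpha_1 + \alpha_2) + \beta X_i
  && \text{(female, white or Hispanic)} \\
E(Y_i \mid D_{2i}=0,\, D_{3i}=1,\, X_i) &= (\alpha_1 + \alpha_3) + \beta X_i
  && \text{(male, nonwhite/non-Hispanic)} \\
E(Y_i \mid D_{2i}=1,\, D_{3i}=1,\, X_i) &= (\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4) + \beta X_i
  && \text{(female, nonwhite/non-Hispanic)}
\end{align*}
```

Setting α₄ = 0 collapses the last line to the purely additive case of (9.6.1), which is why a test of α₄ = 0 is a test for interaction.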
EXAMPLE 9.5
AVERAGE HOURLY EARNINGS IN RELATION TO EDUCATION, GENDER, AND RACE
Let us first present the regression results based on model (9.6.1). Using the data that were
used to estimate regression (9.3.1), we obtained the following results:

$$\hat{Y}_i = -0.2610 - 2.3606\,D_{2i} - 1.7327\,D_{3i} + 0.8028\,X_i \qquad (9.6.4)$$
t = (−0.2357)** (−5.4873)* (−2.1803)* (9.9094)*

where * indicates p values less than 5 percent and ** indicates p values greater than 5 percent.
The reader can check that the differential intercept coefficients are statistically significant,
that they have the expected signs (why?), and that education has a strong positive effect on
hourly wage, an unsurprising finding.
As (9.6.4) shows, ceteris paribus, the average hourly earnings of females are lower by
about $2.36, and the average hourly earnings of nonwhite non-Hispanic workers are also
lower by about $1.73.
We now consider the results of model (9.6.2), which includes the interaction dummy.

$$\hat{Y}_i = -0.2610 - 2.3606\,D_{2i} - 1.7327\,D_{3i} + 2.1289\,D_{2i}D_{3i} + 0.8028\,X_i \qquad (9.6.5)$$
t = (−0.2357)** (−5.4873)* (−2.1803)* (1.7420)** (9.9085)*
R² = 0.2032    n = 528

where * indicates p values less than 5 percent and ** indicates p values greater than 5 percent.
As you can see, the two additive dummies are still statistically significant, but the interactive
dummy is not at the conventional 5 percent level; the actual p value of the interaction
dummy is about the 8 percent level. If you think this is a low enough probability, then the
results of (9.6.5) can be interpreted as follows: Holding the level of education constant, if you
add the three dummy coefficients you will obtain −1.964 (= −2.3605 − 1.7327 + 2.1289),
which means that mean hourly wages of nonwhite/non-Hispanic female workers is lower by
about $1.96, which is between the value of −2.3605 (gender difference alone) and −1.7327
(race difference alone).

The preceding example clearly reveals the role of interaction dummies
when two or more qualitative regressors are included in the model. It is
important to note that in the model (9.6.5) we are assuming that the rate of
increase of hourly earnings with respect to education (of about 80 cents per
additional year of schooling) remains constant across gender and race. But
this may not be the case. If you want to test for this, you will have to introduce
differential slope coefficients (see exercise 9.25).
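
The arithmetic behind the interpretation of (9.6.5) can be laid out for all four gender-race groups. The sketch below takes the three dummy coefficients as read from (9.6.5); the function name `differential` is a hypothetical helper, and education is held constant so that only the intercept shifts relative to the base category are shown.

```python
# Dummy coefficients as read from (9.6.5): gender, race, and their interaction.
a2, a3, a4 = -2.3606, -1.7327, 2.1289

def differential(female, nonwhite):
    """Shift in mean hourly wage relative to the base category
    (male, white or Hispanic), holding education constant."""
    return a2 * female + a3 * nonwhite + a4 * female * nonwhite

for female in (0, 1):
    for nonwhite in (0, 1):
        print(female, nonwhite, round(differential(female, nonwhite), 4))
```

With these coefficients the shift for female nonwhite/non-Hispanic workers is about −1.96, matching the figure discussed in the example; without the interaction term the purely additive shift would have been about −4.09.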
9.7 THE USE OF DUMMY VARIABLES IN SEASONAL ANALYSIS
Many economic time series based on monthly or quarterly data exhibit
seasonal patterns (regular oscillatory movements). Examples are sales of
department stores at Christmas and other major holiday times, demand for
money (or cash balances) by households at holiday times, demand for ice
cream and soft drinks during summer, prices of crops right after harvesting
season, demand for air travel, etc. Often it is desirable to remove the seasonal
factor, or component, from a time series so that one can concentrate
on the other components, such as the trend.¹² The process of removing
the seasonal component from a time series is known as deseasonalization
or seasonal adjustment, and the time series thus obtained is called the
deseasonalized, or seasonally adjusted, time series. Important economic
time series, such as the unemployment rate, the consumer price index (CPI),
the producer's price index (PPI), and the index of industrial production, are
usually published in seasonally adjusted form.
There are several methods of deseasonalizing a time series, but we will
consider only one of these methods, namely, the method of dummy variables.
To illustrate how the dummy variables can be used to deseasonalize
economic time series, consider the data given in Table 9.3. This table gives
quarterly data for the years 1978–1985 on the sale of four major appliances,
dishwashers, garbage disposers, refrigerators, and washing machines, all
data in thousands of units. The table also gives data on durable goods
expenditure in 1982 billions of dollars.
To illustrate the dummy technique, we will consider only the sales of
refrigerators over the sample period. But first let us look at the data, which is
shown in Figure 9.4. This figure suggests that perhaps there is a seasonal
pattern in the data associated with the various quarters. To see if this is the
case, consider the following model:

$$Y_t = \alpha_1 D_{1t} + \alpha_2 D_{2t} + \alpha_3 D_{3t} + \alpha_4 D_{4t} + u_t \qquad (9.7.1)$$

where Yₜ = sales of refrigerators (in thousands) and the D's are the dummies,
taking a value of 1 in the relevant quarter and 0 otherwise. Note that
¹²A time series may contain four components: a seasonal, a cyclical, a trend, and one that
is strictly random.
For the various methods of seasonal adjustment, see, for instance, Francis X. Diebold,
Elements of Forecasting, 2d ed., South-Western Publishers, 2001, Chap. 5.
TABLE 9.3
QUARTERLY DATA ON APPLIANCE SALES (IN THOUSANDS)
AND EXPENDITURE ON DURABLE GOODS (1978-I TO 1985-IV)
[Table body (columns DISH, DISP, FRIG, WASH, DUR for each quarter) not reproduced. The FRIG
series appears as the "Actual" column in the table of actual, fitted, and residual values.]
Note: DISH = dishwashers; DISP = garbage disposers; FRIG = refrigerators; WASH = washing machines;
DUR = durable goods expenditure, billions of 1982 dollars.
Source: Business Statistics and Survey of Current Business, Department of Commerce (various issues).
[FIGURE 9.4 Sales of refrigerators 1978–1985 (quarterly). Vertical axis: thousands of units;
horizontal axis: year.]
to avoid the dummy variable trap, we are assigning a dummy to each quarter
of the year, but omitting the intercept term. If there is any seasonal effect in a
given quarter, that will be indicated by a statistically significant t value of the
dummy coefficient for that quarter.¹⁴
¹⁴Note a technical point. This method of assigning a dummy to each quarter assumes that the
seasonal factor, if present, is deterministic and not stochastic. We will revisit this topic
when we discuss time series econometrics in Part V of this book.
(Continued)
TABLE 9.4
REFRIGERATOR SALES REGRESSION: ACTUAL, FITTED, AND RESIDUAL
VALUES (EQ. 9.7.3)

Quarter     Actual    Fitted     Residuals
1978-I      1317      1222.12      94.875
1978-II     1615      1467.50     147.500
1978-III    1662      1569.75      92.250
1978-IV     1295      1160.00     135.000
1979-I      1271      1222.12      48.875
1979-II     1555      1467.50      87.500
1979-III    1639      1569.75      69.250
1979-IV     1238      1160.00      78.000
1980-I      1277      1222.12      54.875
1980-II     1258      1467.50    -209.500
1980-III    1417      1569.75    -152.750
1980-IV     1185      1160.00      25.000
1981-I      1196      1222.12     -26.125
1981-II     1410      1467.50     -57.500
1981-III    1417      1569.75    -152.750
1981-IV      919      1160.00    -241.000
1982-I       943      1222.12    -279.125
1982-II     1175      1467.50    -292.500
1982-III    1269      1569.75    -300.750
1982-IV      973      1160.00    -187.000
1983-I      1102      1222.12    -120.125
1983-II     1344      1467.50    -123.500
1983-III    1641      1569.75      71.250
1983-IV     1225      1160.00      65.000
1984-I      1429      1222.12     206.875
1984-II     1699      1467.50     231.500
1984-III    1749      1569.75     179.250
1984-IV     1117      1160.00     -43.000
1985-I      1242      1222.12      19.875
1985-II     1684      1467.50     216.500
1985-III    1764      1569.75     194.250
1985-IV     1328      1160.00     168.000
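
The fitted and residual columns of the table can be reproduced with a regression of the form (9.7.1), one dummy per quarter and no intercept. The sketch below uses the 32 actual refrigerator sales values listed in the table and assumes NumPy; the variable names are illustrative. With only quarterly dummies as regressors, each fitted value is simply the mean of its quarter, and the residuals are the deviations from those quarterly means. An equivalent specification with an intercept and three dummies gives the same fitted values.

```python
import numpy as np

# Refrigerator sales (thousands of units), 1978-I to 1985-IV,
# as listed in the "Actual" column of the table.
frig = np.array([
    1317, 1615, 1662, 1295,
    1271, 1555, 1639, 1238,
    1277, 1258, 1417, 1185,
    1196, 1410, 1417,  919,
     943, 1175, 1269,  973,
    1102, 1344, 1641, 1225,
    1429, 1699, 1749, 1117,
    1242, 1684, 1764, 1328,
], dtype=float)

quarter = np.arange(frig.size) % 4    # 0 = first quarter, ..., 3 = fourth quarter
D = np.eye(4)[quarter]                # one dummy column per quarter, no intercept

alpha = np.linalg.lstsq(D, frig, rcond=None)[0]   # estimated quarterly means
fitted = D @ alpha
residuals = frig - fitted             # sales net of the quarterly mean

print(alpha)          # approximately [1222.125, 1467.5, 1569.75, 1160.0]
print(residuals[:4])  # approximately [94.875, 147.5, 92.25, 135.0]
```

The four estimated coefficients equal the four distinct values in the "Fitted" column (with 1222.125 shown there as 1222.12).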
Again, keep in mind that we are treating the first quarter as our base. As in (9.7.3), we see
that the differential intercept coefficients for the second and third quarters are statistically
different from that of the first quarter, but the intercepts of the fourth quarter and the first
quarter are statistically about the same. The coefficient of X (durable goods expenditure) of about
2.77 tells us that, allowing for seasonal effects, if expenditure on durable goods goes up
by a dollar, on average, sales of refrigerators go up by about 2.77 units, that is, approximately
3 units; bear in mind that refrigerators are in thousands of units and X is in (1982) billions
of dollars.
(Continued)