Dummy Variables

Uploaded by

ahmedtobar2003

DUMMY VARIABLE REGRESSION MODELS
Faculty of Economics and Political Science
Economic Department
Third Year, 2023-2024

In regression analysis, the dependent variable is frequently influenced not only by quantitative variables (e.g. income, price, height), but also by qualitative variables such as nationality, geographical region or gender.

For example, female workers may earn less than their male counterparts, holding all other factors constant.

If the dependent variable, Y, represents earnings or wages, then a qualitative variable that represents gender (male or female) should be included among the explanatory variables (regressors).

The question is: how can we quantify such variables in order to add them to the regression model?

Construct artificial variables (dummy variables) that take on the values 1 or 0, where 1 indicates the presence of an attribute (e.g. value 1 if female) and 0 indicates the absence of the attribute (value 0 if male).
So, we classify the data into mutually exclusive categories, which can then easily be incorporated into the regression model.
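
As a minimal sketch of this coding step, the following Python snippet turns a qualitative variable into a 1/0 dummy. The helper name `make_dummy` and the sample data are hypothetical and used only for illustration.

```python
def make_dummy(values, category):
    """Return a list of 1/0 flags: 1 where the value equals `category`, else 0."""
    return [1 if v == category else 0 for v in values]


# Hypothetical sample of a qualitative variable (gender)
gender = ["female", "male", "female", "male", "male"]

# 1 indicates the presence of the attribute (female), 0 its absence (male)
female = make_dummy(gender, "female")
print(female)
```

Each observation falls into exactly one category, so the dummy for "male" would simply be one minus the dummy for "female".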

Example

Suppose we have data on the average salary of teachers in three geographical areas (North, South and West), and we want to find out whether the average salary differs among the three areas.

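Using regression analysis, we estimate

$$Y_i = \beta_1 + \beta_2 D_{2i} + \beta_3 D_{3i} + u_i$$

where $Y_i$ is the average salary of teachers,
$D_{2i}$ is 1 if North; 0 otherwise
$D_{3i}$ is 1 if South; 0 otherwise

Therefore, assuming $E(u_i) = 0$,

$E(Y_i \mid D_{2i} = 1, D_{3i} = 0) = \beta_1 + \beta_2$ → Mean salary of teachers in the North

$E(Y_i \mid D_{2i} = 0, D_{3i} = 1) = \beta_1 + \beta_3$ → Mean salary of teachers in the South

$E(Y_i \mid D_{2i} = 0, D_{3i} = 0) = \beta_1$ → Mean salary of teachers in the West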



So, the mean salary of teachers in the West is given by the intercept, and the differential intercept coefficients (the coefficients on the North and South dummies) tell us by how much the mean salaries in the North and South differ from that in the West.
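
The interpretation of the intercept and the differential intercept coefficients can be checked numerically. The sketch below uses NumPy and made-up salary figures (all numbers and variable names are assumptions for illustration, not data from these slides): the fitted intercept reproduces the West mean, and each dummy coefficient reproduces the gap between that area's mean and the West mean.

```python
import numpy as np

# Hypothetical average salaries (in thousands), three observations per area
salary = np.array([50.0, 52.0, 54.0,   # North
                   44.0, 46.0, 48.0,   # South
                   58.0, 60.0, 62.0])  # West
area = np.array(["North"] * 3 + ["South"] * 3 + ["West"] * 3)

# Two dummies only; West is the omitted (comparison) category
d_north = (area == "North").astype(float)
d_south = (area == "South").astype(float)
X = np.column_stack([np.ones(len(salary)), d_north, d_south])

# Ordinary least squares via NumPy's least-squares solver
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
intercept, diff_north, diff_south = coef

# Group means computed directly, for comparison with the coefficients
mean_west = salary[area == "West"].mean()
mean_north = salary[area == "North"].mean()
mean_south = salary[area == "South"].mean()

print("intercept:", intercept, "West mean:", mean_west)
print("North differential:", diff_north, "North - West:", mean_north - mean_west)
print("South differential:", diff_south, "South - West:", mean_south - mean_west)
```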

Note that the qualitative variable (geographical area) has three categories (North, South, West), but we introduced only two dummy variables in the regression model. Why?
This is to avoid the dummy variable trap, a situation of perfect collinearity: the three dummy variables' columns in the X matrix sum to the intercept column. The columns of X are then linearly dependent, so the matrix X'X is singular (its determinant is zero) and we cannot estimate the model parameters.
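
A small numerical illustration of the trap, as a sketch with hypothetical data: when an intercept and all three area dummies are included, the design matrix has four columns but only rank three, because the dummy columns add up to the column of ones. Dropping one dummy restores full column rank.

```python
import numpy as np

# Hypothetical area labels for six observations
area = np.array(["North", "South", "West", "North", "South", "West"])

# One dummy column per category
dummies = np.column_stack(
    [(area == a).astype(float) for a in ("North", "South", "West")]
)
intercept = np.ones((len(area), 1))

X_trap = np.hstack([intercept, dummies])       # intercept + 3 dummies
X_ok = np.hstack([intercept, dummies[:, :2]])  # intercept + 2 dummies (West omitted)

# The three dummy columns sum to the intercept column
sum_equals_intercept = np.allclose(dummies.sum(axis=1), 1.0)

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print("dummies sum to intercept:", sum_equals_intercept)
print("columns / rank with all dummies:", X_trap.shape[1], rank_trap)
print("columns / rank with one dummy dropped:", X_ok.shape[1], rank_ok)
```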

If a qualitative variable has m categories, we introduce only (m - 1) dummy variables in the model. In other words, for each qualitative regressor the number of dummy variables must be one less than the number of categories of the variable; otherwise we fall into the dummy variable trap.

The category for which no dummy variable is assigned (West in the previous example) is known as the comparison category (also base category, control category, benchmark category or reference category), because all comparisons are made in relation to this category. The choice of the comparison category is up to the researcher (so, in the previous example, it does not matter whether we choose West, North or South). The mean value of the comparison category is given by the intercept.
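
To see that the choice of comparison category changes only the interpretation of the coefficients and not the fit, the sketch below estimates the same model twice on hypothetical salary data, once with West as the base and once with North. The helper `fit_with_base` and the numbers are assumptions made for this illustration.

```python
import numpy as np

# Hypothetical average salaries (in thousands), three observations per area
salary = np.array([50.0, 52.0, 54.0, 44.0, 46.0, 48.0, 58.0, 60.0, 62.0])
area = np.array(["North"] * 3 + ["South"] * 3 + ["West"] * 3)


def fit_with_base(base):
    """OLS of salary on an intercept plus a dummy for every area except `base`."""
    others = [a for a in ("North", "South", "West") if a != base]
    columns = [np.ones(len(salary))] + [(area == a).astype(float) for a in others]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
    return coef, X @ coef


coef_west, fitted_west = fit_with_base("West")
coef_north, fitted_north = fit_with_base("North")

print("intercept with West as base:", coef_west[0])    # mean salary in the West
print("intercept with North as base:", coef_north[0])  # mean salary in the North
print("same fitted values:", np.allclose(fitted_west, fitted_north))
```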

To avoid the dummy variable trap, we may also introduce as many dummy
variables as the number of categories, but we do NOT introduce the intercept.

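Example

$$Y_i = \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 D_{3i} + u_i$$

where $D_{1i}$ is 1 if West, $D_{2i}$ is 1 if North and $D_{3i}$ is 1 if South (0 otherwise in each case).

In this case
$\beta_1$ is the mean salary of teachers in the West.
$\beta_2$ is the mean salary of teachers in the North.
$\beta_3$ is the mean salary of teachers in the South.

So, we directly obtain the mean values.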
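
The no-intercept version can be sketched in the same way. With hypothetical salary data (assumed numbers, for illustration only), regressing on one dummy per area and no constant returns each area's mean directly as its coefficient.

```python
import numpy as np

# Hypothetical average salaries (in thousands), three observations per area
salary = np.array([50.0, 52.0, 54.0, 44.0, 46.0, 48.0, 58.0, 60.0, 62.0])
area = np.array(["North"] * 3 + ["South"] * 3 + ["West"] * 3)

# One dummy per category, in the order West, North, South, and no intercept column
order = ("West", "North", "South")
D = np.column_stack([(area == a).astype(float) for a in order])

coef, *_ = np.linalg.lstsq(D, salary, rcond=None)

# Group means computed directly, for comparison
group_means = np.array([salary[area == a].mean() for a in order])

for name, b, m in zip(order, coef, group_means):
    print(name, "coefficient:", b, "group mean:", m)
```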



Most researchers find that introducing (m - 1) dummy variables in the case of m categories, and keeping the intercept, is better than omitting the intercept term, because it allows them to address more easily whether the categorization makes a difference or not.

How can we deal with two qualitative variables?

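In the case of two qualitative regressors, each with two categories (so a single dummy variable for each), such as marital status ($D_{2i}$ = 1 if married; 0 otherwise) and region of residence ($D_{3i}$ = 1 if resident in the South; 0 otherwise):

Interpret the coefficients in the following regression model

$$\hat{Y}_i = 8.81 + 1.10\, D_{2i} - 1.67\, D_{3i}$$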

In the case of more than one qualitative variable, pay close attention to the comparison category.
Here it is (unmarried, non-South resident).
If the dependent variable is the wage, we can say that the mean wage of unmarried persons who do not live in the South is $8.81.
Compared with this, the mean wage of those who are married is higher by about $1.10 (so the mean wage of married persons who do not live in the South is about $9.91).
Similarly, for those who live in the South, the mean wage is lower by about $1.67, holding marital status constant.
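
The arithmetic for all four groups can be laid out with a short sketch that uses only the rounded estimates quoted above. The function name `mean_wage` is hypothetical. Note that the value for married South residents follows from the additive form of the model, which assumes no interaction between marital status and region.

```python
# Rounded estimates quoted in the text
INTERCEPT = 8.81  # unmarried, non-South (comparison category)
MARRIED = 1.10    # differential intercept for married persons
SOUTH = -1.67     # differential intercept for South residents


def mean_wage(married, south):
    """Predicted mean wage for a group; `married` and `south` are 0/1 dummies."""
    return INTERCEPT + MARRIED * married + SOUTH * south


for married in (0, 1):
    for south in (0, 1):
        print("married =", married, "south =", south,
              "mean wage =", round(mean_wage(married, south), 2))
```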

We can test the statistical significance of the coefficient of a dummy variable in the same way as for quantitative variables.

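For example, for the wage regression on the marital status and region dummies:

                   Intercept   Married    South
Estimate (approx.)   8.81        1.10      -1.67
Standard error       0.4015      0.4642     0.4854
t                   21.95        2.3688    -3.446
p-value              0.0000      0.0182     0.0006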

Here, all differential intercepts are statistically significant at the 5% level (each p-value is below 0.05).

This means, for example, that the mean wage in the South is lower by about $1.67, and that this difference is statistically significant.
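
As a rough check on the reported figures, the sketch below recovers each estimate as t times its standard error and computes a two-sided p-value. It uses the standard normal distribution as a large-sample approximation, which is an assumption: the slides do not state the sample size, and the reported p-values presumably come from the t distribution, so small differences are expected.

```python
import math

labels = ["intercept", "married", "south"]
std_errors = [0.4015, 0.4642, 0.4854]  # as reported
t_stats = [21.95, 2.3688, -3.446]      # as reported


def two_sided_p_normal(t_stat):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2))


# Since t = estimate / standard error, the estimate is t * standard error
estimates = [t * se for t, se in zip(t_stats, std_errors)]
p_values = [two_sided_p_normal(t) for t in t_stats]

for name, b, t, p in zip(labels, estimates, t_stats, p_values):
    significant = p < 0.05
    print(name, "estimate:", round(b, 2), "t:", t,
          "approx. p:", round(p, 4), "significant at 5%:", significant)
```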

Final note

If the model has several qualitative variables with several categories, the introduction of dummy variables can consume many degrees of freedom.
So, one should always weigh the number of dummy variables to be introduced against the total number of available observations.
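
A back-of-the-envelope count makes the trade-off concrete. The sketch below assumes a model with an intercept and (m - 1) dummies per qualitative variable; the function names and the example numbers are hypothetical.

```python
def dummies_needed(categories_per_variable):
    """Total dummies when each variable with m categories gets (m - 1) dummies."""
    return sum(m - 1 for m in categories_per_variable)


def residual_df(n_obs, categories_per_variable, n_quantitative=0):
    """Residual degrees of freedom: observations minus estimated parameters."""
    n_params = 1 + n_quantitative + dummies_needed(categories_per_variable)
    return n_obs - n_params


# Hypothetical case: 40 observations and three qualitative variables
# with 4, 5 and 12 categories respectively
categories = [4, 5, 12]
print("dummies:", dummies_needed(categories))
print("residual degrees of freedom:", residual_df(40, categories))
```

In this hypothetical case, 18 dummies plus the intercept leave only 21 residual degrees of freedom out of 40 observations.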
