CHAPTER FIVE
LIMITED DEPENDENT
VARIABLE MODELS
Introduction
Difference between models with quantitative & qualitative
dependent variables
Quantitative Dependent variable
Our objective is to estimate the mean value of Y given the
values of the regressors; the dependent variable is
continuous in nature.
Qualitative Dependent variable
Our objective is to find the probability of something
happening, hence the qualitative response models are also
known as probability models.
In all the regression models that we have considered so far,
we have implicitly assumed that the regressand, the
dependent variable, or the response variable Y is
quantitative, whereas the explanatory variables are either
quantitative, qualitative (or dummy), or a mixture thereof.
In this chapter we consider several models in which the
regressand itself is qualitative in nature.
Although increasingly used in various areas of social sciences
and medical research, qualitative response regression models
pose interesting estimation and interpretation challenges.
There are frequently cases in which the dependent variable is of a qualitative
nature, and therefore a dummy variable is used on the left-hand side of the
regression model.
Assume, for example, we want to examine why some people go to university
while others do not, or why some people decide to enter the labour force
and others do not.
Both these variables are dichotomous (0-1) dummy variables. Here we want
to use such a variable as the dependent variable.
Things can be even further complicated by having a dependent variable that
is of a qualitative nature but can take more than two responses (a
polychotomous variable).
For example, consider the ratings of various goods from
consumer surveys, or answers to questionnaires on various
issues, taking the form:
strongly disagree, disagree, indifferent, agree,
strongly agree, and so on.
We start with the linear probability model, followed by the
logit, probit and Tobit models.
Ordered and multinomial logit and probit models are also
presented.
A limited dependent variable (LDV) is broadly defined as a
dependent variable whose range of values is substantively
restricted.
A binary variable takes on only two values, zero and one.
Elsewhere, we have encountered several limited dependent
variables, including the percentage of people participating in a
pension plan (which must be between zero and 100) and
college grade point average (which is between zero and 4.0 at
most colleges).
Limited dependent variables are dependent variables that have limited
ranges: usually either discontinuous or range-bounded.
There are many models of LDVs, classified by the type of limitation:
0-1 (binary) dependent variables, modelled by probit and logit
Ordered dependent variables, modelled by ordered probit and logit
Categorical dependent variables (with more than two categories), modelled by
multinomial logit
Truncated dependent variables, modelled by Heckman's procedure
Censored dependent variables, modelled by tobit
Count (integer) dependent variables, modelled by Poisson regression
Hazard (duration) dependent variables, modelled by hazard models, where the
dependent variable or response is the waiting time until the
occurrence of an event.
Because of the limited ranges of the dependent variable,
the standard additive normal error is not tenable for these
models. Instead we must model the probability of various
discrete outcomes.
LDV models are usually estimated by maximum
likelihood, given the assumed distribution of the
conditional probabilities of various outcomes.
The linear probability model
The linear probability model (LPM) is by far the simplest
way of dealing with binary dependent variables, and it is
based on an assumption that the probability of an event
occurring, 𝑃𝑖 , is linearly related to a set of explanatory
variables 𝑋1𝑖 ,𝑋2𝑖 , ... ,𝑋𝑘𝑖
$P_i = P(y_i = 1) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + U_i, \quad i = 1, \ldots, N$  (1)
The actual probabilities cannot be observed, so we would estimate a
model where the outcomes, 𝐲𝐢 (the series of zeros and ones), would be
the dependent variable.
This is then a linear regression model and would be estimated by OLS.
The set of explanatory variables could include either quantitative
variables or dummies or both.
The fitted values from this regression are the estimated probabilities for
𝒚𝒊 = 1 for each observation i.
The slope estimates for the linear probability model can be interpreted
as the change in the probability that the dependent variable will equal
1 for a one-unit change in a given explanatory variable, holding the
effect of all other explanatory variables fixed.
Figure 5.1 The linear probability model ($D_i$ is a dichotomous dummy)
Now, by the definition of mathematical expectation, we obtain:
$E(Y_i) = 0 \cdot (1 - P_i) + 1 \cdot P_i = P_i$
Taking the conditional expectation of equation (1), and noting that the
disturbance has zero mean, we can write:
$E(Y_i \mid X_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} = P_i$  (2)
That is, the conditional expectation of the model can, in fact,
be interpreted as the conditional probability of 𝑌𝑖 .
In general, the expectation of a Bernoulli random variable is
the probability that the random variable equals 1.
In passing note that if there are n independent trials, each with
a probability p of success and probability (1 − 𝑝) of failure,
and X of these trials represent the number of successes, then X
is said to follow the binomial distribution.
The mean of the binomial distribution is np and its variance is
np(1 − p).
The term success is defined in the context of the problem.
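As a quick numerical illustration of these binomial facts, the sketch below simulates binomial draws and compares the sample moments with np and np(1 − p); the parameter values are arbitrary assumptions chosen only for the check.

```python
# Quick simulation check of the binomial moments quoted above:
# mean = n*p and variance = n*p*(1 - p).
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 0.3
draws = rng.binomial(n, p, size=200_000)  # 200,000 binomial draws
print(draws.mean(), n * p)                # both close to 6.0
print(draws.var(), n * p * (1 - p))       # both close to 4.2
```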
Since the probability 𝑃𝑖 must lie between 0 and 1, we have the
restriction.
$0 \le E(Y_i \mid X_i) \le 1$  (3)
that is, the conditional expectation (or conditional probability)
must lie between 0 and 1.
Non-Normality of the Disturbances 𝑈𝑖
Although OLS does not require the disturbances (𝑼𝒊 ) to be
normally distributed, we assumed them to be so distributed
for the purpose of statistical inference.
But the assumption of normality for 𝑈𝑖 is not tenable for the
LPMs because, like 𝑌𝑖 , the disturbances 𝑈𝑖 follow the
Bernoulli distribution.
Suppose, for example, that we wanted to model the probability
that firm i will pay a dividend ($y_i = 1$) as a function of its
market capitalisation ($X_{2i}$, measured in millions of US
dollars), and we fit the following line:
$\hat{P}_i = -0.3 + 0.012 X_{2i}$
where $\hat{P}_i$ denotes the fitted or estimated probability for
firm i.
This model suggests that for every $1m increase in size, the
probability that the firm will pay a dividend increases by
0.012 (or 1.2%). A firm whose stock is valued at $50m will
have a −0.3 + 0.012 × 50 = 0.3 (or 30%) probability of
making a dividend payment.
Graphically, this situation may be represented as in figure 5.2.
Figure 5.2 The fatal flaw of the linear probability model
Mortgage applications
Example:
Most individuals who want to buy a house apply for a
mortgage at a bank.
Not all mortgage applications are approved.
What determines whether or not a mortgage application is
approved or denied?
During this lecture we use a subset of the Boston HMDA
data (N = 2380)
a data set on mortgage applications collected by the Federal
Reserve Bank of Boston.
Variable   Description                                           Mean    SD
deny       = 1 if mortgage application is denied                 0.120   0.325
pi_ratio   anticipated monthly loan payments / monthly income    0.331   0.107
black      = 1 if applicant is black, = 0 if applicant is white  0.142   0.350
Does the payment to income ratio affect whether or not a
mortgage application is denied?
The estimated OLS coefficient on the payment-to-income ratio
equals $\hat{\beta}_1 = 0.60$.
The estimated coefficient is significantly different from 0 at
the 1% significance level.
How should we interpret $\hat{\beta}_1$?
In the mortgage application example:
$\hat{\beta}_1 = 0.60$
A change in the payment to income ratio by 1 is estimated to
increase the probability that the mortgage application is denied
by 0.60.
A change in the payment to income ratio by 0.10 is estimated to
increase the probability that the application is denied by 0.06,
i.e. by 6 percentage points (0.10 × 0.60 = 0.06).
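To make this concrete, the sketch below estimates a linear probability model of this kind by OLS with heteroskedasticity-robust standard errors. The data are simulated, not the actual Boston HMDA sample; the variable names and the coefficients used to generate the data are assumptions chosen only to mimic the example.

```python
# A minimal LPM sketch: regress a 0/1 outcome on a regressor by OLS and
# use heteroskedasticity-robust (HC1) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2380
pi_ratio = rng.uniform(0.1, 0.7, n)              # payment-to-income ratio
p_deny = np.clip(-0.08 + 0.6 * pi_ratio, 0, 1)   # assumed "true" probability
deny = rng.binomial(1, p_deny)                   # binary outcome (0/1)

lpm = sm.OLS(deny, sm.add_constant(pi_ratio)).fit(cov_type="HC1")
print(lpm.params)   # slope: change in Pr(deny = 1) per unit of pi_ratio
print(lpm.bse)      # robust standard errors
```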
Problems with LPM
• Predicted values may lie outside the interval [0, 1]:
non-fulfilment of 0 ≤ E(Y_i | X_i) ≤ 1; the probability of Y
occurring given X should lie between 0 and 1.
While the linear probability model is simple to estimate
and intuitive to interpret, the diagram should immediately
signal a problem with this setup. For any firm whose value
is less than $25m, the model-predicted probability of paying
a dividend is negative, while for any firm worth more than
about $108m (the point where $-0.3 + 0.012 X_{2i}$ reaches 1),
the predicted probability is greater than one.
Clearly, such predictions cannot be allowed to stand,
since probabilities must lie in the range [0, 1].
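The arithmetic behind this flaw can be checked directly from the fitted line $\hat{P}_i = -0.3 + 0.012 X_{2i}$:

```python
# Fitted LPM "probabilities" for the dividend example at several firm
# sizes (in $m): they turn negative below $25m and exceed one above
# 1.3 / 0.012 = $108.3m.
fitted = lambda size: -0.3 + 0.012 * size
for size in [10, 25, 50, 108.33, 120]:
    print(size, round(fitted(size), 3))
# 10 -> -0.18, 25 -> 0.0, 50 -> 0.3, 108.33 -> ~1.0, 120 -> 1.14
```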
Heteroscedastic Variances of the Disturbances
Hence the error term cannot plausibly be assumed to be
normally distributed. Moreover, since the variance of 𝑈𝑖
changes systematically with the values of the explanatory
variables, the disturbances are heteroscedastic.
It is therefore essential that heteroscedasticity robust
standard errors are always used in the context of limited
dependent variable models.
Questionable R squared: the conventional R² is of limited value
in binary response models and is usually low; in practice it
tends to lie between 0.2 and 0.6.
The model is not logically very attractive, since it assumes
that 𝑷𝒊 increases linearly with X; that is, the marginal effect
of X remains constant.
Thus we need a model with two features:
As 𝑿𝒊 increases, $P_i = P(Y_i = 1 \mid X_i)$ also increases but
never steps outside the interval [0, 1].
The relationship between X and P is nonlinear, i.e. an
S-shaped relationship.
These two features are possessed by Logit and Probit
models.
The linear probability model:
Summary
Models Pr(Y=1|X) as a linear function of X
Advantages:
simple to estimate and to interpret
inference is the same as for multiple regression (need
heteroskedasticity-robust standard errors)
Disadvantages:
Does it make sense that the probability should be linear
in X?
Predicted probabilities can be <0 or >1!
These disadvantages can be solved by using a nonlinear
probability model: probit and logit regression
Alternatives to LPM
As we have seen, the LPM is plagued by several problems, such as (1)
non-normality of 𝑈𝑖 , (2) heteroscedasticity of 𝑈𝑖 , (3) possibility of
𝑌𝑖 lying outside the 0 -1 range, and (4) the generally lower 𝑅2 values. But
these problems are surmountable. For example, we can use weighted least
squares (WLS) or robust standard errors to resolve the heteroscedasticity
problem, or increase the sample size to mitigate the non-normality
problem.
By resorting to restricted least-squares or mathematical programming
techniques we can even make the estimated probabilities lie in the 0–1
interval.
But even then the fundamental problem with the LPM is that it is not
logically a very attractive model because it assumes that 𝑃𝑖 = E(Y = 1 | X)
increases linearly with X, that is, the marginal or incremental effect of X
remains constant throughout.
FIGURE 5.3: A cumulative distribution function (CDF).
Geometrically, the model we want would look something like Figure 5.3.
Notice in this model that the probability lies between 0 and 1 and that it
varies nonlinearly with X.
The reader will realize that the sigmoid, or S-shaped, curve in the figure
very much resembles the cumulative distribution function (CDF) of a
random variable.
Therefore, one can easily use the CDF to model regressions where the
response variable is dichotomous, taking 0–1 values.
The practical question now is, which CDF? For although all CDFs are S
shaped, for each random variable there is a unique CDF.
For historical as well as practical reasons, the CDFs commonly chosen to
represent the 0–1 response models are (1) the logistic and (2) the normal,
the former giving rise to the logit model and the latter to the probit (or
normit) model.
The logit model
Let us start with home ownership example to explain the basic
ideas underlying the logit model.
Recall that in explaining home ownership in relation to income,
the LPM was
$P_i = \beta_1 + \beta_2 X_i$  (4)
where X is income and $P_i = P(Y_i = 1 \mid X_i)$ is the probability that
the family owns a house.
But now consider the following representation of home
ownership:
$P_i = \dfrac{1}{1 + e^{-(\beta_1 + \beta_2 X_i)}}$  (5)
For ease of exposition, we write Eq. (5) as
$P_i = \dfrac{1}{1 + e^{-Z_i}} = \dfrac{e^{Z_i}}{1 + e^{Z_i}}$  (6)
where $Z_i = \beta_1 + \beta_2 X_i$.
Equation (6) represents what is known as the (cumulative)
logistic distribution function.
It is easy to verify that as 𝑍𝑖 ranges from −∞ to +∞, 𝑃𝑖 ranges
between 0 and 1 and that 𝑃𝑖 is nonlinearly related to 𝑍𝑖 (i.e.,
𝑋𝑖 ), thus satisfying the two requirements considered earlier.
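A quick numerical check of these two properties, applying Eq. (6) directly to a few values of Z:

```python
# The logistic CDF of Eq. (6) maps any Z into (0, 1) and is S-shaped.
import numpy as np

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
p = 1.0 / (1.0 + np.exp(-z))
print(p)  # [4.5e-05, 0.119, 0.5, 0.881, 0.99995]: always strictly inside (0, 1)
```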
If 𝑃𝑖 , the probability of owning a house, is given by Eq. (6),
then (1 − 𝑃𝑖 ), the probability of not owning a house, is
$(1 - P_i) = \dfrac{1}{1 + e^{Z_i}}$  (7)
Therefore, we can write
$\dfrac{P_i}{1 - P_i} = \dfrac{e^{Z_i}/(1 + e^{Z_i})}{1/(1 + e^{Z_i})} = e^{Z_i}$  (8)
Now $P_i/(1 - P_i)$ is simply the odds ratio in favor of owning a
house—the ratio of the probability that a family will own a
house to the probability that it will not own a house.
Now if we take the natural log of Eq. (8), we obtain a very
interesting result, namely,
$L_i = \ln\!\left(\dfrac{P_i}{1 - P_i}\right) = \ln e^{Z_i} = Z_i = \beta_1 + \beta_2 X_i$  (9)
that is, L, the log of the odds ratio, is not only linear in X, but also
(from the estimation viewpoint) linear in the parameters. L is
called the logit, and hence the name logit model for models like
Eq. (9).
Features of the logit model.
As P goes from 0 to 1 (i.e., as Z varies from −∞ to +∞), the logit
L goes from −∞ to +∞. That is, although the probabilities (of
necessity) lie between 0 and 1, the logits are not so bounded.
Although L is linear in X, the probabilities themselves are not.
Although we have included only a single X variable, or
regressor, in the preceding model, one can add as many
regressors as may be dictated by the underlying theory.
If L, the logit, is positive, it means that when the value of the
regressor(s) increases, the odds that the regressand equals 1
(meaning some event of interest happens) increases. If L is
negative, the odds that the regressand equals 1 decreases as the
value of X increases. To put it differently, the logit becomes
negative and increasingly large in magnitude as the odds ratio
decreases from 1 to 0 and becomes increasingly large and
positive as the odds ratio increases from 1 to infinity.
More formally, the interpretation of the logit model given in Eq. (9) is as
follows: 𝜷𝟐, the slope, measures the change in L for a unit change in X; that
is, it tells how the log-odds in favor of owning a house change as income
changes by a unit, say, $1,000.
The intercept 𝜷𝟏 is the value of the log-odds in favor of owning a house if
income is zero. Like most interpretations of intercepts, this interpretation
may not have any physical meaning.
Given a certain level of income, say, X* , if we actually want to estimate not
the odds in favor of owning a house but the probability of owning a house
itself, this can be done directly from Eq. (6) once the estimates of 𝛽1 and 𝛽2
are available.
Whereas the LPM assumes that 𝑃𝑖 is linearly related to 𝑋𝑖 , the logit model
assumes that the log of the odds ratio is linearly related to 𝑋𝑖 .
Estimation of the Logit Model
For estimation purposes, we write Eq. (9) as follows:
$L_i = \ln\!\left(\dfrac{P_i}{1 - P_i}\right) = \beta_1 + \beta_2 X_i + U_i$  (10)
Using the estimated $\hat{P}_i$, we can obtain the estimated logit as
$\hat{L}_i = \ln\!\left(\dfrac{\hat{P}_i}{1 - \hat{P}_i}\right) = \hat{\beta}_1 + \hat{\beta}_2 X_i$  (11)
If $N_i$ is the number of observations corresponding to each $X_i$
(grouped data), it can be shown that
$U_i \sim N\!\left(0, \dfrac{1}{N_i P_i (1 - P_i)}\right)$  (12)
that is, $U_i$ follows the normal distribution with zero mean and
variance equal to
$\sigma_i^2 = \dfrac{1}{N_i P_i (1 - P_i)}$  (13)
Interpretation of coefficients
An increase in x increases or decreases the likelihood that
y = 1; that is, it makes that outcome more or less likely.
We interpret the sign of the coefficient but not its
magnitude. The magnitude cannot be interpreted directly from the
coefficient, because different models have different scales of
coefficients.
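To make the estimation concrete, here is a minimal sketch of fitting a logit model by maximum likelihood with statsmodels. The home-ownership data are simulated; the variable names and the "true" coefficients are assumptions for illustration, not results from the text.

```python
# Sketch: logit estimation by maximum likelihood on simulated
# home-ownership data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
income = rng.uniform(10, 100, 500)            # X: income in $1,000s
z = -4.0 + 0.08 * income                      # assumed true index Z_i
own = rng.binomial(1, 1 / (1 + np.exp(-z)))   # Y: owns a house (0/1)

logit = sm.Logit(own, sm.add_constant(income)).fit(disp=0)
print(logit.params)             # beta_1 (intercept), beta_2 (slope): sign is
                                # interpretable, raw magnitude is not
print(np.exp(logit.params[1]))  # exp(beta_2): factor by which the odds
                                # change per one-unit increase in income
```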
In probability theory and statistics, the logistic distribution is
a continuous probability distribution.
Its cumulative distribution function is the logistic function,
which appears in logistic regression .
It resembles the normal distribution in shape but has heavier
tails (higher kurtosis).
The logistic distribution has wider tails than a normal
distribution, so it assigns higher probability to extreme
outcomes and can therefore give a better description of data in
which extreme events are relatively likely.
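The tail claim can be checked numerically. In the sketch below the logistic distribution is rescaled to unit variance (scale √3/π) so that the comparison with the standard normal is not driven merely by its larger spread:

```python
# Upper-tail probabilities: the (unit-variance) logistic distribution
# puts more mass in the tails than the standard normal.
import numpy as np
from scipy.stats import logistic, norm

s = np.sqrt(3) / np.pi          # rescale logistic to unit variance
for z in [2.0, 3.0, 4.0]:
    print(z, 1 - norm.cdf(z), 1 - logistic.cdf(z, scale=s))
# the logistic tail probability is larger at every z shown
```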
Figure: graph of the logistic distribution
The Probit Model
As we have noted, to explain the behavior of a dichotomous dependent
variable we will have to use a suitably chosen cumulative distribution
function (CDF).
The logit model uses the cumulative logistic function.
But this is not the only CDF that one can use. In some applications,
the normal CDF has been found useful.
The estimating model that emerges from the normal CDF is
popularly known as the probit model, although sometimes it is also
known as the normit model.
Instead of using the cumulative logistic function to transform the model, the
cumulative normal distribution is sometimes used instead. This gives rise to
the probit model.
The probit model is based on the standard normal CDF:
$F(z_i) = \Phi(z_i) = \displaystyle\int_{-\infty}^{z_i} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$
This function is the cumulative distribution function for a
standard normally distributed random variable.
As for the logistic approach, this function provides a
transformation to ensure that the fitted probabilities will lie
between zero and one.
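A sketch of the probit counterpart, reusing the simulated home-ownership data (own, income) from the logit example above; the comparison of coefficient scales is a well-known rule of thumb, not an exact relation:

```python
# Sketch: probit estimation on the same simulated data. Fitted
# probabilities come from the standard normal CDF, so they always lie
# in (0, 1).
import statsmodels.api as sm
from scipy.stats import norm

probit = sm.Probit(own, sm.add_constant(income)).fit(disp=0)
print(probit.params)   # same signs as the logit fit; logit coefficients
                       # are typically about 1.6-1.8 times the probit ones

z50 = probit.params[0] + probit.params[1] * 50
print(norm.cdf(z50))   # fitted Pr(own = 1) at income = 50
```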
Here the observed dependent variable $Y_i$ takes on one of the
values 0 and 1 according to the following criterion, stated in
terms of a latent variable $Y_i^*$:
$Y_i = 1$ if $Y_i^* > 0$, and $Y_i = 0$ otherwise.
• Interpretation of coefficients in probit models
Continuous explanatory variables: the partial effect of a one-unit
change in $x_j$ on the probability that y = 1 is
$\dfrac{\partial P(y = 1 \mid x)}{\partial x_j} = \phi(x'\beta)\,\beta_j$
where $\phi(\cdot)$ is the standard normal density.
Discrete explanatory variables: if, for example, the explanatory
variable $x_k$ increases by one unit, the effect is the change in
the fitted probability, $\Phi(x'\beta)$ evaluated at $x_k + 1$
minus $\Phi(x'\beta)$ evaluated at $x_k$.
• Partial effects are nonlinear and depend on the level of X.
Researchers often report the marginal effect, the change in the
predicted probability for a unit change in x, usually evaluated
at the sample means of the regressors.
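The sketch below evaluates this partial effect for the simulated probit fit above, both by the formula $\phi(x'\beta)\beta_j$ at the regressor means and with statsmodels' built-in marginal-effects routine:

```python
# Marginal effect of income in the probit model, evaluated at the mean:
# d Pr(y = 1)/d x_j = phi(x'beta) * beta_j.
import numpy as np
from scipy.stats import norm

xbar = np.array([1.0, income.mean()])         # [constant, mean income]
index = xbar @ probit.params                  # x'beta at the means
print(norm.pdf(index) * probit.params[1])     # hand-computed marginal effect
print(probit.get_margeff(at="mean").margeff)  # statsmodels equivalent
```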
Choosing between the logit and probit models
For the majority of the applications, the logit and probit models will
give very similar characterisations of the data because the densities
are very similar.
That is, the fitted regression will be virtually indistinguishable and the
implied relationships between the explanatory variables and the
probability that 𝑦𝑖 = 1 will also be very similar.
Both approaches are much preferred to the linear probability model.
But the logit model assumes a logistic distribution for the errors,
whereas the probit model assumes a normal distribution.
Figure: the logit and probit models compared
The Tobit Model
The Tobit model (developed by, and named after, Tobin (1958)) is
an extension of the probit model that allows us to estimate
models that use censored variables.
Censored variables contain regular values for some of the cases
in the sample, while for the remaining cases we observe only that
the variable lies at (or beyond) some limit.
Censoring and truncation
Censoring is when the limit observations are in the
sample (only the value of the dependent variable is
censored) and truncation is when the observations are
not in the sample.
Censored sample: include consumers who consume
zero quantities of a product.
Truncated sample: only include consumers who
choose positive quantities of a product.
Censoring: When an observation is incomplete due to some
random cause.
Truncation: When the incomplete nature of the observation is
due to a systematic selection process inherent to the study
design.
For a concrete example of truncation, car insurance companies never
hear about accidents where the damage is less than the deductible,
because people do not report them. This is left truncation; we never
see data on these incidents at all.
For an example of right censoring, when a sick patient decides to stop
seeing their doctor, or moves to a different city, all that is known
is that they were alive on the day they left, but we do not know when
they died.
Censoring: some observations will be censored, meaning that
we only know that they are below (or above) some bound.
This can for instance occur if we measure the concentration of
a chemical in a water sample. If the concentration is too low,
the laboratory equipment cannot detect the presence of the
chemical. It may still be present though, so we only know that
the concentration is below the laboratory's detection limit.
If the detection limit is 1.5, so that observations falling below
this limit are censored, our example data set would become:
<1.5, <1.5, 2, 4, 5
that is, we don't know the actual values of the first two
observations, but only that they are smaller than 1.5.
The censored sample is representative of the
population (only the mean for the dependent variable
is not) because all observations are included.
The truncated sample is not representative of the
population because some observations are not
included.
Truncation involves a greater loss of information than censoring: with
truncation entire observations are missing, whereas with censoring only
the values of the dependent variable are replaced by the limit.
Censored sample: observe people that do not work but their
work hours are recorded as zero.
Truncated sample: do not observe anything about people who
do not work.
A sample truncated from below will have fewer observations and a
higher mean than the corresponding censored sample.
Because of censoring, the dependent variable y is the incompletely
observed value of the latent dependent variable y*.
An income of y* = $120,000 will be recorded as y = $100,000 under
top coding at $100,000.
Censoring from below
The actual value of the dependent variable y is observed if the
latent variable y* is above the limit, and the limit itself is
recorded for the censored observations:
y = y* if y* > L,  y = L if y* ≤ L
where L is the lower limit. For example, we observe the actual hours
worked for people who work, and zero for people who do not work.
Censoring from above
The actual value of the dependent variable y is observed if the
latent variable y* is below the limit, and the limit itself is
recorded for the censored observations:
y = y* if y* < U,  y = U if y* ≥ U
where U is the upper limit. If people make below $100,000, we observe
their actual income; if they make above $100,000, we record their
income as 100,000 (censored values).
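To make the Tobit idea concrete, here is a minimal sketch assuming censoring from below at zero: a latent y* is generated, only y = max(0, y*) is observed, and the standard Tobit log-likelihood is maximized directly. Since statsmodels has no built-in Tobit estimator, scipy.optimize is used; all parameter values are illustrative assumptions.

```python
# Sketch of Tobit (type I) estimation with censoring from below at 0:
# latent y* = b0 + b1*x + e; observed y = max(0, y*).
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 1000
x = rng.normal(size=n)
y_star = -0.5 + 1.0 * x + rng.normal(size=n)   # latent variable
y = np.maximum(y_star, 0.0)                    # observed, censored at 0
cens = y == 0.0                                # censoring indicator

def neg_loglik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)                          # keeps sigma positive
    mu = b0 + b1 * x
    # uncensored: normal density; censored: Pr(y* <= 0) = Phi(-mu/s)
    ll_unc = norm.logpdf(y[~cens], mu[~cens], s)
    ll_cen = norm.logcdf(-mu[cens] / s)
    return -(ll_unc.sum() + ll_cen.sum())

res = minimize(neg_loglik, x0=[0.0, 0.0, 0.0], method="BFGS")
print(res.x[:2], np.exp(res.x[2]))   # estimates of b0, b1, sigma
```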
Multinomial Logit
• We are often faced with choices involving more than two
alternatives; these are called multinomial choice situations.
• If you are shopping for a laundry detergent, which one do you
choose? Tide, Cheer, Arm & Hammer, Wisk, and so on.
• If you enroll in the business school, will you major in
economics, marketing, management, finance, or accounting?
• The estimation and interpretation of these models is, in
principle, similar to that in logit and probit models. The models
go under the names:
multinomial logit
conditional logit
multinomial probit
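A minimal sketch of the first of these, fitting a multinomial logit with statsmodels on simulated data for a three-way choice; the data-generating utilities are assumptions for illustration only.

```python
# Sketch: multinomial logit for an unordered three-way choice.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 600
x = rng.normal(size=n)                      # one individual characteristic
# assumed utilities of alternatives 0, 1, 2; adding Gumbel noise and
# taking the argmax generates logit-consistent choices
u = np.column_stack([np.zeros(n),
                     0.5 + 0.8 * x,
                     -0.2 - 0.4 * x])
choice = np.argmax(u + rng.gumbel(size=(n, 3)), axis=1)

mnl = sm.MNLogit(choice, sm.add_constant(x)).fit(disp=0)
print(mnl.params)   # coefficients of alternatives 1 and 2 relative to 0
```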
Ordered Choice Models
• The choice options in multinomial and conditional logit models
have no natural ordering or arrangement. However, in some cases
choices are ordered in a specific way.
• Examples:
1. Results of opinion surveys in which responses can be strongly
disagree, disagree, neutral, agree or strongly agree
2. Assignment of grades or work performance ratings
3. Standard and Poor's rates bonds as AAA, AA, A, BBB and so on
4. Levels of employment: unemployed, part-time, or full-time
• When modeling these types of outcomes, numerical values are
assigned to the outcomes, but the numerical values are ordinal and
reflect only the ranking of the outcomes.
In the first example, we might assign a dependent variable y the values:
y = 1 (strongly disagree), 2 (disagree), 3 (neutral), 4 (agree),
5 (strongly agree)
• There may be a natural ordering to college choice. We might rank
the possibilities as:
y = 3 (4-year college, the full college experience)
y = 2 (2-year college, a partial college experience)
y = 1 (no college)
• The usual linear regression model is not appropriate for such
data, because in regression we would treat the y values as having
some numerical meaning when they do not.
Ordered Probit Choice Probabilities
• When faced with a ranking problem, we develop a "sentiment"
about how we feel concerning the alternative choices; the higher
the sentiment, the more likely a higher-ranked alternative will be
chosen.
This sentiment is, of course, unobservable to the econometrician.
Unobservable variables that enter decisions are called latent
variables.
• For college choice, a latent variable may be grades:
$y_i^* = \beta\, GRADES_i + e_i$
This model is not a regression model, because the dependent
variable is unobservable; consequently it is sometimes called an
index model.
Figure: ordinal choices relative to the thresholds $\mu_1$ and $\mu_2$
• We can now specify:
y = 3 (4-year college) if $y_i^* > \mu_2$
y = 2 (2-year college) if $\mu_1 < y_i^* \le \mu_2$
y = 1 (no college) if $y_i^* \le \mu_1$
• If we assume that the errors have the standard normal
distribution, N(0, 1), an assumption that defines the ordered
probit model, then we can calculate the following:
$P(y_i = 1) = P(y_i^* \le \mu_1) = P(\beta\, GRADES_i + e_i \le \mu_1)$
$\quad = P(e_i \le \mu_1 - \beta\, GRADES_i) = \Phi(\mu_1 - \beta\, GRADES_i)$
• Also:
$P(y_i = 2) = P(\mu_1 < y_i^* \le \mu_2) = P(\mu_1 < \beta\, GRADES_i + e_i \le \mu_2)$
$\quad = P(\mu_1 - \beta\, GRADES_i < e_i \le \mu_2 - \beta\, GRADES_i)$
$\quad = \Phi(\mu_2 - \beta\, GRADES_i) - \Phi(\mu_1 - \beta\, GRADES_i)$
• Finally:
$P(y_i = 3) = P(y_i^* > \mu_2) = P(\beta\, GRADES_i + e_i > \mu_2)$
$\quad = P(e_i > \mu_2 - \beta\, GRADES_i) = 1 - \Phi(\mu_2 - \beta\, GRADES_i)$
Estimation and Interpretation
• If we observe a random sample of N = 3 individuals, with the
first not going to college (y1 = 1), the second attending a
two-year college (y2 = 2), and the third attending a four-year
college (y3 = 3), then the likelihood function is:
$L(\beta, \mu_1, \mu_2) = P(y_1 = 1) \times P(y_2 = 2) \times P(y_3 = 3)$
• Econometric software includes options for both ordered probit,
which depends on the errors being standard normal, and ordered
logit, which depends on the assumption that the random errors
follow a logistic distribution.
Most economists will use the normality assumption; many other
social scientists use the logistic. There is little difference
between the results.
• The types of questions we can answer with this model are the
following:
1. What is the probability that a high school graduate with
GRADES = 2.5 (on a 13-point scale, with one being the highest)
will attend a two-year college?
$\hat{P}(y = 2 \mid GRADES = 2.5) = \Phi(\hat{\mu}_2 - \hat{\beta} \times 2.5) - \Phi(\hat{\mu}_1 - \hat{\beta} \times 2.5)$
• The types of questions (continued):
3. If we treat GRADES as a continuous variable, what is the
marginal effect on the probability of each outcome, given a
one-unit change in GRADES?
$\dfrac{\partial P(y = 1)}{\partial GRADES} = -\phi(\mu_1 - \beta\, GRADES)\,\beta$
$\dfrac{\partial P(y = 2)}{\partial GRADES} = \left[\phi(\mu_1 - \beta\, GRADES) - \phi(\mu_2 - \beta\, GRADES)\right]\beta$
$\dfrac{\partial P(y = 3)}{\partial GRADES} = \phi(\mu_2 - \beta\, GRADES)\,\beta$
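Continuing the sketch above (same assumed beta, mu1, mu2), these marginal effects can be computed directly; note that they sum to zero across the three outcomes, since the probabilities must sum to one:

```python
# Marginal effects of GRADES on each outcome probability.
from scipy.stats import norm

def marginal_effects(grades):
    z1 = mu1 - beta * grades
    z2 = mu2 - beta * grades
    dp1 = -norm.pdf(z1) * beta                  # d P(y = 1) / d GRADES
    dp2 = (norm.pdf(z1) - norm.pdf(z2)) * beta  # d P(y = 2) / d GRADES
    dp3 = norm.pdf(z2) * beta                   # d P(y = 3) / d GRADES
    return dp1, dp2, dp3

print(marginal_effects(2.5))   # effects sum to zero across outcomes
```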
THE END OF CHAPTER
FIVE
THANK YOU FOR YOUR
ATTENTION!!