Cópia de Aula5 - Contagem

The document discusses count data models, primarily focusing on the Poisson regression model and its limitations due to overdispersion. It explains the need for alternative models like the Negative Binomial model and methods for testing overdispersion, as well as addressing issues related to truncated and zero-inflated data. Additionally, it highlights an application involving cross-sectional data from a Brazilian survey on food security and crime.

Uploaded by

Aldryn Dylan

Count Data Model

Greene, Section 18.4
Cameron and Trivedi, Chapter 20
Basic Idea
• Y = a non-negative integer count of events,
generally taking few and small values
(0, 1, 2, ...)
• Ex: number of visits to a doctor in a year,
number of patents registered by a firm in a
year, number of times you are robbed in a
year
• Starting point: Poisson process
Rare Events Law
• The total number of events will approximately
follow the Poisson distribution if an event can
occur in any trial among a large number of them,
but the probability of occurrence in a given trial is
small.
• The Poisson distribution describes the number of
occurrences of the event, with density given by:
$$\Pr(Y = y) = \frac{\exp(-\lambda)\,\lambda^{y}}{y!}, \quad y = 0, 1, 2, \ldots$$
Rare Events Law
• λ = intensity parameter
• Poisson property: E[Y] = Var[Y] = λ →
equidispersion (mean = variance)
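The rare events law and the equidispersion property can be checked numerically. The following is a minimal sketch in Python (standard library only); the values n = 1000 and p = 0.003 and the helper names are illustrative assumptions, not part of the lecture.

```python
import math

def poisson_pmf(y, lam):
    """Pr(Y = y) = exp(-lam) * lam**y / y!"""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def binomial_pmf(y, n, p):
    """Probability of y events in n independent trials, each with probability p."""
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

# Rare events law: many trials (n), small probability per trial (p).
n, p = 1000, 0.003          # illustrative values
lam = n * p                 # matching Poisson intensity

# Largest pointwise gap between the two distributions over y = 0..20
max_gap = max(abs(binomial_pmf(y, n, p) - poisson_pmf(y, lam)) for y in range(21))

# Equidispersion: mean and variance computed from the pmf, truncated at y = 100
# (the remaining tail mass is negligible for lam = 3)
support = range(101)
mean = sum(y * poisson_pmf(y, lam) for y in support)
var = sum((y - mean) ** 2 * poisson_pmf(y, lam) for y in support)

print(f"max |binomial - Poisson| = {max_gap:.5f}")
print(f"mean = {mean:.6f}, variance = {var:.6f}")
```

The binomial and Poisson probabilities should be close, and the computed mean and variance should both be close to λ.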
Poisson Regression Model
• Introducing the subscript i for each observation
and the hypothesis that the observations (y_i, x_i) are
independent and identically distributed →
Poisson regression model.
• The model establishes that each y_i is a draw
from a Poisson population with parameter λ_i,
which is related to the regressors x_i.
Poisson Regression Model
• The density of this distribution is given by:
$$\Pr(Y = y_i) = \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!}$$

• The usual hypothesis for the parametrization of λ_i is the log-linear model:
$$\lambda_i = \exp(x_i'\beta) \quad \text{or} \quad \ln(\lambda_i) = x_i'\beta$$

• So, E[y_i | x_i] = Var[y_i | x_i] = λ_i = exp(x_i'β)

• Obs: Poisson regression is intrinsically heteroskedastic
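Maximum Likelihood Estimation
$$\Pr(Y = y_i) = \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!}$$
$$\ln[\Pr(Y = y_i)] = \ln[\exp(-\lambda_i)] + \ln(\lambda_i^{y_i}) - \ln(y_i!) = -\lambda_i + y_i \ln \lambda_i - \ln(y_i!)$$

But remember that $\lambda_i = \exp(x_i'\beta)$

So, we can write:

$$\ln L(\beta) = \sum_{i=1}^{n} \left[ y_i x_i'\beta - \exp(x_i'\beta) - \ln(y_i!) \right]$$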
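Maximum Likelihood Estimation
$$\ln L(\beta) = \sum_{i=1}^{n} \left[ y_i x_i'\beta - \exp(x_i'\beta) - \ln(y_i!) \right]$$

$$\frac{\partial \ln L(\beta)}{\partial \beta} = \sum_{i=1}^{n} \left[ y_i x_i - \exp(x_i'\beta)\, x_i \right] = \sum_{i=1}^{n} \left[ y_i - \exp(x_i'\beta) \right] x_i = 0$$

$$\frac{\partial^2 \ln L(\beta)}{\partial \beta\, \partial \beta'} = -\sum_{i=1}^{n} \exp(x_i'\beta)\, x_i x_i'$$

The Hessian is negative definite for all x and β (provided the regressors have full column rank), so ln L(β) is globally concave.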
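The first-order conditions have no closed-form solution, so they are solved iteratively. Below is a minimal sketch of a Newton-Raphson iteration for a Poisson regression with a constant and a single regressor, in Python (standard library only). The data, the function names and the starting values are illustrative assumptions; the 2x2 matrix inverse is written out by hand to keep the example self-contained.

```python
import math

def score_and_information(b0, b1, x, y):
    """Score vector and information matrix for ln(lambda_i) = b0 + b1 * x_i."""
    lam = [math.exp(b0 + b1 * xi) for xi in x]
    # Score: sum_i (y_i - lambda_i) * (1, x_i)'
    g0 = sum(yi - li for yi, li in zip(y, lam))
    g1 = sum((yi - li) * xi for xi, yi, li in zip(x, y, lam))
    # Information (minus the Hessian): sum_i lambda_i * (1, x_i)'(1, x_i)
    a = sum(lam)
    b = sum(li * xi for xi, li in zip(x, lam))
    c = sum(li * xi * xi for xi, li in zip(x, lam))
    return (g0, g1), (a, b, c)

def poisson_newton(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for a Poisson regression with a constant and one regressor."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start from the constant-only fit
    for _ in range(max_iter):
        (g0, g1), (a, b, c) = score_and_information(b0, b1, x, y)
        det = a * c - b * b
        # Newton step: information^{-1} * score (2x2 inverse written out)
        d0 = (c * g0 - b * g1) / det
        d1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    # Estimated covariance matrix: inverse of sum_i lambda_hat_i x_i x_i'
    _, (a, b, c) = score_and_information(b0, b1, x, y)
    det = a * c - b * b
    cov = [[c / det, -b / det], [-b / det, a / det]]
    return (b0, b1), cov

# Small made-up data set (illustrative only)
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
y = [1, 0, 2, 1, 3, 4, 6, 9]

(b0_hat, b1_hat), cov = poisson_newton(x, y)
score, _ = score_and_information(b0_hat, b1_hat, x, y)
print("beta_hat:", (round(b0_hat, 4), round(b1_hat, 4)))
print("std. errors:", [round(math.sqrt(cov[k][k]), 4) for k in range(2)])
print("score at beta_hat:", score)
```

Because the log-likelihood is globally concave, the iteration typically needs only a handful of steps, and the score evaluated at the solution should be numerically zero.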
Maximum Likelihood Estimator
• Newton's method is generally used and
normally converges fast.
• Variance-covariance matrix estimator for β̂:
$$\widehat{\mathrm{Var}}(\hat\beta) = \left[ \sum_{i=1}^{n} \hat\lambda_i\, x_i x_i' \right]^{-1}$$
Conditional mean and variance
• Given the estimates, the mean prediction for each
observation i is given by:
$$\hat{E}[y_i \mid x_i] = \hat\lambda_i = \exp(x_i'\hat\beta)$$
• The estimated variance of this prediction is given by:
$$\widehat{\mathrm{Var}}(\hat\lambda_i \mid x_i) = \hat\lambda_i^{2}\, x_i' V x_i,$$
where V is the estimated asymptotic covariance
matrix for β̂.
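One way to see where the squared fitted value in the prediction variance comes from is a first-order (delta-method) expansion of the fitted mean around the estimated coefficients. The sketch below uses the slide's symbols; the gradient notation is an addition for illustration.

```latex
\frac{\partial \hat\lambda_i}{\partial \hat\beta}
  = \exp(x_i'\hat\beta)\, x_i = \hat\lambda_i\, x_i
\quad\Longrightarrow\quad
\widehat{\mathrm{Var}}(\hat\lambda_i)
  \approx (\hat\lambda_i x_i)'\, V\, (\hat\lambda_i x_i)
  = \hat\lambda_i^{2}\; x_i' V x_i
```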
Marginal Effects
• As stated, the prediction for each observation i is
given by:
$$\hat{E}[y_i \mid x_i] = \hat\lambda_i = \exp(x_i'\hat\beta)$$

• So, marginal effects are given by:
$$\frac{\partial E(y \mid x)}{\partial x_j} = \beta_j \exp(x'\beta)$$
• For example, if β̂_j = 0.25 and exp(x'β̂) = 3, then a
one-unit change in the j-th regressor increases
the expectation of y by 0.75 units
Marginal Effects
• As before, if you want a single summary measure, it is
common to report the average marginal effect,
calculated over all individuals in the sample:
$$N^{-1} \sum_{i} \frac{\partial E(y_i \mid x_i)}{\partial x_{ij}} = \beta_j\, N^{-1} \sum_{i} \exp(x_i'\beta)$$
• If β_j is twice as large as β_k, then the effect of
changing the j-th regressor by one unit is twice
that of changing the k-th regressor by one unit.
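The marginal effect formulas can be sketched in a few lines of Python (standard library only). The coefficient vectors and the three-row sample below are hypothetical values chosen for illustration; the first case is set up so that exp(x'β) = 3 and β_j = 0.25, as in the numerical example above.

```python
import math

def marginal_effect(beta, x, j):
    """d E[y|x] / d x_j = beta_j * exp(x'beta) for the log-linear Poisson mean."""
    xb = sum(b * v for b, v in zip(beta, x))
    return beta[j] * math.exp(xb)

def average_marginal_effect(beta, X, j):
    """Average of the individual marginal effects over the rows of X."""
    return sum(marginal_effect(beta, x, j) for x in X) / len(X)

# Numerical example: beta_j = 0.25 and exp(x'beta) = 3.
# Hypothetical setup: constant with coefficient ln(3), regressor evaluated at 0.
beta = [math.log(3.0), 0.25]
me = marginal_effect(beta, [1.0, 0.0], j=1)

# Hypothetical coefficients and sample to illustrate the ratio property
beta2 = [0.1, 0.4, 0.2]                                   # constant, x_1, x_2
X = [[1.0, 0.5, 1.0], [1.0, 1.5, 0.0], [1.0, 2.0, 2.0]]
ame_1 = average_marginal_effect(beta2, X, j=1)
ame_2 = average_marginal_effect(beta2, X, j=2)

print(f"marginal effect in the numerical example: {me:.4f}")
print(f"AME ratio (regressor 1 vs regressor 2): {ame_1 / ame_2:.4f}")
```

Since every individual effect is the coefficient times the same factor exp(x_i'β), the ratio of two average marginal effects equals the ratio of the coefficients.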
Important issue
• The Poisson distribution is usually too restrictive for
count data.
• The distribution is parametrized in terms of a
single scalar (λ), so that we have equidispersion
(conditional mean = conditional variance).
• However, in many applications with count data,
the variance exceeds the mean
(overdispersion).
• So, we need to test for overdispersion. If it is
present, we will need to
estimate other models.
Testing Overdispersion
• A statistical test of overdispersion is therefore
highly desirable after running a Poisson
regression.
• Most count models with overdispersion
specify overdispersion to be of the form:
$$V[y_i \mid x_i] = \lambda_i + \alpha\, g(\lambda_i)$$
• where α is an unknown parameter and g(·) is a
known function, most commonly g(λ_i) = λ_i²
or g(λ_i) = λ_i
Testing Overdispersion
• We assume that, under both the null and the
alternative hypothesis, the mean is correctly
specified as, for example, exp(x'β).
• Under H0: α = 0 (equidispersion)
• An overdispersion test statistic can be
computed by estimating the Poisson model,
constructing fitted values λ̂_i = exp(x_i'β̂)
Testing Overdispersion
• And running the auxiliary OLS regression (without a
constant):
$$\frac{(y_i - \hat\lambda_i)^2 - y_i}{\hat\lambda_i} = \alpha\, \frac{g(\hat\lambda_i)}{\hat\lambda_i} + u_i$$
• where u_i is an error term. The t-statistic for α is
asymptotically normal under the null hypothesis
of no overdispersion
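The auxiliary regression can be sketched as follows in Python (standard library only). This is an illustration under stated assumptions: the counts are made up, the fitted values come from a constant-only Poisson model (whose fitted mean is the sample mean), g(λ) = λ² is used, and the t-statistic relies on the usual OLS standard error.

```python
import math

def overdispersion_test(y, lam_hat, g=lambda lam: lam ** 2):
    """Auxiliary OLS regression without a constant:
    ((y_i - lam_i)^2 - y_i) / lam_i = alpha * g(lam_i) / lam_i + u_i.
    Returns (alpha_hat, t_statistic)."""
    z = [((yi - li) ** 2 - yi) / li for yi, li in zip(y, lam_hat)]
    w = [g(li) / li for li in lam_hat]
    sww = sum(wi * wi for wi in w)
    alpha = sum(wi * zi for wi, zi in zip(w, z)) / sww
    resid = [zi - alpha * wi for zi, wi in zip(z, w)]
    s2 = sum(r * r for r in resid) / (len(y) - 1)   # one estimated parameter
    se = math.sqrt(s2 / sww)
    return alpha, alpha / se

# Made-up counts; in a constant-only Poisson model the fitted value is the sample mean
y = [0, 0, 0, 0, 1, 1, 2, 3, 7, 12]
lam_hat = [sum(y) / len(y)] * len(y)

alpha_hat, t_stat = overdispersion_test(y, lam_hat)
print(f"alpha_hat = {alpha_hat:.4f}, t = {t_stat:.4f}")
```

A positive and statistically significant estimate of α points to overdispersion. In an application the fitted values would come from the estimated Poisson regression rather than from the sample mean.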
Negative Binomial Model
• Overdispersion in count data may be due to
unobserved heterogeneity.
• Suppose the distribution of a random count y is
Poisson, conditional on the parameter λ, so that
$$f(y \mid \lambda) = \frac{\exp(-\lambda)\,\lambda^{y}}{y!}$$
• In the negative binomial model, we assume that λ is
random. Let λ = μυ, where μ is a deterministic
function of x [for example, exp(x'β)] and υ > 0 is iid with
density g(υ | α)
• So, different observations may have different λ, but
part of this difference is due to the random component
υ.
Negative Binomial Model
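• The distribution of y_i conditional on x_i and υ_i remains Poisson with
conditional mean and variance λ_i = μ_i υ_i:
$$f(y_i \mid x_i, \upsilon_i) = \frac{e^{-\mu_i \upsilon_i}\,(\mu_i \upsilon_i)^{y_i}}{y_i!}$$

• The unconditional distribution f(y_i | x_i) is the expected value (over
υ_i) of f(y_i | x_i, υ_i):
$$f(y_i \mid x_i) = \int_0^{\infty} \frac{e^{-\mu_i \upsilon_i}\,(\mu_i \upsilon_i)^{y_i}}{y_i!}\, g(\upsilon_i)\, d\upsilon_i \qquad (**)$$

• In general, a gamma distribution is assumed for υ_i, so that:
$$g(\upsilon_i) = \frac{\theta^{\theta}}{\Gamma(\theta)}\, e^{-\theta \upsilon_i}\, \upsilon_i^{\theta - 1} \qquad (*)$$

Obs: If θ is a natural number (1, 2, 3, ...), then Γ(θ) = (θ − 1)!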

Negative Binomial Model
• Substituting (*) in (**) and manipulating:
$$f(y_i \mid x_i) = \frac{\Gamma(\theta + y_i)}{\Gamma(1 + y_i)\,\Gamma(\theta)} \left( \frac{\mu_i}{\mu_i + \theta} \right)^{y_i} \left( 1 - \frac{\mu_i}{\mu_i + \theta} \right)^{\theta}$$

• which is one form of the negative binomial
distribution. The distribution has conditional mean
μ_i and conditional variance μ_i[1 + (1/θ)μ_i] (NB2).
• Note that var > mean, since θ > 0 and μ_i > 0
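The stated mean and variance of the NB2 density can be verified numerically. The following Python sketch (standard library only) evaluates the density with log-gamma functions for numerical stability; the values μ = 2 and θ = 1.5 and the truncation of the support at 500 are illustrative assumptions.

```python
import math

def nb2_pmf(y, mu, theta):
    """Negative binomial (NB2) probability with mean mu and parameter theta."""
    log_p = (math.lgamma(theta + y) - math.lgamma(1 + y) - math.lgamma(theta)
             + y * math.log(mu / (mu + theta))
             # 1 - mu/(mu + theta) equals theta/(mu + theta)
             + theta * math.log(theta / (mu + theta)))
    return math.exp(log_p)

mu, theta = 2.0, 1.5          # illustrative values
support = range(500)          # tail mass beyond this point is negligible here
probs = [nb2_pmf(y, mu, theta) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

print(f"sum of probabilities = {total:.6f}")
print(f"mean = {mean:.6f} (mu = {mu})")
print(f"variance = {var:.6f} (mu * (1 + mu / theta) = {mu * (1 + mu / theta):.6f})")
```

The probabilities should sum to one, the mean should match μ, and the variance should match μ[1 + (1/θ)μ], which exceeds the mean.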
Truncation
• In some studies, inclusion in the sample requires that sampled
individuals have been engaged in the activity of interest. Then
the count data are truncated, as the data are observed only
over part of the range of the response variable.
• Examples of truncated counts include the number of bus trips
made per week in surveys taken on buses, the number of
shopping trips made by individuals sampled at a mall, and the
number of unemployment spells among a pool of
unemployed.
• In all these cases we do not observe zero counts, so the data
are said to be zero-truncated, or more generally left-
truncated.
Truncation
• Truncation leads to inconsistent parameter estimates unless
the likelihood function is suitably modified. Consider the case
of zero truncation.
• Let f(y|θ) denote the density function and F(y|θ) = Pr[Y ≤ y]
denote the cumulative distribution function of the discrete
random variable, where θ is a parameter vector. If realizations
of y less than the positive integer 1 are omitted, the zero-
truncated density is given by:
$$f(y \mid \theta, y \ge 1) = \frac{f(y \mid \theta)}{1 - F(0 \mid \theta)}, \quad y = 1, 2, \ldots$$
• In the zero-truncated Poisson case, for example, this specializes to
$$f(y \mid \mu, y \ge 1) = \frac{\exp(-\mu)\,\mu^{y}}{y!\,[1 - \exp(-\mu)]}$$
• It is possible to construct a log-likelihood based on this density
and to obtain maximum likelihood estimates.
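As an illustration of maximum likelihood under zero truncation, the sketch below fits a zero-truncated Poisson without regressors in Python (standard library only). For this simple case the first-order condition reduces to mean(y) = μ / (1 − exp(−μ)), which is solved here by bisection. The data and function names are made up for the example.

```python
import math

def ztp_pmf(y, mu):
    """Zero-truncated Poisson: exp(-mu) * mu^y / (y! * (1 - exp(-mu))), y = 1, 2, ..."""
    return math.exp(-mu) * mu ** y / (math.factorial(y) * (1.0 - math.exp(-mu)))

def ztp_loglik(mu, y):
    """Log-likelihood of a sample of zero-truncated counts."""
    return sum(math.log(ztp_pmf(yi, mu)) for yi in y)

def ztp_mle(y, tol=1e-12):
    """Solve the first-order condition mean(y) = mu / (1 - exp(-mu)) by bisection."""
    ybar = sum(y) / len(y)
    lo, hi = 1e-8, ybar          # the root lies in (0, ybar) when ybar > 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / (1.0 - math.exp(-mid)) < ybar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y = [1, 1, 2, 1, 3, 2, 1, 4, 1, 2]   # made-up zero-truncated counts
ybar = sum(y) / len(y)
mu_hat = ztp_mle(y)

print(f"sample mean = {ybar:.4f}")
print(f"truncated-Poisson MLE of mu = {mu_hat:.4f}")
```

The estimate of μ lies below the sample mean: ignoring the truncation and using the sample mean directly would overstate the Poisson parameter, which is the inconsistency the modified likelihood corrects.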
Excess of zeros
• In some applications, we have lots of zeros.
• Ex: number of homicides in a municipality, number of times
you are robbed in a year
• The zero-inflated model supplements a count density f2(·) with a
binary process with density f1(·).
• If the binary process takes value 0, with probability f1 (0), then
y = 0. If the binary process takes value 1, with probability f1(1),
then y takes count values 0, 1, 2, . . . from the count density
f2(·).
• This lets zero counts occur in two ways: as a realization of the
binary process and as a realization of the count process when
the binary random variable takes value 1.
Excess of zeros
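• The density is given by:
$$\Pr[Y = y] = \begin{cases} f_1(0) + [1 - f_1(0)]\, f_2(0) & \text{if } y = 0 \\ [1 - f_1(0)]\, f_2(y) & \text{if } y \ge 1 \end{cases}$$

• Regression models let f1(·) be a logit model and f2(·) be a
Poisson or negative binomial density.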
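The two sources of zeros can be made concrete with a zero-inflated Poisson. The Python sketch below (standard library only) uses constant parameters instead of regression functions; the names pi0 (standing for f1(0)) and mu, and their values, are illustrative assumptions.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability, playing the role of the count density f2."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def zip_pmf(y, pi0, mu):
    """Zero-inflated Poisson.
    pi0 = f1(0): probability that the binary process takes value 0 (so y = 0);
    with probability 1 - pi0 the count is drawn from Poisson(mu)."""
    if y == 0:
        return pi0 + (1.0 - pi0) * poisson_pmf(0, mu)
    return (1.0 - pi0) * poisson_pmf(y, mu)

pi0, mu = 0.3, 2.0     # illustrative values
p_zero_poisson = poisson_pmf(0, mu)
p_zero_zip = zip_pmf(0, pi0, mu)
total = sum(zip_pmf(y, pi0, mu) for y in range(100))

print(f"Pr[y = 0] under Poisson:               {p_zero_poisson:.4f}")
print(f"Pr[y = 0] under zero-inflated Poisson: {p_zero_zip:.4f}")
print(f"sum of zero-inflated probabilities:    {total:.6f}")
```

In a regression setting, pi0 would be replaced by a logit probability and mu by exp(x'β), each depending on the regressors.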
Application

https://www.anpec.org.br/encontro/2014/submissao/files_I/i12-967168f1c1bc02480e2a256adddcd66b.pdf
Application
• Cross-sectional data from the Special Supplements on Food
Security, Victimization and Justice included in the 2009 National
Household Sample Survey (PNAD, its Portuguese acronym),
carried out by the Brazilian Institute of Geography and
Statistics
• Advantages of this dataset as compared to official figures:
1) its coverage is nation-wide;
2) the response variable (i.e., crime) is free from bias caused by
measurement errors resulting from under-reporting;
3) it allows for the effects of household income, education, and
other factors on the number of crimes to be identified based on
non-victimized individuals
Application
• Dependent variable: the number of times an individual was
victimized during one year.
• Models for four types of crime were estimated separately:
robbery (roubo), theft (furto), attempted theft/robbery, and
assault (agressão)
Application
The authors estimated a Negative Binomial model and a Zero-Inflated
Negative Binomial (ZINB) model. They reject the Poisson model after testing for
equidispersion, and also discuss the ZINB because there are many zeros.
• Men are victims of theft 0.111 times more than women, ceteris paribus.
• In general, the effects are small, but the incidence is also very low.
