What is the Poisson Distribution?
The Poisson distribution is a discrete probability distribution that expresses the probability of a
given number of events occurring in a fixed interval of time or space. These events must occur
with a known constant mean rate and must be independent of the time since the last event (e.g.,
modeling the number of customers arriving at a store per hour).
The Poisson distribution is defined by a single parameter, 𝜇, which represents the average rate of
occurrence of the event in the given interval.
Probability Mass Function (PMF) of the Poisson Distribution:
The PMF of a Poisson random variable y, which represents the number of events occurring in an
interval, is given by:
Pr(𝑦 = 𝑘) = 𝑒^(−𝜇) 𝜇^𝑘 / 𝑘! ,   𝑘 = 0, 1, 2, …
o 𝜇: is the average number of occurrences (mean rate) in the interval.
o 𝑘: is the actual number of occurrences observed.
To visualize the Poisson distribution, the probability mass function can be plotted for different
values of 𝜇 (e.g., 𝜇 = 1, 2, 5, 10). This shows how the shape of the distribution changes as the
average rate of occurrence changes.
o Skewness: Lower values of μ result in a distribution that is skewed to the right, indicating
a higher probability of fewer events occurring.
o Symmetry: As μ increases, the distribution becomes more symmetric and spreads out,
meaning there is a higher probability of a larger number of occurrences.
o Variance: The variance of the Poisson distribution is equal to its mean μ. Thus, as μ
increases, both the average and the spread of possible outcomes increase.
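The curves described above can be reproduced with a short script. This is a minimal sketch (not part of the original notes), assuming SciPy and matplotlib are installed; the output file name is arbitrary.

```python
import numpy as np
from scipy.stats import poisson
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

k = np.arange(0, 31)  # counts 0..30 cover essentially all the mass for these rates
fig, ax = plt.subplots()
for mu in [1, 2, 5, 10]:
    ax.plot(k, poisson.pmf(k, mu), marker="o", label=f"mu = {mu}")
ax.set_xlabel("k (number of occurrences)")
ax.set_ylabel("Pr(y = k)")
ax.set_title("Poisson PMF for several values of mu")
ax.legend()
fig.savefig("poisson_pmf.png")
```

Plotting the curves side by side makes the right-skew at small 𝜇 and the near-symmetric, more spread-out shape at large 𝜇 immediately visible.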
Key Properties of the Poisson Distribution
1. Mean and Variance:
o The mean (expected value) of a Poisson-distributed random variable is μ.
o The variance of a Poisson-distributed random variable is also μ. This property is
known as Equidispersion, where the mean and variance are equal.
E(𝑦 ∣ 𝑥) = Var(𝑦 ∣ 𝑥) = 𝜇
2. Shape of the Distribution:
o The shape of the Poisson distribution depends on the value of μ.
o For small values of μ, the distribution is skewed to the right (more probability mass
at lower values).
o As μ increases, the distribution becomes more symmetric and bell-shaped,
resembling a normal distribution.
3. Independence of Events:
o One key assumption of the Poisson distribution is that events occur independently.
The occurrence of one event does not affect the probability of another event
occurring.
4. Probability of Zero Events:
o The probability of observing zero events (𝑘 = 0) is 𝑒^(−𝜇). This probability decreases
as μ increases, reflecting that as the average rate of occurrence grows, the likelihood
of having no events in an interval becomes smaller.
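The properties above can be checked numerically with a quick simulation (an illustration, not part of the original notes), using NumPy's random generator:

```python
import numpy as np

rng = np.random.default_rng(0)
for mu in [1.0, 5.0, 10.0]:
    draws = rng.poisson(lam=mu, size=200_000)
    # Equidispersion: sample mean and sample variance should both be close to mu.
    print(f"mu={mu}: mean={draws.mean():.3f}, var={draws.var():.3f}")
    # Probability of zero events: the empirical share of zeros approximates e^(-mu).
    print(f"  P(y=0) empirical={np.mean(draws == 0):.5f}, exact={np.exp(-mu):.5f}")
```

With 200,000 draws the sample mean and variance agree to roughly two decimal places, and the share of zeros tracks 𝑒^(−𝜇).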
Poisson regression models
Poisson models, or Poisson regression models, are a type of generalized linear model (GLM) that
extends the Poisson distribution to allow for the modeling of count data as a function of
explanatory variables. These models are used when the dependent variable (response variable)
represents counts, such as the number of occurrences of an event, and the goal is to understand
how predictor variables influence the count of events.
In Poisson regression, we model the mean 𝜇𝑖 as an exponential function of a linear predictor. This
is achieved with a log-link function, whose purpose is to guarantee that the predicted mean 𝜇𝑖 is
always positive, in line with the nature of count data (e.g., number of calls, number of incidents):
counts cannot be negative, so we need a function that ensures non-negative predictions. The log
link achieves this by exponentiating the linear predictor, so that 𝜇𝑖 (the mean number of
occurrences) is always greater than zero.
In Poisson regression, we link the mean 𝜇𝑖 to the linear predictor 𝑋𝑖 𝛽 using a logarithmic
transformation:
log (𝜇𝑖 ) = 𝑋𝑖 𝛽
o Here, log(𝜇𝑖 ) is the natural logarithm of the expected count 𝜇𝑖 .
o 𝑋𝑖 is a vector of predictor variables for observation 𝑖 (including an intercept term).
To express 𝜇𝑖 directly, we exponentiate both sides of the equation:
𝜇𝑖 = 𝑒^(𝑋𝑖𝛽)
This ensures that 𝜇𝑖 is always positive, because the exponential function 𝑒^𝑥 is positive for any
real number 𝑥.
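A tiny numerical illustration of this point (the coefficient values here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(5), rng.normal(size=5)])  # intercept + one covariate
beta = np.array([-2.0, 0.5])                           # hypothetical coefficients
mu = np.exp(X @ beta)                                  # log link: mu_i = exp(X_i beta)
print(mu)  # strictly positive, even when X_i beta is very negative
```

However negative the linear predictor 𝑋𝑖𝛽 becomes, exponentiating it only pushes 𝜇𝑖 toward zero, never below it.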
Formulating the Log-Likelihood Function
The log-likelihood function is derived from the Poisson PMF. It represents the logarithm of the
likelihood of observing the given data under the model.
Given the Poisson PMF, the likelihood 𝐿(𝛽) for 𝑛 independent observations is the product of the
individual probabilities:
𝐿(𝛽) = ∏_(𝑖=1)^(𝑛) Pr(𝑌𝑖 = 𝑦𝑖)
Taking the logarithm to form the log-likelihood function ℒ(𝛽):
ℒ(𝛽) = log ( ∏_(𝑖=1)^(𝑛) 𝑒^(−𝜇𝑖) 𝜇𝑖^(𝑦𝑖) / 𝑦𝑖! )
This simplifies using the logarithmic properties:
ℒ(𝛽) = ∑_(𝑖=1)^(𝑛) ( log(𝑒^(−𝜇𝑖)) + log(𝜇𝑖^(𝑦𝑖)) − log(𝑦𝑖!) )
ℒ(𝛽) = ∑_(𝑖=1)^(𝑛) ( −𝜇𝑖 + 𝑦𝑖 log(𝜇𝑖) − log(𝑦𝑖!) )
In Poisson regression, 𝜇𝑖 = 𝑒^(𝑋𝑖𝛽). Substituting this into the log-likelihood function:
ℒ(𝛽) = ∑_(𝑖=1)^(𝑛) ( −𝑒^(𝑋𝑖𝛽) + 𝑦𝑖(𝑋𝑖𝛽) − log(𝑦𝑖!) )
This is the log-likelihood function for the Poisson regression model. It is a function of the
coefficients β, which we aim to estimate.
Incident Rate Ratio (IRR)
The Incident Rate Ratio (IRR) represents the multiplicative effect of a one-unit change in a
covariate on the expected count rate. The IRR is derived directly from the Poisson regression
coefficients and gives an intuitive way to interpret the results (similar to the concept of odds ratio
that you learnt in the logit model).
For a covariate 𝑥𝑗 with an estimated coefficient 𝛽𝑗 , the IRR is calculated as:
IRR𝑗 = exp (𝛽𝑗 )
If 𝛽𝑗 = 0.2, then the IRR for 𝑥𝑗 is:
IRR𝑗 = exp (0.2) ≈ 1.22
This means that for a one-unit increase in 𝑥𝑗 , the expected count increases by 22% (1.22 − 1 =
0.22).
Marginal Effects
The Marginal Effect in Poisson regression quantifies the absolute change in the expected count 𝜇
for a one-unit change in the covariate 𝑥𝑗 , holding all other variables constant. Mathematically, it
is the partial derivative of the expected count 𝜇𝑖 with respect to the covariate 𝑥𝑗 :
∂𝜇𝑖/∂𝑥𝑗 = 𝜇𝑖 ⋅ 𝛽𝑗
The marginal effect gives the absolute change in the expected count for a one-unit increase in
𝑥𝑗; its size depends on both the value of 𝛽𝑗 and the current expected count 𝜇𝑖.
For example, if 𝛽𝑗 = 0.2 and the current expected count is 𝜇𝑖 = 5, the marginal effect is:
∂𝜇𝑖/∂𝑥𝑗 = 5 ⋅ 0.2 = 1
This means that a one-unit increase in 𝑥𝑗 will increase the expected count by approximately 1 unit.
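The identity ∂𝜇𝑖/∂𝑥𝑗 = 𝜇𝑖 ⋅ 𝛽𝑗 can be verified against a finite-difference derivative (the coefficient and covariate values below are hypothetical):

```python
import numpy as np

beta0, beta_j = 0.3, 0.2                 # hypothetical intercept and slope

def mu(xj):
    return np.exp(beta0 + beta_j * xj)   # Poisson mean under the log link

xj = 4.0
analytic = mu(xj) * beta_j                            # mu_i * beta_j
h = 1e-6
numeric = (mu(xj + h) - mu(xj - h)) / (2 * h)         # central difference
print(analytic, numeric)                              # the two values agree
```

The agreement confirms that the marginal effect scales with the current level of the expected count, unlike in a linear model where it would be constant.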
Negative Binomial Regression Model
The Negative Binomial Regression Model (NBRM) is an extension of the Poisson Regression
Model (PRM); its main purpose is to account for overdispersion, which occurs when the variance
of the count data is greater than the mean.
Poisson regression assumes that the variance equals the mean (Var(𝑌𝑖 ∣ 𝑋𝑖) = 𝜇𝑖), which works
well for count data where the spread of the data is proportional to the mean. However, in many
real-world datasets the variance is greater than the mean, a situation known as overdispersion.
The Negative Binomial model modifies the Poisson model to handle this overdispersion by
introducing unobserved heterogeneity. This means that the model acknowledges that not all of the
variability in the count data is captured by the observed predictor variables 𝑋𝑖 . There are additional,
unobserved factors that introduce variability, and these factors cause the variance to be larger than
the mean.
In the Negative Binomial model, the mean 𝜇𝑖 is no longer treated as a fixed value for each
observation, but rather it is allowed to vary due to unobserved heterogeneity. This variability is
captured by introducing a random error term 𝜖𝑖 into the Poisson mean. The modified mean is
written as:
𝜇̃𝑖 = 𝜇𝑖 exp(𝜖𝑖 )
Where:
• 𝜇𝑖 is the deterministic part of the mean based on the predictors 𝑋𝑖 .
• 𝜖𝑖 is a random error term that introduces variability in the mean and is often assumed to
follow a Gamma distribution.
As a result of introducing the random error term, the variance in the Negative Binomial model is
larger than the mean, unlike the Poisson model. Specifically, the variance becomes:
Var( 𝑌𝑖 ∣ 𝑋𝑖 ) = 𝜇𝑖 (1 + 𝛼𝜇𝑖 )
𝜇𝑖 is the expected mean (as in Poisson regression).
𝛼 is the dispersion parameter, which controls how much larger the variance is compared to the
mean. When 𝛼 = 0 the model reduces to the Poisson model (where the variance equals the mean).
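The variance formula can be illustrated by simulating the gamma mixture directly. This is a sketch (not from the original notes): a gamma variable with shape 1/𝛼 and scale 𝛼 has mean 1 and variance 𝛼, which is the heterogeneity assumed here.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, alpha, n = 5.0, 0.5, 500_000
# Multiplicative heterogeneity: gamma with mean 1 and variance alpha.
delta = rng.gamma(shape=1 / alpha, scale=alpha, size=n)
y = rng.poisson(mu * delta)   # Poisson counts whose means are randomly scaled
print(y.mean())               # close to mu = 5
print(y.var())                # close to mu * (1 + alpha * mu) = 5 * 3.5 = 17.5
```

The simulated counts keep the Poisson mean but their variance is inflated to 𝜇(1 + 𝛼𝜇), exactly the overdispersion the NBRM is built to capture.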
Example
Suppose we want to predict the number of emergency hospital visits 𝑌𝑖 made by patients, based on
their age and health risk factors 𝑋𝑖 (e.g., history of chronic illness, lifestyle habits, etc.). Our data
suggests that some patients make significantly more visits than others, even when their age and
risk factors are similar. This hints that there may be unobserved factors influencing the number of
visits.
In the Poisson regression model, we assume that the mean number of visits 𝜇𝑖 for each patient is
determined by a set of predictors 𝑋𝑖 (e.g., age, health conditions) and follows a log-linear function:
𝜇𝑖 = exp (𝑋𝑖 𝛽)
In the Poisson model, it is assumed that the variance of 𝑌𝑖 is equal to the mean 𝜇𝑖 i.e.:
Var( 𝑌𝑖 ∣ 𝑋𝑖 ) = 𝜇𝑖
Example of Poisson Regression:
Let's say a patient has an expected mean number of emergency visits 𝜇𝑖 = 5 based on their age
and health risk factors. Under the Poisson model, the mean and variance are the same:
𝜇𝑖 = 5
Var( 𝑌𝑖 ∣ 𝑋𝑖 ) = 5
In this case, the Poisson model assumes that most patients with the same risk factors will make
around 5 visits with small variability around that number.
Problem of Overdispersion
However, when we examine the actual data, we observe that some patients make many more visits
than predicted by the Poisson model, while others make fewer. The variance in the number of
hospital visits is greater than the mean. This is called overdispersion, and the Poisson model is not
well-suited to handle it because it assumes the variance equals the mean.
To address overdispersion, we use the Negative Binomial regression model. The NBRM modifies
the Poisson model by introducing a random error term that captures unobserved heterogeneity:
factors that influence the number of visits but are not included in the predictors 𝑋𝑖.
In the Negative Binomial model, the number of visits 𝑌𝑖 still follows a Poisson process, but the
mean 𝜇𝑖 is now treated as a random variable:
𝜇̃𝑖 = 𝜇𝑖 exp(𝜖𝑖 )
• 𝜇𝑖 = exp(𝑋𝑖 𝛽) is the expected mean as before.
• 𝜖𝑖 is a random error term that follows a Gamma distribution, introducing extra variability
in 𝜇𝑖 .
This additional random term 𝜖𝑖 captures unobserved factors that make some patients more likely
to visit the hospital frequently. The error term allows the variance to be greater than the mean,
addressing the overdispersion problem.
The variance in the Negative Binomial model is given by:
Var( 𝑌𝑖 ∣ 𝑋𝑖 ) = 𝜇𝑖 (1 + 𝛼𝜇𝑖 )
𝛼 is the dispersion parameter, which controls how much larger the variance is compared to the
mean. When 𝛼 = 0 the model reduces to the Poisson model (where the variance equals the mean).
But if 𝛼 > 0, the variance exceeds the mean, allowing the model to handle overdispersed data.
---------Optional----------
The relationship between the variables in the Negative Binomial model can be represented as
follows:
𝜇̃𝑖 = exp(𝑥𝑖𝛽) exp(𝑒𝑖) = 𝜇𝑖 exp(𝑒𝑖) = 𝜇𝑖𝛿𝑖
Here, 𝜇𝑖 = exp(𝑥𝑖𝛽) is the baseline mean determined by the covariates 𝑥𝑖 alone (without the
error term), while 𝛿𝑖 = exp(𝑒𝑖) is a multiplicative error term that captures unobserved
heterogeneity.
The NBRM is not fully identified unless an assumption is made about the mean of the error term.
The common assumption is that the mean of 𝛿𝑖 = exp(𝑒𝑖) is 1, analogous to the OLS assumption
that the mean of the residuals is 0. Under this assumption, the expected value of the modified
mean reduces to the baseline mean:
E(𝜇̃𝑖) = 𝜇𝑖 E(𝛿𝑖) = 𝜇𝑖
To fully model count data, it is crucial to specify the distribution of the error term 𝛿. In the
NBRM, 𝛿 is typically assumed to follow a gamma distribution. This assumption leads to the
following key properties of 𝛿:
Mean of 𝛿 = 1
Variance of 𝛿 = 1/𝜈
This assumption simplifies many aspects of the Negative Binomial model.
The expected value of the dependent variable 𝑦 is the same in the Negative Binomial distribution
as it is in the Poisson distribution:
𝐸( 𝑦 ∣ 𝑥 ) = 𝜇 = exp(𝑥𝑖 𝛽)
However, the conditional variance differs between the two models. For the NBRM, the conditional
variance is represented as:
Var(𝑦𝑖 ∣ 𝑥𝑖) = 𝜇𝑖 (1 + 𝜇𝑖/𝜈) = exp(𝑥𝑖𝛽) (1 + exp(𝑥𝑖𝛽)/𝜈)
This additional term accounts for the overdispersion present in count data, which the Poisson
model cannot handle.
Variance implications
Since both 𝜇 and 𝜈 are positive, the conditional variance of 𝑦 in the NBRM must exceed the
conditional mean exp(𝑥𝑖 𝛽). This is a crucial difference between the Poisson and Negative
Binomial models. In the Poisson model, the variance equals the mean, but the NBRM introduces
an overdispersion parameter to account for greater variability.
If the overdispersion parameter 𝜈 varies by individuals, then there will be more parameters than
there are observations, leading to an identification problem. The most common assumption to
avoid this issue is that 𝜈 is constant across all individuals, which simplifies the model and makes
it identifiable.
This assumption mirrors the homoscedasticity assumption in OLS (Ordinary Least Squares).
Specifically, one sets 𝜈𝑖 = 𝛼^(−1) for 𝛼 > 0, where 𝛼 is a constant shared by all individuals.
𝛼 is known as the dispersion parameter. When 𝛼 increases, the conditional variance of 𝑦 increases,
as it governs how much extra variability exists beyond what is expected from the mean.
Substituting 𝜈𝑖 = 𝛼^(−1) into the conditional variance of 𝑦, we have:
Var(𝑦𝑖 ∣ 𝑥𝑖) = 𝜇𝑖 (1 + 𝛼𝜇𝑖) = exp(𝑥𝑖𝛽) (1 + 𝛼 exp(𝑥𝑖𝛽))
This variance is typically rewritten as:
Var(𝑦𝑖 ∣ 𝑥𝑖) = 𝜇𝑖 + 𝛼𝜇𝑖^2
When 𝛼 = 0, the NBRM simplifies to a Poisson model where the mean and variance are identical,
and there is no overdispersion.