This document lists common probability distributions.
For each distribution, we show both the probability density and cumulative distribution plots.
Probability density plots show the probability that a random variable \(X\) equals some value \(x\); expressed using the notation \(P(X=x)\).
Cumulative distribution plots show the probability that a random variable \(X\) is less than or equal to some value \(x\); expressed using the notation \(P(X \le x)\).
The cumulative distribution answers questions about the total population with a value in a certain range, e.g. "what fraction of customers bought *at most* 3 items?" This contrasts with probability density, which answers questions about the slice of the population that exactly equals a certain value, e.g. "what fraction of customers bought *exactly* 3 items?"
The R distribution functions (e.g. `pbinom`, `ppois`, `pnorm`, etc.) have a `lower.tail` option to get the range above or below a threshold:

- `lower.tail = FALSE` to get the population percentage above a threshold.
- `lower.tail = TRUE` to get the population percentage below a threshold.

In general, the term distribution is used to describe statistics on discrete variables (e.g. item counts, customer counts, etc.). The term density applies to continuous variables that can take any value from \(-\infty\) to \(+\infty\) (e.g. heights, weights, etc.).
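As a minimal sketch of the `lower.tail` option, reusing the biased-coin example from the next section (Binomial with \(n = 10\), \(p = 0.2\); the threshold of 2 HEADS is chosen just for illustration):

```r
# P(X <= 2) for X ~ Binomial(n = 10, p = 0.2): population at or below the threshold
below <- pbinom(2, size = 10, prob = 0.2, lower.tail = TRUE)

# P(X > 2): population above the threshold
above <- pbinom(2, size = 10, prob = 0.2, lower.tail = FALSE)

# The two tails partition the whole population, so they sum to 1
below + above
```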
Suppose that \(n\) independent coin flips (i.e. Bernoulli trials) are performed, each with the same success probability \(p\). Let \(X\) be the number of HEADS (i.e. successes). The distribution of \(X\) is called a Binomial distribution.
The plot shows the probability of getting exactly x HEADS in 10 biased coin flips where the probability of HEADS is 20%.
The plot shows the probability of getting greater than x HEADS in 10 coin flips where the probability of HEADS is 20%.
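Plots like the two above can be sketched with `dbinom` (exact counts) and `pbinom` (tail probabilities); the plotting style here is an assumption, not the original code:

```r
x <- 0:10

# P(X = x): probability of exactly x HEADS in 10 flips with P(HEADS) = 0.2
exactly <- dbinom(x, size = 10, prob = 0.2)

# P(X > x): probability of more than x HEADS
more_than <- pbinom(x, size = 10, prob = 0.2, lower.tail = FALSE)

plot(x, exactly, type = "h", xlab = "HEADS", ylab = "P(X = x)")
plot(x, more_than, type = "s", xlab = "HEADS", ylab = "P(X > x)")
```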
The Poisson distribution is often used in situations where we are counting the number of successes in a particular region or interval of time and there are a large number of trials, each with a small probability of success. The parameter lambda (\(\lambda\)) is interpreted as the rate of occurrence of these rare events.
For example, \(\lambda\) could be 20 (emails per hour), 10 (chips per cookie), or 2 (earthquakes per year). The Poisson paradigm says that in applications like these, we can approximate the distribution of the number of events that occur by a Poisson distribution.
Consider an example where the number of people that show up at a bus stop is Poisson with a mean of 2.5 per hour. If we watch the bus stop for 1 hour, what is the probability that exactly \(x\) people show up for the whole time?
For the previous example, what is the probability that more than \(x\) people show up for the whole time?
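Both questions can be answered with `dpois` and `ppois`; a sketch for the bus-stop example (\(\lambda = 2.5\)):

```r
lambda <- 2.5  # mean arrivals per hour
x <- 0:10

# P(X = x): probability that exactly x people show up in the hour
exactly <- dpois(x, lambda = lambda)

# P(X > x): probability that more than x people show up
more_than <- ppois(x, lambda = lambda, lower.tail = FALSE)
```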
The following example illustrates how to simulate arrival times within a specified interval \((0, L]\): first draw the number of arrivals from a Poisson distribution, then draw that many arrival times i.i.d. from Unif(0, L) and sort them. The following plot simulates arrivals from a Poisson process with rate 10 events/interval after observing for 5 intervals.
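The two-step recipe above can be sketched as follows (the seed and plotting style are assumptions for reproducibility, not the original code):

```r
set.seed(42)
rate <- 10  # events per interval
L    <- 5   # number of intervals observed

# Step 1: the number of arrivals in (0, L] is Poisson with mean rate * L
n <- rpois(1, lambda = rate * L)

# Step 2: given n arrivals, their times are i.i.d. Unif(0, L), then sorted
t <- sort(runif(n, min = 0, max = L))

# Staircase plot: cumulative arrival count over time
plot(t, seq_along(t), type = "s", xlab = "time", ylab = "arrival count")
```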
Now, let’s build on the previous example to model a Poisson process that contains 2 event types.
Arrival times are again drawn from Unif(0, L). Each arrival is then given an independent coin toss that is Bern(0.3) of Heads; the Heads tosses are labeled as type-1 and the rest are labeled as type-2. The resulting vectors of arrival times t1 and t2 are realizations of 2 independent Poisson processes.
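This thinning step can be sketched by extending the previous simulation (seed and parameter values are assumptions):

```r
set.seed(42)
rate <- 10  # events per interval
L    <- 5   # number of intervals observed

n <- rpois(1, lambda = rate * L)
t <- sort(runif(n, min = 0, max = L))

# Independent Bern(0.3) coin toss per arrival: TRUE = Heads = type-1
heads <- rbinom(n, size = 1, prob = 0.3) == 1

t1 <- t[heads]   # type-1 arrivals: a Poisson process with rate 0.3 * rate
t2 <- t[!heads]  # type-2 arrivals: a Poisson process with rate 0.7 * rate
```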
The Normal distribution is a famous continuous distribution with a bell-shaped PDF. It is extremely widely used in statistics because of the central limit theorem, which says that, under very weak assumptions, the sum of a large number of i.i.d. random variables has an approximately Normal distribution, regardless of the distribution of the underlying random variables.
The simplest normal distribution is the standard normal, which is centered at 0 and has a variance of 1.
The following plot shows the density of Normal(0,1) at each value between -3 and +3.
The following plot shows the probability that Normal(0,1) is greater than any value between -3 and +3.
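These two curves can be sketched with `dnorm` and `pnorm` (the grid spacing and plot styling are assumptions):

```r
x <- seq(-3, 3, by = 0.01)

# Density of Normal(0, 1) at each x
density <- dnorm(x, mean = 0, sd = 1)

# P(X > x) for X ~ Normal(0, 1)
upper <- pnorm(x, mean = 0, sd = 1, lower.tail = FALSE)

plot(x, density, type = "l", ylab = "density")
plot(x, upper, type = "l", ylab = "P(X > x)")
```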
Let's consider a click fraud detection event analysis where 1 out of 100 ad clicks is fraudulent (i.e. the fraud rate is 1%).
If we had a very large dataset, we could run 2000 experiments, each calculating the fraud rate of a small sample. According to the central limit theorem, the average fraud rate across all the experiments will be close to the actual fraud rate.
The plot above shows a histogram of the fraud rates for each sample. We try sample sizes of 10, 100, 200 and 500. Notice that as the sample size grows, the distribution gets Gaussian looking (like a bell curve) and increasingly centered at 0.01 (1%).
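A simulation like the one plotted above can be sketched by treating each sample's fraud count as Binomial (the seed is an assumption):

```r
set.seed(1)
fraud_rate <- 0.01
n_experiments <- 2000

# For each sample size, run 2000 experiments; each experiment's fraud rate
# is the fraudulent-click count in the sample divided by the sample size
for (size in c(10, 100, 200, 500)) {
  rates <- rbinom(n_experiments, size = size, prob = fraud_rate) / size
  hist(rates, main = paste("sample size", size), xlab = "fraud rate")
}
```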
A multivariate normal (MVN) distribution generalizes the Normal distribution to higher dimensions. The parameters of a multivariate normal are a mean vector \(\mu\) and a covariance matrix \(\Sigma\).
The following plot shows random draws from 2 bivariate normals: BLUE and ORANGE. The BLUE distribution has parameters \(N((1,0)^T, I)\). The ORANGE distribution has parameters \(N((0,1)^T, I)\). \(I\) is the 2x2 identity matrix, where the diagonal values are 1 and all other values are 0.
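A plot like this can be sketched with `mvrnorm` from the MASS package (which ships with R); the seed and sample size of 500 are assumptions:

```r
library(MASS)  # provides mvrnorm for multivariate normal sampling

set.seed(7)
I <- diag(2)  # 2x2 identity covariance matrix

blue   <- mvrnorm(500, mu = c(1, 0), Sigma = I)  # N((1,0)^T, I)
orange <- mvrnorm(500, mu = c(0, 1), Sigma = I)  # N((0,1)^T, I)

plot(blue, col = "blue", xlab = "x1", ylab = "x2")
points(orange, col = "orange")
```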