Chapter 01
Introduction to Bayesian Statistics
Contents
1 Modes of Statistical Inference
2 Introduction to Bayes Theorem
3 Common Distributions
4 Priors
5 Computing Posterior Distributions
Modes of Statistical Inference
Frequentist Approach
The classical (frequentist) approach provides statistical inference based on the P-value, the significance level, the power, and the confidence interval (CI).
It is a mix of two approaches: Fisher’s approach and Neyman and Pearson’s approach.
Fisher’s Approach
Inductive approach
Introduction of the null hypothesis (H0), the significance test, the P-value (= evidence against H0), and the significance level. NO alternative hypothesis. NO power.
Neyman and Pearson’s Approach
Deductive approach
Introduction of the alternative hypothesis (HA ), type I error, type II error, power, and
hypothesis test.
In practice the two approaches are mixed.
Likelihood Approach
Inference based purely on the likelihood function has not been developed into a full-blown statistical approach.
It is considered here as a precursor to the Bayesian approach.
Likelihood function = plausibility of the observed data as a function of the parameters of the stochastic model:
L(θ | data) = P(x | θ)
The likelihood is not a valid probability distribution for θ: the function obtained by varying θ with the observed data fixed does not, in general, integrate to 1.
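As a quick illustration (a minimal sketch, not part of the original slides), take the likelihood of x = 7 heads in n = 10 Bernoulli trials viewed as a function of θ; numerically integrating it over θ gives a value well below 1.

from scipy.stats import binom
import numpy as np

theta = np.linspace(0, 1, 1001)
likelihood = binom.pmf(7, 10, theta)     # L(theta | x = 7, n = 10)
print(np.trapz(likelihood, theta))       # ~0.0909, not 1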
Bayesian Approach
Central idea of the Bayesian approach: combine the likelihood (data) with your prior knowledge (prior information) to update the information on the parameter, resulting in a revised probability associated with the parameter (the posterior probability).
Examples of Bayesian reasoning in real life:
Tourist: prior views on Cambodians + a visit to Cambodia (data) ⇒ posterior views on Cambodians.
Marketing: the launch of a new energy drink on the market.
Medical: patients treated for CVA¹ with thrombolytic agents may suffer a severe bleeding accident (SBA). Historical studies (20% – prior), pilot study (10% – data) ⇒ posterior.
¹ CVA – Cerebral Vascular Accident (a brain attack) is an interruption in the flow of blood to cells in the brain.
Introduction to Bayes Theorem
Bayes’ Rule
P(A|B) = P(B|A) P(A) / P(B)

Equivalently,

P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|Ā) P(Ā)]
Bayes’ Rule
P(D+|T+) = P(T+|D+) P(D+) / [P(T+|D+) P(D+) + P(T+|D−) P(D−)]

In terms of the sensitivity (Se), specificity (Sp), and prevalence (prev):

prev+ = Se · prev / [Se · prev + (1 − Sp) · (1 − prev)]
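As a minimal sketch (not from the slides), the positive predictive value prev+ can be computed directly from this formula; the numbers below are illustrative only.

def positive_predictive_value(se, sp, prev):
    # Bayes' rule for a diagnostic test: P(D+ | T+)
    return se * prev / (se * prev + (1 - sp) * (1 - prev))

print(positive_predictive_value(se=0.90, sp=0.95, prev=0.10))   # ~0.67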
Bayes’ Rule:
P(A|B) = P(B|A) P(A) / P(B)
Posterior: P(A|B). In Bayesian analysis, we are often looking for the posterior, which represents the distribution of the parameter given the data.
Likelihood: P(B|A). Later we will see that this represents the likelihood of observing the data given the parameters.
Prior: P(A). The prior can represent a belief; it can be informative or vague.
Marginal: P(B). This is a constant and in many analyses may be dropped (see the small numerical example below).
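A minimal discrete sketch (not from the slides) that labels the four components explicitly; the prior and likelihood values are illustrative assumptions.

prior = {"A": 0.3, "not A": 0.7}                                  # P(A)
likelihood = {"A": 0.8, "not A": 0.1}                             # P(B | .)
marginal = sum(prior[h] * likelihood[h] for h in prior)           # P(B)
posterior = {h: prior[h] * likelihood[h] / marginal for h in prior}  # P(A | B)
print(posterior)   # {'A': ~0.774, 'not A': ~0.226}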
Exercise 2.1
A car repair shop receives a car with reports of a strange noise coming from the engine. The shop knows that 90% of the cars that come in for “noises” have a loose fan belt, while the other 10% have a loose muffler. Cars with a loose muffler commonly (95%) produce a rattle; less commonly (8%), fan-belt issues can also sound like a rattle. The car owner describes the strange noise as a rattle. What is the probability that the car has a loose muffler?
1 78%
2 57%
3 95%
Exercise 2.2
It is estimated that 80% of emails are spam. You have developed a new algorithm to detect
spam. Your spam software can detect 99% of spam emails but has a false positive rate of 5%.
If your company receives 1000 emails in a day, how many emails will be incorrectly marked as spam?
1 10
2 20
3 5
4 200
5 50
Exercise 2.3
You have developed a new algorithm for detecting fraud. It has a sensitivity of 90% with a
specificity of 95%. Choose the correct statement:
1 true positive rate = 90%, true negative rate = 5%
2 true positive rate = 90%, true negative rate = 95%
Common Distributions
Binomial Distribution
INTERACT_FLAG = True   # toggle interactive ipywidgets sliders vs. a single static plot
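The plotting helpers called on the following slides (binomial_vector_over_y and friends) are not shown in these slides; a minimal sketch of what such a helper might look like, under that assumption, is:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom
from ipywidgets import interact

def binomial_vector_over_y(theta, n):
    # plot the Binomial(n, theta) pmf over all possible counts y = 0, ..., n
    y = np.arange(0, n + 1)
    plt.bar(y, binom.pmf(y, n, theta))
    plt.xlabel("y")
    plt.ylabel("P(y | theta, n)")
    plt.show()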
if(INTERACT_FLAG):
interact(binomial_vector_over_y, theta=0.5, n=15)
else:
binomial_vector_over_y(theta=0.5, n=10)
Negative Binomial Distribution
Mean = r(1 − θ)/θ
Variance = r(1 − θ)/θ²
Example: to measure the number of days your car works before it breaks down for the 3rd time.
Conditions
Count of discrete events
The events can be non-independent (the events can influence or cause other events)
Variance can exceed the mean
if(INTERACT_FLAG):
interact(negative_binomial_vector_over_y, theta=0.9, total_events=15)
else:
negative_binomial_vector_over_y(theta=0.9, total_events=15)
Poisson Distribution
Mean = Variance = θ
Example: To model the number of accidents at an intersection. To model the number of
Salmonella outbreaks in a year.
Conditions
Discrete non-negative data - count of events, the rate parameter can be a non-integer
positive value
Each event is independent of other events
Each event happens at a fixed rate
A fixed amount of time in which the events occur
if(INTERACT_FLAG == True):
interact(poisson_vector, theta=7, y_end=20)
else:
poisson_vector(theta=7, y_end=20)
Exponential Distribution
x = np.linspace(0,x_end,x_end*4)
if(INTERACT_FLAG):
interact(exponential_distribution, lambda_rate = 4, x_end=20)
else:
exponential_distribution(lambda_rate = 0.2, x_end=20)
Gamma Distribution
α > 0 is the shape parameter; β > 0 is the rate parameter (the inverse scale parameter)
Mean = α/β
Variance = α/β²
Example: to model the time taken for 4 bolts in your car to fail.
Conditions
Continuous non-negative data
A generalization of the exponential distribution, with more parameters to fit
An exponential distribution models the time to the first event; a Gamma distribution models the time to the nth event, as illustrated in the simulation sketch below.
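A minimal simulation sketch (not from the slides) of that last point: the sum of n independent Exponential(λ) waiting times follows a Gamma(n, λ) distribution.

import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)
lam, n_events = 2.0, 4
waiting_times = rng.exponential(scale=1/lam, size=(100_000, n_events)).sum(axis=1)

print("simulated mean:", waiting_times.mean())                     # ~ n_events/lam = 2.0
print("Gamma(4, rate=2) mean:", gamma(a=n_events, scale=1/lam).mean())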
if(INTERACT_FLAG):
interact(gamma_individual,a=2,b=1,x_max=10)
else:
gamma_individual(2,1,10)
Normal Distribution
import math
if(INTERACT_FLAG):
interact(normal_distribution, mean = 4, sigma = 3)
else:
normal_distribution(mean = 5, sigma = 4)
Log-normal Distribution
P(x) = (1/(xσ√(2π))) e^(−(ln x − µ)²/(2σ²))
import math
x = np.linspace(0.1,2.5,100)
OPTION = 2
if OPTION == 1:
    # parameterize by the mean and sd of x itself
    mean_x = 2   # CHANGE THIS
    sigma_x = 2  # CHANGE THIS
    mean = np.log(mean_x**2 / np.sqrt(mean_x**2 + sigma_x**2))
    sigma = np.sqrt(np.log(1 + (sigma_x**2 / mean_x**2)))
else:
    # parameterize by the mode of x and the sd of log(x)
    sigma = 0.2  # CHANGE THIS
    mode = 0.8   # CHANGE THIS
    mean = np.log(mode) + sigma**2
if INTERACT_FLAG:
    interact(lognormal_distribution, mean=1, sigma=0.25)
else:
    lognormal_distribution(mean=mean, sigma=sigma)
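A quick numerical check (not from the slides; the lognormal_distribution helper itself is not shown) that the mode-based conversion above behaves as intended, using scipy:

import numpy as np
from scipy.stats import lognorm

sigma, mode = 0.2, 0.8
mu = np.log(mode) + sigma**2
x = np.linspace(0.01, 2.5, 1000)
pdf = lognorm.pdf(x, s=sigma, scale=np.exp(mu))   # scipy parameterization: s = sigma, scale = exp(mu)
print("density peaks near:", x[np.argmax(pdf)])   # ~0.8, the requested mode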
Student’s t-Distribution
Similar to the normal distribution with its bell shape but has heavier tails.
p(x) = [Γ((ν+1)/2) / Γ(ν/2)] √(λ/(νπ)) [1 + λ(x − µ)²/ν]^(−(ν+1)/2)

Mean = µ (for ν > 1)
Variance = ν/[(ν − 2)λ] (for ν > 2)
Example: a distribution of test scores from an exam with a significant number of outliers, which would not be well modeled by a Normal distribution.
Conditions
Continuous data
Unbounded distribution
Considered an overdispersed Normal distribution: a mixture of individual normal distributions with different variances, as illustrated in the sketch below
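A minimal simulation sketch (not from the slides) of that mixture representation: drawing a precision from a Gamma distribution and then a normal with that precision yields Student’s t.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu, n = 4, 200_000
precision = rng.gamma(shape=nu/2, scale=2/nu, size=n)        # Gamma(nu/2, rate nu/2)
samples = rng.normal(loc=0.0, scale=1.0/np.sqrt(precision))  # N(0, 1/precision)

print("simulated P(|X| > 3):", np.mean(np.abs(samples) > 3))
print("exact t(4)  P(|X| > 3):", 2 * stats.t.sf(3, df=nu))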
def studentst_distribution(v):
    t = np.linspace(-10, 10, 100)
    # ... density computation and plotting omitted here ...

if INTERACT_FLAG:
    interact(studentst_distribution, v=10)
else:
    studentst_distribution(v=10)
# Student's t density for several degrees of freedom (standard form: mu = 0, lambda = 1)
from scipy.special import gamma
import plotly.graph_objects as go

t = np.linspace(-10, 10, 100)
fig = go.Figure()
for v in [1, 4, 10]:
    term1 = gamma((v + 1)/2) / (np.sqrt(v * math.pi) * gamma(v/2))
    term2 = (1 + t**2 / v)**(-(v + 1)/2)
    fig.add_scatter(x=t, y=term1 * term2, name=f"v={v}", mode="lines")
fig.show()
Beta Distribution
P(θ|a, b) = [Γ(a + b) / (Γ(a)Γ(b))] θ^(a−1) (1 − θ)^(b−1)

Mean = a/(a + b)
Variance = ab/[(a + b)²(a + b + 1)]
Example: in Bayesian analyses, the beta distribution is often used as a prior distribution of
the parameter p (which is bounded between 0 and 1) of the binomial distribution.
Conditions
Takes positive values between 0 and 1 as input
Setting a and b to 1 gives you a uniform distribution
# Beta posterior with uniform Beta prior, a=1, b=1
def beta_vector_theta(num_p, total, a, b):
    # posterior parameters: a + successes, b + failures
    alpha = num_p + a
    beta = total - num_p + b
    theta = np.linspace(0, 1, 25)   # grid over theta (plotting code not shown on this slide)
    print("Posterior a =", alpha)
    print("Posterior b =", beta)

if INTERACT_FLAG:
    interact(beta_vector_theta, num_p=4, total=10, a=1, b=1)
else:
    beta_vector_theta(num_p=4, total=10, a=1, b=1)
Exercise 3.1
Matching distribution to its usage:
Distribution Usage
(1) Binomial distribution (A) Modeling the time between events occurring in a Poisson process.
(2) Negative Binomial distribution (B) Modeling continuous data that follow a symmetric bell-shaped curve.
(3) Poisson distribution (C) Modeling the number of events occurring in a fixed interval of time or space.
(4) Exponential distribution (D) Modeling the number of successes in a fixed number of independent Bernoulli trials.
(5) Gamma distribution (E) Modeling positive data that are skewed and have a distribution of logarithmic values that follow a normal distribution.
(6) Normal distribution (F) Modeling the waiting time until a given number of events occur in a Poisson process.
(7) Log-normal distribution (G) Used for hypothesis testing and constructing confidence intervals for small sample sizes when the population standard deviation is unknown.
(8) Student’s t distribution (H) Modeling the number of trials needed to achieve a fixed number of successes in independent Bernoulli trials.
(9) Beta distribution (I) Modeling continuous data that are bounded between 0 and 1, commonly used in Bayesian analysis as a prior distribution for binomial proportions.
Answer:
1 →
2 →
3 →
4 →
5 →
6 →
7 →
8 →
9 →
Priors
Bayes’ Rule gives us a method to update our beliefs based on prior knowledge.
The prior is the unconditional probability of the parameters before seeing the (new) data.
A prior can come from a number of sources, including:
past experiments or experience
some sort of desire for balance or weighting in a decision
non-informative, but objective
mathematical convenience
The choice of prior reflects what is currently known about the parameters; it is often subjective and contested. Two broad types of prior:
1 non-informative
2 informative
The prior can be proper, i.e. it conforms to the rules of probability and integrates to 1, or improper.
A convenient choice of prior can lead to a closed-form solution for the posterior.
Conjugate priors
Conjugate priors are priors that induce a posterior distribution in the same (known) family as the prior.
Example:
Data: X ~ Bern(θ)
Likelihood: L(x|θ) = ∏ᵢ θ^(xᵢ) (1 − θ)^(1−xᵢ) = θ^k (1 − θ)^(n−k), where k = Σ xᵢ
Prior ∝ kernel of a beta distribution = θ^(α−1) (1 − θ)^(β−1)
Posterior ∝ likelihood × prior ∝ θ^(k+α−1) (1 − θ)^(n−k+β−1), the kernel of a Beta(k + α, n − k + β) distribution
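A minimal numerical check (not from the slides) of this conjugacy, comparing the closed-form Beta posterior against a grid-normalized likelihood × prior; the Beta(2, 2) prior and the data k = 7, n = 20 are illustrative assumptions.

import numpy as np
from scipy.stats import beta

alpha0, beta0 = 2, 2
n, k = 20, 7
theta = np.linspace(0.001, 0.999, 999)

unnormalized = theta**k * (1 - theta)**(n - k) * beta.pdf(theta, alpha0, beta0)
grid_posterior = unnormalized / np.trapz(unnormalized, theta)
conjugate_posterior = beta.pdf(theta, alpha0 + k, beta0 + n - k)
print("max abs difference:", np.max(np.abs(grid_posterior - conjugate_posterior)))   # ~0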
Non-informative priors
Non-informative priors are priors that suggest ignorance as to the parameters. These are
sometimes called vague or diffuse priors.
The priors generally cover the region of the parameter space relatively smoothly.
Common non-informative priors: U[−100, 100], N(0, 10⁴).
Jeffreys prior is a non-informative prior that is derived from the Fisher information.
We do not specify prior information; we use the information in the data to shape the prior.
The Fisher information Iₙ(θ) tells us how much information about θ is contained in the data.
Jeffreys prior is derived as:

p(θ) ∝ √(Iₙ(θ)), where I(θ) = E[(∂/∂θ ln f(X; θ))²] = −E[∂²/∂θ² ln f(X; θ)]
Example:
Data: X ~ Gamma(α, β), assuming α is known and β is unknown.
The Fisher information is Iₙ(β) = nα/β², leading to the Jeffreys prior for β:

p(β) ∝ √(nα/β²) ∝ 1/β

Note: Jeffreys priors are not guaranteed to be proper. Perhaps most importantly, Jeffreys priors are invariant under reparameterization.
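A minimal sympy sketch (not from the slides) re-deriving the Fisher information used in this example, under the stated assumption that α is known:

import sympy as sy

x, alpha, beta_ = sy.symbols("x alpha beta", positive=True)
# log-density of a Gamma(alpha, rate=beta) observation
log_f = alpha * sy.log(beta_) - sy.log(sy.gamma(alpha)) + (alpha - 1) * sy.log(x) - beta_ * x

# I(beta) = -E[d^2/dbeta^2 log f(X; beta)]; here the second derivative is constant in x
print(sy.simplify(-sy.diff(log_f, beta_, 2)))   # alpha/beta**2, so I_n(beta) = n*alpha/beta**2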
Informative priors
Informative priors are explicitly chosen to represent current knowledge or belief about the
parameter of interest.
When choosing informative priors, one can also choose the functional form of the prior.
Example: Tossing Coins
We were given a new coin and told it would generate heads with P(heads) = 0.75.
We conduct a new experiment to characterize the distribution of θ.
When dealing with Bernoulli trials, a computationally convenient choice for the prior is Beta(a, b).
x = np.linspace(0, 1, 100)
# (plotting code not shown on this slide)
plt.legend(loc="upper left")
plt.show()
We can tune our prior by using the mean (or even the mode) to center our belief, and the variance as a measure of the strength of that belief.
Example: Tossing Coins
For the prior Beta(6.9, 3):

E[x] = a/(a + b) = 0.70,  mode(x) = (a − 1)/(a + b − 2) ≈ 0.75,  V(x) = ab/[(a + b)²(a + b + 1)] ≈ 0.02
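These summaries can be checked numerically (a quick sketch, not from the slides):

from scipy.stats import beta

a, b = 6.9, 3
print("mean:", beta.mean(a, b))          # ~0.70
print("mode:", (a - 1) / (a + b - 2))    # ~0.75
print("variance:", beta.var(a, b))       # ~0.02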
The general approach to using priors in models is to start with some justification for a
prior, run the analysis, then come up with competing priors and re-examine the
conclusions under the alternative priors.
Many Bayesian experts recommend that a sensitivity analysis should always be conducted. The process takes place as follows (a minimal numerical illustration follows the list):
The researcher predetermines a set of priors to use for model estimation.
The model is estimated, and convergence is obtained for all model parameters.
The researcher comes up with a set of competing priors to examine.
Results are obtained for the competing priors and then compared with the original results
through a series of visual and statistical comparisons.
The final model results are written up to reflect the original model results (obtained in Item
1, from the original priors), and the sensitivity analysis results are also presented in order to
comment on how robust (or not) the final model results are to different prior settings.
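As a minimal sketch of such a comparison (the Bernoulli data, k = 7 successes in n = 20 trials, and the two candidate Beta priors below are illustrative assumptions, not from the slides):

from scipy.stats import beta

n, k = 20, 7
priors = {"vague Beta(1, 1)": (1, 1), "informative Beta(6.9, 3)": (6.9, 3)}

for label, (a0, b0) in priors.items():
    a1, b1 = a0 + k, b0 + n - k
    lo, hi = beta.ppf([0.025, 0.975], a1, b1)
    print(f"{label}: posterior mean = {beta.mean(a1, b1):.3f}, 95% interval = ({lo:.3f}, {hi:.3f})")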
Exercise 4.1
We are studying a Bernoulli process for which we have no prior information. We decide to use a non-informative prior such as the Beta(1, 1). Because this prior is flat on [0, 1], it will have no effect on the posterior.
1 True
2 False (Although, as you gather more data, the effect of the prior diminishes.)
Exercise 4.2
You are given a data set containing the average weight of male flies and decide to model the average weight via a normal distribution. You don’t know what to expect in terms of the mean, but you expect a previous study of female flies to be informative about the variance (σ_f²). Using conjugates, your model, assuming unknown mean and known variance, will be (likelihood × prior):
1 N(µ, σ²) × N(µ₀, σ_f²)
2 N(µ, σ_f²) × N(µ₀, σ₀²)
3 N(µ, σ²) × N(µ, σ₀²)
Computing Posterior Distributions
The Binomial Case
p(θ) = [1/B(α₀, β₀)] θ^(α₀−1) (1 − θ)^(β₀−1),   B(α₀, β₀) = Γ(α₀)Γ(β₀)/Γ(α₀ + β₀)
Posterior distribution
p(θ|x) ∝ L(θ|x) p(θ) ∝ Beta(α, β)

p(θ|x) = [1/B(α, β)] θ^(α−1) (1 − θ)^(β−1),   B(α, β) = Γ(α)Γ(β)/Γ(α + β)

where α = α₀ + x and β = β₀ + n − x
x = np.linspace(0, 1, 100)

def f(p):
    # unnormalized binomial likelihood for y = 10 successes out of n = 50
    return p**(10) * (1 - p)**(50 - 10)
import sympy as sy
from scipy.stats import beta
# (numpy as np and matplotlib.pyplot as plt are assumed imported earlier)

# normalizing constant of the likelihood: integral of p^10 (1-p)^40 over [0, 1]
p = sy.Symbol("p")
I = sy.integrate(f(p), (p, 0, 1))
print(I)   # 1/523886186670
scaled_likelihood = (x)**(10) * (1 - x)**(50 - 10) * (523886186670)

# posterior: likelihood * prior, with an informative Beta(9, 93) prior
alpha0, beta0, y, n = 9, 93, 10, 50
alpha1 = alpha0 + y        # 9 + 10 = 19
beta1 = beta0 + n - y      # 93 + 50 - 10 = 133
posterior = beta.pdf(x, alpha1, beta1)
plt.plot(x, posterior, "r--", label="posterior = beta(19, 133)")
plt.legend(loc="upper right")
plt.ylim(-0.1, 20)
plt.xlim(0, 0.4)
plt.show()
def f(p):
    # unnormalized binomial likelihood, as before: y = 10 successes out of n = 50
    return p**(10) * (1 - p)**(50 - 10)

p = sy.Symbol("p")
I = sy.integrate(f(p), (p, 0, 1))
print(I)
scaled_likelihood = (x)**(10) * (1 - x)**(50 - 10) * (523886186670)
# posterior: likelihood * prior, now with a non-informative Beta(1, 1) prior
alpha0, beta0, y, n = 1, 1, 10, 50
alpha1 = alpha0 + y        # 1 + 10 = 11
beta1 = beta0 + n - y      # 1 + 50 - 10 = 41
posterior = beta.pdf(x, alpha1, beta1)
plt.plot(x, posterior, "r--", label="posterior = beta(11, 41)")
plt.legend(loc="upper right")
plt.ylim(-0.1, 20)
plt.xlim(0, 0.4)
plt.xlabel('theta')
plt.show()
Consider a coin-tossing experiment. You are given a presumably unfair coin for which p(heads) = 0.80 was estimated from 20 previous tosses. You now collect new data and analyze the posterior by doing 10 coin tosses and getting 4 heads.
A. Choose the distribution for your prior and construct your posterior distribution.
B. In case no prior information is available, construct your posterior distribution.
The Gaussian Case
f(y) = (1/(√(2π) σ)) e^(−(y − µ)²/(2σ²))

Given a sample y₁, ..., yₙ, we obtain the likelihood:

L(µ|y) ∝ exp[−(1/(2σ²)) Σᵢ (yᵢ − µ)²] ∝ exp[−½ ((µ − ȳ)/(σ/√n))²] ≡ L(µ|ȳ)
Denote the IBBENS sample of size n₀ by y₀ ≡ {y₀,₁, y₀,₂, ..., y₀,ₙ₀}, with mean ȳ₀.
Likelihood ∝ N(µ₀, σ₀²) with
µ₀ ≡ ȳ₀ = 328
σ₀ = σ/√n₀ = 120.3/√563 = 5.072
IBBENS prior distribution:

p(µ) = (1/(√(2π) σ₀)) exp[−½ ((µ − µ₀)/σ₀)²]

with µ₀ ≡ ȳ₀
IBBENS-2 Study:
Sample y with n = 50
ȳ = 318 mg/day and s = 119.5 mg/day
The 95% confidence interval = [284.3, 351.9] mg/day ⇒ wide
Combine the IBBENS prior distribution with the IBBENS-2 normal likelihood:
IBBENS-2 likelihood: L(µ|ȳ)
IBBENS prior density: N(µ₀, σ₀²)
Posterior distribution ∝ p(µ) L(µ|ȳ):

p(µ|y) ∝ p(µ|ȳ) ∝ exp{−½ [((µ − µ₀)/σ₀)² + ((µ − ȳ)/(σ/√n))²]}
1/σ̄² = w₀ + w₁

with w₀ = 1/σ₀² = prior precision and w₁ = 1/(σ²/n) = sample precision.
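A minimal sketch (using the IBBENS numbers quoted above) of this combination. The posterior mean as a precision-weighted average, µ̄ = (w₀µ₀ + w₁ȳ)/(w₀ + w₁), is the standard conjugate normal result; the code below simply evaluates it, using s in place of σ.

import numpy as np

mu0, sigma0 = 328.0, 5.072        # IBBENS prior
ybar, s, n = 318.0, 119.5, 50     # IBBENS-2 data

w0 = 1 / sigma0**2                # prior precision
w1 = n / s**2                     # sample precision
posterior_var = 1 / (w0 + w1)
posterior_mean = (w0 * mu0 + w1 * ybar) / (w0 + w1)
print("posterior mean:", round(posterior_mean, 1), "posterior sd:", round(np.sqrt(posterior_var), 2))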
Exercise 5.2.1
Given a prior with mean 10 and data with mean 5, we should expect the posterior mean to lie
1 to the left of 5
2 to the right of 10
3 between 5 and 10
The Poisson Case
p(θ) = (β₀^α₀ / Γ(α₀)) θ^(α₀−1) e^(−β₀θ)

Posterior:

p(θ|y) ∝ L(θ|y) p(θ) ∝ e^(−nθ) ∏ᵢ (θ^(yᵢ)/yᵢ!) · (β₀^α₀ / Γ(α₀)) θ^(α₀−1) e^(−β₀θ) ∝ θ^((Σ yᵢ + α₀) − 1) e^(−(n + β₀)θ)

We recognize the kernel of a Gamma(Σ yᵢ + α₀, n + β₀) distribution:

⇒ p(θ|y) ≡ p(θ|ȳ) = (β̄^ᾱ / Γ(ᾱ)) θ^(ᾱ−1) e^(−β̄θ)

with ᾱ = Σ yᵢ + α₀ = 9758 + 3 = 9761 and β̄ = n + β₀ = 4351 + 1 = 4352 ⇒ STM study: the effect of the prior is minimal.
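A minimal sketch (not from the slides) of summarizing this Gamma(9761, 4352) posterior with scipy:

from scipy.stats import gamma

alpha_post, beta_post = 9761, 4352
posterior = gamma(a=alpha_post, scale=1/beta_post)   # scipy uses scale = 1/rate
print("posterior mean:", posterior.mean())           # 9761/4352 ≈ 2.24
print("95% credible interval:", posterior.interval(0.95))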
Exercise 5.3.1
Given y ~ Poisson(λ) and a prior p(λ) = Gamma(a, b), what is the mean of the posterior based on n observations? Assume x̄ = mean(y).
1 (a + x̄)/(b + n)
2 (a + n x̄)/(b + n)