Chapter 01
Introduction to Bayesian Statistics
Contents
1 Modes of Statistical Inference
2 Introduction to Bayes Theorem
3 Common Distributions
4 Priors
5 Computing Posterior Distributions
Modes of Statistical Inference
Frequentist Approach
The classical (frequentist) approach provides statistical inference based on the P-value, the significance level, the power, and the confidence interval (CI).
It is a mix of two approaches: Fisher’s approach and Neyman and Pearson’s approach.
Fisher’s Approach
Inductive approach
Introduction of the null hypothesis (H0), the significance test, the P-value (= evidence against H0), and the significance level. NO alternative hypothesis. NO power.
Neyman and Pearson’s Approach
Deductive approach
Introduction of the alternative hypothesis (HA ), type I error, type II error, power, and
hypothesis test.
In practice the two approaches are mixed.
Likelihood Approach
Inference based purely on the likelihood function has not been developed into a full-blown statistical approach.
It is considered here as a precursor to the Bayesian approach.
Likelihood function = plausibility of the observed data as a function of the parameters of the stochastic model:
L(θ | data) = P(x | θ)
The likelihood is not a valid probability distribution for θ: the function obtained by varying θ with the observed data fixed does not, in general, integrate to 1.
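As a quick illustration (a minimal sketch, not part of the original slides), take the likelihood of x = 7 heads in n = 10 Bernoulli trials viewed as a function of θ; numerically integrating it over θ gives a value well below 1.

from scipy.stats import binom
import numpy as np

theta = np.linspace(0, 1, 1001)
likelihood = binom.pmf(7, 10, theta)     # L(theta | x = 7, n = 10)
print(np.trapz(likelihood, theta))       # ~0.0909, not 1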
Bayesian Approach
Central idea of the Bayesian approach: combine the likelihood (data) with your prior knowledge (prior information) to update the information on the parameter, resulting in a revised probability associated with the parameter (the posterior probability).
Examples of Bayesian reasoning in real life:
Tourist: prior views on Cambodians + a visit to Cambodia (data) ⇒ posterior views on Cambodians.
Marketing: the launch of a new energy drink on the market.
Medical: patients treated for CVA¹ with thrombolytic agents may suffer a severe bleeding accident (SBA). Historical studies (20% – prior), pilot study (10% – data) ⇒ posterior.
¹ CVA – Cerebral Vascular Accident (a brain attack) is an interruption in the flow of blood to cells in the brain.
Introduction to Bayes Theorem
Bayes’ Rule
P(A|B) = P(B|A) P(A) / P(B)

Equivalently,

P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|Ā) P(Ā)]
Bayes’ Rule
P(D+|T+) = P(T+|D+) P(D+) / [P(T+|D+) P(D+) + P(T+|D−) P(D−)]

In terms of the sensitivity (Se), specificity (Sp), and prevalence (prev):

prev+ = Se · prev / [Se · prev + (1 − Sp) · (1 − prev)]
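As a minimal sketch (not from the slides), the positive predictive value prev+ can be computed directly from this formula; the numbers below are illustrative only.

def positive_predictive_value(se, sp, prev):
    # Bayes' rule for a diagnostic test: P(D+ | T+)
    return se * prev / (se * prev + (1 - sp) * (1 - prev))

print(positive_predictive_value(se=0.90, sp=0.95, prev=0.10))   # ~0.67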
Bayes’ Rule:
P(A|B) = P(B|A) P(A) / P(B)
Posterior: P(A|B). In Bayesian analysis, we are often looking for the posterior, which represents the distribution of the parameter given the data.
Likelihood: P(B|A). Later we will see that this represents the likelihood of observing the data given the parameters.
Prior: P(A). The prior can represent a belief; it can be informative or vague.
Marginal: P(B). This is a constant and in many analyses may be dropped (see the small numerical example below).
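A minimal discrete sketch (not from the slides) that labels the four components explicitly; the prior and likelihood values are illustrative assumptions.

prior = {"A": 0.3, "not A": 0.7}                                  # P(A)
likelihood = {"A": 0.8, "not A": 0.1}                             # P(B | .)
marginal = sum(prior[h] * likelihood[h] for h in prior)           # P(B)
posterior = {h: prior[h] * likelihood[h] / marginal for h in prior}  # P(A | B)
print(posterior)   # {'A': ~0.774, 'not A': ~0.226}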
Exercise 2.1
A car repair shop receives a car with reports of a strange noise coming from the engine. The shop knows that 90% of the cars that come in for “noises” have a loose fan belt, while the other 10% have a loose muffler. Cars with a loose muffler commonly (95%) produce a rattle; less commonly (8%), fan-belt issues can also sound like a rattle. The car owner describes the strange noise as a rattle. What is the probability that the car has a loose muffler?
1 78%
2 57%
3 95%
Exercise 2.2
It is estimated that 80% of emails are spam. You have developed a new algorithm to detect
spam. Your spam software can detect 99% of spam emails but has a false positive rate of 5%.
If your company receives 1000 emails in a day, how many emails will be incorrectly marked as spam?
1 10
2 20
3 5
4 200
5 50
Exercise 2.3
You have developed a new algorithm for detecting fraud. It has a sensitivity of 90% with a
specificity of 95%. Choose the correct statement:
1 true positive rate = 90%, true negative rate = 5%
2 true positive rate = 90%, true negative rate = 95%
Common Distributions
Binomial Distribution
INTERACT_FLAG = True   # toggle interactive ipywidgets sliders vs. a single static plot
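The plotting helpers called on the following slides (binomial_vector_over_y and friends) are not shown in these slides; a minimal sketch of what such a helper might look like, under that assumption, is:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom
from ipywidgets import interact

def binomial_vector_over_y(theta, n):
    # plot the Binomial(n, theta) pmf over all possible counts y = 0, ..., n
    y = np.arange(0, n + 1)
    plt.bar(y, binom.pmf(y, n, theta))
    plt.xlabel("y")
    plt.ylabel("P(y | theta, n)")
    plt.show()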
if(INTERACT_FLAG):
interact(binomial_vector_over_y, theta=0.5, n=15)
else:
binomial_vector_over_y(theta=0.5, n=10)
Negative Binomial Distribution
Mean = r(1 − θ)/θ
Variance = r(1 − θ)/θ²
Example: to measure the number of days your car works before it breaks down for the 3rd time.
Conditions
Count of discrete events
The events can be non-independent (the events can influence or cause other events)
Variance can exceed the mean
if(INTERACT_FLAG):
interact(negative_binomial_vector_over_y, theta=0.9, total_events=15)
else:
negative_binomial_vector_over_y(theta=0.9, total_events=15)
Poisson Distribution
Mean = Variance = θ
Example: To model the number of accidents at an intersection. To model the number of
Salmonella outbreaks in a year.
Conditions
Discrete non-negative data - count of events, the rate parameter can be a non-integer
positive value
Each event is independent of other events
Each event happens at a fixed rate
A fixed amount of time in which the events occur
if(INTERACT_FLAG == True):
interact(poisson_vector, theta=7, y_end=20)
else:
poisson_vector(theta=7, y_end=20)
Exponential Distribution
x = np.linspace(0,x_end,x_end*4)
if(INTERACT_FLAG):
interact(exponential_distribution, lambda_rate = 4, x_end=20)
else:
exponential_distribution(lambda_rate = 0.2, x_end=20)
Gamma Distribution
α > 0 is the shape parameter; β > 0 is the rate parameter (the inverse scale parameter)
Mean = α/β
Variance = α/β²
Example: to model the time taken for 4 bolts in your car to fail.
Conditions
Continuous non-negative data
A generalization of the exponential distribution, with more parameters to fit
An exponential distribution models the time to the first event; a Gamma distribution models the time to the nth event, as illustrated in the simulation sketch below.
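A minimal simulation sketch (not from the slides) of that last point: the sum of n independent Exponential(λ) waiting times follows a Gamma(n, λ) distribution.

import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)
lam, n_events = 2.0, 4
waiting_times = rng.exponential(scale=1/lam, size=(100_000, n_events)).sum(axis=1)

print("simulated mean:", waiting_times.mean())                     # ~ n_events/lam = 2.0
print("Gamma(4, rate=2) mean:", gamma(a=n_events, scale=1/lam).mean())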
if(INTERACT_FLAG):
interact(gamma_individual,a=2,b=1,x_max=10)
else:
gamma_individual(2,1,10)
Normal Distribution
import math
if(INTERACT_FLAG):
interact(normal_distribution, mean = 4, sigma = 3)
else:
normal_distribution(mean = 5, sigma = 4)
Log-normal Distribution
P(x) = (1/(xσ√(2π))) e^(−(ln x − µ)²/(2σ²))
import math
x = np.linspace(0.1,2.5,100)
OPTION = 2
if OPTION == 1:
    # parameterize by the mean and sd of x itself
    mean_x = 2   # CHANGE THIS
    sigma_x = 2  # CHANGE THIS
    mean = np.log(mean_x**2 / np.sqrt(mean_x**2 + sigma_x**2))
    sigma = np.sqrt(np.log(1 + (sigma_x**2 / mean_x**2)))
else:
    # parameterize by the mode of x and the sd of log(x)
    sigma = 0.2  # CHANGE THIS
    mode = 0.8   # CHANGE THIS
    mean = np.log(mode) + sigma**2
if INTERACT_FLAG:
    interact(lognormal_distribution, mean=1, sigma=0.25)
else:
    lognormal_distribution(mean=mean, sigma=sigma)
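A quick numerical check (not from the slides; the lognormal_distribution helper itself is not shown) that the mode-based conversion above behaves as intended, using scipy:

import numpy as np
from scipy.stats import lognorm

sigma, mode = 0.2, 0.8
mu = np.log(mode) + sigma**2
x = np.linspace(0.01, 2.5, 1000)
pdf = lognorm.pdf(x, s=sigma, scale=np.exp(mu))   # scipy parameterization: s = sigma, scale = exp(mu)
print("density peaks near:", x[np.argmax(pdf)])   # ~0.8, the requested mode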
Student’s t-Distribution
Similar to the normal distribution with its bell shape but has heavier tails.
p(x) = [Γ((ν+1)/2) / Γ(ν/2)] √(λ/(νπ)) [1 + λ(x − µ)²/ν]^(−(ν+1)/2)

Mean = µ (for ν > 1)
Variance = ν/[(ν − 2)λ] (for ν > 2)
Example: a distribution of test scores from an exam with a significant number of outliers, which would not be well modeled by a Normal distribution.
Conditions
Continuous data
Unbounded distribution
Considered an overdispersed Normal distribution: a mixture of individual normal distributions with different variances, as illustrated in the sketch below
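A minimal simulation sketch (not from the slides) of that mixture representation: drawing a precision from a Gamma distribution and then a normal with that precision yields Student’s t.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu, n = 4, 200_000
precision = rng.gamma(shape=nu/2, scale=2/nu, size=n)        # Gamma(nu/2, rate nu/2)
samples = rng.normal(loc=0.0, scale=1.0/np.sqrt(precision))  # N(0, 1/precision)

print("simulated P(|X| > 3):", np.mean(np.abs(samples) > 3))
print("exact t(4)  P(|X| > 3):", 2 * stats.t.sf(3, df=nu))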
def studentst_distribution(v):
    t = np.linspace(-10, 10, 100)
    # ... density computation and plotting omitted here ...

if INTERACT_FLAG:
    interact(studentst_distribution, v=10)
else:
    studentst_distribution(v=10)
# Student's t density for several degrees of freedom (standard form: mu = 0, lambda = 1)
from scipy.special import gamma
import plotly.graph_objects as go

t = np.linspace(-10, 10, 100)
fig = go.Figure()
for v in [1, 4, 10]:
    term1 = gamma((v + 1)/2) / (np.sqrt(v * math.pi) * gamma(v/2))
    term2 = (1 + t**2 / v)**(-(v + 1)/2)
    fig.add_scatter(x=t, y=term1 * term2, name=f"v={v}", mode="lines")
fig.show()
Beta Distribution
P(θ|a, b) = [Γ(a + b) / (Γ(a)Γ(b))] θ^(a−1) (1 − θ)^(b−1)

Mean = a/(a + b)
Variance = ab/[(a + b)²(a + b + 1)]
Example: in Bayesian analyses, the beta distribution is often used as a prior distribution of
the parameter p (which is bounded between 0 and 1) of the binomial distribution.
Conditions
Takes positive values between 0 and 1 as input
Setting a and b to 1 gives you a uniform distribution
# Beta posterior with uniform Beta prior, a=1, b=1
def beta_vector_theta(num_p, total, a, b):
    # posterior parameters: a + successes, b + failures
    alpha = num_p + a
    beta = total - num_p + b
    theta = np.linspace(0, 1, 25)   # grid over theta (plotting code not shown on this slide)
    print("Posterior a =", alpha)
    print("Posterior b =", beta)

if INTERACT_FLAG:
    interact(beta_vector_theta, num_p=4, total=10, a=1, b=1)
else:
    beta_vector_theta(num_p=4, total=10, a=1, b=1)
Exercise 3.1
Matching distribution to its usage:
Distribution Usage
(1) Binomial distribution (A) Modeling the time between events occurring in a Poisson process.
(2) Negative Binomial distribution (B) Modeling continuous data that follow a symmetric bell-shaped curve.
(3) Poisson distribution (C) Modeling the number of events occurring in a fixed interval of time or space.
(4) Exponential distribution (D) Modeling the number of successes in a fixed number of independent Bernoulli trials.
(5) Gamma distribution (E) Modeling positive data that are skewed and have a distribution of logarithmic values that follow a normal distribution.
(6) Normal distribution (F) Modeling the waiting time until a given number of events occur in a Poisson process.
(7) Log-normal distribution (G) Used for hypothesis testing and constructing confidence intervals for small sample sizes when the population standard deviation is unknown.
(8) Student’s t distribution (H) Modeling the number of trials needed to achieve a fixed number of successes in independent Bernoulli trials.
(9) Beta distribution (I) Modeling continuous data that are bounded between 0 and 1, commonly used in Bayesian analysis as a prior distribution for binomial proportions.
Answer:
1 →
2 →
3 →
4 →
5 →
6 →
7 →
8 →
9 →
Priors
Bayes’ Rule gives us a method to update our beliefs based on prior knowledge.
The prior is the unconditional probability of the parameters before seeing the (new) data.
A prior can come from a number of sources, including:
past experiments or experience
some sort of desire for balance or weighting in a decision
non-informative, but objective
mathematical convenience
The choice of prior reflects what is currently known about the parameters; it is often subjective and contested. Two broad types of prior:
1 non-informative
2 informative
The prior can be proper, i.e. it conforms to the rules of probability and integrates to 1, or improper.
A convenient choice of prior can lead to a closed-form solution for the posterior.
Conjugate priors
Conjugate priors are priors that induce a posterior distribution in the same (known) family as the prior.
Example:
Data: X ~ Bern(θ)
Likelihood: L(x|θ) = ∏ᵢ θ^(xᵢ) (1 − θ)^(1−xᵢ) = θ^k (1 − θ)^(n−k), where k = Σ xᵢ
Prior ∝ kernel of a beta distribution = θ^(α−1) (1 − θ)^(β−1)
Posterior ∝ likelihood × prior ∝ θ^(k+α−1) (1 − θ)^(n−k+β−1), the kernel of a Beta(k + α, n − k + β) distribution
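A minimal numerical check (not from the slides) of this conjugacy, comparing the closed-form Beta posterior against a grid-normalized likelihood × prior; the Beta(2, 2) prior and the data k = 7, n = 20 are illustrative assumptions.

import numpy as np
from scipy.stats import beta

alpha0, beta0 = 2, 2
n, k = 20, 7
theta = np.linspace(0.001, 0.999, 999)

unnormalized = theta**k * (1 - theta)**(n - k) * beta.pdf(theta, alpha0, beta0)
grid_posterior = unnormalized / np.trapz(unnormalized, theta)
conjugate_posterior = beta.pdf(theta, alpha0 + k, beta0 + n - k)
print("max abs difference:", np.max(np.abs(grid_posterior - conjugate_posterior)))   # ~0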
Non-informative priors
Non-informative priors are priors that suggest ignorance as to the parameters. These are
sometimes called vague or diffuse priors.
The priors generally cover the region of the parameter space relatively smoothly.
Common non-informative priors: U[−100, 100], N(0, 10⁴).
Jeffreys prior is a non-informative prior that is derived from the Fisher information.
We do not specify prior information; we use the information in the data to shape the prior.
The Fisher information Iₙ(θ) tells us how much information about θ is contained in the data.
Jeffreys prior is derived as:

p(θ) ∝ √(Iₙ(θ)), where I(θ) = E[(∂/∂θ ln f(X; θ))²] = −E[∂²/∂θ² ln f(X; θ)]
Example:
Data: X ~ Gamma(α, β), assuming α is known and β is unknown.
The Fisher information is Iₙ(β) = nα/β², leading to the Jeffreys prior for β:

p(β) ∝ √(nα/β²) ∝ 1/β

Note: Jeffreys priors are not guaranteed to be proper. Perhaps most importantly, Jeffreys priors are invariant under reparameterization.
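A minimal sympy sketch (not from the slides) re-deriving the Fisher information used in this example, under the stated assumption that α is known:

import sympy as sy

x, alpha, beta_ = sy.symbols("x alpha beta", positive=True)
# log-density of a Gamma(alpha, rate=beta) observation
log_f = alpha * sy.log(beta_) - sy.log(sy.gamma(alpha)) + (alpha - 1) * sy.log(x) - beta_ * x

# I(beta) = -E[d^2/dbeta^2 log f(X; beta)]; here the second derivative is constant in x
print(sy.simplify(-sy.diff(log_f, beta_, 2)))   # alpha/beta**2, so I_n(beta) = n*alpha/beta**2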
Informative priors
Informative priors are explicitly chosen to represent current knowledge or belief about the
parameter of interest.
When choosing informative priors, one can also choose the functional form of the prior.
Example: Tossing Coins
We were given a new coin and told it would generate heads with P(heads) = 0.75.
We conduct a new experiment to characterize the distribution of θ.
When dealing with Bernoulli trials, a computationally convenient choice for the prior is Beta(a, b).
x = np.linspace(0, 1, 100)
# (plotting code not shown on this slide)
plt.legend(loc="upper left")
plt.show()
We can tune our prior by using the mean (or even the mode) to center our belief, and the variance as a measure of the strength of that belief.
Example: Tossing Coins
For the prior Beta(6.9, 3):

E[x] = a/(a + b) = 0.70,  mode(x) = (a − 1)/(a + b − 2) ≈ 0.75,  V(x) = ab/[(a + b)²(a + b + 1)] ≈ 0.02
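These summaries can be checked numerically (a quick sketch, not from the slides):

from scipy.stats import beta

a, b = 6.9, 3
print("mean:", beta.mean(a, b))          # ~0.70
print("mode:", (a - 1) / (a + b - 2))    # ~0.75
print("variance:", beta.var(a, b))       # ~0.02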
The general approach to using priors in models is to start with some justification for a
prior, run the analysis, then come up with competing priors and re-examine the
conclusions under the alternative priors.
Many Bayesian experts recommend that a sensitivity analysis should always be conducted. The process takes place as follows (a minimal numerical illustration follows the list):
The researcher predetermines a set of priors to use for model estimation.
The model is estimated, and convergence is obtained for all model parameters.
The researcher comes up with a set of competing priors to examine.
Results are obtained for the competing priors and then compared with the original results
through a series of visual and statistical comparisons.
The final model results are written up to reflect the original model results (obtained in Item
1, from the original priors), and the sensitivity analysis results are also presented in order to
comment on how robust (or not) the final model results are to different prior settings.
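As a minimal sketch of such a comparison (the Bernoulli data, k = 7 successes in n = 20 trials, and the two candidate Beta priors below are illustrative assumptions, not from the slides):

from scipy.stats import beta

n, k = 20, 7
priors = {"vague Beta(1, 1)": (1, 1), "informative Beta(6.9, 3)": (6.9, 3)}

for label, (a0, b0) in priors.items():
    a1, b1 = a0 + k, b0 + n - k
    lo, hi = beta.ppf([0.025, 0.975], a1, b1)
    print(f"{label}: posterior mean = {beta.mean(a1, b1):.3f}, 95% interval = ({lo:.3f}, {hi:.3f})")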
Exercise 4.1
We are studying a Bernoulli process for which we have no prior information. We decide to use a non-informative prior such as the Beta(1, 1). Because this prior is flat on [0, 1], it will have no effect on the posterior.
1 True
2 False (Although, as you gather more data, the effect of the prior diminishes.)
Exercise 4.2
You are given a data set containing the average weight of male flies and decide to model the average weight via a normal distribution. You don’t know what to expect in terms of the mean, but you expect a previous study of female flies to be informative about the variance (σ_f²). Using conjugates, your model, assuming unknown mean and known variance, will be (likelihood × prior):
1 N(µ, σ²) × N(µ₀, σ_f²)
2 N(µ, σ_f²) × N(µ₀, σ₀²)
3 N(µ, σ²) × N(µ, σ₀²)
Computing Posterior Distributions
The Binomial Case
p(θ) = [1/B(α₀, β₀)] θ^(α₀−1) (1 − θ)^(β₀−1),   B(α₀, β₀) = Γ(α₀)Γ(β₀)/Γ(α₀ + β₀)
Posterior distribution
p(θ|x) ∝ L(θ|x) p(θ) ∝ Beta(α, β)

p(θ|x) = [1/B(α, β)] θ^(α−1) (1 − θ)^(β−1),   B(α, β) = Γ(α)Γ(β)/Γ(α + β)

where α = α₀ + x and β = β₀ + n − x
x = np.linspace(0, 1, 100)

def f(p):
    # unnormalized binomial likelihood for y = 10 successes out of n = 50
    return p**(10) * (1 - p)**(50 - 10)
import sympy as sy
from scipy.stats import beta
# (numpy as np and matplotlib.pyplot as plt are assumed imported earlier)

# normalizing constant of the likelihood: integral of p^10 (1-p)^40 over [0, 1]
p = sy.Symbol("p")
I = sy.integrate(f(p), (p, 0, 1))
print(I)   # 1/523886186670
scaled_likelihood = (x)**(10) * (1 - x)**(50 - 10) * (523886186670)

# posterior: likelihood * prior, with an informative Beta(9, 93) prior
alpha0, beta0, y, n = 9, 93, 10, 50
alpha1 = alpha0 + y        # 9 + 10 = 19
beta1 = beta0 + n - y      # 93 + 50 - 10 = 133
posterior = beta.pdf(x, alpha1, beta1)
plt.plot(x, posterior, "r--", label="posterior = beta(19, 133)")
plt.legend(loc="upper right")
plt.ylim(-0.1, 20)
plt.xlim(0, 0.4)
plt.show()
def f(p):
    # unnormalized binomial likelihood, as before: y = 10 successes out of n = 50
    return p**(10) * (1 - p)**(50 - 10)

p = sy.Symbol("p")
I = sy.integrate(f(p), (p, 0, 1))
print(I)
scaled_likelihood = (x)**(10) * (1 - x)**(50 - 10) * (523886186670)
# posterior: likelihood * prior, now with a non-informative Beta(1, 1) prior
alpha0, beta0, y, n = 1, 1, 10, 50
alpha1 = alpha0 + y        # 1 + 10 = 11
beta1 = beta0 + n - y      # 1 + 50 - 10 = 41
posterior = beta.pdf(x, alpha1, beta1)
plt.plot(x, posterior, "r--", label="posterior = beta(11, 41)")
plt.legend(loc="upper right")
plt.ylim(-0.1, 20)
plt.xlim(0, 0.4)
plt.xlabel('theta')
plt.show()
Consider a coin-tossing experiment. You are given a presumably unfair coin for which p(heads) = 0.80 was estimated from 20 previous tosses. You now collect new data and analyze the posterior by doing 10 coin tosses and getting 4 heads.
A. Choose the distribution for your prior and construct your posterior distribution.
B. In case no prior information is available, construct your posterior distribution.
The Gaussian Case
f(y) = (1/(√(2π) σ)) e^(−(y − µ)²/(2σ²))

Given a sample y₁, ..., yₙ, we obtain the likelihood:

L(µ|y) ∝ exp[−(1/(2σ²)) Σᵢ (yᵢ − µ)²] ∝ exp[−½ ((µ − ȳ)/(σ/√n))²] ≡ L(µ|ȳ)
Denote the IBBENS sample of size n₀ by y₀ ≡ {y₀,₁, y₀,₂, ..., y₀,ₙ₀}, with mean ȳ₀.
Likelihood ∝ N(µ₀, σ₀²) with
µ₀ ≡ ȳ₀ = 328
σ₀ = σ/√n₀ = 120.3/√563 = 5.072
IBBENS prior distribution:

p(µ) = (1/(√(2π) σ₀)) exp[−½ ((µ − µ₀)/σ₀)²]

with µ₀ ≡ ȳ₀
IBBENS-2 Study:
Sample y with n = 50
ȳ = 318 mg/day and s = 119.5 mg/day
The 95% confidence interval = [284.3, 351.9] mg/day ⇒ wide
Combine the IBBENS prior distribution with the IBBENS-2 normal likelihood:
IBBENS-2 likelihood: L(µ|ȳ)
IBBENS prior density: N(µ₀, σ₀²)
Posterior distribution ∝ p(µ) L(µ|ȳ):

p(µ|y) ∝ p(µ|ȳ) ∝ exp{−½ [((µ − µ₀)/σ₀)² + ((µ − ȳ)/(σ/√n))²]}
1/σ̄² = w₀ + w₁

with w₀ = 1/σ₀² = prior precision and w₁ = 1/(σ²/n) = sample precision.
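A minimal sketch (using the IBBENS numbers quoted above) of this combination. The posterior mean as a precision-weighted average, µ̄ = (w₀µ₀ + w₁ȳ)/(w₀ + w₁), is the standard conjugate normal result; the code below simply evaluates it, using s in place of σ.

import numpy as np

mu0, sigma0 = 328.0, 5.072        # IBBENS prior
ybar, s, n = 318.0, 119.5, 50     # IBBENS-2 data

w0 = 1 / sigma0**2                # prior precision
w1 = n / s**2                     # sample precision
posterior_var = 1 / (w0 + w1)
posterior_mean = (w0 * mu0 + w1 * ybar) / (w0 + w1)
print("posterior mean:", round(posterior_mean, 1), "posterior sd:", round(np.sqrt(posterior_var), 2))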
Exercise 5.2.1
Given a prior with mean 10 and data with mean 5, we should expect the posterior mean to lie
1 to the left of 5
2 to the right of 10
3 between 5 and 10
The Poisson Case
p(θ) = (β₀^α₀ / Γ(α₀)) θ^(α₀−1) e^(−β₀θ)

Posterior:

p(θ|y) ∝ L(θ|y) p(θ) ∝ e^(−nθ) ∏ᵢ (θ^(yᵢ)/yᵢ!) · (β₀^α₀ / Γ(α₀)) θ^(α₀−1) e^(−β₀θ) ∝ θ^((Σ yᵢ + α₀) − 1) e^(−(n + β₀)θ)

We recognize the kernel of a Gamma(Σ yᵢ + α₀, n + β₀) distribution:

⇒ p(θ|y) ≡ p(θ|ȳ) = (β̄^ᾱ / Γ(ᾱ)) θ^(ᾱ−1) e^(−β̄θ)

with ᾱ = Σ yᵢ + α₀ = 9758 + 3 = 9761 and β̄ = n + β₀ = 4351 + 1 = 4352 ⇒ STM study: the effect of the prior is minimal.
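A minimal sketch (not from the slides) of summarizing this Gamma(9761, 4352) posterior with scipy:

from scipy.stats import gamma

alpha_post, beta_post = 9761, 4352
posterior = gamma(a=alpha_post, scale=1/beta_post)   # scipy uses scale = 1/rate
print("posterior mean:", posterior.mean())           # 9761/4352 ≈ 2.24
print("95% credible interval:", posterior.interval(0.95))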
Exercise 5.3.1
Given y ~ Poisson(λ) and a prior p(λ) = Gamma(a, b), what is the mean of the posterior based on n observations? Assume x̄ = mean(y).
1 (a + x̄)/(b + n)
2 (a + n x̄)/(b + n)