
Bayesian Estimation of Parameters

Bayesian parameter estimation is a technique for estimating the probability density function of random
variables with unknown parameters. It involves identifying the prior distribution, collecting data and
forming the likelihood, and finding the posterior distribution. A Bayes estimator is an estimator
that minimizes the posterior expected loss or maximizes the posterior expected utility.
Classical theory assumes parameters are fixed quantities that we want to estimate as precisely as
possible. The Bayesian perspective is different: parameters are random variables, and probabilities are
assigned to particular parameter values to reflect the degree of evidence for those values.

The steps involved in Bayesian inference

 Define the research question or problem: Clearly articulate the problem you aim to solve using
Bayesian inference.

 Specify the prior probabilities: Determine the initial beliefs or probabilities based on available
information.

 Collect and analyze data: Gather relevant data and analyze it using appropriate statistical methods.

 Update the prior probabilities: Apply Bayes’ theorem to combine the prior probabilities with the
observed data and obtain posterior probabilities.

 Interpret the results: Interpret the posterior probabilities in the context of the research question
and draw conclusions.
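As a concrete illustration of these five steps, here is a minimal Python sketch; the coin-bias question, the grid of candidate values, and the data counts are all hypothetical, chosen only to make the update explicit:

```python
import numpy as np

# Step 1: define the question -- what is the bias theta of a coin?
# Step 2: specify the prior -- a uniform prior over a grid of candidates.
theta = np.linspace(0.01, 0.99, 99)
prior = np.ones_like(theta) / len(theta)

# Step 3: collect data -- suppose 7 heads in 10 flips (hypothetical).
heads, flips = 7, 10
likelihood = theta**heads * (1 - theta)**(flips - heads)

# Step 4: update via Bayes' theorem -- posterior is proportional to
# likelihood x prior, normalised over the grid.
posterior = likelihood * prior
posterior /= posterior.sum()

# Step 5: interpret -- e.g., report the posterior mean as a point estimate.
print("Posterior mean of theta:", np.sum(theta * posterior))
```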

Bayesian estimation of parameters relies on several key assumptions. These assumptions ensure that the
process of updating beliefs about a parameter using Bayes' theorem is valid and meaningful. Here are the
main assumptions:

1. Model Assumptions

a. Correct Model Specification

 It is assumed that the statistical model chosen correctly represents the underlying data generation
process.
 The likelihood function p(X∣θ) accurately describes the probability of observing the data given the
parameter θ.

b. Independence of Observations

 Observations are often assumed to be independent and identically distributed (i.i.d.).
 For models that assume independence, each data point provides independent information about
the parameter.

2. Prior Assumptions

a. Choice of Prior Distribution

 The prior distribution p(θ) reflects prior beliefs or knowledge about the parameter before observing
the data.
 The prior should be chosen based on subject matter knowledge, previous research, or convenience
(e.g., conjugate priors).

b. Informative vs. Non-informative Priors

 Informative priors incorporate substantial prior knowledge.
 Non-informative (or weakly informative) priors are used when little prior information is available,
often aiming to have minimal influence on the posterior.

3. Data Assumptions

Data Representativeness

 The observed data is representative of the population or process being studied.
 The data is collected in a way that avoids biases that could distort the estimation.

4. Computational Assumptions

Convergence of Algorithms

 When using computational methods such as Markov Chain Monte Carlo (MCMC) to approximate
the posterior distribution, it is assumed that the algorithms converge to the true posterior
distribution.
 Proper diagnostics and checks are necessary to ensure convergence.

5. Mathematical and Logical Assumptions

Bayes' Theorem Validity

 Bayes' theorem is mathematically sound and applies to the problem at hand.
 The prior, likelihood, and evidence are properly defined and integrable functions.

Mathematical derivation:

Bayes parameter estimation (BPE) is a widely used technique for estimating the probability density
function of random variables with unknown parameters. Suppose that we have an observable random
variable X for an experiment and its distribution depends on unknown parameter θ taking values in a
parameter space Θ. The probability density function of X for a given value of θ is denoted by p(x|θ ). It
should be noted that the random variable X and the parameter θ can be vector-valued. Now we obtain a
set of independent observations or samples S = {x1, x2, ..., xn} from an experiment. Our goal is to compute
p(x|S) which is as close as we can come to obtain the unknown p(x), the probability density function of X.
In Bayes parameter estimation, the parameter θ is viewed as a random variable or random vector following
the distribution p(θ). Then the probability density function of X given a set of observations S can be
estimated by

p(x|S) = ∫Θ p(x|θ) p(θ|S) dθ .............. (1)

So if we know the form of p(x|θ) with unknown parameter vector θ, then we need to estimate
the weight p(θ|S), often called the posterior, so as to obtain p(x|S) using Eq. (1). Based on Bayes'
theorem, the posterior can be written as

p(θ|S) = p(S|θ) p(θ) / ∫Θ p(S|θ) p(θ) dθ .............. (2)

where p(θ) is called the prior distribution, or simply the prior, and p(S|θ) is called the likelihood
function. A prior is intended to reflect our knowledge of the parameter before we gather data, and the
posterior is the updated distribution after obtaining the information from the data.

Example 1: Binomial Model


Let S = {x1,x2,...,xn} be a set of coin flipping observations, where xi = 1 denotes 'Head' and xi = 0 denotes
'Tail'. Assume the coin is weighted and our goal is to estimate parameter θ , the probability of 'Head'.
Assume that we flipped a coin 20 times yesterday, but we did not remember how many times 'Head'
was observed. What we know is that the probability of 'Head' is around 1/4, but this probability is
uncertain since we only did 20 trials and we did not remember the number of 'Heads'. With this prior
information, we decide to do this experiment today so as to estimate the parameter θ.

A prior represents our previous knowledge or belief about the parameter θ. Based on our memories from
yesterday, assume that the prior of θ follows the Beta distribution Beta(5, 15).

Today we flipped the same coin n times and y 'Heads' were observed. Then we compute
the posterior with today's data. Consider Eq. (2); the posterior is written as

p(θ|S) ∝ θ^y (1−θ)^(n−y) · θ^4 (1−θ)^14 = θ^(y+4) (1−θ)^(n−y+14),

i.e., the posterior is Beta(y+5, n−y+15).

Assume that we did 500 trials and 'Head' appeared 220 times; then the posterior is Beta(225, 295). It can be
noted that the posterior and the prior distribution have the same form. This kind of prior distribution is called
a conjugate prior. The Beta distribution is conjugate to the binomial distribution, which gives the likelihood
of i.i.d. Bernoulli trials.
As we can see, the conjugate prior successfully includes previous information, or our belief of parameter θ
into the posterior. So our knowledge about the parameter is updated with today’s data, and the posterior
obtained today can be used as prior for tomorrow’s estimation. This reveals an important property of Bayes
parameter estimation, that the Bayes estimator is based on cumulative information or knowledge of
unknown parameters, from past and present.
After we obtain the posterior, we can estimate the probability density function of the random variable X.
Consider Eq. (1); the density function can be expressed as

p(x = 1|S) = ∫ θ p(θ|S) dθ = (y + 5) / (n + 20).

It suggests that the prior Beta(5, 15) is actually equivalent to adding 20 Bernoulli observations to the data:
5 Heads and 15 Tails. This means the posterior summarizes all our knowledge about the parameter θ, and
the prior does affect the estimate of the density of the random variable X. However, as we do more and more
coin-flipping trials (i.e., as n gets larger), the density function p(x|S) will almost surely converge to the
underlying distribution (Figure 3), which means the prior becomes less important. Figure 4 illustrates that
as n gets larger, the posterior becomes sharper. In our experiment, when n = 10000, the posterior has a
sharp peak at θ = 0.4. In this case, the prior (centered around 1/4) has little effect on the posterior, and we
have strong evidence to say that the probability of 'Head' is around 0.4.
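The conjugate update in this example is easy to reproduce numerically. Below is a minimal Python sketch (using SciPy) that reproduces the Beta(225, 295) posterior from the counts above; the credible-interval summary at the end is an added illustration, not part of the original example:

```python
from scipy import stats

# Prior from yesterday's vague memory: Beta(5, 15), mean 0.25.
a0, b0 = 5, 15

# Today's data: 220 'Heads' in 500 flips.
y, n = 220, 500

# Conjugacy: Beta(a0, b0) prior + binomial likelihood
#   -> Beta(a0 + y, b0 + n - y) posterior.
a_post, b_post = a0 + y, b0 + (n - y)
print(f"Posterior: Beta({a_post}, {b_post})")          # Beta(225, 295)

# Posterior mean (y + 5) / (n + 20): the prior acts like 20 extra flips.
print("Posterior mean:", a_post / (a_post + b_post))   # ~0.433

# Full distributional information: a 95% credible interval for theta.
print("95% credible interval:", stats.beta(a_post, b_post).interval(0.95))
```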
Example 2: Gaussian Model
The Bernoulli distribution discussed above is a discrete example; here we will illustrate a continuous
example, a Gaussian random variable X [3]. Assume S = {x1, x2, ..., xn} is a set of independent
observations of a Gaussian random variable X ∼ N(μ, σ²) with known variance σ² and unknown mean μ.
Here we use a conjugate prior p(μ) ∼ N(μ0, σ0²). Then the posterior can be computed by

p(μ|S) ∝ p(S|μ) p(μ) = [ ∏ᵢ₌₁ⁿ p(xᵢ|μ) ] · p(μ)

So the posterior also follows a Gaussian distribution N(μn, σn²), where μn and σn² are defined by

μn = (n σ0² x̄ + σ² μ0) / (n σ0² + σ²),    σn² = (σ0² σ²) / (n σ0² + σ²),

where x̄ is the sample mean of the observations.
Consider Eq. (1); the estimated density of the random variable X is then N(μn, σ² + σn²).

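A quick numerical check of these Gaussian-update formulas, as a hedged sketch; the noise level, prior parameters, and simulated data below are hypothetical illustration values:

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = 2.0                  # known standard deviation of X
mu0, sigma0 = 0.0, 1.0       # conjugate prior: mu ~ N(mu0, sigma0^2)

# Hypothetical data generated with true mean 1.5
x = rng.normal(1.5, sigma, size=50)
n, xbar = len(x), x.mean()

# Posterior N(mu_n, var_n) from the formulas above
mu_n = (n * sigma0**2 * xbar + sigma**2 * mu0) / (n * sigma0**2 + sigma**2)
var_n = (sigma0**2 * sigma**2) / (n * sigma0**2 + sigma**2)
print(f"Posterior:  N({mu_n:.3f}, {var_n:.4f})")

# Estimated (predictive) density of X via Eq. (1): N(mu_n, sigma^2 + var_n)
print(f"Predictive: N({mu_n:.3f}, {sigma**2 + var_n:.4f})")
```

As n grows, var_n shrinks and mu_n is pulled towards the sample mean, mirroring the coin-flipping example where the prior's influence fades with more data.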
Merits and Demerits

Merits of Bayesian Estimation

1. Incorporation of Prior Knowledge:
- Allows integration of prior knowledge or expert opinion through the prior distribution.

2. Full Distributional Information:
- Provides a full posterior distribution, offering comprehensive uncertainty quantification.

3. Sequential Updating:
- Supports real-time data assimilation and updating of beliefs as new data arrives.

4. Flexible Modelling:
- Handles complex hierarchical and mixture models, accommodating various data structures.

5. Effective with Small Sample Sizes:
- Performs well with limited data by leveraging prior information and reducing overfitting.

Demerits of Bayesian Estimation

1. Sensitivity to Prior Choice:
- Results can be heavily influenced by the choice of prior, requiring careful selection.

2. Computational Complexity:
- Often requires intensive algorithms like MCMC, demanding significant computational resources.

3. Challenging Prior Interpretation:
- Difficult to specify appropriate priors without solid prior information, which can be subjective.

4. Complex Integration and Normalization:
- Computing the marginal likelihood for model comparison can be challenging, especially in high dimensions.

5. Difficulty in Communication:
- Explaining Bayesian concepts, such as priors and posterior distributions, can be complex for non-statisticians.

Stochastic Process
A family of random variables {X(t), t ≥ 0} indexed by the time parameter t is called a stochastic
process.
The parameter t belongs to T, where T is known as the index set; it can be either discrete or continuous.
T is also known as the parameter space.
The values assumed by the process are called the states, and the set of possible values is called the state
space, denoted by S.
A state space is discrete if it contains a finite or a denumerably infinite number of points; otherwise it is
continuous.

Markov Chain
A stochastic process {Xn; n ∈ T} is said to be a Markov chain if the index set and the state space
are both discrete and the sequence of random variables satisfies the following condition:

P[Xn+1 = j | X0 = i0, X1 = i1, ..., Xn = in] = P[Xn+1 = j | Xn = in] for all n ≥ 0

[i.e., the probability of jumping from one state to the next depends only on the current
state and not on the sequence of previous states that led to this current state.]

where i0, i1, ..., in, j belong to S; the sequence {Xn} is said to possess the Markov property.
If the state space is finite, the Markov chain is known as a finite Markov chain, and if the state space is
countably infinite, the Markov chain is known as a countable Markov chain.
The outcomes of the trials are called the states of the MC. If Xn = j, then the process is said to be
in state j at the nth trial.

Transition Probabilities
Pij(n) = P(Xn = j | X0 = i) is known as the n-step transition probability of a Markov chain from state i at time
0 to state j after n time units.
Pij(n, n+1) is known as the one-step transition probability of the Markov chain from state i at the time point tn
to the state j at the time point tn+1.
These one-step transition probabilities should satisfy Pij(n, n+1) ≥ 0 and ∑j∈S Pij(n, n+1) = 1 for all i; they can
be represented in matrix form, which is known as the transition probability matrix (TPM) of the MC and is
denoted by P.
Stationary Distribution
Suppose we have a process with a few states and a fixed transition probability matrix P for jumping between
states.
We start with some probability distribution Si over all states at time step i; to estimate the probability
distribution over all states at the next time step i+1, we multiply it by the transition matrix P:

Si+1 = Si P

If we keep doing this, after a while S stops changing when multiplied by the matrix P; this is when we say it
has reached a stationary distribution:

S = S P
Ex: Find the stationary distribution of the MC with TPM

            0     1     2
      0 [   0    2/3   1/3 ]
  P = 1 [  3/8   1/8   1/2 ]
      2 [  1/2   1/2    0  ]

Sol: We want to satisfy S = S P, i.e., (S0, S1, S2) = (S0, S1, S2) P. Writing this out componentwise gives

S0 = (3/8) S1 + (1/2) S2 .............. (1)
S1 = (2/3) S0 + (1/8) S1 + (1/2) S2 .............. (2)
S2 = (1/3) S0 + (1/2) S1 .............. (3)
After solving the above three equations together with the normalization S0 + S1 + S2 = 1, we obtain
S = (S0, S1, S2) = (0.3, 0.4, 0.3), which are the steady-state probabilities of the MC,
i.e., P0 = (0.3, 0.4, 0.3),
P1 = P0 P = (0.3, 0.4, 0.3),
..., Pn = P0 Pⁿ = (0.3, 0.4, 0.3).
This is the stationary distribution of the MC.
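The steady-state vector is easy to verify numerically; the following Python sketch simply iterates S ← S P from an arbitrary starting distribution until it stops changing:

```python
import numpy as np

P = np.array([[0,   2/3, 1/3],
              [3/8, 1/8, 1/2],
              [1/2, 1/2, 0  ]])

S = np.array([1.0, 0.0, 0.0])       # any starting distribution works
for _ in range(500):
    S_next = S @ P                  # one step: S_{i+1} = S_i P
    if np.allclose(S_next, S, atol=1e-12):
        break
    S = S_next

print(S)                            # -> [0.3 0.4 0.3]
```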

Markov Chain Monte Carlo (MCMC)


MCMC can be used to sample from any probability distribution. Mostly we use it to sample from an
intractable posterior distribution for the purpose of inference. Estimating the posterior using Bayes'
theorem can be difficult: in most cases we can find the functional form of likelihood × prior, but
computing the marginal probability P(B) can be computationally expensive, especially when it involves a
continuous distribution.
The trick here is to avoid calculating the normalization constant altogether.
The General Idea for the algorithm is to start with some random probability distribution and
gradually move towards desired probability distribution.
 Initiate a Markov chain with a random probability distribution over states.
 Gradually move along the chain, converging towards the stationary distribution.
 Apply a condition (detailed balance) that ensures this stationary distribution resembles the
desired probability distribution.

Thus, on reaching the stationary distribution, we have approximated the posterior probability distribution.

 The probability p(A) represents the probability of being at A, and T(A → B) represents the
probability of moving from A to B.
 The probability p(B) represents the probability of being at B, and T(B → A) represents the
probability of moving from B to A.
 Each side of the detailed balance condition p(A) T(A → B) = p(B) T(B → A) represents the
probability flow from A to B or from B to A, respectively.
If the condition is satisfied, it guarantees that the stationary state approximately represents the
posterior distribution.
 Although MCMC methods themselves are complicated, they provide a lot of flexibility: they give us
efficient sampling in high dimensions and can be used to solve problems with a large state space.

Limitation: MCMC does not perform well at approximating probability distributions that have multiple modes.

 The marginal probability P(B) in this case is a constant, known as a normalization constant, obtained
by summing (or integrating) the numerator over all its possible values.
 There are techniques for training or evaluating models that have an intractable normalization constant
(also known as a partition function); a few of them use MCMC for sampling in their algorithms.

Metropolis–Hastings Algorithm

The Metropolis-Hastings algorithm is a specific type of Markov Chain Monte Carlo (MCMC) method used to
generate samples from a target distribution when direct sampling is difficult. This algorithm constructs a
Markov chain with the desired stationary distribution by using a proposal distribution to suggest new states
and an acceptance probability to decide whether to accept or reject these new states.
Suppose we are sampling from a distribution p(x) = f(x) / Z, where Z is the intractable normalization constant.
Our objective is to sample from p(x) in a way that makes use of the numerator f(x) alone and avoids
having to estimate the denominator Z.

The Metropolis–Hastings algorithm generates a sequence of sample values such that, as more and
more sample values are produced, the distribution of values more closely approximates the desired
distribution. These sample values are produced iteratively, with the distribution of the next sample
depending only on the current sample value, which makes the sequence of samples a Markov chain.

Derivation

The purpose of the Metropolis–Hastings algorithm is to generate a collection of states according to a
desired distribution P(x). To accomplish this, the algorithm uses a Markov process, which asymptotically
reaches a unique stationary distribution π(x) such that π(x) = P(x).

A Markov process is uniquely defined by its transition probabilities 𝑃(𝑥′∣𝑥), the probability of transitioning
from any given state 𝑥 to any other given state 𝑥′ . It has a unique stationary distribution 𝜋(𝑥) when the
following two conditions are met:

1. Existence of stationary distribution: there must exist a stationary distribution π(x). A sufficient but
not necessary condition is detailed balance, which requires that each transition x → x′ is reversible:
for every pair of states x, x′, the probability of being in state x and transitioning to state x′ must be
equal to the probability of being in state x′ and transitioning to state x, i.e., π(x) P(x′|x) = π(x′) P(x|x′).
2. Uniqueness of stationary distribution: the stationary distribution 𝜋(𝑥) must be unique. This is
guaranteed by ergodicity of the Markov process, which requires that every state must
 Be aperiodic—the system does not return to the same state at fixed intervals; and
 Be positive recurrent—the expected number of steps for returning to the same state is
finite.

Step 1: The Metropolis–Hastings algorithm involves designing a Markov process (by constructing transition
probabilities) that fulfils the two conditions above, such that its stationary distribution π(x) is chosen to
be P(x). The derivation of the algorithm starts with the condition of detailed balance:

P(x′|x) P(x) = P(x|x′) P(x′)

Step 2: which is re-written as

P(x′|x) / P(x|x′) = P(x′) / P(x)

Step 3: The approach is to separate the transition into two sub-steps: the proposal and the
acceptance-rejection.
The proposal distribution g(x′|x) is the conditional probability of proposing a state x′ given x, and the
acceptance distribution A(x′, x) is the probability of accepting the proposed state x′.
The transition probability can be written as the product of the two:

P(x′|x) = g(x′|x) A(x′, x),

and similarly P(x|x′) = g(x|x′) A(x, x′).

Step 4: Inserting this relation into the previous equation, we have

A(x′, x) / A(x, x′) = [ P(x′) g(x|x′) ] / [ P(x) g(x′|x) ]
Step 5: The next step in the derivation is to choose an acceptance ratio that fulfils the condition above.
One common choice is the Metropolis choice:

A(x′, x) = min( 1, [ P(x′) g(x|x′) ] / [ P(x) g(x′|x) ] )

Step 6: We keep on sampling for a long time and discard the initial few samples as the chain has not
reached its stationary state yet (this period is known as burn-in period).

* For this Metropolis acceptance ratio A, either A(x′, x) = 1 or A(x, x′) = 1, and either way the detailed
balance condition is satisfied.

Thus the Metropolis–Hastings algorithm in Bayesian inference can be written as follows:

1. Initialise
 Pick an initial state θ0.
 Set i = 0.

2. Iterate
i) Generate a random candidate state θ∗ according to Q(θ∗|θi).
ii) Calculate the acceptance probability

Pacc(θi → θ∗) = min( 1, [ L(y|θ∗) P(θ∗) Q(θi|θ∗) ] / [ L(y|θi) P(θi) Q(θ∗|θi) ] )

iii) Accept or reject:
 Generate a uniform random number u ∈ [0, 1];
 If u ≤ Pacc(θi → θ∗), then accept the new state and set θi+1 = θ∗;
 If u > Pacc(θi → θ∗), then reject the new state and copy the old state forward: θi+1 = θi.
iv) Increment: set i = i + 1.

Provided that the specified conditions are met, the empirical distribution of saved states θ0, θ1, ..., θi will
approach P(θ).

MCMC can be used to draw samples from the posterior distribution of a statistical model. The acceptance
probability is given above, where L is the likelihood, P(θ) the prior probability density, and Q the
(conditional) proposal probability.
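As a concrete illustration, here is a minimal random-walk Metropolis sampler in Python. The unnormalised target f (a standard normal without its constant Z), the proposal step size, and the burn-in length are all illustrative choices; in Bayesian inference f(θ) would be likelihood × prior:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Unnormalised target: exp(-x^2 / 2), i.e. N(0, 1) without its Z.
    return np.exp(-0.5 * x**2)

def metropolis(n_samples, step=1.0, x0=0.0, burn_in=1000):
    x, samples = x0, []
    for i in range(n_samples + burn_in):
        x_prop = x + rng.normal(0.0, step)     # symmetric proposal g
        # With a symmetric g the ratio g(x|x')/g(x'|x) cancels, leaving
        # the Metropolis choice A(x', x) = min(1, f(x') / f(x)).
        if rng.uniform() <= min(1.0, f(x_prop) / f(x)):
            x = x_prop                         # accept the candidate
        if i >= burn_in:                       # discard the burn-in period
            samples.append(x)
    return np.array(samples)

draws = metropolis(20000)
print(draws.mean(), draws.std())   # ~0 and ~1 for the N(0, 1) target
```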


Gibbs Sampling Algorithm

Gibbs sampling is a special case of the Metropolis–Hastings algorithm. Gibbs sampling is applicable when
the joint distribution is not known explicitly or is difficult to sample from directly, but the conditional
distribution of each variable is known and is easy (or at least, easier) to sample from.

The point of Gibbs sampling is that given a multivariate distribution it is simpler to sample from a
conditional distribution than to marginalize by integrating over a joint distribution.

Bivariate Distribution case


Our goal is to sample from the 2D normal distribution with

μ = (0, 0),    Σ = [  1   1/2 ]
                   [ 1/2   1  ]
Now we need to sample from this 2D N(μ, Σ) with pdf f(x, y), which might be a difficult task. So instead of
sampling directly from the joint pdf, we consider the conditional distributions f(x|y) and f(y|x) and obtain
the samples from them. This makes the task easy.

From the given data (here ρ = 1/2),

f(x|y) = N(ρy, 1 − ρ²) = N(y/2, 3/4)

and similarly f(y|x) = N(ρx, 1 − ρ²) = N(x/2, 3/4).

Now we have two univariate distributions from which it is quite easy to sample.

Procedure:
1) Start at (x(0), y(0)).
2) Sample x(1) ~ f(x | y(0)).
3) Sample y(1) ~ f(y | x(1)).

Then iterate steps 2 and 3 until we obtain the desired number of samples.

This is for the Bivariate Distribution.
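A minimal Python sketch of this bivariate Gibbs sampler (with ρ = 1/2; the seed, burn-in, and sample size are arbitrary illustration values):

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.5                             # off-diagonal of Sigma
sd = np.sqrt(1 - rho**2)              # conditional sd, sqrt(3/4)

x, y = 0.0, 0.0                       # 1) start at (x(0), y(0))
samples, burn_in = [], 500
for i in range(10000 + burn_in):
    x = rng.normal(rho * y, sd)       # 2) x ~ f(x | y) = N(y/2, 3/4)
    y = rng.normal(rho * x, sd)       # 3) y ~ f(y | x) = N(x/2, 3/4)
    if i >= burn_in:
        samples.append((x, y))

samples = np.array(samples)
print(np.cov(samples.T))              # approaches [[1, 0.5], [0.5, 1]]
```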

Generalization
Let y denote observations generated from the sampling distribution f(y|θ) and π(θ) be a prior
supported on the parameter space Θ. Then one of the central goals of Bayesian statistics is to
approximate the posterior density

π(θ|y) = f(y|θ) · π(θ) / m(y),

where the marginal likelihood m(y) = ∫Θ f(y|θ) · π(θ) dθ is assumed to be finite for all y.
To explain the Gibbs sampler, we additionally assume that the parameter space Θ is decomposed as

Θ = ∏ᵢ Θᵢ = Θ1 × ⋯ × Θᵢ × ⋯ × ΘK,   (K > 1),

where × represents the Cartesian product. Each component parameter space Θᵢ can be a set of scalar
components, subvectors, or matrices.
Define a set Θ−ᵢ that complements Θᵢ. The essential ingredients of the Gibbs sampler are the i-th full
conditional posterior distributions, for each i = 1, ..., K:

π(θᵢ|θ−ᵢ, y) = π(θᵢ|θ1, ..., θᵢ−1, θᵢ+1, ..., θK, y).

The following algorithm details a generic Gibbs sampler

Initialize: Pick an arbitrary starting value θ(1) = (θ1(1), θ2(1), ..., θK(1)).

Iterate a Cycle:

Step 1. Draw 𝜃1(𝑠+1)∼𝜋(𝜃1|𝜃2(𝑠),𝜃3(𝑠),⋯,𝜃𝐾(𝑠),𝑦)

Step 2. Draw 𝜃2(𝑠+1)∼𝜋(𝜃2|𝜃1(𝑠+1),𝜃3(𝑠),⋯,𝜃𝐾(𝑠),𝑦)

Step i. Draw 𝜃𝑖(𝑠+1)∼𝜋(𝜃𝑖|𝜃1(𝑠+1),𝜃2(𝑠+1),⋯,𝜃𝑖−1(𝑠+1),𝜃𝑖+1(𝑠),⋯,𝜃𝐾(𝑠),𝑦)

Step i+1. Draw 𝜃𝑖+1(𝑠+1)∼𝜋(𝜃𝑖+1|𝜃1(𝑠+1),𝜃2(𝑠+1),⋯,𝜃𝑖(𝑠+1),𝜃𝑖+2(𝑠),⋯,𝜃𝐾(𝑠),𝑦)

Step K. Draw 𝜃𝐾(𝑠+1)∼𝜋(𝜃𝐾|𝜃1(𝑠+1),𝜃2(𝑠+1),⋯,𝜃𝐾−1(𝑠+1),𝑦)


End Iterate: Repeat the cycle for a number of iterations, updating one variable at a time, for all the
variables present in the system.
A Markov chain is created in the process which converges to the target distribution; we have to
discard the initial values that led us to the target sample, i.e., we have to discard the burn-in phase.
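Schematically, one cycle of the generic sampler just loops over the K full conditionals in turn. The Python skeleton below is a hedged sketch: the full-conditional sampling functions are model-specific placeholders you would supply, with the data captured in their closures:

```python
import numpy as np

def gibbs(theta0, full_conditionals, n_iter, burn_in):
    # Generic Gibbs sampler skeleton.
    #   theta0            : list of K starting values theta(1)
    #   full_conditionals : list of K functions; the i-th draws theta_i
    #                       given the current values of all other components
    theta = list(theta0)
    draws = []
    for s in range(n_iter):
        for i, draw_i in enumerate(full_conditionals):
            theta[i] = draw_i(theta)    # always uses the freshest values
        if s >= burn_in:                # discard the burn-in phase
            draws.append(list(theta))
    return np.array(draws)
```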
