
Lecture Notes 14

36-705

We continue with our discussion of decision theory.

1 Decision Theory
Suppose we want to estimate a parameter $\theta$ using data $X^n = (X_1, \ldots, X_n)$. What is the best possible estimator $\hat\theta = \hat\theta(X_1, \ldots, X_n)$ of $\theta$? Decision theory provides a framework for answering this question.

1.1 The Risk Function

Let $\hat\theta = \hat\theta(X^n)$ be an estimator for the parameter $\theta \in \Theta$. We start with a loss function $L(\theta, \hat\theta)$ that measures how good the estimator is. For example:

$$\begin{aligned}
L(\theta, \hat\theta) &= (\theta - \hat\theta)^2 && \text{squared error loss,}\\
L(\theta, \hat\theta) &= |\theta - \hat\theta| && \text{absolute error loss,}\\
L(\theta, \hat\theta) &= |\theta - \hat\theta|^p && L_p \text{ loss,}\\
L(\theta, \hat\theta) &= \begin{cases} 0 & \text{if } \theta = \hat\theta\\ 1 & \text{if } \theta \neq \hat\theta \end{cases} && \text{zero--one loss,}\\
L(\theta, \hat\theta) &= I(|\hat\theta - \theta| > c) && \text{large deviation loss,}\\
L(\theta, \hat\theta) &= \int \log\!\left(\frac{p(x; \theta)}{p(x; \hat\theta)}\right) p(x; \theta)\, dx && \text{Kullback--Leibler loss.}
\end{aligned}$$

If $\theta = (\theta_1, \ldots, \theta_k)$ is a vector then some common loss functions are

$$L(\theta, \hat\theta) = \|\theta - \hat\theta\|^2 = \sum_{j=1}^k (\hat\theta_j - \theta_j)^2,$$

$$L(\theta, \hat\theta) = \|\theta - \hat\theta\|_p = \left( \sum_{j=1}^k |\hat\theta_j - \theta_j|^p \right)^{1/p}.$$

When the problem is to predict $Y \in \{0, 1\}$ based on some classifier $h(x)$, a commonly used loss is
$$L(Y, h(X)) = I(Y \neq h(X)).$$
For real-valued prediction a common loss function is
$$L(Y, \hat{Y}) = (Y - \hat{Y})^2.$$
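For concreteness, here is a small Python sketch of some of these losses (an added illustration, not part of the notes; the Kullback--Leibler loss is specialized to the $N(\theta, 1)$ model, where it reduces to $(\theta - \hat\theta)^2/2$):

```python
import numpy as np

# Sketch implementations of some of the losses above (scalar theta).
def squared_error(theta, theta_hat):
    return (theta - theta_hat) ** 2

def absolute_error(theta, theta_hat):
    return np.abs(theta - theta_hat)

def zero_one(theta, theta_hat):
    return float(theta != theta_hat)

def large_deviation(theta, theta_hat, c):
    # I(|theta_hat - theta| > c)
    return float(np.abs(theta_hat - theta) > c)

def kl_loss_normal(theta, theta_hat):
    # Kullback-Leibler loss specialized to the N(theta, 1) model,
    # where it reduces to (theta - theta_hat)^2 / 2.
    return 0.5 * (theta - theta_hat) ** 2
```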

Figure 1: Comparing two risk functions, $R(\theta, \hat\theta_1)$ and $R(\theta, \hat\theta_2)$. Neither risk function dominates the other at all values of $\theta$.

The risk of an estimator $\hat\theta$ is

$$R(\theta, \hat\theta) = \mathbb{E}_\theta\bigl[ L(\theta, \hat\theta) \bigr] = \int L(\theta, \hat\theta(x_1, \ldots, x_n))\, p(x_1, \ldots, x_n; \theta)\, dx. \tag{1}$$

When the loss function is squared error, the risk is just the MSE (mean squared error):

$$R(\theta, \hat\theta) = \mathbb{E}_\theta (\hat\theta - \theta)^2 = \mathrm{Var}_\theta(\hat\theta) + \mathrm{bias}^2. \tag{2}$$

If we do not state what loss function we are using, assume the loss function is squared error.

1.2 Comparing Risk Functions

To compare two estimators, we compare their risk functions. However, this does not provide
a clear answer as to which estimator is better. Consider the following examples.

Example 1 Let $X \sim N(\theta, 1)$ and assume we are using squared error loss. Consider two estimators: $\hat\theta_1 = X$ and $\hat\theta_2 = 3$. The risk functions are $R(\theta, \hat\theta_1) = \mathbb{E}_\theta(X - \theta)^2 = 1$ and $R(\theta, \hat\theta_2) = \mathbb{E}_\theta(3 - \theta)^2 = (3 - \theta)^2$. If $2 < \theta < 4$ then $R(\theta, \hat\theta_2) < R(\theta, \hat\theta_1)$; otherwise, $R(\theta, \hat\theta_1) < R(\theta, \hat\theta_2)$. Neither estimator uniformly dominates the other; see Figure 1.
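As a quick numerical sanity check (an added sketch, not part of the notes), the two risk functions in Example 1 can be approximated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo_risk(theta, estimator, n_sims=100_000):
    # Approximate R(theta, theta_hat) = E_theta[(theta_hat(X) - theta)^2] for X ~ N(theta, 1).
    x = rng.normal(loc=theta, scale=1.0, size=n_sims)
    return np.mean((estimator(x) - theta) ** 2)

theta_hat_1 = lambda x: x                      # theta_hat_1 = X
theta_hat_2 = lambda x: np.full_like(x, 3.0)   # theta_hat_2 = 3

for theta in [0.0, 2.5, 3.0, 5.0]:
    r1 = monte_carlo_risk(theta, theta_hat_1)
    r2 = monte_carlo_risk(theta, theta_hat_2)
    print(f"theta={theta:3.1f}  R1 ~ {r1:.3f} (exact 1)  R2 ~ {r2:.3f} (exact {(3 - theta) ** 2:.3f})")
```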

Example 2 Let $X_1, \ldots, X_n \sim \mathrm{Bernoulli}(p)$. Consider squared error loss and let $\hat{p}_1 = \overline{X}$. Since this has zero bias, we have that
$$R(p, \hat{p}_1) = \mathrm{Var}(\overline{X}) = \frac{p(1-p)}{n}.$$
Another estimator is
$$\hat{p}_2 = \frac{Y + \alpha}{\alpha + \beta + n}$$
where $Y = \sum_{i=1}^n X_i$ and $\alpha$ and $\beta$ are positive constants (this is the posterior mean using a $\mathrm{Beta}(\alpha, \beta)$ prior). Now,
$$\begin{aligned}
R(p, \hat{p}_2) &= \mathrm{Var}_p(\hat{p}_2) + \bigl(\mathrm{bias}_p(\hat{p}_2)\bigr)^2\\
&= \mathrm{Var}_p\!\left( \frac{Y + \alpha}{\alpha + \beta + n} \right) + \left( \mathbb{E}_p\!\left[ \frac{Y + \alpha}{\alpha + \beta + n} \right] - p \right)^2\\
&= \frac{np(1-p)}{(\alpha + \beta + n)^2} + \left( \frac{np + \alpha}{\alpha + \beta + n} - p \right)^2.
\end{aligned}$$
Let $\alpha = \beta = \sqrt{n/4}$. The resulting estimator is
$$\hat{p}_2 = \frac{Y + \sqrt{n/4}}{n + \sqrt{n}}$$
and the risk function is
$$R(p, \hat{p}_2) = \frac{n}{4(n + \sqrt{n})^2}.$$
The risk functions are plotted in Figure 2. As we can see, neither estimator uniformly dominates the other.
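A quick numerical check (a sketch; $n = 50$ is an arbitrary choice) that with $\alpha = \beta = \sqrt{n/4}$ the risk of $\hat{p}_2$ is constant in $p$ and equal to $n/(4(n+\sqrt{n})^2)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
alpha = beta = np.sqrt(n / 4)

def risk_p2_exact(p):
    # Variance + squared bias of (Y + alpha) / (alpha + beta + n), as derived above.
    var = n * p * (1 - p) / (alpha + beta + n) ** 2
    bias = (n * p + alpha) / (alpha + beta + n) - p
    return var + bias ** 2

def risk_p2_monte_carlo(p, n_sims=200_000):
    y = rng.binomial(n, p, size=n_sims)
    p2 = (y + alpha) / (alpha + beta + n)
    return np.mean((p2 - p) ** 2)

constant = n / (4 * (n + np.sqrt(n)) ** 2)
for p in [0.1, 0.3, 0.5, 0.9]:
    print(f"p={p}: exact={risk_p2_exact(p):.6f}  MC~{risk_p2_monte_carlo(p):.6f}  "
          f"n/(4(n+sqrt(n))^2)={constant:.6f}")
```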

These examples highlight the need to be able to compare risk functions. To do so, we need a
one-number summary of the risk function. Two such summaries are the maximum risk and
the Bayes risk.
The maximum risk is
$$R(\hat\theta) = \sup_{\theta \in \Theta} R(\theta, \hat\theta) \tag{3}$$
and the Bayes risk under prior $\pi$ is
$$B_\pi(\hat\theta) = \int R(\theta, \hat\theta)\, \pi(\theta)\, d\theta. \tag{4}$$

Example 3 Consider again the two estimators in Example 2. We have
$$R(\hat{p}_1) = \max_{0 \le p \le 1} \frac{p(1-p)}{n} = \frac{1}{4n}$$

Figure 2: Risk functions for $\hat{p}_1$ and $\hat{p}_2$ in Example 2. The solid curve is $R(\hat{p}_1)$. The dotted line is $R(\hat{p}_2)$.

and
$$R(\hat{p}_2) = \max_p \frac{n}{4(n + \sqrt{n})^2} = \frac{n}{4(n + \sqrt{n})^2}.$$
Based on maximum risk, $\hat{p}_2$ is a better estimator since $R(\hat{p}_2) < R(\hat{p}_1)$. However, when $n$ is large, $\hat{p}_1$ has smaller risk except for a small region in the parameter space near $p = 1/2$. Thus, many people prefer $\hat{p}_1$ to $\hat{p}_2$. This illustrates that one-number summaries like the maximum risk are imperfect.
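To see this numerically (an added sketch), one can compare the two maximum risks and the fraction of the parameter space on which $\hat{p}_1$ beats $\hat{p}_2$ as $n$ grows:

```python
import numpy as np

for n in [10, 100, 1000, 10000]:
    max_risk_p1 = 1 / (4 * n)                      # max_p p(1-p)/n, attained at p = 1/2
    risk_p2 = n / (4 * (n + np.sqrt(n)) ** 2)      # constant risk of p_hat_2

    p = np.linspace(0, 1, 100_001)
    risk_p1 = p * (1 - p) / n                      # risk function of p_hat_1
    frac_p1_better = np.mean(risk_p1 < risk_p2)    # fraction of [0,1] where p_hat_1 wins

    print(f"n={n:6d}  max risk p1={max_risk_p1:.2e}  max risk p2={risk_p2:.2e}  "
          f"p_hat_1 better on ~{frac_p1_better:.1%} of [0,1]")
```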

These two summaries of the risk function suggest two different methods for devising estimators: choosing $\hat\theta$ to minimize the maximum risk leads to minimax estimators; choosing $\hat\theta$ to minimize the Bayes risk leads to Bayes estimators.

An estimator $\hat\theta$ that minimizes the Bayes risk is called a Bayes estimator. That is,
$$B_\pi(\hat\theta) = \inf_{\tilde\theta} B_\pi(\tilde\theta) \tag{5}$$
where the infimum is over all estimators $\tilde\theta$. An estimator that minimizes the maximum risk is called a minimax estimator. That is,
$$\sup_{\theta} R(\theta, \hat\theta) = \inf_{\tilde\theta} \sup_{\theta} R(\theta, \tilde\theta) \tag{6}$$
where the infimum is over all estimators $\tilde\theta$. We call the right hand side of (6), namely,
$$R_n \equiv R_n(\Theta) = \inf_{\hat\theta} \sup_{\theta \in \Theta} R(\theta, \hat\theta), \tag{7}$$

the minimax risk. Statistical decision theory has two main goals: determine the minimax risk $R_n$ and find an estimator that achieves this risk. Once we have found the minimax risk $R_n$, we want to find the minimax estimator that achieves this risk:
$$\sup_{\theta \in \Theta} R(\theta, \hat\theta) = \inf_{\hat\theta} \sup_{\theta \in \Theta} R(\theta, \hat\theta). \tag{8}$$

1.3 Bayes Estimators

Let $\pi$ be a prior distribution. After observing $X^n = (X_1, \ldots, X_n)$, the posterior distribution is, according to Bayes' theorem,
$$\mathbb{P}(\theta \in A \mid X^n) = \frac{\int_A p(X_1, \ldots, X_n \mid \theta)\, \pi(\theta)\, d\theta}{\int_\Theta p(X_1, \ldots, X_n \mid \theta)\, \pi(\theta)\, d\theta} = \frac{\int_A L(\theta)\, \pi(\theta)\, d\theta}{\int_\Theta L(\theta)\, \pi(\theta)\, d\theta} \tag{9}$$
where $L(\theta) = p(x^n; \theta)$ is the likelihood function. The posterior has density
$$\pi(\theta \mid x^n) = \frac{p(x^n \mid \theta)\, \pi(\theta)}{m(x^n)} \tag{10}$$
where $m(x^n) = \int p(x^n \mid \theta)\, \pi(\theta)\, d\theta$ is the marginal distribution of $X^n$. Define the posterior risk of an estimator $\hat\theta(x^n)$ by
$$r(\hat\theta \mid x^n) = \int L(\theta, \hat\theta(x^n))\, \pi(\theta \mid x^n)\, d\theta. \tag{11}$$

Theorem 4 The Bayes risk $B_\pi(\hat\theta)$ satisfies
$$B_\pi(\hat\theta) = \int r(\hat\theta \mid x^n)\, m(x^n)\, dx^n. \tag{12}$$
Let $\hat\theta(x^n)$ be the value of $\theta$ that minimizes $r(\hat\theta \mid x^n)$. Then $\hat\theta$ is the Bayes estimator.

Proof:
Let $p(x, \theta) = p(x \mid \theta)\, \pi(\theta)$ denote the joint density of $X$ and $\theta$. We can rewrite the Bayes risk as follows:
$$\begin{aligned}
B_\pi(\hat\theta) &= \int R(\theta, \hat\theta)\, \pi(\theta)\, d\theta = \int \left( \int L(\theta, \hat\theta(x^n))\, p(x^n \mid \theta)\, dx^n \right) \pi(\theta)\, d\theta\\
&= \int \int L(\theta, \hat\theta(x^n))\, p(x^n, \theta)\, dx^n\, d\theta = \int \int L(\theta, \hat\theta(x^n))\, \pi(\theta \mid x^n)\, m(x^n)\, dx^n\, d\theta\\
&= \int \left( \int L(\theta, \hat\theta(x^n))\, \pi(\theta \mid x^n)\, d\theta \right) m(x^n)\, dx^n = \int r(\hat\theta \mid x^n)\, m(x^n)\, dx^n.
\end{aligned}$$

If we choose $\hat\theta(x^n)$ to be the value of $\theta$ that minimizes $r(\hat\theta \mid x^n)$ then we will minimize the integrand at every $x$ and thus minimize the integral $\int r(\hat\theta \mid x^n)\, m(x^n)\, dx^n$.
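Theorem 4 can be illustrated on a small discrete toy problem (a sketch; the prior, the Binomial model, and the add-one estimator below are arbitrary choices): the Bayes risk computed by averaging $R(\theta, \hat\theta)$ over the prior equals the average of the posterior risk over the marginal $m(x^n)$.

```python
import numpy as np
from scipy.stats import binom

n = 10
thetas = np.array([0.2, 0.5, 0.8])            # discrete parameter space
prior = np.array([0.3, 0.4, 0.3])             # prior pi(theta)
x_vals = np.arange(n + 1)                     # possible values of Y = sum of Bernoulli draws

def theta_hat(y):                             # an arbitrary estimator: add-one smoothing
    return (y + 1) / (n + 2)

# p(x | theta) for every (theta, x) pair.
px_given_theta = np.array([binom.pmf(x_vals, n, th) for th in thetas])   # shape (3, n+1)
loss = (thetas[:, None] - theta_hat(x_vals)[None, :]) ** 2               # squared error loss

# Left-hand side: B_pi = sum_theta pi(theta) * R(theta, theta_hat).
risk_per_theta = np.sum(loss * px_given_theta, axis=1)
bayes_risk_lhs = np.sum(prior * risk_per_theta)

# Right-hand side: B_pi = sum_x m(x) * r(theta_hat | x).
m_x = np.sum(prior[:, None] * px_given_theta, axis=0)                    # marginal of the data
posterior = prior[:, None] * px_given_theta / m_x[None, :]               # pi(theta | x)
posterior_risk = np.sum(loss * posterior, axis=0)                        # r(theta_hat | x)
bayes_risk_rhs = np.sum(m_x * posterior_risk)

print(bayes_risk_lhs, bayes_risk_rhs)         # the two numbers agree
```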
Now we can find an explicit formula for the Bayes estimator for some specific loss functions.

Theorem 5 If $L(\theta, \hat\theta) = (\theta - \hat\theta)^2$ then the Bayes estimator is
$$\hat\theta(x^n) = \int \theta\, \pi(\theta \mid x^n)\, d\theta = \mathbb{E}(\theta \mid X^n = x^n). \tag{13}$$
If $L(\theta, \hat\theta) = |\theta - \hat\theta|$ then the Bayes estimator is the median of the posterior $\pi(\theta \mid x^n)$. If $L(\theta, \hat\theta)$ is zero--one loss, then the Bayes estimator is the mode of the posterior $\pi(\theta \mid x^n)$.

Proof:
We will prove the theorem for squared error loss. The Bayes estimator $\hat\theta(x^n)$ minimizes $r(\hat\theta \mid x^n) = \int (\theta - \hat\theta(x^n))^2\, \pi(\theta \mid x^n)\, d\theta$. Taking the derivative of $r(\hat\theta \mid x^n)$ with respect to $\hat\theta(x^n)$ and setting it equal to zero yields the equation $2 \int (\theta - \hat\theta(x^n))\, \pi(\theta \mid x^n)\, d\theta = 0$. Solving for $\hat\theta(x^n)$ we get (13).
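The three cases can be checked numerically by discretizing a posterior on a grid and minimizing the posterior risk directly (a sketch; the Bernoulli model with a uniform prior and the data below are arbitrary illustrative choices, and the agreement is up to the grid resolution):

```python
import numpy as np

# Posterior for Bernoulli(theta) with a flat prior after y successes in n trials, on a grid.
theta_grid = np.linspace(0.001, 0.999, 999)
n, y = 20, 6
log_post = y * np.log(theta_grid) + (n - y) * np.log(1 - theta_grid)   # likelihood x flat prior
post = np.exp(log_post - log_post.max())
post /= post.sum()                                                      # normalized on the grid

post_mean = np.sum(theta_grid * post)                                   # squared error loss
post_median = theta_grid[np.searchsorted(np.cumsum(post), 0.5)]         # absolute error loss
post_mode = theta_grid[np.argmax(post)]                                 # zero-one loss

# Direct check: minimize the posterior risk r(theta_hat | x^n) over the grid for each loss.
def argmin_posterior_risk(loss):
    risks = [np.sum(loss(theta_grid, t) * post) for t in theta_grid]
    return theta_grid[int(np.argmin(risks))]

print("mean  :", round(post_mean, 4),   "vs", argmin_posterior_risk(lambda th, t: (th - t) ** 2))
print("median:", round(post_median, 4), "vs", argmin_posterior_risk(lambda th, t: np.abs(th - t)))
print("mode  :", round(post_mode, 4),   "vs", argmin_posterior_risk(lambda th, t: (th != t).astype(float)))
```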

Example 6 Let $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ where $\sigma^2$ is known. Suppose we use a $N(a, b^2)$ prior for $\mu$. The Bayes estimator with respect to squared error loss is the posterior mean, which is
$$\hat\theta(X_1, \ldots, X_n) = \frac{b^2}{b^2 + \frac{\sigma^2}{n}}\, \overline{X} + \frac{\frac{\sigma^2}{n}}{b^2 + \frac{\sigma^2}{n}}\, a. \tag{14}$$
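A quick check of formula (14) against a grid approximation of the posterior mean (a sketch; the values of $\sigma$, $a$, $b$, $n$, and the true mean below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, a, b, n, mu_true = 1.0, 0.0, 2.0, 25, 1.5
x = rng.normal(mu_true, sigma, size=n)
xbar = x.mean()

# Closed-form posterior mean from (14).
w = b**2 / (b**2 + sigma**2 / n)
bayes_closed_form = w * xbar + (1 - w) * a

# The same posterior mean via a grid approximation of the posterior.
mu_grid = np.linspace(-5, 5, 20_001)
log_post = -n * (xbar - mu_grid) ** 2 / (2 * sigma**2) - (mu_grid - a) ** 2 / (2 * b**2)
post = np.exp(log_post - log_post.max())
post /= post.sum()
bayes_grid = np.sum(mu_grid * post)

print(bayes_closed_form, bayes_grid)   # the two values agree closely
```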

It is worth keeping in mind the trade-off: Bayes estimators, although easy to compute, are subjective in that they depend strongly on the prior $\pi$. Minimax estimators, although more challenging to compute, are not subjective, but they have the drawback of protecting against the worst case, which can lead to pessimistic conclusions.

2 Minimax Estimators through Bayes Estimators

Our goal is to compute a minimax estimator $\hat\theta$ that satisfies
$$\sup_{\theta \in \Theta} R(\theta, \hat\theta) \le \inf_{\tilde\theta} \sup_{\theta \in \Theta} R(\theta, \tilde\theta).$$
We will let $\theta_{\mathrm{minimax}}$ denote a minimax estimator.

2.1 Bounding the Minimax Risk

One strategy to find the minimax estimator is by finding (upper and lower) bounds on the minimax risk that match. Then the estimator that achieves the upper bound is a minimax estimator.

Upper bounding the minimax risk is straightforward. Given an estimator $\hat\theta_{\mathrm{up}}$ we can compute its maximum risk and use it to upper bound the minimax risk, i.e.
$$\inf_{\tilde\theta} \sup_{\theta \in \Theta} R(\theta, \tilde\theta) \le \sup_{\theta \in \Theta} R(\theta, \hat\theta_{\mathrm{up}}).$$
The Bayes risk of the Bayes estimator for any prior $\pi$ lower bounds the minimax risk. Fix a prior $\pi$ and suppose that $\hat\theta_{\mathrm{low}}$ is the Bayes estimator with respect to $\pi$. Then we have
$$B_\pi(\hat\theta_{\mathrm{low}}) \le B_\pi(\theta_{\mathrm{minimax}}) \le \sup_{\theta} R(\theta, \theta_{\mathrm{minimax}}) = \inf_{\tilde\theta} \sup_{\theta \in \Theta} R(\theta, \tilde\theta).$$

Let us see an example of this in action.


Example: We will prove a classical result: if we observe independent draws from a $d$-dimensional Gaussian, $X_1, \ldots, X_n \sim N(\theta, I_d)$, then the average
$$\hat\theta = \frac{1}{n} \sum_{i=1}^n X_i$$
is a minimax estimator of $\theta$ with respect to the squared loss.

Let $R_n$ denote the minimax risk. First, let us compute the upper bound on $R_n$. We note that
$$\hat\theta \sim N(\theta, I_d/n),$$
so that its risk is
$$R(\theta, \hat\theta) = \mathbb{E}\left[ \sum_{i=1}^d (\hat\theta_i - \theta_i)^2 \right] = \mathbb{E}\left[ \sum_{i=1}^d Z_i^2 \right],$$
where $Z_i \sim N(0, 1/n)$. This yields
$$\inf_{\tilde\theta} \sup_{\theta \in \Theta} R(\theta, \tilde\theta) \le R(\theta, \hat\theta) = \frac{d}{n}.$$
Now we lower bound the minimax risk using the Bayes risk. Let us take the prior to be a zero-mean Gaussian, i.e. we take $\pi = N(0, c^2 I_d)$. By sufficiency, we can replace the data with $\hat\theta$. We can write:
$$\theta \sim N(0, c^2 I_d), \qquad \hat\theta \mid \theta \sim N(\theta, I_d/n).$$

We can write this as
$$\theta = c\,\varepsilon, \qquad \hat\theta = \theta + \frac{1}{\sqrt{n}}\, Z,$$
where $\varepsilon, Z \sim N(0, I_d)$ are independent. Hence,
$$\begin{pmatrix} \theta \\ \hat\theta \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} c^2 I_d & c^2 I_d \\ c^2 I_d & (c^2 + 1/n) I_d \end{pmatrix} \right).$$
We can now compute the posterior (using standard conditional Gaussian formulae), and obtain its mean:
$$\mathbb{E}[\theta \mid \hat\theta] = \frac{c^2}{c^2 + 1/n}\, \hat\theta.$$
Now, the risk of this Bayes estimator is
$$R\!\left(\theta, \frac{c^2}{c^2 + 1/n}\, \hat\theta\right) = \mathbb{E}\left\| \frac{c^2}{c^2 + 1/n}\, \hat\theta - \theta \right\|^2.$$
Write $\hat\theta = \theta + W$, where $W \sim N(0, I_d/n)$. Then
$$R\!\left(\theta, \frac{c^2}{c^2 + 1/n}\, \hat\theta\right) = \mathbb{E}_W \left\| \frac{c^2}{c^2 + 1/n}\, W - \frac{\theta}{n(c^2 + 1/n)} \right\|^2.$$
Let us denote $\beta := c^2 + 1/n$. Then we obtain
$$R\!\left(\theta, \frac{c^2}{\beta}\, \hat\theta\right) = \frac{\|\theta\|_2^2}{n^2 \beta^2} + \frac{c^4}{\beta^2}\, \mathbb{E}\|W\|_2^2 = \frac{\|\theta\|_2^2}{n^2 \beta^2} + \frac{c^4 d}{\beta^2 n}.$$
The Bayes risk further averages this over $\theta \sim N(0, c^2 I_d)$ to obtain
$$B_\pi\!\left( \frac{c^2}{c^2 + 1/n}\, \hat\theta \right) = \frac{c^2 d}{n^2 \beta^2} + \frac{c^4 d}{\beta^2 n} = \frac{c^2 d}{n \beta} = \frac{d}{n(1 + 1/(n c^2))}.$$
We conclude that
$$\frac{d}{n(1 + 1/(n c^2))} \le R_n \le \frac{d}{n}.$$
This is true for every $c > 0$. Since $c$ was arbitrary, we can take the limit as $c \to \infty$ to conclude that the minimax risk is both upper and lower bounded by $d/n$. Hence $R_n = d/n$ and the sample average $\hat\theta$ is minimax.
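The conclusion can be checked by simulation (a sketch; $d$, $n$, $\theta$, and the prior scale $c$ below are arbitrary choices): the Monte Carlo risk of the sample mean matches $d/n$, and the Bayes lower bound sits just below it.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n, n_sims = 5, 20, 50_000
theta = rng.normal(size=d)            # any fixed theta; the risk of the sample mean does not depend on it

x = rng.normal(loc=theta, scale=1.0, size=(n_sims, n, d))
theta_hat = x.mean(axis=1)            # sample mean for each simulated data set
risk_mc = np.mean(np.sum((theta_hat - theta) ** 2, axis=1))

c = 10.0                              # prior scale used in the lower bound
lower = d / (n * (1 + 1 / (n * c**2)))
print(f"Monte Carlo risk ~ {risk_mc:.4f},  d/n = {d / n:.4f},  Bayes lower bound (c={c}) = {lower:.4f}")
```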

2.2 Least Favorable Prior

The other way to obtain minimax estimators is by constructing what are called least favorable priors.

Theorem 7 Let $\hat\theta$ be the Bayes estimator for some prior $\pi$. If
$$R(\theta, \hat\theta) \le B_\pi(\hat\theta) \quad \text{for all } \theta \tag{15}$$
then $\hat\theta$ is minimax and $\pi$ is called a least favorable prior.

Proof:
Suppose that $\hat\theta$ is not minimax. Then there is another estimator $\hat\theta_0$ such that $\sup_\theta R(\theta, \hat\theta_0) < \sup_\theta R(\theta, \hat\theta)$. Since the average of a function is always less than or equal to its maximum, we have that $B_\pi(\hat\theta_0) \le \sup_\theta R(\theta, \hat\theta_0)$. Hence,
$$B_\pi(\hat\theta_0) \le \sup_\theta R(\theta, \hat\theta_0) < \sup_\theta R(\theta, \hat\theta) \le B_\pi(\hat\theta), \tag{16}$$
so $B_\pi(\hat\theta_0) < B_\pi(\hat\theta)$, which contradicts the fact that $\hat\theta$ is the Bayes estimator (it minimizes the Bayes risk).

Theorem 8 Suppose that $\hat\theta$ is the Bayes estimator with respect to some prior $\pi$. If the risk is constant then $\hat\theta$ is minimax.

Proof:
The Bayes risk is $B_\pi(\hat\theta) = \int R(\theta, \hat\theta)\, \pi(\theta)\, d\theta = c$ and hence $R(\theta, \hat\theta) \le B_\pi(\hat\theta)$ for all $\theta$. Now apply the previous theorem.

Example 9 Consider the Bernoulli model with squared error loss. We showed previously that the estimator
$$\hat{p} = \frac{\sum_{i=1}^n X_i + \sqrt{n/4}}{n + \sqrt{n}}$$
has a constant risk function. This estimator is the posterior mean, and hence the Bayes estimator, for the prior $\mathrm{Beta}(\alpha, \beta)$ with $\alpha = \beta = \sqrt{n/4}$. Hence, by the previous theorem, this estimator is minimax.
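As a small check (a sketch; $n = 100$ is an arbitrary choice), the estimator above coincides exactly with the posterior mean under a $\mathrm{Beta}(\sqrt{n/4}, \sqrt{n/4})$ prior:

```python
import numpy as np

n = 100
alpha = beta = np.sqrt(n / 4)
y = np.arange(n + 1)                               # all possible values of Y = sum_i X_i

p_hat = (y + np.sqrt(n / 4)) / (n + np.sqrt(n))
# The posterior under a Beta(alpha, beta) prior is Beta(alpha + y, beta + n - y),
# whose mean is (alpha + y) / (alpha + beta + n).
posterior_mean = (alpha + y) / (alpha + beta + n)

print("max |p_hat - posterior mean| =", np.max(np.abs(p_hat - posterior_mean)))   # ~ 0
```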
