
STA732

Statistical Inference
Lecture 06: Information Inequality

Yuansi Chen
Spring 2023
Duke University

https://www2.stat.duke.edu/courses/Spring23/sta732.01/

Recap from Lecture 05

• Convex loss and Jensen’s inequality
• The Rao-Blackwell theorem allows us to improve an estimator using sufficient statistics
• The UMVU estimator exists and is unique when the estimand is U-estimable and a complete sufficient statistic exists

Goal of Lecture 06

1. Second thoughts about bias
2. Log-likelihood, score and Fisher information
3. Cramér-Rao lower bound
4. Hammersley-Chapman-Robbins inequality

Chap. 4.2, 4.5-4.6 in Keener or Chap. 2.5 in Lehmann and Casella

Second thoughts about bias
Admissibility

Def. Admissibility
An estimator 𝛿 is called inadmissible if there exists 𝛿∗ which has a better risk:
𝑅(𝜃, 𝛿∗) ≤ 𝑅(𝜃, 𝛿) for all 𝜃 ∈ Ω, with 𝑅(𝜃₁, 𝛿∗) < 𝑅(𝜃₁, 𝛿) for some 𝜃₁ ∈ Ω.

We also say that 𝛿∗ dominates 𝛿. An estimator is admissible if it is not dominated by any estimator.

Uniform distribution example from last lecture

𝑋₁, …, 𝑋ₙ are i.i.d. from the uniform distribution on (0, 𝜃).
𝑇 = max{𝑋₁, …, 𝑋ₙ} is complete sufficient.

• We have derived that ((𝑛 + 1)/𝑛) 𝑇 is UMVU for estimating 𝜃.
• Among estimators that are multiples of 𝑇, is the UMVU estimator admissible? (see the sketch below)
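A minimal Monte Carlo sketch of the answer (my construction, assuming squared-error loss, which the slide does not fix): a direct calculation gives 𝔼𝜃[(𝑐𝑇 − 𝜃)²] = 𝜃²(𝑐²𝑛/(𝑛+2) − 2𝑐𝑛/(𝑛+1) + 1), minimized at 𝑐 = (𝑛+2)/(𝑛+1) rather than at the unbiased choice 𝑐 = (𝑛+1)/𝑛.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta, reps = 10, 1.0, 500_000

# T = max(X_1, ..., X_n) for X_i i.i.d. Uniform(0, theta)
T = rng.uniform(0.0, theta, size=(reps, n)).max(axis=1)

def risk(c):
    """Monte Carlo squared-error risk of the estimator c * T."""
    return np.mean((c * T - theta) ** 2)

c_umvu = (n + 1) / n        # unbiased multiple: the UMVU estimator
c_best = (n + 2) / (n + 1)  # multiple minimizing the squared-error risk

print(risk(c_umvu), theta**2 / (n * (n + 2)))  # ~0.00833, exact 1/(n(n+2))
print(risk(c_best), theta**2 / (n + 1) ** 2)   # ~0.00826, exact 1/(n+1)^2
```

Both risks scale with 𝜃², so the biased multiple dominates at every 𝜃: under squared-error loss, the UMVU estimator is inadmissible among multiples of 𝑇.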

Gaussian sequence model example

𝑋ᵢ ∼ 𝒩(𝜇ᵢ, 1), 𝑖 = 1, …, 𝑛, independent. Want to estimate ‖𝜇‖₂², where 𝜇 = (𝜇₁, …, 𝜇ₙ)⊤.

• Find a UMVU estimator: ‖𝑋‖₂² − 𝑛
• Can we find a better estimator (if 𝜇 = 0)? (see the sketch below)
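A minimal sketch at 𝜇 = 0 under squared-error loss (my choice of comparison: truncating the UMVU estimator at zero, using the fact that ‖𝜇‖₂² ≥ 0):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 500_000

# mu = 0, so the estimand ||mu||_2^2 equals 0
X = rng.normal(0.0, 1.0, size=(reps, n))
umvu = (X**2).sum(axis=1) - n     # UMVU estimator ||X||_2^2 - n
trunc = np.maximum(umvu, 0.0)     # use the constraint ||mu||_2^2 >= 0

print(np.mean(umvu**2))   # MSE of the UMVU estimator at mu = 0, about 2n
print(np.mean(trunc**2))  # strictly smaller: truncation improves at mu = 0
```

The truncated estimator is biased, yet it has smaller mean squared error near 𝜇 = 0.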

Thoughts about unbiased estimators

• A UMVU estimator is not necessarily admissible!
• It might even be absurd (Ex 4.7 in Keener)
• It is a good estimator to start with, but in general we shall not insist on UMVU

Log-likelihood, score and Fisher information
Log-likelihood

Suppose 𝑋 has distribution from a family P = {𝑃𝜃, 𝜃 ∈ Ω}. Assume each distribution has density 𝑝𝜃 and they share the common support {𝑥 ∣ 𝑝𝜃(𝑥) > 0}. The log-likelihood is

ℓ(𝜃; 𝑋) = log 𝑝𝜃(𝑋)

Score

Def. Score
The score is defined as the gradient of the log-likelihood with
respect to the parameter vector

∇ℓ(𝜃; 𝑋)

Remark
• can treat it as a “local sufficient statistic”: for 𝜉 ≈ 0,

𝑝𝜃₀+𝜉(𝑥) = exp ℓ(𝜃₀ + 𝜉; 𝑥) ≈ exp[𝜉⊤∇ℓ(𝜃₀; 𝑥)] ⋅ 𝑝𝜃₀(𝑥)

• indicates the sensitivity to infinitesimal changes in 𝜃.

Expected value of score is zero

Under enough regularity conditions, we have

𝔼𝜃 [∇ℓ(𝜃; 𝑋)] = 0

Proof:

1 = ∫ exp ℓ(𝜃; 𝑥) 𝑑𝜇(𝑥)

Differentiating in 𝜃ⱼ (under regularity conditions allowing differentiation under the integral sign) implies

0 = ∫ (𝜕/𝜕𝜃ⱼ) ℓ(𝜃; 𝑥) ⋅ exp ℓ(𝜃; 𝑥) 𝑑𝜇(𝑥) = 𝔼𝜃[(𝜕/𝜕𝜃ⱼ) ℓ(𝜃; 𝑋)],

so each coordinate of 𝔼𝜃[∇ℓ(𝜃; 𝑋)] is zero.
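A quick numerical check of this identity, using the Poisson(𝜃) family as an illustration (my choice, not from the slides): there ℓ(𝜃; 𝑥) = 𝑥 log 𝜃 − 𝜃 − log 𝑥!, so the score is 𝑥/𝜃 − 1.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, reps = 3.0, 1_000_000

# Poisson(theta): score = d/dtheta [x*log(theta) - theta - log(x!)] = x/theta - 1
x = rng.poisson(theta, size=reps)
score = x / theta - 1.0

print(score.mean())  # ~ 0, consistent with E_theta[score] = 0
```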

Fisher information

Def. Fisher information
For 𝜃 taking values in ℝˢ, the Fisher information is an 𝑠 × 𝑠 matrix

𝐼(𝜃) = Cov𝜃(∇ℓ(𝜃; 𝑋)) = 𝔼𝜃[−∇²ℓ(𝜃; 𝑋)]

Why are the two definitions equivalent? (see the numerical check below)
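A numerical check of the equivalence in the same illustrative Poisson(𝜃) family (again my choice): there −ℓ″(𝜃; 𝑥) = 𝑥/𝜃², and both expressions should equal 1/𝜃.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, reps = 3.0, 1_000_000

x = rng.poisson(theta, size=reps)
score = x / theta - 1.0       # first derivative of the log-likelihood
neg_hess = x / theta**2       # minus its second derivative

print(score.var())            # ~ 1/theta = 0.333...
print(neg_hess.mean())        # ~ 1/theta: the two definitions agree
```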

Cramér-Rao lower bound
Cramér-Rao lower bound in 1-dimension case

Consider an estimator 𝛿(𝑋) which is unbiased for 𝑔(𝜃). Then

𝑔(𝜃) = 𝔼𝜃[𝛿]

Under enough regularity,

𝑔′(𝜃) = ∫ 𝛿(𝑥) ℓ′(𝜃; 𝑥) exp ℓ(𝜃; 𝑥) 𝑑𝜇(𝑥) = 𝔼𝜃[𝛿ℓ′]

Thm 4.9 in Keener
Let P = {𝑃𝜃 ∶ 𝜃 ∈ Ω} be a dominated family with differentiable densities 𝑝𝜃. Under enough regularity conditions (𝔼𝜃[ℓ′] = 0, 𝔼𝜃[𝛿²] < ∞, 𝑔′ well defined), we have

Var𝜃(𝛿) ≥ [𝑔′(𝜃)]² / 𝐼(𝜃), 𝜃 ∈ Ω,

called the Cramér-Rao lower bound or information lower bound.
Proof idea: apply the Cauchy–Schwarz inequality, [Cov𝜃(𝛿, ℓ′)]² ≤ Var𝜃(𝛿) Var𝜃(ℓ′), together with Cov𝜃(𝛿, ℓ′) = 𝔼𝜃[𝛿ℓ′] = 𝑔′(𝜃) and Var𝜃(ℓ′) = 𝐼(𝜃).
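A sketch verifying the bound in the simplest regular case (my example: 𝑋₁, …, 𝑋ₙ i.i.d. 𝒩(𝜃, 1) with 𝑔(𝜃) = 𝜃, so 𝐼(𝜃) = 𝑛 and the bound is 1/𝑛):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 5, 500_000

X = rng.normal(theta, 1.0, size=(reps, n))
delta = X.mean(axis=1)   # unbiased for theta

print(delta.var())       # ~ 1/n = 0.2: the sample mean attains the CRLB here
```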

Cramér-Rao lower bound in high dimension

For 𝜃 ∈ ℝˢ and 𝛿(𝑋) unbiased for 𝑔(𝜃), we have

Var𝜃(𝛿) ≥ ∇𝑔(𝜃)⊤𝐼(𝜃)⁻¹∇𝑔(𝜃)
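One way to see this (a sketch following the one-dimensional argument): apply Cauchy–Schwarz with 𝜓 = 𝑎⊤∇ℓ(𝜃; 𝑋) for 𝑎 ∈ ℝˢ. Since Cov𝜃(𝛿, ∇ℓ(𝜃; 𝑋)) = ∇𝑔(𝜃) under regularity, this gives

Var𝜃(𝛿) ≥ (𝑎⊤∇𝑔(𝜃))² / (𝑎⊤𝐼(𝜃)𝑎),

and maximizing over 𝑎, attained at 𝑎 = 𝐼(𝜃)⁻¹∇𝑔(𝜃), yields ∇𝑔(𝜃)⊤𝐼(𝜃)⁻¹∇𝑔(𝜃).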

Interpretation of the Cramér-Rao lower bound

• To estimate 𝑔(𝜃), no unbiased estimator can have smaller variance than ∇𝑔(𝜃)⊤𝐼(𝜃)⁻¹∇𝑔(𝜃)
• For an unbiased estimator 𝛿, we always have a lower bound of this form for any random variable 𝜓:

Var𝜃(𝛿) ≥ [Cov𝜃(𝛿, 𝜓)]² / Var𝜃(𝜓)

What is a good 𝜓?

Example: Cramér-Rao lower bound for i.i.d. samples

Suppose 𝑋₁, …, 𝑋ₙ are i.i.d. ∼ 𝑝𝜃⁽¹⁾, 𝜃 ∈ Ω. The joint density is

𝑝𝜃(𝑥) = ∏ⁿᵢ₌₁ 𝑝𝜃⁽¹⁾(𝑥ᵢ)

What is the relationship between the Fisher information for 𝑛 i.i.d. observations and that for a single observation? (see the note below)
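One way to see the answer: the log-likelihood adds, ℓ(𝜃; 𝑥) = ∑ⁿᵢ₌₁ ℓ⁽¹⁾(𝜃; 𝑥ᵢ), so the score is a sum of 𝑛 i.i.d. mean-zero terms and

𝐼ₙ(𝜃) = Cov𝜃(∑ᵢ ∇ℓ⁽¹⁾(𝜃; 𝑋ᵢ)) = 𝑛 𝐼₁(𝜃):

Fisher information is additive across independent observations.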

Efficiency

The CRLB is not always attainable.

Def. Efficiency
The efficiency of an unbiased estimator 𝛿 is

eff𝜃(𝛿) = CRLB / Var𝜃(𝛿)

Remark
• According to the definition and the Cramér-Rao lower bound, for “regular” unbiased estimators, eff𝜃(𝛿) ≤ 1
• Efficiency 1 is rarely achieved in finite samples, but usually we can approach it asymptotically as 𝑛 → ∞ (see the example below)
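An illustrative example of efficiency strictly below 1 (my example, assuming 𝑋₁, …, 𝑋ₙ i.i.d. 𝒩(𝜇, 𝜎²) with both parameters unknown, estimating 𝜎²): the unbiased sample variance 𝑆² has Var(𝑆²) = 2𝜎⁴/(𝑛 − 1) while the CRLB is 2𝜎⁴/𝑛, so eff(𝑆²) = (𝑛 − 1)/𝑛 → 1.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, reps = 0.0, 2.0, 10, 500_000

X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
S2 = X.var(axis=1, ddof=1)     # unbiased sample variance

crlb = 2 * sigma2**2 / n       # information bound for estimating sigma^2
print(crlb / S2.var())         # efficiency ~ (n - 1)/n = 0.9
```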

Hammersley-Chapman-Robbins Inequality
Motivation behind Hammersley-Chapman-Robbins Inequality

The Cramér-Rao lower bound requires differentiation under the integral sign, and hence regularity conditions under which this differentiation is well defined.

We can get a more general statement if we replace ∇ℓ(𝜃; 𝑋) with the corresponding finite difference.

Hammersley-Chapman-Robbins Inequality (1)

Recall that by Cauchy–Schwarz, for an unbiased estimator 𝛿, we always have a lower bound of this form for any random variable 𝜓:

Var𝜃(𝛿) ≥ [Cov𝜃(𝛿, 𝜓)]² / Var𝜃(𝜓)

• In the CRLB, we took 𝜓 = ∇ℓ(𝜃; 𝑋)
• Here we take

𝜓 = 𝑝𝜃+𝜖(𝑋)/𝑝𝜃(𝑋) − 1 = exp(ℓ(𝜃 + 𝜖; 𝑋) − ℓ(𝜃; 𝑋)) − 1 ≈ 𝜖⊤∇ℓ(𝜃; 𝑋) for small 𝜖

Hammersley-Chapman-Robbins Inequality (2)

We verify that
• 𝔼𝜃[𝑝𝜃+𝜖(𝑋)/𝑝𝜃(𝑋) − 1] = 0
• Cov𝜃(𝛿(𝑋), 𝑝𝜃+𝜖(𝑋)/𝑝𝜃(𝑋) − 1) = ∫ 𝛿(𝑥) (𝑝𝜃+𝜖(𝑥)/𝑝𝜃(𝑥) − 1) 𝑝𝜃(𝑥) 𝑑𝜇(𝑥) = 𝔼𝜃+𝜖[𝛿] − 𝔼𝜃[𝛿] = 𝑔(𝜃 + 𝜖) − 𝑔(𝜃)

Hence the Hammersley-Chapman-Robbins inequality (HCRI):

Var𝜃(𝛿) ≥ (𝑔(𝜃 + 𝜖) − 𝑔(𝜃))² / 𝔼𝜃[(𝑝𝜃+𝜖(𝑋)/𝑝𝜃(𝑋) − 1)²]

The CRLB follows from taking 𝜖 → 0, but taking the sup over 𝜖 can give better bounds (see the sketch below).
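A small sketch for 𝑋 ∼ 𝒩(𝜃, 1) and 𝑔(𝜃) = 𝜃 (my example; here the denominator has the closed form 𝔼𝜃[(𝑝𝜃+𝜖/𝑝𝜃 − 1)²] = 𝑒^(𝜖²) − 1):

```python
import numpy as np

# X ~ N(theta, 1): E_theta[(p_{theta+eps}(X)/p_theta(X) - 1)^2] = exp(eps^2) - 1,
# so the HCR bound for estimating theta is eps^2 / (exp(eps^2) - 1)
eps = np.array([2.0, 1.0, 0.5, 0.1, 0.01])
hcr = eps**2 / np.expm1(eps**2)

print(hcr)  # increases toward the CRLB 1/I(theta) = 1 as eps -> 0
```

In this regular family the sup over 𝜖 is approached as 𝜖 → 0, recovering the CRLB; the HCR bound is most useful in nonregular families (e.g. the uniform (0, 𝜃) example above, whose density is not differentiable in 𝜃 on the whole support), where the CRLB assumptions fail.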

Example 1: exponential family

What is the Cramér-Rao lower bound for the exponential family?
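A sketch of the standard computation (assuming a full exponential family in natural form, 𝑝𝜂(𝑥) = exp(𝜂⊤𝑇(𝑥) − 𝐴(𝜂))ℎ(𝑥)): then

∇ℓ(𝜂; 𝑋) = 𝑇(𝑋) − ∇𝐴(𝜂), 𝐼(𝜂) = ∇²𝐴(𝜂) = Cov𝜂(𝑇(𝑋)).

For the estimand 𝑔(𝜂) = 𝔼𝜂[𝑇(𝑋)] = ∇𝐴(𝜂), the unbiased estimator 𝛿 = 𝑇(𝑋) has covariance ∇²𝐴(𝜂), which matches the information bound exactly, so in this case the CRLB is attained.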

Example 2: curved exponential family

What is the Cramér-Rao lower bound for the curved exponential family

𝑝𝜃(𝑥) = exp(𝜂(𝜃)⊤𝑇(𝑥) − 𝐵(𝜃))ℎ(𝑥), 𝜃 ∈ ℝ, 𝑇(𝑥) ∈ ℝˢ ?
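A sketch (assuming 𝜂 differentiable and writing 𝐵(𝜃) = 𝐴(𝜂(𝜃)) with 𝐴 the log-partition function of the full family): by the chain rule, 𝐵′(𝜃) = 𝜂′(𝜃)⊤∇𝐴(𝜂(𝜃)) = 𝜂′(𝜃)⊤𝔼𝜃[𝑇(𝑋)], so

ℓ′(𝜃; 𝑥) = 𝜂′(𝜃)⊤(𝑇(𝑥) − 𝔼𝜃[𝑇(𝑋)]), 𝐼(𝜃) = 𝜂′(𝜃)⊤ Cov𝜃(𝑇(𝑋)) 𝜂′(𝜃),

and for a real estimand 𝑔(𝜃) the bound reads Var𝜃(𝛿) ≥ [𝑔′(𝜃)]² / (𝜂′(𝜃)⊤ Cov𝜃(𝑇(𝑋)) 𝜂′(𝜃)).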

Summary

• Restricting to unbiased estimators gives a nice theory: UMVU theory. But the UMVU estimator is not always admissible in terms of total risk
• Score and Fisher information
• Cramér-Rao lower bound and its variant, the Hammersley-Chapman-Robbins inequality

What is next?

• Equivariance

Thank you

