STA732
Statistical Inference
Lecture 06: Information Inequality
Yuansi Chen
Spring 2023
Duke University
https://www2.stat.duke.edu/courses/Spring23/sta732.01/
Recap from Lecture 05
• Convex loss and Jensen’s inequality
• Rao-Blackwell Theorem allows us to improve an estimator
using sufficient statistics
• UMVU exists and is unique when the estimand is U-estimable
and complete sufficient statistics exist
Goal of Lecture 06
1. Second thoughts about bias
2. Log-likelihood, score and Fisher information
3. Cramér-Rao lower bound
4. Hammersley-Chapman-Robbins inequality
Chap. 4.2, 4.5-4.6 in Keener or Chap. 2.5 in Lehmann and Casella
Second thoughts about bias
Admissibility
Def. Admissible
An estimator 𝛿 is called inadmissible if there exists 𝛿∗ which has a better risk:
𝑅(𝜃, 𝛿∗) ≤ 𝑅(𝜃, 𝛿) for all 𝜃 ∈ Ω, with 𝑅(𝜃1, 𝛿∗) < 𝑅(𝜃1, 𝛿) for some 𝜃1 ∈ Ω.
We also say that 𝛿∗ dominates 𝛿.
Uniform distribution example from last lecture
𝑋1 , … , 𝑋𝑛 are i.i.d. from the uniform distribution on (0, 𝜃).
𝑇 = max {𝑋1 , … , 𝑋𝑛 } is complete sufficient.
• We have derived that ((𝑛 + 1)/𝑛) 𝑇 is UMVU for estimating 𝜃.
• Among estimators that are multiples of 𝑇, is the UMVU estimator admissible?
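A minimal Monte Carlo sketch (my own illustration, not from the slides): among estimators 𝑐𝑇, a standard moment calculation gives the MSE-minimizing constant 𝑐 = (𝑛 + 2)/(𝑛 + 1), which differs from the UMVU choice (𝑛 + 1)/𝑛; simulating both suggests the answer to the question above.

import numpy as np

rng = np.random.default_rng(0)
n, theta, reps = 10, 1.0, 200_000
T = rng.uniform(0, theta, size=(reps, n)).max(axis=1)   # complete sufficient statistic

# same draws of T for both constants, so the comparison is paired
for c, label in [((n + 1) / n, "UMVU constant (n+1)/n"),
                 ((n + 2) / (n + 1), "MSE-optimal constant (n+2)/(n+1)")]:
    mse = np.mean((c * T - theta) ** 2)
    print(f"{label}: estimated MSE = {mse:.6f}")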
Gaussian sequence model example
𝑋𝑖 ∼ 𝒩(𝜇𝑖, 1), 𝑖 = 1, …, 𝑛, independent. Want to estimate ‖𝜇‖₂², where 𝜇 = (𝜇1, …, 𝜇𝑛)⊤.
• Find a UMVU estimator: ‖𝑋‖₂² − 𝑛
• Can we find a better estimator (if 𝜇 = 0)?
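A minimal Monte Carlo sketch (my own illustration): at 𝜇 = 0 the UMVU estimator is negative with substantial probability even though ‖𝜇‖₂² ≥ 0, and simply truncating it at zero reduces the squared error at this point.

import numpy as np

rng = np.random.default_rng(1)
n, reps = 10, 200_000
X = rng.normal(0.0, 1.0, size=(reps, n))        # mu = 0, so the true value of ||mu||_2^2 is 0
umvu = (X ** 2).sum(axis=1) - n                 # UMVU estimator ||X||_2^2 - n
trunc = np.maximum(umvu, 0.0)                   # biased competitor: truncate at zero

print("P(UMVU estimate < 0):", np.mean(umvu < 0))
print("MSE of UMVU at mu=0 :", np.mean(umvu ** 2))
print("MSE of truncation   :", np.mean(trunc ** 2))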
Thoughts about unbiased estimators
• A UMVU estimator is not necessarily admissible!
• It might even be absurd (Ex 4.7 in Keener)
• It is a good estimator to start with, but in general we shall not
insist on UMVU
Log-likelihood, score and Fisher
information
Log-likelihood
Suppose 𝑋 has distribution from a family P = {𝑃𝜃 , 𝜃 ∈ Ω}.
Assume each 𝑃𝜃 has density 𝑝𝜃 with respect to a common measure 𝜇, and that all the densities share the common support {𝑥 ∣ 𝑝𝜃(𝑥) > 0}. The log-likelihood is
ℓ(𝜃; 𝑋) = log 𝑝𝜃 (𝑋)
Score
Def. Score
The score is defined as the gradient of the log-likelihood with
respect to the parameter vector
∇ℓ(𝜃; 𝑋)
Remark
• can treat it as a “local sufficient statistic”: for 𝜉 ≈ 0,
𝑝𝜃0+𝜉(𝑥) = exp ℓ(𝜃0 + 𝜉; 𝑥) ≈ exp[𝜉⊤ ∇ℓ(𝜃0; 𝑥)] ⋅ 𝑝𝜃0(𝑥)
• indicates the sensitivity of the model to infinitesimal changes in 𝜃.
Expected value of score is zero
Under enough regularity conditions, we have
𝔼𝜃 [∇ℓ(𝜃; 𝑋)] = 0
Proof:
1 = ∫ exp ℓ(𝜃; 𝑥)𝑑𝜇(𝑥)
Differentiating with respect to 𝜃𝑗 (under regularity conditions) gives
0 = ∫ 𝜕ℓ(𝜃; 𝑥)/𝜕𝜃𝑗 ⋅ exp ℓ(𝜃; 𝑥) 𝑑𝜇(𝑥)
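A quick numerical check of this identity (my own sketch, using an Exponential(𝜃) model with density 𝜃 exp(−𝜃𝑥) as the example):

import numpy as np

rng = np.random.default_rng(2)
theta, reps = 2.0, 1_000_000
X = rng.exponential(scale=1 / theta, size=reps)        # p_theta(x) = theta * exp(-theta * x)
score = 1 / theta - X                                  # d/dtheta of log(theta) - theta * x
print("Monte Carlo mean of the score:", score.mean())  # should be close to 0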
Fisher information
Def. Fisher information
For 𝜃 taking values in ℝ𝑠, the Fisher information is the 𝑠 × 𝑠 matrix
𝐼(𝜃) = Cov𝜃(∇ℓ(𝜃; 𝑋)) = 𝔼𝜃[−∇²ℓ(𝜃; 𝑋)]
why are the two definitions equivalent?
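A numerical check of the equivalence (my own sketch, using a Poisson(𝜃) model): both the variance of the score and the expected negative second derivative should estimate 𝐼(𝜃) = 1/𝜃.

import numpy as np

rng = np.random.default_rng(3)
theta, reps = 3.0, 1_000_000
X = rng.poisson(theta, size=reps)

score = X / theta - 1          # d/dtheta of x*log(theta) - theta - log(x!)
neg_hess = X / theta ** 2      # minus the second derivative
print("Var of the score      :", score.var())
print("E[- second derivative]:", neg_hess.mean())
print("1 / theta             :", 1 / theta)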
Cramér-Rao lower bound
Cramér-Rao lower bound in 1-dimension case
Consider an estimator 𝛿(𝑋) which is unbiased for 𝑔(𝜃). Then
𝑔(𝜃) = 𝔼𝜃 𝛿
Under enough regularity
𝑔′(𝜃) = ∫ 𝛿(𝑥) ℓ′(𝜃; 𝑥) exp ℓ(𝜃; 𝑥) 𝑑𝜇(𝑥) = 𝔼𝜃[𝛿 ℓ′(𝜃; 𝑋)]
Thm 4.9 in Keener
Let P = {𝑃𝜃 ∶ 𝜃 ∈ Ω} be a dominated family with differentiable densities 𝑝𝜃. Under enough regularity conditions (𝔼𝜃 ℓ′ = 0, 𝔼𝜃 𝛿² < ∞, 𝑔′ well defined), we have
Var𝜃(𝛿) ≥ [𝑔′(𝜃)]² / 𝐼(𝜃), for all 𝜃 ∈ Ω
called the Cramér-Rao lower bound or information lower bound
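A numerical illustration of the theorem (my own sketch): for 𝑛 i.i.d. Bernoulli(𝜃) observations the sample mean is unbiased for 𝑔(𝜃) = 𝜃 and its variance equals the bound 𝜃(1 − 𝜃)/𝑛, so the CRLB is attained in this example.

import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 0.3, 25, 200_000
X = rng.binomial(1, theta, size=(reps, n))
delta = X.mean(axis=1)                    # unbiased for g(theta) = theta

crlb = theta * (1 - theta) / n            # [g'(theta)]^2 / I_n(theta), with I_n(theta) = n / (theta (1 - theta))
print("Monte Carlo Var(delta):", delta.var())
print("Cramér-Rao lower bound:", crlb)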
Proof idea: Cauchy-Schwarz inequality
Cramér-Rao lower bound in high dimension
For 𝜃 ∈ ℝ𝑠 and 𝛿 unbiased for 𝑔(𝜃), we have
Var𝜃(𝛿) ≥ ∇𝑔(𝜃)⊤ 𝐼(𝜃)⁻¹ ∇𝑔(𝜃)
Interpretation of the Cramér-Rao lower bound
• To estimate 𝑔(𝜃), no unbiased estimator can have variance smaller than ∇𝑔(𝜃)⊤ 𝐼(𝜃)⁻¹ ∇𝑔(𝜃)
• For an unbiased estimator 𝛿 and any random variable 𝜓, we always have a lower bound of the form
Var𝜃(𝛿) ≥ Cov𝜃(𝛿, 𝜓)² / Var𝜃(𝜓)
What is a good 𝜓?
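A small sketch of why 𝜓 = ∇ℓ(𝜃; 𝑋) is the good choice (my own illustration, for the 𝒩(𝜃, 1) model with 𝛿 the sample mean): the score gives the tight bound 1/𝑛, while an arbitrary mean-zero 𝜓 can give a much weaker one.

import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 0.0, 10, 500_000
X = rng.normal(theta, 1.0, size=(reps, n))
delta = X.mean(axis=1)                          # unbiased for theta

def bound(psi):
    # Cov_theta(delta, psi)^2 / Var_theta(psi)
    return np.cov(delta, psi)[0, 1] ** 2 / psi.var()

score = (X - theta).sum(axis=1)                 # psi = score: yields the CRLB, tight here
other = X[:, 0] - theta                         # another mean-zero psi: a much weaker bound
print("Var(delta)               :", delta.var())   # about 1/n
print("bound with psi = score   :", bound(score))  # about 1/n
print("bound with psi = X1-theta:", bound(other))  # about 1/n^2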
Example: Cramér-Rao lower bound for i.i.d. samples
Suppose 𝑋1, …, 𝑋𝑛 are i.i.d. with single-observation density 𝑝𝜃^{(1)}, 𝜃 ∈ Ω. The joint density is
𝑝𝜃(𝑥) = ∏_{𝑖=1}^{𝑛} 𝑝𝜃^{(1)}(𝑥𝑖)
What is the relationship between Fisher information for 𝑛 i.i.d.
observations and that for a single observation?
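A numerical check of the additivity answer (my own sketch, using i.i.d. Exponential(𝜃) observations): the score of the joint density is the sum of per-observation scores, so its variance is about 𝑛 times that of a single score, i.e. 𝐼𝑛(𝜃) = 𝑛 𝐼1(𝜃).

import numpy as np

rng = np.random.default_rng(6)
theta, n, reps = 2.0, 5, 500_000
X = rng.exponential(scale=1 / theta, size=(reps, n))

score_joint = (1 / theta - X).sum(axis=1)   # score of the joint density: sum of per-observation scores
score_one = 1 / theta - X[:, 0]
print("Var of joint score  :", score_joint.var())    # estimates I_n(theta) = n / theta^2
print("n * Var of one score:", n * score_one.var())  # estimates n * I_1(theta)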
Efficiency
CRLB is not always attainable
Def. efficiency
The efficiency of an unbiased estimator 𝛿 is
eff𝜃(𝛿) = CRLB / Var𝜃(𝛿)
Remark
• According to the definition and the Cramér-Rao lower bound,
for “regular” unbiased estimators, eff𝜃 (𝛿) ≤ 1
• Efficiency 1 is rarely achieved in finite samples, but usually we
can approach it asymptotically as 𝑛 → ∞
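A Monte Carlo sketch of an estimator with efficiency below 1 (my own illustration): for 𝒩(𝜃, 1) data the sample median is unbiased for 𝜃 by symmetry, and its efficiency is close to 2/𝜋 ≈ 0.64 for large 𝑛, while the sample mean has efficiency essentially 1.

import numpy as np

rng = np.random.default_rng(7)
theta, n, reps = 0.0, 101, 200_000
X = rng.normal(theta, 1.0, size=(reps, n))

crlb = 1.0 / n                                 # I_n(theta) = n for the N(theta, 1) model
med = np.median(X, axis=1)
print("efficiency of sample median:", crlb / med.var())             # near 2/pi for large n
print("efficiency of sample mean  :", crlb / X.mean(axis=1).var())  # essentially 1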
Hammersley-Chapman-Robbins
Inequality
Motivation behind Hammersley-Chapman-Robbins Inequality
The Cramér-Rao lower bound requires differentiation under the integral sign, and hence regularity conditions under which this differentiation is well defined.
We can get a more general statement if we replace ∇ℓ(𝜃; 𝑋) with
the corresponding finite difference.
Hammersley-Chapman-Robbins Inequality (1)
Recall that by Cauchy-Schwarz, for an unbiased estimator 𝛿 and any random variable 𝜓, we always have a lower bound of the form
Var𝜃(𝛿) ≥ Cov𝜃(𝛿, 𝜓)² / Var𝜃(𝜓)
• In the CRLB, we took 𝜓 = ∇ℓ(𝜃; 𝑋)
• Here we take
𝜓 = 𝑝𝜃+𝜖(𝑋)/𝑝𝜃(𝑋) − 1 = exp(ℓ(𝜃 + 𝜖; 𝑋) − ℓ(𝜃; 𝑋)) − 1 ≈ 𝜖⊤ ∇ℓ(𝜃; 𝑋) for small 𝜖
Hammersley-Chapman-Robbins Inequality (2)
We verify that
• 𝔼𝜃[𝑝𝜃+𝜖(𝑋)/𝑝𝜃(𝑋) − 1] = 0
• Cov𝜃(𝛿(𝑋), 𝑝𝜃+𝜖(𝑋)/𝑝𝜃(𝑋) − 1) = ∫ 𝛿(𝑥) (𝑝𝜃+𝜖(𝑥)/𝑝𝜃(𝑥) − 1) 𝑝𝜃(𝑥) 𝑑𝜇(𝑥) = 𝔼𝜃+𝜖[𝛿] − 𝔼𝜃[𝛿] = 𝑔(𝜃 + 𝜖) − 𝑔(𝜃)
Hence the Hammersley-Chapman-Robbins inequality (HCRI):
Var𝜃(𝛿) ≥ (𝑔(𝜃 + 𝜖) − 𝑔(𝜃))² / 𝔼𝜃[(𝑝𝜃+𝜖(𝑋)/𝑝𝜃(𝑋) − 1)²]
The CRLB follows by taking 𝜖 → 0, but taking the sup over 𝜖 can give better bounds.
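A numerical sketch of the HCRI (my own illustration, for a single observation from 𝒩(𝜃, 1) and 𝑔(𝜃) = 𝜃): in this model the bound works out to 𝜖²/(exp(𝜖²) − 1), which increases to the CRLB value 1 as 𝜖 → 0.

import numpy as np

rng = np.random.default_rng(8)
theta, reps = 0.0, 2_000_000
X = rng.normal(theta, 1.0, size=reps)

def hcr_bound(eps):
    # likelihood ratio p_{theta+eps}(X) / p_theta(X) for the N(theta, 1) model
    lr = np.exp(eps * (X - theta) - eps ** 2 / 2)
    # (g(theta + eps) - g(theta))^2 / E_theta[(lr - 1)^2] with g(theta) = theta
    return eps ** 2 / np.mean((lr - 1) ** 2)

for eps in [1.0, 0.5, 0.1]:
    print(f"eps = {eps}: HCR bound = {hcr_bound(eps):.4f}")
print("CRLB (eps -> 0 limit):", 1.0)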
Example 1: exponential family
What is the Cramér-Rao lower bound for the exponential family?
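One way to sanity-check the answer numerically (my own illustration with a Poisson(𝜃) family, where 𝑇(𝑥) = 𝑥 and 𝐼(𝜃) = 1/𝜃): the natural sufficient statistic is unbiased for its own mean and attains the bound.

import numpy as np

rng = np.random.default_rng(9)
theta, reps = 4.0, 1_000_000
X = rng.poisson(theta, size=reps)      # one-parameter exponential family with T(x) = x

crlb = theta                           # [g'(theta)]^2 / I(theta) = theta for g(theta) = E_theta[X] = theta
print("Var(X)                :", X.var())
print("Cramér-Rao lower bound:", crlb)   # attained by the natural statistic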
Example 2: curved exponential family
What is the Cramér-Rao lower bound for the curved exponential
family?
𝑝𝜃 (𝑥) = exp(𝜂(𝜃)⊤ 𝑇 (𝑥) − 𝐵(𝜃))ℎ(𝑥), 𝜃 ∈ ℝ, 𝑇 (𝑥) ∈ ℝ𝑠
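A sketch of the corresponding computation in the slide's notation (my own summary; here 𝜂̇ denotes d𝜂/d𝜃, and the last line follows from Thm 4.9):

\begin{align*}
\ell(\theta; x) &= \eta(\theta)^\top T(x) - B(\theta) + \log h(x),
\qquad
\ell'(\theta; x) = \dot\eta(\theta)^\top T(x) - B'(\theta), \\
I(\theta) &= \mathrm{Var}_\theta\big(\ell'(\theta; X)\big)
           = \dot\eta(\theta)^\top \mathrm{Cov}_\theta\big(T(X)\big)\, \dot\eta(\theta), \\
\mathrm{Var}_\theta(\delta) &\ge \frac{[g'(\theta)]^2}{\dot\eta(\theta)^\top \mathrm{Cov}_\theta\big(T(X)\big)\, \dot\eta(\theta)}
\quad \text{for } \delta \text{ unbiased for } g(\theta).
\end{align*}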
Summary
• Restricting attention to unbiased estimators gives a nice theory (UMVU), but a UMVU estimator is not always admissible in terms of total risk
• Score and Fisher information
• Cramér-Rao lower bound and its variant
What is next?
• Equivariance
Thank you