
STA732

Statistical Inference
Lecture 06: Information Inequality

Yuansi Chen
Spring 2023
Duke University

https://www2.stat.duke.edu/courses/Spring23/sta732.01/

Recap from Lecture 05

• Convex loss and Jensen’s inequality
• The Rao-Blackwell theorem allows us to improve an estimator using sufficient statistics
• The UMVU estimator exists and is unique when the estimand is U-estimable and a complete sufficient statistic exists

Goal of Lecture 06

1. Second thoughts about bias
2. Log-likelihood, score and Fisher information
3. Cramér-Rao lower bound
4. Hammersley-Chapman-Robbins inequality

Chap. 4.2, 4.5-4.6 in Keener or Chap. 2.5 in Lehmann and Casella

Second thoughts about bias
Admissibility

Def. Admissibility
An estimator 𝛿 is called inadmissible if there exists 𝛿∗ which has a better risk:
𝑅(𝜃, 𝛿∗) ≤ 𝑅(𝜃, 𝛿) for all 𝜃 ∈ Ω, with 𝑅(𝜃₁, 𝛿∗) < 𝑅(𝜃₁, 𝛿) for some 𝜃₁ ∈ Ω.

We also say that 𝛿∗ dominates 𝛿. An estimator is admissible if it is not dominated by any estimator.

Uniform distribution example from last lecture

𝑋₁, …, 𝑋ₙ are i.i.d. from the uniform distribution on (0, 𝜃).
𝑇 = max{𝑋₁, …, 𝑋ₙ} is complete sufficient.

• We have derived that ((𝑛 + 1)/𝑛) 𝑇 is UMVU for estimating 𝜃.
• Among estimators that are multiples of 𝑇, is the UMVU estimator admissible? (see the sketch below)
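A minimal Monte Carlo sketch of the answer (my construction, assuming squared-error loss, which the slide does not fix): a direct calculation gives 𝔼𝜃[(𝑐𝑇 − 𝜃)²] = 𝜃²(𝑐²𝑛/(𝑛+2) − 2𝑐𝑛/(𝑛+1) + 1), minimized at 𝑐 = (𝑛+2)/(𝑛+1) rather than at the unbiased choice 𝑐 = (𝑛+1)/𝑛.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta, reps = 10, 1.0, 500_000

# T = max(X_1, ..., X_n) for X_i i.i.d. Uniform(0, theta)
T = rng.uniform(0.0, theta, size=(reps, n)).max(axis=1)

def risk(c):
    """Monte Carlo squared-error risk of the estimator c * T."""
    return np.mean((c * T - theta) ** 2)

c_umvu = (n + 1) / n        # unbiased multiple: the UMVU estimator
c_best = (n + 2) / (n + 1)  # multiple minimizing the squared-error risk

print(risk(c_umvu), theta**2 / (n * (n + 2)))  # ~0.00833, exact 1/(n(n+2))
print(risk(c_best), theta**2 / (n + 1) ** 2)   # ~0.00826, exact 1/(n+1)^2
```

Both risks scale with 𝜃², so the biased multiple dominates at every 𝜃: under squared-error loss, the UMVU estimator is inadmissible among multiples of 𝑇.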

Gaussian sequence model example

𝑋ᵢ ∼ 𝒩(𝜇ᵢ, 1), 𝑖 = 1, …, 𝑛, independent. Want to estimate ‖𝜇‖₂², where 𝜇 = (𝜇₁, …, 𝜇ₙ)⊤.

• Find a UMVU estimator: ‖𝑋‖₂² − 𝑛
• Can we find a better estimator (if 𝜇 = 0)? (see the sketch below)
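A minimal sketch at 𝜇 = 0 under squared-error loss (my choice of comparison: truncating the UMVU estimator at zero, using the fact that ‖𝜇‖₂² ≥ 0):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 500_000

# mu = 0, so the estimand ||mu||_2^2 equals 0
X = rng.normal(0.0, 1.0, size=(reps, n))
umvu = (X**2).sum(axis=1) - n     # UMVU estimator ||X||_2^2 - n
trunc = np.maximum(umvu, 0.0)     # use the constraint ||mu||_2^2 >= 0

print(np.mean(umvu**2))   # MSE of the UMVU estimator at mu = 0, about 2n
print(np.mean(trunc**2))  # strictly smaller: truncation improves at mu = 0
```

The truncated estimator is biased, yet it has smaller mean squared error near 𝜇 = 0.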

Thoughts about unbiased estimators

• A UMVU estimator is not necessarily admissible!
• It might even be absurd (Ex 4.7 in Keener)
• It is a good estimator to start with, but in general we shall not insist on UMVU

Log-likelihood, score and Fisher information
Log-likelihood

Suppose 𝑋 has distribution from a family P = {𝑃𝜃, 𝜃 ∈ Ω}. Assume each distribution has density 𝑝𝜃 and they share the common support {𝑥 ∣ 𝑝𝜃(𝑥) > 0}. The log-likelihood is

ℓ(𝜃; 𝑋) = log 𝑝𝜃(𝑋)

Score

Def. Score
The score is defined as the gradient of the log-likelihood with
respect to the parameter vector

∇ℓ(𝜃; 𝑋)

Remark
• can treat it as a “local sufficient statistic”: for 𝜉 ≈ 0,

𝑝𝜃₀+𝜉(𝑥) = exp ℓ(𝜃₀ + 𝜉; 𝑥) ≈ exp[𝜉⊤∇ℓ(𝜃₀; 𝑥)] ⋅ 𝑝𝜃₀(𝑥)

• indicates the sensitivity to infinitesimal changes in 𝜃.

Expected value of score is zero

Under enough regularity conditions, we have

𝔼𝜃 [∇ℓ(𝜃; 𝑋)] = 0

Proof:

1 = ∫ exp ℓ(𝜃; 𝑥) 𝑑𝜇(𝑥)

Differentiating in 𝜃ⱼ (under regularity conditions allowing differentiation under the integral sign) implies

0 = ∫ (𝜕/𝜕𝜃ⱼ) ℓ(𝜃; 𝑥) ⋅ exp ℓ(𝜃; 𝑥) 𝑑𝜇(𝑥) = 𝔼𝜃[(𝜕/𝜕𝜃ⱼ) ℓ(𝜃; 𝑋)],

so each coordinate of 𝔼𝜃[∇ℓ(𝜃; 𝑋)] is zero.
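A quick numerical check of this identity, using the Poisson(𝜃) family as an illustration (my choice, not from the slides): there ℓ(𝜃; 𝑥) = 𝑥 log 𝜃 − 𝜃 − log 𝑥!, so the score is 𝑥/𝜃 − 1.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, reps = 3.0, 1_000_000

# Poisson(theta): score = d/dtheta [x*log(theta) - theta - log(x!)] = x/theta - 1
x = rng.poisson(theta, size=reps)
score = x / theta - 1.0

print(score.mean())  # ~ 0, consistent with E_theta[score] = 0
```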

Fisher information

Def. Fisher information
For 𝜃 taking values in ℝˢ, the Fisher information is an 𝑠 × 𝑠 matrix

𝐼(𝜃) = Cov𝜃(∇ℓ(𝜃; 𝑋)) = 𝔼𝜃[−∇²ℓ(𝜃; 𝑋)]

Why are the two definitions equivalent? (see the numerical check below)
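A numerical check of the equivalence in the same illustrative Poisson(𝜃) family (again my choice): there −ℓ″(𝜃; 𝑥) = 𝑥/𝜃², and both expressions should equal 1/𝜃.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, reps = 3.0, 1_000_000

x = rng.poisson(theta, size=reps)
score = x / theta - 1.0       # first derivative of the log-likelihood
neg_hess = x / theta**2       # minus its second derivative

print(score.var())            # ~ 1/theta = 0.333...
print(neg_hess.mean())        # ~ 1/theta: the two definitions agree
```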

Cramér-Rao lower bound
Cramér-Rao lower bound in 1-dimension case

Consider an estimator 𝛿(𝑋) which is unbiased for 𝑔(𝜃). Then

𝑔(𝜃) = 𝔼𝜃[𝛿]

Under enough regularity,

𝑔′(𝜃) = ∫ 𝛿(𝑥) ℓ′(𝜃; 𝑥) exp ℓ(𝜃; 𝑥) 𝑑𝜇(𝑥) = 𝔼𝜃[𝛿ℓ′]

Thm 4.9 in Keener
Let P = {𝑃𝜃 ∶ 𝜃 ∈ Ω} be a dominated family with differentiable densities 𝑝𝜃. Under enough regularity conditions (𝔼𝜃[ℓ′] = 0, 𝔼𝜃[𝛿²] < ∞, 𝑔′ well defined), we have

Var𝜃(𝛿) ≥ [𝑔′(𝜃)]² / 𝐼(𝜃), 𝜃 ∈ Ω,

called the Cramér-Rao lower bound or information lower bound.
Proof idea: apply the Cauchy–Schwarz inequality, [Cov𝜃(𝛿, ℓ′)]² ≤ Var𝜃(𝛿) Var𝜃(ℓ′), together with Cov𝜃(𝛿, ℓ′) = 𝔼𝜃[𝛿ℓ′] = 𝑔′(𝜃) and Var𝜃(ℓ′) = 𝐼(𝜃).
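A sketch verifying the bound in the simplest regular case (my example: 𝑋₁, …, 𝑋ₙ i.i.d. 𝒩(𝜃, 1) with 𝑔(𝜃) = 𝜃, so 𝐼(𝜃) = 𝑛 and the bound is 1/𝑛):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 5, 500_000

X = rng.normal(theta, 1.0, size=(reps, n))
delta = X.mean(axis=1)   # unbiased for theta

print(delta.var())       # ~ 1/n = 0.2: the sample mean attains the CRLB here
```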

Cramér-Rao lower bound in high dimension

For 𝜃 ∈ ℝˢ and 𝛿(𝑋) unbiased for 𝑔(𝜃), we have

Var𝜃(𝛿) ≥ ∇𝑔(𝜃)⊤𝐼(𝜃)⁻¹∇𝑔(𝜃)
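One way to see this (a sketch following the one-dimensional argument): apply Cauchy–Schwarz with 𝜓 = 𝑎⊤∇ℓ(𝜃; 𝑋) for 𝑎 ∈ ℝˢ. Since Cov𝜃(𝛿, ∇ℓ(𝜃; 𝑋)) = ∇𝑔(𝜃) under regularity, this gives

Var𝜃(𝛿) ≥ (𝑎⊤∇𝑔(𝜃))² / (𝑎⊤𝐼(𝜃)𝑎),

and maximizing over 𝑎, attained at 𝑎 = 𝐼(𝜃)⁻¹∇𝑔(𝜃), yields ∇𝑔(𝜃)⊤𝐼(𝜃)⁻¹∇𝑔(𝜃).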

Interpretation of the Cramér-Rao lower bound

• To estimate 𝑔(𝜃), no unbiased estimator can have smaller variance than ∇𝑔(𝜃)⊤𝐼(𝜃)⁻¹∇𝑔(𝜃)
• For an unbiased estimator 𝛿, we always have a lower bound of this form for any random variable 𝜓:

Var𝜃(𝛿) ≥ [Cov𝜃(𝛿, 𝜓)]² / Var𝜃(𝜓)

What is a good 𝜓?

Example: Cramér-Rao lower bound for i.i.d. samples

Suppose 𝑋₁, …, 𝑋ₙ are i.i.d. ∼ 𝑝𝜃⁽¹⁾, 𝜃 ∈ Ω. The joint density is

𝑝𝜃(𝑥) = ∏ⁿᵢ₌₁ 𝑝𝜃⁽¹⁾(𝑥ᵢ)

What is the relationship between the Fisher information for 𝑛 i.i.d. observations and that for a single observation? (see the note below)
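One way to see the answer: the log-likelihood adds, ℓ(𝜃; 𝑥) = ∑ⁿᵢ₌₁ ℓ⁽¹⁾(𝜃; 𝑥ᵢ), so the score is a sum of 𝑛 i.i.d. mean-zero terms and

𝐼ₙ(𝜃) = Cov𝜃(∑ᵢ ∇ℓ⁽¹⁾(𝜃; 𝑋ᵢ)) = 𝑛 𝐼₁(𝜃):

Fisher information is additive across independent observations.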

Efficiency

The CRLB is not always attainable.

Def. Efficiency
The efficiency of an unbiased estimator 𝛿 is

eff𝜃(𝛿) = CRLB / Var𝜃(𝛿)

Remark
• According to the definition and the Cramér-Rao lower bound, for “regular” unbiased estimators, eff𝜃(𝛿) ≤ 1
• Efficiency 1 is rarely achieved in finite samples, but usually we can approach it asymptotically as 𝑛 → ∞ (see the example below)
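An illustrative example of efficiency strictly below 1 (my example, assuming 𝑋₁, …, 𝑋ₙ i.i.d. 𝒩(𝜇, 𝜎²) with both parameters unknown, estimating 𝜎²): the unbiased sample variance 𝑆² has Var(𝑆²) = 2𝜎⁴/(𝑛 − 1) while the CRLB is 2𝜎⁴/𝑛, so eff(𝑆²) = (𝑛 − 1)/𝑛 → 1.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, reps = 0.0, 2.0, 10, 500_000

X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
S2 = X.var(axis=1, ddof=1)     # unbiased sample variance

crlb = 2 * sigma2**2 / n       # information bound for estimating sigma^2
print(crlb / S2.var())         # efficiency ~ (n - 1)/n = 0.9
```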

Hammersley-Chapman-Robbins Inequality
Motivation behind Hammersley-Chapman-Robbins Inequality

The Cramér-Rao lower bound requires differentiation under the integral sign, and hence regularity conditions under which this differentiation is well defined.

We can get a more general statement if we replace ∇ℓ(𝜃; 𝑋) with the corresponding finite difference.

Hammersley-Chapman-Robbins Inequality (1)

Recall that by Cauchy–Schwarz, for an unbiased estimator 𝛿, we always have a lower bound of this form for any random variable 𝜓:

Var𝜃(𝛿) ≥ [Cov𝜃(𝛿, 𝜓)]² / Var𝜃(𝜓)

• In the CRLB, we took 𝜓 = ∇ℓ(𝜃; 𝑋)
• Here we take

𝜓 = 𝑝𝜃+𝜖(𝑋)/𝑝𝜃(𝑋) − 1 = exp(ℓ(𝜃 + 𝜖; 𝑋) − ℓ(𝜃; 𝑋)) − 1 ≈ 𝜖⊤∇ℓ(𝜃; 𝑋) for small 𝜖

Hammersley-Chapman-Robbins Inequality (2)

We verify that
• 𝔼𝜃[𝑝𝜃+𝜖(𝑋)/𝑝𝜃(𝑋) − 1] = 0
• Cov𝜃(𝛿(𝑋), 𝑝𝜃+𝜖(𝑋)/𝑝𝜃(𝑋) − 1) = ∫ 𝛿(𝑥) (𝑝𝜃+𝜖(𝑥)/𝑝𝜃(𝑥) − 1) 𝑝𝜃(𝑥) 𝑑𝜇(𝑥) = 𝔼𝜃+𝜖[𝛿] − 𝔼𝜃[𝛿] = 𝑔(𝜃 + 𝜖) − 𝑔(𝜃)

Hence the Hammersley-Chapman-Robbins inequality (HCRI):

Var𝜃(𝛿) ≥ (𝑔(𝜃 + 𝜖) − 𝑔(𝜃))² / 𝔼𝜃[(𝑝𝜃+𝜖(𝑋)/𝑝𝜃(𝑋) − 1)²]

The CRLB follows from taking 𝜖 → 0, but taking the sup over 𝜖 can give better bounds (see the sketch below).
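A small sketch for 𝑋 ∼ 𝒩(𝜃, 1) and 𝑔(𝜃) = 𝜃 (my example; here the denominator has the closed form 𝔼𝜃[(𝑝𝜃+𝜖/𝑝𝜃 − 1)²] = 𝑒^(𝜖²) − 1):

```python
import numpy as np

# X ~ N(theta, 1): E_theta[(p_{theta+eps}(X)/p_theta(X) - 1)^2] = exp(eps^2) - 1,
# so the HCR bound for estimating theta is eps^2 / (exp(eps^2) - 1)
eps = np.array([2.0, 1.0, 0.5, 0.1, 0.01])
hcr = eps**2 / np.expm1(eps**2)

print(hcr)  # increases toward the CRLB 1/I(theta) = 1 as eps -> 0
```

In this regular family the sup over 𝜖 is approached as 𝜖 → 0, recovering the CRLB; the HCR bound is most useful in nonregular families (e.g. the uniform (0, 𝜃) example above, whose density is not differentiable in 𝜃 on the whole support), where the CRLB assumptions fail.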

Example 1: exponential family

What is the Cramér-Rao lower bound for the exponential family?
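A sketch of the standard computation (assuming a full exponential family in natural form, 𝑝𝜂(𝑥) = exp(𝜂⊤𝑇(𝑥) − 𝐴(𝜂))ℎ(𝑥)): then

∇ℓ(𝜂; 𝑋) = 𝑇(𝑋) − ∇𝐴(𝜂), 𝐼(𝜂) = ∇²𝐴(𝜂) = Cov𝜂(𝑇(𝑋)).

For the estimand 𝑔(𝜂) = 𝔼𝜂[𝑇(𝑋)] = ∇𝐴(𝜂), the unbiased estimator 𝛿 = 𝑇(𝑋) has covariance ∇²𝐴(𝜂), which matches the information bound exactly, so in this case the CRLB is attained.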

Example 2: curved exponential family

What is the Cramér-Rao lower bound for the curved exponential family

𝑝𝜃(𝑥) = exp(𝜂(𝜃)⊤𝑇(𝑥) − 𝐵(𝜃))ℎ(𝑥), 𝜃 ∈ ℝ, 𝑇(𝑥) ∈ ℝˢ ?
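A sketch (assuming 𝜂 differentiable and writing 𝐵(𝜃) = 𝐴(𝜂(𝜃)) with 𝐴 the log-partition function of the full family): by the chain rule, 𝐵′(𝜃) = 𝜂′(𝜃)⊤∇𝐴(𝜂(𝜃)) = 𝜂′(𝜃)⊤𝔼𝜃[𝑇(𝑋)], so

ℓ′(𝜃; 𝑥) = 𝜂′(𝜃)⊤(𝑇(𝑥) − 𝔼𝜃[𝑇(𝑋)]), 𝐼(𝜃) = 𝜂′(𝜃)⊤ Cov𝜃(𝑇(𝑋)) 𝜂′(𝜃),

and for a real estimand 𝑔(𝜃) the bound reads Var𝜃(𝛿) ≥ [𝑔′(𝜃)]² / (𝜂′(𝜃)⊤ Cov𝜃(𝑇(𝑋)) 𝜂′(𝜃)).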

Summary

• Restricting to unbiased estimators gives a nice theory: UMVU theory. But the UMVU estimator is not always admissible in terms of total risk
• Score and Fisher information
• Cramér-Rao lower bound and its variant, the Hammersley-Chapman-Robbins inequality

What is next?

• Equivariance

Thank you

