
FIT3154 Studio 2

Estimation, Loss and Risk


Daniel F. Schmidt
July 29, 2024

Contents

1 Estimation Risk for Bernoulli Model

2 Prediction Risk for Normal Distribution

Introduction
This studio covers some problems in statistical decision theory. To gain some familiarity with the
ideas of risk functions for characterising estimators, we will examine the risk of a class of estimators
of the probability of success for the Bernoulli model. The second question will explore the Kullback–
Leibler divergence prediction loss for estimation of the mean and variance of a normal distribution.

1 Estimation Risk for Bernoulli Model


For the first part of the studio, we will look at the theoretical squared-error estimation risk of a class
of common estimators for the Bernoulli model. In this case, we observe a series of zeros and ones,
y = (y1 , . . . , yn ) with yj ∈ {0, 1} that we assume were generated independently and identically from a
Bernoulli model
p(yj = 1 | θ) = θ, j = 1, . . . , n (1)
so that θ is the probability of observing a success (“1”). The problem is to estimate θ from our
sequence of n trials. This problem is very common in data science, as it is essentially about prediction
of future binary events (storm/no storm, win election/lose election, etc.) under the assumption that
each event is independent and unaffected by external effects. More general models of this type allow θ
to be determined by predictors, etc. (i.e., logistic regression), but as we will see, how to best estimate
θ even in the simple model (1) is not immediately clear. A very common class of estimators is the
so-called smoothed frequency estimators:
θ̂α(y) = (s + α)/(n + 2α)    (2)
where s = Σ_{j=1}^n yj is the number of successes in our sequence of trials, and α ≥ 0 is a tunable
smoothing/shrinkage/regularisation constant chosen by the user. Let us examine the risk of the estimator (2)
for different choices of α. We will first look at its risk under the squared-error loss
L(θ, θ̂) = (θ − θ̂)²
Using the fact that if y follows (1), then
E [s] = nθ, V [s] = nθ(1 − θ),
we can find the bias and variance of (2) as functions of the population parameter θ and constant α:
biasθ(θ̂α) = −α(2θ − 1)/(n + 2α),    Vθ[θ̂α] = nθ(1 − θ)/(n + 2α)².    (3)
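For concreteness, the formulas in (3) translate directly into vectorised R. The sketch below is not the
provided bin.risk() file (the function name bin.risk.sketch and the default θ grid are our own choices);
it simply evaluates (3) over a grid of θ values and combines bias and variance into the squared-error risk
using the decomposition quoted in question 2 below.

# Sketch: bias, variance and squared-error risk of the smoothed estimator (2),
# evaluated from the formulas in (3) over a grid of theta values.
bin.risk.sketch = function(alpha, n, theta = seq(0, 1, length.out = 1e3))
{
  bias = -alpha*(2*theta - 1)/(n + 2*alpha)     # bias of (2) at each theta
  v    = n*theta*(1 - theta)/(n + 2*alpha)^2    # variance of (2) at each theta
  list(theta = theta, bias = bias, var = v, risk = bias^2 + v)
}
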
We can use these to explore the squared-error risk for different choices of α:
1. Write a line of R code to calculate the estimates of θ using (2) for s = 0, . . . , 10 (i.e., for n = 10,
try all possible numbers of successes). (hint: this can be done in a single line by making a vector
of s values, i.e., s = 0:10).
Try this for α = 0 and then α = 1/2. How do the two estimates differ? When α = 0, what
estimator is (2) equivalent to?
2. Download the file containing the function bin.risk(). This computes the risk function for the
estimator (2) using the fact that
Eθ[(θ̂α(y) − θ)²] = biasθ(θ̂α)² + Vθ[θ̂α]
along with (3). Open and examine the code.

3. Using the function bin.risk() create a plot showing the risk curves for n = 10 when α = 0 and
α = 1/2:

rv = bin.risk(alpha=0, n=10)          # risk curve for alpha = 0, n = 10
plot(rv$theta, rv$risk, type="l")     # plot risk against theta
rv = bin.risk(alpha=1/2, n=10)        # risk curve for alpha = 1/2
lines(rv$theta, rv$risk, col="red")   # overlay on the same plot, in red

Please answer the following questions.


(a) Where is the point of maximum risk (i.e., which value of θ is the hardest to estimate well)?
Why do you think the population parameter is hardest to estimate when it takes on this
value?
(b) At what points are the risk curves at their smallest? Why do you think these values of the
population parameter are easier to estimate well?
(c) How do the two curves differ?
(d) Does either choice of α dominate (see Slide 21, Lecture 1) the other?

(hint: remember to use plot() to plot the first curve, then lines() to overlay successive curves)

4. The choice of α = √n/2 produces an estimator that is minimax (see Slide 29, Lecture 1). Overlay
the risk curve when α takes on this value. What does this curve look like?
5. Create a new plot and overlay the risk curves for n = 100 (a much larger sample) for α = 0,
α = 1/2 and the minimax estimator α = √n/2. How do the curves compare? Do you think the
minimax estimator is a good choice? Why or why not?
6. Challenge Question. See if you can derive the bias and variance formulas (3) for the smoothed
estimator (2). (complete out of studio).
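If you attempt the challenge question, one way to sanity-check your algebra is to compare the formulas
in (3) against Monte Carlo estimates of the bias and variance. The sketch below is not part of the studio
files, and the settings (θ = 0.3, α = 1/2, n = 10) are arbitrary illustrative choices.

# Sketch: simulate many success counts and compare the empirical bias and
# variance of the smoothed estimator (2) against the formulas in (3).
set.seed(1)
n = 10; alpha = 1/2; theta = 0.3
s = rbinom(1e5, size = n, prob = theta)                           # simulated success counts
theta.hat = (s + alpha)/(n + 2*alpha)                             # smoothed estimates (2)
c(mean(theta.hat) - theta, -alpha*(2*theta - 1)/(n + 2*alpha))    # empirical vs theoretical bias
c(var(theta.hat), n*theta*(1 - theta)/(n + 2*alpha)^2)            # empirical vs theoretical variance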

2 Prediction Risk for Normal Distribution
As a second problem, let us consider the problem of estimating the mean µ and variance σ 2 of a normal
distribution. For reference, its probability density function is
p(y | µ, σ²) = (1/(2πσ²))^{1/2} exp(−(y − µ)²/(2σ²)).
In Lecture 1 we examined the estimation of the mean µ from a sample y = (y1, . . . , yn) using the
sample mean µ̂(y) = (1/n) Σ_{i=1}^n yi, under squared-error loss

L(µ, µ̂) = (µ − µ̂)²,    (4)

and found the squared-error risk to be (Slide 24, Lecture 1)


R(µ, µ̂) = Eµ[L(µ, µ̂)] = σ²/n    (5)
which is independent of the population value of µ, but depends on the noise variance (the larger σ 2 ,
the bigger the average squared-error) and the sample size (the larger the sample size, the smaller the
average squared-error). Now let us consider the risk when estimating a normal using Kullback–Leibler
(KL) divergence (see Slide 36, Lecture 1); the KL divergence is a loss function that takes into account
the structural properties of the probability model and measures the ability of our estimated model
to predict future data from the same population. The KL divergence from a normal distribution with
mean µ and variance σ² to an estimated normal distribution with mean µ̂ and variance σ̂² is
KL(µ, σ²||µ̂, σ̂²) = (1/2) log(σ̂²/σ²) + σ²/(2σ̂²) + (µ − µ̂)²/(2σ̂²) − 1/2    (6)
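The divergence (6) is straightforward to code up in R. The sketch below is not one of the provided studio
files (the name kl.normal is our own choice), but it may be handy for checking your answers to the
questions that follow.

# Sketch: KL divergence (6) from N(mu, sigma2) to an estimated N(mu.hat, sigma2.hat)
kl.normal = function(mu, sigma2, mu.hat, sigma2.hat)
{
  (1/2)*log(sigma2.hat/sigma2) + sigma2/(2*sigma2.hat) + (mu - mu.hat)^2/(2*sigma2.hat) - 1/2
}
kl.normal(0, 1, 0, 1)      # zero when the estimated model matches the population
kl.normal(0, 1, 0.5, 2)    # positive otherwise
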
Please answer the following questions.
1. Imagine we only need to estimate the mean µ, i.e., somehow we know the true value of the
variance σ 2 .
(a) Write down the simplified formula for the KL divergence (6) in this case, i.e., when we set
σ̂ 2 = σ 2 .
(b) How does this compare to the standard squared-error loss (4)? In what way is it similar,
and how does it differ? Why is this the case?
2. Calculate the KL risk for the sample mean assuming that we know σ 2 , i.e., set σ̂ 2 = σ 2 , using
your previously simplified formula. How does it differ from the squared-error risk?
3. Imagine now instead that we are estimating the variance σ 2 and somehow we know the true
population mean µ.
(a) Write down the simplified formula for the KL divergence (6) in this case, i.e., when we set
µ̂ = µ.
(b) Plot the function for σ 2 = 1 and σ̂ 2 ∈ (0.1, 10) using

sigma2 = 1
sigma2.hat = seq(0.1, 10, length.out=1e3)                        # grid of candidate variance estimates
kl = (1/2)*log(sigma2.hat/sigma2) + sigma2/2/sigma2.hat - 1/2    # KL divergence (6) with mu.hat = mu
plot(sigma2.hat, kl, type="l")

(c) What does the curve look like? Does the KL divergence penalize overestimation of σ 2
differently from underestimation? Which is less costly in terms of loss?
(d) Overlay the KL divergence for σ 2 = 2 and σ̂ 2 ∈ (0.1, 10). Does it look similar to the previous
curve?
4. Standard estimators that we have previously examined for the mean and variance of a normal
distribution are the sample mean and (adjusted) sample variance
µ̂ = (1/n) Σ_{i=1}^n yi ,    σ̂² = (1/(n − k)) Σ_{i=1}^n (yi − µ̂)² ,    (7)

where k ≥ 0 is a user-chosen constant. For k = 0 the estimator is the sample variance, and
for k = 1 it is the unbiased estimate of variance. The KL risk of these estimators is found by
plugging (7) into (6) and taking appropriate expectations, and is a little tricky; we could instead
use simulation to obtain an approximate value of risk.
Download and source the file normal.kl.risk.R. This contains a function
normal.kl.risk(mu,sigma2,k,n,m=1e6)
which runs m simulations, each of which involves drawing a sample of size n from the normal with
specified mean and variance, calculating the estimates (7) using this sample, and then calculating
the KL divergence (6) for these estimates. All m KL divergences are then averaged to estimate
the risk. This is an example of the simulation procedure from Slide 20, Lecture 1. Examine the
code and work through the lines; you should be able to understand what is going on. (A rough,
self-contained sketch of this simulation procedure is also given at the end of this question list.)
5. Run the normal.kl.risk() function for mu = 0, sigma2 = 1, n = 10 and k = 0. Compare this
against the risk obtained using k = 1. Which is better? Why do you think it might be better?
6. It turns out that the exact KL risk for these estimators is given by the formula
    
E[KL(µ, σ²||µ̂, σ̂²)] = (1/2) ψ((n − 1)/2) + (1/2) log(2/(n − k)) + (n + 1)(n − k)/(2(n − 3)n) − 1/2    (8)
where ψ(x) is a special function called the digamma function (digamma() in R). In what way
does the KL risk (8) for the estimators (7) depend on the population values of µ and σ 2 ? What
does this imply, and is this a good thing?
7. Evaluate (8) for n = 10 and k = 0 and k = 1 using the function normal.kl.risk.exact(n,k=0)
from the file normal.kl.risk.R. How do these quantities compare to the estimates of risk
obtained by simulation previously? Do the same for n = 100; how do they compare?
8. Challenge Question 1. See if you can derive the formula for the KL divergence (6) between
two normal distributions. As a hint, it is easier to rewrite the KL divergence formula as
KL(θ||θ̂) = ∫ p(y | θ) log p(y | θ) dy − ∫ p(y | θ) log p(y | θ̂) dy
         = Eθ[log p(y | θ)] − Eθ[log p(y | θ̂)]

where Eθ [f (y)] denotes an expectation of f (y) with respect to the distribution p(y | θ). (complete
out of studio)
9. Challenge Question 2. Using (8) determine the optimal value of k for a given value of n.
What happens to this value as n → ∞? (complete out of studio)
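For reference, the sketch below is the rough self-contained simulation mentioned in question 4 (it is not
the provided normal.kl.risk.R file, and the settings mu = 0, sigma2 = 1, n = 10, k = 0, m = 1e5 are
illustrative choices only). It estimates the KL risk by simulation and then evaluates the exact formula (8)
for comparison.

# Sketch: simulated KL risk for the estimators (7) versus the exact formula (8)
set.seed(1)
mu = 0; sigma2 = 1; n = 10; k = 0; m = 1e5
kl = numeric(m)
for (i in 1:m)
{
  y = rnorm(n, mean = mu, sd = sqrt(sigma2))       # draw a sample of size n
  mu.hat = mean(y)                                 # sample mean (7)
  sigma2.hat = sum((y - mu.hat)^2)/(n - k)         # adjusted sample variance (7)
  kl[i] = (1/2)*log(sigma2.hat/sigma2) + sigma2/(2*sigma2.hat) +
          (mu - mu.hat)^2/(2*sigma2.hat) - 1/2     # KL divergence (6)
}
mean(kl)                                           # simulated estimate of the KL risk
(1/2)*digamma((n - 1)/2) + (1/2)*log(2/(n - k)) +
  (n + 1)*(n - k)/(2*(n - 3)*n) - 1/2              # exact KL risk (8)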
