Parameter Estimation
Nathaniel E. Helwig
Associate Professor of Psychology and Statistics
University of Minnesota
August 30, 2020
Copyright © 2020 by Nathaniel E. Helwig
Table of Contents
1. Parameters and Statistics
2. Sampling Distribution
3. Estimates and Estimators
4. Quality of Estimators
5. Estimation Frameworks
Parameters and Statistics
Probability Distribution Reminders
A random variable X has a cumulative distribution function (CDF)
denoted by F (x) = P (X ≤ x) that describes the probabilistic nature of
the random variable X.
F (·) has an associated probability mass function (PMF) or probability
density function (PDF) denoted by f (x).
• PMF: f(x) = P(X = x) for discrete variables
• PDF: ∫_a^b f(x) dx = P(a < X < b) for continuous variables
The functions F (·) and f (·) are typically assumed to depend on a finite
number of parameters, where a parameter θ = t(F ) is some function of
the probability distribution.
Inferences and Statistics
Given a sample of n independent and identically distributed (iid)
observations from some distribution F , inferential statistical analyses
are concerned with inferring things about the population from which
the sample was collected.
To form inferences, researchers often make assumptions about the form
of F , e.g., F is a normal distribution, and then use the sample of data
to form educated guesses about the population parameters.
Given a sample of data x = (x1 , . . . , xn )⊤ , a statistic T = s(x) is some
function of the sample of data. Not all statistics are created equal. . .
• Some are useful for estimating parameters or testing hypotheses
Sampling Distribution
Statistics are Random Variables
Assume that xi ∼iid F for i = 1, . . . , n, where the notation ∼iid denotes
that the xi are iid observations from the distribution F.
• x = (x1 , . . . , xn )⊤ denotes the sample of data as an n × 1 vector
Each xi is assumed to be an independent realization of a random
variable X ∼ F , so any valid statistic T = s(x) will be a random
variable with a probability distribution.
• By “valid” I mean that T must depend on the xi values
The sampling distribution of a statistic T = s(x) refers to the
probability distribution of T .
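The idea of a sampling distribution is easy to illustrate by simulation. The following sketch (added here, not part of the original slides) approximates the sampling distribution of the sample mean by drawing many independent samples; the choices F = N(0, 1), n = 25, and R = 10000 are arbitrary and assume numpy is available.

    import numpy as np

    rng = np.random.default_rng(0)
    n, R = 25, 10000                                    # sample size and number of realizations
    samples = rng.normal(loc=0, scale=1, size=(R, n))   # R independent samples from F = N(0, 1)
    T = samples.mean(axis=1)                            # statistic T = s(x) computed on each sample
    # The empirical distribution of T approximates the sampling distribution of the mean:
    print(T.mean(), T.std())                            # close to 0 and 1/sqrt(25) = 0.2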
Sampling Distribution Properties
Suppose that we collect R independent realizations of the vector x, and
let Tr = s(xr) denote the r-th realization of the statistic. The sampling
distribution is the probability distribution of {Tr}, r = 1, . . . , R, as the
number of independent realizations R → ∞.
The sampling distribution depends on the distribution of data.
• If xi ∼iid F and yi ∼iid G, then the statistics T = s(x) and U = s(y)
will have different sampling distributions if F and G are different.
Sometimes the sampling distribution will be known as n → ∞.
• CLT or asymptotic normality of MLEs
• A question of interest is: how large does n need to be?
Estimates and Estimators
Definition of Estimates and Estimators
Given a sample of data x1 , . . . , xn where xi ∼iid F, an estimate of a
parameter θ = t(F) is some function of the sample θ̂ = g(x) that is
meant to approximate θ.
An estimator refers to the function g(·) that is applied to the sample to
obtain the estimate θ̂.
This is standard notation in statistics: a “hat” (i.e., ˆ) is placed on top
of the parameter symbol to denote that θ̂ is an estimate of θ.
• θ̂ should be read as “theta hat”
• θ̂ should be interpreted as some estimate of the parameter θ
Examples of Estimates and Estimators
Example. Suppose that we have a sample of data x1 , . . . , xn where
xi ∼iid F, which denotes any generic distribution, and the population
mean µ = E(X) is the parameter of interest. The sample mean
x̄ = (1/n) ∑_{i=1}^n xi provides an estimate of the parameter µ, so we could
also write it as x̄ = µ̂.
Example. Similarly, suppose that we have a sample of data x1 , . . . , xn
where xi ∼iid F and the population variance σ² = E[(X − µ)²] is the
parameter of interest. The sample variance s² = (1/(n−1)) ∑_{i=1}^n (xi − x̄)²
provides an estimate of the parameter σ², so we could also write it as
s² = σ̂². Another reasonable estimate would be s̃² = (1/n) ∑_{i=1}^n (xi − x̄)².
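As a quick numerical companion (added here, not part of the original slides), the three estimates above can be computed directly; the simulated N(5, 2²) data and seed are only for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(loc=5, scale=2, size=50)    # a sample of n = 50 observations
    xbar = x.mean()                            # estimate of mu
    s2 = x.var(ddof=1)                         # sample variance with n - 1 denominator
    s2_tilde = x.var(ddof=0)                   # alternative estimate with n denominator
    print(xbar, s2, s2_tilde)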
Quality of Estimators
Overview
Like statistics, not all estimators are created equal. Some estimators
produce “better” estimates of the intended population parameters.
There are several ways to talk about the “quality” of an estimator:
• its expected value (bias)
• its uncertainty (variance)
• both its bias and variance (MSE)
• its asymptotic properties (consistency)
MSE is typically the preferred way to measure an estimator’s quality.
Bias of an Estimator
The bias of an estimator refers to the difference between the expected
value of the estimate θ̂ = g(x) and the parameter θ = t(F ), i.e.,
Bias(θ̂) = E(θ̂) − θ
where the expectation is calculated with respect to F .
• An estimator is “unbiased” if Bias(θ̂) = 0
Despite the negative connotations of the word “bias”, it is important to
note that biased estimators can be a good thing (see Helwig, 2017).
• Ridge regression (Hoerl and Kennard, 1970)
• Least absolute shrinkage and selection operator (LASSO)
regression (Tibshirani, 1996)
• Elastic Net regression (Zou and Hastie, 2005)
Bias Example 1: The Mean
Given a sample of data x1 , . . . , xn where xi ∼iid F and F has mean
µ = E(X), the sample mean x̄ = (1/n) ∑_{i=1}^n xi is an unbiased estimate of
the population mean µ.
To prove that x̄ is an unbiased estimator, we can use the expectation
rules from the Introduction to Random Variables chapter. Specifically,
note that E(x̄) = (1/n) ∑_{i=1}^n E(xi) = (1/n) ∑_{i=1}^n µ = µ.
Bias Example 2: The Variance (part 1)
Given a sample of data x1 , . . . , xn where xi ∼iid F and F has mean
µ = E(X) and variance σ² = E[(X − µ)²], the sample variance
s² = (1/(n−1)) ∑_{i=1}^n (xi − x̄)² is an unbiased estimate of σ².
To prove that s² is unbiased, first note that
∑_{i=1}^n (xi − x̄)² = ∑_{i=1}^n xi² − 2x̄ ∑_{i=1}^n xi + n x̄² = ∑_{i=1}^n xi² − n x̄²
which implies that E(s²) = (1/(n−1)) [ ∑_{i=1}^n E(xi²) − n E(x̄²) ].
Now note that σ² = E(xi²) − µ², which implies that E(xi²) = σ² + µ².
Bias Example 2: The Variance (part 2)
Also, note that we can write
x̄² = ( (1/n) ∑_{i=1}^n xi )² = (1/n²) [ ∑_{i=1}^n xi² + 2 ∑_{i=2}^n ∑_{j=1}^{i−1} xi xj ]
and applying the expectation operator gives
E(x̄²) = (1/n²) ∑_{i=1}^n E(xi²) + (2/n²) ∑_{i=2}^n ∑_{j=1}^{i−1} E(xi) E(xj)
      = (1/n)(σ² + µ²) + ((n − 1)/n) µ²
given that E(xi xj) = E(xi) E(xj) for all i ≠ j because xi and xj are
independent, and ∑_{i=2}^n ∑_{j=1}^{i−1} µ² = (n(n − 1)/2) µ².
Bias Example 2: The Variance (part 3)
Putting all of the pieces together gives
E(s²) = (1/(n−1)) [ ∑_{i=1}^n E(xi²) − n E(x̄²) ]
      = (1/(n−1)) [ n(σ² + µ²) − (σ² + µ²) − (n − 1)µ² ]
      = σ²
which completes the proof that E(s²) = σ².
This result can be used to show that s̃² = (1/n) ∑_{i=1}^n (xi − x̄)² is biased:
• E(s̃²) = E( ((n−1)/n) s² ) = ((n−1)/n) E(s²) = ((n−1)/n) σ²
• (n−1)/n < 1 for any finite n, so s̃² has a downward bias
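A Monte Carlo check of these bias results (a sketch added here, not from the slides): averaging many simulated values of s² should land close to σ², while the average of s̃² should land close to ((n − 1)/n)σ². The normal distribution, seed, and sample size are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(2)
    n, R, sigma2 = 10, 100000, 4.0
    x = rng.normal(loc=0, scale=np.sqrt(sigma2), size=(R, n))
    s2 = x.var(axis=1, ddof=1)         # unbiased estimator of sigma^2
    s2_tilde = x.var(axis=1, ddof=0)   # biased estimator of sigma^2
    print(s2.mean())                   # approximately 4.0
    print(s2_tilde.mean())             # approximately (n - 1)/n * 4.0 = 3.6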
Variance of an Estimator
The variance of an estimator refers to the second central moment of the
estimator’s probability distribution, i.e.,
Var(θ̂) = E[(θ̂ − E(θ̂))²]
where both expectations are calculated with respect to F.
The standard error of an estimator is the square root of the variance of
the estimator, i.e., SE(θ̂) = Var(θ̂)^{1/2}.
We would like an estimator that is both reliable (low variance) and
valid (low bias), but there is a trade-off between these two concepts.
Variance of the Sample Mean
Given a sample of data x1 , . . . , xn where xi ∼iid F and F has mean
µ = E(X) and variance σ² = E[(X − µ)²], the sample mean
x̄ = (1/n) ∑_{i=1}^n xi has a variance of Var(x̄) = σ²/n.
To prove that this is the variance of x̄, we can use the variance rules
from the Introduction to Random Variables chapter, i.e.,
Var(x̄) = Var( (1/n) ∑_{i=1}^n xi ) = (1/n²) ∑_{i=1}^n Var(xi) = σ²/n
given that the xi are independent and Var(xi) = σ² for all i = 1, . . . , n.
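This result is also easy to verify by simulation (an added sketch, not from the slides); the variance of the simulated sample means should be close to σ²/n under the arbitrary settings below.

    import numpy as np

    rng = np.random.default_rng(3)
    n, R, sigma2 = 25, 100000, 9.0
    x = rng.normal(loc=0, scale=np.sqrt(sigma2), size=(R, n))
    xbar = x.mean(axis=1)       # R realizations of the sample mean
    print(xbar.var())           # approximately sigma2 / n = 0.36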
Variance of the Sample Variance
Given a sample of data x1 , . . . , xn where xi ∼iid F and F has mean
µ = E(X) and variance σ² = E[(X − µ)²], the sample variance
s² = (1/(n−1)) ∑_{i=1}^n (xi − x̄)² has a variance of
Var(s²) = (1/n) ( µ4 − ((n − 3)/(n − 1)) σ⁴ )
where µ4 = E[(X − µ)⁴] is the fourth central moment of X.
• The proof of this is too tedious to display on the slides
• Bonus points for anyone who can prove this formula
The above result can be used to show that
• Var(s̃²) = Var( ((n−1)/n) s² ) = ((n−1)²/n³) ( µ4 − ((n − 3)/(n − 1)) σ⁴ )
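Although the proof is omitted, the formula can at least be checked numerically (an added sketch, not from the slides). For normal data µ4 = 3σ⁴, in which case the formula reduces to Var(s²) = 2σ⁴/(n − 1); the settings below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(4)
    n, R, sigma2 = 10, 200000, 4.0
    mu4 = 3 * sigma2**2                               # fourth central moment of N(0, sigma2)
    x = rng.normal(loc=0, scale=np.sqrt(sigma2), size=(R, n))
    s2 = x.var(axis=1, ddof=1)
    print(s2.var())                                   # Monte Carlo estimate of Var(s^2)
    print((mu4 - (n - 3) / (n - 1) * sigma2**2) / n)  # formula: 2 * sigma2^2 / (n - 1) here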
Mean Squared Error of an Estimator
The mean squared error (MSE) of an estimator refers to the expected
squared difference between the parameter θ = t(F ) and the estimate
θ̂ = g(x), i.e.,
MSE(θ̂) = E[(θ̂ − θ)²]
where the expectation is calculated with respect to F .
Although not obvious from its definition, MSE can be decomposed as
MSE(θ̂) = Bias(θ̂)² + Var(θ̂)
where the first term is squared bias and the second term is variance.
MSE = Bias² + Variance
To prove this relationship holds for any estimator, first note that
(θ̂ − θ)² = θ̂² − 2θ̂θ + θ², and applying the expectation operator gives
E[(θ̂ − θ)²] = E(θ̂²) − 2θE(θ̂) + θ²
given that the parameter θ is assumed to be an unknown constant.
Next, note that we can write the squared bias and variance as
Bias(θ̂)² = (E(θ̂) − θ)² = E(θ̂)² − 2θE(θ̂) + θ²
Var(θ̂) = E(θ̂²) − E(θ̂)²
and adding these two terms together gives
Bias(θ̂)² + Var(θ̂) = E(θ̂)² − 2θE(θ̂) + θ² + E(θ̂²) − E(θ̂)²
                  = E(θ̂²) − 2θE(θ̂) + θ²
which is the form of the MSE given on the previous slide.
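The decomposition can also be confirmed numerically (an added sketch, not from the slides), here for the biased estimator s̃² of σ²; the simulation settings are arbitrary.

    import numpy as np

    rng = np.random.default_rng(5)
    n, R, sigma2 = 10, 200000, 4.0
    x = rng.normal(loc=0, scale=np.sqrt(sigma2), size=(R, n))
    s2_tilde = x.var(axis=1, ddof=0)
    mse = np.mean((s2_tilde - sigma2)**2)    # direct Monte Carlo MSE
    bias = s2_tilde.mean() - sigma2          # Monte Carlo bias
    var = s2_tilde.var()                     # Monte Carlo variance
    print(mse, bias**2 + var)                # the two numbers should agree closely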
Consistency of an Estimator
Given a sample of data x1 , . . . , xn with xi ∼iid F, an estimator θ̂ = g(x)
of a parameter θ = t(F) is said to be consistent if θ̂ →p θ as n → ∞.
The notation →p should be read as “converges in probability to”, which
means that the probability that θ̂ differs from θ by any fixed amount
goes to zero as n gets large.
Note that any reasonable estimator should be consistent. Otherwise,
collecting more data will not result in better estimates.
All of the estimators that we’ve discussed (i.e., x̄, s2 and s̃2 ) are
consistent estimators.
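Consistency is easy to visualize by computing an estimate on larger and larger samples (an added sketch, not from the slides); the estimates should settle down around the true parameter value, here µ = 2.

    import numpy as np

    rng = np.random.default_rng(6)
    mu = 2.0
    for n in [10, 100, 1000, 10000, 100000]:
        x = rng.normal(loc=mu, scale=1.0, size=n)
        print(n, x.mean())     # sample means approach mu = 2.0 as n grows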
Efficiency of an Estimator
Given a sample of data x1 , . . . , xn with xi ∼iid F, an estimator θ̂ = g(x)
of a parameter θ = t(F) is said to be efficient if it is the best possible
estimator for θ using some loss function.
The chosen loss function is often MSE, so the most efficient estimator
is the one with the smallest MSE compared to all other estimators of θ.
If you have two estimators θ̂1 = g1 (x) and θ̂2 = g2 (x), we would say
that θ̂1 is more efficient than θ̂2 if MSE(θ̂1 ) < MSE(θ̂2 ).
• If θ̂1 and θ̂2 are both unbiased, the most efficient estimator is the
one with the smallest variance
Estimation Frameworks
Least Squares Estimation
A simple least squares estimate of a parameter θ = t(F) is the estimate
θ̂ = g(x) that minimizes a least squares loss function of the form
∑_{i=1}^n (h(xi) − θ)²
where h(·) is some user-specified function (typically h(x) = x).
Least squares estimation methods can work well for mean parameters
and regression coefficients, but will not work well for all parameters.
• Variance parameters are best estimated using other approaches
Least Squares Estimation Example
Given a sample of data x1 , . . . , xn where xi ∼iid F, suppose that we want
to find the least squares estimate of µ = E(X).
The least squares loss function is
LS(µ|x) = ∑_{i=1}^n (xi − µ)² = ∑_{i=1}^n xi² − 2µ ∑_{i=1}^n xi + nµ²
where x = (x1 , . . . , xn ) is the observed data vector.
Taking the derivative of the function with respect to µ gives
dLS(µ|x)/dµ = −2 ∑_{i=1}^n xi + 2nµ
and setting the derivative to 0 and solving for µ gives µ̂ = (1/n) ∑_{i=1}^n xi.
• The sample mean x̄ is the least squares estimate of µ
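The same answer can be obtained by minimizing the least squares loss numerically (an added sketch, not part of the slides), which is a useful sanity check when no closed-form solution is available; scipy and the simulated data are assumptions made for the illustration.

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(7)
    x = rng.normal(loc=3, scale=1, size=100)

    def ls_loss(mu):
        # least squares loss LS(mu | x) = sum_i (x_i - mu)^2
        return np.sum((x - mu)**2)

    result = minimize_scalar(ls_loss)
    print(result.x, x.mean())    # the numerical minimizer matches the sample mean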
Method of Moments Estimation
Assume that X ∼ F where the probability distribution F depends on
parameters θ1 , . . . , θp .
Also, suppose that the first p moments of X can be written as
µj = E(X^j) = mj(θ1 , . . . , θp )
where mj(·) is some known function for j = 1, . . . , p.
Given data xi ∼iid F for i = 1, . . . , n, the method of moments estimates
of the parameters are the values θ̂1 , . . . , θ̂p that solve the equations
µ̂j = mj(θ̂1 , . . . , θ̂p )
where µ̂j = (1/n) ∑_{i=1}^n xi^j is the j-th sample moment for j = 1, . . . , p.
Method of Moments: Normal Distribution
Suppose that xi ∼iid N(µ, σ²) for i = 1, . . . , n. The first two moments of
the normal distribution are µ1 = µ and µ2 = µ² + σ².
The first two sample moments are µ̂1 = (1/n) ∑_{i=1}^n xi = x̄ and
µ̂2 = (1/n) ∑_{i=1}^n xi² = x̄² + s̃², where s̃² = (1/n) ∑_{i=1}^n (xi − x̄)².
Thus, the method of moments estimates of µ and σ² are given by
µ̂ = x̄ and σ̂² = s̃².
Method of Moments: Uniform Distribution
Suppose that xi ∼iid U[a, b] for i = 1, . . . , n. The first two moments of
the uniform distribution are µ1 = (1/2)(a + b) and µ2 = (1/3)(a² + ab + b²).
Solving the first equation gives b = 2µ1 − a, and plugging this into the
second equation gives µ2 = (1/3)(a² − 2aµ1 + 4µ1²), which is a simple
quadratic function of a.
Applying the quadratic formula gives a = µ1 − √3 √(µ2 − µ1²), and
plugging this into b = 2µ1 − a produces b = µ1 + √3 √(µ2 − µ1²).
Using µ̂1 and µ̂2 in these equations gives the method of moments
estimates of a and b.
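A small numerical illustration of these formulas (added here, not from the slides): simulate U[a, b] data, compute the first two sample moments, and plug them in; the true endpoints and seed below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(8)
    a_true, b_true = 2.0, 7.0
    x = rng.uniform(low=a_true, high=b_true, size=1000)
    m1 = x.mean()                        # first sample moment
    m2 = np.mean(x**2)                   # second sample moment
    spread = np.sqrt(3) * np.sqrt(m2 - m1**2)
    a_hat = m1 - spread                  # method of moments estimate of a
    b_hat = m1 + spread                  # method of moments estimate of b
    print(a_hat, b_hat)                  # close to 2 and 7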
Likelihood Function and Log-Likelihood Function
Suppose that xi ∼iid F for i = 1, . . . , n, where the distribution F depends
on the vector of parameters θ = (θ1 , . . . , θp )⊤ .
The likelihood function has the form
L(θ|x) = ∏_{i=1}^n f(xi|θ)
where f(xi|θ) is the probability mass function (PMF) or probability
density function (PDF) corresponding to the distribution function F.
The log-likelihood function is the logarithm of the likelihood function:
ℓ(θ|x) = log(L(θ|x)) = ∑_{i=1}^n log(f(xi|θ))
where log(·) = ln(·) is the natural logarithm function.
Maximum Likelihood Estimation
Suppose that xi ∼iid F for i = 1, . . . , n, where the distribution F depends
on the vector of parameters θ = (θ1 , . . . , θp )⊤ .
The maximum likelihood estimates (MLEs) are the parameter values
that maximize the likelihood (or log-likelihood) function, i.e.,
θ̂_MLE = argmax_{θ∈Θ} L(θ|x) = argmax_{θ∈Θ} ℓ(θ|x)
where Θ = Θ1 × · · · × Θp is the joint parameter space with Θj denoting
the parameter space for the j-th parameter, i.e., θj ∈ Θj for all j.
Maximum likelihood estimates have desirable large sample properties:
• consistent: θ̂MLE → θ as n → ∞
• asymptotically efficient: Var(θ̂MLE ) ≤ Var(θ̂) as n → ∞
• functionally invariant: if θ̂MLE is the MLE of θ, then h(θ̂MLE ) is
the MLE of h(θ) for any continuous function h(·)
MLE for Normal Distribution
Suppose that xi ∼iid N(µ, σ²) for i = 1, . . . , n. Assuming that
X ∼ N(µ, σ²), the probability density function can be written as
f(x|µ, σ²) = (1/√(2πσ²)) exp( −(1/(2σ²)) (x − µ)² )
This implies that the log-likelihood function has the form
ℓ(µ, σ²|x) = −(1/(2σ²)) ∑_{i=1}^n (xi − µ)² − (n/2) log(σ²) − c
where c = (n/2) log(2π) is a constant with respect to µ and σ².
MLE for Normal Distribution (part 2)
Maximizing ℓ(µ, σ²|x) with respect to µ is equivalent to minimizing
ℓ1(µ|x) = ∑_{i=1}^n (xi − µ)²
which is the least squares loss function that we encountered before.
We can use the same approach as before to derive the MLE:
• Take the derivative of `1 (µ|x) with respect to µ
• Equate the derivative to zero and solve for µ
The MLE of µ is the sample mean, i.e., µ̂_MLE = x̄ = (1/n) ∑_{i=1}^n xi.
MLE for Normal Distribution (part 3)
Maximizing ℓ(µ, σ²|x) with respect to σ² is equivalent to minimizing
ℓ2(σ²|µ̂, x) = (1/σ²) ∑_{i=1}^n (xi − x̄)² + n log(σ²)
Taking the derivative of ℓ2(σ²|µ̂, x) with respect to σ² gives
dℓ2(σ²|µ̂, x)/dσ² = −(1/σ⁴) ∑_{i=1}^n (xi − x̄)² + n/σ²
Equating the derivative to zero and solving for σ² reveals that
σ̂²_MLE = s̃² = (1/n) ∑_{i=1}^n (xi − x̄)².
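These closed-form MLEs can be checked against a direct numerical maximization of the log-likelihood (an added sketch, not from the slides); scipy, the parameterization through log(σ²), and the simulated data are assumptions made purely for illustration.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(9)
    x = rng.normal(loc=1.5, scale=2.0, size=200)

    def neg_loglik(theta):
        mu, log_sigma2 = theta                 # optimize log(sigma^2) to keep sigma^2 > 0
        sigma2 = np.exp(log_sigma2)
        # negative log-likelihood, dropping the (n/2) log(2*pi) constant
        return np.sum((x - mu)**2) / (2 * sigma2) + 0.5 * len(x) * np.log(sigma2)

    fit = minimize(neg_loglik, x0=np.array([0.0, 0.0]))
    mu_hat, sigma2_hat = fit.x[0], np.exp(fit.x[1])
    print(mu_hat, x.mean())                    # numerical MLE of mu vs closed form x-bar
    print(sigma2_hat, x.var(ddof=0))           # numerical MLE of sigma^2 vs closed form s-tilde^2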
MLE for Binomial Distribution
Suppose that xi ∼iid B[N, p] for i = 1, . . . , n. Assuming that
X ∼ B[N, p], the probability mass function can be written as
f(x|N, p) = C(N, x) p^x (1 − p)^{N−x} = (N!/(x!(N − x)!)) p^x (1 − p)^{N−x}
where C(N, x) denotes the binomial coefficient.
This implies that the log-likelihood function has the form
ℓ(p|x, N) = log(p) ∑_{i=1}^n xi + log(1 − p) ( nN − ∑_{i=1}^n xi ) + c
where c = n log(N!) − ∑_{i=1}^n [log(xi!) + log((N − xi)!)] is a constant.
MLE for Binomial Distribution (part 2)
Taking the derivative of the log-likelihood with respect to p gives
dℓ(p|x, N)/dp = (1/p) ∑_{i=1}^n xi − (1/(1 − p)) ( nN − ∑_{i=1}^n xi )
Setting the derivative to zero and multiplying by p(1 − p) reveals that
the MLE satisfies
(1 − p)nx̄ − pn (N − x̄) = 0 → x̄ − pN = 0
Solving the above equation for p reveals that the MLE of p is
p̂_MLE = (1/(nN)) ∑_{i=1}^n xi = x̄/N
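A quick numerical check of p̂_MLE = x̄/N (added here, not from the slides), comparing the closed-form estimate with a grid search over the log-likelihood (constant term omitted); N, p, and the seed are arbitrary.

    import numpy as np

    rng = np.random.default_rng(10)
    N, p_true, n = 20, 0.3, 500
    x = rng.binomial(n=N, p=p_true, size=n)
    p_hat = x.mean() / N                       # closed-form MLE

    # grid search over the log-likelihood (constant term omitted)
    grid = np.linspace(0.001, 0.999, 9999)
    loglik = np.log(grid) * x.sum() + np.log(1 - grid) * (n * N - x.sum())
    print(p_hat, grid[np.argmax(loglik)])      # both close to 0.3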
MLE for Uniform Distribution
Suppose that xi ∼iid U[a, b] for i = 1, . . . , n. Assuming that X ∼ U[a, b],
the probability density function can be written as
f(x|a, b) = 1/(b − a)
This implies that the log-likelihood function has the form
ℓ(a, b|x) = −∑_{i=1}^n log(b − a) = −n log(b − a)
Maximizing ℓ(a, b|x) is equivalent to minimizing log(b − a) with the
requirements that a ≤ xi for all i = 1, . . . , n and b ≥ xi for all i.
• MLEs are âMLE = min(xi ) = x(1) and b̂MLE = max(xi ) = x(n)
References
Helwig, N. E. (2017). Adding bias to reduce variance in psychological
results: A tutorial on penalized regression. The Quantitative
Methods for Psychology 13, 1–19.
Hoerl, A. and R. Kennard (1970). Ridge regression: Biased estimation
for nonorthogonal problems. Technometrics 12, 55–67.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal
of the Royal Statistical Society, Series B 58, 267–288.
Zou, H. and T. Hastie (2005). Regularization and variable selection via
the elastic net. Journal of the Royal Statistical Society, Series B 67,
301–320.