Univariate Continuous Distribution Theory - m347 - 1
Mathematical statistics
2 Moments
2.1 Expectation
2.2 The mean
2.3 Raw moments
2.4 Central moments
2.5 The variance
2.6 Linear transformation
2.6.1 Expectation and variance
2.6.2 Effects on the pdf
2.7 The moment generating function
2.7.1 Examples
Solutions
[Figure 2.1 A pdf f(x); the shaded area under the curve is 1]
Note that in Figure 2.1, f (x) ≥ 0 for all x. Also, its integral, which is the
area of the shaded region under the pdf, is 1.
In fact, any mathematical function which is non-negative, positive on at
least one interval of values of x, and has a finite integral can be made into
a pdf. (A function whose integral does not exist cannot be made into a
pdf.) Suppose that g is such a function with
∫_{−∞}^{∞} g(x) dx = C,
where 0 < C < ∞. Then
f(x) = g(x)/C
is a pdf. To see this, note that
∫_{−∞}^{∞} f(x) dx = ∫_{−∞}^{∞} {g(x)/C} dx = (1/C) ∫_{−∞}^{∞} g(x) dx = (1/C) × C = 1.
C is called the normalising (or normalisation) constant. (Some statisticians use the term 'normalising constant' for the quantity 1/C instead of C.)
The following nomenclature is not entirely standard in statistics, but can be useful. The part g of the pdf f that contains all the dependence of f on x will be referred to as the density core, or just core for short, in M347.
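Numerically, finding C and normalising is a one-liner. Here is a minimal sketch, assuming the SciPy library; the core g(x) = x(1 − x) on (0, 1) is an illustrative choice, not one prescribed by the module.

```python
# Normalising a non-negative function into a pdf, numerically.
from scipy.integrate import quad

def g(x):
    return x * (1 - x)  # a non-negative core on the support (0, 1)

C, _ = quad(g, 0, 1)        # the normalising constant C = integral of g
f = lambda x: g(x) / C      # the resulting pdf

total, _ = quad(f, 0, 1)
print(C, total)             # C = 0.1666..., and f integrates to 1.0
```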
[Figure 2.2 The pdf f(x) of Figure 2.1 with the area under the pdf between x_0 and x_0 + ε shaded]
1.1.1 Examples
The distribution with density f which is constant on 0 < x < 1 and zero
otherwise is called the uniform distribution on (0, 1), denoted U (0, 1).
The constant value, k say, must be positive for f to be a density. What is
the value of k? Well, the density is 0 for x ≤ 0, k for 0 < x < 1 and 0 again
for x ≥ 1. So, splitting the range of integration into three parts gives
∫_{−∞}^{∞} f(x) dx = ∫_{−∞}^{0} f(x) dx + ∫_{0}^{1} f(x) dx + ∫_{1}^{∞} f(x) dx
= ∫_{−∞}^{0} 0 dx + ∫_{0}^{1} k dx + ∫_{1}^{∞} 0 dx
= 0 + ∫_{0}^{1} k dx + 0
= ∫_{0}^{1} k dx = [kx]_{0}^{1} = k(1 − 0) = k.
However, this integral should be 1, so it must be that k = 1. That is,
f(x) = 1 if 0 < x < 1, and f(x) = 0 otherwise.
For short, it can be said that
f(x) = 1 on 0 < x < 1.
[Figure 2.3 The pdf of the uniform distribution on (0, 1), enclosing a unit square]
Notice that you could have replaced the integration step in calculating k
by using the area of the square in Figure 2.3.
When a pdf only takes positive values on a subset of R, then the limits of
integration reduce to the limits of that subset. In Example 2.1, these limits
were 0 and 1. In general, the domain on which f takes positive values is
called the support of the distribution. Thus, in Example 2.1, the support
of f is (0, 1) and the density is written in the form
f (x) = {function of x} on x in its support.
And, in general, the integral of f is the integral of f over its support. In
M347, the support of f will always be a single interval, although one or
both endpoints of that interval may be ±∞.
Exercise 2.1
For each of the following four functions, decide whether or not they are, or
can be made into, densities. If they are non-negative and integrable,
calculate what their correct normalising constants should be.
(a) f_a(x) = x(1 − x) on 0 < x < 1.
(b) f_b(x) = e^x(1 − x) on x > 0.
(c) f_c(x) = 1 + x on −1 < x < 0, and f_c(x) = 1 − x on 0 ≤ x < 1.
(d) f_d(x) = 1/x on x > 1.
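Before checking the solutions, the integrals in this exercise can be explored numerically. A minimal sketch, assuming SciPy, contrasting part (a), which is integrable, with part (d), which is not:

```python
from scipy.integrate import quad

# (a) integrable: the integral settles to a finite value C.
C, _ = quad(lambda x: x * (1 - x), 0, 1)
print("(a) C =", C)         # a finite normalising constant

# (d) not integrable: the integral of 1/x grows with the upper limit.
for T in (1e1, 1e3, 1e5):
    I, _ = quad(lambda x: 1 / x, 1, T)
    print(f"(d) integral up to {T:.0e}: {I:.2f}")  # grows like log T
```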
The cdf is defined by F(x) = P(X ≤ x). That is, F(x) is the probability that the random variable X takes any value less than or equal to the fixed value x.
Exercise 2.2
Explain why, in the continuous case, F (x) is also equal to P (X < x).
F(x) = ∫_{−∞}^{x} f(y) dy;    f(x) = (d/dx) F(x) = F′(x).
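These two relationships can be verified symbolically. A minimal sketch, assuming SymPy, using the uniform density f(y) = 1 on (0, 1):

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = sp.Integer(1)                 # pdf of U(0, 1) on its support

F = sp.integrate(f, (y, 0, x))    # integrate the pdf up to x: F(x) = x
print(F)                          # x
print(sp.diff(F, x))              # 1: differentiating recovers the pdf
```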
Exercise 2.3
Use the relationships between a pdf and cdf to check mathematically that:
(a) limx→∞ F (x) = 1;
(b) F is an increasing function of x on its support.
1.2.1 Examples
[Figure 2.5 (a) Graph of the pdf of the uniform distribution on (0, 1); (b) graph of the cdf of the uniform distribution on (0, 1)]
It is always the case that if the support of f , (a, b) say, is not the whole
of R but an interval subset thereof, then F (x) = 0 for x ≤ a and F (x) = 1
for x ≥ b. This was illustrated for the uniform distribution with a = 0 and
b = 1 in Figure 2.5. In such cases, for short, just write
F (x) = {function of x} on a < x < b.
For example, for the uniform distribution of Example 2.3, write
F (x) = x on 0 < x < 1.
When doing so, do not lose sight of the fact that F is zero ‘below’ its
support and F is one ‘above’ its support. If a = −∞, then there is no
interval of values for which F (x) = 0, and if b = ∞, then there is no
interval of values with F (x) = 1.
Exercise 2.4
From Example 2.2, the pdf of the exponential distribution with parameter
λ > 0 is
f(x) = λe^{−λx} on x > 0.
This exercise concerns the cdf of the exponential distribution.
(a) What is the value of F (x) for x ≤ 0?
(b) Show that the cdf of the exponential distribution is
F(x) = 1 − e^{−λx} on x > 0.
(c) Check this result by differentiation to obtain the corresponding pdf.
A distribution is introduced now that may be new to you but will be used
as an example sufficiently often in M347 to warrant a name: the power
distribution. It has density function
f(x) = βx^{β−1} on 0 < x < 1.
The parameter β can take any positive value.
Animation 2.2 Graph of the pdf of a power distribution, with a
slider to change the value of β > 0
Exercise 2.5
Using Animation 2.2, describe the main qualitative behaviour of the power
density for values of β < 1, β = 1 and β > 1, respectively.
Exercise 2.6
Also,
P(c ≤ X ≤ d) = P(c ≤ X < d) = P(c < X ≤ d) = P(c < X < d),
since X is continuous, so that P(X = c) = P(X = d) = 0. The formula in the box above, P(c ≤ X ≤ d) = F(d) − F(c), is most easily seen with the aid of an animated figure:
Animation 2.3 Consecutive showing of images corresponding to
F (d), then F (c), then F (d) − F (c)
Exercise 2.7
(a) For the distribution with cdf
F(x) = x^2(3 − 2x) on 0 < x < 1,
calculate P(1/4 ≤ X ≤ 5/8).
(b) For the exponential distribution with λ = 1, which has cdf
F(x) = 1 − e^{−x} on x > 0,
show that
P(log 2 ≤ X ≤ log 4) = 1/4.
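Once you have attempted the exercise, both parts are easy to cross-check numerically using only the standard library:

```python
import math

# (a) F(x) = x**2 * (3 - 2*x) on 0 < x < 1
F = lambda x: x**2 * (3 - 2 * x)
print(F(5 / 8) - F(1 / 4))        # P(1/4 <= X <= 5/8)

# (b) F(x) = 1 - exp(-x) on x > 0
G = lambda x: 1 - math.exp(-x)
print(G(math.log(4)) - G(math.log(2)))   # 0.25, as claimed
```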
Exercise 2.8
(a) Write P(x_0 ≤ X < x_0 + ε), ε > 0, as a function of F.
(b) Hence explain why
lim_{ε→0} P(x_0 ≤ X < x_0 + ε)/ε = f(x_0).
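The limit in part (b) can be watched numerically. A sketch using the exponential cdf F(x) = 1 − e^{−x} from Exercise 2.7(b); the point x_0 = 1 is an arbitrary choice:

```python
import math

F = lambda x: 1 - math.exp(-x)    # cdf, so f(x) = exp(-x)
x0 = 1.0
for eps in (0.1, 0.01, 0.001):
    print((F(x0 + eps) - F(x0)) / eps)   # tends to f(1) = exp(-1) = 0.3679...
```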
2 Moments
The mean and variance are special cases of a concept known as the
‘moments’ of a distribution. These in turn are special cases of the more
general concept of ‘expectation’, or ‘expected value’, which is described in
Subsection 2.1. The familiar notion of the mean is then considered in
Subsection 2.2. The mean is also known as the first moment of a
distribution, which leads, in Subsections 2.3 and 2.4, to two general
definitions of moments. Following these, another familiar notion, the
variance, is studied in Subsection 2.5. Subsection 2.6 concerns random
variables linked by ‘linear transformation’ and, in particular, the means,
variances and densities thereof. Finally, in Subsection 2.7, you will see how
to deal with many moments all in one go through the medium of
something called the ‘moment generating function’.
‘Doing the best at this moment puts you in the best place for the
next moment.’
(Oprah Winfrey, American television personality, actress and producer)
2.1 Expectation
Informally, as you might guess, the expected value, or expectation, of a
random variable X is some kind of average or mean or typical value of X.
The formal definition of expected value is given here in general terms for the expected value of a general function h of X. This expected value is denoted by E{h(X)} and is defined by another integral:
E{h(X)} = ∫_{−∞}^{∞} h(x) f(x) dx,
the integral in practice being taken over the support of f.
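A numerical rendering of this defining integral, assuming SciPy; the choices h(x) = x^2 and X uniform on (0, 1) are purely illustrative:

```python
from scipy.integrate import quad

f = lambda x: 1.0       # pdf of U(0, 1) on its support
h = lambda x: x**2      # the function whose expectation is wanted

E_h, _ = quad(lambda x: h(x) * f(x), 0, 1)   # E{h(X)} over the support
print(E_h)              # 0.3333... = 1/3
```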
Exercise 2.9
Exercise 2.10
Exercise 2.11
E{∑_{i=1}^{n} k_i h_i(X)} = ∑_{i=1}^{n} k_i E{h_i(X)}.    (2.4)
This result follows from the linearity of integration in the same way that
Equation (2.2) does, but no further details of the proof are given.
Exercise 2.12
As with the expected value, these moments are subject to the existence of
their defining integrals.
To distinguish them from other versions of moments, one of which will be
introduced in the next section, these moments are sometimes called the
raw moments. Broadly speaking, the lower the value of r, the more
important the corresponding moment is in statistics!
For some distributions, it is as easy to determine the general rth moment
as it is the mean, so all the (raw) moments might as well be calculated at
once.
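Computer algebra makes this 'all at once' point concrete. A minimal sketch, assuming SymPy, computing the general rth raw moment of the uniform distribution on (0, 1):

```python
import sympy as sp

x = sp.symbols('x', positive=True)
r = sp.symbols('r', positive=True)

moment = sp.integrate(x**r * 1, (x, 0, 1))   # E(X^r) with f(x) = 1
print(sp.simplify(moment))                   # 1/(r + 1), for every r at once
```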
Exercise 2.13
Calculate the rth moment of the power distribution, which has density f(x) = βx^{β−1} on (0, 1). Can you deduce the formula for the rth moment of the uniform distribution on (0, 1) (Example 2.6) from your result?
It will make life easier below to also define µ_0 = E{(X − µ)^0} and µ_1 = E{(X − µ)^1}, but these both take particular constant values.
Exercise 2.14
The rth central moment can be written in terms of the mean µ and the raw moments up to and including the rth, and the rth raw moment can be written in terms of the mean and the central moments up to the rth. The latter formula will be derived next. The derivation starts with a handy little trick: write X = X − µ + µ so that
X^r = (X − µ + µ)^r = {(X − µ) + µ}^r.
Now the binomial expansion
(a + b)^r = ∑_{i=0}^{r} C(r, i) a^i b^{r−i},
where C(r, i) = r!/{i!(r − i)!} is the binomial coefficient, can be used. Set a = X − µ and b = µ to get
X^r = ∑_{i=0}^{r} C(r, i) (X − µ)^i µ^{r−i}.
Taking expectations of both sides and using the linearity of expectation (Equation (2.4)) then gives
E(X^r) = ∑_{i=0}^{r} C(r, i) E{(X − µ)^i} µ^{r−i} = ∑_{i=0}^{r} C(r, i) µ_i µ^{r−i}.
Thus the rth raw moment E(X^r) has been written in terms of the mean µ, µ_0 = 1, µ_1 = 0 and the central moments µ_2, µ_3, . . . , µ_r.
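This identity is easy to sanity-check with computer algebra. A sketch, assuming SymPy, with the illustrative choices X uniform on (0, 1) and r = 3:

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Integer(1)                                   # pdf of U(0, 1)
mu = sp.integrate(x * f, (x, 0, 1))                 # mean: 1/2

# central moments mu_0, ..., mu_3 and the raw moment E(X^3)
central = [sp.integrate((x - mu)**i * f, (x, 0, 1)) for i in range(4)]
raw3 = sp.integrate(x**3 * f, (x, 0, 1))            # 1/4

rhs = sum(sp.binomial(3, i) * central[i] * mu**(3 - i) for i in range(4))
print(raw3, sp.simplify(rhs))                       # both 1/4
```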
Exercise 2.15
Show that the rth central moment µ_r can be written in terms of the mean µ and the raw moments up to the rth as follows:
µ_r = ∑_{i=0}^{r} C(r, i) (−1)^{r−i} E(X^i) µ^{r−i}.
In Example 2.4, it was shown that for the uniform distribution on (0, 1), µ = 1/2. This can be used in calculating the variance of the uniform distribution:
V(X) = E{(X − µ)^2} = E{(X − 1/2)^2}
= ∫_{0}^{1} (x − 1/2)^2 dx
= ∫_{0}^{1} (x^2 − x + 1/4) dx
= [x^3/3 − x^2/2 + x/4]_{0}^{1}
= (1/3 − 1/2 + 1/4) − 0
= 1/12.
(Remember, f(x) = 1 on 0 < x < 1.)
The standard deviation is therefore
σ = √V(X) = √(1/12) = 0.29 (correct to two decimal places).
If, having worked out the variance directly in Example 2.7, you were concerned that such calculations could quickly start to become tricky, you will be pleased to find that there is an alternative, easier, route to the same answer. It depends on the following simple link between the second central moment (the variance) and the first and second raw moments, µ and E(X^2):
V(X) = E{(X − µ)^2}
= E(X^2 − 2µX + µ^2)
= E(X^2) − 2µ E(X) + µ^2
= E(X^2) − 2µ^2 + µ^2
= E(X^2) − µ^2.
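The shortcut is easily confirmed numerically for the uniform distribution of Example 2.7; a minimal sketch, assuming SciPy:

```python
from scipy.integrate import quad

mu, _ = quad(lambda x: x, 0, 1)        # E(X) for f(x) = 1 on (0, 1)
EX2, _ = quad(lambda x: x**2, 0, 1)    # E(X^2)
print(EX2 - mu**2)                     # 0.0833... = 1/12, as before
```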
Exercise 2.16
Exercise 2.17
Write Y = aX + b.
(a) What is E(Y ) in terms of a, b, E(X) and V (X)?
(b) What is V (Y ) in terms of a, b, E(X) and V (X)?
If Y = aX + b, then
E(Y) = a E(X) + b and V(Y) = a^2 V(X).    (2.6)
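A quick simulation check of Equations (2.6), assuming NumPy; X uniform on (0, 1) with a = 3 and b = −2 is an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=1_000_000)
y = 3 * x - 2

print(y.mean(), 3 * x.mean() - 2)   # both close to 3(1/2) - 2 = -0.5
print(y.var(), 9 * x.var())         # both close to 9 x 1/12 = 0.75
```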
If Y = aX + b, a > 0, then
f_Y(y) = (1/a) f_X((y − b)/a).    (2.7)
If f_X has support (c, d), then f_Y has support (ac + b, ad + b).
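Equation (2.7) can also be checked by simulation. A sketch, assuming NumPy, with X exponential (λ = 1) and the arbitrary transformation Y = 2X + 1, comparing an empirical density estimate near y = 2 with the formula:

```python
import numpy as np

rng = np.random.default_rng(2)
y = 2 * rng.exponential(scale=1.0, size=1_000_000) + 1   # Y = 2X + 1

f_X = lambda x: np.exp(-x)                 # pdf of X on x > 0
f_Y = lambda v: 0.5 * f_X((v - 1) / 2)     # Equation (2.7) with a = 2, b = 1

h = 0.05                                   # half-width of a small window
print(((y > 2 - h) & (y < 2 + h)).mean() / (2 * h))   # empirical density
print(f_Y(2.0))                                       # 0.5 exp(-0.5) = 0.303...
```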
Exercise 2.18
Suppose now that X follows the standard normal distribution and define Y = σX + µ.
(a) What are E(Y) and V(Y)?
(b) Let the pdf of the standard normal distribution be designated φ(x). What, in terms of µ, σ and φ, is the pdf of Y?
(c) Use the result of part (b) and the formula for φ(x) to show that the density of Y is that of the general normal distribution with mean µ and variance σ^2 (a fact with which you should already be familiar).
If E(e^{tX}) exists for all t in some interval (−δ, δ), δ > 0, then the function
M_X(t) = E(e^{tX}), t ∈ (−δ, δ),
is called the moment generating function of X. (For fixed t, e^{tX} is just a particular choice of function h(X) for which expectation can be considered as in Subsection 2.1.)
M_X(t) = 1 + E(X) t + E(X^2) t^2/2! + E(X^3) t^3/3! + · · · + E(X^r) t^r/r! + · · · .    (2.8)
The reason for the name is now apparent: the function M_X(t) generates all the (raw) moments of a distribution by attaching them as coefficients of t^r/r! in a power series expansion of itself.
Moreover, there is a neat way to recover the individual raw moments from the mgf. First, differentiate (2.8) with respect to t to get
M_X′(t) = E(X) + E(X^2) t + E(X^3) t^2/2! + · · ·
(each prime, ′, denotes differentiating once with respect to t) and again to get
M_X′′(t) = E(X^2) + E(X^3) t + · · · .
Then set t = 0:
M_X′(0) = E(X),    M_X′′(0) = E(X^2).
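This derivative trick is easy to automate. A minimal sketch, assuming SymPy and taking as given the standard normal mgf exp(t^2/2) derived in the examples below:

```python
import sympy as sp

t = sp.symbols('t')
M = sp.exp(t**2 / 2)                  # mgf of the standard normal

print(sp.diff(M, t).subs(t, 0))       # E(X)   = 0
print(sp.diff(M, t, 2).subs(t, 0))    # E(X^2) = 1
print(sp.diff(M, t, 4).subs(t, 0))    # E(X^4) = 3
```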
2.7.1 Examples
The mgf is particularly straightforward to obtain for the standard normal
and exponential distributions.
For the standard normal distribution, completing the square in the exponent of E(e^{tX}) = ∫ e^{tx} φ(x) dx leaves the integral of another normal density, which is 1, so
M_X(t) = exp(t^2/2) × 1 = exp(t^2/2).
Now, the derivative of M_X(t) = exp(t^2/2) with respect to t is
M_X′(t) = t exp(t^2/2),
so µ = E(X) = M_X′(0) = 0, confirming the value of the mean of the standard normal distribution. Differentiating again, using the product rule,
M_X′′(t) = exp(t^2/2) + t^2 exp(t^2/2) = (1 + t^2) exp(t^2/2),
so E(X^2) = M_X′′(0) = 1. In addition, it follows that the variance of the standard normal distribution is V(X) = E(X^2) − µ^2 = 1 − 0 = 1.
Exercise 2.19
Exercise 2.20
(a) Calculate the mgf of the exponential distribution. (You may assume that t < λ, which turns out to be the condition necessary for the exponential mgf to exist.)
(b) Hence verify that the mean and variance of the exponential distribution are 1/λ and 1/λ^2, respectively.
(c) The formula for the (r − 1)th derivative of M_X(t), denoted M_X^{(r−1)}(t), is
M_X^{(r−1)}(t) = (r − 1)! λ/(λ − t)^r, t < λ.
Hence obtain the formula for M_X^{(r)}(t), t < λ.
(d) Using the result of part (c), verify that, for the exponential distribution,
E(X^r) = r!/λ^r.
Solutions
Solution 2.1
(a) f_a(x) ≥ 0 for all x ∈ (0, 1). Also,
∫_{0}^{1} x(1 − x) dx = ∫_{0}^{1} (x − x^2) dx
= [x^2/2 − x^3/3]_{0}^{1}
= (1/2 − 1/3) − (0 − 0) = 1/6.
The correct density associated with this function is therefore
f_a(x)/(1/6) = x(1 − x)/(1/6) = 6x(1 − x) on 0 < x < 1.
[Graph of the density 6x(1 − x) on (0, 1), which peaks at height 1.5 at x = 1/2]
(b) The exponential function is positive for all x > 0, but 1 − x is not: it is negative when x > 1. It follows that f_b(x) is negative when x > 1 and hence is not a probability density function on x > 0.
(c) This function is non-negative for all x ∈ (−1, 1). Its integral is
∫_{−1}^{0} (1 + x) dx + ∫_{0}^{1} (1 − x) dx = [x + x^2/2]_{−1}^{0} + [x − x^2/2]_{0}^{1}
= 0 − (−1 + 1/2) + ((1 − 1/2) − 0)
= 1/2 + 1/2 = 1.
Alternatively, you might recognise f_c as the triangular function shown below. If so, the integral, which is the area under the triangle, is 1/2 + 1/2 = 1. This is because the area under the left-hand half of the triangle is half that of the unit square (which is 1/2 × 1), and similarly for the right-hand half.
[Graph of the triangular density f_c(x) on (−1, 1), with peak height 1 at x = 0]
(d) f_d(x) = 1/x is positive for all x > 1, but its integral, ∫_{1}^{∞} (1/x) dx = [log x]_{1}^{∞}, is not finite, so f_d cannot be made into a density.
Solution 2.2
F (x) = P (X ≤ x) = P (X < x) + P (X = x) = P (X < x)
in the continuous case because then P (X = x) = 0.
Solution 2.3
(a) As f is a pdf and must therefore integrate to 1,
lim_{x→∞} F(x) = lim_{x→∞} ∫_{−∞}^{x} f(y) dy = ∫_{−∞}^{∞} f(y) dy = 1.
(b) The derivative of F (x) is the density f (x), which is positive for all x
on its support; it follows that F is increasing on its support.
Solution 2.4
(a) F(x) = 0 for all x ≤ 0. Explicitly, for x ≤ 0,
F(x) = ∫_{−∞}^{x} 0 dy = 0.
(b) On x > 0,
F(x) = ∫_{0}^{x} λe^{−λy} dy = [−e^{−λy}]_{0}^{x} = −e^{−λx} − (−1) = 1 − e^{−λx}.
(c) Differentiating the cdf gives F′(x) = λe^{−λx} on x > 0 (and 0 for x < 0), which is indeed the exponential pdf.
[Graphs of (a) the pdf and (b) the cdf of the exponential distribution]
Solution 2.5
For β < 1, the power density is decreasing. For β = 1, the power density is
flat/constant. For β > 1, the power density is increasing.
Solution 2.6
(a) On 0 < x < 1,
F(x) = ∫_{0}^{x} βy^{β−1} dy = β[y^β/β]_{0}^{x} = x^β − 0 = x^β.
(b) On a < x < b,
F(x) = ∫_{a}^{x} 1/(b − a) dy = {1/(b − a)}[y]_{a}^{x} = (x − a)/(b − a)
(or you could have evaluated the area of the appropriate rectangle).
Solution 2.7
(a) P(1/4 ≤ X ≤ 5/8) = F(5/8) − F(1/4)
= (5/8)^2 (3 − 2(5/8)) − (1/4)^2 (3 − 2(1/4))
= (25/64)(3 − 5/4) − (1/16)(3 − 1/2)
= (25/64)(7/4) − (1/16)(5/2)
= (175 − 40)/256 = 135/256 = 0.53 (correct to two decimal places).
(b) P(log 2 ≤ X ≤ log 4) = F(log 4) − F(log 2)
= (1 − e^{−log 4}) − (1 − e^{−log 2})
= (1 − 1/4) − (1 − 1/2)
= 3/4 − 1/2 = 1/4.
Solution 2.8
(a) P(x_0 ≤ X < x_0 + ε) = P(x_0 ≤ X ≤ x_0 + ε)
= F(x_0 + ε) − F(x_0).
(b) By part (a),
P(x_0 ≤ X < x_0 + ε)/ε = {F(x_0 + ε) − F(x_0)}/ε,
which, as ε → 0, tends to the derivative F′(x_0) = f(x_0), by the definition of differentiation.
Solution 2.9
E{k_1 h_1(X) + k_2 h_2(X)} = ∫_{a}^{b} {k_1 h_1(x) + k_2 h_2(x)} f(x) dx
= ∫_{a}^{b} k_1 h_1(x) f(x) dx + ∫_{a}^{b} k_2 h_2(x) f(x) dx
= k_1 ∫_{a}^{b} h_1(x) f(x) dx + k_2 ∫_{a}^{b} h_2(x) f(x) dx
= k_1 E{h_1(X)} + k_2 E{h_2(X)}.
Solution 2.10
Set h_1(X) = h(X) and h_2(X) = 1. Then Equation (2.2) shows that
E{k_1 h(X) + k_2} = k_1 E{h(X)} + k_2 E(1) = k_1 E{h(X)} + k_2,
since E(1) = 1.
Solution 2.11
E{k h(X)} = k E{h(X)}, by setting k_1 = k and k_2 = 0 in Equation (2.3).
Solution 2.12
E(X) = ∫_{0}^{1} 6x^2(1 − x) dx
= 6 ∫_{0}^{1} (x^2 − x^3) dx
= 6 [x^3/3 − x^4/4]_{0}^{1} = 6((1/3 − 1/4) − (0 − 0)) = 6 × 1/12 = 1/2.
Solution 2.13
E(X^r) = ∫_{0}^{1} x^r βx^{β−1} dx
= β ∫_{0}^{1} x^{r+β−1} dx
= β [x^{r+β}/(r + β)]_{0}^{1} = {β/(r + β)}(1 − 0) = β/(r + β).
The formula for the uniform distribution arises when β = 1, namely 1/(r + 1).
Solution 2.14
µ_0 = E{(X − µ)^0} = E(1) = 1.
By Equation (2.3),
µ_1 = E(X − µ) = E(X) − µ = µ − µ = 0.
Solution 2.15
µ_r = E{(X − µ)^r} = E{∑_{i=0}^{r} C(r, i) X^i (−µ)^{r−i}}
(by setting a = X and b = −µ in the binomial expansion)
= ∑_{i=0}^{r} C(r, i) (−1)^{r−i} E(X^i) µ^{r−i}
(by the linearity of expectation).
Solution 2.16
E(X) = β/(1 + β),    E(X^2) = β/(2 + β),
and so
V(X) = E(X^2) − {E(X)}^2
= β/(2 + β) − {β/(1 + β)}^2
= {β(1 + β)^2 − β^2(2 + β)}/{(2 + β)(1 + β)^2}
= (β + 2β^2 + β^3 − 2β^2 − β^3)/{(2 + β)(1 + β)^2}
= β/{(2 + β)(1 + β)^2}.
Solution 2.17
(a) From Equation (2.3), E(Y) = E(aX + b) = a E(X) + b.
(b) The calculation of V(Y) is similar to the calculations of V(L) and V(S) above:
V(Y) = E{(Y − E(Y))^2}
= E{(aX + b − (aµ + b))^2}
= E{(aX − aµ)^2}
= E{a^2(X − µ)^2} = a^2 E{(X − µ)^2} = a^2 V(X).
Solution 2.18
(a) By Equations (2.6),
E(Y) = σ E(X) + µ = σ × 0 + µ = µ,
V(Y) = σ^2 V(X) = σ^2 × 1 = σ^2.
(b) By Equation (2.7),
f_Y(y) = (1/σ) φ((y − µ)/σ).
(c) φ(x) = {1/√(2π)} exp(−x^2/2)
gives
f_Y(y) = {1/(σ√(2π))} exp{−((y − µ)/σ)^2/2},
which is indeed the pdf of the normal distribution with mean µ and variance σ^2.
Solution 2.19
Starting from M_X′′(t) = (1 + t^2) exp(t^2/2) as in Example 2.9, you should find that
M_X^{(3)}(t) = (d/dt) M_X′′(t)
= 2t exp(t^2/2) + (1 + t^2) t exp(t^2/2)
= (3t + t^3) exp(t^2/2)
and
M_X^{(4)}(t) = (d/dt) M_X^{(3)}(t)
= (3 + 3t^2) exp(t^2/2) + (3t + t^3) t exp(t^2/2)
= (3 + 6t^2 + t^4) exp(t^2/2).
Thus
E(X^4) = M_X^{(4)}(0) = (3 + 0 + 0) × 1 = 3.
Solution 2.20
(a) M_X(t) = E(e^{tX})
= ∫_{0}^{∞} exp(tx) f(x) dx
= ∫_{0}^{∞} exp(tx) λ exp(−λx) dx
= λ ∫_{0}^{∞} exp{(t − λ)x} dx
= λ [exp{(t − λ)x}/(t − λ)]_{0}^{∞}
= {λ/(t − λ)}(0 − 1) = λ/(λ − t)
(note that t − λ < 0 gives e^{(t−λ)x} → 0 as x → ∞).
(b) M_X′(t) = λ/(λ − t)^2, so E(X) = M_X′(0) = λ/λ^2 = 1/λ,
M_X′′(t) = 2λ/(λ − t)^3, so E(X^2) = M_X′′(0) = 2λ/λ^3 = 2/λ^2,
and therefore
V(X) = E(X^2) − {E(X)}^2 = 2/λ^2 − (1/λ)^2 = 1/λ^2.
(c) M_X^{(r)}(t) = (d/dt) M_X^{(r−1)}(t) = r(r − 1)! λ/(λ − t)^{r+1} = r! λ/(λ − t)^{r+1}, t < λ.
(d) E(X^r) = M_X^{(r)}(0) = r! λ/λ^{r+1} = r!/λ^r, as required.