Mathematical Expectation
Reza Abdolmaleki
Probability
Lecture 11
Moments
In statistics, the mathematical expectations defined here, called the moments of the distribution of a
random variable or simply the moments of a random variable, are of special importance.
Definition. The $r$-th moment about the origin of a random variable $X$, denoted by $\mu'_r$, is the expected value
of $X^r$; symbolically,
$$\mu'_r = E(X^r) = \sum_x x^r f(x)$$
for $r = 0, 1, 2, \ldots$ when $X$ is discrete, and
$$\mu'_r = E(X^r) = \int_{-\infty}^{\infty} x^r f(x)\,dx$$
when $X$ is continuous.
It is of interest to note that the term “moment” comes from the field of physics: If the quantities $f(x)$ in
the discrete case were point masses acting perpendicularly to the $x$-axis at distances $x$ from the origin,
$\mu'_1$ would be the $x$-coordinate of the center of gravity, that is, the first moment divided by $\sum f(x) = 1$, and $\mu'_2$ would
be the moment of inertia. This also explains why the moments $\mu'_r$ are called moments about the origin:
In the analogy to physics, the length of the lever arm is in each case the distance from the origin.
The analogy applies also in the continuous case, where $\mu'_1$ and $\mu'_2$ might be the $x$-coordinate of the center of
gravity and the moment of inertia of a rod of variable density.
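For a concrete illustration, the following minimal Python sketch computes the first few moments about the origin of a simple discrete random variable, here the number of heads obtained in two tosses of a balanced coin, directly from the defining sum $\mu'_r = \sum_x x^r f(x)$.

```python
from fractions import Fraction

# Number of heads in two tosses of a balanced coin: f(0) = 1/4, f(1) = 1/2, f(2) = 1/4
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def raw_moment(pmf, r):
    """r-th moment about the origin: mu'_r = sum_x x^r f(x)."""
    return sum(x**r * p for x, p in pmf.items())

for r in range(4):
    print(r, raw_moment(pmf, r))  # mu'_0 = 1, mu'_1 = 1, mu'_2 = 3/2, mu'_3 = 5/2
```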
When $r = 0$, we have $\mu'_0 = E(X^0) = E(1) = 1$ by Corollary 2 of Theorem 2 in the previous Lecture. When $r = 1$, we have $\mu'_1 = E(X)$, which is
just the expected value of the random variable $X$, and in view of its importance in statistics we give it a
special symbol and a special name.
Definition. $\mu'_1$ is called the mean of the distribution of $X$, or simply the mean of $X$, and it is denoted
simply by $\mu$.
The special moments we shall define next are of importance in statistics because they serve to describe the
shape of the distribution of a random variable, that is, the shape of the graph of its probability distribution
or probability density.
Definition. The $r$-th moment about the mean of a random variable $X$, denoted by $\mu_r$, is the expected value of $(X - \mu)^r$;
symbolically,
$$\mu_r = E\left[(X - \mu)^r\right] = \sum_x (x - \mu)^r f(x)$$
for $r = 0, 1, 2, \ldots$ when $X$ is discrete, and
$$\mu_r = E\left[(X - \mu)^r\right] = \int_{-\infty}^{\infty} (x - \mu)^r f(x)\,dx$$
when $X$ is continuous.
It is easy to see that $\mu_0 = 1$ and $\mu_1 = 0$ for any random variable for which $\mu$ exists.
The second moment about the mean is of special importance in statistics
because it is indicative of the spread or dispersion of the distribution of a
random variable; thus, it is given a special symbol and a special name.
Definition. $\mu_2$ is called the variance of the distribution of $X$, or simply the variance of $X$, and it
is denoted by $\sigma^2$, $\mathrm{var}(X)$, or $V(X)$. The positive square root of the variance, $\sigma$, is called the
standard deviation of $X$.
The following figure shows how the variance reflects the spread or dispersion of
the distribution of a random variable. Here we show the histograms of the
probability distributions of four random variables with the same mean but
different variances. As can be seen, a small value of $\sigma^2$ suggests that we are
likely to get a value close to the mean, and a large value of $\sigma^2$ suggests that there is
a greater probability of getting a value that is not close to the mean. This will be
discussed further in the next lecture.
Let us derive the following computing formula for $\sigma^2$:
Theorem 1. $\sigma^2 = \mu'_2 - \mu^2$.
Proof.
$$\sigma^2 = E\left[(X - \mu)^2\right] = E\left(X^2 - 2\mu X + \mu^2\right) = E(X^2) - 2\mu E(X) + \mu^2 = \mu'_2 - 2\mu \cdot \mu + \mu^2 = \mu'_2 - \mu^2.$$
Example 1. Use Theorem 1 to calculate the variance of $X$, representing the number of points rolled with a
balanced die.
Solution.
First we compute
$$\mu = E(X) = 1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{6} + 3 \cdot \tfrac{1}{6} + 4 \cdot \tfrac{1}{6} + 5 \cdot \tfrac{1}{6} + 6 \cdot \tfrac{1}{6} = \tfrac{7}{2}.$$
Now,
$$\mu'_2 = E(X^2) = 1 \cdot \tfrac{1}{6} + 4 \cdot \tfrac{1}{6} + 9 \cdot \tfrac{1}{6} + 16 \cdot \tfrac{1}{6} + 25 \cdot \tfrac{1}{6} + 36 \cdot \tfrac{1}{6} = \tfrac{91}{6},$$
and it follows that
$$\sigma^2 = \mu'_2 - \mu^2 = \tfrac{91}{6} - \left(\tfrac{7}{2}\right)^2 = \tfrac{35}{12}.$$
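The arithmetic of Example 1 is easy to double-check; the short Python sketch below recomputes $\mu$, $\mu'_2$, and $\sigma^2 = \mu'_2 - \mu^2$ for the balanced die using exact fractions.

```python
from fractions import Fraction

outcomes = range(1, 7)                      # faces of a balanced die
p = Fraction(1, 6)                          # each face has probability 1/6

mu = sum(x * p for x in outcomes)           # mu = 7/2
mu2_prime = sum(x**2 * p for x in outcomes) # mu'_2 = 91/6
variance = mu2_prime - mu**2                # sigma^2 = 91/6 - 49/4 = 35/12

print(mu, mu2_prime, variance)              # 7/2 91/6 35/12
```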
Example 2. With reference to Example 2 of Lecture 10, find the standard deviation of the random
variable $X$.
Solution.
In Example 2 of Lecture 10 we found the mean $\mu = E(X)$ of this random variable. Computing $\mu'_2 = E(X^2)$ in the same way, it follows from Theorem 1 that
$$\sigma^2 = \mu'_2 - \mu^2,$$
and the standard deviation is the positive square root
$$\sigma = \sqrt{\mu'_2 - \mu^2}.$$
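For a continuous random variable the same recipe applies with integrals in place of sums. The sketch below is purely illustrative: since the density of Lecture 10's Example 2 is not reproduced here, it uses a hypothetical density $f(x) = 2x$ on $(0, 1)$ as a stand-in.

```python
from scipy.integrate import quad

def f(x):
    # Hypothetical density used only for illustration: f(x) = 2x on (0, 1)
    return 2 * x

mu, _ = quad(lambda x: x * f(x), 0, 1)            # mean: 2/3
mu2_prime, _ = quad(lambda x: x**2 * f(x), 0, 1)  # second raw moment: 1/2
variance = mu2_prime - mu**2                      # Theorem 1: sigma^2 = mu'_2 - mu^2 = 1/18
sigma = variance ** 0.5                           # standard deviation, about 0.236

print(mu, variance, sigma)
```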
The following is another theorem that is of importance in work connected with standard deviations
or variances:
Theorem 2. If $X$ has the variance $\sigma^2$, then
$$\mathrm{var}(aX + b) = a^2\sigma^2.$$
Proof.
$$\mathrm{var}(aX + b) = E\left\{\left[(aX + b) - E(aX + b)\right]^2\right\} = E\left\{\left[aX + b - a\mu - b\right]^2\right\} = E\left[a^2(X - \mu)^2\right] = a^2 E\left[(X - \mu)^2\right] = a^2\sigma^2.$$
Note that for $a = 1$, we find that the addition of a constant $b$ to the values of a random variable, resulting
in a shift of all the values of $X$ to the left or to the right, in no way affects the spread of its
distribution; for $b = 0$, we find that if the values of a random variable are multiplied by a constant $a$, the
variance is multiplied by the square of that constant, resulting in a corresponding change in the
spread of the distribution.
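Theorem 2 can also be confirmed empirically; the following sketch (with arbitrarily chosen constants $a$ and $b$) simulates a random variable and compares the sample variance of $aX + b$ with $a^2$ times the sample variance of $X$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)  # any random variable with finite variance will do

a, b = 3.0, 7.0                               # arbitrary constants
y = a * x + b

# var(aX + b) should be close to a^2 * var(X); the shift b has no effect on the spread
print(np.var(y), a**2 * np.var(x))
```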
Chebyshev’s Theorem
To demonstrate how $\sigma$ or $\sigma^2$ is indicative of the spread or dispersion of the distribution of a random
variable, let us now prove the following theorem, called Chebyshev’s theorem after the nineteenth-
century Russian mathematician P. L. Chebyshev. We shall prove it here only for the continuous case,
leaving the discrete case as an exercise.
Theorem 3. (Chebyshev’s Theorem) If $\mu$ and $\sigma$ are the mean and the standard deviation of a
random variable $X$, then for any positive constant $k$ the probability is at least $1 - \dfrac{1}{k^2}$ that $X$ will take on a
value within $k$ standard deviations of the mean; symbolically,
$$P\left(|X - \mu| < k\sigma\right) \geq 1 - \frac{1}{k^2}, \qquad \sigma \neq 0.$$
Proof. Using the definitions, we write
$$\sigma^2 = E\left[(X - \mu)^2\right] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx.$$
Then, dividing the integral into three parts as shown in the figure, we get
$$\sigma^2 = \int_{-\infty}^{\mu - k\sigma} (x - \mu)^2 f(x)\,dx + \int_{\mu - k\sigma}^{\mu + k\sigma} (x - \mu)^2 f(x)\,dx + \int_{\mu + k\sigma}^{\infty} (x - \mu)^2 f(x)\,dx.$$
Since the integrand $(x - \mu)^2 f(x)$ is nonnegative, we can form the inequality
$$\sigma^2 \geq \int_{-\infty}^{\mu - k\sigma} (x - \mu)^2 f(x)\,dx + \int_{\mu + k\sigma}^{\infty} (x - \mu)^2 f(x)\,dx$$
by deleting the second integral. Therefore, since $(x - \mu)^2 \geq k^2\sigma^2$ for $x \leq \mu - k\sigma$ or $x \geq \mu + k\sigma$, it follows that
$$\sigma^2 \geq \int_{-\infty}^{\mu - k\sigma} k^2\sigma^2 f(x)\,dx + \int_{\mu + k\sigma}^{\infty} k^2\sigma^2 f(x)\,dx$$
and hence that
$$\frac{1}{k^2} \geq \int_{-\infty}^{\mu - k\sigma} f(x)\,dx + \int_{\mu + k\sigma}^{\infty} f(x)\,dx$$
provided $\sigma^2 \neq 0$. Since the sum of the two integrals on the right-hand side is the probability that $X$ will take on a
value less than or equal to $\mu - k\sigma$ or greater than or equal to $\mu + k\sigma$, we have thus shown that
$$P\left(|X - \mu| \geq k\sigma\right) \leq \frac{1}{k^2},$$
and it follows that
$$P\left(|X - \mu| < k\sigma\right) \geq 1 - \frac{1}{k^2}.$$
For instance, the probability is at least $1 - \tfrac{1}{2^2} = \tfrac{3}{4}$ that a random variable $X$ will take on a value within two standard
deviations of the mean, the probability is at least $1 - \tfrac{1}{3^2} = \tfrac{8}{9}$ that it will take on a value within three standard
deviations of the mean, and the probability is at least $1 - \tfrac{1}{5^2} = \tfrac{24}{25}$ that it will take on a value within five standard
deviations of the mean. It is in this sense that $\sigma$ controls the spread or dispersion of the distribution of
a random variable. Clearly, the probability given by Chebyshev’s theorem is only a lower bound;
whether the probability that a given random variable will take on a value within $k$ standard deviations
of the mean is actually greater than $1 - \tfrac{1}{k^2}$ and, if so, by how much we cannot say, but Chebyshev’s theorem
assures us that this probability cannot be less than $1 - \tfrac{1}{k^2}$. Only when the distribution of a random variable is
known can we calculate the exact probability.
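As an illustrative check, the sketch below estimates $P(|X - \mu| < k\sigma)$ by simulation for an exponential random variable and compares it with the Chebyshev lower bound $1 - 1/k^2$; the exponential distribution is chosen arbitrarily, and any distribution with finite variance would serve.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=200_000)  # mean 1, standard deviation 1
mu, sigma = x.mean(), x.std()

for k in (2, 3, 5):
    inside = np.mean(np.abs(x - mu) < k * sigma)  # estimated P(|X - mu| < k*sigma)
    print(k, inside, 1 - 1 / k**2)                # the estimate is always at least the bound
```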
Example 3. If the probability density of $X$ is given by
$$f(x) = \begin{cases} 630\,x^4(1 - x)^4 & \text{for } 0 < x < 1,\\[2pt] 0 & \text{elsewhere,}\end{cases}$$
find the probability that it will take on a value within two standard deviations of the mean and compare
this probability with the lower bound provided by Chebyshev’s theorem.
Solution.
Straightforward integration shows that $\mu = \tfrac{1}{2}$ and $\sigma^2 = \tfrac{1}{44}$, so that $\sigma = \sqrt{1/44}$, or approximately $0.15$. Thus, the probability that $X$
will take on a value within two standard deviations of the mean is the probability that it will take on
a value between $0.20$ and $0.80$, that is,
$$P(0.20 < X < 0.80) = \int_{0.20}^{0.80} 630\,x^4(1 - x)^4\,dx = 0.96.$$
Observe that the statement “the probability is $0.96$” is a much stronger statement than “the probability is at
least $0.75$,” which is provided by Chebyshev’s theorem.
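The figures quoted in Example 3 can be verified by numerical integration, as in the following illustrative sketch.

```python
from scipy.integrate import quad

def f(x):
    # density of Example 3: 630 x^4 (1 - x)^4 on (0, 1), 0 elsewhere
    return 630 * x**4 * (1 - x)**4

mu, _ = quad(lambda x: x * f(x), 0, 1)             # 0.5
var = quad(lambda x: (x - mu)**2 * f(x), 0, 1)[0]  # 1/44, about 0.0227
sigma = var ** 0.5                                 # about 0.15

prob, _ = quad(f, mu - 2 * sigma, mu + 2 * sigma)  # about 0.96, well above the bound 0.75
print(mu, var, sigma, prob)
```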
Moment-Generating Functions
Although the moments of most distributions can be determined directly by evaluating the necessary
integrals or sums, an alternative procedure sometimes provides considerable simplifications. This
technique utilizes moment-generating functions.
Definition. The moment-generating function of a random variable $X$, where it exists, is given by
$$M_X(t) = E\left(e^{tX}\right) = \sum_x e^{tx} f(x)$$
when $X$ is discrete, and
$$M_X(t) = E\left(e^{tX}\right) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$$
when $X$ is continuous.
The independent variable is $t$, and we are usually interested in values of $t$ in the
neighbourhood of $0$.
To explain why we refer to this function as a “moment-generating” function, let us
substitute for $e^{tx}$ its Maclaurin’s series expansion, that is,
$$e^{tx} = 1 + tx + \frac{t^2 x^2}{2!} + \frac{t^3 x^3}{3!} + \cdots.$$
For the discrete case, we thus get
$$M_X(t) = \sum_x \left[1 + tx + \frac{t^2 x^2}{2!} + \cdots\right] f(x) = 1 + \mu t + \mu'_2\,\frac{t^2}{2!} + \mu'_3\,\frac{t^3}{3!} + \cdots,$$
and it can be seen that in the Maclaurin’s series of the moment-generating function
of $X$ the coefficient of $\dfrac{t^r}{r!}$ is $\mu'_r$, the $r$-th moment about the origin. In the continuous case,
the argument is the same.
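The correspondence between series coefficients and moments can be seen concretely with a small computer-algebra sketch; a simple two-point distribution is used here purely for illustration.

```python
import sympy as sp

t = sp.symbols('t')

# A simple two-point distribution: P(X = 0) = P(X = 1) = 1/2
pmf = {0: sp.Rational(1, 2), 1: sp.Rational(1, 2)}

# Moment-generating function M_X(t) = sum_x e^{t x} f(x)
M = sum(sp.exp(t * k) * p for k, p in pmf.items())

# Maclaurin series: the coefficient of t^r/r! is mu'_r
series = sp.series(M, t, 0, 5).removeO()
print(series)  # 1 + t/2 + t**2/4 + t**3/12 + t**4/48

# For this X, mu'_r = E(X^r) = 1/2 for every r >= 1; r! times the coefficient of t^r agrees
for r in range(1, 5):
    print(r, sp.factorial(r) * series.coeff(t, r))  # each prints 1/2
```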
Example 4. Find the moment-generating function of the random variable $X$ whose probability density
is given by
$$f(x) = \begin{cases} e^{-x} & \text{for } x > 0,\\[2pt] 0 & \text{elsewhere,}\end{cases}$$
and use it to find an expression for $\mu'_r$.
Solution.
By definition,
$$M_X(t) = E\left(e^{tX}\right) = \int_0^{\infty} e^{tx}\, e^{-x}\,dx = \int_0^{\infty} e^{-x(1 - t)}\,dx = \frac{1}{1 - t}$$
for $t < 1$.
As is well known, when $|t| < 1$ the Maclaurin’s series for this moment-generating function is
$$M_X(t) = 1 + t + t^2 + t^3 + \cdots = 1 + 1!\cdot\frac{t}{1!} + 2!\cdot\frac{t^2}{2!} + 3!\cdot\frac{t^3}{3!} + \cdots,$$
and hence $\mu'_r = r!$ for $r = 0, 1, 2, \ldots$.
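The computation in Example 4 can be reproduced symbolically; the following illustrative sketch evaluates the defining integral (written in terms of $s = 1 - t$, assumed positive) and then reads $\mu'_r = r!$ off the Maclaurin series.

```python
import sympy as sp

x = sp.symbols('x', positive=True)
s = sp.symbols('s', positive=True)   # stands for 1 - t, assumed positive (i.e. t < 1)
t = sp.symbols('t')

# M_X(t) = integral_0^inf e^{t x} e^{-x} dx = integral_0^inf e^{-(1 - t) x} dx
M_in_s = sp.integrate(sp.exp(-s * x), (x, 0, sp.oo))   # equals 1/s
M = M_in_s.subs(s, 1 - t)                              # M_X(t) = 1/(1 - t) for t < 1
print(M)

# Read off mu'_r = r! from the Maclaurin series: r! times the coefficient of t^r
series = sp.series(M, t, 0, 6).removeO()
for r in range(6):
    print(r, sp.factorial(r) * series.coeff(t, r))     # 1, 1, 2, 6, 24, 120
```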
The main difficulty in using the Maclaurin’s series of a moment-generating
function to determine the moments of a random variable is usually not that of
finding the moment-generating function, but that of expanding it into a Maclaurin’s
series. If we are interested only in the first few moments of a random variable, say,
$\mu'_1$ and $\mu'_2$, their determination can usually be simplified by using the following theorem.
Theorem 4.
$$\left.\frac{d^r M_X(t)}{dt^r}\right|_{t=0} = \mu'_r.$$
This follows from the fact that if a function is expanded as a power series in $t$, the coefficient of $\dfrac{t^r}{r!}$ is the $r$-th
derivative of the function with respect to $t$ at $t = 0$.
Example 5. Given that $X$ has the probability distribution $f(x) = \dfrac{1}{8}\dbinom{3}{x}$ for $x = 0, 1, 2,$ and $3$, find
the moment-generating function of this random variable and use it to determine $\mu'_1$ and $\mu'_2$.
Solution.
In accordance with the definition,
$$M_X(t) = E\left(e^{tX}\right) = \frac{1}{8}\sum_{x=0}^{3}\binom{3}{x} e^{tx} = \frac{1}{8}\left(1 + 3e^{t} + 3e^{2t} + e^{3t}\right) = \frac{1}{8}\left(1 + e^{t}\right)^3.$$
Now, using Theorem 4, we get
$$\mu'_1 = M'_X(0) = \left.\frac{3}{8}\, e^{t}\left(1 + e^{t}\right)^2\right|_{t=0} = \frac{3}{2}$$
and
$$\mu'_2 = M''_X(0) = \left.\left[\frac{3}{8}\, e^{t}\left(1 + e^{t}\right)^2 + \frac{3}{4}\, e^{2t}\left(1 + e^{t}\right)\right]\right|_{t=0} = 3.$$
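Taking the distribution $f(x) = \frac{1}{8}\binom{3}{x}$ used above, the following illustrative sketch checks the two moments both by Theorem 4 (differentiating the MGF) and directly from the definition.

```python
import sympy as sp

t = sp.symbols('t')

# pmf of Example 5: f(x) = (1/8) * C(3, x) for x = 0, 1, 2, 3
pmf = {k: sp.Rational(1, 8) * sp.binomial(3, k) for k in range(4)}

M = sum(sp.exp(t * k) * p for k, p in pmf.items())   # simplifies to (1 + e^t)^3 / 8

# Theorem 4: mu'_r is the r-th derivative of M_X(t) at t = 0
mu1 = sp.diff(M, t, 1).subs(t, 0)                    # 3/2
mu2 = sp.diff(M, t, 2).subs(t, 0)                    # 3

# Direct check from the definition of the raw moments
print(mu1, sum(k * p for k, p in pmf.items()))       # 3/2, 3/2
print(mu2, sum(k**2 * p for k, p in pmf.items()))    # 3, 3
```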
Often the work involved in using moment-generating functions can be simplified by
making use of the following theorem.
Theorem 5. If $a$ and $b$ are constants, then
1. $M_{X+a}(t) = E\left[e^{(X+a)t}\right] = e^{at}\, M_X(t)$;
2. $M_{bX}(t) = E\left(e^{bXt}\right) = M_X(bt)$;
3. $M_{\frac{X+a}{b}}(t) = E\left[e^{\left(\frac{X+a}{b}\right)t}\right] = e^{\frac{a}{b}t}\, M_X\!\left(\frac{t}{b}\right)$.
The proof of this theorem is left as an exercise. The first part of the theorem is of
special importance when $a = -\mu$, and the third part is of special importance when $a = -\mu$ and $b = \sigma$,
in which case
$$M_{\frac{X-\mu}{\sigma}}(t) = e^{-\frac{\mu t}{\sigma}}\, M_X\!\left(\frac{t}{\sigma}\right).$$
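The three identities of Theorem 5 can be verified symbolically for any particular distribution; the following illustrative sketch does so for the finite distribution of Example 5, where every expectation is a finite sum.

```python
import sympy as sp

x, t, a, b = sp.symbols('x t a b')

# Finite discrete distribution (the pmf of Example 5), so every expectation is a finite sum
pmf = {k: sp.Rational(1, 8) * sp.binomial(3, k) for k in range(4)}

def E(g):
    """Expected value E[g(X)] for an expression g in the symbol x."""
    return sum(g.subs(x, k) * p for k, p in pmf.items())

def M(arg):
    """Moment-generating function M_X evaluated at arg: E[e^{arg * X}]."""
    return E(sp.exp(arg * x))

# 1. M_{X+a}(t) = e^{a t} * M_X(t)
print(sp.simplify(E(sp.exp(t * (x + a))) - sp.exp(a * t) * M(t)))              # 0
# 2. M_{bX}(t) = M_X(b t)
print(sp.simplify(E(sp.exp(t * b * x)) - M(b * t)))                            # 0
# 3. M_{(X+a)/b}(t) = e^{(a/b) t} * M_X(t / b)
print(sp.simplify(E(sp.exp(t * (x + a) / b)) - sp.exp(a * t / b) * M(t / b)))  # 0
```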