© 2013 Education Services Australia Ltd, except where indicated otherwise. You may
copy, distribute and adapt this material free of charge for non-commercial educational
purposes, provided you retain all copyright notices and acknowledgements.
Motivation
Content
    Random variables
    Discrete random variables: general ideas
    Probability functions
    Examples of discrete distributions
    Mean of a discrete random variable
    Variance of a discrete random variable
Answers to exercises
Discrete probability distributions
Assumed knowledge
The content of the module Probability, which formalises the ideas in the series of TIMES
modules Chance (Years 1–10).
Motivation
This module introduces discrete random variables. A random variable can be either dis-
crete or continuous. In this module, we cover the first type, and the module Continuous
probability distributions covers the second.
The idea of a random variable builds on the fundamental ideas of probability. Students
need to understand that random variables are conceptually different from the mathe-
matical variables that they have met before. A random variable is linked to observations
in the real world, where uncertainty is involved.
Random variables are central to the use of probability in practice. They are used to model
random phenomena, which means that they are relevant to a wide range of human ac-
tivity. In particular, they are used extensively in many branches of research, including
agriculture, biology, ecology, economics, medicine, meteorology, physics, psychology
and others. They provide a structure for making inferences about the world, when it
is impossible to measure things comprehensively. They are used to model outcomes of
processes that cannot be predicted in advance.
Random variables have distributions. In this module, we describe the essential proper-
ties of distributions of discrete random variables. Distributions can have many forms,
but there are some special types of distributions that arise in many different practical
contexts. In this module, we discuss two of these special distributions: discrete uniform
distributions and geometric distributions.
This module also covers the mean of a discrete random variable, which is a measure of
central location, and the variance and standard deviation, which are measures of spread.
Content
Random variables
Example: Tetris
Consider the following example of a random procedure from the module Probability:
During a game of Tetris, we observe a sequence of three consecutive pieces.
Each Tetris piece has one of seven possible shapes, which are labelled by the letters I, J,
L, O, S, T and Z. So in this random procedure, we can observe a sequence such as JLL,
ZOS, ZSZ, III and so on.
Based on this random procedure, we may define a number of random variables. For
example:
These are not the only random variables that could be defined in this context.
This example illustrates that a random variable takes a numerical value in a specific case,
when the random procedure is carried out.
A convention of notation is that random variables are denoted by capital letters, usually
near the end of the alphabet.
Consider another example of a random procedure from the module Probability: Five
babies born in 1995 are followed up over their lives, and major health and milestone
events are recorded.
This example is only vaguely described, and would be more tightly defined in practice.
But we can see, again, that a number of random variables could be defined:
• Let Ui be the total number of times that individual i moves residence up to age 18.
Then Ui can take values 0, 1, 2, . . . .
• Let Vi be the total number of mobile phones owned by individual i up to age 18. The
possible values for Vi are 0, 1, 2, . . . , which incidentally are the same as those for Ui .
• Let W be the average height of the five people at age 18. Then the value of W must be
positive, but there is no obvious upper bound. The common practice in such cases is
to say that the possible values are W > 0; we will assign extremely low probabilities to
large values.
• Let Ti be the total time spent on Facebook by individual i up to age 18. Then Ti in
this case is limited by the total time span being considered. If we measure Ti in years,
then 0 ≤ Ti ≤ 18; again, values anywhere near the logical maximum of 18 years will be
assigned essentially zero probability.
The random variables given in the previous example are of two distinct types, which are
handled in different ways:
1 A discrete random variable can take only 'discrete' values, such as 0, 1, 2, . . . , which can be listed (even if the list is infinite). In the example, the first and second random variables Ui and Vi are discrete random variables.
2 A continuous random variable can take any value in an interval. In the example, the third and fourth random variables W and Ti are continuous random variables.
This module concerns discrete random variables. The module Continuous probability
distributions deals with continuous random variables.
Example
A fair coin is spun vertically on a flat surface. (This example comes from an exercise in
the module Probability.) Here are two related random variables:
• Let X be the number of heads showing when the coin comes to rest. Then X takes
the value 0 if the coin finishes up ‘tails’, or 1 if the coin finishes up ‘heads’.
• Let Y be the time between the commencement of the spin and the coin coming to
rest, measured in seconds.
A special case of a discrete random variable is one that can take only a finite number of
values. We call this a simple random variable. Since X in the previous example can only
take values 0 and 1, it is a simple random variable.
Discrete random variables: general ideas

We start with the classic example of rolling a fair six-sided die. This scenario has already
been discussed extensively in the module Probability. You may wonder why introductory
discussions of probability and random variables often start with examples involving dice,
cards and coins. The reason is that these are concrete randomising devices with which
we are familiar, and for which we are willing to entertain plausible assumptions about
the idealised model.
As we have seen in the module Probability, for coin tossing in particular, this is not as
straightforward as it seems. When considering the actual use of dice, cards and coins,
the potential for other factors that will invalidate the model should always be kept in
mind. Probability and statistics give us the framework to think about this rationally.
But for the initial example, we assume that the model is correct: the die is symmetric in
every respect and is shaken well before it is rolled. The uppermost face is observed. The
event space for this random procedure is E = {1, 2, 3, 4, 5, 6}, and the possible outcomes
are equiprobable.
Define the discrete random variable X to be the number on the uppermost face when the
die comes to rest. In this case, there is a distinct value of the random variable for each
distinct possible outcome of the random procedure. X can take the values 1, 2, 3, 4, 5, 6,
and no other values. It takes discrete values, integers in this case — the die cannot end
up with an outcome of π or 1.01 — so X is a discrete random variable. Since X can only
take six possible values, it is a simple random variable.
The probability that a discrete random variable X takes the value x is denoted Pr(X = x).
We read this as ‘the probability that X equals x’, which means the probability that X takes
the value x when we actually obtain an observation. For the die-rolling example,
Pr(X = x) = 1/6 if x = 1, 2, 3, 4, 5, 6, and Pr(X = x) = 0 otherwise,
for each real number x. Often, for discrete random variables, it is sufficient to specify in
some way the values with non-zero probability only; the values with zero probability are
usually clear, or clearly implied.
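If a computer is handy, the probability function of the die can be written as a short function and its values checked directly. The following is an illustrative sketch in Python; the name pf_die and the use of exact fractions are our own choices, not part of the module.

    from fractions import Fraction

    def pf_die(x):
        """Probability function of X, the uppermost face of a fair six-sided die."""
        return Fraction(1, 6) if x in (1, 2, 3, 4, 5, 6) else Fraction(0)

    print(pf_die(3))                             # 1/6
    print(pf_die(3.5))                           # 0 -- the die cannot show 3.5
    print(sum(pf_die(x) for x in range(1, 7)))   # 1 -- the probabilities sum to one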
There are other simple random variables that can be defined for the random procedure of rolling a die. For example, we could let Y = 1 if the uppermost face is even and Y = 0 otherwise, and we could define a second indicator variable Z on the same roll.
These two examples are not terribly interesting; but they illustrate the important point
that a single random procedure can accommodate several random variables. Note that
“Y = 1” and “Z = 0” are events, in that they define subsets of the event space E . The event
“Y = 1” is {2, 4, 6}. This is a crucial insight; it makes it feasible to obtain the probability
distribution of a random variable.
This leads to the definition of a random variable from a formal perspective. Students
accustomed to formal mathematical treatments of topics sometimes find the description
of a random variable given so far somewhat elusive. A random variable can be defined
formally in a way that strongly relates to mathematical topics that students have covered
elsewhere, specifically, functions.
A random variable is a numerical-valued function that maps the event space E to the set
of real numbers. Students will not be familiar with a ‘variable’ that is a function. It is an
important conceptual point, represented in the following diagram.
[Diagram: the random variable X pictured as a function mapping each outcome in the event space ε to a real number x in ℝ.]
Probability functions
To work out the probability that a discrete random variable X takes a particular value x,
we need to identify the event (the set of possible outcomes) that corresponds to “X = x”.
In general, the function used to describe the probability distribution of a discrete random
variable is called its probability function (abbreviated as pf). The probability function
of X is the function p_X : ℝ → [0, 1] given by p_X(x) = Pr(X = x).
In general, the probability function p X (x) may be specified in a variety of ways. One way
is to specify a numerical value for each possible value of X ; we have done that for the
die-rolling example. In the die-rolling example, the random variable X can take exactly
six values, and no others, and we assert that the probability that X takes any one of these
values is the same, namely 1/6.
As is the case generally for functions, the lower-case x here is merely the argument of the
function. If we write Pr(X = y), it is essentially the same function, just as f(x) = 2x² + 3x − 1 and f(y) = 2y² + 3y − 1 are the same function. But it helps to associate the corresponding
lower-case letter with the random variable we are considering.
Another way to specify the probability function is using a formula. We will see examples
of this in the next section.
Less formally, the probability distribution may be represented using a graph, with a spike
of height p X (x) at each possible value x of X . If there are too many possible values of X
for this visual representation to work, we may choose to omit probabilities that are very
close to zero; such values will typically be invisible on a graph anyway.
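A graph of this kind is easy to produce with software. The sketch below assumes Python with the matplotlib library is available; any graphing tool would serve equally well. It draws the spike graph for the fair die shown next.

    import matplotlib.pyplot as plt

    x_values = [1, 2, 3, 4, 5, 6]
    probabilities = [1/6] * 6          # fair die: each value has probability 1/6

    # Draw a spike of height p_X(x) at each possible value x.
    plt.stem(x_values, probabilities, basefmt=" ")
    plt.xlabel("x")
    plt.ylabel("Pr(X = x)")
    plt.ylim(0, 0.25)
    plt.title("Probability function of a fair six-sided die")
    plt.show()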
The following graph shows the probability function for the outcome of rolling a fair die.
[Graph: a spike of height 1/6 at each of x = 1, 2, 3, 4, 5, 6.]
The distribution of X , the number on the uppermost face when a fair die is rolled.
The importance of the notational convention to use capital letters for random variables
now becomes apparent. A random variable X has a distribution. Formally, X is a func-
tion from the event space to the real line. An observed value of X , such as x = 4, is simply
a number. Once we have rolled the die, we have the outcome. If we observed x = 4, this
does not have a distribution. It is the observed value, and a number.
Quite often, there is more than one random variable being considered. That is the reason
for writing p X (x) for the probability function of X , to distinguish it from p Y (y), say, the
probability function of Y .
Since the probability function p_X(x) is a probability, it must obey the axioms. Two important properties of a probability function follow from this:
• p_X(x) ≥ 0 for every real number x
• the sum of p_X(x), over all values x that X can take, equals 1.
Exercise 1
Sneaky Sam has manufactured a deliberately biased six-sided die, with the following
probability distribution for X , the number of spots on the uppermost face when the die
is rolled.
x           1         2         3         4         5         6
Pr(X = x)   (1−θ)/6   (1−θ)/6   (1−θ)/6   (1+θ)/6   (1+θ)/6   (1+θ)/6
Random variables arising in practical settings of real importance are often much less
readily dealt with than those arising in dice rolling. A common approach in such sit-
uations, which directly mirrors one of the approaches to probability, is to estimate the
probability distribution of the random variable from data.
Example
Consider the random variable X defined to be the number of languages in which a ran-
domly chosen Australian child attending school can hold an everyday conversation.
Suppose we take a random sample of 1000 Australian school children and obtain the
following data for the number of languages spoken.
[Table: observed counts for x = 1, 2, . . . , 6 (totalling 1000), and the corresponding relative frequencies.]
So we estimate that 22.6% of Australian school children can hold an everyday conversa-
tion in (exactly) two languages; equivalently, we estimate that Pr(X = 2) = 0.226.
In this case, we cannot be sure that x = 7 (for example) is an impossible value. It may
just be so rare that it did not crop up in the sample.
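Estimating a probability function from data amounts to computing relative frequencies. The Python sketch below illustrates the idea with made-up counts for a sample of 1000 children; only the count of 226 for two languages matches the example above, and the other counts are invented purely for illustration.

    # Hypothetical counts of children speaking x languages (only the 226 for
    # x = 2 comes from the example; the remaining counts are illustrative).
    counts = {1: 663, 2: 226, 3: 76, 4: 23, 5: 9, 6: 3}
    n = sum(counts.values())                       # 1000 children in the sample

    # Estimated probability function: the relative frequencies.
    p_hat = {x: count / n for x, count in counts.items()}

    print(p_hat[2])               # 0.226, our estimate of Pr(X = 2)
    print(sum(p_hat.values()))    # 1.0 (up to rounding): estimates sum to one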
Examples of discrete distributions
In the previous two sections, we considered the random variable defined as the number
on the uppermost face when a fair die is rolled. This is an example of a random variable
with a discrete uniform distribution. In general, for a positive integer m, let X be a
discrete random variable with pf p X (x) given by
p_X(x) = Pr(X = x) = 1/m if x = 1, 2, . . . , m, and 0 otherwise.
Then X has a discrete uniform distribution. This is a distribution that arises often in
lotteries and games of chance.
We have seen this distribution in the Powerball example considered in the module Prob-
ability. In commercial lotteries, such as Powerball, it is a regulatory requirement that
each outcome is equally likely. There are 45 possible Powerball numbers (1, 2, . . . , 45). So
if X is the Powerball drawn on a particular occasion, the pf p X (x) of X is given by
p_X(x) = Pr(X = x) = 1/45 if x = 1, 2, . . . , 45, and 0 otherwise.
If this model for the drawing of Powerball numbers is correct, we should expect that,
over a large number of draws, the relative frequencies of the 45 possible numbers are all approximately equal to 1/45 ≈ 0.0222. The following graph shows the relative frequencies
observed in the 853 draws from May 1996 to September 2012.
[Graph: the relative frequency of each Powerball number 1–45 in the 853 draws, with the model value 1/45 ≈ 0.0222 marked for comparison.]
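To get a feel for how much the relative frequencies should wobble around 1/45 under the uniform model, we can simulate draws. The Python sketch below (using the standard random module, with an arbitrary seed) simulates 853 draws; it is an illustration of the model, not the actual draw data.

    import random
    from collections import Counter

    random.seed(1)                 # arbitrary seed, for reproducibility

    draws = 853                    # same number of draws as the period above
    outcomes = [random.randint(1, 45) for _ in range(draws)]   # uniform model

    freqs = Counter(outcomes)
    rel_freqs = {k: freqs[k] / draws for k in range(1, 46)}

    print(1 / 45)                                             # 0.0222..., the model value
    print(min(rel_freqs.values()), max(rel_freqs.values()))   # spread of the 45 relative frequencies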
The set of possible values for a discrete random variable may be infinite, as the following
example shows.
Example
Fix a real number p with 0 < p < 1. Now let X be the discrete random variable with
probability function p X (x) given by
p_X(x) = Pr(X = x) = p(1 − p)^x if x = 0, 1, 2, . . . , and 0 otherwise.
We will confirm that p X (x) has the two requisite properties of a probability function.
First, we have p X (x) ≥ 0, since p > 0 and 1 − p > 0. Second, we have
Σ p_X(x) = p(1 + (1 − p) + (1 − p)² + · · ·)
         = p × 1/(1 − (1 − p))     (sum of an infinite geometric series)
         = 1.
We have already met one special distribution that is given a name (the discrete uniform
distribution). The distribution in the previous example also has a name; it is called a
geometric distribution.
Suppose that a sequence of independent ‘trials’ occur, and at each trial the probability of
‘success’ equals p. Define X to be the number of trials that occur before the first success
is observed. Then X has a geometric distribution with parameter p.
We introduce here a symbol used throughout the modules on probability and statistics.
If X has a geometric distribution with parameter p, we write X =ᵈ G(p). The symbol =ᵈ stands for 'has the distribution', meaning the distribution indicated immediately to the right of the symbol.
Note the use of the generic terms ‘trial’ and ‘success’. They are arbitrary, but they carry
with them the idea of each observation involving a kind of test (i.e., trial) in which we
ask the question: Which one of the two possibilities will be observed, a ‘success’ or a
‘failure’? In this sense, the words ‘success’ and ‘failure’ are just labels to keep track of the
two possibilities for each trial.
Note that X can take the value 0, if a success is observed at the very first trial. Or it can
take the value 1, if a failure is observed at the first trial and then a success at the second
trial. And so on. What is the largest value that X can take? There is no upper limit, in
theory. As the values of X increase, the probabilities become smaller and smaller. The
sum of the probabilities equals one, as shown in the previous example.
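The trial-and-success description translates directly into a simulation, which can help students see where the probabilities p(1 − p)^x come from. Here is a minimal Python sketch; the value of p and the number of runs are our own illustrative choices.

    import random
    from collections import Counter

    random.seed(1)
    p = 0.3              # probability of success at each trial (illustrative)
    runs = 100_000       # number of simulated sequences of trials

    def trials_before_first_success(p):
        """Count the failures that occur before the first success."""
        count = 0
        while random.random() >= p:    # failure with probability 1 - p
            count += 1
        return count

    observed = Counter(trials_before_first_success(p) for _ in range(runs))

    # Compare empirical frequencies with the geometric probabilities p(1 - p)^x.
    for x in range(5):
        print(x, round(observed[x] / runs, 4), round(p * (1 - p) ** x, 4))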
Exercise 2
Recall that Tetris pieces have seven possible shapes, labelled I, J, L, O, S, T, Z. Assume that,
at any stage of the game, all seven shapes are equally likely to be produced, independent
of whatever pieces have been produced previously.
Consider the sequences of three consecutive pieces observed during a game of Tetris,
such as JLL, ZOS, ZSZ, III and so on.
a What is the probability that a sequence of three pieces does not contain a Z?
b Hence, what is the probability that a sequence of three pieces has at least one Z?
c What is the probability that a Z occurs in the first three-piece sequence observed?
d What is the probability that a Z does not occur in the first three-piece sequence, but
does occur in the second such sequence? (Here we are considering non-overlapping
sequences of three pieces.)
e What is the probability that a Z does not occur in the first x three-piece sequences,
but does occur in the (x + 1)st such sequence?
Exercise 3
Julia and Tony play the hand game Rock-paper-scissors. Assume that, at each play, they
make their choices with equal probability (1/3) for each of the three moves, independently
of any previous play.
Mean of a discrete random variable

If you roll a fair die many times, what will be the average outcome? Imagine rolling it
6000 times. You would expect to roll about 1000 ones, 1000 twos, and so on: about 1000
occurrences of each possible outcome. What would be the average value of the outcomes
obtained? Approximately, the average or mean would be

(1000 × 1 + 1000 × 2 + 1000 × 3 + 1000 × 4 + 1000 × 5 + 1000 × 6) / 6000 = 21 000/6000 = 3.5.
This can be thought of as the weighted average of the six possible values 1, 2, . . . , 6, with
weights given by the relative frequencies. Note that 3.5 is not a value that we can actually
observe.
By analogy with data and relative frequencies, we can define the mean of a discrete ran-
dom variable using probabilities from its distribution, as follows.
The mean µ X of a discrete random variable X with probability function p X (x) is given by
µ_X = Σ x p_X(x),

where the sum is taken over all values x for which p_X(x) > 0.
The mean can be regarded as a measure of ‘central location’ of a random variable. It is the
weighted average of the values that X can take, with weights provided by the probability
distribution.
The mean is also sometimes called the expected value or expectation of X and denoted
by E(X ). These are both somewhat curious terms to use; it is important to understand
that they refer to the long-run average. The mean is the value that we expect the long-run
average to approach. It is not the value of X that we expect to observe.
Consider a random variable U that has the discrete uniform distribution with possible
values 1, 2, . . . , m. The mean is given by
µ_U = Σ_{x=1}^m (x × 1/m)
    = (1/m) Σ_{x=1}^m x
    = (1/m) × m(m + 1)/2
    = (m + 1)/2.

For example, the mean for the roll of a fair die is (6 + 1)/2 = 3.5, as expected.
So in the long run, rolling a single die many times and obtaining the average of all the
outcomes, we ‘expect’ the average to be close to 3.5, and the more rolls we carry out, the
closer the average will be.
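This long-run behaviour is easy to demonstrate by simulation. The Python sketch below first computes the mean directly from the probability function and then shows the running average of simulated rolls settling near 3.5; the seed and sample sizes are arbitrary choices.

    import random

    random.seed(1)

    # Mean computed from the distribution: the sum of x * p_X(x).
    pf = {x: 1/6 for x in range(1, 7)}
    mu = sum(x * p for x, p in pf.items())
    print(mu)                                     # 3.5

    # The average of actual rolls approaches the mean as the number of rolls grows.
    for n in (100, 10_000, 1_000_000):
        rolls = [random.randint(1, 6) for _ in range(n)]
        print(n, sum(rolls) / n)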
The use of the terms ‘expected value’ and ‘expectation’ is the reason for the notation E(X ),
which also extends to functions of X .
Exercise 4
Consider again the biased die made by Sneaky Sam. Recall that the distribution of X , the
number of spots on the uppermost face when the die is rolled, is as follows.
x           1         2         3         4         5         6
Pr(X = x)   (1−θ)/6   (1−θ)/6   (1−θ)/6   (1+θ)/6   (1+θ)/6   (1+θ)/6
[Graph: the distribution of X, the number on the uppermost face when a fair die is rolled.]
We can give a general physical interpretation of the mean of a discrete random variable X
with pf p X (x). Suppose we imagine that the x-axis is an infinite see-saw in each direction,
and we place weights equal to p X (x) at each possible value x of X . Then the mean µ X
is at the point which will make the see-saw balance. In other words, it is at the centre of
mass of the system.
Exercise 5
The distributions (labelled ‘a’ to ‘f’) of six different random variables are shown below.
[Six graphs, labelled a to f, each showing the probability function of a random variable taking values 1 to 10, with Pr(X = x) plotted on a scale from 0.0 to 0.6.]
Recall that X is the number of trials before the first success in a sequence of independent
trials, each with probability of success p. Do you expect there to be many trials before
the first success, on average, or just a few?
A result which we state here without proof is that, for X =ᵈ G(p), we have

µ_X = (1 − p)/p.
If p is large (that is, close to 1), then successes are very likely and the wait before the first
success is likely to be short; in this case, µ X is small. On the other hand, if p is small
(close to 0), then failures are very likely and the wait before the first success is likely to be
long; in this case, µ X is large.
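The formula can be checked numerically by truncating the sum Σ x p(1 − p)^x at a large cut-off; this is an illustrative check rather than a proof, and the value of p below is our own choice.

    p = 0.25
    cutoff = 10_000      # terms beyond this contribute a negligible amount

    approx_mean = sum(x * p * (1 - p) ** x for x in range(cutoff))
    print(approx_mean)   # approximately 3.0
    print((1 - p) / p)   # 3.0, the value given by the formula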
Exercise 6
a One of the standard forms of commercial lottery selects six balls at random out of 45.
What is the chance of winning first prize in such a lottery with a single entry?
b Suppose that someone buys a single entry in every draw. What is the distribution of
the number of draws entered before the player wins first prize for the first time?
c What is the expected number of draws before winning first prize for the first time?
d Suppose the draws occur weekly. On average, how many years does the person have
to wait before winning first prize for the first time?
We may wish to find the mean of a function of a random variable X , such as X 2 or log X .
For a discrete random variable X with pf p X (x), consider an arbitrary function of X , say
Y = g (X ). Then the expectation of Y , that is, E(Y ) = µY , is obtained as follows:
µ_Y = Σ y Pr(Y = y)      (summing over y for which Pr(Y = y) > 0)
    = Σ g(x) Pr(X = x)   (summing over x for which Pr(X = x) > 0)
    = Σ g(x) p_X(x).
For the special case of a linear transformation Y = a X +b, we shall see that it follows that
µY = aµ X + b. This is a very useful result; it says that, for a linear transformation, the
mean of the transformed variable equals the transformation of the mean of the original
variable. In particular, if Y = aX , then µY = aµ X , as you might expect. This applies to
changes of units: for example, if the random variable X measures a time interval in days,
and we wish to consider the equivalent time in hours, then we can define Y = 24X and
we know that µY = 24µ X .
To see this, take g(x) = ax + b in the formula above and use the fact that Σ p_X(x) = 1:

µ_Y = Σ g(x) p_X(x)
    = Σ (ax + b) p_X(x)
    = a Σ x p_X(x) + b Σ p_X(x)
    = aµ_X + b,
as claimed.
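The result can also be illustrated numerically for a particular distribution, say the fair die, with illustrative values a = 24 and b = 2 chosen by us:

    pf = {x: 1/6 for x in range(1, 7)}     # fair die
    a, b = 24, 2                           # illustrative choice of a and b

    mu_X = sum(x * p for x, p in pf.items())
    mu_Y = sum((a * x + b) * p for x, p in pf.items())   # E(aX + b) computed directly

    print(mu_Y)           # 86.0
    print(a * mu_X + b)   # 86.0 -- the same value, as the result claims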
Technical note. Every discrete random variable X with a finite set of possible values has a mean µ_X. But it is possible to construct examples where the mean does not exist. For example, consider the discrete random variable X with probability function p_X(x) = 1/x, for x = 2, 4, 8, 16, . . . . Here Σ p_X(x) = Σ_{n=1}^∞ 1/2ⁿ = 1, as required for a probability function. But µ_X = Σ x p_X(x) = Σ_{n=1}^∞ 1 does not exist. Such complications are not considered further in this module.
Variance of a discrete random variable

We have seen that the mean of a random variable X is a measure of the central location
of the distribution of X . If we are summarising features of the distribution of X , it is clear
that location is not the only relevant feature. The second most important feature is the
spread of the distribution.
If values of X near its mean µ X are very likely and values further away from µ X have
very small probability, then the distribution of X will be closely concentrated around µ X .
In this case, the spread of the distribution of X is small. On the other hand, if values
of X some distance from its mean µ X are likely, the spread of the distribution of X will
be large.
These ideas lead to the most important measure of spread, the variance, and a closely
related measure, the standard deviation.
Students have met the concepts of variance and standard deviation when summarising
data. These were the sample variance and the sample standard deviation. The difference
here is that we are referring to properties of the distribution of a random variable.
The variance of X is defined by

var(X) = E[(X − µ)²] = Σ (x − µ)² p_X(x),
where the sum is taken over all values of x for which p X (x) > 0. So the variance of X is
the weighted average of the squared deviations from the mean µ, where the weights are
given by the probability function p X (x) of X .
The standard deviation of X is defined to be the square root of the variance of X . That is,
sd(X) = σ_X = √var(X).
In some ways, the standard deviation is the more tangible of the two measures, since
it is in the same units as X . For example, if X is a random variable measuring lengths
in metres, then the standard deviation is in metres (m), while the variance is in square
metres (m2 ).
Unlike the mean, there is no simple direct interpretation of the variance or standard de-
viation. The variance is analogous to the moment of inertia in physics, but that is not
necessarily widely understood by students. What is important to understand is that, in
relative terms:
• a small standard deviation (or variance) means that the distribution of the random
variable is narrowly concentrated around the mean
• a large standard deviation (or variance) means that the distribution is spread out, with
some chance of observing values at some distance from the mean.
Note that the variance cannot be negative, because it is an average of squared quantities.
This is appropriate, as a negative spread for a distribution does not make sense. Hence,
var(X ) ≥ 0 and sd(X ) ≥ 0 always.
Example
Consider the rolling of a fair six-sided die, with X the number on the uppermost face.
We know that the pf of X is
p_X(x) = 1/6, for x = 1, 2, 3, 4, 5, 6,
and that µ X = 3.5. The variance of X is given by
var(X) = E[(X − µ_X)²]
       = Σ (x − µ_X)² p_X(x)
       = (1/6) Σ (x − µ_X)²
       = (1/6) × ((1 − 3.5)² + (2 − 3.5)² + (3 − 3.5)² + (4 − 3.5)² + (5 − 3.5)² + (6 − 3.5)²)
       = (1/6) × 17.5
       = 35/12 ≈ 2.9167.

Hence, the standard deviation of X is σ_X = √(35/12) ≈ 1.7078.
Exercise 7
Consider again the example of the number of languages spoken by Australian school
children. Define X to be the number of languages in which a randomly chosen Australian
child attending school can hold an everyday conversation. Assume that the probability
function of X , p X (x), is as shown in the following table.
[Table: the values of p_X(x) for x = 1, 2, . . . , 6.]
Exercise 8
Consider the following four discrete distributions.
[Four graphs, labelled a to d, each showing the probability function of a random variable taking values 1 to 10, with Pr(X = x) plotted on a scale from 0.0 to 0.6.]
Answers to exercises

Exercise 1

b Pr(X ≥ 4) = 3 × (1 + θ)/6 = (1 + θ)/2.
Exercise 2
a Assuming independence, (6/7)³ ≈ 0.6297.
b 1 − (6/7)³ ≈ 0.3703.
c 0.3703.
d 0.6297 × 0.3703 ≈ 0.2332.
e 0.6297^x × 0.3703.
f X has a geometric distribution with parameter p = 0.3703. The probability function is given by p_X(x) = 0.3703 × 0.6297^x, for x = 0, 1, 2, . . . .
Exercise 3
a There are nine possible outcomes in a single play, three of which are ties. So the
probability of a tie is 1/3.
b The chance of Julia winning on any single play, including on the first play, is 1/3.
c X has a geometric distribution with parameter p = 1/3.
d p_X(x) = (1/3) × (2/3)^x, for x = 0, 1, 2, . . . .
e p_X(5) = (1/3) × (2/3)⁵ ≈ 0.0439.
f There are at least two ways to approach this problem. One method is to calculate:
Pr(X ≥ 5) = Σ_{x=5}^∞ p_X(x) = Σ_{x=5}^∞ (1/3)(2/3)^x = p_X(5) Σ_{n=0}^∞ (2/3)ⁿ = p_X(5)/(1 − 2/3) ≈ 0.1317.
Alternatively, note that the event “X ≥ 5” (the first success is after at least five failures)
occurs precisely when the first five trials are failures. The probability of failure in a
single trial is 1 − p = 2/3 and the trials are independent, so Pr(X ≥ 5) = (2/3)⁵ ≈ 0.1317.
Exercise 4
a µ_X = Σ x p_X(x) = 6 × (1 − θ)/6 + 15 × (1 + θ)/6 = (7 + 3θ)/2.
Exercise 5
a In each case, the probabilities are all clearly non-negative and the sum of the proba-
bilities equals one, as required.
b You should be able to get fairly close to the actual means (see part c) visually.
c In each case, the mean is given by µ_X = Σ x p_X(x).
[Table: the mean of each of the six distributions a to f.]
Exercise 6
a The chance of a single entry winning first prize is equal to 1/(45 choose 6) = 1/8 145 060.
b The random variable X , the number of draws before a first prize is obtained, has a
geometric distribution with parameter p = 1/8 145 060.
c The mean of X is given by (1 − p)/p = 8 145 059.
d Since draws occur weekly, the average wait is 8 145 059/52 ≈ 156 636 years.
Exercise 7
a µX = 1.52.
b var(X ) = 0.8216, sd(X ) ≈ 0.9064.
c The units are ‘number of languages’.
Exercise 8
a For each graph, we have p_X(x) ≥ 0 and Σ p_X(x) = 1.
b You should be able to get close to the actual means (see part c) visually.
c Each of the four means is equal to 6.
d Guessing variances is much more difficult than guessing means. But for these four
distributions, the order (largest to smallest) should be clear visually: b > a > d > c.
e [Table: the variance of each of the four distributions a to d.]