LESSON 12: THEORY OF PROBABILITY
RANDOM VARIABLES
By random we mean unpredictable: when the term is applied to a random variable, it
means that we cannot predict its future value with certainty. Even though the entire past history of
the variable may be known, its next value remains uncertain. If the variable is of the deterministic
type, no such uncertainty exists.
However, quite a few random variables do exhibit statistical regularity. Consider a simple
experiment of tossing an unbiased coin. We do not know in advance whether the outcome on a
particular toss would be a head or tail. But, we know for sure that in a long sequence of tosses,
about half of the outcomes would be heads. If this does not happen, we suspect either the coin or
the person tossing it is biased. Statistical regularity of averages is an experimentally verifiable
phenomenon in many cases involving random quantities. Hence, we are tempted to develop
mathematical tools for the analysis and quantitative characterization of random variables.
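This regularity is easy to see numerically. Below is a minimal sketch in Python (the toss counts and the seed are our own illustrative choices): each individual toss is unpredictable, yet the printed fraction of heads settles near one half as the number of tosses grows.

    import random

    random.seed(42)  # fixed seed so the run is reproducible

    # Toss a fair coin n times and report the fraction of heads.
    for n in (10, 100, 10_000, 1_000_000):
        heads = sum(random.random() < 0.5 for _ in range(n))
        print(n, "tosses: fraction of heads =", heads / n)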
Random Variable Definition
Although it may look simple at first sight to give a definition of what a random variable is, it
proves to be quite difficult in practice. A random variable, usually written X, is a variable whose
possible values are numerical outcomes of a random experiment. It therefore is a function that
associates a unique numerical value with every outcome of an experiment. The value of the
random variable will vary from trial to trial as the experiment is repeated.
The following paragraphs will cover the details.
Random Experiment: As random variables are outcomes of a random experiment, it is
essential to understand a random experiment as well. While random variables are outcomes, a
random experiment is a process leading to an uncertain outcome before the experiment is run. It
is usually assumed that the experiment can be repeated indefinitely under essentially
homogeneous conditions. The result of a random experiment is not unique; it can be any one of the
various possible outcomes. A simple example is tossing an unbiased coin, where the outcome can be a
head or a tail. If you keep tossing the coin a number of times under essentially homogeneous
conditions, the outcomes will keep flipping between Head and Tail, without your knowing in
advance which toss will result in which outcome.
The outcome of an experiment need not be a number, for example, the outcome in a coin toss
experiment can be 'heads' or 'tails'. However, we often want to represent outcomes as numbers. A
random variable is a function that associates a unique numerical value with every outcome of an
experiment. In the given example, if there are three trials (say), the number of times “Head”
appears can be a random variable, which can assume the values 0, 1, 2, and 3, because in three
trials you can have a minimum of zero Heads and a maximum of three Heads.
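This mapping from outcomes to numbers can be made concrete with a short Python sketch (illustrative only); it lists all eight outcomes of three tosses and the value of X for each:

    from itertools import product

    # All 2**3 = 8 outcomes of three tosses of a coin.
    for outcome in product("HT", repeat=3):
        x = outcome.count("H")  # the random variable: number of Heads
        print("".join(outcome), "-> X =", x)

The printed values of X range over exactly 0, 1, 2, and 3.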
Types of Random Variables
Random variables are classified according to their probability distribution. A random
variable has either an associated probability distribution (discrete random variable) or a probability
density function (continuous random variable). On that basis, there are two types of random
variable: discrete and continuous.
Discrete Random Variables
A discrete random variable is one which may take on only a countable number of distinct values,
such as 0, 1, 2, 3, 4, .... Discrete random variables are usually (but not necessarily) counts. If a
random variable can take only a finite number of distinct values, then it must be discrete.
Examples of discrete random variables include the number of children in a family, the number of
people in an ATM queue, the number of patients in a doctor's surgery, the number of defective
light bulbs in a box of ten etc.
The probability distribution of a discrete random variable is a list of probabilities associated with
each of its possible values. It is also sometimes called the probability function or the probability
mass function.
Suppose a random variable X may take k different values, with the probability that X = xi defined
to be P(X = xi) = pi. The probabilities pi must satisfy the following:
1: 0 ≤ pi ≤ 1 for each i
2: p1 + p2 + ... + pk = 1.
Example
1. A coin is tossed ten times. The random variable X is the number of “Tails” that are noted.
X can take only the values 0, 1, ..., 10, so X is a discrete random variable. The above two
properties hold in this case. For example, the probability of 8 Tails, p8, definitely falls in the
range 0 to 1, and the sum of the probabilities for all possible values of Tails is p0 + p1 +
p2 + ... + p10 = 1.
Note that in the case of ten trials, the number of tails can be 0 to 10.
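Both properties can be verified directly for this example. A minimal sketch in Python, assuming a fair coin so that P(X = i) follows the binomial formula C(10, i) / 2^10 (an assumption the example implies but does not state):

    from math import comb

    # P(X = i): probability of i Tails in 10 tosses of a fair coin.
    p = [comb(10, i) / 2**10 for i in range(11)]

    print(all(0 <= pi <= 1 for pi in p))  # property 1 holds: True
    print(sum(p))                         # property 2 holds: 1.0
    print(p[8])                           # P(8 Tails) = 45/1024, about 0.044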
Continuous Random Variables
A continuous random variable is one which takes an infinite number of possible values (usually
in a given range). Continuous random variables are usually measurements like height, weight,
the amount of sugar in an orange, the time required to finish a task, interest earned, etc. For example,
consider the life of an individual in a community. A person may die immediately at birth (a life of
zero years) or after attaining an age of 110 years (say). Within this range, death may occur at any age.
Therefore the variable “age” can take any value in the range 0 to 110 in this case.
A continuous random variable is not defined at specific values, since there are infinitely many
possible values and the probability that it takes any one specific value is zero. Instead, it is defined
over an interval of values, and is represented by the area under a curve.
Suppose a random variable X may take all values over an interval of real numbers. Then the
probability that X is in the set of outcomes A, P(A), is defined to be the area above A and under a
curve. The curve, which represents a function p(x), must satisfy the following:
1: The curve has no negative values (p(x) ≥ 0 for all x)
2: The total area under the curve is equal to 1.
A curve meeting these requirements is known as a density curve.
Example
1. A light bulb is burned until it burns out. Suppose the life of the bulb ranges between zero
hours (minimum) and 100 hours (maximum). The random variable Y is its lifetime in
hours. Y can take any real value in the range 0 to 100, so Y is a continuous
random variable. It is meaningless to calculate the probability of Y at a specific point in the
specified range; instead we calculate the probability between any two end points in
the range, like 0-10, 50-70, less than 20, more than 90, etc. At every point in the complete
range (0-100), p(y) ≥ 0, and the total area under the probability curve from y = 0 to
y = 100 is equal to one.
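As a sketch of the probability-as-area idea, suppose, purely for illustration, that the lifetime Y is uniformly distributed over 0 to 100 hours, so that p(y) = 1/100 on that interval (the example does not specify the actual density). Probabilities between two end points are then areas under the curve:

    # Assumed density: p(y) = 1/100 on [0, 100], zero elsewhere.
    def p(y):
        return 1 / 100 if 0 <= y <= 100 else 0.0

    def prob(a, b, steps=100_000):
        """Approximate the area under p between a and b by a Riemann sum."""
        width = (b - a) / steps
        return sum(p(a + (i + 0.5) * width) for i in range(steps)) * width

    print(prob(50, 70))   # about 0.20
    print(prob(0, 100))   # about 1.00, the total area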
EXPECTED VALUE
The expected value (or population mean) of a random variable indicates its average or central
value. It is a useful summary value (a number) of the variable's distribution.
Stating the expected value gives a general impression of the behaviour of some random variable
without giving full details of its probability distribution (if it is discrete) or its probability density
function (if it is continuous).
Two random variables with the same expected value can have very different distributions. There
are other useful descriptive measures which describe the shape of the distribution, for example
the standard deviation.
The expected value of a random variable X is symbolised by E(X).
If X is a discrete random variable with possible values x1, x2, x3, ..., xn, and p(xi) denotes P(X =
xi), then the expected value of X is defined by:
E(X) = x1 p(x1) + x2 p(x2) + ... + xn p(xn) = Σ xi p(xi)
where the elements are summed over all values of the random variable X.
If X is a continuous random variable with probability density function f(x), then the expected
value of X is defined by:
E(X) = ∫ x f(x) dx
where the integral is taken over all values of x for which f(x) is defined.
Example
Discrete case: When a die is thrown, each of the possible faces 1, 2, 3, 4, 5, 6 (the xi's) has a
probability of 1/6 (the p(xi)'s) of showing. The expected value of the face showing is therefore:
µ = E(X) = (1 x 1/6) + (2 x 1/6) + (3 x 1/6) + (4 x 1/6) + (5 x 1/6) + (6 x 1/6) = 3.5
Notice that, in this case, E(X) is 3.5, which is not a possible value of X.
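The same computation takes a couple of lines of Python (illustrative only):

    # E(X) for one roll of a fair die: each face has probability 1/6.
    expected = sum(x * (1 / 6) for x in range(1, 7))
    print(expected)  # 3.5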
Expected Values of Random Variables
We already looked at finding the mean in the section on averages. Random variables also have
means, but their means are not calculated by simply averaging the possible values.
The mean of a random variable is more commonly referred to as its Expected Value, i.e. the
value you expect to obtain should you carry out some experiment whose outcomes are
represented by the random variable.
In probability theory, the expected value (or expectation, mathematical expectation, EV,
mean) refers, intuitively, to the value of a random variable one would "expect" to find if one
could repeat the random variable process an infinite number of times and take the average of the
values obtained. More formally, the expected value is a weighted average of all possible values.
In other words, each possible value that the random variable can assume is multiplied by its
assigned weight, and the resulting products are then added together to find the expected value.
The weights used in computing this average are the probabilities in the case of a discrete random
variable, or the values of a probability density function in the case of a continuous random
variable.
Example:
A local club plans to invest $10000 to host a baseball game. They expect to sell tickets worth
$15000. But if it rains on the day of game, they won't sell any tickets and the club will lose all
the money invested. If the weather forecast for the day of the game is a 20% chance of rain, is this
a good investment?
Make a table of the probability distribution for the net return X. If it does not rain, the club gains
$15000 − $10000 = $5000 in ticket sales; if it rains, the whole $10000 investment is lost:

Net return x     Probability P(x)
$5000            0.8
−$10000          0.2

Use the weighted average formula:
E(X) = 5000 × 0.8 + (−10000) × 0.2 = 4000 − 2000 = 2000
The club can expect a return of $2000. So, it's a good investment, though a bit risky.
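A sketch of the same weighted-average computation in Python (the variable names are ours):

    # (value, probability) pairs for the club's net return.
    distribution = [
        (15000 - 10000, 0.8),  # no rain: ticket sales minus investment
        (-10000, 0.2),         # rain: the whole investment is lost
    ]
    expected = sum(value * prob for value, prob in distribution)
    print(expected)  # 2000.0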
In other cases, we are asked to find the values of one or more variables involved in the model for
which the experiment has a given expected value.
Example:
A company makes electronic gadgets. One out of every 50 gadgets is faulty, but the company
doesn't know which ones are faulty until a buyer complains. Suppose the company makes a $3
profit on the sale of any working gadget, but suffers a loss of $80 for every faulty gadget because
they have to repair the unit. Check whether the company can expect a profit in the long term.
Write the probability distribution for the profit X per gadget:

Profit x     Probability P(x)
$3           49/50 = 0.98
−$80         1/50 = 0.02

Use the weighted average formula:
E(X) = 3 × 0.98 + (−80) × 0.02 = 2.94 − 1.60 = 1.34
Since the expected value is positive, the company can expect to make a profit. On average, they
make a profit of $1.34 per gadget produced.
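The long-run interpretation of this expected value can be checked by simulation, as in the following sketch (the sample size and seed are arbitrary choices):

    import random

    random.seed(1)  # arbitrary seed for reproducibility

    n = 1_000_000
    # Each gadget is faulty with probability 1/50: an $80 loss; else a $3 profit.
    total = sum(-80 if random.random() < 1 / 50 else 3 for _ in range(n))
    print(total / n)  # close to the expected value of 1.34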
The intuitive explanation of the expected value above is a consequence of the law of large
numbers: the expected value, when it exists, is almost surely the limit of the sample mean as the
sample size grows to infinity. More informally, it can be interpreted as the long-run average of
the results of many independent repetitions of an experiment (e.g. a die roll).
Suppose random variable X can take value x1 with probability p1, value x2 with probability p2,
and so on, up to value xk with probability pk. Then the expectation of this random variable X is
defined as
E(X) = x1 p1 + x2 p2 + ... + xk pk
Since all the probabilities pi add up to one (p1 + p2 + ... + pk = 1), the expected value can be viewed
as the weighted average, with the pi's being the weights:
E(X) = (x1 p1 + x2 p2 + ... + xk pk) / (p1 + p2 + ... + pk)
If all outcomes xi are equally likely (that is, p1 = p2 = ... = pk), then the weighted average turns
into the simple average. This is intuitive: the expected value of a random variable is the average
of all values it can take; thus the expected value is what one expects to happen on average. If the
outcomes xi are not equally probable, then the simple average must be replaced with the
weighted average, which takes into account the fact that some outcomes are more likely than
others. The intuition, however, remains the same: the expected value of X is what one expects to
happen on average.
Example 1. Let X represent the outcome of a roll of a six-sided die. More specifically, X will be
the number of pips showing on the top face of the die after the toss. The possible values for X are
1, 2, 3, 4, 5, and 6, all equally likely (each having a probability of 1/6). The expectation of X is
E(X) = 1 × 1/6 + 2 × 1/6 + 3 × 1/6 + 4 × 1/6 + 5 × 1/6 + 6 × 1/6 = 21/6 = 3.5
Let X be a discrete random variable taking values x1, x2, ... with probabilities p1, p2, ...
respectively. Then the expected value of this random variable is the infinite sum
E(X) = x1 p1 + x2 p2 + x3 p3 + ... = Σ xi pi
(provided the series converges).
Given that the random variable X is continuous and has a probability density function f(x), the
expected value of the random variable is given by:
E(X) = ∫ x f(x) dx
where the integral is taken over all values of x.
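As an illustrative numerical check, the integral can be approximated by a Riemann sum; here we reuse the uniform lifetime density from the light-bulb example, which is our assumption rather than part of the definition:

    # Assumed density: uniform on [0, 100], as in the light-bulb example.
    def f(x):
        return 1 / 100 if 0 <= x <= 100 else 0.0

    steps = 100_000
    width = 100 / steps
    # Riemann sum approximating the integral of x * f(x) dx over [0, 100].
    expected = sum(((i + 0.5) * width) * f((i + 0.5) * width) * width
                   for i in range(steps))
    print(expected)  # about 50.0, the midpoint of the interval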
Uses and applications
It is possible to construct an expected value equal to the probability of an event by taking the
expectation of an indicator function that is one if the event has occurred and zero otherwise: the
expectation of such an indicator is 1 × P(A) + 0 × (1 − P(A)) = P(A). This
relationship can be used to translate properties of expected values into properties of probabilities,
e.g. using the law of large numbers to justify estimating probabilities by frequencies.
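As a sketch of this idea, take the event "a fair die shows 6", whose indicator has expected value P(6) = 1/6; averaging many simulated indicator values estimates that probability (seed and sample size are arbitrary choices):

    import random

    random.seed(7)  # arbitrary seed for reproducibility

    n = 600_000
    # Indicator: 1 when the event "the die shows 6" occurs, 0 otherwise.
    hits = sum(1 if random.randint(1, 6) == 6 else 0 for _ in range(n))
    print(hits / n)  # close to P(6) = 1/6, about 0.1667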
PROBABILITY DISTRIBUTION
A probability distribution is a table or an equation that links each outcome of a statistical
experiment with its probability of occurrence. The usefulness of probability theory comes in
understanding probability distributions (also called probability functions and probability
densities or masses). Probability distributions list or describe probabilities for all possible
occurrences of a random variable.
Example: Let us consider the case of tossing a coin two times. This simple statistical experiment
has four possible outcomes: HH, HT, TH, and TT. Now, let the variable X represent the
“number of Heads” that result from this experiment. The variable X can take on the values 0, 1,
or 2, where a value of X = 0 signifies that neither toss resulted in a Head, and a value of X = 2
means both tosses gave a Head. We can interpret X = 1 similarly. In this example, X is a
random variable, because its value is determined by the outcome of a statistical experiment.
As stated above, probability distribution is a table or an equation that links each outcome of a
statistical experiment with its probability of occurrence. In the experiment described above, the
table below, which associates each outcome with its probability, is an example of a probability
distribution.
Number of heads Probability
0 0.25
1 0.50
2 0.25
All these probabilities are calculated using the classical approach. The probability of X = 1 is 0.50
because two outcomes (out of four) result in exactly one Head (HT and TH).
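The table can be reproduced by enumerating the four equally likely outcomes, as in this short illustrative sketch:

    from itertools import product
    from collections import Counter

    outcomes = list(product("HT", repeat=2))          # HH, HT, TH, TT
    counts = Counter(o.count("H") for o in outcomes)  # heads per outcome

    for heads in sorted(counts):
        print(heads, counts[heads] / len(outcomes))   # 0 0.25 / 1 0.5 / 2 0.25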
Types of probability distributions: Broadly, there are two types of probability distribution,
discrete and continuous. Within these two broad categories, many theoretical
distributions are defined.
Discrete probability distributions: A discrete probability distribution describes a finite set of
possible occurrences, for discrete variables. For example, the number of successful treatments
out of 4 patients is discrete, because the random variable representing the number of successes
can only be 0, 1, 2, 3, or 4. The probabilities of all possible occurrences, P(0 successes),
P(1 success), ..., P(4 successes), constitute the probability distribution for this discrete random
variable. The example cited above (tossing a coin twice) is also a case of a discrete distribution.
Therefore a discrete probability distribution lists each possible value that a random variable can
take, along with its probability. It has the following properties:
The probability of each value of the discrete random variable is between 0 and 1,
So 0 ≤ p(x) ≤ 1
The sum of all the probabilities is 1, so ∑ p(x) = 1
Example: Consider the table below:
X       -5      6       9
P(x)    .5      .25     .25
This is a probability distribution, since all of the probabilities are between 0 and 1, and they add
to 1.
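These two properties are easy to check mechanically. A small helper, offered as a sketch, applied to the table above:

    def is_valid_pmf(probs, tol=1e-9):
        """Check that each probability is in [0, 1] and that they sum to 1."""
        return all(0 <= p <= 1 for p in probs) and abs(sum(probs) - 1) < tol

    # The table above: X takes -5, 6, 9 with probabilities .5, .25, .25.
    print(is_valid_pmf([0.5, 0.25, 0.25]))  # True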
Continuous probability distributions: A continuous probability distribution describes the
probabilities for a continuous variable. For example, the weight of an infant can be anything
from, say, two kg to more than 6 kg (or thereabouts). Thus the random variable “weight” is
continuous, with an infinite number of possible points between any two values.
When moving from discrete to continuous distributions, the random variable will no longer be
restricted to integer values, but will now be able to take on any value in some interval of real
numbers. Graphically, we will be moving from the discrete bars of a histogram to the curve of a
(possibly piecewise) continuous function.
In the discrete case, probabilities were given by a probability distribution function P(X=x), and
graphically displayed by using its value as the height of each bar. We might also observe that
each of the bars had width 1, and therefore the height of each bar was equal to its area.
In the continuous case, the function f(x) is called the probability density function, and
probabilities are determined by the areas under the curve f(x). So as we move from the discrete
to the continuous case, we need to modify how we interpret the graph, so that we see
probabilities as areas. And yet, the mathematics has not changed at all, since probabilities are
areas in both cases.
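To make the height-equals-area remark concrete, here is a sketch comparing the two cases; both distributions are our own illustrative choices:

    from math import comb

    # Discrete case: bars have width 1, so P(X = x) is both height and area.
    # Here X is the number of heads in 4 tosses of a fair coin.
    pmf = {x: comb(4, x) / 2**4 for x in range(5)}
    print(sum(pmf[x] for x in (2, 3, 4)))  # P(2 <= X <= 4) as a sum of bar areas

    # Continuous case: probability is the area under f between the endpoints.
    def f(x):  # uniform density on [0, 1], an illustrative choice
        return 1.0 if 0 <= x <= 1 else 0.0

    steps = 10_000
    width = 0.5 / steps
    area = sum(f(0.25 + (i + 0.5) * width) * width for i in range(steps))
    print(area)  # P(0.25 <= X <= 0.75), about 0.5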
********