Thanks to visit codestin.com
Credit goes to www.scribd.com

0% found this document useful (0 votes)
277 views51 pages

Chapter 2 - Probability & Random Variables

This document provides a summary of key concepts in probability theory, including: 1) It introduces basic set definitions such as elements, subsets, countable/uncountable sets, and operations like union and intersection. 2) It defines probability measure and outlines the most commonly used probability measures. 3) It states the rules governing the calculation of probabilities for single and joint experiments, and introduces the concept of random variables which are characterized by a probability space, possible values, and a rule for probabilities.

Uploaded by

Frances Diaz
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
277 views51 pages

Chapter 2 - Probability & Random Variables

This document provides a summary of key concepts in probability theory, including: 1) It introduces basic set definitions such as elements, subsets, countable/uncountable sets, and operations like union and intersection. 2) It defines probability measure and outlines the most commonly used probability measures. 3) It states the rules governing the calculation of probabilities for single and joint experiments, and introduces the concept of random variables which are characterized by a probability space, possible values, and a rule for probabilities.

Uploaded by

Frances Diaz
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 51

PROBABILITY 9

CHAPTER TWO
functions and density functions are developed. We then discuss summary meas-
ures or expected values) that frequently prove useful in characterizing
random variables.
Vector-valued random variables (or random vectors, as they are often re-
ferred to) and methods of characterizing them are introduced in Section 2.5.
Review of Probability and Various multivariate distribution and density functions that form the basis of
probability models for random vectors are presented.
Random Variables As electrical engineers, we are often interested in calculating the response
of a system for a given input. Procedures for calculating the details of the
probability model for the output of a system driven by a random input are
developed in Section 2.6.
In Section 2.7, we introduce inequalities for computing probabilities, which
are often very useful in many applications because they require less knowledge
about the random variables. A series approximation to a density function based
.i] on some of its moments is introduced, and an approximation to the distribution
1·' of a random variable that is a nonlinear function of other (known) random vari-
' ables is presented.
;2
Convergence of sequences of random variable is the final topic introduced
in this chapter. Examples of convergence are the law of large numbers and the
i central limit theorem.
11
!
2.1 INTRODUCTION

The purpose of this chapter is to provide a review of probability for those PROBABILITY

1
'I
;\
electrical engineering students who have already completed a course in prob-
ability. We assume that course covered at least the material that is presented
here in Sections 2.2 through 2.4. Thus, the material in these sections is partic-
ularly brief and includes very few examples. Sections 2.5 through 2.8 may or
In this section we outline mathematical techniques for describing the results of
an experiment whose outcome is not known in advance. Such an experiment is
called a random experiment. The mathematical approach used for studying the
'l may not have been covered in the prerequisite course; thus, we elaborate more results of random experiments and random phenomena is called probability
in these sections. Those aspects of probability theory and random variables used theory. We begin our review of probability with some basic definitions and
in later chapters and in applications are emphasized. The presentation in this axioms.
chapter relies heavily on intuitive reasoning rather than on mathematical rigor.
.;1
A bulk of the proofs of statements and theorems are left as exercises for the
J reader to complete. Those wishing a detailed treatment of this subject are re-
'l
CJ 2.2.1 Set Definitions
·1 ferred to several well-written texts listed in Section 2.10.
i
''J We begin our review of probability and random variables with an introduction A set is defined to be a collection of elements. Notationally, capital letters A,
to basic sets and set operations. We then define probability measure and review B, ... , will designate sets; and the small letters a, b, ... , will designate
'-}
the two most commonly used probability measures. Next we state the rules elements or members of a set. The symbol, E, is read as "is an element of,"
,j governing the calculation of probabilities and present the notion of multiple or and the symbol, fl., is read "is not an element of." Thus x E A is read "xis an
;l joint experiments and develop the rules governing the calculation of probabilities element of A."
associated with joint experiments. Two special sets are of some interest. A set that has no elements is called
The concept of random variable is introduced next. A random variable is the empty set or null set and will be denoted by A set having at least one
characterized by a probabilistic model that consists of (1) the probability space, element is called nonempty. The whole or entire space S is a set that contains
I
(2) the set of values that the random variable can have, and (3) a rule for all other sets under consideration in the problem.
.A computing the probability that the random variable has a value that belongs to A set is countable if its elements can be put into one-to-one correspondence
l a subset of the set of all permissible values. The use of probability distribution with the integers. A countable set that has a finite number of elements and the
;
J
j
.


I
l

10 REVIEW OF PROBABILITY AND RANDOM VARIABLES PROBABILITY 11


(
1 null set are called finite sets. A set that is not countable is called uncountable. and is the .set of all elements .that belong to both A and B. A n B is also written
A set that is not finite is called an infinite set. AB. The intersection of N sets is written as

Subset. Given two sets A and B, the notation


n A;
N
Al n Az n ... nAN=
l i=l

4
ACB
Mr
Mutually Exclusive. Two sets are called mutually exclusive (or disjoint) if they
or equivalently
have no common elements; that is, two arbitrary sets A and B are mutually
exclusive if
B::JA

An B = AB =¢
is read A is contained in B, or A is a subset of B, orB contains A. Thus A is
contained in BorA C B if and only if every element of A is an element of B.
There are three results that follow from the foregoing definitions. For an where ¢ is the null set.
I arbitrary set, A Then sets A2 , ••• , An are called mutually exclusive if

ACS A; n Aj = ¢ for all i, j, i "' j

¢cA
Complement. The complement, A, of a set A relative to S is defined as the
ACA
set of all elements of S that are not in A.
Let S b.e the whole space and let A, B, C be arbitrary subsets of S. The
Set Equality. Two arbitrary sets, A and B, are called equal if and only if they following results can be verified by applying the definitions and verifying that
contain exactly the same elements, or equivalently, each is a subset of the other. Note that the operator precedence is (1) paren-
theses, (2) complement, (3) intersection, and (4) union.
A = B if and only if A C B and B CA
Commutative Laws.
1 Union. The Union of two arbitrary sets, A and B, is written as
AUB=BUA

AUB AnB=BnA

and is the set of all elements that belong to A or belong to B (or to both). The Associative Laws.
union of N sets is obtained by repeated application of the foregoing definition
'1 and is denoted by
(AU B) U C =AU (B U C) =AU B U C

N (A n B) n C = A n (B n C) = A nBn C
A 1 U A 2 U · · · U AN = U A;
i= 1

Distributive Laws.

J Intersection. The intersection of two arbitrary sets, A and B, is written as


A n (B U C) = (A n B) U (A n C)
1
AnB A U (B n C) = (A U B) n (A U C)

J
12 PROBABILITY 13

{
REVIEW OF PROBABILITY AND RANDOM VARIABLES

,_f
;.'{ DeMorgan's Laws. l. Y(.S) = 1 (2.1)
Yi"

axioms
2. 0 for all A C S (2.2)
4;
P(A) 2::
(Au B)= An B
(An B)= Au B 3. P (
N:
Ak
)
= x-
N

1
P(Ak) (2.3)

if A; n Ai = ¢fori# j,
-! and N'may be infinite
2.2.2 Sample Space (¢ is the empty or null set)
When applying the concept of sets in the theory of probability, the whole space
will consist of elements that are outcomes of an experiment. In this text an
experiment is a sequence of actions that produces outcomes (that are not known A random experiment is completely described by a sampi<e space, a probability
in advance). This definition of experiment is broad enough to encompass the measure (i.e., a rule for assigning probabilities), and the class of sets forming
usual scientific experiment and other actions that are sometimes regarded as the domain set of the probability measure. The combinatiom of these three items
observations. is called a probabilistic model.
The totality of all possible outcomes is the sample space. Thus, in applications By assigning numbers to events, a probability measure distributes numbers
of probability, outcomes correspond to elements and the sample space corre- over the sample space. This intuitive notion has led to tE!e use of probability
sponds to S, the whole space. With these definitions an event may be defined distribution as another name for a probability measure. 'We now present two
as a collection of outcomes. Thus, an event is a set, or subset, of the sample widely used definitions of the probability measure.
space. An event A is said to have occurred if the experiment results in an outcome
that is an element of A. Relative Frequency Definition. Suppose that a random experiment is repeated
For mathematical reasons, one defines a completely additive family of subsets n times. If the event A occurs nA times, then its probability P(A) is defined as
:f'
of S to be events where the class, S, of sets defined on S is called completely the limit of the relative frequency nA/n of the occurrence of A. That is
'
··;
additive if
I
t. lim nA (2.4)
1. scs P(A)
n-oo n
n
2. If Ak C S for k = 1, 2, 3, ... , then U Ak C S for n 1, 2, 3, ...
For example, if a coin (fair or not) is tossed n times and heads show up nu
times, then the probability of heads equals the limiting value of nuln.
3. If A C S, then A C S, where A is the complement of A

Classical Definition. In this definition, the probability P(A) of an event A is


found without experimentation. This is done by counting the total number, N,
2.2.3 Probabilities of Random Events of the possible outcomes of the experiment, that is, the number of outcomes in
Using the simple definitions given before, we now proceed to define the prob- S (Sis finite). If NA of these outcomes belong to event A, then P(A) is defined
abilities (of occurrence) of random events. The probability of an event A, de- to be
noted by P(A), is a number assigned to this event. There are several ways in
which probabilities can be assigned to outcomes and events that are subsets of
the sample space. In order to arrive at a satisfactory theory of probability (a P(A) ;, NA (2.5)
N
theory that does not depend on the method used for assigning probabilities to
events), the probability measure is required to obey a set of axioms.
If we use this definition to find the probability of a tail when a coin is tossed,
Defmition. A probability measure is a set function whose domain is a com- we will obtain an answer oft. This answer is correct when we have a fair coin.
pletely additive class S of events defined on the sample space S such that the If the coin is not fair, then the classical definition will lead to incorrect values
measure satisfies the following conditions: for probabilities. We can take this possibility into account and modify the def-
..,

14 "''


11
14 REVIEW OF PROBABILITY AND RANDOM VARIABLES

inition as: the probability of an event A consisting of NA outcomes equals the 2. For an arbitrary event, A
PROBABILITY 15

ratio NAI N provided the outcomes are equally likely to occur.



I
The reader can verify that the two definitions of probabilities given in the
preceding paragraphs indeed satisfy the axioms stated in Equations 2.1-2.3. The 3. If A U A =S and A n A = ¢,
P(A)::; 1

then A is called the complement of A


(2.7)


difference between these two definitions is illustrated by Example 2.1. and
P(A) = 1 - P(A) (2.8)

,. EXAMPLE 2.1. (Adapted from Shafer [9]). 4. If A is a subset of B, that is, A C B, then
P(A)::; P(B) (2.9)
• DIME-STORE DICE: Willard H. Longcor of Waukegan, Illinois, reported in
5. P(A U B) = P(A) + P(B) - P(A n B) (2.10.a)
• the late 1960s that he had thrown a certain type of plastic die with drilled pips
over one million times, using a new die every 20,000 throws because the die 6. P(A U B) ::; P(A) + P(B) (2.10.b)
wore down. In order to avoid recording errors, Longcor recorded only whether 7. If Al> A 2 , • • • , An are random events such that
,.
f
the outcome of each throw was odd or even, but a group of Harvard scholars
A; n Ai = ¢ for i ¥- j (2.10.c)
who analyzed Longcor's data and studied the effects of the drilled pips in the
die guessed that the chances of the six different outcomes might be approximated and
-
II
by the relative frequencies in the following table:
A 1 U Az U · · · U An = S (2.10.d)
II then

• P(A) = P(A n S) = P[A n (A 1 U A 2 U · · · U An)]

• Upface
Relative
Frequency
1

.155
2

.159
3

.164 .169
4 5

.174
6

.179
Total

1.000
= P[(A
= P(A
n A 1) U (A n A 2 ) U · · · U (A
n A 1) + P(A n A 2) + · · · + P(A
n An)J
n An) (2.10.e)

Classical 1 1.
1.000 The sets Ar, A 2 , • • • , An are said to be and exhatlS.tive
6 6
if Equations 2.10.c and 2.10.d are satisfied .

• P A;) + + +
• 8. = P(A 1) P(A 1Az) P(ArAzAJ)

• They obtained these frequencies by calculating the excess of even over odd in + p ( An ir:! A;
n-1 )
(2.11)

•• Longcor's data and supposing that each side of the die is favored in proportion
to the extent that is has more drilled pips than the opposite side. The 6, since
it is opposite the 1, is the most favored .
Proofs of these relationships are left as an exercise for the reader.


II
1IIC 2.2.5 Joint, Marginal, and Conditional Probabilities

• 2.2.4 Useful Laws of Probability


Using any of the many definitions of probability that satisfies the axioms given
In many engineering applications we often perform an experiment that consists
of many Two examples are the simultaneous observation ofthe
input and output digits of a binary communication system, and simultaneous
in Equations 2.1, 2.2, and 2.3, we can establish the following relationships:
• observation of the trajectories of several objects in space. Suppose we have a
random experiment E that consists of two subexperiments £ 1 and £ 2 (for ex-
• 1. If ¢ is the null event, then
P(¢) = 0 (2.6)
ample,£: toss a die and a coin; £ 1: toss a die; and £ 2 : toss a coin). Now if the
sample spaceS 1 of £ 1 consists of outcomes a 1, a 2 , • • • , an 1 and the sample space
•l
,<

'l 16 REVIEW OF PROBABILITY AND RANDOM VARIABLES PROBABILITY 17

S2 of E 2 consists of outcomes bl> b2 , ••• , bn,, then the sample space S of the NAB be the number of outcomes belonging to events A, B, and AB, respectively,
combined experiment is the Cartesian product of S1 and S2 • That is and let N total number .of m.uoomes in the sample space. Then,

event A
s = sl X Sz oooo NA = #outcomes of
NAB
P(AB) = N
= {(a;, bi): i = 1, 2, ... , n 1, j = 1, 2, ... , nz}
oooo NB = # Outcomes Of NA
S2 and S = S1 x S2 • If events A 1 ,
P(A) = N (2.13)
We can define probability measures on went B
A 2 , • • • , An are defined for the first subexperiment and the events
Oferrent AB
B 2 , • • • , Bm are defined for the second subexperiment £ 2 , then event A;Bi is •
NAB = # Outcomes
an event of the total experiment. Given that the event A has occurred, we know that the outcome is in A. There
are NA outcomes in A. Now, for B to occur given that A has occurred, the
Joint Probability. The probability of an event such as A; n Bi that is the outcome should belong to A and B. There are NAB outcomes in AB. Thus, the
intersection of events from subexperiments is called the joint probability of the probability of occurrence of B given A has occurred is
event and is denoted by P(A; n Bi). The abbreviation A;Bi is often used to


denote A; n Bi.

Marginal Probability, If the events A 1, A 2 , • • • , An associated with subex-


P(BjA) = NAB
NA
= NA 8 /N
NA/N
= DHABI
periment £ 1 are mutually exclusive and exhaustive, then ports
The implicit assumption here is that NA ¥ 0. Based on this motivation we define
P(BJ P(Bi n S) = P[Bi n (A 1 U A 2 U · · · U An)]
n
conditional probability by
L P(A;Bi)
i=l
(2.12)
P(BjA) P(A) ¥ 0 (2.14)

Since Bi is an event associated with subexperiment £ 2 , f'(,J!i} is


PLOJ?.<!I:>i!i!Y.
O
One can show that P(BIA) as defined by Equation 2.14 is a probability measure,
that is, it satisfies Equations 2.1, 2.2, and 2.3.
-
Conditional Probability. Quite often, the probability of occurrence of event
Bi may depend on the occurrence of a related event A;. For example, imagine -
e W ③Axioms to
a box containing six resistors and one capacitor. Suppose we draw a component Relationships Involving Joint, Marginal, and Conditional Probabilities. The
from the box. Then, without replacing the first component, we draw a second reader can use the results given in Equations 2.12 and 2.14 to establish the
component. Now, the probability of getting a capacitor on the second draw
following useful relationships.
depends on the outcome of the first draw. For if we had drawn a capacitor on
the first draw, then the probability of getting a capacitor on the second draw is '

1. P(AB) = P(AjB)P(B) = P(BjA)P(A) (2.15)


zero since there is no capacitor left in the box! Thus, we have a situation where

{ }
the occurrence of event Bi (a capacitor on the second draw) on the second 2. If AB = 0, then P(A U BjC) = P(AjC) + P(BjC) (2.16)
subexperiment is conditional on the occurrence of event A; (the component 3. ,P(ABO = P{A)P(BjA)P(CjAB) (Chain Rule) (2.17)
drawn first) on the first subexperiment. We denote the probability of event Bi
4. ILB 1, B 2 , ••.• , B., .are.a set of mutually exclusive and exhaustive
given that event A; is known to have occurred by the conditional probability
events, then
P(BijA;).
An expression for the conditional probability P(BIA) in terms of the joint
probability P(AB) and the marginal probabilities P(A) and P(B) can be ob- P(A) L P(AjBi)P(Bi) (2.18)
tained as follows using the classical definition of probability. Let NA, NB, and j=l
1.- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -....

• 18 REVIEW OF PROBABILITY AND RANDOM VARIABLES PROBABILITY 19


I •
1 EXAMPLE 2.2.
(c) Directly from the right margin
140
• An examination of records on certain components showed the following results
P(Mt) = -
530
• when classified by manufacturer and class of defect:
(d) This conditional probability is found by the interpretation that given the
• component is from manufacturer M 2 , there are 160 outcomes in the

• space, two of which have critical defects. Thus

• Class of Defect
2
P(BziMz) = 160
B, = Bz = 83 = B.= Bs =
Manufacturer none critical serious minor incidental Totals
or by the formal definition, Equation 2.14
M, 124 6 3 1 6 140
1 Mz 145 2 4 0 9 160 2
• M3
M.
115
101
1
2
2
0
1
5
1
2
120
110 P(BziM2) = P(BzMz)
P(M2)
530
160
2
160
Totals 485 11 9 7 18 530
530

I (e) 6
P(MdBz) = U
• What is the probability of a component selected at random from the 530 com-
t ponents (a) being from manufacturer M 2 and having no defects, (b) having a

• critical defect, (c) being from manufacturer (d) having a critical defect given
the component is from manufacturer M2 , (e) being from manufacturer M 1, given
Bayes' Rule.
the form
Sir Thomas Bayes applied Equations 2.15 and 2.18 to arrive at

• it has a critical defect?

• SOLUTION: P(BiiA) = m
P(AiBJP(B;)
(2.19)
2: P(AiB)P(B;)
c (a) This is a joint probability and is found by assuming that each component
is equally likely to be selected. There are 145 components from M 2
having no defects out of a total of 530 components. Thus
which is used in many applications and particularly in interpreting the impact
145
P(MzB 1) = of additional information A on the probability of some event P( Bi ). An example
530 illustrates another application of Equation 2.19, which is called Bayes' rule.


14
(b) This calls for a marginal probability .
P(Bz) = P(MtBz) + P(MzBz) + P(M3Bz) + P(M4Bz)
EXAMPLE 2.3.
6 2 1 2 11
= 530 + 530 + 530 + 530 = 530 A binary communication channel is a system that carries data in the form of
one of two types of signals, say, either zeros or ones. Because of noise, a
Note that P(B2) can also be found in the bottom margin of the table,
transmitted zero is sometimes received as a one and a transmitted one is some-
that is
times received as a zero.

1
P(B2)
11
- 530
We assume that for a certain binary communication channel, the probability
a transmitted zero is received as a zero is .95 and the probability that a transmitted
20 REVIEW OF PROBABILITY AND RANDOM VARIABLES RANDOM VARIABLES 21
4
one is received as a one is . 90. We also assume the probability a zero is transmitted
l is .4. Find
Equation 2.20.a implies Equation 2.20.b and conversely. Observe that statistical
independence is quite different from mutual exclusiveness. Indeed, if A; and B1

i (a)
(b)
Probability a one is received.
Probability a one was transmitted given a one was received.
are mutually exclusive, then P(A;B1) = 0 by definition.

SOLUTION: Defining
2.3 RANDOM VARIABLES

1 A = one transmitted It is often useful to describe the outcome of a random experiment by a number,
A = zero transmitted for example, the number of telephone calls arriving at a central switching station
in an hour, or the lifetime of a component in a system. The numerical quantity
B = one received associated with the outcomes of a random experiment is called loosely a random
B = zero received variable. Different repetitions of the experiment may give rise to different ob-
served values for the random variable. Consider tossing a coin ten times and
observing the number of heads. If we denote the number of heads by X, then
From the problem statement X takes integer values from 0 through 10, and X is called a random variable.
Formally, a random variable is a function whose domain is the set of outcomes
P(A) = .6, P(BjA) = .90, A E S, and whose range is the real line. For every outcome A E S, the
P(BjA) .05
random variable assigns a number, X(;\) such that
(a) With the use of Equation 2.18 1. The set {;\:X(;\) :s: x} is an eveilt for every x E R 1•
P(B) = P(BjA)P(A) + P(BjA)P(A) 2. The probabilities of the events {;\:X(;\) = oo}, and {;\:X(;\) = -co} equal
zero. .that is,
.90(.6) + .05(.4)
P(X = oo) = P(X = -oo) = 0
.56.
(b) Using Bayes' rule, Equation 2.19 Thus, a random variable maps S onto a set of real numbers Sx C R" where Sx
is the range set that contains all permissible values of the random variable. Often
P(AjB) = P(BjA)P(A) (.90)(.6) 27 Sx is also called the ensemble of the random variable. This definition guarantees
=-
P(B) .56 28 that to every set A C S there corresponds a set T C R1 called the image (under
X) of A. Also for every (Borel) set T C R 1 there exists inS the inverse image
x- 1(T) where
Statistical Independence. Suppose that A; and B1 are events associated with
the outcomes of two experiments. Suppose that the occurrence of A; does not x- 1(T) = {;\. E S:X(A.) E T}
influence the probability of occurrence of B1 and vice versa. Then we say that
the events are statistically independent (sometimes, we say probabilistically in- and this set is an event which has a probability, P[X- 1(T)].
dependent or simply independent). More precisely, we say that two events A; We will use uppercase letters to denote random variables and lowercase
and B1 are statistically independent if letters to denote fixed values of the random variable (i.e., numbers).
Thus, the random variabie X induces a probability measure on the real line
P(A;Bj) = P(A;)P(B1) as follows
(2.20.a)

or when P(X = x) = P {;\:X(;\) = x}


P(Xsx) = P {;\:X(A.) :s: x}
P(A;jB1 ) = P(A;) (2.20.b) P(x 1 < X :s: x 2) = P {;\:x 1 < X(A.) :s: x 2}
---

-------------------.,=-............,...,. ,. _,.,.,.,.,..
---r----
'
\ {:

RANDOM VARIABLES 23


22 REVIEW OF PROBABILITY AND RANDOM VARIABLES
(

.....- Up face is 1 Up face is 2 f--



Up face is 3 Up face is 4
XC>-2J
I
XC>-1
Up face is 5 Up face is 6 f--

• I
,.---.J
• I X(>-6) l

.
I I I I
,.---.J
• -1 0 1 2 3 4 5 6 7

• Figure 2.1 Mapping of the sample space by a random variable . I

(
• I
I

• EXAMPLE 2.4 . 00 2 3 4 5 6 7 8 9 10

• Consider the toss of one die. Let the random variable X represent the value of
X

Figure 2.2 Distribution function of the random variable X shown in Figure 2.1.

\.
(14 the up face. The mapping performed by X is shown in Figure 2.1. The values
of the random variable are 1, 2, 3, 4, 5, 6.

1 SOLUTION: The solution is given in Figure 2.2 .


l • • 2.3.1 Distribution Functions
The probability P(X :S x) is also denoted by the function Fx(x), which is called
Joint Distribution Function. We now consider the case where two random
variables are defined on a sample space. For example, both the voltage and
the distribution function of the random variable X. Given Fx(x), we can compute current might be of interest in a certain experiment.
such quantities as P(X > x 1), P(x 1 :S X :S x 2), and so on, easily. The probability of the joint occurrence of two events such as A and B was
1 called the joint probability P(A n B). If the event A is the event (X :S x) and
A distribution function has the following properties

•• 1. Fx( -co) = 0
DF prop for
1 crandon
. the event B is the event (Y :S y), then the joint probability is called the joint
distribution function of the random variables X and Y; that is

• 2. Fx(oo) = 1 Fx.Y(x, y) = P[(X s x) n (Y s y)]

•• nuiaenecxs
\
3. lim Fx(x + E) = Fx(x)
e>O From this definition it can be noted that
4. Fx(xt) :S Fx(Xz) if X1 < Xz
Fx,Y( -cc, -co) = 0, FxA -ec, y) = 0, Fx,y(oo, y) = Fy(y),
5. P[xt < X :S Xz] = Fx(Xz) - Fx(xt)
Fx,Y(x, -cc) = 0, Fx,Y(x, oo) = 1, Fx,Y(x, oo) = Fx(x) (2.21)

•• EXAMPLE 2.5. A random variable may be discrete or continuous. A discrete random variable
can take on only a countable number of distinct values. A continuous random
,. Consider the toss of a fair die. Plot the distribution function of X where X is a
random variable that equals the number of dots on the up face.
variable can assume any value within one or more intervals on the real line.
Examples of discrete random variables are the number of telephone calls arriving
I,
24 REVIEW OF PROBABILITY AND RANDOM VARIABLES RANDOM VARIABLES 25

at an office in a finite interval of time, or a student's numerical score on an P(X=x;)

examination. The exact time of arrival of a telephone call is an example of a


continuous random variable.

2.3.2 Discrete Random Variables and Probability Mass Functions


A discrete random variable X is characterized by a set of allowable values x 1 ,
x 2 , ••• , x" and the probabilities of the random variable taking on one of these 0 2 3 4 5 6 X;

values based on the outcome of the underlying random experiment. The prob-
Number of dots showing up on a die
ability that X= X; is denoted by P(X = x;) fori = 1, 2, ... , n, and is called
the probability mass function. Figure 2.3 Probability mass function for Example 2.6.
The probability mass function of a random variable has the following im-
portant properties:
mass function P(X = X;, Y = Yi), which gives the probability that X = X; and
1. P(X = X;) > 0, i = 1, 2, ... , n (2.22.a) Y = Yi·
n
Using the probability rules stated in the preceding sections, we can prove
2. 2: P(X = x;) = 1 (2.22.b) the following relationships involving joint, marginal and conditional probability
i=l mass functions:
3. P(X :5 x) = Fx(x) = 2: P(X = x;) (2.22.c)
2: 2:
alJ X(:'::;;X
Joint 1. P(X :5 X, y :5 y) =
Zt.$.t' Y(:;Y
P(X = X;, y = Yi) (2.23)

4. P(X = X;) = lim [Fx(x;) - Fx(X; - e)) (2.22.d)


e>O 2. P(X == X;) = 2: P(X = X;, y = Yi) marginal
m
Note that there is a one-to-one correspondence between the probability distri- = 2: P(X = X;IY = Yi)P(Y = yj) (2.24)
bution function and the probability mass function as given in Equations 2.22c
and 2.22d. P(X = X;, Y = Yi)
3. P(X x;IY Yi) P(Y = Yi) 0
P(Y = yJ
= =

EXAMPLE 2.6.
conditional (2.25)
P(Y = YiiX = X;)P(X =X;)
n
(Bayes' rule)
Consider the toss of a fair die. Plot the probability mass function. 2: P(Y = YiiX = X;)P(X = X;)
i= 1
SOLUTION: See Figure 2.3. (2.26)
4. Random variables X and Y are statistically independent if
P(X = X;, y = Yi) = P(X = X;)P(Y = Yi)
i = 1, 2, ... , n; j = 1, 2, ... , m (2.27)
Two Random Variables-Joint, Marginal, and Conditional Distributions and
Independence. It is of course possible to define two or more random variables
on the sample space of a single random experiment or on the combined sample
spaces of many random experiments. If these variables are all discrete, then EXAMPLE 2.7.
they are characterized by a joint probability mass function. Consider the example
of two random variables X and Y that take on the values Xt. x 2 , ••• , Xn and Find the joint probability mass function and joint distribution function of X,Y
Yz, ... , Ym· These two variables can be characterized by a joint probability associated with the experiment of tossing two fair dice where X represents the
''iik·:c•:-

RANDOM VARIABLES 27
26 REVIEW OF PROBABILITY AND RANDOM VARIABLES

number appearing on the up face of one die and Y represents the number E{(X- !Lx)"} = a-1- = 2: (x; - !Lx) 2P(X = x;) (2.30)

•• appearing on the up face of the other die. i=l

SOLUTION: The square-root of variance is called the standard deviation. The mean of a
random variable is its average value and the variance of a random variable is a
1 measure of the "spread" of the values of the random variable.
P(X == i, Y = j) = 36' i = 1, 2, ... ' 6; j = 1, 2, ... ' 6 We will see in a later section that when the probability mass function is not
known, then the mean and variance can be used to arrive at bounds on prob-
X J 1 abilities via the Tchebycheff's inequality, which has the form
Fx_y(X, y) = 2: 2: 36'
I
X = 1, 2, ... , 6; y = 1, 2, ... , 6

• - xy
- 36 P[\X - 11-xi > k] :s;
(12
(2.31)

•,. If x andy are not integers and are between 0 and 6, Fxx(x, y) = Fx,y([x], [y]) The Tchebycheff's inequality can be used to obtain bounds on the probability
where [x] is the greatest integer less than or equal to x. Fx.Y(x, y) = 0 for x < of finding X outside of an interval 11-x ± kax .
•• 1 or y < 1. Fx,Y(x, y) = 1 for x =:: 6 andy=:: 6. Fx,y(x, y) = Fx(x) for y =:: 6 .
Fx.v(x, y) = Fv(Y) for x =:: 6 .
The expected value of a function of two random variables is defined as

• E{g(X, Y)} =
n m
2: 2: g(x;, Yi)P(X = X;, Y = Yi) (2.32)
• I

• 2.3.3 Expected Values or Averages A useful expected value that gives a measure of dependence between two random
• The probability mass function (or the distribution function) provides as complete
variables X and Y is the correlation coefficient defined as

a description as possible for a discrete random variable. For many purposes this
description is often too detailed. It is sometimes simpler and more convenient E{(X- !Lx)(Y- !Ly)} axv (2.33)
Pxv = = --
to describe a random variable by a few characteristic numbers or summary axay axay
measures that are representative of its probability mass function. These numbers
are the various expected values (sometimes called statistical averages). The ex- The numerator of the right-hand side of Equation 2.33 is called the covariance
pected value or the average of a function g(X) of a discrete random variable X (a-XY) of X and Y. The reader can verify that if X and Y are statistically inde-
is defined as pendent, then PXY = 0 and that in the case when X and Yare linearly dependent
(i.e., when Y = (b + kX), then IPxYI = 1. Observe that PxY = 0 does not imply
n
statistical independence.
E{g(X)} 2: g(x;)P(X = x;) (2.28) Two random variables X and Y are said to be orthogonal if
i=l

E{XY} = 0
It will be seen in the next section that the expected value of a random variable
is valid for all random variables, not just for discrete random variables. The
form of the average simply appears different for continuous random variables. The relationship between two random variables is sometimes described in
Two expected values or moments that are most commonly used for characterizing terms of conditional expected values, which are defined as
a random variable X are its mean 11-x and its variance a}. The mean and variance
are defined as E{g(X, Y)IY = yj} = L g(x;, Yi)P(X = x;\Y = Yi) (2.34.a)
i

E{X} = JLx = 2: x;P(X = x;) (2.29)


E{g(X, Y)jX = x;} = 2: g(x;, Yi)P(Y = YiiX = x;) (2.34.b)
i=I
28 REVIEW OF PROBABILITY AND RANDOM VARIABLES RANDOM VARIABLES 29

The reader can verify that From the factorial moments, we can obtain ordinary moments, for example, as

E{g(X, Y)} Ex,y{g(X, Y)} fLx = el


= Ex{EYJx[g(X, Y)IX]} (2.34.c)
and
where the subscripts denote the distributions with respect to which the expected
values are computed. o-i = ez + et - q
One of the important conditional expected values is the conditional mean:

2.3.4 Examples of Probability Mass Functions


E{XIY = Yi} = fLx[Y=yi = _2: x;P(X = x;IY = Yi) (2.34.d)
The probability mass functions of some random variables have convenient an-
alytical forms. Several examples are presented. We will encounter these prob-
The conditional mean plays an important role in estimating the value of one ability mass functions very often in analysis of communication systems.
random variable given the value of a related random variable, for example, the
estimation of the weight of an individual given the height. The Uniform Probability Mass Function. A random variable X is said to have
a uniform probability mass function (or distribution) when
Probability Generating Functions. When a random variable takes on values
that are uniformly spaced, it is said to be a lattice type random variable. The P(X = x;) = 1/n, i = 1, 2, 3, ... , n (2.36)
most common example is one whose values are the nonnegative integers, as in
many applications that involve counting. A convenient tool for analyzing prob- The Binomial Probability Mass Function. Let p be the probability of an event
ability distributions of non-negative integer-valued random variables is the prob- A, oi a random experiment E. If the experiment is repeated n times and then
ability generating function defined by outcomes are independent, let X be a random variable that represents the num-
ber of times A occurs in the n repetitions. The probability that event A occurs
k times is given by the binomial probability mass function
Gx(z) = _2: zkP(X = k) (2.35.a)
k=O
P(X = k) pk(1 _ p)n-k, k = 0, 1, 2, ... , n (2.37)
The reader may recognize this as the z transform of a sequence of probabilities
{pk}, Pk = P(X = k), except that z- 1 has been replaced by z. The probability
generating function has the following useful properties: where

n a n! a
1. Gx(l) = _2: P(X = k) =1 (2.35.b) (k) = k!(n _ k)! and m! = m(m - 1)(m - 2) ... (3)(2)(1); 0! 1.
k=O
2. If Gx(z) is given, Pk can be obtained from it either by expanding it in a
power series or from The reader can verify that the mean and variance of the binomial random variable
1 dk are given by (see Problem 2.13)
P(X = k) = k! dzk [ Gx(z )]lz=O (2.35.c)

'
l
i
3. The derivatives of the probability generating function evaluated at z =
1 yield the factorial moments en, where
en = E{X(X - l)(X - 2) · · · (X - n + 1)}
f.Lx = np
O"k = np(l - p)
(2.38.a)
(2.38.b)

d" Poisson Probability Mass Function. The Poisson random variable is used to
= dz" [Gx(z)]iz=! (2.35.d)
model such things as the number of telephone calls received by an office and
;
.:!
,,.,. ·

30 REVIEW OF PROBABILITY AND RANDOM VARIABLES RANDOM VARIABLES 31

the number of electrons emitted by a hot cathode. In situations like these if we


make the following assumptions:
1:XA'M?lE 2.8.

The input to a binary communication system, denoted by a random variable X,


1. The number of events occurring in a small time interval A.' tlt as takes on one of two values 0 or 1 with probabilities i and i, respectively. Due
0. to errors caused by noise in the system, the output Y differs from the input X
2. The number of events occurring in nonoverlapping time intervals are occasionally. The behavior of the communication system is modeled by the
independent. conditional probabilities

then the number of events in a time interval of length T can be shown (see
Chapter 5) to have a Poisson probability mass function of the form 3 7
P(Y = 1IX = 1) =- and P(Y = OIX = 0) = -
4 8
A.k
P(X = k) = - k = 0, 1, 2, ... (2.39.a) (a) Find P(Y = 1) and P(Y = 0).
k' ,
(b) Find P(X = 11 Y = 1).
where A. = A.'T. The mean and variance of the Poisson random variable are
given by (Note that this is similar to Example 2.3. The primary difference is the
use of random variables.)
J.Lx = A. (2.39.b)
o1 = A. (2.39.c) SOLUTION:
(a) Using Equation 2.24, we have

Multinomial Probability Mass Function. Another useful probability mass func- P(Y = 1) = P(Y = liX = O)P(X = 0)
tion is the multinomial probability mass function that is a generalization of the + P(Y = liX = l)P(X = 1)
binomial distribution to two or more variables. Suppose a random experiment
is repeated n times. On each repetition, the experiment terminates in but one (1- + = ;2
of k mutually exclusive and exhaustive events AI> A 2 , • • • , Ak. Let p; be the
probability that the experiment terminates in A; and let p; remain constant 23
P(Y = 0) = 1 - P(Y = 1) = -
throughout n independent repetitions of the e)(periment. Let X;, i = 1, 2, ... , 32
k denote the number of times the experiment terminates in event A;. Then
(b) Using Bayes' rule, we obtain-

P(Xr = Xr, Xz = Xz, ... 'xk = xk) P(X = II y = 1) = P(Y = IIX = 1)P(X- 1)
n! P(Y = 1)
Xr!x 2! · • · xk -1·'x k•r P1'P2' · · · Plt (2.40)

2
=---=-
where x 1 + x 2 + · · · + xk = n, p 1 + p 2 + · · · Pk = 1, and X;= 0, 1, 2, ... , 9 3
n. The probability mass function given Equation 2.40 is called a multinomial 32
probability mass function.
Note thatwithA 1 =A, andA 2 = A,p 1 = p, andp 2 = 1 - p, the multinomial P(X = 11 Y = 1) is the probability that the input to the system is 1
probability mass function reduces to the binomial case. when the output is 1.
Before we proceed to review continuous random variables, let us look at
three examples that illustrate the concepts described in the preceding sections.
CONTINUOUS RANDOM VARIABLES 33
32 REVIEW OF PROBABILITY AND RANDOM VARIABLES

Find
EXAMPLE 2.9.
(a) The joint probability mass function of M and N.
Binary data are transmitted over a noisy communication channel in blocks of (b) The marginal probability mass function of M.
16 binary digits. The probability that a received binary digit is in error due to (c) The condition probability mass function of N given M.
channel noise is 0.1. Assume that the occurrence of an error in a particular digit (d) E{MiN}.
does not influence the probability of occurrence of an error in any other digit (e) E{M} from part (d).
within the block (i.e., errors occur in various digit positions within a block in a
statistically independent fashion). SOLUTION:
e-1o n = 0, 1,
(a) Find the average (or expected) number of errors per block.
(b) Find the variance of the number of errors per block. (a) P(M = i, N = n) = -
n.1 t
i = 0, 1, ,n
(c) Find the probability that the number of errors per block is greater than
or equal to 5. . ., e-10(10)n(.l)n n! . .
(b) P(M = z) = n! i!(n _ i)! (.9)'(.1)-'

SOLUTION: e- 10 (9) 1 "' 1


(a) Let X be the random variable representing the number of errors per = -.-,-
l.
2: (n -
n=i
")I
t .
block. Then, X has a binomial distribution e- 9(9)i
P(X = k) = ci
6
)(.1)k(.9)t6-k, k = 0, 1, ... ' 16
=
.
l.
, ' i = 0, 1,. 0 •

. e- 10 10n n . . i!
and using Equation 2.38.a (c) P(N = niM = z) = -n!- (.t )(.9)'(.1)n-• -
e- -
9 .
(9)'
E{X} = np = (16)(.1) = 1.6 = e- 1 /(n - i)!, n = i, i + 1, ...
i = 0, 1, ...
(b) The variance of X is found from Equation 2.38.b:
oJ = np(1 - p) = (16)(.1)(.9) = 1.44 (d) Using Equation 2.38.a
E{MIN = n} = .9n
(c) P(X 5) = 1 - P(X s 4)

= 1 - ±
k=O
<!6)(0.1)k(0.9)16-k
Thus
E{MiN} = .9N

= 0.017 (e) E{M} = EN{E{MIN}} = EN(.9N) = (.9)EN{N} = 9

This may also be found directly using the results of part (b) if these results are
available.

EXAMPLE 2.1 0.

The number N of defects per plate of sheet metal is Poisson with A. = 10. The
inspection process has a constant probability of .9 of finding each defect and
the successes are independent, that is, if M represents the number of found 2.4 CONTINUOUS RANDOM VARIABLES
defects
2.4.1 Probability Density Functions
A continuous random variable can take on more than a countable number of
P(M = iiN = n) = i s n values in one or more intervals on the real line. The probability law for a
l
r r
)
34 REVIEW OF PROBABILITY AND RANDOM VARIABLES 35
,•
CONTINUOUS RANDOM VARIABLES

continuous random variable X is defined by a probability density function (pdf)


fx(x) where tXAMPtE 2.11.
• Resistors are produced that have a nominal value of 10 ohms and are ±10%
J
fx(x) = dFx(x) resistors. Assume that any possible value of resistance is equally likely. Find the
(2.41)
J dx density and distribution function of the random variable R, which represents
resistance. Find the probability that a resistor selected at random is between 9.5
and 10.5 ohms.
With this definition the probability that the observed value of X falls in a small
interval of length Ax containing the point x is approximated by f x(x)Ax. With
SOLUTION: The density and distribution functions are shown in Figure 2.4.
such a function, we can evaluate probabilities of events by integration. As with
Using the distribution function,
a probability mass function, there are properties that fx(x) must have before it
can be used as a density function for a random variable. These properties follow
from Equation 2.41 and the properties of a distribution function. 3 1 1
P(9.5 < R :s 10.5) = FR(10.5) - FR(9.5)
4 4 2
1. fx(x);::::: 0 (2.42.a)
or using the density function,
2. roo fx(x) dx = 1 (2.42.b)

3. P(X :sa) = Fx(a) = fx fx(x) dx (2.42.c)


P(9.5 < R :s 10.5) = f! 0 · 5
)95 2
dr = 10.5 - 9.5
2
1
= 2

1
4. P(a :s X :s b) = f fx(x) dx (2.42.d)

Mixed Random Variable. It is possible for a random variable to have a dis-


Furthermore, from the definition of integration, we have tribution function as shown in Figure 2.5. In this case, the random variable and
the distribution function are called mixed, because the distribution function
consists of a part that has a density function and a part that has a probability
P(X = a) = J" fx(x) dx
a
= lim fx(a) Ax = 0 (2.42.e) mass function.

for a continuous random variable.

Fx (x)
1

j
I

'
J
..:::
-o
c::

s"'
I
I
j
r::
,J
1 11
Figure 2.4 Distribution function and density function for Example 2.11. Figure 2.5 Example of a mixed distribution function.

J
I.
36 REVIEW OF PROBABILITY AND RANDOM VARIABLES CONTINUOUS RANDOM VARIABLES 37

Two Random Variables-Joint, Marginal, and Conditional Density Functions fXly(x!y) fx,Y(x, y) fy(y) > {} (2.44.a)
and Independence. If we have a multitude of random variables defined on one fy(y) '
or more random experiments, then the probability model is specified in terms
of a joint probability density function. For example, if there are two random fY!x(Yix) fx.Y(x, y) (2.44.b)
variables X and Y, they may be characterized by a joint probability density fx(x) ' fx(x) > 0
function fx. y(x, y). If the joint distribution function, Fx, y, is continuous and
has partial derivatives, then a joint density function is defined by fYlx(Yix) = oo fx!Y(xiy)fy(y) Bayes' rule (2.44.c)
f_oo fx!Y(xiX.)fy(X.) dX.

Finally, random variables X and Y are said to be statistically independent if

It can be shown that


fx,Y(x, y) = fx(x)fy(y) (2.45)

fu(x, y) 2: 0
EXAMPLE 2.12.
From the fundamental theorem of integral calculus
The joint density function of X and Y is

Fx,y(X, Y) = too J:oo fx,y(J.L, v) dJ.L dv fx.Y(x, y) = axy, 1 :5 X :S 3, 2 :S y :S 4


= 0 elsewhere
,j
Since Fx, y(oo, oo) = 1
Find a, fx(x), and Fy(y)

!
-t
II fx.Y(!J., v) d!J. dv = 1
SOLUTION: Since the area under the joint pdf is 1, we have

1 = ff axy dx dy =a f y [ I: dy
.,·!
it
A joint density function may be interpreted as
= a f 4y dy = 4a I: = 24a
lim P[(x <X :5 X + dx)
dx-.o
n (y < y :5 y + dy)] = !x,y(x, y) dx dy
dy-->0 or

From the joint probability density function one can obtain marginal proba- 1
bility density functions fx(x), fy(y), and conditional probability density func- £l.;;: 24
tions fx!Y(xiy) and fnrtYix) as follows:
a
;.! The marginal pdf of X is obtained from Equation 2.43.a as
l
,J
fx(x) = roo fx,y(x, y) dy (2.43.a)
·..
fx(x) = -24
1 xy dy = -24
X
[8 - 2] = X-4' 1 :S x < 3
# fy(y) = roo fx,y(X, y) dx (2.43.b)
= 0
2

elsewhere
-
r
38 REVIEW OF PROBABILITY AND RANDOM VARIABLES CONTINUOUS RANDOM VARIABLES 39

And the distribution function of Y is if X and Yare independent, then

Fy(y) = 0, y=s2 E{g(X)h(Y)} = E{g(X)}E{h(Y)} (2.49)


= 1, y>4

= 24 )
1 ry exu dx dv
J =
1
6 Jz
p
v dv
It should be noted that the concept of the expected value of a random variable
2 1 is equally applicable to discrete and continuous random variables. Also, if gen-
1 eralized derivatives of the distribution function are defined using the Dirac delta
-- 12 [ y 2 - 4], 2=sy=s4 function 8 (x), then discrete random variables have generalized density functions.
For example, the generalized density function of die tossing as given in Example
2.6, is

Expected Values. As in the case of discrete random variables, continuous


fx(x) = 61 [o(x - 1) + o(x - 2) + o(x - 3)
random variables can also be described by statistical averages or expected values. + o(x - 4) + 8(x - 5) + o(x - 6)]
The expected values of functions of continuous random variables are defined
by
If this approach is used then, for example, Equations 2.29 and 2.30 are special
cases of Equations 2.47 .a and 2.47. b, respectively.
E{g(X, Y)} = g(x, y)fx.y(x, y) dx dy (2.46)
Characteristic Functions and Moment Generating Functions. In calculus we
use a variety of transform techniques to help solve various analysis problems.
f.lx = E{X} = X fx(x) dx (2.47 .a) For example, Laplace and Fourier transforms are used extensively for solving
linear differential equations. In probability theory we use two similar "trans-

fx
forms" to aid in the analysis. These transforms lead to the concepts of charac-
o1 = E{(X - f.lx) 2} = (x - f.lx)Zfx(x) dx (2.47.b) teristic and moment generating functions.
The characteristic function 'l' x(w) of a random variable X is defined as the
expected value of exp(jwX)
Uxy = E{(X- f.lx)(Y- f.ly)} (2.47.c)

= f (x - f.lx)(y - f.ly)fx.Y(x, y) dx dy
'l' x(w) = E{exp(jwX)}, j = v=1

For a continuous random variable (and using 8 functions also for a discrete
and random variable) this definition leads to

PxY =
E{(X - f.lx)(Y -
UxUy
f.ly)}
(2.47.d) 'l'x(w) = fx fx(x)exp(jwx) dx (2.50.a)

It can be shown that -1 ::s PxY ::s 1. The Tchebycheff's inequality for a contin- which is the complex conjugate of the Fourier transform of the pdf of X. Since
uous random variable has the same form as given in Equation 2.31. lexp(jwx) I ::s 1,
Conditional expected values involving continuous random variables are de-
fined as
lfx(x)exp(jwx)l dx ::S fx fx(x) dx = 1

E{g(X, Y)IY = y} = r, g(x, y)fxry(xjy) dx (2.48)


and hence the characteristic function always exists.
40 REVIEW OF PROBABILITY AND RANDOM VARIABLES CONTINUOUS RANDOM VARIABLES 41

Using the inverse Fourier transform, we can obtain fx(x) from 'l'x(w) as EXAMPLE '2.1'3.

X 1 and X 2 are two independent (Gaussian) random variables with means ILl and
fx(x) =
2
1TI J"'_"' 'l'x(w)exp( -jwx) dw (2.50.b) ILz and variances cry and The pdfs of X 1 and X 2 have the form

Thus, f x(x) and 'I'x( w) form a Fourier transform pair. The characteristicfunction 1 [ (x; - ILYJ = 1, 2
fx,(x;) = • exp 2 z ' i
of a random variable has the following properties. V .L:TI 0'; 0';

1. The characteristic function is unique and determines the pdf of a random (a) Find 'l'x,(w) and 'l'x,(w)
variable (except for points of discontinuity of the pdf). Thus, if two (b) Using 'I'x( w) find E{X4} where X is a Gaussian random variable with
continuous random variables have the same characteristic function, they mean zero and variance cr 2.
have the same pdf. (c) Find the pdf of Z = a 1X 1 + a 2 X 2
2. 'l'x(O) = 1, and
SOLUTION:
(a) "' 1
E{Xk} =
Jk
[dk'l' x(w)J
dwk
at w = 0 (2.5l.a) 'I' x/ w) =
f_, 2TI <T1
exp[- (x 1 - IL 1) 212cri]exp(jwx 1 ) dx 1

We can combine the exponents in the previous equation and write it as


Equation (2.51.a) can be established by differentiating both sides of
Equation (2.50.a) k times with respect tow and setting w = 0. exp[j,.L 1w + (cr 1jw) 2/2]exp{- [x 1 - (IL 1 + cr!jw)]2/2cri}
and hence
The concept of characteristic functions can be extended to the case of two 1
f
oo

or more random variables. For example, the characteristic function of two ran- 'l'x,(w) = exp[jiLJW + (crdw)2f2]. -oo crl
dom variables X 1 and X 2 is given by
x exp[- (x 1 - ILD2 /2o-I] dx,

'I' w2) = E{exp(jw 1 X 1 + jw 2 X 2 )} (2.5l.b) where ILi = 1L1 + <rijw.


The value of the integral in the preceding equation is 1 and hence

The reader can verify that 'I' x/w) = exp[jiL 1w + (<rdw)Z/2]


Similarly
'l' x"x,(O, 0) 1 'l' x,( w) = exp[j IL 2w + (<r 2 jw ) 2 /2]
(b) From part (a) we have
and
'I' x( w) = exp(- <r 2w2 /2)

aman ['I' (w and from Equation 2.51.a


E{xmxn}
1
- ·-(m+n)
2 - J aw m" n x,.x, b w )]
2 at ( w 2) (0, 0) (2.51.c)
1 uW 2
E{X4} = {Fourth derivative of 'l' x(w) at w = 0}
1
The real-valued function Mx(t) = E{exp(tX)} is called the moment generating = 3<r4
function. Unlike the characteristic function, the moment generating function Following the same procedure it can be shown for X a normal random
need not always exist, and even when it exists, it may be defined for only some variable with mean zero and variance cr 2 that
values oft within a region of convergence (similar to the existence of the Laplace
transform). If Mx(t) exists, then Mx(t) = 'I'x(t!j).
We illustrate two uses of characteristic functions. E[X"] = g.3 ... (n - 1)cr"
n = 2k + 1
n = 2k, k an integer.
r
42 REVIEW OF PROBABILITY AND RANDOM VARIABLES
CONTINUOUS RANDOM VARIABLES 43
(c) '1' 2 (w) = E{exp(jwZ)} = E{exp(jw[a 1 X 1 + a2 X 2 ])}
.and eLJuating like powers of <o results in
= E{exp(jwa 1 X 1 )exp(jwa2 X 2 )}
= E{exp(jwa 1 X 1)}E{exp(jwa2 X 2 )}
E[X] = K 1
since X 1 and X 2 are independent. Hence, (2.52.c)
E[X2 ] = Kz + Kf (2.52.d)
'l'z(w) = 'l'xJwat)'l'x2 (waz)
3
E[X ] = K 3 + 3KzKt + Ki (2.52.e)
= exp(j(atf.Lt + azf.Lz)w + (afaf + a1aD(jw) 212]
4
which shows that Z is Gaussian with E[X ] = K4 + 4K3 K 1 + + 6K2 Kf + Ki (2.52.f)
f.Lz = a1f.11 + azf.Lz
Reference [5] contains more information on cumulants. The cumulants are
and
particularly useful when independent random variables are summed because the
individual cumulants are directly added.
= ayay + aiai
2.4.2 Examples of Probability Density Functions
We now present three useful models for continuous random variables that will
be used later. Several additional models are given in the problems included at
Cumulant Generating Function. The cumulant generating function Cx of X is the end of the chapter.
defined by
Uniform Probability Density Functions. A random variable X is said to have
a uniform pdf if
Cx(w) = In 'I' x(w) (2.52.a)

Thus fx(x) .= {1/(b


0
- a) ' a-:5x-sb
(2.53.a)
elsewhere

exp{Cx(w)} = 'I'x(w) The mean and of a uniform random variable can be shown to be

Using series expansions on both sides of this equation results in


b +a
f.Lx = (2.53.b)
(jw)2 (jw)n } (b - a) 2
exp { K 1(jw) + K 2 7 + · · · Kn--;;r + · · · (2.53.c)
12
1 + E[X](jw) + E[XZ] (jw)Z + ... + E[Xn] (jw)n + (2.52.b) Gaussian Probability Density Function. One of the most widely used pdfs is
2! n!
the Gaussian or normal probability density function. This pdf occurs in so many
applications partly because of a remarkable phenomenon called the central limit
The cumulants Ki are defined by the identity in w given in Equation 2.52.b. theorem and partly because of a relatively simple analytical form. The central
Expanding the left-hand side of Equation 2.52.b as the product of the Taylor limit theorem, to be proved in a later section, implies that a random variable
series expansions of that is determined by the sum of a large number of independent causes tends
to have a Gaussian probability distribution. Several versions of this theorem
have been proven by statisticians and verified experimentally from data by en-
2 gineers and physicists.
exp{Kdw}exp { Kz -(jw)- (jw)"}
Kn 7 ···
, · · · exp
} {
2 One primary interest in studying the Gaussian pdf is from the viewpoint of
using it to model random electrical noise. Electrical noise in communication

I
(
44 REVIEW OF PROBABILITY AND RANDOM VARIABLES CONTINUOUS RANDOM VARIABLES 45

fxl.x)

0 a 0 a -a a
0
Figure 2.7 Probabilities for a standard Gaussian pdf.

Area= P(X>p.x+Yux) Unfortunately, this integral cannot be evaluated in closed form and requires
= Q(y)
numerical evaluation. Several versions of the integral are tabulated, and we will
use tabulated values (Appendix D) of the Q function, which is defined as
I , // / / / / / / / / / / / / ] l'\"\'\),'\'\.'\).S'rz X

0 1
P.x P.x+Y"x
Q(y) = Y exp(- z2 /2) dz, y>O (2.55)
Figure 2.6 Gaussian probability density function.

In terms of the values of the Q functions we can write P(X > a) as

systems is often due to the cumulative effects of a large number of randomly P(X >a) = Q[(a - JJ.x)lax] (2.56)
moving charged particles and hence the instantaneous value of the noise will
tend to have a Gaussian distribution-a fact that can be tested experimentally. Various tables give any of the areas shown in Figure 2. 7, so one must observe
(The reader is cautioned that there are examples of noise that cannot be modeled which is being tabulated. However, any of the results can be obtained from the
by Gaussian pdfs. Such examples include pulse type disturbances on a telephone others by using the following relations for the standard (f.l = 0, a = 1) normal
line and the electrical noise from nearby lightning discharges.) random variable X:
The Gaussian pdf shown in Figure 2.6 has the form
P(X:::; x) = 1 - Q(x)
2
1 [ (x - JJ.x) ]
(2.54) P(-a:SX:Sa) = 2P(-a:SX:S0) = 2P(O:::;X:::;a)
fx(x) = exp 2a},
1
P(X:::; 0) = - = Q(O)
2
The family of Gaussian pdfs is characterized by only two parameters, f.lx and
ai, which are the mean and variance of the random variable X. In many ap-
plications we will often be interested in probabilities such as
EXAMPLE 2.14.

1 (x - f.lx)
2 The voltage X at the output of a noise generator is a standard normal random
P(X > a) = • , exp [ ]
dx variable. Find P(X > 2.3) and P(1 :::; X:::; 2.3).
a YLnO'x 2O'x2

SOLUTION: Using one of the tables of standard normal distributions


By making a change of variable z = (x - f.Lx)lax, the preceding integral can
be reduced to
P(X > 2.3) = Q(2.3) = .011

1
P(1 :::; X:::; 2.3) = 1 - Q(2.3) - [1 - Q(1)] Q(1) - Q(2.3) = .148
P(X > a) = • ;;:;- exp(- z 2 /2) dz
v2n
,

46 REVIEW OF PROBABILITY AND RANDOM VARIABLES RANDOM VECTORS 47

EXAMPLE 2.15. The t:xpected value of g(Z) .is defined as

The velocity V of the wind at a certain location is normal random variable with
1J.. = 2 and fi = 5. Determine P( -3 :s V :s 8).
E{g(Z)} r"' f"' g(z)fx,y(x, y) dx dy

SOLUTION:
Thus the mean, IJ..z, of Z is

1 exp [ (u - 2)2] IJ..z = = + jE{Y} IJ..x + j!J..y


P(- 3 :s V :s 8) = J_s Y21T(25) 2(25) du
E{Z} E{X} =

xz]
3

1 [ at is defined as
f The variance,
(S-2)15
= --exp
- - dx
(-3-2)/5 \,12;
2
= 1- Q(1.2)- [1- Q(-1)] = .726 E{IZ -

The covariance of two complex random variables Zm and Z 11 is defined by

Bivariate Gaussian pdf. We often encounter the situation when the instanta- Czmz, E{(Zm - IJ..zJ*(Zn - IJ..z,)}
neous amplitude of the input signal to a linear system has a Gaussian pdf and
we might be interested in the joint pdf of the amplitude of the input and the where * denotes complex conjugate.
output signals. The bivariate Gaussian pdf is a valid model for describing such
situations. The bivariate Gaussian pdf has the form
2.5 RANDOM VECTORS

fx.Y(x, y) =
2TiuxO"y
1 exp { - - -1
2(1 - p 2 )
[ (X :XIJ..xr + (y :YIJ..yr In the preceding sections we concentrated on discussing the specification of
probability laws for one or two random variables. In this section we shall discuss
2p(x - IJ..x)(y the specification of probability laws for many random variables (i.e., random
IJ..y)]}
UxUy (2.57) vectors). Whereas scalar-valued random variables take on values on the real
line, the values of "vector-valued" random variables are points in a real-valued
higher (say m) dimensional space (Rm)· An example of a three-dimensional
The reader can verify that the marginal pdfs of X and Y are Gaussian with random vector is the location of a space vehicle in a Cartesian coordinate system.
means IJ..x, IJ..y, and variances ai, u}, respectively, and The probability law for vector-valued random variables is specified in terms
of a joint distribution function
E{(X- IJ..x)(Y - IJ..y)} _ aXY
p = PXY = axay - axay
... , Xm) = P[(XJ :S X1) ... (Xm :S Xm)]

or by a joint probability mass function (discrete case) or a joint probability


2.4.3 Complex Random Variables density function (continuous case). We treat the continuous case in this section
leaving details of the discrete case for the reader.
A complex random variable Z is defined in terms of the real random variables The joint probability density function of an m-dimensional random vector
Xand Yby is the partial derivative of the distribution function and is denoted by

Z =X+ jY fx 1,x2 , ... ,xm(xJ, Xz, · .. , Xm)


48 REVIEW OF PROBABILITY AND RANDOM VARIABLES RANDOM VECTORS 49

From the joint pdf, we can obtain the marginal pdfs as Important parameters of the joint distribution are the means and the co-
variances

fx,(XJ) = roo roo' . 'roo fx,.x,, ... ,xJXl> Xz, ... , Xm) dx2' ' · dxm fLx, = E{X;}
m - 1 integrals
and
and
IJ'x,x; = E{XiXi} - fLx,fLx1
f x,.x,(xJ> Xz)
= roo roo ... roo fx,.x, ... xJX1, Xz, X3, · .. , Xm) dx3 dx4 · · · dxm (2.58) Note that crxx is the variance of Xi. We will use both crx x, and cr} to denote
the variance ofx;. Sometimes the notations Ex,, Ex,xi' used to denote
m - 2 integrals expected values with respect to the marginal distribution of Xi, the joint distri-
bution of X; and Xi, and the conditional distribution of Xi given Xi, respectively.
We will use subscripted notation for the expectation operator only when there
Note that the marginal pdf of any subset of the m variables is obtained by is ambiguity with the use of unsubscripted notation.
"integrating out" the variables not in the subset. The probability law for random vectors can be specified in a concise form
The conditional density functions are defined as (using m = 4 as an example), using the vector notation. Suppose we are dealing with the joint probability law
form random variables X 2 , ••• , Xm. These m variables can be represented
as components of an m x 1 column vector X,
fx,.x,.x,lx,(x 1, x 2 , x3Jx 4 ) = fx,.x •. x 2 , x 3, x 4 ) (2.59)
fx/x4)

and or xr = (X 1, X 2 , ••• , Xm)

fx,.x .. Xz, X3, x4)


f x,.x2IX3 .x,(xl> Xzlx3, X4) (2.60)
fx,.x,(x3, X4)
where T indicates the transpose of a vector (or matrix). The values of X are
points in the m-dimensional space Rm. A specific value of X is denoted by
Expected values are evaluated using multiple integrals. For example,

xr = (xi> Xz, ... , Xm)


X 2 , X 3, X4)}

= J:, g(xl> Xz, X3, X4)j x,.x,.x,.x,(xl> Xz, x 3, X4) dx 1 dx 2 dx3 dx 4 Then, the joint pdf is denoted by

(2.61)
fx(X) = fx,.x,, ... ,x)xb Xz, · · · , Xm)

where g is a scalar-valued function. Conditional expected values are defined for The mean vector is defined as
example, as

E(X1) ]
E{g(X1, Xz, X3, X4)JX3 = X3, X4 = x4}

r, roo
E(Xz)
fLx = E(X)
= g(Xt, Xz, X3, X4)fx,.x,IX3 .x,(Xl, XziX3, X4) dx1 dx2 (2.62) [
E(Xm)
..... ..
r
i
l 50
I
l REVIEW OF PROBABILITY AND RANDOM VARIABLES

i
RANDOM VECTORS 51

t
l
and the "covariance-matrix", Ix, an m x m matrix is defined as 1. Suppose X has an m-dimensional multivariate Gaussian distribution. If
we partition X as
!I '
2"
Ix = E{XXI} - f.Lxf.Lk
J J

=
ax,x1
<Tx,x,
<Tx,x,
<Tx,x2
<Tx,xm]
O'xzXm
x: X,
X [X] 1: X,
r<TxmX1 <TxmX1 <Txmxm m X m and

The covariance matrix describes the second-order relationship between the com-
-
f.Lx- -
J.Lx,
kx- - k21 l:zz
ponents of the random vector X. The components are said to be "uncorrelated"
when
where J.Lx, is k x 1 and l: 11 is k x k, then X 1 has a k-dimensional
multivariate Gaussian distribution with a mean J.Lx, and covariance l: 11 •
<Tx,x; = ri;i = 0, i# j 2. If l:x is a diagonal matrix, that is,

rii

1]
and independent if 0 0
kx = 0

fx,.x, .... ,xJxt. Xz, · · · , Xm) =


m
IT fx,(x;)
i=1
(2.63) r 0 0 0

then the components of X are independent (i.e., uncorrelatedness implies


independence. However, this property does not hold for other distri-
butions).
2.5.1 Multivariate Gaussian Distribution 3. If A is a k x m matrix of rank k, then Y = AX has a k-variate Gaussian
An important extension of the bivariate Gaussian distribution is the multivariate distribution with

I
!
Gaussian distribution, which has many applications. A random vector X is mul-
tivariate Gaussian if it has a pdf of the form J.Lv = AJ.Lx
ky = A:kxAT
(2.65.a)
(2.65.b)
I
j fx(x) = [(21T)m' 21Ixl 112 ]- 1exp [ (x - J.LxYkx 1(x - J.Lx) J (2.64)
4. With a partition of X as in (1), the conditional density of X 1 given X 2
x2 is a k-dimensional multivariate Gaussian with

where f.Lx is the mean vector, Ix is the covariance matrix, Ix 1 is its inverse, IIxl f.Lx 1JX 2 = E[XdXz = Xz] = J.Lx, + l:l2l:iz1(Xz - f.Lx,) (2.66.a)

I is the determinant of Ix, and X is of dimension m.


and
II kx 1IX2 = l:u - l:12l:iz1l:21 (2.66.b)

2.5.2 Properties of the Multivariate Gaussian Distribution


Properties (1), (3), and (4) state that marginals, conditionals, as well as linear
1 We state next some of the important properties of the multivariate Gaussian transformations derived from a multivariate Gaussian distribution all have mul-
distribution. Proofs of these properties are given in Reference [6]. tivariate Gaussian distributions.
52 REVIEW OF PROBABILITY AND RANDOM VARIABLES RANDOM VECTORS 53

Hence Y has a trivariate Gaussian distribution with


EXAMPLE 2.15.

Suppose X is four-variate Gaussian with

m
2 0
IJ.y = Aj..tx = 1 2
[0 0

m and

21 02 00 0][6
0 3 3 3 21][2
4 2 0 1
2 OJ
and

[0 0 1 1 2 3 4 3 0 0 1
0

6 3 2 1]
3 4 3 2 2424 6]1233 001
=
[2
1 2 3 3
3 4 3
= 24 34 13
[ 6 13 13

Let (c) X 1 given X 2 = (x 3 , x 4 )T has a bivariate Gaussian distribution with

()21 + (32 21)(43 33)-1(x 3 - 1)


!
\ X1 = Xz = J.lx,Jx" = X4 - 0

(a)
(b)
Find the distribution of X 1 •
Find the distribution of
= [x, - x, + 1]
2X1 ]
x3 - 3 X4
Y = X1 + 2X2
[ x3 + x4
and

[63] [2 1] [4 3] [2 3]
(c) Find the distribution of X 1 given X 2 = (x 3 , x 4 )T.
-I
= 3 4 - 3 2 3 3 1 2
SOLUTION:
14/3 4/3]
(a) X 1 has a bivariate Gaussian distribution with [ 4/3 5/3

IJ.x, = [i] and =

(b) We can express Y as


2.5.3 Moments of Multivariate Gaussian pdf
0 0
2 0
0 1 n[!}AX Although Equation 2.65 gives the moments of a linear combination of multi-
variate Gaussian variables, there are many applications where we need to com-
pute moments such as E{XrXU, E{X1X2 X 3 X4}, and so on. These moments can
r
54 REVIEW OF PROBABILITY AND RANDOM VARIABLES
I TRANSFORMATIONS (FUNCTIONS) OF RANDOM VARIABLES 55
i
be calculated using the joint characteristic function of the multivariate Gaussian When we square the quadradic term, the only terms proportional to w1w2 w3w4
density function, which is defined by \ will be

Wz, • .. 'Wn) E{exp[j(w1X1 + WzXz + · · · wnXn)]}


81 {80'120'34W1W2W3W4
'1Jfx(W1,
+ 8a230'14W2W3W1W4 + 8az40'13W2W4W1W3}

exp [jJ.L{w - wTixw J (2.67)

Taking the partial derivative of the preceding expression and setting w (0),
where wT = (wb w2, . . . ' wn). From the joint characteristic function, the we have
moments can be obtained by partial differentiation. For example,

E{X1X2X3X4} = a120'34 + az30'14 + O'z4a13


a4'1Jfx(WJ. W2, WJ, W4)
at w = (0) (2.68) E{X1Xz}E{X3X4} + E{XzX3}E{X1X4}
E{X1XzX3X4} = aw aw aw aw4
1 2 3 + E{X2 X 4 }E{X!X3} (2.69)

To simplify the illustrative calculations, let us assume that all random variables
have zero means. Then, The reader can verify that for the zero mean case

E{XiXU = E{XI}E{XU + 2[E{X1Xz}]" (2.70)


'l'x(wJ. w2, w3, w4) exp ( wTixw)

Expanding the characteristic function as a power series prior to differentiation,


we have
2.6 TRANSFORMATIONS (FUNCTIONS) OF
RANDOM VARIABLES
1
'l'x(w 1, w2, w 3 , w4) 1 - 2 wTixw In the analysis of electrical systems we are often interested in finding the prop-
1 erties of a signal after it has been "processed" by the system. Typical processing
+ 8 (wTixwf +R operations include integration, weighted averaging, and limiting. These signal
processing operations may be viewed as transformations of a set of input variables
to a set of output variables. If the input is a set of random variables, then the
where R contains terms of w raised to the sixth and higher power. When we output will also be a set of random variables. In this section, we develop tech-
take the partial derivatives and set w1 = w2 = w3 = w4 = 0, the only nonzero niques for obtaining the probability law (distribution) for the set of output
terms come from terms proportional to w 1w2w3 w4 in random variables given the transformation and the probability law for the set
of input random variables.
The general type of problem we address is the following. Assume that X is
1 a random variable with ensemble Sx and a known probability distribution. Let
g (wTixw? !{O'nW12+ 0'2zWz2+ 0'33W32+ 0'44W42 g be a scalar function that maps each x E Sx toy = g(x). The expression
8

+ 20'12W1W2 + 20'13W1W3 + 20'14W1W4

+ 2a23W2W3 + 20'z4WzW4 + 20'34W3W4}2 y = g(X) I


:11
TRANSFORMATIONS (FUNCTIONS) OF RANDOM VARIABLES 57
56 REVIEW OF PROBABILITY AND RANDOM VARIABLES

Sample space RangesetSx C R1 Range set Sy C R1 Now, suppose that g is a continuous function and C = ( -oo, y]. If B =
{x : g(x) :5 y}, then

P(C) = P(Y::;; y) = Fy(y)


= L fx(x) dx

which gives the distribution function of Yin terms of the density function of X.
The density function of Y (if Y is a continuous random variable) can be obtained
Random variable Y by differentiating Fy(y).
As an alternate approach, suppose Iy is a small interval of length Ay con-
Figure 2.8 Transformation of a random variable. taining the pointy. Let Ix = {x : g(x) E Iy}. Then, we have

P(Y Ely) = fy(y) Ay

defines a new random variable* as follows (see Figure 2.8). For a given outcome
f..., X(t...) is a number x, and g[X(t...)] is another number specified by g(x). This
= J fx(x) dx
lx

number is the value of the random variable Y, that is, Y(t...) = y = g(x). The
ensemble S y of Y is the set
which shows that we can derive the density of Y from the density of X.
We will use the principles outlined in the preceding paragraphs to find the
Sy = {y = g(x) : x E Sx} distribution of scalar-valued as well as vector-valued functions of random vari-
ables.
We are interested in finding the probability law for Y.
The method used for identifying the probability law for Y is to equate the
probabilities of equivalent events. Suppose C C Sy. Because the function g(x)
maps Sx----? Sy, there is an equivalent subset B, B C Sx, defined by 2.6.1 Scalar-valued Function of One Random Variable
Discrete Case. Suppose X is a discrete random variable that can have one
B = {x:g(x) E C} of n values x 2 , ••• , Xn. Let g(x) be a scalar-valued function. Then Y =
g(X) is a discrete random variable that can have one of m, m :5 n, values Yv
y 2 , ••• , Ym· If g(X) is a one-to-one mapping, then m will be equal ton. However,
Now, B corresponds to event A, which is a subset of the sample spaceS (see if g(x) is a many-to-one mapping, then m will be smaller than n. The probability
Figure 2.8). It is obvious that A" maps to C and hence mass function of Y can be obtained easily from the probability mass function of
X as
P(C) P(A) P(B)
P(Y = Yi) = L P(X ;: xi)

*For Y to be a random variable, the function g : X__,. Y must have the following properties:
where the sum is over all values of xi that map to Yi·

1. Its domain must include the range of the random variable X. Continuous Random Variables. If X is a continuous random variable, then the
2. It must be a Baire function, that is, for every y, the set I, such that g(x) s y must consist pdf of Y = g(X) can be obtained from the pdf of X as follows. Let y be a
of the union and intersection of a countable number of intervals in Sx. Only then {Y :S y}
is an event.
particular value of Y and let x(!l, x(2), ... , x<kl be the roots of the equation y =
2
3. The : g(X(;I.)) = ±co} must have zero probability. g(x). That is y = g(x(ll) = ... = g(x<kl). (For example, if y = x , then the
r r
58 REVIEW OF PROBABILITY AND RANDOM VARIABLES TRANSFORMATIONS (FUNCTIONS) OF RANDOM VARIABLES 59

y g(x) where llx< 1l > 0, > 0 but 6.x{2l < 0. From the foregoing it follows that

P(y < Y < y + = P(x(ll <X< x< 1l +


+ P(x< 2l + < X < x<2l)
+ P(x<3l < X< x(3l +

We can see from Figure 2.9 that the terms in the right-hand side are given by

fy(y) P(x(ll <X< x<1l + = fx(x(ll)


P(x<2l + <X< x<2l) =

P(x(3l <X< x< 3l + = fx(x(3l)

Since the slope g'(x) of g(x) is we have


I L 1'\'i [\'\! 1\'J -:::::ooo., X

H
A xlll
H
A xl21
H
A xl31
=

=
i Figure 2.9 Transformation of a continuous random variable.
.A.x(Jl =
1
1
Hence we conclude that, when we have three roots for the equation y = g(x),
l two roots are x<l) = + vY and x< l 2
= - Vy; also see Figure 2.9 for another
example.) We know that
f x(x(ll) f x(x<Zl) f x(x< 3l)
= g'(x(ll) + \g'(x<Zl)\ + g'(x(3l)

4 P(y < Y y + = fy(y) as 0


Canceling the and generalizing the result, we have

Now if we can find the set of values of x such that y < g(x) y + then fy(y) = L
k
fx(xUl)
we can obtain fy(y) from the probability that X belongs to this set. That is (2.71)
i=l \g'(x<il)l

I
g'(x) is also called the Jacobian of the transformation and is often denoted by
P(y < Y y + = P[{x:y < g(x) y + J(x). Equation 2.71 .gives the pdf of the transformed variable Yin terms of the
pdf of X, which is given. The use of Equation 2.71 is limited by our ability to
1 find the roots of the equation y = g(x). If g(x) is highly nonlinear, then the
For the example shown in Figure 2.9, this set consists of the following three solutions of y = g(x) can be difficult to find.
intervals:

'
l x(ll < x x(ll +


EXAMPLE 2.16 .
x< l +
2 <x x<2l
Suppose X has a Gaussian distribution with a mean of 0 and variance of 1 and
) x(3l < x x<3l + Y = X 2 + 4. Find the pdf of Y.
60 REVIEW OF PROBABILITY AND RANDOM VARIABLES TRANSFORMATIONS (FUNCTIONS) OF RANDOM VARIABLES 61

SOLUTION: y = g(x) = x 2 + 4 has two roots: I fx(x)


I
IY.

and hence
x<!J =
x<ZJ=
(a)

-3
I :
IQ
-1
3 X

I
IJy=g(x)
g'(x(ll) = lL:..- - , - - - - - -
1
g'(x<z>) = (b)
I
The density function of Y is given by

fx(x(ll) fx(x< 2>) Probability Mass Probability Mass


fy(y) = lg'(x<ll)l + lg'(x<Z>)I P(Y=-l)=VJ- P(Y=ll = y,

(c)

With fx(x) given as pd{of Y; {y(y) = Y. for iyl< 1


Y/J//N////.1 y
-l
1
f x(x) = • ;;:;- exp(- x 2 /2), Figure 2.10 Transformation discussed in Example 2.17.
v2-rr

we obtain
SOLUTION: For -1 < x < 1, y = x and hence

f,(y) L, 1 " exp(- (y - 4)/2), y:2:4

y<4
fy(y) = fx(Y) =
1
6' -1 <y< 1

All the values of x > 1 map to y = 1. Since x > 1 has a probability of L the
probability that Y = 1 is equal to P(X > 1) = !. Similarly P(Y = -1) = !.
Thus, Y has a mixed distribution with a continuum of values in the interval
Note that since y = x 2 + 4, and the domain of X is ( -oo, oo), the domain of Y ( -1, 1) and a discrete set of values from the set { -1, 1}. The continuous
is [4, oo). part is characterized by a pdf and the discrete part is characterized by a prob-
ability mass function as shown in Figure 2.10.c.

EXAMPLE 2.17
2.6.2 Functions of Several Random Variables
Using the pdf of X and the transformation shown in Figure 2.10oa and 2.10.b, We now attempt to find the joint distribution of n random variables Y2 ,
find the distribution of Y. 0 Yn given the distribution of n related random variables XI> X 2 , ••
•• , 0 , Xn
r
62 REVIEW OF PROBABILITY AND RANDOM VARIABLES
TRANSFORMATIONS (FUNCTIONS) OF RANDOM VARIABLES 63
and the relationship between the two sets of random variables,
)J(x\;l, where x 2 ) is the Jacobian of the transformation defined as

Y: = Xz, ... , Xn), i = 1, 2, ... , n ag! ag!


J(xlo Xz) = I ax1
axz
ag (2.72)
2 agz
Let us start with a mapping of two random variables onto two other random
variables: ax1 axz

By summing the contribution from all regions, we obtain the joint pdf of Y1 and
Y1 = Xz) Y2 as
Yz = Xz)
f Y,.Y, ( Yz) = L f x,,x,(
k

i=1
(i)
,
.

'" .
(2.73)
Suppose (x\il, xiil), i = 1, 2, ... , k are the k roots of y 1 = x2 ) and Yz =
x 2). Proceeding along the lines of the previous section, we need to find
the region in the x 2 plane ·such that Using the vector notation, we can generalize this result to the n-variate case as

Y1 < Xz) < Y1 + Lly1


fv(Y) = L fx(xUl) (2.74.a)
1 )J(xUl))

and
where x(i) = [x\il, ... , is the ith solution toy = g(x) = [g 1(x), g 2 (x),
... , gn(x)],r and the Jacobian J is defined by
Yz < gz(xl> Xz) < Yz + Llyz
ag! ag! ... ag!
There are k such regions as shown in Figure 2.11 (k = 3). Each region consists
ax! axz axn
of a parallelogram and the area of each parallelogram is equal to Lly 1Lly/ J[x(il] = I I (2.74.b)
agn agn ... agn
ax 1 axz ax" I at xUJ

Yz xz
Suppose we have n random variables with known joint pdf, and we are
interested in the joint pdf of m < n functions of them, say

Yi = g;(x!, Xz, ... , xn), i = 1, 2, ... , m


ll.yz (x 1121, x 2121)

Now, we can define n - m additional functions

(xlnl, xzlll) Yi = gi(xb Xz, ... , Xn), j = m + 1, ... , n

L--------yl XJ

Figure 2.11 Transformation of two random variables. in any convenient way so that the Jacobian is nonzero, compute the joint pdf
of Y1 , Y2 , ••• , Yn, and then obtain the marginal pdf of Y 2 , • •• , Ym by
64 REVIEW OF PROBABILITY AND RANDOM VARIABLES TRANSFORMATIONS (FUNCTIONS) OF RANDOM VARIABLES 65

integrating out Ym+t. ... , Yn. If the additional functions are carefully chosen, We are given
then the inverse can be easily found and the resulting integration can be handled,
but often with great difficulty. 1
fx,,x,(xi, Xz) = fx,(xt)fx,(xz) = 4' 9 ::S X 1 :::; 11, 9 ::S Xz ::S 11

=0 elsewhere
EXAMPLE 2.18.

Let two resistors, having independent resistances, X 1 and X 2 , uniformly distrib- Thus
uted between 9 and 11 ohms, be placed in parallel. Find the probability density
function of resistance Y1 of the parallel combination. 1
Y1,Y2 =?
SOLUTION: The resistance of the parallel combination is
fY,,Y,(Yt. Yz) - 4 (Yz - YtF '
= 0 elsewhere

Yt =
We must now find the region in the y 1 , y 2 plane that corresponds to the region
9:::; x 1 :::; 11, 9:::; x 2 :::; 11. Figure 2.12 shows the mapping and the resulting
Introducing the variable region in the y 1 , y 2 plane.
Now to find the marginal density of Yt. we "integrate out" y 2 •

Yz = Xz
f9y1 i(9-YJ) Yz2 , dy , 1 19
h,(YI) = ) 9 4(yz - Y1) 2 4 2:::; YI:::; 4 20
and solving for x 1 and x 2 results in the unique solution
11 19 1

X!
Y1Y2
Xz = Yz
-- f ny,i(ll-y,) 4(yz - Y1 y
dyz
' 4-:sy :::;5-
20
elsewhere
I 2
Yz- Y1 = 0

Thus, Equation 2.73 reduces to

yz=xz
X2
Yz) = fx,,x, ( , Yz)/IJ(xl> Xz)l
Yz Y1 yz=9y\v-JI(9-y,)
11

D
11

where 9 9

XI
(xi + Xz)z (xl + Xz)z
J(x 1 , Xz) 0 1 41J2 4'%. 5 1/2
(xi + Xz)z 9
y 1 =x 1x 21(x 1 +x 2 )

(b)
(Yz - YI) 2 (a)

Figure 2.U Transformation of Example 2.18.


... ..·-"-'---·

66 REVIEW OF PROBABILITY AND RANDOM VARIABLES TRANSFORMATIONS (FUNCTIONS) OF RANDOM VARIABLES 67

Carrying out the integration results in The Jacobian of the transformation is

y1 - 9 _____1i2
1 19 at,! a1,2 ... a!,n
h(Yt) = --2- . + 2(9 - Yt) + Ytln 9 _Y_t 4 2 ::5 Yt ::5 4 ...
- Yt' 20 J = az,z
az,nl = /A/
19 1
11 - Yt + Ytln 11 - Yt 4 20 ::5 Yt ::5 5 2 an,! an,2 ... an,n
2 Y!
= 0 elsewhere
Substituting the preceding two equations into Equation 2.71, we obtain the pdf
of Y as

fv(Y) = fx(A -ty - A -lB)IIAII- 1 (2.76)


Special-case: Linear Transformations. One of the most frequently used type
of transformation is the affine transformation, where each of the new variables
is a linear combination of the old variables plus a constant. That is Sum of Random Variables. We consider Y1 = X 1 + X 2 where X 1 and X 2 are
independent random variables. As suggested before, let us introduce an addi-
tional function Y2 = X 2 so that the transformation is given by
yl = al,lxl + a1,2x2 + + al,nXn + b1
Yz = az,1X1 + az,zXz + + az,nXn + bz =

Yn = an,1X1 + an,zXz + ··· + an,nXn + bn


From Equation 2.76 it follows that

where the a;,/s and b;'s are all constants. In matrix notation we can write this
transformation as !Y .Y,(Yt• Yz)
1
= fx 1 ,x,(YJ - Yz, Yz)
= fx1(Yt - Yz)fx,(Yz)

a1,z

m[""
since X 1 and X 2 are independent.
az,J az,z al,n][X1] + [b1] The pdf of ¥ 1 is obtained by integration as
an,! an,2 an,n Xn bn

or
iY/YI) foo fx (Yt - 1 Yz)fx,(Yz) dyz (2.77.a)

Y=AX+B (2.75) The relationship given in Equation 2.77.a is said to be the convolution of fx 1
and fx2 , whic.h.is written symbolically as

where A is n x n, Y, X, and B are n x 1 matrices. If A is nonsingular, then


the inverse transformation exists and is given by fy 1 = fx 1 * fx, (2.77.b)

Thus, the density function of the sum of two independent random variables is
X= A-ty- A- 1B given by the convolution of their densities. This also implies that the charac-
68 REVIEW OF PROBABILITY AND RANDOM VARIABLES TRANSFORMATIONS (FUNCTIONS) OF RANDOM VARIABLES 69

teristic functions are multiplied, and the cumulant generating functions as well
as individual cumulants are summed. EXAMPLE 2.20.

Let Y = X 1 + X 2 where X 1 and X 2 are independent, and

EXAMPLE 2.19. fx 1(Xt) = exp( -xt), x 1 :2: 0; fx,(xz) = 2 exp(- 2xz), x2 :2: 0,
= 0 <0 = 0 X2 < 0.
X 1 and X 2 are independent random variables with identical uniform distributions X1

in the interval [ -1, 1]. Find the pdf of Y1 = X 1 + X 2 •


Find the pdf of Y.
SOLUTION: See Figure 2.13
SOLUTION: (See Figure 2.14)

fy(y) = J: exp( -x )2 exp[ -2(y -


1 x 1)] dx 1

2 exp( -2y) J: exp(x dx


I f <xtl
I xl = = 2 exp( -2y)[exp(y) - 1]
I 1/2 1) 1

fy(y) = 2[exp( -y) - exp[ -2y], y 0


-1
i ] X[
=0 y<O
:2:

(a)

l fx2(x2)

I 1/2

I l- J
-1
n X2 EXAMPLE 2.21.

X has an n-variate Gaussian density function with E{X;} = 0, and a covariance


(b) matrix of !x. Find the pdf of Y = AX where A is ann x n nonsingular matrix.

V//K//0\(//! Y2
-0.5

(c)

I "'-

exp(-x)

0 .5 2 Yl

(d) y X

Figure 2.13 Convolution of pdfs-Example 2.19. Figure 2.14 Convolution for Example 2.20.
r

70 REVIEW OF PROBABILITY AND RANDOM VARIABLES TRANSFORMATIONS (FUNCTIONS) OF RANDOM VARIABLES 71

SOLUTION: We are given group. We will now show that the joint pdf of Y1 , Y 2 , ••• , Yn is given by

fY1,Y1•... ,Y.(Yt• Yz, · · · , Yn) = n!fx(Yt)fx(Yz) ... fx(Yn)


fx(x) [(211')" 121Ixl 112] -1 exp [
-21 xTix 1X J a < Yt < Yz < ... < Yn < b

With x = A - 1y, and J = IAI, we obtain We shall prove this for n = 3, but the argument can be entirely general.
With n = 3

fv(Y) = [(211')"' 2 1Ixl 112]- 1exp [ IIAII- 1 Xz, x3) = fx(xt)fx(xz)fx(x3)

and the transformation is


Now if we define Iv = AixAT, then the exponent in the pdf of Y has the form
Y1 = smallest of X 2 , X 3)
Y2 = middle value of X 2 , X 3)
exp (- yTivJY)
Y3 = largest of Xz, X3)

which corresponds to a multivariate Gaussian pdf with zero means and a co- A given set of values x 2 , x 3 may fall into one of the following six possibilities:
variance matrix of Iv. Hence, we conclude that Y, which is a linear transforma-
tion of a multivariate Gaussian vector X, also has a Gaussian distribution. (Note:
This cannot be generalized for any arbitrary distribution.) x1 < Xz < X3 or Y! =XI> Yz = Xz, Y3 = X3
x1 < X3 < Xz or Yt = Xj, Yz = X3, Y3 = Xz
x2 < x 1 < X3 or Yt = Xz, Yz = Y3 = X3
x2 < x3 < x1 or Yt = Xz, Yz = X3, Y3 = Xj
Order Statistics. Ordering, comparing, and finding the minimum and maximum x3 < x1 < Xz or Yt = X3, Yz = XI> Y3 = Xz
are typical statistical or data processing operations. We can use the techniques
x3< x2 < x 1 or Y! = X3, Yz = Xz, Y3 = Xj
outlined in the preceding sections for finding the distribution of minimum and
maximum values within a group of independent random variables. (Note that Xt = Xz, etc., occur with a probability of 0 since xj, Xz, x3 are
Let XI' Xz, x3' ... 'x. be a group of independent random variables having continuous random variables.)
a common pdf, fx(x), defined over the interval (a, b). To find the distribution Thus, we have six or 3! inverses. If we take a particular inverse, say, y 1
of the smallest and largest of these X;s, let us define the following transformation: X3, Yz = x 1 , and y 3 = x 2 , the Jacobian is given by

Let Y1 = smallest of (X1 , X 2 , •.. , X.) 0 0 1


J = p o ol = 1
Y 2 = next X; in order of magnitude 0 1 0

Yn = largest of (X1 , X 2 , ••• , X.) The reader can verify that, for all six inverses, the Jacobian has a magnitude of
1, and using Equation 2.71, we obtain the joint pdf of Y1 , Y 2 , Y3 as

That is Y1 < Y 2 < ··· < Yn represent X 1 , X 2 , • • • , Xn when the latter are arranged
in ascending order of magnitude. Then Y; is called the ith order statistic of the Yz, Y3) = 3!fx(Yt)fx(Yz)fx(Y3), a < Y1 < Yz < Y3 < b
72 REVIEW OF PROBABILITY AND RANDOM VARIABLES TRANSFORMATIONS (FUNCTIONS) OF RANDOM VARIABLES 73

Generalizing this to the case of n variables we obtain SOLUTION: From Equation 2.78.b, we obtain

/y.,Y2, ••• ,Y.(Yt. Yz, · · · , Yn) = n!fx(Yt)fx(Yz) ··· fx(Yn) jy10(Y) = 10[1 - e-aY]9ae-ay,
a < Yt < Yz < ··· < Yn < b (2.78.a) = 0 y<O

The marginal pdf of Yn is obtained by integrating out Yt. Yz, ... , Yn-1•
il
Nonlinear Transformations. While it is relatively easy to find the distribution
fy.(Yn) Yn JYn-1 ... JY3
a JYz
a n!fx(YI)fx(Yz) ··· fx(Yn) dyl dyz ··· dyn-1
of Y = g(X) when g is linear or affine, it is usually very difficult to find the d
=
Ja a distribution of Y when g is nonlinear. However, if X is a scalar random variable, II
then Equation 2.71 provides a general solution. The difficulties when X is two-
dimensional are illustrated by Example 2.18, and this example suggests the il
The innermost integral on y 1 yields Fx(y 2 ), and the next integral is difficulties when X is more than two-dimensional and g is nonlinear.
For general nonlinear transformations, two approaches are common in prac-
d
y, Jy' tice. One is the Monte Carlo approach, which is outlined in the next subsection. d
J Fx(Yz)fx(Yz) dyz = Fx(Yz)d[Fx(yz)]
a a The other approach is based upon an approximation involving moments and is
presented in Section 2. 7. We mention here that the mean, the variance, and
il
[Fx(Y3)]2 higher moments of Y can be obtained easily (at least conceptually) as follows. I!
2 We start with
:i
q,
Repeating this process (n 1) times, we obtain
E{h(Y)]} = JY h(y)fy(y)dy q
fy.(yn) = n[Fx(Yn)]"- 1fx(yn), a< Yn < b (2.78.b) I!
However, Y = g(X), and hence we can compute E{h(Y)} as 'I
Proceeding along similar lines, we can show that !I
;#
Ey{h(Y)} = Ex{h(g(X)} q

!Y,(Yt) = n[l - Fx(YI)]"- fx(YJ),


1
a< Yl < b (2.78.c) d,
Since the right-hand side is a function of X alone, its expected value is i
Equations 2.78.b and 2.78.c can be used to obtain and analyze the distribution !l
of the largest and smallest among a group of random variables.
Ex{h(g(X))} = L h(g(x))fx(x) dx (2.79) I
I
EXAMPLE 2.22. {,
Using the means and covariances, we may be able to approximate the dis-
A peak detection circuit processes 10 identically distributed random samples tribution of Y as discussed in the next section.
and selects as its output the sample with the largest value. Find the pdf of the ,f
peak detector output assuming that the individual samples have the pdf Monte Carlo (Synthetic Sampling) Technique. We seek an approximation to
H
the distribution or pdf of Y when
fx(x) = ae-ax, t
= 0 x<O Y = ... , Xn)
'l
l

74 REVIEW OF PROBABILITY AND RANDOM VARIABLES TRANSFORMATIONS (FUNCTIONS) OF RANDOM VARIABLES 75

Generate 20
random numbers
and store as
Xlr·••rX2Q

<W
IV'
o17·
6£'
8£'
LE'
9£'
9£'
17£'
££'
Z£'
I£'
lOT 0£'
ZZI 6Z'
I17I
8Ll sz·
Organize
8ZZ a·
No y,s and £Z<: 9Z'
print or 691:
gz·
plot 17Z'
81£ t.z·
817£
II£ zz· .<::
lZ' u
Figure 2.15 Simple Monte Carlo simulation. 66£ o;;;·
96£
61' <ll
u
1917 81' c:
16£ n· "'<ll
8Z17 91'
88£ u
91' 0
It is assumed that Y = ... , Xn) is known and that the joint density II17 \71'
Z££
<ll

fx,.x, ..... x" is known. Now if a sample value of each random variable were known 1717£
£1' "
zr
LI£ t:i
(say X 1 = X 2 = x1.2, ... , Xn = x 1.n), then a sample value of Y could be 98Z
n· 0
·.;::
or
computed [say y 1 = g(xu. x1.2, ... , xl.n)]. If another set of sample values were 9LG 60'
::;
12:1:
chosen for the random variables (say X 1 = x 2 •1 , • • • , Xn = Xz,n), then y 2 = 061
80'
LO' e
'<;l
I9I
Xz.z, ... , Xz,n) could be computed. 9ZI
90'
90' 0
;::::
Monte Carlo techniques simply consist of computer algorithms for selecting 901 170' ell

the samples xi.!, ... , X;,n, a method for calculating y; = g(x;, 1, • • • , X;,n), which
LL
69
w u
w· <!)

often is just one or a few lines of code, and a method of organizing and displaying 99 w· c0
0
the results of a large number of repetitions of the procedure. w·- ::E
ZQ'-
Consider the case where the components of X are independent and uniformly £0'-
ell
.....0
distributed between zero and one. This is a particularly simple example because 170·-
computer routines that generate pseudorandom numbers uniformly distributed go·- E::;
90'-
between zero and one are widely available. A Monte Carlo program that ap- LO'- "'
proximates the distribution of Y when X is of dimension 20 is shown in Figure
2.15. The required number of samples is beyond the scope of this introduction. ...f'i
\C

0 0 0
0 0
s
'0
However, the usual result of a Monte Carlo routine is a histogram, and the 0 0 0 0
N ..."'
errors of histograms, which are a function of the number of samples, are discussed "' '<!'
"' 6k
in Chapter 8. sardwes jO JaqwnN fi:
If the random variable X; is not uniformly distributed between zero and one,
then random sampling is somewhat more difficult. In such cases the following
procedure is used. Select a random sample of U that is uniformly distributed
between 0 and 1. Call this random sample u 1 • Then Fx, 1(u 1) is the random sample
of X;.
76 REVIEW OF PROBABILITY AND RANDOM VARIABLES BOUNDS AND APPROXIMATIONS 77

For example, suppose that X is uniformly distributed between 10 and 20. 2. 7.1 Tchebycbeff Inequality
Then If only the mean and variance of a random variable X are known, we can obtain
upper bounds on P(!XI :2: the Tchebycheff inequality, which we prove
Fx,(x) = 0 X< 10 now. Suppose X is a random variable, and we define

g
= (x - 10)/10, 10::5 X< 20
=1 X :2: 20
Ye =
if !XI :2:
if
Notice Fx/(u) = lOu + 10. Thus, if the value .250 were the random sample
of U, then the corresponding random sample of X would be 12.5. where is a positive constant. From the definition of Y. it follows that
The reader is asked to show using Equation 2. 71 that if X; has a density
function and if X; = F;- 1( U) = g(U) where U is uniformly distributed between X2 :2: X2YE :2: €2YE
zero and one then F;- 1 is unique and

and thus
dF(x)
fx(x) = - d ' where F; = (F;-1)-1
' X
E{X 2} :2: E{X 2 Ye} :2: t:
2E{YE} (2.80)

If the random variables X; are dependent, then the samples of X 2 , • • • , Xn


are based upon the conditional density function fx,IX,, . .. , fx,!x,_,, . .. , x,· However,
The results of an example Monte Carlo simulation of a mechanical tolerance
application where Y represents clearance are shown in Figure 2.16. In this case E{Ye} = 1 · P(jXj :;;::: + 0 · P(jXj < = P(!Xj :;;::: (2.81)
Y was a somewhat complex trigonometric function of 41 dimensions on a pro-
duction drawing. The results required an assumed distribution for each of the
41 individual dimensions involved in the clearance, and all were assumed to be Combining Equations 2.80 and 2.81, we obtain the Tchebycheff inequality as
uniformly distributed between their tolerance limits. This quite nonlinear trans-
formation resulted in results that appear normal, and interference, that is, neg-
ative clearance, occurred 71 times in 8000 simulations. This estimate of the 1
P(!X! :;;::: ::::; 2 E(X 2] (2.82.a)
probability of interference was verified by results of the assembly operation. €

(Note that the foregoing inequality does not require the complete distribution
2.7 BOUNDS AND APPROXIMATIONS of X, that is, it is distribution free.)
Now, if we let X = (Y- fLy), and E = k, Equation 2.82.a takes the form
In many applications requiring the calculations of probabilities we often face
the following situations:
l
1. The underlying distributions are not completely specified-only the P(j(Y - J.Ly )j :;;::: kay) ::::; kz (2.82.b)
means, variances, and some of the higher order moments E{(X - J.Lx)k},
k > 2 are known.
2. The underlying density function is known but integration in closed form or
is not possible (example: the Gaussian pdf).

0'2
In these cases we use several approximation techniques that yield upper and/ or P(jY - fLy! :;;::: k) ::5 (2.82.c)
lower bounds on probabilities.
T BOUNDS AND APPROXIMATIONS 79
78 REVIEW OF PROBABILITY AND RANDOM VARIABLES

Equation 2.82. b gives an upper bound on the probability that a random variable 2.7.3 Union Bound
has a value that deviates from its mean by more than k times its standard I This bound is very useful in approximating the probability of union of events,
deviation. Equation 2.82.b thus justifies the use of the standard deviation as a and it follows directly from
measure of variability for any random variable.
P(A U B) = P(A) + P(B) - P(AB) :s P(A) + P(B)
2. 7.2 Chernoff Bound
since P(AB) ;;::: 0. This result can be generalized as
The Tchebycheff inequality often provides a very "loose" upper bound on prob-
abilities. The Chernoff bound provides a "tighter" bound. To derive the Chernoff
bound, define p (
'
A;) :s L P(A;)
' !
(2.84)

Ye = g X;;:: e
X< e
We now present an example to illustrate the use of these bounds.

Then, for all t ;;::: 0, it must be true that


EXAMPLE 2.23.

e'x;;::: e"Y. X 1 and X 2 are two independent Gaussian random variables with J.Lx1 J.Lx, -
0 and ai-1 = 1 and ai, = 4.

and, hence, (a) Find the Tchebycheff and Chernoff bounds on P(X1 ;;::: 3) and compare
it with the exact value of P(X1 2: 3).
(b) Find the union bound on P(X1 ;;::: 3 or X 2 ;;::: 4) and compare it with the
E{e'x} ;;::: e'•E{Y.} = e'•P(X ;;::: e)
actual value.

or SOLUTION:
(a) The Tchebycheff bound on P(X1 ;;::: 3) is obtained using Equation 2.82.c
P(X 2: e) :s e-"E{e'x}, t;;::O as

1
Furthermore, P(Xt ;;::: 3) :s P(IX1 1 ;;::: 3) :s-
9
= 0.111

P(X;;::: e) :s min e-'•E{e'x}


To obtain the Chernoff bound we start with
:s min exp[- te + ln E{e'X}] (2.83)
t?:.O

J
oo 1 2
.£{e'X'} = erxt \!2'IT' [xt/2 dx 1
Equation 2.83 is the Chernoff bound. While the advantage of the Chernoff
bound is that it is tighter than the Tchebycheff bound, the disadvantage of the
Chernoff bound is that it requires the evaluation of E{e'x} and thus requires = e''n -oo 1 exp[ -(xt - t)Z/2] dx 1
more extensive knowledge of the distribution. The Tchebycheff bound does not
require such knowledge of the distribution. = e''n
. ...-
___

80 REVIEW OF PROBABILITY AND RANDOM VARIABLES BOUNDS AND APPROXIMATIONS 81

Hence, The union bound is usually very tight when the probabilities involved
are small and the random variables are indepentlent.

P(X1 ;:: E) :S exp (- tE +

The minimum value of the right-hand side occurs with t = E and


2.7.4 Approximating the Distribution of Y =g(XH X 2 , • •• , Xn)
P(X1 ;:: E) :s e-•212 A practical approximation based on the first-order Taylor series expansion is
discussed. Consider
Thus, the Chernoff bound on P(X1 ;:: 3) is given by
Y = g(X1, Xz, ... , Xn)
P(X1 ;:: 3) :s e- 912
= 0.0111
If Y is represented by its first-order Taylor series expansion about the point t-11,
J..L,
f.l.2, . . . ,
From the tabulated values of the Q( ) function (Appendix D), we
obtain the value of P(X1 ;:: 3) as

P(X1 ;:: 3) = Q(3) = .0013


Y g(J.Lb J.Lz, • • • , t-Ln) + [:fi (J.L" J.Lz, · · · , t-Ln) J[Xi - 1-1J

then
Comparison of the exact value with the Chernoff and Tchebycheff
bounds indicates that the Tchebycheff bound is much looser than the
Chernoff bound. This is to be expected since the Tchebycheff bound q Y) = g(J.LI, J.Lz,. · · , t-Ln)
does not take into account the functional form of the pdf. = E[(Y- J.Ly)2]

= aY (J.LI, .•. , f.l.n) Jz a},


L" [ -x.
(b) P(X1 ;:: 3 or X 2 ;:: 4) •= 1 a,
= P(X1 ;:: 3) + P(X2 ;:: 4) - P(X1 ;:: 3 and X 2 ;:: 4) " " aY aY
= P(X1 ;:: 3) + P(X2 ;:: 4) - P(X1 ;:: 3)P(X2 ;:: 4)
+ L Lax. (f.l-
•=I ]=I I
1, • • • 'f.l.n) ax. (f.l.,,
1
· · · ' f.l.n)Px,x10"x,O"x1
j;<i

since X 1 and X 2 are independent. The union bound consists of the sum where
of the first two terms of the right-hand side of the preceding equation,
and the union bound is "off" by the value of the third term. Substituting
the value of these probabilities, we have f.l.i = E[Xr]
= E[Xi -
P(X1 ;:: 3 or X 2 ;:: 4) = (.0013) + (.0228) - (.0013)(.0228) _ £[(Xi - f.l.)(Xj - IJ.i)]
= .02407 Px,x1 -
O"x,O"x1

The union bound is given by


If the random variables, X 1 , • • • , X., are uncorrelated (Pxx = 0), then
the double sum is zero. ''
P(X1 ;:: 3 or X 2 ;:: 4) :s P(X1 ;:: 3) + P(X2 ;:: 4) = .0241 Furthermore, as will be explained in Section 2.8.2, the central limit theorem
I_
82 REVIEW OF PROBABILITY AND RANDOM VARIABLES 1 BOUNDS AND APPROXIMATIONS 83

suggests that if n is reasonably large, then it may not be too unreasonable to 2.1.5 Series Approximation of Probability Density Functions
assume that Y is normal if the X;s meet certain conditions.
In some applications, .such as. those that involve nonlinear transformations, it
will not be possible to calculate the probability density functions in closed form.
EXAMPLE 2.24. However, it might be easy to calculate the expected values. As an example,
consider Y = X 3 • Even if the pdf of Y cannot be specified in analytical form,
it might be possible to calculate E{Yk} = E{X3k} for k :s: m. In the following
Xr paragraphs we present a method for appwximating the unknown pdf fv(y) of
y = X2 + X3X4 - Xs2 a random variable Y whose moments E{Yk} are known. To simplify the algebra,
we will assume that E{Y} = 0 and a} = 1.
The readers have seen the Fourier series expansion for periodic functions.
The X;s are independent. A similar series approach can be used to expand probability density functions.
A commonly used and mathematically tractable series approximation is the
Gram-Charlier series, which has the form:
f.Lx, = 10 a},= 1

f.Lx, = 2 a2x,-
--
1 fv(Y) = h(y) L CiHi(Y) (2.85)
2 j=O

1
f.Lx, = 3 a2x,-
--4 where

a2x,-
--
1 1
f.Lx, = 4 3 h(y) = . ;-;;-- exp{ -y2f2) (2.86)
V21T
1
f.Lx, = 1 a2x,-
--5
and the basis functions of the expansion, Hi(y), are the Tchebycheff-Hermite
(T-H) polynomials. The first eight T-H polynomials are

Find approximately (a) f.Ly, (b) a}, and (c) P(Y :s: 20).
Ho(Y) = 1

SOLUTION: Hr(Y) = y
Hz(Y) = Y 2 - 1
10
(a) f.Ly = 2 + (3)(4) - 1 = 16 HJ(y) = y 3 - 3y
Hly) = y4 - 6y2 + 3
(b) a}= GY 1
(1) + ( - 4 °YG) + 4
2
+ 32 G) + G)
2
2
Hs(Y)
H 6(y)
= y5
= y6
- 10y 3
15y 4
+ 15y
+ 45y 2 - 15
= 11.2 -

H-,(y) = y7 -
5
21y + 105y3- 105y
(c) With only five terms in the approximate linear equation, we assume, H 8(y) = y 8
- 28y 6 + 210y 4 - 420y 2 + 105 (2.87)
for an approximation, that Y is normal. Thus

!.2 1 and they have the following properties:


P(Y :s: 20) =
f-x
• ;-;;-- exp(- z 2 /2) dz = 1 - Q(1.2) = .885
V 21T
d(Hk_ 1 (y)h(y))
1. Hk(y)h(y) = k2::1
dy
84 REVIEW OF PROBABILITY AND RANDOM VARIABLES BOUNDS AND APPROXIMA T/ONS 85

2. Hk(Y) - yHk-!(Y) + (k - 1)Hk_z(y) = 0, k?:.2 Substituting Equation 2.89 into Equation 2.85 we obtain the series expansion

3. r"' Hm(y)Hn(y)h(y) dy = 0, m 7'= n


(2.88)
for the ydf of a random variable in terms of the moments of the random variable
and the T-H polynomials.
The Gram-Charlier series expansion for the pdf of a random variable X with
= n!, m = n mean f1x and variance a"} has the form:
The coefficients of the series expansion are evaluated by multiplying both
sides of Equation 2.85 by Hk(y) and integrating from -oo to oo. By virtue of the.
orthogonality property given in Equation 2.88, we obtain fx(x) = _ 1 _ exp [ - (x -
.yz;<Tx 2<T x
f CiHi (x - f1x)
i=O <Tx
(2.90)

Ck = k!
1 I"' -oo Hk(y)fy(y)dy
where the coefficients Ci are given by Equation 2.89 with f1k used for f1k where

= k!
1 [ k[Z]
f1k - (2)1! f1k-2
k[ 4]
+ 222! f.Lk-4 - .. ·] (2.89.a)

where
11£ = E {[x :xf1xr}
f1m = E{Ym}
EXAMPLE 2.25.
and
For a random variable X
k'
k[m] = · = k(k - 1) ··· [k - (m - 1)], k?:. m
(k- m)! f11 = 3, f12 = 13, f13 = 59, f14 = 309
The first eight coefficients follow directly from Equations 2.87 and 2.89.a and
are given by Find P(X ::s 5) using four terms of a Gram-Charlier series.

SOLUTION:
Co= 1
c1 = 111
<Ti = E(X 2) - [E(X)F = f12 - f1t = 4
1
C2 = 2 (f12 - 1)
Converting to the standard normal form
1
c3 = (f13 - 3f11)
6
1 Z=X-3
c4 = C114 - 6112 + 3) 2
24
1
Cs = (f1s - 10!13 + 15!1 1)
120 Then the moments of Z are
1
c6 = (f16 - 1s114 + 45112 - 15)
720 f1i = 0 = 1
1 f13 - 9f1z + 27!1 1 - 27
c7 = 5040 ( f17 - 21 f1s + 105 f13 - 105 f1J) -.5
1 _
f13 - =
8
1 f14 - 12f13 + 54f12 - 108f11 + 81
Cs =
40320
(f1s - 28f16 + 21011 4 - 420f1 2 + 105) (2.89.b) 1
f14 = 16 = 3.75
-- - . "._. .........

86 REVIEW OF PROBABILITY AND RANDOM VARIABLES BOUNDS AND APPROXIMATIONS 87

Then for the random variable Z, using Equation 2.89, we .add more terms, the higher ordei terms will force the pdf to take a more
proper shape.
Co= 1 A series of th·e form given in Equation 2.90 is useful only if it converges
rapidly and the terms can be calculated easily. This is true for the Gram-Charlier
C1 = 0 series when the underlying pdf is nearly Gaussian or when the random variable
C2 = 0 X is the sum of many independent components. Unfortunately, the Gram-
Charlier series is not uniformly convergent, thus adding more terms does not
c3 = 6 c- .5) =
1
-.08333 guarantee increased accuracy. A rule of thumb suggests four to six terms for
many practical applications.
c4 = 241 (3.75 - 6 + 3) .03125

2.7.6 Approximations of Gaussian Probabilities


Now P(X s 5) = P(Z s 1)
The Gaussian pdf plays an important role in probability theory. Unfortunately,
this pdf cannot be integrated in closed form. Several approximations have been
= foo vk exp(- z /2) [#a CiHi(z) Jdz
2 developed for evaluating

I 1 Jl 1
f
oo
\12; exp(- Z 2 12) dz +
I
= -oo -oo ( - .0833)h(z)H3(z) dz Q(y) = • ;;:;- exp( -x 2 /2) dx
y V21T

+ roo .03125h(z)H (z) dz 4


and are given in the Handbook of Mathematical functions edited by Abramowitz
and Stegun (pages 931-934). For large values of y, (y > 4), an approximation
for Q(y) is
Using the property (1) the T-H polynomials yields
2
P(Z s 1) 1 exp (-
Q(y) = \12;y 2y ) (2.9l.a)
= .8413 + .0833h(1)H2 (1) - .03125h(1)H3 (1)

= .8413 + .0833 vk exp ( (0) - .03125 vk exp ( ( -2) For 0 s y, the following approximation is excellent as measured by le(y)l, the
magnitude of the error.
= .8413 + .0151 = .8564
Q(y) = h(y)(b 1t + b2 t2 + b3t 3 + b4t 4 + b5 t5 ) + e(y) (2.9l.b)

where
Equation 2.90 is a series approximation to the pdf of a random variable X
whose moments are known. If we know only the first two moments, then the 1
series approximation reduces to h(y) = • ;;:;- exp( -y 2 /2)
V21T

1
- 1 t=-- b 2 = - .356563782
f x(x) = _ r.:- exp(- (x - 1 + PY
2Tiax
le(y)l < 7.5 X 10- 8 b3 = 1.781477937

which says that (if only the first and second moments of a random variable are p = .2316419 b4 = -1.821255978
known) the Gaussian pdf is used as an approximation to the underlying pdf. As bl = .319381530 b5 = 1.330274429
88 REVIEW OF PROBABILITY AND RANDOM VARIABLES SEQUENCES OF RANDOM VARIABLES AND CONVERGENCE 89

2.8 SEQUENCES OF RANDOM VARIABLES . . . , converges for every A E S, then we say that the random sequence converges
AND CONVERGENCE everywhere. The limit of each sequence can depend upon >.., and if we denote
the limit by X, then X is a random variable.
One of the most important concepts in mathematical analysis is the concept of Now, there may be cases where the sequence does not converge for every
convergence and the existence of a limit. Fundamental operations of calculus outcome. In such cases if the set of outcomes for which the limit exists has a
such as differentiation, integration, and summation of infinite series are defined probability of 1, that is, if
by means of a limiting process. The same is true in many engineering applications,
for example, the steady state of a dynamic system or the asymptotic trajectory P{l\. : lim Xn(>..) = X(l\.)} = 1
of a moving object. It is similarly useful to study the convergence of random ,.....,
sequences.
With real continuous functions, we use the notation
then we say that the sequence converges almost everywhere or almost surely.
This is written as
x(t) _,. a as t _,. t0 or lim x(t) = a
t->to
P{Xn -i> X} = 1 as n _,. co (2.92)

to denote that x(t) converges to a as t approaches t0 where tis continuous. The


corresponding statement for t a discrete variable is
2.8.2 Convergence in Distribution and Central Limit Theorem
x(t,) -i> a as t, _,. t 0 or lim x(t,) = a
Let Fn(x) and F(x) denote the distribution functions of Xn and X, respectively.
If
for any discrete sequence such that
Fn(x) _,. F(x) as n _,. co (2.93)
tn _,. t0 as n _,. oo
for all x at which F(x) is continuous, then we say that the sequence Xn converges
in distribution to X.
With this remark in mind, let us proceed to investigate the convergence of
sequences of random variables, or random sequences. A random sequence is Central Limit Theorem. Let X 1 , X 2 , • • • , Xn be a sequence of independent,
denoted by XI> X 2 , • • • , Xn, .... For a specific outcome, >.., Xn(l\.) = Xn is a 2
identically distributed random variables, each with mean f.l. and variance cr • Let
sequence of numbers that might or might not converge. The concept of con-
vergence of a random sequence may be concerned with the convergence of
n
individual sequences, Xn(l\.) = Xn, or the convergence of the probabilities of
some sequence of events determined by the entire ensemble of sequences or Zn = 2: (X; -
i=l
both. Several definitions and criteria are used for determining the convergence
of random sequences, and we present four of these criteria.
Then Zn has a limiting (as n -i> co) distribution that is Gaussian with mean 0 and
variance 1.
The central limit theorem can be ptollled follows. Suppose we assume that
2.8.1 Convergence Everywhere and Almost Everywhere the moment-generating function M(t) of Xk exists for ltl <h. Then the function
For every outcome A, we have a sequence of numbers m(t)

X1(A.), Xz(l\.), ... , Xn(>..), ... m(t) E{exp[t(Xk - f.!.)]} = exp(- f.l.l)M(t)

and hence the random sequence X 1 , X 2 , • • • , Xn represents a family of se- exists for -h < t <h. Furthermore, since Xk has a finite mean and variance,
quences. If each member of the family converges to a limit, that is, X 1(l\.), X 2 (A.), the first two derivatives of M(t) and hence the derivatives of m(t) exist at t =
........ - - - - - - - - - - - - - - - - - - - - - - - - - ........... ..

90 REVIEW OF PROBABILITY AND RANDOM VARIABLES SEQUENCES OF RANDOM VARIABLES AND CONVERGENCE 91

0. We can use Taylor's formula and expand m(t) as (The last step follows from the familiar formula of calculus 1im,_..oo[1 + a!n]" =
ea). Since exp(T1 /2) is the moment-generating function of a Gaussian random
variable withO .mean .and variance 1., and since the moment-generating function
m(t) = m(O) + m'(O)t + 0 ::s <t
uniquely determines the underlying pdf at all points of continuity, Equation
a 2t 2 - a 2 ]t2 2.94 shows that Zn converges to a Gaussian distribution with 0 mean and vari-
= 1 + - + "---'-"-'---::-----"--
2 2 ance 1.
In many engineering applications, the central limit theorem and hence the
Next consider Gaussian pdf play an important role. For example, the output of a linear system
is a weighted sum of the input values, and if the input is a sequence of random
variables, then the output can be approximated by a Gaussian distribution.
Mn(T) = E{exp(TZn)} Another example is the total nois.e in a radio link that can be modeled as the

= E { exp ( T X1aVn
- J.L) exp (T XzaVn
- J.L) · · · exp (T XnaVn
- J.L)}
sum of the contributions from a large number of independent sources. The
central limit theorem permits us to model the total noise by a Gaussian distri-
bution.
E{exp ... E{exp (Tx;VnJ.L)} We had assumed that X;'s are independent and identically distributed and

[ E { exp ( T :V,t)} r that the moment-generating function exists in order to prove the central limit
theorem. The theorem, however, holds under a variety of weaker conditions
(Reference [6]):

[m CV,;) r. -h < -
aVn
7
- < h 1. The random variables X 1 , X 2 , ••• , in the original sequence are inde-
pendent with the same mean and variance but not identically distributed.
2. X 1 , X 2 , • • • , are independent with different means, same variance, and
In m(t), replace t by T/(aVn) to obtain not identically distributed.
3. Assume X 1 , X 2 , X 3 , • • • are independent and have variances ay,
a5, .... If there exist positive constants E and such that E < ar <
m(-T-) = 1 + ..:.=_ + .!o.. [m___,"(--"-'0'---..,. a_z-2_]T_z for all i, then the distribution of the standardized sum converges to the
aVn 2n 2na 2
standard Gaussian; this says in particular that the variances must exist
and be neither too large nor too small.
where now is between 0 and T/(aVn). Accordingly,
The assumption of finite variances, however, is essential for the central limit
theorem to hold.
Mn(T) = {1 + Tz + [m"(O - a2]T2}" T
2n 2na 2 '
0 ::s < aVn Finite Sums. The central limit theorem states that an infinite sum, Y, has a
normal distribution. For a finite sum of independent random variables, that is,
Since m"(t) is continuous at t = 0 and since 0 as n oo, we have

Y = 2: xi
lim[ m"(O - a 2] = 0 i=l

then
and

fY = j X1 * j X 2 * • · · * j X,
lim Mn(T) = lim { 1 + -T2}n n
2n
rz--x; ft-"J''.:G
ljly(w) = IT lJ!x,(w)
= exp(T 2/2) (2.94)
92 REVIEW OF PROBABILITY AND RANDOM VARIABLES SEQUENCES OF RANDOM VARIABLES AND CONVERGENCE 93

and 7.0

Cy(w) = 2: Cx,(w) 6.0


...-+-.

where 'I' is the characteristic function and Cis the cumulant-generating function.
5.0
'(/
b
i7
\, Normal
)/approximation
I
I
Also, if K; is the ith cumulant where K; is the coefficient of (jw)i/i! in a power
J 1\v' Exact

7
series expansion of C, then it follows that 4.0
..
n 3.0
Ky
l, = "'
LJ
j=l
K;x.
'1

2.0
\
)
I

v
and in particular the first cumulant is the mean, thus 1.0 I

\ ,j
0 ./
fLy = L fl.x, 9.70 9.75 9.80 9.85 9.90 9.95 10.00 10.05 10.10 10.15 10.20 10.25
i==l X

Figure 2.17 Density and approximation for Example 2.26.

and the second cumulant is the variance

n
EXAMPLE 2.26.
a} = 2: a_k,
i=l
Find the resistance of a circuit consisting of five independent resistances in series.
All resistances are assumed to have a uniform density function between 1. 95
and the third cumulant, K 3 ,x is E{(X - fl.x) 3}, thus and 2.05 ohms (2 ohms ± 2.5% ). Find the resistance of the series combination
and compare it with the normal approximation.
n
SOLUTION: The exact density is found by four convolutions of uniform density
E{(Y- fLy)3} = 2: E{(X; - fl.xY}
functions. The.mean value of each resistance is 2 and the standard deviation is
i=l
(20 \13) -t. The exact density function of the resistance of the series circuit is
plotted in Figure 2.17 along with the normal density function, which has the
and K 4 ,x is E{(X - fl.x) 4} - 3 K 2,x, thus same mean (10) and the same variance (1/240). Note the close correspondence.

n n
K4,Y = L
i=l
K4,x, = 2: (E{(X -
i=l
fl.x) 4} - 3Kz,x)

2.8.3 Convergence in Probability (in Measure) and the Law of Large


For finite sums the normal distribution is often rapidly approached; thus a Numbers
Gaussian approximation or a Gram-Charlier approximation is often appropriate. The probability P{jX - Xnl > e} of the event {jX - Xnl > e} is a sequence of
The following example illustrates the rapid approach to a normal distribution. numbers depending on n and E. If this sequence tends to zero as n-? oo, that
94 REVIEW OF PROBABILITY AND RANDOM VARIABLES SUMMARY 95

is, if

P{iX - Xnl > E} 0 as n oo


Xn-.X
almost everywhere
,--- Xn-.X
in probability
- Xn-.X
in distribution

for any E > 0, then we say that Xn converges to the random variable X in
probability. This is also called stochastic convergence. An important application
of convergence in probability is the law of large numbers.
l
Xn-.X
in mean square
Law of Large Numbers. Assume that X 2 , • • • , Xn is a sequence of in-
dependent random variables each with mean f.l. and variance 0' 2• Then, if we
define Figure 2.18 Relationship between various modes of convergence.

1 n
Xn =-"X
£... 1 (2.95.a)
n i=l
For random sequences the following version of the Cauchy criterion applies.
lim P{iXn tJ.I 2: E} = 0 for each E > 0 (2.95.b)
E{(Xn - X)l} 0 as n oo

The law of large numbers can be proved directly by using Tchebycheff's ine-
quality. if and only if

E{IXn+"' - X.l 2} 0 as n oo for any m > 0 (2.97)


2.8.4 Convergence in Mean Square
A sequence Xn is said to converge in mean square if there exists a random
variable X (possibly a constant) such that 2.8.5 Relationship between Different Forms of Convergence
The relationship between various modes of convergence is shown in Figure 2.18.
E[(Xn - X) 2
] 0 as n oo (2.96) If a sequence converges in MS sense, then it follows from the application of
Tchebycheff's inequality that the sequence also converges in probability. It can
also be shown that almost everywhere convergence implies convergence in prob-
If Equation 2.96 holds, then the random variable X is called the mean square ability, which in turn implies convergence in distribution.
limit of the sequence Xn and we use the notation

l.i.m. Xn =X

2.9 SUMMARY
where l.i.m. is meant to suggest the phrase limit in mean (square) to distinguish
it from the symbol lim for the ordinary limit of a sequence of numbers. The reviews of probability, random variables, distribution function, probabil-
Although the verification of some modes of convergences is difficult to es- ity mass function (fOT discrete random variables), and probability density
tablish, the Cauchy criterion can be used to establish conditions for mean-square functions (for continuous random variables) were brief, as was the review of
convergence. For deterministic sequences the Cauchy criterion establishes con- expected value. Four particularly useful expected values were briefly dis-
vergence of Xn to x without actually requiring the value of the limit, that is, x. cussed: the characteristic function E{exp(jwX)}; the moment generating func-
In the deterministic case, Xn x if
tion E{exp(tX)}; the cumulative generating function In E{exp(tX)}; and the
probability generating function E{zx} (non-negative integer-valued random
lxn+m - Xni 0 as n oo for any m> 0 variables).
PROBLEMS 97
96 REVIEW OF PROBABILITY AND RANDOM VARIABLES

The review of random vectors, that is, vector random variables, extended the
ideas of marginal, joint, and conditional density functions to n dimensions, and
vector notation was introduced. Multivariate normal random variables were
emphasized.

Transformations of random variables were reviewed. The special cases of a
function of one random variable and a sum (or more generally an affine
transformation) of random variables were considered. Order statistics were
considered as a special transformation. The difficulty of a general nonlinear
transformation was illustrated by an example, and the Monte Carlo technique
was introduced.

We reviewed the following bounds: the Tchebycheff inequality, the Chernoff
bound, and the union bound. We also discussed the Gram-Charlier series
approximation to a density function using moments. Approximating the
distribution of Y = g(X1, . . . , Xn) using a linear approximation with the first
two moments was also reviewed. Numerical approximations to the Gaussian
distribution function were suggested.

Limit concepts for sequences of random variables were introduced. Convergence
almost everywhere, in distribution, in probability, and in mean square were
defined. The central limit theorem and the law of large numbers were introduced.
Finite sum convergence was also discussed.

These concepts will prove to be essential in our study of random signals.

2.10 REFERENCES

The material presented in this chapter was intended as a review of probability and random
variables. For additional details, the reader may refer to one of the following books.
Reference [2], particularly Vol. I, has become a classic text for courses in probability
theory. References [8] and the first edition of [7] are widely used for courses in applied
probability taught by electrical engineering departments. References [1], [3], and [10]
also provide an introduction to probability from an electrical engineering perspective.
Reference [4] is a widely used text for statistics and the first five chapters are an excellent
introduction to probability. Reference [5] contains an excellent treatment of series
approximations and cumulants. Reference [6] is written at a slightly higher level and
presents the theory of many useful applications. Reference [9] describes a theory of
probable reasoning that is based on a set of axioms that differs from those used in
probability.

[1] A. M. Breipohl, Probabilistic Systems Analysis, John Wiley & Sons, New York, 1970.
[2] W. Feller, An Introduction to Probability Theory and Its Applications, Vols. I, II, John Wiley & Sons, New York, 1957, 1967.
[3] C. H. Helstrom, Probability and Stochastic Processes for Engineers, Macmillan, New York, 1977.
[4] R. V. Hogg and A. T. Craig, Introduction to Mathematical Statistics, Macmillan, New York, 1978.
[5] M. Kendall and A. Stuart, The Advanced Theory of Statistics, Vol. 1, 4th ed., Macmillan, New York, 1977.
[6] H. L. Larson and B. O. Shubert, Probabilistic Models in Engineering Sciences, Vol. I, John Wiley & Sons, New York, 1979.
[7] A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-Hill, New York, 1984.
[8] P. Z. Peebles, Jr., Probability, Random Variables, and Random Signal Principles, 2nd ed., McGraw-Hill, New York, 1987.
[9] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, N.J., 1976.
[10] J. B. Thomas, An Introduction to Applied Probability and Random Processes, John Wiley & Sons, New York, 1971.

2.11 PROBLEMS

2.1 Suppose we draw four cards from an ordinary deck of cards. Let
    A1: an ace on the first draw
    A2: an ace on the second draw
    A3: an ace on the third draw
    A4: an ace on the fourth draw
    a. Find P(A1 ∩ A2 ∩ A3 ∩ A4) assuming that the cards are drawn with
       replacement (i.e., each card is replaced and the deck is reshuffled after a
       card is drawn and observed).
    b. Find P(A1 ∩ A2 ∩ A3 ∩ A4) assuming that the cards are drawn without
       replacement.

2.2 A random experiment consists of tossing a die and observing the number of
    dots showing up. Let
    A1: number of dots showing up = 3
    A2: even number of dots showing up
    A3: odd number of dots showing up
    a. Find P(A1) and P(A1 ∩ A3).
    b. Find P(A2 ∪ A3), P(A2 ∩ A3), P(A1|A3).
    c. Are A2 and A3 disjoint?
    d. Are A2 and A3 independent?
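
For joint events built up draw by draw, as in Problem 2.1, the probability follows
from the chain rule P(A1A2A3A4) = P(A1)P(A2|A1)P(A3|A1A2)P(A4|A1A2A3). The short
Python sketch below simply carries out that arithmetic for the with-replacement
and without-replacement cases; it is a numerical illustration of the chain rule,
not a substitute for the derivation.

    from math import prod

    # With replacement: the four draws are independent and P(ace) = 4/52 on each draw
    p_with = (4 / 52) ** 4

    # Without replacement: chain rule, with one fewer ace and one fewer card each draw
    p_without = prod((4 - k) / (52 - k) for k in range(4))

    print(f"with replacement:    {p_with:.3e}")
    print(f"without replacement: {p_without:.3e}")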


2.3 A box contains three 100-ohm resistors labeled R1, R2, and R3 and two
    1000-ohm resistors labeled R4 and R5. Two resistors are drawn from this box
    without replacement.
    a. List all the outcomes of this random experiment. [A typical outcome may be
       listed as (R1, R5) to represent that R1 was drawn first followed by R5.]
    b. Find the probability that both resistors are 100-ohm resistors.
    c. Find the probability of drawing one 100-ohm resistor and one 1000-ohm
       resistor.
    d. Find the probability of drawing a 100-ohm resistor on the first draw and a
       1000-ohm resistor on the second draw.
    Work parts (b), (c), and (d) by counting the outcomes that belong to the
    appropriate events.

2.4 With reference to the random experiment described in Problem 2.3, define the
    following events.
    A1: 100-ohm resistor on the first draw
    A2: 1000-ohm resistor on the first draw
    B1: 100-ohm resistor on the second draw
    B2: 1000-ohm resistor on the second draw
    a. Find P(A1B1), P(A2B1), and P(A2B2).
    b. Find P(A1), P(A2), P(B1|A1), and P(B1|A2). Verify that
       P(B1) = P(B1|A1)P(A1) + P(B1|A2)P(A2).

2.5 Show that:
    a. P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(AB) - P(BC) - P(CA) + P(ABC).
    b. P(A|B) = P(A) implies P(B|A) = P(B).
    c. P(ABC) = P(A)P(B|A)P(C|AB).

2.6 A1, A2, A3 are three mutually exclusive and exhaustive sets of events
    associated with a random experiment E1. Events B1, B2, and B3 are mutually
    exclusive and exhaustive sets of events associated with a random experiment
    E2. The joint probabilities of occurrence of these events and some marginal
    probabilities are listed in the table:

                 B1       B2       B3
       A1       3/36      *       5/36
       A2       5/36     4/36     5/36
       A3        *       6/36      *
       P(Bj)   12/36    14/36      *

    a. Find the missing probabilities (*) in the table.
    b. Find P(B3|A1) and P(A1|B3).
    c. Are events A1 and B1 statistically independent?

2.7 There are two bags containing mixtures of blue and red marbles. The first bag
    contains 7 red marbles and 3 blue marbles. The second bag contains 4 red
    marbles and 5 blue marbles. One marble is drawn from bag one and transferred
    to bag two. Then a marble is taken out of bag two. Given that the marble drawn
    from the second bag is red, find the probability that the color of the marble
    transferred from the first bag to the second bag was blue.

2.8 In the diagram shown in Figure 2.19, each switch is in a closed state with
    probability p, and in the open state with probability 1 - p. Assuming that the
    state of one switch is independent of the state of another switch, find the
    probability that a closed path can be maintained between A and B. (Note: There
    are many closed paths between A and B.)

    Figure 2.19 Circuit diagram for Problem 2.8.

2.9 The probability that a student passes a certain exam is .9, given that he
    studied. The probability that he passes the exam without studying is .2.
    Assume that the probability that the student studies for an exam is .75 (a
    somewhat lazy student). Given that the student passed the exam, what is the
    probability that he studied?

2.10 A fair coin is tossed four times and the faces showing up are observed.
    a. List all the outcomes of this random experiment.
    b. If X is the number of heads in each of the outcomes of this experiment,
       find the probability mass function of X.
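
The sample space in Problem 2.10 is small enough to enumerate by machine; the
sketch below (Python, illustrative only) lists the 2^4 = 16 equally likely
outcomes and tabulates the probability mass function of the number of heads.

    from itertools import product
    from collections import Counter

    # All 16 equally likely outcomes of four tosses of a fair coin
    outcomes = list(product("HT", repeat=4))

    # X = number of heads in each outcome; count how often each value of X occurs
    counts = Counter(outcome.count("H") for outcome in outcomes)

    for x in sorted(counts):
        print(f"P(X = {x}) = {counts[x]}/{len(outcomes)}")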

2.11 Two dice are tossed. Let X be the sum of the numbers showing up. Find the
    probability mass function of X.

2.12 A random experiment can terminate in one of three events A, B, or C with
    probabilities 1/2, 1/4, and 1/4, respectively. The experiment is repeated
    three times. Find the probability that events A, B, and C each occur exactly
    one time.

2.13 Show that the mean and variance of a binomial random variable X are
    μ_X = np and σ_X² = npq, where q = 1 - p.

2.14 Show that the mean and variance of a Poisson random variable are μ_X = λ
    and σ_X² = λ.

2.15 The probability mass function of a geometric random variable has the form
       P(X = k) = p q^(k-1),  k = 1, 2, 3, ...;  p, q > 0,  p + q = 1.
    a. Find the mean and variance of X.
    b. Find the probability-generating function of X.

2.16 Suppose that you are trying to market a digital transmission system (modem)
    that has a bit error probability of 10^-4 and the bit errors are independent.
    The buyer will test your modem by sending a known message of 10^4 digits and
    checking the received message. If more than two errors occur, your modem will
    be rejected. Find the probability that the customer will buy your modem.

2.17 The input to a communication channel is a random variable X and the output
    is another random variable Y. The joint probability mass functions of X and Y
    are listed:

                        X
                 -1      0      1
          -1    1/4      0      0
      Y    0     0      1/2     0
           1     0       0     1/4

    a. Find P(Y = 1|X = 1).
    b. Find P(X = 1|Y = 1).
    c. Find ρ_XY.

2.18 Show that the expected value operator has the following properties.
    a. E{a + bX} = a + bE{X}
    b. E{aX + bY} = aE{X} + bE{Y}
    c. Variance of aX + bY = a² Var[X] + b² Var[Y] + 2ab Covar[X, Y]

2.19 Show that E_{X,Y}{g(X, Y)} = E_X{E_{Y|X}[g(X, Y)]} where the subscripts
    denote the distributions with respect to which the expected values are
    computed.

2.20 A thief has been placed in a prison that has three doors. One of the doors
    leads him on a one-day trip, after which he is dumped on his head (which
    destroys his memory as to which door he chose). Another door is similar except
    he takes a three-day trip before being dumped on his head. The third door
    leads to freedom. Assume he chooses a door immediately and with probability
    1/3 when he has a chance. Find his expected number of days to freedom.
    (Hint: Use conditional expectation.)

2.21 Consider the circuit shown in Figure 2.20. Let the time at which the ith
    switch closes be denoted by Xi. Suppose X1, X2, X3, X4 are independent,
    identically distributed random variables each with distribution function F.
    As time increases, switches will close until there is an electrical path from
    A to C. Let
       U = time when circuit is first completed from A to B
       V = time when circuit is first completed from B to C
       W = time when circuit is first completed from A to C
    Find the following:
    a. The distribution function of U.
    b. The distribution function of W.
    c. If F(x) = x, 0 ≤ x ≤ 1 (i.e., uniform), what are the mean and variance of
       Xi, U, and W?

    Figure 2.20 Circuit diagram for Problem 2.21.
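
As a numerical aside on Problem 2.16: with independent errors the number of
errors in the test message is binomial with n = 10^4 and p = 10^-4, and since
np = 1 a Poisson approximation is also natural. The sketch below computes the
acceptance probability P(two or fewer errors) both ways; the variable names are
illustrative.

    from math import comb, exp, factorial

    n, p = 10_000, 1e-4        # number of test bits and bit error probability
    lam = n * p                # Poisson parameter, lambda = np = 1

    # Exact binomial probability of accepting the modem (at most two errors)
    p_binomial = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(3))

    # Poisson approximation to the same probability
    p_poisson = sum(exp(-lam) * lam**k / factorial(k) for k in range(3))

    print(f"binomial: {p_binomial:.6f}   Poisson approximation: {p_poisson:.6f}")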




2.22 Prove the following inequalities.
    a. (E{XY})² ≤ E{X²}E{Y²}  (Schwartz or cosine inequality)
    b. √E{(X + Y)²} ≤ √E{X²} + √E{Y²}  (triangle inequality)

2.23 Show that the mean and variance of a random variable X having a uniform
    distribution in the interval [a, b] are μ_X = (a + b)/2 and σ_X² = (b - a)²/12.

2.24 X is a Gaussian random variable with μ_X = 2 and σ_X² = 9. Find
    P(-4 < X ≤ 5) using tabulated values of Q( ).

2.25 X is a zero mean Gaussian random variable with a variance of σ_X². Show that
       E{X^n} = (σ_X)^n · 1 · 3 · 5 ··· (n - 1),   n even
              = 0,                                 n odd

2.26 Show that the characteristic function of a random variable can be expanded as
       Ψ_X(ω) = Σ_k [(jω)^k / k!] E{X^k}
    (Note: The series must be terminated by a remainder term just before the
    first infinite moment, if any exist.)

2.27 a. Show that the characteristic function of the sum of two independent
       random variables is equal to the product of the characteristic functions
       of the two variables.
    b. Show that the cumulant generating function of the sum of two independent
       random variables is equal to the sum of the cumulant generating functions
       of the two variables.
    c. Show that Equations 2.52.c through 2.52.f are correct by equating
       coefficients of like powers of jω in Equation 2.52.b.

2.28 The probability density function of a Cauchy random variable is given by
       f_X(x) = α / [π(x² + α²)],   α > 0,   -∞ < x < ∞
    a. Find the characteristic function of X.
    b. Comment about the first two moments of X.

2.29 The joint pdf of random variables X and Y is
       f_{X,Y}(x, y) = 1/2,   0 ≤ x ≤ y,  0 ≤ y ≤ 2
    a. Find the marginal pdfs, f_X(x) and f_Y(y).
    b. Find the conditional pdfs f_{X|Y}(x|y) and f_{Y|X}(y|x).
    c. Find E{X|Y = 1} and E{X|Y = 0.5}.
    d. Are X and Y statistically independent?
    e. Find ρ_XY.

2.30 The joint pdf of two random variables is
       f_{X1,X2}(x1, x2) = 1,   0 ≤ x1 ≤ 1,  0 ≤ x2 ≤ 1
    Let Y1 = X1X2 and Y2 = X1.
    a. Find the joint pdf f_{Y1,Y2}(y1, y2); clearly indicate the domain of y1, y2.
    b. Find f_{Y1}(y1) and f_{Y2}(y2).
    c. Are Y1 and Y2 independent?

2.31 X and Y have a bivariate Gaussian pdf given in Equation 2.57.
    a. Show that the marginals are Gaussian pdfs.
    b. Find the conditional pdf f_{X|Y}(x|y). Show that this conditional pdf has a mean
          E{X|Y = y} = μ_X + ρ (σ_X/σ_Y)(y - μ_Y)
       and a variance
          σ_X²(1 - ρ²)

2.32 Let Z = X + Y - c, where X and Y are independent random variables with
    variances σ_X² and σ_Y² and c is a constant. Find the variance of Z in terms
    of σ_X², σ_Y², and c.

2.33 X and Y are independent zero mean Gaussian random variables with variances
    σ_X² and σ_Y². Let
       Z = ½(X + Y)  and  W = ½(X - Y)
    a. Find the joint pdf f_{Z,W}(z, w).
    b. Find the marginal pdf f_Z(z).
    c. Are Z and W independent?
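
Gaussian probabilities of the kind asked for in Problem 2.24 reduce to
differences of Q-function values, and since Q(x) = ½ erfc(x/√2) the tabulated
values are easy to check numerically. A minimal Python sketch, using the numbers
of Problem 2.24 purely as an example:

    from math import erfc, sqrt

    def Q(x: float) -> float:
        # Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z
        return 0.5 * erfc(x / sqrt(2.0))

    mu, sigma = 2.0, 3.0          # mean and standard deviation (variance = 9)
    a, b = -4.0, 5.0

    # P(a < X <= b) = Q((a - mu)/sigma) - Q((b - mu)/sigma)
    prob = Q((a - mu) / sigma) - Q((b - mu) / sigma)
    print(f"P({a} < X <= {b}) = {prob:.4f}")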

2.34 X1, X2, ..., Xn are n independent zero mean Gaussian random variables with
    equal variances, σ_Xi² = σ². Show that
       Z = (1/n)[X1 + X2 + ··· + Xn]
    is a Gaussian random variable with μ_Z = 0 and σ_Z² = σ²/n. (Use the result
    derived in Problem 2.32.)

2.35 X is a Gaussian random variable with mean 0 and variance σ_X². Find the pdf
    of Y if:
    a. Y = X²
    b. Y = |X|
    c. Y = ½[X + |X|]
    d. Y =  1  if X > σ_X
            0  if |X| ≤ σ_X
           -1  if X < -σ_X

2.36 X is a zero-mean Gaussian random variable with a variance σ_X². Let Y = aX².
    a. Find the characteristic function of Y, that is, find
          Ψ_Y(ω) = E{exp(jωY)} = E{exp(jωaX²)}
    b. Find f_Y(y) by inverting Ψ_Y(ω).

2.37 X1 and X2 are two identically distributed independent Gaussian random
    variables with zero mean and variance σ_X². Let
       R = √(X1² + X2²)
    and
       Θ = tan⁻¹[X2/X1]
    a. Find f_{R,Θ}(r, θ).
    b. Find f_R(r) and f_Θ(θ).
    c. Are R and Θ statistically independent?

2.38 X1 and X2 are two independent random variables with uniform pdfs in the
    interval [0, 1]. Let
       Y1 = X1 + X2  and  Y2 = X1 - X2
    a. Find the joint pdf f_{Y1,Y2}(y1, y2) and clearly identify the domain where
       this joint pdf is nonzero.
    b. Find ρ_{Y1Y2} and E{Y1|Y2 = 0.5}.

2.39 X1 and X2 are two independent random variables each with the following
    density function:
       f_{Xi}(x) = e^(-x),  x > 0
                 = 0,       x ≤ 0
    Let Y1 = X1 + X2 and Y2 = X1/(X1 + X2).
    a. Find the joint pdf f_{Y1,Y2}(y1, y2).
    b. Find f_{Y1}(y1), f_{Y2}(y2) and show that Y1 and Y2 are independent.

2.40 X1, X2, X3, ..., Xn are n independent Gaussian random variables with zero
    means and unit variances. Let
       Y = Σ_{i=1}^{n} Xi²
    Find the pdf of Y.

2.41 X is uniformly distributed in the interval [-π, π]. Find the pdf of
    Y = a sin(X).

2.42 X is multivariate Gaussian with mean vector m and covariance matrix Σ_X.
    Find the mean vector and the covariance matrix of Y = [Y1, Y2, Y3]^T, where
       Y1 = X1 - X2
       Y2 = X1 + X2 - 2X3
       Y3 = X1 + X3

2.43 X is a four-variate Gaussian with

       μ_X = [0 0 0 0]^T   and   Σ_X = [4 3 2 1
                                        3 4 3 2
                                        2 3 4 3
                                        1 2 3 4]

    Find E{X1 | X2 = 0.5, X3 = 1.0, X4 = 2.0} and the variance of X1 given
    X2 = X3 = X4 = 0.

2.44 Show that a necessary condition for Σ_X to be a covariance matrix is that
       a^T Σ_X a ≥ 0   for all a
    (This is the condition for positive semidefiniteness of a matrix.)


2.45 Consider three 3 × 3 matrices A, B, and C. Which of the three matrices can
    be covariance matrices?

2.46 Suppose X is an n-variate Gaussian with zero means and a covariance matrix
    Σ_X. Let λ1, λ2, ..., λn be n distinct eigenvalues of Σ_X and let V1, V2,
    ..., Vn be the corresponding normalized eigenvectors. Show that
       Y = AX
    where
       A = [V1, V2, V3, ..., Vn]^T
    has an n-variate Gaussian density with zero means and covariance matrix
       Σ_Y = diag(λ1, λ2, ..., λn)

2.47 X is bivariate Gaussian with mean vector μ_X and covariance matrix Σ_X.
    a. Find the eigenvalues and eigenvectors of Σ_X.
    b. Find the transformation Y = [Y1, Y2]^T = AX such that the components of Y
       are uncorrelated.

2.48 If U(x) ≥ 0 for all x and U(x) > a > 0 for all x ∈ I, where I is some
    interval, show that
       P[U(X) ≥ a] ≤ (1/a) E{U(X)}

2.49 Plot the Tchebycheff and Chernoff bounds as well as the exact values for
    P(X ≥ a), a > 0, if X is
    a. Uniform in the interval [0, 1].
    b. Exponential, f_X(x) = exp(-x), x > 0.
    c. Gaussian with zero mean and unit variance.

2.50 Compare the Tchebycheff and Chernoff bounds on P(Y ≥ a) with exact values
    for the Laplacian pdf
       f_Y(y) = ½ exp(-|y|)

2.51 In a communication system, the received signal Y has the form
       Y = X + N
    where X is the "signal" component and N is the noise. X can have one of
    eight values shown in Figure 2.21, and N has an uncorrelated bivariate
    Gaussian distribution with zero means and variances σ_N². The signal X and
    noise N can be assumed to be independent.
    The receiver observes Y and determines an estimated value X̂ of X according
    to the algorithm
       if Y ∈ A_i then X̂ = x_i
    The decision regions A_i for i = 1, 2, 3, ..., 8 are illustrated by A_1 in
    Figure 2.21. Obtain an upper bound on P(X̂ ≠ X) assuming that P(X = x_i) = 1/8
    for i = 1, 2, ..., 8.
    Hint:
    1. P(X̂ ≠ X) = Σ_{i=1}^{8} P(X̂ ≠ X | X = x_i) P(X = x_i)
    2. Use the union bound.

    Figure 2.21 Signal values and decision regions for Problem 2.51. (The eight
    signal values satisfy |x_i| = 1, with the angle of x_i equal to (i - 1)π/4;
    for example, x_2 = (1/√2, 1/√2).)
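
For a case like Problem 2.49(b), both bounds have simple closed forms. With
μ_X = σ_X = 1 for the unit exponential, Tchebycheff gives P(X ≥ a) ≤ 1/(a - 1)²
for a > 1, while minimizing the Chernoff bound e^(-sa) E{e^(sX)} = e^(-sa)/(1 - s)
over 0 < s < 1 gives a e^(1-a). The Python sketch below compares them with the
exact tail e^(-a); the values of a are arbitrary.

    from math import exp

    # Unit exponential: f_X(x) = exp(-x), x > 0, so mean = 1 and variance = 1
    for a in [2.0, 3.0, 4.0, 5.0]:
        exact = exp(-a)                       # P(X >= a) = e^{-a}
        tchebycheff = 1.0 / (a - 1.0) ** 2    # sigma^2 / (a - mu)^2 with mu = sigma = 1
        chernoff = a * exp(1.0 - a)           # e^{-sa}/(1 - s) minimized at s = 1 - 1/a
        print(f"a = {a}:  exact {exact:.4f}   Tchebycheff {tchebycheff:.4f}   Chernoff {chernoff:.4f}")
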
2.52 Show that the Tchebycheff-Hermite polynomials satisfy
       (-1)^k d^k h(y)/dy^k = H_k(y) h(y),   k = 1, 2, ...

2.53 X has a triangular pdf centered in the interval [-1, 1]. Obtain a
    Gram-Charlier approximation to the pdf of X that includes the first six
    moments of X and sketch the approximation for values of X ranging from -2
    to 2.

2.54 Let p be the probability of obtaining heads when a coin is tossed. Suppose
    we toss the coin N times and form an estimate of p as
       p̂ = N_H / N
    where N_H = number of heads showing up in N tosses. Find the smallest value
    of N such that
       P[|p̂ - p| ≥ 0.01p] ≤ 0.1
    (Assume that the unknown value of p is in the range 0.4 to 0.6.)

2.55 X1, X2, ..., Xn are n independent samples of a continuous random variable
    X, that is,
       f_{X1,X2,...,Xn}(x1, x2, ..., xn) = Π_{i=1}^{n} f_X(xi)
    Assume that μ_X = 0 and σ_X² is finite.
    a. Find the mean and variance of
          X̄ = (1/n) Σ_{i=1}^{n} Xi
    b. Show that X̄ converges to 0 in MS, that is, l.i.m. X̄ = 0.

2.56 Show that if the Xi's are of continuous type and independent, then for
    sufficiently large n the density of sin(X1 + X2 + ··· + Xn) is nearly equal
    to the density of sin(X), where X is a random variable with uniform
    distribution in the interval (-π, π).

2.57 Using the Cauchy criterion, show that a sequence Xn tends to a limit in the
    MS sense if and only if E{XmXn} exists as m, n → ∞.

2.58 A box has a large number of 1000-ohm resistors with a tolerance of ±100
    ohms (assume a uniform distribution in the interval 900 to 1100 ohms).
    Suppose we draw 10 resistors from this box and connect them in series, and
    let R be the resistive value of the series combination. Using the Gaussian
    approximation for R, find
       P[9000 ≤ R ≤ 11000]

2.59 Let
       Yn = (1/n) Σ_{i=1}^{n} Xi
    where Xi, i = 1, 2, ..., n are statistically independent and identically
    distributed random variables each with a Cauchy pdf
       f_X(x) = (α/π) / (x² + α²)
    a. Determine the characteristic function of Yn.
    b. Determine the pdf of Yn.
    c. Consider the pdf of Yn in the limit as n → ∞. Does the central limit
       theorem hold? Explain.

2.60 Y is a Gaussian random variable with zero mean and unit variance and
       Xn = sin(Y/n)  if Y > 0
          = cos(Y/n)  if Y ≤ 0
    Discuss the convergence of the sequence Xn. (Does the sequence converge, and
    if so, in what sense?)

2.61 Let Y be the number of dots that show up when a die is tossed, and let
       Xn = exp[-n(Y - 3)]
    Discuss the convergence of the sequence Xn.

2.62 Y is a Gaussian random variable with zero mean and unit variance and
       Xn = exp(-Y/n)
    Discuss the convergence of the sequence Xn.
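
The point of Problem 2.59 can also be seen empirically: the characteristic
function shows that the sample mean of independent Cauchy variables has exactly
the same Cauchy distribution for every n, so it never settles down, whereas the
sample mean of a finite-variance random variable does. A short simulation sketch
(the seed and sample sizes are arbitrary choices made for illustration):

    import numpy as np

    rng = np.random.default_rng(1)

    # Sample means of iid Cauchy variables keep fluctuating as n grows,
    # while sample means of iid uniform variables converge toward the true mean 0.
    for n in [10, 1_000, 100_000]:
        cauchy_mean = rng.standard_cauchy(n).mean()
        uniform_mean = rng.uniform(-1.0, 1.0, n).mean()
        print(f"n = {n:7d}   Cauchy mean {cauchy_mean:10.3f}   Uniform mean {uniform_mean:8.4f}")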
