Chapter 2 - Probability & Random Variables

CHAPTER TWO

Review of Probability and Random Variables

functions and density functions are developed. We then discuss summary measures (or expected values) that frequently prove useful in characterizing random variables.

Vector-valued random variables (or random vectors, as they are often referred to) and methods of characterizing them are introduced in Section 2.5. Various multivariate distribution and density functions that form the basis of probability models for random vectors are presented.

As electrical engineers, we are often interested in calculating the response of a system for a given input. Procedures for calculating the details of the probability model for the output of a system driven by a random input are developed in Section 2.6.

In Section 2.7, we introduce inequalities for computing probabilities, which are often very useful in many applications because they require less knowledge about the random variables. A series approximation to a density function based on some of its moments is introduced, and an approximation to the distribution of a random variable that is a nonlinear function of other (known) random variables is presented.

Convergence of sequences of random variables is the final topic introduced in this chapter. Examples of convergence are the law of large numbers and the central limit theorem.
2.1 INTRODUCTION
The purpose of this chapter is to provide a review of probability for those PROBABILITY
1
'I
;\
electrical engineering students who have already completed a course in prob-
ability. We assume that course covered at least the material that is presented
here in Sections 2.2 through 2.4. Thus, the material in these sections is partic-
ularly brief and includes very few examples. Sections 2.5 through 2.8 may or
In this section we outline mathematical techniques for describing the results of
an experiment whose outcome is not known in advance. Such an experiment is
called a random experiment. The mathematical approach used for studying the
'l may not have been covered in the prerequisite course; thus, we elaborate more results of random experiments and random phenomena is called probability
in these sections. Those aspects of probability theory and random variables used theory. We begin our review of probability with some basic definitions and
in later chapters and in applications are emphasized. The presentation in this axioms.
chapter relies heavily on intuitive reasoning rather than on mathematical rigor.
.;1
A bulk of the proofs of statements and theorems are left as exercises for the
J reader to complete. Those wishing a detailed treatment of this subject are re-
'l
CJ 2.2.1 Set Definitions
·1 ferred to several well-written texts listed in Section 2.10.
i
''J We begin our review of probability and random variables with an introduction A set is defined to be a collection of elements. Notationally, capital letters A,
to basic sets and set operations. We then define probability measure and review B, ... , will designate sets; and the small letters a, b, ... , will designate
'-}
the two most commonly used probability measures. Next we state the rules elements or members of a set. The symbol, E, is read as "is an element of,"
,j governing the calculation of probabilities and present the notion of multiple or and the symbol, fl., is read "is not an element of." Thus x E A is read "xis an
;l joint experiments and develop the rules governing the calculation of probabilities element of A."
associated with joint experiments. Two special sets are of some interest. A set that has no elements is called
The concept of random variable is introduced next. A random variable is the empty set or null set and will be denoted by A set having at least one
characterized by a probabilistic model that consists of (1) the probability space, element is called nonempty. The whole or entire space S is a set that contains
I
(2) the set of values that the random variable can have, and (3) a rule for all other sets under consideration in the problem.
.A computing the probability that the random variable has a value that belongs to A set is countable if its elements can be put into one-to-one correspondence
l a subset of the set of all permissible values. The use of probability distribution with the integers. A countable set that has a finite number of elements and the
Subset. The notation

A ⊂ B

or equivalently

B ⊃ A

is read A is contained in B, or A is a subset of B, or B contains A. Thus A is contained in B, or A ⊂ B, if and only if every element of A is an element of B. There are three results that follow from the foregoing definitions. For an arbitrary set A,

∅ ⊂ A
A ⊂ A
A ⊂ S

Set Equality. Two arbitrary sets, A and B, are called equal if and only if they contain exactly the same elements, or equivalently, each is a subset of the other:

A = B if and only if A ⊂ B and B ⊂ A

Union. The union of two arbitrary sets, A and B, is written as

A ∪ B

and is the set of all elements that belong to A or belong to B (or to both). The union of N sets is obtained by repeated application of the foregoing definition and is denoted by

A_1 ∪ A_2 ∪ ··· ∪ A_N = ⋃_{i=1}^{N} A_i

Mutually Exclusive. Two sets are called mutually exclusive (or disjoint) if they have no common elements; that is, two arbitrary sets A and B are mutually exclusive if

A ∩ B = AB = ∅

where ∅ is the null set. The sets A_1, A_2, ..., A_n are called mutually exclusive if A_i ∩ A_j = ∅ for i ≠ j.

Complement. The complement, Ā, of a set A relative to S is defined as the set of all elements of S that are not in A.

Let S be the whole space and let A, B, C be arbitrary subsets of S. The following results can be verified by applying the foregoing definitions. Note that the operator precedence is (1) parentheses, (2) complement, (3) intersection, and (4) union.

Commutative Laws.

A ∪ B = B ∪ A
A ∩ B = B ∩ A

Associative Laws.

(A ∪ B) ∪ C = A ∪ (B ∪ C) = A ∪ B ∪ C
(A ∩ B) ∩ C = A ∩ (B ∩ C) = A ∩ B ∩ C

Distributive Laws.
DeMorgan's Laws.

\overline{(A ∪ B)} = Ā ∩ B̄
\overline{(A ∩ B)} = Ā ∪ B̄

A probability measure P satisfies the following axioms:

1. P(S) = 1    (2.1)

2. P(A) ≥ 0 for all A ⊂ S    (2.2)

3. P(⋃_{k=1}^{N} A_k) = Σ_{k=1}^{N} P(A_k)    (2.3)

   if A_i ∩ A_j = ∅ for i ≠ j, and N may be infinite (∅ is the empty or null set)

2.2.2 Sample Space
When applying the concept of sets in the theory of probability, the whole space will consist of elements that are outcomes of an experiment. In this text an experiment is a sequence of actions that produces outcomes (that are not known in advance). This definition of experiment is broad enough to encompass the usual scientific experiment and other actions that are sometimes regarded as observations.

The totality of all possible outcomes is the sample space. Thus, in applications of probability, outcomes correspond to elements and the sample space corresponds to S, the whole space. With these definitions an event may be defined as a collection of outcomes. Thus, an event is a set, or subset, of the sample space. An event A is said to have occurred if the experiment results in an outcome that is an element of A.

For mathematical reasons, one defines a completely additive family of subsets of S to be events, where the class 𝒮 of sets defined on S is called completely additive if

1. S ∈ 𝒮

2. If A_k ∈ 𝒮 for k = 1, 2, 3, ..., then ⋃_{k=1}^{n} A_k ∈ 𝒮 for n = 1, 2, 3, ...

3. If A ∈ 𝒮, then Ā ∈ 𝒮, where Ā is the complement of A

A random experiment is completely described by a sample space, a probability measure (i.e., a rule for assigning probabilities), and the class of sets forming the domain set of the probability measure. The combination of these three items is called a probabilistic model.

By assigning numbers to events, a probability measure distributes numbers over the sample space. This intuitive notion has led to the use of probability distribution as another name for a probability measure. We now present two widely used definitions of the probability measure.

Relative Frequency Definition. Suppose that a random experiment is repeated n times. If the event A occurs n_A times, then its probability P(A) is defined as the limit of the relative frequency n_A/n of the occurrence of A. That is,

P(A) = lim_{n→∞} n_A/n    (2.4)

For example, if a coin (fair or not) is tossed n times and heads show up n_H times, then the probability of heads equals the limiting value of n_H/n.
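The relative-frequency definition suggests a simple numerical experiment. The sketch below is my own illustration (the true bias 0.3 and the sample sizes are arbitrary choices); it estimates P(heads) as n_H/n for increasing n, in the sense of Equation 2.4:

```python
import random

def relative_frequency(p_heads, n, seed=1):
    """Estimate P(heads) as n_H / n over n simulated tosses of a biased coin."""
    rng = random.Random(seed)
    n_heads = sum(rng.random() < p_heads for _ in range(n))
    return n_heads / n

# The estimate settles toward the true P(heads) as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(0.3, n))
```

With a fixed seed the run is reproducible; the point is that the fluctuation of n_H/n about .3 shrinks as n grows, which is exactly what the limit in Equation 2.4 asserts.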
...inition as: the probability of an event A consisting of N_A outcomes equals the ratio N_A/N, where N is the total number of equally likely outcomes. The difference between these two definitions is illustrated by Example 2.1.

EXAMPLE 2.1. (Adapted from Shafer [9]).

DIME-STORE DICE: Willard H. Longcor of Waukegan, Illinois, reported in the late 1960s that he had thrown a certain type of plastic die with drilled pips over one million times, using a new die every 20,000 throws because the die wore down. In order to avoid recording errors, Longcor recorded only whether the outcome of each throw was odd or even, but a group of Harvard scholars who analyzed Longcor's data and studied the effects of the drilled pips in the die guessed that the chances of the six different outcomes might be approximated by the relative frequencies in the following table:

Up face               1      2      3      4      5      6     Total
Relative frequency  .155   .159   .164   .169   .174   .179    1.000
Classical           1/6    1/6    1/6    1/6    1/6    1/6     1.000

They obtained these frequencies by calculating the excess of even over odd in Longcor's data and supposing that each side of the die is favored in proportion to the extent that it has more drilled pips than the opposite side. The 6, since it is opposite the 1, is the most favored.

2. For an arbitrary event A,

   P(Ā) = 1 - P(A)    (2.8)

4. If A is a subset of B, that is, A ⊂ B, then

   P(A) ≤ P(B)    (2.9)

5. P(A ∪ B) = P(A) + P(B) - P(A ∩ B)    (2.10.a)

6. P(A ∪ B) ≤ P(A) + P(B)    (2.10.b)

7. If A_1, A_2, ..., A_n are random events such that

   A_i ∩ A_j = ∅ for i ≠ j    (2.10.c)

   and

   A_1 ∪ A_2 ∪ ··· ∪ A_n = S    (2.10.d)

   then

   P(A) = P[(A ∩ A_1) ∪ (A ∩ A_2) ∪ ··· ∪ (A ∩ A_n)]
        = P(A ∩ A_1) + P(A ∩ A_2) + ··· + P(A ∩ A_n)    (2.10.e)

   The sets A_1, A_2, ..., A_n are said to be mutually exclusive and exhaustive if Equations 2.10.c and 2.10.d are satisfied.

8. P(A_1 A_2 ··· A_n) = P(A_1) P(A_2|A_1) P(A_3|A_1A_2) ··· P(A_n | ⋂_{i=1}^{n-1} A_i)    (2.11)

Proofs of these relationships are left as an exercise for the reader.
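Because the classical model assigns equal probability to every outcome, several of the rules above can be checked by direct enumeration. A small sketch I am adding (the events A and B are my own choices for illustration):

```python
from fractions import Fraction

S = set(range(1, 7))                           # outcomes of one die toss
def P(E):                                      # classical measure: N_A / N
    return Fraction(len(E & S), len(S))

A = {2, 4, 6}                                  # "even"
B = {4, 5, 6}                                  # "greater than 3"

print(P(S - A) == 1 - P(A))                    # Equation 2.8
print(P(A | B) == P(A) + P(B) - P(A & B))      # Equation 2.10.a
parts = [{1, 3, 5}, {2, 4, 6}]                 # mutually exclusive and exhaustive
print(P(B) == sum(P(B & Ai) for Ai in parts))  # Equation 2.10.e
```

All three comparisons hold exactly because `Fraction` keeps the arithmetic rational rather than floating point.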
2.2.5 Joint, Marginal, and Conditional Probabilities

S_2 of E_2 consists of outcomes b_1, b_2, ..., b_{n_2}, then the sample space S of the combined experiment is the Cartesian product of S_1 and S_2. That is,

S = S_1 × S_2 = {(a_i, b_j): i = 1, 2, ..., n_1; j = 1, 2, ..., n_2}

We can define probability measures on S_1, S_2, and S = S_1 × S_2. If events A_1, A_2, ..., A_n are defined for the first subexperiment and the events B_1, B_2, ..., B_m are defined for the second subexperiment E_2, then event A_iB_j is an event of the total experiment.

Joint Probability. The probability of an event such as A_i ∩ B_j that is the intersection of events from subexperiments is called the joint probability of the event and is denoted by P(A_i ∩ B_j). The abbreviation A_iB_j is often used to denote A_i ∩ B_j.

the occurrence of event B_j (a capacitor on the second draw) on the second subexperiment is conditional on the occurrence of event A_i (the component drawn first) on the first subexperiment. We denote the probability of event B_j given that event A_i is known to have occurred by the conditional probability P(B_j|A_i).

An expression for the conditional probability P(B|A) in terms of the joint probability P(AB) and the marginal probabilities P(A) and P(B) can be obtained as follows using the classical definition of probability. Let N_A, N_B, and N_AB be the number of outcomes belonging to events A, B, and AB, respectively, and let N be the total number of outcomes in the sample space. Then,

P(A) = N_A/N,  P(B) = N_B/N,  P(AB) = N_AB/N    (2.13)

Given that the event A has occurred, we know that the outcome is in A. There are N_A outcomes in A. Now, for B to occur given that A has occurred, the outcome should belong to A and B. There are N_AB outcomes in AB. Thus, the probability of occurrence of B given A has occurred is

P(B|A) = N_AB/N_A = (N_AB/N)/(N_A/N) = P(AB)/P(A)    (2.14)

Conditional probabilities satisfy the following relationships:

2. If AB = ∅, then P(A ∪ B|C) = P(A|C) + P(B|C)    (2.16)

3. P(ABC) = P(A)P(B|A)P(C|AB)  (Chain Rule)    (2.17)

4. If B_1, B_2, ..., B_n are a set of mutually exclusive and exhaustive events, then

   P(A) = Σ_{j=1}^{n} P(A|B_j)P(B_j)    (2.18)
EXAMPLE 2.2.

The following table gives the number of components in a batch of 530, classified by manufacturer and by class of defect:

                          Class of Defect
              B_1     B_2       B_3      B_4     B_5
Manufacturer  none    critical  serious  minor   incidental  Totals
M_1           124     6         3        1       6           140
M_2           145     2         4        0       9           160
M_3           115     1         2        1       1           120
M_4           101     2         0        5       2           110
Totals        485     11        9        7       18          530

What is the probability of a component selected at random from the 530 components (a) being from manufacturer M_2 and having no defects, (b) having a critical defect, (c) being from manufacturer M_2, (d) having a critical defect given the component is from manufacturer M_2, (e) being from manufacturer M_1, given that the component has a critical defect?

SOLUTION:

(a) This is a joint probability and is found by assuming that each component is equally likely to be selected. There are 145 components from M_2 having no defects out of a total of 530 components. Thus

P(M_2B_1) = 145/530

(b) This calls for a marginal probability:

P(B_2) = P(M_1B_2) + P(M_2B_2) + P(M_3B_2) + P(M_4B_2)
       = 6/530 + 2/530 + 1/530 + 2/530 = 11/530

Note that P(B_2) can also be found in the bottom margin of the table, that is, P(B_2) = 11/530.

(c) P(M_2) = 160/530

(d) P(B_2|M_2) = 2/160, or by the formal definition, Equation 2.14,

P(B_2|M_2) = P(B_2M_2)/P(M_2) = (2/530)/(160/530) = 2/160

(e) P(M_1|B_2) = (6/530)/(11/530) = 6/11

Bayes' Rule. Thomas Bayes applied Equations 2.15 and 2.18 to arrive at the form

P(B_i|A) = P(A|B_i)P(B_i) / Σ_{j=1}^{m} P(A|B_j)P(B_j)    (2.19)

which is used in many applications and particularly in interpreting the impact of additional information A on the probability of some event, P(B_i). An example illustrates another application of Equation 2.19, which is called Bayes' rule.

EXAMPLE 2.3.

A binary communication channel is a system that carries data in the form of one of two types of signals, say, either zeros or ones. Because of noise, a transmitted zero is sometimes received as a one and a transmitted one is sometimes received as a zero.
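The table computations above can be reproduced by treating the counts as a joint frequency table. This is a sketch I am adding; `fractions.Fraction` keeps the answers exact:

```python
from fractions import Fraction

# Rows: manufacturers M1..M4; columns: defect classes B1 (none) .. B5 (incidental)
counts = {
    "M1": [124, 6, 3, 1, 6],
    "M2": [145, 2, 4, 0, 9],
    "M3": [115, 1, 2, 1, 1],
    "M4": [101, 2, 0, 5, 2],
}
N = sum(sum(row) for row in counts.values())            # 530 components in all

def P_joint(m, j):                                      # P(M_m B_j), j = 1..5
    return Fraction(counts[m][j - 1], N)

P_B2 = sum(P_joint(m, 2) for m in counts)               # part (b): marginal
P_M2 = Fraction(sum(counts["M2"]), N)                   # part (c)
P_B2_given_M2 = P_joint("M2", 2) / P_M2                 # part (d): Equation 2.14
P_M1_given_B2 = P_joint("M1", 2) / P_B2                 # part (e): Bayes' rule

# Equal to 145/530, 11/530, 2/160, and 6/11, matching the worked solution.
print(P_joint("M2", 1), P_B2, P_B2_given_M2, P_M1_given_B2)
```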
We assume that for a certain binary communication channel, the probability that a transmitted zero is received as a zero is .95 and the probability that a transmitted one is received as a one is .90. We also assume the probability that a zero is transmitted is .4. Find

(a) the probability that a one is received, and
(b) the probability that a one was transmitted given a one was received.

SOLUTION: Defining

A = one transmitted
Ā = zero transmitted
B = one received
B̄ = zero received

we have, from the problem statement,

P(A) = .6,  P(B|A) = .90,  P(B̄|Ā) = .95,  P(B|Ā) = .05

(a) With the use of Equation 2.18,

P(B) = P(B|A)P(A) + P(B|Ā)P(Ā)
     = .90(.6) + .05(.4) = .56

(b) Using Bayes' rule, Equation 2.19,

P(A|B) = P(B|A)P(A)/P(B) = (.90)(.6)/.56 = 27/28

Statistical Independence. Suppose that A_i and B_j are events associated with the outcomes of two experiments. Suppose that the occurrence of A_i does not influence the probability of occurrence of B_j and vice versa. Then we say that the events are statistically independent (sometimes, we say probabilistically independent or simply independent). More precisely, we say that two events A_i and B_j are statistically independent if

P(A_iB_j) = P(A_i)P(B_j)    (2.20.a)

Equation 2.20.a implies Equation 2.20.b and conversely. Observe that statistical independence is quite different from mutual exclusiveness. Indeed, if A_i and B_j are mutually exclusive, then P(A_iB_j) = 0 by definition.

2.3 RANDOM VARIABLES

It is often useful to describe the outcome of a random experiment by a number, for example, the number of telephone calls arriving at a central switching station in an hour, or the lifetime of a component in a system. The numerical quantity associated with the outcomes of a random experiment is called loosely a random variable. Different repetitions of the experiment may give rise to different observed values for the random variable. Consider tossing a coin ten times and observing the number of heads. If we denote the number of heads by X, then X takes integer values from 0 through 10, and X is called a random variable.

Formally, a random variable is a function whose domain is the set of outcomes λ ∈ S and whose range is the real line. For every outcome λ ∈ S, the random variable assigns a number, X(λ), such that

1. The set {λ: X(λ) ≤ x} is an event for every x ∈ R^1.

2. The probabilities of the events {λ: X(λ) = ∞} and {λ: X(λ) = -∞} equal zero, that is,

   P(X = ∞) = P(X = -∞) = 0

Thus, a random variable maps S onto a set of real numbers S_X ⊂ R^1, where S_X is the range set that contains all permissible values of the random variable. Often S_X is also called the ensemble of the random variable. This definition guarantees that to every set A ⊂ S there corresponds a set T ⊂ R^1 called the image (under X) of A. Also, for every (Borel) set T ⊂ R^1 there exists in S the inverse image X^{-1}(T), where

X^{-1}(T) = {λ ∈ S: X(λ) ∈ T}

and this set is an event which has a probability, P[X^{-1}(T)].

We will use uppercase letters to denote random variables and lowercase letters to denote fixed values of the random variable (i.e., numbers).

Thus, the random variable X induces a probability measure on the real line as follows:
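Example 2.3 can also be checked by simulating the channel and counting relative frequencies. This is my own sketch; the sample size and seed are arbitrary:

```python
import random

p_one_tx = 0.6          # P(A), from the example
p_rx1_given_tx1 = 0.90  # P(B|A)
p_rx1_given_tx0 = 0.05  # P(B|A-bar)
rng = random.Random(7)

n = 200_000
n_rx1 = n_tx1_rx1 = 0
for _ in range(n):
    tx_one = rng.random() < p_one_tx
    p_rx1 = p_rx1_given_tx1 if tx_one else p_rx1_given_tx0
    rx_one = rng.random() < p_rx1
    n_rx1 += rx_one
    n_tx1_rx1 += tx_one and rx_one

print("P(B)   =", n_rx1 / n)             # theory: .56
print("P(A|B) =", n_tx1_rx1 / n_rx1)     # theory: 27/28, about .964
```

The conditional probability is estimated exactly the way Equation 2.14 defines it: as the joint relative frequency divided by the marginal relative frequency.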
[Figure 2.1: The mapping X(λ) performed by the random variable X, taking the die outcomes "Up face is 1" through "Up face is 6" to the points 1, 2, ..., 6 on the real line.]

[Figure 2.2: Distribution function of the random variable X shown in Figure 2.1.]

EXAMPLE 2.4.

Consider the toss of one die. Let the random variable X represent the value of the up face. The mapping performed by X is shown in Figure 2.1. The values of the random variable are 1, 2, 3, 4, 5, 6.
2.3.1 Distribution Functions

The probability P(X ≤ x) is also denoted by the function F_X(x), which is called the distribution function of the random variable X. Given F_X(x), we can easily compute such quantities as P(X > x_1), P(x_1 ≤ X ≤ x_2), and so on.

A distribution function has the following properties:

1. F_X(-∞) = 0
2. F_X(∞) = 1
3. lim_{ε→0, ε>0} F_X(x + ε) = F_X(x)
4. F_X(x_1) ≤ F_X(x_2) if x_1 < x_2
5. P[x_1 < X ≤ x_2] = F_X(x_2) - F_X(x_1)

EXAMPLE 2.5.

Consider the toss of a fair die. Plot the distribution function of X, where X is a random variable that equals the number of dots on the up face.

Joint Distribution Function. We now consider the case where two random variables are defined on a sample space. For example, both the voltage and current might be of interest in a certain experiment.

The probability of the joint occurrence of two events such as A and B was called the joint probability P(A ∩ B). If the event A is the event (X ≤ x) and the event B is the event (Y ≤ y), then the joint probability is called the joint distribution function of the random variables X and Y; that is,

F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y)

From this definition it can be noted that

F_{X,Y}(-∞, -∞) = 0,  F_{X,Y}(-∞, y) = 0,  F_{X,Y}(∞, y) = F_Y(y),
F_{X,Y}(x, -∞) = 0,  F_{X,Y}(x, ∞) = F_X(x),  F_{X,Y}(∞, ∞) = 1    (2.21)

A random variable may be discrete or continuous. A discrete random variable can take on only a countable number of distinct values. A continuous random variable can assume any value within one or more intervals on the real line. Examples of discrete random variables are the number of telephone calls arriving
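For the fair die of Example 2.5, the distribution function is a staircase that can be written down directly. A small sketch I am adding, which also exercises properties 1, 2, 4, and 5:

```python
import math
from fractions import Fraction

def F_X(x):
    """F_X(x) = P(X <= x) for X = number of dots on a fair die."""
    return Fraction(min(max(math.floor(x), 0), 6), 6)

print(F_X(-2), F_X(3.5), F_X(10))   # 0, 1/2, 1 (properties 1 and 2 in the limits)
print(F_X(4) - F_X(2))              # P(2 < X <= 4) = 1/3 (property 5)
assert all(F_X(a) <= F_X(b) for a, b in [(-1, 0.5), (2.5, 3), (5, 9)])  # property 4
```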
values based on the outcome of the underlying random experiment. The probability that X = x_i is denoted by P(X = x_i) for i = 1, 2, ..., n, and is called the probability mass function.

The probability mass function of a random variable has the following important properties:

1. P(X = x_i) > 0, i = 1, 2, ..., n    (2.22.a)

2. Σ_{i=1}^{n} P(X = x_i) = 1    (2.22.b)

3. P(X ≤ x) = F_X(x) = Σ_{all x_i ≤ x} P(X = x_i)    (2.22.c)

EXAMPLE 2.6.

Consider the toss of a fair die. Plot the probability mass function.

SOLUTION: See Figure 2.3.

[Figure 2.3: Probability mass function for Example 2.6 (number of dots showing up on a die).]

Two Random Variables-Joint, Marginal, and Conditional Distributions and Independence. It is of course possible to define two or more random variables on the sample space of a single random experiment or on the combined sample spaces of many random experiments. If these variables are all discrete, then they are characterized by a joint probability mass function. Consider the example of two random variables X and Y that take on the values x_1, x_2, ..., x_n and y_1, y_2, ..., y_m. These two variables can be characterized by a joint probability mass function P(X = x_i, Y = y_j), which gives the probability that X = x_i and Y = y_j.

Using the probability rules stated in the preceding sections, we can prove the following relationships involving joint, marginal, and conditional probability mass functions:

1. P(X ≤ x, Y ≤ y) = Σ_{x_i ≤ x} Σ_{y_j ≤ y} P(X = x_i, Y = y_j)    (2.23)

2. P(X = x_i) = Σ_{j=1}^{m} P(X = x_i, Y = y_j)  (marginal)    (2.24)

3. P(X = x_i|Y = y_j) = P(X = x_i, Y = y_j)/P(Y = y_j)  (conditional)    (2.25)

   P(X = x_i|Y = y_j) = P(Y = y_j|X = x_i)P(X = x_i) / Σ_{i=1}^{n} P(Y = y_j|X = x_i)P(X = x_i)  (Bayes' rule)    (2.26)

4. Random variables X and Y are statistically independent if

   P(X = x_i, Y = y_j) = P(X = x_i)P(Y = y_j),  i = 1, 2, ..., n; j = 1, 2, ..., m    (2.27)

EXAMPLE 2.7.

Find the joint probability mass function and joint distribution function of X, Y associated with the experiment of tossing two fair dice where X represents the
number appearing on the up face of one die and Y represents the number appearing on the up face of the other die.

SOLUTION:

P(X = i, Y = j) = 1/36,  i = 1, 2, ..., 6; j = 1, 2, ..., 6

F_{X,Y}(x, y) = Σ_{i=1}^{x} Σ_{j=1}^{y} 1/36 = xy/36,  x = 1, 2, ..., 6; y = 1, 2, ..., 6

If x and y are not integers and are between 0 and 6, F_{X,Y}(x, y) = F_{X,Y}([x], [y]), where [x] is the greatest integer less than or equal to x. F_{X,Y}(x, y) = 0 for x < 1 or y < 1. F_{X,Y}(x, y) = 1 for x ≥ 6 and y ≥ 6. F_{X,Y}(x, y) = F_X(x) for y ≥ 6. F_{X,Y}(x, y) = F_Y(y) for x ≥ 6.

2.3.3 Expected Values or Averages

The probability mass function (or the distribution function) provides as complete a description as possible for a discrete random variable. For many purposes this description is often too detailed. It is sometimes simpler and more convenient to describe a random variable by a few characteristic numbers or summary measures that are representative of its probability mass function. These numbers are the various expected values (sometimes called statistical averages). The expected value or the average of a function g(X) of a discrete random variable X is defined as

E{g(X)} = Σ_{i=1}^{n} g(x_i)P(X = x_i)    (2.28)

It will be seen in the next section that the expected value of a random variable is valid for all random variables, not just for discrete random variables. The form of the average simply appears different for continuous random variables. Two expected values or moments that are most commonly used for characterizing a random variable X are its mean μ_X and its variance σ_X². The mean and variance are defined as

μ_X = E{X} = Σ_{i=1}^{n} x_i P(X = x_i)    (2.29)

E{(X - μ_X)²} = σ_X² = Σ_{i=1}^{n} (x_i - μ_X)² P(X = x_i)    (2.30)

The square root of variance is called the standard deviation. The mean of a random variable is its average value and the variance of a random variable is a measure of the "spread" of the values of the random variable.

We will see in a later section that when the probability mass function is not known, then the mean and variance can be used to arrive at bounds on probabilities via the Tchebycheff's inequality, which has the form

P[|X - μ_X| > k] ≤ σ_X²/k²    (2.31)

The Tchebycheff's inequality can be used to obtain bounds on the probability of finding X outside of an interval μ_X ± kσ_X.

The expected value of a function of two random variables is defined as

E{g(X, Y)} = Σ_{i=1}^{n} Σ_{j=1}^{m} g(x_i, y_j)P(X = x_i, Y = y_j)    (2.32)

A useful expected value that gives a measure of dependence between two random variables X and Y is the correlation coefficient defined as

ρ_XY = E{(X - μ_X)(Y - μ_Y)}/(σ_X σ_Y) = σ_XY/(σ_X σ_Y)    (2.33)

The numerator of the right-hand side of Equation 2.33 is called the covariance (σ_XY) of X and Y. The reader can verify that if X and Y are statistically independent, then ρ_XY = 0, and that in the case when X and Y are linearly dependent (i.e., when Y = b + kX), then |ρ_XY| = 1. Observe that ρ_XY = 0 does not imply statistical independence.

Two random variables X and Y are said to be orthogonal if

E{XY} = 0

The relationship between two random variables is sometimes described in terms of conditional expected values, which are defined as

E{g(X, Y)|Y = y_j} = Σ_i g(x_i, y_j)P(X = x_i|Y = y_j)    (2.34.a)
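These definitions are easy to exercise on the fair die (Equation 2.28 with g(x) = x and g(x) = (x - μ)²) and on the pair of independent dice of Example 2.7, whose covariance must vanish. A sketch I am adding:

```python
from fractions import Fraction
from itertools import product

die = {i: Fraction(1, 6) for i in range(1, 7)}

def E(g, pmf):                            # Equation 2.28
    return sum(g(x) * p for x, p in pmf.items())

mu = E(lambda x: x, die)                  # mean: 7/2
var = E(lambda x: (x - mu) ** 2, die)     # variance: 35/12
print(mu, var)

# Two independent dice (Example 2.7): the covariance, hence rho_XY, is zero.
joint = {(i, j): Fraction(1, 36) for i, j in product(range(1, 7), repeat=2)}
cov = sum((i - mu) * (j - mu) * p for (i, j), p in joint.items())
print(cov)   # 0
```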
The reader can verify that

1. G_X(1) = Σ_{k=0}^{n} P(X = k) = 1    (2.35.b)

2. If G_X(z) is given, p_k can be obtained from it either by expanding it in a power series or from

   P(X = k) = (1/k!) (d^k/dz^k)[G_X(z)]|_{z=0}    (2.35.c)

3. The derivatives of the probability generating function evaluated at z = 1 yield the factorial moments c_n, where

   c_n = E{X(X - 1)(X - 2) ··· (X - n + 1)} = (d^n/dz^n)[G_X(z)]|_{z=1}    (2.35.d)

From the factorial moments, we can obtain ordinary moments; for example, E{X} = c_1 and E{X²} = c_2 + c_1.

For the binomial random variable, the binomial coefficient is defined by

(n choose k) = n!/[k!(n - k)!],  where m! = m(m - 1)(m - 2) ··· (3)(2)(1) and 0! = 1

The reader can verify that the mean and variance of the binomial random variable are given by (see Problem 2.13)

μ_X = np    (2.38.a)

σ_X² = np(1 - p)    (2.38.b)
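Both direct summation and the factorial-moment route (Equation 2.35.d, here via a numerical derivative of G_X at z = 1) reproduce Equations 2.38.a and 2.38.b. The sketch below is mine; n = 16 and p = 0.1 are arbitrary concrete choices:

```python
from math import comb

n, p = 16, 0.1
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

mean = sum(k * q for k, q in enumerate(pmf))                # np = 1.6
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))   # np(1-p) = 1.44
print(mean, var)

# First factorial moment c_1 = G'(1) = E{X}, by a central difference
G = lambda z: sum(q * z**k for k, q in enumerate(pmf))
h = 1e-6
c1 = (G(1 + h) - G(1 - h)) / (2 * h)
print(c1)   # also close to 1.6
```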
Poisson Probability Mass Function. The Poisson random variable is used to model such things as the number of telephone calls received by an office. If events occur independently at an average rate λ', then the number of events in a time interval of length T can be shown (see Chapter 5) to have a Poisson probability mass function of the form

P(X = k) = (λ^k/k!) e^{-λ},  k = 0, 1, 2, ...    (2.39.a)

where λ = λ'T. The mean and variance of the Poisson random variable are given by

μ_X = λ    (2.39.b)

σ_X² = λ    (2.39.c)

Multinomial Probability Mass Function. Another useful probability mass function is the multinomial probability mass function, which is a generalization of the binomial distribution to two or more variables. Suppose a random experiment is repeated n times. On each repetition, the experiment terminates in but one of k mutually exclusive and exhaustive events A_1, A_2, ..., A_k. Let p_i be the probability that the experiment terminates in A_i and let p_i remain constant throughout n independent repetitions of the experiment. Let X_i, i = 1, 2, ..., k denote the number of times the experiment terminates in event A_i. Then

P(X_1 = x_1, X_2 = x_2, ..., X_k = x_k) = [n!/(x_1! x_2! ··· x_k!)] p_1^{x_1} p_2^{x_2} ··· p_k^{x_k}    (2.40)

where x_1 + x_2 + ··· + x_k = n, p_1 + p_2 + ··· + p_k = 1, and x_i = 0, 1, 2, ..., n. The probability mass function given in Equation 2.40 is called a multinomial probability mass function.

Note that with A_1 = A and A_2 = Ā, p_1 = p, and p_2 = 1 - p, the multinomial probability mass function reduces to the binomial case.

Before we proceed to review continuous random variables, let us look at three examples that illustrate the concepts described in the preceding sections.

P(Y = 1|X = 1) = 3/4  and  P(Y = 0|X = 0) = 7/8

(a) Find P(Y = 1) and P(Y = 0).
(b) Find P(X = 1|Y = 1).

(Note that this is similar to Example 2.3. The primary difference is the use of random variables.)

SOLUTION:

(a) Using Equation 2.24, we have

P(Y = 1) = P(Y = 1|X = 0)P(X = 0) + P(Y = 1|X = 1)P(X = 1)
         = (1 - 7/8)(3/4) + (3/4)(1/4) = 9/32

P(Y = 0) = 1 - P(Y = 1) = 23/32

(b) Using Bayes' rule, we obtain

P(X = 1|Y = 1) = P(Y = 1|X = 1)P(X = 1)/P(Y = 1)
               = (6/32)/(9/32) = 2/3

P(X = 1|Y = 1) is the probability that the input to the system is 1 when the output is 1.
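The arithmetic of the preceding channel example is easy to check exactly. Note that the input probabilities P(X = 0) = 3/4 and P(X = 1) = 1/4 used below are inferred from the worked answers, since they come from a part of the example statement not reproduced here:

```python
from fractions import Fraction

p_x = {0: Fraction(3, 4), 1: Fraction(1, 4)}              # inferred priors
p_y1_given_x = {1: Fraction(3, 4), 0: 1 - Fraction(7, 8)} # P(Y=1|X)

# (a) total probability over the inputs
p_y1 = sum(p_y1_given_x[x] * p_x[x] for x in (0, 1))
# (b) Bayes' rule (Equation 2.26)
p_x1_given_y1 = p_y1_given_x[1] * p_x[1] / p_y1

print(p_y1, 1 - p_y1, p_x1_given_y1)   # 9/32 23/32 2/3
```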
EXAMPLE 2.9.

Binary data are transmitted over a noisy communication channel in blocks of 16 binary digits. The probability that a received binary digit is in error due to channel noise is 0.1. Assume that the occurrence of an error in a particular digit does not influence the probability of occurrence of an error in any other digit within the block (i.e., errors occur in various digit positions within a block in a statistically independent fashion).

(a) Find the average (or expected) number of errors per block.
(b) Find the variance of the number of errors per block.
(c) Find the probability that the number of errors per block is greater than or equal to 5.

SOLUTION: Let X denote the number of errors per block; X is binomial with n = 16 and p = .1.

(a) Using Equation 2.38.a,

E{X} = np = (16)(.1) = 1.6

(b) The variance of X is found from Equation 2.38.b:

σ_X² = np(1 - p) = (16)(.1)(.9) = 1.44

(c) P(X ≥ 5) = 1 - P(X ≤ 4) = 1 - Σ_{k=0}^{4} (16 choose k)(0.1)^k (0.9)^{16-k}

EXAMPLE 2.10.

The number N of defects per plate of sheet metal is Poisson with λ = 10. The inspection process has a constant probability of .9 of finding each defect and the successes are independent, that is, if M represents the number of found defects,

P(M = i|N = n) = (n choose i)(.9)^i (.1)^{n-i},  i ≤ n

Find

(a) the joint probability mass function of M and N,
(b) the marginal probability mass function of M,
(c) the conditional probability mass function of N given M,
(d) E{M|N}, and
(e) E{M} from part (d).

SOLUTION:

(a) P(M = i, N = n) = P(M = i|N = n)P(N = n) = [e^{-10} 10^n/n!] (n choose i)(.9)^i (.1)^{n-i},  n = 0, 1, ...; i = 0, 1, ..., n

(b) P(M = i) = Σ_{n=i}^{∞} [e^{-10}(10)^n/n!] [n!/(i!(n - i)!)] (.9)^i (.1)^{n-i} = e^{-9}(9)^i/i!,  i = 0, 1, ...

(c) P(N = n|M = i) = P(M = i, N = n)/P(M = i) = e^{-1}/(n - i)!,  n = i, i + 1, ...

(d) Using Equation 2.38.a,

E{M|N = n} = .9n

Thus

E{M|N} = .9N

(e) E{M} = E{.9N} = .9E{N} = (.9)(10) = 9. This may also be found directly using the results of part (b), since M is Poisson with mean 9.

2.4 CONTINUOUS RANDOM VARIABLES

2.4.1 Probability Density Functions

A continuous random variable can take on more than a countable number of values in one or more intervals on the real line. The probability law for a
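Example 2.10's headline result, that thinning a Poisson(10) count with retention probability .9 leaves a Poisson(9) count, can be checked by simulation. This is my own sketch; it uses Knuth's product method to generate Poisson variates, and exploits the fact that a Poisson random variable has equal mean and variance:

```python
import math
import random

rng = random.Random(42)

def poisson(lam):
    """Knuth's product method for Poisson variates."""
    limit = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

n_trials = 50_000
found = []
for _ in range(n_trials):
    n_defects = poisson(10)                                          # N ~ Poisson(10)
    found.append(sum(rng.random() < 0.9 for _ in range(n_defects)))  # M given N

mean_m = sum(found) / n_trials
var_m = sum((m - mean_m) ** 2 for m in found) / n_trials
print(mean_m, var_m)   # both should be near 9, consistent with M ~ Poisson(9)
```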
4. P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx    (2.42.d)

[Figure 2.4: Distribution function and density function for Example 2.11.]

[Figure 2.5: Example of a mixed distribution function.]
36 REVIEW OF PROBABILITY AND RANDOM VARIABLES CONTINUOUS RANDOM VARIABLES 37
Two Random Variables-Joint, Marginal, and Conditional Density Functions fXly(x!y) fx,Y(x, y) fy(y) > {} (2.44.a)
and Independence. If we have a multitude of random variables defined on one fy(y) '
or more random experiments, then the probability model is specified in terms
of a joint probability density function. For example, if there are two random fY!x(Yix) fx.Y(x, y) (2.44.b)
variables X and Y, they may be characterized by a joint probability density fx(x) ' fx(x) > 0
function fx. y(x, y). If the joint distribution function, Fx, y, is continuous and
has partial derivatives, then a joint density function is defined by fYlx(Yix) = oo fx!Y(xiy)fy(y) Bayes' rule (2.44.c)
f_oo fx!Y(xiX.)fy(X.) dX.
fu(x, y) 2: 0
EXAMPLE 2.12.
From the fundamental theorem of integral calculus
The joint density function of X and Y is
!
-t
II fx.Y(!J., v) d!J. dv = 1
SOLUTION: Since the area under the joint pdf is 1, we have
1 = ff axy dx dy =a f y [ I: dy
.,·!
it
A joint density function may be interpreted as
= a f 4y dy = 4a I: = 24a
lim P[(x <X :5 X + dx)
dx-.o
n (y < y :5 y + dy)] = !x,y(x, y) dx dy
dy-->0 or
From the joint probability density function one can obtain marginal proba- 1
bility density functions fx(x), fy(y), and conditional probability density func- £l.;;: 24
tions fx!Y(xiy) and fnrtYix) as follows:
a
;.! The marginal pdf of X is obtained from Equation 2.43.a as
l
,J
fx(x) = roo fx,y(x, y) dy (2.43.a)
·..
fx(x) = -24
1 xy dy = -24
X
[8 - 2] = X-4' 1 :S x < 3
# fy(y) = roo fx,y(X, y) dx (2.43.b)
= 0
2
elsewhere
-
r
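Example 2.12's constants can be checked numerically with a midpoint Riemann sum over the support. This is my own sketch; since the integrand axy is linear in each variable, the midpoint rule is essentially exact here:

```python
a = 1 / 24
n = 400                               # grid resolution per axis
dx = dy = 2 / n
xs = [1 + (i + 0.5) * dx for i in range(n)]
ys = [2 + (j + 0.5) * dy for j in range(n)]

# Total probability: the joint pdf should integrate to 1 over 1<=x<=3, 2<=y<=4
total = sum(a * x * y * dx * dy for x in xs for y in ys)
print(total)

# Marginal f_X at x = 2: Equation 2.43.a gives x/4 = 0.5 there
fx_at_2 = sum(a * 2.0 * y * dy for y in ys)
print(fx_at_2)
```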
38 REVIEW OF PROBABILITY AND RANDOM VARIABLES CONTINUOUS RANDOM VARIABLES 39
Similarly, the distribution function of Y is

F_Y(y) = (1/24) ∫_2^y ∫_1^3 x ν dx dν = (1/6) ∫_2^y ν dν = (1/12)[y² − 4],   2 ≤ y ≤ 4

It should be noted that the concept of the expected value of a random variable is equally applicable to discrete and continuous random variables. Also, if generalized derivatives of the distribution function are defined using the Dirac delta function δ(x), then discrete random variables have generalized density functions. For example, the generalized density function of die tossing as given in Example 2.6 is

f_X(x) = (1/6) Σ_{i=1}^{6} δ(x − i)
σ_X² = E{(X − μ_X)²} = ∫_{-∞}^{∞} (x − μ_X)² f_X(x) dx   (2.47.b)

σ_XY = E{(X − μ_X)(Y − μ_Y)} = ∫∫ (x − μ_X)(y − μ_Y) f_X,Y(x, y) dx dy   (2.47.c)

and

ρ_XY = E{(X − μ_X)(Y − μ_Y)}/(σ_X σ_Y)   (2.47.d)

It can be shown that −1 ≤ ρ_XY ≤ 1. The Tchebycheff inequality for a continuous random variable has the same form as given in Equation 2.31. Conditional expected values involving continuous random variables are defined as

E{g(X)|Y = y} = ∫_{-∞}^{∞} g(x) f_X|Y(x|y) dx

It is often convenient to use "transforms" to aid in the analysis. These transforms lead to the concepts of characteristic and moment generating functions.

The characteristic function Ψ_X(ω) of a random variable X is defined as the expected value of exp(jωX):

Ψ_X(ω) = E{exp(jωX)},   j = √(−1)

For a continuous random variable (and using δ functions also for a discrete random variable) this definition leads to

Ψ_X(ω) = ∫_{-∞}^{∞} f_X(x) exp(jωx) dx   (2.50.a)

which is the complex conjugate of the Fourier transform of the pdf of X. Since |exp(jωx)| ≤ 1,

∫_{-∞}^{∞} |f_X(x) exp(jωx)| dx ≤ ∫_{-∞}^{∞} f_X(x) dx = 1
Using the inverse Fourier transform, we can obtain f_X(x) from Ψ_X(ω) as

f_X(x) = (1/2π) ∫_{-∞}^{∞} Ψ_X(ω) exp(−jωx) dω   (2.50.b)

Thus, f_X(x) and Ψ_X(ω) form a Fourier transform pair. The characteristic function of a random variable has the following properties:

1. The characteristic function is unique and determines the pdf of a random variable (except for points of discontinuity of the pdf). Thus, if two continuous random variables have the same characteristic function, they have the same pdf.
2. Ψ_X(0) = 1, and

E{X^k} = (1/j^k) [d^k Ψ_X(ω)/dω^k] evaluated at ω = 0   (2.51.a)

Characteristic functions can also be defined for two or more random variables, and the cumulant generating function C_X(ω) is defined by exp{C_X(ω)} = Ψ_X(ω).

EXAMPLE 2.13.

X_1 and X_2 are two independent Gaussian random variables with means μ_1 and μ_2 and variances σ_1² and σ_2². The pdfs of X_1 and X_2 have the form

f_{X_i}(x_i) = [1/(√(2π) σ_i)] exp[−(x_i − μ_i)²/(2σ_i²)],   i = 1, 2

(a) Find Ψ_{X_1}(ω) and Ψ_{X_2}(ω).
(b) Using Ψ_X(ω), find E{X⁴} where X is a Gaussian random variable with mean zero and variance σ².
(c) Find the pdf of Z = a_1X_1 + a_2X_2.

SOLUTION:

(a) Ψ_{X_1}(ω) = ∫_{-∞}^{∞} [1/(√(2π) σ_1)] exp[−(x_1 − μ_1)²/(2σ_1²)] exp(jωx_1) dx_1

Completing the square in the exponent and integrating the resulting Gaussian form yields

Ψ_{X_1}(ω) = exp[jμ_1ω − σ_1²ω²/2]

and similarly for Ψ_{X_2}(ω).
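The closed form in part (a) can be sanity-checked numerically: a Riemann-sum approximation of E{exp(jωX)} over a Gaussian pdf should match exp(jμω − σ²ω²/2). A sketch; the helper names and the choice μ = 1, σ = 2 are illustrative assumptions:

```python
import cmath
import math

def gaussian_pdf(x, mu=1.0, sigma=2.0):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def char_fn(w, mu=1.0, sigma=2.0, lo=-40.0, hi=40.0, n=20000):
    # Psi_X(w) = E{exp(jwX)}, approximated by a midpoint sum over the pdf
    h = (hi - lo) / n
    return sum(cmath.exp(1j * w * x) * gaussian_pdf(x, mu, sigma) * h
               for x in (lo + (k + 0.5) * h for k in range(n)))

w = 0.7
numeric = char_fn(w)
closed = cmath.exp(1j * 1.0 * w - (2.0 ** 2) * w ** 2 / 2)  # exp(j*mu*w - sigma^2 w^2 / 2)
print(abs(numeric - closed))  # small
```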
Figure 2.6 Gaussian probability density function.

Figure 2.7 Probabilities for a standard Gaussian pdf.

systems is often due to the cumulative effects of a large number of randomly moving charged particles, and hence the instantaneous value of the noise will tend to have a Gaussian distribution, a fact that can be tested experimentally. (The reader is cautioned that there are examples of noise that cannot be modeled by Gaussian pdfs. Such examples include pulse-type disturbances on a telephone line and the electrical noise from nearby lightning discharges.)

The Gaussian pdf shown in Figure 2.6 has the form

f_X(x) = [1/(√(2π) σ_X)] exp[−(x − μ_X)²/(2σ_X²)]   (2.54)

The family of Gaussian pdfs is characterized by only two parameters, μ_X and σ_X², which are the mean and variance of the random variable X. In many applications we will often be interested in probabilities such as

P(X > a) = ∫_a^{∞} [1/(√(2π) σ_X)] exp[−(x − μ_X)²/(2σ_X²)] dx

Unfortunately, this integral cannot be evaluated in closed form and requires numerical evaluation. Several versions of the integral are tabulated, and we will use tabulated values (Appendix D) of the Q function, which is defined as

Q(y) = (1/√(2π)) ∫_y^{∞} exp(−z²/2) dz,   y > 0   (2.55)

Q(y) is the shaded area P(X > μ_X + yσ_X) in Figure 2.6. With the change of variable z = (x − μ_X)/σ_X, the probability above becomes

P(X > a) = Q[(a − μ_X)/σ_X]   (2.56)

Various tables give any of the areas shown in Figure 2.7, so one must observe which is being tabulated. However, any of the results can be obtained from the others by using the following relations for the standard (μ = 0, σ = 1) normal random variable X:

P(X ≤ x) = 1 − Q(x)
P(−a ≤ X ≤ a) = 2P(−a ≤ X ≤ 0) = 2P(0 ≤ X ≤ a)
P(X ≤ 0) = 1/2 = Q(0)

EXAMPLE 2.14.

The voltage X at the output of a noise generator is a standard normal random variable. Find P(X > 2.3) and P(1 ≤ X ≤ 2.3).

SOLUTION:

P(X > 2.3) = Q(2.3) ≈ .0107

P(1 ≤ X ≤ 2.3) = [1 − Q(2.3)] − [1 − Q(1)] = Q(1) − Q(2.3) ≈ .148
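In code, Q(y) can be computed from the complementary error function, since Q(y) = ½ erfc(y/√2); this reproduces the values of Example 2.14 (a convenience only; the book itself uses tables):

```python
import math

def Q(y):
    # Q(y) = (1/sqrt(2*pi)) * integral from y to infinity of exp(-z^2/2) dz
    return 0.5 * math.erfc(y / math.sqrt(2.0))

# Example 2.14: X standard normal
p1 = Q(2.3)           # P(X > 2.3)
p2 = Q(1.0) - Q(2.3)  # P(1 <= X <= 2.3)
print(round(p1, 4), round(p2, 4))  # 0.0107 0.1479
```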
EXAMPLE 2.15.

The velocity V of the wind at a certain location is a normal random variable with μ = 2 and σ = 5. Determine P(−3 ≤ V ≤ 8).

SOLUTION:

P(−3 ≤ V ≤ 8) = (1/√(2π)) ∫_{(−3−2)/5}^{(8−2)/5} exp(−x²/2) dx
             = [1 − Q(1.2)] − [1 − Q(−1)] = .726

For a complex random variable Z = X + jY, expected values are computed as

E{g(Z)} = ∫∫ g(z) f_X,Y(x, y) dx dy

Thus the mean μ_Z of Z is E{X} + jE{Y}. The variance of Z is defined as E{|Z − μ_Z|²}, and the covariance of two complex random variables Z_m and Z_n is defined as

C_{Z_mZ_n} = E{(Z_m − μ_{Z_m})* (Z_n − μ_{Z_n})}

where * denotes complex conjugate.
Bivariate Gaussian pdf. We often encounter the situation when the instantaneous amplitude of the input signal to a linear system has a Gaussian pdf and we might be interested in the joint pdf of the amplitudes of the input and the output signals. The bivariate Gaussian pdf is a valid model for describing such situations. The bivariate Gaussian pdf has the form

f_X,Y(x, y) = [1/(2πσ_Xσ_Y√(1 − ρ²))] exp{ −[1/(2(1 − ρ²))] [ ((x − μ_X)/σ_X)² − 2ρ(x − μ_X)(y − μ_Y)/(σ_Xσ_Y) + ((y − μ_Y)/σ_Y)² ] }   (2.57)

The reader can verify that the marginal pdfs of X and Y are Gaussian with means μ_X, μ_Y, and variances σ_X², σ_Y², respectively, and

ρ = ρ_XY = E{(X − μ_X)(Y − μ_Y)}/(σ_Xσ_Y) = σ_XY/(σ_Xσ_Y)

2.5 RANDOM VECTORS

In the preceding sections we concentrated on discussing the specification of probability laws for one or two random variables. In this section we shall discuss the specification of probability laws for many random variables (i.e., random vectors). Whereas scalar-valued random variables take on values on the real line, the values of "vector-valued" random variables are points in a real-valued higher (say m) dimensional space (R^m). An example of a three-dimensional random vector is the location of a space vehicle in a Cartesian coordinate system.

The probability law for vector-valued random variables is specified in terms of a joint distribution function

F_{X_1,...,X_m}(x_1, ..., x_m) = P[(X_1 ≤ x_1), ..., (X_m ≤ x_m)]

Important parameters of the joint distribution are the means and the covariances

μ_{X_i} = E{X_i}

and

σ_{X_iX_j} = E{X_iX_j} − μ_{X_i}μ_{X_j}

Note that σ_{X_iX_i} is the variance of X_i. We will use both σ_{X_iX_i} and σ_{X_i}² to denote the variance of X_i. Sometimes the notations E_{X_i}, E_{X_iX_j}, and E_{X_i|X_j} are used to denote expected values with respect to the marginal distribution of X_i, the joint distribution of X_i and X_j, and the conditional distribution of X_i given X_j, respectively. We will use subscripted notation for the expectation operator only when there is ambiguity with the use of unsubscripted notation.

From the joint pdf, we can obtain the marginal pdfs as

f_{X_1}(x_1) = ∫ ··· ∫ f_{X_1,X_2,...,X_m}(x_1, x_2, ..., x_m) dx_2 ··· dx_m   (m − 1 integrals)

and

f_{X_1,X_2}(x_1, x_2) = ∫ ··· ∫ f_{X_1,X_2,...,X_m}(x_1, x_2, x_3, ..., x_m) dx_3 dx_4 ··· dx_m   (2.58)   (m − 2 integrals)

Note that the marginal pdf of any subset of the m variables is obtained by "integrating out" the variables not in the subset.

The conditional density functions are defined as (using m = 4 as an example)

f_{X_1,X_2,X_3|X_4}(x_1, x_2, x_3|x_4) = f_{X_1,X_2,X_3,X_4}(x_1, x_2, x_3, x_4)/f_{X_4}(x_4)   (2.59)

Expected values are defined as

E{g(X_1, X_2, X_3, X_4)} = ∫∫∫∫ g(x_1, x_2, x_3, x_4) f_{X_1,X_2,X_3,X_4}(x_1, x_2, x_3, x_4) dx_1 dx_2 dx_3 dx_4   (2.61)

where g is a scalar-valued function. Conditional expected values are defined, for example, as

E{g(X_1, X_2, X_3, X_4)|X_3 = x_3, X_4 = x_4} = ∫∫ g(x_1, x_2, x_3, x_4) f_{X_1,X_2|X_3,X_4}(x_1, x_2|x_3, x_4) dx_1 dx_2   (2.62)

The probability law for random vectors can be specified in a concise form using the vector notation. Suppose we are dealing with the joint probability law for m random variables X_1, X_2, ..., X_m. These m variables can be represented as components of an m × 1 column vector X. Then, the joint pdf is denoted by

f_X(x) = f_{X_1,X_2,...,X_m}(x_1, x_2, ..., x_m)

The mean vector is defined as

μ_X = E(X) = [E(X_1), E(X_2), ..., E(X_m)]^T
and the "covariance matrix", Σ_X, an m × m matrix, is defined as

Σ_X = E{XX^T} − μ_Xμ_X^T =
[ σ_{X_1X_1}  σ_{X_1X_2}  ···  σ_{X_1X_m} ]
[ σ_{X_2X_1}  σ_{X_2X_2}  ···  σ_{X_2X_m} ]
[    ···         ···      ···     ···     ]
[ σ_{X_mX_1}  σ_{X_mX_2}  ···  σ_{X_mX_m} ]

The covariance matrix describes the second-order relationship between the components of the random vector X. The components are said to be "uncorrelated" when

σ_{X_iX_j} = 0,   i ≠ j

and independent if

f_X(x) = f_{X_1}(x_1) f_{X_2}(x_2) ··· f_{X_m}(x_m)

An important model is the multivariate Gaussian distribution, which has many applications. A random vector X is multivariate Gaussian if it has a pdf of the form

f_X(x) = [(2π)^{m/2} |Σ_X|^{1/2}]^{−1} exp[ −(1/2)(x − μ_X)^T Σ_X^{−1} (x − μ_X) ]   (2.64)

where μ_X is the mean vector, Σ_X is the covariance matrix, Σ_X^{−1} is its inverse, and |Σ_X| is its determinant. The multivariate Gaussian distribution has the following important properties:

1. Suppose X has an m-dimensional multivariate Gaussian distribution. If we partition X as

X = [X₁; X₂],   μ_X = [μ_{X₁}; μ_{X₂}],   Σ_X = [Σ_11  Σ_12; Σ_21  Σ_22]

where X₁ and μ_{X₁} are k × 1 and Σ_11 is k × k, then X₁ has a k-dimensional multivariate Gaussian distribution with mean μ_{X₁} and covariance Σ_11.

2. If Σ_X is a diagonal matrix, that is,

Σ_X = diag(σ_{X_1}², σ_{X_2}², ..., σ_{X_m}²)

then the components of X are uncorrelated, and for the Gaussian case uncorrelated components are also independent.

3. If Y = AX, where A is a matrix of constants, then Y is multivariate Gaussian with

μ_Y = Aμ_X   (2.65.a)
Σ_Y = AΣ_XA^T   (2.65.b)

4. With a partition of X as in (1), the conditional density of X₁ given X₂ = x₂ is a k-dimensional multivariate Gaussian with

μ_{X₁|X₂} = E[X₁|X₂ = x₂] = μ_{X₁} + Σ_12Σ_22^{−1}(x₂ − μ_{X₂})   (2.66.a)

and covariance

Σ_{X₁|X₂} = Σ_11 − Σ_12Σ_22^{−1}Σ_21   (2.66.b)
EXAMPLE. X is a four-dimensional Gaussian random vector with mean vector μ_X and covariance matrix

Σ_X =
[ 6 3 2 1 ]
[ 3 4 3 2 ]
[ 2 3 4 3 ]
[ 1 2 3 3 ]

(a) Find the distribution of X₁ = (X_1, X_2)^T.
(b) Find the distribution of

Y = [ X_1 + 2X_2 ]
    [ X_3 + X_4  ]

(c) Find the distribution of X₁ given X₂ = (x_3, x_4)^T.

SOLUTION:

(a) By Property 1, X₁ has a bivariate Gaussian distribution with mean (μ_{X_1}, μ_{X_2})^T and covariance

Σ_11 = [ 6 3 ]
       [ 3 4 ]

(b) Y = AX with

A = [ 1 2 0 0 ]
    [ 0 0 1 1 ]

so, by Property 3, Y is bivariate Gaussian with μ_Y = Aμ_X and

Σ_Y = AΣ_XA^T = [ 34 13 ]
                [ 13 13 ]

(c) By Property 4, the conditional distribution of X₁ given X₂ = (x_3, x_4)^T is bivariate Gaussian with mean μ_{X₁} + Σ_12Σ_22^{−1}(x₂ − μ_{X₂}) and covariance

Σ_11 − Σ_12Σ_22^{−1}Σ_21 = [ 6 3 ] − [ 2 1 ][ 4 3 ]^{−1}[ 2 3 ] = [ 14/3 4/3 ]
                           [ 3 4 ]   [ 3 2 ][ 3 3 ]    [ 1 2 ]   [ 4/3 5/3 ]
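The matrix algebra in parts (b) and (c) can be verified with exact rational arithmetic. A short sketch; the helper names (`matmul`, `inv2`, and so on) are ours:

```python
from fractions import Fraction as F

# Covariance matrix of the example and A for Y = (X1 + 2*X2, X3 + X4)
S = [[F(6), F(3), F(2), F(1)],
     [F(3), F(4), F(3), F(2)],
     [F(2), F(3), F(4), F(3)],
     [F(1), F(2), F(3), F(3)]]
A = [[F(1), F(2), F(0), F(0)],
     [F(0), F(0), F(1), F(1)]]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def inv2(M):
    # Inverse of a 2x2 matrix
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

# Property 3: covariance of Y = AX is A S A^T (equals [[34, 13], [13, 13]])
SY = matmul(matmul(A, S), transpose(A))
print(SY)

# Property 4: conditional covariance S11 - S12 S22^-1 S21
S11 = [r[:2] for r in S[:2]]
S12 = [r[2:] for r in S[:2]]
S21 = [r[:2] for r in S[2:]]
S22 = [r[2:] for r in S[2:]]
M = matmul(matmul(S12, inv2(S22)), S21)
cond = [[S11[i][j] - M[i][j] for j in range(2)] for i in range(2)]
print(cond)  # equals [[14/3, 4/3], [4/3, 5/3]]
```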
A joint characteristic function can also be defined for n random variables as Ψ_X(ω) = E{exp(jω^T X)}, where ω^T = (ω_1, ω_2, ..., ω_n). From the joint characteristic function, the moments can be obtained by partial differentiation with respect to the components of ω, evaluated at ω = 0. To simplify the illustrative calculations, it is common to assume that all random variables have zero means.
defines a new random variable* as follows (see Figure 2.8). For a given outcome λ, X(λ) is a number x, and g[X(λ)] is another number specified by g(x). This number is the value of the random variable Y, that is, Y(λ) = y = g(x). The ensemble S_Y of Y is the set

S_Y = {y = g(x) : x ∈ S_X}

Figure 2.8 Transformation of a random variable.

We are interested in finding the probability law for Y. The method used for identifying the probability law for Y is to equate the probabilities of equivalent events. Suppose C ⊂ S_Y. Because the function g(x) maps S_X → S_Y, there is an equivalent subset B, B ⊂ S_X, defined by

B = {x : g(x) ∈ C}

Now, B corresponds to event A, which is a subset of the sample space S (see Figure 2.8). It is obvious that A maps to C, and hence

P(C) = P(A) = P(B)

*For Y to be a random variable, the function g : X → Y must have the following properties:
1. Its domain must include the range of the random variable X.
2. It must be a Baire function; that is, for every y, the set I_y such that g(x) ≤ y must consist of the union and intersection of a countable number of intervals in S_X. Only then is {Y ≤ y} an event.
3. The set {λ : g(X(λ)) = ±∞} must have zero probability.

Now, suppose that g is a continuous function and C = (−∞, y]. If B = {x : g(x) ≤ y}, then

F_Y(y) = P(Y ≤ y) = ∫_B f_X(x) dx

which gives the distribution function of Y in terms of the density function of X. The density function of Y (if Y is a continuous random variable) can be obtained by differentiating F_Y(y).

As an alternate approach, suppose I_y is a small interval of length Δy containing the point y. Let I_x = {x : g(x) ∈ I_y}. Then, we have

P(Y ∈ I_y) = P(X ∈ I_x) = ∫_{I_x} f_X(x) dx

which shows that we can derive the density of Y from the density of X. We will use the principles outlined in the preceding paragraphs to find the distribution of scalar-valued as well as vector-valued functions of random variables.

2.6.1 Scalar-valued Function of One Random Variable

Discrete Case. Suppose X is a discrete random variable that can have one of n values x_1, x_2, ..., x_n. Let g(x) be a scalar-valued function. Then Y = g(X) is a discrete random variable that can have one of m, m ≤ n, values y_1, y_2, ..., y_m. If g(x) is a one-to-one mapping, then m will be equal to n. However, if g(x) is a many-to-one mapping, then m will be smaller than n. The probability mass function of Y can be obtained easily from the probability mass function of X as

P(Y = y_i) = Σ_j P(X = x_j)

where the sum is over all values of x_j that map to y_i.

Continuous Random Variables. If X is a continuous random variable, then the pdf of Y = g(X) can be obtained from the pdf of X as follows. Let y be a particular value of Y, and let x^(1), x^(2), ..., x^(k) be the roots of the equation y = g(x); that is, y = g(x^(1)) = ··· = g(x^(k)).
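The discrete-case rule P(Y = y_i) = Σ P(X = x_j) is easy to apply programmatically. A small sketch with a fair die and a hypothetical many-to-one mapping g(x) = (x − 3.5)², chosen here only for illustration:

```python
from collections import defaultdict

# Fair-die pmf and a many-to-one mapping (hypothetical example)
px = {x: 1.0 / 6.0 for x in range(1, 7)}
g = lambda x: (x - 3.5) ** 2

# P(Y = y_i) is the sum of P(X = x_j) over all x_j with g(x_j) = y_i
py = defaultdict(float)
for x, p in px.items():
    py[g(x)] += p

print(sorted(py.items()))  # three values, each with probability 1/3
```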
(For example, if y = x², the two roots are x^(1) = +√y and x^(2) = −√y; see Figure 2.9 for another example.) We know that

f_Y(y) Δy ≈ P(y < Y ≤ y + Δy)

Now, if we can find the set of values of x such that y < g(x) ≤ y + Δy, then we can obtain f_Y(y) from the probability that X belongs to this set. That is,

P(y < Y ≤ y + Δy) = P[{x : y < g(x) ≤ y + Δy}]

For the example shown in Figure 2.9, this set consists of the following three intervals:

x^(1) < x ≤ x^(1) + Δx^(1)
x^(2) + Δx^(2) < x ≤ x^(2)
x^(3) < x ≤ x^(3) + Δx^(3)

where Δx^(1) > 0 and Δx^(3) > 0, but Δx^(2) < 0. From the foregoing it follows that

P(y < Y ≤ y + Δy) = P(x^(1) < X ≤ x^(1) + Δx^(1)) + P(x^(2) + Δx^(2) < X ≤ x^(2)) + P(x^(3) < X ≤ x^(3) + Δx^(3))

Figure 2.9 Transformation of a continuous random variable.

We can see from Figure 2.9 that the intervals have widths |Δx^(i)| = Δy/|g'(x^(i))|, so the terms on the right-hand side are given by f_X(x^(i)) Δy/|g'(x^(i))|. Hence we conclude that, when we have three roots for the equation y = g(x),

f_Y(y) = f_X(x^(1))/g'(x^(1)) + f_X(x^(2))/|g'(x^(2))| + f_X(x^(3))/g'(x^(3))

and, in general,

f_Y(y) = Σ_{i=1}^{k} f_X(x^(i))/|g'(x^(i))|   (2.71)

g'(x) is also called the Jacobian of the transformation and is often denoted by J(x). Equation 2.71 gives the pdf of the transformed variable Y in terms of the pdf of X, which is given. The use of Equation 2.71 is limited by our ability to find the roots of the equation y = g(x). If g(x) is highly nonlinear, then the solutions of y = g(x) can be difficult to find.

EXAMPLE 2.16.

Suppose X has a Gaussian distribution with a mean of 0 and variance of 1, and Y = X² + 4. Find the pdf of Y.
SOLUTION: The roots of y = x² + 4 are

x^(1) = √(y − 4),   x^(2) = −√(y − 4)

which are real for y ≥ 4, and hence

g'(x^(1)) = 2√(y − 4),   g'(x^(2)) = −2√(y − 4)

The density function of Y is given by

f_Y(y) = [f_X(√(y − 4)) + f_X(−√(y − 4))]/(2√(y − 4))

Using the Gaussian pdf of X, we obtain

f_Y(y) = [1/√(2π(y − 4))] exp[−(y − 4)/2],   y > 4
       = 0 elsewhere

Note that since y = x² + 4, and the domain of X is (−∞, ∞), the domain of Y is [4, ∞).

EXAMPLE 2.17.

Using the pdf of X shown in Figure 2.10.a and the limiter transformation y = g(x) shown in Figure 2.10.b, find the distribution of Y.

SOLUTION: For −1 < x < 1, y = x, and hence

f_Y(y) = f_X(y) = 1/6,   −1 < y < 1

All the values of x > 1 map to y = 1. Since x > 1 has a probability of 1/3, the probability that Y = 1 is equal to P(X > 1) = 1/3. Similarly, P(Y = −1) = 1/3. Thus, Y has a mixed distribution with a continuum of values in the interval (−1, 1) and a discrete set of values from the set {−1, 1}. The continuous part is characterized by a pdf and the discrete part is characterized by a probability mass function, as shown in Figure 2.10.c.
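The pdf derived in Example 2.16 can be spot-checked by simulation: the fraction of samples of Y = X² + 4 falling in an interval should match the integral of the derived density over that interval. A sketch; the sample size and the interval (4, 5] are arbitrary choices:

```python
import math
import random

random.seed(1)

def f_y(y):
    # Derived pdf of Y = X^2 + 4 for X ~ N(0, 1), from Example 2.16
    return math.exp(-(y - 4) / 2) / math.sqrt(2 * math.pi * (y - 4)) if y > 4 else 0.0

# Monte Carlo estimate of P(4 < Y <= 5)
samples = [random.gauss(0, 1) ** 2 + 4 for _ in range(200_000)]
p_mc = sum(1 for y in samples if y <= 5) / len(samples)

# Midpoint-rule integral of the derived pdf over (4, 5)
n = 20000
h = 1.0 / n
p_int = sum(f_y(4 + (k + 0.5) * h) for k in range(n)) * h

print(round(p_mc, 3), round(p_int, 3))  # both close to P(-1 <= X <= 1) = 0.683
```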
2.6.2 Functions of Several Random Variables

We now attempt to find the joint distribution of n random variables Y_1, Y_2, ..., Y_n given the distribution of n related random variables X_1, X_2, ..., X_n
62 REVIEW OF PROBABILITY AND RANDOM VARIABLES
TRANSFORMATIONS (FUNCTIONS) OF RANDOM VARIABLES 63
and the relationship between the two sets of random variables,

Y_1 = g_1(X_1, X_2)
Y_2 = g_2(X_1, X_2)

Suppose (x_1^(i), x_2^(i)), i = 1, 2, ..., k, are the k roots of y_1 = g_1(x_1, x_2) and y_2 = g_2(x_1, x_2). Proceeding along the lines of the previous section, we need to find the region in the x_1, x_2 plane such that

y_1 < g_1(x_1, x_2) < y_1 + Δy_1   and   y_2 < g_2(x_1, x_2) < y_2 + Δy_2

There are k such regions, as shown in Figure 2.11 (k = 3). Each region consists of a parallelogram, and the area of each parallelogram is equal to Δy_1Δy_2/|J(x_1^(i), x_2^(i))|, where J(x_1, x_2) is the Jacobian of the transformation. By summing the contribution from all regions, we obtain the joint pdf of Y_1 and Y_2 as

f_{Y_1,Y_2}(y_1, y_2) = Σ_{i=1}^{k} f_{X_1,X_2}(x_1^(i), x_2^(i))/|J(x_1^(i), x_2^(i))|   (2.73)

Figure 2.11 Transformation of two random variables.

Using the vector notation, we can generalize this result to the n-variate case as

f_Y(y) = Σ_i f_X(x^(i))/|J[x^(i)]|   (2.74.a)

where x^(i) = [x_1^(i), ..., x_n^(i)]^T is the ith solution to y = g(x) = [g_1(x), g_2(x), ..., g_n(x)]^T, and the Jacobian J is defined by

J[x^(i)] = det [ ∂g_1/∂x_1  ∂g_1/∂x_2  ···  ∂g_1/∂x_n ]
               [    ···        ···     ···     ···    ]
               [ ∂g_n/∂x_1  ∂g_n/∂x_2  ···  ∂g_n/∂x_n ]   evaluated at x^(i)   (2.74.b)

Suppose we have n random variables with known joint pdf, and we are interested in the joint pdf of m < n functions of them. We can introduce n − m additional functions in any convenient way so that the Jacobian is nonzero, compute the joint pdf of Y_1, Y_2, ..., Y_n, and then obtain the marginal pdf of Y_1, Y_2, ..., Y_m by integrating out y_{m+1}, ..., y_n. If the additional functions are carefully chosen, then the inverse can be easily found and the resulting integration can be handled, though often with great difficulty.
EXAMPLE 2.18.

Let two resistors, having independent resistances X_1 and X_2, uniformly distributed between 9 and 11 ohms, be placed in parallel. Find the probability density function of the resistance Y_1 of the parallel combination.

SOLUTION: The resistance of the parallel combination is

Y_1 = X_1X_2/(X_1 + X_2)

We are given

f_{X_1,X_2}(x_1, x_2) = f_{X_1}(x_1)f_{X_2}(x_2) = 1/4,   9 ≤ x_1 ≤ 11, 9 ≤ x_2 ≤ 11
                     = 0 elsewhere

Introducing the auxiliary variable

Y_2 = X_2

and solving for x_1 and x_2 results in the unique solution

x_1 = y_1y_2/(y_2 − y_1),   x_2 = y_2

The Jacobian is

J(x_1, x_2) = det [ x_2²/(x_1 + x_2)²   x_1²/(x_1 + x_2)² ]
                  [        0                   1          ]
            = x_2²/(x_1 + x_2)² = (y_2 − y_1)²/y_2²

Thus

f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(x_1, x_2)/|J(x_1, x_2)| = (1/4) y_2²/(y_2 − y_1)²

over the image of the square, and 0 elsewhere. We must now find the region in the y_1, y_2 plane that corresponds to the region 9 ≤ x_1 ≤ 11, 9 ≤ x_2 ≤ 11. Figure 2.12 shows the mapping and the resulting region in the y_1, y_2 plane.

Now, to find the marginal density of Y_1, we "integrate out" y_2:

f_{Y_1}(y_1) = ∫_9^{9y_1/(9−y_1)} y_2²/[4(y_2 − y_1)²] dy_2,   4.5 ≤ y_1 ≤ 4.95

f_{Y_1}(y_1) = ∫_{11y_1/(11−y_1)}^{11} y_2²/[4(y_2 − y_1)²] dy_2,   4.95 ≤ y_1 ≤ 5.5

and 0 elsewhere. Carrying out the integration gives

f_{Y_1}(y_1) = y_1²/[2(9 − y_1)] − (9 − y_1)/2 + y_1 ln[y_1/(9 − y_1)],   4.5 ≤ y_1 ≤ 4.95
            = (11 − y_1)/2 − y_1²/[2(11 − y_1)] + y_1 ln[(11 − y_1)/y_1],   4.95 ≤ y_1 ≤ 5.5
            = 0 elsewhere

Linear Transformations. An important special case is the linear transformation

Y_1 = a_{1,1}X_1 + a_{1,2}X_2 + ··· + a_{1,n}X_n + b_1
···
Y_n = a_{n,1}X_1 + a_{n,2}X_2 + ··· + a_{n,n}X_n + b_n

where the a_{i,j}'s and b_i's are all constants. In matrix notation, we can write this transformation as

Y = AX + B   (2.75)

If A is nonsingular, the inverse transformation exists and is given by

X = A^{−1}Y − A^{−1}B

and the Jacobian of the transformation is the determinant

J = |A|

As an important application, consider the sum Y_1 = X_1 + X_2 of two independent random variables, with the auxiliary variable Y_2 = X_2. Then x_1 = y_1 − y_2, x_2 = y_2, and the Jacobian has magnitude 1. Substituting into Equation 2.73, we obtain

f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(y_1 − y_2, y_2) = f_{X_1}(y_1 − y_2)f_{X_2}(y_2)

since X_1 and X_2 are independent. The pdf of Y_1 is obtained by integration as

f_{Y_1}(y_1) = ∫_{-∞}^{∞} f_{X_1}(y_1 − y_2)f_{X_2}(y_2) dy_2   (2.77.a)

The relationship given in Equation 2.77.a is said to be the convolution of f_{X_1} and f_{X_2}, which is written symbolically as

f_{Y_1} = f_{X_1} * f_{X_2}

Thus, the density function of the sum of two independent random variables is given by the convolution of their densities. This also implies that the characteristic functions are multiplied, and the cumulant generating functions as well as individual cumulants are summed.
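The closed-form density obtained in Example 2.18 can be checked by simulating the parallel combination directly. A sketch, assuming the two branch formulas reconstructed above; sample size and test interval are arbitrary choices:

```python
import math
import random

random.seed(2)

def f_y1(y):
    # Closed-form pdf of Y1 = X1*X2/(X1 + X2), X1, X2 ~ uniform(9, 11)
    if 4.5 <= y <= 4.95:
        return y * y / (2 * (9 - y)) - (9 - y) / 2 + y * math.log(y / (9 - y))
    if 4.95 < y <= 5.5:
        return (11 - y) / 2 - y * y / (2 * (11 - y)) + y * math.log((11 - y) / y)
    return 0.0

# Monte Carlo samples of the parallel resistance
samples = []
for _ in range(200_000):
    x1, x2 = random.uniform(9, 11), random.uniform(9, 11)
    samples.append(x1 * x2 / (x1 + x2))

# Compare the histogram mass of one bin with the pdf integral over the same bin
lo, hi = 4.8, 5.0
p_mc = sum(1 for y in samples if lo < y <= hi) / len(samples)
n = 2000
h = (hi - lo) / n
p_pdf = sum(f_y1(lo + (k + 0.5) * h) for k in range(n)) * h
print(round(p_mc, 3), round(p_pdf, 3))
```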
EXAMPLE 2.19.

X_1 and X_2 are independent random variables with identical uniform distributions. The pdf of the sum Y_1 = X_1 + X_2 is the convolution of the two rectangular pdfs, which yields the triangular pdf shown in Figure 2.13.

Figure 2.13 Convolution of pdfs, Example 2.19.

EXAMPLE 2.20.

f_{X_1}(x_1) = exp(−x_1),   x_1 ≥ 0          f_{X_2}(x_2) = 2 exp(−2x_2),   x_2 ≥ 0
           = 0,   x_1 < 0                              = 0,   x_2 < 0

The pdf of Y = X_1 + X_2 is the convolution of these two exponential pdfs, as illustrated in Figure 2.14.

Figure 2.14 Convolution for Example 2.20.

EXAMPLE 2.21.

X is a multivariate Gaussian random vector with zero means and covariance matrix Σ_X, and Y = AX, where A is nonsingular. Find the pdf of Y.
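Equation 2.77.a can be evaluated numerically for Example 2.20. For these two exponentials the convolution integral works out to f_Y(y) = 2(e^{−y} − e^{−2y}), y ≥ 0, and the numeric sum should reproduce it (the grid parameters below are implementation choices):

```python
import math

f1 = lambda x: math.exp(-x) if x >= 0 else 0.0           # f_X1
f2 = lambda x: 2 * math.exp(-2 * x) if x >= 0 else 0.0   # f_X2

def convolve_at(y, n=40000, span=40.0):
    # f_Y(y) = integral of f_X1(y - t) * f_X2(t) dt  (Equation 2.77.a)
    h = span / n
    return sum(f1(y - t) * f2(t) * h for t in ((k + 0.5) * h for k in range(n)))

y = 1.0
closed = 2 * (math.exp(-y) - math.exp(-2 * y))  # 2(e^-y - e^-2y)
print(round(convolve_at(y), 4), round(closed, 4))
```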
r
SOLUTION: We are given group. We will now show that the joint pdf of Y1 , Y 2 , ••• , Yn is given by
With x = A - 1y, and J = IAI, we obtain We shall prove this for n = 3, but the argument can be entirely general.
With n = 3
which corresponds to a multivariate Gaussian pdf with zero means and a co- A given set of values x 2 , x 3 may fall into one of the following six possibilities:
variance matrix of Iv. Hence, we conclude that Y, which is a linear transforma-
tion of a multivariate Gaussian vector X, also has a Gaussian distribution. (Note:
This cannot be generalized for any arbitrary distribution.) x1 < Xz < X3 or Y! =XI> Yz = Xz, Y3 = X3
x1 < X3 < Xz or Yt = Xj, Yz = X3, Y3 = Xz
x2 < x 1 < X3 or Yt = Xz, Yz = Y3 = X3
x2 < x3 < x1 or Yt = Xz, Yz = X3, Y3 = Xj
Order Statistics. Ordering, comparing, and finding the minimum and maximum x3 < x1 < Xz or Yt = X3, Yz = XI> Y3 = Xz
are typical statistical or data processing operations. We can use the techniques
x3< x2 < x 1 or Y! = X3, Yz = Xz, Y3 = Xj
outlined in the preceding sections for finding the distribution of minimum and
maximum values within a group of independent random variables. (Note that Xt = Xz, etc., occur with a probability of 0 since xj, Xz, x3 are
Let XI' Xz, x3' ... 'x. be a group of independent random variables having continuous random variables.)
a common pdf, fx(x), defined over the interval (a, b). To find the distribution Thus, we have six or 3! inverses. If we take a particular inverse, say, y 1
of the smallest and largest of these X;s, let us define the following transformation: X3, Yz = x 1 , and y 3 = x 2 , the Jacobian is given by
Yn = largest of (X1 , X 2 , ••• , X.) The reader can verify that, for all six inverses, the Jacobian has a magnitude of
1, and using Equation 2.71, we obtain the joint pdf of Y1 , Y 2 , Y3 as
That is Y1 < Y 2 < ··· < Yn represent X 1 , X 2 , • • • , Xn when the latter are arranged
in ascending order of magnitude. Then Y; is called the ith order statistic of the Yz, Y3) = 3!fx(Yt)fx(Yz)fx(Y3), a < Y1 < Yz < Y3 < b
Generalizing this to the case of n variables, we obtain

f_{Y_1,Y_2,...,Y_n}(y_1, y_2, ..., y_n) = n! f_X(y_1)f_X(y_2) ··· f_X(y_n),   a < y_1 < y_2 < ··· < y_n < b   (2.78.a)

The marginal pdf of Y_n is obtained by integrating out y_1, y_2, ..., y_{n−1}:

f_{Y_n}(y_n) = ∫_a^{y_n} ∫_a^{y_{n−1}} ··· ∫_a^{y_2} n! f_X(y_1)f_X(y_2) ··· f_X(y_n) dy_1 dy_2 ··· dy_{n−1}

The innermost integral on y_1 yields F_X(y_2), and the next integral is

∫_a^{y_3} F_X(y_2)f_X(y_2) dy_2 = ∫_a^{y_3} F_X(y_2) d[F_X(y_2)] = [F_X(y_3)]²/2

Repeating this process (n − 1) times, we obtain

f_{Y_n}(y_n) = n[F_X(y_n)]^{n−1} f_X(y_n),   a < y_n < b   (2.78.b)

Proceeding along similar lines, we can show that the pdf of the smallest value is

f_{Y_1}(y_1) = n[1 − F_X(y_1)]^{n−1} f_X(y_1),   a < y_1 < b

For example, if X_1, ..., X_10 are independent exponential random variables with parameter a, then from Equation 2.78.b the pdf of the largest value, Y_10, is

f_{Y_10}(y) = 10[1 − e^{−ay}]^9 a e^{−ay},   y ≥ 0
           = 0,   y < 0

Nonlinear Transformations. While it is relatively easy to find the distribution of Y = g(X) when g is linear or affine, it is usually very difficult to find the distribution of Y when g is nonlinear. However, if X is a scalar random variable, then Equation 2.71 provides a general solution. The difficulties when X is two-dimensional are illustrated by Example 2.18, and this example suggests the difficulties when X is more than two-dimensional and g is nonlinear.

For general nonlinear transformations, two approaches are common in practice. One is the Monte Carlo approach, which is outlined in the next subsection. The other approach is based upon an approximation involving moments and is presented in Section 2.7. We mention here that the mean, the variance, and higher moments of Y can be obtained easily (at least conceptually) as follows. We start with

E{h(Y)} = ∫ h(y) f_Y(y) dy

However, Y = g(X), and hence we can compute E{h(Y)} as

E_Y{h(Y)} = E_X{h(g(X))}
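The exponential order-statistic result above can be checked against simulation: the empirical CDF of the largest of 10 samples should match [F_X(y)]^10 = (1 − e^{−ay})^10. A sketch with a = 1 and y = 2 as arbitrary test values:

```python
import math
import random

random.seed(3)
a, n = 1.0, 10

trials = 100_000
y0 = 2.0
count = 0
for _ in range(trials):
    largest = max(random.expovariate(a) for _ in range(n))
    if largest <= y0:
        count += 1

empirical = count / trials
theory = (1 - math.exp(-a * y0)) ** n  # CDF of the largest order statistic
print(round(empirical, 3), round(theory, 3))
```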
It is assumed that Y = g(X_1, ..., X_n) is known and that the joint density f_{X_1,X_2,...,X_n} is known. Now, if a sample value of each random variable were known (say X_1 = x_{1,1}, X_2 = x_{1,2}, ..., X_n = x_{1,n}), then a sample value of Y could be computed [say y_1 = g(x_{1,1}, x_{1,2}, ..., x_{1,n})]. If another set of sample values were chosen for the random variables (say X_1 = x_{2,1}, ..., X_n = x_{2,n}), then y_2 = g(x_{2,1}, x_{2,2}, ..., x_{2,n}) could be computed.

Monte Carlo techniques simply consist of computer algorithms for selecting the samples x_{i,1}, ..., x_{i,n}, a method for calculating y_i = g(x_{i,1}, ..., x_{i,n}), which often is just one or a few lines of code, and a method of organizing and displaying the results of a large number of repetitions of the procedure.

Consider the case where the components of X are independent and uniformly distributed between zero and one. This is a particularly simple example because computer routines that generate pseudorandom numbers uniformly distributed between zero and one are widely available. A Monte Carlo program that approximates the distribution of Y when X is of dimension 20 is shown in Figure 2.15: generate 20 random numbers and store them as x_1, ..., x_20; compute y = g(x_1, ..., x_20); repeat; then organize the y's and print or plot.

Figure 2.15 Simple Monte Carlo simulation.

The required number of samples is beyond the scope of this introduction. However, the usual result of a Monte Carlo routine is a histogram, and the errors of histograms, which are a function of the number of samples, are discussed in Chapter 8.

If the random variable X_i is not uniformly distributed between zero and one, then random sampling is somewhat more difficult. In such cases the following procedure is used. Select a random sample of U that is uniformly distributed between 0 and 1. Call this random sample u_1. Then F_{X_i}^{−1}(u_1) is the random sample of X_i.
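The procedure of Figure 2.15 is a few lines in any language. A sketch with a hypothetical g, here simply the sum of the 20 components, whose sample mean should be near 20 × 1/2 = 10:

```python
import random
from collections import Counter

random.seed(4)

def g(xs):
    # A hypothetical g for illustration: the sum of the 20 components
    return sum(xs)

# Monte Carlo loop of Figure 2.15: draw x_1..x_20, compute y, organize into a histogram
ys = [g([random.random() for _ in range(20)]) for _ in range(50_000)]
hist = Counter(round(y) for y in ys)

mean = sum(ys) / len(ys)
print(round(mean, 2))  # near 10.0
```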
For example, suppose that X_i is uniformly distributed between 10 and 20. Then

F_{X_i}(x) = 0,   x < 10
          = (x − 10)/10,   10 ≤ x < 20
          = 1,   x ≥ 20

Notice that F_{X_i}^{−1}(u) = 10u + 10. Thus, if the value .250 were the random sample of U, then the corresponding random sample of X_i would be 12.5.

The reader is asked to show, using Equation 2.71, that if X_i has a density function and X_i = F_i^{−1}(U) = g(U), where U is uniformly distributed between zero and one and F_i^{−1} is unique, then

f_{X_i}(x) = dF_i(x)/dx

2.7 BOUNDS AND APPROXIMATIONS

In many applications requiring the calculation of probabilities we often face the following situations:

1. The underlying distributions are not completely specified; only the means, variances, and some of the higher order moments E{(X − μ_X)^k}, k > 2, are known.
2. The underlying density function is known, but integration in closed form is not possible (example: the Gaussian pdf).

In these cases we use several approximation techniques that yield upper and/or lower bounds on probabilities.

2.7.1 Tchebycheff Inequality

If only the mean and variance of a random variable X are known, we can obtain upper bounds on P(|X| ≥ ε) using the Tchebycheff inequality, which we prove now. Suppose X is a random variable, and we define

Y_ε = 1   if |X| ≥ ε
    = 0   if |X| < ε

where ε is a positive constant. From the definition of Y_ε it follows that

X² ≥ X²Y_ε ≥ ε²Y_ε

and thus

E{X²} ≥ ε²E{Y_ε} = ε²P(|X| ≥ ε)   (2.80)

that is,

P(|X| ≥ ε) ≤ E{X²}/ε²   (2.82.a)

(Note that the foregoing inequality does not require the complete distribution of X; that is, it is distribution free.)

Now, if we let X = (Y − μ_Y) and ε = kσ_Y, Equation 2.82.a takes the form

P(|Y − μ_Y| ≥ kσ_Y) ≤ 1/k²   (2.82.b)

or

P(|Y − μ_Y| ≥ ε) ≤ σ_Y²/ε²   (2.82.c)
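The inverse-transform procedure described above is equally short in code. Below, the text's uniform(10, 20) example, plus an exponential target with F^{−1}(u) = −ln(1 − u), a standard case used here only as a second illustration:

```python
import math
import random

random.seed(5)

# Inverse-transform sampling: X = F^{-1}(U), U ~ uniform(0, 1).
# For X uniform on (10, 20), F^{-1}(u) = 10u + 10, as in the text.
inv_F = lambda u: 10 * u + 10
print(inv_F(0.250))  # 12.5, the text's example value

# Same idea for an exponential target: F(x) = 1 - e^{-x}, so F^{-1}(u) = -ln(1 - u)
samples = [-math.log(1 - random.random()) for _ in range(100_000)]
m = sum(samples) / len(samples)
print(round(m, 2))  # sample mean near 1.0, the exponential mean
```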
Equation 2.82.b gives an upper bound on the probability that a random variable has a value that deviates from its mean by more than k times its standard deviation. Equation 2.82.b thus justifies the use of the standard deviation as a measure of variability for any random variable.

2.7.2 Chernoff Bound

The Tchebycheff inequality often provides a very "loose" upper bound on probabilities. The Chernoff bound provides a "tighter" bound. To derive the Chernoff bound, define

Y_ε = 1,   X ≥ ε
    = 0,   X < ε

Then, for t ≥ 0,

e^{tX} ≥ e^{tε}Y_ε

and, hence,

E{e^{tX}} ≥ e^{tε}E{Y_ε} = e^{tε}P(X ≥ ε)

or

P(X ≥ ε) ≤ e^{−tε}E{e^{tX}},   t ≥ 0   (2.83)

Equation 2.83 is the Chernoff bound; the tightest version is obtained by minimizing the right-hand side over t ≥ 0. While the advantage of the Chernoff bound is that it is tighter than the Tchebycheff bound, the disadvantage of the Chernoff bound is that it requires the evaluation of E{e^{tX}} and thus requires more extensive knowledge of the distribution. The Tchebycheff bound does not require such knowledge of the distribution.

2.7.3 Union Bound

This bound is very useful in approximating the probability of the union of events, and it follows directly from

P(A ∪ B) = P(A) + P(B) − P(AB) ≤ P(A) + P(B)

since P(AB) ≥ 0. This result can be generalized as

P(∪_i A_i) ≤ Σ_i P(A_i)   (2.84)

We now present an example to illustrate the use of these bounds.

EXAMPLE 2.22.

X_1 and X_2 are two independent Gaussian random variables with μ_{X_1} = μ_{X_2} = 0 and σ_{X_1}² = 1 and σ_{X_2}² = 4.

(a) Find the Tchebycheff and Chernoff bounds on P(X_1 ≥ 3) and compare them with the exact value of P(X_1 ≥ 3).
(b) Find the union bound on P(X_1 ≥ 3 or X_2 ≥ 4) and compare it with the actual value.

SOLUTION:

(a) The Tchebycheff bound on P(X_1 ≥ 3) is obtained using Equation 2.82.c as

P(X_1 ≥ 3) ≤ P(|X_1| ≥ 3) ≤ 1/9 ≈ 0.111

For the Chernoff bound, we need

E{e^{tX_1}} = ∫_{-∞}^{∞} e^{tx_1} (1/√(2π)) e^{−x_1²/2} dx_1 = e^{t²/2} ∫_{-∞}^{∞} (1/√(2π)) exp[−(x_1 − t)²/2] dx_1 = e^{t²/2}

Hence, from Equation 2.83,

P(X_1 ≥ 3) ≤ e^{−3t}e^{t²/2},   t ≥ 0

The right-hand side is minimized when t = 3, which gives

P(X_1 ≥ 3) ≤ e^{−9/2} ≈ 0.0111

The exact value is P(X_1 ≥ 3) = Q(3) ≈ .0013. Comparison of the exact value with the Chernoff and Tchebycheff bounds indicates that the Tchebycheff bound is much looser than the Chernoff bound. This is to be expected, since the Tchebycheff bound does not take into account the functional form of the pdf.

(b) Since X_1 and X_2 are independent,

P(X_1 ≥ 3 or X_2 ≥ 4) = P(X_1 ≥ 3) + P(X_2 ≥ 4) − P(X_1 ≥ 3)P(X_2 ≥ 4)
                      = (.0013) + (.0228) − (.0013)(.0228) = .02407

The union bound consists of the sum of the first two terms of the right-hand side of the preceding equation, that is, .0241, and the union bound is "off" by the value of the third term. The union bound is usually very tight when the probabilities involved are small and the random variables are independent.

2.7.4 Approximating the Mean and Variance of Y = g(X_1, ..., X_n)

When Y = g(X_1, X_2, ..., X_n) and g is approximated by the linear terms of its Taylor series about the means, the mean and variance of Y are approximated by

μ_Y ≈ g(μ_1, μ_2, ..., μ_n)

σ_Y² = E[(Y − μ_Y)²] ≈ Σ_i (∂g/∂x_i)² σ_{X_i}² + Σ_{i≠j} (∂g/∂x_i)(∂g/∂x_j) ρ_{X_iX_j} σ_{X_i}σ_{X_j}

where the partial derivatives are evaluated at the means,

μ_i = E[X_i]

and

ρ_{X_iX_j} = E[(X_i − μ_i)(X_j − μ_j)]/(σ_{X_i}σ_{X_j})

This suggests that if n is reasonably large, then it may not be too unreasonable to assume that Y is normal if the X_i's meet certain conditions.
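The numbers in the bounds example above are easy to reproduce with the Q function; the Chernoff minimization over t is done here by a brute-force grid search (an implementation choice, not the book's method):

```python
import math

Q = lambda y: 0.5 * math.erfc(y / math.sqrt(2.0))

exact = Q(3.0)       # P(X1 >= 3), X1 standard normal
tcheby = 1.0 / 9.0   # Tchebycheff: P(|X1| >= 3) <= sigma^2 / 3^2

# Chernoff: min over t >= 0 of e^{-3t} E{e^{t X1}} = e^{-3t + t^2/2}
chernoff = min(math.exp(-3 * t + t * t / 2.0)
               for t in (k / 1000.0 for k in range(1, 6001)))
print(round(exact, 4), round(chernoff, 4), round(tcheby, 4))

# Union bound for P(X1 >= 3 or X2 >= 4), with X2 ~ N(0, 4)
p1, p2 = Q(3.0), Q(4.0 / 2.0)
union_bound = p1 + p2
actual = p1 + p2 - p1 * p2
print(round(union_bound, 4), round(actual, 4))
```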
In some applications, .such as. those that involve nonlinear transformations, it
will not be possible to calculate the probability density functions in closed form.
EXAMPLE 2.24. However, it might be easy to calculate the expected values. As an example,
consider Y = X 3 • Even if the pdf of Y cannot be specified in analytical form,
it might be possible to calculate E{Yk} = E{X3k} for k :s: m. In the following
Xr paragraphs we present a method for appwximating the unknown pdf fv(y) of
y = X2 + X3X4 - Xs2 a random variable Y whose moments E{Yk} are known. To simplify the algebra,
we will assume that E{Y} = 0 and a} = 1.
The readers have seen the Fourier series expansion for periodic functions.
The X;s are independent. A similar series approach can be used to expand probability density functions.
A commonly used and mathematically tractable series approximation is the
Gram-Charlier series, which has the form:
f.Lx, = 10 a},= 1
f.Lx, = 2 a2x,-
--
1 fv(Y) = h(y) L CiHi(Y) (2.85)
2 j=O
1
f.Lx, = 3 a2x,-
--4 where
a2x,-
--
1 1
f.Lx, = 4 3 h(y) = . ;-;;-- exp{ -y2f2) (2.86)
V21T
1
f.Lx, = 1 a2x,-
--5
and the basis functions of the expansion, Hi(y), are the Tchebycheff-Hermite
(T-H) polynomials. The first eight T-H polynomials are
Find approximately (a) f.Ly, (b) a}, and (c) P(Y :s: 20).
Ho(Y) = 1
SOLUTION: Hr(Y) = y
Hz(Y) = Y 2 - 1
10
(a) f.Ly = 2 + (3)(4) - 1 = 16 HJ(y) = y 3 - 3y
Hly) = y4 - 6y2 + 3
(b) a}= GY 1
(1) + ( - 4 °YG) + 4
2
+ 32 G) + G)
2
2
Hs(Y)
H 6(y)
= y5
= y6
- 10y 3
15y 4
+ 15y
+ 45y 2 - 15
= 11.2 -
H-,(y) = y7 -
5
21y + 105y3- 105y
(c) With only five terms in the approximate linear equation, we assume, H 8(y) = y 8
- 28y 6 + 210y 4 - 420y 2 + 105 (2.87)
for an approximation, that Y is normal. Thus
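The linear approximation of Example 2.24 above can be reproduced numerically. A sketch, assuming (as in the example) that only the means and variances of the X_i's are used, with Φ computed from math.erf:

```python
import math

# Means and variances from Example 2.24
mu = [10, 2, 3, 4, 1]
var = [1, 1/2, 1/4, 1/3, 1/5]

# Partial derivatives of g(x) = x1/x2 + x3*x4 - x5**2, evaluated at the means
d = [1/mu[1],            # dg/dx1 = 1/x2
     -mu[0]/mu[1]**2,    # dg/dx2 = -x1/x2**2
     mu[3],              # dg/dx3 = x4
     mu[2],              # dg/dx4 = x3
     -2*mu[4]]           # dg/dx5 = -2*x5

mu_y = mu[0]/mu[1] + mu[2]*mu[3] - mu[4]**2          # = 16
var_y = sum(di**2 * vi for di, vi in zip(d, var))    # about 11.2

def phi(z):
    """Standard Gaussian distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p = phi((20 - mu_y) / math.sqrt(var_y))  # P(Y <= 20) under the normal assumption
```

The independence of the X_i's is what allows the cross terms of the variance approximation to be dropped.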
The T-H polynomials satisfy the recurrence relation

H_k(y) − yH_{k−1}(y) + (k − 1)H_{k−2}(y) = 0,  k ≥ 2

and they are orthogonal with respect to the weight function h(y). Using this orthogonality, the coefficients of the expansion in Equation 2.85 are obtained as

C_k = (1/k!) ∫_{−∞}^{∞} H_k(y) f_Y(y) dy

which can be written in terms of the moments of Y as

C_k = (1/k!) [μ_k − (k^[2]/(2·1!)) μ_{k−2} + (k^[4]/(2²·2!)) μ_{k−4} − ···]   (2.89.a)

where

μ_m = E{Y^m}

and

k^[m] = k!/(k − m)! = k(k − 1) ··· [k − (m − 1)],  k ≥ m

The first eight coefficients follow directly from Equations 2.87 and 2.89.a and are given by

C_0 = 1
C_1 = μ_1
C_2 = (1/2)(μ_2 − 1)
C_3 = (1/6)(μ_3 − 3μ_1)
C_4 = (1/24)(μ_4 − 6μ_2 + 3)
C_5 = (1/120)(μ_5 − 10μ_3 + 15μ_1)
C_6 = (1/720)(μ_6 − 15μ_4 + 45μ_2 − 15)
C_7 = (1/5040)(μ_7 − 21μ_5 + 105μ_3 − 105μ_1)
C_8 = (1/40320)(μ_8 − 28μ_6 + 210μ_4 − 420μ_2 + 105)   (2.89.b)

Substituting Equation 2.89 into Equation 2.85, we obtain the series expansion for the pdf of an arbitrary random variable X in terms of the standardized variable z = (x − μ_X)/σ_X:

f_X(x) = (1/σ_X) h(z) Σ_{j=0}^{∞} C̃_j H_j(z)   (2.90)

where the coefficients C̃_j are given by Equation 2.89 with μ̃_k used for μ_k, where

μ̃_k = E{[(X − μ_X)/σ_X]^k}

EXAMPLE 2.25.

For a random variable X

μ_1 = 3,  μ_2 = 13,  μ_3 = 59,  μ_4 = 309

Find P(X ≤ 5) using four terms of a Gram-Charlier series.

SOLUTION:

σ_X² = E(X²) − [E(X)]² = μ_2 − μ_1² = 4

Converting to the standard normal form

Z = (X − 3)/2

Then the moments of Z are

μ̃_1 = 0
μ̃_2 = 1
μ̃_3 = (μ_3 − 9μ_2 + 27μ_1 − 27)/8 = −.5
μ̃_4 = (μ_4 − 12μ_3 + 54μ_2 − 108μ_1 + 81)/16 = 3.75
Then for the random variable Z, using Equation 2.89,

C_0 = 1
C_1 = 0
C_2 = 0
C_3 = (1/6)(−.5) = −.08333
C_4 = (1/24)(3.75 − 6 + 3) = .03125

and, using the identity ∫_{−∞}^{z₀} h(z)H_k(z) dz = −H_{k−1}(z₀)h(z₀) (see Problem 2.52),

P(X ≤ 5) = P(Z ≤ 1)
= ∫_{−∞}^{1} h(z) dz + (−.0833) ∫_{−∞}^{1} h(z)H_3(z) dz + (.03125) ∫_{−∞}^{1} h(z)H_4(z) dz
= .8413 + (.0833)H_2(1)h(1) − (.03125)H_3(1)h(1)
= .8413 + 0 + .0151 = .8564

since H_2(1) = 0 and H_3(1) = −2.

Equation 2.90 is a series approximation to the pdf of a random variable X whose moments are known. If we know only the first two moments, then the series approximation reduces to

f_X(x) = (1/(√(2π) σ_X)) exp[−(x − μ_X)²/(2σ_X²)]

which says that (if only the first and second moments of a random variable are known) the Gaussian pdf is used as an approximation to the underlying pdf. As we add more terms, the higher order terms will force the pdf to take a more proper shape.

A series of the form given in Equation 2.90 is useful only if it converges rapidly and the terms can be calculated easily. This is true for the Gram-Charlier series when the underlying pdf is nearly Gaussian or when the random variable X is the sum of many independent components. Unfortunately, the Gram-Charlier series is not uniformly convergent, thus adding more terms does not guarantee increased accuracy. A rule of thumb suggests four to six terms for many practical applications.

Computing probabilities from a Gaussian (or Gram-Charlier) approximation requires values of the Gaussian tail probability

Q(y) = ∫_{y}^{∞} (1/√(2π)) exp(−x²/2) dx   (2.91.a)

For 0 ≤ y, the following approximation is excellent as measured by |e(y)|, the magnitude of the error:

Q(y) = h(y)(b₁t + b₂t² + b₃t³ + b₄t⁴ + b₅t⁵) + e(y)   (2.91.b)

where

h(y) = (1/√(2π)) exp(−y²/2)
t = 1/(1 + py)
p = .2316419
b₁ = .319381530
b₂ = −.356563782
b₃ = 1.781477937
b₄ = −1.821255978
b₅ = 1.330274429

and

|e(y)| < 7.5 × 10⁻⁸
88 REVIEW OF PROBABILITY AND RANDOM VARIABLES SEQUENCES OF RANDOM VARIABLES AND CONVERGENCE 89
2.8 SEQUENCES OF RANDOM VARIABLES AND CONVERGENCE

One of the most important concepts in mathematical analysis is the concept of convergence and the existence of a limit. Fundamental operations of calculus such as differentiation, integration, and summation of infinite series are defined by means of a limiting process. The same is true in many engineering applications, for example, the steady state of a dynamic system or the asymptotic trajectory of a moving object. It is similarly useful to study the convergence of random sequences.

With real continuous functions, we use the notation

x(t) → a as t → t₀  or  lim_{t→t₀} x(t) = a

With a random sequence, each outcome λ of the underlying random experiment generates a sequence of numbers

X1(λ), X2(λ), ..., Xn(λ), ...

and hence the random sequence X1, X2, ..., Xn represents a family of sequences. If each member of the family converges to a limit, that is, if X1(λ), X2(λ), ... converges for every λ ∈ S, then we say that the random sequence converges everywhere. The limit of each sequence can depend upon λ, and if we denote the limit by X, then X is a random variable.

Now, there may be cases where the sequence does not converge for every outcome. In such cases if the set of outcomes for which the limit exists has a probability of 1, that is, if

P{λ : lim_{n→∞} Xn(λ) = X(λ)} = 1

then we say that the sequence converges almost everywhere or almost surely. This is written as

P{Xn → X} = 1 as n → ∞   (2.92)

Central Limit Theorem. Let X1, X2, ..., Xn be a sequence of independent, identically distributed random variables, each with mean μ and finite variance σ², and let Zn be the standardized sum

Zn = Σ_{k=1}^{n} (Xk − μ)/(σ√n)

Then the distribution of Zn converges to the standard Gaussian distribution as n → ∞. To prove this, assume that the moment-generating function M(t) = E{exp(tXk)} exists for −h < t < h, so that

m(t) = E{exp[t(Xk − μ)]} = exp(−μt)M(t)

exists for −h < t < h. Furthermore, since Xk has a finite mean and variance, the first two derivatives of M(t) and hence the derivatives of m(t) exist at t =
0. We can use Taylor's formula and expand m(t) as

m(t) = m(0) + m′(0)t + m″(ξ)t²/2,  0 ≤ ξ < t
     = 1 + σ²t²/2 + [m″(ξ) − σ²]t²/2

since m(0) = 1 and m′(0) = E{Xk − μ} = 0. Next consider

Mn(T) = E{exp(TZn)}
= E{exp(T(X1 − μ)/(σ√n)) exp(T(X2 − μ)/(σ√n)) ··· exp(T(Xn − μ)/(σ√n))}
= E{exp(T(X1 − μ)/(σ√n))} ··· E{exp(T(Xn − μ)/(σ√n))}
= [m(T/(σ√n))]ⁿ,  −h < T/(σ√n) < h

In m(t), replace t by T/(σ√n) to obtain

m(T/(σ√n)) = 1 + T²/(2n) + [m″(ξ) − σ²]T²/(2nσ²)

where now ξ is between 0 and T/(σ√n). Accordingly,

Mn(T) = {1 + T²/(2n) + [m″(ξ) − σ²]T²/(2nσ²)}ⁿ,  0 ≤ ξ < T/(σ√n)

Since m″(t) is continuous at t = 0 and since ξ → 0 as n → ∞, we have

lim_{n→∞} [m″(ξ) − σ²] = 0

and

lim_{n→∞} Mn(T) = lim_{n→∞} {1 + T²/(2n)}ⁿ = exp(T²/2)   (2.94)

(The last step follows from the familiar formula of calculus lim_{n→∞}[1 + a/n]ⁿ = eᵃ.) Since exp(T²/2) is the moment-generating function of a Gaussian random variable with 0 mean and variance 1, and since the moment-generating function uniquely determines the underlying pdf at all points of continuity, Equation 2.94 shows that Zn converges to a Gaussian distribution with 0 mean and variance 1.

In many engineering applications, the central limit theorem and hence the Gaussian pdf play an important role. For example, the output of a linear system is a weighted sum of the input values, and if the input is a sequence of random variables, then the output can be approximated by a Gaussian distribution. Another example is the total noise in a radio link, which can be modeled as the sum of the contributions from a large number of independent sources. The central limit theorem permits us to model the total noise by a Gaussian distribution.

We had assumed that the Xi's are independent and identically distributed and that the moment-generating function exists in order to prove the central limit theorem. The theorem, however, holds under a variety of weaker conditions (Reference [6]):

1. The random variables X1, X2, ..., in the original sequence are independent with the same mean and variance but not identically distributed.
2. X1, X2, ..., are independent with different means, same variance, and not identically distributed.
3. Assume X1, X2, X3, ... are independent and have variances σ1², σ2², .... If there exist positive constants ε and δ such that ε < σi² < δ for all i, then the distribution of the standardized sum converges to the standard Gaussian; this says in particular that the variances must exist and be neither too large nor too small.

The assumption of finite variances, however, is essential for the central limit theorem to hold.

Finite Sums. The central limit theorem states that an infinite sum Y has a normal distribution. For a finite sum of independent random variables, that is,

Y = Σ_{i=1}^{n} Xi

we have

f_Y = f_{X1} * f_{X2} * ··· * f_{Xn}

where * denotes convolution, and

Ψ_Y(ω) = Π_{i=1}^{n} Ψ_{Xi}(ω)
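The convergence asserted by Equation 2.94 can be illustrated by simulation. A sketch, with an arbitrary seed, number of trials, and choice of uniform summands:

```python
import math
import random

random.seed(1)

n = 12           # terms per standardized sum
trials = 50_000  # number of standardized sums

mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and std of a U(0,1) variable

count = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    z = (s - n * mu) / (sigma * math.sqrt(n))  # standardized sum Zn
    if z <= 1.0:
        count += 1

p_hat = count / trials
phi_1 = 0.5 * (1 + math.erf(1 / math.sqrt(2)))  # Phi(1), about .8413
```

Even for n = 12 the empirical distribution of Zn is very close to the standard Gaussian, which is why sums of a modest number of uniform variables are often used to generate approximately Gaussian samples.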
and

C_Y(ω) = Σ_{i=1}^{n} C_{Xi}(ω)

where Ψ is the characteristic function and C is the cumulant-generating function. Also, if K_i is the ith cumulant, that is, the coefficient of (jω)^i/i! in a power series expansion of C, then it follows that

K_{i,Y} = Σ_{j=1}^{n} K_{i,Xj}

In particular, the first cumulant is the mean, thus

μ_Y = Σ_{i=1}^{n} μ_{Xi}

the second cumulant is the variance, thus

σ_Y² = Σ_{i=1}^{n} σ_{Xi}²

the third cumulant K_{3,X} is E{(X − μ_X)³}, thus

E{(Y − μ_Y)³} = Σ_{i=1}^{n} E{(Xi − μ_{Xi})³}

and K_{4,X} is E{(X − μ_X)⁴} − 3K_{2,X}², thus

K_{4,Y} = Σ_{i=1}^{n} K_{4,Xi} = Σ_{i=1}^{n} (E{(Xi − μ_{Xi})⁴} − 3K_{2,Xi}²)

EXAMPLE 2.26.

Find the resistance of a circuit consisting of five independent resistances in series. All resistances are assumed to have a uniform density function between 1.95 and 2.05 ohms (2 ohms ± 2.5%). Find the resistance of the series combination and compare it with the normal approximation.

SOLUTION: The exact density is found by four convolutions of uniform density functions. The mean value of each resistance is 2 and the standard deviation is (20√3)⁻¹. The exact density function of the resistance of the series circuit is plotted in Figure 2.17 along with the normal density function, which has the same mean (10) and the same variance (1/240). Note the close correspondence.

Figure 2.17 Exact pdf of the series resistance and its normal approximation (pdf versus resistance, 9.70 to 10.25 ohms).
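Example 2.26 can be checked numerically by discretizing the uniform pdf and convolving it four times. A sketch; the grid step is an arbitrary choice:

```python
# Discretize the U(1.95, 2.05) pdf and convolve it four times to obtain the
# pdf of the sum of five independent resistances.
dx = 0.001
single = [10.0] * 100  # density = 1/0.1 = 10, one value per cell of width dx

def convolve(f, g):
    """Discrete approximation of the convolution integral."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj * dx
    return out

pdf = single
for _ in range(4):
    pdf = convolve(pdf, single)

# Cell midpoints: each single cell k represents 1.95 + (k + 0.5)*dx, so the
# sum of five cell indices K represents 5*1.95 + (K + 2.5)*dx
xs = [5 * 1.95 + (k + 2.5) * dx for k in range(len(pdf))]

total = sum(p * dx for p in pdf)
mean = sum(x * p * dx for x, p in zip(xs, pdf))
var = sum((x - mean) ** 2 * p * dx for x, p in zip(xs, pdf))
```

The numerical mean and variance match the cumulant sums quoted in the example (mean 10, variance 1/240) up to discretization error.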
Convergence in Probability. If

lim_{n→∞} P{|Xn − X| ≥ ε} = 0

for any ε > 0, then we say that Xn converges to the random variable X in probability. This is also called stochastic convergence. An important application of convergence in probability is the law of large numbers.

Law of Large Numbers. Assume that X1, X2, ..., Xn is a sequence of independent random variables each with mean μ and variance σ². Then, if we define

X̄n = (1/n) Σ_{i=1}^{n} Xi   (2.95.a)

then

lim_{n→∞} P{|X̄n − μ| ≥ ε} = 0 for each ε > 0   (2.95.b)

The law of large numbers can be proved directly by using Tchebycheff's inequality.

Mean-Square Convergence. If

E{(Xn − X)²} → 0 as n → ∞

then we say that Xn converges to X in mean square, and we write

l.i.m. Xn = X

where l.i.m. is meant to suggest the phrase limit in mean (square) to distinguish it from the symbol lim for the ordinary limit of a sequence of numbers.

Figure 2.18 Relationship between various modes of convergence.

Although the verification of some modes of convergence is difficult to establish, the Cauchy criterion can be used to establish conditions for mean-square convergence. For deterministic sequences the Cauchy criterion establishes convergence of xn to x without actually requiring the value of the limit, that is, x. In the deterministic case, xn → x if

|x_{n+m} − xn| → 0 as n → ∞ for any m > 0

For random sequences the following version of the Cauchy criterion applies:

E{(X_{n+m} − Xn)²} → 0 as n → ∞

if and only if

l.i.m. Xn = X

2.9 SUMMARY

The reviews of probability, random variables, distribution function, probability mass function (for discrete random variables), and probability density functions (for continuous random variables) were brief, as was the review of expected value. Four particularly useful expected values were briefly discussed: the characteristic function E{exp(jωX)}; the moment generating function E{exp(tX)}; the cumulant generating function ln E{exp(tX)}; and the probability generating function E{z^X} (for non-negative integer-valued random variables).
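The law of large numbers (Equation 2.95.b) is easy to illustrate by simulation. A minimal sketch, assuming U(0,1) samples; the seed and sample sizes are arbitrary choices:

```python
import random

random.seed(7)

mu = 0.5  # mean of a U(0,1) random variable

def sample_mean(n):
    """The sample mean X_bar_n of n independent U(0,1) draws."""
    return sum(random.random() for _ in range(n)) / n

# The deviation |X_bar_n - mu| tends to shrink as n grows
dev_small = abs(sample_mean(100) - mu)
dev_large = abs(sample_mean(100_000) - mu)
```

By Tchebycheff's inequality, P{|X̄n − μ| ≥ ε} ≤ σ²/(nε²), so the deviation probability shrinks like 1/n, which is exactly what repeated runs of this sketch exhibit.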
The review of random vectors, that is, vector random variables, extended the ideas of marginal, joint, and conditional density functions to n dimensions, and vector notation was introduced. Multivariate normal random variables were emphasized.

Transformations of random variables were reviewed. The special cases of a function of one random variable and a sum (or more generally an affine transformation) of random variables were considered. Order statistics were considered as a special transformation. The difficulty of a general nonlinear transformation was illustrated by an example, and the Monte Carlo technique was introduced.

We reviewed the following bounds: the Tchebycheff inequality, the Chernoff bound, and the union bound. We also discussed the Gram-Charlier series approximation to a density function using moments. Approximating the distribution of Y = g(X1, ..., Xn) using a linear approximation with the first two moments was also reviewed. Numerical approximations to the Gaussian distribution function were suggested.

Limit concepts for sequences of random variables were introduced. Convergence almost everywhere, in distribution, in probability, and in mean square were defined. The central limit theorem and the law of large numbers were introduced. Finite sum convergence was also discussed. These concepts will prove to be essential in our study of random signals.

2.10 REFERENCES

The material presented in this chapter was intended as a review of probability and random variables. For additional details, the reader may refer to one of the following books. Reference [2], particularly Vol. I, has become a classic text for courses in probability theory. References [8] and the first edition of [7] are widely used for courses in applied probability taught by electrical engineering departments. References [1], [3], and [10] also provide an introduction to probability from an electrical engineering perspective. Reference [4] is a widely used text for statistics and the first five chapters are an excellent introduction to probability. Reference [5] contains an excellent treatment of series approximations and cumulants. Reference [6] is written at a slightly higher level and presents the theory of many useful applications. Reference [9] describes a theory of probable reasoning that is based on a set of axioms that differs from those used in probability.

[1] A. M. Breipohl, Probabilistic Systems Analysis, John Wiley & Sons, New York, 1970.
[2] W. Feller, An Introduction to Probability Theory and Its Applications, Vols. I, II, John Wiley & Sons, New York, 1957, 1967.
[3] C. W. Helstrom, Probability and Stochastic Processes for Engineers, Macmillan, New York, 1977.
[4] R. V. Hogg and A. T. Craig, Introduction to Mathematical Statistics, Macmillan, New York, 1978.
[5] M. Kendall and A. Stuart, The Advanced Theory of Statistics, Vol. 1, 4th ed., Macmillan, New York, 1977.
[6] H. J. Larson and B. O. Shubert, Probabilistic Models in Engineering Sciences, Vol. I, John Wiley & Sons, New York, 1979.
[7] A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-Hill, New York, 1984.
[8] P. Z. Peebles, Jr., Probability, Random Variables, and Random Signal Principles, 2nd ed., McGraw-Hill, New York, 1987.
[9] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, N.J., 1976.
[10] J. B. Thomas, An Introduction to Applied Probability and Random Processes, John Wiley & Sons, New York, 1971.

2.11 PROBLEMS

2.1 Suppose we draw four cards from an ordinary deck of cards. Let

A1: an ace on the first draw
A2: an ace on the second draw
A3: an ace on the third draw
A4: an ace on the fourth draw

a. Find P(A1 ∩ A2 ∩ A3 ∩ A4) assuming that the cards are drawn with replacement (i.e., each card is replaced and the deck is reshuffled after a card is drawn and observed).
b. Find P(A1 ∩ A2 ∩ A3 ∩ A4) assuming that the cards are drawn without replacement.

2.2 A random experiment consists of tossing a die and observing the number of dots showing up. Let

A1: number of dots showing up = 3
A2: even number of dots showing up
A3: odd number of dots showing up

a. Find P(A1) and P(A1 ∩ A3).
b. Find P(A2 ∪ A3), P(A2 ∩ A3), P(A1|A3).
c. Are A2 and A3 disjoint?
d. Are A2 and A3 independent?

2.3 A box contains three 100-ohm resistors labeled R1, R2, and R3 and two 1000-ohm resistors labeled R4 and R5. Two resistors are drawn from this box without replacement.
a. List all the outcomes of this random experiment. [A typical outcome may be listed as (R1, R5) to represent that R1 was drawn first followed by R5.]
b. Find the probability that both resistors are 100-ohm resistors.
c. Find the probability of drawing one 100-ohm resistor and one 1000-ohm resistor.
d. Find the probability of drawing a 100-ohm resistor on the first draw and a 1000-ohm resistor on the second draw.

Work parts (b), (c), and (d) by counting the outcomes that belong to the appropriate events.

2.4 With reference to the random experiment described in Problem 2.3, define the following events.

A1: 100-ohm resistor on the first draw
A2: 1000-ohm resistor on the first draw
B1: 100-ohm resistor on the second draw
B2: 1000-ohm resistor on the second draw

a. Find P(A1B1), P(A2B1), and P(A2B2).
b. Find P(A1), P(A2), P(B1|A1), and P(B1|A2). Verify that P(B1) = P(B1|A1)P(A1) + P(B1|A2)P(A2).

2.5 Show that:

a. P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(AB) − P(BC) − P(CA) + P(ABC).
b. P(A|B) = P(A) implies P(B|A) = P(B).
c. P(ABC) = P(A)P(B|A)P(C|AB).

2.6 A1, A2, A3 are three mutually exclusive and exhaustive sets of events associated with a random experiment E1. Events B1, B2, and B3 are mutually exclusive and exhaustive sets of events associated with a random experiment E2. The joint probabilities of occurrence of these events and some marginal probabilities are listed in the table:

        B1      B2      B3
A1      3/36    *       5/36
A2      5/36    4/36    5/36
A3      *       6/36    *
P(Bi)   12/36   14/36   *

a. Find the missing probabilities (*) in the table.
b. Find P(B3|A1) and P(A1|B3).
c. Are events A1 and B1 statistically independent?

2.7 There are two bags containing mixtures of blue and red marbles. The first bag contains 7 red marbles and 3 blue marbles. The second bag contains 4 red marbles and 5 blue marbles. One marble is drawn from bag one and transferred to bag two. Then a marble is taken out of bag two. Given that the marble drawn from the second bag is red, find the probability that the color of the marble transferred from the first bag to the second bag was blue.

2.8 In the diagram shown in Figure 2.19, each switch is in a closed state with probability p, and in the open state with probability 1 − p. Assuming that the state of one switch is independent of the state of another switch, find the probability that a closed path can be maintained between A and B. (Note: There are many closed paths between A and B.)

Figure 2.19 Circuit diagram for Problem 2.8.

2.9 The probability that a student passes a certain exam is .9, given that he studied. The probability that he passes the exam without studying is .2. Assume that the probability that the student studies for an exam is .75 (a somewhat lazy student). Given that the student passed the exam, what is the probability that he studied?

2.10 A fair coin is tossed four times and the faces showing up are observed.

a. List all the outcomes of this random experiment.
b. If X is the number of heads in each of the outcomes of this experiment, find the probability mass function of X.
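For counting problems such as Problem 2.3 above, direct enumeration is a useful check. A sketch; the labels follow the problem statement:

```python
from itertools import permutations

resistors = {"R1": 100, "R2": 100, "R3": 100, "R4": 1000, "R5": 1000}

# All ordered draws of two distinct resistors: 20 equally likely outcomes
outcomes = list(permutations(resistors, 2))

def prob(event):
    """Fraction of equally likely outcomes satisfying the event."""
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

p_both_100 = prob(lambda o: resistors[o[0]] == 100 and resistors[o[1]] == 100)
p_one_each = prob(lambda o: {resistors[o[0]], resistors[o[1]]} == {100, 1000})
p_100_then_1000 = prob(lambda o: resistors[o[0]] == 100 and resistors[o[1]] == 1000)
```

Enumerating ordered outcomes makes the distinction between parts (c) and (d) of the problem explicit: part (c) ignores the order of the draws while part (d) does not.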
2.11 Two dice are tossed. Let X be the sum of the numbers showing up. Find the probability mass function of X.

2.12 A random experiment can terminate in one of three events A, B, or C with probabilities 1/2, 1/4, and 1/4, respectively. The experiment is repeated three times. Find the probability that events A, B, and C each occur exactly one time.

2.13 Show that the mean and variance of a binomial random variable X are μ_X = np and σ_X² = npq, where q = 1 − p.

2.14 Show that the mean and variance of a Poisson random variable are μ_X = λ and σ_X² = λ.

2.15 The probability mass function of a geometric random variable has the form P(X = k) = pq^{k−1}, k = 1, 2, 3, ...; p, q > 0, p + q = 1.

a. Find the mean and variance of X.
b. Find the probability-generating function of X.

2.16 Suppose that you are trying to market a digital transmission system (modem) that has a bit error probability of 10⁻⁴ and the bit errors are independent. The buyer will test your modem by sending a known message of 10⁴ digits and checking the received message. If more than two errors occur, your modem will be rejected. Find the probability that the customer will buy your modem.

2.17 The input to a communication channel is a random variable X and the output is another random variable Y. The joint probability mass functions of X and Y are listed:

             X
  Y      −1     0      1
 −1     1/4     0      0
  0      0     1/4     0
  1      0     1/4    1/4

a. Find P(Y = 1|X = 1).
b. Find P(X = 1|Y = 1).

2.18 Show that the expected value operator has the following properties.

a. E{a + bX} = a + bE{X}
b. E{aX + bY} = aE{X} + bE{Y}
c. Variance of aX + bY = a² Var[X] + b² Var[Y] + 2ab Covar[X, Y]

2.19 Show that E_{X,Y}{g(X, Y)} = E_X{E_{Y|X}[g(X, Y)]} where the subscripts denote the distributions with respect to which the expected values are computed.

2.20 A thief has been placed in a prison that has three doors. One of the doors leads him on a one-day trip, after which he is dumped on his head (which destroys his memory as to which door he chose). Another door is similar except he takes a three-day trip before being dumped on his head. The third door leads to freedom. Assume he chooses a door immediately, each with probability 1/3, whenever he has a chance. Find his expected number of days to freedom. (Hint: Use conditional expectation.)

2.21 Consider the circuit shown in Figure 2.20. Let the time at which the ith switch closes be denoted by Xi. Suppose X1, X2, X3, X4 are independent, identically distributed random variables each with distribution function F. As time increases, switches will close until there is an electrical path from A to C. Let

U = time when circuit is first completed from A to B
V = time when circuit is first completed from B to C
W = time when circuit is first completed from A to C

Find the following:

a. The distribution function of U.
b. The distribution function of W.
c. If F(x) = x, 0 ≤ x ≤ 1 (i.e., uniform), what are the mean and variance of Xi, U, and W?

Figure 2.20 Circuit for Problem 2.21.
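The pmf asked for in Problem 2.11 can be generated by enumeration. A sketch using exact rational arithmetic:

```python
from collections import Counter
from fractions import Fraction

# The 36 joint outcomes of two fair dice are equally likely (1/36 each)
pmf = Counter()
for d1 in range(1, 7):
    for d2 in range(1, 7):
        pmf[d1 + d2] += Fraction(1, 36)

# X ranges over 2..12 and the resulting distribution is triangular,
# peaking at X = 7
```

The same enumeration pattern works for Problem 2.12 by listing the 3³ ordered outcomes of the repeated experiment with their product probabilities.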
2.22 Prove the following inequalities.

a. (E{XY})² ≤ E{X²}E{Y²} (Schwartz or cosine inequality)
b. √(E{(X + Y)²}) ≤ √(E{X²}) + √(E{Y²}) (triangle inequality)

2.23 Show that the mean and variance of a random variable X having a uniform distribution in the interval [a, b] are μ_X = (a + b)/2 and σ_X² = (b − a)²/12.

2.24 X is a Gaussian random variable with μ_X = 2 and σ_X² = 9. Find P(−4 < X ≤ 5) using tabulated values of Q( ).

2.25 X is a zero mean Gaussian random variable with a variance of σ_X². Show that

E{X^n} = (σ_X)^n · 1 · 3 · 5 ··· (n − 1) for n even, and E{X^n} = 0 for n odd

2.26 Show that the characteristic function of a random variable can be expanded as

Ψ_X(ω) = Σ_{k=0}^{∞} E{X^k}(jω)^k/k!

b. Show that the cumulant generating function of the sum of two independent random variables is equal to the sum of the cumulant generating functions of the two variables.
c. Show that Equations 2.52.c through 2.52.f are correct by equating coefficients of like powers of jω in Equation 2.52.b.

2.28 The probability density function of a Cauchy random variable is given by

f_X(x) = α/[π(x² + α²)],  α > 0,  −∞ < x < ∞

a. Find the characteristic function of X.
b. Comment about the first two moments of X.

2.29 The joint pdf of random variables X and Y is

f_{X,Y}(x, y) = 1/2,  0 ≤ x ≤ y,  0 ≤ y ≤ 2

a. Find the marginal pdfs, f_X(x) and f_Y(y).
b. Find the conditional pdfs f_{X|Y}(x|y) and f_{Y|X}(y|x).
c. Find E{X|Y = 1} and E{X|Y = 0.5}.
d. Are X and Y statistically independent?
e. Find ρ_XY.

2.30 The joint pdf of two random variables is

f_{X1,X2}(x1, x2) = 1,  0 ≤ x1 ≤ 1,  0 ≤ x2 ≤ 1

Let Y1 = X1X2 and Y2 = X1.

a. Find the joint pdf f_{Y1,Y2}(y1, y2); clearly indicate the domain of y1, y2.
b. Find f_{Y1}(y1) and f_{Y2}(y2).
c. Are Y1 and Y2 independent?

2.31 X and Y have a bivariate Gaussian pdf given in Equation 2.57.

2.32 Let Z = X + Y − c, where X and Y are independent random variables with variances σ_X² and σ_Y², and c is a constant. Find the variance of Z in terms of σ_X², σ_Y², and c.

2.33 X and Y are independent zero mean Gaussian random variables with variances σ_X² and σ_Y². Let

Z = ½(X + Y) and W = ½(X − Y)

a. Find the joint pdf f_{Z,W}(z, w).
b. Find the marginal pdf f_Z(z).
c. Are Z and W independent?

2.34 X1, X2, ..., Xn are n independent zero mean Gaussian random variables with equal variances, σ_{Xi}² = σ². Show that

Z = (1/n)[X1 + X2 + ··· + Xn]
is a Gaussian random variable with μ_Z = 0 and σ_Z² = σ²/n. (Use the result derived in Problem 2.32.)

2.35 X is a Gaussian random variable with mean 0 and variance σ_X². Find the pdf of Y if:

a. Y = X²
b. Y = |X|

2.39 X1 and X2 are two independent random variables each with the following density function:

f_{Xi}(x) = e^{−x},  x > 0

Let Y1 = X1 + X2 and Y2 = X1/(X1 + X2).

a. Find f_{Y1,Y2}(y1, y2).
b. Find f_{Y1}(y1), f_{Y2}(y2) and show that Y1 and Y2 are independent.

2.40 X1, X2, X3, ..., Xn are n independent Gaussian random variables with zero means and unit variances. Let

Y = Σ_{i=1}^{n} Xi²

Find the pdf of Y.
2.45 Consider the following 3 × 3 matrices:

A = 102 5 301], B = [105 3 5 1 2], C = [105 3 5 32] 1 [1 0 2 2 1 2 2 3 2

Which of the three matrices can be covariance matrices?

2.46 Suppose X is an n-variate Gaussian with zero means and a covariance matrix Σ_X. Let λ1, λ2, ..., λn be the n distinct eigenvalues of Σ_X and let V1, V2, ..., Vn be the corresponding normalized eigenvectors. Show that

Y = AX,  where A = [V1, V2, V3, ..., Vn]^T

has an n-variate Gaussian density with zero means and covariance matrix

Σ_Y = diag(λ1, λ2, ..., λn)

2.48 Show that if U(X) is a nonnegative function of the random variable X, then for any a > 0

P[U(X) ≥ a] ≤ (1/a)E{U(X)}

2.49 Plot the Tchebycheff and Chernoff bounds as well as the exact values for P(X ≥ a), a > 0, if X is

a. Uniform in the interval [0, 1].
b. Exponential, f_X(x) = exp(−x), x > 0.
c. Gaussian with zero mean and unit variance.

2.50 Compare the Tchebycheff and Chernoff bounds on P(Y ≥ a) with exact values for the Laplacian pdf

f_Y(y) = (1/2) exp(−|y|)

2.51 In a communication system, the received signal Y has the form

Y = X + N

where X is the "signal" component and N is the noise. X can have one of the eight values shown in Figure 2.21, and N has an uncorrelated bivariate Gaussian distribution with zero means and variances σ². The signal X and noise N can be assumed to be independent.

The receiver observes Y and determines an estimated value X̂ of X according to the algorithm

if Y ∈ A_i then X̂ = x_i

where the decision regions A_i are shown in Figure 2.21. Obtain an upper bound on P(X̂ ≠ X) assuming that P(X = x_i) = 1/8 for i = 1, 2, ..., 8.

Hint:

P(X̂ ≠ X) = Σ_{i=1}^{8} P(X̂ ≠ X | X = x_i)P(X = x_i)

Figure 2.21 Signal values and decision regions for Problem 2.51 (|x_i| = 1; angle of x_i = (i − 1)π/4).
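For Problem 2.49(b) the quantities being compared can be computed in closed form. A sketch, using the unit exponential, for which μ = σ² = 1, M(t) = 1/(1 − t) for t < 1, and the Chernoff minimization can be done analytically (the minimizer is t = 1 − 1/a for a > 1):

```python
import math

def exact(a):
    # P(X >= a) for the unit exponential
    return math.exp(-a)

def tchebycheff(a):
    # P(X >= a) <= P(|X - 1| >= a - 1) <= sigma^2 / (a - 1)^2, for a > 1
    return 1.0 / (a - 1) ** 2

def chernoff(a):
    # min over 0 < t < 1 of e^{-ta} E{e^{tX}} = e^{-ta} / (1 - t),
    # minimized at t = 1 - 1/a for a > 1
    t = 1 - 1 / a
    return math.exp(-t * a) / (1 - t)

a = 10.0
e, tb, cb = exact(a), tchebycheff(a), chernoff(a)
# Both are upper bounds; the Chernoff bound decays exponentially in a,
# the Tchebycheff bound only polynomially, so for large a the Chernoff
# bound is much tighter.
```

For small a the ordering of the two bounds can reverse, which is part of what plotting them (as the problem asks) reveals.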
2.52 Show that the Tchebycheff-Hermite polynomials satisfy

(−1)^k d^k h(y)/dy^k = H_k(y)h(y),  k = 1, 2, ...

2.53 X has a triangular pdf centered in the interval [−1, 1]. Obtain a Gram-Charlier approximation to the pdf of X that includes the first six moments of X and sketch the approximation for values of X ranging from −2 to 2.

2.54 Let p be the probability of obtaining heads when a coin is tossed. Suppose we toss the coin N times and form an estimate of p as

p̂ = N_H/N

where N_H = number of heads showing up in N tosses. Find the smallest value of N such that

P[|p̂ − p| ≥ 0.01p] ≤ 0.1

(Assume that the unknown value of p is in the range 0.4 to 0.6.)

2.55 X1, X2, ..., Xn are n independent samples of a continuous random variable X, that is,

f_{X1,X2,...,Xn}(x1, x2, ..., xn) = Π_{i=1}^{n} f_X(xi)

Assume that μ_X = 0 and σ_X² is finite.

a. Find the mean and variance of

X̄ = (1/n) Σ_{i=1}^{n} Xi

b. Show that X̄ converges to 0 in MS, that is, l.i.m. X̄ = 0.

2.56 Show that if the Xi's are of continuous type and independent, then for sufficiently large n the density of sin(X1 + X2 + ··· + Xn) is nearly equal to the density of sin(X), where X is a random variable with uniform distribution in the interval (−π, π).

2.57 Using the Cauchy criterion, show that a sequence Xn tends to a limit in the MS sense if and only if E{XmXn} exists as m, n → ∞.

2.58 A box has a large number of 1000-ohm resistors with a tolerance of ±100 ohms (assume a uniform distribution in the interval 900 to 1100 ohms). Suppose we draw 10 resistors from this box and connect them in series, and let R be the resistive value of the series combination. Using the Gaussian approximation for R, find

P[9000 ≤ R ≤ 11000]

2.59 Let

Yn = (1/n) Σ_{i=1}^{n} Xi

where Xi, i = 1, 2, ..., n are statistically independent and identically distributed random variables each with a Cauchy pdf

f_X(x) = (a/π)/(x² + a²)

a. Determine the characteristic function of Yn.
b. Determine the pdf of Yn.
c. Consider the pdf of Yn in the limit as n → ∞. Does the central limit theorem hold? Explain.

2.60 Y is a Gaussian random variable with zero mean and unit variance and

Xn = sin(Y/n) if Y > 0;  Xn = cos(Y/n) if Y ≤ 0

Discuss the convergence of the sequence Xn. (Does the sequence converge, and if so, in what sense?)

2.61 Let Y be the number of dots that show up when a die is tossed, and let

Xn = exp[−n(Y − 3)]

Discuss the convergence of the sequence Xn.

2.62 Y is a Gaussian random variable with zero mean and unit variance and

Xn = exp(−Y/n)

Discuss the convergence of the sequence Xn.
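The identity in Problem 2.52 can be spot-checked numerically with finite differences. A sketch; the step size and test points are arbitrary choices:

```python
import math

def h(y):
    """Standard Gaussian pdf (Equation 2.86)."""
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

# First few T-H polynomials (Equation 2.87)
H = {1: lambda y: y,
     2: lambda y: y**2 - 1,
     3: lambda y: y**3 - 3*y}

def kth_derivative(f, y, k, eps=1e-3):
    """Central finite-difference approximation to the kth derivative of f."""
    if k == 0:
        return f(y)
    return (kth_derivative(f, y + eps, k - 1, eps)
            - kth_derivative(f, y - eps, k - 1, eps)) / (2 * eps)

# Check (-1)^k h^(k)(y) = H_k(y) h(y) at a few points
for k in (1, 2, 3):
    for y in (-1.5, 0.3, 0.9):
        lhs = (-1) ** k * kth_derivative(h, y, k)
        rhs = H[k](y) * h(y)
        assert abs(lhs - rhs) < 1e-4
```

This identity is also what justifies the term-by-term integration used in Example 2.25, since each ∫ h·H_k reduces to a boundary evaluation of h·H_{k−1}.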