A
Mathematical Review
This book assumes some familiarity with elementary trigonometry, complex numbers, calculus, matrix algebra, and probability. Introductions to the first three topics by Chiang (1974) or Thomas (1972) are adequate; Marsden (1974) treated these issues in more depth. No matrix algebra is required beyond the level of standard econometrics texts such as Theil (1971) or Johnston (1984); for more detailed treatments, see O'Nan (1976), Strang (1976), and Magnus and Neudecker (1988). The concepts of probability and statistics from standard econometrics texts are also sufficient for getting through this book; for more complete introductions, see Lindgren (1976) or Hoel, Port, and Stone (1971).

This appendix reviews the necessary mathematical concepts and results. The reader familiar with these topics is invited to skip this material, or consult subheadings for desired coverage.
A.1. Trigonometry
Definitions
Figure A.1 displays a circle with unit radius centered at the origin in (x, y)-space. Let (x₀, y₀) denote some point on this unit circle, and consider the angle θ between this point and the x-axis. The sine of θ is defined as the y-coordinate of the point, and the cosine is the x-coordinate:

sin(θ) = y₀   [A.1.1]
cos(θ) = x₀.   [A.1.2]

This text always measures angles in radians. The radian measure of the angle θ is defined as the distance traveled counterclockwise along the unit circle starting at the x-axis before reaching (x₀, y₀). The circumference of a circle with unit radius is 2π. A rotation one-quarter of the way around the unit circle would therefore correspond to radian measure of θ = (1/4)(2π) = π/2. An angle whose radian measure is π/2 is more commonly described as a right angle or a 90° angle. A 45° angle has radian measure of π/4, a 180° angle has radian measure of π, and so on.
Polar Coordinates
Consider a smaller triangle, say the triangle with vertex (x₁, y₁) shown in Figure A.1, that shares the same angle θ as the original triangle with vertex (x₀, y₀).

FIGURE A.1. Trigonometric functions as distances in (x, y)-space.

The ratio of any two sides of such a smaller triangle will be the same as that for the larger triangle:

y₁/c₁ = y₀/1   [A.1.3]
x₁/c₁ = x₀/1.   [A.1.4]
Comparing [A.1.3] with [A.1.1], the y-coordinate of any point such as (x₁, y₁) in (x, y)-space may be expressed as

y₁ = c₁·sin(θ),   [A.1.5]

where c₁ is the distance from the origin to (x₁, y₁) and θ is the angle that the point (x₁, y₁) makes with the x-axis. Comparing [A.1.4] with [A.1.2], the x-coordinate of (x₁, y₁) can be expressed as

x₁ = c₁·cos(θ).   [A.1.6]

Recall further that the magnitude c₁, which represents the distance from the origin to the point (x₁, y₁), is given by the formula

c₁ = √(x₁² + y₁²).   [A.1.7]

Taking a point in (x, y)-space and writing it as (c·cos(θ), c·sin(θ)) is called describing the point in terms of its polar coordinates c and θ.
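As an illustration of [A.1.5] through [A.1.7], a short numerical check converts a Cartesian point to its polar coordinates and back. This is a minimal sketch in Python; the helper names are ours, not the text's:

```python
import math

def to_polar(x, y):
    """Return (c, theta): distance from the origin and angle with the x-axis."""
    c = math.hypot(x, y)          # c = sqrt(x^2 + y^2), as in [A.1.7]
    theta = math.atan2(y, x)      # angle in radians, measured from the x-axis
    return c, theta

def from_polar(c, theta):
    """Recover (x, y) = (c*cos(theta), c*sin(theta)), as in [A.1.5]-[A.1.6]."""
    return c * math.cos(theta), c * math.sin(theta)

c, theta = to_polar(1.0, 1.0)
print(c, theta)                   # sqrt(2) = 1.4142..., pi/4 = 0.7853...
print(from_polar(c, theta))      # recovers (1.0, 1.0) up to rounding
```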
Properties of Sine and Cosine Functions
The functions sin(θ) and cos(θ) are called trigonometric or sinusoidal functions. Viewed as a function of θ, the sine function starts out at zero:

sin(0) = 0.

The sine function rises to 1 as θ increases to π/2 and then falls back to zero as θ increases further to π; see panel (a) of Figure A.2. The function reaches its minimum value of −1 at θ = 3π/2 and then begins climbing back up.

If we travel a distance of 2π radians around the unit circle, we are right back where we started, and the function repeats itself:

sin(2π + θ) = sin(θ).

The function would again repeat itself if we made two full revolutions around the unit circle. Indeed, for any integer j,

sin(2πj + θ) = sin(θ).   [A.1.8]
FIGURE A.2. Sine and cosine functions: (a) sin(θ); (b) cos(θ).
The sine function is thus periodic and is for this reason often useful for describing time series that repeat themselves in a particular cycle.

The cosine function starts out at unity and falls to zero as θ increases to π/2; see panel (b) of Figure A.2. It turns out simply to be a horizontal shift of the sine function:

cos(θ) = sin(θ + π/2).   [A.1.9]

The sine or cosine function can also be evaluated for negative values of θ, defined as clockwise rotation around the unit circle from the x-axis. Clearly,

sin(−θ) = −sin(θ)   [A.1.10]
cos(−θ) = cos(θ).   [A.1.11]

For (x₀, y₀) a point on the unit circle, [A.1.7] implies that

1 = √(x₀² + y₀²)

or, squaring both sides and using [A.1.1] and [A.1.2],

1 = [cos(θ)]² + [sin(θ)]².   [A.1.12]
Using Trigonometric Functions to Represent Cycles
Suppose we construct the function g(θ) by first multiplying θ by 2 and then evaluating the sine of the product:

g(θ) = sin(2θ).

This doubles the frequency at which the function cycles. When θ goes from 0 to π, 2θ goes from 0 to 2π, and so g(θ) is back to its original value (see Figure A.3). In general, the function sin(kθ) would go through k cycles in the time it takes sin(θ) to complete a single cycle.

We will sometimes describe the value a variable y takes on at date t as a function of sines or cosines, such as

yₜ = R·cos(ωt + α).   [A.1.13]

FIGURE A.3. Effect of changing the frequency of a periodic function.
The parameter R gives the amplitude of [A.1.13]. The variable yₜ will attain a maximum value of +R and a minimum value of −R. The parameter α is the phase; the phase determines where in the cycle yₜ would be at t = 0. The parameter ω governs how quickly the variable cycles, which can be summarized by either of two measures. The period is the length of time required for the process to repeat a full cycle. The period of [A.1.13] is 2π/ω. For example, if ω = 1, then y repeats itself every 2π periods, whereas if ω = 2 the process repeats itself every π periods. The frequency summarizes how frequently the process cycles compared with the simple function cos(t); thus, it measures the number of cycles completed during 2π periods. The frequency of cos(t) is unity, and the frequency of [A.1.13] is ω. For example, if ω = 2, the cycles are completed twice as quickly as those for cos(t). There is a simple relation between these two measures of the speed of cycles: the period is equal to 2π divided by the frequency.
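These relations are easy to confirm numerically. A minimal sketch (the parameter values and helper name are our choices):

```python
import math

R, omega, alpha = 2.0, 2.0, 0.5   # amplitude, frequency, phase
period = 2 * math.pi / omega      # with omega = 2, the period is pi

def y(t):
    """Evaluate y_t = R*cos(omega*t + alpha), as in [A.1.13]."""
    return R * math.cos(omega * t + alpha)

for t in (0.0, 1.0, 2.5):
    # y repeats itself every 'period' units of time
    print(abs(y(t) - y(t + period)) < 1e-9)   # True for each t
```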
A.2. Complex Numbers
Definitions
Consider the following expression:

x² = 1.   [A.2.1]

There are two values of x that satisfy [A.2.1], namely, x = 1 and x = −1. Suppose instead that we were given the following equation:

x² = −1.   [A.2.2]

No real number satisfies [A.2.2]. However, let us consider an imaginary number (denoted i) that does:

i² = −1.   [A.2.3]

We assume that i can be multiplied by a real number and manipulated using standard rules of algebra. For example,

2i + 3i = 5i

and

(2i)·(3i) = 6i² = −6.

This last property implies that a second solution to [A.2.2] is given by x = −i:

(−i)² = (−1)²·i² = −1.

Thus, [A.2.1] has two real roots (+1 and −1), whereas [A.2.2] has two imaginary roots (i and −i).

For any real numbers a and b, we can construct the expression

a + bi.   [A.2.4]

If b = 0, then [A.2.4] is a real number, whereas if a = 0 and b is nonzero, then [A.2.4] is an imaginary number. A number written in the general form of [A.2.4] is called a complex number.
Rules for Manipulating Complex Numbers
Complex numbers are manipulated using standard rules of algebra. Two complex numbers are added as follows:

(a₁ + b₁i) + (a₂ + b₂i) = (a₁ + a₂) + (b₁ + b₂)i.

Complex numbers are multiplied this way:

(a₁ + b₁i)·(a₂ + b₂i) = a₁a₂ + a₁b₂i + b₁a₂i + b₁b₂i²
                      = (a₁a₂ − b₁b₂) + (a₁b₂ + b₁a₂)i.

Note that the resulting expressions are always simplified by separating the real component (such as (a₁a₂ − b₁b₂)) from the imaginary component (such as (a₁b₂ + b₁a₂)i).
Graphical Representation of Complex Numbers

A complex number (a + bi) is sometimes represented graphically in an Argand diagram as in Figure A.4. The value of the real component (a) is plotted on the horizontal axis, and the imaginary component (b) is plotted on the vertical axis. The size, or modulus, of a complex number is measured the same way as the distance from the origin of a point in (x, y)-space, as in [A.1.7]:

R = √(a² + b²).   [A.2.5]
A.3. Calculus

If f(x) has derivatives of all orders, a power series can be used to characterize the function f(x). To find a power series, we choose a particular value c around which to center the expansion, such as c = 0. We then use [A.3.12] with r → ∞. For example, consider the sine function. The first two derivatives are given by [A.3.2] and [A.3.5], with the following higher-order derivatives:
d³sin(x)/dx³ = −cos(x)
d⁴sin(x)/dx⁴ = sin(x)
d⁵sin(x)/dx⁵ = cos(x),

and so on. Evaluated at x = 0, we have

f(0) = sin(0) = 0
f′(0) = cos(0) = 1
f″(0) = −sin(0) = 0
f‴(0) = −cos(0) = −1.

Substituting into [A.3.12] with c = 0 and letting r → ∞ produces a power series for the sine function:

sin(x) = x − x³/3! + x⁵/5! − x⁷/7! + ···.   [A.3.13]

Similar calculations give a power series for the cosine function:

cos(x) = 1 − x²/2! + x⁴/4! − x⁶/6! + ···.   [A.3.14]
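A few terms of [A.3.13] already give a good approximation for moderate x. A minimal sketch (the truncation order is our choice, not the text's):

```python
import math

def sin_series(x, n_terms=8):
    """Approximate sin(x) by the first n_terms of the power series [A.3.13]."""
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1)
               for k in range(n_terms))

x = 0.5
print(sin_series(x))   # 0.479425538...
print(math.sin(x))     # agrees to machine precision for this x
```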
Exponential Functions
A number r raised to the power x,

f(x) = rˣ,

is called an exponential function of x. The number r is called the base of this function, and x is called the exponent. To multiply two exponential functions that share the same base, the exponents are added:

(rˣ)·(rʸ) = r^(x+y).   [A.3.15]

For example,

(2²)·(2³) = 2⁵ = 32.

To raise an exponential function to the power k, the exponents are multiplied:

(rˣ)ᵏ = r^(xk).   [A.3.16]

For example,

(2²)³ = 2⁶ = 64.

Exponentiation is distributive over multiplication:

(r·s)ˣ = (rˣ)·(sˣ).   [A.3.17]

Negative exponents denote reciprocals:

r^(−x) = (1/r)ˣ.

Any number raised to the power 0 is taken to be equal to unity:

r⁰ = 1.   [A.3.18]

This convention is sensible, since if y = −x in [A.3.15],

(rˣ)·(r^(−x)) = r⁰

and

(rˣ)·(1/rˣ) = 1.
The Number e
The base for the natural logarithms is denoted e. The number e has the property that an exponential function with base e equals its own derivative:

deˣ/dx = eˣ.   [A.3.19]

All the higher-order derivatives of eˣ are equal to eˣ as well:

dᵏeˣ/dxᵏ = eˣ.   [A.3.20]

We sometimes use the expression "exp[x]" to represent "e raised to the power x":

exp[x] = eˣ.

If u(x) denotes a separate function of x, the derivative of the compound function e^(u(x)) can be evaluated using the chain rule:

de^(u(x))/dx = (de^u/du)·(du/dx) = e^(u(x))·(du/dx).   [A.3.21]
To find a power series for the function f(x) = eˣ, notice from [A.3.20] that

f(0) = e⁰ = 1

and, from [A.3.18],

dʳf/dxʳ evaluated at x = 0 equals 1   [A.3.22]

for all r. Substituting [A.3.22] into [A.3.12] with c = 0 yields a power series for the function f(x) = eˣ:

eˣ = 1 + x + x²/2! + x³/3! + x⁴/4! + ···.   [A.3.23]

Setting x = 1 in [A.3.23] gives a numerical procedure for calculating the value of e:

e = 1 + 1 + 1/2! + 1/3! + 1/4! + ··· ≅ 2.71828.
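The sum converges quickly; a sketch of the calculation (the cutoff of 15 terms is our choice):

```python
import math

# e = sum of 1/k! for k = 0, 1, 2, ..., as in [A.3.23] with x = 1
e_approx = sum(1.0 / math.factorial(k) for k in range(15))
print(e_approx)   # 2.718281828..., close to math.e
```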
Euler Relations and De Moivre's Theorem
Suppose we evaluate the power series [A.3.23] at the imaginary number x = iθ, where i = √(−1) and θ is some real angle measured in radians:

e^(iθ) = 1 + iθ + (iθ)²/2! + (iθ)³/3! + (iθ)⁴/4! + ···
       = 1 + iθ − θ²/2! − iθ³/3! + θ⁴/4! + iθ⁵/5! − ···
       = [1 − θ²/2! + θ⁴/4! − ···] + i·[θ − θ³/3! + θ⁵/5! − ···].   [A.3.24]

Comparing the bracketed series in [A.3.24] with [A.3.13] and [A.3.14], we see that

e^(iθ) = cos(θ) + i·sin(θ).   [A.3.25]

Similarly,

e^(−iθ) = 1 − iθ − θ²/2! + iθ³/3! + θ⁴/4! − ···
        = [1 − θ²/2! + θ⁴/4! − ···] − i·[θ − θ³/3! + ···]
        = cos(θ) − i·sin(θ).   [A.3.26]

To raise a complex number (a + bi) to the kth power, the complex number is written in polar coordinate form as in [A.2.6]:

a + bi = R·[cos(θ) + i·sin(θ)].

By [A.3.25], this can then be treated as an exponential function of θ:

a + bi = R·e^(iθ).   [A.3.27]

Now raise both sides of [A.3.27] to the kth power, recalling [A.3.17] and [A.3.16]:

(a + bi)ᵏ = Rᵏ·[e^(iθ)]ᵏ = Rᵏ·e^(iθk).   [A.3.28]

Finally, use [A.3.25] in reverse,

e^(iθk) = cos(θk) + i·sin(θk),

to deduce that [A.3.28] can be written

(a + bi)ᵏ = Rᵏ·[cos(θk) + i·sin(θk)].   [A.3.29]
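Python's built-in complex type makes [A.3.29] easy to verify. A minimal sketch:

```python
import cmath, math

z = 1 + 1j                       # a + bi with a = b = 1
R, theta = cmath.polar(z)        # R = sqrt(2), theta = pi/4, as in [A.2.6]
k = 5

# De Moivre's theorem [A.3.29]: (a + bi)^k = R^k [cos(theta k) + i sin(theta k)]
lhs = z ** k
rhs = (R ** k) * complex(math.cos(k * theta), math.sin(k * theta))
print(abs(lhs - rhs) < 1e-12)    # True
```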
Definition of Natural Logarithm
The natural logarithm (denoted throughout the text simply by "log") is the inverse of the function eˣ:

log(eˣ) = x.

Notice from [A.3.18] that e⁰ = 1 and therefore log(1) = 0.

Properties of Logarithms

For any x > 0, it is also the case that

x = e^(log(x)).   [A.3.30]

From [A.3.30] and [A.3.15], we see that the log of the product of two numbers is equal to the sum of the logs:

log(a·b) = log[e^(log(a))·e^(log(b))] = log[e^(log(a) + log(b))] = log(a) + log(b).

Also, use [A.3.16] to write

[e^(log(x))]^a = e^(a·log(x)).   [A.3.31]

Taking logs of both sides of [A.3.31] reveals that the log of a number raised to the a power is equal to a times the log of the number:

log(xᵃ) = a·log(x).
Derivatives of Natural Logarithms
Let u(x) = log(x), and write the right side of [A.3.30] as e^(u(x)). Differentiating both sides of [A.3.30] using [A.3.21] reveals that

1 = e^(u(x))·(du/dx).

Thus,

d log(x)/dx = 1/x.   [A.3.32]
Logarithms and Elasticities
It is sometimes also useful to differentiate a function f(x) with respect to the variable log(x). To do so, write f(x) as f(u(x)), where

u(x) = exp[log(x)].

Now use the chain rule to differentiate:

df/d log(x) = (df/du)·(du/d log(x)).   [A.3.33]

But from [A.3.21],

du/d log(x) = exp[log(x)]·(d log(x)/d log(x)) = x.   [A.3.34]

Substituting [A.3.34] into [A.3.33] gives

df/d log(x) = x·(df/dx).

It follows from [A.3.32] that

d log(f)/d log(x) = (1/f)·(df/d log(x)) = (x/f)·(df/dx) ≅ {[f(x + Δx) − f(x)]/f(x)}/(Δx/x),

which has the interpretation as the elasticity of f with respect to x, or the percent change in f resulting from a 1% increase in x.
Logarithms and Percent
An approximation to the natural log function is obtained from a first-order Taylor series around x = 1:

log(1 + δ) ≅ log(1) + [d log(x)/dx] evaluated at x = 1, times δ.   [A.3.35]

But log(1) = 0, and from [A.3.32] the derivative of log(x) at x = 1 is unity. Thus, for δ close to zero, an excellent approximation is provided by

log(1 + δ) ≅ δ.   [A.3.36]

An implication of [A.3.36] is the following. Let r denote the net interest rate measured as a fraction of 1; for example, r = 0.05 corresponds to a 5% interest rate. Then (1 + r) denotes the gross interest rate (principal plus net interest). Equation [A.3.36] says that the log of the gross interest rate (1 + r) is essentially the same number as the net interest rate (r).
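A quick numerical look at how the approximation [A.3.36] behaves for interest rates of various sizes (a minimal sketch):

```python
import math

# log(1 + r) vs. r, as in [A.3.36]: excellent for small net interest rates
for r in (0.01, 0.05, 0.10, 0.50):
    print(r, math.log(1 + r))
# 0.01 0.00995...   error of about 0.5% of r
# 0.05 0.04879...
# 0.10 0.09531...
# 0.50 0.40546...   the approximation deteriorates for large r
```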
Definition of Indefinite Integral

Integration (indicated by ∫ dx) is the inverse operation from differentiation. For example,

∫ x dx = x²/2,   [A.3.37]

because

d(x²/2)/dx = x.   [A.3.38]

The function x²/2 is not the only function satisfying [A.3.38]; the function

(x²/2) + C

also works for any constant C. The term C is referred to as the constant of integration.
Some Useful Indefinite Integrals

The following integrals can be confirmed from [A.3.1], [A.3.32], [A.3.2], [A.3.3], and [A.3.21]:

∫ xᵃ dx = xᵃ⁺¹/(a + 1) + C   for a ≠ −1   [A.3.39]
∫ (1/x) dx = log(x) + C   for x > 0   [A.3.40]
∫ cos(x) dx = sin(x) + C   [A.3.41]
∫ sin(x) dx = −cos(x) + C   [A.3.42]
∫ e^(ax) dx = e^(ax)/a + C.   [A.3.43]

It is also straightforward to demonstrate that for constants a and b not depending on x,

∫ [a·f(x) + b·g(x)] dx = a·∫ f(x) dx + b·∫ g(x) dx.
Definite Integrals
Consider the continuous function f(x) plotted in Figure A.5. Define the function A(x; a) to be the area under f(x) between a and x, viewed as a function of x. Thus, A(b; a) would be the area between a and b. Suppose we increase b by some small amount Δ. This is approximately the same as adding a rectangle of height f(b) and width Δ to the area A(b; a):

A(b + Δ; a) ≅ A(b; a) + Δ·f(b),

or

[A(b + Δ; a) − A(b; a)]/Δ ≅ f(b).

In the limit as Δ → 0,

dA(x; a)/dx evaluated at x = b equals f(b).   [A.3.44]
FIGURE A.5. The definite integral as the area under a function.

Now, [A.3.44] has to hold for any value of b > a that we might have chosen,
implying that the area function A(x; a) is an antiderivative of f(x):

A(x; a) = F(x) + C,   [A.3.45]

where

dF(x)/dx = f(x).

To find the value of C, notice that A(a; a) in [A.3.45] should be equal to zero:

A(a; a) = 0 = F(a) + C.

For this to be true,

C = −F(a).   [A.3.46]

Evaluating [A.3.45] at x = b, the area between a and b is given by

A(b; a) = F(b) + C,

or, using [A.3.46],

A(b; a) = F(b) − F(a).   [A.3.47]

This area is also denoted ∫ₐᵇ f(x) dx, where F(x) = ∫ f(x) dx. Equation [A.3.47] is known as the fundamental theorem of calculus. A useful shorthand for the operation in [A.3.47] is

∫ₐᵇ f(x) dx = [F(x)]ₐᵇ = F(b) − F(a).
For example, to find the area under the sine function between θ = 0 and θ = π, we use [A.3.42]:

∫₀^π sin(θ) dθ = [−cos(θ)]₀^π = [−cos(π)] + [cos(0)] = 1 + 1 = 2.

To find the area between 0 and 2π, we take

∫₀^(2π) sin(θ) dθ = [−cos(2π)] + [cos(0)] = −1 + 1 = 0.

The positive values of sin(θ) between 0 and π exactly cancel out the negative values between π and 2π.
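The same areas can be checked with a crude Riemann sum (a minimal sketch; the grid size is our choice):

```python
import math

def riemann(f, a, b, n=100_000):
    """Approximate the definite integral of f from a to b by a midpoint Riemann sum."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

print(riemann(math.sin, 0, math.pi))       # approximately 2
print(riemann(math.sin, 0, 2 * math.pi))   # approximately 0
```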
A.4. Matrix Algebra
Definitions
An (m × n) matrix is an array of numbers ordered into m rows and n columns:

A = [a₁₁  a₁₂  ···  a₁ₙ
     a₂₁  a₂₂  ···  a₂ₙ
     ⋮
     aₘ₁  aₘ₂  ···  aₘₙ].

If there is only one column (n = 1), then A is described as a column vector, whereas if there is only one row (m = 1), A is called a row vector. A single number (n = 1 and m = 1) is a scalar.

If the number of rows equals the number of columns (m = n), the matrix is said to be square. The diagonal running through (a₁₁, a₂₂, . . . , aₙₙ) in a square matrix is called the principal diagonal. If all elements off the principal diagonal are zero, the matrix is said to be diagonal.

A matrix is sometimes specified by describing the element in row i, column j:

A = [a_ij].
Summation and Multiplication
Two (m × n) matrices are added element by element:

A + B = [a₁₁ + b₁₁   a₁₂ + b₁₂   ···   a₁ₙ + b₁ₙ
         a₂₁ + b₂₁   a₂₂ + b₂₂   ···   a₂ₙ + b₂ₙ
         ⋮
         aₘ₁ + bₘ₁   aₘ₂ + bₘ₂   ···   aₘₙ + bₘₙ]

or, more compactly,

A + B = [a_ij + b_ij].

The product of an (m × n) matrix A and an (n × q) matrix B is an (m × q) matrix C,

A·B = C,

where the row i, column j element of C is given by Σₖ₌₁ⁿ a_ik·b_kj. Notice that multiplication requires that the number of columns of A be the same as the number of rows of B.

To multiply A by a scalar α, each element of A is multiplied by α:

α·A = C,

with

C = [α·a_ij].
It is easy to show that addition is commutative:

A + B = B + A,

whereas multiplication is not:

AB ≠ BA.

Indeed, the product BA will not exist unless m = q, and even where it exists, AB would be equal to BA only in rather special cases.
Both addition and multiplication are associative:

(A + B) + C = A + (B + C)
(AB)C = A(BC).
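A quick NumPy sketch of these rules (assuming NumPy is available; the example matrices are ours):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

# Addition is commutative; multiplication generally is not
print(np.array_equal(A + B, B + A))   # True
print(np.array_equal(A @ B, B @ A))   # False for these A and B

# Row i, column j of AB is the sum over k of a_ik * b_kj
print((A @ B)[0, 1] == sum(A[0, k] * B[k, 1] for k in range(2)))   # True
```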
Identity Matrix

The identity matrix of order n (denoted Iₙ) is an (n × n) matrix with 1s along the principal diagonal and 0s elsewhere:

Iₙ = [1  0  ···  0
      0  1  ···  0
      ⋮
      0  0  ···  1].

For any (m × n) matrix A,

A·Iₙ = A

and also

Iₘ·A = A.
Powers of Matrices
For an (n × n) matrix A, the expression A² denotes A·A. The expression Aᵏ indicates the matrix A multiplied by itself k times, with A⁰ interpreted as the (n × n) identity matrix.
Transposition
Let a_ij denote the row i, column j element of a matrix A:

A = [a_ij].

The transpose of A (denoted A′) is given by

A′ = [a_ji].

For example, the transpose of

[1  4
 5  2
 6  7]

is

[1  5  6
 4  2  7].

The transpose of a row vector is a column vector.

It is easy to verify the following:

(A′)′ = A   [A.4.1]
(A + B)′ = A′ + B′   [A.4.2]
(AB)′ = B′A′.   [A.4.3]
Symmetric Matrices
A square matrix satisfying A = A′ is said to be symmetric.

Trace of a Matrix

The trace of an (n × n) matrix is defined as the sum of the elements along the principal diagonal:

trace(A) = a₁₁ + a₂₂ + ··· + aₙₙ.

If A is an (m × n) matrix and B is an (n × m) matrix, then AB is an (m × m) matrix whose trace is

trace(AB) = Σₖ a₁ₖbₖ₁ + Σₖ a₂ₖbₖ₂ + ··· + Σₖ aₘₖbₖₘ = Σᵢ Σₖ a_ik·b_ki.

The product BA is an (n × n) matrix whose trace is

trace(BA) = Σₖ Σᵢ b_ki·a_ik.

Thus,

trace(AB) = trace(BA).

If A and B are both (n × n) matrices, then

trace(A + B) = trace(A) + trace(B).

If A is an (n × n) matrix and λ is a scalar, then

trace(λ·A) = λ·trace(A).
Partitioned Matrices
A partitioned matrix is a matrix whose individual elements are themselves matrices. For example, a (3 × 4) matrix A can be written in partitioned form as

A = [A₁₁  A₁₂
     A₂₁  A₂₂],

where, for instance, A₁₁ is (2 × 2), A₁₂ is (2 × 2), A₂₁ is (1 × 2), and A₂₂ is (1 × 2).

Partitioned matrices are summed and multiplied block by block, provided that the row and column dimensions permit the proper matrix operations. For example,

[A₁  A₂     [B₁  B₂     [A₁ + B₁   A₂ + B₂
 A₃  A₄]  +  B₃  B₄]  =  A₃ + B₃   A₄ + B₄]

and, similarly,

[A₁  A₂    [B₁  B₂     [A₁B₁ + A₂B₃   A₁B₂ + A₂B₄
 A₃  A₄] ·  B₃  B₄]  =  A₃B₁ + A₄B₃   A₃B₂ + A₄B₄].
Definition of Determinant
The determinant of a (2 × 2) matrix is given by the following scalar:

|A| = a₁₁a₂₂ − a₂₁a₁₂.   [A.4.4]

The determinant of an (n × n) matrix can be defined recursively. Let A₁ⱼ denote the [(n − 1) × (n − 1)] matrix formed by deleting row 1 and column j from A. The determinant of A is given by

|A| = Σⱼ₌₁ⁿ (−1)^(j+1)·a₁ⱼ·|A₁ⱼ|.   [A.4.5]

For example, the determinant of a (3 × 3) matrix is

|A| = a₁₁·|a₂₂  a₂₃|  −  a₁₂·|a₂₁  a₂₃|  +  a₁₃·|a₂₁  a₂₂|
          |a₃₂  a₃₃|         |a₃₁  a₃₃|         |a₃₁  a₃₂|.
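The recursion [A.4.5] translates directly into code. A minimal sketch (written for clarity, not speed; cofactor expansion takes O(n!) operations):

```python
def det(A):
    """Determinant by cofactor expansion along the first row, as in [A.4.5]."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0.0
    for j in range(n):
        # A_1j: delete row 1 and column j
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[1, 2], [3, 4]]))                    # 1*4 - 3*2 = -2, as in [A.4.4]
print(det([[2, 0, 0], [1, 3, 0], [4, 5, 6]]))   # lower triangular: 2*3*6 = 36
```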
Properties of Determinants
A square matrix is said to be lower triangular if all the elements above the principal diagonal are zero (a_ij = 0 for j > i):

A = [a₁₁  0    ···  0
     a₂₁  a₂₂  ···  0
     ⋮
     aₙ₁  aₙ₂  ···  aₙₙ].

The determinant of a lower triangular matrix is simply the product of the terms along the principal diagonal:

|A| = a₁₁·a₂₂···aₙₙ.   [A.4.6]

That [A.4.6] holds for n = 2 follows immediately from [A.4.4]. Given that it holds for a matrix of order n − 1, equation [A.4.5] implies that it holds for n:

|A| = a₁₁·|A₁₁| + 0·|A₁₂| + ··· + 0·|A₁ₙ| = a₁₁·(a₂₂·a₃₃···aₙₙ).

An immediate implication of [A.4.6] is that the determinant of the identity matrix is unity:

|Iₙ| = 1.   [A.4.7]
Another useful fact about determinants is that if an (n × n) matrix A is multiplied by a scalar α, the effect is to multiply the determinant by αⁿ:

|αA| = αⁿ·|A|.   [A.4.8]

Again, [A.4.8] is immediately apparent for the (2 × 2) case from [A.4.4]:

|αa₁₁  αa₁₂|
|αa₂₁  αa₂₂| = (αa₁₁)(αa₂₂) − (αa₁₂)(αa₂₁) = α²·(a₁₁a₂₂ − a₁₂a₂₁) = α²·|A|.

Given that it holds for n − 1, it is simple to verify for n using [A.4.5].
By contrast, if a single row of A is multiplied by the constant α (as opposed to multiplying the entire matrix by α), then the determinant is multiplied by α. If the row that is multiplied by α is the first row, then this result is immediately apparent from [A.4.5]. If only the ith row of A is multiplied by α, the result can be shown by recursively applying [A.4.5] until the elements of the ith row appear explicitly in the formula.

Suppose that some constant c times the second row of a (2 × 2) matrix is added to the first row. This operation has no effect on the determinant:

|a₁₁ + c·a₂₁   a₁₂ + c·a₂₂|
|a₂₁           a₂₂        | = (a₁₁ + c·a₂₁)·a₂₂ − (a₁₂ + c·a₂₂)·a₂₁ = a₁₁a₂₂ − a₁₂a₂₁.

Similarly, if some constant c times the third row of a (3 × 3) matrix is added to the second row, the determinant will again be unchanged: expanding by the first row as in [A.4.5], each (2 × 2) minor is itself unchanged by the corresponding row operation. In general, if any row of an (n × n) matrix is multiplied by c and added to another row, the new matrix will have the same determinant as the original. Similarly, multiplying any column by c and adding the result to another column will not change the determinant.
This can be viewed as a special case of the following result. If A and B are both (n × n) matrices, then

|AB| = |A|·|B|.   [A.4.9]

Adding c times the second column of a (2 × 2) matrix A to the first column can be thought of as postmultiplying A by the matrix

B = [1  0
     c  1].

Since B is lower triangular with 1s along the principal diagonal, its determinant is unity, and so, from [A.4.9],

|AB| = |A|.

Thus, the fact that adding a multiple of one column to another does not alter the determinant can be viewed as an implication of [A.4.9].
If two rows of a matrix are switched, the determinant changes sign. To switch the ith row with the jth, multiply the ith row by −1; this changes the sign of the determinant. Then subtract row i from row j, add the new row j back to row i, and subtract row j from row i once again. These last operations complete the switch and do not affect the determinant further. For example, let A be a (4 × 4) matrix written in partitioned form as

A = [a₁′
     a₂′
     a₃′
     a₄′],

where the (1 × 4) vector aᵢ′ represents the ith row of A. The determinant of the matrix obtained by switching rows 1 and 4 is then −|A|.

This result permits calculation of the determinant of an (n × n) matrix A by expansion along any row i:

|A| = Σⱼ₌₁ⁿ (−1)^(i+j)·a_ij·|A_ij|.   [A.4.10]

To derive [A.4.10], define A* as the matrix obtained from A by moving its ith row to the first row, leaving the order of the other rows unchanged. Then, from [A.4.5],

|A*| = Σⱼ₌₁ⁿ (−1)^(1+j)·a_ij·|A_ij|.

Moreover, A* is obtained from A by (i − 1) row switches, such as switching i with i − 1, i − 1 with i − 2, . . . , and 2 with 1. Hence,

|A| = (−1)^(i−1)·|A*| = (−1)^(i−1)·Σⱼ₌₁ⁿ (−1)^(1+j)·a_ij·|A_ij| = Σⱼ₌₁ⁿ (−1)^(i+j)·a_ij·|A_ij|,

as claimed in [A.4.10].

An immediate implication of [A.4.10] is that if any row of a matrix contains all zeros, then the determinant of the matrix is zero.
It can also be shown that the transpose of a matrix has the same determinant as the original matrix:

|A′| = |A|.   [A.4.11]

This means, for example, that if the kth column of a matrix consists entirely of zeros, then the determinant of the matrix is zero. It also implies that the determinant of an upper triangular matrix (one for which a_ij = 0 for all j < i) is the product of the terms on the principal diagonal.
Adjoint of a Matrix
Let A denote an (n × n) matrix, and as before let A_ij denote the [(n − 1) × (n − 1)] matrix that results from deleting row i and column j of A. The adjoint of A is the (n × n) matrix whose row i, column j element is given by (−1)^(i+j)·|A_ji|.
Inverse of a Matrix
If the determinant of an (n × n) matrix A is not equal to zero, its inverse (an (n × n) matrix denoted A⁻¹) exists and is found by dividing the adjoint by the determinant:

A⁻¹ = (1/|A|)·adj(A).   [A.4.12]

For example, for a (2 × 2) matrix,

A⁻¹ = 1/(a₁₁a₂₂ − a₂₁a₁₂) · [ a₂₂  −a₁₂
                             −a₂₁   a₁₁].   [A.4.13]

A matrix whose inverse exists is said to be nonsingular. A matrix whose determinant is zero is singular and has no inverse.
When an inverse exists,

A·A⁻¹ = Iₙ.   [A.4.14]

Taking determinants of both sides of [A.4.14] and using [A.4.9] and [A.4.7],

|A|·|A⁻¹| = 1,

or

|A⁻¹| = 1/|A|.   [A.4.15]

Alternatively, taking the transpose of both sides of [A.4.14] and recalling [A.4.3],

(A⁻¹)′·A′ = Iₙ,

which means that (A⁻¹)′ is the inverse of A′:

(A′)⁻¹ = (A⁻¹)′.

For α a nonzero scalar and A a nonsingular matrix,

[αA]⁻¹ = α⁻¹·A⁻¹.

Also, for A, B, and C all nonsingular (n × n) matrices,

[AB]⁻¹ = B⁻¹A⁻¹

and

[ABC]⁻¹ = C⁻¹B⁻¹A⁻¹.
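A small sketch checking [A.4.13] and the product rule for inverses (NumPy assumed; the example matrices are ours):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])

# (2 x 2) inverse by the adjoint formula [A.4.13]
detA = A[0, 0] * A[1, 1] - A[1, 0] * A[0, 1]
A_inv = np.array([[A[1, 1], -A[0, 1]], [-A[1, 0], A[0, 0]]]) / detA
print(np.allclose(A @ A_inv, np.eye(2)))   # [A.4.14]: A A^{-1} = I

B = np.array([[1.0, 1.0], [0.0, 2.0]])
# [AB]^{-1} = B^{-1} A^{-1}
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))   # True
```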
Linear Dependence

Let (x₁, x₂, . . . , x_k) be a set of k different (n × 1) vectors. The vectors are said to be linearly dependent if there exists a set of k scalars (c₁, c₂, . . . , c_k), not all of which are zero, such that

c₁x₁ + c₂x₂ + ··· + c_k·x_k = 0.

If no such set of nonzero numbers (c₁, c₂, . . . , c_k) exists, then the vectors (x₁, x₂, . . . , x_k) are said to be linearly independent.

Suppose the vectors (x₁, x₂, . . . , x_k) are collected in an (n × k) matrix T, written in partitioned form as

T = [x₁ x₂ ··· x_k].

If the number of vectors (k) is equal to the dimension of each vector (n), then there is a simple relation between the notion of linear dependence and the determinant of the (n × n) matrix T; specifically, if (x₁, x₂, . . . , xₙ) are linearly dependent, then |T| = 0. To see this, suppose that x₁ is one of the vectors that has a nonzero coefficient c₁. Then linear dependence means that

x₁ = −(c₂/c₁)·x₂ − (c₃/c₁)·x₃ − ··· − (cₙ/c₁)·xₙ.

Then the determinant of T is equal to

|T| = |[−(c₂/c₁)·x₂ − ··· − (cₙ/c₁)·xₙ   x₂ ··· xₙ]|.

But if we add (cₙ/c₁) times the nth column to the first column, (cₙ₋₁/c₁) times the (n − 1)th column to the first column, . . . , and (c₂/c₁) times the second column to the first column, the result is

|T| = |[0 x₂ ··· xₙ]| = 0.

The converse can also be shown to be true: if |T| = 0, then (x₁, x₂, . . . , xₙ) are linearly dependent.
Eigenvalues and Eigenvectors
Suppose that an (n × n) matrix A, a nonzero (n × 1) vector x, and a scalar λ are related by

Ax = λx.   [A.4.16]

Then x is called an eigenvector of A and λ the associated eigenvalue. Equation [A.4.16] can be written

Ax − λIₙx = 0

or

(A − λIₙ)x = 0.   [A.4.17]

Suppose that the matrix (A − λIₙ) were nonsingular. Then (A − λIₙ)⁻¹ would exist and we could premultiply [A.4.17] by (A − λIₙ)⁻¹ to deduce that

x = 0.

Thus, if a nonzero vector x exists that satisfies [A.4.16], then it must be associated with a value of λ such that (A − λIₙ) is singular. An eigenvalue of the matrix A is therefore a number λ such that

|A − λIₙ| = 0.   [A.4.18]
Eigenvalues of Triangular Matrices
Notice that if A is upper triangular or lower triangular, then A − λIₙ is as well, and its determinant is just the product of terms along the principal diagonal:

|A − λIₙ| = (a₁₁ − λ)·(a₂₂ − λ)···(aₙₙ − λ).

Thus, for a triangular matrix, the eigenvalues (the values of λ for which this expression equals zero) are just the values along the principal diagonal.
Linear Independence of Eigenvectors

A useful result is that if the eigenvalues (λ₁, λ₂, . . . , λₙ) are all distinct, then the associated eigenvectors (x₁, x₂, . . . , xₙ) are linearly independent. To see this for the case n = 2, consider any numbers c₁ and c₂ such that

c₁x₁ + c₂x₂ = 0.   [A.4.19]

Premultiplying both sides of [A.4.19] by A produces

c₁Ax₁ + c₂Ax₂ = c₁λ₁x₁ + c₂λ₂x₂ = 0.   [A.4.20]

If [A.4.19] is multiplied by λ₂ and subtracted from [A.4.20], the result is

c₁(λ₁ − λ₂)x₁ = 0.   [A.4.21]

But x₁ is an eigenvector of A, and so it cannot be the zero vector. Also, λ₁ − λ₂ cannot be zero, since λ₁ ≠ λ₂. Equation [A.4.21] therefore implies that c₁ = 0. A parallel set of calculations shows that c₂ = 0. Thus, the only values of c₁ and c₂ consistent with [A.4.19] are c₁ = 0 and c₂ = 0, which means that x₁ and x₂ are linearly independent. A similar argument for n > 2 can be made by induction.
A Useful Decomposition
Suppose an (n × n) matrix A has n distinct eigenvalues (λ₁, λ₂, . . . , λₙ). Collect these in a diagonal matrix Λ:

Λ = [λ₁  0   ···  0
     0   λ₂  ···  0
     ⋮
     0   0   ···  λₙ].

Collect the eigenvectors (x₁, x₂, . . . , xₙ) in an (n × n) matrix T:

T = [x₁ x₂ ··· xₙ].

Applying the formula for multiplying partitioned matrices,

AT = [Ax₁ Ax₂ ··· Axₙ].

But since (x₁, x₂, . . . , xₙ) are eigenvectors, equation [A.4.16] implies that

AT = [λ₁x₁ λ₂x₂ ··· λₙxₙ].   [A.4.22]

A second application of the formula for multiplying partitioned matrices shows that the right side of [A.4.22] is in turn equal to

[λ₁x₁ λ₂x₂ ··· λₙxₙ] = [x₁ x₂ ··· xₙ]·Λ = TΛ.

Thus, [A.4.22] can be written

AT = TΛ.   [A.4.23]

Now, since the eigenvalues (λ₁, λ₂, . . . , λₙ) are taken to be distinct, the eigenvectors (x₁, x₂, . . . , xₙ) are known to be linearly independent. Thus, |T| ≠ 0 and T⁻¹ exists. Postmultiplying [A.4.23] by T⁻¹ reveals a useful decomposition of A:

A = TΛT⁻¹.   [A.4.24]
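The decomposition [A.4.24] is easy to verify numerically. A minimal NumPy sketch (numpy.linalg.eig returns the eigenvectors as the columns of T):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
eigvals, T = np.linalg.eig(A)      # T holds the eigenvectors x_1, ..., x_n
Lam = np.diag(eigvals)             # the diagonal matrix Lambda

# A = T Lambda T^{-1}, as in [A.4.24]
print(np.allclose(A, T @ Lam @ np.linalg.inv(T)))        # True

# The determinant equals the product of the eigenvalues
print(np.isclose(np.linalg.det(A), np.prod(eigvals)))    # True
```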
The Jordan Decomposition
The decomposition in [A.4.24] required the (n × n) matrix A to have n linearly independent eigenvectors. This will be true whenever A has n distinct eigenvalues, and could still be true even if A has some repeated eigenvalues. In the completely general case when A has s ≤ n linearly independent eigenvectors, there always exists a decomposition similar to [A.4.24], known as the Jordan decomposition. Specifically, for such a matrix A there exists a nonsingular (n × n) matrix M such that

A = MJM⁻¹,   [A.4.25]

where the (n × n) matrix J takes the form

J = [J₁  0   ···  0
     0   J₂  ···  0
     ⋮
     0   0   ···  Jₛ],   [A.4.26]

with

Jᵢ = [λᵢ  1   0   ···  0
      0   λᵢ  1   ···  0
      ⋮
      0   0   0   ···  λᵢ].   [A.4.27]

Thus, Jᵢ has the eigenvalue λᵢ repeated along the principal diagonal and has unity repeated along the diagonal above the principal diagonal. The same eigenvalue λᵢ can appear in two different Jordan blocks Jᵢ and Jₖ if it corresponds to several linearly independent eigenvectors.
Some Further Results on Eigenvalues

Suppose that λ is an eigenvalue of the (n × n) matrix A. Then λ is also an eigenvalue of SAS⁻¹ for any nonsingular (n × n) matrix S. To see this, note that

(A − λIₙ)x = 0

implies that

S(A − λIₙ)S⁻¹Sx = 0,

or

(SAS⁻¹ − λIₙ)x* = 0   [A.4.28]

for x* ≡ Sx. Thus, λ is an eigenvalue of SAS⁻¹ associated with the eigenvector x*.

From [A.4.28], this implies that the eigenvalues of any (n × n) matrix A are the same as the eigenvalues of its Jordan matrix J defined in [A.4.26]; moreover, |A| = |M|·|J|·|M⁻¹| = |J| by [A.4.9] and [A.4.15]. Since J is upper triangular, its determinant is the product of terms along the principal diagonal, which are just the eigenvalues of A. Thus, the determinant of any matrix A is given by the product of its eigenvalues.

It is also clear that the eigenvalues of A are the same as those of A′. Taking the transpose of [A.4.25],

A′ = (M⁻¹)′J′M′,

we see that the eigenvalues of A′ are the eigenvalues of J′. Since J′ is lower triangular, its eigenvalues are the elements on its principal diagonal. But J′ and J have the same principal diagonal, meaning that A′ and A have the same eigenvalues.
Matrix Geometric Series
The results of [A.3.6] through [A.3.10] generalize readily to geometric series involving square matrices. Consider the sum

S_T ≡ Iₙ + A + A² + A³ + ··· + A^T   [A.4.29]

for A an (n × n) matrix. Premultiplying both sides of [A.4.29] by A, we see that

A·S_T = A + A² + A³ + ··· + A^T + A^(T+1).   [A.4.30]

Subtracting [A.4.30] from [A.4.29], we find that

(Iₙ − A)·S_T = Iₙ − A^(T+1).   [A.4.31]

Notice from [A.4.18] that if |Iₙ − A| = 0, then λ = 1 would be an eigenvalue of A. Assuming that none of the eigenvalues of A is equal to unity, the matrix (Iₙ − A) is nonsingular, and [A.4.31] implies that

S_T = (Iₙ − A)⁻¹·(Iₙ − A^(T+1))   [A.4.32]

if no eigenvalue of A equals 1. If all the eigenvalues of A are strictly less than 1 in modulus, it can be shown that A^(T+1) → 0 as T → ∞, implying that

Iₙ + A + A² + A³ + ··· = (Iₙ − A)⁻¹,   [A.4.33]

assuming that the eigenvalues of A are all inside the unit circle.
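A quick numerical check of [A.4.33] for a matrix with eigenvalues inside the unit circle (a sketch; the example matrix is ours):

```python
import numpy as np

A = np.array([[0.5, 0.2], [0.1, 0.3]])   # eigenvalues well inside the unit circle
assert np.all(np.abs(np.linalg.eigvals(A)) < 1)

# Partial sums S_T = I + A + ... + A^T converge to (I - A)^{-1}, as in [A.4.33]
S, term = np.eye(2), np.eye(2)
for _ in range(200):
    term = term @ A
    S += term
print(np.allclose(S, np.linalg.inv(np.eye(2) - A)))   # True
```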
Kronecker Products
For A an (m × n) matrix and B a (p × q) matrix, the Kronecker product of A and B is defined as the following (mp × nq) matrix:

A ⊗ B = [a₁₁B  a₁₂B  ···  a₁ₙB
         a₂₁B  a₂₂B  ···  a₂ₙB
         ⋮
         aₘ₁B  aₘ₂B  ···  aₘₙB].

The following properties of the Kronecker product are readily verified. For any matrices A, B, and C,

(A ⊗ B)′ = A′ ⊗ B′   [A.4.34]
(A ⊗ B) ⊗ C = A ⊗ (B ⊗ C).   [A.4.35]

Also, for A and B both (m × n) matrices and C any matrix,

(A + B) ⊗ C = (A ⊗ C) + (B ⊗ C)   [A.4.36]
C ⊗ (A + B) = (C ⊗ A) + (C ⊗ B).   [A.4.37]

Let A be (m × n), B be (p × q), C be (n × k), and D be (q × r). Then

(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD);   [A.4.38]

that is, multiplying out the blocks, the row i, column j block of the product on the left is Σₛ a_is·c_sj·BD, which is precisely the row i, column j block of (AC) ⊗ (BD).

For A (n × n) and B (p × p) both nonsingular matrices we can set C = A⁻¹ and D = B⁻¹ in [A.4.38] to deduce that

(A ⊗ B)(A⁻¹ ⊗ B⁻¹) = (AA⁻¹) ⊗ (BB⁻¹) = Iₙ ⊗ Iₚ = Iₙₚ.

Thus,

(A ⊗ B)⁻¹ = (A⁻¹ ⊗ B⁻¹).   [A.4.39]
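NumPy's kron implements this product, so [A.4.38] and [A.4.39] can be checked directly (a minimal sketch; the random matrices are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
A, C = rng.random((2, 3)), rng.random((3, 2))
B, D = rng.random((2, 2)), rng.random((2, 2))

# (A kron B)(C kron D) = (AC) kron (BD), as in [A.4.38]
print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))  # True

# (A kron B)^{-1} = A^{-1} kron B^{-1} for square nonsingular A, B, as in [A.4.39]
A2 = rng.random((2, 2))
print(np.allclose(np.linalg.inv(np.kron(A2, B)),
                  np.kron(np.linalg.inv(A2), np.linalg.inv(B))))          # True
```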
Eigenvalues of a Kronecker Product

For A an (n × n) matrix with (possibly nondistinct) eigenvalues (λ₁, λ₂, . . . , λₙ) and B a (p × p) matrix with eigenvalues (μ₁, μ₂, . . . , μₚ), the np eigenvalues of A ⊗ B are given by λᵢμⱼ for i = 1, 2, . . . , n and j = 1, 2, . . . , p. To see this, write A and B in Jordan form as

A = M_A·J_A·M_A⁻¹
B = M_B·J_B·M_B⁻¹.

Then (M_A ⊗ M_B) has inverse given by (M_A⁻¹ ⊗ M_B⁻¹). Moreover, we know from [A.4.28] that the eigenvalues of (A ⊗ B) are the same as the eigenvalues of

(M_A⁻¹ ⊗ M_B⁻¹)(A ⊗ B)(M_A ⊗ M_B) = (M_A⁻¹AM_A) ⊗ (M_B⁻¹BM_B) = J_A ⊗ J_B.

But J_A and J_B are both upper triangular, meaning that (J_A ⊗ J_B) is upper triangular as well. The eigenvalues of (A ⊗ B) are thus just the terms on the principal diagonal of (J_A ⊗ J_B), which are given by λᵢμⱼ.
Positive Definite Matrices
An (n × n) real symmetric matrix A is said to be positive semidefinite if for any real (n × 1) vector x,

x′Ax ≥ 0.

We make the stronger statement that a real symmetric matrix A is positive definite if for any real nonzero (n × 1) vector x,

x′Ax > 0;

hence, any positive definite matrix could also be said to be positive semidefinite.

Let λ be an eigenvalue of A associated with the eigenvector x:

Ax = λx.

Premultiplying this equation by x′ results in

x′Ax = λ·x′x.

Since an eigenvector x cannot be the zero vector, x′x > 0. Thus, for a positive semidefinite matrix A, any eigenvalue λ of A must be greater than or equal to zero. For A positive definite, all eigenvalues are strictly greater than zero. Since the determinant of A is the product of the eigenvalues, the determinant of a positive definite matrix A is strictly positive.

Let A be a positive definite (n × n) matrix and let B denote a nonsingular (n × n) matrix. Then B′AB is positive definite. To see this, let x be any nonzero vector. Define

x̃ ≡ Bx.

Then x̃ cannot be the zero vector, for if it were, this equation would state that there exists a nonzero vector x such that

Bx = 0·x,

in which case zero would be an eigenvalue of B associated with the eigenvector x. But since B is nonsingular, none of its eigenvalues can be zero. Thus, x̃ = Bx cannot be the zero vector, and

x′B′ABx = x̃′Ax̃ > 0,

establishing that the matrix B′AB is positive definite.

A special case of this result is obtained by letting A be the identity matrix. Then the result implies that any matrix that can be written as B′B for some nonsingular matrix B is positive definite. More generally, any matrix that can be written as B′B for an arbitrary matrix B must be positive semidefinite:

x′B′Bx = x̃′x̃ = x̃₁² + x̃₂² + ··· + x̃ₙ² ≥ 0,   [A.4.40]

where x̃ ≡ Bx.

The converse propositions are also true: if A is positive semidefinite, then there exists a matrix B such that A = B′B; if A is positive definite, then there exists a nonsingular matrix B such that A = B′B. A proof of this claim and an algorithm for calculating B are provided in Section 4.4.
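One standard way to compute such a B for a positive definite A is the Cholesky factorization: NumPy's cholesky returns a lower-triangular L with A = LL′, so B = L′ serves. A minimal sketch (this is our illustration; the book's own algorithm is the one in Section 4.4):

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite

L = np.linalg.cholesky(A)   # lower triangular, A = L L'
B = L.T                     # so that A = B'B
print(np.allclose(A, B.T @ B))             # True

# All eigenvalues of a positive definite matrix are strictly positive
print(np.all(np.linalg.eigvalsh(A) > 0))   # True
```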
Conjugate Transposes
Let A denote an (m × n) matrix of (possibly) complex numbers:

A = [a₁₁ + b₁₁i   ···   a₁ₙ + b₁ₙi
     ⋮
     aₘ₁ + bₘ₁i   ···   aₘₙ + bₘₙi].

The conjugate transpose of A, denoted Aᴴ, is formed by transposing A and replacing each element with its complex conjugate:

Aᴴ = [a₁₁ − b₁₁i   ···   aₘ₁ − bₘ₁i
      ⋮
      a₁ₙ − b₁ₙi   ···   aₘₙ − bₘₙi].

Thus, if A is real, then Aᴴ and A′ would denote the same matrix.

Notice that if an (n × 1) complex vector x is premultiplied by its conjugate transpose, the result is a nonnegative real scalar:

xᴴx = [(a₁ − b₁i) (a₂ − b₂i) ··· (aₙ − bₙi)]·[a₁ + b₁i
                                              a₂ + b₂i
                                              ⋮
                                              aₙ + bₙi] = Σⱼ₌₁ⁿ (aⱼ² + bⱼ²) ≥ 0.

For B a real (m × n) matrix and x a complex (n × 1) vector,

(Bx)ᴴ = xᴴB′.

More generally, if both B and x are complex,

(Bx)ᴴ = xᴴBᴴ.

Notice that if A is positive semidefinite, then A = B′B for some real B and

xᴴAx = xᴴB′Bx = x̃ᴴx̃ ≥ 0

with x̃ ≡ Bx. Thus, xᴴAx is a nonnegative real scalar for any x when A is positive semidefinite. It is a positive real scalar for A positive definite.
Continuity of Functions of Vectors
A function of more than one argument, such as

y = f(x₁, x₂, . . . , xₙ),   [A.4.41]

is said to be continuous at (c₁, c₂, . . . , cₙ) if f(c₁, c₂, . . . , cₙ) is finite and if for every ε > 0 there is a δ > 0 such that

|f(x₁, x₂, . . . , xₙ) − f(c₁, c₂, . . . , cₙ)| < ε

whenever

√[(x₁ − c₁)² + (x₂ − c₂)² + ··· + (xₙ − cₙ)²] < δ.
Partial Derivatives
The partial derivative of f with respect to xᵢ is defined by

∂f/∂xᵢ = lim(Δ→0) [f(x₁, . . . , xᵢ₋₁, xᵢ + Δ, xᵢ₊₁, . . . , xₙ) − f(x₁, . . . , xᵢ, . . . , xₙ)]/Δ.   [A.4.42]

Gradient

If we collect the n partial derivatives in [A.4.42] in a vector, we obtain the gradient of the function f, denoted ∇f:

∇f = [∂f/∂x₁
      ∂f/∂x₂
      ⋮
      ∂f/∂xₙ].   [A.4.43]
Ad, Matrix Algebra 738For example, suppose fs linear function:
(6s Say oy)
ayn tary te age (AAAS
Defi and xo be the following (nx 1) vectors:
i
xe [A446]
‘Then [A444] can be waitten
fx) = a's
‘The partial derivative of f(+) with respect to the ith argument is
‘and the gradient is
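The definition [A.4.42] suggests a numerical check of this gradient by finite differences (a minimal sketch; the step size and helper name are our choices):

```python
import numpy as np

a = np.array([1.0, -2.0, 0.5])
f = lambda x: a @ x               # f(x) = a'x, as in [A.4.44]

def numerical_gradient(f, x, h=1e-6):
    """Approximate each partial derivative in [A.4.42] by a finite difference."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x)) / h
    return grad

x0 = np.array([0.3, 0.7, -1.2])
print(numerical_gradient(f, x0))  # approximately a, since the gradient of a'x is a
```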
Second-Order Derivatives

A second-order derivative of [A.4.41] is given by

∂²f/∂xᵢ∂xⱼ = ∂/∂xᵢ [∂f(x₁, x₂, . . . , xₙ)/∂xⱼ].

Where second-order derivatives exist and are continuous for all i and j, the order of differentiation is irrelevant:

∂²f/∂xᵢ∂xⱼ = ∂²f/∂xⱼ∂xᵢ.

Sometimes these second-order derivatives are collected in an (n × n) matrix H called the Hessian matrix:

H = [∂²f/∂xᵢ∂xⱼ].

We will also use the notation ∂²f/∂x∂x′ to represent the matrix H.
Derivatives of Vector-Valued Functions
Suppose we have a set of m functions f₁(·), f₂(·), . . . , fₘ(·), each of which depends on the n variables (x₁, . . . , xₙ). We can collect these m functions into a single vector-valued function:

f(x) = [f₁(x)
        f₂(x)
        ⋮
        fₘ(x)].

We sometimes write

f: ℝⁿ → ℝᵐ

to indicate that the function takes n different real numbers (summarized by the vector x, an element of ℝⁿ) and calculates m different new numbers (summarized by the value of f, an element of ℝᵐ). Suppose that each of the functions f₁(·), f₂(·), . . . , fₘ(·) has derivatives with respect to each of the arguments x₁, x₂, . . . , xₙ. We can summarize these derivatives in an (m × n) matrix, called the Jacobian matrix of f and indicated by ∂f/∂x′:

∂f/∂x′ = [∂f₁/∂x₁  ∂f₁/∂x₂  ···  ∂f₁/∂xₙ
          ∂f₂/∂x₁  ∂f₂/∂x₂  ···  ∂f₂/∂xₙ
          ⋮
          ∂fₘ/∂x₁  ∂fₘ/∂x₂  ···  ∂fₘ/∂xₙ].

For example, suppose that each of the functions fᵢ(x) is linear:

f₁(x) = a₁₁x₁ + a₁₂x₂ + ··· + a₁ₙxₙ
f₂(x) = a₂₁x₁ + a₂₂x₂ + ··· + a₂ₙxₙ
⋮
fₘ(x) = aₘ₁x₁ + aₘ₂x₂ + ··· + aₘₙxₙ.

We could write this system in matrix form as

f(x) = Ax,

where

A = [a₁₁  a₁₂  ···  a₁ₙ
     ⋮
     aₘ₁  aₘ₂  ···  aₘₙ]

and x is the (n × 1) vector defined in [A.4.46]. Then

∂f/∂x′ = A.
Taylor’s Theorem with Multiple Arguments
Let f: ℝⁿ → ℝ¹ as in [A.4.41], with continuous second derivatives. A first-order Taylor series expansion of f(x) around c is given by

f(x) = f(c) + (∂f/∂x′) evaluated at x = c, times (x − c), plus R₁(x; c).   [A.4.47]

Here ∂f/∂x′ denotes the (1 × n) vector that is the transpose of the gradient, and the remainder R₁(·) satisfies

R₁(x; c) = (1/2)·Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ [∂²f/∂xᵢ∂xⱼ] evaluated at x = ξ(i, j), times (xᵢ − cᵢ)(xⱼ − cⱼ),

for ξ(i, j) an (n × 1) vector, potentially different for each i and j, with each ξ(i, j) between c and x; that is, ξ(i, j) = λ(i, j)·c + [1 − λ(i, j)]·x for some λ(i, j) between 0 and 1. Furthermore,

lim(x→c) R₁(x; c)/√[(x − c)′(x − c)] = 0.

An implication of [A.4.47] is that if we wish to approximate the consequences for f of simultaneously changing x₁ by Δ₁, x₂ by Δ₂, . . . , and xₙ by Δₙ, we could use

f(x₁ + Δ₁, x₂ + Δ₂, . . . , xₙ + Δₙ) − f(x₁, x₂, . . . , xₙ) ≅ (∂f/∂x₁)·Δ₁ + (∂f/∂x₂)·Δ₂ + ··· + (∂f/∂xₙ)·Δₙ.

If f(·) has continuous third derivatives, a second-order Taylor series expansion of f(x) around c is given by

f(x) = f(c) + (∂f/∂x′)|x=c·(x − c) + (1/2)·(x − c)′·[∂²f/∂x∂x′]|x=c·(x − c) + R₂(x; c),   [A.4.48]

where the remainder R₂(x; c) involves the third derivatives of f evaluated at points θ(i, j, k) between c and x, and

lim(x→c) R₂(x; c)/[(x − c)′(x − c)] = 0.
Multiple Integrals

The notation

∫₀¹ ∫₀² x⁴y dy dx

indicates the following operation: first integrate

∫₀² x⁴y dy

with respect to y, with x held fixed, and then integrate the resulting function with respect to x. For example,

∫₀¹ ∫₀² x⁴y dy dx = ∫₀¹ x⁴·[(2²/2) − (0²/2)] dx = (2/5)·[1⁵ − 0⁵] = 2/5.

Provided that f(x, y) is continuous, the order of integration can be reversed. For example,

∫₀² ∫₀¹ x⁴y dx dy = ∫₀² (1/5)·y dy = (1/5)·(2²/2) = 2/5.
A.5. Probability and Statistics
Densities and Distributions

A stochastic or random variable X is said to be discrete-valued if it can take on only one of K particular values; call these x₁, x₂, . . . , x_K. Its probability distribution is a set of numbers that give the probability of each outcome:

P{X = x_k} = probability that X takes on the value x_k,   k = 1, . . . , K.

The probabilities sum to unity:

Σₖ₌₁ᴷ P{X = x_k} = 1.
For continuous random variables X and Y with joint density f_{XY}(x, y) and marginal density f_X(x), the conditional density of Y given X = x is defined as

f_{Y|X}(y|x) = f_{XY}(x, y)/f_X(x).   [A.5.7]

Notice that this satisfies the requirement of a density [A.5.1]:

∫ f_{Y|X}(y|x) dy = ∫ [f_{XY}(x, y)/f_X(x)] dy
                  = [1/f_X(x)]·∫ f_{XY}(x, y) dy
                  = f_X(x)/f_X(x) = 1.

A further obvious implication of the definition in [A.5.7] is that the joint density can be written as the product of the marginal density and the conditional density:

f_{XY}(x, y) = f_{Y|X}(y|x)·f_X(x).   [A.5.8]
The conditional expectation of Y given that the random variable X takes on the particular value x is

E(Y|X = x) = ∫ y·f_{Y|X}(y|x) dy.   [A.5.9]
Law of Iterated Expectations

Note that the conditional expectation is a function of the value of the random variable X. For different realizations of X, the conditional expectation will be a different number. Suppose we view E(Y|X) as a random variable and take its expectation with respect to the distribution of X:

E_X[E_{Y|X}(Y|X)] = ∫ [∫ y·f_{Y|X}(y|x) dy]·f_X(x) dx.

Results [A.5.8] and [A.5.6] can be used to express this expectation as

∫∫ y·f_{XY}(x, y) dy dx = ∫ y·f_Y(y) dy.

Thus,

E_X[E_{Y|X}(Y|X)] = E_Y(Y).   [A.5.10]

In words, the random variable E(Y|X) has the same expectation as the random variable Y. This is known as the law of iterated expectations.
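A small Monte Carlo illustration of [A.5.10] (a sketch; the particular joint distribution is our choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

X = rng.normal(1.0, 1.0, size=n)
Y = 2.0 * X + rng.normal(size=n)   # so E(Y|X) = 2X and E(Y) = 2

print(np.mean(2.0 * X))   # sample analog of E_X[E(Y|X)], approximately 2
print(np.mean(Y))         # sample analog of E(Y), approximately the same
```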
Independence
The variables Y and X are said to be independent if

f_{XY}(x, y) = f_X(x)·f_Y(y).   [A.5.11]

Comparing [A.5.11] with [A.5.8], if Y and X are independent, then

f_{Y|X}(y|x) = f_Y(y).   [A.5.12]
Covariance
Let μ_X denote E(X) and μ_Y denote E(Y). The population covariance between X and Y is given by

Cov(X, Y) = ∫∫ (x − μ_X)(y − μ_Y)·f_{XY}(x, y) dy dx.   [A.5.13]
Correlation
The population correlation between X and Y is given by

Corr(X, Y) = Cov(X, Y)/[√Var(X)·√Var(Y)].

If the covariance (or correlation) between X and Y is zero, then X and Y are said to be uncorrelated.
Relation Between Correlation and Independence
Note that if X and Y are independent, then they are uncorrelated:

Cov(X, Y) = ∫∫ (x − μ_X)(y − μ_Y)·f_X(x)·f_Y(y) dy dx
          = ∫ (x − μ_X)·[∫ (y − μ_Y)·f_Y(y) dy]·f_X(x) dx.

Furthermore,

∫ (y − μ_Y)·f_Y(y) dy = ∫ y·f_Y(y) dy − μ_Y·∫ f_Y(y) dy = μ_Y − μ_Y = 0.

Thus, if X and Y are independent, then Cov(X, Y) = 0, as claimed.

The converse proposition, however, is not true; the fact that X and Y are uncorrelated is not enough to deduce that they are independent. To construct a counterexample, suppose that Z and Y are independent random variables each with mean zero, and let X = Z·Y. Then

E(X − μ_X)(Y − μ_Y) = E[(Z·Y)·Y] = E(Z)·E(Y²) = 0,

and so X and Y are uncorrelated. They are not, however, independent; the value of X = Z·Y depends on Y.
Orthogonality

Consider a sample of size T on two random variables, {x₁, x₂, . . . , x_T} and {y₁, y₂, . . . , y_T}. The two variables are said to be orthogonal if

Σₜ₌₁ᵀ xₜyₜ = 0.

Thus, orthogonality is the sample analog of absence of correlation.

For example, let xₜ = 1 denote a sequence of constants and let yₜ = wₜ − w̄, where w̄ = (1/T)·Στ₌₁ᵀ w_τ is the sample mean of the variable w. Then x and y are orthogonal:

Σₜ₌₁ᵀ xₜyₜ = Σₜ₌₁ᵀ (wₜ − w̄) = 0.

Population Moments of Sums
Consider the random variable aX + bY. Its mean is given by

E(aX + bY) = ∫∫ (ax + by)·f_{XY}(x, y) dy dx
           = a·∫∫ x·f_{XY}(x, y) dy dx + b·∫∫ y·f_{XY}(x, y) dy dx
           = a·∫ x·f_X(x) dx + b·∫ y·f_Y(y) dy,

and so

E(aX + bY) = a·E(X) + b·E(Y).   [A.5.14]

The variance of (aX + bY) is

Var(aX + bY) = ∫∫ [(ax + by) − (aμ_X + bμ_Y)]²·f_{XY}(x, y) dy dx
             = ∫∫ [a²(x − μ_X)² + 2ab(x − μ_X)(y − μ_Y) + b²(y − μ_Y)²]·f_{XY}(x, y) dy dx
             = a²·∫∫ (x − μ_X)²·f_{XY}(x, y) dy dx
               + 2ab·∫∫ (x − μ_X)(y − μ_Y)·f_{XY}(x, y) dy dx
               + b²·∫∫ (y − μ_Y)²·f_{XY}(x, y) dy dx.

Thus,

Var(aX + bY) = a²·Var(X) + 2ab·Cov(X, Y) + b²·Var(Y).   [A.5.15]

When X and Y are uncorrelated,

Var(aX + bY) = a²·Var(X) + b²·Var(Y).

It is straightforward to generalize results [A.5.14] and [A.5.15]. If {X₁, X₂, . . . , Xₙ} denotes a collection of n random variables, then

E(a₁X₁ + a₂X₂ + ··· + aₙXₙ) = a₁·E(X₁) + a₂·E(X₂) + ··· + aₙ·E(Xₙ)   [A.5.16]

and

Var(a₁X₁ + a₂X₂ + ··· + aₙXₙ)
  = a₁²·Var(X₁) + a₂²·Var(X₂) + ··· + aₙ²·Var(Xₙ)
    + 2a₁a₂·Cov(X₁, X₂) + 2a₁a₃·Cov(X₁, X₃) + ··· + 2a₁aₙ·Cov(X₁, Xₙ)
    + 2a₂a₃·Cov(X₂, X₃) + ··· + 2aₙ₋₁aₙ·Cov(Xₙ₋₁, Xₙ).   [A.5.17]

If the X's are uncorrelated, then [A.5.17] simplifies to

Var(a₁X₁ + a₂X₂ + ··· + aₙXₙ) = a₁²·Var(X₁) + a₂²·Var(X₂) + ··· + aₙ²·Var(Xₙ).   [A.5.18]
Cauchy-Schwarz Inequality
The Cauchy-Schwarz inequality states that for any random variables X and Y whose variances and covariance exist, the correlation is no greater than unity in absolute value:

−1 ≤ Corr(X, Y) ≤ 1.   [A.5.19]

To establish the far right inequality in [A.5.19], consider the random variable

(X − μ_X)/√Var(X) − (Y − μ_Y)/√Var(Y).

The square of this variable cannot take on negative values, so

E[(X − μ_X)/√Var(X) − (Y − μ_Y)/√Var(Y)]² ≥ 0.

Recognizing that Var(X) and Var(Y) denote population moments (as opposed to random variables), equation [A.5.15] can be used to deduce

E(X − μ_X)²/Var(X) − 2·E[(X − μ_X)(Y − μ_Y)]/[√Var(X)·√Var(Y)] + E(Y − μ_Y)²/Var(Y) ≥ 0.

Thus,

1 − 2·Corr(X, Y) + 1 ≥ 0,

meaning that

Corr(X, Y) ≤ 1.

To establish the far left inequality in [A.5.19], notice that

E[(X − μ_X)/√Var(X) + (Y − μ_Y)/√Var(Y)]² ≥ 0,

implying that

1 + 2·Corr(X, Y) + 1 ≥ 0,

so that

Corr(X, Y) ≥ −1.
The Normal Distribution
The variable Yₜ has a Gaussian, or Normal, distribution with mean μ and variance σ² if

f_{Yₜ}(yₜ) = [1/√(2πσ²)]·exp[−(yₜ − μ)²/(2σ²)].   [A.5.20]

We write

Yₜ ~ N(μ, σ²)

to indicate that the density of Yₜ is given by [A.5.20].

Centered odd-ordered population moments for a Gaussian variable are zero:

E(Yₜ − μ)ʳ = 0   for r = 1, 3, 5, . . . .

The centered fourth moment is

E(Yₜ − μ)⁴ = 3σ⁴.
Skew and Kurtosis

The skewness of a variable Yₜ with mean μ is represented by

skewness = E(Yₜ − μ)³/[Var(Yₜ)]^(3/2).

A variable with a negative skew is more likely to be far below the mean than it is to be far above the mean. The kurtosis is

kurtosis = E(Yₜ − μ)⁴/[Var(Yₜ)]².

A distribution whose kurtosis exceeds 3 has more mass in the tails than a Gaussian distribution with the same variance.
Other Useful Univariate Distributions
Let (X₁, X₂, . . . , Xₙ) be independent and identically distributed (i.i.d.) N(0, 1) variables, and consider the sum of their squares:

Y = X₁² + X₂² + ··· + Xₙ².

Then Y is said to have a chi-square distribution with n degrees of freedom, denoted

Y ~ χ²(n).

Let X ~ N(0, 1) and Y ~ χ²(n), with X and Y independent. Then

Z = X/√(Y/n)

is said to have a t distribution with n degrees of freedom, denoted

Z ~ t(n).

Let Y₁ ~ χ²(n₁) and Y₂ ~ χ²(n₂), with Y₁ and Y₂ independent. Then

Z = (Y₁/n₁)/(Y₂/n₂)

is said to have an F distribution with n₁ numerator degrees of freedom and n₂ denominator degrees of freedom, denoted

Z ~ F(n₁, n₂).

Note that if Z ~ t(n), then Z² ~ F(1, n).
Likelihood Function
Suppose we have observed a sample of size T on some random variable Yₜ. Let f(y₁, y₂, . . . , y_T; θ) denote the joint density of Y₁, Y₂, . . . , Y_T. The notation emphasizes that this joint density is presumed to depend on a vector of population parameters θ. If we view this joint density as a function of θ (given the data on Y), the result is called the sample likelihood function.

For example, consider a sample of T i.i.d. variables drawn from a N(μ, σ²) distribution. For this distribution, θ = (μ, σ²)′, and from [A.5.11] the joint density is the product of individual terms such as [A.5.20]:

f(y₁, y₂, . . . , y_T; μ, σ²) = Πₜ₌₁ᵀ f(yₜ; μ, σ²).

The log of the joint density is the sum of the logs of these terms:

log f(y₁, y₂, . . . , y_T; μ, σ²) = Σₜ₌₁ᵀ log f(yₜ; μ, σ²)   [A.5.21]
   = −(T/2)·log(2π) − (T/2)·log(σ²) − Σₜ₌₁ᵀ (yₜ − μ)²/(2σ²).

Thus, for a sample of T Gaussian random variables with mean μ and variance σ², the sample log likelihood function, denoted ℒ(μ, σ²; y₁, y₂, . . . , y_T), is given by

ℒ(μ, σ²; y₁, y₂, . . . , y_T) = k − (T/2)·log(σ²) − Σₜ₌₁ᵀ (yₜ − μ)²/(2σ²).   [A.5.22]

In calculating the sample log likelihood function, any constant term that does not involve the parameters μ or σ² can be ignored for most purposes. In [A.5.22], this constant term is

k = −(T/2)·log(2π).
Maximum Likelihood Estimation
For a given sample of observations (y₁, y₂, . . . , y_T), the value of θ that makes the sample likelihood as large as possible is called the maximum likelihood estimate (MLE) of θ. For example, the maximum likelihood estimate of the population mean μ for an i.i.d. sample of size T from a N(μ, σ²) distribution is found by setting the derivative of [A.5.22] with respect to μ equal to zero:

∂ℒ/∂μ = Σₜ₌₁ᵀ (yₜ − μ)/σ² = 0,

so that

μ̂ = (1/T)·Σₜ₌₁ᵀ yₜ.   [A.5.23]

Setting the derivative of [A.5.22] with respect to σ² equal to zero gives

∂ℒ/∂σ² = −T/(2σ²) + Σₜ₌₁ᵀ (yₜ − μ)²/(2σ⁴) = 0.   [A.5.24]

Substituting [A.5.23] into [A.5.24] and solving for σ² gives

σ̂² = (1/T)·Σₜ₌₁ᵀ (yₜ − μ̂)².   [A.5.25]

Thus, the sample mean is the MLE of the population mean and the sample variance is the MLE of the population variance for an i.i.d. sample of Gaussian variables.
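A quick simulation confirming [A.5.23] and [A.5.25] (a sketch; the true parameter values are our choice):

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, sigma2_true, T = 1.5, 4.0, 100_000

y = rng.normal(mu_true, np.sqrt(sigma2_true), size=T)

mu_hat = y.mean()                        # [A.5.23]: the sample mean
sigma2_hat = ((y - mu_hat)**2).mean()    # [A.5.25]: the sample variance (divisor T)

print(mu_hat, sigma2_hat)   # close to 1.5 and 4.0 in a large sample
```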
The Multivariate Normal Distribution

Let

Y = (Y₁, Y₂, . . . , Yₙ)′

be a collection of n random variables. The vector Y has a multivariate Normal, or multivariate Gaussian, distribution if its density takes the form

f_Y(y) = (2π)^(−n/2)·|Ω|^(−1/2)·exp[(−1/2)·(y − μ)′Ω⁻¹(y − μ)].   [A.5.26]

The mean of Y is given by the vector μ:

E(Y) = μ,

and its variance-covariance matrix is Ω:

E(Y − μ)(Y − μ)′ = Ω.

Note that (Y − μ)(Y − μ)′ is symmetric and positive semidefinite for any Y, meaning that any variance-covariance matrix must be symmetric and positive semidefinite; the form of the likelihood in [A.5.26] assumes that Ω is positive definite.

Result [A.4.15] is sometimes used to write the multivariate Gaussian density in an equivalent form:

f_Y(y) = (2π)^(−n/2)·|Ω⁻¹|^(1/2)·exp[(−1/2)·(y − μ)′Ω⁻¹(y − μ)].

If Y ~ N(μ, Ω), then for any nonstochastic (r × n) matrix H′ and (r × 1) vector b,

H′Y + b ~ N((H′μ + b), H′ΩH).
Correlation and Independence for Multivariate Gaussian Variates

If Y has a multivariate Gaussian distribution, then absence of correlation implies independence. To see this, note that if the elements of Y are uncorrelated, then E[(Yᵢ − μᵢ)(Yⱼ − μⱼ)] = 0 for i ≠ j and the off-diagonal elements of Ω are zero:

Ω = [σ₁²  0    ···  0
     0    σ₂²  ···  0
     ⋮
     0    0    ···  σₙ²].

For such a diagonal Ω,

|Ω| = σ₁²·σ₂²···σₙ²   [A.5.27]

and

Ω⁻¹ = [1/σ₁²  0      ···  0
       0      1/σ₂²  ···  0
       ⋮
       0      0      ···  1/σₙ²].   [A.5.28]

Substituting [A.5.27] and [A.5.28] into [A.5.26] produces

f_Y(y) = (2π)^(−n/2)·[σ₁²·σ₂²···σₙ²]^(−1/2)
         × exp[(−1/2)·((y₁ − μ₁)²/σ₁² + (y₂ − μ₂)²/σ₂² + ··· + (yₙ − μₙ)²/σₙ²)]
       = Πᵢ₌₁ⁿ (2πσᵢ²)^(−1/2)·exp[(−1/2)·(yᵢ − μᵢ)²/σᵢ²],

which is the product of n univariate Gaussian densities. Since the joint density is the product of the individual densities, the random variables (Y₁, Y₂, . . . , Yₙ) are independent.
Probability Limit
Let {X₁, X₂, . . . , X_T} denote a sequence of random variables. Often we are interested in what happens to this sequence as T becomes large. For example, X_T might denote the sample mean of T observations:

X_T = (1/T)·(Y₁ + Y₂ + ··· + Y_T),   [A.5.29]

in which case we might want to know the properties of the sample mean as the size of the sample T grows large.

The sequence {X₁, X₂, . . . , X_T} is said to converge in probability to c if for every ε > 0 and δ > 0 there exists a value N such that, for all T ≥ N,

P{|X_T − c| > δ} < ε.   [A.5.30]

When this is the case, we write plim X_T = c.

The sequence is said to converge in mean square to c if for every ε > 0 there exists a value N such that, for all T ≥ N,

E(X_T − c)² < ε.   [A.5.31]

We indicate that the sequence converges to c in mean square by writing X_T → c in mean square.

Convergence in mean square implies convergence in probability, but convergence in probability does not imply convergence in mean square.
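A simulation gives a feel for [A.5.29] and [A.5.30]: as T grows, the sample mean concentrates around the population mean (a sketch; the distribution, sample sizes, and tolerance are our choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 1.0   # population mean; plim of the sample mean [A.5.29]

for T in (10, 1_000, 10_000):
    draws = rng.normal(mu, 2.0, size=(1_000, T))
    x_T = draws.mean(axis=1)                    # 1,000 realizations of X_T
    print(T, np.mean(np.abs(x_T - mu) > 0.1))   # P{|X_T - mu| > 0.1} shrinks toward 0
```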