J Hamilton Time Series Analysis-360-382

A Mathematical Review

This book assumes some familiarity with elementary trigonometry, complex numbers, calculus, matrix algebra, and probability. Introductions to the first three topics by Chiang (1974) or Thomas (1972) are adequate; Marsden (1974) treated these issues in more depth. No matrix algebra is required beyond the level of standard econometrics texts such as Theil (1971) or Johnston (1984); for more detailed treatments, see O'Nan (1976), Strang (1976), and Magnus and Neudecker (1988). The concepts of probability and statistics from standard econometrics texts are also sufficient for getting through this book; for more complete introductions, see Lindgren (1976) or Hoel, Port, and Stone (1971).

This appendix reviews the necessary mathematical concepts and results. The reader familiar with these topics is invited to skip this material, or to consult the subheadings for desired coverage.

A.1. Trigonometry

Definitions

Figure A.1 displays a circle with unit radius centered at the origin in (x, y)-space. Let (x₀, y₀) denote some point on this unit circle, and consider the angle θ between this point and the x-axis. The sine of θ is defined as the y-coordinate of the point, and the cosine is the x-coordinate:

    sin(θ) = y₀    [A.1.1]
    cos(θ) = x₀    [A.1.2]

This text always measures angles in radians. The radian measure of the angle θ is defined as the distance traveled counterclockwise along the unit circle starting at the x-axis before reaching (x₀, y₀). The circumference of a circle with unit radius is 2π. A rotation one-quarter of the way around the unit circle would therefore correspond to a radian measure of θ = (1/4)(2π) = π/2. An angle whose radian measure is π/2 is more commonly described as a right angle or a 90° angle. A 45° angle has a radian measure of π/4, a 180° angle has a radian measure of π, and so on.

[FIGURE A.1. Trigonometric functions as distances in (x, y)-space.]

Polar Coordinates

Consider a smaller triangle, say the triangle with vertex (x₁, y₁) shown in Figure A.1, that shares the same angle θ as the original triangle with vertex
(x₀, y₀). The ratio of any two sides of such a smaller triangle will be the same as that for the larger triangle:

    y₁/c₁ = y₀/1    [A.1.3]
    x₁/c₁ = x₀/1    [A.1.4]

Comparing [A.1.3] with [A.1.1], the y-coordinate of any point such as (x₁, y₁) in (x, y)-space may be expressed as

    y₁ = c₁·sin(θ),    [A.1.5]

where c₁ is the distance from the origin to (x₁, y₁) and θ is the angle that the point (x₁, y₁) makes with the x-axis. Comparing [A.1.4] with [A.1.2], the x-coordinate of (x₁, y₁) can be expressed as

    x₁ = c₁·cos(θ).    [A.1.6]

Recall further that the magnitude c₁, which represents the distance from the origin to the point (x₁, y₁), is given by the formula

    c₁ = √(x₁² + y₁²).    [A.1.7]

Taking a point in (x, y)-space and writing it as (c·cos(θ), c·sin(θ)) is called describing the point in terms of its polar coordinates c and θ.

Properties of Sine and Cosine Functions

The functions sin(θ) and cos(θ) are called trigonometric or sinusoidal functions. Viewed as a function of θ, the sine function starts out at zero: sin(0) = 0. The sine function rises to 1 as θ increases to π/2 and then falls back to zero as θ increases further to π; see panel (a) of Figure A.2. The function reaches its minimum value of −1 at θ = 3π/2 and then begins climbing back up. If we travel a distance of 2π radians around the unit circle, we are right back where we started, and the function repeats itself:

    sin(2π + θ) = sin(θ).

The function would again repeat itself if we made two full revolutions around the unit circle. Indeed, for any integer j,

    sin(2πj + θ) = sin(θ).    [A.1.8]

[FIGURE A.2. Sine and cosine functions. Panel (a): sin(θ); panel (b): cos(θ).]

The sine function is thus periodic and is for this reason often useful for describing a time series that repeats itself in a particular cycle.
The cosine function starts out at unity and falls to zero as θ increases to π/2; see panel (b) of Figure A.2. It turns out simply to be a horizontal shift of the sine function:

    cos(θ) = sin(θ + π/2).    [A.1.9]

The sine or cosine function can also be evaluated for negative values of θ, defined as a clockwise rotation around the unit circle from the x-axis. Clearly,

    sin(−θ) = −sin(θ)    [A.1.10]
    cos(−θ) = cos(θ).    [A.1.11]

For (x₀, y₀) a point on the unit circle, [A.1.7] implies that

    1 = √(x₀² + y₀²),

or, squaring both sides and using [A.1.1] and [A.1.2],

    1 = [cos(θ)]² + [sin(θ)]².    [A.1.12]

Using Trigonometric Functions to Represent Cycles

Suppose we construct the function g(θ) by first multiplying θ by 2 and then evaluating the sine of the product:

    g(θ) = sin(2θ).

This doubles the frequency at which the function cycles. When θ goes from 0 to π, 2θ goes from 0 to 2π, and so g(θ) is back to its original value (see Figure A.3). In general, the function sin(kθ) would go through k cycles in the time it takes sin(θ) to complete a single cycle.

We will sometimes describe the value a variable y takes on at date t as a function of sines or cosines, such as

    yₜ = R·cos(ωt + α).    [A.1.13]

[FIGURE A.3. Effect of changing the frequency of a periodic function.]

The parameter R gives the amplitude of [A.1.13]. The variable yₜ will attain a maximum value of +R and a minimum value of −R. The parameter α is the phase. The phase determines where in the cycle yₜ would be at t = 0. The parameter ω governs how quickly the variable cycles, which can be summarized by either of two measures. The period is the length of time required for the process to repeat a full cycle. The period of [A.1.13] is 2π/ω. For example, if ω = 1, then yₜ repeats itself every 2π periods, whereas if ω = 2 the process repeats itself every π periods. The frequency summarizes how frequently the process cycles compared with the simple function cos(t); thus, it measures the number of cycles completed during 2π periods.
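These properties are easy to confirm numerically. The following sketch in Python checks the periodicity in [A.1.8] and the period 2π/ω of [A.1.13]; the particular values chosen for θ, R, ω, and α are arbitrary illustrations, not taken from the text.

```python
import math

theta = 0.7

# sin(2*pi*j + theta) = sin(theta) for any integer j  [A.1.8]
for j in (1, 2, -3):
    assert math.isclose(math.sin(2 * math.pi * j + theta),
                        math.sin(theta), abs_tol=1e-12)

# cos(theta) is a horizontal shift of sin(theta)  [A.1.9]
assert math.isclose(math.cos(theta), math.sin(theta + math.pi / 2), abs_tol=1e-12)

# y_t = R*cos(w*t + alpha) repeats itself every 2*pi/w periods  [A.1.13]
R, w, alpha = 1.5, 2.0, 0.3
period = 2 * math.pi / w
y = lambda t: R * math.cos(w * t + alpha)
assert math.isclose(y(1.1), y(1.1 + period), abs_tol=1e-12)
```

With ω = 2 the period is π, matching the statement that the process repeats itself every π periods.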
The frequency of cos(t) is unity, and the frequency of [A.1.13] is ω. For example, if ω = 2, the cycles are completed twice as quickly as those for cos(t). There is a simple relation between these two measures of the speed of cycles: the period is equal to 2π divided by the frequency.

A.2. Complex Numbers

Definitions

Consider the following expression:

    x² = 1.    [A.2.1]

There are two values of x that satisfy [A.2.1], namely, x = 1 and x = −1. Suppose instead that we were given the following equation:

    x² = −1.    [A.2.2]

No real number satisfies [A.2.2]. However, let us consider an imaginary number (denoted i) that does:

    i² = −1.    [A.2.3]

We assume that i can be multiplied by a real number and manipulated using standard rules of algebra. For example,

    i + 4i = 5i

and

    (2i)·(3i) = 6i² = −6.

This last property implies that a second solution to [A.2.2] is given by x = −i:

    (−i)² = (−1)²·(i)² = −1.

Thus, [A.2.1] has two real roots (+1 and −1), whereas [A.2.2] has two imaginary roots (i and −i).

For any real numbers a and b, we can construct the expression

    a + bi.    [A.2.4]

If b = 0, then [A.2.4] is a real number, whereas if a = 0 and b is nonzero, then [A.2.4] is an imaginary number. A number written in the general form of [A.2.4] is called a complex number.

Rules for Manipulating Complex Numbers

Complex numbers are manipulated using standard rules of algebra. Two complex numbers are added as follows:

    (a₁ + b₁i) + (a₂ + b₂i) = (a₁ + a₂) + (b₁ + b₂)i.

Complex numbers are multiplied this way:

    (a₁ + b₁i)·(a₂ + b₂i) = a₁a₂ + a₁b₂i + b₁a₂i + b₁b₂i²
                          = (a₁a₂ − b₁b₂) + (a₁b₂ + b₁a₂)i.

Note that the resulting expressions are always simplified by separating the real component (such as (a₁a₂ − b₁b₂)) from the imaginary component (such as (a₁b₂ + b₁a₂)).

Graphical Representation of Complex Numbers

A complex number (a + bi) is sometimes represented graphically in an Argand diagram as in Figure A.4. The value of the real component (a) is plotted on the horizontal axis, and the imaginary component (b) is plotted on the vertical axis.
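Python's built-in complex type follows exactly these rules, so the addition and multiplication formulas above can be spot-checked directly; the numbers below are arbitrary illustrations.

```python
a1, b1 = 2.0, 3.0
a2, b2 = -1.0, 4.0

z = complex(a1, b1) * complex(a2, b2)
# Real part: a1*a2 - b1*b2; imaginary part: a1*b2 + b1*a2
assert z.real == a1 * a2 - b1 * b2
assert z.imag == a1 * b2 + b1 * a2

# i*i = -1, so both i and -i are roots of x^2 = -1
assert (1j * 1j) == -1
assert ((-1j) * (-1j)) == -1
```

In Python the imaginary unit is written `1j` rather than i.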
The size, or modulus, of a complex number a + bi is measured the same way as the distance from the origin to a point in (x, y)-space in [A.1.7]: R = √(a² + b²).

A.3. Calculus

Power Series

Recall that a function f(x) is continuous at c if for every ε > 0 there exists a δ > 0 such that |f(x) − f(c)| < ε whenever |x − c| < δ. Provided that f(x) also has derivatives of every order, a power series can be used to characterize the function f(x). To find a power series, we choose a particular value c around which to center the expansion, such as c = 0. We then use the Taylor series [A.3.12] with r → ∞. For example, consider the sine function. The first two derivatives are given by [A.3.2] and [A.3.5], with the following higher-order derivatives:

    d³sin(x)/dx³ = −cos(x)
    d⁴sin(x)/dx⁴ = sin(x)
    d⁵sin(x)/dx⁵ = cos(x),

and so on. Evaluated at x = 0, we have

    sin(0) = 0
    d sin(x)/dx evaluated at x = 0:  cos(0) = 1
    d²sin(x)/dx² evaluated at x = 0:  −sin(0) = 0
    d³sin(x)/dx³ evaluated at x = 0:  −cos(0) = −1.

Substituting into [A.3.12] with c = 0 and letting r → ∞ produces a power series for the sine function:

    sin(x) = x − (1/3!)x³ + (1/5!)x⁵ − (1/7!)x⁷ + ⋯.    [A.3.13]

Similar calculations give a power series for the cosine function:

    cos(x) = 1 − (1/2!)x² + (1/4!)x⁴ − (1/6!)x⁶ + ⋯.    [A.3.14]

Exponential Functions

A number γ raised to the power x,

    f(x) = γˣ,

is called an exponential function of x. The number γ is called the base of this function, and x is called the exponent. To multiply two exponential functions that share the same base, the exponents are added:

    γˣ·γʸ = γ^(x+y).    [A.3.15]

For example, (2²)·(2³) = 2⁵ = 32. To raise an exponential function to the power k, the exponents are multiplied:

    (γˣ)ᵏ = γ^(xk).    [A.3.16]

For example, (2²)³ = 2⁶ = 64. Exponentiation is distributive over multiplication:

    (α·β)ˣ = αˣ·βˣ.    [A.3.17]

Negative exponents denote reciprocals:

    γ⁻ˣ = 1/γˣ.

Any number raised to the power 0 is taken to be equal to unity:

    γ⁰ = 1.    [A.3.18]

This convention is sensible, since if y = −x in [A.3.15],

    γˣ·γ⁻ˣ = γ⁰

and

    γˣ·(1/γˣ) = 1.

The Number e

The base for the natural logarithms is denoted e.
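The truncated expansions [A.3.13] and [A.3.14] converge quickly, which is easy to verify numerically. The sketch below (plain Python; the number of terms and the evaluation points are arbitrary choices) compares the truncated series with the library implementations of sine and cosine.

```python
import math

def sin_series(x, terms=10):
    # Truncation of [A.3.13]: x - x^3/3! + x^5/5! - ...
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))

def cos_series(x, terms=10):
    # Truncation of [A.3.14]: 1 - x^2/2! + x^4/4! - ...
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k)
               for k in range(terms))

for x in (0.0, 0.5, 1.0, -2.0):
    assert math.isclose(sin_series(x), math.sin(x), abs_tol=1e-9)
    assert math.isclose(cos_series(x), math.cos(x), abs_tol=1e-9)
```

Ten terms already match the library functions to better than nine decimal places on these arguments.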
The number e has the property that an exponential function with base e equals its own derivative:

    deˣ/dx = eˣ.    [A.3.19]

All the higher-order derivatives of eˣ are equal to eˣ as well:

    dʲeˣ/dxʲ = eˣ.    [A.3.20]

We sometimes use the expression "exp[x]" to represent "e raised to the power x":

    exp[x] = eˣ.

If u(x) denotes a separate function of x, the derivative of the compound function e^u(x) can be evaluated using the chain rule:

    de^u(x)/dx = (de^u/du)·(du/dx) = e^u(x)·(du/dx).    [A.3.21]

To find a power series for the function f(x) = eˣ, notice from [A.3.20] that

    dʲf(x)/dxʲ evaluated at x = 0 is e⁰ = 1    [A.3.22]

for all j. Substituting [A.3.22] into [A.3.12] with c = 0 yields a power series for the function f(x) = eˣ:

    eˣ = 1 + x + (1/2!)x² + (1/3!)x³ + ⋯.    [A.3.23]

Setting x = 1 in [A.3.23] gives a numerical procedure for calculating the value of e:

    e = 1 + 1 + 1/2! + 1/3! + ⋯ = 2.71828 . . . .

Euler Relations and De Moivre's Theorem

Suppose we evaluate the power series [A.3.23] at the imaginary number x = iθ, where i = √−1 and θ is some real angle measured in radians:

    e^(iθ) = 1 + iθ + (1/2!)(iθ)² + (1/3!)(iθ)³ + (1/4!)(iθ)⁴ + ⋯.    [A.3.24]

Since i² = −1, i³ = −i, i⁴ = 1, and so on, [A.3.24] can be rearranged as

    e^(iθ) = [1 − (1/2!)θ² + (1/4!)θ⁴ − ⋯] + i·[θ − (1/3!)θ³ + (1/5!)θ⁵ − ⋯].

Comparing the bracketed series with [A.3.14] and [A.3.13] gives the Euler relation

    e^(iθ) = cos(θ) + i·sin(θ).    [A.3.25]

Similarly,

    e^(−iθ) = [1 − (1/2!)θ² + (1/4!)θ⁴ − ⋯] − i·[θ − (1/3!)θ³ + ⋯]
            = cos(θ) − i·sin(θ).    [A.3.26]

To raise a complex number (a + bi) to the kth power, the complex number is written in polar coordinate form as in [A.2.6]:

    a + bi = R·[cos(θ) + i·sin(θ)].
Using [A.3.25], this can then be treated as an exponential function of θ:

    a + bi = R·e^(iθ).    [A.3.27]

Now raise both sides of [A.3.27] to the kth power, recalling [A.3.17] and [A.3.16]:

    (a + bi)ᵏ = Rᵏ·(e^(iθ))ᵏ = Rᵏ·e^(iθk).    [A.3.28]

Finally, use [A.3.25] in reverse,

    e^(iθk) = cos(θk) + i·sin(θk),

to deduce that [A.3.28] can be written

    (a + bi)ᵏ = Rᵏ·[cos(θk) + i·sin(θk)].    [A.3.29]

Definition of Natural Logarithm

The natural logarithm (denoted throughout the text simply by "log") is the inverse of the function eˣ:

    log(eˣ) = x.

Notice from [A.3.18] that e⁰ = 1 and therefore log(1) = 0.

Properties of Logarithms

For any x > 0, it is also the case that

    x = e^log(x).    [A.3.30]

From [A.3.30] and [A.3.15], we see that the log of the product of two numbers is equal to the sum of the logs:

    log(a·b) = log[e^log(a)·e^log(b)] = log[e^(log(a)+log(b))] = log(a) + log(b).

Also, use [A.3.16] to write

    [e^log(x)]ᵃ = e^(a·log(x)).    [A.3.31]

Taking logs of both sides of [A.3.31] reveals that the log of a number raised to the a power is equal to a times the log of the number:

    log(xᵃ) = a·log(x).

Derivatives of Natural Logarithms

Let u(x) = log(x), and write the right side of [A.3.30] as e^u(x). Differentiating both sides of [A.3.30] using [A.3.21] reveals that

    1 = e^u(x)·(du/dx) = x·(d log(x)/dx),

so that

    d log(x)/dx = 1/x.    [A.3.32]

Logarithms and Elasticities

It is sometimes also useful to differentiate a function f(x) with respect to the variable log(x). To do so, write f(x) as f(u(x)), where

    u(x) = exp[log(x)].

Now use the chain rule to differentiate:

    df(u(x))/d log(x) = (df/du)·(du/d log(x)).    [A.3.33]

But from [A.3.21],

    du/d log(x) = exp[log(x)] = x.    [A.3.34]

Substituting [A.3.34] into [A.3.33] gives

    df(x)/d log(x) = x·(df/dx).

It follows from [A.3.32] that

    d log f(x)/d log(x) = (1/f(x))·(df(x)/d log(x)) = [x/f(x)]·(df/dx),

which has the interpretation as the elasticity of f with respect to x, or the percent change in f resulting from a 1% increase in x.
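The Euler relation [A.3.25], De Moivre's theorem [A.3.29], and the product rule for logarithms can all be spot-checked with Python's `cmath` and `math` modules; θ, R, k, a, and b below are arbitrary illustrative values.

```python
import cmath
import math

theta, R, k = 0.6, 2.0, 5

# Euler relation [A.3.25]: e^{i*theta} = cos(theta) + i*sin(theta)
assert cmath.isclose(cmath.exp(1j * theta),
                     complex(math.cos(theta), math.sin(theta)))

# De Moivre's theorem [A.3.29]: (R*e^{i*theta})^k = R^k * e^{i*theta*k}
z = R * complex(math.cos(theta), math.sin(theta))
rhs = R ** k * complex(math.cos(k * theta), math.sin(k * theta))
assert cmath.isclose(z ** k, rhs)

# log of a product equals the sum of the logs
a, b = 3.0, 7.0
assert math.isclose(math.log(a * b), math.log(a) + math.log(b))
```

The elasticity interpretation can be checked the same way: for f(x) = xᵃ, the derivative of log f with respect to log x is the constant a.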
Logarithms and Percent

An approximation to the natural log function is obtained from a first-order Taylor series around x = 1:

    log(1 + δ) ≅ log(1) + [d log(x)/dx evaluated at x = 1]·δ.    [A.3.35]

But log(1) = 0, and from [A.3.32] the derivative of log(x) evaluated at x = 1 is unity. Thus, for δ close to zero, an excellent approximation is provided by

    log(1 + δ) ≅ δ.    [A.3.36]

An implication of [A.3.36] is the following. Let r denote the net interest rate measured as a fraction of 1; for example, r = 0.05 corresponds to a 5% interest rate. Then (1 + r) denotes the gross interest rate (principal plus net interest). Equation [A.3.36] says that the log of the gross interest rate (1 + r) is essentially the same number as the net interest rate (r).

Definition of Indefinite Integral

Integration (indicated by ∫ dx) is the inverse operation from differentiation. For example,

    ∫ x dx = x²/2 + C,    [A.3.37]

because

    d(x²/2)/dx = x.    [A.3.38]

The function x²/2 is not the only function satisfying [A.3.38]; the function (x²/2) + C also works for any constant C. The term C is referred to as the constant of integration.

Some Useful Indefinite Integrals

The following integrals can be confirmed from [A.3.1], [A.3.32], [A.3.2], [A.3.3], and [A.3.21]:

    ∫ xᵃ dx = x^(a+1)/(a + 1) + C    for a ≠ −1    [A.3.39]
    ∫ x⁻¹ dx = log(x) + C    for x > 0    [A.3.40]
    ∫ cos(x) dx = sin(x) + C    [A.3.41]
    ∫ sin(x) dx = −cos(x) + C    [A.3.42]
    ∫ e^(ax) dx = (1/a)·e^(ax) + C.    [A.3.43]

It is also straightforward to demonstrate that for constants a and b not depending on x,

    ∫ [a·f(x) + b·g(x)] dx = a·∫ f(x) dx + b·∫ g(x) dx.

Definite Integrals

Consider the continuous function f(x) plotted in Figure A.5. Define the function A(x; a) to be the area under f(x) between a and x, viewed as a function of x. Thus, A(b; a) would be the area between a and b. Suppose we increase b by some small amount Δ. This is approximately the same as adding a rectangle of height f(b) and width Δ to the area A(b; a):

    A(b + Δ; a) ≅ A(b; a) + Δ·f(b),

or

    [A(b + Δ; a) − A(b; a)]/Δ ≅ f(b).

In the limit as Δ → 0,

    ∂A(b; a)/∂b = f(b).    [A.3.44]
Now, [A.3.44] has to hold for any value of b > a that we might have chosen, implying that the area function A(x; a) is the inverse of differentiation:

    A(x; a) = F(x) + C,    [A.3.45]

where

    dF(x)/dx = f(x).

To find the value of C, notice that the area A(a; a) in [A.3.45] should be equal to zero:

    A(a; a) = 0 = F(a) + C.

For this to be true,

    C = −F(a).    [A.3.46]

Evaluating [A.3.45] at x = b, the area between a and b is given by

    A(b; a) = F(b) + C

or, using [A.3.46],

    A(b; a) = F(b) − F(a),    [A.3.47]

where F(x) satisfies dF(x)/dx = f(x); that is, F(x) = ∫ f(x) dx. Equation [A.3.47] is known as the fundamental theorem of calculus. The operation in [A.3.47] is known as definite integration:

    ∫ from a to b of f(x) dx = [∫ f(x) dx evaluated at x = b] − [∫ f(x) dx evaluated at x = a].

[FIGURE A.5. The definite integral as the area under a function.]

For example, to find the area under the sine function between θ = 0 and θ = π, we use [A.3.42]:

    ∫ from 0 to π of sin(x) dx = [−cos(π)] − [−cos(0)] = 1 + 1 = 2.

To find the area between 0 and 2π, we take

    ∫ from 0 to 2π of sin(x) dx = [−cos(2π)] − [−cos(0)] = −1 + 1 = 0.

The positive values for sin(x) between 0 and π exactly cancel out the negative values between π and 2π.

A.4. Matrix Algebra

Definitions

An (m × n) matrix is an array of numbers ordered into m rows and n columns:

    A = [ a₁₁  a₁₂  ⋯  a₁ₙ ]
        [ a₂₁  a₂₂  ⋯  a₂ₙ ]
        [  ⋮    ⋮        ⋮  ]
        [ aₘ₁  aₘ₂  ⋯  aₘₙ ]

If there is only one column (n = 1), then A is described as a column vector, whereas if there is only one row (m = 1), A is called a row vector. A single number (n = 1 and m = 1) is called a scalar. If the number of rows equals the number of columns (m = n), the matrix is said to be square. The diagonal running through (a₁₁, a₂₂, . . . , aₙₙ) in a square matrix is called the principal diagonal.
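The definite integrals just computed from [A.3.47] can be confirmed with a simple midpoint Riemann sum; this is a numerical sketch in plain Python, with the number of subintervals chosen arbitrarily.

```python
import math

def riemann(f, a, b, n=100_000):
    # Midpoint Riemann sum approximating the definite integral of f on [a, b].
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Area under sin(x) between 0 and pi is [-cos(pi)] + cos(0) = 2
assert math.isclose(riemann(math.sin, 0.0, math.pi), 2.0, abs_tol=1e-6)

# Between 0 and 2*pi the positive and negative areas exactly cancel
assert math.isclose(riemann(math.sin, 0.0, 2 * math.pi), 0.0, abs_tol=1e-6)
```

The same routine applied to f(x) = x on [0, 1] recovers [A.3.37]: the area is 1/2.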
If all the elements off the principal diagonal are zero, the matrix is said to be diagonal.

A matrix is sometimes specified by describing its element in row i, column j:

    A = [aᵢⱼ].

Summation and Multiplication

Two (m × n) matrices are added element by element:

    A + B = [ a₁₁ + b₁₁   a₁₂ + b₁₂   ⋯   a₁ₙ + b₁ₙ ]
            [ a₂₁ + b₂₁   a₂₂ + b₂₂   ⋯   a₂ₙ + b₂ₙ ]
            [     ⋮            ⋮                ⋮     ]
            [ aₘ₁ + bₘ₁   aₘ₂ + bₘ₂   ⋯   aₘₙ + bₘₙ ]

or, more compactly,

    A + B = [aᵢⱼ + bᵢⱼ],

with A, B, and A + B all (m × n). The product of an (m × n) matrix and an (n × q) matrix is an (m × q) matrix:

    A·B = C,
    (m × n)·(n × q) = (m × q),

where the row i, column j element of C is given by Σ from k = 1 to n of aᵢₖbₖⱼ. Notice that multiplication requires that the number of columns of A be the same as the number of rows of B.

To multiply A by a scalar α, each element of A is multiplied by α:

    α·A = C    with C = [α·aᵢⱼ].

It is easy to show that addition is commutative:

    A + B = B + A,

whereas multiplication is not:

    AB ≠ BA.

Indeed, the product BA will not exist unless m = q, and even where it exists, AB would be equal to BA only in rather special cases. Both addition and multiplication are associative:

    (A + B) + C = A + (B + C)
    (AB)C = A(BC).

Identity Matrix

The identity matrix of order n (denoted Iₙ) is an (n × n) matrix with 1s along the principal diagonal and 0s elsewhere:

    Iₙ = [ 1  0  ⋯  0 ]
         [ 0  1  ⋯  0 ]
         [ ⋮  ⋮      ⋮ ]
         [ 0  0  ⋯  1 ]

For any (m × n) matrix A,

    A·Iₙ = A    and    Iₘ·A = A.

Powers of Matrices

For an (n × n) matrix A, the expression A² denotes A·A. The expression Aᵏ indicates the matrix A multiplied by itself k times, with A⁰ interpreted as the (n × n) identity matrix.
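A quick numerical sketch of these rules, assuming numpy is available; the matrices are arbitrary illustrations, not examples from the text.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 1.0]])
I2 = np.eye(2)

# Row i, column j of AB is sum_k a_ik * b_kj
C = A @ B
assert C[0, 1] == A[0, 0] * B[0, 1] + A[0, 1] * B[1, 1]

# Multiplication is generally not commutative, but it is associative
assert not np.allclose(A @ B, B @ A)
assert np.allclose((A @ B) @ A, A @ (B @ A))

# A*I = I*A = A, and A^0 is interpreted as the identity
assert np.allclose(A @ I2, A) and np.allclose(I2 @ A, A)
assert np.allclose(np.linalg.matrix_power(A, 0), I2)
```

The `@` operator is numpy's matrix product; elementwise `*` would implement a different operation entirely.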
Transposition

Let aᵢⱼ denote the row i, column j element of a matrix A:

    A = [aᵢⱼ].

The transpose of A (denoted A′) is given by

    A′ = [aⱼᵢ].

For example, the transpose of a (3 × 2) matrix is the (2 × 3) matrix obtained by writing the rows of the original matrix as its columns. The transpose of a row vector is a column vector.

It is easy to verify the following:

    (A′)′ = A    [A.4.1]
    (A + B)′ = A′ + B′    [A.4.2]
    (AB)′ = B′A′.    [A.4.3]

Symmetric Matrices

A square matrix satisfying A = A′ is said to be symmetric.

Trace of a Matrix

The trace of an (n × n) matrix A is defined as the sum of the elements along the principal diagonal:

    trace(A) = a₁₁ + a₂₂ + ⋯ + aₙₙ.

If A is an (m × n) matrix and B is an (n × m) matrix, then AB is an (m × m) matrix whose trace is

    trace(AB) = Σᵢ Σₖ aᵢₖbₖᵢ,

summing over i = 1, . . . , m and k = 1, . . . , n. The product BA is an (n × n) matrix whose trace is

    trace(BA) = Σₖ Σᵢ bₖᵢaᵢₖ.

Thus,

    trace(AB) = trace(BA).

If A and B are both (n × n) matrices, then

    trace(A + B) = trace(A) + trace(B).

If A is an (n × n) matrix and λ is a scalar, then

    trace(λ·A) = λ·trace(A).

Partitioned Matrices

A partitioned matrix is a matrix whose individual elements are themselves matrices; for example, a (3 × 4) matrix A might be written in partitioned form as

    A = [ A₁  A₂ ]
        [ A₃  A₄ ],

where the blocks fit together conformably. Partitioned matrices are added or multiplied as if the individual blocks were scalars, provided that the row and column dimensions permit the proposed matrix operations. For example,

    [ A₁ ]   [ B₁ ]   [ A₁ + B₁ ]
    [ A₂ ] + [ B₂ ] = [ A₂ + B₂ ],

where A₁ and B₁ share the same dimensions, as do A₂ and B₂. Similarly,

    [ A₁  A₂ ] [ B₁  B₂ ]   [ A₁B₁ + A₂B₃   A₁B₂ + A₂B₄ ]
    [ A₃  A₄ ] [ B₃  B₄ ] = [ A₃B₁ + A₄B₃   A₃B₂ + A₄B₄ ],

whenever the blocks are conformable for these products.
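The trace and transposition identities above are easy to confirm numerically; the sketch below assumes numpy is available and draws arbitrary random matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 2))

# trace(AB) = trace(BA) even though AB is (2 x 2) and BA is (3 x 3)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# (AB)' = B'A'  [A.4.3]
assert np.allclose((A @ B).T, B.T @ A.T)

# (A')' = A  [A.4.1]
assert np.allclose(A.T.T, A)
```

Note that trace(AB) = trace(BA) holds for rectangular A and B of conformable dimensions, not just for square matrices.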
Definition of Determinant

The determinant of a (2 × 2) matrix A is given by the following scalar:

    |A| = a₁₁a₂₂ − a₂₁a₁₂.    [A.4.4]

The determinant of an (n × n) matrix can be defined recursively. Let Aᵢⱼ denote the [(n − 1) × (n − 1)] matrix formed by deleting row i and column j from A. The determinant of A is given by

    |A| = Σ from j = 1 to n of (−1)^(1+j)·a₁ⱼ·|A₁ⱼ|.    [A.4.5]

For example, the determinant of a (3 × 3) matrix is

    |A| = a₁₁·| a₂₂ a₂₃ ; a₃₂ a₃₃ | − a₁₂·| a₂₁ a₂₃ ; a₃₁ a₃₃ | + a₁₃·| a₂₁ a₂₂ ; a₃₁ a₃₂ |.

Properties of Determinants

A square matrix is said to be lower triangular if all the elements above the principal diagonal are zero (aᵢⱼ = 0 for j > i):

    A = [ a₁₁   0    ⋯   0  ]
        [ a₂₁  a₂₂   ⋯   0  ]
        [  ⋮    ⋮         ⋮  ]
        [ aₙ₁  aₙ₂   ⋯  aₙₙ ]

The determinant of a lower triangular matrix is simply the product of the terms along the principal diagonal:

    |A| = a₁₁a₂₂ ⋯ aₙₙ.    [A.4.6]

That [A.4.6] holds for n = 2 follows immediately from [A.4.4]. Given that it holds for a matrix of order n − 1, equation [A.4.5] implies that it holds for n:

    |A| = a₁₁·|A₁₁| + 0·|A₁₂| + ⋯ + 0·|A₁ₙ| = a₁₁·(a₂₂a₃₃ ⋯ aₙₙ),

since A₁₁ is itself lower triangular of order n − 1. An immediate implication of [A.4.6] is that the determinant of the identity matrix is unity:

    |Iₙ| = 1.    [A.4.7]

Another useful fact about determinants is that if an (n × n) matrix A is multiplied by a scalar α, the effect is to multiply the determinant by αⁿ:

    |α·A| = αⁿ·|A|.    [A.4.8]

Again, [A.4.8] is immediately apparent for the n = 2 case from [A.4.4]:

    | αa₁₁ αa₁₂ ; αa₂₁ αa₂₂ | = (αa₁₁)(αa₂₂) − (αa₁₂)(αa₂₁) = α²·(a₁₁a₂₂ − a₁₂a₂₁) = α²·|A|.

Given that it holds for n − 1, it is simple to verify for n using [A.4.5].

By contrast, if a single row of A is multiplied by the constant α (as opposed to multiplying the entire matrix by α), then the determinant is multiplied by α. If the row that is multiplied by α is the first row, then this result is immediately apparent from [A.4.5]. If only the ith row of A is multiplied by α, the result can be shown by recursively applying [A.4.5] until the elements of the ith row appear explicitly in the formula.

Suppose that some constant c times the second row of a (2 × 2) matrix is added to the first row. This operation has no effect on the determinant:

    | a₁₁ + ca₂₁   a₁₂ + ca₂₂ ; a₂₁  a₂₂ | = (a₁₁ + ca₂₁)a₂₂ − (a₁₂ + ca₂₂)a₂₁
                                           = a₁₁a₂₂ − a₁₂a₂₁ = |A|.

Similarly, if some constant c times the third row of a (3 × 3) matrix is added to the
second row, the determinant will again be unchanged: expanding the new matrix along its first row as in [A.4.5], each (2 × 2) cofactor has had c times one of its rows added to the other, an operation that we have just seen leaves a (2 × 2) determinant unchanged. In general, if any row of an (n × n) matrix is multiplied by c and added to another row, the new matrix will have the same determinant as the original. Similarly, multiplying any column by c and adding the result to another column will not change the determinant. This can be viewed as a special case of the following result: if A and B are both (n × n) matrices, then

    |AB| = |A|·|B|.    [A.4.9]

Adding c times the second column of a (2 × 2) matrix A to the first column can be thought of as postmultiplying A by the following matrix:

    B = [ 1  0 ]
        [ c  1 ].

Since B is lower triangular with 1s along the principal diagonal, its determinant is unity, and so, from [A.4.9],

    |AB| = |A|.

Thus, the fact that adding a multiple of one column to another does not alter the determinant can be viewed as an implication of [A.4.9].

If two rows of a matrix are switched, the determinant changes sign. To switch the ith row with the jth, multiply the ith row by −1; this changes the sign of the determinant. Then subtract row i from row j, add the new row j back to row i, and subtract row i from row j once again. These last operations complete the switch and do not affect the determinant further. For example, let A be a (4 × 4) matrix written in partitioned form as

    A = [ a₁ ]
        [ a₂ ]
        [ a₃ ]
        [ a₄ ],

where the (1 × 4) vector aᵢ represents the ith row of A. Switching rows 1 and 4 by the sequence of operations just described shows that the determinant of the switched matrix is −|A|.

This result permits calculation of the determinant of A with reference to any row i of an (n × n) matrix A:

    |A| = Σ from j = 1 to n of (−1)^(i+j)·aᵢⱼ·|Aᵢⱼ|.    [A.4.10]

To derive [A.4.10], define A* to be the matrix whose first row is the ith row of A, with the remaining rows of A kept in their original order. Then, from [A.4.5],

    |A*| = Σ from j = 1 to n of (−1)^(1+j)·aᵢⱼ·|Aᵢⱼ|.

Moreover, A* is obtained from A by (i − 1) row switches: switching row i with row i − 1, then with row i − 2, . . . , and finally row 2 with row 1. Hence,

    |A| = (−1)^(i−1)·|A*| = Σ from j = 1 to n of (−1)^(i+j)·aᵢⱼ·|Aᵢⱼ|,

as claimed in [A.4.10].
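These determinant properties, [A.4.6] through [A.4.10], can be verified numerically; the sketch below assumes numpy is available and uses arbitrary random matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# |AB| = |A|*|B|  [A.4.9]
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))

# |alpha*A| = alpha^n * |A| for an (n x n) matrix  [A.4.8]
alpha = 2.5
assert np.isclose(np.linalg.det(alpha * A), alpha ** 3 * np.linalg.det(A))

# Adding c times row 2 to row 1 leaves the determinant unchanged
C = A.copy()
C[0] += 4.0 * C[1]
assert np.isclose(np.linalg.det(C), np.linalg.det(A))

# Switching two rows flips the sign of the determinant
D = A[[1, 0, 2]]
assert np.isclose(np.linalg.det(D), -np.linalg.det(A))
```

Note the exponent 3 in the scalar-multiplication check: the matrix is (3 × 3), so the determinant scales by α³, not α.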
An immediate implication of [A.4.10] is that if any row of a matrix contains all zeros, then the determinant of the matrix is zero. It can also be shown that the transpose of a matrix has the same determinant as the original matrix:

    |A′| = |A|.    [A.4.11]

This means, for example, that if the kth column of a matrix consists entirely of zeros, then the determinant of the matrix is zero. It also implies that the determinant of an upper triangular matrix (one for which aᵢⱼ = 0 for all j < i) is the product of the terms on the principal diagonal.

Adjoint of a Matrix

Let A denote an (n × n) matrix, and as before let Aᵢⱼ denote the [(n − 1) × (n − 1)] matrix that results from deleting row i and column j of A. The adjoint of A is the (n × n) matrix whose row i, column j element is given by (−1)^(i+j)·|Aⱼᵢ|.

Inverse of a Matrix

If the determinant of an (n × n) matrix A is not equal to zero, its inverse (an (n × n) matrix denoted A⁻¹) exists and is found by dividing the adjoint by the determinant:

    A⁻¹ = (1/|A|)·[(−1)^(i+j)·|Aⱼᵢ|].    [A.4.12]

For example, for a (2 × 2) matrix,

    [ a₁₁  a₁₂ ]⁻¹        1          [  a₂₂  −a₁₂ ]
    [ a₂₁  a₂₂ ]    = ――――――――――――― [ −a₂₁   a₁₁ ].    [A.4.13]
                      a₁₁a₂₂ − a₂₁a₁₂

A matrix whose inverse exists is said to be nonsingular. A matrix whose determinant is zero is singular and has no inverse.

When an inverse exists,

    A·A⁻¹ = Iₙ.    [A.4.14]

Taking determinants of both sides of [A.4.14] and using [A.4.9] and [A.4.7],

    |A|·|A⁻¹| = 1,    so    |A⁻¹| = 1/|A|.    [A.4.15]

Alternatively, taking the transpose of both sides of [A.4.14] and recalling [A.4.3],

    (A⁻¹)′·A′ = Iₙ,

which means that (A⁻¹)′ is the inverse of A′:

    (A′)⁻¹ = (A⁻¹)′.

For α a nonzero scalar and A a nonsingular matrix,

    [α·A]⁻¹ = α⁻¹·A⁻¹.

Also, for A, B, and C all nonsingular (n × n) matrices,

    [AB]⁻¹ = B⁻¹A⁻¹    and    [ABC]⁻¹ = C⁻¹B⁻¹A⁻¹.
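The (2 × 2) inverse formula [A.4.13] is a worked instance of [A.4.12], and it is simple to check against a library inverse; numpy is assumed available, and the matrix below is an arbitrary nonsingular example.

```python
import numpy as np

def inv2(A):
    # [A.4.13]: adjoint of a 2 x 2 matrix divided by its determinant.
    det = A[0, 0] * A[1, 1] - A[1, 0] * A[0, 1]
    adj = np.array([[ A[1, 1], -A[0, 1]],
                    [-A[1, 0],  A[0, 0]]])
    return adj / det

A = np.array([[4.0, 7.0], [2.0, 6.0]])

assert np.allclose(inv2(A), np.linalg.inv(A))
assert np.allclose(A @ inv2(A), np.eye(2))                       # [A.4.14]
assert np.isclose(np.linalg.det(inv2(A)), 1.0 / np.linalg.det(A))  # [A.4.15]
assert np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T)       # (A')^{-1} = (A^{-1})'
```

For general n, numerical libraries do not compute the inverse through the adjoint (which is expensive); they factorize the matrix instead. The formula remains the conceptual definition.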
Linear Dependence

Let (x₁, x₂, . . . , xₖ) be a set of k different (n × 1) vectors. The vectors are said to be linearly dependent if there exists a set of k scalars (c₁, c₂, . . . , cₖ), not all of which are zero, such that

    c₁x₁ + c₂x₂ + ⋯ + cₖxₖ = 0.

If no such set of nonzero numbers (c₁, c₂, . . . , cₖ) exists, then the vectors (x₁, x₂, . . . , xₖ) are said to be linearly independent.

Suppose the vectors (x₁, x₂, . . . , xₖ) are collected in an (n × k) matrix T, written in partitioned form as

    T = [ x₁  x₂  ⋯  xₖ ].

If the number of vectors (k) is equal to the dimension of each vector (n), then there is a simple relation between the notion of linear dependence and the determinant of the (n × n) matrix T; specifically, if (x₁, x₂, . . . , xₙ) are linearly dependent, then |T| = 0. To see this, suppose that x₁ is one of the vectors with a nonzero value of cᵢ. Then linear dependence means that

    x₁ = −(c₂/c₁)x₂ − (c₃/c₁)x₃ − ⋯ − (cₙ/c₁)xₙ.

The determinant of T is then equal to

    |T| = | −(c₂/c₁)x₂ − ⋯ − (cₙ/c₁)xₙ    x₂  ⋯  xₙ |.

But if we add (c₂/c₁) times the second column to the first column, (c₃/c₁) times the third column to the first column, . . . , and (cₙ/c₁) times the nth column to the first column, the result is

    |T| = | 0  x₂  ⋯  xₙ | = 0.

The converse can also be shown to be true: if |T| = 0, then (x₁, x₂, . . . , xₙ) are linearly dependent.

Eigenvalues and Eigenvectors

Suppose that an (n × n) matrix A, a nonzero (n × 1) vector x, and a scalar λ are related by

    Ax = λx.    [A.4.16]

Then x is called an eigenvector of A and λ the associated eigenvalue. Equation [A.4.16] can be written

    Ax − λIₙx = 0

or

    (A − λIₙ)x = 0.    [A.4.17]

Suppose that the matrix (A − λIₙ) were nonsingular. Then (A − λIₙ)⁻¹ would exist, and we could premultiply [A.4.17] by (A − λIₙ)⁻¹ to deduce that x = 0. Thus, if a nonzero vector x exists that satisfies [A.4.16], it must be associated with a value of λ such that (A − λIₙ) is singular. An eigenvalue of the matrix A is therefore a number λ such that

    |A − λIₙ| = 0.    [A.4.18]
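The defining conditions [A.4.16] through [A.4.18] can be checked numerically: each computed eigenvalue should satisfy Ax = λx for its eigenvector and should make (A − λIₙ) singular. The sketch assumes numpy is available; the matrix is an arbitrary illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

lams, vecs = np.linalg.eig(A)
for k in range(2):
    lam, x = lams[k], vecs[:, k]
    # Ax = lambda*x  [A.4.16]
    assert np.allclose(A @ x, lam * x)
    # (A - lambda*I) is singular, so its determinant vanishes  [A.4.18]
    assert np.isclose(np.linalg.det(A - lam * np.eye(2)), 0.0, atol=1e-10)
```

`np.linalg.eig` returns the eigenvectors as the columns of its second output, which is why `vecs[:, k]` rather than `vecs[k]` selects the kth eigenvector.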
Eigenvalues of Triangular Matrices

Notice that if A is upper triangular or lower triangular, then A − λIₙ is as well, and its determinant is just the product of terms along the principal diagonal:

    |A − λIₙ| = (a₁₁ − λ)(a₂₂ − λ) ⋯ (aₙₙ − λ).

Thus, for a triangular matrix, the eigenvalues (the values of λ for which this expression equals zero) are just the values of A along the principal diagonal.

Linear Independence of Eigenvectors

A useful result is that if the eigenvalues (λ₁, λ₂, . . . , λₙ) are all distinct, then the associated eigenvectors (x₁, x₂, . . . , xₙ) are linearly independent. To see this for the case n = 2, consider any numbers c₁ and c₂ such that

    c₁x₁ + c₂x₂ = 0.    [A.4.19]

Premultiplying both sides of [A.4.19] by A produces

    c₁Ax₁ + c₂Ax₂ = c₁λ₁x₁ + c₂λ₂x₂ = 0.    [A.4.20]

If [A.4.19] is multiplied by λ₁ and subtracted from [A.4.20], the result is

    c₂(λ₂ − λ₁)x₂ = 0.    [A.4.21]

But x₂ is an eigenvector of A, and so it cannot be the zero vector. Also, λ₂ − λ₁ cannot be zero, since λ₂ ≠ λ₁. Equation [A.4.21] therefore implies that c₂ = 0. A parallel set of calculations shows that c₁ = 0. Thus, the only values of c₁ and c₂ consistent with [A.4.19] are c₁ = 0 and c₂ = 0, which means that x₁ and x₂ are linearly independent. A similar argument for n > 2 can be made by induction.

A Useful Decomposition

Suppose an (n × n) matrix A has n distinct eigenvalues (λ₁, λ₂, . . . , λₙ). Collect these in a diagonal matrix Λ:

    Λ = [ λ₁  0   ⋯  0  ]
        [ 0   λ₂  ⋯  0  ]
        [ ⋮   ⋮       ⋮  ]
        [ 0   0   ⋯  λₙ ]

Collect the eigenvectors (x₁, x₂, . . . , xₙ) in an (n × n) matrix T:

    T = [ x₁  x₂  ⋯  xₙ ].

Applying the formula for multiplying partitioned matrices,

    AT = [ Ax₁  Ax₂  ⋯  Axₙ ].

But since (x₁, x₂, . . . , xₙ) are eigenvectors, equation [A.4.16] implies that

    AT = [ λ₁x₁  λ₂x₂  ⋯  λₙxₙ ].    [A.4.22]

A second application of the formula for multiplying partitioned matrices shows that the right side of [A.4.22] is in turn equal to

    [ λ₁x₁  λ₂x₂  ⋯  λₙxₙ ] = [ x₁  x₂  ⋯  xₙ ]·Λ = TΛ.

Thus, [A.4.22] can be written

    AT = TΛ.    [A.4.23]

Now, since the eigenvalues (λ₁, λ₂, . . . , λₙ) are taken to be distinct, the eigenvectors (x₁, x₂, . . . , xₙ) are known to be linearly independent. Thus, |T| ≠ 0 and T⁻¹ exists. Postmultiplying [A.4.23] by T⁻¹
reveals a useful decomposition of A:

    A = TΛT⁻¹.    [A.4.24]

The Jordan Decomposition

The decomposition in [A.4.24] required the (n × n) matrix A to have n linearly independent eigenvectors. This will be true whenever A has n distinct eigenvalues, and could still be true even if A has some repeated eigenvalues. In the completely general case when A has s ≤ n linearly independent eigenvectors, there always exists a decomposition similar to [A.4.24], known as the Jordan decomposition. Specifically, for such a matrix A there exists a nonsingular (n × n) matrix M such that

    A = MJM⁻¹,    [A.4.25]

where the (n × n) matrix J takes the form

    J = [ J₁  0   ⋯  0  ]
        [ 0   J₂  ⋯  0  ]
        [ ⋮   ⋮       ⋮  ]
        [ 0   0   ⋯  Jₛ ]    [A.4.26]

with

    Jᵢ = [ λᵢ  1   0   ⋯  0  ]
         [ 0   λᵢ  1   ⋯  0  ]
         [ 0   0   λᵢ  ⋯  0  ]
         [ ⋮   ⋮   ⋮       ⋮  ]
         [ 0   0   0   ⋯  λᵢ ]    [A.4.27]

Thus, Jᵢ has the eigenvalue λᵢ repeated along the principal diagonal and has unity repeated along the diagonal above the principal diagonal. The same eigenvalue λᵢ can appear in two different Jordan blocks Jᵢ and Jₖ if it corresponds to several linearly independent eigenvectors.

Some Further Results on Eigenvalues

Suppose that λ is an eigenvalue of the (n × n) matrix A. Then λ is also an eigenvalue of SAS⁻¹ for any nonsingular (n × n) matrix S. To see this, note that

    (A − λIₙ)x = 0

implies that

    S(A − λIₙ)S⁻¹Sx = 0,

or

    (SAS⁻¹ − λIₙ)x* = 0    [A.4.28]

for x* = Sx. Thus, λ is an eigenvalue of SAS⁻¹ associated with the eigenvector x* = Sx.

From [A.4.28], this implies that the determinant of any (n × n) matrix A is the same as the determinant of its Jordan matrix J defined in [A.4.25]. Since J is upper triangular, its determinant is the product of the terms along the principal diagonal, which are just the eigenvalues of A. Thus, the determinant of any matrix A is given by the product of its eigenvalues.

It is also clear that the eigenvalues of A are the same as those of A′. Taking the transpose of [A.4.25],

    A′ = (M⁻¹)′J′M′,

we see that the eigenvalues of A′ are the eigenvalues of J′. Since J′ is lower
triangular, its eigenvalues are the elements on its principal diagonal. But J′ and J have the same principal diagonal, meaning that A′ and A have the same eigenvalues.

Matrix Geometric Series

The results of [A.3.6] through [A.3.10] generalize readily to geometric series involving square matrices. Consider the sum

    S_T = Iₙ + A + A² + A³ + ⋯ + A^T    [A.4.29]

for A an (n × n) matrix. Premultiplying both sides of [A.4.29] by A, we see that

    A·S_T = A + A² + A³ + ⋯ + A^T + A^(T+1).    [A.4.30]

Subtracting [A.4.30] from [A.4.29], we find that

    (Iₙ − A)·S_T = Iₙ − A^(T+1).    [A.4.31]

Notice from [A.4.18] that if |Iₙ − A| = 0, then λ = 1 would be an eigenvalue of A. Assuming that none of the eigenvalues of A is equal to unity, the matrix (Iₙ − A) is nonsingular and [A.4.31] implies that

    S_T = (Iₙ − A)⁻¹(Iₙ − A^(T+1))    [A.4.32]

if no eigenvalue of A equals 1. If all the eigenvalues of A are strictly less than 1 in modulus, it can be shown that A^(T+1) → 0 as T → ∞, implying that

    Iₙ + A + A² + A³ + ⋯ = (Iₙ − A)⁻¹,    [A.4.33]

assuming that the eigenvalues of A are all inside the unit circle.

Kronecker Products

For A an (m × n) matrix and B a (p × q) matrix, the Kronecker product of A and B is defined as the following (mp × nq) matrix:

    A ⊗ B = [ a₁₁B  a₁₂B  ⋯  a₁ₙB ]
            [ a₂₁B  a₂₂B  ⋯  a₂ₙB ]
            [  ⋮     ⋮          ⋮  ]
            [ aₘ₁B  aₘ₂B  ⋯  aₘₙB ]

The following properties of the Kronecker product are readily verified. For any matrices A, B, and C,

    (A ⊗ B)′ = A′ ⊗ B′    [A.4.34]
    (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C).    [A.4.35]

Also, for A and B both (m × n) matrices and C any matrix,

    (A + B) ⊗ C = (A ⊗ C) + (B ⊗ C)    [A.4.36]
    C ⊗ (A + B) = (C ⊗ A) + (C ⊗ B).    [A.4.37]

Let A be (m × n), B be (p × q), C be (n × k), and D be (q × r). Then

    (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD);    [A.4.38]

that is, the row i, column j block of the product on the left is Σₖ aᵢₖcₖⱼ·BD, which is precisely the row i, column j block of (AC) ⊗ (BD).

For A (n × n) and B (p × p) both nonsingular matrices, we can set C = A⁻¹ and D = B⁻¹ in [A.4.38] to deduce that

    (A ⊗ B)(A⁻¹ ⊗ B⁻¹) = (AA⁻¹) ⊗ (BB⁻¹) = Iₙ ⊗ Iₚ = Iₙₚ.

Thus,

    (A ⊗ B)⁻¹ = A⁻¹ ⊗ B⁻¹.    [A.4.39]
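Both the matrix geometric series [A.4.33] and the Kronecker inverse [A.4.39] are easy to verify numerically; the sketch assumes numpy is available, and the matrices are arbitrary examples chosen so that the eigenvalue conditions hold.

```python
import numpy as np

# An arbitrary matrix whose eigenvalues all lie inside the unit circle
A = np.array([[0.5, 0.2],
              [0.1, 0.3]])
assert np.all(np.abs(np.linalg.eigvals(A)) < 1)

# Partial sums of I + A + A^2 + ... converge to (I - A)^{-1}  [A.4.33]
S = sum(np.linalg.matrix_power(A, k) for k in range(200))
assert np.allclose(S, np.linalg.inv(np.eye(2) - A))

# (A (x) B)^{-1} = A^{-1} (x) B^{-1}  [A.4.39]
B = np.array([[2.0, 0.0],
              [1.0, 3.0]])
assert np.allclose(np.linalg.inv(np.kron(A, B)),
                   np.kron(np.linalg.inv(A), np.linalg.inv(B)))
```

Truncating the series at 200 terms is more than enough here, since the largest eigenvalue of A is well below 1 in modulus.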
Eigenvalues of a Kronecker Product

For A an (n x n) matrix with (possibly nondistinct) eigenvalues (\lambda_1, \lambda_2, \ldots, \lambda_n) and B a (p x p) matrix with eigenvalues (\mu_1, \mu_2, \ldots, \mu_p), the (np) eigenvalues of A \otimes B are given by \lambda_i \mu_j for i = 1, 2, \ldots, n and j = 1, 2, \ldots, p. To see this, write A and B in Jordan form as

A = M_A J_A M_A^{-1}    B = M_B J_B M_B^{-1}.

Then (M_A \otimes M_B) has inverse given by (M_A^{-1} \otimes M_B^{-1}). Moreover, we know from [A.4.28] that the eigenvalues of (A \otimes B) are the same as the eigenvalues of

(M_A^{-1} \otimes M_B^{-1})(A \otimes B)(M_A \otimes M_B) = (M_A^{-1} A M_A) \otimes (M_B^{-1} B M_B) = J_A \otimes J_B.

But J_A and J_B are both upper triangular, meaning that (J_A \otimes J_B) is upper triangular as well. The eigenvalues of (A \otimes B) are thus just the terms on the principal diagonal of (J_A \otimes J_B), which are given by \lambda_i \mu_j.

Positive Definite Matrices

An (n x n) real symmetric matrix A is said to be positive semidefinite if for any real (n x 1) vector x,

x'Ax >= 0.

We make the stronger statement that a real symmetric matrix A is positive definite if for any real nonzero (n x 1) vector x,

x'Ax > 0;

hence, any positive definite matrix could also be said to be positive semidefinite.

Let \lambda be an eigenvalue of A associated with the eigenvector x:

Ax = \lambda x.

Premultiplying this equation by x' results in

x'Ax = \lambda x'x.

Since an eigenvector x cannot be the zero vector, x'x > 0. Thus, for a positive semidefinite matrix A, any eigenvalue \lambda of A must be greater than or equal to zero. For A positive definite, all eigenvalues are strictly greater than zero. Since the determinant of A is the product of the eigenvalues, the determinant of a positive definite matrix A is strictly positive.

Let A be a positive definite (n x n) matrix and let B denote a nonsingular (n x n) matrix. Then B'AB is positive definite. To see this, let x be any nonzero vector. Define

\tilde{x} \equiv Bx.

Then \tilde{x} cannot be the zero vector, for if it were, this equation would state that there exists a nonzero vector x such that Bx = 0 \cdot x, in which case zero would be an eigenvalue of B associated with the eigenvector x.
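The Kronecker eigenvalue result is easy to see concretely. In the sketch below (not from the text; matrix values are illustrative) A and B are upper triangular, so their eigenvalues sit on their principal diagonals; A \otimes B is then upper triangular as well, and its diagonal consists exactly of the products \lambda_i \mu_j:

```python
def kron(a, b):
    """Kronecker product of two matrices stored as lists of lists."""
    p, q = len(b), len(b[0])
    return [[a[i // p][j // q] * b[i % p][j % q]
             for j in range(len(a[0]) * q)] for i in range(len(a) * p)]

A = [[2.0, 1.0], [0.0, 3.0]]   # upper triangular: eigenvalues 2 and 3
B = [[5.0, 4.0], [0.0, 7.0]]   # upper triangular: eigenvalues 5 and 7
K = kron(A, B)                 # (4 x 4), also upper triangular

diag = sorted(K[i][i] for i in range(4))
products = sorted(l * m for l in (2.0, 3.0) for m in (5.0, 7.0))
lower_triangle_zero = all(K[i][j] == 0.0 for i in range(4) for j in range(i))
```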
But since B is nonsingular, none of its eigenvalues can be zero. Thus, \tilde{x} = Bx cannot be the zero vector, and

x'B'ABx = \tilde{x}'A\tilde{x} > 0,

establishing that the matrix B'AB is positive definite.

A special case of this result is obtained by letting A be the identity matrix. Then the result implies that any matrix that can be written as B'B for some nonsingular matrix B is positive definite. More generally, any matrix that can be written as B'B for an arbitrary matrix B must be positive semidefinite:

x'B'Bx = \tilde{x}'\tilde{x} = \tilde{x}_1^2 + \tilde{x}_2^2 + \cdots + \tilde{x}_n^2 >= 0,  [A.4.40]

where \tilde{x} \equiv Bx.

The converse propositions are also true: if A is positive semidefinite, then there exists a matrix B such that A = B'B; if A is positive definite, then there exists a nonsingular matrix B such that A = B'B. A proof of this claim and an algorithm for calculating B are provided in Section 4.4.

Conjugate Transposes

Let A denote an (m x n) matrix of (possibly) complex numbers:

A = \begin{bmatrix} a_{11} + b_{11}i & a_{12} + b_{12}i & \cdots & a_{1n} + b_{1n}i \\ \vdots & \vdots & & \vdots \\ a_{m1} + b_{m1}i & a_{m2} + b_{m2}i & \cdots & a_{mn} + b_{mn}i \end{bmatrix}.

The conjugate transpose of A, denoted A^H, is formed by transposing A and replacing each element with its complex conjugate:

A^H = \begin{bmatrix} a_{11} - b_{11}i & a_{21} - b_{21}i & \cdots & a_{m1} - b_{m1}i \\ \vdots & \vdots & & \vdots \\ a_{1n} - b_{1n}i & a_{2n} - b_{2n}i & \cdots & a_{mn} - b_{mn}i \end{bmatrix}.

Thus, if A is real, then A^H and A' would denote the same matrix.

Notice that if an (n x 1) complex vector x is premultiplied by its conjugate transpose, the result is a nonnegative real scalar:

x^H x = [(a_1 - b_1 i) \;\; (a_2 - b_2 i) \;\; \cdots \;\; (a_n - b_n i)] \begin{bmatrix} a_1 + b_1 i \\ a_2 + b_2 i \\ \vdots \\ a_n + b_n i \end{bmatrix} = \sum_{j=1}^{n} (a_j^2 + b_j^2) >= 0.

For B a real (m x n) matrix and x a complex (n x 1) vector,

(Bx)^H = x^H B'.

More generally, if both B and x are complex,

(Bx)^H = x^H B^H.

Notice that if A is positive semidefinite, it can be written as A = B'B for some real matrix B, so that

x^H A x = x^H B'Bx = \tilde{x}^H \tilde{x} >= 0

with \tilde{x} \equiv Bx. Thus, x^H A x is a nonnegative real scalar for any x when A is positive semidefinite.
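The B'B argument above amounts to writing the quadratic form as a sum of squares. A minimal sketch (not from the text; B and the test vectors are illustrative) computes x'B'Bx as z'z with z = Bx and confirms it is strictly positive for a nonsingular B and nonzero x:

```python
def quad_form_BtB(B, x):
    """x' B'B x computed as z'z with z = Bx -- a sum of squares."""
    n = len(x)
    z = [sum(B[i][j] * x[j] for j in range(n)) for i in range(len(B))]
    return sum(v * v for v in z)

B = [[1.0, 2.0], [3.0, 4.0]]   # nonsingular (det = -2), so B'B is positive definite
test_vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [-1.0, 0.5]]
vals = [quad_form_BtB(B, x) for x in test_vectors]
```

Were B singular, the same computation would still give nonnegative values, but some nonzero x would map to z = 0 and the form would only be positive semidefinite.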
It is a positive real scalar for A positive definite.

Continuity of Functions of Vectors

A function of more than one argument, such as

y = f(x_1, x_2, \ldots, x_n),  [A.4.41]

is said to be continuous at (c_1, c_2, \ldots, c_n) if f(c_1, c_2, \ldots, c_n) is finite and if for every \epsilon > 0 there is a \delta > 0 such that

|f(x_1, x_2, \ldots, x_n) - f(c_1, c_2, \ldots, c_n)| < \epsilon

whenever \sqrt{(x_1 - c_1)^2 + (x_2 - c_2)^2 + \cdots + (x_n - c_n)^2} < \delta.

Partial Derivatives

The partial derivative of f with respect to x_i is defined by

\frac{\partial f}{\partial x_i} = \lim_{\Delta \to 0} \frac{f(x_1, \ldots, x_{i-1}, x_i + \Delta, x_{i+1}, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{\Delta}.  [A.4.42]

Gradient

If we collect the n partial derivatives in [A.4.42] in a vector, we obtain the gradient of the function f, denoted \nabla f:

\nabla f \equiv \begin{bmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \\ \vdots \\ \partial f/\partial x_n \end{bmatrix}.  [A.4.43]

For example, suppose f is a linear function:

f(x_1, x_2, \ldots, x_n) = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n.  [A.4.44]

Define a and x to be the following (n x 1) vectors:

a \equiv \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}  [A.4.45]    x \equiv \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.  [A.4.46]

Then [A.4.44] can be written

f(x) = a'x.

The partial derivative of f(\cdot) with respect to the ith argument is

\partial f/\partial x_i = a_i,

and the gradient is

\nabla f = a.

Second-Order Derivatives

A second-order derivative of [A.4.41] is given by

\frac{\partial^2 f(x_1, x_2, \ldots, x_n)}{\partial x_j \, \partial x_i} = \frac{\partial}{\partial x_j}\left[\frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_i}\right].

Where second-order derivatives exist and are continuous for all i and j, the order of differentiation is irrelevant:

\frac{\partial}{\partial x_j}\left[\frac{\partial f(x_1, \ldots, x_n)}{\partial x_i}\right] = \frac{\partial}{\partial x_i}\left[\frac{\partial f(x_1, \ldots, x_n)}{\partial x_j}\right].

Sometimes these second-order derivatives are collected in an (n x n) matrix H called the Hessian matrix:

H \equiv \begin{bmatrix} \partial^2 f/\partial x_1^2 & \cdots & \partial^2 f/\partial x_1 \partial x_n \\ \vdots & & \vdots \\ \partial^2 f/\partial x_n \partial x_1 & \cdots & \partial^2 f/\partial x_n^2 \end{bmatrix}.

We will also use the notation

\frac{\partial^2 f}{\partial x \, \partial x'}

to represent the matrix H.

Derivatives of Vector-Valued Functions

Suppose we have a set of m functions f_1(\cdot), f_2(\cdot), \ldots, f_m(\cdot), each of which depends on the n variables (x_1, x_2, \ldots, x_n). We can collect these m functions into a single vector-valued function

f(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_m(x) \end{bmatrix}.

We sometimes write

f: R^n \to R^m

to indicate that the function takes n different real numbers (summarized by the vector x, an element of R^n) and calculates m different new numbers (summarized by the value of f, an element of R^m). Suppose that each of the functions f_1(\cdot), f_2(\cdot), \ldots, f_m(\cdot) has derivatives with respect to each of the arguments x_1, x_2, \ldots, x_n.
We can summarize these derivatives in an (m x n) matrix, called the Jacobian matrix of f and indicated by \partial f/\partial x':

\frac{\partial f}{\partial x'} \equiv \begin{bmatrix} \partial f_1/\partial x_1 & \partial f_1/\partial x_2 & \cdots & \partial f_1/\partial x_n \\ \partial f_2/\partial x_1 & \partial f_2/\partial x_2 & \cdots & \partial f_2/\partial x_n \\ \vdots & \vdots & & \vdots \\ \partial f_m/\partial x_1 & \partial f_m/\partial x_2 & \cdots & \partial f_m/\partial x_n \end{bmatrix}.

For example, suppose that each of the functions f_i(x) is linear:

f_1(x) = a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n
f_2(x) = a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n
\vdots
f_m(x) = a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n.

We could write this system in matrix form as

f(x) = Ax,

where

A \equiv \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}

and x is the (n x 1) vector defined in [A.4.46]. Then

\frac{\partial f}{\partial x'} = A.

Taylor's Theorem with Multiple Arguments

Let f: R^n \to R^1 as in [A.4.41], with continuous second derivatives. A first-order Taylor series expansion of f(x) around c is given by

f(x) = f(c) + \left.\frac{\partial f}{\partial x'}\right|_{x=c} (x - c) + R_1(c, x).  [A.4.47]

Here \partial f/\partial x' denotes the (1 x n) vector that is the transpose of the gradient, and the remainder R_1(\cdot) satisfies

R_1(c, x) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \left.\frac{\partial^2 f}{\partial x_i \, \partial x_j}\right|_{x = \xi(i,j)} (x_i - c_i)(x_j - c_j)

for \xi(i, j) an (n x 1) vector, potentially different for each i and j, with each \xi(i, j) between c and x; that is,

\xi(i, j) = \lambda(i, j)c + [1 - \lambda(i, j)]x

for some \lambda(i, j) between 0 and 1. Furthermore,

\lim_{x \to c} \frac{R_1(c, x)}{\sqrt{(x - c)'(x - c)}} = 0.

An implication of [A.4.47] is that if we wish to approximate the consequences for f of simultaneously changing x_1 by \Delta_1, x_2 by \Delta_2, \ldots, and x_n by \Delta_n, we could use

f(c_1 + \Delta_1, c_2 + \Delta_2, \ldots, c_n + \Delta_n) \cong f(c_1, c_2, \ldots, c_n) + \frac{\partial f}{\partial x_1}\Delta_1 + \frac{\partial f}{\partial x_2}\Delta_2 + \cdots + \frac{\partial f}{\partial x_n}\Delta_n.

If f(\cdot) has continuous third derivatives, a second-order Taylor series expansion of f(x) around c is given by

f(x) = f(c) + \left.\frac{\partial f}{\partial x'}\right|_{x=c}(x - c) + \frac{1}{2}(x - c)'\left.\frac{\partial^2 f}{\partial x \, \partial x'}\right|_{x=c}(x - c) + R_2(c, x),  [A.4.48]

where the remainder R_2(c, x) involves the third derivatives of f evaluated at points between c and x, and

\lim_{x \to c} \frac{R_2(c, x)}{(x - c)'(x - c)} = 0.

Multiple Integrals

The notation

\int_a^b \int_c^d f(x, y) \, dy \, dx

indicates the following operation: first integrate

\int_c^d f(x, y) \, dy

with respect to y, with x held fixed, and then integrate the resulting function with respect to x. For example, if f(x, y) = x \cdot y,

\int_0^2 \int_0^5 xy \, dy \, dx = \int_0^2 x\left[\frac{5^2}{2} - \frac{0^2}{2}\right] dx = \frac{25}{2}\left[\frac{2^2}{2} - \frac{0^2}{2}\right] = 25.

Provided that f(x, y) is continuous, the order of integration can be reversed. For example,

\int_0^5 \int_0^2 xy \, dx \, dy = \int_0^5 \left[\frac{2^2}{2}\right] y \, dy = 2 \cdot \frac{5^2}{2} = 25.
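The iterated-integral example can be checked numerically. The sketch below (not from the text; the grid size is an illustrative choice) approximates the double integral of f(x, y) = xy over 0 <= x <= 2, 0 <= y <= 5 with a midpoint Riemann sum, in both orders of integration:

```python
def iterated_integral(f, a, b, c, d, n=400):
    """Midpoint-rule approximation of int_a^b int_c^d f(x, y) dy dx
    on an n-by-n grid."""
    hx, hy = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * hx            # midpoint of cell i in x
        for j in range(n):
            y = c + (j + 0.5) * hy        # midpoint of cell j in y
            total += f(x, y) * hx * hy
    return total

f = lambda x, y: x * y
v_dy_dx = iterated_integral(f, 0.0, 2.0, 0.0, 5.0)                   # y integrated inside
v_dx_dy = iterated_integral(lambda y, x: x * y, 0.0, 5.0, 0.0, 2.0)  # x integrated inside
```

Both orders agree with the analytic answer of 25; for this bilinear integrand the midpoint rule is exact up to floating-point rounding.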
A.5. Probability and Statistics

Densities and Distributions

A stochastic or random variable X is said to be discrete-valued if it can take on only one of K particular values; call these x_1, x_2, \ldots, x_K. Its probability distribution is a set of numbers that give the probability of each outcome:

P\{X = x_k\} = probability that X takes on the value x_k, for k = 1, \ldots, K.

The probabilities sum to unity:

\sum_{k=1}^{K} P\{X = x_k\} = 1.

Assuming that the possible outcomes are ordered x_1 < x_2 < \cdots < x_K, the cumulative distribution function gives the probability that X is less than or equal to any particular value x. A continuous-valued random variable X is instead described by a density f_X(x), with the probability that X falls in a given interval obtained by integrating the density over that interval. A density must satisfy

\int_{-\infty}^{\infty} f_X(x) \, dx = 1.  [A.5.1]

Two random variables X and Y are described by their joint density f_{XY}(x, y). The marginal density of Y is obtained by integrating the joint density over x:

f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y) \, dx,  [A.5.6]

with the marginal density of X defined analogously. The conditional density of Y given that X takes on the value x is defined as

f_{Y|X}(y|x) = \frac{f_{XY}(x, y)}{f_X(x)}  [A.5.7]

for f_X(x) > 0. Notice that this satisfies the requirement of a density [A.5.1]:

\int_{-\infty}^{\infty} f_{Y|X}(y|x) \, dy = \frac{1}{f_X(x)} \int_{-\infty}^{\infty} f_{XY}(x, y) \, dy = \frac{f_X(x)}{f_X(x)} = 1.

A further obvious implication of the definition in [A.5.7] is that a joint density can be written as the product of the marginal density and the conditional density:

f_{XY}(x, y) = f_{Y|X}(y|x) \cdot f_X(x).  [A.5.8]

The conditional expectation of Y given that the random variable X takes on the particular value x is

E(Y|X = x) = \int_{-\infty}^{\infty} y \cdot f_{Y|X}(y|x) \, dy.  [A.5.9]

Law of Iterated Expectations

Note that the conditional expectation is a function of the value of the random variable X. For different realizations of X, the conditional expectation will be a different number. Suppose we view E(Y|X) as a random variable and take its expectation with respect to the distribution of X:

E_X[E_{Y|X}(Y|X)] = \int_{-\infty}^{\infty} \left[\int_{-\infty}^{\infty} y \cdot f_{Y|X}(y|x) \, dy\right] f_X(x) \, dx.

Results [A.5.8] and [A.5.6] can be used to express this expectation as

\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y \cdot f_{XY}(x, y) \, dy \, dx = \int_{-\infty}^{\infty} y \cdot f_Y(y) \, dy.

Thus,

E_X[E_{Y|X}(Y|X)] = E_Y(Y).  [A.5.10]

In words, the random variable E(Y|X) has the same expectation as the random variable Y. This is known as the law of iterated expectations.

Independence

The variables Y and X are said to be independent if

f_{XY}(x, y) = f_X(x) \cdot f_Y(y).  [A.5.11]

Comparing [A.5.11] with [A.5.8], if Y and X are independent, then

f_{Y|X}(y|x) = f_Y(y).  [A.5.12]

Covariance

Let \mu_X denote E(X) and \mu_Y denote E(Y). The population covariance between X and Y is given by

Cov(X, Y) \equiv \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f_{XY}(x, y) \, dy \, dx.  [A.5.13]
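The law of iterated expectations [A.5.10] can be checked directly on a small discrete distribution. In the sketch below (not from the text; the joint probabilities are an illustrative choice), E_X[E(Y|X)] built from the conditionals equals E(Y) computed straight from the joint distribution:

```python
# Joint distribution P(X = x, Y = y) over X in {0, 1}, Y in {1, 3}.
joint = {
    (0, 1): 0.1, (0, 3): 0.3,
    (1, 1): 0.4, (1, 3): 0.2,
}

# Marginal of X and the unconditional mean of Y.
px = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
ey = sum(y * p for (_, y), p in joint.items())                  # E(Y)

def cond_exp_y(x):
    """E(Y | X = x) = sum_y y * P(x, y) / P(X = x)."""
    return sum(y * p for (xx, y), p in joint.items() if xx == x) / px[x]

iterated = sum(px[x] * cond_exp_y(x) for x in (0, 1))           # E_X[E(Y|X)]
```

Here E(Y|X=0) = 2.5 and E(Y|X=1) = 5/3, and weighting them by P(X=0) = 0.4 and P(X=1) = 0.6 recovers E(Y) = 2.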
Correlation

The population correlation between X and Y is given by

Corr(X, Y) \equiv \frac{Cov(X, Y)}{\sqrt{Var(X)} \cdot \sqrt{Var(Y)}}.

If the covariance (or correlation) between X and Y is zero, then X and Y are said to be uncorrelated.

Relation Between Correlation and Independence

Note that if X and Y are independent, then they are uncorrelated:

Cov(X, Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f_X(x) f_Y(y) \, dy \, dx = \int_{-\infty}^{\infty} (x - \mu_X) \left[\int_{-\infty}^{\infty} (y - \mu_Y) f_Y(y) \, dy\right] f_X(x) \, dx.

Furthermore,

\int_{-\infty}^{\infty} (y - \mu_Y) f_Y(y) \, dy = \int_{-\infty}^{\infty} y f_Y(y) \, dy - \mu_Y \int_{-\infty}^{\infty} f_Y(y) \, dy = \mu_Y - \mu_Y = 0.

Thus, if X and Y are independent, then Cov(X, Y) = 0, as claimed.

The converse proposition, however, is not true: the fact that X and Y are uncorrelated is not enough to deduce that they are independent. To construct a counterexample, suppose that Z and Y are independent random variables, each with mean zero, and let X = Z \cdot Y. Then \mu_X = E(Z) \cdot E(Y) = 0, and

E[(X - \mu_X)(Y - \mu_Y)] = E[(ZY) \cdot Y] = E(Z) \cdot E(Y^2) = 0,

and so X and Y are uncorrelated. They are not, however, independent: the value of X = Z \cdot Y depends on Y.

Orthogonality

Consider a sample of size T on two variables, \{x_1, x_2, \ldots, x_T\} and \{y_1, y_2, \ldots, y_T\}. The two variables are said to be orthogonal if

\sum_{t=1}^{T} x_t y_t = 0.

Thus, orthogonality is the sample analog of absence of correlation. For example, let x_t = 1 denote a sequence of constants and let y_t = w_t - \bar{w}, where \bar{w} = (1/T)\sum_{t=1}^{T} w_t is the sample mean of the variable w. Then x and y are orthogonal:

\sum_{t=1}^{T} x_t y_t = \sum_{t=1}^{T} (w_t - \bar{w}) = \sum_{t=1}^{T} w_t - T\bar{w} = 0.

Population Moments of Sums

Consider the random variable aX + bY. Its mean is given by

E(aX + bY) = \int \int (ax + by) f_{XY}(x, y) \, dy \, dx = a \int \int x \, f_{XY}(x, y) \, dy \, dx + b \int \int y \, f_{XY}(x, y) \, dy \, dx,

and so

E(aX + bY) = a \cdot E(X) + b \cdot E(Y).  [A.5.14]

The variance of (aX + bY) is

Var(aX + bY) = \int \int [(ax + by) - (a\mu_X + b\mu_Y)]^2 f_{XY}(x, y) \, dy \, dx
= \int \int [a^2(x - \mu_X)^2 + 2ab(x - \mu_X)(y - \mu_Y) + b^2(y - \mu_Y)^2] f_{XY}(x, y) \, dy \, dx
= a^2 \int \int (x - \mu_X)^2 f_{XY}(x, y) \, dy \, dx + 2ab \int \int (x - \mu_X)(y - \mu_Y) f_{XY}(x, y) \, dy \, dx + b^2 \int \int (y - \mu_Y)^2 f_{XY}(x, y) \, dy \, dx.

Thus,

Var(aX + bY) = a^2 \cdot Var(X) + 2ab \cdot Cov(X, Y) + b^2 \cdot Var(Y).  [A.5.15]

When X and Y are uncorrelated,

Var(aX + bY) = a^2 \cdot Var(X) + b^2 \cdot Var(Y).
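The uncorrelated-but-dependent counterexample with X = Z \cdot Y can be made fully explicit with discrete distributions. The sketch below (not from the text; the supports and probabilities are illustrative choices) takes Z = \pm 1 with equal probability, independent of Y in \{1, 2\}: the covariance is exactly zero, yet the joint distribution of (X, Y) is not the product of the marginals:

```python
# Z and Y independent; X = Z * Y. Enumerate the joint distribution of (X, Y).
pz = {-1: 0.5, 1: 0.5}
py = {1: 0.5, 2: 0.5}

pxy = {}
for z, pz_ in pz.items():
    for y, py_ in py.items():
        key = (z * y, y)                       # (X, Y) outcome
        pxy[key] = pxy.get(key, 0.0) + pz_ * py_

ex = sum(x * p for (x, _), p in pxy.items())   # E(X) = E(Z)E(Y) = 0
ey = sum(y * p for (_, y), p in pxy.items())   # E(Y) = 1.5
cov = sum((x - ex) * (y - ey) * p for (x, y), p in pxy.items())

# Dependence: P(X = 2, Y = 1) = 0, yet P(X = 2) * P(Y = 1) > 0.
p_x2 = sum(p for (x, _), p in pxy.items() if x == 2)
p_joint_2_1 = pxy.get((2, 1), 0.0)
```

Knowing Y clearly restricts the possible values of X (X must be \pm Y), so the variables are dependent even though Cov(X, Y) = 0.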
It is straightforward to generalize results [A.5.14] and [A.5.15]. If \{X_1, X_2, \ldots, X_n\} denotes a collection of n random variables, then

E(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + \cdots + a_n E(X_n)  [A.5.16]

Var(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1^2 Var(X_1) + a_2^2 Var(X_2) + \cdots + a_n^2 Var(X_n) + 2a_1 a_2 Cov(X_1, X_2) + 2a_1 a_3 Cov(X_1, X_3) + \cdots + 2a_1 a_n Cov(X_1, X_n) + 2a_2 a_3 Cov(X_2, X_3) + \cdots + 2a_{n-1} a_n Cov(X_{n-1}, X_n).  [A.5.17]

If the X's are uncorrelated, then [A.5.17] simplifies to

Var(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1^2 Var(X_1) + a_2^2 Var(X_2) + \cdots + a_n^2 Var(X_n).  [A.5.18]

Cauchy-Schwarz Inequality

The Cauchy-Schwarz inequality states that for any random variables X and Y whose variances and covariance exist, the correlation is no greater than unity in absolute value:

-1 <= Corr(X, Y) <= 1.  [A.5.19]

To establish the far right inequality in [A.5.19], consider the random variable

Z = \frac{X - \mu_X}{\sqrt{Var(X)}} - \frac{Y - \mu_Y}{\sqrt{Var(Y)}}.

The square of this variable cannot take on negative values, so

0 <= E(Z^2) = E\left[\frac{X - \mu_X}{\sqrt{Var(X)}} - \frac{Y - \mu_Y}{\sqrt{Var(Y)}}\right]^2.

Recognizing that Var(X) and Var(Y) denote population moments (as opposed to random variables), equation [A.5.15] can be used to deduce

E(Z^2) = \frac{E(X - \mu_X)^2}{Var(X)} - \frac{2 E[(X - \mu_X)(Y - \mu_Y)]}{\sqrt{Var(X)} \sqrt{Var(Y)}} + \frac{E(Y - \mu_Y)^2}{Var(Y)} = 1 - 2 \, Corr(X, Y) + 1 >= 0,

meaning that Corr(X, Y) <= 1.

To establish the far left inequality in [A.5.19], notice that

0 <= E\left[\frac{X - \mu_X}{\sqrt{Var(X)}} + \frac{Y - \mu_Y}{\sqrt{Var(Y)}}\right]^2 = 1 + 2 \, Corr(X, Y) + 1,

so that Corr(X, Y) >= -1.

The Normal Distribution

The variable Y_t has a Gaussian, or Normal, distribution with mean \mu and variance \sigma^2 if

f_{Y_t}(y_t) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(y_t - \mu)^2}{2\sigma^2}\right].  [A.5.20]

We write

Y_t \sim N(\mu, \sigma^2)

to indicate that the density of Y_t is given by [A.5.20]. Centered odd-ordered population moments for a Gaussian variable are zero:

E(Y_t - \mu)^r = 0  for r = 1, 3, 5, \ldots.

The centered fourth moment is

E(Y_t - \mu)^4 = 3\sigma^4.

Skew and Kurtosis

The skewness of a variable Y_t with mean \mu is represented by

\frac{E(Y_t - \mu)^3}{[Var(Y_t)]^{3/2}}.

A variable with a negative skew is more likely to be far below the mean than it is to be far above the mean. The kurtosis is

\frac{E(Y_t - \mu)^4}{[Var(Y_t)]^2}.
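The standardized-variable argument behind the Cauchy-Schwarz inequality can be verified on any concrete distribution. In the sketch below (not from the text; the joint probabilities are illustrative), the correlation lands inside [-1, 1], and the expected squared difference of the standardized variables equals 2 - 2 Corr(X, Y), which is what forces Corr(X, Y) <= 1:

```python
import math

# An arbitrary joint distribution over binary X and Y.
pxy = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.15, (1, 1): 0.55}

def moment(g):
    """Population expectation of g(X, Y) under the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in pxy.items())

ex, ey = moment(lambda x, y: x), moment(lambda x, y: y)
vx = moment(lambda x, y: (x - ex) ** 2)
vy = moment(lambda x, y: (y - ey) ** 2)
cov = moment(lambda x, y: (x - ex) * (y - ey))
corr = cov / math.sqrt(vx * vy)

# E[( (X - mu_X)/sd_X - (Y - mu_Y)/sd_Y )^2] = 2 - 2*Corr(X, Y) >= 0.
sx, sy = math.sqrt(vx), math.sqrt(vy)
d2 = moment(lambda x, y: ((x - ex) / sx - (y - ey) / sy) ** 2)
```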
A distribution whose kurtosis exceeds 3 has more mass in the tails than a Gaussian distribution with the same variance.

Other Useful Univariate Distributions

Let (X_1, X_2, \ldots, X_n) be independent and identically distributed (i.i.d.) N(0, 1) variables, and consider the sum of their squares:

Y = X_1^2 + X_2^2 + \cdots + X_n^2.

Then Y is said to have a chi-square distribution with n degrees of freedom, denoted

Y \sim \chi^2(n).

Let X \sim N(0, 1) and Y \sim \chi^2(n), with X and Y independent. Then

Z = \frac{X}{\sqrt{Y/n}}

is said to have a t distribution with n degrees of freedom, denoted Z \sim t(n).

Let Y_1 \sim \chi^2(n_1) and Y_2 \sim \chi^2(n_2) with Y_1 and Y_2 independent. Then

Z = \frac{Y_1/n_1}{Y_2/n_2}

is said to have an F distribution with n_1 numerator degrees of freedom and n_2 denominator degrees of freedom, denoted Z \sim F(n_1, n_2). Note that if Z \sim t(n), then Z^2 \sim F(1, n).

Likelihood Function

Suppose we have observed a sample of size T on some random variable Y_t. Let f_{Y_T, Y_{T-1}, \ldots, Y_1}(y_T, y_{T-1}, \ldots, y_1; \theta) denote the joint density of Y_1, Y_2, \ldots, Y_T. The notation emphasizes that this joint density is presumed to depend on a vector of population parameters \theta. If we view this joint density as a function of \theta (given the data on Y), the result is called the sample likelihood function.

For example, consider a sample of T i.i.d. variables drawn from a N(\mu, \sigma^2) distribution. For this distribution, \theta = (\mu, \sigma^2)', and from [A.5.11] the joint density is the product of individual terms such as [A.5.20]:

f_{Y_T, \ldots, Y_1}(y_T, \ldots, y_1; \mu, \sigma^2) = \prod_{t=1}^{T} f_{Y_t}(y_t; \mu, \sigma^2).

The log of the joint density is the sum of the logs of these terms:

\log f_{Y_T, \ldots, Y_1}(y_T, \ldots, y_1; \mu, \sigma^2) = \sum_{t=1}^{T} \log f_{Y_t}(y_t; \mu, \sigma^2) = -(T/2) \log(2\pi) - (T/2) \log(\sigma^2) - \sum_{t=1}^{T} \frac{(y_t - \mu)^2}{2\sigma^2}.  [A.5.21]

Thus, for a sample of T Gaussian random variables with mean \mu and variance \sigma^2, the sample log likelihood function, denoted \mathcal{L}(\mu, \sigma^2; y_1, y_2, \ldots, y_T), is given by

\mathcal{L}(\mu, \sigma^2; y_1, y_2, \ldots, y_T) = -(T/2) \log(2\pi) - (T/2) \log(\sigma^2) - \sum_{t=1}^{T} \frac{(y_t - \mu)^2}{2\sigma^2}.  [A.5.22]

In calculating the sample log likelihood function, any constant term that does not involve the parameters \mu or \sigma^2 can be ignored for most purposes. In [A.5.22], this constant term is \kappa = -(T/2) \log(2\pi).
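The Gaussian sample log likelihood [A.5.22] and its maximizers are easy to code up directly. The sketch below (not from the text; the data values are illustrative) evaluates the log likelihood and confirms that the sample mean and the average squared deviation give a higher likelihood value than nearby alternative parameter values:

```python
import math

y = [1.2, 0.7, 2.1, 1.5, 0.5]     # illustrative sample
T = len(y)

def loglik(mu, sigma2):
    """Gaussian sample log likelihood, equation [A.5.22]."""
    return (-(T / 2) * math.log(2 * math.pi)
            - (T / 2) * math.log(sigma2)
            - sum((yt - mu) ** 2 for yt in y) / (2 * sigma2))

mu_hat = sum(y) / T                                    # sample mean
sigma2_hat = sum((yt - mu_hat) ** 2 for yt in y) / T   # sample variance

best = loglik(mu_hat, sigma2_hat)
```

As the next subsection shows analytically, (mu_hat, sigma2_hat) is exactly where the derivatives of [A.5.22] vanish, so perturbing either parameter can only lower the log likelihood.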
Maximum Likelihood Estimation

For a given sample of observations (y_1, y_2, \ldots, y_T), the value of \theta that makes the sample likelihood as large as possible is called the maximum likelihood estimate (MLE) of \theta. For example, the maximum likelihood estimate of the population mean \mu for an i.i.d. sample of size T from a N(\mu, \sigma^2) distribution is found by setting the derivative of [A.5.22] with respect to \mu equal to zero:

\frac{\partial \mathcal{L}}{\partial \mu} = \sum_{t=1}^{T} \frac{y_t - \mu}{\sigma^2} = 0,

whose solution is

\hat{\mu} = (1/T) \sum_{t=1}^{T} y_t.  [A.5.23]

Setting the derivative of [A.5.22] with respect to \sigma^2 equal to zero gives

-\frac{T}{2\sigma^2} + \sum_{t=1}^{T} \frac{(y_t - \mu)^2}{2\sigma^4} = 0.  [A.5.24]

Substituting [A.5.23] into [A.5.24] and solving for \sigma^2 gives

\hat{\sigma}^2 = (1/T) \sum_{t=1}^{T} (y_t - \hat{\mu})^2.  [A.5.25]

Thus, the sample mean is the MLE of the population mean and the sample variance is the MLE of the population variance for an i.i.d. sample of Gaussian variables.

The Multivariate Gaussian Distribution

Let

Y = (Y_1, Y_2, \ldots, Y_n)'

be a collection of n random variables. The vector Y has a multivariate Normal, or multivariate Gaussian, distribution if its density takes the form

f_Y(y) = (2\pi)^{-n/2} |\Omega|^{-1/2} \exp[(-1/2)(y - \mu)' \Omega^{-1} (y - \mu)].  [A.5.26]

The mean of Y is given by the vector \mu and its variance-covariance matrix is \Omega:

E(Y) = \mu    E[(Y - \mu)(Y - \mu)'] = \Omega.

Note that (Y - \mu)(Y - \mu)' is symmetric and positive semidefinite for any Y, meaning that any variance-covariance matrix must be symmetric and positive semidefinite; the form of the likelihood in [A.5.26] assumes that \Omega is positive definite. Result [A.4.15] is sometimes used to write the multivariate Gaussian density in an equivalent form:

f_Y(y) = (2\pi)^{-n/2} |\Omega^{-1}|^{1/2} \exp[(-1/2)(y - \mu)' \Omega^{-1} (y - \mu)].

If Y \sim N(\mu, \Omega), then for any nonstochastic (r x n) matrix H' and (r x 1) vector b,

H'Y + b \sim N((H'\mu + b), H'\Omega H).

Correlation and Independence for Multivariate Gaussian Variates

If Y has a multivariate Gaussian distribution, then absence of correlation implies independence. To see this, note that if the elements of Y are uncorrelated, then E[(Y_i - \mu_i)(Y_j - \mu_j)] = 0 for i \neq j and the off-diagonal elements of \Omega are zero.
In that case \Omega is diagonal:

\Omega = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix},  [A.5.27]

so that

|\Omega| = \sigma_1^2 \sigma_2^2 \cdots \sigma_n^2    \Omega^{-1} = \begin{bmatrix} 1/\sigma_1^2 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1/\sigma_n^2 \end{bmatrix}.  [A.5.28]

Substituting [A.5.27] and [A.5.28] into [A.5.26] produces

f_Y(y) = (2\pi)^{-n/2} [\sigma_1^2 \sigma_2^2 \cdots \sigma_n^2]^{-1/2} \exp[(-1/2)\{(y_1 - \mu_1)^2/\sigma_1^2 + (y_2 - \mu_2)^2/\sigma_2^2 + \cdots + (y_n - \mu_n)^2/\sigma_n^2\}]
= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left[-\frac{(y_i - \mu_i)^2}{2\sigma_i^2}\right],

which is the product of n univariate Gaussian densities. Since the joint density is the product of the individual densities, the random variables (Y_1, Y_2, \ldots, Y_n) are independent.

Probability Limit

Let \{X_1, X_2, \ldots, X_T\} denote a sequence of random variables. Often we are interested in what happens to this sequence as T becomes large. For example, X_T might denote the sample mean of T observations:

X_T = (1/T)(Y_1 + Y_2 + \cdots + Y_T),  [A.5.29]

in which case we might want to know the properties of the sample mean as the size of the sample T grows large.

The sequence \{X_1, X_2, \ldots, X_T\} is said to converge in probability to c if for every \epsilon > 0 and \delta > 0 there exists a value N such that, for all T >= N,

P\{|X_T - c| > \delta\} < \epsilon.  [A.5.30]

When this is the case, c is called the probability limit, or plim, of the sequence, written plim X_T = c.

The sequence is said to converge in mean square to c if for every \epsilon > 0 there exists a value N such that, for all T >= N,

E(X_T - c)^2 < \epsilon.  [A.5.31]

We indicate that the sequence converges to c in mean square as follows:

X_T \xrightarrow{m.s.} c.

Convergence in mean square implies convergence in probability, but convergence in probability does not imply convergence in mean square.
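The behavior described in [A.5.29]-[A.5.31] can be illustrated by simulation. The sketch below (not from the text; the seed, sample sizes, and replication count are illustrative choices) draws uniform(0, 1) variables, for which E(Y) = 0.5, and estimates E(X_T - 0.5)^2 by Monte Carlo: the mean squared error of the sample mean shrinks as T grows, consistent with convergence in mean square (and hence in probability) to 0.5:

```python
import random

random.seed(0)   # fixed seed so the simulation is reproducible

def sample_mean(T):
    """X_T = (1/T)(Y_1 + ... + Y_T) for uniform(0, 1) draws."""
    return sum(random.random() for _ in range(T)) / T

def mse(T, reps=200):
    """Monte Carlo estimate of E(X_T - 0.5)^2 over `reps` replications."""
    return sum((sample_mean(T) - 0.5) ** 2 for _ in range(reps)) / reps

mse_small = mse(10)     # roughly Var(Y)/10  = 1/120
mse_large = mse(1000)   # roughly Var(Y)/1000 = 1/12000
```

The theoretical values are Var(Y)/T = 1/(12T), so increasing T by a factor of 100 should shrink the mean squared error by about the same factor.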
