Econometrics Lecture 1:
Review of Matrix Algebra
R. G. Pierse
1 Introduction
A matrix is a rectangular array of numbers. If the matrix has n rows and m
columns it is said to be an n × m matrix. This is called the dimension of the
matrix. A matrix with a single column (n × 1) is called a column vector. A
matrix with a single row (1 × m) is called a row vector. A matrix with only one
row and one column (a single number) is called a scalar.
The standard convention for denoting a matrix is to use a capital letter in
bold typeface as in A, B, C. A column vector is denoted with a lowercase letter
in bold typeface as in a, b, c. A row vector is denoted with a lowercase letter in
bold typeface, followed by a prime, as in a′, b′, c′. A scalar is generally denoted
with a lowercase letter in normal typeface as in a, b, c.
An n × m matrix A can be written out explicitly in terms of its elements as
in:
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2m} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{im} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nj} & \cdots & a_{nm}
\end{pmatrix}.
\]
Each element has two subscripts: the first is the row index and the second the
column index so that aij refers to the element in the ith row and jth column of
A.
2 Matrix Operations
The standard operations of addition, subtraction and multiplication can be defined
for two matrices as long as the dimensions of the matrices satisfy appropriate
conditions to ensure that the operation makes sense. If so, then the two matrices
are said to be conformable for the operation. If not, then the operation is not
defined for these matrices.
2.1 Matrix Addition and Subtraction
C = A + B; cij = aij + bij
C = A − B; cij = aij − bij
For A and B to be conformable for addition or subtraction, they must be of the
same dimension (n × m). Then the resultant matrix C will also be n × m with
each element equal to the sum (difference) of the corresponding elements of A
and B. Matrix addition obeys the rules that
A + B = B + A
and
A + (B + C) = (A + B) + C.
2.2 Matrix Multiplication
\[
C = AB; \qquad c_{ij} = \sum_{k} a_{ik} b_{kj}
\]
For A and B to be conformable for matrix multiplication, the number of
columns of A must be equal to the number of rows of B. If A is of dimension
n × m and B is of dimension m × p, then the resultant matrix C will be of
dimension n × p. The ijth element of C is the sum of the product of the elements
of the ith row of A and the jth column of B.
Note that, except under very special conditions,
AB ≠ BA
and in fact both products will only be defined in the special case that p = n.
Because of the fact that the order of multiplication matters, it is important to
distinguish between pre-multiplying and post-multiplying a matrix.
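As a simple numerical illustration of both the multiplication rule and the importance
of the order of multiplication, consider the 2 × 2 matrices
\[
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad
B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.
\]
Then
\[
AB = \begin{pmatrix} 1 \cdot 0 + 2 \cdot 1 & 1 \cdot 1 + 2 \cdot 0 \\
3 \cdot 0 + 4 \cdot 1 & 3 \cdot 1 + 4 \cdot 0 \end{pmatrix}
= \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}
\qquad \text{but} \qquad
BA = \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix},
\]
so that AB ≠ BA: post-multiplying by B swaps the columns of A, while pre-multiplying
by B swaps its rows.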
Matrix products obey the associative rule
A(BC) = (AB)C = ABC
and the distributive rule
A(B + C) = AB + AC.
2.3 Matrix Transposition
The transpose of an n × m matrix A, denoted as A′, is the m × n matrix defined
by
C = A′; cij = aji
so that the ith row of C is the ith column of A. The transpose operator obeys
the rules that
(A + B)′ = A′ + B′
and
(AB)′ = B′A′.
3 Square Matrices
A matrix with the same number of columns as rows is called a square matrix.
The number of rows (columns) is called the order of the matrix. The elements
with row index equal to column index as in a11 , a22 , etc. are called the diagonal
elements and the elements aij , i ≠ j, are called the off-diagonal elements.
3.1 The trace operator
The trace of a square matrix, denoted tr, is the sum of its diagonal elements
\[
\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}.
\]
The trace operator obeys the rules that
tr(A + B) = tr(A) + tr(B)
and
tr(AB) = tr(BA)
if both AB and BA exist.
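For example, for the 2 × 2 matrices A and B of Section 2.2, tr(A) = 1 + 4 = 5 and
\[
\operatorname{tr}(AB) = 2 + 3 = 5 = 3 + 2 = \operatorname{tr}(BA),
\]
even though AB ≠ BA.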
3.2 Special matrices
3.2.1 Symmetric matrices
A square matrix A that satisfies the property A = A′ is said to be symmetric. It
has the property that aij = aji for all values of the indices i and j.
3.2.2 Diagonal matrices
A square matrix with all off-diagonal elements equal to zero is called a diagonal
matrix. A diagonal matrix is symmetric.
3.2.3 Triangular matrices
A square matrix with all elements below the diagonal equal to zero is called an
upper triangular matrix. Similarly a matrix with all elements above the diagonal
equal to zero is called a lower triangular matrix.
3.2.4 The Identity matrix
The square matrix of order n with all diagonal elements equal to one, and all
off-diagonal elements equal to zero is called the identity matrix of order n and
is denoted as In . The identity matrix is symmetric and diagonal. It has the
property that, for any n × m matrix A,
In A = A and AIm = A
so that any matrix when pre- or post-multiplied by the identity matrix is un-
changed. The identity matrix is the equivalent of the number one in standard
(scalar) algebra.
4 Matrix Inversion
If A is a square n × n matrix, then it may or may not be possible to find a square
n × n matrix B such that
AB = In .
If B does exist then it is called the inverse of A and is written A−1 . Where the
matrix inverse exists, it satisfies
AA−1 = In and A−1 A = In .
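For example, it is easily verified that
\[
A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}
\qquad \text{has inverse} \qquad
A^{-1} = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix},
\]
since
\[
AA^{-1} = \begin{pmatrix} 2 \cdot 1 + 1 \cdot (-1) & 2 \cdot (-1) + 1 \cdot 2 \\
1 \cdot 1 + 1 \cdot (-1) & 1 \cdot (-1) + 1 \cdot 2 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2 ,
\]
and similarly A−1 A = I2 .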
To state the conditions for the existence of the matrix inverse A−1 we need to
consider the concept of linear independence of a set of vectors and the concept of
the rank of a matrix.
4.1 Linear independence
Let a1 , a2 , · · · , am be a set of column vectors of dimension n × 1, and let λ1 , λ2 ,
· · · , λm be a set of scalar weights. Then the vector c defined by
\[
c = \sum_{i=1}^{m} \lambda_i a_i
\]
is called a linear combination of the vectors a1 , a2 , · · · , am .
Under what conditions on the weights λi will this linear combination be equal
to the n × 1 zero column vector 0n ? Clearly this will be the case if all the weights
are zero, λi = 0, ∀i. If this is the only condition under which c = 0 then the
vectors a1 , a2 , · · · , am are called linearly independent. However, if there are values
for the λi such that λ1 a1 + λ2 a2 + · · · + λm am = 0 with at least one λi ≠ 0, then the
vectors ai are said to be linearly dependent.
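For example, the vectors a1 = (1, 2)′ and a2 = (2, 4)′ are linearly dependent, since
2a1 − a2 = 0. In contrast, the vectors a1 = (1, 0)′ and a2 = (0, 1)′ are linearly
independent, since λ1 a1 + λ2 a2 = (λ1 , λ2 )′ = 0 requires λ1 = λ2 = 0.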
If a set of vectors is linearly dependent, then it is possible to write one of the
vectors as a linear combination of the others. For example, if λ1 a1 + · · · + λm am = 0
with λj ≠ 0, then
\[
a_j = -\frac{1}{\lambda_j} \sum_{i \neq j} \lambda_i a_i .
\]
Note that if m > n, then the set of m column vectors a1 , a2 , · · · , am must be
linearly dependent. Similarly, if any vector is equal to 0n , then the set of vectors
must be linearly dependent.
4.2 The rank of a matrix
The column rank of an n × m matrix A is defined to be the maximum number of
linearly independent columns of A. The row rank is defined to be the maximum
number of linearly independent rows of A. Since it can be shown that the column
rank and row rank of a matrix are always equal, we can simply refer to the rank
of A, denoted rank(A). The following results hold for the rank of a matrix:
0 ≤ rank(A) ≤ min(n, m)
and
rank(A′) = rank(A).
If rank(A) = min(n, m) then the matrix is said to be of full rank.
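For example, the matrix
\[
A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}
\]
has rank(A) = 1, since its second column is twice its first column, so it is not of
full rank. The identity matrix In has rank n and so is of full rank.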
4.3 The matrix inverse
The inverse of a square n × n matrix A exists if and only if
rank(A) = n
so that A is of full rank. The matrix inverse has the following properties:
(A−1)−1 = A
(A′)−1 = (A−1)′
(AB)−1 = B−1 A−1 .
4.4 Example: solving linear equations
Consider the set of n linear equations in the n variables x1 , x2 , · · · , xn defined by
a11 x1 + a12 x2 + · · · + a1n xn = c1
a21 x1 + a22 x2 + · · · + a2n xn = c2
⋮
an1 x1 + an2 x2 + · · · + ann xn = cn
or, in matrix form,
Ax = c.
If the matrix A is nonsingular, then these equations have a solution which is given
by pre-multiplying the set of equations by the matrix inverse A−1 to give
A−1 Ax = x = A−1 c.
If the matrix A is singular, then these equations do not have a unique solution.
Singularity corresponds to linear dependence among the equations: some equations
are linear combinations of others, so that there are effectively too few independent
equations to determine x, and the system either has no solution or has infinitely
many.
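For example, the two equations
\[
2x_1 + x_2 = 5, \qquad x_1 + x_2 = 3
\]
can be written as Ax = c with A the 2 × 2 matrix inverted in Section 4 and
c = (5, 3)′, so that
\[
x = A^{-1} c = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}
\begin{pmatrix} 5 \\ 3 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix},
\]
that is, x1 = 2 and x2 = 1.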
5 Determinants
The determinant of a square n × n matrix A is defined by the expression
\[
\det(A) = |A| = \sum (\pm)\, a_{1i}\, a_{2j} \cdots a_{nr}
\]
where the summation is taken over all permutations of the second subscripts. Each
term has a plus sign for even permutations and a minus sign for odd permutations.
For example, for the second order matrix
\[
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\]
the determinant is given by the expression
det(A) = |A| = a11 a22 − a12 a21 .
A singular matrix has determinant equal to zero while a nonsingular matrix has
a non-zero determinant.
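For example, the matrix inverted in Section 4 has determinant
\[
\det \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} = 2 \cdot 1 - 1 \cdot 1 = 1 \neq 0,
\]
so it is nonsingular, while the rank-deficient matrix of Section 4.2 has determinant
\[
\det \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} = 1 \cdot 4 - 2 \cdot 2 = 0,
\]
so it is singular.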
6 Quadratic Forms
Let A be an n × n square, symmetric matrix, and x be an n × 1 column vector.
Then the scalar expression
\[
x'Ax = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, x_i x_j
\]
is called a quadratic form. Depending on the matrix A, the quadratic form x′Ax
may be positive, negative or zero for different non-zero vectors x, and matrices are
classified according to the sign of the quadratic forms that they generate.
A positive definite (pd) matrix is one for which all quadratic forms are greater
than zero for all values of x 6= 0. Formally
x′Ax > 0, ∀x ≠ 0.
A negative definite (nd) matrix is one for which all quadratic forms are less than
zero for all values of x ≠ 0. Formally
x′Ax < 0, ∀x ≠ 0.
Similarly, a positive semi-definite (psd) matrix is one for which
x′Ax ≥ 0, ∀x
and a negative semi-definite (nsd) matrix is one for which
x′Ax ≤ 0, ∀x.
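For example, the symmetric matrix A with a11 = a22 = 2 and a12 = a21 = 1 gives
the quadratic form
\[
x'Ax = 2x_1^2 + 2x_1 x_2 + 2x_2^2 = x_1^2 + x_2^2 + (x_1 + x_2)^2 > 0
\quad \text{for all } x \neq 0,
\]
so this matrix is positive definite. In contrast, the matrix with a11 = 1, a22 = −1
and a12 = a21 = 0 gives x′Ax = x1² − x2², which is positive at x = (1, 0)′ and
negative at x = (0, 1)′, so it is neither positive nor negative definite.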
7 Eigenvalues and Eigenvectors
Let A be an n × n square matrix. Consider the equation system
Ax = λx
where x is an n × 1 vector with x ≠ 0 and λ is a scalar. A value of x that solves
this system of equations is called an eigenvector (or characteristic vector or latent
vector) of the matrix A. λ is the corresponding eigenvalue (or characteristic value
or latent root). In general there will be n solutions to this system of equations
although these need not be distinct. If the matrix is not symmetric, then the
eigenvalues λi may include complex numbers.
The eigenvalues of a matrix have many useful properties. In particular, the
trace of a matrix is the sum of its eigenvalues
\[
\operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i
\]
and the determinant of a matrix is the product of its eigenvalues
\[
\det(A) = \prod_{i=1}^{n} \lambda_i .
\]
A positive definite matrix has eigenvalues that are all positive and a negative def-
inite matrix has eigenvalues that are all negative. In addition, if A is a symmetric
matrix, then its rank is equal to the number of its non-zero eigenvalues.
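For example, the positive definite matrix A of Section 6 (with a11 = a22 = 2 and
a12 = a21 = 1) satisfies
\[
\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix}
= 3 \begin{pmatrix} 1 \\ 1 \end{pmatrix}
\qquad \text{and} \qquad
\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ -1 \end{pmatrix}
= 1 \cdot \begin{pmatrix} 1 \\ -1 \end{pmatrix},
\]
so its eigenvalues are λ1 = 3 and λ2 = 1, both positive. As a check, tr(A) = 2 + 2 =
4 = 3 + 1 and det(A) = 4 − 1 = 3 = 3 × 1.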
If A is a symmetric matrix, then the n eigenvectors x1 , · · · , xn can be chosen
to be orthonormal, so that
x′i xj = 0, i ≠ j, and x′i xi = 1, i, j = 1, · · · , n.
Stacking these eigenvectors into a matrix
X = [x1 : x2 : · · · : xn ]
with the property that X−1 = X′, it follows that
A = XΛX′
where Λ is an n × n diagonal matrix with the eigenvalues λ1 , · · · , λn along the
diagonal. This result is called the eigenvalue decomposition of the symmetric
matrix A.
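Continuing the example above, normalising the two eigenvectors of A to unit length
gives
\[
X = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},
\qquad
\Lambda = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix},
\]
and it can be checked that X′X = I2 and that
\[
X \Lambda X' = \frac{1}{2} \begin{pmatrix} 3 & 1 \\ 3 & -1 \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
= \frac{1}{2} \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix}
= \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} = A.
\]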
8 Cholesky Decomposition
Let A be an n × n symmetric positive-definite matrix. Then it can be shown that
A = HH′
where H is a lower triangular matrix of order n×n. This is known as the Cholesky
decomposition of the symmetric positive-definite matrix A.
It follows that
A−1 = (H−1)′ H−1 .
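For example,
\[
A = \begin{pmatrix} 4 & 2 \\ 2 & 3 \end{pmatrix}
= \begin{pmatrix} 2 & 0 \\ 1 & \sqrt{2} \end{pmatrix}
\begin{pmatrix} 2 & 1 \\ 0 & \sqrt{2} \end{pmatrix} = HH',
\]
with H lower triangular. (This A is positive definite, since x′Ax = (2x1 + x2)² + 2x2² > 0
for all x ≠ 0.)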
9 Idempotent Matrices
A square n × n matrix A is idempotent if
AA = A.
If the matrix is symmetric it also follows that
A′A = A.
Idempotent matrices are also called projection matrices. The eigenvalues of an
idempotent matrix are all either zero or one. It follows that every idempotent
matrix other than the identity matrix In is singular. If A is idempotent, then so
also is In − A.
Idempotent matrices have the property that their rank is equal to their trace,
or,
rank(A) = tr(A).
Idempotent matrices are very important in econometrics. Let X be an n × k
matrix of data of rank k. Then the matrix
M = X(X′X)−1 X′
is a symmetric idempotent matrix since
MM = X(X′X)−1 X′X(X′X)−1 X′
= X(X′X)−1 X′ = M.
The rank of M can be determined using the results above since
rank(M) = tr(M) = tr(X(X′X)−1 X′)
= tr(X′X(X′X)−1 ) = tr(Ik ) = k.
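For example, take X to be the n × 1 vector with every element equal to one (so
that k = 1). Then X′X = n and
\[
M = X(X'X)^{-1}X' = \frac{1}{n} XX' ,
\]
the n × n matrix with every element equal to 1/n. Pre-multiplying any n × 1
vector y by this M replaces each element of y by the sample mean of the elements
of y, and doing so twice changes nothing, which is the idempotency property
MM = M. Its rank is tr(M) = n × (1/n) = 1 = k.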
10 The Kronecker Product
The Kronecker product (or tensor product) of the n × m matrix A and the p × q
matrix B, which is denoted A ⊗ B, is defined by the np × mq matrix
\[
A \otimes B = \begin{pmatrix}
a_{11}B & a_{12}B & \cdots & a_{1m}B \\
a_{21}B & a_{22}B & \cdots & a_{2m}B \\
\vdots & \vdots & & \vdots \\
a_{n1}B & a_{n2}B & \cdots & a_{nm}B
\end{pmatrix}.
\]
The Kronecker product has the following properties:
(A ⊗ B)(C ⊗ D) = AC ⊗ BD
(A ⊗ B)′ = A′ ⊗ B′
and
(A ⊗ B)−1 = A−1 ⊗ B−1 .
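For example, with
\[
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
\qquad \text{and} \qquad B = I_2 ,
\]
the Kronecker product is the 4 × 4 matrix
\[
A \otimes I_2 = \begin{pmatrix}
1 & 0 & 2 & 0 \\
0 & 1 & 0 & 2 \\
3 & 0 & 4 & 0 \\
0 & 3 & 0 & 4
\end{pmatrix}.
\]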
11 Vectorisation
Let A be an n × m matrix with columns
A = [a1 : a2 : · · · : am ] .
Then the column vectorisation of A, denoted by vec(A), is defined by the nm × 1
vector
\[
\operatorname{vec}(A) = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{pmatrix}
\]
constructed by stacking the columns of A.
The vec operator has the property that
vec(ABC) = (C′ ⊗ A) vec(B).
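For example,
\[
\operatorname{vec} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
= \begin{pmatrix} 1 \\ 3 \\ 2 \\ 4 \end{pmatrix},
\]
the first column stacked on top of the second.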
12 Matrix Derivatives
The rules of differential calculus carry over to matrices in a straightforward way.
The only issue is that of adopting a convention for ordering the derivatives.
12.1 Derivatives of a scalar wrt a matrix
The derivative of a scalar function f with respect to a matrix argument X of
dimension n × m is defined by the n × m dimensional matrix
\[
\frac{\partial f}{\partial X} = \begin{pmatrix}
\frac{\partial f}{\partial x_{11}} & \cdots & \frac{\partial f}{\partial x_{1m}} \\
\vdots & & \vdots \\
\frac{\partial f}{\partial x_{n1}} & \cdots & \frac{\partial f}{\partial x_{nm}}
\end{pmatrix}.
\]
12.2 Derivatives of a vector wrt a vector
The derivative of an n × 1 vector y with respect to an m × 1 vector x is defined
by the n × m dimensional matrix
\[
\frac{\partial y}{\partial x'} = \begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_m} \\
\vdots & & \vdots \\
\frac{\partial y_n}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_m}
\end{pmatrix}.
\]
12.3 Derivatives of a matrix wrt a matrix
There is no obvious way to order the derivatives of one matrix with respect to
another matrix. In this case the most sensible procedure is to vectorise both
matrices and look at the matrix of derivatives
\[
\frac{\partial \operatorname{vec}(Y)}{\partial \operatorname{vec}(X)'} .
\]
If Y is of order p × q and X is of order n × m, then this matrix of derivatives is
of order pq × nm.
12.4 Some useful results
Two general rules allow the calculation of derivatives of complicated functions.
These are followed by some useful derivatives of commonly used matrix functions.
12.4.1 Function of a function rule
\[
\frac{\partial y}{\partial x'} = \frac{\partial y}{\partial z'}\,\frac{\partial z}{\partial x'}
\]
12.4.2 Product rule
\[
\frac{\partial \operatorname{vec}(AB)}{\partial x'}
= (B' \otimes I)\,\frac{\partial \operatorname{vec}(A)}{\partial x'}
+ (I \otimes A)\,\frac{\partial \operatorname{vec}(B)}{\partial x'}
\]
12.4.3 Derivative of an inner product
\[
\frac{\partial\, a'x}{\partial x} = a
\]
12.4.4 Derivative of a quadratic form
\[
\frac{\partial\, x'Ax}{\partial x} = (A + A')x
\]
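For example, when n = 2 and A is symmetric, x′Ax = a11 x1² + 2a12 x1 x2 + a22 x2²,
and differentiating element by element gives
\[
\frac{\partial\, x'Ax}{\partial x}
= \begin{pmatrix} 2a_{11} x_1 + 2a_{12} x_2 \\ 2a_{12} x_1 + 2a_{22} x_2 \end{pmatrix}
= 2Ax = (A + A')x ,
\]
since A′ = A.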
12.4.5 Derivative of the trace of a matrix
\[
\frac{\partial \operatorname{tr}(A)}{\partial A} = I
\]
12.4.6 Derivative of the determinant of a matrix
\[
\frac{\partial \det(A)}{\partial A} = \det(A)\,(A')^{-1}
\]
12.4.7 Derivative of the log determinant of a matrix
\[
\frac{\partial \ln \det(A)}{\partial A} = (A')^{-1}
\]
12.4.8 Derivative of a matrix inverse
\[
\frac{\partial \operatorname{vec}(A^{-1})}{\partial \operatorname{vec}(A)'}
= -\,(A')^{-1} \otimes A^{-1}
\]