
Matrix calculus

In mathematics, matrix calculus is a specialized notation for doing multivariable calculus,


especially over spaces of matrices. It collects the various partial derivatives of a single
function with respect to many variables, and/or of a multivariate function with respect to a
single variable, into vectors and matrices that can be treated as single entities. This greatly
simplifies operations such as finding the maximum or minimum of a multivariate function
and solving systems of differential equations. The notation used here is commonly used in
statistics and engineering, while the tensor index notation is preferred in physics.

Two competing notational conventions split the field of matrix calculus into two separate
groups. The two groups can be distinguished by whether they write the derivative of a scalar
with respect to a vector as a column vector or a row vector. Both of these conventions are
possible even when the common assumption is made that vectors should be treated as
column vectors when combined with matrices (rather than row vectors). A single convention
can be somewhat standard throughout a single field that commonly uses matrix calculus
(e.g. econometrics, statistics, estimation theory and machine learning). However, even within
a given field different authors can be found using competing conventions. Authors of both
groups often write as though their specific conventions were standard. Serious mistakes can
result when combining results from different authors without carefully verifying that
compatible notations have been used. Definitions of these two conventions and comparisons
between them are collected in the layout conventions section.
Scope
Matrix calculus refers to a number of different notations that use matrices and vectors to
collect the derivative of each component of the dependent variable with respect to each
component of the independent variable. In general, the independent variable can be a scalar,
a vector, or a matrix while the dependent variable can be any of these as well. Each different
situation will lead to a different set of rules, or a separate calculus, using the broader sense of
the term. Matrix notation serves as a convenient way to collect the many derivatives in an
organized way.

As a first example, consider the gradient from vector calculus. For a scalar function of three
independent variables, $f(x_1, x_2, x_3)$, the gradient is given by the vector equation

$$\nabla f = \frac{\partial f}{\partial x_1}\hat{x}_1 + \frac{\partial f}{\partial x_2}\hat{x}_2 + \frac{\partial f}{\partial x_3}\hat{x}_3,$$

where $\hat{x}_i$ represents a unit vector in the $x_i$ direction for $1 \le i \le 3$. This type of generalized
derivative can be seen as the derivative of a scalar, $f$, with respect to a vector, $\mathbf{x}$, and its result
can be easily collected in vector form.
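The same collection can be done with a computer algebra system. The following is an illustrative sketch using SymPy; the function $f$ is an arbitrary example, not one fixed by the text above.

```python
import sympy as sp

# An example scalar function of three independent variables (arbitrary choice).
x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1**2 * x2 + sp.sin(x3)

# Collect the partial derivatives into a single column vector: the gradient.
grad_f = sp.Matrix([sp.diff(f, v) for v in (x1, x2, x3)])
print(grad_f)  # Matrix([[2*x1*x2], [x1**2], [cos(x3)]])
```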

More complicated examples include the derivative of a scalar function with respect to a
matrix, known as the gradient matrix, which collects the derivative with respect to each
matrix element in the corresponding position in the resulting matrix. In that case the scalar
must be a function of each of the independent variables in the matrix. As another example, if
we have an n-vector of dependent variables, or functions, of m independent variables we
might consider the derivative of the dependent vector with respect to the independent vector.
The result could be collected in an m×n matrix consisting of all of the possible derivative
combinations.
There are a total of nine possibilities using scalars, vectors, and matrices. Notice that as we
consider higher numbers of components in each of the independent and dependent variables
we can be left with a very large number of possibilities. The six kinds of derivatives that can
be most neatly organized in matrix form are collected in the following table.[1]

Types of matrix derivative

The six kinds of derivative, organized by the type of the dependent variable (numerator) and of the independent variable (denominator):

- Scalar $y$ by scalar $x$: $\frac{\partial y}{\partial x}$ (the ordinary derivative)
- Vector $\mathbf{y}$ by scalar $x$: $\frac{\partial \mathbf{y}}{\partial x}$
- Matrix $\mathbf{Y}$ by scalar $x$: $\frac{\partial \mathbf{Y}}{\partial x}$
- Scalar $y$ by vector $\mathbf{x}$: $\frac{\partial y}{\partial \mathbf{x}}$
- Vector $\mathbf{y}$ by vector $\mathbf{x}$: $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$
- Scalar $y$ by matrix $\mathbf{X}$: $\frac{\partial y}{\partial \mathbf{X}}$

Here, we have used the term "matrix" in its most general sense, recognizing that vectors are
simply matrices with one column (and scalars are simply vectors with one row). Moreover, we
have used bold letters to indicate vectors and bold capital letters for matrices. This notation
is used throughout.

Notice that we could also talk about the derivative of a vector with respect to a matrix, or any
of the other unfilled cells in our table. However, these derivatives are most naturally organized
in a tensor of rank higher than 2, so that they do not fit neatly into a matrix. In the following
three sections we will define each one of these derivatives and relate them to other branches
of mathematics. See the layout conventions section for a more detailed table.

Relation to other derivatives


The matrix derivative is a convenient notation for keeping track of partial derivatives for doing
calculations. The Fréchet derivative is the standard way in the setting of functional analysis
to take derivatives with respect to vectors. In the case that a matrix function of a matrix is
Fréchet differentiable, the two derivatives will agree up to translation of notation. As is the
case in general for partial derivatives, some formulae may extend under weaker analytic
conditions than the existence of the derivative as an approximating linear mapping.
Usages
Matrix calculus is used for deriving optimal stochastic estimators, often involving the use of
Lagrange multipliers. This includes the derivation of:

Kalman filter
Wiener filter
Expectation-maximization algorithm for Gaussian mixture models
Gradient descent

Notation
The vector and matrix derivatives presented in the sections to follow take full advantage of
matrix notation, using a single variable to represent a large number of variables. In what
follows we will distinguish scalars, vectors and matrices by their typeface. We will let M(n,m)
denote the space of real n×m matrices with n rows and m columns. Such matrices will be
denoted using bold capital letters: A, X, Y, etc. An element of M(n,1), that is, a column vector,
is denoted with a boldface lowercase letter: a, x, y, etc. An element of M(1,1) is a scalar,
denoted with lowercase italic typeface: a, t, x, etc. XT denotes matrix transpose, tr(X) is the
trace, and det(X) or | X | is the determinant. All functions are assumed to be of
differentiability class C1 unless otherwise noted. Generally letters from the first half of the
alphabet (a, b, c, ...) will be used to denote constants, and from the second half (t, x, y, ...) to
denote variables.

NOTE: As mentioned above, there are competing notations for laying out systems of partial
derivatives in vectors and matrices, and no standard appears to be emerging yet. The next
two introductory sections use the numerator layout convention simply for the purposes of
convenience, to avoid overly complicating the discussion. The section after them discusses
layout conventions in more detail. It is important to realize the following:

1. Despite the use of the terms "numerator layout" and "denominator layout", there are actually more than two possible notational choices involved. The reason is that the choice of numerator vs. denominator (or in some situations, numerator vs. mixed) can be made independently for scalar-by-vector, vector-by-scalar, vector-by-vector, and scalar-by-matrix derivatives, and a number of authors mix and match their layout choices in various ways.
2. The choice of numerator layout in the introductory sections below does not imply that this is the "correct" or "superior" choice. There are advantages and disadvantages to the various layout types. Serious mistakes can result from carelessly combining formulas written in different layouts, and converting from one layout to another requires care to avoid errors. As a result, when working with existing formulas the best policy is probably to identify whichever layout is used and maintain consistency with it, rather than attempting to use the same layout in all situations.
Alternatives
The tensor index notation with its Einstein summation convention is very similar to the matrix
calculus, except one writes only a single component at a time. It has the advantage that one
can easily manipulate arbitrarily high rank tensors, whereas tensors of rank higher than two
are quite unwieldy with matrix notation. All of the work here can be done in this notation
without use of the single-variable matrix notation. However, many problems in estimation
theory and other areas of applied mathematics would result in too many indices to properly
keep track of, pointing in favor of matrix calculus in those areas. Also, Einstein notation can
be very useful in proving the identities presented here (see section on differentiation) as an
alternative to typical element notation, which can become cumbersome when the explicit
sums are carried around. Note that a matrix can be considered a tensor of rank two.
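As a small sketch of the correspondence, the vector-by-vector derivative defined below is written one component at a time in index notation, and a matrix chain rule becomes an explicit Einstein-summed expression:

```latex
% Numerator-layout matrix derivative, one component at a time:
\left( \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \right)_{ij} = \frac{\partial y_i}{\partial x_j}
% A matrix chain rule as an Einstein-summed index expression (sum over repeated k):
\frac{\partial y_i}{\partial x_j} = \frac{\partial y_i}{\partial u_k} \frac{\partial u_k}{\partial x_j}
```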

Derivatives with vectors


Because vectors are matrices with only one column, the simplest matrix derivatives are
vector derivatives.

The notations developed here can accommodate the usual operations of vector calculus by
identifying the space M(n,1) of n-vectors with the Euclidean space Rn, and the scalar M(1,1)
is identified with R. The corresponding concept from vector calculus is indicated at the end of
each subsection.

NOTE: The discussion in this section assumes the numerator layout convention for
pedagogical purposes. Some authors use different conventions. The section on layout
conventions discusses this issue in greater detail. The identities given further down are
presented in forms that can be used in conjunction with all common layout conventions.
Vector-by-scalar
The derivative of a vector , by a scalar x is written (in numerator
layout notation) as

In vector calculus the derivative of a vector y with respect to a scalar x is known as the

tangent vector of the vector y, . Notice here that y: R1 → Rm.

Example: Simple examples of this include the velocity vector in Euclidean space, which is the
tangent vector of the position vector (considered as a function of time). Also, the
acceleration is the tangent vector of the velocity.
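A sketch of this example in SymPy, using an arbitrary illustrative trajectory (uniform circular motion):

```python
import sympy as sp

t = sp.symbols('t')
# Position vector as a function of time (arbitrary example: a unit circle).
r = sp.Matrix([sp.cos(t), sp.sin(t)])

v = r.diff(t)   # velocity: the tangent vector of the position vector
a = v.diff(t)   # acceleration: the tangent vector of the velocity
print(v)  # Matrix([[-sin(t)], [cos(t)]])
print(a)  # Matrix([[-cos(t)], [-sin(t)]])
```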

Scalar-by-vector
The derivative of a scalar y by a vector , is written (in numerator
layout notation) as
In vector calculus, the gradient of a scalar field f : Rn → R (whose independent coordinates
are the components of x) is the transpose of the derivative of a scalar by a vector.

By example, in physics, the electric field is the negative vector gradient of the electric
potential.

The directional derivative of a scalar function $f(\mathbf{x})$ of the space vector $\mathbf{x}$ in the direction of the
unit vector $\mathbf{u}$ (represented in this case as a column vector) is defined using the gradient as
follows:

$$\nabla_{\mathbf{u}} f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{u}.$$

Using the notation just defined for the derivative of a scalar with respect to a vector we can
re-write the directional derivative as

$$\nabla_{\mathbf{u}} f = \frac{\partial f}{\partial \mathbf{x}} \mathbf{u}.$$

This type of notation will be nice when proving product rules and chain rules that come out looking similar to what we are familiar with for the scalar derivative.
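A numerical sanity check of this identity, approximating both sides by finite differences; the function and point are arbitrary illustrative choices:

```python
import numpy as np

def f(x):
    # Arbitrary smooth scalar function of a vector argument.
    return x[0]**2 + 3.0 * x[0] * x[1] + np.sin(x[2])

x = np.array([1.0, -2.0, 0.5])
u = np.array([1.0, 2.0, 2.0]) / 3.0   # a unit vector (norm 1)
h = 1e-6

# Gradient assembled componentwise by central differences.
grad = np.zeros(3)
for i in range(3):
    e = np.zeros(3); e[i] = h
    grad[i] = (f(x + e) - f(x - e)) / (2 * h)

# Directional derivative two ways: grad . u vs a one-dimensional difference along u.
lhs = grad @ u
rhs = (f(x + h * u) - f(x - h * u)) / (2 * h)
print(np.isclose(lhs, rhs))  # True
```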
Vector-by-vector
Each of the previous two cases can be considered as an application of the derivative of a
vector with respect to a vector, using a vector of size one appropriately. Similarly we will find
that the derivatives involving matrices will reduce to derivatives involving vectors in a
corresponding way.

The derivative of a vector function (a vector whose components are functions)
$\mathbf{y} = \begin{bmatrix} y_1 & y_2 & \cdots & y_m \end{bmatrix}^{\mathsf T}$, with respect to an input vector, $\mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^{\mathsf T}$,
is written (in numerator layout notation) as

$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}.$$

In vector calculus, the derivative of a vector function $\mathbf{y}$ with respect to a vector $\mathbf{x}$ whose
components represent a space is known as the pushforward (or differential), or the Jacobian
matrix.

The pushforward along a vector function $\mathbf{f}$ with respect to vector $\mathbf{v}$ in $\mathbb{R}^n$ is given by

$$d\,\mathbf{f}(\mathbf{v}) = \frac{\partial \mathbf{f}}{\partial \mathbf{v}}\, d\,\mathbf{v}.$$
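SymPy's Matrix.jacobian collects exactly this numerator-layout (Jacobian) arrangement; a small sketch with an arbitrary example function:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
# A vector function R^2 -> R^3 (arbitrary example).
y = sp.Matrix([x1 * x2, sp.exp(x1), x1 + x2**2])

J = y.jacobian([x1, x2])   # 3x2 matrix: rows follow y, columns follow x
print(J)
# Matrix([[x2, x1], [exp(x1), 0], [1, 2*x2]])
```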


Derivatives with matrices
There are two types of derivatives with matrices that can be organized into a matrix of the
same size. These are the derivative of a matrix by a scalar and the derivative of a scalar by a
matrix. These can be useful in minimization problems found in many areas of applied
mathematics and have adopted the names tangent matrix and gradient matrix respectively
after their analogs for vectors.

Note: The discussion in this section assumes the numerator layout convention for
pedagogical purposes. Some authors use different conventions. The section on layout
conventions discusses this issue in greater detail. The identities given further down are
presented in forms that can be used in conjunction with all common layout conventions.

Matrix-by-scalar
The derivative of a matrix function Y by a scalar x is known as the tangent matrix and is
given (in numerator layout notation) by
Scalar-by-matrix
The derivative of a scalar function $y$, with respect to a p×q matrix $\mathbf{X}$ of independent variables,
is given (in numerator layout notation) by

$$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix},$$

a q×p matrix laid out according to $\mathbf{X}^{\mathsf T}$. Important examples of scalar functions of matrices include the trace of a matrix and the
determinant.

In analog with vector calculus this derivative is often written as the following:

$$\nabla_{\mathbf{X}} y(\mathbf{X}) = \frac{\partial y(\mathbf{X})}{\partial \mathbf{X}}.$$

Also in analog with vector calculus, the directional derivative of a scalar $f(\mathbf{X})$ of a matrix $\mathbf{X}$ in
the direction of matrix $\mathbf{Y}$ is given by

$$\nabla_{\mathbf{Y}} f = \operatorname{tr}\left( \frac{\partial f}{\partial \mathbf{X}} \mathbf{Y} \right).$$
It is the gradient matrix, in particular, that finds many uses in minimization problems in
estimation theory, particularly in the derivation of the Kalman filter algorithm, which is of
great importance in the field.
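As an illustration of a scalar-by-matrix derivative, Jacobi's formula gives the entrywise derivative of the determinant, $\partial \det(\mathbf{X}) / \partial x_{ij} = \det(\mathbf{X})\,(\mathbf{X}^{-\mathsf T})_{ij}$ (denominator layout). A finite-difference sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))

# Analytic gradient of det at X, laid out like X (denominator layout):
# d det(X) / d X_ij = det(X) * (X^{-T})_ij   (Jacobi's formula)
analytic = np.linalg.det(X) * np.linalg.inv(X).T

# Finite-difference gradient, entry by entry.
h = 1e-6
numeric = np.zeros_like(X)
for i in range(4):
    for j in range(4):
        E = np.zeros_like(X); E[i, j] = h
        numeric[i, j] = (np.linalg.det(X + E) - np.linalg.det(X - E)) / (2 * h)

print(np.allclose(analytic, numeric, atol=1e-4))  # True
```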

Other matrix derivatives


The three types of derivatives that have not been considered are those involving vectors-by-
matrices, matrices-by-vectors, and matrices-by-matrices. These are not as widely considered
and a notation is not widely agreed upon.

Layout conventions
This section discusses the similarities and differences between notational conventions that
are used in the various fields that take advantage of matrix calculus. Although there are
largely two consistent conventions, some authors find it convenient to mix the two
conventions in forms that are discussed below. After this section, equations will be listed in
both competing forms separately.

The fundamental issue is that the derivative of a vector with respect to a vector, i.e. $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, is
often written in two competing ways. If the numerator $\mathbf{y}$ is of size m and the denominator $\mathbf{x}$
of size n, then the result can be laid out as either an m×n matrix or n×m matrix, i.e. the
elements of $\mathbf{y}$ laid out in columns and the elements of $\mathbf{x}$ laid out in rows, or vice versa. This
leads to the following possibilities:

1. Numerator layout, i.e. lay out according to $\mathbf{y}$ and $\mathbf{x}^{\mathsf T}$ (i.e. contrarily to $\mathbf{x}$). This is sometimes known as the Jacobian formulation. This corresponds to the m×n layout in the previous example, which means that the row number of $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ equals the size of the numerator $\mathbf{y}$ and the column number of $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ equals the size of $\mathbf{x}^{\mathsf T}$.
2. Denominator layout, i.e. lay out according to $\mathbf{y}^{\mathsf T}$ and $\mathbf{x}$ (i.e. contrarily to $\mathbf{y}$). This is sometimes known as the Hessian formulation. Some authors term this layout the gradient, in distinction to the Jacobian (numerator layout), which is its transpose. (However, gradient more commonly means the derivative $\frac{\partial y}{\partial \mathbf{x}}$, regardless of layout.) This corresponds to the n×m layout in the previous example, which means that the row number of $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ equals the size of $\mathbf{x}$ (the denominator).
3. A third possibility sometimes seen is to insist on writing the derivative as $\frac{\partial \mathbf{y}}{\partial \mathbf{x}^{\mathsf T}}$ (i.e. the derivative is taken with respect to the transpose of $\mathbf{x}$) and follow the numerator layout. This makes it possible to claim that the matrix is laid out according to both numerator and denominator. In practice this produces results the same as the numerator layout.
When handling the gradient $\nabla y$ and the opposite case $\frac{\partial \mathbf{y}}{\partial x}$ we have the same issues. To be
consistent, we should do one of the following:

1. If we choose numerator layout for $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, we should lay out the gradient $\nabla y$ as a row vector, and $\frac{\partial \mathbf{y}}{\partial x}$ as a column vector.
2. If we choose denominator layout for $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, we should lay out the gradient $\nabla y$ as a column vector, and $\frac{\partial \mathbf{y}}{\partial x}$ as a row vector.
3. In the third possibility above, we write $\frac{\partial y}{\partial \mathbf{x}^{\mathsf T}}$ and $\frac{\partial \mathbf{y}}{\partial x}$ and use numerator layout.
Not all math textbooks and papers are consistent in this respect throughout. That is,
sometimes different conventions are used in different contexts within the same book or
paper. For example, some choose denominator layout for gradients (laying them out as
column vectors), but numerator layout for the vector-by-vector derivative $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$.

Similarly, when it comes to scalar-by-matrix derivatives $\frac{\partial y}{\partial \mathbf{X}}$ and matrix-by-scalar derivatives $\frac{\partial \mathbf{Y}}{\partial x}$,
consistent numerator layout lays out according to $\mathbf{Y}$ and $\mathbf{X}^{\mathsf T}$, while consistent
denominator layout lays out according to $\mathbf{Y}^{\mathsf T}$ and $\mathbf{X}$. In practice, however, following a
denominator layout for $\frac{\partial \mathbf{Y}}{\partial x}$ and laying the result out according to $\mathbf{Y}^{\mathsf T}$ is rarely seen because
it makes for ugly formulas that do not correspond to the scalar formulas. As a result, the
following layouts can often be found:

1. Consistent numerator layout, which lays out $\frac{\partial \mathbf{Y}}{\partial x}$ according to $\mathbf{Y}$ and $\frac{\partial y}{\partial \mathbf{X}}$ according to $\mathbf{X}^{\mathsf T}$.
2. Mixed layout, which lays out $\frac{\partial \mathbf{Y}}{\partial x}$ according to $\mathbf{Y}$ and $\frac{\partial y}{\partial \mathbf{X}}$ according to $\mathbf{X}$.
3. Use the notation $\frac{\partial y}{\partial \mathbf{X}'}$, with results the same as consistent numerator layout.
In the following formulas, we handle the five possible combinations $\frac{\partial y}{\partial \mathbf{x}}$, $\frac{\partial \mathbf{y}}{\partial x}$, $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, $\frac{\partial y}{\partial \mathbf{X}}$,
and $\frac{\partial \mathbf{Y}}{\partial x}$ separately. We also handle cases of scalar-by-scalar derivatives that involve an
intermediate vector or matrix. (This can arise, for example, if a multi-dimensional parametric
curve is defined in terms of a scalar variable, and then a derivative of a scalar function of the
curve is taken with respect to the scalar that parameterizes the curve.) For each of the
various combinations, we give numerator-layout and denominator-layout results, except in the
cases above where denominator layout rarely occurs. In cases involving matrices where it
makes sense, we give numerator-layout and mixed-layout results. As noted above, cases
where vector and matrix denominators are written in transpose notation are equivalent to
numerator layout with the denominators written without the transpose.

Keep in mind that various authors use different combinations of numerator and denominator
layouts for different types of derivatives, and there is no guarantee that an author will
consistently use either numerator or denominator layout for all types. Match up the formulas
below with those quoted in the source to determine the layout used for that particular type of
derivative, but be careful not to assume that derivatives of other types necessarily follow the
same kind of layout.
When taking derivatives with an aggregate (vector or matrix) denominator in order to find a
maximum or minimum of the aggregate, it should be kept in mind that using numerator
layout will produce results that are transposed with respect to the aggregate. For example, in
attempting to find the maximum likelihood estimate of a multivariate normal distribution
using matrix calculus, if the domain is a k×1 column vector, then the result using the
numerator layout will be in the form of a 1×k row vector. Thus, either the results should be
transposed at the end or the denominator layout (or mixed layout) should be used.
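A small sketch of this shape bookkeeping, using the quadratic form $\mathbf{x}^{\mathsf T}\mathbf{A}\mathbf{x}$ whose derivative appears in the identities below (illustrative data):

```python
import numpy as np

k = 3
rng = np.random.default_rng(1)
A = rng.standard_normal((k, k)); A = A + A.T   # symmetric
x = rng.standard_normal((k, 1))                # k x 1 column-vector domain

# d(x^T A x)/dx for symmetric A:
num_layout = 2 * x.T @ A    # numerator layout: 1 x k row vector
den_layout = 2 * A @ x      # denominator layout: k x 1 column vector
print(num_layout.shape, den_layout.shape)      # (1, 3) (3, 1)
print(np.allclose(num_layout, den_layout.T))   # True: each is the other's transpose
```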

Result of differentiating various kinds of aggregates with other kinds of aggregates:

- Scalar $y$ by scalar $x$: notation $\frac{\partial y}{\partial x}$; the result is a scalar in both layouts.
- Column vector $\mathbf{y}$ (size m×1) by scalar $x$: notation $\frac{\partial \mathbf{y}}{\partial x}$; numerator layout gives a size-m column vector, denominator layout a size-m row vector.
- Matrix $\mathbf{Y}$ (size m×n) by scalar $x$: notation $\frac{\partial \mathbf{Y}}{\partial x}$; numerator layout gives an m×n matrix.
- Scalar $y$ by column vector $\mathbf{x}$ (size n×1): notation $\frac{\partial y}{\partial \mathbf{x}}$; numerator layout gives a size-n row vector, denominator layout a size-n column vector.
- Column vector $\mathbf{y}$ (size m×1) by column vector $\mathbf{x}$ (size n×1): notation $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$; numerator layout gives an m×n matrix, denominator layout an n×m matrix.
- Scalar $y$ by matrix $\mathbf{X}$ (size p×q): notation $\frac{\partial y}{\partial \mathbf{X}}$; numerator layout gives a q×p matrix, denominator layout a p×q matrix.

The results of operations will be transposed when switching between numerator-layout and
denominator-layout notation.

Numerator-layout notation
Using numerator-layout notation, we have:[1]

$$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \cdots & \frac{\partial y}{\partial x_n} \end{bmatrix}, \qquad \frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x} \end{bmatrix},$$

$$\left[\frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right]_{ij} = \frac{\partial y_i}{\partial x_j} \quad (m \times n), \qquad \left[\frac{\partial y}{\partial \mathbf{X}}\right]_{ij} = \frac{\partial y}{\partial x_{ji}} \quad (q \times p).$$

The following definitions are only provided in numerator-layout notation:

$$\left[\frac{\partial \mathbf{Y}}{\partial x}\right]_{ij} = \frac{\partial y_{ij}}{\partial x} \quad (m \times n), \qquad d\mathbf{X} = \left[dx_{ij}\right] \quad (p \times q).$$

Denominator-layout notation
Using denominator-layout notation, we have:[2]

$$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y}{\partial x_1} \\ \frac{\partial y}{\partial x_2} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{bmatrix}, \qquad \frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \cdots & \frac{\partial y_m}{\partial x} \end{bmatrix},$$

$$\left[\frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right]_{ij} = \frac{\partial y_j}{\partial x_i} \quad (n \times m), \qquad \left[\frac{\partial y}{\partial \mathbf{X}}\right]_{ij} = \frac{\partial y}{\partial x_{ij}} \quad (p \times q).$$
Identities
As noted above, in general, the results of operations will be transposed when switching
between numerator-layout and denominator-layout notation.

To help make sense of all the identities below, keep in mind the most important rules: the
chain rule, product rule and sum rule. The sum rule applies universally, and the product rule
applies in most of the cases below, provided that the order of matrix products is maintained,
since matrix products are not commutative. The chain rule applies in some of the cases, but
unfortunately does not apply in matrix-by-scalar derivatives or scalar-by-matrix derivatives (in
the latter case, mostly involving the trace operator applied to matrices). In the latter case, the
product rule can't quite be applied directly, either, but the equivalent can be done with a bit
more work using the differential identities.
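A sketch of the differential workaround in the scalar-by-matrix case, using the pattern applied throughout the trace identities below:

```latex
% The product rule holds for differentials of matrix products:
d(\mathbf{U}\mathbf{V}) = (d\mathbf{U})\,\mathbf{V} + \mathbf{U}\,(d\mathbf{V}),
% so for a scalar y = tr(UV) with U, V functions of X, linearity of the trace gives
dy = \operatorname{tr}\big((d\mathbf{U})\,\mathbf{V}\big) + \operatorname{tr}\big(\mathbf{U}\,(d\mathbf{V})\big),
% which can then be reduced to the canonical form dy = tr(A dX) and read off as the
% derivative (see the "Conversion from differential to derivative form" table below).
```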

The following identities adopt the following conventions:

- the scalars $a$, $b$, $c$, $d$, and $e$ are constant with respect to, and the scalars $u$ and $v$ are functions of, one of $x$, $\mathbf{x}$, or $\mathbf{X}$;
- the vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, $\mathbf{d}$, and $\mathbf{e}$ are constant with respect to, and the vectors $\mathbf{u}$ and $\mathbf{v}$ are functions of, one of $x$, $\mathbf{x}$, or $\mathbf{X}$;
- the matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, $\mathbf{D}$, and $\mathbf{E}$ are constant with respect to, and the matrices $\mathbf{U}$ and $\mathbf{V}$ are functions of, one of $x$, $\mathbf{x}$, or $\mathbf{X}$.

Vector-by-vector identities
This is presented first because all of the operations that apply to vector-by-vector
differentiation apply directly to vector-by-scalar or scalar-by-vector differentiation simply by
reducing the appropriate vector in the numerator or denominator to a scalar.
Identities: vector-by-vector (numerator layout: laid out by $\mathbf{y}$ and $\mathbf{x}^{\mathsf T}$; denominator layout: laid out by $\mathbf{y}^{\mathsf T}$ and $\mathbf{x}$)

- $\mathbf{a}$ is not a function of $\mathbf{x}$: $\frac{\partial \mathbf{a}}{\partial \mathbf{x}} = \mathbf{0}$ (both layouts).
- $\frac{\partial \mathbf{x}}{\partial \mathbf{x}} = \mathbf{I}$ (both layouts).
- $\mathbf{A}$ is not a function of $\mathbf{x}$: $\frac{\partial (\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = \mathbf{A}$ (numerator), $\mathbf{A}^{\mathsf T}$ (denominator).
- $\mathbf{A}$ is not a function of $\mathbf{x}$: $\frac{\partial (\mathbf{x}^{\mathsf T}\mathbf{A})}{\partial \mathbf{x}} = \mathbf{A}^{\mathsf T}$ (numerator), $\mathbf{A}$ (denominator).
- $a$ is not a function of $\mathbf{x}$, $\mathbf{u} = \mathbf{u}(\mathbf{x})$: $\frac{\partial (a\mathbf{u})}{\partial \mathbf{x}} = a\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ (both layouts).
- $v = v(\mathbf{x})$, $\mathbf{a}$ is not a function of $\mathbf{x}$: $\frac{\partial (v\mathbf{a})}{\partial \mathbf{x}} = \mathbf{a}\frac{\partial v}{\partial \mathbf{x}}$ (numerator), $\frac{\partial v}{\partial \mathbf{x}}\mathbf{a}^{\mathsf T}$ (denominator).
- $v = v(\mathbf{x})$, $\mathbf{u} = \mathbf{u}(\mathbf{x})$: $\frac{\partial (v\mathbf{u})}{\partial \mathbf{x}} = v\frac{\partial \mathbf{u}}{\partial \mathbf{x}} + \mathbf{u}\frac{\partial v}{\partial \mathbf{x}}$ (numerator), $v\frac{\partial \mathbf{u}}{\partial \mathbf{x}} + \frac{\partial v}{\partial \mathbf{x}}\mathbf{u}^{\mathsf T}$ (denominator).
- $\mathbf{A}$ is not a function of $\mathbf{x}$, $\mathbf{u} = \mathbf{u}(\mathbf{x})$: $\frac{\partial (\mathbf{A}\mathbf{u})}{\partial \mathbf{x}} = \mathbf{A}\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ (numerator), $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\mathbf{A}^{\mathsf T}$ (denominator).
- $\mathbf{u} = \mathbf{u}(\mathbf{x})$, $\mathbf{v} = \mathbf{v}(\mathbf{x})$: $\frac{\partial (\mathbf{u} + \mathbf{v})}{\partial \mathbf{x}} = \frac{\partial \mathbf{u}}{\partial \mathbf{x}} + \frac{\partial \mathbf{v}}{\partial \mathbf{x}}$ (both layouts).
- $\mathbf{u} = \mathbf{u}(\mathbf{x})$: $\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{x}} = \frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ (numerator), $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}$ (denominator).
- $\mathbf{u} = \mathbf{u}(\mathbf{x})$: $\frac{\partial \mathbf{f}(\mathbf{g}(\mathbf{u}))}{\partial \mathbf{x}} = \frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ (numerator), $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}$ (denominator).

Scalar-by-vector identities
The fundamental identities are listed first, followed by the more specialized ones.

Identities: scalar-by-vector (numerator layout: laid out by $\mathbf{x}^{\mathsf T}$, result is a row vector; denominator layout: laid out by $\mathbf{x}$, result is a column vector)

- $a$ is not a function of $\mathbf{x}$: $\frac{\partial a}{\partial \mathbf{x}} = \mathbf{0}^{\mathsf T}$ (numerator) [nb 1], $\mathbf{0}$ (denominator) [nb 1].
- $a$ is not a function of $\mathbf{x}$, $u = u(\mathbf{x})$: $\frac{\partial (au)}{\partial \mathbf{x}} = a\frac{\partial u}{\partial \mathbf{x}}$ (both layouts).
- $u = u(\mathbf{x})$, $v = v(\mathbf{x})$: $\frac{\partial (u+v)}{\partial \mathbf{x}} = \frac{\partial u}{\partial \mathbf{x}} + \frac{\partial v}{\partial \mathbf{x}}$ (both layouts).
- $u = u(\mathbf{x})$, $v = v(\mathbf{x})$: $\frac{\partial (uv)}{\partial \mathbf{x}} = u\frac{\partial v}{\partial \mathbf{x}} + v\frac{\partial u}{\partial \mathbf{x}}$ (both layouts).
- $u = u(\mathbf{x})$: $\frac{\partial g(u)}{\partial \mathbf{x}} = \frac{\partial g(u)}{\partial u}\frac{\partial u}{\partial \mathbf{x}}$ (both layouts).
- $u = u(\mathbf{x})$: $\frac{\partial f(g(u))}{\partial \mathbf{x}} = \frac{\partial f(g)}{\partial g}\frac{\partial g(u)}{\partial u}\frac{\partial u}{\partial \mathbf{x}}$ (both layouts).
- $\mathbf{u} = \mathbf{u}(\mathbf{x})$, $\mathbf{v} = \mathbf{v}(\mathbf{x})$: $\frac{\partial (\mathbf{u}^{\mathsf T}\mathbf{v})}{\partial \mathbf{x}} = \mathbf{u}^{\mathsf T}\frac{\partial \mathbf{v}}{\partial \mathbf{x}} + \mathbf{v}^{\mathsf T}\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ with $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$, $\frac{\partial \mathbf{v}}{\partial \mathbf{x}}$ in numerator layout; $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\mathbf{v} + \frac{\partial \mathbf{v}}{\partial \mathbf{x}}\mathbf{u}$ with them in denominator layout.
- $\mathbf{u} = \mathbf{u}(\mathbf{x})$, $\mathbf{v} = \mathbf{v}(\mathbf{x})$, $\mathbf{A}$ is not a function of $\mathbf{x}$: $\frac{\partial (\mathbf{u}^{\mathsf T}\mathbf{A}\mathbf{v})}{\partial \mathbf{x}} = \mathbf{u}^{\mathsf T}\mathbf{A}\frac{\partial \mathbf{v}}{\partial \mathbf{x}} + \mathbf{v}^{\mathsf T}\mathbf{A}^{\mathsf T}\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ in numerator layout; $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\mathbf{A}\mathbf{v} + \frac{\partial \mathbf{v}}{\partial \mathbf{x}}\mathbf{A}^{\mathsf T}\mathbf{u}$ in denominator layout.
- $\frac{\partial^2 f}{\partial \mathbf{x} \partial \mathbf{x}^{\mathsf T}} = \mathbf{H}$, the Hessian matrix.[3]
- $\mathbf{a}$ is not a function of $\mathbf{x}$: $\frac{\partial (\mathbf{a}^{\mathsf T}\mathbf{x})}{\partial \mathbf{x}} = \frac{\partial (\mathbf{x}^{\mathsf T}\mathbf{a})}{\partial \mathbf{x}} = \mathbf{a}^{\mathsf T}$ (numerator), $\mathbf{a}$ (denominator).
- $\mathbf{A}$, $\mathbf{b}$ are not functions of $\mathbf{x}$: $\frac{\partial (\mathbf{b}^{\mathsf T}\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = \mathbf{b}^{\mathsf T}\mathbf{A}$ (numerator), $\mathbf{A}^{\mathsf T}\mathbf{b}$ (denominator).
- $\mathbf{A}$ is not a function of $\mathbf{x}$: $\frac{\partial (\mathbf{x}^{\mathsf T}\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = \mathbf{x}^{\mathsf T}(\mathbf{A} + \mathbf{A}^{\mathsf T})$ (numerator), $(\mathbf{A} + \mathbf{A}^{\mathsf T})\mathbf{x}$ (denominator).
- $\mathbf{A}$ is not a function of $\mathbf{x}$, $\mathbf{A}$ is symmetric: $\frac{\partial (\mathbf{x}^{\mathsf T}\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = 2\mathbf{x}^{\mathsf T}\mathbf{A}$ (numerator), $2\mathbf{A}\mathbf{x}$ (denominator).
- $\mathbf{A}$ is not a function of $\mathbf{x}$: $\frac{\partial^2 (\mathbf{x}^{\mathsf T}\mathbf{A}\mathbf{x})}{\partial \mathbf{x} \partial \mathbf{x}^{\mathsf T}} = \mathbf{A} + \mathbf{A}^{\mathsf T}$.
- $\mathbf{A}$ is not a function of $\mathbf{x}$, $\mathbf{A}$ is symmetric: $\frac{\partial^2 (\mathbf{x}^{\mathsf T}\mathbf{A}\mathbf{x})}{\partial \mathbf{x} \partial \mathbf{x}^{\mathsf T}} = 2\mathbf{A}$.
- $\mathbf{a}$ is not a function of $\mathbf{x}$, $\mathbf{u} = \mathbf{u}(\mathbf{x})$: $\frac{\partial (\mathbf{a}^{\mathsf T}\mathbf{u})}{\partial \mathbf{x}} = \mathbf{a}^{\mathsf T}\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ in numerator layout, $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\mathbf{a}$ in denominator layout.
- $\mathbf{a}$, $\mathbf{b}$ are not functions of $\mathbf{x}$: $\frac{\partial (\mathbf{a}^{\mathsf T}\mathbf{x}\mathbf{x}^{\mathsf T}\mathbf{b})}{\partial \mathbf{x}} = \mathbf{x}^{\mathsf T}(\mathbf{a}\mathbf{b}^{\mathsf T} + \mathbf{b}\mathbf{a}^{\mathsf T})$ (numerator), $(\mathbf{a}\mathbf{b}^{\mathsf T} + \mathbf{b}\mathbf{a}^{\mathsf T})\mathbf{x}$ (denominator).
- $\mathbf{A}$, $\mathbf{b}$, $\mathbf{C}$, $\mathbf{D}$, $\mathbf{e}$ are not functions of $\mathbf{x}$: $\frac{\partial ((\mathbf{A}\mathbf{x} + \mathbf{b})^{\mathsf T}\mathbf{C}(\mathbf{D}\mathbf{x} + \mathbf{e}))}{\partial \mathbf{x}} = (\mathbf{D}\mathbf{x} + \mathbf{e})^{\mathsf T}\mathbf{C}^{\mathsf T}\mathbf{A} + (\mathbf{A}\mathbf{x} + \mathbf{b})^{\mathsf T}\mathbf{C}\mathbf{D}$ (numerator), $\mathbf{A}^{\mathsf T}\mathbf{C}(\mathbf{D}\mathbf{x} + \mathbf{e}) + \mathbf{D}^{\mathsf T}\mathbf{C}^{\mathsf T}(\mathbf{A}\mathbf{x} + \mathbf{b})$ (denominator).
- $\mathbf{a}$ is not a function of $\mathbf{x}$: $\frac{\partial \lVert \mathbf{x} - \mathbf{a} \rVert}{\partial \mathbf{x}} = \frac{(\mathbf{x} - \mathbf{a})^{\mathsf T}}{\lVert \mathbf{x} - \mathbf{a} \rVert}$ (numerator), $\frac{\mathbf{x} - \mathbf{a}}{\lVert \mathbf{x} - \mathbf{a} \rVert}$ (denominator).
Vector-by-scalar identities

Identities: vector-by-scalar (numerator layout: laid out by $\mathbf{y}$, result is a column vector; denominator layout: laid out by $\mathbf{y}^{\mathsf T}$, result is a row vector)

- $\mathbf{a}$ is not a function of $x$: $\frac{\partial \mathbf{a}}{\partial x} = \mathbf{0}$ [nb 1] (both layouts).
- $a$ is not a function of $x$, $\mathbf{u} = \mathbf{u}(x)$: $\frac{\partial (a\mathbf{u})}{\partial x} = a\frac{\partial \mathbf{u}}{\partial x}$ (both layouts).
- $\mathbf{A}$ is not a function of $x$, $\mathbf{u} = \mathbf{u}(x)$: $\frac{\partial (\mathbf{A}\mathbf{u})}{\partial x} = \mathbf{A}\frac{\partial \mathbf{u}}{\partial x}$ (numerator), $\frac{\partial \mathbf{u}}{\partial x}\mathbf{A}^{\mathsf T}$ (denominator).
- $\mathbf{u} = \mathbf{u}(x)$, $\mathbf{v} = \mathbf{v}(x)$: $\frac{\partial (\mathbf{u} + \mathbf{v})}{\partial x} = \frac{\partial \mathbf{u}}{\partial x} + \frac{\partial \mathbf{v}}{\partial x}$ (both layouts).
- $\mathbf{u} = \mathbf{u}(x)$, $\mathbf{v} = \mathbf{v}(x)$: $\frac{\partial (\mathbf{u} \times \mathbf{v})}{\partial x} = \frac{\partial \mathbf{u}}{\partial x} \times \mathbf{v} + \mathbf{u} \times \frac{\partial \mathbf{v}}{\partial x}$ (cross product).
- $\mathbf{u} = \mathbf{u}(x)$: $\frac{\partial \mathbf{g}(\mathbf{u})}{\partial x} = \frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial x}$ (numerator), $\frac{\partial \mathbf{u}}{\partial x}\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}$ (denominator). Assumes consistent matrix layout; see the note below.
- $\mathbf{u} = \mathbf{u}(x)$: $\frac{\partial \mathbf{f}(\mathbf{g}(\mathbf{u}))}{\partial x} = \frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial x}$ (numerator), $\frac{\partial \mathbf{u}}{\partial x}\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}$ (denominator). Assumes consistent matrix layout; see the note below.
- $\mathbf{U} = \mathbf{U}(x)$, $\mathbf{v} = \mathbf{v}(x)$: $\frac{\partial (\mathbf{U}\mathbf{v})}{\partial x} = \mathbf{U}\frac{\partial \mathbf{v}}{\partial x} + \frac{\partial \mathbf{U}}{\partial x}\mathbf{v}$ (numerator).

NOTE: The formulas involving the vector-by-vector derivatives $\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}$ and $\frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}$ (whose
outputs are matrices) assume the matrices are laid out consistent with the vector layout, i.e.
numerator-layout matrix when numerator-layout vector and vice versa; otherwise, transpose
the vector-by-vector derivatives.

Scalar-by-matrix identities
Note that exact equivalents of the scalar product rule and chain rule do not exist when
applied to matrix-valued functions of matrices. However, the product rule of this sort does
apply to the differential form (see below), and this is the way to derive many of the identities
below involving the trace function, combined with the fact that the trace function allows
transposing and cyclic permutation, i.e.:

$$\operatorname{tr}(\mathbf{A}) = \operatorname{tr}(\mathbf{A}^{\mathsf T}), \qquad \operatorname{tr}(\mathbf{ABCD}) = \operatorname{tr}(\mathbf{BCDA}) = \operatorname{tr}(\mathbf{CDAB}) = \operatorname{tr}(\mathbf{DABC}).$$

For example, to compute $\frac{\partial \operatorname{tr}(\mathbf{AXB})}{\partial \mathbf{X}}$:

$$d \operatorname{tr}(\mathbf{AXB}) = \operatorname{tr}(\mathbf{A}(d\mathbf{X})\mathbf{B}) = \operatorname{tr}(\mathbf{BA}\, d\mathbf{X}).$$

Therefore,

$$\frac{\partial \operatorname{tr}(\mathbf{AXB})}{\partial \mathbf{X}} = \mathbf{BA} \quad \text{(numerator layout)}, \qquad \frac{\partial \operatorname{tr}(\mathbf{AXB})}{\partial \mathbf{X}} = \mathbf{A}^{\mathsf T}\mathbf{B}^{\mathsf T} \quad \text{(denominator layout)}.$$

(For the last step, see the Conversion from differential to derivative form section.)
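A numerical sanity check of this result (a sketch; the shapes are arbitrary). The entrywise finite-difference gradient is laid out like $\mathbf{X}$, i.e. in denominator layout, so it should match $(\mathbf{BA})^{\mathsf T} = \mathbf{A}^{\mathsf T}\mathbf{B}^{\mathsf T}$:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

f = lambda X: np.trace(A @ X @ B)

# Entrywise finite differences give the denominator-layout gradient (shape of X).
h = 1e-6
numeric = np.zeros_like(X)
for i in range(3):
    for j in range(4):
        E = np.zeros_like(X); E[i, j] = h
        numeric[i, j] = (f(X + E) - f(X - E)) / (2 * h)

print(np.allclose(numeric, (B @ A).T))  # True: denominator layout = (BA)^T = A^T B^T
```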
Identities: scalar-by-matrix (numerator layout: result laid out by $\mathbf{X}^{\mathsf T}$; denominator layout: result laid out by $\mathbf{X}$)

- $a$ is not a function of $\mathbf{X}$: $\frac{\partial a}{\partial \mathbf{X}} = \mathbf{0}$ [nb 2] (both layouts).
- $a$ is not a function of $\mathbf{X}$, $u = u(\mathbf{X})$: $\frac{\partial (au)}{\partial \mathbf{X}} = a\frac{\partial u}{\partial \mathbf{X}}$ (both layouts).
- $u = u(\mathbf{X})$, $v = v(\mathbf{X})$: $\frac{\partial (u+v)}{\partial \mathbf{X}} = \frac{\partial u}{\partial \mathbf{X}} + \frac{\partial v}{\partial \mathbf{X}}$ (both layouts).
- $u = u(\mathbf{X})$, $v = v(\mathbf{X})$: $\frac{\partial (uv)}{\partial \mathbf{X}} = u\frac{\partial v}{\partial \mathbf{X}} + v\frac{\partial u}{\partial \mathbf{X}}$ (both layouts).
- $u = u(\mathbf{X})$: $\frac{\partial g(u)}{\partial \mathbf{X}} = \frac{\partial g(u)}{\partial u}\frac{\partial u}{\partial \mathbf{X}}$ (both layouts).
- $u = u(\mathbf{X})$: $\frac{\partial f(g(u))}{\partial \mathbf{X}} = \frac{\partial f(g)}{\partial g}\frac{\partial g(u)}{\partial u}\frac{\partial u}{\partial \mathbf{X}}$ (both layouts).
- $\mathbf{U} = \mathbf{U}(\mathbf{X})$:[3] $\frac{\partial g(\mathbf{U})}{\partial x_{ij}} = \operatorname{tr}\left(\frac{\partial g(\mathbf{U})}{\partial \mathbf{U}}\frac{\partial \mathbf{U}}{\partial x_{ij}}\right)$ (numerator), $\operatorname{tr}\left(\left(\frac{\partial g(\mathbf{U})}{\partial \mathbf{U}}\right)^{\mathsf T}\frac{\partial \mathbf{U}}{\partial x_{ij}}\right)$ (denominator). Both forms assume numerator layout for $\frac{\partial \mathbf{U}}{\partial x_{ij}}$, i.e. mixed layout if denominator layout for $\mathbf{X}$ is being used.
- $\mathbf{a}$ and $\mathbf{b}$ are not functions of $\mathbf{X}$: $\frac{\partial (\mathbf{a}^{\mathsf T}\mathbf{X}\mathbf{b})}{\partial \mathbf{X}} = \mathbf{b}\mathbf{a}^{\mathsf T}$ (numerator), $\mathbf{a}\mathbf{b}^{\mathsf T}$ (denominator).
- $\mathbf{a}$ and $\mathbf{b}$ are not functions of $\mathbf{X}$: $\frac{\partial (\mathbf{a}^{\mathsf T}\mathbf{X}^{\mathsf T}\mathbf{b})}{\partial \mathbf{X}} = \mathbf{a}\mathbf{b}^{\mathsf T}$ (numerator), $\mathbf{b}\mathbf{a}^{\mathsf T}$ (denominator).
- $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{C}$ are not functions of $\mathbf{X}$: $\frac{\partial ((\mathbf{X}\mathbf{a} + \mathbf{b})^{\mathsf T}\mathbf{C}(\mathbf{X}\mathbf{a} + \mathbf{b}))}{\partial \mathbf{X}} = \mathbf{a}(\mathbf{X}\mathbf{a} + \mathbf{b})^{\mathsf T}(\mathbf{C} + \mathbf{C}^{\mathsf T})$ (numerator), $(\mathbf{C} + \mathbf{C}^{\mathsf T})(\mathbf{X}\mathbf{a} + \mathbf{b})\mathbf{a}^{\mathsf T}$ (denominator).
- $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{C}$ are not functions of $\mathbf{X}$: $\frac{\partial (\mathbf{a}^{\mathsf T}\mathbf{X}^{\mathsf T}\mathbf{C}\mathbf{X}\mathbf{b})}{\partial \mathbf{X}} = \mathbf{a}\mathbf{b}^{\mathsf T}\mathbf{X}^{\mathsf T}\mathbf{C}^{\mathsf T} + \mathbf{b}\mathbf{a}^{\mathsf T}\mathbf{X}^{\mathsf T}\mathbf{C}$ (numerator), $\mathbf{C}\mathbf{X}\mathbf{b}\mathbf{a}^{\mathsf T} + \mathbf{C}^{\mathsf T}\mathbf{X}\mathbf{a}\mathbf{b}^{\mathsf T}$ (denominator).

Trace identities:

- $g(\mathbf{X})$ is any polynomial with scalar coefficients, or any matrix function defined by an infinite polynomial series (e.g. $e^{\mathbf{X}}$, $\sin(\mathbf{X})$, $\cos(\mathbf{X})$, $\ln(\mathbf{X})$, etc. using a Taylor series); $g(x)$ is the equivalent scalar function, $g'(x)$ is its derivative, and $g'(\mathbf{X})$ is the corresponding matrix function: $\frac{\partial \operatorname{tr}(g(\mathbf{X}))}{\partial \mathbf{X}} = g'(\mathbf{X})$ (numerator), $g'(\mathbf{X})^{\mathsf T}$ (denominator).[4]
- $\mathbf{A}$ is not a function of $\mathbf{X}$: $\frac{\partial \operatorname{tr}(\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \frac{\partial \operatorname{tr}(\mathbf{X}\mathbf{A})}{\partial \mathbf{X}} = \mathbf{A}$ (numerator), $\mathbf{A}^{\mathsf T}$ (denominator).[3]
- $\mathbf{A}$ is not a function of $\mathbf{X}$: $\frac{\partial \operatorname{tr}(\mathbf{A}\mathbf{X}^{\mathsf T})}{\partial \mathbf{X}} = \frac{\partial \operatorname{tr}(\mathbf{X}^{\mathsf T}\mathbf{A})}{\partial \mathbf{X}} = \mathbf{A}^{\mathsf T}$ (numerator), $\mathbf{A}$ (denominator).[3]
- $\mathbf{A}$ is not a function of $\mathbf{X}$: $\frac{\partial \operatorname{tr}(\mathbf{X}^{\mathsf T}\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \mathbf{X}^{\mathsf T}(\mathbf{A} + \mathbf{A}^{\mathsf T})$ (numerator), $(\mathbf{A} + \mathbf{A}^{\mathsf T})\mathbf{X}$ (denominator).[3]
- $\mathbf{A}$ is not a function of $\mathbf{X}$: $\frac{\partial \operatorname{tr}(\mathbf{X}^{-1}\mathbf{A})}{\partial \mathbf{X}} = -\mathbf{X}^{-1}\mathbf{A}\mathbf{X}^{-1}$ (numerator), $-(\mathbf{X}^{-1}\mathbf{A}\mathbf{X}^{-1})^{\mathsf T}$ (denominator).[3]
- $\mathbf{A}$, $\mathbf{B}$ are not functions of $\mathbf{X}$: $\frac{\partial \operatorname{tr}(\mathbf{A}\mathbf{X}\mathbf{B})}{\partial \mathbf{X}} = \mathbf{B}\mathbf{A}$ (numerator), $\mathbf{A}^{\mathsf T}\mathbf{B}^{\mathsf T}$ (denominator).
- $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$ are not functions of $\mathbf{X}$: $\frac{\partial \operatorname{tr}(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^{\mathsf T}\mathbf{C})}{\partial \mathbf{X}} = \mathbf{B}\mathbf{X}^{\mathsf T}\mathbf{C}\mathbf{A} + \mathbf{B}^{\mathsf T}\mathbf{X}^{\mathsf T}\mathbf{A}^{\mathsf T}\mathbf{C}^{\mathsf T}$ (numerator), $\mathbf{A}^{\mathsf T}\mathbf{C}^{\mathsf T}\mathbf{X}\mathbf{B}^{\mathsf T} + \mathbf{C}\mathbf{A}\mathbf{X}\mathbf{B}$ (denominator).[3]
- $n$ is a positive integer: $\frac{\partial \operatorname{tr}(\mathbf{X}^n)}{\partial \mathbf{X}} = n\mathbf{X}^{n-1}$ (numerator), $n(\mathbf{X}^{n-1})^{\mathsf T}$ (denominator).[3]
- $\mathbf{A}$ is not a function of $\mathbf{X}$, $n$ is a positive integer: $\frac{\partial \operatorname{tr}(\mathbf{A}\mathbf{X}^n)}{\partial \mathbf{X}} = \sum_{i=0}^{n-1}\mathbf{X}^i\mathbf{A}\mathbf{X}^{n-i-1}$ (numerator).[3]
- $\frac{\partial \operatorname{tr}(e^{\mathbf{X}})}{\partial \mathbf{X}} = e^{\mathbf{X}}$ (numerator), $(e^{\mathbf{X}})^{\mathsf T}$ (denominator).[3]
- $\frac{\partial \operatorname{tr}(\sin(\mathbf{X}))}{\partial \mathbf{X}} = \cos(\mathbf{X})$ (numerator).[3]

Determinant identities:

- $\frac{\partial \det(\mathbf{X})}{\partial \mathbf{X}} = \det(\mathbf{X})\mathbf{X}^{-1}$ (numerator), $\det(\mathbf{X})(\mathbf{X}^{-1})^{\mathsf T}$ (denominator, the cofactor matrix).[5]
- $a$ is not a function of $\mathbf{X}$: $\frac{\partial \ln \det(a\mathbf{X})}{\partial \mathbf{X}} = \mathbf{X}^{-1}$ (numerator), $(\mathbf{X}^{-1})^{\mathsf T}$ (denominator).[3] [nb 3]
- $\mathbf{A}$, $\mathbf{B}$ are not functions of $\mathbf{X}$: $\frac{\partial \det(\mathbf{A}\mathbf{X}\mathbf{B})}{\partial \mathbf{X}} = \det(\mathbf{A}\mathbf{X}\mathbf{B})\mathbf{X}^{-1}$ (numerator), $\det(\mathbf{A}\mathbf{X}\mathbf{B})(\mathbf{X}^{-1})^{\mathsf T}$ (denominator).[3]
- $n$ is a positive integer: $\frac{\partial \det(\mathbf{X}^n)}{\partial \mathbf{X}} = n\det(\mathbf{X}^n)\mathbf{X}^{-1}$ (numerator).[3]
- (see pseudo-inverse): $\frac{\partial \ln \det(\mathbf{X}^{\mathsf T}\mathbf{X})}{\partial \mathbf{X}} = 2\mathbf{X}^{+}$ (numerator), $2(\mathbf{X}^{+})^{\mathsf T}$ (denominator).[3]
- (see pseudo-inverse): $\frac{\partial \det(\mathbf{X}^{\mathsf T}\mathbf{X})}{\partial \mathbf{X}} = 2\det(\mathbf{X}^{\mathsf T}\mathbf{X})\mathbf{X}^{+}$ (numerator).[3]
- $\mathbf{A}$ is not a function of $\mathbf{X}$, $\mathbf{X}$ is square and invertible: $\frac{\partial \det(\mathbf{X}^{\mathsf T}\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = 2\det(\mathbf{X}^{\mathsf T}\mathbf{A}\mathbf{X})\mathbf{X}^{-1}$ (numerator), $2\det(\mathbf{X}^{\mathsf T}\mathbf{A}\mathbf{X})(\mathbf{X}^{-1})^{\mathsf T}$ (denominator).
- $\mathbf{A}$ is not a function of $\mathbf{X}$, $\mathbf{X}$ is non-square, $\mathbf{A}$ is symmetric: $\frac{\partial \det(\mathbf{X}^{\mathsf T}\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = 2\det(\mathbf{X}^{\mathsf T}\mathbf{A}\mathbf{X})(\mathbf{X}^{\mathsf T}\mathbf{A}\mathbf{X})^{-1}\mathbf{X}^{\mathsf T}\mathbf{A}$ (numerator).
- $\mathbf{A}$ is not a function of $\mathbf{X}$, $\mathbf{X}$ is non-square, $\mathbf{A}$ is non-symmetric: $\frac{\partial \det(\mathbf{X}^{\mathsf T}\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \det(\mathbf{X}^{\mathsf T}\mathbf{A}\mathbf{X})\left((\mathbf{X}^{\mathsf T}\mathbf{A}\mathbf{X})^{-1}\mathbf{X}^{\mathsf T}\mathbf{A} + (\mathbf{X}^{\mathsf T}\mathbf{A}^{\mathsf T}\mathbf{X})^{-1}\mathbf{X}^{\mathsf T}\mathbf{A}^{\mathsf T}\right)$ (numerator).
Matrix-by-scalar identities

Identities: matrix-by-scalar (numerator layout, i.e. laid out by $\mathbf{Y}$)

- $\mathbf{U} = \mathbf{U}(x)$: $\frac{\partial (a\mathbf{U})}{\partial x} = a\frac{\partial \mathbf{U}}{\partial x}$.
- $\mathbf{A}$, $\mathbf{B}$ are not functions of $x$, $\mathbf{U} = \mathbf{U}(x)$: $\frac{\partial (\mathbf{A}\mathbf{U}\mathbf{B})}{\partial x} = \mathbf{A}\frac{\partial \mathbf{U}}{\partial x}\mathbf{B}$.
- $\mathbf{U} = \mathbf{U}(x)$, $\mathbf{V} = \mathbf{V}(x)$: $\frac{\partial (\mathbf{U} + \mathbf{V})}{\partial x} = \frac{\partial \mathbf{U}}{\partial x} + \frac{\partial \mathbf{V}}{\partial x}$.
- $\mathbf{U} = \mathbf{U}(x)$, $\mathbf{V} = \mathbf{V}(x)$: $\frac{\partial (\mathbf{U}\mathbf{V})}{\partial x} = \mathbf{U}\frac{\partial \mathbf{V}}{\partial x} + \frac{\partial \mathbf{U}}{\partial x}\mathbf{V}$.
- $\mathbf{U} = \mathbf{U}(x)$, $\mathbf{V} = \mathbf{V}(x)$: $\frac{\partial (\mathbf{U} \otimes \mathbf{V})}{\partial x} = \mathbf{U} \otimes \frac{\partial \mathbf{V}}{\partial x} + \frac{\partial \mathbf{U}}{\partial x} \otimes \mathbf{V}$ (Kronecker product).
- $\mathbf{U} = \mathbf{U}(x)$, $\mathbf{V} = \mathbf{V}(x)$: $\frac{\partial (\mathbf{U} \circ \mathbf{V})}{\partial x} = \mathbf{U} \circ \frac{\partial \mathbf{V}}{\partial x} + \frac{\partial \mathbf{U}}{\partial x} \circ \mathbf{V}$ (Hadamard product).
- $\mathbf{U} = \mathbf{U}(x)$: $\frac{\partial \mathbf{U}^{-1}}{\partial x} = -\mathbf{U}^{-1}\frac{\partial \mathbf{U}}{\partial x}\mathbf{U}^{-1}$.
- $\mathbf{U} = \mathbf{U}(x, y)$: $\frac{\partial^2 \mathbf{U}^{-1}}{\partial x \partial y} = \mathbf{U}^{-1}\left(\frac{\partial \mathbf{U}}{\partial x}\mathbf{U}^{-1}\frac{\partial \mathbf{U}}{\partial y} - \frac{\partial^2 \mathbf{U}}{\partial x \partial y} + \frac{\partial \mathbf{U}}{\partial y}\mathbf{U}^{-1}\frac{\partial \mathbf{U}}{\partial x}\right)\mathbf{U}^{-1}$.
- $\mathbf{A}$ is not a function of $x$, $g(\mathbf{X})$ is any polynomial with scalar coefficients, or any matrix function defined by an infinite polynomial series (e.g. $e^{\mathbf{X}}$, $\sin(\mathbf{X})$, $\cos(\mathbf{X})$, $\ln(\mathbf{X})$, etc.); $g(x)$ is the equivalent scalar function, $g'(x)$ is its derivative, and $g'(\mathbf{X})$ is the corresponding matrix function: $\frac{\partial g(x\mathbf{A})}{\partial x} = \mathbf{A}g'(x\mathbf{A}) = g'(x\mathbf{A})\mathbf{A}$.
- $\mathbf{A}$ is not a function of $x$: $\frac{\partial e^{x\mathbf{A}}}{\partial x} = \mathbf{A}e^{x\mathbf{A}} = e^{x\mathbf{A}}\mathbf{A}$.
Scalar-by-scalar identities

With vectors involved

Identities: scalar-by-scalar, with vectors involved (any layout, assuming the dot product ignores row vs. column layout)

- $\mathbf{u} = \mathbf{u}(x)$: $\frac{\partial g(\mathbf{u})}{\partial x} = \frac{\partial g(\mathbf{u})}{\partial \mathbf{u}} \cdot \frac{\partial \mathbf{u}}{\partial x}$.
- $\mathbf{u} = \mathbf{u}(x)$, $\mathbf{v} = \mathbf{v}(x)$: $\frac{\partial (\mathbf{u} \cdot \mathbf{v})}{\partial x} = \mathbf{u} \cdot \frac{\partial \mathbf{v}}{\partial x} + \frac{\partial \mathbf{u}}{\partial x} \cdot \mathbf{v}$.
With matrices involved

Identities: scalar-by-scalar, with matrices involved[3] (consistent numerator layout takes $\frac{\partial g(\mathbf{U})}{\partial \mathbf{U}}$ laid out by $\mathbf{X}^{\mathsf T}$; mixed layout takes it laid out by $\mathbf{X}$)

- $\mathbf{U} = \mathbf{U}(x)$: $\frac{\partial \operatorname{tr}(\mathbf{U})}{\partial x} = \operatorname{tr}\left(\frac{\partial \mathbf{U}}{\partial x}\right)$.
- $\mathbf{U} = \mathbf{U}(x)$: $\frac{\partial \det(\mathbf{U})}{\partial x} = \det(\mathbf{U})\operatorname{tr}\left(\mathbf{U}^{-1}\frac{\partial \mathbf{U}}{\partial x}\right)$.
- $\mathbf{U} = \mathbf{U}(x)$: $\frac{\partial \ln \det(\mathbf{U})}{\partial x} = \operatorname{tr}\left(\mathbf{U}^{-1}\frac{\partial \mathbf{U}}{\partial x}\right)$.
- $\mathbf{U} = \mathbf{U}(x)$: $\frac{\partial g(\mathbf{U})}{\partial x} = \operatorname{tr}\left(\frac{\partial g(\mathbf{U})}{\partial \mathbf{U}}\frac{\partial \mathbf{U}}{\partial x}\right)$ in consistent numerator layout, $\operatorname{tr}\left(\left(\frac{\partial g(\mathbf{U})}{\partial \mathbf{U}}\right)^{\mathsf T}\frac{\partial \mathbf{U}}{\partial x}\right)$ in mixed layout.
- $\mathbf{A}$ is not a function of $x$, $g(\mathbf{X})$ is any polynomial with scalar coefficients, or any matrix function defined by an infinite polynomial series (e.g. $e^{\mathbf{X}}$, $\sin(\mathbf{X})$, $\cos(\mathbf{X})$, $\ln(\mathbf{X})$, etc.); $g(x)$ is the equivalent scalar function, $g'(x)$ is its derivative, and $g'(\mathbf{X})$ is the corresponding matrix function: $\frac{\partial \operatorname{tr}(g(x\mathbf{A}))}{\partial x} = \operatorname{tr}(\mathbf{A}g'(x\mathbf{A}))$.
- $\mathbf{A}$ is not a function of $x$: $\frac{\partial \operatorname{tr}(e^{x\mathbf{A}})}{\partial x} = \operatorname{tr}(\mathbf{A}e^{x\mathbf{A}})$.

Identities in differential form


It is often easier to work in differential form and then convert back to normal derivatives. This
only works well using the numerator layout. In these rules, a is a scalar.

Differential identities: scalar involving matrix[1][3]

- $d(\operatorname{tr}(\mathbf{X})) = \operatorname{tr}(d\mathbf{X})$
- $d(\det(\mathbf{X})) = \det(\mathbf{X})\operatorname{tr}(\mathbf{X}^{-1}\,d\mathbf{X})$
- $d(\ln \det(\mathbf{X})) = \operatorname{tr}(\mathbf{X}^{-1}\,d\mathbf{X})$

Differential identities: matrix[1][3][6][7]

- $\mathbf{A}$ is not a function of $\mathbf{X}$: $d\mathbf{A} = \mathbf{0}$
- $a$ is not a function of $\mathbf{X}$: $d(a\mathbf{X}) = a\,d\mathbf{X}$
- $d(\mathbf{X} + \mathbf{Y}) = d\mathbf{X} + d\mathbf{Y}$
- $d(\mathbf{X}\mathbf{Y}) = (d\mathbf{X})\mathbf{Y} + \mathbf{X}(d\mathbf{Y})$
- $d(\mathbf{X} \otimes \mathbf{Y}) = (d\mathbf{X}) \otimes \mathbf{Y} + \mathbf{X} \otimes (d\mathbf{Y})$ (Kronecker product)
- $d(\mathbf{X} \circ \mathbf{Y}) = (d\mathbf{X}) \circ \mathbf{Y} + \mathbf{X} \circ (d\mathbf{Y})$ (Hadamard product)
- $d(\mathbf{X}^{\mathsf T}) = (d\mathbf{X})^{\mathsf T}$
- $d(\mathbf{X}^{\mathsf H}) = (d\mathbf{X})^{\mathsf H}$ (conjugate transpose)
- $d(\mathbf{X}^{-1}) = -\mathbf{X}^{-1}(d\mathbf{X})\mathbf{X}^{-1}$
- $n$ is a positive integer: $d(\mathbf{X}^n) = \sum_{i=0}^{n-1}\mathbf{X}^i(d\mathbf{X})\mathbf{X}^{n-i-1}$
- $d(e^{\mathbf{X}}) = \int_0^1 e^{a\mathbf{X}}(d\mathbf{X})\,e^{(1-a)\mathbf{X}}\,da$
- $\mathbf{X}$ is diagonalizable: $d(\ln \mathbf{X}) = \int_0^\infty (\mathbf{X} + z\mathbf{I})^{-1}(d\mathbf{X})(\mathbf{X} + z\mathbf{I})^{-1}\,dz$
- $f$ is differentiable at every eigenvalue: $d\,f(\mathbf{X}) = \sum_{k,l} \mathbf{P}_k\,(d\mathbf{X})\,\mathbf{P}_l \left( \delta_{kl}\,f'(\lambda_k) + (1 - \delta_{kl})\,\frac{f(\lambda_k) - f(\lambda_l)}{\lambda_k - \lambda_l} \right)$

In the last row, $\delta_{kl}$ is the Kronecker delta and $\{\mathbf{P}_k\}$ is the set of
orthogonal projection operators that project onto the k-th eigenvector of $\mathbf{X}$. $\mathbf{Q}$ is the matrix of
eigenvectors of $\mathbf{X} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^{-1}$, and the $\lambda_i$ are the eigenvalues. The matrix function $f(\mathbf{X})$
is defined in terms of the scalar function $f(x)$ for diagonalizable matrices by $f(\mathbf{X}) = \mathbf{Q}\,f(\boldsymbol{\Lambda})\,\mathbf{Q}^{-1}$,
where $f(\boldsymbol{\Lambda}) = \operatorname{diag}(f(\lambda_1), \ldots, f(\lambda_n))$ with $\boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$.
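A sketch of this definition for the matrix exponential, checked against SciPy's expm (a symmetric $\mathbf{X}$ is used so that it is guaranteed diagonalizable with orthogonal $\mathbf{Q}$):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
S = rng.standard_normal((3, 3))
X = S + S.T                      # symmetric => diagonalizable, Q orthogonal

lam, Q = np.linalg.eigh(X)       # X = Q diag(lam) Q^T
f_of_X = Q @ np.diag(np.exp(lam)) @ Q.T   # f(X) = Q f(Lambda) Q^{-1} with f = exp

print(np.allclose(f_of_X, expm(X)))  # True
```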

To convert to normal derivative form, first convert it to one of the following canonical forms,
and then use these identities:

Conversion from differential to derivative form[1] (all derivative forms in numerator layout):

- $dy = a\,dx \quad \Leftrightarrow \quad \frac{dy}{dx} = a$
- $d\mathbf{y} = \mathbf{a}\,dx \quad \Leftrightarrow \quad \frac{\partial \mathbf{y}}{\partial x} = \mathbf{a}$
- $d\mathbf{Y} = \mathbf{A}\,dx \quad \Leftrightarrow \quad \frac{\partial \mathbf{Y}}{\partial x} = \mathbf{A}$
- $dy = \mathbf{a}\,d\mathbf{x} \quad \Leftrightarrow \quad \frac{\partial y}{\partial \mathbf{x}} = \mathbf{a}$ (here $\mathbf{a}$ is a row vector)
- $d\mathbf{y} = \mathbf{A}\,d\mathbf{x} \quad \Leftrightarrow \quad \frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \mathbf{A}$
- $dy = \operatorname{tr}(\mathbf{A}\,d\mathbf{X}) \quad \Leftrightarrow \quad \frac{\partial y}{\partial \mathbf{X}} = \mathbf{A}$

Applications
Matrix differential calculus is used in statistics and econometrics, particularly for the
statistical analysis of multivariate distributions, especially the multivariate normal distribution
and other elliptical distributions.[8][9][10]

It is used in regression analysis to compute, for example, the ordinary least squares
regression formula for the case of multiple explanatory variables.[11] It is also used in random
matrices, statistical moments, local sensitivity and statistical diagnostics.[12][13]
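As a sketch of the regression use case: setting the scalar-by-vector derivative of the residual sum of squares $\lVert \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \rVert^2$ to zero yields the normal equations $\mathbf{X}^{\mathsf T}\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^{\mathsf T}\mathbf{y}$, which the code below solves and checks against a library solver (illustrative data):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 3))            # design matrix, 3 explanatory variables
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.01 * rng.standard_normal(100)

# d/dbeta ||y - X beta||^2 = -2 X^T (y - X beta) = 0  =>  X^T X beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))     # True
```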
See also


Derivative (generalizations)
Product integral
Ricci calculus

Notes

1. Here, $\mathbf{0}$ refers to a column vector of all 0's, of size n, where n is the length of $\mathbf{x}$.
2. Here, $\mathbf{0}$ refers to a matrix of all 0's, of the same shape as $\mathbf{X}$.
3. The constant $a$ disappears in the result. This is intentional. In general, $\frac{d \ln(au)}{dx} = \frac{1}{au}\frac{d(au)}{dx} = \frac{1}{au}\left(a\frac{du}{dx}\right) = \frac{1}{u}\frac{du}{dx} = \frac{d \ln(u)}{dx}$.

References

1. Minka, Thomas P. (December 28, 2000). "Old and New Matrix Algebra Useful for Statistics" (http://research.microsoft.com/en-us/um/people/minka/papers/matrix/). MIT Media Lab note (1997; revised 12/00). Retrieved 5 February 2016.
2. Felippa, Carlos A. "Appendix D, Linear Algebra: Determinants, Inverses, Rank" (http://www.colorado.edu/engineering/cas/courses.d/IFEM.d/IFEM.AppD.d/IFEM.AppD.pdf) (PDF). ASEN 5007: Introduction To Finite Element Methods. Boulder, Colorado: University of Colorado. Retrieved 5 February 2016. Uses the Hessian (transpose to Jacobian) definition of vector and matrix derivatives.
3. Petersen, Kaare Brandt; Pedersen, Michael Syskind. The Matrix Cookbook (https://web.archive.org/web/20100302210536/http://www.imm.dtu.dk/pubdb/views/edoc_download.php/3274/pdf/imm3274.pdf) (PDF). Archived from the original (http://matrixcookbook.com) on 2 March 2010. Retrieved 5 February 2016. This book uses a mixed layout, i.e. by $\mathbf{Y}$ in $\frac{\partial \mathbf{Y}}{\partial x}$ and by $\mathbf{X}$ in $\frac{\partial y}{\partial \mathbf{X}}$.
4. Duchi, John C. "Properties of the Trace and Matrix Derivatives" (https://web.stanford.edu/~jduchi/projects/matrix_prop.pdf) (PDF). Stanford University. Retrieved 5 February 2016.
5. See Determinant § Derivative for the derivation.
6. Giles, Michael B. (2008). "An extended collection of matrix derivative results for forward and reverse mode algorithmic differentiation" (https://web.archive.org/web/20200227075201/https://pdfs.semanticscholar.org/c74c/5e11ed05246c12165ce7e4b6222bd32d68dc.pdf) (PDF). S2CID 17431500 (https://api.semanticscholar.org/CorpusID:17431500). Archived from the original (https://pdfs.semanticscholar.org/c74c/5e11ed05246c12165ce7e4b6222bd32d68dc.pdf) (PDF) on 2020-02-27.
7. Unpublished memo (https://www.ias.edu/sites/default/files/sns/files/1-matrixlog_tex(1).pdf) by S. Adler (IAS).
8. Fang, Kai-Tai; Zhang, Yao-Ting (1990). Generalized multivariate analysis. Science Press (Beijing) and Springer-Verlag (Berlin). ISBN 3540176519, 9783540176510.
9. Pan, Jianxin; Fang, Kaitai (2007). Growth curve models and statistical diagnostics. Beijing: Science Press. ISBN 9780387950532.
10. Kollo, Tõnu; von Rosen, Dietrich (2005). Advanced multivariate statistics with matrices. Dordrecht: Springer. ISBN 978-1-4020-3418-3.
11. Magnus, Jan; Neudecker, Heinz (2019). Matrix differential calculus with applications in statistics and econometrics. New York: John Wiley. ISBN 9781119541202.
12. Liu, Shuangzhe; Leiva, Victor; Zhuang, Dan; Ma, Tiefeng; Figueroa-Zúñiga, Jorge I. (2022). "Matrix differential calculus with applications in the multivariate linear model and its diagnostics" (https://doi.org/10.1016%2Fj.jmva.2021.104849). Journal of Multivariate Analysis. 188: 104849. doi:10.1016/j.jmva.2021.104849.
13. Liu, Shuangzhe; Trenkler, Götz; Kollo, Tõnu; von Rosen, Dietrich; Baksalary, Oskar Maria (2023). "Professor Heinz Neudecker and matrix differential calculus". Statistical Papers. doi:10.1007/s00362-023-01499-w (https://doi.org/10.1007%2Fs00362-023-01499-w). S2CID 263661094 (https://api.semanticscholar.org/CorpusID:263661094).

Further reading

- Abadir, Karim M.; Magnus, Jan R. (2005). Matrix algebra. Econometric Exercises. Cambridge: Cambridge University Press. ISBN 978-0-511-64796-3. OCLC 569411497 (https://www.worldcat.org/oclc/569411497).
- Lax, Peter D. (2007). "9. Calculus of Vector- and Matrix-Valued Functions". Linear algebra and its applications (2nd ed.). Hoboken, N.J.: Wiley-Interscience. ISBN 978-0-471-75156-4.
- Magnus, Jan R. (October 2010). "On the concept of matrix derivative". Journal of Multivariate Analysis. 101 (9): 2200–2206. doi:10.1016/j.jmva.2010.05.005 (https://doi.org/10.1016%2Fj.jmva.2010.05.005). Note that this Wikipedia article has been nearly completely revised from the version criticized in this article.
External links

Software

- MatrixCalculus.org (http://www.matrixcalculus.org/), a website for evaluating matrix calculus expressions symbolically
- NCAlgebra (https://math.ucsd.edu/~ncalg/), an open-source Mathematica package that has some matrix calculus functionality
- SymPy supports symbolic matrix derivatives in its matrix expression module (https://docs.sympy.org/latest/modules/matrices/expressions.html), as well as symbolic tensor derivatives in its array expression module (https://docs.sympy.org/latest/modules/tensor/array_expressions.html).

Information

- Matrix Reference Manual (https://web.archive.org/web/20120630192238/http://www.psi.toronto.edu/matrix/calculus.html), Mike Brookes, Imperial College London.
- Matrix Differentiation (and some other stuff) (http://www.atmos.washington.edu/~dennis/MatrixCalculus.pdf), Randal J. Barnes, Department of Civil Engineering, University of Minnesota.
- Notes on Matrix Calculus (http://www4.ncsu.edu/~pfackler/MatCalc.pdf), Paul L. Fackler, North Carolina State University.
- Matrix Differential Calculus (https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/ListenSemester1_2006_7/slide.pdf), Archived (https://web.archive.org/web/20120916044332/https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/ListenSemester1_2006_7/slide.pdf) 2012-09-16 at the Wayback Machine (slide presentation), Zhang Le, University of Edinburgh.
- Introduction to Vector and Matrix Differentiation (https://web.archive.org/web/20120526142207/http://www.econ.ku.dk/metrics/Econometrics2_05_II/LectureNotes/matrixdiff.pdf) (notes on matrix differentiation, in the context of Econometrics), Heino Bohn Nielsen.
- A note on differentiating matrices (http://mpra.ub.uni-muenchen.de/1239/1/MPRA_paper_1239.pdf) (notes on matrix differentiation), Pawel Koval, from Munich Personal RePEc Archive.
- Vector/Matrix Calculus (http://www.personal.rdg.ac.uk/~sis01xh/teaching/CY4C9/ANN3.pdf), more notes on matrix differentiation.
- Matrix Identities (http://www.cs.nyu.edu/~roweis/notes/matrixid.pdf) (notes on matrix differentiation), Sam Roweis.