
CSC 576: Mathematical Foundations I

Ji Liu
Department of Computer Sciences, University of Rochester

September 20, 2016

1 Notations and Assumptions


In most cases (unless defined locally), we use
• Greek letters such as α, β, and γ to denote real numbers;

• lowercase letters such as x, y, and z to denote vectors;

• capital letters to denote matrices, e.g., A, B, and C.


Other notations:
• R is the one-dimensional Euclidean space;

• Rn is the n-dimensional Euclidean vector space;

• Rm×n is the Euclidean space of m × n matrices;

• R+ denotes the range [0, +∞);

• 1n ∈ Rn denotes a vector with 1 in all entries;

• For any vector x ∈ Rn, we use |x| to denote the entrywise absolute vector, that is, |x|i = |xi| for all i = 1, · · · , n;

• ⊙ denotes the component-wise (Hadamard) product, that is, for any vectors x and y, (x ⊙ y)i = xi yi.
Some assumptions:
• Unless explicitly defined otherwise, we always assume that all vectors are column vectors.

2 Vector norms, Inner product


A function f : Rn → R+ is called a “norm” if the following three conditions are satisfied:
• (Zero element) f(x) ≥ 0, and f(x) = 0 if and only if x = 0;

• (Homogeneity) for any α ∈ R and x ∈ Rn, f(αx) = |α|f(x);

• (Triangle inequality) any x, y ∈ Rn satisfy f(x) + f(y) ≥ f(x + y).

The ℓ2 norm “‖ · ‖2” (a special “f(·)”) in Rn is defined as

‖x‖2 = (|x1|² + |x2|² + · · · + |xn|²)^{1/2}.

Because ℓ2 is the most commonly used norm (it is also known as the Euclidean norm), we sometimes denote it by ‖ · ‖ for short. (Think about it: is f([x1, x2]) = 2x1² + x2² a norm?)
A general ℓp norm (p ≥ 1) is defined as

‖x‖p = (|x1|^p + |x2|^p + · · · + |xn|^p)^{1/p}.

Note that for p < 1, this is not a “norm” since the triangle inequality is violated. The ℓ∞ norm is defined as

‖x‖∞ = max{|x1|, |x2|, · · · , |xn|}.

One may notice that the ℓ∞ norm is the limit of the ℓp norms, that is, for any x ∈ Rn, ‖x‖∞ = lim_{p→+∞} ‖x‖p. In addition, people use ‖x‖0 to denote the ℓ0 “norm”, which counts the number of nonzero entries of x (it is not a true norm).
The inner product ⟨·, ·⟩ in Rn is defined as

⟨x, y⟩ = Σ_i xi yi.

One can show that ⟨x, x⟩ = ‖x‖². Two vectors x and y are orthogonal if ⟨x, y⟩ = 0. That is one reason why the ℓ2 norm is so special.
If p ≥ q ≥ 1, then for any x ∈ Rn we have ‖x‖p ≤ ‖x‖q. In particular, we have

‖x‖1 ≥ ‖x‖2 ≥ ‖x‖∞.

To bound from the other side, we have

‖x‖1 ≤ √n ‖x‖2 and ‖x‖2 ≤ √n ‖x‖∞.

Proof. To see the first one, we have

‖x‖1 = ⟨1n, |x|⟩ ≤ ‖1n‖2 ‖|x|‖2 = √n ‖x‖2,

where the inequality uses the Cauchy–Schwarz inequality. The proof of the second inequality is left as homework.
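These bounds are easy to check numerically. The following is a minimal NumPy sketch (the sizes and values are illustrative only):

import numpy as np

rng = np.random.default_rng(0)
n = 10
x = rng.standard_normal(n)

l1 = np.linalg.norm(x, 1)
l2 = np.linalg.norm(x, 2)
linf = np.linalg.norm(x, np.inf)

# Ordering of the p-norms: ||x||_1 >= ||x||_2 >= ||x||_inf.
assert l1 >= l2 >= linf
# Reverse bounds with sqrt(n) factors.
assert l1 <= np.sqrt(n) * l2 and l2 <= np.sqrt(n) * linf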

Given a norm “‖ · ‖A”, its dual norm is defined as

‖x‖A∗ = max_{‖y‖A ≤ 1} ⟨x, y⟩ = max_{‖y‖A = 1} ⟨x, y⟩ = max_{z ≠ 0} ⟨x, z⟩/‖z‖A.
Several important properties of the dual norm are:
• The dual norm’s dual norm is itself, that is, ‖x‖(A∗)∗ = ‖x‖A ;
• The ℓ2 norm is self-dual, that is, the dual norm of the ℓ2 norm is still the ℓ2 norm;
• The dual norm of the ℓp norm (p ≥ 1) is the ℓq norm, where p and q satisfy 1/p + 1/q = 1. In particular, the ℓ1 norm and the ℓ∞ norm are dual to each other;
• (Hölder inequality) ⟨x, y⟩ ≤ ‖x‖A ‖y‖A∗ .
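As an illustration of the ℓ1–ℓ∞ duality and the Hölder inequality, here is a small NumPy sketch (illustrative values; y = sign(x) is the standard maximizer over the ℓ∞ ball):

import numpy as np

rng = np.random.default_rng(1)
x, y = rng.standard_normal(8), rng.standard_normal(8)

# Hölder inequality for the dual pair (l1, l_inf):
# <x, y> <= ||x||_1 * ||y||_inf.
assert x @ y <= np.linalg.norm(x, 1) * np.linalg.norm(y, np.inf)

# The dual norm of l_inf is l1: max_{||y||_inf <= 1} <x, y> is attained
# at y = sign(x), and the maximum equals ||x||_1.
assert np.isclose(x @ np.sign(x), np.linalg.norm(x, 1))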

3 Linear space, subspace, linear transformation
A set S is a linear space if

• 0 ∈ S;

• given any two points x, y ∈ S and any two scalars α ∈ R and β ∈ R, we have

αx + βy ∈ S.

Note that ∅ is not a linear space. Examples: the vector space Rn and the matrix space Rm×n. How about the following:

• 0; (no)

• {0}; (yes)

• {x | Ax = b} where A is a matrix and b is a vector. (if b = 0, yes; otherwise, no)

Let S be a linear space. A set S′ is a subspace if S′ is a linear space and also a subset of S. Actually, “subspace” is equivalent to “linear space”: any subspace is a linear space, and any linear space is a subspace (of itself). They are indeed talking about the same thing.
Let S be a linear space. A function L(·) is a linear transformation if given any two points
x, y ∈ S and two scalars α ∈ R and β ∈ R, one has

L(αx + βy) = αL(x) + βL(y).

For vector spaces, there exists a one-to-one correspondence between linear transformations and matrices. Therefore, we can simply say “a matrix is a linear transformation”.

• Prove that {L(x) | x ∈ S} is a linear space if S is a linear space and L is a linear transformation.

• Prove that {x | L(x) ∈ S} is a linear space, assuming S is a linear space and L is a linear
transformation.
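To see the linearity definition in action for L(x) = Ax, here is a small NumPy sketch (matrix size and scalars are arbitrary):

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))       # the matrix playing the role of L
x, y = rng.standard_normal(4), rng.standard_normal(4)
alpha, beta = 1.5, -0.7

# L(alpha x + beta y) = alpha L(x) + beta L(y) for L(x) = A @ x.
lhs = A @ (alpha * x + beta * y)
rhs = alpha * (A @ x) + beta * (A @ y)
assert np.allclose(lhs, rhs)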

How to express a subspace? The most intuitive way is to use a collection of vectors. A subspace can be expressed by

span{x1, x2, · · · , xn} = { Σ_{i=1}^{n} αi xi | αi ∈ R } = {Xα | α ∈ Rn},

which is called the range space of the matrix X = [x1, x2, · · · , xn]. A subspace can also be represented by the null space of X:

{α | Xα = 0}.
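Both representations can be computed numerically. A minimal sketch, assuming SciPy is available (scipy.linalg.orth and scipy.linalg.null_space return orthonormal bases of the range space and null space, respectively):

import numpy as np
from scipy.linalg import orth, null_space

X = np.array([[1., 2., 3.],
              [2., 4., 6.]])            # rank-1 matrix: columns are collinear

R = orth(X)                             # orthonormal basis of the range space of X
N = null_space(X)                       # orthonormal basis of {alpha | X @ alpha = 0}

assert R.shape[1] == 1                  # the range space is 1-dimensional here
assert np.allclose(X @ N, 0)            # every basis vector of N is mapped to 0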

4 Eigenvalues / eigenvectors, rank, SVD, inverse
The transpose of a matrix A ∈ Rm×n is defined as AT ∈ Rn×m :

(AT )ij = Aji .

One can verify that

(AB)T = BT AT.
A matrix B ∈ Rn×n is the inverse of an invertible matrix A ∈ Rn×n if

AB = I and BA = I.

B is denoted by A−1. A has an inverse if and only if A has full rank (the definition of “rank” will be given shortly). Note that the inverse of a matrix is unique. One can also verify
that if both A and B are invertible, then

(AB)−1 = B −1 A−1 .

The “transpose” and the “inverse” operations commute:

(AT )−1 = (A−1 )T .

When we write A−1 , we have to make sure that A is invertible.


Given a square matrix A ∈ Rn×n, x ∈ Rn (x ≠ 0) is called an eigenvector of A and λ ∈ R is called the corresponding eigenvalue if the following relationship is satisfied:

Ax = λx. (The effect of applying the linear transformation A to x is nothing but scaling it.)

Note that
• If {λ, x} is an eigenvalue–eigenvector pair, then so is {λ, αx} for any α ≠ 0.

• One eigenvalue may correspond to multiple different eigenvectors. “Different” means the eigenvectors remain different after normalization.
If the matrix A is symmetric, then any two eigenvectors corresponding to different eigenvalues are orthogonal, that is, if AT = A, Ax1 = λ1 x1, Ax2 = λ2 x2, and λ1 ≠ λ2, then

x1T x2 = 0.

Proof. Consider x1T Ax2. We have

x1T Ax2 = x1T (Ax2) = x1T (λ2 x2) = λ2 x1T x2,

and

x1T Ax2 = (x1T A)x2 = (AT x1)T x2 = (Ax1)T x2 = λ1 x1T x2,

where the third equality uses A = AT. Therefore, we have

λ2 x1T x2 = λ1 x1T x2.

Since λ1 ≠ λ2, we obtain x1T x2 = 0.
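A quick numerical illustration of this fact, using NumPy's eigensolver for symmetric matrices (np.linalg.eigh):

import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
A = M + M.T                              # symmetrize so that A = A^T

# eigh is specialized for symmetric matrices; its eigenvectors come back
# as the orthonormal columns of V.
eigvals, V = np.linalg.eigh(A)
assert np.allclose(V.T @ V, np.eye(4))   # pairwise orthogonal (and unit norm)
assert np.allclose(A @ V, V * eigvals)   # A v_i = lambda_i v_i, column by column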

A matrix A ∈ Rm×n is a “rank-1” matrix if it can be expressed as

A = xyT

where x ∈ Rm and y ∈ Rn with x ≠ 0 and y ≠ 0. The rank of a matrix A ∈ Rm×n is defined as

rank(A) = min{ r | A = Σ_{i=1}^{r} xi yiT, xi ∈ Rm, yi ∈ Rn }
        = min{ r | A = Σ_{i=1}^{r} Bi, Bi is a “rank-1” matrix }.

Examples: [1, 1; 1, 1] and [1, 1; 2, 2] are rank-1 matrices, and many natural images have the low-rank property. “Low rank” implies that the matrix contains little information.
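A small NumPy sketch checking the examples above and the sum-of-rank-1 definition (the vectors below are arbitrary):

import numpy as np

# Both example matrices are rank-1: each is an outer product x y^T.
assert np.linalg.matrix_rank(np.array([[1, 1], [1, 1]])) == 1
assert np.linalg.matrix_rank(np.array([[1, 1], [2, 2]])) == 1

# A sum of two (independent) rank-1 matrices has rank at most 2.
x1, y1 = np.array([1., 0.]), np.array([1., 2.])
x2, y2 = np.array([0., 1.]), np.array([3., 4.])
A = np.outer(x1, y1) + np.outer(x2, y2)
assert np.linalg.matrix_rank(A) == 2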
We say “U ∈ Rm×n has orthonormal columns” if UTU = I, that is, any two columns U·i and U·j of U satisfy

U·iT U·j = 0 if i ≠ j; otherwise U·iT U·i = 1.

Swapping any two columns of U to get U′, U′ still satisfies U′TU′ = I.

• ‖Ux‖ = ‖x‖ ∀x.

• ‖UTy‖ ≤ ‖y‖ ∀y.

If U is a square matrix with orthonormal columns, then we call it an “orthogonal matrix”. It has some nice properties:

• U−1 = UT (which means that UUT = UTU = I);

• UT is also an orthogonal matrix;

• the effect of applying the transformation U to a vector x is to rotate (or reflect) x; in particular, ‖Ux‖ = ‖x‖ = ‖UTx‖.
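These properties are easy to verify numerically. A minimal sketch, using the QR factorization of a random matrix to produce an orthogonal Q:

import numpy as np

rng = np.random.default_rng(4)
# QR factorization of a random square matrix yields an orthogonal Q.
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))

x = rng.standard_normal(5)
assert np.allclose(Q.T @ Q, np.eye(5))                          # Q^T Q = I
assert np.allclose(np.linalg.inv(Q), Q.T)                       # Q^{-1} = Q^T
assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))     # ||Qx|| = ||x||
assert np.isclose(np.linalg.norm(Q.T @ x), np.linalg.norm(x))   # ||Q^T x|| = ||x||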

“SVD” is short for “singular value decomposition”, which is the most important concept in linear algebra and matrix analysis. The SVD exposes almost all of the structure of a matrix. Any matrix A ∈ Rm×n can be decomposed into

A = UΣVT = Σ_{i=1}^{r} σi U·i V·iT

where U ∈ Rm×r and V ∈ Rn×r have orthonormal columns, and Σ = diag{σ1, σ2, · · · , σr} is a diagonal matrix with positive diagonal elements. The σi’s are called singular values; they are positive and arranged in decreasing order.

• rank(A) = r;

• ‖Ax‖ ≤ σ1 ‖x‖. Why?
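A minimal NumPy sketch of the thin SVD, checking the two facts above numerically (the 1e-10 rank threshold is an arbitrary tolerance):

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))

# full_matrices=False gives the thin SVD: U is 6x4, Vt is 4x4.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U @ np.diag(s) @ Vt)            # A = U Sigma V^T
assert np.linalg.matrix_rank(A) == np.sum(s > 1e-10)  # rank = number of nonzero sigmas

# ||Ax|| <= sigma_1 ||x|| for any x (sigma_1 is the largest singular value).
x = rng.standard_normal(4)
assert np.linalg.norm(A @ x) <= s[0] * np.linalg.norm(x) + 1e-12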

A matrix B ∈ Rn×n is positive semi-definite (PSD) if the following conditions are satisfied:

• B is symmetric;

• ∀x ∈ Rn , we have xT Bx ≥ 0.

A positive definite matrix is defined by adding one more condition:

• xT Bx = 0 ⇔ x = 0.

We can also use an equivalent definition for PSD matrices: a matrix B ∈ Rn×n is positive semi-definite (PSD) if the SVD of B can be written as

B = UΣUT

where UTU = I and Σ is a diagonal matrix with nonnegative diagonal elements. Examples of PSD matrices: I and ATA.
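A small NumPy sketch verifying that ATA is PSD (np.linalg.eigvalsh is the eigenvalue routine for symmetric matrices; the tiny negative tolerances absorb rounding error):

import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))
B = A.T @ A                              # A^T A is always PSD

assert np.allclose(B, B.T)               # symmetric
eigvals = np.linalg.eigvalsh(B)
assert np.all(eigvals >= -1e-10)         # all eigenvalues nonnegative

# Directly: x^T B x = ||Ax||^2 >= 0 for any x.
x = rng.standard_normal(3)
assert x @ B @ x >= -1e-10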
Assume matrices A and B are invertible. We have the following identity:

B −1 = A−1 − B −1 (B − A)A−1 .

The Sherman–Morrison–Woodbury formula is very useful for calculating matrix inverses:

(A + UVT)−1 = A−1 − A−1U(I + VTA−1U)−1VTA−1.

This result is especially important from a computational perspective. A special case is when U and V are two vectors u and v; then the formula takes the form

(A + uvT)−1 = A−1 − (1 + vTA−1u)−1A−1uvTA−1,

which can be calculated with complexity O(n²) if A−1 is known.
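A minimal sketch of the rank-1 update, assuming A−1 is already available; the helper name sherman_morrison is our own:

import numpy as np

def sherman_morrison(A_inv, u, v):
    """Return (A + u v^T)^{-1} from a known A^{-1} in O(n^2) time."""
    Au = A_inv @ u                        # A^{-1} u, costs O(n^2)
    vA = v @ A_inv                        # v^T A^{-1}, costs O(n^2)
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)

rng = np.random.default_rng(7)
n = 50
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned, invertible
u, v = rng.standard_normal(n), rng.standard_normal(n)

updated = sherman_morrison(np.linalg.inv(A), u, v)
assert np.allclose(updated, np.linalg.inv(A + np.outer(u, v)))

Recomputing np.linalg.inv(A + np.outer(u, v)) from scratch costs O(n³); the update above uses only matrix–vector products and one outer product, hence O(n²).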


Sylvester’s determinant theorem states that, for any A ∈ Rm×n and B ∈ Rn×m,

det(Im + AB) = det(In + BA).
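A quick numerical check of the theorem with rectangular A and B (note the two identity matrices have different sizes):

import numpy as np

rng = np.random.default_rng(8)
m, n = 3, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))

# det(I_m + A B) = det(I_n + B A), even though the two products
# live in different dimensions.
lhs = np.linalg.det(np.eye(m) + A @ B)
rhs = np.linalg.det(np.eye(n) + B @ A)
assert np.isclose(lhs, rhs)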

5 Matrix norms (spectral norm, nuclear norm, Frobenius norm)


The Frobenius norm (F-norm) of a matrix A ∈ Rm×n is defined as

‖A‖F = ( Σ_{1≤i≤m, 1≤j≤n} |Aij|² )^{1/2} = ( Σ_i σi² )^{1/2},

where the σi’s are the singular values of A. If A is a vector, one can verify that ‖A‖F = ‖A‖2.


The inner product ⟨·, ·⟩ in Rm×n is defined as

⟨X, Y⟩ = Σ_{i,j} Xij Yij = trace(XTY) = trace(Y XT) = trace(XY T) = trace(Y TX).

An important property of the trace:

trace(AB) = trace(BA) = trace(AT BT) = trace(BT AT).

One may notice that ⟨X, X⟩ = ‖X‖F².
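A small NumPy sketch checking the inner product identities and ⟨X, X⟩ = ‖X‖F² (values are illustrative):

import numpy as np

rng = np.random.default_rng(9)
X = rng.standard_normal((4, 3))
Y = rng.standard_normal((4, 3))

inner = np.sum(X * Y)                        # <X, Y> = sum_ij X_ij Y_ij
assert np.isclose(inner, np.trace(X.T @ Y))  # trace(X^T Y)
assert np.isclose(inner, np.trace(Y @ X.T))  # trace(Y X^T)

# <X, X> = ||X||_F^2.
assert np.isclose(np.sum(X * X), np.linalg.norm(X, 'fro') ** 2)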
The spectral norm (also known as the operator norm) of a matrix A ∈ Rm×n is defined as

‖A‖spec = max_{‖x‖=1} ‖Ax‖ = max_{‖x‖=1, ‖y‖=1} yTAx = σ1(A).

The nuclear norm (also known as the trace norm, hence the subscript) of a matrix A ∈ Rm×n is defined as

‖A‖tr = Σ_i σi(A) = trace(Σ),

where Σ is the diagonal matrix in the SVD A = UΣVT.


An important relationship:

‖A‖spec ≤ ‖A‖F ≤ ‖A‖tr and rank(A) ‖A‖spec ≥ √rank(A) ‖A‖F ≥ ‖A‖tr.
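These inequalities can be checked numerically; NumPy exposes all three norms through np.linalg.norm (a minimal sketch with an arbitrary random matrix):

import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((5, 4))
r = np.linalg.matrix_rank(A)

spec = np.linalg.norm(A, 2)              # spectral norm = sigma_1
fro = np.linalg.norm(A, 'fro')           # Frobenius norm
nuc = np.linalg.norm(A, 'nuc')           # nuclear norm = sum of singular values

assert spec <= fro <= nuc
assert r * spec >= np.sqrt(r) * fro >= nuc - 1e-12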

The dual norm for a matrix norm ‖ · ‖A is defined as

‖Y‖A∗ := max_{X≠0} ⟨X, Y⟩/‖X‖A = max_{‖X‖A ≤1} ⟨X, Y⟩. (1)

We have the following properties (think about why they are true):

‖X‖spec∗ = ‖X‖tr, ‖X‖F∗ = ‖X‖F.

6 Matrix and Vector Differential


Let f(X) : Rm×n → R be a function of a matrix X ∈ Rm×n. Its differential (or gradient) is defined as the m × n matrix of partial derivatives

∂f(X)/∂X = [ ∂f(X)/∂Xij ]_{1≤i≤m, 1≤j≤n},

that is, the (i, j)-th entry of ∂f(X)/∂X is ∂f(X)/∂Xij.

We provide a few examples in the following:

f(X) = trace(ATX) = ⟨A, X⟩      ⇒ ∂f(X)/∂X = A
f(X) = trace(XTAX)              ⇒ ∂f(X)/∂X = (A + AT)X
f(X) = (1/2)‖AX − B‖F²          ⇒ ∂f(X)/∂X = AT(AX − B)
f(X) = (1/2)trace(BTXTXB)       ⇒ ∂f(X)/∂X = XBBT
f(X) = (1/2)trace(BTXTAXB)      ⇒ ∂f(X)/∂X = (1/2)(A + AT)XBBT
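Gradient formulas like these can be validated with a finite-difference check. A minimal sketch for the second example, f(X) = trace(XTAX) (the step size eps and the tolerance are arbitrary):

import numpy as np

rng = np.random.default_rng(11)
m, n = 4, 3
A = rng.standard_normal((m, m))
X = rng.standard_normal((m, n))

f = lambda Z: np.trace(Z.T @ A @ Z)     # f(X) = trace(X^T A X)
grad = (A + A.T) @ X                    # claimed closed-form gradient

# Central finite differences, entry by entry.
eps = 1e-6
num = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(X)
        E[i, j] = eps
        num[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
assert np.allclose(num, grad, atol=1e-5)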
