
Mathematical Methods in Physics (Part I)

Dr Cristina Zambon
2 October 2024

Note: despite the effort to eliminate all typographic errors, some may still be present, so be careful. This summary is intended as a guideline for the material covered in lectures and is not supposed to replace the textbook.
Contents

1 Introduction
  1.1 Vector Algebra (Chapter 7 in Riley)
  1.2 Matrices (Chapter 8 in Riley)
  1.3 Vector functions (Chapter 10 in Riley)

2 Orthogonal projection and index notation in R3

3 Vector Spaces

4 Matrices: the eigenvalue problem & matrix diagonalisation
  4.1 Some basic concepts
  4.2 The eigenvalue problem
  4.3 Matrix diagonalisation

5 Fourier Series
  5.1 Fourier series and vector spaces

6 Integral Transforms and the Dirac delta function
  6.1 Fourier Transforms
  6.2 The Dirac delta function
  6.3 Laplace transforms

7 Vector Calculus: Del operator & integrals
  7.1 The Del operator
  7.2 Line integrals
  7.3 Surface integrals
  7.4 Volume integrals

8 Theorems of integration

9 Change of variables: orthogonal curvilinear coordinates

10 Introduction to probability
  10.1 The basics
  10.2 Counting the number of outcomes in an event
  10.3 Conditional probability
Chapter 1

Introduction

Before engaging with the lectures, ensure that you are familiar with the mathematical concepts that follow. This material will not be covered in any detail during lectures.

1.1 Vector Algebra (Chapter 7 in Riley)


Consider the three dimensional space with basis {i, j, k} (Standard basis).

ˆ Scalar (or dot) product:

a · b = |a||b| cos θ = a1 b1 + a2 b2 + a3 b3 .

where θ is the angle between the vectors a and b.

i) |a|2 = a · a.
ii) a · b = b · a.
iii) a · (b + c) = a · b + a · c, a · (β b) = β a · b, where β is a scalar.
iv) If the scalar product of two vectors is zero, then the vectors are perpendicular.

ˆ Vector (or cross) product:

a × b = (a2 b3 − a3 b2 ) i + (a3 b1 − a1 b3 ) j + (a1 b2 − a2 b1 ) k,


with |a × b| = |a||b| sin θ,

where θ is the angle between the vectors a and b.

i) |a × b| is the area of the parallelogram with sides a and b.


ii) a × b = −b × a.
iii) a × (b + c) = a × b + a × c, a × (β b) = β a × b.


iv) If the vector product of two vectors is zero, then the vectors are parallel or
antiparallel.

ˆ Scalar triple product:

                          | a1  a2  a3 |
[a, b, c] = a · (b × c) = | b1  b2  b3 | .
                          | c1  c2  c3 |

i) |a · (b × c)| is the volume of the parallelepiped with sides a, b and c.


ii) [α a + β b, c, d] = α [a, c, d] + β [b, c, d].
iii) [a, b, c] = [b, c, a] = [c, a, b] = −[a, c, b] = −[c, b, a] = −[b, a, c]
iv) If the scalar triple product of three vectors is zero, then the vectors are coplanar.

ˆ Vector triple product:

a × (b × c) = (a · c)b − (a · b)c.

i) The vector triple product is not associative i.e. (a × b) × c ̸= a × (b × c).


ii) (a × b) × c = (a · c)b − (b · c)a.

ˆ Equation of a line.
Given a point A with position vector a located on a line having a direction b̂, a
generic point R on the same line with position vector r is given by
 
r = a + λ b̂,    r = (x, y, z)T ,

where λ is a scalar with −∞ < λ < ∞. Note that the same equation can also be
written as follows
(r − a) × b̂ = 0.

Figure 1.1: Line passing through the point A and having a direction b̂.

ˆ Equation of a plane.

i) A point R on a plane perpendicular to the unit vector n̂ and passing through


the point A is:
(r − a) · n̂ = 0,
where a and r are the position vectors of A and R, respectively.

Figure 1.2: Plane perpendicular to the unit vector n̂ and passing through the point A.

ii) A point R on a plane passing through the points A, B and C is:

r = a + λ(b − a) + µ(c − a),

where a, b, c and r are the position vectors of A, B, C and R, respectively.


Note that both parameters λ and µ are between −∞ and ∞.

Figure 1.3: Plane passing through the points A, B and C.

ˆ Equation of a sphere.
A point R on a sphere of radius a and centre at C is:

|r − c|2 = a2 ,

where r and c are the position vector of R and C, respectively.



1.2 Matrices (Chapter 8 in Riley)


ˆ Matrix operations:

i) Matrix addition: (A + B)ij = Aij + Bij .


ii) Multiplication by a scalar: (αA)ij = αAij .
iii) Multiplication of matrices: (AB)ij = Aik Bkj ; in general AB ̸= BA.
iv) Transposition: (AT )ij = Aji , with (ABC . . . F )T = F T . . . C T B T AT .
v) Complex conjugation: (A∗ )ij = (Aij )∗ .
vi) Hermitian conjugation (adjoint): (A† )ij = (Aji )∗ , with
(ABC . . . F )† = F † . . . C † B † A† .

ˆ The determinant of a square matrix:

|A| = Ajk Cjk , for any row j, |A| = Akj Ckj , for any column j,

where Cmn = (−1)m+n |Amn | is the cofactor associated to the matrix element Amn .
In turn, |Amn | is the minor associated to the matrix element Amn . The minor is the
determinant of the matrix obtained by removing the m-th row and n-th column from
the matrix A.
Properties:

i) |AB . . . F | = |A||B| . . . |F |.
ii) |AT | = |A|, |A∗ | = |A|∗ , |A† | = |A|∗ , |A−1 | = |A|−1 .
iii) If the rows (or the columns) are linearly dependent, then |A| = 0.
iv) If B is obtained from A by multiplying the elements of any row (or column) by
a factor α, then |B| = α |A|.
v) If B is obtained from A by interchanging two rows (or columns), then |B| =
−|A|.
vi) If B is obtained from A by adding k times one row (or column) to the other
row (or column), then |A| = |B|.

ˆ Elementary row operations (on matrices):

i) Multiply any row by a non zero constant.


ii) Interchange any two rows.
iii) Add some multiple of one row to any other row.

ˆ The inverse of a square matrix:

A−1 = C T /|A|,   that is   (A−1 )ij = Cji /|A| ,   with A−1 A = AA−1 = I,

where C is the cofactor matrix and I the identity matrix (Iij = δij ). If |A| = 0 the
inverse does not exist and the matrix A is said to be singular.

Note that in order to find the inverse of a matrix, you can also use the Gauss-Jordan
method shown in the lectures, which makes use of the elementary row operations.
Properties:
i) (AB . . . F )−1 = F −1 . . . B −1 A−1 .
ii) (AT )−1 = (A−1 )T , (A† )−1 = (A−1 )† .
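As a quick numerical sketch of the determinant and inverse properties above (an illustration only, not part of the notes; it assumes NumPy is available and uses two arbitrarily chosen matrices):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])
B = np.array([[2.0, 1.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0]])

# |AB| = |A||B| and |A^T| = |A|
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))

# A^{-1} A = I and (AB)^{-1} = B^{-1} A^{-1}
A_inv = np.linalg.inv(A)
print(np.allclose(A_inv @ A, np.eye(3)))
print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ A_inv))
```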

1.3 Vector functions (Chapter 10 in Riley)


Vector functions are vectors whose components are functions of one or more variables e.g.
a(u, v, . . . ) = ax (u, v, . . . ) i + ay (u, v, . . . ) j + az (u, v, . . . ) k.
A vector function defines a vector field.
ˆ Differentiation of vector functions:
∂a/∂u = (∂ax /∂u) i + (∂ay /∂u) j + (∂az /∂u) k.
Note that in cartesian coordinates i, j, k, are constants.
Differentiation rules:
i) ∂(ϕa)/∂u = (∂ϕ/∂u) a + ϕ (∂a/∂u),    ∂a(ϕ(u, v, . . . ))/∂u = (da/dϕ)(∂ϕ/∂u),

ii) ∂(a · b)/∂u = (∂a/∂u) · b + a · (∂b/∂u),    ∂(a × b)/∂u = (∂a/∂u) × b + a × (∂b/∂u),
where a, b are vector functions and ϕ, is a scalar function.
ˆ Differential of a vector function:
da = (∂a/∂u) du + (∂a/∂v) dv + · · · .

Example 1 Calculate the differential of the position vector r = x i + y j + z k.

∂r/∂x = i,    ∂r/∂y = j,    ∂r/∂z = k,
hence dr = i dx + j dy + k dz.
Chapter 2

Orthogonal projection and index notation in R3

Please refer to the content in section 1.1

ˆ Orthogonal projection of b onto the direction of a (b∥ ):

b = b∥ + b⊥ ,    b∥ = (|b| cos θ) a/|a| = [(a · b)/|a|] a/|a| .

Figure 2.1: Orthogonal projection

ˆ Orthonormal vectors: The vectors a1 , a2 , · · · , an are said to be orthonormal if


they are unit vectors that are mutually orthogonal. A good example of a set of orthonormal
vectors is the standard basis in R3 i.e. i, j, k.

ˆ Summation convention: An index that appears twice in a given term is understood


to be summed over all values that the index can take. In the case of the space R3 , the


indices can take the values 1, 2, 3. The summed-over indices are called dummy
indices and the others free indices.

Example 1 Expand the following expressions

(i) aij bjk ≡ Σ_{j=1}^{3} aij bjk = ai1 b1k + ai2 b2k + ai3 b3k .

(ii) aij bjk ck ≡ Σ_{j=1}^{3} Σ_{k=1}^{3} aij bjk ck = Σ_{j=1}^{3} (aij bj1 c1 + aij bj2 c2 + aij bj3 c3 )

= (ai1 b11 c1 + ai1 b12 c2 + ai1 b13 c3 ) + (ai2 b21 c1 + ai2 b22 c2 + ai2 b23 c3 )
+ (ai3 b31 c1 + ai3 b32 c2 + ai3 b33 c3 ).

Let us introduce two mathematical objects that can be used in the context of the summa-
tion convention:

ˆ Kronecker delta:

δij = 1 if i = j,   0 if i ̸= j,    i, j = 1, 2, 3.
Note that this object is symmetric.

Example 2 Use the index notation to rewrite the following expressions

(i) bi δij = b1 δ1j + b2 δ2j + b3 δ3j = bj .

(ii) δjj = δ11 + δ22 + δ33 = 3.

(iii) a · b = ai bi = δij ai bj .

ˆ Levi-Civita symbol:

ϵijk = +1 if (i, j, k) = (1, 2, 3), (2, 3, 1) or (3, 1, 2),
ϵijk = −1 if (i, j, k) = (1, 3, 2), (3, 2, 1) or (2, 1, 3),
ϵijk = 0 otherwise.

i) The Levi-Civita symbol is totally antisymmetric.


ii) ϵijk ϵnlm = δin δjl δkm + δil δjm δkn + δim δjn δkl − δil δjn δkm − δin δjm δkl − δim δjl δkn .

Example 3 Use the index notation to rewrite the following expressions

(i) (a × b)i = ϵijk aj bk . For instance:


(a × b)1 = ϵ1jk aj bk = ϵ123 a2 b3 + ϵ132 a3 b2 = a2 b3 − a3 b2 .

(ii) (b × a)i = ϵijk bj ak = ϵijk ak bj = −ϵikj ak bj = −(a × b)i .

(iii) [a, b, c] = a · (b × c) = ai (b × c)i = ai ϵijk bj ck = ϵijk ai bj ck

= a1 b2 c3 + a2 b3 c1 + a3 b1 c2 − a1 b3 c2 − a3 b2 c1 − a2 b1 c3

    | a1  a2  a3 |
  = | b1  b2  b3 | .
    | c1  c2  c3 |

(iv) ϵijk ϵilm = 3δjl δkm + δil δjm δki + δim δji δkl − δil δji δkm − 3δjm δkl − δim δjl δki
= δjl δkm − δjm δkl .
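The two index-notation results above lend themselves to a direct numerical check. The sketch below (not part of the notes; NumPy is assumed and the vectors are arbitrary) builds ϵijk explicitly and verifies the contracted epsilon identity and the triple-product formula:

```python
import numpy as np

# Levi-Civita symbol as a 3x3x3 array
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k] = 1.0
for i, j, k in [(0, 2, 1), (2, 1, 0), (1, 0, 2)]:
    eps[i, j, k] = -1.0

delta = np.eye(3)

# eps_ijk eps_ilm = delta_jl delta_km - delta_jm delta_kl
lhs = np.einsum('ijk,ilm->jklm', eps, eps)
rhs = np.einsum('jl,km->jklm', delta, delta) - np.einsum('jm,kl->jklm', delta, delta)
print(np.allclose(lhs, rhs))

# [a, b, c] = eps_ijk a_i b_j c_k agrees with a . (b x c)
a, b, c = np.array([1., 2., 3.]), np.array([0., 1., 4.]), np.array([2., 2., 1.])
print(np.isclose(np.einsum('ijk,i,j,k->', eps, a, b, c), np.dot(a, np.cross(b, c))))
```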
Chapter 3

Vector Spaces

Linear vector space. A set of objects called vectors forms a vector space V if there
are two operations defined on the elements of the set called addition and multiplication by
scalars, which obey the following simple rules (the axioms of the vector space):

i) If u and v are in V then u + v (addition) is in V. If v is in V then αv (scalar


multiplication) is in V.

We say that the vector space V is closed with respect to addition and scalar multipli-
cation.

ii) (u + v) + w = u + (v + w), (α β)v = α(β v).

iii) There exists a neutral element 0 such that v + 0 = v for all v.

iv) There exists an inverse element −v such that v + (−v) = 0 for all v.

v) u + v = v + u.

vi) α(u + v) = α u + α v, (α + β)v = α v + β v.

vii) 1 v = v for all v,

where u, v, w are vectors and α, β are scalars. If the scalars α are real V is called a real
vector space, otherwise V is called a complex vector space.


Example 1 Indicate which of the following sets form a vector space

(i) R3 . Yes.

(ii) Rn . Yes (Euclidean vector spaces.)

(iii) Cn . Yes (Examples of complex vector spaces.)

(iv) The set of real functions f (x) with no restriction on the values of x and with
the usual addition and scalar multiplication. Yes.

(v) The set of matrices of size (n × m) with real entries and with the usual addition
and scalar multiplication. Yes.

(vi) The set of 2-dimensional vectors with real entries and the usual addition but
the following definition of scalar multiplication

α (x, y)T = (αx, 0)T ,

where α is a scalar. No.

(vii) The set of solutions of the following second order, linear, homogeneous differ-
ential equation
p(x) d²f /dx² + q(x) df /dx + r(x) f = 0,
where p, q, r are given functions. Yes.
 
(viii) The set of vectors u = (x, y, z)T in the 3-dimensional space for which

2x − 3y + 11z + 2 = 0.

No.

ˆ Linear combinations:

α1 v1 + α2 v2 + · · · + αk vk = αi vi ,

where v1 , v2 , · · · vk , are k vectors in V.


The set of all linear combinations of v1 , v2 , · · · vk , is called the span of v1 , v2 , · · · vk ,
and is denoted Span(v1 , v2 , · · · vk ).

Example 2 Answer the following questions

(i) What is a span of a single vector in R3 ? It is the set of all scalar multiples
of this vector. It is a line in the direction of the vector.

(ii) What is a span of two non collinear vectors in R3 ? It is a plane through


zero containing these two vectors.

ˆ Linearly independent vectors: k vectors v1 , v2 , · · · , vk are said to be linearly


independent if the equation

α1 v1 + α2 v2 + · · · + αk vk = 0

is satisfied if and only if all αi = 0. Otherwise, the vectors are said to be linearly
dependent. That is, they are linearly dependent if the equation

α1 v1 + α2 v2 + · · · + αk vk = 0

is satisfied with at least one αi ̸= 0. In other words, the vectors v1 , v2 , · · · , vk are linearly


dependent if and only if one vector vi can be written as a linear combination of the
others.

Example 3 Indicate whether the following sets of vectors are linearly dependent
or independent.

(i) v1 = (0, 1, 1)T , v2 = (0, 1, 2)T , v3 = (1, 1, −1)T .

By definition: α1 v1 + α2 v2 + α3 v3 = (α3 , α1 + α2 + α3 , α1 + 2α2 − α3 )T = 0.
This implies α1 = α2 = α3 = 0. Hence the vectors are linearly independent.

(ii) v1 = (−2, 0, 1)T , v2 = (1, 1, 1)T , v3 = (0, 2, 3)T .

We can see that v3 = v1 + 2v2 . Hence the vectors are linearly dependent.

(iii)
{1 + x + x2 , 1 − x + 3x2 , 1 + 3x − x2 }.
By definition: α1 (1 + x + x2 ) + α2 (1 − x + 3x2 ) + α3 (1 + 3x − x2 ) =
x2 (α1 + 3α2 − α3 ) + x(α1 − α2 + 3α3 ) + (α1 + α2 + α3 ) = 0.
It follows that α1 = −2α2 , α2 = α3 . Hence the ‘vectors’ are linearly dependent.
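In practice, linear (in)dependence of vectors in Rⁿ can be tested by computing the rank of the matrix whose columns are those vectors: the vectors are independent exactly when the rank equals their number. A small sketch (illustration only; NumPy assumed) applied to the two numerical sets of Example 3:

```python
import numpy as np

# Example 3 (i): expected rank 3 -> linearly independent
V1 = np.column_stack(([0, 1, 1], [0, 1, 2], [1, 1, -1]))
print(np.linalg.matrix_rank(V1))   # 3

# Example 3 (ii): v3 = v1 + 2 v2, expected rank 2 -> linearly dependent
V2 = np.column_stack(([-2, 0, 1], [1, 1, 1], [0, 2, 3]))
print(np.linalg.matrix_rank(V2))   # 2
```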

ˆ Basis: A basis is a minimal set of vectors that span a vector space. In other words,
a set of vectors v1 , v2 , · · · vk , in V is called a basis of V if and only if v1 , v2 , · · · vk ,
are linearly independent and V = Span(v1 , v2 , · · · vk ). Then

i) The number of vectors in a basis is called the dimension of the space V (dim
V ).
ii) If the set {v1 , v2 , · · · , vk } is a basis of the vector space V , then any vector v
in V can be written as a unique linear combination of the basis vectors and
the coefficients of the unique linear combination are called the components of v
with respect to that basis.

Example 4 Write down a basis for the following vector spaces

(i) R3 .
Basis: the set of vectors in Example 3 (i). Dim 3.

(ii) The set of (2 × 3) matrices with real entries.


Basis: the matrices Eij (i = 1, 2; j = 1, 2, 3), where Eij has a 1 in the (i, j) entry and zeros elsewhere.

Dim 6.

(iii) The set of polynomials of degree two or less with real coefficients.
Basis: {1, x, x2 }. Dim 3.

ˆ Inner (or scalar) product: Consider a vector space V . The inner product between
two elements of V is a scalar function denoted ⟨u|v⟩ that satisfies the following
properties
i) ⟨u|v⟩ = ⟨v|u⟩∗ .
ii) ⟨u|(λv + µw)⟩ = λ⟨u|v⟩ + µ⟨u|w⟩, λ, µ scalars.
iii) ⟨u|u⟩ > 0 if u ̸= 0.
Note that I have used the ‘physics’ convention, for which the expression ii) is linear
in the second argument and conjugate linear - or antilinear - in the first argument.
The inner product naturally induces the notion of length of a vector - or norm -
defined as follows: |u| = √⟨u|u⟩. Additionally, the inner product allows us to define
what we mean by orthogonality. In fact, two vectors are orthogonal if ⟨u|w⟩ = 0.

Example 5 Check the properties for the following scalar products

(i) In R3 :
⟨u|v⟩ = uT · v.
   
Take u = (u1 , u2 , u3 )T , v = (v1 , v2 , v3 )T .
For the first property: ⟨u|v⟩ = uT · v = (u1 v1 + u2 v2 + u3 v3 ) = vT · u =
(vT · u)∗ = ⟨v|u⟩∗ . Similar procedure for the other properties.

(ii) In C3 :
⟨u|v⟩ = u† · v.
   
Take u = (u1 , u2 , u3 )T , v = (v1 , v2 , v3 )T .
For the first property: ⟨u|v⟩ = u† ·v = (u∗1 v1 +u∗2 v2 +u∗3 v3 ) = (u1 v1∗ +u2 v2∗ +
u3 v3∗ )∗ = (v† · u)∗ = ⟨v|u⟩∗ . Similar procedure for the other properties.

Hilbert spaces, widely used in physics, are examples of vector spaces with a norm and an
inner product. Mind you, additionally, Hilbert spaces also enjoy the property of completeness.
Chapter 4

Matrices: the eigenvalue problem & matrix diagonalisation

From now on in the course, we will work in R3 /C3 . Please refer to the content in section
1.2.

4.1 Some basic concepts

ˆ Linear operators: An object A is called a linear operator if its action on vectors u


and v is as follows
A (αu + βv) = αA u + βA v,

where α and β are scalars. Matrices are examples of operators.

ˆ The trace of a square matrix: It is the sum of the diagonal elements of the
matrix, i.e.

Tr A = Σ_k Akk ≡ Akk .

Properties:

i) The trace is a linear operation.


ii) Tr AT = Tr A; Tr A† = (Tr A)∗ .
iii) Tr(AB) = (AB)ii = Aij Bji = Bji Aij = (BA)jj = Tr(BA).
iv) Tr(ABC) = Tr(BCA) = Tr(CAB), i.e. the trace is invariant under cyclic
permutations of the matrices.


Example 1 A and B are two anticommuting matrices, i.e. AB = −BA and


A2 = B 2 = I. Show that Tr A = Tr B = 0.
Multiply the relation given by A on the left hand side: AAB = −ABA.
Apply the trace and its properties: Tr B = −Tr (ABA) = −Tr (BAA) =
−Tr (B).
This implies that Tr B = 0. Similar procedure can be applied in order to show
that Tr A = 0.

ˆ Special types of square matrices:

i) Symmetric matrices: AT = A.
Anti-symmetric or skew-symmetric matrices: AT = −A.

ii) Hermitian matrices: A† = A.


Anti-hermitian matrices: A† = −A.

iii) Unitary matrices: A† = A−1


It follows that (A† A)ij = δij .
Note that unitary matrices preserve the length of vectors. In fact

v = Au → |v|2 = v† · v = (u† A† ) · (Au) = |u|2 .

iv) Orthogonal matrices: AT = A−1


They also preserve the length of vectors since they correspond to real unitary
matrices (AT ≡ A† .)

Example 2 Consider the Pauli matrices

     
     ( 0  1 )         ( 0  −i )         ( 1   0 )
σ1 = ( 1  0 ) ,  σ2 = ( i   0 ) ,  σ3 = ( 0  −1 ) .

Show that the Pauli matrices, together with the identity matrix, I, form a basis
for the vector space of the (2 × 2) hermitian matrices.

A general element of the vector space is:

                       ( γ + δ    α − iβ )
ασ1 + βσ2 + γσ3 + δI = ( α + iβ   −γ + δ ) ,

where α, β, γ, δ are real. This is the general form of a hermitian matrix, since
it can be rewritten as follows

( a    c )
( c∗   b ) ,

where a and b are real.

4.2 The eigenvalue problem

Consider a (n × n) matrix. We want to answer the following question:

Are there any vectors x ̸= 0 which are transformed by a matrix A into multiple of
themselves?

In other words: For which vectors x and scalar λ is the following eigenvalue equation

Ax = λx

satisfied?

i) The vector x is called eigenvector and λ is called the corresponding eigenvalue.

ii) The determinant |A − λI| is called the characteristic polynomial of degree n.

iii) The eigenvalue equation, being a set of homogeneous linear equations, has a non-
trivial solution if and only if |A − λI| = 0.

iv) The eigenvalues of the matrix A are the roots of the characteristic polynomial.

v) The eigenvectors associated to the eigenvalue µ are the vectors x such that

(A − µ I)x = 0.

Example 3 Find the eigenvalues and eigenvectors of the matrix

 
    ( 1  2  1 )
A = ( 2  1  1 ) .
    ( 1  1  2 )

            | 1−λ    2      1  |
|A − λI| =  |  2    1−λ     1  | = (1 − λ)²(2 − λ) − 2(1 − λ) − 4(1 − λ) = 0.
            |  1     1     2−λ |

Hence λ1 = 1, λ2 = 4, λ3 = −1.

For λ1 = 1 :

            ( 0  2  1 ) ( x1 )   (    2x2 + x3   )
(A − I)x1 = ( 2  0  1 ) ( x2 ) = (    2x1 + x3   ) = 0.
            ( 1  1  1 ) ( x3 )   ( x1 + x2 + x3  )

Hence x1 = (x1 , x1 , −2x1 )T . A possible eigenvector is: x1 = (1, 1, −2)T .

For λ2 = 4 solve (A − 4I)x2 = 0 → x2 = (1, 1, 1)T .

For λ3 = −1 solve (A + I)x3 = 0 → x3 = (1, −1, 0)T .
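A numerical cross-check of Example 3 (illustration only; NumPy assumed) confirms the eigenvalues and that each computed eigenvector satisfies Ax = λx:

```python
import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [2.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

evals, evecs = np.linalg.eig(A)
print(np.sort(evals.real))          # approximately [-1, 1, 4]

# each column of evecs is an eigenvector (normalised differently from the hand calculation)
for lam, x in zip(evals, evecs.T):
    print(np.allclose(A @ x, lam * x))
```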

ˆ Eigenvectors associated to different eigenvalues are linearly independent.

ˆ If an n × n matrix A has n distinct eigenvalues, then the set of corresponding eigenvectors
represents a basis in the vector space on which the matrix acts. If the eigenvalues
are not all distinct, a basis of eigenvectors may or may not exist.

ˆ If a matrix has an eigenvalue equal to zero, then the matrix is singular since its
determinant is zero.

Example 4 Find the eigenvalues and eigenvectors of the matrix

 
    ( −2   2  −3 )
A = (  2   1  −6 ) .
    ( −1  −2   0 )

The eigenvalues are: λ1 = 5, λ2 = λ3 = −3. Hence one of the eigenvalues is degenerate.

For λ1 = 5 solve (A − 5I)x1 = 0 → x1 = (1, 2, −1)T .

For λ2 = λ3 = −3 :

            (  1   2  −3 ) ( x1 )   (  x1 + 2x2 − 3x3 )
(A + 3I)x = (  2   4  −6 ) ( x2 ) = (  2x1 + 4x2 − 6x3 ) = 0.
            ( −1  −2   3 ) ( x3 )   ( −x1 − 2x2 + 3x3 )

Hence x = (−2x2 + 3x3 , x2 , x3 )T .

Two linearly independent eigenvectors are: x2 = (3, 0, 1)T , x3 = (−2, 1, 0)T .

ˆ On the special types of square matrices:

i) Hermitian matrix and symmetric matrix:


Their eigenvalues are real.
Proof: Consider the expression Ax = λx. Take the adjoint of this expression, i.e.
x† A† = λ∗ x† . Multiply on the left the first expression by x† and multiply on the
right the second expression by x, then compare them. You get λx† x = λ∗ x† x,
which implies λ = λ∗ .

ii) Anti-Hermitian matrix and antisymmetric matrix:


Their eigenvalues are purely imaginary or zero.

iii) Unitary matrix and orthogonal matrix:


Their eigenvalues have absolute value equal to one.
Proof: Consider the expression Ax = λx. Take the adjoint of this expression,
i.e. x† A† = λ∗ x† . Take the scalar product between these two expressions. You
get x† (A† A)x = λ∗ λ(x† x), which implies |λ|2 = 1.

iv) Theorem: The eigenvectors of all special matrices are linearly independent. In
addition, they can always be chosen in such a way that they form a mutually
orthogonal set.

4.3 Matrix diagonalisation


ˆ Similar matrices: Two (n × n) matrices A and A′ are said to be similar if there exists
a matrix S such that
A′ = S −1 AS.
The two matrices represent the same linear operator in different bases. The two bases
are related by the matrix S. The proof follows; however it will not be examinable or
shown during lectures.

Consider the basis {e1 , e2 , · · · , en } in an n-dimensional space. Then a vector x can


be written as follows
x = x1 e 1 + x 2 e 2 + · · · + x n e n ,
where x = (x1 , x2 , · · · , xn )T is the representation of the vector x in the basis
{e1 , e2 , · · · , en }. Similarly, if we consider a new basis {e′ 1 , e′ 2 , · · · , e′ n }, the vector x
can be written as follows
x = x′1 e′ 1 + x′2 e′ 2 + · · · + x′n e′ n ,
with x′ = (x′1 , x′2 , · · · , x′n )T . If the two bases are related by a matrix S

e′ i = Σ_{j=1}^{n} Sji ej ,

then the two representations for the vector x are related by x = S x′ since

x = Σ_{j=1}^{n} xj ej = Σ_{i=1}^{n} x′i e′ i = Σ_{j=1}^{n} ( Σ_{i=1}^{n} Sji x′i ) ej .

Consider now a linear operator A and the relation y = Ax. In the representation
associated with the basis {e1 , e2 , · · · , en }, it becomes y = A x. On the other hand,
in the basis {e′ 1 , e′ 2 , · · · , e′ n }, it is:
S y ′ = A S x′ → y ′ = S −1 AS x′ ,
hence A′ = S −1 AS.
ˆ Similar matrices A and A′ share the basis-independent properties of the operator A they
represent, such as the determinant, trace and set of eigenvalues.
ˆ Theorem: Diagonalisation of a matrix: If the new basis is a set of eigenvectors of
A then A′ ≡ D is diagonal with
 
D = diag(λ1 , λ2 , . . . , λn ),    S = ( x1 | x2 | . . . | xn ),

where λi are the eigenvalues and xi the eigenvectors (arranged as the columns of S). Proof:

AS = A ( x1 | x2 | . . . | xn ) = ( Ax1 | Ax2 | . . . | Axn ) = ( λ1 x1 | λ2 x2 | . . . | λn xn ) = ( x1 | x2 | . . . | xn ) D = SD.
Note that |A| = Π_i λi and Tr A = Σ_i λi .
ˆ If A is a special matrix, then A is diagonalisable since it is always possible to find a
basis of eigenvectors. Moreover, since the basis of eigenvectors can be chosen to be
orthonormal, the matrix S is unitary, i.e.
D = S † AS.
In fact, the rows of S † are the vectors x†i , so that

(S † S)ij = x†i xj = δij ,    i.e.  S † S = I.

Example 5 Diagonalise the matrix

 
    ( 1   0  3 )
A = ( 0  −2  0 ) .
    ( 3   0  1 )

The eigenvalues are: λ1 = 4, λ2 = λ3 = −2.

The general forms of the eigenvectors are: x1 = (a, 0, a)T , x2/3 = (b, c, −b)T .

Hence, a set of orthonormal eigenvectors is: x1 = (1/√2)(1, 0, 1)T , x2 = (1/√2)(1, 0, −1)T , x3 = (0, 1, 0)T . Hence

    ( 4   0   0 )        ( 1   0   1 ) ( 1   0  3 ) ( 1   1   0 )
D = ( 0  −2   0 ) = (1/2) ( 1   0  −1 ) ( 0  −2  0 ) ( 0   0  √2 ) = S † AS.
    ( 0   0  −2 )        ( 0  √2   0 ) ( 3   0  1 ) ( 1  −1   0 )
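The same diagonalisation can be reproduced numerically. The sketch below (not from the notes; NumPy assumed) uses numpy.linalg.eigh, which returns an orthonormal eigenbasis for a real symmetric matrix, so the matrix S of eigenvector columns is orthogonal and S^T A S is diagonal:

```python
import numpy as np

A = np.array([[1.0, 0.0, 3.0],
              [0.0, -2.0, 0.0],
              [3.0, 0.0, 1.0]])

evals, S = np.linalg.eigh(A)             # columns of S are orthonormal eigenvectors
print(evals)                             # [-2, -2, 4]
print(np.allclose(S.T @ S, np.eye(3)))   # S is orthogonal (real unitary)
print(np.allclose(S.T @ A @ S, np.diag(evals)))   # S^T A S = D
```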

ˆ Commuting matrices: Two matrices, A and B commute if

[A, B] ≡ (AB − BA) = 0.

Theorem: Two (n × n) matrices have the same set of eigenvectors if and only if they
commute. Note that the proof is not examinable and it will not be shown during
lectures.

Proof (assume all eigenvalues are different):

Suppose [A, B] = 0 and Ax = λx. Then

AB x = BA x = λ Bx,

thus Bx is an eigenvector of A with eigenvalue λ. Hence x and Bx are proportional


i.e. Bx = µx. Therefore, if x is an eigenvector of A, it is also an eigenvector of B.
Swapping A and B, the same argument applies, hence the two matrices share the
same set of eigenvectors.

Suppose A and B share the same set of eigenvectors i.e.

Axi = λ(i) xi , Bxi = µ(i) xi , i = 1, · · · , n.

Then a vector z in the vector space spanned by the set of eigenvectors can be written
as z = Σ_{i=1}^{n} ci xi . Consider the two expressions below

AB z = AB Σ_{i=1}^{n} ci xi = Σ_{i=1}^{n} ci µ(i) λ(i) xi ,

BA z = BA Σ_{i=1}^{n} ci xi = Σ_{i=1}^{n} ci λ(i) µ(i) xi .

Subtract them and you get (AB − BA) z = 0 for an arbitrary z. Hence [A, B] = 0.

Note that, if the eigenvalues of one of the matrices are degenerate, then not all pos-
sible sets of eigenvectors of one matrix are eigenvectors of the other matrix as well.
However, provided that by taking linear combinations a set of common eigenvectors
can be found, the result above still applies.

Note that [A, A† ] = 0 if A is a special matrix (hermitian, unitary, etc).

Applications: consider a square matrix A that can be diagonalised, then



i) n-th power of A :

A^n = AA . . . A = (SDS −1 )(SDS −1 ) . . . (SDS −1 ) = S D^n S −1 .

ii) Exponential of A :

e^A = Σ_{n=0}^{∞} A^n /n! ,

then

e^A = e^{SDS −1 } = Σ_{n=0}^{∞} (SDS −1 )^n /n! = S e^D S −1 .

Example 6 Consider a unitary matrix U .

i) Show that U has the form U = eiH for some hermitian matrix H.
Since U is a special matrix, it can be diagonalised: U = SDS † with

D = diag(e^{iθ1}, e^{iθ2}, . . . , e^{iθn})    and    e^{iθi} = λi .

Then U = S e^{iΛ} S † with Λ = diag(θ1 , θ2 , . . . , θn ).

It follows that U = e^{iSΛS †}, where SΛS † ≡ H is a hermitian matrix.

ii) Show that |U | = |e^{iH}| = e^{iTr H}.

Since H is hermitian, it can be diagonalised, H = SDS † with D = diag(λ1 , λ2 , . . . , λn ). Then

|U | = |e^{iH}| = |e^{iSDS †}| = |S e^{iD} S †| = |S||e^{iD}||S †| = |e^{iD}|

with e^{iD} = diag(e^{iλ1}, e^{iλ2}, . . . , e^{iλn}). Then

|e^{iD}| = Π_{j=1}^{n} e^{iλj} = e^{i Σ_j λj} = e^{iTr D} = e^{iTr H}.
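The relation e^A = S e^D S −1 can also be checked numerically against the defining power series. A sketch (illustration only; NumPy assumed, and the matrix is an arbitrary choice whose exponential is a rotation):

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])            # e^A = [[cos 1, sin 1], [-sin 1, cos 1]]

evals, S = np.linalg.eig(A)            # eigenvalues +/- i
expA_diag = (S @ np.diag(np.exp(evals)) @ np.linalg.inv(S)).real

# partial sum of the series e^A = sum_n A^n / n!
expA_series = np.zeros_like(A)
term = np.eye(2)
for n in range(1, 30):
    expA_series = expA_series + term
    term = term @ A / n

print(np.allclose(expA_diag, expA_series))
print(expA_diag)
```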
Chapter 5

Fourier Series

The Fourier series of a periodic function f (t) with period T is a representation of


the function f (t) as an infinite series of cosine and sine functions
f (t) = a0 /2 + Σ_{n=1}^{∞} [ an cos(2πnt/T ) + bn sin(2πnt/T ) ] .

The Fourier coefficients a0 , an and bn are:

a0 = (2/T ) ∫_{t0}^{t0 +T} f (t) dt,

an = (2/T ) ∫_{t0}^{t0 +T} f (t) cos(2πnt/T ) dt,    n = 0, 1, 2, 3, . . . ,

bn = (2/T ) ∫_{t0}^{t0 +T} f (t) sin(2πnt/T ) dt,    n = 1, 2, 3, . . . ,

where t0 is an arbitrary point along the t-axis. Apart from the term a0 , the other compo-
nents - the harmonics - have period T /n or frequency n/T. Because the frequencies of the
individual harmonics are integer multiples of the lowest frequency, the period of the sum
is T.
It is convenient to write Fourier series using complex exponentials:
f (t) = a0 /2 + Σ_{n=1}^{∞} [ an cos(2πnt/T ) + bn sin(2πnt/T ) ]
      = a0 /2 + Σ_{n=1}^{∞} [ e^{i2πnt/T} (an /2 + bn /(2i)) + e^{−i2πnt/T} (an /2 − bn /(2i)) ] .


Set (an − ibn )/2 ≡ cn . Then, since an = a−n and bn = −b−n , we get (an + ibn )/2 ≡ c−n . It
follows that

f (t) = a0 /2 + Σ_{n=1}^{∞} [ cn e^{i2πnt/T} + c−n e^{−i2πnt/T} ] = Σ_{n=−∞}^{∞} cn e^{i2πnt/T} .

A similar manipulation allows us to find

cn = (1/T ) ∫_{t0}^{t0 +T} f (t) e^{−i2πnt/T} dt.

Observations

i) The coefficients are, in general, complex numbers with c∗n = c−n (for real f ).

ii) |cn | = |c−n | = √(an² + bn²)/2.

iii) If f (t) is real and even then bn = 0 and coefficients cn are real and even. If f (t) is real
and odd then an = 0 and coefficients cn are purely imaginary and odd.

iv) Note that

c0 = (1/T ) ∫_{t0}^{t0 +T} f (t) dt

is the average value of the function.

v) The set of frequencies present in a given periodic signal is the spectrum of the signal.

vi) The square of the magnitude of the coefficients, |cn |2 can be identified with the energy
of the single harmonics. They form the energy spectrum. It follows that the energy of
the signal is:
(1/T ) ∫_{t0}^{t0 +T} |f (t)|² dt = Σ_{n=−∞}^{∞} |cn |² .

This is called Parseval’s theorem.



Figure 5.1: Function for Example 1.

Example 1 Calculate the Fourier series for the function sketched in the figure above.

The function is even, hence the coefficients bn = 0. The interval T = 2π. Consider
the function between −π and π, then

f (t) = −t if −π ≤ t ≤ 0,   t if 0 ≤ t ≤ π,

and the Fourier coefficients are:

a0 = (1/π) [ ∫_{−π}^{0} (−t) dt + ∫_{0}^{π} t dt ] = (2/π) ∫_{0}^{π} t dt = π,

an = (1/π) [ ∫_{−π}^{0} (−t) cos nt dt + ∫_{0}^{π} t cos nt dt ] = (2/π) ∫_{0}^{π} t cos nt dt = (2/π) [(−1)^n − 1]/n² .

It follows that c0 = π/2 and cn = −2/(πn2 ) for n odd.


Figure 5.2: Fourier series for the function in Example 1 (partial sums with n = 1, 3, 7).

Figure 5.3: Spectrum |cn |² in the frequency domain for the function in Example 1.
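The coefficients of Example 1 can be reproduced by direct numerical integration of cn = (1/T ) ∫ f (t) e^{−i2πnt/T} dt. A sketch (not part of the notes; SciPy assumed, with f (t) = |t| on (−π, π)):

```python
import numpy as np
from scipy.integrate import quad

T = 2 * np.pi

def c(n):
    re, _ = quad(lambda t: np.abs(t) * np.cos(2 * np.pi * n * t / T), -np.pi, np.pi)
    im, _ = quad(lambda t: -np.abs(t) * np.sin(2 * np.pi * n * t / T), -np.pi, np.pi)
    return (re + 1j * im) / T

print(c(0).real, np.pi / 2)        # c0 = pi/2
print(c(1).real, -2 / np.pi)       # cn = -2/(pi n^2) for n odd
print(c(2).real)                   # ~ 0 for even n != 0
```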

Observations

i) The larger the period in time, the smaller the spacing of the spectrum. If T → ∞ you
can imagine the frequencies to form a continuous spectrum.

ii) We need high frequencies to describe sharp corners such as in the triangle function.

Example 2 Calculate the Fourier series for the function


f (x) = −1 if −π < x < 0,   1 if 0 < x < π.

The function is odd, hence the coefficients an = 0. The interval T = 2π. The Fourier
coefficients are:

bn = (1/π) [ ∫_{−π}^{0} (−1) sin nt dt + ∫_{0}^{π} sin nt dt ] = (2/π) ∫_{0}^{π} sin nt dt = (2/π) [1 − (−1)^n ]/n .

It follows that cn = −2i/(πn) for n odd. The Fourier series is:

f (t) = (4/π) Σ_{n=1}^{∞} sin((2n − 1) t)/(2n − 1) .

The function is discontinuous at t = 0, ±π, ±2π, . . . and its value, at these points, is
zero.

Figure 5.4: Fourier series for the function in Example 2 (partial sums with n = 3, 10, 100).



Observations

i) We need arbitrarily high frequencies in order to describe jump discontinuities. In other
words we need an infinite series.

ii) We can use Fourier series to represent periodic functions with a finite number of finite
discontinuities in a given period. Note that this is not possible by using Taylor series.
The value of the Fourier series at the point of the discontinuity, t0 , is the average of
the upper and lower values i.e.

f (t0 ) = [ f (t0−) + f (t0+) ]/2 ,

where f (t0−) and f (t0+) are the left and right limits of the function at t0 , respectively.

iii) The overshooting at the points of discontinuities is called the Gibbs phenomenon. This
is a real phenomenon that does not go away in the limit. It is due to the fact
that we have pointwise convergence but not uniform convergence.

iv) Fourier series evaluated at specific points can be used to calculate series of constant
terms. Consider the function in Example 2 and its Fourier series
 
f (t) = (4/π) [ sin t + sin 3t/3 + sin 5t/5 + · · · ] .

Because f (π/2) = 1, it follows that

1 = (4/π) Σ_{n=0}^{∞} (−1)^n /(2n + 1),

which implies

π/4 = Σ_{n=0}^{∞} (−1)^n /(2n + 1).

This expression is the Leibniz formula for π.
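A short numerical sketch (illustration only) shows how slowly these partial sums approach π/4:

```python
import numpy as np

for N in (10, 1000, 100000):
    partial = sum((-1)**n / (2*n + 1) for n in range(N))
    print(N, partial, abs(partial - np.pi / 4))
```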

What happens if a function is not periodic?


Consider a function defined only on a finite interval T. Then, in order to calculate its
Fourier series we need to extend the function over the whole t-axis. In other words we
need to consider a periodic extension of the original function. The Fourier series of any
extension is a representation of the original function on the finite interval T. However,
normally continuous extensions are preferable because they allow us to avoid the Gibbs
phenomenon at the points of discontinuity (see page 421 in Riley).

Example 3 Sketch possible extensions for the function


f (t) = t² if 0 < t < 2,   0 otherwise.

All extensions below provide good representations of the function f (t) in the interval
0 ≤ t ≤ 2.

Figure 5.5: Example 3: even and continuous extension.

Figure 5.6: Example 3: odd and non-continuous extension.

Figure 5.7: Example 3: non-continuous extension.



Since the Fourier series are functions, can we perform on them the common
operations of differentiation and integration?

Given a Fourier series, integration and differentiation can be used to obtain Fourier series
for other functions. However, while integration is always a safe operation in the sense that
convergence of the new series is always guaranteed, differentiation is not, since an additional
power of n in the numerator reduces the rate of convergence of the new series.

Example 4 Consider the even extension for the function


f (t) = t² if 0 ≤ t ≤ 2,   0 otherwise.

Write its Fourier series and by using the operations of integration and differentiation
find the Fourier series of the function

g(t) = t³ if 0 ≤ t ≤ 2,   0 otherwise.

We choose the extension (i) in Example 3. Its Fourier series is:

f (t) = 4/3 + 16 Σ_{n=1}^{∞} [(−1)^n /(πn)²] cos(πnt/2) = t²   (0 ≤ t ≤ 2).

Take the derivative of the expression above, f (t) :

−8 Σ_{n=1}^{∞} [(−1)^n /(πn)] sin(πnt/2) = 2t   (0 ≤ t ≤ 2)  →  t = −4 Σ_{n=1}^{∞} [(−1)^n /(πn)] sin(πnt/2).

Integrate f (t) :

(4/3) t + 32 Σ_{n=1}^{∞} [(−1)^n /(πn)³] sin(πnt/2) + c = t³/3   (0 ≤ t ≤ 2),

where c is the constant of integration. We can replace the result for t into this
expression, then we get

t³ = −16 Σ_{n=1}^{∞} [(−1)^n /(πn)] sin(πnt/2) + 96 Σ_{n=1}^{∞} [(−1)^n /(πn)³] sin(πnt/2) + c′   (0 ≤ t ≤ 2).

Since g(0) = 0, c′ = 0 and the expression above becomes the Fourier series of the
function g(t).

5.1 Fourier series and vector spaces

The set of all periodic functions on the interval T that can be represented by Fourier series
forms a vector space.

i) Orthonormal basis:

en (t) = e^{2πint/T} /√T ,    n = 0, ±1, ±2, ±3, . . . .

ii) Operations: standard addition and multiplication by a scalar.

iii) Inner product:

⟨f |g⟩ = ∫_{0}^{T} f (t)∗ g(t) dt.

Note that

⟨en |em ⟩ = δmn ,    ⟨en |f ⟩ = √T cn .

iv) A general element of the space is:

f (t) = Σ_{n=−∞}^{∞} ⟨en |f ⟩ en = Σ_{n=−∞}^{∞} cn e^{2πint/T} .

v) Parseval’s theorem:

∫_{0}^{T} |f (t)|² dt = ⟨f |f ⟩ = T Σ_{n=−∞}^{∞} |cn |² = Σ_{n=−∞}^{∞} |⟨en |f ⟩|² .

The squared length of the ‘vector’ f is equal to the sum of the squares of the vector
components with respect to the orthonormal basis. In fact

⟨f |f ⟩ = ⟨ Σ_n ⟨en |f ⟩ en | Σ_m ⟨em |f ⟩ em ⟩ = Σ_{n,m} ⟨en |f ⟩∗ ⟨em |f ⟩ ⟨en |em ⟩ = T Σ_n |cn |² .
Chapter 6

Integral Transforms and the Dirac delta function

An integral transform is a function g that can be expressed as an integral of another
function f in the form

I[f (x)](y) ≡ g(y) = ∫_{−∞}^{∞} K(x, y) f (x) dx,

where K(x, y) is called the kernel of the transform.

ˆ I is a linear operator, that is:

I[c1 f1 + c2 f2 ] = c1 I[f1 ] + c2 I[f2 ],    c1 , c2 constants.

ˆ Given I such that I[f ] = g, the inverse operator I −1 is also a linear operator and
I −1 [g] = f.

There are several types of integral transforms. We are going to discuss the Fourier and the
Laplace transforms.

6.1 Fourier Transforms


The Fourier Transform of the function f (t) is:

F[f (t)](ω) ≡ f̂ (ω) = ∫_{−∞}^{∞} f (t) e^{−iωt} dt,    ω ∈ R,

and it is a complex-valued function of ω. The inverse Fourier transform is defined as follows

F −1 [f̂ (ω)](t) = f (t) = (1/2π) ∫_{−∞}^{∞} f̂ (ω) e^{iωt} dω.


Observations

i) If ∫_{−∞}^{∞} |f (t)| dt is finite, then the integral transform exists and it is continuous (no
proof). Note that this does not mean that the inverse Fourier transform exists. On
the other hand, we can say that if f is continuous and has a Fourier transform, then
its inverse Fourier transform exists.

ii) There are different ways to write Fourier transforms

F[f (t)](ω) ≡ f̂ (ω) = (1/A) ∫_{−∞}^{∞} f (t) e^{−iBωt} dt,

with

A = 1, B = ±1,   or   A = √(2π), B = ±1,   or   A = 1, B = ±2π.

Note that the inverse Fourier transform should be defined consistently with the above
definitions.

iii) We need Fourier transforms because most functions are not periodic and Fourier series
cannot be used. Non-periodic functions can be seen as periodic functions defined over
an infinite interval.

How Fourier transforms emerge from the Fourier series in the limit

Consider the complex Fourier series with period T

f (t) = Σ_{n=−∞}^{∞} cn e^{i2πnt/T} = Σ_{n=−∞}^{∞} [ (1/T ) ∫_{−T/2}^{T/2} f (t) e^{−i2πnt/T} dt ] e^{i2πnt/T} .

Set 2πn/T ≡ ωn . This implies ∆ω = ωn+1 − ωn = 2π/T. Then

f (t) = Σ_{n=−∞}^{∞} (∆ω/2π) [ ∫_{−T/2}^{T/2} f (t) e^{−itωn} dt ] e^{itωn} .

When T −→ ∞ the expression above becomes

f (t) = ∫_{−∞}^{∞} (dω/2π) [ ∫_{−∞}^{∞} f (t) e^{−itω} dt ] e^{itω} = (1/2π) ∫_{−∞}^{∞} dω f̂ (ω) e^{itω} .

We have shown that the scaled Fourier coefficient, T cn , of the function f tends to f̂ .

Example 1 Calculate the complex Fourier series for the following periodic function

f (t) = 0 if −T /2 ≤ t ≤ −1/2,   1 if −1/2 < t < 1/2,   0 if 1/2 ≤ t ≤ T /2,    T > 1,

and the Fourier transform for the following non-periodic function

rect(t) ≡ g(t) = 1 if −1/2 < t < 1/2,   0 otherwise.

Then compare T cn from the Fourier series of f (t) and ĝ(ω) by sketching them on
the same axis.

The coefficients for the complex Fourier series of f (t) are: cn = (1/πn) sin(nπ/T ). Then

T cn = sin(nπ/T )/(nπ/T ) = ∫_{−T/2}^{T/2} dt f (t) e^{−i2πnt/T} .

On the other hand, the Fourier transform of g(t) is:

ĝ(ω) = ∫_{−1/2}^{1/2} e^{−iωt} dt = [ e^{−iωt} /(−iω) ]_{−1/2}^{1/2} = sin(ω/2)/(ω/2) ≡ sinc(ω/2).

Figure 6.1: Example 1: Spectrum for different values of the period (T = 4, 16, 25, . . . ).


Observations

i) The spectrum of a periodic function is always discrete. On the other hand, the
Fourier transform of a non-periodic function produces a continuous spectrum.

ii) Note that the sinc-function is a continuous function even if the rect-function is not.

iii) f̂ (0) represents the area under the function f.

iv) The Fourier transform and its inverse allow us to see two different representations of
the same signal.

v) Parseval’s identity:

∫_{−∞}^{∞} |f (t)|² dt = (1/2π) ∫_{−∞}^{∞} |f̂ (ω)|² dω.

It represents the energy spectrum in the two domains.
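The rect/sinc pair of Example 1 can be verified by direct quadrature of the Fourier integral; a sketch (not part of the notes; SciPy assumed), exploiting the fact that for the even rect function only the cosine part of e^{−iωt} contributes:

```python
import numpy as np
from scipy.integrate import quad

def g_hat(w):
    # Fourier transform of rect(t): integrate cos(w t) over (-1/2, 1/2)
    val, _ = quad(lambda t: np.cos(w * t), -0.5, 0.5)
    return val

for w in (0.5, 2.0, 7.0):
    print(g_hat(w), np.sin(w / 2) / (w / 2))   # both equal sinc(w/2)
```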

Properties of the Fourier transform

ˆ If f is even then its Fourier transform is even. If f is odd its Fourier transform is odd.
If f is real and even then its Fourier transform is real and even. On the other hand,
if f is real and odd its Fourier transform is purely imaginary and odd.

ˆ Translation or shift:

F[f (t + a)](ω) = eiaω fˆ(ω), a constant.

ˆ Exponential multiplication:

F[eαt f (t)](ω) = fˆ(ω + iα), α constant.

ˆ Scaling or stretch:

F[f (at)](ω) = ∫_{−∞}^{∞} f (at) e^{−iωt} dt,

where a ̸= 0 is a constant. Set at = t′ , then

F[f (at)](ω) = (1/|a|) ∫_{−∞}^{∞} f (t′ ) e^{−iωt′/a} dt′ = (1/|a|) f̂ (ω/a).

Note that a function squeezed in the time domain is stretched out in the frequency
domain and vice versa.

Example 2 Calculate the Fourier transform of f (at + b), where a and b are
constants, in terms of the Fourier transform of f (t). Then, apply the formula
to the function rect((t − 4)/3), where the rect function is defined in Example
1.

By using the translation and scaling properties, we get

F[f (at + b)](ω) = (1/|a|) F[f (t + b)](ω/a) = (1/|a|) f̂ (ω/a) e^{iωb/a} .

In the rectangular pulse a = 1/3 and b = −4/3. This function describes a
rectangular pulse shifted by 4 and stretched by a factor of 3 with
respect to the basic rectangular pulse. Its Fourier transform is:

F[rect((t − 4)/3)](ω) = 3 [sin(3ω/2)/(3ω/2)] e^{−i4ω} = 3 sinc(3ω/2) e^{−i4ω} .

ˆ The Fourier transform of a derivative:

F[f ′ (t)](ω) = ∫_{−∞}^{∞} f ′ (t) e^{−iωt} dt = [ f (t) e^{−iωt} ]_{−∞}^{∞} + (iω) ∫_{−∞}^{∞} f (t) e^{−iωt} dt = (iω) f̂ (ω).

In fact, since ∫_{−∞}^{∞} |f (t)| dt is finite, f (t) −→ 0 when t → ±∞.

For a derivative of order n we have

F[f^(n) (t)](ω) = (iω)^n f̂ (ω).

ˆ Convolution and convolution theorem


The convolution theorem answers the following question: which operation on the
functions f and g do we need to perform so that in the frequency domain we can
write f̂ (ω)ĝ(ω)? Let us find out.

f̂ (ω)ĝ(ω) = ∫_{−∞}^{∞} dt f (t) e^{−iωt} ∫_{−∞}^{∞} dz g(z) e^{−iωz} .

Taking exp(−iωt) into the second integral and setting (z + t) = y we get

f̂ (ω)ĝ(ω) = ∫_{−∞}^{∞} dt f (t) ∫_{−∞}^{∞} dy g(y − t) e^{−iωy} = ∫_{−∞}^{∞} dy [ ∫_{−∞}^{∞} dt f (t) g(y − t) ] e^{−iωy}

= F[ ∫_{−∞}^{∞} dt f (t) g(y − t) ](ω) = F[h(y)](ω).

Hence, the result of such an operation is h, and such an operation is called convolution.
It is denoted by a star ∗:

h(y) = ∫_{−∞}^{∞} f (t) g(y − t) dt ≡ (f ∗ g)(y) = (g ∗ f )(y).

The convolution theorem for Fourier transforms tells us

ĥ(ω) = f̂ (ω)ĝ(ω).

Observations

i) The convolution is commutative: (g ∗ f ) = (f ∗ g).


ii) The convolution is associative: (g ∗ f ) ∗ h = g ∗ (f ∗ h).
iii) The convolution is a ‘smoothing’ operation: (g ∗ f )′ = (g ′ ∗ f ) = (g ∗ f ′ ), where
the ‘prime’ symbol indicates a derivative.
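A discrete analogue of the convolution theorem is easy to check: for periodic sequences, the discrete Fourier transform of the circular convolution equals the product of the transforms. A sketch (illustration only; NumPy assumed, random input sequences):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(64)
g = rng.standard_normal(64)

# circular convolution h[k] = sum_n f[n] g[(k - n) mod N]
N = len(f)
h = np.array([sum(f[n] * g[(k - n) % N] for n in range(N)) for k in range(N)])

print(np.allclose(np.fft.fft(h), np.fft.fft(f) * np.fft.fft(g)))
```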

Let us see an example of the use of the convolution in the context of filters. A filter
modifies the spectral content of an input signal. The filtering is realised by a fixed function
in the frequency domain called the transfer function ĥ.

Example 3: lowpass filter Consider an input signal u and a lowpass filter. This type
of filter eliminates all frequencies greater than |ω0 |. A suitable transfer function is:

ĥ(ω) = 1 if |ω| < ω0 ,   0 otherwise,    ω0 > 0.

This function can be written in terms of the rect function as follows

ĥ(ω) = rect(ω/(2ω0 )).

The inverse Fourier transform of rect(ω) is (1/2π) sinc(t/2). It follows, by the scaling property, that

h(t) = (ω0 /π) sinc(tω0 ),

and the output signal v is given by the convolution of u, the input signal, and h

v(y) = (u ∗ h)(y) = (ω0 /π) ∫_{−∞}^{∞} u(y − t) sinc(tω0 ) dt.

Example 4 Consider the classical hamiltonian (energy) for a harmonic oscillator

H(p, x) = p²/(2m) + (1/2) mω² x² = E,    ω = √(k/m),

and the Schrödinger equation associated with it, i.e.

−(h̄²/2m) d²ψ(x)/dx² + (1/2) mω² x² ψ(x) = E ψ(x),
where ψ(x) represents the wave function of the harmonic oscillator in coordinate
space. The solution for the ground state is:

ψ0 (x) = e^{−(mω/2h̄)x²} ,    E0 = h̄ω/2.

This is a Gaussian with width ∆x = √(h̄/(mω)). We want to find out the ground
state wave function in momentum space. In order to do so, calculate the Fourier
transform of ψ0 (x). The variable in Fourier space is k = p/h̄ (see workshops for the
Fourier transform of a Gaussian.) Then, calculate ∆p. What is the meaning of the
quantity ∆x∆p?

The wave function in momentum space is:

F[ψ0 (x)](k) = (1/√(2π)) ∫_{−∞}^{∞} e^{−(mω/2h̄)x²} e^{−ikx} dx = √(h̄/(mω)) e^{−k²h̄/(2mω)} = √(h̄/(mω)) e^{−p²/(2h̄mω)} .

This is a Gaussian with width ∆p = √(h̄mω). It follows that ∆x∆p = h̄, which codifies
the uncertainty principle in QM.

6.2 The Dirac delta function

Consider a family of rectangular pulses

δε (x) = (1/ε) rect(x/ε) = 1/ε if −ε/2 < x < ε/2,   0 otherwise,    ε > 0.

If we take the duration of the pulse to decrease, while retaining a unit area, then, in the
limit, we are led to the notion of the Dirac δ-function.

Figure 6.2: Dirac δ-function as the limit of the pulses δε (x) for ε = 1, 0.5, 0.125, 0.0625, 0.03125.

In order to define the Dirac δ-function, we need to pair it with a test function, f , by means
of an integration i. e.
∫_{−∞}^{∞} δε (x) f (x) dx.

The test function f is a smooth function near the origin so that we can use Taylor expansion

to write

∫_{−∞}^{∞} δε (x) f (x) dx = (1/ε) ∫_{−ε/2}^{ε/2} f (x) dx = (1/ε) ∫_{−ε/2}^{ε/2} ( f (0) + f ′ (0) x + O(x²) ) dx = f (0) + O(ε²).

In the limit

lim_{ε→0} ∫_{−∞}^{∞} δε (x) f (x) dx = f (0).

Therefore, we can write

∫_{−∞}^{∞} δ(x) f (x) dx = f (0),    ∫_{−∞}^{∞} δ(x) dx = 1.

The Dirac delta function δ(x − a) - with a a constant - is a generalised function (or
distribution) and it is defined as the limit of a sequence (not unique) of functions. Its
defining property is:

∫_{α}^{β} f (x) δ(x − a) dx = f (a) if α < a < β,   0 otherwise.

Example 5 Calculate the following integrals

i) ∫_{−4}^{4} δ(x − π) cos x dx = cos π = −1.

ii) δ̂(ω) = ∫_{−∞}^{∞} δ(t) e^{−iωt} dt = 1.

Note that in the last example we have calculated the Fourier transform of the Dirac δ-
function. If F[δ(t)](ω) = 1, then it must be that F −1 [1](t) = δ(t). Indeed. This is called
the integral representation of the Dirac delta function i.e.

δ(t) = (1/2π) ∫_{−∞}^{∞} e^{iωt} dω.

Example 6 Calculate the Fourier transform of cos(ta) where a is a constant.

F[cos(ta)](ω) = (1/2) ∫_{−∞}^{∞} e^{−iωt} ( e^{ita} + e^{−ita} ) dt = (1/2) ∫_{−∞}^{∞} ( e^{it(−ω+a)} + e^{−it(ω+a)} ) dt

= π ( δ(−ω + a) + δ(−ω − a) ) = π ( δ(ω − a) + δ(ω + a) ) .

Calculate the inverse and verify that you get back the cosine function.

Properties of the Dirac delta function.


i) δ(x) = δ(−x).

ii) δ(g(x)) = Σ_a δ(x − a)/|g ′ (a)| ,
where the a are the roots of the function g(x), i.e. g(a) = 0 and g ′ (a) ̸= 0.

Example 7 Calculate I = ∫_{−∞}^{∞} δ(x² − b²) f (x) dx, where b is a constant.

First, simplify the Dirac delta-function

δ(x² − b²) = δ(x − b)/|2b| + δ(x + b)/|−2b| ,

then

I = (1/|2b|) ∫_{−∞}^{∞} δ(x − b) f (x) dx + (1/|2b|) ∫_{−∞}^{∞} δ(x + b) f (x) dx = (1/|2b|) ( f (b) + f (−b) ) .

iii) ∫_{−∞}^{∞} f (x) δ ′ (x − a) dx = −f ′ (a).

iv) H ′ (x) = δ(x),

where H(x) is the Heaviside step function defined as follows

H(x) = 1 if x ≥ 0,   0 if x < 0.

In fact

∫_{−∞}^{∞} f (x) H ′ (x) dx = [ f (x) H(x) ]_{−∞}^{∞} − ∫_{−∞}^{∞} f ′ (x) H(x) dx = − ∫_{0}^{∞} f ′ (x) dx = f (0),

where the boundary term vanishes because the test function f vanishes at ±∞.

Since f (0) = ∫_{−∞}^{∞} f (x) δ(x) dx, the property is proved.

Figure 6.3: Dirac δ-function as derivative of the Heaviside step function.


v) The convolution theorem holds. In fact
(δ ∗ f )(t) = f (t), F[(δ ∗ f )](ω) = F[δ]F[f ] = F[f ].

6.3 Laplace transforms


The Laplace transform of the function f (t) is:

L[f (t)](s) ≡ f̄ (s) = ∫_{0}^{∞} f (t) e^{−st} dt,

where s is taken to be real. Note that sometimes a constraint on the variable s should be
imposed in order for the integral to exist.

Example 8 Calculate the Laplace transform of the functions t and cosh(kt), where k
is a constant.
L[t](s) = ∫_{0}^{∞} t e^{−st} dt = 1/s²    for s > 0.

L[cosh(kt)](s) = ∫_{0}^{∞} cosh(kt) e^{−st} dt = (1/2) [ e^{(k−s)t} /(k − s) − e^{−(k+s)t} /(k + s) ]_{0}^{∞} = s/(s² − k²)

for s > |k|.

Properties of the Laplace transform.

ˆ Delay rule:

L[H(t − a)f (t − a)](s) = e−sa f¯(s), a constant.

ˆ Exponential multiplication:

L[eat f (t)](s) = f¯(s − a), a constant.

ˆ Scaling:

L[f (at)](s) = (1/|a|) f̄ (s/a),    a ̸= 0 constant.

ˆ Polynomial multiplication:

L[t^n f (t)](s) = (−1)^n d^n f̄ (s)/ds^n ,    n = 1, 2, 3, . . . .

ˆ The Laplace transform of a derivative:

For the derivative of order one we have

L[f ′ (t)](s) = ∫_{0}^{∞} f ′ (t) e^{−st} dt = [ f (t) e^{−st} ]_{0}^{∞} + s ∫_{0}^{∞} f (t) e^{−st} dt = −f (0) + s f̄ (s),    s > 0.

For a derivative of order n

L[f^(n) (t)](s) = s^n f̄ (s) − s^{n−1} f (0) − s^{n−2} f^(1) (0) − · · · − f^(n−1) (0),    s > 0,

where f^(n) is the nth derivative of the function f .

ˆ The Laplace transform for integration:

L[ ∫_{0}^{t} f (u) du ](s) = f̄ (s)/s .

Example 9 Using the properties of the Laplace transforms and the result
L[cosh(kt)](s) = s/(s2 − k 2 ) with s > |k|, calculate the Laplace transform of the
following functions

i) sinh(kt).

Use the result of Example 4 (ii). Then


  
L[sinh(kt)](s) = L[ (1/k) d cosh(kt)/dt ](s) = (1/k) [ −1 + s · s/(s² − k²) ] = k/(s² − k²)

for s > |k|.

ii) t sinh(kt).

L[t sinh(kt)](s) = (−1) d/ds [ k/(s² − k²) ] = 2ks/(s² − k²)²    for s > |k|.
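These table results can also be reproduced symbolically; a sketch with SymPy's laplace_transform (SymPy is an assumption, not part of the notes):

```python
import sympy as sp

t, s, k = sp.symbols('t s k', positive=True)

print(sp.laplace_transform(t, t, s, noconds=True))               # 1/s**2
print(sp.laplace_transform(sp.cosh(k*t), t, s, noconds=True))    # s/(s**2 - k**2)
print(sp.laplace_transform(sp.sinh(k*t), t, s, noconds=True))    # k/(s**2 - k**2)
print(sp.laplace_transform(t*sp.sinh(k*t), t, s, noconds=True))  # 2*k*s/(s**2 - k**2)**2
```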

The convolution theorem for a Laplace transform

If the functions f and g have Laplace transforms f̄ and ḡ, then:

L[(f ∗ g)](s) = L[(g ∗ f )](s) = f̄ (s)ḡ(s).

What is the convolution operation in this case? Let us find out.

f̄ (s)ḡ(s) = ∫_{0}^{∞} du f (u) ∫_{0}^{∞} dv e^{−s(u+v)} g(v).

Set (u + v) = t, then

f̄ (s)ḡ(s) = ∫_{0}^{∞} du f (u) ∫_{u}^{∞} dt e^{−st} g(t − u).

Swapping the order of integration, and being careful with the new limits of integration, we
have

f̄ (s)ḡ(s) = ∫_{0}^{∞} dt e^{−st} [ ∫_{0}^{t} du f (u) g(t − u) ] = L[ ∫_{0}^{t} f (u) g(t − u) du ](s),

which implies

(f ∗ g)(t) = ∫_{0}^{t} f (u) g(t − u) du.

The inverse of a Laplace transform

Formally
L−1 [f¯(s)] = f (t).

The general method for calculating inverse Laplace transforms requires notions of complex
analysis. Nevertheless, in some cases it is possible to calculate an inverse Laplace transform
by means of

ˆ partial fraction decomposition,

ˆ convolution theorem,

together with the Laplace transform properties and tables of known Laplace transforms
(see table on page 455 in Riley.) In this course we are going to limit ourselves to the use
of these two techniques.

Example 10 Use partial fraction decomposition and the table at page 455 in order
to calculate f (t) given that

f̄ (s) = (s + 3)/(s(s + 1)) .

f̄ (s) = 3/s − 2/(s + 1) = f̄1 (s) + f̄2 (s).

Using the tables we have L−1 [f̄1 (s)](t) = 3 for s > 0 and L−1 [f̄2 (s)](t) = −2 e^{−t}
for s > −1, hence

L−1 [f̄ (s)](t) = 3 − 2 e^{−t}

for s > 0.

Example 11 Use the convolution theorem and the table at page 455 in order to
calculate f (t) given that

f̄ (s) = 2/(s²(s − 1)²) .

f̄ (s) = (2/s²) · (1/(s − 1)²) = f̄1 (s) f̄2 (s).

Using the tables we have L−1 [f̄1 (s)](t) = 2t for s > 0 and L−1 [f̄2 (s)](t) = t e^t for
s > 1, hence

L−1 [f̄ (s)](t) = ∫_{0}^{t} 2(t − u) u e^u du = 2 e^t (t − 2) + 2 (t + 2),

for s > 1.

Example 12 Use the Laplace transform in order to find a solution, i.e. f (t), for the
following ODE

df /dt + 2 f (t) = e^{−t} ,    f (0) = 3.

Start by taking the Laplace transform of the ODE:

L[df /dt](s) + 2 L[f ](s) = L[e^{−t} ](s),

which becomes −f (0) + s f̄ (s) + 2 f̄ (s) = 1/(s + 1), so

f̄ (s) = (3s + 4)/((s + 2)(s + 1)) = 1/(s + 1) + 2/(s + 2).

Hence f (t) = e^{−t} + 2 e^{−2t} .
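As a cross-check (illustration only; SymPy assumed), solving the same initial-value problem directly reproduces the Laplace-transform result:

```python
import sympy as sp

t = sp.symbols('t')
f = sp.Function('f')

ode = sp.Eq(f(t).diff(t) + 2*f(t), sp.exp(-t))
sol = sp.dsolve(ode, f(t), ics={f(0): 3})
print(sp.simplify(sol.rhs - (sp.exp(-t) + 2*sp.exp(-2*t))))   # 0, so f(t) = e^-t + 2 e^-2t
```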


Chapter 7

Vector Calculus: Del operator & integrals

Please refer to the content in section 1.3.

7.1 The Del operator


We define the linear vector differential operator Del (or Nabla) in cartesian coordinates
as follows

∇ = i ∂/∂x + j ∂/∂y + k ∂/∂z .

Let us apply such an operator to scalar and vector functions.

ˆ The gradient of a scalar field ϕ:

grad ϕ = ∇ϕ = (∂ϕ/∂x) i + (∂ϕ/∂y) j + (∂ϕ/∂z) k.

This is a vector field and some useful rules are:

i) ∇(ϕ + ψ) = ∇ϕ + ∇ψ
ii) ∇(ϕψ) = ψ∇ϕ + ϕ∇ψ
iii) ∇(ψ(ϕ)) = ψ ′ (ϕ)∇ϕ
iv) Special cases: ∇r = r/r, ∇(1/r) = −r/r³ , ∇ϕ(r) = ϕ′ r/r, where r is the
modulus of the position vector r, i.e. r = √(x² + y² + z²).

Consider a surface ϕ(x, y, z) = c where c is a constant. By definition

dϕ = (∂ϕ/∂x) dx + (∂ϕ/∂y) dy + (∂ϕ/∂z) dz
   = [ (∂ϕ/∂x) i + (∂ϕ/∂y) j + (∂ϕ/∂z) k ] · (dx i + dy j + dz k) = ∇ϕ · dr = 0.


It follows that ∇ϕ is perpendicular to the surface since dr is the tangent vector.

Example 1 Consider the surface ϕ(x, y, z) = x2 + y 2 + z 2 = c. Calculate ∇ϕ


and verify that it is perpendicular to the surface.

∇ϕ = 2x i + 2y j + 2z k = 2 r.

This vector is proportional to r, hence it is clearly perpendicular to the surface.

ˆ The divergence of a vector field a:

div a = ∇ · a = ∂ax /∂x + ∂ay /∂y + ∂az /∂z .

This is a scalar field and some useful rules are:

i) ∇ · (a + b) = ∇ · a + ∇ · b
ii) ∇ · (ϕ a) = ∇ϕ · a + ϕ(∇ · a), ∇ · (a × b) = b · (∇ × a) − a · (∇ × b)
iii) Special case: ∇ · r = 3
iv) If ∇ · a = 0, a is said to be solenoidal.

Example 2 Use the index notation to show that

∇ · (a × b) = b · (∇ × a) − a · (∇ × b).

Note that this is a scalar triple product.

∇ · (a × b) = ϵijk ∇i (aj bk ) = ϵijk (∇i aj ) bk + ϵijk aj (∇i bk )


= ϵkij bk (∇i aj ) − ϵjik aj (∇i bk ) = b · (∇ × a) − a · (∇ × b)

ˆ The curl of a vector field a:


     
curl a = ∇ × a = i (∂az /∂y − ∂ay /∂z) + j (∂ax /∂z − ∂az /∂x) + k (∂ay /∂x − ∂ax /∂y)

i) ∇ × (a + b) = ∇ × a + ∇ × b
ii) ∇ × (ϕ a) = (∇ϕ) × a + ϕ(∇ × a),
∇ × (a × b) = (b · ∇)a − (∇ · a)b − (a · ∇)b + (∇ · b)a
iii) Special case: ∇ × r = 0
iv) If ∇ × a = 0, a is said to be irrotational.

Note that because ∇ is a differential operator the order matters i.e.

∇ · a ̸= a · ∇, ∇ × a ̸= a × ∇.

Let us apply the nabla operator on gradient, divergence and curl. We get five possible
combinations quite common in physics, for instance in electromagnetism. They are:

i) The divergence of a gradient is called the Laplacian of the scalar function

∇ · (∇ϕ) = ∇²ϕ = ∂²ϕ/∂x² + ∂²ϕ/∂y² + ∂²ϕ/∂z² ,

where ∇² is a scalar differential operator and it is called the Laplacian.

ii) ∇ × (∇ϕ) = 0. All gradients are irrotational.

iii) ∇ · (∇ × a) = 0. All curls are solenoidal.

iv) ∇ × (∇ × a) = ∇(∇ · a) − ∇2 a.

Example 3 Consider Maxwell’s equations in vacuum

(a) ∇ · B = 0,    (b) ∇ · E = 0,
(c) ∇ × B = ϵ0 µ0 ∂E/∂t,    (d) ∇ × E = −∂B/∂t .

i) Derive the Laplace equation of electrostatics.

Since E = −∇ϕ, ∇ · E = −∇ · (∇ϕ) = −∇²ϕ = 0, hence ∇²ϕ = 0.

ii) Derive the electromagnetic wave equation.

Take the curl of (d):

∇ × (∇ × E) = −∇ × ∂B/∂t = −∂(∇ × B)/∂t ,

and note that ∇ × (∇ × E) = ∇(∇ · E) − ∇²E = −∇²E by (b).

Take the time derivative of (c):

∂(∇ × B)/∂t = ϵ0 µ0 ∂²E/∂t² .

Combine the two results

∇²E = ϵ0 µ0 ∂²E/∂t² ,    (ϵ0 µ0 )^{−1} = c²   →   (1/c²) ∂²E/∂t² − ∇²E = 0.
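Two of the identities used above, ∇ × (∇ϕ) = 0 and ∇ · (∇ × a) = 0, can be verified symbolically by writing the operators out in cartesian components. A sketch (not part of the notes; SymPy assumed, with an arbitrary ϕ and the vector field of Example 7 below):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

def grad(phi):
    return [sp.diff(phi, x), sp.diff(phi, y), sp.diff(phi, z)]

def div(a):
    return sp.diff(a[0], x) + sp.diff(a[1], y) + sp.diff(a[2], z)

def curl(a):
    return [sp.diff(a[2], y) - sp.diff(a[1], z),
            sp.diff(a[0], z) - sp.diff(a[2], x),
            sp.diff(a[1], x) - sp.diff(a[0], y)]

phi = x**2 * sp.sin(y) * sp.exp(z)
a = [x*y**2 + z, x**2*y + 2, x]

print(curl(grad(phi)))   # [0, 0, 0]: gradients are irrotational
print(div(curl(a)))      # 0: curls are solenoidal
print(curl(a))           # [0, 0, 0]: this particular a is conservative (see Example 7)
```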

7.2 Line integrals

ˆ Curves: A curve C can be represented by a vector function r that depends on one


parameter u (parametric representation)

r(u) = x(u) i + y(u) j + z(u) k.

Example 4 Provide a parametric representation for the following curves

i) The curve y = −x with −1 ≤ x ≤ 1.

r(u) = u i − u j,    −1 ≤ u ≤ 1.

ii) The curve x²/4 + y² = 1 with y ≥ 0 and z = 3.

r(u) = 2 cos u i + sin u j + 3 k,    0 ≤ u ≤ π.

iii) The curve x² + y² − 4x + 3 = 0 with z = 0.

This is clearly a circle of radius 1 with centre at the point (2, 0), hence

r(u) = (2 + cos u) i + sin u j,    0 ≤ u ≤ 2π.

i) The derivative r′ (u) ≡ t(u) is a vector tangent to the curve at each point.
ii) The arc length s measured along the curve satisfies:

(ds/du)² = (dr/du) · (dr/du) = |dr/du|² ,    ds = ± √[ (dr/du) · (dr/du) ] du,

where the sign fixes the direction of measuring s, for increasing or decreasing u.
Note that ds is the line element of the curve.

• Definition: The line integral (or path integral) of a vector field a(r) along the
curve C is:

∫_C a(r) · dr = ∫_{umin}^{umax} a(r(u)) · (dr/du) du,

where C is a smooth oriented (a direction along C must be specified) curve defined
by the equation r(u) with endpoints A = r(umin) and B = r(umax).

Example 5 Consider the vector function a(r) = x e^y i + z² j + xy k. Evaluate
the integral ∫_C a · dr along the following curves, with the same end points
A = (0, 0, 0) and B = (1, 1, 1).

i) C1 : r(u) = u i + u j + u k, 0 ≤ u ≤ 1.

The parametrisation tells us that x = u, y = u, z = u. Hence

a(r(u)) = u e^u i + u² j + u² k,   r′(u) = i + j + k,   a(r(u)) · r′(u) = u e^u + 2u²,

hence

∫_{C1} a · dr = ∫_0^1 (u e^u + 2u²) du = 5/3.

ii) C2 : r(u) = u i + u2 j + u3 k, 0 ≤ u ≤ 1.

The parametrisation tells us that x = u, y = u², z = u³. Hence

a(r(u)) = u e^(u²) i + u⁶ j + u³ k,   r′(u) = i + 2u j + 3u² k,
a(r(u)) · r′(u) = u e^(u²) + 2u⁷ + 3u⁵,

hence

∫_{C2} a · dr = ∫_0^1 (u e^(u²) + 2u⁷ + 3u⁵) du = e/2 + 1/4.
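The two results can be cross-checked numerically. The sketch below assumes numpy and scipy are available and simply integrates the scalar integrands a(r(u)) · r′(u) obtained above.

import numpy as np
from scipy.integrate import quad

# a(r(u)) . r'(u) along C1 and C2, taken from the calculation above
f1 = lambda u: u*np.exp(u) + 2*u**2
f2 = lambda u: u*np.exp(u**2) + 2*u**7 + 3*u**5

print(quad(f1, 0, 1)[0])   # ~1.667 = 5/3
print(quad(f2, 0, 1)[0])   # ~1.609 = e/2 + 1/4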

Properties/Observations.

i) In general the integral depends on the endpoints and the path C.


ii) ∫_C a · dr = − ∫_{−C} a · dr, where C is a curve with orientation A → B and −C is
a curve with orientation B → A.

iii) If C = C1 + C2 + · · · + Cn then ∫_C a · dr = ∫_{C1} a · dr + ∫_{C2} a · dr + · · · + ∫_{Cn} a · dr.
Note that the orientations of the segments must be chosen compatibly.

iv) Other kinds of line integrals are possible. For instance:

∫_C ϕ dr,   ∫_C a × dr,   ∫_C ϕ ds,   ∫_C a ds.
C C C C

Example 6 Evaluate ∫_C ϕ ds where ϕ = (x − y)² and r(u) = a cos u i +
a sin u j, 0 ≤ u ≤ π, a constant.

ds = √(dr/du · dr/du) du = a du, then

∫_C ϕ ds = ∫_0^π (a cos u − a sin u)² a du = π a³.

• Definition: Simply connected region. A region D is simply connected if every
closed path within D can be shrunk to a point without leaving the region.

Theorem: Consider the integral I = ∫_C a · dr, where the path C is in a simply
connected region D. Then, the following statements are equivalent:

i) The line integral I is independent of the path C. It only depends on the end-
points of the path C.

ii) There exists a scalar function ϕ (a potential) such that a = ∇ϕ.


Notice that the sign is a convention. In physics, because of the meaning of
potential in association with forces, we use a = −∇ϕ.

iii) ∇ × a = 0.

The vector field a is said to be conservative (or irrotational ) and ϕ is its potential.
In addition:

i) I = ∫_C ∇ϕ · dr = ϕ(B) − ϕ(A), where A and B are the endpoints of the path C.

Notice that if you use a = −∇ϕ, then I = ϕ(A) − ϕ(B).

ii) The line integral I along any closed path C in D is zero.



Example 7 Consider the vector function a(r) = (xy² + z) i + (x²y + 2) j + x k.

i) Show that the field a is conservative and find its potential ϕ.

Since ∇ × a = 0, the field is conservative. Hence a = ∇ϕ = (∂ϕ/∂x) i + (∂ϕ/∂y) j + (∂ϕ/∂z) k.

∂ϕ/∂x = ax = xy² + z  →  ϕ = x²y²/2 + zx + f(y, z),
∂ϕ/∂y = ay = x²y + 2 = x²y + ∂f/∂y  →  f = 2y + g(z),
∂ϕ/∂z = az = x = x + dg/dz  →  g = c.

It follows that ϕ = x²y²/2 + xz + 2y + c.
ii) Evaluate the integral ∫_C a · dr along the curve r(u) = u i + 1/u j + k with
end points A = (1, 1, 1) and B = (3, 1/3, 1).

∫_C a · dr = ϕ(B) − ϕ(A) = 2/3.
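A symbolic cross-check of this example (a sketch assuming sympy is installed; the variable names are only illustrative) confirms both that ∇ × a = 0 and that ϕ(B) − ϕ(A) = 2/3.

import sympy as sp
from sympy.vector import CoordSys3D, curl

N = CoordSys3D('N')
x, y, z = N.x, N.y, N.z

a = (x*y**2 + z)*N.i + (x**2*y + 2)*N.j + x*N.k
print(curl(a))                                   # 0, so a is conservative

phi = x**2*y**2/2 + x*z + 2*y                    # the potential found above
print(phi.subs({x: 3, y: sp.Rational(1, 3), z: 1})
        - phi.subs({x: 1, y: 1, z: 1}))          # 2/3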

7.3 Surface integrals


• Surfaces: A surface S can be represented by a vector function r that depends on
two parameters u and v (parametric representation)
r(u, v) = x(u, v) i + y(u, v) j + z(u, v) k
i) The vectors ∂r/∂u, ∂r/∂v are linearly independent and tangent to the surface S.
ii) A vector normal to the surface is:
n = ∂r/∂u × ∂r/∂v.
iii) Vector area element:
dS = (∂r/∂u × ∂r/∂v) du dv = n du dv.
iv) Scalar area element:
dS = |∂r/∂u × ∂r/∂v| du dv = |n| du dv.
dS represents a small area of the surface S.

v) The orientation of the surface S is determined by the sign of n.


vi) A surface S is orientable if the vector n can be determined everywhere by a
choice of sign.
vii) A surface is bounded if it can be contained within some sphere. A bounded
surface can have a boundary, ∂S, consisting of a smooth closed curve. A bounded
surface with no boundary is closed.

• Definition: The surface integral of a vector function a(r) over a smooth surface
S, defined by r(u, v) with orientation given by the normal n̂, is:

∫_S a(r) · dS = ∫_S a(r) · n̂ dS = ∫_{umin}^{umax} ∫_{vmin}^{vmax} a(r(u, v)) · (∂r/∂u × ∂r/∂v) du dv.

Example 8 Evaluate the integral I = ∫_S a · dS over the surface defined by

r(u, v) = 2 cos u i + 2 sin u j + v k, 0 ≤ u ≤ π, −1 ≤ v ≤ 2,

where a = z j + xy k. We need

∂r/∂u = −2 sin u i + 2 cos u j,   ∂r/∂v = k.

Hence
dS = (∂r/∂u × ∂r/∂v) du dv = (2 cos u i + 2 sin u j) du dv,

and a(r(u, v)) = v j + 4 sin u cos u k with a · dS = (2v sin u) du dv. It follows
that

I = 2 ∫_0^π du sin u ∫_{−1}^2 dv v = 6.
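As a sanity check, the final double integral can also be evaluated numerically. This is a small sketch assuming scipy is available.

import numpy as np
from scipy.integrate import dblquad

# integrate a . dS = 2 v sin(u) du dv over 0 <= u <= pi, -1 <= v <= 2
val, _ = dblquad(lambda v, u: 2*v*np.sin(u), 0, np.pi, -1, 2)
print(val)   # ~6.0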

Observation.

i) The integral depends on the orientation of the surface S since the sign of dS
depends on the orientation of S.
ii) If the surface is closed, by convention, the vector n points outwards from the
enclosed volume.
iii) In order to parametrise the surface it is often useful to use alternative coordinate
systems. For instance:

a) Cylindrical polar coordinates:

x = ρ cos ϕ,  y = ρ sin ϕ,  z = z,   ρ ≥ 0, 0 ≤ ϕ < 2π, −∞ < z < ∞

Figure 7.1: Cylindrical polar coordinates.

b) Spherical polar coordinates:

x = r sin θ cos ϕ,  y = r sin θ sin ϕ,  z = r cos θ,   r ≥ 0, 0 ≤ ϕ < 2π, 0 ≤ θ ≤ π

Figure 7.2: Spherical polar coordinates.

iv) Other kinds of integrals are possible. For instance:

∫_S ϕ dS,   ∫_S a × dS,   ∫_S ϕ dS,   ∫_S a dS.

Example 9 Evaluate the integral ∫_S ϕ dS where S is the surface of the hemi-
sphere x² + y² + z² = a² with z ≥ 0, a constant and ϕ = z². Using spherical
polar coordinates we have:

r(θ, ϕ) = a sin θ cos ϕ i + a sin θ sin ϕ j + a cos θ k and

∂r/∂θ = a cos θ cos ϕ i + a cos θ sin ϕ j − a sin θ k,   ∂r/∂ϕ = −a sin θ sin ϕ i + a sin θ cos ϕ j.

Hence
dS = (∂r/∂θ × ∂r/∂ϕ) dθ dϕ = a sin θ r dθ dϕ,   dS = |dS| = a² sin θ dθ dϕ,

and ϕ(r(θ, ϕ)) = a² cos² θ. It follows that

∫_S ϕ dS = 2π a⁴ ∫_0^{π/2} dθ cos² θ sin θ = (2/3) a⁴ π.

7.4 Volume integrals


The volume integral of a function ϕ(r) (or a(r)) over a volume V described by r(u, v, w) is:

∫_V ϕ(r) dV = ∫_{umin}^{umax} ∫_{vmin}^{vmax} ∫_{wmin}^{wmax} ϕ(r(u, v, w)) |∂r/∂u · (∂r/∂v × ∂r/∂w)| du dv dw.

Observation

i) The order of parametrisation does not matter.

ii) Given a parametrisation r(u, v, w)

dV = |∂r/∂u · (∂r/∂v × ∂r/∂w)| du dv dw = |∂(x, y, z)/∂(u, v, w)| du dv dw,

where the last expression highlights the 3-dimensional Jacobian.

a) For cylindrical polar coordinates: dV = ρ dρdϕdz.


b) For spherical polar coordinates: dV = r2 sin θ drdθdϕ.
Chapter 8

Theorems of integration

• Divergence theorem (Gauss' theorem):

∫∫∫_V (∇ · a) dV = ∫∫_S a · dS,

where V is a bounded volume with boundary ∂V = S, and S is a closed surface
with outward-pointing unit normal n̂ (dS = n̂ dS).

Example 1 Use the divergence theorem to show that Gauss's law of elec-
trostatics for a point-like charge q is equivalent to Maxwell's equation
∇ · E = ρ/ϵ0.
Consider the point-like charge at the origin. Then

E = q r / (4πϵ0 r³),   with q = ∫∫∫_V ρ dV,

where ρ is the charge density. On the other hand, Gauss' law states:
∫∫_S E · dS = q/ϵ0. It follows, by using the divergence theorem, that

∫∫_S E · dS = ∫∫∫_V (∇ · E) dV = (1/ϵ0) ∫∫∫_V ρ dV.

Hence the initial statement is proved.


Figure 8.1: Example 2.

Example 2 Take V to be the solid hemisphere x² + y² + z² ≤ a² with z ≥ 0,
where a is a constant, and take the vector function a = (z + a) k. Verify the divergence
theorem.

∫∫∫_V (∇ · a) dV = ∫∫_{S1} a · dS1 + ∫∫_{S2} a · dS2,

where S1 is the hemispherical cap and S2 is the disc in the plane z = 0.

For the integral on the left hand side:

∇ · a = 1  →  ∫∫∫_V (∇ · a) dV = (2/3) πa³.

For the integrals on the right hand side use spherical polar coordinates.
From the Example 9 in the previous chapter, we know that dS1 = a sin θ r dθdϕ.
Also

a(r1(θ, ϕ)) = (a cos θ + a) k,   a · dS1 = a³ (cos θ + 1) sin θ cos θ dθ dϕ,

hence

∫∫_{S1} a · dS1 = a³ 2π ∫_0^{π/2} (cos² θ sin θ + sin θ cos θ) dθ = (5/3) πa³.

For the second integral on the right hand side a suitable parametrisation is
r2(r, ϕ) = r cos ϕ i + r sin ϕ j. Hence

dS2 = (∂r2/∂ϕ × ∂r2/∂r) dr dϕ = −r k dr dϕ  →  ∫∫_{S2} a · dS2 = −a 2π ∫_0^a r dr = −πa³.

It follows that the divergence theorem reads: (2/3) πa³ = (5/3) πa³ − πa³.

Example 3 Derive Gauss's law for a general surface S. Then use the divergence
theorem to show that

∇² (1/r) = −4πδ(r),

where δ(r) is the 3-dimensional delta function

∫∫∫_V f(r) δ(r − a) dV = f(a) if a ∈ V, 0 otherwise.

Consider a point-like charge at the origin. Hence, applying the divergence
theorem

∫∫_S E · dS = (q/4πϵ0) ∫∫_S (r/r³) · dS = (q/4πϵ0) ∫∫∫_V ∇ · (r/r³) dV = 0 if r ≠ 0.

In fact, ∇ · (r/r3 ) = 0 for r ̸= 0, i.e. if the origin is not inside the surface S.
Then consider the volume between the surface S and a small sphere around
the origin S ′ . Because of the divergence theorem
∫∫_S (r/r³) · dS − ∫∫_{S′} (r/r³) · dS′ = 0.

We can easily calculate the second integral, since dS′ = r² r̂ sin θ dθ dϕ, hence

∫∫_{S′} (r/r³) · dS′ = 2π ∫_0^π sin θ dθ = 4π.

It follows that

∫∫_S E · dS = (q/4πϵ0) ∫∫_S (r/r³) · dS = (q/4πϵ0) 4π = q/ϵ0.

Since

∫∫_S (r/r³) · dS = 4π if the origin r = 0 lies inside S, and 0 otherwise,

we can write

∫∫∫_V ∇ · (r/r³) dV = − ∫∫∫_V ∇² (1/r) dV = ∫∫∫_V 4πδ(r) dV.

The final result follows.



• Green's theorem in a plane:

∮_C (P dx + Q dy) = ∫∫_R (∂Q/∂x − ∂P/∂y) dx dy,

where P(x, y) and Q(x, y) are two functions whose derivatives are continuous and
single-valued inside and on the boundary of a simply connected region R in the xy-
plane and C = ∂R is a closed, anticlockwise oriented curve.

Figure 8.2: Green Theorem.

Green’s theorem is also called the divergence theorem in two dimensions. In fact,
consider such a theorem in Cartesian coordinates for a vector function a. Then in
two dimensions it reads

∫∫_R (∇ · a) dx dy = ∫∫_R (∂ax/∂x + ∂ay/∂y) dx dy = ∮_{∂R} (a · n̂) ds = ∮_C (ax dy − ay dx),

since n̂ ds = dy i − dx j. Setting ax = Q and ay = −P Green's theorem is


recovered.

Example 4 Use Green's theorem to calculate the integral

I = ∮_C (y − sin x) dx + cos x dy,

where C is the boundary of the triangle with vertices (0, 0), (1, 0), (1, 2).
Set F = (P, Q) = (y − sin x, cos x), then

I = ∫∫_R (∂Q/∂x − ∂P/∂y) dx dy = − ∫∫_R (sin x + 1) dx dy,

where R is the region inside the triangle. Then

I = − ∫_0^1 dx ∫_0^{2x} dy (sin x + 1) = − ∫_0^1 dx (sin x + 1) 2x = 2 cos(1) − 2 sin(1) − 1.
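The double integral can be verified symbolically; the following sketch assumes sympy is available.

import sympy as sp

x, y = sp.symbols('x y')
# integrate -(sin x + 1) over the triangle 0 <= y <= 2x, 0 <= x <= 1
I = sp.integrate(-(sp.sin(x) + 1), (y, 0, 2*x), (x, 0, 1))
print(sp.simplify(I))   # -2*sin(1) + 2*cos(1) - 1
print(float(I))         # ~ -1.602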

Observation
i) Note that Green's theorem holds in multiply connected regions as well. In this
case the integrals must be calculated over all boundaries of the region, suitably
(positively) oriented.
ii) Green's theorem can be used to evaluate the area A of a region R. In fact, set
F = (P, Q), then choose P and Q such that

∂Q/∂x − ∂P/∂y = 1,

that is F = (−y/2, x/2) or F = (0, x) or F = (−y, 0). Then Green's theorem
implies

A = (1/2) ∮_C (x dy − y dx) = ∮_C x dy = − ∮_C y dx,

respectively, where C = ∂R.
iii) It can be shown that

∂Q/∂x = ∂P/∂y

is a necessary and sufficient condition for the field F = (P, Q) to be conservative.
In this case Green's theorem implies

∮_C (P dx + Q dy) = ∮_C F · dr = 0.

Example 5 Calculate the area of the ellipse {x = a cos ϕ, y = b sin ϕ}.

Using the formula

A = (1/2) ∮_C (x dy − y dx),

the area of the ellipse is:

A = (1/2) ∫_0^{2π} (ab cos ϕ cos ϕ + ab sin ϕ sin ϕ) dϕ = (ab/2) ∫_0^{2π} dϕ = πab.

• Stokes' theorem:

∫∫_S (∇ × a) · dS = ∮_C a · dr,

where S is a bounded smooth surface with boundary ∂S = C and C is a piecewise
smooth curve. C and S must have compatible orientations.
Compatible orientation: Imagine you are walking on the surface (side with the
normal pointing out). If you walk near the edge of the surface in the direction
corresponding to the orientation of C, then the surface must be to your left.

Figure 8.3: Example 6.

Example 6 Take a = xz j and S be the section of the cone x² + y² = z², with
a ≤ z ≤ b, b > a > 0. Verify Stokes' theorem. Hint: Use cylindrical polar
coordinates.

∫∫_S (∇ × a) · dS = ∫_{Ca} a · dra + ∫_{Cb} a · drb

On the left hand side:

A suitable parametrisation is: r(ρ, ϕ) = ρ cos ϕ i + ρ sin ϕ j + ρ k, hence

dS = (∂r/∂ρ × ∂r/∂ϕ) dρ dϕ = ρ(− cos ϕ i − sin ϕ j + k) dρ dϕ.

The field evaluated on the surface is: ∇ × a = ρ (− cos ϕ i + k). It follows
that

∫∫_S (∇ × a) · dS = ∫_0^{2π} dϕ ∫_a^b (ρ² cos² ϕ + ρ²) dρ = π(b³ − a³).

On the right hand side:


A suitable parametrisation for the circle with radius b is:

r(ϕ) = b cos ϕ i + b sin ϕ j + b k,   r′(ϕ) = −b sin ϕ i + b cos ϕ j,   a = b² cos ϕ j,

then

∫_{Cb} a · drb = b³ ∫_0^{2π} cos² ϕ dϕ = b³ π.

The circle with radius a has a clockwise parametrisation, hence

∫_{Ca} a · dra = −a³ ∫_0^{2π} cos² ϕ dϕ = −a³ π.

It follows that Stokes' theorem reads π(b³ − a³) = −π a³ + π b³.
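Both sides can be checked symbolically for arbitrary 0 < a < b. The snippet below is a sketch assuming sympy is available.

import sympy as sp

rho, phi, a, b = sp.symbols('rho phi a b', positive=True)

# left hand side: surface integral of (curl a) . dS over the cone section
lhs = sp.integrate(rho**2*(sp.cos(phi)**2 + 1), (rho, a, b), (phi, 0, 2*sp.pi))
# right hand side: the two boundary line integrals
rhs = sp.integrate(b**3*sp.cos(phi)**2, (phi, 0, 2*sp.pi)) \
      - sp.integrate(a**3*sp.cos(phi)**2, (phi, 0, 2*sp.pi))
print(sp.simplify(lhs - rhs))   # 0 : both sides equal pi*(b**3 - a**3)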


Chapter 9

Change of variables: orthogonal curvilinear coordinates

Given the position vector r expressed in cartesian coordinates x, y, z we can use a change
of variable to express this vector in terms of a new set of coordinates u, v, w
r(u, v, w) = x(u, v, w) i + y(u, v, w) j + z(u, v, w) k,
where x, y, z are continuous and differentiable functions.
The line element is:
dr = (∂r/∂u) du + (∂r/∂v) dv + (∂r/∂w) dw,
where the vectors ∂r/∂u, ∂r/∂v, ∂r/∂w are linearly independent. If these vectors are
orthogonal, then the coordinates u, v, w are said to be orthogonal curvilinear coordi-
nates.
Properties

• New basis:
∂r/∂u = hu êu,   ∂r/∂v = hv êv,   ∂r/∂w = hw êw,
where hu, hv, hw are positive and are called scale factors. In an orthogonal curvilinear coor-
dinate system these vectors are orthogonal and êu, êv, êw form an orthonormal basis
of the three-dimensional vector space R³.
• Line element:
dr = hu êu du + hv êv dv + hw êw dw.
The scale factors determine the changes in length along each orthogonal direction
resulting from changes in u, v, w.
• Arc length:
ds² = dr · dr = hu² (du)² + hv² (dv)² + hw² (dw)²


• Vector area element (surface of constant w, parametrised by u and v):

dS = (∂r/∂u × ∂r/∂v) du dv = hu hv êw du dv.

• Volume element:

dV = ∂r/∂u · (∂r/∂v × ∂r/∂w) du dv dw = hu hv hw du dv dw.

• Note that vector algebra is the same in orthogonal curvilinear coordinates as in
cartesian coordinates.

Example 1 Derive the scale factors, basis vectors and volume elements for

1) Cartesian coordinates.

r(x, y, z) = x i + y j + z k, hence

∂r/∂x = i,  ∂r/∂y = j,  ∂r/∂z = k  →  hx = hy = hz = 1, êx = i, êy = j, êz = k,
and dV = dxdydz.

2) Cylindrical polar coordinates.

r(ρ, ϕ, z) = ρ cos ϕ i + ρ sin ϕ j + z k, hence

∂r/∂ρ = cos ϕ i + sin ϕ j,  ∂r/∂ϕ = −ρ sin ϕ i + ρ cos ϕ j,  ∂r/∂z = k  →
hρ = 1, hϕ = ρ, hz = 1, êρ = cos ϕ i + sin ϕ j, êϕ = − sin ϕ i + cos ϕ j, êz = k
and dV = ρ dρdϕdz.

Gradient, divergence and curl in orthogonal curvilinear coordinates.


Consider a scalar function f (u, v, w). Then

df = (∂f/∂u) du + (∂f/∂v) dv + (∂f/∂w) dw = ∇f · dr.
In cartesian coordinates this becomes
 
∇f · dr = ((∂f/∂x) i + (∂f/∂y) j + (∂f/∂z) k) · (i dx + j dy + k dz).

On the other hand, in general curvilinear coordinates it is:

∇f · dr = ∇f · (hu êu du + hv êv dv + hw êw dw),

which implies

∇f = (êu/hu) ∂f/∂u + (êv/hv) ∂f/∂v + (êw/hw) ∂f/∂w.

This is the gradient of the function f in general curvilinear coordinates. It follows that
the del operator is:

∇ = (êu/hu) ∂/∂u + (êv/hv) ∂/∂v + (êw/hw) ∂/∂w.
Without derivation, we also have
• Divergence:

∇ · a = 1/(hu hv hw) [ ∂(hv hw au)/∂u + ∂(hw hu av)/∂v + ∂(hu hv aw)/∂w ],

where a = au êu + av êv + aw êw.

• Curl:

∇ × a = 1/(hu hv hw) | hu êu   hv êv   hw êw |
                      | ∂/∂u    ∂/∂v    ∂/∂w  |
                      | hu au   hv av   hw aw |

• Laplacian:

∇²ϕ = 1/(hu hv hw) [ ∂/∂u (hv hw/hu ∂ϕ/∂u) + ∂/∂v (hw hu/hv ∂ϕ/∂v) + ∂/∂w (hu hv/hw ∂ϕ/∂w) ].

Example 2 Find the position vector r in cylindrical polar coordinates and verify that
∇ · r = 3.
From Example 1, we have the unit vectors for the cylindrical polar coordinates. By
inverting those relations we obtain:

i = cos ϕ êρ − sin ϕ êϕ , j = sin ϕ êρ + cos ϕ êϕ , k = êz .

Then

r = ρ cos ϕ(cos ϕ êρ − sin ϕ êϕ ) + ρ sin ϕ(sin ϕ êρ + cos ϕ êϕ ) + z êz = ρ êρ + z êz

and

∇ · r = (1/ρ) [ ∂(ρ²)/∂ρ + ∂(ρ z)/∂z ] = 3.

Example 3 A rigid body is rotating about a fixed axis with a constant angular velocity
ω. Take ω to lie along the z-axis. Use cylindrical polar coordinates to compute

1) v = ω × r.

The position vector has been found in Example 2. Then ω = ω êz . Then

v = | êρ   êϕ   êz |
    | 0    0    ω  | = ωρ êϕ.
    | ρ    0    z  |

2) ∇ × v.
∇ × v = (1/ρ) | êρ     ρ êϕ    êz   |
              | ∂/∂ρ   ∂/∂ϕ    ∂/∂z | = 2ω êz = 2 ω.
              | 0      ωρ²     0    |

Summary of common orthogonal curvilinear coordinates:


• Cylindrical polar coordinates:
r(ρ, ϕ, z) = ρ cos ϕ i + ρ sin ϕ j + z k, ρ ≥ 0, 0 ≤ ϕ < 2π, −∞ < z < ∞
hρ = 1, êρ = cos ϕ i + sin ϕ j,
i) hϕ = ρ, êϕ = − sin ϕ i + cos ϕ j,
hz = 1, êz = k.

 êρ ρ dϕdz (ρ = const)
ii) dS = êϕ dρdz (ϕ = const)
êz ρ dρdϕ (z = const).

iii) dV = ρ dρdϕdz.
• Spherical polar coordinates:
r(r, θ, ϕ) = r sin θ cos ϕ i+r sin θ sin ϕ j+r cos θ k, r ≥ 0, 0 ≤ ϕ < 2π, 0≤θ≤π
hr = 1, êr = sin θ cos ϕ i + sin θ sin ϕ j + cos θ k,
i) hθ = r, êθ = cos θ cos ϕ i + cos θ sin ϕ j − sin θ k,
hϕ = r sin θ, êϕ = − sin ϕ i + cos ϕ j.

 êr r2 sin θ dθdϕ (r = const)
ii) dS = êθ r sin θ drdϕ (θ = const)
êϕ r drdθ (ϕ = const).

iii) dV = r2 sin θ drdθdϕ.


Chapter 10

Introduction to probability

10.1 The basics

Probability is the foundation and language of statistics. It provides a language to express


our degree of belief or certainty about events and a logical framework for quantifying
randomness.

• Probability can often be counterintuitive.

• Probability is necessary for understanding quantum physics and it is the foundation
of statistical mechanics.

• Probability plays an essential role in studying the performance of algorithms in machine
learning and artificial intelligence. Biology, finance, medicine, social science, meteo-
rology, etc. make use of it.

Terminology & Notation: consider an experiment

• Trial: A single performance of an experiment.

• Outcome (s): Each single result of the experiment.

• Sample Space (S): The set of all outcomes.

• Event (A): A subset of the sample space S.


Example 1 Experiment: a six-sided dice is thrown.

1. S = {1, 2, 3, 4, 5, 6} sample space

2. s = 2, s∈S outcome

3. A = {2, 4, 6}, A⊂S event

Operations and relationships between events


Because we are working with sets, we can use the language of sets.

• A or B → A ∪ B, Union

• A and B → A ∩ B, Intersection

• not A → Ā, the complement of A

• A implies B → A ⊆ B

• A and B are mutually exclusive (disjoint) → A ∩ B = ∅

• A1, A2, . . . , An are a partition of S → A1 ∪ A2 ∪ . . . ∪ An = S, Ai ∩ Aj = ∅ for
i ̸= j

Probability: what it is and its properties


Probability is a real number between 0 and 1 associated with each event A ⊆ S and it is
denoted as P(A). P must satisfy the following axioms:

i) P(S) = 1, P(∅) = 0

ii) If Ai ∩ Aj = ∅ for all i ̸= j (disjoint events), then

P(∪_{j=1}^∞ Aj) = Σ_{j=1}^∞ P(Aj)   Addition or the OR rule.

Note that if A1, A2, . . . , An is a partition of an event A, then

P(A) = P(A1) + P(A2) + · · · + P(An).

• P(Ā) = 1 − P(A)   Complement rule.



• If A1 and A2 are not disjoint, then

P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2)   Addition or the OR rule.

In general

P(∪_{j=1}^n Aj) = Σ_j P(Aj) − Σ_{i<j} P(Ai ∩ Aj) + Σ_{i<j<k} P(Ai ∩ Aj ∩ Ak) − . . . + (−1)^{n+1} P(A1 ∩ · · · ∩ An).

It follows that if two events A1 and A2 are disjoint, then P(A1 ∩ A2) = 0.
It follows that if two evens A1 and A2 are disjoint, then P (A1 ∩ A2 ) = 0.

• Two events are independent if

P(A ∩ B) = P(A) · P(B)   Multiplication or the AND rule

• More generally, the probability of a sequence of independent events is obtained by
multiplying the probabilities of the single events. Note that independence is different
from disjointness.

Definition: Frequentist interpretation of probability

Let A be an event for an experiment with a finite sample space S with equally likely outcomes,
then

P(A) = (number of outcomes favourable to A) / (number of outcomes in S).

This definition is based on relative frequency over a large number of repeated trials.

Suppose you want to know the long-run relative frequency of getting heads from a fair coin.
For each trial you will get an H or a T. We perform n trials and count the number of times
H appears. Then we estimate the probability of H, P(H), by the relative frequency. From
figure 10.1, we can see that the estimate approaches 1/2.
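A simple simulation (a sketch assuming numpy is available; the seed and number of trials are arbitrary choices) reproduces this behaviour:

import numpy as np

rng = np.random.default_rng(0)
flips = rng.integers(0, 2, size=10000)                   # 1 = heads, 0 = tails
freq = np.cumsum(flips) / np.arange(1, flips.size + 1)   # running relative frequency
print(freq[[9, 99, 999, 9999]])                          # estimates after 10, 100, 1000, 10000 trials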
[Plot: probability of getting heads vs number of trials (log scale); several simulated sequences approach the true probability 0.5.]
Figure 10.1: Long-run relative frequency.

Example 2 Find the probability of drawing from a pack of card one that has at least
one of the following properties (Note that the total number of cards is 52. There are
2 black suits - clubs, spades, and 2 red suits - hearts, diamonds. There are 13 cards
in each suit. Each card in a single suit has a different rank. See figure 10.2):

A the card is an ace

B the card has a black suit

C the card is a diamond

P(A) = 4/52,   P(B) = 26/52,   P(C) = 13/52.

We are interested in the following event: A ∪ B ∪ C.

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).

Since

P(A ∩ B) = 2/52,   P(A ∩ C) = 1/52,   P(B ∩ C) = 0,   P(A ∩ B ∩ C) = 0,

we have

P(A ∪ B ∪ C) = 40/52 ≈ 0.77 = 77%.

Figure 10.2: Full deck of cards.

10.2 Counting the number of outcomes in an event


• Sampling with replacement. Consider n objects and consider making k choices
from them one at a time with replacement. The objects are distinguishable, hence
the order matters. Then the number of outcomes is:
n^k.

• Sampling without replacement, order matters. Consider n objects and
consider making k choices from them one at a time without replacement. The order
matters. Then the number of outcomes is:

n(n − 1)(n − 2) . . . (n − k + 1) = n!/(n − k)!   Permutation of size k on n.

• Sampling without replacement, order does not matter. Consider n
objects and consider making k choices from them one at a time without replacement.
The order does not matter. Then the number of outcomes is:

C(n, k) = n(n − 1)(n − 2) . . . (n − k + 1)/k! = n!/((n − k)! k!)   The binomial coefficient.

Note the division by k! with respect to the previous case: we must divide out the
permutations of the k chosen objects because their order no longer matters. A short
numerical illustration of the three counting rules is given below.
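The promised illustration, for n = 5 objects and k = 3 choices; this is plain Python, and math.perm and math.comb require Python 3.8 or later.

import math

n, k = 5, 3
print(n**k)              # 125 : with replacement, order matters
print(math.perm(n, k))   # 60  : without replacement, order matters
print(math.comb(n, k))   # 10  : without replacement, order does not matter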

Example 3: Birthday problem


There are k people in a room. Find the probability that at least 2 have the same
Birthday (consider a year with 365 days). It is easier to approach the problem by
counting the complement, i.e. the number of ways to assign Birthdays to k people
such that no 2 people share a Birthday (sampling without replacement in which
the order matters). Hence, the probability of no Birthday matches in a group of k
people is:

P(no Birthday matches) = 365 · 364 · 363 . . . (365 − k + 1) / 365^k.

Note at the denominator the use of sampling with replacement for counting the
number of ways to assign a Birthday to the people in the room. Finally, the probability
that at least one Birthday matches is:

P(at least one Birthday matches) = 1 − 365 · 364 · 363 . . . (365 − k + 1) / 365^k.

Note that at k = 23 the probability already exceeds 1/2 while at k = 57 the probability
exceeds 99%.
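These values can be reproduced directly from the formula above; the helper function p_match below is just an illustrative name.

import math

def p_match(k, days=365):
    # probability that at least two of k people share a birthday
    p_no_match = math.perm(days, k) / days**k
    return 1 - p_no_match

print(p_match(23))   # ~0.507, already above 1/2
print(p_match(57))   # ~0.990, above 99%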
[Plot: probability of a Birthday match against the number of people k, with the points k = 23 and k = 57 marked.]
Figure 10.3: Example 3: The Birthday problem

Example 4 Consider a college with n students. Count the number of ways to choose
a president, a vice president and a treasurer. This is a situation of sampling without
replacements in which the order matters. We assume that the 3 roles are distinct.
Then, the number of ways is:

n(n − 1)(n − 2).

Now, let us suppose that we want to choose the 3 candidates without predeter-
mined titles. In this case we need to consider overcounting and make adjustments
accordingly. Hence, the number of ways becomes:

C(n, 3) = n(n − 1)(n − 2)/3!

Let us now suppose that we want to create a committee of 5 people out of the
remaining number of students. How many possibilities are there? Answer:

C(n, 3) · C(n − 3, 5).

Example 5: Full house in poker


Consider a 5-card hand dealt from a standard 52-card deck. The hand is called a full
house in poker if it consists of 3 cards of the same rank and 2 cards of another rank
(e.g. 3 sevens and 2 tens). What is the probability of a full house?
The order of the cards in the hand does not matter. Hence the number of combina-
tions of 5 cards out of 52 is:

C(52, 5).

Since there are 4 suits, the number of ways in which we can choose 3 cards of the
same rank is:

C(4, 3).

Similarly, for 2 cards of the same rank it is:

C(4, 2).

Since there are 13 choices for the rank we have 3 of and 12 choices for the rank we
have 2 of, the probability is:

P(full house) = 13 · C(4, 3) · 12 · C(4, 2) / C(52, 5) ≈ 0.00144.
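The same number can be obtained in one line with Python's math.comb:

import math

p = 13 * math.comb(4, 3) * 12 * math.comb(4, 2) / math.comb(52, 5)
print(p)   # ~0.00144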

10.3 Conditional probability

All probabilities are conditional. There is always background knowledge built into every
probability. Previously we said that probability allows us to express our belief or uncer-
tainty about events. The conditional probability allows us to answer the question of how
to update our beliefs in light of new evidence.
Definition: Conditional probability
If A and B are events, then the conditional probability of A given B, denoted P (A|B) is
defined as
P(A|B) = P(A ∩ B) / P(B).

- A: the event whose uncertainty we want to update.

- B: the new evidence we observed.



- P (A): the prior probability of A.

- P (A|B): the posterior probability of A.

Independence of events
If the events A and B are independent, then

P (A|B) = P (A), P (B|A) = P (B).

Law of total probability


Let A1 , A2 , . . . , An be a partition of the sample space S and B an event in S. Then

B = (B ∩ A1 ) ∪ (B ∩ A2 ) ∪ · · · ∪ (B ∩ An ) −→
P (B) = P (B ∩ A1 ) + P (B ∩ A2 ) + · · · + P (B ∩ An )
= P (B|A1 )P (A1 ) + P (B|A2 )P (A2 ) + · · · + P (B|An )P (An ).

Figure 10.4: Total probability: partition of the sample space S.

This tells us that to get the unconditional probability of B, we can divide the sample space
into disjoint slices Ai , find the conditional probability of B within each of the slices, then
take a weighted sum of the conditional probabilities where the weights are the probabilities
P (Ai ).

Example 6 Let us suppose that we have 3 containers with some black and red balls
as in the figure 10.5. Let us choose a ball. What is the probability that the ball is
red?

Let us define 4 events.

- B1 : getting a ball from the first container.

- B2 : getting a ball from the second container.

- B3 : getting a ball from the third container.

- A: getting a red ball.

Then
P (A) = P (A|B1 )P (B1 ) + P (A|B2 )P (B2 ) + P (A|B3 )P (B3 ).
Since

P(B1) = P(B2) = P(B3) = 1/3,   P(A|B1) = 2/3,   P(A|B2) = 3/4,   P(A|B3) = 1/2,

then

P(A) = (2/3 + 3/4 + 1/2) · 1/3 = 23/36 ≈ 0.639.
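A quick check with exact fractions (plain Python standard library):

from fractions import Fraction as F

p = (F(2, 3) + F(3, 4) + F(1, 2)) * F(1, 3)
print(p, float(p))   # 23/36  ~0.639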

Figure 10.5: Example 6: containers

Bayes’ rule. Take the formula of conditional probability and rearrange it, that is

P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)  =⇒  P(A|B) = P(B|A)P(A) / P(B).

Bayes' rule relates P(A|B) to P(B|A), which are different.

- P (A|B): the posterior.

- P (A): the prior.

- P (B): the evidence.



- P (B|A): the likelihood.

Suppose A is just an event in a collection of events which form a partition of the sample
space S (Ai with i=1,2,. . . , n) and relabel it Aj . Then

P(Aj|B) = P(B|Aj)P(Aj) / P(B) = P(B|Aj)P(Aj) / Σ_{i=1}^n P(B|Ai)P(Ai).

This is still Bayes’ rule in a particularly useful form. Note that Bayes’ rule lies at the core
of Bayesian inference, which is heavily used in data analysis.

Example 7a Suppose a screening test for doping in sport is claimed to be 95% accu-
rate, meaning that 95% of dopers, and 95% of non-dopers, will be correctly classified.
Assume 1 in 50 (2%) athletes are truly doping at anytime. If an athlete tests positive,
what is the probability that they are truly doping? Let us use Bayes’ rule.

- A: the athlete is doping.

- B: the test is positive.

- partition: A ∩ Ā = ∅.

Hence

P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|Ā)P(Ā)].
Since

- P (B|A) = 0.95 accuracy of the test.

- P (B|Ā) = 0.05.

- P (A) = 0.02, P (Ā) = 0.98.

Then

P(A|B) = 0.95 · 0.02 / (0.95 · 0.02 + 0.05 · 0.98) ≃ 28%.
Note that P (A|B) = 28% ̸= P (B|A) = 95%.
Footnote (a): From The Art of Statistics, D. Spiegelhalter.

Suppose that the athlete who tests positive is forced to take another test. Assume
the two tests are independent. We want to update our probability based on this
new piece of information.

- A: the athlete is doping.

- B1 : the first test is positive.

- B2 : the second test is positive.

Then

P(A|B1 ∩ B2) = P(B1 ∩ B2|A)P(A) / [P(B1 ∩ B2|A)P(A) + P(B1 ∩ B2|Ā)P(Ā)]
             = (0.95)² · 0.02 / [(0.95)² · 0.02 + (0.05)² · 0.98] ≃ 88%.
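Both updates can be reproduced with a few lines of plain Python; the variable names (prior, sens, fpr) are illustrative only.

prior = 0.02                      # P(doping)
sens = 0.95                       # P(positive | doping)
fpr = 0.05                        # P(positive | not doping)

post1 = sens*prior / (sens*prior + fpr*(1 - prior))
print(post1)                      # ~0.279, i.e. about 28%

post2 = sens**2*prior / (sens**2*prior + fpr**2*(1 - prior))
print(post2)                      # ~0.881, i.e. about 88%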

Note that, looking at the frequency tree in figure 10.6 we can infer

P(an athlete tests positive) = (19 + 49)/1000 ≈ 6.8%,

P(truly doping | tests positive) = 19/(19 + 49) ≈ 28%.

Figure 10.6: Example 7: Frequency tree with a sample of 1000 athletes.



Definition: Odds The odds of an event are:

odds(A) = P(A) / P(Ā)  =⇒  P(A) = odds(A) / (1 + odds(A)).

Alternative form of Bayes’ rule. Take Bayes’ rule and divide it by P (Ā|B). Then

P(A|B) / P(Ā|B) = [P(B|A) / P(B|Ā)] · [P(A) / P(Ā)]

posterior odds = likelihood ratio · prior odds.

Consider Example 7. Then, the likelihood ratio is:

P(B|A) / P(B|Ā) = 0.95 / 0.05 = 19.

Hence the positive result is 19 times more likely to happen if the athlete is doping.
