
MA271 - Mathematical Analysis 3

Instructor: Doctor Vedran Sohinger


Lecture notes written by Doctor Mario Micallef and Professor Jose Rodrigo.

January 14, 2025


Contents

Contents i

1 Introduction 2
1.1 Review of limits of sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Review of continuity and differentiability . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Review of integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Sequences and Series of Functions 7


2.1 Pointwise convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Uniform convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Series of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 A continuous, nowhere differentiable function (THE PROOFS IN THIS SECTION ARE
NOT EXAMINABLE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3 Basic results about Rn 18


3.1 Notation in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2 The Euclidean norm and inner product . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3 Convergence in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.4 Subsequences and the Bolzano-Weierstrass theorem . . . . . . . . . . . . . . . . . . . . . 22
3.5 Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.5.1 Definitions of continuity and continuous limit . . . . . . . . . . . . . . . . . . . . 23
3.5.2 Separate continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.5.3 Basic properties of continuous functions . . . . . . . . . . . . . . . . . . . . . . . 24
3.5.4 Constructing continuous functions of several variables from continuous real valued
functions of a single real variable (NOT EXAMINABLE). . . . . . . . . . . . . . . 25
3.5.5 Caution with taking limits in dimension > 2 . . . . . . . . . . . . . . . . . . . . . 26

4 Rudiments of topology of Rn and Continuity 29


4.1 Closed and open subsets of Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2 Continuity and topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.1 Continuity in terms of open sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.2 Continuity and sequential compactness . . . . . . . . . . . . . . . . . . . . . . . . 32

5 The space of linear maps and matrices 34


5.1 Two norms on the space of linear maps and matrices . . . . . . . . . . . . . . . . . . . . 35
5.1.1 Comparison of the two norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.1.2 Properties of the operator norm . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.2 Convergence and continuity in L (Rn , Rk ) . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.2.1 Continuity of functions involving matrices or linear maps (NOT COVERED IN
CLASS AND NOT EXAMINABLE) . . . . . . . . . . . . . . . . . . . . . . . . . 37


6 The Derivative 39
6.1 Directional derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6.1.1 Directional derivative and continuity . . . . . . . . . . . . . . . . . . . . . . . . . 40
6.2 The (Fréchet) Derivative as an affine linear approximation . . . . . . . . . . . . . . . . . 40
6.2.1 Affine linear approximation in the 1-variable case . . . . . . . . . . . . . . . . . . 40
6.2.2 The (Fréchet) Derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
6.2.3 Differentiability of components of vector-valued functions . . . . . . . . . . . . . . 41
6.2.4 Relation between the derivative and directional derivative . . . . . . . . . . . . . . 42
6.3 Partial derivatives, gradient and Jacobian matrix . . . . . . . . . . . . . . . . . . . . . . . 42
6.3.1 Algebraic rules for partial derivatives. . . . . . . . . . . . . . . . . . . . . . . . . 43
6.3.2 Gradient and Jacobian matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
6.3.3 Why so many different notations for the same thing?! . . . . . . . . . . . . . . . . 44
6.4 Geometric approximation and approximation of functions (NOT COVERED IN CLASS AND
NOT EXAMINABLE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
6.4.1 Tangent to a curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
6.4.2 Tangent plane of a surface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
6.4.3 Graph of a scalar function of 2 variables . . . . . . . . . . . . . . . . . . . . . . . 46
6.4.4 Orders of approximation of a function . . . . . . . . . . . . . . . . . . . . . . . . 46
6.5 Examples of direct calculation of the derivative from its definition (NOT COVERED IN
CLASS AND NOT EXAMINABLE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6.5.1 Differentiation of matrix-valued functions . . . . . . . . . . . . . . . . . . . . . . 47
6.6 The Chain Rule (NOT COVERED IN CLASS AND NOT EXAMINABLE) . . . . . . . . . 48
6.6.1 Jacobian form of chain rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.6.2 Calculating with the chain rule and gradient . . . . . . . . . . . . . . . . . . . . . 50
6.6.3 Another proof of Proposition 6.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.6.4 Application of chain rule to the verification of a PDE satisfied by a function . . . . 51
6.7 Continuity of partial derivatives implies differentiability (NOT COVERED IN CLASS AND
NOT EXAMINABLE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.7.1 The space of continuously differentiable functions . . . . . . . . . . . . . . . . . . 53

7 Complex Analysis 54
7.1 Review of basic facts about C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
7.2 Power Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7.2.1 The exponential and the circular functions . . . . . . . . . . . . . . . . . . . . . . 62
7.2.2 Argument and Log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
7.3 Complex integration, contour integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.3.1 Links with Green’s and Gauss’ Theorems . . . . . . . . . . . . . . . . . . . . . . . 70
7.4 Additional material (NOT COVERED AND NON-EXAMINABLE) . . . . . . . . . . . . . 75
7.4.1 Consequences of Cauchy’s Theorem (NOT COVERED AND NOT EXAMINABLE) 77
7.4.2 Applications of Cauchy’s formula to evaluate integrals in R (NOT COVERED AND
NOT EXAMINABLE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

Bibliography 87

ANALYSIS III 1
Chapter 1

Introduction

MA271, Mathematical Analysis 3 is a self-contained module. Only material lectured in class will be
examined. Non-examinable material will be clearly marked.
The course covers the following topics:

• Pointwise and uniform convergence (sequences and series of functions).

• Differentiation.

• Complex valued functions.

The module does not follow any specific source. The references at the end of the notes cover most
of the topics in the module. I would be happy to supply a list of references that can be used to expand
any of the chapters in the notes.
The course relies heavily on material covered in first-year analysis. We recall several of the main notions
that we will use.

• The triangle inequality

• Sequences and convergence

• Subsequences and The Bolzano-Weierstrass Theorem

• Cauchy sequences

• Summation

• Basic properties of power series

• The continuity of power series

• The derivative

• The differentiability of power series

• The radius of convergence formula.

• The Riemann integral, construction and basic properties

• Uniform continuity.

Below is a brief summary of the main results.

CHAPTER 1. INTRODUCTION

1.1 Review of limits of sequences


In first-year analysis you have studied the convergence of sequences of real numbers (Chapter 2). Here is
a quick recap.

Definition 1.1. A sequence (an ) converges to a limit l ∈ R if for every ε > 0 there exists N ∈ N such
that
|an − l| < ε for every n > N.
In this case we write an → l as n → ∞.

The following results were covered in first-year analysis.

• Uniqueness of limits: A sequence can have at most one limit.

• The shift rule: For any fixed k ∈ N, a_n → l as n → ∞ if and only if a_{n+k} → l as n → ∞. (This
effectively means that for the question of convergence, we can disregard the first k terms of the
sequence, for any fixed k we like.)

• Convergent sequences are bounded: Any convergent sequence is bounded.

• If an → a then |an | → |a|.

• The basic algebra of limits: Suppose an → a and bn → b. Then

(i) an + bn → a + b;
(ii) an bn → ab;
(iii) if b ≠ 0 then a_n /b_n → a/b.

• Limits and inequalities: If an ≤ bn for all n, an → a and bn → b then a ≤ b.

• Sandwich rule: an ≤ bn ≤ cn with an → l and cn → l implies that bn → l.

1.2 Review of continuity and differentiability


We review the notion of continuity (Chapter 5 in MA141) and uniform continuity (covered in Chapter 10
in MA141, and page 71 in MA139). Let Ω ⊂ R.

Definition 1.2. Given f : Ω → R, we say that f is continuous at x ∈ Ω if for every ε > 0 there exists
δ = δ(x, ε) > 0 such that

y ∈ Ω and |x − y| < δ =⇒ |f (y) − f (x)| < ε. (1.1)

The key point to note from the definition above is that given a function f , ε > 0 and a point x there
exists δ, but δ can depend on ε and x (and of course f ).

Definition 1.3. Given f : Ω → R, we say that f is uniformly continuous if for every ε > 0 there exists
δ = δ(ε) > 0 such that
x, y ∈ Ω and |x − y| < δ =⇒ |f (y) − f (x)| < ε. (1.2)

The key point here is that δ can be chosen independently of x.


In the case in which Ω = [a, b] we have the following result.

Theorem 1.4. Let f : [a, b] → R be a continuous function. Then it is uniformly continuous.


Before we prove the result let’s consider a couple of examples in which the closed, bounded interval
[a, b] is replaced by an unbounded or an open domain.
Consider f(x) = e^x, defined on R. Clearly this is a continuous function, but it is not uniformly continuous.
Indeed, since f grows faster and faster for larger x, it is possible to find arbitrarily small intervals on which
f changes by at least ε. This example shows that the result of Theorem 1.4 is not necessarily true for
unbounded domains.
We can also consider g(x) = 1/x on (0, 1). Just as in the previous example, the function g grows to
infinity faster and faster as we approach the origin, making it impossible to find a δ independent of
x that satisfies (1.2).
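Both counterexamples can be probed numerically. The following Python sketch is an added illustration; the witness points x_n = n, y_n = n + 1/n (for e^x) and x_n = 1/n, y_n = 1/(2n) (for 1/x) are choices made here, not taken from the notes. In each case the points get arbitrarily close while their images stay far apart, so no single δ can serve every x:

```python
import math

# f(x) = e^x: the points x_n = n and y_n = n + 1/n satisfy |x_n - y_n| = 1/n -> 0,
# yet |f(y_n) - f(x_n)| = e^n (e^{1/n} - 1) grows without bound.
def gap_exp(n):
    x, y = n, n + 1 / n
    return math.exp(y) - math.exp(x)

# g(x) = 1/x on (0, 1): with x_n = 1/n and y_n = 1/(2n) we have
# |x_n - y_n| = 1/(2n) -> 0 while |g(y_n) - g(x_n)| = n -> infinity.
def gap_inv(n):
    x, y = 1 / n, 1 / (2 * n)
    return 1 / y - 1 / x

assert gap_exp(10) > gap_exp(5) > gap_exp(1) > 1   # the gap only grows
assert abs(gap_inv(100) - 100) < 1e-6              # |g(y_n) - g(x_n)| = n exactly
```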

Proof of Theorem 1.4. We will argue by contradiction. If f were not uniformly continuous, there would
exist ε > 0 and sequences x_n, y_n in [a, b] such that |x_n − y_n| ≤ 1/n but |f(x_n) − f(y_n)| > ε.
The sequences {x_n} and {y_n} are bounded, as they lie in [a, b], and therefore we can apply the Bolzano–
Weierstrass theorem to obtain convergent subsequences {x_{n_k}}, converging to some x, and {y_{n_k}}, converging to some y. Notice that

|x − y_{n_k}| ≤ |x − x_{n_k}| + |x_{n_k} − y_{n_k}| ≤ |x − x_{n_k}| + 1/n_k → 0 as k → ∞,

which implies that x = y. However, we know that |f(x_{n_k}) − f(y_{n_k})| > ε for all k. Since f is continuous,
taking limits as k goes to infinity we obtain 0 = |f(x) − f(x)| ≥ ε, which is a contradiction.

For completeness we reproduce the definition of derivative.

Definition 1.5. Suppose f : I → R is defined on an open interval I and c ∈ I. We say that f is
differentiable at c if

lim_{h→0} [f(c + h) − f(c)] / h

exists. If so, we call the limit f′(c).

One can show the following result.

Lemma 1.6. Suppose I is an open interval, f : I → R and c ∈ I. Then f is differentiable at c if and only
if there exists a number A and a function ε with the properties that for all x

f(x) − f(c) = A(x − c) + ε(x)(x − c),

ε(c) = 0 and ε is continuous at c (i.e. ε(x) → 0 as x → c). If that happens, A = f′(c).

1.3 Review of integration


To construct the integral on an interval [a, b] we consider partitions of the interval.
In practice, a partition of the interval [a, b] is determined by a collection of points {x_i}_{i=0}^n, for some n,
such that

a = x_0 < x_1 < . . . < x_{n−1} < x_n = b,

which yields the collection of intervals I_j = [x_{j−1}, x_j], for j = 1, . . . , n. Given a partition P =
{I_1, . . . , I_n} of I = [a, b] we denote

M = sup_I f,   m = inf_I f,   M_k = sup_{I_k} f,   m_k = inf_{I_k} f.

Definition 1.7. Given f : [a, b] → R and a partition P = {I_1, . . . , I_n} of [a, b] we define the upper
Riemann sum of f with respect to P as

U(f, P) := Σ_{k=1}^{n} M_k |I_k|,


and the lower Riemann sum of f with respect to P as

L(f, P) := Σ_{k=1}^{n} m_k |I_k|.

Figure 1.1 shows the intuitive idea behind the integral, displaying the lower and upper Riemann
sums for a uniform partition with 10 intervals (for f(x) = x²).

Figure 1.1: Lower (left) and Upper (right) Riemann sums of f
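The sums of Definition 1.7 are easy to compute directly. The following Python sketch is an added illustration of the situation in Figure 1.1: f(x) = x² on [0, 1] with a uniform partition into 10 intervals. Since f is increasing there, the infimum on each subinterval sits at the left endpoint and the supremum at the right one (the sketch assumes an increasing f; for a general f one would take a genuine sup and inf):

```python
# Lower and upper Riemann sums of an increasing function f on [a, b]
# for the uniform partition into n subintervals.
def riemann_sums(f, a, b, n):
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    lower = upper = 0.0
    for j in range(n):
        width = xs[j + 1] - xs[j]
        lower += f(xs[j]) * width       # infimum at the left endpoint
        upper += f(xs[j + 1]) * width   # supremum at the right endpoint
    return lower, upper

L10, U10 = riemann_sums(lambda x: x * x, 0.0, 1.0, 10)
# L10 is approximately 0.285 and U10 approximately 0.385; they bracket
# the true integral 1/3, and the gap U10 - L10 shrinks as n grows.
```

Refining the partition squeezes U(f, P) − L(f, P) towards zero, which is exactly the integrability criterion of Theorem 1.10 below.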

We will denote by P the set of all partitions of [a, b].


Definition 1.8. Given f : [a, b] → R, bounded, we define the upper Riemann integral of f by

U(f) := inf_{P ∈ P} U(f, P).

We define the lower Riemann integral of f by

L(f) := sup_{P ∈ P} L(f, P).

Definition 1.9. Given f : [a, b] → R bounded, we say that it is Riemann integrable if and only if L(f) =
U(f), and define its Riemann integral, denoted by ∫_a^b f(x)dx or ∫_a^b f, by

∫_a^b f(x)dx := L(f) = U(f).

The following result will also prove useful in showing that a function is integrable.
Theorem 1.10. Let f : [a, b] → R be a bounded function. Then f is integrable if and only if for every
ε > 0 there exists a partition P of [a, b] such that

U(f, P) − L(f, P) < ε.

Theorem 1.11. Let f : [a, b] → R be a continuous function. Then it is Riemann integrable.


Theorem 1.12. Let f, g : [a, b] → R be Riemann integrable functions, and c ∈ R. Then f + g and cf
are Riemann integrable and we have

∫_a^b cf = c ∫_a^b f,   ∫_a^b (f + g) = ∫_a^b f + ∫_a^b g.

Theorem 1.13. Let f, g : [a, b] → R be Riemann integrable functions such that f ≤ g. Then

∫_a^b f ≤ ∫_a^b g.


Theorem 1.14. Let f : [a, b] → R be an integrable function. Then |f| is integrable and we have

| ∫_a^b f | ≤ ∫_a^b |f|.

The Fundamental Theorem of Calculus explores the relationship between integration and differentiation,
and how, under suitable conditions, they can be understood as inverse operations. The first result we
consider describes when the integral of a derivative recovers the original function.

Theorem 1.15. Let F : [a, b] → R be a continuous function that is differentiable on (a, b) with F′ = f.
Assume that f : [a, b] → R is an integrable function. Then

∫_a^b f(x)dx = F(b) − F(a).

Theorem 1.16. Let f : [a, b] → R be an integrable function and define the function F : [a, b] → R by

F(x) := ∫_a^x f(t)dt.

Then F is continuous on [a, b]. Additionally, if f is continuous at c ∈ [a, b] then F′(c) = f(c), with the
derivatives at a and b understood as one-sided derivatives.

Chapter 2

Sequences and Series of Functions

In this chapter we will consider sequences and series of functions, focusing on pointwise and
uniform convergence and their interaction with continuity, integrability and differentiability.

2.1 Pointwise convergence


We will consider sequences of functions f_n : Ω → R on a fixed domain Ω. Here we make no
assumptions about Ω (e.g. about its being open or closed, bounded or unbounded). While most examples
will be in one dimension, unless otherwise noted they apply in higher dimensions. We start by defining
pointwise convergence.
Definition 2.1. Let (f_n)_{n=1}^∞ be a sequence of functions, with f_n : Ω → R. We say that (f_n) (or f_n)
converges pointwise to f : Ω → R if and only if for every x ∈ Ω we have lim_{n→∞} f_n(x) = f(x). We will
denote pointwise convergence by f_n → f.
Example 2.2. Consider the sequence (f_n) given by f_n : [0, 1] → R, f_n(x) = x^{1/n}.

Figure 2.1: The sequence f_n for n = 1, 2, 3, 4 and 20.

Notice that f_n(0) = 0 for every n, but that for every x ∈ (0, 1] we have lim_{n→∞} x^{1/n} = 1. As a result
the limit of the sequence (f_n) is

f(x) = 0 if x = 0,   f(x) = 1 if x ∈ (0, 1].
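A short numerical check of this example (an added Python sketch; the sample points are chosen here, not in the notes): at each fixed x ∈ (0, 1] the values x^{1/n} creep up to 1, but at the moving point x = 2^{−n} the value is always 1/2, which is precisely what will make the convergence non-uniform.

```python
# f_n(x) = x^(1/n) on [0, 1]: pointwise limit is 0 at x = 0 and 1 on (0, 1].
def f_n(x, n):
    return x ** (1 / n)

# Pointwise convergence at a fixed x in (0, 1]:
assert abs(f_n(0.3, 10_000) - 1) < 1e-3

# But at the moving point x = 2^(-n) we always get exactly 1/2,
# so sup |f_n - f| >= 1/2 for every n.
for n in (5, 50, 500):
    assert abs(f_n(2.0 ** -n, n) - 0.5) < 1e-9
```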
Remark 2.3. Notice that the above example shows that the pointwise limit of a sequence of continuous
functions need not be continuous. It also produces a counterexample to the interchange of limits.
We have

lim_{n→∞} lim_{x→0+} f_n(x) ≠ lim_{x→0+} lim_{n→∞} f_n(x),

as the left-hand side equals zero, while the right-hand side equals one.

CHAPTER 2. SEQUENCES AND SERIES OF FUNCTIONS

Pointwise convergence clearly does not preserve continuity. It can also be very non-uniform, in the sense
that while f_n(x) → 0 for every x, we may have sup_x |f_n(x) − 0| → C > 0 or even sup_x |f_n(x) − 0| →
∞ as n goes to infinity, as shown in the next examples.

Example 2.4. Consider the sequences

g_n(x) = 2nx on [0, 1/(2n)),   g_n(x) = −2n(x − 1/n) on [1/(2n), 1/n),   g_n(x) = 0 on [1/n, 1];

h_n(x) = 2n²x on [0, 1/(2n)),   h_n(x) = −2n²(x − 1/n) on [1/(2n), 1/n),   h_n(x) = 0 on [1/n, 1].

It is easy to see that g_n and h_n are continuous and converge to the function f = 0. However, for every n
we have g_n(1/(2n)) = 1 (with that being the maximum of g_n) and therefore

sup_{x∈[0,1]} |g_n(x) − 0| = 1.

The situation is worse for the sequence (h_n), known as the Witch's hat. Indeed h_n(1/(2n)) = n, which
shows that while h_n → 0 we have

sup_{x∈[0,1]} |h_n(x) − 0| → ∞.
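The Witch's hat can be checked numerically; the Python sketch below is an added illustration (the peak location 1/(2n) is taken from the example above). The tent has height n, yet at any fixed x > 0 it has moved past once 1/n < x:

```python
# The "Witch's hat" h_n of Example 2.4: a tent of height n supported on [0, 1/n].
def h_n(x, n):
    if 0 <= x < 1 / (2 * n):
        return 2 * n * n * x
    if 1 / (2 * n) <= x < 1 / n:
        return -2 * n * n * (x - 1 / n)
    return 0.0

n = 100
assert abs(h_n(1 / (2 * n), n) - n) < 1e-9   # the peak has height n
assert h_n(0.5, n) == 0.0                    # at a fixed x > 0 the hat has passed
assert h_n(0.0, n) == 0.0                    # and h_n(0) = 0 for every n
```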

Pointwise convergence and integrability do not interact as one would hope. Indeed, even if we assume
that the pointwise limit is integrable we may not have lim ∫ f_n = ∫ lim f_n.

Example 2.5. Consider f_n(x) = χ_{[n,n+1)}(x), where χ_I is the indicator of the set I, i.e., it takes the value 1 if
x ∈ I and zero otherwise. Clearly f_n converges pointwise to f = 0. However,

1 = ∫ f_n ≠ ∫ f = 0.

We can think of this as "the mass escaping to infinity" (along the x axis). In the latter calculation, we
are considering the improper Riemann integral ∫_{−∞}^{+∞} on all of R.
Another example of this phenomenon can be found by considering g_n(x) = nχ_{(0,1/n)}(x): we also have
that g_n converges pointwise to 0, while ∫ g_n = 1 for every n. We can think of this as "pointwise convergence
allowing the mass to go to infinity" (along the y axis this time). The Witch's hat above also provides a
similar example, in that case with continuous functions.
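The escaping-mass computation for g_n = nχ_{(0,1/n)} is elementary enough to verify in a couple of lines (an added Python sketch):

```python
# g_n = n * indicator of (0, 1/n), from Example 2.5: every g_n integrates
# to 1, yet g_n(x) -> 0 at every fixed x (the mass escapes along the y axis).
def g_n(x, n):
    return n if 0 < x < 1 / n else 0

def integral_g_n(n):
    return n * (1 / n)   # height n times width 1/n

assert abs(integral_g_n(10 ** 6) - 1) < 1e-12
assert g_n(0.25, 5) == 0   # for n >= 5 the support (0, 1/n) misses x = 0.25
assert g_n(0.25, 3) == 3   # but early terms still see it
```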

Example 2.6. Another sequence that will play a role in several modules this year is f_n(x) = sin(nx). This
sequence is connected to Fourier series and will be heavily studied in MA250 PDE, for example. Notice
that for x = kπ with k ∈ Z the limit exists and equals 0. If x = (p/q)π with p/q ∉ Z then there is no limit.
Indeed sin(nqx) = 0 while sin((2nq + 1)x) = sin(x) ≠ 0. If x is an irrational multiple of π, then the remainder
of nx modulo 2π is dense in [0, 2π] and there is no limit.
Despite the fact that sin(nx) does not have a limit for most x, one can show that for every integrable
function f

∫_{−π}^{π} f(x) sin(nx)dx → 0 as n → ∞.

This result, known as the Riemann–Lebesgue Lemma, suggests that sin(nx) goes to zero in some sense.
We can also consider the sequence g_n(x) = cos(nx)/n. As cosine is a bounded function it is easy to see that
g_n converges pointwise to 0. Since the g_n are smooth we can also consider g_n′(x) = − sin(nx). This tells us
that even for smooth functions, having g_n converge pointwise to g does not imply that g_n′ converges to g′,
even if g is smooth.
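The Riemann–Lebesgue decay can be observed numerically. In the added sketch below the test function f(x) = x and the midpoint quadrature rule are choices made here; the closed form 2π(−1)^{n+1}/n for ∫_{−π}^{π} x sin(nx)dx comes from integration by parts:

```python
import math

# Riemann-Lebesgue in action for f(x) = x:
# \int_{-pi}^{pi} x sin(nx) dx = 2*pi*(-1)^(n+1)/n, which tends to 0.
def exact(n):
    return 2 * math.pi * (-1) ** (n + 1) / n

# A midpoint-rule check of the same integral:
def midpoint(n, m=20000):
    h = 2 * math.pi / m
    total = 0.0
    for j in range(m):
        x = -math.pi + (j + 0.5) * h
        total += x * math.sin(n * x) * h
    return total

assert abs(midpoint(1) - exact(1)) < 1e-4
assert abs(midpoint(5) - exact(5)) < 1e-4
assert abs(exact(10 ** 6)) < 1e-5   # the oscillation kills the integral
```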
The final example we consider is one of a sequence (f_n) such that ∫ (f_n − f)dx converges to zero, but
where f_n does not converge pointwise to f.


Example 2.7. We will consider functions defined on [0, 1]. Let

f_0(x) = χ_{[0,1]}(x),

f_1(x) = χ_{[0,1/2]}(x), f_2(x) = χ_{[1/2,1]}(x),

f_3(x) = χ_{[0,1/4]}(x), f_4(x) = χ_{[1/4,1/2]}(x), f_5(x) = χ_{[1/2,3/4]}(x), f_6(x) = χ_{[3/4,1]}(x).

Notice that each function is the indicator of an interval, and that within each group above the intervals sweep
[0, 1]. When we move to the next block the length of the corresponding intervals gets divided by 2 and
therefore we consider twice as many functions in each group. While the process is clear from the list,
writing a formula for f_n is annoying to say the least. You can check that the following works. For an index

n ∈ [ Σ_{l=0}^{k−1} 2^l , Σ_{l=0}^{k} 2^l ],   k = 1, 2, . . .

we set f_n as the indicator of the interval

[ (n − Σ_{l=0}^{k−1} 2^l) / 2^k , (n − Σ_{l=0}^{k−1} 2^l + 1) / 2^k ].
Since the length of the intervals tends to zero it is clear that ∫ f_n → 0, but since the intervals keep sweeping
the entire interval [0, 1] the sequence f_n does not converge pointwise to zero (or to any other function, for that matter).
This is contrary to the intuition that if the area between f and f_n is going to zero, the functions f_n must
be approaching f, and therefore converging to f.
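The index bookkeeping in this example is easy to get wrong, so here is an added Python sketch of the interval formula (the helper name `interval` is ours): block k starts at index 2^k − 1 and contains indicators of intervals of length 2^{−k} sweeping [0, 1].

```python
# The "typewriter" sequence of Example 2.7.
def interval(n):
    k = 0
    while 2 ** (k + 1) - 1 <= n:   # block k starts at index 2^k - 1
        k += 1
    j = n - (2 ** k - 1)           # position inside block k
    return j / 2 ** k, (j + 1) / 2 ** k

assert interval(0) == (0.0, 1.0)
assert interval(1) == (0.0, 0.5) and interval(2) == (0.5, 1.0)

# The interval lengths (hence the integrals of f_n) tend to zero ...
a, b = interval(100)
assert abs((b - a) - 2 ** -6) < 1e-15   # index 100 lies in block k = 6

# ... but every block sweeps all of [0, 1], so f_n(x) returns to 1 again
# and again at each x, and (f_n(x)) has no pointwise limit.
hits = [n for n in range(127) if interval(n)[0] <= 0.3 < interval(n)[1]]
assert len(hits) == 7   # one hit in each of the blocks k = 0, ..., 6
```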

2.2 Uniform convergence


We now consider the notion of uniform convergence.

Definition 2.8. Let fn : Ω → R be a sequence of functions. We say that (fn ) converges uniformly to
f : Ω → R if and only if for every ε > 0 there exists N (ε) such that |fn (x) − f (x)| < ε for every x ∈ Ω
and for all n > N (ε).

The key difference from pointwise convergence is that N depends only on ε and not on x. For pointwise
convergence we first freeze x and consider the convergence of f_n(x) to f(x). We will denote uniform
convergence by f_n ⇒ f.
As before we are not making any assumption on Ω. In order to simplify the presentation we introduce
the notation

‖f‖_∞ = sup_{x∈Ω} |f(x)|.

With this notation we have

f_n ⇒ f ⇐⇒ ∀ε > 0, ∃N(ε) such that ‖f_n − f‖_∞ < ε for all n > N(ε).

Remark 2.9. Clearly uniform convergence implies pointwise convergence. The converse is of course false,
as can be seen by considering the sequence from Remark 2.3. Namely, we note that f_n(1/2^n) = 1/2 and
so ‖f_n − f‖_∞ ≥ 1/2. Alternatively, one can argue by contradiction and apply Theorem 2.13 below.

Definition 2.10. A sequence (f_n) of functions on Ω is called uniformly Cauchy if and only if for every ε > 0
there exists N(ε) such that ‖f_n − f_m‖_∞ < ε for all n, m > N(ε) (or, equivalently, sup_{x∈Ω} |f_n(x) − f_m(x)| <
ε for all n, m > N(ε)).

Theorem 2.11. A sequence (fn ) is uniformly convergent if and only if it is uniformly Cauchy.


Proof. Assume that (f_n) is uniformly convergent to f, i.e. for every ε there exists N such that ‖f_n − f‖_∞ <
ε/2 for all n > N. Then, for m, n > N,

‖f_n − f_m‖_∞ = ‖f_n − f + f − f_m‖_∞ ≤ ‖f_n − f‖_∞ + ‖f_m − f‖_∞ ≤ ε/2 + ε/2 = ε.

For the converse, assume (f_n) is uniformly Cauchy. That means that for every x, f_n(x) is a Cauchy
sequence in R and therefore convergent. Hence there exists f(x) such that f_n(x) converges to f(x),
at least pointwise. Now, we know that given ε > 0 there exists N(ε) > 0 such that |f_n(x) − f_m(x)| < ε/2
for every x and all n, m > N(ε). That is

f_m(x) − ε/2 < f_n(x) < f_m(x) + ε/2 for all x, and all n, m > N(ε).

As this holds for all m > N(ε) we can take limits as m goes to infinity. We find

f(x) − ε/2 ≤ f_n(x) ≤ f(x) + ε/2 for all x, and all n > N(ε),

from which it follows that

|f(x) − f_n(x)| ≤ ε/2 < ε for all x, and all n > N(ε),

which proves the result.

Remark 2.12. It is worth noting that ‖·‖_∞ is a norm on the space of bounded functions on Ω (we make
no assumptions about Ω being open, closed, bounded or unbounded). ‖·‖_∞ is referred to as the supremum
norm or uniform norm. Recall that by a norm we mean that it satisfies

1. ‖f‖_∞ ≥ 0, with ‖f‖_∞ = 0 if and only if f = 0,

2. ‖λf‖_∞ = |λ|‖f‖_∞ for all λ ∈ R, and

3. ‖f + g‖_∞ ≤ ‖f‖_∞ + ‖g‖_∞.

Theorem 2.13. Let (fn ) be a sequence of continuous functions in Ω that converges uniformly to f : Ω → R.
Then f is continuous.

Proof. First notice that the uniform convergence implies that given any ε > 0 there exists N > 0 such
that ‖f_n − f‖_∞ < ε/3 for all n > N. In order to show that f is continuous at x_0 ∈ Ω we need to show
that given ε there exists δ = δ(ε) such that for all x ∈ (x_0 − δ, x_0 + δ) ∩ Ω we have |f(x) − f(x_0)| < ε.
With N as above, we choose n > N, fixed from now on. Since f_n is continuous at x_0 we know that there
exists δ = δ(ε) such that for all x ∈ (x_0 − δ, x_0 + δ) ∩ Ω we have |f_n(x) − f_n(x_0)| < ε/3.
We estimate |f(x) − f(x_0)| using the triangle inequality:

|f(x) − f(x_0)| = |f(x) − f_n(x) + f_n(x) − f_n(x_0) + f_n(x_0) − f(x_0)|
≤ |f(x) − f_n(x)| + |f_n(x) − f_n(x_0)| + |f_n(x_0) − f(x_0)|
≤ ‖f_n − f‖_∞ + |f_n(x) − f_n(x_0)| + ‖f_n − f‖_∞ < ε/3 + ε/3 + ε/3 = ε,

for n > N and x ∈ (x_0 − δ, x_0 + δ) ∩ Ω, with N and δ chosen as above. This completes the proof.

We will denote the space of bounded, continuous functions with the uniform norm by (C_b; ‖·‖_∞).

Theorem 2.14. (C_b; ‖·‖_∞) is a complete space, i.e. every Cauchy sequence converges to a continuous
bounded function.


Proof. We need to show that if (f_n) is Cauchy in the space (C_b; ‖·‖_∞), then there is a limit, and that
the limit is bounded and continuous. First notice that a Cauchy sequence in (C_b; ‖·‖_∞) is, by definition,
a uniformly Cauchy sequence. Theorem 2.11 implies that the sequence is convergent and, since all the
functions are continuous, Theorem 2.13 implies the limit is continuous.
To see that it is bounded, notice that for every x ∈ Ω

|f(x)| ≤ |f(x) − f_n(x)| + |f_n(x)|

for every n. Since f_n converges uniformly to f there exists n large enough that |f_n(x) − f(x)| < 1 for every x. For that
n, since f_n is bounded we have |f_n| ≤ M. These two inequalities lead to |f(x)| ≤ M + 1 for every x ∈ Ω,
proving the boundedness of f.

Remark 2.15. We could consider the interaction of uniform convergence with differentiation or integration.
Consider for example f_n(x) = sin(n²x)/n. The sequence (f_n) converges to f = 0 uniformly. Indeed

|sin(n²x)/n − 0| ≤ 1/n for all x.

Clearly all the functions f_n are smooth. The derivatives are given by f_n′(x) = n cos(n²x). It is easy to see
that the sequence (f_n′) does not converge uniformly (or pointwise). This example shows that while f_n ⇒ f
we may not have f_n′ ⇒ f′ or even f_n′ → f′.
To explore integrability, we consider g_n(x) = (1/(2n)) χ_{[−n,n]}. Recall that strictly speaking we have not defined
Riemann integration on R, but rather improper integration, via a limiting procedure. It is clear however
that ∫ g_n = 1 for every n. The sequence g_n converges uniformly to g = 0, as we have |g_n − 0| ≤ 1/(2n),
and so lim ∫ g_n = 1 ≠ 0 = ∫ g. We reiterate that strictly speaking the g_n are not Riemann integrable, and we
will prove that, in fact, on a bounded interval f_n ⇒ f does imply ∫ f_n → ∫ f.
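A numerical look at the first example of this remark (an added Python sketch; the sample points are choices made here): f_n is squeezed below 1/n everywhere, while its derivative attains the value n.

```python
import math

# f_n(x) = sin(n^2 x)/n converges uniformly to 0 (bounded by 1/n everywhere),
# but the derivatives f_n'(x) = n cos(n^2 x) do not stay bounded.
def f_n(x, n):
    return math.sin(n * n * x) / n

def df_n(x, n):
    return n * math.cos(n * n * x)

for x in (0.0, 0.7, 3.1):
    assert abs(f_n(x, 1000)) <= 1 / 1000   # uniform bound, independent of x
assert df_n(0.0, 1000) == 1000.0           # derivative at 0 equals n
```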
Theorem 2.16. Let (f_n), f_n : [a, b] → R, be a sequence of Riemann integrable functions that converges
uniformly to f : [a, b] → R. Then f is Riemann integrable and ∫ f_n → ∫ f.

Proof. First we need to show that f is Riemann integrable, that is, show that for every ε > 0 there exists
a partition P of [a, b] such that

U(f, P) − L(f, P) < ε.

Now, since f_n ⇒ f we know that for any ε > 0 there exists N such that ‖f_n − f‖_∞ < ε/(4(b − a)) for
n > N. For a fixed n > N, since f_n is integrable we know that given ε > 0 there exists a partition P of
[a, b] such that

U(f_n, P) − L(f_n, P) < ε/2.

Now, for that P,

U(f, P) − L(f, P) = Σ_k [sup_{I_k} f − inf_{I_k} f] |I_k| = Σ_k [sup_{I_k}(f − f_n + f_n) − inf_{I_k}(f − f_n + f_n)] |I_k|

≤ Σ_k [‖f − f_n‖_∞ + sup_{I_k} f_n + ‖f − f_n‖_∞ − inf_{I_k} f_n] |I_k|

= 2 Σ_k ‖f − f_n‖_∞ |I_k| + Σ_k [sup_{I_k} f_n − inf_{I_k} f_n] |I_k|

≤ 2‖f − f_n‖_∞ (b − a) + U(f_n, P) − L(f_n, P)

≤ 2 · ε/(4(b − a)) · (b − a) + ε/2 = ε.

To see that ∫ f_n → ∫ f, notice that

| ∫_a^b f_n − ∫_a^b f | = | ∫_a^b (f_n − f) | ≤ ∫_a^b |f_n − f| ≤ ∫_a^b ‖f_n − f‖_∞ = ‖f_n − f‖_∞ (b − a).

Clearly the right-hand side goes to zero as n goes to infinity by the uniform convergence of (f_n) to f.


In many circumstances it is necessary to consider functions of two variables (or more) from which we
construct new functions by integrating out some of the variables. We want to study several results in this
direction; we start by reviewing the notions of continuity and uniform continuity in two dimensions. The
definitions are analogous to Definitions 1.2 and 1.3.
Definition 2.17. Given f : Ω ⊂ R2 → R, we say that f is continuous at x if for every ε > 0 there exists
δ = δ(x, ε) > 0 such that

y ∈ Ω and |x − y| < δ =⇒ |f (x) − f (y)| < ε. (2.1)

Note that | · | has been used both to denote Euclidean distance in the plane, as in |x − y|, as well as for
absolute value of a real number, in |f (y) − f (x)|.
Definition 2.18. Given f : Ω ⊂ R2 → R, we say that it is uniformly continuous if for every ε > 0 there
exists δ = δ(ε) > 0 such that

x, y ∈ Ω and |x − y| < δ =⇒ |f (x) − f (y)| < ε. (2.2)

The key point here is that δ can be chosen independently of x. Similarly to Theorem 1.4 we have the
following result.
Theorem 2.19. Let f : Ω ⊂ R2 → R be a continuous function. Assume that Ω is closed and bounded.
Then it is uniformly continuous.
Proof. We will argue by contradiction. That would mean that there exist ε > 0 and x_n, y_n such that
|x_n − y_n| ≤ 1/n but |f(x_n) − f(y_n)| > ε.
The sequences {x_n} and {y_n} are bounded, as they lie in Ω, which is closed and bounded, and therefore
we can apply Bolzano–Weierstrass to each component to obtain convergent subsequences {x_{n_k}}, converging to x,
and {y_{n_k}}, converging to y. Notice that

|x − y_{n_k}| ≤ |x − x_{n_k}| + |x_{n_k} − y_{n_k}| ≤ |x − x_{n_k}| + 1/n_k → 0 as k → ∞,

which implies that x = y. However we know that |f(x_{n_k}) − f(y_{n_k})| > ε for all k. Since f is continuous,
taking limits as k goes to infinity we obtain 0 = |f(x) − f(x)| ≥ ε, which is a contradiction.

Theorem 2.20. Let f : [a, b] × [c, d] → R be a continuous function. Define

I(t) := ∫_a^b f(x, t)dx.

Then I is a continuous function on [c, d].


Proof. We need to show that for every ε > 0 there exists δ such that |t − t_0| < δ and t, t_0 ∈ [c, d] imply
|I(t) − I(t_0)| < ε.
Now I(t) − I(t_0) = ∫_a^b [f(x, t) − f(x, t_0)]dx and therefore

|I(t) − I(t_0)| ≤ ∫_a^b |f(x, t) − f(x, t_0)|dx. (2.3)

Since f is continuous on [a, b] × [c, d] it is uniformly continuous, and therefore given ε > 0 there exists δ such
that (x_1, t_1), (x_2, t_2) ∈ [a, b] × [c, d] with √((x_1 − x_2)² + (t_1 − t_2)²) < δ implies that |f(x_1, t_1) − f(x_2, t_2)| <
ε/(b − a). Therefore if |t − t_0| < δ we have |f(x, t) − f(x, t_0)| < ε/(b − a). As a result (2.3) becomes

|I(t) − I(t_0)| ≤ ∫_a^b |f(x, t) − f(x, t_0)|dx < ∫_a^b ε/(b − a) dx = ε,

and we obtain the desired result.

ANALYSIS III 12
CHAPTER 2. SEQUENCES AND SERIES OF FUNCTIONS

We can also consider differentiating I with respect to t, under sufficient regularity assumptions on f .
Theorem 2.21. Let f, ∂f /∂t be continuous functions on [a, b] × [c, d]. Then, for t ∈ (c, d),

d/dt ∫_a^b f (x, t) dx = ∫_a^b ∂f /∂t (x, t) dx.
Proof. Set F (t) := ∫_a^b f (x, t) dx and G(t) := ∫_a^b ∂f /∂t (x, t) dx. We want to show that F is differentiable on
(c, d) with F ′ = G. We consider the difference between the incremental quotient that is used to define the
derivative of F and the function we expect to be the derivative, namely G. Let t0 ∈ (c, d) be given. Consider
h ∈ R \ {0} such that t0 + h ∈ [c, d]. We write

| (F (t0 + h) − F (t0 ))/h − G(t0 ) | = | ∫_a^b [ (f (x, t0 + h) − f (x, t0 ))/h − ∂f /∂t (x, t0 ) ] dx |,

which, by the Mean Value Theorem applied in the second variable, becomes, for some τ = τ (x) between t0 and t0 + h,

= | ∫_a^b [ ∂f /∂t (x, τ ) − ∂f /∂t (x, t0 ) ] dx | ≤ ∫_a^b | ∂f /∂t (x, τ ) − ∂f /∂t (x, t0 ) | dx.

Now, since ∂f /∂t is continuous on [a, b] × [c, d] it is uniformly continuous, and therefore for every ε > 0 there
exists δ > 0 such that for |h| < δ and τ as above (note |τ − t0 | ≤ |h|) we have

| ∂f /∂t (x, τ ) − ∂f /∂t (x, t0 ) | < ε/(b − a).

This implies that for 0 < |h| < δ

| (F (t0 + h) − F (t0 ))/h − G(t0 ) | ≤ ∫_a^b ε/(b − a) dx = ε.

Such δ > 0 can be found for all ε > 0. Hence F is differentiable at t0 and F ′ (t0 ) = G(t0 ).
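The conclusion of Theorem 2.21 can be probed numerically on a concrete example. The Python sketch below (an illustration, not part of the notes) takes f (x, t) = sin(xt) on [0, 1] × [0, 2] and compares a central-difference approximation of I ′ (t) with ∫_0^1 ∂f /∂t (x, t) dx; both integrals are approximated with a composite midpoint rule.

```python
from math import sin, cos

def midpoint(g, a, b, m=20000):
    # Composite midpoint rule for the integral of g over [a, b].
    h = (b - a) / m
    return h * sum(g(a + (i + 0.5) * h) for i in range(m))

def I(t):
    # I(t) = integral over [0, 1] of f(x, t) = sin(x t)
    return midpoint(lambda x: sin(x * t), 0.0, 1.0)

def G(t):
    # G(t) = integral over [0, 1] of df/dt (x, t) = x cos(x t)
    return midpoint(lambda x: x * cos(x * t), 0.0, 1.0)

t0, h = 1.0, 1e-5
dI = (I(t0 + h) - I(t0 - h)) / (2 * h)  # central difference approximating I'(t0)
print(dI, G(t0))                         # the two values agree
```

At t0 = 1 both quantities equal sin 1 + cos 1 − 1 ≈ 0.3818, as a direct computation of the two integrals confirms.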

We now explore a version of Fubini’s Theorem for continuous functions.


Theorem 2.22. Let f : [a, b] × [c, d] → R be a continuous function. Then

∫_a^b ( ∫_c^d f (x, y) dy ) dx = ∫_c^d ( ∫_a^b f (x, y) dx ) dy.
Proof. Since f is continuous on [a, b] × [c, d], Theorem 2.20 implies that ∫_c^d f (x, y) dy and ∫_a^b f (x, y) dx
are continuous on their respective domains, and therefore Riemann integrable. Consider

F (t) = ∫_a^t ( ∫_c^d f (x, y) dy ) dx − ∫_c^d ( ∫_a^t f (x, y) dx ) dy.

By the FTC (Theorem 1.16), we know that F is continuous, with F (a) = 0. Also the first integral is
differentiable with

d/dt ∫_a^t ( ∫_c^d f (x, y) dy ) dx = ∫_c^d f (t, y) dy.

We also know that

d/dt ∫_a^t f (x, y) dx = f (t, y).

We would now like to differentiate the second integral in F , namely

− ∫_c^d ( ∫_a^t f (x, y) dx ) dy,

by differentiating inside the outer integral. For that, Theorem 2.21 requires that

∫_a^t f (x, y) dx

be continuous on [a, b] × [c, d] as a function of t and y jointly. Theorem 2.20 proves that it is continuous as a
function of y, and we actually know that it is differentiable as a function of t. However, continuity in each
of the variables separately does not ensure that the function is continuous on [a, b] × [c, d]. Nevertheless, one
can modify the proof of Theorem 2.20 to show continuity on [a, b] × [c, d]. (This is left as an exercise.)
Then we are allowed to differentiate inside the integral and we obtain

d/dt ∫_c^d ( ∫_a^t f (x, y) dx ) dy = ∫_c^d ∂/∂t ( ∫_a^t f (x, y) dx ) dy = ∫_c^d f (t, y) dy.

Therefore

F ′ (t) = ∫_c^d f (t, y) dy − ∫_c^d f (t, y) dy = 0.

Since F is continuous on [a, b], F (a) = 0 and F ′ (t) = 0 on (a, b), the function F is constant, so F (b) = 0. This implies the result.
Remark 2.23. The continuity requirement is necessary in the previous Theorem. The following is a
counterexample to Fubini's theorem when continuity fails at just a single point. Let

f (x, y) = (x² − y²)/(x² + y²)².

Notice that f is not continuous at the origin. We have

∫_0^1 (x² − y²)/(x² + y²)² dy = [ y/(x² + y²) ]_{y=0}^{y=1} = 1/(1 + x²),

and

∫_0^1 ( ∫_0^1 (x² − y²)/(x² + y²)² dy ) dx = ∫_0^1 1/(1 + x²) dx = π/4.

In the opposite order we get −π/4 by symmetry. The key here is that the function is not in L1 , i.e. |f |
is not integrable:

∫_0^1 ( ∫_0^1 |x² − y²|/(x² + y²)² dy ) dx ≥ ∫_0^1 ( ∫_0^x (x² − y²)/(x² + y²)² dy ) dx
= ∫_0^1 [ y/(x² + y²) ]_{y=0}^{y=x} dx = ∫_0^1 1/(2x) dx = ∞.
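The failure of Fubini here can also be seen numerically. The Python sketch below (an illustration, not part of the notes) approximates the two iterated integrals with midpoint rules; integrating in y first gives ≈ +π/4 while integrating in x first gives ≈ −π/4.

```python
from math import pi

def f(x, y):
    # The counterexample of Remark 2.23, discontinuous at the origin.
    return (x * x - y * y) / (x * x + y * y) ** 2

def inner(g, m=10000):
    # Midpoint rule for a one-dimensional integral over [0, 1].
    h = 1.0 / m
    return h * sum(g((j + 0.5) * h) for j in range(m))

n = 100
hx = 1.0 / n
# integrate in y first, then in x: tends to +pi/4
dy_first = hx * sum(inner(lambda y, x=(i + 0.5) * hx: f(x, y)) for i in range(n))
# integrate in x first, then in y: tends to -pi/4
dx_first = hx * sum(inner(lambda x, y=(j + 0.5) * hx: f(x, y)) for j in range(n))
print(dy_first, dx_first)  # approximately 0.785 and -0.785
```

A naive double sum over a symmetric grid would give 0 by antisymmetry; the two distinct values only emerge when the inner integral is computed first, mirroring the iterated (improper) integrals in the remark.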

Differentiation revisited.
We will use the notation C k (a, b) to denote functions that are k times continuously differentiable on
(a, b), and C ∞ (a, b) for functions that are infinitely differentiable on (a, b).
We have seen examples of sequences (fn ) that are differentiable, with (fn ) converging uniformly to f ,
but for which fn′ does not converge to f ′ . In fact it is easy to construct sequences of C 1 functions that
converge uniformly to a limit f for which f ′ fails to exist at some point. Consider

fn (x) = (x² + 1/n)^{1/2} .

Each fn is clearly C 1 , as x² + 1/n never vanishes for fixed n, and (fn ) converges uniformly to f (x) = |x|,
which is not differentiable at the origin. To see the uniform convergence, notice that if

A := (x² + 1/n)^{1/2} − |x|

then A ≥ 0 and

A ≤ ( (|x| + 1/√n)² )^{1/2} − |x| = 1/√n,

and the uniform convergence follows.
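The bound sup_x |fn (x) − |x|| ≤ 1/√n is easy to confirm numerically. The short Python check below (an illustration, not part of the notes) evaluates the gap on a grid and verifies that it is largest at x = 0, where it equals exactly 1/√n.

```python
from math import sqrt

def fn(x, n):
    # f_n(x) = (x^2 + 1/n)^(1/2), a smooth approximation of |x|
    return (x * x + 1.0 / n) ** 0.5

for n in (1, 10, 100, 10000):
    xs = [k / 1000.0 - 5.0 for k in range(10001)]  # grid on [-5, 5] containing 0
    gap = max(fn(x, n) - abs(x) for x in xs)       # sup of f_n - |x| on the grid
    assert gap <= 1.0 / sqrt(n) + 1e-12            # the uniform bound 1/sqrt(n)
    assert abs(gap - fn(0.0, n)) < 1e-12           # the sup is attained at x = 0
```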


The following result will prove rather useful.


Theorem 2.24. Let (fn ) be a sequence of C 1 functions on [a, b] (with the derivatives at the endpoints
understood as one-sided derivatives). Assume fn → f in the pointwise sense and that fn′ converges uniformly
to g. Then f is C 1 and g = f ′ (so fn′ ⇒ f ′ ).
Proof. Since fn′ ⇒ g, Theorem 2.16 yields

∫_a^x g(y) dy = ∫_a^x lim_{n→∞} fn′ (y) dy = lim_{n→∞} ∫_a^x fn′ (y) dy,

which by the FTC yields

∫_a^x g(y) dy = lim_{n→∞} [fn (x) − fn (a)] = f (x) − f (a).

Notice that since g is continuous, this means that f is continuous. While the Theorem does not assume
that g is continuous, that is a consequence of the uniform convergence of fn′ to g, since the fn are C 1 . Now,
the FTC implies that x ↦ ∫_a^x g is differentiable with derivative g. Since

∫_a^x g = f (x) − f (a),

we obtain that f is differentiable and g = f ′ .

2.3 Series of functions


In this section we consider series of functions, i.e., we study

∑_{k=1}^∞ fk (x),

with fk : Ω → R. We begin by establishing the notions of pointwise convergence and uniform convergence
for a series.
Definition 2.25. Let (fk ) be a sequence of functions fk : Ω → R. Let (Sn ) be the sequence of partial
sums, with Sn : Ω → R defined by

Sn (x) = ∑_{k=1}^n fk (x).

Then the series

∑_{k=1}^∞ fk (x)

converges pointwise to S : Ω → R in Ω if Sn → S pointwise on Ω, and it converges uniformly to S in Ω if
Sn ⇒ S uniformly in Ω.
Theorem 2.26. Let (fk ), with fk : [a, b] → R, be a sequence of Riemann integrable functions on [a, b].
Assume that ∑_{k=1}^∞ fk converges uniformly. Then ∑_{k=1}^∞ fk is Riemann integrable on [a, b] and

∫_a^b ∑_{k=1}^∞ fk = ∑_{k=1}^∞ ∫_a^b fk .

Proof. Sn is a finite sum of integrable functions and therefore integrable (by additivity). Since Sn converges
uniformly, Theorem 2.16 implies that S = lim_{n→∞} Sn is integrable and moreover

lim_{n→∞} ∫_a^b Sn = ∫_a^b S.

Since ∫_a^b Sn = ∑_{k=1}^n ∫_a^b fk and S = lim_{n→∞} Sn = ∑_{k=1}^∞ fk , we obtain the result.
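As a concrete instance (a Python check, not part of the notes): the geometric series ∑_{k≥0} x^k converges uniformly on [0, 1/2] (take Mk = 2^{−k}), so it may be integrated term by term, and both sides of the resulting identity equal log 2.

```python
from math import log

# Term-by-term: the integral of x^k over [0, 1/2] is (1/2)^(k+1) / (k+1).
termwise = sum(0.5 ** (k + 1) / (k + 1) for k in range(60))

# Direct: the integral of the sum, 1/(1-x), over [0, 1/2] is log 2.
assert abs(termwise - log(2.0)) < 1e-13
```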


Theorem 2.27. Let (fk ), with fk : [a, b] → R, be a sequence of C 1 functions such that ∑_{k=1}^∞ fk converges
pointwise. Assume that ∑_{k=1}^∞ fk′ converges uniformly. Then

( ∑_{k=1}^∞ fk (x) )′ = ∑_{k=1}^∞ fk′ (x),

that is, the series is differentiable and can be differentiated term-by-term.

Proof. The proof is a simple consequence of Theorem 2.24. That result says (changing the notation) that
if Sn is C 1 , Sn → S pointwise and Sn′ ⇒ g, then S ∈ C 1 and S ′ = g (so Sn′ ⇒ S ′ ). If we define Sn = ∑_{k=1}^n fk
then Sn is C 1 , since each fk is C 1 ; it converges pointwise to S = ∑_{k=1}^∞ fk ; and finally (Sn′ ) converges uniformly,
to g say. Then S is C 1 and Sn′ ⇒ S ′ . This means

g = lim_{n→∞} Sn′ = S ′ = ( ∑_{k=1}^∞ fk (x) )′ ,

but since Sn′ = ( ∑_{k=1}^n fk )′ = ∑_{k=1}^n fk′ , we obtain the result, namely

∑_{k=1}^∞ fk′ (x) = ( ∑_{k=1}^∞ fk (x) )′ .

Theorem 2.28 (The Weierstrass M-test). Let (fk ) be a sequence of functions fk : Ω → R, and assume
that for every k there exists Mk > 0 such that |fk (x)| ≤ Mk for every x ∈ Ω and ∑_{k=1}^∞ Mk < ∞. Then

∑_{k=1}^∞ fk

converges uniformly on Ω.
Proof. Notice that it suffices to show that Sn := ∑_{k=1}^n fk is uniformly Cauchy (recall Theorem 2.11). Now,
since ∑_{k=1}^∞ Mk < ∞, given ε > 0 there exists N such that

∑_{k=m+1}^n Mk < ε for all n > m > N.

Now, for n > m > N,

|Sn (x) − Sm (x)| = | ∑_{k=1}^n fk (x) − ∑_{k=1}^m fk (x) | = | ∑_{k=m+1}^n fk (x) | ≤ ∑_{k=m+1}^n |fk (x)| ≤ ∑_{k=m+1}^n Mk < ε

for every x ∈ Ω. Therefore (Sn ) is uniformly Cauchy and the proof is complete.
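As an illustration of the M-test (a Python sketch, not part of the notes), take fk (x) = cos(kx)/k² with Mk = 1/k²: the tail of the series is controlled, uniformly in x, by the tail of ∑ Mk — exactly the bound the proof exploits.

```python
from math import cos, pi

def partial(x, n):
    # n-th partial sum of the series with f_k(x) = cos(k x) / k^2
    return sum(cos(k * x) / k ** 2 for k in range(1, n + 1))

n, N = 50, 2000
tail_bound = sum(1.0 / k ** 2 for k in range(n + 1, N + 1))  # sum of M_k over the tail
xs = [2 * pi * j / 100 for j in range(100)]                   # sample points in [0, 2pi)
worst = max(abs(partial(x, N) - partial(x, n)) for x in xs)
assert worst <= tail_bound + 1e-12  # the tail is bounded uniformly in x
```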

2.4 A continuous, nowhere differentiable function (THE PROOFS IN


THIS SECTION ARE NOT EXAMINABLE)
In 1872 Weierstrass showed that there exist continuous functions that are nowhere differentiable. Standard
examples are constructed using Fourier Series. For example,

f (x) = ∑_{k=0}^∞ a^k cos(2π b^k x)

for any 0 < a < 1 < b with ab > 1.


We will construct an example based on the sawtooth function. Consider

φ(x) = x − ⌊x⌋ if x ≤ ⌊x⌋ + 1/2,   φ(x) = 1 − x + ⌊x⌋ if x > ⌊x⌋ + 1/2.

The function φ is equal to the distance function from x to Z, i.e. the distance from x to the nearest integer.


We define, for n = 0, 1, . . . ,

fn (x) = (1/4^n) φ(4^n x).

We will show that f (x) = ∑_{n=0}^∞ fn (x) is continuous but nowhere differentiable. Notice that

0 ≤ fn ≤ (1/4^n) · (1/2) ≤ 1/4^n ,

so by the Weierstrass M-test the series converges uniformly. Since each fn is continuous, and the
convergence is uniform, f is C 0 .
Given x ∈ R we will choose the sign of hn = ±1/4^{n+1} in such a way that the points 4^n x and 4^n (x + hn )
both belong to the same interval [k/2, (k + 1)/2] of length 1/2, for some k ∈ Z. We make this choice of sign for
hn because on each of these intervals [k/2, (k + 1)/2], the function φ has constant slope +1 or −1.

Consider the incremental quotient

εn := (fn (x + hn ) − fn (x))/hn = (φ(4^n x + 4^n hn ) − φ(4^n x))/(4^n hn ) = ±1.

Moreover, if m < n the graph of fm also has slope ±1 on the interval to which x and x + hn belong. Let
us justify this last step in more detail. We argue by contradiction. Consider the case hn = 1/4^{n+1} (the case
when hn = −1/4^{n+1} is treated analogously). Namely, we suppose that there exists ℓ ∈ Z such that

4^m x < ℓ/2 < 4^m x + 4^m hn .

Multiplying the above inequalities by 4^{n−m} , we get

4^n x < 4^{n−m} ℓ/2 < 4^n x + 1/4.

Since 4^{n−m} ℓ/2 is a half-integer, this contradicts our choice of hn , which guaranteed that no half-integer lies
strictly between 4^n x and 4^n (x + hn ) = 4^n x + 1/4. Therefore,

εm := (fm (x + hn ) − fm (x))/hn = (φ(4^m x + 4^m hn ) − φ(4^m x))/(4^m hn ) = ±1.

However, for m ≥ n + 1 we have (since 4^m (x + hn ) − 4^m x = 4^m hn = ±4^{m−n−1} ∈ Z)

fm (x + hn ) − fm (x) = (1/4^m) (φ(4^m x + 4^m hn ) − φ(4^m x)) = 0.

Therefore

An := (f (x + hn ) − f (x))/hn = ∑_{m=0}^n (fm (x + hn ) − fm (x))/hn = ∑_{m=0}^n εm .

Since each εm = ±1, An is an even integer if n is odd and an odd integer if n is even. Hence (An ) has no
limit as n goes to infinity. Since hn goes to zero, this proves that f is not differentiable at x.
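At the particular point x = 0 the failure of differentiability is especially transparent: every term with m ≤ n contributes slope +1, so the difference quotient with hn = 4^{−(n+1)} equals n + 1 and diverges. The Python snippet below (an illustration, not part of the notes) confirms this; the truncation of the series is exact here because all omitted terms vanish at the points used.

```python
def phi(t):
    # Distance from t to the nearest integer (the sawtooth function).
    return abs(t - round(t))

def f(x, terms=25):
    # Truncation of f(x) = sum over n of phi(4^n x) / 4^n; for the points
    # used below every omitted term is phi(integer)/4^n = 0.
    return sum(phi((4 ** k) * x) / 4 ** k for k in range(terms))

# Difference quotients of f at x = 0 with h_n = 4^{-(n+1)}:
quotients = [(f(4.0 ** -(n + 1)) - f(0.0)) * 4.0 ** (n + 1) for n in range(10)]
print(quotients)  # [1.0, 2.0, ..., 10.0] -- unbounded, so f'(0) does not exist
```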

Chapter 3

Basic results about Rn

3.1 Notation in Rn

The main vector spaces that we shall consider in this module are Rn , n ∈ N. Thus, by a vector x ∈ Rn
we mean the n-tuple (x1 , . . . , xn ), xi ∈ R, 1 6 i 6 n.
For ease of writing, a vector x ∈ Rn will be written as a row vector x = (x1 , . . . , xn ), xi ∈ R, 1 6 i 6 n.
However, in calculations vectors will be written as column vectors, x = (x1 , . . . , xn )^T .

This is so that, if A : Rn → Rk is a linear map represented, with respect to the standard bases of Rn and
Rk , by the k × n matrix A = (aij ), 1 ≤ i ≤ k, 1 ≤ j ≤ n, then the vector y := Ax is obtained by multiplying the
column vector x by the matrix A on the left. In index notation, if y = (y1 , . . . , yk ), then

yi = ∑_{j=1}^n aij xj ,   i ∈ {1, . . . , k}.

A vector-valued function f (with values in Rk ) of the variables x1 , . . . , xn is denoted by f : U → Rk where


U ⊂ Rn is domain of the function f , i.e., the subset in which the independent variables x = (x1 , . . . , xn )
lie.1 Thus
f (x) is shorthand for (f1 (x1 , . . . , xn ), . . . , fk (x1 , . . . , xn ))
and, for calculations, f (x) is the column vector (f1 (x1 , . . . , xn ), . . . , fk (x1 , . . . , xn ))^T .
We normally use a, b and c at the start of the latin alphabet to denote real numbers (i.e. scalars) and
we shall use letters like x, y, p, q, u, v and w in the second half of the latin alphabet to denote vectors.
Thus we shall write ax for the vector (ax1 , . . . , axn ) without always spelling out that a ∈ R and x ∈ Rn .
In two and three dimensions, we will often write f (x, y) and f (x, y, z). Finally, 0 will denote both
the zero vector (though we shall occasionally write it as (0, . . . , 0)) and the zero scalar!
1
A real valued function f : U → R will be referred to as a scalar function.


3.2 The Euclidean norm and inner product


The Euclidean norm, or length, or magnitude of x = (x1 , . . . , xn ) ∈ Rn is denoted by kxk and is defined
by

kxk := ( ∑_{i=1}^n xi² )^{1/2} . (3.1)

The notation is convenient given that for n = 1, kxk means |x|.


The direction of a nonzero vector x is defined to be the unit vector x/kxk.
The obvious relation

x = kxk (x/kxk), x ≠ 0,
is the mathematical statement of the informal definition of a (nonzero) vector as a quantity that has both
magnitude and direction2 .
The Euclidean distance between x and y in Rn is defined as kx − yk.
The Euclidean inner product x · y, also called the dot product or scalar product, of x, y ∈ Rn is
defined as

x · y := ∑_{i=1}^n xi yi .

Other notations for x · y include (x, y) and ⟨x, y⟩. Evidently, kxk² = x · x.
The Cauchy-Schwarz inequality states that

|x · y| ≤ kxk kyk.

It follows from

0 ≤ k kyk² x − (x · y) y k² = kyk⁴ kxk² − (x · y)² kyk².

Definition 3.1 (Angle between two nonzero vectors). The Cauchy-Schwarz inequality implies that, if x
and y are both nonzero, then there exists unique θ ∈ [0, π] such that

x · y = kxk kyk cos θ;

θ is then defined to be the angle between x and y.

Proposition 3.2 (The triangle inequality). ∀ x, y ∈ Rn ,

kx + yk 6 kxk + kyk. (3.2)

Proof.

kx + yk2 = (x + y) · (x + y) = kxk2 + 2x · y + kyk2 6 kxk2 + 2kxkkyk + kyk2 = (kxk + kyk)2 .

Replacing x by x − z and y by z − y we get:

kx − yk 6 kx − zk + kz − yk, (3.3)

which corresponds to the familiar fact that the distance from x to y is less than or equal to the sum of the
distances from x to z and z to y. Regarding x, y and z as the vertices of a triangle, (3.3) says that the
length of the edge joining x and y is less than or equal to the sum of the lengths of the edges joining x to
z and z to y; this is the usual triangle inequality.
2
Note that the zero vector does not have a direction.


Exercise 3.1. Prove that, for all x, y ∈ Rn ,

| kxk − kyk | ≤ kx − yk. (3.4)

Proposition 3.3.

(i) kaxk = |a| kxk ∀ a ∈ R, x ∈ Rn .

(ii) kxk ≥ 0 ∀ x ∈ Rn and kxk = 0 ⇔ x = 0.

Warning 3.4. It is true that y = x if, and only if, kx − yk = 0. However, kxk = kyk does not imply that
y = ±x. (Think of points on the unit circle.)

Remark 3.5 (k · k satisfies the definition of a norm). A norm is a non-negative valued function
k · k : X → R+ on a real vector space X which satisfies

(i) kxk ≥ 0 ∀ x ∈ X and kxk = 0 ⇔ x = 0.

(ii) For every a ∈ R and x ∈ X we have kaxk = |a| kxk.

(iii) The triangle inequality (3.2): for every x, y ∈ X we have kx + yk ≤ kxk + kyk.

3.3 Convergence in Rn
Definition 3.6. A sequence (xj ) of vectors in Rn converges to x ∈ Rn if

∀ ε > 0, ∃ N ∈ N such that, j > N ⇒ kxj − xk < ε.

Proposition 3.7 (Uniqueness of limits). Let (xj ) be a sequence in Rn . If it converges to both x and x̃,
then x = x̃.

Proof. If we assume, by contradiction, that x ≠ x̃, then ε := (1/2) kx − x̃k > 0. Since xj converges to x,
∃ N1 ∈ N such that
j > N1 ⇒ kxj − xk < ε. (3.5)

Similarly, since xj also converges to x̃, ∃ N2 ∈ N such that

j > N2 ⇒ kxj − x̃k < ε. (3.6)

Then, for j > max{N1 , N2 } we have:

2ε = kx − x̃k 6 kx − xj k + kxj − x̃k < 2ε.

This is of course a contradiction and therefore x = x̃.3

The notation when we consider each of the coordinates of one of the elements xj in the sequence can
get a bit awkward. We will denote the i-th coordinate of xj ∈ Rn by xj,i .

Proposition 3.8 (Componentwise Convergence). A sequence (xj ) of vectors in Rn converges to x0 ∈ Rn if,


and only if, for each i ∈ {1, . . . , n}, lim xj,i = x0,i , where xj = (xj,1 , . . . , xj,n ) and x0 = (x0,1 , · · · , x0,n ).
j→∞
3
Note how the triangle inequality is crucial for proving the uniqueness of the limit of a sequence.


Proof that convergence implies componentwise convergence. Note that

∀ i ∈ {1, . . . , n}, |x0,i − xj,i | ≤ kx0 − xj k = ( ∑_{k=1}^n (x0,k − xj,k )² )^{1/2} .

Now by definition of convergence, given ε > 0, ∃ N ∈ N such that

j > N ⇒ kx0 − xj k < ε


and therefore
j > N ⇒ |x0,i − xj,i | < ε ∀ i ∈ {1, . . . n},

i.e., limj→∞ xj,i = x0,i for every i ∈ {1, . . . n}.

Proof that componentwise convergence implies convergence. Given ε > 0 and i ∈ {1, . . . , n}, ∃ Ni ∈ N
such that j > Ni ⇒ |x0,i − xj,i | < ε/√n. Set N := max{N1 , . . . , Nn }. Then

j > N ⇒ kx0 − xj k = ( ∑_{k=1}^n (x0,k − xj,k )² )^{1/2} < ε,

i.e., limj→∞ xj = x0 .

Remark 3.9. Proposition 3.8 allows us to reduce questions of convergence of a vector-valued sequence to
the corresponding (more familiar) questions of convergence of a sequence of real numbers.
The norm k · k that we defined in (3.1) corresponds to the standard notion of distance we are used to.
However we could have defined other alternative norms.
Definition 3.10 (Max-norm k · k∞ ). The max-norm, which is denoted by k · k∞ , is defined by

kxk∞ := max{|x1 |, . . . , |xn |}, x = (x1 , . . . , xn ). (3.7)

The following definition provides yet another norm on Rn .


Definition 3.11 (The ‘Manhattan or taxi cab norm’ k · k1 ).

kxk1 := |x1 | + · · · + |xn |, x = (x1 , . . . , xn ). (3.8)

In fact there is a full family of norms kxkp for 1 ≤ p < ∞, given by

kxkp := ( ∑_{i=1}^n |xi |^p )^{1/p} .

The Euclidean norm (3.1) corresponds to p = 2, and k · k∞ corresponds to taking the limit as p → ∞.
Exercise 3.2 (Comparison of the Euclidean norm with k · k∞ and k · k1 ). Prove that

kxk∞ ≤ kxk ≤ √n kxk∞ (3.9)

and that

kxk ≤ kxk1 ≤ √n kxk. (3.10)

Furthermore, verify that k · k∞ and k · k1 satisfy the triangle inequality and the relations stated in
Remark 3.5 for a norm; indeed, k · k1 and k · k∞ are actually norms.
This exercise shows that in Definition 3.6 we could have equivalently used k · k1 or k · k∞ instead of
k · k to define the limit of a sequence.
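The comparison inequalities of Exercise 3.2 are easy to probe numerically. The Python sketch below (an illustration, not part of the notes) checks (3.9) and (3.10) on random vectors in R7 ; a small tolerance guards against floating-point rounding.

```python
import random
from math import sqrt

def norm2(x):   return sqrt(sum(t * t for t in x))  # Euclidean norm
def norm1(x):   return sum(abs(t) for t in x)       # taxi cab norm
def norminf(x): return max(abs(t) for t in x)       # max-norm

random.seed(0)
n, eps = 7, 1e-9
for _ in range(1000):
    x = [random.uniform(-10.0, 10.0) for _ in range(n)]
    # (3.9):  ||x||_inf <= ||x|| <= sqrt(n) ||x||_inf
    assert norminf(x) <= norm2(x) + eps
    assert norm2(x) <= sqrt(n) * norminf(x) + eps
    # (3.10): ||x|| <= ||x||_1 <= sqrt(n) ||x||
    assert norm2(x) <= norm1(x) + eps
    assert norm1(x) <= sqrt(n) * norm2(x) + eps
```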
Since Proposition 3.8 reduces convergence to componentwise convergence, we have the following result.


Proposition 3.12 (Sequence sum rule). If xj converges to x, yj converges to y and a, b ∈ R then

lim_{j→∞} (axj + byj ) = ax + by.

Exercise 3.3 (Sequence product rules). Let (aj ) be a sequence of real numbers that converges to a and
let (xj ) and (yj ) be sequences of vectors in Rn that converge to x and y respectively. Prove that

(i) the sequence of vectors (aj xj ) converges to ax and

(ii) the sequence of real numbers (not vectors!) xj · yj converges to x · y.

Definition 3.13 (Boundedness of a sequence). A sequence (xj ) is bounded if ∃ M > 0 such that
kxj k ≤ M for every j ∈ N.

Proposition 3.14 (Boundedness of a convergent sequence). If (xj ) converges to x, then (xj ) is bounded.

It is possible to prove this proposition by using componentwise convergence and the boundedness of
real sequences. A more direct proof is based on the following lemma.

Lemma 3.15. If (xj ) converges to x then the sequence of real numbers kxj k converges to kxk.

Proof. Given ε > 0, ∃ N ∈ N such that j > N ⇒ kxj − xk < ε. It follows from the reverse triangle
inequality that
for j > N, | kxj k − kxk | 6 kxj − xk < ε.

Proof of boundedness of a convergent sequence. By the lemma, the convergence of (xj ) implies the con-
vergence of kxj k. The boundedness of a convergent sequence of real numbers then implies that kxj k is
bounded and therefore, by definition, (xj ) is bounded.

Remark 3.16. Note that the converse of Lemma 3.15 does not hold, not even when n = 1. (Use a
mathematical software package to plot the sequence ak = (cos k, sin k) in R2 for a demonstration of how
badly the converse of Lemma 3.15 can fail.)

Proposition 3.17 (Completeness of Rn ). Let (xj ) be a Cauchy sequence in Rn , that is, ∀ ε > 0, ∃ N ∈ N
such that j, k > N ⇒ kxj − xk k < ε. Then (xj ) converges to some x ∈ Rn .

Sketch proof. Show that each component xj,i , 1 6 i 6 n, is a Cauchy sequence of real numbers. Then
use the completeness of R and componentwise convergence of xj .

3.4 Subsequences and the Bolzano-Weierstrass theorem


The Bolzano-Weierstrass theorem is one of the most important theorems about sequences of real numbers.
It states that every bounded sequence of real numbers has a convergent subsequence. It generalises
immediately to sequences in Rn .

Theorem 3.18 (Bolzano-Weierstrass for a bounded sequence of vectors). A bounded sequence (xj ) in Rn
has a convergent subsequence (xj` ).

Sketch of the proof. The proof of the Bolzano-Weierstrass theorem in Chapter 3 of MA141 is done in R. The
argument below spells out the first two steps of an iteration that, repeated, gives a complete proof in Rn .
Let xj = (xj,1 , . . . , xj,n ) be a bounded sequence in Rn . Then xj,1 is a bounded sequence in R and therefore,
by the Bolzano-Weierstrass Theorem, it has a convergent subsequence xjk ,1 which converges to x∗1 ∈ R.
Since we are only interested in finding a subsequence, we can consider the following sequence, indexed by
k: (xjk ,1 , . . . , xjk ,n ). So far we have constructed a subsequence of the original for which the first coordinate
is a convergent sequence.


Consider now the sequence xjk ,2 . The sequence is of course bounded and therefore, by Bolzano-
Weierstrass, it has a subsequence xjkl ,2 which converges to x∗2 ∈ R. Notice that since xjk ,1 is convergent,
so is xjkl ,1 . Therefore, if we consider the sequence indexed by l, (xjkl ,1 , . . . , xjkl ,n ), we now have convergent
sequences in the first two components.
It is hopefully clear that, aside from running out of letters (and having to resort to cleverer notation), we
can repeat this procedure n times, iteratively constructing subsequences to ensure that every component
is convergent.

3.5 Continuity
3.5.1 Definitions of continuity and continuous limit
We define continuity following the results in year 1 (see Definition 1.2). The only changes are in the
dimension of the domain and the target of the function. We consider U ⊂ Rn , p ∈ U and a function
f : U → Rk .

Definition 3.19 (ε-δ Definition of Continuity). Given f : U ⊂ Rn → Rk we say that f is continuous at p


if,
∀ ε > 0, ∃ δ > 0 such that, for x ∈ U , kx − pk < δ ⇒ kf (x) − f (p)k < ε.

Notice that the two norms k · k in the definition above correspond to norms in different spaces, namely
Rn and Rk , but we do not make a distinction in the notation.

Definition 3.20 (Sequential Definition of Continuity). f : U ⊂ Rn → Rk is continuous at p if, for every


sequence (xj ) in U which converges to p, (f (xj )) converges to f (p).

Exercise 3.4. Check that these two definitions are equivalent.

Hint: The argument is the same as that given in First Year Analysis. That the ε-δ definition implies
the sequential definition is straightforward. The converse proceeds by proving the contrapositive, i.e., one
assumes the failure of the ε-δ definition and then one constructs a sequence xj in U which converges to p
but for which f (xj ) does not converge to f (p).
We say that f is continuous, without specifying a particular point, if it is continuous at all points of its
domain. If we wish to emphasize the domain U on which f is continuous, then we say that f is continuous
on U .

Notation 3.21. The space of functions continuous on U with values in Rk is denoted by C(U, Rk ) or
C 0 (U, Rk ).4 When k = 1, we simply write C(U ) or C 0 (U ).

Definition 3.22 (Continuous limit). f : U → Rk has a (continuous) limit at p ∈ U if there exists q ∈ Rk


such that

∀ ε > 0, ∃ δ > 0 such that, x ∈ U and 0 < kx − pk < δ ⇒ kf (x) − qk < ε.

We then write limx→p f (x) = q.

Just as for limits of sequences, continuous limits are unique. It is also clear that f is continuous at p if,
and only if, limx→p f (x) = f (p). Notice that the definition of continuous limit tacitly assumes that there
exist points in U , different from p, which are arbitrarily close to p.
4
The superscript will later denote the number of derivatives.


3.5.2 Separate continuity


In this subsubsection, we shall restrict ourselves to the case n = 2 and to functions defined on all of
R2 . The generalisation to higher dimensions is straightforward but the presentation is much easier in two
dimensions using x and y as variables.
Given a real valued function f (x, y), we consider two families of functions {g y : R → R}y∈R and
{hx : R → R}x∈R defined by

g y (x) := f (x, y) =: hx (y). (3.11)


Thus g s is the restriction of f to the horizontal line y = s and ht is the restriction of f to the vertical line
x = t.

Definition 3.23 (Separate continuity). A function f : R2 → R is separately continuous at (x0 , y0 ) if g y0


is continuous at x0 as a function of x and hx0 is continuous at y0 as a function of y.

Two natural questions arise:

(i) Does continuity imply separate continuity?

(ii) Does separate continuity imply continuity?

Exercise 3.5. Prove that continuity implies separate continuity.

As for question (ii), the example below shows that separate continuity does not imply continuity.

Example 3.24. Define f : R2 → R by

f (x, y) := 1 if xy ≠ 0,   f (x, y) := 0 if xy = 0.

g 0 (x) = 0 for every x ∈ R and h0 (y) = 0 for every y ∈ R. In particular, both g 0 and h0 are continuous
at 0 and therefore, f is separately continuous at (0, 0).
However, f is not continuous at (0, 0) because lim(x,y)→(0,0) f (x, y) does not exist. We can establish
this by finding two sequences (aj , bj ) and (αj , βj ), both of which converge to (0, 0), but for which
limj→∞ f (aj , bj ) ≠ limj→∞ f (αj , βj ); we also require (aj , bj ) ≠ (0, 0) and (αj , βj ) ≠ (0, 0) ∀ j ∈ N.
So take, for example, aj = αj = βj = 1/j and bj = 0. Then f (aj , bj ) = 0 and f (αj , βj ) = 1 ∀ j and
therefore limj→∞ f (aj , bj ) = 0 ≠ 1 = limj→∞ f (αj , βj ). By uniqueness of limits, if lim(x,y)→(0,0) f (x, y)
were to exist, limj→∞ f (aj , bj ) and limj→∞ f (αj , βj ) would have to have the same value. Since they do
not, we conclude that lim(x,y)→(0,0) f (x, y) does not exist.
The following are easy to verify (left as exercises):

• f is continuous at all points (x, y) such that xy ≠ 0,

• f is not separately continuous at points (x, 0) such that x ≠ 0 (because hx is then not continuous
at 0) and similarly,

• f is not separately continuous at points (0, y) such that y ≠ 0.
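The two sequences used in Example 3.24 can be written out explicitly. The Python snippet below (an illustration, not part of the notes) evaluates f along the x-axis and along the diagonal: the values are identically 0 and identically 1 respectively, so f cannot have a limit at (0, 0).

```python
def f(x, y):
    # Example 3.24: f = 1 off the coordinate axes, f = 0 on the axes.
    return 1.0 if x * y != 0 else 0.0

axis = [f(1.0 / j, 0.0) for j in range(1, 50)]      # (a_j, b_j) = (1/j, 0) -> (0, 0)
diag = [f(1.0 / j, 1.0 / j) for j in range(1, 50)]  # (alpha_j, beta_j) = (1/j, 1/j) -> (0, 0)
print(set(axis), set(diag))  # {0.0} {1.0}: two different limits along the two sequences
```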

3.5.3 Basic properties of continuous functions


Throughout this subsubsection, U ⊂ Rn , p ∈ U and a, b ∈ R.
The following results are basic properties of continuous functions with effectively the same proof as in
one dimension.

Proposition 3.25 (The sum of continuous functions is continuous). If f, g : U → Rk are both continuous
at p then, af + bg is continuous at p.

The proof of this is just an application of the sum rule for limits of sequences of vectors.


Proposition 3.26 (The product of a continuous scalar (real) valued function with a continuous vector-val-
ued function is continuous). If f : U → R and g : U → Rk are both continuous at p then, f g is continuous
at p where (f g)(x) := f (x) g(x) .
The proof of this is just an application of the product rule for a convergent sequence of real numbers
and a convergent sequence of vectors.
Exercise 3.6. Suppose that f : U → Rk is continuous at p. Prove that if f (p) ≠ 0 then there exists δ > 0
such that kf (x)k > (1/2) kf (p)k ∀ x ∈ U for which kx − pk < δ.
Exercise 3.7. Suppose that f : U → R is continuous at p ∈ U and f (x) ≠ 0 ∀ x ∈ U . Prove that 1/f is
continuous at p.
Corollary 3.27. Suppose that f : U → R and g : U → Rk are both continuous at p and that f (x) ≠
0 ∀ x ∈ U . Then g/f is continuous at p.
Proposition 3.28 (The composition of continuous functions is continuous). If U ⊂ Rn , V ⊂ Rk , f : U →
Rk is continuous at p ∈ U, f (U ) ⊂ V, g : V → Rm is continuous at f (p) ∈ V , then g ◦ f : U → Rm is
continuous at p.
The proof of this is just an application of the sequential definition of continuity.
Proposition 3.29 (Componentwise continuity). Recall that f : U → Rk can be written as

(x1 , . . . , xn ) = x ↦ f (x) = (f1 (x1 , . . . , xn ), . . . , fk (x1 , . . . , xn )), x ∈ U.

f is continuous at p if, and only if, ∀ i ∈ {1, . . . , k}, fi : U → R is continuous at p.


Remark 3.30. The proposition above says that f is continuous if, and only if, all of its component functions
are continuous. The proof is a straightforward application of the sequential definition of continuity and the
equivalence of convergence and componentwise convergence for sequences in Rn .
This proposition suggests that most (but not all5 ) of the features related to the continuity (and, as we
shall see, differentiability) of functions f : Rn → Rk that are different from those of functions f : R → R
arise when n > 1; whether k is greater than 1 is less significant.

3.5.4 Constructing continuous functions of several variables from continuous real valued
functions of a single real variable (NOT EXAMINABLE).
A function like f (x, y) = xy/(x² + y²) looks continuous on R2 \ {(0, 0)}, but how do we prove it without
resorting to the ε-δ definition or the sequential definition of continuity? We know g(x) = x and h(x) = x²
are continuous as functions of the single real variable x. However, what we need to know is that the
functions γ : R2 → R and η : R2 → R defined by

γ(x, y) := x, η(x, y) := x²

are continuous as functions of two variables. That is precisely the content of the next proposition, which
follows from the following easy lemma.
Lemma 3.31. Write Rn+` as Rn ⊕ R` , that is

Rn+` = {(x, y) : x ∈ Rn , y ∈ R` }.

Denote by π1 and π2 the two projections of Rn+` onto Rn and R` respectively:

π1 (x, y) := x, π2 (x, y) := y, x ∈ Rn , y ∈ R` .

Then π1 and π2 are continuous.


5
See, for instance, Corollary 4.25.


Proof. Fix (x0 , y0 ) ∈ Rn+` and, given ε > 0, choose δ = ε. Then

k(x, y) − (x0 , y0 )k < δ ⇒ kπ1 (x, y) − π1 (x0 , y0 )k = kx − x0 k 6 k(x, y) − (x0 , y0 )k < ε,

that is, π1 is continuous. The continuity of π2 is proved similarly.

Proposition 3.32. Consider E ⊂ R, a ∈ E and a function g : E → R. For i ∈ {1, . . . , n}, define
πi : Rn → R by

πi (x1 , . . . , xi , . . . , xn ) := xi

and let Ui := πi−1 (E) := {(x1 , . . . , xn ) ∈ Rn : xi ∈ E}. Define fi : Ui → R by fi (x1 , . . . , xn ) := g(xi ),
that is, fi (x) = g(πi (x)) = (g ◦ πi )(x). Suppose that g is continuous at a. Then fi is continuous at all
points of πi−1 {a} = {(x1 , . . . , xn ) ∈ Rn : xi = a}.

Proof. By Lemma 3.31, πi is continuous on Rn and therefore, by the continuity of compositions of continuous
functions, fi = g ◦ πi is continuous at all points of πi−1 {a}.
We can use Proposition 3.32 and the results in section 3.5.3 to prove the continuity of f (x, y) = xy/(x² + y²)
on R2 \ {(0, 0)} as follows. Consider the four functions, each defined on R2 by

γ(x, y) := x, η(x, y) := x², σ(x, y) := y, τ (x, y) := y².

Proposition 3.32 tells us that the continuity of these four functions follows from the continuity (proved in
First Year Analysis) of g(t) = t and h(t) = t² as functions of the single real variable t. Now

f (x, y) = (γ(x, y)) (σ(x, y)) / ( (η(x, y)) + (τ (x, y)) )

and therefore, the continuity of f on R2 \ {(0, 0)} follows from the continuity of the product, sum and
quotient of continuous functions at points where the denominator does not vanish.
A similar approach can be followed for most functions given by explicit formulas. However, the continuity
of a function at points where the function is given special values (not by a formula) has to be investigated
by separate arguments.
The following two examples are intended to clarify what is meant by ‘natural domain of definition’ of
a function defined by an expression involving familiar continuous functions. The natural domain of

x2 sin(y)
F (x, y) =
ex − cosh y

is R2 \ {(log(cosh(y)), y) : y ∈ R} and F is continuous on this set. Similarly,

f (x, y, z) := ( log(x + y)/sin z , arccos(y) √(1 + (cos(xe^z ))2 ) )

is continuous on {(x, y, z) ∈ R3 : x + y > 0, −1 ≤ y ≤ 1, z ≠ nπ, n ∈ Z}.


Proposition 3.32 is, in a sense, ‘obvious’ and you need not quote it explicitly when appealing to the
continuity of functions given by expressions similar to the ones above.

3.5.5 Caution with taking limits in dimension > 2


If a ∈ R then a can be approached from only two directions, namely left and right. So, limx→a f (x) exists
if the right-hand limit limx→a+ f (x) and the left-hand limit limx→a− f (x) both exist and are equal.
The situation is much more complicated in higher dimensions. It suffices to illustrate the issue by
considering some of the many different ways of approaching (0, 0) in R2 . We may, for instance, approach
(0, 0) along any line ax+by = 0. We can also follow more complicated paths from a point in R2 \{(0, 0)} to


(0, 0). For example, we can proceed along (x, x2 ), i.e. along the parabola y = x2 . Indeed, we can approach
(0, 0) along the graph of any continuous function ψ(x) for which ψ(0) = 0. This still does not exhaust all
possibilities because, for instance, we may approach (0, 0) along a spiral like (t cos(1/t), t sin(1/t)), t > 0.
So, by Proposition 3.28, if f : R2 → R is continuous at (0, 0) then limt→0 f (ϕ(t), ψ(t)) would have to
exist for any pair of functions ϕ, ψ : R → R that are continuous at 0 and equal to 0 there. This should
make it clear that continuity is much more restrictive than separate continuity and indeed, more restrictive
than continuity along lines, which we shall now define.

Definition 3.33 (continuity along lines, also called linear continuity). A function f : Rn → Rk is continuous
along lines (also referred to as linearly continuous) at x0 if the restriction f |L of f to the line L passing
through x0 is continuous for every such line L.

The line L through x0 in the direction of v ∈ Rn is parameterised by rv (t) := x0 + tv. Therefore f is
continuous along lines at x0 if f ◦ rv is continuous at t = 0 for every choice of v ∈ Rn . In particular,

lim_{t→0} f (x0 + tv) = f (x0 ) ∀ v ∈ Rn ,

that is, limt→0 f (x0 + tv) is independent of v. We have seen above that continuity implies continuity along
lines.
In the next example we will exhibit a function which is separately continuous at all points of R2
but which is not continuous along lines through (0, 0).

Example 3.34. Define f : R2 → R by


f (x, y) = xy/(x2 + y 2 ), if (x, y) ≠ (0, 0), f (0, 0) := 0.

For y ≠ 0, g y (defined by (3.11)) is a continuous function of x, and g 0 (x) = 0 ∀ x ∈ R. Therefore,
g y is continuous for any choice of y ∈ R. By similar reasoning, hx is continuous for any choice of x ∈ R.
This shows that f is separately continuous at all points of R2 . However, note that

f (t, t) = 1/2, f (t, 2t) = 2/5 and f (t, −t) = −1/2 ∀ t ∈ R \ {0}.

Therefore, lim(x,y)→(0,0) f (x, y) depends on the line in R2 along which we approach (0, 0). In particular, it
is not possible to assign any value to f at (0, 0) that would make it continuous along lines through (0, 0).
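As a quick numerical illustration (a Python sketch, not part of the original notes), one can evaluate f along the three lines used above and confirm that it is constant on each line, with a constant that depends on the line:

```python
# f(x, y) = xy/(x^2 + y^2) from Example 3.34, extended by f(0, 0) := 0
def f(x, y):
    return x * y / (x**2 + y**2) if (x, y) != (0, 0) else 0.0

# along each line through the origin f is constant, but the constant
# depends on the line, so lim_{(x,y)->(0,0)} f(x, y) cannot exist
for t in [0.1, 0.01, 0.001]:
    assert abs(f(t, t) - 1/2) < 1e-12      # along y = x
    assert abs(f(t, 2*t) - 2/5) < 1e-12    # along y = 2x
    assert abs(f(t, -t) + 1/2) < 1e-12     # along y = -x
```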

Remark 3.35. (This is not examinable) One may be tempted to think that a separately continuous
function fails to be continuous only at isolated points. This is not the case. For a fixed (a, b) ∈ R2
define f(a,b) : R2 → R by f(a,b) (x, y) := f (x − a, y − b) where f is as in Example 3.34. Let (an , bn ) be
an enumeration of Q × Q, i.e., of all points in R2 both of whose coordinates are rational. Then define
F : R2 → R by

F (x, y) := Σ_{n=0}^∞ 2^{−n} f_{(an ,bn )} (x, y).

It can be shown that F is separately continuous, but discontinuous precisely on Q × Q. Conceptually, the
sum in the definition of F disperses the discontinuity of f to all the rational points in R2 .

The next example exhibits a function f : R2 → R which is continuous along lines through (0, 0)
but which is not continuous at (0, 0).

Example 3.36. Define f : R2 → R by f (x, y) = 1 if 0 < y < x2 and f (x, y) = 0 otherwise. Show that
limt→0 f (tv) = 0 = f (0, 0) ∀ v ∈ R2 . However, show also that f is discontinuous at (0, 0).


Figure 3.1: Diagram of the function

In Figure 3.1, the grey shaded region E is the set on which f = 0, i.e.,

E := {(x, y) : f (x, y) = 0} = {(x, y) : y 6 0 or y > x2 }.

Given v ∈ R2 , ∃ τ > 0 such that |t| < τ ⇒ tv ∈ E. (If v = (a, b), take τ = |b|/a2 if ab ≠ 0
and τ = +∞ if ab = 0.) In other words, f (tv) = 0 ∀ t ∈ (−τ, τ ). It follows that limt→0 f (tv) = 0 =
f (0, 0) ∀ v ∈ R2 , as claimed.
Finally, limx→0 f (x, (1/2)x2 ) = 1 ≠ 0 = f (0, 0), which shows that f is discontinuous at (0, 0).
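The two claims just proved can be checked numerically; the following Python sketch (an illustration only, not from the notes) shows that f vanishes near 0 along every line through the origin, yet equals 1 along the parabola y = x2 /2:

```python
# f from Example 3.36: equal to 1 strictly between y = 0 and y = x^2, else 0
def f(x, y):
    return 1.0 if 0 < y < x**2 else 0.0

# along any line through the origin, points close enough to 0 lie in E
for a, b in [(1, 0), (1, 1), (2, -3), (0, 1)]:
    assert f(1e-6 * a, 1e-6 * b) == 0.0

# but along y = x^2/2 the function is identically 1 (for x != 0)
for x in [0.1, 0.01, 0.001]:
    assert f(x, x**2 / 2) == 1.0
```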
In the next example, we consider the following question: Suppose we demand that f : R2 → R be
continuous along any line in R2 and not just the ones that pass through a chosen point. Would f then
have to be continuous? Remarkably, this is still not the case, as demonstrated by the following example.

Example 3.37. Define f : R2 → R by

f (x, y) = x2 y/(x4 + y 2 ), if (x, y) ≠ (0, 0), f (0, 0) := 0.

For each k ∈ R, define g k : R → R to be the restriction of f to the line y = kx through the origin in
R2 , i.e.,

g k (x) := f (x, kx) = kx/(x2 + k 2 ) ∀ x ∈ R.

Also define g ∞ : R → R to be the restriction of f to the y-axis, i.e.,

g ∞ (y) := f (0, y) = 0 ∀ y ∈ R.

We see that, for any choice of k ∈ R ∪ {∞}, g k is continuous.


We now consider the restriction of f to the parabola y = x2 . This is given by ϕ : R → R where

ϕ(x) := f (x, x2 ) = x2 · x2 /(x4 + (x2 )2 ) = 1/2, if x ≠ 0, ϕ(0) = f (0, 0) = 0.

Thus ϕ is not continuous. It follows from Proposition 3.28 that, since g(x) := (x, x2 ) is continuous, f
cannot be continuous at (0, 0) because, if it were, then ϕ = f ◦ g would also have to be continuous, which
it is not. Indeed, we have also shown that lim(x,y)→(0,0) f (x, y) does not exist. Moreover, the function
f is continuous away from (0, 0) (we will not show this). It is also continuous along lines at (0, 0), but it
is not continuous at (0, 0).
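Numerically (a Python illustration, not part of the notes), the values of f along the lines y = kx shrink to 0 near the origin, while along the parabola y = x2 they stay at 1/2:

```python
# f from Example 3.37: x^2 y/(x^4 + y^2), extended by f(0, 0) := 0
def f(x, y):
    return x**2 * y / (x**4 + y**2) if (x, y) != (0, 0) else 0.0

x = 0.001
for k in [0, 1, -2, 10]:
    assert abs(f(x, k * x)) < 0.02         # along y = kx the values -> 0
assert abs(f(x, x**2) - 1/2) < 1e-12       # along y = x^2 they equal 1/2
```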

Chapter 4

Rudiments of topology of Rn and


Continuity

4.1 Closed and open subsets of Rn


Definition 4.1 (Closed set). X ⊂ Rn is defined to be closed if, whenever (xj ) is a sequence of points in
X which converges to x ∈ Rn then the limit x also belongs to X.
Definition 4.2 (Open set). U ⊂ Rn is defined to be open if, ∀ x ∈ U, ∃ ε > 0 such that y ∈ Rn and
ky − xk < ε ⇒ y ∈ U .
By convention, the empty set is defined to be both open and closed.
Proposition 4.3. A set is open if, and only if, its complement is closed.
Proof. Suppose that U ⊂ Rn is open and that U c is not empty (if U c is empty, then it is closed by
definition). In order to show that U c is closed, we consider a sequence (xj ) in U c which converges to
x ∈ Rn . We have to show that x lies in U c . If it does not, then, since U is open, ∃ ε > 0 such that
ky − xk < ε ⇒ y ∈ U . But limj→∞ xj = x and therefore, ∃ N ∈ N such that

j > N ⇒ kxj − xk < ε ⇒ xj ∈ U

which contradicts the assumption that xj ∉ U ∀ j ∈ N.


For the converse, let X be a closed subset of Rn whose complement X c is nonempty. To prove that
X c is open, we have to show that, given y ∈ X c , ∃ ε > 0 such that
kx − yk ≥ ε ∀ x ∈ X. (Here ε is allowed to depend on y but not on x ∈ X.) If this were not the case, then
we could find y ∈ X c and a sequence (xj ) in X such that kxj − yk ≤ 1/j. But then y = limj→∞ xj and,
since X is closed, y must belong to X, contrary to the assumption that y ∉ X.

Remark 4.4. Most textbooks first define an open set and then define a closed set to be the complement
of an open set. Of course, these textbooks then have to prove that a closed set satisfies Definition 4.1.
The definition of open set motivates the following definition.
Definition 4.5 (Open (Euclidean) ball). The open ball of radius r > 0 centred at a ∈ Rn is denoted by
B(a, r) or Br (a) and is defined by

Br (a) ≡ B(a, r) := {x ∈ Rn : kx − ak < r}.

We abbreviate Br (0) to Br and B1 to just B.


The definition of an open set can now be rephrased as

U ⊂ Rn is open if, ∀ x ∈ U, ∃ ε > 0 such that Bε (x) ⊂ U .


This is the definition of an open set that is given in most textbooks. The following proposition justifies the
use of the adjective ‘open’ in the definition of an open ball.
Proposition 4.6. An open ball is open, i.e., it satisfies the definition of an open set.
Proof. For each y ∈ B(a, r) we need to find ρy > 0 so that the open ball B(y, ρy ) ⊂ B(a, r).
To this end, set ρy = r − ky − ak. (This value of ρy is suggested by a picture which you should draw.)
Then, since ky − ak < r, we have ρy > 0 and, for x ∈ B(y, ρy ),

kx − ak ≤ kx − yk + ky − ak < ρy + ky − ak = r,

i.e., the open ball B(y, r − ky − ak) ⊂ B(a, r) as required.

Definition 4.7 (Closed ball). The closed ball of radius r > 0 centred at a ∈ Rn is denoted by B̄(a, r) or
B̄r (a) (the bar distinguishing it from the open ball) and is defined by
B̄r (a) ≡ B̄(a, r) := {x ∈ Rn : kx − ak ≤ r}.
We abbreviate B̄r (0) to B̄r and B̄1 to just B̄.
The following proposition justifies the use of the adjective ‘closed’ in the definition of a closed ball.
Proposition 4.8. A closed ball is closed, i.e., it satisfies the definition of a closed set.
Sketch proof. This proposition can be proved in at least two ways. One way is to prove that the complement
of a closed ball is open. Another way is to prove that, if (xj ) is a sequence in B̄(a, r) which converges to x,
then kx − ak ≤ r, i.e., x ∈ B̄(a, r).

Proposition 4.9 (An arbitrary union of open sets is open). If Uλ is open for all λ ∈ Λ, where Λ is an
indexing set (which could be uncountable), then ∪λ∈Λ Uλ is open.

Proof. If p ∈ ∪λ∈Λ Uλ then ∃ λ∗ ∈ Λ such that p ∈ Uλ∗ . But Uλ∗ is open and therefore, ∃ ε > 0 such
that B(p, ε) ⊂ Uλ∗ . In particular, B(p, ε) ⊂ ∪λ∈Λ Uλ , i.e., ∪λ∈Λ Uλ is open.

Definition 4.10 (ε-neighbourhood). Let E be any subset of Rn . Given ε > 0, the ε-neighbourhood
N (E, ε) of E is defined by

N (E, ε) := ∪x∈E B(x, ε).

By the previous proposition, N (E, ε) is open.


Example 4.11. Let E := Q ∩ [0, 1] ⊂ R. Enumerate the rational numbers in E by a sequence x1 , x2 , . . . .
Given ε > 0, let

O := ∪_{j=1}^∞ (xj − 2^{−j} ε , xj + 2^{−j} ε).

Then O is open. The sum of the lengths of the intervals that make up O is ε Σ_{j=1}^∞ 2^{1−j} = 2ε. Therefore,
if ε < 1/2, O cannot contain all the irrationals between zero and 1. This example shows how complicated
open sets can be.
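The length computation above can be sketched numerically (a Python illustration, not part of the notes); each interval has length 2 · 2^{−j} ε and the lengths sum to 2ε:

```python
# total length of the intervals (x_j - 2^{-j} eps, x_j + 2^{-j} eps):
# each has length 2 * 2^{-j} * eps, and the geometric series sums to 2*eps
eps = 0.1
partial = sum(2 * 2**(-j) * eps for j in range(1, 60))
assert abs(partial - 2 * eps) < 1e-12
# so for eps < 1/2 the open set O has total length < 1 and cannot
# contain every point of [0, 1]
```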
Proposition 4.12 (The finite intersection of open sets is open).
If U1 , U2 , . . . , Um are all open, then ∩_{j=1}^m Uj is also open.

Proof. If p ∈ ∩_{j=1}^m Uj then

∃ ε1 > 0 such that B(p, ε1 ) ⊂ U1 ,
. . . ,
∃ εm > 0 such that B(p, εm ) ⊂ Um .

Set ε := min{ε1 , . . . , εm } > 0. Then B(p, ε) ⊂ ∩_{j=1}^m Uj , i.e., ∩_{j=1}^m Uj is open.


Corollary 4.13. An arbitrary intersection of closed sets is closed and the finite union of closed sets is
closed.

Sketch proof. Consider the complements of the relevant closed sets and apply the preceding propositions
together with de Morgan’s laws on complements, unions and intersections.

Remark 4.14. Note that a subset of Rn may be neither open nor closed. For example, [0, 1) in R or

{(x, y) : x2 + y 2 ≤ 1} ∩ {(x, y) : y > 0}.

Note, too, that ∅ and Rn are both open and closed.

Terminology. The collection of open subsets of Rn is called a topology of Rn .

4.2 Continuity and topology


4.2.1 Continuity in terms of open sets
The ε-δ definition of continuity at p for a function f : U → Rk , p ∈ U ⊂ Rn can be phrased as

∀ ε > 0 ∃ δ > 0 such that f (B(p, δ) ∩ U ) ⊂ B(f (p), ε). (4.1)




Equivalently,
∀ ε > 0 ∃ δ > 0 such that B(p, δ) ∩ U ⊂ f −1 (B(f (p), ε)). (4.2)


Informally and pictorially, f is continuous at p if f (x) can be guaranteed to stay near f (p) (ε-near) by
requiring x to stay sufficiently close (δ-close) to p in U .

Theorem 4.15 (Continuity via open sets and closed sets). The following statements are equivalent.

(i) f : Rn → Rk is continuous at all points of Rn .

(ii) for all open subsets V of Rk , f −1 (V ) is open.

(iii) for all closed subsets F of Rk , f −1 (F ) is closed.

Proof. Suppose that f is continuous at all points of Rn and let V ⊂ Rk be open. Then, for each
p ∈ f −1 (V ), ∃ ε > 0 such that B(f (p), ε) ⊂ V . By continuity of f at p as stated in (4.2), ∃ δ(p) > 0
such that B(p, δ(p)) ⊂ f −1 (B(f (p), ε)) ⊂ f −1 (V ), which shows that f −1 (V ) is open. (We have assumed
that f −1 (V ) is not empty; if it is, it would still be an open set.)

Conversely, given p ∈ Rn and ε > 0, the ball B(f (p), ε) is open and therefore, it follows (by assumption)
that f −1 (B(f (p), ε)) is open. In particular, ∃ δ > 0 such that B(p, δ) ⊂ f −1 (B(f (p), ε)), which is precisely
the statement of the ε-δ definition of continuity as expressed by (4.2).


Finally, the equivalence of (ii) and (iii) follows from f −1 (V c ) = (f −1 (V ))c and the fact that a set is
closed if, and only if, its complement is open.

Remark 4.16. If f : Rn → Rk is continuous, then the image of an open set need not be open. For instance,
consider the constant map f (x) = 0 ∀ x ∈ Rn .
If f : Rn → Rk is continuous, then the image of a closed set need not be closed. For instance, consider
the map f : R → R given by f (x) = x2 /(x2 + 1) ∀ x ∈ R. Then f (R) = [0, 1), which is neither a closed
subset nor an open subset of R.

Example 4.17 (Open and closed sets via Theorem 4.15). Show that

(i) the unit sphere S n−1 := {x ∈ Rn : kxk = 1} is a closed subset of Rn .


(ii) The set E := {(x, y) ∈ R2 : xy sin(1/x) cos(1/y) > −1} is an open subset of R2 .

Solution.
(i) Define f ∈ C(Rn ) by f (x) = kxk. Then S n−1 = f −1 ({1}) which, by part (iii) of Theorem 4.15, is a
closed subset of Rn because {1} is a closed subset of R.
(ii) Define f : R2 → R by

f (x, y) := xy sin(1/x) cos(1/y) if xy ≠ 0, f (x, y) := 0 if xy = 0.

Then f is continuous and E = f −1 ((−1, ∞)) which, by part (ii) of Theorem 4.15, is an open subset of
R2 because (−1, ∞) is an open subset of R.

Remark 4.18. Close inspection of the proof of Theorem 4.15 will reveal that if U ⊂ Rn is open then
f : U → Rk is continuous at all points of U if, and only if, for all open subsets V of Rk , f −1 (V ) is open.
However, it is no longer true that the preimage of a closed set is necessarily closed, that is, statement (iii)
of Theorem 4.15 no longer applies.
Similarly, if U ⊂ Rn is closed then f : U → Rk is continuous at all points of U if, and only if, for all
closed subsets F of Rk , f −1 (F ) is closed. However, it is no longer true that the preimage of an open set
is necessarily open, that is, statement (ii) of Theorem 4.15 no longer applies.
The extension of Theorem 4.15 to functions f : U → Rk , where U is an arbitrary subset of Rn , requires
the notion of sets that are open/closed relative to U . We will not explore these notions in this module.

4.2.2 Continuity and sequential compactness


Definition 4.19 (Sequentially compact subset). K ⊂ Rn is sequentially compact if every sequence (xj ) in
K has a convergent subsequence (xjℓ ) whose limit is in K.

Definition 4.20. X ⊂ Rn is bounded if ∃ M > 0 such that kxk ≤ M ∀ x ∈ X.

Theorem 4.21. K ⊂ Rn is sequentially compact if, and only if, K is closed and bounded.

Proof. Suppose that K is sequentially compact. To prove that K is closed, we consider a sequence (xj ) in
K which converges to x ∈ Rn and then we have to show that x ∈ K. By the sequential compactness of
K, (xj ) has a subsequence (xjℓ ) whose limit is in K. But x = limj→∞ xj = limℓ→∞ xjℓ ∈ K. The proof
that K is closed is complete.
To prove that K is bounded, assume, for a contradiction, that it is unbounded. Then there exists a
sequence (xj ) in K such that kxj k > j ∀ j ∈ N. By the sequential compactness of K, (xj ) has a subsequence
(xjℓ ) whose limit is in K. In particular, (xjℓ ) is bounded, i.e., ∃ M > 0 such that kxjℓ k ≤ M ∀ ℓ ∈ N. But,
by definition of a subsequence, jℓ ≥ ℓ and, by the way the sequence (xj ) was chosen, kxjℓ k > jℓ . Therefore,

M ≥ kxjℓ k > jℓ ≥ ℓ ∀ ℓ ∈ N.

This clearly cannot hold and we conclude that K must be bounded.

We now assume that K is closed and bounded and prove that it is sequentially compact. So consider
an arbitrary sequence (xj ) in K. Since K is bounded, (xj ) must be bounded and, by the Bolzano-Weierstrass
theorem, it has a convergent subsequence (xjℓ ) whose limit x must be in K, since K is closed. The proof
that K is sequentially compact is complete.

Theorem 4.21 is important because it enables us to determine easily whether a set is sequentially
compact. For instance, the theorem asserts that a closed ball B(a, r) is sequentially compact without
having to check whether all its sequences contain a convergent subsequence! Similarly, we can assert that
the sphere Sn−1 (a, r) := {x ∈ Rn : kx − ak = r} is sequentially compact; it is clearly bounded and we
showed above (for a = 0 and r = 1, but the proof is virtually identical) that it is closed.


Theorem 4.22 (Continuity preserves sequential compactness). If f : K → Rk is continuous and K is


sequentially compact then f (K) is also sequentially compact.

Proof. Let (yj ) be a sequence in f (K). Then, for each j ∈ N, ∃ xj ∈ K such that f (xj ) = yj . By the
sequential compactness of K, there exists a convergent subsequence (xjℓ ) of (xj ) such that limℓ→∞ xjℓ =
x ∈ K. By continuity of f at x, limℓ→∞ yjℓ = limℓ→∞ f (xjℓ ) = f (x) ∈ f (K), i.e., f (K) is sequentially
compact.

Theorem 4.23 (Extreme Value Theorem). Let K ⊂ Rn be sequentially compact and let f : K → R be
continuous. Then ∃ x_∗ , x^∗ ∈ K such that

f (x_∗ ) ≤ f (x) ≤ f (x^∗ ) ∀ x ∈ K.

This theorem asserts that a continuous real valued function on a sequentially compact space attains
its extreme values, i.e., max and min. This theorem was proved in First Year Analysis in the case that K is
a closed, bounded interval. It is one of the most important theorems of elementary mathematical analysis
because, for instance, it is used in the proof of Rolle’s Theorem which, in turn, is used in the proof of
Taylor’s theorem.

Proof of Extreme Value Theorem. By the previous theorem and Theorem 4.21, f (K) ⊂ R must be closed
and bounded. Therefore, M := sup f (K) and m := inf f (K) are both finite because f (K) is bounded. By
definition of sup and inf, there exist sequences aj , bj ∈ f (K) such that limj→∞ aj = m and limj→∞ bj =
M . But f (K) is closed and therefore, m, M ∈ f (K), i.e., ∃ x_∗ , x^∗ ∈ K such that

f (x_∗ ) = m ≤ f (x) ≤ M = f (x^∗ ) ∀ x ∈ K.

Remark 4.24. The notion of supremum cannot be extended from R to Rk , k > 2. That is why we had
to restrict ourselves to scalar functions in the preceding theorem. The best we can do for vector-valued
functions is stated in the following corollary.

Corollary 4.25. Let K ⊂ Rn be sequentially compact and let f : K → Rk be continuous. Then ∃ x_∗ , x^∗ ∈
K such that

kf (x_∗ )k ≤ kf (x)k ≤ kf (x^∗ )k ∀ x ∈ K.

Proof. Let us note that the function g : K → R given by g(x) := kf (x)k is continuous. In order to see
this, we use the triangle inequality to obtain that for all x, y ∈ K, we have

|g(x) − g(y)| = |kf (x)k − kf (y)k| ≤ kf (x) − f (y)k . (4.3)

More precisely, given x ∈ K and ε > 0, by continuity of f at x, it follows that there exists δ > 0 such
that kf (x) − f (y)k < ε for all y ∈ K with kx − yk < δ. Substituting this into (4.3), we obtain that
|g(x) − g(y)| < ε for all such y. The result now follows from Theorem 4.23.
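The reverse triangle inequality used in (4.3) can be illustrated numerically (a Python sketch, not part of the notes):

```python
import math

def norm(u):
    # Euclidean norm on R^n
    return math.sqrt(sum(t * t for t in u))

# | ||u|| - ||v|| | <= ||u - v||, the inequality behind (4.3)
u = (1.0, -2.0, 3.0)
v = (0.5, 4.0, -1.0)
diff = tuple(a - b for a, b in zip(u, v))
assert abs(norm(u) - norm(v)) <= norm(diff)
```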

Chapter 5

The space of linear maps and matrices

Notation 5.1.

(i) The space of linear maps, i.e., {A : Rn → Rk | A is linear}, shall be denoted by L (Rn , Rk ) and
L (Rn , Rn ) shall be abbreviated to L (Rn ).1

(ii) The space of k × n matrices with real entries shall be denoted by Rk,n .2

With a matrix

(aij ) = ( a11 . . . a1n ; . . . ; ak1 . . . akn ) ∈ Rk,n (rows separated by ‘;’)

we associate (subconsciously?!) A ∈ L (Rn , Rk ) defined by

Rn ∋ x = (x1 , . . . , xn )ᵀ ↦ Ax := (aij ) x ∈ Rk , (5.1)

where the ith entry of the matrix-vector product (aij ) x is Σ_{j=1}^n aij xj .

Let {v1 , . . . , vn } and {w1 , . . . , wk } be the standard bases of Rn and Rk respectively, i.e.,

vj = (0, . . . , 0, 1, 0, . . . , 0) ∈ Rn (the 1 in the j th position among n entries),
wi = (0, . . . , 0, 1, 0, . . . , 0) ∈ Rk (the 1 in the ith position among k entries).

Then,

Avj = (a1j , . . . , akj )ᵀ = Σ_{i=1}^k aij wi , j ∈ {1, . . . , n}, (5.2)
and therefore, (aij ) is the matrix representation of A with respect to the standard bases on Rn and Rk .
On a few occasions it shall be useful to express this association of (aij ) with A as defined above more
formally as a map µ : L (Rn , Rk ) → Rk,n , i.e.,

µ(A) := (aij ), where A and (aij ) are related by (5.2). (5.3)

It is easy to verify that µ is a linear isomorphism. Moreover, since we shall be using standard bases on Rn
and Rk throughout (unless otherwise explicitly stated), we shall switch between the linear map A and the
associated matrix µ(A) = (aij ) without warning.
1
Other notations in use are HomR (Rn , Rk ) and End(Rn ).
2
Other notations in use are Rk×n , M (k × n, R), Mk×n (R) and Mkn (R). M (n, R) is sometimes used as an abbreviation
of M (n × n, R).


It is easy to check that the identification


(aij ) ←→ (a11 , . . . , a1n , a21 , . . . , a2n , . . . , ak1 , . . . , akn ) (5.4)
between Rk,n and Rnk is a linear isomorphism. It follows that
dim(L (Rn , Rk )) = dim(Rk,n ) = nk.

5.1 Two norms on the space of linear maps and matrices


We shall be discussing the continuity of maps like
f : Rn → L (Rk , R` ), f : L (Rn ) → R and f : L (Rn ) → L (Rn ).
As we have seen, this necessitates defining a notion of distance (norm) on L (Rn , Rk ), or equivalently, a
norm on Rk,n . The first such notion that comes to mind is to use the identification (5.4) and define the
so-called Frobenius norm k · kF by
 1/2
k X
n
(5.5)
X
k(aij )kF :=  a2ij  .
i=1 j=1

This is fine, but we shall make more use of the operator norm (defined below) as it turns out to be more
convenient.3
The operator norm arises from studying how large kAxk can get relative to kxk as x ranges over Rn .
We can do this using (5.1) and the Cauchy-Schwarz inequality:

kAxk² = Σ_{i=1}^k ( Σ_{j=1}^n aij xj )² ≤ Σ_{i=1}^k ( Σ_{j=1}^n a²ij )( Σ_{j=1}^n x²j ) = ( Σ_{i=1}^k Σ_{j=1}^n a²ij ) kxk² = k(aij )kF² kxk² .

In particular, we have

sup_{x∈Rn \{0}} kAxk²/kxk² ≤ k(aij )kF² . (5.6)
This makes possible the following definition.
Definition 5.2. The operator norm |||A||| of A ∈ L (Rn , Rk ) is defined by
|||A||| := sup_{x∈Rn \{0}} kAxk/kxk . (5.7)

In practice, (5.7) is often used in the form

kAxk ≤ |||A||| kxk ∀ x ∈ Rn . (5.8)

Also in practice, if one can somehow establish that A ∈ L (Rn , Rk ) satisfies

kAxk ≤ M kxk ∀ x ∈ Rn for some M > 0,

then (5.7) implies that |||A||| ≤ M .
The expression kAxk/kxk can equivalently be written as k(1/kxk) Axk and, by the linearity of A,
(1/kxk) Ax = A(x/kxk). Since kx/kxkk = 1, we have the following equivalent definition of |||A|||:

|||A||| := sup_{kxk=1} kAxk. (5.9)
3
Observe that kµ(A)kF² = trace((µ(A))T (µ(A))) = trace((µ(A))(µ(A))T ), where the superscript T denotes transpose
and the trace of a matrix is the sum of its diagonal entries.


Observe that the supremum in (5.9) is being taken over a sequentially compact set (namely, the unit
sphere S n−1 in Rn ). We shall see in Proposition 5.5 below that this can be an advantage of (5.9) over
(5.7).

5.1.1 Comparison of the two norms


Recalling from (5.3) that µ(A) = (aij ) we can rewrite (5.6) as

|||A|||² ≤ kµ(A)kF² . (5.10)

From (5.2) we see that

kµ(A)kF² = k(aij )kF² := Σ_{j=1}^n Σ_{i=1}^k a²ij = Σ_{j=1}^n kAvj k² ≤ Σ_{j=1}^n |||A|||² kvj k² = n |||A|||² . (5.11)

Combining (5.11) and (5.10) gives the comparison

(1/√n) kµ(A)kF ≤ |||A||| ≤ kµ(A)kF . (5.12)
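The comparison (5.12) can be tested numerically for a concrete 2 × 2 matrix; the Python sketch below (an illustration, not part of the notes) approximates the operator norm by sampling the unit circle as in (5.9):

```python
import math

def frob(M):
    # Frobenius norm (5.5)
    return math.sqrt(sum(a * a for row in M for a in row))

def op_norm(M, samples=100000):
    # crude approximation of (5.9): sup of ||Mx|| over the unit circle
    best = 0.0
    for i in range(samples):
        th = 2 * math.pi * i / samples
        x, y = math.cos(th), math.sin(th)
        best = max(best, math.hypot(M[0][0]*x + M[0][1]*y,
                                    M[1][0]*x + M[1][1]*y))
    return best

A = [[1.0, 2.0], [0.0, 3.0]]
op, fr = op_norm(A), frob(A)
# the comparison (5.12) with n = 2 (up to sampling error)
assert fr / math.sqrt(2) <= op + 1e-6
assert op <= fr + 1e-6
```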

5.1.2 Properties of the operator norm


The properties of |||·||| in the next proposition justify calling it a norm.
Proposition 5.3. In (i), (ii) and (iii) below, A, B ∈ L (Rn , Rk ) and a ∈ R.
(i) |||A||| = 0 ⇔ A = 0.
(ii) |||aA||| = |a| |||A|||.
(iii) Triangle inequality. |||A + B||| ≤ |||A||| + |||B|||.

Proof. The first two items are elementary and the proofs are left to the reader. For the third item,

k(A + B)xk = kAx + Bxk ≤ kAxk + kBxk ≤ (|||A||| + |||B|||)kxk

and therefore

|||A + B||| = sup_{x∈Rn \{0}} k(A + B)xk/kxk ≤ |||A||| + |||B|||.

Proposition 5.4 (Composition bound). A ∈ L (Rn , Rk ) and B ∈ L (Rk , Rm ) ⇒ BA ∈ L (Rn , Rm )
and |||BA||| ≤ |||B||| |||A|||.

Proof. k(BA)(x)k = kB(Ax)k ≤ |||B||| kAxk ≤ |||B||| |||A||| kxk and therefore, |||BA||| ≤ |||B||| |||A|||.
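For diagonal matrices the operator norm is simply the largest absolute diagonal entry, so the composition bound can be checked by hand; a small Python sketch (not part of the notes):

```python
# For diag(d1, d2) the operator norm is max(|d1|, |d2|), so Proposition 5.4
# can be verified directly on diagonal A and B:
a1, a2 = 2.0, 5.0          # A = diag(2, 5),  |||A||| = 5
b1, b2 = 3.0, 4.0          # B = diag(3, 4),  |||B||| = 4
op_A = max(abs(a1), abs(a2))
op_B = max(abs(b1), abs(b2))
op_BA = max(abs(b1 * a1), abs(b2 * a2))   # BA = diag(6, 20)
assert op_BA <= op_B * op_A               # 20 <= 20: bound attained here
```

Note that equality holds here because the largest entries of A and B sit in the same diagonal position; in general the inequality is strict.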

Proposition 5.5. A ∈ L (Rn , Rk ) is injective if, and only if, ∃ α > 0 such that kAxk ≥ αkxk ∀ x ∈ Rn .
(Note that k does not have to be equal to n.)

Proof. If kAxk ≥ αkxk for some α > 0, then Ax = 0 forces x = 0, i.e., A is injective.
The converse is proved by establishing the contrapositive, i.e., suppose that there is a sequence (xj )
in Rn \ {0} such that kAxj k/kxj k → 0 as j → ∞. Set uj := xj /kxj k. Then kuj k = 1 ∀ j ∈ N and
Auj → 0 as j → ∞. Since S n−1 is sequentially compact, there exists a subsequence (ujℓ ) which converges
to u ∈ S n−1 . Let us note that the map x ↦ kAxk is continuous. Namely, by the triangle inequality and
linearity, we have for all x, y ∈ Rn that

| kAxk − kAyk | ≤ kAx − Ayk = kA(x − y)k ≤ |||A||| kx − yk ,

from which we deduce the continuity of x ↦ kAxk. Therefore, kAuk = limj→∞ kAuj k = 0, i.e.,
u ∈ ker(A). Since kuk = 1, u ≠ 0 and it follows that A is not injective.
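For a diagonal matrix the best constant α is the smallest absolute diagonal entry; a Python sketch of this concrete instance (an illustration, not part of the notes):

```python
import math

# For A = diag(3, 1/2) the best alpha in ||Ax|| >= alpha ||x|| is 1/2
a1, a2 = 3.0, 0.5
alpha = min(abs(a1), abs(a2))
for x, y in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-2.0, 5.0)]:
    assert math.hypot(a1 * x, a2 * y) >= alpha * math.hypot(x, y) - 1e-12
# alpha is attained at x = (0, 1), so no larger constant works
assert math.hypot(a1 * 0.0, a2 * 1.0) == alpha
```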


Remark 5.6. Proposition 5.5 can be regarded as a quantitative measure of injectivity. An injective
linear map has to keep a nonzero vector x away from zero. The larger the value of α in the inequality
kAxk > αkxk, the more the linear map A pushes x away from zero.
A better way of saying this is to consider a perturbation of A by a matrix B to get the matrix A + B.
We then have the following

Proposition 5.7. Suppose that A, B ∈ L (Rn , Rk ) satisfy

kAxk ≥ αkxk ∀ x ∈ Rn for some α > 0 (5.13)

and |||B||| < α. Then A + B is still injective.

Proof.
k(A + B)xk ≥ kAxk − kBxk ≥ αkxk − |||B||| kxk = δkxk,

where δ := α − |||B||| > 0. Therefore (A + B)x = 0 ⇒ x = 0, which proves that A + B is injective.

This proposition can be interpreted as saying that if (5.13) holds then the open ball B(A, α) ⊂
L (Rn , Rk )4 is contained in the set of injective linear transformations in L (Rn , Rk ).
So, a larger value of α in (5.13) indicates that A is able to withstand perturbations by ‘larger’ (as
measured by the operator norm) linear transformations while maintaining injectivity.

5.2 Convergence and continuity in L (Rn , Rk )


These are defined in exactly the same way as for sequences in Rn and functions f : U → Rk , U ⊂ Rn .
For instance, a sequence (Aj )j∈N of linear transformations in L (Rn , Rk ) converges to A ∈ L (Rn , Rk ) if,
∀ ε > 0, ∃ N ∈ N such that j > N ⇒ |||Aj − A||| < ε.
Similarly, for r > 0, B(A, r) := {B ∈ L (Rn , Rk ) : |||B − A||| < r}.
A moment’s thought will reveal that, because of (5.12), we could also use k · kF instead of the operator
norm to define these notions. Recall that k · kF on Rk,n is the same as | · | on Rnk via (5.4) and (5.5).
Therefore, the completeness5 of Rnk that was established in Proposition 3.17, immediately implies the
completeness of Rk,n with respect to k · kF . The completeness of L (Rn , Rk ) with respect to k · k then
follows from (5.12).
Property (i) in Proposition (5.3) and the triangle inequality are precisely the properties that are needed
to prove the usual properties of limits of sequences and continuous limits, such as uniqueness of limit, sum
rule, boundedness of convergent sequences, etc..

5.2.1 Continuity of functions involving matrices or linear maps (NOT COVERED IN


CLASS AND NOT EXAMINABLE)
A function f : U → Rk,n is continuous at x ∈ U if ∀ ε > 0, ∃ δ > 0 such that ky − xk < δ ⇒
kf (y) − f (x)kF < ε. As above, since k · kF on Rk,n is the same as k · k on Rnk via (5.4) and (5.5), we
can use Proposition 3.29 to assert that

f : x ↦ (aij (x)) : U → Rk,n , where (aij (x)) denotes the k × n matrix whose (i, j) entry is aij (x),

is continuous at x if, and only if, ∀ i ∈ {1, . . . , k}, j ∈ {1, . . . , n}, x 7→ aij (x) is continuous at x.
A function F : U → L (Rn , Rk ) is continuous at x ∈ U if ∀ ε > 0, ∃ δ > 0 such that ky − xk < δ ⇒
|||F (y) − F (x)||| < ε.
4
This ball is taken with respect to the operator norm on L (Rn , Rk ); see §5.2.
5
A space X is complete if every Cauchy sequence in X converges to an element of X.


Remark 5.8. Because of (5.12), we see that F : U → L (Rn , Rk ) is continuous at x ∈ U if, and only if,
µ(F ) : U → Rk,n is continuous at x ∈ U .

This remark is very useful because it provides a practical way of checking the continuity of
F : U → L (Rn , Rk ). Namely, we simply have to check whether all the matrix entries of the
matrix representation µ(F ) (with respect to the standard bases on Rn and Rk ) are continuous.

The continuity of f : Rk,n → R` and of F : Rk,n → R`,m is defined similarly by identifying (Rk,n , k · kF )
with (Rnk , k · k), as in the next proposition and the example below it.

Proposition 5.9 (Continuity of the determinant function). The map ∆ : Rn,n → R defined by ∆(aij ) :=
det(aij ) is continuous with respect to the norm k · kF on Rn,n .

Proof. The determinant is simply a polynomial of degree n in its n2 variables

a11 , . . . , a1n , a21 , . . . , a2n , . . . , an1 , . . . , ann . (5.14)


Therefore, its continuity follows from the identifications (5.4) and (5.5) of (Rn,n , k · kF ) with (Rn² , k · k)
and the usual continuity of polynomials6 on Rn² .
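In the 2 × 2 case the determinant is the polynomial ad − bc; a Python sketch (an illustration, not from the notes) of its continuity under a small perturbation of one entry:

```python
# For 2x2 matrices, Delta(a, b, c, d) = ad - bc, a polynomial in the
# entries, hence continuous: a small perturbation changes det only a little
def det2(a, b, c, d):
    return a * d - b * c

base = det2(1.0, 2.0, 3.0, 4.0)            # = -2
for h in [1e-3, 1e-6, 1e-9]:
    # perturbing the (1,1) entry by h changes det by exactly 4h here
    assert abs(det2(1.0 + h, 2.0, 3.0, 4.0) - base) <= 5 * h
```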

Example 5.10. Define F : R2,2 → R2,2 by F (A) = A2 . If


A = ( a  b ; c  d ) ,  then  A2 = ( a2 + bc   ab + bd ; ac + cd   bc + d2 )  (rows separated by ‘;’)
and therefore, F can be viewed as ϕ : R4 → R4 defined by

ϕ(a, b, c, d) := (a2 + bc, ab + bd, ac + cd, bc + d2 ),

which is clearly continuous and therefore F is continuous.
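The identification of F with ϕ can be checked numerically (a Python sketch, not part of the notes):

```python
# Example 5.10: F(A) = A^2 viewed as phi : R^4 -> R^4
def phi(a, b, c, d):
    return (a*a + b*c, a*b + b*d, a*c + c*d, b*c + d*d)

def square(A):
    # direct 2x2 matrix product A*A
    (a, b), (c, d) = A
    return ((a*a + b*c, a*b + b*d), (c*a + d*c, c*b + d*d))

(p, q), (r, s) = square(((1.0, 2.0), (3.0, 4.0)))
assert phi(1.0, 2.0, 3.0, 4.0) == (p, q, r, s)   # both give (7, 10, 15, 22)
```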

6
See Proposition 3.32 and use the algebraic properties of continuous functions. Each term of ∆(aij ) is, in fact, linear
in each of the n2 variables (5.14) and this is what makes ∆ an example of a multilinear map. The special case of the
determinant of a 2 × 2 matrix ( a b ; c d ) may help to clarify matters. This determinant is then just the function on R4 defined by
∆(a, b, c, d) := ad − bc, which is easily seen to be continuous by Proposition 3.32 and the algebraic properties of continuous
functions.

ANALYSIS III 38
Chapter 6

The Derivative

From now on, the domain U ⊂ Rn of a function f : U → Rk shall be an open subset of Rn , unless
otherwise stated. In particular, this means that when p ∈ U and a limit like limx→p is considered, x is
allowed to approach p from any direction.

6.1 Directional derivative


The rate of change of a function of two or more variables depends on the direction in which that change
is measured. For example, f (x, y) = x increases as we move to the right along a line parallel to the x-axis
but it does not change at all when we move vertically along a line parallel to the y-axis. So we introduce
the notion of directional derivative in order to take this dependence on direction into account.
Given v ∈ Rn with v 6= 0, the line Lx,v passing through x ∈ Rn in the direction of v is parameterised
by r(t) = x + tv, t ∈ R. When v = 0, we have that r(t) = x for all t. Since U is open, ∃ τ > 0 such that
x + tv ∈ U ∀ t ∈ (−τ, τ ). The restriction gx,v of f to this segment of Lx,v is defined by

gx,v (t) := f (x + tv) ∈ Rk , |t| < τ.

Since gx,v (t) = (g1 (t), . . . , gk (t)) is a function of a single real variable, we can differentiate it component by component in the usual way.1

Definition 6.1. The directional derivative ∂v f (x) is defined by

∂v f (x) := (d/dt)|_{t=0} gx,v (t) = (d/dt)|_{t=0} f (x + tv)   (6.1)
          = lim_{t→0} [f (x + tv) − f (x)]/t.   (6.2)

Example 6.2. Calculate ∂v f (x, y) for the function f (x, y) := x2 − y 2 in the direction of v = (a, b).

Solution. f ((x, y) + t(a, b)) = (x + ta)^2 − (y + tb)^2 = x^2 + 2tax + t^2 a^2 − y^2 − 2tby − t^2 b^2. Therefore

(d/dt) f (x + tv) = 2ax + 2ta^2 − 2by − 2tb^2,

and so

∂v f (x, y) = (d/dt)|_{t=0} f (x + tv) = 2ax − 2by.
1
In the definition 3.33 of linear continuity, gx,v was denoted by f L .
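As an illustrative aside (not part of the original notes), a formula such as ∂v f (x, y) = 2ax − 2by can be sanity-checked numerically with a difference quotient; the Python snippet below is a sketch of such a check, with the point and direction chosen arbitrarily.

```python
# Sanity check of Example 6.2: approximate the directional derivative of
# f(x, y) = x^2 - y^2 in the direction v = (a, b) by a symmetric difference
# quotient of t -> f((x, y) + t(a, b)) at t = 0, and compare with 2ax - 2by.

def f(x, y):
    return x ** 2 - y ** 2

def directional_derivative(x, y, a, b, t=1e-6):
    """Symmetric difference quotient (f((x,y) + tv) - f((x,y) - tv)) / (2t)."""
    return (f(x + t * a, y + t * b) - f(x - t * a, y - t * b)) / (2 * t)

x, y, a, b = 1.5, -0.5, 2.0, 3.0
approx = directional_derivative(x, y, a, b)
exact = 2 * a * x - 2 * b * y  # the formula derived in Example 6.2
print(approx, exact)
```

Since f is quadratic, the symmetric quotient agrees with the exact value up to rounding error.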


6.1.1 Directional derivative and continuity


A function of one variable that is differentiable must, in particular, be continuous. So it is reasonable to
ask the following question.
Suppose that ∂v f (x) exists for all v ∈ Rn; does it follow that f is continuous at x?
Somewhat surprisingly, the answer is no!
Example 6.3. As in Example 3.36, let f (x, y) = 1 if 0 < y < x2 and f (x, y) = 0 otherwise. Show that
∂v f (0, 0) exists for all v ∈ R2 even though f is not continuous at (0, 0)!
Solution. As in Example 3.36, given v ∈ R2, ∃ τ > 0 such that f (tv) = 0 ∀ t ∈ (−τ, τ). It follows that

∂v f (0, 0) = lim_{t→0} [f (tv) − f (0, 0)]/t = 0 ∀ v ∈ R2.
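A numerical illustration (an aside, not part of the notes) makes the failure of continuity vivid: every difference quotient through the origin vanishes, yet f equals 1 at points arbitrarily close to (0, 0) lying between the x-axis and the parabola y = x^2.

```python
# Example 6.3 numerically: f = 1 on the region 0 < y < x^2, f = 0 elsewhere.

def f(x, y):
    return 1.0 if 0 < y < x * x else 0.0

# Along any line through the origin, f(tv) = 0 for all small t, so every
# difference quotient (f(tv) - f(0,0)) / t is 0:
directions = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0), (1.0, 0.001)]
quotients = [(f(t * a, t * b) - f(0.0, 0.0)) / t
             for (a, b) in directions
             for t in (1e-4, 1e-6, 1e-8)]
print(quotients)  # all zero: every directional derivative at (0, 0) is 0

# But along the parabola path (s, s^2 / 2) -> (0, 0), f is identically 1:
on_parabola = [f(s, s * s / 2) for s in (1e-2, 1e-4, 1e-6)]
print(on_parabola)  # all one: f is not continuous at (0, 0)
```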
So, we need a definition of derivative that is more restrictive than partial and directional derivatives
just as continuity is more restrictive than separate and linear continuity. The key idea, which we are about
to describe, is to regard the derivative as providing a ‘best’ affine linear approximation of a function. In this
process we move away from the ‘kinematic’ notion of derivative as rate of change and adopt, instead, a
‘mapping’ viewpoint of functions in which a nonlinear map is approximated by an affine linear one. Hence
the need to fully understand linear maps as in a module on Linear Algebra.

6.2 The (Fréchet) Derivative as an affine linear approximation


6.2.1 Affine linear approximation in the 1-variable case
Given x ∈ (a, b) ⊂ R, the derivative at x, f ′(x), of a function f : (a, b) → R is defined by

f ′(x) = lim_{h→0} [f (x + h) − f (x)]/h.   (6.3)
This definition cannot be readily extended to functions f : Rn → Rk because it is not possible to divide
vectors in Rk by vectors in Rn , even when n = k. We can get around this difficulty by rewriting (6.3)
(when n = k = 1) as
lim_{h→0} kf (x + h) − f (x) − f ′(x) hk / khk = 0.   (6.4)

More precisely, taking n = k = 1, we can write (6.3) as

0 = lim_{h→0} ( [f (x + h) − f (x)]/h − f ′(x) ) = lim_{h→0} [f (x + h) − f (x) − f ′(x) h]/h

and so

lim_{h→0} |f (x + h) − f (x) − f ′(x) h| / |h| = 0.

Then, rewriting f (x + h) − f (x) − f ′(x) h as f (x + h) − (f (x) + f ′(x) h), we can interpret (6.4) as saying that, for ‘small’ h, the (nonlinear) mapping h ↦ f (x + h), x fixed, is optimally approximated by the affine linear map h ↦ f (x) + f ′(x) h. Observe that this is a mapping viewpoint of the derivative, which is conceptually different from the hitherto held kinematic viewpoint of rate of change. It is this mapping viewpoint which we are able to generalise to the notion of derivative of functions of several variables.

6.2.2 The (Fréchet) Derivative


By analogy with (6.4), we make the following definition.
Definition 6.4. f : U → Rk is differentiable at x ∈ U if ∃ A ∈ L (Rn, Rk) such that

lim_{h→0} kf (x + h) − f (x) − Ahk / khk = 0.   (6.5)


As above, rewriting f (x + h) − f (x) − Ah as f (x + h) − (f (x) + Ah), we can interpret this definition


as saying that, for ‘small’ h, the (nonlinear) map h 7→ f (x + h) is optimally approximated by the affine
linear map h 7→ f (x) + Ah.

Exercise 6.1. Show that the linear map A in (6.5), if it exists, is unique. This justifies saying that the
Fréchet derivative provides the optimal linear approximation of f .

Remark 6.5. A real number a can also be viewed as the linear map A : R → R defined by Ah = ah.
Indeed a is the 1 × 1 matrix representation of A with respect to the standard basis 1 of R. Similarly, the
real number f 0 (x) in (6.3) is the 1 × 1 matrix representation of the linear map h 7→ f 0 (x) h in (6.4).

Notation 6.6. If a linear map A that satisfies (6.5) exists, it is called the (Fréchet) derivative of f at x
and it is denoted by Df (x).

Thus (6.5) can be rewritten as

lim_{h→0} kf (x + h) − (f (x) + Df (x) h)k / khk = 0.   (6.6)

Exercise 6.1 justifies calling the affine linear map h ↦ f (x) + Df (x)h the best affine linear approximation of the map h ↦ f (x + h). The ε-δ formulation of (6.6) provides us with a way of quantifying how good this approximation is. Namely, by the definition of the limit we have that

∀ ε > 0, ∃ δ > 0 so that 0 < khk < δ ⇒ kf (x + h) − (f (x) + Df (x) h)k / khk < ε.

Multiplying both sides by khk and then allowing h = 0 we deduce that

∀ ε > 0, ∃ δ > 0 so that khk < δ ⇒ kf (x + h) − (f (x) + Df (x) h)k ≤ εkhk.   (6.7)




We shall refer to (6.7) and (6.6) also as the definition of the derivative Df (x) of f at x. Note that equality
has to be allowed in (6.7) because we have allowed the possibility that h = 0, which can be convenient in
many situations.
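The first-order bound (6.7) can also be observed numerically. The sketch below (an illustrative aside, not part of the notes; the function and point are chosen arbitrarily) shows the remainder ratio kf (x + h) − f (x) − Df (x)hk/khk shrinking linearly with khk for f (x, y) = x^2 − y^2.

```python
# For f(x, y) = x^2 - y^2 at (x0, y0) = (1, 2), Df(x0, y0)(h1, h2) = 2*h1 - 4*h2.
# The ratio |f(x+h) - f(x) - Df(x)h| / ||h|| should be O(||h||).
import math

def f(x, y):
    return x * x - y * y

x0, y0 = 1.0, 2.0

def remainder_ratio(h1, h2):
    linear_part = 2 * x0 * h1 - 2 * y0 * h2  # Df(x0, y0)(h1, h2)
    error = f(x0 + h1, y0 + h2) - f(x0, y0) - linear_part
    return abs(error) / math.hypot(h1, h2)

ratios = [remainder_ratio(h, h / 2) for h in (1e-1, 1e-2, 1e-3, 1e-4)]
print(ratios)  # shrinks roughly by a factor of 10 at each step
```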

Proposition 6.7 (Differentiability implies continuity). If f : U → Rk is differentiable at x ∈ U then f is


continuous at x.

Proof. By (6.7) we have that ∀ ε > 0 ∃ δ > 0 such that

khk < δ ⇒ kf (x + h) − (f (x) + Df (x) h)k ≤ εkhk
       ⇒ kf (x + h) − f (x)k ≤ kDf (x) hk + εkhk
       ⇒ kf (x + h) − f (x)k ≤ (|||Df (x)||| + ε)khk.

Set δ∗ := min{δ, ε/(|||Df (x)||| + ε)}. Then khk < δ∗ ⇒ khk < δ and therefore we can use the above chain of implications to conclude that

khk < δ∗ ⇒ kf (x + h) − f (x)k ≤ (|||Df (x)||| + ε)khk < (|||Df (x)||| + ε)δ∗ ≤ ε.

We have proved the claim that f is continuous at x.

6.2.3 Differentiability of components of vector-valued functions


Exercise 6.2. Given f : U → Rk , f (x) = (f1 (x), . . . , fk (x)), prove that f is differentiable at x ∈ U if,
and only if, for each i ∈ {1, . . . , k}, fi : U → R is differentiable at x.

Remark 6.8. Compare Exercise 6.2 with Proposition 3.29 on componentwise continuity.


6.2.4 Relation between the derivative and directional derivative


Proposition 6.9. If Df (x) exists then ∂v f (x) exists for all v ∈ Rn and ∂v f (x) = Df (x)v. In particular,
if f is differentiable at x, then ∂v f (x) is linear in v, i.e.,

∂av+bw f (x) = a ∂v f (x) + b ∂w f (x) ∀ a, b ∈ R and ∀ v, w ∈ Rn . (6.8)

Proof. If v = 0 there is nothing to prove. So, we assume that v ≠ 0. Then, replacing h in (6.6) by tv and removing k · k where that is allowed, we get

lim_{t→0} [f (x + tv) − f (x) − Df (x)(tv)] / (t kvk) = 0.   (6.9)

Multiply both sides of (6.9) by kvk and use Df (x)(tv) = tDf (x)v, by the linearity of Df (x), so as to get

lim_{t→0} [f (x + tv) − f (x)]/t = Df (x)v,   i.e., ∂v f (x) = Df (x)v.
Finally, by linearity of Df (x),

∂av+bw f (x) = Df (x)(av + bw) = aDf (x)v + bDf (x)w = a ∂v f (x) + b ∂w f (x).

Example 6.3 shows that the converse of Proposition 6.9 does not hold.

Remark 6.10. We give another proof of Proposition 6.9 after we have proved the chain rule; see §6.6.3.

6.3 Partial derivatives, gradient and Jacobian matrix


Let {v1, . . . , vn} be the standard basis of Rn, i.e.,

vi = (0, . . . , 0, 1, 0, . . . , 0) ∈ Rn, with the 1 in the ith position among the n entries.

Definition 6.11. For 1 ≤ i ≤ n, ∂vi f (x) is called the ith partial derivative of f : U → Rk at x ∈ U. It is more simply denoted by ∂i f (x).

Since

∂vi f (x) = lim_{t→0} [f (x + tvi) − f (x)]/t
          = lim_{t→0} [f (x1, . . . , xi−1, xi + t, xi+1, . . . , xn) − f (x1, . . . , xi−1, xi, xi+1, . . . , xn)]/t,

∂i f (x) is calculated by differentiating f (x1, . . . , xn) with respect to the ith variable, treating all the other variables as constant2. It is therefore also common to write (∂f/∂xi)(x) or (∂/∂xi) f (x1, . . . , xn) instead of ∂i f (x). Bearing in mind that f (x) = (f1 (x), . . . , fk (x)) we have

∂i f (x) = (∂i f1 (x), . . . , ∂i fk (x)), and so, ∂i f (x) is a vector in Rk.

If f is a function of a few variables, say two, it is common to write f (x, y) instead of f (x1, x2), and to write fx instead of ∂f/∂x or ∂1 f. Similarly, fy is shorthand for ∂f/∂y. Similar shorthand applies to functions of 3 or 4 variables. For 5 variables or more, it is usually more convenient to number the variables, rather than choose distinct letters!
2
See Example 6.17 further down.


6.3.1 Algebraic rules for partial derivatives.


Since partial differentiation involves differentiating with respect to a single variable, the usual algebraic
rules of differentiation apply. For instance,

if f, g : U → Rk , then ∂i (f + g) = ∂i f + ∂i g.

Similarly,
if f : U → R and g : U → Rk then ∂i (f g) = (∂i f )g + f ∂i g.

6.3.2 Gradient and Jacobian matrix


Definition 6.12. The Jacobian matrix at x, ∂f (x), of f : U → Rk, f (x) = (f1 (x), . . . , fk (x)) but written as a column vector, is defined by

           ( ∂1 f1 (x)  . . .  ∂n f1 (x) )                                    ( ∂f1 (x) )
∂f (x) =   (    ...              ...     )  =  ( ∂1 f (x)  . . .  ∂n f (x) ) = (   ...   )
           ( ∂1 fk (x)  . . .  ∂n fk (x) )                                    ( ∂fk (x) )

where ∂1 f (x), . . . , ∂n f (x) are the vector-valued partial derivatives of the vector-valued function f (the values of f are vectors in Rk) and ∂f1, . . . , ∂fk are the Jacobian 1 × n matrices (row vectors) of the scalar-valued functions f1, . . . , fk.
Definition 6.13. The gradient at x, ∇f (x), of a scalar valued function f : U → R is defined to be the column vector

∇f (x) := (∂1 f (x), . . . , ∂n f (x))^T.

Thus ∇f (x) is the vector in Rn which is the transpose of the row vector ∂f (x): ∇f (x) = (∂f (x))^T.
Remark 6.14. For a scalar valued function f : U → R, ∂f (x) represents a linear functional on Rn defined by

Rn ∋ h ↦ (∂f (x))(h) := ∂1 f (x) h1 + · · · + ∂n f (x) hn ∈ R,   h = (h1, . . . , hn).

Using the Euclidean inner product, this linear functional ∂f (x) is identified with the vector ∇f (x):

(∂f (x))(h) = ∇f (x) · h.

However, be warned that the distinction between ∇f and ∂f is often suppressed, even in these notes!
Proposition 6.15. If f : U → Rk is differentiable at x ∈ U and h ∈ Rn then

Df (x)h = ∂f (x)h. (6.10)

Remark 6.16. It is important to appreciate the difference between the two sides of (6.10). On the left hand
side we have the linear map Df (x) acting on the vector h whereas on the right hand side we have the matrix
∂f (x) multiplying the vector h. In other words, ∂f (x) is the matrix representation of Df (x) with respect
to the standard bases on Rn and Rk . More formally, ∂f (x) = µ(Df (x)) where µ : L (Rn , Rk ) → Rk,n is
defined by (5.3).
Proof of Proposition 6.15. h = h1 v1 + · · · + hn vn and therefore, by linearity of Df (x),

Df (x)h = Σ_{i=1}^{n} hi Df (x)vi = Σ_{i=1}^{n} hi ∂i f (x) = ∂f (x)h,

where, in the second equality, we have used Proposition 6.9.


Example 6.17. Calculate the Jacobian matrix ∂f of f : R3 → R2 defined by

f (x, y, z) := ( e^{x^3+2y},  sin x / (1 + y^2 z^4)^{1/2} ).

Solution.

∂f (x, y, z) = ( 3x^2 e^{x^3+2y}                2e^{x^3+2y}                           0
                 cos x / (1 + y^2 z^4)^{1/2}    −y z^4 sin x / (1 + y^2 z^4)^{3/2}    −2y^2 z^3 sin x / (1 + y^2 z^4)^{3/2} ).

All the entries of the Jacobian matrix ∂f are continuous functions and we shall see in §6.7 that this implies the existence of the derivative Df (x, y, z) ∈ L (R3, R2), which is the linear map defined by

Df (x, y, z)(r, s, t) = ∂f (x, y, z) · (r, s, t)^T.
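As an aside (not part of the notes), individual entries of a Jacobian like the one just computed can be cross-checked by central differences; the Python sketch below compares two analytic entries with numerical approximations at an arbitrarily chosen point.

```python
# Check two entries of the Jacobian in Example 6.17 at (x, y, z) = (0.3, 0.2, 0.5).
import math

def f(x, y, z):
    return (math.exp(x ** 3 + 2 * y),
            math.sin(x) / math.sqrt(1 + y ** 2 * z ** 4))

def jac_entry(i, j, p, t=1e-6):
    """Central-difference approximation of the (i, j) Jacobian entry at p."""
    plus, minus = list(p), list(p)
    plus[j] += t
    minus[j] -= t
    return (f(*plus)[i] - f(*minus)[i]) / (2 * t)

p = (0.3, 0.2, 0.5)
x, y, z = p
# Analytic entries taken from the Jacobian computed above:
d1f1 = 3 * x ** 2 * math.exp(x ** 3 + 2 * y)                              # row 1, col 1
d3f2 = -2 * y ** 2 * z ** 3 * math.sin(x) / (1 + y ** 2 * z ** 4) ** 1.5  # row 2, col 3
print(abs(jac_entry(0, 0, p) - d1f1), abs(jac_entry(1, 2, p) - d3f2))
```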

6.3.3 Why so many different notations for the same thing?!


We have seen in Propositions 6.9 and 6.15 that,

when f is differentiable at x, Df (x)h = ∂h f (x) and Df (x)h = ∂f (x)h.

So why bother with three different ways of writing the same thing?
The reason is that even when Df (x) does not exist, ∂h f (x) and ∂f (x)h may both still exist but
it may happen that they are not equal! (See Example 6.19 below.) In other words, the existence
of the Jacobian matrix ∂f (x) does not guarantee that the linear map it defines is the derivative
Df (x), unless Df (x) is known to exist.
The reader would be right to wonder at this stage how to go about calculating the derivative
Df if this cannot be simply done by computing the Jacobian matrix ∂f . Fortunately, as has
already been pointed out in Example 6.17, it suffices to verify further that all the entries of ∂f
are continuous for then, by Theorem 6.30, Df exists and its matrix representation with respect
to the standard bases of Rn and Rk is given by ∂f .
However, there are situations where it may be necessary to calculate Df directly from the definition (6.6). This would be the case, for example, at points outside the natural domain of definition of f where it is assigned a special value. There are also situations where it is more convenient to calculate ∂v f directly from the definition (6.1), as in §6.5.1.

Remark 6.18. The notions of partial derivative, directional derivative and (Fréchet) derivative are the
differentiable analogues of separate continuity, continuity along lines and continuity.

Example 6.19. Define f : R2 → R by

f (x, y) = x^3/(x^2 + y^2) if (x, y) ≠ (0, 0),   f (0, 0) = 0.

(i) Show that ∂v f (0, 0) exists for all v ∈ R2 . In particular, calculate ∂f (0, 0).

(ii) Show that ∂v f (0, 0) ≠ ∂f (0, 0) v and that ∂v f (0, 0) is not linear in v.

(iii) Calculate fx (x, y) and fy (x, y) for (x, y) 6= (0, 0) and show that fx and fy are not continuous at
(0, 0).

(iv) Explain why Df (0, 0) does not exist.


Solution. We shall let v = (a, b) ∈ R2 throughout.

(i) As always, ∂(0,0) f (0, 0) = 0. If (a, b) ≠ (0, 0) we have f (ta, tb) = ta^3/(a^2 + b^2) ∀ t ∈ R and therefore,

∂v f (0, 0) = (d/dt)|_{t=0} f (ta, tb) = (d/dt)|_{t=0} [ta^3/(a^2 + b^2)] = a^3/(a^2 + b^2).

So, ∂v f (0, 0) exists for all v ∈ R2 and, in particular,

the Jacobian matrix ∂f (0, 0) = (∂(1,0) f (0, 0) , ∂(0,1) f (0, 0)) = (1, 0).

(ii) ∂f (0, 0) (a, b) = a ≠ ∂(a,b) f (0, 0), unless b = 0. ∂(a,b) f (0, 0) is not linear in (a, b) because a^3/(a^2 + b^2) is not a linear function of a and b.

(iii)

fx (x, y) = x^2 (x^2 + 3y^2) / (x^2 + y^2)^2,   fy (x, y) = −2x^3 y / (x^2 + y^2)^2.

We have seen that fx (0, 0) = ∂(1,0) f (0, 0) = 1 but limy→0 fx (0, y) = 0. Therefore, fx is not continuous at (0, 0). Similarly fy (0, 0) = ∂(0,1) f (0, 0) = 0 but limx→0 fy (x, x) = −1/2. Therefore, fy is also not continuous at (0, 0).

(iv) ∂v f (0, 0) is not linear in v and therefore, by Proposition 6.9, Df (0, 0) does not exist. The situation is
similar to that in Example 3.37 of a real valued function of two variables which fails to be continuous
even though its restriction to any line in R2 is continuous. We shall see below that the lack of
differentiability of f at (0, 0) can be understood geometrically as the failure of the graph of f (which
lies in R3 ) to have a tangent plane at (0, 0, 0).
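A numerical companion to Example 6.19 (an aside, not in the notes): the difference quotients at the origin reproduce a^3/(a^2 + b^2), and adding directions visibly breaks linearity.

```python
# Directional derivatives of f(x, y) = x^3 / (x^2 + y^2), f(0, 0) = 0.

def f(x, y):
    return x ** 3 / (x ** 2 + y ** 2) if (x, y) != (0.0, 0.0) else 0.0

def dir_deriv(a, b, t=1e-7):
    return (f(t * a, t * b) - f(0.0, 0.0)) / t

d10 = dir_deriv(1.0, 0.0)  # formula gives 1
d01 = dir_deriv(0.0, 1.0)  # formula gives 0
d11 = dir_deriv(1.0, 1.0)  # formula gives 1/2, not d10 + d01 = 1
print(d10, d01, d11)  # v -> d_v f(0,0) is not additive, so Df(0,0) cannot exist
```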

6.4 Geometric approximation and approximation of functions (NOT COV-


ERED IN CLASS AND NOT EXAMINABLE)
6.4.1 Tangent to a curve
Let r : [a, b] → Rk, r(t) = (x1 (t), . . . , xk (t)), be a continuously differentiable parameterisation of a curve C = r([a, b]) ⊂ Rk. By this we mean that the functions dx1/dt, . . . , dxk/dt are all continuous. Assume that r′(t) = (dx1/dt, . . . , dxk/dt) ≠ 0 ∀ t ∈ [a, b], i.e., the parameterisation r is regular. Using the rate of change definition of derivative given by (6.3), we can then interpret r′(t) as the vector tangent to C at r(t).3 The line Lr(t) tangent to C at r(t) is parameterised by

ℓ(h) = r(t) + r′(t)h.

But r′(t) = ∂r(t) and therefore, the affine linear approximation of h ↦ r(t + h) by h ↦ r(t) + ∂r(t)h = ℓ(h) is a parameterisation of the tangent line Lr(t). In other words, the affine linear approximation of h ↦ r(t + h) by h ↦ r(t) + ∂r(t)h for small h corresponds to the geometric approximation of C by Lr(t) near r(t).

In the special case that C is itself a line, then Lr(t) is the same as C. This is the geometric manifestation of the fact that, as discussed in Example 6.20, the best affine linear approximation of an affine linear map is itself!
3
We can also view r(t) as the position of a particle at time t and then r0 (t), also denoted ṙ(t), is the velocity of the particle.


6.4.2 Tangent plane of a surface


Let U be an open subset of R2 and let r : U → R3 be a continuously differentiable parameterisation of a
surface S = r(U ) ⊂ R3 . By this we mean that if r(u, v) = (x(u, v), y(u, v), z(u, v)) then all six partial
derivatives xu , yu , zu , xv , yv and zv are continuous. Assume that ∂r is of rank 2, the maximal rank that
it can have, at all points of U, i.e., the parameterisation r is regular. Since

ru = (xu, yu, zu),   rv = (xv, yv, zv)   and   ∂r = ( xu  xv
                                                      yu  yv
                                                      zu  zv )

we see that ∂r is of rank 2 if, and only if, ru and rv are linearly independent.4 As in the preceding discussion
for a curve C, the affine linear approximation of (h, k) 7→ r(u + h, v + k) by

(h, k) 7→ r(u, v) + ∂r(u, v)(h, k) = r(u, v) + hru (u, v) + krv (u, v)

is then a parameterisation of the plane Tr(u,v) S tangent to S at r(u, v). Once again, the affine linear
approximation of (h, k) 7→ r(u + h, v + k) for small h and k corresponds to the geometric approximation
of S by Tr(u,v) S near r(u, v).

6.4.3 Graph of a scalar function of 2 variables


Given f : U → R, U ⊂ R2 , the graph Gf of f is the surface parameterised by

r(x, y) = (x, y, f (x, y)).

For example, if f (x, y) = √(1 − x^2 − y^2), x^2 + y^2 < 1, then r(x, y) = (x, y, √(1 − x^2 − y^2)) is another parameterisation of the upper hemisphere.


Note that rx = (1, 0, fx) and ry = (0, 1, fy) are linearly independent for any function f. A parameterisation of the plane tangent to Gf at (x, y, f (x, y)) is given by

(h, k) ↦ r(x, y) + (Dr(x, y))(h, k) = (x, y, f (x, y)) + h(1, 0, fx) + k(0, 1, fy)
                                    = (x + h, y + k, f (x, y) + hfx + kfy)
                                    = (x + h, y + k, f (x, y) + (h, k) · ∇f (x, y)).

Thus we see that f is not differentiable at (x0, y0) ∈ U if, and only if, Gf does not have a tangent plane at (x0, y0, f (x0, y0)). For example, (x, y) ↦ |(x, y)| = √(x^2 + y^2) is not differentiable at (0, 0) because none of its partial derivatives exist at 0. We see this geometrically by noting that the graph of (x, y) ↦ |(x, y)| on R2 is a circular cone about the z-axis with an apex at the origin where the cone does not have a tangent plane.
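The cone example can also be probed numerically (an illustrative aside, not part of the notes): at the origin the two one-sided difference quotients of (x, y) ↦ |(x, y)| along a fixed direction disagree in sign, so no directional derivative, and hence no tangent plane, exists there.

```python
# One-sided difference quotients of the cone z = |(x, y)| at the origin.
import math

def f(x, y):
    return math.hypot(x, y)

a, b = 1.0, 2.0  # an arbitrary direction v = (a, b)
t = 1e-6
right = (f(t * a, t * b) - f(0.0, 0.0)) / t      # tends to +||v|| = +sqrt(5)
left = (f(-t * a, -t * b) - f(0.0, 0.0)) / (-t)  # tends to -||v|| = -sqrt(5)
print(right, left)  # the one-sided limits differ, so d_v f(0,0) fails to exist
```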

6.4.4 Orders of approximation of a function


For arbitrary values of n and k, it is not possible to provide simple geometric interpretations of Df (x) ∈
L (Rn , Rk ) similar to those presented above; consider, for example, n = 3 and k = 2. Therefore we have to
change our viewpoint when defining the derivative from that of rate of change or tangent line and tangent
plane to that of best approximation by a linear map. Linear maps are the simplest maps, after constant
maps, and they are fully understood (rank, eigenvalues, etc.) by the methods of linear algebra. We can
then transfer this knowledge of linear maps to differentiable maps up to an error that can be quantified by
(6.7).
Recalling Taylor’s theorem, we see that
4
For example,
r(u, v) := ((cos v)(sin u), (sin v)(sin u), cos u), 0 < v < 2π, 0 < u < π,
is a regular parameterisation of the unit sphere minus the prime meridian, i.e., the semicircle running from the North Pole
(0, 0, 1) to the South Pole (0, 0, −1) via (1, 0, 0).


(i) a function h ↦ f (x + h) which is continuous at h = 0 admits an approximation by the constant f (x). The error of the approximation is measured by ε = εkhk^0 and therefore, this approximation is said to be of zeroth order in h.

(ii) a function h ↦ f (x + h) which is differentiable at h = 0 can be approximated by the affine linear map h ↦ f (x) + Df (x) h. According to (6.7), the error of the approximation is now measured by εkhk and therefore, this approximation is said to be of first order (equivalently, linear) in h. Furthermore, for small h, εkhk ≪ ε, i.e., this first order approximation is much better (i.e., the error is smaller) than that demanded by continuity, or even Lipschitz continuity.

(iii) Later on in this module, we shall show that if h ↦ f (x + h) is twice differentiable at h = 0 then it admits an approximation of the form h ↦ f (x) + Df (x) h + (quadratic polynomial in h). The error of the approximation is now measured by εkhk^2 and therefore, this approximation is said to be of second order (equivalently, quadratic) in h. Quadratic polynomials are also studied in linear algebra under the topic of symmetric bilinear forms.

The above discussion should make clear that, when discussing derivatives of functions of several variables, the significance of derivative moves away from that of rate of change to that of approximation by polynomials which are ‘simple’ enough to be amenable to detailed study.

6.5 Examples of direct calculation of the derivative from its definition


(NOT COVERED IN CLASS AND NOT EXAMINABLE)
Example 6.20. Show that the derivative of the affine linear map f : Rn → Rk defined by
f (x) = Ax + y0 , A ∈ L (Rn , Rk ), y0 ∈ Rk ,
is given by Df (x) = A ∀ x ∈ Rn .
Solution. To calculate Df (x) we need to consider f (x + h) − f (x) and look for the term linear in h:

f (x + h) − f (x) = (A(x + h) + y0) − (Ax + y0) = Ah,

where, by linearity, A(x + h) = Ax + Ah. So, f (x + h) − f (x) − Ah = 0 ∀ h ∈ Rn. It follows from the definition of derivative by (6.6) that Df (x) = A ∀ x ∈ Rn. We have also shown that, as expected, the best affine linear approximation of the affine linear map f is f itself!
In particular, if n = 1 and f : R → Rk is the affine linear map defined by

f (t) = tv + y0,   t ∈ R,   v, y0 ∈ Rk,

then Df (t) is the linear map h ↦ hv : R → Rk and ∂f (t) = v ∀ t ∈ R. This is an extension of Remark
6.5 and a special case of Remark 6.16. Namely, a vector v ∈ Rk can also be viewed as the linear map
A : R → Rk defined by Ah = hv. Indeed v is the k × 1 matrix representation of A with respect to the
standard basis w1 , . . . , wk of Rk .

6.5.1 Differentiation of matrix-valued functions


The spaces L (Rn , Rk ) and Rk,n are both vector spaces that can be identified with Rnk and therefore, the
definition of derivative given by (6.6) can also be applied to functions with domain and/or range in these
spaces. The only change that is needed is the replacement of k · k in (6.6) by the operator norm |||·||| or
k · kF .
Example 6.21. Show directly from the definition 6.6 that the derivative of the quadratic map
f : L (Rn ) → L (Rn ) defined by f (A) := A2
is given by (Df (A))(H) = AH + HA ∀ H ∈ L (Rn ). (Note that AH + HA = 2AH only if A and H
commute, i.e., AH = HA.)


Solution. As in Example 6.20, we consider f (A + H) − f (A) and look for the term linear in H:

f (A + H) − f (A) = (A + H)(A + H) − A^2 = AH + HA + H^2.

The term linear in H is AH + HA and so, we define a linear map ΛA : L (Rn) → L (Rn) by ΛA (H) := AH + HA. Then, f (A + H) − f (A) − ΛA (H) = H^2 and therefore,

lim_{H→0} |||f (A + H) − f (A) − ΛA (H)||| / |||H||| = lim_{H→0} |||H^2||| / |||H||| ≤ lim_{H→0} |||H|||^2 / |||H||| = lim_{H→0} |||H||| = 0.

We have shown that (Df (A))(H) = ΛA (H) = AH + HA.
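As an aside (not in the notes), the formula (Df (A))(H) = AH + HA can be checked numerically by a matrix difference quotient; the sketch below uses plain 2 × 2 matrices stored as nested lists.

```python
# Difference quotient (f(A + tH) - f(A)) / t for f(A) = A^2 versus AH + HA.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def madd(X, Y, s=1.0):
    """Entrywise X + s*Y."""
    return [[X[i][j] + s * Y[i][j] for j in range(2)] for i in range(2)]

A = [[1.0, 2.0], [3.0, 4.0]]
H = [[0.5, -1.0], [2.0, 0.0]]
t = 1e-6

ApH = madd(A, H, t)                                 # A + tH
diff = madd(matmul(ApH, ApH), matmul(A, A), -1.0)   # (A + tH)^2 - A^2
quotient = [[diff[i][j] / t for j in range(2)] for i in range(2)]

expected = madd(matmul(A, H), matmul(H, A))  # AH + HA (not 2AH: A, H do not commute)
err = max(abs(quotient[i][j] - expected[i][j]) for i in range(2) for j in range(2))
print(err)  # of order t, since the remainder t^2 H^2 contributes t*H^2 to the quotient
```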

Remark 6.22. We can also calculate the directional derivative ∂H f (A) directly from its definition:

∂H f (A) = (d/dt)|_{t=0} f (A + tH)
         = (d/dt)|_{t=0} (A^2 + tAH + tHA + t^2 H^2)
         = (AH + HA + 2tH^2)|_{t=0}
         = AH + HA.

This is in agreement with Proposition 6.9 according to which (Df (A))(H) = ∂H f (A).
Indeed, since the entries of f (A) are quadratic expressions in the entries of A,5 we know that the partial derivatives of f with respect to the variables of the entries of A are continuous and therefore, as we shall see in §6.7, f is differentiable and the calculation of (Df (A))(H) can be reduced to that of ∂H f (A), which is often much simpler.

6.6 The Chain Rule (NOT COVERED IN CLASS AND NOT EXAM-
INABLE)
Theorem 6.23. Let U and V be open subsets of Rn and Rk respectively. Suppose that f : U → Rk is
differentiable at x ∈ U and that f (x) ∈ V . Suppose further that g : V → Rm is differentiable at f (x).
Then g ◦f : Rn → Rm is differentiable at x and

D(g ◦f )(x) = Dg(f (x)) ◦ Df (x). (6.11)

The following two lemmas will be useful in the proof of the Chain Rule.

Lemma 6.24. Given f : U → Rk, x ∈ U, r > 0 such that B(x, r) ⊂ U and A ∈ L (Rn, Rk), define ∆x,A f : B(0, r) → Rk by

∆x,A f (h) = [f (x + h) − f (x) − Ah]/khk if h ≠ 0,   ∆x,A f (0) = 0.   (6.12)

Then f is differentiable at x with Df (x) = A if, and only if, ∆x,A f is continuous at 0.

Proof. If ∆x,A f is continuous at 0 then limh→0 k∆x,A f (h)k = k limh→0 ∆x,A f (h)k = k∆x,A f (0)k = 0. Therefore, (6.5) holds and f is differentiable at x with Df (x) = A.
Conversely, if f is differentiable at x and we set A = Df (x) in (6.12) then, (6.6) asserts that
limh→0 k∆x,A f (h)k = 0. But then limh→0 ∆x,A f (h) = 0 = ∆x,A f (0), which is precisely the statement
that ∆x,A f is continuous at 0.

Notation 6.25. If f is differentiable at x, then we let ∆x f (h) denote ∆x,Df (x) f (h).
5
For instance, if A = ( a  b ; c  d ) then f (A) = ( a^2 + bc   ab + bd ; ca + dc   cb + d^2 ).


Lemma 6.26. Let τ > 0 and consider a function δ from the open ball Bτ ⊂ Rn to Rk defined by

δ(h) := ξ(h) η(h), 0 < khk < τ,   δ(0) := 0,

where ξ : Bτ \ {0} → R is bounded and η : Bτ → Rk is continuous at 0 ∈ Bτ with η(0) = 0. Then δ is continuous at 0 ∈ Bτ.

Proof. By continuity of η at 0, given ε > 0, ∃ σ ∈ (0, τ) such that khk < σ ⇒ kη(h)k < ε. By boundedness of ξ, ∃ M > 0 such that |ξ(h)| < M ∀ h ∈ Bτ \ {0}. Therefore, 0 < khk < σ ⇒ kδ(h)k < M ε, i.e., limh→0 δ(h) = 0 = δ(0) and this completes the proof of the lemma.
Proof of Chain Rule. As in the proof of Proposition 6.7 we have
f (x + h) = f (x) + Df (x)h + ∆x f (h)khk
and
g(f (x) + k) = g(f (x)) + Dg(f (x))k + ∆f (x) g(k)kkk (6.13)
where

∆x f (h) := [f (x + h) − f (x) − Df (x)h]/khk if h ≠ 0,   ∆x f (0) := 0,

and

∆f (x) g(k) := [g(f (x) + k) − g(f (x)) − Dg(f (x))k]/kkk if k ≠ 0,   ∆f (x) g(0) := 0.
Set k(h) := Df (x)h + ∆x f (h)khk in (6.13). Then, by linearity of Dg(f (x)),

g(f (x + h)) = g(f (x)) + Dg(f (x))(Df (x)h) + khk Dg(f (x))(∆x f (h)) + kk(h)k ∆f (x) g(k(h)).

Therefore,

g(f (x + h)) − g(f (x)) − (Dg(f (x)) ◦ Df (x))h = khk (δ1 (h) + δ2 (h)),

where

δ1 (h) := Dg(f (x))(∆x f (h)),   and   δ2 (h) := [kk(h)k/khk] ∆f (x) g(k(h)) for h ≠ 0,   δ2 (0) := 0.
The proof of the Chain Rule will be complete once we prove that

lim_{h→0} kδ1 (h)k = 0   and   lim_{h→0} kδ2 (h)k = 0.

We start with δ1 (h). We have

kδ1 (h)k ≤ |||Dg(f (x))||| k∆x f (h)k

and, by differentiability of f at x, we have lim_{h→0} k∆x f (h)k = 0. It follows immediately that lim_{h→0} kδ1 (h)k = 0.

We move on to δ2 (h). For h ≠ 0, set

ξ(h) := kk(h)k/khk ≤ kDf (x)hk/khk + k∆x f (h)k ≤ |||Df (x)||| + k∆x f (h)k.
The continuity of ∆x f at 0 implies that ξ(h) is bounded on Bτ \ {0} for some τ > 0. Next set
η(h) := ∆f (x) g(k(h)).
k(h) is a continuous function of h and k(0) = 0. Therefore, by differentiability of g at f (x) and Proposition
6.7, η(h) is a continuous function of h and η(0) = 0. We may therefore apply Lemma 6.26 to δ2 (h) =
ξ(h) η(h) to conclude that limh→0 kδ2 (h)k = 0.
The proof that g ◦f is differentiable at x and (6.11) holds is complete.


6.6.1 Jacobian form of chain rule


The linear isomorphism µ : L (Rn , Rk ) → Rk,n defined by (5.3) takes composition of linear transformations
to matrix multiplication. Therefore, under the same hypotheses as for Theorem 6.23 we have

∂(g ◦f )(x) = ∂g(f (x)) · ∂f (x),  where · stands for matrix multiplication.   (6.14)

More explicitly,

( ∂1 (g1 ◦f )(x)   . . .   ∂n (g1 ◦f )(x) )
(       ...                      ...      )  =
( ∂1 (gm ◦f )(x)   . . .   ∂n (gm ◦f )(x) )

( ∂1 g1 (f (x))   . . .   ∂k g1 (f (x)) )   ( ∂1 f1 (x)   . . .   ∂n f1 (x) )
(      ...                     ...      ) · (    ...                 ...    )    (6.15)
( ∂1 gm (f (x))   . . .   ∂k gm (f (x)) )   ( ∂1 fk (x)   . . .   ∂n fk (x) )


The entry in the j th row and ith column of ∂(g ◦f )(x) can be written as (∂/∂xi) gj (f (x)). If we set y = f (x) and we see g as a function of y = (y1, . . . , yk) then the entry in the j th row and rth column of ∂g(f (x)) can be written as (∂gj/∂yr)(f (x)). Then, (6.15) can be written as

(∂/∂xi) gj (f (x)) = Σ_{r=1}^{k} (∂gj/∂yr)(f (x)) (∂yr/∂xi)(x),   where by (∂yr/∂xi)(x) we mean (∂fr/∂xi)(x).   (6.16)

(6.16) is perhaps more memorable than (6.15) because we can imagine cancelling ∂yr from the denominator and numerator in the terms of the sum in (6.16). However, it is important to appreciate the difference between (∂/∂xi) gj (f (x)) and (∂gj/∂yi)(f (x)). In the first expression, the function gj ◦f is being differentiated with respect to its ith variable (1 ≤ i ≤ n) and evaluated at x whereas in the second expression it is the function gj that is being differentiated with respect to its ith variable (1 ≤ i ≤ k) and then evaluated at y = f (x).

6.6.2 Calculating with the chain rule and gradient


Given f : Rn → R and g : R → R, the ith partial derivative of g ◦f can be computed using (6.16) with k = 1 and m = 1:

∂i (g ◦f )(x) = g′(f (x)) ∂i f (x)

and therefore,

∇(g ◦f )(x) = g′(f (x)) ∇f (x).
In the example that follows, the gradient of a function will be written as a row vector (to make it easier to
write)!
Example 6.27. Calculate ∇kxk, x ∈ Rn \ {0}, by applying the chain rule to g ◦f where f (x) := kxk^2 and g(t) := √t, t > 0.

Solution. g′(t) = 1/(2√t) and, since f (x) = x1^2 + · · · + xn^2, we have that ∂i f (x) = 2xi, i.e., ∇f (x) = (∂1 f (x), . . . , ∂n f (x)) = (2x1, . . . , 2xn) = 2x. Therefore,

∇kxk = ∇(g ◦f )(x) = g′(f (x)) ∇f (x) = [1/(2√(kxk^2))] 2x = x/kxk.   (6.17)


The component form of (6.17) is

(∂/∂xi) kxk = xi / kxk.
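A quick numerical check of (6.17) (an aside, not part of the notes): central differences of kxk reproduce xi/kxk away from the origin.

```python
# Gradient of the Euclidean norm at x = (3, -4, 12), where ||x|| = 13.
import math

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def partial_norm(x, i, t=1e-7):
    plus, minus = list(x), list(x)
    plus[i] += t
    minus[i] -= t
    return (norm(plus) - norm(minus)) / (2 * t)

x = [3.0, -4.0, 12.0]
grad = [partial_norm(x, i) for i in range(3)]
exact = [xi / norm(x) for xi in x]  # x / ||x||, as in (6.17)
print(max(abs(g - e) for g, e in zip(grad, exact)))
```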

Another common application of the chain rule occurs in the calculation of the derivative of f ◦ r where r : R → Rn is a parameterisation of a path in Rn and f : Rn → R is a scalar function. In this case, if r(t) = (x1 (t), . . . , xn (t)) then

(f ◦ r)′(t) = Σ_{i=1}^{n} (∂f/∂xi)(r(t)) (dxi/dt) = ∇f (r(t)) · r′(t).   (6.18)

Example 6.28. Fix x ∈ Rn and define r : R → Rn by r(t) := tx, i.e., if x 6= 0 then r is a parameterisation
of the line through 0 and x. Given f : Rn → R calculate (f ◦ r)0 (t) in terms of ∇f .

Solution. r′(t) = x and therefore, by (6.18), (f ◦ r)′(t) = x · ∇f (tx). Equivalently,

(d/dt) f (r(t)) = x1 (∂f/∂x1)(tx) + · · · + xn (∂f/∂xn)(tx).
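As an illustrative aside (not in the notes), Example 6.28 can be tested with a concrete choice of f; here f (x1, x2, x3) = x1^2 x2 + x3 is an arbitrary smooth function chosen for the check.

```python
# Check (f o r)'(t) = x . grad f(tx) for r(t) = t x.

def f(p):
    return p[0] ** 2 * p[1] + p[2]

def grad_f(p):  # grad f = (2 x1 x2, x1^2, 1)
    return [2 * p[0] * p[1], p[0] ** 2, 1.0]

x = [1.0, 2.0, -1.0]
r = lambda t: [t * xi for xi in x]

t0, h = 0.7, 1e-6
lhs = (f(r(t0 + h)) - f(r(t0 - h))) / (2 * h)           # (f o r)'(t0), numerically
rhs = sum(xi * gi for xi, gi in zip(x, grad_f(r(t0))))  # x . grad f(t0 x)
print(abs(lhs - rhs))
```

Here f (r(t)) = 2t^3 − t, so both sides equal 6 t0^2 − 1 = 1.94.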

6.6.3 Another proof of Proposition 6.9


Given v ∈ Rn, ∃ δ > 0 such that x + tv ∈ U ∀ t ∈ (−δ, δ). As in Example 6.28 define r : (−δ, δ) → Rn by r(t) := x + tv, i.e., r is a parameterisation of a line segment through x in the direction of v. Then ∂v f (x) := (d/dt)|_{t=0} f (r(t)) and therefore, by the chain rule,

∂v f (x) = Df (x) [(d/dt)|_{t=0} (x + tv)] = Df (x)v.

6.6.4 Application of chain rule to the verification of a PDE satisfied by a function


Example 6.29. Suppose that f : R → R is differentiable and define u : R2 → R by u(x, y) := f (x^2 e^{−y}). Show that u satisfies the PDE x ux + 2 uy = 0.

Solution.

ux (x, y) = f ′(x^2 e^{−y}) (∂/∂x)(x^2 e^{−y}) = 2x e^{−y} f ′(x^2 e^{−y}),
uy (x, y) = f ′(x^2 e^{−y}) (∂/∂y)(x^2 e^{−y}) = −x^2 e^{−y} f ′(x^2 e^{−y}).

Therefore x ux + 2 uy = 2x^2 e^{−y} f ′(x^2 e^{−y}) + (−2x^2 e^{−y} f ′(x^2 e^{−y})) = 0.
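The PDE in Example 6.29 can also be verified numerically for a concrete choice of f (an illustrative aside, not part of the notes; f (s) = sin s is an arbitrary choice):

```python
# With f(s) = sin(s), u(x, y) = sin(x^2 * e^(-y)) should satisfy x u_x + 2 u_y = 0.
import math

def u(x, y):
    return math.sin(x ** 2 * math.exp(-y))

def u_x(x, y, h=1e-6):
    return (u(x + h, y) - u(x - h, y)) / (2 * h)

def u_y(x, y, h=1e-6):
    return (u(x, y + h) - u(x, y - h)) / (2 * h)

residuals = [abs(x * u_x(x, y) + 2 * u_y(x, y))
             for (x, y) in [(0.5, 0.1), (1.2, -0.3), (2.0, 1.5)]]
print(residuals)  # all close to zero
```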

6.7 Continuity of partial derivatives implies differentiability (NOT COV-


ERED IN CLASS AND NOT EXAMINABLE)
As noted in Example 6.17, partial derivatives are easily computed using the familiar rules of differentiation.
Unfortunately, as Example 6.19 shows, the existence of all partial derivatives of a function at all points
does not guarantee its differentiability. However, it is worth noting that in Example 6.19 Df was shown to
not exist at a point where the partial derivatives are discontinuous. The partial derivatives of f in Example
6.3 do not even exist at points of the form (x, x^2), x ≠ 0, and f_y does not exist at points on the real axis
different from (0, 0). These examples raise the possibility that continuity of the partial derivatives may
guarantee differentiability. That is the content of the next theorem.

ANALYSIS III 51
CHAPTER 6. THE DERIVATIVE

Theorem 6.30. Consider f : U → Rk and suppose there exists B(x, r) ⊂ U such that the Jacobian matrix
∂f (y) exists at all points of B(x, r) and that ∂f is continuous at x. Then f is differentiable at x and
Df (x) h = ∂f (x) h ∀ h ∈ Rn .

Remark 6.31. Recall from §5.2.1 that, writing f as (f1 , . . . , fk ), ∂f is continuous at x if, and only if,
∂1 f1 , . . . , ∂n f1 , ∂1 f2 , . . . , ∂n f2 , . . . , ∂1 fk , . . . , ∂n fk are all continuous at x.
Note, too, that we need to make an assumption on the behaviour of the partial derivatives of f at all
points y sufficiently near x in order to conclude the existence of Df at just x.

Proof. We shall only give the proof in the simplest case n = 2, k = 1.


For 0 < ‖(h_1, h_2)‖ < r define

∆f(h_1, h_2) := f(x_1 + h_1, x_2 + h_2) − f(x_1, x_2) − h_1 ∂_1 f(x_1, x_2) − h_2 ∂_2 f(x_1, x_2).   (6.19)

We need to show that


lim_{(h_1,h_2)→(0,0)} ∆f(h_1, h_2) / ‖(h_1, h_2)‖ = 0.   (6.20)

If we succeed, then we would have proved that Df (x1 , x2 ) is the linear map from R2 to R defined by

Df (x1 , x2 )(h1 , h2 ) := h1 ∂1 f (x1 , x2 ) + h2 ∂2 f (x1 , x2 ).

Partial derivatives only provide information along lines parallel to the axes. Therefore, we have to break
f (x1 + h1 , x2 + h2 ) − f (x1 , x2 ) into differences along the axes as follows:
 
f(x_1 + h_1, x_2 + h_2) − f(x_1, x_2) = [f(x_1 + h_1, x_2 + h_2) − f(x_1 + h_1, x_2)] + [f(x_1 + h_1, x_2) − f(x_1, x_2)]
                                      = II + I.

The second term I can be written in terms of ∂1 f by applying the mean value theorem to f (·, x2 ), x2
fixed. Namely,

∃ θ1 ∈ (0, 1) such that f (x1 + h1 , x2 ) − f (x1 , x2 ) = h1 ∂1 f (x1 + θ1 h1 , x2 ). (6.21)

Similarly, for the first term II,

∃ θ2 ∈ (0, 1) such that f (x1 + h1 , x2 + h2 ) − f (x1 + h1 , x2 ) = h2 ∂2 f (x1 + h1 , x2 + θ2 h2 ). (6.22)

Substituting (6.21) and (6.22) in the definition (6.19) of ∆f (h1 , h2 ) we get



∆f(h_1, h_2) = h_1 [∂_1 f(x_1 + θ_1 h_1, x_2) − ∂_1 f(x_1, x_2)]
             + h_2 [∂_2 f(x_1 + h_1, x_2 + θ_2 h_2) − ∂_2 f(x_1, x_2)].   (6.23)

By continuity of ∂1 f and ∂2 f at (x1 , x2 ), given ε > 0, ∃ δ > 0 (which we may, and shall, assume to
be less than r) such that

‖(h̃_1, h̃_2)‖ < δ ⇒ |∂_1 f(x_1 + h̃_1, x_2 + h̃_2) − ∂_1 f(x_1, x_2)| < ε
                and |∂_2 f(x_1 + h̃_1, x_2 + h̃_2) − ∂_2 f(x_1, x_2)| < ε.   (6.24)

Now ‖(h_1, θ_2 h_2)‖ < ‖(h_1, h_2)‖ and ‖(θ_1 h_1, 0)‖ < ‖(h_1, h_2)‖ and therefore, if ‖(h_1, h_2)‖ < δ then, using
(6.24) in (6.23) with (h̃_1, h̃_2) = (θ_1 h_1, 0) in the first term and (h̃_1, h̃_2) = (h_1, θ_2 h_2) in the second term, yields

|∆f(h_1, h_2)| < ε(|h_1| + |h_2|) ≤ ε √2 ‖(h_1, h_2)‖.
In other words, we have established (6.20) and completed the proof of Theorem 6.30.


6.7.1 The space of continuously differentiable functions


Definition 6.32. Suppose that f : U → Rk is differentiable on U . Then f is said to be continuously
differentiable at p ∈ U if the map x ↦ Df(x) : U → L(R^n, R^k) is continuous at p. More explicitly,

∀ ε > 0, ∃ δ > 0 such that ‖x − p‖ < δ ⇒ |||Df(x) − Df(p)||| < ε.

Proposition 6.33. f : U → R^k is continuously differentiable on U if, and only if, ∂f : U → R^{k,n} is
continuous on U.

Proof. Recall that L (Rn , Rk ) and Rk,n are identified via the map µ : L (Rn , Rk ) → Rk,n defined by
(5.3) and that, by Proposition 6.15, if Df (x) exists, then ∂f (x) = µ(Df (x)). Furthermore, by the
discussion in §5.2.1, the continuity at p of x 7→ Df (x) : U → L (Rn , Rk ) implies the continuity at p of
x 7→ ∂f (x) : U → Rk,n .
Conversely, if ∂f is continuous on U then Theorem 6.30 assures us that Df exists at all points of U
and that ∂f = µ ◦ Df . We can then appeal again to the discussion in §5.2.1 to assert that the continuity
at p of ∂f implies the continuity at p of Df .

This proposition is useful because it provides us with a practical way of checking continuous
differentiability, namely, we simply have to compute all the first order partial derivatives ∂i fj
of f = (f1 , . . . , fk ) and verify that they are all continuous. This means that most functions
that one can explicitly write down in terms of polynomials, exponentials, logarithms, etc. are
continuously differentiable on their ‘natural’ domain of definition.

Notation 6.34.
C 1 (U, Rk ) := {f : U → Rk | ∂f : U → Rk,n is continuous}.
C 1 (U ) := C 1 (U, R).

Chapter 7

Complex Analysis

This part of the course is an introduction to complex analysis. The main topics will be complex differ-
entiability, power series and contour integrals. Basic notions and properties for complex numbers were
introduced in Year 1, so we only provide a quick review here.

7.1 Review of basic facts about C


The field of complex numbers is given by

C = {z = x + iy, x, y ∈ R},

with i^2 = −1. For z = x + iy as above we say that x is the real part of z, denoted by x = Re z, and that
y is the imaginary part of z, denoted by y = Im z. By |z| we denote the modulus (or norm) of z, given by
√(x^2 + y^2). We denote by z̄ the complex conjugate of z. That is, if z = x + iy then z̄ = x − iy. It is easy
to see that

1. z̄¯ = z,

2. z + w = z̄ + w̄,

3. zw = z̄ w̄,

4. |z|2 = z z̄ and |z̄| = |z|.

Notice that we can identify C with R^2, simply by identifying z = x + iy with (x, y). In this way |z|
corresponds to the Euclidean norm in R^2. We will not use ‖·‖, the notation that we used for the norm in R^2.
The notions of convergence, open and closed for C are identical to those in the plane (see Definition 4.1).

Definition 7.1. We say that (z_n)_{n=1}^∞ ⊂ C converges to z if and only if |z_n − z| tends to zero as n goes to
∞. That is, if for every ε > 0 there exists N ∈ N such that |z_n − z| < ε for all n > N.

Definition 7.2. We say that Ω ⊂ C is open if and only if for every x ∈ Ω there exists r > 0 such that
B_r(x) = {z ∈ C : |z − x| < r} ⊂ Ω. We say that Ω is closed if and only if Ω^c is open.

Definition 7.3. A set K ⊂ C is sequentially compact if and only if every sequence (x_j)_{j∈N} ⊂ K has a
convergent subsequence (x_{j(l)})_{l∈N} whose limit is in K.

Now, maps f : Ω ⊂ C → C are given by a pair of real-valued functions f(z) = u(z) + iv(z), the
real part u of f and the imaginary part v of f . We can think of those two functions as functions of z or
as functions in R2 of x and y, the real and imaginary part of z. This means we can also think of f as a
function from Ω ⊂ R2 to R2 .


Definition 7.4. Given f : Ω ⊂ C → C we say that it is continuous at z_0 ∈ Ω if and only if for every ε > 0
there exists δ > 0 such that |z − z_0| < δ, with z ∈ Ω, implies that |f(z) − f(z_0)| < ε.

Notice that the notion of continuity coincides with the one defined for maps from R2 to R2 . We will
now consider the notion of differentiability, where the two notions differ very significantly.
Recall that a function f : Rn → Rk is differentiable at a point p if and only if there exists a linear map
Df (p) ∈ L (Rn ; Rk ) such that

lim_{h→0} ‖f(p + h) − f(p) − Df(p)h‖ / ‖h‖ = 0.   (7.1)

The reason for introducing that definition arose from the fact that when k > 1 we have no notion of
division for the quantity we would like to study

lim_{h→0} (f(p + h) − f(p)) / h
as division by h ∈ Rn , n > 1 is not well defined. However, in C we do have a notion of multiplication and
therefore we can use that quotient to define differentiability.

Definition 7.5. Let Ω ⊂ C be an open set and z ∈ Ω. We say that f is complex differentiable at z if and
only if the limit
lim_{h→0} (f(z + h) − f(z)) / h   (7.2)

exists. We denote the limit by f'(z).

In contrast to what happened in the real valued case, where the derivative was a linear map from Rn to
Rk , which in our case would mean from R2 to R2 , corresponding to a 2 by 2 matrix, in the complex case we
obtain a complex number. Before studying how to reconcile this difference, we look at the consequences
of the definition for the real and imaginary part of f . Let’s write h = ∆x + i∆y, and f (z) = u(z) + iv(z),
which we can also think of as f (x, y) = u(x, y) + iv(x, y). Then the quotient in the definition of complex
derivative can be rewritten as
(f(z + h) − f(z)) / h = [u(x + ∆x, y + ∆y) − u(x, y) + i(v(x + ∆x, y + ∆y) − v(x, y))] / (∆x + i∆y).

We could consider multiple ways of sending ∆x + i∆y to zero, obtaining the same answer if the limit exists.
We will consider the two obvious options, sending ∆x first to zero followed by ∆y, and the reverse, ∆y
first followed by ∆x. We find

lim_{∆y→0} lim_{∆x→0} [u(x + ∆x, y + ∆y) − u(x, y) + i(v(x + ∆x, y + ∆y) − v(x, y))] / (∆x + i∆y)

  = lim_{∆y→0} [u(x, y + ∆y) − u(x, y) + i(v(x, y + ∆y) − v(x, y))] / (i∆y)

  = (1/i) lim_{∆y→0} [ (u(x, y + ∆y) − u(x, y))/∆y + i (v(x, y + ∆y) − v(x, y))/∆y ]

  = (1/i) [ ∂u/∂y (x, y) + i ∂v/∂y (x, y) ] = v_y(x, y) − i u_y(x, y),
while
lim_{∆x→0} lim_{∆y→0} [u(x + ∆x, y + ∆y) − u(x, y) + i(v(x + ∆x, y + ∆y) − v(x, y))] / (∆x + i∆y)

  = lim_{∆x→0} [u(x + ∆x, y) − u(x, y) + i(v(x + ∆x, y) − v(x, y))] / ∆x

  = lim_{∆x→0} [ (u(x + ∆x, y) − u(x, y))/∆x + i (v(x + ∆x, y) − v(x, y))/∆x ]

  = ∂u/∂x (x, y) + i ∂v/∂x (x, y) = u_x(x, y) + i v_x(x, y).
This immediately means that at the very least we need to demand some relationships between the partial
derivatives of u and v to hold in order to have a complex derivative. Namely

u_x = v_y ,    u_y = −v_x .   (7.3)

These equations are known as the Cauchy–Riemann equations. These are clearly necessary conditions, but
at this point in no way guarantee that a complex derivative would exist if (7.3) is satisfied.
By considering two simple examples, it is easy to see that the notion of complex derivative is highly
restrictive as functions that are obviously smooth when considered as a map from R2 to R2 are not actually
complex differentiable. First we consider f (z) = z. Notice that f 0 (z) exists and equals 1. Indeed
f'(z) = lim_{h→0} (z + h − z)/h = 1.
However if we consider g(z) = z̄, we obtain a function that is not complex differentiable. We have

lim_{h→0} (g(z + h) − g(z))/h = lim_{h→0} (z̄ + h̄ − z̄)/h = lim_{h→0} h̄/h,
a limit that does not exist. (Consider for example the limits obtained by taking h along the real or the
imaginary axis.) The function g does not satisfy the Cauchy–Riemann equations. We have g(z) = x − iy,
and therefore
ux = 1, vy = −1, uy = 0, vx = 0.
When considering g as a function from R^2 to R^2, we have g(x, y) = (x, −y), which is clearly a differentiable
function, as all components are smooth. (The existence of continuous partial derivatives suffices
to obtain differentiability, see Theorem 6.30.)
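The direction-dependence of the difference quotient for g(z) = z̄ can be seen numerically. This is an illustrative sketch, not from the notes; the quotient equals conj(h)/h exactly, so its value is +1 along the real axis and −1 along the imaginary axis.

```python
# For g(z) = conj(z), the increment g(z+h) - g(z) equals conj(h), so the
# difference quotient is conj(h)/h: its value depends on the direction of h,
# hence no complex derivative exists.
def quotient_increment(z, h):
    return (z + h).conjugate() - z.conjugate()  # equals conj(h)

z = 1 + 2j
h = 1e-4                      # step along the real axis
q_real = quotient_increment(z, h + 0j) / (h + 0j)
q_imag = quotient_increment(z, 1j*h) / (1j*h)  # step along the imaginary axis

assert abs(q_real - 1) < 1e-9  # quotient tends to +1 along the real axis
assert abs(q_imag + 1) < 1e-9  # quotient tends to -1 along the imaginary axis
```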
Definition 7.6. We say that f : Ω → C is analytic (or holomorphic) in a neighbourhood U of z if it is
complex differentiable everywhere in U . We say that f is entire if it is analytic in the whole of C.
A function can be differentiable at one point, but not necessarily analytic. Consider as an example
the function f (z) = |z|2 . We will show that the function is complex differentiable at 0, but that it is not
analytic, as it is not complex differentiable outside the origin. Notice that f (z) = x2 + y 2 , and u = x2 + y 2
and v = 0. When computing the Cauchy–Riemann equations we find

u_x = 2x,    u_y = 2y,    v_x = v_y = 0.

The Cauchy–Riemann equations mean 2x = 0 and 2y = 0, which is only satisfied at the origin. Now, to
check that f is complex differentiable at the origin

[ (|z + h|^2 − |z|^2) / h ]_{z=0} = |h|^2 / h = h̄ → 0   as h → 0,

proving that f is complex differentiable at the origin with derivative 0.
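The same behaviour shows up numerically. The sketch below (illustrative, not from the notes) evaluates the difference quotient of f(z) = |z|^2 at the origin, where it vanishes, and at z = 1, where the real and imaginary directions give different limits.

```python
# Difference quotient of f(z) = |z|**2 at a point z with increment h.
def quot(z, h):
    return (abs(z + h)**2 - abs(z)**2) / h

# At the origin the quotient is conj(h), which tends to 0 with h:
assert abs(quot(0j, 1e-8 + 0j)) < 1e-7
assert abs(quot(0j, 1e-8j)) < 1e-7

# At z = 1 the quotient tends to 2 along the real axis but to 0 along the
# imaginary axis, so f has no complex derivative there:
assert abs(quot(1 + 0j, 1e-8 + 0j) - 2) < 1e-7
assert abs(quot(1 + 0j, 1e-8j)) < 1e-7
```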


We will now revisit the Cauchy–Riemann equations and connect complex differentiability with the
dependence of the function on z̄. Consider f(z) as given by u(x, y) + iv(x, y). Using the fact that x = (z + z̄)/2
and y = (z − z̄)/(2i) we can rewrite the function back in terms of z and z̄. Now, we could consider the derivative
of f with respect to z̄. Applying the chain rule we would obtain

∂u/∂z̄ = (1/2) u_x − (1/(2i)) u_y,    ∂v/∂z̄ = (1/2) v_x − (1/(2i)) v_y.


Therefore

∂f/∂z̄ = u_z̄ + i v_z̄ = (1/2) u_x − (1/(2i)) u_y + i [ (1/2) v_x − (1/(2i)) v_y ],

which we can simplify to

∂f/∂z̄ = (1/2) [u_x − v_y] + (i/2) [v_x + u_y].
Notice that if the function is complex differentiable, it satisfies the Cauchy–Riemann equations and therefore
the expression above is identically zero. In this sense we say that if a function is complex differentiable,
then
∂f/∂z̄ = 0.
This illustrates why f (z) = z̄ or g(z) = |z|2 = z z̄ were not complex differentiable.
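The Wirtinger derivative ∂f/∂z̄ = (f_x + i f_y)/2 can be approximated by finite differences; the sketch below (illustrative only) confirms that it equals 1 for f(z) = z̄, 0 for f(z) = z, and z for f(z) = |z|^2 = z z̄, matching the discussion above.

```python
# Finite-difference approximation of the Wirtinger derivative
# df/dzbar = (f_x + i*f_y) / 2, evaluated at a sample point.
def d_dzbar(f, z, h=1e-6):
    fx = (f(z + h) - f(z - h)) / (2*h)          # partial in the x direction
    fy = (f(z + 1j*h) - f(z - 1j*h)) / (2*h)    # partial in the y direction
    return 0.5 * (fx + 1j*fy)

z0 = 0.8 + 0.3j
assert abs(d_dzbar(lambda z: z.conjugate(), z0) - 1) < 1e-6   # f = conj(z)
assert abs(d_dzbar(lambda z: z, z0)) < 1e-6                   # f = z is holomorphic
assert abs(d_dzbar(lambda z: abs(z)**2, z0) - z0) < 1e-6      # f = z*conj(z)
```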
We conclude this section by proving that f(z) = z^n is complex differentiable for every n ∈ N. Using
Theorem 7.7 it suffices to show that it has a derivative at every point and that it satisfies the Cauchy–Riemann
equations. Since f is a polynomial (once expanded in terms of x and y and considered as a map from R^2 to R^2),
it trivially has a derivative. To see that it satisfies the Cauchy–Riemann
equations, notice that (and similarly for v)

ux = (Re f )x = Re(fx ).

Therefore
f_x = u_x + i v_x = n(x + iy)^{n−1},    f_y = u_y + i v_y = n(x + iy)^{n−1} i.
Without computing what ux , vx , uy , vy are, notice that it follows from the expression above that

uy + ivy = i(ux + ivx ),

which implies that ux = vy and uy = −vx , which are the Cauchy–Riemann equations.
The remainder of Section 7.1 is optional. It was not covered in class, and it is not examinable.
Now that we know that the Cauchy–Riemann equations need to be satisfied for a function to be complex
differentiable we can identify the complex plane with a subspace of 2 × 2 matrices. This identification will
allow us to connect directly complex differentiability with the standard notion of differentiability discussed
in Chapter 6. We have already identified a + ib with the point in R2 given by (a, b). We can also identify
it with the matrix

    [ a  −b ]
    [ b   a ].
Note that which factor of b contains a minus sign is just a convention. Notice that the determinant of that
matrix equals |a + ib|2 , and that therefore the matrix is invertible unless a + ib = 0. This identification
preserves the basic operations we have for complex numbers, for example summation and multiplication.
That is it is possible to perform the operation (a + ib) + (c + id) as complex numbers or as the sum of
the two corresponding matrices, with the results agreeing (modulo the identification). For the product we
have
(a + ib)(c + id) = (ac − bd) + i(bc + ad)
and

    [ a  −b ] [ c  −d ]   [ ac − bd  −(bc + ad) ]
    [ b   a ] [ d   c ] = [ bc + ad    ac − bd  ],
proving the result. Sometimes it is useful to consider a hybrid of both identifications, the one as a matrix,
and the one as a point (or vector) in R2 . For example, for the product of two complex numbers that we
have just considered, we could identify it with
    [ a  −b ] [ c ]
    [ b   a ] [ d ].


The answer is the vector

    [ ac − bd ]
    [ bc + ad ],

which corresponds to the right complex number (ac − bd) + i(bc + ad) and to the matrix

    [ ac − bd  −(bc + ad) ]
    [ bc + ad    ac − bd  ].
We are now ready to connect complex differentiation with Cauchy–Riemann and differentiation for
functions in R2 .

Theorem 7.7. Let f : Ω ⊂ C → C with Ω open. f is complex differentiable at z = a + ib ∈ Ω if and
only if f, when considered as a map from Ω ⊂ R^2 to R^2, has a derivative at the point (a, b) that satisfies the
Cauchy–Riemann equations.

Before we prove this result, we emphasize that some books will replace the right-hand side by asking
that the Cauchy–Riemann equations are satisfied and that all partial derivatives are continuous. Notice
that this last condition implies the existence of a derivative.

Proof. Assume that f is complex differentiable at z = a + ib. Then we have

lim_{h→0} (f(z + h) − f(z))/h = f'(z),

which we can rewrite as

lim_{h→0} (f(z + h) − f(z) − f'(z)h)/h = 0.   (7.4)
In order to prove that f is differentiable as a map in R2 we need to find a linear map Df that satisfies
(7.1), which translates into finding a 2 × 2 matrix. Notice that (7.4) suggests that f'(z) ∈ C should be the
map. Indeed if we identify f 0 (z) with the corresponding matrix, and think of f 0 (z)h not as a product of
two complex numbers but as a matrix acting on the vector h then we have in fact proven that f has a
derivative. Since we already know that all complex differentiable functions satisfy the Cauchy–Riemann
equations we have completed that implication.
For the reverse, assuming that we have a derivative, that means that we have a 2 × 2 matrix which is
given by

    Df((a, b)) = [ u_x  u_y ]
                 [ v_x  v_y ]
and that satisfies
lim_{h→0} |f((a, b) + h) − f((a, b)) − Df((a, b))h| / |h| = 0.
Since the Cauchy–Riemann equations are satisfied we know that this matrix does in fact have the form

    [ u_x  −v_x ]
    [ v_x   u_x ],

meaning that we could identify it with a complex number as before. We could therefore identify Df h with
the product of the complex numbers f 0 (z) = ux + ivx and h. Identifying (a, b) with z we obtain

lim_{h→0} |f(z + h) − f(z) − f'(z)h| / |h| = 0

which implies that


lim_{h→0} (f(z + h) − f(z))/h
exists and equals f 0 (z), completing the proof.


As a consequence of the above result, since we can connect complex derivatives with derivatives of
maps from R2 to R2 , we have the following results:
Theorem 7.8. Let f, g : Ω ⊂ C → C be complex differentiable functions. Then (assuming g ≠ 0 in the
third expression) we have that the familiar expressions

(f + g)' = f' + g',    (fg)' = f'g + fg',    (f/g)' = (f'g − fg')/g^2,    (f(g))' = f'(g) g'

apply to the complex-valued case as well. For the final expression one needs to assume that the composition
makes sense, i.e. the range of g is contained in the domain of f.

7.2 Power Series


We want to focus on the study of power series, i.e. expressions of the form ∑_{n=0}^∞ a_n z^n. We begin by
reviewing (in a very utilitarian way) some basic ideas of series for complex numbers covered in Year 1.
Definition 7.9. The series ∑_{n=0}^∞ a_n, with a_n ∈ C, is convergent if and only if the sequence of partial
sums S_N = ∑_{n=0}^N a_n is convergent in C.

Definition 7.10. The series ∑_{n=0}^∞ a_n, with a_n ∈ C, is absolutely convergent if and only if the series
∑_{n=0}^∞ |a_n| is convergent.

The geometric series ∑_{n=0}^∞ z^n is convergent if and only if |z| < 1, and sums up to 1/(1 − z) (with
partial sums S_N = (1 − z^{N+1})/(1 − z)). We review a couple of the convergence tests from Year 1.
Theorem 7.11 (Ratio Test). Consider ∑_{n=0}^∞ a_n and assume that a_n ≠ 0 for all n. Then

1. If lim sup |a_{n+1}|/|a_n| < 1 then ∑_{n=0}^∞ a_n is convergent.

2. If |a_{n+1}|/|a_n| ≥ 1 for all n > N then ∑_{n=0}^∞ a_n is divergent.

In particular, if lim |a_{n+1}|/|a_n| exists and equals L, we have convergence for L < 1 and divergence for L > 1.
(The test is inconclusive if L = 1.)
Theorem 7.12 (Root Test). Consider ∑_{n=0}^∞ a_n. Then

1. If lim sup |a_n|^{1/n} < 1 then ∑_{n=0}^∞ a_n converges.

2. If lim sup |a_n|^{1/n} > 1 then ∑_{n=0}^∞ a_n diverges.
The proofs of these results are obtained by comparison with the geometric series and will not be covered
in these notes.
We will focus on studying expressions of the form

∑_{n=0}^∞ a_n z^n    or    ∑_{n=0}^∞ a_n (z − z_0)^n,

with a_n, z ∈ C.
Theorem 7.13. Given (a_n)_{n=0}^∞ there exists R ∈ [0, ∞] such that

∑_{n=0}^∞ a_n z^n

converges for all |z| < R and diverges for |z| > R. (As we will see in the proof, R = 1 / lim sup |a_n|^{1/n}.) The
quantity R is called the radius of convergence of the series.


Proof. We consider z given, but fixed, and apply the root test to the series ∑_{n=0}^∞ a_n z^n. This series is
convergent if

lim sup |a_n z^n|^{1/n}

is less than 1 and divergent if it is greater than 1. But that translates to convergence if

|z| < 1 / lim sup |a_n|^{1/n}

and divergence when

|z| > 1 / lim sup |a_n|^{1/n},

proving the result.

A simple application of the ratio test yields the following result:

Theorem 7.14. Let a_n ≠ 0 for all n ≥ N, and assume that lim |a_{n+1}|/|a_n| exists. Then ∑_{n=0}^∞ a_n z^n has
radius of convergence R = lim |a_n|/|a_{n+1}|.
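Theorem 7.14 is easy to try out numerically. The sketch below (an illustrative choice, not from the notes) takes a_n = 1/2^n, whose coefficient ratio |a_n|/|a_{n+1}| is exactly 2, and checks convergence of the series at a point inside the radius.

```python
# For a_n = 1/2**n, Theorem 7.14 predicts radius of convergence R = 2.
def partial_sum(z, N):
    """Partial sum of sum_n a_n z**n with a_n = 1/2**n, i.e. sum (z/2)**n."""
    return sum((z/2)**n for n in range(N))

# Inside the radius (z = 1) the partial sums approach 1/(1 - 1/2) = 2:
assert abs(partial_sum(1, 200) - 2) < 1e-12

# The coefficient ratios |a_n|/|a_{n+1}| all equal 2, giving R = 2:
a = [1 / 2**n for n in range(10)]
ratios = [a[n] / a[n+1] for n in range(9)]
assert all(abs(r - 2) < 1e-12 for r in ratios)
```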

Next we will show that within the radius of convergence a power series is actually differentiable, and
that we can in fact compute the derivative term-by-term. More precisely:

Theorem 7.15. Assume ∑_{n=0}^∞ a_n z^n has radius of convergence R. Then for |z| < R the function
f(z) = ∑_{n=0}^∞ a_n z^n is differentiable and

f'(z) = ∑_{n=1}^∞ n a_n z^{n−1}.

Proof. First we will show that the power series for f'(z) does have the same radius of convergence. Notice
that the radius of convergence of ∑_{n=1}^∞ n a_n z^{n−1} and ∑_{n=1}^∞ n a_n z^n (i.e. where we have multiplied the
expression by z) is the same. To see this notice that if ∑_{n=1}^∞ n a_n z^{n−1} is convergent for |z| < R and
divergent for |z| > R then the same will apply to the second series. Therefore we just need to consider

lim sup |n a_n|^{1/n} = lim n^{1/n} lim sup |a_n|^{1/n} = lim sup |a_n|^{1/n},

which shows that the radius of convergence is the same as for ∑_{n=0}^∞ a_n z^n. Notice that the series
∑_{n=2}^∞ n(n − 1) a_n z^{n−2} also has the same radius of convergence (we will need this result in our estimate
below, even though we never formally compute second derivatives).
Next, notice that for k ∈ N we have

(w^k − z^k)/(w − z) = w^{k−1} + w^{k−2} z + · · · + w z^{k−2} + z^{k−1}.   (7.5)
Now, in order to prove that f is complex differentiable and compute its derivative we study

(f(z + h) − f(z))/h − ∑_{n=1}^∞ n a_n z^{n−1}.

We denote by w = z + h (and so h = w − z), and substitute the expression for f in terms of a series to
find

∑_{n=0}^∞ a_n [ (w^n − z^n)/(w − z) − n z^{n−1} ].

We look more carefully at the term in brackets. Using (7.5) we find (taking k = n)
We look more carefully at the term in brackets. Using (7.5) we find (taking k = n)

wn − z n
− nz n−1 = wn−1 + wn−2 z + · · · + wz n−2 + z n−1 − nz n−1
w−z

ANALYSIS III 60
CHAPTER 7. COMPLEX ANALYSIS

= wn−1 − z n−1 + [wn−2 − z n−2 ]z + · · · + (w − z)z n−2


" #
wn−1 − z n−1 wn−2 − z n−2 w − z n−2
= (w − z) + z + ··· + z . (7.6)
w−z w−z w−z
Now, for |z| < r < R and |w| < r < R we have

|(w^k − z^k)/(w − z)| = |w^{k−1} + w^{k−2} z + · · · + w z^{k−2} + z^{k−1}| < k r^{k−1}

and therefore

|((w^k − z^k)/(w − z)) z^{n−k−1}| ≤ k r^{k−1} r^{n−k−1} ≤ k r^{n−2},

which substituted in (7.6) yields

|(w^n − z^n)/(w − z) − n z^{n−1}| ≤ |w − z| [(n − 1) r^{n−2} + (n − 2) r^{n−2} + · · · + 2 r^{n−2} + r^{n−2}]
                                 ≤ (1/2) |w − z| r^{n−2} n(n − 1).
We have shown that
| (f(z + h) − f(z))/h − ∑_{n=1}^∞ n a_n z^{n−1} | ≤ (1/2) |w − z| ∑_{n=0}^∞ n(n − 1) |a_n| r^{n−2} ≤ M |h|,

which goes to zero as h goes to zero. Notice that in the last inequality we have used that the series
∑_{n=0}^∞ n(n − 1) |a_n| r^{n−2} is finite, since we observed that the radius of convergence of the corresponding
power series was also R.

We have the following simple consequence of the Theorem above, which allows us to compute the
coefficients an in terms of derivatives of f .

Corollary 7.16. Let ∑_{n=0}^∞ a_n z^n be a power series with radius of convergence R > 0. Then
f(z) = ∑_{n=0}^∞ a_n z^n is infinitely differentiable and moreover

f^{(n)}(0) = a_n n!,    n = 0, 1, 2, . . .

Proof. The result is trivial for f (0), as it clearly equals a0 . A simple induction argument using the formula
for the derivative of f in the previous Theorem yields the desired result.

Theorem 7.17. Let ∑_{n=0}^∞ a_n z^n be a power series with radius of convergence R > 0. Then for every
r ∈ (0, R) the sequence of functions

f_k := ∑_{n=0}^k a_n z^n

converges uniformly in |z| ≤ r.

Proof. We show the result by proving that (f_k) is uniformly Cauchy in |z| ≤ r. We have (assuming that
j ≤ k)

|f_k(z) − f_j(z)| = | ∑_{n=j+1}^k a_n z^n | ≤ ∑_{n=j+1}^k |a_n| r^n ≤ ∑_{n=j+1}^∞ |a_n| r^n.

Since by assumption ∑_{n=0}^∞ |a_n| r^n is finite, given any ε > 0 we can choose N large enough to make
|f_k(z) − f_j(z)| < ε for all j, k > N, concluding the proof. (This proof is essentially an application of the
Weierstrass M-test that we covered a few weeks ago.)
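Theorem 7.17 can be illustrated with the geometric series (an arbitrary choice with R = 1): on |z| ≤ r < 1 the error of the k-th partial sum is bounded by the tail r^{k+1}/(1 − r), uniformly in z. The sketch below checks this on a few sample points of the circle |z| = r.

```python
import cmath

# Geometric series sum z**n with limit 1/(1-z); on |z| <= r < 1 the tail bound
# sum_{n>k} r**n = r**(k+1)/(1-r) controls the error uniformly.
r, k = 0.5, 20
tail_bound = r**(k + 1) / (1 - r)

for j in range(8):
    z = r * cmath.exp(2j * cmath.pi * j / 8)   # sample point on |z| = r
    f = 1 / (1 - z)                            # limit of the series
    fk = sum(z**n for n in range(k + 1))       # partial sum up to n = k
    assert abs(f - fk) <= tail_bound + 1e-15   # uniform error bound
```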


7.2.1 The exponential and the circular functions


Many of these functions should have appeared in Year 1, though perhaps only in the real-valued case.

Definition 7.18. We define the following power series for z ∈ C.

e^z := ∑_{n=0}^∞ (1/n!) z^n,   (7.7)

cos(z) := ∑_{n=0}^∞ ((−1)^n/(2n)!) z^{2n},   (7.8)

cosh(z) := ∑_{n=0}^∞ (1/(2n)!) z^{2n},   (7.9)

sin(z) := ∑_{n=0}^∞ ((−1)^n/(2n+1)!) z^{2n+1},   (7.10)

sinh(z) := ∑_{n=0}^∞ (1/(2n+1)!) z^{2n+1}.   (7.11)

The ratio test shows (Exercise) that the radius of convergence of all of the series above is R = ∞.
Notice that using Theorem 7.15 we can prove well-known identities like (e^z)' = e^z. Indeed

(e^z)' = ( ∑_{n=0}^∞ (1/n!) z^n )' = ∑_{n=1}^∞ (n/n!) z^{n−1} = ∑_{n=0}^∞ (1/n!) z^n = e^z.

In fact, we can easily relate all the circular functions to the exponential.

Proposition 7.19. The following identities hold for all z ∈ C:

cos z = (e^{iz} + e^{−iz})/2,    sin z = (e^{iz} − e^{−iz})/(2i),
cosh z = (e^z + e^{−z})/2,    sinh z = (e^z − e^{−z})/2.
Proof. We only prove the first one. The others are very similar and are left as an Exercise.
(e^{iz} + e^{−iz})/2 = (1/2) [ ∑_{n=0}^∞ (1/n!) (iz)^n + ∑_{n=0}^∞ (1/n!) (−iz)^n ]

  = (1/2) ∑_{n=0}^∞ ((i^n + (−i)^n)/n!) z^n = ∑_{k=0}^∞ ((−1)^k/(2k)!) z^{2k},

where we have used that

i^n + (−i)^n = 2 i^n = 2(−1)^{n/2} for n even,  and  i^n + (−i)^n = 0 for n odd.

There are additional relationships between sine and cosine and their hyperbolic counterparts. Notice
that we have

cos(iz) = cosh(z),    cosh(iz) = cos(z),    sin(iz) = i sinh(z),    sinh(iz) = i sin(z),

which shows that sine and cosine are unbounded functions in the complex plane. Just consider z = iy for
y ∈ R together with the fact that the real valued sinh and cosh grow exponentially at infinity.
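These identities, and the resulting unboundedness of cosine on the imaginary axis, can be checked with Python's standard cmath and math modules:

```python
import cmath
import math

# Verify cos(iy) = cosh(y) at a few sample points on the imaginary axis.
for y in [0.0, 1.0, 5.0]:
    assert abs(cmath.cos(1j * y) - math.cosh(y)) < 1e-9 * max(1.0, math.cosh(y))

# cos grows like cosh along the imaginary axis, so it is unbounded on C
# (cosh(10) is already above 11000):
assert abs(cmath.cos(10j)) > 10000
```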


Theorem 7.20. The exponential function ez satisfies the following properties

1. ez+w = ez ew for all z, w ∈ C.

2. e^z ≠ 0 for all z ∈ C.

3. ez = 1 if and only if z = 2kπi for k ∈ Z, and as a result ez+w = ez if and only if w = 2kπi, k ∈ Z.
Notice that in particular we have shown ez+2kπi = ez for all k ∈ Z, so in this sense the exponential
is periodic in the imaginary variable.

4. ez = −1 if and only if z = (2k + 1)πi for k ∈ Z.

Proof. We present the direct proof of part 1, without using more advanced tools from complex analysis
that would reduce the heavy computational nature.
e^z e^w = ( ∑_{n=0}^∞ (1/n!) z^n ) ( ∑_{k=0}^∞ (1/k!) w^k ) = ∑_{l=0}^∞ ∑_{n+k=l} (z^n/n!) (w^k/k!)

  = ∑_{l=0}^∞ (1/l!) ∑_{j=0}^l (l choose j) z^j w^{l−j} = ∑_{l=0}^∞ (1/l!) (z + w)^l = e^{z+w}.

For part 2, notice that e^z e^{−z} = 1, proving that e^z ≠ 0. For part 3, denoting z = x + iy we find

e^z = e^x e^{iy} = e^x (cos y + i sin y),

which equals 1 if and only if e^x = 1 and cos y + i sin y = 1. These only happen if x = 0 and y = 2πk,
k ∈ Z. Similarly for part 4.
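All four parts of Theorem 7.20 can be spot-checked numerically with cmath (sample points only, of course, not a proof):

```python
import cmath

z, w = 1.2 - 0.7j, -0.4 + 2.1j

# Part 1: the addition formula e^(z+w) = e^z * e^w.
assert abs(cmath.exp(z + w) - cmath.exp(z) * cmath.exp(w)) < 1e-12

# Part 2: the exponential never vanishes.
assert cmath.exp(z) != 0

# Part 3: periodicity with period 2*pi*i.
assert abs(cmath.exp(z + 2j * cmath.pi) - cmath.exp(z)) < 1e-12

# Part 4: e^(i*pi) = -1.
assert abs(cmath.exp(1j * cmath.pi) + 1) < 1e-12
```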

7.2.2 Argument and Log


Every complex number z ∈ C\{0} can be written in the form z = |z| e^{iθ}, where θ is the angle that the
vector z forms with the x axis, measured counter-clockwise. Of course that angle is not unique (but is only
determined up to integer multiples of 2π). Notice that for z = 0 there is no natural way to choose an angle.
[Figure 7.1: Polar representation of a complex number: a point z at radius r and angle θ from the positive x axis.]

We can define the (multivalued) function, for z ≠ 0,

arg(z) = {θ ∈ R : z = |z| e^{iθ}}.   (7.12)

It is not a function as such, as the image is not uniquely defined, and if θ ∈ arg(z) then so is θ + 2kπ.
The following are easily verified properties of arg.


Proposition 7.21.

1. arg(αz) = arg(z) for all α > 0.

2. arg(αz) = arg(z) + π = {θ + π, for θ ∈ arg(z)} for all α < 0,

3. arg(z̄) = − arg(z) = {−θ, for θ ∈ arg(z)},

4. arg(1/z) = − arg(z),

5. arg(zw) = arg(z) + arg(w) = {θ + φ, with θ ∈ arg(z), φ ∈ arg(w)}.

The ambiguity of the argument function can be solved by defining the principal value Arg of the arg
function to take values in (−π, π]. That is for any z ∈ C we have Arg(z) ∈ (−π, π].
Notice that it is impossible to define the Arg function continuously in the entire plane. In particular,
as we approach any point on the negative real axis, if we do it from above the Arg function will yield π,
while if we do it from below it will yield −π. Observe that if we had made any other choice for the range of Arg
there would always be a half-line where we have the same issue: the difference between the values of the
argument when approaching from opposite sides is always 2π.
We want to define the logarithm by analogy of what happens in R. In the real valued case we say (here
w, z ∈ R)
w = log(z) if and only if ew = z.
If we could extend this to w, z ∈ C, since we know that e^w = e^{w+2πik} for any k ∈ Z we would have that
if w = log(z) then so is w + 2πik. Therefore we would have that log(z) is a multivalued function (just like
it happened before with arg(z), the argument function).
Let’s write z = |z|ei arg(z) and w = log(z) = u + iv. We have

eu+iv = eu eiv = z = |z|ei arg(z) ,

and therefore comparing the two expressions in polar form we must have

eu = |z| and eiv = ei arg(z) .

That means that u = log |z|, with this logarithm being the real logarithm. We will denote by Log the
logarithm in R to distinguish it from the complex valued we want to define. We define the multivalued
function
log(z) = Log|z| + i arg(z). (7.13)
In terms of the Arg function we have

log(z) = Log|z| + iArg(z) + 2πik for k ∈ Z.

For example if we compute the complex logarithm of 1 we have

log(1) = Log|1| + iArg(1) + 2πik = 2πik for k ∈ Z.

Notice that the definition above makes sense provided that z ≠ 0, where the real logarithm is not defined.
We can now compute logarithms of negative numbers.

log(−1) = Log| − 1| + iArg(−1) + 2πik = iπ + 2πik for k ∈ Z.
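Python's cmath.log computes exactly the principal value Log z = Log|z| + i Arg(z), with Arg(z) ∈ (−π, π]; the other values of the multivalued logarithm differ from it by integer multiples of 2πi. A quick check:

```python
import cmath
import math

# Principal values: Log(1) = 0 and Log(-1) = i*pi.
assert abs(cmath.log(1)) < 1e-15
assert abs(cmath.log(-1) - 1j * math.pi) < 1e-15

# Log(-2) = Log|−2| + i*Arg(−2) = ln 2 + i*pi.
z = -2 + 0j
assert abs(cmath.log(z) - (math.log(2) + 1j * math.pi)) < 1e-15

# Every branch value w = Log z + 2*pi*i*k satisfies e^w = z.
for k in (-1, 0, 1):
    w = cmath.log(z) + 2j * math.pi * k
    assert abs(cmath.exp(w) - z) < 1e-12
```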

The complex logarithm we have just defined obeys many of the properties that we know for the real
logarithm, with the caveat that we have to take care of the multi-valuedness of the function. For example

log(zw) = log z + log w.


To prove this result, notice that since Log|zw| = Log(|z||w|) = Log|z| + Log|w| and that arg(zw) =
arg(z) + arg(w) we have

log(zw) = Log|zw| + i arg(zw) = Log|z| + Log|w| + i arg(z) + i arg(w) = log z + log w.

This equality needs to be understood modulo 2πi, that is there exists k ∈ Z such that

log(zw) − log(z) − log(w) = 2πik.

Similarly we have
log(z/w) = log(z) − log(w).

If we want to consider the (complex) differentiability of the log we have to deal with the multi-valuedness
of the arg function. Indeed if we consider the incremental quotient

(log(z + ∆z) − log z) / ∆z
we need to make sure that as we approach z both logs approach the same value. We know that this cannot
be done continuously in the entire plane, and that we need to remove a semi-line arising from the origin.
For example if we consider C\{x ≤ 0} we can consider the principal branch of the logarithm, which by
an abuse of notation we denote by Log, just like the real logarithm, by

Log(z) = Log|z| + iArg(z).

This function, defined on C\{x ≤ 0} is single valued. If we consider points of the form z = x ± iε, for
x < 0 and ε > 0 small, we find
lim Log(x ± iε) = Log(−x) ± iπ,
ε→0

showing that the function could not be extended continuously along {x < 0}. This half-line is called a
branch cut. It is possible to compute the derivative of Log directly from the definition, or in terms of its
inverse. However, for practical purposes, once we know it is differentiable, from the identity

e^{Log z} = z

we find

e^{Log z} (Log z)' = 1,

from which it follows that (Log z)' = 1/z.
Once we have defined the notion of logarithm it is possible to consider defining complex powers of
complex numbers. Given α ∈ C, and z 6= 0 we define the α-th power of z by

z^α := e^{α log(z)} = e^{α Log|z| + α i arg(z)}.

The multi-valuedness of arg means that the same is true for z^α. If we rewrite the above as

z^α = e^{α Log|z| + α i arg(z)} = e^{α Log|z| + α i Arg(z) + 2παki} = e^{α Log(z)} e^{2παki}

for k ∈ Z, the multi-valuedness becomes more evident. The number of values of z^α, whether it is one,
finitely many or infinitely many, will depend on α.
Indeed if α is an integer, for example, then e^{2παki} = 1, which means that in fact there is only one value
of z^α. If α is rational, say α = p/q, with p, q coprime, then z^α will take finitely many values. It is easy

e2παki = e2πα(k+q)i

ANALYSIS III 65
CHAPTER 7. COMPLEX ANALYSIS

and therefore z α will take q different values

\[ e^{\alpha \mathrm{Log}(z)}\, e^{2\pi\alpha k i}, \qquad k = 0, 1, \ldots, q-1. \]

In the case of an irrational α it will actually take infinitely many values.


In the rational case the result obtained above is consistent with what we know about finding roots
of polynomials. If we consider, for q ∈ N the equation z q = 1 we know it should have q roots which
correspond to
\[ z = 1^{1/q}. \]
Now, using the expressions above we find

\[ 1^{1/q} = e^{\mathrm{Log}(1)/q}\, e^{2\pi i k/q} = e^{2\pi i k/q}, \qquad k = 0, 1, \ldots, q-1. \]
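As a quick numerical sanity check (not part of the original notes), the sketch below enumerates the q values of $w^{p/q}$ exactly as in the formula above; the function name `alpha_powers` is our own choice.

```python
import cmath

def alpha_powers(w, p, q):
    """All values of w**(p/q) (p, q coprime, q >= 1), namely
    e^{(p/q) Log w} * e^{2*pi*i*p*k/q} for k = 0, 1, ..., q-1."""
    principal = cmath.exp((p / q) * cmath.log(w))  # cmath.log is the principal branch
    return [principal * cmath.exp(2j * cmath.pi * p * k / q) for k in range(q)]

# q-th roots of unity: 1^{1/q} takes exactly q values e^{2*pi*i*k/q}.
roots = alpha_powers(1, 1, 4)
for z in roots:
    assert abs(z**4 - 1) < 1e-12
# The four values are distinct (1, i, -1, -i after rounding).
assert len({(round(z.real, 9), round(z.imag, 9)) for z in roots}) == 4
```

Each returned value raised to the power q/p recovers w, which is exactly the multi-valuedness described above.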

7.3 Complex integration, contour integrals


For a function f : [a, b] → C we define
\[ \int_a^b f(t)\,dt = \int_a^b \operatorname{Re} f(t)\,dt + i \int_a^b \operatorname{Im} f(t)\,dt. \tag{7.14} \]
This definition means that we reduce integrating a complex-valued function to integrating two real-valued
functions, and can therefore use every result we know from before, such as the Fundamental Theorem of
Calculus to compute each integral.
It is easy to see that for every f, g : [a, b] → C and every α, β ∈ C we have
\[ \int_a^b [\alpha f + \beta g]\,dt = \alpha \int_a^b f(t)\,dt + \beta \int_a^b g(t)\,dt. \]
That $\int_a^b (f+g)\,dt = \int_a^b f\,dt + \int_a^b g\,dt$ follows immediately from the definition. We show the more tedious $\int_a^b \alpha f(t)\,dt = \alpha \int_a^b f(t)\,dt$. We have (suppressing the limits of integration and $dt$ for simplicity)
\[ \alpha \int f = \alpha \left( \int \operatorname{Re}(f) + i \int \operatorname{Im}(f) \right) = \left( \operatorname{Re}(\alpha)\int \operatorname{Re}(f) - \operatorname{Im}(\alpha)\int \operatorname{Im}(f) \right) + i\left( \operatorname{Im}(\alpha)\int \operatorname{Re}(f) + \operatorname{Re}(\alpha)\int \operatorname{Im}(f) \right) \]
\[ = \int \bigl( \operatorname{Re}(\alpha)\operatorname{Re}(f) - \operatorname{Im}(\alpha)\operatorname{Im}(f) \bigr) + i \int \bigl( \operatorname{Im}(\alpha)\operatorname{Re}(f) + \operatorname{Re}(\alpha)\operatorname{Im}(f) \bigr) = \int \operatorname{Re}(\alpha f) + i \int \operatorname{Im}(\alpha f) = \int (\alpha f).
\]

Notice that in this case
\[ \overline{\int_a^b f(t)\,dt} = \int_a^b \overline{f(t)}\,dt. \tag{7.15} \]
Indeed
\[ \overline{\int_a^b f(t)\,dt} = \int_a^b \operatorname{Re} f(t)\,dt - i \int_a^b \operatorname{Im} f(t)\,dt = \int_a^b \bigl( \operatorname{Re} f(t) - i \operatorname{Im} f(t) \bigr)\,dt = \int_a^b \overline{f(t)}\,dt. \]

We also have the following estimate (which we will use repeatedly below):
\[ \left| \int_a^b f(t)\,dt \right| \le \int_a^b |f(t)|\,dt. \tag{7.16} \]


To prove this result, write $\int_a^b f(t)\,dt = Re^{i\theta}$, where $R = \bigl| \int_a^b f(t)\,dt \bigr|$. As a result of this representation $R$ also equals
\[ R = e^{-i\theta} \int_a^b f(t)\,dt = \int_a^b e^{-i\theta} f(t)\,dt. \]
Now write $e^{-i\theta} f(t) = u + iv$, with $u$ and $v$ real valued. Since $R$ is real we must have
\[ R = \int_a^b u\,dt \qquad \text{and} \qquad \int_a^b v\,dt = 0. \]
Notice that $u = \operatorname{Re}[e^{-i\theta} f(t)] \le |e^{-i\theta} f(t)| = |f(t)|$. This implies
\[ R = \int_a^b u(t)\,dt \le \int_a^b |f(t)|\,dt. \]
But since $R$ equals $\bigl| \int_a^b f(t)\,dt \bigr|$ we are done.
The definition above is a natural choice for integrating functions from R to C; there is a far less obvious choice for integrating a function from C to C. Instead, we want to study integrals of complex-valued functions along curves, that is, expressions of the form
\[ \int_\Gamma f\,dz, \]

where Γ is a curve in the complex plane. To define a curve in C, consider a function γ : [a, b] → C,
given by γ(t) = x(t) + iy(t). We will ask that the curve γ be C 1 . The primary reason is that we want to
have a well defined tangent at every point of the curve (which is also integrable). We say that the curve
Γ = γ([a, b]) ⊂ C is parametrised by the map γ.
Definition 7.22. Given a function f : Ω ⊂ C → C along the path Γ ⊂ Ω ⊂ C parametrised by
γ : [a, b] → C the integral of f over Γ is given by
\[ \int_\Gamma f\,dz = \int_a^b f(\gamma(t))\gamma'(t)\,dt = \int_a^b \operatorname{Re}\bigl(f(\gamma(t))\gamma'(t)\bigr)\,dt + i \int_a^b \operatorname{Im}\bigl(f(\gamma(t))\gamma'(t)\bigr)\,dt. \]

Notice that we are not making any regularity assumptions on f , just that the integrals are well defined.
Sometimes we will consider more than one parametrisation of a curve Γ, say γ₁ and γ₂, and will use the notation $\int_{\gamma_1} f$ and $\int_{\gamma_2} f$ in addition to $\int_\Gamma f$.
On many occasions we want to consider curves that are not C 1 but perhaps just piece-wise C 1 . For
example a square. In this case we can think of Γ as a union of n curves Γj , each one C 1 , and parametrised
in the right direction, so that connected in the right order they describe the entire curve Γ. We can define
\[ \int_\Gamma f\,dz := \sum_{j=1}^{n} \int_{\Gamma_j} f\,dz. \]

It is straightforward from the definition (details are left as an Exercise) that given a curve Γ, and two
functions f, g : C → C and α, β ∈ C we have
\[ \int_\Gamma (\alpha f(z) + \beta g(z))\,dz = \alpha \int_\Gamma f(z)\,dz + \beta \int_\Gamma g(z)\,dz. \]

If we allow $\gamma'(t)$ not to exist at finitely many points, this can be defined as a single integral, with both formulations clearly being equivalent.
Example 7.23. Let f : C → C be given by f(z) = f(x + iy) = x⁴ + iy⁴, and let Γ be the curve joining the origin in a straight line to the point 1 + i, parametrised by γ : [0, 1] → C, γ(t) = (1 + i)t. Notice that γ′(t) = 1 + i and so we have
\[ \int_\Gamma f = \int_0^1 (t^4 + it^4)(1+i)\,dt = \int_0^1 2it^4\,dt = \frac{2}{5}i. \]
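The computation in Example 7.23 can be checked numerically. The sketch below (not part of the original notes; the helper name `contour_integral` is our own) approximates $\int_a^b f(\gamma(t))\gamma'(t)\,dt$ with the midpoint rule and compares it with the exact value $(2/5)i$.

```python
import cmath

def contour_integral(f, gamma, dgamma, a, b, n=20000):
    """Midpoint-rule approximation of int_Gamma f dz = int_a^b f(gamma(t)) gamma'(t) dt."""
    h = (b - a) / n
    total = 0j
    for k in range(n):
        t = a + (k + 0.5) * h
        total += f(gamma(t)) * dgamma(t)
    return total * h

# Example 7.23: f(x + iy) = x^4 + i y^4 along gamma(t) = (1 + i)t on [0, 1];
# the exact value computed above is (2/5)i.
f = lambda z: z.real**4 + 1j * z.imag**4
I = contour_integral(f, lambda t: (1 + 1j) * t, lambda t: 1 + 1j, 0.0, 1.0)
assert abs(I - 0.4j) < 1e-6
```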


In the next Lemma we want to show that $\int_\Gamma f$ depends only on the curve Γ and on the orientation of the parametrisation, not on the particular parametrisation chosen. More precisely:
Lemma 7.24. Let Γ be a curve in C, parametrised by γ : [a, b] → C, that is γ([a, b]) = Γ. Given
f : Ω ⊂ C → C and Γ ⊂ Ω we have:
1. if γ − represents the parametrisation of γ in the opposite direction, then
\[ \int_{\gamma^-} f = - \int_\gamma f. \]

If a curve Γ has attached a sense of direction we will call it a directed curve. In this case we will
denote by −Γ the same curve swept in the opposite direction. Without the need to specify the
parametrisation we can reformulate the above result by
\[ \int_\Gamma f\,dz = - \int_{-\Gamma} f\,dz. \]

2. If γ̃ : [ã, b̃] → C is another parametrisation of Γ that preserves the orientation then


\[ \int_{\tilde\gamma} f = \int_\gamma f. \]

We refer to this fact as reparametrisation invariance. [In practice, with the regularity we are demanding
on the curves, this means that there exists φ : [ã, b̃] → [a, b], bijective and increasing, such that
γ̃ = γ(φ).]
Proof. 1. Notice that if γ : [a, b] → C parametrises the curve in one direction, then $\gamma^-$ is given by $\gamma^- : [a,b] \to \mathbb{C}$ with $\gamma^-(t) = \gamma(a+b-t)$. Therefore
\[ \int_{\gamma^-} f = \int_a^b f(\gamma^-(t))\,(\gamma^-)'(t)\,dt = \int_a^b f(\gamma(a+b-t))\bigl(-\gamma'(a+b-t)\bigr)\,dt \]
\[ = \int_b^a f(\gamma(s))\bigl(-\gamma'(s)\bigr)(-1)\,ds = - \int_a^b f(\gamma(s))\gamma'(s)\,ds = -\int_\gamma f. \]

2. The proof is very similar to part one:
\[ \int_{\tilde\gamma} f = \int_{\tilde a}^{\tilde b} f(\tilde\gamma(t))\tilde\gamma'(t)\,dt = \int_{\tilde a}^{\tilde b} f(\gamma(\phi(t)))\gamma'(\phi(t))\phi'(t)\,dt = \int_a^b f(\gamma(s))\gamma'(s)\,ds = \int_\gamma f, \]
where we have made the change of variables $\phi(t) = s$, so that $\phi'(t)\,dt = ds$.

Consider the function f(z) = 1 as a complex-valued function and a curve γ : [a, b] → C. Then
\[ \int_\gamma f\,dz = \int_a^b \gamma'(t)\,dt. \]

Here $\gamma'(t)$ is a complex number and the integral will be a complex number. For example, if we take γ just as in Example 7.23 we have $\gamma'(t) = 1+i$ and $\int_\gamma f\,dz = \int_0^1 (1+i)\,dt = 1+i$. This is because we are considering $dz$ as complex valued, given by $\gamma'(t)\,dt$.
We could consider defining the integral
\[ \int_\gamma |dz| := \int_a^b |\gamma'(t)|\,dt = \int_a^b \sqrt{(x'(t))^2 + (y'(t))^2}\,dt = l(\gamma), \]

where γ : [a, b] → C is given by γ(t) = x(t) + iy(t), and l(γ) stands for the length of the curve γ.


Similarly, for f : C → C we can define
\[ \int_\gamma |f|\,|dz| := \int_a^b |f(\gamma(t))|\,|\gamma'(t)|\,dt. \]
Notice that $\int_\gamma |f|\,|dz| \ge 0$ and that we have
\[ \left| \int_\gamma f\,dz \right| \le \int_\gamma |f|\,|dz|. \]

To show this notice that (using (7.16))
\[ \left| \int_\gamma f\,dz \right| = \left| \int_a^b f(\gamma(t))\gamma'(t)\,dt \right| \le \int_a^b |f(\gamma(t))|\,|\gamma'(t)|\,dt = \int_\gamma |f|\,|dz|. \]

We can further estimate the right-hand side:
\[ \int_\gamma |f|\,|dz| \le \max_{z\in\Gamma} |f(z)| \int_\gamma |dz| = \max_{z\in\Gamma} |f(z)|\, l(\gamma). \]
Therefore we obtain
\[ \left| \int_\gamma f\,dz \right| \le \max_{z\in\Gamma} |f(z)|\, l(\gamma). \]
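The estimate above (often called the ML bound) can be illustrated numerically; the sketch below (not part of the original notes, with our own helper `circle_integral`) checks it for f(z) = 1/z on the unit circle, where the bound is in fact attained with equality.

```python
import cmath, math

def circle_integral(f, a, r, n=20000):
    """Midpoint-rule approximation of the integral of f over |z - a| = r,
    traversed counterclockwise via gamma(t) = a + r e^{it}."""
    h = 2 * math.pi / n
    total = 0j
    for k in range(n):
        t = (k + 0.5) * h
        z = a + r * cmath.exp(1j * t)
        total += f(z) * 1j * r * cmath.exp(1j * t)
    return total * h

# f(z) = 1/z on the unit circle: the integral has modulus 2*pi, while
# max|f| * l(gamma) = 1 * 2*pi, so the estimate holds with equality here.
I = circle_integral(lambda z: 1 / z, 0j, 1.0)
assert abs(I - 2j * math.pi) < 1e-9
assert abs(I) <= 1.0 * 2 * math.pi + 1e-9
```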

Definition 7.25. Given f : C → C and a curve γ : [a, b] → C we define
\[ \int_\gamma f\,d\bar z := \int_a^b f(\gamma(t))\,\overline{\gamma'(t)}\,dt. \]

Observe that in general
\[ \overline{\int_\gamma f(z)\,dz} \]
is not equal to
\[ \int_\gamma \overline{f(z)}\,dz, \]
unlike when we considered functions f : [a, b] → C; see (7.15). Instead we have
\[ \overline{\int_\gamma f(z)\,dz} = \overline{\int_a^b f(\gamma(t))\gamma'(t)\,dt} = \int_a^b \overline{f(\gamma(t))\gamma'(t)}\,dt = \int_a^b \overline{f(\gamma(t))}\,\overline{\gamma'(t)}\,dt = \int_\gamma \overline{f(z)}\,d\bar z. \]

We compute a few more examples of integrals along curves.


Example 7.26. Integrate f(z) = z̄ (the definition does not require functions to be analytic) along the circle centred at 1 + i of radius 2 (oriented counterclockwise).
First we describe the curve γ. Notice that $2e^{it}$ for t ∈ [0, 2π] describes a circle of radius two centred at the origin and with the required orientation. Therefore γ(t) = (1 + i) + 2$e^{it}$ for t ∈ [0, 2π]. We have $\gamma'(t) = 2ie^{it}$. Therefore the integral becomes
\[ \int_\gamma f(z)\,dz = \int_0^{2\pi} \overline{\bigl((1+i) + 2e^{it}\bigr)}\, 2ie^{it}\,dt = 2(1-i)i \int_0^{2\pi} e^{it}\,dt + \int_0^{2\pi} 4i\,dt = 8\pi i, \]
since $\int_0^{2\pi} e^{it}\,dt = 0$. Indeed
\[ \int_0^{2\pi} e^{it}\,dt = \int_0^{2\pi} \cos t\,dt + i \int_0^{2\pi} \sin t\,dt = 0. \]
In fact
\[ \int_0^{2\pi} e^{int}\,dt = 0, \qquad \text{for all } n \ne 0. \]
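Example 7.26 can also be verified numerically. The following sketch (not part of the original notes; `contour_integral` is our own helper) approximates the integral of z̄ around the circle and compares it with the exact value 8πi.

```python
import cmath, math

def contour_integral(f, gamma, dgamma, a, b, n=20000):
    """Midpoint-rule approximation of int_a^b f(gamma(t)) gamma'(t) dt."""
    h = (b - a) / n
    return h * sum(f(gamma(a + (k + 0.5) * h)) * dgamma(a + (k + 0.5) * h)
                   for k in range(n))

# Example 7.26: f(z) = conj(z) around the circle of radius 2 centred at 1 + i;
# the exact value computed above is 8*pi*i.
gamma = lambda t: (1 + 1j) + 2 * cmath.exp(1j * t)
dgamma = lambda t: 2j * cmath.exp(1j * t)
I = contour_integral(lambda z: z.conjugate(), gamma, dgamma, 0.0, 2 * math.pi)
assert abs(I - 8j * math.pi) < 1e-6
```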


Example 7.27. Integrate f(z) = z along the circle centred at 1 + i of radius 2 (oriented counterclockwise). As before γ(t) = (1 + i) + 2$e^{it}$ for t ∈ [0, 2π]. We have $\gamma'(t) = 2ie^{it}$. Therefore the integral becomes
\[ \int_\gamma f(z)\,dz = \int_0^{2\pi} \bigl( (1+i) + 2e^{it} \bigr)\, 2ie^{it}\,dt = 2(1+i)i \int_0^{2\pi} e^{it}\,dt + \int_0^{2\pi} 4ie^{2it}\,dt = 0, \]
using that $\int_0^{2\pi} e^{int}\,dt = 0$ for all $n \ne 0$.
Theorem 7.28. Assume that F : Ω ⊂ C → C is analytic (Ω open) and set $f(z) = \frac{dF}{dz}$, with f continuous. Let γ : [a, b] → Ω be a C¹ curve. Then
\[ \int_\gamma f\,dz = F(\gamma(b)) - F(\gamma(a)). \]

Proof. We have
\[ \int_\gamma f\,dz = \int_a^b f(\gamma(t))\gamma'(t)\,dt = \int_a^b \frac{dF}{dz}(\gamma(t))\gamma'(t)\,dt = \int_a^b \frac{d}{dt} F(\gamma(t))\,dt = F(\gamma(b)) - F(\gamma(a)). \]

We remark that there are no assumptions made about Ω other than that it is open. That is, all we need for the result to be true is that f is analytic in an open neighbourhood of the curve. The notion of simply connected (for a domain) will be defined later, but we emphasise that there is no such requirement on Ω for the above result to be true.
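The path-independence in Theorem 7.28 can be observed numerically. The sketch below (not part of the original notes; `contour_integral` is our own helper) integrates f(z) = 2z, which has the entire antiderivative F(z) = z², along an arbitrary smooth curve and checks that only the endpoints matter.

```python
import cmath

def contour_integral(f, gamma, dgamma, a, b, n=20000):
    """Midpoint-rule approximation of int_a^b f(gamma(t)) gamma'(t) dt."""
    h = (b - a) / n
    return h * sum(f(gamma(a + (k + 0.5) * h)) * dgamma(a + (k + 0.5) * h)
                   for k in range(n))

# F(z) = z^2 is entire with F'(z) = 2z, so by Theorem 7.28 the integral of 2z
# along any C^1 curve equals F(endpoint) - F(start point).
gamma = lambda t: cmath.exp(1j * t) + t        # an arbitrary smooth curve
dgamma = lambda t: 1j * cmath.exp(1j * t) + 1
I = contour_integral(lambda z: 2 * z, gamma, dgamma, 0.0, 1.0)
assert abs(I - (gamma(1.0)**2 - gamma(0.0)**2)) < 1e-6
```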

7.3.1 Links with Green’s and Gauss’ Theorems


ONLY THE STATEMENT OF THEOREM 7.29 WAS COVERED IN CLASS. THE MATERIAL
PRECEDING IT WAS NOT COVERED, AND IT IS HENCE NOT EXAMINABLE.
We want to connect the notion of contour integral with the notions introduced in last year’s modules.
The line integral we have just defined has many similarities with the notion of tangential line integral for
a vector field F. There the definition read
\[ \int_C \mathbf{F} \cdot d\mathbf{r} := \int_\alpha^\beta \mathbf{F}(\mathbf{r}(t)) \cdot \frac{d\mathbf{r}}{dt}\,dt, \]
where C is a curve parametrised by r : [α, β] → Rⁿ with r(α) = p and r(β) = q. For closed curves that integral is usually referred to as the circulation.
We also recall the flux integral, which is given by
\[ \int_C \mathbf{F} \cdot \mathbf{n}\,dt. \]

Here n represents the normal, with the following convention. If the curve C is parametrised by r(t) =
(x(t), y(t)), and r0 (t) = (x0 (t), y 0 (t)) has the same direction of the tangent, we choose

\[ \mathbf{n}(t) := \mathbf{r}'(t)^{\perp} = (y'(t), -x'(t)). \]

When considering the curves determining the boundary of a regular domain we will consider them as positively oriented. That is, we choose the orientation so that the corresponding n as defined above has the same direction as the outward normal.
The following results (considered here only for two dimensions) correspond to Green’s and Gauss’
Theorems. For a positively oriented regular domain Ω we have
\[ \iint_\Omega \operatorname{curl} \mathbf{F}\,dx\,dy = \oint_{\partial\Omega} \mathbf{F} \cdot d\mathbf{r} \]


and
\[ \iint_\Omega \operatorname{div} \mathbf{F}\,dx\,dy = \oint_{\partial\Omega} \mathbf{F} \cdot \mathbf{n}\,dt. \]
Now let's consider our contour integral $\int_\gamma f(z)\,dz$ for a function f = u + iv and a curve γ(t) = γ₁(t) + iγ₂(t). We have
\[ \int_\gamma f(z)\,dz = \int_a^b [u(\gamma(t)) + iv(\gamma(t))][\gamma_1'(t) + i\gamma_2'(t)]\,dt \]
\[ = \int_a^b \bigl( u(\gamma(t))\gamma_1'(t) - v(\gamma(t))\gamma_2'(t) \bigr)\,dt + i \int_a^b \bigl( u(\gamma(t))\gamma_2'(t) + v(\gamma(t))\gamma_1'(t) \bigr)\,dt \]
\[ = \int_a^b (u, -v) \cdot (\gamma_1', \gamma_2')\,dt + i \int_a^b (u, -v) \cdot (\gamma_2', -\gamma_1')\,dt = \int_\gamma (u,-v)\cdot d\mathbf{r} + i \int_\gamma (u,-v)\cdot \mathbf{n}\,dt, \]

and so if we define the vector field $\mathbf{f} = (u, -v)$, we have just shown that
\[ \int_\gamma f\,dz = \operatorname{circulation}(\mathbf{f}) + i\,\operatorname{flux}(\mathbf{f}). \]

Using the above expression, together with Green’s and Gauss’ Theorem we can prove the following
result.
Theorem 7.29 (Cauchy’s Theorem). Let f : Ω → C be an analytic function, with Ω a simply connected
domain. Let γ be a C 1 closed curve in Ω. Then
\[ \int_\gamma f\,dz = 0. \]

Before we prove the result we define simply connected. Loosely speaking, it means that the domain contains no holes. A set of more formal definitions is as follows.
Definition 7.30. A set Ω ⊂ C is connected if it cannot be expressed as the union of non-empty open sets
Ω1 and Ω2 such that Ω1 ∩ Ω2 = ∅. An open, connected set Ω ⊂ C is called simply connected if every
closed curve in Ω can be continuously deformed to a point.
Proof. (THIS PROOF IS NOT EXAMINABLE.) The proof presented here assumes that the curve is
a simple, regular curve and that f 0 is continuous. If the domain is simply connected, the region inside the
curve does not have any holes, and f is analytic in it. We know
\[ \int_\gamma f\,dz = \operatorname{circulation}(\mathbf{f}) + i\,\operatorname{flux}(\mathbf{f}) = \iint_A \operatorname{curl} \mathbf{f}\,dx\,dy + i \iint_A \operatorname{div} \mathbf{f}\,dx\,dy, \]
where A is the region encircled by γ. We claim that both terms are actually 0, because curl f = divf = 0.
Since $\mathbf{f} = (u, -v)$ we have
\[ \operatorname{div} \mathbf{f} = u_x - v_y, \qquad \operatorname{curl} \mathbf{f} = -v_x - u_y, \]
but since f = u + iv is analytic it satisfies the Cauchy–Riemann equations,
\[ u_x = v_y, \qquad v_x = -u_y, \]

which imply the result.


Notice that Cauchy’s Theorem applies to Example 7.27, where the function is analytic, but obviously
not to Example 7.26, where the function is not analytic.
Cauchy's Theorem works for more general curves. Consider the shaded region Ω in Figure 7.2. If we think of its boundary as one curve Γ, even though it is formed by two separate curves, we have
\[ \int_\Gamma f\,dz = 0, \]

provided that Γ is oriented positively. That means that the exterior curve, that we denote by γ1 needs to
be oriented counter-clockwise, while the interior curve, denoted by γ2 has to be oriented clockwise.

Figure 7.2: Region bounded by two positively oriented curves

An equivalent formulation of this fact, which will be extremely useful is known as the deformation of
contour Theorem.
Theorem 7.31. Let Ω ⊂ C be a region bounded by two closed simple curves γ1 (the exterior curve) and
γ2 (the interior). Assume they are oriented positively (meaning that Ω is to the left of the curves as we
traverse them), and let f be an analytic function in a neighbourhood of Ω ∪ γ1 ∪ γ2 . Then
\[ \int_{\gamma_1} f\,dz + \int_{\gamma_2} f\,dz = 0. \]

If we denote by $\gamma_2^-$ the anti-clockwise parametrisation of γ₂, then the result can be rephrased as
\[ \int_{\gamma_1} f\,dz = \int_{\gamma_2^-} f\,dz, \]

that is the integral is the same along both curves when both are parametrised counter-clockwise.
Proof. The proof is based on creating two new contours of integration, the boundaries of two simply
connected regions where f is analytic so that we can apply Cauchy’s Theorem 7.29.
To achieve this we add two new curves to the previous picture, now in yellow in Figure 7.3. They join
the points A (in γ1 ) with D (in γ2 ) and the points B (in γ1 ) with C (in γ2 ). The two curves we want to
consider are denoted by ρ and η. Each one of them is piecewise C 1 and formed by four sections. Each one
of these curves is oriented positively with respect to the region they enclose, that is, they are both oriented
counter-clockwise.
By Cauchy's Theorem
\[ \int_\rho f\,dz = \int_{\rho_1} f\,dz + \int_{\rho_2} f\,dz + \int_{\rho_3} f\,dz + \int_{\rho_4} f\,dz = 0, \tag{7.17} \]
\[ \int_\eta f\,dz = \int_{\eta_1} f\,dz + \int_{\eta_2} f\,dz + \int_{\eta_3} f\,dz + \int_{\eta_4} f\,dz = 0. \tag{7.18} \]


Figure 7.3: two positively oriented curves

We observe that η1 and ρ4 correspond to the same curve but with parametrisations in opposite directions.
Similarly for η₃ and ρ₂. Therefore
\[ \int_{\eta_1} f\,dz + \int_{\rho_4} f\,dz = 0, \qquad \int_{\eta_3} f\,dz + \int_{\rho_2} f\,dz = 0. \]

Adding (7.17) and (7.18) and using the above identities we find
\[ \int_{\rho_1} f\,dz + \int_{\rho_3} f\,dz + \int_{\eta_2} f\,dz + \int_{\eta_4} f\,dz = 0. \]

Also notice that ρ₁ and η₄ together build γ₁, while ρ₃ and η₂ build γ₂. Therefore, the above equality can be rewritten as
\[ \int_{\gamma_1} f\,dz + \int_{\gamma_2} f\,dz = 0. \]
Since
\[ \int_{\gamma_2} f\,dz = - \int_{\gamma_2^-} f\,dz \]
we obtain
\[ \int_{\gamma_1} f\,dz = \int_{\gamma_2^-} f\,dz \]

as required.

We now compute one of the fundamental contour integrals. We will show that
\[ \int_{\partial B_r(a)} (z-a)^n\,dz = \begin{cases} 2\pi i & n = -1, \\ 0 & n \ne -1, \end{cases} \tag{7.19} \]

where ∂Br (a) denotes the boundary of the ball of radius r, parametrised counter-clockwise (i.e. positively
oriented with respect to Br (a)).
Observe that the result is uniform with respect to r. That is a natural consequence of Theorem 7.31, given that the functions we are integrating only fail to be analytic at one point (at most, depending on n). In fact we could have chosen any curve that wraps around a once and obtained the same result.


Now, to compute the integral above, notice that we can parametrise the curve as γ(t) = a + r$e^{it}$, for t ∈ [0, 2π). Therefore we have (since $\gamma'(t) = ire^{it}$)
\[ \int_{\partial B_r(a)} (z-a)^n\,dz = \int_0^{2\pi} (re^{it})^n\, ire^{it}\,dt = ir^{n+1} \int_0^{2\pi} e^{i(n+1)t}\,dt. \]

Notice that in the case n = −1 that expression equals 2πi. When n ≠ −1 we obtain 0, since for all k ≠ 0 we have
\[ \int_0^{2\pi} e^{ikt}\,dt = \left[ \frac{1}{ik} e^{ikt} \right]_0^{2\pi} = \frac{1}{ik} - \frac{1}{ik} = 0. \]
We restate, in the notation that will be most convenient for the next few results, the fundamental integral above in the case n = −1, noting that the result does not depend on r. We have
\[ \int_{\partial B_r(z)} \frac{1}{w-z}\,dw = 2\pi i. \]
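The fundamental integral (7.19) can be confirmed numerically for several exponents and radii at once. The sketch below (not part of the original notes; `circle_integral` is our own helper) uses the parametrisation $\gamma(t) = a + re^{it}$ from the computation above.

```python
import cmath, math

def circle_integral(f, a, r, n=5000):
    """Integral of f over partial B_r(a), with gamma(t) = a + r e^{it},
    gamma'(t) = i r e^{it}, approximated by the midpoint rule."""
    h = 2 * math.pi / n
    total = 0j
    for k in range(n):
        t = (k + 0.5) * h
        z = a + r * cmath.exp(1j * t)
        total += f(z) * 1j * r * cmath.exp(1j * t)
    return total * h

# (7.19): the integral of (z - a)^m over a circle around a is 2*pi*i for
# m = -1 and 0 otherwise, independently of the radius r.
a = 1 + 2j
for m in (-3, -2, -1, 0, 1, 2):
    for r in (0.5, 1.0, 3.0):
        I = circle_integral(lambda z: (z - a)**m, a, r)
        expected = 2j * math.pi if m == -1 else 0
        assert abs(I - expected) < 1e-8
```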

Definition 7.32. Given a simple closed C 1 curve γ we denote by I(γ) the interior region to γ. We denote
by O(γ) the exterior region to γ.
Notice that by the deformation of contours Theorem we have
\[ \int_\gamma \frac{1}{w-z}\,dw = \int_{\partial B_r(z)} \frac{1}{w-z}\,dw = 2\pi i \tag{7.20} \]
for every z ∈ I(γ) and every r sufficiently small so that $B_r(z) \subset I(\gamma)$.
Theorem 7.33. Let γ : [a, b] → C be a positively oriented simple closed C¹ curve. Assume that f is analytic on γ and on the interior of γ, I(γ). Then
\[ f(z) = \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{w-z}\,dw \qquad \text{for all } z \in I(\gamma). \tag{7.21} \]
Proof. Fix z ∈ I(γ), and choose r small enough so that $B_r(z) \subset I(\gamma)$. By the deformation of contours theorem we have
\[ \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{w-z}\,dw = \frac{1}{2\pi i} \int_{\partial B_r(z)} \frac{f(w)}{w-z}\,dw, \]
reducing the problem to considering γ as ∂B_r(z). Observe that the integral is the same for every r sufficiently small, and later on we will exploit this fact by taking limits as r tends to zero. For now, we have
\[ \frac{1}{2\pi i} \int_{\partial B_r(z)} \frac{f(w)}{w-z}\,dw = \frac{1}{2\pi i} \int_{\partial B_r(z)} \frac{f(z)}{w-z}\,dw + \frac{1}{2\pi i} \int_{\partial B_r(z)} \frac{f(w)-f(z)}{w-z}\,dw =: I + II. \]
Notice that the first integral I equals f(z). Indeed, using (7.20),
\[ \frac{1}{2\pi i} \int_{\partial B_r(z)} \frac{f(z)}{w-z}\,dw = f(z)\, \frac{1}{2\pi i} \int_{\partial B_r(z)} \frac{1}{w-z}\,dw = f(z). \]
All that remains is to show that II = 0. Notice that since f is analytic in I(γ), given any ε > 0 we can find r sufficiently small so that
\[ |f(w) - f(z)| \le \varepsilon \qquad \text{for all } w \in \partial B_r(z). \]
We parametrise ∂B_r(z) counterclockwise by γ(t) = z + r$e^{it}$ for t ∈ [0, 2π]. We have $\gamma'(t) = ire^{it}$ and therefore
\[ |II| = \left| \frac{1}{2\pi i} \int_{\partial B_r(z)} \frac{f(w)-f(z)}{w-z}\,dw \right| \le \frac{1}{2\pi} \int_0^{2\pi} \left| \frac{f(z+re^{it}) - f(z)}{re^{it}}\, ire^{it} \right| dt \le \frac{1}{2\pi} \int_0^{2\pi} |f(z+re^{it}) - f(z)|\,dt \le \varepsilon. \]
Since ε is arbitrary we obtain the desired result.
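Cauchy's formula (7.21) is easy to test numerically: we can recover the value of a function at an interior point purely from its boundary values. The sketch below (not part of the original notes; `cauchy_value` is our own helper) does this for exp and for a polynomial.

```python
import cmath, math

def cauchy_value(f, z, a, r, n=5000):
    """(1/(2*pi*i)) * int_{partial B_r(a)} f(w)/(w - z) dw, midpoint rule."""
    h = 2 * math.pi / n
    total = 0j
    for k in range(n):
        t = (k + 0.5) * h
        w = a + r * cmath.exp(1j * t)
        total += f(w) / (w - z) * 1j * r * cmath.exp(1j * t)
    return total * h / (2j * math.pi)

# Recover exp(z) at an interior point from its values on the unit circle.
z = 0.3 + 0.2j
assert abs(cauchy_value(cmath.exp, z, 0j, 1.0) - cmath.exp(z)) < 1e-10

# The same works for a polynomial, with a different centre and radius.
p = lambda w: w**3 - 2 * w + 1
assert abs(cauchy_value(p, z, 1j, 2.0) - p(z)) < 1e-8
```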


Remark 7.34. The formula
\[ f(z) = \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{w-z}\,dw \qquad \text{for all } z \in I(\gamma) \]

has remarkable consequences for analytic functions. First notice that it claims that we can recover the value of f at any point z ∈ I(γ) by integration along a curve around that point (provided the curve is sufficiently regular and positively oriented). This is a very significant difference with respect to
smooth functions in R2 for example.
THIS IS THE END OF THE MATERIAL THAT WE COVERED IN THE MODULE. THE
REST IS NOT EXAMINABLE.

7.4 Additional material (NOT COVERED AND NON-EXAMINABLE)


Notice that since the curve γ is a compact set, for any point z ∈ I(γ) the expression w − z found in
the denominator in Cauchy’s formula is bounded away from zero, suggesting that we can differentiate the
formula with respect to z to obtain
\[ f'(z) = \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{(w-z)^2}\,dw. \]

Of course we need to justify moving the derivative inside the integral sign. We assumed that f was analytic,
which means that $f'(z)$ exists. The expression above would produce a formula for it, a way to compute it.
The key observation is that without assuming that f has more derivatives it seems that the right hand side
can be differentiated arbitrarily many times, which would suggest that f has infinitely many derivatives.
This is indeed the case as we will show in the next Theorem.

Theorem 7.35. Let γ : [a, b] → C be a positively oriented simple closed C¹ curve. Assume that f is analytic on γ and on the interior of γ, I(γ). Then $f^{(n)}(z)$ exists for all n ∈ N and the derivative is given by
\[ f^{(n)}(z) = \frac{n!}{2\pi i} \int_\gamma \frac{f(w)}{(w-z)^{n+1}}\,dw \qquad \text{for all } z \in I(\gamma). \tag{7.22} \]

Proof. Notice that Theorem 7.33 corresponds to the case n = 0 of the current Theorem. In order to prove the result for n = 1 we consider the incremental quotient, and use (7.21) to obtain
\[ \frac{f(z+h) - f(z)}{h} = \frac{1}{h} \left[ \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{w-z-h}\,dw - \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{w-z}\,dw \right]. \]

By the deformation of contours Theorem we can choose γ as $\partial B_{2r}(z)$, with $B_{2r}(z) \subset I(\gamma)$. Operating on the right-hand side we have
\[ \frac{f(z+h)-f(z)}{h} = \frac{1}{2\pi i} \int_{\partial B_{2r}(z)} \frac{f(w)}{(w-z-h)(w-z)}\,dw \]
\[ = \frac{1}{2\pi i} \int_{\partial B_{2r}(z)} \frac{f(w)}{(w-z)^2}\,dw + \frac{1}{2\pi i} \int_{\partial B_{2r}(z)} f(w) \left( \frac{1}{(w-z-h)(w-z)} - \frac{1}{(w-z)^2} \right) dw \]
\[ = \frac{1}{2\pi i} \int_{\partial B_{2r}(z)} \frac{f(w)}{(w-z)^2}\,dw + \frac{1}{2\pi i} \int_{\partial B_{2r}(z)} \frac{h f(w)}{(w-z-h)(w-z)^2}\,dw. \]
To conclude the proof all that we need to do is show that the limit of the last integral as h tends to zero is zero, that is (ignoring factors of 2πi),
\[ \lim_{h\to 0} \int_{\partial B_{2r}(z)} \frac{h f(w)}{(w-z-h)(w-z)^2}\,dw = 0, \]


and recall that we are able to choose r arbitrarily small without affecting the value of the integrals above. First we choose |h| < r so that for all $w \in \partial B_{2r}(z)$ we have
\[ |w-z-h| \ge |w-z| - |h| > r. \]
Here we have used the reverse triangle inequality in the first step, and the fact that |w − z| = 2r for points $w \in \partial B_{2r}(z)$. Choosing γ(t) = z + 2r$e^{it}$ for t ∈ [0, 2π), we have $\gamma'(t) = 2rie^{it}$, and therefore $|\gamma'(t)| \le 2r$. Since f is analytic, in particular it is continuous, and therefore there exists M > 0 such that |f(w)| ≤ M for all $w \in \partial B_{2r}(z)$. Using these facts we have
\[ \left| \int_{\partial B_{2r}(z)} \frac{hf(w)}{(w-z-h)(w-z)^2}\,dw \right| \le \int_0^{2\pi} \frac{|h|\, M}{r(2r)^2}\, 2r\,dt = \frac{\pi M}{r^2}\, |h|, \]
which goes to zero as h goes to zero, proving the result for n = 1. The general case is proven by induction. If we assume the result for n = 1, 2, …, k − 1 we want to prove it for n = k. That is, in particular we assume
\[ f^{(k-1)}(z) = \frac{(k-1)!}{2\pi i} \int_\gamma \frac{f(w)}{(w-z)^{k}}\,dw \qquad \text{for all } z \in I(\gamma). \]
We write the corresponding incremental quotient, just as before:
\[ \frac{f^{(k-1)}(z+h) - f^{(k-1)}(z)}{h} = \frac{1}{h} \left[ \frac{(k-1)!}{2\pi i} \int_\gamma \frac{f(w)}{(w-z-h)^k}\,dw - \frac{(k-1)!}{2\pi i} \int_\gamma \frac{f(w)}{(w-z)^k}\,dw \right]. \]

By the deformation of contours Theorem we can choose γ as $\partial B_{2r}(z)$, with $B_{2r}(z) \subset I(\gamma)$. Operating on the right-hand side we have
\[ \frac{f^{(k-1)}(z+h) - f^{(k-1)}(z)}{h} = \frac{(k-1)!}{2\pi i\, h} \int_{\partial B_{2r}(z)} \frac{f(w)\bigl[(w-z)^k - (w-z-h)^k\bigr]}{(w-z-h)^k (w-z)^k}\,dw \]
\[ = \frac{k!}{2\pi i} \int_{\partial B_{2r}(z)} \frac{f(w)}{(w-z)^{k+1}}\,dw + \frac{(k-1)!}{2\pi i} \int_{\partial B_{2r}(z)} f(w) \left[ \frac{(w-z)^k - (w-z-h)^k}{h(w-z-h)^k(w-z)^k} - \frac{k}{(w-z)^{k+1}} \right] dw \]
\[ = \frac{k!}{2\pi i} \int_{\partial B_{2r}(z)} \frac{f(w)}{(w-z)^{k+1}}\,dw + \frac{(k-1)!}{2\pi i} \int_{\partial B_{2r}(z)} f(w)\, \frac{(w-z)^{k+1} - (w-z-h)^k(w-z) - kh(w-z-h)^k}{h(w-z-h)^k(w-z)^{k+1}}\,dw. \tag{7.23} \]
As before, all that remains is to show that the last integral tends to zero as h tends to zero. We choose h and the parametrisation as above. The result will follow if we show that
\[ \left| \frac{(w-z)^{k+1} - (w-z-h)^k(w-z) - kh(w-z-h)^k}{h} \right| \le C|h|, \]
where the constant might depend on r. This suffices because, as before, |f| ≤ M and |w − z − h| ≥ |w − z| − |h| > r implies
\[ \frac{1}{|w-z-h|^k\, |w-z|^k} \le \frac{1}{r^k (2r)^k}. \]
In order to prove this estimate, notice that the binomial formula implies
\[ (w-z-h)^k = \sum_{j=0}^{k} \binom{k}{j} (w-z)^{k-j} (-h)^j \]
and therefore
\[ (w-z)^{k+1} - (w-z-h)^k(w-z) - kh(w-z-h)^k = - \sum_{j=2}^{k} \binom{k}{j} (w-z)^{k+1-j}(-h)^j - kh \sum_{j=1}^{k} \binom{k}{j} (w-z)^{k-j}(-h)^j, \]
which is of order $h^2$, proving the result.
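Formula (7.22) can also be checked numerically: the derivatives of a function at an interior point can be computed purely from its values on a surrounding circle. The sketch below (not part of the original notes; `cauchy_derivative` is our own helper) uses exp, whose derivatives of all orders coincide with exp itself.

```python
import cmath, math

def cauchy_derivative(f, z, n, a, r, m=5000):
    """(n!/(2*pi*i)) * int_{partial B_r(a)} f(w)/(w - z)^{n+1} dw,
    the right-hand side of (7.22), approximated by the midpoint rule."""
    h = 2 * math.pi / m
    total = 0j
    for k in range(m):
        t = (k + 0.5) * h
        w = a + r * cmath.exp(1j * t)
        total += f(w) / (w - z)**(n + 1) * 1j * r * cmath.exp(1j * t)
    return math.factorial(n) * total * h / (2j * math.pi)

# f(z) = e^z satisfies f^{(n)} = e^z for every n, so the boundary-value
# formula must return exp(z) for each n.
z = 0.1 - 0.4j
for n in range(5):
    assert abs(cauchy_derivative(cmath.exp, z, n, 0j, 1.0) - cmath.exp(z)) < 1e-8
```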


7.4.1 Consequences of Cauchy’s Theorem (NOT COVERED AND NOT EXAMINABLE)


Theorem 7.36 (Taylor Series Expansion). Let f be an analytic function on $B_R(a)$ for a ∈ C, R > 0. Then there exist unique constants $c_n$, n ∈ N, such that
\[ f(z) = \sum_{n=0}^{\infty} c_n (z-a)^n \qquad \text{for all } z \in B_R(a). \]

Moreover, the coefficients $c_n$ are given by
\[ c_n = \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{(w-a)^{n+1}}\,dw = \frac{f^{(n)}(a)}{n!}, \]

where γ is any positively oriented simple closed curve (piece-wise C 1 ) that is contained in BR (a) with
a ∈ I(γ).

Proof. Given some $z \in B_R(a)$ we will take γ to be $\partial B_r(a)$ (positively oriented), for r small enough so that |z − a| < r < R. We can use the Theorem of deformation of contours to prove that the integrals over all curves γ as above are the same. Cauchy's formula (7.21) gives
\[ f(z) = \frac{1}{2\pi i} \int_{\partial B_r(a)} \frac{f(w)}{w-z}\,dw. \tag{7.24} \]

Notice that since |w − a| = r and we have chosen r so that |z − a| < r, we have |z − a| < |w − a| for all $w \in \partial B_r(a)$. As a result
\[ \frac{|z-a|}{|w-a|} < 1 \]
and we can use the geometric series expansion to obtain
\[ \frac{1}{w-z} = \frac{1}{w-a}\; \frac{1}{1 - \frac{z-a}{w-a}} = \frac{1}{w-a} \sum_{n=0}^{\infty} \left( \frac{z-a}{w-a} \right)^n. \]

Inserting this expression in (7.24) we obtain
\[ f(z) = \frac{1}{2\pi i} \int_{\partial B_r(a)} f(w)\, \frac{1}{w-a} \sum_{n=0}^{\infty} \left( \frac{z-a}{w-a} \right)^n dw. \]
For $w \in \partial B_r(a)$ the series converges absolutely and uniformly (Weierstrass M-test), and therefore we can exchange the order of the summation and integration to obtain
\[ f(z) = \sum_{n=0}^{\infty} \left( \frac{1}{2\pi i} \int_{\partial B_r(a)} \frac{f(w)}{(w-a)^{n+1}}\,dw \right) (z-a)^n = \sum_{n=0}^{\infty} c_n (z-a)^n, \]

obtaining the desired result. It remains to show that the coefficients are unique. Assume that $f(z) = \sum_{k=0}^{\infty} b_k (z-a)^k$ for some $b_k \in \mathbb{C}$. We have
\[ \int_{\partial B_r(a)} \frac{f(w)}{(w-a)^{n+1}}\,dw = \int_{\partial B_r(a)} \sum_{k=0}^{\infty} b_k (w-a)^{k}\, \frac{1}{(w-a)^{n+1}}\,dw = \sum_{k=0}^{\infty} b_k \int_{\partial B_r(a)} (w-a)^{k-n-1}\,dw = 2\pi i\, b_n, \]

where we have used the fundamental integrals, together with the fact that we can commute the summation
and integration. This proves that bn = cn , concluding the proof.


Example 7.37. We consider an example of a Taylor series. We consider the function (1 + z)a for a ∈ C
and |z| < 1.
When we considered logarithms we noticed that $z^n$ is well defined for n ∈ N, but $z^a$ is not well defined for arbitrary a without making a specific choice of the argument function. In this case
\[ (1+z)^n = \sum_{k=0}^{n} \binom{n}{k} z^k, \]
which is a polynomial of order n, and equals the Taylor series expansion centred at the origin. This series converges for every z ∈ C, not just |z| < 1. However, we defined

\[ f(z) = (1+z)^a := e^{a \mathrm{Log}(1+z)}, \]

having made a choice of the argument function defining the logarithm, which meant creating a branch cut where the function is not defined. Choosing the argument in (−π, π), and since our function is translated (we consider $(1+z)^a$, not $z^a$), we obtain a function that is not defined for z ∈ (−∞, −1].
We want to show that in fact a binomial expansion is possible for all a ∈ C. We know by Taylor's Theorem 7.36 that we have a Taylor expansion. To compute it we need to work out the derivatives of $(1+z)^a$. Using the definition we have (for a ∉ N)
\[ \left( e^{a\mathrm{Log}(1+z)} \right)' = e^{a\mathrm{Log}(1+z)}\, a\,(\mathrm{Log}(1+z))' = \frac{a}{1+z}\,(1+z)^a = a(1+z)^{a-1}. \]
By induction we have
\[ \frac{d^k}{dz^k} \left( e^{a\mathrm{Log}(1+z)} \right) = a(a-1)\cdots(a-k+1)\,(1+z)^{a-k}. \]
Therefore we obtain the Taylor series (centred at 0)
\[ \sum_{k=0}^{\infty} \frac{a(a-1)\cdots(a-k+1)}{k!}\, z^k. \]
Notice that the radius of convergence of this series is 1, as we know there are issues for z ∈ (−∞, −1].
The binomial coefficient, for integer values n and k, is
\[ \binom{n}{k} = \frac{n!}{(n-k)!\,k!} = \frac{n(n-1)\cdots(n-k+1)}{k!}, \]
and so extending the definition to a ∈ C we obtain
\[ (1+z)^a = \sum_{k=0}^{\infty} \frac{a(a-1)\cdots(a-k+1)}{k!}\, z^k = \sum_{k=0}^{\infty} \binom{a}{k} z^k. \]

We can obtain similar expansions centred at different points:
\[ (1+z)^a = \sum_{k=0}^{\infty} \frac{a(a-1)\cdots(a-k+1)}{k!}\, (1+z_0)^{a-k} (z-z_0)^k, \]
which would naturally have a radius of convergence R equal to the distance from the point $z_0$ to the half line {x ≤ −1}, where we have made a branch cut for the Log function. You may ignore the issue of the radius of convergence for this series for the exam.
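The binomial series for complex exponents can be compared numerically with $e^{a\mathrm{Log}(1+z)}$; Python's `cmath.log` is the principal branch, matching the choice of argument in (−π, π). The sketch below (not part of the original notes; `binomial_series` is our own helper) sums the series term by term.

```python
import cmath

def binomial_series(a, z, terms=200):
    """Partial sum of sum_k [a(a-1)...(a-k+1)/k!] z^k, valid for |z| < 1."""
    total = 0j
    term = 1 + 0j                       # the k = 0 term
    for k in range(terms):
        total += term
        term *= (a - k) / (k + 1) * z   # ratio of consecutive terms
    return total

a = 0.5 + 1j
z = 0.3 - 0.2j                          # |z| < 1, inside the radius of convergence
exact = cmath.exp(a * cmath.log(1 + z)) # principal branch; 1 + z avoids the cut
assert abs(binomial_series(a, z) - exact) < 1e-10
```

For a ∈ N the coefficient ratio vanishes after finitely many terms, so the sum reduces to the ordinary binomial expansion.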

The following result is also a direct consequence of Cauchy’s formula.

Theorem 7.38 (Liouville's Theorem). Let f : C → C be an analytic, bounded function. Then f is constant.


Proof. Assume that |f(z)| ≤ M for all z ∈ C. Let a ≠ b be two points in C. Choose R large enough so that 2 max{|a|, |b|} < R. That means that if we consider $w \in \partial B_R(0)$, that is |w| = R, then
\[ |w-a| > \frac{R}{2}, \qquad |w-b| > \frac{R}{2}. \]
Since f is analytic in C we can use Cauchy's formula to compute f(a) and f(b) using $\partial B_R(0)$ as the curve γ (of course positively oriented!). We have
\[ f(a) - f(b) = \frac{1}{2\pi i} \int_{\partial B_R(0)} \frac{f(w)}{w-a}\,dw - \frac{1}{2\pi i} \int_{\partial B_R(0)} \frac{f(w)}{w-b}\,dw \]
\[ = \frac{1}{2\pi i} \int_{\partial B_R(0)} f(w) \left( \frac{1}{w-a} - \frac{1}{w-b} \right) dw = \frac{a-b}{2\pi i} \int_{\partial B_R(0)} \frac{f(w)}{(w-a)(w-b)}\,dw. \]
Therefore
\[ |f(a) - f(b)| \le \frac{|a-b|}{2\pi}\, \frac{M}{R^2/4} \int_{\partial B_R(0)} |dw| = \frac{|a-b|\,4M}{R}, \]
as $\int_{\partial B_R(0)} |dw|$ is just the length of the curve, which equals 2πR. Notice that since R is arbitrary (provided that it is big enough, as indicated above) we can send R to infinity, showing that |f(a) − f(b)| = 0 for any a and b in C, therefore proving that the function is constant.

A fundamental consequence of Liouville’s Theorem is Fundamental Theorem of Algebra.

Theorem 7.39 (Fundamental Theorem of Algebra). Every non-constant polynomial p on C has a root,
that is, there exists a ∈ C such that p(a) = 0.

Proof. We will prove the result by contradiction. Assume that p(z) ≠ 0 for every z ∈ C. Define f : C → C by f(z) = 1/p(z). Now, since p does not vanish, the function f is analytic in all of C, since it is the composition of two holomorphic functions (1/z is holomorphic outside the origin).
Notice that if we write $p(z) = \sum_{k=0}^{n} c_k z^k$, with $c_n \ne 0$ (n > 0), then at infinity the polynomial behaves like $c_n z^n$, as that is the highest power. That means |p(z)| goes to infinity as z goes to infinity, and satisfies |p(z)| > 1 for all |z| > R for some R > 0. As a result the function f(z) = 1/p(z) is bounded in C: it is less than 1 in modulus for all |z| > R based on our analysis of p, and it is bounded on the compact set |z| ≤ R since it is continuous.
Liouville's Theorem implies that f is in fact constant, which would force p to be constant, which is a contradiction.

Theorem 7.40. Let $f_n$ : Ω → C be a sequence of analytic functions on an open set Ω. If $f_n$ converges uniformly to f, then f is analytic.

Recall that for a function to be analytic at one point we require that the function be differentiable in
a neighbourhood of the point, and therefore the assumption on Ω being open is natural. Being analytic is
a local property, and requiring that the uniform convergence holds only on compact sets would suffice.

Proof. Let z ∈ Ω. Choose r > 0 sufficiently small so that $B_r(z) \subset \Omega$. Since $f_n$ is analytic in Ω we can apply Cauchy's formula to obtain
\[ f_n(z) = \frac{1}{2\pi i} \int_{\partial B_r(z)} \frac{f_n(w)}{w-z}\,dw. \]
Taking limits as n goes to infinity, and assuming that we can move the limit inside the integral, we would obtain
\[ f(z) = \frac{1}{2\pi i} \int_{\partial B_r(z)} \frac{f(w)}{w-z}\,dw. \]


We have seen before that this implies that f is differentiable (in fact infinitely differentiable), and we obtained an expression for its derivative (see Theorem 7.35). So the only thing left is to justify moving the limit inside the integral. Notice that this is really a one-dimensional integral and we can apply the results learnt earlier in the year. Taking γ(t) = z + r$e^{it}$ for t ∈ [0, 2π), we have $\gamma'(t) = ire^{it}$ and so
\[ \int_{\partial B_r(z)} \frac{f_n(w)}{w-z}\,dw = \int_0^{2\pi} \frac{f_n(z+re^{it})}{re^{it}}\, ire^{it}\,dt = i \int_0^{2\pi} f_n(z+re^{it})\,dt. \tag{7.25} \]
For fixed z, as a function of t we have that $f_n(z + re^{it})$ converges uniformly to $f(z + re^{it})$, and applying Theorem 2.16 we can move the limit inside the integral, obtaining
\[ \lim_{n\to\infty} \int_{\partial B_r(z)} \frac{f_n(w)}{w-z}\,dw = \lim_{n\to\infty} i \int_0^{2\pi} f_n(z+re^{it})\,dt = i \int_0^{2\pi} f(z+re^{it})\,dt. \]
Notice that we have (reading the expression (7.25) backwards, now for f instead of $f_n$)
\[ i \int_0^{2\pi} f(z+re^{it})\,dt = \int_{\partial B_r(z)} \frac{f(w)}{w-z}\,dw, \]
obtaining the result.

7.4.2 Applications of Cauchy’s formula to evaluate integrals in R (NOT COVERED


AND NOT EXAMINABLE)
We present various examples that illustrate a more general theory (of residues) for computing integrals of
functions over R.
Consider for example
\[ \int_{-\infty}^{\infty} \frac{1}{1+x^2}\,dx. \]

The idea is to consider the contours γ1 and γ2 in Figure 7.4.

Figure 7.4: Contours

γ₁ is formed by the segment joining −R and R, together with the half circle of radius R. The contour γ₂ is a circle centred at i and of radius r < 1. To understand the choice of curves, notice that we can rewrite the integral as
\[ \int_{-\infty}^{\infty} \frac{1}{(x-i)(x+i)}\,dx. \]


Notice that in the region enclosed by γ₂ the function (the integrand extended to a function on C)
\[ f(z) := \frac{1}{(z-i)(z+i)} \]
is analytic except at z = i. By the deformation of contours Theorem we know that
\[ \int_{\gamma_1} f(z)\,dz = \int_{\gamma_2} f(z)\,dz, \]
since the two curves have the same orientation. Now
\[
\int_{\gamma_1} f(z)\, dz = \int_{-R}^{R} f(z)\, dz + \int_{\text{arc}} f(z)\, dz.
\]
We parametrise the arc by Re^{it} for t ∈ [0, π). We have
\[
\int_{\text{arc}} f(z)\, dz = \int_{\text{arc}} \frac{1}{1 + z^2}\, dz = \int_0^{\pi} \frac{Rie^{it}}{1 + R^2 e^{2it}}\, dt.
\]
Therefore
\[
\left| \int_{\text{arc}} f(z)\, dz \right| \le \int_0^{\pi} \frac{R}{R^2 - 1}\, dt = \pi\, \frac{R}{R^2 - 1}.
\]
As R tends to infinity, the integral over the arc tends to zero. Therefore
\[
\int_{-\infty}^{\infty} f(x)\, dx = \lim_{R\to\infty} \int_{\gamma_1} f(z)\, dz = \int_{\gamma_2} f(z)\, dz.
\]
Now
\[
\int_{\gamma_2} f(z)\, dz = \int_{\partial B_r(i)} \frac{1}{z + i}\, \frac{1}{z - i}\, dz.
\]
Recall that by Cauchy’s formula, if g(z) is analytic in the interior of a positively oriented simple closed curve γ and a is a point in that interior, then
\[
\int_{\gamma} \frac{g(z)}{z - a}\, dz = 2\pi i\, g(a).
\]
Therefore, taking g(z) = 1/(z + i) we obtain
\[
\int_{\partial B_r(i)} \frac{1}{z + i}\, \frac{1}{z - i}\, dz = 2\pi i\, \frac{1}{2i} = \pi,
\]
which yields
\[
\int_{-\infty}^{\infty} \frac{1}{1 + x^2}\, dx = \pi.
\]
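A quick numerical sanity check of this value (our own, not part of the notes): a midpoint-rule approximation on a large symmetric interval. The cut-off T is a hypothetical choice; the tail beyond ±T is bounded by 2/T.

```python
import math

# Midpoint rule for integral of 1/(1+x^2) on [-T, T];
# the tail |x| > T contributes less than 2/T = 0.002.
T, n = 1000.0, 400_000
h = 2.0 * T / n
value = h * sum(1.0 / (1.0 + x * x)
                for x in (-T + (k + 0.5) * h for k in range(n)))
# value should be close to pi
```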
As a second example, let us compute
\[
\int_{-\infty}^{\infty} \frac{1}{1 + x^4}\, dx.
\]
Notice that the function
\[
\frac{1}{1 + z^4}
\]
has four singularities, at the points
\[
e^{\pi i/4}, \quad e^{3\pi i/4}, \quad e^{-\pi i/4}, \quad e^{-3\pi i/4},
\]
and so if we choose a contour similar to the one above (an expanding semi-circle) there will be two singularities in the interior. We obtain the picture in Figure 7.5.
Figure 7.5: Contours

Now γ1 is built out of the line joining −R and R, together with the semi-circle of radius R centred at 0. γ2 and γ3 correspond to circles centred at e^{3πi/4} and e^{πi/4}, oriented clockwise (positively with respect to both the blue-shaded and yellow-shaded regions). Notice that with those orientations we have
\[
\int_{\gamma_1} \frac{1}{1 + z^4}\, dz + \int_{\gamma_2} \frac{1}{1 + z^4}\, dz + \int_{\gamma_3} \frac{1}{1 + z^4}\, dz = 0.
\]
We start by considering the integral over γ1. We have
\[
\int_{\gamma_1} \frac{1}{1 + z^4}\, dz = \int_{-R}^{R} \frac{1}{1 + z^4}\, dz + \int_{\text{arc}} \frac{1}{1 + z^4}\, dz.
\]
We will show that the integral over the arc goes to zero as R goes to infinity. Indeed,
\[
\left| \int_{\text{arc}} \frac{1}{1 + z^4}\, dz \right| \le \int_{\text{arc}} \frac{1}{R^4 - 1}\, |dz| = \frac{\pi R}{R^4 - 1},
\]
which goes to zero as R goes to infinity. Above we have used that ∫_arc |dz| = length(arc) = πR.
We now consider the integral over γ2. We have (denoting by γ2⁻ the anti-clockwise parametrisation of the circle)
\[
\int_{\gamma_2} \frac{1}{1 + z^4}\, dz = -\int_{\gamma_2^-} \frac{dz}{(z - e^{i\pi/4})(z - e^{3i\pi/4})(z - e^{-i\pi/4})(z - e^{-3i\pi/4})}
= -\int_{\gamma_2^-} \frac{g(z)}{z - e^{3i\pi/4}}\, dz = -2\pi i\, g(e^{3i\pi/4}),
\]
where the above defines g as
\[
g(z) = \frac{1}{(z - e^{i\pi/4})(z - e^{-i\pi/4})(z - e^{-3i\pi/4})},
\]
and we have used Cauchy’s formula, as g is analytic inside the curve γ2⁻. Now
\[
g(e^{3i\pi/4}) = \frac{1}{(e^{3i\pi/4} - e^{i\pi/4})(e^{3i\pi/4} - e^{-i\pi/4})(e^{3i\pi/4} - e^{-3i\pi/4})}
= \frac{1}{(-\sqrt{2})(-\sqrt{2} + \sqrt{2}\,i)(\sqrt{2}\,i)}.
\]
Now we consider the integral over γ3:
\[
\int_{\gamma_3} \frac{1}{1 + z^4}\, dz = -\int_{\gamma_3^-} \frac{dz}{(z - e^{i\pi/4})(z - e^{3i\pi/4})(z - e^{-i\pi/4})(z - e^{-3i\pi/4})}
\]
\[
= -\int_{\gamma_3^-} \frac{h(z)}{z - e^{i\pi/4}}\, dz = -2\pi i\, h(e^{i\pi/4}),
\]
where the above defines h as
\[
h(z) = \frac{1}{(z - e^{3i\pi/4})(z - e^{-i\pi/4})(z - e^{-3i\pi/4})},
\]
and we have used Cauchy’s formula, as h is analytic inside the curve γ3⁻. Now
\[
h(e^{i\pi/4}) = \frac{1}{(e^{i\pi/4} - e^{3i\pi/4})(e^{i\pi/4} - e^{-i\pi/4})(e^{i\pi/4} - e^{-3i\pi/4})}
= \frac{1}{\sqrt{2}\,(\sqrt{2}\,i)(\sqrt{2} + \sqrt{2}\,i)}.
\]
Since we have
\[
\int_{-\infty}^{\infty} \frac{1}{1 + x^4}\, dx = -\int_{\gamma_2} \frac{1}{1 + z^4}\, dz - \int_{\gamma_3} \frac{1}{1 + z^4}\, dz,
\]
we obtain
\[
\int_{-\infty}^{\infty} \frac{1}{1 + x^4}\, dx = 2\pi i\, \frac{1}{(-\sqrt{2})(-\sqrt{2} + \sqrt{2}\,i)(\sqrt{2}\,i)} + 2\pi i\, \frac{1}{\sqrt{2}\,(\sqrt{2}\,i)(\sqrt{2} + \sqrt{2}\,i)} = \frac{\pi\sqrt{2}}{2}.
\]
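As before, we can sanity-check the value numerically (a sketch of our own, not from the notes). Since 1/(1+x⁴) decays like x⁻⁴, a modest cut-off T already makes the tail negligible; the parameters below are hypothetical choices.

```python
import math

# Midpoint rule for integral of 1/(1+x^4) on [-T, T];
# the tail beyond +/-T contributes less than 2/(3*T^3) ~ 7e-7.
T, n = 100.0, 400_000
h = 2.0 * T / n
value = h * sum(1.0 / (1.0 + x ** 4)
                for x in (-T + (k + 0.5) * h for k in range(n)))
# value should be close to pi*sqrt(2)/2
```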
In addition to being able to integrate quotients involving polynomials, we can integrate some trigonometric functions. For example,
\[
\int_{-\infty}^{\infty} \frac{\cos(3x)}{4 + x^2}\, dx.
\]
We can rewrite this integral as
\[
\operatorname{Re} \int_{-\infty}^{\infty} \frac{e^{3iz}}{(z - 2i)(z + 2i)}\, dz,
\]
and we can actually drop the Re part, as the imaginary part will be an odd integrand and it will vanish.
We consider the contours (notice they are both oriented counter-clockwise)

Figure 7.6: Contours

As before,
\[
\int_{\gamma_1} \frac{e^{3iz}}{(z - 2i)(z + 2i)}\, dz = \int_{\gamma_2} \frac{e^{3iz}}{(z - 2i)(z + 2i)}\, dz.
\]
Also,
\[
\int_{\gamma_1} \frac{e^{3iz}}{(z - 2i)(z + 2i)}\, dz = \int_{-R}^{R} \frac{e^{3iz}}{(z - 2i)(z + 2i)}\, dz + \int_{\text{arc}} \frac{e^{3iz}}{(z - 2i)(z + 2i)}\, dz.
\]
We consider first the integral over the arc (half circle of radius R). We have, for R ≫ 4,
\[
\left| \int_{\text{arc}} \frac{e^{3iz}}{(z - 2i)(z + 2i)}\, dz \right| \le \int_{\text{arc}} \frac{|e^{3iz}|}{|z^2 + 4|}\, |dz| \le \int_{\text{arc}} \frac{e^{-3\operatorname{Im} z}}{R^2 - 4}\, |dz| \le \frac{\pi R}{R^2 - 4} \xrightarrow{R\to\infty} 0,
\]
where we have used that along the arc Im z ≥ 0, and so e^{−3 Im z} ≤ 1. Now for γ2 (remember it is oriented anti-clockwise)
\[
\int_{\gamma_2} \frac{e^{3iz}}{(z - 2i)(z + 2i)}\, dz = \int_{\gamma_2} \frac{g(z)}{z - 2i}\, dz = 2\pi i\, g(2i),
\]
where
\[
g(z) = \frac{e^{3iz}}{z + 2i},
\]
and we have used Cauchy’s formula, since g is analytic inside γ2. We have
\[
g(2i) = \frac{e^{-6}}{4i}.
\]
Therefore
\[
\int_{-\infty}^{\infty} \frac{e^{3iz}}{(z - 2i)(z + 2i)}\, dz = \lim_{R\to\infty} \int_{\gamma_1} \frac{e^{3iz}}{(z - 2i)(z + 2i)}\, dz = \int_{\gamma_2} \frac{e^{3iz}}{(z - 2i)(z + 2i)}\, dz = 2\pi i\, g(2i) = \frac{\pi}{2}\, e^{-6}.
\]
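The value (π/2)e⁻⁶ ≈ 0.00389 can again be checked numerically (our own sketch, not from the notes). Because the integrand oscillates with period 2π/3, the grid step must be small relative to that period; the cut-off T and step are hypothetical choices.

```python
import math

# Midpoint rule for integral of cos(3x)/(4+x^2) on [-T, T].
# The step h = 0.001 resolves the period 2*pi/3 of cos(3x);
# integration by parts bounds the tail beyond +/-T by O(1/T^2).
T, n = 400.0, 800_000
h = 2.0 * T / n
value = h * sum(math.cos(3.0 * x) / (4.0 + x * x)
                for x in (-T + (k + 0.5) * h for k in range(n)))
# value should be close to (pi/2) * e^{-6}
```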
We use a similar approach for
\[
\int_{-\infty}^{\infty} \frac{x \sin x}{1 + x^2}\, dx = \frac{1}{i} \int_{-\infty}^{\infty} \frac{z e^{iz}}{1 + z^2}\, dz.
\]
Notice the real part of the integral vanishes, as the integrand is odd (hence the division by i). We consider the following contours of integration (notice they are both oriented counter-clockwise). By Cauchy’s Theorem, since the integrand is analytic in the region between the curves, we
Figure 7.7: Contours

have
\[
\int_{\gamma_1} \frac{z e^{iz}}{1 + z^2}\, dz = \int_{\gamma_2} \frac{z e^{iz}}{1 + z^2}\, dz.
\]
Now for γ2,
\[
\int_{\gamma_2} \frac{z e^{iz}}{(z - i)(z + i)}\, dz = \int_{\gamma_2} \frac{h(z)}{z - i}\, dz = 2\pi i\, h(i),
\]
for
\[
h(z) = \frac{z e^{iz}}{z + i},
\]
and so
\[
\int_{\gamma_2} \frac{z e^{iz}}{(z - i)(z + i)}\, dz = 2\pi i\, \frac{i e^{-1}}{2i} = \frac{\pi}{e}\, i.
\]
We now consider the integral over γ1, and look at the integral along the arc:
\[
\left| \int_{\text{arc}} \frac{z e^{iz}}{1 + z^2}\, dz \right| = \left| \int_0^{\pi} \frac{R e^{it}\, e^{-R\sin t + iR\cos t}}{1 + R^2 e^{2it}}\, iR e^{it}\, dt \right| \le \int_0^{\pi} \frac{R^2}{R^2 - 1}\, e^{-R\sin t}\, dt
\]
\[
\le 2 \int_0^{\pi} e^{-R\sin t}\, dt = 4 \int_0^{\pi/2} e^{-R\sin t}\, dt \le 4 \int_0^{\pi/2} e^{-2Rt/\pi}\, dt = -4\, \frac{\pi}{2R} \left[ e^{-2Rt/\pi} \right]_0^{\pi/2} = -\frac{2\pi}{R} \left[ e^{-R} - 1 \right] \xrightarrow{R\to\infty} 0,
\]
where we have used that R²/(R² − 1) ≤ 2 for R large, that sin t is symmetric about t = π/2, and that sin t ≥ 2t/π on [0, π/2].
Therefore
\[
\int_{-\infty}^{\infty} \frac{x \sin x}{1 + x^2}\, dx = \lim_{R\to\infty} \frac{1}{i} \int_{\gamma_1} \frac{z e^{iz}}{(z - i)(z + i)}\, dz = \frac{1}{i} \int_{\gamma_2} \frac{z e^{iz}}{(z - i)(z + i)}\, dz = \frac{\pi}{e}.
\]
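Checking π/e ≈ 1.1557 numerically is more delicate than the previous examples, because the integral is only conditionally convergent: truncating at R leaves an oscillatory tail of size roughly cos(R)·R/(1+R²). A standard workaround, sketched below with hypothetical parameter choices (not from the notes), is to evaluate the partial integral at cut-offs R = πk, where that tail term alternates in sign, and average over consecutive cut-offs to cancel it.

```python
import math

# x*sin(x)/(1+x^2) is even, so the integral over R equals 2 * integral_0^inf.
h = 0.005
targets = [math.pi * k for k in range(150, 171)]  # 21 cut-off radii R = pi*k
n = int(targets[-1] / h) + 2
total, prev, ti, partials = 0.0, 0.0, 0, []  # f(0) = 0
for i in range(1, n + 1):
    x = i * h
    fx = x * math.sin(x) / (1.0 + x * x)
    total += 0.5 * (prev + fx) * h  # cumulative trapezoid rule
    prev = fx
    while ti < len(targets) and x >= targets[ti]:
        partials.append(2.0 * total)  # partial integral up to R = pi*k
        ti += 1
# Averaging cancels the alternating tail term cos(pi*k)*R/(1+R^2)
value = sum(partials) / len(partials)
# value should be close to pi/e
```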
As the final example we consider an integrand that has a singularity along the natural path of integration:
\[
\int_{-\infty}^{\infty} \frac{\sin x}{x}\, dx.
\]
If we complexify the integrand, we are left with
\[
\frac{1}{i} \int_{-\infty}^{\infty} \frac{e^{iz}}{z}\, dz,
\]
since the real part of the integrand is odd. Since the denominator vanishes at a point on the real axis, we need to modify the contours we consider.
The contour is formed of 4 curves, and since the integrand is analytic in the region enclosed by the contour we know that
\[
\int_{\gamma_1} \frac{e^{iz}}{z}\, dz + \int_{\gamma_2} \frac{e^{iz}}{z}\, dz + \int_{\gamma_3} \frac{e^{iz}}{z}\, dz + \int_{\gamma_4} \frac{e^{iz}}{z}\, dz = 0.
\]
Figure 7.8: Contours

Now for γ1,
\[
\int_{\gamma_1} \frac{e^{iz}}{z}\, dz = \int_0^{\pi} \frac{e^{iR\cos t - R\sin t}}{R e^{it}}\, iR e^{it}\, dt,
\]
and so
\[
\left| \int_{\gamma_1} \frac{e^{iz}}{z}\, dz \right| \le \int_0^{\pi} e^{-R\sin t}\, dt \xrightarrow{R\to\infty} 0,
\]
as we have seen before. As for γ3, since it is oriented clock-wise,
\[
\int_{\gamma_3} \frac{e^{iz}}{z}\, dz = -\int_{\gamma_3^-} \frac{e^{iz}}{z}\, dz = -\int_0^{\pi} \frac{e^{i\varepsilon e^{it}}}{\varepsilon e^{it}}\, i\varepsilon e^{it}\, dt = -i \int_0^{\pi} e^{i\varepsilon\cos t}\, e^{-\varepsilon\sin t}\, dt \xrightarrow{\varepsilon\to 0} -\pi i.
\]
Therefore
\[
\int_{-\infty}^{\infty} \frac{\sin x}{x}\, dx = \frac{1}{i} \int_{-\infty}^{\infty} \frac{e^{iz}}{z}\, dz
= \lim_{\varepsilon\to 0} \lim_{R\to\infty} \frac{1}{i} \left[ \int_{\gamma_2} \frac{e^{iz}}{z}\, dz + \int_{\gamma_4} \frac{e^{iz}}{z}\, dz \right]
= \lim_{\varepsilon\to 0} \lim_{R\to\infty} \left[ -\frac{1}{i} \int_{\gamma_1} \frac{e^{iz}}{z}\, dz - \frac{1}{i} \int_{\gamma_3} \frac{e^{iz}}{z}\, dz \right] = \pi.
\]
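Like the previous example, ∫ sin x / x dx is only conditionally convergent, so a direct truncation converges slowly; truncating at R leaves a tail of size roughly cos(R)/R. The same averaging trick over cut-offs R = πk, where that term alternates in sign, gives a quick numerical confirmation of the value π (a sketch of our own, with hypothetical parameter choices).

```python
import math

def f(x):
    # sin(x)/x, extended continuously by f(0) = 1; the integrand is even
    return math.sin(x) / x if x != 0.0 else 1.0

h = 0.005
targets = [math.pi * k for k in range(150, 171)]  # cut-off radii R = pi*k
n = int(targets[-1] / h) + 2
total, prev, ti, partials = 0.0, f(0.0), 0, []
for i in range(1, n + 1):
    x = i * h
    fx = f(x)
    total += 0.5 * (prev + fx) * h  # cumulative trapezoid rule
    prev = fx
    while ti < len(targets) and x >= targets[ti]:
        partials.append(2.0 * total)  # 2 * partial integral, by evenness
        ti += 1
# Averaging over consecutive cut-offs cancels the alternating cos(R)/R tail
value = sum(partials) / len(partials)
# value should be close to pi
```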