
Basic Analysis II

Introduction to Real Analysis, Volume II

by Jiří Lebl

May 7, 2018
(version 2.0)

Typeset in LaTeX.

Copyright © 2012–2018 Jiří Lebl

This work is dual licensed under the Creative Commons Attribution-Noncommercial-Share Alike
4.0 International License and the Creative Commons Attribution-Share Alike 4.0 International
License. To view a copy of these licenses, visit
https://creativecommons.org/licenses/by-nc-sa/4.0/ or https://creativecommons.org/licenses/by-sa/4.0/, or send a letter to
Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

You can use, print, duplicate, and share this book as much as you want. You can base your own notes
on it and reuse parts if you keep the license the same. You can assume the license is either
CC-BY-NC-SA or CC-BY-SA, whichever is compatible with what you wish to do; your derivative
works must use at least one of the licenses.

During the writing of these notes, the author was in part supported by NSF grant DMS-1362337.

The date is the main version identifier. The major version / edition number is raised only if there
have been substantial changes. For example, version 2.0 is the first edition, 0th update (no updates yet).

See the book's website for more information (including contact information).


Introduction

About this book


This book is a continuation of “Basic Analysis”. It is meant to be a seamless continuation,
so the chapters are numbered to start where the first volume left off. The book started with my notes
for a second-semester undergraduate analysis course at the University of Wisconsin–Madison in 2012, where
I used my notes together with Rudin’s book. The choice of some of the material and many of the
proofs are very similar to Rudin’s, though I do try to provide more detail and context. In 2016, I
taught a second-semester undergraduate analysis course at Oklahoma State University and heavily modified
and cleaned up the notes, this time using them as the main text. In 2018, I taught this course again,
this time adding a chapter (which I originally wrote for the Wisconsin course).
I plan on eventually adding some more topics. As always, I will try to preserve the current numbering in
subsequent editions. The new topics I have planned would add chapters onto the end
of the book, or add sections to the end of existing chapters, and I will try as hard as possible to leave
exercise numbers unchanged.
For the most part, this second volume depends on the non-optional parts of volume I. Of the
optional parts, higher order derivatives (but not Taylor’s theorem itself) are used in , , .
Exponentials, logarithms, improper integrals are used a few times in examples and exercises, and
they are heavily used in .
This book is not necessarily the entire second semester course, though it should have enough
material for an entire semester if need be. What I had in mind for a two semester course is that some
bits of the first volume, such as metric spaces, are covered in the second semester, while some of
the optional topics of volume I are covered in the first semester. Leaving metric spaces for the second
semester makes more sense, as then the second semester is the “multivariable” part of the course.
Another possibility for a faster course is to leave out some of the optional parts, go quicker in
the first semester including metric spaces and then arrive at .
Several possibilities for things to cover after metric spaces, depending on the amount of time, are:
1) – , – , .
2) Chapter , chapter , and .
3) Chapters , , and .
4) Chapters , (maybe ), and .
5) Chapter , chapter , , , .
Chapter 8

Several Variables and Partial Derivatives

8.1 Vector spaces, linear mappings, and convexity


Note: 2–3 lectures

8.1.1 Vector spaces


The euclidean space Rn has already made an appearance in the metric space chapter. In this chapter,
we will extend the differential calculus we created for one variable to several variables. The key
idea in differential calculus is to approximate functions by lines and linear functions. In several
variables we must introduce a little bit of linear algebra before we can move on. So let us start with
vector spaces and linear functions on vector spaces.
While it is common to use x⃗ or the bold x for elements of Rn, especially in the applied sciences,
we use just plain x, which is common in mathematics. That is, v ∈ Rn is a vector, which means
v = (v1, v2, . . . , vn) is an n-tuple of real numbers.
It is common to write and treat vectors as column vectors, that is, n-by-1 matrices:

v = (v1, v2, . . . , vn) =
⎡ v1 ⎤
⎢ v2 ⎥
⎢  ⋮ ⎥
⎣ vn ⎦

We will do so when convenient. We call real numbers scalars to distinguish them from vectors.
In Rn we often think of vectors as a direction and a magnitude, and draw the vector as an arrow.
The vector (v1, v2, . . . , vn) is represented by the arrow from the origin to the point (v1, v2, . . . , vn);
see Figure 8.1 for the case of the plane R2. When we do think of vectors as arrows, they are not
necessarily based at the origin; a vector is simply the direction and the magnitude, and it does not
know where it starts.
On the other hand, each vector also represents a point in Rn . Usually we think of v ∈ Rn as
a point if we are thinking of Rn as a metric space, and we think of it as arrow if we think of the
so-called vector space structure on Rn . Let us define the abstract notion of the vector space, as there
are many other vector spaces than just Rn .

Subscripts are used for many purposes, so sometimes we may have several vectors that may also be identified by
subscript, such as a finite or infinite sequence of vectors y1 , y2 , . . ..

Figure 8.1: Vector as an arrow.

Definition 8.1.1. Let X be a set together with the operations of addition, + : X × X → X, and
multiplication, · : R × X → X, (we usually write ax instead of a · x). X is called a vector space (or a
real vector space) if the following conditions are satisfied:
(i) (Addition is associative) If u, v, w ∈ X, then u + (v + w) = (u + v) + w.
(ii) (Addition is commutative) If u, v ∈ X, then u + v = v + u.
(iii) (Additive identity) There is a 0 ∈ X such that v + 0 = v for all v ∈ X.
(iv) (Additive inverse) For every v ∈ X, there is a −v ∈ X, such that v + (−v) = 0.
(v) (Distributive law) If a ∈ R, u, v ∈ X, then a(u + v) = au + av.
(vi) (Distributive law) If a, b ∈ R, v ∈ X, then (a + b)v = av + bv.
(vii) (Multiplication is associative) If a, b ∈ R, v ∈ X, then (ab)v = a(bv).
(viii) (Multiplicative identity) 1v = v for all v ∈ X.
Elements of a vector space are usually called vectors, even if they are not elements of Rn (vectors in
the “traditional” sense).
If Y ⊂ X is a subset that is a vector space itself using the same operations, then Y is called a
subspace or a vector subspace of X.
Multiplication by scalars works as one would expect. For example, 2v = (1 + 1)v = 1v +
1v = v + v, similarly 3v = v + v + v, and so on. One particular fact we often use is that 0v = 0,
where the zero on the left is 0 ∈ R and the zero on the right is 0 ∈ X. To see this, start with
0v = (0 + 0)v = 0v + 0v, and add −(0v) to both sides to obtain 0 = 0v. Similarly, −v = (−1)v,
which follows by (−1)v + v = (−1)v + 1v = (−1 + 1)v = 0v = 0. From now on, we take these
algebraic facts, which follow quickly from the definition, for granted.
Example 8.1.2: An example vector space is Rn , where addition and multiplication by a scalar is
done componentwise: if a ∈ R, v = (v1 , v2 , . . . , vn ) ∈ Rn , and w = (w1 , w2 , . . . , wn ) ∈ Rn , then
v + w := (v1 , v2 , . . . , vn ) + (w1 , w2 , . . . , wn ) = (v1 + w1 , v2 + w2 , . . . , vn + wn ),
av := a(v1 , v2 , . . . , vn ) = (av1 , av2 , . . . , avn ).
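The componentwise operations of this example are easy to illustrate numerically. The following sketch (not part of the text) models vectors in Rn as tuples of floats:

```python
# A small sketch (not from the text): the componentwise operations of
# Example 8.1.2 on R^n, with vectors modeled as tuples of floats.

def add(v, w):
    """Componentwise addition: (v1 + w1, ..., vn + wn)."""
    return tuple(vi + wi for vi, wi in zip(v, w))

def scale(a, v):
    """Scalar multiplication: (a*v1, ..., a*vn)."""
    return tuple(a * vi for vi in v)

v = (1.0, 2.0, 3.0)
w = (4.0, 5.0, 6.0)

print(add(v, w))        # (5.0, 7.0, 9.0)
print(scale(2.0, v))    # (2.0, 4.0, 6.0)
print(add(v, w) == add(w, v))  # commutativity holds componentwise: True
```

Each vector space axiom for Rn reduces, coordinate by coordinate, to the corresponding property of real numbers, which is exactly what the componentwise definitions exploit.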
In this book we mostly deal with vector spaces that can often be regarded as subsets of Rn, but
there are other vector spaces useful in analysis. Let us give a couple of examples.
Example 8.1.3: A trivial example of a vector space is just X := {0}. The operations are defined in
the obvious way: 0 + 0 := 0 and a0 := 0. A zero vector must always exist, so all vector spaces are
nonempty sets, and this X is in fact the smallest possible vector space.

Example 8.1.4: The space C([0, 1], R) of continuous functions on the interval [0, 1] is a vector
space. For two functions f and g in C([0, 1], R) and a ∈ R, we make the obvious definitions of f + g
and a f :

( f + g)(x) := f (x) + g(x), (a f )(x) := a f (x) .

The 0 is the function that is identically zero. We leave it as an exercise to check that all the vector
space conditions are satisfied.
The space C1 ([0, 1], R) of continuously differentiable functions is a subspace of C([0, 1], R).

Example 8.1.5: The space of polynomials c0 + c1 t + c2 t^2 + · · · + cm t^m (of arbitrary degree m) is a
vector space. Let us denote it by R[t] (coefficients are real and the variable is t). The operations are
defined in the same way as for functions above. Suppose there are two polynomials, one of degree
m and one of degree n. Assume n ≥ m for simplicity. Then

(c0 + c1 t + c2 t^2 + · · · + cm t^m) + (d0 + d1 t + d2 t^2 + · · · + dn t^n)
= (c0 + d0) + (c1 + d1) t + (c2 + d2) t^2 + · · · + (cm + dm) t^m + dm+1 t^(m+1) + · · · + dn t^n

and

a (c0 + c1 t + c2 t^2 + · · · + cm t^m) = (a c0) + (a c1) t + (a c2) t^2 + · · · + (a cm) t^m.

Despite what it looks like, R[t] is not equivalent to Rn for any n. In particular, it is not “finite
dimensional”. We will make this notion precise in just a little bit. One can make a finite dimensional
vector subspace by restricting the degree. For example, if Pn is the set of polynomials of degree n
or less, then Pn is a finite dimensional vector space, and we could identify it with Rn+1 .
In the above, the variable t is really just a formal placeholder. By setting t equal to a real number
we obtain a function. So the space R[t] can be thought of as a subspace of C(R, R). If we restrict
the range of t to [0, 1], R[t] can be identified with a subspace of C([0, 1], R).
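The identification of Pm with R^(m+1) can be made concrete by storing a polynomial as its coefficient list. The sketch below (an assumed representation, not from the text) implements the operations of Example 8.1.5:

```python
# Sketch (assumed representation, not from the text): a polynomial
# c0 + c1 t + ... + cm t^m stored as its coefficient list [c0, ..., cm],
# which is the identification of P_m with R^(m+1) mentioned above.

def poly_add(c, d):
    """Add two polynomials; pad the shorter coefficient list with zeros,
    mirroring the degree-m plus degree-n formula of Example 8.1.5."""
    n = max(len(c), len(d))
    c = c + [0.0] * (n - len(c))
    d = d + [0.0] * (n - len(d))
    return [ci + di for ci, di in zip(c, d)]

def poly_scale(a, c):
    """Multiply every coefficient by the scalar a."""
    return [a * ci for ci in c]

# (1 + 2t) + (3 + 4t + 5t^2) = 4 + 6t + 5t^2
print(poly_add([1.0, 2.0], [3.0, 4.0, 5.0]))   # [4.0, 6.0, 5.0]
print(poly_scale(2.0, [1.0, 0.5]))             # [2.0, 1.0]
```

The full space R[t] cannot be captured by any single fixed-length list, which is the coefficient-level view of why R[t] is infinite dimensional.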

Remark 8.1.6. If X is a vector space, to check that a subset S ⊂ X is a vector subspace, we only need
1) 0 ∈ S,
2) S is closed under addition: adding two vectors in S gets us a vector in S, and
3) S is closed under scalar multiplication: multiplying a vector in S by a scalar gets us a vector in S.
Items 2) and 3) make sure that addition and scalar multiplication are in fact defined on S. Item 1) is
required to fulfill (iii), the existence of an additive identity. Existence of the additive inverse −v follows
because −v = (−1)v, and 3) says that −v ∈ S if v ∈ S. All other properties are certain equalities that
are already satisfied in X and thus must be satisfied in a subset.
It is often better to think of even the simpler “finite dimensional” vector spaces using the abstract
notion rather than always as Rn. It is possible to use fields other than R in the definition (for example,
it is common to use the complex numbers C), but let us stick with the real numbers.


If you want a very funky vector space over a different field, R itself is a vector space over the rational numbers.

8.1.2 Linear combinations and dimension


Definition 8.1.7. Suppose X is a vector space, x1 , x2 , . . . , xk ∈ X are vectors, and a1 , a2 , . . . , ak ∈ R
are scalars. Then
a1 x1 + a2 x2 + · · · + ak xk
is called a linear combination of the vectors x1 , x2 , . . . , xk .
If Y ⊂ X is a set, then the span of Y , or in notation span(Y ), is the set of all linear combinations
of all finite subsets of Y . We say Y spans span(Y ).

Example 8.1.8: Let Y := {(1, 1)} ⊂ R2. Then

span(Y) = {(x, x) ∈ R2 : x ∈ R}.

That is, span(Y) is the line through the origin and the point (1, 1).

Example 8.1.9: Let Y := {(1, 1), (0, 1)} ⊂ R2. Then

span(Y) = R2,

as any point (x, y) ∈ R2 can be written as a linear combination

(x, y) = x(1, 1) + (y − x)(0, 1).
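The decomposition in Example 8.1.9 can be spot-checked numerically. The sketch below (not from the text) computes the coefficients x and y − x and verifies the combination recovers (x, y):

```python
# A quick numerical check (not from the text) of Example 8.1.9:
# every (x, y) in R^2 equals x*(1,1) + (y - x)*(0,1).

def combo(x, y):
    """Form the linear combination with coefficients x on (1,1)
    and (y - x) on (0,1)."""
    a, b = x, y - x
    return (a * 1 + b * 0, a * 1 + b * 1)

for x, y in [(3.0, 5.0), (-2.0, 0.5), (0.0, 0.0)]:
    assert combo(x, y) == (x, y)
print("span check passed")
```

Since every point of R2 is reached, the two vectors span R2, exactly as the example asserts.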

Example 8.1.10: Let Y := {1, t, t^2, t^3, . . .} ⊂ R[t], and E := {1, t^2, t^4, t^6, . . .} ⊂ R[t].
The span of Y is all polynomials,

span(Y) = R[t].

The span of E is the set of polynomials with even powers of t only.

Suppose we have two linear combinations of vectors from Y. One linear combination uses
the vectors {x1, x2, . . . , xk}, and the other uses {x̃1, x̃2, . . . , x̃ℓ}. We can write both
linear combinations using vectors from the union {x1, x2, . . . , xk} ∪ {x̃1, x̃2, . . . , x̃ℓ}, by just taking
zero multiples of the vectors we do not need, e.g. x1 = x1 + 0x̃1. So given two linear
combinations, we can without loss of generality write them both as linear combinations of the same
vectors x1, x2, . . . , xk.
Then their sum is also a linear combination of vectors from Y :

(a1 x1 + a2 x2 + · · · + ak xk ) + (b1 x1 + b2 x2 + · · · + bk xk )
= (a1 + b1 )x1 + (a2 + b2 )x2 + · · · + (ak + bk )xk .

Similarly, a scalar multiple of a linear combination of vectors from Y is a linear combination of
vectors from Y:

b(a1 x1 + a2 x2 + · · · + ak xk) = ba1 x1 + ba2 x2 + · · · + bak xk.

We formalize this statement in a proposition.


Proposition 8.1.11. Let X be a vector space. For any Y ⊂ X, the set span(Y ) is a vector space
itself. That is, span(Y ) is a subspace of X.

If Y is already a vector space, then span(Y ) = Y .


Definition 8.1.12. A set of vectors {x1, x2, . . . , xk} ⊂ X is linearly independent if the only solution
to
a1 x1 + a2 x2 + · · · + ak xk = 0 (8.1)
is the trivial solution a1 = a2 = · · · = ak = 0. A set that is not linearly independent is linearly
dependent.
A linearly independent set of vectors B such that span(B) = X is called a basis of X.
If a vector space X contains a linearly independent set of d vectors, but no linearly independent
set of d + 1 vectors, then we say the dimension of X is d, and we write dim X := d. If for all d ∈ N
the vector space X contains a set of d linearly independent vectors, we say X is infinite dimensional
and write dim X := ∞.
For the trivial vector space {0}, we define dim {0} := 0.
For example, the set Y of the two vectors in Example 8.1.9 is a basis of R2, and so
dim R2 ≥ 2. We will see in a moment that any vector subspace of Rn has finite dimension, and
that dimension is less than or equal to n. It will follow that dim R2 = 2.
If a set is linearly dependent, then one of the vectors is a linear combination of the others. In
other words, if in (8.1) we have aj ≠ 0, then we can solve for xj:

xj = (−a1/aj) x1 + · · · + (−aj−1/aj) xj−1 + (−aj+1/aj) xj+1 + · · · + (−ak/aj) xk.

The vector xj then has at least two different representations as a linear combination of {x1, x2, . . . , xk}:
the one above, and xj itself.

For example, the set {(0, 1), (2, 3), (1, 0)} in R2 is linearly dependent:
3(0, 1) − (2, 3) + 2(1, 0) = 0, so (2, 3) = 3(0, 1) + 2(1, 0).
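The dependence relation above is easy to verify numerically. The following sketch (not part of the text) sums the combination coefficient by coefficient and confirms it is the zero vector:

```python
# Verifying the dependence relation above numerically (a sketch, not
# part of the text): 3*(0,1) - 1*(2,3) + 2*(1,0) = (0,0) in R^2.

vectors = [(0.0, 1.0), (2.0, 3.0), (1.0, 0.0)]
coeffs = [3.0, -1.0, 2.0]   # a nontrivial solution, so the set is dependent

total = (0.0, 0.0)
for a, v in zip(vectors, coeffs):
    pass  # placeholder to keep zip order clear below

total = (0.0, 0.0)
for a, v in zip(coeffs, vectors):
    total = (total[0] + a * v[0], total[1] + a * v[1])
print(total)  # (0.0, 0.0)
```

Because a nontrivial coefficient list produces the zero vector, no further computation is needed: by definition the three vectors are linearly dependent.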
Proposition 8.1.13. If B = {x1 , x2 , . . . , xk } is a basis of a vector space X, then every point y ∈ X
has a unique representation of the form
y = ∑_{j=1}^k aj xj

for some scalars a1, a2, . . . , ak.


Proof. Every y ∈ X is a linear combination of elements of B since X is the span of B. For uniqueness
suppose
y = ∑_{j=1}^k aj xj = ∑_{j=1}^k bj xj,

then

∑_{j=1}^k (aj − bj) xj = 0.
By linear independence of the basis a j = b j for all j.

For an infinite set Y ⊂ X, we would say Y is linearly independent if every finite subset of Y is linearly independent
in the sense given. However, this situation only comes up in infinitely many dimensions and we will not require it.

For Rn we define

e1 := (1, 0, 0, . . . , 0), e2 := (0, 1, 0, . . . , 0), ..., en := (0, 0, 0, . . . , 1),

and call this the standard basis of Rn . We use the same letters e j for any Rn , and which space Rn
we are working in is understood from context. A direct computation shows that {e1 , e2 , . . . , en } is
really a basis of Rn; it spans Rn and is linearly independent. In fact,

a = (a1, a2, . . . , an) = ∑_{j=1}^n aj ej.

Proposition 8.1.14. Let X be a vector space and d a nonnegative integer.


(i) If X is spanned by d vectors, then dim X ≤ d.
(ii) dim X = d if and only if X has a basis of d vectors (and so every basis has d vectors).
(iii) In particular, dim Rn = n.
(iv) If Y ⊂ X is a vector subspace and dim X = d, then dim Y ≤ d.
(v) If dim X = d and a set T of d vectors spans X, then T is linearly independent.
(vi) If dim X = d and a set T of m vectors is linearly independent, then there is a set S of d − m
vectors such that T ∪ S is a basis of X.

Proof. Let us start with (i). Suppose S = {x1, x2, . . . , xd} spans X, and T = {y1, y2, . . . , ym} is a set
of linearly independent vectors of X. We wish to show that m ≤ d. Write

y1 = ∑_{k=1}^d ak,1 xk,

for some numbers a1,1, a2,1, . . . , ad,1, which we can do as S spans X. One of the ak,1 is nonzero
(otherwise y1 would be zero), so suppose without loss of generality that it is a1,1. Then we solve

x1 = (1/a1,1) y1 − ∑_{k=2}^d (ak,1/a1,1) xk.

In particular, {y1, x2, . . . , xd} spans X, since x1 can be obtained from {y1, x2, . . . , xd}. Therefore,
there are some numbers a1,2, a2,2, . . . , ad,2 such that

y2 = a1,2 y1 + ∑_{k=2}^d ak,2 xk.

As T is linearly independent, one of the ak,2 for k ≥ 2 must be nonzero. Without loss of generality
suppose a2,2 ≠ 0. Proceed to solve for

x2 = (1/a2,2) y2 − (a1,2/a2,2) y1 − ∑_{k=3}^d (ak,2/a2,2) xk.

In particular, {y1, y2, x3, . . . , xd} spans X.

We continue this procedure. If m < d, then we are done. So suppose m ≥ d. After d steps we obtain
that {y1, y2, . . . , yd} spans X. Any other vector v in X is a linear combination of {y1, y2, . . . , yd},
and hence cannot be in T as T is linearly independent. So m = d.

Let us look at (ii). First, a short claim: If T is a set of k linearly independent vectors that do not
span X, that is, X \ span(T) ≠ ∅, then choose a vector v ∈ X \ span(T). The set T ∪ {v} is linearly
independent: a nontrivial linear combination of elements of T ∪ {v} equal to zero would either
produce v as a combination of elements of T, or it would be a nontrivial combination of elements of T
alone equal to zero, and neither option is possible.
If dim X = d, then there must exist some linearly independent set T of d vectors, and it must
span X, since otherwise we could choose a larger set of linearly independent vectors. So we have a basis
of d vectors. On the other hand, if we have a basis of d vectors, it is linearly independent and spans
X by definition. By (i), there is no set of d + 1 linearly independent vectors, so the dimension
of X must be d.
For (iii), notice that {e1, e2, . . . , en} is a basis of Rn.
To see (iv), suppose Y ⊂ X is a vector subspace, where dim X = d. As X cannot contain d + 1
linearly independent vectors, neither can Y.
For (v), suppose T is a set of m vectors that is linearly dependent and spans X; we will show that
m > d. One of the vectors is a linear combination of the others. If we remove it from T, we obtain a
set of m − 1 vectors that still spans X, and hence dim X ≤ m − 1 by (i).
For (vi), suppose T = {x1, x2, . . . , xm} is a linearly independent set. Firstly, m ≤ d. If m = d, we
are done. Otherwise, we follow the procedure above in the proof of (ii) to add a vector v not in the
span of T. The set T ∪ {v} is linearly independent, and its span has dimension m + 1. Therefore, we
can repeat this procedure d − m times before we find a set of d linearly independent vectors. They
must span X, since otherwise we could add yet another vector.

8.1.3 Linear mappings


A function f : X → Y , when Y is not R, is often called a mapping or a map rather than a function.
Definition 8.1.15. A mapping A : X → Y of vector spaces X and Y is linear (we also say A is a
linear transformation or a linear operator) if for all a ∈ R and all x, y ∈ X,

A(ax) = aA(x), and A(x + y) = A(x) + A(y).

We usually write Ax instead of A(x) if A is linear. If A is one-to-one and onto, then we say A is
invertible, and we denote the inverse by A−1 . If A : X → X is linear, then we say A is a linear
operator on X.
We write L(X,Y ) for the set of all linear transformations from X to Y , and just L(X) for the set
of linear operators on X. If a ∈ R and A, B ∈ L(X,Y ), define the transformations aA and A + B by

(aA)(x) := aAx, (A + B)(x) := Ax + Bx.

If A ∈ L(Y, Z) and B ∈ L(X,Y ), define the transformation AB as the composition A ◦ B, that is,

ABx := A(Bx).

Finally denote by I ∈ L(X) the identity: the linear operator such that Ix = x for all x.

It is not hard to see that aA ∈ L(X,Y ) and A + B ∈ L(X,Y ), and that AB ∈ L(X, Z). In particular,
L(X,Y ) is a vector space. As the set L(X) is not only a vector space, but also admits a product
(composition of operators), it is often called an algebra.
An immediate consequence of the definition of a linear mapping is: if A is linear, then A0 = 0.

Proposition 8.1.16. If A ∈ L(X,Y ) is invertible, then A−1 is linear.

Proof. Let a ∈ R and y ∈ Y. As A is onto, there is an x ∈ X such that y = Ax, and as A is also
one-to-one, A−1(Az) = z for all z ∈ X. So

A−1(ay) = A−1(aAx) = A−1(A(ax)) = ax = aA−1(y).

Similarly, let y1, y2 ∈ Y, and take x1, x2 ∈ X such that Ax1 = y1 and Ax2 = y2. Then

A−1(y1 + y2) = A−1(Ax1 + Ax2) = A−1(A(x1 + x2)) = x1 + x2 = A−1(y1) + A−1(y2).

Proposition 8.1.17. If A ∈ L(X, Y) is linear, then it is completely determined by its values on a
basis of X. Furthermore, if B is a basis of X, then any function Ã : B → Y extends to a linear
function A on X.

We will only prove this proposition for finite dimensional spaces, as we do not need infinite
dimensional spaces. For infinite dimensional spaces, the proof is essentially the same, but a little
trickier to write, so let us stick with finitely many dimensions.
Proof. Let {x1, x2, . . . , xn} be a basis of X, and let yj := Axj. Every x ∈ X has a unique representation

x = ∑_{j=1}^n bj xj

for some numbers b1, b2, . . . , bn. By linearity,

Ax = A(∑_{j=1}^n bj xj) = ∑_{j=1}^n bj Axj = ∑_{j=1}^n bj yj.

The “furthermore” follows by setting yj := Ã(xj), and then for x = ∑_{j=1}^n bj xj defining the
extension as Ax := ∑_{j=1}^n bj yj. The function A is well-defined by uniqueness of the representation
of x. We leave it to the reader to check that A is linear.
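The construction in this proof — extend from values on a basis by linearity — can be sketched in coordinates. Below is a small illustration (not part of the text) using the standard basis of R2, where the coefficients bj are simply the coordinates of x; the chosen values yj are arbitrary:

```python
# Sketch (not from the text) of Proposition 8.1.17 in coordinates:
# a linear map is pinned down by its values y_j on a basis; any x is
# expanded as sum b_j x_j and mapped to sum b_j y_j. With the standard
# basis of R^2, the coefficients b_j are just the coordinates of x.

basis_values = {0: (1.0, 1.0), 1: (0.0, 2.0)}   # y_j := A(e_j), chosen freely

def A(x):
    """Extend by linearity: A(x) = sum_j x_j * A(e_j)."""
    out = (0.0, 0.0)
    for j, bj in enumerate(x):
        yj = basis_values[j]
        out = (out[0] + bj * yj[0], out[1] + bj * yj[1])
    return out

print(A((3.0, 4.0)))   # 3*(1,1) + 4*(0,2) = (3.0, 11.0)
```

Changing the two values in `basis_values` produces a different linear map, which is the content of the “furthermore”: any assignment of values on the basis extends, and extends uniquely.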
The next proposition only works for finite dimensional vector spaces. It is a special case of the
so-called rank-nullity theorem from linear algebra.

Proposition 8.1.18. If X is a finite dimensional vector space and A ∈ L(X), then A is one-to-one if
and only if it is onto.

Proof. Let {x1, x2, . . . , xn} be a basis for X. Suppose A is one-to-one. Now suppose

∑_{j=1}^n cj Axj = A(∑_{j=1}^n cj xj) = 0.

As A is one-to-one, the only vector that is taken to 0 is 0 itself. Hence,

0 = ∑_{j=1}^n cj xj

and cj = 0 for all j. So {Ax1, Ax2, . . . , Axn} is a linearly independent set. By Proposition 8.1.14 and
the fact that the dimension is n, we conclude that {Ax1, Ax2, . . . , Axn} spans X. Any point x ∈ X can
be written as

x = ∑_{j=1}^n aj Axj = A(∑_{j=1}^n aj xj),

so A is onto.
Now suppose A is onto. As A is determined by its action on the basis, every element of X is in
the span of {Ax1, Ax2, . . . , Axn}. Suppose that for some c1, c2, . . . , cn,

A(∑_{j=1}^n cj xj) = ∑_{j=1}^n cj Axj = 0.

By Proposition 8.1.14 (v), as {Ax1, Ax2, . . . , Axn} span X, the set is linearly independent, and hence
cj = 0 for all j. In other words, if Ax = 0, then x = 0. This means that A is one-to-one: if Ax = Ay,
then A(x − y) = 0 and so x = y.
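In coordinates on R2, the proposition can be seen through determinants: a 2-by-2 matrix represents an operator that is one-to-one exactly when its determinant is nonzero, which is also exactly when it is onto. The sketch below (a coordinate illustration, not part of the text or its proof) checks two concrete matrices:

```python
# A coordinate sketch (not from the text): for a linear operator on R^2
# given by a 2x2 matrix, "one-to-one" and "onto" coincide, and both are
# equivalent to a nonzero determinant.

def det2(m):
    """Determinant of a 2x2 matrix given as a list of rows."""
    (a, b), (c, d) = m
    return a * d - b * c

invertible = [[1.0, 2.0], [3.0, 4.0]]    # det = -2: injective and surjective
singular   = [[1.0, 2.0], [2.0, 4.0]]    # det = 0: neither

print(det2(invertible) != 0.0)  # True
print(det2(singular) != 0.0)    # False
```

The singular matrix sends both (2, −1) and (0, 0) to the zero vector, so it is neither one-to-one nor onto — the two failures occur together, as the proposition predicts for finite dimensions.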
We leave the proof of the next proposition as an exercise.

Proposition 8.1.19. If X and Y are finite dimensional vector spaces, then L(X,Y ) is also finite
dimensional.

Finally, let us note that we often identify a finite dimensional vector space X of dimension n
with Rn, provided we fix a basis {x1, x2, . . . , xn} in X. That is, we define a bijective linear map
A ∈ L(X, Rn) by Axj := ej, where {e1, e2, . . . , en} is the standard basis in Rn. Then we have the
correspondence

∑_{j=1}^n cj xj ∈ X  ↦  (c1, c2, . . . , cn) ∈ Rn.

8.1.4 Convexity
A subset U of a vector space is convex if whenever x, y ∈ U, the line segment from x to y lies in U.
That is, if the convex combination (1 − t)x + ty is in U for all t ∈ [0, 1]. Sometimes we write [x, y]
for this line segment. See Figure 8.2.
In R, convex sets are precisely the intervals, which are also precisely the connected sets. In
two or more dimensions there are lots of nonconvex connected sets. For example, the set R2 \ {0}
is not convex, but it is connected. To see that it is not convex, take any x ∈ R2 \ {0} and let y := −x. Then
(1/2)x + (1/2)y = 0, which is not in the set. Balls in Rn are convex. We use this result often enough
that we state it as a proposition, and leave the proof as an exercise.

Proposition 8.1.20. Let x ∈ Rn and r > 0. The ball B(x, r) ⊂ Rn (using the standard metric on Rn )
is convex.
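A randomized spot check is no substitute for the proof (which the exercise asks for), but it illustrates what the proposition claims. The sketch below (not from the text) samples points of the unit ball B(0, 1) in R2 and checks that convex combinations stay inside:

```python
# Numerical spot check (not a proof) of Proposition 8.1.20: convex
# combinations of two points of the unit ball B(0, 1) in R^2 stay in it.
import math
import random

def norm(p):
    """Euclidean norm of a point in R^2."""
    return math.hypot(p[0], p[1])

random.seed(0)
ok = True
for _ in range(1000):
    # two random points; coordinates in (-0.7, 0.7) keep them inside B(0, 1)
    x = (random.uniform(-0.7, 0.7), random.uniform(-0.7, 0.7))
    y = (random.uniform(-0.7, 0.7), random.uniform(-0.7, 0.7))
    t = random.random()
    z = ((1 - t) * x[0] + t * y[0], (1 - t) * x[1] + t * y[1])
    ok = ok and (norm(z) < 1)
print(ok)  # True
```

The actual proof is a one-line triangle-inequality estimate: ‖(1 − t)x + ty − c‖ ≤ (1 − t)‖x − c‖ + t‖y − c‖ < r for a ball B(c, r), which the random sampling above merely samples.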

Figure 8.2: Convexity.

Example 8.1.21: As a convex combination is in particular a linear combination, any subspace V of
a vector space X is convex.

Example 8.1.22: A somewhat more complicated example is given by the following. Let C([0, 1], R)
be the vector space of continuous real-valued functions on [0, 1]. Let X ⊂ C([0, 1], R) be the set of
those f such that

∫_0^1 f(x) dx ≤ 1   and   f(x) ≥ 0 for all x ∈ [0, 1].

Then X is convex. Take t ∈ [0, 1], and note that if f, g ∈ X, then t f(x) + (1 − t)g(x) ≥ 0 for all x.
Furthermore,

∫_0^1 (t f(x) + (1 − t)g(x)) dx = t ∫_0^1 f(x) dx + (1 − t) ∫_0^1 g(x) dx ≤ 1.

Note that X is not a vector subspace of C([0, 1], R).

Proposition 8.1.23. The intersection of two convex sets is convex. In fact, if {Cλ}λ∈I is an arbitrary
collection of convex sets, then

C := ⋂_{λ∈I} Cλ

is convex.

Proof. If x, y ∈ C, then x, y ∈ Cλ for all λ ∈ I, and hence if t ∈ [0, 1], then tx + (1 − t)y ∈ Cλ for all
λ ∈ I. Therefore, tx + (1 − t)y ∈ C and C is convex.

Proposition 8.1.24. Let T : V → W be a linear mapping between two vector spaces and let C ⊂ V
be a convex set. Then T (C) is convex.

Proof. Take any two points p, q ∈ T(C). Pick x, y ∈ C such that Tx = p and Ty = q. As C is convex,
tx + (1 − t)y ∈ C for all t ∈ [0, 1], so

t p + (1 − t)q = t Tx + (1 − t)Ty = T(tx + (1 − t)y) ∈ T(C).

For completeness, a very useful construction is the convex hull. Given any subset S ⊂ V of a vector
space, define the convex hull of S by

co(S) := ⋂ {C ⊂ V : S ⊂ C, and C is convex}.

That is, the convex hull is the smallest convex set containing S. By a proposition above, the
intersection of convex sets is convex and hence, the convex hull is convex.

Example 8.1.25: The convex hull of 0 and 1 in R is [0, 1]. Proof: Any convex set containing 0 and
1 must contain [0, 1]. The set [0, 1] is convex, therefore it must be the convex hull.

8.1.5 Exercises
Exercise 8.1.1: Show that in Rn (with the standard euclidean metric) for any x ∈ Rn and any r > 0, the ball
B(x, r) is convex.

Exercise 8.1.2: Verify that Rn is a vector space.

Exercise 8.1.3: Let X be a vector space. Prove that a finite set of vectors {x1, x2, . . . , xn} ⊂ X is linearly
independent if and only if for every j = 1, 2, . . . , n,

span({x1, . . . , xj−1, xj+1, . . . , xn}) ⊊ span({x1, x2, . . . , xn}).

That is, the span of the set with one vector removed is strictly smaller.
Exercise 8.1.4: Show that the set X ⊂ C([0, 1], R) of those functions such that ∫_0^1 f = 0 is a vector subspace.

Exercise 8.1.5 (Challenging): Prove C([0, 1], R) is an infinite dimensional vector space where the operations
are defined in the obvious way: s = f + g and m = a f are defined as s(x) := f(x) + g(x) and m(x) := a f(x).
Hint: For the dimension, think of functions that are only nonzero on the interval (1/(n+1), 1/n).

Exercise 8.1.6: Let k : [0, 1]^2 → R be continuous. Show that L : C([0, 1], R) → C([0, 1], R) defined by

L f(y) := ∫_0^1 k(x, y) f(x) dx

is a linear operator. That is, first show that L is well-defined by showing that L f is continuous whenever f is,
and then show that L is linear.

Exercise 8.1.7: Let Pn be the vector space of polynomials in one variable of degree n or less. Show that
Pn is a vector space of dimension n + 1.

Exercise 8.1.8: Let R[t] be the vector space of polynomials in one variable t. Let D : R[t] → R[t] be the
derivative operator (derivative in t). Show that D is a linear operator.

Exercise 8.1.9: Let us show that Proposition 8.1.18 only works in finite dimensions. Take R[t] and define the
operator A : R[t] → R[t] by A(P(t)) := t P(t). Show that A is linear and one-to-one, but show that it is not
onto.

Exercise 8.1.10: Finish the proof of Proposition 8.1.17 in the finite dimensional case. That is, suppose
{x1, x2, . . . , xn} is a basis of X, {y1, y2, . . . , yn} ⊂ Y, and we define a function

Ax := ∑_{j=1}^n bj yj   if   x = ∑_{j=1}^n bj xj.

Then prove that A : X → Y is linear.

Exercise 8.1.11: Prove that if X and Y are finite dimensional vector spaces, then L(X,Y ) is also finite dimensional. Hint: A linear transformation is determined by its action on a basis. So given two bases {x1, . . . , xn} and {y1, . . . , ym} for X and Y respectively, consider the linear operators Ajk that send Ajk xj = yk, and Ajk xℓ = 0 if ℓ ≠ j.

Exercise 8.1.12 (Easy): Suppose X and Y are vector spaces and A ∈ L(X,Y ) is a linear operator.
a) Show that the nullspace N := {x ∈ X : Ax = 0} is a vector space.
b) Show that the range R := {y ∈ Y : Ax = y for some x ∈ X} is a vector space.

Exercise 8.1.13 (Easy): Show by example that a union of convex sets need not be convex.

Exercise 8.1.14: Compute the convex hull of the set of 3 points (0, 0), (0, 1), (1, 1) in R2 .

Exercise 8.1.15: Show that the set {(x, y) ∈ R2 : y > x²} is a convex set.
Exercise 8.1.16: Show that the set X ⊂ C([0, 1], R) of those functions such that ∫₀¹ f = 1 is a convex set, but not a vector subspace.

Exercise 8.1.17: Show that every convex set in Rn is connected using the standard topology on Rn .

Exercise 8.1.18: Suppose K ⊂ R2 is a convex set such that the only point of the form (x, 0) in K is the point (0, 0). Further suppose that (0, 1) ∈ K and (1, 1) ∈ K. Then show that if (x, y) ∈ K, then y > 0 unless x = 0.

Exercise 8.1.19: Prove that an arbitrary intersection of vector subspaces is a vector subspace. That is, if X is a vector space and {Vλ}λ∈I is an arbitrary collection of vector subspaces of X, then ⋂_{λ∈I} Vλ is a vector subspace of X.

8.2 Analysis with vector spaces


Note: 3 lectures

8.2.1 Norms
Let us start measuring distance.

Definition 8.2.1. If X is a vector space, then we say a function k·k : X → R is a norm if:
(i) kxk ≥ 0, with kxk = 0 if and only if x = 0.
(ii) kcxk = |c| kxk for all c ∈ R and x ∈ X.
(iii) kx + yk ≤ kxk + kyk for all x, y ∈ X (Triangle inequality).
A vector space equipped with a norm is called a normed vector space.

Given a norm (any norm) on a vector space X, we define a distance d(x, y) := kx − yk, and this
d makes X into a metric space (exercise).
Before defining the standard norm on Rn , let us define the standard scalar dot product on Rn .
For two vectors x = (x1, x2, . . . , xn) ∈ Rn and y = (y1, y2, . . . , yn) ∈ Rn, define

x · y := ∑_{j=1}^n xj yj.

The dot product is linear in each variable separately, or in more fancy language, it is bilinear. That is, if y is fixed, the map x ↦ x · y is a linear map from Rn to R. Similarly, if x is fixed, then y ↦ x · y is also linear. It is also symmetric in the sense that x · y = y · x. The Euclidean norm is defined as
kxk := kxkRn := √(x · x) = √((x1)² + (x2)² + · · · + (xn)²).
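These definitions translate directly into code. Here is a small sketch in pure Python (the helper names dot and norm are our own, not from the text):

```python
import math

def dot(x, y):
    # x . y = sum of x_j * y_j over the components
    return sum(xj * yj for xj, yj in zip(x, y))

def norm(x):
    # euclidean norm: sqrt(x . x)
    return math.sqrt(dot(x, x))

# symmetry and linearity in the first slot, checked on sample vectors
x, y, z = [1.0, 2.0, 3.0], [4.0, -1.0, 0.5], [2.0, 0.0, -2.0]
assert dot(x, y) == dot(y, x)
xz = [a + 2 * b for a, b in zip(x, z)]  # the vector x + 2z
assert abs(dot(xz, y) - (dot(x, y) + 2 * dot(z, y))) < 1e-12
```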

We normally just use kxk, but sometimes it is necessary to emphasize that we are talking about the euclidean norm, and then we use kxkRn. It is easy to see that the Euclidean norm satisfies (i) and (ii). To prove that (iii) holds, the key inequality is the so-called Cauchy–Schwarz inequality we saw before.
As this inequality is so important let us restate and reprove a slightly stronger version using the
notation of this chapter.

Theorem 8.2.2 (Cauchy–Schwarz inequality). Let x, y ∈ Rn, then

|x · y| ≤ kxk kyk = √(x · x) √(y · y),

with equality if and only if x = λy or y = λx for some λ ∈ R.

Proof. If x = 0 or y = 0, then the theorem holds trivially. So assume x ≠ 0 and y ≠ 0.
If x is a scalar multiple of y, that is x = λy for some λ ∈ R, then the theorem holds with equality:

|x · y| = |λ y · y| = |λ| |y · y| = |λ| kyk² = kλyk kyk = kxk kyk.


Next, take x + ty and expand; we find that kx + tyk² is a quadratic polynomial in t:

kx + tyk² = (x + ty) · (x + ty) = x · x + x · ty + ty · x + ty · ty = kxk² + 2t(x · y) + t² kyk².

If x is not a scalar multiple of y, then x + ty ≠ 0 for every t, and so kx + tyk² > 0 for all t. So the polynomial kx + tyk² is never zero. Elementary algebra says that the discriminant must be negative:

4(x · y)² − 4kxk² kyk² < 0,

or in other words (x · y)² < kxk² kyk², and taking square roots finishes the proof.


Item (iii), the triangle inequality in Rn, follows via the following computation:

kx + yk² = x · x + y · y + 2(x · y) ≤ kxk² + kyk² + 2 kxk kyk = (kxk + kyk)².

The distance d(x, y) := kx − yk is the standard distance (standard metric) on Rn that we used
when we talked about metric spaces.
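Both inequalities are easy to sanity-check numerically. The following sketch (an illustration, not a proof; the helper names are our own) tests them on random vectors:

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(5)]
    y = [random.uniform(-1, 1) for _ in range(5)]
    # Cauchy–Schwarz: |x . y| <= ||x|| ||y||
    assert abs(dot(x, y)) <= norm(x) * norm(y) + 1e-12
    # triangle inequality: ||x + y|| <= ||x|| + ||y||
    s = [a + b for a, b in zip(x, y)]
    assert norm(s) <= norm(x) + norm(y) + 1e-12
```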
Definition 8.2.3. Let A ∈ L(X,Y ). Define

kAk := sup{ kAxk : x ∈ X with kxk = 1 }.

The number kAk (possibly ∞) is called the operator norm. We will see below that indeed it is a
norm for finite dimensional spaces. Again, when necessary to emphasize which norm we are talking
about, we may write it as kAkL(X,Y ) .

By linearity, kA(x/kxk)k = kAxk/kxk for any nonzero x ∈ X. The vector x/kxk is of norm 1. Therefore,

kAk = sup{ kAxk : x ∈ X with kxk = 1 } = sup_{x∈X, x≠0} kAxk/kxk.

This implies that, assuming kAk is not infinity,

kAxk ≤ kAkkxk.

It is not hard to see from the definition that kAk = 0 if and only if A = 0, that is, if A takes every
vector to the zero vector.
It is also not difficult to compute the operator norm of the identity operator:
kIk = sup_{x≠0} kIxk/kxk = sup_{x≠0} kxk/kxk = 1.
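For a concrete 2-by-2 matrix, one can estimate the operator norm straight from the definition by maximizing kAxk over unit vectors x = (cos θ, sin θ). A small sketch (the sampling approach is our own and only approximates the supremum):

```python
import math

def op_norm_2x2(A, samples=100000):
    # estimate sup{ ||Ax|| : ||x|| = 1 } by sampling the unit circle
    best = 0.0
    for k in range(samples):
        t = 2 * math.pi * k / samples
        x1, x2 = math.cos(t), math.sin(t)
        y1 = A[0][0] * x1 + A[0][1] * x2
        y2 = A[1][0] * x1 + A[1][1] * x2
        best = max(best, math.hypot(y1, y2))
    return best

# the identity has operator norm 1; diag(1, 2) has operator norm 2
assert abs(op_norm_2x2([[1, 0], [0, 1]]) - 1) < 1e-6
assert abs(op_norm_2x2([[1, 0], [0, 2]]) - 2) < 1e-6
```

With a fine enough grid, the estimate agrees with the exact operator norm for these diagonal examples.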

The operator norm is not always a norm on L(X,Y ), in particular, kAk is not always finite for
A ∈ L(X,Y ). We prove below that kAk is finite when X is finite dimensional. This also implies
that A is continuous. For infinite dimensional spaces neither statement needs to be true. For
example, take the vector space of continuously differentiable functions on [0, 2π ] and as the norm
use the uniform norm. The functions t 7→ sin(nt) have norm 1, but the derivatives have norm n. So

differentiation (which is a linear operator) has infinite operator norm on this space. We will stick to
finite dimensional spaces.
When we talk about a finite dimensional vector space X, one often thinks of Rn , although if we
have a norm on X, the norm might not be the standard euclidean norm. In the exercises, you can
prove that every norm is “equivalent” to the euclidean norm in that the topology it generates is the
same. For simplicity, we only prove the following proposition for the euclidean space, and the proof
for a general finite dimensional space is left as an exercise.

Proposition 8.2.4. Let X and Y be normed vector spaces. Suppose that X is finite dimensional. If
A ∈ L(X,Y ), then kAk < ∞, and A is uniformly continuous (Lipschitz with constant kAk).

Proof. As we said we only prove the proposition for euclidean space so suppose that X = Rn and
the norm is the standard euclidean norm. The general case is left as an exercise.
Let {e1, e2, . . . , en} be the standard basis of Rn. Write x ∈ Rn, with kxk = 1, as

x = ∑_{j=1}^n cj ej.

Since ej · eℓ = 0 whenever j ≠ ℓ and ej · ej = 1, we get cj = x · ej, and by Cauchy–Schwarz,

|cj| = |x · ej| ≤ kxk kejk = 1.

Then

kAxk = k ∑_{j=1}^n cj Aej k ≤ ∑_{j=1}^n |cj| kAejk ≤ ∑_{j=1}^n kAejk.

The right hand side does not depend on x. We found a finite upper bound independent of x, so
kAk < ∞.
For any normed vector spaces X and Y , and A ∈ L(X,Y ), suppose that kAk < ∞. For v, w ∈ X,

kAv − Awk = kA(v − w)k ≤ kAk kv − wk.

As kAk < ∞, then this says A is Lipschitz with constant kAk.

Proposition 8.2.5. Let X, Y , and Z be finite dimensional normed vector spaces.


(i) If A, B ∈ L(X,Y ) and c ∈ R, then

kA + Bk ≤ kAk + kBk, kcAk = |c| kAk.

In particular, the operator norm is a norm on the vector space L(X,Y ).


(ii) If A ∈ L(X,Y ) and B ∈ L(Y, Z), then

kBAk ≤ kBk kAk.



(Footnote: If we strike the “In particular” part and interpret the algebra with infinite operator norms properly, namely decree that 0 times ∞ is 0, then this result also holds for infinite dimensional spaces.)

Proof. First, since all the spaces are finite dimensional, then all the operator norms are finite, and
the statements make sense to begin with.
For (i),

k(A + B)xk = kAx + Bxk ≤ kAxk + kBxk ≤ kAk kxk + kBk kxk = (kAk + kBk) kxk.

So kA + Bk ≤ kAk + kBk.
Similarly, 
k(cA)xk = |c| kAxk ≤ |c| kAk kxk.
Thus kcAk ≤ |c| kAk. Next,
|c| kAxk = kcAxk ≤ kcAk kxk.
Hence |c| kAk ≤ kcAk.
For (ii), write

kBAxk ≤ kBk kAxk ≤ kBk kAk kxk.

As a norm defines a metric, there is a metric space topology on L(X,Y ) for finite dimensional
vector spaces, so we can talk about open/closed sets, continuity, and convergence.

Proposition 8.2.6. Let X be a finite dimensional normed vector space. Let GL(X) ⊂ L(X) be the
set of invertible linear operators.
(i) If A ∈ GL(X), B ∈ L(X), and

kA − Bk < 1/kA−1k,   (8.2)

then B is invertible.
(ii) GL(X) is an open subset, and A ↦ A−1 is a continuous function on GL(X).

Let us make sense of this proposition on a simple example. Consider X = R1, where linear operators are just numbers a and the operator norm of a is |a|. The operator a is invertible (a−1 = 1/a) whenever a ≠ 0. The condition |a − b| < 1/|a−1| does indeed imply that b is not zero. And a ↦ 1/a is a continuous map. When n > 1, then there are other noninvertible operators than just zero, and in general things are a bit more difficult.

Proof. Let us prove (i). We know something about A−1 and A − B. These are linear operators, so let us apply them to a vector:
A−1 (A − B)x = x − A−1 Bx.
Therefore,

kxk = kA−1(A − B)x + A−1Bxk ≤ kA−1k kA − Bk kxk + kA−1k kBxk.

Now assume x ≠ 0, and so kxk ≠ 0. Using (8.2) we obtain

kxk < kxk + kA−1k kBxk,



or in other words kBxk ≠ 0 for all nonzero x, and hence Bx ≠ 0 for all nonzero x. This is enough to see that B is one-to-one (if Bx = By, then B(x − y) = 0, so x = y). As B is a one-to-one operator from X to X, which is finite dimensional, B is invertible.
Let us prove (ii). Fix some A ∈ GL(X). Let B be near A, specifically kA − Bk < 1/(2kA−1k). Then (8.2) is satisfied and B is invertible. We have shown above (using B−1y instead of x)

kB−1yk ≤ kA−1k kA − Bk kB−1yk + kA−1k kyk ≤ (1/2) kB−1yk + kA−1k kyk,

or

kB−1yk ≤ 2kA−1k kyk.

So kB−1k ≤ 2kA−1k.
Now
A−1 (A − B)B−1 = A−1 (AB−1 − I) = B−1 − A−1 ,
and
kB−1 − A−1 k = kA−1 (A − B)B−1 k ≤ kA−1 k kA − Bk kB−1 k ≤ 2kA−1 k2 kA − Bk.
Therefore, as B tends to A, kB−1 − A−1 k tends to 0, and so the inverse operation is a continuous
function at A.
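A numeric illustration of this continuity for 2-by-2 matrices, using the adjugate formula for the inverse (our own helper code; the entrywise max below is only a crude distance, but it induces the same topology as the operator norm):

```python
def inv2(A):
    # inverse of a 2-by-2 matrix via the adjugate; requires det != 0
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def dist(A, B):
    # crude matrix distance: maximum entrywise difference
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))

A = [[2.0, 1.0], [0.0, 1.0]]
Ainv = inv2(A)
# perturb A slightly; the inverse moves only slightly as well
for eps in [1e-2, 1e-4, 1e-6]:
    B = [[2.0 + eps, 1.0], [0.0, 1.0 - eps]]
    assert dist(inv2(B), Ainv) < 10 * eps
```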

8.2.2 Matrices
Once we fix a basis in a finite dimensional vector space X, we can represent a vector of X as an n-tuple of numbers, that is, a vector in Rn. The same thing can be done with L(X,Y ), which brings us to matrices: a convenient way to represent finite-dimensional linear transformations.
Suppose {x1 , x2 , . . . , xn } and {y1 , y2 , . . . , ym } are bases for vector spaces X and Y respectively. A
linear operator is determined by its values on the basis. Given A ∈ L(X,Y ), Ax j is an element of Y .
Define the numbers ai,j as follows:

Axj = ∑_{i=1}^m ai,j yi,   (8.3)
and write them as a matrix

A = [ a1,1  a1,2  · · ·  a1,n
      a2,1  a2,2  · · ·  a2,n
        ⋮      ⋮     ⋱      ⋮
      am,1  am,2  · · ·  am,n ].
We sometimes write A as [ai,j]. We say A is an m-by-n matrix. The columns of the matrix are precisely the coefficients that represent Axj in terms of the basis {y1, y2, . . . , ym}. If we know the numbers ai,j, then via the formula (8.3) we find the corresponding linear operator, as it is determined by the action on a basis. Hence, once we fix a basis on X and on Y, we have a one-to-one correspondence between L(X,Y ) and the m-by-n matrices.
When

z = ∑_{j=1}^n cj xj,

then

Az = ∑_{j=1}^n cj Axj = ∑_{j=1}^n cj ( ∑_{i=1}^m ai,j yi ) = ∑_{i=1}^m ( ∑_{j=1}^n ai,j cj ) yi,

which gives rise to the familiar rule for matrix multiplication.


There is a one-to-one correspondence between matrices and linear operators in L(X,Y ), once we fix a basis in X and in Y . If we choose a different basis, we get different matrices. This is an important distinction: the operator A acts on elements of X, while the matrix acts on n-tuples of numbers, that is, vectors of Rn.
If B is an n-by-r matrix with entries bj,k, then the matrix for C = AB is an m-by-r matrix whose (i, k)th entry ci,k is

ci,k = ∑_{j=1}^n ai,j bj,k.

A way to remember it: if you order the indices as we do, that is, row then column, and put the elements in the same order as the matrices, then it is the “middle index” that is “summed out.”
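The rule for the entries ci,k translates directly into code; a minimal sketch (our own helper, not from the text):

```python
def matmul(A, B):
    # (m-by-n) times (n-by-r): entry (i, k) sums A[i][j] * B[j][k] over j
    m, n, r = len(A), len(B), len(B[0])
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(r)]
            for i in range(m)]

A = [[1, 2], [3, 4]]           # 2-by-2
B = [[5, 6, 7], [8, 9, 10]]    # 2-by-3
assert matmul(A, B) == [[21, 24, 27], [47, 54, 61]]
```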
A linear mapping changing one basis to another is represented by a square matrix in which
the columns represent vectors of the second basis in terms of the first basis. We call such a linear
mapping a change of basis. So for two choices of a basis in an n-dimensional vector space, there is
a linear mapping (a change of basis) taking one basis to the other, and this corresponds to an n-by-n
matrix which does the corresponding operation on Rn .
Suppose X = Rn, Y = Rm, and all the bases are just the standard bases. Using the Cauchy–Schwarz inequality, compute

kAzk² = ∑_{i=1}^m ( ∑_{j=1}^n ai,j cj )² ≤ ∑_{i=1}^m ( ∑_{j=1}^n (cj)² ) ( ∑_{j=1}^n (ai,j)² ) = ( ∑_{i=1}^m ∑_{j=1}^n (ai,j)² ) kzk².

In other words, we have a bound on the operator norm (note that equality rarely happens):

kAk ≤ √( ∑_{i=1}^m ∑_{j=1}^n (ai,j)² ).

If the entries go to zero, then kAk goes to zero. In particular, if A is fixed and B is changing such
that the entries of A − B go to zero, then B goes to A in operator norm. That is, B goes to A in the
metric space topology induced by the operator norm. We proved the first part of:

Proposition 8.2.7. If f : S → Rnm is a continuous function for a metric space S, then considering
the components of f as the entries of a matrix, f is a continuous mapping from S to L(Rn , Rm ).
Conversely, if f : S → L(Rn , Rm ) is a continuous function, then the entries of the corresponding
matrix are continuous functions.

Let us prove the second part. Take f (x)ej, which is a continuous function of S to Rm with the standard euclidean norm: k f (x)ej − f (y)ejk = k( f (x) − f (y))ejk ≤ k f (x) − f (y)k, so as x → y, we get k f (x) − f (y)k → 0 and so k f (x)ej − f (y)ejk → 0. Such a function is continuous if and only if its components are continuous, and these are the components of the jth column of the matrix f (x).
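The bound kAk ≤ √(∑i ∑j (ai,j)²) above is easy to test numerically. A sketch (our own helper names) comparing kAxk against the bound on sampled unit vectors:

```python
import math

def frob(A):
    # square root of the sum of squared entries, an upper bound for ||A||
    return math.sqrt(sum(a * a for row in A for a in row))

def apply(A, x):
    # matrix times column vector
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

A = [[1.0, 2.0], [3.0, 4.0]]
bound = frob(A)  # sqrt(30), roughly 5.477
# ||Ax|| <= bound * ||x|| on sampled unit vectors
for k in range(1000):
    t = 2 * math.pi * k / 1000
    x = [math.cos(t), math.sin(t)]
    assert norm(apply(A, x)) <= bound + 1e-12
```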

8.2.3 Determinants
A certain number can be assigned to square matrices that measures how the corresponding linear
mapping stretches space. In particular, this number, called the determinant, can be used to test for
invertibility of a matrix.
Define the symbol sgn(x) (read “sign of x”) for a number x by

sgn(x) :=  −1  if x < 0,
            0  if x = 0,
            1  if x > 0.

Suppose σ = (σ1 , σ2 , . . . , σn ) is a permutation of the integers (1, 2, . . . , n), that is, a reordering of
(1, 2, . . . , n). Let
sgn(σ) = sgn(σ1, . . . , σn) := ∏_{p<q} sgn(σq − σp).   (8.4)
Here ∏ stands for multiplication, similarly to how ∑ stands for summation.
Any permutation can be obtained by a sequence of transpositions (switchings of two elements). We say a permutation is even (resp. odd) if it takes an even (resp. odd) number of
transpositions to get from (1, 2, . . . , n) to σ . For example, (2, 4, 3, 1) is two transpositions away
from (1, 2, 3, 4) and is therefore even: (1, 2, 3, 4) → (2, 1, 3, 4) → (2, 4, 3, 1). Being even or odd is
well-defined: sgn(σ ) is 1 if σ is even and −1 if σ is odd (exercise). This fact can be proved by
noting that applying a transposition changes the sign, and computing that sgn(1, 2, . . . , n) = 1.
Let Sn be the set of all permutations on n elements (the symmetric group). Let A = [ai, j ] be a
square n-by-n matrix. Define the determinant of A:

det(A) := ∑_{σ∈Sn} sgn(σ) ∏_{i=1}^n ai,σi.
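The permutation formula, while inefficient (it has n! terms), can be implemented directly. A sketch in Python (our own code) that computes sgn by counting inversions, each of which contributes a factor −1 in the product (8.4):

```python
from itertools import permutations

def sgn(sigma):
    # count inverted pairs p < q with sigma[q] < sigma[p]; each flips the sign
    s = 1
    for p in range(len(sigma)):
        for q in range(p + 1, len(sigma)):
            if sigma[q] < sigma[p]:
                s = -s
    return s

def det(A):
    # sum over all permutations of sgn(sigma) * product of A[i][sigma[i]]
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sgn(sigma)
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

# agrees with the 2-by-2 formula ad - bc
assert det([[1, 2], [3, 4]]) == 1 * 4 - 2 * 3
assert det([[1, 0], [0, 1]]) == 1
```

This brute-force determinant is only practical for tiny matrices, but it follows the definition verbatim.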

Proposition 8.2.8.
(i) det(I) = 1.
(ii) For every j = 1, 2, . . . , n, the function xj ↦ det([x1 x2 · · · xn]) is linear.
(iii) If two columns of a matrix are interchanged, then the determinant changes sign.
(iv) If two columns of A are equal, then det(A) = 0.
(v) If a column is zero, then det(A) = 0.
(vi) A ↦ det(A) is a continuous function on L(Rn).
(vii) det [ a b ; c d ] = ad − bc, and det [a] = a.
In fact, the determinant is the unique function that satisfies (i), (ii), and (iii). But we digress. By (ii), we mean that if we fix all the vectors x1, . . . , xn except for xj, and let v, w ∈ Rn be two vectors and a, b ∈ R be scalars, then

det([x1 · · · xj−1 (av + bw) xj+1 · · · xn]) = a det([x1 · · · xj−1 v xj+1 · · · xn]) + b det([x1 · · · xj−1 w xj+1 · · · xn]).

Proof. We go through the proof quickly, as you have likely seen this before.
Part (i) is trivial. For (ii), notice that each term in the definition of the determinant contains exactly one factor from each column.
Part (iii) follows by noting that switching two columns is like switching the two corresponding numbers in every element in Sn. Hence all the signs are changed. Part (iv) follows because if two columns are equal and we switch them, we get the same matrix back, and so part (iii) says the determinant must have been 0.
Part (v) follows because the product in each term in the definition includes one element from the zero column. Part (vi) follows as det is a polynomial in the entries of the matrix and hence continuous (in the entries of the matrix). A function defined on matrices is continuous in the operator norm if and only if it is continuous in the entries. Finally, part (vii) is a direct computation.
The determinant tells us about areas and volumes, and how they change. For example, in
the 1-by-1 case, a matrix is just a number, and the determinant is exactly this number. It says
how the linear mapping “stretches” the space. Similarly for R2. Suppose A ∈ L(R2) is a linear transformation. It can be checked directly that the area of the image of the unit square A([0, 1]²) is precisely |det(A)|. This works with arbitrary figures, not just the unit square: The absolute value
of the determinant tells us the stretch in the area. The sign of the determinant tells us if the image
is flipped (changes orientation) or not. In R3 it tells us about the 3-dimensional volume, and in n
dimensions about the n-dimensional volume. We claim this without proof.
Proposition 8.2.9. If A and B are n-by-n matrices, then det(AB) = det(A) det(B). Furthermore, A is invertible if and only if det(A) ≠ 0, and in this case, det(A−1) = 1/det(A).

Proof. Let b1 , b2 , . . . , bn be the columns of B. Then

AB = [Ab1 Ab2 ··· Abn ].

That is, the columns of AB are Ab1 , Ab2 , . . . , Abn .


Let bj,k denote the elements of B and aj the columns of A. By linearity of the determinant,

det(AB) = det([Ab1 Ab2 · · · Abn]) = det([ ∑_{j=1}^n bj,1 aj  Ab2 · · · Abn ])
   = ∑_{j=1}^n bj,1 det([aj Ab2 · · · Abn])
   = ∑_{1≤j1,j2,...,jn≤n} bj1,1 bj2,2 · · · bjn,n det([aj1 aj2 · · · ajn])
   = ( ∑_{(j1,j2,...,jn)∈Sn} bj1,1 bj2,2 · · · bjn,n sgn(j1, j2, . . . , jn) ) det([a1 a2 · · · an]).

In the last equality we can sum over just the elements of Sn, instead of over all n-tuples of integers between 1 and n, by noting that when two columns in the determinant are the same, the determinant is zero. Then we reordered the columns to the original ordering to obtain the sgn.
The conclusion that det(AB) = det(A) det(B) follows by recognizing above the determinant of
B. We obtain this by plugging in A = I. The expression we got for the determinant of B has rows

and columns swapped, so as a side note, we have also just proved that the determinant of a matrix
and its transpose are equal.
To prove the second part of the proposition, suppose A is invertible. Then A−1A = I and consequently det(A−1) det(A) = det(A−1A) = det(I) = 1. If A is not invertible, then there must be a nonzero vector that A takes to zero, as A is not one-to-one. In other words, the columns of A are
linearly dependent. Suppose

∑_{j=1}^n γj aj = 0,

where not all γj are equal to 0. Without loss of generality suppose γ1 ≠ 0. Take
 
B := [ γ1 0 0 · · · 0
       γ2 1 0 · · · 0
       γ3 0 1 · · · 0
        ⋮  ⋮  ⋮  ⋱  ⋮
       γn 0 0 · · · 1 ].
Using the definition of the determinant (there is only a single permutation σ for which ∏_{i=1}^n bi,σi is nonzero), we find det(B) = γ1 ≠ 0. Then det(AB) = det(A) det(B) = γ1 det(A). The first column of AB is zero, and hence det(AB) = 0. We conclude det(A) = 0.
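A numeric check of the product rule det(AB) = det(A) det(B), reusing a brute-force determinant from the permutation formula (our own code, suitable only for tiny matrices):

```python
from itertools import permutations

def det(A):
    # determinant straight from the permutation-sum definition
    n, total = len(A), 0
    for sigma in permutations(range(n)):
        sign = 1
        for p in range(n):
            for q in range(p + 1, n):
                if sigma[q] < sigma[p]:
                    sign = -sign
        term = sign
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

def matmul(A, B):
    # product of two square matrices of the same size
    n = len(A)
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]

A = [[2, 1, 0], [0, 1, 3], [1, 0, 1]]
B = [[1, 2, 0], [0, 1, 1], [2, 0, 1]]
assert det(matmul(A, B)) == det(A) * det(B)
```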
Proposition 8.2.10. The determinant is independent of the basis. In other words, if B is invertible, then

det(A) = det(B−1AB).

The proof is to compute det(B−1AB) = det(B−1) det(A) det(B) = (1/det(B)) det(A) det(B) = det(A).
If in one basis A is the matrix representing a linear operator, then for another basis we can find a
matrix B such that the matrix B−1 AB takes us to the first basis, applies A in the first basis, and takes
us back to the basis we started with. Let X be a finite dimensional vector space. Let Φ ∈ L(X, Rn )
take a basis {x1 , . . . , xn } to the standard basis {e1 , . . . , en } and let Ψ ∈ L(X, Rn ) take another basis
{y1 , . . . , yn } to the standard basis. Let T ∈ L(X) be a linear operator and let a matrix A represent the
operator in the basis {x1, . . . , xn}. Then B would be such that we have the following diagram:

        Rn --(B−1AB)--> Rn
      Ψ ↑               ↑ Ψ
        X  ----(T)----> X
      Φ ↓               ↓ Φ
        Rn ----(A)----> Rn

Going from the top Rn to the bottom Rn along the left side is the map B = ΦΨ−1, and going back up along the right side is B−1 = ΨΦ−1. The Rn on the bottom row represent X in the first basis, and the Rn on top represent X in the second basis.
If we compute the determinant of the matrix A, we obtain the same determinant as if we had used any other basis: in the other basis the matrix would be B−1AB. It follows that
det : L(X) → R

is a well-defined function (not just on matrices).

(This is a so-called commutative diagram: following arrows in any way should end up with the same result.)
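A quick numeric check that det(B−1AB) = det(A) for 2-by-2 matrices (our own helper code, using the explicit 2-by-2 inverse):

```python
def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv2(A):
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

def mul2(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(2)) for k in range(2)]
            for i in range(2)]

A = [[4.0, 1.0], [2.0, 3.0]]
B = [[1.0, 1.0], [1.0, 2.0]]   # an invertible change of basis
conj = mul2(inv2(B), mul2(A, B))
assert abs(det2(conj) - det2(A)) < 1e-9
```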


There are three types of so-called elementary matrices. Let e1, e2, . . . , en be the standard basis on Rn as usual.
First, for some j = 1, 2, . . . , n and some λ ∈ R, λ ≠ 0, define the first type of an elementary matrix, an n-by-n matrix E, by

Eei := { ei   if i ≠ j,
         λei  if i = j.

Given any n-by-m matrix M, the matrix EM is the same matrix as M except with the jth row multiplied by λ. It is an easy computation (exercise) that det(E) = λ.
Next, for some j and k with j ≠ k, and λ ∈ R, define the second type of an elementary matrix E by

Eei := { ei        if i ≠ j,
         ei + λek  if i = j.

Given any n-by-m matrix M, the matrix EM is the same matrix as M except with λ times the jth row added to the kth row. It is an easy computation (exercise) that det(E) = 1.
Finally, for some j and k with j ≠ k, define the third type of an elementary matrix E by

Eei := { ei  if i ≠ j and i ≠ k,
         ek  if i = j,
         ej  if i = k.

Given any n-by-m matrix M, the matrix EM is the same matrix with the jth and kth rows swapped. It is an easy computation (exercise) that det(E) = −1.
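The three determinant computations left as exercises can at least be confirmed numerically. A sketch (our own code) building each type of elementary matrix for n = 3 and checking both det(E) and the effect of EM on the rows:

```python
from itertools import permutations

def det(A):
    # brute-force determinant from the permutation-sum definition
    n, total = len(A), 0
    for sigma in permutations(range(n)):
        sign = 1
        for p in range(n):
            for q in range(p + 1, n):
                if sigma[q] < sigma[p]:
                    sign = -sign
        term = sign
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mul(E, M):
    return [[sum(E[i][t] * M[t][l] for t in range(len(M)))
             for l in range(len(M[0]))] for i in range(len(E))]

n, j, k, lam = 3, 1, 2, 7
E1 = identity(n); E1[j][j] = 5          # type 1: E e_j = 5 e_j
E2 = identity(n); E2[k][j] = lam        # type 2: E e_j = e_j + lam e_k
E3 = identity(n)                        # type 3: swap e_j and e_k
E3[j][j] = E3[k][k] = 0
E3[j][k] = E3[k][j] = 1
assert det(E1) == 5 and det(E2) == 1 and det(E3) == -1

M = [[1, 2], [3, 4], [5, 6]]
assert mul(E1, M) == [[1, 2], [15, 20], [5, 6]]   # jth row scaled by 5
assert mul(E2, M) == [[1, 2], [3, 4], [26, 34]]   # lam * (row j) added to row k
assert mul(E3, M) == [[1, 2], [5, 6], [3, 4]]     # rows j and k swapped
```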

Proposition 8.2.11. Let T be an n-by-n invertible matrix. Then there exists a finite sequence of
elementary matrices E1 , E2 , . . . , Ek such that

T = E1 E2 · · · Ek ,

and
det(T ) = det(E1 ) det(E2 ) · · · det(Ek ).

The proof is left as an exercise. The proposition says that we can compute the determinant by doing elementary row operations. For computing the determinant, one doesn’t have to factor the matrix into a product of elementary matrices completely: usually one only does row operations until reaching an upper triangular matrix, that is, a matrix [ai,j] where ai,j = 0 if i > j. Computing the determinant of such a matrix is not difficult (exercise).
Factorization into elementary matrices (or variations on elementary matrices) is useful in proofs
involving an arbitrary linear operator, by reducing to a proof for an elementary matrix, similarly as
the computation of the determinant.

8.2.4 Exercises
Exercise 8.2.1: For a vector space X with a norm k·k, show that d(x, y) := kx − yk makes X a metric space.

Exercise 8.2.2 (Easy): Show that for square matrices A and B, det(AB) = det(BA).

Exercise 8.2.3: For Rn define


kxk∞ := max{|x1 |, |x2 |, . . . , |xn |},
sometimes called the sup or the max norm.
a) Show that k·k∞ is a norm on Rn (defining a different distance).
b) What is the unit ball B(0, 1) in this norm?

Exercise 8.2.4: For Rn define

kxk1 := ∑_{j=1}^n |xj|,

sometimes called the 1-norm (or L1 norm).


a) Show that k·k1 is a norm on Rn (defining a different distance, sometimes called the taxicab distance).
b) What is the unit ball B(0, 1) in this norm?
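Both of these norms are immediate to compute; a small sketch (illustration only, not a solution to the exercises) comparing them with the euclidean norm:

```python
import math

def norm_inf(x):
    # sup (max) norm
    return max(abs(v) for v in x)

def norm_1(x):
    # 1-norm (taxicab)
    return sum(abs(v) for v in x)

def norm_2(x):
    # euclidean norm
    return math.sqrt(sum(v * v for v in x))

x = [3.0, -4.0, 1.0]
assert norm_inf(x) == 4.0
assert norm_1(x) == 8.0
# the standard comparisons ||x||_inf <= ||x||_2 <= ||x||_1
assert norm_inf(x) <= norm_2(x) <= norm_1(x)
```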

Exercise 8.2.5: Using the euclidean norm on R2, compute the operator norm of the operators in L(R2) given by the matrices:
a) [ 1 0 ; 0 2 ]   b) [ 0 1 ; −1 0 ]   c) [ 1 1 ; 0 1 ]   d) [ 0 1 ; 0 0 ]

Exercise 8.2.6: Using the standard euclidean norm on Rn, show the following:
a) Suppose A ∈ L(R, Rn) is defined for x ∈ R by Ax = xa for a vector a ∈ Rn. Then the operator norm kAkL(R,Rn) = kakRn. (That is, the operator norm of A is the euclidean norm of a.)
b) Suppose B ∈ L(Rn, R) is defined for x ∈ Rn by Bx = b · x for a vector b ∈ Rn. Then the operator norm kBkL(Rn,R) = kbkRn.

Exercise 8.2.7: Suppose σ = (σ1, σ2, . . . , σn) is a permutation of (1, 2, . . . , n).
a) Show that we can make a finite number of transpositions (switchings of two elements) to get to (1, 2, . . . , n).
b) Using the definition (8.4), show that σ is even if sgn(σ) = 1 and σ is odd if sgn(σ) = −1. In particular, this shows that being odd or even is well-defined.

Exercise 8.2.8: Verify the computation of the determinant for the three types of elementary matrices.

Exercise 8.2.9: Prove Proposition 8.2.11.

Exercise 8.2.10:
a) Suppose D = [di,j] is an n-by-n diagonal matrix, that is, di,j = 0 whenever i ≠ j. Show that det(D) =
d1,1 d2,2 · · · dn,n .
b) Suppose A is a diagonalizable matrix. That is, there exists a matrix B such that B−1 AB = D for a diagonal
matrix D = [di, j ]. Show that det(A) = d1,1 d2,2 · · · dn,n .

Exercise 8.2.11: Take the vector space of polynomials R[t] and the linear operator D ∈ L(R[t]) that is the differentiation (we proved in an earlier exercise that D is a linear operator). Given P(t) = c0 + c1t + · · · + cn tⁿ ∈ R[t], define kPk := sup{ |cj| : j = 0, 1, 2, . . . , n }.
a) Show that kPk is a norm on R[t].
b) Show that D does not have bounded operator norm, that is, kDk = ∞. Hint: Consider the polynomials tⁿ as n tends to infinity.

Exercise 8.2.12: In this exercise we finish the proof of Proposition 8.2.4. Let X be any finite dimensional normed vector space. Let {x1, x2, . . . , xn} be a basis for X.
a) Show that the function f : Rn → R

f (c1 , c2 , . . . , cn ) = kc1 x1 + c2 x2 + · · · + cn xn k

is continuous.
b) Show that there exist numbers m and M such that if c = (c1 , c2 , . . . , cn ) ∈ Rn with kck = 1 (standard
euclidean norm), then m ≤ kc1 x1 + c2 x2 + · · · + cn xn k ≤ M (here the norm is on X).
c) Show that there exists a number B such that if kc1 x1 + c2 x2 + · · · + cn xn k = 1, then |c j | ≤ B.
d) Use part c) to show that if X is a finite dimensional vector space and A ∈ L(X,Y ), then kAk < ∞.

Exercise 8.2.13: Let X be any finite dimensional vector space with a norm k·k and basis {x1 , x2 , . . . , xn }. Let
c = (c1 , c2 , . . . , cn ) ∈ Rn and kck be the standard euclidean norm on Rn .
a) Show that there exist positive numbers m, M > 0 such that for all c ∈ Rn

mkck ≤ kc1 x1 + c2 x2 + · · · + cn xn k ≤ Mkck.

Hint: See previous exercise.


b) Use part a) to show that if k·k1 and k·k2 are two norms on X, then there exist positive numbers m, M > 0 (perhaps different than above) such that for all x ∈ X we have

mkxk1 ≤ kxk2 ≤ Mkxk1 .

c) Show that U ⊂ X is open in the metric defined by kx − yk1 if and only if it is open in the metric defined
by kx − yk2 . In other words, convergence of sequences and continuity of functions is the same in either
norm.

Exercise 8.2.14: Let A be an upper triangular matrix. Find a formula for the determinant of A in terms of
the diagonal entries, and prove that your formula works.

Exercise 8.2.15: Given an n-by-n matrix A, prove that |det(A)| ≤ kAkn (the norm on A is the operator norm).
Hint: Note that you only need to show this for invertible matrices. Then possibly reorder columns and factor
A into n matrices each of which differs from the identity by one column.

Exercise 8.2.16: Consider Proposition 8.2.6, for any n.


a) Prove that the estimate kA − Bk < 1/kA−1k is the best possible in the following sense: for any invertible A ∈ GL(Rn), find a B where equality is satisfied and B is not invertible.
b) For any fixed invertible A ∈ GL(Rn), let M denote the set of matrices B such that kA − Bk < 1/kA−1k. Prove that while every B ∈ M is invertible, kB−1k is unbounded as a function of B on M.

Let A be an n-by-n matrix. A (possibly complex) number λ ∈ C is called an eigenvalue of A if there


is a nonzero (possibly complex) vector x ∈ Cn such that Ax = λ x (the multiplication by complex vectors is
the same as for real vectors. In particular if x = a + ib for real vectors a and b, and A is a real matrix, then
Ax = Aa + iAb). The number

ρ(A) := sup{ |λ| : λ is an eigenvalue of A }

is called the spectral radius of A. Here |λ | is the complex modulus. We state without proof that at least one
eigenvalue always exists, and there are no more than n distinct eigenvalues of A. You can therefore assume
that 0 ≤ ρ(A) < ∞. The exercises below hold for complex matrices, but feel free to assume they are real
matrices.
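For 2-by-2 matrices the eigenvalues are the roots of the characteristic polynomial t² − (tr A)t + det(A), so ρ(A) is directly computable. A sketch (our own code; cmath handles the possibly complex eigenvalues) checking ρ(A) ≤ kAk via the cruder bound kAk ≤ √(∑(ai,j)²) from earlier:

```python
import cmath
import math

def spectral_radius_2x2(A):
    # eigenvalues are the roots of t^2 - (trace) t + (det) = 0
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))

def frob(A):
    # an upper bound for the operator norm ||A||
    return math.sqrt(sum(a * a for row in A for a in row))

# rho(A) <= ||A|| <= frob(A) on a few samples
for A in ([[2.0, 1.0], [0.0, 3.0]],
          [[0.0, 1.0], [-1.0, 0.0]],   # rotation: eigenvalues +i and -i
          [[1.0, 5.0], [0.0, 1.0]]):
    assert spectral_radius_2x2(A) <= frob(A) + 1e-12
```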

Exercise 8.2.17: Let A, S be n-by-n matrices, where S is invertible. Prove that λ is an eigenvalue of A if and only if it is an eigenvalue of S−1AS. Then prove that ρ(S−1AS) = ρ(A). In particular, ρ is a well-defined function on L(X) for any finite dimensional vector space X.

Exercise 8.2.18: Let A be an n-by-n matrix.
a) Prove ρ(A) ≤ kAk.
b) For any k ∈ N, prove ρ(A) ≤ kAᵏk^{1/k} (the operator norm of the kth power of A, raised to the power 1/k).
c) Suppose lim_{k→∞} Aᵏ = 0 (limit in the operator norm). Prove that ρ(A) < 1.

Exercise 8.2.19: We say a set C ⊂ Rn is symmetric if x ∈ C implies −x ∈ C.


a) Let k·k be any norm on Rn . Show that the closed unit ball C(0, 1) (using the metric induced by this norm)
is a compact symmetric convex set.
b) (Challenging) Let C ⊂ Rn be a compact symmetric convex set with 0 in its interior. Show that

kxk := inf{ λ : λ > 0 and x/λ ∈ C }

is a norm on Rn, and C = C(0, 1) (the closed unit ball) in the metric induced by this norm.
Hint: Feel free to use the result of part c) of Exercise 8.2.13.

8.3 The derivative


Note: 2–3 lectures

8.3.1 The derivative


For a function f : R → R, we defined the derivative at x as

lim_{h→0} ( f (x + h) − f (x) ) / h.
In other words, there is a number a (the derivative of f at x) such that

lim_{h→0} ( ( f (x + h) − f (x) ) / h − a ) = lim_{h→0} ( f (x + h) − f (x) − ah ) / h = lim_{h→0} | f (x + h) − f (x) − ah | / |h| = 0.

Multiplying by a is a linear map in one dimension: h ↦ ah. That is, we think of a ∈ L(R1, R1), which is the best linear approximation of how f changes near x. We use this definition to extend differentiation to more variables.
Definition 8.3.1. Let U ⊂ Rn be an open subset and f : U → Rm . We say f is differentiable at
x ∈ U if there exists an A ∈ L(Rn , Rm ) such that
lim_{h→0, h∈Rn} k f (x + h) − f (x) − Ahk / khk = 0.

We write D f (x) := A, or f ′(x) := A, and we say A is the derivative of f at x. When f is differentiable at every x ∈ U, we say simply that f is differentiable. See Figure 8.3 for an illustration.

[Figure 8.3: Illustration of a derivative for a function f : R2 → R. The vector h is shown in the x1x2-plane based at (x1, x2), and the vector Ah ∈ R1 is shown along the y direction.]

For a differentiable function, the derivative of f is a function from U to L(Rn , Rm ). Compare


to the one dimensional case, where the derivative is a function from U to R, but we really want

to think of R here as L(R1 , R1 ). As in one dimension, the idea is that a differentiable mapping is
“infinitesimally close” to a linear mapping, and this linear mapping is the derivative.
Notice which norms are being used in the definition. The norm in the numerator is on Rm , and
the norm in the denominator is on Rn where h lives. Normally it is understood that h ∈ Rn from
context. We will not explicitly say so from now on.
We have again cheated somewhat and said that A is the derivative. We have not shown yet that
there is only one, let us do that now.
Proposition 8.3.2. Let U ⊂ Rn be an open subset and f : U → Rm . Suppose x ∈ U and there exist
A, B ∈ L(Rn , Rm ) such that
\[ \lim_{h\to 0} \frac{\|f(x+h)-f(x)-Ah\|}{\|h\|} = 0 \qquad\text{and}\qquad \lim_{h\to 0} \frac{\|f(x+h)-f(x)-Bh\|}{\|h\|} = 0 . \]
Then A = B.
Proof. Suppose h ∈ Rⁿ, h ≠ 0. Compute
\[ \frac{\|(A-B)h\|}{\|h\|} = \frac{\bigl\| f(x+h)-f(x)-Ah - \bigl( f(x+h)-f(x)-Bh \bigr) \bigr\|}{\|h\|} \le \frac{\|f(x+h)-f(x)-Ah\|}{\|h\|} + \frac{\|f(x+h)-f(x)-Bh\|}{\|h\|} . \]
So ‖(A−B)h‖/‖h‖ → 0 as h → 0. That is, given ε > 0, for all nonzero h in some δ-ball around the origin,
\[ \varepsilon > \frac{\|(A-B)h\|}{\|h\|} = \left\| (A-B) \frac{h}{\|h\|} \right\| . \]
For any x with ‖x‖ = 1, let h = (δ/2)x; then ‖h‖ < δ and h/‖h‖ = x. So ‖(A−B)x‖ < ε. Taking the supremum over all x with ‖x‖ = 1 we get the operator norm ‖A−B‖ ≤ ε. As ε > 0 was arbitrary, ‖A−B‖ = 0, or in other words A = B.
Example 8.3.3: If f(x) = Ax for a linear mapping A, then f′(x) = A:
\[ \frac{\|f(x+h)-f(x)-Ah\|}{\|h\|} = \frac{\|A(x+h)-Ax-Ah\|}{\|h\|} = \frac{0}{\|h\|} = 0 . \]
Example 8.3.4: Let f : R² → R² be defined by
\[ f(x,y) = \bigl( f_1(x,y), f_2(x,y) \bigr) := (1+x+2y+x^2,\; 2x+3y+xy) . \]
Let us show that f is differentiable at the origin and let us compute the derivative, directly using the definition. If the derivative exists, it is in L(R², R²), so it can be represented by a 2-by-2 matrix \(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\). Suppose h = (h₁, h₂). We need the following expression to go to zero:
\[ \frac{\bigl\| f(h_1,h_2) - f(0,0) - (ah_1+bh_2,\; ch_1+dh_2) \bigr\|}{\|(h_1,h_2)\|} = \frac{\sqrt{\bigl((1-a)h_1+(2-b)h_2+h_1^2\bigr)^2 + \bigl((2-c)h_1+(3-d)h_2+h_1h_2\bigr)^2}}{\sqrt{h_1^2+h_2^2}} . \]

If we choose a = 1, b = 2, c = 2, d = 3, the expression becomes
\[ \frac{\sqrt{h_1^4 + h_1^2 h_2^2}}{\sqrt{h_1^2+h_2^2}} = |h_1| \frac{\sqrt{h_1^2+h_2^2}}{\sqrt{h_1^2+h_2^2}} = |h_1| . \]
And this expression does indeed go to zero as h → 0. The function f is differentiable at the origin and the derivative f′(0) is represented by the matrix \(\begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}\).
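Computations like this are easy to sanity-check numerically. The following Python sketch (not from the text; the diagonal step h = (t, t) is an arbitrary choice for illustration) evaluates the quotient from the definition and watches it shrink like |h₁|:

```python
import math

def f(x, y):
    return (1 + x + 2*y + x*x, 2*x + 3*y + x*y)

A = ((1, 2), (2, 3))  # candidate derivative at the origin

def remainder(h1, h2):
    # the quotient ||f(h) - f(0) - Ah|| / ||h|| from the definition
    fx, fy = f(h1, h2)
    f0x, f0y = f(0, 0)
    ax = A[0][0]*h1 + A[0][1]*h2
    ay = A[1][0]*h1 + A[1][1]*h2
    return math.hypot(fx - f0x - ax, fy - f0y - ay) / math.hypot(h1, h2)

for t in (1e-1, 1e-2, 1e-3):
    print(remainder(t, t))  # shrinks linearly in t, matching |h1| = t
```

With h = (t, t) the quotient equals |h₁| = t exactly, so each printed value matches the step size up to rounding.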
Proposition 8.3.5. Let U ⊂ Rn be open and f : U → Rm be differentiable at p ∈ U. Then f is
continuous at p.
Proof. Another way to write the differentiability of f at p is to first write
\[ r(h) := f(p+h) - f(p) - f'(p)\, h , \]
and note that ‖r(h)‖/‖h‖ must go to zero as h → 0. So r(h) itself must go to zero. The mapping h ↦ f′(p)h is a linear mapping between finite dimensional spaces, it is therefore continuous and goes to zero as h → 0. Therefore, f(p+h) must go to f(p) as h → 0. That is, f is continuous at p.
The derivative is itself a linear operator on the space of differentiable functions.
Proposition 8.3.6. Suppose U ⊂ Rn is open, f : U → Rm and g : U → Rm are differentiable at p,
and α ∈ R. Then the functions f + g and α f are differentiable at p and
\[ (f+g)'(p) = f'(p) + g'(p), \qquad\text{and}\qquad (\alpha f)'(p) = \alpha f'(p) . \]
Proof. Let h ∈ Rⁿ, h ≠ 0. Then
\[ \frac{\bigl\| f(p+h)+g(p+h) - \bigl(f(p)+g(p)\bigr) - \bigl(f'(p)+g'(p)\bigr)h \bigr\|}{\|h\|} \le \frac{\|f(p+h)-f(p)-f'(p)h\|}{\|h\|} + \frac{\|g(p+h)-g(p)-g'(p)h\|}{\|h\|} , \]
and
\[ \frac{\|\alpha f(p+h) - \alpha f(p) - \alpha f'(p)h\|}{\|h\|} = |\alpha| \, \frac{\|f(p+h)-f(p)-f'(p)h\|}{\|h\|} . \]
The limits as h goes to zero of the right-hand sides are zero by hypothesis. The conclusion follows.
If A ∈ L(X,Y ) and B ∈ L(Y, Z) are linear maps, then they are their own derivative. The com-
position BA ∈ L(X, Z) is also its own derivative, and so the derivative of the composition is the
composition of the derivatives. As differentiable maps are “infinitesimally close” to linear maps,
they have the same property:
Theorem 8.3.7 (Chain rule). Let U ⊂ Rⁿ be open and let f : U → Rᵐ be differentiable at p ∈ U. Let V ⊂ Rᵐ be open, f(U) ⊂ V, and let g : V → Rℓ be differentiable at f(p). Then
\[ F(x) := g\bigl(f(x)\bigr) \]
is differentiable at p and
\[ F'(p) = g'\bigl(f(p)\bigr)\, f'(p) . \]

Without the points where things are evaluated, this is sometimes written as F′ = (g ∘ f)′ = g′ f′. The way to understand it is that the derivative of the composition g ∘ f is the composition of the derivatives of g and f. If f′(p) = A and g′(f(p)) = B, then F′(p) = BA, just as for linear maps.

Proof. Let A := f′(p) and B := g′(f(p)). Take h ∈ Rⁿ and write q := f(p), k := f(p+h) − f(p). Let
\[ r(h) := f(p+h) - f(p) - Ah . \]
Then r(h) = k − Ah, or Ah = k − r(h), and f(p+h) = q + k. We look at the quantity we need to go to zero:
\[ \begin{aligned} \frac{\|F(p+h)-F(p)-BAh\|}{\|h\|} &= \frac{\bigl\| g\bigl(f(p+h)\bigr) - g\bigl(f(p)\bigr) - BAh \bigr\|}{\|h\|} \\ &= \frac{\bigl\| g(q+k)-g(q)-B\bigl(k-r(h)\bigr) \bigr\|}{\|h\|} \\ &\le \frac{\|g(q+k)-g(q)-Bk\|}{\|h\|} + \|B\| \frac{\|r(h)\|}{\|h\|} \\ &= \frac{\|g(q+k)-g(q)-Bk\|}{\|k\|} \, \frac{\|f(p+h)-f(p)\|}{\|h\|} + \|B\| \frac{\|r(h)\|}{\|h\|} . \end{aligned} \]
First, ‖B‖ is constant and f is differentiable at p, so the term ‖B‖ ‖r(h)‖/‖h‖ goes to 0. Next, as f is continuous at p, we have that as h goes to 0, then k goes to 0. Therefore ‖g(q+k)−g(q)−Bk‖/‖k‖ goes to 0, because g is differentiable at q. Finally,
\[ \frac{\|f(p+h)-f(p)\|}{\|h\|} \le \frac{\|f(p+h)-f(p)-Ah\|}{\|h\|} + \frac{\|Ah\|}{\|h\|} \le \frac{\|f(p+h)-f(p)-Ah\|}{\|h\|} + \|A\| . \]
As f is differentiable at p, for small enough h the quantity ‖f(p+h)−f(p)−Ah‖/‖h‖ is bounded. Therefore the term ‖f(p+h)−f(p)‖/‖h‖ stays bounded as h goes to 0. Consequently ‖F(p+h)−F(p)−BAh‖/‖h‖ goes to zero, and F′(p) = BA, which is what was claimed.

8.3.2 Partial derivatives


There is another way to generalize the derivative from one dimension. We hold all but one variable
constant and take the regular derivative.
Definition 8.3.8. Let f : U → R be a function on an open set U ⊂ Rⁿ. If the following limit exists, we write
\[ \frac{\partial f}{\partial x_j}(x) := \lim_{h\to 0} \frac{f(x_1,\dots,x_{j-1},x_j+h,x_{j+1},\dots,x_n) - f(x)}{h} = \lim_{h\to 0} \frac{f(x+he_j)-f(x)}{h} . \]
We call ∂f/∂x_j (x) the partial derivative of f with respect to x_j. Sometimes we write D_j f instead. See Figure 8.4.
For a mapping f : U → Rᵐ we write f = (f₁, f₂, …, f_m), where the f_k are real-valued functions. We then take partial derivatives of the components, ∂f_k/∂x_j (or write it as D_j f_k).

Figure 8.4: Illustration of a partial derivative for a function f : R² → R. The yx₂-plane where x₁ is fixed is marked in dotted line, and the slope of the tangent line in the yx₂-plane is ∂f/∂x₂ (x₁, x₂).

Partial derivatives are easier to compute with all the machinery of calculus, and they provide a
way to compute the derivative of a function.
Proposition 8.3.9. Let U ⊂ Rⁿ be open and let f : U → Rᵐ be differentiable at p ∈ U. Then all the partial derivatives at p exist and, in terms of the standard bases of Rⁿ and Rᵐ, f′(p) is represented by the matrix
\[ \begin{bmatrix} \frac{\partial f_1}{\partial x_1}(p) & \frac{\partial f_1}{\partial x_2}(p) & \cdots & \frac{\partial f_1}{\partial x_n}(p) \\ \frac{\partial f_2}{\partial x_1}(p) & \frac{\partial f_2}{\partial x_2}(p) & \cdots & \frac{\partial f_2}{\partial x_n}(p) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(p) & \frac{\partial f_m}{\partial x_2}(p) & \cdots & \frac{\partial f_m}{\partial x_n}(p) \end{bmatrix} . \]
In other words,
\[ f'(p)\, e_j = \sum_{k=1}^m \frac{\partial f_k}{\partial x_j}(p)\, e_k . \]
If v = ∑_{j=1}^n c_j e_j = (c₁, c₂, …, c_n), then
\[ f'(p)\, v = \sum_{j=1}^n \sum_{k=1}^m c_j \frac{\partial f_k}{\partial x_j}(p)\, e_k = \sum_{k=1}^m \left( \sum_{j=1}^n c_j \frac{\partial f_k}{\partial x_j}(p) \right) e_k . \]

Proof. Fix a j and note that
\[ \left\| \frac{f(p+he_j)-f(p)}{h} - f'(p)\, e_j \right\| = \left\| \frac{f(p+he_j)-f(p)-f'(p)\, he_j}{h} \right\| = \frac{\|f(p+he_j)-f(p)-f'(p)\, he_j\|}{\|he_j\|} . \]
As h goes to 0, the right-hand side goes to zero by differentiability of f, and hence
\[ \lim_{h\to 0} \frac{f(p+he_j)-f(p)}{h} = f'(p)\, e_j . \]
Let us represent f by components, f = (f₁, f₂, …, f_m), since it is vector-valued. Taking a limit in Rᵐ is the same as taking the limit in each component separately. For any k the partial derivative
\[ \frac{\partial f_k}{\partial x_j}(p) = \lim_{h\to 0} \frac{f_k(p+he_j)-f_k(p)}{h} \]
exists and is equal to the kth component of f′(p) e_j, and we are done.

The converse of the proposition is not true. Just because the partial derivatives exist does not mean that the function is differentiable. See the exercises. However, when the partial derivatives are continuous, we will prove that the converse holds. One of the consequences of the proposition is that if f is differentiable on U, then f′ : U → L(Rⁿ, Rᵐ) is a continuous function if and only if all the ∂f_k/∂x_j are continuous functions.
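The proposition also suggests a practical numerical check: approximate each partial derivative by a difference quotient and assemble the matrix column by column. The Python sketch below is not from the text; the map f and the base point are hypothetical choices for illustration.

```python
import math

def f(x, y):
    # a sample map f : R^2 -> R^2 with components (x^2 * y, sin(x) + y)
    return (x*x*y, math.sin(x) + y)

def jacobian_matrix(f, p, eps=1e-6):
    # approximate the matrix of partials by central differences;
    # column j holds the partials with respect to x_j
    n = len(p)
    m = len(f(*p))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        pp, pm = list(p), list(p)
        pp[j] += eps
        pm[j] -= eps
        fp, fm = f(*pp), f(*pm)
        for k in range(m):
            J[k][j] = (fp[k] - fm[k]) / (2 * eps)
    return J

J = jacobian_matrix(f, (1.0, 2.0))
# exact matrix at (1, 2): [[2xy, x^2], [cos x, 1]] = [[4, 1], [cos 1, 1]]
print(J)
```

The printed approximate matrix agrees with the hand-computed partials to several digits.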

8.3.3 Gradients, curves, and directional derivatives


Let U ⊂ Rⁿ be open and f : U → R a differentiable function. We define the gradient as
\[ \nabla f(x) := \sum_{j=1}^n \frac{\partial f}{\partial x_j}(x)\, e_j . \]

The gradient gives a way to represent the action of the derivative as a dot product: f′(x) v = ∇f(x) · v.
Suppose γ : (a,b) ⊂ R → Rⁿ is a differentiable function. Such a function and its image is sometimes called a curve, or a differentiable curve. Write γ = (γ₁, γ₂, …, γ_n). For the purposes of computation we identify L(R¹) and R as we did when we defined the derivative in one variable. We also identify L(R¹, Rⁿ) with Rⁿ. We treat γ′(t) both as an operator in L(R¹, Rⁿ) and as the vector (γ₁′(t), γ₂′(t), …, γ_n′(t)) in Rⁿ: if v ∈ Rⁿ is γ′(t) acting as a vector, then h ↦ hv (for h ∈ R¹ = R) is γ′(t) acting as an operator in L(R¹, Rⁿ). We often use this slight abuse of notation when dealing with curves. See Figure 8.5.

Figure 8.5: Differentiable curve and its derivative as a vector.


Suppose γ((a,b)) ⊂ U and let
\[ g(t) := f\bigl(\gamma(t)\bigr) . \]
The function g is differentiable. Treating g′(t) as a number,
\[ g'(t) = f'\bigl(\gamma(t)\bigr)\, \gamma'(t) = \sum_{j=1}^n \frac{\partial f}{\partial x_j}\bigl(\gamma(t)\bigr) \frac{d\gamma_j}{dt}(t) = \sum_{j=1}^n \frac{\partial f}{\partial x_j} \frac{d\gamma_j}{dt} . \]

For convenience, we often leave out the points where we are evaluating, such as above on the right-hand side. Let us rewrite this equation with the notation of the gradient and the dot product:
\[ g'(t) = (\nabla f)\bigl(\gamma(t)\bigr) \cdot \gamma'(t) = \nabla f \cdot \gamma' . \]

We use this idea to define derivatives in a specific direction. A direction is simply a vector
pointing in that direction. Pick a vector u ∈ Rn such that kuk = 1, and fix x ∈ U. We define the
directional derivative as
\[ D_u f(x) := \frac{d}{dt}\Big|_{t=0} \Bigl[ f(x+tu) \Bigr] = \lim_{h\to 0} \frac{f(x+hu)-f(x)}{h} , \]
where the notation \(\frac{d}{dt}\big|_{t=0}\) represents the derivative evaluated at t = 0. Taking the standard basis vector e_j we find ∂f/∂x_j = D_{e_j} f. For this reason, sometimes the notation ∂f/∂u is used instead of D_u f.
Let γ be defined by
\[ \gamma(t) := x + tu . \]
Then γ′(t) = u for all t. By the computation above,
\[ D_u f(x) = \frac{d}{dt}\Big|_{t=0} \Bigl[ f(x+tu) \Bigr] = (\nabla f)\bigl(\gamma(0)\bigr) \cdot \gamma'(0) = (\nabla f)(x) \cdot u . \]

Suppose (∇f)(x) ≠ 0. By the Cauchy–Schwarz inequality,
\[ |D_u f(x)| \le \|(\nabla f)(x)\| . \]
Equality is achieved when u is a scalar multiple of (∇f)(x). That is, when
\[ u = \frac{(\nabla f)(x)}{\|(\nabla f)(x)\|} , \]
we get D_u f(x) = ‖(∇f)(x)‖. The gradient points in the direction in which the function grows fastest, in other words, in the direction in which D_u f(x) is maximal.
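These identities are easy to verify numerically. In the Python sketch below (not from the text; the particular f and base point are hypothetical choices), a difference quotient along a unit vector u matches ∇f · u, and along ∇f/‖∇f‖ it equals ‖∇f‖:

```python
import math

def f(x, y):
    return x*x + 3*x*y          # a sample differentiable function

def grad(x, y):
    return (2*x + 3*y, 3*x)     # its gradient, computed by hand

def directional(f, x, y, u, h=1e-6):
    # central difference quotient for the directional derivative D_u f(x, y)
    return (f(x + h*u[0], y + h*u[1]) - f(x - h*u[0], y - h*u[1])) / (2*h)

x, y = 1.0, 2.0
g = grad(x, y)                      # (8, 3)
norm = math.hypot(*g)               # sqrt(73)
u_max = (g[0]/norm, g[1]/norm)      # unit vector along the gradient
print(directional(f, x, y, (0.6, 0.8)))  # matches grad . u = 7.2
print(directional(f, x, y, u_max))       # matches ||grad f|| = sqrt(73)
```

Trying a few other unit vectors shows every directional derivative staying below ‖∇f‖, as the Cauchy–Schwarz bound predicts.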

8.3.4 The Jacobian


Definition 8.3.10. Let U ⊂ Rⁿ and f : U → Rⁿ be a differentiable mapping. Define the Jacobian, or the Jacobian determinant, of f at x as
\[ J_f(x) := \det\bigl( f'(x) \bigr) . \]
Sometimes J_f is written as
\[ \frac{\partial(f_1, f_2, \dots, f_n)}{\partial(x_1, x_2, \dots, x_n)} . \]
The Jacobian is named after the German mathematician Carl Gustav Jacob Jacobi (1804–1851).
The matrix from Proposition 8.3.9 representing f′(x) is sometimes called the Jacobian matrix.

This last piece of notation may seem somewhat confusing, but it is quite useful when we need to specify the exact variables and function components used, as we will do for example in the implicit function theorem.
The Jacobian J_f is a real-valued function, and when n = 1 it is simply the derivative. From the chain rule and the fact that det(AB) = det(A) det(B), it follows that
\[ J_{f\circ g}(x) = J_f\bigl(g(x)\bigr)\, J_g(x) . \]
The determinant of a linear mapping tells us what happens to area/volume under the mapping. Similarly, the Jacobian measures how much a differentiable mapping stretches things locally, and whether it flips orientation. In particular, if the Jacobian is non-zero, then we would expect that locally the mapping is invertible (and we would be correct, as we will later see).
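The multiplicativity of the Jacobian determinant can be checked numerically. The Python sketch below is not from the text; the maps f and g and the base point are arbitrary illustrative choices, and the 2×2 determinants are approximated by central differences.

```python
import math

def g(x, y):
    return (x + y*y, x*y)

def f(u, v):
    return (math.exp(u) * v, u - v)

def jac_det(F, x, y, eps=1e-6):
    # 2x2 Jacobian determinant of F at (x, y) via central differences
    dFx = [(F(x + eps, y)[k] - F(x - eps, y)[k]) / (2*eps) for k in (0, 1)]
    dFy = [(F(x, y + eps)[k] - F(x, y - eps)[k]) / (2*eps) for k in (0, 1)]
    return dFx[0]*dFy[1] - dFy[0]*dFx[1]

def comp(x, y):
    return f(*g(x, y))

x, y = 0.5, 0.3
lhs = jac_det(comp, x, y)
rhs = jac_det(f, *g(x, y)) * jac_det(g, x, y)
print(lhs, rhs)  # the two values agree to several digits
```

The agreement of the two printed values is exactly the identity J_{f∘g}(x) = J_f(g(x)) J_g(x), up to discretization error.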

8.3.5 Exercises
Exercise 8.3.1: Suppose γ : (−1,1) → Rⁿ and α : (−1,1) → Rⁿ are two differentiable curves such that γ(0) = α(0) and γ′(0) = α′(0). Suppose F : Rⁿ → R is a differentiable function. Show that
\[ \frac{d}{dt}\Big|_{t=0} F\bigl(\gamma(t)\bigr) = \frac{d}{dt}\Big|_{t=0} F\bigl(\alpha(t)\bigr) . \]
Exercise 8.3.2: Let f : R² → R be given by f(x,y) := √(x²+y²), see Figure 8.6. Show that f is not differentiable at the origin.

Figure 8.6: Graph of √(x²+y²).

Exercise 8.3.3: Using only the definition of the derivative, show that the following f : R² → R² are differentiable at the origin and find their derivative.
a) f(x,y) := (1 + x + xy, x),
b) f(x,y) := (y − y¹⁰, x),
c) f(x,y) := ((x + y + 1)², (x − y + 2)²).
Exercise 8.3.4: Suppose f : R → R and g : R → R are differentiable functions. Using only the definition of the derivative, show that h : R² → R² defined by h(x,y) := (f(x), g(y)) is a differentiable function and find the derivative at any point (x,y).

Exercise 8.3.5: Define a function f : R² → R by (see Figure 8.7)
\[ f(x,y) := \begin{cases} \frac{xy}{x^2+y^2} & \text{if } (x,y) \ne (0,0), \\ 0 & \text{if } (x,y) = (0,0). \end{cases} \]
a) Show that the partial derivatives ∂f/∂x and ∂f/∂y exist at all points (including the origin).
b) Show that f is not continuous at the origin (and hence not differentiable).

Figure 8.7: Graph of xy/(x²+y²).

Exercise 8.3.6: Define a function f : R² → R by (see Figure 8.8)
\[ f(x,y) := \begin{cases} \frac{x^2 y}{x^2+y^2} & \text{if } (x,y) \ne (0,0), \\ 0 & \text{if } (x,y) = (0,0). \end{cases} \]
a) Show that the partial derivatives ∂f/∂x and ∂f/∂y exist at all points.
b) Show that for all u ∈ R² with ‖u‖ = 1, the directional derivative D_u f exists at all points.
c) Show that f is continuous at the origin.
d) Show that f is not differentiable at the origin.

Exercise 8.3.7: Suppose f : Rⁿ → Rⁿ is one-to-one, onto, differentiable at all points, and such that f⁻¹ is also differentiable at all points.
a) Show that f′(p) is invertible at all points p and compute (f⁻¹)′(f(p)). Hint: Consider x = f⁻¹(f(x)).
b) Let g : Rⁿ → Rⁿ be a function differentiable at q ∈ Rⁿ and such that g(q) = q. Suppose f(p) = q for some p ∈ Rⁿ. Show J_g(q) = J_{f⁻¹∘g∘f}(p), where J_g is the Jacobian determinant.

Exercise 8.3.8: Suppose f : R2 → R is differentiable and such that f (x, y) = 0 if and only if y = 0 and such
that ∇ f (0, 0) = (0, 1). Prove that f (x, y) > 0 whenever y > 0, and f (x, y) < 0 whenever y < 0.

Figure 8.8: Graph of x²y/(x²+y²).

As for functions of one variable, f : U → R has a relative maximum at p ∈ U if there exists a δ > 0 such
that f (q) ≤ f (p) for all q ∈ B(p, δ ) ∩U. Similarly for relative minimum.

Exercise 8.3.9: Suppose U ⊂ Rⁿ is open and f : U → R is differentiable. Suppose f has a relative maximum at p ∈ U. Show that f′(p) = 0, that is, the zero mapping in L(Rⁿ, R). That is, p is a critical point of f.

Exercise 8.3.10: Suppose f : R² → R is differentiable and suppose that whenever x² + y² = 1, then f(x,y) = 0. Prove that there exists at least one point (x₀, y₀) such that ∂f/∂x (x₀, y₀) = ∂f/∂y (x₀, y₀) = 0.

Exercise 8.3.11: Define f(x,y) := (x − y²)(2y² − x). The graph of f is called the Peano surface. Show
a) (0,0) is a critical point, that is, f′(0,0) = 0 (the zero linear map in L(R², R)).
b) For every direction, that is, every (x,y) with x² + y² = 1, the "restriction of f to the line containing the points (0,0) and (x,y)", that is, the function g(t) := f(tx, ty), has a relative maximum at t = 0.
Hint: While not necessary, §4.3 of volume I makes this part easier.
c) f does not have a relative maximum at (0,0).

Exercise 8.3.12: Suppose f : R → Rⁿ is differentiable and ‖f(t)‖ = 1 for all t (that is, we have a curve in the unit sphere). Show that for all t, treating f′ as a vector, we have f′(t) · f(t) = 0.

Exercise 8.3.13: Define f : R² → R² by f(x,y) := (x, y + ϕ(x)) for some differentiable function ϕ of one variable. Show f is differentiable and find f′.

Exercise 8.3.14: Suppose U ⊂ Rⁿ is open, p ∈ U, and f : U → R, g : U → R, h : U → R are functions such that f(p) = g(p) = h(p), f and h are differentiable at p, f′(p) = h′(p), and
\[ f(x) \le g(x) \le h(x) \]
for all x ∈ U. Show that g is differentiable at p and g′(p) = f′(p) = h′(p).


The Peano surface is named after the Italian mathematician Giuseppe Peano (1858–1932).

8.4 Continuity and the derivative


Note: 1–2 lectures

8.4.1 Bounding the derivative


Let us prove a “mean value theorem” for vector-valued functions.

Lemma 8.4.1. If ϕ : [a,b] → Rⁿ is differentiable on (a,b) and continuous on [a,b], then there exists a t₀ ∈ (a,b) such that
\[ \|\varphi(b)-\varphi(a)\| \le (b-a) \|\varphi'(t_0)\| . \]

Proof. By the mean value theorem applied to the scalar-valued function t ↦ (ϕ(b) − ϕ(a)) · ϕ(t), where the dot is the dot product, we obtain that there is a t₀ ∈ (a,b) such that
\[ \|\varphi(b)-\varphi(a)\|^2 = \bigl(\varphi(b)-\varphi(a)\bigr) \cdot \bigl(\varphi(b)-\varphi(a)\bigr) = \bigl(\varphi(b)-\varphi(a)\bigr)\cdot\varphi(b) - \bigl(\varphi(b)-\varphi(a)\bigr)\cdot\varphi(a) = (b-a)\, \bigl(\varphi(b)-\varphi(a)\bigr) \cdot \varphi'(t_0) , \]
where we treat ϕ′ as a vector in Rⁿ by the abuse of notation we mentioned in the previous section. If we think of ϕ′(t) as a vector, then ‖ϕ′(t)‖_{L(R,Rⁿ)} = ‖ϕ′(t)‖_{Rⁿ}; that is, the euclidean norm of the vector is the same as the operator norm of ϕ′(t).
By the Cauchy–Schwarz inequality,
\[ \|\varphi(b)-\varphi(a)\|^2 = (b-a)\, \bigl(\varphi(b)-\varphi(a)\bigr)\cdot\varphi'(t_0) \le (b-a) \|\varphi(b)-\varphi(a)\| \, \|\varphi'(t_0)\| . \]
Dividing both sides by ‖ϕ(b) − ϕ(a)‖ finishes the proof (if ‖ϕ(b) − ϕ(a)‖ = 0, the claim is trivial).

Recall that a set U is convex if whenever x, y ∈ U, the line segment from x to y lies in U.

Proposition 8.4.2. Let U ⊂ Rⁿ be a convex open set, f : U → Rᵐ a differentiable function, and M such that
\[ \|f'(x)\| \le M \quad \text{for all } x \in U . \]
Then f is Lipschitz with constant M, that is,
\[ \|f(x)-f(y)\| \le M \|x-y\| \quad \text{for all } x, y \in U . \]
Proof. Fix x and y in U and note that (1−t)x + ty ∈ U for all t ∈ [0,1] by convexity. Next,
\[ \frac{d}{dt}\Bigl[ f\bigl((1-t)x+ty\bigr) \Bigr] = f'\bigl((1-t)x+ty\bigr)\, (y-x) . \]
By the mean value theorem above we get, for some t₀ ∈ (0,1),
\[ \|f(x)-f(y)\| \le \left\| \frac{d}{dt}\Bigl[ f\bigl((1-t)x+ty\bigr) \Bigr] \Big|_{t=t_0} \right\| \le \bigl\| f'\bigl((1-t_0)x+t_0 y\bigr) \bigr\| \, \|y-x\| \le M \|y-x\| . \]

Example 8.4.3: If U is not convex, the proposition is not true. Consider the set
\[ U := \{ (x,y) : 0.5 < x^2+y^2 < 2 \} \setminus \{ (x,0) : x < 0 \} . \]
For (x,y) ∈ U, let f(x,y) be the angle that the line from the origin to (x,y) makes with the positive x-axis. We even have a formula for f:
\[ f(x,y) = 2 \arctan\left( \frac{y}{x + \sqrt{x^2+y^2}} \right) . \]
Think of a spiral staircase with room in the middle. See Figure 8.9.

Figure 8.9: A non-Lipschitz function with uniformly bounded derivative.

The function is differentiable, and the derivative is bounded on U, which is not hard to see. Now
think of what happens near where the negative x-axis cuts the annulus in half. As we approach this
cut from positive y, f (x, y) approaches π . From negative y, f (x, y) approaches −π . So for small
ε > 0, | f (−1, ε ) − f (−1, −ε )| approaches 2π , but k(−1, ε ) − (−1, −ε )k = 2ε , which is arbitrarily
small. The conclusion of the proposition does not hold for this nonconvex U.
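The blow-up across the cut is easy to observe numerically. A short Python sketch (not from the text; it simply evaluates the formula for f given above):

```python
import math

def f(x, y):
    # angle from the positive x-axis, as in the formula for the example
    return 2 * math.atan(y / (x + math.hypot(x, y)))

for eps in (1e-1, 1e-3, 1e-5):
    jump = abs(f(-1.0, eps) - f(-1.0, -eps))
    dist = 2 * eps
    print(jump, dist)   # jump approaches 2*pi while dist shrinks to 0
```

The ratio jump/dist is unbounded, so no single Lipschitz constant works on this nonconvex U, even though ‖f′‖ is bounded.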
Let us solve the differential equation f′ = 0.
Corollary 8.4.4. If U ⊂ Rⁿ is connected and f : U → Rᵐ is differentiable and f′(x) = 0 for all x ∈ U, then f is constant.
Proof. For any x ∈ U, there is a ball B(x,δ) ⊂ U. The ball B(x,δ) is convex. Since ‖f′(y)‖ ≤ 0 for all y ∈ B(x,δ), then by the proposition, ‖f(x)−f(y)‖ ≤ 0‖x−y‖ = 0. So f(x) = f(y) for all y ∈ B(x,δ).
This means that f⁻¹(c) is open for any c ∈ Rᵐ. Suppose f⁻¹(c) is nonempty. The two sets
\[ U' = f^{-1}(c), \qquad U'' = f^{-1}\bigl(\mathbb{R}^m \setminus \{c\}\bigr) \]
are open and disjoint, and furthermore U = U′ ∪ U″. So as U′ is nonempty and U is connected, we have U″ = ∅. So f(x) = c for all x ∈ U.

8.4.2 Continuously differentiable functions


Definition 8.4.5. Let U ⊂ Rn be open. We say f : U → Rm is continuously differentiable, or C1 (U),
if f is differentiable and f ′ : U → L(Rn , Rm ) is continuous.
Proposition 8.4.6. Let U ⊂ Rⁿ be open and f : U → Rᵐ. The function f is continuously differentiable if and only if the partial derivatives ∂f_ℓ/∂x_j exist for all j and ℓ and are continuous.

Without continuity the theorem does not hold. Just because partial derivatives exist does not
mean that f is differentiable, in fact, f may not even be continuous. See the exercises for the last
section and also for this section.
Proof. We proved that if f is differentiable, then the partial derivatives exist. The partial derivatives
are the entries of the matrix of f ′ (x). If f ′ : U → L(Rn , Rm ) is continuous, then the entries are
continuous, and hence the partial derivatives are continuous.
To prove the opposite direction, suppose the partial derivatives exist and are continuous. Fix
x ∈ U. If we show that f ′ (x) exists we are done, because the entries of the matrix f ′ (x) are the partial
derivatives and if the entries are continuous functions, the matrix-valued function f ′ is continuous.
We do induction on dimension. First, the conclusion is true when n = 1. In this case the
derivative is just the regular derivative (exercise, noting that f is vector-valued).
Suppose the conclusion is true for Rn−1 , that is, if we restrict to the first n − 1 variables, the
function is differentiable. It is easy to see that the first n − 1 partial derivatives of f restricted to the
set where the last coordinate is fixed are the same as those for f . In the following, by a slight abuse
of notation, we think of Rn−1 as a subset of Rn , that is the set in Rn where xn = 0. In other words,
we identify the vectors (x1 , x2 , . . . , xn−1 ) and (x1 , x2 , . . . , xn−1 , 0). Let
     
\[ A := \begin{bmatrix} \frac{\partial f_1}{\partial x_1}(x) & \cdots & \frac{\partial f_1}{\partial x_n}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(x) & \cdots & \frac{\partial f_m}{\partial x_n}(x) \end{bmatrix}, \qquad A' := \begin{bmatrix} \frac{\partial f_1}{\partial x_1}(x) & \cdots & \frac{\partial f_1}{\partial x_{n-1}}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(x) & \cdots & \frac{\partial f_m}{\partial x_{n-1}}(x) \end{bmatrix}, \qquad v := \begin{bmatrix} \frac{\partial f_1}{\partial x_n}(x) \\ \vdots \\ \frac{\partial f_m}{\partial x_n}(x) \end{bmatrix} . \]

Let ε > 0 be given. By the induction hypothesis, there is a δ > 0 such that for any k ∈ R^{n−1} with ‖k‖ < δ we have
\[ \frac{\|f(x+k)-f(x)-A'k\|}{\|k\|} < \varepsilon . \]
By continuity of the partial derivatives, suppose δ is small enough so that
\[ \left| \frac{\partial f_j}{\partial x_n}(x+h) - \frac{\partial f_j}{\partial x_n}(x) \right| < \varepsilon \]
for all j and all h ∈ Rⁿ with ‖h‖ < δ.
Suppose h = k + t e_n is a vector in Rⁿ, where k ∈ R^{n−1}, t ∈ R, such that ‖h‖ < δ. Then ‖k‖ ≤ ‖h‖ < δ. Note that Ah = A′k + tv. So
\[ \begin{aligned} \|f(x+h)-f(x)-Ah\| &= \|f(x+k+te_n) - f(x+k) - tv + f(x+k) - f(x) - A'k\| \\ &\le \|f(x+k+te_n)-f(x+k)-tv\| + \|f(x+k)-f(x)-A'k\| \\ &\le \|f(x+k+te_n)-f(x+k)-tv\| + \varepsilon\|k\| . \end{aligned} \]

As all the partial derivatives exist, by the mean value theorem, for each j there is some θ_j ∈ [0,t] (or [t,0] if t < 0) such that
\[ f_j(x+k+te_n) - f_j(x+k) = t \frac{\partial f_j}{\partial x_n}(x+k+\theta_j e_n) . \]
Note that if ‖h‖ < δ, then ‖k + θ_j e_n‖ ≤ ‖h‖ < δ. So to finish the estimate,
\[ \begin{aligned} \|f(x+h)-f(x)-Ah\| &\le \|f(x+k+te_n)-f(x+k)-tv\| + \varepsilon\|k\| \\ &\le \sqrt{ \sum_{j=1}^m \left( t \frac{\partial f_j}{\partial x_n}(x+k+\theta_j e_n) - t \frac{\partial f_j}{\partial x_n}(x) \right)^2 } + \varepsilon\|k\| \\ &\le \sqrt{m}\, \varepsilon |t| + \varepsilon\|k\| \\ &\le (\sqrt{m}+1)\, \varepsilon \|h\| . \end{aligned} \]
A common application is to prove that a certain function is differentiable. For example, let us
show that all polynomials are differentiable, and in fact continuously differentiable by computing
the partial derivatives.
Corollary 8.4.7. A polynomial p : Rⁿ → R in several variables,
\[ p(x_1, x_2, \dots, x_n) = \sum_{0 \le j_1+j_2+\cdots+j_n \le d} c_{j_1,j_2,\dots,j_n}\, x_1^{j_1} x_2^{j_2} \cdots x_n^{j_n} , \]
is continuously differentiable.
Proof. Consider the partial derivative of p in the x_n variable. Write p as
\[ p(x) = \sum_{j=0}^d p_j(x_1,\dots,x_{n-1})\, x_n^j , \]
where the p_j are polynomials in one fewer variable. Then
\[ \frac{\partial p}{\partial x_n}(x) = \sum_{j=1}^d p_j(x_1,\dots,x_{n-1})\, j x_n^{j-1} , \]
which is again a polynomial; the same argument applies to the other variables. So the partial derivatives of polynomials exist and are again polynomials. By the continuity of algebraic operations, polynomials are continuous functions. Therefore p is continuously differentiable.

8.4.3 Exercises
Exercise 8.4.1: Define f : R² → R as
\[ f(x,y) := \begin{cases} (x^2+y^2)\sin\bigl((x^2+y^2)^{-1}\bigr) & \text{if } (x,y) \ne (0,0), \\ 0 & \text{else.} \end{cases} \]
Show that f is differentiable at the origin, but that it is not continuously differentiable.
Note: Feel free to use what you know about sine and cosine from calculus.

Exercise 8.4.2: Let f : R² → R be the function from Exercise 8.3.5, that is,
\[ f(x,y) := \begin{cases} \frac{xy}{x^2+y^2} & \text{if } (x,y) \ne (0,0), \\ 0 & \text{if } (x,y) = (0,0). \end{cases} \]
Compute the partial derivatives ∂f/∂x and ∂f/∂y at all points and show that these are not continuous functions.

Exercise 8.4.3: Let B(0,1) ⊂ R² be the unit ball (disc), that is, the set given by x² + y² < 1. Suppose f : B(0,1) → R is a differentiable function such that |f(0,0)| ≤ 1, and |∂f/∂x| ≤ 1 and |∂f/∂y| ≤ 1 for all points in B(0,1).
a) Find an M ∈ R such that ‖f′(x,y)‖ ≤ M for all (x,y) ∈ B(0,1).
b) Find a B ∈ R such that |f(x,y)| ≤ B for all (x,y) ∈ B(0,1).

Exercise 8.4.4: Define ϕ : [0, 2π] → R² by ϕ(t) := (sin(t), cos(t)). Compute ϕ′(t) for all t. Compute ‖ϕ′(t)‖ for all t. Notice that ϕ′(t) is never zero, yet ϕ(0) = ϕ(2π); therefore, Rolle's theorem is not true in more than one dimension.
Exercise 8.4.5: Let f : R² → R be a function such that ∂f/∂x and ∂f/∂y exist at all points and there exists an M ∈ R such that |∂f/∂x| ≤ M and |∂f/∂y| ≤ M at all points. Show that f is continuous.

Exercise 8.4.6: Let f : R² → R be a function and M ∈ R such that for every (x,y) ∈ R², the function g(t) := f(xt, yt) is differentiable and |g′(t)| ≤ M.
a) Show that f is continuous at (0,0).
b) Find an example of such an f which is not continuous at any other point of R².
Hint: Think back to how we constructed a nowhere continuous function on [0,1].

Exercise 8.4.7: Suppose r : Rⁿ \ X → R is a rational function; that is, let p : Rⁿ → R and q : Rⁿ → R be polynomials, where X = q⁻¹(0), and r = p/q. Show that r is continuously differentiable.

Exercise 8.4.8: Suppose f : Rn → R and h : Rn → R are two differentiable functions such that f ′ (x) = h′ (x)
for all x ∈ Rn . Prove that if f (0) = h(0) then f (x) = h(x) for all x ∈ Rn .

Exercise 8.4.9: Prove the assertion about the base case in the proof of Proposition 8.4.6. That is, prove that if n = 1 and the partials exist and are continuous, then the function is continuously differentiable.

Exercise 8.4.10: Suppose g : R → R is continuously differentiable and h : R² → R is continuous. Show that
\[ F(x,y) := g(x) + \int_0^y h(x,s)\, ds \]
is continuously differentiable, and that it is the solution of the partial differential equation ∂F/∂y = h with the initial condition F(x,0) = g(x) for all x ∈ R.

8.5 Inverse and implicit function theorem


Note: 2–3 lectures
To prove the inverse function theorem we use the contraction mapping principle from chapter 7, which we used to prove Picard's theorem. Recall that a mapping f : X → Y between two metric spaces (X, d_X) and (Y, d_Y) is called a contraction if there exists a k < 1 such that
\[ d_Y\bigl(f(p), f(q)\bigr) \le k\, d_X(p,q) \quad \text{for all } p, q \in X . \]
The contraction mapping principle says that if f : X → X is a contraction and X is a complete metric space, then there exists a unique fixed point, that is, a unique x ∈ X such that f(x) = x.
Intuitively, if a function is continuously differentiable, then it locally "behaves like" the derivative (which is a linear function). The idea of the inverse function theorem is that if a function is continuously differentiable and the derivative is invertible, the function is (locally) invertible.

Theorem 8.5.1 (Inverse function theorem). Let U ⊂ Rⁿ be an open set and let f : U → Rⁿ be a continuously differentiable function. Also suppose p ∈ U, f(p) = q, and f′(p) is invertible (that is, J_f(p) ≠ 0). Then there exist open sets V, W ⊂ Rⁿ such that p ∈ V ⊂ U, f(V) = W, and f|_V is one-to-one. Hence a g : W → V exists such that g(y) := (f|_V)⁻¹(y). See Figure 8.10. Furthermore, g is continuously differentiable and
\[ g'(y) = \bigl(f'(x)\bigr)^{-1}, \quad \text{for all } x \in V,\; y = f(x) . \]

Figure 8.10: Setup of the inverse function theorem in Rⁿ.

Proof. Write A := f′(p). As f′ is continuous, there exists an open ball V around p such that
\[ \|A - f'(x)\| < \frac{1}{2\|A^{-1}\|} \quad \text{for all } x \in V . \]
Note that f′(x) is then invertible for all x ∈ V.
Given y ∈ Rⁿ, we define ϕ_y : V → Rⁿ by
\[ \varphi_y(x) := x + A^{-1}\bigl(y - f(x)\bigr) . \]

As A⁻¹ is one-to-one, ϕ_y(x) = x (x is a fixed point) if and only if y − f(x) = 0, or in other words f(x) = y. Using the chain rule we obtain
\[ \varphi_y'(x) = I - A^{-1} f'(x) = A^{-1}\bigl(A - f'(x)\bigr) . \]
So for x ∈ V we have
\[ \|\varphi_y'(x)\| \le \|A^{-1}\| \, \|A - f'(x)\| < 1/2 . \]
As V is a ball, it is convex. Hence
\[ \|\varphi_y(x_1) - \varphi_y(x_2)\| \le \frac{1}{2} \|x_1 - x_2\| \quad \text{for all } x_1, x_2 \in V . \]
In other words, ϕ_y is a contraction defined on V, though we so far do not know what the range of ϕ_y is. We cannot yet apply the fixed point theorem, but we can say that ϕ_y has at most one fixed point in V: if ϕ_y(x₁) = x₁ and ϕ_y(x₂) = x₂, then ‖x₁ − x₂‖ = ‖ϕ_y(x₁) − ϕ_y(x₂)‖ ≤ ½‖x₁ − x₂‖, so x₁ = x₂. That is, there exists at most one x ∈ V such that f(x) = y, and so f|_V is one-to-one.
Let W := f(V). We need to show that W is open. Take a y₀ ∈ W. There is a unique x₀ ∈ V such that f(x₀) = y₀. Let r > 0 be small enough such that the closed ball C(x₀,r) ⊂ V (such r > 0 exists as V is open).
Suppose y is such that
\[ \|y - y_0\| < \frac{r}{2\|A^{-1}\|} . \]
If we show that y ∈ W, then we have shown that W is open. If x ∈ C(x₀,r), then
\[ \begin{aligned} \|\varphi_y(x) - x_0\| &\le \|\varphi_y(x) - \varphi_y(x_0)\| + \|\varphi_y(x_0) - x_0\| \\ &\le \frac{1}{2}\|x - x_0\| + \|A^{-1}(y-y_0)\| \\ &\le \frac{1}{2} r + \|A^{-1}\|\,\|y-y_0\| \\ &< \frac{1}{2} r + \|A^{-1}\| \frac{r}{2\|A^{-1}\|} = r . \end{aligned} \]
So ϕ_y takes C(x₀,r) into B(x₀,r) ⊂ C(x₀,r). It is a contraction on C(x₀,r), and C(x₀,r) is complete (a closed subset of Rⁿ is complete). Apply the contraction mapping principle to obtain a fixed point x, i.e. ϕ_y(x) = x. That is, f(x) = y, and y ∈ f(C(x₀,r)) ⊂ f(V) = W. Therefore W is open.
Next we need to show that g is continuously differentiable and compute its derivative. First let us show that it is differentiable. Let y ∈ W and k ∈ Rⁿ, k ≠ 0, such that y + k ∈ W. Because f|_V is a one-to-one and onto mapping of V onto W, there are unique x ∈ V and h ∈ Rⁿ, h ≠ 0 and x + h ∈ V, such that f(x) = y and f(x+h) = y + k. In other words, g(y) = x and g(y+k) = x + h. See Figure 8.11.
We can still squeeze some information from the fact that ϕ_y is a contraction:
\[ \varphi_y(x+h) - \varphi_y(x) = h + A^{-1}\bigl(f(x) - f(x+h)\bigr) = h - A^{-1} k . \]
So
\[ \|h - A^{-1}k\| = \|\varphi_y(x+h) - \varphi_y(x)\| \le \frac{1}{2} \|x+h-x\| = \frac{\|h\|}{2} . \]

Figure 8.11: Proving that g is differentiable.

By the inverse triangle inequality, ‖h‖ − ‖A⁻¹k‖ ≤ ½‖h‖, so
\[ \|h\| \le 2\|A^{-1}k\| \le 2\|A^{-1}\| \, \|k\| . \]
In particular, as k goes to 0, so does h.
As x ∈ V, f′(x) is invertible. Let B := (f′(x))⁻¹, which is what we think the derivative of g at y is. Then
\[ \begin{aligned} \frac{\|g(y+k)-g(y)-Bk\|}{\|k\|} &= \frac{\|h-Bk\|}{\|k\|} \\ &= \frac{\bigl\| h - B\bigl(f(x+h)-f(x)\bigr) \bigr\|}{\|k\|} \\ &= \frac{\bigl\| B\bigl(f(x+h)-f(x)-f'(x)h\bigr) \bigr\|}{\|k\|} \\ &\le \|B\| \, \frac{\|h\|}{\|k\|} \, \frac{\|f(x+h)-f(x)-f'(x)h\|}{\|h\|} \\ &\le 2\|B\| \, \|A^{-1}\| \, \frac{\|f(x+h)-f(x)-f'(x)h\|}{\|h\|} . \end{aligned} \]
As k goes to 0, so does h. So the right-hand side goes to 0 as f is differentiable, and hence the left-hand side also goes to 0. And B is precisely what we wanted g′(y) to be.
We have that g is differentiable; let us show it is C¹(W). The function g : W → V is continuous (it is differentiable), f′ is a continuous function from V to L(Rⁿ), and X ↦ X⁻¹ is a continuous function on the set of invertible operators. As g′(y) = (f′(g(y)))⁻¹ is the composition of these three continuous functions, it is continuous.
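The iteration ϕ_y from the proof is also a practical algorithm for inverting a map numerically. The Python sketch below is not from the text: the map f, the base point p = (0,0), and the target value are hypothetical choices, and A⁻¹ is the hand-computed inverse of f′(0,0).

```python
import math

def f(x, y):
    # a sample map with f(0,0) = (0,0) and f'(0,0) = [[1, 0.1], [0, 1]]
    return (x + 0.1 * math.sin(y), y + 0.1 * x * x)

# inverse of A = f'(0,0) = [[1, 0.1], [0, 1]], computed by hand
Ainv = ((1.0, -0.1), (0.0, 1.0))

def phi(p, target):
    # one step of phi_y(x) = x + A^{-1}(y - f(x)) with y = target
    fx, fy = f(*p)
    rx, ry = target[0] - fx, target[1] - fy
    return (p[0] + Ainv[0][0] * rx + Ainv[0][1] * ry,
            p[1] + Ainv[1][0] * rx + Ainv[1][1] * ry)

target = (0.05, 0.08)
p = (0.0, 0.0)
for _ in range(50):
    p = phi(p, target)   # contraction: each step at least halves the error
print(p, f(*p))          # f(p) agrees with target, so p = g(target)
```

Starting anywhere in the ball where ϕ_y is a contraction, the fixed point is the unique local preimage of the target, which is exactly how the proof produces g.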

Corollary 8.5.2. Suppose U ⊂ Rⁿ is open and f : U → Rⁿ is a continuously differentiable mapping such that f′(x) is invertible for all x ∈ U. Then given any open set V ⊂ U, f(V) is open (f is said to be an open mapping).
to be an open mapping).

Proof. Without loss of generality, suppose U = V . For each point y ∈ f (V ), pick some x ∈ f −1 (y)
(there could be more than one such point). By the inverse function theorem, there is a neighborhood
of x in V that maps onto a neighborhood of y. Hence f (V ) is open.
50 CHAPTER 8. SEVERAL VARIABLES AND PARTIAL DERIVATIVES

Example 8.5.3: The theorem, and the corollary, are not true if f ′ (x) is not invertible for some x.
For example, the map f (x, y) := (x, xy) maps R2 onto the set R2 \ {(0, y) : y 6= 0}, which is neither
open nor closed. In fact, f −1 (0, 0) = {(0, y) : y ∈ R}. This bad behavior only occurs on the y-axis;
everywhere else the function is locally invertible. If we avoid the y-axis, f is even one-to-one.

Example 8.5.4: Also note that just because f ′ (x) is invertible everywhere does not mean that f
is one-to-one globally. It is “locally” one-to-one but perhaps not “globally.” For an example, take
the map f : R2 \ {0} → R2 defined by f (x, y) := (x2 − y2 , 2xy). It is left to the student to show that
f is differentiable and that the derivative is invertible.
On the other hand, the mapping is 2-to-1 globally. For every (a, b) that is not the origin, there
are exactly two solutions to x2 − y2 = a and 2xy = b. We leave it to the student to show that there
is at least one solution, and then notice that replacing x and y with −x and −y gives another
solution.
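As a numerical aside (not in the text; the sample point is arbitrary), the two preimages (x, y) and (−x, −y) indeed map to the same point; behind the scenes, f is the complex squaring map z 7→ z2 written in real coordinates.

```python
import numpy as np

def f(p):
    x, y = p
    return np.array([x**2 - y**2, 2 * x * y])

p = np.array([1.3, -0.7])  # arbitrary nonzero sample point
# (x, y) and (-x, -y) have the same image, so f is (at least) 2-to-1.
assert np.allclose(f(p), f(-p))

# f(x, y) gives the real and imaginary parts of (x + iy)^2; the two
# preimages of a + ib are the two complex square roots of a + ib.
w = complex(*f(p))
root = np.sqrt(w)  # principal complex square root
assert np.isclose(root**2, w)
```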

The invertibility of the derivative is not a necessary condition, just sufficient, for having a
continuous inverse and being an open mapping. For example, the function f (x) := x3 is an open
mapping from R to R and is globally one-to-one with a continuous inverse, although the inverse is
not differentiable at x = 0.
As a side note, there is a related famous, and as yet unsolved, problem called the Jacobian
conjecture: If F : Rn → Rn is polynomial (each component is a polynomial) and the Jacobian
determinant JF is a nonzero constant, does F have a polynomial inverse? The inverse function
theorem gives a local C1 inverse, but the question is whether one can always find a global
polynomial inverse.

8.5.1 Implicit function theorem


The inverse function theorem is really a special case of the implicit function theorem, which we
prove next, although somewhat ironically we prove the implicit function theorem using the inverse
function theorem. In the inverse function theorem we showed that the equation x − f (y) = 0 is
solvable for y in terms of x if the derivative in terms of y is invertible, that is, if f ′ (y) is invertible.
Then there is (locally) a function g such that x − f (g(x)) = 0.
OK, so how about the equation f (x, y) = 0? This equation is not solvable for y in terms of x in
every case. For example, there is no solution when f (x, y) does not actually depend on y. For a
slightly more complicated example, notice that x2 + y2 − 1 = 0 defines the unit circle, and we can
locally solve for y in terms of x when 1) we are near a point that lies on the unit circle and 2) we
are not at a point where the circle has a vertical tangency, or in other words, where ∂ f /∂ y = 0.
To make things simple, we fix some notation. We let (x, y) ∈ Rn+m denote the coordinates
(x1 , . . . , xn , y1 , . . . , ym ). A linear transformation A ∈ L(Rn+m , Rm ) can then be written as A = [Ax Ay ]
so that A(x, y) = Ax x + Ay y, where Ax ∈ L(Rn , Rm ) and Ay ∈ L(Rm ).

Proposition 8.5.5. Let A = [Ax Ay ] ∈ L(Rn+m , Rm ) and suppose Ay is invertible. If B = −(Ay )−1 Ax ,
then
0 = A(x, Bx) = Ax x + Ay Bx.
Furthermore, y = Bx is the unique y ∈ Rm such that A(x, y) = 0.

The proof is obvious. We simply solve and obtain y = Bx. Another way to solve is to “complete
the basis,” that is, add rows to the matrix until we have an invertible matrix. In this case, we construct
the mapping (x, y) 7→ (x, Ax x + Ay y), find that this operator in L(Rn+m ) is invertible, and read the
map B off from the inverse. Let us show that the same can be done for C1 functions.
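The linear case is easy to verify numerically; the sketch below (with arbitrarily chosen matrices, n = m = 2) checks that y = Bx with B = −(Ay )−1 Ax solves A(x, y) = 0.

```python
import numpy as np

# A = [Ax Ay] in L(R^{n+m}, R^m) with n = m = 2; entries chosen arbitrarily,
# with Ay invertible (det Ay = 1).
Ax = np.array([[1.0, 2.0],
               [0.0, 1.0]])
Ay = np.array([[2.0, 1.0],
               [1.0, 1.0]])

B = -np.linalg.solve(Ay, Ax)  # B = -(Ay)^{-1} Ax, without forming the inverse

x = np.array([0.3, -1.2])
y = B @ x

# A(x, y) = Ax x + Ay y should vanish.
assert np.allclose(Ax @ x + Ay @ y, 0)
```

Using `solve` rather than `inv` is the usual numerically stable way to apply (Ay )−1.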
Theorem 8.5.6 (Implicit function theorem). Let U ⊂ Rn+m be an open set and let f : U → Rm be a
C1 (U) mapping. Let (p, q) ∈ U be a point such that f (p, q) = 0 and such that

∂ ( f1 , . . . , fm )/∂ (y1 , . . . , ym ) (p, q) 6= 0.

Then there exists an open set W ⊂ Rn with p ∈ W , an open set W ′ ⊂ Rm with q ∈ W ′ , with
W ×W ′ ⊂ U, and a C1 (W ) mapping g : W → W ′ , with g(p) = q, such that for all x ∈ W , the point g(x)
is the unique point in W ′ such that

f (x, g(x)) = 0.
Furthermore, if [Ax Ay ] = f ′ (p, q), then

g′ (p) = −(Ay )−1 Ax .


The condition ∂ ( f1 , . . . , fm )/∂ (y1 , . . . , ym ) (p, q) = det(Ay ) 6= 0 simply means that Ay is invertible. If n = m = 1,
the condition becomes ∂ f /∂ y (p, q) 6= 0, and W and W ′ are then open intervals. See Figure 8.12.
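For the circle, the derivative formula can be sanity-checked numerically (a sketch; the point (p, q) = (0.6, 0.8) and the step size are chosen for illustration): here g(x) = √(1 − x2) and the formula predicts g′ (p) = −(Ay )−1 Ax = −p/q.

```python
import numpy as np

# f(x, y) = x^2 + y^2 - 1 near (p, q) on the unit circle, first quadrant.
p, q = 0.6, 0.8
Ax, Ay = 2 * p, 2 * q  # the 1x1 blocks of f'(p, q)

def g(x):
    # the explicit implicit function near (p, q)
    return np.sqrt(1 - x**2)

eps = 1e-6
fd = (g(p + eps) - g(p - eps)) / (2 * eps)  # finite-difference g'(p)

# Theorem's formula: g'(p) = -(Ay)^{-1} Ax = -p/q = -0.75.
assert np.isclose(fd, -Ax / Ay, atol=1e-6)
```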

Figure 8.12: Implicit function theorem for f (x, y) = x2 +y2 −1 in U = R2 and (p, q) in the first quadrant.


Proof. Define F : U → Rn+m by F(x, y) := (x, f (x, y)). It is clear that F is C1 , and we want to
show that the derivative at (p, q) is invertible.
Let us compute the derivative. We know that

k f (p + h, q + k) − f (p, q) − Ax h − Ay kk / k(h, k)k

goes to zero as k(h, k)k = √(khk2 + kkk2 ) goes to zero. But then so does

kF(p + h, q + k) − F(p, q) − (h, Ax h + Ay k)k / k(h, k)k
  = k(h, f (p + h, q + k) − f (p, q)) − (h, Ax h + Ay k)k / k(h, k)k
  = k f (p + h, q + k) − f (p, q) − Ax h − Ay kk / k(h, k)k.
So the derivative of F at (p, q) takes (h, k) to (h, Ax h + Ay k). In block matrix form it is [ I 0 ; Ax Ay ].
If (h, Ax h + Ay k) = (0, 0), then h = 0, and so Ay k = 0. As Ay is one-to-one, k = 0. Thus F ′ (p, q) is
one-to-one, or in other words invertible, and we apply the inverse function theorem.
That is, there exists an open set V ⊂ Rn+m with F(p, q) = (p, 0) ∈ V , and a C1 mapping
G : V → Rn+m , such that F(G(x, s)) = (x, s) for all (x, s) ∈ V , G is one-to-one, and G(V ) is open.
Write G = (G1 , G2 ) (the first n and the second m components of G). Then

F(G1 (x, s), G2 (x, s)) = (G1 (x, s), f (G1 (x, s), G2 (x, s))) = (x, s).

So x = G1 (x, s) and f (G1 (x, s), G2 (x, s)) = f (x, G2 (x, s)) = s. Plugging in s = 0, we obtain

f (x, G2 (x, 0)) = 0.

As the set G(V ) is open and (p, q) ∈ G(V ), there exist some open sets We and W ′ such that
We ×W ′ ⊂ G(V ) with p ∈ We and q ∈ W ′ . Take W := {x ∈ We : G2 (x, 0) ∈ W ′ }. The function that
takes x to G2 (x, 0) is continuous and therefore W is open. Define g : W → Rm by g(x) := G2 (x, 0),
which is the g in the theorem. The fact that g(x) is the unique point in W ′ follows because
W ×W ′ ⊂ G(V ) and G is one-to-one.
Next differentiate

x 7→ f (x, g(x))

at p. This is the zero map, so its derivative is zero. Using the chain rule,

0 = A(h, g′ (p)h) = Ax h + Ay g′ (p)h

for all h ∈ Rn , and we obtain the desired derivative for g.

In other words, in the context of the theorem we have m equations in n + m unknowns:

f1 (x1 , . . . , xn , y1 , . . . , ym ) = 0,
f2 (x1 , . . . , xn , y1 , . . . , ym ) = 0,
···
fm (x1 , . . . , xn , y1 , . . . , ym ) = 0.

And the condition guaranteeing a solution is that this is a C1 mapping (that all the components are
C1 , or in other words all the partial derivatives exist and are continuous), and the matrix
 
[ ∂ f1 /∂ y1   ∂ f1 /∂ y2   · · ·   ∂ f1 /∂ ym ]
[ ∂ f2 /∂ y1   ∂ f2 /∂ y2   · · ·   ∂ f2 /∂ ym ]
[     ···          ···      · · ·      ···     ]
[ ∂ fm /∂ y1   ∂ fm /∂ y2   · · ·   ∂ fm /∂ ym ]

is invertible at (p, q).



Example 8.5.7: Consider the set given by x2 + y2 − (z + 1)3 = −1, ex + ey + ez = 3 near the point (0, 0, 0).
The function we are looking at is

f (x, y, z) = (x2 + y2 − (z + 1)3 + 1, ex + ey + ez − 3).

We find that

f ′ = [ 2x   2y   −3(z + 1)2 ]
      [ ex   ey   ez         ].

The matrix of partial derivatives with respect to (y, z) at the origin,

[ 2(0)   −3(0 + 1)2 ]   [ 0   −3 ]
[ e0     e0         ] = [ 1    1 ] ,

is invertible. Hence near (0, 0, 0) we can find y and z as C1 functions of x such that for x near 0 we
have

x2 + y(x)2 − (z(x) + 1)3 = −1,     ex + ey(x) + ez(x) = 3.

The theorem does not tell us how to find y(x) and z(x) explicitly; it just tells us they exist. In other
words, near the origin the set of solutions is a smooth curve in R3 that goes through the origin.
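The theorem only asserts that y(x) and z(x) exist; numerically one can approximate them, for example with Newton's method in the (y, z) variables (a sketch, not from the text; the solver, starting point, and iteration count are my own choices).

```python
import numpy as np

# Newton's method for (y(x), z(x)) solving f(x, y, z) = 0 for x near 0.
def f(x, y, z):
    return np.array([x**2 + y**2 - (z + 1)**3 + 1,
                     np.exp(x) + np.exp(y) + np.exp(z) - 3])

def jac_yz(x, y, z):  # partial derivatives with respect to (y, z) only
    return np.array([[2 * y, -3 * (z + 1)**2],
                     [np.exp(y), np.exp(z)]])

def solve_yz(x, y0=0.0, z0=0.0):
    y, z = y0, z0
    for _ in range(50):
        step = np.linalg.solve(jac_yz(x, y, z), f(x, y, z))
        y, z = y - step[0], z - step[1]
    return y, z

y, z = solve_yz(0.1)
assert np.allclose(f(0.1, y, z), 0, atol=1e-10)
```

For x = 0.1 the iteration lands on a point where both equations hold to machine precision, tracing out one point of the solution curve.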

An interesting observation from the proof is that we solved the equation f (x, g(x)) = s for all s
in some neighborhood of 0, not just s = 0.
Remark 8.5.8. There are versions of the theorem for arbitrarily many derivatives. If f has k
continuous derivatives, then the solution also has k continuous derivatives. See also the next section.

8.5.2 Exercises

Exercise 8.5.1: Let C = {(x, y) ∈ R2 : x2 + y2 = 1}.
a) Solve for y in terms of x near (0, 1) (that is, find the function g from the implicit function theorem for a
neighbourhood of the point (p, q) = (0, 1)).
b) Solve for y in terms of x near (0, −1).
c) Solve for x in terms of y near (−1, 0).

Exercise 8.5.2: Define f : R2 → R2 by f (x, y) := (x, y + h(x)) for some continuously differentiable function
h of one variable.
a) Show that f is one-to-one and onto.
b) Compute f ′ .
c) Show that f ′ is invertible at all points, and compute its inverse.
 
Exercise 8.5.3: Define f : R2 → R2 \ {(0, 0)} by f (x, y) := (ex cos(y), ex sin(y)).
a) Show that f is onto.
b) Show that f ′ is invertible at all points.

c) Show that f is not one-to-one; in fact, for every (a, b) ∈ R2 \ {(0, 0)}, there exist infinitely many different
points (x, y) ∈ R2 such that f (x, y) = (a, b).
Therefore, invertible derivative at every point does not mean that f is invertible globally.
Note: Feel free to use what you know about sine and cosine from calculus.

Exercise 8.5.4: Find a map f : Rn → Rn that is one-to-one, onto, continuously differentiable, but f ′ (0) = 0.
Hint: Generalize f (x) = x3 from one to n dimensions.

Exercise 8.5.5: Consider z2 + xz + y = 0 in R3 . Find an equation D(x, y) = 0, such that if D(x0 , y0 ) 6= 0 and
z2 + x0 z + y0 = 0 for some z ∈ R, then for points near (x0 , y0 ) there exist exactly two distinct continuously
differentiable functions r1 (x, y) and r2 (x, y) such that z = r1 (x, y) and z = r2 (x, y) solve z2 + xz + y = 0. Do
you recognize the expression D from algebra?

Exercise 8.5.6: Suppose f : (a, b) → R2 is continuously differentiable and the first component (the x compo-
nent) of ∇ f (t) is not equal to 0 for all t ∈ (a, b). Prove that there exists an interval (c, d) and a continuously
differentiable function g : (c, d) → R such that (x, y) ∈ f ((a, b)) if and only if x ∈ (c, d) and y = g(x). In
other words, the set f ((a, b)) is a graph of g.

Exercise 8.5.7: Define f : R2 → R2 by

f (x, y) := (x2 sin(1/x) + x/2, y) if x 6= 0,     f (x, y) := (0, y) if x = 0.

a) Show that f is differentiable everywhere.


b) Show that f ′ (0, 0) is invertible.
c) Show that f is not one-to-one in any neighborhood of the origin (it is not locally invertible, that is, the
inverse function theorem does not work).
d) Show that f is not continuously differentiable.
Note: Feel free to use what you know about sine and cosine from calculus.

Exercise 8.5.8 (Polar coordinates): Define a mapping F(r, θ ) := (r cos(θ ), r sin(θ )).
a) Show that F is continuously differentiable (for all (r, θ ) ∈ R2 ).
b) Compute F ′ (0, θ ) for any θ .
c) Show that if r 6= 0, then F ′ (r, θ ) is invertible, therefore an inverse of F exists locally as long as r 6= 0.
d) Show that F : R2 → R2 is onto, and for each point (x, y) ∈ R2 , the set F −1 (x, y) is infinite.
e) Show that F : R2 → R2 is an open map, despite not satisfying the condition of the inverse function
theorem.

f) Show that F|(0,∞)×[0,2π) is one-to-one and onto R2 \ {(0, 0)}.
Note: Feel free to use what you know about sine and cosine from calculus.

Exercise 8.5.9: Let H := {(x, y) ∈ R2 : y > 0}, and for (x, y) ∈ H define

F(x, y) := ( (x2 + y2 − 1)/(x2 + 2y + y2 + 1), −2x/(x2 + 2y + y2 + 1) ).

Prove that F is a bijective mapping from H to B(0, 1), it is continuously differentiable on H, and its inverse is
also continuously differentiable.

Exercise 8.5.10: Suppose U ⊂ R2 is an open set and f : U → R is a C1 function such that ∇ f (x, y) 6= 0 for
all (x, y) ∈ U. Show that every level set is a C1 smooth curve. That is, for every (x, y) ∈ U, there exists a C1
function γ : (−δ , δ ) → R2 with γ ′ (0) 6= 0 such that f (γ(t)) is constant for all t ∈ (−δ , δ ).

Exercise 8.5.11: Suppose U ⊂ R2 is an open set and f : U → R is a C1 function such that ∇ f (x, y) 6= 0
for all (x, y) ∈ U. Show that for every (x, y) there exist a neighborhood V of (x, y), an open set W ⊂ R2 , and a
bijective C1 function with a C1 inverse g : W → V such that the level sets of f ◦ g are horizontal lines in W ;
that is, the set given by ( f ◦ g)(s,t) = c for a constant c is a set of the form {(s,t0 ) ∈ R2 : s ∈ R, (s,t0 ) ∈ W },
where t0 is fixed. That is, the level curves can be locally “straightened.”

8.6 Higher order derivatives


Note: less than 1 lecture, partly depends on the optional §4.3 of volume I
Let U ⊂ Rn be an open set and f : U → R a function. Denote by x = (x1 , x2 , . . . , xn ) ∈ Rn our
coordinates. Suppose ∂ f /∂ x j exists everywhere in U; then it is also a function ∂ f /∂ x j : U → R.
Therefore, it makes sense to talk about its partial derivatives. We denote the partial derivative of
∂ f /∂ x j with respect to xk by

∂ 2 f /∂ xk ∂ x j := ∂ /∂ xk ( ∂ f /∂ x j ).

If k = j, then we write ∂ 2 f /∂ x2j for simplicity.
for simplicity.
We define higher order derivatives inductively. Suppose j1 , j2 , . . . , jℓ are integers between 1 and
n, and suppose

∂ ℓ−1 f /∂ x jℓ−1 ∂ x jℓ−2 · · · ∂ x j1

exists and is differentiable in the variable x jℓ . Then the partial derivative with respect to that variable
is denoted by

∂ ℓ f /∂ x jℓ ∂ x jℓ−1 · · · ∂ x j1 := ∂ /∂ x jℓ ( ∂ ℓ−1 f /∂ x jℓ−1 ∂ x jℓ−2 · · · ∂ x j1 ).

Such a derivative is called a partial derivative of order ℓ.
Sometimes the notation fx j xk is used for ∂ 2 f /∂ xk ∂ x j . This notation swaps the order in which we write
the derivatives, which may be important.

Definition 8.6.1. Let U ⊂ Rn be an open set and f : U → R a function. We say f is a k-times
continuously differentiable function, or a Ck function, if all partial derivatives of all orders up to and
including order k exist and are continuous.

So a continuously differentiable, or C1 , function is one where all partial derivatives exist and
are continuous, which agrees with our previous definition due to . We could have
required only that the kth order partial derivatives exist and are continuous, as the existence of lower
order derivatives is clearly necessary to even define kth order partial derivatives, and these lower
order derivatives are continuous as they are differentiable functions.
When the partial derivatives are continuous, we can swap their order.

Proposition 8.6.2. Suppose U ⊂ Rn is open and f : U → R is a C2 function, and j and k are two
integers from 1 to n. Then
∂ 2 f /∂ xk ∂ x j = ∂ 2 f /∂ x j ∂ xk .
Proof. Fix a p ∈ U, and let e j and ek be the standard basis vectors. Pick two positive numbers s and
t small enough so that p + s0 e j + t0 ek ∈ U whenever 0 < s0 ≤ s and 0 < t0 ≤ t. This can be done as
U is open and so contains a small open ball (or a box if you wish) around p.

Use the mean value theorem on the function

τ 7→ f (p + se j + τ ek ) − f (p + τ ek )

on the interval [0,t] to find a t0 ∈ (0,t) such that

( f (p + se j + tek ) − f (p + tek ) − f (p + se j ) + f (p) ) / t = ∂ f /∂ xk (p + se j + t0 ek ) − ∂ f /∂ xk (p + t0 ek ).

Next, there exists a number s0 ∈ (0, s) such that

( ∂ f /∂ xk (p + se j + t0 ek ) − ∂ f /∂ xk (p + t0 ek ) ) / s = ∂ 2 f /∂ x j ∂ xk (p + s0 e j + t0 ek ).

In other words,

g(s,t) := ( f (p + se j + tek ) − f (p + tek ) − f (p + se j ) + f (p) ) / (st) = ∂ 2 f /∂ x j ∂ xk (p + s0 e j + t0 ek ).

Figure 8.13: Using the mean value theorem to estimate a second order partial derivative by a certain
difference quotient.

See Figure 8.13. The s0 and t0 depend on s and t, but 0 < s0 < s and 0 < t0 < t. Denote by R2+
the set of (s,t) where s > 0 and t > 0. The set R2+ is the domain of g, and (0, 0) is a cluster point
of R2+ . As (s,t) ∈ R2+ goes to (0, 0), (s0 ,t0 ) ∈ R2+ also goes to (0, 0). By continuity of the second
partial derivatives,

lim(s,t)→(0,0) g(s,t) = ∂ 2 f /∂ x j ∂ xk (p).
Now reverse the ordering. Start with the function σ 7→ f (p + σ e j + tek ) − f (p + σ e j ) and find an
s1 ∈ (0, s) such that

( f (p + tek + se j ) − f (p + se j ) − f (p + tek ) + f (p) ) / s = ∂ f /∂ x j (p + tek + s1 e j ) − ∂ f /∂ x j (p + s1 e j ).

Find a t1 ∈ (0,t) such that

( ∂ f /∂ x j (p + tek + s1 e j ) − ∂ f /∂ x j (p + s1 e j ) ) / t = ∂ 2 f /∂ xk ∂ x j (p + t1 ek + s1 e j ).
So g(s,t) = ∂ 2 f /∂ xk ∂ x j (p + t1 ek + s1 e j ) for the same g as above. And as before,

lim(s,t)→(0,0) g(s,t) = ∂ 2 f /∂ xk ∂ x j (p).

Therefore the two partial derivatives are equal.


The proposition does not hold if the derivatives are not continuous. See the exercises. Notice also
that we did not really need a C2 function; we only needed the two second order partial derivatives
involved to be continuous functions.
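Numerically, the difference quotient g(s, t) from the proof approximates either mixed partial; the sketch below (the function f and the step sizes are chosen for illustration) compares it with the mixed partial computed by hand.

```python
import numpy as np

# A C^2 example function (chosen here for illustration).
def f(x, y):
    return np.sin(x * y) + x**2 * y

# The quotient g(s, t) from the proof of the proposition.
def g(p, s, t):
    x, y = p
    return (f(x + s, y + t) - f(x, y + t) - f(x + s, y) + f(x, y)) / (s * t)

p = (0.4, -0.3)
approx = g(p, 1e-5, 1e-5)

# Exact mixed partial: d^2/dy dx [sin(xy) + x^2 y] = cos(xy) - xy sin(xy) + 2x.
x, y = p
exact = np.cos(x * y) - x * y * np.sin(x * y) + 2 * x
assert np.isclose(approx, exact, atol=1e-4)
```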

8.6.1 Exercises
Exercise 8.6.1: Suppose f : U → R is a C2 function for some open U ⊂ Rn and p ∈ U. Use the proof of
Proposition 8.6.2 to find an expression in terms of just the values of f (analogue of the difference quotient for
the first derivative), whose limit is ∂ 2 f /∂ x j ∂ xk (p).

Exercise 8.6.2: Define

f (x, y) := xy(x2 − y2 )/(x2 + y2 ) if (x, y) 6= (0, 0),     f (x, y) := 0 if (x, y) = (0, 0).

Show that
a) The first order partial derivatives exist and are continuous.
b) The partial derivatives ∂ 2 f /∂ x∂ y and ∂ 2 f /∂ y∂ x exist, but are not continuous at the origin, and
∂ 2 f /∂ x∂ y (0, 0) 6= ∂ 2 f /∂ y∂ x (0, 0).
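Not a solution to the exercise, just a numeric illustration (step sizes chosen ad hoc): finite differences suggest the two mixed partials at the origin are −1 and +1, so they cannot both be continuous there.

```python
import numpy as np

def f(x, y):
    if (x, y) == (0, 0):
        return 0.0
    return x * y * (x**2 - y**2) / (x**2 + y**2)

eps = 1e-6

def fx(x, y):  # central difference for df/dx
    return (f(x + eps, y) - f(x - eps, y)) / (2 * eps)

def fy(x, y):  # central difference for df/dy
    return (f(x, y + eps) - f(x, y - eps)) / (2 * eps)

h = 1e-3
fxy_at_0 = (fx(0, h) - fx(0, -h)) / (2 * h)  # d/dy of f_x at the origin
fyx_at_0 = (fy(h, 0) - fy(-h, 0)) / (2 * h)  # d/dx of f_y at the origin

assert np.isclose(fxy_at_0, -1.0, atol=1e-3)
assert np.isclose(fyx_at_0, 1.0, atol=1e-3)
```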

Exercise 8.6.3: Suppose f : U → R is a Ck function for some open U ⊂ Rn and p ∈ U. Suppose j1 , j2 , . . . , jk
are integers between 1 and n, and suppose σ = (σ1 , σ2 , . . . , σk ) is a permutation of (1, 2, . . . , k). Prove

∂ k f /∂ x jk ∂ x jk−1 · · · ∂ x j1 (p) = ∂ k f /∂ x jσk ∂ x jσk−1 · · · ∂ x jσ1 (p).

Exercise 8.6.4: Suppose ϕ : R2 → R is a Ck function such that ϕ(0, θ ) = ϕ(0, ψ) for all θ , ψ ∈ R and
ϕ(r, θ ) = ϕ(r, θ + 2π) for all r, θ ∈ R. Let F(r, θ ) = (r cos(θ ), r sin(θ )) be the polar coordinates mapping
from Exercise 8.5.8. Show that the function g : R2 → R, given by g(x, y) := ϕ(F −1 (x, y)), is well-defined
(notice that F −1 (x, y) can only be defined locally), and when restricted to R2 \ {0} it is a Ck function.
Note: Feel free to use what you know about sine and cosine from calculus.

Exercise 8.6.5: Suppose f : R2 → R is a C2 function. For any (x, y) ∈ R2 compute

lim t→0 ( f (x + t, y) + f (x − t, y) + f (x, y + t) + f (x, y − t) − 4 f (x, y) ) / t 2

in terms of the partial derivatives of f .

Exercise 8.6.6: Suppose f : R2 → R is a function such that all first and second order partial derivatives
exist. Furthermore, suppose that all second order partial derivatives are bounded functions. Prove that f is
continuously differentiable.

Exercise 8.6.7: Follow the strategy below to prove the following simple version of the second derivative
test for functions defined on R2 (using (x, y) as coordinates): Suppose f : R2 → R is a twice continuously
differentiable function with a critical point at the origin, f ′ (0, 0) = 0. If

∂ 2 f /∂ x2 (0, 0) > 0     and     ∂ 2 f /∂ x2 (0, 0) ∂ 2 f /∂ y2 (0, 0) − ( ∂ 2 f /∂ x∂ y (0, 0) )2 > 0,

then f has a (strict) local minimum at (0, 0). Use the following technique: First suppose without loss of
generality that f (0, 0) = 0. Then prove:
a) There exists an A ∈ L(R2 ) such that g = f ◦ A is such that ∂ 2 g/∂ x∂ y (0, 0) = 0 and ∂ 2 g/∂ x2 (0, 0) =
∂ 2 g/∂ y2 (0, 0) = 1.
b) For any ε > 0 there exists a δ > 0 such that |g(x, y) − x2 − y2 | < ε(x2 + y2 ) for all (x, y) ∈ B((0, 0), δ ).
Hint: You can use Taylor’s theorem in one variable.
c) This means that g, and therefore f , has a strict local minimum at (0, 0).
Note: You must avoid the temptation to just apply the one variable second derivative test along lines through
the origin, see .
Chapter 9

One-dimensional Integrals in Several Variables

9.1 Differentiation under the integral


Note: less than 1 lecture
Let f (x, y) be a function of two variables and define
g(y) := ∫_a^b f (x, y) dx.

If f is continuous on the compact rectangle [a, b] × [c, d], then Proposition 7.5.12 from volume I
says that g is continuous on [c, d].
Suppose f is differentiable in y. The main question we want to ask is when we can “differentiate
under the integral,” that is, when is it true that g is differentiable and its derivative is

g′ (y) =? ∫_a^b ∂ f /∂ y (x, y) dx.
Differentiation is a limit, and therefore we are really asking when the two limiting operations of
integration and differentiation commute. This is not always possible, and some extra hypothesis is
necessary. In particular, the first question we would face is the integrability of ∂ f /∂ y, but the formula
can fail even if ∂ f /∂ y is integrable as a function of x for every fixed y.
Let us prove a simple, but perhaps the most useful version of this theorem.
Theorem 9.1.1 (Leibniz integral rule). Suppose f : [a, b] × [c, d] → R is a continuous function, such
that ∂ f /∂ y exists for all (x, y) ∈ [a, b] × [c, d] and is continuous. Define

g(y) := ∫_a^b f (x, y) dx.

Then g : [c, d] → R is continuously differentiable and

g′ (y) = ∫_a^b ∂ f /∂ y (x, y) dx.

The hypotheses on f and ∂ f /∂ y can be weakened, see e.g. , but not dropped outright.
The main point in the proof is for ∂ f /∂ y to exist and be continuous for all x up to the ends, but we only
need a small interval in the y direction. In applications, we often make [c, d] a small interval
around the point where we need to differentiate.

Proof. Fix y ∈ [c, d] and let ε > 0 be given. As ∂ f /∂ y is continuous on [a, b] × [c, d], it is uniformly
continuous. In particular, there exists δ > 0 such that whenever y1 ∈ [c, d] with |y1 − y| < δ and all
x ∈ [a, b], we have

| ∂ f /∂ y (x, y1 ) − ∂ f /∂ y (x, y) | < ε .

Suppose h is such that y + h ∈ [c, d] and |h| < δ . Fix x for a moment and apply the mean value
theorem to find a y1 between y and y + h such that

( f (x, y + h) − f (x, y) ) / h = ∂ f /∂ y (x, y1 ).

As |y1 − y| ≤ |h| < δ ,

| ( f (x, y + h) − f (x, y) ) / h − ∂ f /∂ y (x, y) | = | ∂ f /∂ y (x, y1 ) − ∂ f /∂ y (x, y) | < ε .

This argument worked for every x ∈ [a, b]. Therefore, as a function of x,

x 7→ ( f (x, y + h) − f (x, y) ) / h     converges uniformly to     x 7→ ∂ f /∂ y (x, y)     as h → 0.

We defined uniform convergence for sequences, although the idea is the same. You may replace h
with a sequence of nonzero numbers {hn } converging to 0 such that y + hn ∈ [c, d] and let n → ∞.
Consider the difference quotient of g,

( g(y + h) − g(y) ) / h = ( ∫_a^b f (x, y + h) dx − ∫_a^b f (x, y) dx ) / h = ∫_a^b ( f (x, y + h) − f (x, y) ) / h dx.

Uniform convergence implies the limit can be taken underneath the integral. So

lim h→0 ( g(y + h) − g(y) ) / h = ∫_a^b lim h→0 ( f (x, y + h) − f (x, y) ) / h dx = ∫_a^b ∂ f /∂ y (x, y) dx.

Then g′ is continuous on [c, d] by Proposition 7.5.12 from volume I mentioned above.
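As a numerical sanity check of the theorem (not part of the text; the integrand sin(xy) and the midpoint quadrature are my choices), one can compare a finite-difference derivative of g with the integral of ∂ f /∂ y.

```python
import numpy as np

# Midpoint-rule quadrature on [a, b] with n subintervals.
def midpoint(fun, a, b, n=20000):
    x = a + (np.arange(n) + 0.5) * (b - a) / n
    return np.sum(fun(x)) * (b - a) / n

a, b = 0.0, 1.0
y = 0.7

def g(y):
    # g(y) = integral of f(x, y) = sin(x y) over [a, b]
    return midpoint(lambda x: np.sin(x * y), a, b)

# Left side: finite-difference derivative of g at y.
h = 1e-5
lhs = (g(y + h) - g(y - h)) / (2 * h)

# Right side: integral of the partial derivative d/dy sin(xy) = x cos(xy).
rhs = midpoint(lambda x: x * np.cos(x * y), a, b)

assert np.isclose(lhs, rhs, atol=1e-6)
```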

Example 9.1.2: Let

f (y) = ∫_0^1 sin(x2 − y2 ) dx.

Then

f ′ (y) = ∫_0^1 −2y cos(x2 − y2 ) dx.

Example 9.1.3: Suppose we start with

∫_0^1 (x − 1)/ln(x) dx.

The function under the integral extends to be continuous on [0, 1], and hence the integral exists, see
Exercise 9.1.1. Trouble is finding it. Introduce a parameter y and define a function:

g(y) := ∫_0^1 (x^y − 1)/ln(x) dx.

The function (x^y − 1)/ln(x) also extends to a continuous function of x and y for (x, y) ∈ [0, 1] × [0, 1] (also in
the exercise). See Figure 9.1.

Figure 9.1: The graph z = (x^y − 1)/ln(x) on [0, 1] × [0, 1].

Therefore, g is a continuous function of y on [0, 1], and g(0) = 0. For any ε > 0, the y derivative
of the integrand, x^y , is continuous on [0, 1] × [ε , 1]. Therefore, for y > 0 we may differentiate under
the integral sign:

g′ (y) = ∫_0^1 ( ln(x) x^y )/ln(x) dx = ∫_0^1 x^y dx = 1/(y + 1).

We need to figure out g(1), knowing g′ (y) = 1/(y + 1) and g(0) = 0. By elementary calculus we find
g(1) = ∫_0^1 g′ (y) dy = ln(2). Therefore

∫_0^1 (x − 1)/ln(x) dx = ln(2).
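A quick numeric check of the value (a sketch; the midpoint rule conveniently samples only interior points, where no continuous extension of the integrand is needed):

```python
import numpy as np

# Midpoint rule on [0, 1]: sample points are strictly inside (0, 1),
# so (x - 1)/ln(x) is evaluated only where it is defined.
n = 200000
x = (np.arange(n) + 0.5) / n
integral = np.sum((x - 1) / np.log(x)) / n

assert np.isclose(integral, np.log(2), atol=1e-4)
```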

9.1.1 Exercises
Exercise 9.1.1: Prove the two statements that were asserted in Example 9.1.3:
a) Prove (x − 1)/ln(x) extends to a continuous function on [0, 1]. That is, there exists a continuous function
on [0, 1] that equals (x − 1)/ln(x) on (0, 1).
b) Prove (x^y − 1)/ln(x) extends to a continuous function on [0, 1] × [0, 1].

Exercise 9.1.2: Suppose h : R → R is a continuous function. Suppose g : R → R is continuously
differentiable and compactly supported. That is, there exists some M > 0 such that g(x) = 0 whenever |x| ≥ M.
Define

f (x) := ∫_{−∞}^∞ h(y)g(x − y) dy.

Show that f is differentiable.

Exercise 9.1.3: Suppose f : R → R is an infinitely differentiable function (all derivatives exist) such that
f (0) = 0. Show that there exists an infinitely differentiable function g : R → R such that f (x) = x g(x).
Finally show that if f ′ (0) 6= 0, then g(0) 6= 0.
Hint: First write f (x) = ∫_0^x f ′ (s) ds and then rewrite the integral to go from 0 to 1.
Exercise 9.1.4: Compute ∫_0^1 e^{tx} dx. Derive the formula for ∫_0^1 x^n e^x dx not using integration by parts, but by
differentiation underneath the integral.

Exercise 9.1.5: Let U ⊂ Rn be an open set and suppose f (x, y1 , y2 , . . . , yn ) is a continuous function defined
on [0, 1] × U ⊂ Rn+1 . Suppose ∂ f /∂ y1 , ∂ f /∂ y2 , . . . , ∂ f /∂ yn exist and are continuous on [0, 1] × U. Then prove that
F : U → R defined by

F(y1 , y2 , . . . , yn ) := ∫_0^1 f (x, y1 , y2 , . . . , yn ) dx

is continuously differentiable.

Exercise 9.1.6: Work out the following counterexample: Let

f (x, y) := xy3 /(x2 + y2 )2 if x 6= 0 or y 6= 0,     f (x, y) := 0 if x = 0 and y = 0.

a) Prove that for any fixed y the function x 7→ f (x, y) is Riemann integrable on [0, 1] and

g(y) = ∫_0^1 f (x, y) dx = y/(2y2 + 2).

Therefore g′ (y) exists and we get the continuous function

g′ (y) = (1 − y2 )/( 2(y2 + 1)2 ).

b) Prove that ∂ f /∂ y exists at all x and y and compute it.
c) Show that for all y,

∫_0^1 ∂ f /∂ y (x, y) dx

exists, but

g′ (0) 6= ∫_0^1 ∂ f /∂ y (x, 0) dx.

Exercise 9.1.7: Work out the following counterexample: Let

f (x, y) := x sin( y/(x2 + y2 ) ) if (x, y) 6= (0, 0),     f (x, y) := 0 if (x, y) = (0, 0).

a) Prove f is continuous on all of R2 . Therefore the following function is well defined for every y ∈ R:

g(y) := ∫_0^1 f (x, y) dx.

b) Prove ∂ f /∂ y exists for all (x, y), but is not continuous at (0, 0).
c) Show that ∫_0^1 ∂ f /∂ y (x, 0) dx does not exist even if we take improper integrals, that is, that the limit
lim h→0+ ∫_h^1 ∂ f /∂ y (x, 0) dx does not exist.
Note: Feel free to use what you know about sine and cosine from calculus.

Exercise 9.1.8: Strengthen the Leibniz integral rule in the following way. Suppose f : (a, b) × (c, d) → R
is a bounded continuous function, such that ∂ f /∂ y exists for all (x, y) ∈ (a, b) × (c, d) and is continuous and
bounded. Define

g(y) := ∫_a^b f (x, y) dx.

Then g : (c, d) → R is continuously differentiable and

g′ (y) = ∫_a^b ∂ f /∂ y (x, y) dx.

Hint: See also Exercise 7.5.18 and Theorem 6.2.10 from volume I.

9.2 Path integrals


Note: 2–3 lectures

9.2.1 Piecewise smooth paths


Let γ : [a, b] → Rn be a function and write γ = (γ1 , γ2 , . . . , γn ). Then γ is said to be continuously
differentiable whenever there exists a continuous function γ ′ : [a, b] → Rn such that for every
t ∈ [a, b], we have

lim h→0 kγ(t + h) − γ(t) − γ ′ (t) hk / |h| = 0.

As before, we treat γ ′ (t) either as a linear operator (an n × 1 matrix) or a vector, as it is a column
vector. As a vector, γ ′ (t) = (γ1′ (t), γ2′ (t), . . . , γn′ (t)), and the definition is the same as γ j being a
continuously differentiable function on [a, b] for every j = 1, 2, . . . , n. By , the operator
norm of the operator γ ′ (t) is equal to the euclidean norm of the corresponding vector, so there is no
confusion when writing kγ ′ (t)k.
Definition 9.2.1. A continuously differentiable function γ : [a, b] → Rn is called a smooth path or a
continuously differentiable path if γ is continuously differentiable and γ ′ (t) 6= 0 for all t ∈ [a, b].
The function γ : [a, b] → Rn is called a piecewise smooth path or a piecewise continuously
differentiable path if there exist finitely many points t0 = a < t1 < t2 < · · · < tk = b such that the
restriction γ |[t j−1 ,t j ] is a smooth path.
A path γ is said to be a simple path if γ |(a,b) is a one-to-one function. A path γ is called a closed
path if γ (a) = γ (b), that is, if the path starts and ends at the same point.
Example 9.2.2: Let γ : [0, 4] → R2 be defined by

γ (t) := (t, 0) if t ∈ [0, 1],     (1, t − 1) if t ∈ (1, 2],     (3 − t, 1) if t ∈ (2, 3],     (0, 4 − t) if t ∈ (3, 4].

Figure 9.2: The path γ traversing the unit square.

The path is the unit square traversed counterclockwise. See Figure 9.2. It is a piecewise smooth
path. For example, γ |[1,2] (t) = (1, t − 1) and so (γ |[1,2] )′ (t) = (0, 1) 6= 0. Similarly for the other
sides. Notice that (γ |[1,2] )′ (1) = (0, 1), (γ |[0,1] )′ (1) = (1, 0), but γ ′ (1) does not exist. At the corners
γ is not differentiable. The path γ is a simple closed path, as γ |(0,4) is one-to-one and γ (0) = γ (4).

(The word “smooth” can sometimes mean “infinitely differentiable” in the literature.)

The definition of a piecewise smooth path as we have given it implies continuity (exercise). For
general functions, many authors also allow finitely many discontinuities, when they use the term
piecewise smooth, and so one may say that we defined a piecewise smooth path to be a continuous
piecewise smooth function. While one may get by with smooth paths, for computations, the simplest
paths to write down are often piecewise smooth. 
Generally, we are interested in the direct image γ([a, b]), rather than the specific parametrization,
although that is also important to some degree. When we informally talk about a path or a curve,
we often mean the set γ([a, b]), depending on context.

Example 9.2.3: The condition γ ′ (t) 6= 0 means that the image γ([a, b]) has no “corners” where γ
is smooth. Consider

γ (t) := (t 2 , 0) if t < 0,     γ (t) := (0, t 2 ) if t ≥ 0.

See Figure 9.3. It is left for the reader to check that γ is continuously differentiable, yet the image
γ (R) = {(x, y) ∈ R2 : (x, y) = (s, 0) or (x, y) = (0, s) for some s ≥ 0} has a “corner” at the origin.
And that is because γ ′ (0) = (0, 0). More complicated examples with, say, infinitely many corners
exist, see the exercises.

Figure 9.3: Smooth path with zero derivative with a corner. Several values of t are marked with dots.

The condition γ ′ (t) 6= 0 even at the endpoints guarantees not only no corners, but also that the
path ends nicely, that is, it can extend a little bit past the endpoints. Again, see the exercises.

Example 9.2.4: A graph of a continuously differentiable function f : [a, b] → R is a smooth path.
Define γ : [a, b] → R2 by

γ (t) := (t, f (t)).

Then γ ′ (t) = (1, f ′ (t)), which is never zero, and γ([a, b]) is the graph of f .
There are other ways of parametrizing the path. That is, there are different paths with the
same image. For example, the function t 7→ (1 − t)a + tb takes the interval [0, 1] to [a, b]. Define
α : [0, 1] → R2 by

α (t) := ((1 − t)a + tb, f ((1 − t)a + tb)).

Then α ′ (t) = (b − a, (b − a) f ′ ((1 − t)a + tb)), which is never zero. As sets, α([0, 1]) = γ([a, b]) =
{(x, y) ∈ R2 : x ∈ [a, b] and f (x) = y}, which is just the graph of f .
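Numerically (a sketch; the choices f (x) = x2 and [a, b] = [−1, 2] are for illustration), the two parametrizations trace the same set, with α = γ ◦ h for h(t) = (1 − t)a + tb.

```python
import numpy as np

a, b = -1.0, 2.0

def f(x):
    return x**2

def gamma(t):
    # graph parametrization on [a, b]
    return np.array([t, f(t)])

def alpha(t):
    # reparametrized graph on [0, 1]: alpha = gamma o h, h(t) = (1 - t) a + t b
    s = (1 - t) * a + t * b
    return np.array([s, f(s)])

# Every point of alpha lies on gamma: alpha(t) = gamma(h(t)).
for t in np.linspace(0, 1, 7):
    assert np.allclose(alpha(t), gamma((1 - t) * a + t * b))
```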
The last example leads us to a definition.
Definition 9.2.5. Let γ : [a, b] → Rn be a smooth path and h : [c, d] → [a, b] a continuously differ-
entiable bijective function such that h′ (t) 6= 0 for all t ∈ [c, d]. Then the composition γ ◦ h is called a
smooth reparametrization of γ .
Let γ be a piecewise smooth path, and h be a piecewise smooth bijective function. Then the
composition γ ◦ h is called a piecewise smooth reparametrization of γ .
If h is strictly increasing, then h is said to preserve orientation. If h does not preserve orientation,
then h is said to reverse orientation.
A reparametrization is another path for the same set. That is, (γ ◦ h)([c, d]) = γ([a, b]).
Let us remark that for h, piecewise smooth means that there is some partition t_0 = c < t_1 <
t_2 < · · · < t_k = d, such that h|[t_{j−1},t_j] is continuously differentiable and (h|[t_{j−1},t_j])′(t) ≠ 0 for all
t ∈ [t_{j−1}, t_j]. Since h is bijective, it is either strictly increasing or strictly decreasing. Therefore
either (h|[t_{j−1},t_j])′(t) > 0 for all t or (h|[t_{j−1},t_j])′(t) < 0 for all t.
Proposition 9.2.6. If γ : [a, b] → Rn is a piecewise smooth path, and γ ◦h : [c, d] → Rn is a piecewise
smooth reparametrization, then γ ◦ h is a piecewise smooth path.
Proof. Let us assume that h preserves orientation, that is, h is strictly increasing. If h : [c, d] → [a, b]
gives a piecewise smooth reparametrization, then for some partition r0 = c < r1 < r2 < · · · < rℓ = d,
the restriction h|[r j−1 ,r j ] is continuously differentiable with a positive derivative.
Let t0 = a < t1 < t2 < · · · < tk = b be the partition from the definition of piecewise smooth for
γ together with the points {h(r_0), h(r_1), h(r_2), . . . , h(r_ℓ)}. Let s_j := h^{−1}(t_j). Then s_0 = c < s_1 <
s_2 < · · · < s_k = d is a partition that includes (is a refinement of) the {r_0, r_1, . . . , r_ℓ}. If τ ∈ [s_{j−1}, s_j],
then h(τ) ∈ [t_{j−1}, t_j] since h(s_{j−1}) = t_{j−1}, h(s_j) = t_j, and h is strictly increasing. Also h|[s_{j−1},s_j] is
continuously differentiable, and γ|[t_{j−1},t_j] is also continuously differentiable. Then
\[ (\gamma \circ h)|_{[s_{j-1},s_j]}(\tau) = \gamma|_{[t_{j-1},t_j]} \bigl( h|_{[s_{j-1},s_j]}(\tau) \bigr) . \]
The function (γ ◦ h)|[s_{j−1},s_j] is therefore continuously differentiable and by the chain rule
\[ \bigl( (\gamma \circ h)|_{[s_{j-1},s_j]} \bigr)'(\tau) = \bigl( \gamma|_{[t_{j-1},t_j]} \bigr)'\bigl( h(\tau) \bigr) \, (h|_{[s_{j-1},s_j]})'(\tau) \neq 0 . \]
Consequently, γ ◦ h is a piecewise smooth path. Orientation reversing h is left as an exercise.
If two paths are simple and their images are the same, then there exists such a reparametrization;
this is left as an exercise.

9.2.2 Path integral of a one-form


Definition 9.2.7. Let (x1 , x2 , . . . , xn ) ∈ Rn be our coordinates. Given n real-valued continuous
functions ω1 , ω2 , . . . , ωn defined on a set S ⊂ Rn , we define a one-form to be an object of the form
ω = ω1 dx1 + ω2 dx2 + · · · + ωn dxn .
We could represent ω as a continuous function from S to Rn , although it is better to think of it as a
different object.
9.2. PATH INTEGRALS 69

Example 9.2.8:
\[ \omega(x, y) := \frac{-y}{x^2+y^2} \, dx + \frac{x}{x^2+y^2} \, dy \]
is a one-form defined on R2 \ {(0, 0)}.
Definition 9.2.9. Let γ : [a, b] → Rn be a smooth path and let
\[ \omega = \omega_1 \, dx_1 + \omega_2 \, dx_2 + \cdots + \omega_n \, dx_n \]
be a one-form defined on the direct image γ([a, b]). Write γ = (γ_1, γ_2, . . . , γ_n). Define:
\[ \int_\gamma \omega := \int_a^b \Bigl( \omega_1\bigl(\gamma(t)\bigr) \gamma_1'(t) + \omega_2\bigl(\gamma(t)\bigr) \gamma_2'(t) + \cdots + \omega_n\bigl(\gamma(t)\bigr) \gamma_n'(t) \Bigr) \, dt = \int_a^b \biggl( \sum_{j=1}^n \omega_j\bigl(\gamma(t)\bigr) \gamma_j'(t) \biggr) \, dt . \]

To remember the definition note that x_j is γ_j(t), so dx_j becomes γ_j′(t) dt.


If γ is piecewise smooth, take the corresponding partition t_0 = a < t_1 < t_2 < · · · < t_k = b, and
assume the partition is minimal in the sense that γ is not differentiable at t_1, t_2, . . . , t_{k−1}. As each
γ|[t_{j−1},t_j] is a smooth path, define
\[ \int_\gamma \omega := \int_{\gamma|_{[t_0,t_1]}} \omega + \int_{\gamma|_{[t_1,t_2]}} \omega + \cdots + \int_{\gamma|_{[t_{k-1},t_k]}} \omega . \]

The notation makes sense from the formula you remember from calculus; let us state it somewhat
informally: if x_j(t) = γ_j(t), then dx_j = γ_j′(t) dt.
Paths can be cut up or concatenated. The proof of the following proposition is a direct application
of the additivity of the Riemann integral, and is left as an exercise. The proposition justifies why we
defined the integral over a piecewise smooth path in the way we did, and it justifies that we may as
well have taken any partition, not just the minimal one, in the definition.
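The defining formula translates directly into a numerical approximation: sample the integrand ∑_j ω_j(γ(t)) γ_j′(t) and form a Riemann sum. Here is a small sketch (the helper names are mine, not from the text), checked on the one-form ω = x dy along the straight segment γ(t) = (t, t), where the definition gives ∫_0^1 t dt = 1/2.

```python
# Hedged sketch (not from the text): approximate the path integral of a
# one-form by a midpoint Riemann sum.  `omega` maps a point to the coefficient
# vector (omega_1, ..., omega_n); `gamma`/`gamma_prime` give the smooth path
# and its derivative.

def path_integral(omega, gamma, gamma_prime, a, b, steps=100000):
    total, dt = 0.0, (b - a) / steps
    for i in range(steps):
        t = a + (i + 0.5) * dt                    # midpoint of the i-th piece
        w, gp = omega(gamma(t)), gamma_prime(t)
        total += sum(wj * gj for wj, gj in zip(w, gp)) * dt
    return total

# Check on omega = x dy (that is, omega_1 = 0, omega_2 = x) along the segment
# gamma(t) = (t, t): the integral should be 1/2.
I = path_integral(lambda p: (0.0, p[0]),
                  lambda t: (t, t),
                  lambda t: (1.0, 1.0),
                  0.0, 1.0)
print(I)  # approximately 0.5
```

The same loop works for any dimension n, since only the dot product of the coefficient vector with γ′(t) enters.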
Proposition 9.2.10. Let γ : [a, c] → Rn be a piecewise smooth path, and b ∈ (a, c). Define the
piecewise smooth paths α := γ|[a,b] and β := γ|[b,c]. Let ω be a one-form defined on γ([a, c]). Then
\[ \int_\gamma \omega = \int_\alpha \omega + \int_\beta \omega . \]

Example 9.2.11: Let the one-form ω and the path γ : [0, 2π] → R2 be defined by
\[ \omega(x, y) := \frac{-y}{x^2+y^2} \, dx + \frac{x}{x^2+y^2} \, dy , \qquad \gamma(t) := \bigl( \cos(t), \sin(t) \bigr) . \]
Then
\[ \int_\gamma \omega = \int_0^{2\pi} \biggl( \frac{-\sin(t)}{\cos^2(t) + \sin^2(t)} \bigl( -\sin(t) \bigr) + \frac{\cos(t)}{\cos^2(t) + \sin^2(t)} \cos(t) \biggr) \, dt = \int_0^{2\pi} 1 \, dt = 2\pi . \]
Next, let us parametrize the same curve as α : [0, 1] → R2 defined by α(t) := (cos(2πt), sin(2πt)),
that is, α is a smooth reparametrization of γ. Then
\[ \int_\alpha \omega = \int_0^1 \biggl( \frac{-\sin(2\pi t)}{\cos^2(2\pi t) + \sin^2(2\pi t)} \bigl( -2\pi \sin(2\pi t) \bigr) + \frac{\cos(2\pi t)}{\cos^2(2\pi t) + \sin^2(2\pi t)} \bigl( 2\pi \cos(2\pi t) \bigr) \biggr) \, dt = \int_0^1 2\pi \, dt = 2\pi . \]
Now let us reparametrize with β : [0, 2π] → R2 as β(t) := (cos(−t), sin(−t)). Then
\[ \int_\beta \omega = \int_0^{2\pi} \biggl( \frac{-\sin(-t)}{\cos^2(-t) + \sin^2(-t)} \sin(-t) + \frac{\cos(-t)}{\cos^2(-t) + \sin^2(-t)} \bigl( -\cos(-t) \bigr) \biggr) \, dt = \int_0^{2\pi} (-1) \, dt = -2\pi . \]
The path α is an orientation preserving reparametrization of γ, and the integrals are the same. The
path β is an orientation reversing reparametrization of γ, and the integral is minus the original. See
Figure 9.4.

Figure 9.4: A circular path reparametrized in two different ways. The arrow indicates the orientation of
γ and α. The path β traverses the circle in the opposite direction.
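The three computations in the example can be reproduced numerically. The sketch below is not from the text (names are mine); it approximates each integral by a midpoint Riemann sum and should give roughly 2π, 2π, and −2π.

```python
import math

# Illustration (not from the text): integrate omega = (-y dx + x dy)/(x^2+y^2)
# over three parametrizations of the unit circle, as in Example 9.2.11.

def integrate(omega, gamma, gamma_prime, a, b, steps=200000):
    # Midpoint Riemann sum for sum_j omega_j(gamma(t)) gamma_j'(t) dt.
    total, dt = 0.0, (b - a) / steps
    for i in range(steps):
        t = a + (i + 0.5) * dt
        w, gp = omega(gamma(t)), gamma_prime(t)
        total += (w[0] * gp[0] + w[1] * gp[1]) * dt
    return total

def omega(p):
    x, y = p
    r2 = x * x + y * y
    return (-y / r2, x / r2)

I_gamma = integrate(omega, lambda t: (math.cos(t), math.sin(t)),
                    lambda t: (-math.sin(t), math.cos(t)), 0.0, 2 * math.pi)
two_pi = 2 * math.pi
I_alpha = integrate(omega,
                    lambda t: (math.cos(two_pi * t), math.sin(two_pi * t)),
                    lambda t: (-two_pi * math.sin(two_pi * t),
                               two_pi * math.cos(two_pi * t)), 0.0, 1.0)
I_beta = integrate(omega, lambda t: (math.cos(-t), math.sin(-t)),
                   lambda t: (math.sin(-t), -math.cos(-t)), 0.0, 2 * math.pi)
print(I_gamma, I_alpha, I_beta)  # approximately 2*pi, 2*pi, -2*pi
```

As the proposition that follows predicts, the orientation preserving reparametrization α gives the same value as γ, and the orientation reversing β flips the sign.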

The previous example is not a fluke. The path integral does not depend on the parametrization
of the curve, the only thing that matters is the direction in which the curve is traversed.
Proposition 9.2.12. Let γ : [a, b] → Rn be a piecewise smooth path and γ ◦ h : [c, d] → Rn a piece-
wise smooth reparametrization. Suppose ω is a one-form defined on the set γ([a, b]). Then
\[ \int_{\gamma \circ h} \omega = \begin{cases} \int_\gamma \omega & \text{if } h \text{ preserves orientation,} \\[2pt] - \int_\gamma \omega & \text{if } h \text{ reverses orientation.} \end{cases} \]
9.2. PATH INTEGRALS 71

Proof. Assume first that γ and h are both smooth. Write ω = ω_1 dx_1 + ω_2 dx_2 + · · · + ω_n dx_n. Suppose
that h is orientation preserving. Using the change of variables formula for the Riemann integral,
\[ \int_\gamma \omega = \int_a^b \biggl( \sum_{j=1}^n \omega_j\bigl(\gamma(t)\bigr) \gamma_j'(t) \biggr) \, dt = \int_c^d \biggl( \sum_{j=1}^n \omega_j\Bigl(\gamma\bigl(h(\tau)\bigr)\Bigr) \gamma_j'\bigl(h(\tau)\bigr) h'(\tau) \biggr) \, d\tau = \int_c^d \biggl( \sum_{j=1}^n \omega_j\Bigl(\gamma\bigl(h(\tau)\bigr)\Bigr) (\gamma_j \circ h)'(\tau) \biggr) \, d\tau = \int_{\gamma \circ h} \omega . \]
If h is orientation reversing, it swaps the order of the limits on the integral and introduces a
minus sign. The details, along with finishing the proof for piecewise smooth paths, are left as an
exercise.
Due to this proposition (and the exercises), if Γ ⊂ Rn is the image of a simple piecewise smooth
path γ([a, b]), then if we somehow indicate the orientation, that is, the direction in which we traverse
the curve, then we can write
\[ \int_\Gamma \omega , \]
without mentioning the specific γ. Furthermore, for a simple closed path, it does not even matter
where we start the parametrization. See the exercises.
Recall that simple means that γ restricted to (a, b) is one-to-one, that is, γ is one-to-one except
perhaps at the endpoints. We also often relax the simple path condition a little bit; for example, it is
enough that γ : [a, b] → Rn is one-to-one except at finitely many points. That is, there are only finitely
many points p ∈ Rn such that γ^{−1}(p) is more than one point. See the exercises. The injectivity
issue is illustrated by the following example.

Example 9.2.13: Suppose γ : [0, 2π] → R2 is given by γ(t) := (cos(t), sin(t)), and β : [0, 2π] → R2
is given by β(t) := (cos(2t), sin(2t)). Notice that γ([0, 2π]) = β([0, 2π]), and we travel around
the same curve, the unit circle. But γ goes around the unit circle once in the counterclockwise
direction, and β goes around the unit circle twice (in the same direction). See Figure 9.5. Compute
\[ \int_\gamma -y \, dx + x \, dy = \int_0^{2\pi} \Bigl( \bigl( -\sin(t) \bigr) \bigl( -\sin(t) \bigr) + \cos(t) \bigl( \cos(t) \bigr) \Bigr) \, dt = 2\pi , \]
\[ \int_\beta -y \, dx + x \, dy = \int_0^{2\pi} \Bigl( \bigl( -\sin(2t) \bigr) \bigl( -2\sin(2t) \bigr) + \cos(2t) \bigl( 2\cos(2t) \bigr) \Bigr) \, dt = 4\pi . \]
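The factor-of-two effect is easy to confirm numerically. The sketch below (not from the text; names mine) integrates −y dx + x dy over the two parametrizations; each revolution contributes 2π.

```python
import math

# Illustration (not from the text): the integral of -y dx + x dy over the
# unit circle picks up 2*pi per revolution, so gamma (once around) gives
# about 2*pi and beta (twice around) gives about 4*pi.

def integrate(gamma, gamma_prime, a, b, steps=200000):
    total, dt = 0.0, (b - a) / steps
    for i in range(steps):
        t = a + (i + 0.5) * dt
        x, y = gamma(t)
        dx, dy = gamma_prime(t)
        total += (-y * dx + x * dy) * dt
    return total

once = integrate(lambda t: (math.cos(t), math.sin(t)),
                 lambda t: (-math.sin(t), math.cos(t)), 0.0, 2 * math.pi)
twice = integrate(lambda t: (math.cos(2 * t), math.sin(2 * t)),
                  lambda t: (-2 * math.sin(2 * t), 2 * math.cos(2 * t)),
                  0.0, 2 * math.pi)
print(once, twice)  # approximately 2*pi and 4*pi
```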

It is sometimes convenient to define a path integral over γ : [a, b] → Rn that is not a path. Define
\[ \int_\gamma \omega := \int_a^b \biggl( \sum_{j=1}^n \omega_j\bigl(\gamma(t)\bigr) \gamma_j'(t) \biggr) \, dt \]
for any continuously differentiable γ. A case that comes up naturally is when γ is constant. Then
γ′(t) = 0 for all t, and γ([a, b]) is a single point, which we regard as a “curve” of length zero. Then,
∫_γ ω = 0 for any ω.
72 CHAPTER 9. ONE-DIMENSIONAL INTEGRALS IN SEVERAL VARIABLES

Figure 9.5: Circular path traversed once by γ : [0, 2π] → R2 and twice by β : [0, 2π] → R2 .

9.2.3 Line integral of a function


Next we integrate a function against the so-called arc-length measure ds. The geometric picture we
have in mind is the area under the graph of the function over a path. Imagine a fence erected over γ
with height given by the function; the integral is the area of the fence. See Figure 9.6.

Figure 9.6: A path γ : [a, b] → R2 in the xy-plane (bold curve), and a function z = f (x, y) graphed above
it in the z direction. The integral is the shaded area depicted.

Definition 9.2.14. Suppose γ : [a, b] → Rn is a smooth path, and f is a continuous function defined
on the image γ([a, b]). Then define
\[ \int_\gamma f \, ds := \int_a^b f\bigl(\gamma(t)\bigr) \, \lVert \gamma'(t) \rVert \, dt . \]
To emphasize the variables we may use
\[ \int_\gamma f(x) \, ds(x) := \int_\gamma f \, ds . \]

The definition for a piecewise smooth path is analogous to the one before and is left to the reader.
9.2. PATH INTEGRALS 73

The line integral of a function is also independent of the parametrization, and in this case, the
orientation does not matter.
Proposition 9.2.15. Let γ : [a, b] → Rn be a piecewise smooth path and γ ◦ h : [c, d] → Rn a piece-
wise smooth reparametrization. Suppose f is a continuous function defined on the set γ([a, b]).
Then
\[ \int_{\gamma \circ h} f \, ds = \int_\gamma f \, ds . \]

Proof. Suppose first that h is orientation preserving and that γ and h are both smooth. Then
\[ \int_\gamma f \, ds = \int_a^b f\bigl(\gamma(t)\bigr) \, \lVert \gamma'(t) \rVert \, dt = \int_c^d f\Bigl(\gamma\bigl(h(\tau)\bigr)\Bigr) \bigl\lVert \gamma'\bigl(h(\tau)\bigr) \bigr\rVert h'(\tau) \, d\tau = \int_c^d f\Bigl(\gamma\bigl(h(\tau)\bigr)\Bigr) \bigl\lVert \gamma'\bigl(h(\tau)\bigr) h'(\tau) \bigr\rVert \, d\tau = \int_c^d f\bigl((\gamma \circ h)(\tau)\bigr) \, \lVert (\gamma \circ h)'(\tau) \rVert \, d\tau = \int_{\gamma \circ h} f \, ds . \]
If h is orientation reversing, it swaps the order of the limits on the integral, but you also have to
introduce a minus sign in order to take h′ inside the norm. The details, along with finishing the
proof for piecewise smooth paths, are left to the reader as an exercise.
Similarly as before, because of this proposition (and the exercises), if γ is simple, it does not
matter which parametrization we use. Therefore, if Γ = γ([a, b]) we can simply write
\[ \int_\Gamma f \, ds . \]
In this case we also do not need to worry about orientation; either way we get the same integral.
Example 9.2.16: Let f (x, y) := x. Let C ⊂ R2 be half of the unit circle for x ≥ 0. We wish to
compute ∫_C f ds. Parametrize the curve C via γ : [−π/2, π/2] → R2 defined as γ(t) := (cos(t), sin(t)).
Then γ′(t) = (−sin(t), cos(t)), and
\[ \int_C f \, ds = \int_\gamma f \, ds = \int_{-\pi/2}^{\pi/2} \cos(t) \sqrt{ \bigl( -\sin(t) \bigr)^2 + \bigl( \cos(t) \bigr)^2 } \, dt = \int_{-\pi/2}^{\pi/2} \cos(t) \, dt = 2 . \]
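A quick numerical check of this example (a sketch, not from the text; the helper name is mine) approximates f(γ(t))‖γ′(t)‖ dt by a midpoint sum and should land near 2.

```python
import math

# Illustration (not from the text): the line integral of f(x, y) = x over the
# right half of the unit circle, as in Example 9.2.16.  The exact answer is 2.

def line_integral(f, gamma, gamma_prime, a, b, steps=200000):
    # Midpoint sum for f(gamma(t)) * ||gamma'(t)|| dt.
    total, dt = 0.0, (b - a) / steps
    for i in range(steps):
        t = a + (i + 0.5) * dt
        gx, gy = gamma_prime(t)
        total += f(gamma(t)) * math.hypot(gx, gy) * dt
    return total

I = line_integral(lambda p: p[0],
                  lambda t: (math.cos(t), math.sin(t)),
                  lambda t: (-math.sin(t), math.cos(t)),
                  -math.pi / 2, math.pi / 2)
print(I)  # approximately 2.0
```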

Definition 9.2.17. Suppose Γ ⊂ Rn is parametrized by a simple piecewise smooth path γ : [a, b] → Rn,
that is, γ([a, b]) = Γ. We define the length by
\[ \ell(\Gamma) := \int_\Gamma ds = \int_\gamma ds . \]
74 CHAPTER 9. ONE-DIMENSIONAL INTEGRALS IN SEVERAL VARIABLES

If γ is smooth,
\[ \ell(\Gamma) = \int_a^b \lVert \gamma'(t) \rVert \, dt . \]
This may be a good time to mention that it is common to write ∫_a^b ‖γ′(t)‖ dt even if the path is only
piecewise smooth. That is because ‖γ′(t)‖ has only finitely many discontinuities and is bounded,
and so the integral exists.
Example 9.2.18: Let x, y ∈ Rn be two points and write [x, y] for the straight line segment between
the two points x and y. Parametrize [x, y] by γ(t) := (1 − t)x + ty for t running between 0 and 1. See
Figure 9.7. Then γ′(t) = y − x, and therefore
\[ \ell\bigl([x, y]\bigr) = \int_{[x,y]} ds = \int_0^1 \lVert y - x \rVert \, dt = \lVert y - x \rVert . \]
So the length of [x, y] is the standard Euclidean distance between x and y, justifying the name.

Figure 9.7: Straight path between x and y parametrized by (1 − t)x + ty.

A simple piecewise smooth path γ : [0, r] → Rn is said to be an arc-length parametrization if for
all t ∈ [0, r] we have
\[ \ell\Bigl( \gamma\bigl([0, t]\bigr) \Bigr) = t . \]
If γ is smooth, then
\[ \int_0^t d\tau = t = \ell\Bigl( \gamma\bigl([0, t]\bigr) \Bigr) = \int_0^t \lVert \gamma'(\tau) \rVert \, d\tau \]
for all t, which means that ‖γ′(t)‖ = 1 for all t. Similarly for piecewise smooth γ, we get ‖γ′(t)‖ = 1
for all t where the derivative exists. So you can think of such a parametrization as moving around
your curve at speed 1. If γ : [0, r] → Rn is an arc-length parametrization, it is common to use s as the
variable, as ∫_γ f ds = ∫_0^r f(γ(s)) ds.
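The length formula and its independence of the parametrization can be tested numerically. The sketch below (not from the text; names mine) computes the length of the unit circle from an arc-length (speed-1) parametrization and from a reparametrization moving at speed 2π; both should give about 2π.

```python
import math

# Illustration (not from the text): arc length as the integral of ||gamma'(t)||
# for two parametrizations of the same unit circle.

def arc_length(gamma_prime, a, b, steps=200000):
    total, dt = 0.0, (b - a) / steps
    for i in range(steps):
        t = a + (i + 0.5) * dt
        gx, gy = gamma_prime(t)
        total += math.hypot(gx, gy) * dt
    return total

# Speed-1 (arc-length) parametrization on [0, 2*pi] ...
L1 = arc_length(lambda t: (-math.sin(t), math.cos(t)), 0.0, 2 * math.pi)
# ... and a reparametrization on [0, 1] moving at constant speed 2*pi.
L2 = arc_length(lambda t: (-2 * math.pi * math.sin(2 * math.pi * t),
                           2 * math.pi * math.cos(2 * math.pi * t)), 0.0, 1.0)
print(L1, L2)  # both approximately 2*pi
```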

9.2.4 Exercises
Exercise 9.2.1: Show that if ϕ : [a, b] → Rn is a piecewise smooth path as we defined it, then ϕ is a continuous
function.

Exercise 9.2.2: Finish the proof of Proposition 9.2.6 for orientation reversing reparametrizations.


9.2. PATH INTEGRALS 75

Exercise 9.2.3: Prove Proposition 9.2.10.

Exercise 9.2.4: Finish the proof of Proposition 9.2.12 for


a) orientation reversing reparametrizations, and
b) piecewise smooth paths and reparametrizations.

Exercise 9.2.5: Finish the proof of Proposition 9.2.15 for


a) orientation reversing reparametrizations, and
b) piecewise smooth paths and reparametrizations.

Exercise 9.2.6: Suppose γ : [a, b] → Rn is a piecewise smooth path, and f is a continuous function defined
on the image γ([a, b]). Provide a definition of ∫_γ f ds.

Exercise 9.2.7: Directly using the definitions compute:
a) the arc-length of the unit square from the earlier example using the given parametrization.
b) the arc-length of the unit circle using the parametrization γ : [0, 1] → R2 , γ(t) := (cos(2πt), sin(2πt)).
c) the arc-length of the unit circle using the parametrization β : [0, 2π] → R2 , β (t) := (cos(t), sin(t)).
Note: Feel free to use what you know about sine and cosine from calculus.

Exercise 9.2.8: Suppose γ : [0, 1] → Rn is a smooth path, and ω is a one-form defined on the image γ([0, 1]).
For r ∈ [0, 1], let γ_r : [0, r] → Rn be defined as simply the restriction of γ to [0, r]. Show that the function
h(r) := ∫_{γ_r} ω is a continuously differentiable function on [0, 1].

Exercise 9.2.9: Suppose γ : [a, b] → Rn is a smooth path. Show that there exists an ε > 0 and a smooth
function γ̃ : (a − ε, b + ε) → Rn with γ̃(t) = γ(t) for all t ∈ [a, b] and γ̃′(t) ≠ 0 for all t ∈ (a − ε, b + ε). That
is, prove that a smooth path extends some small distance past the endpoints.

Exercise 9.2.10: Suppose α : [a, b] → Rn and β : [c, d] → Rn are piecewise smooth paths such that Γ :=
α([a, b]) = β([c, d]). Show that there exist finitely many points {p_1 , p_2 , . . . , p_k } ⊂ Γ, such that the sets
α^{−1}({p_1 , p_2 , . . . , p_k }) and β^{−1}({p_1 , p_2 , . . . , p_k }) are partitions of [a, b] and [c, d], such that on any
subinterval the paths are smooth (that is, they are partitions as in the definition of piecewise smooth path).

Exercise 9.2.11:
a) Suppose γ : [a, b] → Rn and α : [c, d] → Rn are two smooth paths which are one-to-one and γ([a, b]) =
α([c, d]). Then there exists a smooth reparametrization h : [a, b] → [c, d] such that γ = α ◦ h.
Hint 1: It is not hard to show h exists. The trick is to prove it is continuously differentiable with a nonzero
derivative. Apply the implicit function theorem, though it may at first seem the dimensions are wrong.
Hint 2: Worry about the derivative of h in (a, b) first.
b) Prove the same thing as part a), but now for simple closed paths, with the further assumption that
γ(a) = γ(b) = α(c) = α(d).
c) Prove parts a) and b) but for piecewise smooth paths, obtaining piecewise smooth reparametrizations.
Hint: The trick is to find two partitions such that when restricted to a subinterval of the partition both
paths have the same image and are smooth; see the above exercise.
76 CHAPTER 9. ONE-DIMENSIONAL INTEGRALS IN SEVERAL VARIABLES

Exercise 9.2.12: Suppose α : [a, b] → Rn and β : [b, c] → Rn are piecewise smooth paths with α(b) = β (b).
Let γ : [a, c] → Rn be defined by
\[ \gamma(t) := \begin{cases} \alpha(t) & \text{if } t \in [a, b], \\ \beta(t) & \text{if } t \in (b, c]. \end{cases} \]
Show that γ is a piecewise smooth path, and that if ω is a one-form defined on the curve given by γ, then
\[ \int_\gamma \omega = \int_\alpha \omega + \int_\beta \omega . \]

Exercise 9.2.13: Suppose γ : [a, b] → Rn and β : [c, d] → Rn are two simple piecewise smooth closed paths.
That is, γ(a) = γ(b) and β (c) = β (d), and the restrictions γ|(a,b) and β |(c,d) are one-to-one. Suppose
Γ = γ([a, b]) = β([c, d]) and ω is a one-form defined on Γ ⊂ Rn . Show that either
\[ \int_\gamma \omega = \int_\beta \omega , \qquad \text{or} \qquad \int_\gamma \omega = - \int_\beta \omega . \]
In particular, the notation ∫_Γ ω makes sense if we indicate the direction in which the integral is evaluated.
Hint: See the previous three exercises.

Exercise 9.2.14: Suppose γ : [a, b] → Rn and β : [c, d] → Rn are two piecewise smooth paths which are
one-to-one except at finitely many points. That is, there are at most finitely many points p ∈ Rn such that
γ^{−1}(p) or β^{−1}(p) contains more than one point. Suppose Γ = γ([a, b]) = β([c, d]) and ω is a one-form
defined on Γ ⊂ Rn . Show that either
\[ \int_\gamma \omega = \int_\beta \omega , \qquad \text{or} \qquad \int_\gamma \omega = - \int_\beta \omega . \]
In particular, the notation ∫_Γ ω makes sense if we indicate the direction in which the integral is evaluated.
Hint: Same hint as in the last exercise.
Exercise 9.2.15: Define γ : [0, 1] → R2 by γ(t) := ( t³ sin(1/t), t ( 3t² sin(1/t) − t cos(1/t) )² ) for t ≠ 0 and
γ(0) = (0, 0). Show that:
a) γ is continuously differentiable on [0, 1].
b) Show that there exists an infinite sequence {t_n} in [0, 1] converging to 0, such that γ′(t_n) = (0, 0).
c) Show that the points γ(t_n) lie on the line y = 0 and such that the x-coordinate of γ(t_n) alternates between
positive and negative (if they do not alternate you only found a subsequence, you need to find them all).
d) Show that there is no piecewise smooth α whose image equals γ([0, 1]). Hint: Look at part c) and show
that α′ must be zero where it reaches the origin.
e) (Computer) If you know a plotting software that allows you to plot parametric curves, make a plot of
the curve, but only for t in the range [0, 0.1]; otherwise you will not see the behavior. In particular, you
should notice that γ([0, 1]) has infinitely many “corners” near the origin.
Note: Feel free to use what you know about sine and cosine from calculus.

9.3 Path independence


Note: 2 lectures

9.3.1 Path independent integrals


Let U ⊂ Rn be a set and ω a one-form defined on U. The integral of ω is said to be path independent
if for any two points x, y ∈ U and any two piecewise smooth paths γ : [a, b] → U and β : [c, d] → U
such that γ(a) = β(c) = x and γ(b) = β(d) = y we have
\[ \int_\gamma \omega = \int_\beta \omega . \]
In this case we simply write
\[ \int_x^y \omega := \int_\gamma \omega = \int_\beta \omega . \]
Not every one-form gives a path independent integral. Most do not.
Example 9.3.1: Let γ : [0, 1] → R2 be the path γ(t) := (t, 0) going from (0, 0) to (1, 0). Let
β : [0, 1] → R2 be the path β(t) := (t, (1 − t)t) also going between the same points. Then
\[ \int_\gamma y \, dx = \int_0^1 \gamma_2(t) \gamma_1'(t) \, dt = \int_0^1 0 (1) \, dt = 0 , \]
\[ \int_\beta y \, dx = \int_0^1 \beta_2(t) \beta_1'(t) \, dt = \int_0^1 (1-t) t (1) \, dt = \frac{1}{6} . \]
The integral of y dx is not path independent. In particular, \(\int_{(0,0)}^{(1,0)} y \, dx\) does not make sense.
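Numerically, the two computations look like this. The sketch is not from the text (names mine); it forms a midpoint sum for γ₂(t) γ₁′(t) dt over each path and should give 0 and about 1/6.

```python
# Illustration (not from the text): the one-form y dx integrated over two
# different paths from (0, 0) to (1, 0), as in Example 9.3.1.

def integrate_y_dx(gamma, gamma_prime, steps=100000):
    # Midpoint sum for gamma_2(t) * gamma_1'(t) dt over [0, 1].
    total, dt = 0.0, 1.0 / steps
    for i in range(steps):
        t = (i + 0.5) * dt
        total += gamma(t)[1] * gamma_prime(t)[0] * dt
    return total

straight = integrate_y_dx(lambda t: (t, 0.0), lambda t: (1.0, 0.0))
arched = integrate_y_dx(lambda t: (t, (1 - t) * t), lambda t: (1.0, 1 - 2 * t))
print(straight, arched)  # 0 and about 1/6: the integral is path dependent
```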

Definition 9.3.2. Let U ⊂ Rn be an open set and f : U → R a continuously differentiable function.
The one-form
\[ df := \frac{\partial f}{\partial x_1} \, dx_1 + \frac{\partial f}{\partial x_2} \, dx_2 + \cdots + \frac{\partial f}{\partial x_n} \, dx_n \]
is called the total derivative of f.
An open set U ⊂ Rn is said to be path connected if for every two points x and y in U, there
exists a piecewise smooth path starting at x and ending at y.
We leave as an exercise that every connected open set is path connected.
Proposition 9.3.3. Let U ⊂ Rn be a path connected open set and ω a one-form defined on U. Then
∫_x^y ω is path independent (for all x, y ∈ U) if and only if there exists a continuously differentiable
f : U → R such that ω = d f.
In fact, if such an f exists, then for any two points x, y ∈ U
\[ \int_x^y \omega = f(y) - f(x) . \]

Normally only a continuous path is used in this definition, but for open sets the two definitions are equivalent. See
the exercises.
In other words, if we fix p ∈ U, then f(x) = C + ∫_p^x ω for some constant C.
Proof. First suppose that the integral is path independent. Pick p ∈ U. Since U is path connected,
there exists a path from p to any x ∈ U. Define
\[ f(x) := \int_p^x \omega . \]
Write ω = ω_1 dx_1 + ω_2 dx_2 + · · · + ω_n dx_n. We wish to show that for every j = 1, 2, . . . , n, the
partial derivative ∂f/∂x_j exists and is equal to ω_j.
Let e_j be an arbitrary standard basis vector, and h a nonzero real number. Compute
\[ \frac{f(x + h e_j) - f(x)}{h} = \frac{1}{h} \biggl( \int_p^{x + h e_j} \omega - \int_p^x \omega \biggr) = \frac{1}{h} \int_x^{x + h e_j} \omega , \]
which follows by Proposition 9.2.10 and path independence, as \(\int_p^{x+he_j} \omega = \int_p^x \omega + \int_x^{x+he_j} \omega\), because
we pick a path from p to x + he_j that also happens to pass through x, and then we cut this path in
two. See Figure 9.8.

Figure 9.8: Using path independence in computing the partial derivative.

Since U is open, suppose h is so small that all points of distance |h| or less from x are in
U. As the integral is path independent, pick the simplest path possible from x to x + he_j, that is,
γ(t) := x + the_j for t ∈ [0, 1]. The path is in U. Notice γ′(t) = he_j has only one nonzero component,
namely the jth component, which is h. Therefore
\[ \frac{1}{h} \int_x^{x + h e_j} \omega = \frac{1}{h} \int_\gamma \omega = \frac{1}{h} \int_0^1 \omega_j(x + t h e_j) \, h \, dt = \int_0^1 \omega_j(x + t h e_j) \, dt . \]
We wish to take the limit as h → 0. The function ω_j is continuous at x. Given ε > 0, suppose h is
small enough so that |ω_j(x) − ω_j(y)| < ε whenever ‖x − y‖ ≤ |h|. Hence, |ω_j(x + the_j) − ω_j(x)| < ε
for all t ∈ [0, 1], and we estimate
\[ \biggl\lvert \int_0^1 \omega_j(x + t h e_j) \, dt - \omega_j(x) \biggr\rvert = \biggl\lvert \int_0^1 \bigl( \omega_j(x + t h e_j) - \omega_j(x) \bigr) \, dt \biggr\rvert \leq \varepsilon . \]
That is,
\[ \lim_{h \to 0} \frac{f(x + h e_j) - f(x)}{h} = \omega_j(x) . \]
All partials of f exist and are equal to the ω_j, which are continuous functions. Thus, f is continuously
differentiable, and furthermore d f = ω.

For the other direction, suppose a continuously differentiable f exists such that d f = ω. Take a
smooth path γ : [a, b] → U such that γ(a) = x and γ(b) = y. Then
\[ \int_\gamma df = \int_a^b \biggl( \frac{\partial f}{\partial x_1}\bigl(\gamma(t)\bigr) \gamma_1'(t) + \frac{\partial f}{\partial x_2}\bigl(\gamma(t)\bigr) \gamma_2'(t) + \cdots + \frac{\partial f}{\partial x_n}\bigl(\gamma(t)\bigr) \gamma_n'(t) \biggr) \, dt = \int_a^b \frac{d}{dt} \Bigl[ f\bigl(\gamma(t)\bigr) \Bigr] \, dt = f(y) - f(x) . \]
The value of the integral only depends on x and y, not the path taken. Therefore the integral is path
independent. We leave checking this fact for a piecewise smooth path as an exercise.
Path independence can be stated more neatly in terms of integrals over closed paths.
Proposition 9.3.4. Let U ⊂ Rn be a path connected open set and ω a one-form defined on U. Then
ω = d f for some continuously differentiable f : U → R if and only if
\[ \int_\gamma \omega = 0 \qquad \text{for every piecewise smooth closed path } \gamma : [a, b] \to U . \]

Proof. Suppose ω = d f and let γ be a piecewise smooth closed path. Since γ(a) = γ(b) for a closed
path, the previous proposition says
\[ \int_\gamma \omega = f\bigl(\gamma(b)\bigr) - f\bigl(\gamma(a)\bigr) = 0 . \]
Now suppose that for every piecewise smooth closed path γ, ∫_γ ω = 0. Let x, y be two points in
U and let α : [0, 1] → U and β : [0, 1] → U be two piecewise smooth paths with α(0) = β(0) = x
and α(1) = β(1) = y. See Figure 9.9.

Figure 9.9: Two paths from x to y.

Define γ : [0, 2] → U by
\[ \gamma(t) := \begin{cases} \alpha(t) & \text{if } t \in [0, 1], \\ \beta(2 - t) & \text{if } t \in (1, 2]. \end{cases} \]
This path is piecewise smooth. This is due to the fact that γ|[0,1](t) = α(t) and γ|[1,2](t) = β(2 − t)
(note especially γ(1) = α(1) = β(2 − 1)). It is also closed, as γ(0) = α(0) = β(0) = γ(2). So
\[ 0 = \int_\gamma \omega = \int_\alpha \omega - \int_\beta \omega . \]
This follows by first splitting the integral at t = 1, and then noticing that the second part is β travelled
backwards, so that we get minus the β integral. Thus the integral of ω on U is path independent.

However one states path independence, it is often a difficult criterion to check; you have to
check something “for all paths.”
There is a local criterion, a differential equation, that guarantees path independence, or in other
words an antiderivative f whose total derivative is the given one-form ω . Since the criterion is
local, we generally only get the result locally. We can find the antiderivative in any so-called
simply connected domain, which informally is a domain where any path between two points can be
“continuously deformed” into any other path between those two points. To make matters simple, the
usual way this result is proved is for so-called star-shaped domains. As balls are star-shaped we
have the result locally.
Definition 9.3.5. Let U ⊂ Rn be an open set and p ∈ U. We say U is a star-shaped domain with
respect to p if for any other point x ∈ U, the line segment [p, x] is in U, that is, if (1 − t)p + tx ∈ U
for all t ∈ [0, 1]. If we say simply star-shaped, then U is star-shaped with respect to some p ∈ U.
See Figure 9.10.

Figure 9.10: A star-shaped domain with respect to p.

Notice the difference between star-shaped and convex. A convex domain is star-shaped, but a
star-shaped domain need not be convex.
Theorem 9.3.6 (Poincaré lemma). Let U ⊂ Rn be a star-shaped domain and ω a continuously
differentiable one-form defined on U. That is, if
\[ \omega = \omega_1 \, dx_1 + \omega_2 \, dx_2 + \cdots + \omega_n \, dx_n , \]
then ω_1, ω_2, . . . , ω_n are continuously differentiable functions. Suppose that for every j and k
\[ \frac{\partial \omega_j}{\partial x_k} = \frac{\partial \omega_k}{\partial x_j} , \]
then there exists a twice continuously differentiable function f : U → R such that d f = ω .
The condition on the derivatives of ω is precisely the condition that the second partial derivatives
commute. That is, if d f = ω, and f is twice continuously differentiable, then
\[ \frac{\partial \omega_j}{\partial x_k} = \frac{\partial^2 f}{\partial x_k \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_k} = \frac{\partial \omega_k}{\partial x_j} . \]
The condition is clearly necessary. The Poincaré lemma says that it is sufficient for a star-shaped U.

Proof. Suppose U is star-shaped with respect to p = (p_1, p_2, . . . , p_n) ∈ U.
Given x = (x_1, x_2, . . . , x_n) ∈ U, define the path γ : [0, 1] → U as γ(t) := (1 − t)p + tx, so γ′(t) =
x − p. Then let
\[ f(x) := \int_\gamma \omega = \int_0^1 \biggl( \sum_{k=1}^n \omega_k\bigl( (1-t)p + tx \bigr) (x_k - p_k) \biggr) \, dt . \]
We differentiate in x_j under the integral, which is allowed as everything, including the partials, is
continuous:
\[ \begin{aligned} \frac{\partial f}{\partial x_j}(x) & = \int_0^1 \biggl( \sum_{k=1}^n \frac{\partial \omega_k}{\partial x_j}\bigl( (1-t)p + tx \bigr) \, t (x_k - p_k) + \omega_j\bigl( (1-t)p + tx \bigr) \biggr) \, dt \\ & = \int_0^1 \biggl( \sum_{k=1}^n \frac{\partial \omega_j}{\partial x_k}\bigl( (1-t)p + tx \bigr) \, t (x_k - p_k) + \omega_j\bigl( (1-t)p + tx \bigr) \biggr) \, dt \\ & = \int_0^1 \frac{d}{dt} \Bigl[ t \, \omega_j\bigl( (1-t)p + tx \bigr) \Bigr] \, dt \\ & = \omega_j(x) . \end{aligned} \]
And this is precisely what we wanted.
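The construction in the proof is explicit enough to carry out numerically. The sketch below is not from the text; the one-form ω = (2x + y) dx + (x + 2y) dy is my own example (it satisfies ∂ω₁/∂y = 1 = ∂ω₂/∂x), and the code applies the formula f(x) = ∫₀¹ ∑_k ω_k((1 − t)p + tx)(x_k − p_k) dt with p = (0, 0), then checks that a numerical gradient of the resulting f recovers ω.

```python
# Illustration (not from the text): the Poincare lemma construction with
# p = (0, 0) for the closed one-form omega = (2x + y) dx + (x + 2y) dy.
# The formula should recover the potential f(x, y) = x^2 + x*y + y^2.

def omega(x, y):
    return (2 * x + y, x + 2 * y)

def f(x, y, steps=100000):
    # f(x) = integral_0^1 sum_k omega_k(t * x) * x_k dt  (here p = 0).
    total, dt = 0.0, 1.0 / steps
    for i in range(steps):
        t = (i + 0.5) * dt
        w1, w2 = omega(t * x, t * y)
        total += (w1 * x + w2 * y) * dt
    return total

val = f(1.0, 2.0)
print(val)  # approximately 1 + 2 + 4 = 7

# Numerical gradient of f at (1, 2) should match omega(1, 2) = (4, 5).
h = 1e-4
grad = ((f(1 + h, 2) - f(1 - h, 2)) / (2 * h),
        (f(1, 2 + h) - f(1, 2 - h)) / (2 * h))
print(grad)
```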
Example 9.3.7: Without some hypothesis on U the theorem is not true. Let
\[ \omega(x, y) := \frac{-y}{x^2+y^2} \, dx + \frac{x}{x^2+y^2} \, dy \]
be defined on R2 \ {0}. Then
\[ \frac{\partial}{\partial y} \biggl( \frac{-y}{x^2+y^2} \biggr) = \frac{y^2 - x^2}{(x^2+y^2)^2} = \frac{\partial}{\partial x} \biggl( \frac{x}{x^2+y^2} \biggr) . \]
However, there is no f : R2 \ {0} → R such that d f = ω. In Example 9.2.11 we integrated from
(1, 0) to (1, 0) along the unit circle counterclockwise, that is, γ(t) = (cos(t), sin(t)) for t ∈ [0, 2π],
and we found the integral to be 2π. We would have gotten 0 if the integral were path independent,
or in other words if there were an f such that d f = ω.

9.3.2 Vector fields


A common object to integrate is a so-called vector field.
Definition 9.3.8. Let U ⊂ Rn be a set. A continuous function v : U → Rn is called a vector field.
Write v = (v_1, v_2, . . . , v_n).
Given a smooth path γ : [a, b] → Rn with γ([a, b]) ⊂ U, we define the path integral of the
vector field v as
\[ \int_\gamma v \cdot d\gamma := \int_a^b v\bigl(\gamma(t)\bigr) \cdot \gamma'(t) \, dt , \]
where the dot in the definition is the standard dot product. Again, the definition for a piecewise
smooth path is done by integrating over each smooth interval and adding the results.

If we unravel the definition, we find that
\[ \int_\gamma v \cdot d\gamma = \int_\gamma v_1 \, dx_1 + v_2 \, dx_2 + \cdots + v_n \, dx_n . \]
Therefore, what we know about integration of one-forms carries over to the integration of vector
fields. For example, path independence for integration of vector fields is simply that
\[ \int_x^y v \cdot d\gamma \]
is path independent if and only if v = ∇f, that is, v is the gradient of a function. The function f is
then called a potential for v.
A vector field v whose path integrals are path independent is called a conservative vector field.
The naming comes from the fact that such vector fields arise in physical systems where a certain
quantity, the energy, is conserved.
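As a numerical illustration (a sketch, not from the text; the example field is mine), take the conservative field v = ∇f for the potential f(x, y) = xy and integrate it over two different paths from (0, 0) to (1, 1). Both integrals should equal f(1, 1) − f(0, 0) = 1.

```python
# Illustration (not from the text): path integrals of a conservative vector
# field v = grad f, with potential f(x, y) = x*y, over two paths from
# (0, 0) to (1, 1).  Path independence predicts both give 1.

def work(v, gamma, gamma_prime, steps=100000):
    # Midpoint sum for v(gamma(t)) . gamma'(t) dt over [0, 1].
    total, dt = 0.0, 1.0 / steps
    for i in range(steps):
        t = (i + 0.5) * dt
        vx, vy = v(gamma(t))
        dx, dy = gamma_prime(t)
        total += (vx * dx + vy * dy) * dt
    return total

grad_f = lambda p: (p[1], p[0])          # gradient of f(x, y) = x*y

line = work(grad_f, lambda t: (t, t), lambda t: (1.0, 1.0))
curve = work(grad_f, lambda t: (t, t * t), lambda t: (1.0, 2 * t))
print(line, curve)  # both approximately 1.0
```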

9.3.3 Exercises
Exercise 9.3.1: Find an f : R2 → R such that d f = x e^{x²+y²} dx + y e^{x²+y²} dy.

Exercise 9.3.2: Find an ω_2 : R2 → R such that there exists a continuously differentiable f : R2 → R for
which d f = e^{xy} dx + ω_2 dy.

Exercise 9.3.3: Finish the proof of Proposition 9.3.3; that is, we only proved the second direction for a
smooth path, not a piecewise smooth path.

Exercise 9.3.4: Show that a star-shaped domain U ⊂ Rn is path connected.

Exercise 9.3.5: Show that U := R2 \ {(x, y) ∈ R2 : x ≤ 0, y = 0} is star-shaped and find all points (x0 , y0 ) ∈ U
such that U is star-shaped with respect to (x0 , y0 ).

Exercise 9.3.6: Suppose U1 and U2 are two open sets in Rn with U1 ∩ U2 nonempty and path connected.
Suppose there exists an f1 : U1 → R and f2 : U2 → R, both twice continuously differentiable such that
d f1 = d f2 on U1 ∩U2 . Then there exists a twice differentiable function F : U1 ∪U2 → R such that dF = d f1
on U1 and dF = d f2 on U2 .

Exercise 9.3.7 (Hard): Let γ : [a, b] → Rn be a simple nonclosed piecewise smooth path (so γ is
one-to-one). Suppose ω is a continuously differentiable one-form defined on some open set V with
γ([a, b]) ⊂ V and ∂ω_j/∂x_k = ∂ω_k/∂x_j for all j and k. Prove that there exists an open set U with
γ([a, b]) ⊂ U ⊂ V and a twice continuously differentiable function f : U → R such that d f = ω.
Hint 1: γ([a, b]) is compact.
Hint 2: Show that you can cover the curve by finitely many balls in sequence so that the kth ball only intersects
the (k − 1)th ball.
Hint 3: See the previous exercise.

Exercise 9.3.8:
a) Show that a connected open set U ⊂ Rn is path connected. Hint: Start with a point x ∈ U, and let Ux ⊂ U
be the set of points that are reachable by a path from x. Show that Ux and U \ Ux are both open, and since
Ux is nonempty (x ∈ Ux ), it must be that Ux = U.
b) Prove the converse, that is, that a path connected set U ⊂ Rn is connected. Hint: For contradiction,
assume U is the union of two disjoint nonempty open sets, and then assume there is a piecewise smooth (and
therefore continuous) path between a point in one and a point in the other.

Exercise 9.3.9: Usually path connectedness is defined using just continuous paths rather than piecewise
smooth paths. Prove that the definitions are equivalent. In other words, prove the following statement:
Suppose U ⊂ Rn is open and such that for any x, y ∈ U, there exists a continuous function γ : [a, b] → U such
that γ(a) = x and γ(b) = y. Then U is path connected (in other words, there exists a piecewise smooth
path).

Exercise 9.3.10 (Hard): Take
\[ \omega(x, y) = \frac{-y}{x^2+y^2} \, dx + \frac{x}{x^2+y^2} \, dy \]
defined on R2 \ {(0, 0)}. Let γ : [a, b] → R2 \ {(0, 0)} be a closed piecewise smooth path. Let R := {(x, y) ∈
R2 : x ≤ 0 and y = 0}. Suppose R ∩ γ([a, b]) is a finite set of k points. Prove that
\[ \int_\gamma \omega = 2\pi \ell \]
for some integer ℓ with |ℓ| ≤ k.
Hint 1: First prove that for a path β that starts and ends on R but does not intersect it otherwise, you find that
∫_β ω is −2π, 0, or 2π.
Hint 2: You proved above that R2 \ R is star-shaped.
Note: The number ℓ is called the winding number; it measures how many times γ winds around the origin
in the counterclockwise direction.
Chapter 10

Multivariable Integral

10.1 Riemann integral over rectangles


Note: 2–3 lectures
As in the one-variable case, we define the Riemann integral using the Darboux upper and lower integrals.
The ideas in this section are very similar to integration in one dimension. The complication is
mostly notational. The differences between one and several dimensions will grow more pronounced
in the sections following.

10.1.1 Rectangles and partitions


Definition 10.1.1. Let (a1 , a2 , . . . , an ) and (b1 , b2 , . . . , bn ) be such that ak ≤ bk for all k. A set of
the form [a1 , b1 ] × [a2 , b2 ] × · · · × [an , bn ] is called a closed rectangle. In this setting it is sometimes
useful to allow ak = bk , in which case we think of [ak , bk ] = {ak } as usual. If ak < bk for all k, then
a set of the form (a1 , b1 ) × (a2 , b2 ) × · · · × (an , bn ) is called an open rectangle.
For an open or closed rectangle R := [a1 , b1 ] × [a2 , b2 ] × · · · × [an , bn ] ⊂ Rn or R := (a1 , b1 ) ×
(a2 , b2 ) × · · · × (an , bn ) ⊂ Rn , we define the n-dimensional volume by

V (R) := (b1 − a1 )(b2 − a2 ) · · · (bn − an ).

A partition P of the closed rectangle R = [a1 , b1 ] × [a2 , b2 ] × · · · × [an , bn ] is a finite set of parti-
tions P1 , P2 , . . . , Pn of the intervals [a1 , b1 ], [a2 , b2 ], . . . , [an , bn ]. We write P = (P1 , P2 , . . . , Pn ). That is,
for every k = 1, 2, . . . , n there is an integer ℓk and a finite set of numbers Pk = {xk,0 , xk,1 , xk,2 , . . . , xk,ℓk }
such that
ak = xk,0 < xk,1 < xk,2 < · · · < xk,ℓk −1 < xk,ℓk = bk .
Picking a set of n integers j1 , j2 , . . . , jn where jk ∈ {1, 2, . . . , ℓk } we get the subrectangle

[x1, j1 −1 , x1, j1 ] × [x2, j2 −1 , x2, j2 ] × · · · × [xn, jn −1 , xn, jn ].

For simplicity, we order the subrectangles somehow and we say {R1 , R2 , . . . , RN } are the subrectan-
gles corresponding to the partition P of R. More simply, we say they are the subrectangles of P. In
other words, we subdivided the original rectangle into many smaller subrectangles. See Figure 10.1.

It is not difficult to see that these subrectangles cover the original R, and their volumes sum to that
of R. That is,

R = ⋃_{k=1}^{N} Rk    and    V(R) = ∑_{k=1}^{N} V(Rk).

Figure 10.1: Example partition of a rectangle in R2 (subrectangles R1 through R9, with cut points x1,0 < x1,1 < x1,2 < x1,3 and x2,0 < x2,1 < x2,2 < x2,3). The order of the subrectangles is not important.

When
Ri = [x1, j1 −1 , x1, j1 ] × [x2, j2 −1 , x2, j2 ] × · · · × [xn, jn −1 , xn, jn ],
then

V (Ri ) = ∆x1, j1 ∆x2, j2 · · · ∆xn, jn = (x1, j1 − x1, j1 −1 )(x2, j2 − x2, j2 −1 ) · · · (xn, jn − xn, jn −1 ).

Let R ⊂ Rn be a closed rectangle and let f : R → R be a bounded function. Let P be a partition
of R and suppose there are N subrectangles R1, R2, . . . , RN. Define

mi := inf{ f(x) : x ∈ Ri },
Mi := sup{ f(x) : x ∈ Ri },
L(P, f) := ∑_{i=1}^{N} mi V(Ri),
U(P, f) := ∑_{i=1}^{N} Mi V(Ri).

We call L(P, f) the lower Darboux sum and U(P, f) the upper Darboux sum.
The indexing in the definition may be complicated, but fortunately we rarely need to go back
directly to the definition. We start by proving facts about the Darboux sums analogous to
the one-variable results.
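The indexing is easier to digest with a tiny computation. The following sketch is our own illustration, not from the text; the function f(x, y) = xy and the uniform partition are arbitrary choices, and because this f is coordinatewise monotone, the infimum and supremum over each subrectangle sit at its corners.

```python
import itertools

def darboux_sums(f, partitions):
    """Lower and upper Darboux sums L(P, f), U(P, f) of f over the
    rectangle partitioned by `partitions` = (P_1, ..., P_n), each P_k
    an increasing list of cut points.  inf and sup over each
    subrectangle are taken from the corner values, which is exact
    when f is coordinatewise monotone (as it is below)."""
    lower = upper = 0.0
    # each index tuple picks one subinterval per axis, i.e. one subrectangle
    for idx in itertools.product(*(range(len(P) - 1) for P in partitions)):
        sides = [(P[j], P[j + 1]) for P, j in zip(partitions, idx)]
        vol = 1.0
        for a, b in sides:
            vol *= b - a                        # V(R_i)
        corners = [f(*pt) for pt in itertools.product(*sides)]
        lower += min(corners) * vol             # m_i V(R_i)
        upper += max(corners) * vol             # M_i V(R_i)
    return lower, upper

# f(x, y) = x*y on [0, 1] x [0, 1]; the integral is 1/4
P = [i / 4 for i in range(5)]                   # 0, 1/4, 1/2, 3/4, 1
L, U = darboux_sums(lambda x, y: x * y, (P, P))
print(L, U)                                     # 0.140625 0.390625
```

Here L(P, f) = 0.140625 and U(P, f) = 0.390625, bracketing the actual integral 1/4.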
Proposition 10.1.2. Suppose R ⊂ Rn is a closed rectangle and f : R → R is a bounded function.
Let m, M ∈ R be such that for all x ∈ R we have m ≤ f (x) ≤ M. For any partition P of R we have

mV (R) ≤ L(P, f ) ≤ U(P, f ) ≤ M V (R).



Proof. Let P be a partition. Then for all i we have m ≤ mi and Mi ≤ M. Also mi ≤ Mi for all i.
Finally, ∑_{i=1}^{N} V(Ri) = V(R). Therefore,

mV(R) = m ( ∑_{i=1}^{N} V(Ri) ) = ∑_{i=1}^{N} m V(Ri) ≤ ∑_{i=1}^{N} mi V(Ri)
      ≤ ∑_{i=1}^{N} Mi V(Ri) ≤ ∑_{i=1}^{N} M V(Ri) = M ( ∑_{i=1}^{N} V(Ri) ) = M V(R).

10.1.2 Upper and lower integrals


By Proposition 10.1.2, the sets of upper and lower Darboux sums are bounded, and we can take
their infima and suprema. As before, we now make the following definition.

Definition 10.1.3. Let f : R → R be a bounded function on a closed rectangle R ⊂ Rn. Define

⨜_R f := sup{ L(P, f) : P a partition of R },    ⨛_R f := inf{ U(P, f) : P a partition of R }.

We call ⨜_R f (the integral sign with a lower bar) the lower Darboux integral and ⨛_R f (the integral sign with an upper bar) the upper Darboux integral.

As in one dimension we have refinements of partitions.

Definition 10.1.4. Let R ⊂ Rn be a closed rectangle. Let P = (P1, P2, . . . , Pn) and P̃ = (P̃1, P̃2, . . . , P̃n)
be partitions of R. We say P̃ is a refinement of P if, as sets, Pk ⊂ P̃k for all k = 1, 2, . . . , n.

It is not difficult to see that if P̃ is a refinement of P, then subrectangles of P are unions of
subrectangles of P̃. Simply put, in a refinement we take the subrectangles of P, and we cut them
into smaller subrectangles. See Figure 10.2.

Figure 10.2: Example refinement of a partition (subrectangles R̃1 through R̃20, with cut points x̃1,0, . . . , x̃1,5 and x̃2,0, . . . , x̃2,4). New “cuts” are marked in dashed lines. Do note that the exact order of the new subrectangles does not matter.

Proposition 10.1.5. Suppose R ⊂ Rn is a closed rectangle, P is a partition of R, and P̃ is a refinement
of P. If f : R → R is a bounded function, then

L(P, f) ≤ L(P̃, f)    and    U(P̃, f) ≤ U(P, f).

Proof. We prove the first inequality; the second follows similarly. Let R1, R2, . . . , RN be the subrectangles
of P and R̃1, R̃2, . . . , R̃Ñ be the subrectangles of P̃. Let Ik be the set of all indices j such that
R̃j ⊂ Rk. For example, in Figures 10.1 and 10.2, I4 = {6, 7, 8, 9}, as R4 = R̃6 ∪ R̃7 ∪ R̃8 ∪ R̃9. Then,

Rk = ⋃_{j∈Ik} R̃j,    V(Rk) = ∑_{j∈Ik} V(R̃j).

Let mk := inf{ f(x) : x ∈ Rk } and m̃j := inf{ f(x) : x ∈ R̃j } as usual. If j ∈ Ik, then mk ≤ m̃j. Then

L(P, f) = ∑_{k=1}^{N} mk V(Rk) = ∑_{k=1}^{N} ∑_{j∈Ik} mk V(R̃j) ≤ ∑_{k=1}^{N} ∑_{j∈Ik} m̃j V(R̃j) = ∑_{j=1}^{Ñ} m̃j V(R̃j) = L(P̃, f).

The key point of this next proposition is that the lower Darboux integral is less than or equal to
the upper Darboux integral.

Proposition 10.1.6. Let R ⊂ Rn be a closed rectangle and f : R → R a bounded function. Let
m, M ∈ R be such that for all x ∈ R we have m ≤ f(x) ≤ M. Then

mV(R) ≤ ⨜_R f ≤ ⨛_R f ≤ M V(R).    (10.1)

Proof. For any partition P, via Proposition 10.1.2,

mV(R) ≤ L(P, f) ≤ U(P, f) ≤ M V(R).

Taking the supremum of L(P, f) and the infimum of U(P, f) over all P, we obtain the first and the last
inequality in (10.1).
The key inequality in (10.1) is the middle one. Let P = (P1, P2, . . . , Pn) and Q = (Q1, Q2, . . . , Qn)
be partitions of R. Define P̃ = (P̃1, P̃2, . . . , P̃n) by letting P̃k := Pk ∪ Qk. Then P̃ is a partition of R, as
can easily be checked, and P̃ is a refinement of P and a refinement of Q. By Proposition 10.1.5,
L(P, f) ≤ L(P̃, f) and U(P̃, f) ≤ U(Q, f). Therefore,

L(P, f) ≤ L(P̃, f) ≤ U(P̃, f) ≤ U(Q, f).

In other words, for two arbitrary partitions P and Q we have L(P, f) ≤ U(Q, f). Via Proposition 1.2.7
from volume I, we obtain

sup{ L(P, f) : P a partition of R } ≤ inf{ U(P, f) : P a partition of R }.

In other words, ⨜_R f ≤ ⨛_R f.

10.1.3 The Riemann integral


We have all we need to define the Riemann integral in n-dimensions over rectangles. Again, the
Riemann integral is only defined on a certain class of functions, called the Riemann integrable
functions.
Definition 10.1.7. Let R ⊂ Rn be a closed rectangle. Let f : R → R be a bounded function such
that

⨜_R f(x) dx = ⨛_R f(x) dx.

Then f is said to be Riemann integrable, and we sometimes say simply integrable. The set of
Riemann integrable functions on R is denoted by R(R). When f ∈ R(R) we define the Riemann
integral

∫_R f := ⨜_R f = ⨛_R f.

When the variable x ∈ Rn needs to be emphasized we write

∫_R f(x) dx,    ∫_R f(x1, . . . , xn) dx1 · · · dxn,    or    ∫_R f(x) dV.

If R ⊂ R2, then often instead of volume we say area, and hence write

∫_R f(x) dA.
Proposition 10.1.6 immediately implies the following proposition.
Proposition 10.1.8. Let f : R → R be a Riemann integrable function on a closed rectangle R ⊂ Rn.
Let m, M ∈ R be such that m ≤ f(x) ≤ M for all x ∈ R. Then

mV(R) ≤ ∫_R f ≤ M V(R).

Example 10.1.9: A constant function is Riemann integrable. Suppose f(x) = c for all x ∈ R. Then

cV(R) ≤ ⨜_R f ≤ ⨛_R f ≤ cV(R).

So f is integrable, and furthermore ∫_R f = cV(R).
The proofs of linearity and monotonicity are almost completely identical to the proofs from one
variable. We therefore leave it as an exercise to prove the next two propositions.
Proposition 10.1.10 (Linearity). Let R ⊂ Rn be a closed rectangle, and let f and g be in R(R) and
α ∈ R.
(i) α f is in R(R) and ∫_R α f = α ∫_R f.
(ii) f + g is in R(R) and ∫_R ( f + g) = ∫_R f + ∫_R g.

Proposition 10.1.11 (Monotonicity). Let R ⊂ Rn be a closed rectangle, let f and g be in R(R),
and suppose f(x) ≤ g(x) for all x ∈ R. Then

∫_R f ≤ ∫_R g.

Checking for integrability using the definition often involves the following technique, as in the
single variable case.

Proposition 10.1.12. Let R ⊂ Rn be a closed rectangle and f : R → R a bounded function. Then


f ∈ R(R) if and only if for every ε > 0, there exists a partition P of R such that

U(P, f ) − L(P, f ) < ε .

Proof. First, if f is integrable, then clearly the supremum of L(P, f ) and infimum of U(P, f ) must
be equal and hence the infimum of U(P, f ) − L(P, f ) is zero. Therefore for every ε > 0 there must
be some partition P such that U(P, f ) − L(P, f ) < ε .
For the other direction, given an ε > 0, find P such that U(P, f) − L(P, f) < ε. Then

⨛_R f − ⨜_R f ≤ U(P, f) − L(P, f) < ε.

As ⨛_R f ≥ ⨜_R f and the above holds for every ε > 0, we conclude ⨛_R f = ⨜_R f and f ∈ R(R).

For simplicity, if f : S → R is a function and R ⊂ S is a closed rectangle, then if the restriction
f|R is integrable, we say f is integrable on R, or f ∈ R(R), and we write

∫_R f := ∫_R f|R.

Proposition 10.1.13. For a closed rectangle S ⊂ Rn , if f : S → R is integrable and R ⊂ S is a


closed rectangle, then f is integrable over R.

Proof. Given ε > 0, we find a partition P of S such that U(P, f ) − L(P, f ) < ε . By making a
refinement of P if necessary, we assume that the endpoints of R are in P. In other words, R is
a union of subrectangles of P. The subrectangles of P divide into two collections, ones that are
subsets of R and ones whose intersection with the interior of R is empty. Suppose R1 , R2 . . . , RK are
the subrectangles that are subsets of R and let RK+1 , . . . , RN be the rest. Let Pe be the partition of R
composed of those subrectangles of P contained in R. Using the same notation as before,
ε > U(P, f) − L(P, f) = ∑_{k=1}^{K} (Mk − mk)V(Rk) + ∑_{k=K+1}^{N} (Mk − mk)V(Rk)
  ≥ ∑_{k=1}^{K} (Mk − mk)V(Rk) = U(P̃, f|R) − L(P̃, f|R).

Therefore, f |R is integrable.

10.1.4 Integrals of continuous functions


Although later we will prove a much more general result, it is useful to start with integrability
of continuous functions. First we wish to measure the fineness of partitions. In one variable we
measured the length of a subinterval, in several variables, we similarly measure the sides of a
subrectangle. We say a rectangle R = [a1 , b1 ] × [a2 , b2 ] × · · · × [an , bn ] has longest side at most α if
bk − ak ≤ α for all k = 1, 2, . . . , n.
Proposition 10.1.14. If a rectangle R ⊂ Rn has longest side at most α, then for any x, y ∈ R,

‖x − y‖ ≤ √n α.

Proof.

‖x − y‖ = √( (x1 − y1)² + (x2 − y2)² + · · · + (xn − yn)² )
        ≤ √( (b1 − a1)² + (b2 − a2)² + · · · + (bn − an)² )
        ≤ √( α² + α² + · · · + α² ) = √n α.
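The bound is easy to test numerically. This is our own sanity check, with an arbitrary sample rectangle in R3: random pairs of points in the rectangle never exceed √n α.

```python
import math
import random

random.seed(0)
# a sample rectangle in R^3; its longest side is alpha = 0.3 (our choice)
sides = [(0.0, 0.3), (1.0, 1.25), (-1.0, -0.8)]
n = len(sides)
alpha = max(b - a for a, b in sides)

# distances between random points stay within the proposition's bound
for _ in range(1000):
    x = [random.uniform(a, b) for a, b in sides]
    y = [random.uniform(a, b) for a, b in sides]
    dist = math.sqrt(sum((s - t) ** 2 for s, t in zip(x, y)))
    assert dist <= math.sqrt(n) * alpha
print("bound sqrt(n)*alpha =", math.sqrt(n) * alpha)
```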
Theorem 10.1.15. Let R ⊂ Rn be a closed rectangle and f : R → R a continuous function, then
f ∈ R(R).
Proof. The proof is analogous to the one variable proof with some complications. The set R is
a closed and bounded subset of Rn , and hence compact. So f is not just continuous, but in fact
uniformly continuous by Theorem 7.5.11 from volume I. Let ε > 0 be given. Find a δ > 0 such that
‖x − y‖ < δ implies | f(x) − f(y)| < ε/V(R).
Let P be a partition of R such that the longest side of any subrectangle is strictly less than δ/√n. If
x, y ∈ Rk for some subrectangle Rk of P, then, by the proposition above, ‖x − y‖ < √n (δ/√n) = δ.
Therefore,

f(x) − f(y) ≤ | f(x) − f(y)| < ε/V(R).

As f is continuous on Rk, it attains a maximum and a minimum on this subrectangle. Let x be a
point where f attains the maximum and y a point where f attains the minimum. Then f(x) = Mk
and f(y) = mk in the notation from the definition of the integral. Therefore,

Mk − mk = f(x) − f(y) < ε/V(R).

And so

U(P, f) − L(P, f) = ( ∑_{k=1}^{N} Mk V(Rk) ) − ( ∑_{k=1}^{N} mk V(Rk) ) = ∑_{k=1}^{N} (Mk − mk)V(Rk) < (ε/V(R)) ∑_{k=1}^{N} V(Rk) = ε.

Via an application of Proposition 10.1.12, we find that f ∈ R(R).
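The quantitative heart of the proof, that U(P, f) − L(P, f) shrinks as the partition gets finer, can be watched directly. A sketch of ours: f(x, y) = x + y is a sample choice, and since it is monotone, its infimum and supremum over each cell sit at opposite corners.

```python
def gap(m):
    """U(P, f) - L(P, f) for f(x, y) = x + y on [0, 1]^2 with the
    uniform m-by-m partition; the inf and sup of this monotone f over
    a cell are attained at its lower-left and upper-right corners."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        for j in range(m):
            m_k = i * h + j * h                  # inf over the cell
            M_k = (i + 1) * h + (j + 1) * h      # sup over the cell
            total += (M_k - m_k) * h * h         # (M_k - m_k) V(R_k)
    return total

print([gap(m) for m in (2, 4, 8, 16)])  # [1.0, 0.5, 0.25, 0.125]
```

The gap is exactly 2/m here, so refining the partition drives it below any given ε.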

10.1.5 Integration of functions with compact support


Let U ⊂ Rn be an open set and f : U → R a function. We say the support of f is the set

supp( f ) := the closure of {x ∈ U : f(x) ≠ 0},

where the closure is taken with respect to the subspace topology on U. Taking the closure in the
subspace topology is the same as taking the closure in the ambient euclidean space Rn and then
intersecting with U. In particular, supp( f ) ⊂ U. The support is the closure (in U) of
the set of points where the function is nonzero. Its complement in U is open. If x ∈ U and x is not
in the support of f, then f is constantly zero on a whole neighborhood of x.
A function f is said to have compact support if supp( f ) is a compact set.
Example 10.1.16: The function f : R2 → R defined by

f(x, y) := −x (x² + y² − 1)²   if √(x² + y²) ≤ 1,
f(x, y) := 0                    else,

is continuous and its support is the closed unit disc C(0, 1) = { (x, y) : √(x² + y²) ≤ 1 }, which is a
compact set, so f has compact support. Do note that the function is zero on the entire y-axis and on
the unit circle, but all points of the closed unit disc are still within the support, as they lie in
the closure of the set of points where f is nonzero. See Figure 10.3.

Figure 10.3: Function with compact support (left); the support is the closed unit disc (right).

If U 6= Rn , then you must be careful to take the closure in U. Consider the following two
examples.
Example 10.1.17: Consider the unit disc B(0, 1) ⊂ R2. The function f : B(0, 1) → R defined by

f(x, y) := 0                    if √(x² + y²) > 1/2,
f(x, y) := 1/2 − √(x² + y²)     if √(x² + y²) ≤ 1/2,

is continuous on B(0, 1) and its support is the smaller closed ball C(0, 1/2). As that is a compact set,
f has compact support.

The function g : B(0, 1) → R defined by

g(x, y) := 0   if x ≤ 0,
g(x, y) := x   if x > 0,

is continuous on B(0, 1), but its support is the set {(x, y) ∈ B(0, 1) : x ≥ 0}. In particular, g is not
compactly supported.

We mostly consider the case when U = Rn. In light of Exercise 10.1.1 below, which says any
continuous function with compact support on U ⊂ Rn can be extended to a continuous function
with compact support on Rn, this is not an oversimplification.

Example 10.1.18: On the other hand, for the unit disc B(0, 1) ⊂ R2, the continuous function
f : B(0, 1) → R defined by f(x, y) := sin( 1/(1 − x² − y²) ) does not have compact support; as f is not
constantly zero on a neighborhood of any point in B(0, 1), we know that the support is the entire disc
B(0, 1). The function does not extend as above to a continuous function. In fact, it is not difficult to
show that it cannot be extended in any way whatsoever to be continuous on all of R2 (the boundary
of the disc is the problem).

Proposition 10.1.19. Suppose f : Rn → R is a continuous function with compact support. If R and
S are closed rectangles such that supp( f ) ⊂ R and supp( f ) ⊂ S, then

∫_S f = ∫_R f.

Proof. As f is continuous, it is automatically integrable on the rectangles R, S, and R ∩ S. Then
Exercise 10.1.7 below says ∫_S f = ∫_{S∩R} f = ∫_R f.
Because of this proposition, when f : Rn → R has compact support and is integrable over a
rectangle R containing the support, we write

∫ f := ∫_R f    or    ∫_{Rn} f := ∫_R f.

For example, if f is continuous and of compact support, then ∫_{Rn} f exists.

10.1.6 Exercises
Exercise 10.1.1: Suppose U ⊂ Rn is open and f : U → R is continuous and of compact support. Show that
the function f̃ : Rn → R defined by

f̃(x) := f(x)   if x ∈ U,
f̃(x) := 0      otherwise,

is continuous.

Exercise 10.1.2: Prove Proposition 10.1.10 and Proposition 10.1.11.

Exercise 10.1.3: Suppose R is a rectangle with the length of one of the sides equal to 0. For any bounded
function f, show that f ∈ R(R) and ∫_R f = 0.

Exercise 10.1.4: Suppose R is a rectangle with the length of one of the sides equal to 0, and suppose S is a
rectangle with R ⊂ S. If f is a bounded function such that f(x) = 0 for x ∈ S \ R, show that f ∈ R(S) and
∫_S f = 0.
Exercise 10.1.5: Suppose f : Rn → R is such that f(x) := 0 if x ≠ 0 and f(0) := 1. Show that f is integrable
on R := [−1, 1] × [−1, 1] × · · · × [−1, 1] directly using the definition, and find ∫_R f.
Exercise 10.1.6: Suppose R is a closed rectangle and h : R → R is a bounded function such that h(x) = 0 if
x ∉ ∂R (the boundary of R). Let S be any closed rectangle. Show that h ∈ R(S) and

∫_S h = 0.

Hint: Write h as a sum of functions as in Exercise 10.1.4.
Exercise 10.1.7: Suppose R′ and R are two closed rectangles with R′ ⊂ R. Suppose f : R → R is in R(R′)
and f(x) = 0 for x ∈ R \ R′. Show that f ∈ R(R) and

∫_{R′} f = ∫_R f.

Do this in the following steps.
a) First do the proof assuming that furthermore f(x) = 0 whenever x ∈ ∂R′.
b) Write f(x) = g(x) + h(x), where g(x) = 0 whenever x ∈ R \ R′, and h(x) is zero except perhaps on ∂R′.
Then show ∫_R h = ∫_{R′} h = 0 (see Exercise 10.1.6).
c) Show ∫_{R′} f = ∫_R f.
Exercise 10.1.8: Suppose R′ ⊂ Rn and R′′ ⊂ Rn are two rectangles such that R = R′ ∪ R′′ is a rectangle, and
R′ ∩ R′′ is a rectangle with one of the sides having length 0 (that is, V(R′ ∩ R′′) = 0). Let f : R → R be a function
such that f ∈ R(R′) and f ∈ R(R′′). Show that f ∈ R(R) and

∫_R f = ∫_{R′} f + ∫_{R′′} f.

Hint: See the previous exercise.
Exercise 10.1.9: Prove a stronger version of Proposition 10.1.19. Suppose f : Rn → R is a function with
compact support but not necessarily continuous. Prove that if R is a closed rectangle such that supp( f ) ⊂ R
and f is integrable over R, then for any other closed rectangle S with supp( f ) ⊂ S, the function f is integrable
over S and ∫_S f = ∫_R f. Hint: See the previous exercises.
Exercise 10.1.10: Suppose R and S are closed rectangles of Rn. Define f : Rn → R as f(x) := 1 if x ∈ R,
and f(x) := 0 otherwise. Prove f is integrable over S and compute ∫_S f. Hint: Consider S ∩ R.
Exercise 10.1.11: Let R = [0, 1] × [0, 1] ⊂ R2.
a) Suppose f : R → R is defined by

f(x, y) := 1 if x = y,    f(x, y) := 0 else.

Show that f ∈ R(R) and compute ∫_R f.
b) Suppose f : R → R is defined by

f(x, y) := 1 if x ∈ Q or y ∈ Q,    f(x, y) := 0 else.

Show that f ∉ R(R).

Exercise 10.1.12: Suppose R is a closed rectangle, and suppose Sj are closed rectangles such that Sj ⊂ R
and Sj ⊂ Sj+1 for all j. Suppose f : R → R is bounded and f ∈ R(Sj) for all j. Show that f ∈ R(R) and

lim_{j→∞} ∫_{Sj} f = ∫_R f.

Exercise 10.1.13: Suppose f : [−1, 1] × [−1, 1] → R is a Riemann integrable function such that f(x) = − f(−x).
Using the definition, prove

∫_{[−1,1]×[−1,1]} f = 0.

10.2 Iterated integrals and Fubini theorem


Note: 1–2 lectures
The Riemann integral in several variables is hard to compute from the definition. For the one-dimensional
Riemann integral we have the fundamental theorem of calculus, and we can compute
many integrals without having to appeal to the definition of the integral. We will rewrite a Riemann
integral in several variables as several one-dimensional Riemann integrals by iterating. However, if
f : [0, 1]² → R is a Riemann integrable function, it is not immediately clear if the three expressions

∫_{[0,1]²} f,    ∫_0^1 ∫_0^1 f(x, y) dx dy,    and    ∫_0^1 ∫_0^1 f(x, y) dy dx

are equal, or if the last two are even well-defined.

Example 10.2.1: Define

f(x, y) := 1 if x = 1/2 and y ∈ Q,
f(x, y) := 0 otherwise.

Then f is Riemann integrable on R := [0, 1]² and ∫_R f = 0. Furthermore, ∫_0^1 ∫_0^1 f(x, y) dx dy = 0.
However,

∫_0^1 f(1/2, y) dy

does not exist, so we cannot even write ∫_0^1 ∫_0^1 f(x, y) dy dx.
Proof: Let us start with integrability of f . We simply take the partition of [0, 1]2 where the
partition in the x direction is {0, 1/2 − ε , 1/2 + ε , 1} and in the y direction {0, 1} . The subrectangles
of the partition are

R1 := [0, 1/2 − ε ] × [0, 1], R2 := [1/2 − ε , 1/2 + ε ] × [0, 1], R3 := [1/2 + ε , 1] × [0, 1].

We have m1 = M1 = 0, m2 = 0, M2 = 1, and m3 = M3 = 0. Therefore,

L(P, f ) = m1V (R1 ) + m2V (R2 ) + m3V (R3 ) = 0(1/2 − ε ) + 0(2ε ) + 0(1/2 − ε ) = 0,

and

U(P, f ) = M1V (R1 ) + M2V (R2 ) + M3V (R3 ) = 0(1/2 − ε ) + 1(2ε ) + 0(1/2 − ε ) = 2ε .

The upper and lower sums are arbitrarily close, and the lower sum is always zero, so the function is
integrable and ∫_R f = 0.
For any y, the function that takes x to f(x, y) is zero except perhaps at a single point x = 1/2. We
know that such a function is integrable and ∫_0^1 f(x, y) dx = 0. Therefore, ∫_0^1 ∫_0^1 f(x, y) dx dy = 0.
However, if x = 1/2, the function that takes y to f(1/2, y) is the nonintegrable function that is 1 on
the rationals and 0 on the irrationals. See Example 5.1.4 from volume I.

We will solve this problem of undefined inside integrals by using the upper and lower integrals,
which are always defined.

We split the coordinates of Rn+m into two parts. That is, we write the coordinates on Rn+m =
Rn × Rm as (x, y), where x ∈ Rn and y ∈ Rm. For a function f(x, y) we write

fx(y) := f(x, y)

when x is fixed and we wish to speak of the function in terms of y. We write

f^y(x) := f(x, y)

when y is fixed and we wish to speak of the function in terms of x.

Theorem 10.2.2 (Fubini version A). Let R × S ⊂ Rn × Rm be a closed rectangle and f : R × S → R
be integrable. The functions g : R → R and h : R → R defined by the lower and upper Darboux integrals

g(x) := ⨜_S fx    and    h(x) := ⨛_S fx

are integrable over R and

∫_R g = ∫_R h = ∫_{R×S} f.

In other words,

∫_{R×S} f = ∫_R ( ⨜_S f(x, y) dy ) dx = ∫_R ( ⨛_S f(x, y) dy ) dx.

If it turns out that fx is integrable for all x, for example when f is continuous, then we obtain the
more familiar

∫_{R×S} f = ∫_R ∫_S f(x, y) dy dx.

Proof. Any partition of R × S is a concatenation of a partition of R and a partition of S. That is,


write a partition of R × S as (P, P′ ) = (P1 , P2 , . . . , Pn , P1′ , P2′ , . . . , Pm′ ), where P = (P1 , P2 , . . . , Pn ) and
P′ = (P1′ , P2′ , . . . , Pm′ ) are partitions of R and S respectively. Let R1 , R2 , . . . , RN be the subrectangles
of P and R′1 , R′2 , . . . , R′K be the subrectangles of P′ . Then the subrectangles of (P, P′ ) are R j × R′k
where 1 ≤ j ≤ N and 1 ≤ k ≤ K.
Let

m_{j,k} := inf{ f(x, y) : (x, y) ∈ Rj × R′k }.

We notice that V(Rj × R′k) = V(Rj)V(R′k), and hence

L((P, P′), f) = ∑_{j=1}^{N} ∑_{k=1}^{K} m_{j,k} V(Rj × R′k) = ∑_{j=1}^{N} ( ∑_{k=1}^{K} m_{j,k} V(R′k) ) V(Rj).

If we let

mk(x) := inf{ f(x, y) : y ∈ R′k } = inf{ fx(y) : y ∈ R′k },

Named after the Italian mathematician Guido Fubini (1879–1943).

then of course if x ∈ Rj, then m_{j,k} ≤ mk(x). Therefore,

∑_{k=1}^{K} m_{j,k} V(R′k) ≤ ∑_{k=1}^{K} mk(x) V(R′k) = L(P′, fx) ≤ ⨜_S fx = g(x).

As we have the inequality for all x ∈ Rj, we have

∑_{k=1}^{K} m_{j,k} V(R′k) ≤ inf_{x∈Rj} g(x).

We thus obtain

L((P, P′), f) ≤ ∑_{j=1}^{N} ( inf_{x∈Rj} g(x) ) V(Rj) = L(P, g).

Similarly U((P, P′), f) ≥ U(P, h), and the proof of this inequality is left as an exercise.
Putting this together we have

L((P, P′), f) ≤ L(P, g) ≤ U(P, g) ≤ U(P, h) ≤ U((P, P′), f).

And since f is integrable, it must be that g is integrable, as

U(P, g) − L(P, g) ≤ U((P, P′), f) − L((P, P′), f),

and we can make the right-hand side arbitrarily small. As for any partition we have L((P, P′), f) ≤
L(P, g) ≤ U((P, P′), f), we must have that ∫_R g = ∫_{R×S} f.
Similarly we have

L((P, P′), f) ≤ L(P, g) ≤ L(P, h) ≤ U(P, h) ≤ U((P, P′), f),

and hence

U(P, h) − L(P, h) ≤ U((P, P′), f) − L((P, P′), f).

So if f is integrable, so is h, and as L((P, P′), f) ≤ L(P, h) ≤ U((P, P′), f), we must have that
∫_R h = ∫_{R×S} f.

We can also do the iterated integration in opposite order. The proof of this version is almost
identical to version A (or follows quickly from version A), and we leave it as an exercise to the
reader.

Theorem 10.2.3 (Fubini version B). Let R × S ⊂ Rn × Rm be a closed rectangle and f : R × S → R
be integrable. The functions g : S → R and h : S → R defined by

g(y) := ⨜_R f^y    and    h(y) := ⨛_R f^y

are integrable over S and

∫_S g = ∫_S h = ∫_{R×S} f.

That is, we also have

∫_{R×S} f = ∫_S ( ⨜_R f(x, y) dx ) dy = ∫_S ( ⨛_R f(x, y) dx ) dy.

Next, suppose for simplicity that fx and f^y are integrable, for example when f is
continuous. Then by putting the two versions together we obtain the familiar

∫_{R×S} f = ∫_R ∫_S f(x, y) dy dx = ∫_S ∫_R f(x, y) dx dy.

Often the Fubini theorem is stated in two dimensions for a continuous function f : R → R on a
rectangle R = [a, b] × [c, d]. Then the Fubini theorem states that

∫_R f = ∫_a^b ∫_c^d f(x, y) dy dx = ∫_c^d ∫_a^b f(x, y) dx dy.

And the Fubini theorem is commonly thought of as the theorem that allows us to swap the order of
iterated integrals.
Repeatedly applying the Fubini theorem gets us the following corollary: Let R := [a1, b1] × [a2, b2] ×
· · · × [an, bn] ⊂ Rn be a closed rectangle and let f : R → R be continuous. Then

∫_R f = ∫_{a1}^{b1} ∫_{a2}^{b2} · · · ∫_{an}^{bn} f(x1, x2, . . . , xn) dxn dxn−1 · · · dx1.

Clearly we can also switch the order of integration to any order we please. We can also relax the
continuity requirement by making sure that all the intermediate functions are integrable, or by using
upper or lower integrals.
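For a continuous integrand the order swap can also be checked numerically. This is a sketch of ours using midpoint-rule approximations; the integrand f(x, y) = x y² on [0, 1] × [0, 2] is an arbitrary sample whose exact integral is 4/3.

```python
def iterated(f, a, b, c, d, order="dydx", n=200):
    """Midpoint-rule approximation of the iterated integral of f over
    [a, b] x [c, d], integrating in the inner variable first."""
    hx, hy = (b - a) / n, (d - c) / n
    xs = [a + hx * (i + 0.5) for i in range(n)]
    ys = [c + hy * (j + 0.5) for j in range(n)]
    if order == "dydx":   # integrate in y first, then in x
        return sum(sum(f(x, y) * hy for y in ys) * hx for x in xs)
    else:                 # integrate in x first, then in y
        return sum(sum(f(x, y) * hx for x in xs) * hy for y in ys)

f = lambda x, y: x * y * y          # continuous on [0, 1] x [0, 2]
A = iterated(f, 0, 1, 0, 2, "dydx")
B = iterated(f, 0, 1, 0, 2, "dxdy")
print(A, B)                          # both approximate the exact value 4/3
```

The two orders agree to within rounding, as the Fubini theorem predicts for a continuous function.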

10.2.1 Exercises
Exercise 10.2.1: Compute ∫_0^1 ∫_{−1}^{1} x e^{xy} dx dy in a simple way.

Exercise 10.2.2: Prove the assertion U((P, P′), f) ≥ U(P, h) from the proof of Theorem 10.2.2.
Exercise 10.2.3 (Easy): Prove Theorem 10.2.3.
Exercise 10.2.4: Let R = [a, b] × [c, d] and let f(x, y) be an integrable function on R such that for any
fixed y, the function that takes x to f(x, y) is zero except at finitely many points. Show

∫_R f = 0.

Exercise 10.2.5: Let R = [a, b] × [c, d] and f(x, y) := g(x)h(y) for two continuous functions g : [a, b] → R
and h : [c, d] → R. Prove

∫_R f = ( ∫_a^b g ) ( ∫_c^d h ).

Exercise 10.2.6: Compute

∫_0^1 ∫_0^1 (x² − y²)/(x² + y²)² dx dy    and    ∫_0^1 ∫_0^1 (x² − y²)/(x² + y²)² dy dx.

You will need to interpret the integrals as improper, that is, as the limit of ∫_ε^1 as ε → 0.

Exercise 10.2.7: Suppose f(x, y) := g(x), where g : [a, b] → R is Riemann integrable. Show that f is Riemann
integrable on any R = [a, b] × [c, d] and

∫_R f = (d − c) ∫_a^b g.

Exercise 10.2.8: Define f : [−1, 1] × [0, 1] → R by

f(x, y) := x if y ∈ Q,    f(x, y) := 0 else.

a) Show ∫_0^1 ∫_{−1}^{1} f(x, y) dx dy exists, but ∫_{−1}^{1} ∫_0^1 f(x, y) dy dx does not.
b) Compute ∫_{−1}^{1} ( ⨛_{[0,1]} f(x, y) dy ) dx and ∫_{−1}^{1} ( ⨜_{[0,1]} f(x, y) dy ) dx.
c) Show f is not Riemann integrable on [−1, 1] × [0, 1] (use Fubini).

Exercise 10.2.9: Define f : [0, 1] × [0, 1] → R by

f(x, y) := 1/q if x ∈ Q, y ∈ Q, and y = p/q in lowest terms,
f(x, y) := 0 else.

a) Show f is Riemann integrable on [0, 1] × [0, 1].
b) Find ⨜_{[0,1]} f(x, y) dx and ⨛_{[0,1]} f(x, y) dx for all y ∈ [0, 1], and show they are unequal for all y ∈ Q.
c) Show ∫_0^1 ∫_0^1 f(x, y) dy dx exists, but ∫_0^1 ∫_0^1 f(x, y) dx dy does not.
Note: By Fubini, ∫_0^1 ( ⨜_{[0,1]} f(x, y) dx ) dy and ∫_0^1 ( ⨛_{[0,1]} f(x, y) dx ) dy do exist and equal the integral of f on R.

10.3 Outer measure and null sets


Note: 2 lectures

10.3.1 Outer measure and null sets


Before we characterize all Riemann integrable functions, we need to make a slight detour. We
introduce a way of measuring the size of sets in Rn .
Definition 10.3.1. Let S ⊂ Rn be a subset. Define the outer measure of S as

m*(S) := inf ∑_{j=1}^{∞} V(Rj),

where the infimum is taken over all sequences {Rj} of open rectangles such that S ⊂ ⋃_{j=1}^{∞} Rj. See
Figure 10.4. In particular, S is of measure zero or a null set if m*(S) = 0.

Figure 10.4: Outer measure construction; in this case S ⊂ R1 ∪ R2 ∪ R3 ∪ · · · , so m*(S) ≤ V(R1) +
V(R2) + V(R3) + · · · .

An immediate and useful consequence (exercise) of the definition is that if A ⊂ B then m∗ (A) ≤
m∗ (B). It is also not difficult to show (exercise) that we obtain the same number m∗ (S) if we also
allow both finite and infinite sequences of rectangles in the definition.
The theory of measures on Rn is a very complicated subject. We will only require measure-zero
sets, and so we focus on these. The set S is of measure zero if for every ε > 0 there exists a sequence
of open rectangles {Rj} such that

S ⊂ ⋃_{j=1}^{∞} Rj    and    ∑_{j=1}^{∞} V(Rj) < ε.    (10.2)

If S is of measure zero and S′ ⊂ S, then S′ is of measure zero. We can use the same exact rectangles.
It is sometimes more convenient to use balls instead of rectangles. Furthermore, we can choose
balls no bigger than a fixed radius.
Proposition 10.3.2. Let δ > 0 be given. A set S ⊂ Rn is of measure zero if and only if for every ε > 0,
there exists a sequence of open balls {Bj}, where the radius of Bj is rj < δ, such that

S ⊂ ⋃_{j=1}^{∞} Bj    and    ∑_{j=1}^{∞} r_j^n < ε.

Note that the “volume” of Bj is proportional to r_j^n.

Proof. If C is a (closed or open) cube (a rectangle with all sides equal) of side s, then C is contained
in a closed ball of radius √n s by Proposition 10.1.14, and therefore in an open ball of radius 2√n s.
Suppose R is a (closed or open) rectangle. Let s be a number that is less than the smallest side
of R and also such that 2√n s < δ. We claim R is contained in a union of closed cubes C1, C2, . . . , Ck
of sides s such that

∑_{j=1}^{k} V(Cj) ≤ 2^n V(R).

It is clearly true (without the 2^n) if R has sides that are integer multiples of s. So if a side is of length
(ℓ + α)s, for ℓ ∈ N and 0 ≤ α < 1, then (ℓ + α)s ≤ 2ℓs. Increasing the side to 2ℓs, and then doing
the same for every side, we obtain a new larger rectangle of volume at most 2^n times larger, but
whose sides are multiples of s.
So suppose there exist {Rj} as in the definition such that (10.2) is true. As we have seen
above, we can choose closed cubes {Ck}, with Ck of side sk, as above, that cover all the rectangles
{Rj} and such that

∑_{k=1}^{∞} s_k^n = ∑_{k=1}^{∞} V(Ck) ≤ 2^n ∑_{j=1}^{∞} V(Rj) < 2^n ε.

Covering each Ck with a ball Bk of radius rk = 2√n sk < δ, we obtain

∑_{k=1}^{∞} r_k^n < 2^{2n} n^{n/2} ε.

And as S ⊂ ⋃_j Rj ⊂ ⋃_k Ck ⊂ ⋃_k Bk, and ε > 0 was arbitrary, we are finished.
For the other direction, suppose S is covered by balls Bj of radii rj, such that ∑ r_j^n < ε, as in the
statement of the proposition. Each Bj is contained in a cube Rj of side 2rj. So V(Rj) = (2rj)^n =
2^n r_j^n. Therefore,

S ⊂ ⋃_{j=1}^{∞} Rj    and    ∑_{j=1}^{∞} V(Rj) ≤ ∑_{j=1}^{∞} 2^n r_j^n < 2^n ε.

The definition of outer measure could have been made with open balls as well, not just for null sets.
We leave this generalization to the reader.

10.3.2 Examples and basic properties


Example 10.3.3: The set Qn ⊂ Rn of points with rational coordinates is a set of measure zero.
Proof: The set Qn is countable, so let us write it as a sequence q1, q2, . . .. For each qj
find an open rectangle Rj with qj ∈ Rj and V(Rj) < ε 2^{−j}. Then

Qn ⊂ ⋃_{j=1}^{∞} Rj    and    ∑_{j=1}^{∞} V(Rj) < ∑_{j=1}^{∞} ε 2^{−j} = ε.
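The geometric-series bookkeeping in this proof can be checked with exact rational arithmetic. A small sketch of ours: however many points we enumerate, assigning the j-th point a cover of volume ε 2^{−j} keeps the total below ε.

```python
from fractions import Fraction

eps = Fraction(1, 100)

# enumerate a finite initial batch of rational points of the plane
points = [(Fraction(p, q), Fraction(r, q))
          for q in range(1, 6) for p in range(q) for r in range(q)]

# the j-th point gets an open rectangle of volume eps * 2**(-j);
# the total volume is a partial geometric series, hence below eps
total = sum(eps * Fraction(1, 2 ** j) for j in range(1, len(points) + 1))
print(total < eps)   # True, no matter how many points are covered
```

Using `Fraction` avoids floating-point rounding, so the inequality total < ε is verified exactly.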

The example points to a more general result.



Proposition 10.3.4. A countable union of measure zero sets is of measure zero.

Proof. Suppose

S = ⋃_{j=1}^{∞} Sj,

where the Sj are all measure zero sets. Let ε > 0 be given. For each j there exists a sequence of open
rectangles {R_{j,k}}_{k=1}^{∞} such that

Sj ⊂ ⋃_{k=1}^{∞} R_{j,k}    and    ∑_{k=1}^{∞} V(R_{j,k}) < 2^{−j} ε.

Then

S ⊂ ⋃_{j=1}^{∞} ⋃_{k=1}^{∞} R_{j,k}.

As V(R_{j,k}) is always positive, the sum over all j and k can be done in any order. In particular, it can
be done as

∑_{j=1}^{∞} ∑_{k=1}^{∞} V(R_{j,k}) < ∑_{j=1}^{∞} 2^{−j} ε = ε.

The next example is not just interesting, it will be useful later.

Example 10.3.5: Let P := {x ∈ Rn : xk = c} for a fixed k = 1, 2, . . . , n and a fixed constant c ∈ R.
Then P is of measure zero.
Proof: First fix s and let us prove that

Ps := { x ∈ Rn : xk = c, |xj| ≤ s for all j ≠ k }

is of measure zero. Given any ε > 0, define the open rectangle

R := { x ∈ Rn : c − ε < xk < c + ε, |xj| < s + 1 for all j ≠ k }.

It is clear that Ps ⊂ R. Furthermore,

V(R) = 2ε (2(s + 1))^{n−1}.

As s is fixed, we make V(R) arbitrarily small by picking ε small enough. So Ps is of measure zero.
Next,

P = ⋃_{j=1}^{∞} Pj,

and a countable union of measure zero sets is of measure zero.



Example 10.3.6: If a < b, then m*([a, b]) = b − a.
Proof: In the case of R, open rectangles are open intervals. Since [a, b] ⊂ (a − ε, b + ε) for every
ε > 0, we have m*([a, b]) ≤ b − a + 2ε for every ε > 0. Hence m*([a, b]) ≤ b − a.
Let us prove the other inequality. Suppose (a_j, b_j) are open intervals such that

[a, b] ⊂ ⋃_{j=1}^∞ (a_j, b_j).

We wish to bound ∑(b_j − a_j) from below. Since [a, b] is compact, finitely many of the open
intervals still cover [a, b]. As throwing out some of the intervals only makes the sum smaller, we
only need to consider the finite number of intervals still covering [a, b]. If (a_i, b_i) ⊂ (a_j, b_j), then we
can throw out (a_i, b_i) as well. In other words, the intervals that are left have distinct left endpoints,
and whenever a_j < a_i < b_j, then b_j < b_i. Therefore [a, b] ⊂ ⋃_{j=1}^k (a_j, b_j) for some k, and we assume
that the intervals are sorted such that a_1 < a_2 < ⋯ < a_k. Since (a_2, b_2) is not contained in (a_1, b_1),
since a_j > a_2 for all j > 2, and since the intervals must contain every point of [a, b], we find that
a_2 < b_1, or in other words a_1 < a_2 < b_1 < b_2. Similarly a_j < a_{j+1} < b_j < b_{j+1}. Furthermore,
a_1 < a and b_k > b. Thus,

m*([a, b]) ≥ ∑_{j=1}^k (b_j − a_j) ≥ ∑_{j=1}^{k−1} (a_{j+1} − a_j) + (b_k − a_k) = b_k − a_1 > b − a.

Proposition 10.3.7. Suppose E ⊂ Rⁿ is a compact set of measure zero. Then for every ε > 0, there
exist finitely many open rectangles R_1, R_2, …, R_k such that

E ⊂ R_1 ∪ R_2 ∪ ⋯ ∪ R_k and ∑_{j=1}^k V(R_j) < ε.

Also for every δ > 0, there exist finitely many open balls B_1, B_2, …, B_k of radii r_1, r_2, …, r_k < δ such
that

E ⊂ B_1 ∪ B_2 ∪ ⋯ ∪ B_k and ∑_{j=1}^k r_j^n < ε.

Proof. Find a sequence of open rectangles {R_j} such that

E ⊂ ⋃_{j=1}^∞ R_j and ∑_{j=1}^∞ V(R_j) < ε.

By compactness, finitely many of these rectangles still cover E. That is, there is
some k such that E ⊂ R_1 ∪ R_2 ∪ ⋯ ∪ R_k. Hence

∑_{j=1}^k V(R_j) ≤ ∑_{j=1}^∞ V(R_j) < ε.

The proof that we can choose balls instead of rectangles is left as an exercise.

Example 10.3.8: So that the reader is not under the impression that there are only very few measure
zero sets and that these are simple, let us give an uncountable, compact, measure zero subset of
[0, 1]. For any x ∈ [0, 1] write the representation in ternary notation

x = ∑_{n=1}^∞ d_n 3^{−n}, where d_n = 0, 1, or 2.

See §1.5 in volume I, in particular Exercise 1.5.4. Define the Cantor set C as

C := {x ∈ [0, 1] : x = ∑_{n=1}^∞ d_n 3^{−n}, where d_n = 0 or d_n = 2 for all n}.

That is, x is in C if it has a ternary expansion in only 0's and 2's. If x has two expansions, as long as
one of them does not have any 1's, then x is in C. Define C_0 := [0, 1] and

C_k := {x ∈ [0, 1] : x = ∑_{n=1}^∞ d_n 3^{−n}, where d_n = 0 or d_n = 2 for all n = 1, 2, …, k}.

Clearly,

C = ⋂_{k=1}^∞ C_k.

See Figure 10.5.
We leave as an exercise to prove that:
(i) Each C_k is a finite union of closed intervals. It is obtained by taking C_{k−1}, and from each
closed interval removing the "middle third".
(ii) Therefore, each C_k is closed, and so C is closed.
(iii) Furthermore, m*(C_k) = 1 − ∑_{n=1}^k 2^{n−1}/3^n.
(iv) Hence, m*(C) = 0.
(v) The set C is in one-to-one correspondence with [0, 1], in other words, it is uncountable.

Figure 10.5: Cantor set construction (the sets C_0 through C_4).
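The claims (iii) and (iv) above can be checked by carrying out the middle-thirds construction explicitly. A small sketch in exact rational arithmetic (the helper names are ours, not the book's):

```python
from fractions import Fraction

def cantor_level(k):
    """Return the closed intervals making up C_k: start from [0, 1] and
    k times remove the open middle third of every remaining interval."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        nxt = []
        for a, b in intervals:
            third = (b - a) / 3
            nxt.append((a, a + third))     # keep the left third
            nxt.append((b - third, b))     # keep the right third
        intervals = nxt
    return intervals

def measure(intervals):
    return sum(b - a for a, b in intervals)

C6 = cantor_level(6)
# C_6 is 2^6 = 64 intervals of length 3^(-6), so
# m*(C_6) = (2/3)^6 = 1 - sum_{n=1}^{6} 2^(n-1)/3^n, and (2/3)^k -> 0.
```

Since m*(C) ≤ m*(C_k) = (2/3)^k for every k, the outer measure of C is forced to be zero.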

10.3.3 Images of null sets


Before we look at images of measure zero sets, let us see what a continuously differentiable function
does to a ball.

Lemma 10.3.9. Suppose U ⊂ Rⁿ is an open set, B ⊂ U is an open (resp. closed) ball of radius
at most r, f : B → Rⁿ is continuously differentiable, and suppose ‖f′(x)‖ ≤ M for all x ∈ B. Then
f(B) ⊂ B′, where B′ is an open (resp. closed) ball of radius at most Mr.

Proof. Without loss of generality assume B is a closed ball. The ball B is convex, and hence, by the
mean value estimate for continuously differentiable mappings on convex sets, ‖f(x) − f(y)‖ ≤ M‖x − y‖
for all x, y ∈ B. In particular, if B = C(y, r), then f(B) ⊂ C(f(y), Mr).
The image of a measure zero set under a continuous map is not necessarily a measure zero
set, although this is not easy to show (see the exercises). However, if the mapping is continuously
differentiable, then the mapping cannot "stretch" the set that much.

Proposition 10.3.10. Suppose U ⊂ Rn is an open set and f : U → Rn is a continuously differentiable


mapping. If E ⊂ U is a measure zero set, then f (E) is measure zero.

Proof. We leave the proof for a general measure zero set as an exercise, and we now prove the
proposition for a compact measure zero set.
Suppose E is compact. First let us replace U by a smaller open set to make ‖f′(x)‖ bounded. At
each point x ∈ E pick an open ball B(x, r_x) such that the closed ball C(x, r_x) ⊂ U. By compactness
we only need to take finitely many points x_1, x_2, …, x_q to still cover E. Define

U′ := ⋃_{j=1}^q B(x_j, r_{x_j}), K := ⋃_{j=1}^q C(x_j, r_{x_j}).

We have E ⊂ U′ ⊂ K ⊂ U. The set K is compact. The function that takes x to ‖f′(x)‖ is continuous,
and therefore there exists an M > 0 such that ‖f′(x)‖ ≤ M for all x ∈ K. So without loss of generality
we may replace U by U′ and from now on suppose that ‖f′(x)‖ ≤ M for all x ∈ U.
At each point x ∈ E pick a ball B(x, δ_x) of maximum radius so that B(x, δ_x) ⊂ U. Let δ :=
inf_{x∈E} δ_x. Take a sequence {x_j} ⊂ E so that δ_{x_j} → δ. As E is compact, we can pick the sequence to
be convergent to some y ∈ E. Once ‖x_j − y‖ < δ_y/2, then δ_{x_j} > δ_y/2 by the triangle inequality. Therefore
δ > 0.
Given ε > 0, there exist balls B_1, B_2, …, B_k of radii r_1, r_2, …, r_k < δ such that

E ⊂ B_1 ∪ B_2 ∪ ⋯ ∪ B_k and ∑_{j=1}^k r_j^n < ε.

The balls are contained in U. Suppose B′_1, B′_2, …, B′_k are the balls of radii Mr_1, Mr_2, …, Mr_k
from Lemma 10.3.9, such that f(B_j) ⊂ B′_j for all j. Then

f(E) ⊂ f(B_1) ∪ f(B_2) ∪ ⋯ ∪ f(B_k) ⊂ B′_1 ∪ B′_2 ∪ ⋯ ∪ B′_k and ∑_{j=1}^k (Mr_j)^n < M^n ε.

10.3.4 Exercises
Exercise 10.3.1: Finish the proof of Proposition 10.3.7, that is, show that you can use balls instead of
rectangles.

Exercise 10.3.2: If A ⊂ B, then m∗ (A) ≤ m∗ (B).

Exercise 10.3.3: Suppose X ⊂ Rn is a set such that for every ε > 0 there exists a set Y such that X ⊂ Y and
m∗ (Y ) ≤ ε. Prove that X is a measure zero set.

Exercise 10.3.4: Show that if R ⊂ Rn is a closed rectangle, then m∗ (R) = V (R).

Exercise 10.3.5: The closure of a measure zero set can be quite large. Find an example set S ⊂ Rn that is of
measure zero, but whose closure S = Rn .

Exercise 10.3.6: Prove the general case of Proposition 10.3.10 without using compactness:


a) Mimic the proof to first prove that the proposition holds if E is relatively compact; a set E ⊂ U is relatively
compact if the closure of E in the subspace topology on U is compact, or in other words if there exists a
compact set K with K ⊂ U and E ⊂ K.
Hint: The bound on the size of the derivative still holds, but you need to use countably many balls in the
second part of the proof. Be careful as the closure of E need no longer be measure zero.
b) Now prove it for any null set E.
Hint: First show that {x ∈ U : d(x, y) ≥ 1/m for all y ∉ U and d(0, x) ≤ m} is a compact set for any m > 0.

Exercise 10.3.7: Let U ⊂ Rⁿ be an open set and let f : U → R be a continuously differentiable function. Let
G := {(x, y) ∈ U × R : y = f(x)} be the graph of f. Show that G is of measure zero.

Exercise 10.3.8: Given a closed rectangle R ⊂ Rn , show that for any ε > 0 there exists a number s > 0 and
finitely many open cubes C1 ,C2 , . . . ,Ck of side s such that R ⊂ C1 ∪C2 ∪ · · · ∪Ck and

∑_{j=1}^k V(C_j) ≤ V(R) + ε.

Exercise 10.3.9: Show that there exists a number k = k(n, r, δ) depending only on n, r, and δ such that the
following holds. Given B(x, r) ⊂ Rⁿ and δ > 0, there exist k open balls B_1, B_2, …, B_k of radius at most δ
such that B(x, r) ⊂ B_1 ∪ B_2 ∪ ⋯ ∪ B_k. Note that you can find k that really only depends on n and the ratio δ/r.

Exercise 10.3.10 (Challenging): Prove the statements of Example 10.3.8. That is, prove:

a) Each C_k is a finite union of closed intervals, and so C is closed.
b) m*(C_k) = 1 − ∑_{n=1}^k 2^{n−1}/3^n.
c) m*(C) = 0.
d) The set C is in one-to-one correspondence with [0, 1].

Exercise 10.3.11: Prove that the Cantor set of Example 10.3.8 contains no interval. That is, whenever a < b,
there exists a point x ∉ C such that a < x < b.
Note the consequence of this statement. While we proved that an open set in R is a countable disjoint union of
intervals, a closed set (even though it is just the complement of an open set) need not be a union of intervals.

Exercise 10.3.12 (Challenging): Let us construct the so-called Cantor function or the Devil's staircase. Let C
be the Cantor set and let C_k be as in Example 10.3.8. Write x ∈ [0, 1] in ternary representation x = ∑_{n=1}^∞ d_n 3^{−n}.
If d_n ≠ 1 for all n, then let c_n := d_n/2 for all n. Otherwise, let k be the smallest integer such that d_k = 1. Then
let c_n := d_n/2 if n < k, c_k := 1, and c_n := 0 if n > k. Then define

ϕ(x) := ∑_{n=1}^∞ c_n 2^{−n}.

a) Prove that ϕ is continuous and increasing.
b) Prove that for x ∉ C, ϕ is differentiable at x and ϕ′(x) = 0. (Notice that ϕ′ exists and is zero except on a
set of measure zero, yet the function manages to climb from 0 to 1.)
c) Define ψ : [0, 1] → [0, 2] by ψ(x) := ϕ(x) + x. Show that ψ is continuous, strictly increasing, and in fact
bijective.
d) Prove that while m*(C) = 0, m*(ψ(C)) ≠ 0. That is, continuous functions need not take measure zero sets to
measure zero sets. Hint: m*(ψ([0, 1] \ C)) = 1, but m*([0, 2]) = 2.

Figure 10.6: Cantor function or Devil's staircase (the function ϕ from the exercise).

Exercise 10.3.13: Prove that we obtain the same outer measure if we allow both finite and infinite sequences
in the definition. That is, define µ*(S) := inf ∑_{j∈I} V(R_j), where the infimum is taken over all countable (finite
or infinite) sets of open rectangles {R_j}_{j∈I} such that S ⊂ ⋃_{j∈I} R_j. Prove that for every S ⊂ Rⁿ, µ*(S) = m*(S).
10.4. THE SET OF RIEMANN INTEGRABLE FUNCTIONS 109

10.4 The set of Riemann integrable functions


Note: 1 lecture

10.4.1 Oscillation and continuity


Let S ⊂ Rⁿ be a set and f : S → R a function. Instead of just saying that f is or is not continuous at
a point x ∈ S, we want to quantify how discontinuous f is at x. For any δ > 0, define the oscillation
of f on the δ-ball in the subspace topology, that is, on B_S(x, δ) = B_{Rⁿ}(x, δ) ∩ S, as

o(f, x, δ) := sup_{y∈B_S(x,δ)} f(y) − inf_{y∈B_S(x,δ)} f(y) = sup_{y₁,y₂∈B_S(x,δ)} (f(y₁) − f(y₂)).

That is, o(f, x, δ) is the length of the smallest interval that contains the image f(B_S(x, δ)). Clearly
o(f, x, δ) ≥ 0, and notice o(f, x, δ) ≤ o(f, x, δ′) whenever δ < δ′. Therefore, the limit as δ → 0⁺
exists and we define the oscillation of the function f at x as

o(f, x) := lim_{δ→0⁺} o(f, x, δ) = inf_{δ>0} o(f, x, δ).
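Numerically one can only approximate o(f, x, δ), since the supremum and infimum run over uncountably many points of the ball; still, a sampling sketch makes the definition concrete (the helper below is our own, written in Python purely as illustration). For a jump function the approximate oscillation at the jump stays 1 no matter how small δ is, while at a point of continuity it is 0 once δ is small.

```python
def oscillation(f, x, delta, samples=10001):
    """Approximate o(f, x, delta) = sup f - inf f over the interval
    (x - delta, x + delta) by sampling a fine uniform grid."""
    pts = (x - delta + 2 * delta * i / (samples - 1) for i in range(samples))
    vals = [f(p) for p in pts]
    return max(vals) - min(vals)

def step(t):
    # a jump discontinuity at 0
    return 1.0 if t >= 0 else 0.0

osc_at_jump = [oscillation(step, 0.0, d) for d in (1.0, 0.1, 0.001)]
osc_away = [oscillation(step, 1.0, d) for d in (0.5, 0.1, 0.001)]
# osc_at_jump is [1.0, 1.0, 1.0]: the oscillation does not shrink with
# delta, so o(step, 0) = 1 > 0 and step is discontinuous at 0
# (Proposition 10.4.1).  osc_away is [0.0, 0.0, 0.0], so o(step, 1) = 0.
```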

Proposition 10.4.1. f : S → R is continuous at x ∈ S if and only if o( f , x) = 0.


Proof. First suppose that f is continuous at x ∈ S. Given any ε > 0, there exists a δ > 0 such
that for y ∈ B_S(x, δ) we have |f(x) − f(y)| < ε. Therefore, if y₁, y₂ ∈ B_S(x, δ), then

f(y₁) − f(y₂) = (f(y₁) − f(x)) − (f(y₂) − f(x)) < ε + ε = 2ε.

We take the supremum over y₁ and y₂:

o(f, x, δ) = sup_{y₁,y₂∈B_S(x,δ)} (f(y₁) − f(y₂)) ≤ 2ε.

Hence, o(f, x) = 0.
On the other hand, suppose that o(f, x) = 0. Given any ε > 0, find a δ > 0 such that o(f, x, δ) < ε.
If y ∈ B_S(x, δ), then

|f(x) − f(y)| ≤ sup_{y₁,y₂∈B_S(x,δ)} (f(y₁) − f(y₂)) = o(f, x, δ) < ε.

Proposition 10.4.2. Let S ⊂ Rⁿ be closed, f : S → R, and ε > 0. The set {x ∈ S : o(f, x) ≥ ε} is
closed.

Proof. Equivalently, we want to show that G = {x ∈ S : o(f, x) < ε} is open in the subspace topology.
Take x ∈ G. As inf_{δ>0} o(f, x, δ) < ε, find a δ > 0 such that

o(f, x, δ) < ε.

Take any ξ ∈ B_S(x, δ/2). Notice that B_S(ξ, δ/2) ⊂ B_S(x, δ). Therefore,

o(f, ξ, δ/2) = sup_{y₁,y₂∈B_S(ξ,δ/2)} (f(y₁) − f(y₂)) ≤ sup_{y₁,y₂∈B_S(x,δ)} (f(y₁) − f(y₂)) = o(f, x, δ) < ε.

So o(f, ξ) < ε as well. As this is true for all ξ ∈ B_S(x, δ/2), we get that G is open in the subspace
topology and S \ G is closed as claimed.

10.4.2 The set of Riemann integrable functions


We have seen that continuous functions are Riemann integrable, but we also know that certain kinds
of discontinuities are allowed. It turns out that as long as the discontinuities happen on a set of
measure zero, the function is integrable and vice versa.
Theorem 10.4.3 (Riemann–Lebesgue). Let R ⊂ Rn be a closed rectangle and f : R → R a bounded
function. Then f is Riemann integrable if and only if the set of discontinuities of f is of measure
zero (a null set).
Proof. Let S ⊂ R be the set of discontinuities of f . That is S = {x ∈ R : o( f , x) > 0}. Suppose that
m∗ (S) = 0, that is, S is a measure zero set. The trick to this proof is to isolate the bad set into a
small set of subrectangles of a partition. There are only finitely many subrectangles of a partition,
so we will wish to use compactness. If S were closed, it would be compact and we could cover it
by small rectangles as it is of measure zero. Unfortunately, in general S is not closed, so we need to
work a little harder.
For every ε > 0, define

S_ε := {x ∈ R : o(f, x) ≥ ε}.

By Proposition 10.4.2, S_ε is closed, and as it is a subset of R, which is bounded, S_ε is compact.
Furthermore, S_ε ⊂ S and S is of measure zero. Via Proposition 10.3.7, there are finitely many open
rectangles O_1, O_2, …, O_k that cover S_ε and ∑ V(O_j) < ε.
The set T = R \ (O1 ∪ · · · ∪ Ok ) is closed, bounded, and therefore compact. Furthermore for
x ∈ T , we have o( f , x) < ε . Hence for each x ∈ T , there exists a small closed rectangle Tx with x in
the interior of Tx , such that
sup_{y∈T_x} f(y) − inf_{y∈T_x} f(y) < 2ε.

The interiors of the rectangles Tx cover T . As T is compact there exist finitely many such rectangles
T1 , T2 , . . . , Tm that cover T .
Take the rectangles T_1, T_2, …, T_m and O_1, O_2, …, O_k and construct a partition out of their
endpoints. That is, construct a partition P of R with subrectangles R_1, R_2, …, R_p such that every R_j is
contained in T_ℓ for some ℓ or in the closure of O_ℓ for some ℓ. Order the rectangles so that R_1, R_2, …, R_q
are those that are contained in some T_ℓ, and R_{q+1}, R_{q+2}, …, R_p are the rest. In particular,
∑_{j=1}^q V(R_j) ≤ V(R) and ∑_{j=q+1}^p V(R_j) ≤ ε.

Let m j and M j be the inf and sup of f over R j as before. If R j ⊂ Tℓ for some ℓ, then (M j − m j ) < 2ε .
Let B ∈ R be such that | f (x)| ≤ B for all x ∈ R, so (M j − m j ) < 2B over all rectangles. Then
U(P, f) − L(P, f) = ∑_{j=1}^p (M_j − m_j)V(R_j)
= ∑_{j=1}^q (M_j − m_j)V(R_j) + ∑_{j=q+1}^p (M_j − m_j)V(R_j)
≤ ∑_{j=1}^q 2ε V(R_j) + ∑_{j=q+1}^p 2B V(R_j)
≤ 2ε V(R) + 2Bε = ε (2V(R) + 2B).

We can make the right hand side as small as we want, and hence f is integrable.
For the other direction, suppose f is Riemann integrable over R. Let S be the set of discontinuities
again and now let
S_k := {x ∈ R : o(f, x) ≥ 1/k}.

Fix a k ∈ N. Given an ε > 0, find a partition P with subrectangles R_1, R_2, …, R_p such that

U(P, f) − L(P, f) = ∑_{j=1}^p (M_j − m_j)V(R_j) < ε.

Suppose R_1, R_2, …, R_p are ordered so that the interiors of R_1, R_2, …, R_q intersect S_k, while the
interiors of R_{q+1}, R_{q+2}, …, R_p are disjoint from S_k. If x ∈ R_j ∩ S_k and x is in the interior of R_j, so
that sufficiently small balls around x are completely inside R_j, then by definition of S_k we have M_j − m_j ≥ 1/k.
Then

ε > ∑_{j=1}^p (M_j − m_j)V(R_j) ≥ ∑_{j=1}^q (M_j − m_j)V(R_j) ≥ (1/k) ∑_{j=1}^q V(R_j).
In other words, ∑_{j=1}^q V(R_j) < kε. Let G be the set of all boundaries of all the subrectangles of P.
The set G is of measure zero (see Example 10.3.5). Let R_j° denote the interior of R_j; then

S_k ⊂ R_1° ∪ R_2° ∪ ⋯ ∪ R_q° ∪ G.

As G can be covered by open rectangles of arbitrarily small total volume, S_k must be of measure zero. As

S = ⋃_{k=1}^∞ S_k,

and a countable union of measure zero sets is of measure zero, S is of measure zero.

Corollary 10.4.4. Let R ⊂ Rn be a closed rectangle. Let R(R) be the set of Riemann integrable
functions on R. Then
(i) R(R) is a real algebra: if f , g ∈ R(R) and a ∈ R, then a f ∈ R(R), f + g ∈ R(R) and
f g ∈ R(R).
(ii) If f , g ∈ R(R) and

ϕ (x) := max{ f (x), g(x)}, ψ (x) := min{ f (x), g(x)},

then ϕ , ψ ∈ R(R).
(iii) If f ∈ R(R), then | f | ∈ R(R), where | f |(x) := | f (x)|.
(iv) If R′ ⊂ Rm is another closed rectangle, U ⊂ Rn and U ′ ⊂ Rm are open sets such that R ⊂ U and
R′ ⊂ U ′ , g : U → U ′ is continuously differentiable, bijective, g−1 is continuously differentiable,
and f ∈ R(R′ ), then the composition f ◦ g is Riemann integrable on R.

The proof is contained in the exercises.
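To see the Riemann–Lebesgue theorem in action in one dimension, take the popcorn function of Exercise 10.4.11: it is discontinuous exactly at the rationals, a measure zero set, so the theorem says it is Riemann integrable. The sketch below (our own illustrative code, not part of the text; floating point may perturb borderline denominators slightly) computes upper Darboux sums over uniform partitions of [0, 1] and watches them shrink toward 0, which is also the common value of all lower sums.

```python
from math import ceil, floor

def popcorn_upper_sum(N):
    """Upper Darboux sum of the popcorn function over the uniform partition
    of [0, 1] into N subintervals.  On [a, b] the supremum of f is 1/q for
    the smallest denominator q such that some p/q lies in [a, b]."""
    total = 0.0
    for i in range(N):
        a, b = i / N, (i + 1) / N
        q = 1
        while floor(b * q) < ceil(a * q):   # no integer p with a <= p/q <= b
            q += 1
        total += (1.0 / q) * (1.0 / N)
    return total

sums = [popcorn_upper_sum(N) for N in (10, 100, 1000)]
# Every lower sum is 0 (each subinterval contains irrationals), while the
# upper sums decrease as the partition refines, so U - L -> 0: the popcorn
# function is integrable despite being discontinuous at every rational.
```

Refining the partition (each N divides the next) can only decrease the upper sum, which is exactly the mechanism the proof of Theorem 10.4.3 exploits.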



10.4.3 Exercises
Exercise 10.4.1: Suppose f : (a, b) × (c, d) → R is a bounded continuous function. Show that the integral of
f over R = [a, b] × [c, d] makes sense and is uniquely defined. That is, set f to be anything on the boundary
of R and compute the integral.
Exercise 10.4.2: Suppose R ⊂ Rn is a closed rectangle. Show that R(R), the set of Riemann integrable
functions, is an algebra. That is, show that if f , g ∈ R(R) and a ∈ R, then a f ∈ R(R), f + g ∈ R(R) and
f g ∈ R(R).
Exercise 10.4.3: Suppose R ⊂ Rⁿ is a closed rectangle and f : R → R is a bounded function which is zero
except on a closed set E ⊂ R of measure zero. Show that ∫_R f exists and compute it.
Exercise 10.4.4: Suppose R ⊂ Rⁿ is a closed rectangle and f : R → R and g : R → R are two Riemann
integrable functions. Suppose f = g except for a closed set E ⊂ R of measure zero. Show that ∫_R f = ∫_R g.
Exercise 10.4.5: Suppose R ⊂ Rn is a closed rectangle and f : R → R is a bounded function.
a) Suppose there exists a closed set E ⊂ R of measure zero such that f |R\E is continuous. Then f ∈ R(R).
b) Find an example where E ⊂ R is a set of measure zero (but not closed) such that f |R\E is continuous and
f 6∈ R(R).
Exercise 10.4.6: Suppose R ⊂ Rn is a closed rectangle and f : R → R and g : R → R are Riemann integrable
functions. Show that
ϕ(x) := max{ f (x), g(x)}, ψ(x) := min{ f (x), g(x)},
are Riemann integrable.
Exercise 10.4.7: Suppose R ⊂ Rn is a closed rectangle and f : R → R a Riemann integrable function. Show
that | f | is Riemann integrable. Hint: Define f+ (x) := max{ f (x), 0} and f− (x) := max{− f (x), 0}, and then
write | f | in terms of f+ and f− .
Exercise 10.4.8:
a) Suppose R ⊂ Rn and R′ ⊂ Rm are closed rectangles, U ⊂ Rn and U ′ ⊂ Rm are open sets such that R ⊂ U
and R′ ⊂ U ′ , g : U → U ′ is continuously differentiable, bijective, g−1 is continuously differentiable, and
f ∈ R(R′ ), then the composition f ◦ g is Riemann integrable on R.
b) Find a counterexample when g is not one-to-one. Hint: Try g(x, y) := (x, 0) and R = R′ = [0, 1] × [0, 1].
Exercise 10.4.9: Suppose f : [0, 1]² → R is defined by

f(x, y) := 1/(kq) if x, y ∈ Q with x = ℓ/k and y = p/q in lowest terms, and f(x, y) := 0 otherwise.

Show that f ∈ R([0, 1]²).

Exercise 10.4.10: Compute the oscillation o(f, (x, y)) for all (x, y) ∈ R² for the function

f(x, y) := xy/(x² + y²) if (x, y) ≠ (0, 0), and f(x, y) := 0 otherwise.
Exercise 10.4.11: For the popcorn function f : [0, 1] → R defined by

f(x) := 1/q if x ∈ Q with x = p/q in lowest terms, and f(x) := 0 otherwise,

compute o(f, x) for all x ∈ [0, 1].
10.5. JORDAN MEASURABLE SETS 113

10.5 Jordan measurable sets


Note: 1 lecture

10.5.1 Volume and Jordan measurable sets


Given a bounded set S ⊂ Rⁿ, its characteristic function or indicator function χ_S : Rⁿ → R is defined
by

χ_S(x) := 1 if x ∈ S, and χ_S(x) := 0 if x ∉ S.

A bounded set S is Jordan measurable if for some closed rectangle R such that S ⊂ R, the function
χ_S is in R(R). Take two closed rectangles R and R′ with S ⊂ R and S ⊂ R′; then R ∩ R′ is a closed
rectangle also containing S. By the Riemann–Lebesgue theorem (Theorem 10.4.3), χ_S ∈ R(R ∩ R′) and so
χ_S ∈ R(R′). Thus

∫_R χ_S = ∫_{R′} χ_S = ∫_{R∩R′} χ_S.
We define the n-dimensional volume of the bounded Jordan measurable set S as

V(S) := ∫_R χ_S,

where R is any closed rectangle containing S.
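The definition of volume can be made concrete by computing Darboux sums of χ_S on a uniform grid. A minimal sketch for the closed unit disc (our own code, Python used purely as illustration; the cell tests are elementary geometry, not part of the text): cells meeting S contribute to the upper sum, cells contained in S to the lower sum, and the two squeeze V(S) = π as the grid refines.

```python
from math import pi

def disc_sums(N):
    """Upper and lower Darboux sums of chi_S, S the closed unit disc,
    over the uniform N x N partition of the rectangle [-1, 1]^2."""
    h = 2.0 / N
    upper = lower = 0.0
    for i in range(N):
        for j in range(N):
            x0, x1 = -1 + i * h, -1 + (i + 1) * h
            y0, y1 = -1 + j * h, -1 + (j + 1) * h
            # point of the cell nearest to the origin (clamp 0 into the cell)
            nx, ny = min(max(x0, 0.0), x1), min(max(y0, 0.0), y1)
            # corner of the cell farthest from the origin
            fx, fy = max(abs(x0), abs(x1)), max(abs(y0), abs(y1))
            if nx * nx + ny * ny <= 1.0:    # cell meets S, so sup chi_S = 1
                upper += h * h
            if fx * fx + fy * fy <= 1.0:    # cell inside S, so inf chi_S = 1
                lower += h * h
    return lower, upper

lo, hi = disc_sums(200)
# lo <= V(S) = pi <= hi, and the gap hi - lo shrinks as N grows, since
# only cells meeting the boundary circle (a null set) separate the sums.
```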

Proposition 10.5.1. A bounded set S ⊂ Rn is Jordan measurable if and only if the boundary ∂ S is
a measure zero set.

Proof. Suppose R is a closed rectangle such that S is contained in the interior of R. If x ∈ ∂S, then
for every δ > 0, the sets S ∩ B(x, δ) (where χ_S is 1) and (R \ S) ∩ B(x, δ) (where χ_S is 0) are
both nonempty. So χ_S is not continuous at x. If x is either in the interior of S or in the complement
of the closure of S, then χ_S is either identically 1 or identically 0 in a whole neighborhood of x and
hence χ_S is continuous at x. Therefore, the set of discontinuities of χ_S is precisely the boundary ∂S.
The proposition then follows.

Proposition 10.5.2. Suppose S and T are bounded Jordan measurable sets. Then
(i) The closure S is Jordan measurable.
(ii) The interior S◦ is Jordan measurable.
(iii) S ∪ T is Jordan measurable.
(iv) S ∩ T is Jordan measurable.
(v) S \ T is Jordan measurable.

The proof of the proposition is left as an exercise. Next, we find that the volume that we defined
above coincides with the outer measure we defined above.

Named after the French mathematician Camille Jordan (1838–1922).

Proposition 10.5.3. If S ⊂ Rn is Jordan measurable, then V (S) = m∗ (S).

Proof. Given ε > 0, let R be a closed rectangle that contains S. Let P be a partition of R such that

U(P, χ_S) ≤ ∫_R χ_S + ε = V(S) + ε and L(P, χ_S) ≥ ∫_R χ_S − ε = V(S) − ε.

Let R_1, R_2, …, R_k be all the subrectangles of P such that χ_S is not identically zero on each R_j. That
is, there is some point x ∈ R_j such that x ∈ S. Let O_j be an open rectangle such that R_j ⊂ O_j and
V(O_j) < V(R_j) + ε/k. Notice that S ⊂ ⋃_j O_j. Then

U(P, χ_S) = ∑_{j=1}^k V(R_j) > (∑_{j=1}^k V(O_j)) − ε ≥ m*(S) − ε.

As U(P, χ_S) ≤ V(S) + ε, we get m*(S) − ε ≤ V(S) + ε, that is, m*(S) ≤ V(S) + 2ε. As ε > 0 was
arbitrary, m*(S) ≤ V(S).
Let R′_1, R′_2, …, R′_ℓ be all the subrectangles of P such that χ_S is identically one on each R′_j. In
other words, these are the subrectangles contained in S. The interiors R′_j° of these subrectangles are
disjoint and V(R′_j°) = V(R′_j). It is easy to see from the definition that

m*(⋃_{j=1}^ℓ R′_j°) = ∑_{j=1}^ℓ V(R′_j°).

Hence

m*(S) ≥ m*(⋃_{j=1}^ℓ R′_j) ≥ m*(⋃_{j=1}^ℓ R′_j°) = ∑_{j=1}^ℓ V(R′_j°) = ∑_{j=1}^ℓ V(R′_j) = L(P, χ_S) ≥ V(S) − ε.

As ε > 0 was arbitrary, m*(S) ≥ V(S) as well.

10.5.2 Integration over Jordan measurable sets


In R there is only one reasonable type of set to integrate over: an interval. In Rn there are many
common types of sets we want to integrate over and these are not described so easily.

Definition 10.5.4. Let S ⊂ Rⁿ be a bounded Jordan measurable set. A bounded function f : S → R
is said to be Riemann integrable on S, or f ∈ R(S), if for a closed rectangle R such that S ⊂ R, the
function f̃ : R → R defined by

f̃(x) := f(x) if x ∈ S, and f̃(x) := 0 otherwise,

is in R(R). In this case we write

∫_S f := ∫_R f̃.

When f is defined on a larger set and we wish to integrate over S, then we apply the definition
to the restriction f |S . In particular, if f : R → R for a closed rectangle R, and S ⊂ R is a Jordan
measurable subset, then

∫_S f = ∫_R f χ_S.
Proposition 10.5.5. If S ⊂ Rⁿ is a bounded Jordan measurable set and f : S → R is a bounded
continuous function, then f is integrable on S.
Proof. Define the function f̃ as above for some closed rectangle R with S ⊂ R. If x ∈ R is outside the
closure of S, then f̃ is identically zero in a neighborhood of x. Similarly, if x is in the interior of S, then f̃ = f on a
neighborhood of x and f is continuous at x. Therefore, f̃ is only ever possibly discontinuous at ∂S,
which is a set of measure zero, and we are finished.

10.5.3 Images of Jordan measurable subsets


Finally, images of Jordan measurable sets are Jordan measurable under nice enough mappings. For
simplicity, let us assume that the Jacobian never vanishes.
Proposition 10.5.6. Suppose S ⊂ Rn is a closed bounded Jordan measurable set, and S ⊂ U for an
open set U ⊂ Rn . Suppose g : U → Rn is a one-to-one continuously differentiable mapping such
that Jg is never zero on S. Then g(S) is bounded and Jordan measurable.
Proof. Let T := g(S). As S ⊂ Rⁿ is closed and bounded, it is compact. By Lemma 7.5.5 from
volume I, the set T is also compact and so closed and bounded. We claim ∂T ⊂ g(∂S). Suppose
the claim is proved. As S is Jordan measurable, ∂S is measure zero. Then g(∂S) is measure
zero by Proposition 10.3.10. As ∂T ⊂ g(∂S), then T is Jordan measurable.
It is therefore left to prove the claim. As T is closed, ∂T ⊂ T. Suppose y ∈ ∂T; then there must
exist an x ∈ S such that g(x) = y, and by hypothesis J_g(x) ≠ 0. We use the inverse function theorem.
We find a neighborhood V ⊂ U of x and an open set W such that the restriction
g|_V is a one-to-one and onto function from V to W with a continuously differentiable inverse. In
particular, g(x) = y ∈ W. As y ∈ ∂T, there exists a sequence {y_k} in W with lim y_k = y and y_k ∉ T.
As g|_V is invertible and in particular has a continuous inverse, there exists a sequence {x_k} in V
such that g(x_k) = y_k and lim x_k = x. Since y_k ∉ T = g(S), clearly x_k ∉ S. Since x ∈ S, we conclude
that x ∈ ∂S. The claim is proved, ∂T ⊂ g(∂S).

10.5.4 Exercises
Exercise 10.5.1: Prove Proposition 10.5.2.
Exercise 10.5.2: Prove that a bounded convex set is Jordan measurable. Hint: Induction on dimension.
Exercise 10.5.3: Let f : [a, b] → R and g : [a, b] → R be continuous functions such that f(x) < g(x) for all
x ∈ (a, b). Let

U := {(x, y) ∈ R² : a < x < b and f(x) < y < g(x)}.

a) Show that U is Jordan measurable.
b) If f : U → R is Riemann integrable on U, then

∫_U f = ∫_a^b ∫_{f(x)}^{g(x)} f(x, y) dy dx.

Exercise 10.5.4: Let us construct an example of a non-Jordan-measurable open set. For simplicity we work
first in one dimension. Let {r_j} be an enumeration of all rational numbers in (0, 1). Let (a_j, b_j) be open
intervals such that (a_j, b_j) ⊂ (0, 1) for all j, r_j ∈ (a_j, b_j), and ∑_{j=1}^∞ (b_j − a_j) < 1/2. Now let
U := ⋃_{j=1}^∞ (a_j, b_j). Show that

a) The open intervals (a_j, b_j) as above actually exist.
b) ∂U = [0, 1] \ U.
c) ∂U is not of measure zero, and therefore U is not Jordan measurable.
d) Show that W := ((0, 1) × (0, 2)) \ (U × [0, 1]) ⊂ R² is a connected bounded open set in R² that is not
Jordan measurable.

Exercise 10.5.5: Suppose K ⊂ Rn is a closed measure zero set.


a) If K is bounded, prove that K is Jordan measurable.
b) If S ⊂ Rn is bounded and Jordan measurable, prove that S \ K is Jordan measurable.
c) Find an example where S ⊂ Rn is bounded and Jordan measurable, T ⊂ Rn is bounded and of measure
zero, and neither T nor S \ T is Jordan measurable.

Exercise 10.5.6: Suppose U ⊂ Rn is open and K ⊂ U is compact. Find a compact Jordan measurable set S
such that S ⊂ U and K ⊂ S◦ (K is in the interior of S).

Exercise 10.5.7: Prove the Riemann–Lebesgue theorem (Theorem 10.4.3) replacing all closed rectangles with
bounded Jordan measurable sets.
10.6. GREEN’S THEOREM 117

10.6 Green’s theorem


Note: 1 lecture, requires the previous section
One of the most important theorems of analysis in several variables is the so-called generalized
Stokes' theorem, a generalization of the fundamental theorem of calculus. Perhaps the most often
used version is the version in two dimensions, called Green's theorem, which we prove here.

Definition 10.6.1. Let U ⊂ R² be a bounded connected open set. Suppose the boundary ∂U is a
finite union of (the images of) simple piecewise smooth paths such that near each point p ∈ ∂U,
every neighborhood V of p contains points of the complement of the closure of U. Then U is called a
bounded domain with piecewise smooth boundary in R².

The condition about points outside the closure says that locally ∂ U separates R2 into an “inside”
and an “outside”. The condition prevents ∂ U from being just a “cut” inside U. As we travel along
the path in a certain orientation, there is a well-defined left and a right, and either U is on the left
and the complement of U is on the right, or vice-versa. The orientation on U is the direction in
which we travel along the paths. We can switch orientation if needed by reparametrizing the path.

Definition 10.6.2. Let U ⊂ R² be a bounded domain with piecewise smooth boundary, let ∂U
be oriented, and let γ : [a, b] → R² be a parametrization of ∂U giving the orientation. Write
γ(t) = (x(t), y(t)). If the vector n(t) := (−y′(t), x′(t)) points into the domain, that is, if εn(t) + γ(t)
is in U for all small enough ε > 0, then ∂U is positively oriented. See Figure 10.7. Otherwise it is
negatively oriented.

Figure 10.7: Positively oriented domain (left), and a positively oriented domain with a hole (right).

The vector n(t) turns γ ′ (t) counterclockwise by 90◦ , that is to the left. When we travel along
a positively oriented boundary in the direction of its orientation, the domain is “on our left”. For
example, if U is a bounded domain with “no holes”, that is ∂ U is connected, then the positive
orientation means we are travelling counterclockwise around ∂ U. If we do have “holes”, then we
travel around them clockwise.

Proposition 10.6.3. Let U ⊂ R2 be a bounded domain with piecewise smooth boundary, then U is
Jordan measurable.

Named after the British mathematical physicist George Green (1793–1841).

Proof. We must show that ∂U is a null set. As ∂U is a finite union of simple piecewise smooth
paths, which are finite unions of smooth paths, we need only show that a smooth path in R² is a null
set. Let γ : [a, b] → R² be a smooth path. It is enough to show that γ((a, b)) is a null set, as adding
the points γ(a) and γ(b) to a null set still results in a null set. Define

f : (a, b) × (−1, 1) → R², as f(x, y) := γ(x).

The set (a, b) × {0} is a null set in R² and γ((a, b)) = f((a, b) × {0}). By Proposition 10.3.10,
γ((a, b)) is a null set in R², and so γ([a, b]) is a null set, and so finally ∂U is a null set.

Theorem 10.6.4 (Green). Suppose U ⊂ R2 is a bounded domain with piecewise smooth boundary
with the boundary positively oriented. Suppose P and Q are continuously differentiable functions
defined on some open set that contains the closure U. Then
∫_{∂U} P dx + Q dy = ∫_U (∂Q/∂x − ∂P/∂y).

We stated Green’s theorem in general, although we will only prove a special version of it. That
is, we will only prove it for a special kind of domain. The general version follows from the special
case by application of further geometry, and cutting up the general domain into smaller domains on
which to apply the special case. We will not prove the general case.
Let U ⊂ R2 be a domain with piecewise smooth boundary. We say U is of type I if there exist
numbers a < b, and continuous functions f : [a, b] → R and g : [a, b] → R, such that

U := {(x, y) ∈ R2 : a < x < b and f (x) < y < g(x)}.

Similarly, U is of type II if there exist numbers c < d, and continuous functions h : [c, d] → R and
k : [c, d] → R, such that

U := {(x, y) ∈ R2 : c < y < d and h(y) < x < k(y)}.

Finally, U ⊂ R² is of type III if it is both of type I and of type II. See Figure 10.8.

Figure 10.8: Domain types for Green's theorem (type I, type II, type III).

We will only prove Green’s theorem for type III domains.



Proof of Green’s theorem for U of type III. Let f , g, h, k be the functions defined above. By
, U is Jordan measurable and as U is of type I, then
Z   Z b Z f (x)  
∂P ∂P
− = − (x, y) dy dx
U ∂y a g(x) ∂y
Z b  
= −P x, f (x) + P x, g(x) dx
a
Z b  Z b 
= P x, g(x) dx − P x, f (x) dx.
a a
We integrate P dx along the boundary. The one-form P dx integrates to zero along the straight
vertical lines in the boundary. Therefore, it is only integrated along the bottom and along the top.
As a parameter, x runs from left to right. If we use the parametrizations x ↦ (x, f(x)) and
x ↦ (x, g(x)), we recognize the path integrals above. However, the second path integral is in the
wrong direction; the top should be going right to left, and so we must switch orientation.

    ∫_{∂U} P dx = ∫_a^b P(x, f(x)) dx + ∫_b^a P(x, g(x)) dx = ∫_U (−∂P/∂y).
Similarly, U is also of type II. The form Q dy integrates to zero along horizontal lines. So

    ∫_U ∂Q/∂x = ∫_c^d ∫_{h(y)}^{k(y)} (∂Q/∂x)(x, y) dx dy = ∫_c^d ( Q(k(y), y) − Q(h(y), y) ) dy = ∫_{∂U} Q dy.

Putting the two together we obtain

    ∫_{∂U} P dx + Q dy = ∫_{∂U} P dx + ∫_{∂U} Q dy = ∫_U (−∂P/∂y) + ∫_U ∂Q/∂x = ∫_U ( ∂Q/∂x − ∂P/∂y ).
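As a quick numerical sanity check of the theorem (a Python sketch, not part of the formal development), take P = −y and Q = x on the unit disc U. Then ∂Q/∂x − ∂P/∂y = 2, so the right-hand side is twice the area, 2π, and a Riemann sum for the line integral over the positively oriented unit circle should agree:

```python
import math

# Green's theorem check with P = -y, Q = x on the unit disc U:
# RHS: integral over U of (dQ/dx - dP/dy) = integral of 2 = 2 * (area of U) = 2*pi.
# LHS: line integral of P dx + Q dy over the positively oriented unit circle,
# approximated with a Riemann sum over gamma(t) = (cos t, sin t),
# so dx = -sin(t) dt and dy = cos(t) dt.
N = 100_000
lhs = 0.0
for k in range(N):
    t = 2 * math.pi * (k + 0.5) / N
    x, y = math.cos(t), math.sin(t)
    dx, dy = -math.sin(t), math.cos(t)
    lhs += ((-y) * dx + x * dy) * (2 * math.pi / N)

rhs = 2 * math.pi
print(lhs, rhs)
```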
Let us illustrate the usefulness of Green’s theorem on a fundamental result about harmonic
functions.
Example 10.6.5: Suppose U ⊂ R² is an open set and f : U → R is harmonic, that is, f is twice
continuously differentiable and ∂²f/∂x² + ∂²f/∂y² = 0. We will prove one of the most fundamental
properties of harmonic functions.
Let D_r = B(p, r) be a disc such that its closure C(p, r) ⊂ U. Write p = (x₀, y₀). We orient
∂D_r positively. Then

    0 = 1/(2πr) ∫_{D_r} ( ∂²f/∂x² + ∂²f/∂y² )
      = 1/(2πr) ∫_{∂D_r} −(∂f/∂y) dx + (∂f/∂x) dy
      = 1/(2πr) ∫_0^{2π} ( −(∂f/∂y)(x₀ + r cos(t), y₀ + r sin(t)) (−r sin(t))
                            + (∂f/∂x)(x₀ + r cos(t), y₀ + r sin(t)) (r cos(t)) ) dt
      = d/dr ( 1/(2π) ∫_0^{2π} f(x₀ + r cos(t), y₀ + r sin(t)) dt ).

Let g(r) := 1/(2π) ∫_0^{2π} f(x₀ + r cos(t), y₀ + r sin(t)) dt. Then g′(r) = 0 for all r > 0. The
function is constant for r > 0 and continuous at r = 0 (exercise). Therefore, g(0) = g(r) for all
r > 0, and

    g(r) = g(0) = 1/(2π) ∫_0^{2π} f(x₀ + 0 cos(t), y₀ + 0 sin(t)) dt = f(x₀, y₀).

We proved the mean value property of harmonic functions:

    f(x₀, y₀) = 1/(2π) ∫_0^{2π} f(x₀ + r cos(t), y₀ + r sin(t)) dt = 1/(2πr) ∫_{∂D_r} f ds.

That is, the value at p = (x₀, y₀) is the average over a circle of any radius r centered at (x₀, y₀).
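The mean value property is easy to test numerically. The sketch below (an illustration only) averages the harmonic function f(x, y) = x² − y² over a circle with an arbitrary sample center and radius and compares with the value at the center:

```python
import math

def f(x, y):
    # a harmonic function: d2f/dx2 + d2f/dy2 = 2 + (-2) = 0
    return x * x - y * y

x0, y0, r = 1.3, -0.7, 2.0        # arbitrary sample center and radius
N = 10_000                        # sample points on the circle
avg = sum(f(x0 + r * math.cos(2 * math.pi * k / N),
            y0 + r * math.sin(2 * math.pi * k / N)) for k in range(N)) / N
print(avg, f(x0, y0))             # the two values agree
```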

10.6.1 Exercises
Exercise 10.6.1: Prove that a disc B(p, r) ⊂ R² is a type III domain, and prove that the orientation
given by the parametrization γ(t) = (x₀ + r cos(t), y₀ + r sin(t)), where p = (x₀, y₀), is the positive
orientation of the boundary ∂B(p, r).
Note: Feel free to use what you know about sine and cosine from calculus.

Exercise 10.6.2: Prove that any bounded domain with piecewise smooth boundary that is convex is a type III
domain.

Exercise 10.6.3: Suppose V ⊂ R² is a domain with piecewise smooth boundary that is a type III
domain, and suppose that U ⊂ R² is a domain such that V ⊂ U. Suppose f : U → R is a twice
continuously differentiable function. Prove that ∫_{∂V} (∂f/∂x) dx + (∂f/∂y) dy = 0.

Exercise 10.6.4: For a disc B(p, r) ⊂ R², orient the boundary ∂B(p, r) positively:
a) Compute ∫_{∂B(p,r)} −y dx.
b) Compute ∫_{∂B(p,r)} x dy.
c) Compute ∫_{∂B(p,r)} (−y/2) dx + (x/2) dy.
Exercise 10.6.5: Using Green's theorem show that the area of a triangle with vertices (x₁, y₁), (x₂, y₂),
(x₃, y₃) is (1/2)|x₁y₂ + x₂y₃ + x₃y₁ − y₁x₂ − y₂x₃ − y₃x₁|. Hint: See previous exercise.

Exercise 10.6.6: Using the mean value property prove the maximum principle for harmonic functions:
Suppose U ⊂ R2 is a connected open set and f : U → R is harmonic. Prove that if f attains a maximum at
p ∈ U, then f is constant.
Exercise 10.6.7: Let f(x, y) := ln √(x² + y²).
a) Show f is harmonic where defined.
b) Show lim_{(x,y)→0} f(x, y) = −∞.
c) Using a circle C_r of radius r around the origin, compute 1/(2πr) ∫_{∂C_r} f ds. What happens as r → 0?
d) Why can't you use Green's theorem?

10.7 Change of variables


Note: 1 lecture
In one variable, we have the familiar change of variables

    ∫_a^b f(g(x)) g′(x) dx = ∫_{g(a)}^{g(b)} f(x) dx.

It may be surprising that the analogue in higher dimensions is quite a bit more complicated. The
first complication is orientation. If we use the definition of integral from this chapter, then we do
not have the notion of ∫_a^b versus ∫_b^a. We are simply integrating over an interval [a, b]. With
this notation, the change of variables becomes

    ∫_{[a,b]} f(g(x)) |g′(x)| dx = ∫_{g([a,b])} f(x) dx.

In this section we will obtain the several-variable analogue of this form.
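A quick one-variable numerical sketch (illustrative only) of why the absolute value matters: the map g(x) = 1 − x takes [0, 1] onto [0, 1] with g′(x) = −1, and with |g′| both sides agree:

```python
def f(x):
    return x * x

# g(x) = 1 - x maps [0,1] onto [0,1] with g'(x) = -1; without the absolute
# value on g'(x) the left-hand side below would come out negative.
N = 100_000
h = 1.0 / N
# integral over [0,1] of f(g(x)) |g'(x)| dx, midpoint rule
left = sum(f(1.0 - (k + 0.5) * h) * abs(-1.0) * h for k in range(N))
# integral over g([0,1]) = [0,1] of f(x) dx, midpoint rule
right = sum(f((k + 0.5) * h) * h for k in range(N))
print(left, right)   # both approximate 1/3
```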


First we wish to see what plays the role of |g′(x)|. The factor g′(x) is a scaling of dx. The integral
measures volumes in general, so in one dimension it measures length. If our g is linear, that is,
g(x) = Lx, then g′(x) = L. The length of the interval g([a, b]) is simply |L|(b − a), because g([a, b])
is either [La, Lb] or [Lb, La]. This property holds in higher dimensions with |L| replaced by the
absolute value of the determinant.
Proposition 10.7.1. Suppose R ⊂ Rⁿ is a rectangle and A : Rⁿ → Rⁿ is linear. Then A(R) is Jordan
measurable and V(A(R)) = |det(A)| V(R).

Proof. It is enough to prove the statement for elementary matrices. The proof is left as an exercise.
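As a concrete numerical instance of the proposition in R² (a sketch with an arbitrary sample matrix), the unit square maps to a parallelogram whose area, computed here by the shoelace formula, equals |det(A)|:

```python
# Sample linear map A on R^2; the unit square maps to a parallelogram.
A = [[2.0, 1.0],
     [0.5, 3.0]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]

corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
image = [(A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y)
         for (x, y) in corners]

# shoelace formula for the area of the image polygon
area = 0.0
for i in range(4):
    x1, y1 = image[i]
    x2, y2 = image[(i + 1) % 4]
    area += x1 * y2 - x2 * y1
area = abs(area) / 2.0
print(area, abs(det))   # both are 5.5
```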
Let us prove that |Jg (x)| is the replacement of |g′ (x)| for multiple dimensions. The following
theorem holds in more generality, but this statement is sufficient for many uses.
Theorem 10.7.2. Suppose S ⊂ Rⁿ is a closed bounded Jordan measurable set, and S ⊂ U for an
open set U ⊂ Rⁿ. Suppose g : U → Rⁿ is a one-to-one continuously differentiable mapping such
that J_g is never zero on S. Suppose f : g(S) → R is Riemann integrable. Then f ◦ g is Riemann
integrable on S and

    ∫_{g(S)} f(x) dx = ∫_S f(g(x)) |J_g(x)| dx.

The set g(S) is Jordan measurable, so the left-hand side does make sense; that the right-hand side
makes sense follows as well.
Proof. The set S can be covered by finitely many closed rectangles P₁, P₂, . . . , P_k whose interiors
do not overlap and such that each P_j ⊂ U (exercise). It is enough to prove the theorem for P_j ∩ S
instead of S. Furthermore, for y ∉ g(S) write f(y) := 0. We may therefore assume that S is equal
to a rectangle R.
We can write any Riemann integrable function f as f = f₊ − f₋ for two nonnegative Riemann
integrable functions f₊ and f₋:

    f₊(x) := max{ f(x), 0},    f₋(x) := max{ −f(x), 0}.



So, if we prove the theorem for a nonnegative f , we obtain the theorem for arbitrary f . Therefore,
let us also suppose that f(y) ≥ 0 for all y.
Let ε > 0 be given. For every x ∈ R, let

    W_x := { y ∈ U : ‖g′(x) − g′(y)‖ < ε/2 }.

We leave it as an exercise to prove that W_x is open. As x ∈ W_x for every x, we obtain an open
cover of R. By the Lebesgue covering lemma, there exists a δ > 0 such that for every y ∈ R, there
is an x such that B(y, δ) ⊂ W_x. In other words, if P is a rectangle of maximum side length less
than δ/√n and such that y ∈ P, then P ⊂ B(y, δ) ⊂ W_x. By the triangle inequality,
‖g′(ξ) − g′(η)‖ < ε for all ξ, η ∈ P.
So let R₁, R₂, . . . , R_N be subrectangles partitioning R such that the maximum side of any R_j is
less than δ/√n. We also make sure that the minimum side length is at least δ/(2√n), which we
can do if δ is sufficiently small compared to the sides of R (exercise).
Consider some R_j and some fixed x_j ∈ R_j. First suppose x_j = 0, g(x_j) = 0, and g′(0) = I.
For any y ∈ R_j, apply the fundamental theorem of calculus to the function t ↦ g(ty) to find
g(y) = ∫_0^1 g′(ty) y dt. As the side of R_j is at most δ/√n, we have ‖y‖ ≤ δ. So

    ‖g(y) − y‖ = ‖ ∫_0^1 ( g′(ty) y − y ) dt ‖ ≤ ∫_0^1 ‖g′(ty) y − y‖ dt ≤ ∫_0^1 ‖y‖ ‖g′(ty) − I‖ dt ≤ δε.

Therefore, g(R_j) ⊂ R̃_j, where R̃_j is the rectangle obtained from R_j by extending by δε on all
sides. See Figure 10.9.

Figure 10.9: The image of R_j under g lies inside R̃_j. A sample point y ∈ R_j (on the boundary
of R_j in fact) is marked, and g(y) must lie within a distance of δε of it (also marked).


If the sides of R_j are s₁, s₂, . . . , s_n, then V(R_j) = s₁s₂ ⋯ s_n. Recall δ ≤ 2√n s_j. Thus

    V(R̃_j) = (s₁ + 2δε)(s₂ + 2δε) ⋯ (s_n + 2δε)
            ≤ (s₁ + 4√n s₁ ε)(s₂ + 4√n s₂ ε) ⋯ (s_n + 4√n s_n ε)
            = s₁(1 + 4√n ε) s₂(1 + 4√n ε) ⋯ s_n(1 + 4√n ε) = V(R_j)(1 + 4√n ε)ⁿ.

In other words,

    V(g(R_j)) ≤ V(R_j)(1 + 4√n ε)ⁿ.

Next, let us suppose that A := g′(0) is not the identity. Write g = A ◦ g̃, where g̃′(0) = I. By
Proposition 10.7.1, we know that V(A(R_j)) = |det(A)| V(R_j), and hence

    V(g(R_j)) ≤ |det(A)| V(R_j)(1 + 4√n ε)ⁿ = |J_g(0)| V(R_j)(1 + 4√n ε)ⁿ.

Finally, translation does not change volume, and therefore for any R_j and x_j ∈ R_j, including
when x_j ≠ 0 and g(x_j) ≠ 0, we find

    V(g(R_j)) ≤ |J_g(x_j)| V(R_j)(1 + 4√n ε)ⁿ.
Recall that f(y) ≥ 0 for all y ∈ R. Suppose δ > 0 was chosen small enough such that

    ε + ∫_R f(g(x)) |J_g(x)| dx ≥ ∑_{j=1}^N ( sup_{x∈R_j} f(g(x)) |J_g(x)| ) V(R_j)
                               ≥ ∑_{j=1}^N ( sup_{x∈R_j} f(g(x)) ) |J_g(x_j)| V(R_j)
                               ≥ ∑_{j=1}^N ( sup_{y∈g(R_j)} f(y) ) V(g(R_j)) (1 + 4√n ε)^{−n}
                               ≥ ∑_{j=1}^N ( ∫_{g(R_j)} f(y) dy ) (1 + 4√n ε)^{−n}
                               = (1 + 4√n ε)^{−n} ∫_{g(R)} f(y) dy,

where the last equality follows because the overlaps of the rectangles are their boundaries, which
are of measure zero, and hence the images of their boundaries are also of measure zero.
Letting ε go to zero, we find

    ∫_R f(g(x)) |J_g(x)| dx ≥ ∫_{g(R)} f(y) dy.

By adding this result over several rectangles covering an S, we obtain the result for an arbitrary
bounded Jordan measurable S ⊂ U and any nonnegative integrable function f :

    ∫_S f(g(x)) |J_g(x)| dx ≥ ∫_{g(S)} f(y) dy.

Next, recall that g⁻¹ exists on g(S) and that g⁻¹(g(S)) = S. Also, 1 = J_{g∘g⁻¹}(y) = J_g(g⁻¹(y)) J_{g⁻¹}(y)
for y ∈ g(S). Applying the inequality above with g⁻¹ in place of g,

    ∫_{g(S)} f(y) dy = ∫_{g(S)} f(g(g⁻¹(y))) |J_g(g⁻¹(y))| |J_{g⁻¹}(y)| dy
                     ≥ ∫_{g⁻¹(g(S))} f(g(x)) |J_g(x)| dx = ∫_S f(g(x)) |J_g(x)| dx.

The conclusion of the theorem holds for all nonnegative f , and as we mentioned above, it thus
holds for all Riemann integrable f .
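A familiar special case of the theorem is the change to polar coordinates g(r, t) = (r cos(t), r sin(t)), with J_g(r, t) = r. The sketch below (a numerical illustration, not part of the proof) integrates f(x, y) = x² + y² over a disc of radius R in polar coordinates; the exact value is πR⁴/2:

```python
import math

# integral over the disc of radius R of f(x, y) = x^2 + y^2, computed as
# integral over [0,R] x [0,2pi] of f(g(r, t)) * |J_g(r, t)| dr dt,
# where f(g(r, t)) = r^2 and |J_g(r, t)| = r.
R = 1.5
Nr, Nt = 400, 400
total = 0.0
for i in range(Nr):
    r = (i + 0.5) * R / Nr                  # midpoint in the r direction
    for j in range(Nt):
        total += (r * r) * r * (R / Nr) * (2 * math.pi / Nt)

exact = math.pi * R**4 / 2
print(total, exact)
```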

10.7.1 Exercises
Exercise 10.7.1: Prove Proposition 10.7.1.

Exercise 10.7.2: Suppose S ⊂ Rⁿ is a closed bounded Jordan measurable set, and S ⊂ U for an open
set U ⊂ Rⁿ. Show that there exist finitely many closed bounded rectangles P₁, P₂, . . . , P_k such that
P_j ⊂ U, S ⊂ P₁ ∪ P₂ ∪ · · · ∪ P_k, and the interiors are mutually disjoint, that is, P_j° ∩ P_ℓ° = ∅
whenever j ≠ ℓ.

Exercise 10.7.3: Suppose U ⊂ Rⁿ is an open set, x ∈ U, and g : U → Rⁿ is a continuously
differentiable mapping. For any ε > 0, show that

    W_x := { y ∈ U : ‖g′(x) − g′(y)‖ < ε/2 }

from the proof of the theorem is an open set.

Exercise 10.7.4: Suppose R ⊂ Rⁿ is a closed bounded rectangle. Show that if δ′ > 0 is sufficiently
small compared to the sides of R, then R can be partitioned into subrectangles where each side of
any subrectangle is between δ′/2 and δ′.

Exercise 10.7.5: Prove the following version of the theorem: Suppose f : Rⁿ → R is a Riemann
integrable compactly supported function. Suppose K ⊂ Rⁿ is the support of f , S is a compact set,
and g : Rⁿ → Rⁿ is a function that when restricted to a neighborhood U of S is one-to-one and
continuously differentiable, g(S) = K, and J_g is never zero on S (in the formula assume J_g(x) = 0
if g is not differentiable at x, that is, when x ∉ U). Then

    ∫_{Rⁿ} f(x) dx = ∫_{Rⁿ} f(g(x)) |J_g(x)| dx.

Exercise 10.7.6: Prove the following version of the theorem: Suppose S ⊂ Rⁿ is an open bounded
Jordan measurable set, g : S → Rⁿ is a one-to-one continuously differentiable mapping such that J_g
is never zero on S, and such that g(S) is bounded and Jordan measurable (it is also open). Suppose
f : g(S) → R is Riemann integrable. Then f ◦ g is Riemann integrable on S and

    ∫_{g(S)} f(x) dx = ∫_S f(g(x)) |J_g(x)| dx.

Hint: Write S as an increasing union of closed bounded Jordan measurable sets, then apply the theorem of
the section to those. Then prove that you can take the limit.
Chapter 11

Functions as Limits

11.1 Complex numbers


Note: half a lecture

11.1.1 The complex plane


In this chapter we consider approximation of functions, or in other words functions as limits of
sequences and series. We will extend some results we already saw to a somewhat more general
setting, and we will look at some completely new results. In particular, we consider complex-valued
functions. We gave complex numbers as examples before, but let us start from scratch and properly
define the complex number field.
A complex number is just a pair (x, y) ∈ R2 on which we define multiplication (see below). We
call the set the complex numbers and denote it by C. We identify x ∈ R with (x, 0) ∈ C. The x-axis
is then called the real axis and the y-axis is called the imaginary axis. The set C is sometimes called
the complex plane.
Define:
(x, y) + (s,t) := (x + s, y + t),
(x, y)(s,t) := (xs − yt, xt + ys).
Under the identification above we have 0 = (0, 0) and 1 = (1, 0). These two operations make the
plane into a field (exercise).
We write a complex number (x, y) as x + iy, where we define
i := (0, 1).
Notice that i2 = (0, 1)(0, 1) = (0 − 1, 0 + 0) = −1. That is, i is a solution to the polynomial equation
z2 + 1 = 0.
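The pair definition of multiplication is easy to experiment with; below is a small sketch (illustrative only) that cross-checks it against Python's built-in complex type:

```python
def cmul(a, b):
    # the definition (x, y)(s, t) := (xs - yt, xt + ys)
    (x, y), (s, t) = a, b
    return (x * s - y * t, x * t + y * s)

i = (0.0, 1.0)
print(cmul(i, i))                 # (-1.0, 0.0), i.e. i^2 = -1

z, w = (1.0, 2.0), (3.0, -1.0)    # 1 + 2i and 3 - i
zw = cmul(z, w)
builtin = complex(*z) * complex(*w)
print(zw, builtin)                # (5.0, 5.0) and (5+5j)
```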
From now on, we will not use the notation (x, y) and use only x + iy. See Figure 11.1.
We generally use x, y, r, s,t for real values and z, w, ξ , ζ for complex values, although that is not
a hard and fast rule. In particular, z is often used as a third real variable in R3 .

Note that engineers use j instead of i.

Figure 11.1: The points 1, i, x, iy, and x + iy in the complex plane.

Definition 11.1.1. Suppose z = x + iy. We call x the real part of z, and we call y the imaginary
part of z. We write

    Re z := x,    Im z := y.

Define the complex conjugate as
    z̄ := x − iy.
Similarly define the modulus as
    |z| := √(x² + y²).

Modulus acts like absolute value. For example, |zw| = |z| |w| (exercise).
The complex conjugate is a reflection of the plane across the real axis. The real numbers are
precisely those numbers for which the imaginary part y = 0. In particular, they are precisely those
numbers which satisfy the equation
z = z̄.
As C is really R2 , we let the metric on C be the standard euclidean metric on R2 . In particular,

|z| = d(z, 0), and also |z − w| = d(z, w).

So the topology on C is the same exact topology as the standard topology on R2 with the euclidean
metric, and |z| is in fact equal to the euclidean norm on R2 . Importantly, since R2 is a complete
metric space, then so is C.
Since |z| is the euclidean norm on R², we have the triangle inequality in both flavors:

    |z + w| ≤ |z| + |w|    and    | |z| − |w| | ≤ |z − w|.

The complex conjugate and the modulus are even more intimately related:

|z|2 = x2 + y2 = (x + iy)(x − iy) = zz̄.

Remark 11.1.2. There is no natural ordering on the complex numbers; in particular, there is no
ordering that makes the complex numbers into an ordered field. Ordering is one of the things we
lose when we go from real to complex numbers.

11.1.2 Complex numbers and limits


It is not hard to show that the algebraic operations are continuous. This is because convergence
in R² is the same as convergence for each component. For example, write z_n = x_n + iy_n and
w_n = s_n + it_n, and suppose that lim z_n = z = x + iy and lim w_n = w = s + it. Let us show

    lim_{n→∞} z_n w_n = zw.

First,
    z_n w_n = (x_n s_n − y_n t_n) + i(x_n t_n + y_n s_n).

As the topology on C is the same as on R², we have x_n → x, y_n → y, s_n → s, and t_n → t. So

    lim_{n→∞} (x_n s_n − y_n t_n) = xs − yt    and    lim_{n→∞} (x_n t_n + y_n s_n) = xt + ys.

As (xs − yt) + i(xt + ys) = zw, we conclude lim_{n→∞} z_n w_n = zw.
Similarly, the modulus and the complex conjugate are continuous functions. We leave the proof of
the following proposition as an exercise.

Proposition 11.1.3. Suppose {z_n}, {w_n} are sequences of complex numbers converging to z and
w respectively. Then
(i) lim_{n→∞} (z_n + w_n) = z + w,
(ii) lim_{n→∞} z_n w_n = zw,
(iii) assuming w_n ≠ 0 for all n and w ≠ 0, lim_{n→∞} z_n/w_n = z/w,
(iv) lim_{n→∞} |z_n| = |z|,
(v) lim_{n→∞} z̄_n = z̄.

As we have seen above, convergence in C is the same as convergence in R2 . In particular, a


sequence in C converges if and only if the real and imaginary parts converge. Therefore, feel free to
apply everything you have learned about convergence in R2 , as well as applying results about real
numbers to the real and imaginary parts.
We also need to extend convergence of complex series. Let {z_n} be a sequence of complex
numbers. The series

    ∑_{n=1}^∞ z_n

converges if the limit of partial sums converges, that is, if

    lim_{k→∞} ∑_{n=1}^k z_n    exists.

As before, we sometimes write ∑ zn for the series. A series converges absolutely if ∑|zn | converges.
We say a series is Cauchy if the sequence of partial sums is Cauchy. The following two
propositions have essentially the same proofs as for real series and we leave them as exercises.

Proposition 11.1.4. The complex series ∑ z_n is Cauchy if and only if for every ε > 0, there exists
an M ∈ N such that for every n ≥ M and every k > n we have

    | ∑_{j=n+1}^k z_j | < ε.

Proposition 11.1.5. If a complex series ∑ zn converges absolutely, then it converges.

The series ∑|z_n| is a real series. All the convergence tests (ratio test, root test, etc.) that concern
absolute convergence work with the numbers |z_n|; that is, they are really about convergence of
series of nonnegative real numbers. You can directly apply these tests without needing to reprove
anything for complex series.

11.1.3 Complex-valued functions


When we deal with complex-valued functions f : X → C, what we often do is write f = u + iv
for real-valued functions u : X → R and v : X → R.
One thing we often wish to do is to integrate f : [a, b] → C. Again write f = u + iv for real-valued
u and v. We define f to be Riemann integrable if and only if u and v are Riemann integrable, and
in this case we define

    ∫_a^b f := ∫_a^b u + i ∫_a^b v.

We make the same definition for every other type of integral (improper, multivariable, etc.).
Similarly when we differentiate, write f : [a, b] → C as f = u + iv. Thinking of C as R2 we
find that f is differentiable if u and v are differentiable. For such a function the derivative was
represented by a vector in R2 . Now a vector in R2 is a complex number. In other words we write
the derivative as
f ′ (t) := u′ (t) + i v′ (t).
The linear operator representing the derivative is the multiplication by the complex number f ′ (t), so
nothing is lost in this identification.

11.1.4 Exercises
Exercise 11.1.1: Check that C is a field.

Exercise 11.1.2: Prove that for z, w ∈ C, we have |zw| = |z| |w|.

Exercise 11.1.3: Finish the proof of Proposition 11.1.3.

Exercise 11.1.4: Prove Proposition 11.1.4.

Exercise 11.1.5: Prove Proposition 11.1.5.


Exercise 11.1.6: Considering the definition of complex multiplication, given x + iy define the matrix

    [ x  −y ]
    [ y   x ]

Prove that
a) The action of this matrix on a vector (s, t) is the same as the action of multiplying (x + iy)(s + it).
b) Multiplying two such matrices is the same as multiplying the underlying complex numbers and then
taking the corresponding matrix for the product. In other words, we can think of the field C as also a
subset of the two-by-two matrices.
c) Show that the matrix above has eigenvalues x + iy and x − iy. Recall that λ is an eigenvalue of a
matrix A if A − λI (a complex matrix in our case) is not invertible, or in other words if it has linearly
dependent rows: that is, one row is a (complex) multiple of the other.

Exercise 11.1.7: Prove the Bolzano-Weierstrass theorem for complex sequences. Suppose {zn } is a bounded
sequence of complex numbers, that is, there exists an M such that |zn | ≤ M for all n. Prove that there exists a
subsequence {znk } that converges to some z ∈ C.

Exercise 11.1.8:
a) Prove that there is no simple mean value theorem for complex valued functions: Find a differentiable
function f : [0, 1] → C such that f (0) = f (1) = 0, but f ′ (t) 6= 0 for all t ∈ [0, 1].
b) However, we have the weaker form of the mean value theorem as we do for vector-valued functions.
Prove: If f : [a, b] → C is continuous and differentiable in (a, b), and for some M, | f ′ (x)| ≤ M for all
x ∈ (a, b), then | f (b) − f (a)| ≤ M|b − a|.

Exercise 11.1.9: Prove that there is no simple mean value theorem for integrals for complex-valued
functions: Find a continuous function f : [0, 1] → C such that ∫_0^1 f = 0 but f(t) ≠ 0 for all
t ∈ [0, 1].

11.2 Swapping limits


Note: 2 lectures

11.2.1 Continuity
Let us get back to swapping limits. Let { fn } be a sequence of functions fn : X → Y for a set X and
a metric space Y . Let f : X → Y be a function and for every x ∈ X suppose that
    f(x) = lim_{n→∞} f_n(x).

We say the sequence { f_n } converges pointwise to f . Suppose Y = C; a series converges pointwise
if for every x ∈ X we have

    f(x) = lim_{n→∞} ∑_{k=1}^n f_k(x) = ∑_{k=1}^∞ f_k(x).

The question is: If fn are all continuous, is f continuous? Differentiable? Integrable? What are
the derivatives or integrals of f ?
For example, for continuity of the pointwise limit of a sequence { f_n }, we are asking if

    lim_{x→x₀} lim_{n→∞} f_n(x) =? lim_{n→∞} lim_{x→x₀} f_n(x).

We do not even know a priori whether both sides exist, let alone whether they are equal to each
other.
Example 11.2.1: The functions f_n : R → R,

    f_n(x) := 1/(1 + nx²),

converge pointwise to

    f(x) := 1 if x = 0, and f(x) := 0 otherwise,

which is of course not continuous.
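A small numerical sketch of this example (illustrative only): f_n(0) is always 1, for any fixed x ≠ 0 the values tend to 0, and the convergence cannot be uniform, since at x = 1/√n the value is always 1/2:

```python
import math

def f_n(n, x):
    return 1.0 / (1.0 + n * x * x)

print(f_n(10**6, 0.0))                  # exactly 1 for every n
print(f_n(10**6, 0.5))                  # tiny: the pointwise limit is 0 for x != 0
n = 10**6
print(f_n(n, 1.0 / math.sqrt(n)))       # about 1/2, no matter how large n is
```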
Pointwise convergence is not enough to preserve continuity (nor even boundedness). For that,
we need uniform convergence.
Let f_n : X → Y be functions. Then { f_n } converges uniformly to f if for every ε > 0, there exists
an M such that for all n ≥ M and all x ∈ X we have

    d( f_n(x), f(x) ) < ε.

A series ∑ f_n of complex-valued functions converges uniformly if the sequence of partial sums
converges uniformly, that is, if for every ε > 0 there exists an M such that for all n ≥ M and all
x ∈ X,

    | ∑_{k=1}^n f_k(x) − f(x) | < ε.

The simplest property preserved by uniform convergence is boundedness. We leave the proof of
the following proposition as an exercise. It is almost identical to the proof for real-valued functions.

Proposition 11.2.2. Let X be any set and Y any metric space. If fn : X → Y are bounded functions
and converge uniformly to f : X → Y , then f is bounded.

We have a notion of uniformly Cauchy as for real-valued functions. The proof of the following
proposition is again essentially the same as for the real-valued functions and is left as an exercise.

Proposition 11.2.3. Let X be any set and let (Y, d) be a Cauchy complete metric space. Let
f_n : X → Y be functions. Then { f_n } converges uniformly if and only if for every ε > 0, there is
an M such that for all n, m ≥ M and all x ∈ X we have

    d( f_n(x), f_m(x) ) < ε.

For f : X → C, we write

    ‖f‖_u := sup_{x∈X} | f(x)|.

We call ‖·‖_u the supremum norm or uniform norm. Then a sequence of functions f_n : X → C
converges uniformly to f : X → C if and only if

    lim_{n→∞} ‖f_n − f‖_u = 0.

The supremum norm satisfies the triangle inequality: for any x ∈ X,

    | f(x) + g(x)| ≤ | f(x)| + |g(x)| ≤ ‖f‖_u + ‖g‖_u.

Take a supremum on the left to get

    ‖f + g‖_u ≤ ‖f‖_u + ‖g‖_u.

For a compact metric space X, the uniform norm is a norm on the vector space C(X, C); we leave
the proof as an exercise. While we will not need it, C(X, C) is in fact a complex vector space, that
is, in the definition of a vector space we replace R with C. Convergence in the metric space C(X, C)
is uniform convergence.
We will study a couple of types of series of functions, and a useful test for uniform convergence
of a series is the so-called Weierstrass M-test.

Theorem 11.2.4 (Weierstrass M-test). Let X be any set. Suppose f_n : X → C are functions and
M_n > 0 are numbers such that

    | f_n(x)| ≤ M_n for all x ∈ X,    and    ∑_{n=1}^∞ M_n converges.

Then

    ∑_{n=1}^∞ f_n(x)    converges uniformly.

Another way to state the theorem is to say that if ∑ ‖f_n‖_u converges, then ∑ f_n converges
uniformly. Note that the converse of this theorem is not true.

Proof. Suppose ∑ M_n converges. Given ε > 0, the partial sums of ∑ M_n are Cauchy, so there is
an N such that for all m, n ≥ N with m ≥ n we have

    ∑_{k=n+1}^m M_k < ε.

Now look at a Cauchy difference of the partial sums of the functions:

    | ∑_{k=n+1}^m f_k(x) | ≤ ∑_{k=n+1}^m | f_k(x)| ≤ ∑_{k=n+1}^m M_k < ε.

We are done by Proposition 11.2.3.


Example 11.2.5: The series

    ∑_{n=1}^∞ sin(nx)/n²

converges uniformly on R. See Figure 11.2. This is a Fourier series; we will see more of these in a
later section. It converges because

    | sin(nx)/n² | ≤ 1/n²

and ∑_{n=1}^∞ 1/n² converges.

Figure 11.2: Plot of ∑_{n=1}^∞ sin(nx)/n², including the first 8 partial sums in various shades of gray.
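The M-test is quantitative: the tail ∑_{n>N} 1/n² is less than 1/N, so the N-th partial sum is uniformly within 1/N of the limit. A rough numerical sketch (using a much longer partial sum as a stand-in for the limit):

```python
import math

def partial(x, N):
    # N-th partial sum of sum sin(n x) / n^2
    return sum(math.sin(n * x) / n**2 for n in range(1, N + 1))

N, big = 50, 2000
# worst observed error over a spread of sample points in [-4, 4]
worst = max(abs(partial(x, big) - partial(x, N))
            for x in [k * 0.1 for k in range(-40, 41)])
print(worst, 1 / N)    # observed error vs the uniform M-test tail bound 1/N
```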

Example 11.2.6: The series

    ∑_{n=0}^∞ xⁿ/n!

converges uniformly on any bounded interval. Take the interval [−r, r] ⊂ R (any bounded interval
is contained in such an interval). Then

    | xⁿ/n! | ≤ rⁿ/n!

and ∑ rⁿ/n! converges, for example by the ratio test.

Now we would love to say something about the limit. For example, is it continuous?

Proposition 11.2.7. Let (X, d_X) and (Y, d_Y) be metric spaces, and suppose Y is a complete
metric space. Let f_n : X → Y be functions, and suppose { f_n } converges uniformly to f : X → Y .
Let {x_k} be a sequence in X and x := lim x_k. Suppose that

    a_n := lim_{k→∞} f_n(x_k)

exists for all n. Then {a_n} converges and

    lim_{k→∞} f(x_k) = lim_{n→∞} a_n.

In other words,

    lim_{k→∞} lim_{n→∞} f_n(x_k) = lim_{n→∞} lim_{k→∞} f_n(x_k).

Proof. First we show that {a_n} converges. As { f_n } converges uniformly, it is uniformly Cauchy.
Let ε > 0 be given. There is an M such that for all m, n ≥ M we have

    d_Y( f_n(x_k), f_m(x_k) ) < ε    for all k.

As a metric is automatically continuous, we let k go to infinity to find

    d_Y(a_n, a_m) ≤ ε.

Hence {a_n} is Cauchy, and it converges since Y is complete. Write a := lim a_n.
Find a k ∈ N such that

    d_Y( f_k(p), f(p) ) < ε/3

for all p ∈ X. Assume k is large enough so that

    d_Y(a_k, a) < ε/3.

Find an N ∈ N such that for m ≥ N,

    d_Y( f_k(x_m), a_k ) < ε/3.

Then for m ≥ N,

    d_Y( f(x_m), a ) ≤ d_Y( f(x_m), f_k(x_m) ) + d_Y( f_k(x_m), a_k ) + d_Y( a_k, a ) < ε/3 + ε/3 + ε/3 = ε.
Immediately we obtain a corollary about continuity.

Corollary 11.2.8. Let X and Y be metric spaces such that Y is Cauchy complete. Let f_n : X → Y
be continuous functions such that { f_n } converges uniformly to f : X → Y . Then f is continuous.

The converse is not true. Just because the limit is continuous does not mean that the convergence
is uniform. For example, f_n : (0, 1) → R defined by f_n(x) := xⁿ converge to the zero function, but
not uniformly. However, if we add extra conditions on the sequence, we can obtain a partial converse
such as Dini's theorem; see Exercise 6.2.10 from volume I.
Assume the exercise that for a compact X, C(X, C) is a metric space with the uniform norm
(actually a normed vector space). We have just shown that it is Cauchy complete: Proposition 11.2.3
says that a uniformly Cauchy sequence in C(X, C) converges to some function, and Corollary 11.2.8
shows that the limit is in fact continuous and hence in C(X, C).

Corollary 11.2.9. Let (X, d) be a compact metric space. Then C(X, C) is a Cauchy complete
metric space.

Example 11.2.10: We have seen that the Fourier series

    ∑_{n=1}^∞ sin(nx)/n²

converges uniformly and hence is continuous (as is visible in Figure 11.2).

11.2.2 Integration
Proposition 11.2.11. Suppose f_n : [a, b] → C are Riemann integrable and suppose that { f_n }
converges uniformly to f : [a, b] → C. Then f is Riemann integrable and

    ∫_a^b f = lim_{n→∞} ∫_a^b f_n.

Since the integral of a complex-valued function is just the integral of the real and imaginary
parts separately, the proof follows directly from the results of chapter 6 of volume I. We leave the
details as an exercise.

Corollary 11.2.12. Suppose f_n : [a, b] → C are Riemann integrable and suppose that

    ∑_{n=1}^∞ f_n(x)

converges uniformly. Then the series is Riemann integrable on [a, b] and

    ∫_a^b ∑_{n=1}^∞ f_n(x) dx = ∑_{n=1}^∞ ∫_a^b f_n(x) dx.

Example 11.2.13: Let us show how to integrate a Fourier series.

    ∫_0^x ∑_{n=1}^∞ cos(nt)/n² dt = ∑_{n=1}^∞ ∫_0^x cos(nt)/n² dt = ∑_{n=1}^∞ sin(nx)/n³.

The swapping of integral and sum is possible because of uniform convergence, which we have
proved before using the Weierstrass M-test (Theorem 11.2.4).

Note that we can swap integrals and limits under far less stringent hypotheses, but for that we
would need a stronger integral than the Riemann integral, e.g. the Lebesgue integral.
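The identity can be sanity-checked numerically (an illustration, not a proof): integrate a partial sum of the cosine series by the midpoint rule and compare with the partial sum of ∑ sin(nx)/n³:

```python
import math

N = 100       # number of series terms kept
M = 2000      # midpoint-rule sample points
x = 1.0

def series(t):
    # partial sum of sum cos(n t) / n^2
    return sum(math.cos(n * t) / n**2 for n in range(1, N + 1))

h = x / M
left = sum(series((k + 0.5) * h) * h for k in range(M))        # integral of the series
right = sum(math.sin(n * x) / n**3 for n in range(1, N + 1))   # series of the integrals
print(left, right)
```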

11.2.3 Differentiation
Recall that a complex-valued function f : [a, b] → C, where f(x) = u(x) + i v(x), is differentiable
if u and v are differentiable, and the derivative is

    f′(x) = u′(x) + i v′(x).

The proof of the following theorem is to apply the corresponding theorem for real functions to u
and v, and is left as an exercise.

Theorem 11.2.14. Let I ⊂ R be a bounded interval and let f_n : I → C be continuously differentiable
functions. Suppose { f_n′ } converges uniformly to g : I → C, and suppose { f_n(c) }_{n=1}^∞ is a
convergent sequence for some c ∈ I. Then { f_n } converges uniformly to a continuously differentiable
function f : I → C, and f′ = g.

Uniform limits of the functions themselves are not enough, and can make matters even worse. In a
later section we will prove that continuous functions are uniform limits of polynomials, yet as the
following example demonstrates, a continuous function need not have a derivative anywhere.

Example 11.2.15: Let us construct a continuous nowhere differentiable function. Such functions
are often called Weierstrass functions, although this particular one is a different example than what
Weierstrass gave.
Define
    ϕ(x) := |x| for x ∈ [−1, 1].
Extend the definition of ϕ to all of R by making it 2-periodic: decree that ϕ(x) = ϕ(x + 2). The
function ϕ : R → R is continuous; in fact, |ϕ(x) − ϕ(y)| ≤ |x − y| (why?). See Figure 11.3.

Figure 11.3: The 2-periodic function ϕ.


As ∑ (3/4)ⁿ converges and |ϕ(x)| ≤ 1 for all x, we have by the M-test (Theorem 11.2.4) that

    f(x) := ∑_{n=0}^∞ (3/4)ⁿ ϕ(4ⁿx)

converges uniformly and hence is continuous. We claim that this f : R → R is nowhere differentiable.
See Figure 11.4.
Fix x and define

    δ_m := ± (1/2) 4^{−m},

where the sign is chosen so that there is no integer between 4^m x and 4^m(x + δ_m) = 4^m x ± 1/2.
Fix m for a moment.

Figure 11.4: Plot of the nowhere differentiable function f.

Let

    γ_n := ( ϕ(4ⁿ(x + δ_m)) − ϕ(4ⁿx) ) / δ_m.

If n > m, then 4ⁿδ_m is an even integer. As ϕ is 2-periodic, we get γ_n = 0.
As there is no integer between 4^m(x + δ_m) = 4^m x ± 1/2 and 4^m x, on this interval ϕ(t) = ±t + ℓ
for some integer ℓ. In particular, |ϕ(4^m(x + δ_m)) − ϕ(4^m x)| = |4^m x ± 1/2 − 4^m x| = 1/2.
Therefore,

    |γ_m| = | ϕ(4^m(x + δ_m)) − ϕ(4^m x) | / | ±(1/2) 4^{−m} | = 4^m.

Similarly, suppose n < m. Since |ϕ(s) − ϕ(t)| ≤ |s − t|,

    |γ_n| = | ϕ(4ⁿx ± (1/2) 4^{n−m}) − ϕ(4ⁿx) | / | ±(1/2) 4^{−m} | ≤ ( (1/2) 4^{n−m} ) / ( (1/2) 4^{−m} ) = 4ⁿ.

And so

    | ( f(x + δ_m) − f(x) ) / δ_m | = | ∑_{n=0}^∞ (3/4)ⁿ ( ϕ(4ⁿ(x + δ_m)) − ϕ(4ⁿx) ) / δ_m |
                                   = | ∑_{n=0}^m (3/4)ⁿ γ_n |
                                   ≥ (3/4)^m |γ_m| − ∑_{n=0}^{m−1} (3/4)ⁿ |γ_n|
                                   ≥ 3^m − ∑_{n=0}^{m−1} 3ⁿ = 3^m − (3^m − 1)/(3 − 1) = (3^m + 1)/2.

As m → ∞, we have δ_m → 0, but (3^m + 1)/2 goes to infinity. Hence f cannot be differentiable at x.
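The blow-up of the difference quotients can be watched numerically. The sketch below (illustrative only) uses a partial sum of the series — in exact arithmetic the terms with n > m contribute zero to each quotient — and picks the sign of δ_m as in the text:

```python
def phi(x):
    # the 2-periodic extension of |x| on [-1, 1]
    x = x % 2.0
    return x if x <= 1.0 else 2.0 - x

def f(x, terms=20):
    # partial sum of the series defining the nowhere differentiable f
    return sum((3.0 / 4.0)**n * phi(4.0**n * x) for n in range(terms))

x = 0.3
quotients = []
for m in range(2, 8):
    frac = (4.0**m * x) % 1.0
    sign = 1.0 if frac < 0.5 else -1.0   # no integer between 4^m x and 4^m x + sign/2
    delta = sign * 0.5 * 4.0**(-m)
    q = abs((f(x + delta) - f(x)) / delta)
    quotients.append((m, q))
    print(m, q, (3**m + 1) / 2)          # quotient vs the lower bound (3^m + 1)/2
```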

11.2.4 Exercises
Exercise 11.2.1: Prove Proposition 11.2.2.

Exercise 11.2.2: Prove Proposition 11.2.3.

Exercise 11.2.3: Suppose (X, d) is a compact metric space. Prove that ‖·‖_u is a norm on the vector
space of continuous complex-valued functions C(X, C).

Exercise 11.2.4:
a) Prove that fn (x) := 2−n sin(2n x) converge uniformly to zero, but there exists a dense set D ⊂ R such that
limn→∞ fn′ (x) = 1 for all x ∈ D.
b) Prove that ∑∞ −n n
n=1 2 sin(2 x) converges uniformly to a continuous function, and there exists a dense set
D ⊂ R where the derivatives of the partial sums do not converge.

Exercise 11.2.5: Suppose (X, d) is a compact metric space. Prove that ‖f‖_{C¹} := ‖f‖_u + ‖f′‖_u is a
norm on the vector space of continuously differentiable complex-valued functions C¹(X, C).

Exercise 11.2.6: Prove Proposition 11.2.11.

Exercise 11.2.7: Prove Theorem 11.2.14 by reducing to the real result.

Exercise 11.2.8: Work out the following counterexample to the converse of the Weierstrass M-test
(Theorem 11.2.4). Define f_n : [0, 1] → R by

    f_n(x) := 1/n if 1/(n+1) < x < 1/n, and f_n(x) := 0 otherwise.

Then prove that ∑ f_n converges uniformly, but ∑ ‖f_n‖_u does not.

Exercise 11.2.9: Suppose fn : [0, 1] → R are monotone increasing functions and suppose that ∑ fn converges
pointwise. Prove that ∑ fn converges uniformly.

Exercise 11.2.10: Prove that

∑_{n=1}^∞ e^{−nx}

converges for all x > 0 to a differentiable function.
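A numerical sanity check (not a proof): for x > 0 the series is geometric, summing to 1/(e^x − 1), and the term-by-term derivative series sums to −e^x/(e^x − 1)^2. A quick sketch:

```python
import math

def S(x, N=200):
    # partial sum of sum_{n=1}^infty e^{-nx}
    return sum(math.exp(-n * x) for n in range(1, N + 1))

def S_prime(x, N=200):
    # term-by-term derivative: sum_{n=1}^infty -n e^{-nx}
    return sum(-n * math.exp(-n * x) for n in range(1, N + 1))

x = 0.5
assert abs(S(x) - 1 / (math.exp(x) - 1)) < 1e-12
assert abs(S_prime(x) - (-math.exp(x) / (math.exp(x) - 1) ** 2)) < 1e-12
```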



11.3 Power series and analytic functions


Note: 2–3 lectures

11.3.1 Analytic functions


A (complex) power series is a series of the form

∑_{n=0}^∞ c_n (z − a)^n

for c_n, z, a ∈ C. We say the series is convergent if it converges for some z ≠ a.
Let U ⊂ C be an open set and suppose f : U → C is a function such that for every a ∈ U there exists a ρ > 0 and a power series convergent to the function:

f(z) = ∑_{n=0}^∞ c_n (z − a)^n

for all z ∈ B(a, ρ). Then we say f is an analytic function.
Similarly if we have an interval (a, b) ⊂ R, we say that f : (a, b) → C is analytic or perhaps
real-analytic if for each point c ∈ (a, b) there is a power series around c that converges in some
(c − ρ , c + ρ ) for some ρ > 0.
As we will sometimes talk about real and sometimes about complex power series we will use z
to denote a complex number and x a real number, but we will always mention which case we are
working with.
An analytic function has different expansions around different points. Also the convergence
does not automatically happen on the entire domain of the function. For example, if |z| < 1, then

1/(1 − z) = ∑_{n=0}^∞ z^n .

While the left-hand side is defined for all z ≠ 1, the right-hand side converges only if |z| < 1.
See a graph of a small piece of 1/(1−z) in Figure 11.5. Notice that we can't graph the function itself; we can only graph its real or imaginary parts for lack of dimensions in our universe.

11.3.2 Convergence of power series


We proved several results for power series of a real variable in §2.6 of volume I. For the most part, the convergence properties of power series deal with the series ∑ |c_k| |z − a|^k, and so we have already proved many results about complex power series. In particular, we computed the so-called radius of convergence of a power series.
Proposition 11.3.1. Let ∑_{n=0}^∞ c_n (z − a)^n be a power series. There exists a ρ ∈ [0, ∞] such that
(i) ρ = 0 and the series diverges for every z ≠ a,
(ii) ρ = ∞ and the series converges for all z ∈ C, or
(iii) 0 < ρ < ∞ and the series converges absolutely on B(a, ρ) and diverges when |z − a| > ρ.
Furthermore, if 0 < r < ρ, then the series converges uniformly on the closed ball C(a, r).

Figure 11.5: Graphs of the real and imaginary parts of z = x + iy ↦ 1/(1−z) in the square [−0.8, 0.8]^2. The singularity at z = 1 is marked with a vertical dashed line.

Proof. We use the real version of this proposition, Proposition 2.6.10 in volume I. Let

R := lim sup_{n→∞} |c_n|^{1/n} .

If R = 0, then ∑_{n=0}^∞ |c_n| |z − a|^n converges for all z. If R = ∞, then it converges only at z = a. Otherwise, let ρ := 1/R; then ∑_{n=0}^∞ |c_n| |z − a|^n converges when |z − a| < ρ, and the series diverges (in fact, the terms of the series do not go to zero) when |z − a| > ρ.
To prove the furthermore, suppose 0 < r < ρ and z ∈ C(a, r). Consider the partial sums

|∑_{n=0}^k c_n (z − a)^n| ≤ ∑_{n=0}^k |c_n| |z − a|^n ≤ ∑_{n=0}^k |c_n| r^n .

As r < ρ, the series ∑_{n=0}^∞ |c_n| r^n converges, and so the power series converges uniformly on C(a, r) by the Weierstrass M-test.

The number ρ is called the radius of convergence; see Figure 11.6. The radius of convergence gives us a disk around a where the series converges. A power series is convergent if ρ > 0.

Figure 11.6: The radius of convergence: the series converges inside the disk of radius ρ around a, and does not converge outside it.



If ∑ c_n (z − a)^n converges for some z, then

∑ c_n (w − a)^n

converges absolutely whenever |w − a| < |z − a|. Conversely, if the series diverges at z, then it must diverge at w whenever |w − a| > |z − a|. This means that to show that the radius of convergence is at least some number, we simply need to show convergence at some point by any method we know.

Example 11.3.2: Let us list some series we already know:

∑_{n=0}^∞ z^n has radius of convergence 1.

∑_{n=0}^∞ (1/n!) z^n has radius of convergence ∞.

∑_{n=0}^∞ n^n z^n has radius of convergence 0.
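The root-test quantity lim sup |c_n|^{1/n} from the proof of Proposition 11.3.1 can be estimated numerically for these three series; a rough sketch at a single large n:

```python
import math

n = 100
# c_n = 1: |c_n|^(1/n) = 1, so the radius of convergence is 1
assert abs(1.0 ** (1 / n) - 1.0) < 1e-12
# c_n = 1/n!: |c_n|^(1/n) tends to 0, so the radius of convergence is infinite
assert (1 / math.factorial(n)) ** (1 / n) < 0.05
# c_n = n^n: |c_n|^(1/n) = n tends to infinity, so the radius of convergence is 0
assert abs((n ** n) ** (1 / n) - n) < 1e-6
```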

Example 11.3.3: Note the difference between 1/(1−z) and its power series. Let us expand 1/(1−z) as a power series around any point a ≠ 1. Let c := 1/(1−a); then

1/(1 − z) = c / (1 − c(z − a)) = c ∑_{n=0}^∞ c^n (z − a)^n = ∑_{n=0}^∞ (1/(1 − a)^{n+1}) (z − a)^n .

The series on the right-hand side converges if and only if ∑ c^n (z − a)^n converges, and

lim sup_{n→∞} |c^n|^{1/n} = |c| = 1/|1 − a| .

The radius of convergence of the power series is |1 − a|, that is, the distance from 1 to a. The function 1/(1−z) has a power series representation around every a ≠ 1 and so is analytic in C \ {1}. The domain of the function is bigger than the region of convergence of the power series representing the function at any point.
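This re-expansion is easy to probe numerically. A small sketch with a hypothetical expansion point a = 0.5 + 0.5i (so the radius of convergence is |1 − a| ≈ 0.707) and a point z inside that disk:

```python
a = 0.5 + 0.5j   # expansion point, a != 1
z = 0.6 + 0.4j   # |z - a| ~ 0.141 < |1 - a| ~ 0.707
# partial sum of sum (z - a)^n / (1 - a)^(n + 1), the series from Example 11.3.3
approx = sum((z - a) ** n / (1 - a) ** (n + 1) for n in range(60))
assert abs(approx - 1 / (1 - z)) < 1e-12
```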

It turns out that if a function has a power series representation converging to the function on
some ball, then it has a power series representation at every point in the ball. We will prove this
result later.

11.3.3 Properties of analytic functions


Proposition 11.3.4. If

f(z) := ∑_{n=0}^∞ c_n (z − a)^n

is convergent in B(a, ρ) for some ρ > 0, then f : B(a, ρ) → C is continuous. In particular, analytic functions are continuous.

Proof. For any z0 ∈ B(a, ρ ) pick r < ρ such that z0 ∈ B(a, r). On B(a, r) the partial sums (which are
continuous) converge uniformly, and so the limit f |B(a,r) is continuous. Any sequence converging
to z0 has some tail that is completely in the open ball B(a, r), hence f is continuous at z0 .
In Corollary 6.2.13 of volume I we proved that we can differentiate real power series term by term. That is, we proved that if

f(x) := ∑_{n=0}^∞ c_n (x − a)^n

converges for real x in an interval around a ∈ R, then we can differentiate term by term and obtain a series

f′(x) = ∑_{n=1}^∞ n c_n (x − a)^{n−1} = ∑_{n=0}^∞ (n + 1) c_{n+1} (x − a)^n

with the same radius of convergence. We only proved this theorem when c_n is real; however, for complex c_n, we write c_n = s_n + i t_n, and as x and a are real,

∑_{n=0}^∞ c_n (x − a)^n = ∑_{n=0}^∞ s_n (x − a)^n + i ∑_{n=0}^∞ t_n (x − a)^n .

We apply the theorem to the real and imaginary part.
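A quick numerical illustration with the geometric series: since ∑ x^n = 1/(1 − x), the term-by-term derivative series should sum to 1/(1 − x)^2. A sketch:

```python
x = 0.3
# sum_{n=1}^infty n x^(n-1), the differentiated geometric series
deriv_series = sum(n * x ** (n - 1) for n in range(1, 200))
assert abs(deriv_series - 1 / (1 - x) ** 2) < 1e-12
```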


By iterating this theorem we find that an analytic function is infinitely differentiable:

f^{(ℓ)}(x) = ∑_{n=ℓ}^∞ n(n − 1) · · · (n − ℓ + 1) c_n (x − a)^{n−ℓ} = ∑_{n=0}^∞ (n + ℓ)(n + ℓ − 1) · · · (n + 1) c_{n+ℓ} (x − a)^n .

In particular,

f^{(ℓ)}(a) = ℓ! c_ℓ .    (11.1)
So the coefficients are uniquely determined by the derivatives of the function, and vice versa.
On the other hand, just because we have an infinitely differentiable function doesn't mean that the numbers c_n obtained by c_n = f^{(n)}(0)/n! give a convergent power series. There is a theorem, which we will not prove, that given an arbitrary sequence {c_n}, there exists an infinitely differentiable function f such that c_n = f^{(n)}(0)/n!. Finally, even if the obtained series converges, it may not converge to the function we started with. For a simpler example, see Exercise 5.4.11 in volume I: The function

f(x) := e^{−1/x} if x > 0, and f(x) := 0 if x ≤ 0,

is infinitely differentiable, and all derivatives at the origin are zero. So its series at the origin would be just the zero series, and while that series converges, it does not converge to f for x > 0.
Note that we can always apply the affine transformation z ↦ z + a to convert a power series to a series at the origin. That is, if

f(z) = ∑_{n=0}^∞ c_n (z − a)^n , we consider f(z + a) = ∑_{n=0}^∞ c_n z^n .

Therefore it is usually sufficient to prove results about power series at the origin. From now on, we
often assume a = 0 for simplicity.

11.3.4 Power series as analytic functions


We need a theorem on swapping limits of series, that is, Fubini’s theorem for sums.
Theorem 11.3.5 (Fubini for sums). Let {a_{kj}}_{k,j=1}^∞ be a double sequence of complex numbers, and suppose that for every k the series

∑_{j=1}^∞ |a_{kj}| converges,

and furthermore that

∑_{k=1}^∞ ( ∑_{j=1}^∞ |a_{kj}| ) converges.

Then

∑_{k=1}^∞ ( ∑_{j=1}^∞ a_{kj} ) = ∑_{j=1}^∞ ( ∑_{k=1}^∞ a_{kj} ),

where all the series involved converge.
Proof. Let E be the set {1/n : n ∈ N} ∪ {0}, and treat it as a metric space with the metric inherited from R. Define the sequence of functions f_k : E → C by

f_k(1/n) := ∑_{j=1}^n a_{kj} and f_k(0) := ∑_{j=1}^∞ a_{kj} .

As the series converge, each f_k is continuous at 0 (since 0 is the only cluster point of E, they are continuous everywhere, but we don't need that). For all x ∈ E we have

|f_k(x)| ≤ ∑_{j=1}^∞ |a_{kj}| .

As ∑_k ∑_j |a_{kj}| converges and this bound does not depend on x, the series

∑_{k=1}^∞ f_k(x)

converges uniformly on E by the Weierstrass M-test. So define

g(x) := ∑_{k=1}^∞ f_k(x),

which is therefore continuous at 0. So
∑_{k=1}^∞ ( ∑_{j=1}^∞ a_{kj} ) = ∑_{k=1}^∞ f_k(0) = g(0) = lim_{n→∞} g(1/n)
= lim_{n→∞} ∑_{k=1}^∞ f_k(1/n) = lim_{n→∞} ∑_{k=1}^∞ ∑_{j=1}^n a_{kj}
= lim_{n→∞} ∑_{j=1}^n ( ∑_{k=1}^∞ a_{kj} ) = ∑_{j=1}^∞ ( ∑_{k=1}^∞ a_{kj} ) .
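A small numerical illustration of the theorem, using the (hypothetical) absolutely summable double sequence a_{kj} = 2^{−k} 3^{−j}, for which both hypotheses clearly hold:

```python
K = J = 60
# sum over j first, then k (rows), and over k first, then j (columns)
by_rows = sum(sum(2.0 ** -k * 3.0 ** -j for j in range(1, J)) for k in range(1, K))
by_cols = sum(sum(2.0 ** -k * 3.0 ** -j for k in range(1, K)) for j in range(1, J))
assert abs(by_rows - by_cols) < 1e-12
assert abs(by_rows - 0.5) < 1e-12  # = (sum_k 2^-k)(sum_j 3^-j) = 1 * (1/2)
```

Exercise 11.3.1 below shows what can go wrong when the absolute-summability hypothesis fails.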

Now we prove that once we have a series converging to a function in some interval, we can
expand the function around any point.
Theorem 11.3.6 (Taylor's theorem for real-analytic functions). Let

f(x) := ∑_{k=0}^∞ a_k x^k

be a power series converging in (−ρ, ρ) for some ρ > 0. Given any a ∈ (−ρ, ρ) and x such that |x − a| < ρ − |a|, we obtain

f(x) = ∑_{k=0}^∞ (f^{(k)}(a)/k!) (x − a)^k .

The power series at a could of course converge in a larger interval, but the one above is
guaranteed. It is the largest symmetric interval about a that fits in (−ρ , ρ ).
Proof. Given a and x as in the theorem, write

f(x) = ∑_{k=0}^∞ a_k ((x − a) + a)^k = ∑_{k=0}^∞ a_k ∑_{m=0}^k (k choose m) a^{k−m} (x − a)^m .

Define c_{k,m} := a_k (k choose m) a^{k−m} if m ≤ k, and c_{k,m} := 0 if m > k. Then

f(x) = ∑_{k=0}^∞ ∑_{m=0}^∞ c_{k,m} (x − a)^m .    (11.2)

Let us show that the double sum converges absolutely:

∑_{k=0}^∞ ∑_{m=0}^∞ |c_{k,m} (x − a)^m| = ∑_{k=0}^∞ ∑_{m=0}^k |a_k| (k choose m) |a|^{k−m} |x − a|^m
= ∑_{k=0}^∞ |a_k| ∑_{m=0}^k (k choose m) |a|^{k−m} |x − a|^m
= ∑_{k=0}^∞ |a_k| (|x − a| + |a|)^k ,

and this series converges as long as |x − a| + |a| < ρ, in other words if |x − a| < ρ − |a|.
Using Theorem 11.3.5, swap the order of summation in (11.2); the resulting series converges when |x − a| < ρ − |a|:

f(x) = ∑_{k=0}^∞ ∑_{m=0}^∞ c_{k,m} (x − a)^m = ∑_{m=0}^∞ ( ∑_{k=0}^∞ c_{k,m} ) (x − a)^m .

The formula in terms of derivatives at a follows by differentiating the series to obtain (11.1).



Note that if a series converges for real x ∈ (a − ρ , a + ρ ) it also converges for all complex
numbers in B(a, ρ ). We have the following corollary.
Corollary 11.3.7. If ∑ c_k (z − a)^k converges to f(z) in B(a, ρ) and b ∈ B(a, ρ), then there exists a power series ∑ d_k (z − b)^k that converges to f(z) in B(b, ρ − |b − a|).
Proof. Without loss of generality assume that a = 0. We can rotate to assume that b is real, but since that is harder to picture, let us do it explicitly. Let α := |b|/b. Notice that

|1/α| = |α| = 1.

Therefore the series ∑ c_k (z/α)^k = ∑ c_k α^{−k} z^k converges to f(z/α) in B(0, ρ). When z = x is real, we apply Theorem 11.3.6 at the point |b| and get a series that converges to f(z/α) on B(|b|, ρ − |b|). That is, there is a convergent series

f(z/α) = ∑_{k=0}^∞ a_k (z − |b|)^k .

Using α b = |b|, we find

f(z) = f(α z / α) = ∑_{k=0}^∞ a_k (α z − |b|)^k = ∑_{k=0}^∞ a_k α^k (z − |b|/α)^k = ∑_{k=0}^∞ a_k α^k (z − b)^k ,

and this series converges for all z such that |α z − |b|| < ρ − |b|, that is, |z − b| < ρ − |b|.
We proved above that a convergent power series is an analytic function where it converges. We have also shown before that 1/(1−z) is analytic outside of z = 1.
Note that just because a real-analytic function is analytic on the entire real line, it does not necessarily have a power series representation that converges everywhere. For example, the function

f(x) = 1/(1 + x^2)

happens to be a real-analytic function on R (exercise). A power series around the origin converging to f has a radius of convergence of exactly 1. Can you see why? (exercise)

11.3.5 Identity theorem for analytic functions


Lemma 11.3.8. Suppose f(z) = ∑ a_k z^k is a convergent power series and {z_n} is a sequence of nonzero complex numbers converging to 0 such that f(z_n) = 0 for all n. Then a_k = 0 for every k.
Proof. By continuity we know f(0) = 0, so a_0 = 0. Suppose there exists some nonzero a_k, and let m be the smallest index such that a_m ≠ 0. Then

f(z) = ∑_{k=m}^∞ a_k z^k = z^m ∑_{k=m}^∞ a_k z^{k−m} = z^m ∑_{k=0}^∞ a_{k+m} z^k .

Write g(z) := ∑_{k=0}^∞ a_{k+m} z^k (this series converges on the same set as f). The function g is continuous and g(0) = a_m ≠ 0. Thus there exists some δ > 0 such that g(z) ≠ 0 for all z ∈ B(0, δ). As f(z) = z^m g(z), the only point in B(0, δ) where f(z) = 0 is z = 0, but this contradicts the assumption that f(z_n) = 0 for all n.

Recall that in a metric space X, a cluster point (or sometimes limit point) of a set E is a point
p ∈ X such that B(p, ε ) \ {p} contains points of E for all ε > 0.
Theorem 11.3.9 (Identity theorem). Let U ⊂ C be an open connected set. If f : U → C and
g : U → C are analytic functions that are equal on a set E ⊂ U, and E has a cluster point in U,
then f (z) = g(z) for all z ∈ U.
In most common applications of this theorem E is an open set or perhaps a curve.
Proof. Without loss of generality suppose E is the set of all points z ∈ U such that g(z) = f (z).
Note that E must be closed as f and g are continuous.
Suppose E has a cluster point. Without loss of generality assume that 0 is this cluster point.
Near 0, we have the expansions
f(z) = ∑_{k=0}^∞ a_k z^k and g(z) = ∑_{k=0}^∞ b_k z^k ,

which converge in some ball B(0, ρ). Therefore the series

0 = f(z) − g(z) = ∑_{k=0}^∞ (a_k − b_k) z^k

converges in B(0, ρ). As 0 is a cluster point of E, there is a sequence {z_n} of nonzero points converging to 0 such that f(z_n) − g(z_n) = 0. Therefore, by the lemma above, a_k = b_k for all k, and hence B(0, ρ) ⊂ E.
This means that E is open. As E is also closed, and U is connected, we conclude that E = U.
By restricting our attention to real x we obtain the same theorem for connected open subsets of
R, which are just open intervals.

11.3.6 Exercises
Exercise 11.3.1: Let

a_{kj} := 1 if k = j, a_{kj} := −2^{k−j} if k < j, and a_{kj} := 0 otherwise.

Compute (or show the limit doesn't exist):
a) ∑_{j=1}^∞ |a_{kj}| for any k, b) ∑_{k=1}^∞ |a_{kj}| for any j, c) ∑_{k=1}^∞ ∑_{j=1}^∞ |a_{kj}|, d) ∑_{k=1}^∞ ∑_{j=1}^∞ a_{kj}, e) ∑_{j=1}^∞ ∑_{k=1}^∞ a_{kj}.
Hint: Fubini for sums does not apply; in fact, the answers to d) and e) are different.
Exercise 11.3.2: Let f(x) := 1/(1 + x^2). Prove that
a) f is an analytic function on all of R by finding a power series for f at every a ∈ R,
b) the radius of convergence of the power series for f at the origin is 1.

Exercise 11.3.3: Suppose f : C → C is an analytic function that is not identically zero. Show that for each n there are at most finitely many zeros of f in B(0, n), that is, f^{−1}(0) ∩ B(0, n) is finite for each n.

Exercise 11.3.4: Suppose U ⊂ C is a connected open set, 0 ∈ U, and f : U → C is an analytic function. Treating f as a function of a real x at the origin, suppose f^{(n)}(0) = 0 for all n. Show that f(z) = 0 for all z ∈ U.

Exercise 11.3.5: Suppose U ⊂ C is a connected open set, 0 ∈ U, and f : U → C is an analytic function. For
real x and y, let h(x) := f (x) and g(y) := −i f (iy). Show that h and g are infinitely differentiable at the origin
and h′ (0) = g′ (0).

Exercise 11.3.6: Suppose f is analytic at least in some neighborhood of the origin. Suppose further that there exists an M such that |f^{(n)}(0)| ≤ M for all n. Prove that the series of f at the origin converges for all z ∈ C.

Exercise 11.3.7: Suppose f(z) := ∑ c_n z^n has radius of convergence 1. Suppose f(0) = 0, but f is not the zero function. Show that there exists a k ∈ N and a convergent power series g(z) := ∑ d_n z^n with radius of convergence 1 such that f(z) = z^k g(z) for all z ∈ B(0, 1), and g(0) ≠ 0.

Exercise 11.3.8: Suppose U ⊂ C is open and connected, f : U → C is analytic, U ∩ R ≠ ∅, and f(x) = 0 for all x ∈ U ∩ R. Show that f(z) = 0 for all z ∈ U.

Exercise 11.3.9: For α ∈ C and k = 0, 1, 2, 3, . . ., define

(α choose k) := α(α − 1) · · · (α − k + 1) / k! .

a) Show that the series

f(z) := ∑_{k=0}^∞ (α choose k) z^k

converges whenever |z| < 1. In fact, prove that for α = 0, 1, 2, 3, . . . the radius of convergence is ∞, and for all other α the radius of convergence is 1.
b) Show that for x ∈ R, |x| < 1, we have

(1 + x) f′(x) = α f(x),

meaning that f(x) = (1 + x)^α .

Exercise 11.3.10: Suppose f : C → C is analytic and suppose that for some open interval (a, b) ⊂ R, f is
real valued on (a, b). Show that f is real-valued on R.

Exercise 11.3.11: Let D = B(0, 1) be the unit disc. Suppose f : D → C is analytic with power series ∑ c_n z^n. Suppose |c_n| ≤ 1 for all n. Prove that for all z ∈ D we have |f(z)| ≤ 1/(1 − |z|).

11.4 The complex exponential and the trigonometric functions


Note: 1 lecture

11.4.1 The complex exponential


Define

E(z) := ∑_{n=0}^∞ (1/n!) z^n .

This series converges for all z ∈ C. We notice that E(0) = 1, and that for z = x ∈ R, E(x) ∈ R.
Keeping x real, we find

(d/dx) E(x) = E(x)

by direct calculation. In §5.4 of volume I (or by Picard's theorem), we proved that the unique function satisfying E′ = E and E(0) = 1 is the exponential. In other words, for x ∈ R, e^x = E(x).
For complex numbers z we define

e^z := E(z) = ∑_{k=0}^∞ (1/k!) z^k .

On the real line this new definition agrees with our previous one; see Figure 11.7. Notice that in the x direction (the real direction) the graph behaves like the real exponential, and in the y direction (the imaginary direction) the graph oscillates.

Figure 11.7: Graphs of the real part (left) and imaginary part (right) of the complex exponential e^z = e^{x+iy}. The x-axis goes from −4 to 4, the y-axis goes from −6 to 6, and the vertical axis goes from −e^6 ≈ −54.6 to e^6 ≈ 54.6. The plot of the real exponential (y = 0) is marked in a bold line.

Proposition 11.4.1. Let z, w ∈ C be complex numbers. Then

e^{z+w} = e^z e^w .

Proof. We know e^{x+y} = e^x e^y is true for real numbers x and y. For any fixed y ∈ R, the functions z ↦ e^{z+y} and z ↦ e^z e^y are analytic and agree on the real line, so by the identity theorem (Theorem 11.3.9), e^{z+y} = e^z e^y for all z ∈ C. Fixing an arbitrary z ∈ C, we get e^{z+y} = e^z e^y for all y ∈ R, and again by the identity theorem, e^{z+w} = e^z e^w for all z, w ∈ C.

A simple consequence is that e^z ≠ 0 for all z ∈ C, as e^z e^{−z} = e^{z−z} = 1.
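Both the series definition and the addition formula are easy to check numerically; a sketch comparing partial sums of E(z) with the library exponential:

```python
import cmath

def E(z, N=60):
    # partial sum of the exponential series sum z^n / n!
    term, total = 1 + 0j, 0 + 0j
    for n in range(N):
        total += term
        term *= z / (n + 1)
    return total

z, w = 0.3 + 1.2j, -0.7 + 0.4j  # arbitrary sample points
assert abs(E(z) - cmath.exp(z)) < 1e-12
assert abs(E(z + w) - E(z) * E(w)) < 1e-12  # e^(z+w) = e^z e^w
```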

11.4.2 Trigonometric functions and π


We can now finally define sine and cosine by the equation

e^{x+iy} = e^x (cos(y) + i sin(y)) .

In fact, we define sine and cosine for all complex z:

cos(z) := (e^{iz} + e^{−iz})/2 and sin(z) := (e^{iz} − e^{−iz})/(2i) .
Let us use our definition to prove the common properties we usually associate with sine and
cosine. In the process we also define the number π .

Proposition 11.4.2. The sine and cosine functions have the following properties:
(i) For all z ∈ C, e^{iz} = cos(z) + i sin(z) (Euler's formula).
(ii) cos(0) = 1 and sin(0) = 0.
(iii) For all z ∈ C, cos(−z) = cos(z) and sin(−z) = − sin(z).
(iv) For all z ∈ C,

cos(z) = ∑_{k=0}^∞ ((−1)^k/(2k)!) z^{2k} and sin(z) = ∑_{k=0}^∞ ((−1)^k/(2k+1)!) z^{2k+1} .

(v) For all x ∈ R, cos(x) = Re(e^{ix}) and sin(x) = Im(e^{ix}).
(vi) For all x ∈ R, (cos(x))^2 + (sin(x))^2 = 1.
(vii) For all x ∈ R, |sin(x)| ≤ 1 and |cos(x)| ≤ 1.
(viii) For all x ∈ R,

(d/dx) cos(x) = − sin(x) and (d/dx) sin(x) = cos(x).

(ix) For all x ≥ 0, sin(x) ≤ x.
(x) There exists an x > 0 such that cos(x) = 0. We define

π := 2 inf{x > 0 : cos(x) = 0}.

(xi) For all z ∈ C, e^{2πi} = 1 and e^{z+i2π} = e^z.
(xii) Sine and cosine are 2π-periodic and not periodic with any smaller period. That is, 2π is the smallest number such that for all z ∈ C,

sin(z + 2π) = sin(z) and cos(z + 2π) = cos(z).

(xiii) The function x ↦ e^{ix} is a bijective map from [0, 2π) onto the set of z ∈ C such that |z| = 1.

The proposition immediately implies that sin(x) and cos(x) are real whenever x is real.

Proof. The first three items follow directly from the definition. The computation of the power series for both is left as an exercise.
As complex conjugation is a continuous function, the definition of e^z implies that the conjugate of e^z is e^{z̄}. If x is real, the conjugate of e^{ix} is e^{−ix}. Thus for real x, cos(x) = Re(e^{ix}) and sin(x) = Im(e^{ix}).
For real x we compute

1 = e^{ix} e^{−ix} = |e^{ix}|^2 = (cos(x))^2 + (sin(x))^2 .

In particular, e^{ix} is unimodular: its values lie on the unit circle. A square is always nonnegative, so

(sin(x))^2 = 1 − (cos(x))^2 ≤ 1.

Hence |sin(x)| ≤ 1, and similarly |cos(x)| ≤ 1.


We leave the computation of the derivatives to the reader as exercises.
Let us now prove that sin(x) ≤ x for x ≥ 0. Consider f(x) := x − sin(x) and differentiate:

f′(x) = (d/dx)(x − sin(x)) = 1 − cos(x) ≥ 0

for all x, as |cos(x)| ≤ 1. In other words, f is increasing and f(0) = 0, so f must be nonnegative when x ≥ 0.
We claim there exists a positive x such that cos(x) = 0. As cos(0) = 1 > 0, cos(x) > 0 for x near 0. Suppose cos(x) > 0 on [0, y). Then sin(x) is strictly increasing on [0, y), and as sin(0) = 0, we get sin(x) > 0 for x ∈ (0, y). Take a ∈ (0, y). By the mean value theorem there is a c ∈ (a, y) such that

2 ≥ cos(a) − cos(y) = sin(c)(y − a) ≥ sin(a)(y − a).

As a ∈ (0, y), then sin(a) > 0, and so

y ≤ 2/sin(a) + a.

Hence the set of y > 0 such that cos(x) > 0 on [0, y) is bounded above; from now on, let y denote the largest such number. By continuity, cos(y) = 0; in fact, y is the smallest positive number where cosine vanishes. As mentioned, π is defined to be 2y.
As cos(π/2) = 0, then (sin(π/2))^2 = 1, and as sin was positive on (0, y), we have sin(π/2) = 1. Hence

e^{iπ/2} = i,

and by the addition formula,

e^{iπ} = −1 and e^{i2π} = 1.

So e^{i2π} = 1 = e^0. The addition formula says

e^{z+i2π} = e^z

for all z ∈ C. Immediately we also obtain cos(z + 2π) = cos(z) and sin(z + 2π) = sin(z), so sin and cos are 2π-periodic.
We claim that sin and cos are not periodic with a smaller period. It would be enough to show that if e^{ix} = 1 for the smallest positive x, then x = 2π. So let x be the smallest positive number such that e^{ix} = 1. Of course, x ≤ 2π. By the addition formula,

(e^{ix/4})^4 = 1.

If e^{ix/4} = a + ib, then

(a + ib)^4 = a^4 − 6a^2 b^2 + b^4 + i 4ab(a^2 − b^2) = 1.

As x/4 ≤ π/2, we have a = cos(x/4) ≥ 0 and 0 < b = sin(x/4). For the imaginary part to vanish, either a = 0 or a^2 = b^2. If a^2 = b^2, then a^4 − 6a^2 b^2 + b^4 = −4a^4 < 0, and in particular not equal to 1. Therefore a = 0, in which case x/4 = π/2. Hence 2π is the smallest period we could choose for e^{ix}, and so also for cos and sin.
Finally, we wish to show that e^{ix} is one-to-one and onto from the set [0, 2π) to the set of z ∈ C such that |z| = 1. Suppose e^{ix} = e^{iy} and x > y. Then e^{i(x−y)} = 1, meaning that x − y is a multiple of 2π, and hence only one of them can lie in [0, 2π). To show onto, pick (a, b) ∈ R^2 such that a^2 + b^2 = 1. Suppose first that a, b ≥ 0. By the intermediate value theorem there must exist an x ∈ [0, π/2] such that cos(x) = a, and hence b^2 = (sin(x))^2. As b and sin(x) are nonnegative, we have b = sin(x). Since − sin(x) is the derivative of cos(x) and cos(−x) = cos(x), we get sin(x) < 0 for x ∈ [−π/2, 0). Using the same reasoning, if a ≥ 0 and b ≤ 0, we can find an x in [−π/2, 0), and by periodicity x ∈ [3π/2, 2π), such that cos(x) = a and sin(x) = b. Multiplying by −1 is the same as multiplying by e^{iπ} or e^{−iπ}, so we can always assume that a ≥ 0 (the details are left as an exercise).
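The definition π := 2 inf{x > 0 : cos(x) = 0} can be compared against the familiar constant, using nothing but the exponential-based definition of cosine and bisection; a small sketch:

```python
import cmath, math

def cos_c(z):
    # cosine as defined from the complex exponential above
    return (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2

lo, hi = 0.0, 2.0  # cos(0) = 1 > 0 and cos(2) < 0, so the first zero lies in (0, 2)
for _ in range(60):
    mid = (lo + hi) / 2
    if cos_c(mid).real > 0:
        lo = mid
    else:
        hi = mid
assert abs(2 * lo - math.pi) < 1e-12  # the first positive zero is pi/2
```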

11.4.3 The unit circle and polar coordinates


The arclength of a curve parametrized by γ : [a, b] → C is given by

∫_a^b |γ′(t)| dt.

The map t ↦ e^{it} for t ∈ [0, 2π) parametrizes the circle. As (d/dt) e^{it} = i e^{it}, the circumference of the circle is

∫_0^{2π} |i e^{it}| dt = ∫_0^{2π} 1 dt = 2π.

More generally, we notice that e^{it} parametrizes the circle by arclength. That is, t measures the arclength along the unit circle, and hence the angle in radians. So the definitions of sin and cos we have used above agree with the standard geometric definitions.
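A Riemann-sum approximation of the arclength integral is a quick consistency check; a sketch:

```python
import cmath, math

N = 100_000
dt = 2 * math.pi / N
# gamma(t) = e^{it}, so gamma'(t) = i e^{it} and |gamma'(t)| = 1
length = sum(abs(1j * cmath.exp(1j * (k * dt))) * dt for k in range(N))
assert abs(length - 2 * math.pi) < 1e-9
```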
All the points on the unit circle can be achieved as e^{it} for some t. Therefore, we can write any complex number z ∈ C in so-called polar coordinates as

z = r e^{iθ}

for some r ≥ 0 and θ ∈ R. The θ is, of course, not unique, as θ and θ + 2π give the same number.
The formula e^{a+b} = e^a e^b leads to a useful formula for powers and products of complex numbers in polar coordinates:

(r e^{iθ})^n = r^n e^{inθ} and (r e^{iθ})(s e^{iγ}) = r s e^{i(θ+γ)} .
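These polar-coordinate formulas can be verified numerically; a sketch with arbitrary sample values:

```python
import cmath

r, theta = 1.5, 0.7
s, gam = 2.0, -1.1
z = r * cmath.exp(1j * theta)
w = s * cmath.exp(1j * gam)
# (r e^{i theta})^n = r^n e^{i n theta}
assert abs(z ** 5 - r ** 5 * cmath.exp(1j * 5 * theta)) < 1e-12
# (r e^{i theta})(s e^{i gam}) = r s e^{i (theta + gam)}
assert abs(z * w - r * s * cmath.exp(1j * (theta + gam))) < 1e-12
```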

11.4.4 Exercises
Exercise 11.4.1: Derive the power series for sin(z) and cos(z) at the origin.
Exercise 11.4.2: Using the power series, show that for x real we have (d/dx) sin(x) = cos(x) and (d/dx) cos(x) = − sin(x).

Exercise 11.4.3: Finish the proof of the argument that x ↦ e^{ix} from [0, 2π) is onto the unit circle. In particular, assume that we get all points of the form (a, b) where a^2 + b^2 = 1 for a ≥ 0. By multiplying by e^{iπ} or e^{−iπ}, show that we get everything.

Exercise 11.4.4: Prove that there is no z ∈ C such that e^z = 0.

Exercise 11.4.5: Prove that for every w ≠ 0 and every ε > 0, there exists a z ∈ C, |z| < ε, such that e^{1/z} = w.
Exercise 11.4.6: We showed (cos(x))^2 + (sin(x))^2 = 1 for all x ∈ R. Prove that (cos(z))^2 + (sin(z))^2 = 1 for all z ∈ C.

Exercise 11.4.7: Prove the trigonometric identities sin(z + w) = sin(z) cos(w) + cos(z) sin(w) and cos(z +
w) = cos(z) cos(w) − sin(z) sin(w) for all z, w ∈ C.
Exercise 11.4.8: Define sinc(z) := sin(z)/z for z ≠ 0 and sinc(0) := 1. Show that sinc is analytic and compute its power series at zero.

Define the hyperbolic sine and hyperbolic cosine by

sinh(z) := (e^z − e^{−z})/2 and cosh(z) := (e^z + e^{−z})/2 .

Exercise 11.4.9: Derive the power series for the hyperbolic sine and cosine.

Exercise 11.4.10: Show
a) sinh(0) = 0 and cosh(0) = 1.
b) (d/dx) sinh(x) = cosh(x) and (d/dx) cosh(x) = sinh(x).
c) cosh(x) > 0 for all x ∈ R, and sinh(x) is strictly increasing and bijective from R to R.
d) (cosh(x))^2 = 1 + (sinh(x))^2 for all x.
Exercise 11.4.11: Define tan(x) := sin(x)/cos(x) as usual.
a) Show that for x ∈ (−π/2, π/2) both sin and tan are strictly increasing, and hence sin^{−1} and tan^{−1} exist when we restrict to that interval.
b) Show that sin^{−1} and tan^{−1} are differentiable and that (d/dx) sin^{−1}(x) = 1/√(1 − x^2) and (d/dx) tan^{−1}(x) = 1/(1 + x^2).
c) Using the finite geometric sum formula, show that

tan^{−1}(x) = ∫_0^x 1/(1 + t^2) dt = ∑_{k=0}^∞ ((−1)^k/(2k + 1)) x^{2k+1}

converges for all −1 ≤ x ≤ 1 (including the endpoints). Hint: Integrate the finite sum, not the series.
d) Use this to show that

1 − 1/3 + 1/5 − · · · = ∑_{k=0}^∞ (−1)^k/(2k + 1) = π/4 .

11.5 Fundamental theorem of algebra


Note: half a lecture, optional
In this section we study the local behaviour of polynomials and the growth of polynomials as z goes to infinity. As an application we prove the fundamental theorem of algebra: every nonconstant complex polynomial has a root.

Lemma 11.5.1. Let p(z) be a complex polynomial. If p(z_0) ≠ 0, then there exists a w ∈ C such that |p(w)| < |p(z_0)|. In fact, we can pick w arbitrarily close to z_0.

Proof. Without loss of generality assume that z_0 = 0 and p(0) = 1. Write

p(z) = 1 + a_k z^k + a_{k+1} z^{k+1} + · · · + a_d z^d ,

where a_k ≠ 0. Pick t such that a_k e^{ikt} = −|a_k|, which we can do by the discussion on trigonometric functions. Suppose r > 0 is small enough that 1 − r^k |a_k| > 0. We have

p(r e^{it}) = 1 − r^k |a_k| + r^{k+1} a_{k+1} e^{i(k+1)t} + · · · + r^d a_d e^{idt} .

So by the triangle inequality,

|p(r e^{it})| ≤ |1 − r^k |a_k|| + |r^{k+1} a_{k+1} e^{i(k+1)t} + · · · + r^d a_d e^{idt}| .

In other words,

|p(r e^{it})| ≤ 1 − r^k ( |a_k| − r |a_{k+1} e^{i(k+1)t} + · · · + r^{d−k−1} a_d e^{idt}| ) .

For small enough r, the expression in the parentheses is positive, as |a_k| > 0. For such r, |p(r e^{it})| < 1 = |p(0)|, and r can be taken arbitrarily small.

Remark 11.5.2. The above lemma holds, with an essentially unchanged proof, for (complex) analytic functions. A proof of this generalization is left as an exercise to the reader. What the lemma says is that the only local minima of the modulus of an analytic function (in particular, of a polynomial) occur precisely at its zeros.
Remark 11.5.3. The lemma does not hold if we restrict to real numbers. For example, x^2 + 1 has a minimum at x = 0, but no zero there. The point is that there is a w arbitrarily close to 0 such that |w^2 + 1| < 1, but this w is necessarily not real. Letting w = iε for small ε > 0 works.
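The w = iε observation is immediate to verify numerically; a one-line sketch:

```python
# For p(x) = x^2 + 1, |p| has a minimum at 0 over R, but not over C:
# w = i*eps gives p(w) = 1 - eps^2, strictly smaller in modulus than p(0) = 1.
eps = 1e-3
w = 1j * eps
assert abs(w * w + 1) < 1
assert abs(abs(w * w + 1) - (1 - eps ** 2)) < 1e-15
```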
The moral of the story is that if p(0) = 1, then very close to 0 the polynomial looks like 1 + a z^k, and this has no minimum of its modulus at the origin. All the higher powers of z are too small to make a difference. We find similar behavior at infinity.

Lemma 11.5.4. Let p(z) be a nonconstant complex polynomial. Then for every M there exists an R such that |p(z)| ≥ M whenever |z| ≥ R.

Proof. Write p(z) = a_0 + a_1 z + · · · + a_d z^d with d ≥ 1 and a_d ≠ 0. Suppose |z| ≥ R (so also |z|^{−1} ≤ R^{−1}). We estimate:

|p(z)| ≥ |a_d z^d| − |a_0| − |a_1 z| − · · · − |a_{d−1} z^{d−1}|
= |z|^d ( |a_d| − |a_0| |z|^{−d} − |a_1| |z|^{−d+1} − · · · − |a_{d−1}| |z|^{−1} )
≥ R^d ( |a_d| − |a_0| R^{−d} − |a_1| R^{1−d} − · · · − |a_{d−1}| R^{−1} ) .

The expression in the parentheses is positive for large enough R. In particular, for large enough R it is greater than |a_d|/2, and so

|p(z)| ≥ R^d |a_d|/2 .

Therefore, we can pick R large enough that this is bigger than a given M.
The above lemma does not generalize to analytic functions, even those defined in all of C. The
function cos(z) is an obvious counterexample. Note that we had to look at the term with the largest
degree, and we only have such a term for a polynomial. In fact, something that we will not prove
is that an analytic function defined on all of C satisfying the conclusion of the lemma must be a
polynomial.
The moral of the story here is that for very large |z| (far away from the origin) a polynomial of
degree d really looks like a constant multiple of zd .
Theorem 11.5.5 (Fundamental theorem of algebra). Let p(z) be a nonconstant complex polynomial. Then there exists a z_0 ∈ C such that p(z_0) = 0.

Proof. Let µ := inf { |p(z)| : z ∈ C }. Find an R such that |p(z)| ≥ µ + 1 for all z with |z| ≥ R. Therefore, any z with |p(z)| close to µ must be in the closed ball C(0, R) = { z ∈ C : |z| ≤ R }. As |p(z)| is a continuous real-valued function, it achieves its minimum on the compact (closed and bounded) set C(0, R), and this minimum must be µ. So there is a z_0 ∈ C(0, R) such that |p(z_0)| = µ. As µ is the minimum of |p(z)| on all of C, the first lemma above (Lemma 11.5.1) gives |p(z_0)| = 0; otherwise we could find a point where the modulus is strictly smaller.
The theorem doesn’t generalize to analytic functions either. For example ez is an analytic
function on C with no zeros.
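The proof strategy (the infimum of |p| over a large disk is attained, and by the first lemma the minimum must be a zero) can be illustrated with a coarse grid search; a sketch for the hypothetical polynomial p(z) = z^2 + 1, whose roots are ±i:

```python
def p(z):
    return z * z + 1  # roots at +i and -i

# sample |p| on a grid covering the disk of radius 2 and take the minimizer
pts = [complex(x / 50, y / 50) for x in range(-100, 101) for y in range(-100, 101)]
z0 = min(pts, key=lambda z: abs(p(z)))
assert abs(p(z0)) < 1e-12                       # the minimum of |p| is (near) zero
assert min(abs(z0 - 1j), abs(z0 + 1j)) < 1e-12  # and it is attained at a root
```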

11.5.1 Exercises
Exercise 11.5.1: Prove Lemma 11.5.1 for an analytic function. That is, suppose that p(z) is a power series around z_0.

Exercise 11.5.2: Use the previous exercise to prove the maximum principle for analytic functions: if U ⊂ C is open and connected, f : U → C is analytic, and |f(z)| attains a relative maximum at some z_0 ∈ U, then f is constant.

Exercise 11.5.3: Let U ⊂ C be open and z_0 ∈ U. Suppose f : U → C is analytic and f(z_0) = 0. Show that there exists an ε > 0 such that either f(z) ≠ 0 for all z with 0 < |z − z_0| < ε, or f(z) = 0 for all z ∈ B(z_0, ε). In other words, zeros of analytic functions are isolated. Of course, the same holds for polynomials.

A rational function is a function f(z) := p(z)/q(z), where p and q are polynomials and q is not identically zero. A point z_0 ∈ C where f(z_0) = 0 (and therefore p(z_0) = 0) is called a zero. A point z_0 ∈ C is called a singularity of f if q(z_0) = 0. As zeros of polynomials are isolated, all singularities of rational functions are isolated; such a point is called an isolated singularity. An isolated singularity is called removable if lim_{z→z_0} f(z) exists. An isolated singularity is called a pole if lim_{z→z_0} |f(z)| = ∞. We say f has a pole at ∞ if

lim_{z→∞} |f(z)| = ∞,

that is, if for every M > 0 there exists an R > 0 such that |f(z)| > M for all z with |z| > R.

Exercise 11.5.4: Show that a rational function which is not identically zero has at most finitely many zeros
and singularities. In fact, show that if p is a polynomial of degree n > 0 it has at most n zeros.
Hint: If z0 is a zero of p, without loss of generality assume z0 = 0, then use induction.
Exercise 11.5.5: Suppose z0 is a removable singularity of a rational function f(z) := p(z)/q(z). Show that
there exist polynomials p̃ and q̃ such that q̃(z0) ≠ 0 and f(z) = p̃(z)/q̃(z).
Hint: Without loss of generality assume z0 = 0.

Exercise 11.5.6: Given a rational function f and an isolated singularity z0 , show that z0 is either removable
or a pole.
Hint: See the previous exercise.

Exercise 11.5.7: Let f be a rational function and let S ⊂ C be the set of the singularities of f. Prove that f is
equal to a polynomial on C \ S if and only if f has a pole at infinity and all the singularities are removable.
Hint: See previous exercises.
156 CHAPTER 11. FUNCTIONS AS LIMITS

11.6 Equicontinuity and the Arzelà–Ascoli theorem


Note: 2 lectures
We would like an analogue of Bolzano-Weierstrass. Something to the tune of “every bounded
sequence of functions (with some property) has a convergent subsequence.” Matters are not as
simple even for continuous functions. Not every bounded sequence in the metric space C([0, 1], R)
has a convergent subsequence.

Definition 11.6.1. Let X be any set and let { fn } be a sequence of functions fn : X → C. We say that { fn }
is pointwise bounded if for every x ∈ X, there is an Mx ∈ R such that

| fn (x)| ≤ Mx for all n ∈ N.

We say that { fn } is uniformly bounded if there is an M ∈ R such that

| fn (x)| ≤ M for all n ∈ N and all x ∈ X.

If X is a compact metric space, then a sequence in C(X, C) is uniformly bounded if and only if it is
bounded as a set in the metric space C(X, C) using the uniform norm.

Example 11.6.2: There exist sequences of continuous functions on [0, 1] that are uniformly bounded
but contain no subsequence converging even pointwise. Let us state without proof that fn (x) :=
sin(2π nx) is one such sequence. Below we will show that there must always exist a subsequence
converging at countably many points, but [0, 1] is uncountable.

Example 11.6.3: The sequence fn(x) := xⁿ of functions on [0, 1] is uniformly bounded, but contains
no subsequence that converges uniformly, although the sequence converges pointwise to a discontinuous
function.

Example 11.6.4: The sequence { fn } of functions in C([0, 1], R) given by fn(x) := n²(1 − x)xⁿ
converges pointwise to the zero function (use the ratio test for x < 1). As for each x, { fn(x)}
converges to 0, it is bounded, so { fn } is pointwise bounded.
By calculus we find that the maximum of each fn on [0, 1] is at the critical point x = n/(n+1), and

‖ fn ‖u = fn( n/(n+1) ) = n ( n/(n+1) )^{n+1}.

It is left to the reader to check that lim ( n/(n+1) )^{n+1} = e⁻¹, and so lim ‖ fn ‖u = ∞; in other words,
this sequence is not uniformly bounded.
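The computation above is easy to sanity-check numerically. The following sketch (plain Python; the brute-force grid search is my own illustration, not part of the text) evaluates ‖ fn ‖u = n (n/(n+1))^{n+1}, compares it against a grid maximum of fn, and confirms that (n/(n+1))^{n+1} settles near e⁻¹ while the norms grow without bound.

```python
import math

def ratio(n):
    # (n/(n+1))**(n+1), which tends to 1/e as n grows
    return (n / (n + 1)) ** (n + 1)

def sup_norm(n):
    # the closed form ||f_n||_u = n * (n/(n+1))**(n+1) derived above
    return n * ratio(n)

def brute_max(n, steps=10000):
    # brute-force maximum of f_n(x) = n^2 (1 - x) x^n over a grid in [0, 1]
    return max(n ** 2 * (1 - j / steps) * (j / steps) ** n
               for j in range(steps + 1))

print(ratio(1000), 1 / math.e)       # nearly equal
print(sup_norm(10), brute_max(10))   # agree to several digits
print(sup_norm(1000))                # large: the norms are unbounded
```

The closed form and the grid maximum agree, while sup_norm grows roughly like n/e.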

When the domain is countable, we can always guarantee at least pointwise convergence. The
proof uses a very common and useful diagonal argument.

Proposition 11.6.5. Let X be a countable set and let { fn } be a pointwise bounded sequence of
functions fn : X → C. Then { fn } has a subsequence that converges pointwise.

Proof. Let {xn}∞n=1 be an enumeration of the elements of X. The sequence { fn(x1)}∞n=1 is bounded
and hence we have a subsequence of { fn}∞n=1, which we denote by { f1,k}∞k=1, such that { f1,k(x1)}∞k=1
converges. Next, { f1,k(x2)}∞k=1 is bounded and so { f1,k}∞k=1 has a subsequence { f2,k}∞k=1 such that
{ f2,k(x2)}∞k=1 converges. Note that { f2,k(x1)}∞k=1 is still convergent.
In general, we have a sequence { fm,k}∞k=1 that makes { fm,k(xj)}∞k=1 converge for all j ≤ m, and
we let { fm+1,k}∞k=1 be the subsequence of { fm,k}∞k=1 such that { fm+1,k(xm+1)}∞k=1 converges (and
hence it converges for all xj for j = 1, 2, . . . , m+1), and we rinse and repeat.
If X is finite we are done as the process stops at some point. If X is countably infinite, we
pick the diagonal sequence { fk,k}∞k=1. This is a subsequence of the original sequence { fn}∞n=1. For any m,
the tail { fk,k}∞k=m is a subsequence of { fm,k}∞k=1, and hence for any m the sequence { fk,k(xm)}∞k=1
converges.
For uncountable sets, we need the functions of the sequence to be related. We look at
continuous functions, and the concept we need is equicontinuity.
Definition 11.6.6. Let X be a metric space. A set S of functions f : X → C is said to be uniformly
equicontinuous if for every ε > 0, there is a δ > 0 such that if x, y ∈ X with d(x, y) < δ we have

| f (x) − f (y)| < ε for all f ∈ S.

Notice the functions in a uniformly equicontinuous sequence are all uniformly continuous. It is
not hard to show that a finite set of uniformly continuous functions is uniformly equicontinuous.
The definition is really interesting if S is infinite.
And just as for continuity, one can define equicontinuity at a point. That is, S is equicontinuous at
x ∈ X if for every ε > 0, there is a δ > 0 such that if y ∈ X with d(x, y) < δ we have | f (x) − f (y)| < ε
for all f ∈ S. We will only deal with compact X here, and one can prove (exercise) that for a compact
metric space X, if S is equicontinuous at every x ∈ X, then it is uniformly equicontinuous. For
simplicity we stick to uniform equicontinuity.
Proposition 11.6.7. Suppose (X, d) is a compact metric space, fn ∈ C(X, C), and { fn } converges
uniformly. Then { fn } is uniformly equicontinuous.
Proof. Let ε > 0 be given. As { fn } converges uniformly, there is an N ∈ N such that for all n ≥ N

| fn (x) − fN (x)| < ε/3 for all x ∈ X.

As X is compact, any continuous function is uniformly continuous. So { f1 , f2 , . . . , fN } is a finite set


of uniformly continuous functions. And so, as we mentioned above, it is uniformly equicontinuous.
Hence there is a δ > 0 such that

| f j (x) − f j (y)| < ε/3 < ε

whenever d(x, y) < δ and 1 ≤ j ≤ N.


Take n > N. For d(x, y) < δ we have

| fn (x) − fn (y)| ≤ | fn (x) − fN (x)| + | fN (x) − fN (y)| + | fN (y) − fn (y)| < ε/3 + ε/3 + ε/3 = ε .

Proposition 11.6.8. A compact metric space X contains a countable dense subset; that is, there
exists a countable D ⊂ X such that D̄ = X.

Proof. For each n ∈ N there are finitely many balls of radius 1/n that cover X (as X is compact).
That is, for every n, there exists a finite set of points xn,1, xn,2, . . . , xn,kn such that

X = B(xn,1, 1/n) ∪ B(xn,2, 1/n) ∪ · · · ∪ B(xn,kn, 1/n).

Let D := ⋃∞n=1 {xn,1, xn,2, . . . , xn,kn}. The set D is countable as it is a countable union of finite sets.
For every x ∈ X and every ε > 0, there exists an n such that 1/n < ε and an xn,j ∈ D such that

x ∈ B(xn,j, 1/n) ⊂ B(xn,j, ε).

Hence x ∈ D̄, so D̄ = X and D is dense.
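The proof is constructive, and for a concrete space it is easy to carry out. The sketch below (plain Python; the choice X = [0, 1] and the particular 1/n-nets are my own illustration, not from the text) builds a truncated version of the countable set D out of finite 1/n-nets and checks that sample points lie as close to D as the truncation allows.

```python
def net(n):
    # the centers {j/n : j = 0, ..., n} form a finite 1/n-net of [0, 1]:
    # every point of [0, 1] is within 1/n of some center
    return [j / n for j in range(n + 1)]

def D_up_to(N):
    # union of the nets for n = 1, ..., N (a finite chunk of the countable D)
    pts = set()
    for n in range(1, N + 1):
        pts.update(net(n))
    return sorted(pts)

def dist_to(D, x):
    return min(abs(x - d) for d in D)

D = D_up_to(50)
sample = [0.0, 0.123456, 0.5, 0.987654, 1.0]
print(max(dist_to(D, x) for x in sample))   # at most 1/100, from the n = 50 net
```

Taking the union over all n (not just n ≤ 50) gives the countable dense set of the proposition.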


We are now ready for the main result of this section, the Arzelà–Ascoli theorem about existence
of convergent subsequences.
Theorem 11.6.9 (Arzelà–Ascoli). Let (X, d) be a compact metric space, and let { fn } be a pointwise
bounded and uniformly equicontinuous sequence of functions fn ∈ C(X, C). Then { fn } is uniformly
bounded and { fn } contains a uniformly convergent subsequence.
Basically, a uniformly equicontinuous sequence in the metric space C(X, C) that is pointwise
bounded is bounded (in C(X, C)) and furthermore contains a convergent subsequence in C(X, C).
As we mentioned before, as X is compact, it is enough to just assume that { fn } is equicontinuous
as uniform equicontinuity is automatic via an exercise.
Proof. Let us first show that the sequence is uniformly bounded.
By uniform equicontinuity we have that there is a δ > 0 such that for all x ∈ X and all n ∈ N

B(x, δ ) ⊂ fn−1 B( fn (x), 1) .

The space X is compact, so there exist x1, x2, . . . , xk such that

X = B(x1, δ) ∪ B(x2, δ) ∪ · · · ∪ B(xk, δ).

As { fn } is pointwise bounded there exist M1 , M2 , . . . , Mk such that for j = 1, 2, . . . , k we have

| fn (x j )| ≤ M j for all n.

Let M := 1 + max{M1, M2, . . . , Mk}. Given any x ∈ X, there is a j such that x ∈ B(xj, δ). Therefore,
for all n we have x ∈ fn⁻¹( B( fn(xj), 1) ), or in other words

| fn (x) − fn (x j )| < 1.

By reverse triangle inequality,

| fn (x)| < 1 + | fn (x j )| ≤ 1 + M j ≤ M.

Named after the Italian mathematicians Cesare Arzelà (1847–1912) and Giulio Ascoli (1843–1896).

And as x was arbitrary, { fn } is uniformly bounded.


Next, pick a countable dense subset D ⊂ X. By Proposition 11.6.5, we find a subsequence { fnj }
that converges pointwise on D. Write gj := fnj for simplicity. The sequence {gn} is uniformly
equicontinuous. Let ε > 0 be given; then there exists a δ > 0 such that for all x ∈ X and all n ∈ N

B(x, δ) ⊂ gn⁻¹( B(gn(x), ε/3) ).

By density of D and because δ is fixed, every x ∈ X is in B(y, δ) for some y ∈ D. By
compactness of X, there is a finite subset {x1, x2, . . . , xk} ⊂ D such that

X = B(x1, δ) ∪ B(x2, δ) ∪ · · · ∪ B(xk, δ).

As there are finitely many points and {gn } converges pointwise on D, there exists a single N such
that for all n, m ≥ N we have

|gn (x j ) − gm (x j )| < ε/3 for all j = 1, 2, . . . , k.

Let x ∈ X be arbitrary. There is some j such that x ∈ B(x j , δ ) and so we have for all ℓ ∈ N

|gℓ (x) − gℓ (x j )| < ε/3,

and so for n, m ≥ N,

|gn (x) − gm (x)| ≤ |gn (x) − gn (x j )| + |gn (x j ) − gm (x j )| + |gm (x j ) − gm (x)| < ε/3 + ε/3 + ε/3 = ε .

Hence, the sequence is uniformly Cauchy. By completeness of C, it is uniformly convergent.


Corollary 11.6.10. Let X be a compact metric space. Let S ⊂ C(X, C) be a closed, bounded and
uniformly equicontinuous set. Then S is compact.
The theorem says that S is sequentially compact, and in a metric space that is equivalent to compact.
Recall that the closed unit ball in C([0, 1], R) (and therefore also in C([0, 1], C)) is not compact.
Hence it cannot be a uniformly equicontinuous set.
Corollary 11.6.11. Suppose { fn } is a sequence of differentiable functions on [a, b], { fn′ } is uni-
formly bounded, and there is an x0 ∈ [a, b] such that { fn (x0 )} is bounded. Then there exists a
uniformly convergent subsequence { fn j }.
Proof. The trick is to use the mean value theorem. If M is the uniform bound on { fn′ }, then by the
mean value theorem for any n

| fn(x) − fn(y)| ≤ M|x − y| for all x, y ∈ [a, b].

All the fn are Lipschitz with the same constant and hence the sequence is uniformly equicontinuous.
Suppose | fn (x0 )| ≤ M0 for all n. For all x ∈ [a, b]

| fn (x)| ≤ | fn (x0 )| + | fn (x) − fn (x0 )| ≤ M0 + M|x − x0 | ≤ M0 + M(b − a).

So { fn } is uniformly bounded. We apply the Arzelà–Ascoli theorem to find the subsequence.



A classic application of the above corollary to Arzelà–Ascoli in the theory of differential
equations is to prove the Peano existence theorem, that is, the existence of solutions to ordinary
differential equations. See Exercise 11.6.11 below.
Another application of Arzelà–Ascoli using the same idea as the above corollary is the following.
Take a continuous k : [0, 1] × [0, 1] → C. For any f ∈ C([0, 1], C) define

T f(x) := ∫₀¹ f(t) k(x, t) dt.
In exercises to earlier sections you have shown that T is a linear operator on C([0, 1], C). Via
Arzelà–Ascoli, we also find (exercise) that the image of the unit ball of functions,

T( B(0, 1) ) = { T f : f ∈ C([0, 1], C), ‖ f ‖u < 1 },

has compact closure; a set with compact closure is usually called relatively compact. Such an operator
is called a compact operator, and they are very useful. Generally, operators defined by integration
tend to be compact.
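As a quick numerical illustration of why such integral operators compress the unit ball (my own sketch in plain Python, not from the text; the sample kernel k(x, t) = e^{xt} is an arbitrary choice): since |∂k/∂x| ≤ e on [0, 1]², every image T f of an f with ‖ f ‖u ≤ 1 obeys the single Lipschitz bound |T f(x) − T f(y)| ≤ e |x − y|, and one shared modulus of continuity for the whole image is exactly the equicontinuity Arzelà–Ascoli asks for.

```python
import math

def T(f, x, steps=2000):
    # Riemann-sum approximation of (T f)(x) = integral over [0,1] of f(t) k(x,t) dt
    # with the sample kernel k(x, t) = exp(x t)
    h = 1.0 / steps
    return sum(f((j + 0.5) * h) * math.exp(x * (j + 0.5) * h)
               for j in range(steps)) * h

# a few very different functions from the unit ball of C([0, 1], C)
ball = [lambda t: math.sin(50 * t), lambda t: 1.0, lambda t: t ** 10 - 0.5]

# every T f satisfies |T f(x) - T f(x + 0.01)| <= e * 0.01, uniformly over the ball
worst = max(abs(T(f, x) - T(f, x + 0.01))
            for f in ball for x in [0.0, 0.3, 0.6, 0.9])
print(worst, math.e * 0.01)
```

The wildly oscillating sin(50t) and the smooth constant land in images obeying the same modulus.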

11.6.1 Exercises
Exercise 11.6.1: Let fn : [−1, 1] → R be given by fn(x) := nx/(1 + (nx)²). Prove that the sequence is uniformly
bounded, converges pointwise to 0, but does not converge uniformly to 0. Which hypothesis of Arzelà–Ascoli
is not satisfied? Prove your assertion.
Exercise 11.6.2: Define fn : R → R by fn(x) := 1/((x − n)² + 1). Prove that this sequence is uniformly bounded,
uniformly equicontinuous, the sequence converges pointwise to zero, yet there is no subsequence that
converges uniformly. Which hypothesis of Arzelà–Ascoli is not satisfied? Prove your assertion.
Exercise 11.6.3: Let (X, d) be a compact metric space, C > 0, 0 < α ≤ 1, and suppose fn : X → C are
functions such that | fn(x) − fn(y)| ≤ C d(x, y)^α for all x, y ∈ X and n ∈ N. Suppose also that there is a point
p ∈ X such that fn (p) = 0 for all n. Show that there exists a uniformly convergent subsequence converging to
an f : X → C that also satisfies f (p) = 0 and | f (x) − f (y)| ≤ Cd(x, y)α .
Exercise 11.6.4: Let T : C([0, 1], C) → C([0, 1], C) be the operator given by

T f(x) := ∫₀ˣ f(t) dt.
(That T is linear and that T f is continuous follows from linearity of the integral and the fundamental theorem
of calculus).
a) Show that T takes the unit ball centered at 0 in C([0, 1], C) into a relatively compact set (a set with
compact closure). That is, T is a compact operator.
Hint: See Exercise 7.4.20 in Volume I.
b) Let C ⊂ C([0, 1], C) be the closed unit ball. Prove that the image T(C) is not closed (though it is relatively
compact).
Exercise 11.6.5: Given k ∈ C([0, 1] × [0, 1], C), let T : C([0, 1], C) → C([0, 1], C) be the operator defined by

T f(x) := ∫₀¹ f(t) k(x, t) dt.
Show that T takes the unit ball centered at 0 in C([0, 1], C) into a relatively compact set (a set with compact
closure). That is, T is a compact operator.
Hint: See Exercise 7.4.20 in Volume I.
Note: That T is a well-defined linear operator was proved in an earlier exercise.

Exercise 11.6.6: Suppose S¹ ⊂ C is the unit circle, that is, the set where |z| = 1. Suppose the continuous
functions fn : S¹ → C are uniformly bounded. Let γ : [0, 1] → S¹ be a parametrization of S¹, and g(z, w)
a continuous function on C(0, 1) × S¹ (here C(0, 1) ⊂ C is the closed unit ball). Define the functions
Fn : C(0, 1) → C by the path integral

Fn(z) := ∫_γ fn(w) g(z, w) ds(w).

Show that {Fn } has a uniformly convergent subsequence.

Exercise 11.6.7: Suppose (X, d) is a compact metric space, { fn } a uniformly equicontinuous sequence of
functions in C(X, C). Suppose { fn } converges pointwise. Show that it converges uniformly.

Exercise 11.6.8: Suppose that { fn } is a uniformly equicontinuous uniformly bounded sequence of 2π-
periodic functions fn : R → R. Show that there is a uniformly convergent subsequence.

Exercise 11.6.9: Show that for a compact metric space X, a sequence { fn } that is equicontinuous at every
x ∈ X is uniformly equicontinuous.

Exercise 11.6.10: Define fn : [0, 1] → C by fn(t) := e^{i(2πt+n)}. This is a uniformly equicontinuous,
uniformly bounded sequence. Prove more than just the conclusion of Arzelà–Ascoli for this sequence: Let γ ∈ R be
given, and define g(t) := e^{i(2πt+γ)}. Show that there exists a subsequence of { fn } converging uniformly to g.
Hint: Feel free to use the Kronecker density theorem: The sequence { e^{in} }_{n=1}^∞ is dense in the unit circle.

Exercise 11.6.11: Prove the Peano existence theorem (note the lack of uniqueness in this theorem):
Theorem: Suppose F : I × J → R is a continuous function where I, J ⊂ R are closed bounded intervals,
let I ◦ and J ◦ be their interiors, and let (x0 , y0 ) ∈ I ◦ × J ◦ . Then there exists an h > 0 and a differentiable
function f : [x0 − h, x0 + h] → J ⊂ R, such that

f′(x) = F( x, f(x) )  and  f(x0) = y0.

Use the following outline:


a) We wish to define the Picard iterates, that is, set f0(x) := y0, and

fn+1(x) := y0 + ∫_{x0}^{x} F( t, fn(t) ) dt.

Prove that there exists an h > 0 such that fn : [x0 − h, x0 + h] → C is well-defined for all n. Hint: F is
bounded (why?).
b) Show that { fn } is equicontinuous and bounded; in fact, it is Lipschitz with a uniform Lipschitz constant.
Arzelà–Ascoli then says that there exists a uniformly convergent subsequence { fnk }.
c) Prove that { F( x, fnk(x) ) }_{k=1}^∞ converges uniformly on [x0 − h, x0 + h]. Hint: F is uniformly continuous (why?).
d) Finish the proof of the theorem by taking the limit under the integral and applying the fundamental
theorem of calculus.


Named after the German mathematician Leopold Kronecker (1823–1891).

11.7 The Stone–Weierstrass theorem


Note: 3 lectures

11.7.1 Weierstrass approximation


Perhaps surprisingly, even a very badly behaving continuous function is really just a uniform limit
of polynomials. And we cannot really get any “nicer” functions than polynomials. The idea of the
proof is a very common approximation or “smoothing” idea (convolution with an approximate delta
function) that has applications far beyond pure mathematics.

Theorem 11.7.1 (Weierstrass approximation theorem). If f : [a, b] → C is continuous, then there
exists a sequence {pn} of polynomials converging to f uniformly on [a, b]. Furthermore, if f is
real-valued, we can find pn with real coefficients.

Proof. For x ∈ [0, 1] define

g(x) := f( (b − a)x + a ) − f(a) − x( f(b) − f(a) ).

If we can prove the theorem for g and find the sequence {pn } for g, we prove it for f as we simply
composed with an invertible affine function and added an affine function to f . We can reverse the
process and apply that to our pn , to obtain polynomials approximating f .
The function g is defined on [0, 1] and g(0) = g(1) = 0. Assume that g is defined on the whole
real line for simplicity by defining g(x) := 0 if x < 0 or x > 1. This extended g is continuous.
Define

cn := ( ∫_{−1}^{1} (1 − x²)ⁿ dx )⁻¹,   qn(x) := cn (1 − x²)ⁿ.

The choice of cn is so that ∫_{−1}^{1} qn(x) dx = 1. See Figure 11.8.

Figure 11.8: Plot of the approximate delta functions qn on [−1, 1] for n = 5, 10, 15, 20, . . . , 100 with
higher n in lighter shade.

The functions qn are peaks around 0 (ignoring what happens outside of [−1, 1]) that get narrower
and taller as n increases, while the area underneath is always 1. A classic approximation idea is to
do a convolution integral with peaks like this: For x ∈ [0, 1], let

pn(x) := ∫₀¹ g(t) qn(x − t) dt   ( = ∫_{−∞}^{∞} g(t) qn(x − t) dt ).

The idea of this convolution is that we do a “weighted average” of the function g around the point x
using qn as the weight. See Figure 11.9.

Figure 11.9: For x = 0.3, the plot of q100(x − t) (light gray peak centered at x), some continuous function
g(t) (the jagged line), and the product g(t)q100(x − t) (the bold line).

As qn is a narrow peak, the integral mostly sees the values of g that are close to x and it does the
weighted average of them. When the peak gets narrower, we compute this average closer to x and
we expect the result to get closer to the value of g(x). Really we are approximating what is called a
delta function (don’t worry if you have not heard of this concept), and functions like qn are often
called approximate delta functions. We could do this with any set of polynomials that look like
narrower and narrower peaks near zero. These just happen to be the simplest ones. We only need
this behavior on [−1, 1] as the convolution sees nothing further than this as g is zero outside [0, 1].
Because qn is a polynomial, we write

qn(x − t) = a0(t) + a1(t) x + · · · + a2n(t) x²ⁿ,

where the ak(t) are polynomials in t, in particular continuous and hence integrable functions. So

pn(x) = ∫₀¹ g(t)qn(x − t) dt = ( ∫₀¹ g(t)a0(t) dt ) + ( ∫₀¹ g(t)a1(t) dt ) x + · · · + ( ∫₀¹ g(t)a2n(t) dt ) x²ⁿ.

In other words, pn is a polynomial in x. If g(t) is real-valued then the functions g(t)a j (t) are
real-valued and hence pn has real coefficients, proving the “furthermore” part of the theorem.

(A delta function is not actually a function.)
(Do note that the functions aj depend on n, so the coefficients of pn change as n changes.)

We still need to prove that {pn} converges to g. First let us get some handle on the size of cn.
For x ∈ [0, 1], we have 1 − x ≤ 1 − x². We estimate

cn⁻¹ = ∫_{−1}^{1} (1 − x²)ⁿ dx = 2 ∫₀¹ (1 − x²)ⁿ dx ≥ 2 ∫₀¹ (1 − x)ⁿ dx = 2/(n+1).

So cn ≤ (n+1)/2 ≤ n.
Let us see how small qn is, if we ignore some small interval around the origin, which is where
the peak is. Given any δ > 0, δ < 1, for x such that δ ≤ |x| ≤ 1, we have

qn(x) ≤ cn(1 − δ²)ⁿ ≤ n(1 − δ²)ⁿ,

because qn is increasing on [−1, 0] and decreasing on [0, 1]. By the ratio test, n(1 − δ²)ⁿ goes to 0
as n goes to infinity.
The function qn is even, qn(t) = qn(−t), and g is zero outside of [0, 1]. So for x ∈ [0, 1],

pn(x) = ∫₀¹ g(t)qn(x − t) dt = ∫_{−x}^{1−x} g(x + t)qn(−t) dt = ∫_{−1}^{1} g(x + t)qn(t) dt.

Let ε > 0 be given. As [−1, 2] is compact and g is continuous on [−1, 2], we have that g is
uniformly continuous. Pick 0 < δ < 1 such that if |x − y| < δ (and x, y ∈ [−1, 2]), then

|g(x) − g(y)| < ε/2.
Let M be such that |g(x)| ≤ M for all x. Let N be such that for all n ≥ N,

4Mn(1 − δ²)ⁿ < ε/2.
Note that ∫_{−1}^{1} qn(t) dt = 1 and qn(t) ≥ 0 on [−1, 1]. So for n ≥ N and any x ∈ [0, 1],

|pn(x) − g(x)| = | ∫_{−1}^{1} g(x + t)qn(t) dt − g(x) ∫_{−1}^{1} qn(t) dt |
  = | ∫_{−1}^{1} ( g(x + t) − g(x) ) qn(t) dt |
  ≤ ∫_{−1}^{1} |g(x + t) − g(x)| qn(t) dt
  = ∫_{−1}^{−δ} |g(x + t) − g(x)| qn(t) dt + ∫_{−δ}^{δ} |g(x + t) − g(x)| qn(t) dt + ∫_{δ}^{1} |g(x + t) − g(x)| qn(t) dt
  ≤ 2M ∫_{−1}^{−δ} qn(t) dt + (ε/2) ∫_{−δ}^{δ} qn(t) dt + 2M ∫_{δ}^{1} qn(t) dt
  ≤ 2Mn(1 − δ²)ⁿ(1 − δ) + ε/2 + 2Mn(1 − δ²)ⁿ(1 − δ)
  < 4Mn(1 − δ²)ⁿ + ε/2 < ε.

A convolution often inherits some property of the functions we are convolving. In our case the
convolution pn inherited the property of being a polynomial from qn . The same idea of the proof
is often used to get other properties. If qn or g is infinitely differentiable, so is pn . If qn or g is a
solution to a linear differential equation so is pn . Etc. . .
Let us note an immediate application of the Weierstrass theorem. We have already seen that
countable dense subsets can be very useful.
Corollary 11.7.2. The metric space C([a, b], C) contains a countable dense subset.
Proof. Without loss of generality suppose that we are dealing with C([a, b], R) (why?). The real
polynomials are dense in C([a, b], R) by Weierstrass. If we show that any real polynomial can be
approximated by polynomials with rational coefficients, we are done. This is because there are only
countably many rational numbers and so there are only countably many polynomials with rational
coefficients (a countable union of countable sets is still countable).
Further, without loss of generality, suppose [a, b] = [0, 1]. Let

p(x) := ∑_{k=0}^{n} ak xᵏ

be a polynomial of degree n, where ak ∈ R. Given ε > 0, pick bk ∈ Q such that |ak − bk| < ε/(n+1).
Then if we let

q(x) := ∑_{k=0}^{n} bk xᵏ,

we have, for x ∈ [0, 1],

|p(x) − q(x)| = | ∑_{k=0}^{n} (ak − bk)xᵏ | ≤ ∑_{k=0}^{n} |ak − bk| xᵏ ≤ ∑_{k=0}^{n} |ak − bk| < ∑_{k=0}^{n} ε/(n+1) = ε.

Remark 11.7.3. While we will not prove this, the above corollary implies that C([a, b], C) has
the same cardinality as R, which may be a bit surprising. The set of all functions [a, b] → C has
cardinality that is strictly greater than the cardinality of R, it has the cardinality of the power set of
R. So the set of continuous functions is a very tiny subset of the set of all functions.
Warning! The fact that every continuous function f : [−1, 1] → C (or on any interval [a, b]) can be
uniformly approximated by polynomials

∑_{k=0}^{n} ak xᵏ

does not mean that any continuous f is analytic, that is, that it is equal to the series

∑_{k=0}^{∞} ck xᵏ.

An analytic function is infinitely differentiable, so the function |x| provides a counterexample.
The key distinction is that the polynomials coming from the Weierstrass theorem are not the
partial sums of a single series. For each n, the coefficients ak above can be completely different; they
do not need to come from a single sequence {ck}.

11.7.2 Stone–Weierstrass approximation


We want to abstract away what is not really necessary and prove a general version of the Weierstrass
theorem. The polynomials are dense in the space of continuous functions on a compact interval.
What other kind of families of functions are also dense? And if the domain is an arbitrary metric
space, then we no longer have polynomials to begin with.
The theorem we will prove is the Stone–Weierstrass theorem. First, though, we need a very special
case of the Weierstrass theorem.

Corollary 11.7.4. Let [−a, a] be an interval. Then there is a sequence of real polynomials {pn }
that converges uniformly to |x| on [−a, a] and such that pn (0) = 0 for all n.

Proof. As f(x) := |x| is continuous and real-valued on [−a, a], the Weierstrass theorem gives a
sequence of real polynomials { p̃n } that converges to f uniformly on [−a, a]. Let

pn(x) := p̃n(x) − p̃n(0).

Obviously pn(0) = 0.
Given ε > 0, let N be such that for n ≥ N we have | p̃n(x) − |x| | < ε/2 for all x ∈ [−a, a]. In
particular, | p̃n(0)| < ε/2. Then for n ≥ N,

| pn(x) − |x| | = | p̃n(x) − p̃n(0) − |x| | ≤ | p̃n(x) − |x| | + | p̃n(0)| < ε/2 + ε/2 = ε.

Following the proof of the corollary, we can always make the polynomials from the Weierstrass
theorem have a fixed value at one point, so it works not just for |x|, but that’s the one we will need.
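For |x| specifically, there is also a classical hands-on construction worth knowing (this recursion is an alternative to the convolution proof, not the construction used above): on [−1, 1], start with p0 := 0 and set p_{k+1}(x) := p_k(x) + (x² − p_k(x)²)/2. Each p_k is a polynomial with p_k(0) = 0, and the sequence increases uniformly to |x|; classically, 0 ≤ |x| − p_k(x) ≤ 2/(k+1). A quick sketch in plain Python:

```python
def approx_abs(k, x):
    # p_0 = 0 and p_{j+1}(x) = p_j(x) + (x^2 - p_j(x)^2) / 2;
    # unrolling the recursion evaluates the k-th polynomial at x
    p = 0.0
    for _ in range(k):
        p = p + (x * x - p * p) / 2
    return p

def sup_error(k, samples=400):
    xs = [-1 + 2 * j / samples for j in range(samples + 1)]
    return max(abs(approx_abs(k, x) - abs(x)) for x in xs)

print(sup_error(5), sup_error(50))   # uniform error shrinks with k
print(approx_abs(50, 0.0))           # the approximants all vanish at 0
```

Note that p_k(0) = 0 holds automatically, so no subtraction of p̃n(0) is needed for this construction.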

Definition 11.7.5. A set A of complex-valued functions f : X → C is said to be an algebra
(sometimes complex algebra or algebra over C) if for all f, g ∈ A and c ∈ C we have
(i) f + g ∈ A,
(ii) f g ∈ A, and
(iii) cg ∈ A.
A real algebra or an algebra over R is a set of real-valued functions that satisfies the three properties
above for c ∈ R.

We are interested in the case when X is a compact metric space; then C(X, C) and C(X, R) are
metric spaces. Given a set A ⊂ C(X, C), the set of all uniform limits of sequences in A is the metric
space closure Ā. When we talk about the closure of an algebra from now on, we mean the closure in
C(X, C) as a metric space, and similarly for C(X, R).
The set P of all polynomials is an algebra in C([a, b], C), and we have shown that its closure is
P̄ = C([a, b], C); that is, P is dense. That is the sort of result that we wish to prove.
We leave the following proposition as an exercise.

Proposition 11.7.6. Suppose X is a compact metric space. If A ⊂ C(X, C) is an algebra, then the
closure Ā is also an algebra. Similarly for a real algebra in C(X, R).

Named after the American mathematician Marshall Stone (1903–1989) and the German mathematician
Karl Weierstrass (1815–1897).

Let us distill the properties of polynomials that were sufficient for an approximation theorem.

Definition 11.7.7. Let A be a set of complex-valued functions defined on a set X.


(i) A separates points if for every x, y ∈ X with x ≠ y there is a function f ∈ A such that
f(x) ≠ f(y).
(ii) A vanishes at no point if for every x ∈ X there is an f ∈ A such that f(x) ≠ 0.

Example 11.7.8: The set P of polynomials separates points and vanishes at no point on R. That
is, 1 ∈ P, so it vanishes at no point. And for x, y ∈ R, x ≠ y, take f(t) := t; then f(x) = x ≠ y = f(y).
So P separates points.

Example 11.7.9: The set of functions of the form

f(t) = a0 + ∑_{n=1}^{k} an cos(nt)

is an algebra, which follows from the identity cos(mt) cos(nt) = cos((n+m)t)/2 + cos((n−m)t)/2. The algebra
does not separate points if the domain is any interval of the form [−a, a], because f(−t) = f(t) for
all t. It does separate points if the domain is [0, π], as cos(t) is one-to-one on that set.

Example 11.7.10: The set of polynomials with no constant term vanishes at the origin.

Proposition 11.7.11. Suppose A is an algebra of complex-valued functions on a set X that
separates points and vanishes at no point. Suppose x, y are distinct points of X, and c, d ∈ C. Then
there is an f ∈ A such that

f(x) = c,   f(y) = d.

If A is a real algebra, the conclusion holds for c, d ∈ R.

Proof. There must exist g, h, k ∈ A such that

g(x) ≠ g(y),   h(x) ≠ 0,   k(y) ≠ 0.

Let

f := c (g − g(y)) h / ( (g(x) − g(y)) h(x) ) + d (g − g(x)) k / ( (g(y) − g(x)) k(y) )
  = c (gh − g(y)h) / ( g(x)h(x) − g(y)h(x) ) + d (gk − g(x)k) / ( g(y)k(y) − g(x)k(y) ).

Do note that we are not dividing by zero (clear from the first formula). Also from the first formula we
see that f(x) = c and f(y) = d. By the second formula we see that f ∈ A (as A is an algebra).
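The interpolation formula is easy to sanity-check numerically. In the sketch below (plain Python; the concrete choices g(t) = t, h = k = exp, and the sample points are my own, purely for illustration), f is assembled exactly as in the proof and takes the prescribed values at x and y:

```python
import math

def interpolant(g, h, k, x, y, c, d):
    # f := c (g - g(y)) h / ((g(x) - g(y)) h(x)) + d (g - g(x)) k / ((g(y) - g(x)) k(y))
    def f(t):
        return (c * (g(t) - g(y)) * h(t) / ((g(x) - g(y)) * h(x))
                + d * (g(t) - g(x)) * k(t) / ((g(y) - g(x)) * k(y)))
    return f

# sample data: g separates the two points, h and k vanish nowhere
g = lambda t: t
h = k = math.exp
x, y, c, d = 0.2, 1.5, 3.0, -7.0

f = interpolant(g, h, k, x, y, c, d)
print(f(x), f(y))   # numerically equal to c and d
```

The first summand vanishes at y and takes the value c at x; the second is the mirror image.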

Theorem 11.7.12 (Stone–Weierstrass, real version). Let X be a compact metric space and A an
algebra of real-valued continuous functions on X, such that A separates points and vanishes at no
point. Then the closure Ā = C(X, R).

The proof is divided into several claims. Recall that the closure Ā is itself an algebra.

Claim 1: If f ∈ Ā then | f | ∈ Ā.

Proof. The function f is bounded (continuous on a compact set), so there is an M such that
| f(x)| ≤ M for all x ∈ X.
Let ε > 0 be given. By the corollary to the Weierstrass theorem there exists a real polynomial
c1y + c2y² + · · · + cN y^N (vanishing at y = 0) such that

| |y| − ∑_{j=1}^{N} cj y^j | < ε

for all y ∈ [−M, M]. Because Ā is an algebra and because there is no constant term in the
polynomial,

∑_{j=1}^{N} cj f^j ∈ Ā.

As | f(x)| ≤ M, we have for all x ∈ X

| | f(x)| − ∑_{j=1}^{N} cj ( f(x))^j | < ε.

So | f | is in the closure of Ā, which is closed, so | f | ∈ Ā.

Claim 2: If f ∈ Ā and g ∈ Ā, then max( f, g) ∈ Ā and min( f, g) ∈ Ā, where

max( f, g)(x) := max( f(x), g(x) ),   and   min( f, g)(x) := min( f(x), g(x) ).

Proof. Write

max( f, g) = ( f + g)/2 + | f − g|/2,   and   min( f, g) = ( f + g)/2 − | f − g|/2.

As Ā is an algebra and | f − g| ∈ Ā by Claim 1, we are done.
The claim is true for the minimum or maximum of any finite collection of functions as well.
Claim 3: Given f ∈ C(X, R), x ∈ X, and ε > 0, there exists a gx ∈ Ā with gx(x) = f(x) and

gx(t) > f(t) − ε for all t ∈ X.

Proof. Fix f, x, and ε. By Proposition 11.7.11, for every y ∈ X we find an hy ∈ Ā such that

hy (x) = f (x), hy (y) = f (y).

As hy and f are continuous, the function hy − f is continuous, and the set

Uy := { t ∈ X : hy(t) > f(t) − ε } = (hy − f)⁻¹( (−ε, ∞) )

is open (it is the inverse image of an open set by a continuous function). Furthermore, y ∈ Uy. So the
sets Uy cover X.

The space X is compact, so there exist finitely many points y1, y2, . . . , yn in X such that

X = Uy1 ∪ Uy2 ∪ · · · ∪ Uyn.

Let

gx := max(hy1, hy2, . . . , hyn).

By Claim 2, gx ∈ Ā. Furthermore,

gx(t) > f(t) − ε

for all t ∈ X, since for every t there is a yj such that t ∈ Uyj and so hyj(t) > f(t) − ε.
Finally, hy(x) = f(x) for all y ∈ X, so gx(x) = f(x).

Claim 4: If f ∈ C(X, R) and ε > 0 is given, then there exists a ϕ ∈ Ā such that for all x ∈ X,

| f(x) − ϕ(x)| < ε.
Proof. For every x, find the function gx as in Claim 3. Let

Vx := { t ∈ X : gx(t) < f(t) + ε }.

The sets Vx are open as gx and f are continuous. As gx(x) = f(x), we have x ∈ Vx. So the sets Vx cover
X. By compactness of X, there are finitely many points x1, x2, . . . , xk such that

X = Vx1 ∪ Vx2 ∪ · · · ∪ Vxk.

Let

ϕ := min(gx1, gx2, . . . , gxk).

By Claim 2, ϕ ∈ Ā. Similarly as before (the same argument as in Claim 3), for all t ∈ X,

ϕ(t) < f(t) + ε.

Since all the gx satisfy gx(t) > f(t) − ε for all t ∈ X, we have ϕ(t) > f(t) − ε as well. Therefore, for all t,

−ε < ϕ(t) − f(t) < ε,

which is the desired conclusion.
The proof of the theorem follows from Claim 4: the claim states that an arbitrary continuous
function is in the closure of Ā, which is itself closed. So the theorem is proved.
Example 11.7.13: The functions of the form

f(t) = ∑_{j=1}^{n} cj e^{jt},

for cj ∈ R, are dense in C([a, b], R). We need to note that such functions form a real algebra, which
follows from e^{jt} e^{kt} = e^{(j+k)t}. They separate points as e^t is one-to-one, and e^t > 0 for all t, so the
algebra does not vanish at any point.

In general, if we have a set of functions that separates points and does not vanish at any point, we
can let these functions generate an algebra by considering all linear combinations of arbitrary finite
products of such functions. That is, we consider all real polynomials without constant term in such
functions. In the example above, the algebra is generated by e^t: we consider polynomials in e^t
without constant term.
Example 11.7.14: We mentioned that the set of all functions of the form
N
a0 + ∑ an cos(nt)
n=1

is an algebra. When considered on [0, π], it separates points and vanishes nowhere, so the real Stone–Weierstrass theorem applies. As for polynomials, you do not want to conclude that every continuous function
on [0, π ] has a uniformly convergent Fourier cosine series, that is, that every continuous function
can be written as
a_0 + ∑_{n=1}^{∞} a_n cos(nt) .
That is not true! There exist continuous functions whose Fourier series does not converge even
pointwise let alone uniformly.
To obtain Stone–Weierstrass for complex algebras, we must make an extra assumption.
Definition 11.7.15. An algebra A is self-adjoint, if for all f ∈ A, the function f̄ defined by f̄(x) := \overline{f(x)} is in A, where by the bar we mean the complex conjugate.
Theorem 11.7.16 (Stone–Weierstrass, complex version). Let X be a compact metric space and A
an algebra of complex-valued continuous functions on X, such that A separates points, vanishes at
no point, and is self-adjoint. Then the closure A = C(X, C).
Proof. Suppose AR ⊂ A is the set of the real-valued elements of A . For any f ∈ A , write
f = u + iv where u and v are real-valued. Then
u = (f + f̄)/2 ,   v = (f − f̄)/(2i) .
So u, v ∈ A as A is a self-adjoint algebra, and since they are real-valued u, v ∈ AR .
If x ≠ y, then find an f ∈ A such that f(x) ≠ f(y). If f = u + iv, then it is obvious that either
u(x) ≠ u(y) or v(x) ≠ v(y). So AR separates points.
Similarly, for any x find f ∈ A such that f(x) ≠ 0. If f = u + iv, then either u(x) ≠ 0 or v(x) ≠ 0.
So AR vanishes at no point.
The set AR is a real algebra, and satisfies the hypotheses of the real Stone–Weierstrass theorem.
Given any f = u + iv ∈ C(X, C), we find g, h ∈ AR such that |u(t) − g(t)| < ε/2 and |v(t) − h(t)| < ε/2 for all t ∈ X. Next, g + ih ∈ A, and
| f(t) − (g(t) + ih(t)) | = | u(t) + iv(t) − (g(t) + ih(t)) | ≤ |u(t) − g(t)| + |v(t) − h(t)| < ε/2 + ε/2 = ε
for all t ∈ X. So the closure A = C(X, C).


11.7. THE STONE–WEIERSTRASS THEOREM 171

The self-adjoint requirement is necessary, although it is not so obvious to see it. For an example,
see Exercise 11.7.9.
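A concrete way to witness the failure, anticipating Exercise 11.7.9 below: every power z^k has path integral 0 over the unit circle, so any uniform limit of polynomials in z inherits this, yet z̄ does not. A numerical check of the two integrals (uniform Riemann sums are exact up to rounding here, since the integrands are trigonometric polynomials in t):

```python
import numpy as np

# ∫_{S¹} f(z) dz = ∫_0^{2π} f(e^{it}) i e^{it} dt, with γ(t) = e^{it}
t = np.linspace(0.0, 2.0 * np.pi, 4000, endpoint=False)
z = np.exp(1j * t)

def circle_integral(values):
    # uniform Riemann sum over one full period
    return np.mean(values * 1j * z) * 2.0 * np.pi

int_z2 = circle_integral(z ** 2)        # a power of z: integral is 0
int_zbar = circle_integral(np.conj(z))  # the conjugate z̄: integral is 2πi
```

Since z̄ integrates to 2πi ≠ 0, no sequence of polynomials in z alone can converge to it uniformly on S¹.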
Here is an interesting application. When working with functions of two variables, it may be
useful to work with functions of the form f (x)g(y) rather than F(x, y). For example, they are easier
to integrate. We have the following.

Example 11.7.17: Any continuous function F : [0, 1] × [0, 1] → C can be approximated uniformly
by functions of the form
∑_{j=1}^{n} f_j(x) g_j(y)

where f j : [0, 1] → C and g j : [0, 1] → C are continuous.


Proof: It is not hard to see that the functions of the above form are a complex algebra. It is
equally easy to show that they vanish nowhere, separate points, and that the algebra is self-adjoint. As
[0, 1] × [0, 1] is compact, we apply the complex Stone–Weierstrass theorem to obtain the result.
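The proof via Stone–Weierstrass is non-constructive. One practical route to such a separable sum — a different, linear-algebra construction, not the one in the proof — is a truncated singular value decomposition of F sampled on a grid; each singular triple contributes one product f_j(x) g_j(y):

```python
import numpy as np

# Rank-n separable approximation F(x, y) ≈ Σ_j f_j(x) g_j(y) via the SVD.
x = np.linspace(0.0, 1.0, 200)
y = np.linspace(0.0, 1.0, 200)
F = np.exp(np.outer(x, y))  # a smooth, non-separable sample F

U, s, Vt = np.linalg.svd(F)
n = 6
# f_j(x) = s_j U[:, j] and g_j(y) = Vt[j, :]
approx = (U[:, :n] * s[:n]) @ Vt[:n, :]

sep_err = np.max(np.abs(F - approx))
```

For a smooth F like exp(xy) the singular values decay very quickly, so a handful of products already gives uniform accuracy on the grid.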

11.7.3 Exercises
Exercise 11.7.1: Prove . Hint: If { fn } is a sequence in C(X, R) converging to f , then as f
is bounded, you can show that fn is uniformly bounded, that is, there exists a single bound for all fn (and f ).

Exercise 11.7.2: Suppose X := R (not compact in particular). Show that f(t) := e^t cannot be uniformly approximated by polynomials on X. Hint: Consider e^t/t^n as t → ∞.

Exercise 11.7.3: Suppose f : [0, 1] → C is a uniform limit of a sequence of polynomials of degree at most d. Show that the limit is a polynomial of degree at most d. Conclude that to approximate a function which is not a polynomial, we need the degree of the approximations to go to infinity.
Hint: First prove that if a sequence of polynomials of degree d converges uniformly to the zero function, then
the coefficients converge to zero. One way to do this is linear algebra: Consider a polynomial p evaluated at
d + 1 points to be a linear operator taking the coefficients of p to the values of p (an operator in L(R^{d+1})).
Exercise 11.7.4: Suppose f : [0, 1] → R is continuous and ∫_0^1 f(x) x^n dx = 0 for all n = 0, 1, 2, . . .. Show that
f(x) = 0 for all x ∈ [0, 1]. Hint: Approximate by polynomials to show that ∫_0^1 ( f(x) )² dx = 0.

Exercise 11.7.5: Suppose I : C([0, 1], R) → R is a linear continuous function such that I(x^n) = 1/(n+1) for all
n = 0, 1, 2, 3, . . .. Prove that I(f) = ∫_0^1 f for all f ∈ C([0, 1], R).

Exercise 11.7.6: Let A be the collection of real polynomials in x², that is, polynomials of the form
c_0 + c_1 x² + c_2 x⁴ + · · · + c_d x^{2d}.
a) Show that every f ∈ C([0, 1], R) is a uniform limit of polynomials from A .
b) Find an f ∈ C([−1, 1], R) that is not a uniform limit of polynomials from A .
c) Which hypothesis of the real Stone–Weierstrass theorem is not satisfied for the domain [−1, 1]?

Exercise 11.7.7: Let |z| = 1 define the unit circle S1 ⊂ C.


a) Show that functions of the form
∑_{k=−n}^{n} c_k z^k
are dense in C(S1, C). Notice the negative powers.


b) Show that functions of the form
c_0 + ∑_{k=1}^{n} c_k z^k + ∑_{k=1}^{n} c_{−k} z̄^k
are dense in C(S1, C).


These are so-called harmonic polynomials, and this approximation leads for
example to the solution of the steady state heat problem.
Hint: A good way to write the equation for S1 is zz̄ = 1.

Exercise 11.7.8: Show that for complex numbers c_k, the set of functions of x on [−π, π] of the form
∑_{k=−n}^{n} c_k e^{ikx}
satisfies the hypotheses of the complex Stone–Weierstrass theorem, and therefore such functions are dense in C([−π, π], C).
Exercise 11.7.9: Let S1 ⊂ C be the unit circle, that is the set where |z| = 1. Orient this set counterclockwise.
Let γ(t) := e^{it}. For the one-form f(z) dz we write
∫_{S1} f(z) dz := ∫_0^{2π} f(e^{it}) i e^{it} dt .
a) Prove that for all nonnegative integers k = 0, 1, 2, 3, . . . we have ∫_{S1} z^k dz = 0.
b) Prove that if P(z) = ∑_{k=0}^{n} c_k z^k is any polynomial in z, then ∫_{S1} P(z) dz = 0.
c) Prove ∫_{S1} z̄ dz ≠ 0.
d) Conclude that polynomials in z (this algebra of functions is not self-adjoint) are not dense in C(S1 , C).
Exercise 11.7.10: Let (X, d) be a compact metric space and suppose A ⊂ C(X, R) is a real algebra that
separates points, but such that for some x0 , f (x0 ) = 0 for all f ∈ A . Prove that any function g ∈ C(X, R)
such that g(x0 ) = 0 is a uniform limit of functions from A .
Exercise 11.7.11: Let (X, d) be a compact metric space and suppose A ⊂ C(X, R) is a real algebra. Suppose
that for each y ∈ X the closure A contains the function ϕy (x) := d(y, x). Then A = C(X, R).
Exercise 11.7.12:
a) Suppose f : [a, b] → C is continuously differentiable. Show that there exists a sequence of polynomials
{p_n} that converges in the C¹ norm to f, that is, ‖f − p_n‖_u + ‖f′ − p_n′‖_u → 0 as n → ∞.
b) Suppose f : [a, b] → C is k times continuously differentiable. Show that there exists a sequence of
polynomials {p_n} that converges in the C^k norm to f, that is,
∑_{j=0}^{k} ‖f^{(j)} − p_n^{(j)}‖_u → 0 as n → ∞.

One could also define dz := dx + i dy and then extend the path integral to complex-valued one-forms.

Exercise 11.7.13:
a) Show that an even function f : [−1, 1] → R is a uniform limit of polynomials with even powers only, that
is, polynomials of the form a_0 + a_1 x² + a_2 x⁴ + · · · + a_k x^{2k}.
b) Show that an odd function f : [−1, 1] → R is a uniform limit of polynomials with odd powers only, that is,
polynomials of the form b_1 x + b_2 x³ + b_3 x⁵ + · · · + b_k x^{2k−1}.

11.8 Fourier series


Note: 3–4 lectures
Fourier series is perhaps the most important (and most difficult to understand) of the series that
we cover in this book. We have seen it in a few examples before, but let us start at the beginning.

11.8.1 Trigonometric polynomials


A trigonometric polynomial is an expression of the form
a_0 + ∑_{n=1}^{N} ( a_n cos(nx) + b_n sin(nx) ),

or equivalently, thanks to Euler's formula (e^{iθ} = cos(θ) + i sin(θ)):


∑_{n=−N}^{N} c_n e^{inx} .

The second form is usually more convenient. Note that if |z| = 1, we write z = e^{ix}, and so
∑_{n=−N}^{N} c_n e^{inx} = ∑_{n=−N}^{N} c_n z^n .

So a trigonometric polynomial is really a rational function (do note that we are allowing negative
powers) evaluated on the unit circle. There is a wonderful connection between power series (actually
Laurent series because of the negative powers) and Fourier series because of this observation, but we
will not investigate this further.
Another reason why Fourier series are important and come up in so many applications is that
the functions are eigenfunctions of various differential operators. For example,

d/dx [ e^{ikx} ] = (ik) e^{ikx} ,   d²/dx² [ e^{ikx} ] = (−k²) e^{ikx} .
That is, they are the functions whose derivative is a scalar (the eigenvalue) times itself. Just as
eigenvalues and eigenvectors are important in studying matrices, eigenvalues and eigenfunctions
are important when studying linear differential equations.
The functions cos(nx), sin(nx), and einx are 2π -periodic and hence trigonometric polynomials
are also 2π -periodic. We could rescale x to make the period different, but the theory is the same, so
let us stick with the period of 2π. The antiderivative of e^{inx} (for n ≠ 0) is e^{inx}/(in), and so
∫_{−π}^{π} e^{inx} dx = 2π if n = 0, and 0 otherwise.

Named after the French mathematician Jean-Baptiste Joseph Fourier (1768–1830).

Eigenfunction is like an eigenvector for a matrix, but for a linear operator on a vector space of functions.
11.8. FOURIER SERIES 175

Consider
f(x) := ∑_{n=−N}^{N} c_n e^{inx} ,
and for m = −N, . . . , N compute
(1/2π) ∫_{−π}^{π} f(x) e^{−imx} dx = (1/2π) ∫_{−π}^{π} ∑_{n=−N}^{N} c_n e^{i(n−m)x} dx = ∑_{n=−N}^{N} c_n (1/2π) ∫_{−π}^{π} e^{i(n−m)x} dx = c_m .

We just found a way of computing the coefficients cm using an integral of f . If |m| > N the integral
is just 0: We might as well have included enough zero coefficients to make |m| ≤ N.
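The coefficient formula is easy to test numerically: build a trigonometric polynomial with known coefficients and recover them by the integral. A uniform Riemann sum handles these periodic integrands exactly up to rounding:

```python
import numpy as np

# f(x) = Σ c_n e^{inx} with a few known coefficients
coeffs = {-2: 0.5j, -1: 0.0, 0: 1.0, 1: 2.0, 2: -1.0 + 1.0j}

x = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
f = sum(c * np.exp(1j * n * x) for n, c in coeffs.items())

def fourier_coefficient(values, m):
    # (1/2π) ∫_{-π}^{π} f(x) e^{-imx} dx as a uniform Riemann sum
    return np.mean(values * np.exp(-1j * m * x))

recovered = {m: fourier_coefficient(f, m) for m in coeffs}
max_dev = max(abs(recovered[m] - coeffs[m]) for m in coeffs)
```

Each c_m comes back to machine precision, since the Riemann sum integrates e^{i(n−m)x} exactly for |n − m| smaller than the number of grid points.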
Proposition 11.8.1. A trigonometric polynomial f(x) = ∑_{n=−N}^{N} c_n e^{inx} is real-valued for real x if
and only if c_{−m} = c̄_m for all m = −N, . . . , N.


Proof. If f(x) is real-valued, that is f(x) = f̄(x), then
c̄_m = (1/2π) ∫_{−π}^{π} \overline{f(x) e^{−imx}} dx = (1/2π) ∫_{−π}^{π} f̄(x) e^{imx} dx = (1/2π) ∫_{−π}^{π} f(x) e^{imx} dx = c_{−m} .

The complex conjugate goes inside the integral because the integral is done on real and imaginary
parts separately.
On the other hand, if c_{−m} = c̄_m, then
\overline{ c_{−m} e^{−imx} + c_m e^{imx} } = c̄_{−m} e^{imx} + c̄_m e^{−imx} = c_m e^{imx} + c_{−m} e^{−imx} ,
which is the same expression, and so it is real-valued. Also c̄_0 = c_0, so c_0 is real. So by pairing up the terms we obtain that f has to be real-valued.
The functions einx are also linearly independent.
Proposition 11.8.2. If
∑_{n=−N}^{N} c_n e^{inx} = 0
for all x ∈ [−π, π], then c_n = 0 for all n.


Proof. The claim follows immediately from the integral formula for c_n.

11.8.2 Fourier series


We now take limits. We call the series
∑_{n=−∞}^{∞} c_n e^{inx}
the Fourier series. The numbers cn we call Fourier coefficients. We could also develop everything
with sines and cosines, but it is equivalent and slightly more messy.
Several questions arise. What functions are expressible as Fourier series? Obviously, they have
to be 2π -periodic, but not every periodic function is expressible with the series. Furthermore, if we

do have a Fourier series, where does it converge (if it converges at all)? Does it converge absolutely?
Uniformly? Also note that the series has two limits. When talking about Fourier series convergence,
we often talk about the following limit:
lim_{N→∞} ∑_{n=−N}^{N} c_n e^{inx} .

There are other ways we can sum the series that can get convergence in more situations, but we
refrain from discussing those.
Conversely, we start with any integrable function f : [−π , π ] → C, and we call the numbers
c_n := (1/2π) ∫_{−π}^{π} f(x) e^{−inx} dx

its Fourier coefficients. Often these numbers are written as fˆ(n) . We then formally write down a
Fourier series. As you might imagine such a series might not even converge. We write

f(x) ∼ ∑_{n=−∞}^{∞} c_n e^{inx}

although the ∼ doesn’t imply anything about the two sides being equal in any way. It is simply that
we created a formal series using the formula for the coefficients.
A few sections ago, we proved that the Fourier series

∑_{n=1}^{∞} sin(nx)/n²

converges uniformly and hence converges to a continuous function. This example and its proof can
be extended to a more general criterion.
Proposition 11.8.3. Let ∑_{n=−∞}^{∞} c_n e^{inx} be a Fourier series, and C, α > 1 constants such that
|c_n| ≤ C/|n|^α for all n ∈ Z \ {0}.
Then the series converges (absolutely and uniformly) to a continuous function on R.
The proof is to apply the Weierstrass M-test and the p-series test, to find that
the series converges uniformly and hence to a continuous function. We can also take derivatives.
Proposition 11.8.4. Let ∑_{n=−∞}^{∞} c_n e^{inx} be a Fourier series, and C, α > 2 constants such that
|c_n| ≤ C/|n|^α for all n ∈ Z \ {0}.
Then the series converges to a continuously differentiable function on R.

The notation seems similar to Fourier transform for those readers that have seen it. The similarity is not just
coincidental, we are taking a type of Fourier transform here.

The trick is to notice that the series converges to a continuous function by the previous
proposition, so in particular it converges at some point. Then differentiate the partial sums
∑_{n=−N}^{N} in c_n e^{inx}
and notice that
|in c_n| ≤ C/|n|^{α−1} .
The differentiated series converges uniformly by the M-test again. Since the differentiated series converges uniformly, we find that the original series ∑ c_n e^{inx} converges to a continuously differentiable function, whose derivative is the differentiated series.
By iterating the same reasoning, we find that if for some C and α > k + 1 (k ∈ N) we have
|c_n| ≤ C/|n|^α
for all nonzero integers n, then the Fourier series converges to a k-times continuously differentiable
function. Therefore, the faster the coefficients go to zero, the more regular the limit is.
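A quick numerical illustration of the termwise differentiation (with a truncation and step size chosen here only for the sketch): take coefficients with α = 4, say f(x) = ∑_{n=1}^{∞} sin(nx)/n⁴, whose derivative should be the termwise series ∑ cos(nx)/n³. A finite difference on the truncated series agrees:

```python
import numpy as np

n = np.arange(1, 2001)  # truncation of the series

def f(x):
    return np.sum(np.sin(n * x) / n ** 4)

def fprime(x):
    # the term-by-term differentiated series
    return np.sum(np.cos(n * x) / n ** 3)

x, h = 0.7, 1e-4
central = (f(x + h) - f(x - h)) / (2.0 * h)
diff_err = abs(central - fprime(x))
```

The central difference of the partial sum matches the differentiated series to well within the finite-difference error, as the propositions predict.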

11.8.3 Orthonormal systems


Let us abstract away some of the properties of the exponentials, and study a more general series
for a function. One fundamental property of the exponentials that makes Fourier series what they are is
that the exponentials form a so-called orthonormal system. Let us fix an interval [a, b]. We define an
inner product for the space of functions. We restrict our attention to Riemann integrable functions
since we do not have the Lebesgue integral, which would be the natural choice. Let f and g be
complex-valued Riemann integrable functions on [a, b] and define the inner product
⟨f, g⟩ := ∫_a^b f(x) ḡ(x) dx .
If you have seen Hermitian inner products in linear algebra, this is precisely such a product. We
have to put in the conjugate as we are working with complex numbers. We then have the “size”,
that is, the L² norm ‖f‖₂, defined via its square
‖f‖₂² := ⟨f, f⟩ = ∫_a^b |f(x)|² dx .

Remark 11.8.5. Notice the similarity to finite dimensions. For z = (z1 , z2 , . . . , zn ) ∈ Cn we define
⟨z, w⟩ := ∑_{k=1}^{n} z_k w̄_k

and then the norm is (usually it is denoted by simply ‖z‖ rather than ‖z‖₂)
‖z‖² = ⟨z, z⟩ = ∑_{k=1}^{n} |z_k|² .

This is just the euclidean distance to the origin in Cn (same as R2n ).



Let us get back to function spaces. In what follows, we will assume all functions are Riemann
integrable.

Definition 11.8.6. Let {ϕn } be a sequence of integrable complex-valued functions on [a, b]. We
say that this is an orthonormal system if
⟨ϕ_n, ϕ_m⟩ = ∫_a^b ϕ_n(x) ϕ̄_m(x) dx = 1 if n = m, and 0 otherwise.

In particular, ‖ϕ_n‖₂ = 1 for all n. If we only require that ⟨ϕ_n, ϕ_m⟩ = 0 for m ≠ n, then the system
would be just an orthogonal system.

We noticed above that
{ (1/√(2π)) e^{inx} }
is an orthonormal system. The factor out in front is to make the norm be 1.
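Numerically, orthonormality is the statement that the Gram matrix of the system is the identity. A uniform Riemann sum evaluates these integrals exactly up to rounding:

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 2048, endpoint=False)
dx = x[1] - x[0]

ns = np.arange(-3, 4)
# each row samples φ_n(x) = e^{inx} / sqrt(2π)
phi = np.exp(1j * np.outer(ns, x)) / np.sqrt(2.0 * np.pi)

# Gram matrix G[n, m] = ⟨φ_n, φ_m⟩ = ∫ φ_n(x) conj(φ_m(x)) dx
G = (phi * dx) @ phi.conj().T

gram_err = np.max(np.abs(G - np.eye(len(ns))))
```

The off-diagonal entries vanish and the diagonal entries are 1, confirming both the orthogonality and the normalizing factor 1/√(2π).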
Having an orthonormal system {ϕn } on [a, b] and an integrable function f on [a, b], we can
write a Fourier series relative to {ϕn }. We let
c_n := ⟨f, ϕ_n⟩ = ∫_a^b f(x) ϕ̄_n(x) dx ,

and write
f(x) ∼ ∑_{n=1}^{∞} c_n ϕ_n .
In other words, the series is
∑_{n=1}^{∞} ⟨f, ϕ_n⟩ ϕ_n(x) .
Notice the similarity to the expression for the orthogonal projection of a vector onto a subspace
from linear algebra. We are in fact doing just that, but in a space of functions.

Theorem 11.8.7. Suppose f is a Riemann integrable function on [a, b]. Let {ϕn } be an orthonormal
system on [a, b] and suppose

f(x) ∼ ∑_{n=1}^{∞} c_n ϕ_n(x) .
If
s_n(x) := ∑_{k=1}^{n} c_k ϕ_k(x)   and   p_n(x) := ∑_{k=1}^{n} d_k ϕ_k(x)
for some other sequence {d_k}, then
∫_a^b |f(x) − s_n(x)|² dx = ‖f − s_n‖₂² ≤ ‖f − p_n‖₂² = ∫_a^b |f(x) − p_n(x)|² dx ,
with equality only if d_k = c_k for all k = 1, 2, . . . , n.



In other words the partial sums of the Fourier series are the best approximation with respect to
the L2 norm.
Proof. Let us write
∫_a^b |f − p_n|² = ∫_a^b |f|² − ∫_a^b f p̄_n − ∫_a^b f̄ p_n + ∫_a^b |p_n|² .
Now
∫_a^b f p̄_n = ∫_a^b f ∑_{k=1}^{n} d̄_k ϕ̄_k = ∑_{k=1}^{n} d̄_k ∫_a^b f ϕ̄_k = ∑_{k=1}^{n} d̄_k c_k ,
and
∫_a^b |p_n|² = ∫_a^b ( ∑_{k=1}^{n} d_k ϕ_k ) ( ∑_{j=1}^{n} d̄_j ϕ̄_j ) = ∑_{j=1}^{n} ∑_{k=1}^{n} d_k d̄_j ∫_a^b ϕ_k ϕ̄_j = ∑_{k=1}^{n} |d_k|² .
So
∫_a^b |f − p_n|² = ∫_a^b |f|² − ∑_{k=1}^{n} d̄_k c_k − ∑_{k=1}^{n} d_k c̄_k + ∑_{k=1}^{n} |d_k|² = ∫_a^b |f|² − ∑_{k=1}^{n} |c_k|² + ∑_{k=1}^{n} |d_k − c_k|² .

This is minimized precisely when dk = ck .


When we do plug in d_k = c_k, then
∫_a^b |f − s_n|² = ∫_a^b |f|² − ∑_{k=1}^{n} |c_k|²
and so
∑_{k=1}^{n} |c_k|² ≤ ∫_a^b |f|²
for all n. Note that
∑_{k=1}^{n} |c_k|² = ‖s_n‖₂²
by the above calculation. We take a limit to obtain the so-called Bessel's inequality.
Theorem 11.8.8 (Bessel's inequality). Suppose f is a Riemann integrable function on [a, b]. Let
{ϕ_n} be an orthonormal system on [a, b] and suppose
f(x) ∼ ∑_{n=1}^{∞} c_n ϕ_n(x) .
Then
∑_{k=1}^{∞} |c_k|² ≤ ∫_a^b |f|² = ‖f‖₂² .
In particular (given that a Riemann integrable function satisfies ∫_a^b |f|² < ∞), we get that the
series converges and hence
lim_{k→∞} c_k = 0 .

Named after the German astronomer, mathematician, physicist, and geodesist Friedrich Wilhelm Bessel (1784–1846).
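To see Bessel's inequality at work, take [a, b] = [−π, π], the orthonormal system ϕ_n(x) = e^{inx}/√(2π), and f(x) = x. Computing the coefficients by numerical quadrature (the grid and truncation below are choices of this sketch), the partial sums of ∑|c_k|² stay below ‖f‖₂² = 2π³/3, creeping up toward it (Parseval, in a later section, turns the inequality into an equality for this system):

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 20001)
fx = x  # f(x) = x on [-π, π]

def trap(y):
    # composite trapezoid rule on the uniform grid x
    return np.sum(y[1:] + y[:-1]) * 0.5 * (x[1] - x[0])

def coef(nn):
    # c_n = ⟨f, φ_n⟩ with φ_n(x) = e^{inx} / sqrt(2π)
    return trap(fx * np.exp(-1j * nn * x)) / np.sqrt(2.0 * np.pi)

norm_sq = trap(np.abs(fx) ** 2)  # ‖f‖₂² = 2π³/3 ≈ 20.67
bessel = sum(abs(coef(nn)) ** 2 for nn in range(-50, 51))
slack = norm_sq - bessel
```

The slack is positive and shrinks as more coefficients are included.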

11.8.4 The Dirichlet kernel and approximate delta functions


Let us return to the trigonometric Fourier series. Here we note that the system {einx } is orthogonal,
but not orthonormal if we simply integrate over [−π , π ]. We can also rescale the integral and hence
the inner product to make {einx } orthonormal. That is, if we replace
∫_a^b   with   (1/2π) ∫_{−π}^{π}

(we are just rescaling the dx really) then everything works and we obtain that the system {einx } is
orthonormal with respect to the inner product
⟨f, g⟩ = (1/2π) ∫_{−π}^{π} f(x) ḡ(x) dx .

So suppose f is an integrable function on [−π , π ], further suppose that f is defined on all of R


and is 2π-periodic. Let
c_n := (1/2π) ∫_{−π}^{π} f(x) e^{−inx} dx .
Write
f(x) ∼ ∑_{n=−∞}^{∞} c_n e^{inx} .
We define the symmetric partial sums
s_N(f; x) := ∑_{n=−N}^{N} c_n e^{inx} .

The inequality leading up to Bessel now reads:


(1/2π) ∫_{−π}^{π} |s_N(f; x)|² dx = ∑_{n=−N}^{N} |c_n|² ≤ (1/2π) ∫_{−π}^{π} |f(x)|² dx .

Let us now define the Dirichlet kernel


D_N(x) := ∑_{n=−N}^{N} e^{inx} .

We claim that
D_N(x) = ∑_{n=−N}^{N} e^{inx} = sin( (N + 1/2) x ) / sin(x/2) ,
at least for x such that sin(x/2) ≠ 0. We know that the left-hand side is continuous and hence the
right-hand side extends continuously to all of R as well. To show the claim we use a familiar trick:
(e^{ix} − 1) D_N(x) = e^{i(N+1)x} − e^{−iNx} .



Mathematicians in this field sometimes simplify matters by making a tongue-in-cheek definition that 1 = 2π.

Multiply by e^{−ix/2}:
(e^{ix/2} − e^{−ix/2}) D_N(x) = e^{i(N+1/2)x} − e^{−i(N+1/2)x} .

The claim follows.
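The closed form is easy to test numerically away from the zeros of sin(x/2):

```python
import numpy as np

def dirichlet_sum(N, x):
    # D_N(x) as the defining sum Σ_{n=-N}^{N} e^{inx}
    return sum(np.exp(1j * n * x) for n in range(-N, N + 1))

def dirichlet_closed(N, x):
    # the claimed closed form sin((N + 1/2)x) / sin(x/2)
    return np.sin((N + 0.5) * x) / np.sin(0.5 * x)

x = np.linspace(0.01, np.pi, 500)  # stay away from x = 0
kernel_err = max(
    np.max(np.abs(dirichlet_sum(N, x) - dirichlet_closed(N, x)))
    for N in (5, 20)
)
```

Plotting either expression for N = 5 and N = 20 reproduces the picture in Figure 11.10.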


We expand the definition of s_N:
s_N(f; x) = ∑_{n=−N}^{N} ( (1/2π) ∫_{−π}^{π} f(t) e^{−int} dt ) e^{inx} = (1/2π) ∫_{−π}^{π} f(t) ∑_{n=−N}^{N} e^{in(x−t)} dt
= (1/2π) ∫_{−π}^{π} f(t) D_N(x − t) dt .

Convolution strikes again! As DN and f are 2π -periodic we may also change variables and write
s_N(f; x) = (1/2π) ∫_{x−π}^{x+π} f(x − t) D_N(t) dt = (1/2π) ∫_{−π}^{π} f(x − t) D_N(t) dt .
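If f is itself a trigonometric polynomial of degree at most N, then s_N(f) = f, so the convolution formula must reproduce f exactly — a convenient numerical check (the uniform Riemann sum below is exact for these trigonometric-polynomial integrands):

```python
import numpy as np

N = 5
t = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
dt = t[1] - t[0]

def f(u):
    # trigonometric polynomial of degree 3 ≤ N, so s_N(f) = f
    return np.cos(3 * u) + 0.5 * np.sin(u)

def D(u):
    # Dirichlet kernel D_N as the defining sum
    return np.real(sum(np.exp(1j * n * u) for n in range(-N, N + 1)))

def s_N(xv):
    # (1/2π) ∫_{-π}^{π} f(x - t) D_N(t) dt as a uniform Riemann sum
    return np.sum(f(xv - t) * D(t)) * dt / (2.0 * np.pi)

pts = [0.3, 1.0, -2.0]
conv_err = max(abs(s_N(p) - f(p)) for p in pts)
```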

See Figure 11.10 for a plot of D_N for N = 5 and N = 20.

Figure 11.10: Plot of DN (x) for N = 5 (gray) and N = 20 (black).

The central peak gets taller and taller as N gets larger, and the side peaks stay small (but oscillate
wildly). We are convolving (again) with approximate delta functions, although these have all these
oscillations away from zero, which do not go away. So we expect that sN ( f ) goes to f . Things are
not always so simple, but under some conditions on f , such a conclusion holds. For this reason
people write

δ(x) ∼ ∑_{n=−∞}^{∞} e^{inx}

although we have not really defined the delta function (and it is not a function), nor a Fourier series
of whatever kind of object it is.

11.8.5 Localization
If f satisfies a Lipschitz condition at a point, then the Fourier series converges at that point.
Theorem 11.8.9. Let x be fixed and let f be a 2π -periodic function Riemann integrable on [−π , π ].
Suppose there exist δ > 0 and M such that

| f (x + t) − f (x)| ≤ M|t|

for all t ∈ (−δ, δ), then
lim_{N→∞} s_N(f; x) = f(x) .

In particular, if f is continuously differentiable at x then we obtain convergence (exercise).


We state the more often used version of this corollary. A function f : [a, b] → C is continuous
piecewise smooth if it is continuous and there exist points x0 = a < x1 < x2 < · · · < xk = b such that
f restricted to [x j , x j+1 ] is continuously differentiable (up to the endpoints) for all j.
Corollary 11.8.10. Let f be a 2π -periodic function Riemann integrable on [−π , π ]. Suppose there
exist x ∈ R and δ > 0 such that f is continuous piecewise smooth on [x − δ , x + δ ], then

lim_{N→∞} s_N(f; x) = f(x) .

The proof of the corollary is left as an exercise.


Proof of Theorem 11.8.9. For all N,
(1/2π) ∫_{−π}^{π} D_N = 1 .

Write
s_N(f; x) − f(x) = (1/2π) ∫_{−π}^{π} f(x − t) D_N(t) dt − f(x) (1/2π) ∫_{−π}^{π} D_N(t) dt
= (1/2π) ∫_{−π}^{π} ( f(x − t) − f(x) ) D_N(t) dt
= (1/2π) ∫_{−π}^{π} [ ( f(x − t) − f(x) ) / sin(t/2) ] sin( (N + 1/2) t ) dt .
By the hypotheses, for small nonzero t we get
| f(x − t) − f(x) | / |sin(t/2)| ≤ M|t| / |sin(t/2)| .
As sin(t) = t + h(t) where h(t)/t → 0 as t → 0, we notice that M|t|/|sin(t/2)| is continuous at the origin and
hence ( f(x − t) − f(x) ) / sin(t/2) must be bounded near the origin. As t = 0 is the only place on [−π, π] where the
denominator vanishes, it is the only place where there could be a problem. The function is also
Riemann integrable. We use a trigonometric identity

sin( (N + 1/2) t ) = cos(t/2) sin(Nt) + sin(t/2) cos(Nt) ,

so
(1/2π) ∫_{−π}^{π} [ ( f(x − t) − f(x) ) / sin(t/2) ] sin( (N + 1/2) t ) dt
= (1/2π) ∫_{−π}^{π} [ ( f(x − t) − f(x) ) / sin(t/2) ] cos(t/2) sin(Nt) dt + (1/2π) ∫_{−π}^{π} ( f(x − t) − f(x) ) cos(Nt) dt .

Now [ ( f(x − t) − f(x) ) / sin(t/2) ] cos(t/2) and f(x − t) − f(x) are bounded Riemann integrable functions, and so
their Fourier coefficients go to zero by Bessel's inequality (Theorem 11.8.8). So the two integrals on the right-hand side,
which compute the Fourier coefficients for the real version of the Fourier series, go to 0 as N goes to
infinity. This is because sin(Nt) and cos(Nt), suitably normalized, are also orthonormal systems with respect to the same
inner product. Hence s_N(f; x) − f(x) goes to 0, that is, s_N(f; x) goes to f(x).
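A numerical illustration: the 2π-periodic sawtooth with f(x) = x on (−π, π) is Lipschitz at every interior point, and an integration by parts (done by hand here; it is not carried out in the text) gives c_n = i(−1)^n/n for n ≠ 0 and c_0 = 0, so s_N(f; x) = ∑_{n=1}^{N} 2(−1)^{n+1} sin(nx)/n. At x = π/2 the partial sums converge to f(π/2) = π/2, as the theorem predicts:

```python
import numpy as np

def s_N(x, N):
    # partial Fourier sums of the sawtooth f(x) = x on (-π, π),
    # using the hand-computed coefficients c_n = i(-1)^n / n
    n = np.arange(1, N + 1)
    return np.sum(2.0 * (-1.0) ** (n + 1) * np.sin(n * x) / n)

x0 = np.pi / 2
loc_errs = [abs(s_N(x0, N) - x0) for N in (10, 100, 2000)]
```

The errors shrink roughly like 1/N — slow convergence, consistent with the remark below that the *rate* depends on global behavior.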
The theorem also says that convergence depends only on local behavior.
Corollary 11.8.11. Suppose f is a 2π -periodic function, Riemann integrable on [−π , π ]. If J is an
open interval and f (x) = 0 for all x ∈ J, then lim sN ( f ; x) = 0 for all x ∈ J.
In particular, if f and g are 2π -periodic functions, Riemann integrable on [−π , π ], J an open
interval, and f (x) = g(x) for all x ∈ J, then for all x ∈ J, the sequence {sN ( f ; x)} converges if and
only if {sN (g; x)} converges.
That is, convergence at x is only dependent on the values of the function near x. To prove the
first claim, take M = 0 in the theorem. The “In particular” follows by considering the function f − g,
which is zero on J and sN ( f − g) = sN ( f ) − sN (g). On the other hand, we have seen that the rate of
convergence, that is how fast does sN ( f ) converge to f , depends on global behavior of the function.
There is a subtle difference between the corollary and what can be achieved by the
Stone–Weierstrass theorem. Any continuous function on [−π, π] can be uniformly approximated by
trigonometric polynomials, but these trigonometric polynomials need not be the partial sums s_N.

11.8.6 Parseval’s theorem


Next we prove that convergence always happens in the L2 sense and that operations on the (infinite)
vectors of Fourier coefficients are the same as the operations using the integral inner product.
Theorem 11.8.12 (Parseval). Let f and g be 2π-periodic functions, Riemann integrable on [−π, π],
with
f(x) ∼ ∑_{n=−∞}^{∞} c_n e^{inx}   and   g(x) ∼ ∑_{n=−∞}^{∞} d_n e^{inx} .
Then
lim_{N→∞} ‖f − s_N(f)‖₂² = lim_{N→∞} (1/2π) ∫_{−π}^{π} | f(x) − s_N(f; x) |² dx = 0 .
Also
⟨f, g⟩ = (1/2π) ∫_{−π}^{π} f(x) ḡ(x) dx = ∑_{n=−∞}^{∞} c_n d̄_n ,
and
‖f‖₂² = (1/2π) ∫_{−π}^{π} |f(x)|² dx = ∑_{n=−∞}^{∞} |c_n|² .

Named after the French mathematician Marc-Antoine Parseval (1755–1836).

Proof. There exists (exercise) a continuous 2π -periodic function h such that

‖f − h‖₂ < ε .

Via the Stone–Weierstrass theorem, approximate h with a trigonometric polynomial uniformly. That is, there is
a trigonometric polynomial P(x) such that |h(x) − P(x)| < ε for all x. Hence
‖h − P‖₂ = √( (1/2π) ∫_{−π}^{π} |h(x) − P(x)|² dx ) ≤ ε .

If P is of degree N₀, then for all N ≥ N₀ we have
‖h − s_N(h)‖₂ ≤ ‖h − P‖₂ ≤ ε
as s_N(h) is the best approximation for h in L² (Theorem 11.8.7). By the inequality leading up to
Bessel we have
‖s_N(h) − s_N(f)‖₂ = ‖s_N(h − f)‖₂ ≤ ‖h − f‖₂ ≤ ε .
The L² norm satisfies the triangle inequality (exercise). Thus, for all N ≥ N₀,
‖f − s_N(f)‖₂ ≤ ‖f − h‖₂ + ‖h − s_N(h)‖₂ + ‖s_N(h) − s_N(f)‖₂ ≤ 3ε .
Hence, the first claim follows.


Next,
⟨s_N(f), g⟩ = (1/2π) ∫_{−π}^{π} s_N(f; x) ḡ(x) dx = ∑_{k=−N}^{N} c_k (1/2π) ∫_{−π}^{π} e^{ikx} ḡ(x) dx = ∑_{k=−N}^{N} c_k d̄_k .

We need the Schwarz (or Cauchy–Schwarz or Cauchy–Bunyakovsky–Schwarz) inequality, that is


| ∫_a^b f ḡ |² ≤ ( ∫_a^b |f|² ) ( ∫_a^b |g|² ) .

This is left as an exercise. The proof is not really that different from the finite dimensional version.
So
| ∫_{−π}^{π} f ḡ − ∫_{−π}^{π} s_N(f) ḡ | = | ∫_{−π}^{π} ( f − s_N(f) ) ḡ |
≤ ∫_{−π}^{π} | f − s_N(f) | |g|
≤ ( ∫_{−π}^{π} | f − s_N(f) |² )^{1/2} ( ∫_{−π}^{π} |g|² )^{1/2} .

The right hand side goes to 0 as N goes to infinity by the first claim of the theorem. That is, as
N goes to infinity, hsN ( f ), gi goes to h f , gi, and the second claim is proved. The last claim in the
theorem follows by using g = f .
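For the sawtooth f(x) = x on (−π, π) — whose coefficients are c_n = i(−1)^n/n for n ≠ 0, an integration by parts not carried out in the text — Parseval reads π²/3 = ‖f‖₂² = ∑_{n≠0} 1/n², which is exactly the sum ∑ 1/n² = π²/6 of Exercise 11.8.8. Numerically the partial sums close the gap:

```python
import numpy as np

# ‖f‖₂² = (1/2π) ∫_{-π}^{π} x² dx = π²/3 for the sawtooth f(x) = x
norm_sq = np.pi ** 2 / 3.0

N = 100000
n = np.arange(1, N + 1)
partial = 2.0 * np.sum(1.0 / n ** 2)  # Σ_{0 < |n| ≤ N} |c_n|², with |c_n|² = 1/n²

parseval_gap = norm_sq - partial
```

The remaining gap is about 2/N, consistent with the tail of the series.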

11.8.7 Exercises
Exercise 11.8.1: Take the Fourier series

∑_{n=1}^{∞} (1/2^n) sin(2^n x) .

Show that the series converges uniformly and absolutely to a continuous function. Note: This is another
example of a nowhere differentiable function (you do not have to prove that). See Figure 11.11.

Figure 11.11: Plot of ∑_{n=1}^{∞} (1/2^n) sin(2^n x).

Exercise 11.8.2: Suppose f is a 2π-periodic function that is Riemann integrable on [−π, π], and such
that f is continuously differentiable on some open interval (a, b). Prove that for every x ∈ (a, b), we have
lim s_N(f; x) = f(x).

Exercise 11.8.3: Prove Corollary 11.8.10. That is: suppose a 2π-periodic function f is continuous piecewise
smooth near a point x; then lim s_N(f; x) = f(x). Hint: See the previous exercise.

Exercise 11.8.4: Given a 2π-periodic function f : R → C Riemann integrable on [−π, π], and ε > 0. Show
that there exists a continuous 2π-periodic function g : R → C such that ‖f − g‖₂ < ε.

Exercise 11.8.5: Prove the Cauchy-Bunyakovsky-Schwarz inequality for Riemann integrable functions:

| ∫_a^b f ḡ |² ≤ ( ∫_a^b |f|² ) ( ∫_a^b |g|² ) .

Exercise 11.8.6: Prove the L2 triangle inequality for Riemann integrable functions on [−π, π]:

‖f + g‖₂ ≤ ‖f‖₂ + ‖g‖₂ .


See G. H. Hardy, Weierstrass’s Non-Differentiable Function, Transactions of the American Mathematical Society,
17, No. 3 (Jul., 1916), pp. 301–325.

Exercise 11.8.7: Suppose for some C and α > 1, we have a real sequence {a_n} with |a_n| ≤ C/n^α for all n. Let
g(x) := ∑_{n=1}^{∞} a_n sin(nx) .

a) Show that g is continuous.


b) Formally (that is, suppose you can differentiate under the sum) find a solution (formal solution, that is,
do not yet worry about convergence) to the differential equation

y′′ + 2y = g(x)

of the form

y(x) = ∑ bn sin(nx).
n=1

c) Then show that this solution y is twice continuously differentiable, and in fact solves the equation.

Exercise 11.8.8: Let f be a 2π-periodic function such that f (x) = x for 0 < x < 2π. Use Parseval’s theorem
to find

∑_{n=1}^{∞} 1/n² = π²/6 .

Exercise 11.8.9: Suppose that c_n = 0 for all n < 0 and ∑_{n=0}^{∞} |c_n| converges. Let D = B(0, 1) ⊂ C be the unit
disc, and D̄ = C(0, 1) be the closed unit disc. Show that there exists a continuous function f : D̄ → C which
is analytic on D and such that on the boundary of D we have f(e^{iθ}) = ∑_{n=0}^{∞} c_n e^{inθ}.
Hint: If z = re^{iθ}, then z^n = r^n e^{inθ}.

Exercise 11.8.10: Show that
∑_{n=1}^{∞} e^{−n} sin(nx)
converges to an infinitely differentiable function.
Exercise 11.8.11: Let f be a 2π-periodic function such that f(x) = f(0) + ∫_0^x g for a function g that is
Riemann integrable on any interval. Suppose
f(x) ∼ ∑_{n=−∞}^{∞} c_n e^{inx} .
Show that there exists a C > 0 such that |c_n| ≤ C/|n|.

Exercise 11.8.12:
a) Let ϕ be the 2π-periodic function defined by ϕ(x) := 0 if x ∈ (−π, 0), and ϕ(x) := 1 if x ∈ (0, π), letting
ϕ(0) and ϕ(π) be arbitrary. Show that lim sN (ϕ; 0) = 1/2.
b) Let f be a 2π-periodic function Riemann integrable on [−π, π], x ∈ R, δ > 0, and there are continuously
differentiable g : [x − δ, x] → C and h : [x, x + δ] → C where f(t) = g(t) for all t ∈ [x − δ, x) and where
f(t) = h(t) for all t ∈ (x, x + δ]. Then lim s_N(f; x) = ( g(x) + h(x) )/2, or in other words:
lim_{N→∞} s_N(f; x) = (1/2) ( lim_{t→x⁻} f(t) + lim_{t→x⁺} f(t) ) .
Further Reading

[R1] Maxwell Rosenlicht, Introduction to analysis, Dover Publications Inc., New York, 1986.
Reprint of the 1968 edition.
[R2] Walter Rudin, Principles of mathematical analysis, 3rd ed., McGraw-Hill Book Co., New
York, 1976. International Series in Pure and Applied Mathematics.
[T] William F. Trench, Introduction to real analysis, Pearson Education, 2003.
Index

algebra, , continuously differentiable path,


analytic, converges
antiderivative, complex series,
approximate delta function, power series,
arc-length measure, converges absolutely
arc-length parametrization, complex series,
Arzelà–Ascoli theorem, converges pointwise,
complex series,
basis, converges uniformly,
Bessel’s inequality, convex,
bilinear, convex combination,
bounded domain with piecewise smooth convex hull,
boundary, convolution,
cosine,
Cantor function,
critical point,
Cantor set,
curve,
Cauchy
complex series, Darboux integral,
Cauchy–Schwarz inequality, Darboux sum,
change of basis, derivative,
characteristic function, complex-valued function,
closed path, determinant,
closed rectangle, Devil’s staircase,
column vectors, diagonal matrix,
columns, differentiable,
commutative diagram, differentiable curve,
compact operator, differential one-form,
compact support, dimension,
complex algebra, directional derivative,
complex conjugate, Dirichlet kernel,
complex number, dot product,
complex plane,
conservative vector field, eigenvalue,
continuous piecewise smooth, , elementary matrix,
continuously differentiable, , equicontinuous,

Euclidean norm, lower Darboux integral,


Euler’s formula, lower Darboux sum,
even permutation,
map,
Fourier coefficients, mapping,
Fourier series, matrix,
Fubini for sums, maximum principle,
maximum principle for analytic functions,
generate an algebra,
gradient, mean value property,
Green’s theorem, measure zero,
hyperbolic cosine, modulus,
hyperbolic sine, monotonicity of the integral,

identity, n-dimensional volume


imaginary axis, Jordan measurable set,
imaginary part, rectangles,
implicit function theorem, negatively oriented,
indicator function, norm,
inner product, normed vector space,
integrable, null set,
inverse function theorem, odd permutation,
invertible linear transformation, one-form,
isolated singularity, open mapping,
Jacobian, open rectangle,
Jacobian conjecture, operator norm,
Jacobian determinant, operator, linear,
Jacobian matrix, orthogonal system,
Jordan measurable, orthonormal system,
oscillation,
k-times continuously differentiable function, outer measure,

Kronecker density theorem, Parseval’s theorem,


partial derivative,
Leibniz integral rule, partial derivative of order ℓ,
length, partition,
length of a curve, path connected,
linear, path independent,
linear combination, Peano existence theorem,
linear operator, Peano surface,
linear transformation, permutation,
linearity of the integral, piecewise continuously differentiable path,
linearly dependent, piecewise smooth,
linearly independent, piecewise smooth boundary,
longest side, piecewise smooth path,
INDEX 191

piecewise smooth reparametrization, star-shaped domain,


Poincarè lemma, Stone–Weierstrass
pointwise bounded, complex version,
pointwise convergence, real version,
complex series, subrectangle,
polar coordinates, , subspace,
pole, support,
positively oriented, supremum norm,
potential, symmetric group,
preserve orientation, symmetric partial sums,

radius of convergence, total derivative,


rational function, transformation, linear,
real algebra, triangle inequality
real axis, complex numbers,
real part, triangle inequality for norms,
real vector space, trigonometric polynomial,
real-analytic, type I domain,
rectangle, type II domain,
refinement of a partition, type III domain,
relative maximum,
relative minimum, uniform convergence,
relatively compact, , uniform norm,
removable singularity, uniformly bounded,
reparametrization, uniformly Cauchy,
reverse orientation, uniformly equicontinuous,
Riemann integrable, upper Darboux integral,
complex-valued function, upper Darboux sum,
Riemann integral, upper triangular matrix,

scalars, vanishes at no point,


separates points, vector, ,
simple path, vector field,
simply connected, vector space,
sine, vector subspace,
singularity, volume,
smooth path, volume of rectangles,
smooth reparametrization, Weierstrass M-test,
span, winding number,
spectral radius,
standard basis, zero of a function,
192 INDEX
List of Notation

Notation   Description

(v_1, v_2, …, v_n)   vector
[v_1 ⋯ v_n]^T   vector (column vector)
R[t]   the set of polynomials in t
span(Y)   span of the set Y
e_j   standard basis vector (0, …, 0, 1, 0, …, 0)
L(X, Y)   set of linear maps from X to Y
L(X)   set of linear operators on X
x ↦ y   function that takes x to y
‖·‖   norm on a vector space
x · y   dot product of x and y
‖·‖_{R^n}   the euclidean norm on R^n
‖·‖_{L(X,Y)}   operator norm on L(X, Y)
GL(X)   invertible linear operators on X
[a_{1,1} ⋯ a_{1,n}; ⋮; a_{m,1} ⋯ a_{m,n}]   matrix
sgn(x)   sign function
∏   product
det(A)   determinant of A
f′, Df   derivative of f
∂f/∂x_j, D_j f   partial derivative of f with respect to x_j
∇f   gradient of f
D_u f, ∂f/∂u   directional derivative of f
J_f, ∂(f_1, f_2, …, f_n)/∂(x_1, x_2, …, x_n)   Jacobian determinant of f
C^1, C^1(U)   continuously differentiable function/mapping
∂²f/(∂x_2 ∂x_1)   derivative of f with respect to x_1 and then x_2
f_{x_1 x_2}   derivative of f with respect to x_1 and then x_2
C^k   k-times continuously differentiable function
ω_1 dx_1 + ω_2 dx_2 + ⋯ + ω_n dx_n   differential one-form
∫_γ ω   path integral of a one-form
∫_γ f ds, ∫_γ f(x) ds(x)   line integral of f against arc-length measure
∫_γ v · dγ   path integral of a vector field
V(R)   n-dimensional volume
L(P, f)   lower Darboux sum of f over partition P
U(P, f)   upper Darboux sum of f over partition P
\underline{\int_R} f   lower Darboux integral over rectangle R
\overline{\int_R} f   upper Darboux integral over rectangle R
R(R)   Riemann integrable functions on R
∫_R f, ∫_R f(x) dx, ∫_R f(x) dV   Riemann integral of f on R
m*(S)   outer measure of S
o(f, x, δ), o(f, x)   oscillation of a function at x
χ_S   indicator function of S
i   the imaginary number, √−1
Re z   real part of z
Im z   imaginary part of z
z̄   complex conjugate of z
|z|   modulus of z
‖f‖_u   uniform norm of f
e^z   complex exponential function
sin(z)   sine function
cos(z)   cosine function
π   the number π
f(x) ∼ ∑_{n=−∞}^{∞} c_n e^{inx}   Fourier series for f
⟨f, g⟩   inner product of functions
‖f‖_2   L^2 norm of f
s_N(f; x)   symmetric partial sum of a Fourier series
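As a reminder of how the Darboux notations listed above fit together (this restates the standard definitions used in the text; it is not a new result), for a bounded function f on a rectangle R:

```latex
% Any lower Darboux sum is at most any upper Darboux sum
% (compare the two partitions via a common refinement):
L(P, f) \le U(P', f) \quad \text{for all partitions } P, P' \text{ of } R,
% and the lower and upper Darboux integrals are defined by
\underline{\int_R} f = \sup_P \, L(P, f),
\qquad
\overline{\int_R} f = \inf_P \, U(P, f).
```

Then f belongs to R(R), that is, f is Riemann integrable on R, exactly when the lower and upper Darboux integrals agree, and their common value is the Riemann integral ∫_R f.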
