Basic Analysis II
by Jiří Lebl
May 7, 2018
(version 2.0)
Typeset in LaTeX.
This work is dual licensed under the Creative Commons Attribution-Noncommercial-Share Alike
4.0 International License and the Creative Commons Attribution-Share Alike 4.0 International
License. To view a copy of these licenses, visit https://creativecommons.org/licenses/by-nc-sa/4.0/ or https://creativecommons.org/licenses/by-sa/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
You can use, print, duplicate, and share this book as much as you want. You can base your own notes on it and reuse parts if you keep the license the same. You can assume the license is either the CC-BY-NC-SA or CC-BY-SA, whichever is compatible with what you wish to do; your derivative works must use at least one of the licenses.
During the writing of these notes, the author was in part supported by NSF grant DMS-1362337.
The date is the main identifier of the version. The major version / edition number is raised only if there have been substantial changes. For example, version 2.0 is the first edition, 0th update (no updates yet).
8.1 Vector spaces, linear mappings, and convexity
We will do so when convenient. We call real numbers scalars to distinguish them from vectors.
In Rn we often think of vectors as a direction and a magnitude, and draw the vector as an arrow.
The vector (v1 , v2 , . . . , vn ) is represented by the arrow from the origin to the point (v1 , v2 , . . . , vn ); see Figure 8.1 for an illustration in the plane R2 . When we think of vectors as arrows, they are not necessarily based at the origin; a vector is simply a direction and a magnitude, and it does not know where it starts.
On the other hand, each vector also represents a point in Rn . Usually we think of v ∈ Rn as a point if we are thinking of Rn as a metric space, and we think of it as an arrow if we think of the so-called vector space structure on Rn . Let us define the abstract notion of a vector space, as there are many vector spaces other than just Rn .
∗
Subscripts are used for many purposes, so sometimes we may have several vectors that may also be identified by
subscript, such as a finite or infinite sequence of vectors y1 , y2 , . . ..
Figure 8.1: Vector as an arrow.
Definition 8.1.1. Let X be a set together with the operations of addition, + : X × X → X, and
multiplication, · : R × X → X, (we usually write ax instead of a · x). X is called a vector space (or a
real vector space) if the following conditions are satisfied:
(i) (Addition is associative) If u, v, w ∈ X, then u + (v + w) = (u + v) + w.
(ii) (Addition is commutative) If u, v ∈ X, then u + v = v + u.
(iii) (Additive identity) There is a 0 ∈ X such that v + 0 = v for all v ∈ X.
(iv) (Additive inverse) For every v ∈ X, there is a −v ∈ X, such that v + (−v) = 0.
(v) (Distributive law) If a ∈ R, u, v ∈ X, then a(u + v) = au + av.
(vi) (Distributive law) If a, b ∈ R, v ∈ X, then (a + b)v = av + bv.
(vii) (Multiplication is associative) If a, b ∈ R, v ∈ X, then (ab)v = a(bv).
(viii) (Multiplicative identity) 1v = v for all v ∈ X.
Elements of a vector space are usually called vectors, even if they are not elements of Rn (vectors in
the “traditional” sense).
If Y ⊂ X is a subset that is a vector space itself using the same operations, then Y is called a
subspace or a vector subspace of X.
Multiplication by scalars works as one would expect. For example, 2v = (1 + 1)v = 1v + 1v = v + v; similarly, 3v = v + v + v, and so on. One particular fact we often use is that 0v = 0, where the zero on the left is 0 ∈ R and the zero on the right is 0 ∈ X. To see this, start with 0v = (0 + 0)v = 0v + 0v, and add −(0v) to both sides to obtain 0 = 0v. Similarly, −v = (−1)v, which follows by (−1)v + v = (−1)v + 1v = (−1 + 1)v = 0v = 0. Such algebraic facts, which follow quickly from the definition, we will take for granted from now on.
Example 8.1.2: An example vector space is Rn , where addition and multiplication by a scalar is
done componentwise: if a ∈ R, v = (v1 , v2 , . . . , vn ) ∈ Rn , and w = (w1 , w2 , . . . , wn ) ∈ Rn , then
v + w := (v1 , v2 , . . . , vn ) + (w1 , w2 , . . . , wn ) = (v1 + w1 , v2 + w2 , . . . , vn + wn ),
av := a(v1 , v2 , . . . , vn ) = (av1 , av2 , . . . , avn ).
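As a small aside, the componentwise operations translate directly into a few lines of Python; the following sketch (ours, not from the text) mirrors the definitions above.

def vec_add(v, w):
    # (v1, ..., vn) + (w1, ..., wn) = (v1 + w1, ..., vn + wn)
    return tuple(vj + wj for vj, wj in zip(v, w))

def vec_scale(a, v):
    # a(v1, ..., vn) = (a v1, ..., a vn)
    return tuple(a * vj for vj in v)

print(vec_add((1, 2, 3), (4, 5, 6)))  # (5, 7, 9)
print(vec_scale(2, (1, 2, 3)))        # (2, 4, 6)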
In this book we mostly deal with vector spaces that can often be regarded as subsets of Rn , but there are other vector spaces useful in analysis. Let us give a couple of examples.
Example 8.1.3: A trivial example of a vector space is just X := {0}. The operations are defined in
the obvious way: 0 + 0 := 0 and a0 := 0. A zero vector must always exist, so all vector spaces are
nonempty sets, and this X is in fact the smallest possible vector space.
8.1. VECTOR SPACES, LINEAR MAPPINGS, AND CONVEXITY 9
Example 8.1.4: The space C([0, 1], R) of continuous functions on the interval [0, 1] is a vector
space. For two functions f and g in C([0, 1], R) and a ∈ R, we make the obvious definitions of f + g
and a f :
( f + g)(x) := f (x) + g(x), (a f )(x) := a f (x) .
The 0 is the function that is identically zero. We leave it as an exercise to check that all the vector
space conditions are satisfied.
The space C1 ([0, 1], R) of continuously differentiable functions is a subspace of C([0, 1], R).
Example 8.1.5: The space of polynomials c0 + c1t + c2t 2 + · · · + cmt m (of arbitrary degree m) is a
vector space. Let us denote it by R[t] (coefficients are real and the variable is t). The operations are
defined in the same way as for functions above. Suppose there are two polynomials, one of degree m and one of degree n; assume n ≥ m for simplicity. Then
(c0 + c1t + · · · + cmt m ) + (d0 + d1t + · · · + dnt n ) = (c0 + d0 ) + (c1 + d1 )t + · · · + (cm + dm )t m + dm+1t m+1 + · · · + dnt n
and
a(c0 + c1t + c2t 2 + · · · + cmt m ) = (ac0 ) + (ac1 )t + (ac2 )t 2 + · · · + (acm )t m .
Despite what it looks like, R[t] is not equivalent to Rn for any n. In particular, it is not “finite
dimensional”. We will make this notion precise in just a little bit. One can make a finite dimensional
vector subspace by restricting the degree. For example, if Pn is the set of polynomials of degree n
or less, then Pn is a finite dimensional vector space, and we could identify it with Rn+1 .
In the above, the variable t is really just a formal placeholder. By setting t equal to a real number
we obtain a function. So the space R[t] can be thought of as a subspace of C(R, R). If we restrict
the range of t to [0, 1], R[t] can be identified with a subspace of C([0, 1], R).
Remark 8.1.6. If X is a vector space, to check that a subset S ⊂ X is a vector subspace, we only need
1) 0 ∈ S,
2) S is closed under addition: adding two vectors in S gets us a vector in S, and
3) S is closed under scalar multiplication: multiplying a vector in S by a scalar gets us a vector in S.
Items 2) and 3) make sure that addition and scalar multiplication are in fact defined on S. Item 1) is required to fulfill item (iii) of the definition, the existence of an additive identity. Existence of the additive inverse −v follows because −v = (−1)v, and 3) says that −v ∈ S if v ∈ S. All other properties are certain equalities that are already satisfied in X and thus must be satisfied in a subset.
It is often better to think of even the simpler "finite dimensional" vector spaces using the abstract notion rather than always as Rn . It is possible to use fields other than R in the definition (for example, it is common to use the complex numbers C), but let us stick with the real numbers∗.
∗
If you want a very funky vector space over a different field, R itself is a vector space over the rational numbers.
That is, span(Y ) is the line through the origin and the point (1, 1).
Example 8.1.9: Let Y := {(1, 1), (0, 1)} ⊂ R2 . Then
span(Y ) = R2 ,
as any vector (x, y) ∈ R2 can be written as the linear combination (x, y) = x(1, 1) + (y − x)(0, 1).
Example 8.1.10: Let Y := {1, t, t 2 , t 3 , . . .} ⊂ R[t]. Then
span(Y ) = R[t].
A sum of two linear combinations is again a linear combination:
(a1 x1 + a2 x2 + · · · + ak xk ) + (b1 x1 + b2 x2 + · · · + bk xk ) = (a1 + b1 )x1 + (a2 + b2 )x2 + · · · + (ak + bk )xk .
For Rn we define the vectors
e1 := (1, 0, 0, . . . , 0), e2 := (0, 1, 0, . . . , 0), . . . , en := (0, 0, 0, . . . , 1),
and call this the standard basis of Rn . We use the same letters e j for any Rn , and which space Rn we are working in is understood from context. A direct computation shows that {e1 , e2 , . . . , en } is really a basis of Rn ; it spans Rn and is linearly independent. In fact,
a = (a1 , a2 , . . . , an ) = ∑nj=1 a j e j .
Proof. Let us start with (i). Suppose S = {x1 , x2 , . . . , xd } spans X, and T = {y1 , y2 , . . . , ym } is a set of linearly independent vectors of X. We wish to show that m ≤ d. Write
y1 = ∑dk=1 ak,1 xk ,
for some numbers a1,1 , a2,1 , . . . , ad,1 , which we can do as S spans X. One of the ak,1 is nonzero (otherwise y1 would be zero), so suppose without loss of generality that this is a1,1 . Then we solve
x1 = (1/a1,1 ) y1 − ∑dk=2 (ak,1/a1,1 ) xk .
In particular, {y1 , x2 , . . . , xd } spans X, since x1 can be obtained from {y1 , x2 , . . . , xd }. Therefore, there are some numbers a1,2 , a2,2 , . . . , ad,2 such that
y2 = a1,2 y1 + ∑dk=2 ak,2 xk .
As T is linearly independent, one of the ak,2 for k ≥ 2 must be nonzero. Without loss of generality suppose a2,2 ≠ 0. Proceed to solve for
x2 = (1/a2,2 ) y2 − (a1,2/a2,2 ) y1 − ∑dk=3 (ak,2/a2,2 ) xk .
We usually write Ax instead of A(x) if A is linear. If A is one-to-one and onto, then we say A is
invertible, and we denote the inverse by A−1 . If A : X → X is linear, then we say A is a linear
operator on X.
We write L(X,Y ) for the set of all linear transformations from X to Y , and just L(X) for the set
of linear operators on X. If a ∈ R and A, B ∈ L(X,Y ), define the transformations aA and A + B by
(aA)(x) := aAx and (A + B)(x) := Ax + Bx.
If A ∈ L(Y, Z) and B ∈ L(X,Y ), define the transformation AB as the composition A ◦ B, that is,
ABx := A(Bx).
Finally denote by I ∈ L(X) the identity: the linear operator such that Ix = x for all x.
It is not hard to see that aA ∈ L(X,Y ) and A + B ∈ L(X,Y ), and that AB ∈ L(X, Z). In particular,
L(X,Y ) is a vector space. As the set L(X) is not only a vector space, but also admits a product
(composition of operators), it is often called an algebra.
An immediate consequence of the definition of a linear mapping is: if A is linear, then A0 = 0.
The inverse of an invertible linear map is itself linear:
Proposition. If A ∈ L(X,Y ) is invertible, then A−1 : Y → X is linear.
Proof. Let a ∈ R and y ∈ Y . As A is onto, there is an x such that y = Ax, and further, as it is also one-to-one, A−1 (Az) = z for all z ∈ X. So
A−1 (ay) = A−1 (aAx) = A−1 (A(ax)) = ax = aA−1 (y).
Similarly, if y1 = Ax1 and y2 = Ax2 , then
A−1 (y1 + y2 ) = A−1 (Ax1 + Ax2 ) = A−1 (A(x1 + x2 )) = x1 + x2 = A−1 (y1 ) + A−1 (y2 ).
Proposition. Let X and Y be vector spaces and suppose X has a basis {x1 , x2 , . . . , xn }. Then A ∈ L(X,Y ) is determined by its values on the basis. Furthermore, given any vectors y1 , y2 , . . . , yn ∈ Y , there exists a unique A ∈ L(X,Y ) such that Ax j = y j for all j.
We will only prove this proposition for finite dimensional spaces, as we do not need infinite dimensional spaces. For infinite dimensional spaces, the proof is essentially the same, but a little trickier to write, so let us stick with finitely many dimensions.
Proof. Let {x1 , x2 , . . . , xn } be a basis of X, and let y j := Ax j . Every x ∈ X has a unique representation
x = ∑nj=1 b j x j ,
and by linearity Ax = ∑nj=1 b j Ax j = ∑nj=1 b j y j , so A is determined by the y j . The "furthermore" follows by setting A(x j ) := y j and, for x = ∑nj=1 b j x j , defining the extension Ax := ∑nj=1 b j y j . The function is well-defined by uniqueness of the representation of x. We leave it to the reader to check that A is linear.
The next proposition only works for finite dimensional vector spaces. It is a special case of the
so-called rank-nullity theorem from linear algebra.
Proposition 8.1.18. If X is a finite dimensional vector space and A ∈ L(X), then A is one-to-one if
and only if it is onto.
Proof. Suppose A is one-to-one. If
A(∑nj=1 c j x j ) = ∑nj=1 c j Ax j = 0,
then, as A is one-to-one, ∑nj=1 c j x j = 0, and by linear independence of the basis, c j = 0 for all j. So {Ax1 , Ax2 , . . . , Axn } is a linearly independent set. As the dimension of X is n, the set {Ax1 , Ax2 , . . . , Axn } must span X. Any point x ∈ X can then be written as
x = ∑nj=1 a j Ax j = A(∑nj=1 a j x j ),
so A is onto.
Now suppose A is onto. As A is determined by its action on a basis, every element of X is in the span of {Ax1 , Ax2 , . . . , Axn }. As the dimension of X is n, a set of n vectors spanning X must be linearly independent. So suppose that for some c1 , c2 , . . . , cn ,
A(∑nj=1 c j x j ) = ∑nj=1 c j Ax j = 0.
By linear independence, c j = 0 for all j. In other words, if Ax = 0, then x = 0. This means that A is one-to-one: if Ax = Ay, then A(x − y) = 0 and so x = y.
We leave the proof of the next proposition as an exercise.
Proposition 8.1.19. If X and Y are finite dimensional vector spaces, then L(X,Y ) is also finite
dimensional.
Finally let us note that we often identify a finite dimensional vector space X of dimension n
with Rn , provided we fix a basis {x1 , x2 , . . . , xn } in X. That is, we define a bijective linear map
A ∈ L(X, Rn ) by Ax j := e j , where {e1 , e2 , . . . , en } is the standard basis in Rn . Then we have the
correspondence (via A)
∑nj=1 c j x j ∈ X ↦ (c1 , c2 , . . . , cn ) ∈ Rn .
8.1.4 Convexity
A subset U of a vector space is convex if whenever x, y ∈ U, the line segment from x to y lies in U.
That is, if the convex combination (1 − t)x + ty is in U for all t ∈ [0, 1]. Sometimes we write [x, y] for this line segment. See Figure 8.2.
In R, convex sets are precisely the intervals, which are also precisely the connected sets. In
two or more dimensions there are lots of nonconvex connected sets. For example, the set R2 \ {0}
is not convex, but it is connected. To see this, take any x ∈ R2 \ {0} and let y := −x. Then
(1/2)x + (1/2)y = 0, which is not in the set. Balls in Rn are convex. We use this result often enough that we state it as a proposition, and leave the proof as an exercise.
Proposition 8.1.20. Let x ∈ Rn and r > 0. The ball B(x, r) ⊂ Rn (using the standard metric on Rn )
is convex.
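One can at least spot-check the proposition numerically before proving it: sample pairs of points in a ball and test convex combinations. A rough Python sketch of ours (no substitute for the exercise):

import numpy as np

rng = np.random.default_rng(0)
x0, r = np.array([1.0, -2.0, 0.5]), 2.0  # an arbitrary ball B(x0, r) in R^3

def in_ball(p):
    return np.linalg.norm(p - x0) < r

for _ in range(1000):
    # Rejection-sample two points of the ball, then test a convex combination.
    p, q = x0 + rng.uniform(-r, r, 3), x0 + rng.uniform(-r, r, 3)
    if in_ball(p) and in_ball(q):
        t = rng.uniform()
        assert in_ball((1 - t) * p + t * q)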
Figure 8.2: A convex set U: the line segment from x to y, that is the points (1 − t)x + ty, lies in U.
Example 8.1.22: A somewhat more complicated example is given by the following. Let C([0, 1], R) be the vector space of continuous real-valued functions on the interval [0, 1]. Let X ⊂ C([0, 1], R) be the set of
those f such that
∫₀¹ f (x) dx ≤ 1 and f (x) ≥ 0 for all x ∈ [0, 1].
Then X is convex. Take t ∈ [0, 1], and note that if f , g ∈ X, then t f (x) + (1 − t)g(x) ≥ 0 for all x.
Furthermore
∫₀¹ (t f (x) + (1 − t)g(x)) dx = t ∫₀¹ f (x) dx + (1 − t) ∫₀¹ g(x) dx ≤ 1.
Proposition 8.1.23. The intersection of two convex sets is convex. In fact, if {Cλ }λ ∈I is an arbitrary collection of convex sets, then
C := ∩λ ∈I Cλ
is convex.
Proof. If x, y ∈ C, then x, y ∈ Cλ for all λ ∈ I, and hence if t ∈ [0, 1], then tx + (1 − t)y ∈ Cλ for all
λ ∈ I. Therefore, tx + (1 − t)y ∈ C and C is convex.
Proposition 8.1.24. Let T : V → W be a linear mapping between two vector spaces and let C ⊂ V
be a convex set. Then T (C) is convex.
Proof. Take any two points p, q ∈ T (C). Pick x, y ∈ C such that T x = p and Ty = q. As C is convex,
then tx + (1 − t)y ∈ C for all t ∈ [0, 1], so
t p + (1 − t)q = tT x + (1 − t)Ty = T (tx + (1 − t)y) ∈ T (C).
For completeness, let us mention a very useful construction, the convex hull. Given a set S ⊂ V of a vector space, define the convex hull of S by
co(S) := ∩{C ⊂ V : S ⊂ C, and C is convex}.
That is, the convex hull is the smallest convex set containing S. By a proposition above, the
intersection of convex sets is convex and hence, the convex hull is convex.
Example 8.1.25: The convex hull of 0 and 1 in R is [0, 1]. Proof: Any convex set containing 0 and
1 must contain [0, 1]. The set [0, 1] is convex, therefore it must be the convex hull.
8.1.5 Exercises
Exercise 8.1.1: Show that in Rn (with the standard euclidean metric) for any x ∈ Rn and any r > 0, the ball
B(x, r) is convex.
Exercise 8.1.3: Let X be a vector space. Prove that a finite set of vectors {x1 , x2 , . . . , xn } ⊂ X is linearly independent if and only if for every j = 1, 2, . . . , n
span({x1 , . . . , x j−1 , x j+1 , . . . , xn }) ⊊ span({x1 , . . . , xn }).
That is, the span of the set with one vector removed is strictly smaller.
Exercise 8.1.4: Show that the set X ⊂ C([0, 1], R) of those functions such that ∫₀¹ f = 0 is a vector subspace.
Exercise 8.1.5 (Challenging): Prove C([0, 1], R) is an infinite dimensional vector space where the operations
are defined in the obvious way: s = f + g and m = a f are defined as s(x) := f (x) + g(x) and m(x) := a f (x).
Hint: For the dimension, think of functions that are only nonzero on the interval (1/(n+1), 1/n).
Exercise 8.1.6: Let k : [0, 1]2 → R be continuous. Show that L : C([0, 1], R) → C([0, 1], R) defined by
L f (y) := ∫₀¹ k(x, y) f (x) dx
is a linear operator. That is, first show that L is well-defined by showing that L f is continuous whenever f is,
and then showing that L is linear.
Exercise 8.1.7: Let Pn be the vector space of polynomials in one variable of degree n or less. Show that
Pn is a vector space of dimension n + 1.
Exercise 8.1.8: Let R[t] be the vector space of polynomials in one variable t. Let D : R[t] → R[t] be the
derivative operator (derivative in t). Show that D is a linear operator.
Exercise 8.1.9: Let us show that Proposition 8.1.18 only works in finite dimensions. Take R[t] and define the operator A : R[t] → R[t] by A(P(t)) := tP(t). Show that A is linear and one-to-one, but show that it is not onto.
Exercise 8.1.10: Finish the proof of the proposition that a linear map is determined by its values on a basis, in the finite dimensional case. That is, suppose {x1 , x2 , . . . , xn } is a basis of X, {y1 , y2 , . . . , yn } ⊂ Y , and we define a function
Ax := ∑nj=1 b j y j , if x = ∑nj=1 b j x j .
Then prove that A : X → Y is linear.
Exercise 8.1.12 (Easy): Suppose X and Y are vector spaces and A ∈ L(X,Y ) is a linear operator.
a) Show that the nullspace N := {x ∈ X : Ax = 0} is a vector space.
b) Show that the range R := {y ∈ Y : Ax = y for some x ∈ X} is a vector space.
Exercise 8.1.13 (Easy): Show by example that a union of convex sets need not be convex.
Exercise 8.1.14: Compute the convex hull of the set of 3 points (0, 0), (0, 1), (1, 1) in R2 .
Exercise 8.1.15: Show that the set {(x, y) ∈ R2 : y > x2 } is a convex set.
Exercise 8.1.16: Show that the set X ⊂ C([0, 1], R) of those functions such that ∫₀¹ f = 1 is a convex set, but not a vector subspace.
Exercise 8.1.17: Show that every convex set in Rn is connected using the standard topology on Rn .
Exercise 8.1.18: Suppose K ⊂ R2 is a convex set such that the only point of the form (x, 0) in K is the point (0, 0). Further suppose that (0, 1) ∈ K and (1, 1) ∈ K. Show that if (x, y) ∈ K, then y > 0 unless x = 0.
Exercise 8.1.19: Prove that an arbitrary intersection of vector subspaces is a vector subspace. That is, if X is a vector space and {Vλ }λ ∈I is an arbitrary collection of vector subspaces of X, then ∩λ ∈I Vλ is a vector subspace of X.
8.2 Analysis with vector spaces
8.2.1 Norms
Let us start measuring distance.
Definition 8.2.1. If X is a vector space, then we say a function k·k : X → R is a norm if:
(i) kxk ≥ 0, with kxk = 0 if and only if x = 0.
(ii) kcxk = |c| kxk for all c ∈ R and x ∈ X.
(iii) kx + yk ≤ kxk + kyk for all x, y ∈ X (Triangle inequality).
A vector space equipped with a norm is called a normed vector space.
Given a norm (any norm) on a vector space X, we define a distance d(x, y) := kx − yk, and this
d makes X into a metric space (exercise).
Before defining the standard norm on Rn , let us define the standard scalar dot product on Rn .
For two vectors x = (x1 , x2 , . . . , xn ) ∈ Rn and y = (y1 , y2 , . . . , yn ) ∈ Rn , define
x · y := ∑nj=1 x j y j .
The dot product is linear in each variable separately, or in more fancy language it is bilinear. That
is, if y is fixed, the map x ↦ x · y is a linear map from Rn to R. Similarly, if x is fixed, then y ↦ x · y is also linear. It is also symmetric in the sense that x · y = y · x. The Euclidean norm is defined as
kxk := kxkRn := √(x · x) = √((x1 )2 + (x2 )2 + · · · + (xn )2 ).
We normally just use kxk, but sometimes it is necessary to emphasize that we are talking about the euclidean norm and we use kxkRn . It is easy to see that the Euclidean norm satisfies (i) and (ii). To prove that (iii) holds, the key inequality is the so-called Cauchy–Schwarz inequality we saw before. As this inequality is so important, let us restate and reprove a slightly stronger version using the notation of this chapter.
If x is not a scalar multiple of y, then kx + tyk2 > 0 for all t. So the polynomial kx + tyk2 in t is never zero. Elementary algebra says that the discriminant must be negative:
4(x · y)2 − 4kxk2 kyk2 < 0,
that is, (x · y)2 < kxk2 kyk2 .
The distance d(x, y) := kx − yk is the standard distance (standard metric) on Rn that we used
when we talked about metric spaces.
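As a quick numeric illustration (ours, not from the text), the dot product, the euclidean norm, and the Cauchy–Schwarz inequality |x · y| ≤ kxk kyk can be checked directly:

import math

def dot(x, y):
    # x · y = sum of x_j y_j
    return sum(xj * yj for xj, yj in zip(x, y))

def norm(x):
    # kxk = sqrt(x · x)
    return math.sqrt(dot(x, x))

x, y = (1.0, 2.0, 2.0), (3.0, 0.0, -4.0)
print(norm(x))                              # 3.0
print(abs(dot(x, y)) <= norm(x) * norm(y))  # True: 5 <= 15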
Definition 8.2.3. Let A ∈ L(X,Y ). Define
kAk := sup{kAxk : x ∈ X with kxk = 1}.
The number kAk (possibly ∞) is called the operator norm. We will see below that indeed it is a
norm for finite dimensional spaces. Again, when necessary to emphasize which norm we are talking
about, we may write it as kAkL(X,Y ) .
By linearity, kA(x/kxk)k = kAxk/kxk for any nonzero x ∈ X. The vector x/kxk is of norm 1. Therefore,
kAk = sup{kAxk : x ∈ X with kxk = 1} = sup_{x∈X, x≠0} kAxk/kxk .
It follows that for every x,
kAxk ≤ kAk kxk.
It is not hard to see from the definition that kAk = 0 if and only if A = 0, that is, if A takes every
vector to the zero vector.
It is also not difficult to compute the operator norm of the identity operator:
kIk = sup_{x∈X, x≠0} kIxk/kxk = sup_{x∈X, x≠0} kxk/kxk = 1.
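To get a feel for the operator norm, one can estimate the supremum by sampling unit vectors; for the euclidean norm the exact value is the largest singular value, which NumPy exposes as the matrix 2-norm. A sketch of ours with a made-up matrix:

import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 2.0],
              [2.0, 3.0]])

best = 0.0
for _ in range(10_000):
    x = rng.normal(size=2)
    x /= np.linalg.norm(x)           # a random unit vector
    best = max(best, np.linalg.norm(A @ x))

print(best)                  # close to 2 + sqrt(5) ≈ 4.236 for this symmetric A
print(np.linalg.norm(A, 2))  # the exact operator (spectral) norm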
The operator norm is not always a norm on L(X,Y ), in particular, kAk is not always finite for
A ∈ L(X,Y ). We prove below that kAk is finite when X is finite dimensional. This also implies
that A is continuous. For infinite dimensional spaces neither statement needs to be true. For
example, take the vector space of continuously differentiable functions on [0, 2π ] and as the norm
use the uniform norm. The functions t 7→ sin(nt) have norm 1, but the derivatives have norm n. So
differentiation (which is a linear operator) has infinite operator norm on this space. We will stick to
finite dimensional spaces.
When we talk about a finite dimensional vector space X, one often thinks of Rn , although if we
have a norm on X, the norm might not be the standard euclidean norm. In the exercises, you can
prove that every norm is “equivalent” to the euclidean norm in that the topology it generates is the
same. For simplicity, we only prove the following proposition for the euclidean space, and the proof
for a general finite dimensional space is left as an exercise.
Proposition 8.2.4. Let X and Y be normed vector spaces. Suppose that X is finite dimensional. If
A ∈ L(X,Y ), then kAk < ∞, and A is uniformly continuous (Lipschitz with constant kAk).
Proof. As we said we only prove the proposition for euclidean space so suppose that X = Rn and
the norm is the standard euclidean norm. The general case is left as an exercise.
Let {e1 , e2 , . . . , en } be the standard basis of Rn . Write x ∈ Rn , with kxk = 1, as
x = ∑nj=1 c j e j .
Since c j = x · e j , the Cauchy–Schwarz inequality gives
|c j | = |x · e j | ≤ kxk ke j k = 1.
Then
kAxk = k∑nj=1 c j Ae j k ≤ ∑nj=1 |c j | kAe j k ≤ ∑nj=1 kAe j k.
The right hand side does not depend on x. We found a finite upper bound independent of x, so
kAk < ∞.
For any normed vector spaces X and Y , and A ∈ L(X,Y ) with kAk < ∞, we have for v, w ∈ X
kAv − Awk = kA(v − w)k ≤ kAk kv − wk,
so A is Lipschitz with constant kAk, and in particular uniformly continuous.
Proposition 8.2.5. Suppose X, Y , and Z are finite dimensional normed vector spaces.
(i) If A, B ∈ L(X,Y ) and c ∈ R, then kA + Bk ≤ kAk + kBk and kcAk = |c| kAk. In particular, the operator norm is a norm on the vector space L(X,Y ).
(ii) If A ∈ L(X,Y ) and B ∈ L(Y, Z), then kBAk ≤ kBk kAk.
Proof. First, since all the spaces are finite dimensional, all the operator norms are finite, and the statements make sense to begin with.
For (i),
k(A + B)xk = kAx + Bxk ≤ kAxk + kBxk ≤ kAk kxk + kBk kxk = (kAk + kBk) kxk.
So kA + Bk ≤ kAk + kBk.
Similarly,
k(cA)xk = |c| kAxk ≤ |c| kAk kxk.
Thus kcAk ≤ |c| kAk. Next,
|c| kAxk = kcAxk ≤ kcAk kxk.
Hence |c| kAk ≤ kcAk.
For (ii), write
kBAxk ≤ kBk kAxk ≤ kBk kAk kxk.
As a norm defines a metric, there is a metric space topology on L(X,Y ) for finite dimensional
vector spaces, so we can talk about open/closed sets, continuity, and convergence.
Proposition 8.2.6. Let X be a finite dimensional normed vector space. Let GL(X) ⊂ L(X) be the
set of invertible linear operators.
(i) If A ∈ GL(X), B ∈ L(X), and
kA − Bk < 1/kA−1 k , (8.2)
then B is invertible.
(ii) GL(X) is an open subset, and A ↦ A−1 is a continuous function on GL(X).
Let us make sense of this proposition on a simple example. Consider X = R1 , where linear
operators are just numbers a and the operator norm of a is |a|. The operator a is invertible (a−1 = 1/a) whenever a ≠ 0. The condition |a − b| < 1/|a−1 | = |a| does indeed imply that b is not zero. And a ↦ 1/a is a continuous map. When n > 1, there are other noninvertible operators than just zero, and in general things are a bit more difficult.
Proof. Let us prove (i). We know something about A−1 and A − B. These are linear operators, so let us apply them to a vector:
A−1 (A − B)x = x − A−1 Bx.
Therefore,
kxk = kA−1 (A − B)x + A−1 Bxk ≤ kA−1 k kA − Bk kxk + kA−1 k kBxk.
Now suppose x ≠ 0. By (8.2) we have kA−1 k kA − Bk < 1, so
(1 − kA−1 k kA − Bk) kxk ≤ kA−1 k kBxk,
with a positive left-hand side; or in other words kBxk ≠ 0 for all nonzero x, and hence Bx ≠ 0 for all nonzero x. This is enough to see that B is one-to-one (if Bx = By, then B(x − y) = 0, so x = y). As B is a one-to-one operator from X to X, which is finite dimensional, B is invertible.
Let us prove (ii). Fix some A ∈ GL(X). Let B be near A, specifically kA − Bk < 1/(2kA−1 k). Then (8.2) is satisfied and B is invertible. We have shown above (using B−1 y instead of x)
kB−1 yk ≤ kA−1 k kA − Bk kB−1 yk + kA−1 k kyk ≤ (1/2)kB−1 yk + kA−1 k kyk,
or
kB−1 yk ≤ 2kA−1 k kyk.
So kB−1 k ≤ 2kA−1 k.
Now
A−1 (A − B)B−1 = A−1 (AB−1 − I) = B−1 − A−1 ,
and
kB−1 − A−1 k = kA−1 (A − B)B−1 k ≤ kA−1 k kA − Bk kB−1 k ≤ 2kA−1 k2 kA − Bk.
Therefore, as B tends to A, kB−1 − A−1 k tends to 0, and so the inverse operation is a continuous
function at A.
8.2.2 Matrices
Once we fix a basis in a finite dimensional vector space X, we can represent a vector of X as an
n-tuple of numbers, that is a vector in Rn . The same thing can be done with L(X,Y ), which brings
us to matrices, which are a convenient way to represent finite-dimensional linear transformations.
Suppose {x1 , x2 , . . . , xn } and {y1 , y2 , . . . , ym } are bases for vector spaces X and Y respectively. A
linear operator is determined by its values on the basis. Given A ∈ L(X,Y ), Ax j is an element of Y .
Define the numbers ai, j as follows:
Ax j = ∑mi=1 ai, j yi , (8.3)
and write them as a matrix
⎡ a1,1 a1,2 · · · a1,n ⎤
⎢ a2,1 a2,2 · · · a2,n ⎥
⎢  ..   ..   . .   ..  ⎥
⎣ am,1 am,2 · · · am,n ⎦
We sometimes write A as [ai, j ]. We say A is an m-by-n matrix. The columns of the matrix are
precisely the coefficients that represent Ax j , in terms of the basis {y1 , y2 , . . . , ym }. If we know
the numbers ai, j , then via the formula (8.3) we find the corresponding linear operator, as it is
determined by the action on a basis. Hence, once we fix a basis on X and on Y we have a one-to-one
correspondence between L(X,Y ) and the m-by-n matrices.
When
z = ∑nj=1 c j x j ,
24 CHAPTER 8. SEVERAL VARIABLES AND PARTIAL DERIVATIVES
then
Az = ∑nj=1 c j Ax j = ∑nj=1 c j (∑mi=1 ai, j yi ) = ∑mi=1 (∑nj=1 ai, j c j ) yi .
A way to remember it is if you order the indices as we do, that is row, column, and put the elements
in the same order as the matrices, then it is the “middle index” that is “summed-out.”
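In code, the rule is one line; here is a minimal Python sketch of ours of (Az)i = ∑nj=1 ai, j c j :

def mat_vec(a, c):
    # a: m-by-n matrix as a list of rows; c: vector of length n.
    # The i-th entry of Az is the sum over the middle index j of a[i][j] * c[j].
    return [sum(aij * cj for aij, cj in zip(row, c)) for row in a]

A = [[1, 2, 3],
     [4, 5, 6]]
print(mat_vec(A, [1, 0, -1]))  # [-2, -2]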
A linear mapping changing one basis to another is represented by a square matrix in which
the columns represent vectors of the second basis in terms of the first basis. We call such a linear
mapping a change of basis. So for two choices of a basis in an n-dimensional vector space, there is
a linear mapping (a change of basis) taking one basis to the other, and this corresponds to an n-by-n
matrix which does the corresponding operation on Rn .
Suppose X = Rn , Y = Rm , and all the bases are just the standard bases. Using the Cauchy–
Schwarz inequality, compute
kAzk2 = ∑mi=1 (∑nj=1 ai, j c j )2 ≤ ∑mi=1 ((∑nj=1 (c j )2 ) (∑nj=1 (ai, j )2 )) = (∑mi=1 ∑nj=1 (ai, j )2 ) kzk2 .
In other words, we have a bound on the operator norm (note that equality rarely happens):
kAk ≤ √(∑mi=1 ∑nj=1 (ai, j )2 ).
If the entries go to zero, then kAk goes to zero. In particular, if A is fixed and B is changing such
that the entries of A − B go to zero, then B goes to A in operator norm. That is, B goes to A in the
metric space topology induced by the operator norm. We proved the first part of:
Proposition 8.2.7. If f : S → Rnm is a continuous function for a metric space S, then considering
the components of f as the entries of a matrix, f is a continuous mapping from S to L(Rn , Rm ).
Conversely, if f : S → L(Rn , Rm ) is a continuous function, then the entries of the corresponding
matrix are continuous functions.
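Numerically, the bound above says the operator norm is at most the square root of the sum of the squared entries (the Frobenius norm). A quick check (our sketch):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

op = np.linalg.norm(A, 2)        # operator norm (largest singular value)
frob = np.linalg.norm(A, 'fro')  # sqrt of the sum of squared entries
print(op, frob, op <= frob)      # ≈ 5.465  ≈ 5.477  True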
8.2.3 Determinants
A certain number can be assigned to square matrices that measures how the corresponding linear
mapping stretches space. In particular, this number, called the determinant, can be used to test for
invertibility of a matrix.
Define the symbol sgn(x) (read "sign of x") for a number x by
sgn(x) := −1 if x < 0, 0 if x = 0, and 1 if x > 0.
Suppose σ = (σ1 , σ2 , . . . , σn ) is a permutation of the integers (1, 2, . . . , n), that is, a reordering of
(1, 2, . . . , n). Let
sgn(σ ) = sgn(σ1 , . . . , σn ) := ∏p<q sgn(σq − σ p ). (8.4)
Here ∏ stands for multiplication, similarly to how ∑ stands for summation.
Any permutation can be obtained by a sequence of transpositions (switchings of two elements).
We say a permutation σ is even (resp. odd) if it takes an even (resp. odd) number of transpositions to get from (1, 2, . . . , n) to σ . For example, (2, 4, 3, 1) is two transpositions away
from (1, 2, 3, 4) and is therefore even: (1, 2, 3, 4) → (2, 1, 3, 4) → (2, 4, 3, 1). Being even or odd is
well-defined: sgn(σ ) is 1 if σ is even and −1 if σ is odd (exercise). This fact can be proved by
noting that applying a transposition changes the sign, and computing that sgn(1, 2, . . . , n) = 1.
Let Sn be the set of all permutations on n elements (the symmetric group). Let A = [ai, j ] be a
square n-by-n matrix. Define the determinant of A as
det(A) := ∑σ ∈Sn sgn(σ ) ∏ni=1 ai,σi .
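The definition can be implemented verbatim, though the sum has n! terms, so it is practical only for tiny matrices. A direct Python sketch of ours, with the sign computed as in (8.4):

import math
from itertools import permutations

def sgn(sigma):
    # Sign of a permutation: the product of sgn(sigma_q - sigma_p) over p < q,
    # which is -1 raised to the number of inversions.
    sign = 1
    for p in range(len(sigma)):
        for q in range(p + 1, len(sigma)):
            if sigma[q] < sigma[p]:
                sign = -sign
    return sign

def det(a):
    # det(A) = sum over permutations sigma of sgn(sigma) * prod_i a[i][sigma_i]
    n = len(a)
    return sum(sgn(s) * math.prod(a[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

print(det([[1, 2], [3, 4]]))  # -2, matching ad - bc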
Proposition 8.2.8.
(i) det(I) = 1.
(ii) For every j = 1, 2, . . . , n, the function x j ↦ det [x1 x2 · · · xn ] is linear.
(iii) If two columns of a matrix are interchanged, then the determinant changes sign.
(iv) If two columns of A are equal, then det(A) = 0.
(v) If a column is zero, then det(A) = 0.
(vi) A ↦ det(A) is a continuous function on L(Rn ).
(vii) det [ a b ; c d ] = ad − bc, and det [a] = a.
In fact, the determinant is the unique function that satisfies (i), (ii), and (iii). But we digress. By (ii), we mean that if we fix all the vectors x1 , . . . , xn except for x j , and let v, w ∈ Rn be two vectors and a, b ∈ R be scalars, then
det [x1 · · · x j−1 (av + bw) x j+1 · · · xn ] =
a det [x1 · · · x j−1 v x j+1 · · · xn ] + b det [x1 · · · x j−1 w x j+1 · · · xn ] .
Proof. We go through the proof quickly, as you have likely seen this before.
Part (i) is trivial. For (ii), notice that each term in the definition of the determinant contains exactly one factor from each column.
Part (iii) follows by noting that switching two columns is like switching the two corresponding numbers in every element of Sn . Hence all the signs are changed. Part (iv) follows because if two columns are equal and we switch them we get the same matrix back, and so part (iii) says the determinant must have been 0.
Part (v) follows because the product in each term in the definition includes one element from the zero column. Part (vi) follows as det is a polynomial in the entries of the matrix and hence continuous (in the entries of the matrix). A function defined on matrices is continuous in the operator norm if and only if it is continuous in the entries. Finally, part (vii) is a direct computation.
The determinant tells us about areas and volumes, and how they change. For example, in
the 1-by-1 case, a matrix is just a number, and the determinant is exactly this number. It says
how the linear mapping “stretches” the space. Similarly for R2 . Suppose A ∈ L(R2 ) is a linear
transformation. It can be checked directly that the area of the image of the unit square A [0, 1]2 is
precisely |det(A)|. This works with arbitrary figures, not just the unit square: The absolute value
of the determinant tells us the stretch in the area. The sign of the determinant tells us if the image
is flipped (changes orientation) or not. In R3 it tells us about the 3-dimensional volume, and in n
dimensions about the n-dimensional volume. We claim this without proof.
Proposition 8.2.9. If A and B are n-by-n matrices, then det(AB) = det(A) det(B). Furthermore, A is invertible if and only if det(A) ≠ 0, and in this case, det(A−1 ) = 1/det(A).
In the last equality we can sum over just the elements of Sn instead of all n-tuples of integers
between 1 and n by noting that when two columns in the determinant are the same, then the
determinant is zero. Then we reordered the columns to the original ordering to obtain the sgn.
The conclusion that det(AB) = det(A) det(B) follows by recognizing above the determinant of
B. We obtain this by plugging in A = I. The expression we got for the determinant of B has rows
and columns swapped, so as a side note, we have also just proved that the determinant of a matrix
and its transpose are equal.
To prove the second part of the theorem, suppose A is invertible. Then A−1 A = I and con-
sequently det(A−1 ) det(A) = det(A−1 A) = det(I) = 1. If A is not invertible, then there must be a
nonzero vector that A takes to zero as A is not one-to-one. In other words, the columns of A are
linearly dependent. Suppose
n
∑ γ j a j = 0,
j=1
where not all γ j are equal to 0. Without loss of generality suppose γ1 ≠ 0. Take
⎡ γ1 0 0 · · · 0 ⎤
⎢ γ2 1 0 · · · 0 ⎥
B := ⎢ γ3 0 1 · · · 0 ⎥
⎢ ..  .. ..  . . .. ⎥
⎣ γn 0 0 · · · 1 ⎦
Using the definition of the determinant (there is only a single permutation σ for which ∏ni=1 bi,σi is
nonzero) we find det(B) = γ1 ≠ 0. Then det(AB) = det(A) det(B) = γ1 det(A). The first column of
AB is zero, and hence det(AB) = 0. We conclude det(A) = 0.
Proposition 8.2.10. Determinant is independent of the basis. In other words, if B is invertible, then
det(A) = det(B−1 AB).
The proof is to compute det(B−1 AB) = det(B−1 ) det(A) det(B) = (1/det(B)) det(A) det(B) = det(A).
If in one basis A is the matrix representing a linear operator, then for another basis we can find a
matrix B such that the matrix B−1 AB takes us to the first basis, applies A in the first basis, and takes
us back to the basis we started with. Let X be a finite dimensional vector space. Let Φ ∈ L(X, Rn )
take a basis {x1 , . . . , xn } to the standard basis {e1 , . . . , en } and let Ψ ∈ L(X, Rn ) take another basis
{y1 , . . . , yn } to the standard basis. Let T ∈ L(X) be a linear operator and let a matrix A represent the
operator in the basis {x1 , . . . , xn }, that is, A = ΦT Φ−1 . Take B := ΦΨ−1 , the change of basis. Then
B−1 AB = ΨΦ−1 (ΦT Φ−1 )ΦΨ−1 = ΨT Ψ−1 ,
which is the matrix representing T in the basis {y1 , . . . , yn }. One can draw this as a commutative diagram∗: the Rn on the bottom row (connected to X by Φ) represents X in the first basis, the Rn on the top row (connected to X by Ψ) represents X in the second basis, A acts on the bottom row, B−1 AB acts on the top row, and B takes the top row to the bottom row.
If we compute the determinant of the matrix A, we obtain the same determinant as if we had used any other basis: in the other basis the matrix would be B−1 AB. It follows that
det : L(X) → R
is a well-defined function, independent of the choice of basis.
∗
This is a so-called commutative diagram. Following the arrows along any path should end up with the same result.
Let us define the elementary matrices. First, for some k and λ ≠ 0, define the first type of an elementary matrix E by
Eei := ei if i ≠ k, and Eek := λ ek .
Given any n-by-m matrix M, the matrix EM is the same matrix as M except with the kth row multiplied by λ . It is an easy computation (exercise) that det(E) = λ .
Next, for some j and k with j ≠ k, and λ ∈ R, define the second type of an elementary matrix E by
Eei := ei if i ≠ j, and Ee j := e j + λ ek .
Given any n-by-m matrix M the matrix EM is the same matrix as M except with λ times the kth
row added to the jth row. It is an easy computation (exercise) that det(E) = 1.
Finally, for some j and k with j ≠ k, define the third type of an elementary matrix E by
Eei := ei if i ≠ j and i ≠ k, Ee j := ek , and Eek := e j .
Given any n-by-m matrix M the matrix EM is the same matrix with jth and kth rows swapped. It is
an easy computation (exercise) that det(E) = −1.
Proposition 8.2.11. Let T be an n-by-n invertible matrix. Then there exists a finite sequence of
elementary matrices E1 , E2 , . . . , Ek such that
T = E1 E2 · · · Ek ,
and
det(T ) = det(E1 ) det(E2 ) · · · det(Ek ).
The proof is left as an exercise. The proposition says that we can compute the determinant by
doing elementary row operations. For computing the determinant one doesn’t have to factor the
matrix into a product of elementary matrices completely: usually one would only do row operations
until we find an upper triangular matrix, that is, a matrix [ai, j ] where ai, j = 0 whenever i > j. Computing the determinant of an upper triangular matrix is not difficult (exercise).
Factorization into elementary matrices (or variations on elementary matrices) is useful in proofs
involving an arbitrary linear operator, by reducing to a proof for an elementary matrix, similarly as
the computation of the determinant.
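Here is a Python sketch of ours of the strategy the proposition suggests: row-reduce to an upper triangular matrix while tracking how each elementary operation changes the determinant, then multiply the diagonal.

def det_by_row_reduction(a):
    a = [row[:] for row in a]  # work on a copy
    n = len(a)
    sign = 1.0
    for j in range(n):
        # Choose a pivot; a row swap (third type) flips the sign.
        p = max(range(j, n), key=lambda i: abs(a[i][j]))
        if a[p][j] == 0:
            return 0.0
        if p != j:
            a[j], a[p] = a[p], a[j]
            sign = -sign
        # Adding a multiple of a row (second type) leaves det unchanged.
        for i in range(j + 1, n):
            m = a[i][j] / a[j][j]
            for k in range(j, n):
                a[i][k] -= m * a[j][k]
    d = sign
    for j in range(n):
        d *= a[j][j]
    return d

print(det_by_row_reduction([[1.0, 2.0], [3.0, 4.0]]))  # -2.0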
8.2.4 Exercises
Exercise 8.2.1: For a vector space X with a norm k·k, show that d(x, y) := kx − yk makes X a metric space.
Exercise 8.2.2 (Easy): Show that for square matrices A and B, det(AB) = det(BA).
Exercise 8.2.5: Using the euclidean norm on R2 , compute the operator norm of the operators in L(R2 ) given by the matrices:
a) [ 1 0 ; 0 2 ]   b) [ 0 1 ; −1 0 ]   c) [ 1 0 ; 1 1 ]   d) [ 0 0 ; 1 0 ]
Exercise 8.2.8: Verify the computation of the determinant for the three types of elementary matrices.
Exercise 8.2.10:
a) Suppose D = [di, j ] is an n-by-n diagonal matrix, that is, di, j = 0 whenever i 6= j. Show that det(D) =
d1,1 d2,2 · · · dn,n .
b) Suppose A is a diagonalizable matrix. That is, there exists a matrix B such that B−1 AB = D for a diagonal
matrix D = [di, j ]. Show that det(A) = d1,1 d2,2 · · · dn,n .
Exercise 8.2.11: Take the vector space of polynomials R[t] and the linear operator D ∈ L(R[t]) that is the
differentiation (we proved in an earlier exercise that D is a linear operator). Given P(t) = c0 + c1t + · · · + cnt n ∈ R[t], define kPk := sup{|c j | : j = 0, 1, 2, . . . , n}.
a) Show that kPk is a norm on R[t].
b) Show that D does not have bounded operator norm, that is kDk = ∞. Hint: Consider the polynomials t n
as n tends to infinity.
Exercise 8.2.12: In this exercise we finish the proof of Proposition 8.2.4 for a general finite dimensional normed vector space. Let X be any finite dimensional normed vector space. Let {x1 , x2 , . . . , xn } be a basis for X.
a) Show that the function f : Rn → R
f (c1 , c2 , . . . , cn ) = kc1 x1 + c2 x2 + · · · + cn xn k
is continuous.
b) Show that there exist numbers m and M such that if c = (c1 , c2 , . . . , cn ) ∈ Rn with kck = 1 (standard
euclidean norm), then m ≤ kc1 x1 + c2 x2 + · · · + cn xn k ≤ M (here the norm is on X).
c) Show that there exists a number B such that if kc1 x1 + c2 x2 + · · · + cn xn k = 1, then |c j | ≤ B.
d) Use part c) to show that if X is a finite dimensional normed vector space and A ∈ L(X,Y ), then kAk < ∞.
Exercise 8.2.13: Let X be any finite dimensional vector space with a norm k·k and basis {x1 , x2 , . . . , xn }. Let c = (c1 , c2 , . . . , cn ) ∈ Rn and kck be the standard euclidean norm on Rn .
a) Show that there exist positive numbers m, M > 0 such that for all c ∈ Rn ,
mkck ≤ kc1 x1 + c2 x2 + · · · + cn xn k ≤ Mkck.
b) Use part a) to show that if k·k1 and k·k2 are two norms on X, then there exist positive numbers α, β > 0 such that αkxk1 ≤ kxk2 ≤ β kxk1 for all x ∈ X.
c) Show that U ⊂ X is open in the metric defined by kx − yk1 if and only if it is open in the metric defined by kx − yk2 . In other words, convergence of sequences and continuity of functions is the same in either norm.
Exercise 8.2.14: Let A be an upper triangular matrix. Find a formula for the determinant of A in terms of
the diagonal entries, and prove that your formula works.
Exercise 8.2.15: Given an n-by-n matrix A, prove that |det(A)| ≤ kAkn (the norm on A is the operator norm).
Hint: Note that you only need to show this for invertible matrices. Then possibly reorder columns and factor
A into n matrices each of which differs from the identity by one column.
For an n-by-n matrix A, a complex number λ is called an eigenvalue if Ax = λ x for some nonzero vector x. The number
ρ(A) := sup{|λ | : λ is an eigenvalue of A}
is called the spectral radius of A. Here |λ | is the complex modulus. We state without proof that at least one eigenvalue always exists, and there are no more than n distinct eigenvalues of A. You can therefore assume that 0 ≤ ρ(A) < ∞. The exercises below hold for complex matrices, but feel free to assume they are real matrices.
Exercise 8.2.17: Let A, S be n-by-n matrices, where S is invertible. Prove that λ is an eigenvalue of A if and only if it is an eigenvalue of S−1 AS. Then prove that ρ(S−1 AS) = ρ(A). In particular, ρ is a well-defined function on L(X) for any finite dimensional vector space X.
8.3 The derivative
8.3.1 The derivative
Recall that for a function f : R → R, we defined the derivative at x as the number a such that
lim h→0 | f (x + h) − f (x) − ah| / |h| = 0.
Multiplying by a is a linear map in one dimension: h ↦ ah. That is, we think of a ∈ L(R1 , R1 ), which is the best linear approximation of how f changes near x. We use this definition to extend differentiation to more variables.
Definition 8.3.1. Let U ⊂ Rn be an open subset and f : U → Rm . We say f is differentiable at
x ∈ U if there exists an A ∈ L(Rn , Rm ) such that
lim h→0 (k f (x + h) − f (x) − Ahk / khk) = 0, where the limit is taken over h ∈ Rn .
We then write f ′ (x) := A and say A is the derivative of f at x.
Figure 8.3: Illustration of a derivative for a function f : R2 → R. The vector h is shown in the x1 x2 -plane based at (x1 , x2 ), and the vector Ah ∈ R1 is shown along the y direction.
When m = n = 1, we recover the one-variable derivative, if we agree to think of R here as L(R1 , R1 ). As in one dimension, the idea is that a differentiable mapping is "infinitesimally close" to a linear mapping, and this linear mapping is the derivative.
Notice which norms are being used in the definition. The norm in the numerator is on Rm , and
the norm in the denominator is on Rn where h lives. Normally it is understood that h ∈ Rn from
context. We will not explicitly say so from now on.
We have again cheated somewhat and said that A is the derivative. We have not shown yet that
there is only one, let us do that now.
Proposition 8.3.2. Let U ⊂ Rn be an open subset and f : U → Rm . Suppose x ∈ U and there exist
A, B ∈ L(Rn , Rm ) such that
k f (x + h) − f (x) − Ahk k f (x + h) − f (x) − Bhk
lim =0 and lim = 0.
h→0 khk h→0 khk
Then A = B.
Proof. Suppose h ∈ Rn , h ≠ 0. Compute
k(A − B)hk/khk = k f (x + h) − f (x) − Ah − ( f (x + h) − f (x) − Bh)k/khk
≤ k f (x + h) − f (x) − Ahk/khk + k f (x + h) − f (x) − Bhk/khk .
So k(A − B)hk/khk → 0 as h → 0. That is, given ε > 0, for all nonzero h in some δ -ball around the origin,
ε > k(A − B)hk/khk = k(A − B)(h/khk)k .
For any x with kxk = 1, let h = (δ/2) x; then khk < δ and h/khk = x. So k(A − B)xk < ε . Taking the supremum over all x with kxk = 1, we get the operator norm kA − Bk ≤ ε . As ε > 0 was arbitrary, kA − Bk = 0; in other words, A = B.
Example 8.3.3: If f (x) = Ax for a linear mapping A, then f ′ (x) = A:
k f (x + h) − f (x) − Ahk/khk = kA(x + h) − Ax − Ahk/khk = 0/khk = 0.
Example 8.3.4: Let f : R2 → R2 be defined by
f (x, y) = ( f1 (x, y), f2 (x, y)) := (1 + x + 2y + x2 , 2x + 3y + xy).
Let us show that f is differentiable at the origin and compute the derivative directly from the definition. If the derivative exists, it is in L(R2 , R2 ), so it can be represented by a 2-by-2 matrix [ a b ; c d ]. Suppose h = (h1 , h2 ). We need the following expression to go to zero as h → 0:
k f (h1 , h2 ) − f (0, 0) − (ah1 + bh2 , ch1 + dh2 )k / k(h1 , h2 )k .
Choosing a = 1, b = 2, c = 2, d = 3, the numerator is k((h1 )2 , h1 h2 )k = |h1 | k(h1 , h2 )k, so the expression equals |h1 |, which does indeed go to zero as h → 0. The function f is differentiable at the origin and the derivative f ′ (0) is represented by the matrix [ 1 2 ; 2 3 ].
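The limit in the definition is easy to test numerically for this example: with A the matrix above, k f (h) − f (0) − Ahk/khk should tend to zero as h → 0. A quick sketch of ours:

import numpy as np

def f(x, y):
    return np.array([1 + x + 2*y + x**2, 2*x + 3*y + x*y])

A = np.array([[1.0, 2.0],
              [2.0, 3.0]])
f0 = f(0.0, 0.0)

for s in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = np.array([s, -s])  # one family of vectors h shrinking to 0
    print(s, np.linalg.norm(f(*h) - f0 - A @ h) / np.linalg.norm(h))
# The printed ratio equals |h1| = s here, so it goes to 0 linearly.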
Proposition 8.3.5. Let U ⊂ Rn be open and f : U → Rm be differentiable at p ∈ U. Then f is
continuous at p.
Proof. Another way to write the differentiability of f at p is to first write
r(h) := f (p + h) − f (p) − f ′ (p)h,
and note that kr(h)k/khk must go to zero as h → 0. So r(h) itself must go to zero. The mapping h ↦ f ′ (p)h is a linear mapping between finite dimensional spaces; it is therefore continuous and goes to zero as h → 0. Therefore, f (p + h) must go to f (p) as h → 0. That is, f is continuous at p.
The derivative is itself a linear operator on the space of differentiable functions.
Proposition 8.3.6. Suppose U ⊂ Rn is open, f : U → Rm and g : U → Rm are differentiable at p,
and α ∈ R. Then the functions f + g and α f are differentiable at p and
( f + g)′ (p) = f ′ (p) + g′ (p), and (α f )′ (p) = α f ′ (p).
Proof. Let h ∈ Rn , h ≠ 0. Then
k f (p + h) + g(p + h) − ( f (p) + g(p)) − ( f ′ (p) + g′ (p))hk / khk
≤ k f (p + h) − f (p) − f ′ (p)hk/khk + kg(p + h) − g(p) − g′ (p)hk/khk ,
and
kα f (p + h) − α f (p) − α f ′ (p)hk/khk = |α | k f (p + h) − f (p) − f ′ (p)hk/khk .
The limits of the right-hand sides as h goes to zero are zero by hypothesis. The conclusion follows.
If A ∈ L(X,Y ) and B ∈ L(Y, Z) are linear maps, then they are their own derivative. The com-
position BA ∈ L(X, Z) is also its own derivative, and so the derivative of the composition is the
composition of the derivatives. As differentiable maps are “infinitesimally close” to linear maps,
they have the same property:
Theorem 8.3.7 (Chain rule). Let U ⊂ Rn be open and let f : U → Rm be differentiable at p ∈ U.
Let V ⊂ Rm be open, f (U) ⊂ V and let g : V → Rℓ be differentiable at f (p). Then
F(x) = g f (x)
is differentiable at p and
F ′ (p) = g′ f (p) f ′ (p).
Without the points where things are evaluated, this is sometimes written as F ′ = (g ◦ f )′ = g′ f ′ .
The way to understand it is that the derivative of the composition g ◦ f is the composition of the
derivatives of g and f . If f ′ (p) = A and g′ f (p) = B, then F ′ (p) = BA, just as for linear maps.
Proof. Let A := f ′ (p) and B := g′ f (p) . Take h ∈ Rn and write q = f (p), k = f (p + h) − f (p).
Let
r(h) := f (p + h) − f (p) − Ah.
Then r(h) = k − Ah or Ah = k − r(h), and f (p + h) = q + k. We look at the quantity we need to go
to zero:
kF(p + h) − F(p) − BAhk/khk = kg( f (p + h)) − g( f (p)) − BAhk/khk
= kg(q + k) − g(q) − B(k − r(h))k/khk
≤ kg(q + k) − g(q) − Bkk/khk + kBk kr(h)k/khk
= (kg(q + k) − g(q) − Bkk/kkk) (k f (p + h) − f (p)k/khk) + kBk kr(h)k/khk
(the first term being zero when k = 0). As h → 0, kr(h)k/khk → 0 since f is differentiable at p, and k f (p + h) − f (p)k/khk ≤ kAk + kr(h)k/khk stays bounded. As f is continuous at p, k = f (p + h) − f (p) → 0 as h → 0, so kg(q + k) − g(q) − Bkk/kkk → 0 since g is differentiable at q. Thus the whole expression goes to zero as h → 0, which means F ′ (p) = BA.
8.3.2 Partial derivatives
Definition 8.3.8. Let U ⊂ Rn be open, f : U → R a function, and x ∈ U. Define the partial derivative of f with respect to x j at x as
(∂ f/∂ x j )(x) := lim h→0 ( f (x + he j ) − f (x))/h ,
whenever the limit exists. For a vector-valued f we take the limit componentwise.
Figure 8.4: Illustration of a partial derivative for a function f : R2 → R. The yx2 -plane where x1 is fixed is marked with a dotted line, and the slope of the tangent line in the yx2 -plane is (∂ f/∂ x2 )(x1 , x2 ).
Partial derivatives are easier to compute with all the machinery of calculus, and they provide a
way to compute the derivative of a function.
Proposition 8.3.9. Let U ⊂ Rn be open and let f : U → Rm be differentiable at p ∈ U. Then all the
partial derivatives at p exist and, in terms of the standard bases of Rn and Rm , f ′ (p) is represented
by the matrix
∂ f1 ∂ f1 ∂ f1
(p) (p) . . . (p)
∂ x1 ∂ x2 ∂ xn
∂ f2 ∂ f2 ∂ f2
∂ x1 (p) ∂ x2 (p) . . . ∂ xn (p)
. .. .. .
.. ...
. .
∂ fm ∂ fm ∂ fm
∂ x (p) ∂ x (p) . . . ∂ xn (p)
1 2
In other words,
f ′ (p) e j = ∑mk=1 (∂ fk/∂ x j )(p) ek .
If v = ∑nj=1 c j e j = (c1 , c2 , . . . , cn ), then
f ′ (p) v = ∑nj=1 ∑mk=1 c j (∂ fk/∂ x j )(p) ek = ∑mk=1 (∑nj=1 c j (∂ fk/∂ x j )(p)) ek .
Hence, for each j and k, the limit
(∂ fk/∂ x j )(p) = lim h→0 ( fk (p + he j ) − fk (p))/h
exists and is equal to the kth component of f ′ (p) e j , and we are done.
The converse of the proposition is not true. Just because the partial derivatives exist, does not
mean that the function is differentiable. See the exercises. However, when the partial derivatives are
continuous, we will prove that the converse holds. One of the consequences of the proposition is
that if f is differentiable on U, then f ′ : U → L(Rn , Rm ) is a continuous function if and only if all the partial derivatives ∂ fk/∂ x j are continuous functions.
When f : U → R is a differentiable function, we define the gradient as
∇ f (x) := ∑nj=1 (∂ f/∂ x j )(x) e j .
The gradient gives a way to represent the action of the derivative as a dot product: f ′ (x) v = ∇ f (x) · v.
Suppose γ : (a, b) ⊂ R → Rn is a differentiable function. Such a function and its image is
sometimes called a curve, or a differentiable curve. Write γ = (γ1 , γ2 , . . . , γn ). For the purposes of
computation we identify L(R1 ) and R as we did when we defined the derivative in one variable.
We also identify L(R1 , Rn ) with Rn . We treat γ ′ (t) both as an operator in L(R1 , Rn ) and as the vector (γ1′ (t), γ2′ (t), . . . , γn′ (t)) in Rn . That is, if v ∈ Rn is γ ′ (t) acting as a vector, then h ↦ hv (for h ∈ R1 = R) is γ ′ (t) acting as an operator in L(R1 , Rn ). We often use this slight abuse of notation when dealing with curves.
Suppose γ((a, b)) ⊂ U and let
g(t) := f (γ (t)).
The function g is differentiable. Treating g′ (t) as a number,
g′ (t) = f ′ (γ (t)) γ ′ (t) = ∑nj=1 (∂ f/∂ x j )(γ (t)) (dγ j/dt)(t) = ∑nj=1 (∂ f/∂ x j )(dγ j/dt) .
For convenience, we often leave out the points where we are evaluating, such as above on the right
hand side. Let us rewrite this equation with the notation of the gradient and the dot product.
g′ (t) = (∇ f )(γ (t)) · γ ′ (t) = ∇ f · γ ′ .
We use this idea to define derivatives in a specific direction. A direction is simply a vector
pointing in that direction. Pick a vector u ∈ Rn such that kuk = 1, and fix x ∈ U. We define the
directional derivative as
Du f (x) := (d/dt)|t=0 [ f (x + tu)] = lim h→0 ( f (x + hu) − f (x))/h ,
where the notation (d/dt)|t=0 represents the derivative evaluated at t = 0. Taking the standard basis vector e j , we find ∂ f/∂ x j = De j f . For this reason, sometimes the notation ∂ f/∂ u is used instead of Du f .
Let γ be defined by
γ (t) := x + tu.
Then γ ′ (t) = u for all t. By the computation above,
Du f (x) = (d/dt)|t=0 [ f (x + tu)] = (∇ f )(γ (0)) · γ ′ (0) = (∇ f )(x) · u.
Suppose (∇ f )(x) ≠ 0. By the Cauchy–Schwarz inequality, |Du f (x)| ≤ k(∇ f )(x)k, and picking
u = (∇ f )(x) / k(∇ f )(x)k ,
we get Du f (x) = k(∇ f )(x)k. The gradient points in the direction in which the function grows fastest, in other words, in the direction in which Du f (x) is maximal.
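The identity Du f (x) = (∇ f )(x) · u is easy to check against a difference quotient; a sketch of ours for f (x, y) = x2 y, whose gradient (2xy, x2 ) we compute by hand:

import numpy as np

def f(p):
    x, y = p
    return x**2 * y

def grad_f(p):
    x, y = p
    return np.array([2 * x * y, x**2])

x = np.array([1.0, 2.0])
u = np.array([3.0, 4.0]) / 5.0  # a unit vector

t = 1e-6
print((f(x + t * u) - f(x)) / t)  # difference quotient, ≈ 3.2
print(grad_f(x) @ u)              # (∇f)(x) · u = 0.6*4 + 0.8*1 = 3.2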
Let U ⊂ Rn be open and f : U → Rn a differentiable mapping. Define the Jacobian∗ , or Jacobian determinant† , of f at x as
J f (x) := det( f ′ (x)).
Sometimes J f is written as
∂ ( f1 , f2 , . . . , fn ) / ∂ (x1 , x2 , . . . , xn ) .
∗
Named after the German mathematician Carl Gustav Jacob Jacobi (1804–1851).
†
The matrix from Proposition 8.3.9 representing f ′ (x) is sometimes called the Jacobian matrix.
This last piece of notation may seem somewhat confusing, but it is quite useful when we need to specify the exact variables and function components used, as we will, for example, do in the implicit function theorem.
The Jacobian J f is a real-valued function, and when n = 1 it is simply the derivative. From the
chain rule and the fact that det(AB) = det(A) det(B), it follows that:
J f ◦g (x) = J f (g(x)) Jg (x).
The determinant of a linear mapping tells us what happens to area/volume under the mapping.
Similarly, the Jacobian measures how much a differentiable mapping stretches things locally, and if
it flips orientation. In particular, if the Jacobian is nonzero, then we would expect the mapping to be locally invertible (and we would be correct, as we will see later).
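The multiplicativity J f ◦g (x) = J f (g(x)) Jg (x) can be sanity-checked with finite-difference Jacobian matrices; a rough sketch of ours with two arbitrary smooth maps:

import numpy as np

def jacobian(F, x, eps=1e-6):
    # Central-difference approximation of the matrix of partial derivatives.
    cols = []
    for j in range(len(x)):
        e = np.zeros(len(x)); e[j] = eps
        cols.append((F(x + e) - F(x - e)) / (2 * eps))
    return np.column_stack(cols)

f = lambda p: np.array([p[0] * p[1], p[0] - p[1]])
g = lambda p: np.array([np.sin(p[0]), p[0] + p[1]**2])

x = np.array([0.3, -0.7])
lhs = np.linalg.det(jacobian(lambda p: f(g(p)), x))
rhs = np.linalg.det(jacobian(f, g(x))) * np.linalg.det(jacobian(g, x))
print(lhs, rhs)  # agree up to discretization error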
8.3.5 Exercises
Exercise 8.3.1: Suppose γ : (−1, 1) → Rn and α : (−1, 1) → Rn are two differentiable curves such that γ(0) = α(0) and γ ′ (0) = α ′ (0). Suppose F : Rn → R is a differentiable function. Show that
(d/dt)|t=0 F(γ(t)) = (d/dt)|t=0 F(α(t)) .
Exercise 8.3.2: Let f : R2 → R be given by f (x, y) := √(x2 + y2 ), see Figure 8.6. Show that f is not differentiable at the origin.
Figure 8.6: Graph of √(x2 + y2 ).
Exercise 8.3.3: Using only the definition of the derivative, show that the following f : R2 → R2 are differen-
tiable at the origin and find their derivative.
a) f (x, y) := (1 + x + xy, x),
b) f (x, y) := (y − y10 , x),
c) f (x, y) := ((x + y + 1)2 , (x − y + 2)2 ).
Exercise 8.3.4: Suppose f : R → R and g : R → R are differentiable functions. Using only the definition of the derivative, show that h : R2 → R2 defined by h(x, y) := ( f (x), g(y)) is a differentiable function, and find the derivative at any point (x, y).
Exercise 8.3.5: Define f (x, y) := xy/(x2 + y2 ) for (x, y) ≠ (0, 0) and f (0, 0) := 0, see Figure 8.7.
a) Show that the partial derivatives ∂ f/∂ x and ∂ f/∂ y exist at all points (including the origin).
b) Show that f is not continuous at the origin (and hence not differentiable).
Figure 8.7: Graph of xy/(x2 + y2 ).
Exercise 8.3.6: Define f (x, y) := x2 y/(x2 + y2 ) for (x, y) ≠ (0, 0) and f (0, 0) := 0, see Figure 8.8.
a) Show that the partial derivatives ∂ f/∂ x and ∂ f/∂ y exist at all points.
b) Show that for all u ∈ R2 with kuk = 1, the directional derivative Du f exists at all points.
c) Show that f is continuous at the origin.
d) Show that f is not differentiable at the origin.
Exercise 8.3.7: Suppose f : Rn → Rn is one-to-one, onto, differentiable at all points, and such that f −1 is
also differentiable at all points.
a) Show that f ′ (p) is invertible at all points p and compute ( f −1 )′ ( f (p)). Hint: Consider x = f −1 ( f (x)).
b) Let g : Rn → Rn be a function differentiable at q ∈ Rn and such that g(q) = q. Suppose f (p) = q for
some p ∈ Rn . Show Jg (q) = J f −1 ◦g◦ f (p) where Jg is the Jacobian determinant.
Exercise 8.3.8: Suppose f : R2 → R is differentiable and such that f (x, y) = 0 if and only if y = 0 and such
that ∇ f (0, 0) = (0, 1). Prove that f (x, y) > 0 whenever y > 0, and f (x, y) < 0 whenever y < 0.
Figure 8.8: Graph of x2 y/(x2 + y2 ).
As for functions of one variable, f : U → R has a relative maximum at p ∈ U if there exists a δ > 0 such
that f (q) ≤ f (p) for all q ∈ B(p, δ ) ∩U. Similarly for relative minimum.
Exercise 8.3.9: Suppose U ⊂ Rn is open and f : U → R is differentiable. Suppose f has a relative maximum at p ∈ U. Show that f ′ (p) = 0, that is, the zero mapping in L(Rn , R). In other words, p is a critical point of f .
Exercise 8.3.10: Suppose f : R2 → R is differentiable and suppose that whenever x2 + y2 = 1, we have f (x, y) = 0. Prove that there exists at least one point (x0 , y0 ) such that (∂ f/∂ x)(x0 , y0 ) = (∂ f/∂ y)(x0 , y0 ) = 0.
Exercise 8.3.11: Define f (x, y) := (x − y2 )(2y2 − x). The graph of f is called the Peano surface∗. Show:
a) (0, 0) is a critical point, that is, f ′ (0, 0) = 0, that is, the zero linear map in L(R2 , R).
b) For every direction, that is, (x, y) such that x2 + y2 = 1, the "restriction of f to the line containing the points (0, 0) and (x, y)", that is, the function g(t) := f (tx,ty), has a relative maximum at t = 0.
Hint: While not necessary, §4.3 of volume I makes this part easier.
c) f does not have a relative maximum at (0, 0).
Exercise 8.3.12: Suppose f : R → Rn is differentiable and k f (t)k = 1 for all t (that is, we have a curve in the unit sphere). Show that for all t, treating f ′ as a vector, we have f ′ (t) · f (t) = 0.
Exercise 8.3.13: Define f : R2 → R2 by f (x, y) := (x, y + ϕ(x)) for some differentiable function ϕ of one variable. Show that f is differentiable and find f ′ .
∗
Named after the Italian mathematician Giuseppe Peano (1858–1932).
8.4 Continuity and the derivative
Lemma 8.4.1. If ϕ : [a, b] → Rn is differentiable on (a, b) and continuous on [a, b], then there exists
a t0 ∈ (a, b) such that
kϕ (b) − ϕ (a)k ≤ (b − a)kϕ ′ (t0 )k.
Proof. Applying the mean value theorem to the scalar-valued function t ↦ (ϕ (b) − ϕ (a)) · ϕ (t), where the dot is the dot product, we obtain that there is a t0 ∈ (a, b) such that
kϕ (b) − ϕ (a)k2 = (ϕ (b) − ϕ (a)) · (ϕ (b) − ϕ (a))
= (ϕ (b) − ϕ (a)) · ϕ (b) − (ϕ (b) − ϕ (a)) · ϕ (a)
= (b − a) (ϕ (b) − ϕ (a)) · ϕ ′ (t0 ),
where we treat ϕ ′ as a vector in Rn by the abuse of notation we mentioned in the previous section.
If we think of ϕ ′ (t) as a vector, then by the identification we made in the previous section, kϕ ′ (t)kL(R,Rn ) = kϕ ′ (t)kRn . That is, the euclidean norm of the vector is the same as the operator norm of ϕ ′ (t).
By the Cauchy–Schwarz inequality,
kϕ (b) − ϕ (a)k2 = (b − a) (ϕ (b) − ϕ (a)) · ϕ ′ (t0 ) ≤ (b − a) kϕ (b) − ϕ (a)k kϕ ′ (t0 )k.
If kϕ (b) − ϕ (a)k = 0, the claimed inequality is trivial; otherwise divide by kϕ (b) − ϕ (a)k to obtain it.
Recall that a set U is convex if whenever x, y ∈ U, the line segment from x to y lies in U.
Proposition 8.4.2. Let U ⊂ Rn be a convex open set, f : U → Rm a differentiable function, and M a number such that k f ′ (x)k ≤ M for all x ∈ U. Then f is Lipschitz with constant M, that is,
k f (x) − f (y)k ≤ Mkx − yk for all x, y ∈ U.
Proof. Fix x and y in U and note that (1 − t)x + ty ∈ U for all t ∈ [0, 1] by convexity. Next,
(d/dt)[ f ((1 − t)x + ty)] = f ′ ((1 − t)x + ty) (y − x).
By the mean value theorem above, we get that for some t0 ∈ (0, 1),
k f (x) − f (y)k ≤ k f ′ ((1 − t0 )x + t0 y) (y − x)k ≤ k f ′ ((1 − t0 )x + t0 y)k ky − xk ≤ Mky − xk.
Example 8.4.3: If U is not convex the proposition is not true: Consider the set
U := {(x, y) : 0.5 < x2 + y2 < 2} \ {(x, 0) : x < 0}.
For (x, y) ∈ U, let f (x, y) be the angle that the line from the origin to (x, y) makes with the positive
x-axis. We even have a formula for f :
f (x, y) = 2 arctan( y / (x + √(x2 + y2 )) ).
Think of a spiral staircase with room in the middle. See Figure 8.9.
Figure 8.9: A spiral staircase with room in the middle: the graph of θ = f (x, y).
The function is differentiable, and the derivative is bounded on U, which is not hard to see. Now
think of what happens near where the negative x-axis cuts the annulus in half. As we approach this
cut from positive y, f (x, y) approaches π . From negative y, f (x, y) approaches −π . So for small
ε > 0, | f (−1, ε ) − f (−1, −ε )| approaches 2π , but k(−1, ε ) − (−1, −ε )k = 2ε , which is arbitrarily
small. The conclusion of the proposition does not hold for this nonconvex U.
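A quick numerical illustration of this failure (a Python sketch; the formula for f is the one displayed above):

    import numpy as np

    # Angle function on the cut annulus: f(x, y) = 2*arctan(y/(x + sqrt(x^2+y^2))).
    def f(x, y):
        return 2 * np.arctan(y / (x + np.sqrt(x**2 + y**2)))

    # Across the cut, |f(-1, eps) - f(-1, -eps)| -> 2*pi while the distance -> 0.
    for eps in [1e-1, 1e-3, 1e-5]:
        print(2 * eps, abs(f(-1.0, eps) - f(-1.0, -eps)))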
Let us solve the differential equation f ′ = 0.
Corollary 8.4.4. If U ⊂ Rn is connected and f : U → Rm is differentiable and f ′ (x) = 0, for all
x ∈ U, then f is constant.
Proof. For any x ∈ U, there is a ball B(x, δ) ⊂ U. The ball B(x, δ) is convex. Since ‖f′(y)‖ = 0 for all y ∈ B(x, δ), then by the proposition, ‖f(x) − f(y)‖ ≤ 0‖x − y‖ = 0. So f(x) = f(y) for all y ∈ B(x, δ).
This means that f^{−1}(c) is open for every c ∈ R^m. Suppose f^{−1}(c) is nonempty. The two sets
U′ := f^{−1}(c),   U″ := f^{−1}(R^m \ {c})
are open and disjoint, and furthermore U = U′ ∪ U″. So as U′ is nonempty and U is connected, we have that U″ = ∅. So f(x) = c for all x ∈ U.
Theorem. Let U ⊂ R^n be open and f : U → R^m. The function f is continuously differentiable if and only if all the partial derivatives of f exist and are continuous on U.
Without continuity the theorem does not hold. Just because partial derivatives exist does not mean that f is differentiable; in fact, f may not even be continuous. See the exercises for the last section and also for this section.
Proof. We proved that if f is differentiable, then the partial derivatives exist. The partial derivatives
are the entries of the matrix of f ′ (x). If f ′ : U → L(Rn , Rm ) is continuous, then the entries are
continuous, and hence the partial derivatives are continuous.
To prove the opposite direction, suppose the partial derivatives exist and are continuous. Fix
x ∈ U. If we show that f ′ (x) exists we are done, because the entries of the matrix f ′ (x) are the partial
derivatives and if the entries are continuous functions, the matrix-valued function f ′ is continuous.
We do induction on dimension. First, the conclusion is true when n = 1. In this case the
derivative is just the regular derivative (exercise, noting that f is vector-valued).
Suppose the conclusion is true for Rn−1 , that is, if we restrict to the first n − 1 variables, the
function is differentiable. It is easy to see that the first n − 1 partial derivatives of f restricted to the
set where the last coordinate is fixed are the same as those for f . In the following, by a slight abuse
of notation, we think of Rn−1 as a subset of Rn , that is the set in Rn where xn = 0. In other words,
we identify the vectors (x_1, x_2, ..., x_{n−1}) and (x_1, x_2, ..., x_{n−1}, 0). Let A be the m-by-n matrix of all the partial derivatives at x, that is, the matrix with entries ∂f_i/∂x_j(x) for i = 1, ..., m and j = 1, ..., n. Let A′ be the m-by-(n−1) matrix consisting of the first n − 1 columns of A, and let v ∈ R^m be the last column of A, that is, the vector with components ∂f_1/∂x_n(x), ..., ∂f_m/∂x_n(x).
Let ε > 0 be given. By the induction hypothesis, there is a δ > 0 such that for any k ∈ R^{n−1} with ‖k‖ < δ we have
‖f(x + k) − f(x) − A′k‖ / ‖k‖ < ε.
By continuity of the partial derivatives, suppose δ is small enough so that
| ∂f_j/∂x_n (x + h) − ∂f_j/∂x_n (x) | < ε,
for all j and all h ∈ R^n with ‖h‖ < δ.
Suppose h = k + t e_n is a vector in R^n, where k ∈ R^{n−1}, t ∈ R, such that ‖h‖ < δ. Then ‖k‖ ≤ ‖h‖ < δ. Note that Ah = A′k + tv.
As all the partial derivatives exist, by the mean value theorem, for each j there is some θ_j ∈ [0, t] (or [t, 0] if t < 0), such that
f_j(x + k + t e_n) − f_j(x + k) = t ∂f_j/∂x_n (x + k + θ_j e_n).
Note that if ‖h‖ < δ, then ‖k + θ_j e_n‖ ≤ ‖h‖ < δ. So to finish the estimate, write
f(x + h) − f(x) − Ah = ( f(x + k + t e_n) − f(x + k) − tv ) + ( f(x + k) − f(x) − A′k ).
The jth component of the first term equals t ( ∂f_j/∂x_n(x + k + θ_j e_n) − ∂f_j/∂x_n(x) ), which is less than ε|t| in absolute value, so the first term has norm at most √m ε|t| ≤ √m ε‖h‖; the second term has norm less than ε‖k‖ ≤ ε‖h‖. Hence ‖f(x + h) − f(x) − Ah‖ ≤ (√m + 1) ε ‖h‖, and f is differentiable at x with derivative A.
Proposition. Every polynomial p : R^n → R is continuously differentiable.
Proof. Consider the partial derivative of p in the x_n variable. Write p as
p(x) = Σ_{j=0}^{d} p_j(x_1, ..., x_{n−1}) x_n^j.
Then
∂p/∂x_n (x) = Σ_{j=1}^{d} j p_j(x_1, ..., x_{n−1}) x_n^{j−1},
which is again a polynomial. So the partial derivatives of polynomials exist and are again polynomials. By the continuity of algebraic operations, polynomials are continuous functions. Therefore p is continuously differentiable.
8.4.3 Exercises
Exercise 8.4.1: Define f : R^2 → R as
f(x, y) := (x^2 + y^2) sin( (x^2 + y^2)^{−1} )  if (x, y) ≠ (0, 0),   f(x, y) := 0  otherwise.
Show that f is differentiable at the origin, but that it is not continuously differentiable.
Note: Feel free to use what you know about sine and cosine from calculus.
Compute the partial derivatives ∂f/∂x and ∂f/∂y at all points and show that these are not continuous functions.
Exercise 8.4.3: Let B(0, 1) ⊂ R^2 be the unit ball (disc), that is, the set given by x^2 + y^2 < 1. Suppose f : B(0, 1) → R is a differentiable function such that |f(0, 0)| ≤ 1, and |∂f/∂x| ≤ 1 and |∂f/∂y| ≤ 1 for all points in B(0, 1).
a) Find an M ∈ R such that ‖f′(x, y)‖ ≤ M for all (x, y) ∈ B(0, 1).
b) Find a B ∈ R such that |f(x, y)| ≤ B for all (x, y) ∈ B(0, 1).
Exercise 8.4.4: Define ϕ : [0, 2π] → R^2 by ϕ(t) := (sin(t), cos(t)). Compute ϕ′(t) for all t. Compute ‖ϕ′(t)‖ for all t. Notice that ϕ′(t) is never zero, yet ϕ(0) = ϕ(2π); therefore, Rolle's theorem is not true in more than one dimension.
Exercise 8.4.5: Let f : R^2 → R be a function such that ∂f/∂x and ∂f/∂y exist at all points and there exists an M ∈ R such that |∂f/∂x| ≤ M and |∂f/∂y| ≤ M at all points. Show that f is continuous.
Exercise 8.4.6: Let f : R^2 → R be a function and M ∈ R, such that for every (x, y) ∈ R^2, the function g(t) := f(xt, yt) is differentiable and |g′(t)| ≤ M.
a) Show that f is continuous at (0, 0).
b) Find an example of such an f which is not continuous at any other point of R^2.
Hint: Think back to how we constructed a nowhere continuous function on [0, 1].
Exercise 8.4.8: Suppose f : Rn → R and h : Rn → R are two differentiable functions such that f ′ (x) = h′ (x)
for all x ∈ Rn . Prove that if f (0) = h(0) then f (x) = h(x) for all x ∈ Rn .
Exercise 8.4.9: Prove the assertion about the base case in the proof of the theorem above. That is, prove that if n = 1 and the partials exist and are continuous, then the function is continuously differentiable.
is continuously differentiable, and that it is the solution of the partial differential equation ∂F/∂y = h with the initial condition F(x, 0) = g(x) for all x ∈ R.
8.5 Inverse and Implicit Function Theorem
The contraction mapping principle says that if f : X → X is a contraction and X is a complete metric
space, then there exists a unique fixed point, that is, there exists a unique x ∈ X such that f (x) = x.
Intuitively if a function is continuously differentiable, then it locally “behaves like” the derivative
(which is a linear function). The idea of the inverse function theorem is that if a function is
continuously differentiable and the derivative is invertible, the function is (locally) invertible.
Theorem 8.5.1 (Inverse function theorem). Let U ⊂ R^n be an open set and let f : U → R^n be a continuously differentiable function. Also suppose p ∈ U, f(p) = q, and f′(p) is invertible (that is, J_f(p) ≠ 0). Then there exist open sets V, W ⊂ R^n such that p ∈ V ⊂ U, f(V) = W, and f|_V is one-to-one. Hence a g : W → V exists such that g(y) := (f|_V)^{−1}(y); see the figure below. Furthermore, g is continuously differentiable and
g′(y) = ( f′(x) )^{−1},   for all x ∈ V, y = f(x).
[Figure: f maps V ⊂ U one-to-one onto W = f(V), with p ∈ V mapping to q = f(p); the inverse g maps W back to V.]
Proof. Write A = f′(p). As f′ is continuous, there exists an open ball V around p such that
‖A − f′(x)‖ < 1/(2‖A^{−1}‖)   for all x ∈ V.
Given y ∈ R^n, define ϕ_y(x) := x + A^{−1}( y − f(x) ). As A^{−1} is one-to-one, ϕ_y(x) = x (x is a fixed point) if and only if y − f(x) = 0, or in other words f(x) = y. Using the chain rule we obtain
ϕ_y′(x) = I − A^{−1} f′(x) = A^{−1}( A − f′(x) ).
So for x ∈ V we have
‖ϕ_y′(x)‖ ≤ ‖A^{−1}‖ ‖A − f′(x)‖ < 1/2.
As V is a ball, it is convex. Hence
‖ϕ_y(x_1) − ϕ_y(x_2)‖ ≤ (1/2) ‖x_1 − x_2‖   for all x_1, x_2 ∈ V.
In other words, ϕ_y is a contraction defined on V, though we so far do not know what the range of ϕ_y is. We cannot yet apply the fixed point theorem, but we can say that ϕ_y has at most one fixed point in V: If ϕ_y(x_1) = x_1 and ϕ_y(x_2) = x_2, then ‖x_1 − x_2‖ = ‖ϕ_y(x_1) − ϕ_y(x_2)‖ ≤ (1/2)‖x_1 − x_2‖, so x_1 = x_2. That is, there exists at most one x ∈ V such that f(x) = y, and so f|_V is one-to-one.
Let W := f (V ). We need to show that W is open. Take a y0 ∈ W . There is a unique x0 ∈ V such
that f (x0 ) = y0 . Let r > 0 be small enough such that the closed ball C(x0 , r) ⊂ V (such r > 0 exists
as V is open).
Suppose y is such that
‖y − y_0‖ < r/(2‖A^{−1}‖).
If we show that y ∈ W, then we have shown that W is open. If x ∈ C(x_0, r), then, as ϕ_y(x_0) − x_0 = A^{−1}(y − y_0),
‖ϕ_y(x) − x_0‖ ≤ ‖ϕ_y(x) − ϕ_y(x_0)‖ + ‖ϕ_y(x_0) − x_0‖ ≤ (1/2)‖x − x_0‖ + ‖A^{−1}‖‖y − y_0‖ < r/2 + r/2 = r.
So ϕ_y takes C(x_0, r) into B(x_0, r) ⊂ C(x_0, r). It is a contraction on C(x_0, r) and C(x_0, r) is complete (a closed subset of R^n is complete). Apply the contraction mapping principle to obtain a fixed point x, i.e. ϕ_y(x) = x. That is, f(x) = y, and so y ∈ f(C(x_0, r)) ⊂ f(V) = W. Therefore W is open.
Next we need to show that g is continuously differentiable and compute its derivative. First let us show that it is differentiable. Let y ∈ W and k ∈ R^n, k ≠ 0, such that y + k ∈ W. Because f|_V is a one-to-one and onto mapping of V onto W, there are unique x ∈ V and h ∈ R^n, h ≠ 0 and x + h ∈ V, such that f(x) = y and f(x + h) = y + k. In other words, g(y) = x and g(y + k) = x + h. See the figure below.
We can still squeeze some information from the fact that ϕ_y is a contraction:
ϕ_y(x + h) − ϕ_y(x) = h + A^{−1}( f(x) − f(x + h) ) = h − A^{−1}k.
So
‖h − A^{−1}k‖ = ‖ϕ_y(x + h) − ϕ_y(x)‖ ≤ (1/2)‖x + h − x‖ = ‖h‖/2.
[Figure: x and x + h in V correspond under f to y and y + k in W; g maps them back.]
By the inverse triangle inequality, ‖h‖ − ‖A^{−1}k‖ ≤ ‖h‖/2, so
‖h‖ ≤ 2‖A^{−1}k‖ ≤ 2‖A^{−1}‖‖k‖.
In particular, as k goes to 0, so does h. For x ∈ V, the operator f′(x) is invertible, so let B := ( f′(x) )^{−1}. Since k = f(x + h) − f(x) and B^{−1} = f′(x),
g(y + k) − g(y) − Bk = h − Bk = −B( f(x + h) − f(x) − f′(x)h ),
and therefore
‖g(y + k) − g(y) − Bk‖ / ‖k‖ ≤ 2‖A^{−1}‖ ‖B‖ ‖f(x + h) − f(x) − f′(x)h‖ / ‖h‖.
As k goes to 0, so does h. So the right-hand side goes to 0 as f is differentiable, and hence the left-hand side also goes to 0. And B is precisely what we wanted g′(y) to be.
We have shown that g is differentiable; let us show it is C^1(W). The function g : W → V is continuous (it is differentiable), f′ is a continuous function from V to L(R^n), and X ↦ X^{−1} is a continuous function on the set of invertible operators. As g′(y) = ( f′(g(y)) )^{−1} is the composition of these three continuous functions, it is continuous.
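The proof is in fact constructive: iterating ϕ_y converges to the local inverse. A minimal numeric sketch (the map f below is an arbitrary illustrative choice):

    import numpy as np

    # Iterate phi_y(x) = x + A^{-1}(y - f(x)) with A = f'(p) to compute g(y).
    # Here f(x1, x2) = (x1 + x2^2, x2) and p = (0, 0), so A is the identity.
    f = lambda x: np.array([x[0] + x[1]**2, x[1]])
    A_inv = np.eye(2)
    y = np.array([0.05, 0.1])          # a point near q = f(p) = (0, 0)
    x = np.zeros(2)
    for _ in range(50):
        x = x + A_inv @ (y - f(x))     # the contraction phi_y
    print(x, f(x))                     # f(x) ~ y, so x ~ g(y)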
Corollary 8.5.2. Suppose U ⊂ R^n is an open set and f : U → R^n is continuously differentiable with f′(x) invertible for all x ∈ U. Then for every open set V ⊂ U, the set f(V) is open (f is an open mapping).
Proof. Without loss of generality, suppose U = V. For each point y ∈ f(V), we pick x ∈ f^{−1}(y) (there could be more than one such point), then by the inverse function theorem there is a neighborhood of x in V that maps onto a neighborhood of y. Hence f(V) is open.
Example 8.5.3: Neither the theorem nor the corollary holds if f′(x) is not invertible for some x. For example, the map f(x, y) := (x, xy) maps R^2 onto the set R^2 \ { (0, y) : y ≠ 0 }, which is neither open nor closed. In fact, f^{−1}(0, 0) = { (0, y) : y ∈ R }. This bad behavior only occurs on the y-axis; everywhere else the function is locally invertible. If we avoid the y-axis, f is even one-to-one.
Example 8.5.4: Also note that just because f′(x) is invertible everywhere does not mean that f is one-to-one globally. It is "locally" one-to-one but perhaps not "globally." For an example, take the map f : R^2 \ {0} → R^2 defined by f(x, y) := (x^2 − y^2, 2xy). It is left to the student to show that f is differentiable and the derivative is invertible at every point.
On the other hand, the mapping is 2-to-1 globally. For every (a, b) that is not the origin, there are exactly two solutions to x^2 − y^2 = a and 2xy = b. We leave it to the student to show that there is at least one solution, and then notice that replacing x and y with −x and −y we obtain another solution.
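In complex notation, f(x, y) = (x^2 − y^2, 2xy) is the squaring map w = z^2 for z = x + iy, so the two preimages of (a, b) are the two complex square roots of a + bi. A short sketch (the point (3, 4) is an arbitrary choice):

    import cmath

    # The two solutions of x^2 - y^2 = a, 2xy = b for (a, b) = (3, 4).
    a, b = 3.0, 4.0
    z = cmath.sqrt(complex(a, b))          # here 2 + 1j
    for sol in (z, -z):
        x, y = sol.real, sol.imag
        print((x, y), (x*x - y*y, 2*x*y))  # both map back to (3.0, 4.0)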
The invertibility of the derivative is not a necessary condition, just sufficient, for having a
continuous inverse and being an open mapping. For example, the function f (x) := x3 is an open
mapping from R to R and is globally one-to-one with a continuous inverse, although the inverse is
not differentiable at x = 0.
As a side note, there is a related famous and as yet unsolved problem called the Jacobian conjecture: If F : R^n → R^n is polynomial (each component is a polynomial) and J_F is a nonzero constant, does F have a polynomial inverse? The inverse function theorem gives a local C^1 inverse; the question is whether one can always find a global polynomial inverse.
Proposition 8.5.5. Let A = [Ax Ay ] ∈ L(Rn+m , Rm ) and suppose Ay is invertible. If B = −(Ay )−1 Ax ,
then
0 = A(x, Bx) = Ax x + Ay Bx.
Furthermore, y = Bx is the unique y ∈ Rm such that A(x, y) = 0.
The proof is obvious: we simply solve and obtain y = Bx. Another way to solve is to "complete the basis," that is, to add rows to the matrix until we have an invertible matrix. In this case, we construct the mapping (x, y) ↦ (x, A_x x + A_y y), find that this operator in L(R^{n+m}) is invertible, and read the map B off from the inverse. Let us show that the same can be done for C^1 functions.
Theorem 8.5.6 (Implicit function theorem). Let U ⊂ Rn+m be an open set and let f : U → Rm be a
C1 (U) mapping. Let (p, q) ∈ U be a point such that f (p, q) = 0 and such that
∂(f_1, ..., f_m)/∂(y_1, ..., y_m) (p, q) ≠ 0.
Then there exists an open set W ⊂ R^n with p ∈ W, an open set W′ ⊂ R^m with q ∈ W′, with W × W′ ⊂ U, and a C^1(W) mapping g : W → W′, with g(p) = q, such that for all x ∈ W, the point g(x) is the unique point in W′ such that
f(x, g(x)) = 0.
Furthermore, if [A_x  A_y] = f′(p, q), then
g′(p) = −(A_y)^{−1} A_x.
Figure 8.12: Implicit function theorem for f(x, y) = x^2 + y^2 − 1 in U = R^2 and (p, q) in the first quadrant.
Proof. Define F : U → R^{n+m} by F(x, y) := (x, f(x, y)). It is clear that F is C^1, and we want to show that the derivative at (p, q) is invertible.
Let us compute the derivative. We know that
‖f(p + h, q + k) − f(p, q) − A_x h − A_y k‖ / ‖(h, k)‖
goes to zero as ‖(h, k)‖ = √(‖h‖^2 + ‖k‖^2) goes to zero. But then so does
‖F(p + h, q + k) − F(p, q) − (h, A_x h + A_y k)‖ / ‖(h, k)‖ = ‖( h, f(p + h, q + k) − f(p, q) ) − (h, A_x h + A_y k)‖ / ‖(h, k)‖ = ‖f(p + h, q + k) − f(p, q) − A_x h − A_y k‖ / ‖(h, k)‖.
So the derivative of F at (p, q) takes (h, k) to (h, A_x h + A_y k). In block matrix form it is
[ I    0   ]
[ A_x  A_y ].
If (h, A_x h + A_y k) = (0, 0), then h = 0, and so A_y k = 0. As A_y is one-to-one, k = 0. Thus F′(p, q) is one-to-one, or in other words invertible, and we apply the inverse function theorem.
That is, there exists an open set V ⊂ R^{n+m} with F(p, q) = (p, 0) ∈ V, and a C^1 mapping G : V → R^{n+m}, such that F(G(x, s)) = (x, s) for all (x, s) ∈ V, G is one-to-one, and G(V) is open. Write G = (G_1, G_2) (the first n and the second m components of G). Then
F( G_1(x, s), G_2(x, s) ) = ( G_1(x, s), f(G_1(x, s), G_2(x, s)) ) = (x, s).
So x = G_1(x, s) and f( G_1(x, s), G_2(x, s) ) = f( x, G_2(x, s) ) = s. Plugging in s = 0, we obtain
f( x, G_2(x, 0) ) = 0.
As the set G(V ) is open and (p, q) ∈ G(V ), there exist some open sets W e and W ′ such that
We ×W ′ ⊂ G(V ) with p ∈ W e and q ∈ W ′ . Take W := x ∈ W e : G2 (x, 0) ∈ W ′ . The function that
takes x to G2 (x, 0) is continuous and therefore W is open. Define g : W → Rm by g(x) := G2 (x, 0),
which is the g in the theorem. The fact that g(x) is the unique point in W ′ follows because
W ×W ′ ⊂ G(V ) and G is one-to-one.
Next differentiate
x ↦ f(x, g(x))
at p; this map is identically zero on W, so its derivative is zero. Using the chain rule,
0 = A( h, g′(p)h ) = A_x h + A_y g′(p)h,
for all h ∈ R^n, and hence g′(p) = −(A_y)^{−1} A_x.
The way one usually applies the theorem is to solve a system of m equations in the unknowns y_1, ..., y_m, given x_1, ..., x_n:
f_1(x_1, ..., x_n, y_1, ..., y_m) = 0,
f_2(x_1, ..., x_n, y_1, ..., y_m) = 0,
...
f_m(x_1, ..., x_n, y_1, ..., y_m) = 0.
And the condition guaranteeing a solution is that this is a C^1 mapping (that all the components are C^1, or in other words all the partial derivatives exist and are continuous), and that the matrix
[ ∂f_1/∂y_1   ∂f_1/∂y_2   ...   ∂f_1/∂y_m ]
[ ∂f_2/∂y_1   ∂f_2/∂y_2   ...   ∂f_2/∂y_m ]
[    ...          ...      ...      ...    ]
[ ∂f_m/∂y_1   ∂f_m/∂y_2   ...   ∂f_m/∂y_m ]
is invertible at the point in question.
Example 8.5.7: Consider the set x^2 + y^2 − (z + 1)^3 = −1, e^x + e^y + e^z = 3 near the point (0, 0, 0). The function we are looking at is
f(x, y, z) := ( x^2 + y^2 − (z + 1)^3 + 1,  e^x + e^y + e^z − 3 ).
We find that
f′ = [ 2x   2y   −3(z + 1)^2 ]
     [ e^x  e^y  e^z         ].
The matrix consisting of the last two columns evaluated at (0, 0, 0),
[ 2(0)  −3(0 + 1)^2 ]   =   [ 0  −3 ]
[ e^0   e^0         ]       [ 1   1 ],
is invertible. Hence near (0, 0, 0) we can find y and z as C^1 functions of x such that for x near 0 we have
x^2 + y(x)^2 − (z(x) + 1)^3 = −1,   e^x + e^{y(x)} + e^{z(x)} = 3.
The theorem does not tell us how to find y(x) and z(x) explicitly; it just tells us they exist. In other words, near the origin the set of solutions is a smooth curve in R^3 that goes through the origin.
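While the theorem is not constructive, one can trace y(x) and z(x) numerically, for example with Newton's method in the (y, z) variables (a Python sketch; the value x = 0.1 and the iteration count are arbitrary):

    import numpy as np

    # Solve f(x, y, z) = 0 for (y, z) at fixed x = 0.1, starting from (0, 0).
    def f(x, y, z):
        return np.array([x**2 + y**2 - (z + 1)**3 + 1,
                         np.exp(x) + np.exp(y) + np.exp(z) - 3])

    def jac_yz(x, y, z):                  # 2x2 matrix of y and z partials
        return np.array([[2*y, -3*(z + 1)**2],
                         [np.exp(y), np.exp(z)]])

    x, y, z = 0.1, 0.0, 0.0
    for _ in range(20):
        y, z = np.array([y, z]) - np.linalg.solve(jac_yz(x, y, z), f(x, y, z))
    print(y, z, f(x, y, z))               # residual ~ 0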
An interesting observation from the proof is that we solved the equation f(x, g(x)) = s for all s in some neighborhood of 0, not just s = 0.
Remark 8.5.8. There are versions of the theorem for arbitrarily many derivatives. If f has k
continuous derivatives, then the solution also has k continuous derivatives. See also the next section.
8.5.2 Exercises
Exercise 8.5.1: Let C := { (x, y) ∈ R^2 : x^2 + y^2 = 1 }.
a) Solve for y in terms of x near (0, 1) (that is, find the function g from the implicit function theorem for a neighbourhood of the point (p, q) = (0, 1)).
b) Solve for y in terms of x near (0, −1).
c) Solve for x in terms of y near (−1, 0).
Exercise 8.5.2: Define f : R^2 → R^2 by f(x, y) := (x, y + h(x)) for some continuously differentiable function h of one variable.
a) Show that f is one-to-one and onto.
b) Compute f ′ .
c) Show that f ′ is invertible at all points, and compute its inverse.
Exercise 8.5.3: Define f : R^2 → R^2 \ {(0, 0)} by f(x, y) := ( e^x cos(y), e^x sin(y) ).
a) Show that f is onto.
b) Show that f′ is invertible at all points.
c) Show that f is not one-to-one; in fact, for every (a, b) ∈ R^2 \ {(0, 0)}, there exist infinitely many different points (x, y) ∈ R^2 such that f(x, y) = (a, b).
Therefore, invertible derivative at every point does not mean that f is invertible globally.
Note: Feel free to use what you know about sine and cosine from calculus.
Exercise 8.5.4: Find a map f : Rn → Rn that is one-to-one, onto, continuously differentiable, but f ′ (0) = 0.
Hint: Generalize f (x) = x3 from one to n dimensions.
Exercise 8.5.5: Consider z^2 + xz + y = 0 in R^3. Find an equation D(x, y) = 0, such that if D(x_0, y_0) ≠ 0 and z^2 + x_0 z + y_0 = 0 for some z ∈ R, then for points near (x_0, y_0) there exist exactly two distinct continuously differentiable functions r_1(x, y) and r_2(x, y) such that z = r_1(x, y) and z = r_2(x, y) solve z^2 + xz + y = 0. Do you recognize the expression D from algebra?
Exercise 8.5.6: Suppose f : (a, b) → R^2 is continuously differentiable and the first component (the x component) of ∇f(t) is not equal to 0 for all t ∈ (a, b). Prove that there exists an interval (c, d) and a continuously differentiable function g : (c, d) → R such that (x, y) ∈ f((a, b)) if and only if x ∈ (c, d) and y = g(x). In other words, the set f((a, b)) is a graph of g.
Prove that F is a bijective mapping from H to B(0, 1), it is continuously differentiable on H, and its inverse is
also continuously differentiable.
Exercise 8.5.10: Suppose U ⊂ R^2 is an open set and f : U → R is a C^1 function such that ∇f(x, y) ≠ 0 for all (x, y) ∈ U. Show that every level set is a C^1 smooth curve. That is, for every (x, y) ∈ U, there exists a C^1 function γ : (−δ, δ) → R^2 with γ′(0) ≠ 0 such that f(γ(t)) is constant for all t ∈ (−δ, δ).
Exercise 8.5.11: Suppose U ⊂ R^2 is an open set and f : U → R is a C^1 function such that ∇f(x, y) ≠ 0 for all (x, y) ∈ U. Show that for every (x, y) there exists a neighborhood V of (x, y), an open set W ⊂ R^2, and a bijective C^1 function g : W → V with a C^1 inverse, such that the level sets of f ∘ g are horizontal lines in W; that is, the set given by (f ∘ g)(s, t) = c for a constant c is a set of the form { (s, t_0) ∈ R^2 : s ∈ R, (s, t_0) ∈ W }, where t_0 is fixed. That is, the level curves can be locally "straightened."
8.6 Higher Order Derivatives
So a continuously differentiable, or C^1, function is one where all partial derivatives exist and are continuous, which agrees with our previous definition by the theorem of the last section. We could have required only that the kth order partial derivatives exist and are continuous, as the existence of lower order derivatives is clearly necessary to even define kth order partial derivatives, and these lower order derivatives are continuous as they are differentiable functions.
When the partial derivatives are continuous, we can swap their order.
Proposition 8.6.2. Suppose U ⊂ R^n is open and f : U → R is a C^2 function, and j and k are two integers from 1 to n. Then
∂^2 f/∂x_k∂x_j = ∂^2 f/∂x_j∂x_k.
Proof. Fix a p ∈ U, and let e j and ek be the standard basis vectors. Pick two positive numbers s and
t small enough so that p + s0 e j + t0 ek ∈ U whenever 0 < s0 ≤ s and 0 < t0 ≤ t. This can be done as
U is open and so contains a small open ball (or a box if you wish) around p.
Figure 8.13: Using the mean value theorem to estimate a second order partial derivative by a certain difference quotient.
See Figure 8.13. The relevant difference quotient is
g(s, t) := ( f(p + s e_j + t e_k) − f(p + s e_j) − f(p + t e_k) + f(p) ) / (st),
and two applications of the mean value theorem (as in the figure) give an s_0 ∈ (0, s) and a t_0 ∈ (0, t) such that g(s, t) = ∂^2 f/∂x_j∂x_k (p + s_0 e_j + t_0 e_k). The s_0 and t_0 depend on s and t, but 0 < s_0 < s and 0 < t_0 < t. Denote by R^2_+ the set of (s, t) where s > 0 and t > 0. The set R^2_+ is the domain of g, and (0, 0) is a cluster point of R^2_+. As (s, t) ∈ R^2_+ goes to (0, 0), (s_0, t_0) ∈ R^2_+ also goes to (0, 0). By continuity of the second partial derivatives,
lim_{(s,t)→(0,0)} g(s, t) = ∂^2 f/∂x_j∂x_k (p).
Now reverse the ordering. Start with the function σ ↦ f(p + σ e_j + t e_k) − f(p + σ e_j) and find an s_1 ∈ (0, s) such that
( f(p + t e_k + s e_j) − f(p + s e_j) − f(p + t e_k) + f(p) ) / s = ∂f/∂x_j (p + t e_k + s_1 e_j) − ∂f/∂x_j (p + s_1 e_j).
Find a t_1 ∈ (0, t) such that
( ∂f/∂x_j (p + t e_k + s_1 e_j) − ∂f/∂x_j (p + s_1 e_j) ) / t = ∂^2 f/∂x_k∂x_j (p + t_1 e_k + s_1 e_j).
So g(s, t) = ∂^2 f/∂x_k∂x_j (p + t_1 e_k + s_1 e_j) for the same g as above. And as before,
lim_{(s,t)→(0,0)} g(s, t) = ∂^2 f/∂x_k∂x_j (p).
As the limit of g at (0, 0) is unique, the two mixed partial derivatives at p are equal.
8.6.1 Exercises
Exercise 8.6.1: Suppose f : U → R is a C^2 function for some open U ⊂ R^n and p ∈ U. Use the proof of the proposition above to find an expression in terms of just the values of f (an analogue of the difference quotient for the first derivative) whose limit is ∂^2 f/∂x_j∂x_k (p).
∂^k f / ( ∂x_{j_k} ∂x_{j_{k−1}} ··· ∂x_{j_1} ) (p) = ∂^k f / ( ∂x_{j_{σ(k)}} ∂x_{j_{σ(k−1)}} ··· ∂x_{j_{σ(1)}} ) (p).
Exercise 8.6.4: Suppose ϕ : R^2 → R is a C^k function such that ϕ(0, θ) = ϕ(0, ψ) for all θ, ψ ∈ R and ϕ(r, θ) = ϕ(r, θ + 2π) for all r, θ ∈ R. Let F(r, θ) := ( r cos(θ), r sin(θ) ). Show that the function g : R^2 → R, given by g(x, y) := ϕ( F^{−1}(x, y) ), is well-defined (notice that F^{−1}(x, y) can only be defined locally), and when restricted to R^2 \ {0} it is a C^k function.
Note: Feel free to use what you know about sine and cosine from calculus.
Exercise 8.6.6: Suppose f : R2 → R is a function such that all first and second order partial derivatives
exist. Furthermore, suppose that all second order partial derivatives are bounded functions. Prove that f is
continuously differentiable.
Exercise 8.6.7: Follow the strategy below to prove the following simple version of the second derivative
test for functions defined on R2 (using (x, y) as coordinates): Suppose f : R2 → R is a twice continuously
differentiable function with a critical point at the origin, f′(0, 0) = 0. If
∂^2 f/∂x^2 (0, 0) > 0   and   ∂^2 f/∂x^2 (0, 0) ∂^2 f/∂y^2 (0, 0) − ( ∂^2 f/∂x∂y (0, 0) )^2 > 0,
then f has a (strict) local minimum at (0, 0). Use the following technique: First suppose without loss of
generality that f (0, 0) = 0. Then prove:
a) There exists an A ∈ L(R^2) such that g = f ∘ A satisfies ∂^2 g/∂x∂y (0, 0) = 0 and ∂^2 g/∂x^2 (0, 0) = ∂^2 g/∂y^2 (0, 0) = 1.
b) For every ε > 0 there exists a δ > 0 such that |g(x, y) − x^2 − y^2| < ε(x^2 + y^2) for all (x, y) ∈ B((0, 0), δ).
Hint: You can use Taylor’s theorem in one variable.
c) This means that g, and therefore f , has a strict local minimum at (0, 0).
Note: You must avoid the temptation to just apply the one variable second derivative test along lines through the origin; see the Peano surface of Exercise 8.3.11.
Chapter 9
One-Dimensional Integrals in Several Variables
9.1 Differentiation Under the Integral
Let f(x, y) be a function of two variables and define
g(y) := ∫_a^b f(x, y) dx.
If f is continuous on the compact rectangle [a, b] × [c, d], then Proposition 7.5.12 from volume I says that g is continuous on [c, d].
Suppose f is differentiable in y. The main question we want to ask is when can we "differentiate under the integral"; that is, when is it true that g is differentiable and its derivative is
g′(y) = ∫_a^b ∂f/∂y (x, y) dx ?
Differentiation is a limit, and therefore we are really asking when the two limiting operations of integration and differentiation commute. This is not always possible and some extra hypothesis is necessary. In particular, the first question we would face is the integrability of ∂f/∂y, but the formula can fail even if ∂f/∂y is integrable as a function of x for every fixed y.
Let us prove a simple, but perhaps the most useful version of this theorem.
Theorem 9.1.1 (Leibniz integral rule). Suppose f : [a, b] × [c, d] → R is a continuous function, such that ∂f/∂y exists for all (x, y) ∈ [a, b] × [c, d] and is continuous. Define
g(y) := ∫_a^b f(x, y) dx.
Then g : [c, d] → R is continuously differentiable and
g′(y) = ∫_a^b ∂f/∂y (x, y) dx.
The hypotheses on f and ∂f/∂y can be weakened (see e.g. Exercise 9.1.8) but not dropped outright. The main point in the proof is for ∂f/∂y to exist and be continuous for all x up to the ends, but we only need a small interval in the y direction. In applications, we often make [c, d] a small interval around the point where we need to differentiate.
Proof. Fix y ∈ [c, d] and let ε > 0 be given. As ∂f/∂y is continuous on [a, b] × [c, d], it is uniformly continuous. In particular, there exists δ > 0 such that whenever y_1 ∈ [c, d] with |y_1 − y| < δ and all x ∈ [a, b], we have
| ∂f/∂y (x, y_1) − ∂f/∂y (x, y) | < ε.
Suppose h is such that y + h ∈ [c, d] and |h| < δ. Fix x for a moment and apply the mean value theorem to find a y_1 between y and y + h such that
( f(x, y + h) − f(x, y) ) / h = ∂f/∂y (x, y_1).
Therefore,
| ( f(x, y + h) − f(x, y) ) / h − ∂f/∂y (x, y) | = | ∂f/∂y (x, y_1) − ∂f/∂y (x, y) | < ε.
In other words, the functions
x ↦ ( f(x, y + h) − f(x, y) ) / h   converge uniformly to   x ↦ ∂f/∂y (x, y)   as h → 0.
We defined uniform convergence for sequences although the idea is the same. You may replace h
with a sequence of nonzero numbers {hn } converging to 0 such that y + hn ∈ [c, d] and let n → ∞.
Consider the difference quotient of g,
( g(y + h) − g(y) ) / h = ( ∫_a^b f(x, y + h) dx − ∫_a^b f(x, y) dx ) / h = ∫_a^b ( f(x, y + h) − f(x, y) ) / h dx.
Uniform convergence implies the limit can be taken underneath the integral. So
lim_{h→0} ( g(y + h) − g(y) ) / h = ∫_a^b lim_{h→0} ( f(x, y + h) − f(x, y) ) / h dx = ∫_a^b ∂f/∂y (x, y) dx.
Example 9.1.2: For y ∈ [0, 1], define
g(y) := ∫_0^1 (x^y − 1)/ln(x) dx,
where the integrand is interpreted as its continuous extension to [0, 1] (see the exercises).
Figure 9.1: The graph of z = (x^y − 1)/ln(x) on [0, 1] × [0, 1].
Therefore g is a continuous function on [0, 1], and g(0) = 0. For any ε > 0, the y derivative of the integrand, x^y, is continuous on [0, 1] × [ε, 1]. Therefore, for y > 0 we may differentiate under the integral sign:
g′(y) = ∫_0^1 ln(x) x^y / ln(x) dx = ∫_0^1 x^y dx = 1/(y + 1).
We need to figure out g(1), knowing g′(y) = 1/(y + 1) and g(0) = 0. By elementary calculus we find g(1) = ∫_0^1 g′(y) dy = ln(2). Therefore
∫_0^1 (x − 1)/ln(x) dx = ln(2).
9.1.1 Exercises
Exercise 9.1.1: Prove the two statements that were asserted in the example above:
a) Prove (x − 1)/ln(x) extends to a continuous function of [0, 1]. That is, there exists a continuous function on [0, 1] that equals (x − 1)/ln(x) on (0, 1).
b) Prove (x^y − 1)/ln(x) extends to a continuous function on [0, 1] × [0, 1].
Exercise 9.1.3: Suppose f : R → R is an infinitely differentiable function (all derivatives exist) such that f(0) = 0. Then show that there exists an infinitely differentiable function g : R → R such that f(x) = x g(x). Finally, show that if f′(0) ≠ 0, then g(0) ≠ 0.
Hint: First write f(x) = ∫_0^x f′(s) ds and then rewrite the integral to go from 0 to 1.
Exercise 9.1.4: Compute ∫_0^1 e^{tx} dx. Derive the formula for ∫_0^1 x^n e^x dx not using integration by parts, but by differentiation underneath the integral.
Exercise 9.1.5: Let U ⊂ R^n be an open set and suppose f(x, y_1, y_2, ..., y_n) is a continuous function defined on [0, 1] × U ⊂ R^{n+1}. Suppose ∂f/∂y_1, ∂f/∂y_2, ..., ∂f/∂y_n exist and are continuous on [0, 1] × U. Then prove that F : U → R defined by
F(y_1, y_2, ..., y_n) := ∫_0^1 f(x, y_1, y_2, ..., y_n) dx
is continuously differentiable.
a) Prove that for any fixed y the function x ↦ f(x, y) is Riemann integrable on [0, 1] and
g(y) := ∫_0^1 f(x, y) dx = y/(2y^2 + 2).
Therefore,
g′(y) = (1 − y^2) / ( 2(y^2 + 1)^2 ).
b) Prove ∂f/∂y exists at all x and y and compute it.
c) Show that for all y,
∫_0^1 ∂f/∂y (x, y) dx
exists, but
g′(0) ≠ ∫_0^1 ∂f/∂y (x, 0) dx.
a) Prove f is continuous on all of R^2. Therefore the following function is well defined for every y ∈ R:
g(y) := ∫_0^1 f(x, y) dx.
b) Prove ∂f/∂y exists for all (x, y), but is not continuous at (0, 0).
c) Show that
∫_0^1 ∂f/∂y (x, 0) dx
does not exist even if we take improper integrals; that is, the limit
lim_{h→0^+} ∫_h^1 ∂f/∂y (x, 0) dx
does not exist.
Note: Feel free to use what you know about sine and cosine from calculus.
Exercise 9.1.8: Strengthen the Leibniz integral rule in the following way. Suppose f : (a, b) × (c, d) → R is a bounded continuous function, such that ∂f/∂y exists for all (x, y) ∈ (a, b) × (c, d) and is continuous and bounded. Define
g(y) := ∫_a^b f(x, y) dx.
Then g : (c, d) → R is continuously differentiable and
g′(y) = ∫_a^b ∂f/∂y (x, y) dx.
Hint: See also Exercise 7.5.18 and Theorem 6.2.10 from volume I.
9.2 Path Integrals
Figure 9.2: The path γ traversing the unit square.
The path is the unit square traversed counterclockwise. See Figure 9.2. It is a piecewise smooth path. For example, γ|_{[1,2]}(t) = (1, t − 1) and so (γ|_{[1,2]})′(t) = (0, 1) ≠ 0. Similarly for the other sides. Notice that (γ|_{[1,2]})′(1) = (0, 1), (γ|_{[0,1]})′(1) = (1, 0), but γ′(1) does not exist. At the corners γ is not differentiable. The path γ is a simple closed path, as γ|_{(0,4)} is one-to-one and γ(0) = γ(4).
∗ The word "smooth" can sometimes mean "infinitely differentiable" in the literature.
The definition of a piecewise smooth path as we have given it implies continuity (exercise). For
general functions, many authors also allow finitely many discontinuities, when they use the term
piecewise smooth, and so one may say that we defined a piecewise smooth path to be a continuous
piecewise smooth function. While one may get by with smooth paths, for computations, the simplest
paths to write down are often piecewise smooth.
Generally, we are interested in the direct image γ([a, b]), rather than the specific parametrization, although that is also important to some degree. When we informally talk about a path or a curve, we often mean the set γ([a, b]), depending on context.
Example 9.2.3: The condition γ′(t) ≠ 0 means that the image γ([a, b]) has no "corners" where γ is smooth. Consider
γ(t) := (t^2, 0)  if t < 0,   γ(t) := (0, t^2)  if t ≥ 0.
See Figure 9.3. It is left for the reader to check that γ is continuously differentiable, yet the image γ(R) = { (x, y) ∈ R^2 : (x, y) = (s, 0) or (x, y) = (0, s) for some s ≥ 0 } has a "corner" at the origin. And that is because γ′(0) = (0, 0). More complicated examples with, say, infinitely many corners exist; see the exercises.
Figure 9.3: Smooth path with zero derivative with a corner. Several values of t are marked with dots.
The condition γ′(t) ≠ 0, even at the endpoints, guarantees not only no corners, but also that the path ends nicely, that is, that it can be extended a little bit past the endpoints. Again, see the exercises.
Example 9.2.8:
ω(x, y) := −y/(x^2 + y^2) dx + x/(x^2 + y^2) dy
is a one-form defined on R^2 \ {(0, 0)}.
Definition 9.2.9. Let γ : [a, b] → R^n be a smooth path and let ω = ω_1 dx_1 + ω_2 dx_2 + ··· + ω_n dx_n be a one-form defined on γ([a, b]). Define
∫_γ ω := ∫_a^b ( Σ_{j=1}^n ω_j(γ(t)) γ_j′(t) ) dt.
The notation makes sense from the formula you remember from calculus; let us state it somewhat informally: if x_j(t) = γ_j(t), then dx_j = γ_j′(t) dt.
Paths can be cut up or concatenated. The proof is a direct application of the additivity of the
Riemann integral, and is left as an exercise. The proposition justifies why we defined the integral
over a piecewise smooth path in the way we did, and it justifies that we may as well have taken any
partition not just the minimal one in the definition.
Proposition 9.2.10. Let γ : [a, c] → R^n be a piecewise smooth path, and b ∈ (a, c). Define the piecewise smooth paths α := γ|_{[a,b]} and β := γ|_{[b,c]}. Let ω be a one-form defined on γ([a, c]). Then
∫_γ ω = ∫_α ω + ∫_β ω.
Example 9.2.11: Let the one-form ω and the path γ : [0, 2π] → R^2 be defined by
ω(x, y) := −y/(x^2 + y^2) dx + x/(x^2 + y^2) dy,   γ(t) := ( cos(t), sin(t) ).
Then
∫_γ ω = ∫_0^{2π} ( (−sin(t))/(cos^2(t) + sin^2(t)) (−sin(t)) + (cos(t))/(cos^2(t) + sin^2(t)) cos(t) ) dt = ∫_0^{2π} 1 dt = 2π.
Next, let us parametrize the same curve as α : [0, 1] → R^2 defined by α(t) := ( cos(2πt), sin(2πt) ), that is, α is a smooth reparametrization of γ. Then
∫_α ω = ∫_0^1 ( (−sin(2πt))/(cos^2(2πt) + sin^2(2πt)) (−2π sin(2πt)) + (cos(2πt))/(cos^2(2πt) + sin^2(2πt)) (2π cos(2πt)) ) dt = ∫_0^1 2π dt = 2π.
Now let us reparametrize with β : [0, 2π] → R^2 as β(t) := ( cos(−t), sin(−t) ). Then
∫_β ω = ∫_0^{2π} ( (−sin(−t))/(cos^2(−t) + sin^2(−t)) sin(−t) + (cos(−t))/(cos^2(−t) + sin^2(−t)) (−cos(−t)) ) dt = ∫_0^{2π} (−1) dt = −2π.
The path α is an orientation preserving reparametrization of γ, and the integrals are the same. The path β is an orientation reversing reparametrization of γ, and the integral is minus the original.
The previous example is not a fluke. The path integral does not depend on the parametrization of the curve; the only thing that matters is the direction in which the curve is traversed.
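One can also approximate such path integrals numerically straight from the definition. A Riemann-sum sketch for the form and path of Example 9.2.11:

    import numpy as np

    # Riemann sum for the integral of (-y dx + x dy)/(x^2+y^2) over the circle.
    n = 100000
    dt = 2 * np.pi / n
    t = (np.arange(n) + 0.5) * dt
    x, y = np.cos(t), np.sin(t)
    dx, dy = -np.sin(t), np.cos(t)                  # gamma'(t)
    integrand = (-y * dx + x * dy) / (x**2 + y**2)  # identically 1 here
    print(np.sum(integrand) * dt)                   # ~ 6.28319 = 2*pi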
Proposition 9.2.12. Let γ : [a, b] → R^n be a piecewise smooth path and γ ∘ h : [c, d] → R^n a piecewise smooth reparametrization. Suppose ω is a one-form defined on the set γ([a, b]). Then
∫_{γ∘h} ω = ∫_γ ω   if h preserves orientation,
∫_{γ∘h} ω = −∫_γ ω   if h reverses orientation.
Proof. Assume first that γ and h are both smooth. Write ω = ω_1 dx_1 + ω_2 dx_2 + ··· + ω_n dx_n. Suppose that h is orientation preserving. Using the change of variables formula for the Riemann integral,
∫_γ ω = ∫_a^b ( Σ_{j=1}^n ω_j(γ(t)) γ_j′(t) ) dt
= ∫_c^d ( Σ_{j=1}^n ω_j(γ(h(τ))) γ_j′(h(τ)) h′(τ) ) dτ
= ∫_c^d ( Σ_{j=1}^n ω_j(γ(h(τ))) (γ_j ∘ h)′(τ) ) dτ = ∫_{γ∘h} ω.
If h is orientation reversing, it swaps the order of the limits on the integral and introduces a minus sign. The details, along with finishing the proof for piecewise smooth paths, are left as an exercise.
Due to this proposition (and the exercises), if Γ ⊂ R^n is the image γ([a, b]) of a simple piecewise smooth path, then as long as we somehow indicate the orientation, that is, the direction in which we traverse the curve, we can write
∫_Γ ω,
without mentioning the specific γ. Furthermore, for a simple closed path, it does not even matter where we start the parametrization. See the exercises.
Recall that simple means that γ restricted to (a, b) is one-to-one, that is, γ is one-to-one except perhaps at the endpoints. We also often relax the simple path condition a little bit. For example, it is enough that γ : [a, b] → R^n is one-to-one except at finitely many points; that is, there are only finitely many points p ∈ R^n such that γ^{−1}(p) is more than one point. See the exercises. The issue with injectivity is illustrated by the following example.
Example 9.2.13: Suppose γ : [0, 2π] → R^2 is given by γ(t) := ( cos(t), sin(t) ), and β : [0, 2π] → R^2 is given by β(t) := ( cos(2t), sin(2t) ). Notice that γ([0, 2π]) = β([0, 2π]); we travel around the same curve, the unit circle. But γ goes around the unit circle once in the counterclockwise direction, and β goes around the unit circle twice (in the same direction). Compute
∫_γ −y dx + x dy = ∫_0^{2π} ( (−sin(t))(−sin(t)) + (cos(t))(cos(t)) ) dt = 2π,
∫_β −y dx + x dy = ∫_0^{2π} ( (−sin(2t))(−2 sin(2t)) + (cos(2t))(2 cos(2t)) ) dt = 4π.
It is sometimes convenient to define a path integral over γ : [a, b] → R^n that is not a path. Define
∫_γ ω := ∫_a^b ( Σ_{j=1}^n ω_j(γ(t)) γ_j′(t) ) dt
for any continuously differentiable γ. A case that comes up naturally is when γ is constant. Then γ′(t) = 0 for all t, and γ([a, b]) is a single point, which we regard as a "curve" of length zero. Then, ∫_γ ω = 0 for any ω.
Figure 9.6: A path γ : [a, b] → R^2 in the xy-plane (bold curve), and a function z = f(x, y) graphed above it in the z direction. The integral is the shaded area depicted.
If γ : [a, b] → R^n is a smooth path and f is a continuous function defined on the image γ([a, b]), define
∫_γ f ds := ∫_a^b f(γ(t)) ‖γ′(t)‖ dt.
The definition for a piecewise smooth path is similar as before and is left to the reader.
The line integral of a function is also independent of the parametrization, and in this case, the
orientation does not matter.
Proposition 9.2.15. Let γ : [a, b] → R^n be a piecewise smooth path and γ ∘ h : [c, d] → R^n a piecewise smooth reparametrization. Suppose f is a continuous function defined on the set γ([a, b]). Then
∫_{γ∘h} f ds = ∫_γ f ds.
Proof. Suppose first that h is orientation preserving and that γ and h are both smooth. Then
∫_γ f ds = ∫_a^b f(γ(t)) ‖γ′(t)‖ dt
= ∫_c^d f(γ(h(τ))) ‖γ′(h(τ))‖ h′(τ) dτ
= ∫_c^d f(γ(h(τ))) ‖γ′(h(τ)) h′(τ)‖ dτ
= ∫_c^d f((γ ∘ h)(τ)) ‖(γ ∘ h)′(τ)‖ dτ
= ∫_{γ∘h} f ds.
If h is orientation reversing, it swaps the order of the limits on the integral, but you also have to introduce a minus sign in order to take h′ inside the norm. The details, along with finishing the proof for piecewise smooth paths, are left to the reader as an exercise.
Similarly as before, because of this proposition (and the exercises), if γ is simple, it does not matter which parametrization we use. Therefore, if Γ = γ([a, b]), we can simply write
∫_Γ f ds.
In this case we also do not need to worry about orientation; either way we get the same integral.
Example 9.2.16: Let f(x, y) := x. Let C ⊂ R^2 be half of the unit circle for x ≥ 0. We wish to compute
∫_C f ds.
Parametrize the curve C via γ : [−π/2, π/2] → R^2 defined as γ(t) := ( cos(t), sin(t) ). Then γ′(t) = ( −sin(t), cos(t) ), and
∫_C f ds = ∫_γ f ds = ∫_{−π/2}^{π/2} cos(t) √( (−sin(t))^2 + (cos(t))^2 ) dt = ∫_{−π/2}^{π/2} cos(t) dt = 2.
If γ is smooth,
ℓ(Γ) = ∫_a^b ‖γ′(t)‖ dt.
This may be a good time to mention that it is common to write ∫_a^b ‖γ′(t)‖ dt even if the path is only piecewise smooth. That is because ‖γ′(t)‖ has only finitely many discontinuities and is bounded, and so the integral exists.
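For instance, the length of the unit circle computed this way (a sketch):

    import numpy as np

    # l(Gamma) = integral of ||gamma'(t)|| for gamma(t) = (cos t, sin t).
    n = 100000
    dt = 2 * np.pi / n
    t = (np.arange(n) + 0.5) * dt
    print(np.sum(np.hypot(-np.sin(t), np.cos(t))) * dt)   # ~ 6.28319 = 2*pi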
Example 9.2.18: Let x, y ∈ R^n be two points and write [x, y] for the straight line segment between the two points x and y. Parametrize [x, y] by γ(t) := (1 − t)x + ty for t running between 0 and 1. Then γ′(t) = y − x, and therefore
ℓ([x, y]) = ∫_{[x,y]} ds = ∫_0^1 ‖y − x‖ dt = ‖y − x‖.
So the length of [x, y] is the standard euclidean distance between x and y, justifying the name.
9.2.4 Exercises
Exercise 9.2.1: Show that if ϕ : [a, b] → Rn is a piecewise smooth path as we defined it, then ϕ is a continuous
function.
Exercise 9.2.6: Suppose γ : [a, b] → R^n is a piecewise smooth path, and f is a continuous function defined on the image γ([a, b]). Provide a definition of ∫_γ f ds.
Exercise 9.2.9: Suppose γ : [a, b] → R^n is a smooth path. Show that there exists an ε > 0 and a smooth function γ̃ : (a − ε, b + ε) → R^n with γ̃(t) = γ(t) for all t ∈ [a, b] and γ̃′(t) ≠ 0 for all t ∈ (a − ε, b + ε). That is, prove that a smooth path extends some small distance past the endpoints.
Exercise 9.2.10: Suppose α : [a, b] → R^n and β : [c, d] → R^n are piecewise smooth paths such that Γ := α([a, b]) = β([c, d]). Show that there exist finitely many points {p_1, p_2, ..., p_k} ⊂ Γ, such that the sets α^{−1}({p_1, p_2, ..., p_k}) and β^{−1}({p_1, p_2, ..., p_k}) are partitions of [a, b] and [c, d], such that on any subinterval the paths are smooth (that is, they are partitions as in the definition of piecewise smooth path).
Exercise 9.2.11:
a) Suppose γ : [a, b] → R^n and α : [c, d] → R^n are two smooth paths which are one-to-one and γ([a, b]) = α([c, d]). Then there exists a smooth reparametrization h : [a, b] → [c, d] such that γ = α ∘ h.
Hint 1: It is not hard to show h exists. The trick is to prove it is continuously differentiable with a nonzero derivative. Apply the implicit function theorem, though it may at first seem the dimensions are wrong.
Hint 2: Worry about the derivative of h in (a, b) first.
b) Prove the same thing as part a), but now for simple closed paths, with the further assumption that γ(a) = γ(b) = α(c) = α(d).
c) Prove parts a) and b) but for piecewise smooth paths, obtaining piecewise smooth reparametrizations.
Hint: The trick is to find two partitions such that when restricted to a subinterval of the partition both paths have the same image and are smooth; see the above exercise.
Exercise 9.2.12: Suppose α : [a, b] → R^n and β : [b, c] → R^n are piecewise smooth paths with α(b) = β(b). Let γ : [a, c] → R^n be defined by
γ(t) := α(t)  if t ∈ [a, b],   γ(t) := β(t)  if t ∈ (b, c].
Show that γ is a piecewise smooth path, and that if ω is a one-form defined on the curve given by γ, then
∫_γ ω = ∫_α ω + ∫_β ω.
Exercise 9.2.13: Suppose γ : [a, b] → R^n and β : [c, d] → R^n are two simple piecewise smooth closed paths. That is, γ(a) = γ(b), β(c) = β(d), and the restrictions γ|_{(a,b)} and β|_{(c,d)} are one-to-one. Suppose Γ = γ([a, b]) = β([c, d]) and ω is a one-form defined on Γ ⊂ R^n. Show that either
∫_γ ω = ∫_β ω,   or   ∫_γ ω = −∫_β ω.
In particular, the notation ∫_Γ ω makes sense if we indicate the direction in which the integral is evaluated.
Hint: See the previous three exercises.
Exercise 9.2.14: Suppose γ : [a, b] → R^n and β : [c, d] → R^n are two piecewise smooth paths which are one-to-one except at finitely many points. That is, there are at most finitely many points p ∈ R^n such that γ^{−1}(p) or β^{−1}(p) contains more than one point. Suppose Γ = γ([a, b]) = β([c, d]) and ω is a one-form defined on Γ ⊂ R^n. Show that either
∫_γ ω = ∫_β ω,   or   ∫_γ ω = −∫_β ω.
In particular, the notation ∫_Γ ω makes sense if we indicate the direction in which the integral is evaluated.
Hint: Same hint as the last exercise.
Exercise 9.2.15: Define γ : [0, 1] → R^2 by γ(t) := ( t^3 sin(1/t), t( 3t^2 sin(1/t) − t cos(1/t) )^2 ) for t ≠ 0 and γ(0) := (0, 0). Show that:
a) γ is continuously differentiable on [0, 1].
b) Show that there exists an infinite sequence {t_n} in [0, 1] converging to 0, such that γ′(t_n) = (0, 0).
c) Show that the points γ(t_n) lie on the line y = 0 and that the x-coordinate of γ(t_n) alternates between positive and negative (if they do not alternate, you only found a subsequence; you need to find them all).
d) Show that there is no piecewise smooth α whose image equals γ([0, 1]). Hint: Look at part c) and show that α′ must be zero where it reaches the origin.
e) (Computer) If you know a plotting software that allows you to plot parametric curves, make a plot of the curve, but only for t in the range [0, 0.1]; otherwise you will not see the behavior. In particular, you should notice that γ([0, 1]) has infinitely many "corners" near the origin.
Note: Feel free to use what you know about sine and cosine from calculus.
9.3 Path Independence
Suppose the integral of ω is path independent on a path connected open set U ⊂ R^n. Fix p ∈ U and define
f(x) := ∫_p^x ω,
the integral taken over any piecewise smooth path in U from p to x (by path independence it does not matter which). Write ω = ω_1 dx_1 + ω_2 dx_2 + ··· + ω_n dx_n. We wish to show that for every j = 1, 2, ..., n, the partial derivative ∂f/∂x_j exists and is equal to ω_j.
Let e_j be an arbitrary standard basis vector, and h a nonzero real number. Compute
( f(x + h e_j) − f(x) ) / h = (1/h) ( ∫_p^{x+h e_j} ω − ∫_p^x ω ) = (1/h) ∫_x^{x+h e_j} ω,
which follows by Proposition 9.2.10 and path independence, as ∫_p^{x+h e_j} ω = ∫_p^x ω + ∫_x^{x+h e_j} ω: we pick a path from p to x + h e_j that also happens to pass through x, and then we cut this path in two.
Since U is open, suppose h is so small that all points of distance |h| or less from x are in U. As the integral is path independent, pick the simplest path possible from x to x + h e_j, that is, γ(t) := x + t h e_j for t ∈ [0, 1]. The path is in U. Notice γ′(t) = h e_j has only one nonzero component, the jth, which is h. Therefore
(1/h) ∫_x^{x+h e_j} ω = (1/h) ∫_γ ω = (1/h) ∫_0^1 ω_j(x + t h e_j) h dt = ∫_0^1 ω_j(x + t h e_j) dt.
We wish to take the limit as h → 0. The function ω_j is continuous at x. Given ε > 0, suppose h is small enough so that |ω_j(x) − ω_j(y)| < ε whenever ‖x − y‖ ≤ |h|. Hence, |ω_j(x + t h e_j) − ω_j(x)| < ε for all t ∈ [0, 1], and we estimate
| ∫_0^1 ω_j(x + t h e_j) dt − ω_j(x) | = | ∫_0^1 ( ω_j(x + t h e_j) − ω_j(x) ) dt | ≤ ε.
That is,
lim_{h→0} ( f(x + h e_j) − f(x) ) / h = ω_j(x).
All partials of f exist and are equal to ω j , which are continuous functions. Thus, f is continuously
differentiable, and furthermore d f = ω .
For the other direction, suppose a continuously differentiable f exists such that df = ω. Take a smooth path γ : [a, b] → U such that γ(a) = x and γ(b) = y. Then
∫_γ df = ∫_a^b ( ∂f/∂x_1(γ(t)) γ_1′(t) + ∂f/∂x_2(γ(t)) γ_2′(t) + ··· + ∂f/∂x_n(γ(t)) γ_n′(t) ) dt
= ∫_a^b d/dt [ f(γ(t)) ] dt
= f(y) − f(x).
The value of the integral only depends on x and y, not the path taken. Therefore the integral is path
independent. We leave checking this fact for a piecewise smooth path as an exercise.
Path independence can be stated more neatly in terms of integrals over closed paths.
Proposition 9.3.4. Let U ⊂ R^n be a path connected open set and ω a one-form defined on U. Then ω = df for some continuously differentiable f : U → R if and only if
∫_γ ω = 0   for every piecewise smooth closed path γ : [a, b] → U.
Proof. Suppose ω = df and let γ be a piecewise smooth closed path. Since γ(a) = γ(b) for a closed path, the previous proposition says
∫_γ ω = f(γ(b)) − f(γ(a)) = 0.
Now suppose that for every piecewise smooth closed path γ, ∫_γ ω = 0. Let x, y be two points in U and let α : [0, 1] → U and β : [0, 1] → U be two piecewise smooth paths with α(0) = β(0) = x and α(1) = β(1) = y. Define γ : [0, 2] → U by
γ(t) := α(t)  if t ∈ [0, 1],   γ(t) := β(2 − t)  if t ∈ (1, 2].
This path is piecewise smooth. This is due to the fact that γ|_{[0,1]}(t) = α(t) and γ|_{[1,2]}(t) = β(2 − t) (note especially γ(1) = α(1) = β(2 − 1)). It is also closed, as γ(0) = α(0) = β(0) = γ(2). So
0 = ∫_γ ω = ∫_α ω − ∫_β ω.
This follows first by cutting the path at t = 1 (Proposition 9.2.10), and then noticing that the second part is β travelled backwards, so that we get minus the β integral. Thus the integral of ω on U is path independent.
However one states path independence, it is often a difficult criterion to check: you have to verify something "for all paths."
There is a local criterion, a differential equation, that guarantees path independence, or in other
words an antiderivative f whose total derivative is the given one-form ω . Since the criterion is
local, we generally only get the result locally. We can find the antiderivative in any so-called
simply connected domain, which informally is a domain where any path between two points can be
“continuously deformed” into any other path between those two points. To make matters simple, the
usual way this result is proved is for so-called star-shaped domains. As balls are star-shaped we
have the result locally.
Definition 9.3.5. Let U ⊂ R^n be an open set and p ∈ U. We say U is a star-shaped domain with respect to p if for any other point x ∈ U, the line segment [p, x] is in U, that is, if (1 − t)p + tx ∈ U for all t ∈ [0, 1]. If we say simply star-shaped, then U is star-shaped with respect to some p ∈ U.
Notice the difference between star-shaped and convex. A convex domain is star-shaped, but a
star-shaped domain need not be convex.
Theorem 9.3.6 (Poincaré lemma). Let U ⊂ R^n be a star-shaped domain and ω a continuously differentiable one-form defined on U. That is, if
ω = ω_1 dx_1 + ω_2 dx_2 + ··· + ω_n dx_n,
then ω_1, ω_2, ..., ω_n are continuously differentiable functions. Suppose that for every j and k,
∂ω_j/∂x_k = ∂ω_k/∂x_j.
Then there exists a twice continuously differentiable function f : U → R such that df = ω.
The condition on the derivatives of ω is precisely the condition that the second partial derivatives commute. That is, if df = ω, and f is twice continuously differentiable, then
∂ω_j/∂x_k = ∂^2 f/∂x_k∂x_j = ∂^2 f/∂x_j∂x_k = ∂ω_k/∂x_j.
The condition is clearly necessary. The Poincaré lemma says that it is sufficient for a star-shaped U.
Therefore, what we know about integration of one-forms carries over to the integration of vector fields. For example, the integral
∫_x^y v · dγ
is path independent if and only if v = ∇f, that is, v is the gradient of a function. The function f is then called a potential for v.
A vector field v whose path integrals are path independent is called a conservative vector field. The naming comes from the fact that such vector fields arise in physical systems where a certain quantity, the energy, is conserved.
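For a conservative field, the integral over any closed path vanishes, which is easy to observe numerically. A sketch with the illustrative potential f(x, y) = x^2 y, so v = ∇f = (2xy, x^2), integrated over the unit circle:

    import numpy as np

    # Integrate v . dgamma over the unit circle for the gradient field v = grad f.
    n = 100000
    dt = 2 * np.pi / n
    t = (np.arange(n) + 0.5) * dt
    x, y = np.cos(t), np.sin(t)
    v1, v2 = 2*x*y, x**2
    print(np.sum(v1 * (-np.sin(t)) + v2 * np.cos(t)) * dt)   # ~ 0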
9.3.3 Exercises
Exercise 9.3.1: Find an f : R^2 → R such that df = x e^{x^2+y^2} dx + y e^{x^2+y^2} dy.
Exercise 9.3.2: Find an ω2 : R2 → R such that there exists a continuously differentiable f : R2 → R for
which d f = exy dx + ω2 dy.
Exercise 9.3.3: Finish the proof of the path-independence theorem above; that is, we only proved the second direction for a smooth path, not a piecewise smooth path.
Exercise 9.3.5: Show that U := R2 \ {(x, y) ∈ R2 : x ≤ 0, y = 0} is star-shaped and find all points (x0 , y0 ) ∈ U
such that U is star-shaped with respect to (x0 , y0 ).
Exercise 9.3.6: Suppose U1 and U2 are two open sets in Rn with U1 ∩ U2 nonempty and path connected.
Suppose there exists an f1 : U1 → R and f2 : U2 → R, both twice continuously differentiable such that
d f1 = d f2 on U1 ∩U2 . Then there exists a twice differentiable function F : U1 ∪U2 → R such that dF = d f1
on U1 and dF = d f2 on U2 .
Exercise 9.3.7 (Hard): Let γ : [a, b] → R^n be a simple nonclosed piecewise smooth path (so γ is one-to-one). Suppose ω is a continuously differentiable one-form defined on some open set V with γ([a, b]) ⊂ V and ∂ω_j/∂x_k = ∂ω_k/∂x_j for all j and k. Prove that there exists an open set U with γ([a, b]) ⊂ U ⊂ V and a twice continuously differentiable function f : U → R such that df = ω.
Hint 1: γ([a, b]) is compact.
Hint 2: Show that you can cover the curve by finitely many balls in sequence so that the kth ball only intersects the (k − 1)th ball.
Hint 3: See the previous exercise.
Exercise 9.3.8:
a) Show that a connected open set U ⊂ R^n is path connected. Hint: Start with a point x ∈ U, and let U_x ⊂ U be the set of points that are reachable by a path from x. Show that U_x and U \ U_x are both open, and since U_x is nonempty (x ∈ U_x), it must be that U_x = U.
b) Prove the converse, that is, that a path connected set U ⊂ R^n is connected. Hint: For contradiction, assume U is the union of two disjoint nonempty open sets, and then assume there is a piecewise smooth (and therefore continuous) path between a point in one and a point in the other.
Exercise 9.3.9: Usually path connectedness is defined using just continuous paths rather than piecewise smooth paths. Prove that the definitions are equivalent; in other words, prove the following statement: Suppose U ⊂ R^n is open and such that for any x, y ∈ U, there exists a continuous function γ : [a, b] → U such that γ(a) = x and γ(b) = y. Then U is path connected (in other words, there exists a piecewise smooth path).
Chapter 10
Multivariable Integral
10.1 Riemann Integral over Rectangles
A partition P of the closed rectangle R = [a1 , b1 ] × [a2 , b2 ] × · · · × [an , bn ] is a finite set of parti-
tions P1 , P2 , . . . , Pn of the intervals [a1 , b1 ], [a2 , b2 ], . . . , [an , bn ]. We write P = (P1 , P2 , . . . , Pn ). That is,
for every k = 1, 2, . . . , n there is an integer ℓk and a finite set of numbers Pk = {xk,0 , xk,1 , xk,2 , . . . , xk,ℓk }
such that
ak = xk,0 < xk,1 < xk,2 < · · · < xk,ℓk −1 < xk,ℓk = bk .
Picking a set of n integers j_1, j_2, ..., j_n, where j_k ∈ {1, 2, ..., ℓ_k}, we get the subrectangle
[x_{1,j_1−1}, x_{1,j_1}] × [x_{2,j_2−1}, x_{2,j_2}] × ··· × [x_{n,j_n−1}, x_{n,j_n}].
For simplicity, we order the subrectangles somehow and we say {R_1, R_2, ..., R_N} are the subrectangles corresponding to the partition P of R. More simply, we say they are the subrectangles of P. In other words, we subdivided the original rectangle into many smaller subrectangles. See Figure 10.1.
It is not difficult to see that these subrectangles cover our original R, and their volume sums to that of R. That is,
R = ⋃_{k=1}^N R_k,   and   V(R) = Σ_{k=1}^N V(R_k).
Figure 10.1: Example partition of a rectangle in R^2. The order of the subrectangles is not important.
When
R_i = [x_{1,j_1−1}, x_{1,j_1}] × [x_{2,j_2−1}, x_{2,j_2}] × ··· × [x_{n,j_n−1}, x_{n,j_n}],
then
V(R_i) = Δx_{1,j_1} Δx_{2,j_2} ··· Δx_{n,j_n} = (x_{1,j_1} − x_{1,j_1−1})(x_{2,j_2} − x_{2,j_2−1}) ··· (x_{n,j_n} − x_{n,j_n−1}).
For a bounded function f : R → R and the subrectangles R_1, ..., R_N of a partition P, define
m_i := inf { f(x) : x ∈ R_i },   M_i := sup { f(x) : x ∈ R_i },
L(P, f) := Σ_{i=1}^N m_i V(R_i),   U(P, f) := Σ_{i=1}^N M_i V(R_i).
We call L(P, f) the lower Darboux sum and U(P, f) the upper Darboux sum.
The indexing in the definition may be complicated, but fortunately we generally do not need to go back directly to the definition often. We start proving facts about the Darboux sums analogous to the one-variable results.
Proposition 10.1.2. Suppose R ⊂ R^n is a closed rectangle and f : R → R is a bounded function. Let m, M ∈ R be such that for all x ∈ R we have m ≤ f(x) ≤ M. For any partition P of R, we have
m V(R) ≤ L(P, f) ≤ U(P, f) ≤ M V(R).
Proof. Let P be a partition. Then for all i we have m ≤ m_i and M_i ≤ M. Also m_i ≤ M_i for all i. Finally, Σ_{i=1}^N V(R_i) = V(R). Therefore,
m V(R) = m ( Σ_{i=1}^N V(R_i) ) = Σ_{i=1}^N m V(R_i) ≤ Σ_{i=1}^N m_i V(R_i)
≤ Σ_{i=1}^N M_i V(R_i) ≤ Σ_{i=1}^N M V(R_i) = M ( Σ_{i=1}^N V(R_i) ) = M V(R).
Definition 10.1.4. Let R ⊂ R^n be a closed rectangle. Let P = (P_1, P_2, ..., P_n) and P̃ = (P̃_1, P̃_2, ..., P̃_n) be partitions of R. We say P̃ is a refinement of P if, as sets, P_k ⊂ P̃_k for all k = 1, 2, ..., n.
Figure 10.2: Example refinement of a partition. New "cuts" are marked in dashed lines. Do note that the exact order of the new subrectangles does not matter.
Proposition 10.1.5. Suppose R ⊂ R^n is a closed rectangle, f : R → R is a bounded function, and P̃ is a refinement of a partition P of R. Then
L(P, f) ≤ L(P̃, f)   and   U(P̃, f) ≤ U(P, f).
Proof. We prove the first inequality; the second follows similarly. Let R_1, R_2, ..., R_N be the subrectangles of P and R̃_1, R̃_2, ..., R̃_Ñ be the subrectangles of P̃. Let I_k be the set of all indices j such that R̃_j ⊂ R_k. For example, in Figures 10.1 and 10.2, I_4 = {6, 7, 8, 9}, as R_4 = R̃_6 ∪ R̃_7 ∪ R̃_8 ∪ R̃_9. Then,
R_k = ⋃_{j∈I_k} R̃_j,   V(R_k) = Σ_{j∈I_k} V(R̃_j).
Let m̃_j := inf { f(x) : x ∈ R̃_j }. As m_k ≤ m̃_j whenever R̃_j ⊂ R_k,
L(P, f) = Σ_{k=1}^N m_k V(R_k) = Σ_{k=1}^N Σ_{j∈I_k} m_k V(R̃_j) ≤ Σ_{k=1}^N Σ_{j∈I_k} m̃_j V(R̃_j) = Σ_{j=1}^{Ñ} m̃_j V(R̃_j) = L(P̃, f).
Define the lower Darboux integral ∫̲_R f := sup { L(P, f) : P a partition of R } and the upper Darboux integral ∫̄_R f := inf { U(P, f) : P a partition of R }. The key point of this next proposition is that the lower Darboux integral is less than or equal to the upper Darboux integral.
Proposition. Let R ⊂ R^n be a closed rectangle, f : R → R a bounded function, and m, M ∈ R such that m ≤ f(x) ≤ M for all x ∈ R. Then
m V(R) ≤ ∫̲_R f ≤ ∫̄_R f ≤ M V(R).   (∗)
Proof. Taking the supremum of L(P, f) and the infimum of U(P, f) over all P, we obtain the first and the last inequality in (∗) from Proposition 10.1.2.
The key inequality in (∗) is the middle one. Let P = (P_1, P_2, ..., P_n) and Q = (Q_1, Q_2, ..., Q_n) be partitions of R. Define P̃ = (P̃_1, P̃_2, ..., P̃_n) by letting P̃_k := P_k ∪ Q_k. Then P̃ is a partition of R, as can easily be checked, and P̃ is a refinement of P and a refinement of Q. By the previous proposition, L(P, f) ≤ L(P̃, f) and U(P̃, f) ≤ U(Q, f). Therefore,
L(P, f) ≤ L(P̃, f) ≤ U(P̃, f) ≤ U(Q, f).
In other words, for two arbitrary partitions P and Q we have L(P, f) ≤ U(Q, f). Via Proposition 1.2.7 from volume I, we obtain
sup { L(P, f) : P a partition of R } ≤ inf { U(P, f) : P a partition of R }.
In other words, ∫̲_R f ≤ ∫̄_R f. When the two are equal, we say f is Riemann integrable on R, we write f ∈ R(R), and we call the common value ∫_R f the Riemann integral.
Example 10.1.9: A constant function is Riemann integrable. Suppose f(x) = c for all x on R. Then
c V(R) ≤ ∫̲_R f ≤ ∫̄_R f ≤ c V(R).
So f is integrable, and furthermore ∫_R f = c V(R).
The proofs of linearity and monotonicity are almost completely identical to the proofs from one variable. We therefore leave it as an exercise to prove the next two propositions.
Proposition 10.1.10 (Linearity). Let R ⊂ R^n be a closed rectangle, let f and g be in R(R), and α ∈ R.
(i) αf is in R(R) and ∫_R αf = α ∫_R f.
(ii) f + g is in R(R) and ∫_R (f + g) = ∫_R f + ∫_R g.
Checking for integrability using the definition often involves the following technique, as in the single variable case.
Proposition. Let R ⊂ R^n be a closed rectangle and f : R → R a bounded function. Then f ∈ R(R) if and only if for every ε > 0 there exists a partition P of R such that U(P, f) − L(P, f) < ε.
Proof. First, if f is integrable, then clearly the supremum of L(P, f) and infimum of U(P, f) must be equal, and hence the infimum of U(P, f) − L(P, f) is zero. Therefore, for every ε > 0 there must be some partition P such that U(P, f) − L(P, f) < ε.
For the other direction, given an ε > 0 find P such that U(P, f) − L(P, f) < ε. Then
∫̄_R f − ∫̲_R f ≤ U(P, f) − L(P, f) < ε.
As ∫̄_R f ≥ ∫̲_R f and the above holds for every ε > 0, we conclude ∫̄_R f = ∫̲_R f and f ∈ R(R).
Proposition. Suppose S ⊂ R^n is a closed rectangle, R ⊂ S is a closed rectangle, and f : S → R is in R(S). Then the restriction f|_R is in R(R).
Proof. Given ε > 0, we find a partition P of S such that U(P, f) − L(P, f) < ε. By making a refinement of P if necessary, we assume that the endpoints of R are in P. In other words, R is a union of subrectangles of P. The subrectangles of P divide into two collections, ones that are subsets of R and ones whose intersection with the interior of R is empty. Suppose R_1, R_2, ..., R_K are the subrectangles that are subsets of R and let R_{K+1}, ..., R_N be the rest. Let P̃ be the partition of R composed of those subrectangles of P contained in R. Using the same notation as before,
ε > U(P, f) − L(P, f) = Σ_{k=1}^K (M_k − m_k) V(R_k) + Σ_{k=K+1}^N (M_k − m_k) V(R_k)
≥ Σ_{k=1}^K (M_k − m_k) V(R_k) = U(P̃, f|_R) − L(P̃, f|_R).
Therefore, f|_R is integrable.
Let U ⊂ R^n be open and f : U → R a function. Define the support of f as
\[ \operatorname{supp}(f) := \overline{\{ x \in U : f(x) \ne 0 \}}, \]
where the closure is with respect to the subspace topology on U. Taking the closure with respect to the subspace topology is the same as \( \overline{\{x \in U : f(x) \ne 0\}} \cap U \), where this closure is with respect to the ambient euclidean space R^n. In particular, supp(f) ⊂ U. The support is the closure (in U) of the set of points where the function is nonzero. Its complement in U is open. If x ∈ U and x is not in the support of f, then f is constantly zero on a whole neighborhood of x.
A function f is said to have compact support if supp( f ) is a compact set.
Example 10.1.16: The function f : R² → R defined by
\[ f(x,y) := \begin{cases} -x\,(x^2+y^2-1)^2 & \text{if } \sqrt{x^2+y^2} \le 1, \\ 0 & \text{else}, \end{cases} \]
is continuous, and its support is the closed unit disc \( C(0,1) = \{ (x,y) : \sqrt{x^2+y^2} \le 1 \} \), which is a compact set, so f has compact support. Do note that the function is zero on the entire y-axis and on the unit circle, but all points of the closed unit disc are still within the support, as they are in the closure of the set of points where f is nonzero. See Figure 10.3.
Figure 10.3: Function with compact support (left), the support is the closed unit disc (right).
If U 6= Rn , then you must be careful to take the closure in U. Consider the following two
examples.
Example 10.1.17: Consider the unit disc B(0,1) ⊂ R². The function f : B(0,1) → R defined by
\[ f(x,y) := \begin{cases} 0 & \text{if } \sqrt{x^2+y^2} > 1/2, \\ 1/2 - \sqrt{x^2+y^2} & \text{if } \sqrt{x^2+y^2} \le 1/2, \end{cases} \]
is continuous on B(0, 1) and its support is the smaller closed ball C(0, 1/2). As that is a compact set,
f has compact support.
Example 10.1.18: On the other hand, for the unit disc B(0,1) ⊂ R², the continuous function f : B(0,1) → R defined by \( f(x,y) := \sin\big(\frac{1}{1-x^2-y^2}\big) \) does not have compact support; as f is not constantly zero on a neighborhood of any point in B(0,1), we know that the support is the entire disc B(0,1). The function does not extend as above to a continuous function. In fact, it is not difficult to show that it cannot be extended in any way whatsoever to be continuous on all of R² (the boundary of the disc is the problem).
In particular, if f : R^n → R is continuous with compact support, and R and S are closed rectangles with supp(f) ⊂ R and supp(f) ⊂ S, then \( \int_S f = \int_R f \).

Proof. As f is continuous, it is automatically integrable on the rectangles R, S, and R ∩ S. By the results above,
\[ \int_S f = \int_{S \cap R} f = \int_R f. \]
Because of this proposition, when f : R^n → R has compact support and is integrable over a rectangle R containing the support, we write
\[ \int f := \int_R f \qquad \text{or} \qquad \int_{\mathbb{R}^n} f := \int_R f. \]
For example, if f is continuous and of compact support, then \( \int_{\mathbb{R}^n} f \) exists.
10.1.6 Exercises
Exercise 10.1.1: Suppose U ⊂ R^n is open and f : U → R is continuous and of compact support. Show that the function \( \tilde{f} : \mathbb{R}^n \to \mathbb{R} \),
\[ \tilde{f}(x) := \begin{cases} f(x) & \text{if } x \in U, \\ 0 & \text{otherwise}, \end{cases} \]
is continuous.
Exercise 10.1.4: Suppose R is a rectangle with the length of one of the sides equal to 0, and suppose S is a rectangle with R ⊂ S. If f is a bounded function such that f(x) = 0 for x ∈ S \ R, show that \( f \in \mathcal{R}(S) \) and \( \int_S f = 0 \).
Exercise 10.1.5: Suppose f : R^n → R is such that f(x) := 0 if x ≠ 0 and f(0) := 1. Show that f is integrable on R := [−1,1] × [−1,1] × ⋯ × [−1,1] directly using the definition, and find \( \int_R f \).
Exercise 10.1.6: Suppose R is a closed rectangle and h : R → R is a bounded function such that h(x) = 0 if x ∉ ∂R (the boundary of R). Let S be any closed rectangle. Show that \( h \in \mathcal{R}(S) \) and
\[ \int_S h = 0. \]
Hint: Write h as a sum of functions as in Exercise 10.1.4.
Exercise 10.1.7: Suppose R and R′ are two closed rectangles with R′ ⊂ R. Suppose f : R → R is in \( \mathcal{R}(R') \) and f(x) = 0 for x ∈ R \ R′. Show that \( f \in \mathcal{R}(R) \) and
\[ \int_{R'} f = \int_R f. \]
Do this in the following steps.
a) First do the proof assuming that furthermore f(x) = 0 whenever x ∈ ∂R′.
b) Write f(x) = g(x) + h(x), where g(x) = 0 whenever x ∈ R \ R′ or x ∈ ∂R′, and h(x) is zero except perhaps on ∂R′. Then show \( \int_R h = \int_{R'} h = 0 \) (see Exercise 10.1.6).
c) Show \( \int_{R'} f = \int_R f \).
Exercise 10.1.8: Suppose R′ ⊂ R^n and R′′ ⊂ R^n are two rectangles such that R = R′ ∪ R′′ is a rectangle, and R′ ∩ R′′ is a rectangle with one of the sides having length 0 (that is, V(R′ ∩ R′′) = 0). Let f : R → R be a function such that \( f \in \mathcal{R}(R') \) and \( f \in \mathcal{R}(R'') \). Show that \( f \in \mathcal{R}(R) \) and
\[ \int_R f = \int_{R'} f + \int_{R''} f. \]
Hint: See the previous exercise.
Exercise 10.1.9: Prove a stronger version of the proposition above. Suppose f : R^n → R is a function with compact support but not necessarily continuous. Prove that if R is a closed rectangle such that supp(f) ⊂ R and f is integrable over R, then for any other closed rectangle S with supp(f) ⊂ S, the function f is integrable over S and \( \int_S f = \int_R f \). Hint: See the previous exercises.
Exercise 10.1.10: Suppose R and S are closed rectangles of R^n. Define f : R^n → R as f(x) := 1 if x ∈ R, and f(x) := 0 otherwise. Prove f is integrable over S and compute \( \int_S f \). Hint: Consider S ∩ R.
Exercise 10.1.11: Let R = [0,1] × [0,1] ⊂ R².
a) Suppose f : R → R is defined by
\[ f(x,y) := \begin{cases} 1 & \text{if } x = y, \\ 0 & \text{else}. \end{cases} \]
Show that \( f \in \mathcal{R}(R) \) and compute \( \int_R f \).
b) Suppose f : R → R is defined by
\[ f(x,y) := \begin{cases} 1 & \text{if } x \in \mathbb{Q} \text{ or } y \in \mathbb{Q}, \\ 0 & \text{else}. \end{cases} \]
Show that \( f \notin \mathcal{R}(R) \).
Exercise 10.1.12: Suppose R is a closed rectangle, and suppose \( S_j \) are closed rectangles such that \( S_j \subset R \) and \( S_j \subset S_{j+1} \) for all j. Suppose f : R → R is bounded and \( f \in \mathcal{R}(S_j) \) for all j. Show that \( f \in \mathcal{R}(R) \) and
\[ \lim_{j\to\infty} \int_{S_j} f = \int_R f. \]
Exercise 10.1.13: Suppose f : [−1,1] × [−1,1] → R is a Riemann integrable function such that f(x) = −f(−x). Using the definition, prove
\[ \int_{[-1,1]\times[-1,1]} f = 0. \]
To motivate iterated integration, consider f : [0,1] × [0,1] → R defined by f(x,y) := 1 if x = 1/2 and y ∈ Q, and f(x,y) := 0 otherwise. Given ε > 0, take the partition P with subrectangles
\[ R_1 := [0, 1/2-\varepsilon] \times [0,1], \quad R_2 := [1/2-\varepsilon, 1/2+\varepsilon] \times [0,1], \quad R_3 := [1/2+\varepsilon, 1] \times [0,1]. \]
Then
\[ L(P,f) = m_1 V(R_1) + m_2 V(R_2) + m_3 V(R_3) = 0(1/2-\varepsilon) + 0(2\varepsilon) + 0(1/2-\varepsilon) = 0, \]
and
\[ U(P,f) = M_1 V(R_1) + M_2 V(R_2) + M_3 V(R_3) = 0(1/2-\varepsilon) + 1(2\varepsilon) + 0(1/2-\varepsilon) = 2\varepsilon. \]
The upper and lower sums are arbitrarily close and the lower sum is always zero, so the function is integrable and \( \int_R f = 0 \).
For any y, the function that takes x to f(x,y) is zero except perhaps at the single point x = 1/2. We know that such a function is integrable and \( \int_0^1 f(x,y)\,dx = 0 \). Therefore,
\[ \int_0^1 \left( \int_0^1 f(x,y)\,dx \right) dy = 0. \]
However, if x = 1/2, the function that takes y to f(1/2,y) is the nonintegrable function that is 1 on the rationals and 0 on the irrationals. See Example 5.1.4 from volume I.
We will solve this problem of undefined inside integrals by using the upper and lower integrals,
which are always defined.
We split the coordinates of R^{n+m} into two parts. That is, we write the coordinates on R^{n+m} = R^n × R^m as (x,y), where x ∈ R^n and y ∈ R^m. For a function f(x,y) we write
\[ f_x(y) := f(x,y) \]
for a fixed x, and
\[ f^y(x) := f(x,y) \]
for a fixed y.
In other words,
\[ \int_{R\times S} f = \int_R \left( \overline{\int_S} f(x,y)\,dy \right) dx = \int_R \left( \underline{\int_S} f(x,y)\,dy \right) dx. \]
If it turns out that \( f_x \) is integrable for all x, for example when f is continuous, then we obtain the more familiar
\[ \int_{R\times S} f = \int_R \int_S f(x,y)\,dy\,dx. \]
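For example, for f(x,y) := xy on R × S = [0,1] × [0,2],
\[ \int_{R\times S} f = \int_0^1 \left( \int_0^2 xy\,dy \right) dx = \int_0^1 2x\,dx = 1. \]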
If we let
\[ m_k(x) := \inf_{y \in R'_k} f(x,y) = \inf_{y \in R'_k} f_x(y), \]

∗ Named after the Italian mathematician Guido Fubini (1879–1943).
then
\[ \sum_{k=1}^K m_{j,k}\,V(R'_k) \le \sum_{k=1}^K m_k(x)\,V(R'_k) = L(P', f_x) \le \underline{\int_S} f_x = g(x). \]
We thus obtain
\[ L\big((P,P'),f\big) \le \sum_{j=1}^N \left( \inf_{x\in R_j} g(x) \right) V(R_j) = L(P,g). \]
Similarly, \( U\big((P,P'),f\big) \ge U(P,h) \); the proof of this inequality is left as an exercise.
Putting this together we have
\[ L\big((P,P'),f\big) \le L(P,g) \le U(P,g) \le U(P,h) \le U\big((P,P'),f\big), \]
and hence
\[ U(P,h) - L(P,h) \le U\big((P,P'),f\big) - L\big((P,P'),f\big). \]
So if f is integrable, so is h, and as
\[ L\big((P,P'),f\big) \le L(P,h) \le U\big((P,P'),f\big), \]
we must have that \( \int_R h = \int_{R\times S} f \).
We can also do the iterated integration in opposite order. The proof of this version is almost
identical to version A (or follows quickly from version A), and we leave it as an exercise to the
reader.
Next, suppose for simplicity that \( f_x \) and \( f^y \) are integrable; for example, suppose that f is continuous. Then, putting the two versions together, we obtain the familiar
\[ \int_{R\times S} f = \int_R \int_S f(x,y)\,dy\,dx = \int_S \int_R f(x,y)\,dx\,dy. \]
Often the Fubini theorem is stated in two dimensions for a continuous function f : R → R on a rectangle R = [a,b] × [c,d]. Then the Fubini theorem states that
\[ \int_R f = \int_a^b \int_c^d f(x,y)\,dy\,dx = \int_c^d \int_a^b f(x,y)\,dx\,dy. \]
And the Fubini theorem is commonly thought of as the theorem that allows us to swap the order of iterated integrals.
Repeatedly applying the Fubini theorem gets us the following corollary: Let \( R := [a_1,b_1] \times [a_2,b_2] \times \cdots \times [a_n,b_n] \subset \mathbb{R}^n \) be a closed rectangle and let f : R → R be continuous. Then
\[ \int_R f = \int_{a_1}^{b_1} \int_{a_2}^{b_2} \cdots \int_{a_n}^{b_n} f(x_1,x_2,\ldots,x_n)\,dx_n\,dx_{n-1} \cdots dx_1. \]
Clearly, we can also switch the order of integration to any order we please. We can also relax the continuity requirement by making sure that all the intermediate functions are integrable, or by using upper or lower integrals.
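For instance, for f(x,y,z) := x + y + z on [0,1]³,
\[ \int_{[0,1]^3} f = \int_0^1\!\!\int_0^1\!\!\int_0^1 (x+y+z)\,dz\,dy\,dx = \int_0^1\!\!\int_0^1 \left( x + y + \tfrac{1}{2} \right) dy\,dx = \int_0^1 (x+1)\,dx = \tfrac{3}{2}. \]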
10.2.1 Exercises
Exercise 10.2.1: Compute \( \int_0^1 \int_{-1}^1 x e^{xy}\,dx\,dy \) in a simple way.
Exercise 10.2.2: Prove the assertion \( U\big((P,P'),f\big) \ge U(P,h) \) from the proof of the Fubini theorem above.
Exercise 10.2.3 (Easy): Prove the second version of the Fubini theorem, where the iterated integration is done in the opposite order.
Exercise 10.2.4: Let R = [a,b] × [c,d] and suppose f(x,y) is an integrable function on R such that for any fixed y, the function that takes x to f(x,y) is zero except at finitely many points. Show
\[ \int_R f = 0. \]
Exercise 10.2.5: Let R = [a,b] × [c,d] and f(x,y) := g(x)h(y) for two continuous functions g : [a,b] → R and h : [c,d] → R. Prove
\[ \int_R f = \left( \int_a^b g \right) \left( \int_c^d h \right). \]
Exercise 10.2.7: Suppose f(x,y) := g(x), where g : [a,b] → R is Riemann integrable. Show that f is Riemann integrable for any R = [a,b] × [c,d] and
\[ \int_R f = (d-c) \int_a^b g. \]
Figure 10.4: Outer measure construction; in this case \( S \subset R_1 \cup R_2 \cup R_3 \cup \cdots \), so \( m^*(S) \le V(R_1) + V(R_2) + V(R_3) + \cdots \).
An immediate and useful consequence (exercise) of the definition is that if A ⊂ B then m∗ (A) ≤
m∗ (B). It is also not difficult to show (exercise) that we obtain the same number m∗ (S) if we also
allow both finite and infinite sequences of rectangles in the definition.
The theory of measures on Rn is a very complicated subject. We will only require measure-zero
sets and so we focus on these. The set S is of measure zero if for every ε > 0 there exists a sequence
of open rectangles {R j } such that
\[ S \subset \bigcup_{j=1}^\infty R_j \qquad \text{and} \qquad \sum_{j=1}^\infty V(R_j) < \varepsilon. \tag{10.2} \]
If S is of measure zero and S′ ⊂ S, then S′ is of measure zero. We can use the same exact rectangles.
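For example, any countable set \( S = \{p_1, p_2, p_3, \ldots\} \subset \mathbb{R}^n \) is of measure zero: given ε > 0, cover each \( p_j \) by an open cube \( R_j \) centered at \( p_j \) with \( V(R_j) < 2^{-j}\varepsilon \); then \( S \subset \bigcup_j R_j \) and
\[ \sum_{j=1}^\infty V(R_j) < \sum_{j=1}^\infty 2^{-j}\varepsilon = \varepsilon. \]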
It is sometimes more convenient to use balls instead of rectangles. Furthermore, we can choose
balls no bigger than a fixed radius.
Proposition 10.3.2. Let δ > 0 be given. A set S ⊂ R^n is of measure zero if and only if for every ε > 0 there exists a sequence of open balls \( \{B_j\} \), with the radius of \( B_j \) satisfying \( r_j < \delta \), such that
\[ S \subset \bigcup_{j=1}^\infty B_j \qquad \text{and} \qquad \sum_{j=1}^\infty r_j^n < \varepsilon. \]
Proof. If C is a (closed or open) cube (a rectangle with all sides equal) of side s, then C is contained in a closed ball of radius \( \sqrt{n}\,s \), and therefore in an open ball of radius \( 2\sqrt{n}\,s \).
Suppose R is a (closed or open) rectangle. Let s be a number that is less than the smallest side of R and also such that \( 2\sqrt{n}\,s < \delta \). We claim R is contained in a union of closed cubes \( C_1, C_2, \ldots, C_k \) of sides s such that
\[ \sum_{j=1}^k V(C_j) \le 2^n V(R). \]
It is clearly true (without the \( 2^n \)) if R has sides that are integer multiples of s. So if a side is of length (ℓ + α)s, for ℓ ∈ N and 0 ≤ α < 1, then (ℓ + α)s ≤ 2ℓs. Increasing the side to 2ℓs, and then doing the same for every side, we obtain a new larger rectangle of volume at most \( 2^n \) times larger, but whose sides are multiples of s.
So suppose that there exist \( \{R_j\} \) as in the definition such that (10.2) is true. As we have seen above, we can choose closed cubes \( \{C_k\} \), with \( C_k \) of side \( s_k \) as above, that cover all the rectangles \( \{R_j\} \) and so that
\[ \sum_{k=1}^\infty s_k^n = \sum_{k=1}^\infty V(C_k) \le 2^n \sum_{j=1}^\infty V(R_j) < 2^n \varepsilon. \]
Covering each \( C_k \) with an open ball \( B_k \) of radius \( r_k = 2\sqrt{n}\,s_k < \delta \), we obtain
\[ \sum_{k=1}^\infty r_k^n < 2^{2n} n^{n/2}\,\varepsilon. \]
And as \( S \subset \bigcup_j R_j \subset \bigcup_k C_k \subset \bigcup_k B_k \), and ε > 0 was arbitrary, we are finished.
For the other direction, suppose S is covered by balls \( B_j \) of radii \( r_j \), such that \( \sum r_j^n < \varepsilon \), as in the statement of the proposition. Each \( B_j \) is contained in a cube \( R_j \) of side \( 2r_j \). So \( V(R_j) = (2r_j)^n = 2^n r_j^n \). Therefore,
\[ S \subset \bigcup_{j=1}^\infty R_j \qquad \text{and} \qquad \sum_{j=1}^\infty V(R_j) \le \sum_{j=1}^\infty 2^n r_j^n < 2^n \varepsilon. \]
The definition of outer measure could have been done with open balls as well, not only the characterization of null sets. We leave this generalization to the reader.
A countable union of measure zero sets is of measure zero.

Proof. Suppose
\[ S = \bigcup_{j=1}^\infty S_j, \]
where \( S_j \) are all measure zero sets. Let ε > 0 be given. For each j there exists a sequence of open rectangles \( \{R_{j,k}\}_{k=1}^\infty \) such that
\[ S_j \subset \bigcup_{k=1}^\infty R_{j,k} \qquad \text{and} \qquad \sum_{k=1}^\infty V(R_{j,k}) < 2^{-j}\varepsilon. \]
Then
\[ S \subset \bigcup_{j=1}^\infty \bigcup_{k=1}^\infty R_{j,k}. \]
As \( V(R_{j,k}) \) is always positive, the sum over all j and k can be done in any order. In particular, it can be done as
\[ \sum_{j=1}^\infty \sum_{k=1}^\infty V(R_{j,k}) < \sum_{j=1}^\infty 2^{-j}\varepsilon = \varepsilon. \]
As s is fixed, we make V(R) arbitrarily small by picking ε small enough, so \( P_s \) is of measure zero. Next,
\[ P = \bigcup_{j=1}^\infty P_j, \]
and P is of measure zero as a countable union of measure zero sets.
We wish to bound \( \sum (b_j - a_j) \) from below. Since [a,b] is compact, finitely many of the open intervals still cover [a,b]. As throwing out some of the intervals only makes the sum smaller, we only need to consider the finite number of intervals still covering [a,b]. If \( (a_i,b_i) \subset (a_j,b_j) \), then we can throw out \( (a_i,b_i) \) as well; in other words, the intervals that are left have distinct left endpoints, and whenever \( a_j < a_i < b_j \), then \( b_j < b_i \). Therefore \( [a,b] \subset \bigcup_{j=1}^k (a_j,b_j) \) for some k, and we assume that the intervals are sorted such that \( a_1 < a_2 < \cdots < a_k \). Since \( (a_2,b_2) \) is not contained in \( (a_1,b_1) \), since \( a_j > a_2 \) for all j > 2, and since the intervals must contain every point in [a,b], we find that \( a_2 < b_1 \), or in other words \( a_1 < a_2 < b_1 < b_2 \). Similarly \( a_j < a_{j+1} < b_j < b_{j+1} \). Furthermore, \( a_1 < a \) and \( b_k > b \). Thus,
\[ \sum_{j=1}^\infty (b_j - a_j) \ge \sum_{j=1}^k (b_j - a_j) \ge \sum_{j=1}^{k-1} (a_{j+1} - a_j) + (b_k - a_k) = b_k - a_1 > b - a. \]
Hence \( m^*([a,b]) \ge b - a \).
Proposition 10.3.7. Suppose E ⊂ R^n is a compact set of measure zero. Then for every ε > 0, there exist finitely many open rectangles \( R_1, R_2, \ldots, R_k \) such that
\[ E \subset R_1 \cup R_2 \cup \cdots \cup R_k \qquad \text{and} \qquad \sum_{j=1}^k V(R_j) < \varepsilon. \]
Also, for any δ > 0, there exist finitely many open balls \( B_1, B_2, \ldots, B_k \) of radii \( r_1, r_2, \ldots, r_k < \delta \) such that
\[ E \subset B_1 \cup B_2 \cup \cdots \cup B_k \qquad \text{and} \qquad \sum_{j=1}^k r_j^n < \varepsilon. \]
Proof. As E is of measure zero, there is a sequence of open rectangles \( \{R_j\} \) such that \( E \subset \bigcup_j R_j \) and \( \sum_j V(R_j) < \varepsilon \). By compactness, finitely many of these rectangles already cover E: there is some k such that \( E \subset R_1 \cup R_2 \cup \cdots \cup R_k \). Hence
\[ \sum_{j=1}^k V(R_j) \le \sum_{j=1}^\infty V(R_j) < \varepsilon. \]
The proof that we can choose balls instead of rectangles is left as an exercise.
Example 10.3.8: So that the reader is not under the impression that there are only very few measure zero sets and that these are simple, let us give an uncountable, compact, measure zero subset of [0,1]. For any x ∈ [0,1], write its representation in ternary notation
\[ x = \sum_{n=1}^\infty d_n 3^{-n}, \qquad \text{where } d_n = 0, 1, \text{ or } 2. \]
See §1.5 in volume I, in particular Exercise 1.5.4. Define the Cantor set C as
\[ C := \left\{ x \in [0,1] : x = \sum_{n=1}^\infty d_n 3^{-n}, \text{ where } d_n = 0 \text{ or } d_n = 2 \text{ for all } n \right\}. \]
That is, x is in C if it has a ternary expansion in only 0's and 2's. If x has two expansions, as long as one of them does not have any 1's, then x is in C. Define \( C_0 := [0,1] \) and
\[ C_k := \left\{ x \in [0,1] : x = \sum_{n=1}^\infty d_n 3^{-n}, \text{ where } d_n = 0 \text{ or } d_n = 2 \text{ for all } n = 1, 2, \ldots, k \right\}. \]
Clearly,
\[ C = \bigcap_{k=1}^\infty C_k. \]
See Figure 10.5.
We leave it as an exercise to prove that:
(i) Each \( C_k \) is a finite union of closed intervals. It is obtained by taking \( C_{k-1} \) and, from each of its closed intervals, removing the "middle third."
(ii) Therefore, each \( C_k \) is closed, and so C is closed.
(iii) Furthermore, \( m^*(C_k) = 1 - \sum_{n=1}^{k} \frac{2^{n-1}}{3^n} \).
(iv) Hence, \( m^*(C) = 0 \).
(v) The set C is in one-to-one correspondence with [0,1]; in other words, it is uncountable.
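As a quick check of (iii) and (iv): the removed proportions form a geometric series, and
\[ \sum_{n=1}^\infty \frac{2^{n-1}}{3^n} = \frac{1}{3} \sum_{n=0}^\infty \left(\frac{2}{3}\right)^n = \frac{1}{3} \cdot 3 = 1, \]
so \( m^*(C_k) \to 0 \) as k → ∞, and since \( C \subset C_k \) for every k, indeed \( m^*(C) = 0 \).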
Figure 10.5: Cantor set construction.
Lemma 10.3.9. Suppose U ⊂ Rn is an open set, B ⊂ U is an open (resp. closed) ball of radius
at most r, f : B → Rn is continuously differentiable and suppose k f ′ (x)k ≤ M for all x ∈ B. Then
f (B) ⊂ B′ , where B′ is an open (resp. closed) ball of radius at most Mr.
Proof. Without loss of generality assume B is a closed ball. The ball B is convex, and hence, by the mean value theorem for vector-valued functions applied along line segments in B, \( \|f(x) - f(y)\| \le M\|x-y\| \) for all x, y in B. In particular, if B = C(y,r), then \( f(B) \subset C\big(f(y), Mr\big) \).
The image of a measure zero set using a continuous map is not necessarily a measure zero
set, although this is not easy to show (see the exercises). However, if the mapping is continuously
differentiable, then the mapping cannot “stretch” the set that much.
Suppose U ⊂ R^n is an open set, f : U → R^n is continuously differentiable, and E ⊂ U is of measure zero. Then f(E) is of measure zero.

Proof. We leave the proof for a general measure zero set as an exercise, and we now prove the proposition for a compact measure zero set.
Suppose E is compact. First let us replace U by a smaller open set to make k f ′ (x)k bounded. At
each point x ∈ E pick an open ball B(x, rx ) such that the closed ball C(x, rx ) ⊂ U. By compactness
we only need to take finitely many points x1 , x2 , . . . , xq to still cover E. Define
\[ U' := \bigcup_{j=1}^q B(x_j, r_{x_j}), \qquad K := \bigcup_{j=1}^q C(x_j, r_{x_j}). \]
We have E ⊂ U′ ⊂ K ⊂ U. The set K is compact.
We have E ⊂ U ′ ⊂ K ⊂ U. The set K is compact. The function that takes x to k f ′ (x)k is continuous,
and therefore there exists an M > 0 such that k f ′ (x)k ≤ M for all x ∈ K. So without loss of generality
we may replace U by U ′ and from now on suppose that k f ′ (x)k ≤ M for all x ∈ U.
At each point x ∈ E pick a ball \( B(x, \delta_x) \) of maximum radius so that \( B(x,\delta_x) \subset U \). Let \( \delta := \inf_{x\in E} \delta_x \). Take a sequence \( \{x_j\} \subset E \) so that \( \delta_{x_j} \to \delta \). As E is compact, we can pick the sequence to be convergent to some y ∈ E. Once \( \|x_j - y\| < \delta_y/2 \), then \( \delta_{x_j} > \delta_y/2 \) by the triangle inequality. Therefore δ > 0.
Given ε > 0, there exist balls \( B_1, B_2, \ldots, B_k \) of radii \( r_1, r_2, \ldots, r_k < \delta \) such that
\[ E \subset B_1 \cup B_2 \cup \cdots \cup B_k \qquad \text{and} \qquad \sum_{j=1}^k r_j^n < \varepsilon. \]
The balls are contained in U. Let \( B'_1, B'_2, \ldots, B'_k \) be the balls of radii \( Mr_1, Mr_2, \ldots, Mr_k \) from the lemma above, so that \( f(B_j) \subset B'_j \) for all j. Then
\[ f(E) \subset f(B_1) \cup f(B_2) \cup \cdots \cup f(B_k) \subset B'_1 \cup B'_2 \cup \cdots \cup B'_k \qquad \text{and} \qquad \sum_{j=1}^k (Mr_j)^n < M^n \varepsilon. \]
As ε > 0 was arbitrary, f(E) is of measure zero.
10.3.4 Exercises
Exercise 10.3.1: Finish the proof of Proposition 10.3.7, that is, show that you can use balls instead of rectangles.
Exercise 10.3.3: Suppose X ⊂ Rn is a set such that for every ε > 0 there exists a set Y such that X ⊂ Y and
m∗ (Y ) ≤ ε. Prove that X is a measure zero set.
Exercise 10.3.5: The closure of a measure zero set can be quite large. Find an example set S ⊂ R^n that is of measure zero, but whose closure satisfies \( \overline{S} = \mathbb{R}^n \).
Exercise 10.3.7: Let U ⊂ R^n be an open set and let f : U → R be a continuously differentiable function. Let G := {(x,y) ∈ U × R : y = f(x)} be the graph of f. Show that G is of measure zero.
Exercise 10.3.8: Given a closed rectangle R ⊂ R^n, show that for any ε > 0 there exists a number s > 0 and finitely many open cubes \( C_1, C_2, \ldots, C_k \) of side s such that \( R \subset C_1 \cup C_2 \cup \cdots \cup C_k \) and
\[ \sum_{j=1}^k V(C_j) \le V(R) + \varepsilon. \]
Exercise 10.3.9: Show that there exists a number k = k(n, r, δ), depending only on n, r, and δ, such that the following holds: Given B(x,r) ⊂ R^n and δ > 0, there exist k open balls \( B_1, B_2, \ldots, B_k \) of radius at most δ such that \( B(x,r) \subset B_1 \cup B_2 \cup \cdots \cup B_k \). Note that you can find k that really only depends on n and the ratio δ/r.
Exercise 10.3.11: Prove that the Cantor set of Example 10.3.8 contains no interval. That is, whenever a < b, there exists a point x ∉ C such that a < x < b.
Note the consequence of this statement. While we proved that an open set in R is a countable disjoint union of
intervals, a closed set (even though it is just the complement of an open set) need not be a union of intervals.
Exercise 10.3.12 (Challenging): Let us construct the so-called Cantor function or the Devil's staircase. Let C be the Cantor set and let \( C_k \) be as in Example 10.3.8. Write x ∈ [0,1] in ternary representation \( x = \sum_{n=1}^\infty d_n 3^{-n} \). If \( d_n \ne 1 \) for all n, then let \( c_n := d_n/2 \) for all n. Otherwise, let k be the smallest integer such that \( d_k = 1 \). Then let \( c_n := d_n/2 \) if n < k, \( c_k := 1 \), and \( c_n := 0 \) if n > k. Then define
\[ \varphi(x) := \sum_{n=1}^\infty c_n 2^{-n}. \]
Figure 10.6: Cantor function or Devil’s staircase (the function ϕ from the exercise).
Exercise 10.3.13: Prove that we obtain the same outer measure if we allow both finite and infinite sequences in the definition. That is, define \( \mu^*(S) := \inf \sum_{j\in I} V(R_j) \), where the infimum is taken over all countable (finite or infinite) sets of open rectangles \( \{R_j\}_{j\in I} \) such that \( S \subset \bigcup_{j\in I} R_j \). Prove that for every S ⊂ R^n, \( \mu^*(S) = m^*(S) \).
Hence, o(f, x) = 0.
On the other hand, suppose that o(f, x) = 0. Given any ε > 0, find a δ > 0 such that o(f, x, δ) < ε. If \( y \in B_S(x,\delta) \), then
\[ |f(x) - f(y)| \le \sup_{y_1, y_2 \in B_S(x,\delta)} |f(y_1) - f(y_2)| = o(f, x, \delta) < \varepsilon. \]
So f is continuous at x.

So o(f, ξ) < ε as well. As this is true for all \( \xi \in B_S(x, \delta/2) \), we get that G is open in the subspace topology and S \ G is closed, as claimed.
The interiors of the rectangles \( T_x \) cover T. As T is compact, there exist finitely many such rectangles \( T_1, T_2, \ldots, T_m \) that cover T.
Take the rectangles \( T_1, \ldots, T_m \) and \( O_1, \ldots, O_k \) and construct a partition out of their endpoints. That is, construct a partition P of R with subrectangles \( R_1, R_2, \ldots, R_p \) such that every \( R_j \) is contained in \( T_\ell \) for some ℓ or in the closure of \( O_\ell \) for some ℓ. Order the rectangles so that \( R_1, \ldots, R_q \) are those contained in some \( T_\ell \), and \( R_{q+1}, \ldots, R_p \) are the rest. In particular,
\[ \sum_{j=1}^q V(R_j) \le V(R) \qquad \text{and} \qquad \sum_{j=q+1}^p V(R_j) \le \varepsilon. \]
Let \( m_j \) and \( M_j \) be the inf and sup of f over \( R_j \) as before. If \( R_j \subset T_\ell \) for some ℓ, then \( M_j - m_j < 2\varepsilon \). Let B ∈ R be such that |f(x)| ≤ B for all x ∈ R, so \( M_j - m_j \le 2B \) over all rectangles. Then
\[
U(P,f) - L(P,f) = \sum_{j=1}^p (M_j - m_j)V(R_j)
= \left( \sum_{j=1}^q (M_j - m_j)V(R_j) \right) + \left( \sum_{j=q+1}^p (M_j - m_j)V(R_j) \right)
\le \left( \sum_{j=1}^q 2\varepsilon\,V(R_j) \right) + \left( \sum_{j=q+1}^p 2B\,V(R_j) \right)
\le 2\varepsilon\,V(R) + 2B\varepsilon = \varepsilon\big(2V(R) + 2B\big).
\]
We can make the right-hand side as small as we want, hence f is integrable.
For the other direction, suppose f is Riemann integrable over R. Let S be the set of discontinuities again, and now let
\[ S_k := \{ x \in R : o(f,x) \ge 1/k \}. \]
Fix a k ∈ N. Given an ε > 0, find a partition P with subrectangles \( R_1, R_2, \ldots, R_p \) such that
\[ U(P,f) - L(P,f) = \sum_{j=1}^p (M_j - m_j)V(R_j) < \varepsilon. \]
If the interior of \( R_j \) contains a point of \( S_k \), then \( M_j - m_j \ge 1/k \), so the total volume of such subrectangles is less than kε; the remaining points of \( S_k \) lie on the boundaries of the subrectangles, which are of measure zero. As \( S_k \) can thus be covered by rectangles of arbitrarily small total volume, \( S_k \) must be of measure zero. As
\[ S = \bigcup_{k=1}^\infty S_k, \]
and a countable union of measure zero sets is of measure zero, S is of measure zero.
Corollary 10.4.4. Let R ⊂ Rn be a closed rectangle. Let R(R) be the set of Riemann integrable
functions on R. Then
(i) R(R) is a real algebra: if f , g ∈ R(R) and a ∈ R, then a f ∈ R(R), f + g ∈ R(R) and
f g ∈ R(R).
(ii) If f, g ∈ \( \mathcal{R}(R) \) and ϕ(x) := max{f(x), g(x)}, ψ(x) := min{f(x), g(x)}, then ϕ, ψ ∈ \( \mathcal{R}(R) \).
(iii) If f ∈ R(R), then | f | ∈ R(R), where | f |(x) := | f (x)|.
(iv) If R′ ⊂ Rm is another closed rectangle, U ⊂ Rn and U ′ ⊂ Rm are open sets such that R ⊂ U and
R′ ⊂ U ′ , g : U → U ′ is continuously differentiable, bijective, g−1 is continuously differentiable,
and f ∈ R(R′ ), then the composition f ◦ g is Riemann integrable on R.
10.4.3 Exercises
Exercise 10.4.1: Suppose f : (a, b) × (c, d) → R is a bounded continuous function. Show that the integral of
f over R = [a, b] × [c, d] makes sense and is uniquely defined. That is, set f to be anything on the boundary
of R and compute the integral.
Exercise 10.4.2: Suppose R ⊂ Rn is a closed rectangle. Show that R(R), the set of Riemann integrable
functions, is an algebra. That is, show that if f , g ∈ R(R) and a ∈ R, then a f ∈ R(R), f + g ∈ R(R) and
f g ∈ R(R).
Exercise 10.4.3: Suppose R ⊂ R^n is a closed rectangle and f : R → R is a bounded function which is zero except on a closed set E ⊂ R of measure zero. Show that \( \int_R f \) exists and compute it.
Exercise 10.4.4: Suppose R ⊂ R^n is a closed rectangle and f : R → R and g : R → R are two Riemann integrable functions. Suppose f = g except on a closed set E ⊂ R of measure zero. Show that \( \int_R f = \int_R g \).
Exercise 10.4.5: Suppose R ⊂ R^n is a closed rectangle and f : R → R is a bounded function.
a) Suppose there exists a closed set E ⊂ R of measure zero such that \( f|_{R\setminus E} \) is continuous. Then \( f \in \mathcal{R}(R) \).
b) Find an example where E ⊂ R is a set of measure zero (but not closed) such that \( f|_{R\setminus E} \) is continuous and \( f \notin \mathcal{R}(R) \).
Exercise 10.4.6: Suppose R ⊂ Rn is a closed rectangle and f : R → R and g : R → R are Riemann integrable
functions. Show that
ϕ(x) := max{ f (x), g(x)}, ψ(x) := min{ f (x), g(x)},
are Riemann integrable.
Exercise 10.4.7: Suppose R ⊂ Rn is a closed rectangle and f : R → R a Riemann integrable function. Show
that | f | is Riemann integrable. Hint: Define f+ (x) := max{ f (x), 0} and f− (x) := max{− f (x), 0}, and then
write | f | in terms of f+ and f− .
Exercise 10.4.8:
a) Suppose R ⊂ R^n and R′ ⊂ R^m are closed rectangles, U ⊂ R^n and U′ ⊂ R^m are open sets such that R ⊂ U and R′ ⊂ U′, g : U → U′ is continuously differentiable, bijective, \( g^{-1} \) is continuously differentiable, and \( f \in \mathcal{R}(R') \). Prove that the composition f ∘ g is Riemann integrable on R.
b) Find a counterexample when g is not one-to-one. Hint: Try g(x,y) := (x,0) and R = R′ = [0,1] × [0,1].
Exercise 10.4.9: Suppose f : [0,1]² → R is defined by
\[ f(x,y) := \begin{cases} \frac{1}{kq} & \text{if } x, y \in \mathbb{Q},\ x = \frac{\ell}{k} \text{ and } y = \frac{p}{q} \text{ in lowest terms}, \\ 0 & \text{else}. \end{cases} \]
Show that \( f \in \mathcal{R}\big([0,1]^2\big) \).
Exercise 10.4.10: Compute the oscillation o(f, (x,y)) for all (x,y) ∈ R² for the function
\[ f(x,y) := \begin{cases} \frac{xy}{x^2+y^2} & \text{if } (x,y) \ne (0,0), \\ 0 & \text{else}. \end{cases} \]
Exercise 10.4.11: For the popcorn function f : [0,1] → R,
\[ f(x) := \begin{cases} \frac{1}{q} & \text{if } x \in \mathbb{Q} \text{ and } x = \frac{p}{q} \text{ in lowest terms}, \\ 0 & \text{else}, \end{cases} \]
compute o(f, x) for all x ∈ [0,1].
Proposition 10.5.1. A bounded set S ⊂ R^n is Jordan measurable if and only if the boundary ∂S is a measure zero set.

Proof. Suppose R is a closed rectangle such that S is contained in the interior of R. If x ∈ ∂S, then for every δ > 0 the sets S ∩ B(x,δ) (where \( \chi_S \) is 1) and (R \ S) ∩ B(x,δ) (where \( \chi_S \) is 0) are both nonempty. So \( \chi_S \) is not continuous at x. If x is either in the interior of S or in the complement of the closure \( \overline{S} \), then \( \chi_S \) is either identically 1 or identically 0 in a whole neighborhood of x, and hence \( \chi_S \) is continuous at x. Therefore, the set of discontinuities of \( \chi_S \) is precisely the boundary ∂S. The proposition then follows.
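For example, every closed rectangle \( R = [a_1,b_1] \times \cdots \times [a_n,b_n] \) is Jordan measurable:
\[ \partial R = \bigcup_{k=1}^n \big( \{ x \in R : x_k = a_k \} \cup \{ x \in R : x_k = b_k \} \big), \]
a finite union of degenerate rectangles (with one side of length 0), each of measure zero.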
Proposition 10.5.2. Suppose S and T are bounded Jordan measurable sets. Then
(i) The closure \( \overline{S} \) is Jordan measurable.
(ii) The interior S◦ is Jordan measurable.
(iii) S ∪ T is Jordan measurable.
(iv) S ∩ T is Jordan measurable.
(v) S \ T is Jordan measurable.
The proof of the proposition is left as an exercise. Next, we show that the volume we defined above coincides with the outer measure.

∗ Named after the French mathematician Camille Jordan (1838–1922).
If S ⊂ R^n is Jordan measurable, then \( V(S) = m^*(S) \).

Proof. Given ε > 0, let R be a closed rectangle that contains S. Let P be a partition of R such that
\[ U(P,\chi_S) \le \int_R \chi_S + \varepsilon = V(S) + \varepsilon \qquad \text{and} \qquad L(P,\chi_S) \ge \int_R \chi_S - \varepsilon = V(S) - \varepsilon. \]
Let \( R_1, R_2, \ldots, R_k \) be all the subrectangles of P such that \( \chi_S \) is not identically zero on each \( R_j \). That is, there is some point \( x \in R_j \) such that x ∈ S. Let \( O_j \) be an open rectangle such that \( R_j \subset O_j \) and \( V(O_j) < V(R_j) + \varepsilon/k \). Notice that \( S \subset \bigcup_j O_j \). Then
\[ U(P,\chi_S) = \sum_{j=1}^k V(R_j) > \left( \sum_{j=1}^k V(O_j) \right) - \varepsilon \ge m^*(S) - \varepsilon. \]
For the other inequality, let \( R'_1, \ldots, R'_\ell \) be the subrectangles of P on which \( \chi_S \) is identically 1, that is, \( R'_j \subset S \). The interiors \( (R'_j)^\circ \) are disjoint, and
\[ m^*\left( \bigcup_{j=1}^\ell (R'_j)^\circ \right) = \sum_{j=1}^\ell V\big((R'_j)^\circ\big). \]
Hence
\[ m^*(S) \ge m^*\left( \bigcup_{j=1}^\ell R'_j \right) \ge m^*\left( \bigcup_{j=1}^\ell (R'_j)^\circ \right) = \sum_{j=1}^\ell V\big((R'_j)^\circ\big) = \sum_{j=1}^\ell V(R'_j) = L(P,\chi_S) \ge V(S) - \varepsilon. \]
When f is defined on a larger set and we wish to integrate over S, then we apply the definition to the restriction \( f|_S \). In particular, if f : R → R for a closed rectangle R, and S ⊂ R is a Jordan measurable subset, then
\[ \int_S f = \int_R f \chi_S. \]
Proposition 10.5.5. If S ⊂ R^n is a bounded Jordan measurable set and f : S → R is a bounded continuous function, then f is integrable on S.

Proof. Define the function \( \tilde{f} \) as above for some closed rectangle R with S ⊂ R. If \( x \in R \setminus \overline{S} \), then \( \tilde{f} \) is identically zero in a neighborhood of x. Similarly, if x is in the interior of S, then \( \tilde{f} = f \) on a neighborhood of x and f is continuous at x. Therefore, \( \tilde{f} \) is only ever possibly discontinuous at ∂S, which is a set of measure zero, and we are finished.
10.5.4 Exercises
Exercise 10.5.1: Prove Proposition 10.5.2.
Exercise 10.5.2: Prove that a bounded convex set is Jordan measurable. Hint: Induction on dimension.
Exercise 10.5.3: Let f : [a,b] → R and g : [a,b] → R be continuous functions such that f(x) < g(x) for all x ∈ (a,b). Let
\[ U := \{ (x,y) \in \mathbb{R}^2 : a < x < b \text{ and } f(x) < y < g(x) \}. \]
a) Show that U is Jordan measurable.
b) If h : U → R is Riemann integrable on U, then
\[ \int_U h = \int_a^b \int_{f(x)}^{g(x)} h(x,y)\,dy\,dx. \]
Exercise 10.5.4: Let us construct an example of a non-Jordan-measurable open set. For simplicity we work first in one dimension. Let \( \{r_j\} \) be an enumeration of all rational numbers in (0,1). Let \( (a_j, b_j) \) be open intervals such that \( (a_j,b_j) \subset (0,1) \) for all j, \( r_j \in (a_j,b_j) \), and \( \sum_{j=1}^\infty (b_j - a_j) < 1/2 \). Now let \( U := \bigcup_{j=1}^\infty (a_j,b_j) \). Show that
a) The open intervals \( (a_j,b_j) \) as above actually exist.
b) ∂U = [0,1] \ U.
c) ∂U is not of measure zero, and therefore U is not Jordan measurable.
d) Show that \( W := \big( (0,1) \times (0,2) \big) \setminus \big( U \times [0,1] \big) \subset \mathbb{R}^2 \) is a connected bounded open set in R² that is not Jordan measurable.
Exercise 10.5.6: Suppose U ⊂ Rn is open and K ⊂ U is compact. Find a compact Jordan measurable set S
such that S ⊂ U and K ⊂ S◦ (K is in the interior of S).
Exercise 10.5.7: Prove the results of this section with all closed rectangles replaced by bounded Jordan measurable sets.
Definition 10.6.1. Let U ⊂ R² be a bounded connected open set. Suppose the boundary ∂U is a finite union of (the images of) simple piecewise smooth paths such that for each point p ∈ ∂U, every neighborhood V of p contains points of \( \mathbb{R}^2 \setminus \overline{U} \). Then U is called a bounded domain with piecewise smooth boundary in R².
The condition about points outside the closure says that locally ∂ U separates R2 into an “inside”
and an “outside”. The condition prevents ∂ U from being just a “cut” inside U. As we travel along
the path in a certain orientation, there is a well-defined left and a right, and either U is on the left
and the complement of U is on the right, or vice-versa. The orientation on U is the direction in
which we travel along the paths. We can switch orientation if needed by reparametrizing the path.
Definition 10.6.2. Let U ⊂ R² be a bounded domain with piecewise smooth boundary, let ∂U be oriented, and let γ : [a,b] → R² be a parametrization of ∂U giving the orientation. Write \( \gamma(t) = \big(x(t), y(t)\big) \). If the vector \( n(t) := \big(-y'(t), x'(t)\big) \) points into the domain, that is, if γ(t) + εn(t) is in U for all small enough ε > 0, then ∂U is positively oriented. See Figure 10.7. Otherwise it is negatively oriented.
Figure 10.7: Positively oriented domain (left), and a positively oriented domain with a hole (right).
The vector n(t) turns γ ′ (t) counterclockwise by 90◦ , that is to the left. When we travel along
a positively oriented boundary in the direction of its orientation, the domain is “on our left”. For
example, if U is a bounded domain with “no holes”, that is ∂ U is connected, then the positive
orientation means we are travelling counterclockwise around ∂ U. If we do have “holes”, then we
travel around them clockwise.
Proposition 10.6.3. Let U ⊂ R2 be a bounded domain with piecewise smooth boundary, then U is
Jordan measurable.
∗ Named after the British mathematical physicist George Green (1793–1841).
Proof. We must show that ∂U is a null set. As ∂U is a finite union of simple piecewise smooth paths, which are themselves finite unions of smooth paths, we need only show that a smooth path in R² is a null set. Let γ : [a,b] → R² be a smooth path. It is enough to show that \( \gamma\big((a,b)\big) \) is a null set, as adding the points γ(a) and γ(b) to a null set still results in a null set. Define
Theorem 10.6.4 (Green). Suppose U ⊂ R² is a bounded domain with piecewise smooth boundary, with the boundary positively oriented. Suppose P and Q are continuously differentiable functions defined on some open set that contains the closure \( \overline{U} \). Then
\[ \int_{\partial U} P\,dx + Q\,dy = \int_U \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right). \]
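As a quick sanity check, take P := −y and Q := x on the disc U = B(0,r) with the positively oriented parametrization γ(t) = (r cos(t), r sin(t)). Then
\[ \int_{\partial U} -y\,dx + x\,dy = \int_0^{2\pi} \big( r\sin(t)\cdot r\sin(t) + r\cos(t)\cdot r\cos(t) \big)\,dt = 2\pi r^2, \]
while
\[ \int_U \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) = \int_U 2 = 2\pi r^2, \]
as the theorem predicts. Compare Exercise 10.6.4 below.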
We stated Green’s theorem in general, although we will only prove a special version of it. That
is, we will only prove it for a special kind of domain. The general version follows from the special
case by application of further geometry, and cutting up the general domain into smaller domains on
which to apply the special case. We will not prove the general case.
Let U ⊂ R² be a domain with piecewise smooth boundary. We say U is of type I if there exist numbers a < b, and continuous functions f : [a,b] → R and g : [a,b] → R, such that
\[ U := \{ (x,y) \in \mathbb{R}^2 : a < x < b \text{ and } g(x) < y < f(x) \}. \]
Similarly, U is of type II if there exist numbers c < d, and continuous functions h : [c,d] → R and k : [c,d] → R, such that
\[ U := \{ (x,y) \in \mathbb{R}^2 : c < y < d \text{ and } k(y) < x < h(y) \}. \]
If U is both of type I and of type II, we say U is of type III.
Proof of Green's theorem for U of type III. Let f, g, h, k be the functions defined above. By Proposition 10.6.3, U is Jordan measurable, and as U is of type I,
\[ \int_U -\frac{\partial P}{\partial y} = \int_a^b \int_{g(x)}^{f(x)} -\frac{\partial P}{\partial y}(x,y)\,dy\,dx = \int_a^b \Big( -P\big(x,f(x)\big) + P\big(x,g(x)\big) \Big)\,dx = \int_a^b P\big(x,g(x)\big)\,dx - \int_a^b P\big(x,f(x)\big)\,dx. \]
We integrate P dx along the boundary. The one-form P dx integrates to zero along the straight vertical lines in the boundary. Therefore, it is only integrated along the top and along the bottom. As a parameter, x runs from left to right. If we use the parametrizations that take x to (x, f(x)) and to (x, g(x)), we recognize the path integrals above. However, the second path integral is in the wrong direction; along the top, x should be going right to left, and so we must switch orientation:
\[ \int_{\partial U} P\,dx = \int_a^b P\big(x,g(x)\big)\,dx + \int_b^a P\big(x,f(x)\big)\,dx = \int_U -\frac{\partial P}{\partial y}. \]
Similarly, U is also of type II. The form Q dy integrates to zero along horizontal lines. So
\[ \int_U \frac{\partial Q}{\partial x} = \int_c^d \int_{k(y)}^{h(y)} \frac{\partial Q}{\partial x}(x,y)\,dx\,dy = \int_c^d \Big( Q\big(h(y),y\big) - Q\big(k(y),y\big) \Big)\,dy = \int_{\partial U} Q\,dy. \]
Let
\[ g(r) := \frac{1}{2\pi} \int_0^{2\pi} f\big(x_0 + r\cos(t),\, y_0 + r\sin(t)\big)\,dt. \]
Then g′(r) = 0 for all r > 0. The function is constant for r > 0 and continuous at r = 0 (exercise). Therefore g(0) = g(r) for all r > 0, and
\[ g(r) = g(0) = \frac{1}{2\pi} \int_0^{2\pi} f\big(x_0 + 0\cos(t),\, y_0 + 0\sin(t)\big)\,dt = f(x_0,y_0). \]
That is, the value at p = (x0 , y0 ) is the average over a circle of any radius r centered at (x0 , y0 ).
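For instance, f(x,y) := x² − y² is harmonic, and indeed
\[ \frac{1}{2\pi} \int_0^{2\pi} \Big( \big(x_0 + r\cos(t)\big)^2 - \big(y_0 + r\sin(t)\big)^2 \Big)\,dt = \left( x_0^2 + \frac{r^2}{2} \right) - \left( y_0^2 + \frac{r^2}{2} \right) = f(x_0,y_0), \]
as the cross terms integrate to zero, and cos² and sin² both average to 1/2 over the circle.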
10.6.1 Exercises
Exercise 10.6.1: Prove that a disc B(p,r) ⊂ R² is a type III domain, and prove that the orientation given by the parametrization \( \gamma(t) = \big(x_0 + r\cos(t),\, y_0 + r\sin(t)\big) \), where \( p = (x_0,y_0) \), is the positive orientation of the boundary ∂B(p,r).
Note: Feel free to use what you know about sine and cosine from calculus.
Exercise 10.6.2: Prove that any bounded domain with piecewise smooth boundary that is convex is a type III
domain.
Exercise 10.6.3: Suppose V ⊂ R² is a domain with piecewise smooth boundary that is a type III domain, and suppose U ⊂ R² is a domain such that \( \overline{V} \subset U \). Suppose f : U → R is a twice continuously differentiable function. Prove that
\[ \int_{\partial V} \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy = 0. \]
Exercise 10.6.4: For a disc B(p,r) ⊂ R², orient the boundary ∂B(p,r) positively:
a) Compute \( \int_{\partial B(p,r)} -y\,dx \).
b) Compute \( \int_{\partial B(p,r)} x\,dy \).
c) Compute \( \int_{\partial B(p,r)} \frac{-y}{2}\,dx + \frac{x}{2}\,dy \).
Exercise 10.6.5: Using Green's theorem show that the area of a triangle with vertices \( (x_1,y_1) \), \( (x_2,y_2) \), \( (x_3,y_3) \) is \( \frac{1}{2} |x_1y_2 + x_2y_3 + x_3y_1 - y_1x_2 - y_2x_3 - y_3x_1| \). Hint: See the previous exercise.
Exercise 10.6.6: Using the mean value property prove the maximum principle for harmonic functions:
Suppose U ⊂ R2 is a connected open set and f : U → R is harmonic. Prove that if f attains a maximum at
p ∈ U, then f is constant.
Exercise 10.6.7: Let \( f(x,y) := \ln\sqrt{x^2+y^2} \).
a) Show f is harmonic where defined.
b) Show \( \lim_{(x,y)\to 0} f(x,y) = -\infty \).
c) Using a circle \( C_r \) of radius r around the origin, compute \( \frac{1}{2\pi r} \int_{C_r} f\,ds \). What happens as r → 0?
d) Why can't you use Green's theorem?
It may be surprising that the analogue in higher dimensions is quite a bit more complicated. The first complication is orientation. If we use the definition of the integral from this chapter, then we do not have the notion of \( \int_a^b \) versus \( \int_b^a \); we are simply integrating over the interval [a,b]. With this notation, the change of variables formula becomes
\[ \int_{[a,b]} f\big(g(x)\big)\,|g'(x)|\,dx = \int_{g([a,b])} f(x)\,dx. \]
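For example, take g(x) := x² on [1,2] and f(y) := y, so that g([1,2]) = [1,4]. Then
\[ \int_{[1,2]} f\big(g(x)\big)\,|g'(x)|\,dx = \int_1^2 x^2 \cdot 2x\,dx = \frac{15}{2} = \int_1^4 y\,dy = \int_{g([1,2])} f(x)\,dx. \]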
The set g(S) is Jordan measurable (see the exercises), so the left-hand side does make sense. That the right-hand side makes sense follows from the results of the previous sections.
Proof. The set S can be covered by finitely many closed rectangles \( P_1, P_2, \ldots, P_k \), whose interiors do not overlap, such that each \( P_j \subset U \) (exercise). Proving the theorem for \( P_j \cap S \) instead of S is enough. Furthermore, for y ∉ g(S) write f(y) := 0. We therefore assume that S is equal to a rectangle R.
We can write any Riemann integrable function f as \( f = f_+ - f_- \) for the two nonnegative Riemann integrable functions \( f_+(x) := \max\{f(x), 0\} \) and \( f_-(x) := \max\{-f(x), 0\} \). So, if we prove the theorem for a nonnegative f, we obtain the theorem for arbitrary f. Therefore, let us also suppose that f(y) ≥ 0 for all y.
Let ε > 0 be given. For every x ∈ R, let
\[ W_x := \{ y \in U : \|g'(x) - g'(y)\| < \varepsilon/2 \}. \]
We leave it as an exercise to prove that \( W_x \) is open. As \( x \in W_x \) for every x, the sets form an open cover. Therefore, by the Lebesgue covering lemma, there exists a δ > 0 such that for every y ∈ R, there is an x such that \( B(y,\delta) \subset W_x \). In other words, if P is a rectangle of maximum side length less than \( \delta/\sqrt{n} \) and y ∈ P, then \( P \subset B(y,\delta) \subset W_x \). By the triangle inequality, \( \|g'(\xi) - g'(\eta)\| < \varepsilon \) for all ξ, η ∈ P.
So let \( R_1, R_2, \ldots, R_N \) be subrectangles partitioning R such that the maximum side of any \( R_j \) is less than \( \delta/\sqrt{n} \). We also make sure that the minimum side length is at least \( \frac{\delta}{2\sqrt{n}} \), which we can do if δ is sufficiently small compared to the sides of R (exercise).
Consider some \( R_j \) and some fixed \( x_j \in R_j \). First suppose \( x_j = 0 \), \( g(x_j) = 0 \), and g′(0) = I. For any \( y \in R_j \), apply the fundamental theorem of calculus to the function t ↦ g(ty) to find \( g(y) = \int_0^1 g'(ty)y\,dt \). As the sides of \( R_j \) are at most \( \delta/\sqrt{n} \), we have ‖y‖ ≤ δ. So
\[ \|g(y) - y\| = \left\| \int_0^1 \big( g'(ty)y - y \big)\,dt \right\| \le \int_0^1 \|g'(ty)y - y\|\,dt \le \|y\| \int_0^1 \|g'(ty) - I\|\,dt \le \delta\varepsilon. \]
Figure 10.9: The image of \( R_j \) under g lies inside \( \tilde{R}_j \). A sample point \( y \in R_j \) (in fact on the boundary of \( R_j \)) is marked, and g(y) must lie within a distance δε of y (also marked).
In other words, \( g(R_j) \) is contained in the rectangle \( \tilde{R}_j \) obtained by extending \( R_j \) a distance δε on each side. If the sides of \( R_j \) are \( s_1, s_2, \ldots, s_n \), then \( V(R_j) = s_1 s_2 \cdots s_n \), and the sides of \( \tilde{R}_j \) are \( s_i + 2\delta\varepsilon \). Recall \( \delta \le 2\sqrt{n}\,s_i \) for every side. Thus
\[ V\big(g(R_j)\big) \le V(\tilde{R}_j) = \prod_{i=1}^n (s_i + 2\delta\varepsilon) \le \prod_{i=1}^n s_i(1 + 4\sqrt{n}\,\varepsilon) = V(R_j)(1 + 4\sqrt{n}\,\varepsilon)^n. \]
Next, let us suppose that A := g′(0) is not the identity. Write \( g = A \circ \tilde{g} \), where \( \tilde{g}'(0) = I \). By the result on volumes under linear maps, \( V\big(A(R)\big) = |\det(A)|\,V(R) \), and hence
\[ V\big(g(R_j)\big) \le |\det(A)|\,V(R_j)(1 + 4\sqrt{n}\,\varepsilon)^n = |J_g(0)|\,V(R_j)(1 + 4\sqrt{n}\,\varepsilon)^n. \]
Finally, translation does not change volume, and therefore for any \( R_j \) and \( x_j \in R_j \), including when \( x_j \ne 0 \) and \( g(x_j) \ne 0 \), we find
\[ V\big(g(R_j)\big) \le |J_g(x_j)|\,V(R_j)(1 + 4\sqrt{n}\,\varepsilon)^n. \]
First assume that f(y) ≥ 0 for all y ∈ R. Choose each \( x_j \in R_j \) so that \( |J_g(x_j)| = \min_{x\in R_j} |J_g(x)| \), and suppose δ > 0 was chosen small enough such that
\[ \varepsilon + \int_R f\big(g(x)\big)|J_g(x)|\,dx \ge \sum_{j=1}^N \Big( \sup_{x\in R_j} f\big(g(x)\big)|J_g(x)| \Big) V(R_j). \]
Then
\[
\sum_{j=1}^N \Big( \sup_{x\in R_j} f\big(g(x)\big)|J_g(x)| \Big) V(R_j)
\ge \sum_{j=1}^N \Big( \sup_{x\in R_j} f\big(g(x)\big) \Big) |J_g(x_j)|\,V(R_j)
\ge \sum_{j=1}^N \Big( \sup_{y\in g(R_j)} f(y) \Big) \frac{V\big(g(R_j)\big)}{(1 + 4\sqrt{n}\,\varepsilon)^n}
\ge \frac{1}{(1 + 4\sqrt{n}\,\varepsilon)^n} \sum_{j=1}^N \int_{g(R_j)} f(y)\,dy
= \frac{1}{(1 + 4\sqrt{n}\,\varepsilon)^n} \int_{g(R)} f(y)\,dy,
\]
where the last equality follows because the rectangles \( R_j \) overlap only in their boundaries, which are of measure zero, and hence the images of these boundaries are also of measure zero.
Letting ε go to zero, we find
\[ \int_R f\big(g(x)\big)|J_g(x)|\,dx \ge \int_{g(R)} f(y)\,dy. \]
By adding this result over the rectangles covering S, we obtain the result for an arbitrary bounded Jordan measurable S ⊂ U and any nonnegative integrable function f:
\[ \int_S f\big(g(x)\big)|J_g(x)|\,dx \ge \int_{g(S)} f(y)\,dy. \]
Next, recall that \( g^{-1} \) exists, and that \( g^{-1}\big(g(S)\big) = S \). Also, \( 1 = J_{g\circ g^{-1}}(y) = J_g\big(g^{-1}(y)\big)\,J_{g^{-1}}(y) \) for y ∈ g(S). So, applying the inequality above with \( g^{-1} \),
\[ \int_{g(S)} f(y)\,dy = \int_{g(S)} f\Big(g\big(g^{-1}(y)\big)\Big)\,\big|J_g\big(g^{-1}(y)\big)\big|\,|J_{g^{-1}}(y)|\,dy \ge \int_{g^{-1}(g(S))} f\big(g(x)\big)|J_g(x)|\,dx = \int_S f\big(g(x)\big)|J_g(x)|\,dx. \]
The conclusion of the theorem holds for all nonnegative f and as we mentioned above, it thus
holds for all Riemann integrable f .
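A typical application is polar coordinates, \( g(r,\theta) := \big(r\cos(\theta), r\sin(\theta)\big) \), for which \( J_g(r,\theta) = r \). For instance, using the version of the theorem for open sets (Exercise 10.7.6 below) with \( S = (0,1) \times (0,2\pi) \), whose image is the unit disc minus a measure zero set,
\[ V\big(B(0,1)\big) = \int_{g(S)} 1\,dy = \int_0^{2\pi}\!\!\int_0^1 r\,dr\,d\theta = \pi. \]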
10.7.1 Exercises
Exercise 10.7.1: Prove the volume-scaling fact used in the proof above: if A : R^n → R^n is linear and R ⊂ R^n is a closed rectangle, then \( V\big(A(R)\big) = |\det(A)|\,V(R) \).
Exercise 10.7.2: Suppose S ⊂ R^n is a closed bounded Jordan measurable set, and S ⊂ U for an open set U ⊂ R^n. Show that there exist finitely many closed bounded rectangles \( P_1, P_2, \ldots, P_k \) such that \( P_j \subset U \), \( S \subset P_1 \cup P_2 \cup \cdots \cup P_k \), and the interiors are mutually disjoint, that is, \( P_j^\circ \cap P_\ell^\circ = \emptyset \) whenever j ≠ ℓ.
Exercise 10.7.4: Suppose R ⊂ R^n is a closed bounded rectangle. Show that if δ′ > 0 is sufficiently small compared to the sides of R, then R can be partitioned into subrectangles where each side of any subrectangle is between δ′/2 and δ′.
Exercise 10.7.5: Prove the following version of the theorem: Suppose f : R^n → R is a Riemann integrable compactly supported function. Suppose K ⊂ R^n is the support of f, S is a compact set, and g : R^n → R^n is a function that, when restricted to a neighborhood U of S, is one-to-one and continuously differentiable, g(S) = K, and \( J_g \) is never zero on S (in the formula assume \( J_g(x) = 0 \) if g is not differentiable at x, that is, when x ∉ U). Then
\[ \int_{\mathbb{R}^n} f(x)\,dx = \int_{\mathbb{R}^n} f\big(g(x)\big)|J_g(x)|\,dx. \]
Exercise 10.7.6: Prove the following version of the theorem: Suppose S ⊂ R^n is an open bounded Jordan measurable set, g : S → R^n is a one-to-one continuously differentiable mapping such that \( J_g \) is never zero on S, and such that g(S) is bounded and Jordan measurable (it is also open). Suppose f : g(S) → R is Riemann integrable. Then f ∘ g is Riemann integrable on S and
\[ \int_{g(S)} f(x)\,dx = \int_S f\big(g(x)\big)|J_g(x)|\,dx. \]
Hint: Write S as an increasing union of closed bounded Jordan measurable sets, apply the theorem of the section to those, and then prove that you can take the limit.
Chapter 11
Functions as Limits
[Figure: The complex plane — the point z = x + iy corresponds to (x,y).]
Definition 11.1.1. Suppose z = x + iy. We call x the real part of z, and we call y the imaginary part of z. We write
\[ \operatorname{Re} z := x, \qquad \operatorname{Im} z := y. \]
Define the complex conjugate as
\[ \bar{z} := x - iy. \]
Similarly define the modulus as
\[ |z| := \sqrt{x^2 + y^2}. \]
Modulus acts like absolute value. For example, |zw| = |z| |w| (exercise).
The complex conjugate is a reflection of the plane across the real axis. The real numbers are
precisely those numbers for which the imaginary part y = 0. In particular, they are precisely those
numbers which satisfy the equation
z = z̄.
As C is really R², we let the metric on C be the standard euclidean metric on R². In particular,
\[ d(z,w) = |z - w|. \]
So the topology on C is the same exact topology as the standard topology on R² with the euclidean metric, and |z| is in fact equal to the euclidean norm on R². Importantly, since R² is a complete metric space, so is C.
Since |z| is the euclidean norm on R², we have the triangle inequality in both flavors:
\[ |z+w| \le |z| + |w| \qquad \text{and} \qquad \big| |z| - |w| \big| \le |z - w|. \]
The complex conjugate and the modulus are even more intimately related:
\[ |z|^2 = z\bar{z}. \]
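For instance, if z = 3 + 4i, then \( \bar{z} = 3 - 4i \),
\[ |z| = \sqrt{3^2 + 4^2} = 5, \qquad \text{and} \qquad z\bar{z} = (3+4i)(3-4i) = 9 + 16 = 25 = |z|^2. \]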
Remark 11.1.2. There is no natural ordering on the complex numbers. In particular, no ordering
that makes the complex numbers into an ordered field. Ordering is one of the things we lose when
we go from real to complex numbers.
First,
\[ z_n w_n = (x_n s_n - y_n t_n) + i(x_n t_n + y_n s_n). \]
As the topology on C is the same as on R², we have \( x_n \to x \), \( y_n \to y \), \( s_n \to s \), and \( t_n \to t \). So
\[ \lim_{n\to\infty} (x_n s_n - y_n t_n) = xs - yt \qquad \text{and} \qquad \lim_{n\to\infty} (x_n t_n + y_n s_n) = xt + ys. \]
As before, we sometimes write ∑ zn for the series. A series converges absolutely if ∑|zn | converges.
We say a series is Cauchy if the sequence of partial sums is Cauchy. The following two
propositions have essentially the same proofs as for real series and we leave them as exercises.
Proposition 11.1.4. The complex series \( \sum z_n \) is Cauchy if and only if for every ε > 0, there exists an M ∈ N such that for every n ≥ M and every k > n we have
\[ \left| \sum_{j=n+1}^k z_j \right| < \varepsilon. \]
The series \( \sum |z_n| \) is a real series. All the convergence tests (ratio test, root test, etc.) that concern absolute convergence work with the numbers \( |z_n| \); that is, they are really about convergence of series of nonnegative real numbers. You can directly apply these tests without needing to reprove anything for complex series.
We make the same definition for every other type of integral (improper, multivariable, etc.). Similarly, when we differentiate, write f : [a,b] → C as f = u + iv. Thinking of C as R², we find that f is differentiable if u and v are differentiable. For such a function, the derivative was represented by a vector in R²; now a vector in R² is a complex number. In other words, we write the derivative as
\[ f'(t) := u'(t) + i\,v'(t). \]
The linear operator representing the derivative is multiplication by the complex number f′(t), so nothing is lost in this identification.
11.1.4 Exercises
Exercise 11.1.1: Check that C is a field.
Exercise 11.1.7: Prove the Bolzano-Weierstrass theorem for complex sequences. Suppose {zn } is a bounded
sequence of complex numbers, that is, there exists an M such that |zn | ≤ M for all n. Prove that there exists a
subsequence {znk } that converges to some z ∈ C.
Exercise 11.1.8:
a) Prove that there is no simple mean value theorem for complex-valued functions: Find a differentiable function f : [0,1] → C such that f(0) = f(1) = 0, but f′(t) ≠ 0 for all t ∈ [0,1].
b) However, we have the weaker form of the mean value theorem as we do for vector-valued functions.
Prove: If f : [a, b] → C is continuous and differentiable in (a, b), and for some M, | f ′ (x)| ≤ M for all
x ∈ (a, b), then | f (b) − f (a)| ≤ M|b − a|.
Exercise 11.1.9: Prove that there is no simple mean value theorem for integrals for complex-valued functions: Find a continuous function f : [0,1] → C such that \( \int_0^1 f = 0 \) but f(t) ≠ 0 for all t ∈ [0,1].
11.2.1 Continuity
Let us get back to swapping limits. Let { fn } be a sequence of functions fn : X → Y for a set X and
a metric space Y . Let f : X → Y be a function and for every x ∈ X suppose that
\[ f(x) = \lim_{n\to\infty} f_n(x). \]
The question is: If fn are all continuous, is f continuous? Differentiable? Integrable? What are
the derivatives or integrals of f ?
For example, for continuity of the pointwise limit of a sequence \( \{f_n\} \), we are asking whether
\[ \lim_{x\to x_0}\, \lim_{n\to\infty} f_n(x) \overset{?}{=} \lim_{n\to\infty}\, \lim_{x\to x_0} f_n(x). \]
We do not even know a priori whether both sides exist, let alone whether they equal each other.
Example 11.2.1: The functions \( f_n : \mathbb{R} \to \mathbb{R} \),
\[ f_n(x) := \frac{1}{1 + nx^2}, \]
converge pointwise to
\[ f(x) := \begin{cases} 1 & \text{if } x = 0, \\ 0 & \text{else}, \end{cases} \]
which is of course not continuous.
Pointwise convergence is not enough to preserve continuity (nor even boundedness). For that,
we need uniform convergence.
Let fn : X → Y be functions. Then { fn } converges uniformly to f if for every ε > 0, there exists
an M such that for all n ≥ M and all x ∈ X we have
\[ d\big(f_n(x), f(x)\big) < \varepsilon. \]
A series \( \sum f_n \) of complex-valued functions converges uniformly if the sequence of partial sums converges uniformly, that is, if for every ε > 0 there exists an M such that for all n ≥ M and all x ∈ X,
\[ \left| \left( \sum_{k=1}^n f_k(x) \right) - f(x) \right| < \varepsilon. \]
The simplest property preserved by uniform convergence is boundedness. We leave the proof of
the following proposition as an exercise. It is almost identical to the proof for real-valued functions.
Proposition 11.2.2. Let X be any set and Y any metric space. If fn : X → Y are bounded functions
and converge uniformly to f : X → Y , then f is bounded.
We have a notion of uniformly Cauchy as for real-valued functions. The proof of the following
proposition is again essentially the same as for the real-valued functions and is left as an exercise.
Proposition 11.2.3. Let X be any set and let (Y, d) be a Cauchy complete metric space. Let
fn : X → Y be functions. Then { fn } converges uniformly if and only if for every ε > 0, there is an
M such that for all n, m ≥ M, and all x ∈ X we have
\[ d\big(f_n(x), f_m(x)\big) < \varepsilon. \]
For f : X → C, we write
\[ \|f\|_u := \sup_{x\in X} |f(x)|. \]
We call k·ku the supremum norm or uniform norm. Then a sequence of functions fn : X → C
converges uniformly to f : X → C if and only if
\[ \lim_{n\to\infty} \|f_n - f\|_u = 0. \]
The uniform norm satisfies the triangle inequality:
\[ \|f + g\|_u \le \|f\|_u + \|g\|_u. \]
For a compact metric space X, the uniform norm is a norm on the vector space C(X, C). We
leave it as an exercise. While we will not need it, C(X, C) is in fact a complex vector space, that is
in the definition of a vector space replace R with C. Convergence in the metric space C(X, C) is
uniform convergence.
We will study a couple of types of series of functions, and a useful test for uniform convergence
of a series is the so-called Weierstrass M-test.
Theorem 11.2.4 (Weierstrass M-test). Let X be any set. Suppose \( f_n : X \to \mathbb{C} \) are functions and \( M_n > 0 \) are numbers such that
\[ |f_n(x)| \le M_n \ \text{ for all } x \in X, \qquad \text{and} \qquad \sum_{n=1}^\infty M_n \ \text{ converges}. \]
Then
\[ \sum_{n=1}^\infty f_n(x) \ \text{ converges uniformly}. \]
Another way to state the theorem is to say that if \( \sum \|f_n\|_u \) converges, then \( \sum f_n \) converges uniformly. Note that the converse of this theorem is not true.
132 CHAPTER 11. FUNCTIONS AS LIMITS
Proof. Suppose \( \sum M_n \) converges. Given ε > 0, the partial sums of \( \sum M_n \) are Cauchy, so there is an N such that for all m ≥ n ≥ N,
\[ \sum_{k=n+1}^m M_k < \varepsilon. \]
Now look at a Cauchy difference of the partial sums of the functions:
\[ \left| \sum_{k=n+1}^m f_k(x) \right| \le \sum_{k=n+1}^m |f_k(x)| \le \sum_{k=n+1}^m M_k < \varepsilon. \]
The partial sums are therefore uniformly Cauchy, and as C is complete, they converge uniformly by Proposition 11.2.3.
Figure 11.2: Plot of \( \sum_{n=1}^\infty \frac{\sin(nx)}{n^2} \), including the first 8 partial sums in various shades of gray.
Now we would love to say something about the limit. For example, is it continuous?
Proposition 11.2.7. Let \( (X, d_X) \) and \( (Y, d_Y) \) be metric spaces, and let \( f_n : X \to Y \) be functions. Suppose Y is a complete metric space and \( \{f_n\} \) converges uniformly to f : X → Y. Let \( \{x_k\} \) be a sequence in X and \( x := \lim x_k \). Suppose that
\[ a_n := \lim_{k\to\infty} f_n(x_k) \]
exists for all n. Then \( \{a_n\} \) converges, and in other words,
\[ \lim_{k\to\infty}\, \lim_{n\to\infty} f_n(x_k) = \lim_{n\to\infty}\, \lim_{k\to\infty} f_n(x_k). \]
Proof. First we show that \( \{a_n\} \) converges. As \( \{f_n\} \) converges uniformly, it is uniformly Cauchy. Let ε > 0 be given. There is an M such that for all m, n ≥ M we have
\[ d_Y\big(f_n(x_k), f_m(x_k)\big) < \varepsilon \ \text{ for all } k. \]
As a metric is automatically continuous, we let k go to infinity to find
\[ d_Y(a_n, a_m) \le \varepsilon. \]
Hence \( \{a_n\} \) is Cauchy and converges since Y is complete. Write \( a := \lim a_n \).
Find a k ∈ N such that
\[ d_Y\big(f_k(p), f(p)\big) < \varepsilon/3 \]
for all p ∈ X. Assume k is large enough so that \( d_Y(a_k, a) < \varepsilon/3 \). Find an N ∈ N such that for m ≥ N,
\[ d_Y\big(f_k(x_m), a_k\big) < \varepsilon/3. \]
Then for m ≥ N,
\[ d_Y\big(f(x_m), a\big) \le d_Y\big(f(x_m), f_k(x_m)\big) + d_Y\big(f_k(x_m), a_k\big) + d_Y(a_k, a) < \varepsilon/3 + \varepsilon/3 + \varepsilon/3 = \varepsilon. \]
Immediately we obtain a corollary about continuity.
Corollary 11.2.8. Let X and Y be metric spaces such that Y is Cauchy complete. Let fn : X → Y
be continuous functions such that { fn } converges uniformly to f : X → Y . Then f is continuous.
The converse is not true: just because the limit is continuous does not mean that the convergence is uniform. For example, the functions \( f_n : (0,1) \to \mathbb{R} \) defined by \( f_n(x) := x^n \) converge to the zero function, but not uniformly. However, if we add extra conditions on the sequence, we can obtain a partial converse, such as Dini's theorem; see Exercise 6.2.10 in volume I.
Assume, as in the exercise above, that for a compact X, C(X, C) is a metric space with the uniform norm (in fact a normed vector space). We have just shown that it is Cauchy complete: Proposition 11.2.3 says that a Cauchy sequence in C(X, C) converges (uniformly) to some function, and Corollary 11.2.8 shows that the limit is in fact continuous and hence in C(X, C).
Corollary 11.2.9. Let (X, d) be a compact metric space, then C(X, C) is a Cauchy complete metric
space.
For example, by the Weierstrass M-test, the series
\[ \sum_{n=1}^\infty \frac{\sin(nx)}{n^2} \]
converges uniformly on R (as \( |\sin(nx)/n^2| \le 1/n^2 \)), and so it defines a continuous function of x.
11.2.2 Integration
Proposition 11.2.11. Suppose \( f_n : [a,b] \to \mathbb{C} \) are Riemann integrable and suppose that \( \{f_n\} \) converges uniformly to f : [a,b] → C. Then f is Riemann integrable and
\[ \int_a^b f = \lim_{n\to\infty} \int_a^b f_n. \]
Since the integral of a complex-valued function is just the integral of the real and imaginary
parts separately, the proof follows directly by the results of chapter 6 of volume I. We leave the
details as an exercise.
Corollary 11.2.12. Suppose \( f_n : [a,b] \to \mathbb{C} \) are Riemann integrable and suppose that
\[ \sum_{n=1}^\infty f_n(x) \]
converges uniformly on [a,b]. Then the sum is Riemann integrable and
\[ \int_a^b \sum_{n=1}^\infty f_n = \sum_{n=1}^\infty \int_a^b f_n. \]
The swapping of integral and sum is possible because of uniform convergence, which we have checked before using the Weierstrass M-test (Theorem 11.2.4).
Note that we can swap integrals and limits under far less stringent hypotheses, but for that we would need a stronger integral than the Riemann integral, e.g., the Lebesgue integral.
11.2.3 Differentiation
Recall that a complex-valued function f : [a,b] → C, where f(x) = u(x) + i v(x), is differentiable if u and v are differentiable, and the derivative is
\[ f'(x) = u'(x) + i\,v'(x). \]
The proof of the following theorem is to apply the corresponding theorem for real functions to u and v, and is left as an exercise.
Uniform limits of the functions themselves are not enough to guarantee differentiability of the limit, and can make matters even worse. Later in this chapter we will prove that continuous functions are uniform limits of polynomials, yet as the following example demonstrates, a continuous function need not have a derivative anywhere.
Example 11.2.15: Let us construct a continuous nowhere differentiable function. Such functions are often called Weierstrass functions, although this particular one is a different example than the one Weierstrass gave. Define
\[ \varphi(x) := |x| \ \text{ for } x \in [-1,1]. \]
Extend the definition of ϕ to all of R by making it 2-periodic: decree that ϕ(x) = ϕ(x + 2). The function ϕ : R → R is continuous; in fact, |ϕ(x) − ϕ(y)| ≤ |x − y| (why?). See Figure 11.3.
[Figure 11.3: Plot of the 2-periodic function ϕ on [−8, 8].]
As \( \sum \left(\frac{3}{4}\right)^n \) converges and |ϕ(x)| ≤ 1 for all x, we have by the M-test (Theorem 11.2.4) that
\[ f(x) := \sum_{n=0}^\infty \left(\frac{3}{4}\right)^n \varphi(4^n x) \]
converges uniformly and hence is continuous. We claim that this f : R → R is nowhere differentiable. See Figure 11.4.
Fix x and define
\[ \delta_m := \pm \frac{1}{2}\,4^{-m}, \]
where the sign is chosen so that there is no integer between \( 4^m x \) and \( 4^m(x + \delta_m) = 4^m x \pm \frac{1}{2} \). Fix m for a moment.
Figure 11.4: Plot of the nowhere differentiable function f .
Let
\[ \gamma_n := \frac{\varphi\big(4^n(x+\delta_m)\big) - \varphi(4^n x)}{\delta_m}. \]
If n > m, then \( 4^n \delta_m \) is an even integer. As ϕ is 2-periodic, we get that \( \gamma_n = 0 \).
As there is no integer between \( 4^m(x+\delta_m) = 4^m x \pm 1/2 \) and \( 4^m x \), on this interval ϕ(t) = ±t + ℓ for some integer ℓ. In particular,
\[ \big| \varphi\big(4^m(x+\delta_m)\big) - \varphi(4^m x) \big| = |4^m x \pm 1/2 - 4^m x| = 1/2. \]
Therefore,
\[ |\gamma_m| = \frac{1/2}{\pm(1/2)\,4^{-m}} = 4^m. \]
And so, as \( |\gamma_n| \le 4^n \) by the Lipschitz property of ϕ,
\[
\left| \frac{f(x+\delta_m) - f(x)}{\delta_m} \right|
= \left| \sum_{n=0}^\infty \left(\frac{3}{4}\right)^n \frac{\varphi\big(4^n(x+\delta_m)\big) - \varphi(4^n x)}{\delta_m} \right|
= \left| \sum_{n=0}^m \left(\frac{3}{4}\right)^n \gamma_n \right|
\ge \left(\frac{3}{4}\right)^m |\gamma_m| - \left| \sum_{n=0}^{m-1} \left(\frac{3}{4}\right)^n \gamma_n \right|
\ge 3^m - \sum_{n=0}^{m-1} 3^n = 3^m - \frac{3^m - 1}{3-1} = \frac{3^m + 1}{2}.
\]
As m → ∞, we have \( \delta_m \to 0 \), but \( \frac{3^m+1}{2} \) goes to infinity. Hence f cannot be differentiable at x.
11.2.4 Exercises
Exercise 11.2.1: Prove Proposition 11.2.2.
Exercise 11.2.3: Suppose (X, d) is a compact metric space. Prove that k·ku is a norm on the vector space of
continuous complex-valued functions C(X, C).
Exercise 11.2.4:
a) Prove that fn (x) := 2−n sin(2n x) converge uniformly to zero, but there exists a dense set D ⊂ R such that
limn→∞ fn′ (x) = 1 for all x ∈ D.
b) Prove that ∑∞ −n n
n=1 2 sin(2 x) converges uniformly to a continuous function, and there exists a dense set
D ⊂ R where the derivatives of the partial sums do not converge.
Exercise 11.2.5: Suppose (X, d) is a compact metric space. Prove that k f kC1 := k f ku + k f ′ ku is a norm on
the vector space of continuously differentiable complex-valued functions C1 (X, C).
Exercise 11.2.8: Work out the following counterexample to the converse of the Weierstrass M-test (Theorem 11.2.4). Define \( f_n : [0,1] \to \mathbb{R} \) by
\[ f_n(x) := \begin{cases} \frac{1}{n} & \text{if } \frac{1}{n+1} < x < \frac{1}{n}, \\ 0 & \text{else}. \end{cases} \]
Show that \( \sum f_n \) converges uniformly, but \( \sum \|f_n\|_u \) diverges.
Exercise 11.2.9: Suppose fn : [0, 1] → R are monotone increasing functions and suppose that ∑ fn converges
pointwise. Prove that ∑ fn converges uniformly.
Figure 11.5: Graphs of the real and imaginary parts of \( z = x + iy \mapsto \frac{1}{1-z} \) in the square [−0.8, 0.8]². The singularity at z = 1 is marked with a vertical dashed line.
Proof. We use the real version of this proposition, Proposition 2.6.10 in volume I. Let
\[ R := \limsup_{n\to\infty} \sqrt[n]{|c_n|}. \]
If R = 0, then \( \sum_{n=0}^\infty |c_n|\,|z-a|^n \) converges for all z. If R = ∞, then \( \sum_{n=0}^\infty |c_n|\,|z-a|^n \) converges only at z = a. Otherwise, let ρ := 1/R; then \( \sum_{n=0}^\infty |c_n|\,|z-a|^n \) converges when |z − a| < ρ, and diverges (in fact, the terms of the series do not go to zero) when |z − a| > ρ.
To prove the furthermore, suppose 0 < r < ρ and z ∈ C(a,r). Then consider the partial sums
\[ \left| \sum_{n=0}^k c_n(z-a)^n \right| \le \sum_{n=0}^k |c_n|\,|z-a|^n \le \sum_{n=0}^k |c_n| r^n. \]
As \( \sum |c_n| r^n \) converges, the M-test gives uniform convergence on C(a,r).
The number ρ is called the radius of convergence; see the figure below. The radius of convergence gives us a disk around a where the series converges. A power series is convergent if ρ > 0.

[Figure: the disk of radius ρ centered at a; the series converges inside the disk and does not converge outside it.]
If the series ∑ c_n(z − a)^n converges at some point z, then ∑ c_n(w − a)^n converges absolutely whenever |w − a| < |z − a|. Conversely, if the series diverges at z, then it must diverge at w whenever |w − a| > |z − a|. This means that to show that the radius of convergence is at least some number, we simply need to show convergence at some point by any method we know.
Example 11.3.3: Note the difference between 1/(1 − z) and its power series. Let us expand 1/(1 − z) as a power series around any point a ≠ 1. Let c := 1/(1 − a); then
1/(1 − z) = c / (1 − c(z − a)) = c ∑_{n=0}^∞ c^n (z − a)^n = ∑_{n=0}^∞ (1/(1 − a)^{n+1}) (z − a)^n .
The series converges if and only if the geometric series on the right-hand side converges, and
lim sup_{n→∞} | 1/(1 − a)^{n+1} |^{1/n} = |c| = 1/|1 − a| .
The radius of convergence of the power series is therefore |1 − a|, that is, the distance from 1 to a. The function 1/(1 − z) has a power series representation around every a ≠ 1 and so is analytic in C \ {1}. The domain of the function is bigger than the region of convergence of the power series representing the function at any point.
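One can check the computed radius numerically with the root test. A small sketch (assuming NumPy; the choice a = 0.5 + 0.5i is arbitrary):

```python
import numpy as np

a = 0.5 + 0.5j                     # expansion point, a != 1
c = 1.0 / (1.0 - a)                # the coefficients are c_n = c^(n+1)
ns = np.arange(1, 200)
root_test = np.abs(c ** (ns + 1)) ** (1.0 / ns)   # |c_n|^(1/n), tends to |c|

print(1.0 / root_test[-1])         # estimated radius of convergence
print(abs(1.0 - a))                # distance from a to the singularity at z = 1
```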
It turns out that if a function has a power series representation converging to the function on
some ball, then it has a power series representation at every point in the ball. We will prove this
result later.
If a power series f(z) := ∑_{n=0}^∞ c_n(z − a)^n is convergent in B(a, ρ) for some ρ > 0, then f : B(a, ρ) → C is continuous. In particular, analytic functions are continuous.
Proof. For any z0 ∈ B(a, ρ ) pick r < ρ such that z0 ∈ B(a, r). On B(a, r) the partial sums (which are
continuous) converge uniformly, and so the limit f |B(a,r) is continuous. Any sequence converging
to z0 has some tail that is completely in the open ball B(a, r), hence f is continuous at z0 .
In Corollary 6.2.13 of volume I we proved that we can differentiate real power series term by
term. That is we proved that if
f(x) := ∑_{n=0}^∞ c_n (x − a)^n
converges for real x in an interval around a ∈ R, then we can differentiate term by term and obtain the series
f′(x) = ∑_{n=1}^∞ n c_n (x − a)^{n−1} = ∑_{n=0}^∞ (n + 1) c_{n+1} (x − a)^n
with the same radius of convergence. We only proved this theorem when c_n is real; however, for complex c_n we write c_n = s_n + i t_n, and as x and a are real,
∑_{n=0}^∞ c_n (x − a)^n = ∑_{n=0}^∞ s_n (x − a)^n + i ∑_{n=0}^∞ t_n (x − a)^n .
In particular,
f^{(ℓ)}(a) = ℓ! c_ℓ .  (11.1)
So the coefficients are uniquely determined by the derivatives of the function, and vice versa.
On the other hand, just because we have an infinitely differentiable function doesn't mean that the numbers c_n obtained by c_n = f^{(n)}(0)/n! give a convergent power series. There is a theorem, which we will not prove, that given an arbitrary sequence {c_n}, there exists an infinitely differentiable function f such that c_n = f^{(n)}(0)/n!. Finally, even if the obtained series converges, it may not converge to the function we started with. For a simpler example, see Exercise 5.4.11 in volume I: The function
f(x) := e^{−1/x} if x > 0, and f(x) := 0 if x ≤ 0,
is infinitely differentiable, and all derivatives at the origin are zero. So its series at the origin would
be just the zero series, and while that series converges, it does not converge to f for x > 0.
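Numerically, one can see why every derivative at the origin vanishes: e^{−1/x} goes to zero faster than any power of x as x → 0⁺. A quick sketch (plain Python; the exponent 5 is an arbitrary sample):

```python
import math

def f(x):
    return math.exp(-1.0 / x) if x > 0 else 0.0

# f(x)/x^n -> 0 as x -> 0+ for every fixed n, which forces f^(n)(0) = 0
for x in [0.1, 0.05, 0.02, 0.01]:
    print(x, f(x) / x ** 5)
```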
Note that we can always apply an affine transformation z 7→ z + a that converts a power series to
a series at the origin. That is, if
f(z) = ∑_{n=0}^∞ c_n (z − a)^n ,  we consider  f(z + a) = ∑_{n=0}^∞ c_n z^n .
Therefore it is usually sufficient to prove results about power series at the origin. From now on, we
often assume a = 0 for simplicity.
As the series converge, each f_k is continuous at 0 (since 0 is the only cluster point, they are continuous everywhere, but we don't need that). For all x ∈ E we have
|f_k(x)| ≤ ∑_{j=1}^∞ |a_{kj}| .
Knowing that ∑_k ∑_j |a_{kj}| converges (and the bound does not depend on x), we conclude that for every x ∈ E the partial sums
∑_{k=1}^n f_k(x)
converge uniformly on E as n → ∞ (by the M-test).
Now we prove that once we have a series converging to a function in some interval, we can
expand the function around any point.
Theorem 11.3.6 (Taylor’s theorem for real-analytic functions). Let
f(x) := ∑_{k=0}^∞ a_k x^k
be a power series converging in (−ρ, ρ) for some ρ > 0. Given any a ∈ (−ρ, ρ) and x such that |x − a| < ρ − |a|, we obtain
f(x) = ∑_{k=0}^∞ ( f^{(k)}(a) / k! ) (x − a)^k .
The power series at a could of course converge in a larger interval, but the one above is
guaranteed. It is the largest symmetric interval about a that fits in (−ρ , ρ ).
Proof. Given a and x as in the theorem, write
f(x) = ∑_{k=0}^∞ a_k ((x − a) + a)^k = ∑_{k=0}^∞ a_k ∑_{m=0}^k \binom{k}{m} a^{k−m} (x − a)^m .
Define c_{k,m} := a_k \binom{k}{m} a^{k−m} if m ≤ k, and c_{k,m} := 0 if m > k. Then
f(x) = ∑_{k=0}^∞ ∑_{m=0}^∞ c_{k,m} (x − a)^m ,  (11.2)
and this series converges (absolutely) as long as |x − a| + |a| < ρ, in other words if |x − a| < ρ − |a|.
Using Fubini for sums, swap the order of summation in (11.2); the resulting series converges when |x − a| < ρ − |a|:
f(x) = ∑_{k=0}^∞ ∑_{m=0}^∞ c_{k,m} (x − a)^m = ∑_{m=0}^∞ ( ∑_{k=0}^∞ c_{k,m} ) (x − a)^m .
By (11.1), the coefficient of (x − a)^m in a power series converging at a must equal f^{(m)}(a)/m!, which proves the theorem.
Note that if a series converges for real x ∈ (a − ρ , a + ρ ) it also converges for all complex
numbers in B(a, ρ ). We have the following corollary.
Corollary 11.3.7. If ∑ ck (z − a)k converges to f (z) in B(a, ρ ) and b ∈ B(a, ρ ), then there exists a
power series ∑ dk (z − b)k that converges to f (z) in B(b, ρ − |b − a|).
Proof. Without loss of generality assume that a = 0. We can rotate to assume that b is real, but since that is harder to picture, let us do it explicitly. Let α := b̄/|b|, and notice that |1/α| = |α| = 1. Expanding in the rotated variable αz around the real point |b| (via Taylor's theorem above, which works equally well here) produces a series in powers of (αz − |b|), and this series converges for all z such that |αz − |b|| < ρ − |b|, in other words (as |α| = 1) for all z with |z − b| < ρ − |b|.
We proved above that a convergent power series is an analytic function where it converges. We have also shown that 1/(1 − z) is analytic outside of z = 1.
Note that just because a real-analytic function is analytic on the entire real line, it does not necessarily have a single power series representation that converges everywhere. For example, the function
f(x) = 1/(1 + x²)
is a real-analytic function on R (exercise). A power series around the origin converging to f has a radius of convergence of exactly 1. Can you see why? (exercise)
Write g(z) := ∑_{k=0}^∞ a_{k+m} z^k (this series converges on the same set as the series for f). The function g is continuous and g(0) = a_m ≠ 0. Thus there exists some δ > 0 such that g(z) ≠ 0 for all z ∈ B(0, δ). As f(z) = z^m g(z), the only point in B(0, δ) where f(z) = 0 is z = 0, but this contradicts the assumption that f(z_n) = 0 for all n.
Recall that in a metric space X, a cluster point (or sometimes limit point) of a set E is a point
p ∈ X such that B(p, ε ) \ {p} contains points of E for all ε > 0.
Theorem 11.3.9 (Identity theorem). Let U ⊂ C be an open connected set. If f : U → C and
g : U → C are analytic functions that are equal on a set E ⊂ U, and E has a cluster point in U,
then f (z) = g(z) for all z ∈ U.
In most common applications of this theorem E is an open set or perhaps a curve.
Proof. Without loss of generality suppose E is the set of all points z ∈ U such that g(z) = f (z).
Note that E must be closed as f and g are continuous.
Suppose E has a cluster point; without loss of generality assume that 0 is this cluster point. Near 0 we have the expansions
f(z) = ∑_{k=0}^∞ a_k z^k  and  g(z) = ∑_{k=0}^∞ b_k z^k ,
both converging in some B(0, ρ). As 0 is a cluster point of E, there is a sequence of nonzero points {z_n} in E converging to 0 with f(z_n) − g(z_n) = 0. Therefore, by the lemma above, a_k = b_k for all k, and hence B(0, ρ) ⊂ E.
This means that E is open. As E is also closed, and U is connected, we conclude that E = U.
By restricting our attention to real x we obtain the same theorem for connected open subsets of
R, which are just open intervals.
11.3.6 Exercises
Exercise 11.3.1: Let
a_{kj} := 1 if k = j,  a_{kj} := −2^{k−j} if k < j,  a_{kj} := 0 otherwise.
Compute (or show the limit doesn't exist):
a) ∑_{j=1}^∞ |a_{kj}| for any k,  b) ∑_{k=1}^∞ |a_{kj}| for any j,  c) ∑_{k=1}^∞ ∑_{j=1}^∞ |a_{kj}|,  d) ∑_{k=1}^∞ ∑_{j=1}^∞ a_{kj},  e) ∑_{j=1}^∞ ∑_{k=1}^∞ a_{kj}.
Hint: Fubini for sums does not apply; in fact, the answers to d) and e) are different.
Exercise 11.3.2: Let f(x) := 1/(1 + x²). Prove that
a) f is an analytic function on all of R, by finding a power series for f at every a ∈ R,
b) the radius of convergence of the power series for f at the origin is 1.
Exercise 11.3.3: Suppose f : C → C is an analytic function that is not identically zero. Show that for each n there are at most finitely many zeros of f in B(0, n), that is, f^{−1}(0) ∩ B(0, n) is finite for each n.
Exercise 11.3.5: Suppose U ⊂ C is a connected open set, 0 ∈ U, and f : U → C is an analytic function. For
real x and y, let h(x) := f (x) and g(y) := −i f (iy). Show that h and g are infinitely differentiable at the origin
and h′ (0) = g′ (0).
Exercise 11.3.6: Suppose f is analytic at least in some neighborhood of the origin. Suppose further that there exists an M such that |f^{(n)}(0)| ≤ M for all n. Prove that the series of f at the origin converges for all z ∈ C.
Exercise 11.3.7: Suppose f(z) := ∑ c_n z^n has radius of convergence 1. Suppose f(0) = 0, but f is not the zero function. Show that there exist a k ∈ N and a convergent power series g(z) := ∑ d_n z^n with radius of convergence 1 such that f(z) = z^k g(z) for all z ∈ B(0, 1), and g(0) ≠ 0.
Exercise 11.3.8: Suppose U ⊂ C is open and connected, f : U → C is analytic, U ∩ R ≠ ∅, and f(x) = 0 for all x ∈ U ∩ R. Show that f(z) = 0 for all z ∈ U.
Exercise 11.3.9: For α ∈ R, let \binom{α}{k} := α(α − 1)⋯(α − k + 1)/k! and f(z) := ∑_{k=0}^∞ \binom{α}{k} z^k.
a) Show that the series converges whenever |z| < 1. In fact, prove that for α = 0, 1, 2, 3, . . . the radius of convergence is ∞, and for all other α the radius of convergence is 1.
b) Show that for x ∈ R, |x| < 1, we have
(1 + x) f′(x) = α f(x).
Exercise 11.3.10: Suppose f : C → C is analytic and suppose that for some open interval (a, b) ⊂ R, f is
real valued on (a, b). Show that f is real-valued on R.
Exercise 11.3.11: Let D = B(0, 1) be the unit disc. Suppose f : D → C is analytic with power series ∑ c_n z^n. Suppose |c_n| ≤ 1 for all n. Prove that for all z ∈ D we have |f(z)| ≤ 1/(1 − |z|).
Define
E(z) := ∑_{k=0}^∞ z^k / k! .
This series converges for all z ∈ C. We notice that E(0) = 1, and that for z = x ∈ R, E(x) ∈ R. Keeping x real, we find
(d/dx) E(x) = E(x)
by direct calculation. In §5.4 of volume I (or by Picard's theorem), we proved that the unique function satisfying E′ = E and E(0) = 1 is the exponential. In other words, for x ∈ R, e^x = E(x). For complex numbers z we define
e^z := E(z) = ∑_{k=0}^∞ (1/k!) z^k .
On the real line this new definition agrees with our previous one. See Figure 11.7. Notice that in the x direction (the real direction) the graph behaves like the real exponential, and in the y direction (the imaginary direction) the graph oscillates.
Figure 11.7: Graphs of the real part (left) and imaginary part (right) of the complex exponential e^z = e^{x+iy}. The x-axis goes from −4 to 4, the y-axis goes from −6 to 6, and the vertical axis goes from −e^4 ≈ −54.6 to e^4 ≈ 54.6. The plot of the real exponential (y = 0) is marked in a bold line.
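As a sanity check, the series definition can be compared against a library exponential. A sketch (plain Python with the standard cmath module; 40 terms is an arbitrary truncation, ample for moderate |z|):

```python
import cmath

def E(z, terms=40):
    # partial sum of sum_{k>=0} z^k / k!, built with the recurrence term *= z/(k+1)
    total, term = 0.0 + 0.0j, 1.0 + 0.0j
    for k in range(terms):
        total += term
        term *= z / (k + 1)
    return total

z = 1.0 + 2.0j
print(E(z))           # the series definition
print(cmath.exp(z))   # the library value; the two agree to machine precision
```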
For all z, w ∈ C,
e^{z+w} = e^z e^w .
Proposition 11.4.2. The sine and cosine functions have the following properties:
(i) For all z ∈ C,
e^{iz} = cos(z) + i sin(z)  (Euler's formula).
For real x,
(d/dx) cos(x) = − sin(x)  and  (d/dx) sin(x) = cos(x).
(xiii) The function x 7→ eix is a bijective map from [0, 2π ) onto the set of z ∈ C such that |z| = 1.
The proposition immediately implies that sin(x) and cos(x) are real whenever x is real.
Proof. The first three items follow directly from the definition. The computation of the power series
for both is left as an exercise.
As complex conjugation is a continuous function, the definition of e^z implies that the conjugate of e^z is e^{z̄}. If x is real, the conjugate of e^{ix} is e^{−ix}. Thus for real x, cos(x) = Re(e^{ix}) and sin(x) = Im(e^{ix}).
For real x we compute
1 = e^{ix} e^{−ix} = |e^{ix}|² = cos²(x) + sin²(x) .
In particular, e^{ix} is unimodular: its values lie on the unit circle. As squares are nonnegative, cos²(x) ≤ 1 and
sin²(x) = 1 − cos²(x) ≤ 1 .
Consider f(x) := x − sin(x). Then
f′(x) = 1 − cos(x) ≥ 0
for all x, as |cos(x)| ≤ 1. In other words, f is increasing and f(0) = 0, so f must be nonnegative when x ≥ 0; that is, sin(x) ≤ x for x ≥ 0.
We claim there exists a positive x such that cos(x) = 0. As cos(0) = 1 > 0, we have cos(x) > 0 for x near 0. Suppose cos(x) > 0 on [0, y); then sin is strictly increasing on [0, y). As sin(0) = 0, we get sin(x) > 0 for x ∈ (0, y). Take a ∈ (0, y). By the mean value theorem, for every x ∈ (a, y) there is a c ∈ (a, x) such that
cos(a) − cos(x) = sin(c)(x − a) ≥ sin(a)(x − a) .
As |cos| ≤ 1, this gives x − a ≤ 2/sin(a), and hence
y ≤ 2/sin(a) + a .
So there is some largest y such that cos(x) > 0 on [0, y); let y be that largest number. By continuity, cos(y) = 0. In fact, y is the smallest positive number such that cos(y) = 0. As mentioned, π is defined to be 2y.
As cos(π/2) = 0, we get sin²(π/2) = 1, and as sin was positive on (0, y), we have sin(π/2) = 1. Hence
e^{iπ/2} = i ,
and so e^{2πi} = (e^{iπ/2})⁴ = i⁴ = 1, which gives
e^{z+2πi} = e^z e^{2πi} = e^z
for all z ∈ C. Immediately we also obtain cos(z + 2π ) = cos(z) and sin(z + 2π ) = sin(z). So sin
and cos are 2π -periodic.
We claim that sin and cos are not periodic with a smaller period. It would be enough to show that if e^{ix} = 1 for the smallest positive x, then x = 2π. So let x be the smallest positive number such that e^{ix} = 1; of course, x ≤ 2π. By the addition formula,
(e^{ix/4})⁴ = e^{ix} = 1 .
More generally, we notice that e^{it} parametrizes the circle by arclength: t measures the arclength along the unit circle, which is exactly the angle in radians. Hence the definitions of sin and cos we have used above agree with the standard geometric definitions.
All the points on the unit circle can be achieved by eit for some t. Therefore, we can write any
complex number z ∈ C (in so-called polar coordinates) as
z = reiθ
for some r ≥ 0 and θ ∈ R. The θ is, of course, not unique as θ or θ + 2π gives the same number.
The formula e^{a+b} = e^a e^b leads to a useful formula for powers and products of complex numbers in polar coordinates:
(re^{iθ})^n = r^n e^{inθ} ,   (re^{iθ})(se^{iγ}) = rs e^{i(θ+γ)} .
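These polar formulas are easy to test numerically; a brief sketch with cmath (the sample values of r, θ, s, γ, and n are arbitrary):

```python
import cmath

r, theta, n = 2.0, 0.7, 5
z = r * cmath.exp(1j * theta)
print(z ** n, (r ** n) * cmath.exp(1j * n * theta))      # (r e^{i theta})^n = r^n e^{i n theta}

s, gamma = 3.0, -1.2
w = s * cmath.exp(1j * gamma)
print(z * w, (r * s) * cmath.exp(1j * (theta + gamma)))  # moduli multiply, angles add
```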
11.4.4 Exercises
Exercise 11.4.1: Derive the power series for sin(z) and cos(z) at the origin.
Exercise 11.4.2: Using the power series, show that for real x we have (d/dx) sin(x) = cos(x) and (d/dx) cos(x) = − sin(x).
Exercise 11.4.3: Finish the proof of the argument that x ↦ e^{ix} from [0, 2π) is onto the unit circle. In particular, assume that we get all points of the form (a, b) where a² + b² = 1 and a ≥ 0. By multiplying by e^{iπ} or e^{−iπ}, show that we get everything.
Exercise 11.4.5: Prove that for every w ≠ 0 and every ε > 0, there exists a z ∈ C with |z| < ε such that e^{1/z} = w.
Exercise 11.4.6: We showed cos²(x) + sin²(x) = 1 for all x ∈ R. Prove that cos²(z) + sin²(z) = 1 for all z ∈ C.
Exercise 11.4.7: Prove the trigonometric identities sin(z + w) = sin(z) cos(w) + cos(z) sin(w) and cos(z + w) = cos(z) cos(w) − sin(z) sin(w) for all z, w ∈ C.
Exercise 11.4.8: Define sinc(z) := sin(z)/z for z ≠ 0 and sinc(0) := 1. Show that sinc is analytic and compute its power series at zero.
sinh(z) := (e^z − e^{−z})/2 ,   cosh(z) := (e^z + e^{−z})/2 .
Exercise 11.4.9: Derive the power series for the hyperbolic sine and cosine.
converges for all −1 ≤ x ≤ 1 (including the endpoints). Hint: integrate the finite sum, not the series.
d) Use this to show that
1 − 1/3 + 1/5 − ⋯ = ∑_{k=0}^∞ (−1)^k / (2k + 1) = π/4 .
Lemma 11.5.1. Let p(z) be a nonconstant complex polynomial. If p(z₀) ≠ 0, then there exists a w ∈ C such that |p(w)| < |p(z₀)|. In fact, we can pick w arbitrarily close to z₀.
Proof. Without loss of generality assume z₀ = 0 and p(0) = 1 (translate and divide by p(z₀)). Write
p(z) = 1 + a_k z^k + a_{k+1} z^{k+1} + ⋯ + a_d z^d ,
where a_k ≠ 0. Pick t such that a_k e^{ikt} = −|a_k|, which we can do by the discussion on trigonometric functions. Suppose r > 0 is small enough that 1 − r^k |a_k| > 0. We have
| p(re^{it}) − ( r^{k+1} a_{k+1} e^{i(k+1)t} + ⋯ + r^d a_d e^{idt} ) | = | 1 + r^k a_k e^{ikt} | = 1 − r^k |a_k| .
So
| p(re^{it}) | ≤ 1 − r^k |a_k| + r^{k+1} |a_{k+1}| + ⋯ + r^d |a_d| .
In other words,
| p(re^{it}) | ≤ 1 − r^k ( |a_k| − r ( |a_{k+1}| + ⋯ + r^{d−k−1} |a_d| ) ) .
For small enough r the expression in the parentheses is positive, as |a_k| > 0. Then |p(re^{it})| < 1 = p(0).
Remark 11.5.2. The above lemma holds essentially with an unchanged proof for (complex) analytic functions. A proof of this generalization is left as an exercise to the reader. What the lemma says is that the only local minima of the modulus of an analytic function (or a polynomial) occur precisely at its zeros.
Remark 11.5.3. The lemma does not hold if we restrict to real numbers. For example, x² + 1 has a minimum at x = 0, but no zero there. There is a w arbitrarily close to 0 such that |w² + 1| < 1, but such a w is necessarily not real; letting w = iε for small ε > 0 works.
The moral of the story is that if p(0) = 1, then very close to 0 the polynomial looks like 1 + az^k, and this has no minimum at the origin. All the higher powers of z are too small to make a difference.
We find similar behavior at infinity.
Lemma 11.5.4. Let p(z) be a nonconstant complex polynomial. Then for every M there exists an R such that |p(z)| ≥ M whenever |z| ≥ R.
Proof. Write p(z) = a₀ + a₁z + ⋯ + a_d z^d with d ≥ 1 and a_d ≠ 0. Suppose |z| ≥ R ≥ 1 (so also |z|^{−1} ≤ R^{−1}). We estimate:
|p(z)| ≥ |a_d| |z|^d − ( |a₀| + |a₁| |z| + ⋯ + |a_{d−1}| |z|^{d−1} ) = |z|^d ( |a_d| − ( |a₀| |z|^{−d} + ⋯ + |a_{d−1}| |z|^{−1} ) ) ≥ R^d ( |a_d| − ( |a₀| R^{−d} + ⋯ + |a_{d−1}| R^{−1} ) ) .
The expression in parentheses is eventually positive for large enough R. In particular, for large enough R it is greater than |a_d|/2, and so
|p(z)| ≥ R^d |a_d| / 2 .
Therefore, we can pick R large enough that this is bigger than a given M.
The above lemma does not generalize to analytic functions, even those defined in all of C. The
function cos(z) is an obvious counterexample. Note that we had to look at the term with the largest
degree, and we only have such a term for a polynomial. In fact, something that we will not prove
is that an analytic function defined on all of C satisfying the conclusion of the lemma must be a
polynomial.
The moral of the story here is that for very large |z| (far away from the origin) a polynomial of
degree d really looks like a constant multiple of zd .
Theorem 11.5.5 (Fundamental theorem of algebra). Let p(z) be a nonconstant complex polynomial. Then there exists a z₀ ∈ C such that p(z₀) = 0.
Proof. Let µ := inf { |p(z)| : z ∈ C }. Find an R such that |p(z)| ≥ µ + 1 for all z with |z| ≥ R. Therefore, any z with |p(z)| close to µ must be in the closed ball C(0, R) = { z ∈ C : |z| ≤ R }. As |p(z)| is a continuous real-valued function, it achieves its minimum on the compact (closed and bounded) set C(0, R), and this minimum must be µ. So there is a z₀ ∈ C(0, R) such that |p(z₀)| = µ. As that is the minimum of |p(z)| on all of C, the first lemma above forces |p(z₀)| = 0.
The theorem doesn’t generalize to analytic functions either. For example ez is an analytic
function on C with no zeros.
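The two lemmas suggest a crude numerical root finder: |p| attains its minimum somewhere, and that minimum must be a zero. A sketch (assuming NumPy; the polynomial, grid size, and zoom schedule are arbitrary choices), not an efficient algorithm:

```python
import numpy as np

coeffs = [2.0, 0.0, 1.0]              # p(z) = 2 + z^2, zeros at +/- i*sqrt(2)

def p(z):
    return sum(c * z ** k for k, c in enumerate(coeffs))

# minimize |p| on a coarse grid, then repeatedly zoom in around the best point
center, width = 0.0 + 0.0j, 8.0
for _ in range(12):
    xs = np.linspace(center.real - width, center.real + width, 41)
    ys = np.linspace(center.imag - width, center.imag + width, 41)
    Z = xs[None, :] + 1j * ys[:, None]
    i = np.unravel_index(np.argmin(np.abs(p(Z))), Z.shape)
    center, width = Z[i], width / 4.0

print(center, abs(p(center)))         # close to a zero of p, with |p| nearly 0
```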
11.5.1 Exercises
Exercise 11.5.1: Prove Lemma 11.5.1 for an analytic function; that is, suppose that p(z) is given by a convergent power series around z₀.
Exercise 11.5.2: Use the previous exercise to prove the maximum principle for analytic functions: If U ⊂ C is open and connected, f : U → C is analytic, and |f(z)| attains a relative maximum at z₀ ∈ U, then f is constant.
Exercise 11.5.3: Let U ⊂ C be open and z₀ ∈ U. Suppose f : U → C is analytic and f(z₀) = 0. Show that there exists an ε > 0 such that either f(z) ≠ 0 for all z with 0 < |z − z₀| < ε, or f(z) = 0 for all z ∈ B(z₀, ε). In other words, zeros of analytic functions are isolated. Of course, the same holds for polynomials.
A rational function is a function f(z) := p(z)/q(z), where p and q are polynomials and q is not identically zero. A point z₀ ∈ C where f(z₀) = 0 (and therefore p(z₀) = 0) is called a zero of f. A point z₀ ∈ C is called a singularity of f if q(z₀) = 0. As zeros of polynomials are isolated, all singularities of rational functions are isolated, and so each is called an isolated singularity. An isolated singularity is called removable if lim_{z→z₀} f(z) exists. An isolated singularity is called a pole if lim_{z→z₀} |f(z)| = ∞. We say f has a pole at ∞ if
lim_{z→∞} |f(z)| = ∞ ,
that is, if for every M > 0 there exists an R > 0 such that |f(z)| > M for all z with |z| > R.
Exercise 11.5.4: Show that a rational function which is not identically zero has at most finitely many zeros
and singularities. In fact, show that if p is a polynomial of degree n > 0 it has at most n zeros.
Hint: If z0 is a zero of p, without loss of generality assume z0 = 0, then use induction.
Exercise 11.5.5: Suppose z₀ is a removable singularity of a rational function f(z) := p(z)/q(z). Show that there exist polynomials p̃ and q̃ such that q̃(z₀) ≠ 0 and f(z) = p̃(z)/q̃(z).
Hint: Without loss of generality assume z₀ = 0.
Exercise 11.5.6: Given a rational function f and an isolated singularity z₀, show that z₀ is either removable or a pole.
Hint: See the previous exercise.
Exercise 11.5.7: Let f be a rational function and let S ⊂ C be the set of singularities of f. Prove that f is equal to a polynomial on C \ S if and only if f has a pole at infinity and all the singularities are removable.
Hint: See the previous exercises.
Definition 11.6.1. Let X be any set, and let f_n : X → C be functions in a sequence. We say that {f_n} is pointwise bounded if for every x ∈ X there is an M_x ∈ R such that |f_n(x)| ≤ M_x for all n. We say that {f_n} is uniformly bounded if there is an M ∈ R such that |f_n(x)| ≤ M for all n and all x ∈ X.
Example 11.6.2: There exist sequences of continuous functions on [0, 1] that are uniformly bounded
but contain no subsequence converging even pointwise. Let us state without proof that fn (x) :=
sin(2π nx) is one such sequence. Below we will show that there must always exist a subsequence
converging at countably many points, but [0, 1] is uncountable.
Example 11.6.3: The sequence f_n(x) := x^n of functions on [0, 1] is uniformly bounded, but contains no subsequence that converges uniformly, although the sequence converges pointwise (to a discontinuous function).
Example 11.6.4: The sequence { fn } of functions in C([0, 1], R) given by fn (x) := n2 (1 − x)xn
converges pointwise to the zero function (use the ratio test for x < 1). As for each x, { fn (x)}
converges to 0, it is bounded so { fn } is pointwise bounded.
By calculus we find that the maximum of each f_n on [0, 1] is at the critical point x = n/(n+1), and
‖f_n‖_u = f_n( n/(n+1) ) = n ( n/(n+1) )^{n+1} .
It is left to the reader to check that lim ( n/(n+1) )^{n+1} = e^{−1}, and so lim ‖f_n‖_u = ∞; in other words, this sequence is not uniformly bounded.
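A quick numerical check of these sup norms (assuming NumPy; the grid resolution and sampled n are arbitrary):

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 100001)
for n in [1, 5, 10, 50, 100]:
    fn = n ** 2 * (1.0 - xs) * xs ** n
    formula = n * (n / (n + 1.0)) ** (n + 1)
    print(n, fn.max(), formula)   # both grow roughly like n/e: no uniform bound
```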
When the domain is countable, we can always guarantee at least pointwise convergence. The
proof uses a very common and useful diagonal argument.
Proposition 11.6.5. Let X be a countable set and let f_n : X → C be a pointwise bounded sequence of functions. Then {f_n} has a subsequence that converges pointwise.
Definition 11.6.6. Let (X, d) be a metric space and S a set of functions f : X → C. The set S is uniformly equicontinuous if for every ε > 0 there is a δ > 0 such that whenever x, y ∈ X with d(x, y) < δ, we have |f(x) − f(y)| < ε for all f ∈ S.
Notice that the functions in a uniformly equicontinuous sequence are all uniformly continuous. It is not hard to show that a finite set of uniformly continuous functions is uniformly equicontinuous. The definition is really interesting if S is infinite.
And just as for continuity, one can define equicontinuity at a point. That is, S is equicontinuous at
x ∈ X if for every ε > 0, there is a δ > 0 such that if y ∈ X with d(x, y) < δ we have | f (x) − f (y)| < ε
for all f ∈ S. We will only deal with compact X here, and one can prove (exercise) that for a compact
metric space X, if S is equicontinuous at every x ∈ X, then it is uniformly equicontinuous. For
simplicity we stick to uniform equicontinuity.
Proposition 11.6.7. Suppose (X, d) is a compact metric space, f_n ∈ C(X, C), and {f_n} converges uniformly. Then {f_n} is uniformly equicontinuous.
Proof. Let ε > 0 be given. As {f_n} converges uniformly, there is an N ∈ N such that ‖f_n − f_N‖_u < ε/3 for all n ≥ N. The functions f₁, …, f_N, being continuous on a compact space, are uniformly continuous, so there is a δ > 0 such that d(x, y) < δ implies |f_j(x) − f_j(y)| < ε/3 for all j = 1, …, N; in particular, the finitely many functions f₁, …, f_{N−1} satisfy the definition with this δ. For n ≥ N and d(x, y) < δ,
|f_n(x) − f_n(y)| ≤ |f_n(x) − f_N(x)| + |f_N(x) − f_N(y)| + |f_N(y) − f_n(y)| < ε/3 + ε/3 + ε/3 = ε .
Proposition 11.6.8. A compact metric space X contains a countable dense subset, that is, there
exists a countable D ⊂ X such that D = X.
Proof. For each n ∈ N there are finitely many balls of radius 1/n that cover X (as X is compact). That is, for every n there exists a finite set of points x_{n,1}, x_{n,2}, …, x_{n,k_n} such that
X = ⋃_{j=1}^{k_n} B(x_{n,j}, 1/n) .
Let D := ⋃_{n=1}^∞ { x_{n,1}, x_{n,2}, …, x_{n,k_n} }. The set D is countable, as it is a countable union of finite sets. For every x ∈ X and every ε > 0, there exists an n such that 1/n < ε and an x_{n,j} ∈ D such that d(x, x_{n,j}) < 1/n < ε. Hence D is dense in X.
Equicontinuity together with pointwise boundedness gives uniform boundedness: pick δ > 0 such that d(x, y) < δ implies |f_n(x) − f_n(y)| < 1 for all n, cover X by finitely many balls B(x₁, δ), …, B(x_k, δ), and pick M_j such that
|f_n(x_j)| ≤ M_j for all n.
For any x ∈ X there is a j with d(x, x_j) < δ, so
|f_n(x) − f_n(x_j)| < 1 ,
and hence
|f_n(x)| < 1 + |f_n(x_j)| ≤ 1 + M_j ≤ M ,  where M := 1 + max_j M_j .
∗Named after the Italian mathematicians Cesare Arzelà (1847–1912) and Giulio Ascoli (1843–1896).
As there are finitely many points and {g_n} converges pointwise on D, there exists a single N such that for all n, m ≥ N we have
|g_n(x_j) − g_m(x_j)| < ε/3 for all j.
Let x ∈ X be arbitrary. There is some j such that x ∈ B(x_j, δ), and so for all n, m ≥ N,
|g_n(x) − g_m(x)| ≤ |g_n(x) − g_n(x_j)| + |g_n(x_j) − g_m(x_j)| + |g_m(x_j) − g_m(x)| < ε/3 + ε/3 + ε/3 = ε .
All the f_n are Lipschitz with the same constant L, and hence the sequence is uniformly equicontinuous. Suppose |f_n(x₀)| ≤ M₀ for all n. For all x ∈ [a, b],
|f_n(x)| ≤ |f_n(x₀)| + L|x − x₀| ≤ M₀ + L(b − a) ,
so the sequence is also uniformly bounded.
11.6.1 Exercises
Exercise 11.6.1: Let f_n : [−1, 1] → R be given by f_n(x) := nx/(1 + (nx)²). Prove that the sequence is uniformly bounded and converges pointwise to 0, but does not converge uniformly to 0. Which hypothesis of Arzelà–Ascoli is not satisfied? Prove your assertion.
Exercise 11.6.2: Define f_n : R → R by f_n(x) := 1/((x − n)² + 1). Prove that this sequence is uniformly bounded and uniformly equicontinuous, and that it converges pointwise to zero, yet there is no subsequence that converges uniformly. Which hypothesis of Arzelà–Ascoli is not satisfied? Prove your assertion.
Exercise 11.6.3: Let (X, d) be a compact metric space, C > 0, 0 < α ≤ 1, and suppose f_n : X → C are functions such that |f_n(x) − f_n(y)| ≤ C d(x, y)^α for all x, y ∈ X and n ∈ N. Suppose also that there is a point p ∈ X such that f_n(p) = 0 for all n. Show that there exists a uniformly convergent subsequence converging to an f : X → C that also satisfies f(p) = 0 and |f(x) − f(y)| ≤ C d(x, y)^α.
Exercise 11.6.4: Let T : C([0, 1], C) → C([0, 1], C) be the operator given by
T f(x) := ∫₀^x f(t) dt .
(That T is linear and that T f is continuous follows from linearity of the integral and the fundamental theorem of calculus.)
a) Show that T takes the unit ball centered at 0 in C([0, 1], C) into a relatively compact set (a set with
compact closure). That is, T is a compact operator.
Hint: See Exercise 7.4.20 in Volume I.
b) Let C ⊂ C([0, 1], C) be the closed unit ball. Prove that the image T(C) is not closed (though it is relatively compact).
Exercise 11.6.5: Given k ∈ C([0, 1] × [0, 1], C), let T : C([0, 1], C) → C([0, 1], C) be the operator defined by
T f(x) := ∫₀¹ f(t) k(x, t) dt .
Show that T takes the unit ball centered at 0 in C([0, 1], C) into a relatively compact set (a set with compact closure). That is, T is a compact operator.
Hint: See Exercise 7.4.20 in Volume I.
Note: That T is a well-defined linear operator was proved earlier.
Exercise 11.6.6: Suppose S¹ ⊂ C is the unit circle, that is, the set where |z| = 1. Suppose the continuous functions f_n : S¹ → C are uniformly bounded. Let γ : [0, 1] → S¹ be a parametrization of S¹, and let g(z, w) be a continuous function on C(0, 1) × S¹ (here C(0, 1) ⊂ C is the closed unit ball). Define the functions F_n : C(0, 1) → C by the path integral
F_n(z) := ∫_γ f_n(w) g(z, w) ds(w) .
Show that {F_n} has a uniformly convergent subsequence.
Exercise 11.6.7: Suppose (X, d) is a compact metric space, { fn } a uniformly equicontinuous sequence of
functions in C(X, C). Suppose { fn } converges pointwise. Show that it converges uniformly.
Exercise 11.6.8: Suppose that { fn } is a uniformly equicontinuous uniformly bounded sequence of 2π-
periodic functions fn : R → R. Show that there is a uniformly convergent subsequence.
Exercise 11.6.9: Show that for a compact metric space X, a sequence { fn } that is equicontinuous at every
x ∈ X is uniformly equicontinuous.
Exercise 11.6.10: Define f_n : [0, 1] → C by f_n(t) := e^{i(2πt+n)}. This is a uniformly equicontinuous, uniformly bounded sequence. Prove more than just the conclusion of Arzelà–Ascoli for this sequence: Let γ ∈ R be given and define g(t) := e^{i(2πt+γ)}. Show that there exists a subsequence of {f_n} converging uniformly to g.
Hint: Feel free to use the Kronecker density theorem: The sequence {e^{in}}_{n=1}^∞ is dense in the unit circle.
Exercise 11.6.11: Prove the Peano existence theorem (note the lack of uniqueness in this theorem):
Theorem: Suppose F : I × J → R is a continuous function where I, J ⊂ R are closed bounded intervals,
let I ◦ and J ◦ be their interiors, and let (x0 , y0 ) ∈ I ◦ × J ◦ . Then there exists an h > 0 and a differentiable
function f : [x0 − h, x0 + h] → J ⊂ R, such that
f′(x) = F(x, f(x)) and f(x₀) = y₀ .
a) Prove that there exists an h > 0 such that f_n : [x₀ − h, x₀ + h] → C is well-defined for all n. Hint: F is bounded (why?).
b) Show that {f_n} is equicontinuous and bounded; in fact, it is Lipschitz with a uniform Lipschitz constant. Arzelà–Ascoli then says that there exists a uniformly convergent subsequence {f_{n_k}}.
c) Prove that { F(x, f_{n_k}(x)) }_{k=1}^∞ converges uniformly on [x₀ − h, x₀ + h]. Hint: F is uniformly continuous (why?).
d) Finish the proof of the theorem by taking the limit under the integral and applying the fundamental
theorem of calculus.
∗Named after the German mathematician Leopold Kronecker (1823–1891).
If we can prove the theorem for g and find the sequence {p_n} for g, we prove it for f, as we have simply composed with an invertible affine function and added an affine function to f. We can reverse the process and apply it to our p_n to obtain polynomials approximating f.
The function g is defined on [0, 1] and g(0) = g(1) = 0. Assume that g is defined on the whole
real line for simplicity by defining g(x) := 0 if x < 0 or x > 1. This extended g is continuous.
Define
c_n := ( ∫_{−1}^1 (1 − x²)^n dx )^{−1} ,   q_n(x) := c_n (1 − x²)^n .
The choice of c_n is such that ∫_{−1}^1 q_n(x) dx = 1. See Figure 11.8.

Figure 11.8: Plot of the approximate delta functions q_n on [−1, 1] for n = 5, 10, 15, 20, …, 100, with higher n in a lighter shade.
The functions q_n are peaks around 0 (ignoring what happens outside of [−1, 1]) that get narrower and taller as n increases, while the area underneath is always 1. A classic approximation idea is to take a convolution integral with peaks like this: For x ∈ [0, 1], let
p_n(x) := ∫₀¹ g(t) q_n(x − t) dt = ∫_{−∞}^∞ g(t) q_n(x − t) dt .
The idea of this convolution is that we take a “weighted average” of the function g around the point x, using q_n as the weight. See Figure 11.9.

Figure 11.9: For x = 0.3, the plot of q₁₀₀(x − t) (the light gray peak centered at x), some continuous function g(t) (the jagged line), and the product g(t) q₁₀₀(x − t) (the bold line).
As qn is a narrow peak, the integral mostly sees the values of g that are close to x and it does the
weighted average of them. When the peak gets narrower, we compute this average closer to x and
we expect the result to get closer to the value of g(x). Really we are approximating what is called a
delta function (don’t worry if you have not heard of this concept), and functions like qn are often
called approximate delta functions. We could do this with any set of polynomials that look like
narrower and narrower peaks near zero. These just happen to be the simplest ones. We only need
this behavior on [−1, 1] as the convolution sees nothing further than this as g is zero outside [0, 1].
Because q_n is a polynomial, we write
q_n(x − t) = a₀(t) + a₁(t) x + ⋯ + a_{2n}(t) x^{2n} ,
where the a_k(t) are polynomials in t, in particular continuous and hence integrable functions. So
p_n(x) = ∫₀¹ g(t) q_n(x − t) dt = ( ∫₀¹ g(t) a₀(t) dt ) + ( ∫₀¹ g(t) a₁(t) dt ) x + ⋯ + ( ∫₀¹ g(t) a_{2n}(t) dt ) x^{2n} .
In other words, pn is a polynomial in x. If g(t) is real-valued then the functions g(t)a j (t) are
real-valued and hence pn has real coefficients, proving the “furthermore” part of the theorem.
∗
which is not actually a function
†
Do note that the functions a j depend on n, so the coefficients of pn change as n changes.
We still need to prove that {p_n} converges to g. First let us get some handle on the size of c_n. For x ∈ [0, 1] we have 1 − x ≤ 1 − x². We estimate
c_n^{−1} = ∫_{−1}^1 (1 − x²)^n dx = 2 ∫₀¹ (1 − x²)^n dx ≥ 2 ∫₀¹ (1 − x)^n dx = 2/(n + 1) .
So c_n ≤ (n + 1)/2 ≤ n.
Let us see how small q_n is if we ignore some small interval around the origin, where the peak is. Given any δ with 0 < δ < 1, for x such that δ ≤ |x| ≤ 1 we have
q_n(x) ≤ c_n (1 − δ²)^n ≤ n (1 − δ²)^n ,
because q_n is increasing on [−1, 0] and decreasing on [0, 1]. By the ratio test, n(1 − δ²)^n goes to 0 as n goes to infinity.
The function q_n is even, q_n(t) = q_n(−t), and g is zero outside of [0, 1]. So for x ∈ [0, 1],
p_n(x) = ∫₀¹ g(t) q_n(x − t) dt = ∫_{−x}^{1−x} g(x + t) q_n(−t) dt = ∫_{−1}^1 g(x + t) q_n(t) dt .
Let ε > 0 be given. As [−1, 2] is compact and g is continuous on [−1, 2], g is uniformly continuous. Pick 0 < δ < 1 such that |x − y| < δ (with x, y ∈ [−1, 2]) implies
|g(x) − g(y)| < ε/2 .
Let M be such that |g(x)| ≤ M for all x. Let N be such that for all n ≥ N,
4Mn(1 − δ²)^n < ε/2 .
Note that ∫_{−1}^1 q_n(t) dt = 1 and q_n(t) ≥ 0 on [−1, 1]. So for n ≥ N and any x ∈ [0, 1],
|p_n(x) − g(x)| = | ∫_{−1}^1 g(x + t) q_n(t) dt − g(x) ∫_{−1}^1 q_n(t) dt |
 = | ∫_{−1}^1 ( g(x + t) − g(x) ) q_n(t) dt |
 ≤ ∫_{−1}^1 |g(x + t) − g(x)| q_n(t) dt
 = ∫_{−1}^{−δ} |g(x + t) − g(x)| q_n(t) dt + ∫_{−δ}^{δ} |g(x + t) − g(x)| q_n(t) dt + ∫_{δ}^{1} |g(x + t) − g(x)| q_n(t) dt
 ≤ 2M ∫_{−1}^{−δ} q_n(t) dt + (ε/2) ∫_{−δ}^{δ} q_n(t) dt + 2M ∫_{δ}^{1} q_n(t) dt
 ≤ 2Mn(1 − δ²)^n (1 − δ) + ε/2 + 2Mn(1 − δ²)^n (1 − δ)
 < 4Mn(1 − δ²)^n + ε/2 < ε .
A convolution often inherits properties of the functions we are convolving. In our case, the convolution p_n inherited the property of being a polynomial from q_n. The same idea is often used to obtain other properties: if q_n or g is infinitely differentiable, so is p_n; if q_n or g is a solution to a linear differential equation, so is p_n; etc.
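The convolution in the proof is straightforward to carry out numerically. Below is a rough sketch (assuming NumPy; the test function g, the grid sizes, and the values of n are arbitrary choices) that evaluates p_n on a grid by Riemann sums and reports the uniform error:

```python
import numpy as np

def g(t):
    # a continuous function on [0, 1] with g(0) = g(1) = 0, extended by zero
    return np.where((t >= 0) & (t <= 1),
                    np.sin(np.pi * t) + 0.3 * np.sin(3 * np.pi * t), 0.0)

ts = np.linspace(-1.0, 2.0, 6001)     # integration grid; g vanishes outside [0, 1]
dt = ts[1] - ts[0]
us = np.linspace(-1.0, 1.0, 4001)     # grid for normalizing the kernel
du = us[1] - us[0]

for n in [5, 20, 80]:
    cn = 1.0 / (((1.0 - us ** 2) ** n).sum() * du)   # so that q_n integrates to 1
    err = 0.0
    for x in np.linspace(0.0, 1.0, 101):
        u = x - ts
        qn = np.where(np.abs(u) <= 1.0, cn * (1.0 - u ** 2) ** n, 0.0)
        err = max(err, abs((g(ts) * qn).sum() * dt - g(x)))
    print(n, err)                     # the uniform error decreases as n grows
```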
Let us note an immediate application of the Weierstrass theorem. We have already seen that
countable dense subsets can be very useful.
Corollary 11.7.2. The metric space C([a, b], C) contains a countable dense subset.
Proof. Without loss of generality suppose that we are dealing with C([a, b], R) (why?). The real
polynomials are dense in C([a, b], R) by Weierstrass. If we show that any real polynomial can be
approximated by polynomials with rational coefficients, we are done. This is because there are only
countably many rational numbers and so there are only countably many polynomials with rational
coefficients (a countable union of countable sets is still countable).
Further, without loss of generality suppose [a, b] = [0, 1]. Let
p(x) := ∑_{k=0}^n a_k x^k
be a polynomial of degree n, where a_k ∈ R. Given ε > 0, pick b_k ∈ Q such that |a_k − b_k| < ε/(n+1). If we let
q(x) := ∑_{k=0}^n b_k x^k ,
then for x ∈ [0, 1] we have
|p(x) − q(x)| = | ∑_{k=0}^n (a_k − b_k) x^k | ≤ ∑_{k=0}^n |a_k − b_k| x^k ≤ ∑_{k=0}^n |a_k − b_k| < ∑_{k=0}^n ε/(n+1) = ε .
Remark 11.7.3. While we will not prove this, the above corollary implies that C([a, b], C) has
the same cardinality as R, which may be a bit surprising. The set of all functions [a, b] → C has
cardinality that is strictly greater than the cardinality of R, it has the cardinality of the power set of
R. So the set of continuous functions is a very tiny subset of the set of all functions.
Warning! The fact that every continuous function f : [−1, 1] → C (or on any interval [a, b]) can be uniformly approximated by polynomials
∑_{k=0}^n a_k x^k
does not mean that any continuous f is analytic, that is, that it is equal to a convergent series
∑_{k=0}^∞ c_k x^k .
Corollary 11.7.4. Let [−a, a] be an interval. Then there is a sequence of real polynomials {p_n} that converges uniformly to |x| on [−a, a] and such that p_n(0) = 0 for all n.
Proof. As f(x) := |x| is continuous and real-valued on [−a, a], the Weierstrass theorem gives a sequence of real polynomials {p̃_n} that converges to f uniformly on [−a, a]. Let
p_n(x) := p̃_n(x) − p̃_n(0) .
Obviously p_n(0) = 0.
Given ε > 0, let N be such that for n ≥ N we have |p̃_n(x) − |x|| < ε/2 for all x ∈ [−a, a]; in particular, |p̃_n(0)| < ε/2. Then for n ≥ N,
| p_n(x) − |x| | = | p̃_n(x) − p̃_n(0) − |x| | ≤ | p̃_n(x) − |x| | + |p̃_n(0)| < ε/2 + ε/2 = ε .
Following the proof of the corollary, we can always make the polynomials from the Weierstrass
theorem have a fixed value at one point, so it works not just for |x|, but that’s the one we will need.
We are interested in the case when X is a compact metric space; then C(X, C) and C(X, R) are metric spaces. Given a set A ⊂ C(X, C), the set of all uniform limits of elements of A is the metric space closure A̅. When we talk about the closure of an algebra from now on, we mean the closure in C(X, C) as a metric space; the same goes for C(X, R).
The set P of all polynomials is an algebra in C([a, b], C), and we have shown that its closure satisfies P̅ = C([a, b], C); that is, P is dense. That is the sort of result that we wish to prove.
We leave the following proposition as an exercise.
Proposition 11.7.6. Suppose X is a compact metric space. If A ⊂ C(X, C) is an algebra, then the
closure A is also an algebra. Similarly for a real algebra in C(X, R).
∗Named after the American mathematician Marshall H. Stone (1903–1989) and the German mathematician Karl Weierstrass (1815–1897).
Let us distill the properties of polynomials that were sufficient for an approximation theorem.
Example 11.7.8: The set P of polynomials separates points and vanishes at no point on R. That is, 1 ∈ P, so P vanishes at no point. And for x, y ∈ R with x ≠ y, take f(t) := t; then f(x) = x ≠ y = f(y). So P separates points.
Example 11.7.10: The set of polynomials with no constant term vanishes at the origin.
Let
f := c (g − g(y)) h / ( (g(x) − g(y)) h(x) ) + d (g − g(x)) k / ( (g(y) − g(x)) k(y) ) = c (gh − g(y)h) / ( g(x)h(x) − g(y)h(x) ) + d (gk − g(x)k) / ( g(y)k(y) − g(x)k(y) ) .
Do note that we are not dividing by zero (clear from the first formula). Also from the first formula we see that f(x) = c and f(y) = d. By the second formula we see that f ∈ A (as A is an algebra).
Theorem 11.7.12 (Stone–Weierstrass, real version). Let X be a compact metric space and A an
algebra of real-valued continuous functions on X, such that A separates points and vanishes at no
point. Then the closure A = C(X, R).
Proof. Claim 1: If f ∈ A̅, then |f| ∈ A̅.
Proof. The function f is bounded (continuous on a compact set), so there is an M such that |f(x)| ≤ M for all x ∈ X. Let ε > 0 be given. By the corollary to the Weierstrass theorem, there exists a real polynomial c₁y + c₂y² + ⋯ + c_N y^N (vanishing at y = 0) such that
| |y| − ∑_{j=1}^N c_j y^j | < ε
for all y ∈ [−M, M]. Because A̅ is an algebra and because there is no constant term in the polynomial,
∑_{j=1}^N c_j f^j ∈ A̅ .
This function is uniformly within ε of |f|, and as A̅ is closed, |f| ∈ A̅.
Claim 2: If f, g ∈ A̅, then max(f, g) ∈ A̅ and min(f, g) ∈ A̅.
Proof. Write
max(f, g) = (f + g)/2 + |f − g|/2 ,
min(f, g) = (f + g)/2 − |f − g|/2 .
As A̅ is an algebra and |f − g| ∈ A̅ by Claim 1, we are done.
The claim holds for the minimum or maximum of any finite collection of functions as well.
Claim 3: Given f ∈ C(X, R), x ∈ X, and ε > 0, there exists a g_x ∈ A̅ with g_x(x) = f(x) and g_x(t) > f(t) − ε for all t ∈ X.
Proof. Fix x. For every y ∈ X, the discussion above gives an h_y ∈ A̅ such that h_y(x) = f(x) and h_y(y) = f(y). The set
U_y := { t ∈ X : h_y(t) > f(t) − ε }
is open (it is the inverse image of an open set by a continuous function). Furthermore, y ∈ U_y. So the sets U_y cover X.
The space X is compact, so there exist finitely many points y₁, y₂, …, y_n in X such that
X = ⋃_{j=1}^n U_{y_j} .
Let
g_x := max( h_{y₁}, h_{y₂}, …, h_{y_n} ) .
By Claim 2, g_x ∈ A̅. Furthermore,
g_x(t) > f(t) − ε
for all t ∈ X, since for every t there is a y_j such that t ∈ U_{y_j}, and so h_{y_j}(t) > f(t) − ε. Finally, h_y(x) = f(x) for all y ∈ X, so g_x(x) = f(x).
Claim 4: If f ∈ C(X, R) and ε > 0 is given, then there exists a ϕ ∈ A̅ such that
|f(x) − ϕ(x)| < ε for all x ∈ X.
Proof. For every x, find the function g_x as in Claim 3. Let
V_x := { t ∈ X : g_x(t) < f(t) + ε } .
The sets V_x are open, as g_x and f are continuous. As g_x(x) = f(x), we have x ∈ V_x, so the sets V_x cover X. By compactness of X, there are finitely many points x₁, x₂, …, x_k such that
X = ⋃_{j=1}^k V_{x_j} .
Let
ϕ := min( g_{x₁}, g_{x₂}, …, g_{x_k} ) .
By Claim 2, ϕ ∈ A̅. As before (the same argument as in Claim 3), for all t ∈ X,
ϕ(t) < f(t) + ε .
Since all the g_x satisfy g_x(t) > f(t) − ε for all t ∈ X, also ϕ(t) > f(t) − ε. Therefore, for all t,
−ε < ϕ(t) − f(t) < ε ,
which is the desired conclusion.
The proof of the theorem follows from Claim 4. The claim states that an arbitrary continuous
function is in the closure of A , which itself is closed. So the theorem is proved.
Example 11.7.13: The functions of the form
f(t) = ∑_{j=1}^n c_j e^{jt} ,
for c_j ∈ R, are dense in C([a, b], R). Such functions form a real algebra, which follows from e^{jt} e^{kt} = e^{(j+k)t}. They separate points as e^t is one-to-one, and e^t > 0 for all t, so the algebra does not vanish at any point.
In general if we have a set of functions that separates points and does not vanish at any point, we
can let these functions generate an algebra by considering all the linear combinations of arbitrary
multiples of such functions. That is, we consider all real polynomials without constant term of such
functions. In the example above, the algebra is generated by et . We consider polynomials in et
without constant term.
Example 11.7.14: We mentioned that the set of all functions of the form
a₀ + ∑_{n=1}^N a_n cos(nt)
is dense in C([0, π], R); checking the hypotheses of the real Stone–Weierstrass theorem is a good exercise.
The self-adjointness requirement is necessary, although this is not so obvious to see. For an example, see Exercise 11.7.9.
Here is an interesting application. When working with functions of two variables, it may be
useful to work with functions of the form f (x)g(y) rather than F(x, y). For example, they are easier
to integrate. We have the following.
Example 11.7.17: Any continuous function F : [0, 1] × [0, 1] → C can be approximated uniformly by functions of the form
∑_{j=1}^n f_j(x) g_j(y) ,
where f_j, g_j ∈ C([0, 1], C).
11.7.3 Exercises
Exercise 11.7.1: Prove Proposition 11.7.6. Hint: If {f_n} is a sequence in C(X, R) converging to f, then as f is bounded, you can show that {f_n} is uniformly bounded, that is, there exists a single bound for all f_n (and f).
Exercise 11.7.2: Suppose X := R (in particular, not compact). Show that f(t) := e^t cannot be uniformly approximated by polynomials on X. Hint: Consider e^t/t^n as t → ∞.
Exercise 11.7.3: Show that if f : [0, 1] → C is a uniform limit of a sequence of polynomials of degree at most d, then the limit is a polynomial of degree at most d. Conclude that to approximate a function which is not a polynomial, we need the degrees of the approximations to go to infinity.
Hint: First prove that if a sequence of polynomials of degree d converges uniformly to the zero function, then the coefficients converge to zero. One way to do this is linear algebra: Consider a polynomial p evaluated at d + 1 points as a linear operator taking the coefficients of p to the values of p (an operator in L(R^{d+1})).
Exercise 11.7.4: Suppose f : [0, 1] → R is continuous and ∫₀¹ f(x) x^n dx = 0 for all n = 0, 1, 2, . . .. Show that f(x) = 0 for all x ∈ [0, 1]. Hint: Approximate by polynomials to show that ∫₀¹ (f(x))² dx = 0.
Exercise 11.7.5: Suppose I : C([0, 1], R) → R is a continuous linear function such that I(x^n) = 1/(n+1) for all n = 0, 1, 2, 3, . . .. Prove that I(f) = ∫₀¹ f for all f ∈ C([0, 1], R).
Exercise 11.7.6: Let A be the collection of real polynomials in x2 , that is polynomials of the form c0 +
c1 x2 + c2 x4 + · · · + cd x2d .
a) Show that every f ∈ C([0, 1], R) is a uniform limit of polynomials from A .
b) Find an f ∈ C([−1, 1], R) that is not a uniform limit of polynomials from A .
c) Which hypothesis of the real Stone-Weierstrass is not satisfied for the domain [−1, 1]?
Exercise 11.7.8: Show that for complex numbers c_k, the set of functions of x on [−π, π] of the form
∑_{k=−n}^n c_k e^{ikx}
satisfies the hypotheses of the complex Stone–Weierstrass theorem, and therefore such functions are dense in C([−π, π], C).
Exercise 11.7.9: Let S¹ ⊂ C be the unit circle, that is, the set where |z| = 1, oriented counterclockwise. Let γ(t) := e^{it}. For the one-form f(z) dz we write
∫_{S¹} f(z) dz := ∫₀^{2π} f(e^{it}) i e^{it} dt .
a) Prove that for all nonnegative integers k = 0, 1, 2, 3, . . . we have ∫_{S¹} z^k dz = 0.
b) Prove that if P(z) = ∑_{k=0}^n c_k z^k is any polynomial in z, then ∫_{S¹} P(z) dz = 0.
c) Prove ∫_{S¹} z̄ dz ≠ 0.
d) Conclude that polynomials in z (this algebra of functions is not self-adjoint) are not dense in C(S¹, C).
Exercise 11.7.10: Let (X, d) be a compact metric space and suppose A ⊂ C(X, R) is a real algebra that
separates points, but such that for some x0 , f (x0 ) = 0 for all f ∈ A . Prove that any function g ∈ C(X, R)
such that g(x0 ) = 0 is a uniform limit of functions from A .
Exercise 11.7.11: Let (X, d) be a compact metric space and suppose A ⊂ C(X, R) is a real algebra. Suppose that for each y ∈ X the closure A̅ contains the function ϕ_y(x) := d(y, x). Prove that A̅ = C(X, R).
Exercise 11.7.12:
a) Suppose f : [a, b] → C is continuously differentiable. Show that there exists a sequence of polynomials {p_n} that converges to f in the C¹ norm, that is, ‖f − p_n‖_u + ‖f′ − p_n′‖_u → 0 as n → ∞.
b) Suppose f : [a, b] → C is k times continuously differentiable. Show that there exists a sequence of polynomials {p_n} that converges to f in the C^k norm, that is,
∑_{j=0}^k ‖f^{(j)} − p_n^{(j)}‖_u → 0 as n → ∞.
∗One could also define dz := dx + i dy and then extend the path integral to complex-valued one-forms.
Exercise 11.7.13:
a) Show that an even function f : [−1, 1] → R is a uniform limit of polynomials with even powers only, that is, polynomials of the form a₀ + a₁x² + a₂x⁴ + ⋯ + a_k x^{2k}.
b) Show that an odd function f : [−1, 1] → R is a uniform limit of polynomials with odd powers only, that is, polynomials of the form b₁x + b₂x³ + b₃x⁵ + ⋯ + b_k x^{2k−1}.
The second form is usually more convenient. Note that if |z| = 1, we write z = e^{ix}, and so
∑_{n=−N}^N c_n e^{inx} = ∑_{n=−N}^N c_n z^n .
So a trigonometric polynomial is really a rational function (do note that we are allowing negative powers) evaluated on the unit circle. There is a wonderful connection between power series (actually Laurent series, because of the negative powers) and Fourier series due to this observation, but we will not investigate it further.
Another reason why Fourier series are important and come up in so many applications is that the functions e^{ikx} are eigenfunctions† of various differential operators. For example,
(d/dx) e^{ikx} = (ik) e^{ikx} ,   (d²/dx²) e^{ikx} = (−k²) e^{ikx} .
That is, they are functions whose derivative is a scalar (the eigenvalue) times the function itself. Just as eigenvalues and eigenvectors are important in studying matrices, eigenvalues and eigenfunctions are important when studying linear differential equations.
The functions cos(nx), sin(nx), and e^{inx} are 2π-periodic, and hence trigonometric polynomials are also 2π-periodic. We could rescale x to make the period different, but the theory is the same, so let us stick with the period 2π. For n ≠ 0 the antiderivative of e^{inx} is e^{inx}/(in), and so
∫_{−π}^π e^{inx} dx = 2π if n = 0, and 0 otherwise.
∗Named after the French mathematician Jean-Baptiste Joseph Fourier (1768–1830).
†
Eigenfunction is like an eigenvector for a matrix, but for a linear operator on a vector space of functions.
Consider
f(x) := ∑_{n=−N}^N c_n e^{inx} ,
and for m = −N, …, N compute
(1/2π) ∫_{−π}^π f(x) e^{−imx} dx = (1/2π) ∫_{−π}^π ( ∑_{n=−N}^N c_n e^{i(n−m)x} ) dx = ∑_{n=−N}^N c_n (1/2π) ∫_{−π}^π e^{i(n−m)x} dx = c_m .
We just found a way of computing the coefficients c_m using an integral of f. If |m| > N, the integral is just 0: We might as well have included enough zero coefficients to make |m| ≤ N.
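This orthogonality computation is easy to replicate numerically. A sketch (assuming NumPy; the particular coefficients are an arbitrary test case) that recovers c_m from samples of f by a Riemann sum:

```python
import numpy as np

coeffs = {-1: 2.0j, 0: 1.0, 2: -0.5}   # a trigonometric polynomial with known c_n

xs = np.linspace(-np.pi, np.pi, 20000, endpoint=False)
dx = xs[1] - xs[0]
f = sum(c * np.exp(1j * n * xs) for n, c in coeffs.items())

for m in range(-3, 4):
    cm = (f * np.exp(-1j * m * xs)).sum() * dx / (2 * np.pi)
    print(m, np.round(cm, 8))           # recovers the coefficients above, 0 otherwise
```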
Proposition 11.8.1. A trigonometric polynomial f(x) = ∑_{n=−N}^N c_n e^{inx} is real-valued for real x if and only if c_{−m} = c̄_m for all m = 0, 1, …, N.
Proof. Suppose first that f is real-valued, so f̄ = f. Then
c̄_m = (1/2π) ∫_{−π}^π f̄(x) e^{imx} dx = (1/2π) ∫_{−π}^π f(x) e^{imx} dx = c_{−m} .
The complex conjugate goes inside the integral because the integral is computed on the real and imaginary parts separately. On the other hand, if c_{−m} = c̄_m, then
c_m e^{imx} + c_{−m} e^{−imx} = c_m e^{imx} + c̄_m e^{−imx} = 2 Re( c_m e^{imx} ) ,
which is real-valued. Also c̄₀ = c₀, so c₀ is real. So by pairing up the terms, we obtain that f has to be real-valued.
The functions e^{inx} are also linearly independent.
Proposition 11.8.2. If
∑_{n=−N}^N c_n e^{inx} = 0
for all x ∈ R, then c_n = 0 for all n.
When functions do have a Fourier series, where does it converge, if at all? Does it converge absolutely? Uniformly? Also note that the series has two limits. When talking about Fourier series convergence, we often talk about the following limit:
lim_{N→∞} ∑_{n=−N}^N c_n e^{inx} .
There are other ways we can sum the series that can get convergence in more situations, but we
refrain from discussing those.
Conversely, we start with any integrable function f : [−π, π] → C, and we call the numbers
c_n := (1/2π) ∫_{−π}^π f(x) e^{−inx} dx
its Fourier coefficients. Often these numbers are written as f̂(n).∗ We then formally write down a Fourier series. As you might imagine, such a series might not even converge. We write
f(x) ∼ ∑_{n=−∞}^∞ c_n e^{inx} ,
although the ∼ doesn't imply anything about the two sides being equal in any way; it simply says that we created a formal series using the formula for the coefficients.
A few sections ago we proved that the Fourier series
∑_{n=1}^∞ sin(nx)/n²
converges uniformly, and hence converges to a continuous function. This example and its proof can be extended to a more general criterion.
Proposition 11.8.3. Let ∑_{n=−∞}^∞ c_n e^{inx} be a Fourier series, and let C > 0 and α > 1 be constants such that
|c_n| ≤ C/|n|^α for all n ∈ Z \ {0}.
Then the series converges (absolutely and uniformly) to a continuous function on R.
The proof is to apply the Weierstrass M-test together with the p-series test to find that the series converges uniformly, and hence to a continuous function. We can also take derivatives.
Proposition 11.8.4. Let ∑_{n=−∞}^∞ c_n e^{inx} be a Fourier series, and let C > 0 and α > 2 be constants such that
|c_n| ≤ C/|n|^α for all n ∈ Z \ {0}.
Then the series converges to a continuously differentiable function on R.
∗
The notation seems similar to Fourier transform for those readers that have seen it. The similarity is not just
coincidental, we are taking a type of Fourier transform here.
The trick is to notice that the series converges to a continuous function by the previous proposition, so in particular it converges at some point. Then differentiate the partial sums,
∑_{n=−N}^N (in) c_n e^{inx} ,
note that the differentiated series satisfies the hypothesis of the previous proposition with exponent α − 1 > 1, so it converges uniformly, and apply the result on swapping limits and derivatives.
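A small numerical sanity check of termwise differentiation (assuming NumPy; the series ∑ sin(nx)/n³ has α = 3 > 2, and the truncation and step size are arbitrary):

```python
import numpy as np

n = np.arange(1, 2001)

def f(x):
    return (np.sin(x * n) / n ** 3).sum()   # f(x) = sum sin(nx)/n^3 (truncated)

def fp(x):
    return (np.cos(x * n) / n ** 2).sum()   # termwise derivative sum cos(nx)/n^2

x, h = 0.5, 1e-5
print((f(x + h) - f(x - h)) / (2 * h))      # centered difference approximation
print(fp(x))                                # termwise derivative; they agree closely
```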
Remark 11.8.5. Notice the similarity to finite dimensions. For z = (z₁, z₂, …, z_n) ∈ C^n, we define
⟨z, w⟩ := ∑_{k=1}^n z_k w̄_k ,
and then the norm is (usually denoted simply by ‖z‖ rather than ‖z‖₂)
‖z‖² = ⟨z, z⟩ = ∑_{k=1}^n |z_k|² .
Let us get back to function spaces. In what follows, we will assume all functions are Riemann
integrable.
Definition 11.8.6. Let {ϕ_n} be a sequence of integrable complex-valued functions on [a, b]. We say that this is an orthonormal system if
⟨ϕ_n, ϕ_m⟩ = ∫_a^b ϕ_n(x) ϕ̄_m(x) dx = 1 if n = m, and 0 otherwise.
In particular, ‖ϕ_n‖₂ = 1 for all n. If we only require that ⟨ϕ_n, ϕ_m⟩ = 0 for m ≠ n, then the system is an orthogonal system.
Given an orthonormal system {ϕ_n} and an integrable function f, define c_n := ⟨f, ϕ_n⟩ and write
f(x) ∼ ∑_{n=1}^∞ c_n ϕ_n(x) .
In other words, the series is
∑_{n=1}^∞ ⟨f, ϕ_n⟩ ϕ_n(x) .
Notice the similarity to the expression for the orthogonal projection of a vector onto a subspace
from linear algebra. We are in fact doing just that, but in a space of functions.
Theorem 11.8.7. Suppose f is a Riemann integrable function on [a, b]. Let {ϕ_n} be an orthonormal system on [a, b] and suppose
f(x) ∼ ∑_{n=1}^∞ c_n ϕ_n(x) .
If
s_n(x) := ∑_{k=1}^n c_k ϕ_k(x)  and  p_n(x) := ∑_{k=1}^n d_k ϕ_k(x)
for some numbers d_k, then
‖f − s_n‖₂ ≤ ‖f − p_n‖₂ .
In other words, the partial sums of the Fourier series are the best approximation with respect to the L² norm.
Proof. Let us write
∫_a^b |f − p_n|² = ∫_a^b |f|² − ∫_a^b f p̄_n − ∫_a^b f̄ p_n + ∫_a^b |p_n|² .
Now
∫_a^b f p̄_n = ∫_a^b f ∑_{k=1}^n d̄_k ϕ̄_k = ∑_{k=1}^n d̄_k ∫_a^b f ϕ̄_k = ∑_{k=1}^n d̄_k c_k ,
and
∫_a^b |p_n|² = ∫_a^b ( ∑_{k=1}^n d_k ϕ_k ) ( ∑_{j=1}^n d̄_j ϕ̄_j ) = ∑_{j=1}^n ∑_{k=1}^n d_k d̄_j ∫_a^b ϕ_k ϕ̄_j = ∑_{k=1}^n |d_k|² .
So
∫_a^b |f − p_n|² = ∫_a^b |f|² − ∑_{k=1}^n d̄_k c_k − ∑_{k=1}^n d_k c̄_k + ∑_{k=1}^n |d_k|² = ∫_a^b |f|² − ∑_{k=1}^n |c_k|² + ∑_{k=1}^n |d_k − c_k|² .
This is minimized precisely when d_k = c_k for all k, which proves the theorem.
Taking d_k = c_k, the identity above reads
∫_a^b |f − s_n|² = ∫_a^b |f|² − ∑_{k=1}^n |c_k|² ≥ 0 ,
and so
∑_{k=1}^n |c_k|² ≤ ∫_a^b |f|²
for every n. Then
∑_{k=1}^∞ |c_k|² ≤ ∫_a^b |f|² = ‖f‖₂² .
In particular (given that a Riemann integrable function satisfies ∫_a^b |f|² < ∞), we get that the series converges, and hence
lim_{k→∞} c_k = 0 .
∗Named after the German astronomer, mathematician, physicist, and geodesist Friedrich Wilhelm Bessel (1784–1846).
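The conclusion that the coefficients tend to zero can be watched numerically. A sketch (assuming NumPy; the step function and the sampled n are arbitrary):

```python
import numpy as np

xs = np.linspace(-np.pi, np.pi, 20000, endpoint=False)
dx = xs[1] - xs[0]
f = np.sign(xs)                    # a bounded Riemann integrable function

for m in [1, 5, 25, 125, 625]:
    cm = (f * np.exp(-1j * m * xs)).sum() * dx / (2 * np.pi)
    print(m, abs(cm))              # |c_m| decays (here like 1/m), so c_m -> 0
```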
If we include the factor 1/(2π) in the inner product (we are just rescaling the dx, really), then everything works, and we obtain that the system {e^{inx}} is orthonormal with respect to the inner product
⟨f, g⟩ = (1/2π) ∫_{−π}^π f(x) ḡ(x) dx .
We claim that
D_N(x) = ∑_{n=−N}^N e^{inx} = sin((N + 1/2)x) / sin(x/2) ,
at least for x such that sin(x/2) ≠ 0. We know that the left-hand side is continuous, and hence the right-hand side extends continuously to all of R as well. To show the claim we use a familiar trick: multiply by (e^{ix/2} − e^{−ix/2}) and telescope the sum to get
(e^{ix/2} − e^{−ix/2}) D_N(x) = e^{i(N+1/2)x} − e^{−i(N+1/2)x} .
As e^{iθ} − e^{−iθ} = 2i sin(θ), dividing both sides by 2i sin(x/2)/sin(x/2)-free form, that is by e^{ix/2} − e^{−ix/2} = 2i sin(x/2), proves the claim.
For the partial sums of the Fourier series of f we compute
s_N(f; x) = ∑_{n=−N}^N ( (1/2π) ∫_{−π}^π f(t) e^{−int} dt ) e^{inx} = (1/2π) ∫_{−π}^π f(t) ∑_{n=−N}^N e^{in(x−t)} dt = (1/2π) ∫_{−π}^π f(t) D_N(x − t) dt .
Convolution strikes again! As D_N and f are 2π-periodic, we may also change variables and write
s_N(f; x) = (1/2π) ∫_{x−π}^{x+π} f(x − t) D_N(t) dt = (1/2π) ∫_{−π}^π f(x − t) D_N(t) dt .
Figure 11.10: Plot of D_N(x) for N = 5 (gray) and N = 20 (black).
The central peak gets taller and taller as N gets larger, and the side peaks stay small (but oscillate
wildly). We are convolving (again) with approximate delta functions, although these have all these
oscillations away from zero, which do not go away. So we expect that sN ( f ) goes to f . Things are
not always so simple, but under some conditions on f , such a conclusion holds. For this reason
people write
δ(x) ∼ ∑_{n=−∞}^∞ e^{inx} ,
although we have not really defined the delta function (and it is not a function), nor a Fourier series of whatever kind of object it is.
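The closed form for D_N is simple to verify numerically. A sketch (assuming NumPy; the value N = 20 and the sample interval, chosen away from x = 0, are arbitrary):

```python
import numpy as np

N = 20
xs = np.linspace(0.05, 3.1, 300)   # stay away from x = 0, where the formula is a limit
ns = np.arange(-N, N + 1)
lhs = np.exp(1j * np.outer(xs, ns)).sum(axis=1).real
rhs = np.sin((N + 0.5) * xs) / np.sin(xs / 2)
print(np.max(np.abs(lhs - rhs)))   # on the order of 1e-12: the identity holds
```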
11.8.5 Localization
If f satisfies a Lipschitz condition at a point, then the Fourier series converges at that point.
Theorem 11.8.9. Let x be fixed and let f be a 2π-periodic function Riemann integrable on [−π, π]. Suppose there exist δ > 0 and M such that
|f(x + t) − f(x)| ≤ M|t| for all t ∈ (−δ, δ).
Then
lim_{N→∞} s_N(f; x) = f(x) .
Proof. Write
s_N(f; x) − f(x) = (1/2π) ∫_{−π}^π f(x − t) D_N(t) dt − f(x) (1/2π) ∫_{−π}^π D_N(t) dt
 = (1/2π) ∫_{−π}^π ( f(x − t) − f(x) ) D_N(t) dt
 = (1/2π) ∫_{−π}^π ( ( f(x − t) − f(x) ) / sin(t/2) ) sin((N + 1/2)t) dt .
By the hypotheses, for small nonzero t we get
| ( f(x − t) − f(x) ) / sin(t/2) | ≤ M|t| / |sin(t/2)| .
As sin(t) = t + h(t), where h(t)/t → 0 as t → 0, the function M|t|/|sin(t/2)| is continuous at the origin, and hence ( f(x − t) − f(x) ) / sin(t/2) must be bounded near the origin. As t = 0 is the only place on [−π, π] where the denominator vanishes, it is the only place where there could be a problem. The function is also Riemann integrable. We use the trigonometric identity
sin((N + 1/2)t) = cos(t/2) sin(Nt) + sin(t/2) cos(Nt) ,
so
(1/2π) ∫_{−π}^π ( ( f(x − t) − f(x) ) / sin(t/2) ) sin((N + 1/2)t) dt
 = (1/2π) ∫_{−π}^π ( ( f(x − t) − f(x) ) / sin(t/2) ) cos(t/2) sin(Nt) dt + (1/2π) ∫_{−π}^π ( f(x − t) − f(x) ) cos(Nt) dt .
Now ( ( f(x − t) − f(x) ) / sin(t/2) ) cos(t/2) and f(x − t) − f(x) are bounded Riemann integrable functions, and so their Fourier coefficients go to zero (a consequence of Bessel's inequality above). So the two integrals on the right-hand side, which compute the Fourier coefficients for the real version of the Fourier series, go to 0 as N goes to infinity; this is because sin(Nt) and cos(Nt) also form orthonormal systems with respect to the same inner product. Hence s_N(f; x) − f(x) goes to 0, that is, s_N(f; x) goes to f(x).
The theorem also says that convergence depends only on local behavior.
Corollary 11.8.11. Suppose f is a 2π -periodic function, Riemann integrable on [−π , π ]. If J is an
open interval and f (x) = 0 for all x ∈ J, then lim sN ( f ; x) = 0 for all x ∈ J.
In particular, if f and g are 2π -periodic functions, Riemann integrable on [−π , π ], J an open
interval, and f (x) = g(x) for all x ∈ J, then for all x ∈ J, the sequence {sN ( f ; x)} converges if and
only if {sN (g; x)} converges.
That is, convergence at x depends only on the values of the function near x. To prove the first claim, take M = 0 in the theorem. The “In particular” follows by considering the function f − g, which is zero on J, and noting that sN(f − g) = sN(f) − sN(g). On the other hand, we have seen that the rate of convergence, that is, how fast sN(f) converges to f, depends on the global behavior of the function.
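The localization principle can be observed numerically: change a function away from a point, and the partial sums at that point still converge to the same value. A minimal sketch along the lines of the earlier computation (the specific f, g, and interval are our own choices):

```python
import numpy as np

def partial_sum(f, x, N, m=8192):
    # s_N(f; x) via convolution with the Dirichlet kernel, trapezoid rule
    t = np.linspace(-np.pi, np.pi, m + 1)
    DN = 1 + 2 * np.cos(np.outer(t, np.arange(1, N + 1))).sum(axis=1)
    return np.trapz(f(x - t) * DN, t) / (2 * np.pi)

f = lambda u: np.cos(u)
def g(u):
    # g agrees with f on the interval (-1, 1) (mod 2*pi) but jumps to 5 elsewhere
    u0 = ((u + np.pi) % (2 * np.pi)) - np.pi   # reduce to (-pi, pi]
    return np.where(np.abs(u0) < 1, np.cos(u), 5.0)

for N in (10, 40, 160):
    print(N, partial_sum(f, 0.0, N), partial_sum(g, 0.0, N))
# Both columns tend to cos(0) = 1, although f and g differ away from 0.
```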
There is a subtle difference between the corollary and what can be achieved via the Stone–Weierstrass theorem. Any continuous function on [−π, π] can be uniformly approximated by trigonometric polynomials, but these trigonometric polynomials need not be the partial sums sN.
11.8.6 Parseval's theorem
Nevertheless, the partial sums always converge in the L² norm. Let f and g be 2π-periodic functions Riemann integrable on [−π, π], with Fourier coefficients {cn} and {dn} respectively. Then
\[
\lim_{N\to\infty} \| f - s_N(f) \|_2 = 0 , \qquad
\langle f, g \rangle = \sum_{n=-\infty}^{\infty} c_n \overline{d_n} , \qquad
\| f \|_2^2 = \sum_{n=-\infty}^{\infty} |c_n|^2 .
\]
To prove the first claim, given ε > 0, find a continuous 2π-periodic h such that
\[
\| f - h \|_2 < \varepsilon
\]
(see Exercise 11.8.4). By the Stone–Weierstrass theorem there is a trigonometric polynomial P, say of degree N₀, with $\| h - P \|_2 \le \varepsilon$ (uniform approximation certainly suffices). For every N ≥ N₀, the partial sum sN(h) is the best approximation to h in the L² norm among trigonometric polynomials of degree at most N, so
\[
\| h - s_N(h) \|_2 \le \| h - P \|_2 \le \varepsilon .
\]
This best approximation property is left as an exercise. The proof is not really that different from the finite dimensional version. As sN(f) is likewise the best such approximation to f,
\[
\| f - s_N(f) \|_2 \le \| f - s_N(h) \|_2 \le \| f - h \|_2 + \| h - s_N(h) \|_2 \le 2\varepsilon
\]
for all N ≥ N₀, which proves the first claim.
So
\[
\begin{aligned}
\left| \int_{-\pi}^{\pi} f \bar{g} - \int_{-\pi}^{\pi} s_N(f)\, \bar{g} \right|
&= \left| \int_{-\pi}^{\pi} \bigl( f - s_N(f) \bigr) \bar{g} \right|
\le \int_{-\pi}^{\pi} \bigl| f - s_N(f) \bigr|\, |g| \\
&\le \left( \int_{-\pi}^{\pi} \bigl| f - s_N(f) \bigr|^2 \right)^{1/2} \left( \int_{-\pi}^{\pi} |g|^2 \right)^{1/2} ,
\end{aligned}
\]
using the Cauchy–Bunyakovsky–Schwarz inequality (Exercise 11.8.5) in the last step.
The right-hand side goes to 0 as N goes to infinity by the first claim. That is, ⟨sN(f), g⟩, which equals the partial sum $\sum_{n=-N}^{N} c_n \overline{d_n}$, goes to ⟨f, g⟩ as N goes to infinity, and the second claim is proved. The last claim follows by taking g = f.
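Parseval's identity is easy to check numerically. In the following sketch (our own illustration) we take the sawtooth f(x) = x on (−π, π), a close cousin of the function in Exercise 11.8.8 below; the left-hand side $\frac{1}{2\pi}\int |f|^2$ equals π²/3 here, and the coefficient sum creeps up to it:

```python
import numpy as np

t = np.linspace(-np.pi, np.pi, 20001)
f = t                                    # the sawtooth f(x) = x on (-pi, pi)

def c(n):
    # c_n = (1/2pi) * integral of f(t) e^{-i n t} dt, by the trapezoid rule
    return np.trapz(f * np.exp(-1j * n * t), t) / (2 * np.pi)

lhs = np.trapz(f**2, t) / (2 * np.pi)    # equals pi^2 / 3 exactly
rhs = sum(abs(c(n))**2 for n in range(-300, 301))
print(lhs, rhs)                          # rhs approaches lhs as more terms are added
```

For this f one finds |cn|² = 1/n² for n ≠ 0, so Parseval gives 2 ∑ 1/n² = π²/3, in agreement with the sum computed in Exercise 11.8.8.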
11.8.7 Exercises
Exercise 11.8.1: Take the Fourier series
\[
\sum_{n=1}^{\infty} \frac{1}{2^n} \sin( 2^n x ) .
\]
Show that the series converges uniformly and absolutely to a continuous function. Note: This is another example of a nowhere differentiable function∗ (you do not have to prove that).
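A few lines of Python (our own sketch, not part of the exercise) draw a partial sum of this series; even a truncated sum looks jagged at every scale:

```python
import numpy as np
import matplotlib.pyplot as plt

# Partial sum of sum_{n>=1} 2^{-n} sin(2^n x); the tail beyond n = 14 has
# total amplitude below 1e-4 and is invisible at this resolution
x = np.linspace(-4, 4, 100000)
y = sum(2.0**-n * np.sin(2.0**n * x) for n in range(1, 15))
plt.plot(x, y, linewidth=0.4)
plt.show()
```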
Exercise 11.8.2: Suppose f is a 2π-periodic function, Riemann integrable on [−π, π], and continuously differentiable on some open interval (a, b). Prove that for every x ∈ (a, b), we have lim sN(f; x) = f(x).
Exercise 11.8.3: Prove the following consequence of Theorem 11.8.9: if a 2π-periodic function f is continuous and piecewise smooth near a point x, then lim sN(f; x) = f(x). Hint: See the previous exercise.
Exercise 11.8.4: Given a 2π-periodic function f : ℝ → ℂ Riemann integrable on [−π, π], and ε > 0. Show that there exists a continuous 2π-periodic function g : ℝ → ℂ such that $\| f - g \|_2 < \varepsilon$.
Exercise 11.8.5: Prove the Cauchy–Bunyakovsky–Schwarz inequality for Riemann integrable functions:
\[
\left| \int_a^b f \bar{g} \right|^2 \le \left( \int_a^b |f|^2 \right) \left( \int_a^b |g|^2 \right) .
\]
Exercise 11.8.6: Prove the L² triangle inequality for Riemann integrable functions on [−π, π]:
\[
\| f + g \|_2 \le \| f \|_2 + \| g \|_2 .
\]
∗See G. H. Hardy, Weierstrass's Non-Differentiable Function, Transactions of the American Mathematical Society, 17, No. 3 (Jul., 1916), pp. 301–325.
Exercise 11.8.7: Suppose for some C and α > 1, we have a real sequence {an} with $|a_n| \le \frac{C}{n^\alpha}$ for all n. Let
\[
g(x) := \sum_{n=1}^{\infty} a_n \sin(nx) .
\]
a) Show that g is continuous.
b) Formally, term by term, find a solution of the differential equation
\[
y'' + 2y = g(x)
\]
of the form
\[
y(x) = \sum_{n=1}^{\infty} b_n \sin(nx) ,
\]
that is, solve for the coefficients bn.
c) Then show that this solution y is twice continuously differentiable, and in fact solves the equation.
Exercise 11.8.8: Let f be a 2π-periodic function such that f(x) = x for 0 < x < 2π. Use Parseval's theorem to find
\[
\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} .
\]
Exercise 11.8.9: Suppose that cn = 0 for all n < 0 and that $\sum_{n=0}^{\infty} |c_n|$ converges. Let D = B(0, 1) ⊂ ℂ be the unit disc, and $\overline{D} = C(0, 1)$ be the closed unit disc. Show that there exists a continuous function $f \colon \overline{D} \to \mathbb{C}$ which is analytic on D and such that on the boundary of D we have $f(e^{i\theta}) = \sum_{n=0}^{\infty} c_n e^{in\theta}$.
Hint: If $z = re^{i\theta}$, then $z^n = r^n e^{in\theta}$.
Show that there exists a C > 0 such that $|c_n| \le \frac{C}{|n|}$.
Exercise 11.8.12:
a) Let ϕ be the 2π-periodic function defined by ϕ(x) := 0 if x ∈ (−π, 0), and ϕ(x) := 1 if x ∈ (0, π), letting
ϕ(0) and ϕ(π) be arbitrary. Show that lim sN (ϕ; 0) = 1/2.
b) Let f be a 2π-periodic function Riemann integrable on [−π, π], x ∈ ℝ, and δ > 0. Suppose there are continuously differentiable g : [x − δ, x] → ℂ and h : [x, x + δ] → ℂ such that f(t) = g(t) for all t ∈ [x − δ, x) and f(t) = h(t) for all t ∈ (x, x + δ]. Prove that
\[
\lim_{N\to\infty} s_N(f;x) = \frac{g(x) + h(x)}{2} ,
\]
or in other words,
\[
\lim_{N\to\infty} s_N(f;x) = \frac{1}{2} \left( \lim_{t \to x^-} f(t) + \lim_{t \to x^+} f(t) \right) .
\]
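For a quick numerical look at part a), one computes the coefficients of ϕ directly: c₀ = 1/2 and cn = (1 − (−1)ⁿ)/(2πin) for n ≠ 0. A minimal sketch (our own) then evaluates the partial sums at and near the jump:

```python
import numpy as np

def sN(x, N):
    # s_N(phi; x) from the coefficients c_0 = 1/2, c_n = (1-(-1)^n)/(2*pi*i*n)
    total = 0.5
    for n in range(1, N + 1):
        cn = (1 - (-1)**n) / (2j * np.pi * n)
        total += 2 * (cn * np.exp(1j * n * x)).real   # c_n e^{inx} + c_{-n} e^{-inx}
    return total

for N in (10, 100, 1000):
    print(N, sN(0.0, N), sN(0.1, N))
# At the jump, sN(0, N) equals 1/2 exactly (each c_n here is purely imaginary,
# so the n != 0 terms cancel at x = 0); just to the right, sN(0.1, N) tends to 1.
```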
Further Reading
[R1] Maxwell Rosenlicht, Introduction to analysis, Dover Publications Inc., New York, 1986.
Reprint of the 1968 edition.
[R2] Walter Rudin, Principles of mathematical analysis, 3rd ed., McGraw-Hill Book Co., New
York, 1976. International Series in Pure and Applied Mathematics.
[T] William F. Trench, Introduction to real analysis, Pearson Education, 2003.