
Advanced Calculus For Data Science

Mike Carr
Contents

Introduction

1 Review of Algebra and Calculus
1.1 Graphs of Functions
1.2 Limits and Derivatives
1.3 Applications of Derivatives
1.4 Definite Integrals

2 Advanced Integration and Applications
2.1 Area Between Curves
2.2 Volumes
2.3 Integration by Parts
2.4 Approximate Integration
2.5 Improper Integrals
2.6 Probability
2.7 Functions of Random Variables

3 Series
3.1 Taylor Polynomials
3.2 Sequences
3.3 Series
3.4 Power Series
3.5 Taylor Series

4 Multivariable Functions
4.1 Three-Dimensional Coordinate Systems
4.2 Functions of Several Variables
4.3 Limits and Continuity
4.4 Partial Derivatives
4.5 Linear Approximations

5 Vectors in Calculus
5.1 Vectors
5.2 The Dot Product
5.3 Normal Equations of Planes
5.4 The Gradient Vector
5.5 The Chain Rule
5.6 Maximum and Minimum Values
5.7 Lagrange Multipliers

6 Multivariable Integration
6.1 Double Integrals
6.2 Double Integrals over General Regions
6.3 Joint Probability Distributions
6.4 Triple Integrals

Introduction
So far in calculus you have developed the tools to answer the following questions about a function of one variable:

1 How quickly does the value of the function change as the input changes?
2 How do we estimate the value of the function near a point?
3 What are the maximum and minimum values of the function?
4 What is the area under the graph of the function? What does it mean?

These are all useful tools, but they don't necessarily apply to the types of data that we encounter in the real world.
Data generally takes the form of a set of observations, rather than an algebraic function. How do we perform calculus with such a set? We cannot integrate it by finding an antiderivative. In some cases, the best functions to model our data are difficult to work with. We take for granted that sin x is a useful function, but how do we even evaluate a quantity like sin(7.52)? In all these circumstances, the best we can do is approximate. We will develop methods to approximate integrals and to approximate functions.

Figure: Approximations of an integral and of a function

Many measurable quantities depend on the values of multiple inputs. These are multivariable functions like $z = F(x, y)$, where $z$ is a function of two independent variables. Examples appear in all the sciences:

1 Chemistry: $V = \dfrac{nRT}{P}$

2 Physics: $F = \dfrac{GMm}{r^2}$

3 Economics: $P = P_0 e^{rt}$

Figure: The graph of a two-variable function

We want to understand how to measure rates of change of these functions, and what these measurements can do for us.
Furthermore, real-world data does not come prepackaged with a differentiable function to describe it. One approach is to find a line of best fit. Doing so requires optimizing two variables at once (slope and intercept).

Figure: Fitting a line to a set of data points

The values of y may not be a function of x at all. Another viewpoint is to see (x, y) as a randomly chosen point in the plane. To model such random choices, we use a two-variable density function. Volumes under its graph (computed by integrals) tell us where these random points are likely to lie.

Figure: A function that models the outcomes of a random process

These approaches will require us to use derivatives and integrals of multivariable functions.

Chapter 1

Review of Algebra and Calculus

This chapter reviews the most important information about functions, limits, derivatives, and integrals.
It is not meant to teach this material to a first-time learner, but can serve as a reference or reminder.

Contents
1.1 Graphs of Functions
1.2 Limits and Derivatives
1.3 Applications of Derivatives
1.4 Definite Integrals
Section 1.1

Graphs of Functions
Goals:

1 Graph algebraic and trigonometric functions.
2 Solve equations using inverse functions.
3 Solve equations containing quotients.
4 Graph transformations of functions.

Definition

The graph of an equation is the set of ordered pairs (x, y) that satisfy the equation. These are the points whose coordinates, when plugged in for x and y, make the two sides of the equation equal.

Linear Functions

Linear functions can be written in slope-intercept form:

f (x) = mx + b.

The graph y = mx + b of a linear function is a line.


m is the slope, which is the change in y over the change in x between any two points on the line.

(0, b) is the y-intercept.

If we know the slope m and a point $(x_0, y_0)$ on a line, we can write its equation in point-slope form:

$$y - y_0 = m(x - x_0)$$

If we have both the x- and y-intercepts of the line, it is convenient to write it in normal form:

$$ax + by + c = 0$$

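The slope and point-slope formulas can be checked with a few lines of Python. This is a minimal sketch; the helper names `slope` and `point_slope_to_intercept` are made up for illustration. It computes m from two points, then rewrites $y - y_0 = m(x - x_0)$ as $y = mx + b$ using $b = y_0 - m x_0$.

```python
def slope(p, q):
    """Slope m = (change in y) / (change in x) between points p and q."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("vertical line: the slope is undefined")
    return (y2 - y1) / (x2 - x1)


def point_slope_to_intercept(m, x0, y0):
    """Rewrite y - y0 = m(x - x0) in slope-intercept form y = mx + b; return (m, b)."""
    return m, y0 - m * x0


m = slope((1, 3), (3, 7))                  # the line through (1, 3) and (3, 7)
print(point_slope_to_intercept(m, 1, 3))   # (2.0, 1.0), i.e. y = 2x + 1
```

Any point on the line gives the same b, since every point satisfies $y_0 = m x_0 + b$.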
Monomials

A monomial is a function of the form

$$f(x) = x^n$$

where n is an integer greater than 0.

For $n \ge 2$ the graph $y = x^n$ curves upward over the positive values of x.

Higher values of n give smaller values of $x^n$ when $0 < x < 1$ but larger values when $x > 1$.
For even values of n the graph is symmetric across the y-axis, curving up when x is negative.
For odd values of n the graph curves down when x is negative. It is symmetric about the origin: $(-x)^n = -x^n$.

Figure: Graphs of monomials of odd and even powers

Monomials of Negative Power

Monomials of negative power have the form $f(x) = x^{-n}$. They are also commonly written

$$f(x) = \frac{1}{x^n}.$$

The graph $y = \frac{1}{x^n}$ has a vertical asymptote at $x = 0$.
The graph approaches the x-axis, $y = 0$, as $|x|$ gets large.

For even values of n, the graph is above the x-axis.

For odd values of n, the graph is above the x-axis for positive x and below it for negative x.
A larger choice of n makes the function approach the x-axis more quickly.


Figure: Graphs of monomials of negative odd and even powers

Roots

A root function is a function of the form

$$f(x) = \sqrt[n]{x}$$

where n is an integer greater than 0.

The domain of $\sqrt[n]{x}$ is $[0, \infty)$ if n is even and all real numbers if n is odd.

The x- and y-intercept of $y = \sqrt[n]{x}$ is at (0, 0).
Root functions are increasing. At $x = 0$ (for $n \ge 2$), their graphs have a vertical tangent line.

Figure: The graphs of $y = \sqrt{x}$ and $y = \sqrt[3]{x}$

Exponential Functions

An exponential function has the form

$$f(x) = a^x$$

where a is a number greater than 0.

a is called the base of the exponential function.

The graph $y = a^x$ passes through (0, 1).
If $a > 1$ then f(x) increases quickly as x takes on positive values. Higher values of a give a steeper increase. f(x) approaches 0 as x goes to $-\infty$. Higher values of a give a faster approach. The graph does not touch or cross the x-axis.

If $0 < a < 1$, then the above is reversed: f(x) approaches 0 as x goes to $\infty$ and grows without bound as x goes to $-\infty$.


e is a commonly used base. e is approximately 2.718.

Figure: The graphs of exponential functions

Logarithms

A logarithmic function has the form

$$f(x) = \log_a x$$

where a is a number greater than 1. $\log_a x$ is the number b such that $a^b = x$.

Since $a^b$ can never be 0 or less, the domain of $f(x) = \log_a x$ is $(0, \infty)$.

As x goes to 0 from the right, $\log_a x$ goes to $-\infty$.
$y = \log_a x$ has an x-intercept at (1, 0).


Figure: The graphs of logarithm functions

Logarithms and exponents are inverse functions. We solve exponential equations by applying a
logarithm to both sides. We solve logarithm equations by exponentiating both sides.

$$a^x = c \iff x = \log_a c$$
$$\log_a x = c \iff x = a^c$$
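Python's standard `math` module can apply these two rules directly: `math.log(c, a)` computes $\log_a c$, and `a ** c` computes $a^c$. A small sketch (the function names are hypothetical):

```python
import math


def solve_exponential(a, c):
    """Solve a**x = c for x, i.e. x = log_a(c). Needs a > 0, a != 1, and c > 0."""
    return math.log(c, a)


def solve_logarithm(a, c):
    """Solve log_a(x) = c for x, i.e. x = a**c."""
    return a ** c


x = solve_exponential(2, 10)    # 2^x = 10
print(x, 2 ** x)                # x is about 3.3219, and 2**x is 10 up to rounding
print(solve_logarithm(10, 3))   # log_10(x) = 3 gives x = 1000
```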

Trigonometric Functions

Consider the functions $f(x) = \sin x$ and $f(x) = \cos x$.

$\sin x$ and $\cos x$ have a range of $[-1, 1]$.

These functions are periodic with period $2\pi$. This means that for all x, $f(x + 2\pi) = f(x)$.

Figure: The graphs of y = sin x and y = cos x


The other trigonometric functions can be written in terms of sine and cosine.

$$\tan x = \frac{\sin x}{\cos x}$$
$$\cot x = \frac{\cos x}{\sin x}$$
$$\sec x = \frac{1}{\cos x}$$
$$\csc x = \frac{1}{\sin x}$$

Since trigonometric functions obtain the same values infinitely many times, they do not technically have inverse functions. However, we define inverse trigonometric functions by restricting their outputs to a specific range:

$$-\frac{\pi}{2} \le \sin^{-1} x \le \frac{\pi}{2}$$
$$0 \le \cos^{-1} x \le \pi$$
$$-\frac{\pi}{2} < \tan^{-1} x < \frac{\pi}{2}$$

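These functions provide one solution to a trigonometric equation. We can obtain the others by using the periodic behavior of trigonometric functions. For sine and cosine we need $-1 \le c \le 1$; otherwise there are no solutions.

$$\sin x = c \iff x = \sin^{-1} c + 2\pi n \ \text{ or } \ x = \pi - \sin^{-1} c + 2\pi n$$
$$\cos x = c \iff x = \cos^{-1} c + 2\pi n \ \text{ or } \ x = -\cos^{-1} c + 2\pi n$$
$$\tan x = c \iff x = \tan^{-1} c + \pi n$$

where n can be any integer.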
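The general-solution formula for $\sin x = c$ can be turned into a short search for every solution in a given interval. This is a sketch only: `solve_sin` is a hypothetical helper, and it relies on Python's `math.asin`, which returns values in $[-\pi/2, \pi/2]$, matching the restricted range of $\sin^{-1}$ above.

```python
import math


def solve_sin(c, lo, hi):
    """All x in [lo, hi) with sin x = c, from the two families
    x = asin(c) + 2*pi*n  and  x = pi - asin(c) + 2*pi*n."""
    if not -1 <= c <= 1:
        return []                        # sin x never leaves [-1, 1]
    solutions = set()
    for base in (math.asin(c), math.pi - math.asin(c)):
        n = math.ceil((lo - base) / (2 * math.pi))   # smallest n with base + 2*pi*n >= lo
        x = base + 2 * math.pi * n
        while x < hi:
            # rounding merges the two families when they coincide (c = 1 or c = -1)
            solutions.add(round(x, 12))
            n += 1
            x = base + 2 * math.pi * n
    return sorted(solutions)


print(solve_sin(0.5, 0, 2 * math.pi))   # about [0.5236, 2.6180], i.e. pi/6 and 5pi/6
```

The same pattern works for cosine (bases $\pm\cos^{-1} c$) and tangent (one base, step $\pi$).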

Question 1.1.1
How Do Transformations Affect the Graph of a Function?

Transformations

Suppose we would like to transform the graph y = f (x). Here are four ways we can.

The graph of y = af (x) is stretched by a factor of a in the y direction.


The graph of y = f (x) + b is shifted by b in the positive y direction.
The graph of y = f (cx) is compressed by a factor of c in the x direction.
The graph of y = f (x + d) is shifted by d in the negative x direction.

Figure: The graphs of $y = f(x)$ and its transformations

Example 1.1.2
An Equation with Quotients

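An equation of the form $\dfrac{f(x)}{g(x)} = 0$ is satisfied whenever $f(x) = 0$ but $g(x) \neq 0$.

Example

Solve
$$\frac{2x^2 - 3x - 5}{x^2 + 3x + 2} = 0$$

Solution

$$
\begin{aligned}
2x^2 - 3x - 5 &= 0 && \text{set numerator} = 0\\
(2x - 5)(x + 1) &= 0 && \text{factor}\\
x = \tfrac{5}{2} \quad &\text{or} \quad x = -1
\end{aligned}
$$

Then we must check that neither of these causes the denominator to be 0.

$$\left(\tfrac{5}{2}\right)^2 + 3\left(\tfrac{5}{2}\right) + 2 = \tfrac{63}{4} \neq 0 \qquad (-1)^2 + 3(-1) + 2 = 0$$

So $x = \frac{5}{2}$ is the only solution.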

If there are terms besides the quotient, move them all to the same side of the equation and use a
common denominator to combine them.

Example

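Solve
$$2 + \frac{x + 3}{x + 1} = \frac{4}{x}$$

Solution

$$
\begin{aligned}
2 + \frac{x + 3}{x + 1} - \frac{4}{x} &= 0 && \text{move to one side}\\
\frac{2x^2 + 2x}{x^2 + x} + \frac{x^2 + 3x}{x^2 + x} - \frac{4x + 4}{x^2 + x} &= 0 && \text{common denominator}\\
\frac{3x^2 + x - 4}{x^2 + x} &= 0 && \text{combine}\\
3x^2 + x - 4 &= 0 && \text{set numerator} = 0\\
(3x + 4)(x - 1) &= 0 && \text{factor}\\
x = -\tfrac{4}{3} \quad &\text{or} \quad x = 1
\end{aligned}
$$

Then we must check that neither of these causes the denominator to be 0.

$$\left(-\tfrac{4}{3}\right)^2 + \left(-\tfrac{4}{3}\right) = \tfrac{4}{9} \neq 0 \qquad 1^2 + 1 = 2 \neq 0$$

Both solutions are valid: $x = -\frac{4}{3}$ or $x = 1$.
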
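The recipe used in both examples (solve the numerator, then discard any root that makes the denominator 0) can be sketched in Python. This assumes the numerator is a quadratic $ax^2 + bx + c$ with $a \neq 0$, solved with the quadratic formula instead of by factoring; `quadratic_roots` and `solve_quotient` are hypothetical names. For the second example we pass the combined quotient $\frac{3x^2 + x - 4}{x^2 + x}$.

```python
import math


def quadratic_roots(a, b, c):
    """Real roots of a*x**2 + b*x + c = 0 (a != 0) via the quadratic formula."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    r = math.sqrt(disc)
    return sorted({(-b - r) / (2 * a), (-b + r) / (2 * a)})


def solve_quotient(num, den, tol=1e-12):
    """Solve num(x) / den(x) = 0, where num = (a, b, c) holds quadratic coefficients
    and den is a function: keep only the numerator roots where den(x) != 0."""
    return [x for x in quadratic_roots(*num) if abs(den(x)) > tol]


print(solve_quotient((2, -3, -5), lambda x: x**2 + 3*x + 2))   # [2.5]
print(solve_quotient((3, 1, -4), lambda x: x**2 + x))          # [-1.333..., 1.0]
```

The tolerance guards against floating-point roots that land extremely close to, but not exactly on, a zero of the denominator.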
Section 1.1
Exercises

1.1

3
Q1 Simplify 52 54

Q2 Simplify $e^5 (e^4)^3$

Q3 Compress $2 \log_5 x + \log_5 y - 3 \log_5 z$ into a single logarithm.

Q4 Compress $3 \ln(x + y) - \ln(x^2 + 2xy + y^2)$ into a single logarithm.

Q5 Solve $2e^x - 7 = 22$

Q6 Solve $4 \cos(2x) = 1$

Q7 Solve $2 \sin^2 x - 1 = 0$

Q8 Solve $2 \ln(x - 5) = 16$

Q9 Solve $4^{3x-2} = 15$

Q10 Solve $\log_7(x^2 + 5) - 3 = 11$

1.1.1

Q11 Graph y = 3 sin(2x).

Q12 Graph y = − ln x + 5.

Q13 Graph $y = e^x - 4$.


3
Q14 Graph y = x + 3.

Q15 Graph $y = \dfrac{1}{(x - 2)^2}$.

Q16 Graph y = −2 x + 1 + 4.

1.1.2

Q17 Solve for x: $\dfrac{x^2 + 5x - 6}{x - 1} = 0$

Q18 Solve for x: $\dfrac{e^x - 2}{x^2 + 2x - 3} = 0$

Q19 Solve for x: $\dfrac{3x^2 - 5}{2e^x - 7} = 0$

Q20 Solve for t: $\dfrac{\ln t - 4}{3 - t} = 0$

Q21 Solve for x: $\dfrac{\ln x - 4}{3 - x} = 0$

Q22 Solve for x: $\dfrac{3}{x + 2} = \dfrac{7}{x + 4}$

Q23 Solve for u: $\dfrac{5}{(u + 1)^2} = \dfrac{u}{u + 1}$

Section 1.2

Limits and Derivatives


Goals:

1 Compute limits of functions.
2 Verify that a function is continuous.
3 Compute derivatives.
4 Use derivatives to understand graphs and vice versa.

Question 1.2.1
What Is a Limit?

The Limit of a Function

If we can make f(x) arbitrarily close to some number L by considering only x in a small interval $(a, a + \delta)$, then we say the limit of f as x approaches a from the right is L. We write:

$$\lim_{x \to a^+} f(x) = L$$

If f(x) cannot be made arbitrarily close to any number, then this limit does not exist.
Similarly, if we can make f(x) arbitrarily close to some number L by considering only x in a small interval $(a - \delta, a)$, then we say the limit of f as x approaches a from the left is L. We write:

$$\lim_{x \to a^-} f(x) = L$$

If f(x) cannot be made arbitrarily close to any number, then this limit does not exist.
If both $\lim_{x \to a^+} f(x) = L$ and $\lim_{x \to a^-} f(x) = L$, we say the two-sided limit, or just the limit, of f as x approaches a is L. We write

$$\lim_{x \to a} f(x) = L$$

If either the limit from the left or the limit from the right does not exist, or if they do exist but are not equal to each other, then the two-sided limit does not exist.

Figure: An interval of x values that produce values in a small neighborhood of L when plugged into f(x).
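A table of values cannot prove a limit, but it is a useful way to guess one-sided limits before proving them. The sketch below (the helper name `one_sided_values` is hypothetical) evaluates f at $a - h$ and $a + h$ for $h = 10^{-1}, \dots, 10^{-6}$. For $f(x) = \frac{|x|}{x}$ at $a = 0$, the left values are all $-1$ and the right values are all $1$, which suggests the one-sided limits differ and so the two-sided limit does not exist.

```python
def one_sided_values(f, a, steps=6):
    """Evaluate f just left and right of a, at h = 10**-1, ..., 10**-steps.
    Such a table suggests, but does not prove, the one-sided limits."""
    hs = [10.0 ** -k for k in range(1, steps + 1)]
    left = [f(a - h) for h in hs]
    right = [f(a + h) for h in hs]
    return left, right


def step(x):
    return abs(x) / x   # -1 for x < 0, 1 for x > 0, undefined at 0


left, right = one_sided_values(step, 0.0)
print(left)    # [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0]
print(right)   # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```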

Infinite Limits

If f (x) can be made arbitrarily large by considering only x in a small interval (a, a + δ) then we say the
limit of f as x approaches a from the right is ∞.

$$\lim_{x \to a^+} f(x) = \infty$$

This is a way of representing growth without bound. Infinite limits from the left are defined analogously. Also analogous is our treatment of a function that decreases without bound. We say these functions limit to $-\infty$. If either one-sided limit at $x = a$ is infinite, then the line $x = a$ is a vertical asymptote of $y = f(x)$.

Example

Let $f(x) = \frac{1}{x}$.

$$\lim_{x \to 0^+} f(x) = \infty \qquad \lim_{x \to 0^-} f(x) = -\infty$$

Figure: The graph of $y = \frac{1}{x}$

Vertical Asymptotes

There are only two common algebraic constructions that produce infinite limits.

A function of the form $\dfrac{f(x)}{g(x)}$ where $\lim_{x \to a} g(x) = 0$ and $\lim_{x \to a} f(x) \neq 0$.

$\lim_{x \to 0^+} \log_a x = -\infty$.

Remark

$\infty$ is not a number, so if $\lim_{x \to a^+} f(x) = \infty$ we would still say that $\lim_{x \to a^+} f(x)$ does not exist.

There are several limit laws that allow us to compute limits of combinations of simpler functions.

Theorem [Limit Laws]

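The following limit laws hold, provided that $\lim_{x \to a} f(x)$ and $\lim_{x \to a} g(x)$ exist.

$$\lim_{x \to a} \big(f(x) + g(x)\big) = \lim_{x \to a} f(x) + \lim_{x \to a} g(x)$$

$$\lim_{x \to a} \big(c f(x)\big) = c \lim_{x \to a} f(x)$$

$$\lim_{x \to a} \big(f(x) g(x)\big) = \Big(\lim_{x \to a} f(x)\Big)\Big(\lim_{x \to a} g(x)\Big)$$

$$\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{\lim_{x \to a} f(x)}{\lim_{x \to a} g(x)} \quad \text{provided that } \lim_{x \to a} g(x) \neq 0$$

$$\lim_{x \to a} f(g(x)) = \lim_{x \to b} f(x) \quad \text{provided that } \lim_{x \to a} g(x) = b,\ \lim_{x \to b} f(x) \text{ exists, and } g(x) \neq b \text{ for } x \neq a \text{ near } a$$

We can write similar statements for one-sided limits, though we need to be careful about directions in the composition rule.
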
Question 1.2.2
What is Continuity?

Definition

A function f(x) is continuous at a if

$$\lim_{x \to a} f(x) = f(a)$$

Remark

This definition is useful if we already know we are dealing with a continuous function: the limit can then be computed by plugging in. For example, $f(x) = \sin x$ is continuous, so

$$\lim_{x \to \pi/6} \sin x = \sin\frac{\pi}{6} = \frac{1}{2}$$

Fortunately, many familiar functions are continuous.

Theorem

The following functions are continuous on their domains


1 Constant functions
2 Linear functions

3 Polynomials
4 Roots
5 Exponential functions
6 Logarithms

7 Trigonometric functions
8 f (x) = |x|

More complex functions made from continuous functions are also continuous.


Theorem

If f(x) and g(x) are continuous on their domains, and c is a constant, then the following are also continuous on their domains:
1 $f(x) + g(x)$
2 $f(x) - g(x)$
3 $f(x)g(x)$
4 $\dfrac{f(x)}{g(x)}$ (note that any x where $g(x) = 0$ is not in the domain)
5 $f(x)^{g(x)}$ as long as $f(x) > 0$
6 $f(g(x))$

Remark

Putting the above theorems together, we see that just about any function we can write using algebraic and trigonometric expressions is continuous on its domain. This does not mean it is continuous everywhere. For example, $f(x) = \frac{1}{x}$ is not continuous at $x = 0$.

Example 1.2.3
Computing a Limit

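How do we compute $\lim_{x \to 3} \dfrac{x^2 - 7x + 12}{x - 3}$?

Solution

$f(x) = \frac{x^2 - 7x + 12}{x - 3}$ is continuous on its domain, but $x = 3$ is not in the domain. However, the numerator factors as $x^2 - 7x + 12 = (x - 3)(x - 4)$, so if we let $g(x) = x - 4$, we know $\frac{x^2 - 7x + 12}{x - 3} = x - 4$ for every x except $x = 3$. Specifically, in any neighborhood around $x = 3$, $f(x) = g(x)$ except at $x = 3$ itself, so they have the same limit.

$$
\begin{aligned}
\lim_{x \to 3} \frac{x^2 - 7x + 12}{x - 3} &= \lim_{x \to 3} (x - 4) && \text{because they agree around } x = 3\\
&= 3 - 4 && \text{because } g(x) = x - 4 \text{ is continuous at } x = 3\\
&= -1
\end{aligned}
$$
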
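As a sanity check (not a proof), we can evaluate the original quotient at points approaching 3 from both sides and watch the values approach $-1$. A minimal sketch, with `f` and `g` named as in the solution:

```python
def f(x):
    """(x^2 - 7x + 12) / (x - 3); not defined at x = 3 itself."""
    return (x**2 - 7 * x + 12) / (x - 3)


def g(x):
    """x - 4, which agrees with f everywhere except x = 3."""
    return x - 4


for h in [1e-1, 1e-3, 1e-5]:
    # Both columns should be close to g(3) = -1.
    print(h, f(3 - h), f(3 + h))
```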
Question 1.2.4
What Is the Intermediate Value Theorem?

One early intuition for continuity is that the graph of the function can be drawn without any breaks.
There are many ways to formalize this idea. One of the most important is the following theorem.

Theorem [The Intermediate Value Theorem]

If f is a continuous function on [a, b] and K is a number between f (a) and f (b), then there is some
number c between a and b such that f (c) = K.

This theorem essentially states that a continuous graph cannot get from one side of the line y = K
to the other without intersecting y = K. Notice that this theorem does not say exactly where this
intersection must occur, only that it must occur somewhere in the interval (a, b). It also does not rule
out the possibility of more than one such c existing.

Example

Show that $f(x) = e^x - 3x$ has a root between 0 and 1.

Solution

A root is a number c such that f (c) = 0. To prove such a root exists, we check the conditions of the
IVT.
f (x) is a sum of continuous functions, so it is continuous on its domain.

f (0) = 1
f (1) = e − 3 < 0
0 is between f (0) and f (1)
We conclude there is some c between 0 and 1 such that f (c) = 0.
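The IVT does not say where the root is, but applying it repeatedly gives a way to locate one. The bisection method, a standard technique not described in this section, halves $[a, b]$ at each step and keeps the half on which f changes sign, so the IVT guarantees a root remains inside. A sketch for $f(x) = e^x - 3x$ on $[0, 1]$ (the function name `bisect` is our own):

```python
import math


def bisect(f, a, b, tol=1e-12):
    """Find c in [a, b] with f(c) close to 0, assuming f is continuous on [a, b]
    and f(a), f(b) have opposite signs (the hypotheses of the IVT)."""
    fa = f(a)
    if fa * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fa * fm <= 0:      # sign change on [a, m]: a root lies there
            b = m
        else:                 # otherwise the sign change is on [m, b]
            a, fa = m, fm
    return (a + b) / 2


root = bisect(lambda x: math.exp(x) - 3 * x, 0.0, 1.0)
print(root)
```

Because $f(0) = 1 > 0$ and $f(1) = e - 3 < 0$, the call succeeds and returns a number near 0.619. If the endpoint values had the same sign, the IVT would not apply, and the function raises an error rather than guessing.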

Question 1.2.5
What Is a Limit at Infinity?

Definition

If we can make f (x) arbitrarily close to some number L by considering only x in some interval
(n, ∞) then we say the limit of f as x approaches ∞ is L. We write:

lim f (x) = L
x→∞

If f (x) cannot be made arbitrarily close to any number, then this limit does not exist.

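Similarly, if we can make f(x) arbitrarily close to L by considering only x in some interval $(-\infty, n)$, then we say the limit of f as x approaches $-\infty$ is L. We write:

$$\lim_{x \to -\infty} f(x) = L$$

If either $\lim_{x \to \infty} f(x) = L$ or $\lim_{x \to -\infty} f(x) = L$, then $y = L$ is a horizontal asymptote of the graph $y = f(x)$.

By observing graphs or using arithmetic intuition, we arrive at the following limits at infinity.

| $f(x)$ | $\lim_{x \to \infty} f(x)$ | $\lim_{x \to -\infty} f(x)$ | Comments |
|---|---|---|---|
| $x^n$ (n odd) | $\infty$ | $-\infty$ | $n > 0$ |
| $x^n$ (n even) | $\infty$ | $\infty$ | $n > 0$ |
| $\sqrt[n]{x}$ (n even) | $\infty$ | DNE | domain is $x \ge 0$ |
| $\sqrt[n]{x}$ (n odd) | $\infty$ | $-\infty$ | |
| $\frac{1}{x^n}$ | 0 | 0 | $n > 0$ |
| $a^x$ $(a > 1)$ | $\infty$ | 0 | |
| $a^x$ $(0 < a < 1)$ | 0 | $\infty$ | |
| $\log_a x$ | $\infty$ | DNE | $a > 1$, domain is $x > 0$ |
| $\sin x$ | DNE | DNE | oscillates |
| $\tan^{-1} x$ | $\frac{\pi}{2}$ | $-\frac{\pi}{2}$ | |
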
Question 1.2.6
How Do We Measure the Change in a Function?

Definition

The average rate of change of a function f (x) between x = a and x = b is

$$\frac{f(b) - f(a)}{b - a}$$

This is also the slope of the secant line from (a, f (a)) to (b, f (b)) on the graph y = f (x).

Knowing the average rate of change over a range of inputs (or times) doesn't tell us the rate of change at a specific point (or moment). Geometrically, this instantaneous rate of change is the slope of the tangent line to $y = f(x)$ at a particular point $(a, f(a))$.

Figure: A secant line and a tangent line

The secant lines get closer and closer to the tangent line (in slope) as b gets closer to a. This
suggests that we could take the limit of these approaching values to get the actual slope.

Definition

The instantaneous rate of change or derivative of a function f (x) at x = a is

$$\lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$$

provided that this limit exists. This is also the slope of the tangent line to $y = f(x)$ at $(a, f(a))$. Two common notations for the derivative are

Prime notation: $f'(a)$

Leibniz notation: $\left.\dfrac{df}{dx}\right|_{x=a}$


Figure: A limit of the slopes of secant lines
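The same limiting process can be watched numerically. The sketch below uses $f(x) = e^x$ at $a = 0$ (an example chosen here, not taken from the text), where the tangent slope is $e^0 = 1$; the secant slopes $\frac{f(a+h) - f(a)}{h}$ approach 1 as h shrinks. The helper name `difference_quotient` is hypothetical.

```python
import math


def difference_quotient(f, a, h):
    """Slope of the secant line through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h


for h in [1.0, 0.1, 0.01, 0.001, 1e-6]:
    print(h, difference_quotient(math.exp, 0.0, h))
```

Taking h much smaller (say $10^{-12}$) eventually makes the estimate worse, because subtracting the nearly equal numbers $f(a+h)$ and $f(a)$ loses precision in floating-point arithmetic. The limit itself is exact; a computer can only approximate it.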

We can attempt to compute the derivative at any point a. We can put these values together to
create a function f ′ (x).

Definition

The derivative function of f(x) is the function that takes the value

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

at each x.
We can denote the derivative function as $f'(x)$ or $\frac{df}{dx}$. The second can be rewritten $\frac{d}{dx} f$ to emphasize that we are applying the differentiation operation to the function f.

Example

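If $f(x) = x^2 + 2x$, compute $f'(x)$.

Solution

$$
\begin{aligned}
f'(x) &= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} && \text{definition of derivative}\\
&= \lim_{h \to 0} \frac{(x + h)^2 + 2(x + h) - x^2 - 2x}{h} && \text{plug in } x \text{ and } x + h\\
&= \lim_{h \to 0} \frac{x^2 + 2xh + h^2 + 2x + 2h - x^2 - 2x}{h} && \text{distribute}\\
&= \lim_{h \to 0} \frac{2xh + h^2 + 2h}{h} && \text{cancel}\\
&= \lim_{h \to 0} (2x + h + 2) && \text{functions agree except at } h = 0 \text{, so limits are equal}\\
&= 2x + 0 + 2 && \text{limit = value on a continuous function}\\
&= 2x + 2
\end{aligned}
$$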

Theorem

If f ′ (x) > 0 for all x in some interval [a, b] then f (x) is increasing on [a, b].
If f ′ (x) < 0 for all x on [a, b] then f (x) is decreasing on [a, b].

We can take higher order derivatives by taking derivatives of derivatives. The derivative function
of f in this context is called the first derivative. Its derivative function is the second derivative. The
second derivative’s derivative function is the third derivative and so on.

Notation

The following notations are used for higher order derivatives

| name | prime notation | Leibniz notation |
|---|---|---|
| first derivative | $f'(x)$ | $\frac{df}{dx}$ |
| second derivative | $f''(x)$ | $\frac{d^2 f}{dx^2}$ |
| third derivative | $f'''(x)$ | $\frac{d^3 f}{dx^3}$ |
| fourth derivative | $f^{(4)}(x)$ | $\frac{d^4 f}{dx^4}$ |
| fifth derivative | $f^{(5)}(x)$ | $\frac{d^5 f}{dx^5}$ |


The sign of a higher order derivative tells us how the derivative of one order lower is changing. For example, if $\frac{d^5 f}{dx^5} < 0$, then $\frac{d^4 f}{dx^4}$ is decreasing. The sign of higher order derivatives is difficult to discern from the shape of $y = f(x)$, with the exception of the second derivative.

Theorem

If f ′′ (x) > 0 on some interval, then y = f (x) is concave up on that interval. If f ′′ (x) < 0, then
y = f (x) is concave down.

Definition

A point a such that f (x) is concave up to one side of a and concave down to the other side is called
an inflection point.

Question 1.2.7
How Do We Compute Derivatives?

The limit definition of a derivative is too unwieldy to use every time. A better approach is to learn
the derivatives of some simple functions, and then use theorems to compute derivatives when those
functions are combined.

Derivatives of Simple Functions

$\frac{d}{dx} c = 0$ (derivative of a constant is 0)

$\frac{d}{dx} x^n = n x^{n-1}$ for any $n \neq 0$ (The Power Rule)

$\frac{d}{dx} \sin x = \cos x$

$\frac{d}{dx} \cos x = -\sin x$

$\frac{d}{dx} e^x = e^x$

$\frac{d}{dx} a^x = a^x \ln a$ for $a > 0$

$\frac{d}{dx} \ln x = \frac{1}{x}$

Theorem

The following rules allow us to differentiate functions made of simpler functions whose derivatives we know.

Sum Rule: $(f(x) + g(x))' = f'(x) + g'(x)$

Constant Multiple Rule: $(c f(x))' = c f'(x)$

Product Rule: $(f(x)g(x))' = f'(x)g(x) + g'(x)f(x)$

Quotient Rule: $\left(\dfrac{f(x)}{g(x)}\right)' = \dfrac{f'(x)g(x) - g'(x)f(x)}{(g(x))^2}$ unless $g(x) = 0$

Chain Rule: $(f(g(x)))' = f'(g(x))\,g'(x)$

Example

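Compute $\dfrac{d}{dx} \tan x$.

Solution

$\tan x = \dfrac{\sin x}{\cos x}$. We apply the quotient rule.

$$
\begin{aligned}
(\tan x)' &= \frac{(\sin x)' \cos x - (\cos x)' \sin x}{\cos^2 x} && \text{quotient rule}\\
&= \frac{\cos^2 x + \sin^2 x}{\cos^2 x}\\
&= \frac{1}{\cos^2 x} && \text{Pythagorean identity}\\
&= \sec^2 x
\end{aligned}
$$
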
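A numerical spot check can catch algebra mistakes in derivative computations like this one. The sketch below uses the central difference $\frac{f(x+h) - f(x-h)}{2h}$, a common finite-difference approximation that is not part of this section, and compares it with $\sec^2 x = \frac{1}{\cos^2 x}$ at a few arbitrarily chosen points away from where $\cos x = 0$. The helper name `central_difference` is hypothetical.

```python
import math


def central_difference(f, x, h=1e-5):
    """Approximate f'(x) by the symmetric quotient (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


for x in [0.0, 0.7, 1.2]:
    approx = central_difference(math.tan, x)
    exact = 1 / math.cos(x) ** 2      # sec^2 x
    print(x, approx, exact)
```

Agreement at a handful of points does not prove the identity, but a disagreement would signal an error in the derivation.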
Application 1.2.8
The Shape of a Graph

What can the first and second derivatives of $f(x) = 8x^3 - x^4$ tell us about the shape of its graph?

Solution

We will compute the first and second derivative using the power rule. Factoring them will allow us to
perform a sign analysis.

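$$f'(x) = 24x^2 - 4x^3 = 4x^2(6 - x) \qquad f''(x) = 48x - 12x^2 = 12x(4 - x)$$

| | $(-\infty, 0)$ | $(0, 6)$ | $(6, \infty)$ |
|---|---|---|---|
| $4x^2$ | + | + | + |
| $6 - x$ | + | + | − |
| $f'(x)$ | + | + | − |

| | $(-\infty, 0)$ | $(0, 4)$ | $(4, \infty)$ |
|---|---|---|---|
| $12x$ | − | + | + |
| $4 - x$ | + | + | − |
| $f''(x)$ | − | + | − |

From the sign of $f'(x)$ we conclude f is increasing on $(-\infty, 0)$ and $(0, 6)$ but decreasing on $(6, \infty)$.
From the sign of $f''(x)$ we conclude that f is concave down on $(-\infty, 0)$ and $(4, \infty)$, but concave up on $(0, 4)$.

Figure: The graph of $y = 8x^3 - x^4$
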
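The sign analysis above can be automated with one test point per interval. This sketch assumes the function being checked is continuous and that the list of zeros is sorted and complete, so the sign cannot change inside an interval; `sign_chart` is a hypothetical helper, and `fp`, `fpp` are the derivatives computed above.

```python
def fp(x):
    """f'(x) = 24x^2 - 4x^3 = 4x^2(6 - x) for f(x) = 8x^3 - x^4."""
    return 24 * x**2 - 4 * x**3


def fpp(x):
    """f''(x) = 48x - 12x^2 = 12x(4 - x)."""
    return 48 * x - 12 * x**2


def sign_chart(g, zeros, pad=1.0):
    """Sign of g on each interval cut out by the sorted list of zeros,
    using one test point per interval (g is assumed continuous)."""
    points = ([zeros[0] - pad]
              + [(p + q) / 2 for p, q in zip(zeros, zeros[1:])]
              + [zeros[-1] + pad])
    return ["+" if g(t) > 0 else "-" if g(t) < 0 else "0" for t in points]


print(sign_chart(fp, [0, 6]))    # ['+', '+', '-']
print(sign_chart(fpp, [0, 4]))   # ['-', '+', '-']
```

The output matches the bottom rows of the two sign charts.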
Section 1.2
Exercises

1.2.1

Q1 Given the graph of y = f (x) here, give the value of each of the following limits (if they exist).

a $\lim_{x \to -3^-} f(x)$
b $\lim_{x \to -3^+} f(x)$
c $\lim_{x \to -2} f(x)$
d $\lim_{x \to 0} f(x)$
e $\lim_{x \to 4^-} f(x)$
f $\lim_{x \to 4^+} f(x)$

Q2 Given the graph of y = g(x) here, give the value of each of the following limits (if they exist).

a $\lim_{x \to 0^-} g(x)$
b $\lim_{x \to 0^+} g(x)$
c $\lim_{x \to 0} g(x)$
d $\lim_{x \to 3^-} g(x)$
e $\lim_{x \to 3^+} g(x)$
f $\lim_{x \to 3} g(x)$
g $\lim_{x \to -4^-} g(x)$
h $\lim_{x \to -4^+} g(x)$
i $\lim_{x \to -1} g(x)$


1.2.2

Q3 Explain why $f(x) = \dfrac{e^x}{x^2 + 3}$ is continuous on $\mathbb{R}$.

Q4 Explain why $f(x) = \sqrt{\sin(3x^2)}$ is continuous on its domain.

Q5 Is
$$f(x) = \begin{cases} \sin(2x) & \text{if } x < 0\\ 4 & \text{if } x = 0\\ -x^2 & \text{if } x > 0 \end{cases}$$
continuous at x = 0? Justify your answer.

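Q6 Is
$$f(x) = \begin{cases} x^3 - 2x + 1 & \text{if } x < 0\\ e^x & \text{if } x \ge 0 \end{cases}$$
continuous at x = 0? Justify your answer.

Q7 Is
$$f(x) = \begin{cases} x + 5 & \text{if } x < 1\\ 6 & \text{if } x = 1\\ x^2 + 4x + 1 & \text{if } x > 1 \end{cases}$$
continuous at x = 1? Justify your answer.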

Q8 Where is

cos(πx) if x < 4

f (x) = 1 if x = 4
√

x − 3 if x > 4

continuous?

1.2.3

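Q9 Compute $\lim_{x \to 3} \dfrac{x - 3}{x^2 - 9}$

Q10 Compute $\lim_{x \to 1} \dfrac{x^2 - 4x + 3}{x - 1}$

Q11 Compute $\lim_{x \to 9} \dfrac{2x - 18}{\sqrt{x} - 3}$

Q12 Compute $\lim_{x \to 4} \dfrac{\frac{1}{x^2} - \frac{1}{16}}{x - 4}$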

1.2.4

Q13 Explain why sin x = 2x − 1 has a solution in [0, 1].


Q14 Explain why $\sqrt[3]{x} = \log_2 x$ has a solution in [0, 8].

Q15 What does the Intermediate Value Theorem say about whether $f(x) = \frac{1}{x} - \frac{1}{2}$ has a root in $[-1, 1]$?

3 π 3 5π 1 3
Q16 Consider the equation sin x = . Gloria computes sin = and sin = . Since is not
4 3 2 6 2 4

1 3 3
, she concludes that sin x = has no roots in π3 , 5π
 
between and 6 . What do you think
2 2 4
of Gloria’s reasoning?

1.2.5

Q17 Compute $\lim_{x \to \infty} \dfrac{x^2 + 2x - 9}{3x - 6}$.

Q18 Compute $\lim_{x \to \infty} \dfrac{4x^2 - 7x + 9}{2x^2 + 11}$.

Q19 Compute $\lim_{x \to \infty} \sqrt{e^{1/x}}$.

Q20 Compute $\lim_{x \to \infty} \dfrac{1}{\ln x}$.

Q21 Compute $\lim_{x \to -\infty} e^{e^x}$.

Q22 Compute $\lim_{x \to \infty} \sin(\ln x)$.

1.2.6

Q23 Let f (x) = x3 .

a Compute the average rate of change of f from x = 2 to x = 5.

b Give the equation of the secant line that meets y = f (x) at x = 2 and x = 5.

c Use the limit definition of the derivative to compute f ′ (2).


Q24 Let f (x) = x Compute the average rate of change of f between x = 4 and x = 9. Based on

the graph of y = f (x), is the instantaneous rate of change at x = 4 greater or less than this
average?

Q25 Let f (x) = 3x2 − 7. Compute f ′ (6) using the limit definition of the derivative.

Q26 Let $f(x) = \dfrac{1}{x + 2}$. Compute $f'(1)$ using the limit definition of the derivative.

Q27 Let $f(x) = \dfrac{1}{x^2}$. Compute $f'(x)$ using the limit definition of the derivative.


Q28 Let f (x) = x. Compute f ′ (x) using the limit definition of the derivative.

1.2.7

Q29 Use derivative rules to differentiate each of the following functions.

5
a 5x7 − 3x2 + f cos(4x)
x2
4x5 − 2x2 + 3x + 4
b g sin(ex )
x

c (x2 + 2x) sin x h (x2 + 5x + 4)60

ex 2
d i ex sin x
x2
√ ln(x2 + 2)
e x−5 j
x2 + 3x

Q30 Use derivative rules to differentiate each of the following functions.

3 7
a + f e3x+2
x x3
5x4 + 3x3 − 8x2
b g cos(x3 + 2x)
x2
ln x 5
c h (cos x)3
x
2
d 4x sin(x) i ex sin3 x


e tan(2x + 7) j ln( x sin x)

Q31 Let f (x) = sin(3x). Compute f ′′′ (x).

Q32 Let $f(x) = e^{x^3}$. Compute $f''(x)$.

1.2.8

Q33 Where in its domain is the function $f(x) = x^3 - x^2$ increasing?

Q34 Where in its domain is $f(x) = e^x - x^2$ concave up?


Q35 Where in its domain is $f(x) = 1024\sqrt{x} - x^4$ increasing?

Q36 Find the inflection point(s) of $x^4 - 8x^3$.

Section 1.3

Applications of Derivatives
Goals:

1 Write the equation of a tangent line.
2 Identify local maximums and minimums.
3 Use the Extreme Value Theorem to find minimums and maximums.
4 Use l’Hôpital’s rule to compute limits.

This section reviews the most important applications of the derivative.

Application 1.3.1
The Tangent Line to a Graph

Given a function f (x), the derivative f ′ (a) is the slope of the line tangent to y = f (x) at (a, f (a)).

Formula

The equation of the tangent line to y = f (x) at x = a in point-slope form is:

y − f (a) = f ′ (a)(x − a)

We can rewrite the tangent line as a function of x. We call this function a linearization because it is linear and it approximates the value of f(x) for x near a.

Formula

The linearization of y = f (x) at x = a is the function:

L(x) = f (a) + f ′ (a)(x − a)
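A linearization is easy to build as a function in code. In this sketch the example $f(x) = \sqrt{x}$ near $a = 4$ is our own choice, the derivative $f'(x) = \frac{1}{2\sqrt{x}}$ is supplied by hand, and the helper name `linearization` is hypothetical. Here $L(x) = 2 + \frac{1}{4}(x - 4)$, so $L(4.1) = 2.025$, close to $\sqrt{4.1} \approx 2.02485$.

```python
import math


def linearization(f, fprime, a):
    """Return the function L(x) = f(a) + f'(a)(x - a)."""
    fa, slope = f(a), fprime(a)
    return lambda x: fa + slope * (x - a)


L = linearization(math.sqrt, lambda x: 1 / (2 * math.sqrt(x)), 4.0)
for x in [4.0, 4.1, 5.0]:
    print(x, L(x), math.sqrt(x))   # the approximation is best for x near 4
```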

If we want to emphasize the change in x and y instead of their actual values we can use differential
notation:


Notation

If y = f (x) is approximated by a tangent line at x = a then we let


dx = x − a represents a change in x from a. Since x is an independent variable, so is dx.
dy = f ′ (a) dx is equal to the change in y corresponding to dx, if we travel along the tangent
line. This approximates the actual change in f (x) if x increases by dx.

Figure: The differentials dx and dy on the tangent line to y = f (x)

Application 1.3.2
Maximum and Minimum Values of a Function

Definition

A number a is a maximum of a function f (x) if f (a) ≥ f (x) for all x in the domain of f .
a is a minimum if f (a) ≤ f (x) for all x in the domain of f .

Definition

A number a is a local maximum of a function f (x) if f (a) ≥ f (b) for all b in some neighborhood of a.
a is a local minimum if f (a) ≤ f (b) for all b in some neighborhood of a.

To distinguish ordinary maximums from the local variety, we sometimes call them global maximums or absolute maximums. Every global maximum is a local maximum, but local maximums need not be global maximums. If $f'(a) > 0$ then f(x) takes larger values just to the right of a and smaller values just to the left. Thus a cannot be a local maximum or minimum. The same argument applies if $f'(a) < 0$.

Figure: Maximum and minimum values of f (x)

Definition

A critical point of f (x) is a value a in the domain of f such that either f ′ (a) = 0 or f ′ (a) does not
exist.

Theorem [The First Derivative Test]

Local maximums and minimums of f (x) can only occur at critical points.

We can use concavity as a way to classify critical points. Knowing whether a graph is concave up
or concave down at a point where f ′ (x) = 0 allows us to visualize a small neighborhood of that point.


Theorem [The Second Derivative Test]

Let a be a critical point of f .


If f ′′ (a) < 0 then a is a local maximum.
If f ′′ (a) > 0 then a is a local minimum.

If f ′′ (a) = 0 or does not exist, then the test is inconclusive. a could be a local maximum, a local
minimum, or neither.

Example

What does the second derivative test tell you about the critical points of $f(x) = 8x^3 - x^4$?

Solution

First we compute the critical points.

$$\begin{aligned}
f'(x) &= 24x^2 - 4x^3 && \text{compute first derivative}\\
0 &= 24x^2 - 4x^3 && \text{set equal to } 0\\
0 &= 4x^2(6 - x) && \text{factor}\\
x &= 0 \text{ or } x = 6
\end{aligned}$$

Now we compute the second derivative and evaluate it at each critical point.

$$f''(x) = 48x - 12x^2$$

$$f''(0) = 0 \qquad f''(6) = (48)(6) - (12)(36) = -144$$

f ′′ (6) < 0 so x = 6 is a local maximum. f ′′ (0) = 0 so the second derivative test cannot tell whether
x = 0 is a local maximum or local minimum (in fact it is neither).
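The example can be checked mechanically. The sketch below (function names are our own) stores a polynomial as a list of coefficients, differentiates it twice, and applies the second derivative test at the critical points x = 0 and x = 6 found above.

```python
def derivative(coeffs):
    """Differentiate a polynomial stored as [c0, c1, c2, ...] (c_i multiplies x**i)."""
    return [i * c for i, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))

def second_derivative_test(coeffs, a):
    """Classify a critical point a using the sign of f''(a)."""
    second = evaluate(derivative(derivative(coeffs)), a)
    if second < 0:
        return "local maximum"
    if second > 0:
        return "local minimum"
    return "inconclusive"

f = [0, 0, 0, 8, -1]   # 8x^3 - x^4
for a in (0, 6):
    # prints "0 0 inconclusive" and then "6 0 local maximum"
    print(a, evaluate(derivative(f), a), second_derivative_test(f, a))
```

"Inconclusive" at x = 0 is all the test can report; deciding that x = 0 is neither a maximum nor a minimum takes a different argument.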

Question 1.3.3
Does a Function Always Have a Maximum?

No. Many functions don’t have maximums; for instance, the values of f(x) may increase without bound as x gets larger and larger. However, if we restrict the domain, we can sometimes guarantee a maximum.

Theorem [The Extreme Value Theorem]

If f (x) is a continuous function on a closed domain [a, b] then f has an absolute maximum and an
absolute minimum on [a, b].

Remark

When the EVT applies, we can find the absolute maximum and minimum by process of elimination. A maximum exists, and by the First Derivative Test it must occur at a critical point. We can find the critical points and evaluate f at each of them. Whichever has the greatest value is the maximum, and whichever has the least value is the minimum.
Note that a and b are always critical points because the derivative does not exist there. There is no
limit from the left at a because those points are outside the domain of f . Similarly, there is no limit
from the right at b.

Example

Compute the maximum and minimum value of $f(x) = 8x^3 - x^4$ on the domain [2, 8], if they exist.

Solution

f (x) is continuous and [2, 8] is closed, so the EVT guarantees that a maximum and minimum exist.
The first derivative test says that they can only occur at critical points.

$$\begin{aligned}
f'(x) &= 24x^2 - 4x^3 && \text{compute first derivative}\\
0 &= 24x^2 - 4x^3 && \text{set equal to } 0\\
0 &= 4x^2(6 - x) && \text{factor}\\
x &= 0 \text{ or } x = 6
\end{aligned}$$

x = 0 is not in the domain, so we discard it. On the other hand x = 2 and x = 8 are also critical points
because the derivative does not exist there. To find which critical point is the maximum and which is
the minimum, we plug each into f and compare.

$$\begin{aligned}
f(2) &= (8)(8) - 16 = 48\\
f(6) &= (8)(216) - 1296 = 432 && \text{(maximum)}\\
f(8) &= (8)(512) - 4096 = 0 && \text{(minimum)}
\end{aligned}$$
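The elimination procedure from the Remark can be written as a short Python sketch. It hard-codes the roots x = 0 and x = 6 of f′(x) = 4x^2(6 − x) that we found by factoring; it does not solve f′(x) = 0 itself.

```python
def f(x):
    return 8 * x ** 3 - x ** 4

a, b = 2, 8
# Roots of f'(x) = 4x^2(6 - x), found by factoring in the example
stationary = [0, 6]
# Keep only interior roots; the endpoints a and b are always candidates
candidates = [a] + [x for x in stationary if a < x < b] + [b]
values = {x: f(x) for x in candidates}

x_max = max(values, key=values.get)
x_min = min(values, key=values.get)
print(values)                      # {2: 48, 6: 432, 8: 0}
print("max at", x_max, "min at", x_min)
```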

Application 1.3.4
L’Hôpital’s Rule

The limit rules tell us how to take limits of quotients, products, sums and differences. What happens
if one of the functions being divided goes to ∞, or if the denominator of a quotient goes to 0? In some
cases we can reason this out using our intuition of arithmetic.

Example

Consider $\lim_{x\to\infty} \frac{\tan^{-1}(x)}{\ln x}$.

$$\lim_{x\to\infty} \tan^{-1} x = \frac{\pi}{2} \qquad \lim_{x\to\infty} \ln x = \infty$$

Since the numerator approaches π/2 and the denominator increases without bound, the ratio gets smaller and smaller and approaches 0.

In other cases, intuition cannot help us.

Definition

A limit of the form $\lim_{x\to a} \frac{f(x)}{g(x)}$ is of indeterminate form if either

$\lim_{x\to a} f(x) = \pm\infty$ and $\lim_{x\to a} g(x) = \pm\infty$, or

$\lim_{x\to a} f(x) = 0$ and $\lim_{x\to a} g(x) = 0$.

This definition also applies to one-sided limits or to limits at ±∞.

Limits of products and sums can sometimes be rewritten as quotients of indeterminate form as well.

Theorem [L’Hôpital’s Rule]

If $\lim_{x\to a} \frac{f(x)}{g(x)}$ is of indeterminate form, then it is equal to

$$\lim_{x\to a} \frac{f'(x)}{g'(x)}$$

assuming this limit exists.

Often L’Hôpital’s Rule converts a limit of indeterminate form to one we can evaluate through intuition
or direct computation. Sometimes, we need to apply L’Hôpital’s Rule more than once.

Warning

If a limit is not of indeterminate form, then L’Hôpital’s Rule does not apply. Attempting to apply it will
usually give an incorrect value for the limit.

Example 1.3.5
A Limit of Indeterminate Form

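Evaluate $\lim_{x\to 0} \frac{e^x - x - 1}{x^2}$.

Solution

$$\begin{aligned}
&\lim_{x\to 0} \frac{e^x - x - 1}{x^2} && \tfrac{0}{0} \text{ form}\\
&= \lim_{x\to 0} \frac{e^x - 1}{2x} && \text{L'Hôpital's Rule, still } \tfrac{0}{0} \text{ form}\\
&= \lim_{x\to 0} \frac{e^x}{2} && \text{L'Hôpital's Rule again}\\
&= \frac{1}{2}
\end{aligned}$$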
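As a numerical sanity check (not a proof), we can evaluate the quotient at points approaching 0 from both sides and watch it settle near 1/2. We use Python's `math.expm1`, which computes e^x − 1 accurately for small x; the naive expression `math.exp(x) - x - 1` loses most of its significant digits as x approaches 0.

```python
import math

def ratio(x):
    """(e^x - x - 1) / x^2, using expm1 to avoid cancellation near 0."""
    return (math.expm1(x) - x) / x ** 2

for x in (0.1, 0.01, 0.001, -0.001, -0.01):
    print(f"x={x:>7}: ratio={ratio(x):.6f}")
```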

Section 1.3
Exercises

1.3.1


Q1 Write the equation of the tangent line to $y = \sqrt{x}$ at (4, 2).

Q2 Write the equation of the tangent line to $y = \frac{1}{x^2}$ at $\left(5, \frac{1}{25}\right)$.

Q3 Let f(x) = sin(x).

a Write the equation of the linearization of y = f(x) at $x = \frac{\pi}{3}$.

b If we wanted to use a to approximate sin(1) by hand, what number(s) would we need decimal approximations of?

c Use a calculator to get decimal approximations of those numbers, then show how to approximate sin(1).

Q4 Write a linearization of $f(x) = \frac{1}{x}$ at x = 3 and use it to approximate $\frac{1}{2.93}$.


Q5 A bacterial culture has mass 3 g after t = 5 hours of growth. At that time, its instantaneous rate of growth is 0.2 g/hr.

a Write a linear function to approximate m(t), the mass of the culture at hour t.

b Approximate the mass at time 8 hours.

c Given that m′′(t) > 0, is your answer to b an overestimate or an underestimate?

Q6 A space capsule is descending from orbit. After 90 seconds, it is 10,000 m above sea level and falling at 400 m per second.

a Write a linearization for h(t), the height of the capsule at time t.

b Use a to predict when the capsule will splash down into the ocean.

c Do you expect that your answer to b is an overestimate or underestimate? Explain.

1.3.2

Q7 Find the critical points of $f(x) = 12x^{2/3} - x$.

Q8 Find the critical points of $g(x) = x^4 - 18x^2 + 5$. Apply the second derivative test to each.

Q9 Find the critical points of $f(x) = x^3 - 75x$. Apply the second derivative test to each.

Q10 Find the critical points of $g(x) = e^x - 2x$. Apply the second derivative test to each.

1.3.3

Q11 Find the maximum and minimum values of $f(x) = x^{2/3}$ on [−8, 1].

Q12 Find the maximum and minimum values of $f(x) = x^3 - 75x$ on [−10, 10].

1.3.4

Q13 Evaluate $\lim_{x\to 0^+} \frac{x\cos(x - \pi)}{e^x - 1}$.

Q14 Evaluate $\lim_{x\to 0^+} \frac{e^{-3x} + 3x - 1}{\sin(x^2)}$.

Q15 Evaluate $\lim_{x\to\infty} \frac{x\ln x}{x^{5/2} + 3}$.

Q16 Evaluate $\lim_{x\to -\infty} e^x x^2$.

Section 1.4

Definite Integrals
Goals:

1 Express areas under a graph and antiderivatives using integral notation.


2 Derive antiderivatives from known derivatives.
3 Compute general antiderivatives.
4 Compute definite integrals using the Fundamental Theorem of Calculus.
5 Use u-substitution to compute integrals where necessary.

By definition, integrals compute area under a graph. The Fundamental Theorem of Calculus connects
integrals to antiderivatives, meaning that integrals can also be used to compute total change, given a
rate of change function.

Question 1.4.1
What Is an Antiderivative?

Definition

F(x) is an antiderivative of a function f(x) if F′(x) = f(x).

Every derivative we know also tells us an antiderivative.

Example
 
$$\frac{d}{dx}\left(\frac{x^2}{2} + 5\right) = x, \text{ so } F(x) = \frac{x^2}{2} + 5 \text{ is an antiderivative of } f(x) = x.$$
Notice that $\frac{x^2}{2} + 2$, $\frac{x^2}{2} - 6$, and $\frac{x^2}{2}$ are also antiderivatives of f(x) = x.

A function with one antiderivative has infinitely many. Adding a constant to one antiderivative produces another, since the derivative of a constant is 0. In fact, on an interval, this is the only relationship between antiderivatives.

Theorem

If F(x) and G(x) are antiderivatives of f(x) on an interval, then there is a constant c such that

F(x) = G(x) + c.

Since the antiderivatives are related this way, it is easy to express all of the antiderivatives of a
function at once.
Definition

If F (x) is an antiderivative of f (x), then the general antiderivative of f (x) is the family of functions:

F (x) + c

where c can be any constant.

Here is a table of antiderivatives that we can compute just by reverse engineering the derivatives we
already know.

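$$\begin{array}{ll}
f(x) & \text{general antiderivative of } f(x)\\
\hline
x^n \ (n \neq -1) & \dfrac{x^{n+1}}{n+1} + c\\
e^x & e^x + c\\
a^x \ (a > 0,\ a \neq 1) & \dfrac{a^x}{\ln a} + c\\
\dfrac{1}{x} & \ln|x| + c\\
\sin x & -\cos x + c\\
\cos x & \sin x + c
\end{array}$$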
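One way to double-check a table like this is to differentiate each antiderivative numerically and compare with f(x). The sketch below uses a central difference quotient; the choices n = 3 and base a = 2 are arbitrary, and we check ln|x| so that negative sample points also work.

```python
import math

def central_difference(F, x, h=1e-5):
    """Approximate F'(x) with a symmetric difference quotient."""
    return (F(x + h) - F(x - h)) / (2 * h)

n = 3       # any exponent other than -1
base = 2.0  # any base a > 0 with a != 1

# (f, F) pairs from the table, with the constant c omitted
table = {
    "x^n":   (lambda x: x ** n,    lambda x: x ** (n + 1) / (n + 1)),
    "e^x":   (math.exp,            math.exp),
    "a^x":   (lambda x: base ** x, lambda x: base ** x / math.log(base)),
    "1/x":   (lambda x: 1 / x,     lambda x: math.log(abs(x))),
    "sin x": (math.sin,            lambda x: -math.cos(x)),
    "cos x": (math.cos,            math.sin),
}

def worst_error(points=(-2.5, -0.7, 0.7, 1.9)):
    """Largest |F'(x) - f(x)| over every table row and sample point."""
    return max(abs(central_difference(F, x) - f(x))
               for f, F in table.values() for x in points)

print(worst_error())
```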

Remark

Many familiar functions are missing from this list. This is because we just haven’t come across them as
derivatives of some other function. For instance, we do not yet know a function F (x) whose derivative
is ln x or tan x.

Question 1.4.2
How Do We Compactly Denote a Sum of Many Terms?

Defining the definite integral requires us to add up many numbers. The problem is not just that
the number of summands is large. We need to be flexible about how many terms are in the sum. The
notation that gives us this flexibility is Σ notation.

Notation

Σ (‘sigma’) notation allows us to sum many different values of an expression using an index variable.
The index variable will be replaced by each integer between an initial and final value, and the resulting
outputs are added together.

$$\sum_{k=1}^{n} f(k) = f(1) + f(2) + f(3) + \cdots + f(n)$$

We may choose any variable as the index variable. The index variable could also have a different initial
value, if that is more convenient.
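In Python, `range(start, stop)` excludes `stop`, so a Σ with inclusive bounds needs `stop + 1`. The hypothetical helper `sigma` below does this, and also shows that shifting the index to start at 0 (replacing k with j + 1) gives the same sum.

```python
def sigma(f, start, stop):
    """Sum f(k) for k = start, start + 1, ..., stop (both ends included)."""
    return sum(f(k) for k in range(start, stop + 1))

n = 10
original = sigma(lambda k: k ** 2, 1, n)           # sum of k^2 for k = 1..n
shifted = sigma(lambda j: (j + 1) ** 2, 0, n - 1)  # same terms, index starts at 0
print(original, shifted)
```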


Example

$$\sum_{j=3}^{7} \frac{j^2}{j+1} = \frac{9}{4} + \frac{16}{5} + \frac{25}{6} + \frac{36}{7} + \frac{49}{8}$$

Part of the challenge of writing a sum in Σ notation is choosing an f that will produce all the terms of your sum.

Example 1.4.3
Writing a Sum in Σ Notation

Write each of the following sums in Σ notation.

a 4 + 7 + 10 + 13 + 16 + 19 + 22

b 2 + 6 + 18 + 54 + 162 + 486

c −3 + 4 − 5 + 6 − 7 + 8 − 9 + 10

d $\frac{\sqrt{1}}{4} + \frac{\sqrt{2}}{9} + \frac{\sqrt{3}}{16} + \frac{2}{25} + \frac{\sqrt{5}}{36}$

Solution

a The terms increase by 3 each time. Repeated addition is multiplication, in this case 3k plus some starting value. Starting with index k = 0 is convenient, because 3(0) = 0 at the starting value.

$$4 + 7 + 10 + 13 + 16 + 19 + 22 = \sum_{k=0}^{6} (4 + 3k)$$

b The terms are multiplied by 3 each time. Repeated multiplication is exponentiation, in this case $3^k$ times some starting value. Starting with index k = 0 is convenient, because $3^0 = 1$ at the starting value.

$$2 + 6 + 18 + 54 + 162 + 486 = \sum_{k=0}^{5} (2)(3^k)$$

c The absolute values of the terms could just be the values of the index variable. To create an alternating + and − pattern, we can multiply by $(-1)^k$.

$$-3 + 4 - 5 + 6 - 7 + 8 - 9 + 10 = \sum_{k=3}^{10} (-1)^k k$$

d In a fraction, we can model the numerator and denominator separately.

$$\frac{\sqrt{1}}{4} + \frac{\sqrt{2}}{9} + \frac{\sqrt{3}}{16} + \frac{2}{25} + \frac{\sqrt{5}}{36} = \sum_{k=1}^{5} \frac{\sqrt{k}}{(k+1)^2}$$
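We can confirm each answer by computing both sides in Python: the sum written out term by term and the Σ form with the same index range (remembering that Python's `range` excludes its upper end).

```python
import math

# Each pair: (sum written out term by term, the same sum in Sigma form)
a = (4 + 7 + 10 + 13 + 16 + 19 + 22,
     sum(4 + 3 * k for k in range(0, 7)))
b = (2 + 6 + 18 + 54 + 162 + 486,
     sum(2 * 3 ** k for k in range(0, 6)))
c = (-3 + 4 - 5 + 6 - 7 + 8 - 9 + 10,
     sum((-1) ** k * k for k in range(3, 11)))
d = (math.sqrt(1) / 4 + math.sqrt(2) / 9 + math.sqrt(3) / 16 + 2 / 25 + math.sqrt(5) / 36,
     sum(math.sqrt(k) / (k + 1) ** 2 for k in range(1, 6)))

for name, (written, sigma_form) in zip("abcd", (a, b, c, d)):
    print(name, written, sigma_form)
```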

Question 1.4.4
How Do We Compute the Area Under a Graph?

Suppose we would like to know the area below the graph y = f (x) between x = a and x = b. We
approximate this area by rectangles. We can improve these approximations and take a limit of such
improvements to compute the actual area. Here is the procedure.
1 Divide [a, b] into n subintervals, of lengths $\Delta x_i$.
2 Pick a point $x_i^*$ in each subinterval.
3 Evaluate $f(x_i^*)$, which is the height of the graph above $x_i^*$.
4 Produce a rectangle of height $f(x_i^*)$ and width $\Delta x_i$ over each subinterval.
5 Sum the areas of these rectangles. This is an approximation of the actual area.
6 Take a limit of such approximations as $|\Delta x|$, the largest of the $\Delta x_i$, goes to 0.
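The six steps translate almost line for line into code. In this sketch, `riemann_sum` (our own name) takes an arbitrary partition and one test point per subinterval (steps 1 to 5), and `uniform_midpoint` is a convenience wrapper using equal widths (b − a)/n with midpoints as test points. A computer cannot take the limit in step 6, so instead we watch the sums as the widths shrink; for f(x) = x^2 on [0, 1] the integral is 1/3.

```python
def riemann_sum(f, partition, tags):
    """Sum of f(x_i*) * Delta x_i.

    partition: increasing list [x_0 = a, x_1, ..., x_n = b]   (step 1)
    tags: one test point x_i* inside each subinterval          (step 2)
    """
    total = 0.0
    for left, right, x_star in zip(partition, partition[1:], tags):
        if not left <= x_star <= right:
            raise ValueError("each test point must lie in its subinterval")
        total += f(x_star) * (right - left)   # steps 3-5: height * width, summed
    return total

def uniform_midpoint(f, a, b, n):
    """Equal widths Delta x = (b - a)/n, midpoints as test points."""
    dx = (b - a) / n
    partition = [a + i * dx for i in range(n + 1)]
    tags = [(l + r) / 2 for l, r in zip(partition, partition[1:])]
    return riemann_sum(f, partition, tags)

def square(x):
    return x * x

# Unequal subintervals are allowed by the definition
print(riemann_sum(square, [0.0, 0.5, 0.6, 1.0], [0.25, 0.55, 0.8]))
# Step 6, approximately: shrink the widths and watch the sums approach 1/3
for n in (10, 100, 1000):
    print(n, uniform_midpoint(square, 0.0, 1.0, n))
```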

Figure: The area under y = f (x) approximated by rectangles


Definition

We define the definite integral of f(x) over [a, b] to be

$$\int_a^b f(x)\,dx = \lim_{|\Delta x| \to 0} \sum_i f(x_i^*)\,\Delta x_i$$

where the limit is taken over all divisions of [a, b], $\Delta x_i$ is the length of the ith subinterval, $x_i^*$ is a point in the ith subinterval, and $|\Delta x|$ is the largest $\Delta x_i$.

Notice there is no requirement that the subintervals be the same length. Because of this, we don’t take a limit as n approaches ∞. For instance, using a large number of rectangles over $\left[a, \frac{a+b}{2}\right]$ and only a single rectangle over $\left[\frac{a+b}{2}, b\right]$ will not give us a good approximation, no matter how many rectangles we use. Instead we take a limit as the largest $\Delta x_i$ approaches 0.
In practice, we get the same limit whether the subintervals are equal length or not. It is common to use the same $\Delta x = \frac{b-a}{n}$ for each subinterval.
The definite integral almost solves our area problem, but wherever f(x) < 0, the product $f(x_i^*)\,\Delta x_i$ will be negative.

Theorem
If f(x) > 0 on [a, b] then $\int_a^b f(x)\,dx$ computes the area under y = f(x) over [a, b]. In general $\int_a^b f(x)\,dx$ computes the signed area between y = f(x) and the x-axis, where area above the axis counts as positive, and area below the axis counts as negative.

Since integrals are limits, they inherit two laws from limits. The third can be taken from geometry,
setting the area of a region equal to the sum of the areas of two subregions.

Integral Laws

$$\int_a^b \big(f(x) + g(x)\big)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx \qquad \text{(Sum Rule)}$$

$$\int_a^b c f(x)\,dx = c \int_a^b f(x)\,dx \qquad \text{(Constant Multiple Rule)}$$

$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \qquad \text{(Union Rule)}$$

Question 1.4.5
How Do We Evaluate an Integral?

The limit form of an integral is usually impossible to evaluate directly. Instead we use a powerful
pair of theorems.

Theorem [The First Fundamental Theorem of Calculus]


Given a function f(x), let $g(x) = \int_a^x f(t)\,dt$. At any x where f is continuous, g′(x) = f(x).

To prove this, we use the definition of a derivative.

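$$\begin{aligned}
g'(x) &= \lim_{h\to 0} \frac{g(x+h) - g(x)}{h}\\
&= \lim_{h\to 0} \frac{\int_a^{x+h} f(t)\,dt - \int_a^{x} f(t)\,dt}{h}\\
&= \lim_{h\to 0} \frac{\int_x^{x+h} f(t)\,dt}{h} && \text{union rule}
\end{aligned}$$

As the interval [x, x + h] shrinks, the values of f over that interval can be made arbitrarily close to f(x), since f is continuous. Thus $\int_x^{x+h} f(t)\,dt$ is close to f(x)·h, the area of a rectangle of height f(x) and width h, with an error that is small even compared to h. Therefore

$$\lim_{h\to 0} \frac{\int_x^{x+h} f(t)\,dt}{h} = f(x)$$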

Figure: g(x + h) − g(x) represented as an area
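The proof can be illustrated numerically. In the sketch below we take f(t) = cos t and a = 0 (our choice), approximate g(x) = ∫_0^x cos t dt with a midpoint sum, and compare the difference quotient (g(x + h) − g(x))/h with f(x) = cos x as h shrinks.

```python
import math

def g(x, a=0.0, n=20_000):
    """Midpoint-rule approximation of g(x), the integral of cos(t) from a to x."""
    step = (x - a) / n
    return sum(math.cos(a + (i + 0.5) * step) for i in range(n)) * step

x = 1.2
for h in (0.1, 0.01, 0.001):
    quotient = (g(x + h) - g(x)) / h   # the difference quotient from the proof
    print(f"h={h}: quotient={quotient:.6f}, f(x)=cos(x)={math.cos(x):.6f}")
```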

The main use of the First Fundamental Theorem of Calculus is to prove the Second Fundamental
Theorem of Calculus.

Theorem [The Second Fundamental Theorem of Calculus]

Let f(x) be a continuous function on [a, b]. If F(x) is an antiderivative of f(x), then
$$\int_a^b f(x)\,dx = F(b) - F(a)$$


This follows immediately from the First Fundamental Theorem. If we continue to define $g(x) = \int_a^x f(t)\,dt$, then

$$\begin{aligned}
\int_a^b f(x)\,dx &= \int_a^b f(x)\,dx - \int_a^a f(x)\,dx\\
&= g(b) - g(a)
\end{aligned}$$

By the First Fundamental Theorem, g(x) is an antiderivative of f(x). If we instead pick a different antiderivative F(x), then F(x) = g(x) + c, and

$$\begin{aligned}
F(b) - F(a) &= g(b) + c - (g(a) + c)\\
&= g(b) - g(a)\\
&= \int_a^b f(x)\,dx
\end{aligned}$$

Because we will be computing F (b) − F (a) frequently, we will develop the following shorthand.

Notation

The quantity F(b) − F(a) can be denoted

$$F(x)\Big|_a^b$$

This relationship between integrals and antiderivatives motivates the following vocabulary.

Notation

The general antiderivative of f(x) is also called an indefinite integral and is denoted
$$\int f(x)\,dx.$$

Example 1.4.6
A Definite Integral

Compute $\int_2^5 x^2\,dx$.

Solution

$$\begin{aligned}
\int_2^5 x^2\,dx &= \frac{x^3}{3}\Big|_2^5\\
&= \frac{5^3}{3} - \frac{2^3}{3}\\
&= \frac{125 - 8}{3}\\
&= 39
\end{aligned}$$
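A quick cross-check of the example, as a sketch: the Second Fundamental Theorem gives F(5) − F(2) exactly (we use Python's `fractions.Fraction` to avoid rounding), and a midpoint Riemann sum with 1000 equal subintervals should land very close to the same number.

```python
from fractions import Fraction

def F(x):
    # An antiderivative of f(x) = x^2
    return Fraction(x) ** 3 / 3

exact = F(5) - F(2)   # F(b) - F(a)

def midpoint_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

approx = midpoint_sum(lambda x: x * x, 2.0, 5.0, 1000)
print(exact, approx)
```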

Question 1.4.7
How Do We Apply the Chain Rule in an Antiderivative?

The chain rule states that

$$\big(f(g(x))\big)' = f'(g(x))\,g'(x).$$

The key insight here is to think of g as a variable, in addition to being a function of x. Typically we rename it with a letter closer to the end of the alphabet, like u. The following substitution theorem uses the chain rule to say that we can integrate with respect to u instead of x.

Theorem

If u(x) is a function of x, then

$$\int_a^b f(u(x))\,u'(x)\,dx = \int_{u(a)}^{u(b)} f(u)\,du$$

This allows us to replace a complicated integrand in x with a simpler one in u. To correctly rewrite
the integral, the bounds must be updated to the corresponding values of u.
We can also apply this to indefinite integrals. If F is an antiderivative of f, then

$$\begin{aligned}
\int f(u(x))\,u'(x)\,dx &= \int f(u)\,du\\
&= F(u) + c\\
&= F(u(x)) + c
\end{aligned}$$

The most common u-substitutions are linear, where u = ax + b.

Example
Z
Compute sin 3x dx

Solution

We will perform a u substitution, using u = 3x.

Z Z
1 u-substitution
sin(3x) dx = sin u du
3
u = 3x
1 du = 3 dx
= − cos u + c
3 1
3 du = dx
1
= − cos(3x) + c
3

Note that we should express our antiderivatives in terms of the original variable (often x), not in
terms of u.

Figure: The graphs y = sin x and y = sin 3x and their related tangent lines

Example 1.4.8
A u-substitution

Compute the integral:


$$\int_0^3 x e^{x^2}\,dx$$

Solution

We start by looking for a candidate for u(x). Since we want the integrand to be f(u(x))u′(x), we note u(x) should be the inner function in some composition. $x^2$ is the natural target. We attempt the substitution, and hope that the remaining factors in the integrand can be expressed in terms of u′(x). We see that our u′(x) dx is 2x dx. Since we only have an x dx in our integrand, we divide by 2.

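$$\begin{aligned}
\int_0^3 x e^{x^2}\,dx &= \int_0^9 \frac{1}{2} e^u\,du && u = x^2,\ du = 2x\,dx,\ \tfrac{1}{2}\,du = x\,dx\\
& && x = 0 \Rightarrow u = 0,\ x = 3 \Rightarrow u = 9\\
&= \frac{1}{2} e^u \Big|_0^9\\
&= \frac{1}{2}\left(e^9 - 1\right)
\end{aligned}$$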
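A numerical check of the substitution, as a sketch: a midpoint sum of the original integrand x e^{x^2} over [0, 3] and a midpoint sum of the substituted integrand (1/2)e^u over [0, 9] should both approach (e^9 − 1)/2 ≈ 4051.04.

```python
import math

def midpoint_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

n = 200_000
in_x = midpoint_sum(lambda x: x * math.exp(x ** 2), 0.0, 3.0, n)  # original integral in x
in_u = midpoint_sum(lambda u: 0.5 * math.exp(u), 0.0, 9.0, n)     # after u = x^2
closed_form = (math.exp(9) - 1) / 2
print(in_x, in_u, closed_form)
```

The integrand in x changes very rapidly near x = 3, which is why we use many subintervals; the substituted integral has the same value even though its integrand and interval are different.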

Section 1.4
Exercises

1.4.1

Q1 Write two different anti-derivatives of f (x) = x + 5.

Q2 Write two different anti-derivatives of f (x) = x3 − 6x2 + x2 .

Q3 Write a general antiderivative of $f(x) = 4\cos x + 6x^2$.

Q4 Suppose $x^4 - \sin(x^3)$ is an antiderivative of f(x). Write three other antiderivatives of f(x). You should do this without computing what f is.

Q5 If F(x) and G(x) are both antiderivatives of f(x), find the value b such that 3F(x) − bG(x) is also an antiderivative of f(x).

Q6 Suppose F and G are both antiderivatives of f(x). Suppose further that $\mathcal{F}$ is an antiderivative of F and $\mathcal{G}$ is an antiderivative of G. Describe the possible values of $\mathcal{F}(x) - \mathcal{G}(x)$.


1.4.2

Q7 Evaluate $\sum_{k=2}^{5} (3k - 2)$.

Q8 Evaluate $\sum_{j=-1}^{4} (j^2 - j)$.

Q9 Write a formula for the value of $\sum_{k=a}^{b} c$.

Q10 We do not need to write a constant multiple rule for Σ notation because we already have one. Explain what rules of mathematics tell us that $\sum_{k=a}^{b} c f(k) = c \sum_{k=a}^{b} f(k)$.

Q11 Explain what’s wrong with the following notation:

$$\sum_{k=1}^{k} 3k^2 + \frac{1}{k}$$

Q12 Consider the sum $\sum_{k=1}^{n} \frac{1}{2^k}$ for a few different values of n. Can you conjecture a formula for this sum (it will depend on n)?

1.4.3

Q13 Write the following sums in Σ notation.

a 3 + 7 + 11 + 15 + 19

b 6 + 12 + 24 + 48 + 96 + 192

c $\frac{3}{4} - \frac{4}{5} + \frac{5}{6} - \frac{6}{7} + \frac{7}{8} - \frac{8}{9}$.

Q14 Write the following sums in Σ notation.

a 5 − 15 + 25 − 35 + 45 − 55 + 65 − 75 + 85 − 95

b $\frac{1}{4} + \frac{4}{16} + \frac{9}{64} + \frac{16}{256} + \frac{25}{1024}$

c $\sqrt{2} + \sqrt{6} + \sqrt{12} + \sqrt{20} + \sqrt{30} + \sqrt{42} + \sqrt{56}$.

1.4.4

Q15 Does $\int_{1/2}^{1} \ln x\,dx$ compute the area under y = ln x over $\left[\frac{1}{2}, 1\right]$? Explain.

Q16 Suppose $\int_a^b f(x)\,dx < 0$. What does this tell you about the graph y = f(x)? Be specific.


Q17 Draw a careful graph of y = x. Use 5 subintervals of [1, 11] to estimate the area beneath the graph over [1, 11]. Use the left endpoints of each subinterval as the test points $x_i^*$.

Q18 Draw a careful graph of y = 3x. Use 3 subintervals of [2, 8] to estimate the area beneath the
graph, with the test points x∗i being the left endpoints of each subinterval.
Q19 Draw the graph of y = 7. Use geometry to evaluate $\int_3^8 7\,dx$.

Q20 Draw the graph of $y = \frac{x}{3} + 1$. Use geometry to evaluate $\int_{-3}^{9} \left(\frac{x}{3} + 1\right) dx$.

1.4.5

Q21 Let $g(x) = \int_5^x f(t)\,dt$. What is g′(8)?

Q22 Let $g(x) = \int_2^x \cos t\,dt$. Is g(x) increasing or decreasing at x = 3? Explain.

Q23 Suppose f(x) is an increasing function. Is $\int_{22}^{31} f'(x)\,dx$ positive or negative?

Q24 Suppose F(x) and G(x) are both antiderivatives of f(x). Given the following incomplete table of values, compute $\int_1^4 f(x)\,dx$.

$$\begin{array}{c|cccccc}
x & 1 & 2 & 3 & 4 & 5 & 6\\
\hline
F(x) & & 7 & & 13 & & 9\\
G(x) & 3 & & 9 & & 10 & 5
\end{array}$$


Q25 Explain the difference between $\int f(x)\,dx$ and $\int_a^b f(x)\,dx$ in a few sentences.

Q26 Compute $\int_0^{\pi} \cos(x)\,dx$. Explain the geometric meaning of your answer in a sentence or two.

1.4.6

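Q27 Compute $\int_1^8 \left(x - \frac{3}{x}\right) dx$.

Q28 Compute $\int_1^4 \frac{1}{t^{3/2}}\,dt$.

Q29 Compute $\int \left(e^x - 6x^2\right) dx$.

Q30 Compute $\int_t^0 \left(\frac{1}{3}e^x + 5\right) dx$.

Q31 Compute $\int \sqrt{t}\,dt$.

Q32 Compute $\int_{10}^{2} \frac{x^2 + 2}{5x}\,dx$.

Q33 Compute $\int \frac{3}{5}\sin y\,dy$.

Q34 Compute $\int_0^2 \left(x^4 - 3x + 2\right) dx$.

Q35 Compute $\int_{\pi/6}^{3\pi/4} 2\cos v\,dv$.

Q36 Compute $\int_0^{\pi} \left(2\sin t + \cos t\right) dt$.
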
1.4.7

Q37 Write some general rules. Suppose F(x) + c is the general antiderivative of f(x).

a What is the antiderivative of f (x + a)?

b What is the antiderivative of f (ax)?


Q38 Assuming that $\int_a^b f(x)\,dx$ exists, argue that it is equal to $\int_{2a}^{2b} \frac{1}{2} f\left(\frac{x}{2}\right) dx$, in the following two ways:

a By appealing to an integration rule.

b By describing the relationship between the graphs of y = f(x) and $y = f\left(\frac{x}{2}\right)$. A picture might help.

1.4.8

Q39 Compute $\int e^{7x}\,dx$.


Z
Q40 Compute 5x + 3 dx.
Q41 Compute $\int \cos\left(\frac{\theta}{3}\right) d\theta$.

Q42 Compute $\int (t - 2)^6\,dt$.

Q43 Compute $\int_0^{1/4} \sin(\pi t)\,dt$.

Q44 Compute $\int_0^3 x^2 e^{x^3}\,dx$.

Q45 Compute $\int (x^5 - 2x)(5x^4 - 2)\,dx$.

Q46 Compute $\int_{\pi/4}^{3\pi/4} \frac{1}{\sin^2 x}\cos(x)\,dx$.

Chapter 2

Advanced Integration and Applications

This chapter covers a variety of methods and applications for single-variable integrals. The first two
sections lay the groundwork for multivariable integration by exploring the connections between integration
and geometry. One section touches on approximation methods for integrals. Other sections prepare us
for our goal: applying integration to probability and statistics.

Contents
2.1 Area Between Curves
2.2 Volumes
2.3 Integration by Parts
2.4 Approximate Integration
2.5 Improper Integrals
2.6 Probability
2.7 Functions of Random Variables
Section 2.1

Area Between Curves


Goals:

1 Use integrals to calculate the geometric area of a region.

The Fundamental Theorem of Calculus relates the change in a function to the area under a curve.
Modern scientists have seized upon integration as a way to study change, whether they are measuring
a chemical reaction, the position of a particle, or economic activity. The geometric applications are
irrelevant to most consumers of calculus.
Historically, these methods were exciting to scholars who had been limited to area formulas for circles
and triangles. Now any shape that was defined by an algebraic function was fair game. In this section
we push integration beyond areas under a curve to areas bounded by two or more curves. This gives us
the ability to measure a wide variety of shapes, but geometry is not our end goal. Instead the goal is
to study how integration works on these oddly shaped regions. We will find that the methods of this
section return to relevance when it is time to integrate functions of more than one variable.

Question 2.1.1
How Is the Integral Related to Geometric Area?

When we defined the definite integral, we were attempting to compute the area under a curve.
However, our methods introduced a glitch. Consider the following example.
This region has an area of $\frac{38}{3}$, but $\int_3^8 f(x)\,dx = -\frac{38}{3}$.

Figure: A region below the x-axis and above y = f (x)

We were taught that the integral does not measure geometric area, but instead signed area. Area
below the x axis counts as negative.
Why does this happen? Recall the definition of the definite integral.

Definition

The integral is computed by the following limit

$$\int_a^b f(x)\,dx = \lim_{\Delta x \to 0} \sum_i f(x_i^*)\,\Delta x$$

This limit takes better and better approximations of the area. The approximation is a sum of
rectangles, whose area is height × width. All the rectangles have width ∆x, but their heights vary, and
we used the height of the graph y = f (x) to measure them. This works fine when f (x) is positive.
When f (x) < 0, the product f (x∗i )∆x computes a negative “area” for each rectangle.

Figure: An approximation by rectangles of negative height

In this example the resolution of this glitch is straightforward. Eliminating the negative sign, we
obtain the correct area. However, we can imagine a region that requires a more sophisticated approach.

Question 2.1.2
What Integral Computes the Geometric Area Between Two Graphs?

Suppose we want to know the area between the graphs y = f (x) and y = g(x) for some interval
a ≤ x ≤ b. We can approximate this by rectangles. As the number of rectangles increases, the
approximation becomes more accurate.


Figure: The region between y = f (x) and y = g(x), approximated by rectangles

Let’s derive a formula for this rectangle approximation.

We let $x_i^*$ denote the left endpoint of each subinterval. The rectangles have width Δx and height $g(x_i^*) - f(x_i^*)$. We compute:

$$\text{Area} = \lim_{\Delta x \to 0} \sum_i \big(g(x_i^*) - f(x_i^*)\big)\,\Delta x$$

This limit exactly matches the definition of a definite integral. The function being integrated is g(x) − f(x). Thus we can compute the area below y = g(x) and above y = f(x) by integrating g(x) − f(x) from a to b.

Main Idea

The area above y = f(x) and below y = g(x) from x = a to x = b is computed

$$\int_a^b \big(g(x) - f(x)\big)\,dx.$$
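Here is a sketch of the Main Idea as a midpoint-rule computation. The function `area_between` (our own name) also checks at every sample point that the "top" curve really is on top, raising an error otherwise, since the formula gives geometric area only in that case. We try it on y = x − √x above y = √x on [6, 12] (the pair of curves from Example 2.1.3), whose exact area, from the antiderivative x^2/2 − (4/3)x^{3/2}, is 54 − (4/3)(12^{3/2} − 6^{3/2}) ≈ 18.17.

```python
import math

def area_between(top, bottom, a, b, n=100_000):
    """Midpoint-rule estimate of the integral of top(x) - bottom(x) over [a, b].

    Raises ValueError if bottom is above top at a sample point, because then
    the integral would no longer measure geometric area.
    """
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        height = top(x) - bottom(x)
        if height < 0:
            raise ValueError(f"bottom curve is above top curve at x = {x}")
        total += height * dx
    return total

area = area_between(lambda x: x - math.sqrt(x), math.sqrt, 6.0, 12.0)
exact = 54 - (4 / 3) * (12 ** 1.5 - 6 ** 1.5)
print(area, exact)
```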

Example 2.1.3
The Area Between Two Curves

Suppose we want to compute the area between $y = \sqrt{x}$ and $y = x - \sqrt{x}$ from x = 6 to x = 12.
How do we know which graph is on top and which is on the bottom?
The height of a graph is the value of the function. We can evaluate the function at some x in the interval [6, 12]. The most convenient x is x = 9.

$$\sqrt{9} = 3 \qquad 9 - \sqrt{9} = 6$$

So at x = 9, $y = x - \sqrt{x}$ is above $y = \sqrt{x}$.

Exercise
We’ve established that at x = 9, $y = x - \sqrt{x}$ is above $y = \sqrt{x}$. Unfortunately there are infinitely many points between x = 6 and x = 12. How can we decide which graph is on top at each of them?

1 Does the graph of $y = \sqrt{x}$ intersect the graph of $y = x - \sqrt{x}$ between x = 6 and x = 12?

2 What theorem could we use to argue that if $y = \sqrt{x}$ is ever above $y = x - \sqrt{x}$ then the graphs must have intersected?

Solution

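1 To test where the graphs intersect, we set the functions equal to each other.

$$\begin{aligned}
\sqrt{x} &= x - \sqrt{x}\\
0 &= x - 2\sqrt{x}\\
0 &= \sqrt{x}\left(\sqrt{x} - 2\right) && \text{(factor)}\\
\sqrt{x} &= 0 \text{ or } \sqrt{x} - 2 = 0\\
x &= 0 \text{ or } 4
\end{aligned}$$

Neither of these is in [6, 12].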


2 The Intermediate Value Theorem tells us that these functions cannot switch places without intersecting. Switching places means that the difference $(x - \sqrt{x}) - \sqrt{x}$ would change from positive to negative. As this is a continuous function, the Intermediate Value Theorem says there must be some point along the way where $(x - \sqrt{x}) - \sqrt{x} = 0$. We’ve already shown that all those points lie outside the interval, so we can conclude that $y = x - \sqrt{x}$ is above $y = \sqrt{x}$ over the entire interval [6, 12].

The figure below confirms that $y = x - \sqrt{x}$ is on top for all x in [6, 12].

Figure: An approximation of the area between $y = x - \sqrt{x}$ and $y = \sqrt{x}$

Main Ideas

Plugging a test point into f (x) and g(x) tells us which graph is above the other.

If the functions are continuous, then solving f (x) = g(x) computes the only points where the
graphs can change positions.

Example 2.1.4
The Area Enclosed by Two Curves

Set up an integral that computes the area enclosed between the curves $y = x^2$ and $y = 3 - x - x^2$.

Figure: The area enclosed by two parabolas

Solution

These are parabolas. If they enclose any area, the downward facing parabola must lie above the upward facing parabola. This tells us we are integrating

$$\int_a^b \left(3 - x - x^2 - x^2\right) dx$$

But what are the bounds of integration? To know this we must find the points where the graphs
intersect.

$$\begin{aligned}
3 - x - x^2 &= x^2\\
0 &= 2x^2 + x - 3\\
0 &= (2x + 3)(x - 1)\\
x &= -\frac{3}{2} \text{ or } 1
\end{aligned}$$

The area is computed

$$\text{Area} = \int_{-3/2}^{1} \left(3 - x - x^2 - x^2\right) dx$$
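This sketch finds the bounds with the quadratic formula applied to 2x^2 + x − 3 = 0 and then approximates the integral with a midpoint sum. Evaluating the integral by hand gives 125/24 ≈ 5.208, which the sum should reproduce.

```python
import math

def top(x):
    return 3 - x - x ** 2

def bottom(x):
    return x ** 2

# Intersections solve 2x^2 + x - 3 = 0 (quadratic formula)
A, B, C = 2.0, 1.0, -3.0
disc = B * B - 4 * A * C
left = (-B - math.sqrt(disc)) / (2 * A)
right = (-B + math.sqrt(disc)) / (2 * A)

n = 100_000
dx = (right - left) / n
area = sum(top(left + (i + 0.5) * dx) - bottom(left + (i + 0.5) * dx)
           for i in range(n)) * dx
print(left, right, area)
```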

Main Ideas

To determine the range of x values that define an enclosed region, solve for the intersection points
between the graphs.

Sketching the graphs can be a time-saver and a reality check for your answer.

Example 2.1.5
The Area Enclosed by Two Curves that Intersect More than Twice

Compute the area enclosed by $f(x) = x^3 - 10x$ and $g(x) = 3x^2$.


Solution

To find the intersections we set f(x) = g(x) and solve:

$$\begin{aligned}
x^3 - 10x &= 3x^2\\
x^3 - 3x^2 - 10x &= 0\\
x(x - 5)(x + 2) &= 0\\
x &= 0,\ 5, \text{ or } -2
\end{aligned}$$

Our region is bounded between x = −2 and x = 5, but one graph does not need to be above the other
for the entire region. The graphs intersect at x = 0 so one graph might be on top for [−2, 0], while the
other is on top for [0, 5]. To find out which is which we could evaluate at test points (we would need
two). Alternately, since we’ve already factored f (x) − g(x) = x(x − 5)(x + 2) we can perform a sign
analysis:

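$$\begin{array}{c|cccc}
 & x < -2 & -2 < x < 0 & 0 < x < 5 & x > 5\\
\hline
x & - & - & + & +\\
x - 5 & - & - & - & +\\
x + 2 & - & + & + & +\\
\hline
f(x) - g(x) & - & + & - & +
\end{array}$$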

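Thus $x^3 - 10x > 3x^2$ on [−2, 0] and $x^3 - 10x < 3x^2$ on [0, 5]. The enclosed area is computed by:

$$\begin{aligned}
\text{Area} &= \int_{-2}^{0} \left(x^3 - 10x - 3x^2\right) dx + \int_0^5 \left(3x^2 - x^3 + 10x\right) dx\\
&= \left(\frac{x^4}{4} - 5x^2 - x^3\right)\Big|_{-2}^{0} + \left(x^3 - \frac{x^4}{4} + 5x^2\right)\Big|_0^5\\
&= (0 - 0 - 0 - 4 + 20 - 8) + \left(125 - \frac{625}{4} + 125 - 0 + 0 - 0\right)\\
&= \frac{407}{4}
\end{aligned}$$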
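A sketch of the same computation in Python: integrate f(x) − g(x) separately between consecutive intersections and add the absolute values. Taking absolute values is valid only because f − g keeps one sign between consecutive intersections, as the sign chart shows. For contrast, we also integrate over [−2, 5] in one piece, which mixes positive and negative signed area.

```python
def diff(x):
    # f(x) - g(x) = x^3 - 3x^2 - 10x = x(x - 5)(x + 2)
    return x ** 3 - 3 * x ** 2 - 10 * x

def integrate(h, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

roots = [-2.0, 0.0, 5.0]   # intersections found by factoring
pieces = [abs(integrate(diff, lo, hi)) for lo, hi in zip(roots, roots[1:])]
area = sum(pieces)                    # should be close to 407/4 = 101.75
naive = integrate(diff, -2.0, 5.0)    # signed area: 8 - 93.75 = -85.75
print(pieces, area, naive)
```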

Main Ideas

With more intersections, we must check the region between each pair of intersections to see which
graph is on top.
It can be more efficient to make a sign analysis chart.
Sketching the graphs may be more difficult. If you can do it, it will corroborate (or correct) your
calculations.

Example 2.1.6
A Region without a Single Top Curve

Compute the area enclosed by the curves y = 1, $y = \frac{16}{x}$, and $y = 2\sqrt{x}$.

We should start by drawing this region and finding the coordinates of the intersections.
There are three intersections to solve for, one using each pair of equations.

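$$\frac{16}{x} = 2\sqrt{x} \;\Rightarrow\; 16 = 2x^{3/2} \;\Rightarrow\; 8 = x^{3/2} \;\Rightarrow\; x = 4$$

$$\frac{16}{x} = 1 \;\Rightarrow\; 16 = x \;\Rightarrow\; x = 16$$

$$2\sqrt{x} = 1 \;\Rightarrow\; \sqrt{x} = \frac{1}{2} \;\Rightarrow\; x = \frac{1}{4}$$

If we write this area as an integral $\int_{1/4}^{16} \big(g(x) - f(x)\big)\,dx$, the top function would need to be piecewise:

$$g(x) = \begin{cases} 2\sqrt{x} & \text{if } \frac{1}{4} \le x \le 4\\[2pt] \frac{16}{x} & \text{if } 4 \le x \le 16 \end{cases}$$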

We don’t know the anti-derivative of a piece-wise function. Instead, we consider a few different ap-
proaches. Since the upper boundary is defined by a different function for different values of x, one
approach is to break the region into two integrals.

Figure: Two subregions whose areas can be expressed by integrals


The area of the region on the left is $\int_{1/4}^{4} \left(2\sqrt{x} - 1\right) dx$. The area of the region on the right is $\int_4^{16} \left(\frac{16}{x} - 1\right) dx$. Adding these together gives the total enclosed area.
Another approach would be to obtain the area by subtraction. Find the following two areas on the
diagram:
$$\int_{1/4}^{16} \left(2\sqrt{x} - 1\right) dx \qquad\qquad \int_4^{16} \left(2\sqrt{x} - \frac{16}{x}\right) dx$$

You should be able to convince yourself that


$$\text{Enclosed Area} = \int_{1/4}^{16} \left(2\sqrt{x} - 1\right) dx - \int_4^{16} \left(2\sqrt{x} - \frac{16}{x}\right) dx$$

Both of these approaches require us to evaluate two integrals. With vertical rectangles that is unavoidable, because our integrals are limits of approximations by rectangles of different heights, and those heights are determined by different enclosing graphs, depending on which x value we measure at. For this particular region, there is a way to avoid this.
Instead we can approximate the region by rectangles of different widths.

Notice the left endpoint always lies on $y = 2\sqrt{x}$ and the right endpoint always lies on $y = \frac{16}{x}$. As the height of the rectangles goes to 0, the approximation becomes exact.
Let’s derive a formula for this rectangle approximation and compute the exact area.

Let Δy be the height of each rectangle. The widths are given by the horizontal distance between the graphs $y = 2\sqrt{x}$ and $y = \frac{16}{x}$ at the heights $y_i^*$ corresponding to the bottom of each rectangle. Horizontal distance is the difference in x values. What x values correspond to $y_i^*$? We can plug in $y_i^*$ and solve for x.

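$$\begin{aligned}
y_i^* &= 2\sqrt{x} & \qquad y_i^* &= \frac{16}{x}\\
\frac{y_i^*}{2} &= \sqrt{x} & x y_i^* &= 16\\
\frac{(y_i^*)^2}{4} &= x & x &= \frac{16}{y_i^*}
\end{aligned}$$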

These computations should be familiar. Finding x in terms of y is called finding the inverse function. These inverse functions give the left and right bounds of our region. To find the area, we take a sum of the areas of these rectangles of different widths. Then we take a limit. Notice that to make the width positive we subtract the smaller x value from the larger x value. Geometrically, this is the right endpoint $\frac{16}{y_i^*}$ minus the left endpoint $\frac{(y_i^*)^2}{4}$.
i

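$$\lim_{\Delta y \to 0} \sum_i \underbrace{\left(\frac{16}{y_i^*} - \frac{(y_i^*)^2}{4}\right)}_{\text{width}} \underbrace{\Delta y}_{\text{height}} = \int_1^4 \left(\frac{16}{y} - \frac{y^2}{4}\right) dy$$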

This limit is an integral, but the variable of integration is y, not x. The bounds of integration are
the set of y values in the region. The lowest point in the region is at y = 1. The highest is at y = 4.
We evaluate the integral using the Fundamental Theorem of Calculus, but with y instead of x.

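$$\begin{aligned} \text{Area Enclosed} &= \int_1^4 \left(\frac{16}{y} - \frac{y^2}{4}\right) dy \\ &= \left[16\ln|y| - \frac{y^3}{12}\right]_1^4 \\ &= \left(16\ln 4 - \frac{64}{12}\right) - \left(16\ln 1 - \frac{1}{12}\right) \\ &= 16\ln 4 - \frac{63}{12} \end{aligned}$$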
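As a sanity check that the vertical and horizontal setups describe the same region, here is a minimal Python sketch (standard library only; the helper name `midpoint` is our own, not from the text) that approximates each integral with a midpoint Riemann sum. The vertical slicing needs two integrals and the horizontal slicing needs one; all three numbers should agree with $16\ln 4 - \frac{63}{12} \approx 16.93$ to several decimal places.

```python
import math

def midpoint(f, a, b, n=100_000):
    """Midpoint Riemann sum of f on [a, b] using n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Vertical slices (dx): the top curve switches at x = 4, so we need two integrals.
area_left = midpoint(lambda x: 2 * math.sqrt(x) - 1, 0.25, 4)
area_right = midpoint(lambda x: 16 / x - 1, 4, 16)
area_dx = area_left + area_right

# Horizontal slices (dy): width = right boundary 16/y minus left boundary y^2/4.
area_dy = midpoint(lambda y: 16 / y - y**2 / 4, 1, 4)

exact = 16 * math.log(4) - 63 / 12
print(f"dx: {area_dx:.8f}  dy: {area_dy:.8f}  exact: {exact:.8f}")
```

The point of the comparison is that only the bookkeeping changes between the two methods; the enclosed region, and hence the number, does not.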

Main Idea

The area to the right of $x = f^{-1}(y)$ and to the left of $x = g^{-1}(y)$ for y from a to b can be computed as
$$\int_a^b \left(g^{-1}(y) - f^{-1}(y)\right) dy.$$

Strategy

Changing an integral to dy may be more work than breaking it into two or more parts. When solving
an area problem, consider both methods and use the one that seems more promising. If you run into
problems with your chosen approach, give the other method a try.

Section 2.1
Exercises

Summary Questions

Q1 What is the geometric significance of f (x)−g(x) in the formula for the area between two graphs?

Q2 How do we determine which curve is the top of a region and which is the bottom? Describe the
difficulties that can arise.

Q3 How do we use boundaries of the form y = g(x) and y = f(x) in a dy-integral to compute geometric area?

Q4 When setting up a dy-integral, how can we visually identify which graph’s function will be subtracted from which?

Q5 An integral can be positive or negative. If we are solving for area (which cannot be negative), describe the steps we take to guarantee our area is positive.

Q6 Explain the difference between “The region enclosed by y = f(x) and y = g(x)” and “The region f(x) ≤ y ≤ g(x).”

2.1.1

Q7 Suppose the graph y = f (x) is above the x-axis.

a How much would the geometric area between y = f(x) and the x-axis for a ≤ x ≤ b increase if the graph were shifted up by k units? Try to argue geometrically or with a visual.

b Would shifting the graph down by k instead decrease the area by the same amount? Draw
a graph for which it wouldn’t.

Q8 How would we use integrals to calculate the geometric area of the shaded region below?

Q9 The expressions
$$\int_a^b |f(x)|\,dx \quad\text{and}\quad \int_a^b f(x)\,dx$$
are not equivalent. Explain why, and draw the graph of a function on which these expressions disagree.

Q10 Given a differentiable function f(x), the signed area between the graph y = f′(x) and the x-axis from x = a to x = b is denoted $\int_a^b f'(x)\,dx$ and is equal to the change in f(x) from x = a to x = b. In what sense does the geometric area between the graph of y = f′(x) and the x-axis represent a change in f(x)?

2.1.2

Q11 Suppose y = f(x) and y = g(x) are below the x-axis. What integral computes the geometric area between them? How does this compare to the situation when they are above the x-axis?

Q12 Here is another way to derive the formula for the area between curves. Consider the functions
graphed here:


a Indicate on the graph what areas are denoted by $\int_a^b f(x)\,dx$ and $\int_a^b g(x)\,dx$. How are they related to the region between y = f(x) and y = g(x)?


b Is $\int_a^b g(x)\,dx - \int_a^b f(x)\,dx$ equivalent to the expression for area we derived in 2.1.2? What integral rule(s) would you apply to justify this?


c If y = f(x) is below the x-axis, how does this change the meaning of $\int_a^b f(x)\,dx$? Does the formula from b still work? Explain.

2.1.3

Q13 Compute the area between y = 4x and $y = x^3$ from x = 3 to x = 5.

Q14 Compute the area between $y = e^x$ and $y = \sin(\pi x)$ from x = −1 to x = 0.

2.1.4


Q15 Compute the area enclosed by y = x and $y = x^2$.

Q16 Compute the area enclosed by $y = x^2 - 5$ and y = 4x.

Q17 Compute the area enclosed by $y = x^2$, y = 2x − 1 and x = −3.


Q18 Compute the area enclosed by y = x + 2 and y = 3 x.

2.1.5

Q19 Compute the area between y = sin x and y = cos x over the interval [0, 2π].

Q20 Erica and Carter were asked to compute the area enclosed by y = 4x and $y = x^3$. They agree that $4x = x^3$ when x = −2 and when x = 2. Erica thinks the area is
$$\int_{-2}^{2} \left(4x - x^3\right) dx$$
Carter thinks it is
$$\int_{-2}^{2} \left(x^3 - 4x\right) dx$$
−2

a Who is correct?

b How do you think the mistake could reasonably have happened, and how can you avoid it?

Q21 Compute the area enclosed by $y = xe^{x^2}$ and $y = ex$.

Q22 Set up an integral or integrals to compute the area of the region enclosed by the curves $f(x) = x^2(x^2 - 4)$ and $g(x) = x^4(x^2 - 4)$.

Q23 Often the top curve of an enclosed region alternates between f (x) and g(x) at each intersection.
Can you explain what about the previous problem caused this pattern to fail?

Q24 Suppose y = f(x) and y = g(x) intersect multiple times, with x = a their leftmost intersection and x = b their rightmost. We can express the area enclosed between them by $\int_a^b |g(x) - f(x)|\,dx$.

a Explain why this formula works.

b Explain why this formula isn’t particularly helpful.


2.1.6


Q25 Compute the area enclosed by y = 6, y = x and y = −2x

Q26 Compute the area enclosed by $y = e^x$, $y = e^{4-x}$, and y = 1.

Q27 You have been taught at least three ways to set up an expression that will compute the area enclosed by (all of) y = 3, y = 3x, y = 9 and x + y = −5. Set up all the methods you know that will do this. You do not need to evaluate them.

Q28 Write the area in the first quadrant enclosed by y = 3x, y = 0, and $x^2 + y^2 = 4$ as a single integral.

Q29 Write the area enclosed by y = x and $y = x^2$ as

a an integral in x

b an integral in y

Q30 Write the area in the first quadrant enclosed by $y = x^2$, $y = 3x^2$, and y = 18 − 3x as

a a sum of integrals in x

b a sum of integrals in y

Extension and Synthesis

Q31 Suppose you’ve found that y = f(x) and y = g(x) intersect at x = a (along with perhaps other places). What could knowing the values of f′(a) and g′(a) tell you about where each graph is above the other? Be as specific as possible.

Q32 Suppose you are given that for all x:

$$f'(x) > 0 \qquad g'(x) < 0$$
We approximate the area between y = f(x) and y = g(x) from x = a to x = b by rectangles, letting the $x_i^*$ be the right endpoints of each subinterval. What can we say about whether the approximation will overestimate or underestimate the true area?

Section 2.2

Volumes
Goals:

1 Recognize cross sections of a solid object.


2 Write the area of each cross section as a function.

3 Compute the volume of a solid.


4 Visualize and compute the volume of a solid of revolution.

The motivation for the definite integral was computing an area. However, the definition turns out
to be more useful than that. With the correct setup, we can express a volume as an integral as well.

Question 2.2.1
What Is Volume?

Dimension

In mathematics, we define the dimension of an object. Dimension measures the number of degrees of
freedom available to a point traveling in the object.

The definition may not match your intuition for dimension. For example, you only encounter a
parabola in two (or more)-dimensional space. However, the parabola itself is one-dimensional. If you
imagine that you are an insect crawling on the parabola, you can only travel forward or backward, not
side to side. If you were small enough, the parabola would seem indistinguishable from a line.

Example

1 A plane is two dimensional. You can travel left/right or up/down.


2 A circle is one dimensional. You can only travel clockwise/counterclockwise.

3 A point is zero dimensional. There is nowhere to travel within it.

We measure objects of different dimensions differently. In all cases, measuring is counting how many
units of measurement fit inside the object. A 6 unit by 3 unit rectangle has area 18 square units, because
18 unit squares can fit inside it. For less regular objects we need to consider parts of square units. This
requires a lot of work to do formally, but the intuition should be straightforward.


Figure: Objects of several dimensions and their units of measurement

We use different names to describe objects and their measurements in different dimensions:

Dimension | Names | Measurement
0 | point | none
1 | line, circle, curve | length
2 | square, polygon, disc, sphere, surface | area
3 | cube, polyhedron, ball, solid | volume

Vocabulary Check

It doesn’t make sense to talk about the volume of a surface. No unit cubes will fit inside it.

Similarly it doesn’t make sense to talk about the area of a solid. Infinitely many unit squares will fit
in any solid. However, solids have boundary surfaces, and we do sometimes measure their areas.

The simplest solid to measure is a (right) prism. If a prism has height h, we can see that each unit
square (or part thereof) in the base has h unit cubes stacked above it. Thus we have

Formula for Volume of a Prism

volume = area of base × height

Figure: A prism divided into unit cubes and its base divided into unit squares.

Here we see the base of the prism and the square units (or parts thereof) that it contains. The prism
has height 3.5. We can see there are 3.5 cubic units above each square unit in the base.
You may be questioning the relevance of studying areas and volumes in the 21st century. Few people
need to compute geometric measurements in their careers. However, geometry is not the end goal of
this investigation.

Remark

Our motivation for studying solids is not to solve geometry problems. Recall that the definite integral
allowed us to express total change as an area:

total change = rate of change × time


$$f(b) - f(a) = \int_a^b f'(t)\,dt$$

This allowed us to use our geometric intuition of areas to better understand rates of change. Similarly, volume will allow us to use geometry to understand different types of rates later on.

Question 2.2.2
How Do We Visualize 3-Dimensional Solids?

Without computer graphics, it can be difficult to visualize anything but the simplest solids. For an arbitrary solid like a lamp or a sculpture, computing the volume by filling it with cubes is a hopeless endeavor (though a computer could make a decent estimate using small enough cubes). In the absence of a computer rendering, how do we give our brains a visual reference, and how can we leverage this to make measurements? We use cross sections.

Definition

A cross section of a solid object is its intersection with some transversal plane.

Transversal means the plane cuts across the solid. In the case of this square-based pyramid, a
transversal plane parallel to the base intersects the pyramid in a square. If it intersects at a different
height, the intersection would be larger or smaller. If it intersects at a different angle, it wouldn’t produce
a square at all.

Figure: A cross section of a pyramid

A solid can be reassembled from its cross sections. This is valuable because cross sections are two-
dimensional, making them easier to draw or visualize. If you have a set of parallel cross sections, you
can imagine them side by side and infer the shape of the original solid.

Figure: A set of parallel cross sections of a solid

Question 2.2.3
How Can We Approximate or Compute the Volume of a Non-Prism Solid?

Suppose we want to find the volume of a pyramid. Different square units of the base have a different
number of cubic units above them. Thus we need a more robust approach than counting cubes.

Figure: A pyramid with its base divided into unit squares

We will approximate the pyramid by prisms, whose bases are cross sections.


Figure: A pyramid approximated by prisms

The key insight is to represent the different heights of these cross sections by the variable x. We can imagine the x-axis running through the solid in the direction of its height. The bases of the prisms are cross sections. We let $x_i^*$ denote the height at which the ith prism’s base lies. The distance between consecutive heights $x_i^*$ is denoted $\Delta x$, which is also the height of each prism. At different heights, we have different cross sections with different areas. Area is what we really care about, since we want to compute the volume of these prisms. We write cross sectional area as a function.

A(x) = Area of the cross section at height x


The sum of the volumes of these prisms can be written:
$$\sum_i A(x_i^*)\,\Delta x.$$
Taking a limit gives the exact volume of the solid:
$$\text{Volume} = \lim_{\Delta x \to 0} \sum_i A(x_i^*)\,\Delta x$$

Notice that this fits the definition of a definite integral, where A(x) is the function being integrated. That is excellent news for us. Instead of having to learn a new way of evaluating this limit, we can use the tools of integration that we already know.

Theorem

If the cross section of a solid, perpendicular to the x-axis, has area A(x) at each x, then the volume of
the solid is
$$\int_a^b A(x)\,dx$$

where a and b are the values of x at the bottom and top of the solid.
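The prism sum $\sum_i A(x_i^*)\,\Delta x$ is easy to compute directly, which makes the limit in the theorem concrete. The sketch below is a minimal illustration, assuming a made-up square pyramid with base side 4 and height 3 (so the cross section at height x is a square of side $4(1 - x/3)$); `prism_volume` is a hypothetical helper name. It places each prism's base at the bottom of its slab. Because the cross sections shrink as x increases, these prisms overestimate, and the sums should fall toward the exact volume $\frac{1}{3}s^2h = 16$ (the pyramid formula from Example 2.2.7) as $\Delta x \to 0$.

```python
def prism_volume(A, a, b, n):
    """Sum of n prism volumes A(x_i*) * dx, with x_i* at the bottom of each slab."""
    dx = (b - a) / n
    return sum(A(a + i * dx) * dx for i in range(n))

s, h = 4.0, 3.0  # hypothetical square pyramid: base side 4, height 3

def A(x):
    """Area of the square cross section at height x."""
    side = s * (1 - x / h)  # side length shrinks linearly from s to 0
    return side**2

for n in (3, 10, 100, 1000):
    print(n, prism_volume(A, 0.0, h, n))
```

With n = 3 the three prisms have volumes 16, 64/9 and 16/9, a total of about 24.9; refining the slabs pulls the sum down toward 16.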

Example 2.2.4
A Solid with Its Cross-Sections Given

Suppose a solid S extends from x = 2 to x = 6 and the cross section at each x is a right triangle of height $\frac{1}{x}$ and base $x^2$. Compute the volume of S.

Solution

We will let the x direction be the height of our solid. Then the cross sectional area at each x is the area
of the triangle at that x.

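$$A(x) = \frac{1}{2}bh = \frac{1}{2}x^2 \cdot \frac{1}{x} = \frac{1}{2}x$$
Integrating this from x = 2 to x = 6 gives the volume.
$$\begin{aligned} \text{Volume} &= \int_2^6 A(x)\,dx \\ &= \int_2^6 \frac{1}{2}x\,dx \\ &= \left[\frac{1}{4}x^2\right]_2^6 \\ &= \frac{1}{4}(36) - \frac{1}{4}(4) \\ &= 8 \end{aligned}$$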

The volume is 8 cubic units.

Example 2.2.5
A Solid Obtained by Rotation

Suppose the region under the graph $y = \frac{5}{x+1}$ from x = 1 to x = 4 is rotated around the x-axis.
Compute the volume of the resulting solid.


Figure: The solid obtained by rotating the region under $y = \frac{5}{x+1}$ about the x-axis

Solution

When we cut the region under the graph perpendicular to the x-axis, we obtain a line segment whose height is the value of the function. When that line segment is rotated around the axis, it sweeps out a disc, with the line segment as its radius. We can use the formula for the area of a disc.
$$A(x) = \pi r^2 = \pi\left(\frac{5}{x+1}\right)^2 = \frac{25\pi}{(x+1)^2}$$
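We apply our volume formula, using the u-substitution u = x + 1, du = dx, with x = 1 ⇒ u = 2 and x = 4 ⇒ u = 5.

$$\begin{aligned} \text{Volume} &= \int_1^4 A(x)\,dx \\ &= \int_1^4 \frac{25\pi}{(x+1)^2}\,dx \\ &= \int_2^5 \frac{25\pi}{u^2}\,du \\ &= \left[-\frac{25\pi}{u}\right]_2^5 \\ &= -\frac{25\pi}{5} + \frac{25\pi}{2} \\ &= \frac{15\pi}{2} \end{aligned}$$

The volume of the solid is $\frac{15\pi}{2}$ cubic units.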

Main Idea

When the region under a graph y = f(x) is rotated around the x-axis, the cross sections are discs of radius f(x). Their areas are $\pi[f(x)]^2$.
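The disc method translates directly into code. The following is a minimal sketch, assuming only the Python standard library (`disc_volume` is our own name): it integrates $A(x) = \pi[f(x)]^2$ with a midpoint sum. Applied to $f(x) = \frac{5}{x+1}$ on the bounds stated in Example 2.2.5, $1 \le x \le 4$, it should agree with the exact value $25\pi\left(\frac{1}{2} - \frac{1}{5}\right) = \frac{15\pi}{2}$ obtained from the antiderivative $-\frac{25\pi}{x+1}$.

```python
import math

def disc_volume(f, a, b, n=10_000):
    """Volume of the solid from rotating the region under y = f(x), a <= x <= b,
    about the x-axis: a midpoint Riemann sum of A(x) = pi * f(x)**2."""
    dx = (b - a) / n
    return sum(math.pi * f(a + (i + 0.5) * dx) ** 2 * dx for i in range(n))

V = disc_volume(lambda x: 5 / (x + 1), 1, 4)
exact = 25 * math.pi * (1 / 2 - 1 / 5)  # from the antiderivative -25*pi/(x + 1)
print(V, exact)
```

Passing f itself (rather than A) keeps the squaring step inside the helper, mirroring the Main Idea: only the radius f(x) changes from problem to problem.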

Example 2.2.6
A Solid Defined by Its Base

Suppose we have a solid S with the following properties:


The base of S is the region enclosed by y = 0 and $y = 4x - x^2$.
The cross-sections of S perpendicular to the x-axis are trapezoids which have one base in the base
of S, another base twice as long, and whose heights are 6 units.

Compute the volume of S.

Solution

We find the x-bounds of S by computing the x-bounds of the base. We solve
$$\begin{aligned} 0 &= 4x - x^2 \\ 0 &= x(4 - x) \\ x &= 0 \text{ or } 4 \end{aligned}$$
So x ranges from 0 to 4. One base of the trapezoid at each x is the segment from y = 0 to $y = 4x - x^2$. Note $4x - x^2 > 0$ when 0 < x < 4. Thus the base $b_1 = 4x - x^2$. The other base is twice as long, so it is $b_2 = 8x - 2x^2$. The height is 6, regardless of x.

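$$\begin{aligned} A(x) &= \frac{1}{2}(b_1 + b_2)h && \text{area of a trapezoid} \\ &= \frac{1}{2}\left(4x - x^2 + 8x - 2x^2\right)6 \\ &= 36x - 9x^2 \\ \text{Volume} &= \int_0^4 \left(36x - 9x^2\right) dx \\ &= \left[18x^2 - 3x^3\right]_0^4 \\ &= 96 \end{aligned}$$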

Figure: A solid with base between two graphs and trapezoidal cross-sections


Main Idea

The cross section of the base of a solid is a segment. If we know what role this segment plays in the
cross section of the solid, we can use the expression for the length of this segment to derive an expression
for A(x).
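To see how the length of the base segment turns into A(x), here is a small sketch for Example 2.2.6 that builds the trapezoid area from that segment and integrates it numerically. It uses composite Simpson's rule, a standard numerical method that happens to be exact for quadratics such as this A(x), so the result should match 96 up to floating-point rounding. The helper names are our own.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for f on [a, b]; n is bumped to an even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def A(x):
    """Cross-sectional area at x for the solid in Example 2.2.6."""
    b1 = 4 * x - x**2   # cross section of the base: segment from y = 0 up to y = 4x - x^2
    b2 = 2 * b1         # the other parallel side is twice as long
    height = 6          # trapezoid height, independent of x
    return 0.5 * (b1 + b2) * height

V = simpson(A, 0, 4)
print(V)
```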

Remark

Notice it is not necessary to be able to visualize the solid to compute its volume from cross sections. It
is not even necessary to know what the cross-sections look like precisely. For instance, our trapezoids
may or may not have a right angle. As long as we can compute the area, the exact shape is irrelevant.

Example 2.2.7
A Solid Described by Measurements

Compute the volume of a pyramid with a square base of side length s and a height of h.

Solution

Let x = 0 be the base of the pyramid and x = h be the vertex. The cross sections are squares. Since the edges of the pyramid are straight, the side lengths of the squares shrink linearly from s at x = 0 to 0 at x = h. The line that goes through these two points is
$$\text{Side length} = -\frac{s}{h}x + s$$
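The cross sections have area
$$A(x) = (\text{Side length})^2 = \left(-\frac{s}{h}x + s\right)^2 = s^2\left(\frac{1}{h^2}x^2 - \frac{2}{h}x + 1\right)$$
We can plug this into the formula for volume.
$$\begin{aligned} \text{Volume} &= \int_0^h s^2\left(\frac{1}{h^2}x^2 - \frac{2}{h}x + 1\right) dx \\ &= s^2\left[\frac{1}{3h^2}x^3 - \frac{1}{h}x^2 + x\right]_0^h \\ &= s^2\left(\frac{1}{3h^2}h^3 - \frac{1}{h}h^2 + h - 0\right) \\ &= s^2\left(\frac{1}{3} - 1 + 1\right)h \\ &= \frac{1}{3}s^2 h \end{aligned}$$
The volume of the pyramid in cubic units is $V = \frac{1}{3}s^2 h$.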
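A quick numerical spot check of $V = \frac{1}{3}s^2h$ is a useful habit after a symbolic derivation. This minimal sketch integrates $A(x) = \left(-\frac{s}{h}x + s\right)^2$ with a midpoint sum for a few arbitrarily chosen (s, h) pairs and prints the result next to the formula; `pyramid_volume_numeric` is a hypothetical helper name.

```python
def pyramid_volume_numeric(s, h, n=10_000):
    """Midpoint Riemann sum of A(x) = (s - (s/h) x)^2 over 0 <= x <= h."""
    dx = h / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        side = -s / h * x + s  # side length of the square cross section at x
        total += side**2 * dx
    return total

# Arbitrary sample pyramids (base side s, height h).
for s, h in [(2.0, 3.0), (5.0, 1.5), (10.0, 7.0)]:
    print(s, h, pyramid_volume_numeric(s, h), s**2 * h / 3)
```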

Section 2.2
Exercises

Summary Questions

Q1 Describe how a cross section of a solid is produced.

Q2 What is the significance of the function A(x) in the formula for the volume of a solid?

Q3 What shapes do we use to approximate the volume of a solid? Why do we choose that shape?

Q4 When we rotate the region under y = f (x) around the x axis, how do we compute the area of
each cross-section?

2.2.1

Q5 Which of the following shapes have (nonzero) volume?

a square
a ball
a sphere
a cube
a cone
a triangle

Q6 Suppose I have a solid S. I tried to fit a unit cube into S but I couldn’t do it, no matter where
I placed the cube or how I rotated it. I conclude that the volume of S is less than 1 unit cube.
What do you think of my conclusion?

Q7 Will the volume of an object be greater if it is measured in cubic centimeters or in cubic inches? Explain using the definition of how we measure volume.

Q8 Suppose I create a solid by stacking a cone on top of a cylinder. How is the volume of my
new solid related to the volume of the cone and the volume of the cylinder? Explain using the
definition of how we measure volume.


2.2.2

Q9 Let S be a sphere of radius 5 centered at the origin. What are the cross sections, perpendicular
to the x-axis? How do they change as you travel along the axis from −5 to 5?

Q10 Describe or draw the cross sections of the pyramid below when it is cut by planes parallel to the
one pictured.

Q11 Suppose all of the cross sections of a solid S, perpendicular to the height, are identical (same shape and same size). What kind of solid is S?

Q12 Describe the cross sections of a cube

a perpendicular to an edge.

b perpendicular to the line connecting the midpoints of two opposite edges.

c perpendicular to the diagonal that connects two opposite vertices.

2.2.3

Q13 Suppose I’m trying to approximate the volume of a solid S of height 12 using four prisms of equal height. Suppose those prisms have volumes 5.1, 6, 7.2 and 9.6.

a What is the approximate volume of S?

b What are the areas of the cross sections I used to produce each prism?

Q14 Suppose I’m trying to approximate the volume of the half-ball below by prisms. I subdivide the
height into n subheights and use the cross section at the left hand side of each as the base of each
prism. Will I overestimate or underestimate the volume? Explain how you know in a sentence or
two.

Q15 Produce an approximation of the volume of a pyramid with height 9 and square base of side length 6 using 3 prisms. There are multiple correct answers to this, corresponding to different choices of where to take the cross sections.

Q16 Suppose a solid S has height 16. Suppose all of its cross-sections perpendicular to the height
have a different shape, but all of those shapes have area 5.

a What is the volume of S?

b Do you really need calculus to solve part a? Discuss.

2.2.4

Q17 Compute the volume of the solid between x = 0 and x = 3 whose cross sections at each x are squares of side length $e^x$.

Q18 Compute the volume of the solid between x = 0 and x = 2 whose cross sections at each x are trapezoids of bases x + 1 and x + 3 and height $x^2$.

Q19 Compute the volume of the solid whose cross sections, perpendicular to the x-axis, are triangles whose bases lie between y = 3x and $y = x^2$ from x = 0 to x = 3 and whose heights are equal to the length of their bases.


Q20 Compute the volume of a solid between x = 1 and $x = e^2$ whose cross sections perpendicular to the x-axis are rectangles of base ln x and height $\frac{\ln x}{x}$.

2.2.5


Q21 Compute the volume of the solid created by rotating the region under y = x from x = 0 to
x = 9 around the x-axis.

Q22 Consider the semidisk of radius 3 below:

a Write a function y = f (x) that defines the boundary of this semidisk.

b Suppose this semidisk is rotated around the x-axis. Describe the resulting solid.

c Compute A(x), the area of the cross section at each value of x.

d Write and evaluate an integral that computes the volume of the solid of rotation.

Q23 Compute the volume of the solid created by rotating the region under $y = 4 - x^2$ from x = −2 to x = 2 about the x-axis.

Q24 Compute the volume of the solid created by rotating a trapezoid with vertices (2, 0), (5, 0), (5, 8) and (2, 2) around the x-axis.

2.2.6

Q25 Compute the volume of a solid whose base is the triangle under $y = -\frac{1}{2}x + 3$ in the first quadrant and whose cross sections, perpendicular to the x-axis, are triangles of height 8.

Q26 Compute the volume of a solid whose base is the region enclosed by $y = \sqrt{x}$ and $y = \frac{x}{2}$ and whose cross sections, perpendicular to the x-axis, are squares.

Q27 Compute the volume of a solid whose base is a right triangle with legs 4 and 3 and whose cross
sections, perpendicular to the leg of length 4, are semicircles with their diameter in the base.

Q28 Compute the volume of a solid S whose base is the unit disc and whose cross sections perpendicular
to the x-axis are isosceles right triangles, with one leg in the base.

Extension and Synthesis

Q29 Let D be the region enclosed by $y = x^2 - 6x$ and the x-axis.

a Set up an integral that will compute the geometric area of D. You do not need to evaluate
it.

b Let S be a solid whose base is D and whose cross sections perpendicular to the x-axis are
semicircles with their diameter in D. Set up an integral that will compute the volume of S.
You do not need to evaluate it.

Q30 Consider the solid obtained by rotating the triangle below around the x-axis.

a Describe the shape of the cross sections. Which measurements of this shape depend on x?

b Compute a formula for A(x), the area of the cross section at each value of x.

c Compute the volume of the solid.


Q31 A solid S of height 12 has the following cross-sectional areas A(x) at height x. How would you approximate the volume?

x | A(x)
1 | 10
5 | 12
7 | 11
10 | 7
12 | 2

Section 2.3

Integration by Parts
Goals:
1 Use the integration by parts formula to find anti-derivatives and definite integrals.
2 Choose appropriate decompositions for integrating by parts.
3 Recognize when applying the formula multiple times will be fruitful.
The product rule gives us a reliable method for computing derivatives of products. If you can
differentiate each factor in a product, you can differentiate the entire product. This is not the case for
integration. In this section we add another tool to our limited tool set for integrating a product of two
functions. Even with this method, many problems will be permanently out of reach.

Question 2.3.1
How Do We Compute an Anti-Derivative of a Product of Two Functions?

We reversed the chain rule (which computes derivatives) to compute anti-derivatives of certain
functions. This method is called u-substitution. The du term means that we often end up integrating
a product of functions with this method.

Example
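Compute the integral: $$\int_0^3 xe^{x^2}\,dx$$

Solution

Use the u-substitution $u = x^2$, $du = 2x\,dx$, with x = 0 ⇒ u = 0 and x = 3 ⇒ u = 9.
$$\begin{aligned} \int_0^3 xe^{x^2}\,dx &= \int_0^9 \frac{1}{2}e^u\,du \\ &= \frac{1}{2}\Big[e^u\Big]_0^9 \\ &= \frac{1}{2}\left(e^9 - 1\right) \end{aligned}$$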
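A numerical cross-check of this u-substitution result, sketched with composite Simpson's rule in plain Python (the `simpson` helper is our own): the integral of $xe^{x^2}$ over [0, 3] should match $\frac{1}{2}(e^9 - 1) \approx 4051.04$.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule for f on [a, b]; n is bumped to an even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

numeric = simpson(lambda x: x * math.exp(x**2), 0, 3)
exact = (math.exp(9) - 1) / 2
print(numeric, exact)
```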

Main Idea

u-substitution is extremely fragile. Our example relies on the fact that the factor x is a constant multiple of the derivative of the inner function, $x^2$.

Since the chain rule can only produce certain products, we should look for other differentiation rules
that could produce other products. The product rule is the obvious candidate.

Reminder

The Product Rule states that if f (x) and g(x) are differentiable, then

$$[f(x)g(x)]' = f'(x)g(x) + g'(x)f(x).$$

Example
Compute $\int x^2\cos x + 2x\sin x\,dx$

Solution

This integrand looks like it might be the output of the product rule. If we write
$$f'(x)g(x) + g'(x)f(x) = x^2\cos x + 2x\sin x$$
we can match up the factors as
$$\begin{aligned} f(x) &= \sin x & f'(x) &= \cos x \\ g(x) &= x^2 & g'(x) &= 2x \end{aligned}$$
Since $\frac{d}{dx}\left(x^2\sin x\right) = x^2\cos x + 2x\sin x$, we can conclude
$$\int x^2\cos x + 2x\sin x\,dx = x^2\sin x + c$$

If anything, this is more fragile than u-substitution. It requires a sum of compatible products. How can we make the formula $[f(x)g(x)]' = f'(x)g(x) + g'(x)f(x)$ more useful?
A formula that applies to a single product instead of a sum of two products would be much more useful. We can obtain it by subtracting.


$$\begin{aligned} f'(x)g(x) + g'(x)f(x) &= [f(x)g(x)]' && \text{product rule} \\ \int f'(x)g(x) + g'(x)f(x)\,dx &= f(x)g(x) + c && \text{integrate both sides} \\ \int f'(x)g(x)\,dx + \int g'(x)f(x)\,dx &= f(x)g(x) + c && \text{sum rule of integrals} \\ \int g'(x)f(x)\,dx &= f(x)g(x) - \int f'(x)g(x)\,dx && \text{subtract from both sides} \end{aligned}$$

Notice we don’t need the “+c” anymore. Both sides contain an indefinite integral so the possible
constant of difference is built in on both sides. We can make one further move to simplify the equation.
Since g ′ (x)dx is the differential of g(x) and f ′ (x)dx is the differential of f (x), it is convenient to
represent these functions with variables. u and v are the traditional choices here.
This method is called integration by parts. Here is the formal statement.
Theorem
Suppose an integral can be written $\int u\,dv$ where

u is a function (more precisely u(x)),
and dv is a differential (more precisely v′(x) dx).

We can apply the following formula:
$$\int u\,dv = uv - \int v\,du$$

The integration by parts formula was not difficult to derive. The more pressing question is whether it is useful. It replaces the problem of evaluating $\int u\,dv$ with a new problem: evaluating $\int v\,du$. We need to see some examples to determine whether it is ever any help at all.
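Because the formula is an identity, it can be spot-checked numerically on a definite integral, where it reads $\int_a^b u\,dv = \big[uv\big]_a^b - \int_a^b v\,du$. The sketch below picks a hypothetical pair, $u(x) = \ln x$ and $v(x) = \frac{1}{3}x^3$ on [1, 2], writes $dv = v'(x)\,dx$ and $du = u'(x)\,dx$, and evaluates both sides with Simpson's rule. All function names are our own.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for f on [a, b]; n is bumped to an even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Hypothetical example pair: u(x) = ln x and v(x) = x^3 / 3 on [1, 2].
def u(x):
    return math.log(x)

def u_prime(x):
    return 1 / x

def v(x):
    return x**3 / 3

def v_prime(x):
    return x**2

a, b = 1.0, 2.0
lhs = simpson(lambda x: u(x) * v_prime(x), a, b)            # integral of u dv
rhs = (u(b) * v(b) - u(a) * v(a)) - simpson(lambda x: v(x) * u_prime(x), a, b)  # [uv] minus integral of v du
print(lhs, rhs)
```

Note how the right-hand integral, $\int_1^2 \frac{1}{3}x^2\,dx$, is much easier than the left-hand one. That is exactly the trade integration by parts is meant to make.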

Example 2.3.2
Computing an Anti-derivative Using Integration by Parts

Compute $\int xe^x\,dx$.

Solution

To use integration by parts, we need to look at the integrand $xe^x$ and decide which part is u and which part is dv. Let’s try letting u = x and $dv = e^x\,dx$. The formula says
$$\int u\,dv = uv - \int v\,du.$$
We can replace $\int xe^x\,dx$ by the right hand side, but we need to know what du and v are. We find du by taking the differential of u. We find v by taking an antiderivative of dv.
$$u = x \implies du = dx \qquad\qquad dv = e^x\,dx \implies v = e^x$$
Now we can apply the integration by parts formula.
$$\int xe^x\,dx = xe^x - \int e^x\,dx$$
Notice the integrand of $\int v\,du$ is not a product. It is a function whose antiderivative we know. Thus integration by parts allowed us to replace a product we couldn’t integrate with something we could. Evaluating the integral, we obtain:
$$\int xe^x\,dx = xe^x - e^x + c$$


We can always verify our antiderivatives by differentiating them. In this case
$$\frac{d}{dx}\left(xe^x - e^x + c\right) = \underbrace{xe^x + e^x(1)}_{\text{product rule}} - e^x = xe^x$$
This verifies that we have found the correct antiderivative of $xe^x$.
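The same verification can also be done numerically when the algebra is messy. This minimal sketch (standard-library Python; the helper `central_diff` is our own) approximates $F'(x)$ for $F(x) = xe^x - e^x$ with a symmetric difference quotient and compares it with $xe^x$ at a few sample points.

```python
import math

def F(x):
    """Candidate antiderivative of x e^x (taking c = 0)."""
    return x * math.exp(x) - math.exp(x)

def integrand(x):
    return x * math.exp(x)

def central_diff(g, x, h=1e-5):
    """Symmetric difference quotient, an approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

for x in (-2.0, 0.0, 0.5, 1.0, 3.0):
    print(x, central_diff(F, x), integrand(x))
```

A numerical match at a handful of points is strong evidence, not a proof; the symbolic differentiation above is what actually settles it.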

Remark

The most general antiderivative of $dv = e^x\,dx$ would be $v = e^x + c$. However, we can get away with using a specific antiderivative instead. To convince yourself of this, try redoing the problem with $v = e^x + c$, and see that the c cancels out of your answer.

Question 2.3.3
How Do We Choose u and dv?

What would happen if we again solved $\int xe^x\,dx$ by parts, but set
$$u = e^x \qquad dv = x\,dx?$$

In this case we integrate by parts with $u = e^x$, $dv = x\,dx$, $du = e^x\,dx$, $v = \frac{1}{2}x^2$, and compute
$$\int xe^x\,dx = \frac{1}{2}e^x x^2 - \int \frac{1}{2}x^2 e^x\,dx$$

This is no less correct than our previous application of the formula. It is, however, much less useful. To evaluate this we need to know an anti-derivative of $\frac{1}{2}x^2 e^x$, which seems like an even harder problem than the one we started with. As we can see, the choice of u and dv can determine the success or failure of integration by parts. So what makes a good choice of u and dv?
In integration by parts, u is going to be differentiated. If anything, this usually makes functions simpler. dv is going to be integrated. This could make $\int v\,du$ difficult to compute. The following mnemonic helps us decide which factor to choose as u and which as dv.

I.L.A.T.E.

When deciding which factor of a product should be u and which should be dv, put them into the chart below.

Inverse functions | Logarithms | Algebraic expressions (polynomials) | Trig functions | Exponential functions
(better u’s toward the left, better dv’s toward the right)

Let’s apply I.L.A.T.E. to the following products:

1 $\int x^5\ln x\,dx$

$x^5$ is algebraic. ln x is a logarithm. We should let u = ln x and $dv = x^5\,dx$.

2 $\int x\sin x\,dx$

x is algebraic. sin x is trigonometric. We should let u = x and dv = sin x dx.

3 $\int x^2\tan^{-1}(x)\,dx$

$x^2$ is algebraic. $\tan^{-1}(x)$ is an inverse function. We should let $u = \tan^{-1}(x)$ and $dv = x^2\,dx$.

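By parts with $u = \tan^{-1}(x)$, $dv = x^2\,dx$, $du = \frac{1}{1+x^2}\,dx$, $v = \frac{1}{3}x^3$, followed by the u-substitution $u = 1 + x^2$, $du = 2x\,dx$ (so $x^2 = u - 1$):
$$\begin{aligned} \int x^2\tan^{-1}(x)\,dx &= \frac{1}{3}x^3\tan^{-1}(x) - \int \frac{1}{3}x^3 \cdot \frac{1}{1+x^2}\,dx \\ &= \frac{1}{3}x^3\tan^{-1}(x) - \frac{1}{3}\int \frac{x^3}{1+x^2}\,dx \\ &= \frac{1}{3}x^3\tan^{-1}(x) - \frac{1}{6}\int \frac{x^2}{1+x^2}\,2x\,dx \\ &= \frac{1}{3}x^3\tan^{-1}(x) - \frac{1}{6}\int \frac{u-1}{u}\,du \\ &= \frac{1}{3}x^3\tan^{-1}(x) - \frac{1}{6}\int \left(1 - \frac{1}{u}\right) du \\ &= \frac{1}{3}x^3\tan^{-1}(x) - \frac{1}{6}\left(u - \ln|u|\right) + c \\ &= \frac{1}{3}x^3\tan^{-1}(x) - \frac{1}{6}\left(1 + x^2 - \ln|1 + x^2|\right) + c \end{aligned}$$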
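One way to double-check this antiderivative numerically, sketched in Python: by the Fundamental Theorem of Calculus, $F(b) - F(a)$ should equal $\int_a^b x^2\tan^{-1}(x)\,dx$ on any interval, so we compare it with a Simpson's-rule estimate on a few arbitrarily chosen intervals. `math.atan` is the standard-library inverse tangent; `F` and `simpson` are our own names.

```python
import math

def F(x):
    """Antiderivative of x^2 * arctan(x) found above (taking c = 0)."""
    return x**3 * math.atan(x) / 3 - (1 + x**2 - math.log(1 + x**2)) / 6

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for f on [a, b]; n is bumped to an even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def integrand(x):
    return x**2 * math.atan(x)

for a, b in [(0.0, 1.0), (-2.0, 0.5), (1.0, 4.0)]:
    print((a, b), F(b) - F(a), simpson(integrand, a, b))
```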

Example 2.3.4
Using Integration by Parts More than Once

Compute $\int_0^\pi x^2\cos x\,dx$

Solution

I.L.A.T.E. suggests $u = x^2$ and $dv = \cos x\,dx$. When we apply integration by parts to a definite integral, the $\int v\,du$ maintains the same bounds of integration. The uv is evaluated at those bounds, because it is part of the antiderivative.

By parts with $u = x^2$, $dv = \cos x\,dx$, $du = 2x\,dx$, $v = \sin x$:
$$\int_0^\pi x^2\cos x\,dx = \Big[x^2\sin x\Big]_0^\pi - \int_0^\pi 2x\sin x\,dx$$

Unfortunately, we don’t know the anti-derivative of $2x\sin x$. It is still a product. We can try applying integration by parts again to replace $\int_0^\pi 2x\sin x\,dx$ with something we can evaluate.

By parts (again) with $u = 2x$, $dv = \sin x\,dx$, $du = 2\,dx$, $v = -\cos x$:
$$\begin{aligned} \int_0^\pi x^2\cos x\,dx &= \Big[x^2\sin x\Big]_0^\pi - \int_0^\pi 2x\sin x\,dx \\ &= \Big[x^2\sin x\Big]_0^\pi - \left(\Big[-2x\cos x\Big]_0^\pi - \int_0^\pi -2\cos x\,dx\right) \\ &= \Big[x^2\sin x\Big]_0^\pi + \Big[2x\cos x\Big]_0^\pi - \Big[2\sin x\Big]_0^\pi \\ &= \big((\pi^2)(0) - (0)(0)\big) + \big((2\pi)(-1) - (0)(1)\big) - \big((0) - (0)\big) \\ &= -2\pi \end{aligned}$$
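A numerical check of this result, as a minimal Python sketch (standard library only; `simpson` is our own helper): composite Simpson's rule applied to $x^2\cos x$ on $[0, \pi]$ should come out very close to $-2\pi \approx -6.2832$. A negative value is expected, since $\cos x < 0$ on most of the interval where $x^2$ is large.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for f on [a, b]; n is bumped to an even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

numeric = simpson(lambda x: x**2 * math.cos(x), 0, math.pi)
print(numeric, -2 * math.pi)
```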

Change of Variables?

Notice that despite defining functions u and v, we continue to work in terms of the variable x. Contrast
this with u-substitution where the variable x can be completely eliminated in a definite integral. That
approach isn’t possible here. We’d have to write v as a function of u. This would be complicated or
impossible.

Example 2.3.5
Using Integration by Parts to Produce an Equation

Z
Compute e2x cos x dx

Solution

I.L.A.T.E. suggests u = cos x and dv = e2x dx. To integrate dv we use a u-substitution. We apply the
integration by parts formula, factoring the − 21 from the integrand:

Z by parts
e2x cos x dx
u = cos x dv = e2x dx
du = − sin x dx v = 12 e2x
Z
1 2x 1
= e cos x − − e2x sin x dx
2 2
Z
1 1
= e2x cos x + e2x sin x dx
2 2

Did this help? We don’t know the antiderivative of e2x sin x. Even worse, it doesn’t seem to have
improved in any way. It is just as complicated as what we started with. Our intuition might be to give
up and try another approach. Perhaps I.L.A.T.E. has done us wrong and we should choose a different
u and dv. In this case, however, we should reject that intuition and continue. We’ll apply integration
by parts again.

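By parts again, with $u = \sin x$, $dv = e^{2x}\,dx$, $du = \cos x\,dx$, $v = \frac{1}{2}e^{2x}$:
$$\begin{aligned}
\int e^{2x}\cos x\,dx &= \frac{1}{2}e^{2x}\cos x + \frac{1}{2}\int e^{2x}\sin x\,dx\\
&= \frac{1}{2}e^{2x}\cos x + \frac{1}{2}\left(\frac{1}{2}e^{2x}\sin x - \int \frac{1}{2}e^{2x}\cos x\,dx\right)\\
&= \frac{1}{2}e^{2x}\cos x + \frac{1}{4}e^{2x}\sin x - \frac{1}{4}\int e^{2x}\cos x\,dx
\end{aligned}$$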

Does this help? Again the integrand does not seem to have improved, until we notice that the integrand is exactly what we began with. We can add $\frac{1}{4}\int e^{2x}\cos x\,dx$ to both sides of the equation and solve for $\int e^{2x}\cos x\,dx$ algebraically.

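$$\begin{aligned}
\int e^{2x}\cos x\,dx &= \frac{1}{2}e^{2x}\cos x + \frac{1}{4}e^{2x}\sin x - \frac{1}{4}\int e^{2x}\cos x\,dx\\
\frac{5}{4}\int e^{2x}\cos x\,dx &= \frac{1}{2}e^{2x}\cos x + \frac{1}{4}e^{2x}\sin x + c\\
\int e^{2x}\cos x\,dx &= \frac{4}{5}\left(\frac{1}{2}e^{2x}\cos x + \frac{1}{4}e^{2x}\sin x\right) + c\\
\int e^{2x}\cos x\,dx &= \frac{2}{5}e^{2x}\cos x + \frac{1}{5}e^{2x}\sin x + c
\end{aligned}$$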
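Because this answer came from solving an equation rather than from a direct antiderivative, it is worth checking by differentiation. The sketch below (Python, standard library only; `G`, `integrand` and `central_difference` are illustrative names, not from the text) compares a central-difference estimate of $G'(x)$ with $e^{2x}\cos x$ at a few points.

```python
import math

def G(x):
    """Candidate antiderivative found by solving the equation above (c = 0)."""
    return (2 / 5) * math.exp(2 * x) * math.cos(x) + (1 / 5) * math.exp(2 * x) * math.sin(x)

def integrand(x):
    return math.exp(2 * x) * math.cos(x)

def central_difference(func, x, h=1e-5):
    """Approximates func'(x); the truncation error shrinks like h**2."""
    return (func(x + h) - func(x - h)) / (2 * h)

for x in [-1.0, 0.0, 0.5, 2.0]:
    print(x, central_difference(G, x), integrand(x))
```

The two columns should agree to roughly eight significant figures, which is what we expect if $G' = e^{2x}\cos x$.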


Main Idea

We’ve seen a variety of techniques to apply when integration by parts does not give us an immediate answer. The success of integration by parts depends on the $\int v\,du$ term. You might use the following flow chart to decide how to proceed once you have applied integration by parts.

Is $\int v\,du$ still a product?
- No: Integrate it. You are done.
- Yes: Can you apply a u-sub?
  - Yes: Use u-sub. You are done.
  - No: How does $\int v\,du$ compare to the original integrand?
    - Simpler or similar: Apply integration by parts again.
    - More complicated: Use another approach.
    - A constant multiple of the original: Write an equation and solve.

Section 2.3
Exercises

Summary Questions

Q1 What type of integrands are good candidates for integration by parts?

Q2 How is u handled differently in integration by parts than in u-substitution?

Q3 How is the acronym I.L.A.T.E. used?

Q4 Under what conditions would we want to apply integration by parts more than once?

2.3.1

Q5 Compute $\int \left(\frac{\sin x}{1+x^2} + \cos x\,\tan^{-1} x\right) dx$.

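Q6 Which of the following can be integrated using u-substitution?

$\int e^{x}\,dx$    $\int xe^{x}\,dx$    $\int x^2e^{x}\,dx$    $\int x^3e^{x}\,dx$

$\int e^{x^2}\,dx$    $\int xe^{x^2}\,dx$    $\int x^2e^{x^2}\,dx$    $\int x^3e^{x^2}\,dx$

$\int e^{x^3}\,dx$    $\int xe^{x^3}\,dx$    $\int x^2e^{x^3}\,dx$    $\int x^3e^{x^3}\,dx$

$\int e^{x^4}\,dx$    $\int xe^{x^4}\,dx$    $\int x^2e^{x^4}\,dx$    $\int x^3e^{x^4}\,dx$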

2.3.3

Q7 Evaluate $\int \frac{\ln x}{x^3}\,dx$.

Q8 Evaluate $\int x\sin x\,dx$.

Q9 Use integration by parts to compute $\int \tan^{-1} x\,dx$. Note that $\frac{d}{dx}\tan^{-1} x = \frac{1}{1+x^2}$.

Q10 We can write $\int \ln x\,dx$ as a product: $\int (1)(\ln x)\,dx$.

a How does I.L.A.T.E. suggest we proceed?

b Use integration by parts to compute the antiderivative.

Q11 Compute $\int \sin^{-1} x\,dx$.

Q12 Compute $\int_0^{\pi/4} \tan^{-1} x\,dx$.


2.3.4

Q13 Compute $\int x^2\cos(x+2)\,dx$.

Q14 Compute $\int_0^{1} x^3e^x\,dx$.

Q15 Compute $\int x^{-7}\sin(x^{-2})\,dx$. Hint: The easiest way to split this is not the correct way. You’ll need some factors of $x$ to find an antiderivative of your trig function.

Q16 Compute $\int_0^{\pi} x\sin x\,dx$.

2.3.5

Q17 Compute $\int e^{3x}\sin x\,dx$.

Q18 Compute $\int e^{-x}\cos 2x\,dx$.

Extension and Synthesis

Q19 Compute $\int x^3e^{x^2}\,dx$. Choose your $dv$ carefully. You want something that you can integrate.

Q20 Compute $\int \sin(\ln x)\,dx$. Perform a u-substitution before trying by parts.

Q21 Compute the area enclosed by y = xex and y = ex.

Q22 Let $S$ be a solid between $x = 0$ and $x = 3$ whose cross-sections perpendicular to the $x$-axis are triangles of base $x$ and height $e^x$. Compute the volume of $S$.

Q23 Let S be the solid obtained by rotating the region below y = ln x from x = 1 to x = 5 about
the x-axis. Compute the volume of S.

Q24 Suppose that $S$ is a solid between $x = 1$ and $x = 5$ whose cross-sections (perpendicular to the $x$-axis) are triangles of height $x^2$ and base $\ln x$ at each $x$. Compute the volume of $S$.

Section 2.4

Approximate Integration
Goals:

1 Use several methods to approximate definite integrals.


2 Assess the accuracy of an approximation.

3 Approximate integrals given incomplete information.


One of the first applications of integration is to measure total change. If $v(t)$ is our velocity, $\int_a^b v(t)\,dt$ computes the total displacement between the times $a$ and $b$. In practice, to evaluate such an integral, we need to know the antiderivative of $v$. Can we realistically expect to do this? Except in theoretical situations (say, a physics experiment), we cannot. A person driving a car will not produce a velocity function that can be expressed in terms of algebra or trigonometry. While every continuous function has an antiderivative, that fact doesn’t help us if we don’t know what the antiderivative is or how to evaluate it.
Our best option in these situations is to approximate the integral. For instance, if we measure velocity once per second, we could multiply each velocity by one second to approximate the distance traveled in that second. Adding these up would approximate the total displacement. What we’ve done is approximate the integral by rectangles of width 1. The natural question to ask is: how accurate is such an approximation? How can we make it more accurate? These are the questions we’ll need to address whenever we want to apply calculus to data sets instead of abstract functions.

Question 2.4.1
What $x_i^*$ Can We Use when Approximating an Integral?

Recall the following

Definition

The definite integral is given by the formula
$$\int_a^b f(x)\,dx = \lim_{\Delta x\to 0}\sum_{i=1}^{n} f(x_i^*)\,\Delta x$$
where $\Delta x$ are the lengths of the subintervals of $[a, b]$, and $x_i^*$ is a number in the $i$th subinterval.

Without the limit (which is difficult or impossible to compute anyway) the sums on the right are
approximations of the integral. Once we choose an x∗i for each i, we can evaluate this approximation.
The simplest idea is to just use the left endpoint of each subinterval as x∗i .


Notation
The notation $L_n$ refers to the approximation of $\int_a^b f(x)\,dx$ by $n$ rectangles of equal width $\Delta x = \frac{b-a}{n}$,
$$\sum_{i=1}^{n} f(x_i^*)\,\Delta x,$$
where the $x_i^*$ are the left endpoints of each subinterval.
Similarly $R_n$ refers to the approximation using the right endpoints for $x_i^*$.
Figure: $L_4$ approximation

Example 2.4.2
Computing an Ln Approximation

a Compute an $L_3$ approximation of $\int_{-1}^{5} x^2\,dx$.

b Does $L_3$ overestimate or underestimate the actual value of $\int_{-1}^{5} x^2\,dx$?

Solution

a Let $f(x) = x^2$. The interval $[-1, 5]$ has length $5 - (-1) = 6$. Three rectangles means that $\Delta x = \frac{6}{3} = 2$. We can divide up the interval to find all three subintervals. A diagram is a good way to avoid mistakes.

Subinterval endpoints on the $x$-axis: $-1,\ 1,\ 3,\ 5$

The left endpoints are $-1$, $1$ and $3$. Our approximation is
$$\begin{aligned}
L_3 &= \sum_{i=1}^{3} f(x_i^*)\,\Delta x\\
&= f(x_1^*)\Delta x + f(x_2^*)\Delta x + f(x_3^*)\Delta x\\
&= \Delta x\left(f(x_1^*) + f(x_2^*) + f(x_3^*)\right)\\
&= 2\left((-1)^2 + 1^2 + 3^2\right)\\
&= 22
\end{aligned}$$
b When the function increases, it has more signed area beneath it than the left-endpoint rectangles. When it decreases it has less. $f(x) = x^2$ increases and decreases, but on the interval $[-1, 5]$ it spends much more time increasing than decreasing. Thus we expect that $L_3$ underestimates the true integral. We can verify our intuition with a computation.
$$\int_{-1}^{5} x^2\,dx = \frac{x^3}{3}\Big|_{-1}^{5} = \frac{126}{3} = 42 > 22$$
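The left- and right-endpoint sums are easy to automate. This is a minimal Python sketch, assuming $n$ equal subintervals of width $\Delta x = (b-a)/n$; the function names `left_sum` and `right_sum` are our own.

```python
def left_sum(f, a, b, n):
    """L_n: n equal subintervals, test point x_i* = left endpoint."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

def right_sum(f, a, b, n):
    """R_n: n equal subintervals, test point x_i* = right endpoint."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(1, n + 1)) * dx

def square(x):
    return x**2

print(left_sum(square, -1, 5, 3))   # 22.0, matching part a
print(right_sum(square, -1, 5, 3))  # 70.0
```

Since $x^2$ increases over most of $[-1, 5]$, $L_3 = 22$ falls below the exact value 42, while $R_3 = 70$ lies above it.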

Question 2.4.3
How Accurate is an Ln or Rn Approximation?

An approximation is much more useful if we have some idea of how accurate (or inaccurate) it might be. We quantify this inaccuracy with the error.


Definitions

The error in an approximation is given by

error = approximated value − actual value


In a real world approximation, we do not know the exact error (why?). We will settle for putting a
bound on error. This is a number N such that we are sure that

|error| ≤ N.

Determining error bounds can be difficult. Here are some questions to ask.
1 In what circumstances is the approximation exact?
2 What property or measurement seems to correspond to the amount of error?

3 Is there a “worst case scenario” associated to that property or measurement?


The following exercise explores these questions.

Exercise

a Draw a function for which Ln is always an overestimate.

b Draw a function for which Ln is always an underestimate.

c What has to be true of a function for Ln to always be exact?

d What familiar calculus measurement appears to determine whether you are in the situations you described in a through c?

Solution

a A decreasing function will be overestimated by Ln .

b An increasing function will be underestimated by Ln .

c If Ln is always exact, then f (x) is a constant function.

d Functions can be classified as increasing, decreasing or constant by their first derivative. $f'(x)$ seems to determine the sign (and maybe size) of the error.

Figure: The error of an Ln approximation

Let’s use the results of the exercise to formulate an error bound for $L_n$.
Larger values of $f'(x)$ seem to produce more negative errors. If we allow for steeper and steeper slopes, there is no limit to how large the error could be. So let’s put a bound on how big the derivative is. Suppose we know that $f'(x) \le S$ on $[a, b]$. Over each interval $[x_i, x_{i+1}]$ we know that $f(x)$ lies below the line of slope $S$ through $(x_i, f(x_i))$:
$$f(x) \le S(x - x_i) + f(x_i)$$

The region below the graph $y = f(x)$ and above the $i$th rectangle is smaller than the region below the line and above the rectangle, but we can compute the area of the larger region. It is a triangle. Its base is $\Delta x = \frac{b-a}{n}$. Its height can be determined by the slope of the line.

Figure: The error and the error bound over one rectangle of an Ln approximation

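$$\frac{\text{height}}{\text{base}} = \frac{\text{rise}}{\text{run}} = S \quad\Longrightarrow\quad \frac{\text{height}}{\Delta x} = S \quad\Longrightarrow\quad \text{height} = S\,\Delta x$$
$$\text{area} = \frac{1}{2}(\text{base})(\text{height}) = \frac{1}{2}S\,\Delta x^2 = \frac{1}{2}S\left(\frac{b-a}{n}\right)^2$$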

So the error over each subinterval can be no larger than $\frac{1}{2}S\left(\frac{b-a}{n}\right)^2$. There are $n$ subintervals, so the total $L_n$ approximation underestimates $\int_a^b f(x)\,dx$ by no more than $n\cdot\frac{1}{2}S\left(\frac{b-a}{n}\right)^2 = \frac{S(b-a)^2}{2n}$.
We can make a similar argument that if $f'(x) \ge -S$ then $L_n$ overestimates $\int_a^b f(x)\,dx$ by no more than $\frac{S(b-a)^2}{2n}$. We can combine these two statements into one by using absolute values: $-S \le f'(x) \le S$ is rewritten $|f'(x)| \le S$.
We could make the same argument for the $R_n$ approximation. We’d only need to swap the overestimate with the underestimate. The error bounds it produces are the same. Our result can be stated as a theorem:

Theorem
If $E_L$ and $E_R$ are the errors in the $L_n$ and $R_n$ approximations of $\int_a^b f(x)\,dx$ and $|f'(x)| \le S$ on $[a, b]$, then
$$|E_L| \le \frac{S(b-a)^2}{2n} \quad\text{and}\quad |E_R| \le \frac{S(b-a)^2}{2n}$$

Remark

The argument that the line of slope $S$ is the “worst case” scenario is a useful heuristic, but you may be unsatisfied with its lack of rigor. A formal argument relies on the following ideas:
Larger functions have larger integrals. If $f(x) \le g(x)$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$ as long as $a \le b$.
The Fundamental Theorem of Calculus tells us we can write $f(x) = f(x_i) + \int_{x_i}^{x} f'(t)\,dt$.
The line of slope $S$ would be $L(x) = f(x_i) + \int_{x_i}^{x} S\,dt$. Over the interval $[x_i, x_{i+1}]$, comparing these integrals shows that $f(x) \le L(x)$. Thus $\int_{x_i}^{x_{i+1}} f(x)\,dx \le \int_{x_i}^{x_{i+1}} L(x)\,dx$. Since the left-endpoint rectangle has the same height $f(x_i)$ for both functions, there is more error, and thus a larger underestimate, in the left-hand approximation of $L(x)$ than in the left-hand approximation of $f(x)$.

Example 2.4.4
Computing an EL Bound

Suppose we want to understand the error of an $L_n$ approximation of $\int_1^{16} \sqrt{x}\,dx$.

a What bounds can we put on $|f'(x)|$ for our error calculation?

b What bound can we put on the error of the $L_5$ approximation?

c What $n$ would we need in order to guarantee that the $L_n$ approximation has error at most $\frac{1}{100}$?

d What problem would result if we tried to bound the error of an $L_n$ approximation of $\int_0^{16} \sqrt{x}\,dx$? How might you resolve this?

Solution

a $f'(x) = \frac{1}{2\sqrt{x}}$. This is always positive, and it decreases as $x$ increases. The largest value of $f'(x)$ on $[1, 16]$ occurs when $x = 1$. If we let $S = f'(1) = \frac{1}{2}$, we are guaranteed that for all $x$ in $[1, 16]$, $|f'(x)| \le \frac{1}{2}$.


b By our theorem
$$|E_L| \le \frac{S(b-a)^2}{2n} = \frac{\frac{1}{2}(16-1)^2}{2(5)} = \frac{45}{4}$$
So the error lies between $-\frac{45}{4}$ and $\frac{45}{4}$.
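We can compare this bound with the actual error. The Python sketch below (illustrative function names; the exact value $\int_1^{16}\sqrt{x}\,dx = \frac{2}{3}(16^{3/2} - 1) = 42$ comes from the power rule) computes $L_5$ and checks that its error lies within $\frac{45}{4}$.

```python
import math

def left_sum(f, a, b, n):
    """L_n with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

def ln_rn_error_bound(S, a, b, n):
    """S(b - a)^2 / (2n), valid when |f'(x)| <= S on [a, b]."""
    return S * (b - a) ** 2 / (2 * n)

exact = 2 * (16**1.5 - 1) / 3
L5 = left_sum(math.sqrt, 1, 16, 5)
error = L5 - exact  # negative: L_5 underestimates an increasing function
bound = ln_rn_error_bound(0.5, 1, 16, 5)
print(L5, error, bound)
```

The actual error is about $-4.76$, comfortably inside the guaranteed range; the bound is deliberately pessimistic.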

c We can set our error bound (with $n$ as a variable) to be at most $\frac{1}{100}$ and solve for $n$.
$$\begin{aligned}
|E_L| \le \frac{\frac{1}{2}(16-1)^2}{2n} &\le \frac{1}{100}\\
\frac{225}{4n} &\le \frac{1}{100}\\
(225)(100) &\le 4n\\
(225)(25) &\le n\\
5625 &\le n
\end{aligned}$$
We conclude that the error will be at most $\frac{1}{100}$ as long as $n$ is at least 5625. Note that since this is an error bound, the actual error may shrink below $\frac{1}{100}$ with fewer rectangles. We would need a different method to verify that, though.
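Solving the bound for $n$ is mechanical, so it can be scripted. This sketch (Python; `min_n_left_right` is an illustrative name) uses `fractions.Fraction` to keep the arithmetic exact, so floating-point rounding cannot push the answer up or down by one.

```python
import math
from fractions import Fraction

def min_n_left_right(S, a, b, tol):
    """Smallest integer n with S(b - a)^2 / (2n) <= tol."""
    # Rearranging S(b - a)^2 / (2n) <= tol gives n >= S(b - a)^2 / (2 tol).
    threshold = Fraction(S) * (b - a) ** 2 / (2 * Fraction(tol))
    return math.ceil(threshold)

print(min_n_left_right(Fraction(1, 2), 1, 16, Fraction(1, 100)))  # 5625
```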
d If we want to apply our theorem to $\int_0^{16} \sqrt{x}\,dx$, we need an $S$ such that $|f'(x)| \le S$. This derivative is $f'(x) = \frac{1}{2\sqrt{x}}$, which increases without bound as $x \to 0^+$. Thus there is no such $S$, and we cannot apply the error bound theorem.
To get around this problem we could break the interval into two parts and bound them by different methods. We can bound the error on rectangles 2 through $n$ over the interval $[\Delta x, 16]$ using the theorem as above. In this case $S = \frac{1}{2\sqrt{\Delta x}}$ will work. To bound the error over the first rectangle $[0, \Delta x]$, note that $f(x)$ is increasing. The first rectangle of $L_n$ will underestimate the integral, while the first rectangle of $R_n$ will overestimate it. Thus the actual error can be no bigger than the difference between them, which is $\sqrt{\Delta x}\,\Delta x - 0\cdot\Delta x$. The total error can be no larger than the sum of the error bound over $[0, \Delta x]$ and the error bound over $[\Delta x, 16]$.
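To see the split bound from part d in action, here is a Python sketch for one concrete case, $n = 16$ (so $\Delta x = 1$). It assumes, as the text suggests, that the theorem is applied to the remaining $n - 1$ rectangles on $[\Delta x, 16]$ with $S = \frac{1}{2\sqrt{\Delta x}}$; the exact value $\int_0^{16}\sqrt{x}\,dx = \frac{128}{3}$ follows from the power rule. The variable names are our own.

```python
import math

def left_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

a, b, n = 0, 16, 16
dx = (b - a) / n

# First rectangle [0, dx]: error is at most the R-rectangle minus the L-rectangle.
first_bound = math.sqrt(dx) * dx - math.sqrt(0) * dx

# Rectangles 2..n on [dx, 16]: apply the theorem with S = 1 / (2 sqrt(dx)).
S = 1 / (2 * math.sqrt(dx))
rest_bound = S * (b - dx) ** 2 / (2 * (n - 1))

total_bound = first_bound + rest_bound
actual_error = left_sum(math.sqrt, a, b, n) - 128 / 3
print(total_bound, actual_error)
```

Here the combined bound is $1 + 3.75 = 4.75$, and the actual error of about $-2.2$ sits inside it.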

Question 2.4.5
How Can We Make our Approximation Less Sensitive to Slope?

$L_n$ and $R_n$ have large errors when the function is increasing or decreasing rapidly. We’ll examine two approximations that are more resilient. The first is the midpoint approximation.

Notation
The $M_n$ approximation of $\int_a^b f(x)\,dx$ is calculated by summing:
$$\sum_{i=1}^{n} f(x_i^*)\,\Delta x$$
where the $x_i^*$ are the midpoints of each subinterval.
Figure: $M_4$

Our final approximation abandons rectangles entirely. Using trapezoids instead allows for shapes that reflect the value of the function at both the right and left endpoint. In this construction, the trapezoids are sideways from the way you may be used to looking at them when you learned their area formula $A = \frac{1}{2}(b_1 + b_2)h$. The parallel bases are vertical. The height is along the $x$-axis.

Notation
The $T_n$ approximation of $\int_a^b f(x)\,dx$ is calculated by summing:
$$\sum_{i=1}^{n} \frac{1}{2}\left(f(x_i) + f(x_{i+1})\right)\Delta x$$
where $x_i$ and $x_{i+1}$ are the two endpoints of the $i$th subinterval.
$T_n$ can also be calculated as $\frac{1}{2}(L_n + R_n)$.
Figure: $T_4$

Example 2.4.6
A Midpoint Approximation

Calculate the $M_3$ approximation of $\int_{-1}^{5} x^2\,dx$.

Solution

$\Delta x = \frac{5-(-1)}{3} = 2$. We can sketch the intervals:

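Subinterval endpoints on the $x$-axis: $-1,\ 1,\ 3,\ 5$

The midpoints are $x_1^* = 0$, $x_2^* = 2$ and $x_3^* = 4$.
$$\begin{aligned}
M_3 &= \sum_{i=1}^{3} f(x_i^*)\,\Delta x\\
&= \Delta x\left(f(x_1^*) + f(x_2^*) + f(x_3^*)\right)\\
&= 2\left(0^2 + 2^2 + 4^2\right)\\
&= 40
\end{aligned}$$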
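Midpoint and trapezoid sums follow the same pattern as $L_n$ and $R_n$. In this Python sketch (the function names are ours) we compute $M_3$ and $T_3$ for $\int_{-1}^{5} x^2\,dx$, and use the identity $T_n = \frac{1}{2}(L_n + R_n)$ from the notation box as a cross-check.

```python
def left_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

def right_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(1, n + 1)) * dx

def midpoint_sum(f, a, b, n):
    """M_n: test point x_i* = midpoint of each subinterval."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def trapezoid_sum(f, a, b, n):
    """T_n: average of f at the two endpoints of each subinterval."""
    dx = (b - a) / n
    return sum(0.5 * (f(a + i * dx) + f(a + (i + 1) * dx)) * dx for i in range(n))

def square(x):
    return x**2

M3 = midpoint_sum(square, -1, 5, 3)
T3 = trapezoid_sum(square, -1, 5, 3)
average_LR = (left_sum(square, -1, 5, 3) + right_sum(square, -1, 5, 3)) / 2
print(M3, T3, average_LR)  # 40.0 46.0 46.0
```

Against the exact value 42, $M_3$ is 2 too low and $T_3$ is 4 too high.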

Example 2.4.7
A Trapezoid Approximation Using a Table of Values

Approximation has little practical use for functions like $x^2$ whose antiderivatives we can find. We would rather get the exact answer by taking an antiderivative and applying the Fundamental Theorem of Calculus. In many real-world applications, our data about a function consists of a finite number of measurements. In this case, we don’t even have an expression for the function, let alone its antiderivative. Here is an example where approximation is the best we can do.
Suppose we have the following table of values for a function f (x)

x 0 2 4 6 8 10 12 14 16
f (x) 2 5 3 4 7 8 5 4 1

Calculate the $T_3$ approximation of $\int_2^{14} f(x)\,dx$.

Solution

$\Delta x = \frac{14-2}{3} = 4$. We can sketch the intervals:

Subinterval endpoints on the $x$-axis: $2,\ 6,\ 10,\ 14$

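$$\begin{aligned}
T_3 &= \sum_{i=1}^{3} \frac{1}{2}\left(f(x_i) + f(x_{i+1})\right)\Delta x\\
&= \frac{1}{2}\Delta x\left(f(x_1) + f(x_2) + f(x_2) + f(x_3) + f(x_3) + f(x_4)\right)\\
&= \frac{1}{2}\Delta x\left(f(2) + f(6) + f(6) + f(10) + f(10) + f(14)\right)\\
&= \frac{1}{2}(4)(5 + 4 + 4 + 8 + 8 + 4)\\
&= 66
\end{aligned}$$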
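With tabulated data the same computation works directly on the table, as long as every subinterval endpoint is a tabulated $x$-value. Here is a Python sketch; the dictionary layout and the name `trapezoid_from_table` are our own choices, and for simplicity it only handles whole-number spacing.

```python
xs = [0, 2, 4, 6, 8, 10, 12, 14, 16]
fs = [2, 5, 3, 4, 7, 8, 5, 4, 1]
table = dict(zip(xs, fs))

def trapezoid_from_table(table, a, b, n):
    """T_n using only tabulated values; every endpoint must appear in the table."""
    if (b - a) % n != 0:
        raise ValueError("this sketch needs a whole-number subinterval width")
    dx = (b - a) // n
    points = [a + i * dx for i in range(n + 1)]
    missing = [x for x in points if x not in table]
    if missing:
        raise ValueError(f"no data at x = {missing}")
    return sum(0.5 * (table[points[i]] + table[points[i + 1]]) * dx for i in range(n))

print(trapezoid_from_table(table, 2, 14, 3))  # 66.0
```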

Question 2.4.8
How Do the Error Bounds of the Approximations Compare?

$T_n$ and $M_n$ have zero error when $f(x)$ is a straight line, regardless of slope. Larger errors result from high rates of curvature. You can see this by using a small number of rectangles or trapezoids and increasing the curvature of the function. Proving an error bound involves using a quadratic as a “worst case scenario.” Any function whose second derivative is smaller in absolute value than that of the quadratic will have a smaller error. Here is the result.


Theorem

Suppose $|f''(x)| \le K$ for $a \le x \le b$. If $E_T$ and $E_M$ are the errors in the trapezoid and midpoint approximations of $\int_a^b f(x)\,dx$, then
$$|E_T| \le \frac{K(b-a)^3}{12n^2} \quad\text{and}\quad |E_M| \le \frac{K(b-a)^3}{24n^2}$$

Remarks

1 The maximum error is smaller when the function has less curvature.
2 The error is also reduced by increasing n, the number of subintervals.

3 These formulas indicate that we can usually expect $M_n$ to have half as much error as $T_n$.
4 As $n$ increases, the error bounds for $M_n$ and $T_n$ approach 0 much more quickly than those for $L_n$ and $R_n$.
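For $f(x) = x^2$ on $[-1, 5]$ we have $f''(x) = 2$, so $K = 2$ works. This short Python sketch (illustrative names) evaluates both bounds for $n = 3$ and compares them with the actual errors of $T_3$ and $M_3$.

```python
def midpoint_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def trapezoid_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(0.5 * (f(a + i * dx) + f(a + (i + 1) * dx)) * dx for i in range(n))

def trapezoid_bound(K, a, b, n):
    return K * (b - a) ** 3 / (12 * n**2)

def midpoint_bound(K, a, b, n):
    return K * (b - a) ** 3 / (24 * n**2)

def square(x):
    return x**2

exact = 42  # (5**3 - (-1)**3) / 3
a, b, n, K = -1, 5, 3, 2  # |f''(x)| = 2 everywhere
T_error = trapezoid_sum(square, a, b, n) - exact
M_error = midpoint_sum(square, a, b, n) - exact
print(T_error, trapezoid_bound(K, a, b, n))  # 4.0 4.0
print(M_error, midpoint_bound(K, a, b, n))   # -2.0 2.0
```

Because $f''$ is constant for a quadratic, the errors reach the bounds exactly, which fits the description of a quadratic as the worst case.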

Example 2.4.9
Choosing n to Meet an Error Target

Suppose we wish to approximate $\int_1^{16} \sqrt{x}\,dx$ by a midpoint approximation. How many rectangles must we use to guarantee that the error is smaller than $\frac{1}{1000}$?

Solution

The midpoint error formula requires us to have a bound $K$ on $|f''(x)|$ on $[1, 16]$.

$$f'(x) = \frac{1}{2\sqrt{x}} \qquad f''(x) = -\frac{1}{4x^{3/2}}$$
As $x$ gets larger, the denominator of $f''(x)$ gets larger, meaning $|f''(x)|$ gets smaller (we could also verify this by checking the sign of $f'''(x)$). Thus it will be largest at $x = 1$. We can safely use the value there as our $K$:
$$|f''(x)| \le |f''(1)| = \frac{1}{4} = K$$

We can now apply the error bound formula, leaving $n$ as a variable. We will set the error bound to be at most $\frac{1}{1000}$ and solve for $n$.
$$\begin{aligned}
|E_M| \le \frac{K(b-a)^3}{24n^2} &\le \frac{1}{1000}\\
\frac{\frac{1}{4}(16-1)^3}{24n^2} &\le \frac{1}{1000} \quad \text{all factors are positive}\\
\frac{(1000)(15)^3}{(4)(24)} &\le n^2 \quad \text{isolate } n^2\\
\frac{140{,}625}{4} &\le n^2\\
\frac{375}{2} &\le n \quad \text{square root of both sides}
\end{aligned}$$
Thus any $n$ of at least $375/2$ will work. We need to use at least 188 rectangles to guarantee that the error is less than $\frac{1}{1000}$. Note that we might achieve a sufficiently small error with fewer rectangles, but our error bound theorem cannot guarantee it.
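The same calculation can be scripted. This Python sketch (names are ours) keeps the arithmetic exact with `fractions.Fraction`, finds the smallest integer $n$ with $n^2 \ge \frac{K(b-a)^3}{24\cdot\text{tol}}$, and then, as a sanity check, compares $M_{188}$ with the exact value $\int_1^{16}\sqrt{x}\,dx = 42$.

```python
import math
from fractions import Fraction

def min_n_midpoint(K, a, b, tol):
    """Smallest integer n with K(b - a)^3 / (24 n^2) <= tol."""
    n_squared_min = Fraction(K) * (b - a) ** 3 / (24 * Fraction(tol))
    n = math.isqrt(math.ceil(n_squared_min))
    while n * n < n_squared_min:  # round up to the next integer if needed
        n += 1
    return n

def midpoint_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

n = min_n_midpoint(Fraction(1, 4), 1, 16, Fraction(1, 1000))
print(n)  # 188
print(abs(midpoint_sum(math.sqrt, 1, 16, n) - 42))
```

The actual error at $n = 188$ is roughly $10^{-4}$, well under the target, which again shows that the theorem gives a guarantee rather than an exact error.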

Section 2.4
Exercises

Summary Questions

Q1 How is the error in an approximation defined?

Q2 What does the first derivative of $f(x)$ tell you about the error in the right-hand approximation of $\int_a^b f(x)\,dx$?

Q3 As the number of subintervals gets large, which approximation(s) converge most quickly to the
actual value?

Q4 Under what situation is a midpoint approximation preferable to a trapezoid approximation? When would trapezoid be preferable?


2.4.1

Q5 Seong-ju and Anthony are both approximating $\int_{-4}^{4} x^2\,dx$ with 4 rectangles. They know that they can use any combination of test points in their rectangles. What is the maximum difference between their approximations?

Q6 a What $\Delta x$ and $x_i^*$’s would you use for the $L_4$ approximation of $\int_3^{23} f(x)\,dx$?

b Can you write a general expression for $\Delta x$ and the $x_i^*$’s for $\int_a^b f(x)\,dx$?

2.4.2

Q7 Compute the $L_5$ approximation of $\int_1^{16} x^{3/2}\,dx$.

Q8 Compute the $R_3$ approximation of $\int_2^{8} x\sin\left(\frac{\pi x}{12}\right) dx$.

Q9 Compute the $L_4$ approximation of $\int_0^{2} x^3e^x\,dx$.

Q10 Compute the $L_5$ approximation of $\int_3^{18} \frac{3^x}{x}\,dx$.

2.4.3

Q11 Compute the theoretical error bound on the $L_{14}$ approximation of $\int_1^{8} \sqrt[3]{x}\,dx$.

Q12 Compute the theoretical error bound on the $R_5$ approximation of $\int_0^{15} \frac{1}{x^2+1}\,dx$.

Q13 How large would $n$ need to be to guarantee that the $L_n$ approximation of $\int_2^{8} \log_2 x\,dx$ is within $\frac{1}{10000}$ of the actual value?

Q14 How large would $n$ need to be to guarantee that the $R_n$ approximation of $\int_{-1}^{2} x^3\,dx$ is within $\frac{1}{1000}$ of the actual value?

2.4.4

Q15 Suppose we make the following approximations of $\int_{15}^{30} (4x + 7)\,dx$. Without computing them, put them in order from least to greatest (some may be equal).

$L_4$, $M_4$, $L_8$, $M_8$, $R_4$, $R_8$, the actual value

Q16 Yiming has a great idea. He approximates $\int_a^b f(x)\,dx$ by 12 rectangles. In order to mitigate the error of left and right hand approximations, he takes the right endpoint of the first subinterval as a test point, but the left endpoint of the second subinterval. He continues to alternate for all 12 subintervals. What is another name for the approximation Yiming has produced?


2.4.5

Q17 Compute the $T_3$ approximation of $\int_1^{16} (x^2 - x)\,dx$.

Q18 Compute the $M_3$ approximation of $\int_1^{16} (x^2 - x)\,dx$.

Q19 Compute the $M_4$ approximation of $\int_1^{9} \cos\left(\frac{\pi x^2}{12}\right) dx$.

Q20 Compute the $T_2$ approximation of $\int_0^{6} e^{x^2+2x}\,dx$.

2.4.6

Q21 Given the following table of values of f (x)

x 0 3 6 9 12 15 18 21
f (x) 10 13 11 15 13 11 9 12

a Compute the $M_2$ approximation of $\int_3^{15} f(x)\,dx$.

b Compute the $T_3$ approximation of $\int_0^{18} f(x)\,dx$.

Q22 Given the following table of values of h(x)

x 1 2 3 4 5 6 7 8 9
h(x) 2 −1 3 4 2 1 −3 5 4

a Compute the $T_3$ approximation of $\int_1^{9} h(x)\,dx$.

b Compute the $M_3$ approximation of $\int_2^{8} h(x)\,dx$.

2.4.7

Q23 Let $f(x) = \frac{1}{x^3}$. If you wanted to use a midpoint approximation with $n$ rectangles to approximate $\int_3^{5} f(x)\,dx$, how large must $n$ be to guarantee your approximation had an error of no more than $\frac{1}{10000}$? Your answer should have the form $n \ge \ldots$, but you do not need to simplify any arithmetic.
Z 9

Q24 Suppose we want to approximate x dx.
1

a Produce the T4 approximation. Don’t bother simplifying the arithmetic.

b Solve for a value $n$ such that $T_n$ has an error of at most $\frac{1}{1000000}$. Don’t simplify the arithmetic.

Q25 Consider the following data about an unknown function g(x).

x 0 2 4 6 8 10 12 14
g(x) 3 5 8 9 7 4 3 1

a Compute an $M_3$ approximation of $\int_0^{12} g(x)\,dx$.

b If you are given that $|g''(x)| < \frac{1}{4}$, what bound can you put on the error of the previous approximation?

Q26 Sasha is trying to bound the error of her $M_{10}$ approximation of $\int_0^{\pi} \sin x\,dx$. She computes $f''(0) = 0$ and $f''(\pi) = 0$ and so decides to use $K = 0$.

a What does her choice of $K$ imply about the accuracy of her calculation?

b Explain what is wrong with Sasha’s reasoning.

c Compute the actual error bound for the $M_{10}$ approximation.


Extension and Synthesis

Q27 Give an example of a function for which L4 and R4 are both overestimates on some interval. You
may want to express your function by drawing its graph.
Q28 Suppose we want to estimate $\int_4^{20} f(x)\,dx$ and have the following table of values

x 4 6 8 10 12 14 16 18 20
f (x) 3 5 4 2 −1 6 2 5 8

a What estimates are possible with this data?

b Would you expect the M4 or the T8 approximation to give you a better estimate?
Q29 Consider $T_3$, the trapezoid approximation of $\int_2^{8} x^3\,dx$.

a Produce this approximation. Do not simplify the arithmetic.

b Compute the theoretical error bound for this approximation.

c Explain in a couple sentences how you can tell whether the error is positive or negative. You
can include a diagram, if you’d like to.
Q30 Suppose you are interested in the value of $\int_0^{25} f(x)\,dx$, but you have only the following data.

x 1 2 6 8 13 14 20 23 25
f(x) 12 19 20 20 28 34 50 57 66

How might you approximate $\int_0^{25} f(x)\,dx$?

Q31 Suppose you invent your own approximation for a definite integral. You name it the “ultimate approximation” and denote it $U_n$. Its formula is
$$U_n = \frac{L_n + R_n + M_n + T_n}{4}.$$

Will $U_n$ overestimate or underestimate the integral of a linear function? Justify your answer.

Q32 Suppose we compute an $L_5$ approximation of $\int_{-7}^{13} f(x)\,dx$.

a What formula that we learned would give a bound on the error of this approximation? Fill in
all the information you can, and indicate the information that you would need to complete
the calculation. Be as specific as possible.

b Suppose that, instead of the information you need for the formula, you were only given that $f$ is an increasing function on $[-7, 13]$. How could you compute an error bound in this case? Justify your answer.

Section 2.5

Improper Integrals
Goals:

1 Integrate a function that has a discontinuity.


2 Recognize when an integral is improper.

3 Determine whether an improper integral converges or diverges.


4 Compute the value of an improper integral.
5 Use comparison to determine convergence.

So far we have been content to evaluate integrals of continuous functions over bounded intervals.
Not all functions are continuous. We may be interested in the area under a discontinuous function, even
one with a vertical asymptote. We may be interested in the area under the entire graph of a function,
not just over some subset. In many cases these areas will be infinite, but in some cases they are not.
We will need to develop the methods to determine which case is which.

Question 2.5.1
What Is Infinity?

In this section we’ll be revisiting ideas about infinity.

Notation

The symbol ∞ implies that a variable or function is increasing without bound. It eventually gets bigger
than every number.

$\infty$ is not a number. We cannot evaluate $\frac{1}{\infty}$ or $\infty\cdot 0$ or $\tan^{-1}(\infty)$.

The main way that we’ve encountered this notation is with limits. Limits at infinity will also be
relevant to improper integrals, so you may want to review them.

Exercise

Evaluate the following limits:

a $\lim_{x\to\infty} \frac{1}{x^2}$

b $\lim_{x\to\infty} x$

c $\lim_{t\to-\infty} e^t$

d $\lim_{y\to\infty} \sin y$

e $\lim_{w\to\infty} \ln w$

f $\lim_{x\to-\infty} \frac{3x^2+7}{x^2-5x}$

Solution

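a $\lim_{x\to\infty} \frac{1}{x^2} = 0$.

b $\lim_{x\to\infty} x = \infty$.

c $\lim_{t\to-\infty} e^t = 0$.

d $\lim_{y\to\infty} \sin y$ does not exist.

e $\lim_{w\to\infty} \ln w = \infty$.

f $\lim_{x\to-\infty} \frac{3x^2+7}{x^2-5x} = 3$.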

Question 2.5.2
How Do We Integrate a Discontinuous Function?

Consider the function
$$f(x) = \begin{cases} 3x^2 & \text{if } x \le 2\\ 10 - 2x & \text{if } x > 2\end{cases}$$
What is $\int_0^{5} f(x)\,dx$?

Figure: The area beneath a discontinuous graph


$\int_0^{5} f(x)\,dx$ is the signed area under $f(x)$ from $x = 0$ to $x = 5$. It is equal to a limit
$$\int_0^{5} f(x)\,dx = \lim_{\Delta x\to 0}\sum_{i=1}^{n} f(x_i^*)\,\Delta x$$

If we look at the rectangle approximations in this equation, we see that they can badly estimate the
function near the point of discontinuity.

Figure: Rectangle approximations of the area beneath a discontinuous graph

Remarks

We might worry that the approximations are so bad that the limit $\lim_{\Delta x\to 0}\sum_{i=1}^{n} f(x_i^*)\,\Delta x$ does not exist. Fortunately, it does, as long as $f$ is bounded and has only finitely many discontinuities.
$f(x)$ almost has an antiderivative function. $F(x) = \int_0^{x} f(t)\,dt$ has derivative $f(x)$ at all $x$, except perhaps at the points of discontinuity.

While it may be comforting to know that an antiderivative function exists, it doesn’t help us evaluate the integral. We don’t know what number to assign to $F(x)$ for many values of $x$. So how do we compute $\int_0^{5} f(x)\,dx$? Instead of dealing with a function whose antiderivative we don’t know, we break this into two integrals that are easier to handle.
$$\begin{aligned}
\int_0^{5} f(x)\,dx &= \int_0^{2} f(x)\,dx + \int_2^{5} f(x)\,dx\\
&= \int_0^{2} 3x^2\,dx + \int_2^{5} f(x)\,dx
\end{aligned}$$

Why can’t we replace $\int_2^{5} f(x)\,dx$ with $\int_2^{5} (10 - 2x)\,dx$? At $x = 2$, $f(x) = 3x^2$, not $10 - 2x$. This is unfortunate, because for any number $t > 2$ we could replace $\int_t^{5} f(x)\,dx$ with $\int_t^{5} (10 - 2x)\,dx$. We will need to break our integral down further.
$$\begin{aligned}
\int_0^{5} f(x)\,dx &= \int_0^{2} f(x)\,dx + \int_2^{t} f(x)\,dx + \int_t^{5} f(x)\,dx\\
&= \int_0^{2} 3x^2\,dx + \int_2^{t} f(x)\,dx + \int_t^{5} (10 - 2x)\,dx
\end{aligned}$$

We still don’t know the value of the middle integral, but we know that as $t$ approaches 2, the domain of integration shrinks to 0. We can take advantage of this by taking a limit.
$$\begin{aligned}
\int_0^{5} f(x)\,dx &= \lim_{t\to 2^+}\left(\int_0^{2} 3x^2\,dx + \int_2^{t} f(x)\,dx + \int_t^{5} (10 - 2x)\,dx\right)\\
&= \lim_{t\to 2^+}\left(x^3\Big|_0^{2} + \int_2^{t} f(x)\,dx + \left(10x - x^2\right)\Big|_t^{5}\right)\\
&= \lim_{t\to 2^+}\left(8 - 0 + \int_2^{t} f(x)\,dx + (50 - 25) - (10t - t^2)\right)\\
&= \lim_{t\to 2^+}\left(33 - 10t + t^2 + \int_2^{t} f(x)\,dx\right)\\
&= 33 - 10(2) + 2^2 + \int_2^{2} f(x)\,dx\\
&= 17
\end{aligned}$$
Notice that we had to evaluate an integral with the variable t as a bound. Once we had applied the
Fundamental Theorem of Calculus and plugged in t, this integral became a continuous function and we
could evaluate the limit.
Notice also the strange role the limit played in this computation. Usually we take limits to see what
value a changing function approaches. Our function has the same value for any choice of t (make sure
you see why), so technically we were taking the limit of a constant function. The limit was a purely
computational tool.
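A numerical check is reassuring here. This Python sketch (names are ours) integrates each branch with its own antiderivative, as in the computation above, and also applies a midpoint sum to the piecewise function directly; the jump at $x = 2$ does not stop the sums from approaching 17.

```python
def f(x):
    # The piecewise function from this question
    return 3 * x**2 if x <= 2 else 10 - 2 * x

def midpoint_sum(g, a, b, n):
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

# Branch by branch: antiderivative x^3 on [0, 2] and 10x - x^2 on [2, 5]
by_pieces = (2**3 - 0**3) + ((10 * 5 - 5**2) - (10 * 2 - 2**2))
print(by_pieces)  # 17

for n in (10, 100, 1000):
    print(n, midpoint_sum(f, 0, 5, n))
```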

Remark
The discontinuity at $x = 2$ meant that we were stuck with an integral $\int_2^{t} f(x)\,dx$. With a less well-behaved function we might have also needed an integral on the left side of 2, like $\int_s^{2} f(x)\,dx$. However, as long as $f$ stays bounded near the discontinuity, these two integrals can always be sent to zero by a limit, so when solving integrals of discontinuous functions, we can leave these out of our calculations.

We can summarize the method as follows:

Integrating discontinuous functions

If $f(x)$ is discontinuous at $x = c$ and $a \le c \le b$, then
$$\int_a^{b} f(x)\,dx = \lim_{t\to c^-}\int_a^{t} f(x)\,dx + \lim_{s\to c^+}\int_s^{b} f(x)\,dx$$
provided that both of these limits exist.

A removable discontinuity should not slow us down even this much. The area under a single point
of discontinuity is zero. We can use the following theorem for a function with any finite number of
removable discontinuities.

Theorem

If $f(x)$ and $g(x)$ are equal on $[a, b]$ except at a finite number of points, then
$$\int_a^{b} f(x)\,dx = \int_a^{b} g(x)\,dx.$$

This theorem eliminates the need to use limits in our example. Since $f(x) = 3x^2$ on $[0, 2]$, and $f(x) = 10 - 2x$ on $[2, 5]$ except at $x = 2$,
$$\begin{aligned}
\int_0^{5} f(x)\,dx &= \int_0^{2} f(x)\,dx + \int_2^{5} f(x)\,dx\\
&= \int_0^{2} 3x^2\,dx + \int_2^{5} (10 - 2x)\,dx
\end{aligned}$$

Most discontinuities can be handled this way, but there is one type that will still require limits.
Example 2.5.3
Integrating a Function with a Vertical Asymptote

Definition
When $f(x)$ has a vertical asymptote at $c$ in $[a, b]$ we call $\int_a^{b} f(x)\,dx$ an improper integral.

How can we compute $\int_0^{4} \frac{1}{\sqrt{x}}\,dx$?
In this case, breaking this integral into two pieces doesn’t immediately help.
$$\int_0^{4} \frac{1}{\sqrt{x}}\,dx = \lim_{t\to 0^+}\left(\int_0^{t} \frac{1}{\sqrt{x}}\,dx + \int_t^{4} \frac{1}{\sqrt{x}}\,dx\right)$$
We cannot take for granted that $\lim_{t\to 0^+}\int_0^{t} \frac{1}{\sqrt{x}}\,dx$ goes to 0. The interval is getting smaller, but the values of the function may be so large that its rectangle approximations stay arbitrarily large and do not limit to 0. If there were an unbounded amount of area in $\int_0^{t} \frac{1}{\sqrt{x}}\,dx$, then as $t \to 0^+$, $\int_t^{4} \frac{1}{\sqrt{x}}\,dx$ would absorb more and more of that area and tend to $\infty$. Thus if (and only if) $\lim_{t\to 0^+}\int_t^{4} \frac{1}{\sqrt{x}}\,dx$ exists, we can assume that the remaining piece $\int_0^{t} \frac{1}{\sqrt{x}}\,dx$ limits to 0 and can be ignored.

Solution

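$$\begin{aligned}
\int_0^{4} \frac{1}{\sqrt{x}}\,dx &= \lim_{t\to 0^+}\int_t^{4} \frac{1}{\sqrt{x}}\,dx\\
&= \lim_{t\to 0^+} 2\sqrt{x}\Big|_t^{4}\\
&= \lim_{t\to 0^+}\left(2\sqrt{4} - 2\sqrt{t}\right)\\
&= 4 - 0
\end{aligned}$$
Since $\lim_{t\to 0^+}\int_t^{4} \frac{1}{\sqrt{x}}\,dx$ exists, we conclude that
$$\int_0^{4} \frac{1}{\sqrt{x}}\,dx = \lim_{t\to 0^+}\int_t^{4} \frac{1}{\sqrt{x}}\,dx = 4$$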
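The limit can also be watched numerically. In this Python sketch (the name `tail_integral` is ours) we tabulate $\int_t^4 \frac{1}{\sqrt{x}}\,dx = 4 - 2\sqrt{t}$ for shrinking $t$ next to the height $\frac{1}{\sqrt{t}}$ of the integrand at the left edge; the areas settle toward 4 even though the integrand itself blows up as $x \to 0^+$.

```python
import math

def tail_integral(t):
    """Exact value of the integral of 1/sqrt(x) from t to 4, via 2*sqrt(x)."""
    return 2 * math.sqrt(4) - 2 * math.sqrt(t)

for t in (1, 1e-2, 1e-4, 1e-8):
    print(t, tail_integral(t), 1 / math.sqrt(t))
```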


Figure: The area beneath a function with a vertical asymptote

Main Idea

To compute an improper integral, we introduce a dummy variable t and take limit(s) as t → c. If the
limit(s) exist, we say the integral converges. If any do not, we say it diverges.

Remark

Convergent and divergent are the terms that describe whether the limit which defines an integral approaches a single, finite numerical value. They perform a similar role to “exists” and “does not exist” for limits or “defined” and “undefined” for arithmetic.

Question 2.5.4
How Can We Compute an Integral over an Unbounded Region?

So far we have been interested in integrals over bounded intervals: $a \le x \le b$. We approximated these with rectangles.

Figure: The area beneath a graph, approximated by rectangles


Consider how this approach would work with an unbounded interval: $a \le x$. Finitely many rectangles of finite width cannot cover the whole region we want, but we can compute any finite subsection of it: $\int_a^{t} f(x)\,dx$. As with a discontinuity, we’ll take a limit.

Definition
An integral of the form $\int_a^{\infty} f(x)\,dx$ is also called an improper integral. We evaluate it by computing
$$\int_a^{\infty} f(x)\,dx = \lim_{t\to\infty}\int_a^{t} f(x)\,dx$$
assuming this limit exists. If the limit exists we say the improper integral converges. Otherwise we say it diverges.

Similarly, we can compute $\int_{-\infty}^{b} f(x)\,dx = \lim_{t\to-\infty}\int_t^{b} f(x)\,dx$.

Example 2.5.5
Evaluating an Improper Integral

Compute $\int_2^{\infty} \frac{32}{x^3}\,dx$.

Figure: An integral over an unbounded domain


Solution

We’ll compute the limit.


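$$\begin{aligned}
\lim_{t\to\infty}\int_2^{t} \frac{32}{x^3}\,dx &= \lim_{t\to\infty}\left(-\frac{16}{x^2}\right)\Big|_2^{t}\\
&= \lim_{t\to\infty}\left(-\frac{16}{t^2} + 4\right)\\
&= 4
\end{aligned}$$
Since the limit exists, it is the value of the improper integral: $\int_2^{\infty} \frac{32}{x^3}\,dx = 4$.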
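We can also approach the value 4 without the antiderivative, which is what we would have to do with data. This Python sketch (names are ours) applies a midpoint sum on $[2, t]$ for growing $t$ and prints the exact partial value $4 - \frac{16}{t^2}$ alongside; the partial integrals level off near 4 rather than growing without bound. The choice of 20000 subintervals is arbitrary.

```python
def midpoint_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def f(x):
    return 32 / x**3

for t in (10, 100, 1000):
    print(t, midpoint_sum(f, 2, t, 20_000), 4 - 16 / t**2)
```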

Example 2.5.6
An Integral over the Entire Real Line

So far we have looked at intervals unbounded in one direction. If the interval is (−∞, ∞), the entire
real line, then we use the following definition.

Definition
The improper integral $\int_{-\infty}^\infty f(x)\,dx$ is computed as

$$\int_{-\infty}^\infty f(x)\,dx = \int_{-\infty}^a f(x)\,dx + \int_a^\infty f(x)\,dx$$

for any number $a$, so long as both integrals on the right converge. If either integral diverges, then we
say $\int_{-\infty}^\infty f(x)\,dx$ diverges as well.

Let
$$f(x) = \begin{cases} e^x & \text{if } x < 1 \\ \dfrac{e}{\sqrt{x}} & \text{if } x \ge 1 \end{cases}$$
Compute $\int_{-\infty}^\infty f(x)\,dx$.

Figure: An integral over the real line, broken into two limits

Solution

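We break this integral into two limits. The natural breaking point is $a = 1$ since that is where the
function changes branches anyway. Both limits must converge for the integral to converge.

$$\begin{aligned}
\lim_{s\to-\infty}\int_s^1 f(x)\,dx &= \lim_{s\to-\infty}\int_s^1 e^x\,dx \\
&= \lim_{s\to-\infty}\Big[e^x\Big]_s^1 \\
&= \lim_{s\to-\infty}\left(e - e^s\right) \\
&= e
\end{aligned}
\qquad
\begin{aligned}
\lim_{t\to\infty}\int_1^t f(x)\,dx &= \lim_{t\to\infty}\int_1^t \frac{e}{\sqrt{x}}\,dx \\
&= \lim_{t\to\infty}\Big[2e\sqrt{x}\Big]_1^t \\
&= \lim_{t\to\infty}\left(2e\sqrt{t} - 2e\right) \\
&= \infty \quad \text{(diverges)}
\end{aligned}$$

One limit converges to $e$. The other diverges. This means that $\int_{-\infty}^\infty f(x)\,dx$ diverges.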
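A quick numerical look at the two pieces makes the asymmetry visible. The sketch below evaluates the closed forms $e - e^s$ and $2e\sqrt{t} - 2e$ obtained in the solution for increasingly extreme bounds. The function names are our own.

```python
import math

def left_piece(s):
    """Integral of e^x over [s, 1], which equals e - e^s."""
    return math.e - math.exp(s)

def right_piece(t):
    """Integral of e/sqrt(x) over [1, t], which equals 2e*sqrt(t) - 2e."""
    return 2 * math.e * math.sqrt(t) - 2 * math.e

# The left piece settles at e; the right piece keeps growing.
for bound in [10, 100, 10_000, 1_000_000]:
    print(f"left(s = -{bound}) = {left_piece(-bound):.6f}, right(t = {bound}) = {right_piece(bound):.2f}")
```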

Question 2.5.7
Can We Take a Limit of $\int_{-t}^t f(x)\,dx$ Instead?

We might wonder whether we need to break an integral $\int_{-\infty}^\infty f(x)\,dx$ into two integrals. Instead
of two dummy variables, one going to $-\infty$ and one going to $\infty$, could we replace them by one? The
integral $\int_{-\infty}^\infty x^3\,dx$ is a useful test case. We can certainly compute

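$$\begin{aligned}
\lim_{t\to\infty}\int_{-t}^t x^3\,dx &= \lim_{t\to\infty}\left[\frac{x^4}{4}\right]_{-t}^t \\
&= \lim_{t\to\infty}\left(\frac{t^4}{4} - \frac{t^4}{4}\right) \\
&= \lim_{t\to\infty} 0 \\
&= 0
\end{aligned}$$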
This might even seem right because the area above the axis seems to cancel out the area below the
axis. However, intuitively, we expect that the area of a region should be preserved if we shift it in some
direction. Let’s shift this graph one unit to the left.

$$\begin{aligned}
\lim_{t\to\infty}\int_{-t}^t (x+1)^3\,dx &= \lim_{t\to\infty}\left[\frac{(x+1)^4}{4}\right]_{-t}^t \\
&= \lim_{t\to\infty}\left(\frac{(t+1)^4}{4} - \frac{(-t+1)^4}{4}\right) \\
&= \lim_{t\to\infty}\left(\frac{t^4+4t^3+6t^2+4t+1}{4} - \frac{t^4-4t^3+6t^2-4t+1}{4}\right) \\
&= \lim_{t\to\infty}\left(2t^3 + 2t\right) \\
&= \infty
\end{aligned}$$

We can see that, for any choice of $t$, there will be more area above the axis than below it, and the
difference grows quickly as $t$ increases. If the area of a region changes when we shift it to the side, then
that area was not well defined to begin with. We thus say that these integrals diverge, not because
they go to $\infty$ or $-\infty$, but because they are not defined at all. The formal definition above handles this
example correctly: $\int_{-\infty}^0 x^3\,dx$ diverges, so $\int_{-\infty}^\infty x^3\,dx$ also diverges.

Figure: The area under functions of the form $f(x) = (x - a)^3$


Main Idea

Do not replace the correct definition:

$$\lim_{t\to-\infty}\int_t^a f(x)\,dx + \lim_{t\to\infty}\int_a^t f(x)\,dx$$

with the “shortcut”:

$$\lim_{t\to\infty}\int_{-t}^t f(x)\,dx$$

The “shortcut” can suggest that the integral converges, when in fact it diverges.
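The failure of the shortcut can be checked with exact arithmetic. The sketch below uses the antiderivatives $\frac{x^4}{4}$ and $\frac{(x+1)^4}{4}$. The symmetric integral of $x^3$ is 0 for every $t$, while the shifted version equals $2t^3 + 2t$ and grows without bound. The function names are our own.

```python
def symmetric_integral(antiderivative, t):
    """The 'shortcut' integral over [-t, t], computed from an antiderivative."""
    return antiderivative(t) - antiderivative(-t)

def F_cubic(x):
    return x**4 / 4          # antiderivative of x^3

def F_shifted(x):
    return (x + 1)**4 / 4    # antiderivative of (x + 1)^3, the graph shifted left

for t in [1, 10, 100]:
    print(t, symmetric_integral(F_cubic, t), symmetric_integral(F_shifted, t))
```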

Synthesis 2.5.8
A Comparison Test

Recall the following theorems:

Theorem
If $f(x) \le g(x)$ on $[a, b]$ then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.

Theorem

Let $a$ be a real number or $\pm\infty$. If $F(x) \le G(x)$ for all $x$ near $a$, and both limits exist (possibly as $\pm\infty$), then $\lim_{x\to a} F(x) \le \lim_{x\to a} G(x)$.

Suppose we have a function $f(x)$ whose anti-derivative we don’t know, and a function $g(x)$ whose
anti-derivative we do know. What can the divergence or convergence of $\int_a^\infty g(x)\,dx$ tell us about
$\int_a^\infty f(x)\,dx$?

Solution
If we know that $f(x) \le g(x)$, then for all $t \ge a$, $\int_a^t f(x)\,dx \le \int_a^t g(x)\,dx$. This allows us to also
compare their limits, which are the improper integrals $\int_a^\infty f(x)\,dx$ and $\int_a^\infty g(x)\,dx$. This could be
useful in a couple of ways.

If $\lim_{t\to\infty}\int_a^t g(x)\,dx = -\infty$ then $\lim_{t\to\infty}\int_a^t f(x)\,dx = -\infty$ as well, meaning $\int_a^\infty f(x)\,dx$ diverges.

If on the other hand $f(x) \ge g(x)$ and $\lim_{t\to\infty}\int_a^t g(x)\,dx = \infty$ then $\lim_{t\to\infty}\int_a^t f(x)\,dx = \infty$ as well,
which also means $\int_a^\infty f(x)\,dx$ diverges.

We might like to reverse these and say that if $\int_a^\infty g(x)\,dx$ converges, $\int_a^\infty f(x)\,dx$ must as well,
but $\int_a^\infty f(x)\,dx$ can diverge without going to infinity. $f(x)$ could oscillate between positive and
negative so that $\int_a^t f(x)\,dx$ increases and decreases and does not have a limit as $t \to \infty$.

We can actually solve the last issue by adding the assumption that $f(x)$ is non-negative. The result is
not easy to prove, but it is useful.

Theorem

Suppose $0 \le f(x) \le g(x)$ for all $x \ge a$.

If $\int_a^\infty f(x)\,dx$ diverges, then $\int_a^\infty g(x)\,dx$ diverges.
If $\int_a^\infty g(x)\,dx$ converges, then $\int_a^\infty f(x)\,dx$ converges.

There are similar versions of this theorem for integrals to −∞ or for functions that are non-positive.
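As an illustration (our own choice of functions, not from the text): for $x \ge 1$ we have $x^2 \ge x$, so $0 \le e^{-x^2} \le e^{-x}$. Since $\int_1^\infty e^{-x}\,dx = e^{-1}$ converges, the theorem says $\int_1^\infty e^{-x^2}\,dx$ converges too, even though $e^{-x^2}$ has no elementary antiderivative. The sketch checks the partial integrals with a midpoint rule and compares them with the closed form $\frac{\sqrt{\pi}}{2}\operatorname{erfc}(1)$, computed by Python's `math.erfc`.

```python
import math

def midpoint_rule(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def f(x):
    return math.exp(-x**2)   # no elementary antiderivative

def g(x):
    return math.exp(-x)      # antiderivative -e^{-x} is known

# 0 <= f <= g on [1, oo) because x^2 >= x there.
print(all(0 <= f(x) <= g(x) for x in [1, 1.5, 2, 5, 10]))

bound = math.exp(-1)         # the convergent integral of g over [1, oo)
for t in [2, 5, 10]:
    print(f"t = {t}: integral of f over [1, t] = {midpoint_rule(f, 1, t):.6f} <= {bound:.6f}")

# Closed form of the limit, via the complementary error function.
print(f"limit = {math.sqrt(math.pi) / 2 * math.erfc(1):.6f}")
```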

Section 2.5
Exercises

Summary Questions

Q1 What is an improper integral?


Q2 Under what conditions were we able to conclude that $\int_a^b f(x)\,dx = \int_a^b g(x)\,dx$?

Q3 What does it mean for an improper integral to converge or diverge?


Q4 If we know that $\int_a^\infty g(x)\,dx$ converges, what condition on $f(x)$ would guarantee that $\int_a^\infty f(x)\,dx$
converges?

2.5.1

Q5 In the expressions below, which of the boxes can legally be replaced by an ∞ symbol?

Z 4 8
5 1
lim x + 2 = 3 f (x) dx = e + x2 + 2x − log 7
|x|
x→ 1 0 6 1

Q6 Evaluate $\lim_{x\to\infty} \sqrt[4]{x^3 - 2x + 1}$.

Q7 Evaluate the following limits:

a $\lim_{x\to\infty} \frac{x^2 + 3x + 5}{e^x}$

b $\lim_{x\to-\infty} \frac{x^2 + 3x + 5}{e^x}$

Q8 Evaluate $\lim_{w\to\infty} \ln\left(\frac{1}{w}\right)$.


2.5.2

Q9 Evaluate $\int_0^3 \frac{x^2}{x}\,dx$. Explain how you dealt with any discontinuities.

Q10 Let
$$f(x) = \begin{cases} 4 & \text{if } x = 1, 4, \text{ or } 6 \\ 2 & \text{otherwise} \end{cases}$$

a Sketch the graph y = f (x).


b Evaluate $\int_0^5 f(x)\,dx$. State what tool you used to deal with any discontinuities.

Q11 Let
$$g(x) = \begin{cases} \sqrt{x} & \text{if } 0 \le x \le 4 \\ 3 & \text{if } 4 < x < 6 \\ \dfrac{1}{x^2} & \text{if } 6 \le x \end{cases}$$
Compute $\int_1^8 g(x)\,dx$.

Q12 The sign function has the form
$$\sigma(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \end{cases}$$
Write a formula (in terms of $a$ and $b$) for $\int_a^b \sigma(x)\,dx$. Your answer will be a piecewise expression.

2.5.3

Q13 Consider the integral $\int_{-2}^2 \frac{1}{x}\,dx$.

a Sketch the graph of $y = \frac{1}{x}$.

b Set up the limits that would compute this integral.

c Do these limits exist?

Q14 Evaluate $\int_0^1 \ln x\,dx$.

Q15 Evaluate $\int_0^4 \left(\frac{1}{\sqrt{x}} + \frac{1}{\sqrt{4-x}}\right) dx$.

Q16 Evaluate $\int_0^3 \frac{2}{w^2}\,dw$.

2.5.4

Q17 How large will the base (∆x) of each rectangle be, if we want to approximate:

a The area over the interval [4, 16] with 3 rectangles?

b The area over the interval [a, b] with n rectangles?

c The area over the interval [a, ∞) with n rectangles?


Q18 Compute $\int_3^\infty \frac{2}{x}\,dx$.

Q19 Compute $\int_{-\infty}^0 e^x\,dx$.

Q20 Evaluate $\int_0^\infty e^{-2x}\,dx$.

Q21 Evaluate $\int_0^1 \ln x\,dx$. You may need l’Hôpital’s rule.

Q22 Compute $\int_3^\infty \frac{1}{x^3}\,dx$, showing all necessary steps.


2.5.5

Q23 Compute
$$\int_{-\infty}^\infty x e^{-x^2}\,dx.$$

Q24 Show how to evaluate $\int_{-\infty}^\infty x^{1/3}\,dx$ or show that it diverges.

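Q25 Let
$$f(x) = \begin{cases} \dfrac{1}{x^3} & \text{if } x < -2 \\ \dfrac{1}{(x+4)^2} & \text{if } x \ge -2 \end{cases}$$
Evaluate $\int_{-\infty}^\infty f(x)\,dx$.

Q26 How would you write $\int_{-\infty}^\infty \frac{1}{1+x^2}\,dx$ as a sum of two limits? You might recall that $\int \frac{1}{1+x^2}\,dx =
\tan^{-1} x + c$. Use this to evaluate the integral.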

Extension and Synthesis

Q27 Let
$$f(x) = \begin{cases} \sqrt[3]{x} & \text{if } x < 8 \\ 10 - x & \text{if } x \ge 8 \end{cases}$$

a Is $f(x)$ continuous? Justify your answer with a calculation.

b What is the area enclosed by $y = f(x)$ and $y = 0$?

Q28 Let
$$f(x) = \begin{cases} x^{-4/3} & \text{if } x < -8 \\ \dfrac{1}{\sqrt[3]{x}} & \text{if } -8 \le x < 0 \\ e^{-x} & \text{if } x \ge 0 \end{cases}$$
Evaluate $\int_{-\infty}^\infty f(x)\,dx$.
Q29 Consider the region $R$ below $y = \frac{1}{x}$, above $y = 0$ and to the right of $x = 1$.

a Try to compute the area of R using an integral.

b Suppose R is rotated around the x-axis to create a solid S. Compute the volume of S.

c How annoying are the conclusions of a and b ?

Q30 Consider the region in the first quadrant whose boundary consists of the curves $y = \sqrt[3]{x}$, $y = 2x - 1$ and
$y = 0$.

a Write the area of this region as an integral in the variable y. Do not evaluate.

b Suppose this region is rotated around the x-axis. Write the resulting volume using one or
more integrals. Do not evaluate.

Section 2.6

Probability
Goals:

1 Test the properties of a probability density function.

2 Use probability density functions to describe the underlying random variable.

3 Use the uniform, exponential, and normal distributions.

4 Compute probabilities and expected values.

The main problem facing every planner is uncertainty. When will the next epidemic strike? Will the
stock market go up or down? How many rare particles will flow through a detection device? These
outcomes cannot be known ahead of time, but they can be modeled as probabilities. Knowing when the
epidemic is likely to happen can guide our decision of how much to invest in mitigation. Knowing how
many particles are likely to pass through an area can inform us how sensitive our detection device needs
to be.
On the other hand, probabilities can also help us understand what has already happened. Probabilities
tell us whether the results of an experiment are likely to be a coincidence. Is an apparent pattern just
the variation inherent in random sampling, or is it likely to be present if the procedure is repeated? This
is in fact the basic model for statistical reasoning:

1 Assume that the type of pattern you’re looking for does not exist (a null hypothesis).
2 Collect observations.

3 Compute the probability of seeing those observations, given your assumption.


4 If the probability is very low, then the assumption is probably false.

Such reasoning allows us to conclude that a survey is representative of the population as a whole. It
allows us to understand what outcome will occur on average, or how much outcomes are likely to vary.
Such statistics help us understand the way the world works. We can design our next experiment or plan
our future behavior around that understanding. For example, on average, the stock market goes up.
This is one of the most powerful financial facts available to long-term investors, and it can be grounded
in a probabilistic study of past performance.

Question 2.6.1
What Is a Continuous Probability Distribution?

Definition

A random variable encodes the possible outcomes of a random selection. We use the notation
P (outcome) to denote the probability that a particular outcome occurs. If an outcome is impossible,
we write P (outcome) = 0. If it is certain we write P (outcome) = 1.

Example

Our outcome can be any expression concerning the random variable, for instance:
If $S$ is the sum of the rolls of two six-sided dice, then
$$P(S = 8) = \frac{5}{36}.$$
If $T$ is the number of tails when two coins are flipped, then
$$P(T \ge 1) = \frac{3}{4}.$$
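Both probabilities can be confirmed by listing every equally likely outcome and counting. The sketch below assumes fair dice and fair coins, so each ordered outcome is equally likely. It uses exact fractions to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Assume fair dice: all 36 ordered rolls of two six-sided dice are equally likely.
rolls = list(product(range(1, 7), repeat=2))
p_sum_is_8 = Fraction(sum(1 for a, b in rolls if a + b == 8), len(rolls))

# Assume fair coins: all 4 ordered outcomes of two flips are equally likely.
flips = list(product("HT", repeat=2))
p_at_least_one_tail = Fraction(sum(1 for flip in flips if "T" in flip), len(flips))

print(p_sum_is_8, p_at_least_one_tail)   # 5/36 3/4
```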

We can encode these probabilities with a distribution function. The value of the function at each
number a is the probability that the outcome is a.

Example

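If $T$ is the number of tails obtained from two fair coins then
$$f_T(t) = \begin{cases} \frac{1}{4} & \text{if } t = 0 \\ \frac{1}{2} & \text{if } t = 1 \\ \frac{1}{4} & \text{if } t = 2 \\ 0 & \text{if } t = \text{anything else} \end{cases}$$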

Notice

The probabilities add up to 1.


There are only finitely many values of T that are possible.

What if we wanted to model height with a random variable? No one is exactly 68 inches tall. Even
people who say they are “five feet eight inches” are slightly taller or shorter. A distribution function
like we made for coins is unsuitable. It would have the property fH (h) = 0 for all h. To handle this
situation, we need to define a different kind of random variable with a different relationship to a defining
function.


Definition

A continuous random variable $X$ is a random variable whose outcomes are real numbers, and whose
probability is modeled by a probability density function $f_X(x)$ such that
$$P(a \le X \le b) = \int_a^b f_X(x)\,dx.$$

$f_X(x)$ must satisfy

1 $f_X(x) \ge 0$ for all $x$.

2 $\int_{-\infty}^\infty f_X(x)\,dx = 1$.

Remark

The term density should give us a hint about how to think about these functions. Density is a rate.
The value of a probability density function tells you the rate of likelihood per unit of length on the real
number line. Integrating this rate over an interval gives the total likelihood of lying on that interval,
much like integrating a rate of change over an interval computes the total change.

An integral is the natural way to measure probability. The rules of integration are compatible with
our intuition of probability. Suppose we have an interval [a, b] broken into two or more subintervals. The
total probability of X having an outcome in [a, b] is equal to the sum of the probabilities of the outcome
lying in each subinterval. Similarly, the area above [a, b] and below the graph y = f (x) is equal to the
sum of the areas above each subinterval. In equations, these are the laws:

$$\begin{aligned}
P(a \le X \le c) + P(c \le X \le b) &= P(a \le X \le b) \\
\int_a^c f_X(x)\,dx + \int_c^b f_X(x)\,dx &= \int_a^b f_X(x)\,dx
\end{aligned}$$

Example 2.6.2
Describing a Random Variable from its Density Function

Consider the function


$$f_X(x) = \begin{cases} \frac{1}{9}x^2 & \text{if } 0 \le x \le 3 \\ 0 & \text{if } x > 3 \text{ or } x < 0 \end{cases}$$

a Verify that fX is a probability density function.

b If fX is the density function of X, compute P (X ≥ 2).

c What does fX tell us about the likely values of X?

Solution

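a We need to check that $f_X(x)$ is never negative and $\int_{-\infty}^\infty f_X(x)\,dx = 1$.

$f_X(x)$ is never negative, because it is either a positive multiple of a square or 0.

$$\begin{aligned}
\int_{-\infty}^\infty f_X(x)\,dx &= \int_{-\infty}^0 f_X(x)\,dx + \int_0^3 f_X(x)\,dx + \int_3^\infty f_X(x)\,dx \\
&= \int_{-\infty}^0 0\,dx + \int_0^3 \frac{1}{9}x^2\,dx + \int_3^\infty 0\,dx \\
&= \left[\frac{1}{27}x^3\right]_0^3 \\
&= \frac{1}{27}(27 - 0) \\
&= 1
\end{aligned}$$

b
$$\begin{aligned}
P(X \ge 2) &= \int_2^\infty f_X(x)\,dx \\
&= \int_2^3 f_X(x)\,dx + \int_3^\infty f_X(x)\,dx \\
&= \int_2^3 \frac{1}{9}x^2\,dx + \int_3^\infty 0\,dx \\
&= \left[\frac{1}{27}x^3\right]_2^3 \\
&= \frac{1}{27}(27 - 8) \\
&= \frac{19}{27}
\end{aligned}$$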

c Outcomes outside of [0, 3] are impossible. Among the outcomes in [0, 3], outcomes closer to 3 are
more likely than outcomes closer to 0, because the density function has a greater value there.

Figure: The density function of X and the area representing P (X > 2)

Main Ideas

To verify that a function is a probability density function, we need to check that it is never negative
and that it integrates, over the entire real line, to 1.
We compute the probability that X has an outcome in an interval by integrating fX (x) over that
interval.
Outcomes of X where fX (x) is large are more likely than outcomes where fX (x) is small.
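The first two main ideas can be checked with exact fractions. The sketch below assumes only the density $\frac{1}{9}x^2$ on $[0, 3]$ from Example 2.6.2 and its antiderivative $\frac{x^3}{27}$. It computes $P(a \le X \le b)$ by clipping $[a, b]$ to $[0, 3]$, because the density is zero elsewhere. The helper names are our own.

```python
from fractions import Fraction

def antiderivative(x):
    """Antiderivative of f_X(x) = x^2 / 9 on [0, 3]."""
    return Fraction(x) ** 3 / 27

def prob(a, b):
    """P(a <= X <= b) for the density of Example 2.6.2, clipping [a, b] to [0, 3]."""
    lo, hi = max(a, 0), min(b, 3)
    if lo >= hi:
        return Fraction(0)   # the interval misses the region of nonzero density
    return antiderivative(hi) - antiderivative(lo)

print(prob(0, 3))             # total probability: 1
print(prob(2, float("inf")))  # P(X >= 2) = 19/27
```

Additivity over subintervals also holds. For example, `prob(0, 1) + prob(1, 3)` equals `prob(0, 3)`.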

Figure: The density function of X and the areas that represent the likelihood of larger and smaller outcomes

Question 2.6.3
What Density Functions Arise Naturally?

The requirements to be a probability density function are not very strict. The vast majority of prob-
ability density functions do not model a real life phenomenon or even an intriguing thought experiment.
What follows are three families of density functions that are especially useful. The first is the simplest.
When we lack data to suggest otherwise, it is a common choice when creating a model with some
randomness.

Definition

Given an interval [a, b], the uniform distribution on [a, b] is given by


$$f_X(x) = \begin{cases} \frac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{if } x > b \text{ or } x < a \end{cases}$$

Notice that the shorter the interval $[a, b]$ is, the higher the density must be to integrate to a total
probability of 1.


Figure: The density function of a uniform distribution

An intuitive but imprecise way to describe a random variable with a uniform distribution is to say that
all outcomes in $[a, b]$ are equally likely. Since every outcome of a continuous random variable occurs with
probability 0, this is unhelpful. What makes $X$ special is that all outcomes in $[a, b]$ have equal probability
density. To connect this to actual probabilities, we might say that all subintervals of $[a, b]$ are equally
likely to contain the outcome of $X$, but this is incorrect. $X$ is 3 times as likely to have an outcome in
an interval of length 6 as in an interval of length 2. A precise statement would be: the likelihood of the
outcome of $X$ occurring in each subinterval of $[a, b]$ is proportional to the length of the subinterval.
Our second family of random variables naturally measures waiting time. It answers questions like:
When will the next customer come in? When will this device next detect a certain type of ambient
particle? Here is the formal definition.

Definition

Suppose an event happens randomly and uniformly at an average rate of λ times per unit of time (x).
Then the amount of time until it next occurs is given by the exponential distribution:
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } 0 \le x \\ 0 & \text{if } x < 0 \end{cases}$$

Observe the following:


1 Higher $\lambda$ means that $X$ is likely to be smaller, as the event occurs sooner.
2 The probability of the event occurring in a given interval, given that it did not occur before that
interval, depends only on the length of the interval.

Figure: The density function of an exponential distribution

The second point is best illustrated with a concrete example.

Example

Gravitational waves large enough to detect pass through the earth from time to time. Suppose we
switch on a gravitational wave detector, and the time (in days) until the first detection is modeled by
the exponential random variable $X$ with density function $0.7e^{-0.7x}$.

The probability that the first detection occurs within two days is approximately 0.75 (exactly $1 - e^{-1.4} \approx 0.753$).
If the first detection does not occur in the first two days, then the probability that it occurs in the
following two days is again approximately 0.75.
If the first detection does not occur in the first four days, then the probability that it occurs in
the following two days is again approximately 0.75.

And so on.
From this we can compute

$$\begin{aligned}
P(2 \le X \le 4) &\approx \underbrace{(1 - P(X \le 2))}_{X \text{ is not in the first two days}}(0.75) \\
&\approx (0.25)(0.75) \\
&= 0.1875
\end{aligned}$$
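To see where the rounded values 0.75 and 0.1875 come from, here is a minimal sketch. It uses the cumulative probability $P(X \le x) = 1 - e^{-0.7x}$, which follows from integrating the density. It shows that the conditional probability for the next two days equals the probability for the first two days. It also shows that the exact value of $P(2 \le X \le 4)$ is about 0.186, close to the rounded 0.1875.

```python
import math

rate = 0.7  # lambda in the density 0.7 e^{-0.7x}, detections per day

def cdf(x):
    """P(X <= x) for an exponential random variable with parameter `rate`."""
    return 1 - math.exp(-rate * x) if x > 0 else 0.0

p_first_two = cdf(2)                                       # about 0.753
p_two_to_four = cdf(4) - cdf(2)                            # exact P(2 <= X <= 4)
p_next_two_given_none = p_two_to_four / (1 - cdf(2))       # conditional probability

print(round(p_first_two, 4), round(p_next_two_given_none, 4), round(p_two_to_four, 4))
```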

Our final family is the most famous, because it is the most generally applicable.

Definition

The normal distribution is sometimes called a bell curve. Many natural phenomena are normally
distributed. The formula is
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$


The anti-derivative of this density function cannot be expressed in terms of elementary functions.
Instead we can look up values in a table. The normal distribution has a special role in statistics:

Theorem [The Central Limit Theorem]

The distribution of the average of $n$ independent, identically distributed random variables with finite variance (for
instance, the results of performing the same experiment $n$ times) approaches a normal distribution as $n$ gets large.

This theorem helps explain why many natural measurements are approximated by bell curves. For
example, human height is affected by hundreds of factors, including individual genes, nutrition and
environment. If we view human height as an average of these factors, scaled with appropriate units,
then we expect human heights to be modeled by a normal random variable. Viewing a histogram of
human height statistics shows the expected bell curve.
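A small simulation makes the Central Limit Theorem concrete. As an assumption for illustration, each "experiment" here is a uniform random number on $[0, 1]$, which has mean 0.5 and variance $\frac{1}{12}$. Averages of $n$ of them should then be approximately normal with mean 0.5 and standard deviation $\sqrt{1/(12n)}$. About 68% of them should lie within one standard deviation of the mean. The sample sizes and the seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

n = 48            # number of independent uniform(0, 1) values in each average
trials = 2_000    # number of averages to generate

averages = [statistics.fmean(random.random() for _ in range(n)) for _ in range(trials)]

mean = statistics.fmean(averages)
sd = statistics.stdev(averages)
# For a normal distribution, about 68% of values lie within one standard deviation of the mean.
within_one_sd = sum(abs(a - mean) <= sd for a in averages) / trials

print(f"mean ~ {mean:.4f} (theory 0.5)")
print(f"sd   ~ {sd:.4f} (theory {(1 / (12 * n)) ** 0.5:.4f})")
print(f"fraction within one sd ~ {within_one_sd:.3f} (normal: 0.683)")
```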
The parameters in fX can be interpreted as follows:
µ is the average value of X. It corresponds to the peak of the bell curve.
σ is the standard deviation of X. Larger σ means that X has a larger probability of being far
from µ.
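The roles of $\mu$ and $\sigma$ can be checked numerically. The sketch below integrates the density to confirm a total area of 1 and shows that the probability of landing within one $\sigma$ of $\mu$ is the same (about 0.6827) for any choice of parameters. It uses `math.erf` from Python's standard library in place of a printed table. The parameter pairs are arbitrary illustrations, not data.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    """P(X <= x), computed with math.erf in place of a printed table."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def midpoint_rule(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

for mu, sigma in [(0, 1), (68, 3)]:    # arbitrary illustrative parameters
    total = midpoint_rule(lambda x: normal_pdf(x, mu, sigma), mu - 10 * sigma, mu + 10 * sigma)
    within = normal_cdf(mu + sigma, mu, sigma) - normal_cdf(mu - sigma, mu, sigma)
    print(f"mu={mu}, sigma={sigma}: area ~ {total:.6f}, P(|X - mu| <= sigma) = {within:.4f}")
```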

Figure: The density function (bell curve) of a normal distribution

Question 2.6.4
What Is the Expected Value of a Random Variable?

Expected value will be the first statistic we can compute for a random variable. Statistics of a data
set tell us something about the numbers in the data set. Statistics of a random variable should tell us
something about the outcomes of the random variable.
The expected value or average value of X describes what the average result will be, if you
let X take a value at random many times. It is typically denoted E[X] or with the letter µ.

Example

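Suppose we average our rolls of a six-sided die. As the number of rolls $n$ gets large, we’ll roll each
number close to $\frac{n}{6}$ times. The sum of the rolls will be approximately
$$1\left(\frac{n}{6}\right) + 2\left(\frac{n}{6}\right) + 3\left(\frac{n}{6}\right) + 4\left(\frac{n}{6}\right) + 5\left(\frac{n}{6}\right) + 6\left(\frac{n}{6}\right)$$

To compute the average, we divide by $n$. Fortunately, every term already has an $n$.

$$\mu = 1\left(\frac{1}{6}\right) + 2\left(\frac{1}{6}\right) + 3\left(\frac{1}{6}\right) + 4\left(\frac{1}{6}\right) + 5\left(\frac{1}{6}\right) + 6\left(\frac{1}{6}\right) = 3.5$$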

In general, the result $a$ will occur approximately $n f_X(a)$ times in $n$ evaluations of $X$.
When we add up the results and divide by $n$, we obtain the following weighted average:

Formula

The expected value of a (discrete) random variable $X$ with probability distribution function $f_X$ is
$$E[X] = \sum_x x f_X(x)$$

where $x$ is summed over all possible outcomes of $X$.
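The discrete formula translates directly into a weighted sum. The sketch below represents a distribution function as a dictionary from outcomes to probabilities (our own representation, chosen for illustration) and reproduces the value 3.5 for a fair die exactly.

```python
from fractions import Fraction

def expected_value(dist):
    """Weighted average: the sum of x * f_X(x) over the outcomes of a discrete distribution."""
    return sum(x * p for x, p in dist.items())

fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(fair_die))   # 7/2, i.e. 3.5
```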

To produce the corresponding formula for a continuous random variable, instead of multiplying
each outcome by its probability and summing, we multiply each outcome by its density and integrate.

Formula

The expected value of a continuous random variable $X$ with probability density function $f_X$ is
$$E[X] = \int_{-\infty}^\infty x f_X(x)\,dx$$

Example 2.6.5
The Expected Value of a Uniform Random Variable

Compute the expected value of a uniform random variable on [a, b].

Solution

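We’ll apply the formula. Since $f_X(x)$ has discontinuities at $a$ and $b$, we will break it into three parts.
$$\begin{aligned}
E[X] &= \int_{-\infty}^\infty x f_X(x)\,dx \\
&= \int_{-\infty}^a x(0)\,dx + \int_a^b x\,\frac{1}{b-a}\,dx + \int_b^\infty x(0)\,dx \\
&= \left[\frac{1}{2(b-a)}x^2\right]_a^b \\
&= \frac{1}{2(b-a)}b^2 - \frac{1}{2(b-a)}a^2 \\
&= \frac{b^2 - a^2}{2(b-a)} \\
&= \frac{(b-a)(b+a)}{2(b-a)} \\
&= \frac{b+a}{2}
\end{aligned}$$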

Notice that this is the midpoint of the interval [a, b]. Since X is uniformly distributed across the interval,
we’d expect the average value to occur at the midpoint.
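A numerical version of the same integral confirms the midpoint result for a few arbitrary intervals. The sketch integrates $x f_X(x)$ over $[a, b]$ only, since the integrand is 0 elsewhere. The midpoint rule happens to be exact for the linear integrand $\frac{x}{b-a}$, so the agreement is up to floating-point rounding. The helper names are our own.

```python
def midpoint_rule(f, a, b, n=1_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def uniform_expected_value(a, b):
    """E[X] for X uniform on [a, b], integrating x * f_X(x) numerically."""
    density = 1 / (b - a)
    # Outside [a, b] the integrand x * f_X(x) is 0, so only [a, b] contributes.
    return midpoint_rule(lambda x: x * density, a, b)

for a, b in [(0, 1), (2, 10), (-3, 5)]:
    print(f"[{a}, {b}]: numeric E[X] = {uniform_expected_value(a, b):.6f}, midpoint = {(a + b) / 2}")
```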

Main Ideas

$E[X]$ typically lies somewhere in the middle of the possible outcomes of $X$. For a density function that is
symmetric about a midpoint, $E[X]$ is that midpoint.

Example 2.6.6
The Expected Value of an Exponential Random Variable

a Compute the expected value of an exponential random variable.

b Explain why the role of λ in the answer to a makes sense.

Solution

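a We will use the formula. Even after removing the region of 0 density, we are left with an improper
integral. We therefore will compute a limit. We integrate by parts with $u = x$ and $dv = \lambda e^{-\lambda x}\,dx$,
so $du = dx$ and $v = -e^{-\lambda x}$.

$$\begin{aligned}
E[X] &= \int_{-\infty}^\infty x f_X(x)\,dx \\
&= \int_{-\infty}^0 x(0)\,dx + \int_0^\infty x\lambda e^{-\lambda x}\,dx \\
&= \lim_{t\to\infty} \int_0^t x\lambda e^{-\lambda x}\,dx \\
&= \lim_{t\to\infty} \left( \Big[-xe^{-\lambda x}\Big]_0^t - \int_0^t -e^{-\lambda x}\,dx \right) \\
&= \lim_{t\to\infty} \left[-xe^{-\lambda x} - \frac{1}{\lambda}e^{-\lambda x}\right]_0^t \\
&= \lim_{t\to\infty} \left(-te^{-\lambda t} - \frac{1}{\lambda}e^{-\lambda t} + 0e^0 + \frac{1}{\lambda}e^0\right) \\
&= \lim_{t\to\infty} \left(-te^{-\lambda t} - 0 + 0 + \frac{1}{\lambda}\right) \\
&= \frac{1}{\lambda} + \lim_{t\to\infty} -\frac{t}{e^{\lambda t}} && \left(\tfrac{\infty}{\infty} \text{ form}\right) \\
&= \frac{1}{\lambda} + \lim_{t\to\infty} -\frac{1}{\lambda e^{\lambda t}} && \text{(l'Hôpital's rule)} \\
&= \frac{1}{\lambda} + 0
\end{aligned}$$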

Our final answer is
$$E[X] = \frac{1}{\lambda}$$

b $X$ measures the time until an event with average frequency $\lambda$ occurs. Thus on average, we expect
to wait $\frac{1}{\lambda}$ for it. For example, if an event occurs three times per hour, we would expect to wait
about 20 minutes for it to occur.
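The limit can also be approximated numerically. The sketch below uses $\lambda = 3$ events per hour, matching the three-per-hour illustration in part b. It truncates the improper integral at 20 hours, an arbitrary cutoff chosen because the neglected tail is of order $e^{-60}$. The result should be close to $\frac{1}{3}$ hour, or 20 minutes.

```python
import math

def midpoint_rule(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

lam = 3.0  # three events per hour

def integrand(x):
    return x * lam * math.exp(-lam * x)   # x * f_X(x) for x >= 0

# Truncate the improper integral at 20 hours; the neglected tail is of order e^{-60}.
mean_hours = midpoint_rule(integrand, 0, 20)
print(f"E[X] ~ {mean_hours:.6f} hours ~ {60 * mean_hours:.2f} minutes; 1/lambda = {1 / lam:.6f}")
```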


Figure: The expected value of an exponential random variable

Main Idea

For asymmetric density functions, E[X] will not be in the middle of the range of values. It will be pulled
toward regions of higher likelihood.

Synthesis 2.6.7
Median Wait Time

Suppose that an exponential random variable models the wait time of a random caller to a call
center.

a What is the median wait time?

b Explain graphically why the median wait time is less than the expected wait time.

Solution

a The median is the number m such that half the outcomes are larger than m and half are smaller.

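We can write this as the following equation and solve for $m$.

$$\begin{aligned}
P(X \le m) &= 0.5 \\
\int_{-\infty}^m f_X(x)\,dx &= 0.5 \\
\int_{-\infty}^0 f_X(x)\,dx + \int_0^m f_X(x)\,dx &= 0.5 && \text{(presumably } m > 0\text{)} \\
\int_{-\infty}^0 0\,dx + \int_0^m \lambda e^{-\lambda x}\,dx &= 0.5 \\
\Big[-e^{-\lambda x}\Big]_0^m &= 0.5 \\
-e^{-\lambda m} + e^0 &= 0.5 \\
-e^{-\lambda m} &= -0.5 \\
-\lambda m &= \ln 0.5 \\
m &= \frac{1}{\lambda}\ln 2
\end{aligned}$$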

b The median is the point such that half the area under $y = f_X(x)$ lies on either side. The expected
value is a weighted average: a few outcomes far to one side can balance many outcomes slightly to the
other side. The outcomes of $X$ extend to $\infty$ on the right but only to 0 on the left. These distant
outcomes pull the average to the right, but their distant position has no effect on the median.

Figure: The median M and expected value µ of an exponential random variable


Main Idea

The median is the value m such that half the area under y = fX (x) lies on either side of x = m.
We compute the median by setting P (X ≤ m) = 0.5 and solving for m.

Median is not the same as expected value. y = fX (x) may have more area on one side of E[X]
than the other, if the smaller side’s area is farther from the middle.
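A short check of the median formula for a few arbitrary rates: the sketch below confirms that $P(X \le \frac{\ln 2}{\lambda}) = 0.5$ and that the median is always smaller than the mean $\frac{1}{\lambda}$. It uses the cumulative probability $1 - e^{-\lambda x}$ from the solution.

```python
import math

def exponential_cdf(x, lam):
    """P(X <= x) = 1 - e^{-lam * x} for an exponential random variable, x >= 0."""
    return 1 - math.exp(-lam * x)

for lam in [0.5, 3.0, 10.0]:          # arbitrary example rates
    median = math.log(2) / lam        # m = (1/lambda) ln 2
    mean = 1 / lam                    # E[X] = 1/lambda
    print(f"lambda = {lam}: P(X <= median) = {exponential_cdf(median, lam):.6f}, "
          f"median = {median:.4f} < mean = {mean:.4f}: {median < mean}")
```

The ratio of median to mean is always $\ln 2 \approx 0.693$, whatever $\lambda$ is.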

Section 2.6
Exercises

Summary Questions

Q1 Describe the difference between a continuous random variable and a non-continuous (discrete)
one.

Q2 How do we use a probability density function to compute the probability of an outcome?

Q3 What must be true about a probability density function?

Q4 How do you compute the expected value of a random variable?

2.6.1

Q5 How many possible outcomes does a continuous random variable have?

Q6 One of the following probability questions is different from the others. Explain why.

i. If you spin a prize wheel 3 times, what is the probability that your winnings add up to exactly
$80?
ii. If you flip two weighted (unfair) coins, what is the probability that exactly one of them comes
up tails?
iii. If you pick a random person, what is the probability that her height is exactly 68 inches?
iv. If I spin a wheel of names, what is the probability that it takes exactly 7 spins to land on my
own name?

Q7 Let X be a continuous random variable. Compute P (X = 13).


Q8 Another book might teach you that $P(a < X < b) = \int_a^b f_X(x)\,dx$, instead of $P(a \le X \le b) =
\int_a^b f_X(x)\,dx$. Why shouldn’t this bother you?

Q9 Let $f_T(t)$ be a probability density function of a random variable $T$. What quantity is represented
by $\int_{-\infty}^5 f_T(t)\,dt$?

Q10 Let $f_X(x)$ be a probability density function of a random variable $X$. What quantity is represented
by $\int_2^\infty f_X(x)\,dx$?

Q11 Given a density function $f_U(u)$ for a random variable $U$, write an integral or integrals to compute
$P(4 \le U^2 \le 9)$.

Q12 Suppose the height of a mature sunflower is given by the random variable $H$ with density function
$f_H(h)$. If your friend tells you that her sunflower is in the top quintile in height, explain how you
could use $f_H$ to determine a range that the height of her sunflower must lie in.

2.6.2

Q13 Let $W$ be a random variable with density function
$$f_W(w) = \begin{cases} \dfrac{36 - w^2}{144} & \text{if } 0 \le w \le 6 \\ 0 & \text{otherwise} \end{cases}$$
Compute $P(2 \le W \le 9)$.

Q14 Let $T$ be a random variable with density function
$$f_T(t) = \begin{cases} \dfrac{3\sqrt{t}}{2} & \text{if } 0 \le t \le 1 \\ 0 & \text{otherwise} \end{cases}$$
Compute $P\left(0 \le T \le \frac{1}{4}\right)$.


2.6.3

Q15 If $U$ is a uniform random variable on $[4, 7.5]$, compute the probability that $U \le 5.5$.

Q16 If X is a uniform random variable on [2, c] and P (0 ≤ X ≤ 4) = 0.25, what is c?

Q17 If $W$ is an exponential random variable such that $P(W \ge 1) = \frac{2}{7}$, then compute the value of the
parameter $\lambda$ in its density function $f_W$.

Q18 Juan looks at the density function of an exponential random variable X and says “X is more
likely to have the value 1 than 5.” “That’s silly,” replies Neha, “X has exactly zero probability
of being either of those. They are equally likely.” What do you think of their argument?

2.6.4

Q19 Let
$$f(x) = \begin{cases} bx^{-3} & x \ge 2 \\ 0 & x < 2 \end{cases}$$

a Compute a number b so that f is a probability density function.

b If f is the density function for some random variable Z, compute E[Z].

Q20 Suppose $X$ is a random variable with density function $f_X(x)$. Suppose $f_X(x)$ is 0 outside $[3, 11]$
and decreasing on $[3, 11]$. Is $E[X]$ greater or less than 7? Explain.

Q21 Suppose $X$ is a continuous random variable with probability density function
$$f_X(x) = \begin{cases} \dfrac{3\sqrt{x}}{16} & \text{if } 0 \le x \le 4 \\ 0 & \text{if } x > 4 \text{ or } x < 0 \end{cases}$$

a In a sentence or two, state what you would need to check to ensure that fX (x) is a valid
probability density function. You do not need to actually perform the calculations.

b Compute E[X].

Q22 Explain how you can use the graph of a normal random variable to identify the expected value.
Then compute that value using the expected value formula.

2.6.5

Q23 Give the expected value of a uniform random variable on [5.2, 9.4].

Q24 If the uniform random variable on [a, b] has expected value 7, and a = 3, what is b?

Q25 In this example, we divided by (b − a). What would happen if b − a = 0?

Q26 If you know the expected value $\mu$ of a uniform random variable $X$, what is the probability that
$X \ge \mu$? Is this problem answerable without the assumption that $X$ is uniform? Explain.

2.6.6

Q27 Suppose X and Y are two different exponential random variables modeling events that occur on
average p and 2p times per day respectively. How are their expected values related?

Q28 Does our expected value formula make sense if $\lambda < 0$? Why should this not bother us?

Q29 On bus route 70, 3 buses come per hour, on average.

a Write a probability density function for X, the amount of time until the next bus arrives.

b What is the expected amount of time until the next bus comes?

c How likely is it that you will wait more than an hour for the bus?

Q30 If $X$ is an exponential random variable, what is the probability that $X \le E[X]$?


2.6.7

Q31 Compute the median value of a uniform random variable on [a, b].

Q32 Let $W$ be a random variable with density function
$$f_W(w) = \begin{cases} \dfrac{36 - w^2}{144} & \text{if } 0 \le w \le 6 \\ 0 & \text{otherwise} \end{cases}$$
Compute the median value of $W$.

Q33 Let $T$ be a random variable with density function
$$f_T(t) = \begin{cases} \dfrac{3\sqrt{t}}{2} & \text{if } 0 \le t \le 1 \\ 0 & \text{otherwise} \end{cases}$$
Compute the median value of $T$.

Q34 Examine the graph of the density function of a normal random variable X. What is the median
of X? Explain how you can see this in the graph.

Extension and Synthesis

Q35 Suppose $X$ is a uniform random variable on $[a, b]$ and $P(3 \le X \le 4) = \frac{1}{2}$. Describe all possible
values of $a$ and $b$.

Q36 Suppose the random variable $W$ has the density function
$$f_W(w) = \begin{cases} k(7 - w) & \text{if } 1 \le w \le 7 \\ 0 & \text{if } w > 7 \text{ or } w < 1 \end{cases}$$

a What values of W are possible?

b What can you say about which values of W are more likely than others?

c Given that fW is a density function, what is the value of the constant k?

d What is the average value of W ?

e Can you compute the median value of W ? This might be easier with geometry than with
calculus.

Q37 Suppose that $g(x)$ is a probability density function for a random variable $X$ and $g(x) = 0$ for all
$x \ge 0$.

a What is the value of $\int_{-\infty}^0 g(x)\,dx$? Justify your answer with a sentence or computation.

b Give a formula for E[X]. Is it positive or negative? Justify your answer in a sentence or two.

Q38 Recall that an even function f (x) has the property that f (x) = f (−x) for all x. If the density
function of a random variable is even, what does that say about the expected value and median
of X? Explain your answer.

Section 2.7

Functions of Random Variables


Goals:

1 Compute expected values of functions of a random variable.


2 Compute the average value of a function.

3 Compute the variance of a random variable.

Sometimes the quantity modeled by a random variable is not the quantity we actually care about. For
example, while we might have a model for how many people will contract a disease, what we actually
would like to predict is how many healthcare resources they will require. The number of patients
determines the required resources, so mathematically, resources is a function of patients. Expected
values of such functions turn out to be straightforward to compute. A natural way to generate statistics
about a random variable is to write a function that measures something interesting and compute its
expected value.

Question 2.7.1
What Is a Function of a Random Variable?

When we write a function g(X) of a random variable X, then the output Y of this function is itself
a random variable. These functions are most intuitive with a discrete random variable. In this case we
can compute Y ’s probability distribution function by applying g to each outcome of X and summing
the probabilities that produce each output.

Example

Let $X$ be a discrete random variable with probability distribution function $f_X(x)$. If $Y = g(X) = X^2$
then $Y$ is a random variable and we can compute its probability distribution function $f_Y(y)$.

$$f_X(x) = \begin{cases} 0.1 & \text{if } x = 0 \\ 0.2 & \text{if } x = 2 \\ 0.3 & \text{if } x = 3 \\ 0.4 & \text{if } x = -2 \\ 0 & \text{otherwise} \end{cases}
\qquad
f_Y(y) = \begin{cases} 0.1 & \text{if } y = 0 \\ 0.6 & \text{if } y = 4 \\ 0.3 & \text{if } y = 9 \\ 0 & \text{otherwise} \end{cases}$$

Since $X = 2$ and $X = -2$ both produce $Y = 4$, we added their probabilities together.
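The procedure (apply $g$ to each outcome, then add the probabilities that land on the same output) can be sketched in a few lines. The dictionary representation and the function name are our own choices. Exact fractions avoid floating-point noise when 0.2 and 0.4 are added.

```python
from collections import defaultdict
from fractions import Fraction

def distribution_of_function(f_X, g):
    """Distribution of Y = g(X): add up f_X(x) over every x with the same value g(x)."""
    f_Y = defaultdict(Fraction)          # Fraction() is 0
    for x, p in f_X.items():
        f_Y[g(x)] += p
    return dict(f_Y)

f_X = {0: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10), -2: Fraction(4, 10)}
f_Y = distribution_of_function(f_X, lambda x: x**2)
print(f_Y)
```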

The function g does not need to be algebraically defined.

Example

Let $X$ be a discrete random variable whose outputs are integers from 1 to 100, uniformly distributed
(meaning each occurs with probability $\frac{1}{100}$). Let $N$ give the number of digits of $X$. Then $N$ has
distribution function
$$f_N(n) = \begin{cases} \frac{9}{100} & \text{if } n = 1 \\ \frac{90}{100} & \text{if } n = 2 \\ \frac{1}{100} & \text{if } n = 3 \\ 0 & \text{otherwise} \end{cases}$$

Question 2.7.2
How Do We Compute Expected Value of a Function?

In the case of a discrete random variable, we can compute expected value directly from the distribution
function.

Example

Let X be a discrete random variable whose outputs are integers from 1 to 100, uniformly distributed.
Let N give the number of digits of X.
     
$$E[N] = \left(\frac{9}{100}\right)(1) + \left(\frac{90}{100}\right)(2) + \left(\frac{1}{100}\right)(3) = 1.92$$

Alternatively, we could avoid using $f_N$ by directly applying the digits function to each outcome of X and taking a weighted average.

Example

   
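$$\begin{aligned}
E[N] &= \underbrace{\left(\frac{1}{100}\right)(1) + \cdots + \left(\frac{1}{100}\right)(1)}_{9 \text{ times}} \\
&\quad + \underbrace{\left(\frac{1}{100}\right)(2) + \cdots + \left(\frac{1}{100}\right)(2)}_{90 \text{ times}} \\
&\quad + \left(\frac{1}{100}\right)(3) \\
&= 1.92
\end{aligned}$$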
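To see that the two computations of E[N] agree, here is a small Python check using exact fractions. The dictionaries mirror $f_X$ and $f_N$ from the examples above, and `num_digits` is an illustrative helper, not a library function.

```python
from fractions import Fraction


def num_digits(x):
    """Number of decimal digits of a positive integer x."""
    return len(str(x))


f_X = {x: Fraction(1, 100) for x in range(1, 101)}  # uniform on 1..100
f_N = {1: Fraction(9, 100), 2: Fraction(90, 100), 3: Fraction(1, 100)}

# Way 1: weighted average of the outcomes of N, weighted by f_N.
e_from_fN = sum(n * p for n, p in f_N.items())
# Way 2: apply num_digits to every outcome of X, weighted by f_X.
e_from_fX = sum(num_digits(x) * p for x, p in f_X.items())

print(e_from_fN, e_from_fX, float(e_from_fN))  # 48/25 48/25 1.92
```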


In general this gives us two ways to compute the expected value of a function.

Formulas

If $Y = g(X)$, then we can compute $E[Y]$ from $f_X$ or from $f_Y$:

$$E[Y] = \sum_{\text{outcomes } y_i} y_i\, f_Y(y_i) \qquad\qquad E[Y] = \sum_{\text{outcomes } x_i} g(x_i)\, f_X(x_i)$$

Remarks

We can equate these formulas by substituting

$$f_Y(y_i) = \sum_{g(x_j) = y_i} f_X(x_j)$$

into the first formula. All that remains is to distribute each $y_i$ over its sum and use $y_i = g(x_j)$.

Both formulas will get us to the answer, but the second one skips the step of finding a distribution function for Y.
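The two formulas can also be written as two small functions and compared on the distribution of X from the earlier $Y = X^2$ example. This is a sketch, assuming the same dictionary representation of a discrete distribution; both function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def expected_value_via_fY(f_X, g):
    """E[g(X)] computed by first building f_Y, then averaging the outcomes of Y."""
    f_Y = defaultdict(Fraction)
    for x, p in f_X.items():
        f_Y[g(x)] += p
    return sum(y * p for y, p in f_Y.items())


def expected_value_via_fX(f_X, g):
    """E[g(X)] computed directly as a weighted average of g over the outcomes of X."""
    return sum(g(x) * p for x, p in f_X.items())


def square(x):
    return x ** 2


f_X = {0: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10), -2: Fraction(4, 10)}
print(expected_value_via_fY(f_X, square), expected_value_via_fX(f_X, square))  # 51/10 51/10
```

The second function never builds $f_Y$, which is exactly the step the second formula skips.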

In the case of a continuous random variable X, computing the expected value of Y = g(X) directly from the definition can be difficult. We would need to:

1 Find a density function $f_Y(y)$ such that
$$\int_a^b f_Y(y)\,dy = P(a \le g(X) \le b) \quad \text{for all } a \text{ and } b.$$

2 Integrate $E[Y] = \int_{-\infty}^{\infty} y f_Y(y)\,dy$.

The first step is difficult for any but the simplest functions.
Fortunately, there is an integration analogue of the substitution and distribution argument for discrete variables. This allows us to compute the average outcome of Y as an average of g over the outcomes of X, weighted by the density of X.

Theorem

If Y = g(X) is a function of a continuous random variable X with density function $f_X(x)$, then

$$E[Y] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx$$

Notice that the expected value of X is a special case of this theorem. In this case, we are computing
the expected value of the function g(X) = X.

Example 2.7.3
Computing the Expected Value of a Function

Consider the random variable X with density function

$$f_X(x) = \begin{cases} \frac{1}{9}x^2 & \text{if } 0 \le x \le 3 \\ 0 & \text{if } x > 3 \text{ or } x < 0 \end{cases}$$

What is the expected value of $e^X$?

Solution

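Since we want $E[e^X]$, our function is $g(x) = e^x$.

$$\begin{aligned}
E[e^X] &= \int_{-\infty}^{\infty} e^x f_X(x)\,dx \\
&= \int_0^3 \frac{1}{9}x^2 e^x\,dx && \text{by parts: } u = \tfrac{1}{9}x^2,\ dv = e^x\,dx,\ du = \tfrac{2}{9}x\,dx,\ v = e^x \\
&= \left[\frac{1}{9}x^2 e^x\right]_0^3 - \int_0^3 \frac{2}{9}x e^x\,dx && \text{by parts again: } u = \tfrac{2}{9}x,\ dv = e^x\,dx,\ du = \tfrac{2}{9}\,dx,\ v = e^x \\
&= \left[\frac{1}{9}x^2 e^x\right]_0^3 - \left[\frac{2}{9}x e^x\right]_0^3 + \int_0^3 \frac{2}{9}e^x\,dx \\
&= \left[\frac{1}{9}x^2 e^x - \frac{2}{9}x e^x + \frac{2}{9}e^x\right]_0^3 \\
&= \frac{5e^3 - 2}{9}
\end{aligned}$$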

We can check whether our answer is reasonable. Since X has outcomes between 0 and 3, $e^X$ should have outcomes between 1 and $e^3 \approx 20.09$. Our expected value, $\frac{5e^3 - 2}{9} \approx 10.94$, should also fall in that range, and it does.
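As a numerical sanity check on this answer, we can approximate $\int_0^3 e^x f_X(x)\,dx$ with the midpoint rule. This is a swapped-in numerical method, not part of the calculation above; the helper `expected_value` and the number of subintervals are our own assumptions.

```python
import math


def expected_value(g, f, a, b, n=100_000):
    """Midpoint-rule approximation of E[g(X)] = integral of g(x) f(x) over [a, b].

    Assumes the density f is zero outside [a, b], so the integral over the
    whole real line reduces to an integral over [a, b].
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += g(x) * f(x)
    return total * h


def f_X(x):
    """Density of X on [0, 3]."""
    return x ** 2 / 9


approx = expected_value(math.exp, f_X, 0, 3)
exact = (5 * math.exp(3) - 2) / 9
print(approx, exact)
```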

Application 2.7.4
The Average Value of a Function

Sometimes people refer to the average value of a function without any reference to a random variable.
In this case, we understand the input variable to be uniformly distributed.

Definition

The average value of a function f from x = a to x = b is the expected value of f(X), where X is a uniform random variable on [a, b]. The density function is the constant $\frac{1}{b-a}$ on [a, b], so we can factor it out of the integral. We obtain the formula:

$$f_{\text{ave}} = \frac{1}{b-a}\int_a^b f(x)\,dx.$$

The number $f_{\text{ave}}$ has geometric significance as well. The signed area under the graph y = f(x) from x = a to x = b is

$$\text{Area} = \int_a^b f(x)\,dx.$$

The region under the horizontal line $y = f_{\text{ave}}$ over [a, b] is a rectangle with equal signed area:

$$\text{Area} = \text{width} \times \text{height} = (b-a)\left(\frac{1}{b-a}\int_a^b f(x)\,dx\right).$$

In other words, if we flattened the area under f into a rectangle, fave would be its height.

Figure: The graph of y = f (x) and the constant function y = fave

Example 2.7.5
Computing The Average Value of a Function

Compute the average value of $f(x) = xe^{x^2}$ between x = 1 and x = 3.

Solution

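$$\begin{aligned}
f_{\text{ave}} &= \frac{1}{3-1}\int_1^3 x e^{x^2}\,dx && u\text{-substitution: } u = x^2,\ du = 2x\,dx;\ x = 1 \Rightarrow u = 1,\ x = 3 \Rightarrow u = 9 \\
&= \frac{1}{2}\int_1^9 \frac{1}{2}e^u\,du \\
&= \frac{1}{4}\Big[e^u\Big]_1^9 \\
&= \frac{1}{4}\left(e^9 - e\right)
\end{aligned}$$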
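A similar midpoint-rule sketch can confirm this average value numerically. The helper `average_value` and the choice of 100,000 subintervals are illustrative assumptions, not something the text prescribes.

```python
import math


def average_value(f, a, b, n=100_000):
    """Midpoint-rule approximation of f_ave = (1 / (b - a)) * integral of f over [a, b]."""
    h = (b - a) / n
    integral = h * sum(f(a + (i + 0.5) * h) for i in range(n))
    return integral / (b - a)


def f(x):
    return x * math.exp(x ** 2)


approx = average_value(f, 1, 3)
exact = (math.exp(9) - math.e) / 4
print(approx, exact)
```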

Application 2.7.6
Variance

Suppose we wanted to plan ahead for the outcome of some random variable X. We might choose
to prepare for the circumstance in which X takes on the value E[X]. This is most likely to be a good
bet, but how much effort should we expend preparing for outcomes far from E[X]? It would help to
know how likely X is to be far from E[X]. We can model this with a distance function (actually we’ll
use distance squared) and compute the expected value of the distance function.

Definition

The variance of a random variable X is the expected value of $(X - E[X])^2$. If X is continuous with density function $f_X(x)$, we obtain the formula

$$E\left[(X - E[X])^2\right] = \int_{-\infty}^{\infty} (x - E[X])^2 f_X(x)\,dx$$

The square root of variance is the standard deviation. Standard deviation is often denoted by σ, and variance is often denoted by $\sigma^2$.

If the expected value of $(X - E[X])^2$ is larger, then X is more likely to be far from its expected value.


Figure: A density function with less variance and a density function with more variance

For example, we can compute the variance of X where X is a uniform random variable on [0, 8].

Solution

Variance is the expected value of $(X - E[X])^2$, so first we need to know the number E[X]. We showed earlier that for a uniform random variable, E[X] is the midpoint of the interval. In this case that is $\frac{8+0}{2} = 4$. Armed with this value, we can compute the variance.

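$$\begin{aligned}
E\left[(X-4)^2\right] &= \int_{-\infty}^{\infty} (x-4)^2 f_X(x)\,dx \\
&= \int_0^8 (x-4)^2 \frac{1}{8-0}\,dx && \text{because } f_X(x) = 0 \text{ outside } [0, 8] \\
&= \frac{1}{8}\int_0^8 \left(x^2 - 8x + 16\right)dx && \text{factor out } \tfrac{1}{8} \\
&= \frac{1}{8}\left[\frac{x^3}{3} - 4x^2 + 16x\right]_0^8 \\
&= \frac{1}{8}\left(\frac{512}{3} - (4)(64) + (16)(8) - 0 + 0 - 0\right) \\
&= \frac{1}{8}\left(\frac{128}{3}\right) \\
&= \frac{16}{3}
\end{aligned}$$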
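The same computation can be carried out exactly with Python fractions. This sketch assumes, as in the solution, that E[X] is the midpoint of [0, 8], and it uses the antiderivative $\frac{(x-4)^3}{3}$ of $(x-4)^2$ instead of expanding the square.

```python
from fractions import Fraction

a, b = 0, 8
mean = Fraction(a + b, 2)     # E[X] for a uniform random variable: the midpoint, 4
density = Fraction(1, b - a)  # constant density 1/8 on [0, 8]


def antiderivative(x):
    """An antiderivative of (x - mean)^2."""
    return (x - mean) ** 3 / 3


variance = density * (antiderivative(b) - antiderivative(a))
print(variance)  # 16/3
```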

Remarks

In order to compute variance, we need to know the expected value. We may first have to compute

$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx.$$

Variance is larger when the area under y = fX (x) is spread farther to both sides, away from E[X].

Section 2.7
Exercises

Summary Questions

Q1 What kind of object is a function of a random variable?

Q2 How do we compute the expected value of a random variable?

Q3 If someone mentions the “average value” of a function without mentioning what random variable
to use, what do you assume?

Q4 What function’s expected value is the variance?

2.7.1

Q5 Let X be a random variable that indicates how long from now an event will occur (in hours).
How could a random variable indicating how long until the event happens in minutes be defined
in terms of X?

Q6 Suppose the radius of a circle R is a random variable. How could we define a random variable to
express the area of the circle?

Q7 Dominic buys 200 shares of a stock for $60 each. At the end of the day, the stock is worth $V
per share, where V is a random variable. How could you express Dominic’s profit or loss from his
stock purchase with a random variable?

Q8 Suppose X is a random variable with outcomes in the range [2, 7]. What is the range of outcomes of the random variable $Y = \frac{3}{X^2}$?


2.7.2

Q9 Suppose X is a random variable and Y = cX for some number c. Explain using one or more rules of integration why E[Y] = cE[X].

Q10 Suppose X is a random variable and Y = X + d for some number d. Explain using one or more rules of integration why E[Y] = E[X] + d.

Q11 Let X be a uniform random variable on [2, 5] with density function fX . Write a density function
fY for Y = 10X. Explain how your density function differs from fX .

Q12 Let X be a uniform random variable on [0, 3]. Is $Y = X^2$ a uniform random variable on [0, 9]?
Provide evidence for your answer.

2.7.3

Q13 Let W be a random variable with density function

$$f_W(w) = \begin{cases} \frac{36 - w^2}{144} & \text{if } 0 \le w \le 6 \\ 0 & \text{otherwise} \end{cases}$$

Compute $E\left[\frac{1}{W}\right]$.

Q14 Let T be a random variable with density function

$$f_T(t) = \begin{cases} \frac{3\sqrt{t}}{2} & \text{if } 0 \le t \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Compute $E[T^3]$.

Q15 Let X be an exponential random variable. Compute $E[X^2]$.

Q16 Let g(x) = c be a constant function. Let X be a random variable. Compute E[g(X)].

2.7.4

Q17 Suppose that you are told that the average value of f(x) from x = a to x = b is 0.

a What geometric information does this give you about the graph y = f(x)? Be specific.

b Suppose you are told that f(x) is non-negative for all x. How does that affect your answer to part a?


Q18 Suppose you know that $f(x) = \sqrt[3]{x}$ has a positive average value over [a, b]. What does this tell you about a and b?

2.7.5

Q19 Compute the average value of $f(x) = x^2$ over [0, 3].

Q20 Compute the average value of g(x) = x sin x over [0, π].

Q21 Compute the average value of $f(x) = x^2 e^{3x}$ over [0, 2].

Q22 What happens if we try to compute the average value of $h(x) = \frac{1}{x^2}$ over [−2, 2]?

2.7.6

Q23 Compute the variance of an exponential random variable X. Note that you may already know
some components of this computation from earlier examples and exercises.

Q24 Compute the variance of a uniform random variable on [2, 7].


Q25 Let W be a random variable with density function

$$f_W(w) = \begin{cases} \frac{36 - w^2}{144} & \text{if } 0 \le w \le 6 \\ 0 & \text{otherwise} \end{cases}$$

Compute the variance of W . I’d suggest using a computer to help with the algebra.

Q26 Let T be a random variable with density function

$$f_T(t) = \begin{cases} \frac{3\sqrt{t}}{2} & \text{if } 0 \le t \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Compute the variance of T .

Synthesis and Extension

Q27 Let X be a random variable with density function $f_X$. Let Y = cX for some number c. Write a formula for $f_Y$.

Q28 Compute the value b such that the average value of $f(x) = x^2$ over [0, b] is 1.

Q29 Some people compute variance using the formula $\sigma^2 = E[X^2] - E[X]^2$. Explain why this formula is equivalent to the one we gave. (This is a famous calculation, so if you can't figure it out, look it up and try to explain each step.)

Chapter 3

Series

This chapter introduces the Taylor polynomial, which is a useful tool for approximating functions that
cannot be evaluated with arithmetic. As with the derivative and integral before it, we would like to
send the error in these approximations to 0. This requires us to take a new kind of limit called a series.
We will develop the tools to work with series, with the ultimate goal of defining and utilizing Taylor
series.

Contents
3.1 Taylor Polynomials
3.2 Sequences
3.3 Series
3.4 Power Series
3.5 Taylor Series
Section 3.1

Taylor Polynomials
Goals:

1 Approximate a function with a Taylor polynomial.


2 Compute error bounds for a Taylor polynomial.

When learning algebra and trigonometry, we learn to use exact values like $\sqrt{7}$ instead of decimal approximations like 2.646. This prevents us from introducing errors into our calculations. However, there are also advantages to approximation. Decimal approximations give us a much better sense of the size of a number than expressions like $\ln 873$ or $e^{9/5}$. (Which of these is larger?)
Unfortunately, arithmetic does not give us methods for approximating many quantities. Ideally, we would like a method of approximation whose accuracy is limited only by how much time we wish to spend computing. An example of this is long division. We can compute as many decimal places of $\frac{32}{13}$ as we want, getting closer and closer to the exact value. Of course, long division can only approximate fractions.
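Long division is itself an algorithm that produces one more correct digit per step. A minimal sketch, assuming positive integer inputs and taking the fraction in the text to be 32/13 (the function name is hypothetical):

```python
def long_division_digits(numerator, denominator, places):
    """Integer part and first `places` decimal digits of numerator/denominator
    (positive integers), computed one digit at a time as in pencil-and-paper long division."""
    integer_part, remainder = divmod(numerator, denominator)
    digits = []
    for _ in range(places):
        remainder *= 10  # "bring down" a zero
        digit, remainder = divmod(remainder, denominator)
        digits.append(digit)
    return integer_part, digits


whole, digits = long_division_digits(32, 13, 10)
print(f"{whole}." + "".join(str(d) for d in digits))  # 2.4615384615
```

Asking for more places simply runs the loop longer, which is the "accuracy limited only by time" property described above.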
The method we will develop in this section is called a Taylor polynomial. It gives us a way to
approximate otherwise incomputable functions. The starting point is the tangent line. The tangent
line was the motivation for developing the derivative, but its greatest benefit is not geometric. The
tangent line approximates the values of a function near the point of tangency. While the function may
be difficult to evaluate, the equation of the tangent line is linear. We can evaluate it by hand.

Question 3.1.1
How Can We Improve on a Linearization?

Formula

The linearization or tangent line to a function f(x) at a has the equation

$$L(x) = f(a) + f'(a)(x-a).$$

By design f and L have


1 Equal values at a.
2 Equal first derivatives at a.
This means that for values of x near a, L(x) and f(x) will have similar values. L(x), which is easy to compute, can be used as an approximation of f(x). As x travels away from a and y = f(x) curves away from its tangent line, this method will lose accuracy. We could make a better approximation if we could also match the second, third, and fourth derivatives of f(x). A line cannot do that, but a polynomial can.

Question 3.1.2
What Is a Taylor Polynomial?

A polynomial that mimics the first n derivatives of a function is called a Taylor polynomial. Here is
the formal definition.

Definition

The nth Taylor polynomial of f(x) at x = a is a degree n polynomial that shares the value and first n derivatives of f at x = a. Its formula is

$$T_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x-a)^k.$$

Remarks

The variable is x. $f^{(k)}(a)$ is not a function but a number.

$f^{(0)}$ is the zeroth derivative, meaning $f^{(0)}(a) = f(a)$.

0! is defined to be 1.

Example 3.1.3
Computing a Taylor Polynomial


a Find the degree 3 Taylor polynomial of $y = \sqrt{x}$ at x = 4.

b Use it to estimate $\sqrt{5}$.


Solution

a We will apply the equation of the Taylor polynomial where a = 4 and n = 3. Examining the formula shows we need to know the values of f and its first three derivatives at a = 4.

$$\begin{aligned}
f(x) &= x^{1/2} & f(4) &= 2 \\
f'(x) &= \tfrac{1}{2}x^{-1/2} & f'(4) &= \tfrac{1}{4} \\
f''(x) &= -\tfrac{1}{4}x^{-3/2} & f''(4) &= -\tfrac{1}{32} \\
f'''(x) &= \tfrac{3}{8}x^{-5/2} & f'''(4) &= \tfrac{3}{256}
\end{aligned}$$

We can plug these into the summation formula:

$$\begin{aligned}
T_3(x) &= \sum_{k=0}^{3} \frac{f^{(k)}(4)}{k!}(x-4)^k \\
&= \frac{f(4)}{0!}(1) + \frac{f'(4)}{1!}(x-4) + \frac{f''(4)}{2!}(x-4)^2 + \frac{f'''(4)}{3!}(x-4)^3 \\
&= \frac{2}{1} + \frac{1/4}{1}(x-4) - \frac{1/32}{2}(x-4)^2 + \frac{3/256}{6}(x-4)^3 \\
&= 2 + \frac{1}{4}(x-4) - \frac{1}{64}(x-4)^2 + \frac{1}{512}(x-4)^3
\end{aligned}$$

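b To approximate $\sqrt{5}$, notice $\sqrt{5} = f(5)$ and $f(5) \approx T_3(5)$.

$$\begin{aligned}
T_3(5) &= 2 + \frac{1}{4}(5-4) - \frac{1}{64}(5-4)^2 + \frac{1}{512}(5-4)^3 \\
&= 2 + \frac{1}{4}(1) - \frac{1}{64}(1) + \frac{1}{512}(1) \\
&= \frac{1024}{512} + \frac{128}{512} - \frac{8}{512} + \frac{1}{512} \\
&= \frac{1145}{512} \approx 2.2363
\end{aligned}$$

This is close to the true value $\sqrt{5} \approx 2.2361$.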
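The arithmetic in part b can be checked with exact fractions. This sketch assumes the values $f(4) = 2$, $f'(4) = \frac{1}{4}$, $f''(4) = -\frac{1}{32}$, $f'''(4) = \frac{3}{256}$ computed in part a; `taylor_poly` is an illustrative helper that evaluates the Σ formula for $T_n(x)$ from a list of derivative values.

```python
import math
from fractions import Fraction

# f(4), f'(4), f''(4), f'''(4) for f(x) = sqrt(x).
derivs_at_4 = [Fraction(2), Fraction(1, 4), Fraction(-1, 32), Fraction(3, 256)]


def taylor_poly(derivs, a, x):
    """Evaluate sum_k derivs[k] / k! * (x - a)^k, i.e. T_n(x) with n = len(derivs) - 1."""
    return sum(d / math.factorial(k) * (x - a) ** k for k, d in enumerate(derivs))


t3 = taylor_poly(derivs_at_4, 4, 5)
print(t3, float(t3))  # 1145/512 2.236328125
print(abs(float(t3) - math.sqrt(5)))  # size of the actual error
```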

Example 3.1.4
Writing a Sum in Σ Notation

As our Taylor polynomials get longer, we would like to condense them into Σ notation. Part of the challenge is choosing an expression that will produce all the terms of our sum. Write each of the following sums in Σ notation.

a 4 + 7 + 10 + 13 + 16 + 19 + 22

b 2 + 6 + 18 + 54 + 162 + 486

c −3 + 4 − 5 + 6 − 7 + 8 − 9 + 10

d $\frac{1}{4} + \frac{\sqrt{2}}{9} + \frac{\sqrt{3}}{16} + \frac{2}{25} + \frac{\sqrt{5}}{36}$

Solution

a The terms increase by 3 each time. Repeated addition is multiplication, in this case 3k plus some starting value. Starting with index k = 0 is convenient, because 3(0) = 0 at the starting value.

$$4 + 7 + 10 + 13 + 16 + 19 + 22 = \sum_{k=0}^{6} (4 + 3k)$$

b The terms are multiplied by 3 each time. Repeated multiplication is exponentiation, in this case $3^k$ times some starting value. Starting with index k = 0 is convenient, because $3^0 = 1$ at the starting value.

$$2 + 6 + 18 + 54 + 162 + 486 = \sum_{k=0}^{5} (2)(3^k)$$
k=0

c The absolute values of the terms of this sum could just be the values of the index variable. To create an alternating + and − pattern, we can multiply by $(-1)^k$.

$$-3 + 4 - 5 + 6 - 7 + 8 - 9 + 10 = \sum_{k=3}^{10} (-1)^k k$$

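d In a fraction, we can model the numerator and denominator separately. Writing the first term as $\frac{\sqrt{1}}{4}$ and the fourth as $\frac{\sqrt{4}}{25}$ reveals the pattern.

$$\frac{1}{4} + \frac{\sqrt{2}}{9} + \frac{\sqrt{3}}{16} + \frac{2}{25} + \frac{\sqrt{5}}{36} = \sum_{k=1}^{5} \frac{\sqrt{k}}{(k+1)^2}$$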
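Each Σ expression can be checked by letting Python generate the terms. In this sketch, `range(start, stop)` excludes `stop`, so a sum from k = 3 to k = 10 becomes `range(3, 11)`.

```python
import math

# a) sum_{k=0}^{6} (4 + 3k)
sum_a = sum(4 + 3 * k for k in range(0, 7))
# b) sum_{k=0}^{5} 2 * 3^k
sum_b = sum(2 * 3 ** k for k in range(0, 6))
# c) sum_{k=3}^{10} (-1)^k * k
sum_c = sum((-1) ** k * k for k in range(3, 11))
# d) sum_{k=1}^{5} sqrt(k) / (k + 1)^2
sum_d = sum(math.sqrt(k) / (k + 1) ** 2 for k in range(1, 6))

print(sum_a, sum_b, sum_c, sum_d)
```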

Example 3.1.5
A Taylor Polynomial in Σ Notation

Write the 10th degree Taylor polynomial for $f(x) = \frac{1}{x^2}$ centered at x = 3.

Solution

Computing 10 derivatives seems excessive, so we will compute 4 and try to find a pattern. We'll write $f(x) = x^{-2}$ and apply the power rule.

$$\begin{aligned}
f(x) &= x^{-2} \\
f'(x) &= -2x^{-3} \\
f''(x) &= 6x^{-4} \\
f'''(x) &= -24x^{-5} \\
f^{(4)}(x) &= 120x^{-6}
\end{aligned}$$

We observe:

The sign of these derivatives is alternating, which we can model with a $(-1)^k$.

The coefficients look like a factorial pattern, but offset. For example, when k = 2 we obtain 3! = 6. We model this with (k + 1)!.

The exponent of x decreases by 1 each step. We model it with −2 − k.

This suggests a general formula for the kth derivative.

$$f^{(k)}(x) = (-1)^k (k+1)!\,x^{-2-k}$$

We plug x = 3 into $f^{(k)}(x)$ and assemble the Taylor polynomial:

$$T_{10}(x) = \sum_{k=0}^{10} \frac{(-1)^k (k+1)!\,3^{-2-k}}{k!}(x-3)^k$$
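To gain some confidence in the derivative pattern, we can evaluate $T_{10}(x)$ near the center and compare it with $\frac{1}{x^2}$. This is a numerical sketch only; the function name `t10` and the sample points are our own choices.

```python
import math


def t10(x):
    """Degree 10 Taylor polynomial of 1/x^2 centered at 3,
    using the pattern f^(k)(3) = (-1)^k (k+1)! 3^(-2-k)."""
    total = 0.0
    for k in range(11):
        deriv_at_3 = (-1) ** k * math.factorial(k + 1) * 3.0 ** (-2 - k)
        total += deriv_at_3 / math.factorial(k) * (x - 3) ** k
    return total


for x in (2.5, 3.0, 3.5):
    print(x, t10(x), 1 / x ** 2)
```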

Question 3.1.6
How Accurate Is the Taylor Polynomial?

An approximation is much more useful if we can put a bound on its error. We will present an error bound theorem called “Taylor’s Inequality.” Taylor polynomials are effective approximations because they try to match the values and rates of change of the original function. In order to make a careful argument, we begin with the basic principle that we can compare functions using the values of their derivatives.

Theorem

Let f and g be differentiable functions. Consider an interval [a, b], and suppose f(a) = g(a).
1 If f′(x) = g′(x) on [a, b], then f(x) = g(x) on [a, b].
2 If f′(x) < g′(x) on [a, b], then f(x) < g(x) on (a, b].

Reasoning

Intuitive If two functions start at the same value at a, then the one that grows faster will have a higher value at b.
Formal The Fundamental Theorem of Calculus says

$$f(x) - f(a) = \int_a^x f'(t)\,dt \qquad g(x) - g(a) = \int_a^x g'(t)\,dt.$$

Since f(a) = g(a), comparing f(x) with g(x) amounts to comparing these integrals, and a larger integrand produces a larger integral.

Figure: Two functions with a common value at a: f (x) with a smaller derivative and g(x) with a
larger derivative.

Notation

Given a function f(x) and its nth Taylor polynomial $T_n(x)$ centered at a, the remainder at x is

$$R_n(x) = f(x) - T_n(x)$$


If we are using $T_n(x)$ to approximate f(x), the remainder is the negative of the error:

$$R_n(x) = -\big(T_n(x) - f(x)\big).$$

We should be very interested in knowing the value of $R_n(x)$. We will use our derivative comparison theorem to make two arguments:

1 If $f^{(n+1)}(x)$ is a constant M, then we can compute $R_n(x)$ exactly.

2 If $|f^{(n+1)}(x)| \le M$, then the error in 1 is the worst-case scenario.

Theorem

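If $f^{(n+1)}(x)$ is a constant M on [a, b], then

$$f(x) = T_{n+1}(x) = T_n(x) + \frac{M}{(n+1)!}(x-a)^{n+1}.$$

Beginning with our assumption about the (n+1)th derivatives and the equality of the nth derivatives at a, we can use our derivative comparison theorem to equate the nth derivatives on [a, b]. We can use that equality to equate the (n − 1)th derivatives on [a, b]. We continue this reasoning until we conclude that the functions are equal.

$$\begin{aligned}
\frac{d^{n+1}}{dx^{n+1}} f(x) &= \frac{d^{n+1}}{dx^{n+1}} T_{n+1}(x) = M && \text{on } [a, b] \\
\frac{d^{n}}{dx^{n}} f(a) = \frac{d^{n}}{dx^{n}} T_{n+1}(a) \quad\Longrightarrow\quad \frac{d^{n}}{dx^{n}} f(x) &= \frac{d^{n}}{dx^{n}} T_{n+1}(x) && \text{on } [a, b] \\
\frac{d^{n-1}}{dx^{n-1}} f(a) = \frac{d^{n-1}}{dx^{n-1}} T_{n+1}(a) \quad\Longrightarrow\quad \frac{d^{n-1}}{dx^{n-1}} f(x) &= \frac{d^{n-1}}{dx^{n-1}} T_{n+1}(x) && \text{on } [a, b] \\
&\;\;\vdots \\
\frac{d}{dx} f(a) = \frac{d}{dx} T_{n+1}(a) \quad\Longrightarrow\quad \frac{d}{dx} f(x) &= \frac{d}{dx} T_{n+1}(x) && \text{on } [a, b] \\
f(a) = T_{n+1}(a) \quad\Longrightarrow\quad f(x) &= T_{n+1}(x) && \text{on } [a, b]
\end{aligned}$$

In each row, the equality at a holds because the derivatives and values of a Taylor polynomial match the function at a.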

Remark

This theorem tells us that when $f^{(n+1)}(x)$ is a constant M,

$$R_n(x) = f(x) - T_n(x) = \frac{M}{(n+1)!}(x-a)^{n+1}.$$

But what if $f^{(n+1)}(x)$ is not a constant? In this case we will settle for a bound on $f^{(n+1)}(x)$.

Theorem [Taylor’s Inequality]

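If $|f^{(n+1)}(t)| \le M$ for all t between a and b, then for all x between a and b,

$$|R_n(x)| \le \frac{M}{(n+1)!}|x-a|^{n+1}.$$

To prove Taylor's Inequality, we compare the derivatives of f(x) with the worst-case scenario $w(x) = T_n(x) + \frac{M}{(n+1)!}(x-a)^{n+1}$. The derivatives $w^{(k)}(a)$ are the same as $T_n^{(k)}(a)$ and $f^{(k)}(a)$ for $0 \le k \le n$, and $\frac{d^{n+1}}{dx^{n+1}} w(x) = M$.

Because M is a bound on $\frac{d^{n+1}}{dx^{n+1}} f(x)$, we can run the same cascade as before, with inequalities in place of equalities:

$$\begin{aligned}
\frac{d^{n+1}}{dx^{n+1}} f(x) &\le \frac{d^{n+1}}{dx^{n+1}} w(x) = M && \text{on } [a, b] \\
\frac{d^{n}}{dx^{n}} f(a) = \frac{d^{n}}{dx^{n}} w(a) \quad\Longrightarrow\quad \frac{d^{n}}{dx^{n}} f(x) &\le \frac{d^{n}}{dx^{n}} w(x) && \text{on } [a, b] \\
\frac{d^{n-1}}{dx^{n-1}} f(a) = \frac{d^{n-1}}{dx^{n-1}} w(a) \quad\Longrightarrow\quad \frac{d^{n-1}}{dx^{n-1}} f(x) &\le \frac{d^{n-1}}{dx^{n-1}} w(x) && \text{on } [a, b] \\
&\;\;\vdots \\
\frac{d}{dx} f(a) = \frac{d}{dx} w(a) \quad\Longrightarrow\quad \frac{d}{dx} f(x) &\le \frac{d}{dx} w(x) && \text{on } [a, b] \\
f(a) = w(a) \quad\Longrightarrow\quad f(x) &\le w(x) && \text{on } [a, b]
\end{aligned}$$

In each row, the equality at a holds because the derivatives and values of a Taylor polynomial match the function at a.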

To finish the argument we need to:

1 Produce a lower bound for f in the same way, using $T_n(x) - \frac{M}{(n+1)!}(x-a)^{n+1}$ in place of w(x).

2 Solve the resulting inequalities for $R_n(x)$:

$$\begin{aligned}
T_n(x) - \frac{M}{(n+1)!}(x-a)^{n+1} \le\; &f(x) \le T_n(x) + \frac{M}{(n+1)!}(x-a)^{n+1} \\
-\frac{M}{(n+1)!}(x-a)^{n+1} \le\; &R_n(x) \le \frac{M}{(n+1)!}(x-a)^{n+1}
\end{aligned}$$

3 Repeat for intervals of the form [b, a]. These work the same way with a sign reversed.

Example 3.1.7
A Taylor Approximation Error Bound

Let f (x) = sin x.

a Give a general form for the nth Taylor polynomial for f at x = 0.

b Find a bound on f (n) (x) for each n.

c What happens to the error bound as x increases but n stays the same?

d What happens to the error bound as n increases but x stays the same?

e What does this tell us about the relationship between the Tn (x) approximations and f (x)?

Solution

a For the Taylor polynomial formula, we need to compute the derivatives of f(x).

$$\begin{aligned}
f(x) &= \sin x & f(0) &= 0 \\
f'(x) &= \cos x & f'(0) &= 1 \\
f''(x) &= -\sin x & f''(0) &= 0 \\
f'''(x) &= -\cos x & f'''(0) &= -1 \\
f^{(4)}(x) &= \sin x & f^{(4)}(0) &= 0 \\
f^{(5)}(x) &= \cos x & f^{(5)}(0) &= 1 \\
&\;\;\vdots & &\;\;\vdots
\end{aligned}$$

In order to write a general Taylor polynomial, we would need a general expression for $f^{(k)}(0)$. The pattern is obvious, but trying to express it as a formula is much more difficult. The solution is a trick worth remembering:
Since the even derivatives are zero, those terms do not appear in our Taylor polynomials. Since we want only odd terms in our summation, we can let our index variable be k, but make the exponent in each term 2k + 1. Thus as k goes from 0 to n, the summation will include only the odd terms $x^1$ through $x^{2n+1}$. We can produce the following chart to work out our coefficients:

$$\begin{array}{c|c} k & f^{(2k+1)}(0) \\ \hline 0 & 1 \\ 1 & -1 \\ 2 & 1 \\ 3 & -1 \\ \vdots & \vdots \end{array}$$
This is an easier pattern to express:

$$f^{(2k+1)}(0) = (-1)^k$$

Now we are ready to write a formula. Since we intend to sum from k = 0 to k = n, we are actually producing the (2n + 1)th Taylor polynomial.

$$T_{2n+1}(x) = \sum_{k=0}^{n} \frac{f^{(2k+1)}(0)}{(2k+1)!}x^{2k+1} = \sum_{k=0}^{n} \frac{(-1)^k}{(2k+1)!}x^{2k+1}$$

These are the odd degree Taylor polynomials, but what about the even degree ones? Since $T_{2n}(x)$ is just $T_{2n-1}(x)$ plus the 2nth term, and the 2nth term is zero, we can write

$$T_{2n}(x) = \sum_{k=0}^{n-1} \frac{(-1)^k}{(2k+1)!}x^{2k+1}$$

b Given the derivatives above, we can see that every derivative is a sine or a cosine (possibly negated). These are bounded above by 1 and below by −1. Since Taylor's Inequality requires a bound of the form $|f^{(n+1)}(x)| \le M$, we write

$$|f^{(n+1)}(x)| \le 1.$$

And luckily, this works for all x and all n.

c Taylor's Inequality says that $|R_n(x)| \le \frac{1}{(n+1)!}|x|^{n+1}$. As x goes to ∞, this bound goes to ∞ as well. This makes sense, since $T_n(x)$ is a polynomial, while the function it is approximating stays between −1 and 1.

d When n increases by 1, $|x|^{n+1}$ increases by a factor of |x|. On the other hand, (n + 1)! increases by a factor of n + 2. Once n + 2 > |x|, each increase of n shrinks the bound, and as n increases without bound, (n + 1)! grows faster than $|x|^{n+1}$ and their ratio approaches 0.

e Any $T_n(x)$ will eventually become inaccurate outside a certain distance from 0. On the other hand, if we want to approximate sin x for a particular x, we can make $T_n(x)$ have as small an error as we want by choosing sufficiently large n.


Figure: f (x) = sin x approximated by its Taylor polynomials, Tn (x)
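We can watch parts c and d happen numerically. The sketch below, assuming M = 1 and a fixed x = 3 (an arbitrary choice), compares the actual error of the odd degree Taylor polynomials of sin x with the bound from Taylor's Inequality; the helper names are hypothetical.

```python
import math


def sin_taylor(x, n):
    """T_{2n+1}(x) = sum_{k=0}^{n} (-1)^k x^(2k+1) / (2k+1)!, the Taylor polynomial of sin at 0."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1) for k in range(n + 1))


def taylor_bound(x, degree, M=1.0):
    """Right-hand side of Taylor's Inequality centered at 0: M |x|^(degree+1) / (degree+1)!."""
    return M * abs(x) ** (degree + 1) / math.factorial(degree + 1)


x = 3.0
rows = []
for n in range(8):
    degree = 2 * n + 1
    error = abs(math.sin(x) - sin_taylor(x, n))
    rows.append((degree, error, taylor_bound(x, degree)))

for degree, error, bound in rows:
    print(f"degree {degree:2d}: actual error {error:.2e}, bound {bound:.2e}")
```

Raising x while keeping n fixed makes the bound grow (part c), while raising n at fixed x drives both columns toward 0 (part d).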

Main Ideas

In order to understand how the error changes as n increases, we need to have an expression for
f (n) (x).

We can choose M to be the largest value of |f (n+1) | on the interval [a, x]. This may not be the
value of |f (n+1) (a)|.
In general, Taylor polynomials will become less accurate the farther you get from a.
We can often mitigate this inaccuracy by choosing larger values of n.

The (n + 1)! in Taylor’s Inequality might suggest that as n increases, the error in the nth Taylor
polynomial must shrink toward 0. However, this is not the case. Some functions are not well estimated
by their Taylor polynomial.

Example

$$f(x) = \begin{cases} 0 & \text{if } x \le 0 \\ e^{-1/x} & \text{if } x > 0 \end{cases}$$

$f^{(k)}(0) = 0$ for all k. So the Taylor polynomial at x = 0 is

$$T_n(x) = \sum_{k=0}^{n} 0 \cdot x^k = 0.$$

No matter how large n gets, Tn (x) will not get any closer to f (x) for any x > 0.

How can this happen, given Taylor’s Inequality? The derivatives of f get bigger and bigger. M
grows so fast that the error Rn (x) gets no smaller even with an (n + 1)! in the denominator of Taylor’s
Inequality.

Figure: A function whose derivative bounds grow factorially

Despite examples like this, it turns out that Taylor polynomials often do a good job of approximating
functions. For numerical computations, an approximation is good enough. For more theoretical situ-
ations, we would like to let n go to ∞ so that the error goes to 0 and we can use the polynomial as
an exact replacement of the function. Unfortunately, with infinitely many terms, we no longer have a
polynomial at all. Instead we have an object that we will call a Taylor series. We will develop the tools
to define and work with Taylor series over the course of this chapter.

Section 3.1
Exercises

Summary Questions

Q1 Why do we use Taylor polynomials?

Q2 Why is there a denominator of k! in the formula for a Taylor polynomial?

Q3 Explain why we’d always rather center a Taylor polynomial for y = ln x at x = 1.

Q4 What properties make a Taylor polynomial Tn (x) a better approximation of f (x)?

3.1.1

Q5 Suppose we use the linearization of $f(x) = \sqrt[3]{x}$ at x = 8 to approximate $\sqrt[3]{6}$.

a What is the relationship between $f(x) = \sqrt[3]{x}$ and $\sqrt[3]{6}$?

b Suppose L(x) is the linearization of f(x) at x = 8. Would you expect L(6) to overestimate or underestimate $\sqrt[3]{6}$? Explain in a sentence or two.

Q6 Suppose you were locked in a room with only a pencil and paper and asked to compute the first ten decimal places of the following numbers:

$$\frac{4}{17} \qquad \sqrt{7} \qquad e$$

Which could you compute?

For the ones you can compute, how would you do it?

3.1.2

Q7 Is a tangent line a Taylor polynomial?

Q8 Suppose $T_4(x)$ is the Taylor polynomial for f(x) centered at x = 10. List what information $T_4(x)$ and f(x) have in common, being as specific as possible.

Q9 If f(x) is a decreasing function, what can you say about the coefficients of any Taylor polynomial of f(x)?

Q10 Suppose f(x) has a Taylor polynomial

$$T_4(x) = 5 + 3(x-2) - \frac{1}{6}(x-2)^2 + 2(x-2)^4$$

a What is f (2)?

b Is f increasing or decreasing at x = 2?

c Is f concave up or concave down at x = 2?

3.1.3

Q11 Let $f(x) = e^x$.

a Find the degree 8 Taylor polynomial of y = f (x) centered at x = 0.

b How could you use this to estimate the value of e?

c Can you use sigma notation to write a general form for the degree n Taylor polynomial of $y = e^x$?

Q12 Let f (x) = ln x

a Write the 5th Taylor polynomial of f (x) at x = 1.

b Use your polynomial to approximate ln 2.


Q13 Write the 10th Taylor polynomial for f (x) = cos x centered at x = π.

Q14 Write the 4th Taylor polynomial for $f(x) = \frac{1}{x^2}$ centered at x = 5.

3.1.4

Q15 Write each of the following sums in Σ notation.

a 15 − 45 + 105 − 315 + 945

b 24 + 19 + 14 + 9 + 4 − 1 − 6

c $\frac{1}{8} + \frac{1}{18} + \frac{1}{50} + \frac{1}{72} + \frac{1}{98}$

Q16 Write each of the following sums in Σ notation.

a 11 − 13 + 15 − 17 + 19 − 21 + 23

b 384 + 192 + 96 + 48 + 24 + 12 + 6

c $\frac{2}{10} + \frac{3}{100} + \frac{4}{1000} + \frac{5}{10000}$

3.1.5

Q17 Write an expression in Σ notation for the 53rd Taylor polynomial of f(x) = ln x centered at x = 1.

Q18 Write an expression in Σ notation for the 15th Taylor polynomial of $f(x) = e^x$ centered at x = 0.

Q19 Write an expression in Σ notation for the 100th Taylor polynomial of f(x) = cos x centered at x = 0.

Q20 Write an expression in Σ notation for the 71st Taylor polynomial of $f(x) = \frac{1}{x^2}$ centered at x = 10.

3.1.6

Q21 Why don’t we have any theorems for a lower bound for error? Give your answer in a few sentences.

Q22 Suppose you are using Taylor polynomials of f(x) centered at x = 0 to approximate f(−3). However, for each k, the best bound you can put on $f^{(k)}(x)$ on [−3, 0] is $\frac{k!}{4^k}$. Will you be able to guarantee a good approximation of f(−3) this way? Explain.

Q23 Suppose the fourth derivative of f(x) is $f^{(4)}(x) = e^{x^3}$. Suppose we have written $T_4(x)$, the degree 4 Taylor polynomial of f(x) centered at x = 1. What can you say about the difference between $T_4(5)$ and f(5)? Be specific and justify your answer with a computation. You do not need to simplify any arithmetic in your calculations.

Q24 Sketch a graph of y = ex and several tangent lines. On which part of the graph do the tangent
lines appear to approximate the function better? Does Taylor’s Inequality confirm this observa-
tion? Explain.

3.1.7


Q25 Here is the degree 3 Taylor polynomial of $f(x) = \sqrt{x}$ centered at x = 4:

$$T_3(x) = 2 + \frac{1}{4}(x-4) - \frac{1}{64}(x-4)^2 + \frac{1}{512}(x-4)^3$$

a Which derivative will let you bound the error of this approximation?

b Can you put a bound on this derivative that holds for all x?

c Can you put a bound on this derivative that holds for x in the interval [4, 5]?

d What error bound does this suggest for using $T_3(5)$ to approximate $\sqrt{5}$?


Q26 Let $f(x) = \sqrt[3]{x}$.

a Write the degree 2 Taylor polynomial of f centered at x = 8.

b If you wanted to use the Taylor polynomial to approximate $\sqrt[3]{10}$, how would you do that?


c What bound could you place on the error in the approximation in b ?

Q27 Let f (x) = ex .

a Write the degree 5 Taylor polynomial of f centered at 0.

b How could we use this polynomial to approximate $\frac{1}{\sqrt{e}}$?

c Produce an error bound for your approximation in part b.

Q28 Let $f(x) = xe^x$.

a Compute the Taylor polynomial $T_3(x)$ for f(x) centered at x = 0.

b Compute the theoretical error bound for $T_3(2)$.

c Explain the difficulties that would arise from this error bound, if your goal is to approximate f(2) by hand. Can you resolve them?

Q29 Let f(x) = cos 3x.

a Write the degree 4 Taylor polynomial of f centered at x = 0.

b How would you use that Taylor polynomial to approximate the value of $\cos\frac{3\pi}{4}$?

c What bound can you place on the error of such an approximation?

Q30 Consider the graph of y = f (x) below.

a Suppose you wanted to produce the second degree Taylor polynomial of f centered at a =
−1. Indicate whether the constant term and each coefficient would be positive or negative.
Provide evidence for your answer.

b Would T2 (4) underestimate or overestimate f (4)? Explain.

Synthesis and Extension

Q31 Let $f(x) = x^3 - 3x + 5$.

a Write an expression for T3 (x), the Taylor polynomial centered at x = 2.

b What can you say about the error $R_3(x)$ for any x?

c What relationship does this suggest between f (x) and T3 (x)?

d Can you verify this relationship algebraically?

e Conjecture a general relationship between polynomial functions and certain Taylor polyno-
mials. Can you use Taylor’s inequality to justify your conjecture?

Section 3.2

Sequences
Goals:
1 Use notation to describe the terms of an infinite sequence.
2 Calculate the limit of an infinite sequence.
Sequences are the first step in our development of Taylor series. While they appear to have little in
common with polynomials of infinite degree, they are the scaffolding on which such objects are built.

Question 3.2.1
What Is a Sequence?

A sequence is an ordered set of numbers. If this set is infinite, we can most rigorously define it by giving a general formula for the nth term for some index variable n. Here are three different notations for the same sequence.

$$\frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \frac{4}{5}, \ldots \qquad \left\{\frac{n}{n+1}\right\}_{n=1}^{\infty} \qquad a_n = \frac{n}{n+1}$$

Example

The first three terms of $\left\{\frac{n^2}{2^n}\right\}_{n=0}^{\infty}$ are

$$\frac{0^2}{2^0} = 0 \qquad \frac{1^2}{2^1} = \frac{1}{2} \qquad \frac{2^2}{2^2} = 1$$

Question 3.2.2
What Is the Limit of a Sequence?

Definition

If we can make the elements $a_n$ of a sequence arbitrarily close to some number L by considering only n above a certain number, then we write

$$\lim_{n\to\infty} a_n = L$$

and we say the sequence converges to L. If $a_n$ does not converge to any such L, then we say it diverges.

Remarks

The first few or even the first thousand terms of a sequence have no bearing on the limit. We
only care that we can eventually get close to L.

“Arbitrarily close” means any level of closeness that anyone could ask for. Eventually the sequence must be within $\frac{1}{100}$ of L, and within $\frac{1}{1000}$, and within $\frac{1}{1000000}$.

Figure: A sequence converging to L = 3

Example 3.2.3
Computing a Limit

Calculate $\displaystyle\lim_{n\to\infty} \frac{n}{n+1}$.

$$\frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \frac{4}{5}, \ldots$$


Solution

Writing the first few terms suggests that this sequence approaches 1. To see that, we can measure the distance to 1:
$$1 - a_n = 1 - \frac{n}{n+1} = \frac{1}{n+1}$$
We can make this smaller than any positive number. For instance, to make $a_n$ within $\frac{1}{1000}$ of 1, we can consider only $n > 1000$. We conclude $\displaystyle\lim_{n\to\infty} \frac{n}{n+1} = 1$.
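The “consider only large $n$” step can be checked mechanically. The sketch below uses a hypothetical helper, `first_index_within`, written for this example: it searches for the first index $n$ at which $|a_n - 1| < \varepsilon$ for $a_n = \frac{n}{n+1}$, using exact fractions to avoid rounding. Because $1 - a_n = \frac{1}{n+1}$ keeps shrinking, every later term stays within $\varepsilon$ as well. A search like this only illustrates the definition; it does not replace the algebraic argument.

```python
from fractions import Fraction


def a(n):
    """The n-th term of the sequence n/(n+1)."""
    return Fraction(n, n + 1)


def first_index_within(eps, limit=1):
    """Smallest n >= 1 with |a_n - limit| < eps.

    For this sequence |a_n - 1| = 1/(n+1) decreases as n grows,
    so all later terms are also within eps of the limit.
    """
    n = 1
    while abs(a(n) - limit) >= eps:
        n += 1
    return n


for eps in (Fraction(1, 100), Fraction(1, 1000)):
    print(eps, first_index_within(eps))
```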

Figure: The sequence $\frac{n}{n+1}$ converges to $L = 1$.

Question 3.2.4
How Are Limits of Sequences and Functions Related?

The definition of $\lim_{n\to\infty} a_n$ should look familiar. The definition of the limit of a function is similar. In fact, the limit of $f(x)$ as $x \to \infty$ has a nearly identical construction, except that $n$ must be an integer, while $x$ can be any real number. The following theorem lets us use that connection to evaluate limits.

Theorem

Suppose for a sequence $a_n$ there is a function $f(x)$ such that $f(n) = a_n$ for all $n$ (or at least all $n$ sufficiently large). If

$$\lim_{x\to\infty} f(x) = L,$$

we can conclude that

$$\lim_{n\to\infty} a_n = L.$$

Example 3.2.5
Sequence Limits Using Functions

Find limits of the following sequences:

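a $\displaystyle\lim_{n\to\infty} \frac{2n}{n+3}$

b $\displaystyle\lim_{n\to\infty} \frac{1}{n^3}$

c $\displaystyle\lim_{n\to\infty} e^{-n}$

d $\displaystyle\lim_{n\to\infty} \frac{n^2}{e^n}$

e $\displaystyle\lim_{n\to\infty} (-1)^n$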

Solution

We will use x to denote a real number variable and n to denote natural numbers.

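a $\displaystyle\lim_{x\to\infty} \frac{2x}{x+3} = 2$, so $\displaystyle\lim_{n\to\infty} \frac{2n}{n+3} = 2$.

b $\displaystyle\lim_{x\to\infty} \frac{1}{x^3} = 0$, so $\displaystyle\lim_{n\to\infty} \frac{1}{n^3} = 0$.

c $\displaystyle\lim_{x\to\infty} e^{-x} = 0$, so $\displaystyle\lim_{n\to\infty} e^{-n} = 0$.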


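d $\displaystyle\lim_{x\to\infty} \frac{x^2}{e^x}$ can be evaluated with l’Hôpital’s rule.

$$\begin{aligned}
\lim_{x\to\infty} \frac{x^2}{e^x} &= \lim_{x\to\infty} \frac{2x}{e^x} && \left(\tfrac{\infty}{\infty} \text{ form, l’Hôpital’s again}\right) \\
&= \lim_{x\to\infty} \frac{2}{e^x} \\
&= 0,
\end{aligned}$$

so $\displaystyle\lim_{n\to\infty} \frac{n^2}{e^n} = 0$.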

e $f(x) = (-1)^x$ is not well defined for real numbers, so we can’t use its limit. Instead, we examine the sequence directly. The sequence has the form

$$-1, 1, -1, 1, -1, 1, -1, 1, \ldots$$

This does not get arbitrarily close to any number. No matter how many early terms we disregard, the remaining terms still include both $1$ and $-1$, so they never all stay close to $1$, to $-1$, or to any other single number. Thus $a_n = (-1)^n$ diverges.
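A quick numerical look at the five sequences of Example 3.2.5 agrees with these conclusions. This is a sketch only: evaluating finitely many terms can suggest a limit but never proves one. We stop at $n = 700$ because `math.exp` overflows a Python float a little beyond $n = 709$.

```python
import math

# The five sequences from Example 3.2.5, written as functions of n.
sequences = {
    "a: 2n/(n+3)": lambda n: 2 * n / (n + 3),
    "b: 1/n^3": lambda n: 1 / n**3,
    "c: e^(-n)": lambda n: math.exp(-n),
    "d: n^2/e^n": lambda n: n**2 / math.exp(n),
    "e: (-1)^n": lambda n: (-1) ** n,
}

for name, term in sequences.items():
    # Terms far out in each sequence; (e) keeps flipping between -1 and 1.
    print(name, [term(n) for n in (10, 100, 699, 700)])
```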

The following limit laws for sequences should look familiar. They mirror the laws for limits of
functions.

Theorem [Limit Laws]

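If $\lim_{n\to\infty} a_n = K$ and $\lim_{n\to\infty} b_n = L$, then the following sequences converge with the following limits:

$\displaystyle\lim_{n\to\infty} (a_n + b_n) = K + L$

$\displaystyle\lim_{n\to\infty} (a_n - b_n) = K - L$

$\displaystyle\lim_{n\to\infty} (a_n b_n) = KL$

If $L \neq 0$, then $\displaystyle\lim_{n\to\infty} \frac{a_n}{b_n} = \frac{K}{L}$

For any constant $c$, $\displaystyle\lim_{n\to\infty} c a_n = cK$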

Synthesis 3.2.6
Indeterminate Forms with Factorials

We will encounter sequences of the form $a_n = \frac{b_n}{c_n}$. If $b_n$ and $c_n$ both go to 0, or both go to $\pm\infty$, then any attempt to use
$$\lim_{n\to\infty} a_n = \lim_{x\to\infty} f(x)$$

would require l’Hôpital’s rule.

Dominance

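We say $f(x)$ dominates $g(x)$ if $\displaystyle\lim_{x\to\infty} \frac{f(x)}{g(x)} = \pm\infty$. We write

$$f(x) \gg g(x)$$

Even if you include a constant multiple or add multiple functions together, the dominant function will outgrow any combination of dominated ones. We have already established an order of dominance using l’Hôpital’s rule:

$$\text{exponential} \gg \text{polynomial} \gg \text{root} \gg \text{logarithm}$$

Within each family, a larger base dominates a smaller one for exponentials (with bases greater than 1), a larger degree dominates a smaller one for polynomials, and a root with a smaller index dominates one with a larger index ($\sqrt{x} \gg \sqrt[3]{x}$). Logarithms with different bases differ only by a constant multiple, since $\log_b x = \frac{\ln x}{\ln b}$, so none of them dominates another.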

But $n!$ is defined only for nonnegative integers $n$, so it is not a differentiable function and we cannot analyze it using l’Hôpital’s rule. Where does it fit in the dominance pecking order?

Theorem

As n → ∞, n! will eventually dominate any exponential function (and thus any polynomial, root or
logarithm).

We will not provide a formal proof, but here is a useful thought experiment. Suppose we compare $n!$ to $63^n$. At first $63^n$ grows faster, multiplying by 63 every time we increase $n$. However, when $n$ is greater than 63, $n!$ is multiplying by a higher number. When $n$ reaches one billion, $63^n$ increases by a factor of 63 every step, while $n!$ increases by a factor of 1,000,000,000. By this point $n!$ is much larger and growing much faster.
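The thought experiment can be checked with exact integer arithmetic. The sketch below finds the first $n$ with $n! > b^n$ for a positive integer base $b$; the function name `first_n_where_factorial_exceeds` is our own. Since $n! \le n^n \le b^n$ whenever $n \le b$, the crossing always happens at some $n > b$. From then on each step multiplies $n!$ by a factor larger than $b$, so $n!$ stays ahead.

```python
import math


def first_n_where_factorial_exceeds(base):
    """Smallest n >= 1 with n! > base**n (base a positive integer)."""
    n, fact, power = 1, 1, base  # invariant: fact == n!, power == base**n
    while fact <= power:
        n += 1
        fact *= n
        power *= base
    return n


n = first_n_where_factorial_exceeds(63)
print(n)
# Exact check: n! has overtaken 63^n, but (n-1)! had not yet overtaken 63^(n-1).
print(math.factorial(n) > 63**n, math.factorial(n - 1) <= 63 ** (n - 1))
```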

Section 3.2
Exercises

Summary Questions

Q1 Why do we use n instead of x as an index for a sequence?

Q2 Describe three different ways of denoting a sequence.

Q3 When is the limit of a sequence equal to the limit of a function?

Q4 If $a_n = b_n + 1000$ for $1 \le n \le 2000000$, what does that tell us about the limits $\displaystyle\lim_{n\to\infty} a_n$ and $\displaystyle\lim_{n\to\infty} b_n$?

3.2.1

Q5 Find a general expression for $a_n$, the $n$th term of the following sequences. Use this to write the sequences using both other types of notation.

a $\{2, 5, 10, 17, 26, 37, 50, \ldots\}$

b $\left\{\frac{3}{2}, -\frac{3}{4}, \frac{3}{8}, -\frac{3}{16}, \frac{3}{32}, \ldots\right\}$

c $\left\{\frac{1}{2}, \frac{1}{6}, \frac{1}{12}, \frac{1}{20}, \frac{1}{30}, \ldots\right\}$

Q6 What is the fourth term in the sequence $\{n^3 - 5n\}_{n=3}^{\infty}$?

3.2.2

Q7 Show using the definition of the limit of a sequence that $\displaystyle\lim_{n\to\infty} \frac{\sin n}{n^2} = 0$.

Q8 Show using the definition of the limit of a sequence that $\displaystyle\lim_{n\to\infty} \frac{2n - 1}{2n} = 1$.

Q9 A sequence is increasing if every term is larger than the previous term. Must an increasing
sequence always diverge? Explain.

Q10 A sequence is alternating if its terms alternate between positive and negative values. Is it possible
that the limit of an alternating sequence exists? What would its value have to be?

3.2.3

Q11 Consider the sequence an = 2n .

a What function could we write such that f (n) = an .

b Does limx→∞ f (x) converge?

c Does the theorem equating limits of functions and sequences apply to this function?

d Can we argue that limn→∞ 2n diverges anyway?

Q12 Consider the sequence $a_n = n\sin(\pi n)$.

a What is $\displaystyle\lim_{x\to\infty} x\sin(\pi x)$?

b Compute the first few values $a_1$, $a_2$, $a_3$, and $a_4$.

c What is $\displaystyle\lim_{n\to\infty} n\sin(\pi n)$?

d Does this contradict one of our theorems? Explain.


3.2.4

log n
Q13 Compute lim .
n→∞ 3n
n
Q14 Compute lim .
n→∞ 2n
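Q15 Compute $\displaystyle\lim_{n\to\infty} \frac{n^3 + 3}{4n^3 - 9}$.

Q16 Compute $\displaystyle\lim_{n\to\infty} \frac{\sin n}{\log n}$.

Q17 Compute $\displaystyle\lim_{n\to\infty} \frac{e^n}{\sqrt{n}}$.

Q18 Compute $\displaystyle\lim_{n\to\infty} \tan^{-1} n$.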

3.2.5

Q19 Compute $\displaystyle\lim_{n\to\infty} \frac{n!}{5^n}$.
n4 + 3n + 1
Q20 Compute lim .
n→∞ n!

Q21 Does $n^n$ grow faster or slower than $n!$? Explain.

Q22 Yuran knows that $\displaystyle\lim_{n\to\infty} \frac{n!}{5^n} = \infty$ because $n!$ grows faster than $a^n$. However, he thinks he can make the denominator grow faster than the numerator if he uses a product like $\frac{n!}{5^n 6^n}$ or $\frac{n!}{5^n 6^n 7^n}$. Will he eventually obtain a non-infinite limit by this method? Explain how you know.

Synthesis & Extension

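Q23 Suppose we have a sequence $a_n = \begin{cases} f(n) & \text{if } n \le 342 \\ g(n) & \text{if } n > 342 \end{cases}$. Which of the following could help us evaluate $\displaystyle\lim_{n\to\infty} a_n$?

$\displaystyle\lim_{x\to\infty} f(x)$

$\displaystyle\lim_{x\to\infty} g(x)$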

Q24 Let $T_n(x)$ be the $n$th Taylor polynomial of $f(x) = \ln x$ centered at $x = 1$.

a Write an expression for $T_n(x)$ using $\Sigma$ notation.

b Write an expression for the error bound of $T_n(x)$ for some $x$ between 0 and 1.

c For what values of $x$ will the error bound shrink to 0 as $n$ goes to $\infty$?

Section 3.3

Series
Goals:

1 Identify partial sums of a series.


2 Recognize harmonic and alternating harmonic series.
3 Apply the divergence test.
4 Evaluate geometric series.
5 Apply the ratio test.

The first step in understanding a Taylor polynomial of infinite degree is understanding how to add
up infinitely many of anything. This proposition is mechanically absurd. Addition is an operation for
two numbers at a time. Adding three or four numbers requires us to add two or three times. Adding
infinitely many requires us to add infinitely many times, something no one has time to do.
Yet there are some intuitive exercises we could perform. Suppose we lay a length of $\frac{1}{2}$ m next to $\frac{1}{4}$ m next to $\frac{1}{8}$ m. If we continued indefinitely, we could imagine these lengths extending an entire meter.

Figure: One meter expressed as a sum of infinitely many smaller lengths

What reasoning could we use to make this exercise rigorous? How could we add up lengths or
numbers where the pattern is not so intuitive? The formal object that does this is called a series. A
series is the first step on our way to push the Taylor polynomial to infinite degree. It is also the most
general. While we are concerned with one specific (and very useful) type of series, there are other
applications worth exploring as well.

Question 3.3.1
What Is a Series?

You have been encountering series since you first learned about decimals. You likely have not seen
a rigorous description of what they mean.

0.33333333 . . . 3.1415926...

We can write
$$0.3333\ldots = \frac{3}{10} + \frac{3}{100} + \frac{3}{1000} + \frac{3}{10000} + \cdots$$
or
$$3.1415\ldots = 3 + \frac{1}{10} + \frac{4}{100} + \frac{1}{1000} + \frac{5}{10000} + \cdots$$
You may have an intuitive sense of what these quantities are, but what does it mean to add up
infinitely many numbers?
Definition

A series is a sum of the form $\displaystyle\sum_{k=1}^{\infty} a_k$ where $a_k$ is an infinite sequence. If it is more convenient, we can give $k$ a different initial value. If the context is clear, we can write $\sum a_k$ as a shorthand.

Example


$$0.33333\ldots = \sum_{k=1}^{\infty} \frac{3}{10^k}$$

The harmonic series is $\displaystyle\sum_{k=1}^{\infty} \frac{1}{k}$.

This tells us what a series is but not how to evaluate it. How do we know, for example, that
$$0.333\ldots = \frac{1}{3}?$$

We evaluate a series by associating it with a sequence of partial sums.

Definition

The $n$th partial sum of the series $\displaystyle\sum_{k=1}^{\infty} a_k$ is

$$s_n = a_1 + a_2 + a_3 + \cdots + a_n$$

A series $\displaystyle\sum_{k=1}^{\infty} a_k$ converges to $L$ if
$$\lim_{n\to\infty} s_n = L.$$

A series that does not converge to any $L$ diverges.

Vocabulary Note

Do not confuse a sequence with a series. One is a list of numbers. The other is the sum of a list of
numbers.

Example 3.3.2
Computing Partial Sums


Consider $\displaystyle\sum_{k=1}^{\infty} \frac{3}{10^k}$.

a Compute the first few partial sums $s_1$, $s_2$, $s_3$ of this series.

b Compute $\displaystyle\lim_{n\to\infty} s_n$.

Solution

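a
$$\begin{aligned}
s_1 &= \frac{3}{10} \\
s_2 &= \frac{3}{10} + \frac{3}{100} = \frac{33}{100} \\
s_3 &= \frac{3}{10} + \frac{3}{100} + \frac{3}{1000} = \frac{333}{1000} \\
s_4 &= \frac{3}{10} + \frac{3}{100} + \frac{3}{1000} + \frac{3}{10000} = \frac{3333}{10000}
\end{aligned}$$

b In order to use our usual methods of limits, we would need an algebraic expression for $s_n$. It isn’t immediately clear how to produce one. Given our knowledge of decimals, we expect the answer to be $\frac{1}{3}$. We will use this as a hint: we expect $\frac{1}{3} - s_n$ to approach 0.

$$\begin{aligned}
\frac{1}{3} - s_1 &= \frac{1}{30} \\
\frac{1}{3} - s_2 &= \frac{1}{300} \\
\frac{1}{3} - s_3 &= \frac{1}{3000} \\
\frac{1}{3} - s_4 &= \frac{1}{30000}
\end{aligned}$$

Extrapolating suggests $\frac{1}{3} - s_n = \frac{1}{3(10)^n}$. Assuming this pattern holds, we have

$$\lim_{n\to\infty} \left(\frac{1}{3} - s_n\right) = \lim_{n\to\infty} \frac{1}{3(10)^n} = 0$$

and we conclude that $\displaystyle\sum_{k=1}^{\infty} \frac{3}{10^k} = \frac{1}{3}$.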
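The extrapolated pattern $\frac{1}{3} - s_n = \frac{1}{3(10)^n}$ is easy to test for many $n$ with exact rational arithmetic, as in the Python sketch below. Checking finitely many cases supports the pattern but is not a proof. For a proof, note that $s_n = \frac{10^n - 1}{3(10)^n}$, since $3 + 30 + \cdots + 3(10)^{n-1} = \frac{10^n - 1}{3}$.

```python
from fractions import Fraction


def partial_sum(n):
    """s_n = 3/10 + 3/100 + ... + 3/10^n, computed exactly."""
    return sum(Fraction(3, 10**k) for k in range(1, n + 1))


for n in range(1, 6):
    gap = Fraction(1, 3) - partial_sum(n)
    # Compare the gap with the conjectured value 1/(3 * 10^n).
    print(n, partial_sum(n), gap, gap == Fraction(1, 3 * 10**n))
```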

200
Main Idea

Often, we can show that $\displaystyle\sum_{k=1}^{\infty} a_k = L$ by computing $L - s_n$ and seeing that it converges to 0.

Figure: The partial sums $s_n$ converging to $L = \frac{1}{3}$

Example 3.3.3
The Harmonic Series

We have seen examples of series in which the terms approach 0 as $k \to \infty$. These have allowed us to add infinitely many terms and obtain a finite sum. Does this always work? No. A series can have its terms approach 0, and yet the partial sums go to $\infty$. The most famous example of this is the harmonic series: $\displaystyle\sum_{k=1}^{\infty} \frac{1}{k}$. Rather than computing the partial sums directly (which would be a lot of computation), we will compare the partial sums to an expression that is easier to calculate. We will replace each term by a fraction with a power of 2 in the denominator. Here’s what we’ll do with $s_8$.

$$\begin{aligned}
s_8 &= \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} \\
&> \frac{1}{1} + \frac{1}{2} + \underbrace{\frac{1}{4} + \frac{1}{4}}_{\frac{1}{2}} + \underbrace{\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}}_{\frac{1}{2}} \\
&= 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2}
\end{aligned}$$

Since we replaced each term with something no larger (and some terms with something strictly smaller) and obtained a sum of $\frac{5}{2}$, we can conclude that $s_8 > \frac{5}{2}$. Continuing this pattern, the terms $\frac{1}{9}$ to $\frac{1}{16}$ sum to more than $\frac{1}{2}$, so $s_{16} > \frac{6}{2}$. In general we can


make $s_n$ bigger than any integer $c$ by setting $n = 2^m$, where

$$1 + \frac{m}{2} > c.$$
This tells us that the harmonic series diverges.
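The grouping argument shows $s_{2^m} \geq 1 + \frac{m}{2}$. A short exact computation confirms the bound for small $m$ and shows how slowly the partial sums grow. This is our own sketch, and `harmonic` is a hypothetical helper name.

```python
from fractions import Fraction


def harmonic(n):
    """Exact n-th partial sum 1/1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


for m in range(11):
    n = 2**m
    s = harmonic(n)
    # The bound from the grouping argument: s_(2^m) >= 1 + m/2.
    print(f"m={m:2d}  n={n:4d}  s_n={float(s):.4f}  bound={1 + m / 2:.1f}  ok={s >= 1 + Fraction(m, 2)}")
```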

Question 3.3.4
What Is a Geometric Series?

The two series so far that we have been able to evaluate belonged to a larger family. These are the
geometric series.

Definition

A geometric series is a series of the form $\displaystyle\sum_{k=1}^{\infty} ar^{k-1}$.

$a$ is the initial term. $r$ is the common ratio between terms.

Example

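$$\sum_{k=1}^{\infty} \left(\frac{1}{2}\right)^{k-1} = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots$$

$$\sum_{k=1}^{\infty} \frac{3}{10}\left(\frac{1}{10}\right)^{k-1} = \frac{3}{10} + \frac{3}{100} + \frac{3}{1000} + \cdots = \frac{1}{3}$$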

Unlike many other series, geometric series are simple enough that we can write a formula for their sum. We can get a convenient expression for $s_n$ by performing a cute algebra trick. We’ll multiply $s_n$ by $r$ and subtract $rs_n$ from $s_n$. Most of the terms cancel and we obtain an equation that we can solve for $s_n$.
$$\begin{aligned}
s_n &= a + ar + ar^2 + \cdots + ar^{n-1} \\
-rs_n &= -ar - ar^2 - ar^3 - \cdots - ar^n \\
(1 - r)s_n &= a - ar^n \\
s_n &= \frac{a(1 - r^n)}{1 - r}
\end{aligned}$$
The last step requires that $1 - r \neq 0$, since we cannot divide by 0. As long as $r \neq 1$, we can evaluate the series by taking a limit.
$$\sum_{k=1}^{\infty} ar^{k-1} = \lim_{n\to\infty} \frac{a(1 - r^n)}{1 - r}$$

To evaluate this limit, we need to understand the behavior of $r^n$ as $n \to \infty$.

If $-1 < r < 1$, then higher powers of $r$ get smaller and smaller and $r^n \to 0$.

If $r > 1$, then higher powers of $r$ get larger and larger and $r^n \to \infty$.

If $r < -1$, then higher powers of $r$ get larger in size but alternate signs, and $\lim_{n\to\infty} r^n$ does not exist.

If $r = 1$, then the series is $a + a + a + a + a + \cdots$, so $s_n = an$, which diverges to $\pm\infty$ depending on the sign of $a$ (when $a \neq 0$).

If $r = -1$, then the series is $a - a + a - a + a - \cdots$, so $s_n$ alternates between $a$ and 0. This sequence does not converge (when $a \neq 0$).
We can apply the above to completely solve the problem of evaluating a geometric series. Our result is
the following theorem:

Figure: The partial sums of $\sum ar^{k-1}$ for various $r$

Theorem

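Geometric series have the following partial sums:

$$s_n = \sum_{k=1}^{n} ar^{k-1} = \begin{cases} \dfrac{a(1 - r^n)}{1 - r} & \text{if } r \neq 1 \\ an & \text{if } r = 1 \end{cases}$$

These converge to $\dfrac{a}{1 - r}$ when $|r| < 1$ and diverge when $|r| \geq 1$ (assuming $a \neq 0$).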
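The theorem translates directly into code. The sketch below uses the closed form for $s_n$ and the limit $\frac{a}{1-r}$, with exact fractions so the results can be compared with the worked examples; the function names are our own. It treats $a = 0$ separately, since every partial sum is then 0.

```python
from fractions import Fraction


def geometric_partial_sum(a, r, n):
    """s_n = a + a r + ... + a r^(n-1), using the closed form from the theorem."""
    if r == 1:
        return a * n
    return a * (1 - r**n) / (1 - r)


def geometric_sum(a, r):
    """Sum of a r^(k-1) for k = 1, 2, ...; returns None if the series diverges."""
    if a == 0:
        return 0
    if abs(r) >= 1:
        return None  # diverges when |r| >= 1 and a != 0
    return a / (1 - r)


print(geometric_sum(Fraction(3, 10), Fraction(1, 10)))  # 1/3
print(geometric_sum(Fraction(2, 3), Fraction(2, 5)))    # 10/9
print(geometric_sum(9, 3))                              # None
```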

Example 3.3.5
Evaluating Geometric Series

Identify a and r in the following geometric series. Then evaluate the series.

a $\frac{2}{3} + \frac{4}{15} + \frac{8}{75} + \cdots$

b $\displaystyle\sum_{n=2}^{\infty} 3^n$

c $0.999999\ldots$

Solution

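a $a$ is the initial term, which is $\frac{2}{3}$. The common ratio is the ratio between any two consecutive terms: $\frac{4/15}{2/3} = \frac{2}{5}$.

Since $|r| < 1$, the sum of the series is

$$\sum_{k=1}^{\infty} \frac{2}{3}\left(\frac{2}{5}\right)^{k-1} = \frac{\frac{2}{3}}{1 - \frac{2}{5}} = \frac{\frac{2}{3}}{\frac{3}{5}} = \frac{10}{9}$$

b The initial term of this series is 9. The common ratio is 3. Since $|3| \geq 1$, $\displaystyle\sum_{n=2}^{\infty} 3^n$ diverges.

c $0.999999\ldots = \frac{9}{10} + \frac{9}{100} + \frac{9}{1000} + \cdots$. This has an initial term of $\frac{9}{10}$ and a common ratio of $\frac{1}{10}$.

$|r| < 1$, so
$$0.999999\ldots = \frac{\frac{9}{10}}{1 - \frac{1}{10}} = \frac{\frac{9}{10}}{\frac{9}{10}} = 1$$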

Question 3.3.6
What Does the Size of $a_k$ Tell Us About $\sum a_k$?

The discussion of the geometric series suggests that certain properties of a series make convergence
impossible. Specifically, in the cases in which the terms were not shrinking to 0, the partial sums were
growing without bound or oscillating. This intuition can be formalized in the following theorem, which
applies to more than just geometric series.

Theorem [The Divergence Test]

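Let $a_k$ be a sequence. If $\displaystyle\lim_{k\to\infty} a_k \neq 0$ (including the case where the limit does not exist), then the series

$$\sum_{k=1}^{\infty} a_k$$

diverges.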

Remark

The divergence test does not tell us anything if $\displaystyle\lim_{k\to\infty} a_k = 0$. The series might converge, and it might not. In this case we say the test is inconclusive.

Example 3.3.7
Applying the Divergence Test

What does the divergence test tell us about each of the following series?

X
a 3k
k=2


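b $\displaystyle\sum_{k=2}^{\infty} \frac{1}{k}$

c $\displaystyle\sum_{k=2}^{\infty} \frac{k^2 - 1}{3k^2 + 7}$

d $\displaystyle\sum_{k=2}^{\infty} \frac{k^2}{e^k}$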


Solution

a The sequence is ak = 3k . lim 3k = ∞. This limit is not 0, so by the divergence test, the series
k→∞
diverges.

b The sequence is $a_k = \frac{1}{k}$. $\displaystyle\lim_{k\to\infty} \frac{1}{k} = 0$. The divergence test is inconclusive. It cannot tell us whether this series diverges or converges. By our earlier work, we happen to know this series diverges.

c The sequence is $a_k = \frac{k^2 - 1}{3k^2 + 7}$. $\displaystyle\lim_{k\to\infty} \frac{k^2 - 1}{3k^2 + 7} = \frac{1}{3}$. This limit is not 0, so by the divergence test, the series diverges.

d The sequence is $a_k = \frac{k^2}{e^k}$. We need l’Hôpital’s rule to evaluate the limit.

$$\begin{aligned}
\lim_{k\to\infty} \frac{k^2}{e^k} &= \lim_{k\to\infty} \frac{2k}{e^k} && \left(\text{still } \tfrac{\infty}{\infty} \text{ form}\right) \\
&= \lim_{k\to\infty} \frac{2}{e^k} \\
&= 0
\end{aligned}$$

The divergence test is inconclusive. It cannot tell us whether this series diverges or converges. It turns out that this series converges, but we do not have a method to verify that yet.

Question 3.3.8
What Is the Ratio Test?

So far we have two tests to determine the convergence of a series. One test is very specific, applying
only to geometric series. The other is very imprecise. The divergence test is often inconclusive. It
does not help us to evaluate a series at all, only recognizing some series that diverge. Unfortunately,
these shortcomings are typical of series tests. A rigorous study of infinite series requires learning almost a
dozen tests. On a randomly chosen series, most of these tests will be inconclusive, and none of them will
give a numerical value, even if the series happens to converge. Because we are interested in extending
Taylor polynomials to have infinitely many terms, some of these tests are much more useful than others.
The most useful is the ratio test, though it is still no help in evaluating a series and is still sometimes
inconclusive.
In the case of a geometric series, $\sum ar^{k-1}$, the common ratio between terms determines whether this
series grows out of control, or whether the terms shrink quickly enough that the partial sums converge.
Even when a series is not geometric, we can attempt to apply similar reasoning to determine whether it
converges. A non-geometric series does not have a constant ratio. The ratio between successive terms
will change as we progress through them. We will instead compute the limit of these ratios.

Theorem [The Ratio Test]

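If $\displaystyle\lim_{k\to\infty} \left|\frac{a_{k+1}}{a_k}\right| = L < 1$, then $\sum a_k$ converges absolutely.

If $\displaystyle\lim_{k\to\infty} \left|\frac{a_{k+1}}{a_k}\right| = L > 1$ or is infinite, then $\sum a_k$ is divergent.

If $\displaystyle\lim_{k\to\infty} \left|\frac{a_{k+1}}{a_k}\right| = 1$, then the ratio test is inconclusive.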

Remark

Converges absolutely is a term for series with both positive and negative terms. It means the series would still converge even if all of its terms were made positive. The alternative is conditional convergence, meaning the series’s convergence may depend on the positive and negative terms partially canceling each other out.

Example

The series
$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots$$
converges (we won’t prove this). If we made all the terms positive, it would be the harmonic series,
which diverges. This series converges conditionally, not absolutely.

Absolute versus conditional convergence can be interesting to play with. You may see references to
it in other math books, but we won’t have any further use for it.

Example 3.3.9
Applying the Ratio Test


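a Does $\displaystyle\sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k!}$ converge or diverge?

b Does $\displaystyle\sum_{k=1}^{\infty} \frac{2^k}{k^2}$ converge or diverge?

c Does $\displaystyle\sum_{k=1}^{\infty} k$ converge or diverge?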


Solution

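a First we will compute and simplify the ratio. Then we will take its limit and draw a conclusion.

$$\begin{aligned}
\left|\frac{a_{k+1}}{a_k}\right| &= \left|\frac{\frac{(-1)^k}{(k+1)!}}{\frac{(-1)^{k-1}}{k!}}\right| \\
&= \left|\frac{(-1)^k\, k!}{(-1)^{k-1}(k+1)!}\right| \\
&= \left|\frac{(-1)^k (1)(2)(3)\cdots(k)}{(-1)^{k-1}(1)(2)(3)\cdots(k)(k+1)}\right| && \text{(expand the factorials)} \\
&= \left|\frac{(-1)^k}{(-1)^{k-1}(k+1)}\right| && \text{(cancel the matching factors)} \\
&= \left|\frac{-1}{k+1}\right| && \text{(cancel } k - 1 \text{ powers of } -1\text{)} \\
&= \frac{1}{k+1} && \text{(take the absolute value)}
\end{aligned}$$

Now we take the limit:

$$\lim_{k\to\infty} \frac{1}{k+1} = 0$$

Since $0 < 1$, by the ratio test, $\displaystyle\sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k!}$ converges.

b We will apply the ratio test. First we compute the ratio, and then we take a limit.

$$\begin{aligned}
\left|\frac{a_{k+1}}{a_k}\right| &= \frac{\frac{2^{k+1}}{(k+1)^2}}{\frac{2^k}{k^2}} \\
&= \frac{2^{k+1} k^2}{2^k (k^2 + 2k + 1)} \\
&= \frac{2k^2}{k^2 + 2k + 1} && \text{(cancel the 2s)}
\end{aligned}$$

$$\lim_{k\to\infty} \frac{2k^2}{k^2 + 2k + 1} = 2$$

Since $2 > 1$, by the ratio test, this series diverges.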

c We will apply the ratio test. First we compute the ratio, and then we take a limit.

$$\frac{a_{k+1}}{a_k} = \frac{k+1}{k}, \qquad \lim_{k\to\infty} \frac{k+1}{k} = 1$$

Here the ratio test is inconclusive. It cannot tell whether this series converges or diverges. However, we can settle the question another way. The terms of this series are increasing, which means the partial sums will grow faster and faster. This was the reasoning behind the divergence test.

$$\lim_{k\to\infty} k = \infty$$

Since $\displaystyle\lim_{k\to\infty} k \neq 0$, the divergence test concludes that the series diverges.
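The ratio computations above can be double-checked with exact arithmetic. The sketch below evaluates $\left|\frac{a_{k+1}}{a_k}\right|$ at a few large $k$ for each series in Example 3.3.9; `ratio` and the dictionary keys are illustrative names of our own. The values drift toward 0, 2, and 1 respectively, but sampling finitely many $k$ only suggests a limit. The algebra above is what establishes it.

```python
import math
from fractions import Fraction

# Terms a_k of the three series in Example 3.3.9, as exact fractions.
examples = {
    "(-1)^(k-1)/k!": lambda k: Fraction((-1) ** (k - 1), math.factorial(k)),
    "2^k/k^2": lambda k: Fraction(2**k, k**2),
    "k": lambda k: Fraction(k),
}


def ratio(term, k):
    """|a_(k+1) / a_k| for the sequence given by term."""
    return abs(term(k + 1) / term(k))


for name, term in examples.items():
    print(name, [float(ratio(term, k)) for k in (10, 100, 1000)])
```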

Main Ideas

When applying the ratio test, be sure to replace every $k$ with $k + 1$ for the $a_{k+1}$ term.

Familiarize yourself with the algebra rules that allow you to simplify ratios of exponentials and
factorials.

Example 3.3.10
A Strategy for Series Tests


Strategy

Given the three ways we have to test for divergence and convergence and the relative ease of applying
each, here is a reasonable approach to testing a series.

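1 Check $\displaystyle\lim_{n\to\infty} a_n$ by dominance. If it is not zero, $\sum a_n$ diverges. If it is zero or hard to tell, continue.

2 Compute $\dfrac{a_{n+1}}{a_n}$. If it is a constant $r$ with $|r| \geq 1$, $\sum a_n$ diverges. If it is a constant $r$ with $|r| < 1$, $\sum a_n$ converges. If it is not constant, continue.

3 Compute $\displaystyle\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right|$. If it is $> 1$, $\sum a_n$ diverges. If it is $< 1$, $\sum a_n$ converges.

4 If this limit is $= 1$ or hard to tell, the result is inconclusive; look up another test.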

Let’s apply our strategy to see what we can tell about

$$\sum_{n=1}^{\infty} \frac{1}{n^2}.$$

Solution

First we’ll check that the terms go to zero. If they don’t, we quickly classify this as a divergent series.

$$\lim_{n\to\infty} \frac{1}{n^2} = 0$$

They do, so we need another check. Now we’ll compute the ratio between terms.

$$\frac{a_{n+1}}{a_n} = \frac{\frac{1}{(n+1)^2}}{\frac{1}{n^2}} = \frac{n^2}{n^2 + 2n + 1}$$

This is not a constant; it depends on $n$. Thus $\sum a_n$ is not a geometric series. We’ll try the ratio test.

$$\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n\to\infty} \frac{n^2}{n^2 + 2n + 1} = 1$$

This means that the ratio test is inconclusive. We do not know whether this series converges or diverges. We have exhausted all our tests. If we want the answer, we need to look up another test.

Section 3.3
Exercises

Summary Questions

Q1 What is the difference between a sequence and a series?

Q2 How do we evaluate a series?

Q3 What is a geometric series? How do we evaluate one?

Q4 What does it mean to say that a series test is inconclusive?

Q5 How does each of the following factors behave in the ratio $\dfrac{a_{k+1}}{a_k}$?

a $k^p$ ($p$ a constant)

b $c^k$ ($c$ a constant)

c $k!$

Q6 How would the ratio test apply to a geometric series $\sum ar^{k-1}$?

3.3.1

Q7 Give a more common name for each of the following series.

a $2 + \frac{7}{10} + \frac{1}{100} + \frac{8}{1000} + \frac{2}{10000} + \frac{8}{100000} + \cdots$

b $\frac{6}{10} + \frac{6}{100} + \frac{6}{1000} + \frac{6}{10000} + \cdots$

Q8 Use a calculator to get a decimal approximation of $\frac{25}{33}$ and write it as a series of fractions with powers of 10 as denominators.


3.3.2

Q9 Consider the series

$$\sum_{k=1}^{\infty} \frac{1}{k(k+1)}$$

a Compute the first four elements in the series.

b Compute the partial sums: s1 , s2 , s3 , s4 .

c What do the partial sums appear to be converging to?

d Can you use algebra to generalize your answer to 2 to sn ?


Q10 Compute the first 3 partial sums of $\displaystyle\sum_{k=1}^{\infty} \frac{k+1}{k^2}$. Don’t simplify the arithmetic.

Q11 Compute the first four partial sums of $\displaystyle\sum_{k=1}^{\infty} (-1)^k$. What do you think this suggests about the sum of the series?

Q12 Compute the first five partial sums of $\displaystyle\sum_{k=0}^{\infty} \frac{1}{(-2)^k}$. Use them to make a prediction about the value of the series.

3.3.3

Q13 Give an example of an n such that you know the nth partial sum of the harmonic series is greater
than 20.

Q14 Modify our argument for the harmonic series to show that $\displaystyle\sum_{k=1}^{\infty} \frac{1}{\sqrt{k}}$ diverges.

3.3.4

Q15 Is $\frac{1}{2} + \frac{1}{4} + \frac{1}{6} + \frac{1}{8} + \cdots$ a geometric series? How can you tell?

Q16 Is $1 + 4 + 9 + 16 + 25 + \cdots$ a geometric series? How can you tell?

Q17 The first two terms of a geometric series are 5 and 7.5. What is the third term?

Q18 The fifth term of a geometric series is 17. The eighth term is 51. What is the sixth term?

3.3.5


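Q19 Evaluate $\displaystyle\sum_{k=0}^{\infty} 5(0.3)^k$.

Q20 Evaluate $\displaystyle\sum_{k=0}^{\infty} \frac{1}{4}\left(\frac{4}{3}\right)^k$.

Q21 Evaluate $\displaystyle\sum_{j=3}^{\infty} \frac{15}{5^j}$.

Q22 Evaluate $\displaystyle\sum_{k=1}^{\infty} 0.8^k$.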


X 3k
Q23 Evaluate .
2k (18)
k=4


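Q24 Evaluate $\displaystyle\sum_{k=1}^{\infty} \frac{37}{100^k}$. What decimal does this represent?

Q25 For what values of $z$ does $\displaystyle\sum_{k=0}^{\infty} \frac{3^k}{z^k}$ converge?

Q26 For what values of $p$ does $\displaystyle\sum_{k=3}^{\infty} \frac{12p^{2k}}{16^k}$ converge?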


3.3.6

Q27 If $a_k > \frac{1}{100}$ for all $k$, then what can you say about the value of $s_n = \displaystyle\sum_{k=1}^{n} a_k$?

Q28 If $\lim_{k\to\infty} a_k = \frac{1}{100}$, use the definition of a limit and the reasoning in the previous exercise to show that $\displaystyle\sum_{k=1}^{\infty} a_k$ diverges.

3.3.7


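Q29 What does the divergence test say about $\displaystyle\sum_{k=1}^{\infty} \frac{1}{k^3}$?

Q30 What does the divergence test say about $\displaystyle\sum_{k=1}^{\infty} \frac{k^2 + 1}{5k^2 + 3k}$?

Q31 What does the divergence test say about $\displaystyle\sum_{k=2}^{\infty} \ln k$?

Q32 What does the divergence test say about $\displaystyle\sum_{k=2}^{\infty} \frac{1}{\ln k}$?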

3.3.8

Q33 Will the divergence test detect every series that “fails” the ratio test (L > 1)? Explain.

Q34 If $\displaystyle\lim_{n\to\infty} \frac{a_{n+1}}{a_n}$ does not exist, the ratio test is inconclusive. Give examples of two series where this limit does not exist, one series that diverges and one that converges.

3.3.9


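Q35 Apply the ratio test to $\displaystyle\sum_{k=1}^{\infty} \frac{k!}{4^k}$. What can you conclude?

Q36 Apply the ratio test to $\displaystyle\sum_{k=1}^{\infty} \frac{k\,5^k}{(k+1)!}$. What can you conclude?

Q37 Apply the ratio test to $\displaystyle\sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k^2}$. What can you conclude?

Q38 Apply the ratio test to $\displaystyle\sum_{k=1}^{\infty} \frac{(-8)^k}{k^2 5^k}$. What can you conclude?

Q39 Apply the ratio test to $\displaystyle\sum_{k=1}^{\infty} \frac{k^2}{4^k}$. What can you conclude?

Q40 Apply the ratio test to $\displaystyle\sum_{k=3}^{\infty} \frac{k!}{5k^3 + 4k - 2}$. What can you conclude?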

∞ √
X k+1
Q41 Apply the ratio test to What can you conclude?
k2
k=1

∞ √
X
Q42 Apply the ratio test to ke−k What can you conclude?
k=1

3.3.10

Q43 Use one of the tests from this section to determine whether $\sum_{k=1}^{\infty} \frac{k+1}{k}$ converges.

Q44 Use one of the tests from this section to determine whether $\sum_{k=1}^{\infty} \frac{3(4^k)}{7^k}$ converges.

kek
P∞
Q45 Use one of the tests from this section to determine whether k=1 4k+1 converges.

7k9k
P∞
Q46 Use one of the tests from this section to determine whether k=1 k32k+1 converges.


Synthesis & Extension

Q47 In a paragraph or two, explain: How is evaluating an improper integral similar to evaluating an infinite series? How are they different?

Q48 Suppose we have a sequence $a_n$ such that $\lim_{n\to\infty} a_n = 30$. Suppose we then increase the values of the first few terms of $a_n$ by 10,000 each.

a Explain how this will affect the value of $\displaystyle\lim_{n\to\infty} a_n$.

b Explain how this will affect the value of $\displaystyle\sum_{n=1}^{\infty} a_n$.

Q49 Suppose we wanted to approximate $\int_0^{\infty} \frac{1}{e^x}\,dx$ by rectangles of width $\Delta x = 1$, with heights measured at the left endpoints.

a What are the areas of the first 5 rectangles, starting from $x = 0$?

b How many rectangles will you need in total?

c Express the sum of the areas of these rectangles as a series.

d Does this series converge? To what value?

e Does your series over- or underestimate the true value of the integral?

Q50 Suppose we wanted to approximate $\int_1^{\infty} \frac{1}{x^2}\,dx$ by rectangles of width $\Delta x = 1$, with heights measured at the right endpoints.

a What are the areas of the first 5 rectangles, starting from $x = 1$?

b Express the sum of the areas of all the rectangles you’ll need as a series.

c Does your series over- or underestimate the true value of the integral?

d What is the true value of the integral? What does this suggest about whether your series converges or diverges?

Q51 Suppose that a discrete random variable $X$ has distribution function

$$f_X(x) = \begin{cases} \dfrac{1}{2^x} & \text{if } x \text{ is a positive integer} \\ 0 & \text{otherwise} \end{cases}$$
a Verify that fX (x) is a valid probability distribution function.

b Compute P (X > 4).

c Compute E[X] (this is difficult).

Q52 Suppose that a discrete random variable $X$ has distribution function

$$f_X(x) = \begin{cases} \dfrac{1}{x} - \dfrac{1}{x+1} & \text{if } x \text{ is a positive integer} \\ 0 & \text{otherwise} \end{cases}$$

a Verify that fX (x) is a valid probability distribution function.

b Compute P (3 ≤ X ≤ 5).

c Explain why you can’t compute E[X].

Section 3.4

Power Series
Goals:

1 Use series tests to determine for what values of x a power series converges.
2 Identify the radius of convergence of a power series.
3 Recognize functions that can be rewritten as a power series.

The infinite degree polynomials we seek to define are series. The tools we’ve developed so far
provide the foundation for understanding the objects we want to construct, but there is more to do. A
polynomial also contains a variable. In this section we deal with the ramifications of including a variable
in an infinite series.

Question 3.4.1
What Is a Power Series?

So far we have studied infinite series of numbers. If instead of just numbers, our terms include
variables, then we’ve created a function. Plugging in different values for the variable gives us a different
series of numbers.

Example

The expression
$$1 + x + x^2 + x^3 + \cdots$$
becomes
$$1 + 2 + 4 + 8 + \cdots$$
when we evaluate it at $x = 2$. It becomes

$$1 - \frac{1}{3} + \frac{1}{9} - \frac{1}{27} + \cdots$$

when we evaluate it at $x = -\frac{1}{3}$.

Definition

An infinite series of the form

$$\sum_{k=0}^{\infty} c_k (x - a)^k$$

is called a power series centered at $a$.

It is a function of $x$ whose domain is all values of $x$ that make the series converge.

For the purposes of this definition, we define $(x - a)^0 = 1$ even when $x = a$.


Example 3.4.2
A Geometric Series as a Power Series

Use the geometric series formula to write $f(x) = \dfrac{1}{1-x}$ as a power series and find its domain.

Solution

$\dfrac{1}{1-x}$ is the sum of a geometric series. In this case, the initial term $a = 1$ and the common ratio $r$ is $x$. If we write out the first few terms we obtain $1 + x + x^2 + x^3 + \cdots$. We see this is a power series centered at 0. The coefficients $c_k$ are all equal to 1. We could write it as $\sum_{k=0}^{\infty} x^k$.

The domain of a power series is the set of values of $x$ that make it converge. We know that this geometric series converges if and only if the common ratio $x$ has absolute value less than 1. Those values of $x$, the open interval $(-1, 1)$, are the domain of the power series. (The formula $\frac{1}{1-x}$ itself makes sense for every $x \neq 1$, but it equals the series only on $(-1, 1)$.)
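A numerical comparison makes the role of the domain visible. The sketch below (our own) compares partial sums of $\sum x^k$ with $\frac{1}{1-x}$. Inside $(-1, 1)$ the partial sums settle toward $\frac{1}{1-x}$; at $x = 1.5$ they blow up, even though the formula $\frac{1}{1-x}$ still produces the finite value $-2$.

```python
def geometric_partial(x, n):
    """Partial sum 1 + x + x^2 + ... + x^(n-1) of the power series sum x^k."""
    return sum(x**k for k in range(n))


for x in (0.5, -0.5, 0.9, 1.5):
    partials = [geometric_partial(x, n) for n in (5, 20, 80)]
    print(f"x={x}: partial sums {partials}, 1/(1-x) = {1 / (1 - x)}")
```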

Example 3.4.3
The Domain of a Power Series


What is the domain of $\displaystyle\sum_{k=1}^{\infty} \frac{k^2}{4^k}(x - 5)^k$?

Solution

The domain is the set of $x$ values that make the series converge. The ratio test will be helpful here. The ratio between terms is

$$\begin{aligned}
\left|\frac{a_{k+1}}{a_k}\right| &= \left|\frac{\frac{(k+1)^2}{4^{k+1}}(x-5)^{k+1}}{\frac{k^2}{4^k}(x-5)^k}\right| \\
&= \left|\frac{(k+1)^2\, 4^k\, (x-5)^{k+1}}{k^2\, 4^{k+1}\, (x-5)^k}\right| \\
&= \frac{(k^2 + 2k + 1)\,|x - 5|}{4k^2}
\end{aligned}$$

Notice this entire computation is invalid if $x = 5$, because we cannot divide by 0. We can examine this case directly. If $x = 5$, then every term of the series is 0, and the series converges. For the rest of the real numbers, we compute the limit as $k \to \infty$, but $x$ will remain in the result.

$$\lim_{k\to\infty} \frac{(k^2 + 2k + 1)\,|x - 5|}{4k^2} = \frac{|x - 5|}{4} \lim_{k\to\infty} \frac{k^2 + 2k + 1}{k^2} = \frac{|x - 5|}{4}$$


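The ratio test can tell us whether the series converges for some values of $x$. If $\frac{|x-5|}{4} < 1$, the series converges. We can solve for $x$:

$$\begin{aligned}
\frac{|x - 5|}{4} &< 1 \\
|x - 5| &< 4 && \text{(since } 4 > 0\text{)} \\
-4 < x - 5 &< 4 \\
1 < x &< 9 && \text{(add 5 to all three expressions)}
\end{aligned}$$

On the other hand, if $\frac{|x-5|}{4} > 1$, the series diverges. Solving for $x$ follows a similar procedure.

$$\begin{aligned}
\frac{|x - 5|}{4} &> 1 \\
|x - 5| &> 4 && \text{(since } 4 > 0\text{)} \\
x - 5 < -4 \quad &\text{or} \quad x - 5 > 4 \\
x < 1 \quad &\text{or} \quad x > 9
\end{aligned}$$

What about when $x = 1$ or $x = 9$? There $\frac{|x-5|}{4} = 1$, so the ratio test is inconclusive. We would need another test to resolve these points. In this case, we are lucky. If $x = 9$, the series becomes $\sum_{k=1}^{\infty} \frac{k^2}{4^k}(4)^k = \sum_{k=1}^{\infty} k^2$. The divergence test is useful here: $\lim_{k\to\infty} k^2 = \infty$. Since the terms do not approach 0, the series diverges. A similar argument works for $x = 1$, where the terms are $(-1)^k k^2$. So the domain is the open interval $(1, 9)$.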
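The case analysis can be summarized in a small classifier. This sketch is our own, and `term`, `ratio_limit` and `ratio_test_verdict` are hypothetical names. It hard-codes the limit $\frac{|x-5|}{4}$ derived above and handles $x = 5$ directly, as the text does. It also computes an exact term ratio at a large $k$ to show it approaching that limit. The verdict “inconclusive” at $x = 1$ and $x = 9$ marks exactly where the divergence test had to take over.

```python
from fractions import Fraction


def term(x, k):
    """k-th term k^2 (x - 5)^k / 4^k of the series in Example 3.4.3."""
    return Fraction(k**2, 4**k) * (Fraction(x) - 5) ** k


def ratio_limit(x):
    """The limit of |a_(k+1)/a_k| derived above: |x - 5| / 4 (valid for x != 5)."""
    return abs(Fraction(x) - 5) / 4


def ratio_test_verdict(x):
    if Fraction(x) == 5:
        return "converges"  # checked directly: every term is 0
    L = ratio_limit(x)
    if L < 1:
        return "converges"
    if L > 1:
        return "diverges"
    return "inconclusive"


for x in (0, 1, 3, 5, 8.5, 9, 10):
    print(x, ratio_test_verdict(x))

# At x = 7 the exact ratio at k = 1000 is already close to |7 - 5| / 4 = 1/2.
print(float(abs(term(7, 1001) / term(7, 1000))))
```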

Main Idea

The ratio test is usually successful in finding where a power series converges. Generally it is inconclusive
at only two points. We will not always have a test that can tell us whether the series converges at these
points.

You may notice a pattern in the types of domains we have computed for power series. That pattern
is formalized in the theorem below, which tells us that the domain of a power series must take a very
particular form.

Theorem

Given a power series $\displaystyle\sum_{k=0}^{\infty} c_k (x - a)^k$ centered at $a$, one of the following is true.

1 The series converges only when x = a.


2 The series converges when x is any real number.
3 There is a radius of convergence R such that
a The series converges when |x − a| < R, and
b The series diverges when |x − a| > R.

In case 3, the inequality $|x - a| < R$ solves to $a - R < x < a + R$, which means the domain is an interval centered at $a$ and extending a distance $R$ to either side. The theorem does not state whether this is a closed, open or half-open interval. This reasoning extends intuitively, if not formally, to the other cases. Case 1 can be thought of as a (closed) interval extending distance 0 on either side. Case 2 would then be an interval extending infinitely on either side.

Figure: The domain |x − a| < R of a power series.

Remark

The main consequence of this theorem is that when solving for the domain of a power series, we can simplify our use of the ratio test. Whenever the limit exists, the interval of convergence, apart from its endpoints, is the solution to $\displaystyle\lim_{k\to\infty} \left|\frac{a_{k+1}}{a_k}\right| < 1$. The endpoints may or may not lie in the domain. The points beyond the endpoints will never be part of the domain.

Question 3.4.4
Can We Integrate or Differentiate a Power Series?

When f (x) is a polynomial, we can find the derivative and anti-derivative of f (x) by computing the
(anti-)derivative of each term. The following theorem says that we can do this for a power series too.


Theorem

If $f(x)$ is the power series $\sum_{k=0}^{\infty} c_k (x - a)^k$ and $f(x)$ has radius of convergence $R > 0$, then $f(x)$ is differentiable and continuous on the interval $(a - R, a + R)$, and

1 $$f'(x) = \sum_{k=1}^{\infty} k c_k (x - a)^{k-1}$$

2 $$\int f(x)\,dx = C + \sum_{k=0}^{\infty} c_k \frac{(x - a)^{k+1}}{k+1}$$

Both of these functions also have radius of convergence $R$.

Remark

Notice that we remove the $k = 0$ term from the derivative. The derivative of that term is 0, but $0 \cdot c_0 (x - a)^{-1}$ is undefined at $x = a$.

Example

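We have seen that $\frac{1}{1-x} = \sum_{k=0}^{\infty} x^k$ on the interval $(-1, 1)$. From that we can compute:
$$\frac{d}{dx}\sum_{k=0}^{\infty} x^k = \sum_{k=1}^{\infty} k x^{k-1}$$
$$\int \sum_{k=0}^{\infty} x^k\,dx = \sum_{k=0}^{\infty} \frac{x^{k+1}}{k+1} + c$$

Both have radius of convergence 1, so both converge on $(-1, 1)$. The endpoints must be checked separately; for example, at $x = -1$ the integrated series becomes an alternating series that converges.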
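We can check this example numerically. Differentiating $\frac{1}{1-x}$ directly gives $\frac{1}{(1-x)^2}$, and the antiderivative that equals 0 at $x = 0$ is $-\ln(1 - x)$, so partial sums of the two new series should approach these values inside $(-1, 1)$. A minimal Python sketch, assuming $x = 0.5$ and a few thousand terms (both arbitrary choices):

```python
import math

def derivative_series(x, n_terms=2000):
    """Partial sum of sum_{k=1}^{n} k x^(k-1), the derivative of sum x^k."""
    return sum(k * x ** (k - 1) for k in range(1, n_terms + 1))

def antiderivative_series(x, n_terms=2000):
    """Partial sum of sum_{k=0}^{n} x^(k+1) / (k+1), taking the constant c = 0."""
    return sum(x ** (k + 1) / (k + 1) for k in range(n_terms + 1))

x = 0.5
# Differentiating 1/(1-x) gives 1/(1-x)^2, which is 4 at x = 0.5
print(derivative_series(x), 1 / (1 - x) ** 2)
# The antiderivative of 1/(1-x) that vanishes at 0 is -ln(1-x), which is ln 2 at x = 0.5
print(antiderivative_series(x), -math.log(1 - x))
```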

Section 3.4
Exercises

Summary Questions

Q1 What is the difference between a polynomial and a power series?

Q2 What test is useful for establishing the domain of a power series? What form can this domain
have?

Q3 How can we integrate or differentiate a power series?

Q4 How does differentiation affect the radius of convergence of a power series?

3.4.1

Q5 Use Σ notation to express the following series

a $10 + 15x + 20x^2 + 25x^3 + 30x^4 + \cdots$

b $\frac{1}{2} - \frac{1}{4}x^2 + \frac{1}{8}x^4 - \frac{1}{16}x^6 + \frac{1}{32}x^8 - \cdots$

Q6 Use Σ notation to express the following series

a $1 - \frac{x^2}{2} + \frac{x^4}{24} - \frac{x^6}{720} + \frac{x^8}{40320} - \cdots$

b $x^3 + 4x^4 + 9x^5 + 16x^6 + 25x^7 + \cdots$


3.4.2

Q7 Consider $f(x) = \frac{1}{1 - 4x^2}$.

a If f is the sum of a geometric series, what is r?

b Write f (x) as a geometric series centered at 0.

c What is the domain of your answer in b ?

Q8 Write $\frac{5}{1 - 3(x - 2)}$ as a power series centered at $x = 2$.

Q9 Can the power series $p(x) = \sum_{k=1}^{\infty} \frac{k^3}{4^k}(x + 7)^k$ be evaluated using the sum of a geometric series formula? Explain.

Q10 Evaluate $f(x) = \sum_{k=3}^{\infty} \frac{1}{5^k}(x - 2)^k$ at $x = 6$ using the formula for the sum of a geometric series.

3.4.3


Q11 What is the domain of $\sum_{k=1}^{\infty} 2^k (x - 3)^k$?


Q12 Compute the domain of $\sum_{k=0}^{\infty} \frac{(x + 2)^k}{k^3}$.


Q13 Compute the domain of $\sum_{k=0}^{\infty} \frac{1}{4^k}(x - 6)^k$.


Q14 Compute the domain of $\sum_{k=0}^{\infty} \frac{x^k}{k!}$.


Q15 Compute the radius of convergence of $\sum_{k=0}^{\infty} k(x + 3)^k$. What interval does this guarantee the series converges on?

Q16 Compute the radius of convergence of $\sum_{k=0}^{\infty} k!\,x^k$. What interval does this guarantee the series converges on?


Q17 Compute the radius of convergence of $\sum_{k=1}^{\infty} \frac{4^k}{3^k}(x - 5)^k$. What interval does this guarantee the series converges on?

Q18 Suppose you are told that a given power series p(x) centered at x = a converges at x = −4 and
diverges at x = −7.

a If a = 1, what can you say about the domain of p(x)?

b What are all of the possible values of $a$? Explain your reasoning (briefly).

3.4.4


Q19 Compute the antiderivative of $\sum_{k=0}^{\infty} 2^k (x - 3)^k$.


Q20 Compute the derivative of $\sum_{k=0}^{\infty} \frac{(x + 2)^k}{k^3}$. What is its domain?


Q21 Compute the derivative of $\sum_{k=0}^{\infty} \frac{1}{4^k}(x - 6)^k$. What is its domain?


Q22 Compute the antiderivative of $\sum_{k=0}^{\infty} \frac{x^k}{k!}$.


Q23 What is the domain of the fifth derivative of $\sum_{k=0}^{\infty} k(x + 3)^k$?


Q24 Compute the radius of convergence of the antiderivative of $\sum_{k=4}^{\infty} \frac{4^k}{3^k}(x - 5)^k$.


Synthesis & Extension

Q25 Consider the power series
$$p(x) = \sum_{k=0}^{\infty} \frac{k^2 + k}{5^k}(x + 3)^k.$$

a Compute the domain of $p$. You do not need to check any endpoints of your answer.

b Write an expression for $\int p(x)\,dx$.


Q26 Consider the series $S = \sum_{k=1}^{\infty} \frac{k}{2^k}$.

a How is $S$ related to the power series $p(x) = \sum_{k=1}^{\infty} \frac{k x^{k-1}}{2^k}$?

b Compute an antiderivative $P(x)$ of $p(x)$.

c Write $P(x)$ as a ratio $F(x)$, using the sum of a geometric series formula.

d Compute $F'(1)$. What is the significance of this value?

Q27 Write a power series for $f(x) = \tan^{-1} x$ by

Differentiating $f(x)$
Writing $f'(x)$ as a geometric series
Taking an antiderivative of the geometric series

Section 3.5

Taylor Series
Goals:

1 Use a combination of power series and algebra to work with functions.


2 Integrate and differentiate power series.

Our goal has been to understand how to extend a Taylor polynomial to have infinite degree. We are
now ready to define the object rigorously. In general we will not know how to evaluate Taylor series.
If all we want to do is approximate values, they offer no advantages over Taylor polynomials. The
applications of Taylor series are more abstract. After defining these objects, we collect some tricks and
applications for working with them.

Question 3.5.1
What Is a Taylor Series?

Definition

The Taylor series of $f(x)$ at $x = a$ is
$$T(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x - a)^k.$$

The Taylor series's notation simply swaps an $n$ for an $\infty$ in the expression of a Taylor polynomial. If we wanted to describe the mathematical relationship precisely, we would say its partial sums $s_n$ are the Taylor polynomials $T_n(x)$ of $f$ at $x = a$.
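To see the partial-sum relationship concretely, here is a minimal Python sketch. It assumes we already know the derivative values $f^{(k)}(a)$ and passes them in as a list; the function name `taylor_polynomial` is our own. As a test case it uses $f(x) = e^x$, whose derivatives all equal $e^x$, so every derivative at $a = 0$ is 1.

```python
import math

def taylor_polynomial(derivs_at_a, a, x):
    """Evaluate T_n(x) = sum_{k=0}^{n} f^(k)(a) / k! * (x - a)^k.

    derivs_at_a[k] holds f^(k)(a); the list length n + 1 sets the degree n.
    """
    return sum(d / math.factorial(k) * (x - a) ** k
               for k, d in enumerate(derivs_at_a))

# For f(x) = e^x at a = 0 every derivative equals 1, so the partial sums s_n
# of the Taylor series are the Taylor polynomials T_n(1), approaching e.
for n in (2, 5, 10, 15):
    print(n, taylor_polynomial([1.0] * (n + 1), 0.0, 1.0), math.e)
```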

Remark

Several mathematicians contributed to the discovery of Taylor series. Taylor series centered at x = 0
were popularized by Colin Maclaurin, and so are often called Maclaurin series.

This definition is built upon a stack of more general definitions, and the methods we have for working
with those apply here.
A Taylor series is a type of power series.

A power series is a type of series.


A series is equivalent to a sequence of partial sums.
This list should make us feel better about our hard work over the last few sections. It also gives us
information that helps us understand Taylor series better. For example, since Taylor series are power
series, their domains are also intervals of radius R centered at a.


Limitations of Taylor Series

Taylor polynomials were designed to approximate $f(x)$. We might hope that $T(x)$ would be the perfect approximation, that $T(x)$ and $f(x)$ are equal. Unfortunately, there are obstacles to this.
The Taylor series might not converge for all $x$.
The Taylor polynomials might not approximate $f(x)$ very well at all. Recall our example
$$f(x) = \begin{cases} 0 & \text{if } x \le 0 \\ e^{-1/x} & \text{if } x > 0 \end{cases}$$

For this function, the Taylor series centered at $x = 0$ is $T(x) = 0$, which differs from $f(x)$ for every $x > 0$.

Example 3.5.2
Writing a Taylor series

Let $f(x) = e^x$

a Find the Taylor series for $f(x)$ centered at $x = 0$.

b On what interval does it converge?

Solution

a We have seen previously that $f^{(k)}(x) = e^x$ for all $k$ and thus $f^{(k)}(0) = 1$. We plug this into the Taylor series formula.
$$T(x) = \sum_{k=0}^{\infty} \frac{1}{k!}x^k$$

b A Taylor series is a power series. We will use the ratio test to identify the interval of convergence.

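The ratio of successive terms is
$$\left|\frac{a_{k+1}}{a_k}\right| = \left|\frac{\frac{1}{(k+1)!}x^{k+1}}{\frac{1}{k!}x^k}\right| = \left|\frac{k!\,x^{k+1}}{(k+1)!\,x^k}\right| = \frac{|x|}{k+1}$$
$$\lim_{k\to\infty}\frac{|x|}{k+1} = 0$$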

This limit is zero no matter what value of x we choose. Since 0 < 1, the ratio test concludes that
this series converges for any value of x. In other words, the domain is all real numbers.
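The conclusion that the series converges for every $x$ can be watched numerically. The Python sketch below builds each term from the previous one by multiplying by $\frac{x}{k+1}$, the same ratio that appeared in the ratio test, and compares 100-term partial sums with `math.exp`. The sample values of $x$ and the number of terms are arbitrary choices; for large negative $x$ the alternating terms cancel heavily, so floating-point accuracy, not convergence, becomes the limiting factor.

```python
import math

def exp_partial_sum(x, n):
    """s_n = sum_{k=0}^{n} x^k / k!, built term by term.

    Each term is the previous one times x / (k + 1), which is exactly
    the ratio a_{k+1} / a_k computed in the ratio test.
    """
    term, total = 1.0, 1.0
    for k in range(n):
        term *= x / (k + 1)
        total += term
    return total

for x in (1.0, -5.0, 20.0):
    print(x, exp_partial_sum(x, 100), math.exp(x))
```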

Synthesis 3.5.3
Is a Taylor Series Equal to the Function it Approximates?

Let $f(x) = \ln x$

a Find a pattern in the derivatives and write a general expression for the $k$th derivative: $f^{(k)}(x)$.

b Use your answer to a to write expressions for the Taylor polynomials $T_n(x)$ and the Taylor series $T(x)$ of $\ln x$ centered at 1. Simplify the coefficients if possible.

c What does the ratio test tell you about where $T(x)$ converges?

d If we wanted to apply Taylor's inequality to $T_n(x)$, we would need to know where the derivative is largest (in absolute value). Where is the $(n + 1)$th derivative largest on the interval $[x, 1]$? (Here $0 < x < 1$.)

e Where is the $(n + 1)$th derivative largest on the interval $[1, x]$? (Here $x > 1$.)

f What does Taylor's inequality say about where $R_n(x) \to 0$ as $n \to \infty$?

g What does our answer to the previous question tell us about $T(x)$?


Solution

a Let's compute some derivatives and see if we can find an expression for $f^{(k)}(x)$.

$f(x) = \ln x \qquad f(1) = 0$

$f'(x) = x^{-1} \qquad f'(1) = 1$

$f''(x) = -x^{-2} \qquad f''(1) = -1$

$f'''(x) = 2x^{-3} \qquad f'''(1) = 2$

$f^{(4)}(x) = -6x^{-4} \qquad f^{(4)}(1) = -6$

$f^{(5)}(x) = 24x^{-5} \qquad f^{(5)}(1) = 24$

These answers look like factorials, but they're shifted by 1: the $k$th derivative involves $(k - 1)!$. They also alternate in sign. We could model this with $(-1)^k$, except that here the even-order derivatives are negative, so we use $(-1)^{k+1}$ instead. The power of $x$ is $-k$. One way to model this is $f^{(k)}(x) = (-1)^{k+1}(k - 1)!\,x^{-k}$.

b Plugging in $x = 1$ gives $f^{(k)}(1) = (-1)^{k+1}(k - 1)!$ except at $k = 0$. For that case we compute $\ln 1 = 0$. This means we can leave it out of the summation. The form for the remaining terms allows for some nice simplification.
$$T(x) = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}(k - 1)!}{k!}(x - 1)^k = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k}(x - 1)^k$$

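c We'll apply the ratio test.
$$\left|\frac{a_{k+1}}{a_k}\right| = \left|\frac{\frac{(-1)^{k+2}}{k+1}(x - 1)^{k+1}}{\frac{(-1)^{k+1}}{k}(x - 1)^k}\right| = \left|\frac{(-1)^{k+2}\,k\,(x - 1)^{k+1}}{(-1)^{k+1}(k + 1)(x - 1)^k}\right| = \left|\frac{-k(x - 1)}{k + 1}\right| = \frac{k|x - 1|}{k + 1}$$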

Now we'll solve for when the limit of this ratio is less than 1.
$$\lim_{k\to\infty}\frac{k|x - 1|}{k + 1} < 1$$
$$|x - 1|\lim_{k\to\infty}\frac{k}{k + 1} < 1$$
$$|x - 1| < 1$$
$$-1 < x - 1 < 1$$
$$0 < x < 2$$
The Taylor series converges on the interval $(0, 2)$.

d To apply Taylor's inequality to bound $|R_n(x)|$, we need a bound on $|f^{(n+1)}(x)|$ on the interval from 1 to $x$. Looking back at our earlier computation, we obtain $f^{(n+1)}(x) = (-1)^{n+2}\,n!\,x^{-n-1}$. In this case, where $0 < x < 1$, the derivative $f^{(n+1)}$ decreases in magnitude from $x$ to 1, so it is largest at $x$. We can use $M = n!\,x^{-n-1}$.

e In this case, $f^{(n+1)}$ decreases in magnitude from 1 to $x$, so it is largest at 1. We can use $M = n!$.

f The easier case is $x \ge 1$. In this case Taylor's inequality states
$$|R_n(x)| \le \frac{n!}{(n + 1)!}(x - 1)^{n+1} = \frac{1}{n + 1}(x - 1)^{n+1}$$
As $n$ approaches infinity, this bound goes to infinity if $x - 1 > 1$ and to 0 if $x - 1 \le 1$. In the case that $0 < x < 1$, we need some clever algebra to write this as a multiple of an exponential.
$$|R_n(x)| \le \frac{n!\,x^{-n-1}}{(n + 1)!}|x - 1|^{n+1} = \frac{1}{n + 1}\left|\frac{x - 1}{x}\right|^{n+1}$$
This goes to 0 if $\left|\frac{x - 1}{x}\right| \le 1$ and to infinity otherwise. Solving this (and assuming $x > 0$) gives $x \ge \frac{1}{2}$. Putting these together, we can state that the error bound from Taylor's inequality approaches 0 as we take higher degree Taylor polynomials, as long as $\frac{1}{2} \le x \le 2$.

g The answer to the previous question tells us that $T(x)$ converges to $\ln x$ on $\left[\frac{1}{2}, 2\right]$, since the error bound and hence the error goes to 0. Outside this interval the argument gives no conclusion: on $\left(0, \frac{1}{2}\right)$ the error might still go to 0, even though the error bound does not. The series diverges for $x \le 0$ and for $x > 2$, so it cannot converge to $\ln x$ there.


Remark

It turns out that $T(x) = \ln x$ on $(0, 2]$, which is a larger interval than we were able to establish using Taylor's inequality. This should not bother us. Taylor's inequality produces a bound on the error. The fact that the bound on the error goes to infinity doesn't mean the actual error does. In this case, for $x$ between 0 and $\frac{1}{2}$, the actual error approaches 0.
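This remark can be checked numerically. The Python sketch below takes $x = 0.3$, which lies in $\left(0, \frac{1}{2}\right)$, and compares the actual error $|\ln x - T_n(x)|$ with the bound $\frac{1}{n+1}\left|\frac{x-1}{x}\right|^{n+1}$ from part f. The sample point, the degrees, and the function names are our own choices.

```python
import math

def ln_taylor_polynomial(x, n):
    """T_n(x) = sum_{k=1}^{n} (-1)^(k+1) (x - 1)^k / k, the Taylor polynomial of ln x at 1."""
    return sum((-1) ** (k + 1) * (x - 1) ** k / k for k in range(1, n + 1))

def taylor_bound(x, n):
    """Bound from Taylor's inequality when 0 < x < 1: (1/(n+1)) |(x - 1)/x|^(n+1)."""
    return abs((x - 1) / x) ** (n + 1) / (n + 1)

x = 0.3
for n in (10, 50, 200):
    actual_error = abs(math.log(x) - ln_taylor_polynomial(x, n))
    # The actual error shrinks toward 0 while the bound grows without limit
    print(n, actual_error, taylor_bound(x, n))
```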

Figure: The Taylor polynomials approach ln x only on (0, 2].

Example 3.5.4
Mixing Taylor Series and Algebra

Let $f(x) = x^2 \sin x$. Compute a Taylor series for $f(x)$ centered at $x = 0$.

Solution

We could try to work out a pattern in the derivatives of $f$, but even though we only need their values at $x = 0$, the computations quickly become unwieldy.
$$f'(x) = 2x\sin x + x^2\cos x$$
$$f''(x) = 2\sin x + 4x\cos x - x^2\sin x$$
$$f'''(x) = 6\cos x - 6x\sin x - x^2\cos x$$
$$f^{(4)}(x) = -12\sin x - 8x\cos x + x^2\sin x$$

Instead we can write the Taylor series for $\sin x$. Our earlier work gave us an expression for the Taylor polynomials and showed that their error goes to 0 as the degree goes to infinity.
$$\sin x = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k + 1)!}x^{2k+1}$$

We can obtain an expression for $x^2 \sin x$ by multiplying both sides by $x^2$. Since we're only multiplying by a power of $x$, the resulting series will still be a power series centered at 0.
$$x^2 \sin x = x^2 \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k + 1)!}x^{2k+1} = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k + 1)!}x^{2k+3}$$
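A short Python check of this series at a few sample points (our own choice), comparing a 21-term partial sum with $x^2 \sin x$ computed directly:

```python
import math

def x2_sin_series(x, n):
    """Partial sum of sum_{k=0}^{n} (-1)^k x^(2k+3) / (2k+1)!, the series for x^2 sin x."""
    return sum((-1) ** k * x ** (2 * k + 3) / math.factorial(2 * k + 1)
               for k in range(n + 1))

for x in (0.5, 2.0, 4.0):
    print(x, x2_sin_series(x, 20), x ** 2 * math.sin(x))
```

The series also agrees with the derivatives computed above: the coefficient of $x^3$ is 1, so the series predicts $f'''(0) = 3! \cdot 1 = 6$, matching $f'''(0) = 6\cos 0 = 6$.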

Main Idea

When constructing a Taylor series for $f(x) = x^k g(x)$ centered at 0, construct the Taylor series of $g(x)$, and then distribute the $x^k$.

Example 3.5.5
Integrating a Taylor Series

Let $f(x) = e^{x^2}$.

a Write a Taylor polynomial $T_4(x)$ for $f(x)$ at $x = 0$.

b Find a better way to produce the Taylor series for $f(x)$.

c Compute a Taylor series for $\int e^{x^2}\,dx$.


Solution

a We will compute the first four derivatives of $f(x)$. We will need the chain rule and later the product rule.

$f(x) = e^{x^2} \qquad f(0) = 1$

$f'(x) = 2xe^{x^2} \qquad f'(0) = 0$

$f''(x) = 2e^{x^2} + 4x^2e^{x^2} \qquad f''(0) = 2$

$f'''(x) = 12xe^{x^2} + 8x^3e^{x^2} \qquad f'''(0) = 0$

$f^{(4)}(x) = 12e^{x^2} + 48x^2e^{x^2} + 16x^4e^{x^2} \qquad f^{(4)}(0) = 12$

We can plug these values into our $T_4(x)$ formula.
$$T_4(x) = \frac{1}{0!}x^0 + \frac{0}{1!}x^1 + \frac{2}{2!}x^2 + \frac{0}{3!}x^3 + \frac{12}{4!}x^4 = 1 + x^2 + \frac{1}{2}x^4$$
We can see that our derivative calculations would quickly get out of hand as we take higher order
derivatives. Even if there is a discernible pattern, it might take more computation to determine it.

b A better approach is to start with a simpler Taylor series that we know.
$$e^x = \sum_{k=0}^{\infty} \frac{1}{k!}x^k$$

$e^{x^2}$ is a composition of $e^x$ and $x^2$, so we will plug in $x^2$ for $x$ in our $e^x$ Taylor series.
$$e^{x^2} = \sum_{k=0}^{\infty} \frac{1}{k!}(x^2)^k = \sum_{k=0}^{\infty} \frac{1}{k!}x^{2k}$$

c Taylor series are also power series. By our theorem on power series, we can integrate term by term.
$$\int e^{x^2}\,dx = \sum_{k=0}^{\infty} \frac{1}{k!(2k + 1)}x^{2k+1} + c$$

Note that $\int e^{x^2}\,dx$ has no formula in terms of elementary functions, so we cannot compute it by the usual antiderivative rules. A Taylor series gives us some way to represent this function, but we shouldn't be too satisfied. If we actually wanted to evaluate it, the best we could do is approximate it with a partial sum.
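To see what approximating with a partial sum looks like, here is a short Python sketch that evaluates partial sums of the series at $x = 1$. Choosing $c = 0$ makes the series equal to $\int_0^x e^{t^2}\,dt$, so its value at 1 is $\int_0^1 e^{x^2}\,dx$. As an independent check we compare against a composite Simpson's rule; the helper names and the number of subintervals are our own assumptions.

```python
import math

def series_antiderivative(x, n=30):
    """Partial sum of sum_{k=0}^{n} x^(2k+1) / (k! (2k+1)), the antiderivative of e^(x^2) with c = 0."""
    return sum(x ** (2 * k + 1) / (math.factorial(k) * (2 * k + 1))
               for k in range(n + 1))

def simpson(f, a, b, m=1000):
    """Composite Simpson's rule on [a, b] with m subintervals (m must be even)."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 == 1 else 2) * f(a + i * h)
    return total * h / 3

# With c = 0 the series equals the integral of e^(t^2) from 0 to x.
for n in (2, 5, 10, 30):
    print(n, series_antiderivative(1.0, n))
print("Simpson:", simpson(lambda t: math.exp(t * t), 0.0, 1.0))
```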

Figure: The graph of $e^{x^2}$, $\int e^{x^2}\,dx$, and the partial sums of its Taylor series.

Main Ideas

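The Taylor series of a composition can be found by substituting into a known Taylor series, as we substituted $x^2$ into the series for $e^x$.


Taylor series allow us to integrate functions whose antiderivatives cannot be written in terms of elementary functions.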

Application 3.5.6
Euler’s Formula

Recall $i$ is an imaginary number that satisfies $i^2 = -1$.

a Find an expression for $f(x) = e^{ix}$.

b Write your answer in terms of the Taylor series for $\sin x$ and $\cos x$.

c Write two different expressions for $e^{i2x}$. How is this equation useful?


Solution

a We can express $e^{ix}$ by replacing $x$ by $ix$ in the Taylor series for $e^x$.
$$T(x) = \sum_{k=0}^{\infty} \frac{1}{k!}(ix)^k$$

b To make much sense of this, we should try to simplify $i^k$.

$i^0 = 1 \qquad i^4 = 1$

$i^1 = i \qquad i^5 = i$

$i^2 = -1 \qquad \vdots$

$i^3 = -i$

We can write out the terms of $T(x)$ as follows:
$$T(x) = 1 + ix - \frac{1}{2}x^2 - \frac{1}{3!}ix^3 + \frac{1}{4!}x^4 + \frac{1}{5!}ix^5 - \frac{1}{6!}x^6 - \frac{1}{7!}ix^7 + \cdots$$
2 3! 4! 5! 6! 7!

The terms with a factor of $i$ are the Taylor series for $\sin x$ multiplied by $i$. The terms without a factor of $i$ are the Taylor series for $\cos x$. We can write
$$e^{ix} = \cos x + i\sin x$$
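A quick numerical sanity check of this conclusion, as a Python sketch: sum the first terms of $\sum \frac{(ix)^k}{k!}$ using complex arithmetic and compare with $\cos x + i\sin x$. The value $x = 1.2$ and the 60-term cutoff are arbitrary; `cmath.exp` is Python's built-in complex exponential, used only as a second reference.

```python
import cmath
import math

def exp_series_complex(z, n=60):
    """Partial sum sum_{k=0}^{n} z^k / k! for a complex number z."""
    term, total = 1 + 0j, 1 + 0j
    for k in range(n):
        term *= z / (k + 1)
        total += term
    return total

x = 1.2
lhs = exp_series_complex(1j * x)          # Taylor series of e^{ix}
rhs = complex(math.cos(x), math.sin(x))   # cos x + i sin x
print(lhs, rhs, cmath.exp(1j * x))
```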

c One way to write this would be to substitute $2x$ for $x$:
$$e^{i2x} = \cos 2x + i\sin 2x$$

Another way would be to square our original formula.
$$e^{i2x} = \left(e^{ix}\right)^2 = (\cos x + i\sin x)^2 = \cos^2 x + 2i\cos x\sin x - \sin^2 x \qquad (i^2 = -1)$$

Setting these equal to each other, we note that for two complex numbers to be equal, their real parts must be equal and their imaginary parts must be equal.
$$\cos 2x + i\sin 2x = \cos^2 x + 2i\cos x\sin x - \sin^2 x$$
$$\cos 2x = \cos^2 x - \sin^2 x$$
$$\text{and}\quad \sin 2x = 2\cos x\sin x$$

These are the double angle formulas for sine and cosine.

We can take higher powers of eix to produce triple or quadruple angle formulas. This converts a
difficult geometry problem into something a high school algebra student could compute.
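The double angle derivation can also be checked numerically. This Python sketch squares $\cos x + i\sin x$ with complex arithmetic and compares the real and imaginary parts with $\cos 2x$ and $\sin 2x$ at a few sample points of our choosing; it confirms the identities at those points only and is not a proof.

```python
import math

def squared_euler_parts(x):
    """Real and imaginary parts of (cos x + i sin x)^2."""
    z = complex(math.cos(x), math.sin(x)) ** 2
    return z.real, z.imag

for x in (0.3, 1.0, 2.5):
    real_part, imag_part = squared_euler_parts(x)
    # The real part should match cos 2x and the imaginary part sin 2x
    print(x, real_part, math.cos(2 * x), imag_part, math.sin(2 * x))
```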

Remark

You would expect a relationship like this to be very famous, and it is. $e^{ix} = \cos x + i\sin x$ is called
Euler's Formula. In addition to trigonometric formulas, it gives us insight into the complex numbers.
This connection between an exponential and a periodic function is so powerful that it is used in such
concrete applications as electrical engineering and signal processing.

Section 3.5
Exercises

Summary Questions

Q1 How can we be sure that a Taylor series converges to the function it is approximating?

Q2 What is the domain of a Taylor series?

Q3 How can we produce the Taylor series for $x^n f(x)$ or $f(x^n)$? Where does the center need to be for the result to be a Taylor series?

Q4 What is a Maclaurin series?

3.5.1

Q5 If we wanted to compute a decimal approximation of $\ln(1.25)$ by hand, would the Taylor polynomial or the Taylor series be more useful?

Q6 If T (x) is a Taylor series centered at x = a, what are the possible forms that the domain of T (x)
could take?


3.5.2

Q7 How would the Taylor series of $f(x) = e^x$ change if we centered it at $x = 1$ instead of $x = 0$?

Q8 Let $T(x)$ be the Taylor series of $f(x) = e^x$ centered at 0. Verify that $T'(x) = T(x)$.

Q9 Write a Taylor series of $f(x) = \frac{1}{x}$ centered at 4.

Q10 Write a Taylor series of $f(x) = \frac{1}{x^2}$ centered at $-5$.

Q11 Write a Taylor series of f (x) = cos x centered at 0.

Q12 Write a Taylor series of f (x) = sin x centered at 0.

3.5.3

Q13 Show that the Taylor series of $f(x) = e^x$ centered at $x = 0$ is equal to $f(x)$ for all real numbers $x$.

Q14 Show that the Taylor series of $f(x) = \sin x$ centered at $x = 0$ is equal to $f(x)$ for all real numbers $x$.

Q15 Show that the Taylor series of $f(x) = \frac{1}{x}$ centered at 4 is equal to $f(x)$ for all $x$ in the interval $(2, 6)$.

Q16 Suppose for a function $f$ we are able to place a bound of $\frac{3^k}{k!}$ on the $k$th derivative of $f$ over any interval. For what values of $x$ can we conclude that $T(x)$, the Taylor series centered at 2, is equal to $f(x)$?

Q17 We didn't have a series test to determine whether $\sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k}$ converges. How does our analysis of the Taylor series of $\ln x$ allow us to conclude that this series converges? Hint: what is $T(2)$?

Q18 For a general function f and its Taylor polynomials and series, how are the following sets of points
related? Does every number belonging to one of these sets belong to one of the others?
The set of numbers x where T (x) converges.
The set of numbers x where |Rn (x)| → 0 as n → ∞.
The set of numbers where f (x) = T (x).

3.5.4

Q19 Write a Taylor series for $f(x) = x^5 \cos x$ centered at $x = 0$.

Q20 Write a Taylor series for $f(x) = x^3 e^x$ centered at $x = 0$.

Q21 Can we use our Taylor series for $f(x) = \ln x$ centered at 1 to write a Taylor series for $g(x) = x^2 \ln x$? Explain.

Q22 Write a Taylor series for $f(x) = \frac{(x + 5)^3}{x^2}$ centered at $-5$.

3.5.5

Q23 Let $g(x)$ be an antiderivative of $e^{x^3}$. Write the Taylor series for $g(x)$ centered at $x = 0$.

Q24 Let $g(x)$ be an antiderivative of $\cos(x^2)$. Write the Taylor series for $g(x)$ centered at $x = 0$.

Q25 Let $f(x) = \cos x$. Let $T(x)$ be the Taylor series of $f$ centered at $x = 0$. Compute $T''(x)$. Why does your answer make sense?

Q26 Write the Taylor series for $f(x) = \frac{1}{x}$ centered at 1. Verify that one of its antiderivatives is a Taylor series for $\ln x$.

3.5.6

Q27 Rewrite our formula for cos(2x) to be entirely in terms of cos x.

Q28 Use Euler’s formula to compute a formula for cos 3x in terms of cos x and sin x.

Q29 According to Euler's formula, what is the value of $e^{2\pi i}$?

Q30 Use the Taylor series of ln x centered at x = 1 to compute ln(1 + i). Do you think this series
converges?


Synthesis & Extension

Q31 Let $h(x) = \frac{1}{x^2}$.

a Compute the Taylor polynomial $T_3$ centered at $x = 4$.

b If you wanted to use your Taylor polynomial from a to approximate $\frac{1}{2.5^2}$, what bound would Taylor's inequality put on the error? Don't simplify the arithmetic.

c What does the ratio test tell you about the domain of the Taylor series of h(x) centered at
x = 4?

Q32 Let $X$ be a normal random variable with mean 0 and standard deviation 1. Write a series whose value is $P(0 \le X \le 1)$.

Q33 Suppose we produce the Taylor series T (x) for some f (x) centered at x = 10.

a If the Taylor series converges at x = 5, must it also converge at x = 7? Explain.

b If the errors of the Taylor polynomials $T_n(2)$ converge to 0 as $n$ goes to $\infty$, must $T(2)$ converge? If $T(2)$ converges, must the errors converge to 0?

c If you wanted to approximate f (7) as accurately as possible, which would be more useful, a
Taylor polynomial or a Taylor series?

Q34 Suppose we have a function $f(x)$ and two different numbers $a$ and $b$. Suppose further that the Taylor series for $f(x)$ centered at $a$ is equal to the Taylor series for $f(x)$ centered at $b$. What can you say about the domain of this Taylor series?

Chapter 4

Multivariable Functions

This chapter introduces functions of more than one variable. We construct the higher dimensional spaces
needed for their domains, we produce tools to visualize them, and we compute their rates of change.

Contents
4.1 Three-Dimensional Coordinate Systems
4.2 Functions of Several Variables
4.3 Limits and Continuity
4.4 Partial Derivatives
4.5 Linear Approximations
Section 4.1

Three-Dimensional Coordinate Systems


Goals:

1 Plot points in a three-dimensional coordinate system.


2 Use the distance formula.
3 Recognize the equation of a sphere and find its radius and center.
4 Graph an implicit function with a free variable.

Suppose we wanted to understand the growth rate of a species of bacteria. We could grow several
dozen cultures and take a series of measurements of size s at times t of each. Each measurement is an
ordered pair (t, s). We can plot these pairs in a coordinate plane to get a visual sense of how growth
occurs over time. We might even fit a function that approximates s as a function of t. What if we wanted
to understand the role of some other measurement, like temperature, light, or the availability of various
food sources? We could grow many cultures in different conditions. Now a single measurement has three
or more pieces of information. While we could strip these out and plot our data on a temperature/size
coordinate plane, we risk missing important relationships with the other variables. In order to take
advantage of the visual and computational benefits of a coordinate system, we must be prepared to
work with a coordinate system of more than two variables.

Question 4.1.1
How Do Cartesian Coordinates Extend to Higher Dimensions?

The best way to define a higher-dimensional coordinate system is to extrapolate from the coordinate
plane. This way we don’t need to remember a set of novel and arbitrary rules, and our two-dimensional
experience will be a guide to us in dimensions where we have no visual intuition.
Recall how we constructed the Cartesian plane.

1 Assign origin and two directions (x, y).

2 y is 90 degrees anticlockwise from x.

3 Axes consist of the points displaced in only one direction.

4 Coordinates refer to displacement from the origin in each direction.

5 Either displacement can happen first.

6 Each point has exactly one ordered pair that refers to it.

Figure: The point (2, 3) in the xy-plane.

To build a three-dimensional Cartesian coordinate system, we can extrapolate from two dimensions.

1 Assign origin and three directions (x, y, z).

2 Each axis makes a 90 degree angle with the other two.

3 The z direction is determined by the right-hand rule.

Question 4.1.2
How Do We Establish Which Direction Is Positive in Each Axis?

The choice of which direction is positive is arbitrary. However, it is important that we all make the
same choice, or our visualizations will be incompatible. In two dimensions, we agree that the positive
y-axis is counterclockwise from the positive x-axis. This will not work in three dimensions. Suppose the
positive y-axis is counterclockwise from the positive x-axis in three-space. If you rotate your point of
view to see the axes from the other side, the positive y-axis is now clockwise from the positive x-axis.
Thus the relative orientation of the positive x and y directions does not matter. If you picked a different orientation, you would just be looking at three-space from a different viewpoint.
The z direction is different. Once we’ve chosen a positive x and y direction, there are two equally
valid possible directions for positive z, pointing in opposite directions from each other. The choice here
matters, but it will be arbitrary. We agree to define the positive z direction by the right hand rule.
The right hand rule says that if you make the fingers of your right hand follow the (counterclockwise)
unit circle in the xy-plane, then your thumb indicates the direction of the positive z-axis.

Figure: The counterclockwise unit circle in the xy-plane

Example 4.1.3
Drawing a Location in Three-Dimensional Coordinates

The point (2, 3, 5) is the point displaced from the origin by

2 in the x direction

3 in the y direction
5 in the z direction.

How do we draw a reasonable diagram of where this point lies?

Solution

We can begin by finding the points (2, 0, 0), which lies on the x-axis two units from the origin, and (0, 3, 0), which lies on the y-axis three units from the origin. Along with the origin itself, these points and (2, 3, 0) form a parallelogram. Now we need a displacement of 5 in the z direction. We can copy the length and direction of this displacement from the segment from (0, 0, 0) to (0, 0, 5) on the z-axis. We draw a segment of that length and direction from (2, 3, 0). The top of this segment is (2, 3, 5).

Remark

The extra lines we used to construct (2, 3, 5) are not just useful for guaranteeing accuracy; they also help our audience to correctly visualize the location we mean to plot. When we project three-space onto a flat page, each point on the page represents infinitely many points stretching into the background. If we only draw an isolated point, which of these are we representing? Lines like the ones we produced in this example trick a viewer's brain into visualizing the correct three-dimensional location in our flat diagram.

How can we draw a reasonable diagram of (−5, 1, −4)?

Solution

The procedure here is the same, except that the displacements in the x and z directions are negative.
Thus when producing these displacements, we travel backward along their axes.

Question 4.1.4
How Do We Measure Distance in Three-Space?

Since coordinate displacements in two-space are perpendicular, we compute the distance to a point
using the Pythagorean theorem. This reasoning extends to higher dimensions, but we need to build the
correct length using two or more right triangles.

Theorem

The distance from the origin to the point $(x, y, z)$ is given by the Pythagorean Theorem
$$D = \sqrt{x^2 + y^2 + z^2}$$


We first compute the distance from the origin to $(x, y, 0)$ using a right triangle in the xy-plane; this distance is $\sqrt{x^2 + y^2}$. The right triangle with the vertices $(0, 0, 0)$, $(x, y, 0)$ and $(x, y, z)$ allows us to apply the Pythagorean theorem again.
$$D^2 = \left(\sqrt{x^2 + y^2}\right)^2 + z^2 = x^2 + y^2 + z^2$$

If neither of the points is the origin, we can compute the displacements by subtraction. This is a
natural extension of the two-space distance formula.

Theorem

The distance from the point $(x_1, y_1, z_1)$ to the point $(x_2, y_2, z_2)$ is given by
$$D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$
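The two-triangle argument translates directly into code. A minimal Python sketch (the function name `distance_3d` is our own); `math.dist`, available in Python 3.8 and later, computes the same Euclidean distance and serves as a check:

```python
import math

def distance_3d(p, q):
    """Distance between points p = (x1, y1, z1) and q = (x2, y2, z2)."""
    dx, dy, dz = p[0] - q[0], p[1] - q[1], p[2] - q[2]
    # First right triangle: the diagonal in a plane parallel to the xy-plane
    flat = math.sqrt(dx ** 2 + dy ** 2)
    # Second right triangle: its legs are that diagonal and the z displacement
    return math.sqrt(flat ** 2 + dz ** 2)

print(distance_3d((1, 2, 3), (4, 6, 3)))   # prints 5.0 (a 3-4-5 triangle)
print(distance_3d((0, 0, 0), (2, 3, 5)))   # sqrt(4 + 9 + 25) = sqrt(38)
print(math.dist((0, 0, 0), (2, 3, 5)))     # standard-library check
```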

Question 4.1.5
What Is a Graph?

A well-prepared calculus student has learned to understand the graphs of many equations: lines,
circles, parabolas. The definition of a graph, on the other hand, is often discarded after a few exercises
of plotting points by hand. The definition is worth recalling. It applies to a space of any dimension.

Definition

The graph of an implicit equation is the set of points whose coordinates satisfy that equation. In other
words, the two sides are equal when we plug the coordinates in for x, y and z.

This definition allows us to immediately understand the graphs of some equations. The graph of
the following equation consists of the points that, when plugged into a specific distance formula and
squared, give a result of 9. This is a sphere.

Example

The graph of

$$x^2 + (y - 4)^2 + (z + 1)^2 = 9$$

is the set of points that are distance 3 from the point


(0, 4, −1)
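The definition of a graph is directly computable: a point lies on the graph exactly when plugging in its coordinates makes both sides equal. A minimal Python sketch for the sphere above; the function name and the tolerance parameter (which absorbs floating-point rounding for non-integer inputs) are our own additions.

```python
def on_sphere(point, tol=1e-12):
    """Does (x, y, z) satisfy x^2 + (y - 4)^2 + (z + 1)^2 = 9, up to rounding?"""
    x, y, z = point
    return abs(x ** 2 + (y - 4) ** 2 + (z + 1) ** 2 - 9) <= tol

# Points 3 units from the center (0, 4, -1) along each axis direction
print(on_sphere((3, 4, -1)), on_sphere((0, 7, -1)), on_sphere((0, 4, 2)))
# A point 3 units away in a diagonal direction: displacement (1, 2, 2) has length 3
print(on_sphere((1, 6, 1)))
# The origin is at distance sqrt(0 + 16 + 1) = sqrt(17) from the center, not 3
print(on_sphere((0, 0, 0)))
```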

Example 4.1.6
Graphing an Equation with Two Free Variables

Sketch the graph of the equation y = 3.

Solution

The naive approach would have us seek out the point marked with 3 on the y-axis. However, in two-
space, we know that the graph would be a horizontal line, not just the point (0, 3). Why is this? Any
point of the form (x, 3) satisfies the equation y = 3. Similarly, any point of the form (x, 3, z) in three-
space satisfies y = 3. These are all the points that can be reached from (0, 3, 0) by displacements in
the x and z directions. They create a plane through (0, 3, 0) parallel to the x and z axes.

Much as lines are the simplest and most fundamental one-dimensional objects, planes are the simplest
and most fundamental two-dimensional objects. In addition to coordinate axes, 3-dimensional space has
3 coordinate planes.
1 The graph of z = 0 is the xy-plane.
2 The graph of x = 0 is the yz-plane.

3 The graph of y = 0 is the xz-plane.


Figure: The coordinate planes in 3-dimensional space.

Remark

Planes extend forever but our pictures of them cannot. Notice that graphing software cuts them off
parallel to the axes they contain. The resulting images are parallelograms. This is a good practice when
drawing planes by hand too. It suggests the proper orientation to the viewer, despite the limitations of
a flat visualization.

Example 4.1.7
Graphing an Equation with One Free Variable

Sketch the graph of the equation z = x2 − 3.

Solution

We should recognize this as the equation of a parabola. If we ignore the variable y, we can graph this equation in the xz-plane. What does the absence of y in the equation mean? If we follow the definition of a graph, the value of y has no effect on whether a point lies on the graph or not. We can take the parabola in the xz-plane, and project it in the y direction to obtain a surface called a parabolic cylinder.

Figure: The surface $z = x^2 - 3$.

Question 4.1.8
What Do the Graphs of Implicit Equations Look Like Generally?

Notice that the graph of an implicit equation in the plane is generally one-dimensional (a curve),
whereas the graph of an implicit equation in three-space is generally two-dimensional (a surface).

Figure: The curve $y = x^2 - 3$
Figure: The surface $z = x^2 - 3$

Question 4.1.9
What Is the Slope-Intercept Equation of a Plane?

Unlike a line, a non-vertical plane has two slopes. One measures rise over run in the x-direction, the
other in the y-direction.


Figure: A plane with slopes in the x and y directions.

Equation

A plane with z-intercept $(0, 0, b)$ and slopes $m_x$ and $m_y$ in the x and y directions has equation
$$z = m_x x + m_y y + b.$$

Example 4.1.10
Writing the Equation of a Plane

Write the equation of a plane with intercepts (4, 0, 0), (0, 6, 0) and (0, 0, 8).

Solution

From the point $(4, 0, 0)$ to the point $(0, 0, 8)$, the plane rises by 8 while $x$ is reduced by 4. This gives a slope in the x direction.
$$m_x = \frac{8 - 0}{0 - 4} = -2.$$
Similarly,
$$m_y = \frac{8 - 0}{0 - 6} = -\frac{4}{3}.$$
The point $(0, 0, 8)$ is on the z-axis, and so indicates that the z-intercept is 8. Combining these, we conclude the plane has equation:
$$z = -2x - \frac{4}{3}y + 8$$
3

Main Idea

Given three points in a plane $A = (x_1, y_1, z_1)$, $B = (x_2, y_2, z_2)$ and $C = (x_3, y_3, z_3)$

1 If two points share an x-coordinate, we can directly compute $m_y$ and vice versa.
2 Failing that, we can set up a system of equations and solve for $m_x$, $m_y$ and $b$.
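Step 2 can be carried out mechanically. Each point $(x, y, z)$ gives one linear equation $m_x x + m_y y + b = z$, so three points give a $3 \times 3$ linear system. The Python sketch below solves it with Cramer's rule using exact fractions; Cramer's rule is our choice of method (any method for solving linear systems would do), and the function name is hypothetical. It assumes the plane is not vertical and the points are not collinear; otherwise the system has no unique solution and the sketch raises an error.

```python
from fractions import Fraction

def plane_through(A, B, C):
    """Solve for (m_x, m_y, b) so that z = m_x x + m_y y + b passes through A, B, C.

    Each point (x, y, z) gives the linear equation m_x x + m_y y + b = z.
    Solved by Cramer's rule with exact fractions.
    """
    rows = [(Fraction(x), Fraction(y), Fraction(1)) for x, y, _ in (A, B, C)]
    rhs = [Fraction(z) for _, _, z in (A, B, C)]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(rows)
    if d == 0:
        raise ValueError("no unique plane of the form z = m_x x + m_y y + b")
    solution = []
    for col in range(3):
        # Replace column `col` of the coefficient matrix with the right-hand side
        m = [list(r) for r in rows]
        for i in range(3):
            m[i][col] = rhs[i]
        solution.append(det3(m) / d)
    return tuple(solution)

mx, my, b = plane_through((4, 0, 0), (0, 6, 0), (0, 0, 8))
print(mx, my, b)   # prints -2 -4/3 8
```

Applied to the intercepts from the previous example, it recovers the same slopes and intercept found there by hand.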

Question 4.1.11
How Do We Extrapolate to Even Higher Dimensions?

The more measurements we take of each observation, the more dimensions we need to plot the data we have produced. Extrapolating from three-space to even higher dimensions introduces no new difficulties, except that we cannot visualize the result. We can use a coordinate system to describe a space with more than 3 dimensions. $k$-dimensional space can be defined as the set of points of the form
$$P = (x_1, x_2, \ldots, x_k).$$

Theorem

The distance from the origin to $P = (x_1, x_2, \ldots, x_k)$ in k-space is

$$\sqrt{x_1^2 + x_2^2 + \cdots + x_k^2}.$$
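The formula is the same computation for every k, so one short function covers the plane, three-space and beyond. A minimal Python sketch; the second helper, `distance`, assumes the usual extension of applying the same formula to the coordinate differences of two points.

```python
import math

def distance_from_origin(point):
    """Euclidean distance from the origin to a point with any number of coordinates."""
    return math.sqrt(sum(x * x for x in point))

def distance(p, q):
    """Distance between two points of the same dimension, via their coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return distance_from_origin([a - b for a, b in zip(p, q)])

print(distance_from_origin((3, 4)))        # 5.0
print(distance_from_origin((1, 2, 2)))     # 3.0
print(distance_from_origin((1, 1, 1, 1)))  # 2.0, a point in 4-space
```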

There is no right hand rule for higher dimensions, because we can’t draw these spaces anyway.

Section 4.1
Exercises

Summary Questions

Q1 What displacements are represented by the notation (a, b, c)?

Q2 What is the right hand rule and what does it tell you about a three-dimensional coordinate
system?

Q3 In three-space, what is the y-axis? What are the coordinates of a general point on it?

Q4 In three-space, what is the xz-plane? What are the coordinates of a general point on it? What is its equation?

Q5 How do we use a free variable to sketch a graph?

Q6 How do we recognize the equation of a sphere?

4.1.1

Q7 Suppose that instead of denoting each point $P = (x, y)$ in $\mathbb{R}^2$ by its displacements from the origin in the x- and y-directions, we denote it by $P = (d, m)$ where d is its distance from the origin, and m is the slope of the line through P and the origin. What problems could arise from adopting this convention?

Q8 Suppose the x and y axes were not perpendicular. Could we still assign coordinates to each point
by its x and y displacements from the origin? Demonstrate with a diagram.

4.1.2

Q9 Which of the following depictions of the xy-plane are consistent with the usual orientation, and
which are backwards?

a The positive x axis points up, and the positive y-axis points left.

b The positive x axis points down, and the negative y-axis points right.

c The positive x axis points left, and the positive y-axis points up.

d The negative x axis points right, and the positive y-axis points down.

e The positive x axis points up and to the right, and the positive y-axis points down and to
the right.

Q10 Suppose we draw the xy plane on our paper in the standard way, and our paper is lying on a
table. Does the z-axis point down into the table or up out of the table?

4.1.3

Q11 Draw diagrams of points with the following coordinates.

a (6, 1, 2)

b (−3, 0, 0)

c (2, −1, 4)

d (0, 3, 5)

Q12 Draw diagrams of points with the following coordinates.

a (−4, 0, 0)

b (3, −2, 0)


c (4, 5, −3)

d (−1, 3, 4)

4.1.4

Q13 Compute the distance between (3, 6, 2) and (7, 3, −10).

Q14 Compute the distance between (0, 3, 2) and (5, 1, 0).

Q15 Compute the distance between (10, 12, 109) and (11, 9, 105).

Q16 Compute the distance between (53, 42, 9) and (43, 78, 2).

4.1.5

Q17 Does the point (4, 3, 8) lie on the graph of $z = x^2 - 2$? Explain how you know.

Q18 Does (2, 2, 1) lie on the graph of $x^2 + y^2 + z^2 = 9$? Explain how you know.

Q19 What is the graph of $y^2 + z^2 = -1$? Explain your reasoning.

Q20 The point (2, 3, 4) lies on the graph ax + ay − z = 26. What is the value of the number a?

Q21 Olivia says that the graph of (x − 2)(y − 3) = 0 in the xy-plane is the point (2, 3). Do you agree?
How would you explain it?

Q22 How is the graph of $f(x, y, z)g(x, y, z) = 0$ related to the graphs of $f(x, y, z) = 0$ and $g(x, y, z) = 0$?

4.1.6

Q23 Does the graph of z = 4 intersect the graph of z = 6? Explain both using geometry and algebra.

Q24 Does the graph of x = 2 intersect the graph of z = 1? Explain.

4.1.7

Q25 Sketch the graph of each equation.

a x = −4

b $x^2 + y^2 = 9$

c $x^2 + 4x + y^2 + z^2 - 2z = 4$

Q26 Sketch a graph of z = −2.

Q27 Sketch a graph of $y = -z^2$.

Q28 Sketch a graph of $x^2 + z^2 = 25$.

4.1.8

Q29 What dimension would we expect the graph of an equation to be in 6-dimensional space?

Q30 What is the graph of $x^2 + y^2 = 0$ in the xy-plane? Is this an exception to our intuition about the dimension of a graph?

Q31 Zoe and Muhammad both sketch the graph of $y = x^2$. Zoe’s graph is a curve. Muhammad’s is a surface. Has one of them drawn the wrong graph? Explain.

Q32 In $\mathbb{R}^3$, what is the dimension of the intersection of the graphs $x^2 + y^2 = 25$ and z = 1? Can you explain this in terms of our intuition about the dimension of a graph?


4.1.9

Q33 Suppose that y is a free variable in the equation of a plane. What does that tell us about $m_x$ and $m_y$?

Q34 Gabby is trying to find the equation of a plane P, but she doesn’t know any points on the xz-plane or yz-plane. Instead she knows that P contains the points:

A = (1, 3, 6)   B = (5, 3, 4)   C = (7, 5, 10)

Using points A and B, she decides that $m_x = \frac{4 - 6}{5 - 1} = -\frac{1}{2}$. Using points A and C, she decides that $m_y = \frac{10 - 6}{5 - 3} = 2$.

a Which of Gabby’s conclusions do you agree with and which do you disagree with? Why?

b How could you fix the one that is wrong?

Q35 Suppose you intend to write the equation of the plane through A, B and C in slope-intercept form. If A = (3, 5, 7) and B = (3, 2, 4), what value(s) of the y-coordinate of C would make it easiest to compute $m_x$?

Q36 Recall that we can write the equation of a line in $\mathbb{R}^2$ in point-slope form:

$$y - y_0 = m(x - x_0)$$

where m is the slope and $(x_0, y_0)$ is a known point. This was especially useful in single-variable calculus for writing equations of tangent lines.

a How would you expect to write the equation of the plane P through (2, 4, −6) with slopes $m_x = \frac{1}{2}$ and $m_y = -3$?

b Does your answer to a actually pass through (2, 4, −6)? How do you know?

c Is your answer to a actually the equation of a plane? How do you know? Does it have the correct slopes?

d Write a general expression for point-slope form for a plane.

Q37 The plane P has slopes $m_x = 3$ and $m_y = -1$ and passes through (2, 5, −1).

a Write the equation of P in point-slope form.

b What is the z-intercept of P?

Q38 Given a plane with $m_x = 5$ and $m_y = 2$, we can conclude that the plane is steeper in the x-direction than the y-direction. Is the x-direction the steepest direction we could travel in? If not, what is?

4.1.10

Q39 Write the equation of a plane through (3, 0, 0), (0, 7, 0), and (0, 0, −1).

Q40 Write the equation of a plane with intercepts (2, 0, 0), (0, −2, 0), and (0, 0, 4).

Q41 Write the equation of a plane through (6, 4, 1), (6, 7, −2), and (8, 7, 1).

Q42 Write the equation of a plane through (2, 2, 1), (4, 2, 9), and (2, 0, 0).

Q43 Write the equation of a plane through (3, 4, 2), (5, 5, 6), and (7, 4, 6).

Q44 Write the equation of a plane through (1, 5, 2), (11, 5, 4), and (6, 3, −3).

4.1.11

Q45 Assuming you could draw in 4 dimensions, describe how you might construct the graph of $x_1^2 + x_3^2 + x_4^2 = 25$ in $\mathbb{R}^4$.

Q46 Assuming you could draw in 4 dimensions, describe how you might construct the graph of $x_2 = x_3^2$ in $\mathbb{R}^4$.

Q47 What equation(s) would describe the $x_2x_4$-plane in $\mathbb{R}^4$?

Q48 What would you call the object in $\mathbb{R}^4$ defined by $x_1 = 0$?


Extension and Synthesis

Q49 The points (1, 0, 3) and (1, 4, 0) are both on the sphere S. What are the possible values for the
radius of S?

Q50 The graph of $x^2 + y^2 = 0$ in $\mathbb{R}^2$ is a point, not a curve. Use this idea to write an equation for the intersection of the graphs f(x, y, z) = c and g(x, y, z) = d. What do you expect the dimension of this intersection to be?

Q51 Suppose the x and y axes in $\mathbb{R}^2$ were not perpendicular. Would the distance formula still hold? Demonstrate.

Section 4.2

Functions of Several Variables


Goals:

1 Convert an implicit function to an explicit function.


2 Calculate the domain of a multivariable function.

3 Calculate level curves and cross sections.

If we want to understand the relationship between variables, a function is the gold standard. For
example, when we can write y as a function of x, then at each value of x, we simply need to plug in
the value and simplify the arithmetic. There is no chance that algebraic manipulation will lead us to
multiple values of y, or to an equation we cannot solve. Naturally, we want to understand this type of
relationship between more than two variables. Much like our investigation of n-space, we’ll begin by
adding one variable. After this initial step, extrapolating to more variables will be straightforward.

Question 4.2.1
What Is a Function of More than One Variable?

Definition

A function of two variables is a rule that assigns a number (the output) to each ordered pair of real
numbers (x, y) in its domain. The output is denoted f (x, y).

Some functions can be defined algebraically. If $f(x, y) = \sqrt{36 - 4x^2 - y^2}$, then

$$f(1, 4) = \sqrt{36 - 4 \cdot 1^2 - 4^2} = 4.$$

Example 4.2.2
The Domain of a Function

Identify the domain of $f(x, y) = \sqrt{36 - 4x^2 - y^2}$.


Solution

The only obstacle to evaluating this function is that the value under the square root might be negative.
We can write an inequality to express this and solve.

$$36 - 4x^2 - y^2 \ge 0$$

$$36 \ge 4x^2 + y^2$$

$$1 \ge \frac{x^2}{9} + \frac{y^2}{36}$$

These are the points inside or on an ellipse whose intercepts are (±3, 0) and (0, ±6).

Figure: The domain of a function
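The inequality gives a quick membership test for the domain. A minimal Python sketch, with our own helper names, that checks the condition before taking the square root and reproduces the value f(1, 4) = 4 computed earlier.

```python
import math

def in_domain(x, y):
    """True exactly when sqrt(36 - 4x^2 - y^2) is defined: x^2/9 + y^2/36 <= 1."""
    return 36 - 4 * x**2 - y**2 >= 0

def f(x, y):
    if not in_domain(x, y):
        raise ValueError(f"({x}, {y}) is outside the domain of f")
    return math.sqrt(36 - 4 * x**2 - y**2)

print(f(1, 4))           # 4.0
print(in_domain(3, 0))   # True: (3, 0) is an intercept of the ellipse
print(in_domain(2, 5))   # False: 4 * 2**2 + 5**2 = 41 > 36
```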

Main Idea

When solving for the domain of an algebraic function, we look for the same obstacles to evaluating the
function that we do for one-variable functions.
Expressions in a denominator cannot be 0 (including built-in fractions like $\tan x = \frac{\sin x}{\cos x}$).

Expressions in a square root must be greater than or equal to 0.

Expressions in a logarithm must be greater than 0.


The conditions these produce with a two-variable function may be harder to visualize or simplify than
with a function of one variable.
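In code, the three obstacles show up as errors that Python's `math` module and division raise. The sketch below, with a hypothetical function `g` and helper `defined_at`, probes individual points. It can only test points, not determine a domain, and floating point can miss an obstacle it never hits exactly (for example, `math.tan` does not fail near π/2).

```python
import math

def defined_at(func, x, y):
    """Probe whether func(x, y) can be evaluated, catching the errors Python raises
    for division by zero and for sqrt/log arguments outside their domains."""
    try:
        func(x, y)
        return True
    except (ZeroDivisionError, ValueError):
        return False

def g(x, y):
    # Hypothetical example combining all three obstacles.
    return math.log(x) / (x - y) + math.sqrt(y)

print(defined_at(g, 2.0, 1.0))   # True
print(defined_at(g, 1.0, 1.0))   # False: denominator x - y = 0
print(defined_at(g, -1.0, 1.0))  # False: logarithm of a negative number
print(defined_at(g, 2.0, -1.0))  # False: square root of a negative number
```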

Application 4.2.3
Temperature Maps

Many useful functions cannot be defined algebraically. There is a function T (x, y) which gives
the temperature at each latitude and longitude (x, y) on earth. No pair (x, y) has more than one
temperature, and no pair fails to have a temperature. Still there is no hope of producing an expression
that computes T for any x and y. Mathematically (though perhaps not meteorologically) this function
is arbitrary.

T (−71.06, 42.36) = 50

T (−83.74, 42.28) = 41

T (−84.38, 33.75) = 59

Figure: A temperature map

This function is represented graphically by using color to portray the value of T at each point.

Application 4.2.4
Digital Images

A digital image is made up of pixels, each with its own color. In many modern images, these pixels are too small to see. The color of each pixel is a function of that pixel’s location. Since colors are harder to define numerically, we can consider the simpler case where each pixel is a shade of gray. In this case we have a brightness function B(x, y) where the output is a number that represents the brightness of the pixel at the coordinates (x, y).

Figure: An image represented as a brightness function B on each pixel; for example, B(339, 773) = 158 and B(340, 773) = 127


Remark

The brightness function differs from other functions we’ve studied in one key way. It is only defined for
(x, y) where x and y are integers. Other examples can take any real numbers as coordinates. This makes
our usual calculus methods impossible. We cannot get arbitrarily close to a point in order to compute a
limit. All other points are at least 1 unit away. However, if we are willing to settle for approximations,
we can apply calculus and get useful results.
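One such approximation replaces the limit h → 0 of a difference quotient with the smallest step a pixel grid allows, h = 1. A minimal Python sketch using only the two brightness values quoted for the image; storing B in a dictionary keyed by (x, y) is our own choice of representation.

```python
# Only the two brightness values quoted in the text are real data; the dict
# stands in for an image, keyed by integer pixel coordinates (x, y).
B = {(339, 773): 158, (340, 773): 127}

def forward_difference_x(B, x, y):
    """Approximate the rate of change of brightness in the x-direction at (x, y)
    using the smallest step available on a pixel grid, h = 1."""
    h = 1
    return (B[(x + h, y)] - B[(x, y)]) / h

print(forward_difference_x(B, 339, 773))  # -31.0 brightness units per pixel
```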

Question 4.2.5
What Is the Graph of a Two-Variable Function?

A graph is our most important way to visualize a function. The graph of a one-variable function is an object in two-space: one dimension measures the input variable, and the other measures the output. For a two-variable function, the graph lies in three-space.

Definition

The graph of a function f (x, y) is the set of all points (x, y, z) that satisfy

z = f (x, y).

The height z above a point (x, y) represents the value of the function at (x, y). In this figure,
f (1, 4) is equal to the height of the graph above (1, 4, 0).

Figure: The graph $z = \sqrt{36 - 4x^2 - y^2}$

Question 4.2.6
How Do We Visualize a Graph in Three-Space?

Three-space is harder to visualize than two-space. What’s more, plotting points is more arduous with
two dimensions of inputs. In the absence of computer graphics, mathematicians have used a variety of
visualization tools.

Definition

A level set of a function f (x, y) is the graph of the equation f (x, y) = c for some constant c. For a
function of two variables this graph lies in the xy-plane and is called a level curve.

Example

Consider the function

$$f(x, y) = \sqrt{36 - 4x^2 - y^2}.$$

The level curve $\sqrt{36 - 4x^2 - y^2} = 4$ simplifies to $4x^2 + y^2 = 20$. This is an ellipse. Other level curves have the form $\sqrt{36 - 4x^2 - y^2} = c$ or $4x^2 + y^2 = 36 - c^2$. These are larger or smaller ellipses.
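The algebra can be spot-checked numerically: points sampled from the ellipse $4x^2 + y^2 = 36 - c^2$ should all give the output c. A minimal Python sketch, assuming 0 < c < 6 and using our own parametrization of the ellipse.

```python
import math

def f(x, y):
    return math.sqrt(36 - 4 * x**2 - y**2)

def level_curve_points(c, n=8):
    """Sample n points on the level curve f(x, y) = c, assuming 0 < c < 6.

    f = c is the ellipse 4x^2 + y^2 = 36 - c^2.  With r = sqrt(36 - c^2),
    the points x = (r/2) cos t, y = r sin t lie on it.
    """
    r = math.sqrt(36 - c**2)
    ts = [2 * math.pi * k / n for k in range(n)]
    return [(r / 2 * math.cos(t), r * math.sin(t)) for t in ts]

for c in (2, 4, 5):
    outputs = [f(x, y) for x, y in level_curve_points(c)]
    # Up to rounding, every sampled point returns the same output c.
    print(c, all(math.isclose(v, c) for v in outputs))
```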

Level curves take their shape from the intersection of z = f (x, y) and z = c. Seeing many level
curves at once can help us visualize the shape of the graph.

Figure: The graph z = f (x, y), the planes z = c, and the level curves

Example 4.2.7
Drawing Level Curves

Where are the level curves on this temperature map?

Figure: A temperature map

Solution

The level sets are the points where the temperature has a certain value. Since the colors represent ranges
of temperatures, it’s difficult to pick out the level sets within that range. However, at the transition from
one color to the next, we know that the temperature is equal to the cutoff temperature between those
ranges. The picture below shows a reasonable attempt to sketch three level curves in white. Notice
that the level curves (especially the one between green and yellow) are not connected, and that drawing
them in perfect detail is beyond the ability of a human.

Example 4.2.8
Using Level Curves to Describe a Graph

What features can we discern from the level curves of this topographical map?

Figure: A topographical map


Solution

Figure: The topographical map with the points A, B, C and D marked

There are many features we could describe. Here is a sample.


The point A is surrounded by relatively flat terrain. There are not many level curves here, which
means the altitude is not increasing or decreasing to higher or lower levels.
The points B and C are on slopes. If we travel north or south we cross level curves, meaning our altitude is increasing or decreasing. The slope is steeper at B than at C, because traveling north from B we cross more level curves than traveling north from C.

The points marked D are in the middle of a series of rings of level curves. These are either the
tops of hills or (less likely given the context) the bottoms of valleys.

Example 4.2.9
A Cross Section

Definition

The intersection of a plane with a graph is a cross section. A level curve is a type of cross section, but
not all cross sections are level curves.

Find the cross section of $z = \sqrt{36 - 4x^2 - y^2}$ at the plane y = 1.

Figure: The y = 1 cross section of $z = \sqrt{36 - 4x^2 - y^2}$
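Substituting y = 1 gives the cross section $z = \sqrt{35 - 4x^2}$, which is defined for $|x| \le \frac{\sqrt{35}}{2}$ and is the upper half of the ellipse $4x^2 + z^2 = 35$ in the plane y = 1. A minimal Python sketch (helper names are ours) that checks this formula against the original function at a few points.

```python
import math

def f(x, y):
    return math.sqrt(36 - 4 * x**2 - y**2)

def cross_section_y1(x):
    """The y = 1 cross section: substituting y = 1 gives z = sqrt(35 - 4x^2)."""
    return math.sqrt(35 - 4 * x**2)

half_width = math.sqrt(35) / 2    # the section exists only for |x| <= sqrt(35)/2
for x in (-2.5, -1.0, 0.0, 1.0, 2.5):
    assert abs(x) <= half_width
    assert math.isclose(f(x, 1), cross_section_y1(x))
print(round(half_width, 3))  # 2.958
```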

Example 4.2.10
Converting an Implicit Equation to a Function

Definition

We sometimes call an equation in x, y and z an implicit equation. Often in order to graph these, we convert them to explicit functions of the form z = f(x, y).

Write the equation of a paraboloid $x^2 - y + z^2 = 0$ as one or more explicit functions so it can be graphed. Then find the level curves.


Figure: Level curves of $x^2 - y + z^2 = 0$
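One way to carry out this example: solving for z gives two explicit functions, $z = \sqrt{y - x^2}$ and $z = -\sqrt{y - x^2}$, both with domain $y \ge x^2$. Setting $\sqrt{y - x^2} = c$ (for c ≥ 0) gives the level curves $y = x^2 + c^2$, a family of upward-opening parabolas; the lower function has the same curves at the values −c. A minimal Python sketch that checks these claims at sample points.

```python
import math

def upper(x, y):
    """z = +sqrt(y - x^2): the part of x^2 - y + z^2 = 0 with z >= 0."""
    return math.sqrt(y - x**2)

def lower(x, y):
    """z = -sqrt(y - x^2): the part with z <= 0."""
    return -math.sqrt(y - x**2)

for c in (0.5, 1.0, 2.0):
    for x in (-1.0, 0.0, 1.5):
        y = x**2 + c**2                      # a point on the claimed level curve
        assert math.isclose(upper(x, y), c)
        assert math.isclose(lower(x, y), -c)
        z = upper(x, y)
        # The point (x, y, z) also satisfies the original implicit equation.
        assert math.isclose(x**2 - y + z**2, 0.0, abs_tol=1e-12)
print("level curves y = x^2 + c^2 confirmed at the sample points")
```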

Question 4.2.11
How Does this Apply to Functions of More Variables?

We can define functions of three variables as well, denoted f(x, y, z). For even more variables, we use $x_1$ through $x_n$. The definitions of this section can be extrapolated as follows.

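| Variables  | 2                               | 3                                 | n                                                         |
|------------|---------------------------------|-----------------------------------|-----------------------------------------------------------|
| Function   | $f(x, y)$                       | $f(x, y, z)$                      | $f(x_1, \ldots, x_n)$                                     |
| Domain     | subset of $\mathbb{R}^2$        | subset of $\mathbb{R}^3$          | subset of $\mathbb{R}^n$                                  |
| Graph      | $z = f(x, y)$ in $\mathbb{R}^3$ | $w = f(x, y, z)$ in $\mathbb{R}^4$ | $x_{n+1} = f(x_1, \ldots, x_n)$ in $\mathbb{R}^{n+1}$     |
| Level sets | level curve in $\mathbb{R}^2$   | level surface in $\mathbb{R}^3$   | level set in $\mathbb{R}^n$                               |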

Observation

We might hope to solve an implicit equation of n variables to obtain an explicit function of n − 1 variables. However, we can also treat it as a level set of an explicit function of n variables (whose graph lives in (n + 1)-dimensional space). For example, the sphere

$$x^2 + y^2 + z^2 = 25$$

can be solved as $f(x, y) = \pm\sqrt{25 - x^2 - y^2}$, or viewed as the level set $F(x, y, z) = 25$ of $F(x, y, z) = x^2 + y^2 + z^2$.

Both viewpoints will be useful in the future.
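Both viewpoints can be checked against each other on the sphere of radius 5: every point produced by the explicit formula lands on the level set F = 25. A minimal Python sketch with our own function names, assuming inputs with $x^2 + y^2 \le 25$.

```python
import math

def F(x, y, z):
    """Level-set viewpoint: the sphere is F(x, y, z) = 25."""
    return x**2 + y**2 + z**2

def f_plus(x, y):
    """Explicit viewpoint: z = sqrt(25 - x^2 - y^2), for x^2 + y^2 <= 25."""
    return math.sqrt(25 - x**2 - y**2)

for x, y in [(0, 0), (3, 0), (1, 2), (-2, 4)]:
    # Both signs of the explicit formula give points on the level set F = 25.
    for z in (f_plus(x, y), -f_plus(x, y)):
        assert math.isclose(F(x, y, z), 25)
print("every sampled point lies on the level set F = 25")
```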

Section 4.2
Exercises

Summary Questions

Q1 What does the height of the graph z = f (x, y) represent?

Q2 What is the distinction between a level set and a cross section?

Q3 What are level sets in R2 and R3 called?

Q4 What is the difference between an implicit equation and explicit function?


4.2.1

Q5 If f (x, y) = 13x + xy , compute f (2, −8).

Q6 If f (x, y) = cos(πxy), compute f 4, 31 .





Q7 Is $f(x, y) = \pm\sqrt{4x - y}$ a function? Explain.

Q8 Is the following a function? Explain.

$$f(x, y) = \begin{cases} \sqrt{y} & \text{if } y \ge 0 \\ \sqrt{x} & \text{if } x \ge 0 \end{cases}$$

4.2.2

Q9 Compute the domain of $f(x, y) = \frac{1}{x + y}$.

Q10 What is the domain of $f(x, y) = \frac{1}{x^2 + y^2}$?

Q11 What is the domain of $g(x, y) = \sqrt{x^3 + y^2 - 25}$?

Q12 What is the domain of $g(x, y) = 15 + \ln(y - 2x)$?

Q13 What is the domain of $f(x, y) = \frac{x + 3}{y^2 - x}$?

Q14 Compute the domain of $h(x, y) = \frac{4x}{y - \ln x}$.

4.2.3

Q15 On the temperature map, we saw T (−84.38, 33.75) = 59. Is T (−84.38, 35.75) greater than or
less than 59?

Q16 On the temperature map, we saw T (−83.74, 42.28) = 41. Is T (−93.74, 42.28) greater than or
less than 41?

Q17 What range of temperatures are found in South Dakota? In which parts of the state are the
extreme temperatures found?

Q18 Can you use this diagram to approximate T (−61.06, 42.36)? Explain.

4.2.4

Q19 In our image of Mona Lisa, what is the domain of B?

Q20 In our blow-up of the digital image, we see Mona Lisa’s eye is near the coordinate (369, 800). Where is her other eye?

4.2.5

Q21 Can the points (1, 3, 5) and (1, 3, 7) both be on the graph of z = f (x, y)? Explain.

Q22 If the graph z = f (x, y) is below the xy-plane, what does that tell us about f (x, y)?

Q23 If f (x, y) has a z-intercept of c, what does that tell us about f ?

Q24 What is the significance of the points where the graph z = f (x, y) intersects the xy-plane?


4.2.6

Q25 Describe the level curves of $f(x, y) = (x - 2)^2 + (y + 1)^2$.

Q26 Describe the level curves of $f(x, y) = x^2 - 3y + 5$.

Q27 Describe the level curves of $f(x, y) = \frac{x^2}{y}$.

Q28 Describe the level curves of $g(x, y) = \frac{y}{e^x}$.

Q29 Give the equation of the level curve of $f(x, y) = x^3 + y^3$ that passes through (4, 2).

Q30 Give the equation of the level curve of $g(x, y) = 17x^2 - 3xy + y^3$ that contains the point (1, 2).

Q31 Given a function f (x, y), how many level curves might pass through (3, 7)?

Q32 If the points $(x_1, y_1)$ and $(x_2, y_2)$ lie on the same level curve of h(x, y), what are the possible values of the expression $h(x_1, y_1) - h(x_2, y_2)$?

4.2.7

Q33 In our level curves on the temperature map, what physical meaning can we take from the fact
that the green-yellow and red-orange level curves are closer together in Kansas than they are
farther east?

Q34 Explain why it makes sense physically that level curves of a temperature function would be
complicated and disconnected.

4.2.8

Q35 In the topographical map, what can we deduce from the fact that no level curves cross the farm
fields in the lower center of the map?

Q36 Explain why it makes physical sense that there are level curves alongside the creeks in this map.

4.2.9

Q37 Give an equation for the y = 2 cross-section of the graph z = f(x, y) where $f(x, y) = x^3 + y^3$.

Q38 Consider the plane P whose equation is f (x, y) = 3x − 5y + 7.

i. Give the equation of the y = 0 cross section of P . What is this graph? What is the
significance of the various parts of its equation?
ii. Give the equation of the x = 0 cross section of P . What is the significance of the various
parts of its equation?
iii. Give the equation and describe the set of all level curves of f .

Q39 If the cross sections of z = f (x, y) in the planes y = b are identical for all values of b, what does
that tell us about f ?

Q40 If f(x, y) is a function that satisfies f(x, y) = f(x, −y) for all x and y, how will this be reflected in the cross sections of z = f(x, y)?

4.2.10

Q41 Rewrite $y = x^2 + z^2$ as one or more explicit functions z = f(x, y).

Q42 Rewrite ln x + ln y + ln z = 0 as one or more explicit functions z = f(x, y).

Q43 Rewrite $x^2 + y^2 + z^2 + xyz = 20$ as one or more explicit functions z = f(x, y).

Q44 Explain why it would be difficult to write $z - \frac{\ln y}{xz} = 5 + \sqrt{x}$ as an explicit function of the form z = f(x, y). Choose a better dependent variable and write that variable as a function of the other two.


4.2.11

Q45 Consider the function $f(x_1, x_2, x_3, x_4, x_5)$.

a What space does the graph of f lie in?

b What space does a level set of f lie in?

Q46 Write xyz = 1 as

a A level set of a function

b An explicit function z = f (x, y)

Q47 Consider a one-variable function f (x).

a What space does the graph of f (x) lie in?

b Where does a level set of f lie? What does a typical level set look like?

Q48 Show how the graph of an explicit function $x_{n+1} = f(x_1, x_2, \ldots, x_n)$ can be converted to the level set of an (n + 1)-variable function.

Synthesis & Extension

Q49 Let $f(x, y) = x^2$. Sketch the graph of z = f(x, y). What is the role of y in this graph?

Q50 Consider the implicit equation zx = y

a Rewrite this equation as an explicit function z = f (x, y).

b What is the domain of f ?

c Solve for and sketch a few level sets of f .

d What do the level sets tell you about the graph z = f (x, y)?

Q51 Consider the implicit equation: x = sin z.

a Sketch a graph of the equation.

b Describe (in words) what the cross section of the graph in the $x = \frac{1}{2}$ plane looks like.

Section 4.3

Limits and Continuity


Goals:

1 Understand the definition of a limit of a multivariable function.


2 Use the Squeeze Theorem.

3 Apply the definition of continuity.

Limits of multivariable functions are conceptually similar to limits of one-variable functions. However, even though the requirement is the same, it is much harder to satisfy. Since there are so many more ways to approach a given point in a higher-dimensional space, there are more nearby points to check to see whether the function is actually approaching the proposed limit.

Question 4.3.1
What Is the Limit of a Function?

Definition

We write

$$\lim_{(x,y)\to(a,b)} f(x, y) = L$$

if we can make the values of f stay arbitrarily close to L by restricting to a sufficiently small neighborhood of (a, b).

Proving a limit exists requires a formula or rule. For any amount of closeness required (ϵ), you must
be able to produce a radius δ around (a, b) sufficiently small to keep |f (x, y) − L| < ϵ. For this reason,
we will not prove that any limits exist. We will present three examples of functions whose limit does
not exist.

Example 4.3.2
A Limit That Does Not Exist

Show that $\lim_{(x,y)\to(0,0)} \frac{x^2 - y^2}{x^2 + y^2}$ does not exist.

Solution
Let’s define $f(x, y) = \frac{x^2 - y^2}{x^2 + y^2}$. We will approach the point (0, 0) from two different directions. If we approach along the x-axis, then the points on our path have the form (x, 0). When we plug these into the function, the value is $f(x, 0) = \frac{x^2 - 0}{x^2 + 0}$. This is equal to 1 for all values of x except 0, so as x approaches 0, the values of f are arbitrarily close (in fact exactly equal) to 1.
On the other hand, if we approach (0, 0) along the y-axis, then the points have the form (0, y). When we plug these into the function, the value is $f(0, y) = \frac{0 - y^2}{0 + y^2}$. This is equal to −1 for all values of y except 0, so as y approaches 0, the values of f are arbitrarily close (in fact exactly equal) to −1.
What does this say about the limit of f? We have $\lim_{(x,y)\to(0,0)} f(x, y) \ne 1$, because there are points on the y-axis that do not give values close to 1, and any neighborhood of (0, 0) includes some points on the y-axis. Similarly, $\lim_{(x,y)\to(0,0)} f(x, y) \ne -1$. If we tried to argue that the limit had any other value, the x-axis and y-axis would both present a problem. Thus this limit does not exist.
We can identify the problem behavior in the graph of z = f(x, y). As the graph approaches the origin, there are points of all heights between −1 and 1. Specifically we can see the line of height 1 over the x-axis and the line of height −1 over the y-axis. No amount of closeness can exclude this range of values.

Figure: A function with no limit at (0, 0)
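Evaluating f numerically along the two axes makes the disagreement concrete. Numbers like these can show that a limit fails to exist, but no finite set of paths can prove that one does exist. A minimal Python sketch:

```python
def f(x, y):
    return (x**2 - y**2) / (x**2 + y**2)

for t in (0.1, 0.001, 1e-6):
    # (t, 0) lies on the x-axis and (0, t) on the y-axis, both near (0, 0).
    print(t, f(t, 0), f(0, t))
```

Every row shows 1.0 on the x-axis and −1.0 on the y-axis, however small t is.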

We might take away the idea that checking limits of two-variable functions requires checking in both
the x-direction and the y-direction. Unfortunately, even that is not sufficient.

Example 4.3.3
Another Limit That Does Not Exist

Show that $\lim_{(x,y)\to(0,0)} \frac{xy}{x^2 + y^2}$ does not exist.


Solution
Let $f(x, y) = \frac{xy}{x^2 + y^2}$. We can check the values of this function on the x- and y-axes. Except at (0, 0), f(x, 0) = 0 and f(0, y) = 0. However, not all the points close to (0, 0) lie on an axis. Suppose we work with the points on another line: y = mx. These points have the form (x, mx). We can evaluate f on this line.

$$f(x, mx) = \frac{(x)(mx)}{x^2 + (mx)^2} = \frac{mx^2}{(m^2 + 1)x^2} = \frac{m}{m^2 + 1} \quad (x \ne 0)$$

Thus there are points arbitrarily close to (0, 0) at which f takes values as low as −0.5 (m = −1) and as high as 0.5 (m = 1). The limit does not exist.

Figure: The graph $z = \frac{xy}{x^2 + y^2}$ and the line of height $\frac{1}{2}$ over x = y.
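A numerical version of the same computation: at a point very close to the origin on each line y = mx, the value of f matches $\frac{m}{m^2 + 1}$, so different lines give different values. A minimal Python sketch; the helper `along_line` is our own.

```python
import math

def f(x, y):
    return x * y / (x**2 + y**2)

def along_line(m, x):
    """Value of f at the point (x, m x) on the line y = m x."""
    return f(x, m * x)

x = 1e-6  # a point very close to the origin on each line
for m in (-1, 0, 1, 2):
    value = along_line(m, x)
    print(m, value, math.isclose(value, m / (m**2 + 1), abs_tol=1e-12))
```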

We might take away the idea that checking limits of two-variable functions requires checking along
each line through the point in question. Unfortunately, even that is not sufficient.

Example 4.3.4
Yet Another Limit That Does Not Exist

Show that $\lim_{(x,y)\to(0,0)} \frac{xy^2}{x^2 + y^4}$ does not exist.

Solution

Let $f(x, y) = \frac{xy^2}{x^2 + y^4}$. We can check the values of this function on the x- and y-axes. Except at (0, 0), f(x, 0) = 0 and f(0, y) = 0. We can also check the values along a line of the form y = mx.

$$f(x, mx) = \frac{(x)(mx)^2}{x^2 + (mx)^4} = \frac{m^2x^3}{x^2(1 + m^4x^2)}$$

$$\lim_{x\to 0} f(x, mx) = \lim_{x\to 0} \frac{m^2x^3}{x^2(1 + m^4x^2)} = \lim_{x\to 0} \frac{m^2x}{1 + m^4x^2} = 0$$

Thus along each line, the values of f approach 0 as we approach the origin. However, we have not considered paths that are not lines. Consider the parabola $x = y^2$. Points on this parabola have the form $(y^2, y)$. We compute the values on this parabola.

$$f(y^2, y) = \frac{(y^2)(y)^2}{(y^2)^2 + y^4} = \frac{y^4}{2y^4}$$

For any point on this parabola except the origin, f has a value of $\frac{1}{2}$. Thus f takes values of $\frac{1}{2}$ and 0 in any neighborhood of (0, 0), meaning the limit does not exist.

Figure: The graph $z = \frac{xy^2}{x^2 + y^4}$, which limits to 0 along any line through the origin, but has height $\frac{1}{2}$ over the parabola $x = y^2$
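The contrast between lines and the parabola is easy to see numerically. A minimal Python sketch, using a small step t of our own choosing: along several lines the values are already tiny, while on the parabola $x = y^2$ the value stays at $\frac{1}{2}$.

```python
def f(x, y):
    return x * y**2 / (x**2 + y**4)

t = 1e-4  # a small step toward the origin
# Along the lines y = m x the values are already close to 0 ...
print([f(t, m * t) for m in (-2, 1, 3)])
# ... but at the point (t^2, t) on the parabola x = y^2 the value is 1/2.
print(f(t**2, t))
```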

We take away from these exercises that establishing the value of a multi-variable limit cannot be
reduced to computing a single-variable limit, or even a family of single-variable limits. The formal
arguments that establish multi-variable limits are more advanced and beyond the scope of this text.

Question 4.3.5
What Tools Apply to Multi-Variable Limits?

The limit laws from single-variable limits transfer comfortably to multi-variable functions.
1 Sum/Difference Rule
2 Constant Multiple Rule
3 Product/Quotient Rule

These rules allow us to compute limits of complicated functions from simpler ones. How do we come
by those simpler limits in the first place? We can apply the kind of advanced arguments we alluded to
earlier. Another tool is the squeeze theorem.

The Squeeze Theorem

If $g(x, y) \le f(x, y) \le h(x, y)$ for all (x, y) in some neighborhood of (a, b), except possibly at (a, b) itself, and

$$\lim_{(x,y)\to(a,b)} g(x, y) = \lim_{(x,y)\to(a,b)} h(x, y) = L,$$

then

$$\lim_{(x,y)\to(a,b)} f(x, y) = L.$$
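As an illustration (our own example, not from the text), here is a typical use of the Squeeze Theorem, written with the non-strict inequalities g ≤ f ≤ h, to show that $\frac{x^2 y}{x^2 + y^2}$ approaches 0 at the origin.

```latex
\text{For } (x,y) \neq (0,0):\quad 0 \le \frac{x^2}{x^2 + y^2} \le 1
\quad\Longrightarrow\quad
-|y| \;\le\; \frac{x^2 y}{x^2 + y^2} \;\le\; |y|.

\text{Since } |y| \le \sqrt{x^2 + y^2} \text{ (the distance to the origin)},\quad
\lim_{(x,y)\to(0,0)} |y| = 0 = \lim_{(x,y)\to(0,0)} \bigl(-|y|\bigr).

\text{By the Squeeze Theorem,}\quad
\lim_{(x,y)\to(0,0)} \frac{x^2 y}{x^2 + y^2} = 0.
```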

Question 4.3.6
What Is a Continuous Function?

Definition

We say f(x, y) is continuous at (a, b) if

$$\lim_{(x,y)\to(a,b)} f(x, y) = f(a, b).$$

In a rigorous development of calculus, we compute limits and use them to show that functions are
continuous. Given that evaluating limits is beyond our current means, we will reverse the process. Rather
than worrying about how to prove the following theorem, we will assume it is true and use it to evaluate
limits.

Theorem

Polynomials, roots, trig functions, exponential functions and logarithms are continuous on their
domains.

Sums, differences, products, quotients and compositions of continuous functions are continuous
on their domains.

The limit of a continuous function is equal to the value of the function. When we need to compute
a limit of these functions, we’ll just evaluate them instead. Why didn’t this work in our examples? In
each of our examples, the function was a quotient of polynomials, but (0, 0) was not in the domain.
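To illustrate, take a hypothetical function $g(x, y) = e^{xy} + \sin(x - y)$, built from pieces the theorem lists as continuous. Evaluating at (1, 2) gives the limit, and values at nearby points approached from two different directions cluster around it. A minimal Python sketch; sampling like this illustrates continuity but does not prove it.

```python
import math

def g(x, y):
    """Hypothetical example: sums and compositions of exp, sin and polynomials."""
    return math.exp(x * y) + math.sin(x - y)

a, b = 1.0, 2.0
limit = g(a, b)   # by continuity, the limit equals the value
print(limit)

for t in (1e-2, 1e-4, 1e-6):
    # Differences from the limit along two directions of approach to (a, b).
    print(t, g(a + t, b - t) - limit, g(a - 2 * t, b + t) - limit)
```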

Remark

Limits, continuity and these theorems can all be extrapolated to functions of more variables.

Section 4.3
Exercises

Summary Questions

Q1 Why is it harder to verify a limit of a multivariable function?

Q2 What do you need to check in order to determine whether a function is continuous?

Section 4.4

Partial Derivatives
Goals:

1 Calculate partial derivatives.


2 Realize when not to calculate partial derivatives.

The first task in developing calculus is to understand rates of change. In the single-variable case, we
ask how the dependent variable changes per unit of increase in the independent variable. With more
than one independent variable we must ask: what kind of increase do we mean? There is more than
one possible answer. Partial derivatives are the simplest and most intuitive rate of change.

Question 4.4.1
What Is the Rate of Change of a Multivariable Function?

Motivational Example

The force due to gravity between two objects depends on their masses and on the distance between
them. Suppose at a distance of 8,000 km the force between two particular objects is 100 newtons and
at a distance of 10,000 km, the force is 64 newtons.

How much do we expect the force between these objects to increase or decrease per kilometer of
distance?

Solution

The change in force divided by the change in distance is

$$\frac{64\,\text{N} - 100\,\text{N}}{10{,}000\,\text{km} - 8{,}000\,\text{km}} = -0.018\,\frac{\text{N}}{\text{km}}.$$

Notice that the change in force is entirely attributable to the change in distance. That is because the masses of the objects did not change. The only change in the independent variables is the 2,000 km increase in distance.
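The arithmetic can be reproduced in a few lines. A minimal Python sketch; the second part assumes Newton's inverse-square law for gravity (with masses fixed, F is proportional to $1/d^2$), which the text does not state, and checks that the two measurements are consistent with it.

```python
def average_rate(f0, f1, d0, d1):
    """Average rate of change of force (N) per unit distance (km)
    between the measurements (d0, f0) and (d1, f1)."""
    return (f1 - f0) / (d1 - d0)

rate = average_rate(100, 64, 8_000, 10_000)
print(rate)  # -0.018 newtons per kilometre

# Assumption: F(d) = k / d**2 with the masses fixed.
k = 100 * 8_000**2          # chosen so that F(8,000) = 100
print(k / 10_000**2)        # 64.0, matching the second measurement
```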

Our goals in understanding multi-variable rates of change are guided by what we accomplished with
one variable. Derivatives of a single-variable function were a way of measuring the change in a function.
Recall the following facts about $f'(x)$.

1 Average rate of change is realized as the slope of a secant line:

$$\frac{f(x) - f(x_0)}{x - x_0}$$

2 The derivative $f'(x)$ is defined as a limit of slopes:

$$f'(x) = \lim_{h\to 0} \frac{f(x + h) - f(x)}{h}$$

3 The derivative is the instantaneous rate of change of f at x.

4 The derivative $f'(x_0)$ is realized geometrically as the slope of the tangent line to y = f(x) at $x_0$.

5 The equation of that tangent line can be written in point-slope form:

$$y - y_0 = f'(x_0)(x - x_0)$$

In the physics example above, the rate of change was easier to understand because only one inde-
pendent variable is changing. That was an average rate of change, taken between two points. We now
develop a corresponding instantaneous rate of change. A partial derivative measures the rate of change
of a multivariable function as one variable changes, but the others remain constant.

Definition

The partial derivatives of a two-variable function f(x, y) are the functions

$$f_x(x, y) = \lim_{h\to 0} \frac{f(x + h, y) - f(x, y)}{h}$$

and

$$f_y(x, y) = \lim_{h\to 0} \frac{f(x, y + h) - f(x, y)}{h}.$$

We can see the idea of each partial derivative in the formula. fx compares the values of f at
(x + h, y) and (x, y). The x values change between these two points, but the y values remain constant.
The opposite is true in the formula for fy .

Notation

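The partial derivative of a function can be denoted a variety of ways. Here are some equivalent notations for the partial derivative with respect to x:

$f_x$   $\frac{\partial f}{\partial x}$   $\frac{\partial z}{\partial x}$   $\partial_x f$   $D_x f$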

Example 4.4.2
Computing a Partial Derivative

Find $\frac{\partial}{\partial y}\left(y^2 - x^2 + 3x \sin y\right)$.

Main Idea

To compute a partial derivative $f_y$, perform single-variable differentiation. Treat y as the independent variable and x as a constant.

Solution

We take an ordinary derivative, treating $y$ as the variable and $x$ as a constant. The familiar rules of derivatives apply. The sum rule means we can differentiate term-by-term.

$\frac{\partial}{\partial y} y^2 = 2y$

$\frac{\partial}{\partial y} x^2 = 0$, since the $x^2$ term is treated as a constant.

$\frac{\partial}{\partial y} 3x \sin y = 3x \cos y$, since $3x$ is treated as a constant multiple of the function $\sin y$.

Together this gives the partial derivative

$$\frac{\partial}{\partial y}\left(y^2 - x^2 + 3x \sin y\right) = 2y + 3x \cos y.$$
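As a sanity check on this computation, we can compare the formula $2y + 3x\cos y$ with a numerical estimate at a few sample points. This Python sketch uses a central difference $\frac{f(x, y+h) - f(x, y-h)}{2h}$, a slightly more accurate variant of the one-sided quotient in the definition; the sample points and step size are arbitrary illustrative choices.

```python
import math


def f(x, y):
    return y**2 - x**2 + 3 * x * math.sin(y)


def f_y_formula(x, y):
    # The partial derivative found above.
    return 2 * y + 3 * x * math.cos(y)


def f_y_numeric(x, y, h=1e-5):
    # Central difference in y; x stays fixed, just as in the hand computation.
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


for x, y in [(1.0, 0.5), (-2.0, 3.0), (0.3, -1.2)]:
    print(x, y, f_y_formula(x, y), f_y_numeric(x, y))
```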

Synthesis 4.4.3
Interpreting Derivatives from Level Sets

Below are the level curves $f(x, y) = c$ for some values of $c$. Can we tell whether $f_x(-4, 1.25)$ and $f_y(-4, 1.25)$ are positive or negative?

Figure: Some level curves of f (x, y)

Solution

As $x$ increases and $y$ remains constant, we travel to the right in the coordinate plane. Based on the labeling of the level curves, this takes $f$ from the value 40 to values between 40 and 50, meaning $f$ increases. Thus $f_x > 0$.
Similarly, as $y$ increases and $x$ remains constant, we travel upwards in the coordinate plane. This takes $f$ from the value 40 to values between 30 and 40, meaning $f$ decreases. Thus $f_y < 0$.

Question 4.4.4
What Is the Geometric Significance of a Partial Derivative?

The partial derivative $f_x(x_0, y_0)$ is realized geometrically as the slope of the line tangent to $z = f(x, y)$ at $(x_0, y_0, z_0)$ and traveling in the $x$ direction. Since $y$ is held constant, this tangent line lives in $y = y_0$, a plane perpendicular to the $y$-axis. The line is tangent to the cross section of the graph with that plane.


Figure: The tangent line to z = f (x, y) in the x direction

Example 4.4.5
Derivative Rules and Partial Derivatives

Find $f_x$ for the following functions $f(x, y)$:

a $f = \sqrt{xy}$ (on the domain $x > 0$, $y > 0$)

b $f = \dfrac{y}{x}$

c $f = \sqrt{x + y}$

d $f = \sin(xy)$

Solution

a We can rewrite this as $f(x, y) = \sqrt{x}\sqrt{y}$. In this setting, $\sqrt{y}$ is a constant multiple. Thus

$$f_x(x, y) = \frac{1}{2\sqrt{x}}\sqrt{y}.$$

b We can rewrite this as $f(x, y) = \frac{1}{x} y$. We treat $y$ as a constant multiple, so $f_x(x, y) = -\frac{1}{x^2} y$.

c We cannot rewrite this as $f(x, y) = \sqrt{x} + \sqrt{y}$, because that is not a valid algebraic manipulation. Instead we use the chain rule.

The outer function is $\sqrt{u}$. Its derivative is $\frac{1}{2\sqrt{u}}$.
The inner function is $u = x + y$. Its derivative with respect to $x$ is 1.
By the chain rule

$$\frac{\partial}{\partial x}\sqrt{x + y} = \frac{1}{2\sqrt{x + y}}(1) = \frac{1}{2\sqrt{x + y}}.$$

d We do not have an easy trig rule to break up products. We'll use the chain rule again.

The outer function is $\sin u$. Its derivative is $\cos u$.
The inner function is $u = xy$. Its derivative with respect to $x$ is $y$.
By the chain rule

$$\frac{\partial}{\partial x}\sin(xy) = \cos(xy)\, y.$$
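The four answers can be checked the same way. In this Python sketch, the sample point $(2, 3)$ (chosen inside the domain of part a) and the step size are illustrative assumptions; each function is paired with its hand-computed $f_x$, and $y$ is held fixed while a central difference is taken in $x$.

```python
import math


def d_dx(f, x, y, h=1e-5):
    """Central-difference estimate of f_x(x, y); y is held constant."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


# (function, hand-computed f_x) pairs from parts a-d
examples = {
    "a": (lambda x, y: math.sqrt(x * y), lambda x, y: math.sqrt(y) / (2 * math.sqrt(x))),
    "b": (lambda x, y: y / x, lambda x, y: -y / x**2),
    "c": (lambda x, y: math.sqrt(x + y), lambda x, y: 1 / (2 * math.sqrt(x + y))),
    "d": (lambda x, y: math.sin(x * y), lambda x, y: y * math.cos(x * y)),
}

x0, y0 = 2.0, 3.0
for name, (f, fx) in examples.items():
    print(name, fx(x0, y0), d_dx(f, x0, y0))
```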

Main Idea

Sometimes we can detach the variable held constant from the changing variable using the rules of
algebra. When we can’t, we’ll often need a differentiation rule (usually the chain rule).

Question 4.4.6
What If We Have More than Two Variables?

We can also calculate partial derivatives of functions of more variables. All variables but one are held constant.

Example

If

$$f(x, y, z) = x^2 - xy + \cos(yz) - 5z^3,$$

then

$$\frac{\partial f}{\partial y} = 0 - x - \sin(yz)\, z - 0 = -x - z \sin(yz).$$
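A numerical check works in any number of variables: move only the variable of interest and freeze the rest. This Python sketch does that for $\partial f/\partial y$ above; the sample point and step size are arbitrary illustrative choices.

```python
import math


def f(x, y, z):
    return x**2 - x * y + math.cos(y * z) - 5 * z**3


def f_y(x, y, z):
    # Result computed above.
    return -x - z * math.sin(y * z)


def f_y_numeric(x, y, z, h=1e-5):
    # Only y is perturbed; x and z are held constant.
    return (f(x, y + h, z) - f(x, y - h, z)) / (2 * h)


point = (1.0, 2.0, 0.5)
print(f_y(*point), f_y_numeric(*point))
```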

Example 4.4.7
A Function of Three Variables

For an ideal gas, we have the law $P = \frac{nRT}{V}$, where $P$ is pressure, $n$ is the number of moles of gas molecules, $R$ is the ideal gas constant, $T$ is the temperature, and $V$ is the volume.

a Calculate $\frac{\partial P}{\partial V}$.

b Calculate $\frac{\partial P}{\partial T}$.

c (Science Question) Suppose we're heating a sealed gas contained in a glass container. Does $\frac{\partial P}{\partial T}$ tell us how quickly the pressure is increasing per degree of temperature increase?

Solution

a We can write $P = nRT \cdot \frac{1}{V}$ and treat $nRT$ as a constant multiple. Then $\frac{\partial P}{\partial V} = nRT\left(-\frac{1}{V^2}\right)$.

b In this case, $nR\frac{1}{V}$ is a constant multiple, so $\frac{\partial P}{\partial T} = nR\frac{1}{V}(1)$.

c No. $\frac{\partial P}{\partial T}$ assumes $n$ and $V$ are constant, but glass expands as it heats. The volume of both the container and the gas is increasing, not constant.
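The two formulas can also be checked numerically. In this Python sketch the values $n = 1$ mol, $T = 300$ K and $V = 0.025\ \text{m}^3$ are illustrative assumptions, and $R \approx 8.314$ J/(mol·K) is the ideal gas constant in SI units. The hypothetical helper `partial` perturbs one argument at a time, mirroring the idea that the other variables are held constant.

```python
import math

R = 8.314  # ideal gas constant in J/(mol K), rounded


def pressure(n, T, V):
    return n * R * T / V


def partial(f, args, i, rel_step=1e-6):
    """Central difference of f in argument i; all other arguments are held constant."""
    h = rel_step * max(1.0, abs(args[i]))
    up, down = list(args), list(args)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)


n, T, V = 1.0, 300.0, 0.025
dP_dV = -n * R * T / V**2  # part a
dP_dT = n * R / V          # part b
print(math.isclose(dP_dV, partial(pressure, (n, T, V), 2), rel_tol=1e-6))
print(math.isclose(dP_dT, partial(pressure, (n, T, V), 1), rel_tol=1e-6))
```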

Question 4.4.8
How Do Higher Order Derivatives Work?

Taking a partial derivative of a partial derivative gives us a higher order partial derivative. We use the following notation.

Notation

$$(f_x)_x = f_{xx} = \frac{\partial^2 f}{\partial x^2}$$

We need not use the same variable each time.

Notation

$$(f_x)_y = f_{xy} = \frac{\partial}{\partial y}\frac{\partial}{\partial x} f = \frac{\partial^2 f}{\partial y\, \partial x}$$

Remark

Notice the subscript notation and the $\partial$ notation express higher order derivatives in opposite order. Subscripts are added to the right of $f$, while differential operators are applied on the left of $f$.

Example 4.4.9
A Higher Order Partial Derivative

If $f(x, y) = \sin(3x + x^2 y)$, calculate $f_{xy}$.

Solution

First we compute $f_x$. We'll need the chain rule.

The outer function is $\sin u$. Its derivative is $\cos u$.
The inner function is $u = 3x + x^2 y$. Its derivative with respect to $x$ is $3 + 2xy$.

$$f_x = \cos(3x + x^2 y)(3 + 2xy).$$

Computing $(f_x)_y$ will require the product rule. The factor $\frac{\partial}{\partial y}\cos(3x + x^2 y)$ requires the chain rule.
The outer function is $\cos u$. Its derivative is $-\sin u$.
The inner function is $u = 3x + x^2 y$. Its derivative with respect to $y$ is $x^2$.

$$\frac{\partial}{\partial y}\cos(3x + x^2 y) = -\sin(3x + x^2 y)(x^2).$$

Now we apply the product rule.

$$\begin{aligned}
\frac{\partial}{\partial y}\left[\cos(3x + x^2 y)(3 + 2xy)\right] &= \left(\frac{\partial}{\partial y}\cos(3x + x^2 y)\right)(3 + 2xy) + \cos(3x + x^2 y)\frac{\partial}{\partial y}(3 + 2xy) \\
&= -\sin(3x + x^2 y)(x^2)(3 + 2xy) + \cos(3x + x^2 y)(2x)
\end{aligned}$$
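A mixed partial derivative can be estimated numerically with the standard four-point central difference

$$f_{xy}(x, y) \approx \frac{f(x+h, y+k) - f(x+h, y-k) - f(x-h, y+k) + f(x-h, y-k)}{4hk}.$$

The Python sketch below compares this estimate with the formula just derived; the sample points and step sizes are illustrative assumptions.

```python
import math


def f(x, y):
    return math.sin(3 * x + x**2 * y)


def f_xy_formula(x, y):
    # The result of the product rule computation above.
    u = 3 * x + x**2 * y
    return -math.sin(u) * x**2 * (3 + 2 * x * y) + math.cos(u) * 2 * x


def f_xy_numeric(x, y, h=1e-4, k=1e-4):
    # Four-point central difference for the mixed partial derivative.
    return (f(x + h, y + k) - f(x + h, y - k)
            - f(x - h, y + k) + f(x - h, y - k)) / (4 * h * k)


for x, y in [(0.5, 1.0), (1.0, -0.5)]:
    print(x, y, f_xy_formula(x, y), f_xy_numeric(x, y))
```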

Question 4.4.10
Does Differentiation Order Matter?

Not when the mixed partial derivatives are continuous. Specifically, the following theorem is due to Clairaut:

Theorem

If $f$ is defined on a neighborhood of $(a, b)$ and the functions $f_{xy}$ and $f_{yx}$ are both continuous on that neighborhood, then $f_{xy}(a, b) = f_{yx}(a, b)$.

This readily generalizes to larger numbers of variables and higher order derivatives, provided the derivatives involved are continuous. For example, $f_{xyyz} = f_{zyxy}$.

Section 4.4
Exercises

Summary Questions

Q1 What is the role of each variable when we compute a partial derivative?

Q2 What does the partial derivative fy (a, b) mean geometrically?

Q3 Can you think of an example where the partial derivative does not accurately model the change
in a function?

Q4 What is Clairaut’s Theorem?

4.4.1

Q5 Give the equation of the line that lies in the plane $x = 2$ and is tangent to the graph $z = xe^{3xy} + x$ at the point $(2, 0, 4)$. You may give your equation in any notation that works in 2 dimensions.

Q6 Alexander performs an experiment with his wireless networking router. At each level of power output (in milliwatts) and distance from his computer (in meters), he measures $T(p, d)$, the maximum transfer speed of data (in megabits per second). Here is a table of his observations.

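Rows give the distance $d$, columns give the power output $p$, and entries give $T(p, d)$ in megabits per second.

d \ p   0mW  100mW  200mW  300mW  400mW  500mW  600mW  700mW
10m       6     15     40    100    300    800    800    800
20m       0      2     15     30     90    300    800    800
30m       0      0      2     10     50    100    400    800
40m       0      0      0      0      5     20     50    100
50m       0      0      0      0      2      5     20     45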

a Use this data to approximate $T_p(300, 20)$. Show what values you used. There is more than one reasonable way to do this.

b What does the derivative in part a mean in physical terms? Be precise and include units.

c Use this data to approximate $T_d(500, 30)$. Show what values you used. There is more than one reasonable way to do this.

d What appears to be true about the sign of $T_d(p, d)$? What does this mean in physical terms, and why does it make sense?

4.4.2

Q7 Let $f(x, y) = 7x^2 + 5y \cos x + e^y$. Compute $f_x(x, y)$. Explain the role of $y$ in each term where it is present.

Q8 Let $f(x, y) = \sin x \sin y$. Show how to compute $f_y(x, y)$ using the product rule, then suggest a more efficient approach.


4.4.3

Q9 In the diagram from this example, is $f_x(3, 0)$ positive or negative? Explain.

Q10 In the diagram from this example, use a point on the $c = 30$ level set to approximate $f_y(4, -1.25)$.

Q11 In the diagram from this example, use a point on the $c = 50$ level set to approximate $f_x(4, -1.25)$.

Q12 In the diagram from this example, what is $f_y(0, 0)$? Explain your reasoning.

4.4.4

Q13 Find $f_x$ and $f_y$ for the following functions $f(x, y)$:

a $f(x, y) = x^2 - y^2$

b $f(x, y) = \sqrt{\dfrac{y}{x}}$ (assume $x > 0$ and $y > 0$)

c $f(x, y) = ye^{xy}$

Q14 Find $g_x(x, y)$ and $g_y(x, y)$ for the following functions $g(x, y)$:

a $g(x, y) = e^{x^2 + y^2}$

b $g(x, y) = y \ln(y - x)$

c $g(x, y) = e^{\frac{3x^2 + 4x - 2}{y^3}}$

4.4.5

Q15 Extrapolate from the limit definition of $f_x(x, y)$ to give a limit definition of $f_x(x, y, z)$. Explain why this limit represents a change in $f$ where only $x$ is changing.

√ ∂f
Q16 Let f (x, y, z) = e3x y + 3 yz + x3 z 7 . Compute ∂z .

2 ∂g
Q17 Let g(u, v, w) = euv+w . Compute ∂v .

Q18 Let $p(r, s, t) = \dfrac{e^r + e^s + e^t}{rst}$. Compute $\dfrac{\partial p}{\partial r}$.

4.4.6

Q19 In this example, does the fact that glass expands as it is heated suggest that $\frac{\partial P}{\partial T}$ overstates or understates the actual rate of pressure increase as $T$ increases?

Q20 Suppose Jinteki Corporation makes widgets, which it sells for \$100 each. It commands a small enough portion of the market that its production level does not affect the demand (price) for its products. If $W$ is the number of widgets produced and $C$ is their operating cost, Jinteki's profit is modeled by

$$P = 100W - C.$$

Since $\frac{\partial P}{\partial W} = 100$, does this mean that increasing production can be expected to increase profit at a rate of \$100 per widget?


4.4.7

Q21 Suppose $g(s, t)$ is the partial derivative of $f(s, t)$ with respect to $t$, and $h(s, t)$ is the partial derivative of $g(s, t)$ with respect to $s$. Write $h$ in terms of $f$ using both subscript and $\partial$ notation.

Q22 Physicists note that velocity is the derivative of position with respect to time, and acceleration is the derivative of velocity with respect to time. If $s(t, f)$ is the position of a rocket with $f$ kilograms of fuel after $t$ seconds, what is the physical meaning of $\dfrac{\partial^3 s}{\partial t^2\, \partial f}$?

4.4.8

Q23 If $f(x, y) = \sin(3x + x^2 y)$, calculate $f_{yx}$. Verify that you get the same answer that we did for $f_{xy}$.

Q24 Let $f(x, y) = \ln(x^2 + y)$. Compute $f_{xy}(x, y)$.

2
Q25 Let g(x, y, z) = 2x3 z + yexy .

a Compute $\dfrac{\partial g}{\partial y}$.

b Compute $\dfrac{\partial^2 g}{\partial x^2}$.

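Q26 Compute the following partial derivatives of

$$g(x, y, z) = \frac{x^3 \sin(xz)}{y}$$

a $\dfrac{\partial g}{\partial y}$

b $\dfrac{\partial^2 g}{\partial z^2}$

c $\dfrac{\partial^2 g}{\partial z\, \partial x}$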

4.4.9

Q27 If $f(x, y, z)$ is a smooth function, which of the following are equivalent to $f_{xyyzy}$?

i. $f_{xzzyz}$
ii. $f_{zyyxy}$
iii. $f_{yyyzx}$
iv. $f_{xxxyz}$
v. $f_{xyzy}$
vi. $f_{xyz}$
vii. $f_{yxxzx}$

Q28 How many third partial derivatives does a two-variable function have? Assuming these derivatives
are continuous, which of them are equal according to Clairaut’s theorem?

Synthesis & Extension

Q29 Let $f(x, y) = \dfrac{e^{xy}}{x + y}$. Is $\dfrac{\partial f}{\partial x} = \dfrac{\partial f}{\partial y}$? If so, why? If not, how are they related?

Q30 The function $f(x, y) = e^{x+y}$ has the strange property that $f_x(x, y) = f_y(x, y)$ at every point $(x, y)$. What does this mean geometrically about the function $f$?

Q31 Do we know that $f_x(x, y)$ is in fact a function? What fact about limits is relevant to this question?

Section 4.5

Linear Approximations
Goals:

1 Calculate the equation of a tangent plane.


2 Rewrite the tangent plane formula as a linearization or differential.

3 Use linearizations to estimate values of a function.


4 Use a differential to estimate the error in a calculation.

In single-variable calculus, the tangent line was one of the great applications of the derivative. It solves a difficult geometry problem, but it also gives a method of approximating a function that is difficult to compute. The height of the tangent line is close to the height of the graph near the point of tangency. This means the value of the tangent line function approximates the value of the function close to the point of tangency. The two-variable analogue of a tangent line is a tangent plane.

Question 4.5.1
What Is a Tangent Plane?

Definition

A tangent plane at a point $P = (x_0, y_0, z_0)$ on a surface is a plane containing the tangent lines to the surface through $P$.

Figure: The tangent plane to z = f (x, y) at a point

Equation

If the graph $z = f(x, y)$ has a tangent plane at $(x_0, y_0)$, then it has the equation:

$$z - z_0 = f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0).$$

Remarks

1 This is the point-slope form of the equation of a plane. $f_x(x_0, y_0)$ and $f_y(x_0, y_0)$ are the slopes.

2 $x_0$ and $y_0$ are numbers, so $f_x(x_0, y_0)$ and $f_y(x_0, y_0)$ are numbers. The variables in this equation are $x$, $y$ and $z$.

The cross sections of the tangent plane give the equations of the tangent lines we learned in single variable calculus.

In the plane $y = y_0$: $z - z_0 = f_x(x_0, y_0)(x - x_0) + 0$

In the plane $x = x_0$: $z - z_0 = 0 + f_y(x_0, y_0)(y - y_0)$

This shows that the tangent plane does contain these two tangent lines.

Example 4.5.2
Writing the Equation of a Tangent Plane


Give an equation of the tangent plane to $f(x, y) = \sqrt{xe^y}$ at $(4, 0)$.

Solution

Writing the formula requires us to fill in 5 values.

1 $x_0 = 4$ is given.
2 $y_0 = 0$ is given.
3 $z_0$ is the height of the graph at $(4, 0)$, which is $\sqrt{4e^0} = 2$.
4 To compute $f_x(x_0, y_0)$ we compute the partial derivative function

$$f_x(x, y) = \frac{1}{2\sqrt{x}}\sqrt{e^y}.$$

Then we evaluate at $(4, 0)$.

$$f_x(4, 0) = \frac{1}{2\sqrt{4}}\sqrt{e^0} = \frac{1}{4}.$$

5 $f_y(x_0, y_0)$ is similar, though we will use the chain rule.

$$f_y(x, y) = \sqrt{x}\,\frac{1}{2\sqrt{e^y}}\,e^y$$

$$f_y(4, 0) = \sqrt{4}\,\frac{1}{2\sqrt{e^0}}\,e^0 = 1$$

We plug these values into the tangent plane formula.

$$z - 2 = \frac{1}{4}(x - 4) + 1(y - 0),$$

which simplifies to

$$z - 2 = \frac{1}{4}(x - 4) + y.$$
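The five ingredients can be checked numerically. This Python sketch estimates $z_0$, $f_x(4, 0)$ and $f_y(4, 0)$ for $f(x, y) = \sqrt{xe^y}$ with central differences and should reproduce $2$, $\frac{1}{4}$ and $1$. The helper name `plane_coefficients` and the step size are illustrative assumptions.

```python
import math


def f(x, y):
    return math.sqrt(x * math.exp(y))


def plane_coefficients(f, x0, y0, h=1e-6):
    """Return (z0, fx, fy) for the tangent plane z - z0 = fx(x - x0) + fy(y - y0)."""
    z0 = f(x0, y0)
    fx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)
    fy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)
    return z0, fx, fy


z0, fx, fy = plane_coefficients(f, 4.0, 0.0)
print(f"z - {z0:.4f} = {fx:.4f}(x - 4) + {fy:.4f}(y - 0)")
```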

Question 4.5.3
How Do We Rewrite a Tangent Plane as a Function?

Definition

If we write $z$ as a function $L(x, y)$, we obtain the linearization of $f$ at $(x_0, y_0)$.

$$L(x, y) = f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0)$$

If the graph $z = f(x, y)$ has a tangent plane, then $L(x, y)$ approximates the values of $f$ near $(x_0, y_0)$.

Notice $f(x_0, y_0)$ just calculates the value of $z_0$. This formula is equivalent to the tangent plane equation after we solve for $z$ by adding $z_0$ to both sides.

Example 4.5.4
Approximating a Function


Use a linearization to approximate the value of $\sqrt{4.02e^{0.05}}$.

Solution

We don't know $\sqrt{4.02e^{0.05}}$, but we can think of this as the value of the function $f(x, y) = \sqrt{xe^y}$. We don't know the value of this function at $(4.02, 0.05)$, but the point $(4, 0)$ is nearby, and we can evaluate it there. This is where we'll produce our linearization. We already produced the equation of the tangent plane in Example 4.5.2.

$$z - 2 = \frac{1}{4}(x - 4) + y$$

We write $z$ as the function $L(x, y)$ and solve for it:

$$L(x, y) = 2 + \frac{1}{4}(x - 4) + y$$

For points near $(4, 0)$, $L(x, y)$ is close to $f(x, y)$. This is the basis of our approximation.

$$\begin{aligned}
\sqrt{4.02e^{0.05}} = f(4.02, 0.05) &\approx L(4.02, 0.05) \\
&= 2 + \frac{1}{4}(4.02 - 4) + 0.05 \\
&= 2 + 0.005 + 0.05 \\
&= 2.055
\end{aligned}$$
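To see how good the estimate is, we can compare it with a calculator value. This Python sketch evaluates $L(4.02, 0.05)$ and $\sqrt{4.02e^{0.05}}$ directly; the function names are illustrative. The difference comes out smaller than $0.001$.

```python
import math


def f(x, y):
    return math.sqrt(x * math.exp(y))


def L(x, y):
    # Linearization of f at (4, 0) from Example 4.5.2.
    return 2 + 0.25 * (x - 4) + (y - 0)


approx = L(4.02, 0.05)
exact = f(4.02, 0.05)
print(approx, exact, exact - approx)
```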

Question 4.5.5
How Does Differential Notation Work in More Variables?

The one-variable differential is a shorthand way to express change in the linearization of a function. The differential $dx$ is an independent variable. It can take on any value. The differential $dy$ depends on both $x_0$ and $dx$.

$$dy = f'(x_0)\,dx$$

Once we've chosen $x_0$ and $dx$, $dy$ is the amount that the tangent line to $y = f(x)$ at $x_0$ rises when we increase $x$ by $dx$.

Figure: The differentials dx and dy on the tangent line to y = f (x)

The differential $dz$ measures the change in the linearization of $f(x, y)$ given particular changes in the inputs: $dx$ and $dy$. It is a useful shorthand when one is estimating the error in an initial computation.

Definition

For $z = f(x, y)$, the differential or total differential $dz$ is a function of a point $(x_0, y_0)$ and two independent variables $dx$ and $dy$.

$$dz = f_x(x_0, y_0)\,dx + f_y(x_0, y_0)\,dy = \frac{\partial z}{\partial x}\,dx + \frac{\partial z}{\partial y}\,dy$$

Remark

The differential formula is just the tangent plane formula with

$$dz = z - z_0, \qquad dx = x - x_0, \qquad dy = y - y_0.$$

An old trigonometry application is to measure the height of a pole by standing at some distance. We then measure the angle $\theta$ of incline to the top, as well as the distance $b$ to the base. The height is $h = b \tan\theta$.

a If the distance to the base is 13m and the angle of incline is $\frac{\pi}{6}$, what is the height of the pole?

b Human measurement is never perfect. If our measurement of $b$ is off by at most 0.1m and our measurement of $\theta$ is off by at most $\frac{\pi}{120}$, use a differential to approximate the maximum possible error in our $h$.

Solution

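a The height is $13 \tan\frac{\pi}{6} = \frac{13}{\sqrt{3}}$.

b To compute the differential, we need to know the partial derivatives of $h$:

$$\frac{\partial h}{\partial b} = \tan\theta \qquad \frac{\partial h}{\partial \theta} = b \sec^2\theta$$

$$\left.\frac{\partial h}{\partial b}\right|_{(13, \frac{\pi}{6})} = \frac{1}{\sqrt{3}} \qquad \left.\frac{\partial h}{\partial \theta}\right|_{(13, \frac{\pi}{6})} = 13 \cdot \frac{4}{3} = \frac{52}{3}$$

We can now compute the differential.

$$dh = \frac{\partial h}{\partial b}\,db + \frac{\partial h}{\partial \theta}\,d\theta = \frac{1}{\sqrt{3}}\,db + \frac{52}{3}\,d\theta$$

$dh$ is largest when $db = 0.1$ and $d\theta = \frac{\pi}{120}$.

$$\max dh = \frac{1}{\sqrt{3}}(0.1) + \frac{52}{3}\cdot\frac{\pi}{120} = \frac{1}{10\sqrt{3}} + \frac{13\pi}{90} \approx 0.51 \text{ m}$$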

Section 4.5
Exercises

Summary Questions

Q1 What do you need to compute in order to write the equation of a tangent plane to z = f (x, y)

at (x0 , y0 , z0 )?

Q2 For what kinds of functions are linear approximations useful?

Q3 How are the tangent plane and the linearization related?

Q4 How is the differential defined for a two variable function? What does each variable in the formula
mean?

4.5.1

Q5 Let p(x, y) = 3x + 5y − 2.

a What is the graph z = p(x, y)? What is the significance of 3, 5 and −2?

b Give the equation of the tangent plane to z = p(x, y) at (1, 4, 21)

c How is the tangent plane equation related to z = p(x, y)? Why does this make sense?

Q6 Olivia computes the tangent plane of $z = x^2 + y^2$ at $(4, 3, 25)$. Her answer is $z - 25 = 2x(x - 4) + 2y(y - 3)$.

a Is this the equation of a plane? Explain.

b What does Olivia need to do to fix her answer?

Q7 If the equation of the tangent plane of z = f (x, y) does not have a y in it, does that mean that
y is a free variable of f ? Explain.

Q8 Can our tangent plane formula ever give us a plane parallel to the xy-plane? The xz-plane? The
zy-plane? Explain.

4.5.2

Q9 Compute the equation of the tangent plane to $z = \sqrt{36 - 4x^2 - y^2}$ at $(2, 2, 4)$.

Q10 Let $g(x, y) = e^{\frac{3x^2 + 4x - 2}{y^3}}$. Write the equation of the tangent plane to $z = g(x, y)$ at $(0, 1)$.

Q11 Let $f(x, y) = \sqrt{\dfrac{y}{x}}$. Write the equation of the tangent plane to $z = f(x, y)$ at $(4, 36, 3)$.

Q12 Let $f(x, y) = \ln(x^2 + y)$. Write the equation of the tangent plane to $z = f(x, y)$ at $(e^3, 0, 6)$.

4.5.3

Q13 Write a linearization of $f(x, y) = ye^{xy}$ at $(3, 2)$.

Q14 Write a linearization of $g(x, y) = e^{x^2 + y^2}$ at $(3, -4)$.

4.5.4


Q15 Suppose you want to approximate $\sqrt{5.5e^{0.3}}$ by hand. Would using the linearization of $f(x, y) = \sqrt{xe^y}$ at $(5, 0)$ be a good strategy? Explain.

Q16 Show how to use an appropriate linearization to approximate $\dfrac{1}{5.1^2}\sin\dfrac{31\pi}{30}$.

Q17 Let $g(x, y) = \dfrac{x^2}{y}$. Suppose you don't remember how to divide decimals. Show how you can use a linearization of $g$ to approximate $\dfrac{3.97^2}{1.05}$.

Q18 Show how to use a linearization to approximate the value of $\sqrt{(4.02)^2 + \sqrt{80.93}}$ by hand.


4.5.5

Q19 Let $f(x, y) = \dfrac{y}{x^2 + y^2}$. Write the differential of $f$ at $(4, 3)$.

Q20 Let $g(p, q) = p \ln q$. Write the differential of $g$ at $(3, e^2)$.

Q21 Boris is measuring the area of a rectangular field, so he can decide how much grass seed to buy. According to his measurements, the field is 30m by 50m, giving an area of 1500m². If we accept that each of his measurements has an error no larger than 0.2m, use a differential to approximate the maximum error in his area computation.

Q22 Suppose I decide to invest \$10,000 expecting a 6% annual rate of return for 12 years, after which I'll use it to purchase a house. The formula for compound interest

$$P = P_0 e^{rt}$$

indicates that when I want to buy a house, I will have $P = 10{,}000e^{0.72}$.

I accept that my expected rate of return might have an error of up to $dr = 2\%$. Also, I may decide to buy a house up to $dt = 3$ years before or after I expected.

a Write the formula for the differential $dP$ at $(r_0, t_0) = (0.06, 12)$.

b Given my assumptions, what is the maximum estimated error $dP$ in my initial calculation?

c What is the actual maximum error in $P$?

Q23 Let $z = 2x - y^3$. At the point $(x, y) = (5, 2)$, what is the maximum value of the differential $dz$?

Q24 Let $f(x, y)$ be a function. What differential and what inputs into that differential would you use to approximate $f(5.5, 3.2) - f(4.7, 3.8)$?

Synthesis & Extension

Q25 Let $L(x, y)$ be the linearization of $f(x, y)$ at $(3, 2)$. If $f_{yy}(x, y) < 0$ for all $(x, y)$, at which points can we guarantee that $L(x, y)$ either under or overestimates the value of $f(x, y)$? Explain.

Q26 Let $f(x, y) = 25 - (x + 1)^2 - y(y - 3)^2$. Describe the set of points $(a, b)$ such that the tangent plane to $z = f(x, y)$ at $(a, b, f(a, b))$ passes through the origin.


Q27 Here is a table of selected values for a function $f(x, y)$. Columns give $x$ and rows give $y$.

y \ x    0   2   4   6   8  10
 0       2   5   8  10  11  11
 2       6   9  12  14  15  15
 4       9  12  15  17  18  18
 6      12  15  18  20  21  21
 8      14  17  20  22  23  23
10      17  20  23  25  23  23

a Using any reasonable approximation method, show how to produce a linearization of $f(x, y)$ at $(4, 2)$.

b Does your linearization over or underestimate $f(10, 2)$? Explain what that suggests about one or more derivatives of $f(x, y)$.

Q28 a Give an equation of the plane that passes through the points $(3, 4, 2)$, $(5, 5, 1)$ and $(6, 2, 6)$.

b Suppose there is a function $f(x, y)$ and the plane in part a is tangent to the graph $z = f(x, y)$ at $(3, 4, 2)$. What partial derivatives of $f$ can you compute exactly (be specific)? Compute them.

Chapter 5

Vectors in Calculus

This chapter introduces vectors and their applications to calculus. We will use them to compute directional derivatives, to differentiate compositions of functions, and to find minimum and maximum values of a function.

Contents
5.1 Vectors
5.2 The Dot Product
5.3 Normal Equations of Planes
5.4 The Gradient Vector
5.5 The Chain Rule
5.6 Maximum and Minimum Values
5.7 Lagrange Multipliers
Section 5.1

Vectors
Goals:

1 Distinguish vectors from scalars (real numbers) and points.


2 Add and subtract vectors, multiply by scalars.

3 Express real world vectors in terms of their components.

Calculus is the study of change. We defined the partial derivative to be the instantaneous rate of change of a multi-variable function when one variable changes but the others stay constant. If we want to describe a more complicated change, we will need new notation and vocabulary to describe it. We will need vectors.

Question 5.1.1
What is a Vector?

A vector is a way of describing a change in position in n-space. To keep things simple, we’ll start
with vectors in the plane. We need two pieces of information to identify a vector.

Definition

A vector in 2-space consists of a magnitude (length) and a direction. Two vectors with the same
magnitude and the same direction are equal.

Example

Here are four vectors in 2-space (the plane) represented by arrows. Two of these vectors are equal.

Here are some vectors:

3 miles south
The force that a magnetic field applies to a charged particle
The velocity of an airplane

Here are some non-vectors:

17
The mass of an automobile
3:15 PM
Atlanta, GA

Question 5.1.2
How Do We Denote Vectors?

When defining a new type of object, we need to agree on a notation. This allows us to communicate
clearly which vector we are referring to. One way of denoting a vector is by its endpoints.

Endpoint Notation

The vector $\vec{v}$ from point $A$ to point $B$ can be represented by the notation

$$\overrightarrow{AB}.$$

$A$ is the initial point and $B$ is the terminal point.

How does this notation interact with the idea of equal vectors?

Theorem
$\overrightarrow{AB} = \overrightarrow{CD}$ if and only if $ABDC$ is a parallelogram (perhaps a squished one).

The plane has a coordinate system. We can take advantage of this to produce a more quantitative
notation for vectors.


Coordinate Notation

We can represent a vector in the Cartesian plane by the $x$ and $y$ components of its displacement. If $A = (2, 3)$ and $B = (5, 1)$, then $\overrightarrow{AB}$ increases $x$ by $5 - 2 = 3$ and $y$ by $1 - 3 = -2$. We can represent

$$\overrightarrow{AB} = \langle 3, -2 \rangle$$
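In code, coordinate notation is just the list of coordinate differences. This Python sketch computes $\overrightarrow{AB}$ from its two endpoints (terminal minus initial) and works for points with any number of coordinates; the function name `vector_between` is hypothetical.

```python
def vector_between(A, B):
    """Components of the vector from initial point A to terminal point B."""
    if len(A) != len(B):
        raise ValueError("points must have the same number of coordinates")
    return tuple(b - a for a, b in zip(A, B))


print(vector_between((2, 3), (5, 1)))         # (3, -2)
print(vector_between((2, 4, 1), (5, -1, 3)))  # (3, -5, 2)
```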

Figure: The x and y components of a vector

We can use coordinate notation to quickly test whether two vectors are equal.

Theorem

$\vec{v} = \vec{u}$ if and only if their coordinate representations match in each component.

We can also measure slope using the coordinate notation. For the vector $\vec{v} = \langle a, b \rangle$:
$b$ represents the displacement in the $y$-direction (rise).
$a$ represents the displacement in the $x$-direction (run).
The slope of $\vec{v}$ is $\frac{\text{rise}}{\text{run}} = \frac{b}{a}$.
Vectors are not points, but their coordinate notations look awfully similar. We can connect them
more formally. Every point in a Cartesian coordinate system has a position vector, which gives the
displacement of that point from the origin. The components of the vector are the coordinates of the
point.

Figure: There is only one point equal to (−5, 1), but there are many vectors equal to ⟨−5, 1⟩.

Question 5.1.3
What Arithmetic Can We Perform with Vectors?

Unlike locations (points), displacements (vectors) can be added and multiplied by numbers. This arithmetic unlocks a variety of computations and measurements; specifically, it will allow us to do calculus. Since we have multiple ways of representing vectors, we will want to understand how to perform these operations with each of those representations.


Vector Sums

The sum of two vectors $\vec{v} + \vec{u}$ is calculated by positioning $\vec{v}$ and $\vec{u}$ head to tail. The sum is the vector from the initial point of the first to the terminal point of the second. In coordinate notation, we just add each component numerically.

$$\langle 1, 3 \rangle + \langle 3, -1 \rangle = \langle 4, 2 \rangle$$

Scalar Multiples

Given a number (called a scalar) $\lambda$ and a vector $\vec{v}$, we can produce the scalar multiple $\lambda\vec{v}$, which is the vector in the same direction as $\vec{v}$ but $|\lambda|$ times as long.

If $\lambda$ is negative, then $\lambda\vec{v}$ extends in the opposite direction. Either way, we say $\lambda\vec{v}$ is parallel to $\vec{v}$.

In coordinates, scalar multiplication is distributed to each component. For example:

$$2.5 \langle 6, 4 \rangle = \langle 15, 10 \rangle$$
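Both operations act component by component, which makes them short to write down. This Python sketch reproduces the two examples above and then combines the operations to compute $\frac{1}{2}\vec{u} + \vec{v}$ for the sample vectors $\vec{u} = \langle 4, -2 \rangle$ and $\vec{v} = \langle 1, 5 \rangle$; the function names `add` and `scale` and the sample vectors are illustrative.

```python
def add(u, v):
    """Vector sum: add matching components."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return tuple(a + b for a, b in zip(u, v))


def scale(c, v):
    """Scalar multiple: multiply every component by c."""
    return tuple(c * a for a in v)


print(add((1, 3), (3, -1)))              # (4, 2)
print(scale(2.5, (6, 4)))                # (15.0, 10.0)
print(add(scale(0.5, (4, -2)), (1, 5)))  # (3.0, 4.0)
```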

Example 5.1.4
Performing Vector Arithmetic

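Given diagrams of two vectors $\vec{u}$ and $\vec{v}$, how would we calculate $\frac{1}{2}\vec{u} + \vec{v}$?

What if we are instead given the components $\vec{u} = \langle a, b \rangle$ and $\vec{v} = \langle c, d \rangle$?

Solution

After drawing a random $\vec{u}$ and a random $\vec{v}$, we draw $\frac{1}{2}\vec{u}$ in the same direction as $\vec{u}$ but half as long. We place it head to tail with $\vec{v}$, and $\frac{1}{2}\vec{u} + \vec{v}$ completes the triangle.

In coordinates the computation is as follows.

$$\begin{aligned}
\frac{1}{2}\vec{u} + \vec{v} &= \frac{1}{2}\langle a, b \rangle + \langle c, d \rangle \\
&= \left\langle \frac{1}{2}a, \frac{1}{2}b \right\rangle + \langle c, d \rangle \\
&= \left\langle \frac{1}{2}a + c, \frac{1}{2}b + d \right\rangle
\end{aligned}$$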

Question 5.1.5
What Is Standard Basis Notation?

Vector arithmetic gives us another notation that takes advantage of our algebraic intuition. We can represent any vector in the plane as a sum of scalar multiples of the following standard basis vectors.

Standard Basis Vectors

The standard basis vectors in $\mathbb{R}^2$ are

$$\vec{i} = \langle 1, 0 \rangle \qquad \vec{j} = \langle 0, 1 \rangle$$

For example, the vector $\langle 3, -5 \rangle$ can be written as $3\vec{i} - 5\vec{j}$. You can check for yourself that the sum on the right gives the correct vector.

Question 5.1.6
How Do We Measure the Length of a Vector?

A vector consists of two pieces of information: magnitude and direction. How do we measure these? Length is the distance between the endpoints. We already have a method for measuring distance in the plane.

Definition

The length or magnitude of a vector is calculated using the distance formula and notated $|\vec{v}|$. If $\vec{v} = a\vec{i} + b\vec{j}$, then

$$|\vec{v}| = \sqrt{a^2 + b^2}$$

Example 5.1.7
The Length of a Vector

If $\vec{v} = \langle 3, -5 \rangle$, calculate $|\vec{v}|$.

Solution

$$|\vec{v}| = \sqrt{3^2 + (-5)^2} = \sqrt{34}$$

Definition

A unit vector is a vector of length 1. Given a nonzero vector $\vec{v}$, the scalar multiple

$$\frac{1}{|\vec{v}|}\vec{v}$$

is a unit vector in the same direction as $\vec{v}$.
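Length and normalization translate directly into code. This Python sketch uses the $n$-dimensional form of the distance formula, so the same functions work for vectors with any number of components; it refuses to normalize the zero vector, which has no direction. The function names `length` and `unit` are illustrative.

```python
import math


def length(v):
    """Magnitude of a vector: square root of the sum of squared components."""
    return math.sqrt(sum(a * a for a in v))


def unit(v):
    """Unit vector in the direction of v (v must be nonzero)."""
    n = length(v)
    if n == 0:
        raise ValueError("the zero vector has no direction")
    return tuple(a / n for a in v)


v = (3, -5)
print(length(v), math.sqrt(34))
print(unit(v), length(unit(v)))
```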

Question 5.1.8
How Do We Measure the Direction of a Vector?

Direction cannot be described as clearly as length. How do we even measure it? A partial answer is
to measure the difference in direction between two vectors.
Angles are a good way of comparing directions. In general, two vectors will not intersect to form an
angle, so we use the following definition:

Definition

The angle between two vectors is the angle they make when they are placed so their initial points are
the same.
If they make a right angle, we call them orthogonal. If they make an angle of 0 or π, they are
parallel.

Question 5.1.9
How Do We Denote Vectors in Higher Dimensions?

Higher dimensional vectors represent displacements in higher dimensional spaces. We can call a vector in $n$-space an $n$-vector. We can still denote an $n$-vector by its endpoints. We can also denote it in coordinate notation, but we need more components.

Example

If $A = (2, 4, 1)$ and $B = (5, -1, 3)$, then

$$\overrightarrow{AB} = \langle 3, -5, 2 \rangle.$$

In three-space, we add another standard basis vector $\vec{k}$.

Standard basis for 3-vectors

$$\vec{i} = \langle 1, 0, 0 \rangle \qquad \vec{j} = \langle 0, 1, 0 \rangle \qquad \vec{k} = \langle 0, 0, 1 \rangle$$

Example

$$\langle 3, -5, 2 \rangle = 3\vec{i} - 5\vec{j} + 2\vec{k}$$

Higher dimensions still have a standard basis, but at this point the naming conventions are less standard. $\{\vec{e}_1, \vec{e}_2, \vec{e}_3, \ldots, \vec{e}_n\}$ is common for $n$-vectors.

Length of a Vector

The length of an $n$-vector derives from the distance formula in $n$-space.

$$|\langle a_1, a_2, a_3, \ldots, a_n \rangle| = \sqrt{a_1^2 + a_2^2 + a_3^2 + \cdots + a_n^2}$$

We might be concerned that direction becomes an even more difficult concept to work with as the dimension increases. However, angles are a valid way of comparing directions in any dimension (though they may be more difficult to compute).

Angles Between Vectors

Any two vectors with the same initial point lie in a plane. Their angle is a two-dimensional measurement. However, there is no consistent way to define clockwise in 3 or more dimensions, so angles between vectors are not signed: the angle between two vectors is never negative, nor more than $\pi$.

Figure: Two 3-vectors with a common initial point, the plane that contains them, and the angle
between them

Section 5.1
Exercises

Summary Questions

Q1 How is a vector similar to a point? To a number?

Q2 How is a vector different from a point? From a number?

Q3 How can you tell if two vectors point in the same direction? Opposite directions?

Q4 If $\vec{u}$ and $\vec{v}$ are position vectors of the points $P$ and $Q$, how are $\vec{u}$ and $\vec{v}$ related to $\overrightarrow{PQ}$?


5.1.1

Q5 Which of the following are vectors?

i. The reading on a speedometer.


ii. The intersection of two lines.
iii. Five miles toward Atlanta.
iv. The length of a string.
v. The velocity of a projectile.

Q6 Which of the following are vectors?

i. The displacement of a key on a keyboard, when pressed.


ii. The speed of light.
iii. The center of the earth.
iv. The force applied by a rocket engine.
v. The mass of five hippopotamuses.
Q7 If $\overrightarrow{AB} = \overrightarrow{AC}$, what does that tell us about the points $B$ and $C$? Explain.

Q8 If $\overrightarrow{AB} = \overrightarrow{BA}$, what does that tell us about the points $A$ and $B$? Explain.

5.1.2

Q9 If $A = (8, 7, 11)$ and $B = (2, 3, 15)$, write the vector $\overrightarrow{AB}$

a in terms of its components

b in standard basis notation

Q10 If $P = (-2, 3, 5)$ and $Q = (-2, 0, -4)$, write the vector $\overrightarrow{PQ}$

a in terms of its components

b in standard basis notation

Q11 What is the slope of the vector $-4\vec{i} + 10\vec{j}$?

Q12 Give three different vectors of slope $\frac{3}{7}$.

Q13 Suppose two different vectors have equal slopes. How are they related?

Q14 Given a number m, give two different vectors with slope m.

5.1.3

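Q15 Let $\vec{u}$ be a vector. How are the magnitude and direction of $\vec{u}$ and $2\vec{u}$ related?

Q16 How are the direction and magnitude of $\vec{u}$ related to the direction and magnitude of $-\vec{u}$?

Q17 Given diagrams of two vectors $\vec{u}$ and $\vec{v}$, how would we draw $\vec{u} - \vec{v}$? What is its significance?

Q18 If $\vec{u}$ is a vector and $2\vec{u} = \vec{u}$, what does that tell us about $\vec{u}$? Explain.

Q19 If $\vec{u} = \overrightarrow{AB}$, $\vec{v} = \overrightarrow{AC}$, and $\frac{1}{2}\vec{u} + \frac{1}{2}\vec{v} = \overrightarrow{AD}$, where is $D$?

Q20 If $\vec{u} = \overrightarrow{AB}$, $\vec{v} = \overrightarrow{AC}$, and $\frac{1}{5}\vec{u} + \frac{4}{5}\vec{v} = \overrightarrow{AD}$, where is $D$?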

5.1.4

Q21 Let $\vec{u} = 4\vec{i} + 3\vec{j}$ and $\vec{v} = 5\vec{i} - 2\vec{j}$. Compute $\vec{u} + \vec{v}$.

Q22 Let $\vec{w} = \langle 5, -1 \rangle$ and $\vec{v} = \langle 12, 10 \rangle$. Compute $\vec{w} - \vec{v}$.

Q23 For Lindsey to get from her house to Sam’s house, she travels 5mi north and 3mi west. To
get to Russel’s house, she travels 2mi due south. What displacement would get her from Sam’s
house to Russel’s house?

Q24 One can get from Atlanta to Decatur by travelling 8km east and 2km north. To get from
Decatur to Covington, one can travel 43km east and 20km south. Describe how to get directly
from Atlanta to Covington.

Q25 Using the diagram below, describe each vector in terms of ⃗u and ⃗v using vector addition and
scalar multiplication. Use the fact that ACDB and ACBE are parallelograms.

a $\overrightarrow{EB}$

b $\overrightarrow{CG}$

c $\overrightarrow{BC}$

d $\overrightarrow{AF}$

e $\overrightarrow{GB}$

Q26 Using the diagram below, describe each vector in terms of ⃗u and ⃗v using vector addition and
scalar multiplication. Use the fact that ACBD is a parallelogram, and the marked segments are
congruent.

a $\overrightarrow{BD}$

b $\overrightarrow{EA}$

c $\overrightarrow{DC}$

d $\overrightarrow{BG}$

e $\overrightarrow{AG}$

f $\overrightarrow{CF}$

5.1.5

Q27 Write ⟨5, 2⟩ in standard basis notation.

Q28 For any numbers a and b, use the definition of ⃗i and ⃗j to show that a⃗i + b⃗j = ⟨a, b⟩.

5.1.6

Q29 Compute the length of ⃗u = ⟨−5, 12⟩.

Q30 Given a nonzero vector ⃗u, how many vectors of length 5 are parallel to ⃗u? Explain.

Q31 Find a unit vector in the direction of 3⃗i − ⃗j.

Q32 Find a unit vector in the direction of ⟨12, −16⟩.

5.1.7

Q33 If $\vec u$ and $\vec v$ are vectors in $\mathbb{R}^2$ whose components are all positive, what is the largest possible angle between $\vec u$ and $\vec v$?

Q34 Explain the difference between the terms “perpendicular” and “orthogonal.”

Q35 Suppose two vectors do not have the same initial point, but when we represent them by arrows,
the arrows happen to cross. Is the angle made in the crossing equal to the angle between the
vectors (as we defined it)?

Q36 Describe all the vectors that make an angle of $\frac{\pi}{4}$ with $\vec v = -\vec j$.

5.1.8

Q37 If ⃗u = ⟨2, 0, 3⟩ and ⃗v = ⟨5, 6, 0⟩, compute 3⃗u − 4⃗v .

Q38 If $\vec a = 10\vec i - 25\vec k$ and $\vec b = 8\vec i - 4\vec j + 10\vec k$, compute $\frac{3}{5}\vec a + \frac{1}{2}\vec b$.

Q39 Compute the magnitude of ⃗v = 2⃗i − 7⃗j + 6⃗k.

Q40 Compute two unit vectors parallel to ⃗v = ⟨4, −4, 2⟩.

321
Section 5.1 Exercises

Q41 a How many different (nonequal) unit vectors are orthogonal to a given vector in $\mathbb{R}^2$? How
are they related to each other?

b How many different (nonequal) unit vectors are orthogonal to a given vector in $\mathbb{R}^3$? How
are they related to each other?

Q42 Let $\vec u$ and $\vec v$ be non-parallel vectors in $\mathbb{R}^3$. How many unit vectors in $\mathbb{R}^3$ are orthogonal to both $\vec u$ and $\vec v$?

Synthesis and Extension

Q43 Is the vector ⃗v = 2⃗i + 3⃗j + 8⃗k parallel to the plane p whose slope-intercept equation is z =
x + 2y − 7?

Q44 For a two-variable function $f(x, y)$, $f_x(x_0, y_0)$ is the slope of the line tangent to $z = f(x, y)$ at $(x_0, y_0, f(x_0, y_0))$ in the $x$-direction. Write a vector $\vec v$ that is parallel to this line.

Q45 If $\vec u = \overrightarrow{AB}$ and $\vec v = \overrightarrow{AC}$, show that for any scalar $t$, $t\vec u + (1 - t)\vec v = \overrightarrow{AD}$ where D is a point on
the line through B and C.

Q46 If $\vec u$, $\vec v$ and $\vec w$ are position vectors of the three vertices A, B and C of a triangle, then $\frac{1}{3}(\vec u + \vec v + \vec w)$
is the position vector of K, the center of mass of the triangle. Verify this by showing that K lies
on the line between A and the midpoint of the side BC.

Q47 Suppose we become interested in studying vectors of infinite dimension (yes, this is something mathematicians actually do).

a Explain what trouble we might run into computing the length of the vector $\langle 1, 1, 1, 1, 1, \ldots\rangle$.

b What would the length of the vector $\langle 1, \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \ldots\rangle$ be?

Section 5.2

The Dot Product


Goals:

1 Calculate the dot product of two vectors.


2 Determine the geometric relationship between two vectors based on their dot product.

3 Calculate vector and scalar projections of one vector onto another.

The arithmetic of vectors appears to have room for expansion. While we can add and subtract
vectors, we only defined how to multiply them by scalars, not by other vectors. There are in fact
products of two vectors. The simplest and most useful is the dot product. The dot product takes two
n-vectors and outputs a single number. Despite this apparent loss of information, the dot product is
the key tool in computing the angle between vectors, the work done by a force, or the illumination in a
digital scene.

Question 5.2.1
What Is the Dot Product?

Definition

The dot product of two vectors is a number.


For two dimensional vectors ⃗v = ⟨v1 , v2 ⟩ and ⃗u = ⟨u1 , u2 ⟩ we define

⃗v · ⃗u = v1 u1 + v2 u2
For three dimensional vectors ⃗v = ⟨v1 , v2 , v3 ⟩ and ⃗u = ⟨u1 , u2 , u3 ⟩ we define

⃗v · ⃗u = v1 u1 + v2 u2 + v3 u3
This pattern can be extended to any dimension.

Example 5.2.2
Computing a Dot Product

a Calculate ⟨2, 3, −1⟩ · ⟨4, 1, 5⟩

b Calculate (−2⃗i + 4⃗k) · (⃗i + 2⃗j − ⃗k)


Solution

a ⟨2, 3, −1⟩ · ⟨4, 1, 5⟩ = (2)(4) + (3)(1) + (−1)(5) = 6

b (−2⃗i + 4⃗k) · (⃗i + 2⃗j − ⃗k) = (−2)(1) + (0)(2) + (4)(−1) = −6
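The definition translates directly into a few lines of code. Here is a minimal Python sketch, assuming vectors are stored as plain lists of components (the name `dot` is our own); it reproduces both answers of Example 5.2.2, writing the standard basis vectors out as components.

```python
def dot(u, v):
    """Dot product of two vectors given as equal-length sequences of numbers."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    # Multiply matching components and add the products.
    return sum(ui * vi for ui, vi in zip(u, v))

# Example 5.2.2 a
print(dot([2, 3, -1], [4, 1, 5]))   # 6
# Example 5.2.2 b: -2i + 4k = <-2, 0, 4> and i + 2j - k = <1, 2, -1>
print(dot([-2, 0, 4], [1, 2, -1]))  # -6
```

Because the function only pairs up components, the same code works for vectors of any dimension, matching the remark that the pattern extends to any dimension.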

Question 5.2.3
What Are the Algebraic Properties of the Dot Product?

Theorem

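The following algebraic properties hold for any vectors $\vec u$, $\vec v$ and $\vec w$ and scalars $m$ and $n$.

Commutative $\vec u\cdot\vec v = \vec v\cdot\vec u$

Distributive $\vec u\cdot(\vec v + \vec w) = \vec u\cdot\vec v + \vec u\cdot\vec w$

Associative $(m\vec u)\cdot(n\vec v) = mn(\vec u\cdot\vec v)$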

Question 5.2.4
What Is the Geometric Significance of the Dot Product?

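$\vec u\cdot\vec v$ encodes key information about the magnitudes and directions of $\vec u$ and $\vec v$. This geometric
relationship can be derived from the algebraic properties we’ve established. We begin with the idea that
$\vec u\cdot\vec u = |\vec u|^2$, which follows from the definition: $\vec u\cdot\vec u = u_1^2 + u_2^2 + \cdots + u_n^2$, the square of the length of $\vec u$. This doesn’t tell us the value of every dot product, but we can extend the reasoning to
any pair of parallel vectors.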

Theorem

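If $\vec u$ and $\vec v$ are parallel then

$$\vec u\cdot\vec v = \begin{cases} |\vec u||\vec v| & \text{if } \vec u \text{ and } \vec v \text{ have the same direction}\\ -|\vec u||\vec v| & \text{if } \vec u \text{ and } \vec v \text{ have opposite directions}\end{cases}$$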

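Since $\vec u$ and $\vec v$ are parallel, we can write $\vec v = m\vec u$ for some scalar $m$, so $\vec v$ is $|m|$ times as long as $\vec u$. Lengths are never negative, so if $m > 0$ (same direction) then $|\vec v| = m|\vec u|$, but if $m < 0$ (opposite directions) then $|\vec v| = -m|\vec u|$.

$$\begin{aligned}
\vec u\cdot\vec v &= \vec u\cdot(m\vec u)\\
&= m\,\vec u\cdot\vec u\\
&= m|\vec u|^2\\
&= |\vec u|\,m|\vec u|\\
&= \begin{cases} |\vec u||\vec v| & \text{if } \vec u \text{ and } \vec v \text{ have the same direction}\\ -|\vec u||\vec v| & \text{if } \vec u \text{ and } \vec v \text{ have opposite directions}\end{cases}
\end{aligned}$$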

We can establish the dot product in another special case: when the vectors are orthogonal.

Theorem

If ⃗u and ⃗v are orthogonal then


⃗u · ⃗v = 0.

In this case, we place ⃗u and ⃗v head to tail and draw ⃗u + ⃗v . Since ⃗u and ⃗v make a right angle, these
three vectors make a right triangle. The Pythagorean theorem applies to the lengths of the vectors.

Figure: Orthogonal vectors and their sum making a right triangle

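$$\begin{aligned}
|\vec u + \vec v|^2 &= |\vec u|^2 + |\vec v|^2 && \text{(Pythagorean theorem)}\\
(\vec u + \vec v)\cdot(\vec u + \vec v) &= \vec u\cdot\vec u + \vec v\cdot\vec v && (|\vec w|^2 = \vec w\cdot\vec w)\\
\vec u\cdot\vec u + \vec u\cdot\vec v + \vec v\cdot\vec u + \vec v\cdot\vec v &= \vec u\cdot\vec u + \vec v\cdot\vec v && \text{(distributive property)}\\
\vec u\cdot\vec v + \vec v\cdot\vec u &= 0\\
2\,\vec u\cdot\vec v &= 0 && \text{(commutative property)}\\
\vec u\cdot\vec v &= 0
\end{aligned}$$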


Two vectors need not be parallel or orthogonal, but given vectors ⃗u and ⃗v we can always write
⃗v = ⃗vproj + ⃗vorth . We choose ⃗vproj to be parallel to ⃗u and ⃗vorth to be orthogonal to ⃗u.

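The properties of the dot product tell us that

$$\begin{aligned}
\vec u\cdot\vec v &= \vec u\cdot(\vec v_{proj} + \vec v_{orth})\\
&= \vec u\cdot\vec v_{proj} + \vec u\cdot\vec v_{orth}\\
&= \pm|\vec u||\vec v_{proj}| + 0
\end{aligned}$$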

Definition
The number $\dfrac{\vec u\cdot\vec v}{|\vec u|}$ is called the scalar projection of $\vec v$ onto $\vec u$.

The scalar projection is equal to the length of ⃗vproj if ⃗vproj is in the same direction as ⃗u. Otherwise,
it is the negative of the length.
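A short Python sketch of the scalar projection, together with a split of ⃗v into ⃗vproj and ⃗vorth as described above. The formula used for ⃗vproj, $\frac{\vec u\cdot\vec v}{|\vec u|^2}\vec u$, is a standard result that we assume here rather than one derived in this section; the sample vectors are arbitrary.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    # |u| = sqrt(u . u)
    return math.sqrt(dot(u, u))

def scalar_projection(v, u):
    """Scalar projection of v onto u: (u . v) / |u|."""
    return dot(u, v) / norm(u)

def decompose(v, u):
    """Split v into v_proj (parallel to u) and v_orth (orthogonal to u).
    Assumes the standard formula v_proj = ((u . v) / |u|^2) u."""
    k = dot(u, v) / dot(u, u)
    v_proj = [k * a for a in u]
    v_orth = [b - p for b, p in zip(v, v_proj)]
    return v_proj, v_orth

u = [3, 4]
v = [2, -1]
print(scalar_projection(v, u))        # u . v = 2 and |u| = 5, so 0.4
v_proj, v_orth = decompose(v, u)
print(dot(u, v_orth))                 # about 0: v_orth is orthogonal to u
print(scalar_projection([-3, 0], u))  # negative: this v_proj points opposite to u
```

The last line shows the sign convention: the scalar projection is negative exactly when ⃗vproj points opposite to ⃗u.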

Theorem

Let ⃗u and ⃗v have the same initial point and meet at angle θ. The following formula holds in any
dimension:

⃗u · ⃗v = |⃗u||⃗v | cos θ

Recall that cos θ is

positive when θ < π/2

negative when θ > π/2


zero when θ = π/2.

So the sign of $\vec u\cdot\vec v$ tells us whether $\theta$ is acute, obtuse or right.

Example 5.2.5
Using the Cosine Formula

What is the angle between ⟨1, 0, 1⟩ and ⟨1, 1, 0⟩?

Solution

We’ll apply the cosine formula, compute all of the components besides θ and solve.

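$$\begin{aligned}
\langle 1, 0, 1\rangle\cdot\langle 1, 1, 0\rangle &= |\langle 1, 0, 1\rangle|\,|\langle 1, 1, 0\rangle|\cos\theta\\
(1)(1) + (0)(1) + (1)(0) &= \sqrt{1^2 + 0^2 + 1^2}\,\sqrt{1^2 + 1^2 + 0^2}\cos\theta\\
1 &= \sqrt{2}\,\sqrt{2}\cos\theta\\
\frac{1}{2} &= \cos\theta\\
\cos^{-1}\left(\frac{1}{2}\right) &= \theta\\
\frac{\pi}{3} &= \theta
\end{aligned}$$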

We can verify this by noting that these vectors are diagonals of two faces of a unit cube. Connecting their
tips with a third face diagonal makes an equilateral triangle, since all three diagonals have length $\sqrt{2}$. We may recall that an equilateral triangle has
angles of $\frac{\pi}{3}$.

Figure: Two vectors in a unit cube
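The same computation can be done numerically by solving the cosine formula for θ. A minimal Python sketch (the function name is our own); the clamp to [−1, 1] is a common safeguard we add so that rounding error cannot push the cosine slightly outside the domain of `math.acos`.

```python
import math

def angle_between(u, v):
    """Angle in radians between nonzero vectors u and v, from u . v = |u||v| cos(theta)."""
    d = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    c = d / (nu * nv)
    # Clamp so floating-point error cannot produce a cosine just above 1 or below -1.
    c = max(-1.0, min(1.0, c))
    return math.acos(c)

theta = angle_between([1, 0, 1], [1, 1, 0])
print(theta, math.pi / 3)   # both approximately 1.0472
```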

Application 5.2.6
Work

In physics, we say a force does work on an object if it moves the object in the direction of the force.
Given a force $F$ and a displacement $s$ in the same direction, the formula for work is:

W = Fs


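In higher dimensions, displacement and force are vectors. If the force and the displacement are not
in the same direction, then only $\vec F_{proj}$, the component of $\vec F$ parallel to $\vec s$, contributes to work. The remaining component of $\vec F$ is orthogonal to $\vec s$, so its dot product with $\vec s$ is zero.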

W = F⃗proj · ⃗s = F⃗ · ⃗s
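A small Python check of the work formula with made-up numbers (the force and displacement below are hypothetical, not from the text). It computes $\vec F\cdot\vec s$ directly and also through $\vec F_{proj}$, assuming the standard projection formula $\vec F_{proj} = \frac{\vec F\cdot\vec s}{|\vec s|^2}\vec s$; both give the same work.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def work(force, displacement):
    """W = F . s for a constant force and a straight-line displacement."""
    return dot(force, displacement)

def parallel_part(force, displacement):
    """F_proj: the component of F parallel to s, ((F . s) / |s|^2) s."""
    k = dot(force, displacement) / dot(displacement, displacement)
    return [k * x for x in displacement]

F = [3, 4]    # hypothetical force
s = [10, 0]   # hypothetical displacement
print(work(F, s))                    # 30
print(dot(parallel_part(F, s), s))   # about 30: only F_proj contributes
```

Only the horizontal part of the force (3 of its 4 vertical-plus-horizontal components' worth) does any work here, since the displacement is purely horizontal.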

Section 5.2
Exercises

Summary Questions

Q1 What algebraic properties does a dot product share with real number multiplication?

Q2 What is the significance of the dot product of two parallel vectors?

Q3 How is the angle between two vectors related to their dot product?

Q4 What is a scalar projection, and how do you compute it?

5.2.1

Q5 What do ⃗v · ⃗i and ⃗v · ⃗j measure about ⃗v ?

Q6 Elaine computes ⃗u · ⃗v and gets ⟨15, 4⟩. How can you tell that Elaine got the wrong answer without even knowing what ⃗u and ⃗v are?

5.2.2

Q7 Compute the following dot products.

a ⟨4, 5⟩ · ⟨−1, −2⟩

b (5⃗i + 6⃗j) · (⃗i − 2⃗j)

c ⟨2, 4, −10⟩ · ⟨0, −1, −2⟩

Q8 Compute the following dot products.

a ⟨4, 5⟩ · ⟨−1, −2⟩

b (5⃗i + 6⃗j) · (⃗i − 2⃗j)

c (2⃗i − 3⃗k) · (7⃗j − ⃗k)

5.2.3

Q9 Let $\vec u = \langle 2, 3\rangle$, $\vec v = \langle 4, -1\rangle$ and $\vec w = \langle -5, 2\rangle$.

a Compute $\vec u\cdot\vec u$ and $\vec u\cdot\vec v$ and $\vec u\cdot\vec w$.


b Compute ⃗v · ⃗u. How does it compare to ⃗u · ⃗v ?


c How is ⃗u · ⃗u related to |⃗u|?

d Compute 3⃗u and 3⃗v then take their dot product. How is it related to ⃗u · ⃗v ?

e Compute $\vec v + \vec w$, then compute $\vec u\cdot(\vec v + \vec w)$. How is it related to $\vec u\cdot\vec v$ and $\vec u\cdot\vec w$?

f Why do you think we call this operation a “dot product” and not a “dot sum?”

g If you wanted to prove that the relationships you noticed in b through e hold for all possible vectors,
how would you do that?

Q10 Expand the parentheses $2\vec u\cdot(3\vec v - \vec w)$.

Q11 Expand the parentheses $(\vec a - 3\vec b)\cdot(5\vec c + 2\vec d)$.

Q12 Factor ⃗a · ⃗a + 6⃗a · ⃗b + 9⃗b · ⃗b.

5.2.4

Q13 Suppose we know that ⃗u and ⃗v are parallel, that |⃗v | = 4 and that ⃗u · ⃗v = −28.

a What is the length of ⃗u?

b What can you say about the directions of ⃗u and ⃗v ?

Q14 If $|\vec u| = 12$, $|\vec v| = 9$, and $\vec u\cdot\vec v = 0$, what is the magnitude of the vector $\vec w = \vec u + \vec v$?

Q15 If |⃗u| = 5 and ⃗u · ⃗v = 15, what are the possible values of |⃗v |?

Q16 If |⃗u| = 6 and |⃗v | = 10 what are the greatest and least possible values of ⃗u · ⃗v ?

Q17 Let ⃗v = 7⃗i − 2⃗j + ⃗k, what unit vector ⃗u produces the largest possible dot product ⃗u · ⃗v ?

Q18 Argue that ⃗u · ⃗v cannot be any larger than |⃗u||⃗v |.

5.2.5

Q19 Compute the angle between ⟨6, 1, 4⟩ and ⟨7, 0, 2⟩.

Q20 Compute the angle between ⟨0, 3, −5⟩ and ⟨3, −4, 3⟩.

Q21 Let A be a vertex of a cube. Let B be a vertex closest to A and C be the vertex farthest from
A. Compute the angle between $\overrightarrow{AB}$ and $\overrightarrow{AC}$.

Q22 Let A be a vertex of a cube, and B and C be any two other points on the cube. Use a dot
product to explain why the angle between $\overrightarrow{AB}$ and $\overrightarrow{AC}$ cannot be larger than $\frac{\pi}{2}$. (Hint: put A
at (0, 0, 0).)

Synthesis and Extension

Q23 How could you use the dot product to determine whether two vectors are parallel? How does this
compare with the methods we already have?

Q24 Use dot products to find at least one vector that is orthogonal to both ⟨5, −1, 2⟩ and ⟨4, 4, 1⟩.

Q25 “Think of a vector ⃗v ” says Raphael, “tell me its dot product with the vector of my choice, and
I’ll tell you what your vector was.”

a Is there any mathematical way to make such a trick work? Explain.

b How many dot products would you need to ask for to uniquely identify an unknown vector?
What dot products would you ask for?

Section 5.3

Normal Equations of Planes


Goals:

1 Give equations of planes in both vector and normal forms.


2 Use normal vectors to measure the distance to a plane.

Question 5.3.1
What is a Normal Vector to a Plane?

In algebra, you learned the normal equation of a line: e.g. 2x + 3y − 12 = 0. Why is it called this?

Figure: A line and one of its normal vectors

The vector ⟨2, 3⟩ is a normal vector to the line, meaning it is orthogonal to any vector contained in
the line. We can extend this definition to planes in 3-space. A normal vector to a plane is orthogonal
to every vector in the plane.

Theorem

In three-dimensional space, every plane has normal vectors. They are all parallel to each other.

Figure: A plane, its normal vector $\vec n$, and a vector $\overrightarrow{PQ}$ in the plane

This gives us an avenue to test whether a point Q lies on the plane or not. Let P be a known point on the plane. If $\overrightarrow{PQ}$ is orthogonal to
$\vec n$, then Q lies on the plane. If $\overrightarrow{PQ}$ and $\vec n$ make any other angle, then Q is not on the plane.
We’d like to rewrite this relationship in terms of the coordinates of Q. If $\vec r_0$ is the position vector of
P and $\vec r$ is the position vector of Q, then $\overrightarrow{PQ} = \vec r - \vec r_0$. The dot product gives us a simple test to see
whether this vector is orthogonal to $\vec n$.

Theorem

If $\vec r_0 = \langle x_0, y_0, z_0\rangle$ describes a known point on a plane and $\vec n = \langle a, b, c\rangle$ is a normal vector, then
the normal equation of the plane, with $\vec r = \langle x, y, z\rangle$, is

$$(\vec r - \vec r_0)\cdot\vec n = 0$$
or
$$a(x - x_0) + b(y - y_0) + c(z - z_0) = 0$$

Notice that since $x_0$, $y_0$ and $z_0$ are constants, we can distribute and collect them into a single term,
$d = -ax_0 - by_0 - cz_0$.

$$ax + by + cz - ax_0 - by_0 - cz_0 = 0$$

$$ax + by + cz + d = 0$$

This reasoning works in any dimension to define a set of points whose displacement from a known
point is orthogonal to some normal vector.
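The collection of constants into $d$ can be sketched in Python for any dimension. The helper names below are hypothetical; the point (4, 0, 0) and normal ⟨6, 8, 3⟩ are just a sample input, and `evaluate` returns 0 exactly for points on the hyperplane.

```python
def plane_from_point_normal(point, normal):
    """Return (coefficients, d) for a1 x1 + ... + ak xk + d = 0, the hyperplane
    through `point` with normal vector `normal`. Here d = -(n . r0)."""
    d = -sum(n_i * p_i for n_i, p_i in zip(normal, point))
    return list(normal), d

def evaluate(coeffs, d, q):
    """Left side of the normal equation at the point q (zero on the hyperplane)."""
    return sum(c_i * q_i for c_i, q_i in zip(coeffs, q)) + d

coeffs, d = plane_from_point_normal((4, 0, 0), (6, 8, 3))
print(coeffs, d)                          # [6, 8, 3] -24, i.e. 6x + 8y + 3z - 24 = 0
print(evaluate(coeffs, d, (0, 3, 0)))     # 0: this point is on the plane
print(evaluate(coeffs, d, (1, 1, 1)))     # -7: this point is not
```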


Example

$a(x - x_0) + b(y - y_0) = 0$ defines a line.

$a(x - x_0) + b(y - y_0) + c(z - z_0) = 0$ defines a plane.

$a_1(x_1 - c_1) + a_2(x_2 - c_2) + \cdots + a_n(x_n - c_n) = 0$ defines a hyperplane.

Example 5.3.2
Computing a Normal Vector

Find the normal equation of the plane with intercepts (4, 0, 0), (0, 3, 0) and (0, 0, 8). Compute a
normal vector.

Solution

The normal equation of a plane has the form $ax + by + cz + d = 0$. Each of these points must satisfy
this equation. We will plug them in and see what they tell us about the coefficients.

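$$\begin{aligned}
a(4) + b(0) + c(0) + d &= 0 &\implies 4a + d &= 0 &\implies d &= -4a\\
a(0) + b(3) + c(0) + d &= 0 &\implies 3b + d &= 0 &\implies d &= -3b\\
a(0) + b(0) + c(8) + d &= 0 &\implies 8c + d &= 0 &\implies d &= -8c
\end{aligned}$$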

There are infinitely many solutions to this system of equations. This makes sense, because there are
infinitely many normal vectors to a plane. Different choices of d give ⃗n’s that are scalar multiples of
each other. A convenient choice for d is −24, but any nonzero value will work. d = −24 gives

6x + 8y + 3z − 24 = 0

The normal vector is ⟨6, 8, 3⟩.
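The choice $d = -24$ is the negative of the least common multiple of the intercepts 4, 3 and 8, which keeps all coefficients integers. A minimal Python sketch of that choice, assuming nonzero integer intercepts and Python 3.9 or later (for multi-argument `math.lcm`); the function name is our own.

```python
from math import lcm

def plane_from_intercepts(p, q, r):
    """Integer coefficients (a, b, c, d) of a x + b y + c z + d = 0 for the plane with
    nonzero integer intercepts (p, 0, 0), (0, q, 0), (0, 0, r).
    Uses d = -p*a = -q*b = -r*c with d = -lcm(p, q, r)."""
    m = lcm(p, q, r)       # positive when all arguments are nonzero
    d = -m
    # m is a multiple of each intercept, so these divisions are exact.
    a, b, c = m // p, m // q, m // r
    return a, b, c, d

print(plane_from_intercepts(4, 3, 8))   # (6, 8, 3, -24)
```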

Synthesis 5.3.3
Using the Normal Vector to Compute Distance

Consider the line 2x + 3y − 12 = 0.

This is the line with normal vector ⃗n = ⟨2, 3⟩ and known point P = (3, 2).

Example

Let P1 = (7, 2) and P2 = (4, 0).


1 Draw the vectors $\overrightarrow{PP_1}$ and $\overrightarrow{PP_2}$.
2 If you didn’t have a picture, how could you use the values of $\vec n\cdot\overrightarrow{PP_1}$ and $\vec n\cdot\overrightarrow{PP_2}$ to determine
which side of the line $P_1$ and $P_2$ lie on?

Solution

Since $\vec n$ is a normal vector, its angle with any vector in the line is $\frac{\pi}{2}$. The vectors from P to points on the same side of
the line as $\vec n$ make an acute angle with $\vec n$. The vectors to points on the far side make an obtuse angle. Thus
when $\vec n\cdot\overrightarrow{PP_i} < 0$, $P_i$ lies on the far side of the line from $\vec n$. When $\vec n\cdot\overrightarrow{PP_i} > 0$, $P_i$ lies on the same side
as $\vec n$.
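The sign test is easy to automate. A minimal Python sketch (the helper name `side` is our own) applied to the line 2x + 3y − 12 = 0 with n = ⟨2, 3⟩, P = (3, 2), and the points P₁ = (7, 2) and P₂ = (4, 0) from this example:

```python
def side(n, p, q):
    """Sign of n . PQ: 1 if Q is on the same side as n, -1 if on the far side,
    0 if Q lies on the line (or plane, or hyperplane)."""
    s = sum(ni * (qi - pi) for ni, pi, qi in zip(n, p, q))
    return (s > 0) - (s < 0)

n = (2, 3)   # normal vector of 2x + 3y - 12 = 0
P = (3, 2)   # known point on the line
print(side(n, P, (7, 2)))   # n . PP1 = (2)(4) + (3)(0) = 8, so 1
print(side(n, P, (4, 0)))   # n . PP2 = (2)(1) + (3)(-2) = -4, so -1
```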

We can get more detailed information than just the sign of the dot product. We can actually compute
a distance.


Theorem

Given a line, plane, or hyperplane with normal equation $L(x_1, \ldots, x_k) = 0$ and corresponding normal
vector $\vec n$, the signed distance from the hyperplane to the point $Q = (q_1, \ldots, q_k)$ is

$$\frac{L(q_1, \ldots, q_k)}{|\vec n|}.$$

Let P be a known point on the hyperplane. The scalar projection of $\overrightarrow{PQ}$ onto $\vec n$ is equal to the
signed distance from the hyperplane to Q.

Figure: The scalar projection of $\overrightarrow{PQ}$ onto the normal vector of a line

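$$\begin{aligned}
\text{Distance} &= \frac{\overrightarrow{PQ}\cdot\vec n}{|\vec n|} && \text{(formula for scalar projection)}\\
&= \frac{L(q_1, \ldots, q_k)}{|\vec n|} && \text{(normal equation of the plane)}
\end{aligned}$$

The second step holds because the normal equation is $L(x_1, \ldots, x_k) = \vec n\cdot(\vec r - \vec r_0)$, so evaluating $L$ at Q gives exactly $\vec n\cdot\overrightarrow{PQ}$.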

This formula is especially powerful because we do not need to know a point on the hyperplane. The
equations

a(x − x0 ) + b(y − y0 ) + c(z − z0 ) = 0


ax + by + cz + d = 0

are equivalent, and correspond to the same normal vector. We can use whichever one we happen to
have in our signed distance formula.

Example 5.3.4
The Distance from a Plane

Compute the geometric distance from the origin to the plane 6x + 8y + 3z − 24 = 0.

Solution

$\vec n = \langle 6, 8, 3\rangle$. The signed distance from the plane to the origin is

$$\frac{L(0, 0, 0)}{|\vec n|} = \frac{(6)(0) + (8)(0) + (3)(0) - 24}{\sqrt{36 + 64 + 9}} = -\frac{24}{\sqrt{109}}$$

Geometric distance cannot be negative, so it is $\frac{24}{\sqrt{109}}$.
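The signed distance formula needs only the coefficients of the normal equation, so a sketch in Python is short. The function name is our own; it works for lines, planes and hyperplanes alike, and reproduces this example.

```python
import math

def signed_distance(coeffs, d, q):
    """Signed distance L(q) / |n| from the hyperplane a1 x1 + ... + ak xk + d = 0
    to the point q, where n = <a1, ..., ak>."""
    L = sum(a * x for a, x in zip(coeffs, q)) + d
    return L / math.sqrt(sum(a * a for a in coeffs))

s = signed_distance([6, 8, 3], -24, [0, 0, 0])
print(s, -24 / math.sqrt(109))   # the same value, about -2.2988
print(abs(s))                    # geometric distance, 24 / sqrt(109)
```

The sign tells us which side of the plane the point is on; taking the absolute value gives the geometric distance.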

Application 5.3.5
Support Vector Machines

One type of machine learning involves training a computer to distinguish between two states. For
example, a computer might be trained to distinguish between a cancerous tumor and a benign one.
To do this the computer is given a large set of cases. Each case is measured by numerical data, such
as:

The size of the tumor

The location of the tumor


The age of the patient
Results of blood tests
The brightness of each pixel in a CT scan or MRI

Each data type is a dimension, and each case is a point in a (probably very high) dimensional space.
The computer would like a simple test to divide these cases into cancerous and benign. The test will
be which side of a hyperplane they lie on. It is unlikely that any such hyperplane exists initially, so the
computer attempts a sequence of transformations of the data until they are separated by a hyperplane
with some degree of reliability.
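Once a separating hyperplane $\vec w\cdot\vec x + b = 0$ has been found, classifying a new case is just the sign test from earlier in this section. The Python sketch below shows only that decision rule. Finding $\vec w$ and $b$ (the training a support vector machine actually performs) is not shown, and the hyperplane and points are hypothetical, for illustration only.

```python
def side_of_hyperplane(w, b, x):
    """Decision rule: which side of the hyperplane w . x + b = 0 the point x is on.
    w and b are assumed to be given; finding them (training) is not shown."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0

# Hypothetical hyperplane and points in R^3, for illustration only.
w, b = [1.0, -2.0, 0.5], -1.0
for x in ([4, 1, 2], [0, 1, 0], [2, 0.5, 0]):
    print(x, side_of_hyperplane(w, b, x))   # 1, then -1, then 0
```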


Section 5.3
Exercises

Summary Questions

Q1 What information do you need in order to write the normal equation of a plane?

Q2 How are the normal vectors of a plane related to each other?

Q3 What is the significance of the coefficients in the normal equation of a plane?

Q4 How do we compute the signed distance from a point to a plane?

5.3.1

Q5 Is ⃗v = ⟨8, −3, −10⟩ parallel to the plane 6x + 6y + 3z + 11 = 0? Explain.

Q6 Is ⃗v = 9⃗i − 15⃗j + 6⃗k normal to the plane −6x + 10y − 4z + 23 = 0? Explain.

Q7 Name a normal vector to the following planes:

i. 3x − 8y + 10z − 4 = 0
ii. z − 2 = 4(x + 7) − 5(y + 1)

Q8 Suppose that ⃗n is a normal vector to 6x − 3y + 2z − 4 = 0 that also happens to be a unit vector.
Give all possible values of ⃗n.

Q9 Write a normal equation of a plane parallel to 7x − 11y + 8z + 15 = 0 that passes through the
origin.

Q10 Write a normal equation of a plane parallel to 10x − 11y + z + 20 = 0 that passes through
(2, 3, 5).

Q11 Given that the plane ax + by + cz + d = 0 passes through the origin, what can you say about a,
b, c, and d?

Q12 Given that the plane ax + by + cz + d = 0 contains the x-axis, what can you say about a, b, c, and
d?

Q13 Are the planes 4x + 6y + 8z + 15 = 0 and 10x + 15y + 20z − 7 = 0 parallel? Explain how you
know.

Q14 Suppose we know the planes 12x + 18y + 6z − 15 = 0 and ax + by + 4z + d = 0 are parallel.
What can you say about the values of a, b and d?

Q15 The equations 3x − y + 4z + 10 = 0 and −6x + 2y − 8z + k = 0 describe the same plane. What
is the value of k?

Q16 Consider the plane with normal equation 7x + y − 2z = 5.

a Give two other normal equations of this plane.

b What are the normal vectors corresponding to the original equation and your two equations
in a?


c How are these vectors in b related to each other?

5.3.2

Q17 Give a normal equation of the plane with intercepts (10, 0, 0), (0, −5, 0) and (0, 0, 2).

Q18 Give a normal equation of the plane with intercepts (−18, 0, 0), (0, 9, 0) and (0, 0, −4).

Q19 Give a normal equation of the plane through (4, 3, 0), (5, 1, 1) and (−2, 5, 2).

Q20 Give a normal equation of the plane through (1, 1, 1), (8, 1, 4) and (0, 0, 4).

5.3.3

Q21 Katie is computing the distance from the point (6, 3) to the line 2x + 3y − 12 = 0. She notices
that (6, 0) is the x-intercept of the line. Since (6, 3) is 3 units away from (6, 0), she concludes
the distance from the point to the line is 3. What do you think of Katie’s reasoning?

Q22 Consider the line L with normal equation 2x + 3y − 12 = 0 and the point Q = (6, 3).

a What is the slope of L?

b What would be the slope of a line perpendicular to L?

c Write an equation (in any form you’d like) of a line K that passes through Q and is perpendicular to L.

d Compute the intersection point P of L and K.

e What is the distance from P to Q?

f Check that your answer to e matches the distance formula we derived. Which method do
you like better?

5.3.4

Q23 How far is (5, 2, 1) from 3x + 2y − 5z + 10 = 0?

Q24 How far is (0, 0, 1) from 3x + 12y − 4z + 20 = 0?

Q25 Are (6, 7, 1) and (5, −3, −4) on the same or different sides of 3x − 10y + 9z + 46 = 0?

Q26 The point (x, 4, 5) lies on the same side of the plane 2x + y − 2z + 10 = 0 as the origin does.
What does that tell you about the value of x?

5.3.5

Q27 We have six images of dogs and cats. We measure four things about each, and have collected
the data below. We would like to use the hyperplane 2x1 + 5x2 − 4x3 + 10x4 + k = 0 to separate
the images of dogs from the images of cats.

Type Measurements
Cat (5, 1, 3, 6)
Dog (7, 3, 7, 2)
Dog (7, 2, 6, 4)
Dog (9, 1, 8, 5)
Cat (6, 4, 5, 5)
Cat (9, 2, 7, 6)

a What values of k would cause the hyperplane to correctly separate the dog images from the
cat images?

b If you intended to use the hyperplane to guess whether a future image was a dog or cat,
what k would you choose? Why?

Q28 Suppose we have a hyperplane that we would like to separate two sets of points, but it doesn’t
quite work. We measure the error of this separation by taking the sum of the geometric distances
from the hyperplane of each point that is on the wrong side of the hyperplane. Suppose we were
hoping that the line 2x + 3y − 12 = 0 would separate the points of type T from the points of
type S.

Type Coordinates
T (6, 2)
T (2, 1)
T (5, 3)
T (4, 4)
S (1, 5)
S (1, 1)
S (4, 0)
S (4, 2)

a Create a diagram of these points (labelled or colored by type) and the line.

b We did not specify which side of the line should be T and which should be S. Use your
diagram to decide which choice of sides will give less error.

c Compute the error in this method of separation.

d Suppose we were trying to find a better line of the form ax + by + c = 0. When a = 2, b = 3
and c = −12, would increasing a increase or decrease the error? Justify your answer with a
derivative.

Synthesis and Extension

Q29 Write the equation of a plane that contains all the points equidistant from A = (1, −2, 7) and
B = (7, 0, 5).

Q30 Two planes are perpendicular if their normal vectors are orthogonal.

a Are 4x − 7y + z − 3 = 0 and 5x + y + 13z + 25 = 0 perpendicular?

b If two planes are perpendicular, is every vector in the first plane orthogonal to every vector
in the second plane?

Q31 Write the normal equation of a plane that contains the x and z axes. Where have we seen this
plane before?

Q32 What trouble do you run into if you try to write the equation of the plane through (6, 0, 0),
(0, 8, 0) and (3, 4, 0)? Explain geometrically why this makes sense.

Section 5.4

The Gradient Vector


Goals:

1 Calculate the gradient vector of a function.


2 Relate the gradient vector to the shape of a graph and its level curves.

3 Compute directional derivatives.

Armed with ideas about vectors, we have the vocabulary to discuss more complex changes in the
variables of a function. Rather than having one variable change and the other stay constant, we can
indicate a change in both variables with a vector. When exploring these computations, we will construct
one of the most important tools for multivariable calculus.

Question 5.4.1
How Do We Compute Rates of Change in Another Direction?

The partial derivatives of f (x, y) give the instantaneous rate of change in the x and y directions.
This is realized geometrically as the slope of the tangent line. What if we want to travel in a different
direction?

Figure: The tangent line to z = f (x, y) in the x direction

Definition

Let $f(x, y)$ be a function and $\vec u$ be a unit vector in $\mathbb{R}^2$. The directional derivative, denoted $D_{\vec u} f$,
is the instantaneous rate of change of $f$ as we move in the $\vec u$ direction. This is also the slope of the
tangent line to $z = f(x, y)$ in the direction of $\vec u$.

Figure: The tangent line to f (x, y) in the direction of ⃗u

Recall that we compute $D_x f$ by comparing the value of $f$ at $(x, y)$ to the value at $(x + h, y)$, a
displacement of $h$ in the $x$-direction.

$$D_x f(x, y) = \lim_{h\to 0}\frac{f(x + h, y) - f(x, y)}{h}$$

To compute D⃗u f for ⃗u = a⃗i + b⃗j, we compare the value of f at (x, y) to the value at (x + ta, y + tb),
a displacement of t in the ⃗u-direction.

Limit Formula

$$D_{\vec u} f(x, y) = \lim_{t\to 0}\frac{f(x + ta, y + tb) - f(x, y)}{t}$$
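The limit can be approximated by plugging in small values of $t$. A minimal Python sketch, using a hypothetical function $f(x, y) = x^2 y$ at the point (1, 2) in the direction ⃗u = 0.6⃗i + 0.8⃗j. Expanding $f(1 + 0.6t, 2 + 0.8t) = 2 + 3.2t + \cdots$ shows the exact limit is 3.2, and the difference quotients approach it as $t$ shrinks.

```python
def directional_derivative_limit(f, x, y, a, b, t=1e-6):
    """Approximate D_u f(x, y), u = a i + b j a unit vector, by the difference
    quotient (f(x + t a, y + t b) - f(x, y)) / t with a small fixed t."""
    return (f(x + t * a, y + t * b) - f(x, y)) / t

def f(x, y):
    # hypothetical example function
    return x ** 2 * y

for t in (1e-2, 1e-4, 1e-6):
    print(t, directional_derivative_limit(f, 1.0, 2.0, 0.6, 0.8, t))
# The quotients approach 3.2 as t shrinks.
```

A fixed small $t$ only approximates the limit; taking $t$ far too small eventually makes floating-point rounding dominate.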

Questions:
1 What direction produces the greatest directional derivative? The smallest?
2 How are these directions related to the geometry (specifically the level curves) of the graph?

3 How are these directions related to the partial derivatives?


We can explore these questions with an applet in the Other Cross Sections activity.


Figure: A cross section of z = f (x, y) and a tangent line in the direction of ⃗u

Question 5.4.2
What Is the Gradient Vector?

The relationship between the direction of maximum increase and the partial derivatives suggests that
we could treat the partial derivatives like components of a vector.

Definition

The gradient vector of f at (x, y) is

$$\nabla f(x, y) = \langle f_x(x, y), f_y(x, y)\rangle$$

Remarks:
1 The gradient vector is a function of $(x, y)$. Different points have different gradients.
2 $\vec u_{max}$, which maximizes $D_{\vec u} f$, points in the same direction as $\nabla f$.
3 $\vec u_0$, which is tangent to the level curves, is orthogonal to $\nabla f$.

Remark

Students often wonder: what is the geometric intuition behind the gradient vector and its properties?
The answer is often disappointing, but important. The gradient vector does not have a geometric
motivation. We artificially created the gradient vector because it has convenient algebraic properties. If
that were the end of the story, we wouldn’t bother learning about it. However, the gradient turns out
to be so useful that we will study it intensely, despite its uncompelling origins.

Question 5.4.3
How Do We Compute a Directional Derivative?

There are several ways to derive a formula for the directional derivative. One approach is to apply
algebra and limit laws to the limit definition. A more geometric method is to exploit our previous work
with the tangent plane. The directional derivative is the slope of a tangent line. The tangent lines live
in the tangent plane. We can compute their slope by rise over run.
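Let $\vec u$ be a unit vector from $(x_0, y_0)$ to $(x_1, y_1)$, so $\vec u = \langle x_1 - x_0, y_1 - y_0\rangle$ and the run is $|\vec u| = 1$. Let the associated $z$ values in the tangent plane be
$z_0$ and $z_1$ respectively. The tangent plane equation gives the rise $z_1 - z_0$ in terms of the partial derivatives.

$$\begin{aligned}
D_{\vec u} f(x_0, y_0) &= \frac{\text{rise}}{\text{run}} = \frac{z_1 - z_0}{|\vec u|}\\
&= f_x(x_0, y_0)(x_1 - x_0) + f_y(x_0, y_0)(y_1 - y_0)\\
&= \nabla f(x_0, y_0)\cdot\vec u.
\end{aligned}$$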

Functions of More Variables

We can also define directional derivatives of higher variable functions with analogous results.
$f(x_1, \ldots, x_n)$ is a differentiable function.
$\vec u$ is a unit vector in $\mathbb{R}^n$.
$D_{\vec u} f$ denotes the directional derivative in the direction of $\vec u$.
$\nabla f = \langle f_{x_1}, \ldots, f_{x_n}\rangle$ is an $n$-dimensional vector function on $\mathbb{R}^n$.

$$D_{\vec u} f = \nabla f\cdot\vec u$$
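The formula $D_{\vec u} f = \nabla f\cdot\vec u$ works the same way in any number of variables. The Python sketch below approximates each partial derivative with a central difference (a numerical substitute we choose for illustration, not a method from the text) and then takes the dot product. The three-variable function is hypothetical; its exact gradient is ⟨y, x, 2z⟩, so at (1, 2, 3) the gradient is ⟨2, 1, 6⟩ and $D_{\vec u} g = \frac{17}{3}$ for ⃗u = ⟨2/3, 1/3, 2/3⟩.

```python
def numerical_gradient(f, point, h=1e-6):
    """Approximate grad f = <f_x1, ..., f_xn> at `point` with central differences."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(*forward) - f(*backward)) / (2 * h))
    return grad

def directional_derivative(f, point, u):
    """D_u f = grad f . u, where u is assumed to be a unit vector."""
    return sum(g * ui for g, ui in zip(numerical_gradient(f, point), u))

def g(x, y, z):
    # hypothetical function of three variables; its gradient is <y, x, 2z>
    return x * y + z ** 2

u = (2 / 3, 1 / 3, 2 / 3)                              # unit vector: 4/9 + 1/9 + 4/9 = 1
print(numerical_gradient(g, (1.0, 2.0, 3.0)))          # close to [2, 1, 6]
print(directional_derivative(g, (1.0, 2.0, 3.0), u))   # close to 17/3
```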

Synthesis 5.4.4
Directional Derivative and the Cosine Formula

Now that we have a formula for directional derivatives, we can verify our observations from earlier.
Suppose f (x, y) is a differentiable function and we can choose any unit vector ⃗u.

a Write D⃗u f (x, y) in terms of the length of a vector and an angle.


b In what direction ⃗u will f increase fastest?

c What will be the value of D⃗u f (x, y) in that direction?

d In what direction ⃗u will D⃗u f (x, y) = 0?

Solution

a Since the directional derivative is a dot product, we can apply our formula that relates the dot
product to the lengths of the vectors and the angle between them.

$$\begin{aligned}
D_{\vec u} f(x, y) &= \nabla f(x, y)\cdot\vec u && \text{dot product formula}\\
&= |\nabla f(x, y)||\vec u|\cos\theta && \text{cosine formula}\\
&= |\nabla f(x, y)|\cos\theta && \vec u \text{ is a unit vector}
\end{aligned}$$

b Given a particular $(x, y)$, $|\nabla f(x, y)|\cos\theta$ is largest when $\theta = 0$. This means that $D_{\vec u} f(x, y)$ is
maximized when $\vec u$ is in the direction of $\nabla f(x, y)$. The formula for a unit vector in the direction
of the gradient is

$$\vec u = \frac{1}{|\nabla f(x, y)|}\nabla f(x, y)$$

c In this direction, cos θ = 1 so D⃗u f (x, y) = |∇f (x, y)|.

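d We can solve for $\theta$.

$$\begin{aligned}
D_{\vec u} f(x, y) &= 0\\
|\nabla f(x, y)|\cos\theta &= 0 && \text{by part (a)}\\
\cos\theta &= 0 && \text{as long as } \nabla f(x, y) \neq \vec 0\\
\theta &= \frac{\pi}{2}
\end{aligned}$$

We conclude that $\vec u$ must be orthogonal to $\nabla f(x, y)$.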

Figure: The angle between the gradient of f and a unit vector

Main Ideas

The cosine formula for the dot product lets us relate the directional derivative to an angle.
f increases fastest in the direction of ∇f (x, y).

D⃗u f (x, y) = 0 when ∇f (x, y) and ⃗u are orthogonal.
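These conclusions can be checked numerically by trying many unit vectors ⃗u = ⟨cos θ, sin θ⟩. The Python sketch below uses a hypothetical gradient ⟨3, 4⟩ (so |∇f| = 5) and a grid of 3600 directions; the grid size is an arbitrary choice. The largest directional derivative is close to 5 and occurs close to the gradient's direction, and turning that direction by π/2 gives zero.

```python
import math

def directional_derivative(grad, theta):
    """D_u f for the unit vector u = <cos(theta), sin(theta)>, given grad f at a point."""
    return grad[0] * math.cos(theta) + grad[1] * math.sin(theta)

grad = (3.0, 4.0)   # hypothetical gradient at some point
thetas = [2 * math.pi * k / 3600 for k in range(3600)]
best = max(thetas, key=lambda th: directional_derivative(grad, th))

print(directional_derivative(grad, best), math.hypot(*grad))   # both close to 5
print(best, math.atan2(grad[1], grad[0]))   # best angle close to the gradient's angle
# Rotating the gradient's direction by pi/2 gives a zero directional derivative.
print(directional_derivative(grad, math.atan2(grad[1], grad[0]) + math.pi / 2))  # about 0
```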

Example 5.4.5
A Directional Derivative

Let $f(x, y) = \sqrt{9 - x^2 - y^2}$ and let $\vec u = \langle 0.6, -0.8\rangle$.

a What are the level curves of f ?

b What direction does ∇f (1, 2) point?

c Without calculating, is D⃗u f (1, 2) positive or negative?

d Calculate ∇f (1, 2) and D⃗u f (1, 2).


Solution

a The level curves have the equations $\sqrt{9 - x^2 - y^2} = c$. These solve to $x^2 + y^2 = 9 - c^2$. As
$c$ increases from 0 to 3 these are circles starting at radius 3 and shrinking to the origin. For $c$
outside this range, the level curve has no points.

b $\nabla f$ points in the direction of greatest increase and is normal to the level curves. Since higher level curves
are smaller circles, closer to the origin, $\nabla f(1, 2)$ points toward the origin.

c D⃗u f (1, 2) = ∇f (1, 2) · ⃗u. Since ⃗u appears to make an acute angle with ∇f (1, 2), we expect this
dot product to be positive.

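d First we need to compute $\nabla f(1, 2)$.

$$\begin{aligned}
\nabla f(x, y) &= \langle f_x(x, y), f_y(x, y)\rangle\\
&= \left\langle \frac{1}{2\sqrt{9 - x^2 - y^2}}(-2x),\ \frac{1}{2\sqrt{9 - x^2 - y^2}}(-2y)\right\rangle && \text{(chain rule)}\\
\nabla f(1, 2) &= \left\langle \frac{1}{2\sqrt{9 - 1^2 - 2^2}}(-2)(1),\ \frac{1}{2\sqrt{9 - 1^2 - 2^2}}(-2)(2)\right\rangle\\
&= \left\langle -\frac{1}{2}, -1\right\rangle
\end{aligned}$$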

Now we use the dot product formula to compute D⃗u f (1, 2).

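$$\begin{aligned}
D_{\vec u} f(1, 2) &= \nabla f(1, 2)\cdot\vec u\\
&= \left\langle -\frac{1}{2}, -1\right\rangle\cdot\langle 0.6, -0.8\rangle\\
&= -0.3 + 0.8\\
&= 0.5
\end{aligned}$$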
This confirms our intuition that D⃗u f (1, 2) is positive.
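A quick Python check of this example: the gradient from the chain-rule formula (simplified to $-x/\sqrt{9 - x^2 - y^2}$ and $-y/\sqrt{9 - x^2 - y^2}$), its dot product with ⃗u, and, as a cross-check, the limit definition with one small step $t$ (the step size is our own choice).

```python
import math

def f(x, y):
    return math.sqrt(9 - x ** 2 - y ** 2)

def grad_f(x, y):
    """Gradient of f from the chain-rule formula:
    (1 / (2 sqrt(9 - x^2 - y^2))) * (-2x) simplifies to -x / sqrt(9 - x^2 - y^2)."""
    r = math.sqrt(9 - x ** 2 - y ** 2)
    return (-x / r, -y / r)

u = (0.6, -0.8)
g = grad_f(1, 2)
print(g)                             # (-0.5, -1.0)
print(g[0] * u[0] + g[1] * u[1])     # 0.5, up to floating-point rounding

# Cross-check against the limit definition with a small step t.
t = 1e-6
print((f(1 + t * u[0], 2 + t * u[1]) - f(1, 2)) / t)   # also close to 0.5
```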

Example 5.4.6
Drawing the Gradient

Let h(x, y) give the altitude at longitude x and latitude y. Assuming h is differentiable, draw the
direction of ∇h(x, y) at each of the points labeled below. Which gradient is the longest?

Figure: A topographical map

Solution

The gradient vector at each point is normal to the level curves, pointing uphill. The hill is steepest at
B, because the level curves are closer together. This tells us that the partial derivatives are larger. Thus
∇h(B) is longer than ∇h(A) and ∇h(C).


Application 5.4.7
Edge Detection

Representing an image by defining a brightness (or color) function on the pixels is simple enough,
but can a computer be taught to make sense of what it sees? Image recognition is an exciting field that
promises to automate and improve tasks from medical diagnosis to driving a vehicle.
The problem is daunting. What algorithm can possibly take a set of pixels and locate a tumor or a
pedestrian? The first step is to identify the objects in the image. The first step of object identification is
edge detection, determining where one object ends and another begins. We can do this by approximating
the partial derivatives at each pixel. We compare each pixel to nearby pixels and compute rise over run
(how these are chosen and averaged can significantly affect the accuracy of the algorithm).
The length of the gradient of a brightness function detects the edges in a picture, where the brightness
is changing quickly.

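At the pixel (336, 785):

$$\frac{\partial B}{\partial x}(336, 785) \approx \frac{185 - 187}{1}, \qquad \frac{\partial B}{\partial y}(336, 785) \approx \frac{179 - 187}{1}, \qquad \nabla B(336, 785) \approx \langle -2, -8\rangle$$

At the pixel (340, 784):

$$\frac{\partial B}{\partial x}(340, 784) \approx \frac{97 - 139}{1}, \qquad \frac{\partial B}{\partial y}(340, 784) \approx \frac{72 - 139}{1}, \qquad \nabla B(340, 784) \approx \langle -42, -67\rangle$$

Figure: A long gradient vector indicates a swift change in brightness. Its direction suggests the shape
of the edges.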

Notice that the gradient is long near the edge of the iris in Mona Lisa’s eye. It is much shorter at a
point in the white of her eye. Moreover, the gradient at the edge of the iris is approximately normal to
the edge of her iris, because gradients are normal to level curves. This information can be used by an
algorithm to detect not only the location of the edges, but also their direction.
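Here is a minimal sketch of the rise-over-run idea in this application. It uses forward differences with a run of one pixel on a small hypothetical image (not the Mona Lisa data), stored as a list of rows so that B[y][x] is the brightness and y increases downward. Real edge detectors usually average over more neighbors, as noted above; the function names here are our own.

```python
import math

def forward_gradient(B, x, y):
    """Approximate <dB/dx, dB/dy> at pixel (x, y): rise over a run of one pixel."""
    dB_dx = B[y][x + 1] - B[y][x]
    dB_dy = B[y + 1][x] - B[y][x]
    return dB_dx, dB_dy

def gradient_length(g):
    return math.hypot(g[0], g[1])

# Hypothetical 4x4 image: a bright region (200) beside a dark region (60)
image = [
    [200, 200, 60, 60],
    [200, 200, 60, 60],
    [200, 200, 60, 60],
    [200, 200, 60, 60],
]

flat = forward_gradient(image, 0, 0)  # inside the bright region
edge = forward_gradient(image, 1, 0)  # straddling the edge
print(flat, gradient_length(flat))    # (0, 0) 0.0
print(edge, gradient_length(edge))    # (-140, 0) 140.0

# Lengths of the two gradients estimated in the text
print(gradient_length((-2, -8)), gradient_length((-42, -67)))
```

The gradient at the edge points horizontally, normal to the vertical edge, and its length flags the edge, while the flat region gives a zero gradient.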

Application 5.4.8
Tangent Planes to a Level Surface

Use a gradient vector to find the equation of the tangent plane to the graph $x^2 + y^2 + z^2 = 14$ at the point (2, 1, −3).
There are two solutions worth comparing here.

Solution 1

We can write z as a function of x and y and apply the tangent plane formula.

$$x^2 + y^2 + z^2 = 14$$
$$z^2 = 14 - x^2 - y^2$$
$$z = -\sqrt{14 - x^2 - y^2} \quad \text{($z = -3$ is on the negative branch of the function)}$$
$$f_x(x, y) = -\frac{1}{2\sqrt{14 - x^2 - y^2}}(-2x), \qquad f_x(2, 1) = \frac{2}{3}$$
$$f_y(x, y) = -\frac{1}{2\sqrt{14 - x^2 - y^2}}(-2y), \qquad f_y(2, 1) = \frac{1}{3}$$
Equation:
$$z + 3 = \frac{2}{3}(x - 2) + \frac{1}{3}(y - 1)$$

Solution 2

Define $F(x, y, z) = x^2 + y^2 + z^2$. The graph $x^2 + y^2 + z^2 = 14$ is a level surface of F, so ∇F(2, 1, −3) is normal to the level surface, meaning it is also a normal vector for the tangent plane.

$$\nabla F(x, y, z) = \langle 2x, 2y, 2z \rangle$$
$$\nabla F(2, 1, -3) = \langle 4, 2, -6 \rangle$$

We now have a normal vector ⃗n = ∇F (2, 1, −3). Our known point is (x0 , y0 , z0 ) = (2, 1, −3). The
normal equation of the plane is

4(x − 2) + 2(y − 1) − 6(z + 3) = 0.

Solution 2 requires more conceptual reasoning, but is computationally much easier. In fact, in
some cases we cannot use Solution 1 at all because we do not know how to solve for z. Once we are
comfortable with the concepts involved, the second method is generally superior for graphs of implicit
equations.
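Since both solutions describe one plane, their coefficients must be proportional. This sketch uses exact arithmetic from Python's fractions module: it builds the plane from ∇F as in Solution 2 and compares it with Solution 1 rearranged as (2/3)x + (1/3)y − z = 14/3. The helper names are our own.

```python
from fractions import Fraction

def grad_F(x, y, z):
    # F(x, y, z) = x^2 + y^2 + z^2, so grad F = <2x, 2y, 2z>
    return (2 * x, 2 * y, 2 * z)

def plane_through(normal, point):
    """Coefficients (a, b, c, d) of a*x + b*y + c*z = d for the plane
    through `point` with normal vector `normal`."""
    a, b, c = normal
    x0, y0, z0 = point
    return (a, b, c, a * x0 + b * y0 + c * z0)

P = (2, 1, -3)
solution2 = plane_through(grad_F(*P), P)  # 4x + 2y - 6z = 28

# Solution 1 rearranged: (2/3)x + (1/3)y - z = 14/3
solution1 = (Fraction(2, 3), Fraction(1, 3), Fraction(-1), Fraction(14, 3))

# Same plane exactly when one coefficient list is a constant multiple of the other
ratios = {s2 / s1 for s1, s2 in zip(solution1, solution2)}
print(solution2, ratios)  # (4, 2, -6, 28) {Fraction(6, 1)}
```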


Main Idea

The graph of an implicit equation can be written as a level set of a function. The gradient of that
function is a normal vector to the level set and also to its tangent line/plane/hyperplane.

Figure: The level surface $x^2 + y^2 + z^2 = 14$, its tangent plane and ∇F.

Section 5.4
Exercises

Summary Questions

Q1 What does the direction of the gradient vector tell you?

Q2 What does the directional derivative mean geometrically?

Q3 How do you compute a directional derivative?

Q4 How is the gradient vector related to a level set?

5.4.1

Q5 Suppose that f (3, 7) = 12 and f (7, 4) = 10.

a What is the distance from (3, 7) to (7, 4)?

b Approximate the rate of change of f at (3, 7) travelling toward (7, 4)

Q6 Suppose g(0, 2) = 15 and g(4, 1) = 17.

a What is the distance from (0, 2) to (4, 1)?

b Approximate the rate of change of g at (0, 2) travelling toward (4, 1).

c If you wanted to express the previous rate of change as an approximation of $D_{\vec u} g(0, 2)$, what would the unit vector $\vec u$ be?

5.4.2

Q7 If $f(x, y) = x^2 \sin(x e^y)$, what is ∇f(x, y)?

Q8 If $g(x, y) = \sqrt{6x^2 + 5y^4}$, what is ∇g(x, y)?

Q9 If ∇f (x0 , y0 ) is orthogonal to ∇g(x0 , y0 ), what can we say about the level curves of f and g?
Be specific.

Q10 Harriet says “The gradient vector of f is tangent to the graph of z = f (x, y).”

“No,” says Marcus, “it is normal to the graph of z = f (x, y).” Who is correct?


5.4.3

Q11 Consider our computation of the directional derivative as a dot product.

a Where did we use the fact that ⃗u is a unit vector?

b If ⃗u were not a unit vector, then ∇f · ⃗u would no longer represent rise over run. What would
it represent instead?

Q12 Suppose the linearization of f (x, y) at (−3, 9) has the equation

$$L(x, y) = 4 + 2(x + 3) - \frac{1}{3}(y - 9).$$

What is the slope of L from (−3, 9) to (5, 3)?

5.4.4

Q13 Given a function f(x, y) and a point (x, y), in what direction $\vec u$ is f decreasing fastest? Compute an expression for $\vec u$.

Q14 If D⃗u f (x, y) < 0, what can you say about the directions of ∇f (x, y) and ⃗u?

Q15 If fx(3, 5) = fy(3, 5), in what direction(s) from (3, 5) could f increase most quickly?

Q16 Explain why it makes sense that if $D_{\vec u} f(a, b, c) = 0$, then $\vec u$ is tangent to the level surface of f through (a, b, c).

Q17 If $f(x, y, z) = 3xy + z^2$, find the unit vector $\vec u$ that maximizes $D_{\vec u} f(2, 1, -4)$. What is the value of $D_{\vec u} f(2, 1, -4)$ for this $\vec u$?

Q18 Let $f(x, y) = 2x^2 y - 10x - y^2$.

a What unit vector ⃗u maximizes the quantity D⃗u f (−1, 3)?

b Compute D⃗u f (−1, 3) for the ⃗u you found in part a .

5.4.5

Q19 If $\vec u = \left\langle \frac{2}{3}, -\frac{1}{3}, -\frac{2}{3} \right\rangle$ and $f(x, y, z) = x e^{yz}$, compute $D_{\vec u} f(3, 0, 4)$.

Q20 If $\vec u = \left\langle \frac{3}{7}, \frac{6}{7}, -\frac{2}{7} \right\rangle$ and $f(x, y, z) = xy + yz + zx$, compute $D_{\vec u} f(7, -7, 14)$.

Q21 If $\vec u$ is a unit vector in the direction of ⟨2, 3⟩ and $f(x, y) = x^2 + 3xy + 2$, calculate $D_{\vec u} f(-1, 4)$.

Q22 Compute the directional derivative of $g(x, y) = e^{x^2 - y}$ at (3, 7) in the direction of ⟨−12, 5⟩.

5.4.6

Q23 In this diagram, we have several level sets of f (x, y).

a Which way does ∇f (−4, 1.25) point?

b Mark all the points (x, y) that satisfy both of the following:

f(x, y) = 30
∇f(x, y) points in the positive y-direction

Q24 Some level curves of f are drawn below. Indicate the direction of the gradient of f at each
labelled point.


5.4.7

Q25 If ∇B(x0, y0) = ⟨13, −17⟩, would you expect the pixels above (x0, y0) to be brighter or dimmer than (x0, y0)? Explain.

Q26 The brightness function on the Mona Lisa image ranges from 0 to 255. If we use adjacent points to approximate the gradient as in the example, what is the longest gradient vector we could theoretically produce?

5.4.8

Q27 Calculate a normal equation of a tangent line to $x^3 + 8y^3 - 12xy = 0$ at (3, 1.5).

Q28 Let P be a point on the circle $x^2 + y^2 = r^2$. Show that the position vector of P is normal to the tangent line to the circle at P.

Q29 Produce an equation of the tangent plane to $z^3 - xz^2 - yx^2 = 24$ at (4, −2, 2).

Q30 Give an equation of the tangent plane to the graph $z^2 x + 2yz - x^2 y^2 = 59$ at (3, 2, 5).

Synthesis and Extension

Q31 Suppose f(x, y) is a differentiable function, and we know that for $\vec u = \langle -0.6, 0.8 \rangle$, $D_{\vec u} f(5, -1) = 4$ and for $\vec v = \langle 0, -1 \rangle$, $D_{\vec v} f(5, -1) = -2$. What is ∇f(5, −1)?

Q32 Suppose the point P = (x0 , y0 , z0 ) lies on the graph z = f (x, y).

a Give the formula for the tangent plane to this graph at P.

b z = f(x, y) is a level surface of F(x, y, z) = f(x, y) − z. Use the gradient of F to write the equation of the tangent plane to F(x, y, z) = 0 at P.

c Are these equations equivalent? Justify your answer with algebra.

Q33 How could you use the gradient of f to rewrite the formula for the linearization L(x, y) of f(x, y) at (x0, y0)?

Q34 Suppose f(x, y) is a differentiable function and ∇f(a, b) is not the zero vector. How many unit vectors $\vec u$ exist such that $D_{\vec u} f(a, b) = 0$? How are they related geometrically?

Q35 Suppose f(x, y, z) is a differentiable function and ∇f(a, b, c) is not the zero vector. How many unit vectors $\vec u$ exist such that $D_{\vec u} f(a, b, c) = 0$? How are they related geometrically?

Q36 Suppose that f(x, y, z) is a differentiable function, and f(3, 5, −2) = 13. Suppose further that the vectors ⟨3, 1, 0⟩ and ⟨0, 2, 5⟩ both lie in the tangent plane to the surface f(x, y, z) = 13 at (3, 5, −2). If the maximum value of $D_{\vec u} f(3, 5, -2)$ is 20, find all possible values of ∇f(3, 5, −2).

Q37 Consider the function $h(x, y) = x^2 + 2x + 4y^{3/2}$.

a Compute all possible unit vectors $\vec u$ such that $D_{\vec u} h(2, 3) = 6$.

b What angle do these vectors $\vec u$ make with the tangent line to the level curve $h(x, y) = 8 + 12\sqrt{3}$ at (2, 3)?

Q38 Let $f(x, y) = x^4 y + 3x - y^3$.

a Give an equation of the level curve of f through the point (−1, 2).

b Give an equation of the tangent line to the level curve of f at (−1, 2). Write your equation
in normal form.

c Give an expression for the linearization of f at (−1, 2).

Section 5.5

The Chain Rule


Goals:
1 Use the chain rule to compute derivatives of compositions of functions.
2 Perform implicit differentiation using the chain rule.

Motivational Example

Suppose Jinteki Corporation makes widgets which it sells for $100 each. It commands a small enough
portion of the market that its production level does not affect the demand (price) for its products. If
W is the number of widgets produced and C is their operating cost, Jinteki’s profit is modeled by

$$P = 100W - C$$
The partial derivative $\frac{\partial P}{\partial W} = 100$ does not correctly calculate the effect of increasing production on profit, because the operating cost C also changes when W changes. How can we calculate this correctly?

Question 5.5.1
How Can We Visualize a Composition with a Multivariable Function?

You may recall parametric equations from high school algebra. A parametric equation actually
consists of two or more equations. Each expresses a variable in our coordinate system in terms of a
parameter t.
We can visualize a parametric equation as a particle traveling through space.
The variable t represents time.
x(t) and y(t) represent the coordinates of the position at time t.
The vector ⟨x′ (t), y ′ (t)⟩ represents velocity. It points in the direction of travel.

Figure: A particle whose position is defined by x(t) and y(t), the path it follows and its velocity vector

Given a function f (x, y) where x = x(t) and y = y(t), we can ask how f changes as t changes.
We can visualize this change by drawing the graph z = f (x, y) over the path given by the parametric
equations x(t) and y(t).

Figure: The composition f (x(t), y(t)), represented by the height of z = f (x, y) over the path
(x(t), y(t))

Question 5.5.2
How Do We Compute the Derivative of a Composition of Functions?

Theorem [The Chain Rule]

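Consider a differentiable function f(x, y). If we define x = x(t) and y = y(t), both differentiable functions, we have

$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$$
or
$$\frac{df}{dt} = \nabla f(x, y) \cdot \langle x'(t), y'(t) \rangle$$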

Remarks

f(x(t), y(t)) is a function (only) of t. Because of this, $\frac{df}{dt}$ is an ordinary derivative, not a partial derivative.
$\frac{df}{dt}$ is not the slope of the composition graph:

$$\text{slope} = \frac{\text{rise in } z}{\text{run in } xy\text{-plane}}, \qquad \frac{df}{dt} = \frac{\text{rise in } z}{\text{change in } t}$$

The chain rule is easy to remember because of its similarity to the differential:

$$dz = \frac{\partial z}{\partial x}dx + \frac{\partial z}{\partial y}dy.$$

The proof is more complicated than just sticking a dt under each term.
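The chain rule can be checked numerically. This minimal sketch uses a hypothetical function f(x, y) = xy² and a hypothetical path x(t) = cos t, y(t) = t² (neither appears in the text). It compares ∇f · ⟨x′(t), y′(t)⟩ with a central-difference derivative of the one-variable composition t ↦ f(x(t), y(t)).

```python
import math

# Hypothetical example: f(x, y) = x * y**2 along x(t) = cos t, y(t) = t**2
def f(x, y):
    return x * y**2

def x_of(t):
    return math.cos(t)

def y_of(t):
    return t**2

def df_dt_chain_rule(t):
    x, y = x_of(t), y_of(t)
    fx, fy = y**2, 2 * x * y            # partial derivatives of f
    dx_dt, dy_dt = -math.sin(t), 2 * t  # velocity of the path
    return fx * dx_dt + fy * dy_dt      # grad f . <x'(t), y'(t)>

def df_dt_numeric(t, h=1e-6):
    # Differentiate the one-variable composition t -> f(x(t), y(t)) directly
    return (f(x_of(t + h), y_of(t + h)) - f(x_of(t - h), y_of(t - h))) / (2 * h)

t0 = 1.3
print(df_dt_chain_rule(t0), df_dt_numeric(t0))  # the two values agree closely
```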

Example 5.5.3
Using the Chain Rule

If P = R − C and we have R = 100w and $C = 3000 + 70w - 0.1w^2$, calculate $\frac{dP}{dw}$.

Solution

The chain rule says


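$$\frac{dP}{dw} = \frac{\partial P}{\partial R}\frac{dR}{dw} + \frac{\partial P}{\partial C}\frac{dC}{dw}$$
We compute the required partial derivatives:
$$\frac{\partial P}{\partial R} = 1, \qquad \frac{\partial P}{\partial C} = -1, \qquad \frac{dR}{dw} = 100, \qquad \frac{dC}{dw} = 70 - 0.2w$$
We plug these into the formula to get
$$\frac{dP}{dw} = (1)(100) + (-1)(70 - 0.2w) = 30 + 0.2w$$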


Remark

Notice we don’t need the chain rule when we have expressions for each function. We can write the
composition ourselves and take an ordinary derivative. In this example we could just differentiate
$P = 100w - (3000 + 70w - 0.1w^2)$.
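As a check on this example, the sketch below evaluates the chain-rule answer 30 + 0.2w and compares it with a central-difference derivative of the composition P = 100w − (3000 + 70w − 0.1w²) from the remark. The step size h is our choice, not part of the text.

```python
def P(R, C):
    return R - C

def R(w):
    return 100 * w

def C(w):
    return 3000 + 70 * w - 0.1 * w**2

def dP_dw_chain_rule(w):
    # (dP/dR)(dR/dw) + (dP/dC)(dC/dw) = (1)(100) + (-1)(70 - 0.2w)
    return 1 * 100 + (-1) * (70 - 0.2 * w)

def dP_dw_direct(w, h=1e-4):
    # Numerically differentiate the composition P(R(w), C(w))
    return (P(R(w + h), C(w + h)) - P(R(w - h), C(w - h))) / (2 * h)

for w in (0, 50, 200):
    print(w, dP_dw_chain_rule(w), round(dP_dw_direct(w), 6))
```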

Question 5.5.4
What If We Have More Variables?

The chain rule works just as well if x and y are functions of more than one variable. In this case it
computes partial derivatives.

Theorem

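If f(x, y), x(s, t) and y(s, t) are all differentiable, then

$$\frac{\partial f}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s}$$
or
$$\frac{\partial f}{\partial s} = \nabla f(x, y) \cdot \left\langle \frac{\partial x}{\partial s}, \frac{\partial y}{\partial s} \right\rangle$$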

We can also modify it for functions of more than two variables.

Theorem

Given f (x, y, z), x(t), y(t) and z(t), all differentiable, we have

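$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt}$$
or
$$\frac{df}{dt} = \nabla f(x, y, z) \cdot \langle x'(t), y'(t), z'(t) \rangle$$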

Example 5.5.5
A Composition with More Variables

Recall that for an ideal gas $P(n, T, V) = \frac{nRT}{V}$. R is a constant. n is the amount of gas (in moles). T is the absolute temperature (in kelvins). V is the volume in cubic meters. Suppose we want to understand the rate at which the pressure changes as an air-tight glass container of gas is heated.

a Apply the chain rule to get an expression for $\frac{dP}{dT}$.

b What is $\frac{dn}{dT}$?

c What is $\frac{dT}{dT}$?

d Suppose that $\frac{dV}{dT} = (5.9 \times 10^{-6})V$. Calculate and simplify the expression you got for $\frac{dP}{dT}$.

Solution

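a $\frac{dP}{dT} = \frac{\partial P}{\partial T}\frac{dT}{dT} + \frac{\partial P}{\partial n}\frac{dn}{dT} + \frac{\partial P}{\partial V}\frac{dV}{dT}$

b The container is sealed, so no gas is getting in or out: $\frac{dn}{dT} = 0$.

c If we regard T as a function of itself, T(T) = T, so $\frac{dT}{dT} = 1$.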

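d We'll compute the partial derivatives and then plug them into our chain rule expression.

$$\frac{\partial P}{\partial T} = \frac{nR}{V}, \qquad \frac{\partial P}{\partial V} = -\frac{nRT}{V^2}$$
$$\frac{dP}{dT} = \frac{nR}{V}(1) + 0 - \frac{nRT}{V^2}(5.9 \times 10^{-6})V = \frac{nR(1 - 0.0000059T)}{V}$$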
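To see the simplified formula in action, this sketch picks one volume function satisfying dV/dT = (5.9 × 10⁻⁶)V, namely V(T) = V₀e^{k(T − T₀)}, and compares nR(1 − kT)/V with a numerical derivative of P(T) = nRT/V(T). The values of n, V₀ and T₀ are hypothetical, and T is measured in kelvins.

```python
import math

R_GAS = 8.314          # gas constant, J/(mol K)
n = 1.0                # mol (hypothetical)
k = 5.9e-6             # 1/K, so that dV/dT = k V
V0, T0 = 0.001, 300.0  # m^3 and K (hypothetical starting state)

def V(T):
    # One volume function with dV/dT = k V
    return V0 * math.exp(k * (T - T0))

def P(T):
    # Sealed container: n is constant, V depends on T
    return n * R_GAS * T / V(T)

def dP_dT_formula(T):
    # Simplified chain-rule result nR(1 - kT)/V
    return n * R_GAS * (1 - k * T) / V(T)

def dP_dT_numeric(T, h=1e-3):
    return (P(T + h) - P(T - h)) / (2 * h)

for T in (300.0, 350.0):
    print(T, dP_dT_formula(T), dP_dT_numeric(T))
```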

Example 5.5.6
A Composition with Limited Information

Suppose $g(p, q, r) = r e^{p^2 q}$. Given that p, q, r are all differentiable functions of x with the values in the following table, compute $\frac{dg}{dx}$ when x = 2.

x         0     1     2     3
p(x)      3     1     5    10
p′(x)    −3     2     3     4
q(x)      6     2    −2     3
q′(x)    −1    −5     2     3
r(x)     10    11     7     3
r′(x)     1     0    −1    −3

Solution

The chain rule says


$$\frac{dg}{dx} = \frac{\partial g}{\partial p}\frac{dp}{dx} + \frac{\partial g}{\partial q}\frac{dq}{dx} + \frac{\partial g}{\partial r}\frac{dr}{dx}$$
We require the partial derivatives of g:

$$\frac{\partial g}{\partial p} = 2pqr e^{p^2 q}, \qquad \frac{\partial g}{\partial q} = p^2 r e^{p^2 q}, \qquad \frac{\partial g}{\partial r} = e^{p^2 q}$$

Now we plug in the partial derivatives, along with the derivatives of p, q and r from the table.

$$\frac{dg}{dx} = 2pqr e^{p^2 q}(3) + p^2 r e^{p^2 q}(2) + e^{p^2 q}(-1)$$

This is correct, but not sufficiently simplified. We have left p's, q's and r's in the expression, but the table tells us what values these have when x = 2. We can make these substitutions:

$$\frac{dg}{dx} = 2(5)(-2)(7)e^{(5)^2(-2)}(3) + (5)^2(7)e^{(5)^2(-2)}(2) + e^{(5)^2(-2)}(-1)$$
$$= -420e^{-50} + 350e^{-50} - e^{-50} = -71e^{-50}$$
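The substitution step is easy to get wrong by hand, so here is the same arithmetic in code, using the x = 2 column of the table. It is a direct transcription of the formula above, not an independent derivation.

```python
import math

# Values from the table at x = 2
p, dp = 5, 3
q, dq = -2, 2
r, dr = 7, -1

e = math.exp(p**2 * q)     # e^(p^2 q) = e^(-50)
dg_dp = 2 * p * q * r * e  # partial g / partial p
dg_dq = p**2 * r * e       # partial g / partial q
dg_dr = e                  # partial g / partial r

dg_dx = dg_dp * dp + dg_dq * dq + dg_dr * dr
print(dg_dx, -71 * math.exp(-50))  # the two numbers agree
```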

Application 5.5.7
Implicit Differentiation

Recall that an implicit equation in n variables is a level set of an n-variable function. Consider the graph $x^3 + y^2 - 4xy = 0$. How can we use this to calculate $\frac{dy}{dx}$ at the point (3, 3)?

Solution

First, note that (3, 3) does lie on the graph. When we plug x = 3 and y = 3 into our equation, we get 27 + 9 − 36 = 0, which is true. Now suppose that for every x near 3, we can define y(x) to be the y-coordinate on the graph $x^3 + y^2 - 4xy = 0$.
Define $F(x, y) = x^3 + y^2 - 4xy$. The points (x, y(x)) lie on the graph F(x, y) = 0. We can use this equation to obtain an expression for $\frac{dy}{dx}$. When we differentiate F(x, y(x)), both components change as x changes, so we cannot use a partial derivative. We need the chain rule.

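$$F(x, y(x)) = 0$$
$$\frac{d}{dx}F(x, y(x)) = \frac{d}{dx}0 \qquad \text{differentiate both sides}$$
$$\frac{\partial F}{\partial x}\frac{dx}{dx} + \frac{\partial F}{\partial y}\frac{dy}{dx} = 0 \qquad \text{apply chain rule}$$
$$\frac{\partial F}{\partial x} + \frac{\partial F}{\partial y}\frac{dy}{dx} = 0 \qquad \frac{dx}{dx} = 1$$
$$\frac{\partial F}{\partial y}\frac{dy}{dx} = -\frac{\partial F}{\partial x} \qquad \text{solve for } \frac{dy}{dx}$$
$$\frac{dy}{dx} = -\frac{\partial F/\partial x}{\partial F/\partial y}$$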

We compute the partial derivatives at (3, 3), then plug them into the formula we derived.
$$F_x(x, y) = 3x^2 - 4y, \qquad F_x(3, 3) = 15$$
$$F_y(x, y) = 2y - 4x, \qquad F_y(3, 3) = -6$$
$$\frac{dy}{dx} = -\frac{15}{-6} = \frac{5}{2}$$
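We can check the slope 5/2 without the formula. This sketch (our own, not from the text) solves F(x, y) = 0 for y at x = 3 ± h with Newton's method in y, starting from y = 3, and takes a central difference of the resulting y-values. Newton's method is a swapped-in numerical technique; it relies on F_y ≠ 0 near (3, 3).

```python
def F(x, y):
    return x**3 + y**2 - 4 * x * y

def F_y(x, y):
    return 2 * y - 4 * x

def y_on_curve(x, y_guess, steps=50):
    """Newton's method in y alone: find y near y_guess with F(x, y) = 0."""
    y = y_guess
    for _ in range(steps):
        y -= F(x, y) / F_y(x, y)
    return y

h = 1e-5
slope = (y_on_curve(3 + h, 3.0) - y_on_curve(3 - h, 3.0)) / (2 * h)
print(slope)  # close to 5/2
```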

Figure: The graph of $F(x, y) = x^3 + y^2 - 4xy = 0$, its tangent line at (3, 3), and the gradient of F


Main Ideas

$\frac{dy}{dx}$ is the slope of the tangent line to F(x, y) = c.

The chain rule allows us to derive $\frac{dy}{dx} = -\frac{F_x}{F_y}$.

$-\frac{F_x}{F_y}$ is the negative reciprocal of $\frac{F_y}{F_x}$, which is the slope of ∇F.

In order to solve for $\frac{dy}{dx}$ we had to assume that y was a differentiable function of x. How do we
know that’s even true? There is an advanced and powerful theorem that tells us when we can write one
variable in an implicit equation as a function of the others. Here is the two-variable version.

Theorem [The Implicit Function Theorem]

Suppose we have a point $(x_0, y_0)$ on the graph of F(x, y) = c. Suppose that

1 The partial derivatives of F exist and are continuous in a neighborhood of $(x_0, y_0)$
2 $F_y(x_0, y_0) \ne 0$

Then there is a function y = f(x) that agrees with the graph of F(x, y) = c in some neighborhood around $(x_0, y_0)$. Furthermore
1 f is continuous
2 f is differentiable
3 $f'(x_0) = -\frac{F_x(x_0, y_0)}{F_y(x_0, y_0)}$

In the case of our example, the partial derivatives in question are polynomials, so they are continuous everywhere. As long as $F_y(x_0, y_0) \ne 0$, we are guaranteed that our graph has a tangent line at $(x_0, y_0)$, and its slope is $-\frac{F_x(x_0, y_0)}{F_y(x_0, y_0)}$.
Fy (x0 , y0 )

Application 5.5.8
Indirect Profit Functions

Suppose a firm chooses how much quantity q to produce, but their profit Π(q, α) depends on some
parameter α outside their control (maybe a tax or a measure of regulatory burden). The firm, once
it knows the value of α, will choose the q that maximizes profit. How will their profit change as α
changes?

Solution


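The change in the firm's profit is $\frac{d\Pi}{d\alpha}$. Since q is also a function of α, we will need the chain rule.

$$\frac{d\Pi}{d\alpha} = \frac{\partial \Pi}{\partial q}\frac{dq}{d\alpha} + \frac{\partial \Pi}{\partial \alpha}\frac{d\alpha}{d\alpha}$$

We can substitute $\frac{d\alpha}{d\alpha} = 1$. We can also argue that $\frac{\partial \Pi}{\partial q} = 0$. Why? Because q is the choice that maximizes profit, and maximums occur at critical points. If $\frac{\partial \Pi}{\partial q} > 0$, then the firm could increase q to increase profit (without changing α, which it has no control over). Similarly, if $\frac{\partial \Pi}{\partial q} < 0$, then reducing production would increase profit.
Performing these substitutions gives:
$$\frac{d\Pi}{d\alpha} = \frac{\partial \Pi}{\partial \alpha}$$
This suggests that in this case, the total derivative is equal to the partial derivative (evaluated at the optimal q).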

We can verify this equality graphically as well. Pick a particular α0 and let q0 = q(α0). Notice:
The graph of Π(q0, α) is never above the graph of Π(q(α), α) for any α, since q(α) is the optimal choice of q.
The graphs of Π(q0, α) and Π(q(α), α) meet at α0, since q0 = q(α0).

If two differentiable graphs meet but one stays below the other, they are tangent at the meeting point. They have the same tangent line there and thus the same derivative.

Figure: Two graphs of z = Π(q, α), one where q changes to be the optimal choice for each α and one
where q is fixed at q0 , the optimal choice for α0


Remark

If we had an expression for q(α) and an expression for Π, we could substitute and use ordinary differentiation. Since we did not, we needed the chain rule. Even with such expressions, to find $\frac{d\Pi}{d\alpha}$ directly we would need to
1 Solve for q as a function of α
2 Substitute q(α) into Π(q, α)
3 Differentiate the result
Taking a partial derivative is less work. Our result (which economists call the envelope theorem) is both a useful abstraction and a computational shortcut.
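The envelope theorem is easy to test on a concrete case. This sketch uses a hypothetical profit function Π(q, α) = (10 − α)q − q², whose optimal quantity is q(α) = (10 − α)/2. It carries out the three-step direct route above (doing step 3 numerically) and compares the result with ∂Π/∂α taken with q held fixed at q(α).

```python
def profit(q, alpha):
    # Hypothetical profit: revenue (10 - alpha) q minus cost q^2
    return (10 - alpha) * q - q**2

def best_q(alpha):
    # Step 1: solve dProfit/dq = 10 - alpha - 2q = 0 for q
    return (10 - alpha) / 2

def optimal_profit(alpha):
    # Step 2: substitute q(alpha) into the profit function
    return profit(best_q(alpha), alpha)

def total_derivative(alpha, h=1e-5):
    # Step 3, done numerically: d/d(alpha) of the optimal profit
    return (optimal_profit(alpha + h) - optimal_profit(alpha - h)) / (2 * h)

def partial_alpha(alpha, h=1e-5):
    # Shortcut: hold q fixed at q(alpha) and vary alpha only
    q0 = best_q(alpha)
    return (profit(q0, alpha + h) - profit(q0, alpha - h)) / (2 * h)

for alpha in (0.0, 2.0, 4.5):
    print(alpha, total_derivative(alpha), partial_alpha(alpha))
```

Both columns equal −(10 − α)/2 up to rounding, as the envelope theorem predicts for this example.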

Section 5.5
Exercises

Summary Questions

Q1 How can we visualize f (x, y), when x and y are functions of t?

Q2 Explain why $\frac{df}{dt}$ cannot be interpreted as a slope of f over the xy-plane.

Q3 What is the difference between $\frac{dz}{dx}$ and $\frac{\partial z}{\partial x}$? How is the first one computed?

Q4 How do you use the chain rule to differentiate implicit functions?

5.5.1

Q5 Plug in a few different t values and plot the corresponding points of

x(t) = 3 + 5t
y(t) = −2 + 4t

What is the resulting curve? What is the significance of the t coefficients?

Q6 Consider the curve defined by

x(t) = t
$y(t) = e^t$

a Plot a few points on the curve by plugging in different values of t.

b In general, what curve does

x(t) = t
y(t) = f (t)

seem to produce?

Q7 A particle is travelling according to the parametric equations

x(t) = 2 cos t
y(t) = 3 sin t
What is the speed (magnitude of velocity) at $t = \frac{\pi}{3}$?

Q8 Produce a tangent vector to the curve defined by

$x(t) = t^3$
$y(t) = t^2$

at the point (−27, 9).

Q9 Is the graph of

$x(t) = t^2$
y(t) = sin(t)

the graph of a function? How can you tell without graphing it?

Q10 How are the graphs of the following two parametric equations related? Can you generalize your
answer to similar pairs of parametric equations?

$x(t) = \cos t \qquad\qquad x(t) = \cos(t^3)$
$y(t) = \ln t \qquad\qquad y(t) = \ln(t^3)$


5.5.2

Q11 Let f(x, y) be a function. Under what conditions is $\frac{df}{dt}$ equal to the directional derivative of f in the direction of the tangent vector ⟨x′(t), y′(t)⟩?

Q12 Liam says “If f is a function of x and y and x and y are increasing, then f is increasing.” We
all know Liam is incorrect. How could we use the chain rule to refute him?

5.5.3

Q13 The angular speed of an object is given by $\omega = \frac{v}{r}$, where r is the distance from the center of rotation and v is the linear speed. Suppose an object is orbiting Earth at a radius of 8400000 m and a speed of 6900 m/s. If the radius is increasing at a rate of 100 m/s and the linear speed is decreasing by 60 m/s², how quickly is the angular speed changing?

Q14 Let $x = t^2$ and $y = \sin t$. Let $f(x, y) = xy$.

a Compute $\frac{df}{dt}$ using the multivariable chain rule.

b Compute $\frac{df}{dt}$ by substituting and using single-variable differentiation.

c What earlier rule of differentiation can we recover by applying the chain rule to f (x, y) = xy?

5.5.4

Q15 Suppose $h(x_1, x_2, x_3, x_4)$ is a four-variable function and each $x_i(s, t)$ is a function of parameters s and t. How would the multivariable chain rule compute $\frac{\partial h}{\partial t}$?

Q16 Suppose k(x) is a function and x(r, s, t) is a function of parameters r, s, and t. How does the multivariable chain rule say we should compute $\frac{\partial k}{\partial r}$?

5.5.5

Q17 Angular momentum is given by L = rmv, where r is the radius of rotation, m is the mass of the object, and v is its linear speed. At a certain time t0, r is 42 million meters and increasing at 80,000 meters per second, m is 6000 kg and not changing, and v is 3100 m/s and increasing at 20 m/s². How quickly is angular momentum increasing?

Q18 Let $f(x, y) = x^2 - y^2$. If x(r, θ) = r cos θ and y(r, θ) = r sin θ, compute $\frac{\partial f}{\partial \theta}$ at $(r, \theta) = \left(4, \frac{\pi}{6}\right)$.

5.5.6

Q19 Suppose x(t) and y(t) are differentiable functions of t such that

$$x(2) = 3 \qquad x'(2) = 2 \qquad y(2) = -5 \qquad y'(2) = 10$$

If $f(x, y) = y e^{x^2 y}$, show how to compute $\frac{df}{dt}$ at t = 2.

Q20 Suppose that x and y are functions of t such that when t = 2:

$$x = 3 \qquad y = 1 \qquad \frac{dx}{dt} = 5 \qquad \frac{dy}{dt} = 2$$

If $g(x, y) = 3xy^2 - x^2 + 2y$, compute $\left.\frac{dg}{dt}\right|_{t=2}$.

5.5.7

Q21 Compute $\frac{dy}{dx}$ at (4, 2), if x and y satisfy $y^3 - xy + x^2 - 4 = 0$.

Q22 Compute $\frac{dy}{dx}$ at (3, 0), if x and y satisfy $x e^{xy} = 3$.

Q23 What is the slope of the tangent line to $x - y^2 = 9$ at (18, −3)?

Q24 Compute the slope of the tangent line to $x^3 = y^2$ at (4, −8).


Q25 Angular momentum is given by L = rmv. One law of physics states that the angular momentum of an object is conserved (unchanged) unless a force (besides gravity) acts to speed up or slow down the object. Use the chain rule to derive an expression for $\frac{dv}{dr}$, the amount of linear speed an object gains or loses per unit that its radius of rotation increases. What do you notice about the role of mass in your answer?

Q26 Another principle in physics is the conservation of energy. Kinetic energy is given by $E = \frac{1}{2}mv^2$, where m is the mass and v is the linear speed of the object. Suppose that we have a rock drifting through space. Suppose it impacts stationary rocks and the combined mass sticks together (without releasing any energy as heat, light or sound). Thus the mass of the total travelling object increases, while the total energy stays the same. Derive an expression for how speed changes per unit of increase in mass.

5.5.8

Q27 Suppose that x is a function of t and that when t = 9, we have x = 7 and $\frac{dx}{dt} = -3$. Define f(x, t) = x + t.

a Compute the partial derivative $\frac{\partial f}{\partial t}(7, 9)$.

b Compute the total derivative $\frac{df}{dt}(7, 9)$.

c In a few sentences, explain what these two quantities compute and why they are different from each other.

Q28 A firm with a monopoly gets to set the price of its products and decide how much to produce. There is a demand function p such that if the firm produces q units, it must set its price at p(q) to get consumers to buy all of its production. Each unit costs c to produce. The profit function of the firm is
π(q, c) = p(q)q − cq
We can assume that once the firm has worked out what c is, it chooses the q to maximize profit. How much will the firm's actual profit change per unit of increase in c?

Synthesis and Extension

Q29 Find the slope of the tangent line to $x^2 + 2x - y^2 = 8$ at (5, −3) using each of the following two
methods.

a Using a gradient vector to write the normal equation of the line and solving for the slope.

b Using implicit differentiation.

Q30 Suppose the position of a particle at time t is given by

$$x(t) = t^2 \qquad y(t) = 3 - t \qquad z(t) = t$$

At t = 4, how quickly is the particle travelling away from the plane x + 2y − 2z = 10?

Q31 Here is a diagram of the level curves of h(x, y) for certain values of c.

a Is hy (2, 1) positive or negative? Explain in a sentence or two.

b Add a vector to the diagram that indicates the direction of greatest increase of h at (−2, 0).

c Suppose x = 4 − 5t and $y = 3t^2$. Determine, with the aid of a relevant calculation, whether $\frac{dh}{dt}$ is positive or negative at t = 1.

Q32 Let $f(x, y) = x^5 + 20xy + 5y^2$.


a Give an equation of the level curve of f through the point (1, −1).

b Give an equation of the tangent plane to z = f (x, y) at the point (1, −1, −14).

c Use the differential of f to estimate how much the z value of z = f(x, y) would change from (1, −1, −14), if x increased by 3 and y decreased by 1. If you don't remember differential notation, you may use another notation for partial credit.

Section 5.6

Maximum and Minimum Values


Goals:

1 Find critical points of a function.


2 Test critical points to find local maximums and minimums.

3 Use the Extreme Value Theorem to find the global maximum and global minimum of a function over a closed and bounded set.

Functions can be used to model a variety of real-world quantities, such as a company's profit, a disease's infection rate, or the impact of a government program. In these cases, the most pressing question is:
what choice of independent variables will maximize or minimize the value of the function? Answering
this question was one of the headline applications of single-variable calculus. In this section we will
generalize those methods to functions of multiple variables.

Question 5.6.1
What Are Local Extremes?

The local extremes of a function are the local minimums and maximums.

Definition

Given an n-variable function f (x1 , x2 , . . . , xn ) we say that a point P in n-space is

1 a local maximum if f (P ) ≥ f (Q) for all Q in some neighborhood around P .


2 a local minimum if f (P ) ≤ f (Q) for all Q in some neighborhood around P .

Question 5.6.2
Where Do Local Extremes Lie?

At a local maximum (or minimum) P, $D_{\vec u} f(P)$ cannot be positive (or negative) in any direction $\vec u$. If f is differentiable at P, reversing the direction flips the sign of the directional derivative, so every directional derivative at P must be 0. Thus at a local extreme where f is differentiable, ∇f(P) = ⃗0, the zero vector. In other words, all the partial derivatives of f are 0 at P.
In the case of a two-variable function, we can visualize this condition. If fx(P) ≠ 0, then we could travel in the x direction to increase or decrease f. If fy(P) ≠ 0, then we could travel in the y direction to increase or decrease f. Thus at a local maximum or local minimum, the tangent plane must be horizontal.

Figure: Tangent lines must have slope 0 at a local max.

This argument works anywhere that ∇f exists. That motivates the following definition:

Definition

We say P is a critical point of f if either


1 ∇f (P ) = ⃗0 or

2 ∇f (P ) does not exist (because one of the partial derivatives does not exist).

Theorem

The local maximums and minimums of a function can only occur at critical points.

Example 5.6.3
Finding Critical Points

The function $z = f(x, y) = 2x^2 + 4x + y^2 - 6y + 13$ has a minimum value. Find it.

Solution

We know the minimum value exists, so it must lie at a critical point. We compute

∇f (x, y) = ⟨4x + 4, 2y − 6⟩

One type of critical point is where this is undefined, but no value of (x, y) makes these expressions
undefined. The other type of critical point occurs when these components are 0. We can solve that
system of equations.

$$4x + 4 = 0 \implies x = -1, \qquad 2y - 6 = 0 \implies y = 3$$

The only point that satisfies this requirement is (−1, 3). Since there is only one critical point, and the
promised minimum lies at a critical point, (−1, 3) must be that point. The minimum value is

$$z = (2)(-1)^2 + (4)(-1) + 3^2 - (6)(3) + 13 = 2$$
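The critical point can also be located numerically. Gradient descent, which is not a method from this section, repeatedly steps against ∇f. This sketch uses a fixed step size of 0.1 and an arbitrary starting point, both our own choices, and it settles at the point where ∇f = ⃗0 found above.

```python
def f(x, y):
    return 2 * x**2 + 4 * x + y**2 - 6 * y + 13

def grad_f(x, y):
    # Gradient computed in the solution: <4x + 4, 2y - 6>
    return 4 * x + 4, 2 * y - 6

x, y = 5.0, -5.0  # arbitrary starting point (our choice)
step = 0.1        # fixed step size (our choice); small enough to converge here
for _ in range(500):
    gx, gy = grad_f(x, y)
    x, y = x - step * gx, y - step * gy

print(x, y, f(x, y))  # approaches -1, 3 and the minimum value 2
```

Gradient descent only finds a point where the gradient vanishes; the argument in the text (a minimum exists and there is only one critical point) is what guarantees this point is the minimum.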

Question 5.6.4
How Do We Identify Two-Variable Local Maximums and Minimums?

Once we have found a critical point, how do we know whether it is a local minimum, a local maximum
or neither? Consider a function f(x, y) and a critical point P. There are two possibilities for ∇f(P). In the case that ∇f(P) does not exist, the derivative-based tests of this section are of no further use to us. If ∇f(P) = ⟨0, 0⟩, there are a few different shapes the graph could take. Since we are working with two variables, we can visualize these shapes.
A critical point could be a local maximum. In this case f curves downward in every direction.

Figure: A local maximum at (0, 0)



A critical point could be a local minimum. In this case f curves upward in every direction.

Figure: A local minimum at (0, 0)

A critical point could be neither. f curves upward in some directions but downward in others. This
configuration is called a saddle point.

Figure: A saddle point at (0, 0)

Curvature is measured by the second derivatives. This matches our experience with single-variable
critical points, where the second derivative test classifies critical points as local maximums or local
minimums. We have a similar test for two-variable functions, though the computation is more involved.

Theorem [The Second Derivatives Test]

Suppose the second partial derivatives of f are continuous near P and fx(P) = fy(P) = 0. Then we can compute

$$D = f_{xx}(P) f_{yy}(P) - [f_{xy}(P)]^2$$

1 If D > 0 and fxx (P ) > 0 then P is a local minimum.


2 If D > 0 and fxx (P ) < 0 then P is a local maximum.
3 If D < 0 then P is a saddle point.

Unfortunately, if D = 0, this test gives no information.

Definition

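The quantity D in the second derivatives test is actually the determinant of a matrix called the Hessian of f (this uses $f_{xy}(P) = f_{yx}(P)$, which holds when the second partial derivatives are continuous):

$$f_{xx}(P) f_{yy}(P) - [f_{xy}(P)]^2 = \det \underbrace{\begin{bmatrix} f_{xx}(P) & f_{xy}(P) \\ f_{yx}(P) & f_{yy}(P) \end{bmatrix}}_{H_f(P)}$$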

Hf follows a logical pattern and can be a useful mnemonic for the second derivatives test.
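The test translates directly into a small function. This sketch takes the three second partial derivatives at a critical point as inputs and assumes f_xy = f_yx; the function name and the returned labels are our own.

```python
def classify_critical_point(fxx, fxy, fyy):
    """Second derivatives test at a critical point (f_x = f_y = 0).

    Assumes f_xy = f_yx, so the Hessian is [[fxx, fxy], [fxy, fyy]].
    """
    D = fxx * fyy - fxy**2  # determinant of the Hessian
    if D > 0 and fxx > 0:
        return "local minimum"
    if D > 0 and fxx < 0:
        return "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"  # D == 0: the test gives no information


# Second partials at (0, 0) for x^2 + y^2, -x^2 - y^2, x^2 - y^2 and x^4 + y^4
print(classify_critical_point(2, 0, 2))    # local minimum
print(classify_critical_point(-2, 0, -2))  # local maximum
print(classify_critical_point(2, 0, -2))   # saddle point
print(classify_critical_point(0, 0, 0))    # inconclusive
```

The last case shows why D = 0 gives no information: x⁴ + y⁴ does have a minimum at (0, 0), but the test cannot detect it.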

Example 5.6.5
Classifying a Critical Point

Let f (x, y) = cos(2x + y) + xy

a Verify that ∇f (0, 0) = ⟨0, 0⟩.

b Is (0, 0) a local minimum, a local maximum, or neither?


Solution

a
$$f_x(x, y) = -\sin(2x + y)(2) + y \quad \text{(chain rule)}$$
$$f_x(0, 0) = -\sin((2)(0) + 0)(2) + 0 = 0$$
$$f_y(x, y) = -\sin(2x + y)(1) + x \quad \text{(chain rule)}$$
$$f_y(0, 0) = -\sin((2)(0) + 0)(1) + 0 = 0$$
$$\nabla f(0, 0) = \langle 0, 0 \rangle$$

b For the second derivatives test, we need to compute fxx, fxy and fyy at (0, 0).

$$f_{xx}(x, y) = -2\cos(2x + y)(2) \quad \text{(chain rule)}, \qquad f_{xx}(0, 0) = -2\cos((2)(0) + 0)(2) = -4$$
$$f_{xy}(x, y) = -2\cos(2x + y)(1) + 1 \quad \text{(chain rule)}, \qquad f_{xy}(0, 0) = -2\cos((2)(0) + 0)(1) + 1 = -1$$
$$f_{yy}(x, y) = -\cos(2x + y)(1) \quad \text{(chain rule)}, \qquad f_{yy}(0, 0) = -\cos((2)(0) + 0)(1) = -1$$

$$D = f_{xx}(0, 0) f_{yy}(0, 0) - [f_{xy}(0, 0)]^2 = (-4)(-1) - (-1)^2 = 3$$

Since D > 0 and fxx(0, 0) < 0, (0, 0) is a local maximum of f.
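The hand computation can be checked with finite differences. This sketch estimates f_xx, f_xy and f_yy at (0, 0) from values of f alone, using standard central-difference formulas with a step h = 10⁻⁴ that we chose; the estimates should be close to −4, −1 and −1, giving D ≈ 3.

```python
import math

def f(x, y):
    return math.cos(2 * x + y) + x * y

def second_partials(func, x, y, h=1e-4):
    """Central-difference estimates of f_xx, f_xy, f_yy at (x, y)."""
    fxx = (func(x + h, y) - 2 * func(x, y) + func(x - h, y)) / h**2
    fyy = (func(x, y + h) - 2 * func(x, y) + func(x, y - h)) / h**2
    fxy = (func(x + h, y + h) - func(x + h, y - h)
           - func(x - h, y + h) + func(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

fxx, fxy, fyy = second_partials(f, 0.0, 0.0)
D = fxx * fyy - fxy**2
print(fxx, fxy, fyy, D)  # approximately -4, -1, -1 and 3
```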

Figure: The graph z = cos(2x + y) + xy with a local maximum at (0, 0)


Remark

Why does the final determination between maximum and minimum rely on fxx (P ) instead of fyy (P )?
Actually it doesn’t matter which we test. In order for D to be positive, fxx (P ) and fyy (P ) must have
the same sign.

Question 5.6.6
How Do We Find Global Extremes?

The second derivatives test can categorize local extremes, but what about a global extreme?

Definition

Given an n-variable function f (x1 , x2 , . . . , xn ) we say that a point P in n-space is

1 a global maximum if f(P) ≥ f(Q) for all Q in the domain of f.

2 a global minimum if f(P) ≤ f(Q) for all Q in the domain of f.

In a real-world application, we are much more interested in finding global extremes than local ones.
Many abstract functions do not even have global extremes. $y = e^x$ has no global maximum. It increases without bound. $y = \frac{1}{x^2}$ has no global minimum. It approaches 0 but never reaches it. The following
theorem guarantees that certain functions will have global extremes for us to try to find.

Theorem [The Extreme Value Theorem]

A continuous function f on a closed and bounded domain D has a global maximum and a global
minimum somewhere in D.

Two of the words in this theorem have not been defined yet. Here are their definitions.

Definition

Let D be a subset of n-space.

D is closed if it contains all of the points on its boundary.

D is bounded if there is some upper limit to how far its points get from the origin (or any other
fixed point). If there are points of D arbitrarily far from the origin, then D is unbounded.


For one-variable functions, the EVT requires that the domain be a union of finitely many finite, closed intervals (and maybe finitely many isolated points).

Figure: A union of finite, closed intervals

In 2-space, we can get a better sense of what these requirements mean. The boundary of D is
the set of points from which you can find points in D and points outside D arbitrarily close by. The
boundary of a disc is a circle. If the disc includes the circle, it is closed. If it does not include the circle,
it is not closed.

Figure: \(x^2 + y^2 \le 9\) is closed.
Figure: \(x^2 + y^2 < 9\) is not closed.

Containing part of the boundary is not enough. Any missing boundary point means that D is not closed. Even removing an isolated point from the interior of D is a problem. That point is arbitrarily close to points in D. It is also arbitrarily close to a point outside D, namely itself. Thus it is a boundary point not contained in D, and D is not closed.

Figure: \(-2 \le x \le 2\) and \(-3 < y < 3\) is not closed.
Figure: \(-2 \le x \le 2\) and \(-3 \le y \le 3\) and \((x, y) \ne (1, 2)\) is not closed.

Bounded regions are easier to understand. If we can enclose the region in a sufficiently large circle,
it is bounded. If it stretches outside any circle we would draw around it, then it is unbounded.

Figure: \(-2 \le x \le 2\) and \(-3 \le y \le 3\) is bounded.
Figure: \(-2 \le x \le 2\) is unbounded.

Example 5.6.7
Finding a Global Maximum

Consider the function \(f(x, y) = x^2 + 2y^2 - x^2 y\) on the domain

\[D = \{\underbrace{(x, y)}_{\text{points in } \mathbb{R}^2} : \underbrace{x^2 + y^2 \le 16,\ x \le 0}_{\text{conditions}}\}\]

a Does f have a maximum value on D? How do we know?


b Find the critical points of f .

c Must one of the critical points be the maximum?

d Find the maximum of f .

Remark

The set notation

{type of objects in the set : conditions that those objects must satisfy}

is used throughout mathematics, because it is so flexible. It can denote sets of numbers, points,
functions, vectors or any other objects.

Solution

a f is a polynomial, so it is continuous. D is a semi-disc that includes its boundary, so it is closed and bounded. The extreme value theorem guarantees that f has a global maximum on D.

b We begin by computing the gradient of f .

\[f_x(x, y) = 2x - 2xy \qquad f_y(x, y) = 4y - x^2\]

These are never undefined, so there are no critical points of that type. The only critical points
will be where both partial derivatives are 0.

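\[0 = 2x - 2xy \qquad\qquad 0 = 4y - x^2\]
\[0 = 2x(1 - y) \qquad \text{(factor } 2x - 2xy\text{)}\]
so \(x = 0\) or \(y = 1\). We examine each case separately:
\[x = 0:\quad 0 = 4y - 0^2 \implies y = 0 \qquad\qquad y = 1:\quad 0 = 4(1) - x^2 \implies x = \pm 2\]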

We should be careful not to lose track of the logic. The \(x = \pm 2\) solution goes with the \(y = 1\) case. The \(y = 0\) solution goes with the \(x = 0\) case. Mixing these up will give invalid solutions. You can always plug a pair \((x, y)\) back in to verify that it satisfies the system of equations.
We conclude that \((0, 0)\), \((2, 1)\) and \((-2, 1)\) are the critical points. However, \((2, 1)\) is not in the domain (it has \(x > 0\)), so we discard it.
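Part b can also be sketched in SymPy, again assuming SymPy is installed. The code solves \(f_x = 0\), \(f_y = 0\) and then keeps only the solutions that satisfy the conditions defining D. That filtering is the step that discards \((2, 1)\).

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
f = x**2 + 2*y**2 - x**2*y

# Points where both partial derivatives vanish.
solutions = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)

# Keep only those in D = {x^2 + y^2 <= 16, x <= 0}.
critical_in_D = [
    (int(s[x]), int(s[y]))
    for s in solutions
    if s[x]**2 + s[y]**2 <= 16 and s[x] <= 0
]
print(critical_in_D)
```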

c No. Recall our method for maximizing single variable functions on a closed interval. The maximum
can occur at the endpoint of the interval without being detected by the derivative.

The same is true here. If the maximum is on the boundary of D, the gradient need not be 0. In
the single-variable case, we only need to test the endpoints (by evaluating f there). There are
infinitely many points on the boundary of D. Evaluating f on all of them is not an option. With
graphing software we can see that the maximum occurs on the boundary somewhere in the third
quadrant, but how can we solve for it exactly?


Figure: The graph of z = f(x, y) over the domain D

d To narrow down the search for a maximum on the boundary of D, we will use the boundary
equations to write an expression for f that is valid only on the boundary. We can find the critical
points of this expression, and rule out any point that is not a critical point.

Suppose the maximum lies on \(x = 0\). The function on \(x = 0\) is \(f(0, y) = 0^2 + 2y^2 - 0^2 y = 2y^2\). This function only has one variable, so we can find potential maximums by looking for its critical points.
\[f'(y) = 4y\]
This is never undefined. It is 0 at \(y = 0\). The only critical point of \(f(y)\) on \(x = 0\) is \((0, 0)\). However, not all of \(x = 0\) is the boundary of D. This component of the boundary ends at \((0, 4)\) and \((0, -4)\). Like with a closed interval, the derivative of \(f(y)\) cannot detect a maximum at those endpoints.
Suppose the maximum lies on \(x^2 + y^2 = 16\). On this curve, we can similarly reduce \(f(x, y)\) to a function of one variable, but the substitution is more complicated. We solve
\[x^2 + y^2 = 16 \implies x^2 = 16 - y^2\]
\[
\begin{aligned}
f(y) &= (16 - y^2) + 2y^2 - (16 - y^2)y && \text{(substitute for } x^2\text{)}\\
&= y^3 + y^2 - 16y + 16\\
f'(y) &= 3y^2 + 2y - 16\\
0 &= 3y^2 + 2y - 16 && \text{(solve for critical points)}\\
0 &= (3y + 8)(y - 2)
\end{aligned}
\]
so \(y = -\frac{8}{3}\) or \(y = 2\). Substituting each into \(x^2 + y^2 = 16\):
\[x^2 + \left(-\tfrac{8}{3}\right)^2 = 16 \implies x^2 = 16 - \tfrac{64}{9} \implies x = -\sqrt{\tfrac{80}{9}}\]
\[x^2 + 2^2 = 16 \implies x^2 = 16 - 4 \implies x = -\sqrt{12}\]
(the positive solutions are not in D).
Our critical points are \(\left(-\sqrt{\frac{80}{9}}, -\frac{8}{3}\right)\) and \(\left(-\sqrt{12}, 2\right)\). This component of the boundary also ends at \((0, 4)\) and \((0, -4)\), so the maximum might lie there.
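The substitution on the circle can be sketched the same way, assuming SymPy is available. It relies on the fact that f involves x only through \(x^2\). Replacing \(x^2\) by \(16 - y^2\) therefore gives a function of y alone. We then recover x with the negative square root, because \(x \le 0\) on D.

```python
import sympy as sp

y = sp.symbols("y", real=True)

# On the circle x^2 + y^2 = 16, replace x^2 by 16 - y^2
# (f involves x only through x^2, so no square root is needed yet).
x_squared = 16 - y**2
f_circle = sp.expand(x_squared + 2*y**2 - x_squared*y)

# Critical points of the one-variable function.
critical_y = sp.solve(sp.diff(f_circle, y), y)

# Recover x, taking the negative root because x <= 0 on D.
boundary_points = [(-sp.sqrt(x_squared.subs(y, yc)), yc) for yc in critical_y]
print(f_circle)
print(boundary_points)
```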
We can now argue that one of the points we have found is the maximum.

If the maximum is not on the boundary, it lies at \((-2, 1)\).

If the maximum is on \(x = 0\), then it lies at \((0, 0)\), \((0, 4)\) or \((0, -4)\).

If the maximum is on \(x^2 + y^2 = 16\), then it lies at \(\left(-\sqrt{\frac{80}{9}}, -\frac{8}{3}\right)\), \(\left(-\sqrt{12}, 2\right)\), \((0, 4)\) or \((0, -4)\).
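One of these must be the case. To figure out which it is, we can evaluate f at each point and see which produces the largest value.

\[
\begin{aligned}
f(-2, 1) &= (-2)^2 + 2(1)^2 - (-2)^2(1) = 2\\
f(0, 0) &= (0)^2 + 2(0)^2 - (0)^2(0) = 0\\
f(0, 4) &= (0)^2 + 2(4)^2 - (0)^2(4) = 32\\
f(0, -4) &= (0)^2 + 2(-4)^2 - (0)^2(-4) = 32\\
f\left(-\sqrt{\tfrac{80}{9}}, -\tfrac{8}{3}\right) &= \left(-\sqrt{\tfrac{80}{9}}\right)^2 + 2\left(-\tfrac{8}{3}\right)^2 - \left(-\sqrt{\tfrac{80}{9}}\right)^2\left(-\tfrac{8}{3}\right) = \tfrac{1264}{27} && \text{(maximum)}\\
f\left(-\sqrt{12}, 2\right) &= (-\sqrt{12})^2 + 2(2)^2 - (-\sqrt{12})^2(2) = -4
\end{aligned}
\]
The maximum value of f on D is therefore \(\frac{1264}{27} \approx 46.8\), attained at \(\left(-\sqrt{\frac{80}{9}}, -\frac{8}{3}\right)\).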
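Here is a short check of the final comparison, assuming SymPy and NumPy are installed. The sketch evaluates f exactly at every candidate listed above. It then runs a brute-force search over a fine grid of points in D. The grid search is only a numerical sanity check, not a proof, so we expect its value to be close to, and never above, \(\frac{1264}{27}\).

```python
import numpy as np
import sympy as sp

x, y = sp.symbols("x y", real=True)
f = x**2 + 2*y**2 - x**2*y

# Candidates: interior critical point, critical points on each boundary
# component, and the endpoints (0, 4) and (0, -4) where the components meet.
candidates = [
    (-2, 1), (0, 0), (0, 4), (0, -4),
    (-sp.sqrt(sp.Rational(80, 9)), sp.Rational(-8, 3)),
    (-sp.sqrt(12), 2),
]
values = {pt: f.subs({x: pt[0], y: pt[1]}) for pt in candidates}
best = max(values, key=lambda pt: values[pt])
print(best, values[best])

# Brute-force numerical check on a grid of points in D (not a proof).
X, Y = np.meshgrid(np.linspace(-4, 0, 801), np.linspace(-4, 4, 1601))
inside = X**2 + Y**2 <= 16
F = X**2 + 2*Y**2 - X**2*Y
grid_max = F[inside].max()
print(grid_max, float(values[best]))
```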

Main Ideas

If the Extreme Value Theorem applies, then all we need to do is find the critical points and evaluate
f at each. One is guaranteed to be the maximum, and one is guaranteed to be the minimum.

∇f = ⃗0 will detect critical points on the interior, but not on the boundary.
We can rewrite the function on a boundary component using substitution. Set the derivative equal
to 0 to find critical points.
Derivatives will not detect maximums at the endpoints of a boundary curve. These must be
included in your set of critical points.

Section 5.6
Exercises

Summary Questions

Q1 Where must the local maximums and minimums of a function occur? Why does this make sense?

Q2 What does the second derivatives test tell us?

Q3 What hypotheses does the Extreme Value Theorem require? What does it tell us?

Q4 Assuming a maximum and minimum exist, where must you look in a domain to be sure you find
them?

5.6.1

Q5 Raina claims that \((0, 0)\) is the maximum of \(f(x, y) = x^2 - y^2 - 10xy\). Disprove her claim without
using calculus.

Q6 Is a global maximum also a local maximum? Explain.

Q7 Suppose \(g(x, y) = e^{f(x, y)}\). If \((a, b)\) is a local minimum of \(f(x, y)\), is it also a local minimum of \(g(x, y)\)? Explain.

Q8 Does a constant function have any local maximums? Justify your answer with the definition of
local maximum.

5.6.2

Q9 Suppose ∇f (4, 2) = ⟨−5, 11⟩. Where would you travel from (4, 2) to find higher values of f ?

Q10 The function f (x, y) = |x| + |y| has its global minimum at (0, 0). Is this a critical point? Explain.

Q11 If \((a, b)\) produces the minimum value of \(|\nabla f(x, y)|\), must \((a, b)\) be a critical point? Explain.

Q12 Suppose f(x) is a function of x with critical points x = a and x = b. Suppose g(y) is a function of y with critical points y = c and y = d. What are the critical points of h(x, y) = f(x) + g(y)?

5.6.3

Q13 Find the critical points of \(f(x, y) = x^4 + 4xy + y^4\).

Q14 Find the critical points of \(g(x, y) = x^2 + y^2 - 3xy - 13x + 12y\).

5.6.4

Q15 If \((x_0, y_0)\) is a critical point and \(f_{xx}(x_0, y_0) = 0\), can \((x_0, y_0)\) be a local maximum of f? What must be the value of \(f_{xy}(x_0, y_0)\) if so?

Q16 For what values of a does \(f(x, y) = x^2 + y^2 + axy\) have a local minimum at the origin?


5.6.5

Q17 Find the critical points of \(h(x, y) = x^2 y - x^2 - 2y^2\). Classify each as a local maximum, local minimum, or saddle point.

Q18 Find all critical points of \(f(x, y) = \frac{1}{3}x^3 - 4xy + 2y^2\). Classify them as local maximums, local minimums, or saddle points.

Q19 Compute the critical points of \(f(x, y) = 2x^3 - 12xy + 3y^2\) and classify each as a local maximum,
local minimum, or saddle point.

Q20 Let \(h(x, y) = x^2 + y^3 + 3xy\). Find the critical points of h, and classify each as a local maximum,
local minimum or saddle point.

Q21 Let \(f(x, y) = x^3 - 15x^2 - 9x + 12xy - 3y^2 - 18y\). Find the critical points of f and classify each
one as local maximum, local minimum or saddle point.

Q22 Let \(f(x, y) = x^5 + 20xy + 5y^2\). Find the critical points of f and classify each one as local
maximum, local minimum or saddle point.

Q23 Find the critical points of \(g(x, y) = e^{x^3 + y^2 - 12x + 10y}\). Classify each one as local maximum, local minimum or saddle point.

Q24 Find the critical points of \(f(x, y) = \frac{1}{x^4 - x^2 y + y^2 + 10}\). Classify each one as local maximum, local minimum or saddle point.

5.6.6

Q25 Draw a sketch of \(D = \{(x, y) : y \ge x^2,\ y \le x^3\}\). State whether D is closed and whether D is bounded.

Q26 Draw a sketch of D = {(x, y) : y ≥ x, y ≤ 2x, xy < 1}. State whether D is closed and whether
D is bounded.

Q27 Draw a sketch of \(D = \{(x, y) : x > 0,\ y \ge x^4\}\). State whether D is closed and whether D is bounded.

Q28 Draw a sketch of \(D = \{(x, y) : -1 < x^2 + y^2 \le 16\}\). State whether D is closed and whether D is bounded.

Q29 Let \(D = \{(x, y) : y \ge x^2\}\). Can the Extreme Value Theorem guarantee that f has a maximum on D? Explain.

Q30 Does the function \(f(x, y) = \frac{1}{x^2 + y^2}\) have a maximum and minimum value on the domain \(D = \{(x, y) : -3 \le x \le 3,\ -4 \le y \le 4\}\)? If yes, find them. If not, explain why the extreme value theorem does not apply.

5.6.7

Q31 Draw a careful diagram of \(D = \{(x, y) : y \ge x^2,\ x^2 + y^2 \le 20\}\). Where would you need to check to guarantee you’d find the maximum value of a continuous function f on D?

Q32 Let f (x, y) be a differentiable function and let

\[D = \{(x, y) : y \ge x^2 - 4,\ x \ge 0,\ y \le 5\}.\]

a Sketch the domain D.

b Does the Extreme Value Theorem guarantee that f has an absolute minimum on D? Explain.

c List all the places you would need to check in order to locate the minimum.

Q33 Find the maximum and minimum value of \(f(x, y) = e^{x + 3y}\) in the triangle with vertices \((0, 0)\), \((6, 0)\) and \((0, 3)\).

Q34 Find the maximum and minimum value of \(f(x, y) = 3x + y\) on D, the closed region bounded by \(y = x^2\) and \(y = 16\).

Q35 Find the global max and min of \(f(x, y) = x^3 - 12x + y^3 - 3y\) on the rectangle \(0 \le x \le 4\) and \(-2 \le y \le 2\).

Q36 Consider the function \(g(x, y) = \frac{x^4 - 2x^2 + 2}{y^2 - 2y + 2}\) on the rectangle \(-2 \le x \le 2\) and \(0 \le y \le 3\).


a Does the extreme value theorem apply to this function? Why might you be concerned, and
what would you have to check?

b Find the min and max of g.

Synthesis and Extension

Q37 Consider the function \(f(x, y) = x^2 - 4xy + 4y^2\).

a Find the critical point(s) of f .

b What does the second derivatives test say about the critical points of f ?

c Can you classify the critical points using algebra instead? Explain.

Q38 If g(x) is an increasing function, explain why the local maximums and minimums of any f(x, y) are the same as the maximums and minimums of g(f(x, y)).

Section 5.7

Lagrange Multipliers
Goals:

1 Find minimum and maximum values of a function subject to a constraint.


2 If necessary, use Lagrange multipliers.

Many of the functions we have studied do not have maximum values. Many polynomials and exponential functions increase without bound. Yet in the real world, we never see corporations producing infinite quantities of goods. We never see infinite populations of animals. Does this mean that polynomials and exponentials have no real-world applications? On the contrary, they are ubiquitous. The corporations and populations that operate under these models simply also have constraints on their inputs.
Corporations do not have infinite money to invest. Animals do not have infinite food sources. In
this section we develop the tools to find maximum and minimum values of a function, when our inputs
are constrained.

Question 5.7.1
What Is a Constraint?

Sometimes we aren’t interested in the maximum value of f(x, y) over the whole domain. Instead, we want to restrict to only those points that satisfy a certain constraint equation.

The maximum on the constraint is unlikely to be the same as the unconstrained maximum (where ∇f = 0). Can we still use ∇f to find the maximum on the constraint?

Figure: Maximizing f such that x + y = 1

We explore this question in the Maximums on a Constraint activity.

Question 5.7.2
How Do We Solve a Constrained Optimization?

The method of Lagrange Multipliers makes use of the following theorem.



Theorem

Suppose an objective function f (x, y) and a constraint function g(x, y) are differentiable. The local
extremes of f (x, y) given the constraint g(x, y) = c occur where

∇f = λ∇g

for some number λ, or else where ∇g = 0. The number λ is called a Lagrange Multiplier.

This theorem generalizes to functions of more variables.


We can justify the theorem visually by examining the relationship between ∇f, ∇g and the constraint. The constraint g(x, y) = c is by definition a level curve of g, so ∇g is normal to it.

Figure: Where \(\nabla f\) is not parallel to \(\nabla g\), we can travel along \(g(x, y) = c\) and increase the value of f. This is because \(D_{\vec{u}} f > 0\) for some \(\vec{u}\) along the constraint.

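By this argument, a maximum or minimum of the objective function on the constraint can only lie where \(D_{\vec{u}} f = 0\) for the directions \(\vec{u}\) along the constraint. That is, it can only lie where \(\nabla f\) is normal to the constraint, and therefore parallel to \(\nabla g\).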

Remark

When ∇f(P) is parallel to ∇g(P) (and neither of these vectors is ⃗0), the level curve of f through P is tangent to the level curve g(x, y) = c. If we can draw the level curves of f, this gives us a visual method of identifying the potential maximums and minimums.

Example 5.7.3
The Maximum on a Curve

Find the point(s) on the ellipse \(4x^2 + y^2 = 4\) on which the function \(f(x, y) = xy\) is maximized.

The EVT and constraints

Are we guaranteed that a maximum exists at all? The Extreme Value Theorem can still be applied to
constraints. Here are a few ways we can identify that a constraint is closed:

1 A curve is closed if it includes its endpoints (or none exist).


2 A surface is closed if it includes its boundary (or none exists).
3 The level set of a continuous function is always closed.
Even armed with these, we still need to check that the domain is bounded.

Solution

We’ll check the conditions of the Extreme Value Theorem:

1 \(4x^2 + y^2 = 4\) is a curve with no endpoints, so it is closed.

2 \(4x^2 + y^2 = 4\) is an ellipse. It stays within a bounded distance from the origin.

3 f is continuous.
By the Extreme Value Theorem, we know that a maximum exists. We will use Lagrange multipliers
to narrow down our search to the possible maximums. We set g(x, y) = 4x2 + y 2 and compute the
gradient vectors of f and g.

∇f (x, y) = ⟨y, x⟩ ∇g(x, y) = ⟨8x, 2y⟩

The theorem allows two possibilities at a maximum.


1 ∇g(x, y) = ⟨0, 0⟩. The only (x, y) that satisfies this is (0, 0). But (0, 0) is not on the constraint,
so it is not a valid solution.
2 ∇f = λ∇g. We can factor the λ across each component of the vectors, but that gives us two equations and three variables (x, y and λ). We need another equation, and fortunately we have one: x and y must satisfy \(4x^2 + y^2 = 4\) as well. Here is one (but not the only) way to solve this system of equations.


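\[y = 8\lambda x \qquad x = 2\lambda y \qquad 4x^2 + y^2 = 4\]
\[
\begin{aligned}
y &= 8\lambda(2\lambda y)\\
0 &= 16\lambda^2 y - y\\
0 &= y(4\lambda - 1)(4\lambda + 1)
\end{aligned}
\]
Either \(y = 0\), in which case \(x = 2\lambda(0) = 0\) and \(4(0)^2 + 0^2 = 4\) (no solution), or \(\lambda = \pm\frac{1}{4}\):
\[
\begin{aligned}
y &= \left(\pm\tfrac{1}{4}\right)8x = \pm 2x\\
4x^2 + (\pm 2x)^2 &= 4\\
8x^2 &= 4\\
x^2 &= \tfrac{1}{2}\\
x &= \pm\tfrac{1}{\sqrt{2}}\\
y &= \pm 2\left(\tfrac{1}{\sqrt{2}}\right) = \pm\sqrt{2}
\end{aligned}
\]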

This tells us the only possible locations for the maximum are
\[(x, y) = \left(\pm\frac{1}{\sqrt{2}}, \pm\sqrt{2}\right)\]
with all four combinations of signs.

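We identify the maximum by evaluating f at each point.
\[f\left(\tfrac{1}{\sqrt{2}}, \sqrt{2}\right) = 1 \qquad f\left(-\tfrac{1}{\sqrt{2}}, \sqrt{2}\right) = -1\]
\[f\left(-\tfrac{1}{\sqrt{2}}, -\sqrt{2}\right) = 1 \qquad f\left(\tfrac{1}{\sqrt{2}}, -\sqrt{2}\right) = -1\]
We conclude that the maximum occurs at \(\left(\frac{1}{\sqrt{2}}, \sqrt{2}\right)\) and \(\left(-\frac{1}{\sqrt{2}}, -\sqrt{2}\right)\).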
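The system in this example can also be handed to SymPy (assumed installed). This sketch writes the conditions \(f_x = \lambda g_x\), \(f_y = \lambda g_y\) and \(g = 4\) directly, solves for x, y and λ, and keeps the solutions where f is largest.

```python
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)
f = x*y
g = 4*x**2 + y**2

# Lagrange conditions: grad f = lambda * grad g, plus the constraint g = 4.
equations = [
    sp.Eq(sp.diff(f, x), lam*sp.diff(g, x)),
    sp.Eq(sp.diff(f, y), lam*sp.diff(g, y)),
    sp.Eq(g, 4),
]
solutions = sp.solve(equations, [x, y, lam], dict=True)

# grad g = <8x, 2y> is zero only at the origin, which is not on the ellipse,
# so these solutions are the only candidates.
values = {(s[x], s[y]): f.subs(s) for s in solutions}
max_value = max(values.values())
maximizers = [pt for pt, v in values.items() if v == max_value]
print(max_value, maximizers)
```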

Figure: The four points that satisfy ∇f = λ∇g and g(x, y) = c.

Main Idea

The level set of a continuous (constraint) function is always closed. Suppose it is also bounded and the objective function is differentiable. Then the global maximum and the global minimum of the constrained optimization are both among the points produced by Lagrange multipliers, together with any points on the constraint where ∇g = 0.

Example 5.7.4
The Maximum on a Surface

Find the maximum value of the function \(f(x, y, z) = x^4 y^4 z\) on the sphere \(x^2 + y^2 + z^2 = 36\).

Figure: The gradient vector and level surface of a constraint function and the gradient vector of the
objective function


Solution

First note that the EVT applies, since a sphere is closed and bounded and f is continuous. To identify potential maximums, we appeal to Lagrange multipliers.
Set \(g(x, y, z) = x^2 + y^2 + z^2\). Then \(\nabla g(x, y, z) = \langle 2x, 2y, 2z \rangle\). The case \(\nabla g(x, y, z) = \vec{0}\) only occurs at the origin, which is not on the sphere. The critical points must be only the points where \(\nabla f = \lambda \nabla g\).
\[\nabla f(x, y, z) = \langle 4x^3 y^4 z,\ 4x^4 y^3 z,\ x^4 y^4 \rangle\]
Equating each coordinate gives us three equations, and the constraint is a fourth. We thus have a system of four equations and four variables.
\[4x^3 y^4 z = 2\lambda x \qquad 4x^4 y^3 z = 2\lambda y \qquad x^4 y^4 = 2\lambda z \qquad x^2 + y^2 + z^2 = 36\]


The most obvious way to solve this algebraically is to solve for λ, but this requires us to divide by
x, y and z. We would need to remember that another possible solution is that x, y or z is 0. We can
avoid this by multiplying and factoring instead.

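\[4x^3 y^4 z = 2\lambda x \qquad 4x^4 y^3 z = 2\lambda y \qquad x^4 y^4 = 2\lambda z\]
Multiplying these equations by yz, xz and xy respectively makes every right side equal to \(2\lambda xyz\):
\[4x^3 y^5 z^2 = 2\lambda xyz \qquad 4x^5 y^3 z^2 = 2\lambda xyz \qquad x^5 y^5 = 2\lambda xyz\]
\[
\begin{aligned}
4x^3 y^5 z^2 &= 4x^5 y^3 z^2 & x^5 y^5 &= 4x^5 y^3 z^2\\
4x^3 y^5 z^2 - 4x^5 y^3 z^2 &= 0 & x^5 y^5 - 4x^5 y^3 z^2 &= 0\\
4x^3 y^3 z^2 (y - x)(y + x) &= 0 & x^5 y^3 (y - 2z)(y + 2z) &= 0
\end{aligned}
\]
Either \(x = 0\), or \(y = 0\), or \(y = \pm x\) and \(y = \pm 2z\). The factor \(z^2\) adds nothing new: if \(z = 0\) while \(x, y \ne 0\), the second equation forces \(y = \pm 2z = 0\), a contradiction. In the last case \(x = \pm 2z\), and
\[
\begin{aligned}
x^2 + y^2 + z^2 &= 36\\
(\pm 2z)^2 + (\pm 2z)^2 + z^2 &= 36\\
9z^2 &= 36\\
z &= \pm 2
\end{aligned}
\]
so \(x = (\pm 2)(\pm 2) = \pm 4\) and \(y = (\pm 2)(\pm 2) = \pm 4\).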

This gives us 8 critical points: (±4, ±4, ±2). In addition every point in the x = 0 cross section of
the sphere is a critical point, as is every point in the y = 0 cross-section. This is infinitely many points
to evaluate, but fortunately the algebra of our objective function allows us to evaluate these points in
large batches.

\[\text{if } x = 0: \quad f(x, y, z) = 0^4 y^4 z = 0\]
\[\text{if } y = 0: \quad f(x, y, z) = x^4 0^4 z = 0\]
\[f(\pm 4, \pm 4, 2) = (\pm 4)^4 (\pm 4)^4 (2) = 2^{17}\]
\[f(\pm 4, \pm 4, -2) = (\pm 4)^4 (\pm 4)^4 (-2) = -2^{17}\]
Thus the maximum value is \(2^{17} = 131072\). It occurs at the four points \((\pm 4, \pm 4, 2)\).
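Since \(2^{17}\) is a specific and fairly large number, a numerical check is reassuring. The sketch below assumes NumPy is installed. It samples the sphere of radius 6 on a fine grid of spherical angles and reports the largest value of \(x^4 y^4 z\) it finds. Every sample lies on the sphere, so the result should be slightly below, but very close to, \(2^{17} = 131072\).

```python
import numpy as np

# Points on the sphere of radius 6, parametrized by the polar angle theta
# (measured from the positive z-axis) and the azimuthal angle phi.
theta = np.linspace(0.0, np.pi, 721)
phi = np.linspace(0.0, 2*np.pi, 1441)
T, P = np.meshgrid(theta, phi)

r = 6.0
X = r*np.sin(T)*np.cos(P)
Y = r*np.sin(T)*np.sin(P)
Z = r*np.cos(T)

# Objective function on the sampled points; locate its largest value.
F = X**4 * Y**4 * Z
i = np.unravel_index(np.argmax(F), F.shape)
print(F[i], X[i], Y[i], Z[i])
```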

Remark

If we hadn’t seen how to avoid dividing by x, y and z, we could have gone ahead and done the division.
Remember that dividing both sides of an equation by an expression can lose the solutions where that expression is 0. This would lead us to check x = 0, y = 0 and z = 0 separately, as we did in the factoring solution.

Synthesis 5.7.5
Using the Extreme Value Theorem and Lagrange Multipliers

How can Lagrange multipliers help us find the maximum of \(f(x, y) = x^2 + 2y^2 - x^2 y\) on the domain

\[D = \{(x, y) : x^2 + y^2 \le 16,\ x \le 0\}?\]

Solution

We can continue Example 5.6.7. After finding the critical points of f at (0, 0) and (−2, 1), we turn to the boundaries. The boundaries are level curves.
For \(x^2 + y^2 = 16\), set \(g(x, y) = x^2 + y^2\), so that this boundary is the level curve \(g(x, y) = 16\). We have

\[\nabla f(x, y) = \langle 2x - 2xy,\ 4y - x^2 \rangle \qquad \nabla g(x, y) = \langle 2x, 2y \rangle\]

\(\nabla g(x, y) = \vec{0}\) only at the origin, which isn’t on the constraint. So we solve \(\nabla f(x, y) = \lambda \nabla g(x, y)\) and \(g(x, y) = 16\).


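\[2x - 2xy = 2\lambda x \qquad 4y - x^2 = 2\lambda y \qquad x^2 + y^2 = 16\]
\[
\begin{aligned}
2x - 2xy - 2\lambda x &= 0\\
2x(1 - y - \lambda) &= 0
\end{aligned}
\]
If \(x = 0\): \(0^2 + y^2 = 16\), so \(y = \pm 4\).
If \(1 - y - \lambda = 0\): then \(\lambda = 1 - y\), and
\[
\begin{aligned}
4y - x^2 &= (1 - y)2y\\
2y^2 + 2y &= x^2\\
(2y^2 + 2y) + y^2 &= 16\\
3y^2 + 2y - 16 &= 0\\
(3y + 8)(y - 2) &= 0
\end{aligned}
\]
If \(y = -\frac{8}{3}\): \(x^2 + \left(-\frac{8}{3}\right)^2 = 16\), so \(x^2 + \frac{64}{9} = \frac{144}{9}\), \(x^2 = \frac{80}{9}\) and \(x = \pm\sqrt{\frac{80}{9}}\).
If \(y = 2\): \(x^2 + 2^2 = 16\), so \(x^2 = 12\) and \(x = \pm\sqrt{12}\).

The critical points are \((0, \pm 4)\), \(\left(-\sqrt{12}, 2\right)\) and \(\left(-\sqrt{\frac{80}{9}}, -\frac{8}{3}\right)\). The solutions with positive x are not in D.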
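The same Lagrange system on the circle can be solved with SymPy (assumed installed). Keeping only the solutions with \(x \le 0\) should reproduce the four boundary candidates listed above.

```python
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)
f = x**2 + 2*y**2 - x**2*y
g = x**2 + y**2

# grad f - lambda * grad g = 0, together with the constraint g = 16.
equations = [
    sp.diff(f, x) - lam*sp.diff(g, x),
    sp.diff(f, y) - lam*sp.diff(g, y),
    g - 16,
]
solutions = sp.solve(equations, [x, y, lam], dict=True)

# Only the half of the circle with x <= 0 bounds D.
boundary_candidates = {(s[x], s[y]) for s in solutions if s[x] <= 0}
print(boundary_candidates)
```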
On x = 0, substitution is probably the easier choice, but Lagrange multipliers are still possible.
x = 0 is a level set of the function g(x, y) = x.

∇g(x, y) = ⟨1, 0⟩

∇g ̸= ⃗0 so we solve ∇f (x, y) = λ∇g(x, y).

\[2x - 2xy = \lambda \qquad 4y - x^2 = 0 \qquad x = 0\]
\[4y - 0^2 = 0 \implies 4y = 0\]

This is the same equation we obtained by substituting x = 0 into f and differentiating.

Main Idea

To find the absolute minimum and maximum of a differentiable function f (x, y) over a closed and
bounded domain D:
1 Compute ∇f and find the critical points inside D.

2 Identify the boundary components. Find the critical points on each using substitution or Lagrange
multipliers.

3 Identify the endpoints (intersections) of the boundary components.

4 Evaluate f(x, y) at all of the above. The minimum is the lowest value, and the maximum is the highest.

Synthesis 5.7.6
The Gradient on the Boundary

Suppose P is a critical point of f on a boundary component of a domain D. What does the direction
of ∇f (P ) tell us about whether P is a maximum or minimum?

Figure: The critical points and gradient vectors of f (x, y) on a closed and bounded domain

Solution

First suppose ∇f (P ) points into D. Then f increases as we travel into D. Thus P cannot be a local
maximum.


P may be a local minimum but may not be. The directional derivative along the boundary is 0, so f
could curve upward or downward along the boundary. If f curves downward we could find lower values
of f nearby and P would not be a minimum. If f curves upward, then P would be a minimum. We
could compute this curvature by taking the substituted version of f that we used to solve for P and
computing its second derivative at P .
On the other hand, if we suppose that ∇f(P) points out of D, then f decreases as we travel into D, and P cannot be a local minimum. It may or may not be a local maximum.

Question 5.7.7
Can Lagrange Multipliers Apply to More Than One Constraint?

If we have two constraints in three-space, g(x, y, z) = c and h(x, y, z) = d, then their intersection
is generally a curve.

Figure: The intersection of the constraints g(x, y, z) = c and h(x, y, z) = d

According to our earlier argument about directional derivatives, at a maximum P on the constraint,
∇f (P ) must be normal to the constraint. There are more ways for this to happen with two constraint
equations.
1 ∇f (P ) could be parallel to ∇g(P ).
2 ∇f (P ) could be parallel to ∇h(P ).
3 ∇f (P ) could be the vector sum of a vector parallel to ∇g(P ) and a vector parallel to ∇h(P ).

You should look at Figure 380 to convince yourself that these ∇f(P) would all be normal to the constraint. We can express this condition algebraically.

Theorem

Suppose f(x, y, z), g(x, y, z) and h(x, y, z) are differentiable, and g(x, y, z) = c and h(x, y, z) = d are two constraints. If P is a maximum of f(x, y, z) among the points that satisfy these constraints, then either

\[\nabla f(P) = \lambda \nabla g(P) + \mu \nabla h(P)\]

for some scalars λ and µ, or ∇g(P) and ∇h(P) are parallel.

This system of equations is usually difficult to solve by hand.

Remark

You can check the reasonableness of this method by noting that it gives us a system of 5 variables, \(x\), \(y\), \(z\), \(\lambda\), \(\mu\), and five equations:

\[
\begin{aligned}
f_x(x, y, z) &= \lambda g_x(x, y, z) + \mu h_x(x, y, z) & g(x, y, z) &= c\\
f_y(x, y, z) &= \lambda g_y(x, y, z) + \mu h_y(x, y, z) & h(x, y, z) &= d\\
f_z(x, y, z) &= \lambda g_z(x, y, z) + \mu h_z(x, y, z)
\end{aligned}
\]

We therefore generally expect this system to have a finite number of solutions, though there are plenty of counterexamples to this expectation.
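To see the two-constraint system in action, here is a small hypothetical example that does not come from the text. We maximize \(f(x, y, z) = z\) on the curve where the cylinder \(x^2 + y^2 = 1\) meets the plane \(x + y + z = 1\). On this curve \(\nabla g = \langle 2x, 2y, 0 \rangle\) and \(\nabla h = \langle 1, 1, 1 \rangle\) are never parallel, so only the first alternative of the theorem needs checking. The sketch assumes SymPy is installed and solves the five equations in the five unknowns.

```python
import sympy as sp

x, y, z, lam, mu = sp.symbols("x y z lambda mu", real=True)

# Hypothetical example (not from the text):
# maximize f = z subject to g = x^2 + y^2 = 1 and h = x + y + z = 1.
f = z
g = x**2 + y**2
h = x + y + z

# grad f = lambda * grad g + mu * grad h, written componentwise,
# plus the two constraint equations: 5 equations in 5 unknowns.
equations = [sp.diff(f, v) - lam*sp.diff(g, v) - mu*sp.diff(h, v) for v in (x, y, z)]
equations += [g - 1, h - 1]

solutions = sp.solve(equations, [x, y, z, lam, mu], dict=True)
values = [f.subs(s) for s in solutions]
max_value = max(values)
print(solutions)
print(max_value)
```

Here the two solutions have \(x = y = \pm\frac{1}{\sqrt{2}}\), and the larger value of z is \(1 + \sqrt{2}\).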

Section 5.7
Exercises

Summary Questions

Q1 What is a constraint?

Q2 What equations do you write when you apply the method of Lagrange multipliers?

Q3 Is the set of points that satisfies a constraint closed and bounded? Explain.

Q4 How does a constraint arise when finding the maximum over a closed and bounded domain?


5.7.1

Q5 Suppose we have $230 to spend on three goods. Good 1 costs $13 per unit. Good 2 costs $22 per unit. Good 3 costs $11 per unit. Write a budget constraint that expresses what purchases (x, y, z) of good 1, good 2 and good 3 are possible, if you spend your budget.

Q6 Suppose the maximum value of f(x, y) occurs at (3, −4). Where is the maximum value of f(x, y) that satisfies the constraint \(x^2 + y^2 = 25\)? Explain.

5.7.2

Q7 Suppose f(x, y, z) is a smooth function. Suppose the maximum value of f on the sphere \(x^2 + y^2 + z^2 = 25\) occurs at P. What can you say about ∇f(P) and the tangent plane to the sphere at P?

Q8 Suppose the curve below is the graph of g(x, y) = k. Use methods from calculus to find and mark the approximate location of the point that maximizes the function f(x, y) = 3y − x subject to the constraint g(x, y) = k. Justify your reasoning in a few sentences.

Q9 Suppose that (a, b) is a local maximum of the smooth function f(x, y) which also happens to satisfy the constraint g(a, b) = k.

a Is (a, b) also a local maximum of f among the points on the constraint? Explain.

b If we used Lagrange multipliers to detect (a, b), what would we expect λ to be equal to at
that point?
Q10 Show that \((3, 3)\) is not a local maximum of \(f(x, y) = 2x^2 - 4xy + y^2 - 8x\) on the graph \(x^3 + y^3 = 6xy\).

5.7.3

Q11 Compute the maximum value of \(y - x^2\) on the constraint \(x^2 + y^2 = 4\).

Q12 Refer to your “Maximums on a Constraint” worksheet.

a What system of equations would you set up to find the critical points of f on the constraint p(x, y) = c?

b Can you solve it?

c Which was easier, using Lagrange or using substitution?

5.7.4

Q13 Find the maximum value of \(f(x, y, z) = xyz\) on the sphere \(x^2 + y^2 + z^2 = 36\).

Q14 Find the maximum value of \(f(x, y, z) = xz\) on the sphere \(x^2 + y^2 + z^2 = 36\).

Q15 Find the maximum value of \(f(x, y, z) = 3y + 2z\) on the ellipsoid \(25x^2 + y^2 + 4z^2 = 100\).

Q16 Find the minimum value of \(h(x, y, z) = x^2 + y^2 + z^2\) on the plane \(3x + 5y - 2z = 30\).


5.7.5

Q17 Suppose f(x, y) is differentiable but has no critical points. Will the method of Lagrange multipliers detect the maximum value of f in \(D = \{(x, y) : x^2 + y^2 \le 49\}\)? Explain.

Q18 Consider the following two questions:

Find the maximum value of f(x, y) that satisfies \(x^2 + y^2 \le 9\).

Find the maximum value of f(x, y) that satisfies \(x^2 + y^2 = 9\).

a How are the questions different?

b Which question takes less work to solve? Explain how you know.

c Do solutions exist to both questions? What additional information would guarantee that
they do?

Q19 Let \(D = \{(x, y) : x^2 + y^2 \le 1,\ x \ge 0,\ y \le 0\}\). Find the maximum and minimum values of \(f(x, y) = x^2 - y\) on D.

Q20 Consider the function \(f(x, y) = x^2 + 6xy + 9y^2 + 5\). Find the maximum and minimum values of f on the domain \(D = \{(x, y) : y \ge x,\ x \ge 0,\ x^2 + y^2 \le 10\}\).

Q21 Let \(D = \{(x, y) : x^2 + y^2 \le 20,\ y \ge -x\}\). Find the maximum and minimum values of \(f(x, y) = x^4 y\) on D.

Q22 Let \(D = \{(x, y) : x^2 + y^2 \le 25,\ y \ge x + 1,\ y \ge 0\}\). Find the maximum and minimum values of \(f(x, y) = x^3 y^2\) on D.

Q23 Let \(D = \{(x, y) : x^2 + y^2 \le 20,\ y \ge -x\}\). Find the maximum and minimum values of \(f(x, y) = x^4 y\) on D.

Q24 Let \(D = \left\{(x, y) : \frac{x^2}{16} + \frac{y^2}{64} \le 1,\ x \ge 0\right\}\). Find the points in D that obtain the maximum and minimum values of \(f(x, y) = 2x + 3y\).

5.7.6

Q25 Suppose the maximum of f (x, y) on

D = {(x, y) | g(x, y) ≤ c}

occurs at P on the boundary of D. We know that ∇f (P ) points out of D. What does this tell
us about the sign of λ?

Q26 Explain why knowing which way ∇f points is not useful for ruling out potential maximums given a domain of the form g(x, y) = c.

5.7.7

Q27 How does the method of Lagrange multipliers suggest we solve for the maximum value of f (x, y)
on the constraints x + y = 1 and x − y = 0? Do we need to know what f is to solve this? Why
shouldn’t that bother us?

Q28 Write a system of equations that one would solve to find the maximum and minimum values of \(f(x, y, z) = x\) on the two constraints \(y^2 + z^2 = 25\) and \(x + y + z = 1\).

Synthesis and Extension

Q29 Consider the plane p with normal equation 7x + 6y − 3z − 42 = 0

a Use Lagrange multipliers to find the point A on p that is closest to the origin O.

b Show that \(\overrightarrow{OA}\) is a normal vector to p.

c Show how you can use the observation in b to solve for the closest point (A) without using
calculus.

Q30 Determine the smallest rectangle (parallel to the x and y axes) that contains the ellipse \(x^2 + 3xy + 4y^2 - 4x - 13y + 4 = 0\).

Q31 An aquarium with an open top has volume \(20\ \text{m}^3\). Its rectangular base is made of slate, and its sides are made of glass. Slate costs five times as much (per unit area) as glass. Set up and solve a constrained optimization problem to find the dimensions (ℓ, w, h) of the aquarium that will minimize the cost of materials.

Q32 Let D be the region enclosed by 2x + y = 8, y = 8 and x = 4. Consider the function

f (x, y) = xy − 3y − 6x.

a Does f have a maximum and minimum value on D? What tool can you use to verify this?
What did you need to check before applying this tool?

b Find the maximum and minimum values of f on D. Demonstrate in your work that you’ve
checked all the relevant places for potential maximums.

Q33 Find the maximum and minimum values of \(f(x, y) = 2x^2 + 2xy + 5y^2\) on the ellipse \(x^2 + 4y^2 = 106\).

Chapter 6

Multivariable Integration

This chapter introduces integration of functions of more than one variable. It also introduces joint
probability distributions as an application.

Contents
6.1 Double Integrals
6.2 Double Integrals over General Regions
6.3 Joint Probability Distributions
6.4 Triple Integrals
Section 6.1

Double Integrals
Goals:

1 Approximate the volume under a graph by adding prisms.


2 Calculate the volume under a graph using a double integral.

In single-variable calculus, the definite integral computes a total change from a rate of change. It also solves a geometry problem: finding the area under a graph. This connection means that we can use geometric intuition to understand integrals better. Integrals of multi-variable functions also allow us to aggregate rates into totals. To build our intuition, we begin with the geometric problem that they solve.

Question 6.1.1
How Do We Approximate the Volume Under z = f (x, y)?

Begin by remembering our construction of the single-variable definite integral. We approximated the area under the graph y = f(x) by rectangles. Smaller rectangles give a better approximation, and we defined the limit of these approximations to be the definite integral.
\[\int_a^b f(x)\,dx = \lim_{\Delta x \to 0} \sum_{i=1}^{n} f(x_i^*)\,\Delta x\]

Figure: The area under y = f (x) approximated by rectangles

Consider a function f(x, y) and a rectangle D = {(x, y) : 0 ≤ x ≤ 4, 0 ≤ y ≤ 2}. We would like to compute the volume under the part of the graph z = f(x, y) that lies above D. This will need to
be a signed volume, where volume below the xy-plane counts as negative. We approximate this volume
with prisms, because we have a formula for the volume of a prism.
We subdivide D into subrectangles. Over each subrectangle we place a prism. The height of each
prism is the height of the graph above a test point.

Figure: The volume under z = f (x, y) and the prisms that approximate it

If A is the area of each subrectangle, and $(x_i^*, y_i^*)$ is the test point in the ith subrectangle, then our approximation is
$$\text{Volume} \approx \sum_{i=1}^{n} f(x_i^*, y_i^*)\,A.$$

If our domain is not a rectangle, we may not be able to divide it into subrectangles. Luckily, the formula for the volume of a prism works for a base of any shape. We can still compute
$$\text{Volume} \approx \sum_{i=1}^{n} f(x_i^*, y_i^*)\,A_i.$$

Figure: A domain subdivided into irregular subregions

Notice that instead of a single variable A for the area of all subregions, we need a different area for each. For each i, $A_i$ denotes the area of the ith subregion.
For a reasonably well-behaved function f(x, y), the actual volume can be computed by taking a limit of these approximations. We call this limit the double integral.


Definition

Let D be a domain in $\mathbb{R}^2$. For a given division of D into n subregions denote
$A_i$, the area of the ith region,
$(x_i^*, y_i^*)$, any point in the ith region,
$|A|$, the diameter of the largest region.
We define the double integral of f(x, y) to be a limit over all possible divisions of D.
$$\iint_D f(x, y)\,dA = \lim_{|A| \to 0} \sum_{i=1}^{n} f(x_i^*, y_i^*)\,A_i$$

Remark

The diameter of a region is the distance between its two most distant points. Sending the largest diameter to 0 ensures that all of the regions’ diameters shrink to 0.
Notice that we do not take the limit as the area goes to 0. If only the areas approach 0, the regions could become long and thin. The test points could then all be chosen from one end of the domain, giving a sum that is unrepresentative of the whole.
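To see the limit in action, here is a minimal Python sketch. It assumes a rectangular domain cut into a uniform grid of congruent subrectangles and uses each subrectangle's midpoint as the test point; `midpoint_sum` is a hypothetical helper, not a library function. For $f(x, y) = x^2 y$ over $[0, 2] \times [0, 1]$ (the 2 × 2 grid uses the same test points as Example 6.1.2), the sums settle toward 4/3 as the grid is refined.

```python
def midpoint_sum(f, a, b, c, d, nx, ny):
    """Approximate the double integral of f over [a, b] x [c, d].

    The rectangle is divided into nx * ny congruent subrectangles, and each
    prism's height is f evaluated at the midpoint of its base.
    """
    dx = (b - a) / nx
    dy = (d - c) / ny
    area = dx * dy  # A: every subrectangle has the same area
    total = 0.0
    for i in range(nx):
        x_star = a + (i + 0.5) * dx
        for j in range(ny):
            y_star = c + (j + 0.5) * dy
            total += f(x_star, y_star) * area
    return total


def f(x, y):
    return x**2 * y


# Halving the side lengths shrinks the largest diameter |A| toward 0.
for n in (2, 4, 8, 16, 32):
    print(n, midpoint_sum(f, 0, 2, 0, 1, n, n))
```

A uniform grid is only one admissible family of divisions; the definition allows any division whose largest diameter shrinks to 0.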

Example 6.1.2
Approximating a Double Integral

Consider $\iint_D x^2 y\,dA$, where D is the region shown here. Approximate the integral using the division of D shown, and evaluating f(x, y) at the midpoint of each rectangle.
Figure: The region D, divided into rectangles (x-axis ticks at 1 and 2)
Solution

The value of A is the area of each rectangle. In this case that is

A = (1)(0.5) = 0.5.

The test points are the midpoints of each rectangle:
$(x_1^*, y_1^*) = (0.5, 0.25)$, $(x_2^*, y_2^*) = (0.5, 0.75)$, $(x_3^*, y_3^*) = (1.5, 0.25)$, $(x_4^*, y_4^*) = (1.5, 0.75)$.

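We can expand the sum and evaluate:
$$\begin{aligned}
\text{Volume} &\approx \sum_{i=1}^{4} f(x_i^*, y_i^*)\,A \\
&\approx A \sum_{i=1}^{4} f(x_i^*, y_i^*) \\
&\approx A\left[f(0.5, 0.25) + f(0.5, 0.75) + f(1.5, 0.25) + f(1.5, 0.75)\right] \\
&\approx 0.5\left[(0.5)^2(0.25) + (0.5)^2(0.75) + (1.5)^2(0.25) + (1.5)^2(0.75)\right] \\
&\approx 0.5(2.5) = 1.25
\end{aligned}$$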
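The arithmetic of this approximation can be checked with a few lines of Python (a sketch; the variable names are our own). It evaluates $f(x, y) = x^2 y$ at the four midpoints and multiplies the sum by the common area A = 0.5.

```python
def f(x, y):
    return x**2 * y


# Midpoints (x_i*, y_i*) of the four 1-by-0.5 rectangles
test_points = [(0.5, 0.25), (0.5, 0.75), (1.5, 0.25), (1.5, 0.75)]
A = 1 * 0.5  # area of each rectangle

approx = A * sum(f(x, y) for x, y in test_points)
print(approx)  # prints 1.25
```

The exact value, computed in Example 6.1.4, is 4/3, so this coarse approximation is an underestimate.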




Question 6.1.3
How Do We Evaluate Double Integrals?

We already know another way of computing a volume. We can compute the area of the cross sections perpendicular to the x-axis. Let the function A(x) denote this area at each x. Then
$$\text{Volume} = \int_a^b A(x)\,dx$$
A(x) is itself the area under a curve. In a particular cross section, x is constant, and f(x, y) is a function of y. The area below this graph is the integral
$$A(x) = \int_c^d f(x, y)\,dy$$
We can put these together to obtain an iterated integral, an integral whose integrand is itself an integral.


Figure: Cross sections of the region below the graph: z = f (x, y)

This method computes the same signed volume as the double integral we defined. The formal
argument that they are equivalent is called Fubini’s theorem.

Theorem [Fubini’s Theorem]

For any domain D (and a reasonably well-behaved f) we have
$$\iint_D f(x, y)\,dA = \int_a^b \left( \int_c^d f(x, y)\,dy \right) dx$$
where a and b are the x bounds of D, and c and d are the y bounds of the cross section at each x.
Alternately, we can write
$$\iint_D f(x, y)\,dA = \int_c^d \left( \int_a^b f(x, y)\,dx \right) dy$$
where c and d are the y bounds of D, and a and b are the x bounds of the cross section at each y.
Notation

We will generally omit the parentheses and write
$$\int_a^b \int_c^d f(x, y)\,dy\,dx.$$
In some cases, rather than figuring out what a, b, c and d are, we will use a hybrid notation. It indicates a particular order of integration but does not go into details about the bounds of x and y.
$$\iint_D f(x, y)\,dy\,dx.$$

Example 6.1.4
Using Fubini’s Theorem

Compute $\iint_D x^2 y\,dA$, where D is the region shown here:
Figure: The region D (x-axis ticks at 1 and 2)
Solution

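The x bounds of this region are 0 ≤ x ≤ 2. The y bounds are 0 ≤ y ≤ 1. We rewrite this as an iterated integral and solve:
$$\begin{aligned}
\iint_D x^2 y\,dA &= \int_0^2 \int_0^1 x^2 y\,dy\,dx && \text{(Fubini's theorem)} \\
&= \int_0^2 \left. \frac{x^2 y^2}{2} \right|_{y=0}^{1} dx && \text{(FTC on the inner integral)} \\
&= \int_0^2 \left( \frac{x^2 \cdot 1^2}{2} - \frac{x^2 \cdot 0^2}{2} \right) dx && \text{(plug in y values)} \\
&= \int_0^2 \frac{x^2}{2}\,dx \\
&= \left. \frac{x^3}{6} \right|_{x=0}^{2} \\
&= \frac{8}{6} - \frac{0}{6} \\
&= \frac{4}{3}
\end{aligned}$$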
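Fubini's theorem says the order of integration does not change the value. A minimal numerical sketch in Python, assuming a composite midpoint rule (our choice) for each single-variable integral through a hypothetical helper `integrate`: the inner call computes the cross-sectional area A(x) (or its analogue in y), and both orders land near 4/3.

```python
def integrate(g, lo, hi, n=200):
    """Composite midpoint rule for the single-variable integral of g on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


def f(x, y):
    return x**2 * y


def A(x):
    """Area of the cross section perpendicular to the x-axis at x."""
    return integrate(lambda y: f(x, y), 0, 1)


dy_dx = integrate(A, 0, 2)  # integrate dy first, then dx

# Cross sections perpendicular to the y-axis: integrate dx first, then dy.
dx_dy = integrate(lambda y: integrate(lambda x: f(x, y), 0, 2), 0, 1)

print(dy_dx, dx_dy)  # both close to 4/3
```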

Question 6.1.5
Can We Break a Double Integral into a Product of Single Integrals?

In general, we can’t expect to pull the inner integral of $\iint_D f(x, y)\,dy\,dx$ out of the outer integral (using the constant multiple rule). The y-bounds may depend on x, and the y terms may not factor out of the integrand. However, for certain functions and domains, this factoring is possible.

Theorem
$$\int_a^b \left( \int_c^d f(x) g(y)\,dy \right) dx = \left( \int_a^b f(x)\,dx \right) \left( \int_c^d g(y)\,dy \right)$$

We won’t be able to use this theorem all the time. It has two important requirements:

1 The bounds of integration (a, b, c, d) are constants. We’ll see integrals soon where this is not the
case.
2 The integrand can be factored into a function of x times a function of y. Most two-variable
functions cannot.
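When both requirements hold, the two sides agree. In this Python sketch (the midpoint-rule helper `integrate` is hypothetical, not a library call), $f(x) = x^2$ and $g(y) = y$ on the constant bounds $[0, 2] \times [0, 1]$ give the same value whether we compute the iterated integral or the product of two single integrals.

```python
def integrate(g, lo, hi, n=200):
    """Composite midpoint rule for the single-variable integral of g on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


def f(x):
    return x**2


def g(y):
    return y


# Iterated integral of f(x) g(y) over [0, 2] x [0, 1] (constant bounds).
iterated = integrate(lambda x: integrate(lambda y: f(x) * g(y), 0, 1), 0, 2)

# Product of two single-variable integrals.
product = integrate(f, 0, 2) * integrate(g, 0, 1)

print(iterated, product)
```

Because the inner bounds are constants, the inner sum is the same multiple of f(x) for every x, which is exactly why the product formula works here.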

Example 6.1.6
Integrating a Product

Use a product decomposition to compute $\iint_D x^2 y\,dA$, where D is the region shown here:
Figure: The region D (x-axis ticks at 1 and 2)
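Solution
$\iint_D x^2 y\,dA = \int_0^2 \int_0^1 x^2 y\,dy\,dx$ has constant bounds and the integrand can factor as $(x^2)(y)$. The product theorem applies:
$$\begin{aligned}
\int_0^2 \int_0^1 x^2 y\,dy\,dx &= \left( \int_0^2 x^2\,dx \right) \left( \int_0^1 y\,dy \right) \\
&= \left( \left. \frac{x^3}{3} \right|_0^2 \right) \left( \left. \frac{y^2}{2} \right|_0^1 \right) \\
&= \left( \frac{2^3}{3} - \frac{0^3}{3} \right) \left( \frac{1^2}{2} - \frac{0^2}{2} \right) \\
&= \left( \frac{8}{3} \right) \left( \frac{1}{2} \right) \\
&= \frac{4}{3}
\end{aligned}$$
This matches our computation from Example 6.1.4.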

Remark

The product decomposition does not save us much work in most cases, but it can help us avoid mixing
up the variables.

Application 6.1.7
Rates (per Area)

Single integrals can compute total change given a rate of change.

meters traveled per second −→ total meters traveled.

GDP growth per year −→ total GDP growth.


mass of a chemical produced per second −→ total mass produced.

Double integrals can compute a total from a rate per unit of area. Integrating rainfall per square
kilometer gives the total rain that fell in a watershed.


Figure: A rainfall density map

Integrating watts per square meter on a solar array gives the total energy generated.

Figure: Solar panels


By Jud McCranie - Own work, CC BY-SA 4.0

https://commons.wikimedia.org/w/index.php?curid=70132767

Application 6.1.8
Probability

If we measure two variables for each point in a data set, then the probability that a random data point lies in a given region is the double integral of a joint density function over that region.
Figure: A highly correlated set of observations and an uncorrelated joint density function

Section 6.1
Exercises

Summary Questions

Q1 What shape do we use to approximate volume under a surface?

Q2 What formula do we use to compute the exact volume under a graph z = f (x, y)?

Q3 What does Fubini’s Theorem tell us?

Q4 What conditions do you need in order to write a double integral as a product of single integrals?

6.1.1

Q5 Suppose that we are approximating the volume under z = f(x, y) over T, the triangle with vertices (0, 0), (2, 0) and (0, 1). We’d like to use subregions about 0.25 units long per side. Here are two options:
Cover as much of T as possible with square prisms, and use triangular prisms in the remaining spots.
Cover as much of T as possible with square prisms, and just forget about the remaining space.

a Draw a diagram of where the squares and triangles could reasonably be placed.

b Suppose the side length of the squares shrinks to be arbitrarily small. Explain why it does not matter which of the two options we use in these approximations.

Q6 Let S be the unit square:

S = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}
Suppose we approximate $\iint_S x\,dA$ by n prisms whose bases are rectangles of length 1 in the x-direction and width $\frac{1}{n}$ in the y-direction.

a How could you pick test points in each rectangle to ensure that the value of this approximation is 0, no matter what n is?

b How could you pick test points in each rectangle to ensure that the value of this approximation is 1, no matter what n is?

c Does the fact that both of these approximations are possible no matter how many rectangles we use mean that $\iint_S x\,dA$ does not exist? Explain.

6.1.2

Q7 Show how to approximate the integral $\int_0^6 \int_3^{12} xy\,dy\,dx$ using six 3 unit by 3 unit squares and using their lower right corners as test points. You do not need to simplify the arithmetic.
Q8 Approximate the value of $\int_0^4 \int_{-2}^{4} \sin^2(\pi x y)\,dy\,dx$ by dividing the domain into an array of 4 rectangles (2 × 2), and evaluating the function at the midpoint of each.

Q9 Consider the integral
$$\int_0^6 \int_5^9 \frac{x}{y}\,dy\,dx.$$
a Show how to approximate the integral using six 2 unit by 2 unit squares and using their lower right corners as test points. You do not need to simplify the arithmetic.
b Explain how you can tell whether your approximation in a is an overestimate or underestimate without computing the actual value of the integral.

Q10 Let T be the triangle with vertices (0, 0), (1, 0) and (0, 2). Show how to approximate $\iint_T e^{x+y}\,dA$ by dividing T into four right triangles with legs of length 1 and $\frac{1}{2}$. Use the midpoints of the hypotenuses as the test points.

6.1.3

Q11 Let R be the rectangle

R = {(x, y) : 0 ≤ x ≤ 5, 0 ≤ y ≤ 3}.
Let S be the solid region above R and below the graph $z = y^2 \sin \pi x + 9$. What is the area of the y = 2 cross-section of S?

Q12 Let R be the rectangle

R = {(x, y) : − 2 ≤ x ≤ 2, −1 ≤ y ≤ 1}.

Let S be the solid region above R and below the graph $z = x^2 y + x y^2$. Write a function A(x) which gives the area of the cross section of S perpendicular to the x-axis at each value of x.

6.1.4

Q13 Let R be the rectangle

R = {(x, y) : 0 ≤ x ≤ 5, 0 ≤ y ≤ 3}.
Compute $\iint_R (y^2 \sin \pi x + 9)\,dA$.

Q14 Let R be the rectangle

R = {(x, y) : − 2 ≤ x ≤ 2, −1 ≤ y ≤ 1}.
Compute $\iint_R (x^2 y + x y^2)\,dA$.

Q15 Evaluate $\int_4^5 \int_0^3 y e^x\,dy\,dx$.
Q16 Evaluate $\int_0^{10} \int_2^4 (y^3 - x)\,dy\,dx$.

6.1.5

Q17 Let R be the rectangle

R = {(x, y) : − a ≤ x ≤ b, c ≤ y ≤ d}.

Let S be the solid region above R and below the graph z = f (x)g(y). Write a function A(x)
which gives the area of the cross section of S perpendicular to the x-axis at each value of x.
Explain why you can factor the f (x) out of this integral.

Q18 Let R be the rectangle

R = {(x, y) : − 2 ≤ x ≤ 2, −1 ≤ y ≤ 1}.
Explain why the product decomposition theorem does not apply to $\iint_R (x^2 y + x y^2)\,dA$.

6.1.6

Q19 Let R be the rectangle

R = {(x, y) : 0 ≤ x ≤ 5, 0 ≤ y ≤ 3}.
Write $\iint_R y^2 \sin \pi x\,dA$ as a product of two single-variable integrals.
Q20 Write $\int_{-3}^{3} \int_2^5 \frac{1}{y^2}\,dy\,dx$ as a product of two single-variable integrals.
6.1.7

Q21 A corrugated metal sheet has density $d(x, y) = 3 + \sin 2x$ kg/m². What is the mass of the rectangular sheet $R = \{(x, y) : 0 \le x \le 4\pi,\ 0 \le y \le 10\}$?

Q22 The shadow of a tree passes over part of a solar panel each day, covering the bottom of the panel more of the day than the top. The rate of daily energy generation per unit of area at the point (x, y) is given by $p(x, y) = 8 \sin\left(y + \frac{\pi}{3}\right)$ kilowatt hours per square meter. Compute the total energy generated per day by the panel whose bounds (in meters) are given by 0 ≤ x ≤ 1 and $0 \le y \le \frac{\pi}{6}$.

Synthesis & Extension

Q23 Suppose we wanted to compute the volume above z = f (x, y) and below z = g(x, y) over the
rectangle
R = {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d}.
What double integral would compute this volume?

Q24 Suppose you want to approximate
$$\int_a^b \int_c^d f(x, y)\,dy\,dx$$
by rectangles sampled from either upper-left, upper-right, lower-left or lower-right corners. If you are told that $f_x(x, y) < 0$ at all points (x, y), what does that tell you about which approximations are larger than which?
Section 6.2

Double Integrals over General Regions


Goals:

1 Set up double integrals over regions that are not rectangles.


2 Evaluate integrals where the bounds contain variables.
3 Decide when to make $\int dy$ the outer integral, and compute the change of bounds.

So far, we have computed double integrals over rectangular domains. In this section, we consider
double integrals over more complicated domains.

Example 6.2.1
Integrating Over a Polygon

Let D be the triangle with vertices (0, 0), (4, 0) and (4, 2). Calculate
$$\iint_D 4xy\,dA$$

Solution

The naive approach would be to use the x and y bounds of D to write the integral
$$\iint_D 4xy\,dA = \int_0^4 \int_0^2 4xy\,dy\,dx$$

There are a couple of reasons to distrust this approach:
1 These are the same bounds we would use for a rectangle, and D is not a rectangle.
2 The y bounds are supposed to be the bounds of the cross section, and not every cross section extends from y = 0 to y = 2.
In fact, the y bounds of the cross section depend on which cross section we’re looking at. At x = 1 the cross section has area $A(1) = \int_0^{0.5} 4xy\,dy$. At x = 3 the area is $A(3) = \int_0^{1.5} 4xy\,dy$. Another way to say this is that the y bounds are a function of x. No matter what x we choose, the lower y bound appears to be 0. The upper bound always lies on the line from (0, 0) to (4, 2). We can express the y-values of this line as a function of x by writing its equation: $y = \frac{1}{2}x$. The correct iterated integral is
$$\iint_D 4xy\,dA = \int_0^4 \int_0^{\frac{1}{2}x} 4xy\,dy\,dx$$

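This may appear harder to solve, but it isn’t. The only difference is that when we apply the fundamental theorem of calculus to the inner integral, we plug in an expression instead of a number.
$$\begin{aligned}
\iint_D 4xy\,dA &= \int_0^4 \int_0^{\frac{1}{2}x} 4xy\,dy\,dx && \text{(Fubini's theorem)} \\
&= \int_0^4 \left. 2xy^2 \right|_0^{\frac{1}{2}x} dx && \text{(FTC)} \\
&= \int_0^4 \left( 2x \left(\tfrac{1}{2}x\right)^2 - 2x(0)^2 \right) dx \\
&= \int_0^4 \frac{x^3}{2}\,dx \\
&= \left. \frac{x^4}{8} \right|_{x=0}^{4} && \text{(FTC again)} \\
&= \frac{4^4}{8} - \frac{0^4}{8} \\
&= 32
\end{aligned}$$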
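A minimal Python sketch of the same computation, assuming the composite midpoint rule for every single-variable integral. The hypothetical helper `iterated_dy_dx` accepts the inner bounds as functions of x, which is exactly the change this example introduces. Using the naive constant upper bound y = 2 instead gives 64, the integral over the full rectangle.

```python
def integrate(g, lo, hi, n=400):
    """Composite midpoint rule for the single-variable integral of g on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


def iterated_dy_dx(f, a, b, y_low, y_high):
    """Integrate f over {a <= x <= b, y_low(x) <= y <= y_high(x)}, inner integral in y."""

    def cross_section(x):
        # Area of the cross section at x; its bounds depend on x.
        return integrate(lambda y: f(x, y), y_low(x), y_high(x))

    return integrate(cross_section, a, b)


def f(x, y):
    return 4 * x * y


triangle = iterated_dy_dx(f, 0, 4, lambda x: 0.0, lambda x: x / 2)
naive = iterated_dy_dx(f, 0, 4, lambda x: 0.0, lambda x: 2.0)
print(triangle, naive)  # close to 32 and 64
```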

Main Idea

To find the bounds of a double integral


1 Find the x values where the domain begins and ends. These numbers are the bounds of the outer integral.

2 Find the functions (of the form y = g(x)) which define the top and bottom of the domain. These functions are the bounds of the inner integral.
Question 6.2.2
What Are the Integral Laws for Double Integrals?

Some single-variable integral laws apply to double integrals as well (provided the integrals exist).
1 The sum rule:
$$\iint_D \big(f(x, y) + g(x, y)\big)\,dA = \iint_D f(x, y)\,dA + \iint_D g(x, y)\,dA$$
2 The constant multiple rule:
$$\iint_D c f(x, y)\,dA = c \iint_D f(x, y)\,dA$$
3 If D is the union of two non-overlapping subdomains $D_1$ and $D_2$ then
$$\iint_D f(x, y)\,dA = \iint_{D_1} f(x, y)\,dA + \iint_{D_2} f(x, y)\,dA$$

Example 6.2.3
A Region Without a (Single) Bottom Curve


Let D be the region bounded by $y = \sqrt{x}$, $y = 0$ and $y = x - 6$. Calculate
$$\iint_D (x + y)\,dA.$$
We begin by finding the intersections of these graphs. There are three pairs of graphs to solve for.
$$\begin{aligned}
\sqrt{x} &= x - 6 & \qquad 0 &= \sqrt{x} & \qquad 0 &= x - 6 \\
x &= (x - 6)^2 & 0 &= x & 6 &= x \\
0 &= x^2 - 13x + 36 \\
0 &= (x - 4)(x - 9) \\
x &= 4 \text{ or } x = 9
\end{aligned}$$
When we square both sides of an equation we have to check our solutions. x = 4 does not satisfy $\sqrt{x} = x - 6$ but x = 9 does. Look at the graph of these functions. There is not a single y lower bound that applies to all cross sections of this region. For some values of x, the lower bound lies on y = 0. For others it lies on y = x − 6. We will present three solutions to this problem. We’ll only evaluate the last one.
Solution 1

Using the third integral law, we break up D into two subdomains, each of which has a single bottom curve. The break happens at x = 6 since that is where y = 0 meets y = x − 6.
$$\int_0^6 \int_0^{\sqrt{x}} (x + y)\,dy\,dx + \int_6^9 \int_{x-6}^{\sqrt{x}} (x + y)\,dy\,dx$$

Solution 2

D can be written as the region between y = 0 and $y = \sqrt{x}$ with a triangle removed. We can use this to write $\iint_D (x + y)\,dA$ as a difference of two integrals.
$$\int_0^9 \int_0^{\sqrt{x}} (x + y)\,dy\,dx - \int_6^9 \int_0^{x-6} (x + y)\,dy\,dx$$

Solution 3

Instead of taking cross sections perpendicular to the x-axis we can take cross sections perpendicular to the y-axis. In this case, we need to know the x bounds of each cross section (as a function of y). Drawing the horizontal line segments through D at each y, we see that the upper x-bound lies on y = x − 6 and the lower x-bound lies on $y = \sqrt{x}$. We need to write these x values as functions of y, so we solve each equation for x:
$$\begin{aligned}
y &= \sqrt{x} & \qquad y &= x - 6 \\
y^2 &= x & y + 6 &= x
\end{aligned}$$
The lower y bound for the region is y = 0. The upper y bound is the intersection of $y = \sqrt{x}$ and y = x − 6, where x = 9 and y = 3. Thus we can write
$$\begin{aligned}
\iint_D (x + y)\,dA &= \int_0^3 \int_{y^2}^{y+6} (x + y)\,dx\,dy \\
&= \int_0^3 \left. \left( \frac{x^2}{2} + xy \right) \right|_{y^2}^{y+6} dy \\
&= \int_0^3 \left( \frac{y^2 + 12y + 36}{2} + y^2 + 6y - \frac{y^4}{2} - y^3 \right) dy \\
&= \int_0^3 \left( -\frac{1}{2}y^4 - y^3 + \frac{3}{2}y^2 + 12y + 18 \right) dy \\
&= \left. \left( -\frac{1}{10}y^5 - \frac{1}{4}y^4 + \frac{1}{2}y^3 + 6y^2 + 18y \right) \right|_0^3 \\
&= -\frac{243}{10} - \frac{81}{4} + \frac{27}{2} + 54 + 54 \\
&= \frac{1539}{20}
\end{aligned}$$
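As a numerical cross-check of Solutions 1 and 3, here is a Python sketch with a hypothetical midpoint-rule helper `integrate`. Both orders of integration should agree with each other and with the value of the last antiderivative line, $-\frac{243}{10} - \frac{81}{4} + \frac{27}{2} + 54 + 54 = 76.95$.

```python
import math


def integrate(g, lo, hi, n=400):
    """Composite midpoint rule for the single-variable integral of g on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


def f(x, y):
    return x + y


# Solution 1: split D at x = 6 and integrate dy dx over each piece.
left = integrate(lambda x: integrate(lambda y: f(x, y), 0, math.sqrt(x)), 0, 6)
right = integrate(lambda x: integrate(lambda y: f(x, y), x - 6, math.sqrt(x)), 6, 9)

# Solution 3: a single integral in the order dx dy.
single = integrate(lambda y: integrate(lambda x: f(x, y), y**2, y + 6), 0, 3)

print(left + right, single)  # both close to 76.95
```

Agreement between two independently bounded setups is a useful way to catch a wrong bound, since a mistake in one order rarely reproduces in the other.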

Main Idea

For a region without a single upper or lower curve, the strategies for integrating a function are the same
as the strategies for computing the area.

1 Break the region into two or more pieces, each of which has a single top curve and a single bottom
curve.
2 See if the region has a single left curve (lower x bound) and a single right curve (upper x bound).
If so, solve the bounds for x and change the order of integration.

Example 6.2.4
Using Anti-Symmetry


Let D be the region $x^2 + y^2 \le 9$. Evaluate $\iint_D \sqrt[3]{x}\,\sqrt{y + 3}\,dA$.
Solution

The function f and the domain D both have a particular type of symmetry. D is symmetric about the y-axis. We can flip the right side of D over onto the left side of D and they match up perfectly. We can express this transformation in algebra by
$$(x, y) \to (-x, y)$$
Furthermore, $f(x, y) = \sqrt[3]{x}\,\sqrt{y + 3}$ and $f(-x, y) = \sqrt[3]{-x}\,\sqrt{y + 3}$ are opposites (they sum to 0). Thus the height of the graph z = f(x, y) above the right half of D is equal to the depth of the graph below the left half of D. These two regions have opposite signed volumes. Their sum, which is the integral over all of D, is 0.
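A quick numerical sanity check of the antisymmetry argument, as a Python sketch under our own assumptions: we lay a uniform grid over the square [−3, 3] × [−3, 3], keep the cells whose centers lie in D, and add up prisms. Because the grid is symmetric about the y-axis, each prism over the right half is matched by one of opposite signed volume over the left half, so the total is 0 up to floating-point rounding. The helper `cbrt` is ours; it takes real cube roots of negative numbers, which `t ** (1/3)` does not do in Python (it returns a complex number).

```python
import math


def cbrt(t):
    """Real cube root; unlike t ** (1/3), this also handles negative t."""
    return math.copysign(abs(t) ** (1 / 3), t)


def f(x, y):
    return cbrt(x) * math.sqrt(y + 3)


n = 300
h = 6 / n  # cells of side h covering the square [-3, 3] x [-3, 3]
total = 0.0
for i in range(n):
    x = -3 + (i + 0.5) * h
    for j in range(n):
        y = -3 + (j + 0.5) * h
        if x * x + y * y <= 9:  # keep only cells whose center lies in D
            total += f(x, y) * h * h

print(total)  # 0 up to rounding error
```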

Main Idea
We can argue that an integral $\iint_D f(x, y)\,dA$ is equal to zero when
1 D is symmetric about some line L. If we folded it over L, one side of D would lie exactly on the other side.
2 f is antisymmetric about L. For each point (x, y) in D the image of (x, y) across L, denoted $r_L(x, y)$, has the property:
$$f(r_L(x, y)) = -f(x, y).$$
Example 6.2.5
Using Order to Manipulate the Integrand

Let D be the triangle with vertices (0, 0), (0, 2) and (1, 2). Calculate
$$\iint_D e^{(y^2)}\,dA.$$

Solution

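D is the region above y = 2x and below y = 2 so we can write the integral
$$\iint_D e^{(y^2)}\,dA = \int_0^1 \int_{2x}^2 e^{(y^2)}\,dy\,dx$$
The next step is to integrate with respect to y, but $e^{(y^2)}$ does not have an antiderivative that we can evaluate precisely. The trick in this case is to change the order of integration. The lower x bound is x = 0 and the upper x bound is $x = \frac{y}{2}$.
$$\begin{aligned}
\iint_D e^{(y^2)}\,dA &= \int_0^2 \int_0^{\frac{y}{2}} e^{(y^2)}\,dx\,dy \\
&= \int_0^2 \left. e^{(y^2)}\, x \right|_0^{\frac{y}{2}} dy \\
&= \int_0^2 e^{(y^2)}\, \frac{y}{2}\,dy && \text{(u-substitution: } u = y^2,\ du = 2y\,dy,\ \tfrac{1}{4}\,du = \tfrac{y}{2}\,dy) \\
& && (y = 0 \Rightarrow u = 0,\quad y = 2 \Rightarrow u = 4) \\
&= \int_0^4 \frac{1}{4} e^u\,du \\
&= \left. \frac{1}{4} e^u \right|_0^4 \\
&= \frac{e^4}{4} - \frac{1}{4}
\end{aligned}$$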
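Switching the order is what makes an exact answer possible, but a numerical method needs no antiderivative at all. In this Python sketch (with a hypothetical midpoint-rule helper `integrate`), the original dy dx order and the switched dx dy order both approximate $\frac{e^4}{4} - \frac{1}{4} \approx 13.3995$.

```python
import math


def integrate(g, lo, hi, n=400):
    """Composite midpoint rule for the single-variable integral of g on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


def f(x, y):
    return math.exp(y**2)


# Original order: 2x <= y <= 2 for each 0 <= x <= 1.
dy_dx = integrate(lambda x: integrate(lambda y: f(x, y), 2 * x, 2), 0, 1)

# Switched order: 0 <= x <= y/2 for each 0 <= y <= 2.
dx_dy = integrate(lambda y: integrate(lambda x: f(x, y), 0, y / 2), 0, 2)

exact = math.exp(4) / 4 - 1 / 4
print(dy_dx, dx_dy, exact)
```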

Main Idea

If we don’t know the anti-derivative of an integrand with respect to one variable, try switching the order
of integration. Remember to change the bounds too.

Application 6.2.6
Area of a Domain

We can use a double integral of f to measure the domain of integration, or compute statistics about
f . Here are two examples.

Theorem

The area of a region D can be calculated as
$$\iint_D 1\,dA.$$

This theorem may seem counter-intuitive at first, because a double integral computes a volume, not
an area. However, the volume under a graph of height 1 is equal to 1 times the area of the base. As
long as we change from cubic units to square units, the integral will be numerically equal to the area.
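Integrating the constant 1 is easy to check numerically. This Python sketch (the helper names are hypothetical) measures the triangle from Example 6.2.1 in the order dy dx and the region from Example 6.2.3 in the order dx dy. The results approach 4, which is $\frac{1}{2} \cdot 4 \cdot 2$, and 13.5, which is $\int_0^3 (y + 6 - y^2)\,dy$.

```python
def integrate(g, lo, hi, n=400):
    """Composite midpoint rule for the single-variable integral of g on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


def area_dy_dx(a, b, y_low, y_high):
    """Area of {a <= x <= b, y_low(x) <= y <= y_high(x)} as the double integral of 1."""
    return integrate(lambda x: integrate(lambda y: 1.0, y_low(x), y_high(x)), a, b)


def area_dx_dy(c, d, x_left, x_right):
    """Area of {c <= y <= d, x_left(y) <= x <= x_right(y)} as the double integral of 1."""
    return integrate(lambda y: integrate(lambda x: 1.0, x_left(y), x_right(y)), c, d)


triangle = area_dy_dx(0, 4, lambda x: 0.0, lambda x: x / 2)  # Example 6.2.1
region = area_dx_dy(0, 3, lambda y: y**2, lambda y: y + 6)  # Example 6.2.3
print(triangle, region)  # close to 4 and 13.5
```

The inner integral of 1 is just the length of each cross section, so this is the familiar "area between curves" formula in disguise.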

Figure: A solid of height 1 over a domain D

Section 6.2
Exercises

Summary Questions

Q1 What are the steps for writing a double integral over a general region?

Q2 How do you decide whether dx or dy is the inner variable?

Q3 What is antisymmetry, and how can we use it to evaluate integrals?

Q4 How can we use a double integral to compute the area of a region?

6.2.1

Q5 If D is the triangle with vertices (0, −2), (4, 0) and (0, 8), calculate $\iint_D x^2 y\,dA$.

Q6 Integrate the function f (x, y) = y over the region enclosed by the lines y = 5x, y = 6 − x and
y = x.

Q7 Let f (x, y) be a function and D be the trapezoid with vertices (3, 1), (3, 6), (6, 5) and (6, 4).
Draw D and set up the bounds of $\iint_D f(x, y)\,dA$.

Q8 Let D be the parallelogram with vertices (0, 1), (0, 4), (5, 3) and (5, 6). Let f(x, y) be a continuous function.
a Set up the bounds of integration of $\iint_D f(x, y)\,dA$.

b Could we save time by computing $\int_0^5 \int_1^4 f(x, y)\,dy\,dx$ instead? Explain.
Q9 Let D be the region enclosed by $y = 6 - x^2$ and $y = x$. Evaluate $\iint_D x e^y\,dA$.

Q10 If D is the region bounded by $y = x^2$ and $y = 8 - x^2$, set up and calculate $\iint_D x^3\,dA$.
6.2.2

Q11 Let T be the triangle with vertices (0, 3), (7, 10) and (9, 0). Set up the bounds for two integrals whose sum is $\iint_T f(x, y)\,dA$.

Q12 Let P be the pentagon with vertices (0, 0), (0, 2), (4, 3), (4, 1) and (3, 0).
a Set up the bounds for two integrals whose sum is $\iint_P f(x, y)\,dA$.
b Set up the bounds for two integrals whose difference is $\iint_P f(x, y)\,dA$.

6.2.3

Q13 Let D be the region enclosed by y = ln x, x = 1 and y = 4 − ln x. Set up the integral
$$\iint_D f(x, y)\,dA$$
in two different ways, using both orders of dx and dy. Do not evaluate either.

Q14 Let $D = \{(x, y) : x^2 + y^2 \le 9,\ x \ge 0\}$. Draw D and set up $\iint_D f(x, y)\,dA$ in two different ways.

Q15 Consider the region D enclosed by $y = \sqrt{x}$, $y = 27\sqrt{x}$, and $y = 90 - x$.
a Rewrite $\iint_D f(x, y)\,dA$ as one or more integrals with differential dy dx. Do not evaluate.
b Rewrite $\iint_D f(x, y)\,dA$ as one or more integrals with differential dx dy. Do not evaluate.

Q16 Let $D = \{(x, y) : y \le 12 - x^2,\ y \ge x,\ y \ge -x\}$.
a Rewrite $\iint_D f(x, y)\,dA$ as one or more integrals with differential dy dx. Do not evaluate.
b Rewrite $\iint_D f(x, y)\,dA$ as one or more integrals with differential dx dy. Do not evaluate.
Q17 Draw the domain of the integral $\int_1^5 \int_0^{10-2x} f(x, y)\,dy\,dx$. Then rewrite the integral in the order dx dy.

Q18 Consider the integral $\int_{-6}^{6} \int_{-\sqrt{36 - y^2}}^{0} x^2\,dx\,dy$. Write this integral in the order dy dx.

6.2.4


Q19 Let $f(x, y) = \cos x \sin^3 y$. Argue that $\int_{-8}^{8} \int_{-\sqrt{64 - x^2}}^{\sqrt{64 - x^2}} f(x, y)\,dy\,dx = 0$.
Q20 Let $g(x, y) = x^3 e^{y^2}$. Argue that $\int_{-4}^{4} \int_{-3}^{3} g(x, y)\,dy\,dx = 0$.

Q21 Let R be the kite with vertices

(1, 1) (5, 7)
(7, 7) (7, 5)
Suppose you wanted to argue that $\iint_R f(x, y)\,dA = 0$ by a symmetry argument. Describe with a diagram or formula what would need to be true about f(x, y) for such an argument to work.

Q22 Let D be the trapezoid with vertices (0, 5), (6, 5), (2, 0) and (4, 0). Let g(x, y) be some continuous function.
a Sketch D and set up the bounds of integration for $\iint_D g(x, y)\,dA$ such that you obtain one integral (not a sum or difference of integrals).
b If you wanted to use an antisymmetry argument to show that $\iint_D g(x, y)\,dA = 0$, what would need to be true about g(x, y)? Express your answer as a formula.

Q23 Let h(x) be a one-variable function that takes only positive values. Let f(x, y) be a two-variable function. Describe the antisymmetry of f that would allow us to conclude that $\int_a^b \int_{-h(x)}^{h(x)} f(x, y)\,dy\,dx = 0$.

Q24 Suppose you are given that $f(x, y) = -f(-y, -x)$. Over what domains D can we argue by symmetry that $\iint_D f(x, y)\,dA = 0$? Draw an example of one.
6.2.5

Q25 Would the method in this example still work if we instead defined D to have vertices (0, 0), (1, 0), and (0, 2)? Explain.

Q26 Suggest a domain D over which it would be possible to evaluate $\iint_D e^{y^3}\,dA$.
Q27 Evaluate $\int_0^2 \int_0^3 y e^{xy}\,dy\,dx$.
Q28 Evaluate $\int_0^3 \int_x^3 \sin(\pi y^2)\,dy\,dx$.

6.2.6


Q29 Use geometry to evaluate $\int_0^{10} \int_0^{\sqrt{100 - x^2}} dy\,dx$.
Q30 Use geometry to evaluate $\int_0^8 \int_0^{4 - \frac{1}{2}x} dy\,dx$.

Synthesis & Extension

Q31 What is the geometric significance of the inner integral in a double integral of the form
$$\int_a^b \int_{g(y)}^{h(y)} f(x, y)\,dx\,dy?$$

Q32 Consider the integral
$$\int_{-4}^{4} \int_0^6 x^3 y\,dy\,dx$$
a Show how to approximate the value of this integral, dividing the domain into sub-rectangles of length 2 units and width 3 units and using the lower right corners as test points. You should evaluate any functions that appear in your estimate, but you do not need to simplify the arithmetic.

b Explain in a sentence or two how you can determine the exact value of this integral without calculating any anti-derivatives.

c Discuss what test point you could have picked in a, such that your approximation would have computed the exact value of the integral. Note: There are several relevant observations to make in response to this question.
Section 6.3

Joint Probability Distributions


Goals:

1 Integrate a joint density function to calculate a probability.


2 Recognize when random variables are independent.

Some of the most compelling statistical conclusions do not rely on one measurement but on many,
and the relationship between them. Suppose we test a drug by randomly giving different doses to different
participants, then measuring their symptoms. Knowing the likelihood of each level of symptoms doesn’t
tell you whether the drug is effective. Adding in the knowledge of what percentage of test subjects receive each dosage does not help. Instead you need to know how likely certain pairs of dose and outcome are:

(low dose, low symptoms) (low dose, high symptoms)


(medium dose, low symptoms) (no dose, medium symptoms)

If (no dose, high symptoms) and (high dose, low symptoms) are likely enough, then there is a correlation which points to the efficacy of the drug. Individual random variables with individual density functions cannot model this behavior. We need two-variable density functions and double integrals.

Question 6.3.1
How Do We Use Double Integrals to Compute Probabilities?

Recall how we modeled continuous random variables.

Definition

A function f is a probability density function for a random variable X if the chance of an outcome a < X < b is $\int_a^b f(x)\,dx$.

Definition

A pair (or more) of random variables X and Y, along with the likelihood of various outcomes (X, Y), is called a joint distribution. If the space of outcomes is continuous, the distribution is modeled by a joint probability density function $f_{X,Y}(x, y)$ as follows:
$$P(a \le X \le b \text{ and } c \le Y \le d) = \int_a^b \int_c^d f_{X,Y}(x, y)\,dy\,dx$$
More generally, for any region D in $\mathbb{R}^2$
$$P((X, Y) \text{ lies in } D) = \iint_D f_{X,Y}(x, y)\,dA.$$

Example 6.3.2
Using a Joint Density Function

Suppose the random variables X and Y have the joint density function
$$f_{X,Y}(x, y) = \begin{cases} x + y & \text{if } 0 \le x \le 1 \text{ and } 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Compute the probability that X is at least twice as large as Y .

Solution

We can write “X is at least twice as large as Y” with the inequality X ≥ 2Y. This is everything below the line $y = \frac{1}{2}x$. Call this region H. We’ll integrate f over this region. This may seem daunting, but f(x, y) = 0 outside the unit square. We can break H into two subregions, one that lies inside the square and one that lies outside. A diagram will make it easier to find the bounds.

Figure: The target region H and the unit square of possible outcomes

$$\begin{aligned}
P\left(Y \le \tfrac{1}{2}X\right) &= \iint_H f_{X,Y}(x, y)\,dA \\
&= \int_0^1 \int_0^{\frac{1}{2}x} (x + y)\,dy\,dx + \iint_{\text{the rest of } H} 0\,dA \\
&= \int_0^1 \left. \left( xy + \frac{y^2}{2} \right) \right|_0^{\frac{1}{2}x} dx \\
&= \int_0^1 \left( \frac{1}{2}x^2 + \frac{1}{8}x^2 \right) dx \\
&= \int_0^1 \frac{5}{8}x^2\,dx \\
&= \left. \frac{5}{24}x^3 \right|_0^1 \\
&= \frac{5}{24}
\end{aligned}$$
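A numerical check of this probability, sketched in Python with a hypothetical midpoint-rule helper `integrate`. The density is coded as a piecewise function, and the first integral confirms that its total probability over the unit square is 1 before we integrate over the part of H inside the square.

```python
def integrate(g, lo, hi, n=400):
    """Composite midpoint rule for the single-variable integral of g on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


def joint_density(x, y):
    """f_{X,Y}(x, y) = x + y on the unit square, and 0 elsewhere."""
    if 0 <= x <= 1 and 0 <= y <= 1:
        return x + y
    return 0.0


total = integrate(lambda x: integrate(lambda y: joint_density(x, y), 0, 1), 0, 1)

# Part of H inside the square: 0 <= x <= 1, 0 <= y <= x/2.
prob = integrate(lambda x: integrate(lambda y: joint_density(x, y), 0, x / 2), 0, 1)

print(total, prob, 5 / 24)
```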

Warning

The region of integration in this example has one fourth of the area of the total region of possibilities, yet the answer was $\frac{5}{24}$, not $\frac{1}{4}$. Do not confuse area with probability. Not all outcomes are equally likely to occur.
Since we got a low probability, relative to area, we can deduce that the probability density in the
region we examined is lower than at some other parts of the domain. That makes sense. The joint
density function x + y is largest in the upper right corner and lowest in the lower left. More of our
triangle was near the lower left than the upper right.


Exercise

Darmok and Jalad each travel to the island of Tanagra and arrive between noon and 4 PM. Let (X, Y) represent their respective arrival times in hours after noon. Suppose their joint density function is
$$f_{X,Y}(x, y) = \begin{cases} \frac{x}{32} & \text{if } 0 \le x \le 4 \text{ and } 0 \le y \le 4 \\ 0 & \text{otherwise} \end{cases}$$
1 What is the value of $\int_0^4 \int_0^4 f_{X,Y}(x, y)\,dy\,dx$?
2 Calculate the probability that Darmok arrives after 2 PM.
3 Calculate the probability that Darmok arrives before Jalad.
4 What does the distribution say about when Darmok is likely to arrive? What about Jalad?
5 Write an integral that computes the probability that they arrive within an hour of each other (set it up, don’t evaluate).

Question 6.3.3
What Is a Marginal Density Function?

Suppose we have a joint density function $f_{X,Y}(x, y)$. What if we are only interested in the values of X? Perhaps we want to compute the expected value. Recall that a density function $f_X(x)$ of X satisfies the property
$$P(a \le X \le b) = \int_a^b f_X(x)\,dx$$
How can we get this function from the joint density function? We can compute P(a ≤ X ≤ b).
$$P(a \le X \le b) = \int_a^b \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy\,dx$$
Compare this to the definition of a probability density function. Both compute the same probability. Both integrate over the same range of x-values. The only way for this to be true for all values of a and b is if the integrands are the same. This means that the inner integral $\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$ is equal to $f_X(x)$, the probability density function of X.
When we obtain a density function of one random variable from a joint distribution, we call it a
marginal density function.

Theorem

Given a joint distribution X, Y with joint density function $f_{X,Y}$, the individual variables have marginal density functions:
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$$
$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx$$

For each x-value $x_0$, the inner integral $\int_{-\infty}^{\infty} f_{X,Y}(x_0, y)\,dy$ is the area of the $x = x_0$ cross-section under $z = f_{X,Y}(x, y)$. In this figure, we see that larger values of X are more likely, because their cross-sections have more area.

Figure: The marginal density function fX (x), represented as cross-sections under z = fX,Y (x, y)

Example 6.3.4
Computing Marginal Density Functions

Students at schools around the world compete in a rocketry contest. Rockets are scored based on the altitude they reach (in meters). Suppose the first and second place altitudes at a randomly chosen school are modeled by X and Y, which have joint density function
$$f_{X,Y}(x, y) = \begin{cases} \dfrac{12 - 0.012x}{1000} \left( \dfrac{y}{x^2} - \dfrac{y^2}{x^3} \right) & \text{if } 0 \le x \le 1000,\ 0 \le y \le x \\ 0 & \text{otherwise} \end{cases}$$

a What can we infer about the possible altitudes of student rockets from this joint density function?

b Compute the marginal density function of X, the altitude of the first place rocket.

c What can we conclude about what values of X are more or less likely?

Solution

Figure: The possible outcomes of (X, Y ), and the possible outcomes of Y for each X

a The maximum altitude of a rocket is 1000 m. The second-place rocket always has a lower altitude than the first-place rocket, which makes sense.

b For x > 1000 or x < 0, the function $f_{X,Y}(x, y) = 0$ for any choice of y. For 0 ≤ x ≤ 1000, the function $f_{X,Y}(x, y)$ is a piecewise function of y. We can see this in the figure above: $f_{X,Y}$ is only nonzero when 0 ≤ y ≤ x.
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$$
For x < 0 or x > 1000 this integral is 0. For 0 ≤ x ≤ 1000,
$$\begin{aligned}
f_X(x) &= \int_{-\infty}^{0} f_{X,Y}(x, y)\,dy + \int_0^x f_{X,Y}(x, y)\,dy + \int_x^{\infty} f_{X,Y}(x, y)\,dy \\
&= 0 + \int_0^x \frac{12 - 0.012x}{1000} \left( \frac{y}{x^2} - \frac{y^2}{x^3} \right) dy + 0 \\
&= \frac{12 - 0.012x}{1000} \int_0^x \left( \frac{y}{x^2} - \frac{y^2}{x^3} \right) dy && \text{(constant multiple rule)} \\
&= \frac{12 - 0.012x}{1000} \left. \left( \frac{y^2}{2x^2} - \frac{y^3}{3x^3} \right) \right|_0^x \\
&= \frac{12 - 0.012x}{1000} \left( \frac{x^2}{2x^2} - \frac{x^3}{3x^3} - 0 + 0 \right) \\
&= \frac{12 - 0.012x}{1000} \left( \frac{1}{2} - \frac{1}{3} \right) \\
&= \frac{2 - 0.002x}{1000}
\end{aligned}$$
So
$$f_X(x) = \begin{cases} \dfrac{2 - 0.002x}{1000} & \text{if } 0 \le x \le 1000 \\ 0 & \text{otherwise} \end{cases}$$

c $f_X(x)$ has its largest value at x = 0 and shrinks to 0 as x increases to 1000. This indicates that lower altitudes are much more likely than higher altitudes.
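The piecewise bookkeeping in part b can be checked numerically. In this Python sketch (helper names hypothetical), `marginal_x` integrates the joint density over y only where it can be nonzero, 0 ≤ y ≤ x, and compares the result with the formula for $f_X(x)$ from part b. We skip x = 0, where the joint density's formula divides by zero; a single point does not affect any probability.

```python
def integrate(g, lo, hi, n=200):
    """Composite midpoint rule for the single-variable integral of g on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


def joint_density(x, y):
    """Joint density of (X, Y); x = 0 is excluded because the formula divides by x."""
    if 0 < x <= 1000 and 0 <= y <= x:
        return (12 - 0.012 * x) / 1000 * (y / x**2 - y**2 / x**3)
    return 0.0


def marginal_x(x):
    """f_X(x): integrate only over the y-values where the joint density can be nonzero."""
    if not 0 < x <= 1000:
        return 0.0
    return integrate(lambda y: joint_density(x, y), 0, x)


def f_X_formula(x):
    return (2 - 0.002 * x) / 1000 if 0 <= x <= 1000 else 0.0


for x in (100, 500, 900):
    print(x, marginal_x(x), f_X_formula(x))
```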

Figure: The marginal density function of X

Figure: The marginal density function of X, represented as an area under the graph of z = fX,Y (x, y)
(z-axis not to scale)

Remark

Even though the range of possible Y outcomes is greater for larger X, the density of X near those larger values is smaller. We can see this in the cross sections of the joint density function. Larger values of X have longer cross sections, but it is the area under the graph $z = f_{X,Y}(x, y)$ over each cross section, not its length, that matters.


Main Idea

If the range of possible outcomes is limited, then computing fX (x) requires us to:
1 make different computations for different ranges of X and
2 within each computation, divide the integral into pieces depending on which values of Y are
possible.

Question 6.3.5
Why Do We Need Joint Distributions?

If we want to communicate about the possible outcomes of X and Y, do we need to give an expression for $f_{X,Y}$? Maybe the marginal density functions $f_X$ and $f_Y$ tell us everything we need to know about what outcomes are likely. In fact, they do not. Marginal density functions cannot tell us how the likely outcomes of Y change as the outcome of X changes, or vice versa. For instance, perhaps a patient given a random amount of medicine X is likelier to have milder symptoms Y when X is larger.
However, in some cases, there is no change at all. No matter what the outcome of X is, the likelihood of each Y outcome is the same. In this case, the marginal density functions tell us everything we need to know. This situation is useful to recognize when it occurs, and we have a name for it.

Definition

If the outcomes of Y don’t depend on the outcome of X and vice versa, we say X and Y are independent. In this case

$$P(a \le X \le b \text{ and } c \le Y \le d) = \left(\int_a^b f_X(x)\,dx\right)\left(\int_c^d f_Y(y)\,dy\right)$$

Example

Suppose Darmok and Jalad’s arrival times have the joint density function

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{x}{32} & \text{if } 0 \le x \le 4 \text{ and } 0 \le y \le 4 \\ 0 & \text{otherwise} \end{cases}$$

Jalad’s arrival time is uniformly distributed, with $f_Y(y) = \frac{1}{4}$ on $[0, 4]$. Darmok’s has a triangular density, $f_X(x) = \frac{x}{8}$ on $[0, 4]$. Neither distribution depends on the arrival time of the other.

Figure: The density function for Darmok and Jalad’s arrival times

Independence is straightforward to check, and it is closely related to the product decomposition of a double integral.

Theorem

X and Y are independent if and only if their joint density function can be written $f_{X,Y}(x, y) = g(x)h(y)$ for all (x, y), including the points where the density is 0, where
g(x) is a function only of x
h(y) is a function only of y

Remark

g(x) and h(y) can be chosen to be the marginal density functions of X and Y , but they don’t need to
be. As long as a factorization exists, the variables are independent.
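One consequence of the theorem is easy to test numerically. If $f_{X,Y}(x, y) = g(x)h(y)$ everywhere, then for any $x_1, x_2, y_1, y_2$ we must have $f(x_1, y_1)f(x_2, y_2) = f(x_1, y_2)f(x_2, y_1)$, since both sides equal $g(x_1)g(x_2)h(y_1)h(y_2)$. The Python sketch below checks this identity at a handful of sample points. A single failure proves that X and Y are dependent; passing at finitely many points only suggests independence. The helper names and sample points are our own illustrative choices; the densities are the Darmok and Jalad density, the rocket density of Example 6.3.4, and $24xy$ on the triangle $x, y \ge 0$, $x + y \le 1$.

```python
import math
from itertools import combinations


def darmok_jalad(x, y):
    """x/32 on the square [0, 4] x [0, 4]."""
    return x / 32 if 0 <= x <= 4 and 0 <= y <= 4 else 0.0


def rocket(x, y):
    """Example 6.3.4: nonzero only on the triangle 0 <= y <= x <= 1000."""
    if 0 < x <= 1000 and 0 <= y <= x:
        return (12 - 0.012 * x) / 1000 * (y / x**2 - y**2 / x**3)
    return 0.0


def triangle_24xy(x, y):
    """The formula 24xy factors, but its triangular support does not."""
    return 24 * x * y if x >= 0 and y >= 0 and x + y <= 1 else 0.0


def passes_product_test(f, xs, ys):
    """False if f(x1,y1) f(x2,y2) and f(x1,y2) f(x2,y1) differ for a sampled pair."""
    for x1, x2 in combinations(xs, 2):
        for y1, y2 in combinations(ys, 2):
            lhs = f(x1, y1) * f(x2, y2)
            rhs = f(x1, y2) * f(x2, y1)
            if not math.isclose(lhs, rhs, rel_tol=1e-9):
                return False
    return True


print(passes_product_test(darmok_jalad, [0.5, 1.0, 3.0], [0.2, 2.0, 3.5]))  # True
print(passes_product_test(rocket, [200, 800], [100, 500]))                # False
print(passes_product_test(triangle_24xy, [0.2, 0.6], [0.2, 0.6]))         # False
```

The last case shows why the factorization must hold everywhere: $24xy$ factors as a formula, but because the density is 0 outside the triangle, $f(0.6, 0.6) = 0$ while $f(0.2, 0.6)$ and $f(0.6, 0.2)$ are both positive.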

Example

Suppose
(
3π π

12π−8 cos 2x (2y − y 2 ) if 0 ≤ x ≤ 6 and 0 ≤ y ≤ 4
fX,Y (x, y) =
0 otherwise

fX,Y (x, y) factors into the marginal density functions


(
π π

3π−2 cos 2x if 0 ≤ x ≤ 6
fX (x) =
0 otherwise
(
3
4 (2y − y 2 ) if 0 ≤ y ≤ 4
fY (y) =
0 otherwise

Thus we can conclude that X and Y are independent.


We can see independence in the cross sections of z = fX,Y (x, y).

Figure: An independent joint density function and its cross sections

The area of a $y = y_0$ cross section is $f_Y(y_0)$, which measures the likelihood that Y is near $y_0$. The shape of the cross section indicates what X values are likely for that choice of Y. For independent variables, the X values are distributed the same way no matter what Y value we choose. Mathematically, the cross section functions are constant multiples of each other. Multiplying by a constant does not change what portion of the total area lies over a given range of X values.

Question 6.3.6
What Is the Expected Value of a Function of X and Y?

What if we wanted to know the expected value of the function $g(X, Y) = \frac{Y^2}{X}$? Working directly from the definition, this is very hard. We would need to write a density function h(t) such that

$$\int_a^b h(t)\,dt = P\left(a \le \frac{Y^2}{X} \le b\right)$$

Notice $g(x, y) = a$ and $g(x, y) = b$ are level curves of g. In this case they solve to

$$x = \frac{1}{a}y^2 \qquad\qquad x = \frac{1}{b}y^2$$
In the case of Darmok and Jalad, integrating h(t) from a to b would have to give the probability that (X, Y) lies between these level curves:

Figure: The region where a ≤ g(x, y) ≤ b

Even if you worked out the bounds of such a region, you would still need to
1 Write the bounds as a function of a and b, which will be piecewise depending on whether the level curves exit through the top or the side of the square.
2 Evaluate the integral of $f_{X,Y}(x, y)$ over such a region to compute $P\left(a \le \frac{Y^2}{X} \le b\right)$.
3 Use the Fundamental Theorem of Calculus to write an integrand h(t) that integrates to the probability you found.
4 Integrate $\int_{-\infty}^{\infty} t\,h(t)\,dt$.

Only then would you know the expected value of g.


Fortunately, there is a multivariable analogue of the expected value theorem for single-variable density functions.

Theorem

The expected value of a function g(X, Y) of two continuous random variables X and Y with joint density function $f_{X,Y}(x, y)$ can be computed:

$$E[g(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f_{X,Y}(x, y)\,dy\,dx.$$

Example 6.3.7
Expected Value of a Random Variable

A special case of the expected value formula is to compute the expected values of $g(x, y) = x$ or $g(x, y) = y$. Suppose X and Y have joint density function

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{12 - 0.012x}{1000}\left(\dfrac{y}{x^2} - \dfrac{y^2}{x^3}\right) & \text{if } 0 \le x \le 1000,\ 0 \le y \le x \\ 0 & \text{otherwise} \end{cases}$$

Compute E[X].

Solution

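$E[X] = E[g(X, Y)]$ where $g(x, y) = x$. We apply the expected value formula:

$$
\begin{aligned}
E[g(X, Y)] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f_{X,Y}(x, y)\,dy\,dx \\
&= \int_0^{1000}\int_0^x x\,f_{X,Y}(x, y)\,dy\,dx + \iint_{\text{everywhere else}} x\,f_{X,Y}(x, y)\,dy\,dx \\
&= \int_0^{1000}\int_0^x x\cdot\frac{12 - 0.012x}{1000}\left(\frac{y}{x^2} - \frac{y^2}{x^3}\right)dy\,dx + 0 \\
&= \int_0^{1000}\frac{x(12 - 0.012x)}{1000}\int_0^x \left(\frac{y}{x^2} - \frac{y^2}{x^3}\right)dy\,dx \\
&= \int_0^{1000}\frac{x(12 - 0.012x)}{1000}\left[\frac{y^2}{2x^2} - \frac{y^3}{3x^3}\right]_0^x dx \\
&= \int_0^{1000}\frac{x(12 - 0.012x)}{1000}\left(\frac{x^2}{2x^2} - \frac{x^3}{3x^3} - 0 + 0\right)dx \\
&= \int_0^{1000}\frac{x(12 - 0.012x)}{1000}\left(\frac{1}{2} - \frac{1}{3}\right)dx \\
&= \int_0^{1000}\frac{x(2 - 0.002x)}{1000}\,dx \\
&= \int_0^{1000}\frac{2x - 0.002x^2}{1000}\,dx \\
&= \left[\frac{x^2}{1000} - \frac{0.002x^3}{3000}\right]_0^{1000} \\
&= 1000 - \frac{2000}{3} \\
&= \frac{1000}{3}
\end{aligned}
$$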

Main Ideas

We can compute E[X] or E[Y] by integrating

$$E[X] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\,f_{X,Y}(x, y)\,dy\,dx \qquad\qquad E[Y] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\,f_{X,Y}(x, y)\,dy\,dx$$

If we already have the marginal density function $f_X(x)$ (or $f_Y(y)$), we can use the single-variable expected value formula:

$$E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$$

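In fact, we saw this integral partway through our solution: once the inner integral was evaluated, the integrand became $\frac{x(2 - 0.002x)}{1000} = x\,f_X(x)$. Computing the marginal density function is nearly equivalent to computing the inner integral in the two-variable expected value formula.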
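As a numerical cross-check, the Python sketch below approximates E[X] for the rocket density in both ways: as the iterated double integral of $x\,f_{X,Y}(x, y)$, and as the single integral of $x\,f_X(x)$ using the marginal density $f_X(x) = \frac{2 - 0.002x}{1000}$ from Example 6.3.4. It uses a plain midpoint rule. `iterated_double_integral` is a hypothetical helper written for this illustration, not a library function, and both results should be close to $\frac{1000}{3} \approx 333.33$.

```python
def f_joint(x, y):
    """Joint density of Example 6.3.4 (zero outside 0 < y <= x <= 1000)."""
    if 0 < x <= 1000 and 0 <= y <= x:
        return (12 - 0.012 * x) / 1000 * (y / x**2 - y**2 / x**3)
    return 0.0


def iterated_double_integral(g, x_lo, x_hi, y_lo, y_hi, n=400):
    """Midpoint-rule approximation of the integral of g(x, y) dy dx.

    y_lo and y_hi are functions of x, mirroring the bounds of an iterated integral.
    """
    hx = (x_hi - x_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        a, b = y_lo(x), y_hi(x)
        hy = (b - a) / n
        total += hx * hy * sum(g(x, a + (j + 0.5) * hy) for j in range(n))
    return total


# Two-variable formula: integrate x * f_{X,Y}(x, y) over 0 <= y <= x <= 1000.
e_x_joint = iterated_double_integral(
    lambda x, y: x * f_joint(x, y), 0, 1000, lambda x: 0.0, lambda x: x
)


def x_times_marginal(x):
    """x * f_X(x) on [0, 1000]."""
    return x * (2 - 0.002 * x) / 1000


# Single-variable formula: integrate x * f_X(x) with the midpoint rule.
n = 10_000
h = 1000 / n
e_x_marginal = h * sum(x_times_marginal((i + 0.5) * h) for i in range(n))

print(e_x_joint, e_x_marginal, 1000 / 3)
```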

Example 6.3.8
Expected Value of a Function

Compute the expected value of $\frac{Y^2}{X}$, where X is Darmok’s arrival time and Y is Jalad’s arrival time. Assume that X and Y have joint density function:

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{x}{32} & \text{if } 0 \le x \le 4 \text{ and } 0 \le y \le 4 \\ 0 & \text{otherwise} \end{cases}$$

Solution

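The expected value is given by

$$
\begin{aligned}
E[Y^2/X] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{y^2}{x}\,f_{X,Y}(x, y)\,dy\,dx \\
&= \int_0^4\int_0^4 \frac{y^2}{x}\cdot\frac{x}{32}\,dy\,dx && \text{because } f_{X,Y} = 0 \text{ outside } 0 \le x \le 4,\ 0 \le y \le 4 \\
&= \int_0^4\int_0^4 \frac{y^2}{32}\,dy\,dx \\
&= \int_0^4 \left[\frac{y^3}{96}\right]_0^4 dx \\
&= \int_0^4 \frac{2}{3}\,dx \\
&= \left[\frac{2}{3}x\right]_0^4 \\
&= \frac{8}{3}
\end{aligned}
$$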
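The expected value theorem is also easy to apply numerically. The Python sketch below is an illustration under our own assumptions (a midpoint rule with 200 subintervals per axis, and the hypothetical helper `double_integral_rect`). It approximates $\iint \frac{y^2}{x}\,f_{X,Y}(x, y)\,dA$ over the square and should land close to $\frac{8}{3}$. Because midpoints never lie on $x = 0$, the factor $\frac{1}{x}$ causes no division by zero.

```python
def darmok_jalad(x, y):
    """Joint density x/32 on the square [0, 4] x [0, 4]."""
    return x / 32 if 0 <= x <= 4 and 0 <= y <= 4 else 0.0


def double_integral_rect(g, x_lo, x_hi, y_lo, y_hi, n=200):
    """Midpoint-rule approximation of the integral of g over a rectangle."""
    hx = (x_hi - x_lo) / n
    hy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx  # never exactly 0, so y**2 / x is safe
        for j in range(n):
            y = y_lo + (j + 0.5) * hy
            total += g(x, y)
    return total * hx * hy


# E[Y^2 / X] = double integral of (y^2 / x) * f_{X,Y}(x, y).
expected = double_integral_rect(
    lambda x, y: (y**2 / x) * darmok_jalad(x, y), 0, 4, 0, 4
)
print(expected, 8 / 3)
```

This skips the four-step process of building a density for $Y^2/X$ entirely, which is the practical point of the theorem.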

Application 6.3.9
Average Value of a Function

Definition

The uniform distribution over a region D in $\mathbb{R}^2$ has the joint density function

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{1}{\text{area of } D} & \text{if } (x, y) \text{ is inside } D \\ 0 & \text{if } (x, y) \text{ is outside } D \end{cases}$$

As with single-variable functions, we default to the uniform distribution whenever we average a function and no specific random variable is specified.

Definition

The average value of a function f over a region D is defined to be the expected value of f(X, Y) where X, Y are uniformly distributed over D.

$$f_{\text{ave}} = \frac{1}{\text{Area of } D}\iint_D f(x, y)\,dA$$

Since we can also compute the area of D using a double integral, we can also write

$$f_{\text{ave}} = \frac{\iint_D f(x, y)\,dA}{\iint_D 1\,dA}$$
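The ratio form of $f_{\text{ave}}$ translates directly into code. The Python sketch below uses an example of our own choosing, not one from the text: it approximates both double integrals with a midpoint rule for $f(x, y) = x$ over the triangle $0 \le x \le 1$, $0 \le y \le 1 - x$. The average of x over a triangle is the x-coordinate of its centroid, which here is $\frac{1}{3}$. The helper names are hypothetical.

```python
def iterated_double_integral(g, x_lo, x_hi, y_lo, y_hi, n=200):
    """Midpoint-rule approximation of the integral of g(x, y) dy dx,
    where the y-bounds y_lo(x) and y_hi(x) may depend on x."""
    hx = (x_hi - x_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        a, b = y_lo(x), y_hi(x)
        hy = (b - a) / n
        total += hx * hy * sum(g(x, a + (j + 0.5) * hy) for j in range(n))
    return total


def average_value(f, x_lo, x_hi, y_lo, y_hi):
    """f_ave = (integral of f over D) / (integral of 1 over D)."""
    numerator = iterated_double_integral(f, x_lo, x_hi, y_lo, y_hi)
    area = iterated_double_integral(lambda x, y: 1.0, x_lo, x_hi, y_lo, y_hi)
    return numerator / area


avg = average_value(lambda x, y: x, 0, 1, lambda x: 0.0, lambda x: 1 - x)
print(avg)  # close to 1/3
```

Computing the area with the same integrator, rather than plugging in the exact area, mirrors the second formula for $f_{\text{ave}}$ and lets some discretization error cancel.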

Application 6.3.10
Covariance and Correlation

One of the most useful things to know about a pair of random variables is whether they are correlated,
whether high values of one tend to correspond to high values (or low values) of the other. We can measure
this by examining the expected value of a specific function, which is positive when X and Y are both
above average or both below average, and negative for pairs when one is above and the other is below.

Definition

The expected value of $(X - E[X])(Y - E[Y])$ is called the covariance of X and Y, denoted cov(X, Y).

1 If $\operatorname{cov}(X, Y) > 0$, higher values of X tend to occur with higher values of Y.
2 If $\operatorname{cov}(X, Y) < 0$, higher values of X tend to occur with lower values of Y.
3 If $\operatorname{cov}(X, Y) = 0$, X and Y are uncorrelated.

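To test this definition, we can look at a type of joint distribution whose correlation we already understand. Suppose X and Y are independent. Then outcomes of X should not depend on outcomes of Y. The joint density function can be written $f_{X,Y}(x, y) = g(x)h(y)$, and by the Remark above we may choose $g = f_X$ and $h = f_Y$, so that $\int_{-\infty}^{\infty} g(x)\,dx = \int_{-\infty}^{\infty} h(y)\,dy = 1$. We can use our integral rules to see that the covariance is always 0, matching our intuition.

$$
\begin{aligned}
\operatorname{cov}(X, Y) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - E[X])(y - E[Y])\,f_{X,Y}(x, y)\,dy\,dx \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - E[X])(y - E[Y])\,g(x)h(y)\,dy\,dx \\
&= \left(\int_{-\infty}^{\infty} (x - E[X])\,g(x)\,dx\right)\left(\int_{-\infty}^{\infty} (y - E[Y])\,h(y)\,dy\right) \\
&= \left(\int_{-\infty}^{\infty} x\,g(x)\,dx - E[X]\right)\left(\int_{-\infty}^{\infty} y\,h(y)\,dy - E[Y]\right) \\
&= (0)(0) = 0
\end{aligned}
$$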
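For a concrete instance, Darmok and Jalad’s arrival times are independent (their density $\frac{x}{32}$ factors as $\frac{x}{8}\cdot\frac{1}{4}$ on the square), so their covariance should be 0. The Python sketch below, which uses a midpoint rule over the square and a hypothetical helper `integrate_square`, computes E[X], E[Y] and then cov(X, Y) straight from the definition.

```python
def darmok_jalad(x, y):
    """Joint density x/32 on the square [0, 4] x [0, 4]."""
    return x / 32 if 0 <= x <= 4 and 0 <= y <= 4 else 0.0


def integrate_square(g, n=200):
    """Midpoint-rule approximation of the integral of g over [0, 4] x [0, 4],
    the only region where the Darmok and Jalad density is nonzero."""
    h = 4 / n
    points = [(i + 0.5) * h for i in range(n)]
    return h * h * sum(g(x, y) for x in points for y in points)


e_x = integrate_square(lambda x, y: x * darmok_jalad(x, y))  # about 8/3
e_y = integrate_square(lambda x, y: y * darmok_jalad(x, y))  # about 2
cov = integrate_square(lambda x, y: (x - e_x) * (y - e_y) * darmok_jalad(x, y))
print(e_x, e_y, cov)
```

The covariance comes out as 0 up to floating-point rounding, because the factor $(y - E[Y])$ integrates to 0 on its own, exactly as in the derivation.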

Covariance on its own does not allow us to compare whether one joint distribution is more strongly correlated than another. A joint distribution could have a large covariance because the variables are consistently correlated, or because X (or Y) has high variance (meaning X is generally farther from E[X]). To control for the latter effect, we often compute:

Pearson’s Correlation

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y.

ρ always takes a value between −1 and 1, and it is one measure of how well-correlated two random variables are.
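To illustrate ρ on a pair that is not independent, the Python sketch below uses the density $6x$ on the triangle T with vertices (0, 0), (1, 0) and (0, 1) (the density of Q17 in the exercises). Working by hand, $\operatorname{cov}(X, Y) = -\frac{1}{40}$, $\sigma_X^2 = \frac{1}{20}$ and $\sigma_Y^2 = \frac{3}{80}$, so $\rho = -\frac{1}{\sqrt{3}} \approx -0.577$: when X is large, Y is forced to be small because $x + y \le 1$. The helper names and the midpoint rule are our own illustrative choices.

```python
import math


def iterated_double_integral(g, x_lo, x_hi, y_lo, y_hi, n=200):
    """Midpoint-rule approximation of the integral of g(x, y) dy dx,
    where the y-bounds y_lo(x) and y_hi(x) may depend on x."""
    hx = (x_hi - x_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        a, b = y_lo(x), y_hi(x)
        hy = (b - a) / n
        total += hx * hy * sum(g(x, a + (j + 0.5) * hy) for j in range(n))
    return total


def density(x, y):
    """6x on T; only evaluated inside T, because the bounds below stay in T."""
    return 6 * x


def expect(g):
    """E[g(X, Y)] via the two-variable expected value formula over T."""
    return iterated_double_integral(
        lambda x, y: g(x, y) * density(x, y), 0, 1, lambda x: 0.0, lambda x: 1 - x
    )


e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
cov = expect(lambda x, y: (x - e_x) * (y - e_y))
sigma_x = math.sqrt(expect(lambda x, y: (x - e_x) ** 2))
sigma_y = math.sqrt(expect(lambda x, y: (y - e_y) ** 2))
rho = cov / (sigma_x * sigma_y)
print(cov, rho)  # approximately -1/40 and -1/sqrt(3)
```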

Section 6.3
Exercises

Summary Questions

Q1 How do we use a joint density function to compute the probability of a certain set of outcomes?

Q2 What is a marginal density function and how do we compute it?

Q3 What does it mean for two random variables to be independent?

Q4 How can we tell from the graph of a joint density function that the two random variables are
independent?

6.3.1

Q5 Given a joint density function $f_{X,Y}(x, y)$, what does

$$\int_0^1\int_0^x f_{X,Y}(x, y)\,dy\,dx$$

compute?

Q6 Suppose X and Y have the joint density function

$$f_{X,Y}(x, y) = \begin{cases} ax & \text{if } 0 \le x \le 4 \text{ and } 0 \le y \le 5 \\ 0 & \text{otherwise} \end{cases}$$

What is the value of the number a?

6.3.2

Q7 Suppose X and Y have the joint density function

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{y^2}{18} & \text{if } 0 \le x \le 2 \text{ and } 0 \le y \le 3 \\ 0 & \text{otherwise} \end{cases}$$

Compute the probability that X + Y is greater than 3.

Q8 Suppose X and Y have the joint density function

$$f_{X,Y}(x, y) = \begin{cases} x + y & \text{if } 0 \le x \le 1 \text{ and } 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Compute the probability that X and Y differ by at least $\frac{1}{2}$.

Q9 Suppose X and Y have the joint density function

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{4 - x + y}{32\pi} & \text{if } x^2 + y^2 \le 8 \\ 0 & \text{otherwise} \end{cases}$$

a What values of (X, Y ) are possible?

b Among the possible values of (X, Y ), describe which are more or less likely than others.

c Set up an integral or integrals that would compute the probability that Y > X. You don’t
need to evaluate it.

Q10 Suppose we perform an experiment in which a pair of strangers find an amount of money on the ground. Suppose X and Y are continuous random variables that model the portion of the money (0 = none, while 1 = all) that each person keeps. Any money not kept is turned in to the authorities. Suppose the joint density function of X and Y is

$$f_{X,Y}(x, y) = \begin{cases} 24xy & \text{if } x \ge 0,\ y \ge 0, \text{ and } x + y \le 1 \\ 0 & \text{otherwise} \end{cases}$$

a In a few sentences, interpret what this density function says about which outcomes are likely
and which are not. Feel free to include any comments on human nature that you need to
get off your chest.

b Set up an integral (or integrals) that computes the probability that each person takes at
most twice as much as the other. Do not evaluate.


6.3.3

Q11 Let T be the triangle with vertices (1, 2), (4, 0) and (3, 5). Suppose X and Y have a joint density function $f_{X,Y}$ that is nonzero on T and zero everywhere else. For what values of x is the marginal density function $f_X(x)$ nonzero? Illustrate with a diagram.

Q12 Let D be the region between $y = x^2$ and $y = 2x + 15$. Suppose X and Y have a joint density function $f_{X,Y}$ that is nonzero on D and zero everywhere else. For what values of y is the marginal density function $f_Y(y)$ nonzero? Illustrate with a diagram.

Q13 Suppose that X and Y have a joint density function $f_{X,Y}$ that is nonzero in the disk $x^2 + y^2 \le 1$ and nowhere else. If the marginal density function of X is the density function of a uniform random variable, what does this tell you about where $f_{X,Y}(x, y)$ is higher and lower?

Q14 Suppose X and Y have joint density function

$$f_{X,Y}(x, y) = \begin{cases} g(y) & \text{if } a \le x \le b \text{ and } c \le y \le d \\ 0 & \text{otherwise} \end{cases}$$

where g is a function only of y. What is the marginal density function of X? Justify your answer, preferably without actually evaluating any integrals.

6.3.4

Q15 Suppose the random variables X and Y have the joint density function

$$f_{X,Y}(x, y) = \begin{cases} x + y & \text{if } 0 \le x \le 1 \text{ and } 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Compute the marginal density function of X.

Q16 Suppose X and Y have joint density function

$$f_{X,Y}(x, y) = \begin{cases} 4xy - 2x - 2y + 2 & \text{if } 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$

a Compute the marginal density function $f_X(x)$.

b Compute the marginal density function $f_Y(y)$.

c What familiar kind of random variables are X and Y ?

Q17 Let T be the triangle with vertices (0, 0), (1, 0) and (0, 1). Let X and Y have joint density function

$$f_{X,Y}(x, y) = \begin{cases} 6x & \text{if } (x, y) \text{ is in } T \\ 0 & \text{otherwise} \end{cases}$$

Compute the marginal density function of X.

Q18 Suppose X and Y have joint density function

$$f_{X,Y}(x, y) = \begin{cases} 15y & \text{if } x^2 \le y \le x \\ 0 & \text{otherwise} \end{cases}$$

a Draw the region of possible outcomes of (X, Y) in $\mathbb{R}^2$.

b Compute the marginal density function of X.

6.3.5

Q19 Suppose X and Y are independent. Their joint density function $f_{X,Y}(x, y)$ has the values

$$f_{X,Y}(3, 7) = 0.1, \qquad f_{X,Y}(5, 7) = 0.15, \qquad f_{X,Y}(5, 2) = 0.21$$

What is $f_{X,Y}(3, 2)$?

Q20 How does the distribution of Y change as X takes different values, given the following joint density function?

$$f_{X,Y}(x, y) = \begin{cases} x + y & \text{if } 0 \le x \le 1 \text{ and } 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Are X and Y independent?

Q21 Suppose X and Y are independent random variables. If their joint density function fX,Y (x, y) is
0 except on D, what can we say about the shape of D?

Q22 $f_{X,Y}(x, y)$ is a joint density function for a pair of independent variables X and Y. Here is a picture of the $x = 2$ cross section of $z = f_{X,Y}(x, y)$.

a Describe what values of Y are more or less likely when X = 2.

b Assume fX,Y (x, y) is not always 0 at x = 5. Describe what values of Y are more or less
likely when X = 5.

c How is the shape of the $x = 2$ cross section of $z = f_{X,Y}(x, y)$ related to the $x = 5$ cross section of $z = f_{X,Y}(x, y)$?

6.3.6

Q23 Let X and Y be random variables with joint density function $f_{X,Y}(x, y)$ and let D be the distance from (X, Y) to the origin. What region would we need to integrate over to compute $P(1 \le D \le 2)$?

Q24 Let X and Y be random variables with joint density function $f_{X,Y}(x, y)$ and let Z be the difference $X - Y$. What region would we need to integrate over to compute $P(0 \le Z \le 5)$?

Q25 Use the expected value formula to show that if $Z_1$ and $Z_2$ are both functions of X and Y, then $E[Z_1 + Z_2] = E[Z_1] + E[Z_2]$.

Q26 Let T be the triangle with vertices (0, 0), (4, 0) and (0, 4). Suppose X and Y are random variables with joint density function

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{1}{8} & \text{if } (x, y) \text{ is in } T \\ 0 & \text{otherwise} \end{cases}$$

Let Z = X + Y.

a Write a function G(z) which gives the probability that Z < z.

b Compute $g(z) = \frac{dG}{dz}$. Explain why $P(a \le Z \le b) = \int_a^b g(z)\,dz$.

c Use g to directly compute the expected value of Z.

d Compute the expected value of Z instead using our multivariable expected value of a function
formula.

6.3.7

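Q27 Suppose X and Y have joint density function

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{12 - 0.012x}{1000}\left(\dfrac{y}{x^2} - \dfrac{y^2}{x^3}\right) & \text{if } 0 \le x \le 1000,\ 0 \le y \le x \\ 0 & \text{otherwise} \end{cases}$$

Compute E[Y].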


Q28 Suppose the random variables X and Y have the joint density function

$$f_{X,Y}(x, y) = \begin{cases} x + y & \text{if } 0 \le x \le 1 \text{ and } 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Compute E[Y].

Q29 Darmok and Jalad’s arrival times X and Y have the joint density function

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{x}{32} & \text{if } 0 \le x \le 4 \text{ and } 0 \le y \le 4 \\ 0 & \text{otherwise} \end{cases}$$

What is the expected arrival time of Darmok?

Q30 Suppose the joint density function of X and Y is

$$f_{X,Y}(x, y) = \begin{cases} 24xy & \text{if } x \ge 0,\ y \ge 0, \text{ and } x + y \le 1 \\ 0 & \text{otherwise} \end{cases}$$

What is the expected value of X?

6.3.8

Q31 Suppose the random variables X and Y have the joint density function

$$f_{X,Y}(x, y) = \begin{cases} x + y & \text{if } 0 \le x \le 1 \text{ and } 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Compute the expected value of XY.

Q32 Darmok and Jalad’s arrival times X and Y have the joint density function

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{x}{32} & \text{if } 0 \le x \le 4 \text{ and } 0 \le y \le 4 \\ 0 & \text{otherwise} \end{cases}$$

Darmok is trying to break his habit of arriving late. He has agreed to donate 120 credits to a
local charity for each hour Jalad has to wait for him (prorated across partial hours). Assuming
that this incentive has no effect on their arrival times, what is the expected donation?

Q33 Suppose X and Y have joint density function

$$f_{X,Y}(x, y) = \begin{cases} 4xy - 2x - 2y + 2 & \text{if } 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Compute the expected value of $X^2Y^2$.

Q34 The longitude and latitude of a meteorite landing are random variables X degrees and Y degrees with joint density function

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{8100 - y^2}{349920000} & \text{if } -180 \le x \le 180 \text{ and } -90 \le y \le 90 \\ 0 & \text{otherwise} \end{cases}$$

a Write an integral that computes the probability that a meteorite lands within 20 degrees longitude of the prime meridian (x = 0). Do not evaluate it.

b What does this density function say about where a meteorite is likely or unlikely to strike? Answer in a few sentences.

c Suppose a perverse lottery is established that pays out 30 dollars minus the distance in degrees from the south pole (y = −90), if the meteorite strikes within 30 degrees of the south pole. Otherwise it pays out nothing. Set up an integral that computes the average payout from this lottery. Do not evaluate.

6.3.9

Q35 Compute the average value of the function $f(x, y) = 2y$ on the unit disc $x^2 + y^2 \le 1$.

Q36 Compute the average value of the function $f(x, y) = y$ on the region enclosed by $y = x^2$ and $y = 16$.

Q37 Compute the average value of the function $f(x, y) = xy$ on the triangle with vertices (0, 0), (4, 0) and (0, 8).

Q38 Compute the average value of the function $f(x, y) = x^2$ on the triangle with vertices (−2, 0), (2, 0) and (0, 2).


6.3.10

Q39 Recall our friends Darmok and Jalad arriving in Tanagra between noon and 4 PM. The joint density function of their respective arrival times (X, Y) is

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{x}{32} & \text{if } 0 \le x \le 4 \text{ and } 0 \le y \le 4 \\ 0 & \text{otherwise} \end{cases}$$

We found that $E[X] = \frac{8}{3}$ and $E[Y] = 2$. Consider the function $g(X, Y) = \left(X - \frac{8}{3}\right)(Y - 2)$.

a Draw the domain of possible values of (X, Y). At what points in this domain is g positive? Where is it negative?

b Could you argue, using the laws of integrals instead of a computation, that $E[g(X, Y)] = 0$?

Q40 Suppose X and Y have joint density function

$$f_{X,Y}(x, y) = \begin{cases} x + y & \text{if } 0 \le x \le 1 \text{ and } 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$

a Compute E[X] and E[Y ].

b Compute the covariance of X and Y .

c What does your answer to b suggest about how X and Y are correlated?

Synthesis & Extension

Q41 Suppose D is the region enclosed by $y = 6x - x^2$ and the x-axis. Let X and Y have the uniform joint density function over D, and let $f_X(x)$ be the marginal density function of X.

a What values of X are possible outcomes? What values are impossible?

b What values of X are more likely and what values are less likely? Justify your answer.

Q42 If X and Y have a uniform joint distribution over some region D, can X and Y be correlated?
Explain or demonstrate.
Q43 Suppose on a trip to the movies, the number of minutes you wait in line for tickets (X) and the number of minutes you wait in line for snacks (Y) are random variables with joint density function:

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{12x - x^2 + 10y - y^2}{4880} & \text{if } 0 \le x \le 12 \text{ and } 0 \le y \le 10 \\ 0 & \text{otherwise} \end{cases}$$

a Are X and Y independent? Justify your answer in a sentence or two.

b Compute the probability that the ticket line takes less than 5 minutes. You don’t need to
simplify the arithmetic.

c You decide to pay a friend 25 cents per minute to wait in line for snacks while you wait for
the tickets. If you’re still in line when she gets the snacks, she brings them to you and you
pay her. If she’s still in line when you get tickets, you pay her and take her place. Write an
integral or integrals that compute the expected (average) amount you will pay her. Do not
evaluate.

Q44 When you go to the movies, you have to wait in line for tickets, and then to buy snacks. You model the ticket wait (in minutes) with the random variable X. You model the snack wait with the random variable Y. Suppose X and Y have the joint density function

$$f_{X,Y}(x, y) = \begin{cases} 50e^{-5x - 10y} & \text{if } 0 \le x \text{ and } 0 \le y \\ 0 & \text{otherwise} \end{cases}$$

a Compute the probability that you wait a total of no more than 15 minutes in both lines.

b Are the X and Y in this problem independent?

c Is the independence of X and Y a reasonable assumption? Explain.

Q45 Darmok and Jalad have agreed to meet up again at Tanagra. Darmok’s arrival time (in hours) after noon is denoted by the random variable X, while Jalad’s is denoted by the random variable Y. X and Y have the joint density function

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{y}{6x^2} & \text{if } 1 \le x \le 4 \text{ and } 0 \le y \le 4 \\ 0 & \text{otherwise} \end{cases}$$

a Describe the possible arrival times of Darmok and the possible arrival times of Jalad.

b Compute the probability that Darmok arrives at least two hours after Jalad.


c Darmok and Jalad leave Tanagra at exactly 6PM. Write an integral or integrals that compute the average amount of time they spend together at Tanagra. Do not evaluate your integral(s), but your integrand(s) should be functions whose antiderivative(s) are well known.

Q46 Let

$$D = \{(x, y) : x^2 + y^2 \le 4,\ y \ge 0\}$$

Suppose X and Y have joint density function

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{3y}{16} & \text{if } (x, y) \text{ is in } D \\ 0 & \text{otherwise} \end{cases}$$

a Compute the marginal density function of Y

b What integral would compute the expected value of X? How do you know the value of this
integral without computing it?
Q47 Suppose we wish to approximate $\int_0^6 x^2\,dx$ by dividing the domain into two equal subintervals. Suppose the test points for each subinterval are independently chosen, uniformly distributed random variables on their respective subintervals. Produce an integral that computes the probability that this approximation overestimates the actual value of the integral.

Q48 Suppose X and Y are independent, and their joint density function is written as a product $f_{X,Y}(x, y) = g(x)h(y)$. How is the marginal density function $f_X(x)$ related to g(x)?

Section 6.4

Triple Integrals
Goals:

1 Set up triple integrals over three-dimensional domains.


2 Evaluate triple integrals.

The theory of integrating a two-variable function extends without much trouble to functions of more
variables. Visualizing the domains and writing bounds of integration is a much greater challenge. Any
function whose domain is a piece of the real world needs (at least) three variables. Joint density functions
can also relate any number of random variables. In both cases, a triple integral allows us to aggregate
a rate (per unit of volume) to compute a total over the domain in question.

Question 6.4.1
How Do We Integrate a Three-Variable Function?

A triple integral is a natural extension of the double integral. A good exercise is to compare the two
definitions, point by point.

Definition

Given a domain D in three-dimensional space and a function f(x, y, z), we can subdivide D into regions, where
$V_i$ is the volume of the $i$th region.
$(x_i^*, y_i^*, z_i^*)$ is a point in the $i$th region.
$V$ is the diameter of the largest region.

We define the triple integral of f over D to be the following limit over all possible divisions of D:

$$\iiint_D f(x, y, z)\,dV = \lim_{V \to 0}\sum_{i=1}^{n} f(x_i^*, y_i^*, z_i^*)\,V_i$$
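The definition can be imitated directly in code. The Python sketch below is our own illustration with an integrand not taken from the text. It splits the unit cube into $n^3$ congruent cubes, picks a random test point in each one, and adds up $f(x_i^*, y_i^*, z_i^*)V_i$ for $f(x, y, z) = xyz$, whose triple integral over the unit cube is $\left(\frac{1}{2}\right)^3 = \frac{1}{8}$. Choosing the test points at random is meant to emphasize that any point of each region is allowed.

```python
import random


def riemann_sum_unit_cube(f, n, rng):
    """Riemann sum for f over [0, 1]^3 split into n^3 equal cubes.

    Each cube has volume V_i = 1/n^3 and diameter sqrt(3)/n, and the test point
    (x_i*, y_i*, z_i*) is chosen at random inside it.
    """
    h = 1 / n
    volume = h ** 3
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                x = (i + rng.random()) * h
                y = (j + rng.random()) * h
                z = (k + rng.random()) * h
                total += f(x, y, z) * volume
    return total


rng = random.Random(0)
for n in (2, 5, 20):
    print(n, riemann_sum_unit_cube(lambda x, y, z: x * y * z, n, rng))
# As n grows (so the largest diameter shrinks), the sums approach 1/8.
```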

Fubini’s theorem applies to triple integrals as well. We write them as iterated integrals.


Theorem
$$\iiint_D f(x, y, z)\,dV = \int_{x_1}^{x_2}\int_{y_1}^{y_2}\int_{z_1}^{z_2} f(x, y, z)\,dz\,dy\,dx$$

where
$z_1$ and $z_2$ are the bounds of z, which may be functions of x and y.
$y_1$ and $y_2$ are the bounds of y, which may be functions of x.
$x_1$ and $x_2$ are the bounds of x. They are numbers.

The order of the variables can also be changed, with the bounds defined analogously.

Example 6.4.2
Integrating Over a Prism

Let $R = \{(x, y, z) : 0 \le x \le 4,\ 0 \le y \le 2,\ 0 \le z \le 3\}$. Compute $\iiint_R 3zy + x^2\,dV$.

Figure: A Rectangular Prism

Solution

We will set this up as an integral of the form dzdydx. The inner integral is dz. No matter where we
are in R, we can travel in the positive z direction until we hit z = 3 or the negative z direction until we
hit z = 0. Thus the inner integral is
$$\int_0^3 3zy + x^2\,dz.$$

Different choices of (x, y) will give different values of this inner integral. The points in R corresponding to a choice of (x, y) are a vertical segment ranging from z = 0 to z = 3. These segments exist for any (x, y) in the rectangle $0 \le x \le 4$, $0 \le y \le 2$. We can set up the x and y bounds over this rectangle as we would over a normal double integral. The integrand is the dz integral above. Together we have an iterated integral.

$$\int_0^4\int_0^2\int_0^3 3zy + x^2\,dz\,dy\,dx$$

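We evaluate the inner integral, then the middle, and finally the outer.

$$
\begin{aligned}
\int_0^4\int_0^2\int_0^3 3zy + x^2\,dz\,dy\,dx &= \int_0^4\int_0^2 \left[\frac{3}{2}yz^2 + x^2z\right]_0^3 dy\,dx \\
&= \int_0^4\int_0^2 \frac{27}{2}y + 3x^2\,dy\,dx \\
&= \int_0^4 \left[\frac{27}{4}y^2 + 3x^2y\right]_0^2 dx \\
&= \int_0^4 27 + 6x^2\,dx \\
&= \left[27x + 2x^3\right]_0^4 \\
&= 108 + 128 \\
&= 236
\end{aligned}
$$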
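A numerical check is a cheap way to confirm a hand computation like this one. The Python sketch below approximates the iterated integral dz dy dx with a midpoint rule in each variable. `iterated_triple_integral` is a hypothetical helper written for this illustration, and its bound arguments mirror Fubini’s theorem above (y-bounds may depend on x, z-bounds on x and y). The result should be close to 236.

```python
def iterated_triple_integral(f, x_lo, x_hi, y_lo, y_hi, z_lo, z_hi, n=40):
    """Midpoint-rule approximation of the integral of f dz dy dx.

    y_lo, y_hi are functions of x; z_lo, z_hi are functions of (x, y).
    """
    hx = (x_hi - x_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        a, b = y_lo(x), y_hi(x)
        hy = (b - a) / n
        for j in range(n):
            y = a + (j + 0.5) * hy
            c, d = z_lo(x, y), z_hi(x, y)
            hz = (d - c) / n
            inner = sum(f(x, y, c + (k + 0.5) * hz) for k in range(n))
            total += inner * hz * hy * hx
    return total


value = iterated_triple_integral(
    lambda x, y, z: 3 * z * y + x**2,
    0, 4,                                   # 0 <= x <= 4
    lambda x: 0.0, lambda x: 2.0,           # 0 <= y <= 2
    lambda x, y: 0.0, lambda x, y: 3.0,     # 0 <= z <= 3
)
print(value)  # close to 236
```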

Like a rectangle for double integrals, the right rectangular prism has constant bounds for triple
integrals. This is because the bounds of the inner variables remain the same, no matter what values the
outer variables take.

Question 6.4.3
How Do We Interpret Triple Integrals Geometrically?

The double integral is the volume under the graph of a two-variable function. This graph lives in three-space. The triple integral is thus a four-dimensional volume “under” the graph of a three-variable function. This graph lives in four-space and is thus harder to visualize. However, we can flatten the fourth dimension into three-space, much like we can flatten the three directions of three-space onto a two-dimensional page. Such a representation loses some information, but can be a useful heuristic. We can examine the role of each iteration in an iterated triple integral through this construction.

$\int_0^3 f(x, y, z)\,dz$ computes the area under the graph $w = f(x, y, z)$ over each vertical segment of the form $(x, y) = (x_0, y_0)$ in the domain. It is a function of x and y.


Figure: $\int_0^3 f(x, y, z)\,dz$, represented as an area in a zw-plane

$\int_0^2\int_0^3 f(x, y, z)\,dz\,dy$ computes the volume under the graph $w = f(x, y, z)$ over each $x = x_0$ cross-section of the domain. It is a function of x.

Figure: $\int_0^2\int_0^3 f(x, y, z)\,dz\,dy$, represented as a volume in yzw-space

The final integral would require us to represent a four-dimensional analogue of volume, which would severely overlap in this visualization.

Application 6.4.4
Triple Integrals in Math and Science

Triple integrals have a variety of applications, largely in physics, which models our three-dimensional world.
1 Integrating a function ρ(x, y, z), which gives the density of an object at each point, gives the total mass of the object.
2 Integrating xρ(x, y, z), yρ(x, y, z) and zρ(x, y, z), and dividing each result by the total mass, gives the coordinates of the center of mass of the object.
3 Integrating a three-dimensional joint density function over a region gives the probability that the triple (X, Y, Z) lies in that region.
4 Integrating 1 dV over a region gives the volume of that region.
Even if we aren’t interested in physics, this connection provides us with another visual model for
integration. Density lets us visualize a triple integral without referring to a fourth (geometric) dimension.

$\int_0^3 f(x, y, z)\,dz$ computes the density of the vertical segments at each (x, y).

$\int_0^2\int_0^3 f(x, y, z)\,dz\,dy$ computes the density of the rectangle at each x.

$\int_0^4\int_0^2\int_0^3 f(x, y, z)\,dz\,dy\,dx$ computes the total mass of the prism.

Remark

Technically the density at each step is a different kind of rate.

f(x, y, z) represents mass per unit of volume.

$\int_0^3 f(x, y, z)\,dz$ represents mass per unit of area, since you would need all the segments above some area to produce a volume.

$\int_0^2\int_0^3 f(x, y, z)\,dz\,dy$ represents mass per unit of length, since you would need to stack a segment’s worth of rectangles to produce a volume.

Example 6.4.5
Integrating Over an Irregular Region

Let R be the region above the xy-plane, below the cylinder $x^2 + z^2 = 16$ and between $y = 0$ and $y = 3$. Compute $\iiint_R 4yz\,dV$.

Figure: The region between $x^2 + z^2 = 16$ and the xy-plane

Solution

The words “above” and “below” are useful hints here. “Above the xy-plane” indicates that for each (x, y) the lower bound of z will be on the xy-plane, where z = 0. “Below the cylinder” indicates that the upper bound of z will satisfy $x^2 + z^2 = 16$, which solves to $z = \pm\sqrt{16 - x^2}$. Since these z values are above the xy-plane, the positive branch must be the upper bound. Thus our inner integral is

$$\int_0^{\sqrt{16 - x^2}} 4yz\,dz$$

To complete the middle and outer bounds, we consider what x and y values lie in R. The planes y = 0 and y = 3 give bounds for y, but on their own they do not enclose a bounded region in the xy-plane. Where else can we get information? Since R is bounded below by z = 0 and above by $z = \sqrt{16 - x^2}$, it is also bounded by where these surfaces meet. We can solve for that intersection:

$$
\begin{aligned}
0 &= \sqrt{16 - x^2} \\
0 &= 16 - x^2 \\
0 &= (4 + x)(4 - x) \\
x &= -4 \text{ or } x = 4
\end{aligned}
$$

Putting this together with the bounds we already have, we see that our x and y bounds are rectangular. We set them up as we would in a double integral and put the inner integral as an integrand:

$$\int_{-4}^4\int_0^3\int_0^{\sqrt{16 - x^2}} 4yz\,dz\,dy\,dx$$

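We now turn to evaluating the integral. Having a function of x in our z-bounds should be familiar from double integrals.

$$
\begin{aligned}
\int_{-4}^4\int_0^3\int_0^{\sqrt{16 - x^2}} 4yz\,dz\,dy\,dx &= \int_{-4}^4\int_0^3 \left[2yz^2\right]_0^{\sqrt{16 - x^2}} dy\,dx \\
&= \int_{-4}^4\int_0^3 2y\left(\sqrt{16 - x^2}\right)^2 dy\,dx \\
&= \int_{-4}^4\int_0^3 32y - 2yx^2\,dy\,dx \\
&= \int_{-4}^4 \left[16y^2 - x^2y^2\right]_0^3 dx \\
&= \int_{-4}^4 144 - 9x^2\,dx \\
&= \left[144x - 3x^3\right]_{-4}^4 \\
&= 576 - 192 - (-576 + 192) \\
&= 768
\end{aligned}
$$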
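The same kind of numerical check works when a bound depends on x. In the Python sketch below, `iterated_triple_integral` is a hypothetical midpoint-rule helper written for illustration only, and the z upper bound is `math.sqrt(16 - x * x)`. Midpoints in x never reach ±4, so the square root is always of a positive number. The result should be close to 768.

```python
import math


def iterated_triple_integral(f, x_lo, x_hi, y_lo, y_hi, z_lo, z_hi, n=60):
    """Midpoint-rule approximation of the integral of f dz dy dx.

    y_lo, y_hi are functions of x; z_lo, z_hi are functions of (x, y).
    """
    hx = (x_hi - x_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        a, b = y_lo(x), y_hi(x)
        hy = (b - a) / n
        for j in range(n):
            y = a + (j + 0.5) * hy
            c, d = z_lo(x, y), z_hi(x, y)
            hz = (d - c) / n
            inner = sum(f(x, y, c + (k + 0.5) * hz) for k in range(n))
            total += inner * hz * hy * hx
    return total


value = iterated_triple_integral(
    lambda x, y, z: 4 * y * z,
    -4, 4,                                                  # -4 <= x <= 4
    lambda x: 0.0, lambda x: 3.0,                           # 0 <= y <= 3
    lambda x, y: 0.0, lambda x, y: math.sqrt(16 - x * x),   # 0 <= z <= sqrt(16 - x^2)
)
print(value)  # close to 768
```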

Main Idea

The following approach will produce the bounds of a region with a top surface and a bottom surface.

1 The z bounds are given by the equations z = f (x, y) and z = g(x, y) of the top and bottom
surface.
2 The intersection of the top and bottom surface can produce relevant bounds on x and y. We can
graph these, along with any given bounds involving x and y.
3 After drawing the bounded region in the xy-plane, the x and y bounds are computed as for a
double integral.

Like with double integrals, we will want to break the region into smaller pieces in some cases. In
other cases, we may want to change the order of integration.

Example 6.4.6
A Solid Given by Vertices

Suppose we want to integrate over T , the tetrahedron (pyramid) with vertices (0, 0, 0), (4, 0, 0),
(4, 2, 0) and (4, 0, 2). How would we set up the bounds of integration?

Figure: z bounds of T Figure: x, y bounds of T

Solution

In this case, it is helpful to draw a diagram of the tetrahedron in three-space. First we examine the inner integral. The bounds of z are functions of (x, y). Visually, we want to imagine the vertical segments lying in different parts of T and ask where their upper and lower endpoints lie. No matter which vertical segment we pick, its lower endpoint is on the xy-plane and its upper endpoint is on the triangle with vertices (0, 0, 0), (4, 2, 0) and (4, 0, 2).
The xy-plane gives us a lower bound z = 0. The upper bound triangle also lies in a plane. Every upper endpoint lies in this plane, so its z coordinates must satisfy the equation of that plane. This plane has a z-intercept of 0 since (0, 0, 0) is a vertex of the triangle. We can solve for the slopes and write the equation.

$$
\begin{aligned}
m_x &= \frac{2-0}{4-0} = \frac{1}{2}\\
m_y &= \frac{2-0}{0-2} = -1\\
z &= \tfrac{1}{2}x - 1y + 0\\
z &= \tfrac{1}{2}x - y
\end{aligned}
$$

Our inner integral is


$$
\int_{0}^{\frac{1}{2}x - y} f(x, y, z)\,dz
$$

To find the outer bounds, we ask what values of (x, y) occur in T. Every point in T lies directly above the
triangle with vertices (0, 0, 0), (4, 2, 0) and (4, 0, 0), so its (x, y) coordinates match those of a point
in that triangle. We can draw this triangle in the xy-plane and set up the bounds of a double integral
over it. The result is
$$
\int_{0}^{4}\int_{0}^{\frac{1}{2}x}\int_{0}^{\frac{1}{2}x - y} f(x, y, z)\,dz\,dy\,dx
$$
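To check these bounds, we can set f = 1: the integral then gives the volume of T, which should agree with the standard formula $|\det(v_1, v_2, v_3)|/6$ for a tetrahedron whose edge vectors from one vertex are $v_1, v_2, v_3$. The sketch below (function names are our own) computes both in Python. With f = 1 the z-integral is just the height $\tfrac{1}{2}x - y$; the y- and x-integrals use the midpoint rule, so that value should match the exact 8/3 to several decimal places.

```python
from fractions import Fraction

def tetra_volume_from_vertices(p0, p1, p2, p3):
    """|det(p1 - p0, p2 - p0, p3 - p0)| / 6, computed exactly."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    c = [p3[i] - p0[i] for i in range(3)]
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return Fraction(abs(det), 6)

def volume_from_bounds(n=400):
    """Integrate f = 1 over 0<=x<=4, 0<=y<=x/2, 0<=z<=x/2 - y.

    The z-integral of 1 is the height x/2 - y; the y- and x-integrals
    use the midpoint rule with n subintervals each.
    """
    total = 0.0
    dx = 4 / n
    for i in range(n):
        x = (i + 0.5) * dx
        dy = (x / 2) / n
        for j in range(n):
            y = (j + 0.5) * dy
            total += (x / 2 - y) * dy * dx
    return total

exact = tetra_volume_from_vertices((0, 0, 0), (4, 0, 0), (4, 2, 0), (4, 0, 2))
print(exact, volume_from_bounds())
```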

Main Idea

In the case of a polyhedron given by vertices, we generally need to plot the vertices and draw the faces
to discern the upper and lower z bounds. The equations of these bounds are planes. We can then draw
the set of possible (x, y) in two-space and proceed as in a double integral.
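The plane through three vertices can also be found mechanically with a cross product, which helps when the slopes are hard to read off a picture. This Python sketch uses `plane_through`, a hypothetical helper of our own; it assumes the face is not vertical, so that z is a function of (x, y). Applied to the top face of T from Example 6.4.6, it recovers $z = \tfrac{1}{2}x - y$.

```python
from fractions import Fraction

def plane_through(p, q, r):
    """Return (a, b, c) with z = a*x + b*y + c for the plane through p, q, r.

    Assumes the plane is not vertical (its normal has a nonzero z-component).
    """
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    # Normal vector n = u x v.
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    if n[2] == 0:
        raise ValueError("plane is vertical; z is not a function of (x, y)")
    # n . (X - p) = 0 solved for z gives slopes -n_x/n_z and -n_y/n_z.
    a = Fraction(-n[0], n[2])
    b = Fraction(-n[1], n[2])
    c = p[2] - a * p[0] - b * p[1]
    return a, b, c

a, b, c = plane_through((0, 0, 0), (4, 2, 0), (4, 0, 2))
print(a, b, c)   # prints: 1/2 -1 0
```

A vertical face, such as the face of T in the plane x = 4, raises an error instead; such faces never serve as a top or bottom z bound.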

Example 6.4.7
Changing the Order of Integration

Suppose D is the bounded region enclosed between the graph of $y = 4x^2 + z^2$ and the plane y = 4.
Set up the bounds of the integral $\iiint_D f(x, y, z)\,dV$.

Figure: A region bounded by a paraboloid and a plane


Solution 1

We can begin by finding bounds for z, the inner variable. The equation of the plane y = 4 does not
involve z, so z is a free variable: the plane extends in the z direction and cannot be the top or bottom
of a vertical segment. On the other hand, $y = 4x^2 + z^2$ solves to $z = \pm\sqrt{y - 4x^2}$. Since this
gives a plus and a minus branch, it can provide both the upper and lower bounds of z. The inner integral is

$$
\int_{-\sqrt{y-4x^2}}^{\sqrt{y-4x^2}} f(x, y, z)\,dz
$$

For the xy bounds we have the equation y = 4, but this alone does not bound any region. We can search
for additional bounds by seeing where the top surface meets the bottom. We'll use the fact that a square
root can only equal its negative if both are 0.

$$
\begin{aligned}
-\sqrt{y - 4x^2} &= \sqrt{y - 4x^2}\\
y - 4x^2 &= 0\\
y &= 4x^2
\end{aligned}
$$

We can add this parabola to a graph. We set up our xy bounds using our usual method for double
integrals. The graphs $y = 4x^2$ and y = 4 intersect at x = ±1. Between x = −1 and x = 1, y = 4 is
greater than $y = 4x^2$. Here are the bounds.

$$
\int_{-1}^{1}\int_{4x^2}^{4}\int_{-\sqrt{y-4x^2}}^{\sqrt{y-4x^2}} f(x, y, z)\,dz\,dy\,dx
$$
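As a hedged check of these bounds, set f = 1 so the integral returns the volume of D. Each cross-section y = c of D is the ellipse $4x^2 + z^2 \le c$, with semi-axes $\sqrt{c}/2$ and $\sqrt{c}$ and area $\pi c/2$, so the volume is $\int_0^4 \tfrac{\pi y}{2}\,dy = 4\pi$ (our own side computation, not from the text). The Python sketch below evaluates the z-integral exactly as the segment length $2\sqrt{y - 4x^2}$ and uses the midpoint rule for y and x.

```python
import math

def solution1_volume(n=400):
    """Bounds: -1<=x<=1, 4x^2<=y<=4, -sqrt(y-4x^2)<=z<=sqrt(y-4x^2), f = 1."""
    total = 0.0
    dx = 2 / n
    for i in range(n):
        x = -1 + (i + 0.5) * dx
        y_lo, y_hi = 4 * x * x, 4.0
        dy = (y_hi - y_lo) / n
        for j in range(n):
            y = y_lo + (j + 0.5) * dy
            # Inner integral of 1 dz from -sqrt(y - 4x^2) to sqrt(y - 4x^2).
            total += 2 * math.sqrt(y - 4 * x * x) * dy * dx
    return total

print(solution1_volume(), 4 * math.pi)
```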

The bounds in this solution look difficult to work with. For example, in the first step, we'll plug
$\sqrt{y - 4x^2}$ in for z in the antiderivative of f. The resulting integrand would be even more difficult
to work with. We can improve this situation somewhat by choosing a different variable for our inner
integral.

Solution 2

Since both bounds are already solved for y, we will use y as our inner variable. We can test which is
the upper and which is the lower bound with a test point, but we don’t yet know which x and z values
lie in the region. We do not have any x or z bounds that don’t involve y, so we set the y-bounds equal
to each other.

$$
4x^2 + z^2 = 4
$$

We may recognize this as an ellipse. Even if we do not, we can proceed as usual for a double integral,
except that our variables are x and z. We will use z as the inner variable and solve the bound for z.

$$
\begin{aligned}
4x^2 + z^2 &= 4\\
z^2 &= 4 - 4x^2\\
z &= \pm\sqrt{4 - 4x^2}
\end{aligned}
$$
These give upper and lower bounds for z. To find x bounds, we solve for where the z-bounds intersect.
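$$
\begin{aligned}
-\sqrt{4 - 4x^2} &= \sqrt{4 - 4x^2}\\
4 - 4x^2 &= 0 && \text{a square root equals its negative only at } 0\\
4(1 - x)(1 + x) &= 0\\
x = -1 &\text{ or } x = 1
\end{aligned}
$$

This gives us the bounds of the outer integrals:

$$
\int_{-1}^{1}\int_{-\sqrt{4-4x^2}}^{\sqrt{4-4x^2}} \ldots\,dz\,dx
$$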

We still need a test point for the y bounds. We can choose x = 0 since that is between −1 and 1, and
z = 0 since that is between $-\sqrt{4 - 4(0)^2}$ and $\sqrt{4 - 4(0)^2}$. We plug them into both y bounds
and see that y = 4 is the upper bound:

$$
y = 4(0)^2 + 0^2 = 0 \quad\text{vs}\quad y = 4
$$

Our final integral is

$$
\int_{-1}^{1}\int_{-\sqrt{4-4x^2}}^{\sqrt{4-4x^2}}\int_{4x^2+z^2}^{4} f(x, y, z)\,dy\,dz\,dx.
$$
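The same volume check works for Solution 2, and it illustrates the point about delaying the square roots. With f = 1, the y-integral is $4 - 4x^2 - z^2 = s^2 - z^2$ where $s = \sqrt{4 - 4x^2}$, and the z-integral of that over $[-s, s]$ is $2s^3 - \tfrac{2}{3}s^3 = \tfrac{4}{3}s^3$. Only the outer x-integral then needs numerical work. The Python sketch below (our own) also repeats the test-point comparison from the text; the result should again be close to $4\pi$.

```python
import math

def solution2_volume(n=2000):
    """Bounds: -1<=x<=1, -s<=z<=s with s = sqrt(4 - 4x^2), 4x^2+z^2<=y<=4, f = 1.

    Innermost:  integral of 1 dy from 4x^2 + z^2 to 4 is s^2 - z^2.
    Middle:     integral of (s^2 - z^2) dz from -s to s is (4/3) s^3.
    Only the outer x-integral is done numerically (midpoint rule).
    """
    total = 0.0
    dx = 2 / n
    for i in range(n):
        x = -1 + (i + 0.5) * dx
        s = math.sqrt(4 - 4 * x * x)
        total += (4 / 3) * s ** 3 * dx
    return total

# Test point from the text: x = 0, z = 0 decides which y bound is on top.
x0, z0 = 0.0, 0.0
lower, upper = 4 * x0 ** 2 + z0 ** 2, 4.0
print(lower < upper, solution2_volume(), 4 * math.pi)
```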

We still have difficult z bounds under this method, but we delay plugging them in until the second
step, which means they may cause less trouble for us.

Main Idea

When setting up a triple integral bounded by graphs, it may be more convenient to use an inner variable
that has a less complicated relationship with the bounding equations.

Question 6.4.8
When Does a Triple Integral Decompose as a Product?

The product theorem from double integrals also works here:

Theorem

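If $y_1, y_2, z_1$ and $z_2$ are constants, then

$$
\int_{x_1}^{x_2}\int_{y_1}^{y_2}\int_{z_1}^{z_2} f(x)g(y)h(z)\,dz\,dy\,dx
= \left(\int_{x_1}^{x_2} f(x)\,dx\right)\left(\int_{y_1}^{y_2} g(y)\,dy\right)\left(\int_{z_1}^{z_2} h(z)\,dz\right)
$$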
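The theorem can be illustrated numerically. On a box, the midpoint-rule sum of a product $f(x)g(y)h(z)$ factors into a product of three one-dimensional midpoint sums (up to rounding), mirroring the theorem. In this Python sketch, the choices $f = \cos$ on $[0, \pi/2]$, $g = \exp$ on $[0, 1]$ and $h(z) = z^2$ on $[0, 2]$ are arbitrary examples of our own; the exact product is $1 \cdot (e - 1) \cdot \tfrac{8}{3}$.

```python
import math

def midpoint_1d(g, a, b, n):
    """Midpoint-rule approximation of the integral of g from a to b."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

def midpoint_3d(F, x_rng, y_rng, z_rng, n):
    """Midpoint-rule approximation of a triple integral over a box."""
    (x1, x2), (y1, y2), (z1, z2) = x_rng, y_rng, z_rng
    hx, hy, hz = (x2 - x1) / n, (y2 - y1) / n, (z2 - z1) / n
    total = 0.0
    for i in range(n):
        x = x1 + (i + 0.5) * hx
        for j in range(n):
            y = y1 + (j + 0.5) * hy
            for k in range(n):
                z = z1 + (k + 0.5) * hz
                total += F(x, y, z)
    return total * hx * hy * hz

f, g, h = math.cos, math.exp, lambda z: z * z
n = 40
triple = midpoint_3d(lambda x, y, z: f(x) * g(y) * h(z),
                     (0, math.pi / 2), (0, 1), (0, 2), n)
product = (midpoint_1d(f, 0, math.pi / 2, n)
           * midpoint_1d(g, 0, 1, n)
           * midpoint_1d(h, 0, 2, n))
exact = 1 * (math.e - 1) * 8 / 3
print(triple, product, exact)
```

If any of the y or z bounds depended on an outer variable, the inner sums would change from one outer point to the next and this factorization would fail, which is why the theorem requires constant bounds.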


Example

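Along with the sum and constant multiple rules, we can simplify

$$
\int_{0}^{4}\int_{0}^{2}\int_{0}^{3} \left(3zy + x^2\right) dz\,dy\,dx
$$

to obtain the following:

$$
\begin{aligned}
&\int_{0}^{4}\int_{0}^{2}\int_{0}^{3} 3zy\,dz\,dy\,dx + \int_{0}^{4}\int_{0}^{2}\int_{0}^{3} x^2\,dz\,dy\,dx\\
&= 3\int_{0}^{4} dx\int_{0}^{2} y\,dy\int_{0}^{3} z\,dz + \int_{0}^{4} x^2\,dx\int_{0}^{2} dy\int_{0}^{3} dz\\
&= 3 \cdot 4\int_{0}^{2} y\,dy\int_{0}^{3} z\,dz + 2 \cdot 3\int_{0}^{4} x^2\,dx
\end{aligned}
$$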
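Finishing the example is now a matter of single integrals. A short exact check in Python (our own sketch, using `fractions.Fraction` and the power rule; `power_integral` is a hypothetical helper name) gives 108 for the first term and 128 for the second, so the original triple integral equals 236.

```python
from fractions import Fraction

def power_integral(k, a, b):
    """Exact value of the integral of t**k from a to b (integer k >= 0)."""
    return Fraction(b ** (k + 1) - a ** (k + 1), k + 1)

# First term: 3 * (int_0^4 dx) * (int_0^2 y dy) * (int_0^3 z dz)
term1 = 3 * power_integral(0, 0, 4) * power_integral(1, 0, 2) * power_integral(1, 0, 3)
# Second term: (int_0^4 x^2 dx) * (int_0^2 dy) * (int_0^3 dz)
term2 = power_integral(2, 0, 4) * power_integral(0, 0, 2) * power_integral(0, 0, 3)
print(term1, term2, term1 + term2)   # prints: 108 128 236
```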

Section 6.4
Exercises

Summary Questions

Q1 What does Fubini’s theorem say about integrals with dV ?

Q2 How is density used to understand triple integrals? Why wasn’t it necessary or appropriate for
double integrals?

Q3 How do you find the bounds of the inner variable in a triple integral?

Q4 How do you find the bounds of the other two variables?

6.4.1

Q5 Suppose we want to approximate $\int_{0}^{3}\int_{2}^{10}\int_{-2}^{2} f(x, y, z)\,dz\,dy\,dx$ by subdividing the domain of
integration into 12 sub-prisms of equal volume. What will ΔV be?

Q6 Let R be a cube of side length 4, with edges parallel to the x-, y- and z-axes, and with vertices
(0, 0, 0) and (4, 4, 4). Suppose we want to approximate $\iiint_R xyz\,dV$ using a subdivision of R
into 8 identical cubes.

a What will ΔV be?

b What test points would you use to make your approximation as large as possible?

c Produce the smallest possible approximation using this subdivision.

6.4.2

Q7 Is $\int_{0}^{4}\int_{0}^{7}\int_{0}^{2} f(x, y, z)\,dz\,dy\,dx = \int_{0}^{7}\int_{0}^{2}\int_{0}^{4} f(x, y, z)\,dz\,dy\,dx$? Explain.

Q8 Set up the bounds of integration of a function f(x, y, z) over a general prism

$$
P = \{(x, y, z) : x_0 \le x \le x_1,\ y_0 \le y \le y_1,\ z_0 \le z \le z_1\}
$$

Q9 Evaluate $\int_{0}^{2}\int_{0}^{2}\int_{0}^{3} (x + y)z\,dz\,dy\,dx$.
Q10 Evaluate $\int_{0}^{5}\int_{0}^{11}\int_{-1}^{1} ye^{2x+z}\,dz\,dy\,dx$.


6.4.3

Q11 In a triple integral, the inner integral $\int_{z_0}^{z_1} f(x, y, z)\,dz$ is a function of x and y, while
$\int_{y_0}^{y_1}\int_{z_0}^{z_1} f(x, y, z)\,dz\,dy$ is a function of only x.

a Explain why this occurs algebraically.

b Explain why this makes sense given the context of an iterated triple integral.

Q12 In each of the following questions, assume x, y, z, and w are the variables of four-space.

a What is the dimension of the set of points that satisfy x = x0 ?

b What is the dimension of the set of points that satisfy both x = x0 and y = y0 ?

Q13 Give the area of the x = 4 and y = 1 cross-section of the region “under” the graph of $w = x + ye^z$
and “above” the prism

P = {(x, y, z) : 0 ≤ x ≤ 6, 0 ≤ y ≤ 4, 0 ≤ z ≤ 3}


z2 13−x2
Q14 Give the volume of the x = 2 cross-section of the region “under” the graph of w = y

and “above” the prism

P = {(x, y, z) : 0 ≤ x ≤ 3, 1 ≤ y ≤ 2, −3 ≤ z ≤ 3}

6.4.4

Q15 A prism of length ℓ, width w and height h can be defined by the inequalities

0≤x≤ℓ
0≤y≤w
0≤z≤h

Set up a triple integral to compute the volume of this prism. Verify that the value of this integral
matches the well-known volume formula, V = ℓwh.

478
Q16 Denser matter tends to sink to the bottom of a container. After sitting undisturbed for several
days, the density of a soil sample in the box

P = {(x, y, z) : 0 ≤ x ≤ 5, 0 ≤ y ≤ 4, 0 ≤ z ≤ 2}

is given by $\rho(x, y, z) = e^{-z/10}$. Find the total mass of the soil in the box.

Q17 Xavier, Yolanda and Zoe’s respective arrival times (in hours after noon) at a restaurant are given
by joint random variables X, Y and Z. The joint density function of X, Y and Z is

$$
f_{X,Y,Z}(x, y, z) = \begin{cases}
\frac{12}{11}\left(1 - x^2yz\right) & \text{if } 0 \le x \le 1,\ 0 \le y \le 1,\ 0 \le z \le 1\\
0 & \text{otherwise}
\end{cases}
$$

Compute the probability that they all arrive by 12:15.

Q18 Random variables X, Y and Z are uniform if their density function has the form

$$
f_{X,Y,Z}(x, y, z) = \begin{cases}
\frac{1}{V} & \text{if } (x, y, z) \text{ is in } R\\
0 & \text{otherwise}
\end{cases}
$$

where V is the volume of R. If X, Y and Z are uniform on

R = {(x, y, z) : 0 ≤ x ≤ 10, 0 ≤ y ≤ 10, 0 ≤ z ≤ 10},

compute P (X ≤ 4 and Z ≥ 3).

6.4.5

Q19 Let R be the region given by $x^2 + y^2 + z^2 \le 25$.

a Describe R geometrically.

b Set up the bounds of integration for $\iiint_R f(x, y, z)\,dV$.

c If we plug in the function f(x, y, z) = 1, do you happen to know the value of this integral?

Q20 Cheng is integrating over R, the region given by $x^2 + y^2 + z^2 \le 25$. He gives the following setup.
Is this valid?

$$
\int_{-\sqrt{25-y^2-z^2}}^{\sqrt{25-y^2-z^2}}\int_{-\sqrt{25-x^2-z^2}}^{\sqrt{25-x^2-z^2}}\int_{-\sqrt{25-x^2-y^2}}^{\sqrt{25-x^2-y^2}} f(x, y, z)\,dz\,dy\,dx
$$

Q21 Consider the domain

$$
D : \{(x, y, z) : y \ge 0,\ y \le -x,\ z \ge 9,\ z \le 25 - x^2 - y^2\}
$$

a Set up the bounds of $\iiint_D x\,dV$. Do not evaluate.

b Do you expect the integral in a to be positive, negative, or zero? In a sentence or two,
explain how you know without computing it.

Q22 Let $R = \{(x, y, z) : z \le 2x - y,\ z \ge 0,\ y \ge x^2\}$. Compute

$$
\iiint_R xz\,dV.
$$

Q23 Let R be the region enclosed by the graphs $z = x^2 + y^2$ and 2y − z = 0. Set up the bounds for
$\iiint_R (y - 1)\,dV$. Do not evaluate.

Q24 Set up a triple integral that will compute the volume enclosed by the planes x = 0, x = 5, y = 0,
z = 2y and z = 6. Do not evaluate.

Q25 Let R be the region enclosed by $z = x^2$, z = 16, y = 2 and y = 6. Set up and evaluate

$$
\iiint_R (x + z)\,dV.
$$

√ √
Q26 Let R be the region enclosed by y = 25 − x2 , z = 6 − y and z = y. Set up the bounds of
$\iiint_R g(x, y, z)\,dV$.

6.4.6

Q27 Let P be a square pyramid with vertices (0, 0, 0), (2, 0, 0), (2, 2, 0), (0, 2, 0) and (0, 0, 4).

a Explain why it might not be a good idea to use z as the inner variable when setting up the
bounds of $\iiint_P f(x, y, z)\,dV$.

b Set up the bounds using a different inner variable.

Q28 Set up the bounds of integration of $\iiint_T f(x, y, z)\,dV$, where T is a tetrahedron with vertices
(0, 0, 0), (8, 0, 0), (0, 6, 0) and (0, 0, 3).

6.4.7

Q29 Let R be the region over the first quadrant enclosed by y = x2 , x = 0, z = 0 and z = 4 − y.

Set up the integral $\iiint_R f(x, y, z)\,dV$

a with z as the inner variable

b with y as the inner variable

c with x as the inner variable

Q30 Let R be the region enclosed by the paraboloid $x = 3 - y^2 - z^2$ and the plane $y = \tfrac{1}{2}x$.

a Set up the integral $\iiint_R f(x, y, z)\,dV$ with z as the inner variable.

b Set up the integral $\iiint_R f(x, y, z)\,dV$ with x as the inner variable.

c Explain why it would be difficult to set up $\iiint_R f(x, y, z)\,dV$ with y as the inner variable.

Q31 Let P be the prism whose base has vertices (0, 0, 0), (0, 5, 0) and (0, 0, −2) and whose height is
4 units in the direction of the positive x axis. Set up a triple integral $\iiint_P g(x, y, z)\,dV$ in three
different ways, using three different inner variables.

Q32 Let P be the trapezoidal prism with vertices (0, 0, 0), (0, 6, 0), (0, 4, 2), (0, 0, 2), (5, 0, 0), (5, 6, 0),
(5, 4, 2), and (5, 0, 2). Set up the bounds of integration of $\iiint_P h(x, y, z)\,dV$ without writing
it as a sum or difference of multiple integrals.

Q33 Consider the tetrahedron T whose vertices are (0, 0, 0), (0, 0, 4), (0, 6, 3), (2, 6, 3). Which
variable(s) could you use as the inner variable of a triple integral over T without having to break the
domain into two or more pieces?

Q34 Set up (but do not evaluate) one or more integrals of f (x, y, z) over the region

$$
R = \{(x, y, z) : z \ge 0,\ x \ge y^2 + z^2,\ x + 2z \le 8\}
$$
Use dx dy dz as your order of integration.
Q35 Rewrite the integral $\int_{0}^{1}\int_{x^2}^{x}\int_{0}^{x-y} f(x, y, z)\,dz\,dy\,dx$ as an integral with the differential dx dz dy.
Q36 Rewrite the integral $\int_{0}^{2}\int_{2-x}^{2}\int_{0}^{4-x^2} f(x, y, z)\,dz\,dy\,dx$ as an integral with the differential dx dz dy.

6.4.8

Q37 Use product and sum rules to decompose $\int_{3}^{4}\int_{0}^{8}\int_{-1}^{1} \left(y^2\sin x - e^{y+z}\right) dz\,dy\,dx$ into an expression
containing only single integrals.
Q38 Let $S = \{(x, y, z) : x^2 + y^2 + z^2 \le 25\}$. Explain why $\iiint_S x^3y^4\cos(\pi z)\,dV$ cannot be
decomposed as a product.
Synthesis & Extension

Q39 Consider the domain

$$
D : \{(x, y, z) : x - 16 \le y \le 2,\ x^2 + y^2 \le z \le x^2 + x + 4\}
$$

a Set up the bounds of $\iiint_D xyz\,dV$. You may use one or more integrals to do so. Do not
evaluate.

b Does the function f (x, y, z) = xyz have a maximum value on D? Justify your answer with
a theorem, and verify that the theorem does or does not apply.

Q40 Let S be the region above z = 0 and below the graph z = f (x, y) over the rectangle

R = {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d}.

a Write the volume of S as a double integral.

b Write the volume of S as a triple integral.

c Show that if you evaluated your answer to b, your answer to a would be one of the steps
of this computation.

Q41 Suppose that R is the solid obtained by rotating the region under y = f (x) from x = a to x = b
around the x-axis. Write a triple integral that computes the volume of R.
