Math
Mike Carr
Contents
Introduction
3 Series
3.1 Taylor Polynomials
3.2 Sequences
3.3 Series
3.4 Power Series
3.5 Taylor Series
Introduction
So far in calculus you have developed the tools to answer the following questions about a function
of one variable:
These are all useful tools, but they don’t necessarily apply to the types of data that we encounter in
the world.
Data generally takes the form of a set of observations, rather than an algebraic function. How do
we perform calculus with such a set? We cannot integrate it without an antiderivative. In some cases,
the best functions to model our data are difficult to work with. We take for granted that sin x is a
useful function, but how do we even evaluate a quantity like sin(7.52)? In all these circumstances, the
best we can do is approximate. We will develop methods to approximate integrals and to approximate
functions.
Many measurable quantities can be found to depend on the value of multiple inputs. These are
multivariable functions like z = F (x, y), where z is a function of two independent variables. Examples
appear in all the sciences:
1 Chemistry: $V = \frac{nRT}{P}$
2 Physics: $F = \frac{GMm}{r^2}$
3 Economics: $P = P_0 e^{rt}$
We want to understand how to measure rates of change of these functions, and what these mea-
surements can do for us.
Furthermore, real world data does not come prepackaged with a differentiable function to describe
it. One approach is to find a line of best fit. Doing so requires optimizing two variables at once (slope
and intercept) to find the best fit.
The values of y may not be a function of x at all. Another view point is to see (x, y) as a randomly
chosen point in the plane. To model such random choices, we use a two-variable density function.
Volumes under its graph (computed by integrals) tell us where these random points are likely to lie.
These approaches will require us to use derivatives and integrals of multivariable functions.
Chapter 1
This chapter reviews the most important information about functions, limits, derivatives, and integrals.
It is not meant to teach this material to a first-time learner, but can serve as a reference or reminder.
Contents
1.1 Graphs of Functions
1.2 Limits and Derivatives
1.3 Applications of Derivatives
1.4 Definite Integrals
Section 1.1
Graphs of Functions
Goals:
Definition
The graph of an equation is the set of ordered pairs (x, y) that satisfy the equation. These are the
points whose coordinates, when plugged in for x and y, make the two sides of the equation equal.
Linear Functions
f (x) = mx + b.
If we have both the x- and y-intercepts of the line, it is convenient to write it in normal form
ax + by + c = 0
Monomials
Monomials of negative power have the form $f(x) = x^{-n}$. They are also commonly written
$f(x) = \frac{1}{x^n}$.
The graph $y = \frac{1}{x^n}$ has a vertical asymptote at x = 0.
The graph approaches the x-axis, y = 0, as x gets large.
Roots
The domain of $\sqrt[n]{x}$ is [0, ∞) if n is even and all real numbers if n is odd.
The x- and y-intercept of $y = \sqrt[n]{x}$ is at (0, 0).
Root functions are increasing. At x = 0, they travel straight up.
Figure: The graphs of $y = \sqrt{x}$ and $y = \sqrt[3]{x}$
Exponential Functions
Logarithms
Logarithms and exponents are inverse functions. We solve exponential equations by applying a
logarithm to both sides. We solve logarithm equations by exponentiating both sides.
$a^x = c \implies x = \log_a c$
$\log_a x = c \implies x = a^c$
Trigonometric Functions
$\tan x = \frac{\sin x}{\cos x}$
$\cot x = \frac{\cos x}{\sin x}$
$\sec x = \frac{1}{\cos x}$
$\csc x = \frac{1}{\sin x}$
Since trigonometric functions attain the same values infinitely many times, they do not technically
have inverse functions. However, we define inverse trigonometric functions on a restricted range.
$-\frac{\pi}{2} \le \sin^{-1} x \le \frac{\pi}{2}$
$0 \le \cos^{-1} x \le \pi$
$-\frac{\pi}{2} \le \tan^{-1} x \le \frac{\pi}{2}$
These functions provide one solution to a trigonometric equation. We can obtain the others by using
the periodic behavior of trigonometric functions.
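As an illustration of recovering the other solutions, here is a minimal Python sketch for the equation sin x = c. The function name `sine_solutions` and the choice c = 0.5 are assumptions made for this example. It relies on the identity sin(π − x) = sin x together with the period 2π.

```python
import math

def sine_solutions(c, k_range=range(-1, 2)):
    """Return solutions of sin(x) = c built from the principal value.

    math.asin gives the one solution in [-pi/2, pi/2]; the others follow
    from sin(pi - x) = sin(x) and the period 2*pi.
    """
    principal = math.asin(c)
    solutions = []
    for k in k_range:
        solutions.append(principal + 2 * math.pi * k)
        solutions.append(math.pi - principal + 2 * math.pi * k)
    return sorted(solutions)

# Six solutions of sin(x) = 0.5, two for each k in {-1, 0, 1}
roots = sine_solutions(0.5)
```

Widening `k_range` produces more of the infinitely many solutions.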
Question 1.1.1
How Do Transformations Affect the Graph of a Function?
Transformations
Suppose we would like to transform the graph y = f (x). Here are four ways we can.
Example 1.1.2
An Equation with Quotients
An equation of the form $\frac{f(x)}{g(x)} = 0$ is satisfied whenever f(x) = 0 but g(x) ≠ 0.
Example
Solve
$\frac{2x^2 - 3x - 5}{x^2 + 3x + 2} = 0$
Solution
So $x = \frac{5}{2}$ is the only solution.
If there are terms besides the quotient, move them all to the same side of the equation and use a
common denominator to combine them.
Example
Solve
$2 + \frac{x+3}{x+1} = \frac{4}{x}$
Solution
$2 + \frac{x+3}{x+1} - \frac{4}{x} = 0$   move to one side
$\frac{2x^2 + 2x}{x^2 + x} + \frac{x^2 + 3x}{x^2 + x} - \frac{4x + 4}{x^2 + x} = 0$   common denominator
$\frac{3x^2 + x - 4}{x^2 + x} = 0$   combine
set $3x^2 + x - 4 = 0$
$(3x + 4)(x - 1) = 0$   factor
$x = -\frac{4}{3}$ or $x = 1$
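Because a root of the numerator only counts when the denominator is nonzero there, it is worth substituting each candidate back into the original equation. The following is a small sketch of that check using exact rational arithmetic. The helper names `lhs` and `rhs` are introduced only for this illustration.

```python
from fractions import Fraction

def lhs(x):
    # Left side of the example equation: 2 + (x + 3)/(x + 1)
    return 2 + (x + 3) / (x + 1)

def rhs(x):
    # Right side of the example equation: 4/x
    return 4 / x

candidates = [Fraction(-4, 3), Fraction(1)]

# The common denominator x^2 + x must be nonzero at each candidate
denominators_ok = [x * x + x != 0 for x in candidates]

# Both sides must agree exactly at each candidate
checks = [lhs(x) == rhs(x) for x in candidates]
```

Using `Fraction` avoids rounding, so the equality test is exact rather than approximate.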
Section 1.1
Exercises
1.1
3
Q1 Simplify 52 54
Q2 Simplify $e^5 (e^4)^3$
Q5 Solve $2e^x - 7 = 22$
Q6 Solve 4 cos(2x) = 1
Q7 Solve $2\sin^2 x - 1 = 0$
Q8 Solve 2 ln(x − 5) = 16
Q9 Solve $4^{3x-2} = 15$
1.1.1
Q12 Graph y = − ln x + 5.
Q13 Graph $y = e^x - 4$.
√
3
Q14 Graph y = x + 3.
Q15 Graph $y = \frac{1}{(x - 2)^2}$.
Q16 Graph $y = -2\sqrt{x + 1} + 4$.
1.1.2
Q17 Solve for x: $\frac{x^2 + 5x - 6}{x - 1} = 0$
Q18 Solve for x: $\frac{e^x - 2}{x^2 + 2x - 3} = 0$
Q19 Solve for x: $\frac{3x^2 - 5}{2e^x - 7} = 0$
Q20 Solve for t: $\frac{\ln t - 4}{3 - t} = 0$
Q21 Solve for x: $\frac{\ln x - 4}{3 - x} = 0$
Q22 Solve for x: $\frac{3}{x + 2} = \frac{7}{x + 4}$
Q23 Solve for u: $\frac{5}{(u + 1)^2} = \frac{u}{u + 1}$
Section 1.2
3 Compute derivatives.
4 Use derivatives to understand graphs and vice versa.
Question 1.2.1
What Is a Limit?
If we can make f(x) arbitrarily close to some number L by considering only x in a small interval
(a, a + δ) then we say the limit of f as x approaches a from the right is L. We write:
$\lim_{x \to a^+} f(x) = L$
If f(x) cannot be made arbitrarily close to any number, then this limit does not exist.
Similarly, if we can make f(x) arbitrarily close to some number L by considering only x in a small
interval (a − δ, a) then we say the limit of f as x approaches a from the left is L. We write:
$\lim_{x \to a^-} f(x) = L$
If f(x) cannot be made arbitrarily close to any number, then this limit does not exist.
If both $\lim_{x \to a^+} f(x) = L$ and $\lim_{x \to a^-} f(x) = L$, we say the two-sided limit or just limit of f as x
approaches a is L. We write
$\lim_{x \to a} f(x) = L$
If either the limit from the left or the limit from the right does not exist, or if they do exist
but are not equal to each other, then the two-sided limit does not exist.
Figure: An interval of x values that produce values in a small neighborhood of L when plugged into
f (x).
Infinite Limits
If f(x) can be made arbitrarily large by considering only x in a small interval (a, a + δ) then we say the
limit of f as x approaches a from the right is ∞.
$\lim_{x \to a^+} f(x) = \infty$
This is a way of representing growth without bound. Infinite limits from the left are defined
analogously. Also analogous is our treatment of a function that decreases without bound. We say these
functions have limit −∞. If either one-sided limit at x = a is infinite, then the line x = a is a vertical
asymptote of y = f(x).
Example
Let $f(x) = \frac{1}{x}$.
$\lim_{x \to 0^+} f(x) = \infty$
$\lim_{x \to 0^-} f(x) = -\infty$
Figure: The graph of $y = \frac{1}{x}$
Vertical Asymptotes
There are only two common algebraic constructions that produce infinite limits.
A function of the form $\frac{f(x)}{g(x)}$ where $\lim_{x \to a} g(x) = 0$ and $\lim_{x \to a} f(x) \ne 0$.
Remark
∞ is not a number, so if $\lim_{x \to a^+} f(x) = \infty$ we would still say that $\lim_{x \to a^+} f(x)$ does not exist.
There are several limit laws that allow us to compute limits of combinations of simpler functions.
The following laws hold, provided that $\lim_{x \to a} f(x)$ and $\lim_{x \to a} g(x)$ exist.
$\lim_{x \to a} (f(x)g(x)) = \lim_{x \to a} f(x) \lim_{x \to a} g(x)$
$\lim_{x \to a} \left( \frac{f(x)}{g(x)} \right) = \frac{\lim_{x \to a} f(x)}{\lim_{x \to a} g(x)}$ provided that $\lim_{x \to a} g(x) \ne 0$
We can write similar statements for one-sided limits, though we need to be careful about directions in
the composition rule.
Question 1.2.2
What is Continuity?
Definition
Remark
This definition is useful if we already know we are dealing with a continuous function. For example
f(x) = sin x is continuous so
$\lim_{x \to \pi/6} \sin x = \sin \frac{\pi}{6} = \frac{1}{2}$
Theorem
3 Polynomials
4 Roots
5 Exponential functions
6 Logarithms
7 Trigonometric functions
8 f (x) = |x|
More complex functions made from continuous functions are also continuous.
Theorem
If f (x) and g(x) are continuous on their domains, and c is a constant, then the following are also
continuous on their domains
1 f (x) + g(x)
2 f (x) − g(x)
3 f (x)g(x)
4 $\frac{f(x)}{g(x)}$ (note that any x where g(x) = 0 is not in the domain)
Remark
Putting the above theorems together, we see that just about any function we can write using algebraic
and trigonometric expressions is continuous on its domain. This does not mean it is continuous
everywhere. $f(x) = \frac{1}{x}$ is not continuous at x = 0, for example.
Example 1.2.3
Computing a Limit
How do we compute $\lim_{x \to 3} \frac{x^2 - 7x + 12}{x - 3}$?
Solution
$f(x) = \frac{x^2 - 7x + 12}{x - 3}$ is continuous on its domain, but x = 3 is not in the domain. However, let g(x) = x − 4.
We know $\frac{x^2 - 7x + 12}{x - 3} = x - 4$ for every x except x = 3. Specifically, in any neighborhood around x = 3,
f(x) = g(x) at every point other than x = 3 itself, so they have the same limit.
$\lim_{x \to 3} \frac{x^2 - 7x + 12}{x - 3} = \lim_{x \to 3} (x - 4)$   because they agree around x = 3
$= 3 - 4 = -1$   because x − 4 is continuous
Question 1.2.4
What Is the Intermediate Value Theorem?
One early intuition for continuity is that the graph of the function can be drawn without any breaks.
There are many ways to formalize this idea. One of the most important is the following theorem.
If f is a continuous function on [a, b] and K is a number between f (a) and f (b), then there is some
number c between a and b such that f (c) = K.
This theorem essentially states that a continuous graph cannot get from one side of the line y = K
to the other without intersecting y = K. Notice that this theorem does not say exactly where this
intersection must occur, only that it must occur somewhere in the interval (a, b). It also does not rule
out the possibility of more than one such c existing.
Example
Solution
A root is a number c such that f (c) = 0. To prove such a root exists, we check the conditions of the
IVT.
f (x) is a sum of continuous functions, so it is continuous on its domain.
f (0) = 1
f (1) = e − 3 < 0
0 is between f (0) and f (1)
We conclude there is some c between 0 and 1 such that f (c) = 0.
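The IVT only promises that such a c exists. One way to actually narrow it down is to apply the theorem repeatedly to half of the interval, which is the bisection method. The sketch below is illustrative: the function e^x − 3x is a hypothetical choice, picked only because it matches the values f(0) = 1 and f(1) = e − 3 used above, and `bisect_root` is a name introduced here.

```python
import math

def bisect_root(f, a, b, tol=1e-10):
    """Shrink [a, b] around a root of a continuous f whose values at a and b
    have opposite signs (or one of which is zero)."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("0 is not between f(a) and f(b); the IVT does not apply")
    while b - a > tol:
        mid = (a + b) / 2
        fmid = f(mid)
        if fa * fmid <= 0:
            # Sign change (or a zero) on [a, mid]: keep the left half
            b, fb = mid, fmid
        else:
            # Otherwise the sign change is on [mid, b]: keep the right half
            a, fa = mid, fmid
    return (a + b) / 2

def f(x):
    # Hypothetical function, chosen only to match f(0) = 1 and f(1) = e - 3
    return math.exp(x) - 3 * x

c = bisect_root(f, 0.0, 1.0)
```

Each pass halves the interval, so the IVT guarantees a root inside it at every step.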
Question 1.2.5
What Is a Limit at Infinity?
Definition
If we can make f(x) arbitrarily close to some number L by considering only x in some interval
(n, ∞) then we say the limit of f as x approaches ∞ is L. We write:
$\lim_{x \to \infty} f(x) = L$
If f(x) cannot be made arbitrarily close to any number, then this limit does not exist.
Similarly, if we can make f(x) arbitrarily close to L by considering only x in some interval (−∞, n) then
we say the limit of f as x approaches −∞ is L. We write:
$\lim_{x \to -\infty} f(x) = L$
If either $\lim_{x \to \infty} f(x) = L$ or $\lim_{x \to -\infty} f(x) = L$, then y = L is a horizontal asymptote of the graph
y = f(x).
By observing graphs or using arithmetic intuition, we arrive at the following limits at infinity.
f(x)                       limit as x → ∞    limit as x → −∞    notes
$x^n$ (n odd)              ∞                 −∞                 n > 0
$x^n$ (n even)             ∞                 ∞                  n > 0
$\sqrt[n]{x}$ (n even)     ∞                 DNE                domain is x ≥ 0
$\sqrt[n]{x}$ (n odd)      ∞                 −∞
$\frac{1}{x^n}$            0                 0                  n > 0
$a^x$ (a > 1)              ∞                 0
$a^x$ (0 < a < 1)          0                 ∞
$\log_a x$                 ∞                 DNE                a > 1, domain is x > 0
sin x                      DNE               DNE                oscillates
$\tan^{-1} x$              $\frac{\pi}{2}$   $-\frac{\pi}{2}$
Question 1.2.6
How Do We Measure the Change in a Function?
Definition
$\frac{f(b) - f(a)}{b - a}$
This is also the slope of the secant line from (a, f(a)) to (b, f(b)) on the graph y = f(x).
Knowing the average rate of change over a range of inputs (or times) doesn’t tell us the rate of
change at a specific point (or moment). Geometrically, the rate of change at a point is the slope of the
tangent line to y = f(x) at a particular point (a, f(a)).
Figure: The graph y = f(x), with a and b marked on the x-axis
The secant lines get closer and closer to the tangent line (in slope) as b gets closer to a. This
suggests that we could take the limit of these approaching values to get the actual slope.
Definition
$\lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$
provided that this limit exists. This is also the slope of the tangent line to y = f(x) at (a, f(a)). Two
common notations for the derivative are
Prime notation: f′(a)
Leibniz notation: $\left. \frac{df}{dx} \right|_{x=a}$
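To see the secant slopes settle toward a single value, one can evaluate the difference quotient for shrinking h. This is a minimal numerical sketch. The function x² + 2x and the point a = 1 are illustrative choices, and `difference_quotient` is a name introduced here.

```python
def difference_quotient(f, a, h):
    """Slope of the secant line through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h

def f(x):
    # Illustrative function; any differentiable f could be used instead
    return x**2 + 2 * x

# Secant slopes at a = 1 for smaller and smaller h
steps = (1.0, 0.1, 0.01, 0.001)
slopes = [difference_quotient(f, 1.0, h) for h in steps]
```

For this f the quotient simplifies algebraically to 4 + h, so the printed slopes approach 4 as h shrinks. A numerical table like this suggests the limit but does not prove it.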
We can attempt to compute the derivative at any point a. We can put these values together to
create a function f ′ (x).
Definition
The derivative function of f(x) is the function that takes the value
$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$
at each x.
We can denote the derivative function as f′(x) or $\frac{df}{dx}$. The second can be rewritten $\frac{d}{dx} f$ to emphasize
that we are applying the differentiation operation to the function f.
Example
Solution
$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$   definition of derivative
$= \lim_{h \to 0} \frac{(x + h)^2 + 2(x + h) - x^2 - 2x}{h}$   plug in x and x + h
$= \lim_{h \to 0} \frac{x^2 + 2xh + h^2 + 2x + 2h - x^2 - 2x}{h}$   distribute
$= \lim_{h \to 0} \frac{2xh + h^2 + 2h}{h}$   cancel
$= \lim_{h \to 0} (2x + h + 2)$   functions agree except at h = 0 so limits are equal
$= 2x + 2$   2x + h + 2 is continuous in h
Theorem
If f ′ (x) > 0 for all x in some interval [a, b] then f (x) is increasing on [a, b].
If f ′ (x) < 0 for all x on [a, b] then f (x) is decreasing on [a, b].
We can take higher order derivatives by taking derivatives of derivatives. The derivative function
of f in this context is called the first derivative. Its derivative function is the second derivative. The
second derivative’s derivative function is the third derivative and so on.
Notation
first derivative     f′(x)           $\frac{df}{dx}$
second derivative    f′′(x)          $\frac{d^2 f}{dx^2}$
third derivative     f′′′(x)         $\frac{d^3 f}{dx^3}$
fourth derivative    $f^{(4)}(x)$    $\frac{d^4 f}{dx^4}$
fifth derivative     $f^{(5)}(x)$    $\frac{d^5 f}{dx^5}$
The sign of a higher order derivative tells us how the derivative of one order lower is changing. For
example if $\frac{d^5 f}{dx^5} < 0$, then $\frac{d^4 f}{dx^4}$ is decreasing. The sign of higher order derivatives is difficult to discern
from the shape of y = f(x), with the exception of the second derivative.
Theorem
If f ′′ (x) > 0 on some interval, then y = f (x) is concave up on that interval. If f ′′ (x) < 0, then
y = f (x) is concave down.
Definition
A point a such that f (x) is concave up to one side of a and concave down to the other side is called
an inflection point.
Question 1.2.7
How Do We Compute Derivatives
The limit definition of a derivative is too unwieldy to use every time. A better approach is to learn
the derivatives of some simple functions, and then use theorems to compute derivatives when those
functions are combined.
$\frac{d}{dx} c = 0$ (derivative of a constant is 0)
$\frac{d}{dx} x^n = n x^{n-1}$ for any n ≠ 0 (The Power Rule)
$\frac{d}{dx} \sin x = \cos x$
$\frac{d}{dx} \cos x = -\sin x$
$\frac{d}{dx} e^x = e^x$
$\frac{d}{dx} a^x = a^x \ln a$ for a > 0
$\frac{d}{dx} \ln x = \frac{1}{x}$
Theorem
The following rules allow us to differentiate functions made of simpler functions whose derivative we
know.
Example
Compute $\frac{d}{dx} \tan(x)$
Solution
$\tan x = \frac{\sin x}{\cos x}$. We apply the quotient rule
Application 1.2.8
The Shape of a Graph
What can the first and second derivative of $f(x) = 8x^3 - x^4$ tell us about the shape of its graph?
Solution
We will compute the first and second derivative using the power rule. Factoring them will allow us to
perform a sign analysis.
$f'(x) = 24x^2 - 4x^3 = 4x^2(6 - x)$
$f''(x) = 48x - 12x^2 = 12x(4 - x)$
Sign of f′(x):
interval     (−∞, 0)    (0, 6)    (6, ∞)
$4x^2$       +          +         +
(6 − x)      +          +         −
f′(x)        +          +         −
Sign of f′′(x):
interval     (−∞, 0)    (0, 4)    (4, ∞)
12x          −          +         +
(4 − x)      +          +         −
f′′(x)       −          +         −
From the sign of f′(x) we conclude f is increasing on (−∞, 0) and (0, 6) but decreasing on (6, ∞).
From the sign of f′′(x) we conclude that f is concave down on (−∞, 0) and (4, ∞), but concave up
on (0, 4).
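A sign analysis like this can be spot-checked by evaluating each derivative at one sample point per interval. The sketch below does that. The sample points and helper names are arbitrary choices made for illustration.

```python
def f_prime(x):
    return 24 * x**2 - 4 * x**3      # derivative of 8x^3 - x^4

def f_double_prime(x):
    return 48 * x - 12 * x**2        # derivative of f_prime

def sign(value):
    return "+" if value > 0 else "-" if value < 0 else "0"

# One sample point inside each interval cut out by the zeros of each derivative:
# (-inf, 0), (0, 6), (6, inf) for f' and (-inf, 0), (0, 4), (4, inf) for f''
first_signs = {x: sign(f_prime(x)) for x in (-1, 3, 7)}
second_signs = {x: sign(f_double_prime(x)) for x in (-1, 2, 5)}
```

Since each derivative is continuous and nonzero inside each interval, a single sample point determines its sign on the whole interval.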
Section 1.2
Exercises
1.2.1
Q1 Given the graph of y = f (x) here, give the value of each of the following limits (if they exist).
Q2 Given the graph of y = g(x) here, give the value of each of the following limits (if they exist).
1.2.2
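Q3 Explain why $f(x) = \frac{e^x}{x^2 + 3}$ is continuous on R.
Q4 Explain why $f(x) = \sqrt{\sin(3x^2)}$ is continuous on its domain.
Q5 Is
$f(x) = \begin{cases} \sin(2x) & \text{if } x < 0 \\ 4 & \text{if } x = 0 \\ -x^2 & \text{if } x > 0 \end{cases}$
Q6 Is
$f(x) = \begin{cases} x^3 - 2x + 1 & \text{if } x < 0 \\ e^x & \text{if } x \ge 0 \end{cases}$
Q7 Is
$f(x) = \begin{cases} x + 5 & \text{if } x < 1 \\ 6 & \text{if } x = 1 \\ x^2 + 4x + 1 & \text{if } x > 1 \end{cases}$
Q8 Where is
$f(x) = \begin{cases} \cos(\pi x) & \text{if } x < 4 \\ 1 & \text{if } x = 4 \\ \sqrt{x - 3} & \text{if } x > 4 \end{cases}$
continuous?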
1.2.3
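Q9 Compute $\lim_{x \to 3} \frac{x - 3}{x^2 - 9}$
Q10 Compute $\lim_{x \to 1} \frac{x^2 - 4x + 3}{x - 1}$
Q11 Compute $\lim_{x \to 9} \frac{2x - 18}{\sqrt{x} - 3}$
Q12 Compute $\lim_{x \to 4} \frac{\frac{1}{x^2} - \frac{1}{16}}{x - 4}$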
1.2.4
Q14 Explain why $\sqrt[3]{x} = \log_2 x$ has a solution in [0, 8].
Q15 What does the Intermediate Value Theorem say about whether $f(x) = \frac{1}{x} - \frac{1}{2}$ has a root in
[−1, 1]?
Q16 Consider the equation $\sin x = \frac{3}{4}$. Gloria computes $\sin \frac{\pi}{3} = \frac{\sqrt{3}}{2}$ and $\sin \frac{5\pi}{6} = \frac{1}{2}$. Since $\frac{3}{4}$ is not
between $\frac{1}{2}$ and $\frac{\sqrt{3}}{2}$, she concludes that $\sin x = \frac{3}{4}$ has no roots in $\left[ \frac{\pi}{3}, \frac{5\pi}{6} \right]$. What do you think
of Gloria’s reasoning?
1.2.5
Q17 Compute $\lim_{x \to \infty} \frac{x^2 + 2x - 9}{3x - 6}$.
Q18 Compute $\lim_{x \to \infty} \frac{4x^2 - 7x + 9}{2x^2 + 11}$.
Q19 Compute $\lim_{x \to \infty} \sqrt{e^{1/x}}$.
Q20 Compute $\lim_{x \to \infty} \frac{1}{\ln x}$.
Q21 Compute $\lim_{x \to -\infty} e^{e^x}$.
1.2.6
b Give the equation of the secant line that meets y = f (x) at x = 2 and x = 5.
Q24 Let $f(x) = \sqrt{x}$. Compute the average rate of change of f between x = 4 and x = 9. Based on
the graph of y = f(x), is the instantaneous rate of change at x = 4 greater or less than this
average?
Q25 Let $f(x) = 3x^2 - 7$. Compute f′(6) using the limit definition of the derivative.
Q28 Let $f(x) = \sqrt{x}$. Compute f′(x) using the limit definition of the derivative.
1.2.7
5
a 5x7 − 3x2 + f cos(4x)
x2
4x5 − 2x2 + 3x + 4
b g sin(ex )
x
ex 2
d i ex sin x
x2
√ ln(x2 + 2)
e x−5 j
x2 + 3x
3 7
a + f e3x+2
x x3
5x4 + 3x3 − 8x2
b g cos(x3 + 2x)
x2
ln x 5
c h (cos x)3
x
2
d 4x sin(x) i ex sin3 x
√
e tan(2x + 7) j ln( x sin x)
Q32 Let $f(x) = e^{x^3}$. Compute f′′(x).
1.2.8
Q35 Where in its domain is $f(x) = 1024\sqrt{x} - x^4$ increasing?
Section 1.3
Applications of Derivatives
Goals:
Application 1.3.1
The Tangent Line to a Graph
Given a function f (x), the derivative f ′ (a) is the slope of the line tangent to y = f (x) at (a, f (a)).
Formula
y − f (a) = f ′ (a)(x − a)
We can rewrite the tangent line as a function of x. We call this a linearization, because this function
is linear and it approximates the value of f(x) for x near a.
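Solving the tangent line equation y − f(a) = f′(a)(x − a) for y gives the function L(x) = f(a) + f′(a)(x − a). The following sketch evaluates such a linearization numerically. The choice of f(x) = √x near a = 4 and the name `linearization` are assumptions made for this example.

```python
import math

def linearization(f, f_prime, a):
    """Return L(x) = f(a) + f'(a)(x - a), the tangent line at a as a function."""
    fa, slope = f(a), f_prime(a)
    return lambda x: fa + slope * (x - a)

# Illustrative choice: f(x) = sqrt(x) near a = 4, where f'(x) = 1 / (2 sqrt(x))
L = linearization(math.sqrt, lambda x: 1 / (2 * math.sqrt(x)), 4.0)

estimate = L(4.1)                          # value predicted by the tangent line
error = abs(estimate - math.sqrt(4.1))     # gap between the line and the graph
```

The error is small because 4.1 is close to a = 4. It grows as x moves farther from a.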
Formula
If we want to emphasize the change in x and y instead of their actual values we can use differential
notation:
Notation
Application 1.3.2
Maximum and Minimum Values of a Function
Definition
A number a is a maximum of a function f (x) if f (a) ≥ f (x) for all x in the domain of f .
a is a minimum if f (a) ≤ f (x) for all x in the domain of f .
Definition
A number a is a local maximum of a function f (x) if f (a) ≥ f (b) for all b in some neighborhood of a.
a is a local minimum if f (a) ≤ f (b) for all b in some neighborhood of a.
To distinguish ordinary maximums from the local variety, we sometimes call them global maximums
or absolute maximums. Every global maximum is a local maximum, but local maximums need not be
global maximums. If f′(a) > 0 then there are larger values of f(x) to the right of a and smaller values
to the left. Thus a cannot be a local maximum or minimum. The same argument applies if f′(a) < 0.
Definition
A critical point of f (x) is a value a in the domain of f such that either f ′ (a) = 0 or f ′ (a) does not
exist.
Local maximums and minimums of f (x) can only occur at critical points.
We can use concavity as a way to classify critical points. Knowing whether a graph is concave up
or concave down at a point where f ′ (x) = 0 allows us to visualize a small neighborhood of that point.
If f ′′ (a) = 0 or does not exist, then the test is inconclusive. a could be a local maximum, a local
minimum, or neither.
Example
What does the second derivative test tell you about the critical points of $f(x) = 8x^3 - x^4$?
Solution
$0 = 4x^2(6 - x)$   factor
x = 0 or x = 6
Now we compute the second derivative and evaluate it at each critical point.
f ′′ (6) < 0 so x = 6 is a local maximum. f ′′ (0) = 0 so the second derivative test cannot tell whether
x = 0 is a local maximum or local minimum (in fact it is neither).
Question 1.3.3
Does a Function Always Have a Maximum?
No. Many functions don’t have maximums, because as x gets larger and larger the values of f(x)
increase or decrease without bound. However, if we restrict the domain, we can sometimes guarantee a
maximum.
If f (x) is a continuous function on a closed domain [a, b] then f has an absolute maximum and an
absolute minimum on [a, b].
Remark
When the EVT applies, we can find the absolute maximum and minimum by process of elimination. A
maximum exists, so it must occur at a critical point. We can find the critical points and evaluate f at
each of them. Whichever has the greatest value is the maximum.
Note that a and b are always critical points because the derivative does not exist there. There is no
limit from the left at a because those points are outside the domain of f . Similarly, there is no limit
from the right at b.
Example
Compute the maximum and minimum value of $f(x) = 8x^3 - x^4$ on the domain [2, 8], if they exist.
Solution
f(x) is continuous and [2, 8] is closed, so the EVT guarantees that a maximum and minimum exist.
The first derivative test says that they can only occur at critical points.
$0 = 4x^2(6 - x)$   factor
x = 0 or x = 6
x = 0 is not in the domain, so we discard it. On the other hand x = 2 and x = 8 are also critical points
because the derivative does not exist there. To find which critical point is the maximum and which is
the minimum, we plug each into f and compare.
f(2) = (8)(8) − 16 = 48
f(6) = (8)(216) − 1296 = 432 (maximum)
f(8) = (8)(512) − 4096 = 0 (minimum)
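The comparison step is mechanical, so it is easy to sketch in code: evaluate f at every candidate and pick the largest and smallest values. The variable names below are introduced only for this illustration.

```python
def f(x):
    return 8 * x**3 - x**4

# Candidates on [2, 8]: the endpoints plus the interior point where f'(x) = 0.
# x = 0 also solves f'(x) = 0 but lies outside the domain, so it is left out.
candidates = [2, 6, 8]
values = {x: f(x) for x in candidates}

x_max = max(values, key=values.get)   # candidate with the greatest value of f
x_min = min(values, key=values.get)   # candidate with the least value of f
```

Integer arithmetic keeps every value exact, so the comparison involves no rounding.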
Application 1.3.4
L’Hôpital’s Rule
The limit rules tell us how to take limits of quotients, products, sums and differences. What happens
if one of the functions being divided goes to ∞, or if the denominator of a quotient goes to 0? In some
cases we can reason this out using our intuition of arithmetic.
Example
Consider $\lim_{x \to \infty} \frac{\tan^{-1}(x)}{\ln x}$.
$\lim_{x \to \infty} \tan^{-1} x = \frac{\pi}{2}$
$\lim_{x \to \infty} \ln x = \infty$
Since the numerators are approaching π/2 and the denominators are increasing without bound, we
conclude that this ratio gets smaller and smaller and has limit 0.
Definition
A limit of the form $\lim_{x \to a} \frac{f(x)}{g(x)}$ is of indeterminate form if either
Limits of products and sums can sometimes be rewritten as quotients of indeterminate form as well.
If $\lim_{x \to a} \frac{f(x)}{g(x)}$ is of indeterminate form, then it is equal to
$\lim_{x \to a} \frac{f'(x)}{g'(x)}$
provided that this second limit exists.
Often L’Hôpital’s Rule converts a limit of indeterminate form to one we can evaluate through intuition
or direct computation. Sometimes, we need to apply L’Hôpital’s Rule more than once.
Warning
If a limit is not of indeterminate form, then L’Hôpital’s Rule does not apply. Attempting to apply it will
usually give an incorrect value for the limit.
Example 1.3.5
A Limit of Indeterminate Form
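Evaluate $\lim_{x \to 0} \frac{e^x - x - 1}{x^2}$
Solution
$\lim_{x \to 0} \frac{e^x - x - 1}{x^2}$   $\frac{0}{0}$ form
$= \lim_{x \to 0} \frac{e^x - 1}{2x}$   L’Hôpital’s Rule, still $\frac{0}{0}$ form
$= \lim_{x \to 0} \frac{e^x}{2}$   L’Hôpital’s Rule again
$= \frac{1}{2}$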
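A quick numerical sanity check of a limit like this is to evaluate the original quotient at inputs approaching 0. The sketch below does so. Using `math.expm1` (which computes e^x − 1 accurately for small x) is a choice made here to reduce rounding error, and the sample inputs are arbitrary.

```python
import math

def ratio(x):
    # (e^x - x - 1) / x^2, written with expm1 to limit cancellation error near 0
    return (math.expm1(x) - x) / x**2

# Values of the quotient as x approaches 0 from the right
samples = [(x, ratio(x)) for x in (0.1, 0.01, 0.001)]
```

The computed values drift toward 0.5, which is consistent with the result of L’Hôpital’s Rule. Such a table supports the answer but is not a substitute for the argument.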
Section 1.3
Exercises
1.3.1
Q1 Write the equation of the tangent line to $y = \sqrt{x}$ at (4, 2).
Q2 Write the equation of the tangent line to $y = \frac{1}{x^2}$ at $\left( 5, \frac{1}{25} \right)$.
a Write the equation of the linearization of y = f(x) at $x = \frac{\pi}{3}$.
c Use a calculator to get decimal approximations of those numbers, then show how to approximate
sin(1).
Q4 Write a linearization of $f(x) = \frac{1}{x}$ at x = 3 and use it to approximate $\frac{1}{2.93}$.
Q5 A bacterial culture has mass 3 g after t = 5 hours of growth. At that time, its instantaneous rate
of growth is 0.2 g/hr.
a Write a linear function to approximate m(t) the mass of the culture at hour t.
Q6 A space capsule is descending from orbit. After 90 seconds, it is 10,000 m above sea level and
falling at 400 m per second.
b Use a to predict when the capsule will splash down into the ocean.
1.3.2
Q8 Find the critical points of $g(x) = x^4 - 18x^2 + 5$. Apply the second derivative test to each.
Q9 Find the critical points of $f(x) = x^3 - 75x$. Apply the second derivative test to each.
Q10 Find the critical points of $g(x) = e^x - 2x$. Apply the second derivative test to each.
1.3.3
Q11 Find the maximum and minimum values of $f(x) = x^{2/3}$ on [−8, 1].
Q12 Find the maximum and minimum values of $f(x) = x^3 - 75x$ on [−10, 10].
1.3.4
Q13 Evaluate $\lim_{x \to 0^+} \frac{x \cos(x - \pi)}{e^x - 1}$.
Q14 Evaluate $\lim_{x \to 0^+} \frac{e^{-3x} + 3x - 1}{\sin(x^2)}$.
Q15 Evaluate $\lim_{x \to \infty} \frac{x \ln x}{x^{5/2} + 3}$.
Section 1.4
Definite Integrals
Goals:
By definition, integrals compute area under a graph. The Fundamental Theorem of Calculus connects
integrals to antiderivatives, meaning that integrals can also be used to compute total change, given a
rate of change function.
Question 1.4.1
What Is an Antiderivative?
Definition
Example
$\frac{d}{dx} \left( \frac{x^2}{2} + 5 \right) = x$ so $F(x) = \frac{x^2}{2} + 5$ is an antiderivative of f(x) = x.
Notice that $\frac{x^2}{2} + 2$, $\frac{x^2}{2} - 6$, and $\frac{x^2}{2}$ are also antiderivatives of f(x) = x.
Functions have infinitely many antiderivatives. Adding a constant to one antiderivative produces
another, since the derivative of a constant is 0. In fact, this is the only relationship between antideriva-
tives.
Theorem
If F(x) and G(x) are antiderivatives of f(x), then there is a constant c such that
F (x) = G(x) + c.
Since the antiderivatives are related this way, it is easy to express all of the antiderivatives of a
function at once.
Definition
If F (x) is an antiderivative of f (x), then the general antiderivative of f (x) is the family of functions:
F (x) + c
Here is a table of antiderivatives that we can compute just by reverse engineering the derivatives we
already know.
Remark
Many familiar functions are missing from this list. This is because we just haven’t come across them as
derivatives of some other function. For instance, we do not yet know a function F (x) whose derivative
is ln x or tan x.
Question 1.4.2
How Do We Compactly Denote a Sum of Many Terms
Defining the definite integral requires us to add up many numbers. The problem is not just that
the number of summands is large. We need to be flexible about how many terms are in the sum. The
notation that gives us this flexibility is Σ notation.
Notation
Σ (‘sigma’) notation allows us to sum many different values of an expression using an index variable.
The index variable will be replaced by each integer between an initial and final value, and the resulting
outputs are added together.
$\sum_{k=1}^{n} f(k) = f(1) + f(2) + f(3) + \cdots + f(n)$
We may choose any variable as the index variable. The index variable could also have a different initial
value, if that is more convenient.
Example
$\sum_{j=3}^{7} \frac{j^2}{j + 1} = \frac{9}{4} + \frac{16}{5} + \frac{25}{6} + \frac{36}{7} + \frac{49}{8}$
Part of the challenge of writing a sum in Σ notation is choosing an f that will produce all the terms
of your sum.
Example 1.4.3
Writing a Sum in Σ Notation
a 4 + 7 + 10 + 13 + 16 + 19 + 22
b 2 + 6 + 18 + 54 + 162 + 486
c −3 + 4 − 5 + 6 − 7 + 8 − 9 + 10
d $\frac{1}{4} + \frac{\sqrt{2}}{9} + \frac{\sqrt{3}}{16} + \frac{2}{25} + \frac{\sqrt{5}}{36}$
Solution
a The terms increase by 3 each time. Repeated addition is multiplication, in this case 3k plus some
starting value. Starting with index k = 0 is convenient, because 3(0) = 0 at the starting value.
$4 + 7 + 10 + 13 + 16 + 19 + 22 = \sum_{k=0}^{6} (4 + 3k)$
b The terms are multiplied by 3 each time. Repeated multiplication is exponentiation, in this case
$3^k$ times some starting value. Starting with index k = 0 is convenient, because $3^0 = 1$ at the
starting value.
$2 + 6 + 18 + 54 + 162 + 486 = \sum_{k=0}^{5} (2)(3^k)$
c The absolute values of this sum could just be the values of the index variable. To create an
d $\frac{1}{4} + \frac{\sqrt{2}}{9} + \frac{\sqrt{3}}{16} + \frac{2}{25} + \frac{\sqrt{5}}{36} = \sum_{k=1}^{5} \frac{\sqrt{k}}{(k + 1)^2}$
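A proposed Σ expression can be checked by expanding it and comparing with the original sum. In Python the index variable becomes a loop variable, and `range(start, stop)` excludes `stop`, so the final index is `stop - 1`. The sketch below checks the sums from this example. The form used for c is one possible choice assumed here, not necessarily the one intended above.

```python
import math

# a: 4 + 7 + ... + 22 as the sum of 4 + 3k for k = 0..6
sum_a = sum(4 + 3 * k for k in range(0, 7))

# b: 2 + 6 + ... + 486 as the sum of 2 * 3^k for k = 0..5
sum_b = sum(2 * 3**k for k in range(0, 6))

# c (assumed form): -3 + 4 - 5 + ... + 10 as the sum of (-1)^k * k for k = 3..10
sum_c = sum((-1) ** k * k for k in range(3, 11))

# d: sqrt(k) / (k + 1)^2 for k = 1..5
sum_d = sum(math.sqrt(k) / (k + 1) ** 2 for k in range(1, 6))
```

Comparing each result with the sum written out term by term catches off-by-one mistakes in the initial or final index.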
Question 1.4.4
How Do We Compute the Area Under a Graph?
Suppose we would like to know the area below the graph y = f (x) between x = a and x = b. We
approximate this area by rectangles. We can improve these approximations and take a limit of such
improvements to compute the actual area. Here is the procedure.
1 Divide [a, b] into n subintervals, of lengths $\Delta x_i$.
2 Pick a point $x_i^*$ in each subinterval.
3 Evaluate $f(x_i^*)$, which is the height of the graph above $x_i^*$.
4 Produce a rectangle of height $f(x_i^*)$ and width $\Delta x_i$ over each subinterval.
5 Sum the areas of these rectangles. This is an approximation of the actual area.
6 Take a limit of such approximations as |∆x|, the largest of the $\Delta x_i$, goes to 0.
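Steps 1 through 5 can be sketched in a few lines of code. This version makes two simplifying assumptions that the procedure itself does not require: all subintervals have equal width, and each sample point is the midpoint of its subinterval. The function name `riemann_sum` and the test function x² on [2, 5] are illustrative choices.

```python
def riemann_sum(f, a, b, n):
    """Approximate the area under f on [a, b] with n equal-width rectangles.

    Uses the midpoint of each subinterval as the sample point; equal widths
    and midpoints are choices made here, not requirements of the procedure.
    """
    dx = (b - a) / n                  # step 1: width of each subinterval
    total = 0.0
    for i in range(n):
        x_star = a + (i + 0.5) * dx   # step 2: sample point in the i-th subinterval
        total += f(x_star) * dx       # steps 3-5: height times width, summed
    return total

# Step 6 is imitated by increasing n, which shrinks the widths toward 0
approximations = [riemann_sum(lambda x: x**2, 2.0, 5.0, n) for n in (4, 40, 400)]
```

As n grows the approximations settle toward a single value, which is the behavior the limit in step 6 makes precise.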
Definition
where the limit is taken over all divisions of [a, b], $\Delta x_i$ is the length of the ith subinterval, $x_i^*$ is a point
in the ith subinterval and |∆x| is the largest $\Delta x_i$.
Notice there is no requirement that the subintervals be the same length. Because of this, we don’t
take a limit as n approaches ∞. For instance, using a large number of rectangles over $\left[ a, \frac{a+b}{2} \right]$ and only
a single rectangle over $\left[ \frac{a+b}{2}, b \right]$ will not give us a good approximation, no matter how many rectangles
we use. Instead we take a limit as the largest $\Delta x_i$ approaches 0.
In practice, we get the same limit whether the subintervals are equal length or not. It is common
to use the same $\Delta x = \frac{b - a}{n}$ for each subinterval.
The definite integral almost solves our area problem, but wherever f(x) < 0, the product $f(x_i^*) \Delta x_i$
will be negative.
Theorem
If f(x) > 0 on [a, b] then $\int_a^b f(x)\,dx$ computes the area under y = f(x) over [a, b]. In general
$\int_a^b f(x)\,dx$ computes the signed area between y = f(x) and the x-axis, where area above the axis
counts as positive, and area below the axis counts as negative.
Since integrals are limits, they inherit two laws from limits. The third can be taken from geometry,
setting the area of a region equal to the sum of the areas of two subregions.
Integral Laws
$\int_a^b (f(x) + g(x))\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$   (Sum Rule)
$\int_a^b c f(x)\,dx = c \int_a^b f(x)\,dx$   (Constant Multiple Rule)
$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx$   (Union Rule)
Question 1.4.5
How Do We Evaluate an Integral?
The limit form of an integral is usually impossible to evaluate directly. Instead we use a powerful
pair of theorems.
$g'(x) = \lim_{h \to 0} \frac{g(x + h) - g(x)}{h}$
$= \lim_{h \to 0} \frac{\int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt}{h}$
$= \lim_{h \to 0} \frac{\int_x^{x+h} f(t)\,dt}{h}$   union rule
As the interval [x, x + h] shrinks, the values of f over that interval can be made arbitrarily close to f(x),
since f is continuous. Thus $\int_x^{x+h} f(t)\,dt$ approaches the area of a rectangle of height f(x) and width
h. Thus
$\lim_{h \to 0} \frac{\int_x^{x+h} f(t)\,dt}{h} = f(x)$
The main use of the First Fundamental Theorem of Calculus is to prove the Second Fundamental
Theorem of Calculus.
Let f(x) be a continuous function on [a, b]. If F(x) is an antiderivative of f(x), then
$\int_a^b f(x)\,dx = F(b) - F(a)$
This follows immediately from the First Fundamental Theorem. If we continue to define $g(x) = \int_a^x f(t)\,dt$, then
$\int_a^b f(x)\,dx = \int_a^b f(x)\,dx - \int_a^a f(x)\,dx$
$= g(b) - g(a)$
We know that g(x) is an antiderivative of f(x). If we instead pick a different antiderivative F(x), then
F(x) = g(x) + c, and
$F(b) - F(a) = (g(b) + c) - (g(a) + c) = g(b) - g(a) = \int_a^b f(x)\,dx$
Because we will be computing F (b) − F (a) frequently, we will develop the following shorthand.
Notation
This relationship between integrals and antiderivatives motivates the following vocabulary.
Notation
The general antiderivative of $f(x)$ is also called an indefinite integral and is denoted $\int f(x)\,dx$.
Example 1.4.6
A Definite Integral
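Compute $\int_2^5 x^2\,dx$
Solution
$$\begin{aligned}
\int_2^5 x^2\,dx &= \left.\frac{x^3}{3}\right|_2^5 \\
&= \frac{5^3}{3} - \frac{2^3}{3} \\
&= \frac{125 - 8}{3} \\
&= 39
\end{aligned}$$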
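As a sanity check on this example, the sketch below compares $F(5) - F(2)$ with a Riemann sum for the same integral. The helper name `midpoint_sum` and the choice of midpoints as test points are assumptions made for illustration, not part of the text.

```python
def midpoint_sum(f, a, b, n):
    """Riemann sum of f over [a, b] using n equal subintervals and midpoint test points."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def f(x):
    return x ** 2


def F(x):
    # One antiderivative of x^2
    return x ** 3 / 3


exact = F(5) - F(2)  # Second Fundamental Theorem: F(b) - F(a)

# The Riemann sums should approach the exact value as n grows
for n in (10, 100, 1000):
    print(n, midpoint_sum(f, 2, 5, n), exact)
```

The sums settle toward 39 as the number of rectangles grows, which is what the limit definition of the integral predicts.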
Question 1.4.7
How Do We Apply the Chain Rule in an Antiderivative?
Theorem
$$\int_a^b f(u(x))u'(x)\,dx = \int_{u(a)}^{u(b)} f(u)\,du$$
This allows us to replace a complicated integrand in x with a simpler one in u. To correctly rewrite
the integral, the bounds must be updated to the corresponding values of u.
We can also apply this to indefinite integrals. If F is an antiderivative of f , then
$$\begin{aligned}
\int f(u(x))u'(x)\,dx &= \int f(u)\,du \\
&= F(u) + c \\
&= F(u(x)) + c
\end{aligned}$$
The most common u substitutions are linear, where u = ax.
Example
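Compute $\int \sin 3x\,dx$
Solution
We use the $u$-substitution $u = 3x$, so $du = 3\,dx$ and $\frac{1}{3}\,du = dx$.
$$\begin{aligned}
\int \sin(3x)\,dx &= \frac{1}{3}\int \sin u\,du \\
&= -\frac{1}{3}\cos u + c \\
&= -\frac{1}{3}\cos(3x) + c
\end{aligned}$$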
Note that we should express our antiderivatives in terms of the original variable (often x), not in
terms of u.
Example 1.4.8
A u-substitution
Solution
We start by looking for a candidate for $u(x)$. Since we want the integrand to be $f(u(x))u'(x)$, we note $u(x)$ should be the inner function in some composition. $x^2$ is the natural target. We attempt the substitution, and hope that the remaining factors in the integrand can be expressed in terms of $u'(x)$. We see that our $u'(x)\,dx$ is $2x\,dx$. Since we only have an $x\,dx$ in our integrand, we divide by 2.
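With the $u$-substitution $u = x^2$, we have $du = 2x\,dx$ and $\frac{1}{2}\,du = x\,dx$. The bounds become $x = 0 \Rightarrow u = 0$ and $x = 3 \Rightarrow u = 9$.
$$\begin{aligned}
\int_0^3 x e^{x^2}\,dx &= \int_0^9 \frac{1}{2} e^u\,du \\
&= \left.\frac{1}{2} e^u\right|_0^9 \\
&= \frac{1}{2}(e^9 - 1)
\end{aligned}$$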
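The effect of updating the bounds can be checked numerically. The sketch below approximates the original integral in $x$ and the substituted integral in $u$ and compares both with $\frac{1}{2}(e^9 - 1)$. The helper `midpoint_sum` is an assumed name for a midpoint Riemann sum.

```python
import math


def midpoint_sum(f, a, b, n=100_000):
    """Riemann sum of f over [a, b] using n equal subintervals and midpoint test points."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


# Original integrand in x, with the original bounds 0 to 3
in_x = midpoint_sum(lambda x: x * math.exp(x ** 2), 0, 3)

# Substituted integrand in u = x^2, with the updated bounds 0 to 9
in_u = midpoint_sum(lambda u: 0.5 * math.exp(u), 0, 9)

exact = 0.5 * (math.exp(9) - 1)
print(in_x, in_u, exact)
```

All three values agree to several digits. Keeping the bounds 0 to 3 on the $u$-integral would give a very different number.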
Section 1.4
Exercises
1.4.1
Q4 Suppose $x^4 - \sin(x^3)$ is an antiderivative of $f(x)$. Write three other antiderivatives of $f(x)$. You
should do this without computing what f is.
Q5 If F (x) and G(x) are both antiderivatives of f (x), find the value b such that 3F (x) − bG(x) is
Q6 Suppose F and G are both antiderivatives of f (x). Suppose further that F is an antiderivative
1.4.2
Q7 Evaluate $\sum_{k=2}^{5} 3k - 2$
Q8 Evaluate $\sum_{j=-1}^{4} j^2 - j$
Q9 Write a formula for the value of $\sum_{k=a}^{b} c$.
Q10 We do not need to write a constant multiple rule for Σ notation because we already have one. Explain what rules of mathematics tell us that $\sum_{k=a}^{b} c f(k) = c \sum_{k=a}^{b} f(k)$.
$$\sum_{k=1}^{k} 3k^2 + \frac{1}{k}$$
Q12 Consider the sum $\sum_{k=1}^{n} \frac{1}{2^k}$ for a few different values of $n$. Can you conjecture a formula for this sum (it will depend on $n$)?
1.4.3
a 3 + 7 + 11 + 15 + 19
b 6 + 12 + 24 + 48 + 96 + 192
c $\frac{3}{4} - \frac{4}{5} + \frac{5}{6} - \frac{6}{7} + \frac{7}{8} - \frac{8}{9}$.
a 5 − 15 + 25 − 35 + 45 − 55 + 65 − 75 + 85 − 95
b $\frac{1}{4} + \frac{4}{16} + \frac{9}{64} + \frac{16}{256} + \frac{25}{1024}$
c $\sqrt{2} + \sqrt{6} + \sqrt{12} + \sqrt{20} + \sqrt{30} + \sqrt{42} + \sqrt{56}$.
1.4.4
Q15 Does $\int_{1/2}^{1} \ln x\,dx$ compute the area under $y = \ln x$ over $\left[\frac{1}{2}, 1\right]$? Explain.
Q16 Suppose $\int_a^b f(x)\,dx < 0$. What does this tell you about the graph $y = f(x)$? Be specific.
Q17 Draw a careful graph of $y = \sqrt{x}$. Use 5 subintervals of $[1, 11]$ to estimate the area beneath the graph over $[1, 11]$. Use the left endpoints of each subinterval as the test points $x_i^*$.
Q18 Draw a careful graph of $y = 3x$. Use 3 subintervals of $[2, 8]$ to estimate the area beneath the graph, with the test points $x_i^*$ being the left endpoints of each subinterval.
Q19 Draw the graph of $y = 7$. Use geometry to evaluate $\int_3^8 7\,dx$.
Q20 Draw the graph of $y = \frac{x}{3} + 1$. Use geometry to evaluate $\int_{-3}^{9} \frac{x}{3} + 1\,dx$.
1.4.5
Q21 Let $g(x) = \int_5^x f(t)\,dt$. What is $g'(8)$?
Q22 Let $g(x) = \int_2^x \cos t\,dt$. Is $g(x)$ increasing or decreasing at $x = 3$? Explain.
Q23 Suppose $f(x)$ is an increasing function. Is $\int_{22}^{31} f'(x)\,dx$ positive or negative?
Q24 Suppose $F(x)$ and $G(x)$ are both antiderivatives of $f(x)$. Given the following incomplete table of values, compute $\int_1^4 f(x)\,dx$.
x      1    2    3    4    5    6
F(x)   −    7    −    13   −    9
G(x)   3    −    9    −    10   5
Q25 Explain the difference between $\int f(x)\,dx$ and $\int_a^b f(x)\,dx$ in a few sentences.
Q26 Compute $\int_0^{\pi} \cos(x)\,dx$. Explain the geometric meaning of your answer in a sentence or two.
1.4.6
Q27 Compute $\int_1^8 x - \frac{3}{x}\,dx$.
Q28 Compute $\int_1^4 \frac{1}{t^{3/2}}\,dt$.
Q29 Compute $\int e^x - 6x^2\,dx$.
Q30 Compute $\int_t^0 \frac{1}{3}e^x + 5\,dx$.
Q31 Compute $\int \sqrt{t}\,dt$.
Q32 Compute $\int_{10}^{2} \frac{x^2 + 2}{5x}\,dx$.
Q33 Compute $\int \frac{3}{5}\sin y\,dy$.
Q34 Compute $\int_0^2 x^4 - 3x + 2\,dx$.
Q35 Compute $\int_{\pi/6}^{3\pi/4} 2\cos v\,dv$.
Q36 Compute $\int_0^{\pi} 2\sin t + \cos t\,dt$.
1.4.7
Q37 Write some general rules. Suppose F (x) + c is the antiderivative of f (x)
b By describing the relationship between the graphs of $y = f(x)$ and $y = f\left(\frac{x}{2}\right)$. A picture might help.
1.4.8
Q39 Compute $\int e^{7x}\,dx$.
Q40 Compute $\int \sqrt{5x + 3}\,dx$.
Q41 Compute $\int \cos\frac{\theta}{3}\,d\theta$.
Q42 Compute $\int (t - 2)^6\,dt$.
Q43 Compute $\int_0^{1/4} \sin(\pi t)\,dt$.
Q44 Compute $\int_0^3 x^2 e^{x^3}\,dx$.
Q45 Compute $\int (x^5 - 2x)(5x^4 - 2)\,dx$.
Q46 Compute $\int_{\pi/4}^{3\pi/4} \frac{1}{\sin^2 x}\cos(x)\,dx$.
Chapter 2
This chapter covers a variety of methods and applications for single-variable integrals. The first two
sections lay the groundwork for multivariable integration by exploring the connections between integration
and geometry. One section touches on approximation methods for integrals. Other sections prepare us
for our goal: applying integration to probability and statistics.
Contents
2.1 Area Between Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.2 Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.3 Integration by Parts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
2.4 Approximate Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
2.5 Improper Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
2.6 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
2.7 Functions of Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . 158
Section 2.1
The Fundamental Theorem of Calculus relates the change in a function to the area under a curve.
Modern scientists have seized upon integration as a way to study change, whether they are measuring
a chemical reaction, the position of a particle, or economic activity. The geometric applications are
irrelevant to most consumers of calculus.
Historically, these methods were exciting to scholars who had been limited to area formulas for circles
and triangles. Now any shape that was defined by an algebraic function was fair game. In this section
we push integration beyond areas under a curve to areas bounded by two or more curves. This gives us
the ability to measure a wide variety of shapes, but geometry is not our end goal. Instead the goal is
to study how integration works on these oddly shaped regions. We will find that the methods of this
section return to relevance when it is time to integrate functions of more than one variable.
Question 2.1.1
How Is the Integral Related to Geometric Area?
When we defined the definite integral, we were attempting to compute the area under a curve.
However, our methods introduced a glitch. Consider the following example.
This region has an area of $\frac{38}{3}$, but $\int_3^8 f(x)\,dx = -\frac{38}{3}$.
We were taught that the integral does not measure geometric area, but instead signed area. Area
below the x axis counts as negative.
Why does this happen? Recall the definition of the definite integral.
Definition
This limit takes better and better approximations of the area. The approximation is a sum of
rectangles, whose area is height × width. All the rectangles have width ∆x, but their heights vary, and
we used the height of the graph y = f (x) to measure them. This works fine when f (x) is positive.
When $f(x) < 0$, the product $f(x_i^*)\Delta x$ computes a negative “area” for each rectangle.
In this example the resolution of this glitch is straightforward. Eliminating the negative sign, we
obtain the correct area. However, we can imagine a region that requires a more sophisticated approach.
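The sign flip is easy to see numerically. The sketch below uses a hypothetical function, $f(x) = x^2 - 4$ on $[0, 2]$ (not the function from the example above), which lies below the axis on the whole interval. The helper `midpoint_sum` is an assumed name for a midpoint Riemann sum.

```python
def midpoint_sum(f, a, b, n=10_000):
    """Riemann sum of f over [a, b] using n equal subintervals and midpoint test points."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def f(x):
    # Hypothetical example: negative everywhere on (0, 2)
    return x ** 2 - 4


# Each rectangle contributes f(x_i*) * dx, which is negative here
signed_area = midpoint_sum(f, 0, 2)

# Using |f| as the height makes every rectangle count positively
geometric_area = midpoint_sum(lambda x: abs(f(x)), 0, 2)

print(signed_area, geometric_area)
```

The two results are negatives of each other (about $\mp\frac{16}{3}$), matching the idea that eliminating the negative sign recovers the geometric area when the whole region lies below the axis.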
Question 2.1.2
What Integral Computes the Geometric Area Between Two Graphs?
Suppose we want to know the area between the graphs y = f (x) and y = g(x) for some interval
a ≤ x ≤ b. We can approximate this by rectangles. As the number of rectangles increases, the
approximation becomes more accurate.
Main Idea
Example 2.1.3
The Area Between Two Curves
Suppose we want to compute the area between $y = \sqrt{x}$ and $y = x - \sqrt{x}$ from $x = 6$ to $x = 12$.
How do we know which graph is on top and which is on the bottom?
The height of a graph is the value of the function. We can evaluate the function at some x in the
interval [6, 12]. The most convenient x is x = 9.
$$\sqrt{9} = 3 \qquad 9 - \sqrt{9} = 6$$
So at $x = 9$, $y = x - \sqrt{x}$ is above $y = \sqrt{x}$.
Exercise
We’ve established that at $x = 9$, $y = x - \sqrt{x}$ is above $y = \sqrt{x}$. Unfortunately there are infinitely many points between $x = 6$ and $x = 12$. How can we decide which graph is on top at each of them?
1 Does the graph of $y = \sqrt{x}$ intersect the graph of $y = x - \sqrt{x}$ between $x = 6$ and $x = 12$?
2 What theorem could we use to argue that if $y = \sqrt{x}$ is ever above $y = x - \sqrt{x}$ then the graphs must have intersected?
Solution
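1 To test where the graphs intersect, we set the functions equal to each other.
$$\begin{aligned}
\sqrt{x} &= x - \sqrt{x} \\
0 &= x - 2\sqrt{x} \\
0 &= \sqrt{x}(\sqrt{x} - 2) && \text{(factor)}
\end{aligned}$$
$$\sqrt{x} = 0 \quad \text{or} \quad \sqrt{x} - 2 = 0$$
$$x = 0 \quad \text{or} \quad 4$$
Neither intersection lies between $x = 6$ and $x = 12$.
The figure below confirms that $y = x - \sqrt{x}$ is on top for all $x$ in $[6, 12]$.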
Figure: An approximation of the area between $y = x - \sqrt{x}$ and $y = \sqrt{x}$
Main Ideas
Plugging a test point into f (x) and g(x) tells us which graph is above the other.
If the functions are continuous, then solving f (x) = g(x) computes the only points where the
graphs can change positions.
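The two main ideas can be illustrated with a short sketch for the curves of this example. The function names `top` and `bottom` are labels chosen here for illustration. Sampling the interval is only a spot check, not a proof. The argument that the graphs cannot swap without intersecting is what settles the question.

```python
import math


def top(x):
    return x - math.sqrt(x)


def bottom(x):
    return math.sqrt(x)


# Test point from the example
print(bottom(9), top(9))  # 3.0 6.0

# The solutions x = 0 and x = 4 of sqrt(x) = x - sqrt(x) are where the graphs meet
for x in (0, 4):
    print(x, math.isclose(top(x), bottom(x)))

# Spot check: sample 101 points of [6, 12] and compare heights
samples = [6 + 6 * i / 100 for i in range(101)]
print(all(top(x) > bottom(x) for x in samples))
```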
Example 2.1.4
The Area Enclosed by Two Curves
Set up an integral that computes the area enclosed between the curves $y = x^2$ and $y = 3 - x - x^2$.
Solution
These are parabolas. If they enclose any area, the downward facing parabola must lie above the upward
facing parabola. This tells us we are integrating
$$\int_a^b 3 - x - x^2 - x^2\,dx$$
But what are the bounds of integration? To know this we must find the points where the graphs
intersect.
$$\begin{aligned}
3 - x - x^2 &= x^2 \\
0 &= 2x^2 + x - 3 \\
0 &= (2x + 3)(x - 1) \\
x &= -\frac{3}{2} \text{ or } 1
\end{aligned}$$
Main Ideas
To determine the range of x values that define an enclosed region, solve for the intersection points
between the graphs.
Sketching the graphs can be a time-saver and a reality check for your answer.
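The example only asks for the setup, but a short sketch can confirm the bounds and go one step further by evaluating the integral. The variable names are illustrative. The antiderivative used is that of the simplified integrand $3 - x - 2x^2$.

```python
import math

# Intersections solve 2x^2 + x - 3 = 0 (quadratic formula)
a, b, c = 2, 1, -3
disc = b * b - 4 * a * c
left = (-b - math.sqrt(disc)) / (2 * a)
right = (-b + math.sqrt(disc)) / (2 * a)


def antiderivative(x):
    # Antiderivative of (3 - x - x^2) - x^2 = 3 - x - 2x^2
    return 3 * x - x ** 2 / 2 - 2 * x ** 3 / 3


# Top curve minus bottom curve, integrated between the intersections
area = antiderivative(right) - antiderivative(left)
print(left, right, area)
```

The bounds come out as $-\frac{3}{2}$ and $1$, and the enclosed area is $\frac{125}{24}$, which is positive as a geometric area should be.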
Example 2.1.5
The Area Enclosed by Two Curves that Intersect More than Twice
Solution
$$\begin{aligned}
x^3 - 10x &= 3x^2 \\
x^3 - 3x^2 - 10x &= 0 \\
x(x - 5)(x + 2) &= 0 \\
x &= 0,\ 5, \text{ or } -2
\end{aligned}$$
Our region is bounded between x = −2 and x = 5, but one graph does not need to be above the other
for the entire region. The graphs intersect at x = 0 so one graph might be on top for [−2, 0], while the
other is on top for [0, 5]. To find out which is which we could evaluate at test points (we would need
two). Alternately, since we’ve already factored f (x) − g(x) = x(x − 5)(x + 2) we can perform a sign
analysis:
               x < −2    −2 < x < 0    0 < x < 5    x > 5
x                 −           −             +          +
(x − 5)           −           −             −          +
(x + 2)           −           +             +          +
f(x) − g(x)       −           +             −          +
Thus $x^3 - 10x > 3x^2$ on $[-2, 0]$ and $x^3 - 10x < 3x^2$ on $[0, 5]$. The enclosed area is computed by:
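$$\begin{aligned}
\text{Area} &= \int_{-2}^{0} x^3 - 10x - 3x^2\,dx + \int_0^5 3x^2 - x^3 + 10x\,dx \\
&= \left.\left(\frac{x^4}{4} - 5x^2 - x^3\right)\right|_{-2}^{0} + \left.\left(x^3 - \frac{x^4}{4} + 5x^2\right)\right|_0^5 \\
&= (0 - 0 - 0 - 4 + 20 - 8) + \left(125 - \frac{625}{4} + 125 - 0 + 0 - 0\right) \\
&= \frac{407}{4}
\end{aligned}$$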
Main Ideas
With more intersections, we must check the region between each pair of intersections to see which
graph is on top.
It can be more efficient to make a sign analysis chart.
Sketching the graphs may be more difficult. If you can do it, it will corroborate (or correct) your
calculations.
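The test-point approach can be automated as a check on the sign chart. The sketch below picks the midpoint of each interval between consecutive intersections as the test point, then adds the integral of whichever difference is positive. Exact fractions are used so the result can be compared directly with the hand computation. The function names are illustrative.

```python
from fractions import Fraction


def diff(x):
    # f(x) - g(x) = (x^3 - 10x) - 3x^2 = x(x - 5)(x + 2)
    return x * (x - 5) * (x + 2)


def antiderivative(x):
    # Antiderivative of f(x) - g(x) = x^3 - 3x^2 - 10x
    x = Fraction(x)
    return x ** 4 / 4 - x ** 3 - 5 * x ** 2


cuts = [-2, 0, 5]  # intersection points, in increasing order
area = Fraction(0)
for left, right in zip(cuts, cuts[1:]):
    test_point = Fraction(left + right, 2)
    # If f - g is negative on this interval, g is on top, so flip the sign
    sign = 1 if diff(test_point) > 0 else -1
    area += sign * (antiderivative(right) - antiderivative(left))

print(area)  # 407/4
```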
Example 2.1.6
A Region without a Single Top Curve
Compute the area enclosed by the curves $y = 1$, $y = \frac{16}{x}$ and $y = 2\sqrt{x}$.
We should start by drawing this region and finding the coordinates of the intersections.
There are three intersections to solve for, one using each pair of equations.
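$$\begin{aligned}
\frac{16}{x} &= 2\sqrt{x} & \frac{16}{x} &= 1 & 2\sqrt{x} &= 1 \\
16 &= 2x^{3/2} & 16 &= x & \sqrt{x} &= \frac{1}{2} \\
8 &= x^{3/2} & & & & \\
x &= 4 & x &= 16 & x &= \frac{1}{4}
\end{aligned}$$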
If we write this area as an integral $\int_{1/4}^{16} g(x) - f(x)\,dx$, the top function would need to be piece-wise:
$$g(x) = \begin{cases} 2\sqrt{x} & \text{if } \frac{1}{4} \le x \le 4 \\ \frac{16}{x} & \text{if } 4 \le x \le 16 \end{cases}$$
We don’t know the anti-derivative of a piece-wise function. Instead, we consider a few different approaches. Since the upper boundary is defined by a different function for different values of x, one approach is to break the region into two integrals.
Both of these approaches require us to evaluate two integrals. That is unavoidable because our integrals are limits of an approximation by rectangles of different heights, and those heights are determined by different enclosing graphs, depending on which x value we measure at. For this particular region, there is a way to avoid this.
Instead we can approximate the region by rectangles of different widths.
Notice the left endpoint always lies on $y = 2\sqrt{x}$ and the right endpoint always lies on $y = \frac{16}{x}$. As the height of the rectangles goes to 0, the approximation becomes exact.
Let’s derive a formula for this rectangle approximation and compute the exact area.
Let $\Delta y$ be the height of each rectangle. The widths are given by the horizontal distance between the graph $y = 2\sqrt{x}$ and $y = \frac{16}{x}$ at the heights $y_i^*$ corresponding to the bottom of each rectangle. Horizontal distance is the difference in $x$ values. What $x$ values correspond to $y_i^*$? We can plug in $y_i^*$ and solve for $x$.
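$$\begin{aligned}
y_i^* &= 2\sqrt{x} & y_i^* &= \frac{16}{x} \\
\frac{y_i^*}{2} &= \sqrt{x} & x y_i^* &= 16 \\
\frac{(y_i^*)^2}{4} &= x & x &= \frac{16}{y_i^*}
\end{aligned}$$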
These computations should be familiar. Finding x in terms of y is called finding the inverse function.
These inverse functions give the left and right bounds of our region. To find the area, we take a sum
of the areas of these rectangles of different widths. Then we take a limit. Notice that to make the width positive we subtract the smaller $x$ value from the larger $x$ value. Geometrically, this is the right endpoint $\frac{16}{y_i^*}$ minus the left endpoint $\frac{(y_i^*)^2}{4}$.
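$$\lim_{\Delta y \to 0} \sum_i \underbrace{\left(\frac{16}{y_i^*} - \frac{(y_i^*)^2}{4}\right)}_{\text{width}} \underbrace{\Delta y}_{\text{height}} = \int_1^4 \frac{16}{y} - \frac{y^2}{4}\,dy$$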
This limit is an integral, but the variable of integration is y, not x. The bounds of integration are
the set of y values in the region. The lowest point in the region is at y = 1. The highest is at y = 4.
We evaluate the integral using the Fundamental Theorem of Calculus, but with y instead of x.
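$$\begin{aligned}
\text{Area Enclosed} &= \int_1^4 \frac{16}{y} - \frac{y^2}{4}\,dy \\
&= \left.\left(16\ln|y| - \frac{y^3}{12}\right)\right|_1^4 \\
&= 16\ln 4 - \frac{64}{12} - \left(16\ln 1 - \frac{1}{12}\right) \\
&= 16\ln 4 - \frac{63}{12}
\end{aligned}$$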
Main Idea
The area to the right of $x = f^{-1}(y)$ and to the left of $x = g^{-1}(y)$ for $y$ from $a$ to $b$ can be computed by
$$\int_a^b g^{-1}(y) - f^{-1}(y)\,dy.$$
Strategy
Changing an integral to dy may be more work than breaking it into two or more parts. When solving
an area problem, consider both methods and use the one that seems more promising. If you run into
problems with your chosen approach, give the other method a try.
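Both methods should give the same area for the region of Example 2.1.6, and a numeric sketch can confirm it. The split at $x = 4$ and the bottom curve $y = 1$ are read off from the intersections found in the example. The two $dx$-integrals written here are one reading of the "two integrals" approach, and `midpoint_sum` is an assumed helper name.

```python
import math


def midpoint_sum(f, a, b, n=100_000):
    """Riemann sum of f over [a, b] using n equal subintervals and midpoint test points."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


# Two dx-integrals: the top curve changes at x = 4, the bottom is y = 1 throughout
area_dx = (midpoint_sum(lambda x: 2 * math.sqrt(x) - 1, 0.25, 4)
           + midpoint_sum(lambda x: 16 / x - 1, 4, 16))

# One dy-integral: right curve x = 16/y minus left curve x = y^2/4, for y from 1 to 4
area_dy = midpoint_sum(lambda y: 16 / y - y ** 2 / 4, 1, 4)

exact = 16 * math.log(4) - 63 / 12
print(area_dx, area_dy, exact)
```

The three numbers agree to several digits, so the choice between the methods is a matter of which setup and antiderivatives are easier to handle.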
Section 2.1
Exercises
Summary Questions
Q1 What is the geometric significance of f (x)−g(x) in the formula for the area between two graphs?
Q2 How do we determine which curve is the top of a region and which is the bottom? Describe the
difficulties that can arise.
Q3 How do we use boundaries of the form y = g(x) and y = f (x) in a dy-integral to compute
geometric area?
Q4 When setting up a dy-integral, how can we visually identify which graph’s function will be sub-
tracted from which?
Q5 An integral can be positive or negative. If we are solving for area (which may not be negative),
describe the steps we take to guarantee our area is positive.
Q6 Explain the difference between “The region enclosed by y = f (x) and y = g(x)” and “The region
f (x) ≤ y ≤ g(x).”
2.1.1
a How much would the geometric area between y = f (x) and the x-axis for a ≤ x ≤ b increase
if the graph were shifted up by k units? Try to argue geometrically or with a visual.
b Would shifting the graph down by k instead decrease the area by the same amount? Draw
a graph for which it wouldn’t.
Q8 How would we use integrals to calculate the geometric area of the shaded region below?
Q9 The expressions
$$\int_a^b |f(x)|\,dx \quad \text{and} \quad \int_a^b f(x)\,dx$$
are not equivalent. Explain why, and draw the graph of a function on which these expressions
disagree.
Q10 Given a differentiable function $f(x)$, the signed area between the graph $y = f'(x)$ and the $x$-axis from $x = a$ to $x = b$ is denoted $\int_a^b f'(x)\,dx$ and is equal to the change in $f(x)$ from $x = a$ to $x = b$. In what sense does the geometric area between the graph of $y = f'(x)$ and the $x$-axis represent a change in $f(x)$?
2.1.2
Q11 Suppose y = f (x) and y = g(x) are below the x-axis. What integral computes the geometric
area between them? How does this compare to the situation when they are above the x-axis?
Q12 Here is another way to derive the formula for the area between curves. Consider the functions
graphed here:
a Indicate on the graph what areas are denoted by $\int_a^b f(x)\,dx$ and $\int_a^b g(x)\,dx$. How are they
2.1.3
2.1.4
Q15 Compute the area enclosed by $y = \sqrt{x}$ and $y = x^2$.
Q18 Compute the area enclosed by $y = x + 2$ and $y = 3\sqrt{x}$.
2.1.5
Q19 Compute the area between y = sin x and y = cos x over the interval [0, 2π].
Q20 Erica and Carter were asked to compute the area enclosed by $y = 4x$ and $y = x^3$. They agree
Carter thinks it is
$$\int_{-2}^{2} x^3 - 4x\,dx$$
a Who is correct?
b How do you think the mistake could reasonably have happened, and how can you avoid it?
Q21 Compute the area enclosed by $y = xe^{x^2}$, and $y = ex$.
Q22 Set up an integral or integrals to compute the region enclosed by the curves $f(x) = x^2(x^2 - 4)$
Q23 Often the top curve of an enclosed region alternates between f (x) and g(x) at each intersection.
Can you explain what about the previous problem caused this pattern to fail?
Q24 Suppose $y = f(x)$ and $y = g(x)$ intersect multiple times, with $x = a$ their leftmost intersection and $x = b$ their rightmost. We can express the area enclosed between them by $\int_a^b |g(x) - f(x)|\,dx$.
2.1.6
Q25 Compute the area enclosed by $y = 6$, $y = \sqrt{x}$ and $y = -2x$
Q27 You have been taught at least three ways to set up an expression that will compute the area
enclosed by (all of) y = 3, y = 3x, y = 9 and x + y = −5. Set up all the methods you know
that will do this. You do not need to evaluate them.
Q28 Write the area in the first quadrant enclosed by $y = \sqrt{3}\,x$, $y = 0$, and $x^2 + y^2 = 4$ as a single integral.
Q29 Write the area enclosed by $y = \sqrt{x}$ and $y = x^2$ as
a an integral in x
b an integral in y
Q30 Write the area in the first quadrant enclosed by $y = x^2$, $y = 3x^2$, and $y = 18 - 3x$ as
a a sum of integrals in x
b a sum of integrals in y
Q31 Suppose you’ve found that y = f (x) and y = g(x) intersect at x = a (along with perhaps other
places). What could knowing the values of f ′ (a) and g ′ (a) tell you about where each graph is
above the other? Be as specific as possible.
f ′ (x) > 0
g ′ (x) < 0
We approximate area between $y = f(x)$ and $y = g(x)$ from $x = a$ to $x = b$ by rectangles,
letting the $x_i^*$ be the right endpoints of each subinterval. What can we say about whether the
approximation will overestimate or underestimate the true area?
Section 2.2
Volumes
Goals:
The motivation for the definite integral was computing an area. However, the definition turns out
to be more useful than that. With the correct setup, we can express a volume as an integral as well.
Question 2.2.1
What Is Volume?
Dimension
In mathematics, we define the dimension of an object. Dimension measures the number of degrees of
freedom available to a point traveling in the object.
The definition may not match your intuition for dimension. For example, you only encounter a
parabola in two (or more)-dimensional space. However, the parabola itself is one-dimensional. If you
imagine that you are an insect crawling on the parabola, you can only travel forward or backward, not
side to side. If you were small enough, the parabola would seem indistinguishable from a line.
Example
We measure objects of different dimensions differently. In all cases, measuring is counting how many
units of measurement fit inside the object. A 6 unit by 3 unit rectangle has area 18 square units, because
18 unit squares can fit inside it. For less regular objects we need to consider parts of square units. This
requires a lot of work to do formally, but the intuition should be straightforward.
We use different names to describe objects and their measurements in different dimensions:
Dimension   Objects                                   Measurement
0           point                                     none
1           line, circle, curve                       length
2           square, polygon, disc, sphere, surface    area
3           cube, polyhedron, ball, solid             volume
Vocabulary Check
It doesn’t make sense to talk about the volume of a surface. No unit cubes will fit inside it.
Similarly it doesn’t make sense to talk about the area of a solid. Infinitely many unit squares will fit
in any solid. However, solids have boundary surfaces, and we do sometimes measure their areas.
The simplest solid to measure is a (right) prism. If a prism has height h, we can see that each unit
square (or part thereof) in the base has h unit cubes stacked above it. Thus we have
Formula for Volume of a Prism
Figure: A prism divided into unit cubes and its base divided into unit squares.
Here we see the base of the prism and the square units (or parts thereof) that it contains. The prism
has height 3.5. We can see there are 3.5 cubic units above each square unit in the base.
You may be questioning the relevance of studying areas and volumes in the 21st century. Few people
need to compute geometric measurements in their careers. However, geometry is not the end goal of
this investigation.
Remark
Our motivation for studying solids is not to solve geometry problems. Recall that the definite integral
allowed us to express total change as an area:
This allowed us to use our geometric intuition of areas to better understand rates of change. Similarly,
volume will allow us to use geometry to understand different types of rates later on.
Question 2.2.2
How Do We Visualize 3-Dimensional Solids?
Without computer graphics, it can be difficult to visualize anything but the simplest solids. Taking
an arbitrary solid like a lamp or a sculpture, computing its volume by filling it with cubes is a hopeless
endeavor (though a computer could make a decent estimate using small enough cubes). In the absence
of a computer rendering, how do we give our brains a visual reference, and how can we leverage this to
make measurements? We use cross sections.
Definition
A cross section of a solid object is its intersection with some transversal plane.
Transversal means the plane cuts across the solid. In the case of this square-based pyramid, a
transversal plane parallel to the base intersects the pyramid in a square. If it intersects at a different
height, the intersection would be larger or smaller. If it intersects at a different angle, it wouldn’t produce
a square at all.
A solid can be reassembled from its cross sections. This is valuable because cross sections are two-
dimensional, making them easier to draw or visualize. If you have a set of parallel cross sections, you
can imagine them side by side and infer the shape of the original solid.
Figure: A set of parallel cross sections of a solid
Question 2.2.3
How Can We Approximate or Compute the Volume of a Non-Prism Solid?
Suppose we want to find the volume of a pyramid. Different square units of the base have a different
number of cubic units above them. Thus we need a more robust approach than counting cubes.
We will approximate the pyramid by prisms, whose bases are cross sections.
The key insight is to represent the different heights of these cross sections by the variable x. We can
imagine the x-axis running through the solid in the direction of its height. The bases of the prisms are
cross sections. We let $x_i^*$ denote the height at which the $i$th prism’s base lies. The distance between the
heights $x_i^*$ is denoted $\Delta x$, which is also the height of each prism. At different heights, we have different
cross sections with different areas. Area is what we really care about, since we want to compute the
volume of these prisms. We write cross sectional area as a function.
Notice that this fits the definition of a definite integral, where A(x) is the function being integrated.
That is excellent news for us. Instead of having to learn a new way of evaluating this limit, we can use
the tools of integration that we already know.
Theorem
If the cross section of a solid, perpendicular to the $x$-axis, has area $A(x)$ at each $x$, then the volume of the solid is
$$\int_a^b A(x)\,dx$$
where $a$ and $b$ are the values of $x$ at the bottom and top of the solid.
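The approximation by prisms can be sketched in a few lines. The solid here is a hypothetical square pyramid with base side 2 and height 3, chosen because its volume, $\frac{1}{3} \cdot 2^2 \cdot 3 = 4$, is known from geometry. The function names and the choice of the bottom of each slab as the test point $x_i^*$ are assumptions for illustration.

```python
def prism_volume(area, a, b, n):
    """Sum the volumes A(x_i*) * dx of n prisms, taking x_i* at the bottom of each slab."""
    dx = (b - a) / n
    return sum(area(a + i * dx) * dx for i in range(n))


def area(x):
    # Hypothetical solid: square pyramid with base side 2 and height 3.
    # The side length shrinks linearly from 2 at x = 0 to 0 at x = 3.
    side = 2 - 2 * x / 3
    return side ** 2


# More prisms give a better approximation of the volume
for n in (3, 30, 300, 3000):
    print(n, prism_volume(area, 0, 3, n))
```

The sums decrease toward 4 as the number of prisms grows. They overestimate because each prism uses the larger cross section at the bottom of its slab.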
Example 2.2.4
A Solid with Its Cross-Sections Given
Suppose a solid $S$ extends from $x = 2$ to $x = 6$ and the cross section at each $x$ is a right triangle of height $\frac{1}{x}$ and base $x^2$. Compute the volume of $S$.
Solution
We will let the x direction be the height of our solid. Then the cross sectional area at each x is the area
of the triangle at that x.
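$$A(x) = \frac{1}{2}bh = \frac{1}{2}x^2 \cdot \frac{1}{x} = \frac{1}{2}x$$
Integrating this from $x = 2$ to $x = 6$ gives the volume.
$$\begin{aligned}
\text{Volume} &= \int_2^6 A(x)\,dx \\
&= \int_2^6 \frac{1}{2}x\,dx \\
&= \left.\frac{1}{4}x^2\right|_2^6 \\
&= \frac{1}{4}\cdot 36 - \frac{1}{4}\cdot 4 \\
&= 8
\end{aligned}$$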
Example 2.2.5
A Solid Obtained by Rotation
Suppose the region under the graph $y = \frac{5}{x+1}$ from $x = 1$ to $x = 4$ is rotated around the $x$-axis.
Compute the volume of the resulting solid.
Figure: The solid obtained by rotating the region under $y = \frac{5}{x+1}$ about the $x$-axis
Solution
When we cut the region under the graph perpendicular to the x-axis, we obtain a line segment whose
height is the value of the function. When that line segment is rotated around the axis, it sweeps out a
circle, with the line segment as the radius. We can use the formula for the area of a circle.
$$A(x) = \pi r^2 = \pi\left(\frac{5}{x+1}\right)^2 = \frac{25\pi}{(x+1)^2}$$
We apply our volume formula.
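$$\text{Volume} = \int_1^4 A(x)\,dx = \int_1^4 \frac{25\pi}{(x+1)^2}\,dx$$
We use the $u$-substitution $u = x + 1$, so $du = dx$. The bounds become $x = 1 \Rightarrow u = 2$ and $x = 4 \Rightarrow u = 5$.
$$\begin{aligned}
\text{Volume} &= \int_2^5 \frac{25\pi}{u^2}\,du \\
&= \left.-\frac{25\pi}{u}\right|_2^5 \\
&= -\frac{25\pi}{5} + \frac{25\pi}{2} \\
&= \frac{15\pi}{2}
\end{aligned}$$
The volume of the solid is $\frac{15\pi}{2}$ cubic units.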
Main Idea
When the region under a graph $y = f(x)$ is rotated around the $x$-axis, the cross sections are discs of
radius $f(x)$. Their areas are $\pi[f(x)]^2$.
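A rough numeric check of the disc formula, on a hypothetical example not taken from the text: rotating the region under $y = \frac{x}{2}$ from $x = 0$ to $x = 4$ produces a cone of radius 2 and height 4, whose volume $\frac{1}{3}\pi r^2 h = \frac{16\pi}{3}$ is known from geometry. The helper name `disc_volume` is an assumption.

```python
import math


def disc_volume(f, a, b, n=100_000):
    """Midpoint Riemann sum of the disc areas pi * f(x)^2 over [a, b]."""
    dx = (b - a) / n
    return sum(math.pi * f(a + (i + 0.5) * dx) ** 2 for i in range(n)) * dx


# Hypothetical region: under y = x/2 from x = 0 to x = 4 (a cone when rotated)
volume = disc_volume(lambda x: x / 2, 0, 4)
cone_formula = math.pi * 2 ** 2 * 4 / 3  # (1/3) * pi * r^2 * h with r = 2, h = 4

print(volume, cone_formula)
```

The integral of the disc areas matches the classical cone formula, which supports reading $\pi[f(x)]^2$ as the cross-sectional area $A(x)$.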
Example 2.2.6
A Solid Defined by Its Base
Solution
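$$\begin{aligned}
0 &= 4x - x^2 \\
0 &= x(4 - x) \\
x &= 0 \text{ or } 4
\end{aligned}$$
So $x$ ranges from 0 to 4. The base of the trapezoid at each $x$ is the height from $y = 0$ to $y = 4x - x^2$. Note $4x - x^2 > 0$ when $0 < x < 4$. Thus the base $b_1 = 4x - x^2$. The other base is twice as long, so it is $b_2 = 8x - 2x^2$. The height is 6, regardless of $x$.
$$\begin{aligned}
A(x) &= \frac{1}{2}(b_1 + b_2)h && \text{(area of a trapezoid)} \\
&= \frac{1}{2}(4x - x^2 + 8x - 2x^2) \cdot 6 \\
&= 36x - 9x^2
\end{aligned}$$
$$\begin{aligned}
\text{Volume} &= \int_0^4 36x - 9x^2\,dx \\
&= \left.\left(18x^2 - 3x^3\right)\right|_0^4 \\
&= 96
\end{aligned}$$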
Figure: A solid with base between two graphs and trapezoidal cross-sections
Main Idea
The cross section of the base of a solid is a segment. If we know what role this segment plays in the
cross section of the solid, we can use the expression for the length of this segment to derive an expression
for A(x).
Remark
Notice it is not necessary to be able to visualize the solid to compute its volume from cross sections. It
is not even necessary to know what the cross-sections look like precisely. For instance, our trapezoids
may or may not have a right angle. As long as we can compute the area, the exact shape is irrelevant.
Example 2.2.7
A Solid Described by Measurements
Compute the volume of a pyramid with a square base of side length s and a height of h.
Solution
Let x = 0 be the base of the pyramid and x = h be the vertex. The cross sections are squares. Since
the edges of the pyramid are straight, the squares shrink linearly from s at x = 0 to 0 at x = h. The
line that goes through these two points is
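$$\text{Side length} = -\frac{s}{h}x + s$$
The cross sections have area
$$A(x) = (\text{Side length})^2 = \left(-\frac{s}{h}x + s\right)^2 = s^2\left(\frac{1}{h^2}x^2 - \frac{2}{h}x + 1\right)$$
We can plug this into the formula for volume.
$$\begin{aligned}
\text{Volume} &= \int_0^h s^2\left(\frac{1}{h^2}x^2 - \frac{2}{h}x + 1\right)dx \\
&= \left. s^2\left(\frac{1}{3h^2}x^3 - \frac{1}{h}x^2 + x\right)\right|_0^h \\
&= s^2\left(\frac{1}{3h^2}h^3 - \frac{1}{h}h^2 + h - 0\right) \\
&= s^2\left(\frac{1}{3} - 1 + 1\right)h \\
&= \frac{1}{3}s^2 h
\end{aligned}$$
The volume of the pyramid in cubic units is $V = \frac{1}{3}s^2 h$.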
Section 2.2
Exercises
Summary Questions
Q2 What is the significance of the function A(x) in the formula for the volume of a solid?
Q3 What shapes do we use to approximate the volume of a solid? Why do we choose that shape?
Q4 When we rotate the region under y = f (x) around the x axis, how do we compute the area of
each cross-section?
2.2.1
a square
a ball
a sphere
a cube
a cone
a triangle
Q6 Suppose I have a solid S. I tried to fit a unit cube into S but I couldn’t do it, no matter where
I placed the cube or how I rotated it. I conclude that the volume of S is less than 1 unit cube.
What do you think of my conclusion?
Q7 Will the volume of an object be greater if measured in cubic centimeters or cubic inches? Explain
using the definition of how we measure volume.
Q8 Suppose I create a solid by stacking a cone on top of a cylinder. How is the volume of my
new solid related to the volume of the cone and the volume of the cylinder? Explain using the
definition of how we measure volume.
2.2.2
Q9 Let S be a sphere of radius 5 centered at the origin. What are the cross sections, perpendicular
to the x-axis? How do they change as you travel along the axis from −5 to 5?
Q10 Describe or draw the cross sections of the pyramid below when it is cut by planes parallel to the
one pictured.
Q11 Suppose all of the cross sections of a solid S, perpendicular to the height, are identical (same
a perpendicular to an edge.
2.2.3
Q13 Suppose I’m trying to approximate the volume of a solid S of height 12 using four prisms of equal
height. Suppose those prisms have volumes 5.1, 6, 7.2 and 9.6.
b What are the areas of the cross sections I used to produce each prism?
Q14 Suppose I’m trying to approximate the volume of the half-ball below by prisms. I subdivide the
height into n subheights and use the cross section at the left hand side of each as the base of each
prism. Will I overestimate or underestimate the volume? Explain how you know in a sentence or
two.
Q15 Produce an approximation of the volume of a pyramid with height 9 and square base of side
length 6 using 3 prisms. There are multiple correct answers to this, corresponding to different
choices of where to take the cross sections.
Q16 Suppose a solid S has height 16. Suppose all of its cross-sections perpendicular to the height
have a different shape, but all of those shapes have area 5.
2.2.4
Q17 Compute the volume of the solid between $x = 0$ and $x = 3$ whose cross sections at each $x$ are squares of side length $e^x$.
Q18 Compute the volume of the solid between x = 0 and x = 2 whose cross sections at each x are
Q19 Compute the volume of the solid whose cross sections, perpendicular to the $x$-axis, are triangles whose bases lie between $y = 3x$ and $y = x^2$ from $x = 0$ to $x = 3$ and whose heights are equal to the length of their bases.
Q20 Compute the volume of a solid between $x = 1$ and $x = e^2$ whose cross sections perpendicular to the $x$-axis are rectangles of base $\ln x$ and height $\frac{\ln x}{x}$.
2.2.5
Q21 Compute the volume of the solid created by rotating the region under $y = \sqrt{x}$ from $x = 0$ to $x = 9$ around the $x$-axis.
b Suppose this semidisk is rotated around the x-axis. Describe the resulting solid.
d Write and evaluate an integral that computes the volume of the solid of rotation.
Q23 Compute the volume of the solid created by rotating the region $y = 4 - x^2$ from $x = -2$ to $x = 2$ about the $x$-axis.
Q24 Compute the volume of the solid created by rotating a trapezoid with vertices (2, 0), (5, 0), (5, 8)
2.2.6
Q25 Compute the volume of a solid whose base is the triangle under $y = -\frac{1}{2}x + 3$ in the first quadrant and whose cross sections, perpendicular to the $x$-axis are triangles of height 8.
Q26 Compute the volume of a solid whose base is the region enclosed by $y = \sqrt{x}$ and $y = \frac{x}{2}$ and whose cross sections, perpendicular to the $x$-axis are squares.
Q27 Compute the volume of a solid whose base is a right triangle with legs 4 and 3 and whose cross
sections, perpendicular to the leg of length 4, are semicircles with their diameter in the base.
Q28 Compute the volume of a solid S whose base is the unit disc and whose cross sections perpendicular
to the x-axis are isosceles right triangles, with one leg in the base.
a Set up an integral that will compute the geometric area of D. You do not need to evaluate
it.
b Let S be a solid whose base is D and whose cross sections perpendicular to the x-axis are
semicircles with their diameter in D. Set up an integral that will compute the volume of S.
You do not need to evaluate it.
Q30 Consider the solid obtained by rotating the triangle below around the x-axis.
a Describe the shape of the cross sections. Which measurements of this shape depend on x?
b Compute a formula for A(x), the area of the cross section at each value of x.
Q31 A solid S of height 12 has the following cross-sectional areas A(x) at height x. How would you
approximate the volume?
x A(x)
1 10
5 12
7 11
10 7
12 2
Section 2.3
Integration by Parts
Goals:
1 Use the integration by parts formula to find anti-derivatives and definite integrals.
2 Choose appropriate decompositions for integrating by parts.
3 Recognize when applying the formula multiple times will be fruitful.
The product rule gives us a reliable method for computing derivatives of products. If you can
differentiate each factor in a product, you can differentiate the entire product. This is not the case for
integration. In this section we add another tool to our limited tool set for integrating a product of two
functions. Even with this method, many problems will be permanently out of reach.
Question 2.3.1
How Do We Compute an Anti-Derivative of a Product of Two Functions?
We reversed the chain rule (which computes derivatives) to compute anti-derivatives of certain
functions. This method is called u-substitution. The du term means that we often end up integrating
a product of functions with this method.
Example
Compute the integral: $\int_0^3 x e^{x^2}\,dx$
Solution
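\[
\begin{aligned}
\int_0^3 x e^{x^2}\,dx &= \int_0^9 \frac{1}{2} e^u\,du && \text{u-substitution: } u = x^2,\ du = 2x\,dx,\ x = 0 \Rightarrow u = 0,\ x = 3 \Rightarrow u = 9 \\
&= \frac{1}{2} e^u \Big|_0^9 \\
&= \frac{1}{2}\left(e^9 - 1\right)
\end{aligned}
\]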
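A numerical sanity check can make the substitution more concrete. The sketch below is not part of the original solution: it assumes a plain midpoint Riemann sum (the helper name `midpoint_sum` is our own) and compares the integral in x, the integral in u, and the closed form $\frac{1}{2}(e^9 - 1)$.

```python
import math

def midpoint_sum(f, a, b, n):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Original integral, in terms of x, on [0, 3].
lhs = midpoint_sum(lambda x: x * math.exp(x ** 2), 0.0, 3.0, 100_000)
# Integral after substituting u = x^2, on [0, 9].
rhs = midpoint_sum(lambda u: 0.5 * math.exp(u), 0.0, 9.0, 100_000)
# Closed form from the worked solution.
exact = 0.5 * (math.exp(9) - 1)

print(lhs, rhs, exact)
```

All three printed values should agree to several decimal places, which is what we expect if the change of bounds was done correctly.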
Main Idea
u-substitution is extremely fragile. Our example relies on the fact that the factor x is a constant multiple
of the derivative of the inner function, $x^2$.
Since the chain rule can only produce certain products, we should look for other differentiation rules
that could produce other products. The product rule is the obvious candidate.
Reminder
The Product Rule states that if f (x) and g(x) are differentiable, then
\[ [f(x)g(x)]' = f'(x)g(x) + g'(x)f(x). \]
Example
Compute $\int x^2\cos x + 2x\sin x\,dx$
Solution
This integrand looks like it might be the output of the product rule. If we write
\[ f'(x)g(x) + g'(x)f(x) = x^2\cos x + 2x\sin x \]
we can match up the factors as
\[ f(x) = \sin x, \quad f'(x) = \cos x, \qquad g(x) = x^2, \quad g'(x) = 2x. \]
Since $\frac{d}{dx}\left(\sin(x)\,x^2\right) = x^2\cos x + 2x\sin x$ we can conclude
\[ \int x^2\cos x + 2x\sin x\,dx = \sin(x)\,x^2 + c \]
If anything, this is more fragile than u-substitution. It requires a sum of compatible products. How
can we make the formula $[f(x)g(x)]' = f'(x)g(x) + g'(x)f(x)$ more useful?
A formula that applies to a single product instead of a sum of two products would be much more
useful. We can obtain it by subtracting.
\[
\begin{aligned}
f'(x)g(x) + g'(x)f(x) &= [f(x)g(x)]' && \text{product rule} \\
\int f'(x)g(x) + g'(x)f(x)\,dx &= f(x)g(x) + c && \text{integrate both sides} \\
\int f'(x)g(x)\,dx + \int g'(x)f(x)\,dx &= f(x)g(x) + c && \text{sum rule of integrals} \\
\int g'(x)f(x)\,dx &= f(x)g(x) - \int f'(x)g(x)\,dx && \text{subtract from both sides}
\end{aligned}
\]
Notice we don’t need the “+c” anymore. Both sides contain an indefinite integral so the possible
constant of difference is built in on both sides. We can make one further move to simplify the equation.
Since $g'(x)\,dx$ is the differential of $g(x)$ and $f'(x)\,dx$ is the differential of $f(x)$, it is convenient to
represent these functions with variables. u and v are the traditional choices here.
This method is called integration by parts. Here is the formal statement.
Theorem
Suppose an integral can be written $\int u\,dv$ where
Example 2.3.2
Computing an Anti-derivative Using Integration by Parts
Compute $\int x e^x\,dx$.
Solution
To use integration by parts, we need to look at the integrand $x e^x$ and decide which part is u and which
part is dv. Let’s try letting $u = x$ and $dv = e^x\,dx$. The formula says
\[ \int u\,dv = uv - \int v\,du. \]
We can replace $\int x e^x\,dx$ by the right hand side, but we need to know what du and v are. We find du
Notice the integrand vdu is not a product. It is a function whose antiderivative we know. Thus
integration by parts allowed us to replace a product we couldn’t integrate with something we could.
Evaluating the integral, we obtain:
\[ \int x e^x\,dx = x e^x - e^x + c \]
\[
\begin{aligned}
\frac{d}{dx}\left(x e^x - e^x + c\right) &= \underbrace{x e^x + e^x(1)}_{\text{product rule}} - e^x \\
&= x e^x
\end{aligned}
\]
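The same check can be done numerically. This is only a sketch, assuming a symmetric difference quotient (the helper `central_difference` and the sample points are our own choices): it approximates the derivative of $x e^x - e^x$ at a few points and prints it next to the integrand $x e^x$.

```python
import math

def F(x):
    """Candidate antiderivative from integration by parts (taking c = 0)."""
    return x * math.exp(x) - math.exp(x)

def integrand(x):
    """The function we set out to integrate."""
    return x * math.exp(x)

def central_difference(g, x, h=1e-5):
    """Symmetric difference quotient approximating g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

# If F is an antiderivative, the two printed columns should match closely.
for x in (-2.0, 0.0, 0.5, 1.0, 3.0):
    print(x, central_difference(F, x), integrand(x))
```

A numerical match at a handful of points is not a proof, but a mismatch would immediately signal an algebra error.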
Remark
Question 2.3.3
How Do We Choose u and dv?
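What would happen if we again solved $\int x e^x\,dx$ by parts, but set
\[ u = e^x, \qquad dv = x\,dx? \]
\[
\begin{aligned}
\int x e^x\,dx &= \frac{1}{2} e^x x^2 - \int \frac{1}{2} x^2 e^x\,dx && \text{by parts: } u = e^x,\ dv = x\,dx,\ du = e^x\,dx,\ v = \tfrac{1}{2}x^2
\end{aligned}
\]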
This is no less correct than our previous application of the formula. It is, however, much less useful.
To evaluate this we need to know an anti-derivative of $\frac{1}{2}x^2 e^x$, which seems like an even harder problem
than the one we started with. As we can see, the choice of u and dv can determine the success or failure
of integration by parts. So what makes a good choice of u and dv?
In integration by parts, u is going to be differentiated. This usually makes functions simpler if
anything. dv is going to be integrated. This could make $\int v\,du$ difficult to compute. The following
I.L.A.T.E.
When deciding which factor of a product should be u and which should be dv, put them into the chart
below.
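$x^2$ is algebraic. $\tan^{-1}(x)$ is an inverse function. We should let $u = \tan^{-1}(x)$ and $dv = x^2\,dx$.
\[
\begin{aligned}
\int x^2\tan^{-1}(x)\,dx &= \frac{1}{3}x^3\tan^{-1}(x) - \int \frac{1}{3}x^3\,\frac{1}{1+x^2}\,dx && \text{by parts: } u = \tan^{-1}(x),\ dv = x^2\,dx,\ du = \tfrac{1}{1+x^2}\,dx,\ v = \tfrac{1}{3}x^3 \\
&= \frac{1}{3}x^3\tan^{-1}(x) - \frac{1}{6}\int \frac{x^2}{1+x^2}\,2x\,dx && \text{u-substitution: } u = 1 + x^2,\ du = 2x\,dx \\
&= \frac{1}{3}x^3\tan^{-1}(x) - \frac{1}{6}\int \frac{u-1}{u}\,du \\
&= \frac{1}{3}x^3\tan^{-1}(x) - \frac{1}{6}\int 1 - \frac{1}{u}\,du \\
&= \frac{1}{3}x^3\tan^{-1}(x) - \frac{1}{6}\left(u - \ln|u|\right) + c \\
&= \frac{1}{3}x^3\tan^{-1}(x) - \frac{1}{6}\left(1 + x^2 - \ln|1 + x^2|\right) + c
\end{aligned}
\]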
Example 2.3.4
Using Integration by Parts More than Once
Compute $\int_0^{\pi} x^2\cos x\,dx$
Solution
I.L.A.T.E. suggests $u = x^2$ and $dv = \cos x\,dx$. When we apply integration by parts to a definite integral,
the $\int v\,du$ maintains the same bounds of integration. The uv is evaluated at those bounds, because it
is part of the antiderivative.
\[
\begin{aligned}
\int_0^{\pi} x^2\cos x\,dx &= x^2\sin x\,\Big|_0^{\pi} - \int_0^{\pi} 2x\sin x\,dx && \text{by parts: } u = x^2,\ dv = \cos x\,dx,\ du = 2x\,dx,\ v = \sin x \\
&= x^2\sin x\,\Big|_0^{\pi} - \left(-2x\cos x\,\Big|_0^{\pi} - \int_0^{\pi} -2\cos x\,dx\right) && \text{by parts (again): } u = 2x,\ dv = \sin x\,dx,\ du = 2\,dx,\ v = -\cos x \\
&= x^2\sin x\,\Big|_0^{\pi} + 2x\cos x\,\Big|_0^{\pi} - 2\sin x\,\Big|_0^{\pi} \\
&= (\pi^2)(0) - (0)(0) + (2\pi)(-1) - (0)(1) - (0) + (0) \\
&= -2\pi
\end{aligned}
\]
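To see that the bounds really do carry through each step, we can evaluate both the original integral and the expression after one application of integration by parts numerically. This is a sketch under our own assumptions (a midpoint sum with an arbitrarily chosen number of subintervals), not a method from the text.

```python
import math

def midpoint_sum(f, a, b, n=100_000):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# The integral we started with.
original = midpoint_sum(lambda x: x ** 2 * math.cos(x), 0.0, math.pi)

# After one application of integration by parts:
# uv evaluated at the bounds, minus the integral of v du over the same bounds.
boundary = math.pi ** 2 * math.sin(math.pi) - 0.0 ** 2 * math.sin(0.0)
remaining = midpoint_sum(lambda x: 2 * x * math.sin(x), 0.0, math.pi)
after_one_step = boundary - remaining

print(original, after_one_step, -2 * math.pi)
```

The three values should agree closely, which supports keeping the bounds 0 and π on both the uv term and the remaining integral.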
Change of Variables?
Notice that despite defining functions u and v, we continue to work in terms of the variable x. Contrast
this with u-substitution where the variable x can be completely eliminated in a definite integral. That
approach isn’t possible here. We’d have to write v as a function of u. This would be complicated or
impossible.
Example 2.3.5
Using Integration by Parts to Produce an Equation
Compute $\int e^{2x}\cos x\,dx$
Solution
I.L.A.T.E. suggests $u = \cos x$ and $dv = e^{2x}\,dx$. To integrate dv we use a u-substitution. We apply the
integration by parts formula, factoring the $-\frac{1}{2}$ from the integrand:
\[
\begin{aligned}
\int e^{2x}\cos x\,dx &= \frac{1}{2}e^{2x}\cos x - \int -\frac{1}{2}e^{2x}\sin x\,dx && \text{by parts: } u = \cos x,\ dv = e^{2x}\,dx,\ du = -\sin x\,dx,\ v = \tfrac{1}{2}e^{2x} \\
&= \frac{1}{2}e^{2x}\cos x + \frac{1}{2}\int e^{2x}\sin x\,dx
\end{aligned}
\]
Did this help? We don’t know the antiderivative of $e^{2x}\sin x$. Even worse, it doesn’t seem to have
improved in any way. It is just as complicated as what we started with. Our intuition might be to give
up and try another approach. Perhaps I.L.A.T.E. has done us wrong and we should choose a different
u and dv. In this case, however, we should reject that intuition and continue. We’ll apply integration
by parts again.
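\[
\begin{aligned}
\int e^{2x}\cos x\,dx &= \frac{1}{2}e^{2x}\cos x + \frac{1}{2}\int e^{2x}\sin x\,dx && \text{by parts again: } u = \sin x,\ dv = e^{2x}\,dx,\ du = \cos x\,dx,\ v = \tfrac{1}{2}e^{2x} \\
&= \frac{1}{2}e^{2x}\cos x + \frac{1}{2}\left(\frac{1}{2}e^{2x}\sin x - \int \frac{1}{2}e^{2x}\cos x\,dx\right) \\
&= \frac{1}{2}e^{2x}\cos x + \frac{1}{4}e^{2x}\sin x - \frac{1}{4}\int e^{2x}\cos x\,dx
\end{aligned}
\]
Does this help? Again the integrand does not seem to have improved, until we notice that the
integrand is exactly what we began with. We could add $\frac{1}{4}\int e^{2x}\cos x\,dx$ to both sides of the equation,
and we could solve for $\int e^{2x}\cos x\,dx$ algebraically.
\[
\begin{aligned}
\int e^{2x}\cos x\,dx &= \frac{1}{2}e^{2x}\cos x + \frac{1}{4}e^{2x}\sin x - \frac{1}{4}\int e^{2x}\cos x\,dx \\
\frac{5}{4}\int e^{2x}\cos x\,dx &= \frac{1}{2}e^{2x}\cos x + \frac{1}{4}e^{2x}\sin x + c \\
\int e^{2x}\cos x\,dx &= \frac{4}{5}\left(\frac{1}{2}e^{2x}\cos x + \frac{1}{4}e^{2x}\sin x\right) + c \\
\int e^{2x}\cos x\,dx &= \frac{2}{5}e^{2x}\cos x + \frac{1}{5}e^{2x}\sin x + c
\end{aligned}
\]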
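Because this answer came from solving an equation rather than from a direct antiderivative, it is worth testing. The sketch below is our own check, assuming a midpoint sum on the arbitrarily chosen interval [0, 1]: it compares a numerical value of the definite integral with F(1) − F(0), where F is the antiderivative found above with c = 0.

```python
import math

def F(x):
    """Antiderivative obtained by solving the equation (taking c = 0)."""
    return 0.4 * math.exp(2 * x) * math.cos(x) + 0.2 * math.exp(2 * x) * math.sin(x)

def midpoint_sum(f, a, b, n=100_000):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Numerical value of the definite integral on [0, 1].
numeric = midpoint_sum(lambda x: math.exp(2 * x) * math.cos(x), 0.0, 1.0)
# Value predicted by the Fundamental Theorem of Calculus.
predicted = F(1.0) - F(0.0)

print(numeric, predicted)
```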
Main Idea
We’ve seen a variety of techniques to apply when integration by parts does not give us an immediate
answer. The success of integration by parts depends on the $\int v\,du$ term. You might use the following
flow chart to decide how to proceed once you have applied integration by parts.
Is $\int v\,du$ still a product?
    no: Integrate it. You are done.
    yes: Can you apply a u-sub?
        yes: Use u-sub.
        no: How does $\int v\,du$ compare
Section 2.3
Exercises
Summary Questions
Q4 Under what conditions would we want to apply integration by parts more than once?
2.3.1
Q5 Compute $\int \frac{\sin x}{1 + x^2} + \cos x\,\tan^{-1}x\,dx$
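$\int e^{x}\,dx$    $\int x e^{x}\,dx$    $\int x^2 e^{x}\,dx$    $\int x^3 e^{x}\,dx$
$\int e^{x^2}\,dx$    $\int x e^{x^2}\,dx$    $\int x^2 e^{x^2}\,dx$    $\int x^3 e^{x^2}\,dx$
$\int e^{x^3}\,dx$    $\int x e^{x^3}\,dx$    $\int x^2 e^{x^3}\,dx$    $\int x^3 e^{x^3}\,dx$
$\int e^{x^4}\,dx$    $\int x e^{x^4}\,dx$    $\int x^2 e^{x^4}\,dx$    $\int x^3 e^{x^4}\,dx$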
2.3.3
Q7 Evaluate $\int \frac{\ln x}{x^3}\,dx$.
Q8 Evaluate $\int x\sin x\,dx$.
Q9 Use integration by parts to compute $\int \tan^{-1}x\,dx$. Note that $\frac{d}{dx}\tan^{-1}x = \frac{1}{1+x^2}$.
Q10 We can write $\int \ln x\,dx$ as a product: $\int (1)(\ln x)\,dx$.
Q11 Compute $\int \sin^{-1}x\,dx$.
Q12 Compute $\int_0^{\pi/4} \tan^{-1}x\,dx$.
2.3.4
Q13 Compute $\int x^2\cos(x + 2)\,dx$.
Q14 Compute $\int_0^1 x^3 e^x\,dx$.
Q15 Compute $\int x^{-7}\sin(x^{-2})\,dx$. Hint: The easiest way to split this is not the correct way. You’ll
2.3.5
Q17 Compute $\int e^{3x}\sin x\,dx$.
Q18 Compute $\int e^{-x}\cos 2x\,dx$.
Q19 Compute $\int x^3 e^{x^2}\,dx$. Choose your dv carefully. You want something that you can integrate.
Q20 Compute $\int \sin(\ln x)\,dx$. Perform a u-substitution before trying by parts.
Q22 Let S be a solid between x = 0 and x = 3 whose cross-sections perpendicular to the x-axis are
triangles of base x and height $e^x$. Compute the volume of S.
Q23 Let S be the solid obtained by rotating the region below y = ln x from x = 1 to x = 5 about
the x-axis. Compute the volume of S.
Q24 Suppose that S is a solid between x = 1 and x = 5 whose cross sections (perpendicular to the
x-axis) are triangles of height $x^2$ and base ln x at each x. Compute the volume of S.
Section 2.4
Approximate Integration
Goals:
Question 2.4.1
What $x_i^*$ Can We Use when Approximating an Integral?
Definition
where $\Delta x$ are the lengths of the subintervals of $[a, b]$, and $x_i^*$ is a number in the ith subinterval.
Without the limit (which is difficult or impossible to compute anyway) the sums on the right are
approximations of the integral. Once we choose an $x_i^*$ for each i, we can evaluate this approximation.
The simplest idea is to just use the left endpoint of each subinterval as $x_i^*$.
Notation
The notation $L_n$ refers to the approximation of $\int_a^b f(x)\,dx$ by n rectangles,
\[ \sum_{i=1}^{n} f(x_i^*)\,\Delta x, \]
Example 2.4.2
Computing an Ln Approximation
a Compute an $L_3$ approximation of $\int_{-1}^{5} x^2\,dx$.
b Does $L_3$ over or underestimate the actual value of $\int_{-1}^{5} x^2\,dx$?
Solution
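a Let $f(x) = x^2$. The interval $[-1, 5]$ has length $5 - (-1) = 6$. Three rectangles means that
$\Delta x = \frac{6}{3} = 2$. We can divide up the interval to find all three subintervals. A diagram is a good
way to avoid mistakes.
[Diagram: the x-axis with subinterval endpoints at −1, 1, 3, 5]
\[
\begin{aligned}
L_3 &= \sum_{i=1}^{3} f(x_i^*)\,\Delta x \\
&= 2\left((-1)^2 + 1^2 + 3^2\right) \\
&= 22
\end{aligned}
\]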
b When the function increases, it has more signed area beneath it than the left-endpoint rectangles.
When it decreases it has less. $f(x) = x^2$ increases and decreases, but on the interval $[-1, 5]$, it
spends much more time increasing than decreasing. Thus we expect that $L_3$ underestimates the
true integral. We can verify our intuition with a computation.
\[ \int_{-1}^{5} x^2\,dx = \frac{x^3}{3}\Big|_{-1}^{5} = \frac{126}{3} > 22 \]
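The left-endpoint computation translates directly into a few lines of code. This is a minimal sketch, with the function name `left_sum` chosen by us, that reproduces $L_3 = 22$ and compares it to the exact value $\frac{126}{3} = 42$.

```python
def left_sum(f, a, b, n):
    """L_n: Riemann sum using the left endpoint of each of n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

# L_3 for f(x) = x^2 on [-1, 5]; the test points are -1, 1 and 3.
L3 = left_sum(lambda x: x ** 2, -1, 5, 3)

# Exact value from the antiderivative x^3 / 3.
exact = (5 ** 3 - (-1) ** 3) / 3

print(L3, exact)  # 22.0 42.0
```

Increasing n in the call to `left_sum` shows the underestimate shrinking as the rectangles get narrower.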
Question 2.4.3
How Accurate is an Ln or Rn Approximation?
An approximation is much more useful if we have some idea of how accurate (or inaccurate) it might
be. The way we quantify this inaccuracy is error.
Definitions
|error| ≤ N.
Determining error bounds can be difficult. Here are some questions to ask.
1 In what circumstances is the approximation exact?
2 What property or measurement seems to correspond to the amount of error?
Exercise
d What familiar calculus measurement appears to measure whether you are in the situations you
described in a - c ?
Solution
d Functions can be classified as increasing, decreasing or constant by their first derivative. f ′ (x)
Let’s use the results of the exercise to formulate an error bound for Ln .
Higher derivatives seem to produce more negative errors. If we allow for steeper and steeper slopes,
there is no limit to how large the error could be. So let’s put a bound on how big the derivative is.
Suppose we know that $f'(x) \le S$ on $[a, b]$. Over each interval $[x_i, x_{i+1}]$ we know that $f(x)$ lies below
the line of slope S through $(x_i, f(x_i))$:
\[ f(x) \le S(x - x_i) + f(x_i) \]
The region below the graph y = f (x) and above the ith rectangle is smaller than the region below the
line and above the rectangle, but we can compute the area of the larger region. It is a triangle. Its base
is $\Delta x = \frac{b-a}{n}$. Its height can be determined by the slope of the line.
Figure: The error and the error bound over one rectangle of an Ln approximation
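\[
\frac{\text{height}}{\text{base}} = \frac{\text{rise}}{\text{run}} = S
\quad\Rightarrow\quad \frac{\text{height}}{\Delta x} = S
\quad\Rightarrow\quad \text{height} = S\,\Delta x
\]
\[
\text{area} = \frac{1}{2}(\text{base})(\text{height}) = \frac{1}{2}S\,\Delta x^2 = \frac{1}{2}S\left(\frac{b-a}{n}\right)^2
\]
So the error over each subinterval can be no larger than $\frac{1}{2}S\left(\frac{b-a}{n}\right)^2$. There are n subintervals, so the
total $L_n$ approximation underestimates $\int_a^b f(x)\,dx$ by no more than $\frac{S(b-a)^2}{2n}$.
We can make a similar argument that if $f'(x) \ge -S$ then $L_n$ overestimates $\int_a^b f(x)\,dx$ by no more
than $\frac{S(b-a)^2}{2n}$. We can combine these two statements into one by using absolute values. $-S \le f'(x) \le S$
is rewritten $|f'(x)| \le S$.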
We could make the same argument for the $R_n$ approximation. We’d only need to swap the
overestimate with the underestimate. The error bounds it produces are the same. Our result can be
stated as a theorem:
Theorem
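If $E_L$ and $E_R$ are the errors in the $L_n$ and $R_n$ approximations of $\int_a^b f(x)\,dx$ and $|f'(x)| \le S$ on $[a, b]$
then
\[ |E_L| \le \frac{S(b-a)^2}{2n} \quad\text{and}\quad |E_R| \le \frac{S(b-a)^2}{2n}. \]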
Remark
The argument that the line of slope S is the “worst case” scenario is a useful heuristic, but you may be
unsatisfied with its lack of rigor. A formal argument relies on the following ideas:
Larger functions have larger integrals. If $f(x) \le g(x)$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$ as long as
$a \le b$.
The Fundamental Theorem of Calculus tells us we can write $f(x) = f(x_i) + \int_{x_i}^{x} f'(t)\,dt$.
The line of slope S would be $L(x) = f(x_i) + \int_{x_i}^{x} S\,dt$. Over the interval $[x_i, x_{i+1}]$, comparing these
integrals shows that $f(x) \le L(x)$. Thus $\int_{x_i}^{x_{i+1}} f(x)\,dx \le \int_{x_i}^{x_{i+1}} L(x)\,dx$. This tells us that there is
more error, and thus a larger underestimate in the left hand approximation of L(x) than there is in the
left hand approximation of f(x).
Example 2.4.4
Computing an EL Bound
Suppose we want to understand the error of an $L_n$ approximation of $\int_1^{16} \sqrt{x}\,dx$.
c What n would we need in order to guarantee that the $L_n$ approximation has error at most $\frac{1}{100}$?
d What problem would result if we tried to bound the error of an $L_n$ approximation of $\int_0^{16} \sqrt{x}\,dx$?
How might you resolve this?
Solution
a $f'(x) = \frac{1}{2\sqrt{x}}$. This is always positive, and it decreases as x increases. The largest value of $f'(x)$
on $[1, 16]$ occurs when x = 1. If we let $S = f'(1) = \frac{1}{2}$, we are guaranteed that for all x in $[1, 16]$,
$|f'(x)| \le \frac{1}{2}$.
b By our theorem
\[ |E_L| \le \frac{S(b-a)^2}{2n} = \frac{\frac{1}{2}(16-1)^2}{2(5)} = \frac{45}{4} \]
c We can set our error bound (with n as a variable) to be less than $\frac{1}{100}$ and solve for n.
\[
\begin{aligned}
|E_L| \le \frac{\frac{1}{2}(16-1)^2}{2n} &\le \frac{1}{100} \\
\frac{225}{4n} &\le \frac{1}{100} \\
(225)(100) &\le 4n \\
(225)(25) &\le n \\
5625 &\le n
\end{aligned}
\]
We conclude that the error will be less than $\frac{1}{100}$ as long as n is at least 5625. Note that since this
is an error bound, the actual error may shrink below $\frac{1}{100}$ with fewer rectangles. We would need a
different method to verify that, though.
d If we want to apply our theorem to $\int_0^{16} \sqrt{x}\,dx$, we need an S such that $|f'(x)| \le S$. This derivative
is $f'(x) = \frac{1}{2\sqrt{x}}$, which increases without bound as $x \to 0^+$. Thus there is no S, and we cannot
apply the error bound theorem.
To get around this problem we could break the interval into two parts and bound them by different
methods. We can bound the error on rectangles 2 through n over the interval $[\Delta x, 16]$ using the
theorem as above. In this case $S = \frac{1}{2\sqrt{\Delta x}}$ will work. To bound the error over the first rectangle
$[0, \Delta x]$, note that f(x) is increasing. The first rectangle of $L_n$ will underestimate the integral,
while the first rectangle of $R_n$ will overestimate it. Thus the actual error can be no bigger than
the difference between them, which is $\sqrt{\Delta x}\,\Delta x - 0\,\Delta x$. The total error can be no larger than the
sum of the error bound over $[0, \Delta x]$ and the error bound over $[\Delta x, 16]$.
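The arithmetic in parts b and c can be reproduced with exact fractions. The following is a sketch under our own naming (`left_error_bound`, `left_sum`), and it takes n = 5 for part b, as in the computation above. It also measures the actual error of $L_5$ against the exact value of the integral, to illustrate that the bound is a guarantee rather than a prediction.

```python
from fractions import Fraction
import math

def left_error_bound(S, a, b, n):
    """Bound S(b - a)^2 / (2n) on the error of an L_n approximation."""
    return S * (b - a) ** 2 / (2 * n)

def left_sum(f, a, b, n):
    """L_n: Riemann sum using left endpoints of n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

S = Fraction(1, 2)   # bound on |f'(x)| for f(x) = sqrt(x) on [1, 16]
a, b = 1, 16

# Part b: the bound with n = 5.
bound_5 = left_error_bound(S, a, b, 5)

# Part c: smallest n with S(b - a)^2 / (2n) <= 1/100.
tol = Fraction(1, 100)
n_min = math.ceil(S * (b - a) ** 2 / (2 * tol))

# Actual error of L_5; the exact integral is (2/3)(16^(3/2) - 1) = 42.
exact = 2 / 3 * (16 ** 1.5 - 1)
actual_error_5 = abs(exact - left_sum(math.sqrt, a, b, 5))

print(bound_5, n_min, actual_error_5)
```

The actual error of $L_5$ comes out well below the bound of 45/4, consistent with the note that fewer rectangles may suffice in practice.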
Question 2.4.5
How Can We Make our Approximation Less Sensitive to Slope?
$L_n$ and $R_n$ have large errors when the function is increasing or decreasing rapidly. We’ll examine two
approximations that are more resilient. The first is the midpoint approximation.
Notation
The $M_n$ approximation of $\int_a^b f(x)\,dx$ is calculated by summing:
\[ \sum_{i=1}^{n} f(x_i^*)\,\Delta x \]
Figure: $M_4$
Our final approximation abandons rectangles entirely. Using trapezoids instead allows for shapes that
reflect the value of the function at both the right and left endpoint. In this construction, the trapezoids
are sideways from the way you may be used to looking at them when you learned their area formula
$A = \frac{1}{2}(b_1 + b_2)h$. The parallel bases are vertical. The height is along the x-axis.
Notation
The $T_n$ approximation of $\int_a^b f(x)\,dx$ is calculated by summing:
\[ \sum_{i=1}^{n} \frac{1}{2}\left(f(x_i) + f(x_{i+1})\right)\Delta x \]
where $x_i$ and $x_{i+1}$ are the two endpoints of the ith subinterval.
$T_n$ can also be calculated as $\frac{1}{2}(L_n + R_n)$.
Figure: $T_4$
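The identity $T_n = \frac{1}{2}(L_n + R_n)$ is easy to test numerically. The sketch below uses helper names of our own and an arbitrarily chosen test case, $f(x) = \sqrt{x}$ on $[1, 16]$ with n = 5; it is an illustration, not a proof.

```python
import math

def left_sum(f, a, b, n):
    """L_n: left endpoints of n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

def right_sum(f, a, b, n):
    """R_n: right endpoints of n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 1) * dx) for i in range(n)) * dx

def trapezoid_sum(f, a, b, n):
    """T_n: sum of (1/2)(f(x_i) + f(x_{i+1})) dx over the n subintervals."""
    dx = (b - a) / n
    return sum(0.5 * (f(a + i * dx) + f(a + (i + 1) * dx)) * dx for i in range(n))

L = left_sum(math.sqrt, 1, 16, 5)
R = right_sum(math.sqrt, 1, 16, 5)
T = trapezoid_sum(math.sqrt, 1, 16, 5)

print(T, (L + R) / 2)
```

The two printed values agree up to floating-point rounding, since each trapezoid is the average of the left and right rectangles on its subinterval.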
Example 2.4.6
A Midpoint Approximation
Calculate the $M_3$ approximation of $\int_{-1}^{5} x^2\,dx$.
Solution
$\Delta x = \frac{5-(-1)}{3} = 2$. We can sketch the intervals:
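[Diagram: the x-axis with subinterval endpoints at −1, 1, 3, 5]
\[
\begin{aligned}
M_3 &= \sum_{i=1}^{3} f(x_i^*)\,\Delta x \\
&= 2\left(0^2 + 2^2 + 4^2\right) \\
&= 40
\end{aligned}
\]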
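As a sketch of how the midpoint rule might be coded (the name `midpoint_sum` is ours), the following reproduces $M_3 = 40$ and compares it with the exact value 42 of this integral.

```python
def midpoint_sum(f, a, b, n):
    """M_n: Riemann sum using the midpoint of each of n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# M_3 for f(x) = x^2 on [-1, 5]; the midpoints are 0, 2 and 4.
M3 = midpoint_sum(lambda x: x ** 2, -1, 5, 3)

# Exact value from the antiderivative x^3 / 3.
exact = (5 ** 3 - (-1) ** 3) / 3

print(M3, exact, exact - M3)  # 40.0 42.0 2.0
```

With the same three subintervals, the midpoint rule misses the exact value by 2, far less than the left-endpoint approximation of 22 does.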
Example 2.4.7
A Trapezoid Approximation Using a Table of Values
Approximation has no practical use for algebraic functions. We would rather get the exact answer
by taking an antiderivative and applying the Fundamental Theorem of Calculus. In many real-world
applications, our data about a function consists of a finite number of measurements. In this case, we
don’t even have an expression for the function, let alone its antiderivative. Here is an example where
approximation is the best we can do.
Suppose we have the following table of values for a function f(x):
x      0   2   4   6   8   10   12   14   16
f(x)   2   5   3   4   7   8    5    4    1
Calculate the $T_3$ approximation of $\int_2^{14} f(x)\,dx$.
Solution
$\Delta x = \frac{14-2}{3} = 4$. We can sketch the intervals:
[Diagram: the x-axis with subinterval endpoints at 2, 6, 10, 14]
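\[
\begin{aligned}
T_3 &= \sum_{i=1}^{3} \frac{1}{2}\left(f(x_i) + f(x_{i+1})\right)\Delta x \\
&= \frac{1}{2}\Delta x\left(f(x_1) + f(x_2) + f(x_2) + f(x_3) + f(x_3) + f(x_4)\right) \\
&= \frac{1}{2}\Delta x\left(f(2) + f(6) + f(6) + f(10) + f(10) + f(14)\right) \\
&= \frac{1}{2}(4)(5 + 4 + 4 + 8 + 8 + 4) \\
&= 66
\end{aligned}
\]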
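When the data is a table, the only real work is picking out the rows that land on subinterval endpoints. Here is one possible sketch; the function name `trapezoid_from_table` and its error handling are our own choices, and it assumes integer x-values as in the table above.

```python
xs = [0, 2, 4, 6, 8, 10, 12, 14, 16]
fs = [2, 5, 3, 4, 7, 8, 5, 4, 1]

def trapezoid_from_table(xs, fs, a, b, n):
    """T_n on [a, b] using only tabulated values; every endpoint must be in the table."""
    if (b - a) % n != 0:
        raise ValueError("n must divide b - a evenly for integer endpoints")
    dx = (b - a) // n
    table = dict(zip(xs, fs))
    nodes = [a + i * dx for i in range(n + 1)]
    missing = [x for x in nodes if x not in table]
    if missing:
        raise ValueError(f"no table entry for x = {missing}")
    # Sum the areas of the n trapezoids.
    return sum((table[nodes[i]] + table[nodes[i + 1]]) * dx / 2 for i in range(n))

T3 = trapezoid_from_table(xs, fs, 2, 14, 3)
print(T3)  # 66.0
```

Notice that the table entries at x = 4, 8 and 12 are simply not used by $T_3$; a larger n would bring them in.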
Question 2.4.8
How Do the Error Bounds of the Approximations Compare?
Tn and Mn have zero error when f (x) is a straight line, regardless of slope. Larger errors result
from high rates of curvature. You can see this by using a small number of rectangles/trapezoids and
increasing the curvature of the function. Proving an error bound involves using a quadratic as a “worst
case scenario.” Any function with second derivative smaller than the quadratic will have a smaller error.
Here is the result.
Theorem
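Suppose $|f''(x)| \le K$ for $a \le x \le b$. If $E_T$ and $E_M$ are the error in the trapezoid and midpoint
approximations of $\int_a^b f(x)\,dx$ then
\[ |E_T| \le \frac{K(b-a)^3}{12n^2} \quad\text{and}\quad |E_M| \le \frac{K(b-a)^3}{24n^2}. \]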
Remarks
1 The maximum error is smaller when the function has less curvature.
2 The error is also reduced by increasing n, the number of subintervals.
3 These formulas indicate that we can usually expect Mn to have half as much error as Tn .
4 As n increases, the error bounds for Mn and Tn approach 0 much more quickly than Ln and Rn .
Example 2.4.9
Choosing n to Meet an Error Target
Suppose we wish to approximate $\int_1^{16} \sqrt{x}\,dx$ by a midpoint approximation. How many rectangles
must we use to guarantee that the error is smaller than $\frac{1}{1000}$?
Solution
The midpoint error formula requires us to have a bound K on $|f''(x)|$ on $[1, 16]$.
\[
\begin{aligned}
f'(x) &= \frac{1}{2\sqrt{x}} \\
f''(x) &= -\frac{1}{4x^{3/2}}
\end{aligned}
\]
As x gets larger, the denominator of $f''(x)$ gets larger, meaning $|f''(x)|$ gets smaller (we could also
verify this by checking the sign of $f'''(x)$). Thus it will be largest at x = 1. We can safely use the value
there as our K
\[ |f''(x)| \le |f''(1)| = \frac{1}{4} = K \]
We can now apply the error bound formula, leaving n as a variable. We will set the error bound to be
less than $\frac{1}{1000}$ and solve for n.
\[
\begin{aligned}
|E_M| \le \frac{K(b-a)^3}{24n^2} &\le \frac{1}{1000} \\
\frac{\frac{1}{4}(16-1)^3}{24n^2} &\le \frac{1}{1000} && \text{all factors are positive} \\
\frac{(1000)(15)^3}{(4)(24)} &\le n^2 && \text{isolate } n^2 \\
\frac{140{,}625}{4} &\le n^2 \\
\frac{375}{2} &\le n && \text{square root of both sides}
\end{aligned}
\]
Thus any n bigger than $\frac{375}{2}$ will work. We need to use at least 188 rectangles to guarantee that the
error is less than $\frac{1}{1000}$. Note that we might achieve a sufficiently small error with fewer rectangles, but
our error bound theorem cannot guarantee it.
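Instead of solving the inequality by hand, we could search for the smallest n that satisfies it. This is a brute-force sketch with names of our own choosing, using exact fractions to avoid rounding questions at the boundary.

```python
from fractions import Fraction

def midpoint_error_bound(K, a, b, n):
    """Bound K(b - a)^3 / (24 n^2) on the error of an M_n approximation."""
    return K * (b - a) ** 3 / (24 * n ** 2)

K = Fraction(1, 4)        # bound on |f''(x)| for f(x) = sqrt(x) on [1, 16]
a, b = 1, 16
tol = Fraction(1, 1000)

# Increase n until the bound drops to the target.
n = 1
while midpoint_error_bound(K, a, b, n) > tol:
    n += 1

print(n, float(midpoint_error_bound(K, a, b, n)))
```

The search stops at n = 188, matching the algebraic answer, because 187 is still below 375/2 = 187.5.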
Section 2.4
Exercises
Summary Questions
Q2 What does the first derivative of f(x) tell you about the error in the right-hand approximation
of $\int_a^b f(x)\,dx$?
Q3 As the number of subintervals gets large, which approximation(s) converge most quickly to the
actual value?
2.4.1
Q5 Seong-ju and Anthony are both approximating $\int_{-4}^{4} x^2\,dx$ with 4 rectangles. They know that
they can use any combination of test points in their rectangles. What is the maximum difference
between their approximations?
$\int_3^{23} f(x)\,dx$?
b Can you write a general expression for $\Delta x$ and the $x_i^*$’s for $\int_a^b f(x)\,dx$?
2.4.2
Q7 Compute the $L_5$ approximation of $\int_1^{16} x^{3/2}\,dx$.
Q8 Compute the $R_3$ approximation of $\int_2^{8} x\sin\frac{\pi x}{12}\,dx$.
Q9 Compute the $L_4$ approximation of $\int_0^{2} x^3 e^x\,dx$.
Q10 Compute the $L_5$ approximation of $\int_3^{18} \frac{3^x}{x}\,dx$.
2.4.3
Q11 Compute the theoretical error bound on the $L_{14}$ approximation of $\int_1^{8} \sqrt[3]{x}\,dx$.
Q12 Compute the theoretical error bound on the $R_5$ approximation of $\int_0^{15} \frac{1}{x^2+1}\,dx$.
Q13 How large would n need to be to guarantee that the $L_n$ approximation of $\int_2^{8} \log_2 x\,dx$ is within
$\frac{1}{10000}$ of the actual value?
Q14 How large would n need to be to guarantee that the $R_n$ approximation of $\int_{-1}^{2} x^3\,dx$ is within
$\frac{1}{1000}$ of the actual value?
2.4.4
Q15 Suppose we make the following approximations of $\int_{15}^{30} 4x + 7\,dx$. Without computing them, put
them in order from least to greatest (some may be equal).
$L_4$, $M_4$, $L_8$, $M_8$, $R_4$, $R_8$, the actual value
Q16 Yiming has a great idea. He approximates $\int_a^b f(x)\,dx$ by 12 rectangles. In order to mitigate the
error of left and right hand approximations, he takes the right endpoint of the first subinterval as
a test point, but the left endpoint of the second subinterval. He continues to alternate for all 12
subintervals. What is another name for the approximation Yiming has produced?
2.4.5
Q17 Compute the $T_3$ approximation of $\int_1^{16} x^2 - x\,dx$.
Q18 Compute the $M_3$ approximation of $\int_1^{16} x^2 - x\,dx$.
Q19 Compute the $M_4$ approximation of $\int_1^{9} \cos\frac{\pi x^2}{12}\,dx$.
Q20 Compute the $T_2$ approximation of $\int_0^{6} e^{x^2+2x}\,dx$.
2.4.6
x      0    3    6    9    12   15   18   21
f(x)   10   13   11   15   13   11   9    12
a Compute the $M_2$ approximation of $\int_3^{15} f(x)\,dx$.
b Compute the $T_3$ approximation of $\int_0^{18} f(x)\,dx$.
x      1   2    3   4   5   6   7    8   9
h(x)   2   −1   3   4   2   1   −3   5   4
a Compute the $T_3$ approximation of $\int_1^{9} h(x)\,dx$.
b Compute the $M_3$ approximation of $\int_2^{8} h(x)\,dx$.
2.4.7
Q23 Let $f(x) = \frac{1}{x^3}$. Suppose you wanted to use a midpoint approximation with n rectangles to approximate
$\int_3^{5} f(x)\,dx$. How large must n be to guarantee your approximation had an error of no more
than $\frac{1}{10000}$? Your answer should have the form n ≥ . . ., but you do not need to simplify any
arithmetic.
Q24 Suppose we want to approximate $\int_1^{9} \sqrt{x}\,dx$.
b Solve for a value n such that $T_n$ has an error of at most $\frac{1}{1000000}$. Don’t simplify the arithmetic.
x 0 2 4 6 8 10 12 14
g(x) 3 5 8 9 7 4 3 1
a Compute an $M_3$ approximation of $\int_0^{12} g(x)\,dx$.
a What does her choice of K imply about the accuracy of her calculation?
Q27 Give an example of a function for which L4 and R4 are both overestimates on some interval. You
may want to express your function by drawing its graph.
Q28 Suppose we want to estimate $\int_4^{20} f(x)\,dx$ and have the following table of values
x 4 6 8 10 12 14 16 18 20
f (x) 3 5 4 2 −1 6 2 5 8
b Would you expect the M4 or the T8 approximation to give you a better estimate?
Q29 Consider $T_3$, the trapezoid approximation of $\int_2^{8} x^3\,dx$.
c Explain in a couple sentences how you can tell whether the error is positive or negative. You
can include a diagram, if you’d like to.
Q30 Suppose you are interested in the value of $\int_0^{25} f(x)\,dx$, but you have only the following data.
x      1    2    6    8    13   14   20   23   25
f(x)   12   19   20   20   28   34   50   57   66
How might you approximate $\int_0^{25} f(x)\,dx$?
Q31 Suppose you invent your own approximation for a definite integral. You name it the “ultimate
approximation” and denote it Un . Its formula is
\[ U_n = \frac{L_n + R_n + M_n + T_n}{4}. \]
Will Un overestimate or underestimate the integral of a linear function? Justify your answer.
Q32 Suppose we compute an $L_5$ approximation of $\int_{-7}^{13} f(x)\,dx$.
a What formula that we learned would give a bound on the error of this approximation? Fill in
all the information you can, and indicate the information that you would need to complete
the calculation. Be as specific as possible.
b Suppose that, instead of the information you need for the formula, you were only given that
f is an increasing function on [−7, 13]. How could you compute an error bound in this case?
Justify your answer.
Section 2.5
Improper Integrals
Goals:
So far we have been content to evaluate integrals of continuous functions over bounded intervals.
Not all functions are continuous. We may be interested in the area under a discontinuous function, even
one with a vertical asymptote. We may be interested in the area under the entire graph of a function,
not just over some subset. In many cases these areas will be infinite, but in some cases they are not.
We will need to develop the methods to determine which case is which.
Question 2.5.1
What Is Infinity?
Notation
The symbol ∞ implies that a variable or function is increasing without bound. It eventually gets bigger
than every number.
∞ is not a number. We cannot evaluate $\frac{1}{\infty}$ or $\infty \cdot 0$ or $\tan^{-1}(\infty)$.
The main way that we’ve encountered this notation is with limits. Limits at infinity will also be
relevant to improper integrals, so you may want to review them.
Exercise
a $\lim_{x\to\infty} \frac{1}{x^2}$
b $\lim_{x\to\infty} \sqrt{x}$
c $\lim_{t\to-\infty} e^t$
d $\lim_{y\to\infty} \sin y$
e $\lim_{w\to\infty} \ln w$
f $\lim_{x\to-\infty} \frac{3x^2+7}{x^2-5x}$
Solution
a $\lim_{x\to\infty} \frac{1}{x^2} = 0$.
b $\lim_{x\to\infty} \sqrt{x} = \infty$.
c $\lim_{t\to-\infty} e^t = 0$.
e $\lim_{w\to\infty} \ln w = \infty$.
f $\lim_{x\to-\infty} \frac{3x^2+7}{x^2-5x} = 3$.
Question 2.5.2
How Do We Integrate a Discontinuous Function?
\[ \int_0^5 f(x)\,dx = \lim_{\Delta x\to 0} \sum_{i=1}^{n} f(x_i^*)\,\Delta x \]
If we look at the rectangle approximations in this equation, we see that they can badly estimate the
function near the point of discontinuity.
Remarks
We might worry that the approximations are so bad that the limit $\lim_{\Delta x\to 0} \sum_{i=1}^{n} f(x_i^*)\,\Delta x$ does not
exist. Fortunately, it does, as long as there are only finitely many discontinuities.
f(x) almost has an antiderivative function. $F(x) = \int_0^x f(t)\,dt$ has derivative f(x) at all x,
except perhaps at the points of discontinuity.
While it may be comforting to know that an antiderivative function exists, it doesn’t help us evaluate
the integral. We don’t know what number to assign to F(x) for many values of x. So how do we compute
$\int_0^5 f(x)\,dx$? Instead of dealing with a function whose antiderivative we don’t know, we break this
into two integrals that we do know.
\[
\begin{aligned}
\int_0^5 f(x)\,dx &= \int_0^2 f(x)\,dx + \int_2^5 f(x)\,dx \\
&= \int_0^2 3x^2\,dx + \int_2^5 f(x)\,dx
\end{aligned}
\]
Why can’t we replace $\int_2^5 f(x)\,dx$ with $\int_2^5 10 - 2x\,dx$? At x = 2, $f(x) = 3x^2$, not $10 - 2x$. This is
unfortunate, because for any number t > 2 we could replace $\int_t^5 f(x)\,dx$ with $\int_t^5 10 - 2x\,dx$. We will
need to break our integral down further.
\[
\begin{aligned}
\int_0^5 f(x)\,dx &= \int_0^2 f(x)\,dx + \int_2^t f(x)\,dx + \int_t^5 f(x)\,dx \\
&= \int_0^2 3x^2\,dx + \int_2^t f(x)\,dx + \int_t^5 10 - 2x\,dx
\end{aligned}
\]
We still don’t know the value of the middle integral, but we know that as t approaches 2, the domain
of integration shrinks to 0. We can take advantage of this by taking a limit.
\[
\begin{aligned}
\int_0^5 f(x)\,dx &= \lim_{t\to 2^+}\left[\int_0^2 3x^2\,dx + \int_2^t f(x)\,dx + \int_t^5 10 - 2x\,dx\right] \\
&= \lim_{t\to 2^+}\left[x^3\,\Big|_0^2 + \int_2^t f(x)\,dx + \left(10x - x^2\right)\Big|_t^5\right] \\
&= \lim_{t\to 2^+}\left[8 - 0 + \int_2^t f(x)\,dx + (50 - 25) - (10t - t^2)\right] \\
&= \lim_{t\to 2^+}\left[33 - 10t + t^2 + \int_2^t f(x)\,dx\right] \\
&= 33 - 10(2) + 2^2 + \int_2^2 f(x)\,dx \\
&= 17
\end{aligned}
\]
Notice that we had to evaluate an integral with the variable t as a bound. Once we had applied the
Fundamental Theorem of Calculus and plugged in t, this integral became a continuous function and we
could evaluate the limit.
Notice also the strange role the limit played in this computation. Usually we take limits to see what
value a changing function approaches. Our function has the same value for any choice of t (make sure
you see why), so technically we were taking the limit of a constant function. The limit was a purely
computational tool.
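We can also watch Riemann sums settle on this value despite the jump. The sketch below assumes the piecewise function implied by the discussion ($3x^2$ up to and including x = 2, and $10 - 2x$ afterwards); that definition, and the helper name `riemann_sum`, are our own reading rather than something quoted from the text.

```python
def f(x):
    """Piecewise function inferred from the discussion: 3x^2 for x <= 2, else 10 - 2x."""
    return 3 * x ** 2 if x <= 2 else 10 - 2 * x

def riemann_sum(f, a, b, n, where):
    """Riemann sum with test points a fraction `where` of the way across each subinterval.

    where = 0.0 gives left endpoints, where = 1.0 gives right endpoints.
    """
    dx = (b - a) / n
    return sum(f(a + (i + where) * dx) for i in range(n)) * dx

# Left and right sums on [0, 5] for increasingly fine partitions.
for n in (10, 100, 1000, 10000):
    print(n, riemann_sum(f, 0, 5, n, 0.0), riemann_sum(f, 0, 5, n, 1.0))
```

Both columns approach 17. The rectangle next to x = 2 is badly wrong at every stage, but its width shrinks to zero, so its contribution to the error does too.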
Remark
The discontinuity at x = 2 meant that we were stuck with an integral $\int_2^t f(x)\,dx$. With a less well-behaved
function we might have also needed an integral on the left side of 2, like $\int_s^2 f(x)\,dx$. However,
these two integrals can always be sent to zero by a limit, so when solving integrals of discontinuous
functions, we can leave these out of our calculations.
A removable discontinuity should not slow us down even this much. The area under a single point
of discontinuity is zero. We can use the following theorem for a function with any finite number of
removable discontinuities.
Theorem
If f (x) and g(x) are equal on [a, b] except at a finite number of points, then
\[ \int_a^b f(x)\,dx = \int_a^b g(x)\,dx. \]
\[
\begin{aligned}
\int_0^5 f(x)\,dx &= \int_0^2 \underbrace{f(x)}_{=\,3x^2}\,dx + \int_2^5 \underbrace{f(x)}_{=\,10 - 2x \text{ except at } x = 2}\,dx \\
&= \int_0^2 3x^2\,dx + \int_2^5 10 - 2x\,dx
\end{aligned}
\]
Most discontinuities can be handled this way, but there is one type that will still require limits.
Example 2.5.3
Integrating a Function with a Vertical Asymptote
Definition
When f(x) has a vertical asymptote at c in $[a, b]$ we call $\int_a^b f(x)\,dx$ an improper integral.
How can we compute $\int_0^4 \frac{1}{\sqrt{x}}\,dx$?
In this case, breaking this integral into 2 doesn’t help.
\[ \int_0^4 \frac{1}{\sqrt{x}}\,dx = \lim_{t\to 0^+}\left[\int_0^t \frac{1}{\sqrt{x}}\,dx + \int_t^4 \frac{1}{\sqrt{x}}\,dx\right] \]
We cannot take for granted that $\lim_{t\to 0^+} \int_0^t \frac{1}{\sqrt{x}}\,dx$ goes to 0. The interval is getting smaller, but the
values of the function may be so large that its rectangle approximations stay arbitrarily large and do not
limit to 0. If there were an unbounded amount of area in $\lim_{t\to 0^+} \int_0^t \frac{1}{\sqrt{x}}\,dx$, then as $t \to 0^+$, $\int_t^4 \frac{1}{\sqrt{x}}\,dx$
would absorb more and more of that area and tend to ∞. Thus if (and only if) $\lim_{t\to 0^+} \int_t^4 \frac{1}{\sqrt{x}}\,dx$ exists,
we can assume that the remaining piece $\int_0^t \frac{1}{\sqrt{x}}\,dx$ limits to 0 and can be ignored.
Solution
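\[
\begin{aligned}
\int_0^4 \frac{1}{\sqrt{x}}\,dx &= \lim_{t\to 0^+}\int_t^4 \frac{1}{\sqrt{x}}\,dx \\
&= \lim_{t\to 0^+} 2\sqrt{x}\,\Big|_t^4 \\
&= \lim_{t\to 0^+} \left(2\sqrt{4} - 2\sqrt{t}\right) \\
&= 4 - 0
\end{aligned}
\]
Since $\lim_{t\to 0^+}\int_t^4 \frac{1}{\sqrt{x}}\,dx$ exists, we conclude that
\[ \int_0^4 \frac{1}{\sqrt{x}}\,dx = \lim_{t\to 0^+}\int_t^4 \frac{1}{\sqrt{x}}\,dx = 4 \]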
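As a numerical sanity check on this limit, the sketch below evaluates the integral of 1/√x from t to 4 for shrinking values of t, using the antiderivative 2√x from the solution. The function name `tail_integral` and the sample values of t are illustrative choices, not part of the text.

```python
import math

def tail_integral(t):
    """Integral of 1/sqrt(x) from t to 4, using the antiderivative 2*sqrt(x)."""
    return 2 * math.sqrt(4) - 2 * math.sqrt(t)

# Let t shrink toward 0 from the right and watch the values settle near 4.
for t in [1.0, 0.1, 0.01, 1e-4, 1e-8]:
    print(f"t = {t:<8g} integral from t to 4 = {tail_integral(t):.6f}")
```

The printed values increase toward 4 without ever exceeding it, which is what convergence of the improper integral looks like numerically.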
Main Idea
To compute an improper integral, we introduce a dummy variable t and take limit(s) as t → c. If the
limit(s) exist, we say the integral converges. If any do not, we say it diverges.
Remark
Convergent and divergent are the terms that describe whether the limit which defines an integral
approaches a single, finite numerical value. They perform a similar role to “exists” and “does not exist”
for limits or “defined” and “undefined” for arithmetic.
Question 2.5.4
How Can We Compute an Integral over an Unbounded Region?
Definition
An integral of the form $\int_a^\infty f(x)\,dx$ is also called an improper integral. We evaluate it by computing
\[ \int_a^\infty f(x)\,dx = \lim_{t\to\infty}\int_a^t f(x)\,dx \]
assuming this limit exists. If the limit exists we say the improper integral converges. Otherwise we say
it diverges.
Similarly, we can compute $\int_{-\infty}^b f(x)\,dx = \lim_{t\to-\infty}\int_t^b f(x)\,dx$.
Example 2.5.5
Evaluating an Improper Integral
Compute $\int_2^\infty \frac{32}{x^3}\,dx$.
Solution
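\[
\begin{aligned}
\int_2^\infty \frac{32}{x^3}\,dx &= \lim_{t\to\infty}\int_2^t 32x^{-3}\,dx \\
&= \lim_{t\to\infty} -16x^{-2}\,\Big|_2^t \\
&= \lim_{t\to\infty} \left(-\frac{16}{t^2} + 4\right) \\
&= 4
\end{aligned}
\]
Since the limit exists, it is the value of the improper integral: $\int_2^\infty \frac{32}{x^3}\,dx = 4$.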
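The limit can also be watched numerically. The sketch below approximates the integral of 32/x³ from 2 to t with midpoint rectangles for growing t and compares each value with 4 − 16/t². The helper `midpoint_sum`, the rectangle count, and the sample values of t are assumptions made for illustration.

```python
def midpoint_sum(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))

def integrand(x):
    return 32 / x**3

# The upper limit t grows; the exact value of each integral is 4 - 16/t^2.
for t in [10, 100, 1000]:
    approx = midpoint_sum(integrand, 2, t)
    print(f"t = {t:>5}: midpoint sum = {approx:.5f}, 4 - 16/t^2 = {4 - 16 / t**2:.5f}")
```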
Example 2.5.6
An Integral over the Entire Real Line
So far we have looked at intervals unbounded in one direction. If the interval is (−∞, ∞), the entire
real line, then we use the following definition.
Definition
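The improper integral $\int_{-\infty}^\infty f(x)\,dx$ is computed:
\[ \int_{-\infty}^\infty f(x)\,dx = \int_{-\infty}^a f(x)\,dx + \int_a^\infty f(x)\,dx \]
for any number a, so long as both integrals on the right converge. If either integral diverges, then we
say $\int_{-\infty}^\infty f(x)\,dx$ diverges as well.
Let
\[ f(x) = \begin{cases} e^x & \text{if } x < 1 \\ \frac{e}{\sqrt{x}} & \text{if } x \ge 1 \end{cases}. \]
Compute $\int_{-\infty}^\infty f(x)\,dx$.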
Figure: An integral over the real line, broken into two limits
Solution
We break this integral into two limits. The natural breaking point is a = 1 since that is where the
function changes branches anyway. Both limits must converge for the integral to converge.
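\[
\begin{aligned}
\lim_{s\to-\infty}\int_s^1 f(x)\,dx &= \lim_{s\to-\infty}\int_s^1 e^x\,dx \\
&= \lim_{s\to-\infty} e^x\,\Big|_s^1 \\
&= \lim_{s\to-\infty} \left(e - e^s\right) \\
&= e
\end{aligned}
\]
\[
\begin{aligned}
\lim_{t\to\infty}\int_1^t f(x)\,dx &= \lim_{t\to\infty}\int_1^t \frac{e}{\sqrt{x}}\,dx \\
&= \lim_{t\to\infty} 2e\sqrt{x}\,\Big|_1^t \\
&= \lim_{t\to\infty} \left(2e\sqrt{t} - 2e\right) \\
&= \infty \text{ (diverges)}
\end{aligned}
\]
One limit converges to e. The other diverges. This means that $\int_{-\infty}^\infty f(x)\,dx$ diverges.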
Question 2.5.7
Can We Take a Limit of $\int_{-t}^{t} f(x)\,dx$ Instead?
We might wonder whether we need to break an integral $\int_{-\infty}^\infty f(x)\,dx$ into two integrals. Instead
of two dummy variables, one going to −∞ and one going to ∞, could we replace them by one? The
integral $\int_{-\infty}^\infty x^3\,dx$ is a useful test case. We can certainly compute
\[
\begin{aligned}
\lim_{t\to\infty}\int_{-t}^{t} x^3\,dx &= \lim_{t\to\infty} \frac{x^4}{4}\,\Big|_{-t}^{t} \\
&= \lim_{t\to\infty} \left(\frac{t^4}{4} - \frac{t^4}{4}\right) \\
&= \lim_{t\to\infty} 0 \\
&= 0
\end{aligned}
\]
This might even seem right because the area above the axis seems to cancel out the area below the
axis. However, intuitively, we expect that the area of a region should be preserved if we shift it in some
direction. Let’s shift this graph one unit to the left.
\[
\begin{aligned}
\lim_{t\to\infty}\int_{-t}^{t} (x+1)^3\,dx &= \lim_{t\to\infty} \frac{(x+1)^4}{4}\,\Big|_{-t}^{t} \\
&= \lim_{t\to\infty} \left(\frac{(t+1)^4}{4} - \frac{(-t+1)^4}{4}\right) \\
&= \lim_{t\to\infty} \left(\frac{t^4 + 4t^3 + 6t^2 + 4t + 1}{4} - \frac{t^4 - 4t^3 + 6t^2 - 4t + 1}{4}\right) \\
&= \lim_{t\to\infty} \left(2t^3 + 2t\right) \\
&= \infty
\end{aligned}
\]
We can see that, for any choice of t > 0, there will be more area above the axis than below, and the
difference grows quickly as t increases. If the area of a region changes when we shift it to the side, then
that area was not well defined to begin with. We thus say that these integrals diverge, not because
they go to ∞ or −∞, but because they are not defined at all. The formal definition above handles this
example correctly. $\int_{-\infty}^0 x^3\,dx$ diverges, so $\int_{-\infty}^\infty x^3\,dx$ also diverges.
The “shortcut” can suggest that the integral converges, when in fact it diverges.
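The effect of shifting can be checked with exact arithmetic. The sketch below evaluates the symmetric integral of (x + shift)³ from −t to t through the antiderivative (x + shift)⁴/4, so the unshifted and shifted integrands can be compared side by side. The function name `symmetric_integral` and the sample values of t are hypothetical choices for illustration.

```python
from fractions import Fraction

def symmetric_integral(shift, t):
    """Exact integral of (x + shift)^3 from -t to t via the antiderivative (x + shift)^4 / 4."""
    upper = Fraction((t + shift) ** 4, 4)
    lower = Fraction((-t + shift) ** 4, 4)
    return upper - lower

# Columns: t, integral of x^3, integral of (x + 1)^3, each over [-t, t].
for t in [1, 10, 100]:
    print(t, symmetric_integral(0, t), symmetric_integral(1, t))
```

The first integral stays at 0 for every t, while the second grows without bound in size, so the symmetric limit depends on where the graph sits.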
Synthesis 2.5.8
A Comparison Test
Theorem
If $f(x) \le g(x)$ on $[a, b]$ then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.
Theorem
Let a be a real number or ±∞. If $F(x) \le G(x)$ for all x near a, then $\lim_{x\to a} F(x) \le \lim_{x\to a} G(x)$.
Suppose we have a function f(x) whose anti-derivative we don’t know, and a function g(x) whose
anti-derivative we do know. What can the divergence or convergence of $\int_a^\infty g(x)\,dx$ tell us about
$\int_a^\infty f(x)\,dx$?
Solution
If we know that $f(x) \le g(x)$ then for all $t \ge a$, $\int_a^t f(x)\,dx \le \int_a^t g(x)\,dx$. This allows us to also
compare their limits, which are the improper integrals $\int_a^\infty f(x)\,dx$ and $\int_a^\infty g(x)\,dx$. This could be
useful in a couple of ways.
If $\lim_{t\to\infty}\int_a^t g(x)\,dx = -\infty$ then $\lim_{t\to\infty}\int_a^t f(x)\,dx = -\infty$ as well, meaning $\int_a^\infty f(x)\,dx$ diverges.
If on the other hand $f(x) \ge g(x)$ and $\lim_{t\to\infty}\int_a^t g(x)\,dx = \infty$ then $\lim_{t\to\infty}\int_a^t f(x)\,dx = \infty$ as well,
which also means $\int_a^\infty f(x)\,dx$ diverges.
We might like to reverse these and say that if $\int_a^\infty g(x)\,dx$ converges, $\int_a^\infty f(x)\,dx$ must as well,
but $\int_a^\infty f(x)\,dx$ can diverge without going to infinity. f(x) could oscillate between positive and
negative so that $\int_a^t f(x)\,dx$ increases and decreases and does not have a limit as t → ∞.
We can actually solve the last issue by adding the assumption that f(x) is non-negative. The result is
not easy to prove, but it is useful.
Theorem
There are similar versions of this theorem for integrals to −∞ or for functions that are non-positive.
Section 2.5
Exercises
Summary Questions
2.5.1
Q5 In the expressions below, which of the boxes can legally be replaced by an ∞ symbol?
Z 4 8
5 1
lim x + 2 = 3 f (x) dx = e + x2 + 2x − log 7
|x|
x→ 1 0 6 1
Q6 Evaluate $\lim_{x\to\infty} \sqrt[4]{x^3 - 2x + 1}$.
a $\lim_{x\to\infty} \frac{x^2 + 3x + 5}{e^x}$
b $\lim_{x\to-\infty} \frac{x^2 + 3x + 5}{e^x}$
Q8 Evaluate $\lim_{w\to\infty} \ln\frac{1}{w}$.
2.5.2
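Q9 Evaluate $\int_0^3 \frac{x^2}{x}\,dx$. Explain how you dealt with any discontinuities.
Q10 Let
\[ f(x) = \begin{cases} 4 & \text{if } x = 1, 4, \text{ or } 6 \\ 2 & \text{otherwise} \end{cases}. \]
Q11 Let
\[ g(x) = \begin{cases} \sqrt{x} & \text{if } 0 \le x \le 4 \\ 3 & \text{if } 4 < x < 6 \\ \frac{1}{x^2} & \text{if } 6 \le x \end{cases}. \]
Compute $\int_1^8 g(x)\,dx$.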
2.5.3
Q13 Consider the integral $\int_{-2}^2 \frac{1}{x}\,dx$.
Q14 Evaluate $\int_0^1 \ln x\,dx$.
Q15 Evaluate $\int_0^4 \left(\frac{1}{\sqrt{x}} + \frac{1}{\sqrt{4-x}}\right) dx$.
Q16 Evaluate $\int_0^3 \frac{2}{w^2}\,dw$.
2.5.4
Q17 How large will the base (∆x) of each rectangle be, if we want to approximate:
2.5.5
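Q23 Compute
\[ \int_{-\infty}^\infty xe^{-x^2}\,dx. \]
Q24 Show how to evaluate $\int_{-\infty}^\infty x^{1/3}\,dx$ or show that it diverges.
Q25 Let
\[ f(x) = \begin{cases} \frac{1}{x^3} & \text{if } x < -2 \\ \frac{1}{(x+4)^2} & \text{if } x \ge -2 \end{cases}. \]
Evaluate $\int_{-\infty}^\infty f(x)\,dx$.
Q26 How would you write $\int_{-\infty}^\infty \frac{1}{1+x^2}\,dx$ as a sum of two limits? You might recall that $\int \frac{1}{1+x^2}\,dx = \tan^{-1} x + c$. Use this to evaluate the integral.
Q27 Let
\[ f(x) = \begin{cases} \sqrt[3]{x} & \text{if } x < 8 \\ 10 - x & \text{if } x \ge 8 \end{cases}. \]
Q28 Let
\[ f(x) = \begin{cases} x^{-4/3} & \text{if } x < -8 \\ \frac{1}{\sqrt[3]{x}} & \text{if } -8 \le x < 0 \\ e^{-x} & \text{if } x \ge 0 \end{cases}. \]
Evaluate $\int_{-\infty}^\infty f(x)\,dx$.
Q29 Consider the region R below $y = \frac{1}{x}$, above y = 0 and to the right of x = 1.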
b Suppose R is rotated around the x-axis to create a solid S. Compute the volume of S.
Q30 Consider the region in the first quadrant whose boundary is the curves $y = x^3$, $y = 2x - 1$ and
y = 0.
a Write the area of this region as an integral in the variable y. Do not evaluate.
b Suppose this region is rotated around the x-axis. Write the resulting volume using one or
more integrals. Do not evaluate.
Section 2.6
Probability
Goals:
The main problem facing every planner is uncertainty. When will the next epidemic strike? Will the
stock market go up or down? How many rare particles will flow through a detection device? These
outcomes cannot be known ahead of time, but they can be modeled as probabilities. Knowing when the
epidemic is likely to happen can guide our decision of how much to invest in mitigation. Knowing how
many particles are likely to pass through an area can inform us how sensitive our detection device needs
to be.
On the other hand, probabilities can also help us understand what has already happened. Probabilities
tell us whether the results of an experiment are likely to be a coincidence. Is an apparent pattern just
the variation inherent in random sampling, or is it likely to be present if the procedure is repeated? This
is in fact the basic model for statistical reasoning:
1 Assume that the type of pattern you’re looking for does not exist (a null hypothesis).
2 Collect observations.
Such reasoning allows us to conclude that a survey is representative of the population as a whole. It
allows us to understand what outcome will occur on average, or how much outcomes are likely to vary.
Such statistics help us understand the way the world works. We can design our next experiment or plan
our future behavior around that understanding. For example, on average, the stock market goes up.
This is one of the most powerful financial facts available to long-term investors, and it can be grounded
in a probabilistic study of past performance.
Question 2.6.1
What Is a Continuous Probability Distribution?
Definition
A random variable encodes the possible outcomes of a random selection. We use the notation
P (outcome) to denote the probability that a particular outcome occurs. If an outcome is impossible,
we write P (outcome) = 0. If it is certain we write P (outcome) = 1.
Example
Our outcome can be any expression concerning the random variable, for instance:
If S is the sum of the rolls of two six-sided dice, then
\[ P(S = 8) = \frac{5}{36}. \]
\[ P(T \ge 1) = \frac{3}{4}. \]
We can encode these probabilities with a distribution function. The value of the function at each
number a is the probability that the outcome is a.
Example
Notice
What if we wanted to model height with a random variable? No one is exactly 68 inches tall. Even
people who say they are “five feet eight inches” are slightly taller or shorter. A distribution function
like we made for coins is unsuitable. It would have the property fH (h) = 0 for all h. To handle this
situation, we need to define a different kind of random variable with a different relationship to a defining
function.
Definition
A continuous random variable X is a random variable whose outcomes are real numbers, and whose
probability is modeled by a probability density function $f_X(x)$ such that
\[ P(a \le X \le b) = \int_a^b f_X(x)\,dx. \]
Remark
The term density should give us a hint about how to think about these functions. Density is a rate.
The value of a probability density function tells you the rate of likelihood per unit of length on the real
number line. Integrating this rate over an interval gives the total likelihood of lying on that interval,
much like integrating a rate of change over an interval computes the total change.
An integral is the natural way to measure probability. The rules of integration are compatible with
our intuition of probability. Suppose we have an interval [a, b] broken into two or more subintervals. The
total probability of X having an outcome in [a, b] is equal to the sum of the probabilities of the outcome
lying in each subinterval. Similarly, the area above [a, b] and below the graph y = f (x) is equal to the
sum of the areas above each subinterval. In equations, these are the laws:
\[ P(a \le X \le c) + P(c \le X \le b) = P(a \le X \le b) \]
\[ \int_a^c f_X(x)\,dx + \int_c^b f_X(x)\,dx = \int_a^b f_X(x)\,dx \]
Example 2.6.2
Describing a Random Variable from its Density Function
Solution
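a We need to check that $f_X(x)$ is never negative and $\int_{-\infty}^\infty f_X(x)\,dx = 1$.
\[
\begin{aligned}
\int_{-\infty}^\infty f_X(x)\,dx &= \int_{-\infty}^0 f_X(x)\,dx + \int_0^3 f_X(x)\,dx + \int_3^\infty f_X(x)\,dx \\
&= \int_{-\infty}^0 0\,dx + \int_0^3 \frac{1}{9}x^2\,dx + \int_3^\infty 0\,dx \\
&= \frac{1}{27}x^3\,\Big|_0^3 \\
&= \frac{1}{27}(27 - 0) \\
&= 1
\end{aligned}
\]
b
\[
\begin{aligned}
P(X \ge 2) &= \int_2^\infty f_X(x)\,dx \\
&= \int_2^3 f_X(x)\,dx + \int_3^\infty f_X(x)\,dx \\
&= \int_2^3 \frac{1}{9}x^2\,dx + \int_3^\infty 0\,dx \\
&= \frac{1}{27}x^3\,\Big|_2^3 \\
&= \frac{1}{27}(27 - 8) \\
&= \frac{19}{27}
\end{aligned}
\]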
c Outcomes outside of [0, 3] are impossible. Among the outcomes in [0, 3], outcomes closer to 3 are
more likely than outcomes closer to 0, because the density function has a greater value there.
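The two computations in this solution can be reproduced with exact fractions. This is a minimal sketch that assumes the density is x²/9 on [0, 3] and 0 elsewhere, as the integrands in the solution indicate; the helper name `cdf_piece` is hypothetical.

```python
from fractions import Fraction

def cdf_piece(x):
    """Antiderivative x^3/27 of the density x^2/9, valid for 0 <= x <= 3."""
    return Fraction(x) ** 3 / 27

total_probability = cdf_piece(3) - cdf_piece(0)   # integral of the density over [0, 3]
prob_at_least_2 = cdf_piece(3) - cdf_piece(2)     # P(X >= 2)
print(total_probability, prob_at_least_2)
```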
Main Ideas
To verify that a function is a probability density function, we need to check that it is never negative
and that it integrates, over the entire real line, to 1.
We compute the probability that X has an outcome in an interval by integrating fX (x) over that
interval.
Outcomes of X where fX (x) is large are more likely than outcomes where fX (x) is small.
Figure: The density function of X and the areas that represent the likelihood of larger and smaller
outcomes
Question 2.6.3
What Density Functions Arise Naturally?
The requirements to be a probability density function are not very strict. The vast majority of
probability density functions do not model a real life phenomenon or even an intriguing thought experiment.
What follows are three families of density functions that are especially useful. The first is the simplest.
When we lack data to suggest otherwise, it is a common choice when creating a model with some
randomness.
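Definition
A uniform random variable X on [a, b] has the constant density function
\[ f_X(x) = \begin{cases} \frac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases} \]
Notice that the shorter the interval [a, b] is, the higher the density required to integrate to a total
probability of 1.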
An intuitive but imprecise way to describe a random variable with a uniform distribution is to say that
all outcomes in [a, b] are equally likely. Since every outcome of a continuous random variable occurs with
probability 0, this is unhelpful. X is remarkable, because all outcomes in [a, b] have equal probability
density. To connect this to actual probabilities, we might say that all subintervals of [a, b] are equally
likely to contain the outcome of X, but this is incorrect. X is 3 times as likely to have an outcome in
an interval of length 6 as an interval of length 2. A precise statement would be: the likelihood of the
outcome of X occurring in each subinterval of [a, b] is proportional to the length of the subinterval.
Our second family of random variables naturally measures waiting time. It answers questions like:
when will the next customer come in? When will this device next detect a certain type of ambient
particle? Here is the formal definition.
Definition
Suppose an event happens randomly and uniformly at an average rate of λ times per unit of time (x).
Then the amount of time until it next occurs is given by the exponential distribution:
\[ f_X(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } 0 \le x \\ 0 & \text{if } x < 0 \end{cases} \]
Figure: The density function of an exponential distribution
Example
Gravitational waves large enough to detect pass through the earth from time to time. Suppose we
switch on a gravitational wave detector, and the time (in days) until the first detection is modeled by
the exponential random variable X with density function $0.7e^{-0.7x}$.
The probability that the first detection occurs within two days is $1 - e^{-1.4} \approx 0.75$.
If the first detection does not occur in the first two days, then the probability that it occurs in the
following two days is 0.75.
If the first detection does not occur in the first four days, then the probability that it occurs in
the following two days is 0.75.
And so on.
From this we can compute
\[
\begin{aligned}
P(2 \le X \le 4) &= \underbrace{(1 - P(X \le 2))}_{X \text{ is not in the first two days}}(0.75) \\
&= (0.25)(0.75) \\
&= 0.1875
\end{aligned}
\]
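The rounded figures in this example can be compared with unrounded ones. The sketch below computes the same probabilities from the exponential formula P(X ≤ x) = 1 − e^(−λx) with λ = 0.7; the names `RATE` and `prob_within` are illustrative, not from the text.

```python
import math

RATE = 0.7  # lambda, detections per day, from the example's density 0.7e^{-0.7x}

def prob_within(days, rate=RATE):
    """P(X <= days) for an exponential random variable: 1 - e^(-rate * days)."""
    return 1 - math.exp(-rate * days)

p_first_two = prob_within(2)
p_days_two_to_four = prob_within(4) - prob_within(2)
# Product form used in the text: (no detection in two days) * (detection in the next two).
p_product_form = (1 - p_first_two) * p_first_two
print(round(p_first_two, 4), round(p_days_two_to_four, 4), round(p_product_form, 4))
```

Integrating the density directly and multiplying as in the example give the same number, and both are close to the 0.1875 obtained from the rounded value 0.75.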
Our final family is the most famous, because it is the most generally applicable.
Definition
The normal distribution is sometimes called a bell curve. Many natural phenomena are normally
distributed. The formula is
\[ f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
The anti-derivative of this density function cannot be expressed with functions that we can evaluate.
Instead we can look up values in a table. The normal distribution has a special role in statistics:
The average of n independent identically distributed random variables with finite variance (for instance
performing the same experiment n times) is approximately normally distributed when n is large.
This theorem helps explain why many natural measurements are approximated by bell curves. For
example, human height is affected by hundreds of factors, including individual genes, nutrition and
environment. If we view human height as an average of these factors, scaled with appropriate units,
then we expect human heights to be modeled by a normal random variable. Viewing a histogram of
human height statistics shows the expected bell curve.
The parameters in fX can be interpreted as follows:
µ is the average value of X. It corresponds to the peak of the bell curve.
σ is the standard deviation of X. Larger σ means that X has a larger probability of being far
from µ.
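In place of a printed table, normal probabilities can be looked up in software. The sketch below uses `statistics.NormalDist` from the Python standard library; the mean of 68 and standard deviation of 3 are hypothetical values chosen only to illustrate the roles of µ and σ.

```python
from statistics import NormalDist

# Hypothetical parameters chosen only for illustration: mean 68, standard deviation 3.
heights = NormalDist(mu=68, sigma=3)

def prob_between(dist, low, high):
    """P(low <= X <= high), computed from the cumulative distribution function."""
    return dist.cdf(high) - dist.cdf(low)

within_one_sigma = prob_between(heights, 68 - 3, 68 + 3)
within_two_sigma = prob_between(heights, 68 - 6, 68 + 6)
print(round(within_one_sigma, 4), round(within_two_sigma, 4))
```

Roughly 68% of the probability lies within one standard deviation of µ and roughly 95% within two, whatever values of µ and σ are chosen.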
Question 2.6.4
What Is the Expected Value of a Random Variable?
Expected value will be the first statistic we can compute for a random variable. Statistics of a data
set tell us something about the numbers in the data set. Statistics of a random variable should tell us
something about the outcomes of the random variable.
The expected value or average value of X describes what the average result will be, if you
let X take a value at random many times. It is typically denoted E[X] or with the letter µ.
Example
Suppose we average our rolls of a six-sided die. As the number of rolls n gets large, we’ll roll each
number close to $\frac{n}{6}$ times. The sum of the rolls will be approximately
\[ 1\cdot\frac{n}{6} + 2\cdot\frac{n}{6} + 3\cdot\frac{n}{6} + 4\cdot\frac{n}{6} + 5\cdot\frac{n}{6} + 6\cdot\frac{n}{6} \]
In general, the number of occurrences of the result a in n evaluations of X will be approximately $n f_X(a)$.
When we divide out n, we obtain the following weighted average:
Formula
The expected value of a (discrete) random variable X with probability distribution function fX is
\[ E[X] = \sum_x x f_X(x) \]
To produce the corresponding formula for a continuous random variable, instead of multiplying
each outcome by its probability and summing, we multiply each outcome by its density and integrate.
Formula
The expected value of a continuous random variable X with probability density function fX is
\[ E[X] = \int_{-\infty}^\infty x f_X(x)\,dx \]
Example 2.6.5
The Expected Value of a Uniform Random Variable
Solution
We’ll apply the formula. Since fX (x) has discontinuities at a and b, we will break it into three parts.
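\[
\begin{aligned}
E[X] &= \int_{-\infty}^\infty x f_X(x)\,dx \\
&= \int_{-\infty}^a x(0)\,dx + \int_a^b x\,\frac{1}{b-a}\,dx + \int_b^\infty x(0)\,dx \\
&= \frac{1}{2(b-a)}x^2\,\Big|_a^b \\
&= \frac{1}{2(b-a)}b^2 - \frac{1}{2(b-a)}a^2 \\
&= \frac{b^2 - a^2}{2(b-a)} \\
&= \frac{(b-a)(b+a)}{2(b-a)} \\
&= \frac{b+a}{2}
\end{aligned}
\]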
Notice that this is the midpoint of the interval [a, b]. Since X is uniformly distributed across the interval,
we’d expect the average value to occur at the midpoint.
Main Ideas
E[X] typically occurs somewhere in the middle of the possible outcomes of X. With symmetric
density functions, it is the midpoint.
Example 2.6.6
The Expected Value of an Exponential Random Variable
Solution
a We will use the formula. Even after removing the region of 0 density, we are left with an improper
integral. We therefore will compute a limit.
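\[
\begin{aligned}
E[X] &= \int_{-\infty}^\infty x f_X(x)\,dx \\
&= \int_{-\infty}^0 x(0)\,dx + \int_0^\infty x\lambda e^{-\lambda x}\,dx \\
&= \lim_{t\to\infty}\int_0^t x\lambda e^{-\lambda x}\,dx && \text{by parts: } u = x,\ dv = \lambda e^{-\lambda x}\,dx,\ du = dx,\ v = -e^{-\lambda x} \\
&= \lim_{t\to\infty}\left( -xe^{-\lambda x}\,\Big|_0^t - \int_0^t -e^{-\lambda x}\,dx \right) \\
&= \lim_{t\to\infty}\left( -xe^{-\lambda x} - \frac{1}{\lambda}e^{-\lambda x} \right)\Big|_0^t \\
&= \lim_{t\to\infty}\left( -te^{-\lambda t} - \frac{1}{\lambda}e^{-\lambda t} + 0e^0 + \frac{1}{\lambda}e^0 \right) \\
&= \lim_{t\to\infty}\left( -te^{-\lambda t} - 0 + 0 + \frac{1}{\lambda} \right) \\
&= \frac{1}{\lambda} + \lim_{t\to\infty} -\frac{t}{e^{\lambda t}} && \tfrac{\infty}{\infty} \text{ form} \\
&= \frac{1}{\lambda} + \lim_{t\to\infty} -\frac{1}{\lambda e^{\lambda t}} && \text{(l’Hôpital’s rule)} \\
&= \frac{1}{\lambda} + 0
\end{aligned}
\]
b X measures the time until an event with average frequency λ occurs. Thus on average, we expect
to wait $\frac{1}{\lambda}$ for it. For example, if an event occurs three times per hour, we would expect to wait
about 20 minutes for it to occur.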
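A simulation gives an independent check on the value 1/λ. The sketch below draws many exponential waiting times with `random.expovariate` from the Python standard library and averages them; the sample size, the seed, and the function name `average_wait` are arbitrary illustrative choices.

```python
import random

def average_wait(rate, samples=100_000, seed=0):
    """Average of simulated exponential waiting times with the given rate (lambda)."""
    rng = random.Random(seed)
    return sum(rng.expovariate(rate) for _ in range(samples)) / samples

rate_per_hour = 3  # three events per hour, as in part b
simulated = average_wait(rate_per_hour)
print(f"simulated mean wait: {simulated * 60:.1f} minutes; 1/lambda = {60 / rate_per_hour:.1f} minutes")
```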
Main Idea
For asymmetric density functions, E[X] will not be in the middle of the range of values. It will be pulled
toward regions of higher likelihood.
Synthesis 2.6.7
Median Wait Time
Suppose that an exponential random variable models the wait time of a random caller to a call
center.
b Explain graphically why the median wait time is less than the expected wait time.
Solution
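a The median is the number m such that half the outcomes are larger than m and half are smaller.
We can write this as the following equation and solve for m.
\[
\begin{aligned}
P(X \le m) &= 0.5 \\
\int_{-\infty}^m f_X(x)\,dx &= 0.5 \\
\int_{-\infty}^0 f_X(x)\,dx + \int_0^m f_X(x)\,dx &= 0.5 && \text{(presumably } m > 0\text{)} \\
\int_{-\infty}^0 0\,dx + \int_0^m \lambda e^{-\lambda x}\,dx &= 0.5 \\
-e^{-\lambda x}\,\Big|_0^m &= 0.5 \\
-e^{-\lambda m} + e^0 &= 0.5 \\
-e^{-\lambda m} &= -0.5 \\
-\lambda m &= \ln 0.5 \\
m &= \frac{1}{\lambda}\ln 2
\end{aligned}
\]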
b The median is the point such that half the area under y = fX (x) lies on either side. The expected
value is weighted. A few outcomes far to one side can balance many outcomes slightly to the
other side. The outcomes of X extend to ∞ on the right but only to 0 on the left. These distant
outcomes pull the average to the right, but their distant position has no effect on the median.
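The relationship between the median and the expected value can be checked numerically. This is a small sketch in which the rate of 3 is a hypothetical choice and `exponential_cdf` is an illustrative helper implementing P(X ≤ x) = 1 − e^(−λx).

```python
import math

def exponential_cdf(x, rate):
    """P(X <= x) for an exponential random variable with the given rate (lambda)."""
    return 1 - math.exp(-rate * x) if x >= 0 else 0.0

rate = 3.0                     # hypothetical rate, chosen only for illustration
median = math.log(2) / rate    # m = (1/lambda) ln 2 from part a
mean = 1 / rate                # expected value of an exponential random variable
print(f"median = {median:.4f}, mean = {mean:.4f}, P(X <= median) = {exponential_cdf(median, rate):.4f}")
```

Because ln 2 is less than 1, the median falls below the mean for every positive rate.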
Main Idea
The median is the value m such that half the area under y = fX (x) lies on either side of x = m.
We compute the median by setting P (X ≤ m) = 0.5 and solving for m.
Median is not the same as expected value. y = fX (x) may have more area on one side of E[X]
than the other, if the smaller side’s area is farther from the middle.
Section 2.6
Exercises
Summary Questions
Q1 Describe the difference between a continuous random variable and a non-continuous (discrete)
one.
2.6.1
Q6 One of the following probability questions is different from the others. Explain why.
i. If you spin a prize wheel 3 times, what is the probability that your winnings add up to exactly
$80?
ii. If you flip two weighted (unfair) coins, what is the probability that exactly one of them comes
up tails?
iii. If you pick a random person, what is the probability that her height is exactly 68 inches?
iv. If I spin a wheel of names, what is the probability that it takes exactly 7 spins to land on my
own name?
Q9 Let $f_T(t)$ be a probability density function of a random variable T. What quantity is represented
by $\int_{-\infty}^5 f_T(t)\,dt$?
Q10 Let $f_X(x)$ be a probability density function of a random variable X. What quantity is represented
by $\int_2^\infty f_X(x)\,dx$?
Q11 Given a density function $f_U(u)$ for a random variable U, write an integral or integrals to compute
$P(4 \le U^2 \le 9)$.
Q12 Suppose the height of a mature sunflower is given by the random variable H with density function
$f_H(h)$. If your friend tells you that her sunflower is in the top quintile in height, explain how you
could use fH to determine a range that the height of her sunflower must lie in.
2.6.2
Compute $P(2 \le W \le 9)$
Compute $P(0 \le T \le \frac{1}{4})$
2.6.3
Q15 If U is a uniform random variable on [4, 7.5], compute the probability that U ≤ 5.5.
Q17 If W is an exponential random variable such that $P(W \ge 1) = \frac{2}{7}$, then compute the value of the
parameter λ in its density function $f_W$.
Q18 Juan looks at the density function of an exponential random variable X and says “X is more
likely to have the value 1 than 5.” “That’s silly,” replies Neha, “X has exactly zero probability
of being either of those. They are equally likely.” What do you think of their argument?
2.6.4
Q19 Let
\[ f(x) = \begin{cases} bx^{-3} & x \ge 2 \\ 0 & x < 2 \end{cases}. \]
Q20 Suppose X is a random variable with density function fX (x). Suppose fX (x) is 0 outside [3, 11]
a In a sentence or two, state what you would need to check to ensure that fX (x) is a valid
probability density function. You do not need to actually perform the calculations.
b Compute E[X].
Q22 Explain how you can use the graph of a normal random variable to identify the expected value.
Then compute that value using the expected value formula.
2.6.5
Q23 Give the expected value of a uniform random variable on [5.2, 9.4].
Q24 If the uniform random variable on [a, b] has expected value 7, and a = 3, what is b?
Q26 If you know the expected value µ of a uniform random variable X, what is the probability that
X ≥ µ? Is this problem answerable without the assumption that X is uniform? Explain.
2.6.6
Q27 Suppose X and Y are two different exponential random variables modeling events that occur on
average p and 2p times per day respectively. How are their expected values related?
Q28 Does our expected value formula make sense if λ < 0? Why should this not bother us?
a Write a probability density function for X, the amount of time until the next bus arrives.
b What is the expected amount of time until the next bus comes?
c How likely is it that you will wait more than an hour for the bus?
2.6.7
Q31 Compute the median value of a uniform random variable on [a, b].
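\[ f_W(w) = \begin{cases} \frac{36 - w^2}{144} & \text{if } 0 \le w \le 6 \\ 0 & \text{otherwise} \end{cases} \]
\[ f_T(t) = \begin{cases} \frac{3\sqrt{t}}{2} & \text{if } 0 \le t \le 1 \\ 0 & \text{otherwise} \end{cases} \]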
Q34 Examine the graph of the density function of a normal random variable X. What is the median
of X? Explain how you can see this in the graph.
Q35 Suppose X is a uniform random variable on [a, b] and $P(3 \le X \le 4) = \frac{1}{2}$. Describe all possible
values of a and b.
\[ f_W(w) = \begin{cases} k(7 - w) & \text{if } 1 \le w \le 7 \\ 0 & \text{if } w > 7 \text{ or } w < 1 \end{cases} \]
b What can you say about which values of W are more likely than others?
d What is the average value of W ?
e Can you compute the median value of W ? This might be easier with geometry than with
calculus.
Q37 Suppose that g(x) is a probability distribution for a random variable X and g(x) = 0 for all
x ≥ 0.
a What is the value of $\int_{-\infty}^0 g(x)\,dx$? Justify your answer with a sentence or computation.
−∞
b Give a formula for E[X]. Is it positive or negative? Justify your answer in a sentence or two.
Q38 Recall that an even function f (x) has the property that f (x) = f (−x) for all x. If the density
function of a random variable is even, what does that say about the expected value and median
of X? Explain your answer.
Section 2.7
Sometimes the quantity modeled by a random variable is not the quantity we actually care about. For
example, while we might have a model for how many people will contract a disease, what we actually
would like to predict is how many healthcare resources they will require. The number of patients
determines the required resources, so mathematically, resources is a function of patients. Expected
values of such functions turn out to be straightforward to compute. A natural way to generate statistics
about a random variable is to write a function that measures something interesting and compute its
expected value.
Question 2.7.1
What Is a Function of a Random Variable?
When we write a function g(X) of a random variable X, then the output Y of this function is itself
a random variable. These functions are most intuitive with a discrete random variable. In this case we
can compute Y ’s probability distribution function by applying g to each outcome of X and summing
the probabilities that produce each output.
Example
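Let X be a discrete random variable with probability distribution function $f_X(x)$. If $Y = g(X) = X^2$
then Y is a random variable and we can compute its probability distribution function $f_Y(y)$.
\[
f_X(x) = \begin{cases} 0.1 & \text{if } x = 0 \\ 0.2 & \text{if } x = 2 \\ 0.3 & \text{if } x = 3 \\ 0.4 & \text{if } x = -2 \\ 0 & \text{otherwise} \end{cases}
\qquad
f_Y(y) = \begin{cases} 0.1 & \text{if } y = 0 \\ 0.6 & \text{if } y = 4 \\ 0.3 & \text{if } y = 9 \\ 0 & \text{otherwise} \end{cases}
\]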
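The bookkeeping in this example, applying g to each outcome and adding the probabilities that land on the same output, can be sketched in a few lines. The function name `distribution_of` and the dictionary representation of a distribution are illustrative assumptions; exact fractions are used to avoid rounding.

```python
from collections import defaultdict
from fractions import Fraction

def distribution_of(g, f_x):
    """Distribution of Y = g(X): add up the probabilities of every x with the same g(x)."""
    f_y = defaultdict(Fraction)
    for x, probability in f_x.items():
        f_y[g(x)] += probability
    return dict(f_y)

f_X = {0: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10), -2: Fraction(4, 10)}
f_Y = distribution_of(lambda x: x**2, f_X)
print(f_Y)
```

The outcomes 2 and −2 both map to 4, which is why their probabilities 0.2 and 0.4 combine into 0.6.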
Example
Let X be a discrete random variable whose outputs are integers from 1 to 100, uniformly distributed
(meaning each occurs with probability $\frac{1}{100}$). Let N give the number of digits of X. Then N has
distribution function
\[ f_N(n) = \begin{cases} \frac{9}{100} & \text{if } n = 1 \\ \frac{90}{100} & \text{if } n = 2 \\ \frac{1}{100} & \text{if } n = 3 \\ 0 & \text{otherwise} \end{cases} \]
Question 2.7.2
How Do We Compute Expected Value of a Function?
In the case of a discrete random variable, we can compute expected value directly from the distribution
function.
Example
Let X be a discrete random variable whose outputs are integers from 1 to 100, uniformly distributed.
Let N give the number of digits of X.
\[ E[N] = \frac{9}{100}(1) + \frac{90}{100}(2) + \frac{1}{100}(3) = 1.92 \]
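Alternately, we could avoid using $f_N$ by directly applying the digits function to each outcome of X and
taking a weighted average.
Example
\[
\begin{aligned}
E[N] &= \underbrace{\frac{1}{100}(1) + \cdots + \frac{1}{100}(1)}_{9 \text{ times}} \\
&\quad + \underbrace{\frac{1}{100}(2) + \cdots + \frac{1}{100}(2)}_{90 \text{ times}} \\
&\quad + \frac{1}{100}(3) \\
&= 1.92
\end{aligned}
\]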
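Both routes to E[N] can be carried out mechanically. In this sketch the helper `digits` and the variable names are illustrative; the first method builds the distribution of N and then averages, and the second averages the digit counts over the outcomes of X directly.

```python
from fractions import Fraction

outcomes = range(1, 101)          # X is uniform on the integers 1..100
weight = Fraction(1, 100)         # probability of each outcome

def digits(x):
    return len(str(x))

# Method 1: build the distribution of N first, then average its values.
f_N = {}
for x in outcomes:
    f_N[digits(x)] = f_N.get(digits(x), 0) + weight
via_distribution = sum(n * p for n, p in f_N.items())

# Method 2: apply digits() to each outcome of X and take the weighted average directly.
direct = sum(digits(x) * weight for x in outcomes)
print(float(via_distribution), float(direct))
```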
In general this gives us two ways to compute the expected value of a function.
Formulas
Remarks
Both formulas will get us to the answer, but one of them skips the step of finding a distribution
function for Y .
In the case of a continuous random variable X, we might find it difficult to find the expected value
of Y = g(X) directly. We would need to
Find a density function $f_Y(y)$ such that
\[ \int_a^b f_Y(y)\,dy = P(a \le g(X) \le b) \]
The first step is difficult for any but the simplest functions.
Fortunately, there is an integration analogue of the substitution and distributive argument for discrete
variables. This allows us to compute the average outcome of Y as an average of g(x) weighted by the
density of X.
Theorem
If Y = g(X) is a function of a continuous random variable X with density function $f_X(x)$, then
\[ E[Y] = \int_{-\infty}^\infty g(x) f_X(x)\,dx \]
Notice that the expected value of X is a special case of this theorem. In this case, we are computing
the expected value of the function g(X) = X.
Example 2.7.3
Computing the Expected Value of a Function
Solution
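\[
\begin{aligned}
E[e^X] &= \int_{-\infty}^\infty e^x f_X(x)\,dx \\
&= \int_0^3 \frac{1}{9}x^2 e^x\,dx && \text{by parts: } u = \tfrac{1}{9}x^2,\ dv = e^x\,dx,\ du = \tfrac{2}{9}x\,dx,\ v = e^x \\
&= \frac{1}{9}x^2 e^x\,\Big|_0^3 - \int_0^3 \frac{2}{9}xe^x\,dx && \text{by parts again: } u = \tfrac{2}{9}x,\ dv = e^x\,dx,\ du = \tfrac{2}{9}\,dx,\ v = e^x \\
&= \frac{1}{9}x^2 e^x\,\Big|_0^3 - \frac{2}{9}xe^x\,\Big|_0^3 + \int_0^3 \frac{2}{9}e^x\,dx \\
&= \left( \frac{1}{9}x^2 e^x - \frac{2}{9}xe^x + \frac{2}{9}e^x \right)\Big|_0^3 \\
&= \frac{5e^3 - 2}{9}
\end{aligned}
\]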
We can check whether our answer is reasonable. Since X has outcomes between 0 and 3, $e^X$ should
have outcomes between 1 and $e^3$. Our expected value should also fall in that range, and it does.
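The closed form can also be compared with a direct numerical integration of g(x) f_X(x). This sketch assumes the density x²/9 on [0, 3] that appears in the solution's integrand; the midpoint helper and the rectangle count are illustrative choices.

```python
import math

def midpoint_sum(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))

def weighted_integrand(x):
    """g(x) * f_X(x) with g(x) = e^x and the assumed density f_X(x) = x^2 / 9 on [0, 3]."""
    return math.exp(x) * x**2 / 9

numeric = midpoint_sum(weighted_integrand, 0, 3)
closed_form = (5 * math.exp(3) - 2) / 9
print(f"{numeric:.6f} {closed_form:.6f}")
```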
Application 2.7.4
The Average Value of a Function
Sometimes people refer to the average value of a function without any reference to a random variable.
In this case, we understand the input variable to be uniformly distributed.
Definition
The average value of a function from x = a to x = b is the expected value of f(X), where X is
a uniform random variable on [a, b]. The density function is a constant, so we can factor it out of the
integral. We obtain the formula:
\[ f_{\text{ave}} = \frac{1}{b-a}\int_a^b f(x)\,dx. \]
The number $f_{\text{ave}}$ has geometric significance as well. The signed area under the graph y = f(x) from
x = a to x = b is
\[ \text{Area} = \int_a^b f(x)\,dx. \]
The region under the horizontal line $y = f_{\text{ave}}$ is a rectangle with equal signed area:
\[ \text{Area} = \text{width} \times \text{height} = (b-a)\left( \frac{1}{b-a}\int_a^b f(x)\,dx \right). \]
In other words, if we flattened the area under f into a rectangle, $f_{\text{ave}}$ would be its height.
Example 2.7.5
Computing The Average Value of a Function
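Compute the average value of $f(x) = xe^{x^2}$ between x = 1 and x = 3.
Solution
\[
\begin{aligned}
f_{\text{ave}} &= \frac{1}{3-1}\int_1^3 xe^{x^2}\,dx && \text{u-substitution: } u = x^2,\ du = 2x\,dx,\ x = 1 \Rightarrow u = 1,\ x = 3 \Rightarrow u = 9 \\
&= \frac{1}{2}\int_1^9 \frac{1}{2}e^u\,du \\
&= \frac{1}{4}e^u\,\Big|_1^9 \\
&= \frac{1}{4}(e^9 - e)
\end{aligned}
\]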
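A numerical average gives a quick check on this value. The sketch below approximates the integral with midpoint rectangles and divides by the interval length; the function name `average_value` and the rectangle count are illustrative assumptions.

```python
import math

def average_value(f, a, b, n=200_000):
    """Approximate (1/(b-a)) * integral of f over [a, b] using midpoint rectangles."""
    width = (b - a) / n
    total = width * sum(f(a + (i + 0.5) * width) for i in range(n))
    return total / (b - a)

numeric = average_value(lambda x: x * math.exp(x**2), 1, 3)
closed_form = (math.exp(9) - math.e) / 4
print(f"{numeric:.3f} {closed_form:.3f}")
```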
Application 2.7.6
Variance
Suppose we wanted to plan ahead for the outcome of some random variable X. We might choose
to prepare for the circumstance in which X takes on the value E[X]. This is most likely to be a good
bet, but how much effort should we expend preparing for outcomes far from E[X]? It would help to
know how likely X is to be far from E[X]. We can model this with a distance function (actually we’ll
use distance squared) and compute the expected value of the distance function.
Definition
The variance of a random variable X is the expected value of $(X - E[X])^2$. If X is continuous with
density function $f_X(x)$, we obtain the formula
\[ \int_{-\infty}^\infty (x - E[X])^2 f_X(x)\,dx \]
The square root of variance is the standard deviation. Standard deviation is often denoted by σ, and
variance is often denoted by $\sigma^2$.
If the expected value of $(X - E[X])^2$ is larger, then X is more likely to be far from its expected
value.
Figure: A density function with less variance and a density function with more variance
For example, we can compute the variance of X where X is a uniform random variable on [0, 8].
Solution
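Variance is the expected value of $(X − E[X])^2$, so first we need to know the number E[X]. We showed
earlier that for a uniform random variable, E[X] is the midpoint of the interval. In this case that is
$\frac{8+0}{2} = 4$. Armed with this value, we can compute the variance.
\begin{align*}
E\left[(X-4)^2\right] &= \int_{-\infty}^{\infty} (x-4)^2 f_X(x)\,dx \\
&= \int_0^8 (x-4)^2 \frac{1}{8-0}\,dx && \text{because } f_X(x) = 0 \text{ outside } [0, 8] \\
&= \frac{1}{8} \int_0^8 \left(x^2 - 8x + 16\right) dx && \text{factor out } \tfrac{1}{8} \\
&= \frac{1}{8} \left[ \frac{x^3}{3} - 4x^2 + 16x \right]_0^8 \\
&= \frac{1}{8} \left( \frac{512}{3} - (4)(64) + (16)(8) - 0 + 0 - 0 \right) \\
&= \frac{1}{8} \cdot \frac{128}{3} \\
&= \frac{16}{3}
\end{align*}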
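The same computation can be approximated numerically, which is a useful check when the algebra is longer. The following is a minimal sketch: `expected_value` is a hypothetical helper that applies a midpoint Riemann sum to $g(x) f_X(x)$, and the density is the uniform density on [0, 8] from this example.

```python
def expected_value(g, density, a, b, n=100_000):
    """Approximate E[g(X)], the integral of g(x) * density(x) over [a, b] (midpoint rule)."""
    width = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * width
        total += g(x) * density(x)
    return total * width

uniform_density = lambda x: 1 / 8          # density of a uniform variable on [0, 8]
mean = expected_value(lambda x: x, uniform_density, 0.0, 8.0)
variance = expected_value(lambda x: (x - mean) ** 2, uniform_density, 0.0, 8.0)
print(mean, variance)  # compare with 4 and 16/3
```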
Remarks
In order to solve for variance, we need to know the expected value. We may have to compute
\[ E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx. \]
Variance is larger when the area under y = fX (x) is spread farther to both sides, away from E[X].
Section 2.7
Exercises
Summary Questions
Q3 If someone mentions the “average value” of a function without mentioning what random variable
to use, what do you assume?
2.7.1
Q5 Let X be a random variable that indicates how long from now an event will occur (in hours).
How could a random variable indicating how long until the event happens in minutes be defined
in terms of X?
Q6 Suppose the radius of a circle R is a random variable. How could we define a random variable to
express the area of the circle?
Q7 Dominic buys 200 shares of a stock for $60 each. At the end of the day, the stock is worth $V
per share, where V is a random variable. How could you express Dominic’s profit or loss from his
stock purchase with a random variable?
Q8 Suppose X is a random variable with outcomes in the range [2, 7]. What is the range of outcomes
of the random variable $Y = \frac{3}{X^2}$?
2.7.2
Q9 Suppose X is a random variable and Y = cX for some number c. Explain using one or more
Q10 Suppose X is a random variable and Y = X + d for some number d. Explain using one or more
Q11 Let X be a uniform random variable on [2, 5] with density function fX . Write a density function
fY for Y = 10X. Explain how your density function differs from fX .
Q12 Let X be a uniform random variable on [0, 3]. Is $Y = X^2$ a uniform random variable on [0, 9]?
Provide evidence for your answer.
2.7.3
\[ f_W(w) = \begin{cases} \frac{36 - w^2}{144} & \text{if } 0 \le w \le 6 \\ 0 & \text{otherwise} \end{cases} \]
Compute $E\left[\frac{1}{W}\right]$.
( √
2 t
3 if 0 ≤ t ≤ 1
fT (t) =
0 otherwise
Compute $E[T^3]$.
Q16 Let g(x) = c be a constant function. Let X be a random variable. Compute E[g(X)].
2.7.4
Q17 Suppose that you are told that the average value of f (x) from x = a to x = b is 0.
a What geometric information does this give you about the graph y = f (x)? Be specific.
b Suppose you are told that f (x) is non-negative for all x. How does that affect your answer
to a ?
Q18 Suppose you know that $f(x) = \sqrt[3]{x}$ has a positive average value over [a, b]. What does this tell
you about a and b?
2.7.5
Q20 Compute the average value of g(x) = x sin x over [0, π].
Q22 What happens if we try to compute the average value of $h(x) = \frac{1}{x^2}$ over [−2, 2]?
2.7.6
Q23 Compute the variance of an exponential random variable X. Note that you may already know
some components of this computation from earlier examples and exercises.
\[ f_W(w) = \begin{cases} \frac{36 - w^2}{144} & \text{if } 0 \le w \le 6 \\ 0 & \text{otherwise} \end{cases} \]
Compute the variance of W . I’d suggest using a computer to help with the algebra.
( √
2 t
3 if 0 ≤ t ≤ 1
fT (t) =
0 otherwise
Q27 Let X be a random variable with density function fX . Let Y = cX for some number c. Write a
formula for $f_Y$.
Q28 Compute the value b such that the average value of $f(x) = x^2$ over [0, b] is 1.
Q29 Some people compute variance using the formula $σ^2 = E[X^2] − E[X]^2$. Explain why
this formula is equivalent to the one we gave. (This is a famous calculation, so if you can’t figure
it out, look it up and try to explain each step.)
Chapter 3
Series
This chapter introduces the Taylor polynomial, which is a useful tool for approximating functions that
cannot be evaluated with arithmetic. As with the derivative and integral before it, we would like to
send the error in these approximations to 0. This requires us to take a new kind of limit called a series.
We will develop the tools to work with series, with the ultimate goal of defining and utilizing Taylor
series.
Contents
3.1 Taylor Polynomials
3.2 Sequences
3.3 Series
3.4 Power Series
3.5 Taylor Series
Section 3.1
Taylor Polynomials
Goals:
Question 3.1.1
How Can We Improve on a Linearization?
Formula
Question 3.1.2
What Is a Taylor Polynomial?
A polynomial that mimics the first n derivatives of a function is called a Taylor polynomial. Here is
the formal definition.
Definition
The nth Taylor polynomial of f (x) at x = a is a degree n polynomial that shares the value and first
n derivatives of f at x = a. Its formula is
\[ T_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} (x - a)^k. \]
Remarks
Example 3.1.3
Computing a Taylor Polynomial
a Find the degree 3 Taylor polynomial of $y = \sqrt{x}$ at x = 4.
b Use it to estimate $\sqrt{5}$.
Solution
a We will apply the equation of the Taylor polynomial where a = 4 and n = 3. Examining the
formula shows we need to know the value and the first three derivatives of f (x) at a = 4.
\[ T_3(x) = \sum_{k=0}^{3} \frac{f^{(k)}(4)}{k!} (x - 4)^k \]
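b To approximate $\sqrt{5}$, notice $\sqrt{5} = f(5)$ and $f(5) \approx T_3(5)$. The constant term of $T_3$ is $f(4) = \sqrt{4} = 2$.
\begin{align*}
T_3(5) &= 2 + \frac{1}{4}(5 - 4) - \frac{1}{64}(5 - 4)^2 + \frac{1}{512}(5 - 4)^3 \\
&= 2 + \frac{1}{4}(1) - \frac{1}{64}(1) + \frac{1}{512}(1) \\
&= \frac{1024}{512} + \frac{128}{512} - \frac{8}{512} + \frac{1}{512} \\
&= \frac{1145}{512}
\end{align*}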
Example 3.1.4
Writing a Sum in Σ Notation
As our Taylor polynomials get longer, we would like to condense them into Σ notation. Part of
the challenge is choosing an expression that will produce all the terms of our sum. Write each of the
following sums in Σ notation.
a 4 + 7 + 10 + 13 + 16 + 19 + 22
b 2 + 6 + 18 + 54 + 162 + 486
c −3 + 4 − 5 + 6 − 7 + 8 − 9 + 10
d $\frac{1}{4} + \frac{\sqrt{2}}{9} + \frac{\sqrt{3}}{16} + \frac{2}{25} + \frac{\sqrt{5}}{36}$
Solution
a The terms increase by 3 each time. Repeated addition is multiplication, in this case 3k plus some
starting value. Starting with index k = 0 is convenient, because 3(0) = 0 at the starting value.
\[ 4 + 7 + 10 + 13 + 16 + 19 + 22 = \sum_{k=0}^{6} (4 + 3k) \]
b The terms are multiplied by 3 each time. Repeated multiplication is exponentiation, in this case
$3^k$ times some starting value. Starting with index k = 0 is convenient, because $3^0 = 1$ at the
starting value.
\[ 2 + 6 + 18 + 54 + 162 + 486 = \sum_{k=0}^{5} (2)(3^k) \]
c The absolute values of this sum could just be the values of the index variable. To create an
\[ \frac{1}{4} + \frac{\sqrt{2}}{9} + \frac{\sqrt{3}}{16} + \frac{2}{25} + \frac{\sqrt{5}}{36} = \sum_{k=1}^{5} \frac{\sqrt{k}}{(k + 1)^2} \]
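A quick way to test a proposed Σ expression is to expand it and compare with the original sum. The sketch below does this for parts a, b and d; the variable names are illustrative.

```python
import math

# Each pair is (sum written out term by term, the same sum built from its Σ formula).
a_terms = 4 + 7 + 10 + 13 + 16 + 19 + 22
a_sigma = sum(4 + 3 * k for k in range(0, 7))          # k = 0, ..., 6

b_terms = 2 + 6 + 18 + 54 + 162 + 486
b_sigma = sum(2 * 3 ** k for k in range(0, 6))         # k = 0, ..., 5

d_terms = 1/4 + math.sqrt(2)/9 + math.sqrt(3)/16 + 2/25 + math.sqrt(5)/36
d_sigma = sum(math.sqrt(k) / (k + 1) ** 2 for k in range(1, 6))  # k = 1, ..., 5

print(a_terms == a_sigma, b_terms == b_sigma, math.isclose(d_terms, d_sigma))
```

Note that `range(0, 7)` stops at 6, which matches the upper index of the sum in part a.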
Example 3.1.5
A Taylor Polynomial in Σ Notation
Write the 10th degree Taylor Polynomial for $f(x) = \frac{1}{x^2}$ centered at x = 3.
Solution
Computing 10 derivatives seems excessive, so we will compute 4 and try to find a pattern. We’ll write
$f(x) = x^{-2}$ and apply the power rule.
\begin{align*}
f(x) &= x^{-2} \\
f'(x) &= -2x^{-3} \\
f''(x) &= 6x^{-4}
\end{align*}
We observe
The sign of these derivatives is alternating, which we can model with a $(-1)^k$.
The coefficients look like a factorial pattern, but offset. For example when k = 2 we obtain 3!.
We model this with (k + 1)!.
The exponent of x decreases by the same amount each step. We model it with −2 − k.
This suggests a general formula for the kth derivative.
\[ f^{(k)}(x) = (-1)^k (k + 1)!\, x^{-2-k} \]
\[ T_{10}(x) = \sum_{k=0}^{10} \frac{(-1)^k (k + 1)!\, 3^{-2-k}}{k!} (x - 3)^k \]
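One way to gain confidence in a pattern-based formula like this is to evaluate it near the center and compare with the function itself. The sketch below assumes $f(x) = x^{-2}$ as in the solution; the function name `t10` and the test point 3.2 are illustrative choices.

```python
def t10(x):
    """Evaluate the degree-10 Taylor polynomial of 1/x^2 centered at 3.

    Uses the simplification (k + 1)!/k! = k + 1 in each coefficient.
    """
    return sum((-1) ** k * (k + 1) * 3.0 ** (-2 - k) * (x - 3) ** k for k in range(11))

x = 3.2
print(t10(x), 1 / x ** 2)  # the two values should be very close near the center
```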
Question 3.1.6
How Accurate Is the Taylor Polynomial?
An approximation is much more useful if we can put a bound on its error. We will present an error
bound theorem called “Taylor’s Inequality.” Taylor polynomials are effective approximations because
they try to match the values and rates of change of the original function. In order to make a careful
argument, we begin with the basic principle that we can compare functions using the values of their
derivatives.
Theorem
Let f and g be differentiable functions. Consider an interval [a, b], and suppose f (a) = g(a).
1 If f ′ (x) = g ′ (x) on [a, b], then f (x) = g(x) on [a, b]
2 If f ′ (x) < g ′ (x) on [a, b], then f (x) < g(x) on (a, b]
Reasoning
Intuitive If two functions start at the same value at a, then the one that grows faster will have a higher
value at b.
Formal The Fundamental Theorem of Calculus says
\[ f(x) - f(a) = \int_a^x f'(t)\,dt \qquad g(x) - g(a) = \int_a^x g'(t)\,dt. \]
Figure: Two functions with a common value at a: f (x) with a smaller derivative and g(x) with a
larger derivative.
Notation
Given a function f (x) and its nth Taylor polynomial Tn (x) centered at a, the remainder at x is
\[ R_n(x) = f(x) - T_n(x). \]
We should be very interested in knowing the value of Rn (x). We will use our derivative comparison
theorem to make two arguments
Theorem
\[ f(x) = T_{n+1}(x) = T_n(x) + \frac{M}{(n + 1)!}(x - a)^{n+1}. \]
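Beginning with our assumption about the (n+1)th derivatives and the equality of the nth derivatives
at a, we can use our derivative comparison theorem to equate the nth derivatives on [a, b]. We can use
that equality to equate the (n − 1)th derivatives on [a, b]. We continue this reasoning until we conclude
that the functions are equal.
\[ \frac{d^{n+1}}{dx^{n+1}} f(x) = \frac{d^{n+1}}{dx^{n+1}} T_{n+1}(x) = M \text{ on } [a, b] \]
Because derivatives and values of a Taylor polynomial match the function, each equality at a on the left leads to an equality on [a, b] on the right:
\begin{align*}
\frac{d^{n}}{dx^{n}} f(a) &= \frac{d^{n}}{dx^{n}} T_{n+1}(a) & \frac{d^{n}}{dx^{n}} f(x) &= \frac{d^{n}}{dx^{n}} T_{n+1}(x) \text{ on } [a, b] \\
\frac{d^{n-1}}{dx^{n-1}} f(a) &= \frac{d^{n-1}}{dx^{n-1}} T_{n+1}(a) & \frac{d^{n-1}}{dx^{n-1}} f(x) &= \frac{d^{n-1}}{dx^{n-1}} T_{n+1}(x) \text{ on } [a, b] \\
&\;\;\vdots & &\;\;\vdots \\
\frac{d}{dx} f(a) &= \frac{d}{dx} T_{n+1}(a) & \frac{d}{dx} f(x) &= \frac{d}{dx} T_{n+1}(x) \text{ on } [a, b]
\end{align*}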
Remark
This theorem tells us that when $f^{(n+1)}(x)$ is a constant M, $R_n(x) = f(x) - T_n(x) = \frac{M}{(n+1)!}(x - a)^{n+1}$.
But what if $f^{(n+1)}(x)$ is not a constant? In this case we will settle for a bound on $f^{(n+1)}(x)$.
Theorem [Taylor’s Inequality]
If $|f^{(n+1)}(t)| \le M$ for all t between a and b, then for all x between a and b,
\[ |R_n(x)| \le \frac{M}{(n + 1)!} |x - a|^{n+1} \]
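To prove Taylor’s Inequality, we compare the derivatives of f (x) with the worst-case scenario
\[ w(x) = T_n(x) + \frac{M}{(n + 1)!}(x - a)^{n+1}. \]
The derivatives $w^{(k)}(a)$ are the same as $T_n^{(k)}(a)$ and $f^{(k)}(a)$ for 0 ≤ k ≤ n,
and $\frac{d^{n+1}}{dx^{n+1}} w(x) = M$.
Because M is a bound on $\frac{d^{n+1}}{dx^{n+1}} f(x)$,
\[ \frac{d^{n+1}}{dx^{n+1}} f(x) \le \frac{d^{n+1}}{dx^{n+1}} w(x) = M \text{ on } [a, b] \]
Because derivatives and values of a Taylor polynomial match the function, each equality at a on the left leads to an inequality on [a, b] on the right:
\begin{align*}
\frac{d^{n}}{dx^{n}} f(a) &= \frac{d^{n}}{dx^{n}} w(a) & \frac{d^{n}}{dx^{n}} f(x) &\le \frac{d^{n}}{dx^{n}} w(x) \text{ on } [a, b] \\
\frac{d^{n-1}}{dx^{n-1}} f(a) &= \frac{d^{n-1}}{dx^{n-1}} w(a) & \frac{d^{n-1}}{dx^{n-1}} f(x) &\le \frac{d^{n-1}}{dx^{n-1}} w(x) \text{ on } [a, b] \\
&\;\;\vdots & &\;\;\vdots \\
\frac{d}{dx} f(a) &= \frac{d}{dx} w(a) & \frac{d}{dx} f(x) &\le \frac{d}{dx} w(x) \text{ on } [a, b]
\end{align*}
\[ T_n(x) - \frac{M}{(n + 1)!}(x - a)^{n+1} \le f(x) \le T_n(x) + \frac{M}{(n + 1)!}(x - a)^{n+1} \]
\[ -\frac{M}{(n + 1)!}(x - a)^{n+1} \le R_n(x) \le \frac{M}{(n + 1)!}(x - a)^{n+1} \]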
3 Repeat for intervals of the form [b, a]. These work the same way with a sign reversed.
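Taylor’s Inequality can be illustrated numerically on a function whose derivatives are easy to bound. The sketch below uses $f(x) = e^x$ centered at a = 0 on the interval [0, 1] with n = 4. This function and these values are chosen here for illustration and do not come from the text. Every derivative of $e^x$ is $e^x$, which is at most e on [0, 1], so M = e is a valid bound.

```python
import math

def taylor_exp(n, x):
    """nth Taylor polynomial of e^x centered at 0, evaluated at x."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

n, a, x = 4, 0.0, 1.0
M = math.e                                   # e^t <= e for all t in [0, 1]
bound = M / math.factorial(n + 1) * (x - a) ** (n + 1)
actual = abs(math.exp(x) - taylor_exp(n, x))
print(actual, bound)  # the actual error should not exceed the bound
```

The bound is not the error itself. It is a guarantee, and the actual error is typically smaller.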
Example 3.1.7
A Taylor Approximation Error Bound
c What happens to the error bound as x increases but n stays the same?
d What happens to the error bound as n increases but x stays the same?
e What does this tell us about the relationship between the Tn (x) approximations and f (x)?
Solution
a For the Taylor polynomial formula, we need to compute the derivatives of f (x).
In order to write a general Taylor polynomial, we would need a general expression for f (k) (0). The
pattern is obvious, but trying to express it as a formula is much more difficult. The solution is a
trick worth remembering:
Since the even derivatives are zero, those terms do not appear in our Taylor polynomials. Since
we want to only have odd terms in our summation, we can let our index variable be k, but our
exponents in each term be 2k + 1. Thus as k goes from 0 to n, the summation will include only
the odd terms $x^1$ through $x^{2n+1}$. We can produce the following chart to work out our coefficients:
k    $f^{(2k+1)}(0)$
0    1
1    −1
2    1
3    −1
⋮    ⋮
This is an easier pattern to express:
\[ f^{(2k+1)}(0) = (-1)^k \]
Now we are ready to write a formula. Since we intend to sum from k = 0 to k = n, we are
actually producing the (2n + 1)th Taylor polynomial.
\begin{align*}
T_{2n+1}(x) &= \sum_{k=0}^{n} \frac{f^{(2k+1)}(0)}{(2k + 1)!} x^{2k+1} \\
&= \sum_{k=0}^{n} \frac{(-1)^k}{(2k + 1)!} x^{2k+1}
\end{align*}
These are the odd degree Taylor polynomials, but what about the even numbered ones? Since
$T_{2n}(x)$ is just $T_{2n-1}(x)$ plus the 2nth term, and the 2nth term is zero, we can write
\[ T_{2n}(x) = \sum_{k=0}^{n-1} \frac{(-1)^k}{(2k + 1)!} x^{2k+1} \]
b Given the chart above, we can see that the derivatives are sines and cosines. These are bounded
above by 1 and below by −1. Since Taylor’s inequality requires a bound of the form $|f^{(n+1)}(x)| \le M$, we write
\[ |f^{(n+1)}(x)| \le 1 \]
And luckily, this works for all x and all n.
c Taylor’s Inequality says that $|R_n(x)| \le \frac{1}{(n+1)!} |x|^{n+1}$. As x goes to ∞, this bound goes to ∞ as
well. This makes sense, since Tn (x) is a polynomial, while the function it is approximating stays
between −1 and 1.
d When n increases, $x^{n+1}$ increases by a factor of x. On the other hand, (n + 1)! increases by a
factor of n + 2. As n increases without bound, (n + 1)! grows faster than $x^{n+1}$ and their ratio
approaches 0.
e Any Tn (x) will eventually become inaccurate outside a certain distance from 0. On the other
hand, if we want to approximate sin(x) for a particular x, we can make Tn (x) have as small an
error as we want by choosing sufficiently large n.
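The behavior described in parts c through e can be observed directly. The following sketch builds the Taylor polynomials of sin x at 0 and compares the actual error with the bound $|x|^{n+1}/(n+1)!$ for a fixed x and increasing degree. The names `sin_taylor` and `error_bound` and the test point x = 5 are assumptions made for this illustration.

```python
import math

def sin_taylor(degree, x):
    """Taylor polynomial of sin x at 0, keeping the odd-power terms up to the given degree."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range((degree + 1) // 2))

def error_bound(degree, x):
    """Taylor's Inequality with M = 1: |R_n(x)| <= |x|^(n+1) / (n+1)!."""
    return abs(x) ** (degree + 1) / math.factorial(degree + 1)

x = 5.0
for degree in (5, 15, 25):
    actual = abs(math.sin(x) - sin_taylor(degree, x))
    print(degree, actual, error_bound(degree, x))
```

At x = 5 the low-degree polynomial is a poor approximation, but the bound and the error both shrink rapidly as the degree grows.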
Main Ideas
In order to understand how the error changes as n increases, we need to have an expression for
f (n) (x).
We can choose M to be the largest value of |f (n+1) | on the interval [a, x]. This may not be the
value of |f (n+1) (a)|.
In general, Taylor polynomials will become less accurate the farther you get from a.
We can often mitigate this inaccuracy by choosing larger values of n.
The (n + 1)! in Taylor’s Inequality might suggest that as n increases, the error in the nth Taylor
polynomial must shrink toward 0. However, this is not the case. Some functions are not well estimated
by their Taylor polynomial.
Example
\[ f(x) = \begin{cases} 0 & \text{if } x \le 0 \\ e^{-1/x} & \text{if } x > 0 \end{cases} \]
No matter how large n gets, Tn (x) will not get any closer to f (x) for any x > 0.
How can this happen, given Taylor’s Inequality? The derivatives of f get bigger and bigger. M
grows so fast that the error Rn (x) gets no smaller even with an (n + 1)! in the denominator of Taylor’s
Inequality.
Despite examples like this, it turns out that Taylor polynomials often do a good job of approximating
functions. For numerical computations, an approximation is good enough. For more theoretical situ-
ations, we would like to let n go to ∞ so that the error goes to 0 and we can use the polynomial as
an exact replacement of the function. Unfortunately, with infinitely many terms, we no longer have a
polynomial at all. Instead we have an object that we will call a Taylor series. We will develop the tools
to define and work with Taylor series over the course of this chapter.
Section 3.1
Exercises
Summary Questions
3.1.1
Q5 Suppose we use the linearization of $f(x) = \sqrt[3]{x}$ at x = 8 to approximate $\sqrt[3]{6}$.
a What is the relationship between $f(x) = \sqrt[3]{x}$ and $\sqrt[3]{6}$?
b Suppose L(x) is the linearization of f (x) at x = 8. Would you expect L(6) to overestimate
or underestimate $\sqrt[3]{6}$? Explain in a sentence or two.
Q6 Suppose you were locked in a room with only a pencil and paper and asked to compute the first
ten decimal places of the following numbers:
$\frac{4}{17}$    $\sqrt{7}$    $e$
3.1.2
Q8 Suppose T4 (x) is the Taylor polynomial for f (x) centered at x = 10. List what information T4 (x)
Q9 If f (x) is a decreasing function, what can you say about the coefficients of any Taylor polynomial
of f (x)?
\[ T_4(x) = 5 + 3(x - 2) - \frac{1}{6}(x - 2)^2 + 2(x - 2)^4 \]
a What is f (2)?
b Is f increasing or decreasing at x = 2?
3.1.3
c Can you use sigma notation to write a general form for the degree n Taylor polynomial of
$y = e^x$?
Q13 Write the 10th Taylor polynomial for f (x) = cos x centered at x = π.
Q14 Write the 4th Taylor polynomial for $f(x) = \frac{1}{x^2}$ centered at x = 5.
3.1.4
b 24 + 19 + 14 + 9 + 4 − 1 − 6
c $\frac{1}{8} + \frac{1}{18} + \frac{1}{50} + \frac{1}{72} + \frac{1}{98}$
a 11 − 13 + 15 − 17 + 19 − 21 + 23
b 384 + 192 + 96 + 48 + 24 + 12 + 6
c $\frac{2}{10} + \frac{3}{100} + \frac{4}{1000} + \frac{5}{10000}$
3.1.5
Q17 Write an expression in Σ notation for the 53rd Taylor polynomial of f (x) = ln x centered at
x = 1.
Q18 Write an expression in Σ notation for the 15th Taylor polynomial of $f(x) = e^x$ centered at x = 0.
Q19 Write an expression in Σ notation for the 100th Taylor polynomial of f (x) = cos x centered at
x = 0.
Q20 Write an expression in Σ notation for the 71st Taylor polynomial of $f(x) = \frac{1}{x^2}$ centered at x = 10.
3.1.6
Q21 Why don’t we have any theorems for a lower bound for error? Give your answer in a few sentences.
Q22 Suppose you are using Taylor polynomials of f (x) centered at x = 0 to approximate f (−3).
k!
However, for each k, the best bound you can put on f (k) (x) on [−3, 0] is 4k
. Will you be able
to guarantee a good approximation of f (−3) this way? Explain.
3
Q23 Suppose the fourth derivative of f (x) is f (4) (x) = ex . Suppose we have written T4 (x), the
degree 4 Taylor polynomial of f (x) centered at x = 1. What can you say about the difference
between T4 (5) and f (5)? Be specific and justify your answer with a computation. You do not
need to simplify any arithmetic in your calculations.
Q24 Sketch a graph of $y = e^x$ and several tangent lines. On which part of the graph do the tangent
lines appear to approximate the function better? Does Taylor’s Inequality confirm this observa-
tion? Explain.
3.1.7
Q25 Here is the degree 3 Taylor polynomial of $f(x) = \sqrt{x}$ centered at x = 4:
\[ T_3(x) = 2 + \frac{1}{4}(x - 4) - \frac{1}{64}(x - 4)^2 + \frac{1}{512}(x - 4)^3 \]
a Which derivative will let you bound the error of this approximation?
b Can you put a bound on this derivative that holds for all x?
c Can you put a bound on this derivative that holds for x in the interval [4, 5]?
d What error bound does this suggest for using T3 (5) to approximate $\sqrt{5}$?
Q26 Let $f(x) = \sqrt[3]{x}$.
b If you wanted to use the Taylor polynomial to approximate $\sqrt[3]{10}$, how would you do that?
c Explain the difficulties that would arise from this error bound, if your goal is to approximate
b How would you use that Taylor polynomial to approximate the value of $\cos\frac{3\pi}{4}$?
a Suppose you wanted to produce the second degree Taylor polynomial of f centered at a =
−1. Indicate whether the constant term and each coefficient would be positive or negative.
Provide evidence for your answer.
e Conjecture a general relationship between polynomial functions and certain Taylor polyno-
mials. Can you use Taylor’s inequality to justify your conjecture?
Section 3.2
Sequences
Goals:
1 Use notation to describe the terms of an infinite sequence.
2 Calculate the limit of an infinite sequence.
Sequences are the first step in our development of Taylor series. While they appear to have little in
common with polynomials of infinite degree, they are the scaffolding on which such objects are built.
Question 3.2.1
What Is a Sequence?
A sequence is an ordered set of numbers. If this set is infinite, we can most rigorously define it by
giving a general formula for the nth term for some index variable n. Here are three different notations
for the same sequence.
\[ \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \frac{4}{5}, \ldots \qquad a_n = \frac{n}{n+1} \qquad \left\{ \frac{n}{n+1} \right\}_{n=1}^{\infty} \]
Example
The first three terms of $\left\{ \frac{n^2}{2^n} \right\}_{n=0}^{\infty}$ are
\[ \frac{0^2}{2^0} = 0 \qquad \frac{1^2}{2^1} = \frac{1}{2} \qquad \frac{2^2}{2^2} = 1 \]
Question 3.2.2
What Is the Limit of a Sequence?
Definition
If we can make the elements of a sequence $a_n$ arbitrarily close to some number L by considering only n
above a certain number, then we write
\[ \lim_{n\to\infty} a_n = L \]
and we say the sequence converges to L. If $a_n$ does not converge to any such L then we say it
diverges.
Remarks
The first few or even the first thousand terms of a sequence have no bearing on the limit. We
only care that we can eventually get close to L.
“Arbitrarily close” means any level of closeness that anyone could ask for. Eventually the sequence
must be within $\frac{1}{100}$ of L, and $\frac{1}{1000}$ and $\frac{1}{1000000}$.
Example 3.2.3
Computing a Limit
Calculate $\lim_{n\to\infty} \frac{n}{n+1}$
\[ \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \frac{4}{5}, \ldots \]
Solution
Writing the first few terms suggests that this sequence approaches 1. To see that, we can measure the
distance to 1:
\[ 1 - a_n = 1 - \frac{n}{n+1} = \frac{1}{n+1} \]
We can make this smaller than any positive number. For instance to make $a_n$ within $\frac{1}{1000}$ of 1, we can
consider only n > 1000. We conclude $\lim_{n\to\infty} \frac{n}{n+1} = 1$.
Figure: The sequence $\frac{n}{n+1}$ converges to L = 1.
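The distance computation above can be checked with exact fractions. This is a small sketch; the function name `a` simply mirrors the notation $a_n$.

```python
from fractions import Fraction

def a(n):
    """nth term of the sequence a_n = n / (n + 1), kept as an exact fraction."""
    return Fraction(n, n + 1)

for n in (1, 10, 1000, 1001):
    print(n, a(n), 1 - a(n))   # the distance to 1 is exactly 1 / (n + 1)
```

For every n > 1000 the printed distance is smaller than 1/1000, which is what the definition of the limit asks for at that level of closeness.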
Question 3.2.4
How Are Limits of Sequences and Functions Related?
The definition of $\lim_{n\to\infty} a_n$ should look familiar. The definition of the limit of a function is similar.
In fact, the limit of f (x) as x → ∞ has a nearly identical construction, except that n must be an
integer, while x can be any real number. The following theorem lets us use that connection to evaluate
limits.
Theorem
Suppose for a sequence $a_n$, there is a function f (x) such that $f(n) = a_n$ for all n (or at least all n
sufficiently large). If
\[ \lim_{x\to\infty} f(x) = L \]
then
\[ \lim_{n\to\infty} a_n = L. \]
Example 3.2.5
Sequence Limits Using Functions
a $\lim_{n\to\infty} \frac{2n}{n+3}$
b $\lim_{n\to\infty} \frac{1}{n^3}$
c $\lim_{n\to\infty} e^{-n}$
d $\lim_{n\to\infty} \frac{n^2}{e^n}$
e $\lim_{n\to\infty} (-1)^n$
Solution
We will use x to denote a real number variable and n to denote natural numbers.
a $\lim_{x\to\infty} \frac{2x}{x+3} = 2$, so $\lim_{n\to\infty} \frac{2n}{n+3} = 2$.
b $\lim_{x\to\infty} \frac{1}{x^3} = 0$, so $\lim_{n\to\infty} \frac{1}{n^3} = 0$.
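d $\lim_{x\to\infty} \frac{x^2}{e^x}$ can be evaluated with L’Hôpital’s rule.
\begin{align*}
\lim_{x\to\infty} \frac{x^2}{e^x} &= \lim_{x\to\infty} \frac{2x}{e^x} && \tfrac{\infty}{\infty} \text{ form, L’Hôpital’s again} \\
&= \lim_{x\to\infty} \frac{2}{e^x} \\
&= 0
\end{align*}
so $\lim_{n\to\infty} \frac{n^2}{e^n} = 0$.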
e $f(x) = (-1)^x$ is not well defined for real numbers so we can’t use its limit. Instead examine the
sequence directly. The sequence has the form
\[ -1, 1, -1, 1, \ldots \]
This does not approach arbitrarily close to any number. No matter how many early terms we
disregard, there will always be terms remaining that are not close to 1, or not close to −1 or not
close to any other number. Thus $a_n = (-1)^n$ diverges.
The following limit laws for sequences should look familiar. They mirror the laws for limits of
functions.
If $\lim_{n\to\infty} a_n = K$ and $\lim_{n\to\infty} b_n = L$ then the following sequences converge with the following limits:
$\lim_{n\to\infty} (a_n + b_n) = K + L$
$\lim_{n\to\infty} (a_n - b_n) = K - L$
$\lim_{n\to\infty} (a_n b_n) = KL$
If $L \ne 0$, then $\lim_{n\to\infty} \frac{a_n}{b_n} = \frac{K}{L}$
Synthesis 3.2.6
Indeterminate Forms with Factorials
We will encounter sequences of the form $a_n = \frac{b_n}{c_n}$. If $b_n$ and $c_n$ both go to 0 or ±∞, then any attempt
to use
\[ \lim_{n\to\infty} a_n = \lim_{x\to\infty} f(x) \]
Dominance
We say f (x) dominates g(x) if $\lim_{x\to\infty} \frac{f(x)}{g(x)} = \pm\infty$. We write
Even if you include a constant multiple or add multiple functions together, the dominant function
will outgrow any combination of dominated ones. We have already established an order of dominance
using l’Hôpital’s rule:
But n! is not a differentiable function. We cannot analyze it using l’Hôpital’s rule. Where does it fit
in the dominance pecking order?
Theorem
As n → ∞, n! will eventually dominate any exponential function (and thus any polynomial, root or
logarithm).
We will not provide a formal proof, but here is a useful thought experiment. Suppose we compare
n! to $63^n$. At first $63^n$ grows faster, multiplying by 63 every time we increase n. However, when n is
greater than 63, n! is multiplying by a higher number. When n reaches one billion, $63^n$ increases by a
factor of 63 every step, while n! increases by a factor of 1,000,000,000. By this point n! is much larger
and growing much faster.
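The thought experiment can be carried out with exact integer arithmetic. The sketch below searches for the first n at which n! actually exceeds $63^n$; the function name is hypothetical. Since every factor of 63! is at most 63, the factorial has not yet caught up at n = 63, so the crossover comes later.

```python
import math

def first_n_where_factorial_wins(base):
    """Smallest n >= 1 with n! > base**n, found by direct search with exact integers."""
    n = 1
    while math.factorial(n) <= base ** n:
        n += 1
    return n

crossover = first_n_where_factorial_wins(63)
print(crossover)
```

After the crossover, each step multiplies n! by more than 63 while $63^n$ is only multiplied by 63, so the factorial stays ahead from then on.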
Section 3.2
Exercises
Summary Questions
Q4 If $a_n = b_n + 1000$ for 1 ≤ n ≤ 2000000, what does that tell us about the limits $\lim_{n\to\infty} a_n$ and
$\lim_{n\to\infty} b_n$?
3.2.1
Q5 Find a general expression for an , the nth term of the following sequences. Use this to write the
sequences using both other types of notation.
3.2.2
Q7 Show using the definition of the limit of a sequence that $\lim_{n\to\infty} \frac{\sin n}{n^2} = 0$.
2n − 1
Q8 Show using the definition of the limit of a sequence that lim = 1.
n→∞ 2n
Q9 A sequence is increasing if every term is larger than the previous term. Must an increasing
sequence always diverge? Explain.
Q10 A sequence is alternating if its terms alternate between positive and negative values. Is it possible
that the limit of an alternating sequence exists? What would its value have to be?
3.2.3
c Does the theorem equating limits of functions and sequences apply to this function?
3.2.4
log n
Q13 Compute lim .
n→∞ 3n
n
Q14 Compute lim .
n→∞ 2n
Q15 Compute $\lim_{n\to\infty} \frac{n^3 + 3}{4n^3 - 9}$.
Q16 Compute $\lim_{n\to\infty} \frac{\sin n}{\log n}$.
Q17 Compute $\lim_{n\to\infty} \frac{e^n}{\sqrt{n}}$.
3.2.5
Q19 Compute $\lim_{n\to\infty} \frac{n!}{5^n}$.
n4 + 3n + 1
Q20 Compute lim .
n→∞ n!
Q22 Yuran knows that $\lim_{n\to\infty} \frac{n!}{5^n} = \infty$ because n! grows faster than $a^n$. However, he thinks he can
make the denominator grow faster than the numerator if he uses a product like $\frac{n!}{5^n 6^n}$ or $\frac{n!}{5^n 6^n 7^n}$.
Will he eventually obtain a non-infinite limit by this method? Explain how you know.
Synthesis & Extension
Q23 Suppose we have a sequence
\[ a_n = \begin{cases} f(n) & \text{if } n \le 342 \\ g(n) & \text{if } n > 342 \end{cases} \]
Which of the following could help us evaluate $\lim_{n\to\infty} a_n$?
$\lim_{x\to\infty} f(x)$
$\lim_{x\to\infty} g(x)$
b Write an expression for the error bound of Tn (x) for some x between 0 and 1.
Section 3.3
Series
Goals:
The first step in understanding a Taylor polynomial of infinite degree is understanding how to add
up infinitely many of anything. This proposition is mechanically absurd. Addition is an operation for
two numbers at a time. Adding three or four numbers requires us to add two or three times. Adding
infinitely many requires us to add infinitely many times, something no one has time to do.
Yet there are some intuitive exercises we could perform. Suppose we lay a length of $\frac{1}{2}$ m next to $\frac{1}{4}$ m
next to $\frac{1}{8}$ m. If we continued indefinitely, we could imagine these lengths extending an entire meter.
What reasoning could we use to make this exercise rigorous? How could we add up lengths or
numbers where the pattern is not so intuitive? The formal object that does this is called a series. A
series is the first step on our way to push the Taylor polynomial to infinite degree. It is also the most
general. While we are concerned with one specific (and very useful) type of series, there are other
applications worth exploring as well.
Question 3.3.1
What Is a Series?
You have been encountering series since you first learned about decimals. You likely have not seen
a rigorous description of what they mean.
0.33333333 . . . 3.1415926...
We can write
\[ 0.3333\ldots = \frac{3}{10} + \frac{3}{100} + \frac{3}{1000} + \frac{3}{10000} + \cdots \]
or
\[ 3.1415\ldots = 3 + \frac{1}{10} + \frac{4}{100} + \frac{1}{1000} + \frac{5}{10000} + \cdots \]
You may have an intuitive sense of what these quantities are, but what does it mean to add up
infinitely many numbers?
Definition
A series is a sum of the form $\sum_{k=1}^{\infty} a_k$ where $a_k$ is an infinite sequence. If it is more convenient, we can
give k a different initial value. If the context is clear, we can write $\sum a_k$ as a shorthand.
Example
\[ 0.33333\ldots = \sum_{k=1}^{\infty} \frac{3}{10^k} \]
The harmonic series is $\sum_{k=1}^{\infty} \frac{1}{k}$.
This tells us what a series is but not how to evaluate it. How do we know that, for example,
\[ 0.333\ldots = \frac{1}{3}? \]
Definition
The nth partial sum of the series $\sum_{k=1}^{\infty} a_k$ is
\[ s_n = a_1 + a_2 + a_3 + \cdots + a_n \]
A series $\sum_{k=1}^{\infty} a_k$ converges to L if
\[ \lim_{n\to\infty} s_n = L. \]
Vocabulary Note
Do not confuse a sequence with a series. One is a list of numbers. The other is the sum of a list of
numbers.
Example 3.3.2
Computing Partial Sums
Consider $\sum_{k=1}^{\infty} \frac{3}{10^k}$.
b Compute $\lim_{n\to\infty} s_n$
Solution
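\begin{align*}
s_1 &= \frac{3}{10} \\
s_2 &= \frac{3}{10} + \frac{3}{100} = \frac{33}{100} \\
s_3 &= \frac{3}{10} + \frac{3}{100} + \frac{3}{1000} = \frac{333}{1000} \\
s_4 &= \frac{3}{10} + \frac{3}{100} + \frac{3}{1000} + \frac{3}{10000} = \frac{3333}{10000}
\end{align*}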
b In order to use our usual methods of limits, we would need an algebraic expression for $s_n$. It isn’t
immediately clear how to produce one. Given our knowledge of decimals, we expect the answer
to be $\frac{1}{3}$. We will use this as a hint. We expect $\frac{1}{3} - s_n$ to approach 0.
\begin{align*}
\frac{1}{3} - s_1 &= \frac{1}{30} \\
\frac{1}{3} - s_2 &= \frac{1}{300} \\
\frac{1}{3} - s_3 &= \frac{1}{3000} \\
\frac{1}{3} - s_4 &= \frac{1}{30000}
\end{align*}
Extrapolating suggests
\[ \frac{1}{3} - s_n = \frac{1}{3(10)^n} \]
and we conclude that $\sum_{k=1}^{\infty} \frac{3}{10^k} = \frac{1}{3}$.
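The extrapolated pattern for $\frac{1}{3} - s_n$ can be checked for many values of n using exact fractions. This is a minimal sketch; `partial_sum` is a hypothetical helper name.

```python
from fractions import Fraction

def partial_sum(n):
    """s_n = 3/10 + 3/100 + ... + 3/10^n as an exact fraction."""
    return sum(Fraction(3, 10 ** k) for k in range(1, n + 1))

for n in range(1, 8):
    gap = Fraction(1, 3) - partial_sum(n)
    print(n, partial_sum(n), gap)
```

A check like this supports the pattern for the values tested but is not a proof for all n.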
Main Idea
Often, we can show that $\sum_{k=1}^{\infty} a_k = L$ by computing $L - s_n$ and seeing that it converges to 0.
Figure: The partial sums $s_n$ converging to $L = \frac{1}{3}$
Example 3.3.3
The Harmonic Series
We have seen examples of series in which the terms approach 0 as k → ∞. These have allowed us
to add infinitely many terms and obtain a finite sum. Does this always work? No. A series can have its
terms approach 0, and yet the partial sums go to ∞. The most famous example of this is the harmonic
series: $\sum_{k=1}^{\infty} \frac{1}{k}$. Rather than computing the partial sums directly (which would be a lot of computation)
we will compare the partial sums to an expression that is easier to calculate. We will replace each term
by a fraction with a power of 2 in the denominator. Here’s what we’ll do with $s_8$.
\begin{align*}
s_8 &= \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} \\
&> \frac{1}{1} + \frac{1}{2} + \underbrace{\frac{1}{4} + \frac{1}{4}}_{\frac{1}{2}} + \underbrace{\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}}_{\frac{1}{2}} \\
&= 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2}
\end{align*}
Since we replaced each term with something smaller and obtained a sum of $\frac{5}{2}$, we can conclude that
$s_8 > \frac{5}{2}$. Continuing this pattern, the terms $\frac{1}{9}$ to $\frac{1}{16}$ sum to more than $\frac{1}{2}$ so $s_{16} > \frac{6}{2}$. In general we can
\[ 1 + \frac{1}{2} m > c. \]
This tells us that the harmonic series diverges.
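The grouping argument suggests that the partial sum with $2^m$ terms is at least $1 + \frac{m}{2}$. The sketch below checks this for small m with exact fractions; the helper name is illustrative, and the range of m is kept small because the sums grow very slowly.

```python
from fractions import Fraction

def harmonic_partial_sum(n):
    """s_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

for m in range(1, 11):
    s = harmonic_partial_sum(2 ** m)
    print(m, float(s), 1 + m / 2)   # s_(2^m) compared with the lower bound 1 + m/2
```

The printed sums creep upward slowly, which is why the divergence of the harmonic series is hard to see from a table of values alone.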
Question 3.3.4
What Is a Geometric Series?
The two series so far that we have been able to evaluate belonged to a larger family. These are the
geometric series.
Definition
A geometric series is a series of the form $\sum_{k=1}^{\infty} ar^{k-1}$.
Example
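\[ \sum_{k=1}^{\infty} \left(\frac{1}{2}\right)^{k-1} = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots \]
\[ \sum_{k=1}^{\infty} \frac{3}{10}\left(\frac{1}{10}\right)^{k-1} = \frac{3}{10} + \frac{3}{100} + \frac{3}{1000} + \cdots = \frac{1}{3} \]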
Unlike many other series, geometric series are simple enough that we can write a formula for their
sum. We can get a convenient expression for sn by performing a cute algebra trick. We’ll multiply sn
by r and subtract rsn from sn . Most of the terms cancel and we obtain an equation that we can solve
for sn .
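\begin{align*}
s_n &= a + ar + ar^2 + \cdots + ar^{n-1} \\
r s_n &= ar + ar^2 + \cdots + ar^{n-1} + ar^n \\
s_n - r s_n &= a - ar^n \\
s_n &= \frac{a(1 - r^n)}{1 - r} \quad \text{if } r \ne 1
\end{align*}
The value of the series is the limit of $s_n$ as n → ∞.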
To evaluate this limit, we need to understand the behavior of $r^n$ as n → ∞.
If −1 < r < 1 then higher powers of r get smaller and smaller and $r^n \to 0$.
If r > 1 then higher powers of r get larger and larger and $r^n \to \infty$.
If r < −1 then higher powers of r get larger but alternate signs. $\lim_{n\to\infty} r^n$ does not exist.
Theorem
\[ s_n = \sum_{k=1}^{n} ar^{k-1} = \begin{cases} \dfrac{a(1 - r^n)}{1 - r} & \text{if } r \ne 1 \\ an & \text{if } r = 1 \end{cases} \]
These converge to $\frac{a}{1-r}$ when |r| < 1 and diverge when |r| ≥ 1.
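The partial sum formula can be compared with direct addition. The sketch below uses exact fractions with a = 3/10 and r = 1/10, the series for 0.333... from earlier; both function names are hypothetical.

```python
from fractions import Fraction

def geometric_partial_sum(a, r, n):
    """Add the first n terms a, ar, ar^2, ..., ar^(n-1) directly."""
    return sum(a * r ** (k - 1) for k in range(1, n + 1))

def closed_form(a, r, n):
    """The formula from the theorem: a(1 - r^n)/(1 - r) if r != 1, otherwise a*n."""
    return a * n if r == 1 else a * (1 - r ** n) / (1 - r)

a, r = Fraction(3, 10), Fraction(1, 10)
for n in (1, 5, 10):
    print(n, geometric_partial_sum(a, r, n), closed_form(a, r, n))
print(a / (1 - r))   # the limiting value a / (1 - r)
```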
Example 3.3.5
Evaluating Geometric Series
Identify a and r in the following geometric series. Then evaluate the series.
a $\frac{2}{3} + \frac{4}{15} + \frac{8}{75} + \cdots$
b $\sum_{n=2}^{\infty} 3^n$
c 0.999999 . . .
Solution
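a a is the initial term, which is $\frac{2}{3}$. The common ratio is the ratio between any two consecutive terms: $\frac{4/15}{2/3} = \frac{2}{5}$.
\[ \sum_{k=1}^{\infty} \frac{2}{3}\left(\frac{2}{5}\right)^{k-1} = \frac{\frac{2}{3}}{1 - \frac{2}{5}} = \frac{\frac{2}{3}}{\frac{3}{5}} = \frac{10}{9} \]
b The initial term of this series is 9. The common ratio is 3. Since |3| ≥ 1, $\sum_{n=2}^{\infty} 3^n$ diverges.
c $0.999999\ldots = \frac{9}{10} + \frac{9}{100} + \frac{9}{1000} + \cdots$. This has an initial term of $\frac{9}{10}$ and a common ratio of $\frac{1}{10}$.
|r| < 1 so
\[ 0.999999\ldots = \frac{\frac{9}{10}}{1 - \frac{1}{10}} = \frac{\frac{9}{10}}{\frac{9}{10}} = 1 \]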
Question 3.3.6
What Does the Size of $a_k$ Tell Us About $\sum a_k$?
The discussion of the geometric series suggests that certain properties of a series make convergence
impossible. Specifically, in the cases in which the terms were not shrinking to 0, the partial sums were
growing without bound or oscillating. This intuition can be formalized in the following theorem, which
applies to more than just geometric series.
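Theorem [The Divergence Test]
If $\lim_{k\to\infty} a_k$ is not 0 (including when the limit does not exist), then the series $\sum_{k=1}^{\infty} a_k$
diverges.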
Remark
The divergence test does not tell us anything if $\lim_{k\to\infty} a_k = 0$. The series might converge, and it might
not. In this case we say the test is inconclusive.
Example 3.3.7
Applying the Divergence Test
What does the divergence test tell us about each of the following series?
a $\sum_{k=2}^{\infty} 3^k$
b $\sum_{k=2}^{\infty} \frac{1}{k}$
c $\sum_{k=2}^{\infty} \frac{k^2 - 1}{3k^2 + 7}$
d $\sum_{k=2}^{\infty} \frac{k^2}{e^k}$
Solution
a The sequence is $a_k = 3^k$. $\lim_{k\to\infty} 3^k = \infty$. This limit is not 0, so by the divergence test, the series
diverges.
b The sequence is $a_k = \frac{1}{k}$. $\lim_{k\to\infty} \frac{1}{k} = 0$. The divergence test is inconclusive. It cannot tell us
whether this series diverges or converges. By our earlier work, we happen to know this series
diverges.
c The sequence is $a_k = \frac{k^2 - 1}{3k^2 + 7}$. $\lim_{k\to\infty} \frac{k^2 - 1}{3k^2 + 7} = \frac{1}{3}$. This limit is not 0, so by the divergence test,
the series diverges.
d The sequence is $a_k = \frac{k^2}{e^k}$. We need L’Hôpital’s rule to evaluate the limit.
\begin{align*}
\lim_{k\to\infty} \frac{k^2}{e^k} &= \lim_{k\to\infty} \frac{2k}{e^k} && \text{still } \tfrac{\infty}{\infty} \text{ form} \\
&= \lim_{k\to\infty} \frac{2}{e^k} \\
&= 0
\end{align*}
The divergence test is inconclusive. It cannot tell us whether this series diverges or converges. It
turns out that this series converges, but we do not have a method to verify that yet.
Question 3.3.8
What Is the Ratio Test?
So far we have two tests to determine the convergence of a series. One test is very specific, applying
only to geometric series. The other is very imprecise. The divergence test is often inconclusive. It
does not help us to evaluate a series at all, only recognizing some series that diverge. Unfortunately,
these shortcomings are typical of series tests. A rigorous study of infinite series requires learning almost a
dozen tests. On a randomly chosen series, most of these tests will be inconclusive, and none of them will
give a numerical value, even if the series happens to converge. Because we are interested in extending
Taylor polynomials to have infinitely many terms, some of these tests are much more useful than others.
The most useful is the ratio test, though it is still no help in evaluating a series and is still sometimes
inconclusive.
In the case of a geometric series, $\sum ar^{k-1}$, the common ratio between terms determines whether this
series grows out of control, or whether the terms shrink quickly enough that the partial sums converge.
Even when a series is not geometric, we can attempt to apply similar reasoning to determine whether it
converges. A non-geometric series does not have a constant ratio. The ratio between successive terms
will change as we progress through them. We will instead compute the limit of these ratios.
Theorem [The Ratio Test]
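If $\lim_{k\to\infty} \left|\frac{a_{k+1}}{a_k}\right| = L < 1$, then $\sum a_k$ converges absolutely.
If $\lim_{k\to\infty} \left|\frac{a_{k+1}}{a_k}\right| = L > 1$ or is infinite, then $\sum a_k$ is divergent.
If $\lim_{k\to\infty} \left|\frac{a_{k+1}}{a_k}\right| = 1$, then the ratio test is inconclusive.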
Remark
Converges absolutely is a term for series with both positive and negative terms. It means the series
would converge, even if the signs of all the terms were all positive. The alternative is conditional
convergence, meaning the series’s convergence may require the positive and negative terms partially
canceling each other out.
Example
The series
$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots$$
converges (we won’t prove this). If we made all the terms positive, it would be the harmonic series,
which diverges. This series converges conditionally, not absolutely.
Absolute versus conditional convergence can be interesting to play with. You may see references to
it in other math books, but we won’t have any further use for it.
Example 3.3.9
Applying the Ratio Test
a Does $\sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k!}$ converge or diverge?
b Does $\sum_{k=1}^{\infty} \frac{2^k}{k^2}$ converge or diverge?
c Does $\sum_{k=1}^{\infty} k$ converge or diverge?
Solution
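a First we will compute and simplify the ratio. Then we will take its limit and draw a conclusion.
$$\begin{aligned}
\left|\frac{a_{k+1}}{a_k}\right| &= \left|\frac{\frac{(-1)^k}{(k+1)!}}{\frac{(-1)^{k-1}}{k!}}\right| \\
&= \left|\frac{(-1)^k\, k!}{(-1)^{k-1}(k+1)!}\right| \\
&= \left|\frac{(-1)^k (1)(2)(3)\cdots(k)}{(-1)^{k-1}(1)(2)(3)\cdots(k)(k+1)}\right| && \text{(expand the factorials)} \\
&= \left|\frac{(-1)^k}{(-1)^{k-1}(k+1)}\right| && \text{(cancel the matching factors)} \\
&= \left|\frac{-1}{k+1}\right| && \text{(cancel } k-1 \text{ powers of } -1\text{)} \\
&= \frac{1}{k+1} && \text{(the absolute value of a negative number is its negation)}
\end{aligned}$$
$$\lim_{k\to\infty} \frac{1}{k+1} = 0$$
Since $0 < 1$, the ratio test concludes that the series converges absolutely.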
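b We will apply the ratio test. First we compute the ratio, and then we take a limit.
$$\begin{aligned}
\left|\frac{a_{k+1}}{a_k}\right| &= \frac{\frac{2^{k+1}}{(k+1)^2}}{\frac{2^k}{k^2}} \\
&= \frac{2^{k+1} k^2}{2^k (k^2 + 2k + 1)} \\
&= \frac{2k^2}{k^2 + 2k + 1} && \text{(cancel the 2s)}
\end{aligned}$$
$$\lim_{k\to\infty} \frac{2k^2}{k^2 + 2k + 1} = 2$$
Since $2 > 1$, the ratio test concludes that the series diverges.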
c We will apply the ratio test. First we compute the ratio, and then we take a limit.
$$\left|\frac{a_{k+1}}{a_k}\right| = \frac{k+1}{k}, \qquad \lim_{k\to\infty} \frac{k+1}{k} = 1$$
Here the ratio test is inconclusive. It cannot tell whether this series converges or diverges. However,
we can probably figure this out another way. The terms of this series are increasing, which means
the partial sums will grow faster and faster. This was the reasoning behind the divergence test.
$$\lim_{k\to\infty} k = \infty$$
Since $\lim_{k\to\infty} k \neq 0$, the divergence test concludes that the series diverges.
Main Ideas
When applying the ratio test, be sure to replace every k with k + 1 for the ak+1 term.
Familiarize yourself with the algebra rules that allow you to simplify ratios of exponentials and
factorials.
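One way to check the algebra in a ratio test computation is to evaluate $\left|\frac{a_{k+1}}{a_k}\right|$ for a few large $k$. The sketch below does this for the three series of Example 3.3.9, using exact rational arithmetic so that rounding does not blur the pattern. The function `ratio` and the dictionary labels are hypothetical names introduced for illustration; numerical values only suggest the limit, they do not prove it.

```python
from fractions import Fraction
from math import factorial


def ratio(term, k):
    """|a_{k+1} / a_k|, computed exactly; every k is replaced by k + 1 in the numerator."""
    return abs(term(k + 1) / term(k))


series = {
    "a: (-1)^(k-1)/k!": lambda k: Fraction((-1) ** (k - 1), factorial(k)),
    "b: 2^k/k^2": lambda k: Fraction(2 ** k, k ** 2),
    "c: k": lambda k: Fraction(k),
}

for name, term in series.items():
    print(name, [float(ratio(term, k)) for k in (10, 100, 1000)])
```

The printed ratios for a shrink toward 0, those for b approach 2, and those for c approach 1, matching the three conclusions in the solution.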
Example 3.3.10
A Strategy for Series Tests
Strategy
Given the three ways we have to test for divergence and convergence and the relative ease of applying
each, here is a reasonable approach to testing a series.
[Flowchart of the testing strategy. Surviving labels: "not constant", "= 1", "hard to tell".]
$$\sum_{n=1}^{\infty} \frac{1}{n^2}.$$
Solution
First we’ll check that the terms go to zero. If they don’t we quickly classify this as a divergent series.
$$\lim_{n\to\infty} \frac{1}{n^2} = 0$$
They do, so we need another check. Now we’ll compute the ratio between terms.
$$\frac{a_{n+1}}{a_n} = \frac{\frac{1}{(n+1)^2}}{\frac{1}{n^2}} = \frac{n^2}{n^2 + 2n + 1}$$
This is not a constant; it depends on $n$. Thus $\sum a_n$ is not a geometric series. We’ll try the ratio test.
$$\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n\to\infty} \frac{n^2}{n^2 + 2n + 1} = 1$$
This means that the ratio test is inconclusive. We do not know whether this series converges or diverges.
We have exhausted all our tests. If we want the answer, we need to look up another test.
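When every available test is inconclusive, computing partial sums can at least suggest what to expect. The sketch below is numerical evidence only, not a proof, and the function name is a hypothetical choice.

```python
def partial_sum_inverse_squares(n):
    """s_n = 1/1^2 + 1/2^2 + ... + 1/n^2."""
    return sum(1 / k ** 2 for k in range(1, n + 1))


for n in (10, 100, 1000, 10000):
    print(n, partial_sum_inverse_squares(n))
```

The partial sums grow more and more slowly and stay below 2, which hints that the series converges. Confirming that requires a test beyond the three in this section.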
Section 3.3
Exercises
Summary Questions
Q5 How does each of the following factors behave in the ratio $\frac{a_{k+1}}{a_k}$?
a $k^p$ ($p$ a constant)
b $c^k$ ($c$ a constant)
c $k!$
Q6 How would the ratio test apply to a geometric series $\sum ar^{k-1}$?
3.3.1
a $2 + \frac{7}{10} + \frac{1}{100} + \frac{8}{1000} + \frac{2}{10000} + \frac{8}{100000} + \cdots$
b $\frac{6}{10} + \frac{6}{100} + \frac{6}{1000} + \frac{6}{10000} + \cdots$
Q8 Use a calculator to get a decimal approximation of $\frac{25}{33}$ and write it as a series of fractions with
powers of 10 as denominators.
3.3.2
Q10 Compute the first 3 partial sums of $\sum_{k=1}^{\infty} \frac{k+1}{k^2}$. Don’t simplify the arithmetic.
Q11 Compute the first four partial sums of $\sum_{k=1}^{\infty} (-1)^k$. What do you think this suggests about the sum
of the series?
Q12 Compute the first five partial sums of $\sum_{k=0}^{\infty} \frac{1}{(-2)^k}$. Use them to make a prediction about the value
of the series.
3.3.3
Q13 Give an example of an n such that you know the nth partial sum of the harmonic series is greater
than 20.
Q14 Modify our argument for the harmonic series to show that $\sum_{k=0}^{\infty} \frac{1}{\sqrt{k}}$ diverges.
3.3.4
Q15 Is $\frac{1}{2} + \frac{1}{4} + \frac{1}{6} + \frac{1}{8} + \cdots$ a geometric series? How can you tell?
2 4 6 8
Q17 The first two terms of a geometric series are 5 and 7.5. What is the third term?
Q18 The fifth term of a geometric series is 17. The eighth term is 51. What is the sixth term?
3.3.5
Q19 Evaluate $\sum_{k=0}^{\infty} 5(0.3)^k$.
Q20 Evaluate $\sum_{k=0}^{\infty} \frac{1}{4}\left(\frac{4}{3}\right)^k$.
Q21 Evaluate $\sum_{j=3}^{\infty} \frac{15}{5^j}$.
Q22 Evaluate $\sum_{k=1}^{\infty} 0.8^k$.
Q23 Evaluate $\sum_{k=4}^{\infty} \frac{3^k}{2^k(18)}$.
Q24 Evaluate $\sum_{k=1}^{\infty} \frac{37}{100^k}$. What decimal does this represent?
Q25 For what values of $z$ does $\sum_{k=0}^{\infty} \frac{3^k}{z^k}$ converge?
Q26 For what values of $p$ does $\sum_{k=3}^{\infty} \frac{12p^{2k}}{16^k}$ converge?
3.3.6
Q27 If $a_k > \frac{1}{100}$ for all $k$, then what can you say about the value of $s_n = \sum_{k=1}^{n} a_k$?
Q28 If $\lim_{k\to\infty} a_k = \frac{1}{100}$, use the definition of a limit and the reasoning in the previous exercise to
show that $\sum_{k=1}^{\infty} a_k$ diverges.
3.3.7
Q29 What does the divergence test say about $\sum_{k=1}^{\infty} \frac{1}{k^3}$?
Q30 What does the divergence test say about $\sum_{k=1}^{\infty} \frac{k^2 + 1}{5k^2 + 3k}$?
Q31 What does the divergence test say about $\sum_{k=2}^{\infty} \ln k$?
Q32 What does the divergence test say about $\sum_{k=2}^{\infty} \frac{1}{\ln k}$?
3.3.8
Q33 Will the divergence test detect every series that “fails” the ratio test (L > 1)? Explain.
Q34 If $\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right|$ does not exist, the ratio test is inconclusive. Give examples of two series where
this limit does not exist, one series that diverges and one that converges.
3.3.9
Q35 Apply the ratio test to $\sum_{k=1}^{\infty} \frac{k!}{4^k}$. What can you conclude?
Q36 Apply the ratio test to $\sum_{k=1}^{\infty} \frac{k5^k}{(k+1)!}$. What can you conclude?
Q37 Apply the ratio test to $\sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k^2}$. What can you conclude?
Q38 Apply the ratio test to $\sum_{k=1}^{\infty} \frac{(-8)^k}{k^2 5^k}$. What can you conclude?
Q39 Apply the ratio test to $\sum_{k=1}^{\infty} \frac{k^2}{4^k}$. What can you conclude?
Q40 Apply the ratio test to $\sum_{k=3}^{\infty} \frac{k!}{5k^3 + 4k - 2}$. What can you conclude?
∞ √
X k+1
Q41 Apply the ratio test to What can you conclude?
k2
k=1
∞ √
X
Q42 Apply the ratio test to ke−k What can you conclude?
k=1
3.3.10
Q43 Use one of the tests from this section to determine whether $\sum_{k=1}^{\infty} \frac{k+1}{k}$ converges.
Q44 Use one of the tests from this section to determine whether $\sum_{k=1}^{\infty} \frac{3(4^k)}{7^k}$ converges.
kek
P∞
Q45 Use one of the tests from this section to determine whether k=1 4k+1 converges.
7k9k
P∞
Q46 Use one of the tests from this section to determine whether k=1 k32k+1 converges.
Q47 In a paragraph or two, explain: How is evaluating an improper integral similar to evaluating an
infinite series? How are they different?
Q48 Suppose we have a sequence an such that lim an = 30. Suppose we then increase the values
n→∞
of the first few terms of $a_n$ by 10,000 each.
Q49 Suppose we wanted to approximate $\int_0^{\infty} \frac{1}{e^x}\,dx$ by rectangles of length $\Delta x = 1$, with heights
measured at the left endpoints.
e Does your series over- or underestimate the true value of the integral?
Q50 Suppose we wanted to approximate $\int_1^{\infty} \frac{1}{x^2}\,dx$ by rectangles of length $\Delta x = 1$, with heights
measured at the right endpoints.
b Express the sum of the areas of all the rectangles you’ll need as a series.
c Does your series over- or underestimate the true value of the integral?
d What is the true value of the integral? What does this suggest about whether your series
converges or diverges?
$$f_X(x) = \begin{cases} \frac{1}{x} - \frac{1}{x+1} & \text{if } x \text{ is a positive integer} \\ 0 & \text{otherwise} \end{cases}$$
b Compute $P(3 \leq X \leq 5)$.
Section 3.4
Power Series
Goals:
1 Use series tests to determine for what values of x a power series converges.
2 Identify the radius of convergence of a power series.
3 Recognize functions that can be rewritten as a power series.
The infinite degree polynomials we seek to define are series. The tools we’ve developed so far
provide the foundation for understanding the objects we want to construct, but there is more to do. A
polynomial also contains a variable. In this section we deal with the ramifications of including a variable
in an infinite series.
Question 3.4.1
What Is a Power Series?
So far we have studied infinite series of numbers. If instead of just numbers, our terms include
variables, then we’ve created a function. Plugging in different values for the variable gives us a different
series of numbers.
Example
The expression
$$1 + x + x^2 + x^3 + \cdots$$
becomes
$$1 + 2 + 4 + 8 + \cdots$$
when we evaluate it at $x = 2$. It becomes
$$1 - \frac{1}{3} + \frac{1}{9} - \frac{1}{27} + \cdots$$
when we evaluate it at $x = -\frac{1}{3}$.
Definition
It is a function of x whose domain is all values of x that make the series converge.
Use the geometric series formula to write $f(x) = \frac{1}{1-x}$ as a power series and find its domain.
Solution
$\frac{1}{1-x}$ is the sum of a geometric series. In this case, the initial term $a = 1$ and the common ratio $r$ is
$x$. If we write out the first few terms we obtain $1 + x + x^2 + x^3 + \cdots$. We see this is a power series
centered at 0. The coefficients $c_k$ are all equal to 1. We could write it as $\sum_{k=0}^{\infty} x^k$.
The domain of a power series is the values of x that make it converge. We know that this geometric
series converges if and only if the common ratio x has absolute value less than 1. Those values of x,
the open interval (−1, 1), are the domain of f .
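The role of the domain can be seen numerically. In this minimal sketch (the function name is an illustrative assumption), partial sums of $\sum x^k$ are compared with $\frac{1}{1-x}$ at points inside $(-1, 1)$, and then evaluated at $x = 2$, which lies outside the domain.

```python
def geometric_power_series(x, n_terms):
    """Partial sum 1 + x + x**2 + ... + x**(n_terms - 1)."""
    return sum(x ** k for k in range(n_terms))


# Inside the domain the partial sums agree closely with 1 / (1 - x).
for x in (-0.5, 0.25, 0.9):
    print(x, geometric_power_series(x, 200), 1 / (1 - x))

# Outside the domain the partial sums keep growing, even though
# the formula 1 / (1 - x) still produces a number (-1 at x = 2).
print([geometric_power_series(2, n) for n in (5, 10, 20)])
```

The formula $\frac{1}{1-x}$ is defined at $x = 2$, but the power series is not, which is why the two functions are equal only on $(-1, 1)$.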
Example 3.4.3
The Domain of a Power Series
What is the domain of $\sum_{k=1}^{\infty} \frac{k^2}{4^k}(x-5)^k$?
Solution
The domain is the set of $x$ values that make the series converge. The ratio test will be helpful here.
The ratio between terms is
$$\begin{aligned}
\left|\frac{a_{k+1}}{a_k}\right| &= \left|\frac{\frac{(k+1)^2}{4^{k+1}}(x-5)^{k+1}}{\frac{k^2}{4^k}(x-5)^k}\right| \\
&= \left|\frac{(k+1)^2\, 4^k (x-5)^{k+1}}{k^2\, 4^{k+1} (x-5)^k}\right| \\
&= \left|\frac{(k^2+2k+1)(x-5)}{4k^2}\right|
\end{aligned}$$
Notice this entire computation is invalid if $x = 5$, because we cannot divide by 0. We can examine
this case directly. If $x = 5$ then every term of the series is 0, and the series converges. For the rest of
the real numbers, we compute the limit as $k \to \infty$, but $x$ will remain in the result.
$$\lim_{k\to\infty} \left|\frac{(k^2+2k+1)(x-5)}{4k^2}\right| = \left|\frac{x-5}{4}\right| \lim_{k\to\infty} \frac{k^2+2k+1}{k^2} = \left|\frac{x-5}{4}\right|$$
The ratio test can tell us whether the series converges for some values of $x$. If $\left|\frac{x-5}{4}\right| < 1$ the series
converges. We can solve for $x$.
$$\begin{aligned}
\left|\frac{x-5}{4}\right| &< 1 \\
-4 < x - 5 &< 4 \\
1 < x &< 9 && \text{(add 5 to all three expressions)}
\end{aligned}$$
On the other hand, if $\left|\frac{x-5}{4}\right| > 1$ the series diverges. Solving for $x$ follows a similar procedure.
$$\begin{aligned}
&\left|\frac{x-5}{4}\right| > 1 \\
&x - 5 < -4 \text{ or } x - 5 > 4 \\
&x < 1 \text{ or } x > 9
\end{aligned}$$
What about when $x = 1$ or $x = 9$? There $\left|\frac{x-5}{4}\right| = 1$ so the ratio test is inconclusive. We would
need another test to resolve these points. In this case, we are lucky. If $x = 9$ the series becomes
$\sum_{k=1}^{\infty} \frac{k^2}{4^k}(4)^k = \sum_{k=1}^{\infty} k^2$. The divergence test is useful here: $\lim_{k\to\infty} k^2 = \infty$. Since the terms do not approach 0,
the series diverges. A similar argument works for $x = 1$. The domain is the open interval $(1, 9)$.
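The limit of the ratio in this example can be checked numerically. The sketch below evaluates the ratio of successive terms of $\sum \frac{k^2}{4^k}(x-5)^k$ at a large index for several $x$ and compares it with $\left|\frac{x-5}{4}\right|$. The names `term` and `ratio` are hypothetical, and as in the text the computation is not valid at $x = 5$, where every term is 0.

```python
from fractions import Fraction


def term(k, x):
    """k-th term k^2 / 4^k * (x - 5)^k, using exact rational arithmetic."""
    return Fraction(k ** 2, 4 ** k) * Fraction(x - 5) ** k


def ratio(k, x):
    """|a_{k+1} / a_k|; undefined at x = 5 because a_k = 0 there."""
    return abs(term(k + 1, x) / term(k, x))


for x in (2, 8, 9, 10):
    print(x, float(ratio(1000, x)), abs(x - 5) / 4)
```

For $x = 2$ and $x = 8$ the ratio settles below 1, for $x = 10$ it settles above 1, and at the endpoint $x = 9$ it sits just above 1 and approaches 1, which is exactly the inconclusive case.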
Main Idea
The ratio test is usually successful in finding where a power series converges. Generally it is inconclusive
at only two points. We will not always have a test that can tell us whether the series converges at these
points.
You may notice a pattern in the types of domains we have computed for power series. That pattern
is formalized in the theorem below, which tells us that the domain of a power series must take a very
particular form.
Theorem
Given a power series $\sum_{k=0}^{\infty} c_k (x-a)^k$ centered at $a$, one of the following is true.
1 The series converges only at $x = a$.
2 The series converges for all $x$.
3 There is a number $R > 0$ such that the series converges when $|x - a| < R$ and diverges when $|x - a| > R$.
In case 3, the inequality $|x - a| < R$ solves to $a - R < x < a + R$, which means the domain is an
interval centered at $a$ and extending a distance $R$ to either side. The theorem does not state whether
this is a closed, open or half open interval. This reasoning extends intuitively, if not formally, to the
other cases. Case 1 can be thought of as a (closed) interval extending distance 0 on either side. Case 2 would
then be an interval extending infinitely on either side.
Remark
The main consequence of this theorem is that when solving for the domain of a power series, we
can simplify our use of the ratio test. The interval of convergence will always be the solution to
$\lim_{k\to\infty} \left|\frac{a_{k+1}}{a_k}\right| < 1$. The endpoints may or may not lie in the domain. The points beyond the endpoints
will never be part of the domain.
Question 3.4.4
Can We Integrate or Differentiate a Power Series?
When f (x) is a polynomial, we can find the derivative and anti-derivative of f (x) by computing the
(anti-)derivative of each term. The following theorem says that we can do this for a power series too.
Theorem
If $f(x)$ is the power series $\sum_{k=0}^{\infty} c_k (x-a)^k$ and $f(x)$ has radius of convergence $R > 0$ then $f(x)$ is
differentiable and continuous on the interval $(a - R, a + R)$, and
1 $f'(x) = \sum_{k=1}^{\infty} k c_k (x-a)^{k-1}$
2 $\int f(x)\,dx = C + \sum_{k=0}^{\infty} c_k \frac{(x-a)^{k+1}}{k+1}$
Remark
Notice that we remove the $k = 0$ term from the derivative. The derivative of that term is 0, but
$0c_0(x-a)^{-1}$ is undefined at $x = a$.
Example
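We have seen that $\frac{1}{1-x} = \sum_{k=0}^{\infty} x^k$ on the interval $(-1, 1)$. From that we can compute:
$$\frac{d}{dx} \sum_{k=0}^{\infty} x^k = \sum_{k=1}^{\infty} k x^{k-1}$$
$$\int \sum_{k=0}^{\infty} x^k\,dx = \sum_{k=0}^{\infty} \frac{x^{k+1}}{k+1} + c$$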
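Since $\sum x^k = \frac{1}{1-x}$ on $(-1, 1)$, the term-by-term derivative should match $\frac{d}{dx}\frac{1}{1-x} = \frac{1}{(1-x)^2}$, and the term-by-term antiderivative with $c = 0$ should match $-\ln(1-x)$, which is also 0 at $x = 0$. The sketch below checks both at $x = 0.5$. The function names are illustrative assumptions, and the comparison is a numerical check rather than a proof of the theorem.

```python
import math


def derivative_series(x, n_terms):
    """Partial sum of sum_{k>=1} k * x**(k-1)."""
    return sum(k * x ** (k - 1) for k in range(1, n_terms + 1))


def antiderivative_series(x, n_terms):
    """Partial sum of sum_{k>=0} x**(k+1) / (k+1), taking the constant c = 0."""
    return sum(x ** (k + 1) / (k + 1) for k in range(n_terms))


x = 0.5
print(derivative_series(x, 200), 1 / (1 - x) ** 2)
print(antiderivative_series(x, 200), -math.log(1 - x))
```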
Section 3.4
Exercises
Summary Questions
Q2 What test is useful for establishing the domain of a power series? What form can this domain
have?
3.4.1
b $\frac{1}{2} - \frac{1}{4}x^2 + \frac{1}{8}x^4 - \frac{1}{16}x^6 + \frac{1}{32}x^8 - \cdots$
a $1 - \frac{x^2}{2} + \frac{x^4}{24} - \frac{x^6}{720} + \frac{x^8}{40320} - \cdots$
3.4.2
Q7 Consider $f(x) = \frac{1}{1-4x^2}$.
Q8 Write $\frac{5}{1 - 3(x-2)}$ as a power series centered at $x = 2$.
Q9 Can the power series $p(x) = \sum_{k=1}^{\infty} \frac{k^3}{4^k}(x+7)^k$ be evaluated using the sum of a geometric series
formula? Explain.
Q10 Evaluate $f(x) = \sum_{k=3}^{\infty} \frac{1}{5^k}(x-2)^k$ at $x = 6$ using the formula for the sum of a geometric series.
3.4.3
∞
X
Q11 What is the domain of 2k (x − 3)k ?
k=1
Q12 Compute the domain of $\sum_{k=0}^{\infty} \frac{(x+2)^k}{k^3}$.
∞
X 1
Q13 Compute the domain of (x − 6)k .
4k
k=0
Q14 Compute the domain of $\sum_{k=0}^{\infty} \frac{x^k}{k!}$.
Q15 Compute the radius of convergence of $\sum_{k=0}^{\infty} k(x+3)^k$. What interval does this guarantee the series
converges on?
Q16 Compute the radius of convergence of $\sum_{k=0}^{\infty} k!\,x^k$. What interval does this guarantee the series
converges on?
∞
X 4k
Q17 Compute the radius of convergence of (x − 5)k . What interval does this guarantee the
3k
k=1
series converges on?
Q18 Suppose you are told that a given power series p(x) centered at x = a converges at x = −4 and
diverges at x = −7.
b What are all of the possible values of $a$? Explain your reasoning (briefly).
3.4.4
∞
X
Q19 Compute the antiderivative of 2k (x − 3)k .
k=0
Q20 Compute the derivative of $\sum_{k=0}^{\infty} \frac{(x+2)^k}{k^3}$. What is its domain?
∞
X 1
Q21 Compute the derivative of (x − 6)k . What is its domain?
4k
k=0
Q22 Compute the antiderivative of $\sum_{k=0}^{\infty} \frac{x^k}{k!}$.
Q23 What is the domain of the fifth derivative of $\sum_{k=0}^{\infty} k(x+3)^k$?
∞
X 4k
Q24 Compute the radius of convergence of the antiderivative of (x − 5)k .
3k
k=4
a Compute the domain of P . You do not need to check any endpoints of your answer.
Q26 Consider the series $S = \sum_{k=1}^{\infty} \frac{k}{2^k}$.
a How is $S$ related to the power series $p(x) = \sum_{k=1}^{\infty} \frac{kx^{k-1}}{2^k}$?
c Write P (x) as ratio F (x), using the sum of a geometric series formula.
Differentiating $f(x)$
Writing f ′ (x) as a geometric series
Taking an antiderivative of the geometric series
Section 3.5
Taylor Series
Goals:
Our goal has been to understand how to extend a Taylor polynomial to have infinite degree. We are
now ready to define the object rigorously. In general we will not know how to evaluate Taylor series.
If all we want to do is approximate values, they offer no advantages over Taylor polynomials. The
applications of Taylor series are more abstract. After defining these objects, we collect some tricks and
applications for working with them.
Question 3.5.1
What Is a Taylor Series?
Definition
$$T(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x-a)^k.$$
The Taylor series’s notation simply swaps an $n$ for an $\infty$ in the expression of a Taylor polynomial. If
we wanted to describe the mathematical relationship precisely, we would say its partial sums $s_n$ are the
Taylor polynomials $T_n(x)$ of $f$ at $x = a$.
Remark
Several mathematicians contributed to the discovery of Taylor series. Taylor series centered at x = 0
were popularized by Colin Maclaurin, and so are often called Maclaurin series.
This definition is built upon a stack of more general definitions, and the methods we have for working
with those apply here.
A Taylor series is a type of power series.
Taylor polynomials were designed to approximate f (x). We might hope that T (x) would be the perfect
approximation, that T (x) and f (x) are equal. Unfortunately, there are obstacles to this.
The Taylor series might not converge for all x.
The Taylor polynomials might not approximate f (x) very well at all. Recall our example
$$f(x) = \begin{cases} 0 & \text{if } x \leq 0 \\ e^{-\frac{1}{x}} & \text{if } x > 0 \end{cases}$$
Example 3.5.2
Writing a Taylor series
Let f (x) = ex
Solution
a We have seen previously that $f^{(k)}(x) = e^x$ for all $k$ and thus $f^{(k)}(0) = 1$. We plug this into the
Taylor series formula.
$$T(x) = \sum_{k=0}^{\infty} \frac{1}{k!}x^k$$
b A Taylor series is a power series. We will use the ratio test to identify the interval of convergence.
The ratio of successive terms is
$$\begin{aligned}
\left|\frac{a_{k+1}}{a_k}\right| &= \left|\frac{\frac{1}{(k+1)!}x^{k+1}}{\frac{1}{k!}x^k}\right| \\
&= \left|\frac{k!\,x^{k+1}}{(k+1)!\,x^k}\right| \\
&= \left|\frac{x}{k+1}\right|
\end{aligned}$$
$$\lim_{k\to\infty} \left|\frac{x}{k+1}\right| = 0$$
This limit is zero no matter what value of $x$ we choose. Since $0 < 1$, the ratio test concludes that
this series converges for any value of $x$. In other words, the domain is all real numbers.
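The partial sums of this series are the Taylor polynomials of $e^x$, so they can be compared directly with a library value of the exponential. The sketch below is illustrative only; `exp_taylor` is a hypothetical helper name, and agreement at a few sample points is consistent with convergence everywhere but does not by itself prove that $T(x) = e^x$.

```python
import math


def exp_taylor(x, n):
    """Degree-n Taylor polynomial of e^x centered at 0: sum of x**k / k! for k = 0..n."""
    total, term = 0.0, 1.0
    for k in range(n + 1):
        total += term
        term *= x / (k + 1)   # turns x**k / k! into x**(k+1) / (k+1)!
    return total


for x in (-3.0, 1.0, 10.0):
    print(x, exp_taylor(x, 60), math.exp(x))
```

Larger $|x|$ needs more terms before the factorial in the denominator overtakes the power of $x$, but the sums settle for every input tried here.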
Synthesis 3.5.3
Is a Taylor Series Equal to the Function it Approximates?
Let $f(x) = \ln x$
a Find a pattern in the derivatives and write a general expression for the $k$th derivative: $f^{(k)}(x)$.
b Use your answer to a to write expressions for the Taylor polynomials $T_n(x)$ and the Taylor series
c What does the ratio test tell you about where $T(x)$ converges?
d If we wanted to apply Taylor’s inequality to $T_n(x)$, we would need to know where the derivative is
largest (in absolute value). Where is the $(n+1)$th derivative largest on the interval $[x, 1]$? (Here
$0 < x < 1$.)
e Where is the $(n+1)$th derivative largest on the interval $[1, x]$? (Here $x > 1$.)
g What does our answer to the previous question tell us about $T(x)$?
Solution
a Let’s compute some derivatives and see if we can find an expression for $f^{(k)}(x)$.
These answers look like factorials, but they’re shifted by 1. They’re also alternating signs, which
we can model with $(-1)^k$, except that the even powers are negative. The power of $x$ is $-k$. One
way to model this is $f^{(k)}(x) = (-1)^{k+1}(k-1)!\,x^{-k}$.
b Plugging in $x = 1$ gives $f^{(k)}(1) = (-1)^{k+1}(k-1)!$ except at $k = 0$. For that case we compute
$\ln 1 = 0$. This means we can leave it out of the summation. The form for the remaining terms
allows for some nice simplification.
$$\begin{aligned}
T(x) &= \sum_{k=1}^{\infty} \frac{(-1)^{k+1}(k-1)!}{k!}(x-1)^k \\
&= \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k}(x-1)^k
\end{aligned}$$
$$\left|\frac{a_{k+1}}{a_k}\right| = \left|\frac{\frac{(-1)^{k+2}}{k+1}(x-1)^{k+1}}{\frac{(-1)^{k+1}}{k}(x-1)^k}\right|$$
Now we’ll solve for when the limit of this ratio is less than 1.
$$\begin{aligned}
\lim_{k\to\infty} \frac{k|x-1|}{k+1} &< 1 \\
|x-1| \lim_{k\to\infty} \frac{k}{k+1} &< 1 \\
|x-1| &< 1 \\
-1 < x - 1 &< 1 \\
0 < x &< 2
\end{aligned}$$
The Taylor series converges on the interval $(0, 2)$.
d To apply Taylor’s inequality to bound $|R_n(x)|$, we need a bound on $|f^{(n+1)}|$ on the interval
from 1 to $x$. Looking back at our earlier computation, we obtain $f^{(n+1)}(x) = (-1)^{n+2}\,n!\,x^{-n-1}$.
In the case that $0 < x < 1$, the derivative $f^{(n+1)}$ decreases in magnitude from $x$ to 1 so it is largest
at $x$. We can use $M = n!\,x^{-n-1}$.
e In this case, $f^{(n+1)}$ decreases in magnitude from 1 to $x$ so it is largest at 1. We can use $M = n!$.
For $x > 1$, Taylor’s inequality gives
$$|R_n(x)| \leq \frac{n!}{(n+1)!}(x-1)^{n+1} \leq \frac{1}{n+1}(x-1)^{n+1}$$
This goes to 0 if $x - 1 \leq 1$, that is, if $x \leq 2$. For $0 < x < 1$, Taylor’s inequality gives
$$|R_n(x)| \leq \frac{n!\,x^{-n-1}}{(n+1)!}|x-1|^{n+1} \leq \frac{1}{n+1}\left|\frac{x-1}{x}\right|^{n+1}$$
This goes to 0 if $\left|\frac{x-1}{x}\right| \leq 1$ and infinity otherwise. Solving this (and assuming $x > 0$) gives $x \geq \frac{1}{2}$.
Putting these together, we can state that the error bound from Taylor’s inequality approaches 0
as we take higher degree Taylor polynomials, as long as $\frac{1}{2} \leq x \leq 2$.
g The answer to the previous question tells us that $T(x)$ converges to $\ln x$ on $\left[\frac{1}{2}, 2\right]$, since the error
bound and hence the error goes to 0. On the other hand, outside this interval, the error might
still go to 0 on $\left(0, \frac{1}{2}\right)$, even though the error bound does not. The series diverges outside $(0, 2]$
so it cannot converge to $\ln x$ there.
Remark
It turns out that $T(x) = \ln x$ on $(0, 2]$, which is a larger interval than we were able to establish using
Taylor’s inequality. This should not bother us. Taylor’s inequality produces a bound on the error. The
fact that the bound on the error is going to infinity doesn’t mean the actual error does. In this case,
for $x$ between 0 and $\frac{1}{2}$, the actual error approaches 0.
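The behavior described in this remark can be observed numerically. The following sketch evaluates partial sums of $\sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k}(x-1)^k$ at points inside and outside $(0, 2]$ and compares them with a library value of $\ln x$. The helper name `ln_taylor` is an assumption made for illustration, and a finite number of terms can only suggest convergence or divergence.

```python
import math


def ln_taylor(x, n):
    """Partial sum of sum_{k=1}^{n} (-1)**(k+1) * (x - 1)**k / k."""
    return sum((-1) ** (k + 1) * (x - 1) ** k / k for k in range(1, n + 1))


for x in (0.25, 0.5, 1.5, 2.0, 2.5):
    sums = [ln_taylor(x, n) for n in (10, 100, 1000)]
    print(x, sums, math.log(x))
```

At $x = 0.25$, where Taylor’s inequality gave no useful bound, the partial sums still approach $\ln 0.25$. At $x = 2$ they approach $\ln 2$ slowly, and at $x = 2.5$ they grow without settling.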
Example 3.5.4
Mixing Taylor Series and Algebra
Solution
We could try to work out a pattern in the derivatives of f , but even evaluating at x = 0 the computations
become intractable.
$$f'(x) = 2x\sin x + x^2\cos x$$
Instead we can write the Taylor series for sin x. Our earlier work gave us an expression for the Taylor
polynomials and showed that their error goes to 0 as the degree goes to infinity.
$$\sin x = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)!}x^{2k+1}$$
We can obtain an expression for $x^2 \sin x$ by multiplying both sides by $x^2$. Since we’re only multiplying
by a power of $x$, the resulting series will still be a power series centered at 0.
$$\begin{aligned}
x^2 \sin x &= x^2 \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)!}x^{2k+1} \\
&= \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)!}x^{2k+3}
\end{aligned}$$
Main Idea
When constructing a Taylor series for $f(x) = x^k g(x)$ centered at 0, construct the Taylor series of $g(x)$,
and then distribute the $x^k$.
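The series obtained by distributing $x^2$ can be checked against a direct evaluation of $x^2 \sin x$. This is a minimal sketch with a hypothetical function name; it checks a sample point rather than proving the identity.

```python
import math


def x2_sin_series(x, n_terms):
    """Partial sum of sum_{k>=0} (-1)**k * x**(2k+3) / (2k+1)!."""
    return sum(
        (-1) ** k * x ** (2 * k + 3) / math.factorial(2 * k + 1)
        for k in range(n_terms)
    )


x = 1.3
print(x2_sin_series(x, 20), x ** 2 * math.sin(x))
```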
Example 3.5.5
Integrating a Taylor Series
Let $f(x) = e^{x^2}$.
c Compute a Taylor series for $\int e^{x^2}\,dx$.
Solution
a We will compute the first four derivatives of f (x). We will need the chain rule and later the
product rule.
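$$\begin{aligned}
f(x) &= e^{x^2} & f(0) &= 1 \\
f'(x) &= 2xe^{x^2} & f'(0) &= 0 \\
f''(x) &= 2e^{x^2} + 4x^2e^{x^2} & f''(0) &= 2 \\
f'''(x) &= 12xe^{x^2} + 8x^3e^{x^2} & f'''(0) &= 0 \\
f^{(4)}(x) &= 12e^{x^2} + 48x^2e^{x^2} + 16x^4e^{x^2} & f^{(4)}(0) &= 12
\end{aligned}$$
We can plug these values into our $T_4(x)$ formula.
$$\begin{aligned}
T_4(x) &= \frac{1}{0!}x^0 + \frac{0}{1!}x^1 + \frac{2}{2!}x^2 + \frac{0}{3!}x^3 + \frac{12}{4!}x^4 \\
&= 1 + x^2 + \frac{1}{2}x^4
\end{aligned}$$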
We can see that our derivative calculations would quickly get out of hand as we take higher order
derivatives. Even if there is a discernible pattern, it might take more computation to determine it.
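$$e^x = \sum_{k=0}^{\infty} \frac{1}{k!}x^k$$
$e^{x^2}$ is a composition of $e^x$ and $x^2$, so we will plug in $x^2$ for $x$ in our $e^x$ Taylor series.
$$\begin{aligned}
e^{x^2} &= \sum_{k=0}^{\infty} \frac{1}{k!}(x^2)^k \\
&= \sum_{k=0}^{\infty} \frac{1}{k!}x^{2k}
\end{aligned}$$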
c Taylor series are also power series. By our theorem on power series, we can integrate term by
term.
$$\int e^{x^2}\,dx = \sum_{k=0}^{\infty} \frac{1}{k!(2k+1)}x^{2k+1} + c$$
Note that $\int e^{x^2}\,dx$ is not a function we can express algebraically or compute. A Taylor series gives
us some way to represent this function, but we shouldn’t be too satisfied. If we actually wanted
to evaluate it, the best we could do is approximate it with a partial sum.
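As a sketch of what approximating with a partial sum looks like in practice, the code below evaluates the integrated series at $x = 1$ (with $c = 0$, so it represents $\int_0^1 e^{t^2}\,dt$) and compares it with a midpoint-rule estimate of the same integral. The midpoint rule is a stand-in chosen for illustration, and both function names are hypothetical.

```python
import math


def integral_series(x, n_terms):
    """Partial sum of sum_{k>=0} x**(2k+1) / (k! * (2k+1)), the antiderivative with c = 0."""
    return sum(
        x ** (2 * k + 1) / (math.factorial(k) * (2 * k + 1))
        for k in range(n_terms)
    )


def midpoint_rule(f, a, b, n):
    """Approximate the integral of f on [a, b] using n midpoint rectangles."""
    dx = (b - a) / n
    return dx * sum(f(a + (i + 0.5) * dx) for i in range(n))


series_value = integral_series(1.0, 25)
numeric_value = midpoint_rule(lambda t: math.exp(t * t), 0.0, 1.0, 10_000)
print(series_value, numeric_value)
```

Both numbers are approximations: one truncates the series, the other uses finitely many rectangles. Their close agreement is what we would expect if the term-by-term integration is correct.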
Figure: The graph of $e^{x^2}$, $\int e^{x^2}\,dx$, and the partial sums of its Taylor series.
Main Ideas
Application 3.5.6
Euler’s Formula
b Write your answer in terms of the Taylor series for sin x and cos x.
c Write two different expressions for $e^{i2x}$. How is this equation useful?
Solution
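$$T(x) = \sum_{k=0}^{\infty} \frac{1}{k!}(ix)^k$$
$$\begin{aligned}
i^0 &= 1 & i^4 &= 1 \\
i^1 &= i & i^5 &= i \\
i^2 &= -1 & &\ \ \vdots \\
i^3 &= -i
\end{aligned}$$
$$T(x) = 1 + ix - \frac{1}{2}x^2 - \frac{1}{3!}ix^3 + \frac{1}{4!}x^4 + \frac{1}{5!}ix^5 - \frac{1}{6!}x^6 - \frac{1}{7!}ix^7 + \cdots$$
The terms with a factor of $i$ are the Taylor series for $\sin x$ multiplied by $i$. The terms without a
factor of $i$ are the Taylor series for $\cos x$. We can write
$$e^{ix} = \cos x + i\sin x$$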
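$$e^{i2x} = \cos 2x + i\sin 2x \qquad \text{and} \qquad e^{i2x} = \left(e^{ix}\right)^2 = (\cos x + i\sin x)^2 = \cos^2 x - \sin^2 x + 2i\sin x\cos x$$
Setting these equal to each other, we note that for two complex numbers to be equal, their real
parts must be equal and their imaginary parts must be equal.
$$\cos 2x = \cos^2 x - \sin^2 x \qquad \sin 2x = 2\sin x\cos x$$
These are the double angle formulas for sine and cosine.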
We can take higher powers of eix to produce triple or quadruple angle formulas. This converts a
difficult geometry problem into something a high school algebra student could compute.
Remark
You would expect a relationship like this to be very famous, and it is. eix = cos x + i sin x is called
Euler’s Formula. In addition to trigonometric formulas, it gives us insight into the complex numbers.
This connection between an exponential and a periodic function is so powerful that it is used in such
concrete applications as electrical engineering and signal processing.
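Euler’s Formula can be checked numerically with complex arithmetic. The sketch below sums the series $\sum \frac{(ix)^k}{k!}$ directly, compares it with $\cos x + i\sin x$, and then squares the result to recover the double angle formulas. The function name `exp_i_series` and the sample value $x = 0.7$ are arbitrary choices made for illustration.

```python
import cmath
import math


def exp_i_series(x, n_terms):
    """Partial sum of sum_{k>=0} (i*x)**k / k!, built one term at a time."""
    total, term = 0j, 1 + 0j
    for k in range(n_terms):
        total += term
        term *= 1j * x / (k + 1)
    return total


x = 0.7
lhs = exp_i_series(x, 40)
rhs = complex(math.cos(x), math.sin(x))
print(lhs, rhs, cmath.exp(1j * x))

# Squaring e^{ix} gives e^{i2x}; compare real and imaginary parts.
squared = rhs * rhs
print(squared.real, math.cos(2 * x))
print(squared.imag, math.sin(2 * x))
```

Raising `rhs` to the third or fourth power in the same way produces the triple and quadruple angle relationships mentioned above.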
Section 3.5
Exercises
Summary Questions
Q1 How can we be sure that a Taylor series converges to the function it is approximating?
Q3 How can we produce the Taylor series for $x^n f(x)$ or $f(x^n)$? Where does the center need to be
for the result to be a Taylor series?
3.5.1
Q5 If we wanted to compute a decimal approximation of ln(1.25) by hand, would the Taylor polyno-
mial or the Taylor series be more useful?
Q6 If T (x) is a Taylor series centered at x = a, what are the possible forms that the domain of T (x)
could take?
3.5.2
Q8 Let $T(x)$ be the Taylor series of $f(x) = e^x$ centered at 0. Verify that $T'(x) = T(x)$.
Q9 Write a Taylor series of $f(x) = \frac{1}{x}$ centered at 4.
Q10 Write a Taylor series of $f(x) = \frac{1}{x^2}$ centered at $-5$.
3.5.3
Q13 Show that the Taylor series of f (x) = ex centered at x = 0 is equal to f (x) for all real numbers
x.
Q14 Show that the Taylor series of f (x) = sin x centered at x = 0 is equal to f (x) for all real numbers
x.
Q15 Show that the Taylor series of $f(x) = \frac{1}{x}$ centered at 4 is equal to $f(x)$ for all $x$ in the interval
$(2, 6)$.
3k
Q16 Suppose for a function f we are able to place a bound of k! on the kth derivative of f over
any interval. For what values of x can we conclude that T (x), the Taylor series centered at 2, is
equal to f (x)?
Q17 We didn’t have a series test to determine whether $\sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k}$ converges. How does our analysis
of the Taylor series of $\ln x$ allow us to conclude that this series converges? Hint: what is $T(2)$?
Q18 For a general function f and its Taylor polynomials and series, how are the following sets of points
related? Does every number belonging to one of these sets belong to one of the others?
The set of numbers x where T (x) converges.
The set of numbers x where |Rn (x)| → 0 as n → ∞.
The set of numbers where f (x) = T (x).
3.5.4
Q21 Can we use our Taylor series for f (x) = ln x centered at 1 to write a Taylor series for g(x) =
x2 ln x? Explain.
Q22 Write a Taylor series for $f(x) = \frac{(x+5)^3}{x^2}$ centered at $-5$.
3.5.5
Q23 Let $g(x)$ be an antiderivative of $e^{x^3}$. Write the Taylor series for $g(x)$ centered at $x = 0$.
Q24 Let g(x) be an antiderivative of cos(x2 ). Write the Taylor series for g(x) centered at x = 0.
Q25 Let f (x) = cos x. Let T (x) be the Taylor series of f centered at x = 0. Compute T ′′ (x). Why
does your answer make sense?
Q26 Write the Taylor series for $f(x) = \frac{1}{x}$ centered at 1. Verify that one of its antiderivatives is a
Taylor series for $\ln x$.
3.5.6
Q28 Use Euler’s formula to compute a formula for cos 3x in terms of cos x and sin x.
Q30 Use the Taylor series of ln x centered at x = 1 to compute ln(1 + i). Do you think this series
converges?
Q31 Let $h(x) = \frac{1}{x^2}$.
b If you wanted to use your Taylor polynomial from a to approximate $\frac{1}{2.5^2}$, what bound would
Taylor’s inequality put on the error? Don’t simplify the arithmetic.
c What does the ratio test tell you about the domain of the Taylor series of $h(x)$ centered at
$x = 4$?
Q32 Let X be a normal random variable with mean 0 and standard deviation 1. Write a series whose
value is P (0 ≤ X ≤ 1).
Q33 Suppose we produce the Taylor series T (x) for some f (x) centered at x = 10.
b If the errors of the Taylor polynomials Tn (2) converge to 0 as n goes to ∞ for some x, must
c If you wanted to approximate f (7) as accurately as possible, which would be more useful, a
Taylor polynomial or a Taylor series?
Q34 Suppose we have a function f (x) and two different numbers a and b. Suppose further that the
Taylor series for f (x) centered at a is equal to the Taylor series for f (x) centered at b. What
can you say about the domain of this Taylor series?
Chapter 4
Multivariable Functions
This chapter introduces functions of more than one variable. We construct the higher dimensional spaces
needed for their domains, we produce tools to visualize them, and we compute their rates of change.
Contents
4.1 Three-Dimensional Coordinate Systems
4.2 Functions of Several Variables
4.3 Limits and Continuity
4.4 Partial Derivatives
4.5 Linear Approximations
Section 4.1
Suppose we wanted to understand the growth rate of a species of bacteria. We could grow several
dozen cultures and take a series of measurements of size s at times t of each. Each measurement is an
ordered pair (t, s). We can plot these pairs in a coordinate plane to get a visual sense of how growth
occurs over time. We might even fit a function that approximates s as a function of t. What if we wanted
to understand the role of some other measurement, like temperature, light, or the availability of various
food sources? We could grow many cultures in different conditions. Now a single measurement has three
or more pieces of information. While we could strip these out and plot our data on a temperature/size
coordinate plane, we risk missing important relationships with the other variables. In order to take
advantage of the visual and computational benefits of a coordinate system, we must be prepared to
work with a coordinate system of more than two variables.
Question 4.1.1
How Do Cartesian Coordinates Extend to Higher Dimensions?
The best way to define a higher-dimensional coordinate system is to extrapolate from the coordinate
plane. This way we don’t need to remember a set of novel and arbitrary rules, and our two-dimensional
experience will be a guide to us in dimensions where we have no visual intuition.
Recall how we constructed the Cartesian plane.
3 Axes consist of the points displaced in only one direction.
5 Either displacement can happen first.
6 Each point has exactly one ordered pair that refers to it.
1 Assign origin and three directions (x, y, z).
Question 4.1.2
How Do We Establish Which Direction Is Positive in Each Axis?
The choice of which direction is positive is arbitrary. However, it is important that we all make the
same choice, or our visualizations will be incompatible. In two dimensions, we agree that the positive
y-axis is counterclockwise from the positive x-axis. This will not work in three dimensions. Suppose the
positive y-axis is counterclockwise from the positive x-axis in three-space. If you rotate your point of
view to see the axes from the other side, the positive y-axis is now clockwise from the positive x-axis.
Thus the relative orientation of the positive x and y directions does not matter. You could pick a
different orientation, and just be looking at three-space from a different viewpoint.
The z direction is different. Once we’ve chosen a positive x and y direction, there are two equally
valid possible directions for positive z, pointing in opposite directions from each other. The choice here
matters, but it will be arbitrary. We agree to define the positive z direction by the right hand rule.
The right hand rule says that if you make the fingers of your right hand follow the (counterclockwise)
unit circle in the xy-plane, then your thumb indicates the direction of the positive z-axis.
Example 4.1.3
Drawing a Location in Three-Dimensional Coordinates
2 in the x direction
3 in the y direction
5 in the z direction.
Solution
We can begin by finding the point (2, 0, 0), which lies on the x-axis two units from the origin, and
the point (0, 3, 0), which lies on the y-axis three units from the origin. Along with the origin itself, these points
and (2, 3, 0) form a parallelogram. Now we need a displacement of 5 in the z direction. We can copy
the length and direction of this displacement from the segment from (0, 0, 0) to (0, 0, 5) on the z-axis. We
draw a segment of that length and direction from (2, 3, 0). The top of this segment is (2, 3, 5).
Remark
The extra lines we used to construct (2, 3, 5) are not just useful for guaranteeing accuracy; they also help
our audience to correctly visualize the location we mean to plot. When we project three-space onto a
flat page, each point on the page represents infinitely many points stretching into the background. If we
only draw an isolated point, which of these are we representing? Lines like the ones we produced in this
example trick a viewer’s brain into visualizing the correct three-dimensional location in our flat diagram.
Solution
The procedure here is the same, except that the displacements in the x and z directions are negative.
Thus when producing these displacements, we travel backward along their axes.
Question 4.1.4
How Do We Measure Distance in Three-Space?
Since coordinate displacements in two-space are perpendicular, we compute the distance to a point
using the Pythagorean theorem. This reasoning extends to higher dimensions, but we need to build the
correct length using two or more right triangles.
Theorem
The distance from the origin to the point (x, y, z) is given by the Pythagorean Theorem
\[ D = \sqrt{x^2 + y^2 + z^2} \]
We first compute the distance from the origin to (x, y, 0) using a right triangle in the xy-plane.
The right triangle with the vertices (0, 0, 0), (x, y, 0) and (x, y, z) allows us to apply the Pythagorean
theorem again.
\[ D^2 = \left(\sqrt{x^2 + y^2}\right)^2 + z^2 \]
If neither of the points is the origin, we can compute the displacements by subtraction. This is a
natural extension of the two-space distance formula.
Theorem
The distance from the point $(x_1, y_1, z_1)$ to the point $(x_2, y_2, z_2)$ is given by
\[ D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2} \]
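The distance formula translates directly into a few lines of code. The following Python sketch is only an illustration; the function name `distance` and the sample points are our own choices, not part of the text.

```python
import math

def distance(p, q):
    """Distance between two points given as coordinate tuples of equal length."""
    # Subtract coordinate by coordinate, square, add, and take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

origin = (0, 0, 0)
print(distance(origin, (2, 3, 6)))      # 7.0, since 4 + 9 + 36 = 49
print(distance((1, 2, 3), (3, 5, 9)))   # 7.0, the displacements are again 2, 3 and 6
```

Because the function never assumes there are exactly three coordinates, the same sketch also handles points in the plane.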
Question 4.1.5
What Is a Graph?
A well-prepared calculus student has learned to understand the graphs of many equations: lines,
circles, parabolas. The definition of a graph, on the other hand, is often discarded after a few exercises
of plotting points by hand. The definition is worth recalling. It applies to a space of any dimension.
Definition
The graph of an implicit equation is the set of points whose coordinates satisfy that equation. In other
words, the two sides are equal when we plug the coordinates in for x, y and z.
This definition allows us to immediately understand the graphs of some equations. The graph of
the following equation consists of the points that, when plugged into a specific distance formula and
squared, give a result of 9. This is a sphere.
Example
The graph of
\[ x^2 + (y - 4)^2 + (z + 1)^2 = 9 \]
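Comparing this equation with the distance formula, the left side is the squared distance from (x, y, z) to (0, 4, −1), so the graph is the sphere of radius 3 centered at that point. The definition of a graph gives a simple membership test, sketched below in Python; the helper name `on_sphere` and the test points are assumptions made for illustration.

```python
def on_sphere(x, y, z):
    """True when (x, y, z) satisfies x^2 + (y - 4)^2 + (z + 1)^2 = 9."""
    return x**2 + (y - 4)**2 + (z + 1)**2 == 9

print(on_sphere(3, 4, -1))   # True: 9 + 0 + 0 = 9
print(on_sphere(1, 6, 1))    # True: 1 + 4 + 4 = 9
print(on_sphere(0, 0, 0))    # False: 0 + 16 + 1 = 17
```

The exact comparison is safe here only because the inputs are integers; with decimal inputs a tolerance would be needed.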
Example 4.1.6
Graphing an Equation with Two Free Variables
Solution
The naive approach would have us seek out the point marked with 3 on the y-axis. However, in two-
space, we know that the graph would be a horizontal line, not just the point (0, 3). Why is this? Any
point of the form (x, 3) satisfies the equation y = 3. Similarly, any point of the form (x, 3, z) in three-
space satisfies y = 3. These are all the points that can be reached from (0, 3, 0) by displacements in
the x and z directions. They create a plane through (0, 3, 0) parallel to the x and z axes.
Much as lines are the simplest and most fundamental one-dimensional objects, planes are the simplest
and most fundamental two-dimensional objects. In addition to coordinate axes, 3-dimensional space has
3 coordinate planes.
1 The graph of z = 0 is the xy-plane.
2 The graph of x = 0 is the yz-plane.
3 The graph of y = 0 is the xz-plane.
Remark
Planes extend forever but our pictures of them cannot. Notice that graphing software cuts them off
parallel to the axes they contain. The resulting images are parallelograms. This is a good practice when
drawing planes by hand too. It suggests the proper orientation to the viewer, despite the limitations of
a flat visualization.
Example 4.1.7
Graphing an Equation with One Free Variable
Solution
We should recognize this as the equation of a parabola. If we ignore the variable y, we can graph this
equation in the xz-plane. What does the absence of y in the equation mean? If we follow
the definition of a graph, the value of y has no effect on whether a point lies on the graph or not. We
can take the parabola in the xz-plane, and project it in the y direction to obtain a surface called a
parabolic cylinder.
\[ z = x^2 - 3 \]
Question 4.1.8
What Do the Graphs of Implicit Equations Look Like Generally?
Notice that the graph of an implicit equation in the plane is generally one-dimensional (a curve),
whereas the graph of an implicit equation in three-space is generally two-dimensional (a surface).
Question 4.1.9
What Is the Slope-Intercept Equation of a Plane?
Unlike a line, a non-vertical plane has two slopes. One measures rise over run in the x-direction, the
other in the y-direction.
Equation
A plane with z-intercept (0, 0, b) and slopes $m_x$ and $m_y$ in the x and y directions has equation
\[ z = m_x x + m_y y + b. \]
Example 4.1.10
Writing the Equation of a Plane
Write the equation of a plane with intercepts (4, 0, 0), (0, 6, 0) and (0, 0, 8).
Solution
From the point (4, 0, 0) to the point (0, 0, 8), the plane rises by 8 while x is reduced by 4. This gives a
slope in the x direction.
\[ m_x = \frac{8 - 0}{0 - 4} = -2. \]
Similarly,
\[ m_y = \frac{8 - 0}{0 - 6} = -\frac{4}{3}. \]
The point (0, 0, 8) is on the z-axis, and so indicates that the z-intercept is 8. Combining these, we
conclude the plane has equation:
\[ z = -2x - \frac{4}{3}y + 8 \]
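The same rise-over-run computation can be checked mechanically. The sketch below is one possible Python version, using exact fractions so that the slope −4/3 is not rounded; the function name `plane_from_intercepts` is hypothetical, and it assumes the x- and y-intercepts are nonzero.

```python
from fractions import Fraction

def plane_from_intercepts(a, b, c):
    """Return (m_x, m_y, z_intercept) for the plane through
    (a, 0, 0), (0, b, 0) and (0, 0, c), with a and b nonzero."""
    m_x = Fraction(c - 0, 0 - a)   # rise over run from (a, 0, 0) to (0, 0, c)
    m_y = Fraction(c - 0, 0 - b)   # rise over run from (0, b, 0) to (0, 0, c)
    return m_x, m_y, c

m_x, m_y, z0 = plane_from_intercepts(4, 6, 8)
print(m_x, m_y, z0)   # -2 -4/3 8

# Each intercept should satisfy z = m_x*x + m_y*y + z0.
for (x, y, z) in [(4, 0, 0), (0, 6, 0), (0, 0, 8)]:
    assert m_x * x + m_y * y + z0 == z
```

Plugging the three intercepts back into the equation, as the final loop does, is a quick way to catch a sign error in either slope.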
Main Idea
Question 4.1.11
How Do We Extrapolate to Even Higher Dimensions?
The more measurements we take of each observation, the more dimensions we need to plot the data we
have produced. Extrapolating from three-space to even higher dimensions introduces no new difficulties,
except that we cannot visualize the result. We can use a coordinate system to describe a space with
more than 3 dimensions. k-dimensional space can be defined as the set of points of the form
\[ P = (x_1, x_2, \ldots, x_k). \]
Theorem
There is no right hand rule for higher dimensions, because we can’t draw these spaces anyway.
Section 4.1
Exercises
Summary Questions
Q2 What is the right hand rule and what does it tell you about a three-dimensional coordinate
system?
Q3 In three-space, what is the y-axis? What are the coordinates of a general point on it?
Q4 In three-space, what is the xz-plane? What are the coordinates of a general point on it? What
is its equation?
4.1.1
Q7 Suppose that instead of denoting each point P = (x, y) in R2 by its displacements from the
origin in the x- and y-directions, we denote it by P = (d, m) where d is its distance from the
origin, and m is the slope of the line through P and the origin. What problems could arise from
adopting this convention?
Q8 Suppose the x and y axes were not perpendicular. Could we still assign coordinates to each point
by its x and y displacements from the origin? Demonstrate with a diagram.
4.1.2
Q9 Which of the following depictions of the xy-plane are consistent with the usual orientation, and
which are backwards?
a The positive x-axis points up, and the positive y-axis points left.
b The positive x-axis points down, and the negative y-axis points right.
c The positive x-axis points left, and the positive y-axis points up.
d The negative x-axis points right, and the positive y-axis points down.
e The positive x-axis points up and to the right, and the positive y-axis points down and to
the right.
Q10 Suppose we draw the xy-plane on our paper in the standard way, and our paper is lying on a
table. Does the z-axis point down into the table or up out of the table?
4.1.3
a (6, 1, 2)
b (−3, 0, 0)
c (2, −1, 4)
d (0, 3, 5)
a (−4, 0, 0)
b (3, −2, 0)
c (4, 5, −3)
d (−1, 3, 4)
4.1.4
Q15 Compute the distance between (10, 12, 109) and (11, 9, 105).
Q16 Compute the distance between (53, 42, 9) and (43, 78, 2).
4.1.5
Q17 Does the point (4, 3, 8) lie on the graph of $z = x^2 - 2$? Explain how you know.
Q18 Does (2, 2, 1) lie on the graph of $x^2 + y^2 + z^2 = 9$? Explain how you know.
Q20 The point (2, 3, 4) lies on the graph ax + ay − z = 26. What is the value of the number a?
Q21 Olivia says that the graph of (x − 2)(y − 3) = 0 in the xy-plane is the point (2, 3). Do you agree?
How would you explain it?
Q22 How is the graph of f (x, y, z)g(x, y, z) = 0 related to the graphs of f (x, y, z) = 0 and g(x, y, z) = 0?
4.1.6
Q23 Does the graph of z = 4 intersect the graph of z = 6? Explain both using geometry and algebra.
4.1.7
a x = −4
b $x^2 + y^2 = 9$
c $x^2 + 4x + y^2 + z^2 - 2z = 4$
4.1.8
Q29 What dimension would we expect the graph of an equation to be in 6-dimensional space?
Q30 What is the graph of $x^2 + y^2 = 0$ in the xy-plane? Is this an exception to our intuition about
the dimension of a graph?
Q31 Zoe and Muhammad both sketch the graph of $y = x^2$. Zoe’s graph is a curve. Muhammad’s is
a surface. Has one of them drawn the wrong graph? Explain.
Q32 In $\mathbb{R}^3$, what is the dimension of the intersection of the graphs $x^2 + y^2 = 25$ and z = 1? Can you
explain this in terms of our intuition about the dimension of a graph?
4.1.9
Q33 Suppose that y is a free variable in the equation of a plane. What does that tell us about $m_x$
and $m_y$?
Q34 Gabby is trying to find the equation of a plane P , but she doesn’t know any points on the xz-plane
or yz-plane. Instead she knows that P contains the points:
a Which of Gabby’s conclusions do you agree with and which do you disagree with? Why?
Q35 Suppose you intend to write the equation of the plane through A, B and C in slope-intercept
form. If A = (3, 5, 7) and B = (3, 2, 4), what value(s) of the y coordinate of C would make it
easiest to compute $m_x$?
Q36 Recall that we can write the equation of a line in $\mathbb{R}^2$ in point-slope form:
\[ y - y_0 = m(x - x_0) \]
where m is the slope and $(x_0, y_0)$ is a known point. This was especially useful in single-variable
calculus for writing equations of tangent lines.
a How would you expect to write the equation of the plane P through (2, 4, −6) with slopes
$m_x = \frac{1}{2}$ and $m_y = -3$?
b Does your answer to a actually pass through (2, 4, −6)? How do you know?
c Is your answer to a actually the equation of a plane? How do you know? Does it have the
correct slopes?
Q37 The plane P has slopes $m_x = 3$ and $m_y = -1$ and passes through (2, 5, −1).
b What is the z-intercept of P?
Q38 Given a plane with $m_x = 5$ and $m_y = 2$, we can conclude that the plane is steeper in the
x-direction than the y-direction. Is the x-direction the steepest direction we could travel in? If
not, what is?
4.1.10
Q39 Write the equation of a plane through (3, 0, 0), (0, 7, 0), and (0, 0, −1).
Q40 Write the equation of a plane with intercepts (2, 0, 0), (0, −2, 0), and (0, 0, 4).
Q41 Write the equation of a plane through (6, 4, 1), (6, 7, −2), and (8, 7, 1).
Q42 Write the equation of a plane through (2, 2, 1), (4, 2, 9), and (2, 0, 0).
Q43 Write the equation of a plane through (3, 4, 2), (5, 5, 6), and (7, 4, 6).
Q44 Write the equation of a plane through (1, 5, 2), (11, 5, 4), and (6, 3, −3).
4.1.11
Q45 Assuming you could draw in 4 dimensions, describe how you might construct the graph of
$x_1^2 + x_3^2 + x_4^2 = 25$ in $\mathbb{R}^4$.
Q46 Assuming you could draw in 4 dimensions, describe how you might construct the graph of $x_2 = x_3^2$
in $\mathbb{R}^4$.
Q49 The points (1, 0, 3) and (1, 4, 0) are both on the sphere S. What are the possible values for the
radius of S?
Q50 The graph of $x^2 + y^2 = 0$ in $\mathbb{R}^2$ is a point, not a curve. Use this idea to write an equation for the
intersection of the graphs f (x, y, z) = c and g(x, y, z) = d. What do you expect the dimension
of this intersection to be?
Q51 Suppose the x and y axes in $\mathbb{R}^2$ were not perpendicular. Would the distance formula still hold?
Demonstrate.
Section 4.2
If we want to understand the relationship between variables, a function is the gold standard. For
example, when we can write y as a function of x, then at each value of x, we simply need to plug in
the value and simplify the arithmetic. There is no chance that algebraic manipulation will lead us to
multiple values of y, or to an equation we cannot solve. Naturally, we want to understand this type of
relationship between more than two variables. Much like our investigation of n-space, we’ll begin by
adding one variable. After this initial step, extrapolating to more variables will be straightforward.
Question 4.2.1
What Is a Function of More than One Variable?
Definition
A function of two variables is a rule that assigns a number (the output) to each ordered pair of real
numbers (x, y) in its domain. The output is denoted f (x, y).
Some functions can be defined algebraically. If $f(x, y) = \sqrt{36 - 4x^2 - y^2}$ then
\[ f(1, 4) = \sqrt{36 - 4 \cdot 1^2 - 4^2} = 4. \]
Example 4.2.2
The Domain of a Function
Identify the domain of $f(x, y) = \sqrt{36 - 4x^2 - y^2}$.
Solution
The only obstacle to evaluating this function is that the value under the square root might be negative.
We can write an inequality to express this and solve.
\[ 36 - 4x^2 - y^2 \geq 0 \]
\[ 36 \geq 4x^2 + y^2 \]
\[ 1 \geq \frac{x^2}{9} + \frac{y^2}{36} \]
These are the points on or inside an ellipse whose intercepts are (±3, 0) and (0, ±6).
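A domain condition like this one is easy to test point by point. The Python sketch below is an illustration under our own naming (`in_domain` and `f` are not from the text): it checks the inequality before taking the square root.

```python
import math

def in_domain(x, y):
    """True when the expression under the square root is nonnegative."""
    return 36 - 4 * x**2 - y**2 >= 0

def f(x, y):
    """Evaluate f(x, y) = sqrt(36 - 4x^2 - y^2) on its domain."""
    if not in_domain(x, y):
        raise ValueError("(x, y) is outside the domain of f")
    return math.sqrt(36 - 4 * x**2 - y**2)

print(f(1, 4))            # 4.0
print(in_domain(3, 0))    # True: the point lies on the ellipse itself
print(in_domain(3, 1))    # False: 36 - 36 - 1 is negative
```

Note that the boundary point (3, 0) is accepted, since the inequality is not strict.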
Main Idea
When solving for the domain of an algebraic function, we look for the same obstacles to evaluating the
function that we do for one-variable functions.
Expressions in a denominator cannot be 0 (including built-in fractions like $\tan x = \frac{\sin x}{\cos x}$)
Application 4.2.3
Temperature Maps
Many useful functions cannot be defined algebraically. There is a function T (x, y) which gives
the temperature at each longitude and latitude (x, y) on Earth. No pair (x, y) has more than one
temperature, and no pair fails to have a temperature. Still there is no hope of producing an expression
that computes T for any x and y. Mathematically (though perhaps not meteorologically) this function
is arbitrary.
T (−71.06, 42.36) = 50
T (−83.74, 42.28) = 41
T (−84.38, 33.75) = 59
This function is represented graphically by using color to portray the value of T at each point.
Application 4.2.4
Digital Images
A digital image is made up of pixels, each with a different color. In many modern images, these
pixels are too small to see. The color of each pixel is a function of that pixel’s location. Since colors are
harder to define numerically, we can consider the simpler case: where each pixel is a different shade of
gray. In this case we have a brightness function B(x, y) where the output is a number that represents
the brightness of the pixel at the coordinates (x, y).
Figure: A digital image and its pixel coordinates (axis labels 687 and 1024).
Remark
The brightness function differs from other functions we’ve studied in one key way. It is only defined for
(x, y) where x and y are integers. Other examples can take any real numbers as coordinates. This makes
our usual calculus methods impossible. We cannot get arbitrarily close to a point in order to compute a
limit. All other points are at least 1 unit away. However, if we are willing to settle for approximations,
we can apply calculus and get useful results.
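To make the idea of settling for approximations concrete, here is a minimal Python sketch. The 4-by-4 grid of brightness values is invented for illustration, and the helper names are hypothetical; the point is only that the smallest available step is one pixel, so a slope must be estimated from neighboring pixels rather than from a limit.

```python
# A hypothetical 4x4 grayscale image: B[y][x] is the brightness at pixel (x, y).
B = [
    [10, 20, 30, 40],
    [12, 24, 36, 48],
    [14, 28, 42, 56],
    [16, 32, 48, 64],
]

def brightness(x, y):
    """Look up the brightness at integer pixel coordinates (x, y)."""
    return B[y][x]

def rate_in_x(x, y):
    """Approximate the rate of change in the x direction from the
    two horizontal neighbors, since h cannot shrink below one pixel."""
    return (brightness(x + 1, y) - brightness(x - 1, y)) / 2

print(rate_in_x(1, 2))   # 14.0, from (42 - 14) / 2
```

This version only works for pixels that have a neighbor on each side; pixels on the edge of the image would need a one-sided difference instead.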
Question 4.2.5
What Is the Graph of a Two-Variable Function?
A graph is our most important way to visualize a function. The graph of a one-variable function
is an object in two-space. One dimension measures the input variable. The other measures the output.
For a two-variable function, the graph lies in three-space.
Definition
The graph of a function f (x, y) is the set of all points (x, y, z) that satisfy
z = f (x, y).
The height z above a point (x, y) represents the value of the function at (x, y). In this figure,
f (1, 4) is equal to the height of the graph above (1, 4, 0).
Figure: The graph $z = \sqrt{36 - 4x^2 - y^2}$
Question 4.2.6
How Do We Visualize a Graph in Three-Space?
Three-space is harder to visualize than two-space. What’s more, plotting points is more arduous with
two dimensions of inputs. In the absence of computer graphics, mathematicians have used a variety of
visualization tools.
Definition
A level set of a function f (x, y) is the graph of the equation f (x, y) = c for some constant c. For a
function of two variables this graph lies in the xy-plane and is called a level curve.
Example
Level curves take their shape from the intersection of z = f (x, y) and z = c. Seeing many level
curves at once can help us visualize the shape of the graph.
Figure: The graph z = f (x, y), the planes z = c, and the level curves
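As a numerical illustration, consider the function $f(x, y) = \sqrt{36 - 4x^2 - y^2}$ from earlier in this section (the choice of function, the heights c and the helper names are ours). Squaring f (x, y) = c gives $4x^2 + y^2 = 36 - c^2$, an ellipse, so we can sample points on a level curve and confirm that the graph has height c above each of them.

```python
import math

def f(x, y):
    return math.sqrt(36 - 4 * x**2 - y**2)

def level_curve_points(c, samples=8):
    """Sample points on the level curve f(x, y) = c, for 0 < c < 6.
    Squaring gives 4x^2 + y^2 = 36 - c^2, an ellipse we can parametrize."""
    r = math.sqrt(36 - c**2)
    angles = (2 * math.pi * k / samples for k in range(samples))
    return [(r / 2 * math.cos(t), r * math.sin(t)) for t in angles]

for c in (1, 3, 5):
    heights = [f(x, y) for x, y in level_curve_points(c)]
    # Every sampled point should sit at height c, up to rounding error.
    print(c, all(math.isclose(h, c) for h in heights))
```

The ellipses shrink as c grows toward 6, which matches the picture of a dome whose horizontal slices get smaller near the top.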
Example 4.2.7
Drawing Level Curves
Solution
The level sets are the points where the temperature has a certain value. Since the colors represent ranges
of temperatures, it’s difficult to pick out the level sets within that range. However, at the transition from
one color to the next, we know that the temperature is equal to the cutoff temperature between those
ranges. The picture below shows a reasonable attempt to sketch three level curves in white. Notice
that the level curves (especially the one between green and yellow) are not connected, and that drawing
them in perfect detail is beyond the ability of a human.
Example 4.2.8
Using Level Curves to Describe a Graph
What features can we discern from the level curves of this topographical map?
Solution
Figure: The topographical map with locations marked A, B, C and D (three points are marked D).
The points marked D are in the middle of a series of rings of level curves. These are either the
tops of hills or (less likely given the context) the bottoms of valleys.
Example 4.2.9
A Cross Section
Definition
The intersection of a plane with a graph is a cross section. A level curve is a type of cross section, but
not all cross sections are level curves.
Find the cross section of $z = \sqrt{36 - 4x^2 - y^2}$ at the plane y = 1.
Figure: The y = 1 cross section of $z = \sqrt{36 - 4x^2 - y^2}$
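One way to check a cross section is by substitution: setting y = 1 in the equation gives $z = \sqrt{35 - 4x^2}$, a curve in the plane y = 1. The Python sketch below compares this one-variable formula against the original function at a few sample values of x; the function names are our own.

```python
import math

def f(x, y):
    return math.sqrt(36 - 4 * x**2 - y**2)

def cross_section_y1(x):
    """Height of the y = 1 cross section: z = sqrt(35 - 4x^2)."""
    return math.sqrt(35 - 4 * x**2)

# The two formulas should agree wherever both are defined.
for x in (-2, -1, 0, 1, 2):
    assert math.isclose(f(x, 1), cross_section_y1(x))

print(cross_section_y1(0))   # sqrt(35), about 5.916
```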
Example 4.2.10
Converting an Implicit Equation to a Function
Definition
We sometimes call an equation in x, y and z an implicit equation. Often in order to graph these, we
convert them to explicit functions of the form z = f (x, y)
Question 4.2.11
How Does this Apply to Functions of More Variables?
We can define functions of three variables as well, denoting them f (x, y, z). For even more variables,
we use x1 through xn . The definitions of this section can be extrapolated as follows.
Variables    2                      3                        n
Function     f(x, y)                f(x, y, z)               f(x_1, ..., x_n)
Domain       subset of R^2          subset of R^3            subset of R^n
Graph        z = f(x, y) in R^3     w = f(x, y, z) in R^4    x_{n+1} = f(x_1, ..., x_n) in R^{n+1}
Level Sets   level curve in R^2     level surface in R^3     level set in R^n
Observation
$x^2 + y^2 + z^2 = 25$        $f(x, y) = \pm\sqrt{25 - x^2 - y^2}$
$F(x, y, z) = x^2 + y^2 + z^2$
$F(x, y, z) = 25$
Section 4.2
Exercises
Summary Questions
4.2.1
Q7 Is $f(x, y) = \pm\sqrt{4x - y}$ a function? Explain.
\[ f(x, y) = \begin{cases} \sqrt{y} & \text{if } y \geq 0 \\ \sqrt{x} & \text{if } x \geq 0 \end{cases} \]
4.2.2
Q9 Compute the domain of $f(x, y) = \frac{1}{x + y}$.
Q10 What is the domain of $f(x, y) = \frac{1}{x^2 + y^2}$?
Q11 What is the domain of $g(x, y) = \sqrt{x^3 + y^2 - 25}$?
Q13 What is the domain of $f(x, y) = \frac{\sqrt{x + 3}}{y^2 - x}$?
Q14 Compute the domain of $h(x, y) = \frac{4x}{y - \ln x}$.
4.2.3
Q15 On the temperature map, we saw T (−84.38, 33.75) = 59. Is T (−84.38, 35.75) greater than or
less than 59?
Q16 On the temperature map, we saw T (−83.74, 42.28) = 41. Is T (−93.74, 42.28) greater than or
less than 41?
Q17 What range of temperatures are found in South Dakota? In which parts of the state are the
extreme temperatures found?
Q18 Can you use this diagram to approximate T (−61.06, 42.36)? Explain.
4.2.4
Q20 In our blow-up of the digital image, we see Mona Lisa’s eye is near the coordinates (369, 800).
Where is her other eye?
4.2.5
Q21 Can the points (1, 3, 5) and (1, 3, 7) both be on the graph of z = f (x, y)? Explain.
Q22 If the graph z = f (x, y) is below the xy-plane, what does that tell us about f (x, y)?
Q24 What is the significance of the points where the graph z = f (x, y) intersects the xy-plane?
4.2.6
Q27 Describe the level curves of $\frac{x^2}{y}$.
y
Q28 Describe the level curves of g(x, y) = ex .
Q29 Give the equation of the level curve of $f(x, y) = x^3 + y^3$ that passes through (4, 2).
Q30 Give the equation of the level curve of $g(x, y) = 17x^2 - 3xy + y^3$ that contains the point (1, 2).
Q31 Given a function f (x, y), how many level curves might pass through (3, 7)?
Q32 If the points (x1 , y1 ) and (x2 , y2 ) lie on the same level curve of h(x, y), what are the possible
4.2.7
Q33 In our level curves on the temperature map, what physical meaning can we take from the fact
that the green-yellow and red-orange level curves are closer together in Kansas than they are
farther east?
Q34 Explain why it makes sense physically that level curves of a temperature function would be
complicated and disconnected.
4.2.8
Q35 In the topographical map, what can we deduce from the fact that no level curves cross the farm
fields in the lower center of the map?
Q36 Explain why it makes physical sense that there are level curves alongside the creeks in this map.
4.2.9
Q37 Give an equation for the y = 2 cross-section of the graph z = f (x, y) where $f(x, y) = x^3 + y^3$.
i. Give the equation of the y = 0 cross section of P . What is this graph? What is the
significance of the various parts of its equation?
ii. Give the equation of the x = 0 cross section of P . What is the significance of the various
parts of its equation?
iii. Give the equation and describe the set of all level curves of f .
Q39 If the cross sections of z = f (x, y) in the planes y = b are identical for all values of b, what does
that tell us about f ?
Q40 If f (x, y) is a function that satisfies f (x, y) = f (x, −y) for all x and y, how will this be reflected
4.2.10
ln y √
Q44 Explain why it would be difficult to write z − xz = 5 + x as an explicit function of the form
z = f (x, y). Choose a better dependent variable and write that variable as a function of the
other two.
4.2.11
b Where does a level set of f lie? What does a typical level set look like?
Q48 Show how the graph of an explicit function $x_{n+1} = f(x_1, x_2, \ldots, x_n)$ can be converted to the
level set of an (n + 1)-variable function.
Q49 Let $f(x, y) = x^2$. Sketch the graph of z = f (x, y). What is the role of y in this graph?
d What do the level sets tell you about the graph z = f (x, y)?
Q51 Consider the implicit equation: x = sin z.
b Describe (in words) what the cross section of the graph in the $x = \frac{1}{2}$ plane looks like.
Section 4.3
Limits of multivariable functions are conceptually similar to limits of one-variable functions. However, even
though the requirement is the same, it is much harder to satisfy. Since there are so many more ways
to approach a given point in a higher dimensional space, there are more nearby points to check to see
whether the function is actually approaching the proposed limit.
Question 4.3.1
What Is the Limit of a Function?
Definition
We write
\[ \lim_{(x,y)\to(a,b)} f(x, y) = L \]
if we can make the values of f stay arbitrarily close to L by restricting to a sufficiently small neighborhood
of (a, b).
Proving a limit exists requires a formula or rule. For any amount of closeness required (ϵ), you must
be able to produce a radius δ around (a, b) sufficiently small to keep |f (x, y) − L| < ϵ. For this reason,
we will not prove that any limits exist. We will present three examples of functions whose limit does
not exist.
Example 4.3.2
A Limit That Does Not Exist
Show that $\lim_{(x,y)\to(0,0)} \frac{x^2 - y^2}{x^2 + y^2}$ does not exist.
Solution
Let’s define $f(x, y) = \frac{x^2 - y^2}{x^2 + y^2}$. We will approach the point (0, 0) from two different directions. If
we approach along the x-axis, then the points on our path have the form (x, 0). When we plug these
into the function, the value is $f(x, 0) = \frac{x^2 - 0}{x^2 + 0}$. This is equal to 1 for all values of x except 0, so as x
approaches 0, the values of f are arbitrarily close (in fact exactly equal) to 1.
On the other hand, if we approach (0, 0) along the y-axis, then the points have the form (0, y). When
we plug these into the function, the value is $f(0, y) = \frac{0 - y^2}{0 + y^2}$. This is equal to −1 for all values of y
except 0, so as y approaches 0, the values of f are arbitrarily close (in fact exactly equal) to −1.
What does this say about the limit of f ? We have $\lim_{(x,y)\to(0,0)} f(x, y) \neq 1$ because there are points on the
y-axis that do not give values close to 1, but any neighborhood of (0, 0) includes some points on the y-axis.
Similarly, $\lim_{(x,y)\to(0,0)} f(x, y) \neq -1$. If we tried to argue that the limit had any other value, the x-axis
and y-axis would both present a problem. Thus this limit does not exist.
We can identify the problem behavior in the graph of z = f (x, y). As the graph approaches the
origin, there are points of all heights between −1 and 1. Specifically we can see the line above the x-axis
and below the y-axis. No amount of closeness can exclude this range of values.
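A quick numerical experiment shows the same disagreement. The Python sketch below evaluates $f(x, y) = \frac{x^2 - y^2}{x^2 + y^2}$ at points approaching the origin along each axis; the step sizes are arbitrary choices for illustration, and of course a table of values is evidence rather than a proof.

```python
def f(x, y):
    """f(x, y) = (x^2 - y^2) / (x^2 + y^2), undefined at the origin."""
    return (x**2 - y**2) / (x**2 + y**2)

# Approach (0, 0) along each axis with shrinking step sizes.
for h in (0.1, 0.01, 0.001):
    print(h, f(h, 0), f(0, h))   # the second column is 1.0, the third is -1.0
```

However small h becomes, the two columns never move toward a common value.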
We might take away the idea that checking limits of two-variable functions requires checking in both
the x-direction and the y-direction. Unfortunately, even that is not sufficient.
Example 4.3.3
Another Limit That Does Not Exist
Show that $\lim_{(x,y)\to(0,0)} \frac{xy}{x^2 + y^2}$ does not exist.
Solution
Let $f(x, y) = \frac{xy}{x^2 + y^2}$. We can check the values of this function on the x- and y-axes. Except at (0, 0),
f (x, 0) = 0 and f (0, y) = 0. However, not all the points close to (0, 0) lie on an axis. Suppose we work
with the points on another line: y = mx. These points have the form (x, mx). We can evaluate f on
this line.
\begin{align*}
f(x, mx) &= \frac{(x)(mx)}{x^2 + (mx)^2} \\
&= \frac{mx^2}{(m^2 + 1)x^2} \\
&= \frac{m}{m^2 + 1} \quad \text{(except at } (0, 0)\text{)}
\end{align*}
Thus there are points arbitrarily close to (0, 0) on which f is valued as low as −0.5 (m = −1) and as
high as 0.5 (m = 1). The limit does not exist.
Figure: The graph $z = \frac{xy}{x^2 + y^2}$ and the line of height $\frac{1}{2}$ over x = y.
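The computation along y = mx can be spot-checked numerically. In the Python sketch below, the slopes and step sizes are arbitrary sample values chosen by us; the check confirms that on each line the function holds the constant value $\frac{m}{m^2 + 1}$ no matter how close we get to the origin.

```python
import math

def f(x, y):
    """f(x, y) = xy / (x^2 + y^2), undefined at the origin."""
    return x * y / (x**2 + y**2)

# Along y = m*x the value should be m / (m^2 + 1), however small x is.
for m in (-1, 0, 0.5, 1, 3):
    for x in (0.1, 0.001, 1e-6):
        assert math.isclose(f(x, m * x), m / (m**2 + 1), abs_tol=1e-12)
    print(m, m / (m**2 + 1))
```

Since different slopes give different constants, no single value can serve as the limit.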
We might take away the idea that checking limits of two-variable functions requires checking along
each line through the point in question. Unfortunately, even that is not sufficient.
Example 4.3.4
Yet Another Limit That Does Not Exist
Show that $\lim_{(x,y)\to(0,0)} \frac{xy^2}{x^2 + y^4}$ does not exist.
Solution
Let $f(x, y) = \frac{xy^2}{x^2 + y^4}$. We can check the values of this function on the x- and y-axes. Except at (0, 0),
f (x, 0) = 0 and f (0, y) = 0. We can also check the values along a line of the form y = mx.
\begin{align*}
f(x, mx) &= \frac{(x)(mx)^2}{x^2 + (mx)^4} \\
&= \frac{m^2 x^3}{x^2 (1 + m^4 x^2)} \\
\lim_{x \to 0} f(x, mx) &= \lim_{x \to 0} \frac{m^2 x^3}{x^2 (1 + m^4 x^2)} \\
&= \lim_{x \to 0} \frac{m^2 x}{1 + m^4 x^2} \\
&= 0
\end{align*}
Thus along each line, the values of f approach 0 as we approach the origin. However, we have not
considered paths that are not lines. Consider the parabola $x = y^2$. Points on this parabola have the form
$(y^2, y)$. We compute the values on this parabola.
\begin{align*}
f(y^2, y) &= \frac{(y^2)(y)^2}{(y^2)^2 + y^4} \\
&= \frac{y^4}{2y^4}
\end{align*}
For any point on this parabola except the origin f has a value of $\frac{1}{2}$. Thus f takes values of $\frac{1}{2}$ and 0 in
any neighborhood of (0, 0), meaning the limit does not exist.
Figure: The graph $z = \frac{xy^2}{x^2 + y^4}$, which limits to 0 along any line through the origin, but has height $\frac{1}{2}$
over the parabola $x = y^2$
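The contrast between straight and curved paths shows up clearly in a small table of values. The following Python sketch is illustrative only: the line y = 2x and the step sizes are our own sample choices. It evaluates $f(x, y) = \frac{xy^2}{x^2 + y^4}$ at points approaching the origin along that line and along the parabola $x = y^2$.

```python
def f(x, y):
    """f(x, y) = x*y^2 / (x^2 + y^4), undefined at the origin."""
    return x * y**2 / (x**2 + y**4)

for t in (0.1, 0.01, 0.001):
    along_line = f(t, 2 * t)       # a point on the line y = 2x
    along_parabola = f(t**2, t)    # a point on the parabola x = y^2
    print(t, along_line, along_parabola)
```

The values on the line shrink toward 0 as t does, while the values on the parabola stay at one half (up to rounding), which is the behavior the solution describes.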
We take away from these exercises that establishing the value of a multi-variable limit cannot be
reduced to computing a single-variable limit, or even a family of single-variable limits. The formal
arguments that establish multi-variable limits are more advanced and beyond the scope of this text.
Question 4.3.5
What Tools Apply to Multi-Variable Limits?
The limit laws from single-variable limits transfer comfortably to multi-variable functions.
1 Sum/Difference Rule
2 Constant Multiple Rule
3 Product/Quotient Rule
These rules allow us to compute limits of complicated functions from simpler ones. How do we come
by those simpler limits in the first place? We can apply the kind of advanced arguments we alluded to
earlier. Another tool is the squeeze theorem.
then
\[ \lim_{(x,y)\to(a,b)} f(x, y) = L. \]
Question 4.3.6
What Is a Continuous Function?
Definition
In a rigorous development of calculus, we compute limits and use them to show that functions are
continuous. Given that evaluating limits is beyond our current means, we will reverse the process. Rather
than worrying about how to prove the following theorem, we will assume it is true and use it to evaluate
limits.
Theorem
Polynomials, roots, trig functions, exponential functions and logarithms are continuous on their
domains.
Sums, differences, products, quotients and compositions of continuous functions are continuous
on their domains.
The limit of a continuous function is equal to the value of the function. When we need to compute
a limit of these functions, we’ll just evaluate them instead. Why didn’t this work in our examples? In
each of our examples, the function was a quotient of polynomials, but (0, 0) was not in the domain.
Remark
Limits, continuity and these theorems can all be extrapolated to functions of more variables.
Section 4.3
Exercises
Summary Questions
Section 4.4
Partial Derivatives
Goals:
The first task in developing calculus is to understand rates of change. In the single-variable case, we
ask how the dependent variable changes per unit of increase in the independent variable. With more
than one independent variable we must ask: what kind of increase do we mean? There is more than
one possible answer. Partial derivatives are the simplest and most intuitive rate of change.
Question 4.4.1
What Is the Rate of Change of a Multivariable Function?
Motivational Example
The force due to gravity between two objects depends on their masses and on the distance between
them. Suppose at a distance of 8,000 km the force between two particular objects is 100 newtons and
at a distance of 10,000 km, the force is 64 newtons.
How much do we expect the force between these objects to increase or decrease per kilometer of
distance?
Solution
\[ \frac{64\,\text{N} - 100\,\text{N}}{10{,}000\,\text{km} - 8{,}000\,\text{km}} = -0.018\,\frac{\text{N}}{\text{km}} \]
Notice that the change in force is entirely attributable to the change in distance. That is because the
masses of the objects did not change. The only change in the independent variables is the 2,000 km
increase in distance.
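The arithmetic in this solution can be sketched in a few lines of Python. The variable names are illustrative choices, not notation from the text; the numbers are the two measurements given in the example.

```python
# Average rate of change of force with respect to distance,
# using the two measurements from the motivational example.
force_near, dist_near = 100.0, 8_000.0   # newtons, kilometers
force_far, dist_far = 64.0, 10_000.0     # newtons, kilometers

# Change in the dependent variable divided by change in the independent variable.
avg_rate = (force_far - force_near) / (dist_far - dist_near)
print(avg_rate)  # newtons per kilometer; negative because the force decreases
```

The negative sign records that the force drops as the objects move apart.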
Our goals in understanding multi-variable rates of change are guided by what we accomplished with
one variable. Derivatives of a single-variable function were a way of measuring the change in a function.
Recall the following facts about f ′ (x).
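1 The average rate of change of f between x0 and x is the slope of a secant line:
\[ \frac{f(x) - f(x_0)}{x - x_0} \]
2 The derivative f ′ (x) is defined as a limit of slopes:
\[ f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \]
3 The tangent line to y = f (x) at (x0 , y0 ) has the equation
\[ y - y_0 = f'(x_0)(x - x_0) \]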
In the physics example above, the rate of change was easier to understand because only one inde-
pendent variable is changing. That was an average rate of change, taken between two points. We now
develop a corresponding instantaneous rate of change. A partial derivative measures the rate of change
of a multivariable function as one variable changes, but the others remain constant.
Definition
\[ f_x(x, y) = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h} \]
and
\[ f_y(x, y) = \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h}. \]
We can see the idea of each partial derivative in the formula. fx compares the values of f at
(x + h, y) and (x, y). The x values change between these two points, but the y values remain constant.
The opposite is true in the formula for fy .
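To see the limit definition at work, here is a minimal numerical sketch. It evaluates the difference quotients for a shrinking h. The sample function g(x, y) = x²y and the helper names are assumptions made for illustration; they do not come from the text.

```python
def partial_x(f, x, y, h=1e-6):
    """Difference quotient in x: only the first input changes, y is held constant."""
    return (f(x + h, y) - f(x, y)) / h

def partial_y(f, x, y, h=1e-6):
    """Difference quotient in y: only the second input changes, x is held constant."""
    return (f(x, y + h) - f(x, y)) / h

def g(x, y):
    # Hypothetical sample function, chosen only for illustration.
    return x**2 * y

# As h shrinks, the quotients settle toward the partial derivatives at (3, 2).
for h in (0.1, 0.01, 0.001):
    print(h, partial_x(g, 3.0, 2.0, h), partial_y(g, 3.0, 2.0, h))
```

For this g the ordinary derivative rules give g_x = 2xy and g_y = x², so the printed values should approach 12 and 9.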
Notation
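The partial derivative of a function can be denoted in a variety of ways. Here are some equivalent notations:
\[ f_x \qquad \frac{\partial f}{\partial x} \qquad \frac{\partial z}{\partial x} \qquad \frac{\partial}{\partial x} f \qquad D_x f \]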
Example 4.4.2
Computing a Partial Derivative
Find \(\frac{\partial}{\partial y}\left(y^2 - x^2 + 3x \sin y\right)\).
Main Idea
Solution
We take an ordinary derivative, treating y as the variable and x as a constant. The familiar rules of
derivatives apply. The sum rule means we can differentiate term-by-term.
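\(\frac{\partial}{\partial y} y^2 = 2y\)
\(\frac{\partial}{\partial y} x^2 = 0\), since the \(x^2\) term is treated as constant.
\(\frac{\partial}{\partial y} 3x \sin y = 3x \cos y\), since 3x is treated as a constant multiple of the function sin y.
Together this gives the partial derivative
\[ \frac{\partial}{\partial y}\left(y^2 - x^2 + 3x \sin y\right) = 2y + 3x \cos y. \]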
Synthesis 4.4.3
Interpreting Derivatives from Level Sets
Below are the level curves f (x, y) = c for some values of c. Can we tell whether fx (−4, 1.25) and
fy (−4, 1.25) are positive or negative?
Figure: Some level curves of f (x, y)
Solution
As x increases and y remains constant, we travel to the right in the coordinate plane. Based on the
labeling of the level curves, this takes f from the value 40 to values between 40 and 50, meaning f
increases. Thus fx > 0.
Similarly, as y increases and x remains constant, we travel upwards in the coordinate plane. This
takes f from the value 40 to values between 30 and 40, meaning f decreases. Thus fy < 0.
Question 4.4.4
What Is the Geometric Significance of a Partial Derivative?
The partial derivative fx (x0 , y0 ) is realized geometrically as the slope of the line tangent to z =
f (x, y) at (x0 , y0 , z0 ) and traveling in the x direction. Since y is held constant, this tangent line lives in
y = y0 , a plane perpendicular to the y-axis. The line is tangent to the cross section of the graph with
that plane.
Example 4.4.5
Derivative Rules and Partial Derivatives
a \(f = \sqrt{xy}\) (on the domain x > 0, y > 0)
b \(f = \frac{y}{x}\)
c \(f = \sqrt{x + y}\)
d f = sin (xy)
Solution
a We can rewrite this as \(f(x, y) = \sqrt{x}\sqrt{y}\). In this setting, \(\sqrt{y}\) is a constant multiple. Thus
\[ f_x(x, y) = \frac{1}{2\sqrt{x}}\sqrt{y} \]
c We cannot rewrite this as \(f(x, y) = \sqrt{x} + \sqrt{y}\), because that is not a valid algebraic manipulation.
Instead we use the chain rule.
The outer function is \(\sqrt{x}\). Its derivative is \(\frac{1}{2\sqrt{x}}\).
The inner function is x + y. Its derivative is 1.
By the chain rule
\[ \frac{\partial}{\partial x}\sqrt{x + y} = \frac{1}{2\sqrt{x + y}}(1) = \frac{1}{2\sqrt{x + y}} \]
d We do not have an easy trig rule to break up products. We’ll use the chain rule again.
Main Idea
Sometimes we can detach the variable held constant from the changing variable using the rules of
algebra. When we can’t, we’ll often need a differentiation rule (usually the chain rule).
Question 4.4.6
What If We Have More than Two Variables?
We can also calculate partial derivatives of functions of more variables. All variables but one are
held constant.
Example
If
\[ f(x, y, z) = x^2 - xy + \cos(yz) - 5z^3, \]
then
\[ \frac{\partial f}{\partial y} = 0 - x - \sin(yz)\,z - 0 = -x - z\sin(yz). \]
Example 4.4.7
A Function of Three Variables
For an ideal gas, we have the law \(P = \frac{nRT}{V}\), where P is pressure, n is the number of moles of gas
molecules, T is the temperature, and V is the volume.
a Calculate \(\frac{\partial P}{\partial V}\).
b Calculate \(\frac{\partial P}{\partial T}\).
c (Science Question) Suppose we’re heating a sealed gas contained in a glass container. Does \(\frac{\partial P}{\partial T}\)
tell us how quickly the pressure is increasing per degree of temperature increase?
Solution
c No. \(\frac{\partial P}{\partial T}\) assumes n and V are constant, but glass expands as it heats. The volume of both the
container and the gas is increasing, not constant.
Question 4.4.8
How Do Higher Order Derivatives Work?
Taking a partial derivative of a partial derivative gives us a higher order partial derivative. We use
the following notation.
Notation
\[ (f_x)_x = f_{xx} = \frac{\partial^2 f}{\partial x^2} \]
Notation
\[ (f_x)_y = f_{xy} = \frac{\partial}{\partial y}\frac{\partial}{\partial x} f = \frac{\partial^2 f}{\partial y\,\partial x} \]
Remark
Notice the subscript notation and the ∂ notation express higher order derivatives in opposite order.
Subscripts are added to the right of f , while the differential operator is applied on the left of f .
Example 4.4.9
A Higher Order Partial Derivative
Solution
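\[
\begin{aligned}
\frac{\partial}{\partial y}\Big[\cos(3x + x^2 y)(3 + 2xy)\Big]
&= \left[\frac{\partial}{\partial y}\cos(3x + x^2 y)\right](3 + 2xy) + \cos(3x + x^2 y)\,\frac{\partial}{\partial y}(3 + 2xy) \\
&= -\sin(3x + x^2 y)(x^2)(3 + 2xy) + \cos(3x + x^2 y)(2x)
\end{aligned}
\]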
Question 4.4.10
Does Differentiation Order Matter?
Theorem
If f is defined on a neighborhood of (a, b) and the functions fxy and fyx are both continuous on that
neighborhood, then fxy (a, b) = fyx (a, b).
This readily generalizes to larger numbers of variables, and higher order derivatives. For example
fxyyz = fzyxy .
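As a hedged numerical check of the theorem, the sketch below uses f(x, y) = sin(3x + x²y), the function named in exercise Q23. The first partials f_x and f_y are written out by the chain rule (f_y is computed here, not quoted from the text), and each is then differentiated numerically in the other variable. The helper names, the step size, and the test point (0.7, −0.4) are arbitrary assumptions.

```python
import math

def f_x(x, y):
    # Chain rule in x: cos(3x + x^2 y) * (3 + 2xy)
    return math.cos(3 * x + x**2 * y) * (3 + 2 * x * y)

def f_y(x, y):
    # Chain rule in y: cos(3x + x^2 y) * x^2 (derived here, not quoted from the text)
    return math.cos(3 * x + x**2 * y) * x**2

def central_dx(g, x, y, h=1e-5):
    """Central difference approximation of the partial derivative of g in x."""
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def central_dy(g, x, y, h=1e-5):
    """Central difference approximation of the partial derivative of g in y."""
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

def f_xy_formula(x, y):
    # Closed form obtained with the product and chain rules.
    u = 3 * x + x**2 * y
    return -math.sin(u) * x**2 * (3 + 2 * x * y) + math.cos(u) * 2 * x

a, b = 0.7, -0.4                    # arbitrary test point
f_xy_num = central_dy(f_x, a, b)    # differentiate f_x with respect to y
f_yx_num = central_dx(f_y, a, b)    # differentiate f_y with respect to x
print(f_xy_num, f_yx_num, f_xy_formula(a, b))
```

All three printed numbers should agree to several decimal places, which is what the theorem predicts for a function with continuous mixed partials.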
Section 4.4
Exercises
Summary Questions
Q3 Can you think of an example where the partial derivative does not accurately model the change
in a function?
4.4.1
Q5 Give the equation of the line that lies in the plane x = 2 and is tangent to the graph \(z = xe^{3xy} + x\)
at the point (2, 0, 4). You may give your equation in any notation that works in 2 dimensions.
Q6 Alexander performs an experiment with his wireless networking router. At each level of power
output (in milliwatts) and distance from his computer (in meters), he measures T (p, d), the
maximum transfer speed of data (in megabits per second). Here is a table of his observations.
a Use this data to approximate Tp (300, 20). Show what values you used. There is more than
one reasonable way to do this.
b What does the derivative in part a mean in physical terms? Be precise and include units.
c Use this data to approximate Td (500, 30). Show what values you used. There is more than
one reasonable way to do this.
d What appears to be true about the sign of Td (p, d)? What does this mean in physical terms,
and why does it make sense?
4.4.2
Q7 Let \(f(x, y) = 7x^2 + 5y\cos x + e^y\). Compute fx (x, y). Explain the role of y in each term where
it is present.
Q8 Let f (x, y) = sin x sin y. Show how to compute fy (x, y) using the product rule, then suggest a
more efficient approach.
4.4.3
Q10 In the diagram from this example, use a point on the c = 30 level set to approximate fy (4, −1.25).
Q11 In the diagram from this example, use a point on the c = 50 level set to approximate fx (4, −1.25).
Q12 In the diagram from this example, what is fy (0, 0)? Explain your reasoning.
4.4.4
a \(f(x, y) = x^2 - y^2\)
b \(f(x, y) = \sqrt{\frac{y}{x}}\) (assume x > 0 and y > 0)
c \(f(x, y) = ye^{xy}\)
Q14 Find gx (x, y) and gy (x, y) for the following functions g(x, y)
a \(g(x, y) = e^{x^2 + y^2}\)
b g(x, y) = y ln(y − x)
c \(g(x, y) = e^{\left(\frac{3x^2 + 4x - 2}{y^3}\right)}\)
4.4.5
Q15 Extrapolate from the limit definition of fx (x, y) to give a limit definition of fx (x, y, z). Explain
why this limit represents a change in f where only x is changing.
√ ∂f
Q16 Let f (x, y, z) = e3x y + 3 yz + x3 z 7 . Compute ∂z .
2 ∂g
Q17 Let g(u, v, w) = euv+w . Compute ∂v .
Q18 Let \(p(r, s, t) = \frac{e^r + e^s + e^t}{rst}\). Compute \(\frac{\partial p}{\partial r}\).
4.4.6
Q19 In this example, does the fact that glass expands as it is heated suggest that \(\frac{\partial P}{\partial T}\) overstates or
understates the actual rate of pressure increase as T increases?
Q20 Suppose Jinteki Corporation makes widgets which it sells for $100 each. It commands a small
enough portion of the market that its production level does not affect the demand (price) for its
products. If W is the number of widgets produced and C is their operating cost, Jinteki’s profit
is modeled by
P = 100W − C.
Since ∂P/∂W = 100, does this mean that increasing production can be expected to increase profit at
a rate of $100 per widget?
4.4.7
Q21 Suppose g(s, t) is the partial derivative of f (s, t) with respect to t, and h(s, t) is the partial
derivative of g(s, t) with respect to s. Write h in terms of f using both subscript and ∂
notation.
Q22 Physicists note that velocity is the derivative of position with respect to time, and acceleration
is the derivative of velocity with respect to time. If s(t, f ) is the position of a rocket with f
kilograms of fuel after t seconds, what is the physical meaning of \(\frac{\partial^3 s}{\partial t^2\,\partial f}\)?
4.4.8
Q23 If f (x, y) = sin(3x + x2 y) calculate fyx . Verify that you get the same answer that we did for
fxy .
2
Q25 Let g(x, y, z) = 2x3 z + yexy .
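a Compute \(\frac{\partial g}{\partial y}\).
b Compute \(\frac{\partial^2 g}{\partial x^2}\).
\[ g(x, y, z) = \frac{x^3 \sin(xz)}{y} \]
a \(\frac{\partial g}{\partial y}\)
b \(\frac{\partial^2 g}{\partial z^2}\)
c \(\frac{\partial^2 g}{\partial z\,\partial x}\)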
4.4.9
Q27 If f (x, y, z) is a smooth function, which of the following are equivalent to fxyyzy ?
i. fxzzyz
ii. fzyyxy
iii. fyyyzx
iv. fxxxyz
v. fxyzy
vi. fxyz
vii. fyxxzx
Q28 How many third partial derivatives does a two-variable function have? Assuming these derivatives
are continuous, which of them are equal according to Clairaut’s theorem?
Q29 Let \(f(x, y) = \frac{e^{xy}}{x + y}\). Is \(\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y}\)? If so, why? If not, how are they related?
Q30 The function \(f(x, y) = e^{x+y}\) has the strange property that \(f_x(x, y) = f_y(x, y)\) at every point
(x, y). What does this mean geometrically about the function f ?
Q31 Do we know that fx (x, y) is in fact a function? What fact about limits is relevant to this
question?
Section 4.5
Linear Approximations
Goals:
In single-variable calculus, the tangent line was one of the great applications of the derivative. It
solves a difficult geometry problem, but it also gives a method of approximating a difficult to compute
function. The height of the tangent line is close to the height of the graph near the point of tangency.
This means the value of the tangent line function approximates the value of the function, close to the
point of tangency. The two-variable analogue of a tangent line is a tangent plane.
Question 4.5.1
What Is a Tangent Plane?
Definition
A tangent plane at a point P = (x0 , y0 , z0 ) on a surface is a plane containing the tangent lines to the
surface through P .
Equation
If the graph z = f (x, y) has a tangent plane at (x0 , y0 ), then it has the equation:
\[ z - z_0 = f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0) \]
Remarks
1 This is the point-slope form of the equation of a plane. fx (x0 , y0 ) and fy (x0 , y0 ) are the slopes.
2 x0 and y0 are numbers, so fx (x0 , y0 ) and fy (x0 , y0 ) are numbers. The variables in this equation
are x, y and z.
The cross sections of the tangent plane give the equation of the tangent lines we learned in single
variable calculus.
In the plane y = y0 :
\[ z - z_0 = f_x(x_0, y_0)(x - x_0) + 0 \]
In the plane x = x0 :
\[ z - z_0 = 0 + f_y(x_0, y_0)(y - y_0) \]
This shows that the tangent plane does contain these two tangent lines.
Example 4.5.2
Writing the Equation of a Tangent Plane
Give an equation of the tangent plane to \(f(x, y) = \sqrt{xe^y}\) at (4, 0).
Solution
1 x0 = 4 is given.
2 y0 = 0 is given.
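3 z0 is the height of the graph at (4, 0), which is \(\sqrt{4e^0} = 2\).
4 To compute fx (x0 , y0 ) we compute the partial derivative function
\[ f_x(x, y) = \frac{1}{2\sqrt{x}}\sqrt{e^y}. \]
Evaluating at (4, 0) gives \(f_x(4, 0) = \frac{1}{2\sqrt{4}}\sqrt{e^0} = \frac{1}{4}\).
5 To compute fy (x0 , y0 ) we apply the chain rule in y:
\[ f_y(x, y) = \sqrt{x}\,\frac{1}{2\sqrt{e^y}}\,e^y \]
\[ f_y(4, 0) = \sqrt{4}\,\frac{1}{2\sqrt{e^0}}\,e^0 = 1 \]
The equation of the tangent plane is
\[ z - 2 = \frac{1}{4}(x - 4) + 1(y - 0) \]
which simplifies to
\[ z - 2 = \frac{1}{4}(x - 4) + y. \]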
Question 4.5.3
How Do We Rewrite a Tangent Plane as a Function?
Definition
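If the graph z = f (x, y) has a tangent plane at (x0 , y0 ), then the linearization
\[ L(x, y) = f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0) \]
approximates the values of f near (x0 , y0 ).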
Notice f (x0 , y0 ) just calculates the value of z0 . This formula is equivalent to the tangent plane
equation after we solve for z by adding z0 to both sides.
Example 4.5.4
Approximating a Function
Use a linearization to approximate the value of \(\sqrt{4.02e^{0.05}}\).
Solution
We don’t know \(\sqrt{4.02e^{0.05}}\), but we can think of this as the value of the function \(f(x, y) = \sqrt{xe^y}\). We
don’t know the value of this function at (4.02, 0.05), but the point (4, 0) is nearby, and we can evaluate
it there. This is where we’ll produce our linearization. We already produced the equation of the tangent
plane in Example 4.5.2.
\[ z - 2 = \frac{1}{4}(x - 4) + y \]
We write z as the function L(x, y) and solve for it:
\[ L(x, y) = 2 + \frac{1}{4}(x - 4) + y \]
For points near (4, 0), L(x, y) is close to f (x, y). This is the basis of our approximation.
\[
\begin{aligned}
\sqrt{4.02e^{0.05}} = f(4.02, 0.05) &\approx L(4.02, 0.05) \\
&\approx 2 + \frac{1}{4}(4.02 - 4) + 0.05 \\
&\approx 2 + 0.005 + 0.05 \\
&\approx 2.055
\end{aligned}
\]
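A short sketch can compare this hand approximation with a floating-point evaluation of the function itself. The function names `f` and `L` mirror the notation of the example; using `math.sqrt` and `math.exp` as the "true" value is an assumption about what counts as exact here.

```python
import math

def f(x, y):
    # The function from the example: sqrt(x * e^y)
    return math.sqrt(x * math.exp(y))

def L(x, y):
    # Linearization at (4, 0): z0 = 2, slope 1/4 in x, slope 1 in y
    return 2 + 0.25 * (x - 4) + 1.0 * (y - 0)

approx = L(4.02, 0.05)
actual = f(4.02, 0.05)
print(approx)                # the hand estimate
print(actual)                # floating-point value of the function
print(abs(actual - approx))  # size of the approximation error
```

The error is small because (4.02, 0.05) is close to the point of tangency; it would grow if the input moved farther from (4, 0).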
Question 4.5.5
How Does Differential Notation Work in More Variables?
The one-variable differential is a shorthand way to express change in the linearization of a function.
The differential dx is an independent variable. It can take on any value. The differential dy depends on
both x0 and dx.
dy = f ′ (x0 )dx
Once we’ve chosen x0 and dx, dy is the amount that the tangent line to y = f (x) at x0 rises when we
increase x by dx.
The differential dz measures the change in the linearization of f (x, y) given particular changes in the
inputs: dx and dy. It is a useful shorthand when one is estimating the error in an initial computation.
Definition
For z = f (x, y), the differential or total differential dz is a function of a point (x0 , y0 ) and two
independent variables dx and dy:
\[ dz = f_x(x_0, y_0)\,dx + f_y(x_0, y_0)\,dy \]
Remark
dz = z − z0 , dx = x − x0 , dy = y − y0 .
An old trigonometry application is to measure the height of a pole by standing at some distance.
We then measure the angle θ of incline to the top, as well as the distance b to the base. The height is
h = b tan θ.
a If the distance to the base is 13 m and the angle of incline is \(\frac{\pi}{6}\), what is the height of the pole?
b Human measurement is never perfect. If our measurement of b is off by at most 0.1 m and our
measurement of θ is off by at most \(\frac{\pi}{120}\), use a differential to approximate the maximum possible
error in our h.
Solution
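\[ \frac{\partial h}{\partial b} = \tan\theta \qquad \frac{\partial h}{\partial \theta} = b\sec^2\theta \]
\[ \left.\frac{\partial h}{\partial b}\right|_{(13, \frac{\pi}{6})} = \frac{1}{\sqrt{3}} \qquad \left.\frac{\partial h}{\partial \theta}\right|_{(13, \frac{\pi}{6})} = \frac{52}{3} \]
\[
\begin{aligned}
dh &= \frac{\partial h}{\partial b}\,db + \frac{\partial h}{\partial \theta}\,d\theta \\
&= \frac{1}{\sqrt{3}}\,db + \frac{52}{3}\,d\theta
\end{aligned}
\]
dh is largest when db = 0.1 and dθ is \(\frac{\pi}{120}\).
\[
\begin{aligned}
\max dh &= \frac{1}{\sqrt{3}}(0.1) + \frac{52}{3}\cdot\frac{\pi}{120} \\
&= \frac{1}{10\sqrt{3}} + \frac{13\pi}{90}
\end{aligned}
\]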
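The error estimate can be evaluated numerically as follows. This is a sketch under the assumptions of the example (b = 13, θ = π/6, errors of at most 0.1 and π/120); the variable names are illustrative.

```python
import math

b, theta = 13.0, math.pi / 6          # measured distance (m) and angle (rad)
db, dtheta = 0.1, math.pi / 120       # largest measurement errors

h_b = math.tan(theta)                 # partial of h = b tan(theta) in b
h_theta = b / math.cos(theta) ** 2    # partial in theta: b sec^2(theta)

# The differential is largest when both error terms have the same sign.
max_dh = abs(h_b) * db + abs(h_theta) * dtheta
print(h_b, h_theta)
print(max_dh)  # estimated maximum error in the height, in meters
```

Most of the estimated error comes from the angle term, since the partial derivative in θ is much larger than the partial derivative in b at this point.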
Section 4.5
Exercises
Summary Questions
Q1 What do you need to compute in order to write the equation of a tangent plane to z = f (x, y)
at (x0 , y0 , z0 )?
Q4 How is the differential defined for a two variable function? What does each variable in the formula
mean?
4.5.1
Q5 Let p(x, y) = 3x + 5y − 2.
a What is the graph z = p(x, y)? What is the significance of 3, 5 and −2?
c How is the tangent plane equation related to z = p(x, y)? Why does this make sense?
Q7 If the equation of the tangent plane of z = f (x, y) does not have a y in it, does that mean that
y is a free variable of f ? Explain.
Q8 Can our tangent plane formula ever give us a plane parallel to the xy-plane? The xz-plane? The
zy-plane? Explain.
4.5.2
Q9 Compute the equation of the tangent plane to \(z = \sqrt{36 - 4x^2 - y^2}\) at (2, 2, 4).
Q10 Let \(g(x, y) = e^{\left(\frac{3x^2 + 4x - 2}{y^3}\right)}\). Write the equation of the tangent plane to z = g(x, y) at (0, 1).
Q11 Let \(f(x, y) = \sqrt{\frac{y}{x}}\). Write the equation of the tangent plane to z = f (x, y) at (4, 36, 3).
Q12 Let \(f(x, y) = \ln(x^2 + y)\). Write the equation of the tangent plane to z = f (x, y) at \((e^3, 0, 6)\).
4.5.3
Q14 Write a linearization of \(g(x, y) = e^{x^2 + y^2}\) at (3, −4).
4.5.4
Q15 Suppose you want to approximate \(\sqrt{5.5e^{0.3}}\) by hand. Would using the linearization of
\(f(x, y) = \sqrt{xe^y}\) at (5, 0) be a good strategy? Explain.
1 31π
Q16 Show how to use an appropriate linearization to approximate 5.12 sin 30 .
Q17 Let \(g(x, y) = \frac{x^2}{y}\). Suppose you don’t remember how to divide decimals. Show how you can use
a linearization of g to approximate \(\frac{3.97^2}{1.05}\).
Q18 Show how to use a linearization to approximate the value of \(\sqrt{(4.02)^2 + \sqrt{80.93}}\) by hand.
4.5.5
Q19 Let \(f(x, y) = \frac{y}{x^2 + y^2}\). Write the differential of f at (4, 3).
Q21 Boris is measuring the area of a rectangular field, so he can decide how much grass seed to buy.
According to his measurements, the field is 30 m by 50 m, giving an area of 1500 m². If we accept
that each of his measurements has an error no larger than 0.2m, use a differential to approximate
the maximum error in his area computation.
Q22 Suppose I decide to invest $10,000 expecting a 6% annual rate of return for 12 years, after which
I’ll use it to purchase a house. The formula for compound interest
\[ P = P_0 e^{rt} \]
indicates that when I want to buy a house, I will have \(P = 10{,}000e^{0.72}\).
I accept that my expected rate of return might have an error of up to dr = 2%. Also, I may
decide to buy a house up to dt = 3 years before or after I expected.
Q23 Let z = 2x − y 3 . At the point (x, y) = (5, 2), what is the maximum value of the differential dz?
Q24 Let f (x, y) be a function. What differential and what inputs into that differential would you use
Q25 Let L(x, y) be the linearization of f (x, y) at (3, 2). If fyy (x, y) < 0 for all (x, y), at which points
can we guarantee that L(x, y) either under or overestimates the value of f (x, y)? Explain.
Q26 Let f (x, y) = 25 − (x + 1)2 − y(y − 3)2 . Describe the set of points (a, b) such that the tangent
                x
         0    2    4    6    8   10
     0   2    5    8   10   11   11
     2   6    9   12   14   15   15
 y   4   9   12   15   17   18   18
     6  12   15   18   20   21   21
     8  14   17   20   22   23   23
    10  17   20   23   25   23   23
a Using any reasonable approximation method, show how to produce a linearization of f (x, y)
at (4, 2).
b Does your linearization over or underestimate f (10, 2)? Explain what that suggests about
Q28 a Give an equation of the plane that passes through the points (3, 4, 2), (5, 5, 1) and (6, 2, 6).
b Suppose there is a function f (x, y) and the plane in part a is tangent to the graph z =
f (x, y) at (3, 4, 2). What partial derivatives of f can you compute exactly (be specific)?
Compute them.
Chapter 5
Vectors in Calculus
This chapter introduces vectors and their applications to calculus. We will use them to compute direc-
tional derivatives, to differentiate compositions of functions, and to find minimum and maximum values
of a function.
Contents
5.1 Vectors
5.2 The Dot Product
5.3 Normal Equations of Planes
5.4 The Gradient Vector
5.5 The Chain Rule
5.6 Maximum and Minimum Values
5.7 Lagrange Multipliers
Section 5.1
Vectors
Goals:
Calculus is the study of change. We defined the partial derivative to be the instantaneous rate of change
of a multi-variable function when one variable changes but the others stay constant. If we want to
describe a more complicated change, we will need new notation and vocabulary to describe it. We
will need vectors.
Question 5.1.1
What is a Vector?
A vector is a way of describing a change in position in n-space. To keep things simple, we’ll start
with vectors in the plane. We need two pieces of information to identify a vector.
Definition
A vector in 2-space consists of a magnitude (length) and a direction. Two vectors with the same
magnitude and the same direction are equal.
Example
Here are four vectors in 2-space (the plane) represented by arrows. Two of these vectors are equal.
3 miles south
The force that a magnetic field applies to a charged particle
The velocity of an airplane
3:15 PM
Atlanta, GA
Question 5.1.2
How Do We Denote Vectors?
When defining a new type of object, we need to agree on a notation. This allows us to communicate
clearly which vector we are referring to. One way of denoting a vector is by its endpoints.
Endpoint Notation
How does this notation interact with the idea of equal vectors?
Theorem
\(\overrightarrow{AB} = \overrightarrow{CD}\) if and only if ABDC is a parallelogram (perhaps a squished one).
The plane has a coordinate system. We can take advantage of this to produce a more quantitative
notation for vectors.
Coordinate Notation
We can represent a vector in the Cartesian plane by the x and y components of its displacement. If
A = (2, 3) and B = (5, 1), then \(\overrightarrow{AB}\) increases x by 5 − 2 = 3 and y by 1 − 3 = −2. We can represent
\[ \overrightarrow{AB} = \langle 3, -2 \rangle \]
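The computation above can be sketched as a small function. The name `displacement` and the choice to store points and vectors as tuples are illustrative assumptions; the text itself does not prescribe a representation.

```python
def displacement(a, b):
    """Components of the vector from point a to point b (terminal minus initial)."""
    return tuple(bi - ai for ai, bi in zip(a, b))

A = (2, 3)
B = (5, 1)
print(displacement(A, B))  # (3, -2)
print(displacement(B, A))  # (-3, 2), the opposite displacement
```

Because the function subtracts coordinate by coordinate, it works unchanged for points with three or more coordinates.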
We can use coordinate notation to quickly test whether two vectors are equal.
Theorem
We can also measure slope using the coordinate notation. For the vector ⃗v = ⟨a, b⟩:
b represents the displacement in the y-direction (rise).
a represents the displacement in the x-direction (run).
The slope of ⃗v is \(\frac{\text{rise}}{\text{run}} = \frac{b}{a}\).
Vectors are not points, but their coordinate notations look awfully similar. We can connect them
more formally. Every point in a Cartesian coordinate system has a position vector, which gives the
displacement of that point from the origin. The components of the vector are the coordinates of the
point.
Figure: There is only one point equal to (−5, 1), but there are many vectors equal to ⟨−5, 1⟩.
Question 5.1.3
What Arithmetic Can We Perform with Vectors?
Unlike locations (points), displacements (vectors) can be added and multiplied. This arithmetic
unlocks a variety of computations and measurements. Specifically, it will allow us to do calculus.
Since we have multiple ways of representing vectors, we will want to understand how to perform these
operations with each of those representations.
Vector Sums
The sum of two vectors ⃗v + ⃗u is calculated by positioning ⃗v and ⃗u head to tail. The sum is the vector
from the initial point of one to the terminal point of the other. In coordinate notation, we just add each
component numerically.
\[ \langle 1, 3 \rangle + \langle 3, -1 \rangle = \langle 4, 2 \rangle \]
Scalar Multiples
Given a number (called a scalar) λ and a vector ⃗v we can produce the scalar multiple λ⃗v , which is the
vector in the same direction as ⃗v but λ times as long.
Example 5.1.4
Performing Vector Arithmetic
What if we are instead given the components ⃗u = ⟨a, b⟩ and ⃗v = ⟨c, d⟩?
Solution
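After drawing a random ⃗u and a random ⃗v , we draw \(\frac{1}{2}\vec{u}\), which points in the same direction as ⃗u but is half as long.
We place it head to tail with ⃗v , and \(\frac{1}{2}\vec{u} + \vec{v}\) completes the triangle.
\[
\begin{aligned}
\frac{1}{2}\vec{u} + \vec{v} &= \frac{1}{2}\langle a, b \rangle + \langle c, d \rangle \\
&= \left\langle \frac{1}{2}a, \frac{1}{2}b \right\rangle + \langle c, d \rangle \\
&= \left\langle \frac{1}{2}a + c, \frac{1}{2}b + d \right\rangle
\end{aligned}
\]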
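The component rules for sums and scalar multiples translate directly into code. The sketch below is one possible way to write them; the function names `add` and `scale` and the sample vectors ⟨6, −2⟩ and ⟨1, 5⟩ are assumptions chosen for illustration.

```python
def add(u, v):
    """Vector sum: add matching components."""
    return tuple(ui + vi for ui, vi in zip(u, v))

def scale(c, v):
    """Scalar multiple: multiply every component by the scalar c."""
    return tuple(c * vi for vi in v)

# The sum worked out in the Vector Sums box.
print(add((1, 3), (3, -1)))    # (4, 2)

# One half of u plus v, for hypothetical sample vectors u and v.
u = (6, -2)
v = (1, 5)
print(add(scale(0.5, u), v))   # (4.0, 4.0)
```

The last line follows the same three steps as the algebra: scale ⃗u, then add ⃗v component by component.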
Question 5.1.5
What Is Standard Basis Notation?
Vector arithmetic gives us another notation that takes advantage of our algebraic intuition. We can
represent any vector in the plane as a sum of scalar multiples of the following standard basis vectors.
⃗i = ⟨1, 0⟩
⃗j = ⟨0, 1⟩
For example, the vector ⟨3, −5⟩ can be written as 3⃗i − 5⃗j. You can check yourself that the sum on
the right gives the correct vector.
Question 5.1.6
How Do We Measure the Length of a Vector?
A vector consists of two pieces of information: magnitude and direction. How do we measure these?
Length is the distance between the endpoints. We already have a method for measuring distance in the
plane.
Definition
The length or magnitude of a vector is calculated using the distance formula and notated |⃗v |. If
⃗v = a⃗i + b⃗j, then
\[ |\vec{v}| = \sqrt{a^2 + b^2} \]
Example 5.1.7
The Length of a Vector
Solution
\[ |\vec{v}| = \sqrt{3^2 + (-5)^2} = \sqrt{34} \]
Definition
The unit vector in the direction of a nonzero vector ⃗v is
\[ \frac{1}{|\vec{v}|}\vec{v} \]
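Here is a minimal sketch of the length formula and of rescaling by 1/|⃗v|. The function names `length` and `unit` are illustrative, and the vector ⟨3, −5⟩ is the one whose length is computed above.

```python
import math

def length(v):
    """Magnitude of a vector: square root of the sum of squared components."""
    return math.sqrt(sum(c * c for c in v))

def unit(v):
    """Scale v by 1/|v|. Assumes v is not the zero vector."""
    n = length(v)
    return tuple(c / n for c in v)

v = (3, -5)
print(length(v))        # equals sqrt(34)
print(unit(v))          # same direction as v
print(length(unit(v)))  # 1, up to rounding
```

Dividing by the length changes the magnitude to 1 without changing the direction, which is why the result is useful for describing direction alone.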
Question 5.1.8
How Do We Measure the Direction of a Vector?
Direction cannot be described as clearly as length. How do we even measure it? A partial answer is
to measure the difference in direction between two vectors.
Angles are a good way of comparing directions. In general, two vectors will not intersect to form an
angle, so we use the following definition:
Definition
The angle between two vectors is the angle they make when they are placed so their initial points are
the same.
If they make a right angle, we call them orthogonal. If they make an angle of 0 or π, they are
parallel.
Question 5.1.9
How Do We Denote Vectors in Higher Dimensions?
Higher dimensional vectors represent displacements in higher dimensional spaces. We can call a
vector in n-space an n-vector. We can still denote an n-vector by its endpoints. We can also denote
it in coordinate notation, but we need more components.
Example
\[ \overrightarrow{AB} = \langle 3, -5, 2 \rangle \]
⃗i = ⟨1, 0, 0⟩
⃗j = ⟨0, 1, 0⟩
⃗k = ⟨0, 0, 1⟩
Example
Higher dimensions still have a standard basis, but at this point the naming conventions are less
standard. {⃗e1 , ⃗e2 , ⃗e3 , . . . , ⃗en } is common for n-vectors.
Length of a Vector
We might be concerned that direction becomes an even more difficult concept to work with as the
dimension increases. However, angles are a valid way of comparing directions in any dimension (though
they may be more difficult to compute).
Angles Between Vectors
Any two vectors with the same initial point lie in a plane. Their angle is a two-dimensional measurement.
However there is no good way to measure clockwise in 3 or more dimensions. The angle between
two vectors is never negative, nor more than π.
Figure: Two 3-vectors with a common initial point, the plane that contains them, and the angle
between them
Section 5.1
Exercises
Summary Questions
Q3 How can you tell if two vectors point in the same direction? Opposite directions?
Q4 If ⃗u and ⃗v are position vectors of the points P and Q, how are ⃗u and ⃗v related to \(\overrightarrow{PQ}\)?
5.1.1
Q8 If \(\overrightarrow{AB} = \overrightarrow{BA}\), what does that tell us about the points A and B? Explain.
5.1.2
Q9 If A = (8, 7, 11) and B = (2, 3, 15) write the vector \(\overrightarrow{AB}\).
Q10 If P = (−2, 3, 5) and Q = (−2, 0, −4) write the vector \(\overrightarrow{PQ}\).
Q13 Suppose two different vectors have equal slopes. How are they related?
5.1.3
Q15 Let ⃗u be a vector. How are the magnitude and direction of ⃗u and 2⃗u related?
Q16 How is the direction and magnitude of ⃗u related to the direction and magnitude of −⃗u?
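Q17 Given diagrams of two vectors ⃗u and ⃗v , how would we draw ⃗u − ⃗v ? What is its significance?
Q19 If \(\vec{u} = \overrightarrow{AB}\), \(\vec{v} = \overrightarrow{AC}\), and \(\frac{1}{2}\vec{u} + \frac{1}{2}\vec{v} = \overrightarrow{AD}\), where is D?
Q20 If \(\vec{u} = \overrightarrow{AB}\), \(\vec{v} = \overrightarrow{AC}\), and \(\frac{1}{5}\vec{u} + \frac{4}{5}\vec{v} = \overrightarrow{AD}\), where is D?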
5.1.4
Q23 For Lindsey to get from her house to Sam’s house, she travels 5mi north and 3mi west. To
get to Russel’s house, she travels 2mi due south. What displacement would get her from Sam’s
house to Russel’s house?
Q24 One can get from Atlanta to Decatur by travelling 8km east and 2km north. To get from
Decatur to Covington, one can travel 43km east and 20km south. Describe how to get directly
from Atlanta to Covington.
Q25 Using the diagram below, describe each vector in terms of ⃗u and ⃗v using vector addition and
scalar multiplication. Use the fact that ACDB and ACBE are parallelograms.
a \(\overrightarrow{EB}\)
b \(\overrightarrow{CG}\)
c \(\overrightarrow{BC}\)
d \(\overrightarrow{AF}\)
e \(\overrightarrow{GB}\)
Q26 Using the diagram below, describe each vector in terms of ⃗u and ⃗v using vector addition and
scalar multiplication. Use the fact that ACBD is a parallelogram, and the marked segments are
congruent.
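a \(\overrightarrow{BD}\)
b \(\overrightarrow{EA}\)
c \(\overrightarrow{DC}\)
d \(\overrightarrow{BG}\)
e \(\overrightarrow{AG}\)
f \(\overrightarrow{CF}\)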
5.1.5
Q28 For any numbers a and b, use the definition of ⃗i and ⃗j to show that a⃗i + b⃗j = ⟨a, b⟩.
5.1.6
Q30 Given a nonzero vector ⃗u, how many vectors of length 5 are parallel to ⃗u? Explain.
5.1.7
Q33 If ⃗u and ⃗v are vectors in R2 whose components are all positive, what is the largest possible angle
between ⃗u and ⃗v ?
Q34 Explain the difference between the terms “perpendicular” and “orthogonal.”
Q35 Suppose two vectors do not have the same initial point, but when we represent them by arrows,
the arrows happen to cross. Is the angle made in the crossing equal to the angle between the
vectors (as we defined it)?
5.1.8
Q41 a How many different (nonequal) unit vectors are orthogonal to a given vector in R2 ? How
are they related to each other?
b How many different (nonequal) unit vectors are orthogonal to a given vector in R3 ? How
are they related to each other?
Q42 Let ⃗u and ⃗v be non-parallel vectors in R3 . How many unit vectors in R3 are orthogonal to both
⃗u and ⃗v ?
Q43 Is the vector ⃗v = 2⃗i + 3⃗j + 8⃗k parallel to the plane p whose slope-intercept equation is z =
x + 2y − 7?
Q44 For a two-variable function f (x, y), fx (x0 , y0 ) is the slope of the line tangent to z = f (x, y) at
(x0 , y0 , f (x0 , y0 )) in the x-direction. Write a vector ⃗v that is parallel to this line.
Q45 If \(\vec{u} = \overrightarrow{AB}\) and \(\vec{v} = \overrightarrow{AC}\), show that for any scalar t, \(t\vec{u} + (1 - t)\vec{v} = \overrightarrow{AD}\) where D is a point on
the line through B and C.
Q46 If \(\vec{u}\), \(\vec{v}\) and \(\vec{w}\) are position vectors of the three vertices A, B and C of a triangle, then \(\frac{1}{3}(\vec{u} + \vec{v} + \vec{w})\)
is the position vector of K, the center of mass of the triangle. Verify this by showing that K lies
on the line between A and the midpoint of the side BC.
Q47 Suppose we become interested in studying vectors of infinite dimension (yes this is something
a Explain what trouble we might run into computing the length of the vector ⟨1, 1, 1, 1, 1, . . .⟩.
Section 5.2
The Dot Product
The arithmetic of vectors appears to have room for expansion. While we can add and subtract
vectors, we only defined how to multiply them by scalars, not by other vectors. There are in fact
products of two vectors. The simplest and most useful is the dot product. The dot product takes two
n-vectors and outputs a single number. Despite this apparent loss of information, the dot product is
the key tool in computing the angle between vectors, the work done by a force, or the illumination in a
digital scene.
Question 5.2.1
What Is the Dot Product?
Definition
For two dimensional vectors ⃗v = ⟨v1 , v2 ⟩ and ⃗u = ⟨u1 , u2 ⟩ we define
⃗v · ⃗u = v1 u1 + v2 u2
For three dimensional vectors ⃗v = ⟨v1 , v2 , v3 ⟩ and ⃗u = ⟨u1 , u2 , u3 ⟩ we define
⃗v · ⃗u = v1 u1 + v2 u2 + v3 u3
This pattern can be extended to any dimension.
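The pattern in this definition can be written once for every dimension. The sketch below is one way to do so; the function name `dot` and the sample vectors are assumptions for illustration only.

```python
def dot(v, u):
    """Dot product: multiply matching components and add the results."""
    if len(v) != len(u):
        raise ValueError("vectors must have the same number of components")
    return sum(vi * ui for vi, ui in zip(v, u))

# Hypothetical sample vectors in 2 and 3 dimensions.
print(dot((1, 2), (3, 4)))          # 1*3 + 2*4 = 11
print(dot((1, 0, -2), (4, 5, 6)))   # 1*4 + 0*5 + (-2)*6 = -8
```

Note that the output is a single number, not a vector, no matter how many components the inputs have.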
Example 5.2.2
Computing a Dot Product
Solution
Question 5.2.3
What Are the Algebraic Properties of the Dot Product?
Theorem
The following algebraic properties hold for any vectors ⃗u, ⃗v and ⃗w and scalars m and n.
Commutative ⃗u · ⃗v = ⃗v · ⃗u
Distributive ⃗u · (⃗v + ⃗w) = ⃗u · ⃗v + ⃗u · ⃗w
Question 5.2.4
What Is the Geometric Significance of the Dot Product?
⃗u · ⃗v encodes key information about the magnitude and direction of ⃗u and ⃗v . This geometric
relationship can be derived from the algebraic properties we’ve established. We begin with the idea that
⃗u · ⃗u = |⃗u|2 . This doesn’t tell us the value of every dot product, but we can extend the reasoning to
any pair of parallel vectors.
Theorem
Since ⃗u and ⃗v are parallel, we can write ⃗v = m⃗u for some scalar m. ⃗v is m times as long as ⃗u. Both
lengths are positive, so this means if m > 0 then |⃗v | = m|⃗u|, but if m < 0, then |⃗v | = −m|⃗u|
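\[
\begin{aligned}
\vec{u} \cdot \vec{v} &= \vec{u} \cdot (m\vec{u}) \\
&= m\,\vec{u} \cdot \vec{u} \\
&= m|\vec{u}|^2 \\
&= |\vec{u}|\,m|\vec{u}| \\
&= \begin{cases}
|\vec{u}||\vec{v}| & \text{if } \vec{u} \text{ and } \vec{v} \text{ have the same direction} \\
-|\vec{u}||\vec{v}| & \text{if } \vec{u} \text{ and } \vec{v} \text{ have opposite directions}
\end{cases}
\end{aligned}
\]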
We can establish the dot product in another special case: when the vectors are orthogonal.
Theorem
In this case, we place ⃗u and ⃗v head to tail and draw ⃗u + ⃗v . Since ⃗u and ⃗v make a right angle, these
three vectors make a right triangle. The Pythagorean theorem applies to the lengths of the vectors.
Two vectors need not be parallel or orthogonal, but given vectors ⃗u and ⃗v we can always write
⃗v = ⃗vproj + ⃗vorth . We choose ⃗vproj to be parallel to ⃗u and ⃗vorth to be orthogonal to ⃗u.
Definition
The number \(\frac{\vec{u} \cdot \vec{v}}{|\vec{u}|}\) is called the scalar projection of ⃗v onto ⃗u.
The scalar projection is equal to the length of ⃗vproj if ⃗vproj is in the same direction as ⃗u. Otherwise,
it is the negative of the length.
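The decomposition and the scalar projection can be checked numerically. This is a rough sketch under stated assumptions: the helper names (`scalar_projection`, `decompose`) and the sample vectors are hypothetical, and the parallel part is computed with the standard formula $\frac{\vec{u} \cdot \vec{v}}{\vec{u} \cdot \vec{u}}\vec{u}$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def scalar_projection(v, u):
    """Scalar projection of v onto u: (u . v) / |u|."""
    return dot(u, v) / math.sqrt(dot(u, u))


def decompose(v, u):
    """Split v into a part parallel to u and a part orthogonal to u."""
    scale = dot(u, v) / dot(u, u)
    v_proj = [scale * a for a in u]
    v_orth = [b - p for b, p in zip(v, v_proj)]
    return v_proj, v_orth


# Hypothetical sample vectors for illustration.
u = [3, 4]
v = [2, -6]
v_proj, v_orth = decompose(v, u)
print(scalar_projection(v, u))         # negative: v_proj points opposite to u
print(math.sqrt(dot(v_proj, v_proj)))  # length of v_proj
print(dot(v_orth, u))                  # approximately 0: v_orth is orthogonal to u
```

In this sample the scalar projection is negative, and its absolute value matches the length of the parallel part, as the text describes.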
Theorem
Let ⃗u and ⃗v have the same initial point and meet at angle θ. The following formula holds in any
dimension:
⃗u · ⃗v = |⃗u||⃗v | cos θ
Example 5.2.5
Using the Cosine Formula
Solution
We’ll apply the cosine formula, compute all of the components besides θ and solve.
We can verify this by noting that these vectors are diagonals in a unit cube. We could connect them
with a third diagonal to make an equilateral triangle. We may recall that an equilateral triangle has
angles of $\frac{\pi}{3}$.
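The cosine formula can be solved for θ in code as well. The sketch below assumes the two vectors are the face diagonals ⟨1, 1, 0⟩ and ⟨0, 1, 1⟩ of a unit cube; that choice is an assumption made for illustration, since it matches the equilateral-triangle description, and the function name `angle_between` is hypothetical.

```python
import math


def angle_between(u, v):
    """Angle in radians between u and v, from u . v = |u||v| cos(theta)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_theta = dot / (norm_u * norm_v)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


# Assumed vectors: two face diagonals of a unit cube sharing a vertex.
theta = angle_between([1, 1, 0], [0, 1, 1])
print(theta, math.pi / 3)  # the two values should agree up to rounding
```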
Application 5.2.6
Work
In physics, we say a force works on an object if it moves the object in the direction of the force.
Given a force F and a displacement s, the formula for work is:
W = Fs
In higher dimensions, displacement and force are vectors. If the force and the displacement are not
in the same direction, then only F⃗proj contributes to work.
W = F⃗proj · ⃗s = F⃗ · ⃗s
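A short numerical check may make the identity $\vec{F}_{proj} \cdot \vec{s} = \vec{F} \cdot \vec{s}$ more concrete. The force and displacement below are hypothetical values, and the helper names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def projection(force, displacement):
    """Component of the force vector parallel to the displacement."""
    scale = dot(force, displacement) / dot(displacement, displacement)
    return [scale * s for s in displacement]


def work(force, displacement):
    """Work done by a constant force over a straight displacement."""
    return dot(force, displacement)


# Hypothetical values: a force of <3, 4> acting over a displacement of <10, 0>.
F = [3.0, 4.0]
s = [10.0, 0.0]
F_proj = projection(F, s)
print(F_proj)           # only the part of F along s
print(work(F_proj, s))  # work computed from the projection
print(work(F, s))       # same value computed from F directly
```

The orthogonal part of the force contributes nothing to the dot product, which is why both computations agree.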
Section 5.2
Exercises
Summary Questions
Q1 What algebraic properties does a dot product share with real number multiplication?
Q3 How is the angle between two vectors related to their dot product?
5.2.1
Q6 Elaine computes ⃗u ·⃗v and gets ⟨15, 4⟩. How can you tell that Elaine got the wrong answer without
5.2.2
5.2.3
d Compute 3⃗u and 3⃗v then take their dot product. How is it related to ⃗u · ⃗v ?
f Why do you think we call this operation a “dot product” and not a “dot sum?”
g If you wanted to prove that the relationships you noticed in b through e work for all possible vectors,
how would you do that?
5.2.4
Q13 Suppose we know that ⃗u and ⃗v are parallel, that |⃗v | = 4 and that ⃗u · ⃗v = −28.
Q14 If $|\vec{u}| = 12$, $|\vec{v}| = 9$, and $\vec{u} \cdot \vec{v} = 0$, what is the magnitude of the vector $\vec{w} = \vec{u} + \vec{v}$?
Q15 If |⃗u| = 5 and ⃗u · ⃗v = 15, what are the possible values of |⃗v |?
Q16 If |⃗u| = 6 and |⃗v | = 10 what are the greatest and least possible values of ⃗u · ⃗v ?
Q17 Let ⃗v = 7⃗i − 2⃗j + ⃗k, what unit vector ⃗u produces the largest possible dot product ⃗u · ⃗v ?
5.2.5
Q20 Compute the angle between ⟨0, 3, −5⟩ and ⟨3, −4, 3⟩.
Q21 Let A be a vertex of a cube. Let B be a vertex closest to A and C be the vertex farthest from
A. Compute the angle between $\overrightarrow{AB}$ and $\overrightarrow{AC}$.
Q22 Let A be a vertex of a cube, and B and C be any two other points on the cube. Use a dot
product to explain why the angle between $\overrightarrow{AB}$ and $\overrightarrow{AC}$ cannot be larger than $\frac{\pi}{2}$. (Hint: put A
at (0, 0, 0).)
Q23 How could you use the dot product to determine whether two vectors are parallel? How does this
compare with the methods we already have?
Q24 Use dot products to find at least one vector that is orthogonal to both ⟨5, −1, 2⟩ and ⟨4, 4, 1⟩
Q25 “Think of a vector ⃗v ” says Raphael, “tell me its dot product with the vector of my choice, and
I’ll tell you what your vector was.”
b How many dot products would you need to ask for to uniquely identify an unknown vector?
What dot products would you ask for?
Section 5.3
Question 5.3.1
What is a Normal Vector to a Plane?
In algebra, you learned the normal equation of a line: e.g. 2x + 3y − 12 = 0. Why is it called this?
The vector ⟨2, 3⟩ is a normal vector to the line, meaning it is orthogonal to any vector contained in
the line. We can extend this definition to planes in 3-space. A normal vector to a plane is orthogonal
to every vector in the plane.
Theorem
In three-dimensional space, every plane has normal vectors. They are all parallel to each other.
Figure: A plane, its normal vector $\vec{n}$, and a vector $\overrightarrow{PQ}$ in the plane
This gives us an avenue to test whether a point Q lies on the plane or not. If $\overrightarrow{PQ}$ is orthogonal to
$\vec{n}$, then Q lies on the plane. If $\overrightarrow{PQ}$ and $\vec{n}$ make a different angle, then Q is not on the plane.
We’d like to rewrite this relationship in terms of the coordinates of Q. If $\vec{r}_0$ is the position vector of
P and $\vec{r}$ is the position vector of Q, then $\overrightarrow{PQ} = \vec{r} - \vec{r}_0$. The dot product gives us a simple test to see
whether this vector is orthogonal to $\vec{n}$.
Theorem
If $\vec{r}_0 = \langle x_0, y_0, z_0 \rangle$ describes a known point on a plane and $\vec{n} = \langle a, b, c \rangle$ is a normal vector, then
the normal equation of the plane is
(⃗r − ⃗r0 ) · ⃗n = 0
or
a(x − x0 ) + b(y − y0 ) + c(z − z0 ) = 0
Notice that since $x_0$, $y_0$ and $z_0$ are constants, we can distribute and collect them into a single term,
$d = -(ax_0 + by_0 + cz_0)$, which gives the form $ax + by + cz + d = 0$.
This reasoning works in any dimension to define a set of points whose displacement from a known
point is orthogonal to some normal vector.
Example
Example 5.3.2
Computing a Normal Vector
Find the normal equation of the plane with intercepts (4, 0, 0), (0, 3, 0) and (0, 0, 8). Compute a
normal vector.
Solution
The normal equation of a plane has the form ax + by + cz + d = 0. Each of these points must satisfy
this equation. We will plug them in and see what they tell us about the coefficients.
There are infinitely many solutions to this system of equations. This makes sense, because there are
infinitely many normal vectors to a plane. Different choices of d give ⃗n’s that are scalar multiples of
each other. A convenient choice for d is −24, but any nonzero value will work. d = −24 gives
6x + 8y + 3z − 24 = 0
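The dependence of the coefficients on d can be sketched in code. Plugging an intercept such as (p, 0, 0) into ax + by + cz + d = 0 gives ap + d = 0, so a = −d/p, and similarly for b and c. The function name `plane_from_intercepts` is a hypothetical helper, and exact fractions are used so the results are not blurred by rounding.

```python
from fractions import Fraction


def plane_from_intercepts(p, q, r, d):
    """Coefficients (a, b, c, d) of ax + by + cz + d = 0 for a plane with
    intercepts (p, 0, 0), (0, q, 0) and (0, 0, r), for a chosen nonzero d."""
    if d == 0:
        raise ValueError("d must be nonzero for a plane with three intercepts")
    # Each intercept leaves one equation, e.g. a*p + d = 0.
    return (Fraction(-d, p), Fraction(-d, q), Fraction(-d, r), Fraction(d))


print(plane_from_intercepts(4, 3, 8, -24))  # a = 6, b = 8, c = 3
print(plane_from_intercepts(4, 3, 8, -12))  # a = 3, b = 4, c = 3/2
```

The second call uses a different d and produces a normal vector that is half the first one, which illustrates that the choices of d give parallel normal vectors.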
Synthesis 5.3.3
Using the Normal Vector to Compute Distance
This is the line with normal vector ⃗n = ⟨2, 3⟩ and known point P = (3, 2).
Example
Solution
Since $\vec{n}$ is a normal vector, its angle with any vector in the line is $\frac{\pi}{2}$. The vectors on the same side of
the line as $\vec{n}$ make an acute angle with $\vec{n}$. The vectors on the far side make an obtuse angle. Thus
when $\vec{n} \cdot \overrightarrow{PP_i} < 0$, $P_i$ lies on the far side of the line from $\vec{n}$. When $\vec{n} \cdot \overrightarrow{PP_i} > 0$, $P_i$ lies on the same side
as $\vec{n}$.
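The sign test can be sketched for the line above, with normal vector ⟨2, 3⟩ and known point P = (3, 2). The test points are hypothetical choices for illustration, and `side_of_line` is an illustrative name.

```python
def side_of_line(normal, known_point, q):
    """Return n . PQ; the sign tells which side of the line Q lies on."""
    pq = [qi - pi for qi, pi in zip(q, known_point)]
    return sum(n * c for n, c in zip(normal, pq))


n = (2, 3)  # normal vector of 2x + 3y - 12 = 0
P = (3, 2)  # known point on the line

# Hypothetical test points.
print(side_of_line(n, P, (5, 4)))  # 10: same side as n
print(side_of_line(n, P, (0, 0)))  # -12: far side from n
print(side_of_line(n, P, (6, 0)))  # 0: on the line
```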
We can get more detailed information than just the sign of the dot product. We can actually compute
a distance.
Theorem
Given a line, plane, or hyperplane with normal equation $L(x_1, \ldots, x_k) = 0$ and corresponding normal
vector $\vec{n}$, the signed distance from the hyperplane to the point $Q = (q_1, \ldots, q_k)$ is
$$\frac{L(q_1, \ldots, q_k)}{|\vec{n}|}.$$
Let P be a known point on the hyperplane. The scalar projection of $\overrightarrow{PQ}$ onto $\vec{n}$ is equal to the
signed distance from the hyperplane to Q.
Figure: The scalar projection of $\overrightarrow{PQ}$ onto the normal vector of a line
$$\begin{aligned}
\text{Distance} &= \frac{\overrightarrow{PQ} \cdot \vec{n}}{|\vec{n}|} && \text{(formula for scalar projection)} \\
&= \frac{L(q_1, \ldots, q_k)}{|\vec{n}|} && \text{(normal equation of the plane)}
\end{aligned}$$
This formula is especially powerful because we do not need to know a point on the hyperplane. The
equations
are equivalent, and correspond to the same normal vector. We can use whichever one we happen to
have in our signed distance formula.
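The signed distance formula needs only the coefficients of the normal equation and the coordinates of Q. The sketch below assumes the plane 6x + 8y + 3z − 24 = 0 from Example 5.3.2 and measures the distance to the origin; the function name and argument layout are illustrative.

```python
import math


def signed_distance(normal, d, point):
    """Signed distance from the hyperplane n . x + d = 0 to the point."""
    value = sum(n * q for n, q in zip(normal, point)) + d  # L(q1, ..., qk)
    length = math.sqrt(sum(n * n for n in normal))         # |n|
    return value / length


# Assumed plane: 6x + 8y + 3z - 24 = 0, point: the origin.
print(signed_distance((6, 8, 3), -24, (0, 0, 0)))     # -24 / sqrt(109)

# Doubling every coefficient describes the same plane and the same distance.
print(signed_distance((12, 16, 6), -48, (0, 0, 0)))
```

A negative result means the point lies on the far side of the plane from the chosen normal vector.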
Example 5.3.4
The Distance from a Plane
Solution
⃗n = ⟨6, 8, 3⟩. The signed distance from the plane to the origin is
Application 5.3.5
Support Vector Machines
One type of machine learning involves training a computer to distinguish between two states. For
example, a computer might be trained to distinguish between a cancerous tumor and a benign one.
To do this the computer is given a large set of cases. Each case is measured by numerical data, such
as:
Each data type is a dimension, and each case is a point in a (probably very high) dimensional space.
The computer would like a simple test to divide these cases into cancerous and benign. The test will
be which side of a hyperplane they lie on. It is unlikely that any such hyperplane exists initially, so the
computer attempts a sequence of transformations of the data until they are separated by a hyperplane
with some degree of reliability.
Section 5.3
Exercises
Summary Questions
Q1 What information do you need in order to write the normal equation of a plane?
5.3.1
i. 3x − 8y + 10z − 4 = 0
ii. z − 2 = 4(x + 7) − 5(y + 1)
Q9 Write a normal equation of a plane parallel to 7x − 11y + 8z + 15 = 0 that passes through the
origin.
Q10 Write a normal equation of a plane parallel to 10x − 11y + z + 20 = 0 that passes through
(2, 3, 5).
Q11 Given that the plane ax + by + cz + d = 0 passes through the origin, what can you say about a,
b, c, and d?
Q12 Given that plane ax + by + cz + d = 0 contains the x-axis, what can you say about a, b, c, and
d?
Q13 Are the planes 4x + 6y + 8z + 15 = 0 and 10x + 15y + 20z − 7 = 0 parallel? Explain how you
know.
Q14 Suppose we know the planes 12x + 18y + 6z − 15 = 0 and ax + by + 4z + d = 0 are parallel.
What can you say about the values of a, b and d?
Q15 The equations 3x − y + 4z + 10 = 0 and −6x + 2y − 8z + k = 0 describe the same plane. What
is the value of k?
b What are the normal vectors corresponding to the original equation and your two equations
in a?
5.3.2
Q17 Give a normal equation of the plane with intercepts (10, 0, 0), (0, −5, 0) and (0, 0, 2).
Q18 Give a normal equation of the plane with intercepts (−18, 0, 0), (0, 9, 0) and (0, 0, −4).
Q19 Give a normal equation of the plane through (4, 3, 0), (5, 1, 1) and (−2, 5, 2).
Q20 Give a normal equation of the plane through (1, 1, 1), (8, 1, 4) and (0, 0, 4).
5.3.3
Q21 Katie is computing the distance from the point (6, 3) to the line 2x + 3y − 12 = 0. She notices
that (6, 0) is the x-intercept of the line. Since (6, 3) is 3 units away from (6, 0) she concludes
the distance from the point to the line is 3. What do you think of Katie’s reasoning?
Q22 Consider the line L with normal equation 2x + 3y − 12 = 0 and the point Q = (6, 3).
c Write an equation (in any form you’d like) of a line K that passes through Q and is perpen-
dicular to L.
f Check that your answer to e matches the distance formula we derived. Which method do
you like better?
5.3.4
Q25 Are (6, 7, 1) and (5, −3, −4) on the same or different sides of 3x − 10y + 9z + 46 = 0?
Q26 The point (x, 4, 5) lies on the same side of the plane 2x + y − 2z + 10 = 0 as the origin does.
What does that tell you about the value of x?
5.3.5
Q27 We have six images of dogs and cats. We measure four things about each, and have collected
the data below. We would like to use the hyperplane 2x1 + 5x2 − 4x3 + 10x4 + k = 0 to separate
the images of dogs from the images of cats.
Type Measurements
Cat (5, 1, 3, 6)
Dog (7, 3, 7, 2)
Dog (7, 2, 6, 4)
Dog (9, 1, 8, 5)
Cat (6, 4, 5, 5)
Cat (9, 2, 7, 6)
a What values of k would cause the hyperplane to correctly separate the dog images from the
cat images?
b If you intended to use the hyperplane to guess whether a future image was a dog or cat,
what k would you choose? Why?
Q28 Suppose we have a hyperplane that we would like to separate two sets of points, but it doesn’t
quite work. We measure the error of this separation by taking the sum of the geometric distances
from the hyperplane of each point that is on the wrong side of the hyperplane. Suppose we were
hoping that the line 2x + 3y − 12 = 0 would separate the points of type T from the points of
type S.
Type Coordinates
T (6, 2)
T (2, 1)
T (5, 3)
T (4, 4)
S (1, 5)
S (1, 1)
S (4, 0)
S (4, 2)
a Create a diagram of these points (labelled or colored by type) and the line.
b We did not specify which side of the line should be T and which should be S. Use your
diagram to decide which choice of sides will give less error.
Q29 Write the equation of a plane that contains all the points equidistant from A = (1, −2, 7) and
B = (7, 0, 5)
Q30 Two planes are perpendicular if their normal vectors are orthogonal.
b If two planes are perpendicular, is every vector in the first plane orthogonal to every vector
in the second plane?
Q31 Write the normal equation of a plane that contains the x and z axes. Where have we seen this
plane before?
Q32 What trouble do you run into if you try to write the equation of the plane through (6, 0, 0),
(0, 8, 0) and (3, 4, 0)? Explain geometrically why this makes sense.
Section 5.4
Armed with ideas about vectors, we have the vocabulary to discuss more complex changes in the
variables of a function. Rather than having one variable change and the other stay constant, we can
indicate a change in both variables with a vector. When exploring these computations, we will construct
one of the most important tools for multivariable calculus.
Question 5.4.1
How Do We Compute Rates of Change in Another Direction?
The partial derivatives of f (x, y) give the instantaneous rate of change in the x and y directions.
This is realized geometrically as the slope of the tangent line. What if we want to travel in a different
direction?
Definition
Let f (x, y) be a function and $\vec{u}$ be a unit vector in $\mathbb{R}^2$. The directional derivative, denoted $D_{\vec{u}} f$,
is the instantaneous rate of change of f as we move in the $\vec{u}$ direction. This is also the slope of the
tangent line to z = f (x, y) in the direction of $\vec{u}$.
Figure: The tangent line to f (x, y) in the direction of ⃗u
Recall that we compute Dx f by comparing the values of f at (x, y) to the value at (x + h, y), a
displacement of h in the x-direction.
$$D_x f(x, y) = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h}$$
To compute D⃗u f for ⃗u = a⃗i + b⃗j, we compare the value of f at (x, y) to the value at (x + ta, y + tb),
a displacement of t in the ⃗u-direction.
Limit Formula
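The comparison described above suggests the difference quotient $\frac{f(x + ta, y + tb) - f(x, y)}{t}$, whose limit as t → 0 is the directional derivative. The sketch below evaluates that quotient for shrinking t. The function f(x, y) = x² + 3xy, the point (1, 2) and the direction ⟨0.6, 0.8⟩ are hypothetical choices for illustration.

```python
def difference_quotient(f, x, y, u, t):
    """Average rate of change of f over a displacement of t in the unit direction u."""
    a, b = u
    return (f(x + t * a, y + t * b) - f(x, y)) / t


def f(x, y):
    # Hypothetical sample function.
    return x**2 + 3 * x * y


u = (0.6, 0.8)  # a unit vector: 0.36 + 0.64 = 1
for t in (0.1, 0.01, 0.001, 1e-6):
    print(t, difference_quotient(f, 1.0, 2.0, u, t))  # values approach 7.2
```

For this f the partial derivatives at (1, 2) are 8 and 3, so the quotients settle near 8(0.6) + 3(0.8) = 7.2.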
Questions:
1 What direction produces the greatest directional derivative? The smallest?
2 How are these directions related to the geometry (specifically the level curves) of the graph?
Question 5.4.2
What Is the Gradient Vector?
The relationship between the direction of maximum increase and the partial derivatives suggests that
we could treat the partial derivatives like components of a vector.
Definition
Remarks:
1 The gradient vector is a function of (x, y). Different points have different gradients.
2 ⃗umax , which maximizes D⃗u f , points in the same direction as ∇f .
3 ⃗u0 , which is tangent to the level curves, is orthogonal to ∇f .
Remark
Students often wonder: what is the geometric intuition behind the gradient vector and its properties?
The answer is often disappointing, but important. The gradient vector does not have a geometric
motivation. We artificially created the gradient vector because it has convenient algebraic properties. If
that were the end of the story, we wouldn’t bother learning about it. However, the gradient turns out
to be so useful that we will study it intensely, despite its uncompelling origins.
Question 5.4.3
How Do We Compute a Directional Derivative?
There are several ways to derive a formula for the directional derivative. One approach is to apply
algebra and limit laws to the limit definition. A more geometric method is to exploit our previous work
with the tangent plane. The directional derivative is the slope of a tangent line. The tangent lines live
in the tangent plane. We can compute their slope by rise over run.
Let ⃗u be a unit vector from (x0 , y0 ) to (x1 , y1 ). Let the associated z values in the tangent plane be
z0 and z1 respectively.
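$$\begin{aligned}
D_{\vec{u}} f(x_0, y_0) = \frac{\text{rise}}{\text{run}} &= \frac{z_1 - z_0}{|\vec{u}|} \\
&= f_x(x_0, y_0)(x_1 - x_0) + f_y(x_0, y_0)(y_1 - y_0) && \text{(tangent plane equation, with } |\vec{u}| = 1\text{)} \\
&= \nabla f(x_0, y_0) \cdot \vec{u}.
\end{aligned}$$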
We can also define directional derivatives of higher variable functions with analogous results.
f (x1 , . . . , xn ) is a differentiable function.
⃗u is a unit vector in Rn .
D⃗u f denotes the directional derivative in the direction of ⃗u.
Synthesis 5.4.4
Directional Derivative and the Cosine Formula
Now that we have a formula for directional derivatives, we can verify our observations from earlier.
Suppose f (x, y) is a differentiable function and we can choose any unit vector ⃗u.
Solution
a Since the directional derivative is a dot product, we can apply our formula that relates the dot
product to the lengths of the vectors and the angle between them.
b Given a particular (x, y), $|\nabla f(x, y)| \cos\theta$ is largest when θ = 0. This means that $D_{\vec{u}} f(x, y)$ is
maximized when $\vec{u}$ is in the direction of $\nabla f(x, y)$. The formula for a unit vector in the direction
of the gradient is
$$\vec{u} = \frac{1}{|\nabla f(x, y)|} \nabla f(x, y)$$
$$D_{\vec{u}} f(x, y) = 0$$
$$|\nabla f(x, y)| \cos\theta = 0 \quad \text{by part (a)}$$
Figure: The angle between the gradient of f and a unit vector
Main Ideas
The cosine formula for the dot product lets us relate the directional derivative to an angle.
f increases fastest in the direction of ∇f (x, y).
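These main ideas can be tested by brute force: sample many unit vectors around the circle and see which one gives the largest dot product with the gradient. The gradient ⟨8, 3⟩ below is a hypothetical value chosen for illustration.

```python
import math

grad = (8.0, 3.0)  # hypothetical gradient at some point


def directional_derivative(grad, theta):
    """D_u f for the unit vector u = <cos(theta), sin(theta)>."""
    return grad[0] * math.cos(theta) + grad[1] * math.sin(theta)


# Sample 3600 directions, one every tenth of a degree.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best = max(angles, key=lambda th: directional_derivative(grad, th))

print(best, math.atan2(grad[1], grad[0]))                 # best angle vs. gradient angle
print(directional_derivative(grad, best), math.hypot(*grad))  # max rate vs. |grad f|
```

Up to the resolution of the sampling, the best direction is the direction of the gradient and the maximum rate of change is the length of the gradient.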
Example 5.4.5
A Directional Derivative
Let $f(x, y) = \sqrt{9 - x^2 - y^2}$ and let $\vec{u} = \langle 0.6, -0.8 \rangle$.
Solution
a The level curves have the equations $\sqrt{9 - x^2 - y^2} = c$. These solve to $x^2 + y^2 = 9 - c^2$. As
c increases from 0 to 3 these are circles starting at radius 3 and shrinking to the origin. For c
outside this range, the level curve has no points.
b ∇f points in the direction of increase and normal to the level curves. Since higher level curves
are smaller circles, closer to the origin, ∇f (1, 2) points toward the origin.
c D⃗u f (1, 2) = ∇f (1, 2) · ⃗u. Since ⃗u appears to make an acute angle with ∇f (1, 2), we expect this
dot product to be positive.
Now we use the dot product formula to compute D⃗u f (1, 2).
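As a check on the sign prediction, the dot product can be evaluated numerically. This sketch assumes the partial derivatives $f_x = -x/\sqrt{9 - x^2 - y^2}$ and $f_y = -y/\sqrt{9 - x^2 - y^2}$, which follow from the ordinary chain rule; the function names are illustrative.

```python
import math


def grad_f(x, y):
    """Gradient of f(x, y) = sqrt(9 - x^2 - y^2)."""
    root = math.sqrt(9 - x**2 - y**2)
    return (-x / root, -y / root)


def directional_derivative(x, y, u):
    """D_u f(x, y) as the dot product of the gradient with the unit vector u."""
    gx, gy = grad_f(x, y)
    return gx * u[0] + gy * u[1]


print(grad_f(1, 2))                               # (-0.5, -1.0): points toward the origin
print(directional_derivative(1, 2, (0.6, -0.8)))  # about 0.5, positive as expected
```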
Example 5.4.6
Drawing the Gradient
Let h(x, y) give the altitude at longitude x and latitude y. Assuming h is differentiable, draw the
direction of ∇h(x, y) at each of the points labeled below. Which gradient is the longest?
Figure: Level curves of h with labeled points A, B and C
Solution
The gradient vector at each point is normal to the level curves, pointing uphill. The hill is steepest at
B, because the level curves are closer together. This tells us that the partial derivatives are larger. Thus
∇h(B) is longer than ∇h(A) and ∇h(C).
Figure: The gradient vectors of h drawn at the labeled points
Application 5.4.7
Edge Detection
Representing an image by defining a brightness (or color) function on the pixels is simple enough,
but can a computer be taught to make sense of what it sees? Image recognition is an exciting field that
promises to automate and improve tasks from medical diagnosis to driving a vehicle.
The problem is daunting. What algorithm can possibly take a set of pixels and locate a tumor or a
pedestrian? The first step is to identify the objects in the image. The first step of object identification is
edge detection, determining where one object ends and another begins. We can do this by approximating
the partial derivatives at each pixel. We compare each pixel to nearby pixels and compute rise over run
(how these are chosen and averaged can significantly affect the accuracy of the algorithm).
The length of the gradient of a brightness function detects the edges in a picture, where the brightness
is changing quickly.
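$\frac{\partial B}{\partial x}(336, 785) \approx \frac{185 - 187}{1}$
$\frac{\partial B}{\partial y}(336, 785) \approx \frac{179 - 187}{1}$
$\frac{\partial B}{\partial x}(340, 784) \approx \frac{97 - 139}{1}$
$\frac{\partial B}{\partial y}(340, 784) \approx \frac{72 - 139}{1}$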
Figure: A long gradient vector indicates a swift change in brightness. Its direction suggests the shape
of the edges.
Notice that the gradient is long near the edge of the iris in Mona Lisa’s eye. It is much shorter at a
point in the white of her eye. Moreover, the gradient at the edge of the iris is approximately normal to
the edge of her iris, because gradients are normal to level curves. This information can be used by an
algorithm to detect not only the location of the edges, but also their direction.
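The rise-over-run approximation is simple to sketch in code. The brightness values below are the ones that appear in the figure's difference quotients; the function name and the choice of a one-pixel step are assumptions made for illustration, and real edge detectors typically average over several neighbors.

```python
import math


def gradient_from_neighbors(center, x_neighbor, y_neighbor, step=1):
    """Approximate <dB/dx, dB/dy> by rise over run to one neighbor in each direction."""
    return ((x_neighbor - center) / step, (y_neighbor - center) / step)


# Brightness values read from the figure's difference quotients.
grad_a = gradient_from_neighbors(187, 185, 179)  # at pixel (336, 785)
grad_b = gradient_from_neighbors(139, 97, 72)    # at pixel (340, 784)

print(grad_a, math.hypot(*grad_a))  # short gradient: brightness changes slowly
print(grad_b, math.hypot(*grad_b))  # long gradient: likely near an edge
```

Thresholding the gradient length at each pixel is one simple way to mark candidate edge pixels.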
Application 5.4.8
Tangent Planes to a Level Surface
Use a gradient vector to find the equation of the tangent plane to the graph $x^2 + y^2 + z^2 = 14$ at
the point (2, 1, −3).
There are two solutions worth comparing here.
Solution 1
We can write z as a function of x and y and apply the tangent plane formula.
$$x^2 + y^2 + z^2 = 14$$
$$z^2 = 14 - x^2 - y^2$$
$$z = -\sqrt{14 - x^2 - y^2} \quad \text{($z = -3$ is on the negative branch of the function)}$$
$$f_x(x, y) = -\frac{1}{2\sqrt{14 - x^2 - y^2}}(-2x) \qquad f_x(2, 1) = \frac{2}{3}$$
$$f_y(x, y) = -\frac{1}{2\sqrt{14 - x^2 - y^2}}(-2y) \qquad f_y(2, 1) = \frac{1}{3}$$
Equation: $z + 3 = \frac{2}{3}(x - 2) + \frac{1}{3}(y - 1)$
Solution 2
We now have a normal vector ⃗n = ∇F (2, 1, −3). Our known point is (x0 , y0 , z0 ) = (2, 1, −3). The
normal equation of the plane is
Solution 2 requires more conceptual reasoning, but is computationally much easier. In fact, in
some cases we cannot use Solution 1 at all because we do not know how to solve for z. Once we are
comfortable with the concepts involved, the second method is generally superior for graphs of implicit
equations.
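The two solutions can be compared numerically. The sketch below assumes F(x, y, z) = x² + y² + z², so that ∇F = ⟨2x, 2y, 2z⟩, and checks that points produced by the Solution 1 equation also satisfy the normal equation built from the gradient. The function names are illustrative.

```python
def grad_F(x, y, z):
    """Gradient of F(x, y, z) = x^2 + y^2 + z^2."""
    return (2 * x, 2 * y, 2 * z)


def normal_equation_value(normal, known_point, q):
    """Left side of n . (r - r0) = 0; zero exactly when q is on the plane."""
    return sum(n * (qi - pi) for n, qi, pi in zip(normal, q, known_point))


def z_from_solution_1(x, y):
    """z on the tangent plane according to z + 3 = (2/3)(x - 2) + (1/3)(y - 1)."""
    return -3 + (2 / 3) * (x - 2) + (1 / 3) * (y - 1)


point = (2, 1, -3)
n = grad_F(*point)
print(n)  # (4, 2, -6)

# Points generated by Solution 1 should satisfy the gradient-based equation.
for x, y in [(2, 1), (5, 4), (0, -2)]:
    q = (x, y, z_from_solution_1(x, y))
    print(q, normal_equation_value(n, point, q))  # approximately 0 each time
```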
Main Idea
The graph of an implicit equation can be written as a level set of a function. The gradient of that
function is a normal vector to the level set and also to its tangent line/plane/hyperplane.
Section 5.4
Exercises
Summary Questions
5.4.1
c If you wanted to express the previous rate of change as an approximation of D⃗u g(0, 2), what
5.4.2
Q8 If $g(x, y) = \sqrt{6x^2 + 5y^4}$, what is $\nabla g(x, y)$?
Q9 If ∇f (x0 , y0 ) is orthogonal to ∇g(x0 , y0 ), what can we say about the level curves of f and g?
Be specific.
Q10 Harriet says “The gradient vector of f is tangent to the graph of z = f (x, y).”
“No,” says Marcus, “it is normal to the graph of z = f (x, y).” Who is correct?
5.4.3
b If ⃗u were not a unit vector, then ∇f · ⃗u would no longer represent rise over run. What would
it represent instead?
$$L(x, y) = 4 + 2(x + 3) - \frac{1}{3}(y - 9).$$
5.4.4
Q13 Given a function f (x, y) and a point (x, y), in what direction ⃗u is f decreasing fastest? Compute
Q14 If D⃗u f (x, y) < 0, what can you say about the directions of ∇f (x, y) and ⃗u?
Q15 If fx (3, 5) = fy (3, 5) in what direction(s) from (3, 5) could f increase most quickly?
Q16 Explain why it makes sense that if D⃗u f (a, b, c) = 0, then ⃗u is tangent to the level surface of f
Q17 If f (x, y, z) = 3xy + z 2 , find the unit vector ⃗u that maximizes D⃗u f (2, 1, −4). What is the value
5.4.5
Q19 If $\vec{u} = \left\langle \frac{2}{3}, -\frac{1}{3}, -\frac{2}{3} \right\rangle$ and $f(x, y, z) = xe^{yz}$, compute $D_{\vec{u}} f(3, 0, 4)$.
Q20 If $\vec{u} = \left\langle \frac{3}{7}, \frac{6}{7}, -\frac{2}{7} \right\rangle$ and $f(x, y, z) = xy + yz + zx$, compute $D_{\vec{u}} f(7, -7, 14)$.
Q21 If $\vec{u}$ is a unit vector in the direction of ⟨2, 3⟩ and $f(x, y) = x^2 + 3xy + 2$, calculate $D_{\vec{u}} f(-1, 4)$.
Q22 Compute the directional derivative of $g(x, y) = e^{x^2 - y}$ at (3, 7) in the direction of ⟨−12, 5⟩.
5.4.6
f (x, y) = 30
∇f (x, y) points in the positive y-direction
Q24 Some level curves of f are drawn below. Indicate the direction of the gradient of f at each
labelled point.
5.4.7
Q25 If ∇B(x0 , y0 ) = ⟨13, −17⟩, would you expect the pixels above (x0 , y0 ) to be brighter or dimmer
Q26 The brightness function on the Mona Lisa image ranges from 0 to 255. If we use adjacent points
to approximate the gradient as in the example, what is the longest gradient vector we could
theoretically produce?
5.4.8
Q28 Let P be a point on the circle $x^2 + y^2 = r^2$. Show that the position vector of P is normal to the
tangent line to the circle at P.
Q29 Produce an equation of the tangent plane to $z^3 - xz^2 - yx^2 = 24$ at (4, −2, 2).
Q30 Give an equation of the tangent plane to the graph $z^2 x + 2yz - x^2 y^2 = 59$ at (3, 2, 5).
Synthesis and Extension
Q31 Suppose f (x, y) is a differentiable function, and we know that for ⃗u = ⟨−0.6, 0.8⟩, D⃗u f (5, −1) =
4 and for ⃗v = ⟨0, −1⟩ we know that D⃗v f (5, −1) = −2. What is ∇f (5, −1)?
Q32 Suppose the point P = (x0 , y0 , z0 ) lies on the graph z = f (x, y).
b z = f (x, y) is a level surface of F (x, y, z) = f (x, y) − z. Use the gradient of F to write the
Q33 How could you use the gradient of f to rewrite the formula for the linearization L(x, y) of f (x, y)
at (x0 , y0 )?
Q34 Suppose f (x, y) is a differentiable function and ∇f (a, b) is not the zero vector. How many unit
vectors $\vec{u}$ exist such that $D_{\vec{u}} f(a, b) = 0$? How are they related geometrically?
Q35 Suppose f (x, y, z) is a differentiable function and ∇f (a, b, c) is not the zero vector. How many
unit vectors $\vec{u}$ exist such that $D_{\vec{u}} f(a, b, c) = 0$? How are they related geometrically?
Q36 Suppose that f (x, y, z) is a differentiable function, and f (3, 5, −2) = 13. Suppose further that
the vectors ⟨3, 1, 0⟩ and ⟨0, 2, 5⟩ both lie in the tangent plane to the surface f (x, y, z) = 13 at
(3, 5, −2). If the maximum value of D⃗u f (3, 5, −2) is 20, find all possible values of ∇f (3, 5, −2).
b What angle do these vectors $\vec{u}$ make with the tangent line to the level curve $h(x, y) = 8 + 12\sqrt{3}$ at (2, 3)?
a Give an equation of the level curve of f through the point (−1, 2).
b Give an equation of the tangent line to the level curve of f at (−1, 2). Write your equation
in normal form.
Section 5.5
Motivational Example
Suppose Jinteki Corporation makes widgets which it sells for $100 each. It commands a small enough
portion of the market that its production level does not affect the demand (price) for its products. If
W is the number of widgets produced and C is their operating cost, Jinteki’s profit is modeled by
P = 100W − C
The partial derivative $\frac{\partial P}{\partial W} = 100$ does not correctly calculate the effect of increasing production on
profit. How can we calculate this correctly?
Question 5.5.1
How Can We Visualize a Composition with a Multivariable Function?
You may recall parametric equations from high school algebra. A parametric equation actually
consists of two or more equations. Each expresses a variable in our coordinate system in terms of a
parameter t.
We can visualize a parametric equation as a particle traveling through space.
The variable t represents time.
x(t) and y(t) represent the coordinates of the position at time t.
The vector ⟨x′ (t), y ′ (t)⟩ represents velocity. It points in the direction of travel.
Figure: A particle whose position is defined by x(t) and y(t), the path it follows and its velocity vector
Given a function f (x, y) where x = x(t) and y = y(t), we can ask how f changes as t changes.
We can visualize this change by drawing the graph z = f (x, y) over the path given by the parametric
equations x(t) and y(t).
Figure: The composition f (x(t), y(t)), represented by the height of z = f (x, y) over the path
(x(t), y(t))
Question 5.5.2
How Do We Compute the Derivative of a Composition of Functions?
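Consider a differentiable function f (x, y). If we define x = x(t) and y = y(t), both differentiable functions,
we have
$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$$
or
$$\frac{df}{dt} = \nabla f(x, y) \cdot \langle x'(t), y'(t) \rangle$$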
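One way to build trust in the chain rule is to compare it against a direct difference quotient of the composition. In the sketch below, f(x, y) = x²y, x(t) = cos t and y(t) = t² are hypothetical choices, and the derivative of the composition is estimated with a central difference.

```python
import math


def f(x, y):
    return x**2 * y                 # hypothetical sample function


def grad_f(x, y):
    return (2 * x * y, x**2)        # <f_x, f_y>


def position(t):
    return (math.cos(t), t**2)      # hypothetical path (x(t), y(t))


def velocity(t):
    return (-math.sin(t), 2 * t)    # <x'(t), y'(t)>


def chain_rule(t):
    """df/dt as the dot product of the gradient with the velocity."""
    gx, gy = grad_f(*position(t))
    vx, vy = velocity(t)
    return gx * vx + gy * vy


def central_difference(t, h=1e-6):
    """Numerical derivative of the composition f(x(t), y(t))."""
    return (f(*position(t + h)) - f(*position(t - h))) / (2 * h)


print(chain_rule(1.0), central_difference(1.0))  # the two values should agree closely
```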
Remarks
$f(x(t), y(t))$ is a function (only) of t. Because of this, $\frac{df}{dt}$ is an ordinary derivative, not a partial
derivative.
$\frac{df}{dt}$ is not the slope of the composition graph.
$$\text{slope} = \frac{\text{rise in } z}{\text{run in } xy\text{-plane}}$$
$$\frac{df}{dt} = \frac{\text{rise in } z}{\text{change in } t}$$
The chain rule is easy to remember because of its similarity to the differential:
$$dz = \frac{\partial z}{\partial x}dx + \frac{\partial z}{\partial y}dy.$$
The proof is more complicated than just sticking a dt under each term.
Example 5.5.3
Using the Chain Rule
If P = R − C and we have R = 100w and $C = 3000 + 70w - 0.1w^2$, calculate $\frac{dP}{dw}$.
Solution
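$$\frac{\partial P}{\partial R} = 1 \qquad \frac{\partial P}{\partial C} = -1$$
$$\frac{dR}{dw} = 100 \qquad \frac{dC}{dw} = 70 - 0.2w$$
We plug these into the formula to get
$$\begin{aligned}
\frac{dP}{dw} &= (1)(100) + (-1)(70 - 0.2w) \\
&= 30 + 0.2w
\end{aligned}$$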
Remark
Notice we don’t need the chain rule when we have expressions for each function. We can write the
composition ourselves and take an ordinary derivative. In this example we could just differentiate
$P = 100w - (3000 + 70w - 0.1w^2)$.
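Both routes to $\frac{dP}{dw}$ can be compared in a few lines. This sketch uses the revenue and cost functions from Example 5.5.3; the function names and the sample production level w = 50 are illustrative choices.

```python
def revenue(w):
    return 100 * w


def cost(w):
    return 3000 + 70 * w - 0.1 * w**2


def profit(w):
    return revenue(w) - cost(w)


def dP_dw_chain_rule(w):
    """(dP/dR)(dR/dw) + (dP/dC)(dC/dw) with dP/dR = 1 and dP/dC = -1."""
    return (1) * (100) + (-1) * (70 - 0.2 * w)


def dP_dw_numeric(w, h=1e-3):
    """Central difference applied to the composed profit function."""
    return (profit(w + h) - profit(w - h)) / (2 * h)


print(dP_dw_chain_rule(50))  # 30 + 0.2 * 50 = 40
print(dP_dw_numeric(50))     # approximately 40
```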
Question 5.5.4
What If We Have More Variables?
The chain rule works just as well if x and y are functions of more than one variable. In this case it
computes partial derivatives.
Theorem
If f (x, y), x(s, t) and y(s, t), are all differentiable, then
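$$\frac{\partial f}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s}$$
or
$$\frac{\partial f}{\partial s} = \nabla f(x, y) \cdot \left\langle \frac{\partial x}{\partial s}, \frac{\partial y}{\partial s} \right\rangle$$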
Theorem
Given f (x, y, z), x(t), y(t) and z(t), all differentiable, we have
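$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt}$$
or
$$\frac{df}{dt} = \nabla f(x, y, z) \cdot \langle x'(t), y'(t), z'(t) \rangle$$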
Example 5.5.5
A Composition with More Variables
a Apply the chain rule to get an expression for $\frac{dP}{dT}$.
b What is $\frac{dn}{dT}$?
c What is $\frac{dT}{dT}$?
d Suppose that $\frac{dV}{dT} = (5.9 \times 10^{-6})V$. Calculate and simplify the expression you got for $\frac{dP}{dT}$.
Solution
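a $\frac{dP}{dT} = \frac{\partial P}{\partial T}\frac{dT}{dT} + \frac{\partial P}{\partial n}\frac{dn}{dT} + \frac{\partial P}{\partial V}\frac{dV}{dT}$
b The container is sealed so no molecules are getting in or out. $\frac{dn}{dT} = 0$.
c If we write T as a function of T, we get T = T. $\frac{dT}{dT} = 1$.
d We’ll compute the partial derivatives and then plug them into our chain rule expression.
$$\frac{\partial P}{\partial T} = \frac{nR}{V}$$
$$\frac{\partial P}{\partial V} = -\frac{nRT}{V^2}$$
$$\begin{aligned}
\frac{dP}{dT} &= \frac{nR}{V}(1) + 0 - \frac{nRT}{V^2}(5.9)(10^{-6})V \\
&= \frac{nR(1 - 0.0000059T)}{V}
\end{aligned}$$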
Example 5.5.6
A Composition with Limited Information
Suppose $g(p, q, r) = re^{p^2 q}$. Given that p, q, r are all differentiable functions of x with the values in
the following table, compute $\frac{dg}{dx}$ when x = 2.
x 0 1 2 3
p(x) 3 1 5 10
p′(x) −3 2 3 4
q(x) 6 2 −2 3
q′(x) −1 −5 2 3
r(x) 10 11 7 3
r′(x) 1 0 −1 −3
Solution
$$\frac{\partial g}{\partial p} = 2pqre^{p^2 q}$$
$$\frac{\partial g}{\partial q} = p^2 re^{p^2 q}$$
$$\frac{\partial g}{\partial r} = e^{p^2 q}$$
Now we plug in the partial derivatives, along with the derivatives of p, q and r from the table.
$$\frac{dg}{dx} = 2pqre^{p^2 q}(3) + p^2 re^{p^2 q}(2) + e^{p^2 q}(-1)$$
This is correct, but not sufficiently simplified. We have left p’s, q’s and r’s in the expression, but the
table tells us what values these have when x = 2. We can make these substitutions:
$$\begin{aligned}
\frac{dg}{dx} &= 2(5)(-2)(7)e^{(5)^2(-2)}(3) + (5)^2(7)e^{(5)^2(-2)}(2) + e^{(5)^2(-2)}(-1) \\
&= -420e^{-50} + 350e^{-50} - e^{-50} \\
&= -71e^{-50}
\end{aligned}$$
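The arithmetic in this example is easy to get wrong by hand, so a quick numerical check may be useful. The sketch below plugs the x = 2 column of the table into the chain rule; the function name `dg_dx` and its argument order are illustrative.

```python
import math


def dg_dx(p, q, r, dp, dq, dr):
    """Chain rule for g(p, q, r) = r * exp(p^2 * q), given values and derivatives."""
    e = math.exp(p**2 * q)
    g_p = 2 * p * q * r * e   # partial derivative with respect to p
    g_q = p**2 * r * e        # partial derivative with respect to q
    g_r = e                   # partial derivative with respect to r
    return g_p * dp + g_q * dq + g_r * dr


# Values from the x = 2 column of the table.
value = dg_dx(p=5, q=-2, r=7, dp=3, dq=2, dr=-1)
print(value, -71 * math.exp(-50))  # the two numbers should agree up to rounding
```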
Application 5.5.7
Implicit Differentiation
Recall that an implicit equation in n variables is a level set of an n-variable function. Consider the
graph $x^3 + y^2 - 4xy = 0$. How can we use this to calculate $\frac{dy}{dx}$ at the point (3, 3)?
Solution
First, note that (3, 3) does lie on the graph. When we plug x = 3 and y = 3 into our equation, we get
27 + 9 − 36 = 0, which is true. Now suppose that for every x near 3, we can define y(x) to be the y
coordinate on the graph $x^3 + y^2 - 4xy = 0$.
Define $F(x, y) = x^3 + y^2 - 4xy$. The points (x, y(x)) lie on the graph F (x, y) = 0. We can use this
equation to obtain an expression for $\frac{dy}{dx}$. When we differentiate F (x, y(x)), both components change as
x changes, so we cannot use a partial derivative. We need the chain rule.
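$$\begin{aligned}
F(x, y(x)) &= 0 \\
\frac{d}{dx} F(x, y(x)) &= \frac{d}{dx} 0 && \text{differentiate both sides} \\
\frac{\partial F}{\partial x}\frac{dx}{dx} + \frac{\partial F}{\partial y}\frac{dy}{dx} &= 0 && \text{apply chain rule} \\
\frac{\partial F}{\partial x} + \frac{\partial F}{\partial y}\frac{dy}{dx} &= 0 && \frac{dx}{dx} = 1 \\
\frac{\partial F}{\partial y}\frac{dy}{dx} &= -\frac{\partial F}{\partial x} && \text{solve for } \frac{dy}{dx} \\
\frac{dy}{dx} &= -\frac{\partial F / \partial x}{\partial F / \partial y}
\end{aligned}$$
We compute the partial derivatives at (3, 3), then plug them into the formula we derived.
$$F_x(x, y) = 3x^2 - 4y \qquad F_x(3, 3) = 15$$
$$F_y(x, y) = 2y - 4x \qquad F_y(3, 3) = -6$$
$$\frac{dy}{dx} = -\frac{15}{-6} = \frac{5}{2}$$
Figure: The graph of $F(x, y) = x^3 + y^2 - 4xy = 0$, its tangent line at (3, 3), and the gradient of F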
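For this particular equation the result can be double-checked, because y² − 4xy + x³ = 0 is a quadratic in y. Solving it gives $y = 2x \pm \sqrt{4x^2 - x^3}$, and the branch through (3, 3) uses the minus sign. The sketch below compares the implicit-differentiation slope with a numerical derivative of that branch; the explicit branch is a convenience that only works for this example, and the function names are illustrative.

```python
import math


def F_x(x, y):
    return 3 * x**2 - 4 * y


def F_y(x, y):
    return 2 * y - 4 * x


def slope_implicit(x, y):
    """dy/dx = -F_x / F_y, valid where F_y is nonzero."""
    return -F_x(x, y) / F_y(x, y)


def y_branch(x):
    """Explicit branch of x^3 + y^2 - 4xy = 0 that passes through (3, 3)."""
    return 2 * x - math.sqrt(4 * x**2 - x**3)


def slope_numeric(x, h=1e-5):
    return (y_branch(x + h) - y_branch(x - h)) / (2 * h)


print(y_branch(3.0))           # 3.0, so the branch passes through (3, 3)
print(slope_implicit(3, 3))    # 2.5
print(slope_numeric(3.0))      # approximately 2.5
```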
Main Ideas
$\frac{dy}{dx}$ is the slope of the tangent line to F (x, y) = c.
The chain rule allows us to derive $\frac{dy}{dx} = -\frac{F_x}{F_y}$.
$-\frac{F_x}{F_y}$ is the negative reciprocal of $\frac{F_y}{F_x}$, which is the slope of ∇F.
In order to solve for $\frac{dy}{dx}$ we had to assume that y was a differentiable function of x. How do we
know that’s even true? There is an advanced and powerful theorem that tells us when we can write one
variable in an implicit equation as a function of the others. Here is the two-variable version.
Then there is a function y = f (x) that agrees with the graph of F (x, y) = c in some neighborhood
around (x0 , y0 ). Furthermore
1 f is continuous
2 f is differentiable
3 $f'(x_0) = -\dfrac{F_x(x_0, y_0)}{F_y(x_0, y_0)}$
In the case of our example, the partial derivatives in question are polynomials. As long as $F_y(x_0, y_0) \neq 0$,
we are guaranteed that our graph has a tangent line at $(x_0, y_0)$, and its slope is $-\dfrac{F_x(x_0, y_0)}{F_y(x_0, y_0)}$.
Application 5.5.8
Indirect Profit Functions
Suppose a firm chooses how much quantity q to produce, but their profit Π(q, α) depends on some
parameter α outside their control (maybe a tax or a measure of regulatory burden). The firm, once
it knows the value of α, will choose the q that maximizes profit. How will their profit change as α
changes?
Solution
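The change in the firm’s profit is $\frac{d\Pi}{d\alpha}$. Since q is also a function of α, we will need the chain rule.
$$\frac{d\Pi}{d\alpha} = \frac{\partial \Pi}{\partial q}\frac{dq}{d\alpha} + \frac{\partial \Pi}{\partial \alpha}\frac{d\alpha}{d\alpha}$$
We can substitute $\frac{d\alpha}{d\alpha} = 1$. We can also argue that $\frac{\partial \Pi}{\partial q} = 0$. Why? Because q is the choice that
maximizes profit, and maximums occur at critical points. If $\frac{\partial \Pi}{\partial q} > 0$ then the firm could increase q to
increase profit (without changing α, which it has no control over). Similarly, if $\frac{\partial \Pi}{\partial q} < 0$ then reducing
production would increase profit.
Performing these substitutions gives:
$$\frac{d\Pi}{d\alpha} = \frac{\partial \Pi}{\partial \alpha}$$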
This suggests that in this case, the total derivative is equal to the partial derivative.
We can verify this equality graphically as well. Pick a particular α0 and let q0 = q(α0 ). Notice:
The graph Π(q0, α) is never above Π(q(α), α) for any α, since q(α) is the optimal choice of q.
The graphs Π(q0, α) and Π(q(α), α) meet at α0, since q0 = q(α0).
If two graphs meet but one stays below the other, they are tangent. They have the same tangent
line and thus the same derivative.
Figure: Two graphs of z = Π(q, α), one where q changes to be the optimal choice for each α and one
where q is fixed at q0 , the optimal choice for α0
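The equality of the total and partial derivatives can be tested on a concrete case. The sketch below is only an illustration: the profit function Π(q, α) = (10 − α)q − q² is a hypothetical choice that does not come from the text, and `a` stands in for α. It estimates both derivatives with difference quotients and also checks that the fixed-q graph never rises above the optimized one.

```python
# Hypothetical profit function, chosen only for illustration (not from the text):
# Pi(q, a) = (10 - a) * q - q**2, where a plays the role of alpha.
def profit(q, a):
    return (10 - a) * q - q**2

def best_q(a):
    """Profit-maximizing q for a given a: solves dPi/dq = (10 - a) - 2q = 0."""
    return (10 - a) / 2

def optimal_profit(a):
    """Profit when the firm re-optimizes q for each value of a."""
    return profit(best_q(a), a)

a0 = 4.0
q0 = best_q(a0)
h = 1e-6

# Total derivative: q is re-optimized as a changes.
total = (optimal_profit(a0 + h) - optimal_profit(a0 - h)) / (2 * h)
# Partial derivative: q is held fixed at q0.
partial = (profit(q0, a0 + h) - profit(q0, a0 - h)) / (2 * h)

print("total derivative:  ", total)
print("partial derivative:", partial)

# The fixed-q graph should never be above the optimized graph.
never_above = all(
    profit(q0, a) <= optimal_profit(a) + 1e-12
    for a in [k / 10 for k in range(0, 100)]
)
print("fixed-q graph never above:", never_above)
```

For this particular Π both derivatives equal −q0, so the two printed estimates should match closely.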
Remark
If we had an expression for q(α) and an expression for Π, we could substitute and use ordinary differentiation. Since we did not, we needed the chain rule. Even with such an expression, to find $\frac{d\Pi}{d\alpha}$ directly
we would need to
1 Solve for q as a function of α
Section 5.5
Exercises
Summary Questions
Q2 Explain why $\frac{df}{dt}$ cannot be interpreted as a slope of f over the xy-plane.
Q3 What is the difference between $\frac{dz}{dx}$ and $\frac{\partial z}{\partial x}$? How is the first one computed?
5.5.1
x(t) = 3 + 5t
y(t) = −2 + 4t
Q6 Consider the curve defined by
x(t) = t
y(t) = et
x(t) = t
y(t) = f (t)
seem to produce?
x(t) = 2 cos t
y(t) = 3 sin t
What is the speed (magnitude of velocity) at $t = \frac{\pi}{3}$?
x(t) = t3
y(t) = t2
Q9 Is the graph of
x(t) = t2
y(t) = sin(t)
the graph of a function? How can you tell without graphing it?
Q10 How are the graphs of the following two parametric equations related? Can you generalize your
answer to similar pairs of parametric equations?
5.5.2
Q11 Let f(x, y) be a function. Under what conditions is $\frac{df}{dt}$ equal to the directional derivative of f in
Q12 Liam says “If f is a function of x and y and x and y are increasing, then f is increasing.” We
all know Liam is incorrect. How could we use the chain rule to refute him?
5.5.3
Q13 The angular speed of an object is given by $\omega = \frac{v}{r}$ where r is the distance from the center of
rotation and v is the linear speed. Suppose an object is orbiting Earth at a radius of 8400000 m
and a speed of 6900 m/s. If the radius is increasing at a rate of 100 m/s and the linear speed is
decreasing by 60 m/s², how quickly is the angular speed changing?
a Compute $\frac{df}{dt}$ using the multivariable chain rule.
b Compute $\frac{df}{dt}$ by substituting and using single-variable differentiation.
c What earlier rule of differentiation can we recover by applying the chain rule to f (x, y) = xy?
5.5.4
Q15 Suppose h(x1, x2, x3, x4) is a four-variable function and each xi(s, t) is a function of parameters
s and t. How would the multivariable chain rule compute $\frac{\partial h}{\partial t}$?
Q16 Suppose k(x) is a function and x(r, s, t) is a function of parameters r, s, and t. How does the
multivariable chain rule say we should compute $\frac{\partial k}{\partial r}$?
5.5.5
Q17 Angular momentum is given by L = rmv where r is the radius of rotation, m is the mass of the
object, and v is its linear speed. At a certain time t0, r is 42 million meters and increasing at
80,000 meters per second, m is 6000 kg and not changing, and v is 3100 m/s and increasing at
20 m/s². How quickly is angular momentum increasing?
Q18 Let $f(x, y) = x^2 - y^2$. If x(r, θ) = r cos θ and y(r, θ) = r sin θ, compute $\frac{\partial f}{\partial \theta}$
at $(r, \theta) = \left(4, \frac{\pi}{6}\right)$.
5.5.6
Q19 Suppose x(t) and y(t) are differentiable functions of t such that
If $f(x, y) = ye^{(x^2 y)}$, show how to compute $\frac{df}{dt}$ at t = 2.
$x = 3 \qquad y = 1 \qquad \frac{dx}{dt} = 5 \qquad \frac{dy}{dt} = 2$
If $g(x, y) = 3xy^2 - x^2 + 2y$, compute $\left.\frac{dg}{dt}\right|_{t=2}$.
5.5.7
Q21 Compute $\frac{dy}{dx}$ at (4, 2), if x and y satisfy $y^3 - xy + x^2 - 4 = 0$
Q22 Compute $\frac{dy}{dx}$ at (3, 0), if x and y satisfy $xe^{xy} = 3$
Q25 Angular momentum is given by L = rmv. One law of physics states that angular momentum of
an object is conserved (unchanged) unless a force (besides gravity) acts to speed up or slow
down the object. Use the chain rule to derive an expression for $\frac{dv}{dr}$, the amount of linear speed
an object gains or loses per unit that its radius of rotation increases. What do you notice about
the role of mass in your answer?
Q26 Another principle in physics is the conservation of energy. Kinetic energy is given by $E = \frac{1}{2}mv^2$,
where m is the mass and v is the linear speed of the object. Suppose that we have a rock
drifting through space. Suppose it impacts stationary rocks and the combined mass sticks
together (without releasing any energy as heat, light or sound). Thus the mass of the total
travelling object increases, while the total energy stays the same. Derive an expression for how
speed changes per unit of increase in mass.
5.5.8
Q27 Suppose that x is a function of t and that when t = 9, we have x = 7 and $\frac{dx}{dt} = -3$. Define
√
f (x, t) = x + t.
a Compute the partial derivative $\frac{\partial f}{\partial t}(7, 9)$.
b Compute the total derivative $\frac{df}{dt}(7, 9)$.
c In a few sentences, explain what these two quantities compute and why they are different
from each other.
Q28 A firm with a monopoly gets to set the price of its products and decide how much to
produce. There is a demand function p such that if the firm produces q units, it must set its
price at p(q) to get consumers to buy all of its production. Each unit costs c to produce. The
profit function of the firm is
π(q, c) = p(q)q − cq
We can assume that once the firm has worked out what c is, it chooses the q to maximize profit.
How much will the firm’s actual profit change per unit of increase in c?
Synthesis and Extension
Q29 Find the slope of the tangent line to x2 + 2x − y 2 = 8 at (5, −3) using each of the following two
methods.
a Using a gradient vector to write the normal equation of the line and solving for the slope.
x(t) = t2
y(t) = 3 − t
$z(t) = \sqrt{t}$
Q31 Here is a diagram of the level curves of h(x, y) for certain values of c.
b Add a vector to the diagram that indicates the direction of greatest increase of h at (−2, 0).
c Suppose x = 4 − 5t and $y = 3t^2$. Determine, with the aid of a relevant calculation, whether
$\frac{dh}{dt}$ is positive or negative at t = 1.
a Give an equation of the level curve of f through the point (1, −1).
b Give an equation of the tangent plane to z = f (x, y) at the point (1, −1, −14).
c Use the differential of f to estimate how much the z value of z = f (x, y) would change from
(1, −1, −14), if x increased by 3 and y decreased by 1. If you don’t remember differential
notation, you may use another notation for partial credit.
Section 5.6
3 Use the Extreme Value Theorem to find the global maximum and global minimum of a function
over a closed set.
Functions can be used to model a variety of real-world quantities: a company’s profit, a disease’s
infection rate, or the impact of a government program. In these cases, the most pressing question is:
what choice of independent variables will maximize or minimize the value of the function? Answering
this question was one of the headline applications of single-variable calculus. In this section we will
generalize those methods to functions of multiple variables.
Question 5.6.1
What Are Local Extremes?
The local extremes of a function are the local minimums and maximums.
Definition
Question 5.6.2
Where Do Local Extremes Lie?
At a local maximum (or minimum) D⃗u f cannot be positive (or negative) in any direction. Thus at
a local extreme, ∇f (P ) = ⃗0, the zero vector. In other words, all the partial derivatives of f are 0 at P .
In the case of a two-variable function, we can visualize this condition. If $f_x(P) \neq 0$, then we could
travel in the x direction to increase or decrease f. If $f_y(P) \neq 0$, then we could travel in the y direction
to increase or decrease f. Thus at a local maximum or local minimum, the tangent plane must be
horizontal.
This argument works anywhere that ∇f exists. That motivates the following definition:
Definition
2 ∇f (P ) does not exist (because one of the partial derivatives does not exist).
Theorem
The local maximums and minimums of a function can only occur at critical points.
Example 5.6.3
Finding Critical Points
Solution
We know the minimum value exists, so it must lie at a critical point. We compute
∇f (x, y) = ⟨4x + 4, 2y − 6⟩
One type of critical point is where this is undefined, but no value of (x, y) makes these expressions
undefined. The other type of critical point occurs when these components are 0. We can solve that
system of equations.
4x + 4 = 0 2y − 6 = 0
x = −1 y=3
The only point that satisfies this requirement is (−1, 3). Since there is only one critical point, and the
promised minimum lies at a critical point, (−1, 3) must be that point. The minimum value is
Question 5.6.4
How Do We Identify Two-Variable Local Maximums and Minimums?
Once we have found a critical point, how do we know whether it is a local minimum, a local maximum
or neither? Consider a function f(x, y) and a critical point P. There are two possibilities for ∇f(P). In
the case that ∇f(P) does not exist, calculus can be of no further use to us. If ∇f(P) = ⟨0, 0⟩, there are
a few different shapes the graph could take. Since we are working with two variables, we can visualize
these shapes.
A critical point could be a local maximum. In this case f curves downward in every direction.
A critical point could be a local minimum. In this case f curves upward in every direction.
A critical point could be neither. f curves upward in some directions but downward in others. This
configuration is called a saddle point.
Curvature is measured by the second derivatives. This matches our experience with single-variable
critical points, where the second derivative test classifies critical points as local maximums or local
minimums. We have a similar test for two-variable functions, though the computation is more involved.
Theorem [The Second Derivatives Test]
Definition
The quantity D in the second derivatives test is actually the determinant of a matrix called the Hessian
of f .
\[ f_{xx}(P) f_{yy}(P) - [f_{xy}(P)]^2 = \det \underbrace{\begin{bmatrix} f_{xx}(P) & f_{xy}(P) \\ f_{yx}(P) & f_{yy}(P) \end{bmatrix}}_{Hf(P)} \]
Hf follows a logical pattern and can be a useful mnemonic for the second derivatives test.
Example 5.6.5
Classifying a Critical Point
Solution
b For the second derivatives test, we need to compute fxx , fxy and fyy at (0, 0).
= (−4)(−1) − (−1)2
=3
Since D > 0 and fxx < 0, (0, 0) is a local maximum of f .
Why does the final determination between maximum and minimum rely on fxx (P ) instead of fyy (P )?
Actually it doesn’t matter which we test. In order for D to be positive, fxx (P ) and fyy (P ) must have
the same sign.
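The decision procedure of the second derivatives test is short enough to write out as code. This is a sketch under stated assumptions: the function name `classify` is illustrative, the branches follow the standard statement of the test (D > 0 with f_xx > 0 gives a local minimum, D > 0 with f_xx < 0 gives a local maximum, D < 0 gives a saddle point, D = 0 is inconclusive), and the first call uses the values −4, −1, −1 read off the computation in Example 5.6.5, where assigning −4 to f_xx is an assumption.

```python
def classify(fxx, fxy, fyy):
    """Second derivatives test, given second partials evaluated at a critical point."""
    D = fxx * fyy - fxy**2  # determinant of the Hessian
    if D > 0:
        # D > 0 forces fxx and fyy to share a sign, so testing fxx is enough.
        return "local maximum" if fxx < 0 else "local minimum"
    if D < 0:
        return "saddle point"
    return "inconclusive"

print(classify(-4, -1, -1))  # values from the example above
print(classify(2, 0, -2))    # hypothetical values with D < 0
print(classify(1, 1, 1))     # hypothetical values with D = 0
```

Because a positive D forces f_xx and f_yy to have the same sign, swapping the roles of f_xx and f_yy in the first branch would give the same classification.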
Question 5.6.6
How Do We Find Global Extremes?
The second derivatives test can categorize local extremes, but what about a global extreme?
Definition
In a real-world application, we are much more interested in finding global extremes than local ones.
Many abstract functions do not even have global extremes. $y = e^x$ has no global maximum. It increases
without bound. $y = \frac{1}{x^2}$ has no global minimum. It approaches 0 but never reaches it. The following
theorem guarantees that certain functions will have global extremes for us to try to find.
A continuous function f on a closed and bounded domain D has a global maximum and a global
minimum somewhere in D.
Two of the words in this theorem have not been defined yet. Here are their definitions.
Definition
D is bounded if there is some upper limit to how far its points get from the origin (or any other
fixed point). If there are points of D arbitrarily far from the origin, then D is unbounded.
For one-variable functions, the EVT requires that the domain be a union of finite, closed intervals
(and maybe finitely many isolated points).
In 2-space, we can get a better sense of what these requirements mean. The boundary of D is
the set of points from which you can find points in D and points outside D arbitrarily close by. The
boundary of a disc is a circle. If the disc includes the circle, it is closed. If it does not include the circle,
it is not closed.
Containing part of the boundary is not enough. Any missing point means that D is not closed. Even
removing an isolated point from the interior of D is a problem. That point is arbitrarily close to points
in D. It is also arbitrarily close to a point outside D, itself. Thus it is a boundary point not contained
in D, and D is not closed.
Figure: −2 ≤ x ≤ 2 and −3 < y < 3 is not closed.
Figure: −2 ≤ x ≤ 2 and −3 ≤ y ≤ 3 and (x, y) ≠ (1, 2) is not closed.
Bounded regions are easier to understand. If we can enclose the region in a sufficiently large circle,
it is bounded. If it stretches outside any circle we would draw around it, then it is unbounded.
Example 5.6.7
Finding a Global Maximum
\[ D = \{ \underbrace{(x, y)}_{\text{points in } \mathbb{R}^2} : \underbrace{x^2 + y^2 \leq 16,\ x \leq 0}_{\text{conditions}} \} \]
Remark
{type of objects in the set : conditions that those objects must satisfy}
is used throughout mathematics, because it is so flexible. It can denote sets of numbers, points,
functions, vectors or any other objects.
Solution
These are never undefined, so there are no critical points of that type. The only critical points
will be where both partial derivatives are 0.
0 = 2x − 2xy 0 = 4y − x2
0 = 2x(1 − y) (factor 2x − 2xy)
x=0 or y = 1
We should be careful not to lose track of the logic. The x = ±2 solution goes with the y = 1
case. The y = 0 solution goes with the x = 0 case. Mixing these up will give invalid solutions.
You can always plug in a pair (x, y) to verify that it satisfies the system of equations.
We conclude that (0, 0), (2, 1) and (−2, 1) are the critical points, but (2, 1) is not in the domain,
so we discard it.
c No. Recall our method for maximizing single variable functions on a closed interval. The maximum
can occur at the endpoint of the interval without being detected by the derivative.
The same is true here. If the maximum is on the boundary of D, the gradient need not be 0. In
the single-variable case, we only need to test the endpoints (by evaluating f there). There are
infinitely many points on the boundary of D. Evaluating f on all of them is not an option. With
graphing software we can see that the maximum occurs on the boundary somewhere in the third
quadrant, but how can we solve for it exactly?
d To narrow down the search for a maximum on the boundary of D, we will use the boundary
equations to write an expression for f that is valid only on the boundary. We can find the critical
points of this expression, and rule out any point that is not a critical point.
x2 + y 2 = 16
x2 = 16 − y 2
= y 3 + y 2 − 16y + 16
f ′ (y) = 3y 2 + 2y − 16
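\[ y = -\frac{8}{3} \qquad y = 2 \]
\[ x^2 + \left(-\frac{8}{3}\right)^2 = 16 \qquad x^2 + 2^2 = 16 \qquad \text{(substitute into } x^2 + y^2 = 16\text{)} \]
\[ x^2 = 16 - \frac{64}{9} \qquad x^2 = 16 - 4 \]
\[ x = -\sqrt{\frac{80}{9}} \qquad x = -\sqrt{12} \qquad \text{(+ solutions are not in } D\text{)} \]
Our critical points are $\left(-\sqrt{\frac{80}{9}}, -\frac{8}{3}\right)$ and $\left(-\sqrt{12}, 2\right)$. This component of the boundary also
ends at (0, 4) and (0, −4), so the maximum might lie there.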
We can now argue that one of the points we have found is the maximum.
(0, −4).
One of these must be the case. To figure out which it is, we can evaluate f at each point and see
which produces the largest value.
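The final comparison can be sketched in a few lines of code. The formula f(x, y) = x² + 2y² − x²y is taken from the place where this example is revisited with Lagrange multipliers, and its partial derivatives match the ones used above. The labels and the helper names `grad_f` and `in_D` are illustrative only.

```python
import math

def f(x, y):
    return x**2 + 2*y**2 - x**2 * y

def grad_f(x, y):
    """Partial derivatives used in the example: f_x = 2x - 2xy, f_y = 4y - x^2."""
    return (2*x - 2*x*y, 4*y - x**2)

def in_D(x, y):
    """Membership in D = {(x, y) : x^2 + y^2 <= 16, x <= 0}, with a small tolerance."""
    return x**2 + y**2 <= 16 + 1e-9 and x <= 0

# Solutions of grad f = 0, before discarding those outside D.
gradient_zeros = [(0.0, 0.0), (2.0, 1.0), (-2.0, 1.0)]
kept = [p for p in gradient_zeros if grad_f(*p) == (0.0, 0.0) and in_D(*p)]

candidates = {
    "critical point (0, 0)": kept[0],
    "critical point (-2, 1)": kept[1],
    "arc, y = -8/3": (-math.sqrt(80 / 9), -8 / 3),
    "arc, y = 2": (-math.sqrt(12), 2.0),
    "endpoint (0, 4)": (0.0, 4.0),
    "endpoint (0, -4)": (0.0, -4.0),
}

values = {name: f(x, y) for name, (x, y) in candidates.items()}
for name, value in values.items():
    print(f"{name}: {value:.4f}")

best = max(values, key=values.get)
print("largest value at:", best)
```

The largest value in this list occurs at a boundary point with negative x and negative y, which agrees with the earlier observation that the maximum lies on the boundary in the third quadrant.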
Main Ideas
If the Extreme Value Theorem applies, then all we need to do is find the critical points and evaluate
f at each. One is guaranteed to be the maximum, and one is guaranteed to be the minimum.
∇f = ⃗0 will detect critical points on the interior, but not on the boundary.
We can rewrite the function on a boundary component using substitution. Set the derivative equal
to 0 to find critical points.
Derivatives will not detect maximums at the endpoints of a boundary curve. These must be
included in your set of critical points.
Section 5.6
Exercises
Summary Questions
Q1 Where must the local maximums and minimums of a function occur? Why does this make sense?
Q3 What hypotheses does the Extreme Value Theorem require? What does it tell us?
Q4 Assuming a maximum and minimum exist, where must you look in a domain to be sure you find
them?
5.6.1
Q5 Raina claims that (0, 0) is the maximum of f (x, y) = x2 − y 2 − 10xy. Disprove her claim without
using calculus.
Q7 Suppose $g(x, y) = e^{f(x,y)}$. If (a, b) is a local minimum of f(x, y), is it also a local minimum of
Q8 Does a constant function have any local maximums? Justify your answer with the definition of
local maximum.
5.6.2
Q9 Suppose ∇f (4, 2) = ⟨−5, 11⟩. Where would you travel from (4, 2) to find higher values of f ?
Q10 The function f (x, y) = |x| + |y| has its global minimum at (0, 0). Is this a critical point? Explain.
Q11 If (a, b) produces the minimum value of |∇f(x, y)|, must (a, b) be a critical point? Explain.
Q12 Suppose f (x) is a function of x with critical points x = a and x = b. Suppose g(y) is a function
of y with critical points y = c and y = d. What are the critical points of h(x, y) = f (x) + g(y)?
5.6.3
5.6.4
Q15 If (x0, y0) is a critical point and $f_{xx}(x_0, y_0) = 0$, can (x0, y0) be a local maximum of f? What
Q16 For what values of a does f (x, y) = x2 + y 2 + axy have a local minimum at the origin?
5.6.5
Q17 Find the critical points of h(x, y) = x2 y − x2 − 2y 2 . Classify each as a local maximum, local
minimum, or saddle point.
Q18 Find all critical points of $f(x, y) = \frac{1}{3}x^3 - 4xy + 2y^2$. Classify them as local maximums, local
minimums, or saddle points.
Q19 Compute the critical points of f (x, y) = 2x3 − 12xy + 3y 2 and classify each as a local maximum,
local minimum, or saddle point.
Q20 Let h(x, y) = x2 + y 3 + 3xy. Find the critical points of h, and classify each as a local maximum,
local minimum or saddle point.
Q21 Let f (x, y) = x3 − 15x2 − 9x + 12xy − 3y 2 − 18y. Find the critical points of f and classify each
one as local maximum, local minimum or saddle point.
Q22 Let f (x, y) = x5 + 20xy + 5y 2 . Find the critical points of f and classify each one as local
maximum, local minimum or saddle point.
Q23 Find the critical points of $g(x, y) = e^{x^3 + y^2 - 12x + 10y}$. Classify each one as local maximum, local
minimum or saddle point.
Q24 Find the critical points of $f(x, y) = \frac{1}{x^4 - x^2 y + y^2 + 10}$. Classify each one as local maximum, local
minimum or saddle point.
5.6.6
Q26 Draw a sketch of D = {(x, y) : y ≥ x, y ≤ 2x, xy < 1}. State whether D is closed and whether
D is bounded.
Q27 Draw a sketch of D = {(x, y) : x > 0, y ≥ x4 }. State whether D is closed and whether D is
bounded.
Q28 Draw a sketch of D = {(x, y) : − 1 < x2 + y 2 ≤ 16}. State whether D is closed and whether
D is bounded.
Q29 Let D = {(x, y) : y ≥ x2 }. Can the Extreme Value Theorem guarantee that f has a maximum
on D? Explain.
Q30 Does the function $f(x, y) = \frac{1}{x^2 + y^2}$ have a maximum and minimum value on the domain D =
{(x, y) : −3 ≤ x ≤ 3, −4 ≤ y ≤ 4}? If yes, find them. If not, explain why the extreme value
theorem does not apply.
5.6.7
Q31 Draw a careful diagram of D = {(x, y) : y ≥ x2 , x2 + y 2 ≤ 20}. Where would you need to
check to guarantee you’d find the maximum value of a continuous function f on D?
D = {(x, y) : y ≥ x2 − 4, x ≥ 0, y ≤ 5}.
b Does the Extreme Value Theorem guarantee that f has an absolute minimum on D? Explain.
c List all the places you would need to check in order to locate the minimum.
Q33 Find the maximum and minimum value of f (x, y) = ex+3y in the triangle with vertices (0, 0),
Q34 Find the maximum and minimum value of f (x, y) = 3x + y on D, the closed region bounded by
y = x2 and y = 16.
Q35 Find the global max and min of f (x, y) = x3 − 12x + y 3 − 3y on the rectangle 0 ≤ x ≤ 4 and
−2 ≤ y ≤ 2.
Q36 Consider the function $g(x, y) = \frac{x^4 - 2x^2 + 2}{y^2 - 2y + 2}$ on the rectangle −2 ≤ x ≤ 2 and 0 ≤ y ≤ 3.
a Does the extreme value theorem apply to this function? Why might you be concerned, and
what would you have to check?
b What does the second derivatives test say about the critical points of f ?
c Can you classify the critical points using algebra instead? Explain.
Q38 If g(x) is an increasing function, explain why the local maximums and minimums of any f (x, y)
are the same as the maximums and minimums of g(f (x, y)).
Section 5.7
Lagrange Multipliers
Goals:
Many of the functions we studied do not have maximum values. Polynomials and exponential
functions increase without bound. Yet in the real world, we never see corporations producing infinite
quantities of goods. We never see infinite populations of animals. Does this mean that polynomials and
exponentials have no real-world applications? On the contrary, they are ubiquitous, but the corporations
and populations that operate under these models also have constraints on their inputs.
Corporations do not have infinite money to invest. Animals do not have infinite food sources. In
this section we develop the tools to find maximum and minimum values of a function, when our inputs
are constrained.
Question 5.7.1
What Is a Constraint?
Sometimes we aren’t interested in the maximum value of f(x, y) over the whole domain. Instead, we want
to restrict to only those points that satisfy a certain constraint equation.
Question 5.7.2
How Do We Solve a Constrained Optimization?
Theorem
Suppose an objective function f (x, y) and a constraint function g(x, y) are differentiable. The local
extremes of f (x, y) given the constraint g(x, y) = c occur where
∇f = λ∇g
for some number λ, or else where ∇g = 0. The number λ is called a Lagrange Multiplier.
Figure: Where ∇f is not parallel to ∇g, we can travel along g(x, y) = c and increase the value of f .
This is because D⃗u f > 0 for some ⃗u along the constraint.
By this argument, the only place a maximum or minimum of the objective function can lie on the
constraint is where D⃗u f would have to be 0, because ∇f is parallel to ∇g.
Remark
When ∇f(P) is parallel to ∇g(P) (and neither of these vectors is ⃗0), the level curve of f through P
is tangent to the level curve g(x, y) = c. If we can draw the level curves of f, this gives us a visual
method of identifying the potential maximums and minimums.
Example 5.7.3
The Maximum on a Curve
Find the point(s) on the ellipse 4x2 + y 2 = 4 on which the function f (x, y) = xy is maximized.
The EVT and constraints
Are we guaranteed that a maximum exists at all? The Extreme Value Theorem can still be applied to
constraints. Here are a few ways we can identify that a constraint is closed:
Solution
This tells us the only possible locations for the maximum are:
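\[ (x, y) = \left(\pm\frac{1}{\sqrt{2}}, \pm\sqrt{2}\right) \]
\[ f\left(\frac{1}{\sqrt{2}}, \sqrt{2}\right) = 1 \qquad f\left(-\frac{1}{\sqrt{2}}, \sqrt{2}\right) = -1 \]
\[ f\left(-\frac{1}{\sqrt{2}}, -\sqrt{2}\right) = 1 \qquad f\left(\frac{1}{\sqrt{2}}, -\sqrt{2}\right) = -1 \]
We conclude that the maximum occurs at $\left(\frac{1}{\sqrt{2}}, \sqrt{2}\right)$ and $\left(-\frac{1}{\sqrt{2}}, -\sqrt{2}\right)$.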
Figure: The four points that satisfy ∇f = λ∇g and g(x, y) = c.
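A brute-force scan of the constraint gives an independent check on this answer. The sketch below assumes the parametrization x = cos t, y = 2 sin t of the ellipse 4x² + y² = 4, which is not used in the text, and the helper `parallel_residual` is an illustrative name. It looks for the largest value of f along the curve and then tests whether ∇f and ∇g are parallel there.

```python
import math

def f(x, y):
    return x * y

def parallel_residual(x, y):
    """f_x * g_y - f_y * g_x for f = xy and g = 4x^2 + y^2; zero when the gradients are parallel."""
    fx, fy = y, x
    gx, gy = 8 * x, 2 * y
    return fx * gy - fy * gx

# Walk around the ellipse 4x^2 + y^2 = 4 using x = cos t, y = 2 sin t.
n = 100_000
best_value, best_point = -math.inf, None
for k in range(n):
    t = 2 * math.pi * k / n
    x, y = math.cos(t), 2 * math.sin(t)
    if f(x, y) > best_value:
        best_value, best_point = f(x, y), (x, y)

print("largest f found:", best_value)
print("at the point:   ", best_point)
print("residual there: ", parallel_residual(*best_point))
```

The scan should land on one of the two maximizers found above, with x and y of the same sign, and the residual should be close to zero.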
Main Idea
The level set of a continuous (constraint) function is always closed. If it is also bounded and the
objective function is differentiable, then one of the points produced by Lagrange multipliers will be the
global maximum and one will be the global minimum of the constrained optimization.
Example 5.7.4
The Maximum on a Surface
Find the maximum value of the function f (x, y, z) = x4 y 4 z on the sphere x2 + y 2 + z 2 = 36.
Figure: The gradient vector and level surface of a constraint function and the gradient vector of the
objective function
Solution
First note that the EVT applies, since a sphere is closed and bounded and f is continuous. To identify
potential maximums, we appeal to Lagrange multipliers.
Set g(x, y, z) = x2 +y 2 +z 2 . Then ∇g(x, y, z) = ⟨2x, 2y, 2z⟩. The case ∇g(x, y, z) = ⃗0 only occurs
at the origin, which is not on the sphere. The critical points must be only the points where ∇f = λ∇g.
$\nabla f(x, y, z) = \langle 4x^3 y^4 z,\ 4x^4 y^3 z,\ x^4 y^4 \rangle$.
Equating each coordinate gives us three equations, and the constraint is a fourth. We thus have a
system of four equations and four variables.
This gives us 8 critical points: (±4, ±4, ±2). In addition every point in the x = 0 cross section of
the sphere is a critical point, as is every point in the y = 0 cross-section. This is infinitely many points
to evaluate, but fortunately the algebra of our objective function allows us to evaluate these points in
large batches.
if x = 0: $f(x, y, z) = 0^4 y^4 z = 0$
if y = 0: $f(x, y, z) = x^4 0^4 z = 0$
Thus the maximum value is $2^{17}$. It occurs at the four points (±4, ±4, 2).
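The eight critical points can be checked with exact integer arithmetic. This is a small sketch, with helper names (`grad_f`, `grad_g`, `cross`) chosen for illustration: it confirms that each point lies on the sphere, that ∇f and ∇g are parallel there (their cross product is the zero vector), and that the largest value of f among them is 4⁴ · 4⁴ · 2.

```python
from itertools import product

def f(x, y, z):
    return x**4 * y**4 * z

def grad_f(x, y, z):
    return (4 * x**3 * y**4 * z, 4 * x**4 * y**3 * z, x**4 * y**4)

def grad_g(x, y, z):
    """Gradient of the constraint function g = x^2 + y^2 + z^2."""
    return (2 * x, 2 * y, 2 * z)

def cross(u, v):
    """Cross product; it is the zero vector exactly when u and v are parallel."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

points = [(sx * 4, sy * 4, sz * 2) for sx, sy, sz in product((1, -1), repeat=3)]

on_sphere = all(x**2 + y**2 + z**2 == 36 for x, y, z in points)
parallel = all(cross(grad_f(*p), grad_g(*p)) == (0, 0, 0) for p in points)
largest = max(f(*p) for p in points)
maximizers = [p for p in points if f(*p) == largest]

print("all on the sphere:", on_sphere)
print("gradients parallel:", parallel)
print("largest value:", largest, "which equals 2**17:", largest == 2**17)
print("maximizers:", maximizers)
```

The maximizers are the four points with z = 2, since the sign of f is the sign of z.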
Remark
If we hadn’t seen how to avoid dividing by x, y and z, we could have gone ahead and done the division.
Remember that when you divide while solving an equation, you obtain an extra solution where the divisor
is 0. This would lead us to check x = 0, y = 0 and z = 0 as we did in the factoring solution.
Synthesis 5.7.5
Using the Extreme Value Theorem and Lagrange Multipliers
How can Lagrange multipliers help us find the maximum of f (x, y) = x2 + 2y 2 − x2 y on the domain
Solution
We can continue Example 5.6.7. After finding the critical points of f at (0, 0) and (−2, 1), we turn to the
boundaries. The boundaries are level curves.
For x2 + y 2 = 16, set g(x, y) = x2 + y 2 = 16. We have
∇g(x, y) = ⃗0 only at the origin, which isn’t on the constraint. So we solve ∇f (x, y) = λ∇g(x, y)
and g(x, y) = 16.
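\[ \text{if } y = -\frac{8}{3} \qquad\qquad \text{if } y = 2 \]
\[ x^2 + \left(-\frac{8}{3}\right)^2 = 16 \qquad x^2 + 2^2 = 16 \]
\[ x^2 + \frac{64}{9} = \frac{144}{9} \qquad x^2 = 12 \]
\[ x^2 = \frac{80}{9} \qquad x = \pm\sqrt{12} \]
\[ x = \pm\sqrt{\frac{80}{9}} \]
The critical points are (0, ±4), $\left(-\sqrt{12}, 2\right)$ and $\left(-\sqrt{\frac{80}{9}}, -\frac{8}{3}\right)$. The solutions with positive x are
not in D.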
On x = 0, substitution is probably the easier choice, but Lagrange multipliers are still possible.
x = 0 is a level set of the function g(x, y) = x.
∇g(x, y) = ⟨1, 0⟩
2x − 2xy = λ 4y − x2 = 0 x=0
4y = 0
Main Idea
To find the absolute minimum and maximum of a differentiable function f (x, y) over a closed and
bounded domain D:
1 Compute ∇f and find the critical points inside D.
2 Identify the boundary components. Find the critical points on each using substitution or Lagrange
multipliers.
3 Identify the endpoints of each boundary component.
4 Evaluate f(x, y) at all of the above. The minimum is the lowest number, the maximum is the
highest.
Synthesis 5.7.6
The Gradient on the Boundary
Suppose P is a critical point of f on a boundary component of a domain D. What does the direction
of ∇f (P ) tell us about whether P is a maximum or minimum?
Figure: The critical points and gradient vectors of f (x, y) on a closed and bounded domain
Solution
First suppose ∇f (P ) points into D. Then f increases as we travel into D. Thus P cannot be a local
maximum.
P may be a local minimum but may not be. The directional derivative along the boundary is 0, so f
could curve upward or downward along the boundary. If f curves downward we could find lower values
of f nearby and P would not be a minimum. If f curves upward, then P would be a minimum. We
could compute this curvature by taking the substituted version of f that we used to solve for P and
computing its second derivative at P .
On the other hand, if we suppose that ∇f(P) points out of D, then f decreases as we travel into
D, and P cannot be a local minimum. It may or may not be a local maximum.
Question 5.7.7
Can This Lagrange Apply to More Than One Constraint?
If we have two constraints in three-space, g(x, y, z) = c and h(x, y, z) = d, then their intersection
is generally a curve.
According to our earlier argument about directional derivatives, at a maximum P on the constraint,
∇f (P ) must be normal to the constraint. There are more ways for this to happen with two constraint
equations.
1 ∇f (P ) could be parallel to ∇g(P ).
2 ∇f (P ) could be parallel to ∇h(P ).
3 ∇f (P ) could be the vector sum of a vector parallel to ∇g(P ) and a vector parallel to ∇h(P ).
You should look at Figure 380 to convince yourself that these ∇f (P ) would all be normal to the
constraint. We can express this condition algebraically
Theorem
Suppose f(x, y, z) is a differentiable function and g(x, y, z) = c and h(x, y, z) = d are two constraints. If P is
a maximum of f(x, y, z) among the points that satisfy these constraints then either
∇f (P ) = λ∇g(P ) + µ∇h(P )
Remark
You can check the reasonableness of this method by noting that it gives us a system of 5 variables, x,
y, z, λ, µ, and five equations:
We therefore generally expect this system to have a finite number of solutions, though there are plenty
of counterexamples to this expectation.
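A concrete instance may make the two-constraint condition easier to trust. The problem below is a hypothetical illustration that does not come from the text: f(x, y, z) = x + 2y + 3z with constraints g = x − y + z = 1 and h = x² + y² = 1. Solving the five equations by hand gives λ = 3, µ = √29/2 and the point P used in the sketch, which then checks numerically that ∇f(P) = λ∇g(P) + µ∇h(P) and that both constraints hold.

```python
import math

# Hypothetical example (not from the text):
# f = x + 2y + 3z, subject to g = x - y + z = 1 and h = x^2 + y^2 = 1.
s = math.sqrt(29)
P = (-2 / s, 5 / s, 1 + 7 / s)   # candidate point found by hand
lam, mu = 3.0, s / 2             # multipliers found by hand

grad_f = (1.0, 2.0, 3.0)
grad_g = (1.0, -1.0, 1.0)
grad_h = (2 * P[0], 2 * P[1], 0.0)

# Right-hand side of the condition: lam * grad g + mu * grad h.
combo = tuple(lam * a + mu * b for a, b in zip(grad_g, grad_h))
residual = max(abs(c - d) for c, d in zip(combo, grad_f))

g_value = P[0] - P[1] + P[2]
h_value = P[0]**2 + P[1]**2
f_value = P[0] + 2 * P[1] + 3 * P[2]

print("largest component mismatch:", residual)
print("g(P) =", g_value, " h(P) =", h_value)
print("f(P) =", f_value)
```

The three component equations of ∇f = λ∇g + µ∇h together with the two constraint equations are the five equations in the five unknowns x, y, z, λ and µ.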
Section 5.7
Exercises
Summary Questions
Q1 What is a constraint?
Q2 What equations do you write when you apply the method of Lagrange multipliers?
Q3 Is the set of points that satisfies a constraint closed and bounded? Explain.
Q4 How does a constraint arise when finding the maximum over a closed and bounded domain?
5.7.1
Q5 Suppose we have $230 to spend on three goods. Good 1 costs $13 per unit. Good 2 costs $22
per unit. Good 3 costs $11 per unit. Write a budget constraint that expresses what purchases
(x, y, z) of good 1, good 2 and good 3 are possible, if you spend your budget.
Q6 Suppose the maximum value of f(x, y) occurs at (3, −4). Where is the maximum value of f(x, y)
5.7.2
Q7 Suppose f (x, y, z) is a smooth function. Suppose the maximum value of f on the sphere x2 +
y 2 + z 2 = 25 occurs at P . What can you say about ∇f (P ) and the tangent plane to the sphere
at P ?
Q8 Suppose the curve below is the graph of g(x, y) = k. Use methods from calculus to find and
mark the approximate location of the point that maximizes the function f (x, y) = 3y − x subject
to the constraint g(x, y) = k. Justify your reasoning in a few sentences.
Q9 Suppose that (a, b) is a local maximum of the smooth function f (x, y) which also happens to
a Is (a, b) also a local maximum of f among the points on the constraint? Explain.
b If we used Lagrange multipliers to detect (a, b), what would we expect λ to be equal to at
that point?
Q10 Show that (3, 3) is not a local maximum of f (x, y) = 2x2 − 4xy + y 2 − 8x on the graph
x3 + y 3 = 6xy.
5.7.3
a What system of equations would you set up to find the critical points of f on the constraint
p(x, y) = c?
5.7.4
Q13 Find the maximum value of f (x, y, z) = xyz on the sphere x2 + y 2 + z 2 = 36.
Q15 Find the maximum value of f (x, y, z) = 3y + 2z on the ellipsoid 25x2 + y 2 + 4z 2 = 100.
5.7.5
Q17 Suppose f (x, y) is differentiable but has no critical points. Will the method of Lagrange multipliers
b Which question takes less work to solve? Explain how you know.
c Do solutions exist to both questions? What additional information would guarantee that
they do?
Q19 Let D = {(x, y) : x2 + y 2 ≤ 1, x ≥ 0, y ≤ 0}. Find the maximum and minimum values of
f (x, y) = x2 − y on D.
Q20 Consider the function f (x, y) = x2 + 6xy + 9y 2 + 5. Find the maximum and minimum values of
Q21 Let D = {(x, y) : x2 + y 2 ≤ 20, y ≥ −x}. Find the maximum and minimum values of
f (x, y) = x4 y on D.
Q22 Let D = {(x, y) : x2 + y 2 ≤ 25, y ≥ x + 1, y ≥ 0}. Find the maximum and minimum values of
f (x, y) = x3 y 2 on D.
Q23 Let D = {(x, y) : x2 + y 2 ≤ 20, y ≥ −x}. Find the maximum and minimum values of
f (x, y) = x4 y on D.
Q24 Let $D = \left\{(x, y) : \frac{x^2}{16} + \frac{y^2}{64} \leq 1,\ x \geq 0\right\}$. Find the points in D that obtain the maximum and
minimum values of f(x, y) = 2x + 3y.
5.7.6
D = {(x, y) | g(x, y) ≤ c}
occurs at P on the boundary of D. We know that ∇f (P ) points out of D. What does this tell
us about the sign of λ?
Q26 Explain why knowing which way ∇f points is not useful for ruling out potential maximums given
5.7.7
Q27 How does the method of Lagrange multipliers suggest we solve for the maximum value of f (x, y)
on the constraints x + y = 1 and x − y = 0? Do we need to know what f is to solve this? Why
shouldn’t that bother us?
Q28 Write a system of equations that one would solve to find the maximum and minimum values of
a Use Lagrange multipliers to find the point A on p that is closest to the origin O.
b Show that $\overrightarrow{OA}$ is a normal vector to p.
c Show how you can use the observation in b to solve for the closest point (A) without using
calculus.
Q30 Determine the smallest rectangle (parallel to the x and y axes) that contains the ellipse
$x^2 + 3xy + 4y^2 - 4x - 13y + 4 = 0$.
Q31 An aquarium with an open top has volume $20\ \text{m}^3$. Its rectangular base is made of slate, and its
sides are made of glass. Slate costs five times as much (per unit area) as glass. Set up and solve
a constrained optimization problem to find the dimensions (ℓ, w, h) of the aquarium
that will minimize the cost of materials.
f (x, y) = xy − 3y − 6x.
a Does f have a maximum and minimum value on D? What tool can you use to verify this?
What did you need to check before applying this tool?
b Find the maximum and minimum values of f on D. Demonstrate in your work that you’ve
checked all the relevant places for potential maximums.
Q33 Find the maximum and minimum values of $f(x, y) = 2x^2 + 2xy + 5y^2$ on the ellipse
$x^2 + 4y^2 = 106$.
Chapter 6
Multivariable Integration
This chapter introduces integration of functions of more than one variable. It also introduces joint
probability distributions as an application.
Contents
6.1 Double Integrals
6.2 Double Integrals over General Regions
6.3 Joint Probability Distributions
6.4 Triple Integrals
Section 6.1
Double Integrals
Goals:
In single-variable calculus, the definite integral computes a total change from a rate of change.
Moreover, it also solves a geometry problem. This connection means that we can use geometric intuition
to understand integrals better. Integrals of multi-variable functions also allow us to aggregate rates into
totals. To build our intuition, we begin with the geometric problem that they solve.
Question 6.1.1
How Do We Approximate the Volume Under z = f (x, y)?
Figure: The volume under z = f (x, y) and the prisms that approximate it
If A is the area of each subrectangle, and $(x_i^*, y_i^*)$ is the test point in the ith subrectangle, then our
approximation is
$$\text{Volume} \approx \sum_{i=1}^{n} f(x_i^*, y_i^*)\,A.$$
If our domain is not a rectangle, we may not be able to divide it into subrectangles. Luckily, the
formula for volume of a prism works for any shape base. We can still compute
$$\text{Volume} \approx \sum_{i=1}^{n} f(x_i^*, y_i^*)\,A_i.$$
Notice that instead of a single variable A for the area of all subregions, we need a different area for
each. For each i, Ai denotes the area of the ith subregion.
For a reasonably well-behaved function f(x, y), the actual volume can be computed by taking a limit
of these approximations. We call this limit the double integral.
Definition
Remark
The diameter of a region is the distance between its two most distant points. Sending the largest
diameter to 0 ensures that all of the regions’ diameters shrink to 0.
Notice that we do not take the limit as the area goes to 0. If only the areas approach 0, the regions
could become long and thin. The test points could all be chosen from one end of the domain which is
unrepresentative of the whole.
Example 6.1.2
Approximating a Double Integral
Consider $\iint_D x^2 y\,dA$, where D is the region shown here. Approximate the integral using the division
of D shown, and evaluating f(x, y) at the midpoint of each rectangle.
Figure: the region D and its division into rectangles (x-axis ticks at 1 and 2)
Solution
$A = (1)(0.5) = 0.5.$
$$\begin{aligned}
\text{Volume} &\approx \sum_{i=1}^{4} f(x_i^*, y_i^*)\,A \\
&\approx A \sum_{i=1}^{4} f(x_i^*, y_i^*) \\
&\approx A\,\bigl(f(0.5, 0.25) + f(0.5, 0.75) + f(1.5, 0.25) + f(1.5, 0.75)\bigr)
\end{aligned}$$
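The sum in this solution can be evaluated directly. The sketch below assumes, based on the test points listed in the solution, that D is the rectangle 0 ≤ x ≤ 2, 0 ≤ y ≤ 1 split into four 1-by-0.5 rectangles. The names `f`, `midpoints` and `approximation` are chosen here for illustration.

```python
# Midpoint approximation of the double integral of x^2 * y over D.
# Assumption: D = [0, 2] x [0, 1], divided into four 1-by-0.5 rectangles.

def f(x, y):
    return x**2 * y

A = 1 * 0.5  # area of each subrectangle
midpoints = [(0.5, 0.25), (0.5, 0.75), (1.5, 0.25), (1.5, 0.75)]

# Factor the common area A out of the sum, as in the solution.
approximation = A * sum(f(x, y) for x, y in midpoints)
print(approximation)
```

Under these assumptions the four prisms have a combined volume of 1.25.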
Question 6.1.3
How Do We Evaluate Double Integrals?
We already know another way of computing a volume. We can compute the area of the cross sections
perpendicular to the x-axis. Let the function A(x) denote this area at each x. Then
$$\text{Volume} = \int_a^b A(x)\,dx$$
A(x) is itself the area under a curve. In a particular cross section, x is constant, and f (x, y) is a function
of y. The area below this graph is the integral
$$A(x) = \int_c^d f(x, y)\,dy$$
We can put these together to obtain an iterated integral, an integral whose integrand is itself an
integral.
This method computes the same signed volume as the double integral we defined. The formal
argument that they are equivalent is called Fubini’s theorem.
where a and b are the x bounds of D, and c and d are the y bounds of the cross section at each x.
Alternately, we can write
$$\iint_D f(x, y)\,dA = \int_c^d \left( \int_a^b f(x, y)\,dx \right) dy$$
where c and d are the y bounds of D, and a and b are the x bounds of the cross section at each y.
Notation
In some cases, rather than figuring out what a, b, c and d are, we will use a hybrid notation. It indicates
a particular order of integration but does not go into details about the bounds of x and y.
$$\iint_D f(x, y)\,dy\,dx.$$
Example 6.1.4
Using Fubini’s Theorem
Compute $\iint_D x^2 y\,dA$, where D is the region shown here:
Figure: the region D (x-axis ticks at 1 and 2)
Solution
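The x bounds of this region are 0 ≤ x ≤ 2. The y bounds are 0 ≤ y ≤ 1. We rewrite this as an
iterated integral and solve:
$$\begin{aligned}
\iint_D x^2 y\,dA &= \int_0^2 \int_0^1 x^2 y\,dy\,dx && \text{(Fubini's theorem)} \\
&= \int_0^2 \left. \frac{x^2 y^2}{2} \right|_{y=0}^{1} dx && \text{(FTC on the inner integral)} \\
&= \int_0^2 \left( \frac{x^2 1^2}{2} - \frac{x^2 0^2}{2} \right) dx && \text{(plug in y values)} \\
&= \int_0^2 \frac{x^2}{2}\,dx \\
&= \left. \frac{x^3}{6} \right|_{x=0}^{2} \\
&= \frac{8}{6} - \frac{0}{6} \\
&= \frac{4}{3}
\end{aligned}$$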
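Since the double integral is defined as a limit of approximations, finer grids of midpoint test points should approach the exact value 4/3. The following is a minimal sketch of that idea; `midpoint_sum` is a hypothetical helper and the grid sizes are arbitrary choices.

```python
# Midpoint Riemann sums for f(x, y) = x^2 * y on [0, 2] x [0, 1].
# The helper name and grid sizes are assumptions made for illustration.

def midpoint_sum(f, a, b, c, d, n):
    """Approximate the double integral of f over [a, b] x [c, d] on an n-by-n grid."""
    dx = (b - a) / n
    dy = (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx          # midpoint in x
        for j in range(n):
            y = c + (j + 0.5) * dy      # midpoint in y
            total += f(x, y) * dx * dy  # height times base area
    return total

def f(x, y):
    return x**2 * y

for n in (2, 10, 100):
    print(n, midpoint_sum(f, 0, 2, 0, 1, n))
```

The printed values should move toward 4/3 as n grows, which is consistent with the iterated integral computed by hand.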
Question 6.1.5
Can We Break a Double Integral into a Product of Single Integrals?
In general, we can’t expect to factor out the inner integral of $\iint_D f(x, y)\,dy\,dx$ (using the constant
multiple rule). The y-bounds may depend on x, and the y terms may not factor out of the integrand.
However, for certain functions and domains, this factoring is possible.
Theorem
$$\int_a^b \int_c^d f(x)\,g(y)\,dy\,dx = \left( \int_a^b f(x)\,dx \right) \left( \int_c^d g(y)\,dy \right)$$
We won’t be able to use this theorem all the time. It has two important requirements:
1 The bounds of integration (a, b, c, d) are constants. We’ll see integrals soon where this is not the
case.
2 The integrand can be factored into a function of x times a function of y. Most two-variable
functions cannot.
Example 6.1.6
Integrating a Product
Figure: the region D (x-axis ticks at 1 and 2)
Solution
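$\iint_D x^2 y\,dA = \int_0^2 \int_0^1 x^2 y\,dy\,dx$ has constant bounds and the integrand can factor as $(x^2)(y)$. The
product theorem applies:
$$\begin{aligned}
\int_0^2 \int_0^1 x^2 y\,dy\,dx &= \left( \int_0^2 x^2\,dx \right) \left( \int_0^1 y\,dy \right) \\
&= \left( \left. \frac{x^3}{3} \right|_0^2 \right) \left( \left. \frac{y^2}{2} \right|_0^1 \right) \\
&= \left( \frac{2^3}{3} - \frac{0^3}{3} \right) \left( \frac{1^2}{2} - \frac{0^2}{2} \right) \\
&= \left( \frac{8}{3} \right) \left( \frac{1}{2} \right) \\
&= \frac{4}{3}
\end{aligned}$$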
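The product theorem has a discrete counterpart: when the integrand is f(x)g(y) and the grid is rectangular, a double Riemann sum factors into a product of two single sums. The sketch below illustrates this for the example above; the helper `midpoints` and the grid size 50 are assumptions made for illustration.

```python
# Discrete analogue of the product theorem on [0, 2] x [0, 1]:
# a midpoint sum of f(x) * g(y) factors into a product of two 1-D sums.

def midpoints(a, b, n):
    """Return the n midpoints of equal subintervals of [a, b] and their width."""
    h = (b - a) / n
    return [a + (i + 0.5) * h for i in range(n)], h

xs, dx = midpoints(0, 2, 50)
ys, dy = midpoints(0, 1, 50)

# Double sum over every (x, y) pair of test points.
double_sum = sum(x**2 * y * dx * dy for x in xs for y in ys)

# Product of a sum in x and a sum in y.
product_of_sums = sum(x**2 * dx for x in xs) * sum(y * dy for y in ys)

print(double_sum, product_of_sums)
```

The two printed numbers should agree up to rounding, and both should be close to 4/3. The factoring would fail if the y test points depended on x, which mirrors the first requirement of the theorem.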
Remark
The product decomposition does not save us much work in most cases, but it can help us avoid mixing
up the variables.
Application 6.1.7
Rates (per Area)
Double integrals can compute a total from a rate per unit of area. Integrating rainfall per square
kilometer gives the total rain that fell in a watershed.
Integrating watts per square meter on a solar array gives the total power generated.
https://commons.wikimedia.org/w/index.php?curid=70132767
Application 6.1.8
Probability
If we generate a data set in which we have measured two variables, then the probability that a
random data point lies in a given region is the double integral of a joint density function over that
area.
Figure: A highly correlated set of observations and an uncorrelated joint density function
Section 6.1
Exercises
Summary Questions
Q2 What formula do we use to compute the exact volume under a graph z = f (x, y)?
Q4 What conditions do you need in order to write a double integral as a product of single integrals?
6.1.1
Q5 Suppose that we are approximating the volume under z = f (x, y) over T , the triangle with
vertices (0, 0), (2, 0) and (0, 1). We’d like to use subregions about 0.25 units long per side. Here
are two options:
Cover as much of T with square prisms as possible, use triangular prisms in the remaining
spots.
Cover as much of T with square prisms as possible, and just forget about the remaining
space.
a Draw a diagram of where the squares and triangles could reasonably be placed.
b Suppose the side length of the squares shrinks to be arbitrarily small. Explain why it does
not matter which of the two options we use in these approximations.
S = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}
Suppose we approximate $\iint_S x\,dA$ by n prisms whose bases are rectangles of length 1 in the
x-direction and width $\frac{1}{n}$ in the y-direction.
a How could you pick test points in each rectangle to ensure that the value of this approximation
is 0, no matter what n is?
b How could you pick test points in each rectangle to ensure that the value of this approximation
is 1, no matter what n is?
c Does the fact that both of these approximations are possible no matter how many rectangles
we use mean that $\iint_S x\,dA$ does not exist? Explain.
6.1.2
Q7 Show how to approximate the integral $\int_0^6 \int_3^{12} xy\,dy\,dx$ using six 3 unit by 3 unit squares and
using their lower right corners as test points. You do not need to simplify the arithmetic.
Q8 Approximate the value of $\int_0^4 \int_{-2}^4 \sin^2(\pi xy)\,dy\,dx$ by dividing the domain into an array of 4 rect-
a Show how to approximate the integral using six 2 unit by 2 unit squares and using their lower
right corners as test points. You do not need to simplify the arithmetic.
b Explain how you can tell whether your approximation in a is an overestimate or underestimate
without computing the actual value of the integral.
Q10 Let T be the triangle with vertices (0, 0), (1, 0) and (0, 2). Show how to approximate $\iint_T e^{x+y}\,dA$
by dividing T into four right triangles with legs of length 1 and $\frac{1}{2}$. Use the midpoint of the
hypotenuses as the test points.
6.1.3
R = {(x, y) : 0 ≤ x ≤ 5, 0 ≤ y ≤ 3}.
Let S be the solid region above R and below the graph $z = y^2 \sin \pi x + 9$. What is the area of
the y = 2 cross-section of S?
R = {(x, y) : −2 ≤ x ≤ 2, −1 ≤ y ≤ 1}.
Let S be the solid region above R and below the graph $z = x^2 y + xy^2$. Write a function A(x)
which gives the area of the cross section of S perpendicular to the x-axis at each value of x.
6.1.4
R = {(x, y) : 0 ≤ x ≤ 5, 0 ≤ y ≤ 3}.
Compute $\iint_R y^2 \sin \pi x + 9\,dA$
R = {(x, y) : −2 ≤ x ≤ 2, −1 ≤ y ≤ 1}.
Compute $\iint_R x^2 y + xy^2\,dA$.
Q15 Evaluate $\int_4^5 \int_0^3 y e^x\,dy\,dx$.
Q16 Evaluate $\int_0^{10} \int_2^4 y^3 - x\,dy\,dx$.
6.1.5
R = {(x, y) : − a ≤ x ≤ b, c ≤ y ≤ d}.
Let S be the solid region above R and below the graph z = f (x)g(y). Write a function A(x)
which gives the area of the cross section of S perpendicular to the x-axis at each value of x.
Explain why you can factor the f (x) out of this integral.
R = {(x, y) : −2 ≤ x ≤ 2, −1 ≤ y ≤ 1}.
Explain why the product decomposition theorem does not apply to $\iint_R x^2 y + xy^2\,dA$.
6.1.6
R = {(x, y) : 0 ≤ x ≤ 5, 0 ≤ y ≤ 3}.
Write $\iint_R y^2 \sin \pi x\,dA$ as a product of two single-variable integrals.
Q20 Write $\int_{-3}^{3} \int_2^5 \frac{1}{y^2}\,dy\,dx$ as a product of two single-variable integrals.
6.1.7
Q21 A corrugated metal sheet has density of $d(x, y) = 3 + \sin 2x$ kg/m$^2$. What is the mass of the
Q22 The shadow of a tree passes over part of a solar panel each day, covering the bottom of the panel
more of the day than the top. The rate of daily energy generation per unit of area at the point
(x, y) is given by $p(x, y) = 8 \sin\left(y + \frac{\pi}{3}\right)$ kilowatt hours per square meter. Compute the total
energy generated per day by the panel whose bounds (in meters) are given by 0 ≤ x ≤ 1 and
$0 \le y \le \frac{\pi}{6}$.
Q23 Suppose we wanted to compute the volume above z = f (x, y) and below z = g(x, y) over the
rectangle
R = {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d}.
What double integral would compute this volume?
$$\int_a^b \int_c^d f(x, y)\,dy\,dx$$
by rectangles sampled from either upper-left, upper-right, lower-left or lower-right corners. If you
are told that fx (x, y) < 0 at all points (x, y), what does that tell you about which approximations
are larger than which?
Section 6.2
Double Integrals over General Regions
So far, we have computed double integrals over rectangular domains. In this section, we consider
double integrals over more complicated domains.
Example 6.2.1
Integrating Over a Polygon
Let D be the triangle with vertices (0, 0), (4, 0) and (4, 2). Calculate
$$\iint_D 4xy\,dA$$
Solution
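The naive approach would be to use the x and y bounds of D to write the integral
$$\iint_D 4xy\,dA = \int_0^4 \int_0^2 4xy\,dy\,dx$$
However, these bounds describe the rectangle 0 ≤ x ≤ 4, 0 ≤ y ≤ 2, not the triangle D. In the cross
section of D at each x, y runs from 0 up to the line $y = \frac{1}{2}x$, so the correct integral is
$$\iint_D 4xy\,dA = \int_0^4 \int_0^{\frac{1}{2}x} 4xy\,dy\,dx$$
This may appear harder to solve, but it isn’t. The only difference is that when we apply the fundamental
theorem of calculus to the inner integral, we plug in an expression instead of a number.
$$\begin{aligned}
\iint_D 4xy\,dA &= \int_0^4 \int_0^{\frac{1}{2}x} 4xy\,dy\,dx && \text{(Fubini's theorem)} \\
&= \int_0^4 \left. 2xy^2 \right|_{y=0}^{\frac{1}{2}x} dx && \text{(FTC)} \\
&= \int_0^4 2x \left( \frac{1}{2}x \right)^2 - 2x(0)^2\,dx \\
&= \int_0^4 \frac{x^3}{2}\,dx \\
&= \left. \frac{x^4}{8} \right|_{x=0}^{4} && \text{(FTC again)} \\
&= \frac{4^4}{8} - \frac{0^4}{8} \\
&= 32
\end{aligned}$$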
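The effect of the bounds can be checked numerically. The sketch below approximates an iterated integral whose inner bounds are functions of x; `iterated_integral` is a hypothetical helper written for this illustration, not a library routine.

```python
# Iterated midpoint sum with inner bounds that may depend on x.
# iterated_integral is a hypothetical helper, not part of any library.

def iterated_integral(f, a, b, lower, upper, n=400):
    """Approximate the integral of f over a <= x <= b, lower(x) <= y <= upper(x)."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        c, d = lower(x), upper(x)      # y bounds of this cross section
        dy = (d - c) / n
        inner = sum(f(x, c + (j + 0.5) * dy) for j in range(n)) * dy
        total += inner * dx
    return total

def f(x, y):
    return 4 * x * y

# Constant bounds 0 <= y <= 2 describe a rectangle, not the triangle.
naive = iterated_integral(f, 0, 4, lambda x: 0, lambda x: 2)

# Bounds 0 <= y <= x / 2 describe the triangle D.
triangle = iterated_integral(f, 0, 4, lambda x: 0, lambda x: x / 2)

print(naive, triangle)
```

The constant bounds give roughly 64, the integral over the whole rectangle, while the variable upper bound gives roughly 32.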
Main Idea
2 Find the functions (of the form y = g(x)) which define the top and bottom of the domain. These
functions are the bounds of the inner integral.
Question 6.2.2
What Are the Integral Laws for Double Integrals?
Some single variable integral laws apply to double integrals as well (provided the integrals exist).
1 The sum rule:
$$\iint_D f(x, y) + g(x, y)\,dA = \iint_D f(x, y)\,dA + \iint_D g(x, y)\,dA$$
Example 6.2.3
A Region Without a (Single) Bottom Curve
Let D be the region bounded by $y = \sqrt{x}$, y = 0 and y = x − 6. Calculate
$$\iint_D (x + y)\,dA.$$
We begin by finding the intersections of these graphs. There are three pairs of graphs to solve for.
$$\sqrt{x} = x - 6 \qquad\qquad 0 = \sqrt{x} \qquad\qquad 0 = x - 6$$
Squaring both sides of the first equation gives
$$\begin{aligned}
0 &= x^2 - 13x + 36 \\
0 &= (x - 4)(x - 9) \\
x &= 4 \text{ or } x = 9
\end{aligned}$$
When we square both sides of an equation we have to check our solutions. x = 4 does not satisfy
$\sqrt{x} = x - 6$ but x = 9 does. Look at the graph of these functions. There is not a single y lower bound
that applies to all cross sections of this region. For some values of x, the lower bound lies on y = 0.
For others it lies on y = x − 6. We will present three solutions to this problem. We’ll only evaluate the
last one.
Solution 1
Using the third integral law, we break up D into two subdomains, each of which has a single bottom
curve. The break happens at x = 6 since that is where y = 0 meets y = x − 6.
$$\int_0^6 \int_0^{\sqrt{x}} (x + y)\,dy\,dx + \int_6^9 \int_{x-6}^{\sqrt{x}} (x + y)\,dy\,dx$$
Solution 2
D can be written as the region between y = 0 and $y = \sqrt{x}$ with a triangle removed. We can use this
to write $\iint_D (x + y)\,dA$ as a difference of two integrals.
$$\int_0^9 \int_0^{\sqrt{x}} (x + y)\,dy\,dx - \int_6^9 \int_0^{x-6} (x + y)\,dy\,dx$$
Solution 3
Instead of taking cross sections perpendicular to the x-axis we can take cross sections perpendicular
to the y-axis. In this case, we need to know the x bounds of each cross section (as a function of
y). Drawing the horizontal line segments through D at each y, we see that the upper x-bound lies on
y = x − 6 and the lower x bound lies on $y = \sqrt{x}$. We need to write these x values as functions of y so
we solve them for x:
$$\begin{aligned}
y &= \sqrt{x} & y &= x - 6 \\
y^2 &= x & y + 6 &= x
\end{aligned}$$
The lower y bound for the region is y = 0. The upper y bound is the intersection of $y = \sqrt{x}$ and
y = x − 6, where x = 9 and y = 3. Thus we can write
$$\begin{aligned}
\iint_D (x + y)\,dA &= \int_0^3 \int_{y^2}^{y+6} (x + y)\,dx\,dy \\
&= \int_0^3 \left. \frac{x^2}{2} + xy \right|_{x=y^2}^{y+6} dy \\
&= \int_0^3 \frac{y^2 + 12y + 36}{2} + y^2 + 6y - \frac{y^4}{2} - y^3\,dy \\
&= \int_0^3 -\frac{1}{2}y^4 - y^3 + \frac{3}{2}y^2 + 12y + 18\,dy \\
&= \left. -\frac{1}{10}y^5 - \frac{1}{4}y^4 + \frac{1}{2}y^3 + 6y^2 + 18y \right|_0^3 \\
&= -\frac{243}{10} - \frac{81}{4} + \frac{27}{2} + 54 + 54 \\
&= \frac{1539}{20}
\end{aligned}$$
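Because Solutions 1 and 3 describe the same region, their setups should produce the same number. The sketch below is a numerical cross-check under that assumption; `iterated` is a hypothetical midpoint-rule helper, and the grid size is an arbitrary choice.

```python
import math

def iterated(f, a, b, lower, upper, n=400):
    """Midpoint approximation: outer variable s in [a, b], inner between lower(s) and upper(s)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        s = a + (i + 0.5) * h
        c, d = lower(s), upper(s)
        k = (d - c) / n
        total += sum(f(s, c + (j + 0.5) * k) for j in range(n)) * k * h
    return total

# Solution 1: order dy dx, split at x = 6. Here f receives (x, y).
sol1 = (iterated(lambda x, y: x + y, 0, 6, lambda x: 0, math.sqrt)
        + iterated(lambda x, y: x + y, 6, 9, lambda x: x - 6, math.sqrt))

# Solution 3: order dx dy. The outer variable is y, so f receives (y, x).
sol3 = iterated(lambda y, x: x + y, 0, 3, lambda y: y**2, lambda y: y + 6)

print(sol1, sol3)
```

The two printed values should agree to several decimal places, which gives an independent check on the arithmetic of the hand computation.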
Main Idea
For a region without a single upper or lower curve, the strategies for integrating a function are the same
as the strategies for computing the area.
1 Break the region into two or more pieces, each of which has a single top curve and a single bottom
curve.
2 See if the region has a single left curve (lower x bound) and a single right curve (upper x bound).
If so, solve the bounds for x and change the order of integration.
Example 6.2.4
Using Anti-Symmetry
Let D be the region $x^2 + y^2 \le 9$. Evaluate $\iint_D \sqrt[3]{x}\,\sqrt{y + 3}\,dA$.
Solution
The function f and the domain D both have a particular type of symmetry. D is symmetric about the
y-axis. We can flip the right side of D over onto the left side of D and they match up perfectly. We
can express this transformation in algebra by
(x, y) → (−x, y)
Furthermore, $f(x, y) = \sqrt[3]{x}\,\sqrt{y + 3}$ and $f(-x, y) = \sqrt[3]{-x}\,\sqrt{y + 3}$ are opposites (they sum to 0). Thus
the height of the graph z = f(x, y) above the right half of D is equal to the depth of the graph below
the left half of D. These two regions have opposite signed volumes. Their sum, which is the integral
over all of D, is 0.
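The cancellation can be observed numerically. This sketch assumes the integrand is the cube root of x times the square root of y + 3, and uses a midpoint grid that is itself symmetric about the y-axis; `disk_integral` is a hypothetical helper.

```python
import math

def f(x, y):
    # Sign-preserving cube root of x, times the square root of y + 3.
    return math.copysign(abs(x) ** (1 / 3), x) * math.sqrt(y + 3)

def disk_integral(f, radius, n=400):
    """Midpoint sum over the disk x^2 + y^2 <= radius^2, in the order dy dx."""
    dx = 2 * radius / n
    total = 0.0
    for i in range(n):
        x = -radius + (i + 0.5) * dx
        half = math.sqrt(radius**2 - x**2)  # cross section runs from -half to half
        dy = 2 * half / n
        total += sum(f(x, -half + (j + 0.5) * dy) for j in range(n)) * dy * dx
    return total

print(disk_integral(f, 3))
```

Each test point (x, y) is paired with a test point (−x, y) whose contribution has the opposite sign, so the printed sum should be zero up to floating-point rounding.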
Main Idea
We can argue that an integral $\iint_D f(x, y)\,dA$ is equal to zero when
1 D is symmetric about some line L. If we folded it over L, one side of D would lie exactly on the
other side.
2 f is antisymmetric about L. For each point (x, y) in D the image of (x, y) across L, denoted
$r_L(x, y)$, has the property:
$$f(r_L(x, y)) = -f(x, y).$$
Example 6.2.5
Using Order to Manipulate the Integrand
Let D be the triangle with vertices (0, 0), (0, 2) and (1, 2).
Calculate
$$\iint_D e^{(y^2)}\,dA.$$
Solution
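$$\begin{aligned}
\iint_D e^{(y^2)}\,dA &= \int_0^2 \int_0^{\frac{y}{2}} e^{(y^2)}\,dx\,dy \\
&= \int_0^2 \left. e^{(y^2)} x \right|_{x=0}^{\frac{y}{2}} dy \\
&= \int_0^2 e^{(y^2)}\,\frac{y}{2}\,dy \\
&= \int_0^4 \frac{1}{4} e^u\,du \\
&= \left. \frac{1}{4} e^u \right|_0^4 \\
&= \frac{e^4}{4} - \frac{1}{4}
\end{aligned}$$
The fourth line uses the u-substitution $u = y^2$, $du = 2y\,dy$, so that $\frac{1}{4}\,du = \frac{y}{2}\,dy$. The bounds
change as y = 0 ⇒ u = 0 and y = 2 ⇒ u = 4.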
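As a sanity check, the single-variable integral left after integrating with respect to x can be approximated numerically and compared with the closed form. This is a sketch only; the number of subintervals is an arbitrary choice.

```python
import math

# Order dx dy over the triangle: 0 <= y <= 2, 0 <= x <= y / 2.
# The inner integral of e^(y^2) with respect to x is e^(y^2) * (y / 2),
# so only the outer integral needs a numerical approximation.
n = 2000
dy = 2 / n
approx = sum(math.exp(y**2) * (y / 2) * dy
             for y in ((j + 0.5) * dy for j in range(n)))

exact = (math.exp(4) - 1) / 4  # value obtained by u-substitution
print(approx, exact)
```

In the other order, the inner integral would be of e^(y²) with respect to y, which has no elementary antiderivative. That is why the order dx dy was the workable choice here.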
Main Idea
If we don’t know the anti-derivative of an integrand with respect to one variable, try switching the order
of integration. Remember to change the bounds too.
Application 6.2.6
Area of a Domain
We can use a double integral of f to measure the domain of integration, or compute statistics about
f . Here are two examples.
Theorem
This theorem may seem counter-intuitive at first, because a double integral computes a volume, not
an area. However, the volume under a graph of height 1 is equal to 1 times the area of the base. As
long as we change from cubic units to square units, the integral will be numerically equal to the area.
Section 6.2
Exercises
Summary Questions
Q1 What are the steps for writing a double integral over a general region?
6.2.1
Q5 If D is the triangle with vertices (0, −2), (4, 0) and (0, 8) calculate $\iint_D x^2 y\,dA$
Q6 Integrate the function f (x, y) = y over the region enclosed by the lines y = 5x, y = 6 − x and
y = x.
Q7 Let f(x, y) be a function and D be the trapezoid with vertices (3, 1), (3, 6), (6, 5) and (6, 4).
Draw D and set up the bounds of $\iint_D f(x, y)\,dA$.
Q8 Let D be the parallelogram with vertices (0, 1), (0, 4), (5, 3) and (5, 6). Let f(x, y) be a continuous
function.
a Set up the bounds of integration of $\iint_D f(x, y)\,dA$.
b Could we save time by computing $\int_0^5 \int_1^4 f(x, y)\,dy\,dx$ instead? Explain.
Q9 Let D be the region enclosed by $y = 6 - x^2$ and y = x. Evaluate $\iint_D x e^y\,dA$.
6.2.2
Q11 Let T be the triangle with vertices (0, 3), (7, 10) and (9, 0). Set up the bounds for two integrals
whose sum is $\iint_T f(x, y)\,dA$.
Q12 Let P be the pentagon with vertices (0, 0), (0, 2), (4, 3), (4, 1) and (3, 0).
a Set up the bounds for two integrals whose sum is $\iint_P f(x, y)\,dA$.
b Set up the bounds for two integrals whose difference is $\iint_P f(x, y)\,dA$.
6.2.3
$$\iint_D f(x, y)\,dA$$
in two different ways, using both orders of dx and dy. Do not evaluate either.
a Rewrite $\iint_D f(x, y)\,dA$ as one or more integrals with differential dydx. Do not evaluate.
b Rewrite $\iint_D f(x, y)\,dA$ as one or more integrals with differential dxdy. Do not evaluate.
a Rewrite $\iint_D f(x, y)\,dA$ as one or more integrals with differential dydx. Do not evaluate.
b Rewrite $\iint_D f(x, y)\,dA$ as one or more integrals with differential dxdy. Do not evaluate.
Q17 Draw the domain of the integral $\int_1^5 \int_0^{10-2x} f(x, y)\,dy\,dx$. Then rewrite the integral in the order
dxdy.
Q18 Consider the integral $\int_{-6}^{6} \int_{-\sqrt{36-y^2}}^{0} x^2\,dx\,dy$. Write this integral in the order dydx.
6.2.4
√
8 64−x2
√
Z Z
3
Q19 Let f (x, y) = cos x sin y. Argue that √ f (x, y) dydx = 0.
−8 − 64−x2
Q20 Let $g(x, y) = x^3 e^{y^2}$. Argue that $\int_{-4}^{4} \int_{-3}^{3} g(x, y)\,dy\,dx = 0$.
(1, 1) (5, 7)
(7, 7) (7, 5)
Suppose you wanted to argue that $\iint_R f(x, y)\,dA = 0$ by a symmetry argument. Describe with
a diagram or formula what would need to be true about f(x, y) for such an argument to work.
Q22 Let D be the trapezoid with vertices (0, 5), (6, 5), (2, 0) and (4, 0). Let g(x, y) be some continuous
function.
a Sketch D and set up the bounds of integration for $\iint_D g(x, y)\,dA$ such that you obtain one
would need to be true about g(x, y)? Express your answer as a formula.
Q23 Let h(x) be a one-variable function that takes only positive values. Let f(x, y) be a two-variable
function. Describe the antisymmetry of f that would allow us to conclude that
$\int_a^b \int_{-h(x)}^{h(x)} f(x, y)\,dy\,dx = 0$.
Q24 Suppose you are given that f(x, y) = −f(−y, −x). Over what domains D can we argue by
symmetry that $\iint_D f(x, y)\,dA = 0$? Draw an example of one.
6.2.5
Q25 Would the method in this example still work, if we instead defined D to have vertices (0, 0),
6.2.6
Q29 Use geometry to evaluate $\int_0^{10} \int_0^{\sqrt{100-x^2}} dy\,dx$.
Q30 Use geometry to evaluate $\int_0^8 \int_0^{4-\frac{1}{2}x} dy\,dx$.
Q31 What is the geometric significance of the inner integral in a double integral of the form
$$\int_a^b \int_{g(y)}^{h(y)} f(x, y)\,dx\,dy?$$
a Show how to approximate the value of this integral, dividing the domain into sub-rectangles
of length 2 units and width 3 units and using the lower right corners as test points. You
should evaluate any functions that appear in your estimate, but you do not need to simplify
the arithmetic.
b Explain in a sentence or two how you can determine the exact value of this integral without
calculating any anti-derivatives.
c Discuss what test point you could have picked in a , such that your approximation would
have computed the exact value of the integral. Note: There are several relevant observations
to make in response to this question.
Section 6.3
Joint Probability Distributions
Some of the most compelling statistical conclusions do not rely on one measurement but on many,
and the relationship between them. Suppose we test a drug by randomly giving different doses to different
participants, then measuring their symptoms. Knowing the likelihood of each level of symptoms doesn’t
tell you whether the drug is effective. Adding in the knowledge of what percentage of test subjects
receive each dosage does not help. Instead you need to know how likely certain pairs of dose and
outcomes are:
If (no dose, high symptoms) and (high dose, low symptoms) are likely enough, then there is a
correlation which points to efficacy of the drug. Individual random variables with individual density
functions cannot model this behavior. We need two-variable density functions and double integrals.
Question 6.3.1
How Do We Use Double Integrals to Compute Probabilities?
Definition
A function f is a probability density function for a random variable X, if the chance of an outcome
a < X < b is $\int_a^b f(x)\,dx$.
Definition
A pair (or more) of random variables X and Y , along with the likelihood of various outcomes (X, Y ) is
called a joint distribution. If the space of outcomes is continuous, the distribution is modeled by a joint
probability density function $f_{X,Y}(x, y)$ as follows:
$$P(a \le X \le b \text{ and } c \le Y \le d) = \int_a^b \int_c^d f_{X,Y}(x, y)\,dy\,dx$$
Example 6.3.2
Using a Joint Density Function
Suppose the random variables X and Y have the joint density function
$$f_{X,Y}(x, y) = \begin{cases} x + y & \text{if } 0 \le x \le 1 \text{ and } 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}.$$
Solution
We can write “X is at least twice as large as Y” with the inequality X ≥ 2Y. This is everything below
the line $y = \frac{1}{2}x$. Call this region H. We’ll integrate f over this region. This may seem daunting, but
f(x, y) = 0 outside the unit square. We can break H into two subregions, one that lies inside the square
and one that lies outside. A diagram will make it easier to find the bounds.
Figure: The target region H and the unit square of possible outcomes
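$$\begin{aligned}
P\left(Y \le \frac{1}{2}X\right) &= \iint_H x + y\,dA \\
&= \int_0^1 \int_0^{\frac{1}{2}x} x + y\,dy\,dx + \iint_{\text{the rest of } H} 0\,dA \\
&= \int_0^1 \left. xy + \frac{y^2}{2} \right|_{y=0}^{\frac{1}{2}x} dx \\
&= \int_0^1 \frac{1}{2}x^2 + \frac{1}{8}x^2\,dx \\
&= \int_0^1 \frac{5}{8}x^2\,dx \\
&= \left. \frac{5}{24}x^3 \right|_0^1 \\
&= \frac{5}{24}
\end{aligned}$$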
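A simulation offers a second way to look at this probability. The sketch below draws points from the density x + y by rejection sampling, a technique not used in the text above, and counts how often X ≥ 2Y. The function name `sample_xy`, the seed and the number of trials are all assumptions made for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def sample_xy():
    """Draw (x, y) from the density x + y on the unit square by rejection sampling."""
    while True:
        x, y = random.random(), random.random()
        # The density never exceeds 2, so accept with probability (x + y) / 2.
        if 2 * random.random() < x + y:
            return x, y

trials = 200_000
hits = 0
for _ in range(trials):
    x, y = sample_xy()
    if x >= 2 * y:  # the event "X is at least twice as large as Y"
        hits += 1

estimate = hits / trials
print(estimate, 5 / 24)
```

The estimate should land near 5/24 and noticeably below 1/4, the fraction of the unit square's area that the region occupies.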
Warning
The region of integration in this example has one fourth of the area of the total region of possibilities,
yet the answer was $\frac{5}{24}$, not $\frac{1}{4}$. Do not confuse area with probability. Not all outcomes are equally likely
to occur.
Since we got a low probability, relative to area, we can deduce that the probability density in the
region we examined is lower than at some other parts of the domain. That makes sense. The joint
density function x + y is largest in the upper right corner and lowest in the lower left. More of our
triangle was near the lower left than the upper right.
Exercise
Darmok and Jalad each travel to the island of Tanagra and arrive between noon and 4 PM. Let (X, Y )
represent their respective arrival times in hours after noon. Suppose their joint density function is
$$f_{X,Y}(x, y) = \begin{cases} \frac{x}{32} & \text{if } 0 \le x \le 4 \text{ and } 0 \le y \le 4 \\ 0 & \text{otherwise} \end{cases}.$$
1 What is the value of $\int_0^4 \int_0^4 f_{X,Y}(x, y)\,dy\,dx$?
2 Calculate the probability that Darmok arrives after 2PM.
Question 6.3.3
What Is a Marginal Density Function?
Suppose we have a joint density function fX,Y (x, y). What if we are only interested in the values
of X? Perhaps we want to compute the expected value. Recall that a density function fX (x) of X
satisfies the property
$$P(a \le X \le b) = \int_a^b f_X(x)\,dx$$
How can we get this function from the joint density function? We can compute P(a ≤ X ≤ b).
$$P(a \le X \le b) = \int_a^b \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy\,dx$$
Compare this to the definition of a probability density function. Both compute the same probability.
Both integrate over the same range of x-values. The only way for this to be true for all values of a
and b is if the integrand is the same. This means that the inner integral $\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$ is equal to
$f_X(x)$, the probability density function of X.
When we obtain a density function of one random variable from a joint distribution, we call it a
marginal density function.
Theorem
Given a joint distribution X, Y with joint density function fX,Y , the individual variables have marginal
density functions:
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$$
$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx$$
For each x-value $x_0$, the inner integral $\int_{-\infty}^{\infty} f_{X,Y}(x_0, y)\,dy$ is the area of the $x = x_0$ cross-section
under z = fX,Y (x, y). In this figure, we see that larger values of X are more likely, because their
cross-sections have more area.
Figure: The marginal density function fX (x), represented as cross-sections under z = fX,Y (x, y)
Example 6.3.4
Computing Marginal Density Functions
Students at schools around the world compete in a rocketry contest. Rockets are scored based on
the altitude they reach (in meters). Suppose the first and second place altitudes at a randomly chosen
school are modeled by X and Y , which have joint density function
$$f_{X,Y}(x, y) = \begin{cases} \frac{12 - 0.012x}{1000} \left( \frac{y}{x^2} - \frac{y^2}{x^3} \right) & \text{if } 0 \le x \le 1000,\ 0 \le y \le x \\ 0 & \text{otherwise} \end{cases}$$
a What can we infer about the possible altitudes of student rockets from this joint density function?
b Compute the marginal density function of X, the altitude of the first place rocket.
c What can we conclude about what values of X are more or less likely?
Solution
Figure: The possible outcomes of (X, Y ), and the possible outcomes of Y for each X
a The maximum altitude of a rocket is 1000m. The second-place rocket always has a lower altitude
than the first-place rocket, which makes sense.
b For x > 1000 or x < 0, the function $f_{X,Y}(x, y) = 0$ for any choice of y. For 0 ≤ x ≤ 1000, the
function $f_{X,Y}(x, y)$ is a piecewise function of y. We can see this in the figure above: $f_{X,Y}$ is only
nonzero when 0 ≤ y ≤ x.
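$$\begin{aligned}
f_X(x) &= \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy \\
&= \int_0^x \frac{12 - 0.012x}{1000} \left( \frac{y}{x^2} - \frac{y^2}{x^3} \right) dy \\
&= \frac{12 - 0.012x}{1000} \left. \left( \frac{y^2}{2x^2} - \frac{y^3}{3x^3} \right) \right|_{y=0}^{x} \\
&= \frac{12 - 0.012x}{1000} \left( \frac{x^2}{2x^2} - \frac{x^3}{3x^3} - 0 + 0 \right) \\
&= \frac{12 - 0.012x}{1000} \left( \frac{1}{2} - \frac{1}{3} \right) \\
&= \frac{2 - 0.002x}{1000}
\end{aligned}$$
$$f_X(x) = \begin{cases} \frac{2 - 0.002x}{1000} & \text{if } 0 \le x \le 1000 \\ 0 & \text{otherwise} \end{cases}$$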
c fX (x) has its largest value at x = 0 and shrinks to 0 as x increases to 1000. This indicates that
lower altitudes are much more likely than higher altitudes.
Figure: The marginal density function of X, represented as an area under the graph of z = fX,Y (x, y)
(z-axis not to scale)
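The marginal density found in part b can be compared against a direct numerical integration of the joint density over y. The sketch below assumes the joint density reads (12 − 0.012x)/1000 · (y/x² − y²/x³) on 0 ≤ y ≤ x ≤ 1000; the function names and sample x-values are chosen for illustration, and x = 0 is excluded in code to avoid dividing by zero.

```python
def joint_density(x, y):
    """Assumed joint density of the first (x) and second (y) place altitudes."""
    if 0 < x <= 1000 and 0 <= y <= x:
        return (12 - 0.012 * x) / 1000 * (y / x**2 - y**2 / x**3)
    return 0.0

def marginal_x(x, n=1000):
    """Midpoint approximation of the integral of the joint density over 0 <= y <= x."""
    if not 0 < x <= 1000:
        return 0.0  # the joint density is zero for every y
    dy = x / n
    return sum(joint_density(x, (j + 0.5) * dy) for j in range(n)) * dy

def formula(x):
    """Marginal density obtained by hand in part b."""
    return (2 - 0.002 * x) / 1000

for x in (100, 500, 900):
    print(x, marginal_x(x), formula(x))
```

At each sample value the two columns should agree closely, and both should decrease as x grows.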
Remark
Even though the range of possible outcomes is greater for larger X, the probability of achieving that X
is smaller. We can see this in the cross sections on the joint-density function. Larger values of X have
longer cross sections, but it is the area under the graph z = fX,Y (x, y) that matters.
Main Idea
If the range of possible outcomes is limited, then computing fX (x) requires us to:
1 make different computations for different ranges of X and
2 within each computation, divide the integral into pieces depending on which values of Y are
possible.
Question 6.3.5
Why Do We Need Joint Distributions?
Definition
If the outcomes of Y don’t depend on the outcome of X and vice versa, we say X and Y are indepen-
dent. In this case
$$P(a \le X \le b \text{ and } c \le Y \le d) = \int_a^b f_X(x)\,dx \int_c^d f_Y(y)\,dy$$
Example
Suppose Darmok and Jalad’s arrival times have the joint density function
$$f_{X,Y}(x, y) = \begin{cases} \frac{x}{32} & \text{if } 0 \le x \le 4 \text{ and } 0 \le y \le 4 \\ 0 & \text{otherwise} \end{cases}.$$
Jalad’s arrival time is uniformly distributed. Darmok’s is triangular. Neither distribution depends on
the arrival time of the other.
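One way to see the independence is to check that the joint density equals the product of the two marginal densities. The sketch below uses exact fractions; the marginals x/8 and 1/4 were computed by hand from the joint density and are stated as assumptions in the comments.

```python
from fractions import Fraction

def joint(x, y):
    """Joint density x / 32 on the square 0 <= x <= 4, 0 <= y <= 4."""
    return Fraction(x) / 32 if 0 <= x <= 4 and 0 <= y <= 4 else Fraction(0)

def f_X(x):
    # Integral of x / 32 over 0 <= y <= 4, computed by hand: 4 * x / 32 = x / 8.
    return Fraction(x) / 8 if 0 <= x <= 4 else Fraction(0)

def f_Y(y):
    # Integral of x / 32 over 0 <= x <= 4, computed by hand: (16 / 2) / 32 = 1 / 4.
    return Fraction(1, 4) if 0 <= y <= 4 else Fraction(0)

# Sample points from -1 to 5 in steps of 1/2, covering inside and outside the square.
grid = [Fraction(k, 2) for k in range(-2, 11)]
factors = all(joint(x, y) == f_X(x) * f_Y(y) for x in grid for y in grid)
print(factors)
```

The marginal x/8 grows linearly (Darmok's triangular distribution) and the marginal 1/4 is constant (Jalad's uniform distribution), matching the description above.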
Figure: The density function for Darmok and Jalad’s arrival times
Theorem
X and Y are independent, if and only if their joint density function can be written fX,Y (x, y) = g(x)h(y),
where
g(x) is a function only of x
h(y) is a function only of y
Remark
g(x) and h(y) can be chosen to be the marginal density functions of X and Y , but they don’t need to
be. As long as a factorization exists, the variables are independent.
Example
Suppose
(
3π π
12π−8 cos 2x (2y − y 2 ) if 0 ≤ x ≤ 6 and 0 ≤ y ≤ 4
fX,Y (x, y) =
0 otherwise
The area of a y = y0 cross section is fY (y0 ) the likelihood that Y is near y0 . The shape of the
cross section indicates what X values are likely for that choice of Y . For independent variables, the
X values are distributed the same way no matter what Y value we choose. Mathematically, the cross
section functions are constant multiples of each other. Multiplying by a constant does not change what
portion of the total area lies over a given range of X values.
Question 6.3.6
What Is the Expected Value of a Function of X and Y?
What if we wanted to know the expected value of the function $g(X, Y) = \frac{Y^2}{X}$? By definition, this is
very hard. We would need to write a density function h(t) such that
$$\int_a^b h(t)\,dt = P\left(a \le \frac{Y^2}{X} \le b\right)$$
Notice g(x, y) = a and g(x, y) = b are level curves of g. In this case they solve to
$$x = \frac{1}{a}y^2 \qquad\qquad x = \frac{1}{b}y^2$$
In the case of Darmok and Jalad, the probabilities that h(t) produces would have to integrate to
give the probability that (X, Y ) lies between the level curves:
Even if you did work through the steps to describe the bounds of such a region, you’d need to
1 Write the bounds as a function of a and b, which will be piecewise depending on whether the level
curves exit through the top or the side of the square.
2 Evaluate the integral of $f_{X,Y}(x, y)$ over such a region to compute $P\left(a \le \frac{Y^2}{X} \le b\right)$.
3 Use the Fundamental Theorem of Calculus to write an integrand h(t) that integrates to the
probability you found.
4 Integrate $\int_{-\infty}^{\infty} t\,h(t)\,dt$.
Theorem
The expected value of a function g(X, Y ) of two continuous random variables X and Y with joint
density function fX,Y (x, y) can be computed:
$$E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\,f_{X,Y}(x, y)\,dy\,dx.$$
Example 6.3.7
Expected Value of a Random Variable
A special case of the expected value formula is to compute the expected values of g(x, y) = x or
g(x, y) = y. Suppose X and Y have joint density function
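$$f_{X,Y}(x, y) = \begin{cases} \frac{12 - 0.012x}{1000} \left( \frac{y}{x^2} - \frac{y^2}{x^3} \right) & \text{if } 0 \le x \le 1000,\ 0 \le y \le x \\ 0 & \text{otherwise} \end{cases}$$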
Compute E[X].
Solution
$$= 1000 - \frac{2000}{3} = \frac{1000}{3}$$
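The steps leading to this value can be reconstructed from the expected value theorem with g(x, y) = x. The derivation below is a sketch under the assumption that the inner integral reduces to the marginal density (2 − 0.002x)/1000 of X:

```latex
\begin{aligned}
E[X] &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x\, f_{X,Y}(x, y)\,dy\,dx \\
&= \int_0^{1000} x \left( \int_0^x \frac{12 - 0.012x}{1000}
   \left( \frac{y}{x^2} - \frac{y^2}{x^3} \right) dy \right) dx \\
&= \int_0^{1000} x\, \frac{2 - 0.002x}{1000}\,dx \\
&= \left. \frac{x^2}{1000} - \frac{0.002\,x^3}{3000} \right|_0^{1000} \\
&= 1000 - \frac{2000}{3} = \frac{1000}{3}
\end{aligned}
```

The factor x can be pulled out of the inner integral because it is constant with respect to y.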
Main Ideas
If we already have the marginal density function fX (x) (or fY (y)), we can use the single-variable
expected value formula:
\[ E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx \]
In fact, we saw this integral partway through our solution. Computing the marginal density function
is nearly equivalent to computing the inner integral in the two-variable expected value formula.
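The marginal-density route can be sketched numerically. The code below assumes the density of Example 6.3.7 reads (12 − 0.012x)/1000 · (y/x² − y²/x³) on 0 ≤ x ≤ 1000, 0 ≤ y ≤ x; that reading, and the helper names `marginal_x` and `expected_x`, are assumptions made for illustration.

```python
def joint_density(x, y):
    # Assumed form of the Example 6.3.7 density; zero outside the triangle.
    if 0 < x <= 1000 and 0 <= y <= x:
        return (12 - 0.012 * x) / 1000 * (y / x ** 2 - y ** 2 / x ** 3)
    return 0.0


def marginal_x(x, n=200):
    """Approximate f_X(x) by integrating the joint density over y in [0, x]."""
    dy = x / n
    return sum(joint_density(x, (j + 0.5) * dy) for j in range(n)) * dy


def expected_x(n=200):
    """Approximate E[X] as the single integral of x * f_X(x) over [0, 1000]."""
    dx = 1000 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        total += x * marginal_x(x) * dx
    return total


print(marginal_x(500.0))  # compare with (12 - 0.012 * 500) / 6000
print(expected_x())       # compare with 1000 / 3
```

The call to `marginal_x` plays the role of the inner integral, which is why the two approaches share most of their work.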
Example 6.3.8
Expected Value of a Function
Compute the expected value of $\frac{Y^2}{X}$, where X is Darmok’s arrival time and Y is Jalad’s arrival time.
Assume that X and Y have joint density function:
\[ f_{X,Y}(x, y) = \begin{cases} \dfrac{x}{32} & \text{if } 0 \le x \le 4 \text{ and } 0 \le y \le 4 \\ 0 & \text{otherwise} \end{cases} \]
Solution
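\begin{align*}
E\left[\frac{Y^2}{X}\right] &= \int_0^4 \int_0^4 \frac{y^2}{x} \cdot \frac{x}{32}\,dy\,dx \\
&= \int_0^4 \left[\frac{y^3}{96}\right]_0^4 dx \\
&= \int_0^4 \frac{2}{3}\,dx \\
&= \left[\frac{2}{3}x\right]_0^4 \\
&= \frac{8}{3}
\end{align*}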
Application 6.3.9
Average Value of a Function
Definition
The uniform distribution over a region D in R2 has the joint density function
\[ f_{X,Y}(x, y) = \begin{cases} \dfrac{1}{\text{area of } D} & \text{if } (x, y) \text{ is inside } D \\ 0 & \text{if } (x, y) \text{ is outside } D \end{cases} \]
As with single-variable functions, we default to the uniform distribution whenever we average a
function and no specific random variable is specified.
Definition
The average value of a function f over a region D is defined to be the expected value of f (X, Y )
where X, Y are uniformly distributed over D.
\[ f_{ave} = \frac{1}{\text{Area of } D} \iint_D f(x, y)\,dA \]
Since we can also compute the area of D using a double integral, we can also write
\[ f_{ave} = \frac{\iint_D f(x, y)\,dA}{\iint_D 1\,dA} \]
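The quotient form translates directly into a numerical sketch: approximate both double integrals on the same grid and divide. The function name `average_value`, the indicator-function description of D, and the sample function x² + y² on the unit disc are all illustrative assumptions.

```python
def average_value(f, in_region, x_range, y_range, n=400):
    """Approximate the average of f over the region where in_region is True.

    The region is assumed to fit inside the rectangle x_range by y_range.
    """
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    dx = (x_hi - x_lo) / n
    dy = (y_hi - y_lo) / n
    integral_f = 0.0  # approximates the double integral of f over D
    area = 0.0        # approximates the double integral of 1 over D
    for i in range(n):
        x = x_lo + (i + 0.5) * dx
        for j in range(n):
            y = y_lo + (j + 0.5) * dy
            if in_region(x, y):
                integral_f += f(x, y) * dx * dy
                area += dx * dy
    return integral_f / area


avg = average_value(
    lambda x, y: x ** 2 + y ** 2,
    lambda x, y: x ** 2 + y ** 2 <= 1,
    (-1, 1),
    (-1, 1),
)
print(avg)  # compare with the exact average 1/2, found with polar coordinates
```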
Application 6.3.10
Covariance and Correlation
One of the most useful things to know about a pair of random variables is whether they are correlated,
whether high values of one tend to correspond to high values (or low values) of the other. We can measure
this by examining the expected value of a specific function, which is positive when X and Y are both
above average or both below average, and negative for pairs when one is above and the other is below.
Definition
The average value of (X − E[X])(Y − E[Y ]) is called the covariance of X and Y , denoted cov(X, Y ).
To test this, we can look at a type of joint distribution whose correlation we already understand.
Suppose X and Y are independent. Then outcomes of X should not depend on outcomes of Y . The
joint density function can be written f (x, y) = g(x)h(y). We can use our integral rules to see that
covariance is always 0, matching our intuition.
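\begin{align*}
\operatorname{cov}(X, Y) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - E[X])(y - E[Y]) f_{X,Y}(x, y)\,dy\,dx \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - E[X])(y - E[Y]) g(x) h(y)\,dy\,dx \\
&= \left(\int_{-\infty}^{\infty} (x - E[X]) g(x)\,dx\right)\left(\int_{-\infty}^{\infty} (y - E[Y]) h(y)\,dy\right) \\
&= \left(\int_{-\infty}^{\infty} x g(x)\,dx - E[X]\right)\left(\int_{-\infty}^{\infty} y h(y)\,dy - E[Y]\right) \\
&= (0)(0)
\end{align*}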
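The same conclusion can be checked numerically. This sketch assumes a hypothetical product density g(x)h(y) with g(x) = 2x and h(y) = 3y² on the unit square; the helper `double_integral` is likewise just an illustration.

```python
def double_integral(func, n=200):
    """Midpoint-rule double integral of func over the unit square."""
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * step
        for j in range(n):
            y = (j + 0.5) * step
            total += func(x, y) * step * step
    return total


def joint(x, y):
    # Hypothetical joint density of independent X and Y: g(x) * h(y).
    return (2 * x) * (3 * y ** 2)


e_x = double_integral(lambda x, y: x * joint(x, y))
e_y = double_integral(lambda x, y: y * joint(x, y))
cov = double_integral(lambda x, y: (x - e_x) * (y - e_y) * joint(x, y))
print(e_x, e_y, cov)  # cov should be 0 up to numerical error
```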
Covariance on its own does not allow us to compare whether one joint distribution is better correlated
than another. A joint distribution could have a large covariance because the variables are consistently
correlated, or because X (or Y ) has high variance (meaning X is generally farther from E[X]). To
control for the latter effect we often compute:
Pearson’s Correlation
\[ \rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} \]
where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y.
ρ returns a value between −1 and 1 which is one measure of how well-correlated two random variables
are.
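To see the formula in action, the sketch below computes ρ for a hypothetical density f(x, y) = 3x on the triangle 0 ≤ y ≤ x ≤ 1 (a density chosen for illustration, not taken from the text). Each expected value is an iterated midpoint sum whose inner bounds depend on x.

```python
import math


def expect(g, n=300):
    """E[g(X, Y)] for the hypothetical density f(x, y) = 3x on 0 <= y <= x <= 1."""
    dx = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        dy = x / n  # the inner variable y runs from 0 to x
        for j in range(n):
            y = (j + 0.5) * dy
            total += g(x, y) * 3 * x * dy * dx
    return total


e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
cov = expect(lambda x, y: (x - e_x) * (y - e_y))
sigma_x = math.sqrt(expect(lambda x, y: (x - e_x) ** 2))
sigma_y = math.sqrt(expect(lambda x, y: (y - e_y) ** 2))
rho = cov / (sigma_x * sigma_y)
print(rho)
```

Working the same integrals by hand gives cov(X, Y) = 3/160 and ρ = 3/√57, a moderate positive correlation, which fits a region where Y can only be large when X is.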
Section 6.3
Exercises
Summary Questions
Q1 How do we use a joint density function to compute the probability of a certain set of outcomes?
Q4 How can we tell from the graph of a joint density function that the two random variables are
independent?
6.3.1
\[ \int_0^1 \int_0^x f_{X,Y}(x, y)\,dy\,dx \]
compute?
\[ f_{X,Y}(x, y) = \begin{cases} ax & \text{if } 0 \le x \le 4 \text{ and } 0 \le y \le 5 \\ 0 & \text{otherwise} \end{cases} \]
6.3.2
b Among the possible values of (X, Y ), describe which are more or less likely than others.
c Set up an integral or integrals that would compute the probability that Y > X. You don’t
need to evaluate it.
Q10 Suppose we perform an experiment in which a pair of strangers find an amount of money on
the ground. Suppose X and Y are continuous random variables that model the portion of the
money (0 = none, while 1 = all) that each person keeps. Any money not kept is turned in to the
authorities. Suppose the joint density function of X and Y is
\[ f_{X,Y}(x, y) = \begin{cases} 24xy & \text{if } x \ge 0,\ y \ge 0, \text{ and } x + y \le 1 \\ 0 & \text{otherwise} \end{cases} \]
a In a few sentences, interpret what this density function says about which outcomes are likely
and which are not. Feel free to include any comments on human nature that you need to
get off your chest.
b Set up an integral (or integrals) that computes the probability that each person takes at
most twice as much as the other. Do not evaluate.
6.3.3
Q11 Let T be the triangle with vertices (1, 2), (4, 0) and (3, 5). Suppose X and Y have a joint distribution
with a density function $f_{X,Y}$ that is nonzero on T and zero everywhere else. For what values of
x is the marginal density function $f_X(x)$ nonzero? Illustrate with a diagram.
Q12 Let D be the region between $y = x^2$ and $y = 2x + 15$. Suppose X and Y have a joint distribution with
a density function $f_{X,Y}$ that is nonzero on D and zero everywhere else. For what values of y is
the marginal density function $f_Y(y)$ nonzero? Illustrate with a diagram.
Q13 Suppose that X and Y have a joint distribution whose density function $f_{X,Y}$ is nonzero in the disk
$x^2 + y^2 \le 1$ and nowhere else. If the marginal density function of X is the density function of a
uniform random variable, what does this tell you about where the function $f_{X,Y}(x, y)$ is higher
and lower?
\[ f_{X,Y}(x, y) = \begin{cases} g(y) & \text{if } a \le x \le b \text{ and } c \le y \le d \\ 0 & \text{otherwise} \end{cases} \]
where g is a function only of y. What is the marginal density function of X? Justify your answer,
preferably without actually evaluating any integrals.
6.3.4
Q15 Suppose the random variables X and Y have the joint density function
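\[ f_{X,Y}(x, y) = \begin{cases} x + y & \text{if } 0 \le x \le 1 \text{ and } 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}. \]
\[ f_{X,Y}(x, y) = \begin{cases} 4xy - 2x - 2y + 2 & \text{if } 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \]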
b Compute the marginal density function $f_Y(y)$.
Q17 Let T be the triangle with vertices (0, 0), (1, 0) and (0, 1). Let X and Y have joint density
function
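\[ f_{X,Y}(x, y) = \begin{cases} 6x & \text{if } (x, y) \text{ is in } T \\ 0 & \text{otherwise} \end{cases} \]
\[ f_{X,Y}(x, y) = \begin{cases} 15y & \text{if } x^2 \le y \le x \\ 0 & \text{otherwise} \end{cases} \]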
6.3.5
Q19 Suppose X and Y are independent. Their joint density function fX,Y (x, y) has the values
Q20 How does the distribution of Y change as X takes different values, given the following joint
density function?
\[ f_{X,Y}(x, y) = \begin{cases} x + y & \text{if } 0 \le x \le 1 \text{ and } 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \]
Q21 Suppose X and Y are independent random variables. If their joint density function fX,Y (x, y) is
0 except on D, what can we say about the shape of D?
Q22 fX,Y (x, y) is a joint density function for a pair of independent variables X and Y . Here is a
b Assume fX,Y (x, y) is not always 0 at x = 5. Describe what values of Y are more or less
likely when X = 5.
c How is the shape of the x = 2 cross section of z = fX,Y (x, y) related to the x = 5 cross
6.3.6
Q23 Let X and Y be random variables with joint density function fX,Y (x, y) and let D be the
distance from (X, Y ) to the origin. What region would we need to integrate over to compute
P (1 ≤ D ≤ 2)?
Q24 Let X and Y be random variables with joint density function fX,Y (x, y) and let Z be the difference
Q25 Use the expected value formula to show that if Z1 and Z2 are both functions of X and Y , then
Q26 Let T be the triangle with vertices (0, 0), (4, 0) and (0, 4). Suppose X and Y are random variables
with joint density function
\[ f_{X,Y}(x, y) = \begin{cases} \dfrac{1}{8} & \text{if } (x, y) \text{ is in } T \\ 0 & \text{otherwise} \end{cases} \]
Let Z = X + Y .
b Compute $g(z) = \frac{dG}{dz}$. Explain why $P(a \le Z \le b) = \int_a^b g(z)\,dz$.
d Compute the expected value of Z instead using our multivariable expected value of a function
formula.
6.3.7
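\[ f_{X,Y}(x, y) = \begin{cases} \dfrac{12 - 0.012x}{1000}\left(\dfrac{y}{x^2} - \dfrac{y^2}{x^3}\right) & \text{if } 0 \le x \le 1000,\ 0 \le y \le x \\ 0 & \text{otherwise} \end{cases} \]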
Compute E[Y ]
Q28 Suppose the random variables X and Y have the joint density function
\[ f_{X,Y}(x, y) = \begin{cases} x + y & \text{if } 0 \le x \le 1 \text{ and } 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}. \]
Compute E[Y ].
Q29 Darmok and Jalad’s arrival times X and Y have the joint density function
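\[ f_{X,Y}(x, y) = \begin{cases} \dfrac{x}{32} & \text{if } 0 \le x \le 4 \text{ and } 0 \le y \le 4 \\ 0 & \text{otherwise} \end{cases}. \]
\[ f_{X,Y}(x, y) = \begin{cases} 24xy & \text{if } x \ge 0,\ y \ge 0, \text{ and } x + y \le 1 \\ 0 & \text{otherwise} \end{cases} \]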
6.3.8
Q31 Suppose the random variables X and Y have the joint density function
\[ f_{X,Y}(x, y) = \begin{cases} x + y & \text{if } 0 \le x \le 1 \text{ and } 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}. \]
Q32 Darmok and Jalad’s arrival times X and Y have the joint density function
\[ f_{X,Y}(x, y) = \begin{cases} \dfrac{x}{32} & \text{if } 0 \le x \le 4 \text{ and } 0 \le y \le 4 \\ 0 & \text{otherwise} \end{cases}. \]
Darmok is trying to break his habit of arriving late. He has agreed to donate 120 credits to a
local charity for each hour Jalad has to wait for him (prorated across partial hours). Assuming
that this incentive has no effect on their arrival times, what is the expected donation?
Q33 Suppose X and Y have joint density function
\[ f_{X,Y}(x, y) = \begin{cases} 4xy - 2x - 2y + 2 & \text{if } 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \]
Q34 The longitude and latitude of a meteorite landing are random variables X degrees and Y degrees
with joint density function
\[ f_{X,Y}(x, y) = \begin{cases} \dfrac{8100 - y^2}{349920000} & \text{if } -180 \le x \le 180 \text{ and } -90 \le y \le 90 \\ 0 & \text{otherwise} \end{cases} \]
a Write an integral that computes the probability that a meteorite lands within 20 degrees
b What does this density function say about where a meteorite is likely or unlikely to strike?
Answer in a few sentences.
c Suppose a perverse lottery is established that pays out 30 dollars minus the distance in
degrees from the south pole (y = −90), if the meteorite strikes within 30 degrees of the
south pole. Otherwise it pays out nothing. Set up an integral that computes the average
payout from this lottery. Do not evaluate.
6.3.9
Q35 Compute the average value of the function f (x, y) = 2y on the unit disc $x^2 + y^2 \le 1$.
Q36 Compute the average value of the function f (x, y) = y on the region enclosed by $y = x^2$ and
y = 16.
Q37 Compute the average value of the function f (x, y) = xy on the triangle with vertices (0, 0), (4, 0)
Q38 Compute the average value of the function $f(x, y) = x^2$ on the triangle with vertices (−2, 0),
6.3.10
Q39 Recall our friends Darmok and Jalad arriving in Tanagra between noon and 4 PM. The joint
We found that $E[X] = \frac{8}{3}$ and $E[Y] = 2$. Consider the function $g(X, Y) = \left(X - \frac{8}{3}\right)(Y - 2)$.
a Draw the domain of possible values of (X, Y ). At what points in this domain is g positive?
Where is it negative?
b Could you argue, using the laws of integrals instead of a computation, that E[g(X, Y )] = 0?
\[ f_{X,Y}(x, y) = \begin{cases} x + y & \text{if } 0 \le x \le 1 \text{ and } 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \]
c What does your answer to b suggest about how X and Y are correlated?
Q41 Suppose D is the region enclosed by $y = 6x - x^2$ and the x-axis. Let X and Y have the uniform density
b What values of X are more likely and what values are less likely? Justify your answer.
Q42 If X and Y have a uniform joint distribution over some region D, can X and Y be correlated?
Explain or demonstrate.
Q43 Suppose on a trip to the movies, the number of minutes you wait in line for tickets (X) and
the number of minutes you wait in line for snacks (Y ) are random variables with joint density
function:
\[ f_{X,Y}(x, y) = \begin{cases} \dfrac{12x - x^2 + 10y - y^2}{4880} & \text{if } 0 \le x \le 12 \text{ and } 0 \le y \le 10 \\ 0 & \text{otherwise} \end{cases} \]
b Compute the probability that the ticket line takes less than 5 minutes. You don’t need to
simplify the arithmetic.
c You decide to pay a friend 25 cents per minute to wait in line for snacks while you wait for
the tickets. If you’re still in line when she gets the snacks, she brings them to you and you
pay her. If she’s still in line when you get tickets, you pay her and take her place. Write an
integral or integrals that compute the expected (average) amount you will pay her. Do not
evaluate.
Q44 When you go to the movies, you have to wait in line for tickets, and then to buy snacks. You
model the ticket wait (in minutes) with the random variable X. You model the snack wait with
the random variable Y . Suppose X and Y have the joint density function
\[ f_{X,Y}(x, y) = \begin{cases} 50e^{-5x-10y} & \text{if } 0 \le x \text{ and } 0 \le y \\ 0 & \text{otherwise} \end{cases} \]
a Compute the probability that you wait a total of no more than 15 minutes in both lines.
Q45 Darmok and Jalad have agreed to meet up again at Tanagra. Darmok’s arrival time (in hours)
after noon is denoted by the random variable X, while Jalad’s is denoted by the random variable
Y . X and Y have the joint density function
\[ f_{X,Y}(x, y) = \begin{cases} \dfrac{y}{6x^2} & \text{if } 1 \le x \le 4 \text{ and } 0 \le y \le 4 \\ 0 & \text{otherwise} \end{cases} \]
a Describe the possible arrival times of Darmok and the possible arrival times of Jalad.
b Compute the probability that Darmok arrives at least two hours after Jalad.
c Darmok and Jalad leave Tanagra at exactly 6PM. Write an integral or integrals that compute
the average amount of time they spend together at Tanagra. Do not evaluate your integral(s),
but your integrand(s) should be functions whose antiderivative(s) are well known.
Q46 Let
\[ D = \{(x, y) : x^2 + y^2 \le 4,\ y \ge 0\} \]
Suppose X and Y have joint density function
\[ f_{X,Y}(x, y) = \begin{cases} \dfrac{3y}{16} & \text{if } (x, y) \text{ is in } D \\ 0 & \text{otherwise} \end{cases} \]
b What integral would compute the expected value of X? How do you know the value of this
integral without computing it?
Q47 Suppose we wish to approximate $\int_0^6 x^2\,dx$ by dividing the domain into two equal subintervals.
Suppose the test points for each subinterval are independently chosen, uniformly distributed random
variables on their respective subintervals. Produce an integral that computes the probability
that this approximation overestimates the actual value of the integral.
Q48 Suppose X and Y are independent, and their joint density function is written as a product
fX,Y (x, y) = g(x)h(y). How is the marginal density function fX (x) related to g(x)?
Section 6.4
Triple Integrals
Goals:
The theory of integrating a two-variable function extends without much trouble to functions of more
variables. Visualizing the domains and writing bounds of integration is a much greater challenge. Any
function whose domain is a piece of the real world needs (at least) three variables. Joint density functions
can also relate any number of random variables. In both cases, a triple integral allows us to aggregate
a rate (per unit of volume) to compute a total over the domain in question.
Question 6.4.1
How Do We Integrate a Three-Variable Function?
A triple integral is a natural extension of the double integral. A good exercise is to compare the two
definitions, point by point.
Definition
Given a domain D in three-dimensional space and a function f (x, y, z), we can subdivide D into regions.
$V_i$ is the volume of the ith region.
$(x_i^*, y_i^*, z_i^*)$ is a point in the ith region.
V is the diameter of the largest region.
We define the triple integral of f over D to be the following limit over all possible divisions of D:
\[ \iiint_D f(x, y, z)\,dV = \lim_{V \to 0} \sum_{i=1}^{n} f(x_i^*, y_i^*, z_i^*)\,V_i \]
Fubini’s theorem applies to triple integrals as well. We write them as iterated integrals.
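The sum in the definition can be computed directly for a box-shaped domain. The sketch below is one possible implementation, assuming a subdivision into n³ equal sub-boxes with the midpoint of each as its test point; the function name and the sample integrand xyz on the unit cube are illustrative choices.

```python
def triple_riemann_sum(f, x_range, y_range, z_range, n=20):
    """Riemann sum of f over a box, using n**3 equal sub-boxes and midpoints."""
    (x_lo, x_hi), (y_lo, y_hi), (z_lo, z_hi) = x_range, y_range, z_range
    dx = (x_hi - x_lo) / n
    dy = (y_hi - y_lo) / n
    dz = (z_hi - z_lo) / n
    volume_i = dx * dy * dz  # every sub-box has the same volume V_i
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * dx
        for j in range(n):
            y = y_lo + (j + 0.5) * dy
            for k in range(n):
                z = z_lo + (k + 0.5) * dz
                total += f(x, y, z) * volume_i
    return total


approx = triple_riemann_sum(lambda x, y, z: x * y * z, (0, 1), (0, 1), (0, 1))
print(approx)  # compare with the exact value 1/8
```

Increasing n shrinks the diameter of the largest sub-box, which is the limit described in the definition.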
Theorem
\[ \iiint_D f(x, y, z)\,dV = \int_{x_1}^{x_2} \int_{y_1}^{y_2} \int_{z_1}^{z_2} f(x, y, z)\,dz\,dy\,dx \]
where
z1 and z2 are the bounds of z, which may be functions of x and y.
y1 and y2 are the bounds of y, which may be functions of x.
Example 6.4.2
Integrating Over a Prism
Let R = {(x, y, z) : 0 ≤ x ≤ 4, 0 ≤ y ≤ 2, 0 ≤ z ≤ 3}. Compute $\iiint_R 3zy + x^2\,dV$.
Solution
We will set this up as an integral of the form dzdydx. The inner integral is dz. No matter where we
are in R, we can travel in the positive z direction until we hit z = 3 or the negative z direction until we
hit z = 0. Thus the inner integral is
\[ \int_0^3 3zy + x^2\,dz. \]
Different choices of (x, y) will give different values of this inner integral. The points in R corresponding
to a choice of (x, y) are a vertical segment ranging from z = 0 to z = 3. These segments exist for any
(x, y) in the rectangle 0 ≤ x ≤ 4, 0 ≤ y ≤ 2. We can set up the x and y bounds over this rectangle as
we would over a normal double integral. The integrand is the dz integral above. Together we have an
iterated integral.
\[ \int_0^4 \int_0^2 \int_0^3 3zy + x^2\,dz\,dy\,dx \]
We evaluate the inner integral, then the middle, and finally the outer.
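\begin{align*}
\int_0^4 \int_0^2 \int_0^3 3zy + x^2\,dz\,dy\,dx &= \int_0^4 \int_0^2 \left[\frac{3}{2}yz^2 + x^2 z\right]_0^3 dy\,dx \\
&= \int_0^4 \int_0^2 \frac{27}{2}y + 3x^2\,dy\,dx \\
&= \int_0^4 \left[\frac{27}{4}y^2 + 3x^2 y\right]_0^2 dx \\
&= \int_0^4 27 + 6x^2\,dx \\
&= \left[27x + 2x^3\right]_0^4 \\
&= 108 + 128 \\
&= 236
\end{align*}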
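A quick numerical check of this iterated integral can catch arithmetic slips. The sketch below nests a one-variable midpoint rule three times, mirroring the inner, middle and outer integrals; the helper names are hypothetical.

```python
def midpoint(func, lo, hi, n=40):
    """Midpoint-rule approximation of the integral of func from lo to hi."""
    step = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * step) for i in range(n)) * step


def inner(x, y):
    # Integral over z from 0 to 3; the result depends on x and y.
    return midpoint(lambda z: 3 * z * y + x ** 2, 0.0, 3.0)


def middle(x):
    # Integral over y from 0 to 2 of the inner integral; depends on x only.
    return midpoint(lambda y: inner(x, y), 0.0, 2.0)


value = midpoint(middle, 0.0, 4.0)  # outer integral over x from 0 to 4
print(value)
```

The printed value should agree with the hand computation to within the error of the midpoint rule.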
Like a rectangle for double integrals, the right rectangular prism has constant bounds for triple
integrals. This is because the bounds of the inner variables remain the same, no matter what values the
outer variables take.
Question 6.4.3
How Do We Interpret Triple Integrals Geometrically?
The double integral is the volume under the graph of a two-variable function. This graph lives
in three-space. The triple integral is thus a four-dimensional volume “under” the graph of a three-variable
function. This graph lives in four-space and is thus more problematic to visualize. However,
we can flatten the fourth dimension into three-space, much like we can flatten the three directions of
three-space onto a two-dimensional page. Such a representation loses some information, but can be a
useful heuristic. We can examine the role of each iteration in an iterated triple integral through this
construction.
$\int_0^3 f(x, y, z)\,dz$ computes the area under the graph w = f (x, y, z) over each vertical segment of
the form $(x, y) = (x_0, y_0)$ in the domain. It is a function of x and y.
Figure: $\int_0^3 f(x, y, z)\,dz$, represented as an area in a zw-plane
$\int_0^2 \int_0^3 f(x, y, z)\,dz\,dy$ computes the volume under the graph w = f (x, y, z) over each $x = x_0$
cross-section of the domain. It is a function of x.
Figure: $\int_0^2 \int_0^3 f(x, y, z)\,dz\,dy$, represented as a volume in yzw-space
The final integral would require us to represent a fourth-dimensional analogue of volume, which
would severely overlap in this visualization.
Application 6.4.4
Triple Integrals in Math and Science
Triple integrals have a variety of applications, largely in physics, which tries to model our three-dimensional
world.
1 Integrating a function ρ(x, y, z), which gives the density of an object at each point, gives the total
mass of the object.
2 Integrating xρ(x, y, z), yρ(x, y, z) and zρ(x, y, z), and dividing each result by the total mass, gives the coordinates of the center of mass of the object.
3 Integrating a three-dimensional probability distribution over a region gives the probability that the
triple (X, Y, Z) lies in that region.
4 Integrating 1 dV over a region gives the volume of that region.
Even if we aren’t interested in physics, this connection provides us with another visual model for
integration. Density lets us visualize a triple integral without referring to a fourth (geometric) dimension.
$\int_0^3 f(x, y, z)\,dz$ computes the density of the vertical segments at each (x, y).
$\int_0^2 \int_0^3 f(x, y, z)\,dz\,dy$ computes the density of the rectangle at each x.
$\int_0^4 \int_0^2 \int_0^3 f(x, y, z)\,dz\,dy\,dx$ computes the total mass of the prism.
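The mass and center-of-mass applications can be sketched with the same kind of Riemann sum. The density ρ(x, y, z) = 1 + z on the prism 0 ≤ x ≤ 4, 0 ≤ y ≤ 2, 0 ≤ z ≤ 3 is a hypothetical example, and the center of mass is computed as each moment integral divided by the total mass.

```python
def mass_and_center(density, x_range, y_range, z_range, n=30):
    """Approximate the total mass and center of mass of a box-shaped solid."""
    (x_lo, x_hi), (y_lo, y_hi), (z_lo, z_hi) = x_range, y_range, z_range
    dx = (x_hi - x_lo) / n
    dy = (y_hi - y_lo) / n
    dz = (z_hi - z_lo) / n
    d_volume = dx * dy * dz
    mass = moment_x = moment_y = moment_z = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * dx
        for j in range(n):
            y = y_lo + (j + 0.5) * dy
            for k in range(n):
                z = z_lo + (k + 0.5) * dz
                dm = density(x, y, z) * d_volume  # mass of one small box
                mass += dm
                moment_x += x * dm
                moment_y += y * dm
                moment_z += z * dm
    center = (moment_x / mass, moment_y / mass, moment_z / mass)
    return mass, center


mass, center = mass_and_center(lambda x, y, z: 1 + z, (0, 4), (0, 2), (0, 3))
print(mass, center)
```

For this density the exact mass is 60 and the exact center of mass is (2, 1, 1.8); the center sits above the geometric center z = 1.5 because the solid is denser near the top.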
Remark
Example 6.4.5
Integrating Over an Irregular Region
Let R be the region above the xy-plane, below the cylinder $x^2 + z^2 = 16$ and between y = 0 and
y = 3. Compute $\iiint_R 4yz\,dV$.
Solution
The words “above” and “below” are useful hints here. “Above the xy-plane” indicates that for each
(x, y) the lower bound of z will be on the xy-plane, where z = 0. “Below the cylinder” indicates that
the upper bound of z will satisfy $x^2 + z^2 = 16$, which solves to $z = \pm\sqrt{16 - x^2}$. Since these z values
are above the xy-plane, the positive branch must be the upper bound. Thus our inner integral is
\[ \int_0^{\sqrt{16-x^2}} 4yz\,dz \]
To complete the middle and outer bounds we consider what x and y values lie in R. The lines y = 0 and
y = 3 suggest bounds for y, but they do not enclose any region. Where else can we get information?
Since R is bounded below by z = 0 and above by $z = \sqrt{16 - x^2}$, it is also bounded by
where these graphs meet. We can solve for that intersection:
\begin{align*}
0 &= \sqrt{16 - x^2} \\
0 &= 16 - x^2 \\
0 &= (4 + x)(4 - x) \\
x &= -4 \text{ or } x = 4
\end{align*}
Putting this together with the bounds we already have, we see that our x and y bounds are rectangular.
We set them up as we would in a double integral and put the inner integral as an integrand:
\[ \int_{-4}^{4} \int_0^3 \int_0^{\sqrt{16-x^2}} 4yz\,dz\,dy\,dx \]
We now turn to evaluating the integral. Having a function of x in our z-bounds should be familiar from
double integrals.
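\begin{align*}
\int_{-4}^{4} \int_0^3 \int_0^{\sqrt{16-x^2}} 4yz\,dz\,dy\,dx &= \int_{-4}^{4} \int_0^3 \left[2yz^2\right]_0^{\sqrt{16-x^2}} dy\,dx \\
&= \int_{-4}^{4} \int_0^3 2y\left(\sqrt{16 - x^2}\right)^2 dy\,dx \\
&= \int_{-4}^{4} \int_0^3 32y - 2yx^2\,dy\,dx \\
&= \int_{-4}^{4} \left[16y^2 - x^2 y^2\right]_0^3 dx \\
&= \int_{-4}^{4} 144 - 9x^2\,dx \\
&= \left[144x - 3x^3\right]_{-4}^{4} \\
&= 768
\end{align*}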
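The variable upper bound on z carries over naturally to a numerical check. This sketch reuses a nested midpoint rule (helper names are hypothetical) and lets the inner bound depend on x, just as in the setup above.

```python
import math


def midpoint(func, lo, hi, n=60):
    """Midpoint-rule approximation of the integral of func from lo to hi."""
    step = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * step) for i in range(n)) * step


def inner(x, y):
    # z runs from the xy-plane up to the cylinder x^2 + z^2 = 16.
    return midpoint(lambda z: 4 * y * z, 0.0, math.sqrt(16 - x ** 2))


def middle(x):
    return midpoint(lambda y: inner(x, y), 0.0, 3.0)


value = midpoint(middle, -4.0, 4.0)
exact = (144 * 4 - 3 * 4 ** 3) - (144 * (-4) - 3 * (-4) ** 3)
print(value, exact)
```

Here `exact` evaluates the antiderivative 144x − 3x³ at the bounds x = 4 and x = −4, so the two printed numbers should be close.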
Main Idea
The following approach will produce the bounds of a region with a top surface and a bottom surface.
1 The z bounds are given by the equations z = f (x, y) and z = g(x, y) of the top and bottom
surface.
2 The intersection of the top and bottom surface can produce relevant bounds on x and y. We can
graph these, along with any given bounds involving x and y.
3 After drawing the bounded region in the xy-plane, the x and y bounds are computed as for a
double integral.
Like with double integrals, we will want to break the region into smaller pieces in some cases. In
other cases, we may want to change the order of integration.
Example 6.4.6
A Solid Given by Vertices
Suppose we want to integrate over T , the tetrahedron (pyramid) with vertices (0, 0, 0), (4, 0, 0),
(4, 2, 0) and (4, 0, 2). How would we set up the bounds of integration?
Solution
In this case, it is helpful to draw a diagram of the tetrahedron in three-space. First we examine the inner
integral. The bounds of z are functions of (x, y). Visually, we want to imagine the vertical segments
lying in different parts of T and ask where their upper and lower endpoints lie. No matter which vertical
segment we pick, its lower endpoint is on the xy-plane and its upper endpoint is on the triangle with
vertices (0, 0, 0), (4, 2, 0) and (4, 0, 2).
The xy-plane gives us a lower bound z = 0. The upper bound triangle also lies in a plane. Every
upper endpoint lies in this plane, so its z coordinates must satisfy the equation of that plane. This plane
has a z-intercept of 0 since (0, 0, 0) is a vertex of the triangle. We can solve for the slopes and write
the equation.
\[ m_x = \frac{2 - 0}{4 - 0} = \frac{1}{2} \]
\[ m_y = \frac{2 - 0}{0 - 2} = -1 \]
\[ z = \frac{1}{2}x - 1y + 0 \]
\[ z = \frac{1}{2}x - y \]
To find the outer bounds, we ask what values of (x, y) lie in T ? Every point in T lies directly above the
triangle with vertices (0, 0, 0), (4, 2, 0) and (4, 0, 0). Thus its (x, y) coordinates match those of a point
in the triangle. We can draw this triangle in the xy-plane and set up the bounds of a double integral
over it. The result is
\[ \int_0^4 \int_0^{\frac{1}{2}x} \int_0^{\frac{1}{2}x - y} f(x, y, z)\,dz\,dy\,dx \]
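One way to sanity-check bounds like these is to integrate f = 1 and compare with the volume of the tetrahedron computed another way. The sketch below does this with a nested midpoint rule and the standard formula |a · (b × c)| / 6 for a tetrahedron with edge vectors a, b, c from one vertex; the helper names are hypothetical.

```python
def midpoint(func, lo, hi, n=40):
    """Midpoint-rule approximation of the integral of func from lo to hi."""
    step = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * step) for i in range(n)) * step


def volume_from_bounds():
    # Integrate 1 with z from 0 to x/2 - y, y from 0 to x/2, x from 0 to 4.
    def inner(x, y):
        return midpoint(lambda z: 1.0, 0.0, x / 2 - y)

    def middle(x):
        return midpoint(lambda y: inner(x, y), 0.0, x / 2)

    return midpoint(middle, 0.0, 4.0)


def volume_from_vertices():
    # Edge vectors from the vertex (0, 0, 0) to the other three vertices.
    a, b, c = (4, 0, 0), (4, 2, 0), (4, 0, 2)
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    triple_product = sum(a[i] * cross[i] for i in range(3))
    return abs(triple_product) / 6


print(volume_from_bounds(), volume_from_vertices())
```

If the two numbers disagreed, that would suggest an error in one of the bounds.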
Main Idea
In the case of a polyhedron given by vertices, we generally need to plot the vertices and draw the faces
to discern the upper and lower z bounds. The equations of these bounds are planes. We can then draw
the set of possible (x, y) in two-space and proceed as in a double integral.
Example 6.4.7
Changing the Order of Integration
Suppose D is the bounded region enclosed between the graph of $y = 4x^2 + z^2$ and the plane y = 4.
Set up the bounds of the integral $\iiint_D f(x, y, z)\,dV$.
Solution 1
We can begin by finding bounds for z, the inner variable. The plane y = 4 does not have a z. z
is a free variable and thus the plane extends in the z direction, and cannot be the top or bottom of a
vertical segment. On the other hand, $y = 4x^2 + z^2$ solves to $z = \pm\sqrt{y - 4x^2}$. Since this gives a plus
and a minus branch, it can provide both the upper and lower bound of z. The inner integral is
\[ \int_{-\sqrt{y-4x^2}}^{\sqrt{y-4x^2}} f(x, y, z)\,dz \]
For xy-bounds we have the equation y = 4, but this does not bound any region. We can search for additional
bounds by seeing where the top surface meets the bottom. We’ll use the fact that a square root can
only equal a negative square root if both are 0.
\begin{align*}
-\sqrt{y - 4x^2} &= \sqrt{y - 4x^2} \\
y - 4x^2 &= 0 \\
y &= 4x^2
\end{align*}
We can add this parabola to a graph. We set up our xy bounds using our usual method for double
integrals. The graphs y = 4x2 and y = 4 intersect at x = ±1. Between x = −1 and x = 1, y = 4 is
greater than y = 4x2 . Here are the bounds.
\[ \int_{-1}^{1} \int_{4x^2}^{4} \int_{-\sqrt{y-4x^2}}^{\sqrt{y-4x^2}} f(x, y, z)\,dz\,dy\,dx \]
The bounds in this solution look difficult to work with. For example, in the first step, we’ll plug
$\sqrt{y - 4x^2}$ in for z in the antiderivative of f . The resulting integrand would be even more difficult
to work with. We can improve this situation somewhat by choosing a different variable for our inner
integral.
Solution 2
Since both bounds are already solved for y, we will use y as our inner variable. We can test which is
the upper and which is the lower bound with a test point, but we don’t yet know which x and z values
lie in the region. We do not have any x or z bounds that don’t involve y, so we set the y-bounds equal
to each other.
\[ 4x^2 + z^2 = 4 \]
We may recognize this as an ellipse. Even if we do not, we can proceed as usual for a double integral,
except that our variables are x and z. We will use z as the inner variable and solve the bound for z.
\begin{align*}
4x^2 + z^2 &= 4 \\
z^2 &= 4 - 4x^2 \\
z &= \pm\sqrt{4 - 4x^2}
\end{align*}
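These give upper and lower bounds for z. To find x bounds, we solve for where the z-bounds intersect.
\[ -\sqrt{4 - 4x^2} = \sqrt{4 - 4x^2} \]
Both sides must equal 0, so $4 - 4x^2 = 0$ and $x = \pm 1$. The outer two integrals are
\[ \int_{-1}^{1} \int_{-\sqrt{4-4x^2}}^{\sqrt{4-4x^2}} \ldots\,dz\,dx \]
Placing the y-bounds in the inner integral gives
\[ \int_{-1}^{1} \int_{-\sqrt{4-4x^2}}^{\sqrt{4-4x^2}} \int_{4x^2+z^2}^{4} f(x, y, z)\,dy\,dz\,dx. \]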
We still have difficult z bounds under this method, but we delay plugging them in until the second
step, which means they may cause less trouble for us.
Main Idea
When setting up a triple integral bounded by graphs, it may be more convenient to use an inner variable
that has a less complicated relationship with the bounding equations.
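Both orders of integration from Example 6.4.7 describe the same region, so they should give the same value for any integrand. The sketch below tests this for the simplest choice, f = 1, which makes each integral the volume of D; the function names are hypothetical, and the comparison value 4π comes from a separate hand computation of that volume.

```python
import math


def midpoint(func, lo, hi, n=60):
    """Midpoint-rule approximation of the integral of func from lo to hi."""
    step = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * step) for i in range(n)) * step


def f(x, y, z):
    return 1.0  # integrand 1, so each iterated integral is the volume of D


def order_dz_dy_dx():
    def inner(x, y):
        r = math.sqrt(max(0.0, y - 4 * x ** 2))
        return midpoint(lambda z: f(x, y, z), -r, r)

    def middle(x):
        return midpoint(lambda y: inner(x, y), 4 * x ** 2, 4.0)

    return midpoint(middle, -1.0, 1.0)


def order_dy_dz_dx():
    def inner(x, z):
        return midpoint(lambda y: f(x, y, z), 4 * x ** 2 + z ** 2, 4.0)

    def middle(x):
        r = math.sqrt(max(0.0, 4 - 4 * x ** 2))
        return midpoint(lambda z: inner(x, z), -r, r)

    return midpoint(middle, -1.0, 1.0)


v1 = order_dz_dy_dx()
v2 = order_dy_dz_dx()
print(v1, v2, 4 * math.pi)
```

In the second order the square root appears only in the middle bounds, which mirrors the point made above about delaying the difficult substitution.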
Question 6.4.8
When Does a Triple Integral Decompose as a Product?
Theorem
Example
Along with the sum and constant multiple rules we can simplify
\[ \int_0^4 \int_0^2 \int_0^3 3zy + x^2\,dz\,dy\,dx \]
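One way the simplification might proceed is sketched here, assuming the theorem is the usual statement that over constant bounds the integral of a product $p(x)q(y)r(z)$ equals the product of the three single integrals. Each term of the integrand is such a product, with the constant function 1 standing in for any missing variable.

```latex
\begin{align*}
\int_0^4 \int_0^2 \int_0^3 3zy + x^2\,dz\,dy\,dx
&= 3\left(\int_0^4 1\,dx\right)\left(\int_0^2 y\,dy\right)\left(\int_0^3 z\,dz\right)
 + \left(\int_0^4 x^2\,dx\right)\left(\int_0^2 1\,dy\right)\left(\int_0^3 1\,dz\right) \\
&= 3(4)(2)\left(\frac{9}{2}\right) + \left(\frac{64}{3}\right)(2)(3) \\
&= 108 + 128 \\
&= 236
\end{align*}
```

Only single-variable integrals remain, so no antiderivative has to be carried through a later integration.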
Section 6.4
Exercises
Summary Questions
Q2 How is density used to understand triple integrals? Why wasn’t it necessary or appropriate for
double integrals?
Q3 How do you find the bounds of the inner variable in a triple integral?
6.4.1
Q5 Suppose we want to approximate $\int_0^3 \int_2^{10} \int_{-2}^{2} f(x, y, z)\,dz\,dy\,dx$ by subdividing the domain of
integration into 12 sub-prisms of equal volume. What will V be?
Q6 Let R be a cube of side length 4, with edges parallel to the x-, y- and z-axes, and with vertices
(0, 0, 0) and (4, 4, 4). Suppose we want to approximate $\iiint_R xyz\,dV$ using a subdivision of R
into 8 identical cubes.
b What test points would you use to make your approximation as large as possible?
6.4.2
Q7 Is $\int_0^4 \int_0^7 \int_0^2 f(x, y, z)\,dz\,dy\,dx = \int_0^7 \int_0^2 \int_0^4 f(x, y, z)\,dz\,dy\,dx$? Explain.
Q8 Set up the bounds of integration of a function f (x, y, z) over a general prism
\[ P = \{(x, y, z) : x_0 \le x \le x_1,\ y_0 \le y \le y_1,\ z_0 \le z \le z_1\} \]
Q9 Evaluate $\int_0^2 \int_0^2 \int_0^3 (x + y)z\,dz\,dy\,dx$.
Q10 Evaluate $\int_0^5 \int_0^{11} \int_{-1}^{1} ye^{2x+z}\,dz\,dy\,dx$.
6.4.3
Q11 In a triple integral, the inner integral $\int_{z_0}^{z_1} f(x, y, z)\,dz$ is a function of x and y, while $\int_{y_0}^{y_1} \int_{z_0}^{z_1} f(x, y, z)\,dz\,dy$
is a function of only x.
b Explain why this makes sense given the context of an iterated triple integral.
Q12 In each of the following questions, assume x, y, z, and w are the variables of four-space.
b What is the dimension of the set of points that satisfy both x = x0 and y = y0 ?
Q13 Give the area of the x = 4 and y = 1 cross-section of the region “under” the graph of $w = x + ye^z$
and “above” the prism
P = {(x, y, z) : 0 ≤ x ≤ 6, 0 ≤ y ≤ 4, 0 ≤ z ≤ 3}
Q14 Give the volume of the x = 2 cross-section of the region “under” the graph of $w = \frac{z^2\sqrt{13 - x^2}}{y}$
P = {(x, y, z) : 0 ≤ x ≤ 3, 1 ≤ y ≤ 2, −3 ≤ z ≤ 3}
6.4.4
Q15 A prism of length ℓ, width w and height h can be defined by the inequalities
0≤x≤ℓ
0≤y≤w
0≤z≤h
Set up a triple integral to compute the volume of this prism. Verify that the value of this integral
matches the well-known volume formula, V = ℓwh.
Q16 Denser matter tends to sink to the bottom of a container. After sitting undisturbed for several
days, the density of a soil sample in the box
P = {(x, y, z) : 0 ≤ x ≤ 5, 0 ≤ y ≤ 4, 0 ≤ z ≤ 2}
is given by $\rho(x, y, z) = e^{-z/10}$. Find the total mass of the soil in the box.
Q17 Xavier, Yolanda and Zoe’s respective arrival times (in hours after noon) at a restaurant are given
by joint random variables X, Y and Z. The joint density function of X, Y and Z is
\[ f_{X,Y,Z}(x, y, z) = \begin{cases} \dfrac{12}{11}\left(1 - x^2 yz\right) & \text{if } 0 \le x \le 1,\ 0 \le y \le 1,\ 0 \le z \le 1 \\ 0 & \text{otherwise} \end{cases} \]
Q18 Random variables X, Y and Z are uniform if their density function has the form
\[ f_{X,Y,Z}(x, y, z) = \begin{cases} \dfrac{1}{V} & \text{if } (x, y, z) \text{ is in } R \\ 0 & \text{otherwise} \end{cases} \]
6.4.5
a Describe R geometrically.
ZZZ
b Set up the bounds of integration for f (x, y, z) dV .
R
c If we plug in the function f (x, y, z) = 1 do you happen to know the value of this integral?
Q20 Cheng is integrating over R, the region given by $x^2 + y^2 + z^2 \le 25$. He gives the following setup.
Is this valid?
\[ \int_{-\sqrt{25-y^2-z^2}}^{\sqrt{25-y^2-z^2}} \int_{-\sqrt{25-x^2-z^2}}^{\sqrt{25-x^2-z^2}} \int_{-\sqrt{25-x^2-y^2}}^{\sqrt{25-x^2-y^2}} f(x, y, z)\,dz\,dy\,dx \]
\[ D : \{(x, y, z) : y \ge 0,\ y \le -x,\ z \ge 9,\ z \le 25 - x^2 - y^2\} \]
a Set up the bounds of $\iiint_D x\,dV$. Do not evaluate.
\[ \iiint_R xz\,dV. \]
Q23 Let R be the region enclosed by the graphs $z = x^2 + y^2$ and 2y − z = 0. Set up the bounds for
$\iiint_R (y - 1)\,dV$. Do not evaluate.
Q24 Set up a triple integral that will compute the volume enclosed by the planes x = 0, x = 5, y = 0,
z = 2y and z = 6. Do not evaluate.
Q25 Let R be the region enclosed by $z = x^2$, z = 16, y = 2 and y = 6. Set up and evaluate
$\iiint_R x + z\,dV$.
√ √
Q26 Let R be the region enclosed by y = 25 − x2 , z = 6 − y and z = y. Set up the bounds of
$\iiint_R g(x, y, z)\,dV$.
6.4.6
Q27 Let P be a square pyramid with vertices (0, 0, 0), (2, 0, 0), (2, 2, 0), (0, 2, 0) and (0, 0, 4).
a Explain why it might not be a good idea to use z as the inner variable when setting up the
bounds of $\iiint_P f(x, y, z)\,dV$.
Q28 Set up the bounds of integration of $\iiint_T f(x, y, z)\,dV$, where T is a tetrahedron with vertices
(0, 0, 0), (8, 0, 0), (0, 6, 0) and (0, 0, 3).
6.4.7
Q29 Let R be the region over the first quadrant enclosed by $y = x^2$, x = 0, z = 0 and z = 4 − y.
Set up the integral $\iiint_R f(x, y, z)\,dV$
Q30 Let R be the region enclosed by the paraboloid $x = 3 - y^2 - z^2$ and the plane $y = \frac{1}{2}x$.
a Set up the integral $\iiint_R f(x, y, z)\,dV$ with z as the inner variable.
b Set up the integral $\iiint_R f(x, y, z)\,dV$ with x as the inner variable.
c Explain why it would be difficult to set up $\iiint_R f(x, y, z)\,dV$ with y as the inner variable.
Q31 Let P be the prism whose base has vertices (0, 0, 0), (0, 5, 0) and (0, 0, −2) and whose height is
4 units in the direction of the positive x axis. Set up a triple integral $\iiint_P g(x, y, z)\,dV$ in three
different ways, using three different inner variables.
Q32 Let P be the trapezoidal prism with vertices (0, 0, 0), (0, 6, 0), (0, 4, 2), (0, 0, 2), (5, 0, 0), (5, 6, 0),
(5, 4, 2), and (5, 0, 2). Set up the bounds of integration of $\iiint_P h(x, y, z)\,dV$ without writing
it as a sum or difference of multiple integrals.
Q33 Consider the tetrahedron T whose vertices are (0, 0, 0), (0, 0, 4), (0, 6, 3), (2, 6, 3). Which variable(s)
could you use as the inner variable of a triple integral over T without having to break the
domain into two or more pieces?
Q34 Set up (but do not evaluate) one or more integrals of f (x, y, z) over the region
\[ R = \{(x, y, z) : z \ge 0,\ x \ge y^2 + z^2,\ x + 2z \le 8\} \]
Use dxdydz as your order of integration.
Q35 Rewrite the integral $\int_0^1 \int_{x^2}^{x} \int_0^{x-y} f(x, y, z)\,dz\,dy\,dx$ as an integral with the differential dxdzdy.
Q36 Rewrite the integral $\int_0^2 \int_{2-x}^{2} \int_0^{4-x^2} f(x, y, z)\,dz\,dy\,dx$ as an integral with the differential dxdzdy.
6.4.8
Q37 Use product and sum rules to decompose $\int_3^4 \int_0^8 \int_{-1}^{1} y^2 \sin x - e^{y+z}\,dz\,dy\,dx$ into an expression
containing only single integrals.
Q38 Let $S = \{(x, y, z) : x^2 + y^2 + z^2 \le 25\}$. Explain why $\iiint_S x^3 y^4 \cos \pi z\,dV$ cannot be
decomposed as a product.
Synthesis & Extension
\[ D : \{(x, y, z) : x - 16 \le y \le 2,\ x^2 + y^2 \le z \le x^2 + x + 4\} \]
a Set up the bounds of $\iiint_D xyz\,dV$. You may use one or more integrals to do so. Do not
evaluate.
b Does the function f (x, y, z) = xyz have a maximum value on D? Justify your answer with
a theorem, and verify that the theorem does or does not apply.
Q40 Let S be the region above z = 0 and below the graph z = f (x, y) over the rectangle
R = {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d}.
c Show that if you evaluated your answer to b , your answer to a would be one of the steps
of this computation.
Q41 Suppose that R is the solid obtained by rotating the region under y = f (x) from x = a to x = b
around the x-axis. Write a triple integral that computes the volume of R.