Function of Multiple Variables

Uploaded by

enen930907

5. Derivatives of functions of multiple variables

5.1. Functions of multiple variables. We now introduce the three-dimensional coordinate system by passing a z-axis perpendicular to both the x- and y-axes at the origin. Taken as pairs, the axes determine three coordinate planes: the xy-plane, the xz-plane, and the yz-plane. These three coordinate planes separate the three-dimensional coordinate system into eight octants. For convenience we orient the z-axis according to the right-hand rule, as the figure shows.

[Figure: the x-, y- and z-axes, oriented by the right-hand rule.]

DEFINITION 5.1.1. A function f of multiple variables is a function that maps some subset D of $\mathbb{R}^n$ to $\mathbb{R}$, where we recall that
\[ \mathbb{R}^n = \{(x_1, x_2, \dots, x_n) : x_i \in \mathbb{R} \text{ for all } i\}. \]
We usually denote f as $f(x_1, \dots, x_n)$.

In this course, we mainly consider the case n = 2 and sometimes the case n = 3.
5.1.1. Graphs of two variable functions.

DEFINITION 5.1.2. Given a function $f(x, y)$ defined on a set $D \subset \mathbb{R}^2$, the graph of f is the set
\[ \{(x, y, f(x, y)) : (x, y) \in D\}. \]
We usually denote this set as
\[ z = f(x, y), \]
as we do in the one-variable case.

Similarly, for a function $f(x, y, z)$ defined on a set $D \subset \mathbb{R}^3$, the graph of f is the set
\[ \{(x, y, z, f(x, y, z)) : (x, y, z) \in D\}. \]
We usually denote this set as
\[ w = f(x, y, z). \]

Even for functions of two variables, it is usually not easy to draw their graphs. Furthermore, we cannot draw graphs of functions of three variables, because such a graph lies in four-dimensional space while we live in three-dimensional space.
From now on, when we draw a graph, we mean a graph in $\mathbb{R}^3$.

EXAMPLE 5.1.3. Draw the graph of the function $f(x, y) = 1$.

SOLUTION. By definition, the graph is
\[ z = 1. \]
Hence no matter what x and y are, we always get the value z = 1, so the graph of f is the horizontal plane at height 1 above the xy-plane.

One way to understand and visualize the shape of the graph of a function $f(x, y)$ is to cut the graph by many "planes" and examine the resulting cross-sections. This motivates the definitions of level curves and level surfaces.

DEFINITION 5.1.4. If c is a value in the range of f, then the curve
\[ f(x, y) = c \qquad \left( f(x, y) = c \iff \begin{cases} z = f(x, y), \\ z = c \end{cases} \right) \]
is called a level curve for $f(x, y)$. In other words, it is the curve obtained by intersecting the graph of f with the horizontal plane z = c and then projecting the intersection onto the xy-plane.

REMARK 5.1.5. Of course, we can just think of $f(x, y) = c$ as a curve in the xy-plane. Hence there are two meanings for the curve
\[ f(x, y) = c. \]

EXAMPLE 5.1.6. Consider the function $f(x, y) = x^2 + y^2$. Then the level curves are the circles centered at the origin
\[ x^2 + y^2 = c, \qquad c \ge 0, \]
of radius $\sqrt{c}$ (for c = 0 the circle degenerates to the single point (0, 0)). Hence we can imagine what the graph $z = x^2 + y^2$ looks like; see Example 5.1.9.
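As a quick numerical illustration of Example 5.1.6, the following Python sketch samples points on the circle of radius $\sqrt{c}$ and checks that f takes the value c there. The helper name `level_curve_points` and the sample values of c are assumptions made only for this illustration.

```python
import math

def f(x, y):
    return x**2 + y**2

def level_curve_points(c, n=8):
    """Sample n points on the circle of radius sqrt(c), i.e. on the level curve f = c."""
    r = math.sqrt(c)
    return [(r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
            for k in range(n)]

for c in (1.0, 4.0):
    values = [f(x, y) for x, y in level_curve_points(c)]
    # Largest deviation of f from c over the sampled points (rounding error only)
    print(c, max(abs(v - c) for v in values))
```

Plotting several such circles for equally spaced values of c gives the usual contour picture of the function.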

REMARK 5.1.7. If c is a value in the range of $f(x, y, z)$, then the surface
\[ f(x, y, z) = c \qquad \left( f(x, y, z) = c \iff \begin{cases} w = f(x, y, z), \\ w = c \end{cases} \right) \]
is called a level surface for f. In other words, it is the surface obtained by intersecting the graph of f with the "plane" w = c and then projecting the intersection onto the xyz-space.
Also, we can simply regard
\[ f(x, y, z) = c \]
as a surface in the xyz-space. Therefore, we still have two meanings for the surface
\[ f(x, y, z) = c. \]

5.1.2. Drawing the graphs of functions of two variables.

EXAMPLE 5.1.8. Draw the graph and level curves of the function
\[ f(x, y) = 1 - x - y. \]

SOLUTION. A straightforward way to draw the graph
\[ z = 1 - x - y \]
is to imitate what we do when we draw a curve in the xy-plane, i.e., for each value of z we draw a curve. For example, letting z = 0, we have the level curve
\[ 1 - x - y = 0 \implies x + y = 1, \]
which is a line. Letting z = 1, we get another level curve
\[ 1 = 1 - x - y \implies x + y = 0, \]
which is also a line. As the value of z changes, we get different level curves, all of them lines. Finally, we can roughly draw the graph.
On the other hand, if you remember what we learned in senior high school, then you know that the points satisfying the equation
\[ ax + by + cz = d, \qquad a, b, c \text{ not all zero}, \]
form a plane. Also, a plane is determined by two non-parallel vectors on it. Hence we can let x = 0, y = 0 (and z = 0 if you want to see more) and draw the lines
\[ z = 1 - y, \qquad z = 1 - x, \qquad (0 = 1 - x - y) \]
on the corresponding coordinate planes, respectively; these determine the plane.

The level curves of $f(x, y)$ are easy to get by definition. ■

EXAMPLE 5.1.9. Draw the graphs and level curves of the following functions.
(i) $f(x, y) = x^2 + y^2$.
(ii) $f(x, y) = x^2$.
(iii) $f(x, y) = x^2 - y^2$.

SOLUTION. (i) For each c > 0, the level curve of $f(x, y)$ is
\[ x^2 + y^2 = c, \]
which is a circle. Hence the level curves are concentric circles, and the graph is a bowl-shaped surface (a paraboloid) with its lowest point at the origin.

(ii) For each c > 0, the level curve of $f(x, y)$ is
\[ x^2 = c \iff x = \pm\sqrt{c}, \]
which is a union of two straight lines parallel to the y-axis. (In order to comprehend the graph of f in detail, we could cut the graph by the plane y = c. Then we get
\[ z = x^2, \]
which is a parabola in the xz-plane.)

(iii) For each c, the level curve of $f(x, y)$ is
\[ x^2 - y^2 = c, \]
which is a hyperbola if $c \ne 0$, and is the union of the two lines x + y = 0 and x - y = 0 if c = 0. (In order to comprehend the graph of f in detail, we could cut the graph by the plane y = c. Then we get
\[ z + c^2 = x^2, \]
which is a parabola in the xz-plane.)

DEFINITION 5.1.10 (local extreme value). Suppose that for every point (x, y) close to (a, b) we have
(i) $f(x, y) \le f(a, b)$; then $f(a, b)$ is called a local maximum;
(ii) $f(x, y) \ge f(a, b)$; then $f(a, b)$ is called a local minimum.

EXAMPLE 5.1.11. Consider the level curves of functions.
(i) How can we tell a maximum from a minimum using level curves?
(ii) Is the value at the origin a minimum or a maximum of $z = x^2 - y^2$?
(iii) What is the geometric meaning of the density of the level curves of a function $f(x, y)$?

SOLUTION. (i) If the values attached to the level curves near (a, b) decrease as we move away from (a, b), then $f(a, b)$ is a local maximum; if these values increase as we move away from (a, b), then $f(a, b)$ is a local minimum.
(ii) Neither. Near the origin there are level curves whose values are bigger than $0 = 0^2 - 0^2$ and level curves whose values are smaller than 0, so 0 is not a local extreme value.
(iii) If the level curves $f(x, y) = c$ for $c \in \mathbb{N}$ (or for any given set of values) are dense, then the graph $z = f(x, y)$ is steep. If these level curves are sparse, then the graph $z = f(x, y)$ is flat.

REMARK 5.1.12. Technically speaking, in this course functions of multiple variables are just simple generalizations of functions of one variable (this is not always the case, since we skip many explorations and details). For example, the methods for derivatives, extreme values and integrals here can be reduced to the one-variable case. This means that there should be no further difficulty in learning this material, based on what you learned last semester.
The main difference is that we should put more emphasis on geometric meanings and intuitions here. Try to imagine the pictures and keep their geometric intuition in mind. Otherwise, everything becomes mere manipulation of symbols and you will not get any reward (including your scores).

5.2. Derivatives of functions of two variables. In this section we extend properties of derivatives of functions of a single variable to functions of two variables.
5.2.1. Partial derivatives and partial derivative functions.

DEFINITION 5.2.1. Let f be a function of two variables x, y. The partial derivatives of f with respect to x and with respect to y at the point (a, b) are defined by
\[ f_x(a, b) = \lim_{h \to 0} \frac{f(a + h, b) - f(a, b)}{h} \]
and
\[ f_y(a, b) = \lim_{h \to 0} \frac{f(a, b + h) - f(a, b)}{h}, \]
respectively.

REMARK 5.2.2. For simplicity, we usually suppose that functions are good enough that partial derivatives of any order exist and are continuous.

REMARK 5.2.3. As (a, b) varies, we get different values $f_x(a, b)$ and $f_y(a, b)$. Hence we may think of $f_x(x, y)$ and $f_y(x, y)$ as functions on $\mathbb{R}^2$.

DEFINITION 5.2.4. The functions $f_x(x, y)$ and $f_y(x, y)$ are called the partial derivative functions of f with respect to x and with respect to y, respectively. Let $z = f(x, y)$. Then we denote
\[ \frac{\partial z}{\partial x}(x, y) = \frac{\partial f}{\partial x}(x, y) = f_x(x, y) \quad \text{and} \quad \frac{\partial z}{\partial y}(x, y) = \frac{\partial f}{\partial y}(x, y) = f_y(x, y), \]
where $\partial$ means "partial".

REMARK 5.2.5. Denote $g(x) = f(x, b)$ and $h(y) = f(a, y)$ for any fixed a and b. Then by definition, we can express
\[ f_x(x, b) = \frac{d}{dx} g(x) \quad \text{and} \quad f_y(a, y) = \frac{d}{dy} h(y). \tag{1} \]
Hence when we take a partial derivative of a function, we can regard the other variable as a constant. This reduces the calculation of partial derivatives to one-variable derivatives.
EXAMPLE 5.2.6. Find the partial derivatives of $f(x, y) = x e^{x^2 y}$ and evaluate each at the point $(1, \ln 2)$.

SOLUTION.
\[ f_x(x, y) = 1 \cdot e^{x^2 y} + x e^{x^2 y} \cdot 2xy = e^{x^2 y} + 2x^2 y\, e^{x^2 y} \]
and
\[ f_y(x, y) = x e^{x^2 y} \cdot x^2 = x^3 e^{x^2 y}. \]
Hence
\[ f_x(1, \ln 2) = 2 + 4 \ln 2 \quad \text{and} \quad f_y(1, \ln 2) = 2. \]
We now use Remark 5.2.5 to do the calculation again. Let
\[ g(x) = f(x, \ln 2) = x\, 2^{x^2} \quad \text{and} \quad h(y) = f(1, y) = e^y. \]
Recall that
\[ \frac{d}{dx} 2^x = 2^x \ln 2. \]
Hence
\[ g'(1) = \left. \left( 2^{x^2} + 2x^2\, 2^{x^2} \ln 2 \right) \right|_{x=1} = 2 + 4 \ln 2 \]
and
\[ h'(\ln 2) = \left. e^y \right|_{y = \ln 2} = 2. \]
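The values in Example 5.2.6 can be sanity-checked numerically. The sketch below approximates the limits of Definition 5.2.1 by symmetric difference quotients (a swapped-in numerical technique, not the method of these notes); the helper names `partial_x`, `partial_y` and the step size `h` are assumptions chosen for illustration.

```python
import math

def f(x, y):
    return x * math.exp(x**2 * y)

def partial_x(func, a, b, h=1e-6):
    """Symmetric difference quotient in x, with y = b held constant."""
    return (func(a + h, b) - func(a - h, b)) / (2 * h)

def partial_y(func, a, b, h=1e-6):
    """Symmetric difference quotient in y, with x = a held constant."""
    return (func(a, b + h) - func(a, b - h)) / (2 * h)

a, b = 1.0, math.log(2.0)
fx_numeric = partial_x(f, a, b)
fy_numeric = partial_y(f, a, b)
fx_exact = 2 + 4 * math.log(2.0)   # value found in the example
fy_exact = 2.0                     # value found in the example
print(fx_numeric, fx_exact)
print(fy_numeric, fy_exact)
```

Holding one argument fixed while varying the other is exactly the point of Remark 5.2.5: each partial derivative is an ordinary one-variable derivative.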

EXAMPLE 5.2.7. Find the partial derivative functions of the following functions.
(i) $f(x, y) = x e^{xy}$.
(ii) $f(x, y) = \sin(xy) + x^2 e^{xy}$.

SOLUTION. (i)
\[ f_x(x, y) = 1 \cdot e^{xy} + x (e^{xy} \cdot y) = e^{xy} + xy\, e^{xy}, \]
\[ f_y(x, y) = x e^{xy} \cdot x = x^2 e^{xy}. \]
(ii)
\[ f_x(x, y) = \cos(xy) \cdot y + (2x \cdot e^{xy} + x^2 e^{xy} \cdot y) = y \cos(xy) + (2x + x^2 y) e^{xy}, \]
\[ f_y(x, y) = \cos(xy) \cdot x + x^2 e^{xy} \cdot x = x \cos(xy) + x^3 e^{xy}. \]

EXAMPLE 5.2.8. Find the partial derivatives of the following functions.
(i) $f(x, y) = x \tan^{-1}(xy)$.
(ii) $f(x, y) = \int_{x^{1/3}}^{xy} \cos(t^3)\, dt$.

SOLUTION. (i) Recall that $\frac{d}{dx} \tan^{-1} x = \frac{1}{1 + x^2}$. Hence
\[ \frac{\partial}{\partial x} f(x, y) = 1 \cdot \tan^{-1}(xy) + x \cdot \frac{y}{1 + (xy)^2} = \tan^{-1}(xy) + \frac{xy}{1 + x^2 y^2}, \]
\[ \frac{\partial}{\partial y} f(x, y) = x \cdot \frac{x}{1 + (xy)^2} = \frac{x^2}{1 + x^2 y^2}. \]
(ii) Recall that if
\[ F(x) = \int_{\beta(x)}^{\alpha(x)} g(t)\, dt, \]
then
\[ \frac{d}{dx} F(x) = g(\alpha(x))\, \alpha'(x) - g(\beta(x))\, \beta'(x). \]
Therefore, if
\[ f(x, y) = \int_{\beta(x, y)}^{\alpha(x, y)} g(t)\, dt, \]
then
\[ \frac{\partial}{\partial x} f(x, y) = g(\alpha(x, y))\, \alpha_x(x, y) - g(\beta(x, y))\, \beta_x(x, y), \]
\[ \frac{\partial}{\partial y} f(x, y) = g(\alpha(x, y))\, \alpha_y(x, y) - g(\beta(x, y))\, \beta_y(x, y). \]
Now let $f(x, y) = \int_{x^{1/3}}^{xy} \cos(t^3)\, dt$. We have
\[ \frac{\partial}{\partial x} f(x, y) = \cos(x^3 y^3) \cdot y - \cos(x) \cdot \frac{1}{3} x^{-2/3} = y \cos(x^3 y^3) - \frac{1}{3} x^{-2/3} \cos x, \]
\[ \frac{\partial}{\partial y} f(x, y) = \cos(x^3 y^3) \cdot x - \cos x \cdot 0 = x \cos(x^3 y^3). \]
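The formulas of Example 5.2.8 (ii) can be checked numerically. The following sketch evaluates the integral with a composite Simpson rule and differentiates it by symmetric difference quotients; both are stand-in numerical techniques, and the evaluation point (1, 1.5), the step `h` and the helper names are assumptions for illustration only.

```python
import math

def integrand(t):
    return math.cos(t**3)

def simpson(g, lo, hi, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    step = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * step)
    return total * step / 3

def f(x, y):
    # f(x, y) = integral of cos(t^3) from x^(1/3) to x*y
    return simpson(integrand, x ** (1.0 / 3.0), x * y)

x0, y0, h = 1.0, 1.5, 1e-4
fx_numeric = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)
fy_numeric = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)

# Closed forms from the example
fx_formula = y0 * math.cos(x0**3 * y0**3) - x0 ** (-2.0 / 3.0) * math.cos(x0) / 3
fy_formula = x0 * math.cos(x0**3 * y0**3)
print(fx_numeric, fx_formula)
print(fy_numeric, fy_formula)
```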

REMARK 5.2.9. In the one-variable case, $f'(a)$ gives the rate of change (or equivalently the slope) of $f(x)$ (in the x-direction) at the point a. We have a similar geometric interpretation in the two-variable case.
(i) Fix y = b. Then the geometric interpretation of $f_x(x, b)$ is the slope in the x-direction of the curve
\[ z = f(x, b), \]
which is the intersection of the surface $z = f(x, y)$ and the plane y = b. In particular, if y = 0, then $f_x(x, 0)$ is the variation (slope) of
\[ z = f(x, 0) \]
in the x-direction on the xz-plane.
(ii) Fix x = a. Then the geometric interpretation of $f_y(a, y)$ is the slope in the y-direction of the curve
\[ z = f(a, y), \]
which is the intersection of the surface $z = f(x, y)$ and the plane x = a. In particular, if x = 0, then $f_y(0, y)$ is the variation (slope) of $z = f(0, y)$ in the y-direction on the yz-plane.

5.2.2. Equations of tangent planes.

PROPERTY 5.2.10. We recall how to get an equation of a plane E in $\mathbb{R}^3$.
(i) Suppose $\vec{N} = (\ell, m, n)$ is a normal vector for E and $(a, b, c) \in E$. Then
\[ E : \ell(x - a) + m(y - b) + n(z - c) = 0. \tag{2} \]
(ii) Two non-parallel vectors on E determine a normal vector for E and so determine the equation of E. More precisely, if $(a_1, b_1, c_1)$ and $(a_2, b_2, c_2)$ are two non-parallel vectors on the plane E, then their cross product
\[ \vec{N} = \left( \begin{vmatrix} b_1 & c_1 \\ b_2 & c_2 \end{vmatrix},\ -\begin{vmatrix} a_1 & c_1 \\ a_2 & c_2 \end{vmatrix},\ \begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix} \right) = (b_1 c_2 - b_2 c_1,\ a_2 c_1 - a_1 c_2,\ a_1 b_2 - a_2 b_1) \]
is a normal vector of the plane E.

PROPERTY 5.2.11. Given a function $f(x, y)$.
(i) If E is a tangent plane for the surface $z = f(x, y)$, the equation of E is of the form
\[ E : z = \ell x + my + k. \]
(ii) The equation of the tangent plane E at the point $(a, b, f(a, b))$ on the graph of f (i.e., on the surface $z = f(x, y)$) is
\[ z - f(a, b) = f_x(a, b)(x - a) + f_y(a, b)(y - b). \]
(iii) A normal vector for the surface $z = f(x, y)$ at the point $(a, b, f(a, b))$ is
\[ (f_x(a, b),\ f_y(a, b),\ -1). \]

PROOF. (i) Suppose $(\ell, m, 0)$ is a normal vector of E, i.e.,
\[ E : \ell x + my = k. \]
Note that the system
\[ \begin{cases} z = f(x, y), \\ \ell x + my = k \end{cases} \]
has infinitely many solutions: the vertical plane E cuts the surface along a whole curve. Hence E cannot be a tangent plane of $z = f(x, y)$. Therefore, after rescaling, a normal vector of a tangent plane of $z = f(x, y)$ has the form $(\ell, m, -1)$, i.e.,
\[ E : z = \ell x + my + k. \]
(ii) By (i), we may suppose that
\[ E : z = \ell x + my + k \]
is the tangent plane of the surface
\[ S : z = f(x, y) \]
at the point $(a, b, f(a, b))$. Consider the intersection of the plane x = a with E and S. Then
\[ z = my + (\ell a + k) \]
is the tangent line of
\[ z = f(a, y) \]
at $(y, z) = (b, f(a, b))$. Thus
\[ m = \frac{\partial z}{\partial y} = \frac{\partial f}{\partial y}(a, b) = f_y(a, b). \]
Similarly, considering the intersection of the plane y = b with E and S, we have
\[ \ell = f_x(a, b). \]
Thus
\[ E : z = f_x(a, b)\, x + f_y(a, b)\, y + k. \]
Since $(a, b, f(a, b))$ is on E, we have
\[ k = f(a, b) - f_x(a, b)\, a - f_y(a, b)\, b. \]
(iii) is a consequence of (ii). ■

DEFINITION 5.2.12. Given a function $f(x, y)$ (resp. $g(x, y, z)$). Then
\[ \nabla f(a, b) := (f_x(a, b),\ f_y(a, b)) \]
(resp. $\nabla g(a, b, c) := (g_x(a, b, c),\ g_y(a, b, c),\ g_z(a, b, c))$, see Definition 5.2.18) is called the gradient of f (resp. g) at the point (a, b) (resp. (a, b, c)).

In many respects gradients behave just as derivatives do in the one-variable case.

PROPERTY 5.2.13. We have the following properties for gradients.
(i) $\nabla(f + g)(a, b) = \nabla f(a, b) + \nabla g(a, b)$.
(ii) $\nabla(\alpha f)(a, b) = \alpha \nabla f(a, b)$, $\alpha \in \mathbb{R}$.
(iii) $\nabla(fg)(a, b) = g(a, b)\, \nabla f(a, b) + f(a, b)\, \nabla g(a, b)$.

REMARK 5.2.14 (Geometric meaning of gradients). Given functions $f(x, y)$ and $g(x, y, z)$.
(i) $\nabla f(a, b)$ (if it is not the zero vector) is a normal vector of the level curve $f(x, y) = c$ at the point (a, b), where $c = f(a, b)$. If f has a local extreme value at (a, b), then $\nabla f(a, b) = \vec{0}$.
(ii) $\nabla g(a, b, c)$ (if it is not the zero vector) is a normal vector of the level surface $g(x, y, z) = d$ at the point (a, b, c), where $d = g(a, b, c)$. (In particular,
\[ (f_x(a, b),\ f_y(a, b),\ -1) \]
is a normal vector of the surface $f(x, y) - z = 0$ at the point $(a, b, f(a, b))$, which is the assertion of Property 5.2.11.) If g has a local extreme value at (a, b, c), then $\nabla g(a, b, c) = \vec{0}$.
We will prove these properties later. Just keep these conclusions in mind.

EXAMPLE 5.2.15. Let $g(x) = x^2$ and $f(x, y) = y - g(x) = y - x^2$. Consider
\[ C : y = x^2 \iff C : f(x, y) = 0. \]
Then $\nabla f(x, y) = (-2x, 1)$ is a normal vector for C at (x, y). Recall that $(x', g'(x)) = (1, 2x)$ is a tangent vector of C at (x, y). Indeed,
\[ (-2x, 1) \cdot (1, 2x) = -2x \cdot 1 + 1 \cdot 2x = 0. \]

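EXAMPLE 5.2.16. Find the equation of the tangent plane of $f(x, y) = x \tan^{-1}(xy)$ at the point $(1, 1, \pi/4)$.

SOLUTION. We can use the results
\[ \frac{\partial}{\partial x} f(x, y) = \tan^{-1}(xy) + \frac{xy}{1 + x^2 y^2}, \qquad \frac{\partial}{\partial y} f(x, y) = \frac{x^2}{1 + x^2 y^2} \]
of Example 5.2.8 (i) to get
\[ \frac{\partial}{\partial x} f(1, 1) = \tan^{-1}(1) + \frac{1}{2} = \frac{\pi}{4} + \frac{1}{2}, \qquad \frac{\partial}{\partial y} f(1, 1) = \frac{1}{2}. \]
Or, using Remark 5.2.5, we can calculate directly that
\[ f_x(1, 1) = \left. \frac{d}{dx} \left( x \tan^{-1} x \right) \right|_{x=1} = \left. \left( \tan^{-1} x + \frac{x}{1 + x^2} \right) \right|_{x=1} = \frac{\pi}{4} + \frac{1}{2}, \]
\[ f_y(1, 1) = \left. \frac{d}{dy} \tan^{-1} y \right|_{y=1} = \frac{1}{2}. \]
Hence the equation of the tangent plane E at the point $(1, 1, \pi/4)$ is
\[ z = \frac{\pi}{4} + \left( \frac{\pi}{4} + \frac{1}{2} \right)(x - 1) + \frac{1}{2}(y - 1). \]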

Let us review the level curves of the following two functions again.

[Figures: level curves of the two functions, not reproduced here.]

QUESTION 5.2.17. Use the two examples above to answer the following questions.
(i) Can you determine from the level curves whether the partial derivatives $\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$ are positive or negative?
(ii) Can you give a geometric explanation of the direction orthogonal to the level curves?

SOLUTION. (i) Yes. Where the values of the level curves increase as we move in the positive x-direction (resp. y-direction), the partial derivative with respect to x (resp. y) is positive or zero; where they decrease, it is negative or zero.
(ii) The shortest path from a level curve to the next level curve is in the direction orthogonal to the starting level curve (i.e., in the direction of the gradient or its opposite). Hence the function has its maximum increase or maximum decrease in this direction. This phenomenon holds in general and will be proved in Section 5.4.3. ■

5.2.3. Linear approximations. Recall that an equation of the tangent line of the graph $y = f(x)$ at the point $(a, f(a))$ is
\[ y = f(a) + f'(a)(x - a). \]
When x is close to a, the value $f(x)$ is close to $f'(a)(x - a) + f(a)$. Therefore,
\[ f(x) \approx L_f(x) := f'(a)(x - a) + f(a) \]
when x is close to a. The function $L_f(x)$ is called the linear approximation of $f(x)$ at the point x = a. Denoting $\Delta x = x - a$ and $\Delta y = f(x) - f(a)$, we have
\[ \Delta y \approx f'(a)\, \Delta x. \]
As $\Delta x \to 0$, we have $\Delta y \to 0$ and, writing x for the base point,
\[ dy = f'(x)\, dx. \]
In a similar way, the tangent plane
\[ z = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b) \]
for the surface $z = f(x, y)$ at the point $(a, b, f(a, b))$ is called the linear approximation of $f(x, y)$ at the point (a, b). We have
\[ f(x, y) \approx L_f(x, y) := f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b) \tag{3} \]
when (x, y) is close to (a, b). Let $\Delta x = x - a$ and $\Delta y = y - b$. Also, we denote the change of values (error) of the function $f(x, y)$ around (a, b) by
\[ \Delta z = f(x, y) - f(a, b). \]
Suppose $\Delta x$ and $\Delta y$ are close to 0, so $\Delta z$ is also close to 0. Then we have
\[ \Delta z \approx f_x(a, b)\, \Delta x + f_y(a, b)\, \Delta y. \]
The ratio
\[ \frac{\Delta z}{f(a, b)} \]
is called the relative error of the function $f(x, y)$ at the point (a, b). Let $\Delta x \to 0$ and $\Delta y \to 0$. Then we have $\Delta z \to 0$ and the so-called Leibniz rule:
\[ dz = \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy = \frac{\partial z}{\partial x}\, dx + \frac{\partial z}{\partial y}\, dy. \tag{4} \]
5.2.4. Functions of three variables.

DEFINITION 5.2.18. Let f be a function of three variables x, y, z. The (first) partial derivative functions of f with respect to x, with respect to y, and with respect to z are the functions $f_x$, $f_y$ and $f_z$ defined by
\[ f_x(x, y, z) = \lim_{h \to 0} \frac{f(x + h, y, z) - f(x, y, z)}{h}, \]
\[ f_y(x, y, z) = \lim_{h \to 0} \frac{f(x, y + h, z) - f(x, y, z)}{h}, \]
and
\[ f_z(x, y, z) = \lim_{h \to 0} \frac{f(x, y, z + h) - f(x, y, z)}{h}, \]
provided these limits exist. Let $w = f(x, y, z)$. We denote
\[ \frac{\partial w}{\partial x} = \frac{\partial f}{\partial x} = f_x(x, y, z), \qquad \frac{\partial w}{\partial y} = \frac{\partial f}{\partial y} = f_y(x, y, z), \qquad \frac{\partial w}{\partial z} = \frac{\partial f}{\partial z} = f_z(x, y, z). \]

EXAMPLE 5.2.19. For $f(x, y, z) = x^3 y^2 z + \sin(xy)$, we have
\[ \frac{\partial f}{\partial x}(x, y, z) = 3x^2 y^2 z + y \cos(xy), \]
\[ \frac{\partial f}{\partial y}(x, y, z) = 2x^3 y z + x \cos(xy), \]
\[ \frac{\partial f}{\partial z}(x, y, z) = x^3 y^2. \]

EXAMPLE 5.2.20. For a function of the form $f(x, y, z) = F(x, y)\, G(y, z)$, we have
\[ f_x(x, y, z) = F_x(x, y)\, G(y, z), \]
\[ f_y(x, y, z) = F_y(x, y)\, G(y, z) + F(x, y)\, G_y(y, z), \]
\[ f_z(x, y, z) = F(x, y)\, G_z(y, z). \]

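EXAMPLE 5.2.21. Find the partial derivative functions of the following functions.
(i) $f(x, y, z) = x y^2 z^3$.
(ii) $f(x, y, z) = x^2 e^{y/z}$.

SOLUTION. (i)
\[ f_x = y^2 z^3, \qquad f_y = 2xyz^3, \qquad f_z = 3xy^2 z^2. \]
(ii)
\[ f_x = 2x e^{y/z}, \qquad f_y = x^2 e^{y/z} \cdot \frac{1}{z} = \frac{x^2}{z} e^{y/z}, \qquad f_z = x^2 e^{y/z} \cdot \left( -\frac{y}{z^2} \right) = -\frac{x^2 y}{z^2} e^{y/z}. \]

We also have a linear approximation when f is a function of three variables. Let $w = f(x, y, z)$ and $d = f(a, b, c)$. Denote
\[ \Delta x = x - a, \quad \Delta y = y - b, \quad \Delta z = z - c, \quad \Delta w = f(x, y, z) - f(a, b, c). \]
When $|\Delta x|$, $|\Delta y|$, $|\Delta z|$ are small, $|\Delta w|$ is also small, and the linear approximation of w at the point (a, b, c) is given by
\[ \Delta w \approx f_x(a, b, c)\, \Delta x + f_y(a, b, c)\, \Delta y + f_z(a, b, c)\, \Delta z. \]
Further, letting $\Delta x, \Delta y, \Delta z \to 0$, we have $\Delta w \to 0$ and
\[ dw = \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy + \frac{\partial f}{\partial z}\, dz. \]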
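5.3. Chain rules for functions of multiple variables. Recall that for $z = f(x, y)$, Leibniz's rule (4) tells us that
\[ dz = \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy. \tag{5} \]
Suppose $x = x(u, v)$ and $y = y(u, v)$ are functions of u, v. Hence
\[ z = f(x, y) = f(x(u, v), y(u, v)) \]
is a function of u, v.

QUESTION 5.3.1. What are the partial derivatives of z with respect to u and v, respectively?

Using Leibniz's rule again for $x = x(u, v)$ and $y = y(u, v)$, we have
\[ dx = \frac{\partial x}{\partial u}\, du + \frac{\partial x}{\partial v}\, dv, \qquad dy = \frac{\partial y}{\partial u}\, du + \frac{\partial y}{\partial v}\, dv. \]
Substituting these equalities into (5), we have
\begin{align*}
dz &= \frac{\partial f}{\partial x} \left( \frac{\partial x}{\partial u}\, du + \frac{\partial x}{\partial v}\, dv \right) + \frac{\partial f}{\partial y} \left( \frac{\partial y}{\partial u}\, du + \frac{\partial y}{\partial v}\, dv \right) \\
&= \left( \frac{\partial f}{\partial x} \frac{\partial x}{\partial u} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial u} \right) du + \left( \frac{\partial f}{\partial x} \frac{\partial x}{\partial v} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial v} \right) dv.
\end{align*}
Dividing the last equality by du and by dv, respectively, and using
\[ \frac{dv}{du} = \frac{du}{dv} = 0 \quad \text{(since u, v are independent variables)}, \]
we have
\[ \frac{dz}{du} = \frac{\partial f}{\partial x} \frac{\partial x}{\partial u} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial u}, \qquad \frac{dz}{dv} = \frac{\partial f}{\partial x} \frac{\partial x}{\partial v} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial v}. \]
Note that since z is a function of (u, v), we write $\frac{\partial z}{\partial u}$ in place of $\frac{dz}{du}$ and $\frac{\partial z}{\partial v}$ in place of $\frac{dz}{dv}$ (recall that $\partial$ means "partial"). Thus we have the chain rules stated below.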

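PROPERTY 5.3.2 (Chain rules I). Suppose that
\[ z = f(x, y) = f(x(u, v), y(u, v)). \]
Then we have
\[ \frac{\partial z}{\partial u} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial u} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial u}, \qquad \frac{\partial z}{\partial v} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial v} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial v}. \tag{6} \]
In particular, if
\[ z = f(x, y) = f(x(t), y(t)), \]
then we have
\[ \frac{dz}{dt} = \frac{\partial z}{\partial x} \frac{dx}{dt} + \frac{\partial z}{\partial y} \frac{dy}{dt}. \]
Note that since x and y are functions of t only, we write $\frac{\partial x}{\partial t}$ and $\frac{\partial y}{\partial t}$ as $\frac{dx}{dt}$ and $\frac{dy}{dt}$, respectively.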

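EXAMPLE 5.3.3. (i) Suppose
\[ z(x, y) = xy, \qquad x(u, v) = 2u - v, \qquad y(u, v) = \ln(uv). \]
Calculate $\frac{\partial z}{\partial u}$ and $\frac{\partial z}{\partial v}$.
(ii) Let
\[ z = x^2 + y^2, \qquad x = \cos t, \qquad y = \sin t. \]
Calculate $\frac{dz}{dt}$.

SOLUTION. (i)
\begin{align*}
\frac{\partial z}{\partial u} &= \frac{\partial z}{\partial x} \frac{\partial x}{\partial u} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial u} \\
&= y \cdot 2 + x \cdot \frac{1}{uv} \cdot v \\
&= 2 \ln(uv) + \frac{2u - v}{u}, \\
\frac{\partial z}{\partial v} &= \frac{\partial z}{\partial x} \frac{\partial x}{\partial v} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial v} \\
&= y \cdot (-1) + x \cdot \frac{u}{uv} \\
&= -\ln(uv) + \frac{2u - v}{v}.
\end{align*}
(ii)
\begin{align*}
\frac{dz}{dt} &= \frac{\partial z}{\partial x} \frac{dx}{dt} + \frac{\partial z}{\partial y} \frac{dy}{dt} \\
&= 2x \cdot (-\sin t) + 2y \cdot (\cos t) \\
&= -2 \cos t \sin t + 2 \sin t \cos t = 0.
\end{align*}
Also note that since $z = \cos^2 t + \sin^2 t = 1$, we have
\[ \frac{dz}{dt} = 0. \]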
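The chain-rule formulas of Example 5.3.3 (i) can be compared with a direct numerical differentiation of the composite function. In the sketch below the evaluation point (u, v) = (2, 3), the step `h` and the function names are assumptions for illustration; the difference quotient is a numerical stand-in for the partial derivative.

```python
import math

def z_of_uv(u, v):
    """z = x * y with x = 2u - v and y = ln(uv), written as a function of (u, v)."""
    x = 2 * u - v
    y = math.log(u * v)
    return x * y

def dz_du(u, v):
    # Chain-rule result from the example
    return 2 * math.log(u * v) + (2 * u - v) / u

def dz_dv(u, v):
    # Chain-rule result from the example
    return -math.log(u * v) + (2 * u - v) / v

u0, v0, h = 2.0, 3.0, 1e-6
du_numeric = (z_of_uv(u0 + h, v0) - z_of_uv(u0 - h, v0)) / (2 * h)
dv_numeric = (z_of_uv(u0, v0 + h) - z_of_uv(u0, v0 - h)) / (2 * h)
print(du_numeric, dz_du(u0, v0))
print(dv_numeric, dz_dv(u0, v0))
```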

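EXAMPLE 5.3.4. Let $w = w(x, y)$, $x = x(u, v)$ and $y = y(u, v)$. Suppose that at (u, v) = (0, 1) we have $(x(0, 1), y(0, 1)) = (2, 1)$. Also, we suppose
\[ \frac{\partial x}{\partial u}(0, 1) = -1, \quad \frac{\partial x}{\partial v}(0, 1) = 1, \quad \frac{\partial y}{\partial u}(0, 1) = -2, \quad \frac{\partial y}{\partial v}(0, 1) = 1, \]
and
\[ \frac{\partial w}{\partial x}(2, 1) = 1, \qquad \frac{\partial w}{\partial y}(2, 1) = 2. \]
Use linear approximation to estimate the change of the value w when (u, v) moves from (0, 1) to (0.01, 1.1).

SOLUTION. Let $f(u, v) = w(x(u, v), y(u, v))$. Hence $w = f(u, v)$. We want to estimate
\[ \Delta w = f(0.01, 1.1) - f(0, 1). \]
Note that
\[ \Delta u = 0.01 - 0 = 0.01, \qquad \Delta v = 1.1 - 1 = 0.1. \]
We have
\begin{align*}
\Delta w &\approx \frac{\partial w}{\partial u}\, \Delta u + \frac{\partial w}{\partial v}\, \Delta v \\
&= \left( \frac{\partial w}{\partial x}(2, 1) \frac{\partial x}{\partial u}(0, 1) + \frac{\partial w}{\partial y}(2, 1) \frac{\partial y}{\partial u}(0, 1) \right) \Delta u \\
&\quad + \left( \frac{\partial w}{\partial x}(2, 1) \frac{\partial x}{\partial v}(0, 1) + \frac{\partial w}{\partial y}(2, 1) \frac{\partial y}{\partial v}(0, 1) \right) \Delta v \\
&= (1 \cdot (-1) + 2 \cdot (-2)) \cdot 0.01 + (1 \cdot 1 + 2 \cdot 1) \cdot 0.1 \\
&= -0.05 + 0.3 = 0.25.
\end{align*}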
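The arithmetic of Example 5.3.4 can be replayed step by step. The sketch below takes the partial derivatives of x and y with respect to u to be -1 and -2 and the remaining given derivatives to be positive, as they are used in the solution's arithmetic; the variable names are assumptions for illustration.

```python
# Data given in Example 5.3.4
w_x, w_y = 1.0, 2.0            # partial derivatives of w at (x, y) = (2, 1)
x_u, x_v = -1.0, 1.0           # partial derivatives of x at (u, v) = (0, 1)
y_u, y_v = -2.0, 1.0           # partial derivatives of y at (u, v) = (0, 1)
du, dv = 0.01 - 0.0, 1.1 - 1.0

w_u = w_x * x_u + w_y * y_u    # chain rule: partial derivative of w in u
w_v = w_x * x_v + w_y * y_v    # chain rule: partial derivative of w in v
delta_w = w_u * du + w_v * dv  # linear approximation of the change in w
print(w_u, w_v, delta_w)
```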

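REMARK 5.3.5 (Chain rules II). The chain rules for partial derivatives of functions of more variables have similar forms. For example, suppose that
\[ z = f(x, y) = f(x(u, v, w), y(u, v, w)). \]
Then we have
\[ \frac{\partial z}{\partial u} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial u} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial u}, \quad \frac{\partial z}{\partial v} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial v} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial v}, \quad \frac{\partial z}{\partial w} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial w} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial w}. \]
On the other hand, suppose
\[ w = f(x, y, z) = f(x(t), y(t), z(t)). \]
Then, since w is a function of t only,
\[ \frac{dw}{dt} = \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt} + \frac{\partial f}{\partial z} \frac{dz}{dt}. \]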
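EXAMPLE 5.3.6 (Implicit derivatives). Let $c \in \mathbb{R}$.
(i) Given $F(x, y, z) = c$, consider $z = z(x, y)$ as a function of (x, y). Show that
\[ F_x + F_z \cdot \frac{\partial z}{\partial x} = 0 \quad \text{and} \quad F_y + F_z \cdot \frac{\partial z}{\partial y} = 0. \]
Hence when $F_z \ne 0$,
\[ \frac{\partial z}{\partial x} = -\frac{F_x}{F_z} \quad \text{and} \quad \frac{\partial z}{\partial y} = -\frac{F_y}{F_z}. \]
(ii) Suppose
\[ \frac{1}{x^2 + z^2} + \frac{1}{y^2 + z^2} = 1. \]
Calculate $\frac{\partial z}{\partial y}$ at the point (1, 1, 1).

PROOF. (i) Differentiate both sides of the equation with respect to x. By the chain rule, we have
\begin{align*}
0 &= \frac{\partial F}{\partial x} \frac{\partial x}{\partial x} + \frac{\partial F}{\partial y} \frac{\partial y}{\partial x} + \frac{\partial F}{\partial z} \frac{\partial z}{\partial x} \\
&= F_x \cdot 1 + F_y \cdot 0 + F_z \cdot \frac{\partial z}{\partial x} \\
&= F_x + F_z \cdot \frac{\partial z}{\partial x}.
\end{align*}
The partial derivative with respect to y is similar.
(ii) Let
\[ F(x, y, z) = \frac{1}{x^2 + z^2} + \frac{1}{y^2 + z^2} - 1. \]
Then $F(x, y, z) = 0$. Further,
\[ F_y = -\frac{2y}{(y^2 + z^2)^2}, \qquad F_z = -\frac{2z}{(z^2 + x^2)^2} - \frac{2z}{(y^2 + z^2)^2}. \]
Hence
\[ F_y(1, 1, 1) = -\frac{1}{2}, \qquad F_z(1, 1, 1) = -1. \]
By (i), we have
\[ \frac{\partial z}{\partial y} = -\frac{F_y(1, 1, 1)}{F_z(1, 1, 1)} = -\frac{1}{2}. \]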

5.4. Directional derivatives and gradients. We now give proofs of the geometric properties of gradients stated in Remark 5.2.14.
5.4.1. Directional derivative. A vector $\vec{u} = (u_1, u_2)$ (resp. $\vec{v} = (v_1, v_2, v_3)$) is called a unit vector if
\[ |\vec{u}| := \sqrt{u_1^2 + u_2^2} = 1 \quad \left( \text{resp. } |\vec{v}| := \sqrt{v_1^2 + v_2^2 + v_3^2} = 1 \right). \]
For $h \in \mathbb{R}$,
\[ h\vec{u} := (hu_1, hu_2) \quad (\text{resp. } h\vec{v} := (hv_1, hv_2, hv_3)). \]
The inner product of two vectors $\vec{u} = (u_1, u_2)$ and $\vec{w} = (w_1, w_2)$ is defined as
\[ \vec{u} \cdot \vec{w} := u_1 w_1 + u_2 w_2 = |\vec{u}|\, |\vec{w}| \cos\theta, \]
where $\theta$ is the angle between $\vec{u}$ and $\vec{w}$.

DEFINITION 5.4.1. For each unit vector $\vec{u} = (u_1, u_2)$, the limit
\[ \frac{\partial f}{\partial \vec{u}}(a, b) = \lim_{h \to 0} \frac{f((a, b) + h\vec{u}) - f(a, b)}{h} \]
is called the directional derivative of f at (a, b) in the direction $\vec{u}$. Hence $\frac{\partial f}{\partial \vec{u}}$ gives the rate of change of f in the direction of $\vec{u}$.
Similarly, for each unit vector $\vec{v} = (v_1, v_2, v_3)$, the limit
\[ \frac{\partial g}{\partial \vec{v}}(a, b, c) = \lim_{h \to 0} \frac{g((a, b, c) + h\vec{v}) - g(a, b, c)}{h} \]
is called the directional derivative of g at (a, b, c) in the direction $\vec{v}$.

REMARK 5.4.2. Note that each of the partial derivatives $f_x$, $f_y$ is itself a directional derivative. More precisely, if $\vec{u} = (1, 0)$ and $\vec{v} = (0, 1)$, then
\[ f_x = \frac{\partial f}{\partial \vec{u}}, \qquad f_y = \frac{\partial f}{\partial \vec{v}}. \]

PROPERTY 5.4.3. Let $\vec{u}$ be a unit vector. Then the directional derivative of $f(x, y)$ at (a, b) in the direction $\vec{u}$ is equal to
\[ \frac{\partial f}{\partial \vec{u}}(a, b) = \nabla f(a, b) \cdot \vec{u}. \]
Similarly, let $\vec{v}$ be a unit vector in $\mathbb{R}^3$. Then the directional derivative of $g(x, y, z)$ at (a, b, c) in the direction $\vec{v}$ is equal to
\[ \frac{\partial g}{\partial \vec{v}}(a, b, c) = \nabla g(a, b, c) \cdot \vec{v}. \]

PROOF. Suppose $\vec{u} = (u_1, u_2)$. By the linear approximation (3) of $f(x, y)$, we have for |h| small
\[ f((a, b) + h\vec{u}) = f(a + hu_1, b + hu_2) \approx f(a, b) + f_x(a, b)\, hu_1 + f_y(a, b)\, hu_2. \]
Hence
\begin{align*}
\frac{\partial f}{\partial \vec{u}}(a, b) &= \lim_{h \to 0} \frac{f(a + hu_1, b + hu_2) - f(a, b)}{h} \\
&\text{``=''} \lim_{h \to 0} \left( \frac{f_x(a, b)\, hu_1 + f_y(a, b)\, hu_2}{h} \right) \\
&= (f_x(a, b), f_y(a, b)) \cdot (u_1, u_2) \\
&= \nabla f(a, b) \cdot \vec{u}.
\end{align*}

REMARK 5.4.4. The formula above also makes sense for a vector that is not a unit vector. In fact, for any $r \in \mathbb{R}$, we have
\begin{align*}
\frac{\partial f}{\partial (r\vec{u})}(a, b) &= \nabla f(a, b) \cdot r\vec{u} \\
&= r\, \nabla f(a, b) \cdot \vec{u} \\
&= r \cdot \frac{\partial f}{\partial \vec{u}}(a, b).
\end{align*}
Thus rescaling the vector rescales the value. In order to compare the rates of change in different directions $\vec{u}$ and $\vec{v}$ in a meaningful way, a proper and convenient choice is to suppose that they have the same length 1. This is why we assume the vector $\vec{u}$ to be a unit vector when we calculate the directional derivative.

EXAMPLE 5.4.5. Find the directional derivatives of the function $f(x, y) = x^2 + y^2$ at the point (1, 0) in the directions of (1, 1) and (-2, -2), respectively.

SOLUTION. The unit vectors in the directions (1, 1) and (-2, -2) are
\[ \vec{u} = \frac{1}{\sqrt{1^2 + 1^2}}(1, 1) = \left( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right), \]
\[ \vec{v} = \frac{1}{\sqrt{(-2)^2 + (-2)^2}}(-2, -2) = \left( -\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}} \right), \]
respectively. Hence
\[ \frac{\partial f}{\partial \vec{u}}(1, 0) = \nabla f(1, 0) \cdot \vec{u} = (2, 0) \cdot \left( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right) = \sqrt{2} \]
and
\[ \frac{\partial f}{\partial \vec{v}}(1, 0) = \nabla f(1, 0) \cdot \vec{v} = (2, 0) \cdot \left( -\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}} \right) = -\sqrt{2}. \]

EXAMPLE 5.4.6. Find the directional derivative of the function $f(x, y, z) = 2xz^2 \cos(\pi y)$ at the point (1, 2, -1) toward the point (2, 1, 3).

SOLUTION. The vector from (1, 2, -1) to (2, 1, 3) is
\[ (2, 1, 3) - (1, 2, -1) = (1, -1, 4). \]
Hence the unit vector $\vec{u}$ in this direction is
\[ \vec{u} = \frac{1}{\sqrt{1^2 + (-1)^2 + 4^2}}(1, -1, 4) = \frac{1}{3\sqrt{2}}(1, -1, 4). \]
Also
\[ \nabla f(1, 2, -1) = \left. \left( 2z^2 \cos(\pi y),\ -2\pi x z^2 \sin(\pi y),\ 4xz \cos(\pi y) \right) \right|_{(1, 2, -1)} = (2, 0, -4). \]
Hence
\[ \frac{\partial f}{\partial \vec{u}}(1, 2, -1) = \nabla f(1, 2, -1) \cdot \vec{u} = \frac{1}{3\sqrt{2}}(2 - 16) = -\frac{14}{3\sqrt{2}}. \]

5.4.2. Tangent vectors of curves. We recall (observe) some properties of curves. Suppose
\[ C : \gamma(t) = (x(t), y(t)) \]
is a curve in the xy-plane and (x(0), y(0)) = (a, b). Then
\[ \gamma'(0) := (x'(0), y'(0)) \]
is a tangent vector of the curve C at the point (a, b). This is easy to see: the slope of the tangent line at the point (a, b) is equal to
\[ \left. \frac{dy}{dx} \right|_{(a, b)} = \frac{\left. \frac{dy}{dt} \right|_{t=0}}{\left. \frac{dx}{dt} \right|_{t=0}} = \frac{y'(0)}{x'(0)} \]
(when $x'(0) \ne 0$). Hence
\[ \gamma'(0) = (x'(0), y'(0)) \]
is a tangent vector of C at (a, b).
We have a similar result for the curve
\[ C : \gamma(t) = (x(t), y(t), z(t)) \]
in the xyz-space. Suppose (x(0), y(0), z(0)) = (a, b, c). Then
\[ \gamma'(0) := \lim_{t \to 0} \frac{1}{t - 0} \big( x(t) - x(0),\ y(t) - y(0),\ z(t) - z(0) \big) = (x'(0), y'(0), z'(0)) \]
is a tangent vector of the curve C at the point (a, b, c).

5.4.3. Geometric properties of gradients.

PROPERTY 5.4.7 (Geometric meaning of gradients). Suppose $\nabla f(a, b) \ne (0, 0)$.
(i) Starting from (a, b), the function $f(x, y)$ increases most rapidly in the direction of the gradient $\nabla f(a, b)$, and decreases most rapidly in the opposite direction $-\nabla f(a, b)$.
(ii) Let $c = f(a, b)$. Then $\nabla f(a, b)$ is perpendicular to the level curve $f(x, y) = c$ at the point (a, b), i.e., $\nabla f(a, b)$ is a normal vector for the level curve $f(x, y) = c$ at (a, b).
(iii) The bigger $|\nabla f(a, b)|$ is, the denser the level curves near (a, b) are.

PROOF. (i) For a unit vector $\vec{u}$, note that
\begin{align*}
\frac{\partial f}{\partial \vec{u}}(a, b) &= \nabla f(a, b) \cdot \vec{u} \\
&= |\nabla f(a, b)| \cos\theta \\
&\le |\nabla f(a, b)|,
\end{align*}
where $\theta$ is the angle between the two vectors $\nabla f(a, b)$ ($\ne (0, 0)$ by assumption) and $\vec{u}$. Hence the direction of most rapid increase of f is determined by $\cos\theta = 1$, i.e., $\theta = 0$. Therefore, $f(x, y)$ increases most rapidly in the direction $\nabla f(a, b)$. In this case, $\frac{\partial f}{\partial \vec{u}}(a, b)$ has the value $|\nabla f(a, b)|$.
Similarly, the direction of most rapid decrease of f is determined by $\cos\theta = -1$, i.e., $\theta = \pi$. Therefore, $f(x, y)$ decreases most rapidly in the direction $-\nabla f(a, b)$. In this case, $\frac{\partial f}{\partial \vec{u}}(a, b)$ has the value $-|\nabla f(a, b)|$.
(ii) Suppose the level curve near the point (a, b) is parametrized in the form
\[ C : (x(t), y(t)), \qquad (x(0), y(0)) = (a, b). \]
Since C lies on the level curve $f(x, y) = c$, we have
\[ f(x(t), y(t)) = c \]
for t close to 0. Taking the derivative with respect to t on both sides at t = 0, we have
\begin{align*}
0 &= \left. \frac{d}{dt} \big( f(x(t), y(t)) \big) \right|_{t=0} \\
&= \left. \left( \frac{\partial f}{\partial x}(x(t), y(t)) \frac{dx}{dt}(t) + \frac{\partial f}{\partial y}(x(t), y(t)) \frac{dy}{dt}(t) \right) \right|_{t=0} \\
&= \frac{\partial f}{\partial x}(a, b)\, x'(0) + \frac{\partial f}{\partial y}(a, b)\, y'(0) \\
&= \nabla f(a, b) \cdot (x'(0), y'(0)).
\end{align*}
Since $(x'(0), y'(0))$ is a tangent vector of the curve $f(x, y) = c$ at the point (a, b) and its inner product with $\nabla f(a, b)$ is zero, $\nabla f(a, b)$ is perpendicular to the curve $f(x, y) = c$ at the point (a, b).
(iii) follows directly from (i). ■
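Part (i) of Property 5.4.7 can be illustrated by brute force: scan many unit directions and see where the directional derivative is largest and smallest. The sketch below uses $f(x, y) = x^2 + y^2$ at the point (1, 0), as in Example 5.4.5; the one-degree angular grid and the variable names are assumptions for illustration.

```python
import math

def gradient(x, y):
    # Gradient of f(x, y) = x^2 + y^2
    return (2 * x, 2 * y)

a, b = 1.0, 0.0
gx, gy = gradient(a, b)

# Directional derivative grad f . u for unit vectors u = (cos t, sin t)
angles = [2 * math.pi * k / 360 for k in range(360)]
rates = [gx * math.cos(t) + gy * math.sin(t) for t in angles]

best = max(range(360), key=lambda k: rates[k])
worst = min(range(360), key=lambda k: rates[k])
print(angles[best], rates[best])     # direction of fastest increase and its rate
print(angles[worst], rates[worst])   # direction of fastest decrease and its rate
print(math.hypot(gx, gy))            # |grad f(a, b)|
```

In this instance the fastest increase occurs at angle 0, the direction of the gradient (2, 0), with rate equal to the length of the gradient, and the fastest decrease occurs in the opposite direction.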

REMARK 5.4.8. Let $f(x, y) = x^2 - y^2$. Then
\[ f(x, y) = 0 \iff x + y = 0 \ \text{or} \ x - y = 0 \]
is a union of two lines passing through the origin (0, 0). Note that $\nabla f(0, 0) = (0, 0)$ and there is no normal vector of the curve $f(x, y) = 0$ at (0, 0).
On the other hand, let $f(x, y) = x^2$. Then $\nabla f(0, 0) = (0, 0)$. But the level curve
\[ f(x, y) = 0 \iff x = 0 \]
has a tangent vector (0, 1) at the point (0, 0).

REMARK 5.4.9. Suppose $z = f(x, y)$ and denote $\gamma(t) = (x(t), y(t))$. Recalling the calculation in Property 5.4.7, we have
\begin{align*}
\frac{dz}{dt} &= \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt} \\
&= \nabla f(x(t), y(t)) \cdot (x'(t), y'(t)) \\
&= \nabla f(x, y) \cdot \gamma'(t).
\end{align*}
We can regard this formula geometrically as the rate of change of $f(x, y)$ along the direction of the curve $\gamma(t) = (x(t), y(t))$. This is a generalization of directional derivatives.

COROLLARY 5.4.10. The equation of the tangent line of the level curve $f(x, y) = c$ at the point (a, b) is
\[ (x - a, y - b) \cdot \nabla f(a, b) = 0, \]
or equivalently
\[ f_x(a, b)(x - a) + f_y(a, b)(y - b) = 0. \]

EXAMPLE 5.4.11. Let C be the curve
\[ C : x^2 + 2y^3 = xy + 4. \]
Find a tangent vector and a normal vector at the point (2, 1). Also, determine the equation of the tangent line at the point (2, 1).

SOLUTION. Let $f(x, y) = x^2 + 2y^3 - xy$. Then
\[ C : f(x, y) = 4. \]
Therefore,
\[ \nabla f(2, 1) = \left. (2x - y,\ 6y^2 - x) \right|_{(2, 1)} = (3, 4) \]
is a normal vector of C at the point (2, 1). Since
\[ (3, 4) \cdot (4, -3) = 0, \]
(4, -3) is a tangent vector of C at the point (2, 1). The equation of the tangent line at the point (2, 1) is
\[ 3(x - 2) + 4(y - 1) = 0. \]
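The vectors found in Example 5.4.11 can be checked directly. The sketch below takes $f(x, y) = x^2 + 2y^3 - xy$, the normal vector (3, 4) and the tangent vector (4, -3), and also walks a short distance along the tangent line to see how little f changes; the parameter values `t` are assumptions for illustration.

```python
def f(x, y):
    return x**2 + 2 * y**3 - x * y

normal = (3.0, 4.0)       # gradient of f at (2, 1), from the example
tangent = (4.0, -3.0)     # a vector orthogonal to the normal

dot = normal[0] * tangent[0] + normal[1] * tangent[1]
print(f(2.0, 1.0), dot)   # the point lies on f = 4 and the dot product vanishes

# Moving along the tangent line changes f only to second order in t.
for t in (1e-1, 1e-2, 1e-3):
    x, y = 2.0 + t * tangent[0], 1.0 + t * tangent[1]
    on_line = normal[0] * (x - 2.0) + normal[1] * (y - 1.0)   # should be about 0
    print(t, on_line, f(x, y) - 4.0)
```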

PROPERTY 5.4.12. Given a function $f(x, y, z)$, suppose $f(a, b, c) = d$. If $\nabla f(a, b, c) \neq (0, 0, 0)$, then $\nabla f(a, b, c)$ is perpendicular to the level surface $f(x, y, z) = d$ at the point $(a, b, c)$, or equivalently, $\nabla f(a, b, c)$ is a normal vector of the level surface at the point $(a, b, c)$.

PROOF. The idea is the same as for the normal vector of a level curve at a point. The assertion is equivalent to proving that $\nabla f(a, b, c)$ is perpendicular to every curve $C$ on $f(x, y, z) = d$ passing through $(a, b, c)$.
A curve $C$ on $f(x, y, z) = d$ passing through $(a, b, c)$ can be assumed to have the form

\[ C : \gamma(t) = (x(t), y(t), z(t)), \qquad (x(0), y(0), z(0)) = (a, b, c) \]

for $t$ close to 0. Since $C$ lies on $f(x, y, z) = d$, we have

\[ f(x(t), y(t), z(t)) = d. \]

Taking derivatives at $t = 0$ on both sides, we have

\[
\begin{aligned}
0 &= \frac{d}{dt}\big(f(x(t), y(t), z(t))\big)\Big|_{t=0} \\
&= \frac{\partial f}{\partial x}(a, b, c)\, x'(0) + \frac{\partial f}{\partial y}(a, b, c)\, y'(0) + \frac{\partial f}{\partial z}(a, b, c)\, z'(0) \\
&= \nabla f(a, b, c) \cdot (x'(0), y'(0), z'(0)).
\end{aligned}
\]

Since $(x'(0), y'(0), z'(0))$ is a tangent vector of $C$ at the point $(x(0), y(0), z(0)) = (a, b, c)$, the proof is complete. ∎

REMARK 5.4.13. Suppose $w = f(x, y, z) = f(x(t), y(t), z(t))$ and denote $\gamma(t) = (x(t), y(t), z(t))$. We have
\[
\begin{aligned}
\frac{dw}{dt} &= \nabla f(x(t), y(t), z(t)) \cdot (x'(t), y'(t), z'(t)) \\
&= \nabla f(x, y, z) \cdot \gamma'(t).
\end{aligned}
\]

We can regard this formula geometrically as the rate of change of $f(x, y, z)$ along the curve $\gamma(t) = (x(t), y(t), z(t))$. This is a generalization of directional derivatives.

COROLLARY 5.4.14. The equation of the tangent plane of the level surface $f(x, y, z) = d$ at the point $(a, b, c)$ is

\[ \nabla f(a, b, c) \cdot (x - a, y - b, z - c) = 0, \]

or equivalently

\[ f_x(a, b, c)(x - a) + f_y(a, b, c)(y - b) + f_z(a, b, c)(z - c) = 0. \]

EXAMPLE 5.4.15. Find an equation of the plane tangent to the surface $x^2 + y^2 - z^2 = 1$ (a hyperboloid of one sheet) at the point $(1, 1, 1)$.

SOLUTION. Let $f(x, y, z) = x^2 + y^2 - z^2$; then the surface is

\[ f(x, y, z) = 1, \]

and

\[ \nabla f(1, 1, 1) = (2x, 2y, -2z)\big|_{(1,1,1)} = (2, 2, -2). \]

The equation of the tangent plane at $(1, 1, 1)$ is

\[ 2(x - 1) + 2(y - 1) - 2(z - 1) = 0 \iff x + y - z = 1. \]

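5.5. Higher order derivatives. For a function $f(x, y)$, the (first) partial derivative functions are
\[ f_x = \frac{\partial f}{\partial x} = \frac{\partial}{\partial x} f, \qquad f_y = \frac{\partial f}{\partial y} = \frac{\partial}{\partial y} f. \]
The second-order (and in general $n$-th order) partial derivative functions are defined similarly by
\[
\begin{aligned}
f_{xx} &:= (f_x)_x = \frac{\partial}{\partial x}\frac{\partial f}{\partial x} = \frac{\partial^2 f}{\partial x^2}, \\
f_{xy} &:= (f_x)_y = \frac{\partial}{\partial y}\frac{\partial f}{\partial x} = \frac{\partial^2 f}{\partial y\, \partial x}, \\
f_{yx} &:= (f_y)_x = \frac{\partial}{\partial x}\frac{\partial f}{\partial y} = \frac{\partial^2 f}{\partial x\, \partial y}, \\
f_{yy} &:= (f_y)_y = \frac{\partial}{\partial y}\frac{\partial f}{\partial y} = \frac{\partial^2 f}{\partial y^2}.
\end{aligned}
\]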
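
EXAMPLE 5.5.1. Let $f(x, y) = x^2 - e^{xy^2}$. Find $f_x$, $f_y$, $f_{xx}$, $f_{xy}$, $f_{yx}$, $f_{yy}$.

SOLUTION. We have
\[
\begin{aligned}
f_x &= \frac{\partial f}{\partial x} = 2x - e^{xy^2} \cdot y^2 = 2x - y^2 e^{xy^2}, \\
f_y &= \frac{\partial f}{\partial y} = -e^{xy^2} \cdot 2xy = -2xy\, e^{xy^2},
\end{aligned}
\]
and
\[
\begin{aligned}
f_{xx} &= \frac{\partial^2 f}{\partial x^2} = 2 - y^2 e^{xy^2} \cdot y^2 = 2 - y^4 e^{xy^2}, \\
f_{xy} &= \frac{\partial^2 f}{\partial y\, \partial x} = -2y\, e^{xy^2} - y^2 e^{xy^2} \cdot 2xy = -2y\, e^{xy^2} - 2xy^3 e^{xy^2}, \\
f_{yx} &= \frac{\partial^2 f}{\partial x\, \partial y} = -2y\, e^{xy^2} - 2xy\, e^{xy^2} \cdot y^2 = -2y\, e^{xy^2} - 2xy^3 e^{xy^2}, \\
f_{yy} &= \frac{\partial^2 f}{\partial y^2} = -2x\, e^{xy^2} - 2xy\, e^{xy^2} \cdot 2xy = -2x\, e^{xy^2} - 4x^2 y^2 e^{xy^2}.
\end{aligned}
\]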
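As a numerical sanity check of the mixed partials in this example, the following sketch approximates $f_{xy}$ and $f_{yx}$ by nested central differences and compares both with the closed form found by hand. It assumes the example's function is $f(x, y) = x^2 - e^{xy^2}$; the step size `h` and the helper names `d_dx`, `d_dy` are illustrative choices.

```python
import math

def f(x, y):
    return x**2 - math.exp(x * y**2)

def fxy_exact(x, y):
    # Closed form from the example: -2y e^{xy^2} - 2xy^3 e^{xy^2}
    e = math.exp(x * y**2)
    return -2 * y * e - 2 * x * y**3 * e

def d_dx(g, h=1e-4):
    # Central-difference approximation of the partial derivative in x
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, h=1e-4):
    # Central-difference approximation of the partial derivative in y
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2 * h)

fxy = d_dy(d_dx(f))   # differentiate in x first, then in y
fyx = d_dx(d_dy(f))   # differentiate in y first, then in x

print(fxy(0.5, 0.7), fyx(0.5, 0.7), fxy_exact(0.5, 0.7))
```

The two numerical values agree with each other and with the exact expression up to discretization and rounding error.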

Note that in the example above, we have

\[ f_{yx} = f_{xy}. \]

This property is called the Euler property and holds for most functions we meet. The following is a sufficient condition for the function $f(x, y)$ to have the Euler property (in our course, we always assume that the Euler property holds).

THEOREM 5.5.2. Let $f(x, y)$ be a function. Suppose that

\[ f,\ f_x,\ f_y,\ f_{yx},\ f_{xy} \]

are all continuous at the point $(a, b)$. Then

\[ f_{xy}(a, b) = f_{yx}(a, b). \]

Using the Euler property for $f(x, y)$ repeatedly, it is easy to check

\[ f_{x^2 y} := f_{xxy} = (f_x)_{xy} = (f_x)_{yx} = f_{xyx} = (f_{xy})_x = (f_{yx})_x = f_{yxx} = f_{y x^2}. \]

This means that we can change the order of partial derivatives. Therefore, if we take partial derivatives of $f(x, y)$ $n$ times, then there are

\[ H^2_n = C^{2+n-1}_n = n + 1 \]

types of partial derivatives. They are

\[ \frac{\partial^n}{\partial x^n},\ \frac{\partial^n}{\partial x^{n-1} \partial y},\ \frac{\partial^n}{\partial x^{n-2} \partial y^2},\ \cdots,\ \frac{\partial^n}{\partial x\, \partial y^{n-1}},\ \frac{\partial^n}{\partial y^n}. \]
Similarly, given $g(x, y, z)$, it is easy to check

\[ g_{xyz} = g_{xzy} = g_{zxy} = g_{zyx}. \]

If we take partial derivatives of $g$ $n$ times, then there are

\[ H^3_n = C^{n+3-1}_n = C^{n+2}_n \]

types of partial derivatives.
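The counts above can be confirmed by brute-force enumeration. In this sketch an $n$-th order partial derivative (with the order of differentiation ignored) is modelled as a multiset of $n$ variable names; the function name `count_partials` is illustrative.

```python
from itertools import combinations_with_replacement
from math import comb

def count_partials(variables, n):
    # Each unordered choice of n variables (repetition allowed)
    # corresponds to one type of n-th order partial derivative.
    return sum(1 for _ in combinations_with_replacement(variables, n))

for n in range(1, 6):
    print(n, count_partials("xy", n), count_partials("xyz", n), comb(n + 2, n))
```

For two variables the count is $n + 1$, and for three variables it matches the binomial coefficient $C^{n+2}_n$.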

5.6. Test for extreme values and applications.

DEFINITION 5.6.1. (i) The function $f(x, y)$ is said to have a local maximum (resp. local minimum) at $(a, b)$ if

\[ f(x, y) \le f(a, b) \quad (\text{resp. } f(x, y) \ge f(a, b)) \]

for any $(x, y)$ "near" $(a, b)$. A local maximum or a local minimum of $f$ is called a local extreme value of $f$.
(ii) $f(x, y)$ is said to have an absolute maximum (resp. absolute minimum) at $(a, b)$ if
\[ f(x, y) \le f(a, b) \quad (\text{resp. } f(x, y) \ge f(a, b)) \]

for any $(x, y)$ in the domain of $f$. An absolute maximum or an absolute minimum is called an absolute extreme value.

THEOREM 5.6.2 (First-Partials Test). If $f$ has a local extreme value at $(a, b)$, then

\[ \nabla f(a, b) = 0 \]

(or $\nabla f(a, b)$ does not exist, but the latter case will not be discussed in our lecture).

PROOF. Since $f(x, y)$ has a local extreme value at the point $(a, b)$, so does the function $g(x) := f(x, b)$ at the point $a$, and so does the function $h(y) := f(a, y)$ at the point $b$. Therefore, by the first-derivative test for functions of one variable, we have

\[ g'(a) = f_x(a, b) = 0 \quad \text{and} \quad h'(b) = f_y(a, b) = 0. \]

This means that $\nabla f(a, b) = 0$. ∎

DEFINITION 5.6.3. A point $(a, b)$ satisfying $\nabla f(a, b) = 0$ (or at which $\nabla f(a, b)$ does not exist) is called a critical point (candidate point) of $f$.
If the point $(a, b)$ is a critical point but does not give rise to a local extreme value, then we call $(a, b)$ a saddle point.

EXAMPLE 5.6.4. $(0, 0)$ is a saddle point of $z = x^2 - y^2$.

EXAMPLE 5.6.5. Find the local extreme values of $f(x, y) = 2x^2 + y^2 - xy - 7y$.

SOLUTION. Note that

\[ \nabla f(x, y) = (4x - y,\ 2y - x - 7). \]

Let $\nabla f(x, y) = (0, 0)$. We get the point $(1, 4)$ and $f(1, 4) = -14$. Hence $-14$ is the candidate for the local extreme value.
Let us see the local behavior of $f$ at $(1, 4)$. Consider

\[
\begin{aligned}
f(1 + h, 4 + k) &= 2h^2 + k^2 - hk - 14 \\
&= h^2 + (h - k/2)^2 + 3k^2/4 - 14 \\
&\ge -14 = f(1, 4).
\end{aligned}
\]

Thus $f$ has a local minimum $-14$ at $(1, 4)$. In fact, it is the absolute minimum by the inequality above. ∎

Recall that the second-derivative test (for single variable functions) tells us that if $a$ is a point satisfying $f'(a) = 0$, then
\[
\begin{cases}
f''(a) > 0 \implies f \text{ has a local minimum at } a, \\
f''(a) < 0 \implies f \text{ has a local maximum at } a.
\end{cases}
\]

We have a similar result for functions of two variables, but it is more complicated to state.

THEOREM 5.6.6 (Second-Partials Test). Let $(a, b)$ be a critical point of $f$, i.e.,

\[ \nabla f(a, b) = 0. \]

Set

\[ D = D(a, b) = \begin{vmatrix} f_{xx}(a, b) & f_{xy}(a, b) \\ f_{xy}(a, b) & f_{yy}(a, b) \end{vmatrix} := f_{xx}(a, b) f_{yy}(a, b) - f_{xy}(a, b)^2. \]
(i) If $D < 0$, then $(a, b)$ is a saddle point.
(ii) If $D > 0$ and $f_{xx}(a, b) > 0$, then $f$ has a local minimum at $(a, b)$.
(iii) If $D > 0$ and $f_{xx}(a, b) < 0$, then $f$ has a local maximum at $(a, b)$.
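The three cases of the theorem translate directly into a small classifier. This is a minimal sketch: the function name `second_partials_test` is illustrative, and the "inconclusive" branch for $D = 0$ is an addition here, reflecting the standard fact that the test gives no information in that case (the theorem above does not cover it).

```python
def second_partials_test(fxx, fxy, fyy):
    """Classify a critical point from the second partials evaluated there."""
    D = fxx * fyy - fxy**2
    if D < 0:
        return "saddle point"
    if D > 0:
        # D > 0 forces fxx != 0, so exactly one of the two cases applies.
        return "local minimum" if fxx > 0 else "local maximum"
    return "inconclusive"   # D == 0: not covered by the theorem

# f(x, y) = 2x^2 + y^2 - xy - 7y at (1, 4): fxx = 4, fxy = -1, fyy = 2
print(second_partials_test(4, -1, 2))
# f(x, y) = x^2 - y^2 at (0, 0): fxx = 2, fxy = 0, fyy = -2
print(second_partials_test(2, 0, -2))
```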

Before the proof, let us see the following example.

EXAMPLE 5.6.7. For simplicity, we let

\[ f(x, y) = \frac{1}{2}(ax^2 + by^2), \qquad ab \neq 0. \]
This function can be regarded as a prototype for Theorem 5.6.6. Note that

\[ f_x = ax, \qquad f_y = by \]

and
\[ f_{xx} = a, \qquad f_{xy} = f_{yx} = 0, \qquad f_{yy} = b. \]

Hence $\nabla f(x, y) = 0$ if and only if $(x, y) = (0, 0)$. Note that $f(0, 0) = 0$ and

\[ D = \begin{vmatrix} f_{xx}(0, 0) & f_{xy}(0, 0) \\ f_{xy}(0, 0) & f_{yy}(0, 0) \end{vmatrix} = \begin{vmatrix} a & 0 \\ 0 & b \end{vmatrix} = ab. \]

(i) If $D = ab < 0$, then $a$ and $b$ have opposite signs, so $f(x, y)$ takes both positive and negative values near $(0, 0)$. Hence $(0, 0)$ is a saddle point.
(ii) If $D = ab > 0$ and $a > 0$, then we have $b > 0$. Hence for any $(x, y) \neq (0, 0)$ near $(0, 0)$, we have $f(x, y) > 0$. Therefore, $f$ has a local minimum at $(0, 0)$. In fact, it is the absolute minimum.
(iii) If $D = ab > 0$ and $a < 0$, then we have $b < 0$. Hence for any $(x, y) \neq (0, 0)$ near $(0, 0)$, we have $f(x, y) < 0$. Therefore, $f$ has a local maximum at $(0, 0)$. In fact, it is the absolute maximum.

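PROOF OF THEOREM 5.6.6. (Only for reference) The key point of the proof is the example above and the second-order approximation (we assume this result): if $(x, y)$ is close to $(a, b)$, then

\[
\begin{aligned}
f(x, y) &\approx f(a, b) + \big(f_x(a, b)(x - a) + f_y(a, b)(y - b)\big) \\
&\quad + \frac{1}{2}\big(f_{xx}(a, b)(x - a)^2 + 2 f_{xy}(a, b)(x - a)(y - b) + f_{yy}(a, b)(y - b)^2\big) \\
&= f(a, b) + \frac{1}{2}\big(A(x - a)^2 + 2B(x - a)(y - b) + C(y - b)^2\big) \quad (\text{since } \nabla f(a, b) = 0),
\end{aligned}
\]
where
\[ A = f_{xx}(a, b), \qquad B = f_{xy}(a, b), \qquad C = f_{yy}(a, b). \]
Let $X = x - a$, $Y = y - b$. Then, when $A \neq 0$,
\[
\begin{aligned}
f(x, y) &\approx f(a, b) + \frac{1}{2}\big(AX^2 + 2BXY + CY^2\big) \\
&= f(a, b) + \frac{1}{2}\left(A\left(X + \frac{BY}{A}\right)^2 + \left(C - \frac{B^2}{A}\right)Y^2\right).
\end{aligned}
\]

Note that
\[ \left(X + \frac{BY}{A},\ Y\right) = (0, 0) \iff (X, Y) = (0, 0) \iff (x, y) = (a, b). \]
The behavior of $f(x, y)$ near $(a, b)$ is determined by
\[ A\left(X + \frac{BY}{A}\right)^2 + \left(C - \frac{B^2}{A}\right)Y^2, \]
where
\[ A \cdot \left(C - \frac{B^2}{A}\right) = AC - B^2 = \begin{vmatrix} A & B \\ B & C \end{vmatrix}. \]
Comparing with the criterion in Example 5.6.7 above, we finish the proof. ∎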

REMARK 5.6.8. More generally, for the quadratic polynomial

\[ f(x, y) = ax^2 + bxy + cy^2 + dx + ey + f, \qquad ac \neq 0, \]

we have the following conclusion.

(a) If $D_f < 0$, then the critical point of $f$ is a saddle point.
(b) If $D_f > 0$ and $a > 0$, then $f$ attains its absolute minimum at the critical point.
(c) If $D_f > 0$ and $a < 0$, then $f$ attains its absolute maximum at the critical point.

PROOF. Note that

\[ D_f = \begin{vmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{xy}(x, y) & f_{yy}(x, y) \end{vmatrix} = \begin{vmatrix} 2a & b \\ b & 2c \end{vmatrix} = 4ac - b^2. \]
Set
\[ \nabla f(x, y) = (2ax + by + d,\ bx + 2cy + e) = (0, 0). \]
Then there is a unique critical point if
\[ \begin{vmatrix} 2a & b \\ b & 2c \end{vmatrix} = D_f \neq 0. \]
Note that
\[
\begin{aligned}
f(x, y) &= a\left(x + \frac{by}{2a}\right)^2 + \frac{4ac - b^2}{4a}\, y^2 + dx + ey + f \\
&= a x_1^2 + b_1 y^2 + d x_1 + e_1 y + f \\
&= (a x_1^2 + d x_1) + (b_1 y^2 + e_1 y) + f \\
&= a x_2^2 + b_1 y_1^2 + f_1 \\
&=: g(x_2, y_1),
\end{aligned}
\]

where the equalities above are just changes of variables and

\[ x_1 := x + \frac{by}{2a}, \qquad b_1 := \frac{4ac - b^2}{4a}, \qquad e_1 := e - \frac{bd}{2a}, \qquad \cdots. \]
Therefore, the behavior of $f$ is the same as that of $g$. Note that
\[ D_g = \begin{vmatrix} 2a & 0 \\ 0 & 2b_1 \end{vmatrix} = 4ab_1 = 4ac - b^2 = D_f. \]
The result follows directly by a discussion similar to that in Example 5.6.7. ∎

EXAMPLE 5.6.9. We now use the second-partials test to show that $f(x, y) = 2x^2 + y^2 - xy - 7y$ has a local minimum at the point $(1, 4)$, as in Example 5.6.5. Setting

\[ \nabla f(x, y) = (f_x, f_y) = (4x - y,\ 2y - x - 7) = 0, \]

we have that $(1, 4)$ is the only critical point. Further,

\[ f_{xx} = 4, \qquad f_{xy} = f_{yx} = -1, \qquad f_{yy} = 2. \]

Since
\[ \begin{vmatrix} 4 & -1 \\ -1 & 2 \end{vmatrix} = 8 - 1 = 7 > 0 \]
and $f_{xx} = 4 > 0$, $f$ has a local minimum (in fact the absolute minimum) at $(1, 4)$.

EXAMPLE 5.6.10. Find the saddle points and local extreme values of the function
\[ f(x, y) = -\frac{1}{4}x^4 + \frac{2}{3}x^3 + 4xy - y^2. \]

SOLUTION. Setting

\[ \nabla f(x, y) = (-x^3 + 2x^2 + 4y,\ 4x - 2y) = (0, 0), \]

we have
\[ -x^3 + 2x^2 + 4y = 0, \qquad y = 2x. \]
Substituting $y = 2x$ into the first equation, we have

\[ 0 = -x^3 + 2x^2 + 8x = -x(x - 4)(x + 2). \]

The solutions of this equation are $x = 0, 4, -2$. Hence the critical points are

\[ (0, 0), \quad (4, 8), \quad (-2, -4). \]

The second partials are

\[ f_{xx} = -3x^2 + 4x, \qquad f_{xy} = 4, \qquad f_{yy} = -2. \]

Let
\[ D = \begin{vmatrix} f_{xx}(a, b) & f_{xy}(a, b) \\ f_{xy}(a, b) & f_{yy}(a, b) \end{vmatrix} = \begin{vmatrix} -3a^2 + 4a & 4 \\ 4 & -2 \end{vmatrix}. \]
• At the point $(a, b) = (0, 0)$,
\[ D(0, 0) = \begin{vmatrix} 0 & 4 \\ 4 & -2 \end{vmatrix} = -16 < 0, \]
hence $(0, 0)$ is a saddle point.
• At the point $(a, b) = (4, 8)$,
\[ D(4, 8) = \begin{vmatrix} -32 & 4 \\ 4 & -2 \end{vmatrix} = 48 > 0. \]
Since $f_{xx}(4, 8) = -32 < 0$, $f$ has a local maximum $f(4, 8) = \frac{128}{3}$ at the point $(4, 8)$.
• At the point $(a, b) = (-2, -4)$,
\[ D(-2, -4) = \begin{vmatrix} -20 & 4 \\ 4 & -2 \end{vmatrix} = 24 > 0. \]
Since $f_{xx}(-2, -4) = -20 < 0$, $f$ has a local maximum $f(-2, -4) = \frac{20}{3}$ at the point $(-2, -4)$.



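EXAMPLE 5.6.11. Let $f(x, y) = x^3 + y^2 + 3xy$. Discuss the local behavior of $f$ at its critical points.

SOLUTION. Let

\[ \nabla f(x, y) = (3x^2 + 3y,\ 2y + 3x) = 0. \]

Then
\[ (x, y) = (0, 0) \quad \text{or} \quad \left(\frac{3}{2}, -\frac{9}{4}\right). \]
Note that
\[ D = \begin{vmatrix} 6x & 3 \\ 3 & 2 \end{vmatrix} = 12x - 9. \]
Since
\[ D(0, 0) = -9 < 0, \]
$(0, 0)$ is a saddle point.
On the other hand,
\[ D\left(\frac{3}{2}, -\frac{9}{4}\right) = 9 > 0 \]
and $f_{xx}\left(\frac{3}{2}, -\frac{9}{4}\right) = 6 \cdot \frac{3}{2} = 9 > 0$. Hence
\[ f\left(\frac{3}{2}, -\frac{9}{4}\right) = -\frac{27}{16} \]
is a local minimum. ∎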

5.6.1. Application I: The least squares method. Suppose we have data

\[ (a_1, b_1),\ (a_2, b_2),\ \cdots,\ (a_n, b_n) \]

that do not all lie on one vertical line. We want to find a line

\[ y = mx + k \]

that minimizes the sum

\[ E(m, k) = \sum_{i=1}^{n} \big((m a_i + k) - b_i\big)^2, \]

i.e., we need to determine which real numbers $m$, $k$ minimize the value $E(m, k)$.

REMARK 5.6.12. Recall that the solution of

\[
\begin{cases}
ax + by = e, \\
cx + dy = f,
\end{cases}
\]

is
\[ x = \frac{\Delta_x}{\Delta} = \frac{ed - bf}{ad - bc}, \qquad y = \frac{\Delta_y}{\Delta} = \frac{af - ce}{ad - bc} \qquad (\text{if } \Delta := ad - bc \neq 0). \]
If $\Delta = 0$, then $(c, d)$ is parallel to $(a, b)$ as vectors.

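PROPERTY 5.6.13. $E(m, k)$ has a minimum at the point $(m^*, k^*)$, where

\[ m^* = \frac{\overline{ab} - \bar{a}\bar{b}}{\overline{a^2} - \bar{a}^2}, \qquad k^* = \bar{b} - \bar{a} m^*, \]
and we denote
\[ \overline{a^2} = \frac{\sum_{i=1}^n a_i^2}{n}, \qquad \bar{a} = \frac{\sum_{i=1}^n a_i}{n}, \qquad \bar{b} = \frac{\sum_{i=1}^n b_i}{n}, \qquad \overline{ab} = \frac{\sum_{i=1}^n a_i b_i}{n}. \]
Note that since $E(m, k)$ is a quadratic polynomial, the minimum is the absolute minimum. The line
\[ y = m^* x + k^* \]
is called the least squares regression line for $\{(a_i, b_i)\}_{i=1}^n$.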

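PROOF. Suppose
\[
\begin{aligned}
\frac{\partial E}{\partial m} &= 2\sum_{i=1}^n \big((m a_i + k) - b_i\big) \cdot a_i = 0, \\
\frac{\partial E}{\partial k} &= 2\sum_{i=1}^n \big((m a_i + k) - b_i\big) = 0.
\end{aligned}
\]

We have
\[
\begin{aligned}
m\sum_{i=1}^n a_i^2 + k\sum_{i=1}^n a_i &= \sum_{i=1}^n a_i b_i, \\
m\sum_{i=1}^n a_i + kn &= \sum_{i=1}^n b_i,
\end{aligned}
\]

i.e., dividing by $n$, we have

\[
\begin{aligned}
\overline{a^2}\, m + \bar{a}\, k &= \overline{ab}, \\
\bar{a}\, m + k &= \bar{b}.
\end{aligned}
\]

This is equivalent to
\[ \begin{pmatrix} \overline{a^2} & \bar{a} \\ \bar{a} & 1 \end{pmatrix} \begin{pmatrix} m \\ k \end{pmatrix} = \begin{pmatrix} \overline{ab} \\ \bar{b} \end{pmatrix}. \]
Hence, by Remark 5.6.12,
\[
\begin{aligned}
(m^*, k^*) := (m, k) &= \left(\frac{\overline{ab} - \bar{a}\bar{b}}{\overline{a^2} - \bar{a}^2},\ \frac{-\bar{a} \cdot \overline{ab} + \overline{a^2} \cdot \bar{b}}{\overline{a^2} - \bar{a}^2}\right) \\
&= \left(\frac{\overline{ab} - \bar{a}\bar{b}}{\overline{a^2} - \bar{a}^2},\ \frac{\bar{b}(\overline{a^2} - \bar{a}^2) + \bar{a}(\bar{a}\bar{b} - \overline{ab})}{\overline{a^2} - \bar{a}^2}\right) \\
&= \left(\frac{\overline{ab} - \bar{a}\bar{b}}{\overline{a^2} - \bar{a}^2},\ \bar{b} - \bar{a} m^*\right)
\end{aligned}
\]
is a critical point. Note that if $\overline{a^2} = \bar{a}^2$, then

\[ \frac{1}{n}\left(a_1^2 + \cdots + a_n^2\right) = \left(\frac{a_1 + \cdots + a_n}{n}\right)^2, \]
i.e.,

\[ (1^2 + \cdots + 1^2)(a_1^2 + \cdots + a_n^2) = n(a_1^2 + \cdots + a_n^2) = (a_1 + \cdots + a_n)^2. \]

This implies $a_1 = a_2 = \cdots = a_n$ by the equality case of the Cauchy inequality. This is the case in which all the points lie on a vertical line, and it is excluded from our consideration. Hence we have

\[ \overline{a^2} - \bar{a}^2 \neq 0. \]

Since $(m^*, k^*)$ is the only critical point, our geometric intuition says that $E(m^*, k^*)$ should be the absolute minimum.
To explain this more rigorously, we now use the second-partials test. Note that
\[
\begin{aligned}
D &= \begin{vmatrix} 2\sum_{i=1}^n a_i^2 & 2\sum_{i=1}^n a_i \\ 2\sum_{i=1}^n a_i & 2n \end{vmatrix} \\
&= 4\left(n\sum_{i=1}^n a_i^2 - \left(\sum_{i=1}^n a_i\right)^2\right) \\
&= 4\left((1^2 + \cdots + 1^2) \cdot \sum_{i=1}^n a_i^2 - \left(\sum_{i=1}^n a_i\right)^2\right) \ge 0
\end{aligned}
\]

by Cauchy's inequality, where "=" holds if and only if $(a_1, a_2, \cdots, a_n)$ is parallel to $(1, 1, \cdots, 1)$, i.e., $a_1 = a_2 = \cdots = a_n$. This case is excluded from our consideration. Therefore, we conclude $D > 0$.
Since
\[ 2\sum_{i=1}^n a_i^2 > 0 \quad (\text{otherwise } a_1 = \cdots = a_n = 0), \]
by the second-partials test, $E(m^*, k^*)$ is a local minimum. This local minimum is the absolute minimum by Remark 5.6.8. ∎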

We have a general result that guarantees the existence of the absolute extreme values of a function. We will use it later to determine absolute extreme values with constraints.

THEOREM 5.6.14. If $D$ is a bounded set containing its "boundary" and $f(x, y)$ is a continuous function on $D$, then $f$ has an absolute maximum and an absolute minimum on $D$.

REMARK 5.6.15 (only for reference). We come back to Property 5.6.13. Observe that
\[ E(m, k) = \sum_{i=1}^n \big((m a_i + k) - b_i\big)^2 \]
is small only if each

\[ |m a_i + k - b_i| \]
is small, i.e., there exists $c_i > 0$ such that

\[ |m a_i + k - b_i| \le c_i \iff -c_i + b_i \le a_i m + k \le c_i + b_i. \]

Then
\[ S := \bigcap_{i=1}^n \{(m, k) : -c_i + b_i \le a_i m + k \le c_i + b_i\} \]
is a bounded set (containing its boundary). Therefore, the absolute minimum of $E(m, k)$ exists and is equal to the absolute minimum of $E(m, k)$ on $S$. Since $E(m, k)$ has only one critical point by the argument above and the minimum of $E(m, k)$ exists, $E(m, k)$ must attain its minimum at the critical point.

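EXAMPLE 5.6.16. Use the least squares method to determine the line that best approximates the points
\[ (1, 1),\ (1, 2),\ (2, 2),\ (3, 2),\ (4, 3). \]

SOLUTION. We have
\[ \bar{a} = \frac{1 + 1 + 2 + 3 + 4}{5} = \frac{11}{5}, \qquad \overline{a^2} = \frac{1^2 + 1^2 + 2^2 + 3^2 + 4^2}{5} = \frac{31}{5} \]
and
\[ \bar{b} = \frac{1 + 2 + 2 + 2 + 3}{5} = 2, \qquad \overline{ab} = \frac{1 \cdot 1 + 1 \cdot 2 + 2 \cdot 2 + 3 \cdot 2 + 4 \cdot 3}{5} = 5. \]
Hence
\[ m^* = \frac{5 - (11/5) \cdot 2}{31/5 - (11/5)^2} = \frac{15}{34}, \qquad k^* = 2 - \frac{11}{5} \cdot \frac{15}{34} = \frac{35}{34}. \]
The line of best approximation is
\[ y = \frac{15}{34}x + \frac{35}{34}. \]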
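The formulas of Property 5.6.13 are easy to evaluate exactly with rational arithmetic. The following is a small sketch; the function name `least_squares_line` is illustrative, and it assumes the data do not all share the same first coordinate (otherwise the denominator vanishes).

```python
from fractions import Fraction

def least_squares_line(points):
    """Return (m*, k*) for the least squares line y = m* x + k*."""
    n = len(points)
    a_bar = Fraction(sum(a for a, _ in points), n)
    b_bar = Fraction(sum(b for _, b in points), n)
    a2_bar = Fraction(sum(a * a for a, _ in points), n)
    ab_bar = Fraction(sum(a * b for a, b in points), n)
    # m* = (mean(ab) - mean(a) mean(b)) / (mean(a^2) - mean(a)^2)
    m = (ab_bar - a_bar * b_bar) / (a2_bar - a_bar**2)
    k = b_bar - a_bar * m
    return m, k

points = [(1, 1), (1, 2), (2, 2), (3, 2), (4, 3)]
m_star, k_star = least_squares_line(points)
print(m_star, k_star)
```

Using `Fraction` instead of floating-point numbers reproduces the hand computation without rounding.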

5.7. Lagrange multiplier for extreme values problems with side conditions.

EXAMPLE 5.7.1. Maximize and minimize $f(x, y) = xy$ subject to the condition $x^2 + y^2 = 1$.

SOLUTION. The minimum and maximum here are easily obtained from the arithmetic-geometric mean inequality. We will not use it here. Instead, we view this example geometrically to get some intuition for a later property.
Consider the curve
\[ C : \{(x, y, c) : f(x, y) = c\} \]
and the cylinder
\[ S : x^2 + y^2 = 1 \]
in the $xyz$-space. Observe that if $c > 0$ is large, then $C$ does not intersect the cylinder $S$. Now let $c > 0$ decrease toward 0. When $C$ and $S$ first intersect, $C$ should be "tangent" to $S$ at the intersection points, and this value of $c$ is the maximum value of $f$ on the circle. The situation is similar when $c < 0$ and $|c|$ is large.
Suppose $(a, b, c)$ is a point in the first intersection. Since $C$ and $S$ are tangent at this point, the level curves $f(x, y) = c$ and $x^2 + y^2 = 1$ are tangent at $(a, b)$. By the geometric properties of the gradient, $\nabla f(a, b) = (b, a) \neq (0, 0)$ (since $0 \cdot 0 = 0$ but $ab = c > 0$) is perpendicular to $f(x, y) = c$ at the point $(a, b)$. On the other hand, $\nabla(x^2 + y^2 - 1)(a, b) = (2a, 2b) \neq (0, 0)$ (since $0^2 + 0^2 \neq 1$) is perpendicular to the circle $x^2 + y^2 = 1$ at the point $(a, b)$. Therefore, they are parallel, i.e., for some $\lambda \in \mathbb{R}$, we have

\[ \nabla f(a, b) = \lambda \nabla(x^2 + y^2 - 1)(a, b) \implies (b, a) = \lambda(2a, 2b). \]

Thus, $b = 2\lambda a$ and $a = 2\lambda b$. Since $(a, b)$ is on $x^2 + y^2 = 1$, we have

\[ 1 = a^2 + b^2 = 4\lambda^2(a^2 + b^2) = 4\lambda^2, \]

i.e., $\lambda = \pm 1/2$ and $a = \pm b$. Using $x^2 + y^2 = 1$ again, we have that

\[
(a, b) =
\begin{cases}
\pm\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right) & \text{if } \lambda = \frac{1}{2}, \implies xy = \frac{1}{2} \ (\text{max}), \\
\pm\left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right) & \text{if } \lambda = -\frac{1}{2}, \implies xy = -\frac{1}{2} \ (\text{min}).
\end{cases}
\]
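The result of this example can be cross-checked without Lagrange multipliers by sampling the circle directly. This sketch parametrizes $x^2 + y^2 = 1$ by the angle (an approach different from the one in the text), and the sample count `N` is an arbitrary choice.

```python
import math

def f(x, y):
    return x * y

N = 3600  # number of sample points on the unit circle
values = [
    f(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))
    for k in range(N)
]
max_value = max(values)
min_value = min(values)
print(max_value, min_value)
```

The sampled extremes are close to $\frac{1}{2}$ and $-\frac{1}{2}$, in agreement with the values found above.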

The following property was observed in the last example and holds in general.

PROPERTY 5.7.2. Suppose $f(x, y)$ is defined on $D \subset \mathbb{R}^2$ and $C$ is a curve lying entirely in $D$. If $f(x, y)$ has a maximum or a minimum at $(a, b)$ on $C$, then $\nabla f(a, b)$ is perpendicular to $C$ at $(a, b)$.

PROOF. Suppose $C$ is parametrized by $\gamma(t) = (x(t), y(t))$ and $\gamma(0) = (a, b)$. By assumption, $g(t) := f(\gamma(t))$ has a maximum or minimum at $0$. Hence
\[
\begin{aligned}
0 = g'(0) &= \frac{d}{dt}\big(f(\gamma(t))\big)\Big|_{t=0} \\
&= \nabla f(\gamma(t)) \cdot \gamma'(t)\big|_{t=0} \\
&= \nabla f(a, b) \cdot \gamma'(0).
\end{aligned}
\]

This means that

\[ \nabla f(a, b) \perp \gamma'(0). \]

We are now ready to state a method for problems with side conditions.


THEOREM 5.7.3. (Lagrange multiplier) Suppose $f(x, y)$ has a minimum or a maximum at the point $(a, b)$ when subject to the side condition

\[ g(x, y) = 0. \]

Then $\nabla f(a, b)$ and $\nabla g(a, b)$ are parallel. Thus if $\nabla g(a, b) \neq 0$, then there exists $\lambda \in \mathbb{R}$ such that
\[ \nabla f(a, b) = \lambda \nabla g(a, b). \]

PROOF. Since $f$ has a maximum or a minimum at $(a, b)$ along

\[ C : g(x, y) = 0, \]

by Property 5.7.2 we know that $\nabla f(a, b)$ is perpendicular to $C$ at $(a, b)$. On the other hand, $\nabla g(a, b)$ is a normal vector of $C$ at $(a, b)$. Therefore, $\nabla f(a, b)$ and $\nabla g(a, b)$ are parallel. Since $\nabla g(a, b) \neq 0$, we have

\[ \nabla f(a, b) = \lambda \nabla g(a, b) \]

for some $\lambda \in \mathbb{R}$. ∎

DEFINITION 5.7.4. The number $\lambda$ in Theorem 5.7.3 is called a Lagrange multiplier.

EXAMPLE 5.7.5. Find the absolute extreme values (they exist by Theorem 5.6.14) of the distance from the origin to the curve (ellipse)

\[ C : x^2 + xy + y^2 = 1. \]

Note that since $C$ is an ellipse, the absolute extreme values exist automatically.

SOLUTION. Let $f(x, y) = x^2 + y^2$, which represents the square of the distance from $(x, y)$ to $(0, 0)$. If we can find points $(a, b)$ such that $f(a, b)$ is an absolute extreme value subject to the condition $C$, then we get the answer by taking the square root of $f(a, b)$.
Let $g(x, y) = x^2 + xy + y^2 - 1$. Then

\[ C : g(x, y) = 0 \]

is the side condition. Hence, by the method of Lagrange, we need to solve
\[
\begin{cases}
\nabla f = \lambda \nabla g \implies (2x, 2y) = \lambda(2x + y,\ x + 2y), \\
g(x, y) = 0 \implies x^2 + xy + y^2 - 1 = 0.
\end{cases}
\]

Then we have
\[
\begin{cases}
(2 - 2\lambda)x - \lambda y = 0, \\
-\lambda x + (2 - 2\lambda)y = 0, \\
x^2 + xy + y^2 - 1 = 0.
\end{cases}
\]

Consider the first two equations. Since the point $(x, y) = (0, 0)$ is not on the ellipse $C : x^2 + xy + y^2 - 1 = 0$, we must have
\[ 0 = \begin{vmatrix} 2 - 2\lambda & -\lambda \\ -\lambda & 2 - 2\lambda \end{vmatrix} = (2 - 2\lambda)^2 - \lambda^2 = (2 - 3\lambda)(2 - \lambda). \]

Hence
\[ \lambda = \frac{2}{3} \ \text{ or } \ 2. \]
If $\lambda = \frac{2}{3}$, then $x = y$ and $x^2 + x^2 + x^2 - 1 = 0$. Thus
\[ (x, y) = \pm\left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right). \]
In this case, $f\left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right) = f\left(-\frac{1}{\sqrt{3}}, -\frac{1}{\sqrt{3}}\right) = \frac{2}{3}$ and the distance is $\sqrt{\frac{2}{3}}$.
If $\lambda = 2$, then $x = -y$ and $x^2 - x^2 + x^2 - 1 = 0$. Thus

\[ (x, y) = \pm(1, -1). \]

In this case, $f(1, -1) = f(-1, 1) = 2$ and the distance is $\sqrt{2}$.
We conclude that the longest distance from $C$ to $(0, 0)$ is $\sqrt{2}$, and the shortest distance from $C$ to $(0, 0)$ is $\sqrt{\frac{2}{3}}$.
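As an independent check of this example, one can put $x = r\cos\theta$, $y = r\sin\theta$ into $x^2 + xy + y^2 = 1$, which gives $r^2(1 + \sin\theta\cos\theta) = 1$, and scan over $\theta$. This polar parametrization is not the method used in the text; it is only a sketch for verifying the two distances, and `N` is an arbitrary sample count.

```python
import math

def radius(theta):
    # Distance from the origin to the ellipse x^2 + xy + y^2 = 1
    # in direction theta: r^2 (1 + sin(theta) cos(theta)) = 1
    return 1.0 / math.sqrt(1.0 + math.sin(theta) * math.cos(theta))

N = 3600
radii = [radius(2 * math.pi * k / N) for k in range(N)]
longest = max(radii)
shortest = min(radii)
print(longest, shortest)
```

The scan returns values close to $\sqrt{2}$ and $\sqrt{2/3}$, matching the Lagrange multiplier computation.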

EXAMPLE 5.7.6. Determine the shortest distance from the origin $(0, 0)$ to the parabola

\[ y = x^2 + \alpha \]

according to the values of $\alpha$. (Since it is a parabola, the shortest distance exists.)

SOLUTION. The problem is equivalent to determining the absolute minimum of

\[ f(x, y) = x^2 + y^2 \quad (\text{then take the square root}) \]

under the side condition

\[ g(x, y) = y - x^2 - \alpha = 0. \]
We discuss the result according to the values of $\alpha$.
We need to solve
\[
\begin{cases}
\nabla f = \lambda \nabla g \implies (2x, 2y) = \lambda(-2x, 1), \\
y - x^2 - \alpha = 0;
\end{cases}
\]

this is equivalent to
\[
\begin{cases}
(2 + 2\lambda)x = 0, \\
2y = \lambda, \\
y - x^2 - \alpha = 0.
\end{cases}
\]

Thus
\[
\begin{cases}
x = 0 \implies y = \alpha,\ \lambda = 2\alpha, \\
\text{or} \\
\lambda = -1 \implies y = -\frac{1}{2},\ x^2 = -\frac{1}{2} - \alpha \ge 0 \implies \alpha \le -\frac{1}{2}.
\end{cases}
\]
In the first case we have the critical point $(x, y) = (0, \alpha)$ and

\[ f(0, \alpha) = \alpha^2 \quad (\text{no condition on } \alpha). \]

Hence the distance is equal to $|\alpha|$.

In the second case, we have the critical points
\[ (x, y) = \left(\pm\sqrt{-\frac{1}{2} - \alpha},\ -\frac{1}{2}\right). \]

Note that these critical points exist only when $\alpha \le -\frac{1}{2}$. In this case we have
\[ f(x, y) = -\frac{1}{2} - \alpha + \frac{1}{4} = -\alpha - \frac{1}{4} \le \alpha^2 = |\alpha|^2 \quad (\text{since } (\alpha + 1/2)^2 \ge 0). \]

Hence when $\alpha \le -1/2$, the shortest distance is

\[ \sqrt{-\alpha - \frac{1}{4}}. \]
On the other hand, when $\alpha > -1/2$, the second kind of critical point does not exist. Hence the shortest distance occurs at the first critical point and the value is

\[ |\alpha|. \]

Combining all the information, we conclude that the shortest distance is equal to
\[
\begin{cases}
|\alpha| & \text{if } \alpha > -\frac{1}{2}, \\
\sqrt{-\alpha - \frac{1}{4}} & \text{if } \alpha \le -\frac{1}{2}.
\end{cases}
\]
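The piecewise answer can be compared with a brute-force search along the parabola. This is only a sketch: the search window `half_width` and the grid size `steps` are arbitrary choices, assumed large and fine enough for the values of $\alpha$ tried here.

```python
import math

def shortest_distance_formula(alpha):
    # Piecewise result of the example
    if alpha > -0.5:
        return abs(alpha)
    return math.sqrt(-alpha - 0.25)

def shortest_distance_numeric(alpha, half_width=5.0, steps=40001):
    # Sample points (x, x^2 + alpha) on the parabola and take the
    # smallest distance to the origin.
    best = float("inf")
    for i in range(steps):
        x = -half_width + 2 * half_width * i / (steps - 1)
        best = min(best, math.hypot(x, x * x + alpha))
    return best

for alpha in (1.0, 0.0, -0.5, -2.0):
    print(alpha, shortest_distance_formula(alpha), shortest_distance_numeric(alpha))
```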

REMARK 5.7.7. Note that in the example above, whenever $\alpha \le -\frac{1}{2}$, we have
\[ f(0, \alpha) = \alpha^2 \ge -\alpha - \frac{1}{4}. \]
However, this does not mean that $f(0, \alpha)$ is the absolute maximum. The reason is that the parabola is not bounded, and it is easy to see that there are points on the parabola whose distances to the origin are arbitrarily large. Therefore, when $\alpha < -\frac{1}{2}$, $f(0, \alpha)$ is a local maximum but is not the absolute maximum.

• Strategy for determining the absolute extreme values of $f(x, y)$ with a more general side condition $D$, where $D$ is a bounded domain containing its boundary (it is no longer a curve). Note that the absolute extreme values exist by Theorem 5.6.14.

(i) Find the critical points of $f$ in $D$. Determine the values of $f$ at these points.
(ii) Determine the extreme values of $f$ on the set of boundary points. Since the set of boundary points forms a curve $C$, this is an extreme value problem with the side condition $C$, as discussed before. (Recall that in Example 5.7.5, the ellipse $C$ is itself a set of boundary points.) Also note that the value of $f$ at a boundary point of $D$ may be larger (or smaller) than any value of $f$ at the points inside $D$, even if it is not a local extreme value of $f$.
(iii) Compare all the values in (i) and (ii). By Theorem 5.6.14, the largest one is the absolute maximum and the smallest one is the absolute minimum.

EXAMPLE 5.7.8. Find the absolute extreme values of

\[ f(x, y) = x^2 + 2y^2 - 2x + 3 \]

on
\[ D = \{(x, y) : x^2 + y^2 \le 10\}. \]

SOLUTION. We break the side condition into two cases as the strategy suggests.
(a) The points inside the circle, i.e., $x^2 + y^2 < 10$.
(b) The points on the circle $C : x^2 + y^2 = 10$.
In (b), we use Lagrange's multiplier. The side condition is

\[ g(x, y) := x^2 + y^2 - 10 = 0. \]

Set
\[ (2x - 2, 4y) = \nabla f(x, y) = \lambda \nabla g(x, y) = \lambda(2x, 2y). \]
Hence
\[
\begin{cases}
(1 - \lambda)x = 1, \\
(2 - \lambda)y = 0.
\end{cases}
\]
• If $y = 0$, then $x = \pm\sqrt{10}$.
• If $y \neq 0$, then $\lambda = 2$ and $x = -1$. This implies $y = \pm 3$.
Hence we get four candidate points $(\pm\sqrt{10}, 0)$ and $(-1, \pm 3)$, and the values of $f$ at them are
\[ f(-1, \pm 3) = 24, \qquad f(-\sqrt{10}, 0) = 13 + 2\sqrt{10}, \qquad f(\sqrt{10}, 0) = 13 - 2\sqrt{10}. \]

In (a), we find the critical points in $D$. Set $\nabla f(x, y) = 0$. We have $(x, y) = (1, 0) \in D$ and
\[ f(1, 0) = 2. \]
(By the second-partials test, we know $f$ has a local minimum at the point $(1, 0)$, but we do not need this result here.)
Combining all the information in (a) and (b), we know that $f$ has the absolute maximum $f(-1, \pm 3) = 24$ and the absolute minimum $f(1, 0) = 2$. ∎
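The strategy of this example (interior critical points plus boundary values) can be mirrored numerically. In the sketch below the boundary circle is sampled by angle rather than handled with a multiplier, and the interior critical point $(1, 0)$ is taken from the hand computation; `N` is an arbitrary sample count.

```python
import math

def f(x, y):
    return x**2 + 2 * y**2 - 2 * x + 3

r = math.sqrt(10)
N = 100000

# (b) values of f on the boundary circle x^2 + y^2 = 10
boundary_values = [
    f(r * math.cos(2 * math.pi * k / N), r * math.sin(2 * math.pi * k / N))
    for k in range(N)
]
# (a) value of f at the interior critical point (1, 0)
interior_values = [f(1, 0)]

candidates = boundary_values + interior_values
absolute_max = max(candidates)
absolute_min = min(candidates)
print(absolute_max, absolute_min)
```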

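EXAMPLE 5.7.9. Find the absolute extreme values of

\[ f(x, y) = x^2 - \frac{5}{2}xy + y^2 \]
on
\[ D = \{(x, y) : x^2 + y^2 \le 1\}. \]

SOLUTION. We break the side condition into two cases: (a) the points on the circle $C : x^2 + y^2 = 1$, and (b) the points inside the circle, i.e., on $x^2 + y^2 < 1$.
In the first case (a), we use Lagrange's multiplier. The side condition is

\[ C : g(x, y) = x^2 + y^2 - 1 = 0. \]

Set
\[ \left(2x - \frac{5}{2}y,\ 2y - \frac{5}{2}x\right) = \nabla f(x, y) = \lambda \nabla g(x, y) = \lambda(2x, 2y). \]
Hence
\[
\begin{cases}
(2 - 2\lambda)x - \frac{5}{2}y = 0, \\
-\frac{5}{2}x + (2 - 2\lambda)y = 0.
\end{cases}
\tag{7}
\]

Since $(0, 0)$ is not on $C$, we may suppose $(x, y) \neq (0, 0)$ and then (by Remark 5.6.12)
\[ 0 = \begin{vmatrix} 2 - 2\lambda & -\frac{5}{2} \\ -\frac{5}{2} & 2 - 2\lambda \end{vmatrix} = (2 - 2\lambda)^2 - \left(\frac{5}{2}\right)^2 = \left(2\lambda + \frac{1}{2}\right)\left(2\lambda - \frac{9}{2}\right). \]
Hence
\[ \lambda = -\frac{1}{4} \ \text{ or } \ \frac{9}{4}. \]
If $\lambda = -\frac{1}{4}$, then $x = y$. By using $x^2 + y^2 = 1$, we have
\[ (x, y) = \pm\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right), \qquad f(x, y) = -\frac{1}{4}. \]
If $\lambda = \frac{9}{4}$, then $x = -y$. Using $x^2 + y^2 = 1$ again, we have
\[ (x, y) = \pm\left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right), \qquad f(x, y) = \frac{9}{4}. \]
(b) We now find the critical points inside the circle. Set $\nabla f(x, y) = 0$. We have $(x, y) = (0, 0) \in D$ and
\[ f(0, 0) = 0. \]
(Since
\[ D = \begin{vmatrix} 2 & -\frac{5}{2} \\ -\frac{5}{2} & 2 \end{vmatrix} = -\frac{9}{4} < 0, \]

by the second-partials test, $(0, 0)$ is a saddle point. Hence $f(0, 0) = 0$ is not an absolute extreme value.)
Combining all the information, we know that $f$ has the absolute maximum $\frac{9}{4}$ and the absolute minimum $-\frac{1}{4}$.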

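REMARK 5.7.10. Sometimes we meet calculations like (7) with nonzero constant terms. For example,
\[
\begin{cases}
(2 - 2\lambda)x - y = 1, \\
-x + (2 - 2\lambda)y = 1, \\
x^2 + y^2 = 1.
\end{cases}
\tag{8}
\]

We use Remark 5.6.12 to discuss the solution.

If $\Delta = (2 - 2\lambda)^2 - 1$ for the first two equations of (8) is 0, then $\lambda = 1/2$ or $3/2$. If $\lambda = 1/2$, then there is no solution $(x, y)$. If $\lambda = 3/2$, then
\[
\begin{cases}
-x - y = 1, \\
-x - y = 1, \\
x^2 + y^2 = 1
\end{cases}
\implies xy = 0 \implies (x, y) = (0, -1),\ (-1, 0).
\]

On the other hand, if $\Delta \neq 0$, then the first two equations of (8) imply

\[ x = \frac{\Delta_x}{\Delta} = \frac{(2 - 2\lambda) + 1}{(2 - 2\lambda)^2 - 1} = \frac{1}{1 - 2\lambda}, \qquad y = \frac{\Delta_y}{\Delta} = \frac{(2 - 2\lambda) + 1}{(2 - 2\lambda)^2 - 1} = \frac{1}{1 - 2\lambda}. \]

Since $x^2 + y^2 = 1$, we have $2/(1 - 2\lambda)^2 = 1$, i.e.,

\[ 1 - 2\lambda = \pm\sqrt{2}. \]

Then we have
\[ x = y = \pm\frac{1}{\sqrt{2}}. \]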
EXAMPLE 5.7.11. Find the absolute extreme values of the function

\[ f(x, y) = 4xy - x^2 - y^2 - 6x \]

on the triangular region

\[ T = \{(x, y) : 0 \le x \le 2,\ 0 \le y \le 3x\}. \]

SOLUTION. We break the side condition into two cases: (a) the points inside the triangle, and (b) the points on the boundary of the triangle, i.e., on these three segments

\[
\begin{aligned}
C_1 &= \{(x, 0) : 0 \le x \le 2\}, \\
C_2 &= \{(2, y) : 0 \le y \le 6\}, \\
C_3 &= \{(x, 3x) : 0 \le x \le 2\}.
\end{aligned}
\]

(a) Set
\[ \nabla f(x, y) = (4y - 2x - 6,\ 4x - 2y) = (0, 0). \]
We have $(x, y) = (1, 2)$, which is inside the triangle, and

\[ f(1, 2) = -3. \]

(b) Restricting $f(x, y)$ to $C_1$, $C_2$, $C_3$ respectively, we have

\[
\begin{aligned}
f_1(x) &:= f(x, 0) = -x^2 - 6x, & 0 \le x \le 2, \\
f_2(y) &:= f(2, y) = -y^2 + 8y - 16, & 0 \le y \le 6, \\
f_3(x) &:= f(x, 3x) = 2x^2 - 6x, & 0 \le x \le 2.
\end{aligned}
\]

Note that $f_1(x) = -(x + 3)^2 + 9$ has extreme values

\[ f_1(0) = 0 \ (\text{max}), \qquad f_1(2) = -16 \ (\text{min}). \]

$f_2(y) = -(y - 4)^2$ has extreme values

\[ f_2(4) = 0 \ (\text{max}), \qquad f_2(0) = -16 \ (\text{min}), \]

and $f_2(6) = -4$ at the other endpoint.

$f_3(x) = 2(x - 3/2)^2 - 9/2$ has extreme values

\[ f_3(3/2) = -9/2 \ (\text{min}), \qquad f_3(0) = 0 \ (\text{max}). \]

Therefore, comparing (a) and (b), we know that the absolute maximum is $0$, and the absolute minimum is $-16$.
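A brute-force grid search over the triangle gives a quick check of this example. The grid size `N` is an arbitrary choice in this sketch; it is picked so that the corner points and the point $(2, 4)$ fall on the grid, and in general a grid search only approximates the extreme values.

```python
def f(x, y):
    return 4 * x * y - x**2 - y**2 - 6 * x

N = 300
values = []
for i in range(N + 1):
    x = 2.0 * i / N                 # 0 <= x <= 2
    for j in range(N + 1):
        y = 3.0 * x * j / N         # 0 <= y <= 3x
        values.append(f(x, y))

grid_max = max(values)
grid_min = min(values)
print(grid_max, grid_min)
```

The search returns a maximum of about $0$ and a minimum of $-16$, consistent with the comparison of cases (a) and (b).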

EXAMPLE 5.7.12. Find the absolute extreme values of

\[ f(x, y) = x^2 + 4x + y^3 - 3y \]

on the rectangle $\{(x, y) : -3 \le x \le 0,\ -2 \le y \le 2\}$.

SOLUTION. We break the side condition into two cases: (a) the points inside the rectangle, i.e., the points in the set $\{(x, y) : -3 < x < 0,\ -2 < y < 2\}$, and (b) the points on the boundary of the rectangle, i.e., on these four segments:

\[
\begin{aligned}
L_1 &= \{(x, -2) : -3 \le x \le 0\}, \\
L_2 &= \{(x, 2) : -3 \le x \le 0\}, \\
L_3 &= \{(-3, y) : -2 \le y \le 2\}, \\
L_4 &= \{(0, y) : -2 \le y \le 2\}.
\end{aligned}
\]

(a) Set
\[ \nabla f(x, y) = (2x + 4,\ 3y^2 - 3) = (0, 0). \]
We have $(x, y) = (-2, \pm 1)$, which are inside the rectangle, and

\[ f(-2, 1) = -6, \qquad f(-2, -1) = -2. \]

(b) Restricting $f(x, y)$ to $L_1$, $L_2$, $L_3$, $L_4$ respectively, we have

\[
\begin{aligned}
f_1(x) &:= f(x, -2) = x^2 + 4x - 2, & -3 \le x \le 0, \\
f_2(x) &:= f(x, 2) = x^2 + 4x + 2, & -3 \le x \le 0, \\
g_1(y) &:= f(-3, y) = y^3 - 3y - 3, & -2 \le y \le 2, \\
g_2(y) &:= f(0, y) = y^3 - 3y, & -2 \le y \le 2.
\end{aligned}
\]

Note that $f_1(x) = (x + 2)^2 - 6$ has extreme values

\[ f_1(-2) = -6 \ (\text{min}), \qquad f_1(0) = -2 \ (\text{max}) \]

on $L_1$. $f_2(x) = (x + 2)^2 - 2$ has extreme values

\[ f_2(-2) = -2 \ (\text{min}), \qquad f_2(0) = 2 \ (\text{max}) \]

on $L_2$. Set
\[ g_1'(y) = g_2'(y) = 3y^2 - 3 = 0 \implies y = \pm 1. \]
The candidates for the extreme values of $g_1(y)$ and $g_2(y)$ are

\[ g_1(-2) = -5, \qquad g_1(-1) = -1, \qquad g_1(1) = -5, \qquad g_1(2) = -1 \]

and
\[ g_2(-2) = -2, \qquad g_2(-1) = 2, \qquad g_2(1) = -2, \qquad g_2(2) = 2. \]

Therefore, comparing the values in case (a) and case (b), the maximum and minimum of $f$ on the rectangle are $2$ and $-6$, respectively. ∎

REMARK 5.7.13. For extreme value problems of $f(x, y, z)$ with side conditions

\[ g(x, y, z) = 0 = h(x, y, z), \]

we have a result similar to the case of two variables. That is, if $f$ has an extreme value at $(a, b, c)$ subject to the side conditions above, then there exist $\lambda, \mu \in \mathbb{R}$ such that

\[ \nabla f(a, b, c) = \lambda \nabla g(a, b, c) + \mu \nabla h(a, b, c). \]

In particular, if there is only the side condition $g(x, y, z) = 0$ (that is, $h(x, y, z) \equiv 0$), then

\[ \nabla f(a, b, c) = \lambda \nabla g(a, b, c). \]

PROOF. The proof of this property is essentially the same as in the case of two variables. Note that the side conditions
\[
C :
\begin{cases}
g(x, y, z) = 0, \\
h(x, y, z) = 0,
\end{cases}
\]

determine a curve. Hence the problem is to determine when $f$ has an extreme value at the point $(a, b, c)$ subject to the side condition $C$.
As in Property 5.7.2, $\nabla f(a, b, c)$ is perpendicular to $C$ at $(a, b, c)$. Similarly, $\nabla g(a, b, c)$ is perpendicular to the surface $g(x, y, z) = 0$ at the point $(a, b, c)$, and $\nabla h(a, b, c)$ is perpendicular to the surface $h(x, y, z) = 0$ at the point $(a, b, c)$; since $C$ lies on both surfaces, both vectors are perpendicular to $C$. Therefore, $\nabla f(a, b, c)$, $\nabla g(a, b, c)$, and $\nabla h(a, b, c)$ are all perpendicular to $C$ at the point $(a, b, c)$. This implies that if $\nabla g(a, b, c)$ and $\nabla h(a, b, c)$ are not parallel, then $\nabla f(a, b, c)$ lies in the plane determined by $\nabla g(a, b, c)$ and $\nabla h(a, b, c)$. Therefore, there exist $\lambda, \mu \in \mathbb{R}$ such that

\[ \nabla f(a, b, c) = \lambda \nabla g(a, b, c) + \mu \nabla h(a, b, c). \]

6. Integrals of functions of multiple variables

6.1. Double integral.

DEFINITION 6.1.1. Suppose $f(x, y)$ is a function defined on $[a, b] \times [c, d]$. Let $P$ be a partition of $[a, b] \times [c, d]$ given by

\[ x_0 = a < x_1 < \cdots < x_n = b, \qquad \Delta x_i = x_i - x_{i-1}, \]

\[ y_0 = c < y_1 < \cdots < y_m = d, \qquad \Delta y_j = y_j - y_{j-1}. \]

Let

\[ \Omega_{ij} = [x_{i-1}, x_i] \times [y_{j-1}, y_j], \qquad A(\Omega_{ij}) = \Delta x_i \times \Delta y_j, \]

\[ \xi_i \in [x_{i-1}, x_i], \qquad \eta_j \in [y_{j-1}, y_j] \]

for $i = 1, 2, \cdots, n$ and $j = 1, 2, \cdots, m$.
(i) The norm of the partition $P$ is defined by

\[ \|P\| = \max_{1 \le i \le n,\ 1 \le j \le m} \{\Delta x_i, \Delta y_j\}. \]

(ii) The sum

\[ R(f, P, \xi, \eta) = \sum_{1 \le i \le n,\ 1 \le j \le m} f(\xi_i, \eta_j) \cdot A(\Omega_{ij}) \]

is called a Riemann sum of $f(x, y)$ for the partition $P$ and $(\xi, \eta) := (\{\xi_i\}, \{\eta_j\})_{1 \le i \le n,\ 1 \le j \le m}$.
(iii) If the limit
\[ \lim_{\|P\| \to 0} R(f, P, \xi, \eta) = L < \infty \]
exists for any partition $P$ of $[a, b] \times [c, d]$ and any $(\xi, \eta)$, then $L$ is called the definite integral of $f$ on $[a, b] \times [c, d]$, and is denoted by
\[ \iint_{[a, b] \times [c, d]} f(x, y)\, dA. \]

THEOREM 6.1.2. If $f(x, y)$ is a continuous function on $[a, b] \times [c, d]$, then the definite integral (the limit)
\[ \iint_{[a, b] \times [c, d]} f(x, y)\, dA := \lim_{\|P\| \to 0} R(f, P, \xi, \eta) \]

exists.

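The calculation of integrals of functions of two variables can be reduced to iterated integrals of functions of one variable as follows:

PROPERTY 6.1.3. We have

\[
\begin{aligned}
\iint_{[a, b] \times [c, d]} f(x, y)\, dA &= \int_c^d \left(\int_a^b f(x, y)\, dx\right) dy \\
&= \int_a^b \left(\int_c^d f(x, y)\, dy\right) dx.
\end{aligned}
\]

In particular, if $f(x, y) = g(x)h(y)$, then we have

\[ \iint_{[a, b] \times [c, d]} f(x, y)\, dA = \left(\int_a^b g(x)\, dx\right) \cdot \left(\int_c^d h(y)\, dy\right). \]

PROOF. Note that

\[ \|P\| \to 0 \iff \max_{1 \le i \le n} \Delta x_i \to 0,\ \max_{1 \le j \le m} \Delta y_j \to 0. \]

Write
\[
\begin{aligned}
R(f, P, \xi, \eta) &= \sum_{1 \le i \le n,\ 1 \le j \le m} f(\xi_i, \eta_j) \cdot A(\Omega_{ij}) \\
&= \sum_{j=1}^m \left(\sum_{i=1}^n f(\xi_i, \eta_j)\, \Delta x_i\right) \Delta y_j \\
&\to \sum_{j=1}^m \left(\int_a^b f(x, \eta_j)\, dx\right) \Delta y_j \quad \text{as } \|P\| \to 0 \text{ and so } \Delta x_i \to 0, \\
&\to \int_c^d \left(\int_a^b f(x, y)\, dx\right) dy \quad \text{as } \|P\| \to 0 \text{ and so } \Delta y_j \to 0.
\end{aligned}
\]

On the other hand, changing the order of summation, we have

\[
\begin{aligned}
R(f, P, \xi, \eta) &= \sum_{i=1}^n \left(\sum_{j=1}^m f(\xi_i, \eta_j)\, \Delta y_j\right) \Delta x_i \\
&\to \sum_{i=1}^n \left(\int_c^d f(\xi_i, y)\, dy\right) \Delta x_i \quad \text{as } \|P\| \to 0, \\
&\to \int_a^b \left(\int_c^d f(x, y)\, dy\right) dx \quad \text{as } \|P\| \to 0.
\end{aligned}
\]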

E XAMPLE 6.1.4. Evaluate the double integrals.


ZZ
(i) (x2 + xy) dA.
Z Z[0,1]⇥[ 1,2]
xy
(ii) 2 + y 2 )2
dA.
[1,2]⇥[1,3] (x
6. INTEGRALS OF FUNCTIONS OF MULTIPLE VARIABLES 55

ZZ
x
(iii) dA.
[1,2]⇥[1,3] (x2 + y 2 )2
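SOLUTION. (i) The integral is equal to

$$\begin{aligned}
\int_{-1}^{2} \left( \int_0^1 (x^2 + xy)\, dx \right) dy &= \int_{-1}^{2} \left( \frac{1}{3} x^3 + \frac{1}{2} x^2 y \right) \Big|_{x=0}^{x=1}\, dy \\
&= \int_{-1}^{2} \left( \frac{1}{3} + \frac{1}{2} y \right) dy \\
&= \left( \frac{y}{3} + \frac{1}{4} y^2 \right) \Big|_{-1}^{2} = 1 + \frac{3}{4} = \frac{7}{4}.
\end{aligned}$$

On the other hand, we can calculate in the following way.

$$\begin{aligned}
\int_0^1 \left( \int_{-1}^{2} (x^2 + xy)\, dy \right) dx &= \int_0^1 \left( x^2 y + \frac{1}{2} x y^2 \right) \Big|_{y=-1}^{y=2}\, dx \\
&= \int_0^1 \left( 3x^2 + \frac{3}{2} x \right) dx \\
&= \left( x^3 + \frac{3}{4} x^2 \right) \Big|_0^1 = 1 + \frac{3}{4} = \frac{7}{4}.
\end{aligned}$$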
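(ii)
$$\begin{aligned}
\int_1^3 \int_1^2 \frac{xy}{(x^2 + y^2)^2}\, dx\, dy &= \int_1^3 \left( \frac{-y/2}{x^2 + y^2} \right) \Big|_{x=1}^{x=2}\, dy \\
&= -\frac{1}{2} \int_1^3 \left( \frac{y}{2^2 + y^2} - \frac{y}{1^2 + y^2} \right) dy \\
&= -\frac{1}{4} \left( \ln(4 + y^2) - \ln(1 + y^2) \right) \Big|_1^3 \\
&= -\frac{1}{4} \left( \ln 13 - \ln 5 - \ln 10 + \ln 2 \right) \\
&= -\frac{1}{4} \left( \ln 13 - 2 \ln 5 \right).
\end{aligned}$$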
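(iii)
$$\begin{aligned}
\int_1^3 \int_1^2 \frac{x}{(x^2 + y^2)^2}\, dx\, dy &= \int_1^3 \left( \frac{-1/2}{x^2 + y^2} \right) \Big|_{x=1}^{x=2}\, dy \\
&= -\frac{1}{2} \int_1^3 \left( \frac{1}{2^2 + y^2} - \frac{1}{1^2 + y^2} \right) dy \\
&= -\frac{1}{2} \left( \frac{1}{2} \tan^{-1} \frac{y}{2} - \tan^{-1} y \right) \Big|_1^3 \\
&= \frac{1}{2} \left( -\frac{1}{2} \tan^{-1} \frac{3}{2} + \frac{1}{2} \tan^{-1} \frac{1}{2} + \tan^{-1} 3 - \tan^{-1} 1 \right).
\end{aligned}$$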
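The three closed-form answers can be cross-checked against a direct Riemann sum. The Python sketch below is an illustration only; the helper `midpoint_2d` and the grid size are assumptions made here, not part of the notes.

```python
import math


def midpoint_2d(f, a, b, c, d, n=400):
    """Midpoint Riemann sum of f over [a, b] x [c, d] on an n-by-n uniform grid."""
    dx = (b - a) / n
    dy = (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        for j in range(n):
            y = c + (j + 0.5) * dy
            total += f(x, y)
    return total * dx * dy


# closed-form values obtained in the solution above
exact_i = 7 / 4
exact_ii = -(math.log(13) - 2 * math.log(5)) / 4
exact_iii = 0.5 * (-0.5 * math.atan(1.5) + 0.5 * math.atan(0.5)
                   + math.atan(3.0) - math.atan(1.0))

approx_i = midpoint_2d(lambda x, y: x * x + x * y, 0.0, 1.0, -1.0, 2.0)
approx_ii = midpoint_2d(lambda x, y: x * y / (x * x + y * y) ** 2, 1.0, 2.0, 1.0, 3.0)
approx_iii = midpoint_2d(lambda x, y: x / (x * x + y * y) ** 2, 1.0, 2.0, 1.0, 3.0)

for name, approx, exact in [("(i)", approx_i, exact_i),
                            ("(ii)", approx_ii, exact_ii),
                            ("(iii)", approx_iii, exact_iii)]:
    print(name, approx, exact)  # each pair should agree to several digits
```

Note that the values in (ii) and (iii) are positive, as they must be for positive integrands: $\ln 13 - 2\ln 5 = \ln(13/25) < 0$.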

DEFINITION 6.1.5. Suppose $\Omega$ is a bounded region (not necessarily a rectangle) and $f(x, y)$ is a continuous function on $\Omega$. There are two ways to define the double integral of $f(x, y)$ on $\Omega$. Let $R$ be a rectangle containing $\Omega$ and $P$ be a partition of $R$ as stated in Definition 6.1.1.

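(i) Let $\{\Omega_{ij}\}_{ij}$ be the rectangles of the partition that are contained in $\Omega$. Then

$$\iint_{\Omega} f(x, y)\, dA := \lim_{\|P\| \to 0} \sum_{i, j:\ \Omega_{ij} \subset \Omega} f(\xi_i, \eta_j)\, \Delta x_i\, \Delta y_j.$$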

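(ii) The other way is to let

$$g(x, y) = \begin{cases} f(x, y) & \text{if } (x, y) \in \Omega, \\ 0 & \text{if } (x, y) \in R \setminus \Omega. \end{cases}$$

Then
$$\iint_{\Omega} f(x, y)\, dA := \iint_{R} g(x, y)\, dA.$$
Note that $g(x, y)$ is in general no longer a continuous function.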

REMARK 6.1.6. The integrals of $f$ over $\Omega$ defined by (i) and (ii) exist and are equal whenever $f$ is a continuous function.
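The two ways of Definition 6.1.5 can be compared numerically. The sketch below is an illustration under assumptions made here: $\Omega$ is taken to be the unit disc, $R = [-1, 1] \times [-1, 1]$, $f(x, y) = x^2 + y^2$, the partition is uniform, and the names `inside` and `both_definitions` are hypothetical.

```python
def inside(x, y):
    """Membership test for the sample region Omega: the disc x^2 + y^2 <= 1."""
    return x * x + y * y <= 1.0


def f(x, y):
    return x * x + y * y


def both_definitions(n):
    """Approximate the integral of f over Omega on an n-by-n uniform partition
    of R = [-1, 1] x [-1, 1], once per way of Definition 6.1.5."""
    lo, hi = -1.0, 1.0
    h = (hi - lo) / n
    inner_cells = 0.0  # way (i): sum only over cells contained in Omega
    zero_ext = 0.0     # way (ii): g = f on Omega and g = 0 on the rest of R
    for i in range(n):
        x0, x1 = lo + i * h, lo + (i + 1) * h
        xm = 0.5 * (x0 + x1)
        for j in range(n):
            y0, y1 = lo + j * h, lo + (j + 1) * h
            ym = 0.5 * (y0 + y1)
            # the disc is convex, so a cell lies in it iff all four corners do
            if (inside(x0, y0) and inside(x0, y1)
                    and inside(x1, y0) and inside(x1, y1)):
                inner_cells += f(xm, ym) * h * h
            if inside(xm, ym):
                zero_ext += f(xm, ym) * h * h
    return inner_cells, zero_ext


for n in (50, 100, 200, 400):
    print(n, both_definitions(n))  # the gap between the two should shrink
```

The difference between the two sums comes only from cells that meet the boundary of $\Omega$, and the total area of those cells shrinks as the partition is refined.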

From the discussion above, we know that if $f(x, y) \ge 0$ on $\Omega$, then

$$\iint_{\Omega} f(x, y)\, dA$$

expresses the volume of the region bounded above by the surface $z = f(x, y)$ and bounded below by $\Omega$. If we take $f(x, y) = 1$, then
$$\iint_{\Omega} 1\, dA = A(\Omega)$$
is the area of $\Omega$.

Similarly, if we suppose $f(x, y) \ge g(x, y)$ on $\Omega$, then the volume of the solid between the surfaces $z = f(x, y)$ and $z = g(x, y)$ over $(x, y) \in \Omega$ is equal to
$$\iint_{\Omega} \left( f(x, y) - g(x, y) \right) dA.$$

PROPERTY 6.1.7. Basic properties of double integrals.



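(i)
$$\iint_{\Omega} \alpha f(x, y)\, dA = \alpha \iint_{\Omega} f(x, y)\, dA, \quad \alpha \in \mathbb{R}.$$
(ii)
$$\iint_{\Omega} \left( f(x, y) + g(x, y) \right) dA = \iint_{\Omega} f(x, y)\, dA + \iint_{\Omega} g(x, y)\, dA.$$

(iii) If $f(x, y) \ge 0$, then

$$\iint_{\Omega} f(x, y)\, dA \ge 0.$$

(iv) Suppose $\Omega = \Omega_1 \cup \Omega_2$, where $\Omega_1 \cap \Omega_2$ is empty, a finite number of points, or a "curve". Then
$$\iint_{\Omega} f(x, y)\, dA = \iint_{\Omega_1} f(x, y)\, dA + \iint_{\Omega_2} f(x, y)\, dA.$$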
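These properties can be observed on Riemann sums. The sketch below is only an illustration: the helper `midpoint_2d`, the sample functions, the constant `alpha`, and the split of the rectangle $[0,1] \times [-1,2]$ along the line $x = 0.4$ are assumptions made here.

```python
def midpoint_2d(f, a, b, c, d, n=200):
    """Midpoint Riemann sum of f over [a, b] x [c, d] on an n-by-n uniform grid."""
    dx = (b - a) / n
    dy = (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        for j in range(n):
            y = c + (j + 0.5) * dy
            total += f(x, y)
    return total * dx * dy


def f(x, y):
    return x * x + x * y


def g(x, y):
    return x - y * y


alpha = 2.5
R = (0.0, 1.0, -1.0, 2.0)

# (i) and (ii): linearity
lhs_lin = midpoint_2d(lambda x, y: alpha * f(x, y) + g(x, y), *R)
rhs_lin = alpha * midpoint_2d(f, *R) + midpoint_2d(g, *R)

# (iii): a nonnegative integrand gives a nonnegative integral
nonneg = midpoint_2d(lambda x, y: f(x, y) ** 2, *R)

# (iv): the two pieces overlap only in the segment x = 0.4
whole = midpoint_2d(f, 0.0, 1.0, -1.0, 2.0)
left = midpoint_2d(f, 0.0, 0.4, -1.0, 2.0)
right = midpoint_2d(f, 0.4, 1.0, -1.0, 2.0)

print(lhs_lin, rhs_lin)
print(nonneg)
print(whole, left + right)
```

Linearity holds for the sums up to rounding, since both sides use the same sample points; additivity holds only up to discretization error, because the pieces are sampled on different grids.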

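EXAMPLE 6.1.8. Calculate the volume within the cylinder $x^2 + y^2 = 4$ between the planes $y + z = 3$ and $z = 0$.

SOLUTION. Denote $\Omega = \{(x, y) : 0 \le x^2 + y^2 \le 4\}$. On $\Omega$ we have $|y| \le 2$, so the plane $z = 3 - y$ lies above the plane $z = 0$, and the volume is given by the double integral
$$\iint_{\Omega} (3 - y)\, dA.$$

Since $\Omega$ is symmetric about the $x$-axis and the integrand $y$ is an odd function of $y$, we have
$$\iint_{\Omega} y\, dA = 0.$$

Thus
$$\iint_{\Omega} (3 - y)\, dx\, dy = 3 \iint_{\Omega} 1\, dx\, dy = 3 \cdot \text{area}(\Omega) = 12\pi.$$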
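The value $12\pi$ and the symmetry argument can be checked with a Riemann sum. The sketch below is an illustration under assumptions made here: it uses the zero extension of Definition 6.1.5 (ii) to the square $[-2, 2] \times [-2, 2]$ with a uniform grid and midpoint sample points, and the name `volume_under_plane` is hypothetical.

```python
import math


def volume_under_plane(n=400):
    """Midpoint sums over the disc x^2 + y^2 <= 4, taken via the zero
    extension to the square [-2, 2] x [-2, 2]: returns the sum for the
    integrand 3 - y and, separately, the sum for the integrand y."""
    h = 4.0 / n
    vol = 0.0
    odd_part = 0.0
    for i in range(n):
        x = -2.0 + (i + 0.5) * h
        for j in range(n):
            y = -2.0 + (j + 0.5) * h
            if x * x + y * y <= 4.0:
                vol += (3.0 - y) * h * h
                odd_part += y * h * h
    return vol, odd_part


vol, odd_part = volume_under_plane()
print(vol, 12 * math.pi)  # the two numbers should be close
print(odd_part)           # should be close to 0 by symmetry
```

The grid is symmetric about the $x$-axis, so the contributions of $y$ at mirrored sample points cancel in the sum, mirroring the cancellation in the integral.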
