UnivMathS14 (MATH1013)
Department of Mathematics
The University of Hong Kong
1. Practical Information
• Instructor
– Dr. Ben Kane
– Email: bkane[at]maths.hku.hk
– Office: Run Run Shaw Building A311
– Consultation hours: Tuesdays 10:30–13:30
• Tutors:
(1) Dr. Cheung Wai Shun
– Email: cheungwaishun[at]gmail.com
– Office: Run Run Shaw Building A316
– Consultation hours: Tuesdays 13:30–15:30
(2) Cheng Xiaoqing
– Email: mechxqxiaofen[at]gmail.com
– Office: Run Run Shaw Building A205
– Consultation hours: Tuesdays 15:00–17:00
(3) Lau Pan Shun
– Email: panlau[at]hku.hk
– Office: Run Run Shaw Building A212
– Consultation hours: Wednesdays 10:30–12:30
• Website:
http://www.hkumath.hku.hk/course/MATH1013,
http://www.hkumath.hku.hk/course/MATH1804,
Moodle Website:
Section 2C: MATH1013_2C_2013, MATH1804_2C_2013,
Section 2D: MATH1013_2D_2013, MATH1804_2D_2013.
Grade assessment
Grade  Score
A+     97%–100%
A      92%–97%
A−     90%–92%
B+     87%–90%
B      81%–87%
B−     78%–81%
C+     75%–78%
C      69%–75%
C−     65%–69%
D      57%–65%
F      < 57%
Homework Assignments
(1) Please drop your work in the assignment box marked Math1013/1804 on the
4th floor Run Run Shaw Building.
(2) Homework is due weekly on Wednesdays and the assignment is to be turned
in by 19:00 on the due date. No late work will be accepted.
(3) Please show your work! Answers without any of the steps shown will receive
no credit.
(4) You are permitted (and even encouraged!) to discuss the homework problems
with your classmates. However, you are responsible for your own work and
each student is expected to write down the solutions in their own words!
Photocopies of other students’ solutions, combined solutions for multiple
students, and plagiarized solutions will not be accepted.
Lectures
(1) Section 2C: Lectures will be held on Mondays from 9:30–11:20 and Thursdays
from 9:30–10:20, except for periods of class suspension for public holidays,
etc. Lectures will be held in Knowles 223.
(2) Section 2D: Lectures will be held on Tuesdays from 15:30–16:20 and Fridays
from 15:30–17:20, except for periods of class suspension for public holidays,
etc. Lectures will be held in Knowles 726.
Tutorials
2. Set Theory
A useful mathematical object is a set. One way to think of a set is to consider
it as a box holding certain objects. For any object, one can ask if the object is in the
set/box or not in the set/box. Much in the same way that one would ask if your
keys are in some box, one can ask whether the number 3 is in some set.
Example 2.1. If you want to collect Bob, Jane, and Mary, then you
put braces (the symbols { and }) around them to collect them together:
$$\{\text{Bob}, \text{Jane}, \text{Mary}\}. \tag{2.1}$$
One reads this as “The set of Bob, Jane, and Mary.”
The property is a rule used to determine whether an object is in the set or not.
In Example 2.1, the property is that the object is Bob, Jane, or Mary.
To give a more complicated example, you could make a set of all children who
are older than 7. Let’s say that you have an object x and you want to know if it is
in the set of all children who are older than 7. You would first ask whether x is a
child. If not, then x is not in the set. If x is a child, then you would ask whether x
is older than 7. If the answer is yes, then x is in the set, and otherwise it is not in
the set.
Definition 2.3. One calls an object in the set an element of the set.
Let’s now consider a mathematical way to depict the description above. Let’s say
that P(x) is true if x is a child and P(x) is false if x is not a child. Similarly, Q(x)
is true if x is older than 7 and Q(x) is false otherwise. We could then denote the set
of all children who are older than 7 by
$$\{\underbrace{x}_{\text{object}} : \underbrace{P(x) \text{ and } Q(x)}_{\text{property}}\}. \tag{2.2}$$
The symbol : stands for “such that”, so the above notation really means:
The set of x such that P(x) and Q(x) are both true,
which is in turn an abbreviation for
The set of all objects which are both children and older than 7.
You may consider this as an example of how mathematical notation can be used to
abbreviate statements.
Sets written like equation (2.1) are given in list notation. This means that the
property is determined by checking if the object is one of the explicitly-listed objects.
It is often useful to move between the notation in equation (2.2) and an explicit list
of the objects with this property. For example, if a friend asked you what languages
you can speak, they are really asking about the set
$$\{x : x \text{ is a language you can speak}\}. \tag{2.3}$$
Even though they may not know the answer, they are able to express the question
by making a set with a property. Your answer will most likely not be in the form of
equation (2.3), but rather something like
$$\{\text{Cantonese}, \text{English}, \text{German}, \text{Mandarin}\}. \tag{2.4}$$
Definition 2.4. One says that the sets in Equations (2.3) and (2.4) are equal
because they have the same elements. We write (assuming that this is true)
{x : x is a language you can speak} = {Cantonese, English, German, Mandarin} .
We use = to denote equality and := to define an abbreviation for an object.
Let’s try an example with mathematical content. Say
$$S := \{x : x \text{ is a positive integer and } x \le 9\}. \tag{2.5}$$
This means that S abbreviates the set of all positive integers which are less than or
equal to 9, and we say that S is defined to be this set.
Warning and Disclaimer. Some books and teachers use = instead of := to make
definitions.
Question. How would one write S in list notation?
Solution: We have
S = {1, 2, 3, 4, 5, 6, 7, 8, 9} .
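The two notations can be compared on a computer. The Python sketch below is only an illustration and is not part of the course. A set comprehension plays the role of set-builder notation, and a set literal plays the role of list notation. A computer can only test finitely many objects, so the candidate range from -20 to 20 is an assumption that happens to contain every element of S. The names `S_builder` and `S_list` are made up for this example.

```python
# Set-builder notation S := {x : x is a positive integer and x <= 9}.
# A computer can only test finitely many objects, so we assume the candidates
# are the integers from -20 to 20 (this range contains every element of S).
candidates = range(-20, 21)
S_builder = {x for x in candidates if x > 0 and x <= 9}

# List notation for the same set.
S_list = {1, 2, 3, 4, 5, 6, 7, 8, 9}

# Two sets are equal exactly when they have the same elements.
print(S_builder == S_list)         # True
print(3 in S_list, 10 in S_list)   # True False
```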
Returning to (2.3), instead of asking all of the languages that you speak, someone
might ask you whether you speak a specific language. For example, they may ask
whether you speak English. From our perspective, they are really asking whether
English is an element of the set of languages which you speak. It is convenient to
be able to abbreviate this question and its answer. We use the symbol ∈ to say that
an object is an element of the set, and ∉ to say that the object is not an element of
the set.
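Example 2.5. If the set of languages you speak is really (2.4), then
English ∈ {x : x is a language you can speak},
while, for example, French ∉ {x : x is a language you can speak}.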
Certain sets occur often in mathematics and we collect their notation here.
Definition 2.6. We call the set which has no elements the empty set and denote it
by ∅ or {}.
The set of all integers (positive, negative, and zero) is written Z, the set
of all positive integers (also known as the natural numbers) is written N, the set
of non-negative integers (positive integers and zero) is denoted by N_0, the rational
numbers (ratios of integers) are denoted Q, the real numbers are denoted by R, and
the complex numbers are denoted by C.
For a set S, it is common to write the set of pairs (x, y) with x ∈ S and y ∈ S as
S^2. For example,
$$\mathbb{R}^2 = \{(x, y) : x \in \mathbb{R} \text{ and } y \in \mathbb{R}\}.$$
It is also useful to restrict a known set by adding an extra property. Since it is
bulky to write
{x : x ∈ S and x > 3}
one commonly abbreviates this by
{x ∈ S : x > 3} .
The list notation is quite useful, but it doesn’t work very well if there are too many
elements. An interval of real numbers is the set of x ∈ R satisfying an inequality
like
a ≤ x ≤ b
for some a, b ∈ R chosen beforehand (one also allows a “to be” −∞ and b “to be” ∞). We further
abbreviate this by the following:
(a, b) := {x ∈ R : a < x < b} ,
(a, ∞) := {x ∈ R : x > a} ,
(−∞, b) := {x ∈ R : x < b} ,
[a, b) := {x ∈ R : a ≤ x < b} .
In general, replacing a parenthesis '(' or ')' with a bracket '[' or ']' changes the
corresponding strict inequality to include equality; for example,
[a, b] := {x ∈ R : a ≤ x ≤ b}. The interval (a, b) is called open, while the interval
[a, b] is called closed (the intervals [a, b) and (a, b] are called half-open).
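As an illustration, interval membership can be written as a pair of inequality checks. This is a sketch under our own conventions and does not use notation from the notes. The hypothetical helper `in_interval` uses flags to say whether each endpoint is a bracket, and it represents ±∞ by Python's `math.inf`.

```python
import math

def in_interval(x, a, b, left_closed=False, right_closed=False):
    """Check whether x lies in the interval with endpoints a and b.

    left_closed / right_closed say whether that end uses a bracket ('[' or ']')
    instead of a parenthesis. Use -math.inf or math.inf for an unbounded end.
    """
    left_ok = a <= x if left_closed else a < x
    right_ok = x <= b if right_closed else x < b
    return left_ok and right_ok

print(in_interval(1, 1, 2))                    # False: 1 is not in (1, 2)
print(in_interval(1, 1, 2, left_closed=True))  # True: 1 is in [1, 2)
print(in_interval(4, 3, math.inf))             # True: 4 is in (3, inf)
```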
Example 2.7. Give the set of solutions to
$$-3x < 5$$
in interval notation.
Solution: Dividing by 3 on both sides gives
$$-x < \frac{5}{3}.$$
We now add x to both sides and subtract $\frac{5}{3}$ to get
$$-\frac{5}{3} < x.$$
In interval notation, the solution is
$$\left(-\frac{5}{3}, \infty\right).$$
Given two sets S and T, we use
$$S \cup T$$
to denote the set (called the union of S and T ) which contains all of the elements
of S and all of the elements of T . For example, if S = {1, 5, 7} and T = {2, 5, 9},
then
S ∪ T = {1, 2, 5, 7, 9}.
Note that 5 occurs only once in S ∪ T . This is because elements are either
inside the set or not in the set (they are only counted once).
We call the set of elements contained in both S and T the intersection of S and T
and denote it by S ∩ T. In the example above,
$$S \cap T = \{5\}.$$
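Python's built-in sets implement these two operations directly: `|` is the union and `&` is the intersection. The minimal sketch below uses the sets S and T from the text.

```python
S = {1, 5, 7}
T = {2, 5, 9}

union = S | T          # S ∪ T: elements in S or in T (or both)
intersection = S & T   # S ∩ T: elements in both S and T

print(sorted(union))   # [1, 2, 5, 7, 9]
print(intersection)    # {5}
print(len(union))      # 5: the shared element 5 is counted only once
```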
Example 2.8. Write the set of solutions to
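$$7 < 2x < 12 \tag{2.6}$$
as an intersection and the set of solutions to
$$7 < 2x \quad\text{or}\quad 3x < -12 \tag{2.7}$$
as a union.
Solution: In equation (2.6), we have
$$7 < 2x \implies x > \frac{7}{2}$$
and
$$2x < 12 \implies x < 6.$$
Thus the set of solutions is
$$\left(\frac{7}{2}, \infty\right) \cap (-\infty, 6).$$
In (2.7), we have
$$x > \frac{7}{2} \quad\text{or}\quad x < -4.$$
Thus the set of solutions is
$$(-\infty, -4) \cup \left(\frac{7}{2}, \infty\right).$$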
[Figure: Venn diagram of the sets A, B, and C]
In the above picture, the shaded region depicts the elements which are contained
in both A and B, but are not in C.
Another Venn diagram is the following:
3. Functions
Functions are important objects which appear throughout mathematics and be-
yond. One way to think of a function is to imagine a machine that has an intake
and output pipe. You place an input into the intake pipe, the machine applies some
rule, and then it outputs something on the other end. Consider for the moment
your computer. If you type the letter ’L’ on your keyboard (this is the input), your
computer does something (you may or may not know what) and then something on
your screen appears which looks like the letter ’L’ (this is the output).
Of course, machines break, but functions are perfect machines that give the same
answer every time that you give them the same input.
Example 3.1. The function
$$f(x) := 3x^2 + 7$$
returns $3x^2 + 7$ whenever the input is x. For example, if you give the input 3, then
you get the output
$$f(3) = 3(3^2) + 7 = 34.$$
Definition 3.2. A function from a set D to a set C is a rule which for each element
of D determines a unique output element of C. One calls D the domain of the
function and C the co-domain.
If the domain of a function f is not explicitly written, then we assume that the
domain is the largest set of real numbers for which the output is real. This is
called the natural domain of f.
Example 3.3. For
$$f(x) = \sqrt{x - 2}$$
the natural domain is [2, ∞), because the parameter to the square root must be
non-negative for the value (the output) to be real.
Although we require a function to give a value for each element of D, it is possible
that not every element of C actually occurs as an output. Furthermore, two different
inputs may return the same output. The range of the function is the set of all outputs
which actually occur.
Example 3.4. We return to the function
$$f(x) := 3x^2 + 7$$
with D = R (the natural domain). Since $x^2 \ge 0$, we have $f(x) \ge 7$ for every x ∈ R.
Hence the range is a subset of [7, ∞).
However, if y ≥ 7, then choose
$$x_0 = \sqrt{\frac{y-7}{3}}$$
so that
$$f(x_0) = 3\cdot\frac{y-7}{3} + 7 = y.$$
Therefore y is in the range of f and we conclude that the range of f is precisely
[7, ∞).
Furthermore, for y > 7 the input which yields y is not unique, because $-x_0 \ne x_0$ and
$$f(-x_0) = y.$$
If we restrict the domain to D = [2, ∞), then the range is [19, ∞), since
$$3x^2 + 7 \ge 3(2)^2 + 7 = 19$$
whenever x ≥ 2, and every y ≥ 19 is attained by the same choice of $x_0$ (note that $x_0 \ge 2$ when y ≥ 19).
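The construction of $x_0$ in Example 3.4 can be checked numerically. The sketch below uses floating-point arithmetic, so f(x_0) is compared with y only up to rounding error. The helper name `preimage` is hypothetical.

```python
import math

def f(x):
    return 3 * x**2 + 7

def preimage(y):
    """Return x0 = sqrt((y - 7) / 3), an input with f(x0) = y (Example 3.4).

    Only values y >= 7 lie in the range, so other values are rejected.
    """
    if y < 7:
        raise ValueError("y is not in the range [7, inf)")
    return math.sqrt((y - 7) / 3)

for y in [7, 10, 19, 100.5]:
    x0 = preimage(y)
    # f(x0) and f(-x0) should both reproduce y (up to floating-point rounding).
    print(y, x0, f(x0), f(-x0))
```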
3.1. Graphs of functions. There are a number of ways to represent a function.
One of the most useful ways is through its graph.
Definition 3.5. The graph of a function is the set of pairs of inputs and outputs
of the function.
When the domain and co-domain are both R, the graph of a function is often
pictorially represented by placing dots at the pairs (x, y) which are elements of the
graph. Here x is the input and y is the output.
Example 3.6. Here is a sketch of the graph of f (x) = x2 − 4.
[Figure: sketch of the graph of f(x) = x^2 − 4]
Given a graph, one can ask whether it is the graph of a function. Remember that
for each input there is only one output. Looking at the graph, this means that every
vertical line can hit the graph at most once.
Example 3.7.
[Figure: two sketched graphs; the second is the graph of f(x) = 2x − 3]
The vertical line test says that a graph is the graph of a function precisely when
there is no vertical line which intersects the graph more than once.
There are a number of important functions which occur often. A list of some of
these is given here:
(1) Polynomial functions: These are functions of the type
$$f(x) = A_n x^n + A_{n-1} x^{n-1} + \cdots + A_1 x + A_0,$$
where $A_0, \ldots, A_n$ are some constants (this means that they do not change
when x changes) and $A_n \ne 0$. The number $n \in \mathbb{N}_0$ is called the degree of the
polynomial. When the degree is at least 1, one calls the x for which f(x) = 0
the roots or zeros of the polynomial.
$$f(x) = 7, \qquad f(x) = x, \qquad f(x) = x^2 + 3x + 4.$$
Now consider
$$f(x) := x^4 - 2x + 1.$$
To check if it is even or odd, we compute f(−x). This gives
$$f(-x) = (-x)^4 - 2(-x) + 1 = x^4 + 2x + 1.$$
Question. This doesn’t look like f (x), and it doesn’t look like −f (x). But how do
we know for sure that it’s not (sometimes things look very different, but end up
somehow being equal)?
Solution: To check that something is not even and not odd, all you have to do is
find one value of x for which f(−x) = f(x) fails and one value (possibly the same) for
which f(−x) = −f(x) fails. Remember that f(−x) = f(x) must hold for every choice of x.
Therefore, if f(−7) ≠ f(7), then it is not satisfied (even if it is satisfied for
every other choice of x).
In the above example, we compute f(1) = 1 − 2 + 1 = 0 and f(−1) = 1 + 2 + 1 = 4.
Since f(−1) ≠ f(1) and f(−1) ≠ −f(1), the function f is neither even nor odd.
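The counterexample strategy above is easy to automate. The sketch below tests a few sample values of x, using a hypothetical helper called `find_witnesses`. Such a search can only show that a function is not even or not odd. It can never show that a function is even or odd.

```python
def f(x):
    return x**4 - 2 * x + 1

def find_witnesses(func, samples):
    """Look for sample points showing func is not even / not odd.

    Returns (x_not_even, x_not_odd): the first sample with func(-x) != func(x)
    and the first sample with func(-x) != -func(x), or None if none was found.
    A witness disproves evenness/oddness; finding none proves nothing.
    """
    x_not_even = next((x for x in samples if func(-x) != func(x)), None)
    x_not_odd = next((x for x in samples if func(-x) != -func(x)), None)
    return x_not_even, x_not_odd

print(find_witnesses(f, [1, 2, 7]))  # (1, 1)
```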
3.3. New functions from old functions. For two functions f and g, there are a
number of new functions that can be constructed. One can define the sum
$$h(x) := f(x) + g(x)$$
or product
$$h(x) := f(x)g(x)$$
as well as the quotient
$$h(x) := \frac{f(x)}{g(x)},$$
which is only defined where g(x) ≠ 0.
You can also define the horizontal shift of f by
$$h(x) := f(x - r).$$
For r > 0, this shifts the graph of f to the right by r.
[Figure: a graph and its horizontal shift]
Similarly, one can define the vertical shift
$$h(x) := f(x) + r.$$
This is shifted up by r.
[Figure: the graphs of f(x) and f(x) − 2, a downward shift by 2]
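In a programming language with first-class functions, each construction above becomes a function that takes functions as input and returns a new function. The helper names below (`add`, `multiply`, `divide`, `shift_right`, `shift_up`) are hypothetical and were chosen for this sketch.

```python
from typing import Callable

Func = Callable[[float], float]

# Hypothetical helper names; each returns a new function built from old ones.
def add(f: Func, g: Func) -> Func:
    return lambda x: f(x) + g(x)

def multiply(f: Func, g: Func) -> Func:
    return lambda x: f(x) * g(x)

def divide(f: Func, g: Func) -> Func:
    # Only meaningful where g(x) != 0; Python raises ZeroDivisionError otherwise.
    return lambda x: f(x) / g(x)

def shift_right(f: Func, r: float) -> Func:
    return lambda x: f(x - r)   # h(x) := f(x - r)

def shift_up(f: Func, r: float) -> Func:
    return lambda x: f(x) + r   # h(x) := f(x) + r

square = lambda x: x**2
h = shift_right(square, 1)
print(h(1), h(3))               # 0 4: the minimum moved from x = 0 to x = 1
print(shift_up(square, -2)(0))  # -2: the graph moved down by 2
```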
Definition 3.11. A function which has each possible output at most once is called
injective.
$$f(x) = x^2.$$
$$f(x) = x^2, \quad \text{with domain } [0, \infty).$$
Solution: Yes, on this domain f attains each value only once, so it is injective. The
inverse function is
$$f^{-1}(x) = \sqrt{x}.$$
You can see this because
$$\left(\sqrt{x}\right)^2 = x$$
and
$$\sqrt{x^2} = |x|,$$
but x ≥ 0, so |x| = x.
From this example, you can see that checking whether a function is injective
depends on the domain.
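A computer search for two inputs with the same output illustrates this point. The sketch below uses a hypothetical helper called `find_collision`. It can prove that a function is not injective, but finitely many samples can never prove injectivity. The sketch also checks the identity √(x²) = x on a few non-negative integers.

```python
import math

def find_collision(f, samples):
    """Return two different samples x1, x2 with f(x1) == f(x2), or None.

    A collision proves f is not injective on a domain containing the samples;
    finding no collision among finitely many samples proves nothing.
    """
    first_input_for = {}
    for x in samples:
        y = f(x)
        if y in first_input_for and first_input_for[y] != x:
            return first_input_for[y], x
        first_input_for.setdefault(y, x)
    return None

square = lambda x: x * x
print(find_collision(square, range(-5, 6)))  # (-1, 1): not injective on R
print(find_collision(square, range(0, 11)))  # None: consistent with [0, inf)

# On [0, inf), sqrt undoes squaring: sqrt(x**2) = |x| = x for x >= 0.
print(all(math.sqrt(x * x) == x for x in range(0, 11)))  # True
```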
An easy way to check whether a function is injective is the horizontal line test:
a function is injective precisely when every horizontal line intersects its graph at most
once.
[Figure: two graphs, labelled “not injective” and “injective”]
The horizontal line test might remind you of the vertical line test to check whether
a graph is a function. This is not an accident, because the horizontal line test is
actually checking whether the graph coming from our previous attempt to define
the inverse is actually a function or not. To see this, think about what exchanging
the input and the output does. This exchanges the roles of x and y, which is the
same as reflecting the graph across the line y = x.
[Figure: the graphs of y = f(x), y = f^{-1}(x), and the line y = x]
Remark. To consider the limit towards a point a, the function doesn’t even need
to be defined at a! In the above example, we could have defined
$$f(x) = \frac{x^2 - 4}{x - 2},$$
which does not have x = 2 in its natural domain. However, the limit as x approaches 2 is
still 4.
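A table of values is a useful way to guess a limit, although it is not a proof. The sketch below evaluates f near 2 from both sides. At x = 2 itself, Python refuses to divide 0 by 0.

```python
def f(x):
    return (x**2 - 4) / (x - 2)

# Approach x = 2 from both sides without ever plugging in x = 2 itself.
for h in [0.1, 0.01, 0.001, -0.001, -0.01, -0.1]:
    x = 2 + h
    print(f"x = {x:.3f}   f(x) = {f(x):.6f}")

# f(2) itself is undefined: Python reports a division by zero.
try:
    f(2)
except ZeroDivisionError:
    print("f(2) is undefined")
```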
If you look at the graph of f(x), you see that it suddenly jumps at x = 2, just
like the magician’s trick. If we instead defined f(2) to be 4, then the graph would continue
without a jump. That is, we would consider
$$g(x) := \begin{cases} f(x) & \text{if } x \ne 2,\\ 4 & \text{if } x = 2. \end{cases}$$
[Figure: side-by-side graphs of f(x) and g(x)]
We will discuss the difference between these two phenomena when we speak about
continuity later.
Before continuing, it is important to discuss the existence of a limit. Consider
the function
$$f(x) := \begin{cases} \dfrac{x^2 - 4}{x - 2} & \text{if } x > 2,\\[4pt] -x - 1 & \text{if } x \le 2. \end{cases}$$
[Figure: graph of this piecewise function]
If you approach x = 2 from the right, then f(x) approaches 4, but if you approach
x = 2 from the left, then f(x) approaches −3. We say that the limit does not exist.
However, one says that the limit from the right is 4 and the limit from the left is
−3. This is written:
$$\lim_{x \to 2^+} f(x) = 4$$
and
$$\lim_{x \to 2^-} f(x) = -3.$$
We also simply say that $\lim_{x\to 2} f(x)$ does not exist (this is sometimes abbreviated
DNE). There are even cases when the left and right limits do not exist, but we will
go into that further later.
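The two one-sided limits can be estimated in the same way, by approaching 2 from the right and from the left. This is only a numerical sketch, and the step sizes are an arbitrary choice.

```python
def f(x):
    """The piecewise function from the text."""
    if x > 2:
        return (x**2 - 4) / (x - 2)
    return -x - 1

steps = [10.0**-k for k in range(1, 7)]  # 0.1, 0.01, ..., 1e-6
from_right = [f(2 + h) for h in steps]   # should approach 4
from_left = [f(2 - h) for h in steps]    # should approach -3

print(from_right[-1], from_left[-1])
```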
Limits satisfy many useful properties. Assume that $\lim_{x\to a} f(x)$ and $\lim_{x\to a} g(x)$
both exist. Then we have the following:
(1) The limit of the sum or difference of two functions is the sum or difference
of the limits of the individual functions, i.e.,
$$\lim_{x\to a} \left[f(x) \pm g(x)\right] = \lim_{x\to a} f(x) \pm \lim_{x\to a} g(x).$$
(2) The limit of the product of two functions is the product of the limits of the
individual functions, i.e.,
$$\lim_{x\to a} \left[f(x)g(x)\right] = \left(\lim_{x\to a} f(x)\right)\left(\lim_{x\to a} g(x)\right).$$
(3) If $\lim_{x\to a} g(x) \ne 0$, then the limit of the ratio of two functions is the ratio of
the limits, i.e.,
$$\lim_{x\to a} \frac{f(x)}{g(x)} = \frac{\lim_{x\to a} f(x)}{\lim_{x\to a} g(x)}.$$
(4) The limit of the n-th root of a function is the n-th root of the limit (if n is
even, then we assume that $\lim_{x\to a} f(x) \ge 0$):
$$\lim_{x\to a} \sqrt[n]{f(x)} = \sqrt[n]{\lim_{x\to a} f(x)}.$$
Solution: One is tempted to take the limits of the numerator and the denominator separately,
but this is not legal because we have to check that the limit of the denominator is
non-zero. In this case, however, it is zero (if you plug in directly, you get 0/0, which
is not well-defined). In this case, we factor
$$x^4 - 1 = \left(x^2 - 1\right)\left(x^2 + 1\right) = (x - 1)(x + 1)\left(x^2 + 1\right)$$
and
$$x^2 - 3x + 2 = (x - 1)(x - 2).$$
Now, for x ≠ 1, we have
$$\frac{x^4 - 1}{x^2 - 3x + 2} = \frac{(x^2 + 1)(x + 1)\cancel{(x - 1)}}{(x - 2)\cancel{(x - 1)}} = \frac{(x^2 + 1)(x + 1)}{x - 2}.$$
Since we are only getting “really close” to x = 1 (and never plug in x = 1 itself), we are allowed to cancel out and
then take the limit. It follows that
$$\lim_{x\to 1} \frac{x^4 - 1}{x^2 - 3x + 2} = \lim_{x\to 1} \frac{(x^2 + 1)(x + 1)}{x - 2} = \frac{(1 + 1)(1 + 1)}{1 - 2} = -4.$$
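As a numerical sanity check, and not as a proof, the original quotient and the cancelled form agree away from x = 1 and x = 2. Both also approach −4 near x = 1. The function names `ratio` and `simplified` are hypothetical.

```python
def ratio(x):
    return (x**4 - 1) / (x**2 - 3 * x + 2)

def simplified(x):
    # Agrees with ratio(x) for every x other than 1 and 2.
    return (x**2 + 1) * (x + 1) / (x - 2)

for h in [1e-2, 1e-4, -1e-4, -1e-2]:
    x = 1 + h
    print(x, ratio(x), simplified(x))

print(simplified(1))  # -4.0, the value of the limit
```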
[Figure: graphs of the functions f(x), g(x), and h(x) near x = 0]
The functions g(x) and h(x) are rather simple (perhaps they are something like
$g(x) = x^2$ and $h(x) = -x^2$). Therefore, it may be easy to compute the limits
$$\lim_{x\to 0} g(x) = 0$$
and
$$\lim_{x\to 0} h(x) = 0.$$
You would guess that
$$\lim_{x\to 0} f(x) = 0$$
from the picture, but it might be difficult to show this. You can then use the
following theorem.
Theorem 4.5 (Squeeze theorem). Suppose that there is some interval (A, B) with
a ∈ (A, B) so that for every x ∈ (A, B) with x ≠ a we have
$$h(x) \le f(x) \le g(x).$$
If
$$\lim_{x\to a} h(x) = \lim_{x\to a} g(x) = L,$$
then the limit $\lim_{x\to a} f(x)$ exists and
$$\lim_{x\to a} f(x) = L.$$
The idea of Theorem 4.5 is that a function squeezed between two other functions
that approach the same value cannot “escape” and end up at another value.
This is a concept used even by sports stars. If you have played soccer (football),
basketball, or hockey, then you would be familiar with a “box out”. The idea is to use
one’s body to force another player to go in the direction that you want
them to go. They cannot go through you (this is like the inequality f(x) ≤ g(x)). If
there were 2 defenders both running towards a particular point and the other player
was stuck between them, then there would be no choice other than all players ending
up at the same point, much like the graph above.
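Example 4.6. Compute
$$\lim_{x\to 0} x \sin\left(\frac{1}{x}\right).$$
Solution: For every real number t, we have
$$-1 \le \sin(t) \le 1.$$
Taking t = 1/x, we get $\left|\sin\left(\frac{1}{x}\right)\right| \le 1$ for x ≠ 0, and hence
$$\left|x \sin\left(\frac{1}{x}\right)\right| = |x|\left|\sin\left(\frac{1}{x}\right)\right| \le |x|.$$
Therefore, for x ≠ 0, we have
$$-|x| \le x \sin\left(\frac{1}{x}\right) \le |x|.$$
Since
$$\lim_{x\to 0} \left(-|x|\right) = 0$$
and
$$\lim_{x\to 0} |x| = 0,$$
the squeeze theorem lets us conclude that
$$\lim_{x\to 0} x \sin\left(\frac{1}{x}\right) = 0.$$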
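A quick numerical check at a few arbitrarily chosen sample points shows the bounds −|x| and |x| trapping x sin(1/x). This only illustrates the inequality. The proof is the argument above.

```python
import math

def f(x):
    return x * math.sin(1 / x)

samples = [0.5, -0.1, 0.01, -0.001, 1e-6]
for x in samples:
    lower, upper = -abs(x), abs(x)
    # The squeeze inequality -|x| <= x sin(1/x) <= |x| holds at every sample.
    print(f"{lower:+.1e} <= {f(x):+.3e} <= {upper:+.1e}")
```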
$$\lim_{x\to 0} \frac{1}{x^2}.$$
Here is the graph of $f(x) = \frac{1}{x^2}$.
[Figure: graph of f(x) = 1/x^2]
As |x| gets smaller and smaller, f(x) gets bigger and bigger (for example,
$f\left(\frac{1}{1000}\right) = 1000^2$ and also $f\left(-\frac{1}{1000}\right) = 1000^2$). If a function f(x) becomes arbitrarily large
as x gets closer to a, we say that the limit is infinite, and this is written
$$\lim_{x\to a} f(x) = \infty.$$
In this example, $\lim_{x\to 0} \frac{1}{x^2} = \infty$.
[Figure: graph of a function with a vertical asymptote at x = 0]
We see that
$$\lim_{x\to 0^+} f(x) = +\infty$$
and
$$\lim_{x\to 0^-} f(x) = -\infty.$$
Since one goes to ∞ and one goes to −∞, the limit at 0 does not exist.
In a similar way, we consider limits at infinity. Here we ask the question: as x gets
larger and larger, what does f(x) approach? For $f(x) = \frac{1}{x}$, as x gets larger and
larger, f(x) gets closer and closer to 0. We hence write
$$\lim_{x\to\infty} \frac{1}{x} = 0.$$
We also have
$$\lim_{x\to-\infty} \frac{1}{x} = 0.$$
To find the limit towards ±∞ of a rational function, one divides the numerator
and denominator by the highest power of x appearing in the denominator.
Example 4.8. (1) Find the limit
$$\lim_{x\to-\infty} \frac{3x^2 + 2x + 1}{7x^2 + 4}.$$
(2) Find the limit
$$\lim_{x\to\infty} \frac{2x + 1}{7x^2 + 4}.$$
(3) Find the limit
$$\lim_{x\to-\infty} \frac{3x^2 + 2x + 1}{4x + 1}.$$
Solution: (1) We divide the numerator and denominator by $x^2$ to compute
$$\lim_{x\to-\infty} \frac{3x^2 + 2x + 1}{7x^2 + 4} = \lim_{x\to-\infty} \frac{3 + \frac{2}{x} + \frac{1}{x^2}}{7 + \frac{4}{x^2}} = \frac{3 + 0 + 0}{7 + 0} = \frac{3}{7}.$$
In the last step, we used the properties of the limit to plug in $\lim_{x\to-\infty} \frac{1}{x} = 0$ and
$\lim_{x\to-\infty} \frac{1}{x^2} = 0$.
(2) We divide the numerator and denominator by $x^2$ to compute
$$\lim_{x\to\infty} \frac{2x + 1}{7x^2 + 4} = \lim_{x\to\infty} \frac{\frac{2}{x} + \frac{1}{x^2}}{7 + \frac{4}{x^2}} = \frac{0 + 0}{7 + 0} = 0.$$
In the last step, we used the properties of the limit to plug in $\lim_{x\to\infty} \frac{1}{x} = 0$ and
$\lim_{x\to\infty} \frac{1}{x^2} = 0$.
(3) We divide the numerator and denominator by x to compute
$$\lim_{x\to-\infty} \frac{3x^2 + 2x + 1}{4x + 1} = \lim_{x\to-\infty} \frac{3x + 2 + \frac{1}{x}}{4 + \frac{1}{x}}.$$
We now note that the numerator approaches −∞ as x → −∞, while the denominator
approaches 4. Thus the ratio approaches −∞. We conclude that
$$\lim_{x\to-\infty} \frac{3x^2 + 2x + 1}{4x + 1} = -\infty.$$
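Evaluating the three rational functions at large |x| gives numerical evidence for the answers 3/7, 0, and −∞. The names `r1`, `r2`, and `r3` are hypothetical labels for the functions in parts (1) to (3).

```python
def r1(x):
    return (3 * x**2 + 2 * x + 1) / (7 * x**2 + 4)

def r2(x):
    return (2 * x + 1) / (7 * x**2 + 4)

def r3(x):
    return (3 * x**2 + 2 * x + 1) / (4 * x + 1)

for big in [1e2, 1e4, 1e6]:
    # (1) and (3) go to -infinity, (2) goes to +infinity.
    print(r1(-big), r2(big), r3(-big))

print(3 / 7)  # the predicted limit in (1), for comparison
```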
and
g(2) = 4.
This verifies mathematically what our intuition tells us, that g is continuous.
Example 4.10. Where is the function $f(x) = \frac{1}{x}$ continuous? Where is it
discontinuous?
4.3. Intermediate value theorem. Let’s return to the magician’s trick. The ma-
gician managed to magically and instantaneously move the coin from the right to the
left. Since it was instantaneous, the coin was never anywhere between the right and
the left hand. However, if we now return the magician to mere mortal status (her
magic is now only a sleight of hand, and she is returned to the world of continuity),
then she must physically move the coin from the right hand to the left hand. As a
result, the coin must pass through all of the space between the left hand and the
right hand. Therefore, at some point (we don’t know when, because the magician
is very good at hiding the trick), the coin had to be at every point between the
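two hands. This is indicative of the following important theorem about continuous
functions.
Theorem (Intermediate Value Theorem). Suppose that f is continuous on the closed
interval [a, b] and that N is any number between f(a) and f(b). Then there exists a
number c ∈ [a, b] such that f(c) = N.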
The theorem doesn’t say anything about what c is (much like we don’t know the
time that the coin passes through the air), but we can guarantee that it must be
there sometime in between a and b.
Here is another way to visualize the intermediate value theorem. Let’s say that
you see someone standing outside of your room at time a. At time b, they are inside
your room. Perhaps you were not paying attention and did not see him or her come
in, but you can still guarantee that at some point c, they were directly in your
doorway. Here is a graphical representation of the Intermediate Value Theorem.
[Figure: a continuous function f with a = −2, f(−2) = −2, b = 2, f(2) = 3, and a point c ∈ [−2, 2] with f(c) = 1]
The Intermediate Value Theorem is helpful for finding roots of a function. Suppose
that f is a continuous function and f (a) > 0. If f (b) < 0 with a < b, then at some
point between a and b, the graph must have crossed over the x-axis. Wherever it
crossed (we don’t know), the value of f is zero. This means that there is a c between
a and b so that f (c) = 0.
This allows us to locate roots by looking for places where the sign of f
changes. Since polynomials are always continuous, this can be quite useful in
(numerically) approximating roots of polynomials. You can keep getting closer and
closer to find the approximate root (where the value on each side changes sign).
Together with the following theorem (which is not within the scope of this course),
this allows you to find all roots of a polynomial with a computer/calculator.
The theorem above means that, if you manage to find n roots (or n + 1 places
where it changes sign), then you have found all of the roots!
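The repeated narrowing described above is usually called the bisection method. The sketch below shows one way to implement it, assuming that f is continuous and that f(a) and f(b) have opposite signs. The tolerance and the iteration cap are arbitrary choices, and `bisect_root` is a hypothetical name.

```python
import math

def bisect_root(f, a, b, tol=1e-12, max_iter=200):
    """Approximate a root of a continuous f on [a, b] by bisection.

    Requires f(a) and f(b) to have opposite signs; the Intermediate Value
    Theorem then guarantees a root between them. Each step keeps the half
    of the interval on which the sign still changes.
    """
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        m = (a + b) / 2
        fm = f(m)
        if fm == 0 or (b - a) / 2 < tol:
            return m
        if fa * fm < 0:   # sign change on [a, m]
            b, fb = m, fm
        else:             # sign change on [m, b]
            a, fa = m, fm
    return (a + b) / 2

p = lambda x: x**2 - 2     # p(1) = -1 < 0 and p(2) = 2 > 0
root = bisect_root(p, 1, 2)
print(root, math.sqrt(2))  # the two values agree to within about 1e-12
```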
Example 4.14. Determine the range of $f(x) := 3x^2 + 7$ using the Intermediate
Value Theorem.
Solution: We have already shown that the range is [7, ∞) in the first week of the
class. That time, we first used $x^2 \ge 0$ to show that the range was a subset of [7, ∞).
After this, for each y ≥ 7 we had to construct an $x_0$ so that $f(x_0) = y$. Instead
of doing this, we now use the continuity of f(x) (it is a polynomial). Suppose that
y ≥ 7. Since
$$\lim_{x\to\infty} f(x) = +\infty,$$
5. Differentiation
5.1. Slopes and Derivatives. One says that a linear function f(x) = mx + b has
slope m. The slope is the amount that y changes whenever x changes by one unit.
Another way to write this is as the change in y divided by the change in x. The
slope is an important piece of information. For example, think about the stock
market: every day they tell you the value of the stocks (this is f (x)), but they also
tell you how much it has changed from the day before (this is the slope of the line
connecting f (x − 1) and f (x)).
The difference quotients can be thought of as the slope of the secant line from
(a, f (a)) to (a + h, f (a + h)), depicted below:
[Figure: the graph of f(x) together with several secant lines]
By taking the limit, one is computing the slope of the tangent line at the point
(a, f (a)). Roughly speaking, the tangent line is the line which hits the graph at
exactly one point (more precisely, the limit of the secant lines as the points get
closer and closer, until they become one point).
[Figure: the graph of f(x) together with its tangent line]
Note that although there are many secant lines which go through the point (a, f (a)),
there is only one tangent line (if it exists). The slope of the tangent line at x = a is
the derivative, because it is the limit of the slopes of the secant lines.
Thinking of the difference quotients as the rate of change (as in the example of the
stock market), one can think of the derivative as the instantaneous rate of change.
In other words, at what rate does f (x) change if x changes an infinitely small amount.
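The limit of the secant slopes can be observed numerically. The example used below is our own choice: f(x) = x² at a = 1. For this function the difference quotient equals 2 + h, so the printed slopes approach the tangent slope 2.

```python
def difference_quotient(f, a, h):
    """Slope of the secant line from (a, f(a)) to (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h

square = lambda x: x**2
# For f(x) = x^2 at a = 1 the difference quotient simplifies to 2 + h,
# so the secant slopes approach the tangent slope 2 as h shrinks.
for h in [0.1, 0.01, 0.001, 1e-6]:
    print(h, difference_quotient(square, 1.0, h))
```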
Remark. One often writes ∆ for change. Then the difference quotient is written
$$\frac{\Delta f(x)}{\Delta x}.$$
When ∆x becomes “infinitely small” it gets denoted dx instead of ∆x. This explains
the notation $\frac{d}{dx}$.
Example 5.3. To give an additional example beyond the stock market, the speed/velocity
of a car is the change of distance over a given time. If you travel 100 km over a
one-hour period, then your average speed (the difference quotient) is 100 km/h.
However, if you look at your speedometer at a given time in between, it gives your
instantaneous speed (the derivative), which might be different from 100 km/h.
The reason for the above theorem is the following: Take secant lines closer and
closer to the point where it is discontinuous. Since f jumps at x = a, the lines get
more and more vertical (they must jump a large distance in the y direction, but do
not change much in the x direction). As a limit, the line becomes vertical, which
has infinite slope.
If the tangent line is vertical, then we also say that the derivative does not exist.
inside) and then just plug in the parameter. However, afterwards, you have
to multiply by the derivative of the parameter.
Remark. If the parameter is just x, then you get
$$\frac{d}{dx} f(x) = f'(x) \frac{dx}{dx}.$$
Since $\frac{dx}{dx} = 1$, this is precisely f'(x), as it should be.
Example 5.7. We illustrate some of the more complicated rules with an example.
Find the derivatives of
$$f(x) := \sqrt{x}\left(x^3 - 2\right),$$
$$g(x) := \sqrt{x^4 + 2x},$$
$$h(x) := \frac{x^3 - 2}{x^5 + 2x - 1}.$$
Solution:
(1) Using the product rule (and the fact that $\sqrt{x} = x^{\frac{1}{2}}$), we have
$$f'(x) = \sqrt{x}\left(3x^2\right) + \frac{1}{2} x^{-\frac{1}{2}}\left(x^3 - 2\right).$$
(2) Using the chain rule (splitting as $\sqrt{u}$ with $u := x^4 + 2x$), we have
$$g'(x) = \frac{1}{2}\left(x^4 + 2x\right)^{-\frac{1}{2}}\left(4x^3 + 2\right).$$
(3) Using the quotient rule, we have
$$h'(x) = \frac{\left(x^5 + 2x - 1\right)\left(3x^2\right) - \left(x^3 - 2\right)\left(5x^4 + 2\right)}{\left(x^5 + 2x - 1\right)^2}.$$
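One way to double-check derivatives computed by hand is to compare them with a symmetric difference quotient at a few points. In the sketch below, the step size 1e-5 and the test point x = 2 are arbitrary choices, and the two values agree only up to rounding error. The derivative formulas are the ones from the solution above.

```python
import math

# Functions from Example 5.7.
def f(x):
    return math.sqrt(x) * (x**3 - 2)

def g(x):
    return math.sqrt(x**4 + 2 * x)

def h(x):
    return (x**3 - 2) / (x**5 + 2 * x - 1)

# Derivatives computed with the product, chain, and quotient rules.
def f_prime(x):
    return math.sqrt(x) * 3 * x**2 + 0.5 * x**-0.5 * (x**3 - 2)

def g_prime(x):
    return 0.5 * (x**4 + 2 * x) ** -0.5 * (4 * x**3 + 2)

def h_prime(x):
    d = x**5 + 2 * x - 1
    return (d * 3 * x**2 - (x**3 - 2) * (5 * x**4 + 2)) / d**2

def central_difference(func, x, step=1e-5):
    """Symmetric difference quotient, a numerical estimate of func'(x)."""
    return (func(x + step) - func(x - step)) / (2 * step)

x = 2.0
for name, func, exact in [("f", f, f_prime), ("g", g, g_prime), ("h", h, h_prime)]:
    print(name, exact(x), central_difference(func, x))
```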
Definition 5.8. Repeated differentiation may be performed. Thinking of the
derivative as the instantaneous rate of change, the second derivative (the derivative of the
derivative) is the instantaneous rate of change of the slope. It is denoted
$$f''(x) := \frac{d}{dx} f'(x) = \frac{d}{dx}\left(\frac{d}{dx} f(x)\right) = \frac{d^2}{dx^2} f(x).$$
The third derivative is denoted f'''(x), while higher derivatives are denoted $f^{(4)}(x)$,
$f^{(5)}(x)$, ....
An example where the second derivative naturally occurs is gravity. When an
object (say a ball) is thrown upwards, it has a force (gravity) pulling it back down.
It is important to know the current location of the ball (the value), the speed at
which the ball is travelling (derivative), and at what rate it is slowing down (second
derivative). Gravity determines the rate at which it slows down. In this case, the
quickly!). Given this data (and assuming the observed relationship is really true in
general), one can determine $\frac{dy}{dx}$ using implicit differentiation.
5.5. The Mean Value Theorem. Let’s say that you’re driving on a road with
a speed limit of 100 km/h. Let’s say that you pass a checkpoint at 12:30pm and
another 100 kilometers away at 1:20pm. You look at your speedometer as you pass
the second checkpoint and it says that you are currently travelling at 80 km/h.
However, a police officer pulls over your car and gives you a ticket for speeding.
You argue that you were only travelling at 80 km/h, but he says that you must
have been travelling over 100 km/h at some point. He explains that you travelled
100 kilometers in less than one hour, so your average speed was over 100 km/h. If
you had always been travelling below 100 km/h, then you could not have gotten an
average speed above 100 km/h. Unfortunately, his logic is sound and you cannot
argue this.
This police officer has essentially employed The Mean Value Theorem in his ex-
planation.
Theorem 5.9 (The Mean Value Theorem). If a function f is continuous on [a, b]
and differentiable on (a, b), then there exists some c ∈ (a, b) such that
$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$
Roughly speaking, the above theorem says that at some point, your speed was
exactly your average speed. Although the police officer doesn’t know when you were
speeding, he knows that you were speeding at some point. Just like the Intermediate
Value Theorem, you don’t know what c is, only that it exists.
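For a concrete function, the number c can actually be located. The sketch below uses our own example, f(x) = x³ on [0, 2], which does not come from the notes. The secant slope is 4, and a simple bisection on f'(c) − 4 finds c = 2/√3. The helper `bisect` is hypothetical.

```python
import math

def bisect(func, lo, hi, tol=1e-12):
    """Find a zero of a continuous func on [lo, hi], assuming a sign change."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if f_lo * f_mid <= 0:   # sign change on [lo, mid]
            hi = mid
        else:                   # sign change on [mid, hi]
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2

f = lambda x: x**3          # continuous on [0, 2] and differentiable on (0, 2)
f_prime = lambda x: 3 * x**2
a, b = 0.0, 2.0
secant_slope = (f(b) - f(a)) / (b - a)   # = 4.0, the "average speed"

# Look for c with f'(c) = secant slope; here f'(c) - 4 changes sign on [0, 2].
c = bisect(lambda x: f_prime(x) - secant_slope, a, b)
print(c, 2 / math.sqrt(3))  # both values are about 1.1547
```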
Note that the Mean Value Theorem has hypotheses which need to hold for its conclusion
to hold. It assumes that the function is continuous (no Star Trek transporter has
beamed you from one location to another). Also, the function must be differentiable.
To give an example where differentiability is important, consider
$$f(x) = |x|.$$
We have f(1) = f(−1), so the average rate of change
$$\frac{f(1) - f(-1)}{1 - (-1)}$$
on [−1, 1] is zero. However,
$$f'(x) = \begin{cases} -1 & \text{if } x < 0,\\ 1 & \text{if } x > 0, \end{cases}$$
and f'(0) does not exist. We see that there is no c ∈ (−1, 1) for which f'(c) = 0.
Let’s consider the special case of the Mean Value Theorem when f (a) = f (b).
Theorem 5.10 (Rolle’s Theorem). If a function f is continuous on [a, b] and dif-
ferentiable on (a, b) and f (a) = f (b), then there exists some c ∈ (a, b) such that
f ′ (c) = 0.
[Figure: the graph of f(x) with horizontal tangent lines]
If you look at the above graph, you see that these points with horizontal tangent lines are (locally,
meaning in some small interval around them) the maximum and minimum values of
the function. Think back to the example of the stock broker. If the stock broker
knew the exact point where the stock value would be lowest (at a critical point) and
the exact point where it would be the highest (also at a critical point), then they
would know exactly when to buy and sell. So finding critical points is important for
locating the places where the direction of the stock might change.
5.6. L’Hôpital’s Rule. In the last section, we discussed a number of infinite limits
and limits towards infinity. Recall that
$$\lim_{x\to a} \frac{f(x)}{g(x)} = \frac{\lim_{x\to a} f(x)}{\lim_{x\to a} g(x)},$$
provided that both limits exist and that $\lim_{x\to a} g(x) \ne 0$. This raises the question:
what happens when $\lim_{x\to a} g(x) = 0$? Is there an easy way to determine the answer
in this case?
Consider the simple examples
$$\frac{x^2 - x}{x - 1}, \quad \frac{x - 1}{x^2 - 2x + 1}, \quad\text{and}\quad \frac{x^2 - 2x + 1}{x - 1}.$$
In each of these cases, the limits as x → 1 of both the numerator and the denominator vanish. We therefore cannot use the above rule. However, by cancelling a common factor, we have
$$\lim_{x\to 1}\frac{x^2 - x}{x - 1} = \lim_{x\to 1} x = 1,$$
$$\lim_{x\to 1}\frac{x - 1}{x^2 - 2x + 1} = \lim_{x\to 1}\frac{1}{x - 1},$$
which does not exist, and
$$\lim_{x\to 1}\frac{x^2 - 2x + 1}{x - 1} = \lim_{x\to 1}(x - 1) = 0.$$
We were therefore able to construct examples where the answer was 0, 1, and even one where the limit does not exist (the left-limit and right-limit are −∞ and +∞, respectively).
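To see numerically that the form 0/0 carries no information by itself, one can evaluate the three quotients above at points close to x = 1 from both sides. This is only a numerical illustration (floating-point values at a few sample points, not a proof), and the function names q1, q2, q3 are ours.

```python
# The three quotients from the text; each is of the form 0/0 at x = 1.
def q1(x):
    return (x**2 - x) / (x - 1)


def q2(x):
    return (x - 1) / (x**2 - 2*x + 1)


def q3(x):
    return (x**2 - 2*x + 1) / (x - 1)


# Approach x = 1 from the left (1 - h) and from the right (1 + h).
for h in (1e-2, 1e-3, 1e-4):
    for name, q in (("q1", q1), ("q2", q2), ("q3", q3)):
        print(name, h, q(1 - h), q(1 + h))
```

The values of q1 settle near 1, those of q3 near 0, and q2 blows up with opposite signs on the two sides.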
Hence plugging in and getting 0/0 doesn't tell us anything about what the limit might be. However, there is still a useful rule which allows us to compute limits when directly plugging in yields 0/0.
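Theorem 5.11 (L'Hôpital's rule). Suppose that f and g are differentiable near a (except possibly at a), that g'(x) ≠ 0 near a (except possibly at a), and that both $\lim_{x\to a} f(x) = 0$ and $\lim_{x\to a} g(x) = 0$. Then
$$\lim_{x\to a}\frac{f(x)}{g(x)} = \lim_{x\to a}\frac{f'(x)}{g'(x)},$$
provided that the limit on the right-hand side exists (or is ±∞).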
Warning. If plugging in does not yield 0/0, then you may not use L'Hôpital's rule. To give an example where this can be seen, consider
$$\lim_{x\to 1}\frac{x - 1}{x + 2}.$$
Here the limit is zero (you can get this by plugging in directly). However, if you tried to (illegally!) use L'Hôpital's rule, you would obtain
$$\lim_{x\to 1}\frac{1}{1} = 1,$$
an incorrect answer.
Solution: In this case, you can just plug x = 0 in (from the original properties we learned for the limit), because
$$\lim_{x\to 0}\cos^2(x) = \cos^2(0) = 1 \neq 0.$$
Therefore,
$$\lim_{x\to 0}\frac{\sin(x)}{\cos^2(x)} = \frac{\lim_{x\to 0}\sin(x)}{\lim_{x\to 0}\cos^2(x)} = \frac{0}{1} = 0.$$
Had you tried to use L'Hôpital's rule (which is not allowed because plugging in doesn't give 0/0), you would have gotten
$$\lim_{x\to 0}\frac{\cos(x)}{-2\cos(x)\sin(x)} = -\lim_{x\to 0}\frac{1}{2\sin(x)},$$
which does not exist.
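and
$$\lim_{x\to\infty}\left(\sqrt{x} + 1\right) = \infty.$$
Noting that
$$\frac{d}{dx}\sqrt{x} = \frac{1}{2\sqrt{x}},$$
L'Hôpital's rule for ∞/∞ yields
$$\lim_{x\to\infty}\frac{\sqrt{2\sqrt{x} + 7}}{\sqrt{x} + 1} = \lim_{x\to\infty}\frac{\frac{1}{2\sqrt{2\sqrt{x} + 7}}\cdot\frac{2}{2\sqrt{x}}}{\frac{1}{2\sqrt{x}}} = \lim_{x\to\infty}\frac{1}{\sqrt{2\sqrt{x} + 7}}.$$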
Since
$$\lim_{x\to\infty}\sqrt{2\sqrt{x} + 7} = \infty,$$
we conclude that
$$\lim_{x\to\infty}\frac{1}{\sqrt{2\sqrt{x} + 7}} = 0.$$
Therefore
$$\lim_{x\to\infty}\frac{\sqrt{2\sqrt{x} + 7}}{\sqrt{x} + 1} = 0.$$
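Solution: Using
$$\csc(x) = \frac{1}{\sin(x)},$$
we rewrite
$$x\sin\left(\frac{1}{x}\right) = \frac{x}{\frac{1}{\sin(1/x)}} = \frac{x}{\csc(1/x)}.$$
Since sin(0) = 0 (and sin(1/x) > 0 for large x), we have
$$\lim_{x\to\infty}\csc\left(\frac{1}{x}\right) = +\infty.$$
Furthermore, by the chain rule
$$\frac{d}{dx}\csc(x) = \frac{d}{dx}\frac{1}{\sin(x)} = -\frac{1}{\sin^2(x)}\cos(x) = -\csc(x)\cot(x).$$
Again using the chain rule, we obtain
$$\frac{d}{dx}\csc\left(\frac{1}{x}\right) = -\csc\left(\frac{1}{x}\right)\cot\left(\frac{1}{x}\right)\cdot\left(-\frac{1}{x^2}\right).$$
Therefore by L'Hôpital's rule for ∞/∞ (and using csc(y)cot(y) = csc²(y)/sec(y)), we have
$$\lim_{x\to\infty}\frac{x}{\csc(1/x)} = \lim_{x\to\infty}\frac{1}{\frac{1}{x^2}\cdot\frac{\csc^2(1/x)}{\sec(1/x)}} = \lim_{x\to\infty}\frac{x^2}{\csc^2(1/x)/\sec(1/x)}.$$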
The situation actually seems to have gotten much worse (the power of x in the numerator is higher!). This usually happens when you have made the wrong choice. I'll explain what you should try next.
Instead, we write
$$x\sin\left(\frac{1}{x}\right) = \frac{\sin(1/x)}{1/x}.$$
This is now of the form 0/0 and we may hence use L'Hôpital's rule.
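We compute the derivative
$$\frac{d}{dx}\big(x(\cos(x) - 1)\big) = -x\sin(x) + \cos(x) - 1$$
to obtain
$$\lim_{x\to 0}\frac{\cos(x) - 1 + x}{x(\cos(x) - 1)} = \lim_{x\to 0}\frac{-\sin(x) + 1}{-x\sin(x) + \cos(x) - 1}.$$
The limit of the numerator is 1. To determine the limit of the denominator, we plug in x = 0 to obtain
$$\lim_{x\to 0} x\sin(x) = 0, \qquad \lim_{x\to 0}\big(\cos(x) - 1\big) = 0.$$
Moreover, for x near 0 we have x sin(x) ≥ 0 and cos(x) − 1 ≤ 0, so the denominator −x sin(x) + cos(x) − 1 tends to 0 through negative values. It follows that
$$\lim_{x\to 0}\left(\frac{1}{x} + \frac{1}{\cos(x) - 1}\right) = -\infty.$$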
5.7. Curve sketching and optimization. We’ve spoken a bit about what the
derivative means from the point of view of the graph of a function. As we described,
this is the slope of the tangent line (or the instantaneous rate of change). The special
case when f ′ (x) = 0 was discussed when we looked at Rolle’s Theorem.
The interplay between the derivatives of a function and its graph is an important one. Given a graph, one can identify certain special points and general properties of the function which help to understand its overall structure. Conversely, given a function, one can understand it better by sketching its graph.
5.7.1. Critical points and increasing/decreasing behavior. When you look at the graph of a function, what stands out to you? The first things that might be obvious are the points of discontinuity. You might also notice the points where the derivative does not exist (think about the graph of f(x) = |x|). The next important things that you might notice are the places where the graph changes direction. Together, these points are known as the critical points.
Definition 5.19. For an interval (a, b), we call c ∈ (a, b) a critical point of f if
either f is not differentiable at c or
f ′ (c) = 0.
The critical points are important for finding the maximum and minimum values
in an interval. Think about the top of a mountain or a hill. At the very top, the
tangent line of a hill is horizontal.
[Figures: two hills, each with a horizontal tangent line at its highest point.]
In each of these cases, the top (the highest point) is a critical point. If you have a
bunch of mountains next to each other, then the top of each mountain is a critical
point. To find the tallest mountain, you just have to compare the height of each of
the critical points (the highest point on each mountain). The low points are also
critical points, so you will find the lowest point of each mountain this way, too. Of
course, if you are climbing the mountain, the highest point you’ve reached might
not be the top. However, if you consider the part that you’ve climbed as an interval,
the highest point will be one of the endpoints of the interval.
[Figure: a graph on an interval whose highest point is at an endpoint of the interval.]
Therefore, if you want to find the highest (or lowest) point in an interval, you should
check the endpoints and the critical points.
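The recipe "check the endpoints and the critical points" is easy to automate once the critical points are known. A minimal sketch, assuming the critical points have already been found by hand; the function x³ − 3x on [−3, 2] (with critical points x = ±1, since f'(x) = 3x² − 3) is an illustrative choice, not an example from the notes, and `extreme_values` is a hypothetical helper name.

```python
def extreme_values(f, a, b, critical_points):
    """Return (min_value, max_value) of f on [a, b] by comparing f at the
    two endpoints and at those given critical points lying in (a, b)."""
    candidates = [a, b] + [c for c in critical_points if a < c < b]
    values = [f(x) for x in candidates]
    return min(values), max(values)


f = lambda x: x**3 - 3*x  # f'(x) = 3x^2 - 3 vanishes at x = -1 and x = 1
print(extreme_values(f, -3, 2, [-1, 1]))  # (-18, 2)
```

Here the lowest value occurs at the endpoint x = −3, which is exactly why the endpoints must be included.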
The top of each mountain is a local maximum, while only the top of the tallest
mountain is a global maximum.
After finding the top and bottom points where the graph changes direction, it is also useful to mark where the graph is going up and where it is going down. A function f is said to be increasing on an interval (a, b) if for any a < r < s < b we have
$$f(s) \geq f(r).$$
If f(s) > f(r) always holds, then we say that f is strictly increasing. Similarly, f is said to be decreasing on (a, b) if for any a < r < s < b we have
$$f(s) \leq f(r).$$
Again, we say that f is strictly decreasing if f(s) < f(r) always holds.
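Suppose now that f is differentiable. Since
$$f'(x) = \lim_{h\to 0}\frac{f(x + h) - f(x)}{h},$$
if f is strictly increasing, then every difference quotient (f(x + h) − f(x))/h is positive (both for h > 0 and for h < 0), so f'(x) ≥ 0. (The derivative can still be zero: f(x) = x³ is strictly increasing, but f'(0) = 0.) Analyzing in this way, we obtain the following.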
[Figure: a graph with regions labelled increasing, decreasing, and constant.]
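Example 5.22. Find all of the critical points of $f(x) := \frac{1}{4}x^4 - x$. Also determine when it is increasing and decreasing. Determine the local maxima and minima and the global maximum and minimum, if they exist. Give a rough sketch of its graph.
Solution: We compute
$$f'(x) = x^3 - 1.$$
Since f is differentiable everywhere, the critical points are the solutions of
$$0 = x^3 - 1,$$
so x = 1 is the only critical point. If x < 1, then
$$f'(x) = x^3 - 1 < 0.$$
If x > 1, then
$$f'(x) = x^3 - 1 > 0.$$
Therefore f is decreasing for x < 1 and increasing whenever x > 1. Hence, at x = 1, the function f has a local minimum. Since f is decreasing on (−∞, 1) and increasing on (1, ∞), this local minimum f(1) = −3/4 is also the global minimum. There is no local or global maximum, because f(x) → ∞ as x → ±∞.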
[Figure: sketch of the graph of f(x) = x⁴/4 − x, decreasing for x < 1 and increasing for x > 1.]
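The sign analysis in Example 5.22 can be double-checked by evaluating f'(x) = x³ − 1 at a few test points on each side of x = 1. This is a quick sanity check under the assumption that the test points below are representative, not a replacement for the argument in the text.

```python
def f(x):
    return x**4 / 4 - x


def df(x):
    return x**3 - 1


# f'(x) = 0 only at x = 1 (the only real cube root of 1).
for x in (-2, 0, 0.5, 1.5, 3):
    trend = "decreasing" if df(x) < 0 else "increasing"
    print(x, df(x), trend)

print(f(1))  # -0.75, the value of f at the critical point x = 1
```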
5.7.2. Asymptotes. Another property of the graph which might stick out is the limit
as x → ±∞ and any point where there is a discontinuity. This leads to the following
definitions.
Definition 5.23. If $\lim_{x\to\infty} f(x) = L$ or $\lim_{x\to-\infty} f(x) = L$, then we say that the line y = L is a horizontal asymptote to the curve y = f(x).
If $\lim_{x\to a^+} f(x) = \pm\infty$ or $\lim_{x\to a^-} f(x) = \pm\infty$, then we call the line x = a a vertical asymptote to the curve y = f(x).
Example 5.24. What are the horizontal and vertical asymptotes for $f(x) := \frac{1}{x - 1}$?
Solution: We have
$$\lim_{x\to\infty}\frac{1}{x - 1} = 0 \quad\text{and}\quad \lim_{x\to-\infty}\frac{1}{x - 1} = 0,$$
so y = 0 is a horizontal asymptote.
Since
$$\lim_{x\to 1^-} f(x) = -\infty \quad\text{and}\quad \lim_{x\to 1^+} f(x) = +\infty,$$
we have that x = 1 is a vertical asymptote.
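Example 5.25. Find the vertical and horizontal asymptotes of
$$f(x) := \frac{2x^2 + 3}{x^2 - 1}.$$
Solution: We have
$$\lim_{x\to\pm\infty}\frac{2x^2 + 3}{x^2 - 1} = \lim_{x\to\pm\infty}\frac{2 + \frac{3}{x^2}}{1 - \frac{1}{x^2}} = 2,$$
so y = 2 is a horizontal asymptote. The denominator vanishes when
$$x^2 - 1 = 0,$$
or x = ±1, while the numerator 2x² + 3 is never zero, so x = 1 and x = −1 are vertical asymptotes.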
In order to sketch the graph, we determine the limits
$$\lim_{x\to -1^-}\frac{2x^2 + 3}{x^2 - 1} = +\infty, \qquad \lim_{x\to -1^+}\frac{2x^2 + 3}{x^2 - 1} = -\infty,$$
$$\lim_{x\to 1^-}\frac{2x^2 + 3}{x^2 - 1} = -\infty, \qquad \lim_{x\to 1^+}\frac{2x^2 + 3}{x^2 - 1} = +\infty,$$
because x² − 1 > 0 whenever |x| > 1 and x² − 1 < 0 whenever |x| < 1 (and 2x² + 3 > 0 always).
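To sketch the graph, it is also useful to know when the function is increasing and decreasing. We compute the derivative (using the quotient rule)
$$f'(x) = \frac{4x(x^2 - 1) - (2x^2 + 3)(2x)}{(x^2 - 1)^2} = \frac{-10x}{(x^2 - 1)^2}.$$
Therefore f'(x) = 0 only when x = 0, f'(x) > 0 for x < 0 (x ≠ −1), and f'(x) < 0 for x > 0 (x ≠ 1).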
Putting this all together gives the following sketch of the graph.
[Figure: sketch of the graph of f(x) = (2x² + 3)/(x² − 1).]
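The derivative in Example 5.25 follows from the quotient rule: f'(x) = [4x(x² − 1) − (2x² + 3)(2x)]/(x² − 1)² = −10x/(x² − 1)². A quick numerical check compares this formula with a central difference quotient at a few points away from x = ±1; the step size h = 10⁻⁶ and the helper names are arbitrary choices of ours.

```python
def f(x):
    return (2*x**2 + 3) / (x**2 - 1)


def df_formula(x):
    # derivative from the quotient rule, simplified
    return -10*x / (x**2 - 1)**2


def central_difference(g, x, h=1e-6):
    # symmetric difference quotient approximating g'(x)
    return (g(x + h) - g(x - h)) / (2*h)


for x in (-3.0, -0.5, 0.5, 3.0):
    print(x, df_formula(x), central_difference(f, x))
```

The two columns agree to many digits, and the sign of f' is positive for x < 0 and negative for x > 0, as claimed.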
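Another kind of asymptote occurs when f(x) gets closer and closer to a line as x gets bigger. We say that y = mx + b is an oblique asymptote for y = f(x) if
$$\lim_{x\to\infty}\big(f(x) - (mx + b)\big) = 0 \quad\text{or}\quad \lim_{x\to-\infty}\big(f(x) - (mx + b)\big) = 0.$$
In this case (taking the limit in the same direction) we have
$$\lim_{x\to\pm\infty}\frac{f(x)}{x} = m$$
and
$$\lim_{x\to\pm\infty}\big(f(x) - mx\big) = b.$$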
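Example 5.27. Find the oblique asymptote(s) (if they exist) for $f(x) := \frac{x^2 - 1}{x + 2}$.
Solution: We first compute
$$\lim_{x\to\pm\infty}\frac{f(x)}{x} = \lim_{x\to\pm\infty}\frac{x^2 - 1}{x^2 + 2x} = 1,$$
so m = 1. We then compute
$$\lim_{x\to\infty}\big(f(x) - x\big) = \lim_{x\to\infty}\left(\frac{x^2 - 1}{x + 2} - \frac{x^2 + 2x}{x + 2}\right) = -\lim_{x\to\infty}\frac{2x + 1}{x + 2} = -2.$$
The same computation works as x → −∞, so y = x − 2 is an oblique asymptote in both directions.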
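For Example 5.27, a little algebra gives f(x) − (x − 2) = (x² − 1 − (x² − 4))/(x + 2) = 3/(x + 2), which tends to 0 as x → ±∞. The sketch below simply evaluates this gap at a few large values of |x| (sample points chosen arbitrarily) and compares it with 3/(x + 2).

```python
def f(x):
    return (x**2 - 1) / (x + 2)


# Gap between the curve and the candidate asymptote y = x - 2.
for x in (10.0, 1e3, 1e6, -1e3, -1e6):
    gap = f(x) - (x - 2)   # algebraically equal to 3 / (x + 2)
    print(x, gap, 3 / (x + 2))
```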
5.7.3. Convexity and points of inflection. We used the derivative to find interesting information about when a function increases or decreases and where its maxima and minima occur. The second derivative also contains information about the shape of the function.
Think about the shape of a bowl on your dining room table (or a spoon, with
the bottom of the spoon on the table) with some food in it. The shape of the bowl
is curved upwards. This is known as concave up (also called convex). If you turn
the bowl (or spoon) upside down, then it is curved downwards. This is known as
concave down (in some books, this is simply called concave).
To be more precise, if the tangent line is underneath the curve in some small
interval, then it is concave up. If the tangent line is above the curve in some small
interval, then it is concave down. You can see this in the picture below.
[Figure: a curve with regions labelled concave down and concave up.]
Remark. Another way to write the condition for convexity is the following: A function is convex (concave up) on the interval (a, b) if for every 0 < t < 1 and x, y ∈ (a, b) we have
$$f\big((1 - t)x + ty\big) \leq (1 - t)f(x) + tf(y).$$
Roughly speaking, this inequality means that the value of f at a (weighted) average of x and y is at most the same weighted average of f(x) and f(y) (think about t = 1/2). Here "weighted" means that the closer you are to x, the closer the value should be to f(x), so x has more "weight" in the average when you are close to it.
As t runs from 0 to 1, the points
$$\big((1 - t)x + ty,\ (1 - t)f(x) + tf(y)\big)$$
trace out the secant line between the points (x, f(x)) and (y, f(y)) (plug in t = 0 and t = 1 to see that these two points are on the line), so another equivalent statement is that f is concave up (convex) in an interval if the curve is underneath all of the secant lines in the interval. It is concave down (concave) if the curve is above all of the secant lines.
Now think of the derivative as the rate of change of the function (the speed of a
car, for example). Whenever a function is concave up, the rate of change is increasing
(the slope of the tangent line is increasing – look at the graph above). Whenever
the function is concave down, the rate of change is decreasing. In the example of
a car, being concave up therefore means that the speed is increasing, or in other
words, the car is accelerating. Concave down means that the speed is decreasing, i.e., the car is decelerating.
Here the discussion is about the rate of change of the rate of change (the change of
the speed). But this is then the second derivative (the derivative of the derivative
is the rate of change of the derivative).
Theorem 5.28. Suppose that f is twice differentiable on the interval (a, b).
(1) If f ′′ (x) ≥ 0 for every x ∈ (a, b), then f is concave up on (a, b).
(2) If f ′′ (x) ≤ 0 for every x ∈ (a, b), then f is concave down on (a, b).
The points where the convexity changes are also important.
Definition 5.29. If f is continuous at c ∈ R and f changes convexity (i.e., from
concave up to concave down or concave down to concave up) between x < c and
x > c, then we call c a point of inflection of f .
By Theorem 5.28 the convexity changes precisely when the sign of the second
derivative changes.
Example 5.30. Find the points of inflection of
$$f(x) := x^3 - 2x$$
and
$$g(x) := x^4.$$
Solution: We take the second derivatives
$$f''(x) = 6x$$
and
$$g''(x) = 12x^2.$$
When the sign of the second derivative changes, we must pass through a point where the second derivative equals zero (here f'' and g'' are continuous). Thus for f, this could only happen when
$$6x = 0 \implies x = 0.$$
It is easy to check that for x < 0 we have f''(x) < 0 and for x > 0 we have f''(x) > 0, so f indeed has a point of inflection at x = 0.
For g ′′ (x), we can only have a point of inflection at x = 0. However, for x < 0
we have g ′′ (x) > 0 and for x > 0 we also have g ′′ (x) > 0. Therefore, the sign of the
second derivative doesn’t change and hence g has no points of inflection.
We are now ready to sketch a graph. The following process will help to accurately
sketch the graph:
(1) Determine the natural domain of f and determine where f is continuous/not
continuous.
(2) Determine the asymptotes of f , if any.
(3) Determine the y-intercept (and any x-intercepts, if possible).
(4) Find f ′ (x) and f ′′ (x).
(5) Find critical points (i.e., those points where the derivative doesn't exist or the derivative is zero) by checking where f' is undefined and solving f'(x) = 0.
(6) Between each two adjacent critical points, determine whether the graph is
increasing/decreasing. Also determine whether the function is increasing or
decreasing as x → −∞ and x → ∞.
(7) Determine whether critical points are local minima, local maxima or neither.
(8) Find possible points of inflection (where the second derivative doesn't exist or f''(x) = 0).
(9) Compute the convexity between any two possible points of inflection.
Example 5.31. Use these techniques to sketch a graph of
$$f(x) := \frac{x^3}{x^3 + 1}.$$
Solution:
(1) The natural domain is (−∞, −1) ∪ (−1, ∞), since the denominator equals
zero when x = −1. There is a point of discontinuity at x = −1 and no other
points of discontinuity.
(2) There is a vertical asymptote at x = −1 because the denominator is zero but the numerator is not. To compute the horizontal asymptote, we take the limits
$$\lim_{x\to\infty} f(x) = \lim_{x\to\infty}\frac{1}{1 + \frac{1}{x^3}} = 1$$
and
$$\lim_{x\to-\infty} f(x) = \lim_{x\to-\infty}\frac{1}{1 + \frac{1}{x^3}} = 1.$$
Therefore, there is a horizontal asymptote at y = 1.
(3) The y-intercept is f (0) = 0. The only x-intercept is when the numerator is
zero, which occurs for x = 0.
(4) To make the derivatives easier to compute, we first rewrite
$$f(x) = 1 - \frac{1}{x^3 + 1}.$$
We compute (using the chain rule, but one could also do this with the quotient rule)
$$f'(x) = \frac{3x^2}{(x^3 + 1)^2}.$$
We now use the quotient rule to compute
$$f''(x) = \frac{6x(x^3 + 1)^2 - 2(x^3 + 1)(3x^2)^2}{(x^3 + 1)^4} = \frac{6x(x^3 + 1) - 18x^4}{(x^3 + 1)^3} = \frac{6x(-2x^3 + 1)}{(x^3 + 1)^3}.$$
(5) The critical points are x = −1 (where the function is discontinuous) and whenever f'(x) = 0. The derivative is zero whenever
$$\frac{3x^2}{(x^3 + 1)^2} = 0 \iff x = 0.$$
(6) We look at the intervals (−∞, −1), (−1, 0), and (0, ∞). To determine whether the function is increasing or decreasing, we only need to plug in one point from each interval (the sign doesn't change within an interval, because otherwise there would be another critical point!). For the interval (−∞, −1) we plug in x = −2 to obtain
$$f'(-2) = \frac{3(-2)^2}{\big((-2)^3 + 1\big)^2} = \frac{12}{49} > 0.$$
For the interval (−1, 0) we plug in x = −1/2 to obtain
$$f'\left(-\frac{1}{2}\right) = \frac{3\left(-\frac{1}{2}\right)^2}{\left(\left(-\frac{1}{2}\right)^3 + 1\right)^2} = \frac{\frac{3}{4}}{\left(\frac{7}{8}\right)^2} = \frac{48}{49} > 0.$$
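(8) We set
$$f''(x) = 0,$$
which gives 6x(−2x³ + 1) = 0, that is, x = 0 or $x = \sqrt[3]{1/2}$. Together with x = −1, where f'' does not exist, this splits the domain into the intervals (−∞, −1), (−1, 0), $\left(0, \sqrt[3]{1/2}\right)$, and $\left(\sqrt[3]{1/2}, \infty\right)$. For the interval (−∞, −1) we plug in x = −2 to obtain
$$f''(-2) = \frac{-12\cdot 17}{\big((-2)^3 + 1\big)^3} = \frac{204}{7^3} > 0.$$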
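In the interval $\left(0, \sqrt[3]{1/2}\right)$, we plug in x = 1/10 (note that (1/10)³ = 1/1000 < 1/2) to obtain
$$f''\left(\frac{1}{10}\right) = \frac{\frac{3}{5}\cdot\frac{499}{500}}{\left(\frac{1001}{1000}\right)^3} > 0.$$
Finally, for the interval $\left(\sqrt[3]{1/2}, \infty\right)$, we plug in x = 1 to obtain
$$f''(1) = -\frac{3}{4} < 0.$$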
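We collect these in a small table:
$$\begin{array}{c|c|c|c|c} & (-\infty, -1) & (-1, 0) & \left(0, \sqrt[3]{1/2}\right) & \left(\sqrt[3]{1/2}, \infty\right)\\ \hline \text{Sign of } f'' & + & - & + & -\\ \text{Convexity of } f & \text{concave up} & \text{concave down} & \text{concave up} & \text{concave down}\end{array}$$
We see that there are changes in concavity at x = −1, x = 0, and $x = \sqrt[3]{1/2}$. Since f is not defined at x = −1, only x = 0 and $x = \sqrt[3]{1/2}$ are points of inflection.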
We are now ready to sketch the graph, making sure that all of the above data is
on the sketch.
[Figure: sketch of the graph of f(x) = x³/(x³ + 1), with the point x = ∛(1/2) marked.]
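The sign checks in Example 5.31 can be reproduced with exact rational arithmetic using Python's `fractions` module, which avoids rounding. With exact arithmetic one finds f'(−2) = 3·4/(−7)² = 12/49. The sketch only evaluates the formulas for f' and f'' derived above at the chosen test points; the function names d1 and d2 are ours.

```python
from fractions import Fraction as F


def d1(x):
    # f'(x) = 3x^2 / (x^3 + 1)^2
    return 3 * x**2 / (x**3 + 1)**2


def d2(x):
    # f''(x) = 6x(-2x^3 + 1) / (x^3 + 1)^3
    return 6 * x * (-2 * x**3 + 1) / (x**3 + 1)**3


print(d1(F(-2)), d1(F(-1, 2)))  # 12/49 48/49
for x in (F(-2), F(-1, 2), F(1, 10), F(1)):
    sign = "+" if d2(x) > 0 else "-"
    print(x, d2(x), sign)
```

The printed signs of f'' (+, −, +, −) match the table of convexity above.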
5.8. Taylor approximation and error estimation. Consider again the difference quotient
$$\frac{f(a + h) - f(a)}{h}.$$
For h very small, this is extremely close to f'(a), assuming that the limit exists.
Now recall that f ′ (a) is the slope of the tangent line at the point (a, f (a)).
Recall that to determine the equation for a line it is enough to know the slope
and one point on the line. In particular, if the slope is m and the point (A, B) lies
on the line, then
y = m (x − A) + B.
Therefore, the tangent line y = P_1(x) can be written
$$P_1(x) = f'(a)(x - a) + f(a).$$
For x = a + h we have
$$P_1(x) = f'(a)h + f(a).$$
Since
$$f'(a) \approx \frac{f(a + h) - f(a)}{h},$$
we have
$$P_1(x) \approx \frac{f(a + h) - f(a)}{h}\,h + f(a) = f(a + h).$$
Example 5.32. Find the linear approximation P_1(x) for f(x) := x³ − 2x² near x = 2.
Solution: We have
$$f'(x) = 3x^2 - 4x,$$
so that in particular
$$f'(2) = 12 - 8 = 4.$$
Moreover,
$$f(2) = 0.$$
Therefore P_1(x) = 4(x − 2), so that
$$f(x) \approx 4(x - 2).$$
To see that this really does closely approximate f, let's try x = 2.01. For this, we have
$$f(2.01) = 0.040401,$$
while the linear approximation gives
$$f(2.01) \approx 4(0.01) = 0.04.$$
[Figure: graph of f(x) = x³ − 2x² and its tangent line at x = 2.]
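The comparison in Example 5.32 takes only a few lines. This uses floating-point arithmetic, so the last printed digits may differ slightly from exact decimal arithmetic; P1 is our name for the tangent line at a = 2.

```python
def f(x):
    return x**3 - 2*x**2


def P1(x):
    # tangent line at a = 2: f(2) = 0 and f'(2) = 4
    return 4 * (x - 2)


x = 2.01
print(f(x), P1(x), f(x) - P1(x))  # roughly 0.040401, 0.04, 0.000401
```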
It is natural to ask whether one can get better approximations. Since the second derivative contains information about the concavity of f, maybe packaging this information into a second approximation would be helpful (a line has no concavity, because its second derivative is always zero). In particular, one obtains a quadratic approximation by
$$P_2(x) := f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2.$$
This is approximately equal to P_1(x) near x = a, but it also matches the concavity of f near x = a, so it approximates the function better. Of course, a quadratic function has fixed concavity (it is either always concave up or always concave down, because its second derivative is constant). For the concavity to change, one would need a higher order polynomial. Continuing in this way, one is naturally led to define the Taylor polynomial of order n of f at x = a by
$$P_n(x) := f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n.$$
Solution: We compute
$$f(0) = 0,$$
$$f'(x) = \cos(x) \implies f'(0) = 1,$$
$$f''(x) = -\sin(x) \implies f''(0) = 0,$$
$$f'''(x) = -\cos(x) \implies f'''(0) = -1.$$
Therefore
$$P_3(x) = x - \frac{x^3}{6}.$$
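To see how well the third Taylor polynomial P₃(x) = x − x³/6 of sin(x) at 0 performs, one can compare it with `math.sin` at a few sample points (chosen arbitrarily); the error shrinks rapidly as x approaches 0.

```python
import math


def P3(x):
    # third Taylor polynomial of sin(x) at x = 0
    return x - x**3 / 6


for x in (0.5, 0.1, 0.01):
    err = abs(math.sin(x) - P3(x))
    print(x, math.sin(x), P3(x), err)
```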
Example 5.34. Find the second Taylor polynomial for $f(x) := \sqrt{x}$ at x = 1.
Solution: We have
$$f(1) = 1,$$
$$f'(x) = \frac{1}{2\sqrt{x}} \implies f'(1) = \frac{1}{2},$$
$$f''(x) = -\frac{1}{4x^{3/2}} \implies f''(1) = -\frac{1}{4}.$$
Therefore
$$P_2(x) = 1 + \frac{1}{2}(x - 1) + \frac{-\frac{1}{4}}{2}(x - 1)^2 = 1 + \frac{1}{2}(x - 1) - \frac{1}{8}(x - 1)^2.$$
As alluded to earlier, the Taylor polynomial at x = a gives a pretty good approximation to f(x) for x near a. It is useful to get an idea about the error between the approximation P_n(x) and f(x). We hence define the error of the nth Taylor approximation (also known as the remainder)
$$R_n(x) := f(x) - P_n(x).$$
The following theorem allows us to get a good bound on how large R_n(x) can be.
Theorem 5.35 (Taylor's Theorem). Suppose that the (n + 1)th derivative of f exists in an open interval (A, B) containing a. Then for every x ∈ (A, B), there exists c between x and a such that
$$R_n(x) = \frac{f^{(n+1)}(c)}{(n + 1)!}(x - a)^{n+1}.$$
In other words, there is some c between a and x such that
$$f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + \frac{f^{(n+1)}(c)}{(n + 1)!}(x - a)^{n+1}.$$
By Taylor's Theorem, if we somehow get a good bound on $\frac{f^{(n+1)}(c)}{(n + 1)!}(x - a)^{n+1}$, then we can get a very good approximation for f(x)!
Example 5.36. Use the second Taylor approximation to approximate $\sqrt{4.01}$. Estimate the absolute value of the error from the actual value.
Solution: We define
$$f(x) := \sqrt{x}.$$
Since $\sqrt{4} = 2$, we compute the second Taylor approximation at x = 4. We have
$$f(4) = 2,$$
$$f'(x) = \frac{1}{2\sqrt{x}} \implies f'(4) = \frac{1}{4},$$
$$f''(x) = -\frac{1}{4x^{3/2}} \implies f''(4) = -\frac{1}{32}.$$
Therefore
$$P_2(x) = 2 + \frac{1}{4}(x - 4) + \frac{-\frac{1}{32}}{2}(x - 4)^2 = 2 + \frac{1}{4}(x - 4) - \frac{1}{64}(x - 4)^2.$$
To approximate $\sqrt{4.01}$, we plug in x = 4.01. This gives
$$f(4.01) \approx 2 + \frac{1}{4}(0.01) - \frac{1}{64}(0.01)^2 = 2 + \frac{1}{400} - \frac{1}{640000} = 2 + \frac{1599}{640000} = \frac{1281599}{640000}.$$
By Taylor's Theorem, the error is
$$R_2(4.01) = \frac{f'''(c)}{3!}(4.01 - 4)^3$$
for some c ∈ (4, 4.01).
We compute
$$f'''(x) = \frac{3}{8x^{5/2}}.$$
Therefore, for c ∈ (4, 4.01), we have
$$|f'''(c)| < \frac{3}{8\cdot 4^{5/2}}.$$
Here we used the fact that a larger x gives a smaller f'''(x) (f''' is decreasing), so the value is always smaller than plugging x = 4 in directly. Since $4^{5/2} = 2^5 = 32$, we have
$$|f'''(c)| < \frac{3}{256}.$$
Hence the error may be bounded by
$$|R_2(4.01)| < \frac{\frac{3}{256}}{3!}(0.01)^3 = \frac{1}{512\cdot 100^3} = \frac{1}{512000000}.$$
We see that the approximation is pretty good.
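The arithmetic of Example 5.36 can be checked exactly with `fractions.Fraction`, and the actual error can then be compared with the bound using `math.sqrt` (floating point). The variable names are ours.

```python
import math
from fractions import Fraction

h = Fraction(1, 100)                  # x - 4 = 0.01
approx = 2 + h / 4 - h**2 / 64        # P_2(4.01), computed exactly
bound = Fraction(3, 256) / 6 * h**3   # (3/256) / 3! * (0.01)^3
actual = math.sqrt(4.01)              # floating-point reference value

print(approx, float(approx))          # 1281599/640000 2.0024984375
print(bound, float(bound))
print(actual - float(approx))         # the actual error, just below the bound
```

The actual error is positive (f''' > 0, so R₂ > 0) and is only slightly smaller than the bound 1/512000000.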
We call
$$f(x) := b^x \quad (b > 0)$$
the exponential function with base b. For b > 1, this grows large very fast as x gets bigger, while for 0 < b < 1 it very quickly gets small (we say that it decays fast). Its natural domain is R and, for b ≠ 1, its range is (0, ∞). This is shown in the graph below.
[Figure: graphs of b^x for b > 1 and for b < 1.]
(4)
$$b^{-x} = \frac{1}{b^x}.$$
(5)
$$\frac{b^x}{b^y} = b^x\cdot b^{-y} = b^{x - y}.$$
(6)
$$\left(\frac{a}{b}\right)^x = \left(a\cdot\frac{1}{b}\right)^x = \frac{a^x}{b^x}.$$
(7)
$$b^0 = 1.$$
Consider now the derivative
$$\frac{d}{dx}\left(b^x\right) = \lim_{h\to 0}\frac{b^{x+h} - b^x}{h} = \lim_{h\to 0} b^x\,\frac{b^h - 1}{h} = b^x\lim_{h\to 0}\frac{b^h - 1}{h}.$$
Notice that the last limit does not depend on the input x. Therefore $\frac{d}{dx}\left(b^x\right)$ is a constant multiple of b^x. Defining
$$f(x) := b^x,$$
the constant is precisely
$$f'(0) = \lim_{h\to 0}\frac{b^h - 1}{h}.$$
Thus
$$f'(x) = f'(0)\,b^x.$$
It turns out that there is a unique number e ≈ 2.718281828459 such that
$$\lim_{h\to 0}\frac{e^h - 1}{h} = 1.$$
Thus
$$\frac{d}{dx}e^x = e^x.$$
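The constant f'(0) = lim_{h→0}(b^h − 1)/h can be estimated numerically by evaluating the difference quotient for small h. For comparison the sketch also prints `math.log(b)`; that this constant equals ln(b) is a standard fact (it follows from b^x = e^{x ln b}), not something derived in this section. Very small h would eventually suffer from floating-point cancellation, so we stop at h = 10⁻⁶. The helper name `slope_at_zero` is ours.

```python
import math


def slope_at_zero(b, h):
    """Difference quotient (b**h - 1) / h, approximating f'(0) for f(x) = b**x."""
    return (b**h - 1) / h


for b in (2.0, math.e, 10.0):
    estimates = [slope_at_zero(b, h) for h in (1e-2, 1e-4, 1e-6)]
    print(b, estimates, math.log(b))
```

For b = e the estimates approach 1, matching the defining property of e stated above.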
Since b^x > 0 for every x and
$$f'(x) = f'(0)\,b^x,$$
we conclude that either f'(x) > 0 for every x, or f'(x) < 0 for every x (depending on whether f'(0) > 0 or f'(0) < 0). Since
$$\lim_{x\to\infty} b^x = \begin{cases}\infty & \text{if } b > 1,\\ 0 & \text{if } 0 < b < 1,\end{cases}$$
we have the following (which is consistent with the graph given above).
Theorem 6.1. The function f(x) := b^x is increasing for all x if b > 1 and decreasing for all x if 0 < b < 1.
6.2. Logarithms. Since the function f is always increasing or always decreasing, it is also injective. Therefore f is invertible (as a function from R onto its range (0, ∞)). The inverse function is known as the logarithm with base b and is written
$$\log_b(x).$$
In the special case that b = e, we write
$$\ln(x) := \log_e(x).$$
This is known as the natural logarithm. The graph of the logarithm is given below:
[Figure: graph of b^x (b > 1) together with the graph of the logarithm.]
Therefore
$$\frac{dy}{dx} = (1 + \ln(x))\,y = (1 + \ln(x))\,x^x.$$
Another useful application of logarithmic differentiation is an alternative to the
quotient rule.
We again write y := f(x) and implicitly differentiate. Using the properties of the logarithm, we have
$$\ln(y) = \ln\left(\frac{(x^2 + 1)^{1/3}}{(x + 2)^2}\right) = \frac{1}{3}\ln\left(x^2 + 1\right) - 2\ln(x + 2).$$
Therefore
$$\frac{1}{y}\frac{dy}{dx} = \frac{1}{3(x^2 + 1)}(2x) - \frac{2}{x + 2}.$$
Thus
$$\frac{dy}{dx} = y\left(\frac{1}{3(x^2 + 1)}(2x) - \frac{2}{x + 2}\right) = \frac{(x^2 + 1)^{1/3}}{(x + 2)^2}\left(\frac{2x}{3(x^2 + 1)} - \frac{2}{x + 2}\right).$$
6.5. Computation of limits. The logarithm and exponential function are also
useful for computing certain limits.
The logarithm and exponential functions are both continuous, meaning that for any function f(x) for which $\lim_{x\to a} f(x)$ exists and $\lim_{x\to a} f(x) > 0$, we have
$$\lim_{x\to a}\ln\big(f(x)\big) = \ln\left(\lim_{x\to a} f(x)\right),$$
and if $\lim_{x\to a} f(x)$ exists (the limit does not need to be positive)
$$\lim_{x\to a} e^{f(x)} = e^{\lim_{x\to a} f(x)}.$$
Using this, one is able to compute limits where plugging in directly would give ∞⁰, 0⁰, or 1^∞.
so that
$$\lim_{x\to\infty} x^{1/x} = e^0 = 1.$$
Remark. The limit above naturally occurs when computing interest. Consider what
is known as compound interest (this is the actual interest system used by most
banks). Let’s say that you get n interest payments every year at a yearly interest
rate of 1%. The 1% is split into n pieces.
For simplicity, suppose that n = 12 (you get monthly interest). At the end of the first month, you receive (1/12)% interest (since the 1% is split into 12 equal pieces). If you started with $1, then you'd now have
$$1 + \frac{.01}{12}$$
dollars. But now you also get interest on the interest in the next month. So after two months, you'd have
$$\underbrace{\left(1 + \frac{.01}{12}\right)}_{\text{already in bank}} + \underbrace{\frac{.01}{12}\left(1 + \frac{.01}{12}\right)}_{\text{interest earned}} = \left(1 + \frac{.01}{12}\right)^2.$$
6.6. Taylor polynomials of e^x and ln(x). Define f(x) := e^x. Since
$$f'(x) = e^x,$$
repeated differentiation gives, for every m ≥ 0,
$$f^{(m)}(x) = e^x,$$
so f^{(m)}(0) = 1. Therefore, the order n Taylor polynomial at x = 0 for f is
$$P_n(x) = f(0) + f'(0)x + \frac{f''(0)}{2}x^2 + \cdots + \frac{f^{(n)}(0)}{n!}x^n = 1 + x + \frac{x^2}{2} + \cdots + \frac{x^n}{n!}.$$
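A short sketch evaluating the order n Taylor polynomial of e^x at the point x = 1 and comparing it with `math.exp(1.0)`; the function name `taylor_exp` is ours, and the orders n are arbitrary sample values.

```python
import math


def taylor_exp(x, n):
    """Order-n Taylor polynomial of e^x at 0: the sum of x^k / k! for k = 0..n."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))


for n in (1, 2, 5, 10):
    print(n, taylor_exp(1.0, n), math.exp(1.0))
```

Already for n = 10 the polynomial agrees with e to about seven decimal places.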
Example 6.10. Find the order n Taylor polynomial of ln(x) at x = 1.
Solution: Set g(x) := ln(x). Then g'(x) = 1/x. The second derivative is
$$g''(x) = -\frac{1}{x^2} = -x^{-2}.$$
The next derivative is
$$g'''(x) = 2x^{-3}.$$
Repeated differentiation gives (for m > 0)
$$g^{(m)}(x) = (-1)^{m-1}(m - 1)!\,x^{-m}.$$
(This is true for m = 1; to check it in general, differentiate the formula for m to obtain the formula for m + 1, i.e., argue by induction.) Since g(1) = 0 and g^{(m)}(1)/m! = (−1)^{m−1}/m, we obtain
$$P_n(x) = (x - 1) - \frac{(x - 1)^2}{2} + \cdots + \frac{(-1)^{n-1}}{n}(x - 1)^n.$$
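Similarly, one can evaluate the Taylor polynomials of ln(x) at x = 1 from Example 6.10. The sketch evaluates them at x = 1.5 (an arbitrary point close to 1) and compares with `math.log`; the function name `taylor_log` is ours.

```python
import math


def taylor_log(x, n):
    """Order-n Taylor polynomial of ln(x) at 1:
    the sum of (-1)^(m-1) (x - 1)^m / m for m = 1..n."""
    return sum((-1)**(m - 1) * (x - 1)**m / m for m in range(1, n + 1))


for n in (1, 2, 5, 20):
    print(n, taylor_log(1.5, n), math.log(1.5))
```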
You can check each of these by taking the derivative of the second row. For example,
for F (x) := − cos(x), we have
Definition 7.3. One also calls the (set of) antiderivatives F(x) + C of f(x) the indefinite integral of f(x). One writes
$$\int f(x)\,dx = F(x) + C,$$
(6) We have
$$\int e^x\,dx = e^x + C.$$
We’re now going to look at the antiderivative in another light. Consider the
function
f (x) := x.
The antiderivative of this function is
$$\int f(x)\,dx = \frac{1}{2}x^2 + C.$$
Now draw f (x) and consider the area under the line from x = 1 to x = 3.
[Figure: graph of the line f(x) = x.]
The area under the line f(x) from x = 1 to x = 3 is the area of the triangle with vertices (0, 0), (3, 0), and (3, 3) minus the area of the triangle with vertices (0, 0), (1, 0), and (1, 1). The area of a right triangle is half the area of the rectangle which shares two of its sides, so this area is 9/2 − 1/2 = 4.
Definition 7.4. We call the area under the curve f(x) from x = a to x = b the definite integral of f from a to b. It is denoted by
$$\int_a^b f(x)\,dx.$$
The number a is called the lower limit of the integral and b is called the upper limit of the integral.
By "area under the curve" we mean the area between the curve y = f(x) and the x-axis. If the curve is below the x-axis, then the area is counted negatively. If it is above the x-axis, then the area is counted positively.
The notation
$$F(x)\Big|_a^b := F(b) - F(a)$$
means that we substitute x = b and then subtract the value obtained by substituting x = a.
Solution: By the first Fundamental Theorem of Calculus, we can take any antiderivative and subtract the values. Since
$$F(x) := \frac{1}{3}x^3 - \frac{7}{2}x^2$$
is an antiderivative of x² − 7x, the value is
$$\left.\left(\frac{x^3}{3} - \frac{7}{2}x^2\right)\right|_0^3 = \frac{3^3}{3} - \frac{7\cdot 3^2}{2} = 9 - \frac{63}{2} = -\frac{45}{2}.$$
Note that the area is negative, which is accounted for by the fact that the curve is below the x-axis for x ∈ (0, 7).
Example 7.8. Find
$$\frac{d}{dx}\int_0^x \frac{t}{1 + t^5}\,dt.$$
Solution: Define
$$f(x) := \frac{x}{1 + x^5}$$
and
$$F(x) := \int_0^x f(t)\,dt.$$
Then by the second Fundamental Theorem of Calculus, we have
$$F'(x) = f(x) = \frac{x}{1 + x^5}.$$
Example 7.9. Find
$$\frac{d}{dx}\int_2^{x^3 + 4}\cos\left(t^3\right)dt.$$
Solution: Define
$$f(x) := \cos\left(x^3\right)$$
and
$$F(x) := \int_2^x f(t)\,dt.$$
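One way to finish Example 7.9 combines the second Fundamental Theorem of Calculus, which gives F'(x) = f(x) = cos(x³), with the chain rule applied to F(x³ + 4):

```latex
\frac{d}{dx}\int_2^{x^3+4}\cos\left(t^3\right)dt
  = \frac{d}{dx}F\left(x^3+4\right)
  = F'\left(x^3+4\right)\cdot 3x^2
  = 3x^2\cos\left(\left(x^3+4\right)^3\right).
```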
By the second Fundamental Theorem of Calculus (note that sec²(x) is continuous for x ∈ (−π/2, π/2)), the left-hand side of this equation is
$$\frac{d}{dx}\int_c^x f(t)\,dt = f(x).$$
Hence we have
$$f(x) = \sec^2(x).$$
7.1. Integration by substitution. We next consider the reverse of the chain rule. Remember that for
$$h(x) := f(g(x)),$$
the chain rule states that
$$h'(x) = f'(g(x))\,g'(x).$$
However, by the second Fundamental Theorem of Calculus, we have
$$h(x) + C = \int h'(x)\,dx.$$
The left-hand side of this equation is (by the second Fundamental Theorem of Calculus)
$$f(x)g(x) = \int \frac{d}{dx}\big(f(x)g(x)\big)\,dx.$$
Therefore
$$f(x)g(x) = \int g(x)f'(x)\,dx + \int f(x)g'(x)\,dx.$$
Example 7.16. Compute
$$\int_0^1 x^2e^x\,dx.$$
Solution: Setting u = x² and dv = e^x dx, we have
$$u = x^2,\quad dv = e^x\,dx,\qquad du = 2x\,dx,\quad v = e^x.$$
Therefore
$$\int x^2e^x\,dx = x^2e^x - 2\int xe^x\,dx.$$
We now do integration by parts again to compute $\int xe^x\,dx$. Setting u = x and dv = e^x dx, we have
$$u = x,\quad dv = e^x\,dx,\qquad du = dx,\quad v = e^x.$$
Therefore
$$\int xe^x\,dx = xe^x - \int e^x\,dx = xe^x - e^x + C.$$
We conclude that
$$\int x^2e^x\,dx = x^2e^x - 2xe^x + 2e^x + C.$$
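As a check on Example 7.16, evaluating the antiderivative from 0 to 1 gives (1 − 2 + 2)e − 2 = e − 2 ≈ 0.718. The sketch below compares this with a numerical approximation by the composite Simpson's rule, a standard numerical method not covered in these notes; the helper `simpson` and the choice of 1000 subintervals are ours.

```python
import math


def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g over [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)  # weights 4, 2, 4, ..., 4
    return total * h / 3


def F(x):
    # antiderivative from the text: x^2 e^x - 2x e^x + 2e^x
    return (x**2 - 2*x + 2) * math.exp(x)


numeric = simpson(lambda x: x**2 * math.exp(x), 0.0, 1.0)
exact = F(1.0) - F(0.0)  # equals e - 2
print(numeric, exact, math.e - 2)
```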
$$u = \sin(x),\quad dv = e^x\,dx,\qquad du = \cos(x)\,dx,\quad v = e^x.$$
We then obtain
$$\int e^x\sin(x)\,dx = e^x\sin(x) - \int e^x\cos(x)\,dx.$$
We again apply integration by parts with u = cos(x) and dv = e^x dx. This yields
$$u = \cos(x),\quad dv = e^x\,dx,\qquad du = -\sin(x)\,dx,\quad v = e^x.$$
Therefore
$$\int e^x\cos(x)\,dx = e^x\cos(x) + \int \sin(x)e^x\,dx.$$
Hence
$$\int e^x\sin(x)\,dx = e^x\sin(x) - e^x\cos(x) - \int \sin(x)e^x\,dx.$$
It follows that
$$2\int e^x\sin(x)\,dx = e^x\sin(x) - e^x\cos(x) + C.$$
Therefore
$$\int e^x\sin(x)\,dx = \frac{1}{2}e^x\sin(x) - \frac{1}{2}e^x\cos(x) + C.$$
You can think of this as splitting the interval from a to b into two pieces and
computing the area underneath the curve for each one.
(3) If a > b, then we define
$$\int_a^b f(x)\,dx := -\int_b^a f(x)\,dx.$$
The intuitive reason that the area is negative is that the order is reversed. Since area is counted negatively when the curve is below the x-axis, area here carries a sign (an orientation). Hence, if you reverse the order of the limits, the sign of the area should flip.
(4) As a result of the last definition, for any a, b, c ∈ R we have
$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.$$
This is one motivation for defining things as above. If you had the above identity for all c and a = b, then
$$0 = \int_a^a f(x)\,dx = \int_a^c f(x)\,dx + \int_c^a f(x)\,dx,$$
so necessarily $\int_c^a f(x)\,dx = -\int_a^c f(x)\,dx$.
[Figures: three graphs illustrating the properties above; the last one shows a function g(x).]
8. Matrices
Definition 8.1. Suppose that n, m ∈ N. An m × n matrix with real coefficients is an array
$$A := \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n}\\ A_{21} & A_{22} & \cdots & A_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ A_{m1} & A_{m2} & \cdots & A_{mn}\end{pmatrix},$$
where A_{11}, A_{12}, ..., A_{mn} ∈ R. One calls A_{ij} the (i, j)th entry of A. This is sometimes abbreviated $A := (A_{ij})_{m\times n}$ or $A := (A_{ij})$ if m and n are clear from the context. We call m × n the size or dimensions of the matrix.
If n = 1, then we call A a column vector of length m, and if m = 1, then we call A a row vector of length n. More generally, for a matrix A and 1 ≤ j ≤ n, we call $(A_{ij})_{m\times 1}$ the jth column and for 1 ≤ i ≤ m we call $(A_{ij})_{1\times n}$ the ith row.
In the case that n = m, we call A a square matrix of size n.
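Example 8.2. The matrix
$$A := \begin{pmatrix}1\\ 2\\ 7\\ 3\end{pmatrix}$$
is a column vector of length 4.
The matrix
$$B := \begin{pmatrix}-1 & 2 & -7 & 3 & 1\end{pmatrix}$$
is a row vector of length 5.
The matrix
$$C := \begin{pmatrix}0 & 1 & 4\\ -1 & 3 & 6\end{pmatrix}$$
is a 2 × 3 matrix.
The matrix
$$D := \begin{pmatrix}1 & -3 & 5 & -2\\ -7 & 8 & 4 & 1\\ -3 & -2 & 1 & 6\\ 0 & 1 & 1 & 0\end{pmatrix}$$
is a square matrix of size 4 (or, in other words, a 4 × 4 matrix). The column vector
$$\begin{pmatrix}-3\\ 8\\ -2\\ 1\end{pmatrix}$$
is the second column of D.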
There are a number of very special matrices which occur throughout mathemat-
ics and applications to engineering and the sciences. The first such matrix is the
zero matrix $(0)_{m\times n}$, all of whose entries are zero ($A_{ij} = 0$ for every i, j). This is often simply written 0 if m and n are known.
The entries A_{ii} are called the diagonal entries (or simply diagonal) of the matrix A. The terms A_{ij} with i ≠ j are called the off-diagonal elements. The square matrix of size n which has all diagonal entries equal to 1 and all off-diagonal entries equal to 0 is called the identity matrix of size n.
We also have
$$A - B = \begin{pmatrix}1 + 2 & 0 - 3 & -3 + 4\\ 2 - 1 & -5 + 1 & 4 - 2\end{pmatrix} = \begin{pmatrix}3 & -3 & 1\\ 1 & -4 & 2\end{pmatrix}.$$
The zero matrix is precisely the matrix which satisfies
$$A + 0 = A$$
for every A. One also has
$$A - A = 0.$$
For the next operation on matrices, we take a matrix A and λ ∈ R. The number
λ is called a scalar. We define scalar multiplication by multiplying the (i, j)th entry
of A by λ. That is to say, scalar multiplication is defined by
$$\lambda A = \lambda\,(A_{ij})_{m\times n} := (\lambda A_{ij})_{m\times n}.$$
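For
$$A := \begin{pmatrix}1 & -1 & 2\\ 1 & -3 & 0\\ -4 & 1 & 7\\ 0 & -2 & 3\end{pmatrix}$$
and
$$B := \begin{pmatrix}0 & -4\\ -2 & 1\\ 6 & 3\end{pmatrix},$$
what is AB (if it exists)? What is BA (if it exists)?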
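Solution: We have
$$AB = \begin{pmatrix}1\cdot 0 + (-1)(-2) + 2\cdot 6 & 1(-4) + (-1)\cdot 1 + 2\cdot 3\\ 1\cdot 0 + (-3)(-2) + 0\cdot 6 & 1(-4) + (-3)\cdot 1 + 0\cdot 3\\ -4\cdot 0 + 1(-2) + 7\cdot 6 & (-4)(-4) + 1\cdot 1 + 7\cdot 3\\ 0\cdot 0 + (-2)(-2) + 3\cdot 6 & 0(-4) + (-2)\cdot 1 + 3\cdot 3\end{pmatrix} = \begin{pmatrix}14 & 1\\ 6 & -7\\ 40 & 38\\ 22 & 7\end{pmatrix}.$$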
The product BA simply isn’t defined, because B has 2 columns while A has 4 rows.
These must be the same to multiply the matrices.
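The row-times-column rule, including the size check that makes BA undefined here, can be written in a few lines of plain Python with matrices stored as lists of rows. A minimal sketch (the function name `matmul` is ours, not from the notes):

```python
def matmul(A, B):
    """Product AB of matrices stored as lists of rows.

    Entry (i, j) is row i of A times column j of B; this requires that
    the number of columns of A equals the number of rows of B.
    """
    n = len(A[0])
    if n != len(B):
        raise ValueError(
            f"cannot multiply: left factor has {n} columns "
            f"but right factor has {len(B)} rows")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[1, -1, 2], [1, -3, 0], [-4, 1, 7], [0, -2, 3]]
B = [[0, -4], [-2, 1], [6, 3]]
print(matmul(A, B))  # [[14, 1], [6, -7], [40, 38], [22, 7]]

try:
    matmul(B, A)     # B is 3 x 2 and A is 4 x 3, so BA is undefined
except ValueError as err:
    print(err)
```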
As evidenced in the above example, AB may be defined sometimes when BA is
not. Actually, AB and BA are defined if and only if A is an m × n matrix and B
is a n × m matrix, for some m, n ∈ N. The matrices AB and BA do not need to be
the same (they don’t even have to be the same size!).
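The rule for when AB is defined can be made explicit in code. This is a minimal Python sketch, assuming matrices are stored as lists of rows; `mat_mul` is a hypothetical helper name. The matrix A has the rows used in the solution above, and the function raises an error exactly when the number of columns of the left factor differs from the number of rows of the right factor.

```python
def mat_mul(A, B):
    """Return AB for matrices stored as lists of rows.

    If A is m x n and B is n x p, then AB is m x p with
    (AB)_ij = A_i1 B_1j + ... + A_in B_nj.
    """
    n = len(A[0])
    if n != len(B):
        raise ValueError(
            f"AB is undefined: left factor has {n} columns, "
            f"right factor has {len(B)} rows")
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


# A has the rows used in the solution above; B is as given in the example.
A = [[1, -1, 2], [1, -3, 0], [-4, 1, 7], [0, -2, 3]]
B = [[0, -4], [-2, 1], [6, 3]]
print(mat_mul(A, B))  # [[14, 1], [6, -7], [40, 38], [22, 7]]

try:
    mat_mul(B, A)     # B has 2 columns but A has 4 rows
except ValueError as err:
    print(err)
```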
Example 8.8. For
$$A := \begin{pmatrix} 1 & -2 \end{pmatrix}$$
and
$$B := \begin{pmatrix} 3 \\ -1 \end{pmatrix},$$
find AB and BA. What are the dimensions of these matrices?
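Solution: We have
$$AB = \begin{pmatrix} 1\cdot 3 + (-2)(-1) \end{pmatrix} = \begin{pmatrix} 5 \end{pmatrix}.$$
This is a 1 × 1 matrix.
Multiplying the other way, we have
$$BA = \begin{pmatrix} 3\cdot 1 & 3(-2) \\ -1\cdot 1 & (-1)(-2) \end{pmatrix} = \begin{pmatrix} 3 & -6 \\ -1 & 2 \end{pmatrix}.$$
This is a 2 × 2 matrix. Clearly they are not equal, since they have different sizes.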
When A and B are both square matrices of size n, then both AB and BA are square
matrices of size n. However, even in this case they are not necessarily equal.
Example 8.9. For
$$A := \begin{pmatrix} 1 & -1 \\ 3 & -2 \end{pmatrix}$$
and
$$B := \begin{pmatrix} 2 & -4 \\ -2 & 1 \end{pmatrix},$$
find AB and BA.
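Solution: We have
$$AB = \begin{pmatrix} 1\cdot 2 - 1(-2) & 1(-4) - 1\cdot 1 \\ 3\cdot 2 - 2(-2) & 3(-4) - 2\cdot 1 \end{pmatrix} = \begin{pmatrix} 4 & -5 \\ 10 & -14 \end{pmatrix}.$$
Moreover,
$$BA = \begin{pmatrix} 2\cdot 1 - 4\cdot 3 & 2(-1) - 4(-2) \\ -2\cdot 1 + 1\cdot 3 & -2(-1) + 1(-2) \end{pmatrix} = \begin{pmatrix} -10 & 6 \\ 1 & 0 \end{pmatrix}.$$
In particular, $AB \neq BA$.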
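As a quick check of Examples 8.8 and 8.9, the sketch below computes both products in each example. It is a compact Python sketch (the helper `mat_mul` is an illustrative name) that pairs each row of the left factor with each column of the right factor via `zip`; unlike a careful implementation, it does not check that the sizes are compatible.

```python
def mat_mul(A, B):
    """Product of matrices stored as lists of rows (sizes assumed compatible)."""
    cols_of_B = list(zip(*B))  # the columns of B, as tuples
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_of_B]
            for row in A]


# Example 8.8: a 1 x 2 row vector and a 2 x 1 column vector.
A8, B8 = [[1, -2]], [[3], [-1]]
print(mat_mul(A8, B8))  # [[5]]  (1 x 1)
print(mat_mul(B8, A8))  # [[3, -6], [-1, 2]]  (2 x 2)

# Example 8.9: two 2 x 2 matrices whose products differ.
A9, B9 = [[1, -1], [3, -2]], [[2, -4], [-2, 1]]
print(mat_mul(A9, B9))  # [[4, -5], [10, -14]]
print(mat_mul(B9, A9))  # [[-10, 6], [1, 0]]
```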
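The $n \times n$ identity matrix $I_n$ is the special matrix which, for any $n \times n$ matrix $A$, satisfies
$$AI_n = A$$
and
$$I_n A = A.$$
For example, with
$$I_2 := \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
and
$$A := \begin{pmatrix} 1 & 7 \\ 4 & 3 \end{pmatrix}$$
we have
$$AI_2 = \begin{pmatrix} 1\cdot 1 + 7\cdot 0 & 1\cdot 0 + 7\cdot 1 \\ 4\cdot 1 + 3\cdot 0 & 4\cdot 0 + 3\cdot 1 \end{pmatrix} = A$$
and
$$I_2 A = \begin{pmatrix} 1\cdot 1 + 0\cdot 4 & 1\cdot 7 + 0\cdot 3 \\ 0\cdot 1 + 1\cdot 4 & 0\cdot 7 + 1\cdot 3 \end{pmatrix} = A.$$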
Multiplication also satisfies a number of useful properties.
Theorem 8.10.
(1) If AB and BC are both well-defined, then we have
(AB)C = A(BC).
(2) If AB and AC are both well-defined, then we have
A (B + C) = AB + AC.
(3) If BA and CA are both well-defined, then we have
(B + C) A = BA + CA.
(4) For any λ ∈ R, if AB is well-defined, then we have
λ (AB) = (λA) B = A (λB) .
Sometimes it is useful to construct matrices from other known matrices. For example, given a matrix $A = (A_{ij})_{n\times m}$, we define the transpose matrix $A^T$ by
$$A^T := (A_{ji})_{m\times n}.$$
In other words, the rows and the columns of the matrix are interchanged.
Example 8.11. Consider the matrix
$$A := \begin{pmatrix} 1 & 3 & 7 \\ -2 & 1 & 0 \end{pmatrix}.$$
The transpose of this matrix is
$$A^T = \begin{pmatrix} 1 & -2 \\ 3 & 1 \\ 7 & 0 \end{pmatrix}.$$
We call a square matrix $A$ symmetric if
$$A^T = A.$$
Example 8.12. The matrix
$$A = \begin{pmatrix} 2 & 5 & -3 \\ 5 & 1 & 0 \\ -3 & 0 & 2 \end{pmatrix}$$
is symmetric.
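Transposition and the symmetry test are short to express in code. In this Python sketch, with hypothetical helpers `transpose` and `is_symmetric`, `zip(*A)` collects the columns of A, which become the rows of $A^T$; the inputs are the matrices of Examples 8.11 and 8.12.

```python
def transpose(A):
    """Return A^T: the (i, j) entry of A^T is the (j, i) entry of A."""
    return [list(col) for col in zip(*A)]


def is_symmetric(A):
    """A square matrix is symmetric when it equals its transpose."""
    return len(A) == len(A[0]) and transpose(A) == A


A = [[1, 3, 7], [-2, 1, 0]]                  # Example 8.11
print(transpose(A))                          # [[1, -2], [3, 1], [7, 0]]
S = [[2, 5, -3], [5, 1, 0], [-3, 0, 2]]      # Example 8.12
print(is_symmetric(S), is_symmetric([[1, 2], [3, 4]]))  # True False
```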
In addition to leaving a matrix $A$ unchanged under multiplication, the identity matrix $I_n$ appears in one additional important place. If $A$ and $B$ are both square matrices of size $n$ and
$$BA = AB = I_n,$$
then we say that $B$ is the (multiplicative) inverse of $A$. We write $B = A^{-1}$ and call $A$ invertible. The inverse does not always exist, however. Whether an inverse exists is determined by a number called the determinant of the matrix.
8.2. Determinants. Checking whether the inverse of a square matrix $A$ exists involves a number called the determinant, which is usually denoted $\det(A)$ or $|A|$.
Theorem 8.13. A square matrix A is invertible if and only if its determinant is
non-zero.
In these notes, we won’t give the full definition of the determinant (there is a
systematic definition, however), but will give it in the special case that the size of
the square matrix is n = 1, n = 2, or n = 3.
Definition 8.14.
(1) If $n = 1$, then the determinant of
$$A := (a)$$
is
$$\det(A) = a.$$
(2) If $n = 2$, then the determinant of
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
is
$$\det(A) = ad - bc.$$
(3) If $n = 3$, then the determinant of
$$A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & j \end{pmatrix}$$
is
$$\det(A) = aej + bfg + cdh - gec - hfa - jdb.$$
Example 8.15. For
$$A := \begin{pmatrix} 2 & -1 \\ 3 & -2 \end{pmatrix},$$
we have
$$\det(A) = 2(-2) - (-1)(3) = -4 + 3 = -1.$$
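Definition 8.14 can be transcribed as a small Python function. This is a sketch covering only n = 1, 2, 3 (the cases defined in these notes); the name `det` is an illustrative choice, and the 3 × 3 formula is copied term by term from part (3).

```python
def det(A):
    """Determinant of a square matrix of size 1, 2 or 3 (Definition 8.14)."""
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("matrix must be square")
    if n == 1:
        return A[0][0]
    if n == 2:
        (a, b), (c, d) = A
        return a * d - b * c
    if n == 3:
        (a, b, c), (d, e, f), (g, h, j) = A
        return a*e*j + b*f*g + c*d*h - g*e*c - h*f*a - j*d*b
    raise NotImplementedError("only sizes 1, 2 and 3 are covered here")


print(det([[2, -1], [3, -2]]))                   # -1  (Example 8.15)
print(det([[2, 5, -3], [5, 1, 0], [-3, 0, 2]]))  # -55 (matrix of Example 8.12)
```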
Theorem 8.16. If
$$A := \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
has non-zero determinant $ad - bc$, then
$$A^{-1} = \frac{1}{\det(A)}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$
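Theorem 8.16 gives an explicit recipe for 2 × 2 inverses. The Python sketch below (with hypothetical helpers `inverse_2x2` and `mat_mul`) uses `fractions.Fraction` from the standard library so that the division by det(A) stays exact, and checks the defining property $AA^{-1} = I_2$ on the matrix of Example 8.15.

```python
from fractions import Fraction


def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix via Theorem 8.16, using exact fractions."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("determinant is zero, so A is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def mat_mul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[2, -1], [3, -2]]                         # Example 8.15, det(A) = -1
A_inv = inverse_2x2(A)
print(A_inv == [[2, -1], [3, -2]])             # True
print(mat_mul(A, A_inv) == [[1, 0], [0, 1]])   # True
```

Incidentally, this particular matrix satisfies $A^2 = I_2$, so it is its own inverse.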
The determinant satisfies a number of useful properties.
(1) If any row or column of $A$ is zero, then $\det(A) = 0$.
(2) Multiplying one row or column by a constant $\lambda \in \mathbb{R}$ multiplies the determinant by $\lambda$.
(3) We have
$$\det\left(A^T\right) = \det(A).$$
(4) For any two matrices A and B, we have
8.3. Solving systems of linear equations. Recall that a linear equation (with two variables $x, y$) is an equation of the form
$$ay + bx = c,$$
where $a, b, c \in \mathbb{R}$ are fixed constants. If $a$ and $b$ are not both zero, the set of solutions to a linear equation with two variables is a line.
More generally, if $x_1, \ldots, x_n$ are variables and $a_1, \ldots, a_n$, and $b$ are constants, then we call
$$a_1 x_1 + \cdots + a_n x_n = b$$
a linear equation.
It is often useful to be able to find choices of x1 , . . . , xn which solve many linear
equations at the same time. When we want to find simultaneous solutions to a
number of equations, we call the set of these equations a system of linear equations.
$$\begin{aligned} 5x + 2y &= 9,\\ 3x + y &= 5. \end{aligned}$$
The solution to the above system of linear equations is the unique point $(x, y)$ where the two lines intersect. For example, the two lines might represent the supply and demand curves in an economics problem; their intersection point then gives the optimal number supplied. In the above example, subtracting twice the second equation from the first gives
$$\begin{aligned} 5x + 2y - 2(3x + y) &= 9 - 2\cdot 5\\ -x &= -1\\ x &= 1, \end{aligned}$$
and substituting $x = 1$ into the second equation gives
$$\begin{aligned} 3(1) + y &= 5\\ y &= 2. \end{aligned}$$
This works well enough for two variables and two equations, but it can become quite difficult when there are more variables or more equations.
We are next going to use matrices to package the system of linear equations together nicely and also, in some cases, to give a way to solve the system. Suppose that you have a system of linear equations
$$\begin{aligned} A_{11}x_1 + \cdots + A_{1n}x_n &= b_1\\ A_{21}x_1 + \cdots + A_{2n}x_n &= b_2\\ &\;\;\vdots\\ A_{m1}x_1 + \cdots + A_{mn}x_n &= b_m. \end{aligned}$$
Then we can package the linear system of equations as the matrix equation
$$\begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n}\\ A_{21} & A_{22} & \cdots & A_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{pmatrix}\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix} = \begin{pmatrix} b_1\\ \vdots\\ b_m \end{pmatrix}.$$
For simplicity, one often writes
$$Ax = b,$$
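is
$$\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} -1 & 2\\ 3 & -5 \end{pmatrix}\begin{pmatrix} 9\\ 5 \end{pmatrix} = \begin{pmatrix} -9 + 10\\ 27 - 25 \end{pmatrix} = \begin{pmatrix} 1\\ 2 \end{pmatrix}.$$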
This is precisely the same answer that we got with substitution.
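The same computation, $x = A^{-1}b$ with $A^{-1}$ taken from Theorem 8.16, can be sketched in Python. Here A and b are assumed to be the coefficient matrix and right-hand side of the system 5x + 2y = 9, 3x + y = 5 from above; `solve_2x2` is an illustrative helper, and exact fractions avoid rounding.

```python
from fractions import Fraction


def solve_2x2(A, b):
    """Solve A x = b for a 2 x 2 matrix A with det(A) != 0 via x = A^{-1} b."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("det(A) = 0, so A has no inverse")
    # A^{-1} = (1/det) [[a22, -a12], [-a21, a11]]  (Theorem 8.16)
    x = Fraction(a22 * b[0] - a12 * b[1], det)
    y = Fraction(-a21 * b[0] + a11 * b[1], det)
    return x, y


# 5x + 2y = 9, 3x + y = 5
x, y = solve_2x2([[5, 2], [3, 1]], [9, 5])
print(x, y)  # 1 2
```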
9. Complex numbers
Recall the definitions of $\mathbb{N}$, $\mathbb{Z}$, and $\mathbb{R}$. If you ask for the solutions to
$$\left(x^2 - 2\right)\left(x^2 - 4\right) = 0$$
$$x^2 + 1 = 0.$$
$$\mathbb{C} := \{a + bi : a, b \in \mathbb{R}\}.$$
9.1. The algebra of complex numbers. We next investigate the algebra of complex numbers.
Two complex numbers $z := x + iy$ and $\tau := u + iv$ are equal if and only if
$$x = u, \qquad y = v.$$
$$z + \tau = (x + u) + i(y + v).$$
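Then
$$\frac{1}{z} = \frac{1}{z}\cdot\frac{\bar z}{\bar z} = \frac{\bar z}{z\bar z} = \frac{x - iy}{x^2 + y^2}.$$
Conjugation satisfies a number of useful properties. For $z, \tau \in \mathbb{C}$, we have:
$$\overline{z \pm \tau} = \bar z \pm \bar\tau, \qquad \overline{z\tau} = \bar z\,\bar\tau, \qquad \overline{\left(\frac{z}{\tau}\right)} = \frac{\bar z}{\bar\tau} \quad (\tau \neq 0).$$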
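Python's built-in `complex` type provides conjugation through its `conjugate()` method, which makes the formulas above easy to test. The helper `reciprocal` below is a hypothetical name; it follows $1/z = (x - iy)/(x^2 + y^2)$ rather than calling Python's own division, and the sample values z = 3 + 4i and τ = 1 − 2i are arbitrary choices for illustration.

```python
def reciprocal(z: complex) -> complex:
    """Return 1/z via 1/z = conj(z) / (x^2 + y^2), for z != 0."""
    x, y = z.real, z.imag
    denom = x * x + y * y              # equals |z|^2 = z * conj(z)
    if denom == 0:
        raise ZeroDivisionError("0 has no reciprocal")
    conj = z.conjugate()               # x - iy
    return complex(conj.real / denom, conj.imag / denom)


z, tau = 3 + 4j, 1 - 2j                # arbitrary sample values
print(reciprocal(z))                   # (0.12-0.16j)
# With small integer parts these checks are exact in floating point.
print((z + tau).conjugate() == z.conjugate() + tau.conjugate())  # True
print((z * tau).conjugate() == z.conjugate() * tau.conjugate())  # True
```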
9.2. Graphical representation of C. Now consider the following graphical representation of $\mathbb{C}$. For $x + iy \in \mathbb{C}$ (with $x, y \in \mathbb{R}$), we simply make a grid with x- and y-axes. The plane spanned by these axes is called the complex plane. We put a point at $(x, iy)$ on the complex plane. The point $2 + 3i$ is marked in the graph below:
[Figure: the complex plane, with the point 2 + 3i marked.]
The x and iy plane is known as the Cartesian plane, and the representation $(x, y)$ is known as the Cartesian coordinates of $z$. With this graphical interpretation, it is natural to consider the distance between two points. For $z_1 = x_1 + iy_1 \in \mathbb{C}$ and $z_2 = x_2 + iy_2 \in \mathbb{C}$ (with $x_1, x_2, y_1, y_2 \in \mathbb{R}$), we define the distance between $z_1$ and $z_2$ by
$$|z_1 - z_2| := \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}.$$
Notice that if $z_1 = x_1, z_2 = x_2 \in \mathbb{R}$, then this simply becomes
$$|z_1 - z_2| = \sqrt{(x_1 - x_2)^2} = |x_1 - x_2|.$$
In particular, we define the absolute value of $z = x + iy \in \mathbb{C}$ (its distance from 0) by
$$|z| := \sqrt{x^2 + y^2}.$$
Now notice that
$$|z|^2 = x^2 + y^2 = z\bar z.$$
Remark. The number $|z|$ is also sometimes called the modulus of $z$. It satisfies (for $z, \tau \in \mathbb{C}$)
$$|z\tau| = |z|\cdot|\tau|$$
and (for $\tau \neq 0$)
$$\left|\frac{z}{\tau}\right| = \frac{|z|}{|\tau|}.$$
An important inequality satisfied by the absolute value is the Triangle Inequality, which states that for $z, \tau \in \mathbb{C}$
$$|z + \tau| \le |z| + |\tau|.$$
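The distance, the absolute value and the properties in the Remark can be checked numerically with the standard library. In this Python sketch, `distance` is an illustrative helper built on `math.hypot`, while the built-in `abs` of a complex number returns its modulus |z|. The values 2 + 3i and 2 + i are the ones drawn in these notes; floating-point results are compared with `math.isclose` rather than `==`.

```python
import math


def distance(z1: complex, z2: complex) -> float:
    """|z1 - z2| = sqrt((x1 - x2)^2 + (y1 - y2)^2)."""
    return math.hypot(z1.real - z2.real, z1.imag - z2.imag)


z, tau = 2 + 3j, 2 + 1j
print(distance(z, tau))                                 # 2.0
print(math.isclose(abs(z) ** 2, (z * z.conjugate()).real))  # True: |z|^2 = z conj(z)
print(math.isclose(abs(z * tau), abs(z) * abs(tau)))    # True: |z tau| = |z||tau|
print(abs(z + tau) <= abs(z) + abs(tau))                # True: Triangle Inequality
```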
[Figure: 2 + 3i in the complex plane.]
The line is called a vector. This also gives a new interpretation of addition. If one thinks of these vectors as movable (so that they do not have to start at the origin), then one can place one arrow so that it starts at the end of the other. The result is the sum of these two vectors (or equivalently, the sum of the two complex numbers that these vectors represent). An example,
$$(2 + 3i) + (2 + i) = 4 + 4i,$$
is shown below:
[Figure: the addition (2 + 3i) + (2 + i) = 4 + 4i in the complex plane.]
of $z$ and the angle $\theta$ of the counter-clockwise-oriented arc beginning from the positive half of the x-axis and ending at $z$ (this will be further explained below). This representation is known as polar coordinates. For $z$ in the first quadrant of the x and iy plane, to find the angle, form the right triangle bounded by the x-axis and the complex number (considered as a vector); that is to say, the triangle with vertices $0$, $z$, and $\mathrm{Re}(z)$. The angle meant above is precisely the interior angle of this triangle at the vertex $0$:
[Figure: the angle θ at 0 between the positive x-axis and the vector representing 2 + 3i.]
If z is in the other quadrants, then we start at the same point on the x-axis and
make an arc to the vector representing z. The angle of this arc is then θ:
[Figure: the angle θ of the arc from the positive x-axis to the vector representing z.]
The number $\theta$ is sometimes called the argument of $z$. This number is only unique up to adding multiples of $2\pi$. In other words, $i$ has argument $\frac{\pi}{2}$, but we could also say that $i$ has argument
$$\frac{\pi}{2} + 2\pi = \frac{5\pi}{2}.$$
This corresponds to wrapping once around the origin before reaching $z$. Throughout, we always choose the $\theta$ for which $0 \le \theta < 2\pi$.
Write $z = x + iy$ with $x = \mathrm{Re}(z) \in \mathbb{R}$ and $y = \mathrm{Im}(z) \in \mathbb{R}$ (the real and imaginary parts). Again denoting $r := |z|$, trigonometry on the triangle in the above diagram yields
$$x = r\cos(\theta), \qquad y = r\sin(\theta).$$
Using
$$r = \sqrt{x^2 + y^2},$$
we also get the formulas
$$\cos(\theta) = \frac{x}{\sqrt{x^2 + y^2}}, \qquad \sin(\theta) = \frac{y}{\sqrt{x^2 + y^2}}.$$
The above two sets of formulas allow us to go back and forth between Cartesian coordinates and polar coordinates.
Example 9.2. Consider the complex number $z := \frac{1 + i\sqrt{3}}{2}$. Give $z$ in Cartesian coordinates and polar coordinates.
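Solution: The Cartesian coordinates are precisely given above by
$$\left(\frac{1}{2}, \frac{\sqrt{3}}{2}\right).$$
To obtain the polar coordinates, we have to take the absolute value
$$|z| = \sqrt{\left(\frac{1}{2}\right)^2 + \left(\frac{\sqrt{3}}{2}\right)^2} = \sqrt{\frac{1 + 3}{4}} = 1.$$
We also compute
$$\cos(\theta) = \frac{x}{|z|} = \frac{\frac{1}{2}}{1} = \frac{1}{2}.$$
We now recall that
$$\cos\left(\frac{\pi}{3}\right) = \frac{1}{2}.$$
The other angle in $[0, 2\pi)$ with cosine $\frac{1}{2}$ is $\frac{5\pi}{3}$, but since $\sin(\theta) = \frac{y}{|z|} = \frac{\sqrt{3}}{2} > 0$, the angle $\theta$ lies in $(0, \pi)$. Since we have restricted the possible choices of $\theta$ to $0 \le \theta < 2\pi$ (otherwise $\theta$ would only be unique up to adding multiples of $2\pi$), this yields
$$\theta = \frac{\pi}{3}.$$
Hence the polar coordinates are
$$(r, \theta) = \left(1, \frac{\pi}{3}\right).$$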
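Converting between Cartesian and polar coordinates is available in Python's `cmath` module, but `cmath.polar` returns an angle in (−π, π], whereas these notes choose 0 ≤ θ < 2π. This minimal sketch (hypothetical helpers `to_polar` and `from_polar`) shifts negative angles by 2π to match that convention, and recovers (r, θ) = (1, π/3) for Example 9.2.

```python
import cmath
import math


def to_polar(z: complex) -> tuple[float, float]:
    """Return (r, theta) with r = |z| and 0 <= theta < 2*pi."""
    r, theta = cmath.polar(z)      # cmath gives theta in (-pi, pi]
    if theta < 0:
        theta += 2 * math.pi       # shift into the range [0, 2*pi)
    if theta >= 2 * math.pi:       # rounding can land exactly on 2*pi
        theta = 0.0
    return r, theta


def from_polar(r: float, theta: float) -> complex:
    """Return r(cos(theta) + i sin(theta))."""
    return complex(r * math.cos(theta), r * math.sin(theta))


z = (1 + 1j * math.sqrt(3)) / 2    # Example 9.2
r, theta = to_polar(z)
print(math.isclose(r, 1), math.isclose(theta, math.pi / 3))  # True True
print(cmath.isclose(from_polar(r, theta), z))                # True
```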
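Since $\sin(-\theta) = -\sin(\theta)$ and $\cos(-\theta) = \cos(\theta)$, if the polar coordinates of $z = x + iy$ are $(r, \theta)$, then
$$\begin{aligned} \bar z &= x - iy\\ &= r\cos(\theta) - ir\sin(\theta)\\ &= r\cos(-\theta) + ir\sin(-\theta), \end{aligned}$$
so the polar coordinates of $\bar z$ are
$$(r, -\theta)$$
(up to adding a multiple of $2\pi$).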
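Then
$$\cos(\theta_1)\cos(\theta_2) - \sin(\theta_1)\sin(\theta_2) = \cos(\theta_1 + \theta_2)$$
and
$$\sin(\theta_1)\cos(\theta_2) + \sin(\theta_2)\cos(\theta_1) = \sin(\theta_1 + \theta_2).$$
Therefore, we have
$$z^2 = r^2\left(\cos(2\theta) + i\sin(2\theta)\right).$$
Multiplying this by $z$ multiplies the modulus by $r$ again and adds $\theta$ to the angle. Hence
$$z^3 = r^3\left(\cos(3\theta) + i\sin(3\theta)\right).$$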
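Since
$$\bar z = r\left(\cos(-\theta) + i\sin(-\theta)\right),$$
we have
$$z^{-1} = \frac{\bar z}{|z|^2} = \frac{r\left(\cos(-\theta) + i\sin(-\theta)\right)}{r^2} = \frac{1}{r}\left(\cos(-\theta) + i\sin(-\theta)\right).$$
Thus
$$z^n = r^n\left(\cos(n\theta) + i\sin(n\theta)\right)$$
holds for $n = -1$ as well.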
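The formula $z^n = r^n(\cos(n\theta) + i\sin(n\theta))$ can be spot-checked numerically. The Python sketch below compares it with Python's built-in power `z ** n` for several integers n, including negative ones; the test values r = 2 and θ = 0.7 are arbitrary, and `de_moivre` is an illustrative helper name.

```python
import cmath
import math


def de_moivre(r: float, theta: float, n: int) -> complex:
    """Return r**n * (cos(n*theta) + i sin(n*theta)); needs r > 0 if n < 0."""
    return r ** n * complex(math.cos(n * theta), math.sin(n * theta))


r, theta = 2.0, 0.7                 # arbitrary test values
z = cmath.rect(r, theta)            # z = r(cos(theta) + i sin(theta))
for n in (-3, -1, 0, 1, 2, 3, 5):
    # z ** n is Python's built-in complex power
    assert cmath.isclose(z ** n, de_moivre(r, theta, n))
print("formula agrees with z ** n for every n tested")
```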
Using the fact that
$$z^{-n} = \left(z^{-1}\right)^n$$
and then the computation of the polar coordinates for the product, we obtain
$$z^{-n} = \left(r^{-1}\left(\cos(-\theta) + i\sin(-\theta)\right)\right)^n = r^{-n}\left(\cos(-n\theta) + i\sin(-n\theta)\right).$$
We have hence concluded the following.
9.5. Roots of unity. For $n \in \mathbb{N}$, we call $z$ an $n$th root of unity (or $n$th root of 1) if
$$z^n = 1.$$
By De Moivre's formula, we have
$$z^n = r^n\left(\cos(n\theta) + i\sin(n\theta)\right).$$
Moreover, we have
$$\cos(\theta) = -\frac{\sqrt{2}}{2}$$
and
$$\sin(\theta) = \frac{\sqrt{2}}{2}.$$
Thus
$$\theta = \frac{3\pi}{4} + 2\pi m,$$
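for some $m \in \mathbb{Z}$. Following the same argument given for finding the $n$th roots of unity, if $z$ (with polar coordinates $r$ and $\theta$) is an $n$th root of $w$, then
$$r^n = |z|^n = 2, \qquad \theta = \frac{3\pi}{4n} + \frac{2\pi m}{n},$$
so $r = \sqrt[n]{2}$. Here $m \in \mathbb{Z}$. Dividing by the power $\omega_n^m$ of the $n$th root of unity $\omega_n$, we have
$$\alpha := \frac{z}{\omega_n^m} = z\,\omega_n^{-m}.$$
But then $|\alpha| = |z|$ and the $\theta$ corresponding to $\alpha$ is
$$\frac{3\pi}{4n}.$$
It follows that every $n$th root of $w$ can be written as
$$\alpha\,\omega_n^m,$$
with
$$m = 0, 1, \ldots, n - 1.$$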
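The construction above (one root α, then multiplication by powers of $\omega_n$) can be sketched in Python. Two assumptions are made: $\omega_n$ is taken to be $\cos(2\pi/n) + i\sin(2\pi/n)$, and w is taken to be the number with |w| = 2 and argument 3π/4 (that is, $w = -\sqrt{2} + i\sqrt{2}$), which is what the surviving lines indicate; the helper `nth_roots` is hypothetical.

```python
import cmath
import math


def nth_roots(w: complex, n: int) -> list[complex]:
    """Return the n distinct n-th roots of w (w != 0) as alpha * omega**m.

    alpha has modulus |w|**(1/n) and angle theta_w / n, where theta_w is the
    angle of w chosen in [0, 2*pi); omega = cos(2*pi/n) + i sin(2*pi/n) is
    assumed to play the role of omega_n.
    """
    r, theta = cmath.polar(w)
    if theta < 0:
        theta += 2 * math.pi
    alpha = cmath.rect(r ** (1 / n), theta / n)
    omega = cmath.rect(1.0, 2 * math.pi / n)
    return [alpha * omega ** m for m in range(n)]


# Assumed example: |w| = 2 and argument 3*pi/4, i.e. w = -sqrt(2) + i*sqrt(2).
w = cmath.rect(2.0, 3 * math.pi / 4)
roots = nth_roots(w, 3)
print(all(cmath.isclose(z ** 3, w) for z in roots))  # True
```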
Thus
$$y = Ce^{\int f(x)\,dx}.$$
Therefore, if we know how to integrate $f(x)$, then we know how to solve the differential equation.
Now let's try to solve a general linear differential equation $y' + f(x)y = g(x)$. Multiply the left-hand side by a function $h(x)$ to obtain
$$h(x)y' + f(x)h(x)y.$$
This now looks a bit like the derivative from the product rule. Indeed, if
$$h'(x) = f(x)h(x),$$
then
$$(h(x)y)' = h'(x)y + h(x)y' = f(x)h(x)y + h(x)y',$$
which is exactly the expression above. The point is that by the second Fundamental Theorem of Calculus, we now know the integral
$$\int (h(x)y)'\,dx = h(x)y + C.$$
Then $y$ appears on its own (multiplied by $h(x)$), so that we can solve for it.
We now find out what $h(x)$ should be. Since
$$h'(x) = f(x)h(x),$$
we have
$$\frac{h'(x)}{h(x)} = f(x).$$
This is exactly like the separable differential equation that we solved initially. Integrating, we obtain
$$\ln(h(x)) = \int f(x)\,dx + C.$$
Thus we have
$$h(x) = Ce^{\int f(x)\,dx}.$$
For this particular choice of $h$, multiplying both sides of $y' + f(x)y = g(x)$ by $h(x)$, we now obtain
$$h(x)y' + f(x)h(x)y = g(x)h(x).$$
We rewrite the left-hand side to obtain
$$(h(x)y)' = g(x)h(x).$$
Integrating both sides yields
$$h(x)y = \int g(x)h(x)\,dx + C.$$
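The integrating-factor method can also be carried out numerically when the integrals have no convenient closed form. The Python sketch below is an illustration, not a method from these notes: it approximates both integrals with the trapezoidal rule, normalizes h so that h(x0) = 1 (which fixes the constant C), and is checked against the illustrative equation y′ + 2xy = x with y(0) = 3, whose exact solution by the method above (with $h(x) = e^{x^2}$) is $y = \frac{1}{2} + \frac{5}{2}e^{-x^2}$. The name `solve_linear_ode` is hypothetical.

```python
import math


def solve_linear_ode(f, g, x0, y0, x_end, steps=2000):
    """Approximate y(x_end) for y' + f(x) y = g(x), y(x0) = y0.

    Integrating-factor recipe: h(x) = exp(integral of f from x0 to x), so
    (h y)' = g h and h(x) y(x) = y0 + integral of g h from x0 to x.
    Both integrals are approximated with the trapezoidal rule.
    """
    dx = (x_end - x0) / steps
    F = 0.0            # running integral of f; h = exp(F), so h(x0) = 1
    G = 0.0            # running integral of g * h
    x = x0
    prev_gh = g(x0)    # g(x0) * h(x0), with h(x0) = 1
    for _ in range(steps):
        x_next = x + dx
        F += 0.5 * (f(x) + f(x_next)) * dx
        gh = g(x_next) * math.exp(F)
        G += 0.5 * (prev_gh + gh) * dx
        x, prev_gh = x_next, gh
    return (y0 + G) / math.exp(F)


# Illustrative example (not from the notes): y' + 2x y = x, y(0) = 3.
approx = solve_linear_ode(lambda x: 2 * x, lambda x: x, 0.0, 3.0, 1.0)
exact = 0.5 + 2.5 * math.exp(-1.0)
print(abs(approx - exact) < 1e-5)   # True
```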