Mathematical Methods: Advanced Course
(NEKN32)
Lecturer: Ovidijus Stauskas
Email: ovidijus.stauskas@nek.lu.se
Lund University
Personal Webpage: https://sites.google.com/view/ovidijus-stauskas/home
Office: 275, EC1. Office hours: Thursdays from 15.00 to 17.00 or by appointment
Fall 2021
These Lecture Notes are to be used strictly for teaching/studying NEKN32 and not for commercial purposes.
Contents
1 Chapter 1: General Terminology and Topology
1.1 Set Theory
1.2 Least (Greatest) Upper (Lower) Bound Principle
1.3 Convex Sets
1.4 Distances
1.5 Sequences and Convergence
1.6 Open, Closed and Compact Sets
2 Chapter 2: Functions of a Single Variable
2.1 Functions, Domain and Codomain
2.2 Injective, Bijective and Surjective Functions
2.3 Convex and Concave Functions
2.4 Continuity
2.5 Intermediate Value Theorem
3 Chapter 3: Differential Calculus (Single Variable)
3.1 Derivatives
3.2 Differentiability and Increasing (Decreasing) Functions
3.3 Convex and Concave Functions Revisited
3.4 Linear Approximation and Differentials
3.5 Taylor Approximation
3.6 L'Hôpital's Rule
4 Chapter 4: Trigonometry, Trigonometric Functions and Their Derivatives (Bonus Lecture)
4.1 Unit Circle
4.2 Radians
4.3 Trigonometric Function on Unit Circle
4.4 Graphs of Sine and Cosine
4.5 Trigonometric Identities and Useful Limits
4.6 The Derivatives
4.7 Tangent
4.8 Application in Economics and Econometrics
5 Chapter 5: Functions of Multiple Variables: Generalization of Concepts
5.1 Examples and Graphs
5.2 Continuity
5.3 Partial Derivatives
5.4 Total Derivatives
5.5 Linear Approximation and Differentials
5.6 Implicit Functions
6 Chapter 6: Optimization (Unconstrained)
6.1 Maximum, Minimum and Stationary Points
6.2 Extreme Value Theorem and Applications
6.3 Applications for ACR
7 Chapter 7: Optimization (Constrained)
7.1 Equality Constraints and Lagrange Multiplier Method
7.2 Interpretation of Lagrange Multiplier
7.3 Multiple Equality Constraints and Variables
7.4 Necessary Conditions for Local Extreme Points
7.5 Inequality Constraints: Kuhn-Tucker Conditions
7.6 Kuhn-Tucker: Sufficient Conditions
8 Chapter 8: Integral Calculus (Single Variable)
8.1 Indefinite vs. Definite Integrals
8.2 Definition and Interpretation of the Definite Integral
8.3 Properties of Definite Integral
8.4 Fundamental Theorem of Calculus
8.5 Integration by Substitution
8.6 Integration by Parts
8.7 Improper Integrals
8.8 Integral Mean Value Theorem
8.9 Calculating Arc Length (Arc Integrals)*
8.10 Solving Infinite Sums as Integrals*
9 Chapter 9: Integral Calculus (Multiple Variables)
9.1 Double Integrals
9.2 Double Integrals: General Region D
9.3 Multiple Integrals
10 Chapter 10 (Optional): Introduction to Ordinary Differential Equations (ODE)
10.1 Definition of a Differential Equation
10.2 First Order ODE
10.3 Second Order ODE: Homogeneous Case
10.4 Second Order ODE: Non-Homogeneous Case and Method of Undetermined Coefficients
11 Chapter 11: Linear Algebra
11.1 The Concept of a Matrix
11.2 Basic Matrix Properties and Operations
11.3 Transposition
11.4 Dot (Inner) Product
11.5 Matrix Multiplication
11.6 Identity Matrix and Powers of Matrices
11.7 Geometry of Vectors, Transformations and Length
11.8 Determinant
11.9 Linear Independence and Interpretation of a Determinant
11.10 General Calculation of a Determinant
11.11 Inverse of a Matrix
11.12 Trace of a Matrix
11.13 Rank of a Matrix
11.14 Eigenvectors and Eigenvalues
11.15 Quadratic Form and Positive/Negative Definiteness
11.16 Diagonalization
12 Chapter 12: Vector Differential Calculus
12.1 The First Order Derivative with Respect to a Vector
12.2 Directional Derivative
12.3 Change of Variables in Multivariate Calculus*
12.4 The Second Order Derivative with Respect to a Vector
12.5 Multivariate Taylor Expansion
12.6 Hessian Matrix: Convexity and Concavity
12.7 Kuhn-Tucker Problem and Concave Lagrangian
12.8 Jacobian Matrices
12.9 Geometrical Interpretation of Lagrange Multipliers
12.10 A Prominent Example From Econometrics: OLS Estimator
13 Chapter 13: Matrix Differential Calculus (Advanced*)
13.1 Vectorization Operator
13.2 Kronecker Product
13.3 Relationship Between Vectorization and Kronecker Product
13.4 Relationship Between Vectorization and Trace
13.5 Examples of Matrix Valued Functions
13.6 Matrix Derivative, Differential and Their Relationship
13.7 Examples of the Derivatives with Proofs
13.7.1 Trace Functions
13.7.2 Inverse
13.7.3 Functions Related to Transposition
13.7.4 Determinant
14 Chapter 14: Probability Theory
14.1 Experiments, Outcomes and Events
14.2 Probability
14.3 Conditional Probability
14.4 Random Variables
14.5 Discrete vs. Continuous Random Variables
14.5.1 Discrete Case
14.5.2 Continuous Case
14.6 Expectation
14.7 General Approach to Expectation
14.8 Joint Probabilities and Densities
15 Appendix (Complex Numbers)
Course Description and Organisation
This course provides a somewhat more rigorous treatment of mathematical concepts that are usually covered in undergraduate programs in economics. The topics include set theory, univariate and multivariate functions, optimization, univariate and multivariate integral calculus, linear algebra and probability theory. In the lectures, we will focus on the question 'why?', whereas during the exercise sessions (5 of them) we will dig into 'how?'.
Many of the explored topics will appear in the upcoming courses, especially in Advanced Econometrics, Econometric Theory, Time Series Analysis or Individual Choice. The goal is to make the students comfortable with those topics and to make the choice of electives and specialization easier.
The main course books on which these lecture notes are based are Essential Mathematics for Economic Analysis by Sydsaeter & Hammond (2nd or 3rd edition) and Further Mathematics for Economic Analysis by Sydsaeter, Hammond, Seierstad & Strom, 2nd edition. However, the lecture notes are self-contained and sufficient for studying. The linear algebra part, especially its connection to multivariate calculus, is heavily based on Matrix Algebra by Abadir & Magnus. The probability theory part is based on John E. Freund's Mathematical Statistics with Applications by Miller & Miller. Again, the lecture notes cover all the required material with examples, and the two latter books should be seen as extra reading only.
The material marked with an asterisk (*) is more advanced and not subject to examination; it is provided for completeness of the discussion. Some illustrations are not drawn manually and are taken from other sources. The link to the source accompanies the illustration and provides some additional information on the topic. Lastly, if you see an example that comes with the highlight Assignment I (II) Help, pay attention to it in order to solve the assignments correctly.
Lectures are given live on the Zoom platform and the meetings will be recorded (if the class agrees), so you will be able to re-watch and revise. Exercise sessions are held on campus, subject to the number of students. Therefore, registration is required in order to monitor the number of students attending. We can also hold Q&A sessions on campus (if there is demand). As indicated on the cover of these notes, office hours are on Thursdays from 15.00 to 17.00 in EC1: 275. You are welcome to drop by as long as you are not a large group of students. Otherwise, we can arrange a consultation via email.
1 Chapter 1: General Terminology and Topology
This is an introductory chapter which covers basic set theory, distances and convergence of
sequences. These can be seen as tools for us to proceed with the upcoming chapters. Some
topics are just mentioned here (e.g. functions), but they will be covered in much more depth
in the upcoming chapters.
1.1 Set Theory
Definition: a set $S$ is a collection of elements.
Example.
$$S_1 = \{1, 5, 7\},$$
$$S_2 = \{\text{all even numbers}\},$$
$$S_3 = \{\text{a car}, \text{a bus}, \text{a plane}\}.$$
Note that the ordering of elements does not matter. Also, a set does not change if some elements are listed more than once, because only the unique elements matter.
Example.
$$\{1, 1, 1, 2, 3\} = \{1, 2, 3\},$$
$$\{4, 4, 7, 10, 10\} = \{4, 7, 10\}.$$
We indicate that an element belongs (does not belong) to a set in the following way:
$$x \in S_1, \qquad y \notin S_1.$$
Also, sets can be finite or infinite. For example, $S_1 = \{1, 5, 7\}$ is a finite set, while $S_2 = \{\text{all even numbers}\}$ is a (countably) infinite set.
We can use sets to indicate a property. For example, take $S_2 = \{\text{all even numbers}\}$. Say we would like to single out the elements of $S_2$ which are divisible by 2. Clearly, they constitute a new set, say $S_{/2}$. Then
$$S_{/2} = \{x \in S_2 : x \bmod 2 = 0\},$$
where mod is an operator that gives the remainder after (integer) division.
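Set-builder notation of this kind maps directly onto a Python set comprehension. The sketch below is only illustrative: a computer cannot store the infinite set $S_2$, so it assumes a finite stand-in (the even numbers up to 20). The variable names and the extra divisibility-by-4 property are hypothetical additions, not part of the notes.

```python
# Finite stand-in for S_2 = {all even numbers}: the even numbers 2, 4, ..., 20.
# The bound 20 is a hypothetical choice; the set S_2 in the text is infinite.
S2_finite = set(range(2, 21, 2))

# {x in S_2 : x mod 2 = 0}; Python's % operator returns the remainder after division.
S2_div2 = {x for x in S2_finite if x % 2 == 0}

# A less trivial (illustrative) property: divisibility by 4.
S2_div4 = {x for x in S2_finite if x % 4 == 0}

print(S2_div2 == S2_finite)  # every even number leaves remainder 0 when divided by 2
print(sorted(S2_div4))
```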
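In mathematics, we have the following typical infinite sets:
$$\mathbb{N} = \{1, 2, 3, \dots\} \quad \text{(natural numbers)},$$
$$\mathbb{Z} = \{0, \pm 1, \pm 2, \pm 3, \dots\} \quad \text{(integers)},$$
$$\mathbb{Q} = \left\{\tfrac{p}{q} : p, q \in \mathbb{Z} \text{ and } q \neq 0\right\} \quad \text{(rational numbers)},$$
$$\mathbb{R} = \{\text{all numbers in } \mathbb{N}, \mathbb{Z}, \mathbb{Q} \text{ and the rest of possible numbers}\} \quad \text{(real numbers)},$$
$$\mathbb{C} \quad \text{(complex numbers)},$$
where the set of complex numbers will not be considered in this course (see the Appendix of these Lecture Notes for a brief introduction).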
We also have some special cases of the sets named above. For example, $\mathbb{R}_+$ represents only non-negative real numbers (0 included), while $\mathbb{R}_{++}$ represents strictly positive real numbers (0 excluded). The same classification applies to the rest of the sets named above.
Example. Take $2 \in \mathbb{Z}$. Then
$$\sqrt{2} \notin \mathbb{Q},$$
because it is impossible to obtain this square root as a ratio of two integers. The same is true for the square roots of all prime numbers (the numbers that are divisible only by themselves and 1). Such numbers are called irrational and they belong to $\mathbb{R}$ together with other possible numbers. However, $\sqrt{4} = 2 \in \mathbb{Z}$, and it also belongs to $\mathbb{Q}$.
We denote the empty set by $\emptyset$. It is the only set with no elements. We consider such a set for technical reasons, and it serves the purpose of 0 in the context of sets.
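If two sets $A$ and $B$ are equal, then we denote this as $A = B$. It means that if $x \in A$, then $x \in B$ as well (and vice versa), hence they contain the same elements. Formally, this equality can be stated as
$$x \in A \Rightarrow x \in B, \qquad x \in B \Rightarrow x \in A, \qquad \text{or} \qquad x \in A \Leftrightarrow x \in B,$$
where $\Rightarrow$ means implication and $\Leftrightarrow$ represents equivalence.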
If $A$ is a subset of a set $B$, then we denote this as $A \subseteq B$, which means that $x \in A \Rightarrow x \in B$; thus every element in $A$ belongs to $B$ as well. However, $\subseteq$ also leaves a possibility for the sets to be equal. For example, $A \subseteq A$ is true. Therefore, $\subseteq$ also indicates weak subsets. Note that $\emptyset$ is a subset of every set.
A set $A$ is called a proper subset of a set $B$ if every element in $A$ is also in $B$ but the sets are not equal. We denote this by $A \subset B$, or
$$A \subset B \Leftrightarrow A \subseteq B \text{ and } A \neq B.$$
Note that our discussed typical infinite mathematical sets constitute proper subsets in the
following order:
$$\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R} \ (\subset \mathbb{C}).$$
All sets that we consider are subsets of some given universal set, which we denote by $\Omega$.
With the universal set, we can further develop concepts, such as product set (e.g. Cartesian
product) and complementation.
Definition: having $A, B \subseteq \Omega$, a Cartesian product is
$$A \times B = \{(a, b) : a \in A, \ b \in B\}.$$
Here, $(a, b)$ is a so-called ordered pair. Its important property is that the order matters. Formally, $(a, b) \neq (b, a)$ whenever $a \neq b$.
Example. Take $\mathbb{R}$. Then
$$\mathbb{R} \times \mathbb{R} = \{(x, y) : x \in \mathbb{R}, \ y \in \mathbb{R}\} = \mathbb{R}^2,$$
which constitutes the well-known $xy$-plane, or Cartesian plane:
Figure 1: Cartesian plane. Source: ck12.org
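Also, we can have
$$\mathbb{R} \times \mathbb{R} \times \dots \times \mathbb{R} = \mathbb{R}^n = \{(x_1, x_2, \dots, x_n) : x_1 \in \mathbb{R}, x_2 \in \mathbb{R}, \dots, x_n \in \mathbb{R}\},$$
which is called $n$-dimensional Euclidean space. Clearly, if $n = 3$, then we have 3-dimensional space:
Figure 2: 3-dimensional Cartesian space. Source: polymathprogrammer.com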
We will call points in $\mathbb{R}^2$ or $\mathbb{R}^3$ pairs (2-tuples) and triples, respectively, and denote them by lower-case bold letters, e.g. $\mathbf{x} = (x_1, x_2) \in \mathbb{R}^2$ or $\mathbf{x} = (x_1, x_2, x_3) \in \mathbb{R}^3$. A point in $\mathbb{R}^n$ is $\mathbf{x} = (x_1, x_2, \dots, x_n)$. In the upcoming chapters on linear algebra, we will give them more structure and slightly change the notation.
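Further, we explore set operations. For example,
$$A \cup B = \{x \in \Omega : x \in A \text{ or } x \in B\}$$
is called the union. Also,
$$A \cap B = \{x \in \Omega : x \in A \text{ and } x \in B\}$$
is called the intersection. If $A \cap B = \emptyset$, then $A$ and $B$ are called disjoint sets. In addition, we can have
$$\bigcup_{i=1}^{m} A_i, \qquad \bigcap_{i=1}^{m} A_i$$
when we take the union or the intersection of a finite number of sets $A_i$ for $i = 1, \dots, m$. Further,
$$A \setminus B = \{x \in \Omega : x \in A, \ x \notin B\}$$
is called the difference. Using the difference, we can define the complement: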
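$$\Omega \setminus A = \{x \in \Omega : x \in \Omega, \ x \notin A\} = A^c.$$
Example. Consider $A = \{1, 2, 3\}$, $B = \{5, 6, 7\}$ and $C = \{1, 3, 4\}$. Say $\Omega = \{1, 2, 3, 4, 5, 6, 7\}$. Then
$$A \cup B = \{1, 2, 3, 5, 6, 7\},$$
$$B \cup C = \{1, 3, 4, 5, 6, 7\},$$
$$A \cup C = \{1, 2, 3, 4\},$$
$$A \cap B = \emptyset,$$
$$B \cap C = \emptyset,$$
$$A \setminus B = \{1, 2, 3\},$$
$$A \setminus C = \{2\},$$
$$C \setminus A = \{4\},$$
$$A^c = \{4, 5, 6, 7\},$$
$$B^c = \{1, 2, 3, 4\},$$
$$C^c = \{2, 5, 6, 7\}.$$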
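The results of this example can be double-checked with Python's built-in `set` type. In this sketch, `|`, `&` and `-` are Python's union, intersection and difference operators. The helper name `complement` is hypothetical: built-in sets know nothing about a universal set, so the complement has to be taken relative to an explicitly stated `Omega`.

```python
Omega = {1, 2, 3, 4, 5, 6, 7}            # universal set of the example
A, B, C = {1, 2, 3}, {5, 6, 7}, {1, 3, 4}

def complement(X):
    """Complement of X relative to the universal set Omega (Omega minus X)."""
    return Omega - X

print(A | B, B | C, A | C)               # unions
print(A & B, B & C)                      # intersections (both empty: disjoint sets)
print(A - B, A - C, C - A)               # differences
print(complement(A), complement(B), complement(C))
```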
Definition: the collection of all subsets of a set $A$ is called the power set of $A$ and is denoted by $\mathcal{P}(A)$.
Therefore, if $B \in \mathcal{P}(A)$, then $B \subseteq A$. Also, $\emptyset \in \mathcal{P}(A)$ always.
Example. Consider $A = \{1, 3, 7\}$. Then
$$\mathcal{P}(A) = \{\emptyset, \{1\}, \{3\}, \{7\}, \{1, 3\}, \{1, 7\}, \{3, 7\}, \{1, 3, 7\}\}.$$
Note that $A \in \mathcal{P}(A)$ always. Also, if $A$ has $n$ elements, then $\mathcal{P}(A)$ has $2^n$ elements.
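The count $2^n$ can be checked by enumerating subsets of every size $k = 0, 1, \dots, n$. The following is a minimal sketch using the standard-library `itertools.combinations`. The helper name `power_set` is hypothetical, and subsets are stored as `frozenset` objects so that they can be compared and searched.

```python
from itertools import chain, combinations

def power_set(A):
    """Return all subsets of the finite set A (as frozensets), from the empty set up to A itself."""
    elems = sorted(A)
    subsets = chain.from_iterable(combinations(elems, k) for k in range(len(elems) + 1))
    return [frozenset(s) for s in subsets]

P = power_set({1, 3, 7})
print(len(P))                                        # 2**3 subsets
print(frozenset() in P, frozenset({1, 3, 7}) in P)   # the empty set and A itself are included
```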
A handy tool for working with set operations is Venn diagrams. For example:
(Venn diagrams illustrating $A \cup B$, $A \cap B$, $A \setminus B$ and one further two-set operation.)
Figure 3: Operations with two sets
Venn diagrams are helpful when verifying some logical statements with regard to set operations. For instance:
Is it true that $A \cap (B \cap C) = (A \cap B) \cap C$? Yes!
Figure 4: Operation with 3 sets
Or, similarly:
Is it true that $A \cap (B \cup C) = (A \cap B) \cup C$? No!
(Panels: $A \cap (B \cup C)$ and $(A \cap B) \cup C$.)
Figure 5: Operation with 3 sets
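Venn diagrams can be complemented by a quick computational check on concrete sets. The sketch below reuses the sets $A$, $B$ and $C$ from the example in Section 1.1. A single counterexample is enough to disprove an identity, whereas agreement on one example is only consistent with an identity and does not prove it.

```python
A, B, C = {1, 2, 3}, {5, 6, 7}, {1, 3, 4}

# Associativity of the intersection: A ∩ (B ∩ C) = (A ∩ B) ∩ C
print(A & (B & C) == (A & B) & C)

# A ∩ (B ∪ C) versus (A ∩ B) ∪ C
lhs = A & (B | C)
rhs = (A & B) | C
print(lhs, rhs, lhs == rhs)   # one counterexample disproves the claimed identity
```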
Clearly, the more sets we have, the harder it is to visualize. However, operations with 2-3 sets are easy to depict and verify.
1.2 Least (Greatest) Upper (Lower) Bound Principle
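A non-empty set $S$ of real numbers that is bounded from above has a least upper bound. This bound is often called the supremum of $S$ and we denote it
$$b^* = \sup S \quad \text{or} \quad b^* = \sup_{x \in S} x.$$
The set $S$ is bounded from above if there is a real number $b$ such that $x \leq b$ for all $x \in S$. The number $b$ is called an upper bound of $S$. The least upper bound of $S$ is a real number $b^*$ such that it is an upper bound and $b^* \leq b$ for every other upper bound $b$. We formulate the definition using two qualifying conditions.
Definition: $b^*$ is the least upper bound of $S$ if and only if the two following conditions hold:
• for all $x \in S$ it follows that $x \leq b^*$;
• for all $\varepsilon > 0$, there exists $x \in S$ such that $x > b^* - \varepsilon$.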
Clearly, considering $S = (0, 5)$ with $b^* = 5$, we can take $\varepsilon = 0.01$. Then $b^* - \varepsilon = 4.99$, and indeed $4.991 > b^* - \varepsilon$ with $4.991 \in (0, 5)$. However, if we only work with rational numbers, the least upper bound (the supremum) may fail to exist within that set. We can construct an example.
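Example. Consider $S = \{x \in \mathbb{Q} : -\sqrt{2} < x < \sqrt{2}\}$. Note that $\sqrt{2}$ cannot be the supremum in $\mathbb{Q}$, because $\sqrt{2} \notin \mathbb{Q}$ and we, by construction, only consider the subset of rationals in the interval $(-\sqrt{2}, \sqrt{2})$. Let us take $y \in \mathbb{Q}$ just slightly smaller than $\sqrt{2}$ and assume that $b^* = y < \sqrt{2}$. However, by picking a large enough $n \in \mathbb{N}$, we can define $y' = y + \frac{1}{n} < \sqrt{2}$. Because $y \in \mathbb{Q}$, $y' \in \mathbb{Q}$ as well, and $y' > y$, so $y$ does not bound all $x \in S$. By the same technique, we can construct $y'' \in \mathbb{Q}$ such that $y' < y'' < \sqrt{2}$. Therefore, for any $y \in S$ we can find another $y' \in S$ between $y$ and $\sqrt{2}$, so no rational number below $\sqrt{2}$ bounds all $x \in S$; at the same time, any rational upper bound $q > \sqrt{2}$ can be replaced by a smaller rational lying between $\sqrt{2}$ and $q$. Hence, there is no least upper bound in $\mathbb{Q}$. Clearly, if $S = \{x \in \mathbb{Q} : -2 \leq x \leq 2\}$, then $\sup S = \max S = 2$, because $2 \in \mathbb{Q}$.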
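The step $y' = y + \frac{1}{n}$ can be carried out with exact rational arithmetic using the standard-library `fractions.Fraction`, so no rounding enters the comparison with $\sqrt{2}$. This is a sketch under one stated choice: the hypothetical helper `closer_rational` takes the smallest $n \in \mathbb{N}$ that works. Comparing squares with 2 keeps every quantity rational, because for $y$ in $S$ the condition $y^2 < 2$ is equivalent to $-\sqrt{2} < y < \sqrt{2}$.

```python
from fractions import Fraction

def closer_rational(y: Fraction) -> Fraction:
    """For a rational y with y**2 < 2, return y + 1/n for the smallest n in N
    such that (y + 1/n)**2 < 2, i.e. a larger rational that is still in S."""
    if y * y >= 2:
        raise ValueError("y must satisfy y**2 < 2")
    n = 1
    while (y + Fraction(1, n)) ** 2 >= 2:
        n += 1
    return y + Fraction(1, n)

y = Fraction(1)
for _ in range(3):
    y_next = closer_rational(y)
    assert y < y_next and y_next ** 2 < 2   # strictly larger, yet still below sqrt(2)
    print(y_next, float(y_next))
    y = y_next
```

Each pass produces a larger rational in $S$, which mirrors the argument that no $y < \sqrt{2}$ can serve as a rational upper bound.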
Similarly to the least upper bound, we can define the greatest lower bound. In particular, a non-empty set $S$ of real numbers that is bounded from below has a greatest lower bound. It is often called the infimum of $S$ and we denote it
$$c^* = \inf S \quad \text{or} \quad c^* = \inf_{x \in S} x.$$
The set $S$ is bounded from below if there is a real number $c$ such that $x \geq c$ for all $x \in S$. The number $c$ is called a lower bound of $S$. The greatest lower bound of $S$ is a real number $c^*$ such that it is a lower bound and $c^* \geq c$ for every other lower bound $c$. We formulate the definition using two qualifying conditions.
Definition: $c^*$ is the greatest lower bound of $S$ if and only if the two following conditions hold:
• for all $x \in S$ it follows that $x \geq c^*$;
• for all $\varepsilon > 0$, there exists $x \in S$ such that $x < c^* + \varepsilon$.
Example. Consider $S = [0, 5]$. Then $c = -1, -2, \dots$ are lower bounds and
$$\inf S = 0,$$
because $0 \geq c$ for all qualifying $c$ and $x \geq 0$ for all $x \in [0, 5]$. Note that in such a situation the infimum coincides with the minimum.
Now, consider $S = \{x \in \mathbb{R} : 0 < x \leq 5\}$. In other words, it is the interval $(0, 5]$, where 0 is not included. Again, $c = -1, -2, \dots$ are lower bounds. Then
$$\inf S = 0,$$
because $0 \geq c$ for all qualifying $c$ and $x > 0$ for all $x \in (0, 5]$, again. However, note that the minimum of $S$ does not exist, because 0 is not included in the set. Nevertheless, 0 is the largest number that bounds all $x \in (0, 5]$ from below. Therefore, again, the infimum of a set does not have to be included in that set.
1.3 Convex Sets
Geometric definition: a set $S \subseteq \mathbb{R}$ (or $\mathbb{R}^2, \mathbb{R}^3, \dots, \mathbb{R}^n$) is convex if the line segment between any two points in $S$ lies entirely within $S$.
The figure below provides examples.
Figure 6: The line segment between any two points in $S$ has to be in $S$
Typical examples of convex sets are (filled) discs, squares, triangles, etc. In other words, 'nice' sets. The empty set $\emptyset$ is always convex, and so are sets that contain a single element (singleton sets). The figure below provides more examples of such sets:
Figure 7: The first line represents convex sets, whereas the second line represents sets that are not convex
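Algebraic definition: a set $S \subseteq \mathbb{R}^n$ is a convex set if $\mathbf{z} = \lambda \mathbf{x} + (1 - \lambda) \mathbf{y} \in S$ for all $\mathbf{x}, \mathbf{y} \in S$ and $\lambda \in [0, 1]$.
The point $\mathbf{z}$ in the definition represents exactly a point on the line segment between the two points. If $\lambda \in (0, 1)$ and $\mathbf{x} \neq \mathbf{y}$, then $\mathbf{z}$ lies on the segment and cannot coincide with $\mathbf{x}$ or $\mathbf{y}$. However, we allow $\lambda \in [0, 1]$, so that the endpoints themselves are covered as well ($\lambda = 1$ gives $\mathbf{x}$ and $\lambda = 0$ gives $\mathbf{y}$). For a singleton set, $\mathbf{x} = \mathbf{y}$ and the condition holds trivially.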
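The algebraic definition suggests a simple numerical experiment. Draw pairs of points of $S$ and a random $\lambda \in [0, 1]$, then check whether $\mathbf{z} = \lambda \mathbf{x} + (1-\lambda)\mathbf{y}$ stays in $S$. This is a heuristic sketch, not a proof. Finding a violation proves that a set is not convex, while finding none is only evidence of convexity. The helper names, the sampling box $[-1,1]^2$ and the two example sets (a filled unit disc and a unit disc with a hole of radius 0.5) are illustrative assumptions.

```python
import random

def looks_convex(contains, sample, trials=20_000, seed=0):
    """Heuristic check of the algebraic definition of convexity.
    Returns False as soon as a counterexample is found (this proves non-convexity);
    returning True is only evidence of convexity, not a proof."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = sample(rng), sample(rng)
        if not (contains(x) and contains(y)):
            continue  # only pairs of points that belong to S matter
        lam = rng.random()
        z = tuple(lam * xi + (1 - lam) * yi for xi, yi in zip(x, y))
        if not contains(z):
            return False
    return True

def sample_box(rng):
    """Draw a point uniformly from the square [-1, 1] x [-1, 1]."""
    return (rng.uniform(-1, 1), rng.uniform(-1, 1))

def in_disc(p):
    """Filled unit disc: x1^2 + x2^2 <= 1 (a convex set)."""
    return p[0] ** 2 + p[1] ** 2 <= 1

def in_ring(p):
    """Unit disc with the open disc of radius 0.5 removed (not convex)."""
    return 0.25 <= p[0] ** 2 + p[1] ** 2 <= 1

print(looks_convex(in_disc, sample_box))
print(looks_convex(in_ring, sample_box))
```

For the ring, some chords between its points pass through the hole, so the check finds a counterexample. For the disc, no such point exists.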
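Theorem: the intersection of any finite number of convex sets is a convex set.
Proof. We begin by proving that $A_1 \cap A_2$ is convex if $A_1$ and $A_2$ are convex. Take $\mathbf{x}_1, \mathbf{x}_2 \in A_1 \cap A_2$ and let $\bar{\mathbf{x}} = \lambda \mathbf{x}_1 + (1 - \lambda) \mathbf{x}_2$ be located on the line segment between $\mathbf{x}_1$ and $\mathbf{x}_2$. Clearly, because $A_1$ is convex, $\bar{\mathbf{x}} \in A_1$. Also, $A_2$ is convex and so $\bar{\mathbf{x}} \in A_2$. This implies that $\bar{\mathbf{x}} \in A_1 \cap A_2$. In order to generalize, we can write
$$\bigcap_{i=1}^{n} A_i = \Big(\big((A_1 \cap A_2) \cap A_3\big) \cap \dots\Big) \cap A_n,$$
where we intersect pairwise and one member of each pair involves more and more sets. From the first part of the proof, we know that each such pairwise intersection is convex, so intersecting with one more $A_i$ preserves convexity (formally, the argument proceeds by induction on $n$).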
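1.4 Distances
Let $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and $\mathbf{y} = (y_1, y_2, \dots, y_n)$ be two points in $\mathbb{R}^n$. Then a function $d(\mathbf{x}, \mathbf{y}) : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is called a distance function or metric if the following holds:
• $d(\mathbf{x}, \mathbf{y}) \geq 0$;
• $d(\mathbf{x}, \mathbf{y}) = 0$ if and only if $\mathbf{x} = \mathbf{y}$;
• $d(\mathbf{x}, \mathbf{y}) = d(\mathbf{y}, \mathbf{x})$;
• $d(\mathbf{x}, \mathbf{y}) \leq d(\mathbf{x}, \mathbf{z}) + d(\mathbf{z}, \mathbf{y})$ for any $\mathbf{z} \in \mathbb{R}^n$ (triangle inequality).
Example. Consider the function $d(\mathbf{x}, \mathbf{y}) = 1$ if $\mathbf{x} \neq \mathbf{y}$ and $d(\mathbf{x}, \mathbf{y}) = 0$ if $\mathbf{x} = \mathbf{y}$ (the so-called discrete metric). We check the requirements:
• $d(\mathbf{x}, \mathbf{y}) \geq 0$: holds by definition of the function;
• $d(\mathbf{x}, \mathbf{y}) = 0$ if and only if $\mathbf{x} = \mathbf{y}$: holds by definition of the function;
• $d(\mathbf{x}, \mathbf{y}) = d(\mathbf{y}, \mathbf{x})$: holds, because $d(\mathbf{x}, \mathbf{y}) = d(\mathbf{y}, \mathbf{x}) = 1$ if $\mathbf{x} \neq \mathbf{y}$ and $d(\mathbf{x}, \mathbf{y}) = d(\mathbf{y}, \mathbf{x}) = 0$ if $\mathbf{x} = \mathbf{y}$;
• $d(\mathbf{x}, \mathbf{y}) \leq d(\mathbf{x}, \mathbf{z}) + d(\mathbf{z}, \mathbf{y})$: holds, which is seen by checking 5 cases:
  - $\mathbf{x} = \mathbf{y} = \mathbf{z}$: $d(\mathbf{x}, \mathbf{y}) = 0 \leq 0 + 0 = d(\mathbf{x}, \mathbf{z}) + d(\mathbf{z}, \mathbf{y})$;
  - $\mathbf{x} = \mathbf{y}$, but $\mathbf{z}$ is distinct: $d(\mathbf{x}, \mathbf{y}) = 0 \leq 1 + 1 = d(\mathbf{x}, \mathbf{z}) + d(\mathbf{z}, \mathbf{y})$;
  - $\mathbf{x} = \mathbf{z}$, but $\mathbf{y}$ is distinct: $d(\mathbf{x}, \mathbf{y}) = 1 \leq 0 + 1 = d(\mathbf{x}, \mathbf{z}) + d(\mathbf{z}, \mathbf{y})$;
  - $\mathbf{y} = \mathbf{z}$, but $\mathbf{x}$ is distinct: $d(\mathbf{x}, \mathbf{y}) = 1 \leq 1 + 0 = d(\mathbf{x}, \mathbf{z}) + d(\mathbf{z}, \mathbf{y})$;
  - $\mathbf{x}, \mathbf{y}, \mathbf{z}$ are all distinct: $d(\mathbf{x}, \mathbf{y}) = 1 \leq 1 + 1 = d(\mathbf{x}, \mathbf{z}) + d(\mathbf{z}, \mathbf{y})$.
Therefore, all the requirements are satisfied and this function is a metric on $\mathbb{R}^n$.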
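The four requirements can also be checked mechanically on a finite set of points. Every pair and every triple drawn from three distinct points covers all 5 cases of the triangle-inequality analysis above. This is a sketch: the helper names `discrete_metric` and `check_metric_axioms` and the sample points are hypothetical. A check over finitely many points illustrates the case analysis but does not replace it.

```python
from itertools import product

def discrete_metric(x, y):
    """d(x, y) = 1 if x != y and d(x, y) = 0 if x == y."""
    return 0 if x == y else 1

def check_metric_axioms(d, points):
    """Assert the four metric requirements for all pairs and triples from `points`."""
    for x, y in product(points, repeat=2):
        assert d(x, y) >= 0                     # non-negativity
        assert (d(x, y) == 0) == (x == y)       # d(x, y) = 0 iff x = y
        assert d(x, y) == d(y, x)               # symmetry
    for x, y, z in product(points, repeat=3):
        assert d(x, y) <= d(x, z) + d(z, y)     # triangle inequality
    return True

# Three distinct points (here tuples in R^2) generate all 5 cases listed above.
print(check_metric_axioms(discrete_metric, [(0.0, 0.0), (1.5, -2.0), (3.0, 4.0)]))
```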
1.5 Sequences and Convergence
In this subsection, we start considering the notion of the limit of a sequence as its index tends to infinity and combine it with distances. Most of the examples we consider here are related to sequences and distances in $\mathbb{R}$; however, the results hold in $\mathbb{R}^k$ for $k \in \mathbb{N}$.
Definition: a sequence $\{\mathbf{x}_n\}_{n=1}^{\infty} \in \mathbb{R}^k$ is a function that for each number $n \in \mathbb{N}$ yields a corresponding point $\mathbf{x}_n \in \mathbb{R}^k$.
In other words, $\{\mathbf{x}_n\}_{n=1}^{\infty}$ is an infinite sequence of ordered elements in $\mathbb{R}^k$, and there is some mechanism that generates $\mathbf{x}_n$. The point $\mathbf{x}_n$ is called the $n$-th element (or member) of the sequence. Note that we index the sequence with $n$ and indicate the dimension with $k$. This is because $n$ is the conventional index for sequences in the literature. However, it is just notation.
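Definition: a sequence $\{\mathbf{x}_n\}_{n=1}^{\infty} \in \mathbb{R}^k$ converges to a point $\mathbf{x} \in \mathbb{R}^k$ if
$$d(\mathbf{x}_n, \mathbf{x}) \to 0 \quad \text{as } n \to \infty.$$
Alternatively, for each $\varepsilon > 0$, there exists $n_0 \in \mathbb{N}$ such that $d(\mathbf{x}_n, \mathbf{x}) < \varepsilon$ for all $n > n_0$.
Here, $\mathbf{x}$ is called the limiting value (or the limit) of the sequence. We can write
$$\lim_{n \to \infty} \mathbf{x}_n = \mathbf{x}.$$
We say that the sequence diverges if it does not converge; for example, if
$$\lim_{n \to \infty} \mathbf{x}_n = \pm\infty \quad \text{element-wise}.$$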
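Example. Let $\{x_n\}_{n=1}^{\infty} = \{\frac{1}{n}\}_{n=1}^{\infty}$. Clearly,
$$d(x_n, 0) = \left|\frac{1}{n} - 0\right| = \frac{1}{n} \to 0$$
as $n \to \infty$. Alternatively, given any $\varepsilon > 0$, let us choose $n_0 \in \mathbb{N}$ such that $n_0 \geq \frac{1}{\varepsilon}$. Then, for all $n > n_0$, we have
$$d(x_n, 0) = \frac{1}{n} < \frac{1}{n_0} \leq \varepsilon.$$
(…) If $r > 1$, then the sum diverges, and if $r = 1$, the test is inconclusive. This test is called D'Alembert's criterion (the ratio test).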
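Example. Again, consider $\sum_{n=0}^{\infty} a^n$. Then
$$\lim_{n \to \infty} \left|\frac{a^{n+1}}{a^n}\right| = |a|.$$
Therefore, if $|a| < 1$, the sum converges, and if $|a| > 1$, the sum diverges. If $|a| = 1$, the test is inconclusive, but the sum diverges as well, because its terms do not tend to 0.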
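Now, consider $\sum_{n=1}^{\infty} n a^n$. Then
$$\lim_{n \to \infty} \left|\frac{(n+1) a^{n+1}}{n a^n}\right| = |a| \lim_{n \to \infty} \frac{n+1}{n} = |a|,$$
because $\lim_{n \to \infty} \frac{n+1}{n} = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right) = 1$. Therefore, the same rule applies and the sum converges if $|a| < 1$.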
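For a concrete value such as $a = 0.5$ (an illustrative choice), both the ratio of consecutive terms and the behaviour of the partial sums can be inspected numerically. The hypothetical helper `partial_sum` computes $\sum_{n=1}^{N} n a^n$. This is a sketch for intuition only, since printing finitely many partial sums does not prove convergence.

```python
def partial_sum(a, N):
    """Partial sum n * a**n for n = 1, ..., N."""
    return sum(n * a ** n for n in range(1, N + 1))

a = 0.5
# Ratio of consecutive terms: (n + 1) * a**(n + 1) / (n * a**n) = |a| * (n + 1) / n.
for n in (1, 10, 100):
    print(n, ((n + 1) * a ** (n + 1)) / (n * a ** n))

# Partial sums for growing N.
for N in (10, 50, 100):
    print(N, partial_sum(a, N))
```

The ratios approach $|a| = 0.5$ and the partial sums settle near 2. This agrees with the standard closed form $\sum_{n=1}^{\infty} n a^n = \frac{a}{(1-a)^2}$ for $|a| < 1$, which is quoted here without proof.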
1.6 Open, Closed and Compact Sets
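Definition: $B_r(\mathbf{a})$ is called an open ball in $\mathbb{R}^n$ around $\mathbf{a} \in \mathbb{R}^n$ with radius $r > 0$, and it is the set
$$B_r(\mathbf{a}) = \{\mathbf{x} \in \mathbb{R}^n : d(\mathbf{x}, \mathbf{a}) < r\}.$$
Definition: a set $S \subseteq \mathbb{R}^n$ is open if for every $\mathbf{a} \in S$ there exists $r > 0$ such that $B_r(\mathbf{a}) \subseteq S$.
It means that a set is open if it consists only of interior points, i.e. the boundary is not included. In this way, around every point of the set we can always find $r > 0$ such that the open ball lies inside that set.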
Definition: a point $\mathbf{a} \in \mathbb{R}^n$ is called a boundary point of the set $S \subseteq \mathbb{R}^n$ if every $B_r(\mathbf{a})$ contains at least one point in $S$ and at least one point in $S^c$ (the complement of $S$).
The figure below depicts the definition.
Figure 9: The open ball around a boundary point includes points in the set $S$ and in $S^c$
Looking at the same example of a unit circle centered at $(0, 0)$, it is clear that an open ball around a boundary point includes points in $S$ and in $S^c$ for any $r > 0$.
Definition: a set $S \subseteq \mathbb{R}^n$ is closed if it contains all its boundary points.
Example. An example of a closed set in $\mathbb{R}^2$ is the set in Figure 8 with its boundary (the circle) included. An example in $\mathbb{R}$ is an interval $[a, b]$, where the boundaries are included.
Definition: a set $S \subseteq \mathbb{R}^n$ is bounded if there exists $c \in \mathbb{N}$ such that $d(\mathbf{x}, \mathbf{y}) < c$ for all $\mathbf{x}, \mathbf{y} \in S$.
Intuitively, bounded sets do not 'contain' infinity. For example, $S_1 = (0, 1]$ is bounded, but $S_2 = (0, \infty)$ is not bounded. The following notion of a compact set is very important, because it will help us to guarantee that a function attains its maximum/minimum in the context of optimization (see Chapter 6).
Definition: a set $S \subseteq \mathbb{R}^n$ is compact if and only if it is closed and bounded.
Example. The following set is compact:
$$S = \{\mathbf{x} \in \mathbb{R}^2 : 0 \leq x_1 \leq 5, \ 0 \leq x_2 \leq 5\}.$$
Note that it is a square (in $\mathbb{R}^2$) with edge length 5. Because of the weak inequalities ($\leq$), the boundary points are included in $S$. Visually:
Figure 10: Square in $\mathbb{R}^2$ with included boundary points constitutes a compact set
Note that if we change $S$ to
$$S' = \{\mathbf{x} \in \mathbb{R}^2 : 0 \leq x_1 < 5, \ 0 \leq x_2 < 5\},$$
it is not compact anymore, because half of the boundary points of $S'$ (the edges $x_1 = 5$ and $x_2 = 5$) are not included, even though it is bounded.
Note that the empty set $\emptyset$ and $\mathbb{R}^n$ for $n \geq 1$ are both open and closed.
2 Chapter 2: Functions of a Single Variable
In this chapter, we cover the theory of elementary functions of a single variable. In the upcoming chapters, we will generalize it to functions of multiple variables.
2.1 Functions, Domain and Codomain
Definition: a function is a rule that for each $x \in A$ assigns exactly one element $y \in B$.
A function is usually denoted $y = f(x)$ or $f : A \to B$ (a function from $A$ to $B$). One says that $f$ maps $x$ to $y$. The set $A$ is called the domain of the function $f$ and the set $B$ is called the codomain of the function $f$.
The domain of a function $f : A \to B$ is the set of input values $x$ for which the function is defined. It may contain all possible values of $x$, e.g.
$$f(x) = \sqrt{x}, \quad A = \{x \in \mathbb{R} : x \geq 0\}.$$
However, it is possible to define a domain that does not include all possible values of $x$, such as
$$f(x) = \sqrt{x}, \quad A = \{x \in \mathbb{R} : x \geq 3\}.$$
The domain of a function $f : A \to B$ is often denoted as $D_f$.
The codomain of a function $f : A \to B$ is the set $B$ into which the output values $f(x)$ are constrained to fall.
The range of a function $f : A \to B$ is usually denoted by $V_f$, where
$$V_f = \{y \in B : y = f(x) \text{ for some } x \in A\}.$$
Example. Let $y = f(x) = x^2$, $A = \mathbb{R}$, $B = \mathbb{R}$. What is $V_f$? Clearly,
$$V_f = \{y \in B : y = x^2 \text{ for some } x \in A\} = [0, \infty) = \mathbb{R}_+,$$
which means that $V_f = \mathbb{R}_+ \subset \mathbb{R} = B$. Hence, some elements in $B$ may not be in $V_f$.
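The precise definition of a function requires:
• the rule $f$;
• the domain $A$;
• the codomain $B$.
Example. Let
$$f(x) = x^2, \quad A = \mathbb{R}, \quad B = \mathbb{R};$$
$$g(x) = x^2, \quad A = [0, 1), \quad B = \mathbb{R}.$$
Although the mapping rules are the same, these functions are different, because their domains differ.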
2.2 Injective, Bijective and Surjective Functions
Definition: a function is called injective (or one-to-one) if every element of the codomain is mapped to by at most one element of the domain, or each $y \in B$ is the image of at most one $x \in A$.
Figure 11: Injective function. Source: onlinemathdall.com
If a function is injective, then $f(x) = f(x') \Rightarrow x = x'$ for all $x, x' \in A$.
Example. Let $f(x) = x^2$. Also, $D_f = B = \mathbb{R}$. Then
$$f(3) = f(-3) = 9.$$
However, $-3 \neq 3$. Therefore, $f$ is not injective.
Definition: a function is surjective (or onto) if every element of the codomain is mapped to by at least one element of the domain, or each $y \in B$ is the image of at least one $x \in A$.
Figure 12: Surjective function. Source: onlinemathdall.com
As the definition implies, $V_f = B$ if $f$ is surjective.
Example. Let $f(x) = x^2$. Also, $A = B = \mathbb{R}$. Clearly, $V_f = \mathbb{R}_+$, so $V_f \neq B$; thus the function is not surjective. However, it can be made surjective if we redefine $B$. Setting $B = \mathbb{R}_+$, the function becomes surjective.
Definition: a function is bijective (or one-to-one and onto) if every element of the codomain is mapped to by exactly one element of the domain, or each $y \in B$ is the image of exactly one $x \in A$.
The definition means that the function is both injective and surjective.
Figure 13: Bijective function. Source: onlinemathdall.com
Hence, a bijective function always satisfies $V_f = B$; whether its domain is equal to its codomain, however, depends on the mapping $f$.
Example. Let $f(x) = x^2$. Also, $A = B = \mathbb{R}_+$. Then $f$ is bijective.
Example. Let $f(x) = e^x$ and $A = \mathbb{R}$, $B = \mathbb{R}$. This function is not bijective, because not all members of the codomain are mapped to by exactly one member of the domain (the non-positive numbers are not mapped to at all). However, if we let $B = \mathbb{R}_{++}$, then the definition is satisfied.
Therefore, we can summarize in the following way:
Injective and Surjective $\Rightarrow$ Bijective;
Injective, but not Surjective $\Rightarrow$ Injective only;
Surjective, but not Injective $\Rightarrow$ Surjective only;
Non-injective and Non-surjective $\Rightarrow$ neither Injective nor Surjective.
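Example. What functions are these?
$$f(x) = x^4, \quad A = \mathbb{R}, \quad B = \mathbb{R},$$
$$g(x) = x^4, \quad A = \mathbb{R}, \quad B = \mathbb{R}_+,$$
$$h(x) = x^4, \quad A = \mathbb{R}_+, \quad B = \mathbb{R},$$
$$u(x) = x^4, \quad A = \mathbb{R}_+, \quad B = \mathbb{R}_+.$$
Hence, $f(x)$ is neither, $g(x)$ is surjective, $h(x)$ is injective and $u(x)$ is bijective.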
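On finite sets, both properties can be checked directly from their definitions. The sketch below is a finite analogue of this example, since $\mathbb{R}$ and $\mathbb{R}_+$ cannot be enumerated. The small sets standing in for them, and the helper names, are hypothetical choices. The value $-1$ in the stand-in codomain plays the role of the negative numbers that $x^4$ never reaches.

```python
def is_injective(f, A):
    """True if no two elements of A share an image, i.e. f(x) = f(x') implies x = x'."""
    images = [f(x) for x in A]
    return len(images) == len(set(images))

def is_surjective(f, A, B):
    """True if every y in B is the image of at least one x in A."""
    return set(B) <= {f(x) for x in A}

def classify(f, A, B):
    injective, surjective = is_injective(f, A), is_surjective(f, A, B)
    if injective and surjective:
        return "bijective"
    if injective:
        return "injective only"
    if surjective:
        return "surjective only"
    return "neither"

def quartic(x):
    return x ** 4

# Hypothetical finite stand-ins for R and R_+ (domain side and codomain side).
R_dom, Rplus_dom = [-2, -1, 0, 1, 2], [0, 1, 2]
R_cod, Rplus_cod = [-1, 0, 1, 16], [0, 1, 16]

print(classify(quartic, R_dom, R_cod))          # analogue of f
print(classify(quartic, R_dom, Rplus_cod))      # analogue of g
print(classify(quartic, Rplus_dom, R_cod))      # analogue of h
print(classify(quartic, Rplus_dom, Rplus_cod))  # analogue of u
```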
If a function is bijective, then $f$ has an inverse, usually denoted by $f^{-1}$. Note that this is the inverse of the mapping rule and not the reciprocal of the value obtained after the mapping. The latter is denoted by $f(x)^{-1} = \frac{1}{f(x)}$. The figure below illustrates $f^{-1}$.
Figure 14: Bijective function has an inverse
Example. Let $f(x) = x^2$, $A = \mathbb{R}_+$, $B = \mathbb{R}_+$. Clearly, it is bijective. Let us find $f^{-1}$:
$$y = f(x) = x^2 \Rightarrow x = f^{-1}(y) = \sqrt{y},$$
where $-\sqrt{y}$ is ruled out because $x \geq 0$. Therefore, the inverse is
$$g(x) = \sqrt{x}.$$
Visually:
Figure 15: Inverse functions are symmetric around the 45° line
Note that $g$ and $f$ are symmetric around the 45° ($y = x$) line.
2.3 Convex and Concave Functions
Definition: a function $f$ is called concave (convex) if it is defined on a convex set and the line segment joining any two points on the graph is always below (above) the graph.
A function $f$ is strictly concave (convex) if the line segment joining any two distinct points on the graph lies strictly below (above) the graph, except at its endpoints. The figure below depicts such a case.
Figure 16: Strictly convex and concave functions
Note that linear functions are both convex and concave, because their graphs constitute line
segments themselves.
Convexity and concavity of $f$ can be expressed algebraically as well. In particular, we employ Jensen's inequality. Let $S \subseteq \mathbb{R}$ (or $\mathbb{R}^2, \mathbb{R}^3, \dots$) be a convex set. A function $f$ is convex on $S$ if
$$f(\lambda x + (1 - \lambda) y) \leq \lambda f(x) + (1 - \lambda) f(y)$$
for all $x, y \in S$ and $\lambda \in [0, 1]$. Similarly, $f$ is concave on $S$ if
$$f(\lambda x + (1 - \lambda) y) \geq \lambda f(x) + (1 - \lambda) f(y).$$
When strict convexity and concavity are under consideration, the weak inequalities are replaced with strict inequalities ($<$ and $>$, respectively) for all $x \neq y$ and $\lambda \in (0, 1)$.
A function $f$ is convex (concave) on $S$ if and only if $-f$ is concave (convex) on $S$. Also, $f$ is strictly convex (strictly concave) on $S$ if and only if $-f$ is strictly concave (strictly convex) on $S$.
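Example. Consider $f(x) = 1 - x^2$. Let us show that it is concave on $\mathbb{R}$. Let $a$ and $b$ be two arbitrary points. By definition, we need
$$f(\lambda a + (1 - \lambda) b) \geq \lambda f(a) + (1 - \lambda) f(b).$$
Let us start with the left-hand side and insert the function into the definition:
$$f(\lambda a + (1 - \lambda) b) = 1 - (\lambda a + (1 - \lambda) b)^2 = 1 - \lambda^2 a^2 - 2\lambda(1 - \lambda) ab - (1 - \lambda)^2 b^2.$$
At the same time, the right-hand side gives:
$$\lambda f(a) + (1 - \lambda) f(b) = \lambda(1 - a^2) + (1 - \lambda)(1 - b^2).$$
Now, we compare according to Jensen's inequality:
$$1 - \lambda^2 a^2 - 2\lambda(1 - \lambda) ab - (1 - \lambda)^2 b^2 \geq \lambda(1 - a^2) + (1 - \lambda)(1 - b^2)$$
$$\Leftrightarrow 1 - \lambda^2 a^2 - 2\lambda(1 - \lambda) ab - (1 - \lambda)^2 b^2 \geq \lambda - \lambda a^2 + 1 - b^2 - \lambda + \lambda b^2$$
$$\Leftrightarrow \lambda a^2 - \lambda^2 a^2 + \lambda b^2 - \lambda^2 b^2 - 2\lambda ab + 2\lambda^2 ab \geq 0$$
$$\Leftrightarrow \lambda(1 - \lambda) a^2 + \lambda(1 - \lambda) b^2 - \lambda(1 - \lambda) 2ab \geq 0$$
$$\Leftrightarrow \lambda(1 - \lambda)(a^2 - 2ab + b^2) \geq 0$$
$$\Leftrightarrow \lambda(1 - \lambda)(a - b)^2 \geq 0,$$
which always holds for $\lambda \in [0, 1]$; thus $f$ is concave. Moreover, for $\lambda \in (0, 1)$ and $a \neq b$, the left-hand side of the last line is strictly positive, so $f$ is also strictly concave.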
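The derivation can be spot-checked numerically. For random $a$, $b$ and $\lambda$, the gap between the two sides of Jensen's inequality should equal $\lambda(1-\lambda)(a-b)^2$ up to floating-point rounding, and it should never be negative. The sampling range, the seed and the tolerance in this sketch are arbitrary illustrative choices. The check supports the algebra but does not replace it.

```python
import random

def f(x):
    return 1 - x ** 2

rng = random.Random(42)
for _ in range(1000):
    a, b = rng.uniform(-10, 10), rng.uniform(-10, 10)
    lam = rng.random()
    lhs = f(lam * a + (1 - lam) * b)
    rhs = lam * f(a) + (1 - lam) * f(b)
    gap = lam * (1 - lam) * (a - b) ** 2       # closed form of LHS - RHS derived above
    assert abs((lhs - rhs) - gap) < 1e-9       # agreement up to rounding
    assert lhs >= rhs - 1e-9                   # Jensen's inequality for a concave f
print("all checks passed")
```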
2.4 Continuity
Definition: a function $f$ with domain $A \subseteq \mathbb{R}$ is continuous at a point $a \in A$ if for any given $\varepsilon > 0$ there exists $\delta > 0$ such that $|f(x) - f(a)| < \varepsilon$ for all $x \in A$ with $|x - a| < \delta$; in other words, if $x \to a$, then $f(x) \to f(a)$.
Roughly speaking, $f$ is continuous if small changes in $x$ cause small changes in $f(x)$, which implies no abrupt 'jumps' in the function. If $f$ is continuous at every point $a \in A$, then we say that it is continuous on $A$.
Equivalently, we have
• $f(x) \to f(a)$ as $x \to a$;
• $\lim_{x \to a} f(x) = f(a)$.
To check if the function is continuous at $x = a$, we need to check 3 steps:
• the function $f$ is defined at $x = a$;
• $\lim_{x \to a} f(x)$ exists;
• $\lim_{x \to a} f(x) = f(a)$, i.e. the limit coincides with the functional value.
Example. Is $f(x) = 4x^2$ continuous at $x = 1$?
• Clearly, $f$ is defined at $x = 1$ and it gives $f(1) = 4$;
• $\lim_{x \to 1} f(x) = \lim_{x \to 1} 4x^2 = 4$, hence the limit exists;
• $\lim_{x \to 1} f(x) = 4 = f(1)$, thus the limit and the functional value coincide.
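Example. Is $f(x) = \frac{\sin x}{x}$ continuous at $x = 0$? Clearly not, because $f(0)$ is undefined. Now, let us define
$$g(x) = \begin{cases} \frac{\sin x}{x} & \text{if } x \neq 0, \\ 0 & \text{if } x = 0. \end{cases}$$
Now, $g(0) = 0$; however, we can show that $\lim_{x \to 0} g(x) = 1$, because $\lim_{x \to 0} \frac{\sin x}{x} = 1$ (we will prove this in an upcoming chapter). Thus, $\lim_{x \to 0} g(x) \neq g(0)$. Hence, $g$ is not continuous at $x = 0$. Lastly, let us define
$$u(x) = \begin{cases} \frac{\sin x}{x} & \text{if } x \neq 0, \\ 1 & \text{if } x = 0. \end{cases}$$
Clearly, $u$ is continuous at $x = 0$, because $u(0)$ exists and it is 1. Also, $\lim_{x \to 0} u(x) = 1$, so $\lim_{x \to 0} u(x) = u(0)$.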
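A quick numerical look supports the claim that $\frac{\sin x}{x}$ approaches 1 from both sides, so that only the value assigned at $x = 0$ decides continuity there. The sketch below uses the standard `math.sin`. Evaluating at a few small $h$ is evidence, not a proof of the limit, which the notes establish later.

```python
import math

def g(x):
    """sin(x)/x for x != 0, redefined as 0 at x = 0 (not continuous there)."""
    return math.sin(x) / x if x != 0 else 0.0

def u(x):
    """sin(x)/x for x != 0, redefined as 1 at x = 0 (continuous there)."""
    return math.sin(x) / x if x != 0 else 1.0

for h in (1e-1, 1e-3, 1e-6):
    print(h, g(h), g(-h))   # both one-sided values approach 1
print(g(0.0), u(0.0))       # only u matches the limiting value at 0
```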
Example. Let
$$f(x) = \begin{cases} x + 1 & \text{if } x < 0, \\ x & \text{if } x \geq 0. \end{cases}$$
There is a jump in the mapping rule itself. Clearly, $f(0) = 0$; however, the limit at $x = 0$ does not exist. This is so because
$$\lim_{x \to 0^-} f(x) = 1, \qquad \lim_{x \to 0^+} f(x) = 0,$$
where the respective limits represent limits from the left and from the right. They do not coincide.
Definition: the limit of $f(x)$ as $x$ tends to $c \in A$ from below (or from the left) is called the left limit: $\lim_{x \to c^-} f(x) = b$; the limit of $f(x)$ as $x$ tends to $c$ from above (or from the right) is called the right limit: $\lim_{x \to c^+} f(x) = d$.
The necessary and sufficient condition for the limit to exist at $x = a$ is that the two one-sided limits at $x = a$ exist and coincide. Formally,
$$\lim_{x \to a} f(x) = L \Leftrightarrow \lim_{x \to a^-} f(x) = L = \lim_{x \to a^+} f(x).$$
Example. Let $f(x) = \frac{1}{x}$. Clearly,
$$\lim_{x \to 0^-} f(x) = -\infty, \qquad \lim_{x \to 0^+} f(x) = +\infty.$$
Infinity is not a number, hence it is not a limit, which means that neither the left nor the right limit exists.
2.5 Intermediate Value Theorem
Let $f : D_f \to \mathbb{R}$ be continuous on $[a, b]$. Then, if $y$ is a number such that $\min\{f(a), f(b)\} \leq y \leq \max\{f(a), f(b)\}$, there exists $c \in [a, b]$ such that $f(c) = y$.
This theorem says that a continuous function on $[a, b]$ takes all the values between $f(a)$ and $f(b)$, i.e. the function takes all the 'intermediate' values. The Intermediate Value Theorem (IVT) is an existence theorem: it is silent about how to find $c$. Also, $c$ may not be unique, since the IVT only tells us that there exists at least one such $c$.
Example. Let h(x) = #5 + = +x—3on [—1,2]. Show that there exists at least one solution of
$h(x) = 0$ on $\mathbb{R}$. To show this, we need to check whether 0 lies between $h(-1)$ and $h(2)$. Clearly, $h(-1) = -43 < 0 < h(2) = 38$. Note that $h$ is continuous; therefore, by the Intermediate Value Theorem, there exists $c \in [-1, 2]$ such that $h(c) = 0$. This guarantees at least one solution.
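The IVT only asserts that some $c$ exists. A standard way to actually locate such a $c$ is the bisection method, which repeatedly halves $[a, b]$ while keeping a sign change inside the current interval. The sketch below applies it to an illustrative function $p(x) = x^3 - 2$ on $[0, 2]$. This function is an assumption chosen for demonstration and is not the $h$ of the example above. The helper name `bisect_root` and the tolerance are hypothetical choices.

```python
def bisect_root(f, a, b, tol=1e-10):
    """Locate c in [a, b] with f(c) = 0, assuming f is continuous on [a, b]
    and f(a), f(b) have opposite signs (so the IVT guarantees such a c)."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m
        if fa * fm < 0:   # sign change in [a, m]: keep the left half
            b = m
        else:             # sign change in [m, b]: keep the right half
            a, fa = m, fm
    return (a + b) / 2

# Illustrative (hypothetical) example: p(x) = x**3 - 2 on [0, 2], with p(0) < 0 < p(2).
c = bisect_root(lambda x: x ** 3 - 2, 0.0, 2.0)
print(c)   # close to the cube root of 2
```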
3 Chapter 3: Differential Calculus (Single Variable)
In this chapter, we develop differential calculus for functions of a single variable rigorously. In Chapter 5, we generalize these notions to multivariate functions.
3.1 Derivatives
Definition: for $h > 0$, the derivative of a function $f$ at a point $a$ is given by:
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}.$$
If this limit exists, then the derivative at $a$ exists. The derivative represents the instantaneous rate of change of $f$ at the point $a$.
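Example. Let us find the derivative at $x$ when $f(x) = x^2$. For $h > 0$,
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{(x + h)^2 - x^2}{h} = \lim_{h \to 0} \frac{x^2 + 2xh + h^2 - x^2}{h}$$
$$= \lim_{h \to 0} \frac{2xh + h^2}{h} = \lim_{h \to 0}(2x + h) = 2x.$$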
Note that in the example above, we implicitly have the limit from the right. However, we can also obtain the limit from the left:
$$f'(x) = \lim_{h \to 0} \frac{f(x) - f(x - h)}{h} = \lim_{h \to 0} \frac{x^2 - (x - h)^2}{h} = \lim_{h \to 0} \frac{x^2 - x^2 + 2xh - h^2}{h} = \lim_{h \to 0}(2x - h) = 2x.$$
Therefore, for the derivative to exist at a specific point in the domain of $f$, both limits have to exist and coincide.
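The two one-sided limits can be watched numerically: for $f(x) = x^2$ the right-hand quotient equals $2x + h$ and the left-hand quotient equals $2x - h$, so both close in on $f'(x) = 2x$ as $h$ shrinks. A minimal sketch; the point $x = 3$ and the step sizes are illustrative choices.

```python
def f(x):
    return x**2


def forward_quotient(func, x, h):
    # (f(x + h) - f(x)) / h with h > 0: the right-hand difference quotient
    return (func(x + h) - func(x)) / h


def backward_quotient(func, x, h):
    # (f(x) - f(x - h)) / h with h > 0: the left-hand difference quotient
    return (func(x) - func(x - h)) / h


x = 3.0
for h in (1e-1, 1e-2, 1e-3):
    # For f(x) = x^2 these equal 2x + h and 2x - h, so both approach f'(3) = 6
    print(h, forward_quotient(f, x, h), backward_quotient(f, x, h))
```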
Notation. The derivative of $y = f(x)$ wrt (with respect to) $x$ is usually denoted by $f'(x)$ or $y'$, which is Lagrange's notation. Alternatively, Leibniz's notation is used:
$$\frac{dy}{dx}, \quad \frac{df(x)}{dx} \quad \text{or} \quad \frac{d}{dx}f(x).$$
Whenever we consider functions of a single variable, we will stick to $f'(x)$ most of the time. However, we will switch to Leibniz's notation in Chapter 5, where partial derivatives are considered. At the same time, Leibniz's notation is reasonable, because it always reminds us that the derivative is a limiting ratio.
The table below provides derivatives of some elementary functions:

Function name | Function | Derivative
Polynomial | $f(x) = a_n x^n + a_{n-1}x^{n-1} + \dots + a_1 x + a_0$ | $f'(x) = n a_n x^{n-1} + (n-1)a_{n-1}x^{n-2} + \dots + a_1$
Rational | $f(x) = \frac{P(x)}{Q(x)}$ | $f'(x) = \frac{P'(x)Q(x) - P(x)Q'(x)}{Q(x)^2}$
Power | $f(x) = x^a$ | $f'(x) = a x^{a-1}$
Exponential | $f(x) = e^x$ | $f'(x) = e^x$
Logarithmic | $f(x) = \ln x$ | $f'(x) = \frac{1}{x}$
Sine | $f(x) = \sin x$ | $f'(x) = \cos x$
Cosine | $f(x) = \cos x$ | $f'(x) = -\sin x$
Tangent | $f(x) = \tan x$ | $f'(x) = \frac{1}{\cos^2 x}$

Figure 17: Derivatives of some elementary functions
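In a similar fashion, we define the second derivative at $a$:
$$f''(a) = \lim_{h \to 0} \frac{f'(a + h) - f'(a)}{h} = \lim_{h \to 0} \frac{\frac{f(a + 2h) - f(a + h)}{h} - \frac{f(a + h) - f(a)}{h}}{h} = \lim_{h \to 0} \frac{f(a + 2h) - 2f(a + h) + f(a)}{h^2}.$$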
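Example. Let us find the second derivative at $x$ when $f(x) = x^2$. In particular,
$$f''(x) = \lim_{h \to 0} \frac{f(x + 2h) - 2f(x + h) + f(x)}{h^2} = \lim_{h \to 0} \frac{x^2 + 4xh + 4h^2 - 2x^2 - 4xh - 2h^2 + x^2}{h^2} = \lim_{h \to 0} \frac{2h^2}{h^2} = 2.$$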
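The second-difference quotient from the definition can also be evaluated directly. For $f(x) = x^2$ the algebra above shows it equals exactly $2$ for every $h$, so the printed values differ from $2$ only by floating-point rounding (taking $h$ far smaller, e.g. $10^{-8}$, would let rounding dominate). A minimal sketch; the point $x = 3$ is an illustrative choice.

```python
def f(x):
    return x**2


def second_difference(func, a, h):
    # (f(a + 2h) - 2 f(a + h) + f(a)) / h^2: the expression inside the limit for f''(a)
    return (func(a + 2 * h) - 2 * func(a + h) + func(a)) / h**2


for h in (1e-1, 1e-2):
    print(h, second_difference(f, 3.0, h))
```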
Notation. The second derivative of $y = f(x)$ wrt $x$ is usually denoted by $f''(x)$ or $y''$, which is Lagrange's notation. Alternatively, Leibniz's notation is used:
$$\frac{d^2y}{dx^2}, \quad \frac{d^2 f(x)}{dx^2} \quad \text{or} \quad \frac{d^2}{dx^2}f(x).$$
We consider three very important rules of taking derivatives. In particular, the product rule is the following. Define $u(x) = f(x)g(x)$, where $f(x)$ and $g(x)$ are differentiable at $x$. Then
$$u'(x) = f'(x)g(x) + f(x)g'(x).$$
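Proof. We start by considering the definition and manipulating it by adding and subtracting the same term $f(x + h)g(x)$, which does not change anything:
$$u'(x) = \lim_{h \to 0} \frac{f(x + h)g(x + h) - f(x)g(x)}{h} = \lim_{h \to 0} \frac{f(x + h)g(x + h) - f(x + h)g(x) + f(x + h)g(x) - f(x)g(x)}{h}.$$
Note that we can distribute the numerator and split the limit:
$$u'(x) = \lim_{h \to 0} \frac{f(x + h)\big(g(x + h) - g(x)\big) + g(x)\big(f(x + h) - f(x)\big)}{h}$$
$$= \lim_{h \to 0} f(x + h) \cdot \lim_{h \to 0} \frac{g(x + h) - g(x)}{h} + g(x)\lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = f(x)g'(x) + g(x)f'(x),$$
because $\lim_{h \to 0} f(x + h) = f(x)$ due to the continuity of $f$ at $x$ (differentiability at $x$ implies continuity at $x$; see Section 3.2).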
Another important rule is the chain rule. Define $u(x) = f(g(x))$, where $g$ is differentiable at $x$ and $f$ is differentiable at $g(x)$. Then
$$u'(x) = f'(g(x))\,g'(x).$$
Proof*. This proof is lengthy and quite messy, therefore you can skip it. However, reading it is useful, because it puts quite a few of the concepts that we discussed into action. By definition we have
$$g'(x) = \lim_{h \to 0} \frac{g(x + h) - g(x)}{h}.$$
The existence of the result above implies that
$$\lim_{h \to 0} \left(\frac{g(x + h) - g(x)}{h} - g'(x)\right) = 0.$$
Therefore, we define an auxiliary function
$$v(h) = \begin{cases} \frac{g(x + h) - g(x)}{h} - g'(x) & \text{if } h \neq 0, \\ 0 & \text{if } h = 0. \end{cases}$$
Observe that $\lim_{h \to 0} v(h) = 0 = v(0)$, hence it is continuous at $h = 0$. Assuming that $h \neq 0$, we can write
$$g(x + h) = g(x) + h\big(v(h) + g'(x)\big),$$
but note that this expression is valid for any $h$, including $0$. Because $f$ is differentiable at the point $z$ (later, $z = g(x)$), we can define a similar auxiliary function:
$$w(k) = \begin{cases} \frac{f(z + k) - f(z)}{k} - f'(z) & \text{if } k \neq 0, \\ 0 & \text{if } k = 0, \end{cases}$$
where $\lim_{k \to 0} w(k) = 0 = w(0)$, thus it is continuous at $0$. Note that here $z$ and $k$ are the so-called placeholders. They are some generic variables and we will 'match' them with the variables we need. Similarly, assuming that $k \neq 0$, we obtain
$$f(z + k) = f(z) + k\big(w(k) + f'(z)\big),$$
which is again valid for $k = 0$ as well. By definition, we obtain
$$\frac{df(g(x))}{dx} = \lim_{h \to 0} \frac{f(g(x + h)) - f(g(x))}{h},$$
where we focus on the numerator. Indeed, using the result for $g(x + h)$, we get
$$f(g(x + h)) - f(g(x)) = f\big(g(x) + h(v(h) + g'(x))\big) - f(g(x)).$$
At this stage we do the 'matching': $z = g(x)$ and $k = h\big(v(h) + g'(x)\big)$. This translates to
$$f(z + k) = f\big(g(x) + h(v(h) + g'(x))\big) = f(g(x)) + h\big(v(h) + g'(x)\big)\big(w(k) + f'(g(x))\big).$$
Inserting this back leads to
$$f(g(x + h)) - f(g(x)) = h\big(v(h) + g'(x)\big)\big(w(k) + f'(g(x))\big),$$
because $f(g(x))$ cancels out. However, this is just the numerator in the definition of the derivative, so
$$\frac{df(g(x))}{dx} = \lim_{h \to 0} \frac{h\big(v(h) + g'(x)\big)\big(w(k) + f'(g(x))\big)}{h} = \lim_{h \to 0} \big(v(h) + g'(x)\big)\big(w(k) + f'(g(x))\big)$$
$$= \Big(\lim_{h \to 0} v(h) + g'(x)\Big)\Big(\lim_{h \to 0} w(k) + f'(g(x))\Big).$$
Clearly, $\lim_{h \to 0} v(h) = 0 = v(0)$ and $\lim_{h \to 0} k = \lim_{h \to 0} h\big(v(h) + g'(x)\big) = 0$. Because $w$ is continuous at $k = 0$, we have $\lim_{h \to 0} w(k) = w\big(\lim_{h \to 0} k\big) = w(0) = 0$. Therefore,
$$\frac{df(g(x))}{dx} = \lim_{h \to 0} \frac{f(g(x + h)) - f(g(x))}{h} = f'(g(x))\,g'(x).$$
The final rule is the quotient rule, which corresponds to the 'Rational' entry in Figure 17. In particular, given $f(x) = \frac{P(x)}{Q(x)}$,
$$f'(x) = \frac{Q(x)P'(x) - Q'(x)P(x)}{Q(x)^2}.$$
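Proof. For this proof, we use another rule from Figure 17, namely the derivative of the logarithm, in addition to the chain rule. Assume that $P(x) > 0$ and $Q(x) > 0$, so that the logarithms are defined (the general case follows by applying the product and chain rules to $P(x) \cdot Q(x)^{-1}$). Indeed,
$$\ln f(x) = \ln P(x) - \ln Q(x).$$
By the chain rule:
$$\frac{f'(x)}{f(x)} = \frac{P'(x)}{P(x)} - \frac{Q'(x)}{Q(x)} \implies f'(x) = f(x)\left(\frac{P'(x)}{P(x)} - \frac{Q'(x)}{Q(x)}\right) = \frac{P(x)}{Q(x)}\left(\frac{P'(x)}{P(x)} - \frac{Q'(x)}{Q(x)}\right)$$
$$= \frac{P'(x)}{Q(x)} - \frac{P(x)Q'(x)}{Q(x)^2} = \frac{Q(x)P'(x) - P(x)Q'(x)}{Q(x)^2}.$$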
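A quick way to build confidence in the three rules is to compare each formula with a numerical derivative. The sketch below uses the symmetric difference quotient $\frac{f(x+h) - f(x-h)}{2h}$ (a standard numerical variant, not taken from the notes) and the illustrative choices $f = \sin$, $g = \exp$ at $x = 0.7$; with $P = \sin$ and $Q = \exp$ both positive there, the positivity assumption of the logarithmic proof also holds.

```python
import math


def central_diff(func, x, h=1e-5):
    """Symmetric difference quotient (f(x+h) - f(x-h)) / (2h), a numerical stand-in for f'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


x = 0.7
f, df = math.sin, math.cos   # illustrative choice: f = sin, f' = cos
g, dg = math.exp, math.exp   # illustrative choice: g = exp, g' = exp

# Product rule: (f g)' = f' g + f g'
product_rule = df(x) * g(x) + f(x) * dg(x)
product_numeric = central_diff(lambda t: f(t) * g(t), x)

# Chain rule: (f(g(x)))' = f'(g(x)) g'(x)
chain_rule = df(g(x)) * dg(x)
chain_numeric = central_diff(lambda t: f(g(t)), x)

# Quotient rule with P = f and Q = g: (P/Q)' = (Q P' - Q' P) / Q^2
quotient_rule = (g(x) * df(x) - dg(x) * f(x)) / g(x) ** 2
quotient_numeric = central_diff(lambda t: f(t) / g(t), x)

for name, formula, numeric in [
    ("product", product_rule, product_numeric),
    ("chain", chain_rule, chain_numeric),
    ("quotient", quotient_rule, quotient_numeric),
]:
    print(name, formula, numeric, abs(formula - numeric))
```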
Example (Assignment I Help). Take the derivative of the following function and evaluate it at $x = 0$:
$$f(x) = (\cos x)^{e^x}.$$
Note that it does not obey the conventional rules we discussed and proved. To proceed, we will need to invoke a trick. In particular,
$$\ln f(x) = \ln (\cos x)^{e^x} = e^x \ln(\cos x) \implies (\ln f(x))' = \frac{f'(x)}{f(x)} = \big(e^x \ln(\cos x)\big)'$$
$$\implies f'(x) = f(x)\left(e^x \ln(\cos x) + e^x \frac{-\sin x}{\cos x}\right)$$
$$\implies f'(x) = (\cos x)^{e^x}\left(e^x \ln(\cos x) - e^x \frac{\sin x}{\cos x}\right)$$
$$\implies f'(0) = (\cos 0)^{e^0}\left(e^0 \ln(\cos 0) - e^0 \frac{\sin 0}{\cos 0}\right) = 1 \times (1 \times 0 - 1 \times 1 \times 0) = 0,$$
because $\cos 0 = 1$, $\sin 0 = 0$, $e^0 = 1$ and $\ln(\cos 0) = \ln 1 = 0$.
3.2 Differentiability and Increasing (Decreasing) Functions
Definition: a function $f$ is differentiable if the derivative of $f$ exists at each point in its domain.
Note that a function $f$ that is continuous over all of $D_f$ can still be non-differentiable. This is so, because continuity does not imply differentiability. There is no need to state this formally, because it is sufficient to provide a counter-example.
Example. Consider $f(x) = |x|$. The figure below gives its graph:
Figure 18: The function has a kink at x = 0
This function is continuous at $x = 0$; however, the derivative does not exist, because
$$\lim_{h \to 0^-} \frac{f(0 + h) - f(0)}{h} = -1 \neq 1 = \lim_{h \to 0^+} \frac{f(0 + h) - f(0)}{h},$$
i.e. the left and right limits do not coincide. This happens every time a function $f$ has a kink.
Theorem. If $f(x)$ is differentiable at $x = a$, then it is continuous at $x = a$.
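Proof. If $f(x)$ is differentiable at $x = a$, then
$$f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}$$
exists. For $x \neq a$, we can write
$$f(x) - f(a) = \frac{f(x) - f(a)}{x - a}(x - a)$$
and by the simple limit rules:
$$\lim_{x \to a}\big(f(x) - f(a)\big) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \cdot \lim_{x \to a}(x - a) = f'(a) \times 0 = 0,$$
which implies that
$$\lim_{x \to a}\big(f(x) - f(a)\big) = 0 \iff \lim_{x \to a} f(x) = f(a).$$
This means that $f(x)$ is continuous at $x = a$. Note that if $f'(a)$ did not exist, then we would not be able to claim that the result is equal to $0$.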
Derivatives can also help to detect if $f$ is increasing (decreasing).
Definition: a function $f$ is increasing (decreasing) if
$$f(x') \ge f(x) \quad \big(f(x') \le f(x)\big)$$
whenever $x' > x$.
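For strictly increasing (decreasing) functions, we replace $\ge$ ($\le$) by $>$ ($<$). For a function $f$ that is differentiable on an interval $S$, we have the following derivative results:
• $f'(x) > 0$ for all $x \in S$ $\implies$ the function is strictly increasing in $S$;
• $f'(x) \ge 0$ for all $x \in S$ $\iff$ the function is increasing in $S$;
• $f'(x) < 0$ for all $x \in S$ $\implies$ the function is strictly decreasing in $S$;
• $f'(x) \le 0$ for all $x \in S$ $\iff$ the function is decreasing in $S$;
• $f'(x) = 0$ for all $x \in S$ $\iff$ the function is constant in $S$.
The strict statements only hold in one direction: $f(x) = x^3$ is strictly increasing on $\mathbb{R}$, although $f'(0) = 0$.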
3.3 Convex and Concave Functions Revisited
In Chapter 2, we discussed convexity and concavity in terms of Jensen's inequality. However, convexity and concavity can also be tested by examining the second derivative of $f$. In particular, for a function $f$ that is twice differentiable on an interval $S$,
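• $f''(x) > 0$ for all $x \in S$ $\implies$ the function is strictly convex in $S$;
• $f''(x) \ge 0$ for all $x \in S$ $\iff$ the function is convex in $S$;
• $f''(x) < 0$ for all $x \in S$ $\implies$ the function is strictly concave in $S$;
• $f''(x) \le 0$ for all $x \in S$ $\iff$ the function is concave in $S$;
• $f''(x) = 0$ for all $x \in S$ $\iff$ the function is both convex and concave in $S$.
As before, the strict statements only hold in one direction: $f(x) = x^4$ is strictly convex on $\mathbb{R}$, although $f''(0) = 0$.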
Note that all linear functions are both convex and concave. We will come back to these statements more formally when we cover linear algebra and Hessian matrices.
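Example. Let $f(x) = x^3$, $g(x) = e^{-x}$ and $h(x) = \sin x$. Also, $S = \mathbb{R}$. Then
$$f''(x) = 6x \implies \text{neither convex nor concave in } \mathbb{R} \text{ (since } 6x \text{ changes sign)},$$
$$g''(x) = e^{-x} > 0 \implies \text{strictly convex in } \mathbb{R},$$
$$h''(x) = -\sin x \implies \text{neither convex nor concave in } \mathbb{R} \text{ (since } -\sin x \text{ changes sign)}.$$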
3.4 Linear Approximation and Differentials
Consider a function $f(x)$ that is differentiable at $x = a$. Then the linear approximation of $f(x)$ around $x = a$ is given by
$$f(x) \approx f(a) + f'(a)(x - a),$$
given that $x$ is close to $a$.
Note that $f(a) + f'(a)(x - a)$ is a linear function of $x$. In particular, it describes the tangent line to the graph of $f(x)$ at the tangency point $a$. Visually:
Figure 19: The slope of the tangent line is given by f'(a)
Note that the slope of the tangent line is given by $f'(a)$. Also, $f(a) + f'(a)(x - a) = f'(a)x + \big(f(a) - f'(a)a\big)$, which means that the tangent line cuts the $y$ axis at $y = f(a) - f'(a)a$.
Example. Let $f(x) = \ln(1 + x)$. Let us find its linear approximation around $x = 0$. In particular,
$$\ln(1 + x) \approx f(0) + f'(0)(x - 0) = 0 + 1 \cdot (x - 0) = x,$$
because $\ln(1 + 0) = 0$ and $f'(0) = \frac{1}{1 + 0} = 1$.
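To see how good "close to $a$" has to be, the sketch below compares $\ln(1 + x)$ with its tangent line $x$ at a few illustrative points. The error shrinks quickly as $x \to 0$ (roughly like $x^2/2$, the size of the quadratic term that the Taylor approximation in Section 3.5 adds).

```python
import math


def tangent_line_log1p(x):
    # Linear approximation of ln(1 + x) around a = 0: f(0) + f'(0)(x - 0) = x
    return x


for x in (0.5, 0.1, 0.01):
    exact = math.log(1 + x)
    approx = tangent_line_log1p(x)
    print(x, exact, approx, abs(exact - approx))
```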
Definition: Consider a differentiable function $f(x)$ and let $d_x x$ be an arbitrary change in the variable $x$. Then the differential of $f(x)$ is given by
$$d_x f(x) = f'(x)\,d_x x.$$
We interpret the differential in the following way: if $\Delta f(x) = f(x + d_x x) - f(x)$ is the total change of the function when $x$ changes by $d_x x$, then $d_x f(x)$ represents the change in $f(x)$ if it changed at the rate fixed at the point $x$. The figure below represents this:
Figure 20: $\Delta y = \Delta f(x)$ differs from $dy = d_x f(x)$ because the rate of change is fixed in the latter case. Source: nabla.hr
Note that the differential is obtained from the linear approximation of $f(x + d_x x)$ around an arbitrary $x$:
$$f(x + d_x x) \approx f(x) + f'(x)(x + d_x x - x) = f(x) + f'(x)\,d_x x \implies f(x + d_x x) - f(x) \approx f'(x)\,d_x x$$
$$\implies \Delta f(x) \approx d_x f(x).$$
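For an illustrative $f(x) = x^2$ the gap between the actual change and the differential can be computed exactly: $\Delta f(x) = 2x\,d_x x + (d_x x)^2$ while $d_x f(x) = 2x\,d_x x$, so the gap is $(d_x x)^2$, which vanishes faster than $d_x x$ itself. A minimal sketch at the illustrative point $x = 1.5$:

```python
def f(x):
    return x**2


def f_prime(x):
    return 2 * x


x = 1.5
for dx in (0.1, 0.01, 0.001):
    delta_f = f(x + dx) - f(x)       # actual change, Δf(x)
    d_f = f_prime(x) * dx            # differential, d_x f(x) = f'(x) d_x x
    print(dx, delta_f, d_f, delta_f - d_f)   # for this f the gap equals dx**2
```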
3.5. Taylor Approximation
Approximation of an arbitrary $f(x)$ by a linear function may not be accurate enough; therefore, we can proceed with higher-order approximations. For example, the quadratic approximation around $x = a$ is
$$f(x) \approx f(a) + f'(a)(x - a) + \frac{1}{2}f''(a)(x - a)^2$$
for $x$ close to $a$.
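Note that this is a quadratic function in $x$. In general, we can approximate using an $n$-th order polynomial:
$$f(x) \approx \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k.$$
In particular, around $a = 0$,
$$\sin x \approx \sin 0 + \frac{\cos 0}{1!}(x - 0) - \frac{\sin 0}{2!}(x - 0)^2 - \frac{\cos 0}{3!}(x - 0)^3 + \frac{\sin 0}{4!}(x - 0)^4 = x - \frac{x^3}{6},$$
because $\sin' x = \cos x$ and $\cos' x = -\sin x$.
In general, such a polynomial is only an approximation of $f$. However, it is possible to obtain an equality with a finite sum (up to $n$) by adding a remainder term, even though the function may be differentiable more than $n$ times or possibly infinitely many times.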
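Theorem: if $f(x)$ is $n + 1$ times differentiable in the interval that contains $x$ and $a$, then
$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k + R_{n+1}(x),$$
where $R_{n+1}(x) = \frac{f^{(n+1)}(c)}{(n+1)!}(x - a)^{n+1}$ is the remainder with $c$ lying between $a$ and $x$.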
The remainder $R_{n+1}(x)$ provides an upper bound for the approximation error. Suppose that for all $x$ in the interval the absolute value of $f^{(n+1)}(x)$ is at most $M > 0$. Then
$$|R_{n+1}(x)| \le \frac{M}{(n+1)!}|x - a|^{n+1}.$$
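Example. Let $f(x) = e^x$ and $a = 0$. Then, again,
$$e^x \approx 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots,$$
which means that the Taylor formula with $n = 3$ at $x = 0.1$ is
$$e^{0.1} = 1 + 0.1 + \frac{0.01}{2} + \frac{0.001}{6} + \frac{0.0001}{24}e^{c}$$
for $c \in [0, 0.1]$. Note that $e^c \le e^{0.1}$ ($= 1.10517\ldots$) $< 1.2$. Hence,
$$R_4(0.1) = \frac{0.0001}{24}e^c < \frac{0.0001}{24} \times 1.2 = 5 \times 10^{-6}.$$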
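The bound can be checked against the actual error. The sketch below evaluates the third-order Taylor polynomial of $e^x$ at $x = 0.1$, compares it with `math.exp`, and recomputes the bound $M x^{n+1}/(n+1)!$ with $M = 1.2$ as in the text.

```python
import math


def taylor_exp(x, n):
    # n-th order Taylor polynomial of e^x around a = 0
    return sum(x**k / math.factorial(k) for k in range(n + 1))


x, n = 0.1, 3
approx = taylor_exp(x, n)
actual_error = abs(math.exp(x) - approx)
# Bound from the text: |R_{n+1}(x)| <= M x^(n+1) / (n+1)!, with M = 1.2 > e^c for c in [0, 0.1]
bound = 1.2 * x ** (n + 1) / math.factorial(n + 1)
print(approx, actual_error, bound)
```

The actual error (about $4.25 \times 10^{-6}$) indeed stays below the bound $5 \times 10^{-6}$.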
Note that if we create an equality with the first derivative already (i.e. $n = 0$), we obtain
$$f(x) = f(a) + f'(c)(x - a),$$
for $c$ between $a$ and $x$. This implies that
$$f'(c) = \frac{f(x) - f(a)}{x - a}.$$
This result is also known as the Mean Value Theorem.
3.6 L'Hopital’s Rule
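This rule connects derivatives and situations in which limits have an undefined (indeterminate) form such as $\frac{0}{0}$. If $f$ and $g$ are differentiable at $a$, $f(a) = g(a) = 0$ and $g'(a) \neq 0$, then
$$\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{f'(a)}{g'(a)}.$$
The same rule applies if $\lim_{x \to a} f(x) = \pm\infty$ ($0$) and $\lim_{x \to a} g(x) = \pm\infty$ ($0$), and also $\lim_{x \to a} f'(x)$ exists and $\lim_{x \to a} g'(x) \neq 0$. Then
$$\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{\lim_{x \to a} f'(x)}{\lim_{x \to a} g'(x)}.$$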
Example. Let $u(x) = \frac{\sin x}{x} = \frac{f(x)}{g(x)}$. Clearly, evaluation at $0$ is undefined. However,
$$\lim_{x \to 0} u(x) = \lim_{x \to 0} \frac{\sin x}{x} = \lim_{x \to 0} \frac{\cos x}{1} = 1,$$
since $\cos 0 = 1$.
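Proof (of the 0/0 case). Given that $g(x) \neq g(a)$ for $x \neq a$ close to $a$ and $g(a) = f(a) = 0$, we have
$$\frac{f(x)}{g(x)} = \frac{f(x) - f(a)}{g(x) - g(a)} = \frac{\frac{f(x) - f(a)}{x - a}}{\frac{g(x) - g(a)}{x - a}}.$$
Now, we take the limit as $x \to a$:
$$\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{\frac{f(x) - f(a)}{x - a}}{\frac{g(x) - g(a)}{x - a}} = \frac{f'(a)}{g'(a)}.$$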
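The limit in the example, read here as $u(x) = \frac{\sin x}{x}$, can be approached numerically from both sides without ever evaluating at $0$, and compared with the value $\frac{f'(0)}{g'(0)} = \frac{\cos 0}{1}$ delivered by L'Hôpital's rule. A minimal sketch with illustrative sample points:

```python
import math


def u(x):
    # sin(x)/x is undefined at x = 0, so we only evaluate it near 0
    return math.sin(x) / x


for x in (0.1, 0.01, 0.001, -0.001):
    print(x, u(x))

# The value delivered by L'Hôpital's rule: f'(0) / g'(0) = cos(0) / 1
lhopital_value = math.cos(0.0) / 1.0
print(lhopital_value)
```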
4 Chapter 4: Trigonometry, Trigonometric Functions and Their Derivatives (Bonus Lecture)
The material of this section deals with frequencies, periods and trigonometric functions. These topics used to be included in an extra lecture in this course to refresh the students' memory. However, depending on the background, the topics may have been covered differently or not covered at all during undergraduate education. Therefore, the most relevant parts are introduced here. If you are familiar with basic trigonometry, you can skip this chapter.
4.1 Unit Circle
A unit circle is the workhorse in trigonometry. Definition: a unit circle is a circle with a radius of 1.
The figure below provides an example:
Figure 21: The unit circle
Further, the radius will be denoted by $r$ and an angle by $\theta$.
The choice $r = 1$ is important because we will create triangles inside the circle and, for convenience, we want the hypotenuse to have a length of 1. This is the core idea for the construction and analysis of trigonometric functions. To proceed, we set the 'language' of a right triangle:
Figure 22: A right triangle (sides labelled hypotenuse, opposite and adjacent)
4.2 Radians
It is usual to measure angles in terms of degrees, e.g. $\theta = 90^\circ$. However, the alternative is to use radians. This measure connects degrees, the circle circumference and the diameter. Recall that the circumference ($c$) refers to the perimeter of a circle and the diameter ($d$) is twice its radius. Formally:
$$d = 2r,$$
$$c = 2\pi r = \pi d.$$
Definition: One radian is the angle subtended at the center of a circle by an arc that is equal in length to the radius of the circle. The definition can be visualized:
Figure 23: Definition of one radian
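The definition can be formalized mathematically. We start with the length of any arc from the circumference of the circle:
$$l_{arc} = \frac{\theta}{360^\circ} \times 2\pi r$$
$$\implies l_{arc} = \frac{\theta}{360^\circ} \times 2\pi \quad \text{(we work with the unit circle)}$$
$$\implies 1 = \frac{1\ \text{rad}}{360^\circ} \times 2\pi \quad \text{(remember the definition!)}$$
$$\implies 2\pi\ \text{rad} = 360^\circ.$$
Therefore, $2\pi$ radians are equivalent to $360^\circ$, $\pi$ radians to $180^\circ$, etc.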
4.3 Trigonometric Function on Unit Circle
Definition: A trigonometric function that we consider is a periodic mapping $f : \mathbb{R} \to [-1, 1]$, where the domain consists of angles $\theta$ measured in radians.
Two very important trigonometric functions are sine and cosine, $\sin\theta$ and $\cos\theta$, which are defined in the following way:
$$\sin\theta = \frac{\text{Opposite}}{\text{Hypotenuse}},$$
$$\cos\theta = \frac{\text{Adjacent}}{\text{Hypotenuse}}.$$
Observe that they are functions of $\theta$ but not of the lengths of the hypotenuse, adjacent or opposite. This is so, because for a fixed $\theta$, their lengths can differ as long as the proportions stay the same. That is, $\sin\theta$ and $\cos\theta$ are simply ratios. In other words, if we scale the hypotenuse, adjacent and opposite up (down) by the same factor, $\theta$ will not change. Because of this, we can conveniently fit the right triangle into the unit circle and focus on $\theta$. In such a case, $r = \text{hypotenuse} = 1$.
Note that because we made the hypotenuse length 1 ($r = 1$), the value of $\sin\theta$ will always be reflected on the $y$-axis and the value of $\cos\theta$ on the $x$-axis.
Figure 24: Ratios in the unit circle
The following thought experiment can help to understand the mechanics:
• What are the values of $\sin\theta$ and $\cos\theta$ when $\theta = 0, \frac{\pi}{2}, \pi, \frac{3\pi}{2}, 2\pi$? To answer this, we think about how the hypotenuse of the triangle rotates:
  - Because the value of sine is reflected on the $y$-axis, we have $\sin 0 = 0$, $\sin\frac{\pi}{2} = 1$, $\sin\pi = 0$, $\sin\frac{3\pi}{2} = -1$, $\sin 2\pi = 0$;
  - Because the value of cosine is reflected on the $x$-axis, we have $\cos 0 = 1$, $\cos\frac{\pi}{2} = 0$, $\cos\pi = -1$, $\cos\frac{3\pi}{2} = 0$, $\cos 2\pi = 1$.
• Remember that $2\pi$ rad $= 360^\circ$, hence we travel around the whole circle with $\theta = 2\pi$ and come back to the same value. Can we go for $\theta(z) = 2\pi z$ when $z \in \mathbb{Z}$?
  - Yes. We will return to the same position for each $z = 1, 2, \ldots$.
• Can we rotate clockwise (i.e. have negative radians)?
  - Yes. If $z \in \mathbb{Z}_-$, then the hypotenuse rotates clockwise.
• What are the values of $\sin(\theta + \pi/2)$ and $\cos(\theta + \pi/2)$?
  - This just means that we rotate the hypotenuse by an additional $90^\circ$. Observe the changes of signs for both sin and cos: $\sin(\theta + \pi/2) = \cos\theta$ and $\cos(\theta + \pi/2) = -\sin\theta$.
• What are the values of $\sin(\theta + \pi)$ and $\cos(\theta + \pi)$?
  - This just means that we rotate the hypotenuse by $180^\circ$. Observe the changes of signs for both sin and cos: $\sin(\theta + \pi) = -\sin\theta$ and $\cos(\theta + \pi) = -\cos\theta$.
• These little experiments lead us to a conclusion: $\sin x$ and $\cos x$ are periodic functions with a maximum value of $1$ and a minimum value of $-1$.
The rotations above can be illustrated with the following picture:
Figure 25: Values of sine and cosine. Source: Wikipedia.org
4.4 Graphs of Sine and Cosine
If we graph how the values of $\sin\theta$ and $\cos\theta$ change with $\theta$, we obtain the following graph:
Figure 26: Graphs of the functions (sine and cosine waves). Source: academickids.com.
Remember that a function is even if $f(-x) = f(x)$, while it is odd if $f(-x) = -f(x)$. Indeed,
$$\sin(-\theta) = -\sin\theta,$$
$$\cos(-\theta) = \cos\theta,$$
thus sine is odd and cosine is even.
It is clear that both $\sin\theta$ and $\cos\theta$ complete a full cycle every time $\theta$ increases by $2\pi$. Hence, their period ($T$) is $2\pi$. The quantity which is reciprocal to the period is called frequency: $f = \frac{1}{T}$.
The frequency of trigonometric functions can be changed by scaling the argument; for instance, $\sin 2\theta$ has period $\pi$:
Figure 27: The frequency of sine is doubled (the period is halved). Source: undergroundmathematics.org
4.5 Trigonometric Identities and Useful Limits
We can mathematically relate sine and cosine and obtain some results that are important if we want to simplify the complicated expressions that we sometimes encounter when working with trigonometry. Note that we are discussing equality ($=$) vs. identity ($\equiv$). In the context of functions, the first one holds for a specific value of $x$, whereas the second one holds for all $x$.
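Some useful and handy identities:
• $\sin(\theta + \pi) \equiv -\sin\theta$;
• $\cos(\theta + \pi) \equiv -\cos\theta$;
• $\sin 2\theta \equiv 2\sin\theta\cos\theta$ (double angle formula I);
• $\sin^2\theta + \cos^2\theta \equiv 1$;
• $\cos 2\theta \equiv \cos^2\theta - \sin^2\theta$ (double angle formula II).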
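We can easily prove $\sin^2\theta + \cos^2\theta = 1$:
$$\sin^2\theta + \cos^2\theta = \frac{\text{Opposite}^2}{\text{Hypotenuse}^2} + \frac{\text{Adjacent}^2}{\text{Hypotenuse}^2} = \frac{\text{Opposite}^2 + \text{Adjacent}^2}{\text{Hypotenuse}^2} = \frac{\text{Hypotenuse}^2}{\text{Hypotenuse}^2} = 1,$$
where the second-to-last equality holds due to the application of the Pythagorean theorem in the numerator.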
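Another set of identities (expansions) is very useful in differential calculus for trigonometric functions:
• $\sin(\alpha + \beta) \equiv \sin\alpha\cos\beta + \cos\alpha\sin\beta$;
• $\sin(\alpha - \beta) \equiv \sin\alpha\cos\beta - \cos\alpha\sin\beta$;
• $\cos(\alpha + \beta) \equiv \cos\alpha\cos\beta - \sin\alpha\sin\beta$;
• $\cos(\alpha - \beta) \equiv \cos\alpha\cos\beta + \sin\alpha\sin\beta$.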
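Since an identity must hold for every angle, a cheap sanity check is to evaluate both sides at many random angles (in radians) and record the largest discrepancy. The sketch below does this for the identities and expansions listed above; the number of draws, the angle range and the seed are arbitrary choices, and any residual is pure floating-point rounding.

```python
import math
import random


def max_identity_error(n_draws=1000, seed=0):
    """Largest absolute violation of the listed identities over random angles (radians)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_draws):
        a, b = rng.uniform(-10, 10), rng.uniform(-10, 10)
        residuals = [
            math.sin(a + math.pi) + math.sin(a),                       # sin(θ + π) = -sin θ
            math.cos(a + math.pi) + math.cos(a),                       # cos(θ + π) = -cos θ
            math.sin(2 * a) - 2 * math.sin(a) * math.cos(a),           # double angle formula I
            math.sin(a) ** 2 + math.cos(a) ** 2 - 1,                   # sin²θ + cos²θ = 1
            math.cos(2 * a) - (math.cos(a) ** 2 - math.sin(a) ** 2),   # double angle formula II
            math.sin(a + b) - (math.sin(a) * math.cos(b) + math.cos(a) * math.sin(b)),
            math.sin(a - b) - (math.sin(a) * math.cos(b) - math.cos(a) * math.sin(b)),
            math.cos(a + b) - (math.cos(a) * math.cos(b) - math.sin(a) * math.sin(b)),
            math.cos(a - b) - (math.cos(a) * math.cos(b) + math.sin(a) * math.sin(b)),
        ]
        worst = max(worst, max(abs(r) for r in residuals))
    return worst


print(max_identity_error())
```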
The following limits are essential in deriving the expressions of the derivatives of $\sin x$ and $\cos x$:
$$\lim_{x \to 0} \frac{\sin x}{x} = 1, \qquad \lim_{x \to 0} \frac{1 - \cos x}{x} = 0.$$
4.6 The Derivatives
It is interesting to observe that the instantaneous rate of change of a trigonometric function is described by another trigonometric function. Particularly:
a) $\sin' x = \cos x$,
b) $\cos' x = -\sin x$.
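Proof:
a)
$$\sin' x = \lim_{\Delta x \to 0} \frac{\sin(x + \Delta x) - \sin x}{\Delta x}$$
$$= \lim_{\Delta x \to 0} \frac{\cos x\sin\Delta x + \sin x\cos\Delta x - \sin x}{\Delta x} \quad \text{(use the first expansion)}$$
$$= \lim_{\Delta x \to 0} \frac{\cos x\sin\Delta x}{\Delta x} + \lim_{\Delta x \to 0} \frac{\sin x\cos\Delta x - \sin x}{\Delta x}$$
$$= \cos x\lim_{\Delta x \to 0} \frac{\sin\Delta x}{\Delta x} - \sin x\lim_{\Delta x \to 0} \frac{1 - \cos\Delta x}{\Delta x} \quad \text{(use the introduced limits)}$$
$$= \cos x,$$
b)
$$\cos' x = \lim_{\Delta x \to 0} \frac{\cos(x + \Delta x) - \cos x}{\Delta x}$$
$$= \lim_{\Delta x \to 0} \frac{\cos x\cos\Delta x - \sin x\sin\Delta x - \cos x}{\Delta x} \quad \text{(use the third expansion)}$$
$$= \lim_{\Delta x \to 0} \frac{\cos x\cos\Delta x - \cos x}{\Delta x} - \lim_{\Delta x \to 0} \frac{\sin x\sin\Delta x}{\Delta x}$$
$$= -\cos x\lim_{\Delta x \to 0} \frac{1 - \cos\Delta x}{\Delta x} - \sin x\lim_{\Delta x \to 0} \frac{\sin\Delta x}{\Delta x} \quad \text{(use the introduced limits)}$$
$$= -\sin x.$$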
4.7 Tangent
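A third trigonometric function to consider is the tangent. It is derived from $\sin x$ and $\cos x$. Definition:
$$\tan x = \frac{\sin x}{\cos x} = \frac{\text{Opposite}/\text{Hypotenuse}}{\text{Adjacent}/\text{Hypotenuse}} = \frac{\text{Opposite}}{\text{Adjacent}}.$$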
Note that sine and cosine are defined over the whole $\mathbb{R}$. However, tangent is undefined whenever $\cos x = 0$, i.e. at $x = \frac{\pi}{2} + k\pi$ for $k \in \mathbb{Z}$. The discontinuities can be pictured in the following way:
Figure 28: The graph of tangent. Source: math.tntech.edu
We can prove the form of its derivative:
$$\tan' x = \left(\frac{\sin x}{\cos x}\right)' = \frac{\sin' x\cos x - \cos' x\sin x}{\cos^2 x} = \frac{\cos^2 x + \sin^2 x}{\cos^2 x} = \frac{1}{\cos^2 x},$$
where the second equality holds due to the quotient rule; the third one comes from using $\sin' x = \cos x$ and $\cos' x = -\sin x$, while the last equality holds since $\cos^2 x + \sin^2 x = 1$, which was proven.
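The three derivative formulas can be confirmed numerically with a symmetric difference quotient (a standard numerical device, not from the notes). The sample points are illustrative and deliberately avoid $x = \pm\pi/2$, where $\tan$ and its derivative are undefined.

```python
import math


def central_diff(func, x, h=1e-5):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def max_derivative_gaps(points):
    """Largest gaps between numerical derivatives and the closed forms derived in the text."""
    gaps = {"sin": 0.0, "cos": 0.0, "tan": 0.0}
    for x in points:
        gaps["sin"] = max(gaps["sin"], abs(central_diff(math.sin, x) - math.cos(x)))
        gaps["cos"] = max(gaps["cos"], abs(central_diff(math.cos, x) + math.sin(x)))
        gaps["tan"] = max(gaps["tan"], abs(central_diff(math.tan, x) - 1 / math.cos(x) ** 2))
    return gaps


# Points chosen away from x = ±π/2, where tan is undefined
print(max_derivative_gaps([-1.2, -0.3, 0.0, 0.4, 1.1]))
```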
4.8 Application in Economics and Econometrics
The use of trigonometric functions in our field is not straightforward. One of the applications is the following. Remember that these functions have frequencies: in $\sin ax$ and $\cos bx$, the constants $a$ and $b$ scale the frequency (for example, $\sin(2\pi f_0 t)$ completes $f_0$ cycles per unit of time, i.e. $f_0$ hertz (Hz) when $t$ is measured in seconds). These functions can help to analyze things in a different domain, namely the frequency domain.
Let $g(t)$ be a function that depends on time. Hence, it is in the time domain. We proceed with the following transformation from the time domain to the frequency domain, which is the famous Fourier integral (integrals are explored in detail in Chapters 8 and 9):
$$\hat{g}(f) = \int_{-\infty}^{\infty} g(t)e^{-2\pi i f t}\,dt,$$
where $i = \sqrt{-1}$ is the imaginary unit (see the Appendix). One can show that $e^{-2\pi i f t}$ helps to collect all the frequencies from trigonometric functions (we saw that sine and cosine are waves, hence we can say that $g(t)$ is constituted of various combinations of waves in the time domain). The extracted information on frequencies helps to analyze business cycles (macroeconomics) or the covariance structure between random variables in a process (time series analysis).
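In practice, an economic time series is observed at discrete dates, so the integral is replaced by its discrete analogue, the discrete Fourier transform (DFT), $X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i k n / N}$. The sketch below is plain Python and deliberately naive ($O(N^2)$, not an FFT). It builds a hypothetical series from two waves with 3 and 7 cycles per unit of time and shows that the DFT magnitudes single out exactly those frequencies.

```python
import cmath
import math


def dft(samples):
    """Discrete Fourier transform: X_k = sum_n x_n exp(-2*pi*i*k*n/N)."""
    n_obs = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_obs) for n, x in enumerate(samples))
        for k in range(n_obs)
    ]


# Hypothetical 'time-domain' series: two waves with 3 and 7 cycles per unit of time,
# observed at N equally spaced dates on [0, 1)
N = 64
t = [n / N for n in range(N)]
g = [math.sin(2 * math.pi * 3 * s) + 0.5 * math.cos(2 * math.pi * 7 * s) for s in t]

spectrum = dft(g)
# With a unit-length window, index k corresponds to k cycles per unit of time;
# for a real-valued series only k < N/2 carries new information
magnitudes = [abs(spectrum[k]) for k in range(N // 2)]
strong = [k for k, m in enumerate(magnitudes) if m > 1.0]
print(strong)  # [3, 7]
```

The peak heights ($N/2 = 32$ at $k = 3$ and $0.5 \cdot N/2 = 16$ at $k = 7$) also recover the relative amplitudes of the two waves.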
5 Chapter 5: Functions of Multiple Variables: Generalization of Concepts
In this chapter, we generalize the notions of continuity, derivatives and differentials to functions of more than one variable, that is, $f : D_f \subseteq \mathbb{R}^n \to \mathbb{R}$.
5.1 Examples and Graphs
Generally, a function of several variables is denoted as $f(\mathbf{x})$, where $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$. Clearly, if the domain is (a subset of) $\mathbb{R}^n$, then the graph is $(n+1)$-dimensional. However, it is easy to visualize bivariate functions. For example:
Panels: (a) $f(x, y) = x + y$, (b) $f(x, y) = xy$
Figure 29: Examples of graphs of some simple bivariate functions
Note that $f(x, y) = x + y$ gives a plane in $\mathbb{R}^3$, while $f(x, y) = xy$ gives a saddle-shaped surface (a hyperbolic paraboloid). The graphs below depict a sine wave in 3 dimensions and the so-called Gaussian function:
Panels: (a) $f(x, y) = \sin xy$, (b) a Gaussian function $f(x, y)$
Figure 30: Examples of graphs of a bivariate sine function and Gaussian function
5.2. Continuity
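Definition: a function $f : \mathbb{R}^n \to \mathbb{R}$ is continuous at a point $\mathbf{a} = (a_1, a_2, \ldots, a_n)$ if for every $\varepsilon > 0$, there exists $\delta > 0$ such that
$$d(\mathbf{x}, \mathbf{a}) < \delta \implies |f(\mathbf{x}) - f(\mathbf{a})| < \varepsilon.$$
5.3 Partial Derivatives
Definition: the partial derivative of a function $f : \mathbb{R}^n \to \mathbb{R}$ at a point $\mathbf{a} \in \mathbb{R}^n$ in the direction of the $j$-th coordinate, for $j = 1, \ldots, n$, is given by
$$\frac{\partial f(\mathbf{a})}{\partial x_j} = \lim_{h \to 0} \frac{f(a_1, \ldots, a_j + h, \ldots, a_n) - f(a_1, \ldots, a_j, \ldots, a_n)}{h}.$$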
The partial derivative exists if the limit exists. Note that we switched to Leibniz's notation to indicate the variable w.r.t. which the derivative is taken. The rules of taking partial derivatives follow the same mechanics as in univariate differential calculus, e.g. the product rule and the chain rule, while the other variables are treated as constants.
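Example. Let $f(x, y) = (\sin x)e^{-(x^2 + y^2)}$. Then
$$\frac{\partial f(x, y)}{\partial x} = (\cos x)e^{-(x^2 + y^2)} - (\sin x)2xe^{-(x^2 + y^2)}$$
by the product rule for the derivative with respect to $x$, holding $y$ fixed. Further,
$$\frac{\partial f(x, y)}{\partial y} = -2y(\sin x)e^{-(x^2 + y^2)},$$
holding $x$ fixed.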
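The definition of a partial derivative translates directly into a numerical check: perturb one coordinate and hold the others fixed. The sketch below compares the two partial derivatives of $f(x, y) = (\sin x)e^{-(x^2 + y^2)}$ computed above with symmetric difference quotients at an illustrative point; the helper name `partial` is our own.

```python
import math


def f(x, y):
    return math.sin(x) * math.exp(-(x**2 + y**2))


def df_dx(x, y):
    # Product rule in x, holding y fixed
    return (math.cos(x) - 2 * x * math.sin(x)) * math.exp(-(x**2 + y**2))


def df_dy(x, y):
    # Only the exponential factor depends on y
    return -2 * y * math.sin(x) * math.exp(-(x**2 + y**2))


def partial(func, point, j, h=1e-5):
    """Symmetric difference in coordinate j, holding the other coordinates fixed."""
    up = list(point)
    down = list(point)
    up[j] += h
    down[j] -= h
    return (func(*up) - func(*down)) / (2 * h)


x0, y0 = 0.4, -0.8
print(df_dx(x0, y0), partial(f, (x0, y0), 0))
print(df_dy(x0, y0), partial(f, (x0, y0), 1))
```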
Example. Let f(x,y) = 74. Then
f(xy) _ (ety)
ox &
holding x constant
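Clearly, we can have a second-order partial derivative defined similarly to the second-order derivative of a function of one variable:
$$\frac{\partial^2 f(\mathbf{a})}{\partial x_j^2} = \lim_{h \to 0} \frac{\frac{\partial f(a_1, \ldots, a_j + h, \ldots, a_n)}{\partial x_j} - \frac{\partial f(\mathbf{a})}{\partial x_j}}{h} = \lim_{h \to 0} \frac{f(a_1, \ldots, a_j + 2h, \ldots, a_n) - 2f(a_1, \ldots, a_j + h, \ldots, a_n) + f(\mathbf{a})}{h^2}.$$
Example. Let $f(\mathbf{x}) = x_1^2 + x_1x_2$. Then
$$\frac{\partial f(\mathbf{x})}{\partial x_1} = 2x_1 + x_2$$
and
$$\frac{\partial^2 f(\mathbf{x})}{\partial x_1^2} = 2.$$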
Note that the second derivative can be taken with respect to another variable, and we call it a cross derivative. For example, given $f(\mathbf{x}) = x_1^2 + x_1x_2$, we have
$$\frac{\partial^2 f(\mathbf{x})}{\partial x_1\partial x_2} = 1.$$
It can be easily extended to functions $f : \mathbb{R}^n \to \mathbb{R}$. For example, one possible scenario is
$$\frac{\partial^n f(\mathbf{x})}{\partial x_1\partial x_2\cdots\partial x_n},$$
which is taking the partial with respect to $x_j$ for each $j = 1, \ldots, n$ in a row. Alternatively, we can have
$$\frac{\partial^{p+q} f(\mathbf{x})}{\partial x_j^p\,\partial x_k^q},$$
which represents taking $p$ sequential partials with respect to some particular $x_j$ and then $q$ partials in a row with respect to some particular $x_k \neq x_j$.
5.4 Total Derivatives
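In this sub-chapter we generalize the chain rule to functions of multiple variables. To begin with, consider $f : \mathbb{R}^2 \to \mathbb{R}$. Also, we introduce a third variable $t \in \mathbb{R}$, such that every pair in the domain of $f$ is parameterized by it, i.e. $(x, y) = (x(t), y(t))$. Then, the total derivative with respect to $t$ is given by
$$\frac{df(x(t), y(t))}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt},$$
where the arguments are suppressed for simplicity. Generally, for $f : \mathbb{R}^n \to \mathbb{R}$ with $\mathbf{x} = (x_1, \ldots, x_n) = (x_1(t), \ldots, x_n(t))$, we have
$$\frac{df(\mathbf{x})}{dt} = \frac{\partial f}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial f}{\partial x_2}\frac{dx_2}{dt} + \dots + \frac{\partial f}{\partial x_n}\frac{dx_n}{dt} = \sum_{i=1}^{n}\frac{\partial f}{\partial x_i}\frac{dx_i}{dt}.$$
Example. Let $f(x, y) = xe^{xy}$, where $x = x(t) = t^2$ and $y = y(t) = t^{-1}$. Then
$$\frac{df(x, y)}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = \left(e^{xy} + xye^{xy}\right)2t + x^2e^{xy}\left(-\frac{1}{t^2}\right),$$
where $\frac{\partial f}{\partial x} = e^{xy} + xye^{xy}$ by the product rule and $\frac{\partial f}{\partial y} = x^2e^{xy}$. Technically, we are done, but it is convenient to substitute $x = x(t)$ and $y = y(t)$ into the expression, because it is a function of $t$. Indeed, since $xy = t$,
$$\frac{df(x, y)}{dt} = \left(e^t + te^t\right)2t - t^2e^t = t^2e^t + 2te^t.$$
Note that the same result would have been obtained had we inserted $x = x(t)$ and $y = y(t)$ from the beginning (here, $f(x(t), y(t)) = t^2e^t$). In general, however, substituting first may require more work.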
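The total-derivative formula and the substituted result can be compared at a few values of $t$. The sketch assumes the example reads $f(x, y) = xe^{xy}$ with $x(t) = t^2$ and $y(t) = 1/t$; it codes the two partial derivatives and the derivatives $dx/dt = 2t$ and $dy/dt = -1/t^2$ by hand.

```python
import math


def x_of(t):
    return t**2


def y_of(t):
    return 1 / t


def f_x(x, y):
    # ∂f/∂x for f(x, y) = x e^{xy} (product rule)
    return math.exp(x * y) + x * y * math.exp(x * y)


def f_y(x, y):
    # ∂f/∂y for f(x, y) = x e^{xy}
    return x**2 * math.exp(x * y)


def total_derivative(t):
    """df/dt = f_x dx/dt + f_y dy/dt with dx/dt = 2t and dy/dt = -1/t^2."""
    x, y = x_of(t), y_of(t)
    return f_x(x, y) * (2 * t) + f_y(x, y) * (-1 / t**2)


def substituted_result(t):
    # t^2 e^t + 2t e^t, the expression obtained after substituting x(t) and y(t)
    return (t**2 + 2 * t) * math.exp(t)


for t in (0.5, 1.0, 2.0):
    print(t, total_derivative(t), substituted_result(t))
```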
An important further generalization is the following. Consider $f : \mathbb{R}^{n+1} \to \mathbb{R}$, so that $f(x_1, \ldots, x_n, t) = f(x_1(t), \ldots, x_n(t), t)$. Note that now $f$ explicitly depends on $t$. Then
$$\frac{df(\mathbf{x}, t)}{dt} = \sum_{i=1}^{n}\frac{\partial f}{\partial x_i}\frac{dx_i}{dt} + \frac{\partial f}{\partial t}.$$
Total derivatives in such forms are very handy in comparative statics in microeconomics.
Example. Let the production function be
$$Q = f(K, L, t),$$
where $K = K(t)$ is the accumulation of capital over time, $L = L(t)$ is the labor force over time and $t$ is time. Let us find the instantaneous rate of change of production with respect to time. In particular,
$$\frac{dQ}{dt} = \frac{\partial Q}{\partial K}\frac{dK}{dt} + \frac{\partial Q}{\partial L}\frac{dL}{dt} + \frac{\partial Q}{\partial t},$$
which is constituted of three parts: 1) the change in production via capital change over time, 2) the change in production via labor change over time and 3) the change in production solely over time (e.g. technological improvements).
5.5 Linear Approximation and Differentials
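Similarly to the case of functions of one variable, we have a linear approximation based on partial derivatives. Let $f : \mathbb{R}^n \to \mathbb{R}$. Then its linear approximation around $\mathbf{a} \in \mathbb{R}^n$ is
$$f(\mathbf{x}) \approx f(\mathbf{a}) + \frac{\partial f(\mathbf{a})}{\partial x_1}(x_1 - a_1) + \dots + \frac{\partial f(\mathbf{a})}{\partial x_n}(x_n - a_n) = f(\mathbf{a}) + \sum_{i=1}^{n}\frac{\partial f(\mathbf{a})}{\partial x_i}(x_i - a_i).$$
Using this, we can define the total differential:
$$d_{\mathbf{x}} f(\mathbf{x}) = \sum_{i=1}^{n}\frac{\partial f(\mathbf{x})}{\partial x_i}\,d_{x_i}x_i,$$
where $d_{x_i}x_i$ is an arbitrary change in the direction of $x_i$ for $i = 1, \ldots, n$.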
Example. Let $f(x, y) = xy$. Let us introduce arbitrary changes in the $x$ and $y$ directions, $d_x x$ and $d_y y$, respectively. We will calculate the approximation error $\Delta f(x, y) - d_{xy}f(x, y)$. In particular,
$$\Delta f(x, y) = f(x + d_x x, y + d_y y) - f(x, y) = (x + d_x x)(y + d_y y) - xy$$
$$= xy + x\,d_y y + y\,d_x x + d_x x\,d_y y - xy = x\,d_y y + y\,d_x x + d_x x\,d_y y.$$
Also,
$$d_{xy}f(x, y) = \frac{\partial f(x, y)}{\partial x}d_x x + \frac{\partial f(x, y)}{\partial y}d_y y = y\,d_x x + x\,d_y y,$$
which implies that
$$\Delta f(x, y) - d_{xy}f(x, y) = x\,d_y y + y\,d_x x + d_x x\,d_y y - x\,d_y y - y\,d_x x = d_x x\,d_y y.$$
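The result $\Delta f(x, y) - d_{xy}f(x, y) = d_x x\,d_y y$ can be checked with concrete numbers. In the sketch below the point $(2, 5)$ and the pairing $d_y y = -2\,d_x x$ are illustrative choices; the printed error matches the product of the two changes up to rounding.

```python
def f(x, y):
    return x * y


def approximation_error(x, y, dx, dy):
    delta_f = f(x + dx, y + dy) - f(x, y)   # actual change, Δf(x, y)
    d_f = y * dx + x * dy                    # total differential: f_x d_x x + f_y d_y y
    return delta_f - d_f


for step in (0.1, 0.01, 0.001):
    dx, dy = step, -2 * step
    print(step, approximation_error(2.0, 5.0, dx, dy), dx * dy)
```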
5.6 Implicit Functions
Sometimes in economics we need to take derivatives of functions that are defined implicitly by an equation. Let us have
$$F(x, y) = c,$$
which represents a level curve. An example of a level curve is given below.
Figure 32: Level curve is the plane in blue in $\mathbb{R}^3$. Source: mathinsight.org
Suppose this equation defines $y$ implicitly as a function $y = f(x)$, such that we have
$$F(x, y(x)) = c. \qquad (6.1)$$
If $y(x)$ is differentiable, then its derivative is obtained as
$$y'(x) = -\frac{\partial F/\partial x}{\partial F/\partial y}.$$
This is the implicit function theorem. The condition $\frac{\partial F}{\partial y} \neq 0$ is a sufficient condition to define $y$ as a function of $x$ (locally, for continuously differentiable $F$). This theorem gives the value of the derivatives even if we do not have closed-form expressions. We can prove the theorem for the case of a level curve in $\mathbb{R}^2$.
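Proof. Given $F(x, y) = c$, the total differential is given by
$$d_{xy}F(x, y) = \frac{\partial F(x, y)}{\partial x}d_x x + \frac{\partial F(x, y)}{\partial y}d_y y = 0,$$
since $c$ does not change along the level curve, which then implies that
$$\frac{d_y y}{d_x x} = -\frac{\partial F(x, y)/\partial x}{\partial F(x, y)/\partial y} = y'(x),$$
because the derivative is a ratio of small changes.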
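Example. Let $f(x, y) = y^5 + xy = 4$. Then
$$\frac{\partial f(x, y)}{\partial x} = y, \qquad \frac{\partial f(x, y)}{\partial y} = 5y^4 + x.$$
We can solve for $y'(x)$ at, for example, $(x, y) = (3, 1)$, because $\frac{\partial f(3, 1)}{\partial y} = 5 + 3 = 8 \neq 0$. Then
$$y'(3) = -\frac{\partial f(3, 1)/\partial x}{\partial f(3, 1)/\partial y} = -\frac{1}{8}.$$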
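The implicit function theorem gives $y'(3)$ without solving for $y(x)$. As a cross-check, the sketch below does solve $y^5 + xy = 4$ for $y$ numerically near $x = 3$ and differentiates the result. The solver uses Newton's method, a technique swapped in for illustration (it is not part of the notes), and `solve_y` is a hypothetical helper name; it assumes the example's level curve is $y^5 + xy = 4$.

```python
def F(x, y):
    # Left-hand side of the level curve y^5 + xy = 4
    return y**5 + x * y


def solve_y(x, y_start=1.0, tol=1e-14, max_iter=50):
    """Solve F(x, y) = 4 for y by Newton's method (a hypothetical helper, not from the notes)."""
    y = y_start
    for _ in range(max_iter):
        step = (F(x, y) - 4) / (5 * y**4 + x)   # Newton step: (F - 4) / (∂F/∂y)
        y -= step
        if abs(step) < tol:
            break
    return y


def implicit_slope(x, y):
    # Implicit function theorem: y'(x) = -(∂F/∂x) / (∂F/∂y) = -y / (5y^4 + x)
    return -y / (5 * y**4 + x)


h = 1e-5
numeric_slope = (solve_y(3 + h) - solve_y(3 - h)) / (2 * h)
print(implicit_slope(3.0, 1.0), numeric_slope)
```

Both numbers come out at (essentially) $-1/8$, with no closed form for $y(x)$ needed for the first one.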
6 Chapter 6: Optimization (Unconstrained)
6.1 Maximum, Minimum and Stationary Points
In this chapter we will explore maximization and minimization (hence, optimization) problems related to univariate and multivariate functions. Optimization here is unconstrained, meaning that we will look for the solutions over the whole $D_f \subseteq \mathbb{R}$ or $D_f \subseteq \mathbb{R}^n$, or on some restricted compact sets (intervals or their Cartesian products).
Let $A \subseteq \mathbb{R}$. Then:
Definition: a function $f$ has a global maximum at $c \in A$ if $f(c) \ge f(x)$ for all $x \in A$.
Definition: a function $f$ has a global minimum at $c \in A$ if $f(c) \le f(x)$ for all $x \in A$.
Note that these two definitions do not imply that the points of minimum or maximum are unique. For example, the sine and cosine functions attain $-1$ and $1$ as their minimum and maximum, respectively, periodically.
Definition: a function $f$ has a local maximum (minimum) at $c \in D_f$ if it has a global maximum (minimum) on $[c - \delta, c + \delta] \cap D_f$ for some $\delta > 0$.
Note that these definitions do not impose any requirements on $f$ such as $f$ being differentiable and/or continuous. Also, a global optimum satisfies the requirements for a local optimum. Optimum refers to both maximum and minimum. Also, $c \in D_f$ is called an extreme point. Points $c$ at which $f'(c) = 0$ are called stationary points. Also, $f'(x) = 0$ is called the first-order condition, which is necessary (but not sufficient) for a differentiable function $f$ to have a maximum or minimum at an interior point of its domain. The figure below illustrates the discussed definitions:
Figure 33: Global/local maxima and minima
6.2. Extreme Value Theorem and Applications
Theorem. Let $f : D_f \to \mathbb{R}$ be continuous on the compact interval $[a, b]$. Then there exist $c \in [a, b]$ and $d \in [a, b]$ such that $f(d) \le f(x) \le f(c)$ for all $x \in [a, b]$, i.e. the function $f$ attains a minimum and a maximum.
Note that this is an existence theorem, which means that if we verify the needed conditions (continuity and compactness), then we are guaranteed that the minimum and maximum exist. It does not tell us anything about where they are attained.
When we consider the interior points (i.e. the interval $(a, b)$), we follow these rules:
• for a concave differentiable function the following holds: $f$ has a maximum at $c \in (a, b)$ if and only if $f'(c) = 0$;
• for a convex differentiable function the following holds: $f$ has a minimum at $c \in (a, b)$ if and only if $f'(c) = 0$.
These are the general steps to follow when looking for the maximum and minimum values of $f$ defined on $A = [a, b]$:
• find all stationary points of $f$ in $(a, b)$;
• evaluate $f$ at all (i) stationary points, (ii) end points $a$, $b$ and (iii) interior points where $f$ is not differentiable;
• compare the function values at all candidate extreme points; the largest (smallest) value found at the step above is the maximum (minimum).
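Example. Let $f(x) = x^3 - x$ and $A = [-1, 2]$. Let us find the maximum and minimum using the discussed steps. First, note that $f$ is continuous and $A$ is compact. Hence, by the Extreme Value Theorem, the maximum and minimum exist in $A$. Next, we find the stationary points:
$$f'(x) = 3x^2 - 1 = 0 \iff x^2 = \frac{1}{3} \iff x = \pm\frac{1}{\sqrt{3}}.$$
Evaluating $f$ at the candidate points gives
$$f\left(\frac{1}{\sqrt{3}}\right) = -\frac{2}{3\sqrt{3}}, \quad f\left(-\frac{1}{\sqrt{3}}\right) = \frac{2}{3\sqrt{3}}, \quad f(-1) = 0, \quad f(2) = 6.$$
Also, there are no points where the derivative does not exist. Clearly, $f(2) = 6$ is the maximum and $f\left(\frac{1}{\sqrt{3}}\right) = -\frac{2}{3\sqrt{3}}$ is the minimum.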
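The three-step recipe is mechanical enough to script once the stationary points are known. The sketch below hard-codes the roots of $f'(x) = 3x^2 - 1$ found above, adds the end points (there are no kink points, since $f$ is differentiable everywhere) and compares the function values.

```python
import math


def f(x):
    return x**3 - x


a, b = -1.0, 2.0
# Stationary points solve f'(x) = 3x^2 - 1 = 0; keep those inside (a, b)
stationary = [s for s in (-1 / math.sqrt(3), 1 / math.sqrt(3)) if a < s < b]
# f is differentiable everywhere, so the only other candidates are the end points
candidates = stationary + [a, b]
values = {x: f(x) for x in candidates}

x_max = max(values, key=values.get)
x_min = min(values, key=values.get)
print(x_max, values[x_max])  # 2.0 6.0
print(x_min, values[x_min])
```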
6.3 Applications for $A \subseteq \mathbb{R}^n$
Similarly to the univariate case, we have steps for finding the minimum and maximum when the function under consideration is multivariate. In particular,
• find the stationary points, i.e. find the points which satisfy $\frac{\partial f(\mathbf{x})}{\partial x_i} = 0$ for $i = 1, \ldots, n$, where $\mathbf{x} = (x_1, \ldots, x_n)$;
• evaluate $f$ at the stationary points of $A$, the boundary points of $A$ and the points where $\frac{\partial f(\mathbf{x})}{\partial x_i}$ does not exist for some $i = 1, \ldots, n$;
• compare the functional values.
Note that finding the stationary points is not as straightforward, because we need to find the
stationary points along the line segments, as well. For example, if A = {(x,y) © R?: <
x= b,c Sy 25x52
Then, starting from (i), we obtain
sty) > sily) = 2° 40,
60therefore we do not obtain more stationary points. Moving on to (ii):
fy) 2(y) = gh(y) = 2e #0,
therefore we do not obtain more stationary points, again. Moving on to (iii):
£(%,-2)
thus again no extra stationary points. Lastly, we go to (iv):
£ (5/2) = 2 = ga(x) > gh(x) = 26 #0,
therefore we do not obtain more stationary points. this mean that we are left to evaluate f at
the boundary points of A. Particularly,
2,
© = gilx) 5 g(x) = 2 £0,
Thus, the maximum is f(2,2) = f(—2, -2) = ef and minimum is f(~-2,2) = f(2,-2) =
This example for this specific A is depicted below:
Figure 35: f(x,y) = e¥ on the given A C R?
Clearly, the maximum and minimum are achieved at the boundary points $(2,2)$, $(-2,-2)$ and $(2,-2)$, $(-2,2)$, respectively. Also, it is clear that there are no stationary points along the line segments.
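A quick numerical check of this example: the Python sketch below assumes $f(x,y) = e^{xy}$ on $A = [-2,2]\times[-2,2]$ (as in Figure 35), evaluates f at the interior stationary point and at the corners, and then compares the result with a brute-force grid over A. The grid size `n` is an arbitrary choice; a grid search only confirms the answer approximately and does not replace the argument above.

```python
import itertools
import math

def f(x, y):
    # Assumed objective (Figure 35): f(x, y) = e^{xy}
    return math.exp(x * y)

lo, hi = -2.0, 2.0                      # A = [lo, hi] x [lo, hi]
# Interior stationary point: y e^{xy} = 0 and x e^{xy} = 0 give (0, 0).
# On every edge f is e^{2t} or e^{-2t}, strictly monotone, so only the corners remain.
candidates = [(0.0, 0.0)] + list(itertools.product([lo, hi], repeat=2))
values = {p: f(*p) for p in candidates}
best, worst = max(values.values()), min(values.values())
argmax = [p for p, v in values.items() if v == best]
argmin = [p for p, v in values.items() if v == worst]

# Brute-force check on an n x n grid over A (the grid contains the corners)
n = 81
grid = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
grid_vals = [f(x, y) for x in grid for y in grid]
assert max(grid_vals) <= best + 1e-12 and min(grid_vals) >= worst - 1e-12

print("max", best, argmax)
print("min", worst, argmin)
```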
7 Chapter 7: Optimization (Constrained)
In this chapter we generalize the last chapter to optimization subject to constraints of the form $g(x) = c$ (equality constraints) or $g(x) \le c$ (inequality constraints) for $x \in \mathbb{R}^n$. Throughout this chapter we will explore an algorithm to obtain candidate points for optimization and use it to solve the problems. When we study linear algebra in Chapter 11, we will come back to this topic and give a more conceptual look for the completeness of the discussion.
7.1 Equality Constraints and Lagrange Multiplier Method
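Let us maximize $f(x,y)$ such that $(x,y)$ satisfies an equality constraint $g(x,y) = c$. Formally, the problem can be written as
$$\max_{x,y} f(x,y) \quad \text{s.t.} \quad g(x,y) = c.$$
For this, we define the so-called Lagrangian function $\mathcal{L}$, which is
$$\mathcal{L}(x,y) = f(x,y) - \lambda\,[g(x,y) - c],$$
where $\lambda$ is called the Lagrange multiplier. To solve for the candidate points $(x^*, y^*)$, we obtain the following system of 3 equations in 3 unknowns ($x$, $y$ and $\lambda$):
$$\begin{aligned}
\frac{\partial \mathcal{L}(x,y)}{\partial x} &= \frac{\partial f(x,y)}{\partial x} - \lambda\,\frac{\partial g(x,y)}{\partial x} = 0,\\
\frac{\partial \mathcal{L}(x,y)}{\partial y} &= \frac{\partial f(x,y)}{\partial y} - \lambda\,\frac{\partial g(x,y)}{\partial y} = 0,\\
\frac{\partial \mathcal{L}(x,y)}{\partial \lambda} &= -[g(x,y) - c] = 0 \iff g(x,y) = c.
\end{aligned}$$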
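We can use the same approach to solve the minimization problem. Indeed,
$$\min_{x,y} f(x,y) \ \text{s.t.}\ g(x,y) = c \iff \max_{x,y}\, -f(x,y) \ \text{s.t.}\ g(x,y) = c.$$
Example. Let $f(x,y) = 2x^2 + y^2$ and
$$\min_{x,y}\ 2x^2 + y^2 \quad \text{s.t.} \quad x + y = 1.$$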
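One way to carry out the Lagrange steps for this example is sketched below in Python. It relies on the special structure of the example (a quadratic objective $ax^2 + by^2$ with $a, b \neq 0$ and a linear constraint $px + qy = m$), so that the first-order conditions can be solved in closed form; the helper name `lagrange_quadratic` is hypothetical. Exact fractions avoid rounding.

```python
from fractions import Fraction as Fr

def lagrange_quadratic(a, b, p, q, m):
    """Stationary point of L = a x^2 + b y^2 - lam (p x + q y - m), with a, b != 0.

    dL/dx   = 2 a x - lam p = 0  ->  x = lam p / (2a)
    dL/dy   = 2 b y - lam q = 0  ->  y = lam q / (2b)
    dL/dlam = 0                  ->  p x + q y = m
    """
    a, b, p, q, m = map(Fr, (a, b, p, q, m))
    lam = m / (p**2 / (2 * a) + q**2 / (2 * b))   # substitute x, y into the constraint
    x = lam * p / (2 * a)
    y = lam * q / (2 * b)
    return x, y, lam

x, y, lam = lagrange_quadratic(a=2, b=1, p=1, q=1, m=1)
print(x, y, lam, 2 * x**2 + y**2)  # 1/3 2/3 4/3 2/3
```

This gives $x^* = 1/3$, $y^* = 2/3$, $\lambda = 4/3$ and $f(x^*, y^*) = 2/3$. Since $2x^2 + y^2$ is convex and the constraint is linear, this candidate is the minimum; substituting $y = 1 - x$ and minimizing $3x^2 - 2x + 1$ gives the same $x^* = 1/3$.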
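Example. Let the consumer have the following Cobb-Douglas utility function:
$$u(x,y) = Ax^a y^b$$
and the budget constraint
$$px + qy = m,$$
where $A, a, b, p, q, m$ are all positive. The consumer's demand problem is
$$\max_{x,y}\ Ax^a y^b \quad \text{s.t.} \quad px + qy = m.$$
Then
$$\mathcal{L}(x,y) = Ax^a y^b - \lambda(px + qy - m),$$
so that
$$\begin{cases} \frac{\partial \mathcal{L}}{\partial x} = aAx^{a-1}y^b - \lambda p = 0 \\ \frac{\partial \mathcal{L}}{\partial y} = bAx^a y^{b-1} - \lambda q = 0 \\ px + qy = m \end{cases} \iff \begin{cases} aAx^{a-1}y^b = \lambda p \\ bAx^a y^{b-1} = \lambda q \\ px + qy = m. \end{cases}$$
We can simplify the system further if we divide the first equation by the second one (both sides are positive, so $\lambda \neq 0$). Then
$$\begin{cases} \frac{ay}{bx} = \frac{p}{q} \\ px + qy = m \end{cases} \iff \begin{cases} y = \frac{bp}{aq}\,x \\ px + qy = m. \end{cases}$$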
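Now, we can insert the first equation into the second one and obtain
$$px + q\,\frac{bp}{aq}\,x = m \iff \frac{a+b}{a}\,px = m \iff x^* = \frac{a}{a+b}\cdot\frac{m}{p}, \qquad y^* = \frac{bp}{aq}\,x^* = \frac{b}{a+b}\cdot\frac{m}{q}.$$
$\lambda \ge 0$ with $\lambda = 0$ if $g(x,y) < c$, and $\lambda \ge 0$ if the constraint is satisfied with equality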
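(binding constraint). Therefore, in order to solve the posed maximization problem, we need to find points $(x_0, y_0)$ and $\lambda_0$ such that the necessary Kuhn-Tucker conditions are met and the constraint $g(x_0, y_0) \le c$ is satisfied. Note that AT LEAST one of the conditions $\lambda_0 \ge 0$ and $g(x_0, y_0) \le c$ must hold with equality.
Example. Let $f(x,y) = x^2 + y^2 + y - 1$ and
$$\max_{x,y}\ x^2 + y^2 + y - 1 \quad \text{s.t.} \quad x^2 + y^2 \le 1.$$
Then $\mathcal{L}(x,y) = x^2 + y^2 + y - 1 - \lambda(x^2 + y^2 - 1)$ and the necessary Kuhn-Tucker conditions are
$$\frac{\partial \mathcal{L}}{\partial x} = 2x - 2\lambda x = 0, \qquad \frac{\partial \mathcal{L}}{\partial y} = 2y + 1 - 2\lambda y = 0, \qquad \lambda \ge 0 \text{ with } \lambda = 0 \text{ if } x^2 + y^2 < 1.$$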
Firstly, we find $(x_0, y_0)$ which satisfies the necessary Kuhn-Tucker conditions for some suitable value of $\lambda$. We focus on the first two equations in the system:
$$\begin{cases} 2x - 2\lambda x = 0 \\ 2y + 1 - 2\lambda y = 0 \end{cases} \iff \begin{cases} 2x(1 - \lambda) = 0 \\ 2y(1 - \lambda) = -1. \end{cases}$$
Looking at the second equation, it implies that $\lambda \neq 1$, because $2y \cdot 0 = 0 \neq -1$. Then, looking at the first equation, this implies that $x = 0$. Therefore, knowing that $x = 0$, we have
$$0^2 + y^2 \le 1 \iff y^2 \le 1.$$
Therefore, let us consider 2 cases: (i) $y^2 = 1$ and (ii) $y^2 < 1$. Let us focus on (i):
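$$y^2 = 1 \iff \begin{cases} y = 1: & 2(1)(1 - \lambda) = -1 \iff \lambda = \frac{3}{2}, \\ y = -1: & 2(-1)(1 - \lambda) = -1 \iff \lambda = \frac{1}{2}. \end{cases}$$
Hence, from (i), we obtain two candidate points: $(0,1)$ with $\lambda = \frac{3}{2}$ and $(0,-1)$ with $\lambda = \frac{1}{2}$. Note that these are candidate points, because $x^2 + y^2 \le 1$ is satisfied with equality (binding constraint) and $\lambda \ge 0$ is satisfied (here even $\lambda > 0$), as required when the constraint is binding.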
Now, we move on to (ii):
$$y^2 < 1 \implies \lambda = 0 \implies 2y(1 - 0) = -1 \iff y = -\frac{1}{2}.$$
Note that this gives a third candidate point $\left(0, -\frac{1}{2}\right)$ with $\lambda = 0$, because $x^2 + y^2 = 0 + \frac{1}{4} < 1$ and the constraint is non-binding. Therefore, we have three candidate points in total. Then we examine which of them gives the largest functional value:
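$$\begin{aligned}
f(0,1) &= 0^2 + 1^2 + 1 - 1 = 1 \quad \text{(maximum)},\\
f(0,-1) &= 0^2 + (-1)^2 + (-1) - 1 = -1,\\
f\left(0,-\tfrac{1}{2}\right) &= 0^2 + \left(-\tfrac{1}{2}\right)^2 - \tfrac{1}{2} - 1 = -\tfrac{5}{4}.
\end{aligned}$$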
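The case analysis can be double-checked with a short Python sketch (an illustration, not part of the method): it tests each candidate $(x, y, \lambda)$ against the necessary Kuhn-Tucker conditions for $g(x,y) = x^2 + y^2 \le 1$ using exact fractions, and then picks the candidate with the largest value of $f$. The helper `kt_conditions_hold` is a hypothetical name introduced here.

```python
from fractions import Fraction as Fr

def f(x, y):
    # Objective of the example: f(x, y) = x^2 + y^2 + y - 1
    return x**2 + y**2 + y - 1

def kt_conditions_hold(x, y, lam):
    """Check the necessary Kuhn-Tucker conditions for the constraint x^2 + y^2 <= 1."""
    g = x**2 + y**2
    dL_dx = 2 * x - 2 * lam * x
    dL_dy = 2 * y + 1 - 2 * lam * y
    feasible = g <= 1
    # lambda >= 0, and lambda must be 0 whenever the constraint is slack (g < 1)
    slackness = lam >= 0 and (g == 1 or lam == 0)
    return dL_dx == 0 and dL_dy == 0 and feasible and slackness

candidates = [
    (Fr(0), Fr(1), Fr(3, 2)),    # case (i), y = 1
    (Fr(0), Fr(-1), Fr(1, 2)),   # case (i), y = -1
    (Fr(0), Fr(-1, 2), Fr(0)),   # case (ii), lambda = 0
]
assert all(kt_conditions_hold(*c) for c in candidates)

for x, y, lam in candidates:
    print(x, y, lam, f(x, y))    # x, y, lambda, f(x, y)

best = max(candidates, key=lambda c: f(c[0], c[1]))
print("maximum at", best[0], best[1])  # maximum at 0 1
```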
Let us explore an example involving multiple constraints.
Example. Let $f(x,y) = x + 3y - 4e^{-x-y}$ and
$$\max_{x,y}\ x + 3y - 4e^{-x-y} \quad \text{s.t.} \quad x + 2y \le 2, \quad x + y \le 1.$$
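Then
$$\mathcal{L}(x,y) = x + 3y - 4e^{-x-y} - \lambda_1[x + 2y - 2] - \lambda_2[x + y - 1]$$
and the necessary Kuhn-Tucker conditions are
$$\begin{aligned}
\frac{\partial \mathcal{L}}{\partial x} &= 1 + 4e^{-x-y} - \lambda_1 - \lambda_2 = 0,\\
\frac{\partial \mathcal{L}}{\partial y} &= 3 + 4e^{-x-y} - 2\lambda_1 - \lambda_2 = 0,\\
\lambda_1 &\ge 0 \text{ with } \lambda_1 = 0 \text{ if } x + 2y < 2,\\
\lambda_2 &\ge 0 \text{ with } \lambda_2 = 0 \text{ if } x + y < 1.
\end{aligned}$$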
We can subtract the second equation from the first one:
$$1 + 4e^{-x-y} - \lambda_1 - \lambda_2 - 3 - 4e^{-x-y} + 2\lambda_1 + \lambda_2 = 0 \iff \lambda_1 - 2 = 0 \iff \lambda_1 = 2,$$
which is strictly positive. This implies that the first constraint is binding, i.e. $x + 2y = 2$. Let us consider two cases: (i) $\lambda_2 = 0$ and (ii) $\lambda_2 > 0$. Looking at (i) and the first equation:
$$1 + 4e^{-x-y} - 2 = 0 \iff e^{-x-y} = \frac{1}{4} \iff -x - y = \ln\frac{1}{4} \iff x + y = \ln 4 \approx 1.39.$$
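However, we know that $x + y \le 1$ must hold, so case (i) is impossible, which implies that $\lambda_2 > 0$ ((ii) holds). Hence, $x + y = 1$ and we obtain the second equation for the system. In particular,
$$\begin{cases} x + 2y = 2 \\ x + y = 1 \end{cases} \iff \begin{cases} x = 0 \\ y = 1. \end{cases}$$
We are left to obtain $\lambda_2$. Again, from the first equation:
$$1 + 4e^{-1} - 2 - \lambda_2 = 0 \iff \lambda_2 = 4e^{-1} - 1 \approx 0.47 > 0.$$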
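A numerical sanity check, sketched in Python under the assumption that the candidate found above is $(0,1)$ with $\lambda_1 = 2$ and $\lambda_2 = 4/e - 1$: it evaluates the partial derivatives of $\mathcal{L}$ at that point and then runs a crude grid search over the feasible points inside the box $[-3,3]\times[-3,3]$. The box and the grid size are arbitrary choices, and since the feasible set is unbounded, the grid search only checks the answer within that box.

```python
import math

def f(x, y):
    return x + 3 * y - 4 * math.exp(-x - y)

def grad_L(x, y, lam1, lam2):
    # Partial derivatives of L = f - lam1 (x + 2y - 2) - lam2 (x + y - 1)
    e = 4 * math.exp(-x - y)
    return 1 + e - lam1 - lam2, 3 + e - 2 * lam1 - lam2

x0, y0 = 0.0, 1.0
lam1, lam2 = 2.0, 4 / math.e - 1        # multipliers from the derivation above
res_x, res_y = grad_L(x0, y0, lam1, lam2)
print("residuals of dL/dx, dL/dy:", res_x, res_y)  # both numerically zero

# Crude grid search over feasible points in the box [-3, 3] x [-3, 3]
n = 121
pts = [-3 + 6 * k / (n - 1) for k in range(n)]
feasible = [(x, y) for x in pts for y in pts if x + 2 * y <= 2 and x + y <= 1]
best = max(feasible, key=lambda p: f(*p))
print("grid maximiser:", best, "f =", f(*best), "vs f(0, 1) =", f(x0, y0))
```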
7.6 Kuhn-Tucker: Sufficient Conditions
Suppose $(x_0, y_0)$ satisfies the necessary conditions and $g(x_0, y_0) \le c$ is satisfied. If $\mathcal{L}$ is concave, then $(x_0, y_0)$ solves the maximization problem. Similarly, if $\mathcal{L}$ is convex, then it solves the minimization problem.
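Proof. If $\mathcal{L}$ is concave, then $(x_0, y_0)$, being a stationary point of $\mathcal{L}$ (for the given $\lambda$), must be its maximum. This implies
$$\mathcal{L}(x_0, y_0) \ge \mathcal{L}(x, y) \quad \text{for all } (x,y).$$
Using the definition of the Lagrangian, we then obtain
$$\begin{aligned}
& f(x_0,y_0) - \lambda[g(x_0,y_0) - c] \ge f(x,y) - \lambda[g(x,y) - c]\\
\iff\ & f(x_0,y_0) - f(x,y) \ge \lambda g(x_0,y_0) - \lambda c - \lambda g(x,y) + \lambda c\\
\iff\ & f(x_0,y_0) - f(x,y) \ge \lambda[g(x_0,y_0) - g(x,y)].
\end{aligned}$$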
Now, we consider 2 cases: (i) $g(x_0,y_0) < c$ and $\lambda = 0$ and (ii) $g(x_0,y_0) = c$ and $\lambda \ge 0$. Looking at (i), we obtain
$$f(x_0,y_0) - f(x,y) \ge 0 \cdot [g(x_0,y_0) - g(x,y)] = 0 \implies f(x_0,y_0) \ge f(x,y),$$
and thus $(x_0, y_0)$ solves the maximization problem. Looking at (ii), we obtain
$$f(x_0,y_0) - f(x,y) \ge \lambda[g(x_0,y_0) - g(x,y)] = \lambda[c - g(x,y)] \ge 0 \implies f(x_0,y_0) \ge f(x,y),$$
because $g(x,y) \le c$ for every feasible $(x,y)$ and $\lambda \ge 0$, so $f(x_0,y_0) - f(x,y)$ is greater than or equal to a non-negative number, hence $f(x_0,y_0) \ge f(x,y)$. This means that $(x_0, y_0)$ solves the maximization problem. We will come back to the concave Lagrangian in Chapter 11, where we connect linear algebra and multivariate differential calculus.
8 Chapter 8: Integral Calculus (Single Variable)
8.1 Indefinite vs. Definite Integrals
In this chapter we will explore the notions of indefinite and definite integrals and their properties. Also, a few integration rules, such as integration by substitution and integration by parts, will be examined alongside some examples.
Consider $f(x)$ and $F(x)$ with $f(x) = F'(x)$. Then the indefinite integral is defined as
$$\int f(x)\,dx = F(x) + c,$$
where $c$ is an arbitrary constant. Because of this constant, the integral is called indefinite: $F(x) + c$ is not a definite function but a class of functions, since the constant is arbitrary. Finding the indefinite integral is sometimes called finding the antiderivative.
Example. Let f(x) = 2°, g(x) = sinx and h(x) = 2xe-”. Then
ol
[fox = D0
$$\int g(x)\,dx = -\cos x + c,$$
[lear
On the other hand, the definite integral results in a number in $\mathbb{R}$ and not in a class of functions. In particular, consider a continuous function $f(x)$ defined on $[a,b]$. Also, consider a continuous function $F(x)$ defined on $[a,b]$ such that $F'(x) = f(x)$ for every $x \in [a,b]$. Then the definite integral is
$$\int_a^b f(x)\,dx = F(x)\Big|_a^b = F(b) - F(a),$$
which is clearly a number. This result is known as the Fundamental Theorem of Calculus. We will prove this result rigorously, but before this, let us explore some examples and give the rigorous definition of the definite integral, as well as its interpretation.
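To see the Fundamental Theorem of Calculus at work numerically, the Python sketch below compares $F(b) - F(a)$ with a sum of thin rectangles whose heights are taken at the midpoints of equal sub-intervals. The integrand $f(x) = x^2$ on $[1,5]$, with $F(x) = x^3/3$ and $F(5) - F(1) = 124/3$, is an illustrative choice.

```python
def riemann_midpoint(f, a, b, n):
    # Chop [a, b] into n equal sub-intervals and use f at each midpoint as the height
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def f(x):
    return x**2          # illustrative integrand

def F(x):
    return x**3 / 3      # an antiderivative: F'(x) = f(x)

a, b = 1.0, 5.0
exact = F(b) - F(a)      # Fundamental Theorem of Calculus: 124/3
for n in (10, 100, 1000):
    approx = riemann_midpoint(f, a, b, n)
    print(n, approx, abs(approx - exact))
```

The error shrinks as n grows, which is exactly the limiting process formalized in the next section.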
Example. Let the interval be [1,5], then
: -
J Pax = | =
h 13),
feu _ [<} =}
h Z|;
0s 20),
8.2 Definition and Interpretation of the Definite Integral
Let $f(x) \ge 0$ on an interval $[a,b]$. One of the interpretations of $\int_a^b f(x)\,dx$ is the area under the curve. The figure below illustrates this.
Figure 36: Area under the curve
The area $S$ is approximated by, firstly, dividing the interval $[a,b]$ into $n$ sub-intervals of equal size $\Delta x_i = (b-a)/n$, which we put in a set (with $x_0 = a$ and $x_n = b$): $\{x_1 - a, x_2 - x_1, \dots, x_n - x_{n-1}\} = \{\Delta x_1, \Delta x_2, \dots, \Delta x_n\} = I(n)$. We call $I(n)$ the partition, which is a choice of 'chopping' $[a,b]$ in terms of the length of $\Delta x_i$. $I(n)$ is a function of $n$, since the larger $n$ is, the smaller $\Delta x_i$ becomes. The figure below presents this.
Figure 37: Forming sub-intervals. Source: tutorial.math.lamar.edu
Then the area is approximated in the following way:
$$S = \lim_{n\to\infty} \sum_{i=1}^{n} f(x_i^*)\,\Delta x_i, \qquad x_i^* \in [x_{i-1}, x_i].$$
This is the definition of the Riemann integral. Note that the area is the limit as the size of $\Delta x_i$ shrinks (as $n \to \infty$). For any finite $n$, the sum above is just an approximation and it can undervalue or overvalue the area, depending on where the function $f$ is evaluated within the interval $[x_{i-1}, x_i]$. Therefore, it is possible to put more structure on such a definite integral construction. In particular, we can make a distinction between the lower ($S_l$) and the upper ($S_u$) approximating sums:
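$$S_l = \sum_{i=1}^{n} \inf_{x \in I_i} f(x)\,(x_i - x_{i-1}) = \sum_{i=1}^{n} \inf_{x \in I_i} f(x)\,\Delta x_i,$$
$$S_u = \sum_{i=1}^{n} \sup_{x \in I_i} f(x)\,(x_i - x_{i-1}) = \sum_{i=1}^{n} \sup_{x \in I_i} f(x)\,\Delta x_i,$$
where $I_i$ is the $i$-th sub-interval of length $\Delta x_i$. Intuitively, $S_l$ and $S_u$ represent the largest possible undervaluation and overvaluation of the area, respectively. The figure below represents both cases, where, in the case of $S_l$, $\inf_{x \in I_i} f(x) = m_i$, and in the case of $S_u$, $\sup_{x \in I_i} f(x) = M_i$.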
Figure 38: Upper and lower approximating sums. Source: math.feld.cvut.cz
The integral $\int_a^b f(x)\,dx$ exists if
$$\lim_{n\to\infty} \sum_{i=1}^{n} \inf_{x \in I_i} f(x)\,\Delta x_i = \lim_{n\to\infty} \sum_{i=1}^{n} \sup_{x \in I_i} f(x)\,\Delta x_i.$$
In other words, we require the limits of both lower and upper approximating sums to be equal. This happens for continuous functions on $[a,b]$. This more structured view of an integral construction is called the Darboux integral.
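The lower and upper sums can be computed directly when the infimum and supremum on each sub-interval are easy to locate. The Python sketch below assumes $f$ is non-decreasing (here $f(x) = e^x$ on $[0,1]$, an illustrative choice with integral $e - 1$), so that $m_i$ sits at the left end point and $M_i$ at the right end point of each sub-interval. For such $f$ the gap $S_u - S_l$ telescopes to $(f(b) - f(a))(b - a)/n$, which shrinks to zero as $n$ grows.

```python
import math

def darboux_sums(f, a, b, n):
    """Lower and upper Darboux sums on n equal sub-intervals.

    Assumes f is non-decreasing on [a, b], so on each sub-interval
    inf f is attained at the left end point and sup f at the right one.
    """
    dx = (b - a) / n
    xs = [a + i * dx for i in range(n + 1)]
    lower = sum(f(xs[i - 1]) * dx for i in range(1, n + 1))
    upper = sum(f(xs[i]) * dx for i in range(1, n + 1))
    return lower, upper

exact = math.e - 1  # integral of e^x over [0, 1]
for n in (10, 100, 1000):
    s_l, s_u = darboux_sums(math.exp, 0.0, 1.0, n)
    print(n, s_l, s_u, s_u - s_l)
```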
Note that the definition of the integral was given for $f(x) \ge 0$ on $[a,b]$. However, we can proceed with the same definition if we split an arbitrary $f(x)$ into non-negative parts. Indeed,
$$f(x) = f^+(x) - f^-(x),$$
where $f^+(x) = \max\{0, f(x)\}$ and $f^-(x) = \max\{0, -f(x)\}$. Hence, $f$ is the difference of two non-negative functions, and its integral is defined as $\int_a^b f^+(x)\,dx - \int_a^b f^-(x)\,dx$.
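The decomposition $f = f^+ - f^-$ can be illustrated numerically. The Python sketch below, with $f(x) = \sin x$ on $[0, 2\pi]$ as an illustrative choice, approximates each integral with a midpoint sum: both non-negative parts integrate to about 2, and their difference matches the integral of $\sin x$ itself, which is about 0.

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    # Midpoint Riemann sum with n equal sub-intervals
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def positive_part(f):
    return lambda x: max(0.0, f(x))    # f^+(x) = max{0, f(x)}

def negative_part(f):
    return lambda x: max(0.0, -f(x))   # f^-(x) = max{0, -f(x)}

a, b = 0.0, 2 * math.pi
pos = midpoint_integral(positive_part(math.sin), a, b)   # close to 2
neg = midpoint_integral(negative_part(math.sin), a, b)   # close to 2
total = midpoint_integral(math.sin, a, b)                # close to 0
print(pos, neg, pos - neg, total)
```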
Ultimately, we constructed an integral as the limit of the approximation which comes from 'chopping' the area under the curve and letting the pieces become infinitesimally fine. This might suggest that the area under the curve is the definition. However, using the same approximation algorithm, it is possible to obtain different measures, such as the length of a curve (see the section on arc length integrals at the end of this chapter). This implies that the area under the curve is not the defining property of an integral; it is only an interpretation.
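8.3 Properties of the Definite Integral
Let $f(x)$ and $g(x)$ be integrable on $[a,b]$. Then
• $\int_a^b [f(x) + g(x)]\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$;
• $\int_a^b \alpha f(x)\,dx = \alpha \int_a^b f(x)\,dx$ for $\alpha \neq 0$;
• $f(x) \le g(x)$ for every $x \in [a,b]$ $\implies \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$;
• $\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx$ if $a < b$;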