Lecture Notes
Preface
1 The basics
1.1 Numbers
1.2.3 Inequalities
1.2.5 Logarithms
1.5 Exercises
2.2 R as a calculator
2.3 Help
2.5 Vectors
2.9 Exercises
3.8 Exercises
4 More on functions
5 Derivatives
6 Integration
9 Optimization
In each chapter, we cover the theoretical aspects of the topic and illustrate
them with a handful of examples. At the end of each chapter, we provide a
collection of exercises, and we close with a short section with recommendations
for further reading and exercises from other sources. In particular, we
generally refer to the following two books for more details on the topics
covered:
2021 [1]
While in the above books the presentation of the material might be more
detailed, we see the main advantage of this document in the fact that it
integrates both the mathematical aspects and the application in R of the
covered topics.
If you find any mistakes or typos, we will be grateful if you report them to
jana.hlavinova@wu.ac.at so that we can further improve the lecture notes.
Chapter 1
The basics
1.1 Numbers
In this course, we work with real numbers, denoted as R. These are all the
usual numbers that you can think of, such as 0, 5, −153, 75.2, √2, π, . . . (note
that we use a dot as the decimal delimiter, following the English conventions
and at the same time ensuring consistency with the usage of R). The set of
all real numbers contains other sets of numbers such as:
For particularly large or particularly small numbers, one sometimes does not
wish to write them out with all of their digits, in particular if there are a
lot of zeroes after or before the last/first non-zero digit (e.g. the value is in
millions). In such cases, we use a compact way of writing numbers called
scientific notation. In scientific notation, a non-zero number is written in
the form m · 10^n, where n is an integer and 1 ≤ |m| < 10. The values n
and m are called the exponent and mantissa (or significand), respectively.
Sometimes, for instance on calculators, the notation m e n is used (be careful:
e does not stand for Euler’s number here); that is, instead of writing
e.g. 3.42 · 10^{−2}, one would write 3.42e−2 or 3.42e−02. In the following,
we give a few examples of numbers written in both the ’usual’ way and in
scientific notation:
The distance between the Earth and the Sun: approximately 149600000 km =
1.496e8 km.
1.2 Elementary algebra
In algebra, we work with variables that might take different values. In some
situations, we want to solve equations to find certain unknown values; in
other cases, we incorporate variables on purpose to keep flexibility when
describing certain aspects of the world. Variables (and parameters) are
usually denoted with Latin letters; sometimes you will also encounter Greek
letters. Moreover, various indices or accents, such as a hat, a bar or a prime,
can be used to introduce new variables.
−(−a) = a (1.1)
−(a + b) = −a − b (1.2)
a(b + c) = ab + ac (1.3)
ab = ba (1.4)
(a + b)(c + d) = ac + ad + bc + bd (1.5)
(a + b)^2 = a^2 + 2ab + b^2 (1.6)
(a − b)^2 = a^2 − 2ab + b^2 (1.7)
(a + b)(a − b) = a^2 − b^2 (1.8)
Formulae (1.6) and (1.7) are binomial formulae, since they describe the
algebraic expansion of a power (in this case a square) of a binomial term.
For fractions, the following basic rules apply for a, b, c, d ∈ R:
\frac{a · c}{b · c} = \frac{a}{b} if b, c ≠ 0 (1.9)
\frac{−a}{b} = \frac{a}{−b} = −\frac{a}{b} if b ≠ 0 (1.10)
\frac{a}{c} + \frac{b}{c} = \frac{a + b}{c} for c ≠ 0 (1.11)
\frac{a}{b} + \frac{c}{d} = \frac{ad + cb}{bd} for b, d ≠ 0 (1.12)
\frac{a}{b} · \frac{c}{d} = \frac{ac}{bd} for b, d ≠ 0 (1.13)
\frac{a}{b} ÷ \frac{c}{d} = \frac{a}{b} · \frac{d}{c} = \frac{ad}{bc} for b, c, d ≠ 0 (1.14)
Another common type of equation is the quadratic equation, in which the variable
x appears at most in a quadratic term. The basic form of a quadratic equation
The formula given above can in fact be derived quite easily. The expression
ax^2 + bx + c can be rewritten with the help of one of the binomial formulae
as follows (you might check the equality as an exercise or, even better, try
to derive it yourself):
ax^2 + bx + c = a \left( x + \frac{b}{2a} \right)^2 − \frac{b^2}{4a} + c.
If we set this equal to 0, as in the original equation, we obtain the equation
a \left( x + \frac{b}{2a} \right)^2 = \frac{b^2}{4a} − c
or equivalently
\left( x + \frac{b}{2a} \right)^2 = \frac{b^2}{4a^2} − \frac{c}{a}.
Since on the left hand side we have a squared value, the equation can only
have a solution if the right hand side is non-negative:
\frac{b^2}{4a^2} − \frac{c}{a} ≥ 0
b^2 − 4ac ≥ 0
which corresponds to the condition given above. Assuming that this condition
is satisfied, we can take the square root of both sides to obtain
x + \frac{b}{2a} = ± \sqrt{\frac{b^2}{4a^2} − \frac{c}{a}} = ± \sqrt{\frac{b^2 − 4ac}{4a^2}} = \frac{± \sqrt{b^2 − 4ac}}{2a}
which after subtracting \frac{b}{2a} from both sides leads to the known formula.
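The derivation above translates directly into a few lines of R (the software is introduced in Chapter 2). The following is only a minimal sketch: the function name solve_quadratic and the argument name c0 (used instead of c, which is also the name of a built-in R function) are hypothetical choices and not part of base R.

```r
# Hypothetical helper (not part of base R):
# real solutions of a*x^2 + b*x + c0 = 0
solve_quadratic <- function(a, b, c0) {
  stopifnot(a != 0)                 # otherwise the equation is not quadratic
  D <- b^2 - 4 * a * c0             # condition derived above: D >= 0
  if (D < 0) return(numeric(0))     # no real solution
  if (D == 0) return(-b / (2 * a))  # exactly one solution
  (-b + c(1, -1) * sqrt(D)) / (2 * a)
}

solve_quadratic(1, -1, -2)   # two solutions
solve_quadratic(1, 2, 1)     # one solution
solve_quadratic(1, 0, 1)     # no real solution
```

The three calls cover the three cases distinguished by the sign of b^2 − 4ac.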
Let us now study the relationship between the parameters a, b, c of the
equation and its solutions if b^2 − 4ac > 0. Let us denote
x_1 = \frac{−b + \sqrt{b^2 − 4ac}}{2a} and x_2 = \frac{−b − \sqrt{b^2 − 4ac}}{2a}.
We can observe the following:
x_1 + x_2 = \frac{−b + \sqrt{b^2 − 4ac}}{2a} + \frac{−b − \sqrt{b^2 − 4ac}}{2a} = \frac{−b}{a}. In particular, if a = 1, the
sum of the two solutions is the negative of the coefficient of the linear
term x, −b.
x_1 · x_2 = \frac{−b + \sqrt{b^2 − 4ac}}{2a} · \frac{−b − \sqrt{b^2 − 4ac}}{2a} = \frac{b^2 − (b^2 − 4ac)}{4a^2} = \frac{c}{a}. In particular, if
a = 1, the product of the two solutions is the absolute term c.
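The two observations can be checked numerically. The sketch below uses arbitrarily chosen coefficients (an assumption made only for illustration) and compares both sides with all.equal, which tolerates rounding error.

```r
a <- 2; b <- -3; c0 <- -5         # arbitrary coefficients with b^2 - 4*a*c0 > 0
D  <- b^2 - 4 * a * c0            # discriminant
x1 <- (-b + sqrt(D)) / (2 * a)
x2 <- (-b - sqrt(D)) / (2 * a)
all.equal(x1 + x2, -b / a)        # sum of the solutions
all.equal(x1 * x2, c0 / a)        # product of the solutions
```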
The above facts allow for a simple way of factoring quadratic expressions:
If an expression x^2 + px + q can be factored into (x − x_1)(x − x_2), then x_1
and x_2 are the solutions of the quadratic equation x^2 + px + q = 0 and
therefore satisfy x_1 + x_2 = −p and x_1 · x_2 = q. (Alternatively, we have
x^2 + px + q = (x + y_1)(x + y_2) with y_1 + y_2 = p and y_1 · y_2 = q.) To illustrate
this, let us consider an example.
x^2 − x − 2a = 0
x_{1,2} = \frac{1 ± \sqrt{1 + 8a}}{2}.
This expression is well defined whenever 1 + 8a ≥ 0. In particular, we have
two real solutions for a > −1/8, one solution for a = −1/8 and no solution for
a < −1/8.
However, we have to remember, from the original equation, that x cannot be
equal to 3 or 5. Therefore we check for which values of a the solution above
would lead to any of these values:
For x ≠ 3, it has to be the case that
1 ± \sqrt{1 + 8a} ≠ 6
± \sqrt{1 + 8a} ≠ 5
1 + 8a ≠ 25
a ≠ 3.
x ∈ ∅ if a < −1/8,
x = 1/2 if a = −1/8,
x = −2 if a = 3,
x = −4 if a = 10,
x = \frac{1 ± \sqrt{1 + 8a}}{2} if a ∈ (−1/8, 3) ∪ (3, 10) ∪ (10, ∞).
Definition 1.1. The absolute value |x| of a number x gives its distance from
0:
|x| = \begin{cases} x & \text{if } x ≥ 0 \\ −x & \text{if } x < 0. \end{cases}
Loosely speaking, the absolute value of a number is its value without its sign.
Following the definition of absolute value, which distinguishes two cases, one
can also solve equations involving absolute values by splitting them into
cases. We demonstrate this in the following example.
Problem 1.4. Solve the following equation in R: |x + 2| = 4|x − 3|.
Solution. From the definition of absolute value, we have
|x + 2| = \begin{cases} x + 2 & \text{if } x ≥ −2 \\ −(x + 2) = −x − 2 & \text{if } x < −2. \end{cases}
Similarly, we have
|x − 3| = \begin{cases} x − 3 & \text{if } x ≥ 3 \\ 3 − x & \text{if } x < 3. \end{cases}
Therefore we can split R into three intervals and solve the equation for each
of them separately:
1. x ∈ (−∞, −2):
|x + 2| = 4|x − 3|
−x − 2 = 4(3 − x)
x = 14/3 ∉ (−∞, −2)
(thus no solution in this interval)
2. x ∈ [−2, 3):
|x + 2| = 4|x − 3|
x + 2 = 4(3 − x)
x = 2
3. x ∈ [3, ∞):
|x + 2| = 4|x − 3|
x + 2 = 4(x − 3)
x = 14/3
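Collecting the three cases, the candidates that lie in their respective intervals are x = 2 and x = 14/3. As a quick plausibility check, one can plug them back into the equation in R. The function name f below is a hypothetical helper that returns the difference between the two sides.

```r
# Difference between the two sides of |x + 2| = 4|x - 3|
f <- function(x) abs(x + 2) - 4 * abs(x - 3)

candidates <- c(2, 14/3)
f(candidates)   # both entries should be zero, up to rounding error
f(-3)           # a point from the first interval: clearly not zero
```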
1.2.3 Inequalities
iv) Applying an increasing operation (i.e. a function f for which f (x) <
f (y) whenever x < y) keeps the inequality sign.
v) Applying a decreasing operation (i.e. a function f for which f (x) > f (y)
whenever x < y) reverses the inequality sign.
Some operations, e.g. taking the reciprocal of a number or taking the square
of a number, are only increasing or decreasing if considering a certain set of
values. If we want to apply these operations, we have to carefully consider
what interval we are operating on. For instance, taking the reciprocal of
a number is a decreasing operation on (−∞, 0) (since e.g. −3 < −2 but
−1/3 > −1/2) and on (0, ∞) (since e.g. 5 < 10 but 1/5 > 1/10), but not on R
(since e.g. −2 < 2 and −1/2 < 1/2). Therefore, if both sides of the inequality share the
sign (they are both positive or they are both negative), the inequality sign
changes. This, however, is not the case if one side is negative and the other
is positive.
For the first interval, let us consider x = −1. We have (−5)^2 > 16, which
means this interval is not a part of the solution.
For the second interval, let us consider x = 1. We have (−3)^2 < 16, thus
x ∈ (0, 8) does solve the inequality.
For the third interval, let us consider x = 10. We have 6^2 > 16, which means
this interval is not a part of the solution.
Combining all the above cases, we get that the complete solution of the given
inequality is x ∈ (0, 8).
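The test-point method can be mimicked in R. Note the assumption: judging from the test values used above, the inequality is taken here to be (x − 4)^2 < 16; the helper name holds is hypothetical. Checking a grid of points is a plausibility check, not a proof.

```r
# Assumption: the inequality under discussion is (x - 4)^2 < 16
holds <- function(x) (x - 4)^2 < 16

holds(c(-1, 1, 10))        # one test point from each of the three intervals
x <- seq(-5, 15, by = 0.5) # a grid of candidate values
range(x[holds(x)])         # smallest and largest grid point satisfying it
```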
For working with powers, the following rules apply (a, b > 0, r, s ∈ R, n ∈ N):
a^0 = 1 (1.15)
a^{−r} = \frac{1}{a^r} (1.16)
a^{1/n} = \sqrt[n]{a} (1.17)
a^r · a^s = a^{r+s} (1.18)
a^r · b^r = (ab)^r (1.19)
(a^r)^s = a^{rs} (1.20)
if r > 0 : a^r < b^r for a < b (1.21)
if r < 0 : a^r > b^r for a < b (1.22)
if a > 1 : a^r < a^s for r < s (1.23)
if a < 1 : a^r > a^s for r < s (1.24)
Note that we chose a, b > 0 to make sure that terms such as a^r always exist.
The rules above, however, apply whenever all terms are well defined. For
instance, in the case of equation (1.17), the rule also applies for a < 0 if n is odd.
To illustrate the use of (some of) the above rules, let us solve an example.
Problem 1.6. If x^{−4} y^6 = 25, find x^2 y^{−3} + 2x^{−12} y^{18}.
Solution. We start by rewriting the expression whose value we are looking
for in terms of x^{−4} y^6, with the help of the above rules. Then we can plug in
25 for x^{−4} y^6. We get
x^2 y^{−3} + 2x^{−12} y^{18} = (x^{−4} y^6)^{−1/2} + 2(x^{−4} y^6)^3
= \frac{1}{\sqrt{x^{−4} y^6}} + 2(x^{−4} y^6)^3
= \frac{1}{\sqrt{25}} + 2 · 25^3 = 31250.2.
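The result can be sanity-checked numerically in R by picking one concrete pair of positive values with x^{−4} y^6 = 25. The choice x = 1, y = 25^{1/6} below is an arbitrary assumption; any other positive pair satisfying the condition should give the same value.

```r
x <- 1
y <- 25^(1/6)                        # then x^(-4) * y^6 equals 25
x^(-4) * y^6                         # check the condition
x^2 * y^(-3) + 2 * x^(-12) * y^18    # value of the expression
```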
b) Suppose you buy something for €1000 which decreases in value (depreciates)
by 2% per year. How much is it worth after 5 years?
c) Repeating the steps from parts a) and b), we get that after 5 years, the
object is worth €1000 · 1.02^5 · 0.98^5 ≈ €998.
Note that the value is not equal to €1000: increasing and then decreasing
a certain price or value by the same percentage does not result in the
original price. The reason is simple: (1 + r)(1 − r) = 1 − r^2 ≠ 1 for r ≠ 0. If
a businessman decreases the price of a product by r · 100% and would
later like to increase it back to the original price, the increase has to be
by more than r · 100%.
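The computation in part c) and the remark that follows it can be reproduced in R. This is a small sketch; the variable names start and r are chosen here for illustration only.

```r
start <- 1000
start * 1.02^5 * 0.98^5   # five increases by 2% and five decreases by 2%

r <- 0.02
(1 + r) * (1 - r)         # equals 1 - r^2, which is below 1
1 / (1 - r) - 1           # relative increase needed to undo a decrease by r
```

The last line shows that undoing a 2% decrease requires an increase of slightly more than 2%.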
1.2.5 Logarithms
Special logarithms. Among all possible logarithm bases, there are two
numbers that stand out. The natural logarithm arises for a = e (Euler’s
number, e ≈ 2.7183) and is often denoted as ln x. However, it is also not
unusual, in particular in programming languages – including R – to denote it
log x. This can often lead to confusion since log x sometimes also stands for
the decadic logarithm with a = 10. To avoid such confusion, in this course
we will explicitly write the base unless we mean the natural logarithm, i.e. both
ln x and log x will stand for the natural logarithm, whereas the decadic logarithm
will be denoted log_{10} x.
The following rules apply when working with logarithms (a, b, x, y > 0, a, b ≠
1, r ∈ R):
log_a(xy) = log_a x + log_a y (1.25)
log_a 1 = 0 (1.26)
log_a x^r = r log_a x (1.27)
log_a \frac{x}{y} = log_a x − log_a y (1.28)
log_a x = \frac{log_b x}{log_b a} (1.29)
if a > 1 : log_a x < log_a y for x < y (1.30)
if a < 1 : log_a x > log_a y for x < y (1.31)
The last two properties in particular, combined with the fact that log_a 1 = 0,
imply that if the base is larger than 1 (which is the case for both natural
and decadic logarithm) log_a x is negative for all x < 1 and positive whenever
x > 1. If a < 1, we have negative log_a x for x > 1 and positive logarithm for
x < 1.
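Some of these rules can be checked numerically in R, whose log function accepts the base as a second argument (see Chapter 2). The values of x, y, a and b below are arbitrary choices made for illustration.

```r
x <- 8; y <- 3; a <- 2; b <- 10
all.equal(log(x * y, base = a), log(x, base = a) + log(y, base = a))  # (1.25)
all.equal(log(x, base = a), log(x, base = b) / log(a, base = b))      # (1.29)

log(0.5)               # base e > 1 and x < 1: negative
log(4, base = 0.5)     # base < 1 and x > 1: negative
log(0.25, base = 0.5)  # base < 1 and x < 1: positive
```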
Problem 1.8. Find the mistake in the following ’proof’ showing that 2 < 1:
1/4 < 1/2
ln(1/4) < ln(1/2)
ln((1/2)^2) < ln(1/2)
2 ln(1/2) < ln(1/2)
2<1
Solution. Following the chain of inequalities one by one, we check that the
first four inequalities are true: For base e, the logarithm is an increasing
operation, so taking the logarithm of both sides does not change the
direction of the inequality. The next two steps just rewrite the left
hand side using the properties of the logarithm. However, in the last step,
one divides by ln(1/2). Since e > 1 and 1/2 < 1, this is a negative number,
which means that the inequality sign has to be reversed in this step.
The logarithm and the exponential are closely connected to each other as
they represent inverse operations. That means that if we apply one of the
operations to a number and then apply the other to the result, the final
result is the same as the original number. We therefore have the following
two useful properties:
log_a(a^x) = x, (1.32)
a^{log_a(x)} = x. (1.33)
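Properties (1.32) and (1.33) can be illustrated in R with an arbitrary base and value (both chosen here purely as an example). Because computers work with finitely many digits, the results agree with x only up to a tiny rounding error, which is why all.equal is used for the comparison.

```r
a <- 3; x <- 1.7
log(a^x, base = a)                  # (1.32): gives back x
a^log(x, base = a)                  # (1.33): gives back x
all.equal(log(a^x, base = a), x)
all.equal(a^log(x, base = a), x)
```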
The truth values of the four basic compound statements, depending on the
truth values of the statements A and B, can be summarized in the following
truth table:
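Such a truth table can also be generated in R. The sketch below assumes that the four basic statements are the conjunction (A and B), the disjunction (A or B), the implication (A ⇒ B) and the equivalence (A ⇔ B); the column names are chosen here for illustration. R has no built-in implication operator, so A ⇒ B is written as "not A, or B".

```r
# All four combinations of truth values of A and B
tt <- expand.grid(A = c(TRUE, FALSE), B = c(TRUE, FALSE))
tt$and     <- tt$A & tt$B     # conjunction
tt$or      <- tt$A | tt$B     # disjunction
tt$implies <- !tt$A | tt$B    # implication A => B
tt$iff     <- tt$A == tt$B    # equivalence A <=> B
tt
```

The only row in which the implication is false is the one with A true and B false.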
Remark. Note that a non-strict inequality, i.e. one with the ’at least’ or ’at
most’ sign, is in fact a compound statement, more precisely a disjunction of a
strict inequality and an equality. It is sufficient that one of the statements
is true for the whole statement to be true. This means that for instance the
statement ’All natural numbers are greater than or equal to -5.’ is a true
statement. The fact that no natural number can ever be equal to -5 is not a
problem, since for all natural numbers the part ’greater than -5’ is satisfied.
Finally, let us illustrate the compound statements and their truth values with
the help of a small party.
Problem 1.9. Anne and Bob were invited to a party. Anne did not feel
well and stayed home, Bob went to the party. Decide whether the following
statements are true or false:
ii) Anne went to the party and Bob went to the party.
iv) If Anne went to the party, then Bob went to the party.
vii) Anne went to the party if and only if Bob stayed home.
ii) “Anne went to the party and Bob went to the party.” translates to
A ∧ B and is false.
iii) “Anne went to the party or Bob went to the party.” translates to A ∨ B
and is true.
iv) “If Anne went to the party, then Bob went to the party.” translates to
A ⇒ B and is true.
vi) “Anne stayed home if and only if Bob stayed home.” translates to
¬A ⇔ ¬B and is false.
vii) “Anne went to the party if and only if Bob stayed home.” translates to
A ⇔ ¬B and is true.
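The answers listed above can be double-checked with R's logical operators & (and), | (or) and ! (not). As before, the implication A ⇒ B is expressed as !A | B, and equivalence as ==.

```r
A <- FALSE   # "Anne went to the party"
B <- TRUE    # "Bob went to the party"

A & B          # ii)  FALSE
A | B          # iii) TRUE
!A | B         # iv)  TRUE
(!A) == (!B)   # vi)  FALSE
A == (!B)      # vii) TRUE
```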
In mathematics, one often talks about necessary and sufficient conditions, for
instance for certain properties to be satisfied. Let us consider two statements
A and B. If we have A ⇒ B, we say that A is a sufficient condition for B
since it suffices that A is true for B to be true. Often it is the case that A
is easier to check than B, which is why sufficient conditions are looked for.
From the truth table it is also clear that A cannot be true if B is not
satisfied; thus we say that B is a necessary condition for A. However, generally
it is not the case that A is a necessary condition for B since (recall the truth
table again) B might be true even if A is false. For A to be both sufficient
and necessary for B (and vice versa), we need the equivalence relationship,
i.e. A ⇔ B.
As an example, let us consider the following properties: x > 2 and |x| > 2.
Clearly, there is an implication between these two statements, x > 2 ⇒ |x| >
2, such that x > 2 is a sufficient condition for |x| > 2, and |x| > 2 is a
necessary condition for x > 2 (it is not possible for x to be larger than 2
if |x| > 2 is not satisfied). However, x > 2 is not a necessary condition
for |x| > 2: for instance for x = −3 we also have |x| > 2, even though
the first condition is not satisfied. But if we extend the first condition to
x > 2 ∨ x < −2, we obtain an equivalence: (x > 2 ∨ x < −2) ⇔ |x| > 2.
Thus x > 2 ∨ x < −2 is a sufficient and necessary condition for |x| > 2.
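The relationships in this example can be explored numerically on a grid of values in R. This is only an illustration on finitely many points (the grid is an arbitrary choice), not a proof.

```r
x <- seq(-5, 5, by = 0.25)   # grid of test values
P <- x > 2                   # first condition
Q <- abs(x) > 2              # second condition

all(!P | Q)                  # TRUE: P implies Q at every grid point
all(!Q | P)                  # FALSE: Q does not imply P (e.g. x = -3)
all((x > 2 | x < -2) == Q)   # TRUE: the extended condition is equivalent to Q
```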
1.4 Summation and product notation
i) the running index, i.e. in this case i, the index that is used to assign
each term its place in the sequence whose members are being summed;
ii) the limits of the index, i.e. the smallest and largest index in the sequence
that should be included in the sum, here 1 and 10;
iii) a sequence of values that should be summed, here the values a_i.
The ingredients are thus the same as in the case of summation, however, the
individual values are multiplied with rather than added to each other.
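In R, sums and products over a range of indices can be computed with the functions sum and prod applied to a vector (vectors are discussed in Chapter 2). The ten values below are hypothetical and serve only to make the example concrete.

```r
a <- c(2, 5, 1, 4, 3, 6, 8, 7, 10, 9)  # hypothetical values a_1, ..., a_10
sum(a[1:10])    # a_1 + a_2 + ... + a_10
prod(a[1:10])   # a_1 * a_2 * ... * a_10
sum(a[3:5])     # same idea with index limits 3 and 5
```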
1.5 Exercises
1.1 For the following numbers, find the smallest subset of real numbers (N,
Z, Q or I) in which they necessarily belong:
√
a) √12 , c) x7 for x ∈ R+ 0,
8
p
b) 4 , d) x8 y −2 for x, y ∈ N.
1.2 In a cinema, there are S seats. A ticket for an adult to see a movie costs
€T, a child pays 30% less than an adult. On a particular night, A adults
and C children saw a movie. Interpret the following quantities:
a) S − A − C
b) 0.7T
c) T · S
d) A · T + 0.7C · T
1.3 A person has y euros available to spend on three kinds of fruit, namely
apples, bananas and cherries. She decides to spend y/4 euros on each kind of
fruit and save the rest for later. The prices per kg of fruits are 2.5 for apples,
1.6 for bananas and 5 for cherries.
a) What is the total weight of fruits she buys?
b) How much does she pay per kg of fruits (combined)?
c) If instead she wants to buy equal quantities of each fruit by spending
the y euros entirely, how much does she buy of each fruit?
f) \frac{4}{x + \sqrt{x^2 + x}} − \frac{1}{x − \sqrt{x^2 + x}} = \frac{3}{x},
g) |x + 6| = 9,
h) |4 − x| − |2x + 3| = 7,
i) \sqrt{1 + x} + \frac{ax}{\sqrt{1 + x}} = 0 with parameter a,
j) \frac{2x − a}{x − 8} = \frac{x − a}{x − 1} with parameter a.
1.5 A firm manufactures a commodity that costs €20 per unit to produce.
In addition, the firm has fixed costs of €2000. Each unit is sold for €75. How
many units must be sold if the firm is to meet a profit target of €14500?
1.6 A producer faces the following demand: P = 100 − 2Q, where P stands
for the price of a certain product and Q for the quantity of products sold.
At what price is the total revenue TR = P · Q equal to zero?
1.8 Find the fraction for which the following properties hold:
the denominator is larger than the numerator by 3;
if we add 1 to the numerator and 2.5 to the denominator, we don’t
change the value of the fraction.
1.9 Factorize the following expressions, i.e. write them in the form
a(x − x_1)(x − x_2):
1.10 Simplify and find the values of x, y for which the expressions are not
well defined:
a) \frac{1}{2}(2x − 8) + \frac{3}{4}(12 − 4x) − \frac{1}{5}(5x − 10)
b) \frac{x − 1}{x + 2}(x^2 − 4) + \frac{x + 1}{x − 2}(x^2 − 4x + 4)
c) \frac{4x + 16}{x^2 + 8x + 16}(2x + 8) − \frac{x^2 − 25}{2x + 10} ÷ (x − 5)
d) \frac{x^2 − 7}{x + 7} · \frac{x^2 − 49}{7 − x} · \frac{x − 2}{2x^2 − 98}
e) \frac{3x − 5}{25 − 9x^2} ÷ \frac{5 − 3y}{9y^2 − 30y + 25} ÷ \frac{5x − 3}{6x + 10}
f) \frac{1}{1 + x^{p−q}} + \frac{1}{1 + x^{q−p}}
1.11 Calculate without a calculator: a) \sqrt[3]{27}, b) (1/32)^{1/5}, c) 0.0001^{0.25}, d) log_2 64,
e) ln 1, f) log_4 \frac{1}{2}.
1.12 A bank offers a savings account paying 4% interest at the end of each
year.
a) Person A deposits €1000 at the beginning of a year. For 15 years, the
person does not make any deposits or withdrawals in the account and
the interest does not change. How much is in the account after 15 years?
b) Person B wants to deposit money for 10 years and have €5000 in the
account at the end of this period (no deposits or withdrawals will be
made during that time). How much does the person have to deposit?
1.14 We know that the statement ’If Celia is sick, she doesn’t go to school.’
is true. Celia did not come to school. Decide the truth value of the statement
’Celia is sick.’
1.15 Fill the gaps with one of the following: ’at least’, ’exactly’, ’at most’ to
create a true statement.
a) Each prime number has ... two different divisors.
b) Two different lines in a plane have ... one common point.
c) The inequality x^2 > 5 is satisfied by ... three natural numbers.
1.18
a) Let a_k = (k − 1)^2 for k ∈ Z. Determine a_3 and \sum_{k=−1}^{2} a_k.
b) Let a_k = (−1)^k (k + 1) for k ∈ Z. Determine a_3 and \sum_{k=−1}^{2} a_k.
1.19 Evaluate:
a) \sum_{j=3}^{6} (2j − 5)
b) \sum_{k=−5}^{5} \frac{k^3}{3}
c) \sum_{i=−2}^{2} \frac{(4i + 2)i}{i + 3}
Chapter 2
Getting started with R
R is an open source program. That is, it is free and its capabilities are
quickly growing thanks to the large R community that works on providing
new functions for specific tasks in what are called packages (we will get to
know some packages soon enough, though in QM I, we will mostly work with
the basic R without packages). Nevertheless, before a package is formally
made available in R, it has to go through a round of checks by the R core
team which ensures that high quality is maintained.
2.1 Download, installation and basic view
Before we start working with R, you need to download it. The best way to do
so is from the official webpage of the R project. Then install R following the
instructions. Usually the default options are a good choice.
After installing R, one can start working with it. However, for a more
comfortable user experience, we suggest you also download and install Rstudio.
We will also use Rstudio in our practical sessions. Note that the order is
important here: You should always start with installing R and only then
proceed to installing Rstudio.
Let us briefly have a look at the user interface when first starting Rstudio in
the screenshot in Figure 2.1.
In the view when first opening Rstudio, the environment is split into three parts.
The large part on the left is called the console. In the console, we write and
run commands, and the outputs, if there are any, are shown here. After
starting Rstudio, you will see some information about your current version
of R and about some ways to obtain more information here. At the
bottom, there is the prompt sign > that we will discuss in more detail in
the next section.
In the right part, we again have two windows, both of them with several
tabs. The most important tab of the upper window is Environment. In this
window, we can see the list of all user-defined objects currently stored in the
environment, also called the workspace. This includes for instance all values
assigned to variables as well as the user defined functions.
In the lower window, the most useful tabs are Files, Plots, Packages and
Help. We will mention each of these tabs at appropriate places in the further
text.
Once you get more familiar with R, you will realize that you might find a
different layout more comfortable, or even that some of the provided tabs
are not useful for you while others that you would like to have available are
not included. The good news is that you can customize the layout of your
Rstudio: To this end, choose Tools from the bar at the top of your Rstudio,
go to Global options and finally Pane layout. Here you can decide about the
position of the four basic windows, as well as what tabs should be included
in the two windows that in the basic view are on the right. We will refer to
the windows the way they are ordered in the basic view, i.e. the files in upper
left, console in lower left, environment etc. in upper right and help etc. in
lower right.
In the rest of this chapter, we will discuss how the user can communicate with
R and define variables, how to work with vectors, and how to understand
R’s ’replies’. In later chapters, we will focus on further tasks, like defining,
evaluating and plotting functions, checking whether particular conditions are
satisfied and defining actions based on the result of this check, or using loops
that allow us to repeat a particular task as often as necessary in an automated
way. In the following, whenever mentioning a function or its arguments in
the text, we will use the computer font that makes them stand out.
2.2 R as a calculator
Though its power lies elsewhere, for the basic understanding of R, its syntax
and how to interact with it, it is good to have a look at the most basic way
of using it, namely as a calculator (in some situations, it is even easier to use
and possibly more precise than a calculator).
As already mentioned above, once you start R, you will see the prompt
sign > in the console. That means that R is ready to receive and carry
out commands. If you enter a (correct) line of code and hit enter, the
command you entered will be run; if there is an output, it will be
shown (printed), and the prompt will show up in the next line. Should you
instead see a plus sign, the code you entered in the previous
line was incomplete and R expects more (for instance a closing bracket).
## [1] 42.14
1/100000000
## [1] 1e-08
2.34e-02
## [1] 0.0234
13/11 # a cutoff after 6 digits (per default)
## [1] 1.181818
At this point, we would like to make a few remarks. Let us start by commenting
on the way code and its output look. We create these lecture notes with the
help of a very useful package called knitr that allows for creation of nice
documents with integrated formatted code. The colorful lines are the code
we enter in the console or run from a file. The lines below, started with two
hashtags, show the output of each corresponding line of code. To run a line
of code from a file (open in the upper left window of your Rstudio), just place
the cursor anywhere in the corresponding line and use the key combination
Ctrl+Enter, or on some devices it might be Ctrl+R.
Now let us turn our attention to the code itself. As should be clear from the
first line of code, addition and multiplication work the usual way with the
signs + and *. Moreover, to raise a number to a certain power, one uses the
sign ^. The second line of code shows division using the slash (/) and looking
at the output of this line, we see that R makes use of the scientific notation
if the given number is very small (or large). The third line shows that not
only can the output be shown in scientific notation, R also understands if it
is given a number this way and can ’translate’ it.
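The following small sketch illustrates that both ways of writing a number refer to the same value, using the Earth-Sun distance from Chapter 1. The base R function format with its scientific argument is used here to switch between the two representations; it returns a character string rather than a number.

```r
1.496e8 == 149600000                   # TRUE: two spellings of the same number
format(149600000, scientific = TRUE)   # the number as text in scientific notation
format(1e-08, scientific = FALSE)      # the number as text with all digits written out
```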
In the fourth line of code, we are interested in the result of dividing 13 by 11.
This number is rational, but its decimal expansion is periodic, which means
that in the decimal form, there are infinitely many digits after the decimal
point. However, R
does not show all of them. Note that we are also given this information in the
line of code in question. However, it seems that R did not do anything about
this part of the line. This is the case because the words ’a cutoff after 6 digits
(per default)’ are entered after the hashtag sign. In R, the hashtag starts
a comment: Anything that comes after this sign in a line will be ignored.
It is good practice to use comments, in particular when working on larger
projects or on things that you might need to come back to at a later point (one
can forget surprisingly quickly what one was thinking while writing the code
and why the variables were named in the seemingly illogical way they were).
The number of digits that are shown of a number can be changed in several
ways. One of them, particularly useful if one only wants to change this
setting for one single value (we will touch upon a more long-term solution at
a later point), is to use the print function: We provide it with the number
to show and set a parameter called digits to the number of desired digits
to be shown.
## [1] 1.18181818
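A call consistent with the output shown above would be the first line of the following sketch; the value 9 for digits is inferred from the nine significant digits in that output and is therefore an assumption. The other two lines show the related base R functions signif and round, which change the value itself rather than only its display.

```r
print(13/11, digits = 9)   # show nine significant digits
signif(13/11, 3)           # round to 3 significant digits
round(13/11, 2)            # round to 2 decimal places
```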
The order of operations is the one that we know from basic algebraic rules:
taking powers comes before multiplication and division, and these operations
come before addition and subtraction. Therefore we need to use brackets if a
different order of operations is desired, as is illustrated by the following lines:
5*2 + 4/2^2
## [1] 11
5*(2 + (4/2)^2)
## [1] 30
Note that since power comes before division, 4/2^2 corresponds to \frac{4}{2^2}. But
adding brackets where they are not necessary, for instance writing 4/(2^2)
instead of 4/2^2, does not change the outcome. Therefore, if you are
unsure about the order of operations, you may use brackets even when they
are not necessary. Besides making sure that R
does what you want it to, in some situations this also makes the code easier to
read.
Of course, R can perform more than just the most basic tasks of addition,
subtraction, multiplication and division. For finding the square root (by
far the most used root) of a number, there is a specific command sqrt.
However, recall that taking the square root of a number is equivalent to
raising it to the power one half, so it should not come as a surprise that
the following two lines of code result in the same output:
sqrt(2)
## [1] 1.414214
2^0.5
## [1] 1.414214
However, for all the other roots there are no specific functions. To take the
n-th root of a number in R, just raise it to the power of 1/n:
27^(1/3)
## [1] 3
(1/32)^(1/5)
## [1] 0.5
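One caveat: for a negative number, raising to the power 1/n returns NaN in R, even when n is odd and the real root exists (recall the remark on rule (1.17) in Chapter 1). The following sketch works around this; the helper name nth_root is hypothetical and not part of base R.

```r
# Hypothetical helper: real n-th root, also for negative x when n is odd
nth_root <- function(x, n) {
  if (x < 0 && n %% 2 == 1) return(-(-x)^(1/n))
  x^(1/n)          # still NaN for negative x and even n
}

(-8)^(1/3)         # NaN
nth_root(-8, 3)    # the real cube root of -8
nth_root(27, 3)    # same as 27^(1/3)
```

Note the brackets in (-8)^(1/3): without them, the power would be evaluated before the minus sign.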
As with most other calculators, R does not need the leading zero for decimal
numbers smaller than 1 (in absolute value): 0.25 and .25 are considered the
same.
## [1] 0.1
Finally, there are some numbers in mathematics that are so important and
often used that they have earned a specific name under which they are
known. The two most well known of them are π and e. R knows both
pi
## [1] 3.141593
exp(1)
## [1] 2.718282
abs(-5)
## [1] 5
The functions exp(x) and abs(x) take only one argument, indicated in the
brackets as x. That is the value at which the exponential function should
be evaluated for exp, or whose absolute value should be returned for abs.
Closely associated with the exp function is the function log(x, base =
exp(1)) that takes two arguments. The first argument, x, is clearly the
value at which the logarithm is evaluated. The second argument called base
defines the base to which the logarithm should be taken. The notation base
= exp(1) in the function signature above tells us that the base argument
actually does not have to be provided, since it has a default value assigned,
and this default value is e. That means that simply entering log(x) will
compute the natural logarithm of x in R. To compute a different logarithm,
say decadic, base has to be provided and set accordingly. Let us now
illustrate the use of these functions; in the last line, we effectively verify
that log computes the natural logarithm.
exp(5)
## [1] 148.4132
log(10)
## [1] 2.302585
log(10, 10)
## [1] 1
log(exp(5))
## [1] 5
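As a complement to the lines above, the following sketch shows the base argument given by name, together with the base R shortcuts log10 and log2 for the two most common other bases.

```r
log(100, base = 10)   # decadic logarithm via the base argument
log10(100)            # shortcut for base 10
log2(64)              # shortcut for base 2
log(exp(1))           # natural logarithm of e
```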
chunk of code:
choose(5,2)
## [1] 10
choose(2,5)
## [1] 0
choose(k = 2, n = 5)
## [1] 10
For such simple functions as those that we got to know so far, providing
the argument names is not necessary. Later on, however, we will encounter
functions that take far more (optional) arguments and this aspect of R’s
workings comes in handy if we only want to set a parameter that is far away
in the list of arguments the function takes.
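The same behaviour can be seen with log, whose arguments are called x and base. This sketch contrasts passing arguments by position and by name.

```r
log(8, 2)              # by position: x = 8, base = 2
log(base = 2, x = 8)   # by name: the order no longer matters
log(2, 8)              # positions swapped: x = 2, base = 8, a different result
```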
2.3 Help
To find out what arguments a particular function takes or even what it does,
we can consult the Help. One option is to search for the function in question
in the search bar of the Help tab in the lower right window. Alternatively,
one can call up the help page of a function directly from the console by
typing a question mark followed by the function’s name, for instance:
?log
Run the above line of code to see what the help page looks like. It will
appear in the lower right window. Often a function is part of a larger family
of functions that are connected to each other in various ways. In that case,
the help page that appears for a particular function contains information not
only about that particular function, but about all other functions of that
family, too. For instance in the case of log, we are told about the basic
logarithm function, its special forms for bases 2 and 10 and for computing
log(1 + x), as well as about the exponential function and a variation of it. In
the Description part, we see the list of functions provided in the help page
and a short description of their workings.
In the part Usage, the full list of functions is given again, this time including
the list of their arguments in brackets. As already touched upon above, an
argument that is listed on a stand-alone basis is a mandatory argument, like
x in the case of log. If the name of an argument is followed by an equality
sign and a value, this value is the default value for this argument which
will be used if you do not provide a different value. Sometimes you will
see ... at the end of the argument list, which means that there are further
optional arguments, usually ones that are common to many functions that
perform a certain type of task. For instance, there is a whole set of graphic
arguments that can be provided to (almost) any plotting function – we will
discuss these when we learn about plotting.
Section Arguments provides more details about what arguments enter the
functions, including what type of variable they should be or what role they
play in the underlying computations.
This is followed by the Details section that, as the name suggests, provides
details about the workings of the functions. Sometimes formulae used in the
computation or some technical details may be given.
In the part Value we are given information about what the outcome of
running the function is. Of course, for the logarithm or exponential function
we have a very good idea about the outcome. However, there are also far
more complicated functions for which this section makes a lot of sense, and
the technical details of what type of object from R’s point of view the output
is might also be relevant for further computations.
There are several other sections in the help page of most functions. For now,
let us only mention the section usually located at the very bottom called
Examples that shows examples of the correct usage of the functions.
If you are not sure about a function’s name or you generally want to look
for information on a certain object or operation that is not a function, the
single question mark might not work. However, by doubling the question
mark and using quotation marks around the expression to be looked for, you
trigger a keyword search, i.e. R will not look for the help page associated
with the expression entered, but will instead search through all help pages. For
example, we may look for R’s inequality sign != to get more information
about comparison methods:
?? '!='
Again, please enter into your console and run the above line of code to see the
outcome. You will be offered a list of help pages that contain the expression
you looked for. In this case, we were interested in comparison methods, so
upon looking at the brief description of the pages in the right column, we
choose the page on relational operators. We then learn for instance that the
’at most’ and ’at least’ operators translate to <= and >=, respectively, and to
find out whether two values are the same, we should use ==.
We will learn more about comparing values and about the outcome of such
comparisons later on. But before doing so, we turn to assigning and storing
values for later use.
r <- 4
13/11 -> somenumber
2.4. STORING VALUES 39
If you execute these lines of code, you will see two objects r and somenumber
in the upper right window in the environment tab. You can also let R print
the value of a particular variable in the console by typing only its name. If
you use a computation to assign a value, putting simple brackets around the
assignment will both assign the value and show it in the console.
r
## [1] 4
(r <- 1.2*r)
## [1] 4.8
As we see from the second output, we can also assign values iteratively in the
sense that we can use the old value of a variable to define its new value that
will simply be overwritten. At this point, we would like to point out that
unlike with written formulae, where the multiplication sign can be dispensed
with when working with variables (e.g. 3x and 3 · x are the same), in R it is
absolutely necessary to use the star whenever multiplication is intended. In
the above code, for example, writing 1.2r instead of 1.2*r would result in
an error (feel free to try it out).
Theoretically, the assignment can be done also with the equality sign:
a = 5
Though this will generally work, it is not the recommended way to do things.
One of the reasons is that the equality sign is meant for setting the values of
function arguments and it is good to have the differentiation between these
two types of code. A more severe reason is the fact that there are situations
in which it matters and if you use = instead of <- in these situations, your
code will either not work the way you would like it to, or it won’t work at all.
Even though these situations will not arise in this course, we will consistently
use the ’gets’ operator (usually in its typical way with the arrow pointing to
the left) and we recommend you get used to this way from the start, too.
After all, who knows, maybe some day you will use R at a level where it will
matter :)
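One such situation can be sketched as follows. Inside a function call, = only sets the argument of that name, whereas <- performs an assignment as a side effect. The variable name w is made up for this illustration.

```r
# Inside a function call, '=' sets the argument named x; nothing is stored.
mean(x = c(2, 4, 6))
# With '<-', the vector is first stored in the variable w and then passed on.
mean(w <- c(2, 4, 6))
# w now exists in the workspace.
w
```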
2.5 Vectors
R is a vectorized language which means that the basic objects it works with
are vectors – you can think of them as lists or sequences of values – and the
basic operations can be performed on them elementwise. In fact, even the
few variables that we have assigned previously are strictly speaking vectors
of length 1.
The most basic way of creating a vector is the function c (stands for combine),
in which you provide the list of values you want to store – as single values,
but you can also combine several vectors into one this way. Let us define our
first vector:
r <- c(2,3,4,5)
(r2 <- 1:4)
## [1] 1 2 3 4
10:(2*2+1)
## [1] 10 9 8 7 6 5
10:4*2+1
## [1] 21 19 17 15 13 11 9
Now that we have two vectors, we can investigate what happens if we perform
basic operations on them:
r + r2
## [1] 3 5 7 9
r*r2
## [1] 2 6 12 20
r - r2
## [1] 1 1 1 1
r/r2
## [1] 2.000000 1.500000 1.333333 1.250000
r + 5
## [1] 7 8 9 10
r2*2
## [1] 2 4 6 8
r^2
## [1] 4 9 16 25
In the first four outputs we can see that the operations are performed elementwise,
which means that for instance for addition, the first entry of the first vector
is added to the first entry of the second vector, the second entries are added
to each other, etc. – and similarly with the other operations. Turning our
attention to the next output, we see that by adding a single number to a
vector, we increase each single entry by this number. Multiplying a vector
by a constant leads to multiplying each element by this constant, and the
square of a vector is a vector containing the squares of each individual entry.
r3 <- 1:3
r + r3
## Warning in r + r3: longer object length is not a multiple of shorter object
## length
## [1] 3 5 7 6
We see a warning saying that the length of the longer vector is not a multiple
of the length of the shorter one. Nevertheless, something has happened. If
we recall the values in r and r3, we can reconstruct what R did: It did the
elementwise operation as far as was possible, and then started recycling the
shorter vector to fill it up to the length of the longer one. Effectively, r3
became (1 2 3 1) for the purposes of this operation. Because this recycling
had to be done but was not finished (values 2 and 3 were only used once
while 1 was recycled and used twice), R was not entirely sure this is what
the user wanted to do and showed a warning. Interestingly, if the length of
the two vectors is not the same, but one of the lengths is a multiple of the
other one, there will be no warning:
r4 <- 3:10
r2 + r4
## [1] 4 6 8 10 8 10 12 14
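The recycling rule can also be checked by hand. In the following sketch, the vectors short_vec and long_vec are hypothetical examples; the shorter one is written out in its recycled form and the two results are compared.

```r
short_vec <- c(10, 20)        # hypothetical example vectors
long_vec <- c(1, 2, 3, 4)
# short_vec is recycled to (10 20 10 20) before the elementwise addition.
recycled_sum <- long_vec + short_vec
# Writing out the recycled vector by hand gives the same result.
manual_sum <- long_vec + c(10, 20, 10, 20)
identical(recycled_sum, manual_sum)
```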
As already mentioned, the c function can be used to combine not only single
values, but also to create one vector from several, for instance:
c(r, r2, 85, r3)
## [1] 2 3 4 5 1 2 3 4 85 1 2 3
min(r)
## [1] 2
max(r)
## [1] 5
sum(r)
## [1] 14
length(r)
## [1] 4
mean(r)
## [1] 3.5
## [1] 10 8 6 4 2
seq(-5, 6, by = 3)
## [1] -5 -2 1 4
## [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
As can be seen in the above code, often one would provide the parameters
from and to without the names, whereas by or length.out would be specified
with the names. This is also because the first two are provided always,
whereas the other two are complementary to each other and one would
explicitly show which one is used for the sequence specification by providing
the parameter name.
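To see how by and length.out complement each other, consider the following sketch. The end points 0 and 1 and the variable names are chosen for illustration only.

```r
# Fix the step width and let R work out how many values are needed.
by_step <- seq(0, 1, by = 0.25)
# Fix the number of values and let R work out the step width.
by_length <- seq(0, 1, length.out = 5)
# Both specifications describe the same sequence.
all.equal(by_step, by_length)
```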
## [1] 2 3 4 5
rep(r, 2)
## [1] 2 3 4 5 2 3 4 5
rep(r, times = 2)
## [1] 2 3 4 5 2 3 4 5
rep(r, each = 2)
## [1] 2 2 3 3 4 4 5 5
## [1] 2 3 3 4 4 4 5 5 5 5
r[4]
## [1] 5
r[c(1, 3)]
## [1] 2 4
r[-4]
## [1] 2 3 4
r[-c(1, 3)]
## [1] 3 5
It is much more usual that one is interested in the values in a vector that
satisfy certain conditions rather than say the third entry in particular. In
the following, we will discuss the most basic such conditions one would like
to check, namely comparison of values for equality or inequality.
As mentioned above, to check whether two values are equal, one should use
a double equality sign. To see whether they are not equal, the comparison
operator != should be used. One can also check whether one value is smaller
than (<), at most (<=), greater than (>) or at least (>=) another value. The
outcome of such a comparison is what we call a logical – the truth value of a
statement. It can therefore be TRUE or FALSE. If several values are compared
at the same time, we obtain a vector of logicals as the output.
2 == 3
## [1] FALSE
2 < 3
## [1] TRUE
1 != 1
## [1] FALSE
r < 4
## [1]  TRUE  TRUE FALSE FALSE
r < r2
## [1] FALSE FALSE FALSE FALSE
r > 5:2
## [1] FALSE FALSE  TRUE  TRUE
The simple comparisons can be combined by the logical operators and and
or from Chapter 1. The sign for the logical and in R is the ampersand &
and the outcome of such a combination will be TRUE if both parts are TRUE
(and FALSE otherwise). The logical or is in R denoted by | and the outcome
is TRUE if at least one of the two conditions is TRUE (and FALSE if both
are FALSE). Both of these work elementwise in a vector, just like the simple
comparisons:
r < 3 | r > 3
## [1]  TRUE FALSE  TRUE  TRUE
Remark. In some programming languages, ’and’ and ’or’ are written with
double signs, i.e. && and ||. In R these operators also exist, but they are
meant for single values rather than vectors. If the conditions on
either side of these operators are vectors of length greater than one, we get
an error (in R 4.3.0 and later):
r < 3 || r > 3
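For single values, on the other hand, the double operators work as expected. A minimal sketch, in which the names price and prices are made up for illustration:

```r
price <- 7                      # hypothetical single value
# Both sides have length one, so the double operator is appropriate.
in_range <- price > 0 && price < 10
in_range
# For vectors, the single operator compares entry by entry.
prices <- c(-1, 7, 12)
in_range_each <- prices > 0 & prices < 10
in_range_each
```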
Now that we know about logicals, we can, as alluded to above, index vectors
based on comparisons. A vector can be indexed not only by directly providing
the indexes that should be considered. Another way is by providing a logical
vector of the same length as the vector to be indexed. Only the values that
correspond to TRUE in the logical vector will be displayed. To illustrate this
in more detail, in the first case we will first show the corresponding logical
vector and only then the indexing. In the latter examples, we will do the
indexing without first printing the logical used, but for a quick check, you
can of course execute the conditions in the square brackets to see their truth
values and compare with the outcome.
2.6. MORE ON LOGICALS 49
r < 4
## [1]  TRUE  TRUE FALSE FALSE
r[r < 4]
## [1] 2 3
r[r != 4]
## [1] 2 3 5
## [1] 3
Let us now have a more detailed look at the logicals. While these are
a special type of values, they can easily be treated as numbers. If an
operation intended for numbers is applied to a logical, R does an internal
translation by assigning 1 to TRUE and 0 to FALSE. It is therefore possible to,
for instance, multiply a vector of logicals by a number or a vector. However,
the particularly useful consequence of this behavior is the possibility to use the
sum function: this makes it very easy to find out how many entries of a vector
satisfy a particular condition:
r > 2
## [1] FALSE  TRUE  TRUE  TRUE
(r > 2)*1
## [1] 0 1 1 1
sum(r > 2)
## [1] 3
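The same idea works for combined conditions. The following sketch uses a hypothetical vector scores with the same values as r and counts the entries strictly between 2 and 5 in two ways.

```r
scores <- c(2, 3, 4, 5)        # hypothetical vector with the same values as r
# TRUE counts as 1 and FALSE as 0, so sum() counts the matching entries.
n_between <- sum(scores > 2 & scores < 5)
# The same count obtained by indexing first and measuring the length.
n_between_check <- length(scores[scores > 2 & scores < 5])
n_between
```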
Finally, sometimes one might be interested to find out the entries of a vector
for which a particular condition is not satisfied. Instead of changing the
condition to its negation, one can make use of the exclamation mark – we
can observe already in the operator != that it stands for not.
!TRUE
## [1] FALSE
!FALSE
## [1] TRUE
!(r>2)
## [1]  TRUE FALSE FALSE FALSE
r[!(r>2)]
## [1] 2
As we already hinted at, R is much more than just a smart calculator. The
language is essentially a large set of rules and in this short section, we will
introduce some interesting corner cases.
2.7. CORNER CASES 51
We will start by introducing NaN. NaN stands for not a number and it is what
you obtain if you try to perform a ’prohibited action’, i.e. if the operation at
hand is not well defined. An example of such a situation is taking the square
root of a negative number:
sqrt(-40)
## [1] NaN
1/0
## [1] Inf
Inf stands for infinity and according to the rules of R, division by 0 leads to
this result. The underlying reason can loosely be explained by the fact that
$\lim_{x \to 0} \frac{1}{x} = \infty$. There is also a value for negative infinity, -Inf.
Finally, when dealing with data (wait for Quantitative Methods II or Business
Analytics for more information), sometimes there are missing values in your
measurements. R denotes these values by NA (not available). Rather than
a value in itself, NA is a piece of information that the particular value is unknown.
Therefore the result of an arithmetic operation with it will be NA again: If you don’t
know what a particular value is, you cannot know what the result of adding
5 to it or multiplying it by 3 is.
NA + 5
## [1] NA
NA*3
## [1] NA
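The same logic carries over to vectors. In the sketch below, the vector measurements is made up, and is.na is a base R function that has not been introduced in these notes; it marks the missing entries.

```r
measurements <- c(4.2, NA, 5.1)   # hypothetical data with one missing value
# One unknown entry makes the total unknown as well.
total <- sum(measurements)
total
# is.na() returns TRUE exactly for the missing entries.
missing_entries <- is.na(measurements)
missing_entries
```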
We will close this section by a handful of helpful functions for the management
of your workspace and a summary of some useful advice. Some of the points
mentioned below were already discussed before, but we collect them here
along with others in a compact way.
Although R can be used as a very smart calculator, one would typically use
it to work on larger projects including large numbers of functions, variables
and data. When working on such projects, name your variables and functions
in a smart way to avoid confusion and to make it easier for others and your
future self to understand your code. Try to avoid names that are already
used in R for functions and constants, such as c or pi. If you do use
these names, it might, at the very least, lead to confusion, or it might
even cause your code to not work properly.
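A minimal sketch of what can go wrong when a built-in name is reused; the value 3 is chosen only for illustration, and rm is the removal function described in the next paragraph.

```r
pi <- 3            # our variable now hides the built-in constant
circumference_wrong <- 2 * pi
rm(pi)             # removing our variable makes the built-in constant visible again
circumference_right <- 2 * pi
circumference_wrong
circumference_right
```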
If you have too many variables in your environment, you might lose track of them.
To obtain a list of all variables in your environment, you can use the function
ls. With the function rm, one may remove a certain object, like a variable,
(or several objects) from the environment, or even all variables by executing
rm(list = ls()).
ls()
rm(r, r2)
ls()
rm(list = ls())
ls()
## character(0)
2.8. SOME USEFUL ADVICE AND WORKSPACE MANAGEMENT 53
R also allows you to save the objects in your current workspace for later use. To
know where to look for the saved file, you need to know the current working
directory. This can be checked by means of the command getwd(). If
necessary, the working directory can be changed by setwd(directory) or,
in RStudio, by choosing the tab Session in the upper toolbar and choosing
’Set Working Directory’. Here you can choose ’To Source File Location’ if you want
the directory to be the one where the file you are currently working in is
located, or ’Choose Directory’ to manually search for the desired folder.
Now that we know in which directory we are, we can save all objects from
the current workspace with save.image(file) by providing the name of the
file to save the data in, or only some of the variables with save(y, file = file)
by specifying what objects should be saved and in what file. By convention,
such files are given the extension .RData; R does not add it automatically, so
the file is named exactly as specified. To load such a file, just use the function
load(file). In both cases, the file name must be enclosed in quotation marks.
To illustrate the saving and loading of data, we will first create a few variables
to have something to save (remember, we have just cleared the workspace).
After saving them, we will clear the workspace to then reload them and check
whether they are back.
a <- 5
b <- 7:12
r <- 1.08
save(r, file = "only_r")
save.image("our_workspace")
ls()
## [1] "a" "b" "r"
rm(list = ls())
ls()
## character(0)
load("only_r")
ls()
## [1] "r"
load("our_workspace")
ls()
## [1] "a" "b" "r"
Next to smart work with variables, another useful habit that can make your
code more understandable is using comments. In particular for longer scripts
or more complex chunks of code, it is useful to write down short remarks
about what the individual values are and what various functions do. In R, everything
that follows the # sign on a line is treated as a comment and is not executed.
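A commented chunk of code could look as follows. The example is only a sketch; the interest rate, the capital and all variable names are made up.

```r
# Yearly interest rate (hypothetical value) and initial capital in euros.
rate <- 0.08
capital <- 1000
# Capital after 5 years with yearly compounding.
capital_after_5 <- capital * (1 + rate)^5
capital_after_5
```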
The readability of the code is further enhanced by its visual form. You might
have noticed that we have consistently used spaces around the ’gets’ operator,
around the signs + and -, as well as after commas when defining vectors. We
also mentioned the use of brackets in some places where they are not entirely
necessary. While these are small details that do not influence the workings
of your code, they do make it easier on your (and others’) eyes. We have
also used one line of code per command. While it is possible to stack several
commands into one line of code if we separate them with a semicolon, this
strategy usually makes the readability worse.
2.9 Exercises
2.1 Define the following vectors making use of the : operator and the
functions rep and seq.
a) $1, 11, 11^2, \ldots, 11^{10}$
b) 1, 2, 3, 1, 2, 3, ...; 15 times
c) 5, 5, 5, 6, 6, 6, 7, 7, 7, ..., 18, 18, 18
d) −500, −450, −400, −350, ..., −100
e) 55, 53, 51, 49, 47, ..., 33
f) 2, 4, 4, 6, 6, 6, ..., 14, 14, 14, 14, 14, 14, 14
2.10. FURTHER READINGS 55
2.2 Compute the following in one line of code (in a way that is as compact
as possible):
a) $\sum_{i=-25}^{25} i^2$
b) $\sum_{i=-10}^{10} 3^i$
c) $1 + 2 + 4 + \ldots + 1024$
d) $\sum_{i=1}^{100} i(10 - i)$
e) $4 + 7 + 10 + 13 + 16 + \ldots + 259$
2.3 Observe the outcomes of the following lines of code and think about why
they are what they are.
a)1/Inf
b)1/0 + 1/0
c)Inf - Inf
d)NA == NA
e)-5*Inf
The contents of this chapter are discussed in Chapters 1 and 2 of [2]. The
introduction to R, the installation process and an overview of RStudio are
given in Chapter 1. Sections 2.1 and 2.3 discuss R as a calculator and the
use of vectors; at the end of each of these sections, there are also useful
exercises to practice your newly gained knowledge. Similarly, Section 2.7
focuses on logical vectors and includes exercises on this topic. Finally, Section
2.6 summarizes the use of R help.
Chapter 3
1. a domain A,
2. a codomain B,
3. a rule that assigns to any element of the domain A one element of the
codomain B.
58 CHAPTER 3. FUNCTIONS OF ONE VARIABLE
Note that the codomain of a function is not unique. Any set that contains
all possible values of the function can be used as its codomain; the smallest
possible codomain is the function’s range. Often one would use the range as
the codomain, but this is not strictly necessary. In some cases, for instance,
the range is a rather complicated set and therefore one would use a larger
set for the codomain with a more compact notation.
Example 3.1. A rule that assigns to each person in a room their age (in
whole years) is a function. Its domain is the set of all people in the room. A
possible codomain is the set of all natural numbers.
A rule that assigns to each age a person of this age in the Quantitative
methods lecture room is not a function since in a BBE cohort, there are
several people of the same age and therefore one cannot uniquely assign one
person of the given age to each age.
Problem 3.1. The total dollar cost of producing x units of a product is
given by $C(x) = 100x\sqrt{x} + 500$. What is the domain, codomain and range
of this function? What does its graph look like?
Solution. When considering the production of a product, it is usually impossible
to produce non-whole numbers of units. Therefore strictly speaking, the
domain of the above function should be the natural numbers including zero.
As a codomain, one could use the whole set of real numbers, but also the set
of positive numbers (clearly, $C(x) \geq 500$ for any $x \in \mathbb{N}_0$). The range of this
function is then $C(\mathbb{N}_0) = \{100x\sqrt{x} + 500 \mid x \in \mathbb{N}_0\}$.
To obtain the graph of the given function, we will make a small detour to
learn about defining and plotting functions in R.
case of a mathematical function with one numeric input and one numeric
output. Moreover, we also discuss how to plot the graphs of such functions.
Later, we will extend the discussion of this section to more general functions
in various senses, e.g. considering several inputs or outputs or providing
default values for arguments.
f <- function(x) {
return(x^2 + 3)
}
To define a function, we choose its name and assign to this name a particular
object. Using function informs R that whatever will be provided in the
coming chunk of code, provided in curly brackets, is part of the newly defined
function. Before coding the body of the function in curly brackets, we first
provide the argument(s) of the function in normal brackets. In this case, the
function at hand takes one argument, a value x. In the body of the function
we see that provided x, the value x^2+3 is returned: we use the command
return to define the output of the function. We have successfully defined our
first function that, provided a value x, computes the value of $f(x) = x^2 + 3$.
Let us now test this function:
f(0)
## [1] 3
f(-2)
## [1] 7
f(1:4)
## [1] 4 7 12 19
output. The entries of this output correspond to the values of the function
for each entry of the input. This is because, as already discussed in the
previous chapter, the basic mathematical operations, like raising to a power
or addition, are vectorized, too. If we directly executed the line of code
(1:4)^2 + 3, it would work, and provide the same output as we see above.
Therefore providing a vector for x also works the same way for our defined
function.
Note that the above definition is a general one and the structure is necessary
for more complicated functions. However, with a function as simple as the
above, where only one line of code can be used to describe the value of the
function, neither the curly brackets nor the return command are necessary.
This can be easily verified:
f2 <- function(x) x^2 + 3
f2(1:4)
## [1] 4 7 12 19
To define a function with more than one argument, one simply lists all the
arguments in the brackets after the word function. Note that when then
using the function, the arguments must either be provided in the exact order
as in the definition of the function, or using their names.
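For illustration, one possible definition of such a function is sketched here. The rule x^2 + y^3 is an assumption; it was chosen because it is consistent with the outputs of the calls shown in this section.

```r
# Hypothetical two-argument function: the rule x^2 + y^3 is an assumption.
f3 <- function(x, y) {
  return(x^2 + y^3)
}
# Arguments matched by position: x = 2, y = 3.
f3(2, 3)
```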
## [1] 31
f3(x = 2, y = 3)
3.2. DEFINING AND PLOTTING FUNCTIONS IN R 61
## [1] 31
f3(y = 3, x = 2)
## [1] 31
# but not!
f3(3, 2)
## [1] 17
Now that we know how to define functions, let us turn to plotting them. As
you learn more R commands, you might notice that the names of many of
them are very intuitive. That is also the case for plotting: the command
to use is called plot. To plot pairs of points – which a graph of a function
consists of – you need to provide two vectors, the first vector being the x
coordinates of the points, and the second vector being their y coordinates.
x <- seq(-10, 10, by = 0.5)
y <- f(x)
plot(x, y)
[Figure: scatter plot of the points (x, y) for x from −10 to 10.]
In your RStudio, the plot will show up in the lower right corner. In some
cases, you might get the following error:
Error in plot.new() : figure margins too large
This might happen if the lower right window of your RStudio is too small to
reasonably show the plot. Just use your mouse to increase the size of this
window and try to plot again.
Let us now turn our attention to the plot we created. The first thing to
catch our attention is probably the fact that the plot does not really show
the graph of a function as one might expect it, instead only the provided
points are plotted. This can be changed by setting another parameter of the
plot function, namely type. Type "l" corresponds to plotting a line. In
this case, R will simply connect the provided points by a line. Note that R
does not distinguish between single and double quotation marks when dealing
with character variables/values, such that both type = "l" and type = 'l'
would lead to the same outcome (you may try this).
[Figure: line plot of the function for x from −10 to 10.]
Clearly, the closer we choose the values of x to each other, the more exact
the plot. If with the above function we used x <- -10:10 instead of x <-
seq(-10, 10, by = 0.5), the plot would be quite jagged compared to the one
presented above. Some further possible types are "b" or "o" – it remains for
the reader to inspect these.
The plot function belongs to the family of graphic functions and as such
takes many different graphic arguments. Many of them allow you to customize
the graph in various ways. For instance, one can use col to set the color of
the points/line. There are several ways to do this: col can be provided the
colors as characters (any basic color and many others will work), as numbers
(e.g. 1 stands for black, 2 for red, 3 for green and 4 for blue) or in terms of
their RGB specification (which is beyond the scope of this text and will not
be discussed here).
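A minimal sketch of how type and col can be combined, using the function f and the grid of x values defined earlier in this section (they are repeated here so that the chunk can be run on its own):

```r
f <- function(x) x^2 + 3        # the function defined earlier in this section
x <- seq(-10, 10, by = 0.5)
y <- f(x)
# Connect the points by a line and set its color via a color name ...
plot(x, y, type = "l", col = "red")
# ... or via a number that refers to the current color palette.
plot(x, y, type = "l", col = 4)
```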
[Figure: plot of the function for x from −10 to 10 with a customized color.]
Arguments xlim and ylim allow the user to decide what portion of the x-axis
and the y-axis should be plotted. If they are not specified, they are
chosen such that all the provided points are visible (and with small margins
around). However, one can choose to limit the values to a smaller area or,
on the contrary, show a larger area. It is particularly important to think
about these arguments if you plot several functions in one plot whose ranges
(or domains) differ significantly. Each of these two arguments takes a vector,
specifying the minimal and maximal value of the particular axis, as the input.
The parameters xlab and ylab are used to control the names of the axes
which by default are called the same as the vectors provided (compare the
code and the y-axis of the previous plots and the next one). Finally, the
argument main can be used to provide a title for the graph. However, it is
possible to add the title also after plotting the function itself, by the means
of title.
plot(x, f(x), type = "l", xlim = c(-5, 5), main = "Some function")
[Figure: line plot titled "Some function"; x-axis labelled x and restricted to −5 to 5, y-axis labelled f(x).]
plot(x, y, type = 'l', ylim = c(-5, 150), xlab = 'x values', ylab = "y values")
title("Some function")
[Figure: line plot titled "Some function"; x-axis labelled "x values", y-axis labelled "y values" and ranging from −5 to 150.]
Note that as mentioned above, the code works properly no matter whether we
use single or double quotation marks. However, for a ’nice code’, one should
decide on one of these two options and use it consistently within the code.
Now that we know how to define and plot functions in R, we can plot the
function from Problem 3.1.
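One way to do this is sketched below. The function name cost, the variable name units and the upper limit of 100 units are assumptions made for this illustration.

```r
# Cost function from Problem 3.1; named cost to avoid reusing an existing R name.
cost <- function(x) 100 * x * sqrt(x) + 500
# The number of units is a whole number; the upper limit 100 is an assumption.
units <- 0:100
plot(units, cost(units), type = "l", xlab = "x", ylab = "C(x)",
     main = "Cost function")
```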
[Figure: line plot titled "Cost function"; C(x) plotted for x from 0 to 100.]
You might have noticed that whenever we used the function plot, a new plot
was created. If one would like to add another function or just a few points
to an already existing plot, there are other commands that can be used.
Depending on whether only points or a line graph should be added, the basic
commands are points and lines. Both of these work very similarly to plot,
though some arguments of plot, like those for customizing the axes or the
plot title, will not work with them, since these arguments apply to the plot
as a whole, whereas points and lines only add to the already
existing plot. Therefore, as already mentioned above, it might be a good
idea to think about the range of your functions before you start plotting, and
either start with the function that would need a larger area on the y-axis, or
adjust the axis limits accordingly.
To illustrate how one can plot several functions and some extra points in a
single figure, let us plot some more functions.
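A possible sketch of such a figure is given here. The second function g and the two highlighted points are hypothetical; plot creates the figure, while lines and points only add to it.

```r
x <- 0:100
cost <- function(x) 100 * x * sqrt(x) + 500    # cost function from Problem 3.1
g <- function(x) 800 * x                        # hypothetical linear function
# plot() creates the figure; the y-axis limits are chosen to fit both functions.
plot(x, cost(x), type = "l", ylim = c(0, 1.1e5), ylab = "y",
     main = "Some functions")
# lines() and points() only add to the existing figure.
lines(x, g(x), col = "blue")
points(c(20, 60), g(c(20, 60)), col = "red")
```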
[Figure: plot titled "Some functions" showing several functions and extra points; x from 0 to 100, y-axis labelled y.]
Note that we have split the plotting command in two lines. R does not
necessarily interpret the end of a line as the end of a command. On the contrary,
if the command has not been finished in a line, for instance because brackets were
not closed, R will continue reading in the next line until it finds the end of
the command – e.g. the closing bracket of a function. In particular when using
functions with many arguments, splitting commands into several lines can
be beneficial to make the code more readable.
Now that we know what a function is, we will study some special types of
functions and their applications in the following sections.
The simplest and at the same time very important class of functions are
linear functions.
Definition 3.1. A linear function is a function f : R → R of the form
f (x) = ax + b
with a, b ∈ R. The two parameters a, b of a linear function are called the
slope and intercept of the function, respectively.
The graph of a linear function is a line, and the names of the parameters
allude also to their interpretation:
The intercept b gives the intersection of the function with the y-axis.
That is, it is the value of the function at x = 0: f (0) = b.
The slope a describes how much the function value y = f (x) changes
with a unit change in x. That is, if x increases by 1, f (x) changes
by a. If a > 0, the function is increasing (we will have a closer look
at increasing and decreasing functions at a later point), that is, if x
increases, so does f (x). For a < 0, we obtain a decreasing function for
which f (x) decreases with increasing x. Finally, for a = 0 the function
is constant.
By comparing the values of the function for two values of the argument x,
we obtain the following equality that holds for any linear function:
$$\frac{f(x_2) - f(x_1)}{x_2 - x_1} = a \qquad (3.1)$$
(it is a simple exercise to show that this holds by plugging in the functional
values $f(x_1)$ and $f(x_2)$). This goes hand in hand with the interpretation
of a slope: If a unit change in x results in a change of a in f(x), then f(x)
has to change by $a(x_2 - x_1)$ when changing x from $x_1$ to $x_2$.
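Equation (3.1) can also be checked numerically. In the sketch below, the linear function lin and the two points x1 and x2 are hypothetical; any two distinct points would give the same slope.

```r
lin <- function(x) 2.5 * x - 4     # hypothetical linear function: a = 2.5, b = -4
x1 <- 3
x2 <- 11
# The difference quotient from (3.1) recovers the slope a.
slope <- (lin(x2) - lin(x1)) / (x2 - x1)
slope
```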
Since a line is uniquely determined by two points, the form of a linear function
can uniquely be determined if the values of the function for two points are
given. We will illustrate this in the following problem, which also shows an
application of linear functions. In particular, in simple economic models the
supply and demand for a product in a market are often modelled by a linear
function.
3.3. LINEAR FUNCTIONS 69
Problem 3.2. Suppose demand D is a linear function of its price per unit
P. When price is €10, demand is 300 units, and when price is €15, demand
is 250 units. Find the demand function.
To obtain the value of b, we can just plug in into one of the equations:
−10 · 10 + b = 300, leading to b = 400.
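The same computation can be sketched in R. The names p, d, a, b and demand are chosen for this illustration; the slope is obtained from equation (3.1) and the intercept by plugging in the first point.

```r
# Two known points of the demand function: prices and corresponding demand.
p <- c(10, 15)
d <- c(300, 250)
# Slope via equation (3.1), intercept by plugging in the first point.
a <- (d[2] - d[1]) / (p[2] - p[1])
b <- d[1] - a * p[1]
demand <- function(P) a * P + b
# Check the two given points and evaluate the demand at a price of 20.
demand(c(10, 15, 20))
```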
The equilibrium price is thus P ∗ = 20, with both demand and supply being
200 units at this price. For the plot, we use R:
[Figure: plot titled "Market equilibrium"; demand and supply against price, price from 0 to 40.]
Note that we have only used two possible prices. This has again got to do
with the fact that a linear function is determined by just two points.
To plot a linear function in R, one can simply define the function and use the
plotting function as we did in Problem 3.3. However, since linear functions
are on the one hand very simple and on the other hand, they play a very
important role in mathematics, there is also a function abline that allows us
to plot linear functions (and lines in general) in a more direct way. However,
this function, similarly to lines and points, only adds a line to a plot; it
cannot be used to create a new plot. There are several ways in which this
command can be used:
Let us demonstrate this on the above example with the market equilibrium.
Next to the demand and supply functions, we will also plot a horizontal line
at the level of the equilibrium demand and supply, and a vertical one at the
level of the equilibrium price. However, to make the plot easier to read, we
will use dashed lines for the horizontal and the vertical line. To this end,
we will use a graphical parameter that we have not used so far: lty can be
used to control the type of the line being plotted. By default it is equal to
1, which corresponds to a full line. lty = 2 means dashed line. Again, you
can play around with this parameter to see what different values mean.
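A minimal sketch of such a plot follows. The demand function D(P) = 400 − 10P is the one found in Problem 3.2; the supply function S(P) = 10P is an assumption made here purely for illustration (it is chosen to be consistent with the equilibrium price of 20 and the equilibrium quantity of 200 units).

```r
D <- function(P) 400 - 10 * P   # demand from Problem 3.2
S <- function(P) 10 * P         # assumed supply, for illustration only

P <- c(0, 40)                   # two prices suffice for a line
plot(P, D(P), type = "l", col = "blue", ylim = c(0, 400),
     main = "Market equilibrium", xlab = "price",
     ylab = "demand and supply")

# abline(a = intercept, b = slope) adds the line y = a + b * x
abline(a = 0, b = 10, col = "red")

# abline(h = ...) and abline(v = ...) add horizontal and vertical lines;
# lty = 2 draws them dashed
abline(h = 200, lty = 2)
abline(v = 20, lty = 2)
```

Note that abline is called only after plot has created the figure, in line with the remark that it cannot start a new plot.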
[Figure: Market equilibrium, with dashed horizontal and vertical lines at the
equilibrium. x-axis: price; y-axis: demand and supply.]
$f(x) = ax^2 + bx + c$
To find the roots of the function, we can use the formula for solving quadratic
equations, which gives f (x) = 0 for $x_1 = -3$ and $x_2 = 1$. From that it follows
that $f(x) = -\frac{1}{2}(x + 3)(x - 1)$. Let us now study the sign of the function. We
know that for $x_1$ and $x_2$, it is equal to 0, so these will be the points where the
sign changes. We can therefore split the real numbers into three intervals:
(−∞, −3), (−3, 1) and (1, ∞). In the first interval, we have (x + 3) < 0 and
(x − 1) < 0, so their product is positive, which after multiplying with
$-\frac{1}{2}$ yields f (x) < 0.
A similar situation arises in the third interval, where both brackets are positive.
On the other hand, between −3 and 1, (x + 3) > 0 and (x − 1) < 0, which
implies f (x) > 0.
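The roots and the sign pattern can be double-checked numerically. The sketch below assumes the coefficients obtained by expanding the factored form −1/2 (x + 3)(x − 1), i.e. a = −0.5, b = −1 and c = 1.5; the variable name c0 is used only to avoid masking R's function c.

```r
# Coefficients obtained by expanding -1/2 * (x + 3) * (x - 1)
a <- -0.5; b <- -1; c0 <- 1.5
f <- function(x) a * x^2 + b * x + c0

# Roots from the quadratic formula
disc  <- b^2 - 4 * a * c0
roots <- sort((-b + c(-1, 1) * sqrt(disc)) / (2 * a))
roots

# Sign of f at one test point inside each of the three intervals
sign(f(c(-4, 0, 2)))
```

Since the sign can only change at a root, one test point per interval is enough.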
At a closer look, one can see that if r ∈ N, a power function is a special kind
of polynomial of degree r, with $a_{r-1} = \dots = a_0 = 0$.
Any power function passes through the point (1, A), which can easily
be verified by plugging 1 into the general formula of a power function.
Example 3.2. Assume that the relationship between the size of houses s (in
m²) and their selling price P (in €) follows approximately $P(s) = 40000 \cdot s^{0.4}$.
In that case, a house of no size, logically, costs 0 €. A tiny house of size 1
m² would cost P (1) = 40000 € and a house with an area of 10 m² would
cost approximately 100475.5 €. To get an idea about this relationship, we
can plot the function in R:
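One possible way to do this is sketched below; the plotting range from 0 to 80 is an assumption based on the axis of the figure.

```r
# Power function from Example 3.2
P <- function(s) 40000 * s^0.4

# The three values mentioned in the text
P(c(0, 1, 10))

# Plot over an assumed range of sizes
s <- seq(0, 80, by = 0.5)
plot(s, P(s), type = "l", main = "House prices",
     xlab = "s", ylab = "P(s)")
```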
[Figure: House prices. x-axis: s; y-axis: P(s).]
The last two classes of functions we will discuss in this section are closely
connected to each other, since they are what we call inverse functions to each
other – see Chapter 4. We will start by introducing exponential functions.
$f(x) = A a^x$
for A ∈ R and a ∈ R+ .
To have a better idea about exponential functions, let us plot the natural
exponential function.
[Figure: Exponential function. y-axis: exp(x).]
The graph will always pass through two points: (0, A) and (1, Aa) (can be
easily verified). The shape will always be similar to the one we observe for
the natural exponential function, but it might be scaled or mirrored around
the x-axis. Whether it opens upwards or downwards is of course governed
by both A and a. If we for a moment consider A > 0, the value of a decides
whether the function increases (for a > 1) or decreases (for a < 1). If A < 0,
it is exactly the other way round. Note that for A ≠ 0 the exponential
function can never reach the value 0, which is why its graph
cannot intersect the x-axis.
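These statements are easy to check numerically. In the sketch below, expfun is a hypothetical helper name and the values of A and a are arbitrary examples.

```r
# General exponential function A * a^x (expfun is a hypothetical name)
expfun <- function(x, A, a) A * a^x

# Passes through (0, A) and (1, A * a)
expfun(c(0, 1), A = 3, a = 2)

# Monotonicity on a grid: positive differences mean increasing
grid <- seq(-3, 3, by = 0.5)
all(diff(expfun(grid, A = 3, a = 2)) > 0)     # A > 0, a > 1: increasing
all(diff(expfun(grid, A = 3, a = 0.5)) < 0)   # A > 0, a < 1: decreasing
all(diff(expfun(grid, A = -3, a = 2)) < 0)    # A < 0, a > 1: decreasing
```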
The rules listed in 1.2.4 apply when working with exponential functions.
In particular, for an exponential function of the form $f(x) = a^x$ for some
a ∈ R+ , for any x, y ∈ R we have the following:
Generally, one can interpret an exponential function of the form $f(x) = A a^x$
as follows:
In some situations, rather than being interested in the value of the exponential
function after some time, one is interested in finding the x for which the
function reaches a particular value. To do so, one takes the logarithm, as
defined in Definition 1.2. The logarithmic function is therefore very closely
related to the exponential function.
Definition 3.6. A logarithmic function is a function f : R+ → R of the form
$f(x) = \log_a x$
with a ∈ R+ , a ≠ 1.
Again, to get a first idea about the general form of logarithmic functions, let
us plot the natural logarithmic function.
[Figure: Logarithmic function. y-axis: log(x).]
The graph of a general logarithmic function will always pass through the
points (1, 0) and (a, 1). Again, the general shape of the logarithmic function
is as observed in the graph of the natural one, but different values of a will
result in different scalings and possibly a mirrored version. For a > 1 we
have an increasing function whose graph opens downwards; for a < 1 we get
a decreasing function with the graph opening upwards.
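A short numerical check of these claims, using the base argument of R's log function; the bases 2 and 0.5 are arbitrary examples.

```r
# A logarithm with base a passes through (1, 0) and (a, 1)
a <- 2
log(c(1, a), base = a)

# Monotonicity on a grid of positive values
x <- seq(0.5, 10, by = 0.5)
all(diff(log(x, base = 2)) > 0)     # a > 1: increasing
all(diff(log(x, base = 0.5)) < 0)   # a < 1: decreasing
```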
Using the rules from Section 1.2.5, we can derive the following properties of
the logarithmic function $f(x) = \log_a(x)$ for x, y ∈ R+ :
The last two properties show clearly how close the exponential and logarithmic
functions are.
Problem 3.5. With the help of the properties of exponential and logarithmic
functions, derive the rule for $\log_a x - \log_a y$.
Solution. To find the rule, we start by writing $-\log_a y$ as
$$-\log_a y = -1 \cdot \log_a y = \log_a y^{-1} = \log_a \frac{1}{y}.$$
Then, using the fact that $\log_a x + \log_a y = \log_a xy$, we can write
$$\log_a x - \log_a y = \log_a x + \log_a \frac{1}{y} = \log_a \frac{x}{y}.$$
Problem 3.6. Simplify the function $f(x) = \exp(\ln x^2 - 2 \ln y)$.
Solution. We again make use of the properties of the logarithmic and exponential
functions (including the one derived in Problem 3.5) to write:
$$f(x) = \exp(\ln x^2 - \ln y^2) = \exp\left(\ln \frac{x^2}{y^2}\right) = \frac{x^2}{y^2}.$$
$$\begin{aligned}
10 \cdot 1.01^t &= 20 \\
1.01^t &= 2 \\
\log 1.01^t &= \log 2 \\
t \log 1.01 &= \log 2 \\
t &= \frac{\log 2}{\log 1.01} \approx 70
\end{aligned}$$
after rounding to a whole number. Similarly, the time in which the investment
quadruples can be computed as $t = \frac{\log 4}{\log 1.01} \approx 139$.
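The two values can be reproduced in R; the variable names t_double and t_quad are chosen here only for illustration.

```r
# Time until an investment growing by 1% per period doubles / quadruples
t_double <- log(2) / log(1.01)
t_quad   <- log(4) / log(1.01)
round(c(t_double, t_quad))

# Check: after t_double periods, the initial amount 10 has grown to 20
10 * 1.01^t_double
```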
Problem 3.8. How long does it take for an amount x to double at a yearly
interest rate of i ∈ {1, 2, 3} %?
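$$\begin{aligned}
x \left(1 + \frac{i}{100}\right)^t &= 2x \\
\left(1 + \frac{i}{100}\right)^t &= 2 \\
t &= \frac{\log 2}{\log\left(1 + \frac{i}{100}\right)}
\end{aligned}$$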
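# compare the rule of 70 with the exact doubling time
i <- 1:20                             # interest rates in percent
rule70 <- 70 / i                      # approximate doubling time
exact_t <- log(2) / log(1 + i / 100)  # exact doubling time
plot(i, rule70, main = "Rule of 70",
     xlab = "Interest", ylab = "Doubling time")
points(i, exact_t, col = "red")       # exact values in red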
[Figure: Rule of 70. x-axis: Interest; y-axis: Doubling time.]
rule70 - exact_t
We have already mentioned default values for some functions in Chapter
2. As an example, we can mention the function log, for which we can find
in help the following: log(x, base = exp(1)). To recall what this means,
we see that the function takes two parameters, x and base. Since x is listed
without further details, this is a mandatory argument that the user has to
provide. On the other hand, base is followed by an equality sign, which means
that this argument has a default value as listed after the equality sign, i.e.
in this case it is exp(1). That means that log by default computes the natural
logarithm. If the argument base is provided, this value will be used as the
base of the logarithm. However, if the user does not provide any value for
this parameter, R will automatically work with the default value exp(1).
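A short check of this behaviour is given below, followed by a user-defined function with default values. The definition of linfun is a sketch that is consistent with how the function is described and used in what follows (it computes a*x + b, with both defaults equal to 0); its exact original form is an assumption.

```r
# base omitted: the default exp(1) is used, i.e. the natural logarithm
log(exp(2))

# base supplied: logarithm with base 2
log(8, base = 2)

# Default values are declared with '=' in the argument list
linfun <- function(x, a = 0, b = 0) {
  a * x + b
}
```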
This function, as we can see in its body, computes the value a*x+b. If the
values a and b are provided, these values will be used, otherwise their default
values 0 will be used. Let us study the results:
# a = 1, b = 2
linfun(1:5, 1, 2)
## [1] 3 4 5 6 7
linfun(1:5, a = 1, b = 2)
## [1] 3 4 5 6 7
# a = 2, b = 1
linfun(1:5, a = 2, b = 1)
## [1] 3 5 7 9 11
# a = 0, b = 0 by default
linfun(1:5)
## [1] 0 0 0 0 0
# a = 0 by default, set b = 1
linfun(1:5, b = 1)
## [1] 1 1 1 1 1
# b = 0 by default, set a = 1
linfun(1:5, a = 1)
## [1] 1 2 3 4 5
3.8 Exercises
3.1 Find the functions from the graphs. Note that all functions are linear,
quadratic, exponential or logarithmic.
[Figure: graphs of the functions f(x), g(x), h(x), i(x), j(x), k(x), l(x) and
m(x), each plotted against x.]
3.3 In the following scenarios, find the demand and supply function (unless
given explicitly), the equilibrium price(s) P* and the equilibrium demand and
supply D(P*) = S(P*). Then write code in which you define the demand
and supply functions D and S and plot a graph showing both functions and
the market equilibrium that satisfies the following (note that the list is only
to maintain readability; the points do not necessarily show up in the code in
this order):
i) Both functions are plotted as lines, and there is a point/points that
shows the equilibrium/equilibria.
ii) The demand function, supply function and the equilibrium/equilibria
are all plotted in different colors.
iii) It shows the values of both functions for P between x1 and x2 (where
x1 and x2 are defined in each scenario).
iv) The title of the plot is Market equilibrium, the x-axis is called
Price and the y-axis is called Quantity.
The scenarios:
a) The demand for a particular product in dependence on the price P is
given by the quadratic function $D(P) = P^2 - 20P + 100$ and the supply
S follows a linear function. The supply at P = 0 is 0 units, whereas at
P = 3, the supply is 15 units.
b) The demand and supply in a market are both linear functions of price
P. If the price is 10 EUR, the demand is 80 pieces, whereas the supply
is only 30 pieces. On the other hand, if the price is 25 EUR, then the
demand is 50 pieces and the supply is 75 pieces.
c) The demand in a market is a quadratic function of price and supply in
the same market is a linear function of the price P. If the price is 2
EUR, the demand is 84 pieces, whereas the supply is only 18 pieces. On
the other hand, if the price is 6 EUR, then the demand is 28 pieces and
the supply is 54 pieces. Moreover, if the product were free (P = 0),
the demand would be 100 pieces.
3.4 In the following, you will find short tasks and corresponding pieces of R
code. However, in each piece of code, there are some mistakes. Find and correct
the mistakes (and avoid building in new ones) such that in the end, you have a
functioning piece of code that fulfills the given task. Try to do so without
using R first, only then check your final code in R (using several different
input parameters). In each task, you may assume that the provided arguments
satisfy the conditions given, i.e. a "missing check" is not a mistake (e.g. if the
inputs are required to be integers, you may assume that the user will only
plug in integers).
a) Implement a function that evaluates $f(x, y) = \ln x + \frac{1}{x+y} + 3y$
for x, y ∈ R.
f(x, y) <- function {
part1 <- ln(x)
part2 <- 1/x+y
part3 <- 3y
return(part1 + part2 + part3)
}
b) Implement a function expon.inverse that for any positive number a
prints the value of the function $a^x$ and its inverse for some positive x.
The arguments of expon.inverse are x and a, where the default value
of a is Euler's number e (but of course, the function should work
properly for any value of a, not only the default).
expon.inverse <- function(x, a = e) {
print(a^x)
print(log(x))
}
c) Implement a function that evaluates
$f(x, y) = (x + y)^2 + \frac{\sqrt[8]{|x-y|}}{4x+5y}$ for
positive x and y.
f <- function(x, y) {
part1 <- (x + y)^2
part2n <- abs(x-y)^1/8
part2d <- 4x+5y
print(part1 + part2n/part2d)
}
d) Implement a function that evaluates
$f(x, y) = \log_{10}(x) + \sqrt{xy} + e^y$ for
positive x and y. Then use it to compute f (10, 1).
f <- function(x, y) {
p1 <- log(x)
p2 <- sqrt(xy)
p3 <- e^y
p1 + p2 + p3
}
f(y = 1, x = 10)
Chapter 4
More on functions
In this chapter we will extend the knowledge gained in the previous chapter in
several ways. We will study how simple operations like addition or multiplication
transform graphs of functions and how new functions can arise from (basic)
functions by combining several functions. We will introduce the notion of
an inverse function and finally, we will study some important properties of
functions.
[Figure: Transformed functions. Legend: f, g, h, g2, h2; x-axis: x; y-axis: y.]
Remark. Note that if you plot several functions in one graph, like in this
case, it is usually a good idea to provide a legend that allows the reader
to distinguish between the various functions. The command to do so is, not
surprisingly, called legend. In its first argument, one specifies the location of
the legend (next to the keywords "top", "bottom", "topleft", "topright",
etc., it is also possible to specify it by its x and y coordinates). Next, one
provides the vector of texts to describe the various lines. lty controls what
type of lines is used in the legend – if it is only one number, all lines will
be of the same type; if it’s a vector, the line types will be assigned in the
same order as the texts of the legend. One can provide the color of the lines
by col, where the assignment of colors works the same way as with the line
type. To learn more about the possible arguments of legend, refer to R help.
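As a small illustration of the arguments just described, the following sketch adds a legend to a plot with two lines. The functions f and g are hypothetical examples chosen only for this purpose.

```r
# Two example functions (hypothetical, for illustration only)
f <- function(x) x^2
g <- function(x) x^2 + 2

x <- seq(-3, 3, by = 0.1)
plot(x, f(x), type = "l", ylim = c(0, 12), ylab = "y")
lines(x, g(x), lty = 2, col = "red")

# Location first, then the texts; lty and col are assigned in the
# same order as the texts
legend("top", legend = c("f", "g"),
       lty = c(1, 2), col = c("black", "red"))
```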
[Figure: four panels in a 2 × 2 layout, each showing f together with two
transformed functions (k and l; k2 and l2; k3 and l3; k4 and l4). x-axis: x;
y-axis: y.]
Notice the line of code par(mfrow = c(2,2)). You probably guessed that
it allows us to plot several graphs stacked next to and above each other, and
this guess was right. In particular, the first value in the vector specifies the
number of graphs above each other and the second one the number of graphs
next to each other, before a new plot is started. We will discuss this setting
in more detail at a later point.
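A minimal sketch of this setting is shown below. The four plotted functions are arbitrary examples; storing the value returned by par in old is a common way to restore the previous layout afterwards.

```r
# Save the current settings and switch to a 2 x 2 layout
old <- par(mfrow = c(2, 2))

x <- seq(-10, 10, by = 0.1)
plot(x, abs(x), type = "l", main = "abs(x)")          # top left
plot(x, abs(x) + 1, type = "l", main = "abs(x) + 1")  # top right
plot(x, abs(x - 2), type = "l", main = "abs(x - 2)")  # bottom left
plot(x, 2 * abs(x), type = "l", main = "2 * abs(x)")  # bottom right

# Restore the previous layout
par(old)
```

With mfrow, the panels are filled row by row.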
In the following example, we will use the above described transformations.
Illustrate the tax function graphically. Then visualize the suggestions for tax
reduction and comment on them.
Solution. Before we turn to R to visualize the tax function and the various
tax reductions, let us consider each reduction in turn and see what it means
in mathematical terms.
Now that we know how the tax function changes under the various suggestions,
we can proceed to plot the graphs of the functions. However, before we do
that, let us have a look at how we can implement the two cases (0 ≤ y ≤
100000 and y > 100000) of the function. Moreover, we will also consider
the way R computes the minimum or maximum when working with vectors.
This will allow us to implement the function T correctly.
It is often the case in programming that one has to consider several cases of
what should be done based on some underlying conditions. The most basic
way to do this in R is if. The syntax of if is quite simple: one provides
the condition as the argument of if, followed by what should be done if this
condition is satisfied. If there is more than one command specifying the tasks
under the condition, it is necessary to enclose them all in curly brackets (if
there is only one line, the brackets may be used but are not necessary). Let
us illustrate this:
a <- 4; b <- 8
if(TRUE) print(a)
## [1] 4
if(TRUE) {
print(a)
print(b)
}
## [1] 4
## [1] 8
if(FALSE) print(a)
if(a > 3) print(a)
## [1] 4
## [1] 4
# recommended syntax
if(b < 5) {
print(a)
} else {
print(b)
}
## [1] 8
Now let us consider what happens if we would like to check some condition
for both a and b (the same condition for both), and in each case, the tasks for
TRUE and FALSE are the same. Clearly, we can do that separately, similarly to
the above, by using one if...else construction for a and a separate one for
b. For instance, we would like to check for both of the values whether they
are greater than 5. If so, "Yeah" should be printed on the screen, otherwise
we should see "No". In the end, we should see two words; in particular,
knowing the values of a and b, we should see "No" "Yeah". However, this is
impractical if we have more than just two values to check. So let's try doing
it at the same time:
if(c(a, b) > 5) {
"Yeah"
} else {
"No"
}
We obtain an error that at a closer look explains that this will not work since
if only expects one logical value – but as we know from Chapter 2, c(a, b)
> 5 results in a vector of two logical values.
For situations in which the same condition should be checked for several
values, and then the same task performed depending on the results of this
check, there is the function ifelse. The syntax of this function is the
following: ifelse(condition, task1, task2) where condition represents
the vector of logicals, provided usually in the form of a condition check, task1
specifies what should be done whenever condition has the value TRUE, and
task2 corresponds to what should be done whenever condition is FALSE.
So what we tried (and failed) to do in the above chunk of code can actually
be implemented as follows:
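A minimal sketch of this call is given below, using the values a = 4 and b = 8 from before. The second call, which classifies the numbers 1 to 10 as even or odd with the help of %%, is an assumed illustration and not necessarily the example originally used here.

```r
a <- 4; b <- 8

# One condition check for both values at once
res <- ifelse(c(a, b) > 5, "Yeah", "No")
res

# Assumed second illustration: even or odd for each of 1, ..., 10
parity <- ifelse((1:10) %% 2 == 0, "even", "odd")
parity
```

The first call returns "No" for a and "Yeah" for b, which is the result we were after.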
Here, we used the modulo operator %%: It gives the remainder after dividing
by a certain number:
(1:10)%%2
## [1] 1 0 1 0 1 0 1 0 1 0
Due to the fact that ifelse can work with a vector of conditions, in contrast
to if...else, we also say that ifelse is the vectorized counterpart of
if...else. It is important to remember at this point that we already talked
about vectorized functions in Chapter 3. In particular, if we are interested
in finding the values of a function for several values of the argument, it is
important to have a vectorized function or, if it is not vectorized, to be aware
of this and to work around it. Therefore, when defining the function
T , we will make use of ifelse.
We have already mentioned the functions min and max that find the minimum
or maximum, respectively, of a provided set of numbers (for instance a
vector). However, in some situations, we are not interested in the minimum
(or maximum) of a single set. Instead, one might want to find the minimum
between 0 and several other values, for each value separately (e.g. if the values
were -3, 5, -4, 2, we would like to get -3, 0, -4, 0, because for -3 and -4, the
minimum with 0 are these values, whereas both 5 and 2 are greater than 0).
Alternatively, we might have two (or more) vectors, for which we don’t want
to find their maxima, instead we want to find the elementwise maximum,
i.e. a vector in which the first entry is the maximum of the first entries, the
second is the maximum of the second entries etc. For instance, imagine that
in some course, you are given two tests, but only the better of the two results
counts towards your grade. To obtain the points that count towards the grade
for each student, the teacher has to find the elementwise maximum
of two vectors, where the first contains the results of the first test and
the second the results of the second test.
In situations as described above, min and max will not work as we would
expect:
## [1] 10
max(15, data)
## [1] 15
## [1] 10
Note that in all of these cases, max outputs a single value, in particular the
largest value of all considered values.
The vectorized versions of min and max that will actually perform the tasks
described above are called pmin and pmax.
pmax(5, data)
## [1] 5 5 5 5 5 6 7 8 9 10
pmax(data, data2)
## [1] 10 9 8 7 6 6 7 8 9 10
pmin(data, data2)
## [1] 1 2 3 4 5 5 4 3 2 1
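The contrast between max and pmax can be reproduced with the sketch below. The definitions data <- 1:10 and data2 <- 10:1 are inferred from the outputs shown above and should be read as an assumption; the last call uses the values −3, 5, −4, 2 from the earlier description.

```r
data  <- 1:10    # inferred from the outputs above
data2 <- 10:1    # inferred from the outputs above

max(data, data2)    # one number: the largest of all 20 values
pmax(data, data2)   # elementwise maximum, a vector of length 10
pmax(5, data)       # the single value 5 is compared with each entry

# Minimum with 0, taken for each value separately
pmin(0, c(-3, 5, -4, 2))
```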
Now we have all the ingredients we need to define the function T and its
transformations:
When defining the function Tax, note the use of ifelse and pmax. If instead
of ifelse we used if...else, the code would work well if we provided only
a single value of y. However, if we would like to compute the value of the
Tax function for several y’s at the same time, like above, we would obtain an
error because in fact, y < 1e5 would be a logical vector.
While with if...else we would quite easily find out about our mistake due
to the error message, things could get more dangerous if we used max instead
of pmax. The code would still run, however not the way intended. It is left
to you as an exercise to understand what would happen in that case.
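To make the pattern concrete, here is a sketch with a purely hypothetical tax schedule (20% on income up to 100000 and 40% on every unit above); the actual rates of the tax function T may differ. The names Tax_A and Tax_B are illustrative and mirror the two suggestions "deduct 10k before taxes" and "deduct 5% before taxes".

```r
# Hypothetical schedule: 20% up to 100000, 40% on the part above
Tax <- function(y) {
  ifelse(y <= 1e5, 0.2 * y, 0.2 * 1e5 + 0.4 * (y - 1e5))
}

# Deduct 10000 before taxes; pmax keeps the taxable income at or above 0
Tax_A <- function(y) Tax(pmax(y - 1e4, 0))

# Deduct 5% before taxes
Tax_B <- function(y) Tax(0.95 * y)

# All three functions accept a whole vector of incomes
y <- c(5000, 50000, 150000)
Tax(y)
Tax_A(y)
Tax_B(y)
```

Because ifelse and pmax both work elementwise, each income in y is treated separately.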
[Figure: tax functions. Legend: T (regular tax), T_A (deduct 10k before
taxes), T_B (deduct 5% before taxes).]
The domain of f − g is A ∩ C.
Note that we mentioned the domains of the new functions, but not their
ranges (or even codomains). The reason is that there is no general way
of saying what the range will be; it always depends on the particular two
functions. We can however use the codomains of the functions to find a
possible codomain. For instance, a possible codomain of f + g is the set
B + D which is the set of all values that can possibly be the result of adding
x ∈ B and y ∈ D. However, in the case of the quotient, if D contains 0, it
clearly has to be excluded when defining the codomain in a similar way.
The above are the basic simple ways of combining two functions. In fact,
they are all special cases of a more general concept of a composition.
The function g is often called the kernel, interior function or inner function,
whereas f is called the exterior function or outer function.
f ◦ g ̸= g ◦ f .
Problem 4.2. Let $f(x) = 3x - x^3$ and $g(x) = x^3$. Compute and visualize:
(f + g)(x),
(f − g)(x),
(f g)(x),
(f /g)(x),
(f ◦ g)(x),
(g ◦ f )(x).
Evaluate (f ◦ g)(1) and (g ◦ f )(1).
(g ◦ f )(1) = 8.
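In R, a composition is simply a function that calls one function on the result of the other. The sketch below reads f as 3x − x³ and g as x³; the names f_after_g and g_after_f are chosen here for illustration.

```r
f <- function(x) 3 * x - x^3
g <- function(x) x^3

f_after_g <- function(x) f(g(x))   # (f o g)(x): apply g first, then f
g_after_f <- function(x) g(f(x))   # (g o f)(x): apply f first, then g

f_after_g(1)
g_after_f(1)
```

The two results differ, which illustrates that the order of composition matters.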
If A = B = $\mathbb{R}_0^+$, then f is bijective.
[Figure: arrow diagrams of several mappings between sets A and B with
elements a1, a2, a3 and b1, b2, b3.]
consider a function that reverses the assignment given by f and for each
value y ∈ B, look for the unique value x ∈ A with f (x) = y. This leads
to the notion of an inverse function.
Finding the inverse thus means finding the function that, when applied to
a value y from the range (and thus codomain, since f is bijective) of
f , gives us the argument x from the domain that leads to f (x) = y.
Looking for the inverse therefore basically means solving the equation y =
f (x) for x.
$$\begin{aligned}
y &= x^3 - 1 \\
y + 1 &= x^3 \\
\sqrt[3]{y + 1} &= x \quad \Rightarrow \quad f^{-1}(y) = \sqrt[3]{y + 1}
\end{aligned}$$
Let us now plot the functions in R. First, let us recall that $\sqrt[3]{x}$ is the
same as $x^{1/3}$. Since R does not have specific commands for roots higher than 2,
we will need to use this form of the third root. However, before we can plot
(or even compute) the inverse function, we will need to solve a problem that
arises for negative values of y + 1:
(-8)^(1/3)
## [1] NaN
## [1] NaN
nthroot(-8, 3)
## [1] -2
Note that whenever the second argument, n, is even AND at the same time
x is negative, we still return NaN since even roots of negative numbers are
not well defined. For odd roots, we take the odd root of the absolute value
of the number, and adjust the sign based on the original sign of x.
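The behaviour described above can be sketched as follows. This is one possible implementation that matches the description (argument names x and n as in the text); the original definition of nthroot may differ in its details.

```r
# n-th root that also handles negative x for odd n
nthroot <- function(x, n) {
  ifelse(x < 0 & n %% 2 == 0,
         NaN,                          # even root of a negative number
         sign(x) * abs(x)^(1 / n))     # root of |x| with the sign of x
}

nthroot(-8, 3)
nthroot(c(-16, 16), 4)
```

Using ifelse keeps the function vectorized, so it can later be applied to a whole grid of values when plotting.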
Now that we know how to take the third root also for negative numbers, we
can turn to plotting f and its inverse.
[Figure: graphs of f and its inverse, together with the line y = x. x-axis: x.]
You may notice that the graph of the inverse function is actually the graph of
the original function mirrored about the line y = x, which we purposefully
added to the figure. This
is not a coincidence; in fact, it is a general property of the inverse function
and follows from the very definition of the inverse function. This means that
if we know what the graph of f looks like, it is easy to draw the graph of $f^{-1}$
simply by reflecting it about this line.
Example 4.2. Usual examples of even functions are $x^2$ or |x|,
but also cos x and many more. Shifting these functions along the y-axis
preserves the property: $x^2 - 2$ is still an even function. After shifting
along the x-axis, they are no longer even, but they remain
symmetric: for instance, $(x - 2)^2$ is symmetric about x = 2.
Typical examples of odd functions are x, $x^3$ or any odd power of x. Also
sin x is an odd function, and of course there are many more. Note that
unlike with even functions, shifting an odd function in any direction ruins
this property.
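These examples can be checked numerically on a symmetric grid. The helpers is_even and is_odd are hypothetical names; note that such a check on finitely many points can only refute a property or make it plausible, it is not a proof.

```r
# Symmetric grid around 0
x <- seq(-3, 3, by = 0.25)

# Hypothetical helpers: compare f(x) with f(-x), and f(-x) with -f(x)
is_even <- function(f) isTRUE(all.equal(f(x), f(-x)))
is_odd  <- function(f) isTRUE(all.equal(f(-x), -f(x)))

is_even(function(x) x^2 - 2)      # shifted along the y-axis
is_even(function(x) (x - 2)^2)    # shifted along the x-axis
is_odd(function(x) x^3)
is_odd(function(x) x^3 + 1)       # shifted odd function
```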
The last type of properties we will consider here are connected to the shape
of the function.
where the inequality in the last step holds because λ ∈ [0, 1], so that
$\lambda^2 \le \lambda$.
Hence, $f(x) = x^2$ is a convex function.
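A numerical illustration of this result is sketched below. It assumes the usual form of the convexity condition, f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) for all λ ∈ [0, 1]; the points x = −2 and y = 3 are arbitrary examples.

```r
f <- function(x) x^2
x <- -2; y <- 3
lambda <- seq(0, 1, by = 0.1)

lhs <- f(lambda * x + (1 - lambda) * y)        # function on the segment
rhs <- lambda * f(x) + (1 - lambda) * f(y)     # chord between the points

# Small tolerance to guard against rounding at the end points
all(lhs <= rhs + 1e-12)
```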
4.7 Exercises
4.1 For each of the following functions, find its range. Then decide whether
it is injective, surjective or bijective as a function from R to i) R and ii) its
range.
a) $f_1(x) = 2x + 5$
b) $f_2(x) = x^2 - 4$
c) $f_3(x) = 2x$
d) $f_4(x) = |x + 1|$
4.2 For the functions in the pictures below, find the largest interval A for
which they are bijective (if considered as functions mapping from A to f (A)).
[Figure: graphs of the functions f(x), g(x) and h(x), each plotted against x.]
4.3 For the following functions, decide whether they are even, odd or neither:
4x 4x
a)f1 (x) = 4x d)f4 (x) = x2 −x
g)f7 (x) = x2 +4
x
b)f2 (x) = cos(x) e)f5 (x) = |x| h)f8 (x) = 2
1 x2
c)f3 (x) = x
f)f6 (x) = log x i)f9 (x) = |x|+3
4.4 For the functions below, choose all of the properties they satisfy in the
shown region. Note that in each point, several properties might be satisfied
for one function.
b)symmetric/even/odd/none,
d)injective/surjective/bijective/none.
[Figure: graphs of the functions f(x), g(x), h(x), i(x), j(x), k(x), l(x), m(x),
n(x) and o(x), each plotted against x.]
4.5 Consider the function f given in the graph below. Sketch the graphs of
the following functions:
a)g1 (x) = f (x) + 2
b)g2 (x) = f (x + 2)
c)g3 (x) = f (x − 1) − 3
d)g4 (x) = 2f (x)
e)$g_5(x) = \frac{1}{2} f(x)$
f)g6 (x) = −f (x)
g)g7 (x) = −f (2x)
[Figure: graph of f(x).]
4.6 For the following functions, decide whether the function has an inverse
(if codomain = range). If the domain is not specified, consider the largest
possible domain; if it is not R, specify it. If there is an inverse, find it.
a)h1 (x) = 3x + 2
b)h2 (x) = − x2 + 4
c)h3 (x) = |x + 2|
d)$h_4(x) = \frac{1}{2} x^2$ for x ∈ [0, ∞)
e)$h_5(x) = x^2 - 6x + 5$
f)$h_6(x) = x^2 - 6x + 5$ for x ∈ (−∞, 3]
g)h7 (x) = 2x−1 − 4
h)h8 (x) = log2 x + 1
√
i)h9 (x) = 4 ln( x + 4 − 2)
f : R+ → R+ , $f(x) = x^3$,
g : [1, ∞) → R+ , $g(x) = (x - 1)^2$,
√
x
h : R+ → R+ , h(x) = .
3
Find the functions M (x) = (g/f )(x) and D(x) = (h ◦ f )(x), their
domains and codomains. Decide, whether M and D are surjective and
injective. If an inverse exists, find it.
c)Consider the following functions:
Find the function M (x) = (9f ◦ g)(h), its domain and the smallest
possible codomain. Decide, whether M is surjective and injective. If an
inverse exists, find it.
4.8 For the following functions, decide whether they are increasing, decreasing
or neither. (Note that derivatives have not been discussed at this point so
even if you are tempted to use them, the idea is not to do so. Instead, use
the definition of monotonicity or your knowledge about basic functions and
how they may change monotonicity when composed with other functions.)
a) f : (−5, 0) → R, $f(x) = (|x| - 5)^4$;
b) g : (−10, 0) → R, $g(x) = (\ln(-x))^2$;
c) h : (0, 2) → R, $h(x) = \frac{1}{\exp((x-2)(x+3))}$;
d) m : (−2, 0) → R, $m(x) = 1 - |x|^{0.5}$;
2
e)n : (0.5, 5) → R, n(x) = ln x1 .
4.9 In the following, you are given four R functions. For each function, find
out what they do without using R. (Of course you may verify your answer in
R once you find it, but try first to read the function and understand what it
does without using R. This will help you develop your algorithmic thinking
which is absolutely necessary for programming.)
4.10 In the following, you will find short tasks and corresponding pieces of R
code. However, in each piece of code, there are some mistakes. Find and correct
the mistakes (and avoid building in new ones) such that in the end, you have a
functioning piece of code that fulfills the given task. Try to do so without
using R first, only then check your final code in R (using several different
input parameters). In each task, you may assume that the provided arguments
satisfy the conditions given, i.e. a "missing check" is not a mistake (e.g. if the
inputs are required to be integers, you may assume that the user will only
plug in integers).
a) Implement the function division that takes two vectors a and b of the same
length as arguments. The result of the function is again a vector of the
same length as the inputs, with the i-th element being 0 if the larger
of the numbers a[i] and b[i] is not divisible by the smaller one, and
otherwise the result of the division. For instance, if you execute
a <- c(4, 3, 1, 6)
b <- c(2, 8, 5, 2)
division(a, b)
the result should be 2 0 5 3. The implementation:
division <- function(a, b) {
smaller <- pmin(a, b)
larger <- max(a, b)
ifelse(larger %% smaller == 0, 0, larger/smaller)
}
b)Implement a function divisible that for any positive integers x and a
considers all values between 1 and x (including) for divisibility by a. For
every value that is divisible by a, it will print "divisible", otherwise
"not divisible". The default value of a should be 2.
divisible <- function(x, a = 2) {
tocheck <- 1 to x
if(tocheck %% a == 0) "divisible" else "not divisible"
}
c) Implement a function HApoints that takes a vector points of QM
students' points achieved from the home assignments as input, and outputs
a vector of final home assignment scores, where a student's final score
is points/2.5 if this value is below 10, and 10 otherwise. For instance, if the
input vector is c(15, 28, 22, 25.3), the output should be 6 10 8.8
10.
HApoints <- function(points) {
divided <- points/2.5
final_score <- min(10, divided)
return(divided)
}
d) Write a function Performance that takes the vector ratings containing
the ratings for a list of movies as input, and outputs a new vector of
classifications based on these ratings. The classification criteria are as
follows: a film is classified as good if the rating value is strictly greater
than 50, and poor otherwise. For instance, if the input vector is c(55,
45, 65, 70), the output should be "good" "poor" "good" "good".
Performance <- function(ratings) {
final.class <- if(ratings < 50, "poor", else "good")
ratings
}
Chapter 5
Derivatives
For a linear function of the form f (x) = ax + b, we know that its steepness
at any point can be quantified by its slope, the parameter a. For every unit
change in x, the function value changes by a. This can also be expressed
by the relationship $a = \frac{f(x_2) - f(x_1)}{x_2 - x_1}$, which stays constant
independently of the
values $x_1$ and $x_2$ or even their difference $x_2 - x_1$.
For general functions, finding the slope of the function is not as straightforward.
The rate at which the function changes usually varies in dependence on the
point at which it is to be considered. To quantify this rate of change, one
uses the notion of a derivative.
For any point x0 such that f is differentiable at this point (i.e. the above
limit exists), f ′ (x0 ) in fact corresponds to the slope of the tangent line to
the graph of f at x0 . Since the tangent is a line that touches the graph
of the function at the given point, we therefore know that it is of the form
tx0 (x) = f ′ (x0 )x + b for some b ∈ R and that tx0 (x0 ) = f (x0 ). These two
pieces of information lead to the following functional form of the tangent line
at x0 :
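The two conditions determine the intercept b. A short derivation, using only the two facts stated above, could read as follows:

```latex
\begin{align*}
t_{x_0}(x_0) = f'(x_0)\,x_0 + b = f(x_0)
  \quad &\Longrightarrow \quad b = f(x_0) - f'(x_0)\,x_0, \\
t_{x_0}(x) &= f'(x_0)\,x + f(x_0) - f'(x_0)\,x_0
            = f(x_0) + f'(x_0)\,(x - x_0).
\end{align*}
```

In words: the tangent line starts at the function value f(x_0) and changes by f'(x_0) for every unit by which x moves away from x_0.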
Let us use the above definition in (5.1) to find the derivative of a function.
Problem 5.1. Find the derivative f ′ (x) for f (x) = x2 .
Solution. According to the definition, we should find
$$\lim_{h \to 0} \frac{(x + h)^2 - x^2}{h}.$$
We have
$$\frac{(x + h)^2 - x^2}{h} = \frac{x^2 + 2hx + h^2 - x^2}{h} = 2x + h.$$
As h gets closer and closer to 0, the second term vanishes and the only
term that remains is 2x. Therefore f ′ (x) = 2x.
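The limit can also be observed numerically. The sketch below evaluates the difference quotient of f(x) = x² at the point x0 = 1 (an arbitrary choice) for smaller and smaller values of h.

```r
f  <- function(x) x^2
x0 <- 1

# Step sizes 0.1, 0.01, ..., 1e-06
h  <- 10^(-(1:6))

# Difference quotient (f(x0 + h) - f(x0)) / h for each h
dq <- (f(x0 + h) - f(x0)) / h
dq

# Distance to the derivative 2 * x0 shrinks together with h
abs(dq - 2 * x0)
```

Up to rounding, each difference quotient equals 2 + h, in line with the formula 2x + h derived above.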
Remark. In the literature, you may see different notation for derivatives.
The following all denote the same thing:
$$\frac{df(x)}{dx} = \frac{\partial f(x)}{\partial x} = f'(x).$$
5.2 The for cycle
In some situations, one would like to repeat the same task several times,
for instance for various values of a particular parameter. An example of
such a situation is the case described above where we want to repeat the
computation of $\frac{f(1+h) - f(1)}{h}$ and the plotting of the corresponding line for
several values of h. If the task should be repeated two or three times, it is
alright to just repeat the corresponding lines of code with the adjustments
of h. However, if the number of repetitions is high, copying the lines of
code is not practical and, more importantly, it also becomes error-prone. In such
situations, the for cycle is a very useful feature. The syntax is the following:
for(i in sequence) {
task
}
that a certain task is to be repeated 10 times, and the parameter that changes
its value, i, takes in turn the values from 1 to 10. task is one or several lines
of code that specify what should be repeated, and this might depend
on the variable i. For example, we might want to find the value $\sum_{i=1}^{n} i^2$ for all
natural numbers n from 1 to 5. This could be done as follows:
for(n in 1:5) {
print(sum((1:n)^2))
}
## [1] 1
## [1] 5
## [1] 14
## [1] 30
## [1] 55
To define sequence, one of the simplest ways is of course to use 1:maxi where
maxi is the last value for which task should be repeated. If it depends on
the length of a particular vector vec, this could be done via 1:length(vec).
Alternatively, seq_along(vec) also creates an arithmetic sequence with step
size 1 starting at 1 and ending at length(vec). However, note that while
the above examples showcase typical use of for cycles, sequence can be any
sequence of values. It does not have to start at 1 or be an arithmetic sequence,
and the values do not have to be integers or even positive.
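As a small illustration of these two points, the sketch below first loops over the positions of a vector with seq_along and then loops directly over an arbitrary set of values. The vectors and variable names are made up for this example.

```r
# Loop over positions: seq_along() also behaves well for an empty vector,
# whereas 1:length(vec) would then produce c(1, 0)
vec <- c(2.5, -1, 0.3)
total <- 0
for (i in seq_along(vec)) {
  total <- total + vec[i]
}
print(total)

# Loop over values: the sequence is neither arithmetic, nor integer, nor positive
squares <- numeric(0)
for (v in c(-0.5, 3, 1.25)) {
  squares <- c(squares, v^2)
}
print(squares)
```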
Now that we know how to achieve that the task of computing the slope
and plotting the corresponding line can be executed 10 times without copy-
pasting the code as many times, let us finally visualize the idea of the
derivative as a limit.
at <- 1
for (i in seq_along(h) ) {
slope <- (f(at + h[i]) - f(at)) / h[i]
points(at, f(at), col = cols[i], pch = 16)
points(at + h[i], f(at + h[i]), col = cols[i], cex = 2, pch = 16)
abline(f(at) - slope*at, slope, col = cols[i])
print(slope)
}
[Figure: f(xs) plotted against xs, together with the lines drawn by the loop above.]
## [1] 15
## [1] 13.19616
## [1] 11.55693
## [1] 10.07407
## [1] 8.739369
## [1] 7.544582
## [1] 6.481481
## [1] 5.541838
## [1] 4.717421
## [1] 4
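The listing above relies on objects (f, h, cols and an existing plot) whose definitions are not reproduced here. The following self-contained sketch only recomputes the slopes. It assumes f(x) = x^4 and ten step sizes shrinking from 1 towards 0; both are our guesses, chosen because they are consistent with the printed slopes running from 15 down to about 4, and they are not taken from the notes.

```r
f <- function(x) x^4                   # assumed; not given in the listing above
h <- seq(1, 1e-8, length.out = 10)     # assumed step sizes shrinking towards 0
at <- 1

slopes <- numeric(length(h))
for (i in seq_along(h)) {
  # slope of the line through (at, f(at)) and (at + h[i], f(at + h[i]))
  slopes[i] <- (f(at + h[i]) - f(at)) / h[i]
}
print(slopes)
```

Under these assumptions the slopes decrease towards f'(1) = 4 as h shrinks, which is the limit idea behind the derivative.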
Even though for many functions, finding the limit in (5.1) is not an issue,
there are others for which this limit is much more complicated. Moreover,
having to compute the limit every time a derivative is needed is also quite
impractical. In the following, we therefore introduce some rules for computing
derivatives as well as the derivatives of some special functions. Let g, h be
differentiable functions.
Note that the rule (5.12) is a direct consequence of the quotient rule and of
the fact that $\frac{dc}{dx} = 0$.
With the knowledge of the derivatives of some basic functions and the derivative
rules for the sum (difference), product, quotient or composition of several
functions, one can find the derivatives of many other functions, too. We will
illustrate the use of the above rules on two examples next.
Problem 5.2. Find the derivative $f'(x)$ for $f(x) = \tan(x) = \frac{\sin(x)}{\cos(x)}$.
Solution. We will make use of the quotient rule (5.11) with g(x) = sin(x)
and h(x) = cos(x). From (5.6) and (5.7) we have that g′(x) = cos(x) and
h′(x) = − sin(x). Therefore we can plug these into the quotient rule formula
and write
$$f'(x) = \frac{\cos^2(x) + \sin^2(x)}{\cos^2(x)} = \frac{1}{\cos^2(x)}.$$
h(x) is in turn the difference of two functions. The term $x^2$ can easily
be differentiated and we have $\frac{dx^2}{dx} = 2x$. The term $1/\exp(x^2)$, on the
other hand, is somewhat more complicated. To simplify it a bit, let us
rewrite it as $\exp(-x^2)$. Now we can interpret it as a composite function
$k(\ell(x))$ with $k(x) = \exp(x)$ and $\ell(x) = -x^2$. We have that $k'(x) = \exp(x)$
and $\ell'(x) = -2x$. According to the chain rule in (5.13), this gives us
$\frac{dk(\ell(x))}{dx} = k'(\ell(x))\ell'(x) = -2x\exp(-x^2)$. Altogether, we get that
$h'(x) = -2x\exp(-x^2) - 2x = -2x(\exp(-x^2) + 1)$.
Coming back to the original composition g(h(x)) and recalling that $g'(x) =
\cos(x)$, we finally get the derivative of f as
$$f'(x) = g'(h(x))\,h'(x) = -2x(\exp(-x^2) + 1)\cos(h(x)).$$
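A derivative obtained with the chain rule can be checked numerically. The sketch below assumes, based on the terms discussed in the solution, that h(x) = exp(-x^2) - x^2 and f(x) = sin(h(x)). It compares the chain rule result cos(h(x)) h'(x) with a central difference approximation; the helper num_deriv is a name we introduce only for this check.

```r
# Assumed from the solution: h(x) = exp(-x^2) - x^2 and f(x) = sin(h(x))
h <- function(x) exp(-x^2) - x^2
f <- function(x) sin(h(x))

# Chain rule: f'(x) = cos(h(x)) * h'(x) with h'(x) = -2x (exp(-x^2) + 1)
f_prime <- function(x) -2 * x * (exp(-x^2) + 1) * cos(h(x))

# Central difference as an independent numerical approximation (hypothetical helper)
num_deriv <- function(fun, x, eps = 1e-6) (fun(x + eps) - fun(x - eps)) / (2 * eps)

x <- c(-1.5, -0.3, 0, 0.7, 2)
print(cbind(x, analytic = f_prime(x), numeric = num_deriv(f, x)))
```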
5.4 Derivatives and the properties of a function
While the interpretation of the derivative as the slope of the tangent to the
function is useful in itself, derivatives can also be used to decide about
common properties of functions like monotonicity and convexity or concavity.
Remark. Note that some of the points above mention f being strictly
increasing or decreasing at a point. We say that f is strictly increasing
(decreasing) at $x_0$ if there is an open interval I, $x_0 \in I$, such
that f is strictly increasing (decreasing) on I. Also note that while the
conditions for monotonicity on an interval are "if and only if", the conditions
for monotonicity at a point are sufficient only.
Remark. Notice that from the derivative of f at a given point, we can only
decide about its monotonicity at the point if the corresponding derivative is
not equal to 0. If the derivative is equal to 0 at the given point, f could
be increasing, decreasing, or neither. As an example, consider the following
single variable functions: $f_1(x) = x^2$, $f_2(x) = x^3$, $f_3(x) = -x^3$. We have
$f'(0) = 0$ for all three functions, but $f_1$ is neither increasing nor decreasing
at 0, while $f_2$ is increasing and $f_3$ is decreasing.
5.5 Higher order derivatives
5.6 Applications
there are numerical methods that can do so quite easily. Therefore being
able to approximate a function by a polynomial, as explained above, is a
big advantage.
Problem 5.4. Find the n-th order Taylor approximation of f (x) = exp(x)
for n ∈ {1, 2, 3, 4, 5}.
Solution. As we can see from the formula (5.14), for the n-th order approximation
we will need derivatives up to order n at the point a. For f(x) = exp(x),
this is not a difficult task: any derivative $f^{(n)}(x)$ remains the function itself,
such that $f^{(n)}(x) = \exp(x)$. For a = 0, we have $f^{(n)}(a) = 1$ for any $n \in \mathbb{N}$.
Plugging this into the approximation formula, we therefore get:
n = 1: $T_0(x) = 1 + x$.
n = 2: $T_0(x) = 1 + x + \frac{x^2}{2}$.
n = 3: $T_0(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6}$.
n = 4: $T_0(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24}$.
n = 5: $T_0(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \frac{x^5}{120}$.
We can even easily write down the approximation formula for any $n \in \mathbb{N}$:
$$T_0(x) = 1 + \sum_{i=1}^{n} \frac{x^i}{i!}.$$
To study how well the approximation mimics the behavior of the function,
let us plot the function itself and its approximations up to degree n = 5.
return(Ta)
}
cols <- c("red", "green", "blue", "violet", "yellow")
for(i in 1:5) {
lines(x, approximation(x, i), col = cols[i])
}
legend("topleft", c("n = 1", "n = 2", "n = 3", "n = 4", "n = 5"),
col = cols, lty = 1)
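The beginning of the listing above, including the definition of approximation, is not reproduced here. As a minimal sketch of what such a helper could look like, the hypothetical function taylor_exp below evaluates the n-th order Taylor polynomial of exp(x) around 0 according to the sum formula and reports the largest deviation from exp(x) on a grid. The function name and the grid are our assumptions.

```r
# Hypothetical helper: n-th order Taylor polynomial of exp(x) around a = 0
taylor_exp <- function(x, n) {
  Ta <- rep(1, length(x))              # zeroth-order term
  for (i in 1:n) {
    Ta <- Ta + x^i / factorial(i)      # add the term x^i / i!
  }
  Ta
}

x <- seq(-3, 3, by = 0.5)
# Largest absolute error on the grid for n = 1, ..., 5
max_err <- sapply(1:5, function(n) max(abs(exp(x) - taylor_exp(x, n))))
print(max_err)
```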
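[Figure: the function and its Taylor approximations for n = 1, ..., 5, plotted as y against x on the interval from −3 to 3, with a legend for n = 1 to n = 5.]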
As we can see, with increasing n, the Taylor polynomial gets closer and closer
to the function.
5.6.2 Elasticity
Recall the interpretation of df (x) as the difference in f (and similarly for x).
This allows us to interpret equation (5.15) as follows: It is the percentage
change in f in reaction to a one percent change in x. If f is the demand and
p the price, the price elasticity of demand gives the (approximate) percentage
change in demand if the price changes by one percent.
Problem 5.5. Find the price elasticity of demand if the demand function
is given by D(P ) = −10P + 400. If the current price is P = 10, how much
does the demand change if the price changes by 1%?
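The computation asked for in Problem 5.5 can be sketched in R. We assume here that (5.15) is the usual point elasticity, that is, P D'(P) / D(P); the function names are ours. For comparison, the sketch also computes the exact percentage change in demand for a price increase of 1%.

```r
# Demand function from Problem 5.5 and its (constant) derivative
D <- function(P) -10 * P + 400
D_prime <- function(P) rep(-10, length(P))

# Point elasticity, assuming the usual definition P * D'(P) / D(P)
elasticity <- function(P) P * D_prime(P) / D(P)

el <- elasticity(10)
print(el)

# Exact relative change in demand (in percent) when the price rises by 1%
exact_change <- (D(10 * 1.01) - D(10)) / D(10) * 100
print(exact_change)
```

Because the demand function is linear, the elasticity and the exact percentage change coincide here; for other demand functions the elasticity is only an approximation.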
5.7 Exercises
5.1 Find the derivatives of the following functions. Write down the domains
of both the functions and their derivatives.
√
f)f6 (x) = x2 − 71 x2
5
a)f1 (x) = x2 + x3
b)f2 (x) = 4x2 − x + 1
√ g)f7 (x) = 2 sin x + 3 cos x
c)f3 (x) = x + x−2
√ h)f8 (x) = 3x
d)f4 (x) = 6 3 x − 5
1 4
e)f5 (x) = x2
+ x3
i)f9 (x) = 9 log10 (x)
Hint for $f_9$: You can make use of the fact that $\log_a b = \frac{\log_c b}{\log_c a}$, which easily
lets you change the base.
5.2 Find the derivatives of the following functions. Where possible, simplify
before differentiating, but don’t forget to specify the domain accordingly.
Where necessary, make use of the product, quotient and/or chain rule.
(x+1)3
a)g1 (x) = x
f)g6 (x) = cos(2x + 4)
b)g2 (x) = (x + 1)6
2
g)g7 (x) = sin2 x
√
c)g3 (x) = 4x3 − x h)g8 (x) = sin(x2 )
p √
d)g4 (x) = x + 5x i)g9 (x) = ln(3 sin x + 8)
2
e)g5 (x) = (x − 1) sin x j)g10 (x) = esin x
5.3 For the following functions, find their derivatives and find intervals on
which they are increasing/decreasing.
(x2 +2)2 1
a)h1 (x) = 4
e)h5 (x) = (3x4 +x2 )10
x3 +1 √
b)h2 (x) = x+1 f)h6 (x) = ( 2x3 − 1 + 2)8
2x−1
c)h3 (x) = x+3 g)h7 (x) = ln(2x + 4)
√
x2 +2x x
d)h4 (x) = 1−x2
h)h8 (x) = e
5.4 For the following functions, find their derivatives and then use them to
decide whether the functions are increasing or decreasing at x = 0.
a)f (x) = (2x + e−x )3 + 1
x+2
;
b)g(x) = (x2 + e−3x )2 + 2−x
1
.
5.5 For $f(x) = x^4 - \frac{x^3}{6} + 2$, find $f'(0)$, $f'(-1)$ and $f'(2)$.
For functions f1 , f2 and f3 , find the intervals on which they are convex/concave.
5.10 Assume that the demand for natural gas in Europe is given by $D(p) = 200p^{-2}$.
Find the price elasticity of demand. Interpret the result.
5.12 The following code should implement a function A that takes two vectors
x and y of the same size and returns the element-wise minimum of their
element-wise product and their element-wise sum. However, there are some
mistakes in the code. Find and correct (and avoid building in new ones),
such that you end up with a functioning code that fulfills the task. You may
assume that the user only plugs in two vectors of the same size, that is, a
missing check for this condition is not a mistake.
A <- function(x, y) {
K <- length(x)
ews <- rep(0, times = K, each = 2)
for(i in 1:K) {
ews[i] <- x[i] + y[i]
}
ewprd <- x*y
min(ewprd, ews)
5.13 Assume that two vectors x and y of the same length have been assigned
in R. What does the following piece of code do? In particular, what is the
outcome of the last two lines?
m <- length(x)
res <- x[1]*y[1]
for(i in 2:m) res <- res + x[i]*y[i]
mxy <- max(x[1], y[1])
for(i in 2:m) mxy <- c(mxy, max(x[i],y[i]))
print(res)
print(mxy)
Chapter 6
Integration
In this chapter, we will introduce both the indefinite and the definite integral,
some rules for calculating integrals, and we will close with some applications.
6.1 Indefinite integral
Assume that we don't know the function F, but we know its derivative:
$F'(x) = x^2$. Thinking back to the rules about differentiating power functions,
it is clear that $F(x) = \frac{1}{3}x^3$ could be the function F we are looking for because
differentiating F would lead to $x^2$. But this would also be the case if we set
By reversing some of the rules for calculating derivatives, one can create a
set of basic rules for finding the indefinite integral. Let f and g be functions
and a ∈ R.
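$$\int a\,dx = ax + C \tag{6.1}$$
$$\int a f(x)\,dx = a \int f(x)\,dx \tag{6.2}$$
$$\int (f(x) + g(x))\,dx = \int f(x)\,dx + \int g(x)\,dx \tag{6.3}$$
$$\int x^a\,dx = \frac{1}{a+1}x^{a+1} + C \quad \text{for } a \neq -1 \tag{6.4}$$
$$\int \frac{1}{x}\,dx = \ln(|x|) + C \tag{6.5}$$
$$\int e^x\,dx = e^x + C \tag{6.6}$$
$$\int \cos(x)\,dx = \sin(x) + C \tag{6.7}$$
$$\int \sin(x)\,dx = -\cos(x) + C \tag{6.8}$$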
Note that in rule (6.3), the brackets around f (x) + g(x) are not strictly
necessary since the integrand is enclosed between the integral sign and dx
such that there is no danger of confusion. However, using brackets in similar
situations improves the readability.
Problem 6.1. Find the indefinite integral $\int (2x^3 + 5)\,dx$.
Solution. We make use of rules (6.1)-(6.4) to write
$$\int (2x^3 + 5)\,dx = 2\int x^3\,dx + \int 5\,dx = \frac{1}{2}x^4 + 5x + C.$$
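An antiderivative can always be checked by differentiating it. The following sketch does this numerically for Problem 6.1: it approximates the slope of $\frac{1}{2}x^4 + 5x$ (taking C = 0) by a central difference and compares it with the integrand. The names F_anti and integrand are ours.

```r
F_anti <- function(x) 0.5 * x^4 + 5 * x    # candidate antiderivative, C = 0
integrand <- function(x) 2 * x^3 + 5

x <- seq(-2, 2, by = 0.5)
eps <- 1e-5
# Central difference approximation of the derivative of F_anti
F_slope <- (F_anti(x + eps) - F_anti(x - eps)) / (2 * eps)

# Largest deviation from the integrand on the grid; should be close to 0
print(max(abs(F_slope - integrand(x))))
```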
Clearly, the rules in the previous subsection will only work for fairly simple
functions. If the function to be integrated is more complicated, one will need
to use more advanced methods to find the antiderivative.
A method that often (but not always!) works well to find the indefinite
integral of a product of two functions is called integration by parts. This
method reverses the product rule for derivatives to find the integral.
Problem 6.3. Find the indefinite integral $\int e^x \cos(x)\,dx$.
Solution. Upon analyzing the functions in the integrand, we realize that it
does not make a difference whether we integrate or differentiate cos(x); in
both cases, the new function will be sin(x) (or − sin(x)). Let us therefore
try and see what happens if we use integration by parts by setting $f(x) = e^x$
and $g'(x) = \cos(x)$. In this case, we obtain $f'(x) = e^x$ and $g(x) = \sin(x)$.
The formula in (6.9) leads to
$$\int e^x \cos(x)\,dx = e^x \sin(x) - \int e^x \sin(x)\,dx.$$
At first sight, this does not seem too helpful, but before giving up, let
us try applying integration by parts again, this time on $\int e^x \sin(x)\,dx$, while
keeping $f(x) = e^x$ and setting $g'(x) = \sin(x)$. Note that then $g(x) = -\cos(x)$
such that in the integral part of the right hand side of (6.10), we arrive at
the original integral again:
$$\int e^x \sin(x)\,dx = -e^x \cos(x) + \int e^x \cos(x)\,dx. \tag{6.10}$$
Let us now combine both steps and denote $\int e^x \cos(x)\,dx$ by I:
$$I = \int e^x \cos(x)\,dx = e^x \sin(x) + e^x \cos(x) - I.$$
Solving this equation for I and adding the integration constant gives
$$I = \frac{1}{2}e^x(\sin(x) + \cos(x)) + C.$$
The second (and last) method for finding more complicated indefinite integrals
that we introduce is integration by substitution. In this method, one reverses
the chain rule for derivatives.
Recall that for two differentiable functions f and g such that their composition
f(g(x)) is well defined, the chain rule implies that $(f(g(x)))' = f'(g(x))g'(x)$.
By applying the indefinite integral to both sides of this equality, we get
$$\int f'(g(x))g'(x)\,dx = f(g(x)) + C.$$
While mathematically this makes a lot of sense, finding the functions f and
g requires a lot of practice. One needs to practice "seeing the derivative
of a composite function" in the integrand. Generally we try and look for
a product of two functions: a composite function, where the inner function
is g, and the derivative of the inner function. Then we make the following
substitution: u(x) = g(x). Recall that we can write $u'(x) = \frac{du}{dx}$. From
$\frac{du}{dx} = g'(x)$, we thus obtain $du = g'(x)\,dx$. Then we can rewrite the original
integral as follows:
$$\int f'(g(x))g'(x)\,dx = \int f'(u)\,du,$$
72x and that $(9x^2 + 2)' = 18x$. The first part can therefore be seen as
a composite function with the inner function being $g(x) = 9x^2 + 2$ and
the second part is a multiple of $g'(x)$. We therefore use the substitution
Finally, let us derive a useful rule for integrating composite functions with
the inner function being a linear function.
Problem 6.6. Derive the rule for $\int f(ax + b)\,dx$ where $a, b \in \mathbb{R}$, $a \neq 0$ and
$f(x) = F'(x)$.
Solution. We use the substitution ax + b = u. Then we get du = a dx or,
equivalently, $dx = \frac{1}{a}\,du$. Therefore it follows that
$$\int f(ax + b)\,dx = \frac{1}{a}\int f(u)\,du = \frac{1}{a}F(u) + C = \frac{1}{a}F(ax + b) + C.$$
6.2 Definite integral and areas
Since ancient times, there have been formulas for calculating the area of
any rectangle and thus also of any polygon that, by definition, is entirely
Figure 6.1: Approximating the area under an irregular curve from below
Let us consider the area under an irregularly shaped curve, such as y = f (x),
between x = a and x = b. The area can be approximated by the area of
several rectangles that arise by dividing the interval [a, b] into n subintervals
and raising rectangles above these subintervals. The height of each rectangle
can e.g. be the function value in the left or right endpoint of the subinterval,
the middle of it, or the smallest or largest value of the function on this
subinterval. This idea, in the case of using the lowest value of the function
on the subinterval and therefore approximating the integral from below, is
illustrated in Figure 6.1.
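The approximation from below can be tried out directly. The sketch below uses $f(x) = x^2$ on [0, 3] as an example of our own choosing; since this function is increasing there, the smallest value on each subinterval is attained at the left endpoint. The helper lower_sum is a hypothetical name.

```r
# Lower sum: on each subinterval use the smallest function value.
# For a function that is increasing on [a, b] this is the value at the left endpoint.
lower_sum <- function(f, a, b, n) {
  width <- (b - a) / n
  left <- a + (0:(n - 1)) * width      # left endpoints of the n subintervals
  sum(f(left) * width)                 # total area of the n rectangles
}

f <- function(x) x^2
sums <- sapply(c(10, 100, 1000, 10000), function(n) lower_sum(f, 0, 3, n))
print(sums)
```

With more and more rectangles the sums increase towards the exact area under the curve, which equals 9 for this example.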
In this notation, the function f that is being integrated and the interval of
integration from a to b are explicitly specified. It is read as "the integral
from a to b of f(x)dx"; a and b are the lower and upper limits of integration,
respectively.
Following the approximation idea described above, the definite integral can
be seen as an infinite sum or, in other words, the sum of the area of infinitely
many rectangles.
Remark. Note that the variable of integration x is a dummy variable in
the sense that it can be replaced by any other variable that does not occur
anywhere else in the expression. In that sense, it holds for instance that
$$\int_a^b f(x)\,dx = \int_a^b f(y)\,dy = \int_a^b f(\xi)\,d\xi.$$
However, $\int_a^b f(b)\,db$ is not a valid expression.
To evaluate the definite integral, one can use the fundamental theorem of
calculus.
Theorem 6.1. Let f be a continuous function and F such that $F' = f$. We
have
$$\int_a^b f(x)\,dx = F(b) - F(a).$$
This theorem provides a powerful tool for calculating definite integrals. The
difference F(b) − F(a) is denoted by $\big|_a^b F(x) = F(x)\big|_a^b = [F(x)]_a^b$ to indicate
that b and a are to be substituted successively for x. Note that any constant
disappears in the difference F(b) − F(a), such that any antiderivative of f
can be used and the integration constant does not have to be considered for
definite integrals.
Problem 6.7. Evaluate $\int_0^3 x^2\,dx$.
Solution. We have
$$\int_0^3 x^2\,dx = \left[\frac{x^3}{3}\right]_0^3 = \frac{3^3}{3} - \frac{0^3}{3} = 9.$$
Definition 6.3. The signed (net) area between the graph of a continuous
function f and the x axis between x = a and x = b is the definite integral of
f (x) over the interval from a to b:
$$\int_a^b f(x)\,dx.$$
Note that the fundamental theorem of calculus does not require the function
to be non-negative such that it also applies in case of general functions.
$\int_0^{\ln(3)} (e^x - 3)\,dx$, $\int_{-2\pi}^{2\pi} \sin(x)\,dx$.
Solution. We have
$$\int_0^{\ln(3)} (e^x - 3)\,dx = [e^x - 3x]_0^{\ln(3)} = 3 - 3\ln(3) - 1 = 2 - 3\ln(3) \approx -1.296,$$
$$\int_{-2\pi}^{2\pi} \sin(x)\,dx = [-\cos(x)]_{-2\pi}^{2\pi} = -1 + 1 = 0.$$
Problem 6.9. Calculate the area between the x-axis and the graph of the
function $f(x) = e^x - 3$ between x = 0 and x = ln(3).
Solution. We note that the function f(x) is non-positive for all $x \in [0, \ln(3)]$.
Therefore, we have |f(x)| = −f(x) and the area we are looking for can be
evaluated as
$$\int_0^{\ln(3)} |f(x)|\,dx = \int_0^{\ln(3)} -f(x)\,dx = -\int_0^{\ln(3)} f(x)\,dx = 3\ln(3) - 2 \approx 1.296.$$
In the following, we collect some rules for working with definite integrals.
$$\int_a^b k f(x)\,dx = k\int_a^b f(x)\,dx; \quad k \in \mathbb{R} \tag{6.11}$$
$$\int_a^b (f(x) \pm g(x))\,dx = \int_a^b f(x)\,dx \pm \int_a^b g(x)\,dx \tag{6.12}$$
$$\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx \tag{6.13}$$
$$\int_a^a f(x)\,dx = 0 \tag{6.14}$$
$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \tag{6.15}$$
$$\frac{\partial}{\partial t}\int_{a(t)}^{b(t)} f(x)\,dx = f(b(t))\,b'(t) - f(a(t))\,a'(t) \tag{6.16}$$
Let us now comment on the above rules. (6.11) and (6.12) are direct consequences
of (6.2) and (6.3).
(6.13) implies that the integral limits do not necessarily need to be ordered;
it is possible for the lower limit of integration to be actually greater than the
upper limit. The rule follows from the fundamental theorem of calculus:
$$\int_b^a f(x)\,dx = F(a) - F(b) = -(F(b) - F(a)) = -\int_a^b f(x)\,dx.$$
(6.14) also follows from the fundamental theorem of calculus: $\int_a^a f(x)\,dx =
F(a) - F(a) = 0$. It also makes a lot of sense geometrically: the surface over a
single point is just a line, which clearly has area zero.
(6.15) is very useful in cases where the function f is defined as several
different mathematical functions on subintervals of [a, b] and can be used to
easily find the area between the x-axis and the function graph for functions
that change the sign on the interval of integration, as illustrated in Problem
6.10 below. Moreover, it allows us to evaluate definite integrals even for
functions that are not continuous by splitting the intervals in the points of
Problem 6.10. Calculate the area between the x-axis and the function
f (x) = sin(x) between x = −π and x = π.
Solution. To find the area between the x-axis and the function f(x) = sin(x)
on the given interval, we need to evaluate $\int_{-\pi}^{\pi} |\sin(x)|\,dx$. While it is not
straightforward how to find the antiderivative of | sin(x)|, we notice that the
function is non-positive for $x \in [-\pi, 0]$, such that | sin(x)| = − sin(x) on this
interval, and non-negative for $x \in [0, \pi]$. Therefore we can use rule (6.15) to
write
$$\int_{-\pi}^{\pi} |\sin(x)|\,dx = \int_{-\pi}^{0} (-\sin(x))\,dx + \int_0^{\pi} \sin(x)\,dx = [\cos(x)]_{-\pi}^{0} + [-\cos(x)]_0^{\pi} = 2 + 2 = 4.$$
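The difference between the signed area and the area itself can also be seen numerically. The sketch below uses R's integrate function for sin(x) on [−π, π]: the signed area cancels to zero, whereas splitting the interval at 0 as in rule (6.15) and flipping the sign on the negative part gives the total area.

```r
f <- function(x) sin(x)

# Signed (net) area: positive and negative parts cancel
signed <- integrate(f, -pi, pi)$value

# Area: split at the sign change and flip the sign on [-pi, 0]
area <- integrate(function(x) -f(x), -pi, 0)$value +
  integrate(f, 0, pi)$value

print(c(signed = signed, area = area))
```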
We close this section by a little warning. Recall the method for finding
antiderivatives by substitution. Particularly when evaluating definite integrals,
it is important to remember to reverse the substitution at the end of the
process of finding the antiderivative, to return to the original variable of
integration, because the integration bounds are given in terms of this variable.
Alternatively, one can change the integration limits in the substitution step,
and in that case, the reversed substitution is not necessary at the end. We
showcase both methods in the following problem.
Problem 6.11. Calculate $\int_1^2 4x\sqrt{1 + x^2}\,dx$.
Solution. Method 1: We start by finding the indefinite integral $\int 4x\sqrt{1 + x^2}\,dx$.
We will use the substitution $u = 1 + x^2$, which leads to $du = 2x\,dx$. We get
$$\int 4x\sqrt{1 + x^2}\,dx = 2\int \sqrt{u}\,du = \frac{4\sqrt{u^3}}{3} + C = \frac{4\sqrt{(1 + x^2)^3}}{3} + C.$$
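Both ways of handling the integration limits can be compared in a few lines of R. The sketch below is our own illustration with hypothetical function names: it evaluates the antiderivative in x at the original limits 1 and 2, evaluates the antiderivative in u at the transformed limits u = 1 + 1^2 = 2 and u = 1 + 2^2 = 5, and adds a purely numerical check with integrate.

```r
integrand <- function(x) 4 * x * sqrt(1 + x^2)

# Method 1: reverse the substitution and use the original limits 1 and 2
F_x <- function(x) 4 / 3 * sqrt((1 + x^2)^3)
method1 <- F_x(2) - F_x(1)

# Method 2: stay with u = 1 + x^2 and use the transformed limits 2 and 5
G_u <- function(u) 4 / 3 * sqrt(u^3)
method2 <- G_u(5) - G_u(2)

# Independent numerical check
numeric_check <- integrate(integrand, 1, 2)$value
print(c(method1 = method1, method2 = method2, numeric = numeric_check))
```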
6.3 Applications
The consumer surplus represents the overall benefit that consumers have in
the equilibrium from paying a smaller price – the equilibrium price P ∗ – than
what they are willing to pay according to the inverse demand function. At a
given quantity level Q, this difference is D−1 (Q) − P ∗ . For consumer surplus,
we are interested in all quantity levels below the equilibrium quantity Q∗ .
Usually the difference $D^{-1}(Q) - P^*$ will be positive as fewer people are willing
to buy at a higher price. The overall benefit of all consumers who are willing
to buy at a higher than equilibrium price can therefore be calculated as
$$CS = \int_0^{Q^*} (D^{-1}(Q) - P^*)\,dQ. \tag{6.17}$$
On the other hand, the producer surplus represents the overall benefit that
the producers gain in the equilibrium from selling at a higher price, namely the
equilibrium price $P^*$, than the price they are willing to produce for. At a
given quantity level Q, this difference is $P^* - S^{-1}(Q)$. The overall benefit of
all producers willing to sell at a lower than equilibrium price is then given as
$$PS = \int_0^{Q^*} (P^* - S^{-1}(Q))\,dQ. \tag{6.18}$$
Let us now calculate the consumer and producer surplus in a market. In the
following problem, we will also graphically illustrate the two measures.
Problem 6.12. Given the inverse demand function $P(Q) = D^{-1}(Q) =
110 - Q^2$ and the inverse supply function $P(Q) = S^{-1}(Q) = \frac{29}{9}Q$, find the
equilibrium price and quantity. Then, evaluate the consumer and producer
surplus.
Solution. To find the equilibrium quantity, we set the two inverse functions
equal to each other and solve for Q:
$$110 - Q^2 = \frac{29}{9}Q$$
$$Q^2 + \frac{29}{9}Q - 110 = 0$$
$$Q \in \left\{-\frac{110}{9}, 9\right\}$$
Since Q is a quantity, we only consider the positive solution and get $Q^* = 9$,
$P^* = 110 - 81 = 29$. Then we have
$$CS = \int_0^9 (110 - Q^2 - 29)\,dQ = \int_0^9 (81 - Q^2)\,dQ = \left[81Q - \frac{Q^3}{3}\right]_0^9 = 486$$
and
$$PS = \int_0^9 \left(29 - \frac{29}{9}Q\right)dQ = \left[29Q - \frac{29}{18}Q^2\right]_0^9 = 130.5.$$
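The results of Problem 6.12 can be reproduced numerically. The following sketch is our own check (function and variable names are hypothetical): it finds the equilibrium quantity with uniroot on the interval [0, 20], which we picked so that it contains the positive solution only, and then evaluates (6.17) and (6.18) with integrate.

```r
inv_demand <- function(Q) 110 - Q^2
inv_supply <- function(Q) 29 / 9 * Q

# Equilibrium quantity: root of demand minus supply among positive quantities
Q_star <- uniroot(function(Q) inv_demand(Q) - inv_supply(Q),
                  interval = c(0, 20), tol = 1e-12)$root
P_star <- inv_demand(Q_star)

# Consumer and producer surplus as in (6.17) and (6.18)
CS <- integrate(function(Q) inv_demand(Q) - P_star, 0, Q_star)$value
PS <- integrate(function(Q) P_star - inv_supply(Q), 0, Q_star)$value

print(c(Q_star = Q_star, P_star = P_star, CS = CS, PS = PS))
```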
[Figure: price P plotted against quantity, with the areas CS and PS marked.]
Let us consider the area of the hypothetical rectangle. Since one of the sides
is the x-axis on the interval [a, b], the length of this side is (b − a). The
average value $\bar{f}$ of the function is the other side length, that is, the height of the
rectangle. Since we want the (signed) area of the rectangle to be the same
as the (signed) area between the function graph of f and the x-axis on this
interval, we can write
$$\bar{f}(b - a) = \int_a^b f(x)\,dx.$$
Dividing both sides by (b − a) gives
$$\bar{f} = \frac{1}{b - a}\int_a^b f(x)\,dx. \tag{6.19}$$
$$I(t) = \begin{cases} -t + 4, & \text{for } 0 \le t < 3, \\ -t + 8, & \text{for } 3 \le t < 5, \\ -t + 12, & \text{for } 5 \le t < 10. \end{cases}$$
Solution. To find the average inventory level $\bar{I}$, we need to calculate
$$\bar{I} = \frac{1}{10}\int_0^{10} I(t)\,dt.$$
We notice that the given inventory function is discontinuous such that the
fundamental theorem of calculus cannot be directly applied to it. However,
rule (6.15) allows us to split the interval into three simpler intervals:
$$\bar{I} = \frac{1}{10}\left(\int_0^3 (-t + 4)\,dt + \int_3^5 (-t + 8)\,dt + \int_5^{10} (-t + 12)\,dt\right)$$
$$= \frac{1}{10}\left(\left[-\frac{t^2}{2} + 4t\right]_0^3 + \left[-\frac{t^2}{2} + 8t\right]_3^5 + \left[-\frac{t^2}{2} + 12t\right]_5^{10}\right) = 3.8.$$
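The splitting of the interval can be mirrored in R. In the sketch below (our own illustration, with the hypothetical name I_level for the inventory function), the piecewise function is written with nested ifelse calls and integrated separately on each of the three subintervals before dividing by the length of the whole interval.

```r
# Piecewise inventory level from the problem above
I_level <- function(t) {
  ifelse(t < 3, -t + 4, ifelse(t < 5, -t + 8, -t + 12))
}

# Integrate piece by piece, as suggested by rule (6.15)
breaks <- c(0, 3, 5, 10)
pieces <- sapply(1:3, function(k) {
  integrate(I_level, breaks[k], breaks[k + 1])$value
})

# Average value: total integral divided by the interval length
I_bar <- sum(pieces) / (breaks[4] - breaks[1])
print(pieces)
print(I_bar)
```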
[Figure: the inventory level I(t) for t between 0 and 10.]
6.4 Exercises
6.1 Find the following indefinite integrals:
√
x4 −1+ x
R
a) (x3 + 6x2 − 2x)dx,
R
i) x 3 dx,
R
b) (3x + 5)dx, 1
R
j) 4x+15
dx,
c) (x−4 − x−5 )dx,
R
√
R ( x−1)2
R 1 1
k) x
dx,
d) (x 3 − 3x− 4 )dx, R
R 2 l) exp(2x + 5)dx,
e) ( x2 + x42 )dx, R
R √ 6
m) cos(0.5 + 0.25x)dx,
f) ( x + √ 3 x )dx, √
R √ 3
R
n) x5 dx,
g) 2 3xdx,
R √ R q
1
h) x x(1 + x√5 x )dx, o) x5
dx.
6.4 Find the function f for which the following hold: f ′ (x) = x2 + 32 , f (1) = 3.
6.5 Find the function f for which the following hold: f ′′ (x) = 12x − 6,
f (0) = 4, f (1) = 6.
6.7 Find the parameters a, b, c such that for the function $f(x) = ax^2 + bx + c$,
the following hold: $f'(1) = 8$, $f''(1) = 6$, $\int_1^2 f(x)\,dx = 14$.
6.8 Find the overall area between the graph of the function $f(x) = x^2 - 1$
and the x-axis on the interval [−2, 2]. Hint: Try to plot the function first to
see its sign in various parts of the interval.
6.9 Find the consumer surplus in a market with the inverse demand function
given by D−1 (Q) = −3Q + 60 and equilibrium quantity Q∗ = 10. Hint: First
find the equilibrium price P ∗ .
6.10 Find the average value of the function $f(x) = e^{4x}$ between 0 and 2.5.
6.11 The amount of products on stock in a certain e-shop for a given time
t ∈ [0, 10] is given by
$$I(t) = \begin{cases} 3, & t \in [0, 4), \\ -t + 7, & t \in [4, 6), \\ -2t + 20, & t \in [6, 10]. \end{cases}$$
6.12 The amount of products on stock in a certain e-shop for a given time
t ∈ [0, 10] is given by
$$S(t) = \begin{cases} 650\,e^{-0.25t}, & t \in [0, 5), \\ 330 - 10t, & t \in [5, 10]. \end{cases}$$
What is the average speed at which the car is traveling for t ∈ [0, 50]?
Most of the content of this chapter (with the exception of average value of
a function) is discussed in [1] in Chapter 10. Parts of Section 10.4 and the
whole Section 10.7 in [1] are not covered in these lecture notes nor in the
Quantitative Methods 1 course.
Chapter 7
Matrix algebra
Note that in the dimension of the matrix, the number of rows comes first and
only then comes the number of columns. This is a general convention which
does not only apply to naming the matrix dimension, but also to naming
matrix entries. A general matrix A can be written in the form
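$$A = (a_{ij})_{r \times c} = (a_{ij}) = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1c} \\ a_{21} & a_{22} & \dots & a_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1} & a_{r2} & \dots & a_{rc} \end{pmatrix}. \tag{7.1}$$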
Note how each entry of the matrix is denoted with a double subscript, where
the first subscript refers to the row and the second subscript to the column.
aij (or equivalently Aij ) therefore refers to the element in row i and column
j of matrix A.
Some special matrices are the zero matrix, the identity matrix and a diagonal
matrix.
The zero matrix of size m × n is denoted by 0m×n or, if the dimension is clear
from the context, simply 0 and it is a matrix that only contains 0 in each of
its cells.
A diagonal matrix is a square matrix that can contain nonzero entries only on
the diagonal (but the diagonal entries might also be 0). That is, a square
matrix D is diagonal if $d_{ij} = 0$ for any $i \neq j$.
The identity matrix of order n, denoted by In or, if the order is clear from
the context, simply I, is a diagonal matrix that contains 1 in all diagonal
entries. That is, Iij = 0 for i ̸= j and Iij = 1 for i = j. The identity matrix
plays the role of 1 among matrices: In matrix product (see below in Section
7.3), multiplying with the identity matrix leaves the other matrix unchanged,
just like multiplying a number with 1 leaves the number unchanged.
7.2 Matrices in R
matrix(1:6, 2, 3)
matrix(1:6, 2)
matrix(1:6, ncol = 3)
As we can observe from the first matrix, just like in the mathematical
notation, also in R the first dimension in a matrix is the number of its rows:
matrix(1:6, 2, 3) creates a matrix of 2 rows and 3 columns. Observe
the difference between the first and the second matrix: While in the first
matrix, the first column is filled with the values 1 and 2 before moving
on to the next column, the second matrix is filled row by row, ensured by
byrow = TRUE. The third and fourth matrix demonstrate that in fact, it is
not necessary to provide both the number of rows and the number of columns:
providing only one of them is enough for R to decide what the other dimension
should be, depending on the number of values used. (Recall that if only
providing the number of columns, the name of the argument ncol must be
specifically used since the second argument of the function matrix is nrow
and thus, if no name is provided, the number will be used for nrow; compare
matrix(1:6, ncol = 3) and matrix(1:6, 3)).
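The calls discussed in this paragraph can be put side by side. The sketch below is our own summary; the object names m1 to m5 are hypothetical, and the byrow = TRUE call is written out as we understand it from the description.

```r
m1 <- matrix(1:6, 2, 3)                 # filled column by column (the default)
m2 <- matrix(1:6, 2, 3, byrow = TRUE)   # filled row by row
m3 <- matrix(1:6, 2)                    # only nrow given: ncol is inferred as 3
m4 <- matrix(1:6, ncol = 3)             # only ncol given: nrow is inferred as 2
m5 <- matrix(1:6, 3)                    # unnamed second argument is nrow: 3 rows, 2 columns

print(m1)
print(m2)
print(dim(m5))
```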
## [3,] 3 1 5 3
## [4,] 4 2 6 4
matrix(1:6, ncol = 4)
As we see, in all three cases, after all the provided values are used to fill the
cells of the matrix, recycling starts and the values are used all over again as
often as needed. When comparing the first and the second matrix, we note
that while in the first matrix, not all values were used the same number of
times, in the second case the number of matrix entries is a multiple of the
number of provided values, such that the values vector was entirely recycled.
This explains why in the first case, we obtained a warning, whereas in the
second case, there was no warning.
If only one of the dimensions, the number of rows or the number of columns,
is provided, R creates the smallest matrix that has the desired number of
rows or columns and uses each of the provided values at least once. In the
case of the third matrix above, we provided 6 values to be used in 4 columns.
If only one row were used, this would only use 4 values such that 5 and 6
would remain unused. Therefore another row was used as well, but since now
8 values are necessary to fill the matrix, the values 1 and 2 were recycled.
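The recycling behaviour can be inspected directly. In the sketch below, the first call is the one from the text; R may issue a warning for it, so we wrap it in suppressWarnings. The second call, a 3 x 4 matrix filled from the same six values, is our own example of complete recycling.

```r
# 6 values into 4 columns: R chooses 2 rows (8 cells) and recycles the values 1 and 2
m <- suppressWarnings(matrix(1:6, ncol = 4))
print(m)

# 6 values into a 3 x 4 matrix: 12 cells, so every value is used exactly twice
m_full <- matrix(1:6, nrow = 3, ncol = 4)
print(m_full)
print(table(m_full))
```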
This idea of recycling can also be used to create matrices with the same
value in each entry: Such a matrix can be defined by providing this single
value along with the desired number of rows and columns. In particular, this
allows a very simple definition of any zero matrix:
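As a minimal sketch, a zero matrix with 2 rows and 3 columns (dimensions chosen by us for illustration) can be created as follows.

```r
# The single value 0 is recycled into all 2 * 3 = 6 cells
Z <- matrix(0, nrow = 2, ncol = 3)
print(Z)
```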
For the definition of the identity matrix, recall that it is a special diagonal
matrix. Therefore it can be created in R with the use of the command diag,
while specifying the order of the identity matrix by a single argument. If
instead of a single number, we provide a whole vector of values, diag creates
a diagonal matrix with the specified values on the diagonal.
diag(4)
diag(c(2, 3, 1, 4))
have two dimensions (the number of rows and the number of columns), we
usually specify both dimensions when indexing matrices. Recall that when
indexing matrix elements, the first subscript refers to the row and the second
one refers to the column. This convention also holds when indexing matrices
in R. We start by assigning a matrix A that we will use for demonstration
purposes.
#A[1, 2]
A[2, 3]
## [1] 6
A[5]
## [1] 13
A[8]
## [1] 8
By comparing these values with the matrix A, we can deduce after a short
consideration that when indexing by a single value only, R translates the matrix
into a vector containing the columns of A stored one after another, and accesses
the corresponding entries in this vector. The eighth entry of a matrix with
5 rows is therefore the third entry of the second column (8 = 5 + 3).
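The relation between the single index and the row and column position can be written as a small formula. The sketch below assumes that the demonstration matrix is A <- matrix(1:15, nrow = 5, byrow = TRUE), which is consistent with the outputs shown but is our reconstruction; linear_index is a hypothetical helper.

```r
# Assumed demonstration matrix, consistent with the outputs shown above
A <- matrix(1:15, nrow = 5, byrow = TRUE)

# Position of entry (i, j) when the columns are stored one after another
linear_index <- function(i, j, n_rows) (j - 1) * n_rows + i

k <- linear_index(3, 2, nrow(A))   # third row, second column
print(k)
print(c(A[k], A[3, 2]))            # both ways of indexing give the same entry
```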
To access certain columns or rows of a matrix, one can provide the corresponding
row or column number and leave the space for the other dimension empty.
The comma dividing the two dimensions must still be used (otherwise we
are in the situation above). One might notice this way of indexing also when
looking at a matrix printed in R: The first row is denoted with [1,] and the first column
with [,1] (and the other rows and columns also correspondingly).
A[1,]
## [1] 1 2 3
A[, 3]
## [1] 3 6 9 12 15
It might come as a surprise that when accessing the third column of A, the
output comes in the form of a row. The reason is that generally, vectors
in R are dimensionless and for usual operations, it does not matter whether
a vector is a row or a column vector. R therefore by default drops the
dimension, i.e. it does not keep the information whether the vector is a row or
a column. If for a specific application it is necessary to keep this information,
it is possible to override this default with the argument drop.
A[ , 1, drop = FALSE]
## [,1]
## [1,] 1
## [2,] 4
## [3,] 7
## [4,] 10
## [5,] 13
rows or columns at the same time. In that case, the entries that are at the
same time in the desired rows and in the desired columns will be outputted.
Similarly to vectors, one can also specify which rows or columns are not to
be shown by means of negative indexes.
## [,1] [,2]
## [1,] 4 6
## [2,] 7 9
## [,1] [,2]
## [1,] 1 3
## [2,] 4 6
## [3,] 7 9
## [4,] 10 12
## [5,] 13 15
A[-1, ]
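The outputs above are consistent with calls such as the following. This is only a sketch: A is redefined under the assumption that it holds the values 1 to 15 row by row, and the particular index vectors are illustrative choices.

```r
A <- matrix(1:15, nrow = 5, byrow = TRUE)  # assumed definition of A
A[2:3, c(1, 3)]   # entries in rows 2 and 3 and, at the same time, columns 1 and 3
A[, -2]           # all columns except the second
A[-1, ]           # all rows except the first
```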
Finally, just like with vectors, also with matrices it is possible to access only
values that satisfy a certain condition. Applying a comparison to a matrix
will result in a matrix of logical values TRUE or FALSE; using the outcome of
such a comparison in square brackets will show only the values in the matrix
for which the result of the comparison is TRUE.
A > 5
A[A > 5]
## [1] 7 10 13 8 11 14 6 9 12 15
Note that the result comes in the form of a vector (and by comparing to the
original matrix A, we may see that these values come again in a column-wise
manner) and in this case, no argument can change the fact. The reason
is simple: the number of the matrix entries that satisfy the condition in
different rows or columns will in general vary and the entries will be placed
within the matrix in an irregular pattern.
cbind works very similarly to rbind, with the difference being that it combines
the matrix columns together. As a consequence, the condition is that the
matrices provided to the function must have the same number of rows. If a
vector is used, too, then it must consist of as many values as is the number
of rows in the other input(s).
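A small sketch of both functions; the matrices M and N are made up for illustration:

```r
M <- matrix(1:6, nrow = 2)     # 2 x 3 matrix
N <- matrix(7:10, nrow = 2)    # 2 x 2 matrix
cbind(M, N)                    # same number of rows: the result is 2 x 5
cbind(M, c(0, 0))              # a vector with 2 values is added as a column
rbind(M, 11:13)                # a vector with 3 values is added as a row
```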
To obtain the dimensions of a matrix, one may use nrow for the number of
rows, ncol for the number of columns, or dim for the dimension, that is both
number of rows and number of columns (in this order).
nrow(A)
## [1] 5
ncol(A)
## [1] 3
dim(A)
## [1] 5 3
default not the case, which we can see by the fact that both rownames(A)
and colnames(A) result in NULL which means that the vectors that store
the names of rows and columns of A are empty. However, it can easily be
changed by assigning vectors of names to them.
Let us make a small detour to inspect the way we assigned the row names
of A. The command paste can be used to combine several objects or values
into character vectors. In this particular case, it combines the character "R"
with the values 1:5 in turn. sep = "" specifies that R and the number are
to be separated by nothing (you can play around with the function inputs
to see what happens if you e.g. change the separator). As a result, we get a
vector of 5 characters:
rownames(A)
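Assignments of the following kind produce the names printed in this section. The row names follow the description of paste above; the analogous call for the column names and the definition of A are assumptions.

```r
A <- matrix(1:15, nrow = 5, byrow = TRUE)  # assumed definition of A
rownames(A)                                # NULL: no names assigned yet
rownames(A) <- paste("R", 1:5, sep = "")   # "R1" "R2" "R3" "R4" "R5"
colnames(A) <- paste("C", 1:3, sep = "")   # assumed analogue for the columns
rownames(A)
```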
Now that we have defined the names of the rows and columns of A, they will
also be shown whenever we view the matrix, instead of the generic square
brackets:
## C1 C2 C3
## R1 1 2 3
## R2 4 5 6
## R3 7 8 9
## R4 10 11 12
## R5 13 14 15
Moreover, it is now possible to access the rows and columns (and in fact
any entry) by their names as well, while the usual indexing via row and column
numbers remains valid.
7.3. MATRIX OPERATIONS 173
A["R4", "C3"]
## [1] 12
A["R3",]
## C1 C2 C3
## 7 8 9
A[3, ]
## C1 C2 C3
## 7 8 9
While for a 5 × 3 matrix this may not seem too helpful, it is actually
very useful for larger matrices. A typical example would be some dataset
containing e.g. the information about several people, with each row corresponding
to one person (and being named by the person’s name or some identifier) and
each column corresponding to one type of information, like number of points
in different exams (and being named by the course name or exam date).
A and B are called equal, A = B, if all their elements are equal, that
is, $a_{ij} = b_{ij}$ for all $i \in \{1, \dots, m\}$, $j \in \{1, \dots, n\}$.
For the addition and scalar multiplication, the usual rules apply.
Theorem 7.1. For scalars α, β ∈ R and matrices A, B and C of the same
dimension m × n, the following hold:
(A + B) + C = A + (B + C) (associative law),
A + B = B + A (commutative law),
(α + β)A = αA + βA and α(A + B) = αA + αB (distributive law).
While matrix addition and scalar multiplication work very intuitively and in
a straightforward manner, this is not the case for matrix multiplication.
Definition 7.3. For matrices $A = (a_{ij})_{m \times n}$ and $B = (b_{ij})_{n \times p}$, the matrix
product AB is a matrix $AB = C = (c_{ij})_{m \times p}$ with
$$c_{ij} = \sum_{r=1}^{n} a_{ir} b_{rj}.$$
Breaking down the definition of the matrix product, we have that cij is the
scalar product of the i-th row of A and the j-th column of B, that is, the
sum of element-wise products of the i-th row of A and the j-th column of
B.
Problem 7.1. Find the matrix product
$$\begin{pmatrix} 0 & 1 & 2 \\ 2 & 3 & 1 \\ 4 & -1 & 6 \end{pmatrix} \cdot \begin{pmatrix} 3 & 2 \\ 1 & 0 \\ -1 & 1 \end{pmatrix}.$$
Solution. We will refer to the left matrix in the product as A and to the
right matrix as B such that we are looking for the product C = AB. To fill
the entry in the first row and the first column of the product C, we consider
the first row of A and the first column of B: c11 = 0 · 3 + 1 · 1 + 2 · (−1) = −1.
For the first row and second column of C, we continue working with the first
row of A, but move on to the second column of B: c12 = 0·2+1·0+2·1 = 2.
We continue in a similar manner to get all the other entries:
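Written out with the same row-by-column rule, the remaining four entries are as follows (they agree with the result matrix given in this solution):

```latex
\begin{align*}
c_{21} &= 2 \cdot 3 + 3 \cdot 1 + 1 \cdot (-1) = 8, & c_{22} &= 2 \cdot 2 + 3 \cdot 0 + 1 \cdot 1 = 5, \\
c_{31} &= 4 \cdot 3 + (-1) \cdot 1 + 6 \cdot (-1) = 5, & c_{32} &= 4 \cdot 2 + (-1) \cdot 0 + 6 \cdot 1 = 14.
\end{align*}
```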
Below, we put these together in the resulting matrix. We also use color-
coding to make it obvious what parts of which matrix were used to find
particular cells: In A, we use colored backgrounds to highlight its rows,
whereas in B, we use different entry colors for the two columns. In the
final matrix, the combination of the background color and of the color of
the resulting number tells you which row and column of the matrices of the
product were used:
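$$\begin{pmatrix} 0 & 1 & 2 \\ 2 & 3 & 1 \\ 4 & -1 & 6 \end{pmatrix} \cdot \begin{pmatrix} 3 & 2 \\ 1 & 0 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} -1 & 2 \\ 8 & 5 \\ 5 & 14 \end{pmatrix}.$$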
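The result can be checked in R with the matrix multiplication operator %*%. The following sketch enters the two matrices of Problem 7.1 row by row and also recomputes the entry c11 as a scalar product:

```r
A <- matrix(c(0,  1, 2,
              2,  3, 1,
              4, -1, 6), nrow = 3, byrow = TRUE)
B <- matrix(c( 3, 2,
               1, 0,
              -1, 1), nrow = 3, byrow = TRUE)
C <- A %*% B           # matrix product, a 3 x 2 matrix
C
sum(A[1, ] * B[, 1])   # c11: scalar product of row 1 of A and column 1 of B
```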
A way of keeping track of which rows and columns are necessary for the entries
of the product matrix is Falk’s scheme. In this scheme, one uses a kind
of a coordinate system. We start by entering the left matrix of the product
(A) in the lower left quadrant and the right matrix of the product (B) in
the upper right quadrant:
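$$\begin{array}{rrr|rr}
 & & & 3 & 2 \\
 & & & 1 & 0 \\
 & & & -1 & 1 \\ \hline
0 & 1 & 2 & & \\
2 & 3 & 1 & & \\
4 & -1 & 6 & &
\end{array}$$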
Then the rows of A and columns of B can be seen as the names of the
rows and columns in the (so far empty) table which arose in the lower right
quadrant – this we will proceed to fill with the resulting matrix C = AB.
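Each cell will be filled with the scalar product of its ’row name’ and ’column name’:
$$\begin{array}{rrr|rr}
 & & & 3 & 2 \\
 & & & 1 & 0 \\
 & & & -1 & 1 \\ \hline
0 & 1 & 2 & -1 & \\
2 & 3 & 1 & & \\
4 & -1 & 6 & &
\end{array}
\;\to\;
\begin{array}{rrr|rr}
 & & & 3 & 2 \\
 & & & 1 & 0 \\
 & & & -1 & 1 \\ \hline
0 & 1 & 2 & -1 & 2 \\
2 & 3 & 1 & & \\
4 & -1 & 6 & &
\end{array}
\;\to\; \cdots \;\to\;
\begin{array}{rrr|rr}
 & & & 3 & 2 \\
 & & & 1 & 0 \\
 & & & -1 & 1 \\ \hline
0 & 1 & 2 & -1 & 2 \\
2 & 3 & 1 & 8 & 5 \\
4 & -1 & 6 & 5 & 14
\end{array}$$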
Remark. Note that finding the matrix product is only possible if the left
matrix in the product has as many columns as the matrix on the right has
rows. This is in line with Definition 7.3 where the matrices are assumed
to be of size m × n and n × p. Moreover, we can notice in the solution of
Problem 7.1 that the resulting matrix has as many rows as matrix A and
as many columns as matrix B. This again agrees with Definition 7.3 where
the size of the product is said to be m × p. To remember these rules, one
can think about ’inner’ and ’outer’ dimensions of the matrices that are part
of the product. The ’inner’ dimensions must be the same for the product to
exist whereas the ’outer’ dimensions define the size of the resulting matrix:
$$\underbrace{A}_{m \times n} \, \underbrace{B}_{n \times p} = \underbrace{C}_{m \times p}$$
Theorem 7.2. The following hold for any matrices A, B and C of appropriate
dimensions:
## C1 C2 C3
## R1 1 2 3
## R2 4 5 6
## R3 7 8 9
## R4 10 11 12
## R5 13 14 15
A + 2
## C1 C2 C3
## R1 3 4 5
## R2 6 7 8
## R3 9 10 11
## R4 12 13 14
## R5 15 16 17
A*1.5
## C1 C2 C3
## R1 1.5 3.0 4.5
## R2 6.0 7.5 9.0
## R3 10.5 12.0 13.5
## R4 15.0 16.5 18.0
## R5 19.5 21.0 22.5
A + A
## C1 C2 C3
## R1 2 4 6
## R2 8 10 12
## R3 14 16 18
## R4 20 22 24
## R5 26 28 30
A/A
## C1 C2 C3
## R1 1 1 1
## R2 1 1 1
## R3 1 1 1
## R4 1 1 1
## R5 1 1 1
A^3
## C1 C2 C3
## R1 1 8 27
## R2 64 125 216
## R3 343 512 729
## R4 1000 1331 1728
## R5 2197 2744 3375
A*A
## C1 C2 C3
## R1 1 4 9
## R2 16 25 36
## R3 49 64 81
## R4 100 121 144
## R5 169 196 225
A%*%B
In addition to the usual operations, R also offers useful functions that make it
easy to obtain the sums of each row and of each column of the matrix.
rowSums(A)
## R1 R2 R3 R4 R5
## 6 15 24 33 42
colSums(A)
## C1 C2 C3
## 35 40 45
Let us close this section by two examples that are applications of storing
information in matrices in a concise way and of using matrix multiplication
or matrix powers to obtain further results.
Problem 7.3. The following diagram shows the numbers of flight connections
between the different airports in three countries:
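[Diagram: the airports a1 and a2 in A, b1 to b4 in B and c1 to c3 in C, connected by edges labelled with the numbers of flight connections; these numbers are collected in the matrices P and Q given in the solution.]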
the different airports in B and c1.
In fact, such an argument can be made for any combination of airports in
A and C. A concise overview of the flight connections between the different
airports of A and C can therefore be obtained as a matrix product of two
matrices: In matrix P , let us represent each of the airports in A by one row
and each of the airports in B by one column. In matrix Q, let us represent
each of the airports in B by one row, whereas each column will correspond
to one of the airports in C. In the final product R = P Q, each row will
represent one airport in A, whereas each column will represent one airport
in C. From the diagram, we can write
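$$P = \begin{array}{c|cccc}
 & b_1 & b_2 & b_3 & b_4 \\ \hline
a_1 & 2 & 1 & 0 & 1 \\
a_2 & 3 & 0 & 2 & 1
\end{array},
\qquad
Q = \begin{array}{c|ccc}
 & c_1 & c_2 & c_3 \\ \hline
b_1 & 1 & 0 & 2 \\
b_2 & 2 & 0 & 0 \\
b_3 & 1 & 0 & 4 \\
b_4 & 0 & 1 & 0
\end{array}.$$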
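The product can then be left to R. The sketch below enters P and Q with the values given above; the use of dimnames to label the airports is an optional addition, not something the problem requires.

```r
P <- matrix(c(2, 1, 0, 1,
              3, 0, 2, 1), nrow = 2, byrow = TRUE,
            dimnames = list(c("a1", "a2"), c("b1", "b2", "b3", "b4")))
Q <- matrix(c(1, 0, 2,
              2, 0, 0,
              1, 0, 4,
              0, 1, 0), nrow = 4, byrow = TRUE,
            dimnames = list(c("b1", "b2", "b3", "b4"), c("c1", "c2", "c3")))
R <- P %*% Q   # rows: airports in A, columns: airports in C
R
```

For example, the entry in row a1 and column c1 is 2 · 1 + 1 · 2 + 0 · 1 + 1 · 0 = 4, the number of connections from a1 to c1 with one stop in B.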
firm A keeps 85% of its customers, while losing 5% to firm B and 10%
to firm C;
firm B keeps 55% of its customers, while losing 10% to firm A and 35%
to firm C;
firm C keeps 85% of its customers, while losing 10% to firm A and 5%
to firm B.
Find the initial market share vector s and the transition matrix T , that is,
the matrix such that T s describes the market shares at the end of the next
year. Find and interpret the values T s, T (T s), T (T (T s)), etc.
Solution. Since initially, the market shares of the three firms A, B and C
are 20%, 60% and 20%, respectively, the market share vector s is of the form
$$s = \begin{pmatrix} 0.2 \\ 0.6 \\ 0.2 \end{pmatrix}.$$
To find out the shares of the firms at the end of the year, let us consider the
described changes.
A keeps 85% of its own share and moreover gains 10% of B’s customers
and 10% of C’s customers. This corresponds to a final share of 0.85 ·
0.2 + 0.1 · 0.6 + 0.1 · 0.2.
B keeps 55% of its own share and moreover gains 5% of A’s customers
and 5% of C’s customers. This corresponds to a final share of 0.05 ·
0.2 + 0.55 · 0.6 + 0.05 · 0.2.
C keeps 85% of its own share and moreover gains 10% of A’s customers
and 35% of B’s customers. This corresponds to a final share of 0.1 ·
0.2 + 0.35 · 0.6 + 0.85 · 0.2.
After carefully inspecting the final shares of the firms and comparing them
to the initial share vector s, we see that the transition matrix is
$$T = \begin{pmatrix} 0.85 & 0.1 & 0.1 \\ 0.05 & 0.55 & 0.05 \\ 0.1 & 0.35 & 0.85 \end{pmatrix}.$$
Note that the columns of the transition matrix contain the information about
how the current share of each firm will be distributed among the firms in the
next year, with each row corresponding to one (receiving) firm. In this case
T s really delivers the final shares described above and we have
$$Ts = \begin{pmatrix} 0.25 \\ 0.35 \\ 0.40 \end{pmatrix}.$$
Since this is the market share vector after one year, multiplying it with T
again gives the market share after two years, assuming that in the second
year, the same relative changes happen. Similarly, $T^k s$ gives the market
share after k years, assuming that the same relative changes happen every
year. To find the values, let us resort to R.
T <- matrix(c(85, 5, 10, 10, 55, 35, 10, 5, 85), nrow = 3)/100
s <- c(0.2, 0.6, 0.2)
T %*% s
## [,1]
## [1,] 0.25
## [2,] 0.35
## [3,] 0.40
T %*% T %*% s
## [,1]
## [1,] 0.2875
## [2,] 0.2250
## [3,] 0.4875
## [,1]
## [1,] 0.315625
## [2,] 0.162500
## [3,] 0.521875
## [,1]
## [1,] 0.3367187
## [2,] 0.1312500
## [3,] 0.5320312
(Note that if you copy this code into an R-file, the T will be shaded in a
different color. The reason is that T actually is a shorthand for TRUE that
we are now overwriting. This is generally not an issue if you always use TRUE;
however, if you are likely to attempt to use the short version T later in your
code, you should be aware of this issue and use a different name for your
transition matrix.)
Note that to obtain the higher powers of the transition matrix, we keep
repeatedly using matrix multiplication. At some point, this becomes tedious,
therefore it would be useful to have a function that can calculate matrix
powers directly. Unfortunately, base R does not provide such a function.
Fortunately, it is not too difficult to define our own function that will do just
that. Let us define it and then use it to inspect the market behavior over
the next 30 years. Note that $T^0 = I$, which is in line with the idea of the
identity matrix acting as 1 for matrices.
As an exercise, try to understand from the code below exactly what the matrix M
contains and how this information is used to create the plot.
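The following is only a sketch of how such a function and plot might be written. The name T.power.n and the matrix M are taken from the text; the function body, the name Tm for the transition matrix (chosen to avoid overwriting the shorthand T) and the plotting calls are assumptions.

```r
# Hypothetical reconstruction: n-th power of a square matrix X
T.power.n <- function(X, n) {
  result <- diag(nrow(X))        # X^0 is the identity matrix
  for (i in seq_len(n)) {
    result <- result %*% X       # one more factor X in each step
  }
  result
}

Tm <- matrix(c(85, 5, 10, 10, 55, 35, 10, 5, 85), nrow = 3)/100
s <- c(0.2, 0.6, 0.2)

# Column k + 1 of M holds the market shares after k years, k = 0, ..., 30
M <- sapply(0:30, function(k) T.power.n(Tm, k) %*% s)

# One line per firm (the rows of M), plotted against the years 0:30
matplot(0:30, t(M), type = "l", lty = 1, xlab = "year", ylab = "market share")
legend("topright", legend = c("A", "B", "C"), col = 1:3, lty = 1)
```

Each column of M sums to 1, since the market shares of the three firms always add up to the whole market.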
[Plot: market shares of the three firms over the years 0 to 30; horizontal axis 0:30.]
In the plot, we can observe that after about 7 years, the market shares seem
to stabilize in a sort of equilibrium. Moreover, we see that B quickly loses the
majority it had at the beginning, due to the large proportion of customers
it loses to C every year. While after the first year, B still owns a proportion
of the market close to C, after the second year the market share of B is the
smallest of the three firms, with C almost having reached majority at this
point.
Remark. To inspect the market behavior over the years in Problem 7.4, we
specifically defined a function T.power.n to calculate matrix powers since
base R does not offer a similar function. However, there is such a function
in the package expm. This package is not part of the
basic R installation, so if you have not done so yet, you will need to install it in
order to be able to use its functions. The package can be installed using
install.packages("expm"), or alternatively, you can navigate to the tab
Packages in the lower right pane of RStudio, click on Install and look
for the package. After installing it, don’t forget to load the package using
library("expm") or library(expm) (or by ticking it in the Packages tab)
for R to be able to access its functions.
This package contains the operator %^% which is the matrix counterpart of
^, much like %*% is the matrix counterpart of *.
library("expm")
T %*% T
T %^% 2
For working with transposes, there are a few useful rules that often simplify
calculations.
Theorem 7.3. For two matrices A and B of appropriate dimensions, the
following hold:
$(A')' = A$,
$(A + B)' = A' + B'$,
$(AB)' = B'A'$.
Remark. Note the change of order in the rule for the product! This is an
important part of the rule. In fact, if A and B are not the same size, it
might be the case that even though AB is well defined (and can therefore
be transposed to arrive at a new matrix), A′ B ′ might not even exist. Note
that by transposition, the dimension of an m × n matrix changes to n × m.
Use this for the dimensions of A and B to see that B ′ A′ actually does exist
whenever AB exists.
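A quick numerical illustration of the product rule, using two made-up matrices of different sizes:

```r
A <- matrix(1:6, nrow = 2)          # 2 x 3 matrix
B <- matrix(1:12, nrow = 3)         # 3 x 4 matrix
dim(A %*% B)                        # AB exists and is 2 x 4
all(t(A %*% B) == t(B) %*% t(A))    # (AB)' and B'A' coincide
# t(A) %*% t(B) would be a (3 x 2) times (4 x 3) product, which is not defined
```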
t(A)
## R1 R2 R3 R4 R5
## C1 1 4 7 10 13
## C2 2 5 8 11 14
## C3 3 6 9 12 15
A special class of matrices are matrices that are not changed by transposition.
Definition 7.5. A matrix A is called symmetric if A′ = A, that is, aij = aji .
Remark. Recall that by transposition, the dimension of an m × n matrix
changes to n × m. This means that only square matrices can be symmetric.
Symmetric matrices play a special role in matrix algebra because they have
several properties that simplify more advanced operations. In this course,
we do not discuss such advanced concepts; however, we will mention the
very important Hessian matrices in the following chapter, which are also
symmetric.
Problem 7.5. For any matrix X, show that X ′ X and XX ′ are symmetric.
Solution. Recall that a matrix A is symmetric if A′ = A. Therefore, to
show that X ′ X is symmetric, we will transpose it and use the rules for
calculations with the transpose to show that its transpose is the same as
X ′ X itself. We have
(X ′ X)′ = X ′ (X ′ )′ = X ′ X.
The proof for XX ′ works analogously.
Remark. Note that both X ′ X and XX ′ always exist and are square matrices.
If X is of dimension m × n, then X′X is of size n × n and XX′ is of size m × m.
t(A)%*%A
## C1 C2 C3
## C1 335 370 405
## C2 370 410 450
## C3 405 450 495
A%*%t(A)
## R1 R2 R3 R4 R5
## R1 14 32 50 68 86
## R2 32 77 122 167 212
## R3 50 122 194 266 338
## R4 68 167 266 365 464
## R5 86 212 338 464 590
7.5. SYSTEMS OF EQUATIONS IN MATRIX FORM 189
Throughout this section, we will illustrate the use of the matrix form of
equations on the following example:
Example 7.1. A simple economy has three industries: fishing, forestry, and
boatbuilding. Producing
To describe the needed amounts of fish, timber and boats, we can write down
equations in the following way:
There is a final demand of d1 tons for fish. However, this is not the
necessary amount x1 , since, to produce the x2 tons of timber, some fish
will be necessary, too. Therefore, we have x1 = d1 + βx2 .
Finally, while there is no final demand for fishing boats, they will be
needed to produce the necessary amount of x1 tons of fish, such that
we get x3 = αx1 .
x1 − βx2 = d1
x2 − γx3 = d2 (7.2)
−αx1 + x3 = 0.
We may notice that the left hand side of the system actually is the outcome
of the matrix multiplication Ax where
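$$A = \begin{pmatrix} 1 & -\beta & 0 \\ 0 & 1 & -\gamma \\ -\alpha & 0 & 1 \end{pmatrix} \quad \text{and} \quad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$$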
The system (7.2) can in fact be easily solved by plugging in backwards: From
the last equation, we have x3 = αx1 , which leads from the second equation
to x2 = d2 + αγx1 , and that gives, upon plugging this into the first equation,
(1 − αβγ)x1 = d1 + βd2 . Therefore, provided that αβγ ≠ 1, we have
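$$\begin{aligned}
x_1 &= \frac{d_1 + \beta d_2}{1 - \alpha\beta\gamma} \\
x_2 &= d_2 + \alpha\gamma \, \frac{d_1 + \beta d_2}{1 - \alpha\beta\gamma} \\
x_3 &= \alpha \, \frac{d_1 + \beta d_2}{1 - \alpha\beta\gamma}.
\end{aligned} \qquad (7.4)$$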
The method with plugging-in backwards worked well in this particular case
due to the simple structure of the system, in particular due to the fact that
each equation only features two of the three variables. A more versatile
method that works also in situations when the presented method leads to
too complicated solutions is the so-called Gaussian elimination, or its slightly
augmented form, the Gauss-Jordan method. It relies on three basic facts
about equations:
Interchanging the order of two (or several) equations does not change
the solution of the system.
Multiplying one equation by a non-zero scalar does not change the
solution of the equation (and consequently of the system).
Adding a multiple of one equation to another equation (while keeping
all other equations) does not change the solution of the system.
The matrix form allows us to simplify this procedure by only keeping track of
the coefficients next to the variables on the left hand side of the equations,
but not having to write down the variable names themselves. This can save a
lot of time in case of larger systems. Let us write down the process of solving
the above system in matrix notation.
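$$\left(\begin{array}{rr|r} 0.5 & -1 & 3 \\ 2 & 1 & 17 \end{array}\right)
\sim
\left(\begin{array}{rr|r} 1 & -2 & 6 \\ 2 & 1 & 17 \end{array}\right)
\sim
\left(\begin{array}{rr|r} 1 & -2 & 6 \\ 0 & 5 & 5 \end{array}\right)
\sim
\left(\begin{array}{rr|r} 1 & -2 & 6 \\ 0 & 1 & 1 \end{array}\right)
\qquad (7.5)$$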
staircase in which each following row has its first non-zero coefficient to the
right of the previous equation. Then one can find the solution to the system
(if there is any) by plugging in in a backward manner, or one can continue
the elimination process up to a point where every leading coefficient (first
non-zero coefficient of an equation) is 1 and only has zeroes above it, too. In
the latter case, we talk about the Gauss-Jordan elimination. In the case of
the above system, there would be only one step left to finish the elimination
process presented in (7.5): By adding twice the second row to the first, we
eliminate the -2 in the second column of the first row to get
$$\left(\begin{array}{rr|r} 1 & 0 & 8 \\ 0 & 1 & 1 \end{array}\right)$$
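The elementary row operations can also be carried out step by step in R. The sketch below assumes that the augmented matrix from (7.5) is stored in a matrix M (a name chosen here for illustration) and applies the same steps, including the final Gauss-Jordan step:

```r
# Augmented matrix (coefficients | right hand side), values as in (7.5)
M <- matrix(c(0.5, -1,  3,
              2,    1, 17), nrow = 2, byrow = TRUE)
M[1, ] <- 2 * M[1, ]            # multiply the first row by 2
M[2, ] <- M[2, ] - 2 * M[1, ]   # subtract twice the first row from the second
M[2, ] <- M[2, ] / 5            # divide the second row by 5
M[1, ] <- M[1, ] + 2 * M[2, ]   # add twice the second row to the first
M
```

The last column of M now contains the solution, 8 for the first and 1 for the second variable.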
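Solution. With the elementary row operations, we can perform the following
elimination procedure:
$$\left(\begin{array}{rrrr|c} 1 & 0 & 1 & 2 & a \\ 1 & 3 & -1 & 1 & b \\ 1 & 9 & -5 & -1 & c \end{array}\right)
\sim
\left(\begin{array}{rrrr|c} 1 & 0 & 1 & 2 & a \\ 0 & 3 & -2 & -1 & b - a \\ 0 & 9 & -6 & -3 & c - a \end{array}\right)
\sim$$
$$\left(\begin{array}{rrrr|c} 1 & 0 & 1 & 2 & a \\ 0 & 3 & -2 & -1 & b - a \\ 0 & 0 & 0 & 0 & 2a - 3b + c \end{array}\right)
\sim
\left(\begin{array}{rrrr|c} 1 & 0 & 1 & 2 & a \\ 0 & 1 & -\frac{2}{3} & -\frac{1}{3} & \frac{b - a}{3} \\ 0 & 0 & 0 & 0 & 2a - 3b + c \end{array}\right)$$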
In the first step, we have subtracted the first row from the second as well as
the third row to eliminate the 1’s in the first column. In the second step, we
subtracted three times the second row from the third one, to eliminate the
9 in the second column of the third row. Finally, we divided the second row
by 3 to make the leading entry a 1.
From the last row, we see that the system only has a solution if 2a−3b+c = 0.
In this case, x3 and x4 are free variables, since they do not correspond to any
of the two leading entries. We get the solution
$$\begin{aligned}
x_1 &= a - x_3 - 2x_4 \\
x_2 &= \frac{b - a}{3} + \frac{2}{3} x_3 + \frac{1}{3} x_4 \\
x_3, x_4 &\in \mathbb{R}.
\end{aligned}$$
The solution can be checked simply by plugging it into the original system
of equations.
Problem 7.7. Solve the system of equations from Example 7.1 for α = 1/2,
β = 1/4, γ = 2, d1 = 100 and d2 = 80 with the elimination method.
Solution. We start by rewriting the general system in matrix form with the
corresponding parameter values:
$$\left(\begin{array}{rrr|r} 1 & -\frac{1}{4} & 0 & 100 \\ 0 & 1 & -2 & 80 \\ -\frac{1}{2} & 0 & 1 & 0 \end{array}\right).$$
The full elimination process (one version of it) is as follows (try to understand
the individual steps or, even better, perform them yourselves and compare
the final result):
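$$\left(\begin{array}{rrr|r} 1 & -\frac{1}{4} & 0 & 100 \\ 0 & 1 & -2 & 80 \\ -\frac{1}{2} & 0 & 1 & 0 \end{array}\right)
\sim
\left(\begin{array}{rrr|r} 1 & -\frac{1}{4} & 0 & 100 \\ 0 & 1 & -2 & 80 \\ 0 & -\frac{1}{8} & 1 & 50 \end{array}\right)
\sim$$
$$\left(\begin{array}{rrr|r} 1 & -\frac{1}{4} & 0 & 100 \\ 0 & 1 & -2 & 80 \\ 0 & 0 & \frac{3}{4} & 60 \end{array}\right)
\sim
\left(\begin{array}{rrr|r} 1 & -\frac{1}{4} & 0 & 100 \\ 0 & 1 & -2 & 80 \\ 0 & 0 & 1 & 80 \end{array}\right)
\sim$$
$$\left(\begin{array}{rrr|r} 1 & -\frac{1}{4} & 0 & 100 \\ 0 & 1 & 0 & 240 \\ 0 & 0 & 1 & 80 \end{array}\right)
\sim
\left(\begin{array}{rrr|r} 1 & 0 & 0 & 160 \\ 0 & 1 & 0 & 240 \\ 0 & 0 & 1 & 80 \end{array}\right)$$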
which gives the solution x1 = 160, x2 = 240 and x3 = 80. By plugging in the
parameter values to (7.4), we see that this is also in line with the solution
we obtained for the general parameters.
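The same result can be cross-checked in R. The sketch below uses solve with two arguments, the coefficient matrix and the vector of right hand sides; the variable names are chosen here for illustration.

```r
alpha <- 1/2; beta <- 1/4; gamma <- 2
d <- c(100, 80, 0)                      # right hand sides of (7.2)
A <- matrix(c( 1,     -beta,  0,
               0,      1,    -gamma,
              -alpha,  0,     1), nrow = 3, byrow = TRUE)
x <- solve(A, d)                        # solves the system A x = d
x
# x1 according to the general formula (7.4), for comparison
(d[1] + beta * d[2]) / (1 - alpha * beta * gamma)
```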
AX = XA = I (7.6)
Definition 7.6. The matrix X that satisfies (7.6), if it exists, is called the
inverse of A. We denote the inverse by A−1 . If A has an inverse, it is called
invertible or regular ; a matrix that is not invertible is called singular.
Note that for (7.6) to make sense, both A and X must be square matrices.
If A−1 exists, it is uniquely defined (try to prove this). Note also that this
is another exception to the rule AB ≠ BA: If the inverse exists, then
AA−1 = A−1 A.
In the following, we provide some useful rules that can help simplify calculations
with the inverse.
Theorem 7.4. For square invertible matrices A, B of the same size and a
scalar α ≠ 0, the following hold:
$(A^{-1})^{-1} = A$,
$(AB)^{-1} = B^{-1} A^{-1}$,
$(A')^{-1} = (A^{-1})'$,
$(\alpha A)^{-1} = \frac{1}{\alpha} A^{-1}$.
We see immediately that x11 = 0 and x12 = 1. This then implies that x21 = 1
and x22 = −2.
Similar considerations for the second matrix yield that this matrix does not
have an inverse since we get
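$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$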
we get
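$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix} = \begin{pmatrix} a x_{11} + b x_{21} & a x_{12} + b x_{22} \\ c x_{11} + d x_{21} & c x_{12} + d x_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$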
Remark. As it turns out in Problem 7.8, looking for the inverse corresponds
to solving several systems of equations at the same time. For a 2 × 2 matrix,
these are $Ax = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $Ax = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, and this concept can be generalized
for larger matrices. The inverse can therefore actually be found by Gauss-
Jordan elimination with the identity matrix on the right side, i.e. the system
with the matrix form (A|I). If it is possible to achieve the identity matrix
on the left hand, then A is invertible and whatever matrix is on the right at
the end of this process is in fact the inverse of A. This is also why in R, the
function that provides the inverse of a matrix is in fact the same that is used
to solve systems of equations: solve with a single argument, a square matrix,
outputs the inverse of the matrix (if it exists; it gives an error otherwise):
solve(A)
7.6. THE INVERSE AND DETERMINANT 197
A%*%solve(A)
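A self-contained sketch of both situations, an invertible and a singular matrix. The matrices B and S are illustrative choices, not the matrix A used elsewhere in this chapter.

```r
B <- matrix(c(2, 1,
              1, 0), nrow = 2, byrow = TRUE)   # invertible: 2 * 0 - 1 * 1 is non-zero
B.inv <- solve(B)        # inverse of B
B.inv
B %*% B.inv              # identity matrix (up to rounding)
S <- matrix(c(1, 0,
              0, 0), nrow = 2, byrow = TRUE)   # singular matrix
try(solve(S))            # reports an error, since S has no inverse
```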
In Problem 7.8, we found that the condition for the existence of the inverse of a
2 × 2 matrix
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
is that ad − bc ≠ 0. The number ad − bc is a special number that can be
calculated for any square matrix A. It is called the determinant, denoted by
det(A) or |A|, and it helps us to determine whether a matrix is invertible or
not.
|D| = d1 d2 . . . dn .
We will now collect some properties of the determinant and then comment
on them, including on how they can be used to find the determinant of a
general square matrix, though calculating the determinant of a non-diagonal
matrix of size larger than 2 × 2 without R is beyond the scope of this course.
Theorem 7.6. For any n×n matrices A and B and a scalar α, the following
hold:
Note that points 4.-6. of Theorem 7.6 in fact describe what happens to the
determinant of the matrix if elementary row operations are performed. That
is, one can calculate the determinant of any square matrix with the help of
elimination by performing elementary row operations to arrive at a diagonal
matrix and keeping track of the changes to the determinant along the way.
Remark. Point 8. in Theorem 7.6 is in fact implied by point 6.: If A is
multiplied by α, that corresponds to every row being multiplied by α. Since
for each of the n rows, the determinant will be multiplied by α, we get
$|\alpha A| = \alpha^n |A|$.
Remark. Point 7. of Theorem 7.6 implies that the product of two matrices
is invertible if and only if both of the matrices are invertible, too, since for
the product of two numbers to be non-zero, both of them must be non-zero,
and for the product to be 0, at least one of the two numbers must be 0.
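Both remarks can be illustrated numerically with the base R function det. The matrices below are made up for illustration, and the first comparison assumes that point 7 is the product rule |AB| = |A||B| on which the remark above relies.

```r
A <- matrix(c(2, 1, 0,
              1, 3, 1,
              0, 1, 4), nrow = 3, byrow = TRUE)
B <- matrix(c(1, 2, 0,
              0, 1, 2,
              1, 0, 1), nrow = 3, byrow = TRUE)
alpha <- 3
n <- nrow(A)
all.equal(det(A %*% B), det(A) * det(B))      # determinant of a product
all.equal(det(alpha * A), alpha^n * det(A))   # |alpha A| = alpha^n |A|
```

The comparisons use all.equal rather than == because determinants are computed numerically and may differ by rounding errors.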
We will now apply our knowledge about matrix inverses and the general
rules for calculating with matrices to solve equations that feature matrices
not only as coefficients, but also in the role of the unknowns.
Example 7.2. Let us start by a simple system of linear equations that
in matrix form can be written as Ax = b where A is a square matrix of
coefficients, x is the vector of unknown variables and b is the vector of right
hand sides. If A is invertible, then we can multiply both sides of this system
by $A^{-1}$ – but be careful! In matrix multiplication, order matters, thus we
need to specify that in this case, we will be multiplying both sides of the
equation from the left to remove A from the left hand side:
$Ax = b \Leftrightarrow A^{-1}Ax = x = A^{-1}b.$
Note that if A is not invertible, it could mean both that the system does not
have a solution, but also that there are infinitely many solutions.
Problem 7.9. Find a matrix X that satisfies AB + CX = D. What
condition must be satisfied for the found X to exist?
Solution. Just like in equations with a single real valued variable, we start
by moving AB to the right hand side to only have terms with X on the left hand
side:
CX = D − AB.
Now, if $C^{-1}$ exists, we may multiply both sides of the equation by it from
the left, to obtain
$X = C^{-1}(D - AB)$
which clearly only exists if C is invertible.
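A numerical sanity check of this kind of solution; the 2 × 2 matrices below are made up for illustration, with C chosen to be invertible:

```r
A <- matrix(1:4, nrow = 2)
B <- diag(2)
C <- matrix(c(2, 1,
              1, 1), nrow = 2, byrow = TRUE)   # invertible: 2 * 1 - 1 * 1 = 1
D <- matrix(c(5, 0,
              0, 6), nrow = 2, byrow = TRUE)
X <- solve(C) %*% (D - A %*% B)        # multiplication by the inverse from the left
all.equal(A %*% B + C %*% X, D)        # TRUE if X solves AB + CX = D
```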
Problem 7.10. Find a matrix X that satisfies XA+3X = B and formulate
a condition for this solution’s existence.
Solution. We start by noting that 3X = X(3I). This reformulation is
necessary for us to be able to use the distributive law: in its original
form XA + 3X, factoring out X on the left hand side of the equation would
give X(A + 3), and the addition of a matrix and a number is mathematically
not defined. After this step, we may proceed as follows:
XA + X(3I) = B
X(A + 3I) = B
$X = B(A + 3I)^{-1}$
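The derivation can be checked numerically. In the sketch below, A and B are made-up 2 × 2 matrices for which A + 3I is invertible; note that the inverse is multiplied from the right.

```r
A <- matrix(c(1, 2,
              3, 4), nrow = 2, byrow = TRUE)
B <- matrix(c(5, 0,
              0, 6), nrow = 2, byrow = TRUE)
I <- diag(2)                       # identity matrix of matching size
X <- B %*% solve(A + 3 * I)        # requires A + 3I to be invertible
all.equal(X %*% A + 3 * X, B)      # TRUE if X solves XA + 3X = B
```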
$C^{-1} = I + C$
$C^{3} = -I + 2C$
$C^{4} = 2I - 3C$
$C + I = C^{-1}$
$C^{3} + C^{2} = C.$
7.7 Exercises
7.1 Consider a firm that operates in 3 cities. The firm produces 3 tools that
are used in 4 different industries. A manager needs to analyze the data about
the firm for the last 4 years.
Four years ago, the average price of tools that the firm sold in cities {1, 2, 3}
was 3, 8 and 7 (in thousands of EUR), respectively, and the numbers of sales
in the same cities were 16, 10 and 14 (in thousands of units), respectively.
Each year, these prices increased by 2000 EUR and the number of sales
decreased by 1000 units in each city.
Find the matrix P that contains the average prices in each city (row) and
each year (column), the matrix S that contains the sales in each city and
each year, and then use R to find the matrix R that contains the revenue in
each city and each year.
7.2 Consider a firm that produces three types of products from two types of
raw materials. Product 1 requires 5 units of Material 1 and 2 units of Material
2, Product 2 requires 2 units of Material 1 and 4 units of Material 2, and
Product 3 requires 12 units of Material 1. The firm ships these products into
3 countries. The next shipment into Country 1 should consist of 1000, 2500
and 500 units of the products, respectively. The next shipment into Country
2 should contain 2200 units of Product 1 and 3000 units of Product 3. To
Country 3, 5000 units of Product 1, 2500 units of Product 2 and 1800 units of
Product 3 will be shipped. Find the matrix P that contains the production
requirements of the materials for the products and the matrix O that contains
the ordered amounts of the products for the three countries. Then use R to
find the matrix R that contains the amounts of raw materials required to
produce the shipments for the three countries and then the amounts of raw
materials required overall for all three countries combined.
7.4 Solve the following systems of linear equations using Gaussian elimination:
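a) $$\begin{aligned} 2x_1 + 4x_3 &= -6 \\ x_1 + 2x_2 + 2x_3 &= 1 \\ -3x_1 + 2x_2 &= 1 \end{aligned}$$
b) $$\begin{aligned} 3x_1 + 6x_3 - 9x_4 &= 18 \\ x_1 + 2x_2 + 5x_3 &= 9 \\ -x_1 + 3x_2 - 2x_3 + 2x_4 &= -5 \\ -2x_2 - 2x_3 + 7x_4 &= -11 \end{aligned}$$
c) $$\begin{aligned} x_1 + x_2 - 2x_3 + 4x_4 &= 0 \\ 2x_1 + 2x_2 + x_3 + 3x_4 &= 5 \\ -x_1 - x_2 + x_3 - 3x_4 &= -1 \end{aligned}$$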
7.5 Which of the following matrices must be equal to the matrix $(A + B)^2$
(for A and B of appropriate sizes)?
a) $(B + A)^2$
b) $A^2 + 2AB + B^2$
c) $A(A + B) + B(A + B)$
d) $(A + B)(B + A)$
e) $A^2 + AB + BA + B^2$
f) $B(A + B) + (B + A)A$
7.6 Show that for any square matrix A, the matrix $C = \frac{A + A'}{2}$
is symmetric.
7.9 Consider the matrix equation CDX = C + D for the unknown matrix
X.
a) Solve the equation and formulate conditions for the existence of X.
b) Find X for
$$C = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad D = \begin{pmatrix} 5 & 0 \\ 0 & 6 \end{pmatrix}.$$
Check your results in R.
c) Assume that some matrices C and D such that the solution of the above
matrix equation exists have been assigned in R. Moreover, the dimension
n has been assigned to the variable n. Among the following lines of code,
choose all that deliver the correct solution X. (There might be more than
one.)
i) inv(D) + inv(D)%*%C%*%D
ii) inv(D) + inv(D)%*%inv(C)%*%D
iii) solve(D) + D%*%solve(C)%*%solve(D)
iv) solve(D) + solve(D)%*%solve(C)%*%D
v) D^(-1) + D^(-1)*C*D
vi) solve(D) + solve(D)*solve(C)*D
vii) solve(D)%*%(diag(n) + solve(C)%*%D)
viii) inv(D)*C*D + inv(D)
7.10 Solve the following matrix equations where A, B and C are square
matrices of the same size and X the unknown matrix, and provide the
conditions for the existence of the solution X.
a) $AXC^{-1} = A'BA$, where C is invertible. How does the solution
simplify if A is symmetric?
b) $B^{-1}XC = BC$, where B is invertible.
7.11 Calculate:
a)A−1 + BC −1 if
2 2 4 3 4 0
A= , B= , C= ,
3 4 2 1 0 5
b)E ′ D −1 + F if
1 2
1 3 2 4 1
D= , E= , F = 0 −1 ,
2 4 1 3 1
2 −2
7.7. EXERCISES 205
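c) $R' + P^{-1}Q$ if
\[
P = \begin{pmatrix} 4 & 6 \\ 1 & 2 \end{pmatrix}, \quad
Q = \begin{pmatrix} -1 & 1 & 0 \\ 2 & 1 & 3 \end{pmatrix}, \quad
R = \begin{pmatrix} 1 & 1 \\ 3 & 5 \\ 6 & -2 \end{pmatrix}.
\]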
and then calculates $Q'(PR)^{-1}$ and $(R'P')^{-1}Q$. Think first about how to
write it, ideally write it down on paper, and only then use R to check your
answer.
7.13 Assume that a matrix A has been defined in R. What does the following
piece of code do? In particular, what is printed in the last two lines?
m <- nrow(A)
n <- ncol(A)
v1 <- sum(A[1, ])
for(i in 2:m) v1 <- c(v1, sum(A[i, ]))
v2 <- min(A[ , 1])
for(j in 2:n) v2 <- c(v2, min(A[ , j]))
print(v1)
print(v2)
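7.14 Show that a diagonal matrix
\[
D = \begin{pmatrix}
d_1 & 0 & \dots & 0 \\
0 & d_2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & d_n
\end{pmatrix}
\]
is invertible if and only if $d_1, \dots, d_n$ are all non-zero and that in that case,
\[
D^{-1} = \begin{pmatrix}
\frac{1}{d_1} & 0 & \dots & 0 \\
0 & \frac{1}{d_2} & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & \frac{1}{d_n}
\end{pmatrix}.
\]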
a) $|B^{-1}ACB|$
b) $|3A|$
c) $|C'BC|$
7.16 Write a function inverse that takes a 2 × 2 matrix A as input and finds
its inverse, without using the in-built functions for computing the inverse or
for finding the determinant. If the inverse does not exist, it prints "A is not
invertible." You may assume that the provided input is a 2 × 2 matrix,
that is, you do not have to check the dimension of the given matrix A.
Hint: Recall that for a 2 × 2 matrix A, there is an explicit formula to find
$A^{-1}$.
7.18 Below you will find a code with gaps that is supposed to accomplish the
following set of tasks: Create a 3 × 4 matrix from the vector 1:12, filled in
a row-wise manner. Afterwards, interchange the first and second row. Then
multiply the second row of the (new) matrix by 5. Finally, print the sum
of the elements for those columns where this sum over the column is larger
than 30.
a)Fill in the gaps (denoted by _____) in the code using the words/terms/expressions
from the word cloud below (one word per gap) such that the code
accomplishes the tasks given above. Note that some words remain
unused.
Word cloud: M, G, for, if, max, min, 30, 100, TRUE, FALSE, nrow, ncol,
sum, mean, length, 1, 2, 3, 4, i, n, SC, paste, c(2, 1), c(1, 2)
b)What is the output of the code? (Provide the exact output, not a
description. Try to work it out without using R first, then check your
answer in R.)
7.19 For matrices A and B, what do the following functions do? (Do not
explain the code line by line. Instead, your task is to recognize what the
outcome of each function is in terms of the input.)
Section 13.4 about the basic rules for determinants, including exercises
1, 2 and 4;
\[
\begin{aligned}
f(0, 1) &= 2 \cdot 0 + 0^2 \cdot 1^3 = 0, \\
f(-1, 0) &= 2 \cdot (-1) + (-1)^2 \cdot 0^3 = -2, \\
f(a + 1, b) &= 2(a + 1) + (a + 1)^2 b^3.
\end{aligned}
\]
Example 8.2. Let the milk consumption f (p, m) be a function of the relative
price of milk p and income per family m:
210 CHAPTER 8. FUNCTIONS OF SEVERAL VARIABLES
As with many tasks in R, there are several ways to plot functions of two
variables. Here, we introduce the package emdbook and its function curve3d,
which can create different types of plots, both 3- and 2-dimensional.
emdbook is not part of the basic R installation, so install it first if you
have not done so yet, and don’t forget to load it.
To introduce the plotting function, we will use the function from Example
8.2. Upon defining it, we use curve3d to create a first plot of the function.
As the name of the function suggests, it is designed specifically to create 3D
plots, and thus for functions of 2 variables. For such functions, the graph (the
collection of all combinations of function arguments and the corresponding
function value) is a 3-dimensional object – 2 dimensions correspond to the
function arguments, the third one to its value.
The function curve3d requires the function f to be plotted as its first argument.
We can further specify the intervals from which the values of the arguments
come. To this end, we provide a vector of smallest values in the argument
from and a vector of greatest values in the argument to (the default values are
c(0, 0) for from and c(1, 1) for to). If we wish to name the variables
differently than the generic x and y, we can do so with the help of the
argument varnames. Alternatively, one can change the axis labels with the
help of the arguments xlab, ylab and zlab. Finally, sys3d controls which 3D
plotting system is used. The value "wireframe" creates an image with
the view from the point from (also note the wire look of the function surface).
There are several other plotting systems, and we will introduce some of them
one after the other.
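A first plot along these lines might look as follows. This is only a minimal sketch: the function is the milk consumption function f(p, m) = p^(-1.5) m^2.08 used throughout this chapter, while the plotting range [0.5, 1] for both variables is an assumption read off the axis ticks of the figures in this section. The call is guarded so that the snippet also runs when emdbook is not installed.

```r
# Milk consumption as a function of the relative price p and income m
f <- function(p, m) p^(-1.5) * m^2.08

# Plot only if emdbook is available (otherwise run install.packages("emdbook") first)
if (requireNamespace("emdbook", quietly = TRUE)) {
  library(emdbook)
  curve3d(f,
          from = c(0.5, 0.5), to = c(1, 1),  # assumed ranges for p and m
          varnames = c("p", "m"),            # names used instead of x and y
          zlab = "f(p,m)",
          sys3d = "wireframe")
}
```

Because varnames is set to c("p", "m"), the first entries of from and to refer to p and the second entries to m.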
8.1. PLOTTING FUNCTIONS OF TWO VARIABLES IN R 211
[Figure: wireframe plot of f(p, m) with axes p and m]
As already mentioned above, the point from which the function is viewed
when using sys3d = "wireframe" is from. However, the plot gives no information
about what this point is, what intervals are used for the two variables, or what
values the function takes. While we might get an idea about the general form
of the function, the information about the scale is important and usually one
would like to include at least a part of it in a figure. Unfortunately, it is not
possible to add scales when using sys3d = "wireframe". However, the good
news is that the default setting sys3d = "persp" allows for adding axis ticks.
This can be done with the help of another argument, ticktype. The default
value of this argument is "simple" and it results in arrows indicating the
direction of increase, as with sys3d = "wireframe". With ticktype =
"detailed", we get axis ticks with variable values.
[Figure: perspective plot of f(p, m) with detailed axis ticks; p and m range from 0.5 to 1.0]
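A perspective plot with detailed ticks could be produced along the following lines. This is a sketch under the same assumptions as before (the ranges for p and m are assumed); ticktype is passed on to the underlying perspective plotting routine.

```r
f <- function(p, m) p^(-1.5) * m^2.08

if (requireNamespace("emdbook", quietly = TRUE)) {
  library(emdbook)
  curve3d(f,
          from = c(0.5, 0.5), to = c(1, 1),  # assumed ranges for p and m
          varnames = c("p", "m"),
          zlab = "f(p,m)",
          sys3d = "persp",                   # the default plotting system
          ticktype = "detailed")             # axis ticks with variable values
}
```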
While 3D plots are great for getting a general idea about the behaviour of the
function, they are usually not easy to read precisely. For this, it is more common
to use contour plots. With curve3d, these can be created by setting sys3d =
"contour". In a contour plot, the graph of the function in 3-dimensional
space is cut by various horizontal planes parallel to the xy-plane. The
lines in a contour plot are the so-called level curves. Each curve corresponds
to a certain value (level) k of the function, and collects all points (x, y)
(pairs of variable values) for which f(x, y) = k. You may have
encountered level curves if you ever used a hiking map: They allow hikers to
get a feeling for the steepness of the trail; the closer the level curves are to
each other, the steeper the trail. Note that while we only see level curves for
a few possible function values, the function takes on other values in between.
However, we would not aim to plot all the level curves, as that would result
in a black mess from which no information could be read.
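To make the notion of a level curve concrete, the following sketch solves f(p, m) = k for m for the milk consumption function and checks that the resulting points indeed share the function value k. The helper level_m and the grid of p values are illustrative assumptions; the guarded call at the end draws the contour plot if emdbook is installed.

```r
f <- function(p, m) p^(-1.5) * m^2.08

# Solving p^(-1.5) * m^2.08 = k for m gives m = (k * p^1.5)^(1 / 2.08)
level_m <- function(p, k) (k * p^1.5)^(1 / 2.08)

p_grid <- seq(0.5, 1, by = 0.1)
# All points (p, level_m(p, 1)) lie on the level curve for k = 1
f(p_grid, level_m(p_grid, k = 1))

if (requireNamespace("emdbook", quietly = TRUE)) {
  library(emdbook)
  curve3d(f, from = c(0.5, 0.5), to = c(1, 1),  # assumed ranges
          varnames = c("p", "m"), sys3d = "contour")
}
```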
[Figure: contour plot of f(p, m) with labelled level curves; vertical axis m]
In the contour plot above, one can nicely see that (in the plotted area), f
is decreasing in p and increasing in m: Moving along the axis corresponding
to p from left to right, the function value decreases, while with increasing
values of m (moving along the axis from bottom to top), the function value
increases. This may help us determine the sign of partial derivatives (which
will be introduced below) at a certain point.
[Figure: plot with vertical axis m ranging from 0.5 to 1.0]
As with functions of a single variable, in the case of several variables
we are often interested in how functions react to changes in the underlying
variables. To this end, we study the partial derivatives of the function
with respect to its individual variables. We present the definition of partial
derivatives for two variables only, but the notion extends to n variables
(n > 2).
\[
f_x = f'_x = f'_1 = \frac{\partial f}{\partial x} := \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h}
\]
8.2. PARTIAL DERIVATIVES 215
and
\[
f_y = f'_y = f'_2 = \frac{\partial f}{\partial y} := \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h}.
\]
The vector of the partial derivatives
Definition 8.1 suggests that to find the partial derivative of f with respect
to one of its variables, all other variables are considered constants. If the
function contains e.g. a product of x and y, y plays the role of a multiplicative
constant when looking for fx′ .
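The limit in the definition can be approximated numerically by a difference quotient with a small h. The sketch below does this for f(x, y) = 2x + x^2 y^3 (the form suggested by the function evaluations at the start of the chapter, so treat it as an assumption) and compares the result with the derivatives obtained by treating the other variable as a constant. The helper names and the step size h are illustrative.

```r
f <- function(x, y) 2 * x + x^2 * y^3

# Difference quotients mirroring the definition of the partial derivatives
num_fx <- function(f, x, y, h = 1e-6) (f(x + h, y) - f(x, y)) / h
num_fy <- function(f, x, y, h = 1e-6) (f(x, y + h) - f(x, y)) / h

# Analytical partial derivatives: the other variable is treated as a constant
fx <- function(x, y) 2 + 2 * x * y^3
fy <- function(x, y) 3 * x^2 * y^2

c(numerical = num_fx(f, 1, 2), analytical = fx(1, 2))
c(numerical = num_fy(f, 1, 2), analytical = fy(1, 2))
```

The two values in each pair agree up to a small error that shrinks with h, as long as h is not chosen so small that rounding errors dominate.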
kept constant) and the partial derivative with respect to y describes the
approximate change in the function value in reaction to a 1 unit increase in
y. The following interpretations of the derivative signs are therefore quite
intuitive:
If $f_x(x, y) \geq (>)\, 0$ for all $(x, y) \in B$ for some convex set $B \subseteq \mathbb{R} \times \mathbb{R}$,
f is increasing (strictly increasing) in x on the set B.
If $f_x(x, y) \leq (<)\, 0$ for all $(x, y) \in B$ for some convex set $B \subseteq \mathbb{R} \times \mathbb{R}$,
f is decreasing (strictly decreasing) in x on the set B.
Similar interpretations hold also for y (and any other variable of f , if it has
more than two).
Remark. Note that in the third and fourth point, we mention f being
increasing at a point. f is considered increasing in x at a point (x0 , y0 )
if there is an open interval I, x0 ∈ I such that f is increasing on I × {y0 }.
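Problem 8.1. Find the partial derivatives of the function from Example
8.2, $f(p, m) = p^{-1.5} m^{2.08}$. Interpret them.
Solution. Treating the respective other variable as a constant, we get
\[
f_p = -1.5\, p^{-2.5} m^{2.08}, \qquad f_m = 2.08\, p^{-1.5} m^{1.08}.
\]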
Since p and m are positive, we have fp < 0 and fm > 0 for any values of
p and m from the function domain. That means that f is decreasing in p
and increasing in m over the function’s domain. This is in line with what we
observed in the contour plot.
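The signs can also be checked in R with the base function D, which differentiates simple expressions symbolically. The following is a sketch; the grid of (p, m) values is an arbitrary illustrative choice from the positive domain.

```r
f_expr <- expression(p^(-1.5) * m^2.08)

fp <- D(f_expr, "p")  # partial derivative with respect to p
fm <- D(f_expr, "m")  # partial derivative with respect to m

# Evaluate both derivatives on a grid of positive values of p and m
grid <- expand.grid(p = seq(0.5, 1, by = 0.1), m = seq(0.5, 1, by = 0.1))
fp_vals <- eval(fp, grid)
fm_vals <- eval(fm, grid)

all(fp_vals < 0)  # f is decreasing in p on the grid
all(fm_vals > 0)  # f is increasing in m on the grid
```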
Example 8.3. Let D(p, q) and E(p, q) be the demands for two commodities
when the prices per unit are p and q, respectively. Suppose the commodities
are substitutes in consumption, such as butter and margarine. Then the
normal signs of the first order derivatives are as follows:
Recall that the derivative gives the approximate absolute change of the
function value in reaction to a unit change in a variable, whereas elasticity
quantifies the approximate relative (percentage) change in reaction to a
relative (one percent) change in the variable. Knowing about partial derivatives,
the definition of elasticity for functions of several variables is a
straightforward generalization of the elasticity definition for single variable functions.
Definition 8.2. Let $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be a function of two variables $x_1, x_2$.
Then the elasticity of f with respect to the variable $x_i$ is
\[
\mathrm{El}_{x_i} f(x_1, x_2) = \frac{x_i}{f(x_1, x_2)}\, f_{x_i}(x_1, x_2).
\]
Remark. If $D_1(p_1, p_2)$ and $D_2(p_1, p_2)$ are two demand functions, both being
functions of the prices of two commodities (like D and E in Example 8.3),
then $\mathrm{El}_{p_2} D_1(p_1, p_2)$ and $\mathrm{El}_{p_1} D_2(p_1, p_2)$ are called cross demand elasticities.
Remark. Note that Definition 8.2 can easily be extended to functions of n >
2 variables. Just as such a function has n partial derivatives,
it has n elasticities, one with respect to each individual
variable.
Problem 8.2. The demand for money in the United States for the period
1929 to 1952 was estimated as
\[
M = 0.14Y + 76.03(r - 2)^{-0.84}, \quad (r > 2),
\]
where Y is the annual national income, and r is the annual interest rate
measured in percentages. Plot the function in R. Find the partial derivatives
$M_Y$ and $M_r$ and discuss their signs. Find the income and interest elasticities
of the demand for money.
Solution. For the derivative of M with respect to Y , note that the second
part of the function after the plus sign does not depend on Y in any way,
such that it will be considered an additive constant and therefore disappear
in the differentiation process. Similarly, the part of the function before the
plus sign is just an additive constant when differentiating with respect to r.
We therefore get
\[
M_Y = 0.14, \tag{8.1}
\]
\[
M_r = -63.8652\,(r - 2)^{-1.84}. \tag{8.2}
\]
Since MY > 0, the demand for money increases with national income. People
are more willing to take on credits and mortgages if they earn more and are
therefore confident to be able to repay them. On the other hand, Mr < 0
(for r > 2, which is assumed), such that the demand for money decreases
with increasing interest rates.
We proceed to find the elasticities. We use the partial derivatives we found
in (8.1) and (8.2) to get
\[
\mathrm{El}_Y M(Y, r) = \frac{0.14Y}{0.14Y + 76.03(r - 2)^{-0.84}},
\]
\[
\mathrm{El}_r M(Y, r) = \frac{-63.8652\, r\,(r - 2)^{-1.84}}{0.14Y + 76.03(r - 2)^{-0.84}}.
\]
If for instance the current levels are Y = 10000 and r = 2.5, we get
$\mathrm{El}_Y M(10000, 2.5) \approx 0.9114$, which means that a 1% increase in income leads
to an increase in the demand for money of about 0.91%. On the other hand,
$\mathrm{El}_r M(10000, 2.5) \approx -0.3721$, meaning that the demand for money reacts to
a 1% increase in the interest rate by a decrease of about 0.37%.
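The elasticities at given levels of Y and r can be evaluated in R directly from the definition, that is, variable times partial derivative divided by function value. The sketch below uses the demand function of Problem 8.2 and the partial derivatives (8.1) and (8.2); the function names are illustrative.

```r
M  <- function(Y, r) 0.14 * Y + 76.03 * (r - 2)^(-0.84)
MY <- function(Y, r) 0.14                             # (8.1)
Mr <- function(Y, r) -63.8652 * (r - 2)^(-1.84)       # (8.2)

# Elasticity = variable * partial derivative / function value
el_Y <- function(Y, r) Y * MY(Y, r) / M(Y, r)
el_r <- function(Y, r) r * Mr(Y, r) / M(Y, r)

el_Y(10000, 2.5)  # income elasticity at the current levels
el_r(10000, 2.5)  # interest elasticity at the current levels
```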
Finally, let us plot the function in R, for which we of course first need to define
it. To get an idea about the behaviour of the function in both variables,
we use the perspective from the starting point, i.e. sys3d = "wireframe".
However, you may use other options, too, to familiarize yourself with the
function.
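A possible way to define and plot the function is sketched below. The plotting ranges for Y and r are assumptions chosen only for illustration (r has to stay above 2), and the call is guarded so that the snippet also runs without emdbook.

```r
# Demand for money as a function of income Y and interest rate r (r > 2)
M <- function(Y, r) 0.14 * Y + 76.03 * (r - 2)^(-0.84)

if (requireNamespace("emdbook", quietly = TRUE)) {
  library(emdbook)
  curve3d(M,
          from = c(5000, 2.1), to = c(15000, 6),  # assumed ranges for Y and r
          varnames = c("Y", "r"),
          zlab = "M(Y,r)",
          sys3d = "wireframe")
}
```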
[Figure: wireframe plot of M(Y, r) with axes Y and r]
Problem 8.3. Recall the function $f(p, m) = p^{-1.5} m^{2.08}$ from Example 8.2.
Find the price and income elasticities of milk consumption.
Solution. We use the partial derivatives found in Problem 8.1 to find the
elasticities:
\[
\mathrm{El}_p f(p, m) = \frac{p}{p^{-1.5} m^{2.08}} \cdot \left(-1.5\, p^{-2.5} m^{2.08}\right) = -1.5,
\]
\[
\mathrm{El}_m f(p, m) = \frac{m}{p^{-1.5} m^{2.08}} \cdot 2.08\, p^{-1.5} m^{1.08} = 2.08.
\]
Remark. Note that the elasticities in both cases are constant and equal to
the corresponding exponents of p and m. This is a special property of
Cobb-Douglas functions: Their elasticities are always constant and equal to
the exponents of the variables.
f (x, y) = k (8.3)
for a constant k and a function f of two variables x, y (in general, f may also
be a function of more than two variables). If one would like to change the
value of x while keeping the function value f (x, y) constant at k, y would have
to be adapted as well. Note that f (x, y) = k corresponds to one level curve
in the contour plot of f . If you consider point (x0 , y0 ) on this level curve (i.e.
f (x0 , y0 ) = k) and move away from this point along the x axis, thus moving
away from x0 , you would typically also need to move away from y0 to stay at
the same level curve. In that sense, y can be seen as a function of x, and since
it is not given directly as a function, which would be an explicit definition,
but rather through an equality, we say that there is an implicit relation or
that it is given implicitly. That is, we have some function y = y(x) that we
usually can’t find in an explicit form. Similarly, if one moves away from the
current value of y, x typically also has to be adjusted, such that x is also a
function of y, x = x(y). In the following, we will consider y in dependence
of x, but the described concepts work the same way if we exchange the roles
of the variables.
Example 8.4. An example of the situation described above is the production
of a certain product. Assume that the production output of a company when
using x and y units of different inputs (e.g. materials) is given as f (x, y) and
the company currently produces k units of the product. If the price of the
first material increases and the company would therefore like to decrease the
amount of the material used, while at the same time keeping the production
output at the same level, it will also need to adjust the amount y of the
second input used in the production.
use of chain rule, and very often the product rule will be needed, too. After
differentiating (8.3), we get the equation
\[
\frac{d f(x, y(x))}{dx} = 0
\]
which we solve for $y'(x)$. In fact, the chain rule lets us write
\[
\frac{d f(x, y(x))}{dx} = \frac{\partial f}{\partial x}(x, y) + \frac{\partial f}{\partial y}(x, y)\, y'(x)
\]
which, upon setting equal to 0 and solving for $y'(x)$, leads to
\[
y'(x) = -\frac{f_x(x, y)}{f_y(x, y)}. \tag{8.4}
\]
\[
y(x) + x y'(x) = 0
\]
\[
y'(x) = -\frac{y(x)}{x}. \tag{8.5}
\]
Note that in the differentiation, we used the product rule for xy(x), setting
f (x) = x and g(x) = y(x).
To find out by how much y has to change for each unit change in x at the
current levels, we just plug in the current levels of x and y into the expression
for $y'$: $y' = -\frac{2}{2.5} = -0.8$.
In this particular case, it is not difficult to reformulate the equality to get
$y = \frac{5}{x}$ (note that if x = 0, $xy \neq 5$, such that 0 not being in the domain of $\frac{5}{x}$ is
not an issue). Differentiating this function gives $y'(x) = -\frac{5}{x^2}$. On the other
hand, if we plug $y(x) = \frac{5}{x}$ into the expression (8.5) we got by implicit
differentiation, we get
\[
y'(x) = -\frac{y(x)}{x} = -\frac{5}{x^2}
\]
which provides us with an easy way to confirm that the implicit differentiation
worked correctly.
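The same comparison can be made numerically in R. The sketch below takes the explicit form y = 5/x, approximates its derivative by a central difference quotient and compares it with the value −y/x from the implicit differentiation. The evaluation point x = 2.5 (so that y = 2) and the step size h are assumptions made for illustration.

```r
y <- function(x) 5 / x   # explicit form of the relation xy = 5

x0 <- 2.5                # assumed current level of x, so that y(x0) = 2
h  <- 1e-6

# Central difference approximation of y'(x0)
num_deriv <- (y(x0 + h) - y(x0 - h)) / (2 * h)

# Value obtained from implicit differentiation: y' = -y / x
implicit_deriv <- -y(x0) / x0

c(numerical = num_deriv, implicit = implicit_deriv)
```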
Problem 8.5. Find an expression for $y'$ when it holds that $y^3 + 3x^2 y = 13$.
Solution. We notice that unlike in the previous problem, it is not easy to
describe the relation between x and y explicitly, that is, we don’t know the
explicit form of the function y = y(x). We therefore have to resort to implicit
differentiation. We have
\[
\begin{aligned}
y^3(x) + 3x^2 y(x) &= 13 \\
3y^2(x)\, y'(x) + 6x\, y(x) + 3x^2 y'(x) &= 0 \\
y'(x)\left(3y^2(x) + 3x^2\right) &= -6x\, y(x) \\
y'(x) &= -\frac{6x\, y(x)}{3y^2(x) + 3x^2}.
\end{aligned}
\]
Note that to differentiate $y^3(x)$, we used the chain rule: We have $y^3(x) =
f(g(x))$ with $f(x) = x^3$ and $g(x) = y(x)$. To differentiate this composite
function, we differentiate the outer function and plug in the inner function
($f'(g(x)) = 3y^2(x)$) and then multiply this by the derivative of the inner
function ($g'(x) = y'(x)$).
Alternatively, we can use formula (8.4) to find $y'$. For $f(x, y) = y^3 + 3x^2 y$,
we have $f_x(x, y) = 6xy$ and $f_y(x, y) = 3y^2 + 3x^2$, which gives
\[
y'(x) = -\frac{f_x(x, y)}{f_y(x, y)} = -\frac{6xy}{3y^2 + 3x^2}.
\]
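Even without an explicit form of y(x), the result can be checked numerically: for a given x, the value y(x) is found by solving the equation with uniroot, and the derivative is approximated by a difference quotient. This is a sketch under assumptions; the point x = 2 (where y = 1 solves the equation), the search interval and the step size are chosen for illustration.

```r
# Solve y^3 + 3 x^2 y = 13 for y at a given x (the left side is increasing in y)
y_of_x <- function(x) {
  uniroot(function(y) y^3 + 3 * x^2 * y - 13,
          interval = c(0, 3), tol = 1e-12)$root
}

x0 <- 2
y0 <- y_of_x(x0)   # close to 1, since 1 + 12 = 13

# Central difference approximation of y'(x0)
h <- 1e-4
num_deriv <- (y_of_x(x0 + h) - y_of_x(x0 - h)) / (2 * h)

# Value from implicit differentiation
formula_deriv <- -6 * x0 * y0 / (3 * y0^2 + 3 * x0^2)

c(numerical = num_deriv, formula = formula_deriv)
```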
Remark. We use the notation y(x) to remind you that there is a relation
between x and y, that is, y is in fact considered a function of x. However,
this is not necessary and after you have had some practice with implicit
differentiation, you may drop the function argument and write y instead of
y(x). For instance the solution of the above problem would then look as
follows:
\[
\begin{aligned}
y^3 + 3x^2 y &= 13 \\
3y^2 y' + 6xy + 3x^2 y' &= 0 \\
y'\left(3y^2 + 3x^2\right) &= -6xy \\
y' &= -\frac{6xy}{3y^2 + 3x^2}.
\end{aligned}
\]
8.4 Higher order partial derivatives
For a function f(x, y), $f_x$ and $f_y$ are called first order partial derivatives.
These functions are, in general, again functions of two variables, and each
of them therefore again possesses two first order partial derivatives.
With respect to the original function, these are second order partial
derivatives because we obtain them by differentiating the function twice. A
function of two variables therefore has four second order partial derivatives
(and a function of n variables has $n^2$ second order partial derivatives).
Definition 8.3. The second order partial derivatives of a function f : R ×
R → R are the partial derivatives of the first order partial derivatives.
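Direct partial derivatives:
\[
f_{xx} = f''_{xx} = f''_{11} = \frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right), \qquad
f_{yy} = f''_{yy} = f''_{22} = \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) \tag{8.6}
\]
Cross partial derivatives:
\[
f_{xy} = f''_{xy} = f''_{12} = \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right), \qquad
f_{yx} = f''_{yx} = f''_{21} = \frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) \tag{8.7}
\]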
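Second order partial derivatives are collected in the Hesse matrix or Hessian:
\[
H_f(x, y) = \begin{pmatrix}
\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\
\frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2}
\end{pmatrix}.
\]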
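Remark. For a function of n variables, the Hessian has $n^2$ entries, with each
row corresponding to the variable of the first derivative, and each column to
the variable with respect to which the second derivative was taken:
\[
H_f(x) = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \dots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \dots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}.
\]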
Problem 8.6. Consider again the function $f(p, m) = p^{-1.5} m^{2.08}$. Find all of
its second order partial derivatives and the Hessian.
Solution. We already found the first order partial derivatives in Problem
8.1. We therefore differentiate each of them with respect to p and m to
obtain
\[
\begin{aligned}
f_{pp} &= 3.75\, p^{-3.5} m^{2.08} \\
f_{mm} &= 2.2464\, p^{-1.5} m^{0.08} \\
f_{pm} &= -3.12\, p^{-2.5} m^{1.08} \\
f_{mp} &= -3.12\, p^{-2.5} m^{1.08}.
\end{aligned}
\]
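These derivatives can be double-checked in R by applying the base function D twice. The following sketch evaluates all four second order partial derivatives at the point (p, m) = (1, 1), where they reduce to their numerical coefficients; the helper at() and the choice of the point are illustrative.

```r
f_expr <- expression(p^(-1.5) * m^2.08)

# Second order partial derivatives: differentiate twice
fpp <- D(D(f_expr, "p"), "p")
fmm <- D(D(f_expr, "m"), "m")
fpm <- D(D(f_expr, "p"), "m")
fmp <- D(D(f_expr, "m"), "p")

# Evaluate a derivative expression at the point (p, m) = (1, 1)
at <- function(e) eval(e, list(p = 1, m = 1))

second_derivs <- c(fpp = at(fpp), fmm = at(fmm), fpm = at(fpm), fmp = at(fmp))
second_derivs
```

At this point the two cross derivatives coincide, in line with the solution above.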
Like in the case of single variable functions, the second order derivatives can
be used to study whether the function is convex or concave. Let (a, b) be a
point in the domain of a twice differentiable function f : R2 ⊇ A → R, and
B be a convex subset of A.
The direct derivatives can be used to determine the behaviour of the function
with respect to one variable at a time (while keeping the other constant):
For f, we have $f_{xx} = f_{yy} = 2 > 0$, such that the function is convex in both
x and y. Moreover, we have $f_{xy} = 0$, thus $f_{xx} f_{yy} > f_{xy}^2$; the function is
therefore also convex overall. On the other hand, for g we have $g_{xx} = -2$,
$g_{yy} = 6y$ and $g_{xy} = 0$. Since $g_{xx} < 0$, g is concave in x. $g_{yy}$ is positive for
y > 0, thus g is convex in y for $y \in \mathbb{R}_+$. In this case, $g_{xx} g_{yy} < g_{xy}^2$; g is
therefore neither convex nor concave on $\mathbb{R} \times \mathbb{R}_+$. For $y \in \mathbb{R}_-$, $g_{yy} < 0$, which
makes g concave in y. In this case, we have $g_{xx} g_{yy} > g_{xy}^2$, which means that
g is also concave overall.
Let us now plot these functions to observe their shape.
[Figure: wireframe plot of f(x, y) with axes x and y]
We observe that f forms a "bowl" and for any two points on the graph of
the function, if we connect them with a straight line, this line does not cross
below the graph of the function. For g, such a line connecting two points
will always stay below the graph for y < 0 (see the plot of g for x, y ∈ [0, 1]);
however, this is generally not the case if y can be positive.
[Figures: three wireframe plots of g(x, y) with axes x and y]
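Problem 8.7. Consider again the function $f(p, m) = p^{-1.5} m^{2.08}$. Decide
whether the function is convex/concave in p, m and (p, m).
Solution. The second order partial derivatives found in Problem 8.6 give the Hessian
\[
H_f(p, m) = \begin{pmatrix}
3.75\, p^{-3.5} m^{2.08} & -3.12\, p^{-2.5} m^{1.08} \\
-3.12\, p^{-2.5} m^{1.08} & 2.2464\, p^{-1.5} m^{0.08}
\end{pmatrix}.
\]
Note that both $f_{pp}$ and $f_{mm}$ are positive for all p, m > 0 (domain of f).
Therefore f is strictly convex in p and strictly convex in m. However,
$\det(H_f(p, m)) = (8.424 - 9.7344)\, p^{-5} m^{2.16} < 0$, such that f is neither convex nor concave
in (p, m).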
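The sign of the determinant can also be inspected numerically. The sketch below builds the Hessian from the second order partial derivatives of Problem 8.6 and evaluates its determinant on a grid of positive values; the function name hess_f and the grid are illustrative assumptions.

```r
# Hessian of f(p, m) = p^(-1.5) * m^2.08, built from the derivatives in Problem 8.6
hess_f <- function(p, m) {
  matrix(c( 3.75 * p^(-3.5) * m^2.08, -3.12   * p^(-2.5) * m^1.08,
           -3.12 * p^(-2.5) * m^1.08,  2.2464 * p^(-1.5) * m^0.08),
         nrow = 2, byrow = TRUE)
}

# Determinant of the Hessian on a grid of positive (p, m) values
grid <- expand.grid(p = seq(0.5, 2, by = 0.5), m = seq(0.5, 2, by = 0.5))
dets <- mapply(function(p, m) det(hess_f(p, m)), grid$p, grid$m)

all(dets < 0)      # negative everywhere on the grid
det(hess_f(1, 1))  # equals 8.424 - 9.7344 at (1, 1)
```

A grid check like this is of course no proof; it only illustrates the analytical result.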
8.5 Exercises
8.1 Find the first order partial derivatives with respect to x and y of the
following functions:
q
a)f (x, y) = xy
3
c)h(x, y) = xy
2 +5xy
b)g(x, y) = ex d)i(x, y) = x3 ln(4xy)
For each of the functions above, plot the contour plot for x, y ∈ (0, 1). Decide
about the sign of the first order partial derivatives from the analytical form
(the derivatives you found) and confirm it visually from the contour plot.
8.2 For the functions, decide whether they are increasing or decreasing in x
and y on R × R and whether they are convex or concave in x, y and (x, y)
on R × R.
a) $f(x, y) = x^2 + 4xy + 8y^2$
b) $g(x, y) = -x^2 + 2xy - y^2$
c) $h(x, y) = -e^{x_1} - e^{x_1 + x_2}$
8.3 Consider the function $f(x, y) = 5x^{0.5} y^{0.2}$ for x, y > 0. Find its Hessian
and decide whether the Hessian is positive/negative definite (or neither/remains
undetermined) for any x, y > 0. Conclude about the convexity/concavity of
the function.
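8.4 For the functions
\[
f : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}, \quad f(x, y) = \ln\left((x + y)^2\right) + \frac{\sqrt{x}}{3y} \quad \text{and}
\]
\[
g : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}, \quad g(x, y) = e^{x^2 - y} - \sqrt{\frac{x}{y}},
\]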
find all their first and second order partial derivatives. Then consider the
point (x0 , y0 ) = (1, 4) and decide whether the functions are:
a)increasing or decreasing with respect to x at (x0 , y0 ),
b)increasing or decreasing with respect to y at (x0 , y0 ),
c)convex or concave with respect to x on B = (x0 − ϵ, x0 + ϵ) × {y0 } for
some small ϵ > 0,
d)convex or concave with respect to y on B = {x0 } × (y0 − ϵ, y0 + ϵ) for
some small ϵ > 0.
8.5 Below you will find the contour plots of functions f , g, h and i of two
variables, with two points highlighted in each plot. For each of the plots
answer the following questions:
a)Is the value of the function equal at the two points A and B? If not, at
which point is the value greater?
b)Does the function attain both positive and negative values in the depicted
area?
c)What is the sign of the partial derivatives of the function with respect
to x and to y at the point A?
[Figures: contour plots of the functions f, g, h and i (axes x and y), each with two highlighted points A and B]
8.6 a)Assume that the demand for a certain product depends on its price
p1 and the price of competitor’s product p2 , with the demand function
given as f (p1 , p2 ) = p10
1 p2
. Find both price elasticities of f .
b)Assume that the demand for a certain product depends on its price p1 ,
the price of a competitor’s product p2 and the average income level m,
with the demand function given as g(p1 , p2 , m) = exp(−0.5p1 + 0.7p2 +
1.1m). Find both price elasticities and income elasticity of demand.
where Q represents output and K and L denote capital and labor inputs,
respectively. The current capital and labor levels are 100 units and 20 units,
respectively.
a)Find the current level of production.
b)Find and interpret the current marginal productivity of labor for
the given production function (that is, compute $\frac{\partial Q}{\partial L}$).
c)Compute and interpret the elasticity of production with respect to
capital at the given input levels.
d)Due to a recent government wage increase that has elevated the cost
of labor, the firm decided to reduce the labor input. Determine how
the firm should adjust its capital level to maintain its initial production
output if they decrease labor by h units. (Hint: Find K ′ .)
\[
Q(x, y) = x^3 - xy^2 + 3y^3 + x^2 y
\]
where Q represents output and x and y denote inputs. The current input
levels are x = 10 units and y = 20 units, respectively.
a)Find the current level of production.
b)Find and interpret the current marginal productivity of the input x for
the given production function.
c)Compute and interpret the elasticity of production with respect to input
y at the given input levels.
d)Due to a decrease in the availability of input x in the supply market, the
firm decided to reduce the amount of input x used in the production.
Determine how the firm should adjust its usage of input y to maintain
its initial production level if they decrease the usage of input x by h
units.
8.10 Anne is a student who loves chocolate. Her happiness from eating x
grams of dark chocolate and y grams of milk chocolate per week can be
described by the function
x 2 y 2 xy
H(x, y) = + + ln .
100 100 10000
Currently, she eats 200 grams of dark chocolate and 50 grams of milk chocolate
per week.
a)What is Anne’s current happiness level from eating chocolate?
b)Anne realized that dark chocolate has become more expensive and
considers eating less dark and more milk chocolate to lower her chocolate
expenses. If she aims to maintain her current level of happiness from
eating chocolate, how much more milk chocolate does she have to eat if
she lowers her dark chocolate consumption by h grams per week?
c)If we denote by pd and pm the current prices of dark and milk chocolate,
respectively, what is the difference in her weekly chocolate expenses if
she lowers her dark chocolate consumption by h grams and increases
her milk chocolate consumption such that her chocolate happiness level
remains the same? What does the relationship between pd and pm have
to be such that she can save money by eating less dark and more milk
chocolate?
8.11 For the production of a certain powder, a firm uses two ingredients.
The amount (in kilograms) of the powder that can be produced using x kg
of the first ingredient and y kg of the second ingredient can be described by
the production function
\[
P(x, y) = \ln\left((x + y)^2\right) + 5xy.
\]
Currently, they are using x = 0.2 kg of the first ingredient and y = 0.8 kg of
the second ingredient.
Optimization
In this chapter, we introduce the notions of local and global minima and
maxima and some conditions for these points in case of differentiable functions.
We first focus on single-variable functions and then shift our focus to functions
of several variables, discussing both the cases of unconstrained and constrained
optimization problems.
The condition $f'(c) = 0$ is only a necessary condition, that is, it does not
guarantee that c is indeed an extreme point of f. Consider e.g. the function
$f(x) = x^3$: $f'(x) = 3x^2 = 0$ for x = 0, but f is increasing at x = 0 and x = 0
clearly is not an extreme point of f.
If there are α < c < β such that $f'(x) \geq 0$ on (α, c) and $f'(x) \leq 0$ on
(c, β), then c is a local maximum point of f.
If there are α < c < β such that $f'(x) \leq 0$ on (α, c) and $f'(x) \geq 0$ on
(c, β), then c is a local minimum point of f.
If there are α < c < β such that $f'(x) > 0$ for any x ∈ (α, β) or
$f'(x) < 0$ for any x ∈ (α, β), then c is not a local extreme point of f.
Problem 9.1. Maximize the profit Π of a firm, given the total revenue
function
R(Q) = 4000Q − 33Q3
and the total cost function
Solution. Profit is given as the total revenue minus the total costs, i.e. we
have
\[
\Pi(Q) = R(Q) - C(Q) = -2Q^3 - 30Q^2 + 3600Q - 5000.
\]
In the first step, we differentiate Π and set the derivative equal to 0:
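As a cross-check of this step, the derivative of the profit function above is Π′(Q) = −6Q² − 60Q + 3600, and its zeros can be found in R with the base function polyroot. The sketch below is one possible way of doing this; the variable names are illustrative.

```r
profit <- function(Q) -2 * Q^3 - 30 * Q^2 + 3600 * Q - 5000

# Coefficients of the derivative -6 Q^2 - 60 Q + 3600, in increasing order of power
crit <- sort(Re(polyroot(c(3600, -60, -6))))
crit                      # critical points of the profit function

# Second derivative of the profit function
profit_d2 <- function(Q) -12 * Q - 60
profit_d2(crit)           # negative at a local maximum, positive at a local minimum

profit(20)                # profit at the positive critical point
```

Only the positive critical point is economically meaningful as a quantity, and the negative second derivative there indicates a local maximum.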
[Figure: revenue R, cost C and profit P as functions of Q, for Q between 0 and 60]
Note that above we used the function curve for plotting. This is a new way
of plotting functions compared to what we did so far. In curve, instead
240 CHAPTER 9. OPTIMIZATION
While Theorem 9.1 in theory gives a simple way of finding candidates for
local extremes, actually finding the critical points (and consequently local
extremes) can be difficult in practice. For many functions, the derivative
does not exist or is not readily available. Other functions, like polynomials
of degree more than 3, can be easily differentiated, but solving the equation
f ′ (x) = 0 presents a challenge. In such situations, one resorts to numerical
methods, that is, one uses computer software, such as R, to optimize the
function at hand. In this subsection, we show how to optimize single variable
functions in R.
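For the profit function from Problem 9.1, a call to the base R function optimize might look as follows. This is a sketch: the search interval c(0, 60) is an assumption, and maximum = TRUE is needed because optimize minimizes by default.

```r
profit <- function(Q) -2 * Q^3 - 30 * Q^2 + 3600 * Q - 5000

# Search for a maximum of the profit on the (assumed) interval [0, 60]
out <- optimize(profit, interval = c(0, 60), maximum = TRUE)
out
```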
## $maximum
## [1] 20.00001
##
## $objective
## [1] 39000
The output of the optimize function is a list. You can think of lists in R
as baskets containing several objects. These objects may all be of the same
type, but they don’t have to be. Each of them may also have a name, and if they
do, as is the case with the output of optimize, they can be accessed by
using the dollar sign. In this case, the two parts of the output are called
maximum, which is the point at which the local maximum is attained,
and objective, which gives the value of the objective function at this point.
out$maximum
## [1] 20.00001
out$objective
## [1] 39000
As we see, the value out$maximum is very close to the value we found analytically,
but there is still a small difference. This is due to the numerical, and
therefore only approximate, nature of the underlying algorithm. However,
the precision can be controlled to a certain degree, and the parameter that
is responsible for it is called tol. The smaller the tolerance, the more
precise the result. However, setting the tolerance too low could result in
long computation times or even in the algorithm not terminating at all, so
one needs to be careful not to set it too low (and it cannot be set to 0).
Let us try with 1e-10.
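A call with the smaller tolerance could be written as follows (again a sketch, with the search interval c(0, 60) as an assumption):

```r
profit <- function(Q) -2 * Q^3 - 30 * Q^2 + 3600 * Q - 5000

# Same search as before, now with a much smaller tolerance
out2 <- optimize(profit, interval = c(0, 60), maximum = TRUE, tol = 1e-10)
out2$maximum
out2$objective
```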
## [1] 20
out2$objective
## [1] 39000
one can use very similar considerations as in the case of functions of a single
variable to arrive at the first order necessary conditions that let us determine
the critical points, that is, candidates for a local minimum or maximum. We
only formulate them for functions of two variables, but they also extend to
functions of n variables (n > 2).
Theorem 9.4. Consider a function f : A → R of two variables and B ⊆ A.
If an interior point (x0 , y0 ) of B is a local minimum or maximum of f , then
fx (x0 , y0 ) = 0 and fy (x0 , y0 ) = 0.
library(emdbook)
f <- function(x, y) (x - 2)^2 - (y + 3)^2
curve3d(f, from = c(0,-5), to = c(4,-1), sys3d = "wireframe")
[Figure: wireframe plot of f(x, y) with axes x and y]
9.2. MULTIVARIATE OPTIMIZATION 245
Summarizing Theorems 9.4 and 9.5, to find the local minima and maxima of
a function of two variables, we proceed as follows:
1. Find the first order derivatives fx and fy and solve the system of
equations fx (x, y) = 0, fy (x, y) = 0.
2. For each candidate point from point 1., check the convexity or concavity
of the function:
(a) Check the direct second order partial derivatives at the point. If
they are nonzero and share the same sign, the critical point is a
candidate for a point where f attains a local minimum (fxx > 0
and fyy > 0) or a local maximum (fxx < 0 and fyy < 0). Otherwise
the test is inconclusive (if all second order partial derivatives are
0) or the candidate point is not an extreme.
(b) If $f_{xx} f_{yy} > 0$, check also the cross-derivatives. If $f_{xx} f_{yy} - f_{xy}^2 > 0$,
the candidate point is indeed a local extreme. Otherwise the test
is inconclusive (if $f_{xx} f_{yy} - f_{xy}^2 = 0$) or the point is a saddle point
(if $f_{xx} f_{yy} - f_{xy}^2 < 0$).
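The second derivative test can be sketched as a small R helper. The function name classify_critical_point is hypothetical, and the helper works directly with the sign of D = fxx fyy − fxy², which is a compact way of writing the checks in step 2; it is meant as an illustration, not as a replacement for the procedure above.

```r
# Hypothetical helper: classify a critical point from the values of the
# second order partial derivatives at that point
classify_critical_point <- function(fxx, fyy, fxy) {
  D <- fxx * fyy - fxy^2
  if (D > 0 && fxx > 0) return("local minimum")
  if (D > 0 && fxx < 0) return("local maximum")
  if (D < 0) return("saddle point")
  "inconclusive"
}

# f(x, y) = (x - 2)^2 - (y + 3)^2 has fxx = 2, fyy = -2, fxy = 0 at (2, -3)
classify_critical_point(2, -2, 0)

# x^2 + y^2 has fxx = 2, fyy = 2, fxy = 0 at (0, 0)
classify_critical_point(2, 2, 0)
```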
Problem 9.2. A firm producing two goods in amounts x and y has the profit
function
$$\Pi(x, y) = 64x - 2x^2 + 4xy - 4y^2 + 32y - 14.$$
Find the profit maximizing values of x and y and the maximal profit.
Solution. In the first step, we find the first order partial derivatives and
set them equal to 0. We get the following system of two equations in two
variables:
Πx (x, y) = 64 − 4x + 4y = 0
Πy (x, y) = 4x − 8y + 32 = 0.
The solution of this system is the point (x0 , y0 ) = (40, 24). To check whether
at this point the function really attains a local maximum, we perform the
second derivative test. We have
Πxx (x, y) = −4,
Πyy (x, y) = −8,
Πxy (x, y) = 4.
Since $\Pi_{xx} < 0$, $\Pi_{yy} < 0$ and $\Pi_{xx} \Pi_{yy} > \Pi_{xy}^2$, we conclude that (x0 , y0 ) =
(40, 24) are the profit maximizing amounts. The profit at this point is
Π(40, 24) = 1650.
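The analytical solution can be cross-checked in R. The sketch below writes the two first order conditions as a linear system and solves it with solve; the names A, b and crit are illustrative choices.

```r
# Profit function from Problem 9.2
profit <- function(x, y) 64*x - 2*x^2 + 4*x*y - 4*y^2 + 32*y - 14

# First order conditions as a linear system A %*% c(x, y) = b:
#   -4x + 4y = -64
#    4x - 8y = -32
A <- matrix(c(-4,  4,
               4, -8), nrow = 2, byrow = TRUE)
b <- c(-64, -32)
crit <- solve(A, b)
crit
profit(crit[1], crit[2])

# Second derivative test: here A also contains the second order partial derivatives
A[1, 1] < 0 && A[1, 1] * A[2, 2] - A[1, 2]^2 > 0
```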
Let us also visualize the function in R.
[Figure: wireframe plot of profit(x, y) over x and y, and a contour plot of the profit function for x between 20 and 50 and y between 10 and 30, with level curves up to 1600.]
Problem 9.3. The demands for a monopolist’s two products are determined
by the equations
p = 25 − x, q = 24 − 2y
where p and q are prices per unit of the two goods, and x and y are the
corresponding quantities. The costs of producing and selling x units of the
first good and y units of the other are
Find the monopolist’s profit Π(x, y) from producing and selling x units of
the first good and y units of the other. Then, find the values of x and y that
maximize Π (and verify that you have found the maximum) and calculate
the maximum profit.
Solution. The revenue from selling x units of a product at the price p is
given as px, such that the monopolist’s revenue from selling x and y units,
respectively, of the two products, is given as
Correspondingly, the profit, given as the difference between the revenue and
the costs, is given by
Now that we have the profit function, we can proceed to write down the first
order conditions to find the critical point(s):
Πx (x, y) = 25 − 8x − 3y = 0
Πy (x, y) = 24 − 6y − 3x = 0.
(and both direct second order partial derivatives are negative), such that
these really are profit maximizing quantities. We have Π(2, 3) = 61.
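The critical point can be double-checked by solving the two first order conditions as a linear system. This is only a sketch based on the equations Πx = 0 and Πy = 0 stated above; the second order derivatives are read off the same two equations.

```r
# First order conditions of Problem 9.3 written as A %*% c(x, y) = b:
#   8x + 3y = 25
#   3x + 6y = 24
A <- matrix(c(8, 3,
              3, 6), nrow = 2, byrow = TRUE)
b <- c(25, 24)
solve(A, b)

# Second order partial derivatives read off the first order conditions
Pxx <- -8
Pyy <- -6
Pxy <- -3
Pxx * Pyy - Pxy^2 > 0
```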
Note that the function profit2 only takes one argument, but inside the body
of the function we access a[1] and a[2] which means that a is expected to
be a vector of (at least) two values. The first entry in a plays the role of the
variable x of the function profit, while the second entry plays the role of
y. Recall that we discussed these two possible ways of defining multivariate
functions at the beginning of Chapter 8.
Just like optimize, optim by default minimizes fn. If
we want to maximize it instead, this can be done by means of control =
list(fnscale = -1). This has to do with the fact that if a function is
multiplied by -1, it is mirrored about the x-axis, which means that any (local)
minimum becomes a (local) maximum and any (local) maximum becomes a
(local) minimum. As with optimize, the output of optim is a list. In this
case, it contains more objects; for our purposes, the interesting parts are par,
which contains the found minimum or maximum, and value, which gives the
optimal objective value.
outmv$value
## [1] 1650
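A minimal sketch of such a call, assuming the profit function of Problem 9.2, might look as follows. The name profit_vec and the starting point c(1, 1) are illustrative assumptions and not taken from the text.

```r
# Profit function from Problem 9.2, rewritten to take a single vector argument;
# the name profit_vec and the starting point c(1, 1) are illustrative assumptions
profit_vec <- function(a) {
  x <- a[1]
  y <- a[2]
  64*x - 2*x^2 + 4*x*y - 4*y^2 + 32*y - 14
}

# fnscale = -1 turns the default minimization into a maximization
res <- optim(par = c(1, 1), fn = profit_vec, control = list(fnscale = -1))
res$par    # should be close to the analytical solution (40, 24)
res$value  # should be close to 1650
```

Since optim works numerically, the result only approximates the analytical solution, and a different starting point may give slightly different digits.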
only, and there is only one constraint in the form of an equality. Formally,
this means that we deal with problems of the form
In the Lagrange multiplier method (or Lagrange method in short), the first
step is to define the Lagrange function, also called the Lagrangian. To this
end, one introduces a new variable λ to combine the objective function and
the constraint:
In the next step, one considers the first order necessary conditions applied to
the Lagrangian, as if we were optimizing this new function of three variables.
The idea behind the method is that we now consider an unconstrained
function in which, however, there is a penalty of λ – called the Lagrange
multiplier – if the constraint of the original problem is not satisfied, i.e. if
g(x, y) − c ≠ 0.
where we can notice that the last equation in the system, (9.5), is in fact
nothing else than the constraint of the original problem given in (9.1).
After solving this system of equations, we get the stationary points of the
Lagrangian, triples of values (x0 , y0 , λ0 ). Any (x, y) that are a solution of
the constrained optimization problem given in (9.1) must be part of such a
triple.
To find out whether (x0 , y0 ) of any candidate triple is in fact a solution to the
optimization problem, we need to check, similarly to unconstrained problems,
some second order derivatives. To this end, we will consider the second order
derivatives of the Lagrangian with respect to the variables of the original
problem, x and y.
Theorem 9.7. Consider the constrained problem in (9.1) and let (x0 , y0 , λ0 )
be a stationary point of its Lagrangian.
Lx (x, y, λ) = 80 − 4x − y − λ = 0
Ly (x, y, λ) = −x − 6y + 100 − λ = 0
Lλ (x, y, λ) = −x − y + 12 = 0.
This system has exactly one solution, namely the triple (x0 , y0 , λ0 ) = (5, 7, 53).
To check we found a maximum of the objective function under the given
constraint, we check the second order derivatives of the Lagrangian. We
get Lxx = −4, Lyy = −6 and Lxy = −1. Since Lxx < 0, Lyy < 0 and
$L_{xx} L_{yy} > L_{xy}^2$, x = 5 and y = 7 is the profit maximizing product combination.
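Because the three first order conditions are linear in x, y and λ, the stationary point can be checked with solve. The matrix M and the vector rhs below are just the system above rearranged; the names are illustrative.

```r
# First order conditions of the Lagrangian as M %*% c(x, y, lambda) = rhs:
#   -4x -  y - lambda =  -80
#    -x - 6y - lambda = -100
#    -x -  y          =  -12
M <- matrix(c(-4, -1, -1,
              -1, -6, -1,
              -1, -1,  0), nrow = 3, byrow = TRUE)
rhs <- c(-80, -100, -12)
solve(M, rhs)
```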
[Figure: contour plot of the objective function for x and y between 0 and 12, with level curves labelled from 0 to 1200.]
Maximizing the function under the constraint graphically means finding the
highest level curve that crosses or touches the constraint. The point at which
this level curve touches the constraint is the optimal solution of the problem.
The value at the level curve is the maximal possible value of the objective
function.
While the ultimate goal of the Lagrange method is to find the x and y that
solve the optimization problem, the Lagrange multiplier is not only a means to
an end. It has an actual interpretation, which we give in terms of a particular
type of maximization problem but which can be extended to general problems of
the form (9.1).
In the context of Problem 9.4, this means that if the maximal production
were increased by one unit, that is, if we had x + y = 13, this would
lead to an increase of approximately λ0 = 53 monetary units in the profit.
Therefore, the producer would be willing to pay at most about 53 monetary units
for a chance to increase the overall production output by one unit.
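This interpretation can be illustrated numerically. The profit function below is not quoted from the text: it is reconstructed from the first order conditions of Problem 9.4 (up to an additive constant, which does not affect differences), so it should be read as an assumption. The sketch substitutes y = c − x and maximizes over x with optimize.

```r
# Assumed profit function, reconstructed from the first order conditions of Problem 9.4
profit94 <- function(x, y) 80*x - 2*x^2 - x*y - 3*y^2 + 100*y

# Maximal profit under x + y = c_total, obtained by substituting y = c_total - x
max_profit <- function(c_total) {
  optimize(function(x) profit94(x, c_total - x),
           interval = c(0, c_total), maximum = TRUE)$objective
}

# Gain from relaxing the constraint from 12 to 13 units; compare with lambda0 = 53
max_profit(13) - max_profit(12)
```

Under this assumption the gain is roughly 51.6 rather than exactly 53, which fits the statement that the multiplier only gives an approximate change.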
Problem 9.5. Minimize the function $x + 2y$ under the constraint $x^2 + y^2 = 1$.
Solution. Again, we start by writing down the Lagrange function of the
problem:
$$L(x, y, \lambda) = x + 2y - \lambda(x^2 + y^2 - 1).$$
The first order partial derivatives give us the system
$$L_x(x, y, \lambda) = 1 - 2\lambda x = 0 \tag{9.6}$$
$$L_y(x, y, \lambda) = 2 - 2\lambda y = 0 \tag{9.7}$$
$$L_\lambda(x, y, \lambda) = -x^2 - y^2 + 1 = 0. \tag{9.8}$$
From equations (9.6) and (9.7), we obtain that $x = \frac{1}{2\lambda}$ and $y = \frac{1}{\lambda}$. If we plug
these expressions into (9.8) or, equivalently, into the original constraint,
we obtain
$$\frac{1}{4\lambda^2} + \frac{1}{\lambda^2} = 1,$$
which has two solutions: $\lambda_1 = -\frac{\sqrt{5}}{2}$ and $\lambda_2 = \frac{\sqrt{5}}{2}$. If we plug these into
the expressions for x and y, we have two candidate points:
$(x_1, y_1, \lambda_1) = \left(-\frac{1}{\sqrt{5}}, -\frac{2}{\sqrt{5}}, -\frac{\sqrt{5}}{2}\right)$ and $(x_2, y_2, \lambda_2) = \left(\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}}, \frac{\sqrt{5}}{2}\right)$.
The second order partial derivatives of the Lagrangian with respect to x
and y are
$L_{xx}(x, y, \lambda) = -2\lambda$, $L_{yy}(x, y, \lambda) = -2\lambda$ and $L_{xy}(x, y, \lambda) = 0$. For
$\lambda_1 = -\frac{\sqrt{5}}{2}$, the conditions for a minimum are satisfied, such that the point
$\left(-\frac{1}{\sqrt{5}}, -\frac{2}{\sqrt{5}}\right)$ is the solution of the problem. Note that $\left(\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}}\right)$ would solve
the corresponding maximization problem.
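The two candidate points can be checked in R. The sketch below evaluates the objective and the constraint at both points and, as a rough plausibility check, compares them with a brute-force search over points on the unit circle; the names f, g, p_min and p_max are illustrative.

```r
# Objective and left hand side of the constraint of Problem 9.5
f <- function(x, y) x + 2*y
g <- function(x, y) x^2 + y^2

# The two candidate points found analytically
p_min <- c(-1, -2) / sqrt(5)
p_max <- c( 1,  2) / sqrt(5)

g(p_min[1], p_min[2])   # the constraint value should be 1
f(p_min[1], p_min[2])   # objective at the candidate minimum
f(p_max[1], p_max[2])   # objective at the candidate maximum

# Brute-force comparison on a fine grid of points on the unit circle
t <- seq(0, 2*pi, length.out = 100001)
range(f(cos(t), sin(t)))
```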
who are interested in further functions for solving more general
classes of optimization problems are encouraged to have a look at the package
ROI.
g(x, y) = c
g(x, y) ≥ c
g(x, y) ≤ c.
$g(x, y) \geq c - 10^{-5}$
$-g(x, y) \geq -c - 10^{-5}$.
Note that we could also use a different (small positive) constant instead of
$10^{-5}$.
Remark. The need to relax the equality constraint has to do with the fact
that the algorithms used by constrOptim belong to the class of so-called
interior point algorithms. Loosely speaking, this means that the starting point
of the algorithm must be a point from which one could move in any direction
and still be able to stay within the constraint with an appropriately small
step size. The details of this procedure, however, are well beyond the scope
of this course.
Moreover, recall that the variables of interest, x and y, are in fact quantities.
When solving the problem analytically, we keep this interpretation in
mind and could, if necessary, decide to use only a positive solution (in case
there were several). For R, however, this information is only implicit, and R would
not be able to make this decision. Therefore we need to add the constraints
x ≥ 0 and y ≥ 0. The full set of ≥ constraints is therefore
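$$\begin{aligned}
1 \cdot x + 1 \cdot y &\geq 12 - 10^{-5} \\
(-1) \cdot x + (-1) \cdot y &\geq -12 - 10^{-5} \\
1 \cdot x + 0 \cdot y &\geq 0 \\
0 \cdot x + 1 \cdot y &\geq 0.
\end{aligned}$$
We recognize that the above system could be written in the following way:
$$\begin{pmatrix} 1 & 1 \\ -1 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} \geq
\begin{pmatrix} 12 - 10^{-5} \\ -12 - 10^{-5} \\ 0 \\ 0 \end{pmatrix}.$$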
We will use the matrix on the left hand side and the vector on the right hand
side to pass the constraint(s) to the function.
Like optim, the function constrOptim also requires the function to only use
one argument. Therefore we redefine the profit function before providing
all arguments to constrOptim. To make sure that our function will be
maximized, we use control = list(fnscale = -1), just like in the case
of optim. We set grad = NULL such that we don’t have to provide the first
order partial derivatives of the function.
solution$par
solution$value
## [1] 868.0005
Note that the output of the function does not contain the optimal value of
the Lagrange multiplier.
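A call along these lines could be sketched as follows. The profit function is an assumption (reconstructed from the first order conditions of Problem 9.4), and the names profit_vec, ui, ci and the starting point c(6, 6) are illustrative; the starting point only needs to lie strictly inside the relaxed feasible region.

```r
# Assumed profit function (reconstructed from the first order conditions of
# Problem 9.4), written with a single vector argument
profit_vec <- function(a) {
  x <- a[1]
  y <- a[2]
  80*x - 2*x^2 - x*y - 3*y^2 + 100*y
}

# Constraint matrix and vector for ui %*% c(x, y) >= ci
ui <- matrix(c( 1,  1,
               -1, -1,
                1,  0,
                0,  1), nrow = 4, byrow = TRUE)
ci <- c(12 - 1e-5, -12 - 1e-5, 0, 0)

# Starting point strictly inside the feasible region (an illustrative choice)
start <- c(6, 6)

sol <- constrOptim(theta = start, f = profit_vec, grad = NULL,
                   ui = ui, ci = ci, control = list(fnscale = -1))
sol$par
sol$value
```

The exact digits of the result depend on the starting point and on the numerical algorithm, so they may differ slightly from the output shown above.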
To find the global minimum (maximum), one needs to consider all of the
local minima (maxima) and compare them. Moreover, one needs to consider
the overall behaviour of the function. If a function does not have any local
minimum (maximum), then there is also no global minimum (maximum).
An example of a function that does not have any local or global extremes is
$x^3$.
Other functions do have local extremes, but no global extremes. For instance,
$f(x) = x^3 - 3x$ attains a local maximum at x = −1 with the value f (−1) = 2,
and a local minimum at x = 1 with the value f (1) = −2. But it does not
have any global extremes because as x approaches −∞, f (x) approaches
−∞, too, whereas for x approaching ∞, f (x) also increases towards ∞.
Finally, of course there are also functions that do have global extremes. A
very simple example is the function $x^2$.
Example 9.2. Consider again $f(x) = x^3 - 3x$, now on the interval [−3, 1.5].
The two local extremes satisfying the first order necessary conditions are
x = −1 with f (−1) = 2 and x = 1 with f (1) = −2. When checking the
interval endpoints, we obtain f (−3) = −18 and f (1.5) = −1.125. Therefore,
on the given interval, the global minimum is at x = −3 and the global
maximum is at x = −1.
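The comparison of candidates in Example 9.2 can be reproduced in a few lines of R. The vector candidates collects the stationary points and the interval endpoints; the variable names are illustrative.

```r
f <- function(x) x^3 - 3*x

# Candidates: stationary points inside the interval plus both endpoints
candidates <- c(-3, -1, 1, 1.5)
values <- f(candidates)
values

candidates[which.min(values)]  # location of the global minimum on [-3, 1.5]
candidates[which.max(values)]  # location of the global maximum on [-3, 1.5]
```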
9.6 Exercises
9.1 For the following functions, find all of their stationary points and decide
whether the function attains a local minimum, local maximum or neither at
these points.
(x2 +2)2 2
a)f1 (x) = 4
d)f4 (x) = (x − 3) ex
x3 +1 e)f5 (x) = 3x4 + 4x3 − 30x2 + 36x +
b)f2 (x) = x+1 10
1
f)f6 (x) = x ln x1
c)f3 (x) = − (3x4 +x 2 )10
15
[Figure: four panels showing the derivatives f'(x), g'(x), h'(x) and i'(x), each plotted against x.]
Find all its local minima and maxima in the interval [3, 15]. Then find its
global minimum and maximum in this interval (if they exist).
Find all its local minima and maxima in the interval (−20, 20). Then find
its global minimum and maximum in this interval (if they exist).
where x is the produced amount of the product (output). Find the profit
maximizing output and the corresponding maximal profit. Assume x > 0.
9.7 Find the stationary points of the following functions and classify them
(f attains local minimum, local maximum, saddle point, undetermined):
Use R to verify your answer; choose appropriate starting points to find all
extremes.
9.8 A monopolist offers two products at respective prices p1 and p2 . The
respective demands at these quantities are
The costs of producing one unit of the products are 21 and 35, respectively,
and the fixed costs are 2633. Find
a)the profit function as a function of the prices;
b)the profit maximizing prices p1 , p2 ;
Due to some quota restrictions, the firm’s output must satisfy 2x + 3y = 60.
The firm seeks to minimize the costs.
a)Write the Lagrangian and find the cost minimizing output levels via the
Lagrange method.
b)Find the minimal costs.
c)Find the (approximate) change in the value of the cost function caused
by a one unit change in the total number of goods that must be produced.
Due to some quota restrictions, the firm’s output must satisfy x + 2y = 10.
a)Write the Lagrangian and find the profit maximizing output levels via
the Lagrange method.
b)Find the maximum profit.
In advertisements for accounts or credits, banks offer interest rates p.a., which
stands for per annum – per year. These are also called nominal interest
rates. However, usually the interest does not in fact accrue only once a year,
but rather several times a year. Often it would be once a month, but there are
also instances of interest being added every day. In such cases, the nominal
interest is divided equally among all periods. For instance, if a bank offers
an interest rate of 1.5% p.a. with monthly compounding, that means that
every month, an interest of $\frac{1.5}{12}\% = 0.125\%$ will be added to the account.
268CHAPTER 10. INTEREST RATES AND TIME VALUE OF MONEY
Then we have
$$S_n(t) = S_0 \left(1 + \frac{r}{n}\right)^{nt}. \tag{10.1}$$
A special case is n = 1 where the interest accrues once a year. In that case,
as discussed in Problem 1.7, after t years we have $S_1(t) = S_0 (1 + r)^t$ in the
account. With n compounding periods per year, not only does the periodic
interest (the interest rate corresponding to one compounding period) change
to $\frac{r}{n}$, but it is compounded n times every year, thus in t years, nt times
altogether. The factor $q = 1 + \frac{r}{n}$ corresponding to one compounding period
is often referred to as the compounding factor.
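Formula (10.1) translates directly into a small R function. The name future_value and the example amounts are illustrative choices.

```r
# Capital after t years with n compounding periods per year, following (10.1);
# the function name future_value is an illustrative choice
future_value <- function(S0, r, n, t) S0 * (1 + r / n)^(n * t)

# 1000 units at 1.5% p.a. for one year: yearly vs. monthly compounding
future_value(1000, 0.015, n = 1,  t = 1)
future_value(1000, 0.015, n = 12, t = 1)
```

With monthly compounding the result is slightly larger than with yearly compounding, because interest is earned on interest within the year.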
exp(1)
## [1] 2.718282
r <- 0.05
n <- 1:1000
plot(n, compound(r, n))
abline(exp(r), 0, col=2)
[Figure: compound(r, n) plotted against n, with values between 1.0500 and 1.0512.]
Just like previously, we observe that the factor increases with increasing n, and
in fact, it gets very close to the continuous factor relatively fast. Therefore,
while, as mentioned above, continuous compounding is not possible in practice,
it is an important theoretical tool used in many finance-mathematical and
economic models, since working with $e^{rt}$ is easier than working with $\left(1 + \frac{r}{n}\right)^{nt}$.
As is clear from the above considerations and results, the same interest rate
with different frequency of interest payments leads to different outcomes.
Similarly, different nominal interest rates with different frequencies of payments
may lead to the same value in the account after the same period of time.
Therefore, to make the interest rates with different compounding periods
comparable, we use the notion of effective rate of interest. The effective rate
of interest is the rate at which the initial capital really increases within one
year. Note that with n compounding periods, the capital after one year is
$$S_n(1) = S_0 \left(1 + \frac{r}{n}\right)^n,$$
which corresponds to an increase (change) by a factor of $\left(1 + \frac{r}{n}\right)^n$. Therefore,
the effective interest rate corresponds to
$$R = \left(1 + \frac{r}{n}\right)^n - 1. \tag{10.6}$$
Similarly, with continuous compounding, after one year the initial capital is
multiplied by a factor of $e^r$, which makes the effective rate of interest
$$R = e^r - 1. \tag{10.7}$$
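Formulas (10.6) and (10.7) can be written as two short R functions. The function names are illustrative, and the nominal rate of 5% is just an example value.

```r
effective_rate      <- function(r, n) (1 + r / n)^n - 1   # formula (10.6)
effective_rate_cont <- function(r) exp(r) - 1             # formula (10.7)

r_nominal <- 0.05
effective_rate(r_nominal, n = 1)    # yearly compounding: equals the nominal rate
effective_rate(r_nominal, n = 12)   # monthly compounding
effective_rate_cont(r_nominal)      # continuous compounding
```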
Though one usually thinks of interest rates as positive values, it is not always
the case. In fact, the past years saw not only 0, but even negative interest
rates.
Problem 10.1. In mid-2018, the nominal annual interest rate at the European
Central Bank for AAA-rated bonds with a maturity of 1 year was about
−0.7%. Compute the time it takes for such a bond to lose 10% of its original
value and the respective effective interest rate, assuming that the nominal
interest rate stays at this level, if the interest is ’paid’
a) yearly,
b) monthly,
c) continuously.
$$S_0 (1 - 0.007)^t = 0.9 S_0$$
$$t = \frac{\ln 0.9}{\ln 0.993} = \log_{0.993} 0.9 \approx 14.9988.$$
The effective rate in this case is the same as the nominal rate.
For part b), we have n = 12, which changes the equality for t to
$$S_0 \left(1 - \frac{0.007}{12}\right)^{12t} = 0.9 S_0.$$
The solution is
$$S_0 e^{-0.007t} = 0.9 S_0,$$
which is
$$t = \frac{\ln 0.9}{-0.007} \approx 15.0515.$$
The effective interest rate is given by
$$R = e^{-0.007} - 1 \approx -0.006976.$$
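The three cases of Problem 10.1 can be computed side by side in R. This is a sketch that solves each equality for t with logarithms; the variable names are illustrative.

```r
r <- -0.007

# Time until the value has dropped to 90% of the original value
t_yearly     <- log(0.9) / log(1 + r)
t_monthly    <- log(0.9) / (12 * log(1 + r / 12))
t_continuous <- log(0.9) / r
c(t_yearly, t_monthly, t_continuous)

# Effective interest rates for yearly, monthly and continuous compounding
c((1 + r) - 1, (1 + r / 12)^12 - 1, exp(r) - 1)
```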
Remark. Problem 10.1 shows that more frequent interest payments are not
only advantageous with positive interest rates (more earnings), but also with
negative interest rates as in that case, the value depreciates more slowly.
$$K = A e^{rt}$$
which leads to
$$A = \frac{K}{e^{rt}} = K e^{-rt}. \tag{10.9}$$
Note that the present value refers to how much a future capital K is worth
today; as already hinted above, there is not necessarily a real bank account
connected to this. For each period between now and then, the future value
is divided once by the compounding factor $q = 1 + \frac{r}{n}$ corresponding to one period.
The reciprocal value of the compounding factor, $d = \frac{1}{q}$, is often referred
to as the discounting factor.
[Figure: diagram with the labels Pn and Fn.]
To calculate the value of a payment one period later, one compounds once;
to calculate the value one period earlier, one discounts once.
Problem 10.2. Toni agrees to pay Steph €1000 one year from now and
€2000 two years from now. In return, Steph pays Toni €3000 three years
from now. Determine the present and the future value of this contract when
the interest rate is 5% (annual compounding).
Solution. Note that the present and future values of the contract differ
depending on whether we calculate the values from the point of view of
Steph or Toni – but they only differ in the sign. Let us calculate them from
the point of view of Steph. From Steph’s point of view, there will be three
payments a1 = 1000, a2 = 2000 and a3 = −3000 one, two and three years
from now, respectively. Note that the last payment is negative since she will
not receive this payment, she will make it. At r = 0.05, the present value of
this contract is therefore
$$P_n = \frac{1000}{1.05} + \frac{2000}{1.05^2} - \frac{3000}{1.05^3} \approx 174.93.$$
The future value can easily be calculated from this using formula (10.13):
$$F_n = P_n \cdot 1.05^3 = 202.5.$$
If we calculated the values from the point of view of Toni, we’d get Pn ≈
−174.93 and Fn = −202.5.
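The same calculation takes only a few lines in R. The sketch below discounts each payment individually and then compounds the present value three times; the variable names are illustrative.

```r
r <- 0.05
payments <- c(1000, 2000, -3000)   # from Steph's point of view
years <- 1:3

# Present value: discount each payment back to today
PV <- sum(payments / (1 + r)^years)
PV

# Future value at the end of year 3: compound the present value three times
FV <- PV * (1 + r)^3
FV
```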
Inspecting equalities (10.10) and (10.11) for an annuity, we might notice that
in fact, the future and present value are both sums of geometric sequences
– sequences in which the relative difference between successive terms is
constant, or in other words, the next term arises from the previous one by
multiplying it by a fixed constant k. For the sum of a geometric sequence, there
is in fact a simple formula.
10.3. ANNUITIES AND MORTGAGE REPAYMENTS 275
$$s_n(1 - k) = s_n - k s_n = a + ak + ak^2 + \ldots + ak^{n-1} - ak - ak^2 - \ldots - ak^{n-1} - ak^n = a - ak^n = a(1 - k^n).$$
Using formula (10.14), we can easily derive the formulae for present and
future values of any annuity or annuity due. Let us consider compounding
m times a year and let us denote, as above, by $q = 1 + \frac{r}{m}$ the compounding
factor for one period and by $d = \frac{1}{q}$ the discounting factor for one period.
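$$P_{10} = 1000 \cdot \frac{1}{1.02} \cdot \frac{1 - \left(\frac{1}{1.02}\right)^{10}}{1 - \frac{1}{1.02}} \approx 8982.59,$$
$$F_{10} = 1000 \cdot \frac{1.02^{10} - 1}{1.02 - 1} \approx 10949.72.$$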
Problem 10.4. Find the present and the future value of an annuity due of
€1000 per payment once per year for 10 years when the annual interest rate
is 2%.
Solution. We have an annuity due with a = 1000, n = 10, m = 1 and
r = 0.02. Consequently, q = 1.02 and $d = \frac{1}{1.02}$. Plugging into (10.17) and
(10.18), we get
$$P_{10}^{\text{due}} = 1000 \cdot \frac{1 - \left(\frac{1}{1.02}\right)^{10}}{1 - \frac{1}{1.02}} \approx 9162.237,$$
$$F_{10}^{\text{due}} = 1000 \cdot 1.02 \cdot \frac{1.02^{10} - 1}{1.02 - 1} \approx 11168.72.$$
Note that indeed, if we compare the results from Problems 10.3 and 10.4, we
obtain that $P_{10} = d \cdot P_{10}^{\text{due}}$ and $F_{10}^{\text{due}} = q \cdot F_{10}$.
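The four values from Problems 10.3 and 10.4 can be reproduced with small helper functions. The function names are illustrative, and the formulas are written in the form in which they appear in the two worked solutions.

```r
# Present and future values of an annuity with payment a, n payments and
# compounding factor q per period; function names are illustrative
annuity_pv     <- function(a, n, q) { d <- 1 / q; a * d * (1 - d^n) / (1 - d) }
annuity_fv     <- function(a, n, q) a * (q^n - 1) / (q - 1)
annuity_due_pv <- function(a, n, q) { d <- 1 / q; a * (1 - d^n) / (1 - d) }
annuity_due_fv <- function(a, n, q) a * q * (q^n - 1) / (q - 1)

q <- 1.02
annuity_pv(1000, 10, q)
annuity_fv(1000, 10, q)
annuity_due_pv(1000, 10, q)
annuity_due_fv(1000, 10, q)

# Relation between the annuity and the annuity due
all.equal(annuity_due_pv(1000, 10, q), q * annuity_pv(1000, 10, q))
```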
10.4. INTERNAL RATE OF RETURN 277
The internal rate of return (IRR) of the project is such a value of r that
satisfies A = 0. In other words, it is an interest rate at which the present
value of all costs (e.g. the purchasing costs at the beginning of the project) is
equal to the present value of all revenues from the project. IRR is sometimes
used by companies to compare different investment opportunities: a project
with a higher IRR is considered better.
Problem 10.7. You give someone €10000, and they promise to pay back
€2000 after one year, €2500 after two years, €3000 after three years and
€3500 after four years. Calculate the internal rate of return.
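[Figure: present value of the deal plotted against the interest rates rs; the visible axis ticks are −500, −1000 and −1500.]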
## [1] 0.03584787
PV_IRR(IRR$root)
## [1] 0.006288343
The output of uniroot is a list, as we already know from the optimization
functions. The root itself is provided in the object root of the output. A
simple check of plugging in the outcome to the original function reveals that
the value at this point is still relatively far from 0, but we can adjust the
tolerance to get a better result.
## [1] 0.03584811
PV_IRR(IRR$root)
## [1] 2.273737e-13
The internal rate of return of the proposed deal is thus about 3.58%.
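A root search of this kind can be sketched as follows. The function pv_deal is a hypothetical stand-in for the present value function used above, built from the cash flows stated in Problem 10.7, and the search interval c(0, 1) is an assumption chosen so that the function changes sign on it.

```r
# Present value of the cash flows in Problem 10.7 as a function of the rate r;
# the name pv_deal is a hypothetical stand-in for the function used in the text
pv_deal <- function(r) {
  -10000 + 2000 / (1 + r) + 2500 / (1 + r)^2 + 3000 / (1 + r)^3 + 3500 / (1 + r)^4
}

# Root search on an interval where pv_deal changes sign, with a tight tolerance
irr <- uniroot(pv_deal, interval = c(0, 1), tol = 1e-10)
irr$root
pv_deal(irr$root)
```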
10.5 Exercises
10.1 Assume that you want to deposit a capital S0 for the next 5 years and
have three offers for a savings account:
Bank A offers you an annual interest of 3%, compounding yearly.
Bank B offers you an annual interest of 2.5%, compounding continuously.
Bank C offers you an annual interest of 2.8%, compounding twice a
year.
Which of the three offers is the best?
10.2 Assume that you deposit €4000 in an account with monthly compounding
and after 10 years, you have €4983.30. What is the nominal and effective
interest rate in the account?
10.3 Thomas won in a lottery and was offered two options: €10000 immediately
or €5100 in one year and €5100 in two years. Assuming an annual interest
rate of 5%, which offer should he choose?
10.4 After graduation, a BBE student got an offer from a master’s program
in another country. The student first arranged a room to stay that costs
€300. The remaining living expenses are estimated to be €500. The student
found a part-time job to cover the expenses. While the employer pays €1000
at the end of each month, the landlord requires rent payments at the
beginning of each month. On the other hand, the student uses a credit card
to pay for the other expenses, and the credit card bills are paid at the end of
each month. Suppose that from the former savings, the student has €1000
when moving to the new country. Assume that the annual interest rate is
2% and that the student will stay in the country for 2 years. Compute the
present and future value of the money the student will save over the two
years, after paying for the expenses in each month. Hint: Draw a timeline
with time points 0, 1, . . . , 23, 24 and indicate the cash inflows and outflows
that occur at the given time points and use formulae (10.15)-(10.18).
10.5 Erica plans to take a trip next year which she estimates will cost her
€5000. She wants to save up for the trip by repeated monthly deposits of
the same amount of money. If she deposits the first payment immediately,
saves for a year and the annual interest rate is 3%, how much does she have
to deposit each month?
10.6 Suppose that you are offered an annuity of 10 yearly payments of €1000
in exchange for an immediate payment of €7500. Use R to find the internal
rate of return.
10.7 Assume that two banks offer car loans, where the borrowed money
plus interest is repaid at the end of the loan period. Bank 1 offers loans
with an annual interest rate of 10.07% and with continuous compounding.
Bank 2 offers loans with an annual interest rate of 10.1% and with monthly
compounding.
a)If a borrower wants to buy a car for €40000 and get this amount financed
with a loan with a period of 5 years, which bank should they choose?
Justify your answer by computing and comparing the total payment
the borrower would make to each bank.
b)Calculate the effective annual interest rates of the two considered car
loans (that is, of both a loan from Bank 1 and a loan from Bank 2) from
part a).
c)Assume now that the borrower from part a) considers a third option
to finance a new car. Their close friend offers to lend them €40000
now and in return, the borrower promises to pay 850 at the end of each
month for the next 5 years. Assume an annual interest rate of 10% and
calculate the future value of the total payments the borrower makes to
their friend. Compare the result to the results from part a). Should
the borrower take the friend’s offer, or should they finance the car by
borrowing from one of the banks?
The contents of this chapter are discussed in Sections 11.1-11.7 of [1], with
11.4 offering a much more detailed overview of geometric sequences and series
than necessary for this course. To practice and check your understanding of
the topics, we suggest the following exercises:
In this section, we introduce a few more useful R functions that did not fit
with any of the previous topics, to give you a solid basis for writing more
involved code. We also share a few programming tips that can help to make
your code more compact and efficient.
We start this chapter with another cycle that, unlike the for cycle, can be used
in situations when you don’t know upfront how many times a certain task
needs to be repeated.
Recall that in Section 5.2, we have introduced the following structure for the
for cycle:
for(i in sequence) {
task
}
284 CHAPTER 11. FURTHER USEFUL R FUNCTIONS AND TIPS
would not know how to choose sequence. Instead, we would make use of
while which has the following basic structure:
while(condition) {
task
}
n <- 1
while(n <= 5) {
print(sum(1:n)^2)
n <- n + 1
}
## [1] 1
## [1] 9
## [1] 36
## [1] 100
## [1] 225
Note that unlike with for, we started by defining some value for n outside
of the cycle. After that, the following happens: At the beginning, and after
each iteration of the cycle, while checks whether n is not more than 5. If so,
sum(1:n)^2 is printed and afterwards, we increase the value of n by 1. Once
the end of task is reached, the iteration is done and a new one starts with a new
check of the condition. The first time the condition returns FALSE, the task
will be skipped and R will go on with the rest of the code (if there is some
after).
As already mentioned, the above task could have been performed with a for
cycle, and even in a more elegant way. Let us therefore turn our attention
to a task that in fact requires while rather than for.
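For comparison, the for version of the previous example might look as follows. The vectorized line at the end is an additional alternative, using the fact that cumsum(1:5) contains the partial sums 1, 3, 6, 10, 15.

```r
# Same output as the while cycle above, written with for
for (n in 1:5) {
  print(sum(1:n)^2)
}

# Vectorized alternative: square the partial sums in one step
squares <- cumsum(1:5)^2
squares
```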
Problem 11.1. Find the smallest natural number n with $\sum_{i=1}^{n} i \geq 1000$.
11.1. WHILE CYCLE 285
Solution. This goal can be achieved e.g. by the following simple code:
n <- 1
s <- 0
while(s + n < 1000) {
s <- s + n
n <- n + 1
}
n
## [1] 45
## [1] 45
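The result can be cross-checked without a cycle. The sketch below computes the partial sums with cumsum and looks for the first one that reaches 1000; the upper limit of 100 terms is an assumption that is large enough for this problem.

```r
# Partial sums 1, 1 + 2, 1 + 2 + 3, ...; 100 terms are more than enough here
partial_sums <- cumsum(1:100)

# Index of the first partial sum that reaches 1000
first_n <- min(which(partial_sums >= 1000))
first_n

# The sum up to 44 is still below 1000, the sum up to 45 is not
sum(1:44)
sum(1:45)
```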
11.2 return
In the previous chapters, we often used the function return to define the
output of the function. However, we also mentioned in Section 3.2 that
it is not absolutely necessary to use it, especially in simple functions (e.g.
when defining a mathematical function before plotting it). In fact, the usual
practice is not to use it in simple functions. We will now discuss what
the benefits of using it are and when one should do so.
Looking at the functions, one can guess that the goal was to implement a
function that for n ≤ 10 outputs the value of n itself, for n ∈ (10, 20] twice
the value of n, and otherwise 3n. However, only one of the functions achieves
this. Let’s first look at the output of both functions for several values of n
to see what happens:
Foo(2)
## [1] 6
Foo2(2)
## [1] 2
Foo(15)
## [1] 45
Foo2(15)
## [1] 30
Foo(40)
## [1] 120
Foo2(40)
## [1] 120
We observe that for Foo, the output is always 3n, even though we used values of n from
all three categories distinguished in the function. Foo2, however, delivers the
correct outputs. The reason is that return terminates the function. That is,
the moment that return is first executed, the rest of the code in the function
is ignored. In Foo, no value is returned or printed in either of the if clauses,
but the value 3*n is automatically returned at the end of the function. If
we wanted to make sure that the function outputs the desired value for
any n, we would have to use two nested if ... else cases. return makes
it possible to write the code in a more elegant way using only if, without
having to use else: If n ≤ 10, the rest of the code is never reached because
n is returned in the first clause already. Otherwise we know already that
n > 10, such that we don’t have to check this part of the second condition
anymore. Instead, we continue to check whether n ≤ 20, which would (in
combination with the first condition being FALSE) imply that n ∈ (10, 20]. If
this is the case, the value 2*n is returned and the last code line is ignored.
If, however, we also have FALSE here, then n must automatically be greater
than 20, implying 3*n as the function’s output.
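The behaviour described above can be reproduced with the following pair of functions. They are a sketch written to match the description and the outputs shown, under hypothetical names; they are not necessarily identical to the definitions of Foo and Foo2.

```r
# Without return: the values in the if clauses are computed but discarded,
# and the last expression 3 * n is always the output
foo_no_return <- function(n) {
  if (n <= 10) n
  if (n > 10 & n <= 20) 2 * n
  3 * n
}

# With return: the function stops as soon as a return is executed
foo_with_return <- function(n) {
  if (n <= 10) return(n)
  if (n <= 20) return(2 * n)
  3 * n
}

foo_no_return(15)
foo_with_return(15)
```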
Sometimes one wants to provide several values or even objects as the outputs
from the function. While we did not explicitly mention it so far, we did define
several functions previously whose outputs were vectors or matrices – this is
a very natural way of outputting several values. However, these have to be
of the same type, i.e. one cannot mix numbers with characters, for instance
(if one tried to define a vector with both value types, all numbers would be
turned to characters – feel free to try it). And in other situations, one does
not only want to output several values of different types, but also different
objects, like a number, a vector and a matrix. Recall that we saw such
outputs in Chapter 9: all of the optimization functions always provided as
their output lists of objects, among them the point x at which the optimal
value is achieved, and the optimal value itself. In the case of functions of
several variables, x was a vector with as many entries as the number of
variables of the function.
In this section, we discuss how to create lists and a few important things
about them. Generally, lists are a handy way of combining several objects,
possibly of different types, into one bigger object. Let’s say we want to
combine the following three outputs:
As we can see, the first object is a numeric vector; the second one is a single
character; and the last one is a numeric matrix. If we tried to combine
them e.g. in a vector, because of the presence of "Hello", all numbers would
## [[1]]
## [1] 1 2 3 4 5
##
## [[2]]
## [1] "Hello"
##
## [[3]]
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
The individual objects in a list can be accessed by the use of double square
brackets, and within them (if they are e.g. vectors or matrices), the usual
indexing works:
all_together[[1]]
## [1] 1 2 3 4 5
all_together[[3]][1, 3]
## [1] 7
In Chapter 9, we have already mentioned that if the objects in the list are
named, they might also be accessed through this name, using the dollar sign.
Let’s try:
all_together$some_output
## NULL
## [1] 1 2 3 4 5
## $some_output
## [1] 1 2 3 4 5
##
## $other_output
## [1] "Hello"
##
## $another_output
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
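The NULL above is returned because the entries of the list have no names yet; the named output appears only after names have been assigned. The naming step itself is not shown here. One possible way, assuming names() is used (the list could equally be created with names directly), is sketched below.

```r
all_together <- list(1:5, "Hello", matrix(1:9, nrow = 3))
all_together$some_output          # NULL: no entry carries this name yet

# Assign the names that appear in the printed output above
names(all_together) <- c("some_output", "other_output", "another_output")
all_together$some_output          # 1 2 3 4 5
all_together[["other_output"]]    # "Hello"
```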
Let’s write our first function that gives a list as its output: a function
that, for a given number x, finds
the largest even natural number smaller than x, and provides the information
whether this number is divisible by 3 in a separate character object.
Solution. First we realize that for x ≤ 2, there is no even natural number
smaller than x. We will therefore start with the case of x ≤ 2 and then
continue to implement the rest of the function for values larger than 2.
Note that we first define max_div_2 as the maximal value in the sequence
1:x. If x is a natural number, it is included in the sequence, and if it is
moreover even, max_div_2 will be assigned this value. In that case, we have
to reduce it by 2 (so that it remains natural and even, but is smaller
than x). If x is a decimal number, 1:x will end at the largest
natural number below x.
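The function itself is not reproduced here. The following is only a sketch that follows the description above and reproduces the outputs shown below for foo(3) and foo(18.2). The return value for x ≤ 2 and the way an odd max_div_2 is handled are our assumptions.

```r
foo <- function(x) {
  # Assumption: for x <= 2 we simply return a message
  if (x <= 2) return("There is no even natural number smaller than x.")

  max_div_2 <- max(1:x)            # largest natural number not exceeding x
  if (max_div_2 %% 2 == 1) {
    max_div_2 <- max_div_2 - 1     # odd: the next smaller number is even
  } else if (max_div_2 == x) {
    max_div_2 <- max_div_2 - 2     # x itself is even: go down by 2
  }

  is_divisible_3 <- if (max_div_2 %% 3 == 0) "divisible by 3" else "not divisible by 3"
  list(multiple_of_2 = max_div_2, is_divisible_3 = is_divisible_3)
}
```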
Let us check the outcome:
foo(1)
foo(3)
## $multiple_of_2
## [1] 2
##
## $is_divisible_3
## [1] "not divisible by 3"
foo(18.2)
## $multiple_of_2
## [1] 18
##
## $is_divisible_3
## [1] "divisible by 3"
foo2(3)
## $multiple_of_2
## [1] 2
##
## $is_divisible_3
## [1] "not divisible by 3"
foo2(18.2)
## $multiple_of_2
## [1] 18
##
## $is_divisible_3
## [1] "divisible by 3"
11.5 which
Another useful function we want to introduce here is which. It tells us
which entries of a vector satisfy a certain condition: it outputs not their
values, but their indexes. For instance, let’s assume that a
vector v has been defined. max(v) finds the maximal value in v; which(v ==
max(v)) tells us where within v this value is located. Similarly, which can
tell us the indexes of all even values within v.
## [1] 18
which(v == max(v))
## [1] 6
which(v%%2 == 0)
## [1] 2 3 6 7
To find out the index of the minimal or maximal value within a vector, there
are also the functions which.max and which.min.
which.max(v)
## [1] 6
which.min(v)
## [1] 1
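The definition of v is not shown in this section. As a sketch, the hypothetical vector below (our own choice, picked so that it reproduces the outputs above) lets you try these calls yourself.

```r
v <- c(1, 4, 6, 7, 9, 18, 10)   # hypothetical vector, not the one from the text

max(v)                    # 18
which(v == max(v))        # 6
which(v %% 2 == 0)        # 2 3 6 7
which.max(v)              # 6
which.min(v)              # 1

# The indexes can be used to extract the corresponding values
v[which(v %% 2 == 0)]     # 4 6 18 10
```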
# chunk 1:
x <- 1:10000
a <- 5
sum(a*x)
## [1] 250025000
# chunk 2:
x <- 1:10000
a <- 5
a*sum(x)
## [1] 250025000
Before you continue reading, think about the following: Which chunk requires
fewer mathematical operations? Which of them do you believe should be less
computationally costly?
The second thing you should bear in mind is that although for and while
cycles can be very useful, they should not be overused (though we do admit
that in order to demonstrate their use or to test your understanding of these
cycles as well as other concepts, we sometimes use them in a not very efficient
way – for example in some exercises at the end of this chapter). The reason is
simply that they are computationally costly, especially the while cycle that
in each iteration needs to check some condition to decide whether it should
keep running. Let’s add a third chunk to the above competition:
# chunk 3
x <- 1:10000
a <- 5
s <- 0  # start from zero so that a*x[1] is not counted twice
for(i in seq_along(x)) s <- s + a*x[i]
s
## [1] 250025000
Not only does this code use more mathematical operations than necessary, it
also uses a cycle. That makes it by far the least efficient of the three chunks,
and the difference in running times could actually already be observed for
much shorter vectors x than the difference between chunks 1 and 2.
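To see the difference in practice, one can time the three approaches. The sketch below uses system.time and repeats each computation many times so that the running times become measurable; the number of repetitions (1000) and the names r1, r2, r3 are arbitrary choices of ours, and the measured times depend on the machine.

```r
x <- 1:10000
a <- 5

r1 <- sum(a * x)                                  # 10000 multiplications, one sum
r2 <- a * sum(x)                                  # one sum, one multiplication
r3 <- 0
for (i in seq_along(x)) r3 <- r3 + a * x[i]       # loop version, starting from 0

c(r1, r2, r3)                                     # all three values agree

# Rough timings over 1000 repetitions each
system.time(for (k in 1:1000) sum(a * x))
system.time(for (k in 1:1000) a * sum(x))
system.time(for (k in 1:1000) { s <- 0; for (i in seq_along(x)) s <- s + a * x[i] })
```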
11.7 Exercises
For the following tasks, first try to solve them without using R. Though it
might seem counterintuitive, one of the best ways to learn programming is to
write or read code on paper rather than directly typing and running it, as it
makes you think about it in a more detailed way. Once you believe you have found
an answer, check it in R.
11.1 For an integer n, explain what the following function does. (Please do
not explain the code line by line. Your task is to recognize what the outcome
of the function is.)
11.2 For an integer n, the code below should find the largest integer x for
which it holds that $\sum_{i=1}^{x} i < n$. However, there are some mistakes
in the code. Find and correct them.
11.3 For a matrix A, the code below should find the sums of the rows and the
columns of the matrix (equivalently to the functions rowSums and colSums,
however, without using these functions). The outcome should be a list of two
vectors containing the row sums in the first one and the column sums in the
second one. However, there are some mistakes in the code. Find and correct
them.
Answers to the exercises
1.1 a) I, b) N, c) I, d) Q.
1.3 a) 0.30625y kg, b) $\frac{0.75}{0.30625} \approx 2.45$ euros, c) $\frac{y}{9.1}$ kg.
1.4 a) $x \in \{-1 - \sqrt{2}, -1 + \sqrt{2}\}$, b) $x = \frac{1}{2}$, c) $x \in \emptyset$, d) $x \in \emptyset$,
e) $x = 5$, f) $x \in \{-1, \frac{9}{16}\}$, g) $x \in \{-15, 3\}$, h) $x \in \emptyset$ $x = -\frac{1}{1+a}$ if
$a \in (-\infty, -1) \cup (0, \infty)$ and no solution otherwise, i) $x = -3$ if $a = -\frac{9}{7}$,
$x = -7$ if $a = 1$, $x = -14$ if $a = 16$, $x = -3 \pm \sqrt{9 + 7a}$ if $a \in (-\frac{9}{7}, 1) \cup (1, 16) \cup (16, \infty)$
and no solution otherwise.
1.5 300.
1.6 Q = 0 or Q = 50.
1.8 52 .
1.11 a) 3, b) $\frac{1}{2}$, c) 0.1, d) 6, e) 0, f) $-\frac{1}{2}$.
1.17 a) $\sum_{i=1}^{909} 11i$, b) $\sum_{i=1}^{50} i^2$, c) $\sum_{i=1}^{100} (-1)^{i+1} i x^{2i}$ (note that these are possible
ways but the notation is not unique).
1.18 a) $a_3 = 4$, $\sum_{k=-1}^{2} a_k = 14$, b) $a_3 = -4$, $\sum_{k=-1}^{2} a_k = 2$.
1.19 a) 16, b) 0, c) $\frac{778}{36}$.
1.20 a) 42, b) $\sum_{i=1}^{7} (27 + 5(i - 1))$.
1.21 $a_i = 7 + 4(i - 1)$, $\sum_{i=1}^{4} (7 + 4(i - 1))$.
2.1 The following code presents one possible option; the solution is not
necessarily unique. a) 11^(0:10), b) rep(1:3, 15), c) rep(5:18, each = 3),
d) seq(-500, -100, by = 50), e) seq(55, 33, by = -2), f) rep(seq(2,
14, by = 2), times = 1:7)
2.4 One possible chunk of code that would fulfill the given tasks is the
following:
3.1 $f(x) = 3x + 5$, $g(x) = -(x + 2)(x - 3)$, $h(x) = \log_2 x$, $i(x) = 5 \cdot 2^x$,
$j(x) = \frac{1}{2}(x - 1)^2$, $k(x) = \log_{1/3} x$, $l(x) = -\frac{1}{5} x$, $m(x) = -2 \left(\frac{1}{2}\right)^x$
3.3 a) $S(P) = 5P$, $P^* \in \{5, 20\}$, $D(5) = S(5) = 25$, $D(20) = S(20) = 100$,
b) $D(P) = -2P + 100$, $S(P) = 3P$, $P^* = 20$, $D(P^*) = S(P^*) = 60$, c) $D(P) = -P^2 - 6P + 100$,
$S(P) = 9P$, $P^* = 5$, $D(P^*) = S(P^*) = 45$, d) $D(P) = -18P + 264$,
$S(P) = 4P^2 + 2P$, $P^* = 6$, $D(P^*) = S(P^*) = 156$. To check your code,
compare the resulting figure with the list of required properties of the plot.
Pay particular attention to whether the y-axis is large enough by default or
needs to be adjusted to show all values of both functions in the given area
(this might also depend on which function you plot first).
3.4 The mistakes are: a) function arguments are only listed after the word
function; the function for logarithm is called log; brackets are necessary
in the denominator of part2; star is necessary to indicate multiplication in
part3, b) e in R is exp(1); log(x) is the natural logarithm, to work for
any a correctly, base = a must be included, c) the power 1/8 in part2n
must be in brackets; multiplication stars missing in part2d, d) p1 requires
base = 10; p2 requires the star for multiplication; p3 should be exp(y)
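The points about e and logarithms listed in 3.4 can be tried out directly. This is a small illustration with numbers of our own choosing, not part of the exercise code.

```r
exp(1)               # Euler's number e, approximately 2.718282
log(exp(1))          # log() is the natural logarithm, so this gives 1
log(8, base = 2)     # logarithm to base 2, gives 3
log(1000, base = 10) # logarithm to base 10, gives 3
2 * exp(1)           # multiplication always needs the star
```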
4.1 a) range R, bijective, b) range [−4, ∞), neither for i), surjective for
ii), c) range R+ , injective for i), bijective for ii), d) range R0+ , neither for
i), surjective for ii)
[Figure: plots of the functions g1(x) to g7(x) against x.]
√
4.6 a) h−1 y−2 −1
1 (y) = 3 , b) h2 (y) = 2(4−y), c) h3 is not bijective, d) h−1
4 (y) = 2y,
−1 −1
p
e) h5 is not bijective, f) h6 (y) = 3− (y −4), g) h7 (y) = log2 (y +4)+1,
1
h) h−1
8 (y) = 2
y−1
, domain of h8 is R+ , i) h−1 4
2
9 (y) = (e + 2) − 4, domain of
h9 is R+
x3
4.7 a) M : (1, ∞) → R+ , M (x) = (x−1)2
. M is surjective, not injective,
√
3
inverse does not exist. D : R+ → R+ , D(x) = 3x . D is injective and
√
surjective. D−1 : R+ → R+ , D−1 (x) = 3 9x2 . Note that in both cases, if we
chose a larger codomain, the functions would not be surjective and thus D
2
would not have an inverse, either., b) M : R+ → R0+ , M (x) = (x−1) √
x
. M is
surjective, not injective, inverse does not exist. D : R+ → R+ , D(x) = exp(x−1)2 −1.
0 0
D is surjective, not injective, inverse does not exist. In both cases, a larger
codomain would destroy surjectivity.
c) $M : (0, 3) \to (0, 9)$, $M(x) = 9\,\frac{3-x}{3+x}$. M is injective and surjective.
$M^{-1} : (0, 9) \to (0, 3)$, $M^{-1}(y) = 3\,\frac{9-y}{9+y}$.
4.10 The mistakes are: a) max should be pmax; in ifelse, the second
and third argument should be switched (or the condition should contain !=
instead of ==), b) : instead of to in the definition of tocheck; tocheck is a
vector such that ifelse is required instead of if...else, c) pmin instead
of min, d) ifelse instead of if; < should be <=; else should be removed;
the last line should be final.class instead of ratings
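The difference between max and pmax (and between if and ifelse) that underlies these corrections can be seen in a small sketch; the vectors x and y below are hypothetical and not taken from the exercise.

```r
x <- c(1, 5, 3)
y <- c(4, 2, 6)

max(x, y)     # a single number: the largest value overall, 6
pmax(x, y)    # elementwise maximum: 4 5 6
pmin(x, y)    # elementwise minimum: 1 2 3

# ifelse() evaluates the condition for every entry of a vector,
# whereas if ... else expects a single TRUE or FALSE
ifelse(x > 2, "large", "small")   # "small" "large" "large"
```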
5.1 In the following, D(f ) denotes the domain of function f . a) f1′ (x) = 2x+3x2 ,
D(f1 ) = D(f1′ ) = R, b) f2′ (x) = 8x − 1, D(f2 ) = D(f2′ ) = R, c) 2√1 x − x23 ,
D(f3 ) = D(f3′ ) = R+ , d) f4′ (x) = √ 2 ′
3 2 , D(f4 ) = R, D(f4 ) = R \ {0},
x
e) f5′ (x) = − x23 − x124 , D(f5 ) = D(f5′ ) = R \ {0}, f) f6′ (x) = − x22 − 35
2 √1
5 3,
x
′ ′ ′
D(f6 ) = D(f6 ) = R \ {0}, g) f7 (x) = 2 cos x − 3 sin x, D(f7 ) = D(f7 ) = R,
h) f8′ (x) = 3x ln 3, D(f8 ) = D(f8′ ) = R, i) f9′ (x) = x ln9 10 , D(f9 ) = R+ ,
D(f9′ ) = R \ {0}
2
√ −1
5.2 a) g1′ (x) = 2x+3− x12 for x ̸= 0, b) g2′ (x) = 12x(x2 +1)5 , c) g3′ (x) = 212x
√ 4x3 −x
′ ′
/ {0, 2 }, d) g4 (x) = √
for x ∈ 1 2 5x+5
√
2
for x ̸= 0, e) g5 (x) = 2x sin x+(x −1) cos x,
4 5x2 +5x 5x
f) g6 (x) = −2 sin(2x + 4), g) g7′ (x) = 2 sin x cos x,
′
h) g8′ (x) = 2x cos(x2 ),
i) g9′ (x) = 3 3sincosx+8
x ′
, j) g10 (x) = esin x cos x
5.3 a) h′1 (x) = x3 + 2x, h1 is decreasing on (−∞, 0], increasing on [0, ∞),
b) h′2 (x) = 2x − 1 for x ̸= −1, h2 is decreasing on (−∞, −1) and (−1, 21 ],
increasing on [ 12 , ∞), c) h′3 (x) = (x+3)
7
2 for x ̸= −3, h3 is increasing on
2x2 +2x+2
(−∞, −3) and on (−3, ∞), d) h′4 (x) = (1−x2 )2
for x ̸= ±1, h4 is increasing
−10(12x3 +2x)
on (−∞, −1), on (−1, 1) and on (1, ∞), e) h′5 (x) = (3x4 +x2 )11
for x ̸= 0, h5
2
√
2x3 −1+2)7
is increasing on (−∞, 0) and decreasing on (0, ∞), f) h′6 (x) = 24x (√2x 3 −1
1 1 1 ′ 1
for x ̸= 3 2 , h6 is increasing on (−∞, 3 2 ) and ( 3 2 , ∞), g) h7 (x) = x+2
√ √ √ for
√
′ x √1
x > −2, h7 is increasing on its domain, h) h8 (x) = e 2 x for x ̸= 0, h8 is
increasing on (−∞, 0) and on (0, ∞)
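5.8 a) $T_1(x) = (x-1) - \frac{(x-1)^2}{2} + \frac{(x-1)^3}{3} - \frac{(x-1)^4}{4} + \frac{(x-1)^5}{5}$;
$T_2(x) = \ln 2 + \frac{x-2}{2} - \frac{1}{4}\frac{(x-2)^2}{2} + \frac{1}{8}\frac{(x-2)^3}{3} - \frac{1}{16}\frac{(x-2)^4}{4} + \frac{1}{32}\frac{(x-2)^5}{5}$
b) $T_0(x) = x - \frac{x^3}{6} + \frac{x^5}{120}$; $T_{2\pi}(x) = (x-2\pi) - \frac{(x-2\pi)^3}{6} + \frac{(x-2\pi)^5}{120}$
c) $T_0(x) = 1 - \frac{x^2}{2} + \frac{x^4}{24}$; $T_{\pi/2}(x) = -(x-\frac{\pi}{2}) + \frac{(x-\frac{\pi}{2})^3}{6} - \frac{(x-\frac{\pi}{2})^5}{120}$
d) $T_0(x) = \sum_{k=1}^{5} (-1)^{k-1} \frac{x^k}{k!}$
e) $T_0(x) = 1 + \sum_{k=1}^{5} (-1)^{k} \frac{x^k}{k!}$.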
5.9 a) [−3, 2], b) [−4, −3], c) [−4, −2] and [0, 2], d) [−2, 0], e) -2 and
0.
5.10 $\mathrm{El}_p D(p) = -2$ for all p. No matter what the current price is, a price
increase of 1% leads to an (approximate) 2% decrease in demand.
5.12 The mistakes are: each = 2 should be removed; pmin instead of min;
closing bracket missing.
5.13 res contains the sum of the elementwise product of x and y (equivalent
to sum(x*y)), mxy is the equivalent of pmax(x, y), that is, the elementwise
maximum of x and y.
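The exercise code is not repeated here. As an illustration only, the following sketch uses a loop of our own (with hypothetical vectors x and y) to show that such a loop and the vectorised calls sum(x*y) and pmax(x, y) give the same results.

```r
x <- c(1, 4, 2)
y <- c(3, 0, 5)

res <- 0
mxy <- numeric(length(x))
for (i in seq_along(x)) {
  res <- res + x[i] * y[i]                      # running sum of the products
  mxy[i] <- if (x[i] > y[i]) x[i] else y[i]     # larger of the two entries
}

res             # 13, the same as sum(x * y)
mxy             # 3 4 5, the same as pmax(x, y)
```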
x4 2 4 3
6.1 a) 4
+2x3 −x2 +C, b) 3x2 +5x+C, c) − 3x13 + x14 +C, d) 34 x 3 −4x 4 +C,
3 √ √ q
3 5
e) x6 − x4 + C, f) 32 x3 + 9 x2 + C, g) 4 x3 + C, h) 52 x 2 + 5x + C,
3
2 √
i) x2 + 2x12 − 3√2x3 + C, j) 14 ln |4x + 15| + C, k) x − 4 x + ln |x| + C,
√ √
l) 12 exp(2x+5)+C, m) 4 sin(0.5+0.25x)+C, n) 38 x8 +C, o) 45 x5 +C.
3 4
6.2 Use integration by parts. a) $\sin(x) - x \cos(x) + C$, b) $\frac{x^2}{2} \ln(x) - \frac{x^2}{4} + C$,
c) $x \ln(x) - x + C$, d) $-\frac{1}{9} e^{-3x}(3x + 1) + C$, e) $\frac{1}{2} \sin^2(x) + C$, f) $\frac{1}{2} \ln^2(x) + C$,
g) $e^x (x^2 + 2x + 2) + C$, h) $\frac{e^x}{2} (\sin(x) - \cos(x)) + C$, i) $\frac{x - \sin(x)\cos(x)}{2} + C$,
j) $\frac{x + \sin(x)\cos(x)}{2} + C$.
6 )8
6.3 Use substitution. a) 0.2(x2 +8)10 , b) (4+x
p
4 4
48
+C, c) 3
(x3 − 2)5 +C,
5 √
d) 31 ln |x3 +8|+C, e) (sin(x)+2)
p
5
+C, f) 23 (x2 + 5x + 1)3 +C, g) 23 1 + x3 +C,
6 9
h) (4+x9 ) + C.
6.4 $f(x) = \frac{x^3}{3} + \frac{2x}{3} + 2$.
6.6 a) 6, b) −30, c) 0, d) $e - \frac{1}{e}$, e) $\ln(\sqrt{2})$, f) e −1
6.7 a = 3, b = 2, c = 4.
6.8 4.
6.11 $\bar{I} = 3.2$
6.13 88.75.
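7.1 $P = \begin{pmatrix} 3 & 5 & 7 & 9 \\ 8 & 10 & 12 & 14 \\ 7 & 9 & 11 & 13 \end{pmatrix}$ in thousands of EUR, $S = \begin{pmatrix} 16 & 15 & 14 & 13 \\ 10 & 9 & 8 & 7 \\ 14 & 13 & 12 & 11 \end{pmatrix}$
in thousands of units; in R, P*S (element-wise product) gives the revenues in
millions of EUR.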
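A possible way to enter the two matrices of 7.1 in R and to compute the revenues is sketched below. The name R for the result is our choice; byrow = TRUE is used so that the code reads like the matrices above.

```r
P <- matrix(c(3,  5,  7,  9,
              8, 10, 12, 14,
              7,  9, 11, 13), nrow = 3, byrow = TRUE)   # prices, thousands of EUR
S <- matrix(c(16, 15, 14, 13,
              10,  9,  8,  7,
              14, 13, 12, 11), nrow = 3, byrow = TRUE)  # sales, thousands of units

R <- P * S    # element-wise product, NOT the matrix product %*%
R[1, 1]       # 3 * 16 = 48 (millions of EUR)
```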
7.2 For instance $P = \begin{pmatrix} 5 & 2 & 12 \\ 2 & 4 & 0 \end{pmatrix}$, where the rows correspond to the materials
and the columns to the products; $O = \begin{pmatrix} 1000 & 3000 & 5000 \\ 2500 & 0 & 2500 \\ 500 & 3000 & 1800 \end{pmatrix}$, where the rows
correspond to the products and the columns to the countries. Then R can
be obtained in R by P%*%O and the combined raw material amounts required
by rowSums(P%*%O).
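As a sketch, the matrices of 7.2 and the two computations can be written in R as follows (the object names Rmat and total are our own; the matrix R from the answer is stored as Rmat).

```r
P <- matrix(c(5, 2, 12,
              2, 4,  0), nrow = 2, byrow = TRUE)          # materials x products
O <- matrix(c(1000, 3000, 5000,
              2500,    0, 2500,
               500, 3000, 1800), nrow = 3, byrow = TRUE)  # products x countries

Rmat <- P %*% O          # matrix product: materials x countries
total <- rowSums(Rmat)   # combined amount of each raw material
Rmat[1, ]                # 16000 51000 51600
total                    # 118600 38000
```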
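7.3 a) not defined, b) $\begin{pmatrix} 2 & 2 \\ 0 & 3 \end{pmatrix}$, c) $\begin{pmatrix} 2 & 0 \\ 2 & 3 \\ -1 & -1 \end{pmatrix}$, d) $\begin{pmatrix} -1 & 1 & 4 & 1 \\ -1 & 0 & 2 & 1 \end{pmatrix}$, e) not
defined, f) $\begin{pmatrix} 1 & 2 \\ 4 & 0 \end{pmatrix}$, g) $\begin{pmatrix} 1 & 0 & -1 \\ 2 & 2 & -1 \\ 0 & -4 & -2 \end{pmatrix}$, h) $\begin{pmatrix} 2 & 4 \\ 4 & 1 \end{pmatrix}$, i) not defined, j) $\begin{pmatrix} 9 & 2 \\ 4 & 0 \end{pmatrix}$,
k) $\begin{pmatrix} 1 & 4 \\ 4 & 8 \end{pmatrix}$, l) not defined.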
−1 −1 −1 −1.8 1.2
7.9 a) X = D +D C D if both C and D are invertible, b) X = ,
1.25 − 31
c) iv, vii.
7.11 a) $\begin{pmatrix} 3 & -0.4 \\ -1 & 1.2 \end{pmatrix}$, b) $\begin{pmatrix} -2 & 4.5 \\ -5 & 3.5 \\ 1 & -1 \end{pmatrix}$, c) $\begin{pmatrix} -6 & 1 & -3 \\ 5.5 & 6.5 & 4 \end{pmatrix}$.
7.13 v1 contains the sums of the rows of A, v2 the minimal values in each
column of A.
}
return(A)
}
7.18 a) G, nrow, c(1, 2), 2, for, 4, sum, i, 30, paste, SC, b) "The sum
in column 3 is 33." "The sum in column 4 is 40."
q q 2 +5xy
3 x
8.1 a) f1 (x, y) = 2 y
, f2 (x, y) = − 12 x
y3
, b) g1 (x, y) = (2x + 5y) ex ,
2
g2 (x, y) = 5x ex +5xy , c) h1 (x, y) = yxy−1 , h2 (x, y) = xy ln(x), d) i1 (x, y) =
3
3x2 ln(4xy) + x2 , i2 (x, y) = xy . Plot the functions in R, on a set (intervals
for x, y) of your choice. Use different types of plots to get some exercise on
the 3D plotting functions and observe the behaviour of the functions.
√
x
8.4 fx (x, y) = x+y 2
+ 6√1xy , fy (x, y) = x+y− 3y 2
, fxx (x, y) = − (x+y)2
2 −
√1
12 x3 y
,
2 √1 2 √1
fxy (x, y) = fyx (x, y) = − (x+y) 2 − 6 xy 2 , fyy (x, y) = − (x+y)2 − 6 xy 2 . At
(x0 , y0 ) increasing in both x and y, around the point concave in both x and
y. q
2 2 2
gx (x, y) = 2x ex −y − 2√1xy , gy (x, y) = − ex −y + 21 yx3 , gxx (x, y) = 4x2 ex −y + √1 3 ,
4 x y
q
x2 −y 2 −y
gxy (x, y) = gyx (x, y) = −2x e + √ 3 , gyy (x, y) = e
1 x 3 x
− 4 y5 . At
x xy
5y 2 −4x
8.7 a) y ′ = − xy , b) y ′ = y3 , c) y ′ = 2+x
1−y
, d) y ′ = 18y 2 +8y−10xy
, e)
′ 3x2 cos(6y 3 )−8 cos(10y+8x) ′ y
y = 18x3 y 2 sin(6y 3 )+10 cos(10y+8x)
, f) y = −x.
8.10 a) 4.25, b) she must increase milk chocolate by 1.5h grams per week,
c) difference (old minus new expenses) is $h p_d - 1.5 h p_m$, she saves if it’s
positive, i.e. if $p_d > 1.5 p_m$.
9.1 a) x = 0 local
√
minimum, b) x =√ 21 local minimum, c) x = 0 local
maximum, d) 3−2 7 local maximum, 3+2 7 local minimum, e) x = −3 local
minimum, x = 1 neither, f) x = 1e local maximum
9.3 Local minima at x = 3 (with value 73) and x = 8 (with value 48),
local maxima at x = 4 (with value 80) and x = 15 (with value 685). Global
minimum at x = 8, global maximum at x = 15.
9.5 x = 5, Π(x) = 2050. Don’t forget to check the second order conditions.
9.6 x = 24, Π(x) = 40722. Don’t forget to check the second order conditions.
9.7 a) (0, 0) local minimum, b) (0, 0) saddle point, (1, 1) local minimum,
c) (0, 0) undetermined, (1, −1) and (1, 1) saddle points, d) (4, 1) and (4, −1)
local minima
9.9 $h = \frac{1400}{3}$, $s = 875$, $1000\lambda \approx 2455.09$
10.3 €10000 today; the other offer has a present value of €9482.99.
}
}