Lecture Notes

The document is a comprehensive guide on Quantitative Methods, covering topics from basic numbers and algebra to advanced concepts like optimization and matrix algebra. It includes sections on using R for statistical analysis, derivatives, integration, and financial calculations. Each chapter contains exercises and further readings to enhance understanding.
Quantitative Methods I

Jana Hlavinová, Zehra Eksi, Nurtai Meimanjan

This version: October 16, 2024


Contents

Preface

1 The basics
1.1 Numbers
1.1.1 Scientific notation
1.2 Elementary algebra
1.2.1 Equations in one variable
1.2.2 Absolute value
1.2.3 Inequalities
1.2.4 Exponential expressions - powers
1.2.5 Logarithms
1.3 Statement logic
1.4 Summation and product notation
1.5 Exercises
1.6 Further readings

2 Getting started with R
2.1 Download, installation and basic view
2.2 R as a calculator
2.3 Help
2.4 Storing values
2.5 Vectors
2.5.1 Vectors with patterns
2.5.2 Comparison and subsetting
2.6 More on logicals
2.7 Corner cases
2.8 Some useful advice and workspace management
2.9 Exercises
2.10 Further readings

3 Functions of one variable
3.1 The definition of a function
3.2 Defining and plotting functions in R
3.2.1 Plotting several functions in one plot
3.3 Linear functions
3.3.1 Plotting linear functions in R
3.4 Polynomial functions
3.5 Power functions
3.6 Exponential and logarithmic functions
3.7 Defining functions with default values in R
3.8 Exercises
3.9 Further readings

4 More on functions
4.1 Transformations of graphs
4.2 if...else and ifelse
4.3 Minimum and maximum in R
4.4 New functions from old
4.5 Injections, surjections, bijections and inverse functions
4.6 Common properties of functions
4.7 Exercises
4.8 Further readings

5 Derivatives
5.1 The definition of a derivative
5.2 The for cycle
5.3 Calculating derivatives
5.4 Derivatives and the properties of a function
5.5 Higher order derivatives
5.6 Applications
5.6.1 Taylor approximation
5.6.2 Elasticity
5.7 Exercises
5.8 Further readings

6 Integration
6.1 Indefinite integral
6.1.1 Basic rules for indefinite integrals
6.1.2 Integration by parts
6.1.3 Integration by substitution
6.2 Definite integral and areas
6.2.1 Definite integral for general functions
6.3 Applications
6.3.1 Consumer and producer surplus
6.3.2 Average function value
6.4 Exercises
6.5 Further readings

7 Matrix algebra
7.1 Matrix terminology
7.1.1 Special matrices
7.2 Matrices in R
7.2.1 Defining matrices
7.2.2 Accessing matrix entries
7.2.3 Extending and combining matrices
7.2.4 Managing matrix dimensions
7.3 Matrix operations
7.3.1 Matrix operations in R
7.3.2 Applications of matrix multiplication and matrix powers
7.4 Matrix transpose and symmetric matrices
7.5 Systems of equations in matrix form
7.6 The inverse and determinant
7.6.1 Solving matrix equations
7.6.2 Determinant and definiteness
7.7 Exercises
7.8 Further readings

8 Functions of several variables
8.1 Plotting functions of two variables in R
8.2 Partial derivatives
8.3 Implicit differentiation
8.4 Higher order partial derivatives
8.5 Exercises
8.6 Further readings

9 Optimization
9.1 Single variable optimization
9.1.1 Single variable optimization in R
9.2 Multivariate optimization
9.2.1 (Unconstrained) Multivariate optimization in R
9.3 Constrained optimization
9.4 Constrained optimization in R
9.5 Global extremes
9.6 Exercises
9.7 Further readings

10 Interest rates and time value of money
10.1 Interest periods and effective rates
10.2 Present and future value
10.3 Annuities and mortgage repayments
10.4 Internal rate of return
10.5 Exercises
10.6 Further readings

11 Further useful R functions and tips
11.1 while cycle
11.2 return
11.3 Several outputs from a function
11.4 floor and ceiling
11.5 which
11.6 Computational efficiency
11.7 Exercises

Answers to the exercises

Preface

The goal of these lecture notes is to summarize, in slightly more detail
than in the slides and in one place rather than across several recommended
readings, the contents of the course Quantitative Methods I taught at the
Vienna University of Economics and Business as part of the curriculum of the
Bachelor in Business and Economics program. We start with a chapter on
the basics that will not be directly covered in the course but that we expect
the students to know. Chapter 2 introduces R, starting with brief instructions
on where to download the necessary software and then explaining its basic
workings. From Chapter 3 onwards, further features of R are introduced
where appropriate, complementing the mathematical contents of the individual
chapters. Chapters 3 and 4 deal with functions: they first introduce formally
what a function is, together with some basic classes of functions, and then
build on this knowledge to create new functions and to discuss some basic
function properties. In Chapter 5, we introduce derivatives, followed in
Chapter 6 by integration, the 'inverse' process of differentiation. In
Chapter 7, we discuss the basics of matrix algebra. In Chapter 8, we extend
the concepts of Chapters 3-5 to functions of several variables. Chapter 9
presents an overview of one of the most prominent applications of derivatives,
namely optimization. Chapter 10 introduces some basic concepts of financial
mathematics, namely the time value of money and how it is influenced by
interest rates.

In each chapter, we cover the theoretical aspects of the topic and illustrate
them with a handful of examples. At the end of each chapter, we provide a
collection of exercises, and we close with a short section with recommendations
for further readings and exercises from other sources. In particular, we
generally refer to the following two books for more details on the topics
covered:

• Sydsaeter, Knut; Hammond, Peter; Strom, Arne; Carvajal, Andres:
  Essential Mathematics for Economic Analysis, 6th edition, Pearson
  2021 [1]

• Braun, W. John; Murdoch, Duncan J.: A first course in statistical
  programming with R, 2nd edition, Cambridge 2016 (3rd edition from
  2021 is also available; references in these lecture notes are to the 2nd
  edition) [2]

While in the above books the presentation of the material might be more
detailed, we see the main advantage of this document in the fact that it
integrates both the mathematical aspects and the application in R of the
covered topics.

If you find any mistakes or typos, we would be grateful if you reported them
to jana.hlavinova@wu.ac.at so that we can further improve the lecture notes.

Chapter 1

The basics

In this chapter, we recall some basic mathematical concepts, such as those
of elementary algebra, exponential and logarithmic functions, or statement
logic. We will also introduce some of the notation used in the course. While
the content of this chapter is not discussed in the course and will not be tested
directly, it is assumed that you are familiar with it, and knowledge of it will
often be necessary to successfully solve the tasks of the home assignments or
exams. Therefore, if you did not learn these concepts at school, or if you feel
like you forgot a lot, we strongly recommend you study this chapter before
the course starts. Of course, you can also come back to it whenever you feel
the need.

1.1 Numbers

In this course, we work with real numbers, denoted by $\mathbb{R}$. These are
all the usual numbers that you can think of, such as
$0, 5, -153, 75.2, \sqrt{2}, \pi, \dots$ (note that we use a dot as the decimal
separator, following the English conventions and at the same time ensuring
consistency with the usage of R). The set of all real numbers contains other
sets of numbers such as:

• $\mathbb{N}$ – natural numbers: positive integers, i.e. numbers with no
  decimal places: $1, 2, 3, 4, \dots$,

• $\mathbb{Z}$ – integers, both positive and negative, and 0,

• $\mathbb{Q}$ – rational numbers: all numbers that can be written in the form
  $\frac{p}{q}$ with $p \in \mathbb{Z}$, $q \in \mathbb{N}$. These are all
  integers, numbers with a finite number of decimal places, and numbers with
  periodic decimals,

• $\mathbb{I} = \mathbb{R} \setminus \mathbb{Q}$ – irrational numbers: all
  numbers with infinitely many decimal places that do not repeat periodically.
  These cannot be written as fractions. Typical examples are $\sqrt{2}$ (or the
  square root of many other numbers, including but not limited to the prime
  numbers) or $\pi$.

Note that we have the following chain of set inclusions:
$\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R}$.

In certain situations, we might need to include or exclude some numbers from
a particular set, so we will also use the following notation:

• $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$ – natural numbers with 0,

• $\mathbb{R}_+ = (0, \infty)$ – positive real numbers,

• $\mathbb{R}_+^0 = [0, \infty)$ – positive real numbers with 0,

• $\mathbb{R}_- = (-\infty, 0)$ – negative real numbers,

• $\mathbb{R}_-^0 = (-\infty, 0]$ – negative real numbers with 0.

Remark. For intervals, we use round brackets ( ) to indicate open
intervals (endpoint(s) excluded) and square brackets [ ] to indicate closed
intervals (endpoint(s) included). Note that these might be combined to
obtain intervals that are open at one end and closed at the other.

1.1.1 Scientific notation

For particularly large or particularly small numbers, one sometimes does not
wish to write them out with all of their digits, in particular if there are a
lot of zeroes after the last or before the first non-zero digit (e.g. the value
is in millions). In such cases, we use a compact way of writing numbers called
scientific notation. In scientific notation, a non-zero number is written in
the form $m \cdot 10^n$ where $n$ is an integer and $1 \le |m| < 10$. The values
$n$ and $m$ are called the exponent and mantissa (or significand), respectively.
Sometimes, for instance in calculators, the notation $m\mathrm{e}n$ is used (be
careful, e does not stand for Euler's number here), that is, instead of writing
e.g. $3.42 \cdot 10^{-2}$, one would write 3.42e-2 or 3.42e-02. In the
following, we give a few examples of numbers written in both the 'usual' way
and in scientific notation:

• The distance between the Earth and the Sun: approximately
  149600000 km = 1.496e8 km.

• The size of an atom: approximately 0.1 nm = 0.1/1000000000 m = 1e-10 m.

• Alcohol limit for drivers in Austria: 0.5‰ = 0.0005 = 5e-04.

To translate a number from scientific notation to the usual representation,
just multiply the mantissa by the corresponding power of 10. The other
way around might be slightly more complicated at the beginning, but there
really is not much to it: Just find out what power of 10 you need to divide
the number by to achieve a value that is (in absolute value) not smaller than
1 and at the same time smaller than 10. The result of this division is your
mantissa, whereas the required power gives you the exponent.
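
The conversion described above can be sketched in R (the software introduced in Chapter 2). This is only an illustration under stated assumptions: the helper name `sci_parts` is hypothetical and not part of R or of these notes.

```r
# Scientific-notation literals are ordinary numbers in R.
earth_sun_km  <- 1.496e8
alcohol_limit <- 5e-04

# Hypothetical helper (not from the notes): split a non-zero number x into
# mantissa m and exponent n such that x = m * 10^n and 1 <= |m| < 10.
sci_parts <- function(x) {
  stopifnot(x != 0)
  n <- floor(log10(abs(x)))  # the power of 10 we need to divide by
  m <- x / 10^n              # the result of the division is the mantissa
  c(mantissa = m, exponent = n)
}

sci_parts(149600000)  # mantissa 1.496, exponent 8
sci_parts(0.0005)     # mantissa 5, exponent -4 (up to rounding error)
```

Because computers store decimal numbers only approximately, the mantissa may differ from the exact value in the last digits; numbers that are exact powers of 10 are the cases most sensitive to such rounding.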

1.2 Elementary algebra

In algebra, we work with variables that might take different values. In some
situations, we want to solve equations to find certain unknown values; in
other cases, we incorporate variables on purpose to keep flexibility when
describing certain aspects of the world. Variables (and parameters) are
usually denoted with Latin letters; sometimes you will also encounter Greek
letters. Moreover, various indices or accents, like a hat, a bar or a prime,
can be used to introduce new variables.

Irrespective of whether variables or only numbers are involved, the following
simple algebraic rules apply for any $a, b, c, d \in \mathbb{R}$:

\begin{align}
-(-a) &= a \tag{1.1}\\
-(a + b) &= -a - b \tag{1.2}\\
a(b + c) &= ab + ac \tag{1.3}\\
ab &= ba \tag{1.4}\\
(a + b)(c + d) &= ac + ad + bc + bd \tag{1.5}\\
(a + b)^2 &= a^2 + 2ab + b^2 \tag{1.6}\\
(a - b)^2 &= a^2 - 2ab + b^2 \tag{1.7}\\
(a + b)(a - b) &= a^2 - b^2 \tag{1.8}
\end{align}

Formulae (1.6) and (1.7) belong to the binomial formulae, since they describe
the algebraic expression of a power (in this case a square) of a binomial term.
For fractions, the following basic rules apply for $a, b, c, d \in \mathbb{R}$:

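\begin{align}
\frac{a \cdot c}{b \cdot c} &= \frac{a}{b} && \text{if } b, c \neq 0 \tag{1.9}\\
\frac{-a}{b} &= \frac{a}{-b} = -\frac{a}{b} && \text{if } b \neq 0 \tag{1.10}\\
\frac{a}{c} + \frac{b}{c} &= \frac{a + b}{c} && \text{for } c \neq 0 \tag{1.11}\\
\frac{a}{b} + \frac{c}{d} &= \frac{ad + cb}{bd} && \text{for } b, d \neq 0 \tag{1.12}\\
\frac{a}{b} \cdot \frac{c}{d} &= \frac{ac}{bd} && \text{for } b, d \neq 0 \tag{1.13}\\
\frac{a}{b} \div \frac{c}{d} &= \frac{a}{b} \cdot \frac{d}{c} = \frac{ad}{bc} && \text{for } b, c, d \neq 0 \tag{1.14}
\end{align}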
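
Rules such as (1.6), (1.8), (1.12) and (1.14) can be spot-checked numerically. The following is a minimal sketch in R (introduced in Chapter 2) with arbitrarily chosen test values; a check for a few numbers is of course not a proof of the rules.

```r
# Arbitrary non-zero test values (an assumption for illustration only).
# cc and dd play the role of c and d; c() is also R's function for
# combining values, so we avoid reusing that name.
a <- 2; b <- -3; cc <- 5; dd <- 7

checks <- c(
  rule_1_6  = isTRUE(all.equal((a + b)^2, a^2 + 2 * a * b + b^2)),
  rule_1_8  = isTRUE(all.equal((a + b) * (a - b), a^2 - b^2)),
  rule_1_12 = isTRUE(all.equal(a / b + cc / dd, (a * dd + cc * b) / (b * dd))),
  rule_1_14 = isTRUE(all.equal((a / b) / (cc / dd), (a * dd) / (b * cc)))
)
checks  # all TRUE for these values
```

The comparison uses `all.equal()` rather than `==` because fractions are stored with small rounding errors.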

The formulae (1.1)-(1.14) are often used to simplify various expressions.
However, one has to be careful when simplifying expressions including variables
(this applies in particular to equations; we will illustrate this later on). By
cancelling out terms that e.g. occur in both numerator and denominator of
a fraction, one might lose information about when the fraction (or any other
expression) at hand is well defined. Therefore we always have to write down
what values the variable is allowed to take. We show this in the following
example:
Problem 1.1. Find the values for which the expression
$$\frac{(x + 5)(x - 6)}{36 - x^2}$$
is well defined. Then simplify the expression.

Solution. This expression is well defined as long as the denominator is
non-zero, that is, for $x \neq \pm 6$. Note that for the denominator, we have
$$36 - x^2 = (6 - x)(6 + x).$$
Since $x - 6 = -(6 - x)$, for values for which the expression is well defined
we can write
$$\frac{(x + 5)(x - 6)}{36 - x^2} = -\frac{x + 5}{x + 6}.$$
Note that if we used this final expression $-\frac{x + 5}{x + 6}$ to identify
the allowed values, we would only find $x \neq -6$.
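
The loss of information can also be seen numerically. The following minimal R sketch (function names `original` and `simplified` are chosen here for illustration) compares the expression from Problem 1.1 with its simplified form $-\frac{x+5}{x+6}$ at a few sample points and at the excluded value $x = 6$.

```r
original   <- function(x) (x + 5) * (x - 6) / (36 - x^2)
simplified <- function(x) -(x + 5) / (x + 6)

x <- c(-10, -1, 0, 2.5, 7)             # sample points different from -6 and 6
all.equal(original(x), simplified(x))  # TRUE: both forms agree here

original(6)    # NaN (0/0): the original expression is not defined at x = 6
simplified(6)  # -11/12: the simplified form hides this restriction
```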

1.2.1 Equations in one variable

Sometimes, the information about a certain value is given somewhat indirectly,
in the form of an equation that needs to be solved to identify the value. To
this end, a sequence of more or less complicated steps that do not change
the information encoded in the equation, so-called equivalent operations, is
performed. The simplest equivalent operations are adding a constant to or
subtracting it from both sides of the equation, and multiplying or dividing
both sides by a (in case of division non-zero) constant. Further possibilities
include taking the reciprocal or logarithm of both sides, provided that these
are defined for both sides. In case of other operations, for which there might
be several valid outcomes (like taking the square root), one has to be more
careful.

The simplest type of equations are linear equations. In a linear equation,
the variable of interest, let us denote it by $x$, only appears in linear
terms, i.e. in the form $ax$ for some $a \in \mathbb{R}$. Here, $a$ is called
a parameter. It might be given a particular value, but often, it represents a
whole set of possible values and is used to create a general class of
equations. For this type of equations, addition, subtraction, multiplication
and division are enough to solve the equation. The basic form of a linear
equation is $ax + b = 0$ for $a \neq 0$. By subtracting $b$ from both sides
and then dividing by $a$, we obtain the solution $x = -\frac{b}{a}$. Note that
if the equation is not in this form, that is, if the right hand side is not 0,
the equation can easily be brought into this form by subtracting the right
hand side from both sides of the equation.
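
As a small illustration in R (introduced in Chapter 2), the solution formula $x = -\frac{b}{a}$ can be wrapped in a function. The name `solve_linear` is hypothetical and chosen only for this sketch.

```r
# Hypothetical helper: solve a*x + b = 0 for a != 0.
solve_linear <- function(a, b) {
  stopifnot(a != 0)
  -b / a
}

# Example: 2x + 3 = 7. Subtracting 7 from both sides gives 2x - 4 = 0.
x <- solve_linear(a = 2, b = 3 - 7)
x          # 2
2 * x + 3  # 7, so x indeed solves the original equation
```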

Another common type of equations are quadratic equations, in which the variable
$x$ appears at most in a quadratic term. The basic form of a quadratic equation
is $ax^2 + bx + c = 0$ (note that again, if the equation is not in this form, it
can be brought into it using the basic equivalent operations of addition and
subtraction). There are three cases in which this equation reduces to
something simpler:

• If $a = 0$, the equation at hand is in fact a linear equation $bx + c = 0$
  and the solution is $x = -\frac{c}{b}$ (provided that $b \neq 0$).

• If $b = 0$, we obtain $ax^2 + c = 0$ or equivalently $x^2 = -\frac{c}{a}$.
  If the right hand side is negative, this equation has no solution. If
  $c = 0$ (see also the next case), the solution of this equation is 0.
  Finally, if $-\frac{c}{a} > 0$, we have two solutions,
  $x = \pm\sqrt{-\frac{c}{a}}$. Note the $\pm$ sign in the solution: While the
  square root of a number is defined as one non-negative value, in the case of
  equations we are looking for all the values $x$ for which, in this case,
  $x^2 = -\frac{c}{a}$. Due to the symmetric property of the square,
  $x^2 = (-x)^2$, we have to take the negative of the corresponding square
  root into account, too.

• If $c = 0$, the equation is of the form $ax^2 + bx = 0$ which can be
  rewritten as $x(ax + b) = 0$. As is known, a product of two numbers is equal
  to 0 exactly if (at least) one of the numbers is equal to 0, thus we obtain
  the solutions $x = 0$ and $x = -\frac{b}{a}$.

If none of the three parameters $a, b, c$ is equal to 0, the equation only has
a solution if $b^2 - 4ac \ge 0$. In that case, the solution of the equation is
given by
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
Note that this expression is only well defined if $b^2 - 4ac \ge 0$, which
corresponds to the condition named above. Moreover, if the inequality is
strict, there are two solutions to the equation; for $b^2 - 4ac = 0$, the
single solution is $x = -\frac{b}{2a}$.

The formula given above can in fact be derived quite easily. The expression
$ax^2 + bx + c$ can be rewritten with the help of one of the binomial formulae
as follows (you might check the equality as an exercise or, even better, try to
derive it yourself):
$$ax^2 + bx + c = a\left(x + \frac{b}{2a}\right)^2 - \frac{b^2}{4a} + c.$$
If we set this equal to 0, as in the original equation, we obtain the equation
$$a\left(x + \frac{b}{2a}\right)^2 = \frac{b^2}{4a} - c$$
or equivalently
$$\left(x + \frac{b}{2a}\right)^2 = \frac{b^2}{4a^2} - \frac{c}{a}.$$
Since on the left hand side we have a squared value, the equation can only
have a solution if the right hand side is non-negative:
$$\frac{b^2}{4a^2} - \frac{c}{a} \ge 0 \quad\Longleftrightarrow\quad b^2 - 4ac \ge 0,$$
which corresponds to the condition given above. Assuming that this condition
is satisfied, we can take the square root of both sides to obtain
$$x + \frac{b}{2a} = \pm\sqrt{\frac{b^2}{4a^2} - \frac{c}{a}} = \pm\sqrt{\frac{b^2 - 4ac}{4a^2}} = \frac{\pm\sqrt{b^2 - 4ac}}{2a},$$
which after subtracting $\frac{b}{2a}$ from both sides leads to the known formula.
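
The case distinction on $b^2 - 4ac$ translates directly into code. The following is a minimal sketch in R (introduced in Chapter 2; if...else is covered in Chapter 4). The function name `solve_quadratic` and the argument name `c0` are hypothetical choices for this illustration.

```r
# Hypothetical helper: real solutions of a*x^2 + b*x + c0 = 0 with a != 0.
# (The constant term is called c0 to avoid confusion with R's c().)
solve_quadratic <- function(a, b, c0) {
  stopifnot(a != 0)
  disc <- b^2 - 4 * a * c0                  # the expression b^2 - 4ac
  if (disc < 0) {
    numeric(0)                              # no real solution
  } else if (disc == 0) {
    -b / (2 * a)                            # single solution
  } else {
    (-b + c(1, -1) * sqrt(disc)) / (2 * a)  # two solutions
  }
}

solve_quadratic(1, -11, 30)  # 6 5
solve_quadratic(1, -2, 1)    # 1
solve_quadratic(1, 0, 1)     # numeric(0), i.e. no real solution
```

The exact comparison `disc == 0` is adequate for small integer coefficients as used here; with decimal coefficients, rounding errors can make a tolerance-based comparison preferable.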

Let us now study the relationship between the parameters of the equation
$a, b, c$ and the solutions of the equation if $b^2 - 4ac > 0$. Let us denote
$$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \quad\text{and}\quad x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}.$$
We can observe the following:

• $x_1 + x_2 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} + \frac{-b - \sqrt{b^2 - 4ac}}{2a} = -\frac{b}{a}$.
  In particular, if $a = 1$, the sum of the two solutions is the negative of
  the coefficient of the linear term $x$, i.e. $-b$.

• $x_1 \cdot x_2 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \cdot \frac{-b - \sqrt{b^2 - 4ac}}{2a} = \frac{b^2 - (b^2 - 4ac)}{4a^2} = \frac{c}{a}$.
  In particular, if $a = 1$, the product of the two solutions is the constant
  term $c$.

The above facts allow for a simple way of factoring quadratic expressions:
If an expression $x^2 + px + q$ can be factored into $(x - x_1)(x - x_2)$, then
$x_1$ and $x_2$ are the solutions of the quadratic equation $x^2 + px + q = 0$
and are therefore such that $x_1 + x_2 = -p$ and $x_1 \cdot x_2 = q$.
(Alternatively, we have $x^2 + px + q = (x + y_1)(x + y_2)$ with
$y_1 + y_2 = p$ and $y_1 \cdot y_2 = q$.) To illustrate this, let us consider
an example.

Problem 1.2. Factorize the expression $x^2 - 11x + 30$.

Solution. To find the roots $x_1$ and $x_2$ of the above expression, we consider
possible factorizations of the number 30 (the parameter $q$):
$30 = 1 \cdot 30 = 2 \cdot 15 = 3 \cdot 10 = 5 \cdot 6$. Since $q > 0$, both
$x_1$ and $x_2$ have the same sign; $p < 0$ tells us that they are both
positive. By checking the possible combinations listed above, we get that
$x_1 = 5$ and $x_2 = 6$ such that $x^2 - 11x + 30 = (x - 5)(x - 6)$
(you may check this easily).
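
The search over factor pairs can be automated. The sketch below, in R (introduced in Chapter 2; the for cycle is covered in Chapter 5), uses the relations $x_1 + x_2 = -p$ and $x_1 \cdot x_2 = q$. The function name `factor_monic` and the search range are assumptions made for this illustration, and the function only finds integer roots.

```r
# Hypothetical helper: integer roots x1, x2 of x^2 + p*x + q,
# found via x1 + x2 = -p and x1 * x2 = q.
factor_monic <- function(p, q, search = -50:50) {
  for (x1 in search) {
    x2 <- -p - x1                 # forced by the sum condition
    if (x1 * x2 == q) {           # check the product condition
      return(sort(c(x1, x2)))
    }
  }
  NULL                            # no integer roots in the search range
}

factor_monic(-11, 30)  # 5 6, i.e. x^2 - 11x + 30 = (x - 5)(x - 6)
```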

As was already mentioned in the previous subsection, being careful about
simplifying expressions is of particular importance when solving equations,
and even more so in case of parametric equations. This situation is illustrated
by the following example:

Problem 1.3. Solve the equation
$$\frac{2x - a}{x - 5} = \frac{x - a}{x - 3}$$
with a parameter $a \in \mathbb{R}$.

Solution. Let us analyze the above equation in dependence of $a$:
\begin{align*}
\frac{2x - a}{x - 5} &= \frac{x - a}{x - 3}\\
(2x - a)(x - 3) &= (x - a)(x - 5)\\
2x^2 - ax - 6x + 3a &= x^2 - ax - 5x + 5a\\
x^2 - x - 2a &= 0\\
x_{1,2} &= \frac{1 \pm \sqrt{1 + 8a}}{2}.
\end{align*}
This expression is well defined whenever $1 + 8a \ge 0$. In particular, we have
two real solutions for $a > -\frac{1}{8}$, one solution for $a = -\frac{1}{8}$
and no solution for $a < -\frac{1}{8}$.
However, we have to remember, from the original equation, that $x$ cannot be
equal to 3 or 5. Therefore we check for which values of $a$ the solution above
would lead to any of these values:
For $x \neq 3$, it has to be the case that
\begin{align*}
1 \pm \sqrt{1 + 8a} &\neq 6\\
\pm\sqrt{1 + 8a} &\neq 5\\
1 + 8a &\neq 25\\
a &\neq 3.
\end{align*}
Similarly, for $x \neq 5$ we have
\begin{align*}
1 \pm \sqrt{1 + 8a} &\neq 10\\
\pm\sqrt{1 + 8a} &\neq 9\\
1 + 8a &\neq 81\\
a &\neq 10.
\end{align*}
In both of these situations, plugging the value of $a$ into the original
equation simplifies it such that there is only one solution. (Alternatively,
plugging the value into the formula for $x_{1,2}$, one gets one admissible
solution and one that is excluded.)
To summarize, we have the following for the above equation:

• no solution if $a < -\frac{1}{8}$,

• $x = \frac{1}{2}$ if $a = -\frac{1}{8}$,

• $x = -2$ if $a = 3$,

• $x = -4$ if $a = 10$,

• $x = \frac{1 \pm \sqrt{1 + 8a}}{2}$ if
  $a \in (-\frac{1}{8}, 3) \cup (3, 10) \cup (10, \infty)$.
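
The case analysis of Problem 1.3 can be mirrored in a short R function (R is introduced in Chapter 2). This is only a sketch: the name `solve_param` is hypothetical, and the exact comparison with 3 and 5 relies on the fact that these values arise only for $a = 3$ and $a = 10$, where the square root is computed exactly.

```r
# Hypothetical helper: real solutions of (2x - a)/(x - 5) = (x - a)/(x - 3)
# for a given value of the parameter a.
solve_param <- function(a) {
  disc <- 1 + 8 * a
  if (disc < 0) return(numeric(0))              # no real solution
  x <- unique((1 + c(1, -1) * sqrt(disc)) / 2)  # roots of x^2 - x - 2a = 0
  x[!(x %in% c(3, 5))]                          # x = 3 and x = 5 are excluded
}

solve_param(-1)    # numeric(0)
solve_param(-1/8)  # 0.5
solve_param(3)     # -2
solve_param(10)    # -4
solve_param(1)     # 2 -1
```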

1.2.2 Absolute value

Definition 1.1. The absolute value $|x|$ of a number $x$ gives its distance
from 0:
$$|x| = \begin{cases} x & \text{if } x \ge 0\\ -x & \text{if } x < 0. \end{cases}$$

Loosely speaking, the absolute value of a number is its value without its sign.
Since the definition of the absolute value distinguishes two cases, one can
solve equations involving absolute values by splitting them into cases. We
demonstrate this in the following example.
Problem 1.4. Solve the following equation in $\mathbb{R}$: $|x + 2| = 4|x - 3|$.
Solution. From the definition of absolute value, we have
$$|x + 2| = \begin{cases} x + 2 & \text{if } x \ge -2\\ -(x + 2) = -x - 2 & \text{if } x < -2. \end{cases}$$
Similarly, we have
$$|x - 3| = \begin{cases} x - 3 & \text{if } x \ge 3\\ 3 - x & \text{if } x < 3. \end{cases}$$

Therefore we can split $\mathbb{R}$ into three intervals and solve the equation
for each of them separately:

1. $x \in (-\infty, -2)$:
\begin{align*}
|x + 2| &= 4|x - 3|\\
-x - 2 &= 4(3 - x)\\
x &= \frac{14}{3} \notin (-\infty, -2)
\end{align*}
(thus no solution in this interval)

2. $x \in [-2, 3)$:
\begin{align*}
|x + 2| &= 4|x - 3|\\
x + 2 &= 4(3 - x)\\
x &= 2
\end{align*}

3. $x \in [3, \infty)$:
\begin{align*}
|x + 2| &= 4|x - 3|\\
x + 2 &= 4(x - 3)\\
x &= \frac{14}{3}
\end{align*}

Combining the three cases, we get the solution $x \in \left\{2, \frac{14}{3}\right\}$.
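
The result of Problem 1.4 can be checked in R (introduced in Chapter 2). The sketch below defines the difference of both sides as a function `f` (a name chosen here for illustration) and uses base R's `uniroot()`, which searches for a zero on an interval where the function changes sign.

```r
# f is zero exactly where |x + 2| = 4|x - 3| holds.
f <- function(x) abs(x + 2) - 4 * abs(x - 3)

f(c(2, 14/3))  # both values are (numerically) 0

# Numerical search on [0, 3] and [3, 10]; f changes sign on both intervals.
uniroot(f, c(0, 3), tol = 1e-9)$root   # approximately 2
uniroot(f, c(3, 10), tol = 1e-9)$root  # approximately 14/3
```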

1.2.3 Inequalities

An inequality is a comparison of two numbers or terms that are not (necessarily)
equal. To express the relationship between two terms, four different
mathematical signs might be used:

• $<$ – less than,

• $>$ – greater than,

• $\le$ – less than or equal to / at most,

• $\ge$ – greater than or equal to / at least.

Inequalities that do not allow for equality, that is those with the signs $<$
and $>$, are also referred to as strict inequalities.

By applying mathematical operations to both sides of an inequality, the
inequality can be simplified (solved). However, compared to solving equations,
more caution is needed, since some operations change the direction of
the inequality, or may only be applied for $x$ in a certain range. The
following rules apply:

i) Multiplication by a positive number keeps the inequality sign.

ii) Multiplication by a negative number reverses the inequality sign.

iii) Adding or subtracting a number keeps the inequality sign.

iv) Applying an increasing operation (i.e. a function $f$ for which
$f(x) < f(y)$ whenever $x < y$) keeps the inequality sign.

v) Applying a decreasing operation (i.e. a function $f$ for which
$f(x) > f(y)$ whenever $x < y$) reverses the inequality sign.

Some operations, e.g. taking the reciprocal of a number or taking the square
of a number, are only increasing or decreasing on a certain set of
values. If we want to apply these operations, we have to carefully consider
what interval we are operating on. For instance, taking the reciprocal of
a number is a decreasing operation on $(-\infty, 0)$ (since e.g. $-3 < -2$ but
$-\frac{1}{3} > -\frac{1}{2}$) and on $(0, \infty)$ (since e.g. $5 < 10$ but
$\frac{1}{5} > \frac{1}{10}$), but not on $\mathbb{R} \setminus \{0\}$ as a
whole (since e.g. $-2 < 2$ and $-\frac{1}{2} < \frac{1}{2}$). Therefore, when
taking reciprocals, the inequality sign is reversed if both sides of the
inequality have the same sign (they are both positive or they are both
negative). This, however, is not the case if one side is negative and the
other is positive.
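
The three numerical examples for the reciprocal can be reproduced in R (introduced in Chapter 2). The vector name `checks` is chosen here for illustration only.

```r
checks <- c(
  both_negative = (-3 < -2) && (1 / (-3) > 1 / (-2)),  # sign is reversed
  both_positive = (5 < 10)  && (1 / 5 > 1 / 10),       # sign is reversed
  mixed_signs   = (-2 < 2)  && (1 / (-2) < 1 / 2)      # sign is kept
)
checks  # all TRUE
```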

In other situations, when solving inequalities, one has to consider several
cases, similarly to solving an equation that includes absolute values or powers
of the variable. This can be illustrated on the following inequality:
Problem 1.5. Find all solutions $x \in \mathbb{R}$ of the inequality
$(x - 4)^2 < 16$.

Solution. Recall that if the above inequality were an equation, we would
have two solutions, by considering $x - 4 = \pm\sqrt{16} = \pm 4$, giving us
$x_1 = 0$ and $x_2 = 8$. Therefore, we can split the real axis into three
intervals: $(-\infty, 0)$, $(0, 8)$ and $(8, \infty)$. Note that the end points
are not included in any of the intervals, since we have a strict inequality.
Within these intervals, the expression $(x - 4)^2$ keeps its relative position
to the number 16, i.e. it is either greater than 16 on the whole interval, or
smaller than 16. Therefore we can choose one number from each interval to
check whether the particular interval is or is not a solution of the given
inequality.

For the first interval, let us consider $x = -1$. We have $(-5)^2 > 16$ which
means this interval is not a part of the solution.

For the second interval, let us consider $x = 1$. We have $(-3)^2 < 16$, thus
$x \in (0, 8)$ does solve the inequality.

For the third interval, let us consider $x = 10$. We have $6^2 > 16$ which means
this interval is not a part of the solution.

Combining all the above cases, we get that the complete solution of the given
inequality is $x \in (0, 8)$.
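
The test-point argument of Problem 1.5 can be replayed in R (introduced in Chapter 2), where comparisons are applied to all elements of a vector at once. The grid used in the second part is an arbitrary choice for illustration.

```r
x <- c(-1, 1, 10)  # one test point per interval
(x - 4)^2 < 16     # FALSE TRUE FALSE

# Grid points from -5 to 15 in steps of 0.5 that satisfy the inequality:
grid <- seq(-5, 15, by = 0.5)
range(grid[(grid - 4)^2 < 16])  # 0.5 7.5, all inside (0, 8)
```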

1.2.4 Exponential expressions - powers

For working with powers, the following rules apply
($a, b > 0$, $r, s \in \mathbb{R}$, $n \in \mathbb{N}$):
\begin{align}
a^0 &= 1 \tag{1.15}\\
a^{-r} &= \frac{1}{a^r} \tag{1.16}\\
a^{\frac{1}{n}} &= \sqrt[n]{a} \tag{1.17}\\
a^r \cdot a^s &= a^{r+s} \tag{1.18}\\
a^r \cdot b^r &= (ab)^r \tag{1.19}\\
(a^r)^s &= a^{rs} \tag{1.20}\\
\text{if } r > 0: \quad a^r &< b^r \text{ for } a < b \tag{1.21}\\
\text{if } r < 0: \quad a^r &> b^r \text{ for } a < b \tag{1.22}\\
\text{if } a > 1: \quad a^r &< a^s \text{ for } r < s \tag{1.23}\\
\text{if } a < 1: \quad a^r &> a^s \text{ for } r < s \tag{1.24}
\end{align}
Note that we chose $a, b > 0$ to make sure that terms such as $a^r$ always
exist. The rules above, however, apply whenever all terms are well defined. For
instance in case of equation (1.17), the rule also applies for $a < 0$ if $n$
is odd.
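
Some of these rules can be spot-checked in R (introduced in Chapter 2). The values below are arbitrary choices for illustration. The last two lines concern rule (1.17) for negative $a$ and odd $n$: R returns NaN when a negative number is raised to a fractional power, so the real root has to be computed from the absolute value.

```r
a <- 2; b <- 3; r <- 1.5; s <- -0.5  # arbitrary values with a, b > 0

all.equal(a^r * a^s, a^(r + s))  # TRUE, rule (1.18)
all.equal(a^r * b^r, (a * b)^r)  # TRUE, rule (1.19)
all.equal((a^r)^s, a^(r * s))    # TRUE, rule (1.20)

(-8)^(1/3)                # NaN in R, although the real cube root of -8 is -2
sign(-8) * abs(-8)^(1/3)  # -2 (up to rounding error)
```

Note the parentheses around -8: without them, R evaluates the power first and the minus sign afterwards.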

To illustrate the use of (some of) the above rules, let us solve an example.
Problem 1.6. If $x^{-4} y^6 = 25$, find $x^2 y^{-3} + 2x^{-12} y^{18}$.
Solution. We start by rewriting the expression whose value we are looking
for in terms of $x^{-4} y^6$, with the help of the above rules. Then we can plug in
25 for $x^{-4} y^6$. We get
\begin{align*}
x^2 y^{-3} + 2x^{-12} y^{18} &= (x^{-4} y^6)^{-\frac{1}{2}} + 2(x^{-4} y^6)^3 \\
&= \frac{1}{\sqrt{x^{-4} y^6}} + 2(x^{-4} y^6)^3 \\
&= \frac{1}{\sqrt{25}} + 2 \cdot 25^3 = 31250.2.
\end{align*}
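
As a numerical sanity check of Problem 1.6, one can pick concrete positive values of x and y with $x^{-4} y^6 = 25$ and evaluate the expression in R. The choice x = 2 below is an arbitrary assumption made for illustration; any positive x would do.

```r
# Pick any positive x, then choose y so that x^(-4) * y^6 = 25.
# x = 2 is an arbitrary choice for this check.
x <- 2
y <- (25 * x^4)^(1/6)

x^(-4) * y^6                        # reproduces 25 (up to rounding)
x^2 * y^(-3) + 2 * x^(-12) * y^18   # agrees with the value derived above
```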

A common application of powers is financial mathematics. We will discuss
various forms of compounding in financial mathematics in more detail at the
end of the course. For now, we only consider a simple example.
Problem 1.7. a) Suppose you deposit €1000 in a bank account paying
2% interest at the end of each year. How much do you have in 5 years?

b) Suppose you buy something for €1000 which decreases in value (depreciates)
by 2% per year. How much is it worth after 5 years?

c) Suppose you buy something for €$1000 \cdot 1.02^5$ which depreciates by 2%
per year. How much is it worth after 5 years?

d) Suppose you deposited a certain amount of money 6 years ago in an account
paying 5% interest at the end of each year, and you currently have
€2010.14 in the account. How much money did you deposit?
Solution. a) The amount in the account increases by 2% each year. Recall
that a single increase by r · 100% can generally be expressed by multiplying
by (1 + r): If we consider an amount A, we have A(1 + r) = A + Ar,
which means that to the amount A, we have added a fraction of it
corresponding to r · 100%.
Therefore, after the first year, we have €$1000 \cdot 1.02$ in the account.
At the end of the second year, the process is repeated, but since we
multiply the current value at that point, we have €$1000 \cdot 1.02 \cdot 1.02$
in the account at the end of year 2. Repeating these steps until
we arrive at the end of the fifth year, we find the final sum (in euros) to be
$$1000 \cdot 1.02 \cdot 1.02 \cdot 1.02 \cdot 1.02 \cdot 1.02 = 1000 \cdot 1.02^5 \approx 1104.08.$$

The logic above applies generally: If an amount A is deposited in an
account that pays an interest of r · 100% at the end of each year, then after
t years (after the last interest has been added), the amount in the
account is equal to $A(1 + r)^t$.

b) The value of the bought object decreases by 2% each year. A single
decrease by r · 100% can generally be expressed by multiplying by (1 −
r): If we consider an amount A, we have A(1 − r) = A − Ar, which
means that from the amount A, we have subtracted a fraction of it
corresponding to r · 100%.
Starting with a value of €1000 and using the logic from part a), we can
write down that the value of the bought object after 5 years is
€$1000 \cdot 0.98^5 \approx$ €903.92.

c) Repeating the steps from parts a) and b), we get that after 5 years, the
object is worth €$1000 \cdot 1.02^5 \cdot 0.98^5 \approx$ €998.
Note that the value is not equal to 1000: increasing and then decreasing
a certain price or value by the same percentage does not result in the
original price. The reason is simple: $(1 + r)(1 - r) = 1 - r^2 \neq 1$. If
a businessman decreases the price of a product by r · 100% and would
later like to increase it back to the original price, the increase will be
by more than r · 100%.

d) Following the logic of part a) and denoting the deposited amount by
A, we can write $A \cdot 1.05^6 = 2010.14$. Consequently,
$A = 2010.14 \cdot \frac{1}{1.05^6} \approx 1500$, i.e. the deposit was approximately €1500.
Finding the starting capital by dividing the final amount by $(1 + r)^t$ is
generally called discounting.
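
The four parts of Problem 1.7 can be retraced in R (introduced in Chapter 2). The following is a minimal sketch; the helper functions `compound` and `discount` are hypothetical names chosen here and are not part of base R. A depreciation by 2% is expressed as a negative rate.

```r
# Hypothetical helper functions (our own names, not from base R).
# r is the yearly rate as a decimal, t the number of years.
compound <- function(A, r, t) A * (1 + r)^t   # value after t years
discount <- function(V, r, t) V / (1 + r)^t   # starting capital for final value V

a_val <- compound(1000, 0.02, 5)     # part a): yearly interest of 2%
b_val <- compound(1000, -0.02, 5)    # part b): depreciation as a negative rate
c_val <- compound(a_val, -0.02, 5)   # part c): start from 1000 * 1.02^5
d_val <- discount(2010.14, 0.05, 6)  # part d): discounting over 6 years

round(c(a = a_val, b = b_val, c = c_val, d = d_val), 2)
```

The rounded results should agree with the values 1104.08, 903.92, 998 and 1500 obtained by hand.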

1.2.5 Logarithms

Definition 1.2 (Logarithm). The logarithm of a number $x > 0$ to base $a > 0$, $a \neq 1$, is
the number $b$ for which $a^b = x$. We write $\log_a x = b$.

Special logarithms. Among all possible logarithm bases, there are two
numbers that stand out. The natural logarithm arises for a = e (Euler's
number, e ≈ 2.7183) and is often denoted as ln x. However, it is also not
unusual, in particular in programming languages (including R), to denote it
log x. This can often lead to confusion, since log x sometimes also stands for
the decadic logarithm with a = 10. To avoid such confusion, in this course
we will explicitly write the base unless we mean the natural logarithm, i.e. both
ln x and log x will stand for the natural logarithm, whereas the decadic logarithm
will be denoted $\log_{10} x$.

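The following rules apply when working with logarithms ($a, b, x, y > 0$, $a, b \neq 1$, $r \in \mathbb{R}$):
\begin{align}
\log_a (xy) &= \log_a x + \log_a y \tag{1.25}\\
\log_a 1 &= 0 \tag{1.26}\\
\log_a x^r &= r \log_a x \tag{1.27}\\
\log_a \frac{x}{y} &= \log_a x - \log_a y \tag{1.28}\\
\log_a x &= \frac{\log_b x}{\log_b a} \tag{1.29}\\
\text{if } a > 1:\quad \log_a x &< \log_a y \text{ for } x < y \tag{1.30}\\
\text{if } a < 1:\quad \log_a x &> \log_a y \text{ for } x < y \tag{1.31}
\end{align}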

The last two properties in particular, combined with the fact that $\log_a 1 = 0$,
imply that if the base is larger than 1 (which is the case for both the natural
and the decadic logarithm), $\log_a x$ is negative for all $0 < x < 1$ and positive whenever
$x > 1$. If $0 < a < 1$, we have negative $\log_a x$ for $x > 1$ and positive $\log_a x$ for
$0 < x < 1$.
Problem 1.8. Find the mistake in the following 'proof' showing that 2 < 1:
\begin{align*}
1/4 &< 1/2 \\
\ln(1/4) &< \ln(1/2) \\
\ln((1/2)^2) &< \ln(1/2) \\
2 \ln(1/2) &< \ln(1/2) \\
2 &< 1
\end{align*}
Solution. Following the chain of inequalities one by one, we check that the
first four inequalities are true: For base e, the logarithm is an increasing
function, such that taking the logarithm of both sides does not change the
direction of the inequality. The next two steps are just rewriting the left
hand side using the properties of the logarithm. However, in the last step,
one divides by ln(1/2). Since e > 1 and 1/2 < 1, this is a negative number,
which means that the direction of the inequality has to be reversed in this step.
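
The critical step of the 'proof' can be inspected numerically in R. The sketch below only uses the base R function `log` (the natural logarithm); it confirms that ln(1/2) is negative, so that dividing by it reverses the inequality.

```r
log(1/2)                   # negative, since 1/2 < 1 and the base e > 1
2 * log(1/2) < log(1/2)    # TRUE: the fourth line of the 'proof' holds

# Dividing both sides by the negative number log(1/2) reverses the sign,
# so the correct conclusion is 2 > 1:
(2 * log(1/2)) / log(1/2) > log(1/2) / log(1/2)
```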

The logarithm and the exponential are closely connected to each other as
they represent inverse operations. That means that if we apply one of the
operations on a number and then apply the other on the result, the final
result is the same as the original number. We therefore have the following
two useful properties:
\begin{align}
\log_a (a^x) &= x, \tag{1.32}\\
a^{\log_a (x)} &= x. \tag{1.33}
\end{align}
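
The inverse relationship and the change of base rule (1.29) can be checked numerically in R. The values a = 3 and x = 7 below are arbitrary assumptions for illustration. Since computers work with finite precision, results are compared with `all.equal()` rather than with `==`.

```r
a <- 3; x <- 7          # arbitrary base and argument for this check

log(a^x, base = a)      # property (1.32): gives back x
a^log(x, base = a)      # property (1.33): gives back x

# Change of base, rule (1.29), using natural logarithms
log(x) / log(a)
log(x, base = a)

# Compare up to numerical tolerance
all.equal(log(a^x, base = a), x)
```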

1.3 Statement logic

In mathematics and logic, a statement is a claim (for instance a sentence or
a mathematical expression such as an equality) about which one can decide
whether it is true or false. The truth value of a statement is its relation to
truth, i.e. true or false. This value is sometimes also referred to as the logical
value of the statement, in particular in programming languages (including
R). To abbreviate, sometimes 1 is used to denote the value true and 0 to
denote false.
The negation of a statement is another statement whose truth value is under
all circumstances the exact opposite of the truth value of the original statement.
If we consider a statement A, its negation is denoted by ¬A.
A conjugate statement (often also called a compound statement) is a statement that arises by combining two or more
simple statements by means of truth-functional operators. Let us consider
two simple statements A and B. The four basic conjugate statements are
the following:

i) Conjunction: A and B, notation: A ∧ B. This statement is true if both
A and B are true.
ii) Disjunction: A or B, notation: A ∨ B. This statement is true if at
least one of the two statements A and B is true, that is, the only
situation when it is false is the case of both A and B being false. This
is sometimes referred to as the inclusive or, since this statement is also
considered true if both A and B are true. This is opposed to the
exclusive or, for which the statement is true if exactly one of the simple
statements A and B is true. In this course, we will use the inclusive
or, unless stated otherwise.
iii) Implication: if A then B (A implies B), notation: A ⇒ B. If A is
true, this statement is only true if B is true as well. If A is false, the
implication is always true, independently of the truth value of B.
iv) Equivalence: A if and only if B (A is equivalent to B), notation: A ⇔
B. This statement is true if both A and B have the same truth value.

The truth value of the four basic conjugate statements in dependence on the
truth value of the statements A and B can be summarized in the following
truth table:

A    B    A∧B    A∨B    A⇒B    A⇔B
1    1     1      1      1      1
1    0     0      1      0      0
0    1     0      1      1      0
0    0     0      0      1      1
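
The truth table can be reproduced in R with logical values. R provides `&` (and), `|` (inclusive or) and `!` (negation); to our knowledge there is no dedicated implication operator in base R, so the sketch below writes A ⇒ B as `!A | B`, which is an assumption of this illustration that matches the definition above (false only if A is true and B is false). The column names are our own.

```r
# All four combinations of truth values, in the order of the table above
A <- c(TRUE, TRUE, FALSE, FALSE)
B <- c(TRUE, FALSE, TRUE, FALSE)

truth_table <- data.frame(
  A = A,
  B = B,
  and     = A & B,     # conjunction
  or      = A | B,     # disjunction (inclusive or)
  implies = !A | B,    # implication: false only if A is true and B is false
  equiv   = A == B     # equivalence: same truth value
)
truth_table * 1        # multiply by 1 to display TRUE/FALSE as 1/0
```

The exclusive or mentioned above is available in base R as `xor(A, B)`.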
Remark. An illustration of the implication truth value is the joke about a
son of a mathematician. At dinner, the father tells the son: ’If you don’t eat
all your veggies, you will not get any ice cream.’ The poor son eats all of his
veggies and does not get any ice cream...

Remark. Note that a non-strict inequality, i.e. one with the at least or at
most sign, is in fact a conjugate statement, more precisely a disjunction of a
strict inequality and of an equality. It is sufficient that one of the statements
is true for the whole statement to be true. This means that for instance the
statement ’All natural numbers are greater than or equal to -5.’ is a true
statement. The fact that no natural number can ever be equal to -5 is not a
problem, since for all natural numbers the part ’greater than -5’ is satisfied.

Finally, let us illustrate the conjugate statements and their truth value with
the help of a small party.

Problem 1.9. Anne and Bob were invited to a party. Anne did not feel
well and stayed home, Bob went to the party. Decide whether the following
statements are true or false:

i) Anne stayed home and Bob went to the party.

ii) Anne went to the party and Bob went to the party.

iii) Anne went to the party or Bob went to the party.

iv) If Anne went to the party, then Bob went to the party.

v) If Anne stayed home, then Bob stayed home.


vi) Anne stayed home if and only if Bob stayed home.

vii) Anne went to the party if and only if Bob stayed home.

Solution. Let us denote by A the statement "Anne went to the party."
and by B the statement "Bob went to the party." (and assume for this one
exercise that there are only two options, going to the party or staying home,
such that ¬A is the statement "Anne stayed home."). We write down the
conjugate statements with the help of A and B and decide about their truth
values with the help of the truth table. Note that A and ¬B are false whereas
¬A and B are true statements.

i) "Anne stayed home and Bob went to the party." translates to ¬A ∧ B
and is true.

ii) "Anne went to the party and Bob went to the party." translates to
A ∧ B and is false.

iii) "Anne went to the party or Bob went to the party." translates to A ∨ B
and is true.

iv) "If Anne went to the party, then Bob went to the party." translates to
A ⇒ B and is true.

v) "If Anne stayed home, then Bob stayed home." translates to ¬A ⇒ ¬B
and is false.

vi) "Anne stayed home if and only if Bob stayed home." translates to
¬A ⇔ ¬B and is false.

vii) "Anne went to the party if and only if Bob stayed home." translates to
A ⇔ ¬B and is true.
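
The seven answers can be double-checked in R by encoding A and B as logical values. This is a minimal sketch; the helper `implies` is a hypothetical function defined here (it is not part of base R), and the brackets around `!A` and `!B` are needed because `!` binds less tightly than `==`.

```r
A <- FALSE   # "Anne went to the party."
B <- TRUE    # "Bob went to the party."

# Hypothetical helper: implication written via negation and disjunction
implies <- function(p, q) !p | q

answers <- c(
  i   = !A & B,              # Anne stayed home and Bob went
  ii  = A & B,               # both went
  iii = A | B,               # at least one went
  iv  = implies(A, B),       # if Anne went, then Bob went
  v   = implies(!A, !B),     # if Anne stayed home, then Bob stayed home
  vi  = (!A) == (!B),        # Anne stayed home iff Bob stayed home
  vii = A == (!B)            # Anne went iff Bob stayed home
)
answers
```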

Necessary and sufficient conditions

In mathematics, one often talks about necessary and sufficient conditions, for
instance for certain properties to be satisfied. Let us consider two statements
A and B. If we have A ⇒ B, we say that A is a sufficient condition for B,
since it suffices that A is true for B to be true. Often A is easier to check
than B, which is why sufficient conditions are looked for.
From the truth table it is also clear that, if A ⇒ B holds, A cannot be true
unless B is true as well; thus we say that B is a necessary condition for A.
However, generally it is not the case that A is a necessary condition for B,
since (recall the truth table again) B might be true even if A is false. For A
to be both sufficient and necessary for B (and vice versa), we need the
equivalence relationship, i.e. A ⇔ B.

As an example, let us consider the following properties: x > 2 and |x| > 2.
Clearly, there is an implication between these two statements, x > 2 ⇒ |x| >
2, such that x > 2 is a sufficient condition for |x| > 2, and |x| > 2 is a
necessary condition for x > 2 (it is not possible for x to be larger than 2
if |x| > 2 is not satisfied). However, x > 2 is not a necessary condition
for |x| > 2: for instance for x = −3 we also have |x| > 2, even though
the first condition is not satisfied. But if we extend the first condition to
x > 2 ∨ x < −2, we obtain an equivalence: (x > 2 ∨ x < −2) ⇔ |x| > 2.
Thus x > 2 ∨ x < −2 is a sufficient and necessary condition for |x| > 2.
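
The three claims of this example can be illustrated in R on a grid of test values. Note the assumptions: the grid is an arbitrary choice, the helper `implies` is a hypothetical function defined here, and a check on finitely many points illustrates the statements but does not prove them.

```r
x <- seq(-5, 5, by = 0.25)          # arbitrary grid of test values
implies <- function(p, q) !p | q    # hypothetical helper, not base R

all(implies(x > 2, abs(x) > 2))          # x > 2 is sufficient for |x| > 2
all(implies(abs(x) > 2, x > 2))          # fails: x > 2 is not necessary
all((x > 2 | x < -2) == (abs(x) > 2))    # the extended condition is equivalent
```

The second check fails because of grid points such as x = −3, exactly as argued in the text.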

1.4 Summation and product notation

When taking the sum of several summands, say of terms $a_1, a_2, \ldots, a_{10}$,
one sometimes writes the sum down in a compact way as follows:
$$a_1 + a_2 + a_3 + a_4 + a_5 + a_6 + a_7 + a_8 + a_9 + a_{10} = \sum_{i=1}^{10} a_i.$$
The sign $\sum$ is the Greek capital letter sigma and is also called the summation
sign. The important ingredients in this notation are the following:

i) the running index, i.e. in this case i, the index that is used to assign
each term its place in the sequence whose members are being summed;
ii) the limits of the index, i.e. the smallest and largest index in the sequence
that should be included in the sum, here 1 and 10;
iii) a sequence of values that should be summed, here the values ai .

This notation is particularly advantageous if the terms $a_i$ depend in some way
on the index $i$. For instance, if we would like to consider the sum
of all integers from −20 to 50, we can write this as
$$\sum_{i=-20}^{50} i.$$
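
In R, the three ingredients of the summation notation map directly to code: the running index becomes a vector of integers, the limits are its first and last element, and `sum` adds up the values. A short sketch for the sum above:

```r
i <- -20:50      # the running index takes every integer from -20 to 50
length(i)        # number of summands
sum(i)           # value of the sum
```

There are 71 summands; the terms from −20 to 20 cancel, so only 21 + 22 + ... + 50 = 1065 remains.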

Problem 1.10. a) Evaluate
$$\sum_{i=-2}^{3} (i + 3)^i.$$

b) Express in summation notation:
$$1 - \frac{x}{2} + \frac{x^2}{3} - \frac{x^3}{4} + \ldots - \frac{x^{79}}{80} + \frac{x^{80}}{81}.$$
Solution. a) To evaluate the given expression, we first rewrite it from the
summation notation to see the individual terms:
\begin{align*}
\sum_{i=-2}^{3} (i + 3)^i &= (-2 + 3)^{-2} + (-1 + 3)^{-1} + (0 + 3)^0 + (1 + 3)^1 + (2 + 3)^2 + (3 + 3)^3 \\
&= 1 + \frac{1}{2} + 1 + 4 + 25 + 216 = 247.5.
\end{align*}
b) Since we are asked to write the given expression in the compact summation
notation, we try to look for patterns that we could use to this end. To
make these patterns clearer, we start by rewriting the first term as
$1 = \frac{x^0}{1}$. From here, we can observe three facts: In the numerator of
the fraction, there is an increasing power of x, starting from 0 up to
80. In the denominator, we see the values 1 to 81 in increasing order.
Finally, the sign of the terms alternates, with odd terms having positive
sign and even terms having negative sign. This suggests that we might
write the sum as follows:
$$\sum_{i=0}^{80} \frac{(-x)^i}{i+1}.$$
Note that we could shift the running index, starting at 1 and ending at
81, which, however, would require writing the individual terms
in a different way:
$$\sum_{i=1}^{81} \frac{(-x)^{i-1}}{i}.$$
This shows that summation notation is not unique.
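
Both parts of Problem 1.10 can be checked in R. For part b), the value x = 0.9 is an arbitrary assumption used only to compare the two index conventions numerically; the variable names are our own.

```r
# Part a): the running index and the summand (i + 3)^i
i <- -2:3
sum((i + 3)^i)

# Part b): two index conventions for the same sum, at an arbitrary x
x <- 0.9
i0 <- 0:80
i1 <- 1:81
s0 <- sum((-x)^i0 / (i0 + 1))
s1 <- sum((-x)^(i1 - 1) / i1)
all.equal(s0, s1)   # both conventions describe the same sum
```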

Similarly, to write the product of several values $a_i$ in a compact way, we may
use the product notation. For instance, we have
$$a_1 a_2 a_3 a_4 a_5 = \prod_{i=1}^{5} a_i.$$
The ingredients are thus the same as in the case of summation; however, the
individual values are multiplied rather than added.
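
In R, the counterpart of the product sign is the base function `prod`, used in the same way as `sum`. The values below are arbitrary example numbers standing in for $a_1, \ldots, a_5$.

```r
a <- c(2, 3, 5, 7, 11)   # arbitrary example values a_1, ..., a_5
prod(a)                  # a_1 * a_2 * a_3 * a_4 * a_5
sum(a)                   # for comparison: a_1 + a_2 + a_3 + a_4 + a_5
```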

1.5 Exercises
1.1 For the following numbers, find the smallest subset of real numbers (N,
Z, Q or I) in which they necessarily belong:

a) √12 , c) x7 for x ∈ R+ 0,
8
p
b) 4 , d) x8 y −2 for x, y ∈ N.

1.2 In a cinema, there are S seats. A ticket for an adult to see a movie costs
€T, a child pays 30% less than an adult. On a particular night, A adults
and C children saw a movie. Interpret the following quantities:
a) S − A − C
b) 0.7T
c) T · S
d) A · T + 0.7C · T

1.3 A person has y euros available to spend on three kinds of fruit, namely
apples, bananas and cherries. She decides to spend $\frac{y}{4}$ euros on each kind of
fruit and save the rest for later. The prices per kg of fruits are 2.5 for apples,
1.6 for bananas and 5 for cherries.
a) What is the total weight of fruits she buys?
b) How much does she pay per kg of fruits (combined)?
c) If instead she wants to buy equal quantities of each fruit by spending
the y euros entirely, how much does she buy of each fruit?

1.4 Solve the following equations in $\mathbb{R}$:

a) $x^2 + 2x - 1 = 0$
b) $4x^2 - 4x + 1 = 0$
c) $3x^2 + x + 2 = 0$
d) $\frac{3}{x+2} + \frac{5x}{4-x^2} = \frac{3}{x-2} + \frac{x}{x^2-4}$
e) $\sqrt{2x + 7} + \sqrt{x - 5} = \sqrt{3x + 2}$
f) $\frac{4}{x+\sqrt{x^2+x}} - \frac{1}{x-\sqrt{x^2+x}} = \frac{3}{x}$
g) $|x + 6| = 9$
h) $|4 - x| - |2x + 3| = 7$
i) $1 + x + \frac{ax}{\sqrt{1+x}} = 0$ with parameter $a$
j) $\frac{2x-a}{x-8} = \frac{x-a}{x-1}$ with parameter $a$

1.5 A firm manufactures a commodity that costs €20 per unit to produce.
In addition, the firm has fixed costs of €2000. Each unit is sold for €75. How
many units must be sold if the firm is to meet a profit target of €14500?

1.6 A producer faces the following demand: P = 100 − 2Q, where P stands
for the price of a certain product and Q for the quantity of products sold.
At what price is the total revenue TR = P · Q equal to zero?

1.7 Solve the following inequalities in $\mathbb{R}$:

a) $3x - 5 > x - 3$
b) $|3x - 2| \leq 5$
c) $|2x + 1| - |3 - x| \geq 4$
d) $\frac{3}{|x-2|} \leq x$

1.8 Find the fraction for which the following properties hold:
• the denominator is larger than the numerator by 3;
• if we add 1 to the numerator and 2.5 to the denominator, we don't
change the value of the fraction.

1.9 Factorize the following expressions, i.e. write them in the form
$a(x - x_1)(x - x_2)$:

a) $x^2 - 5x + 6$
b) $x^2 + 2x - 8$
c) $x^2 + x - 2$
d) $x^2 + 7x + 10$
e) $x^2 + 10x - 11$
f) $3x^2 + 12x + 15$
g) $-\frac{1}{2}x^2 + x + \frac{3}{2}$

1.10 Simplify and find the values of x, y for which the expressions are not
well defined:
a) $\frac{1}{2}(2x - 8) + \frac{3}{4}(12 - 4x) - \frac{1}{5}(5x - 10)$
b) $\frac{x-1}{x+2}(x^2 - 4) + \frac{x+1}{x-2}(x^2 - 4x + 4)$
c) $\frac{4x+16}{x^2+8x+16}(2x + 8) - \frac{x^2-25}{2x+10} \div (x - 5)$
d) $\frac{x^2-7}{x+7} \cdot \frac{x^2-49}{7-x} \cdot \frac{x-2}{2x^2-98}$
e) $\frac{3x-5}{25-9x^2} \div \frac{5-3y}{9y^2-30y+25} \div \frac{5x-3}{6x+10}$
f) $\frac{1}{1+x^{p-q}} + \frac{1}{1+x^{q-p}}$

1.11 Calculate without a calculator: a) $\sqrt[3]{27}$, b) $\left(\frac{1}{32}\right)^{\frac{1}{5}}$, c) $0.0001^{0.25}$, d) $\log_2 64$,
e) $\ln 1$, f) $\log_4 \frac{1}{2}$.

1.12 A bank offers a savings account paying 4% interest at the end of each
year.
a) Person A deposits €1000 at the beginning of a year. For 15 years, the
person does not make any deposits or withdrawals in the account and
the interest does not change. How much is in the account after 15 years?
b) Person B wants to deposit money for 10 years and have €5000 in the
account at the end of this period (no deposits or withdrawals will be
made during that time). How much does the person have to deposit?

1.13 An amount of €5000 in an account has increased to €6000 in 20 years.
What (constant) yearly interest rate p has been used?

1.14 We know that the statement ’If Celia is sick, she doesn’t go to school.’
is true. Celia did not come to school. Decide the truth value of the statement
’Celia is sick.’
1.15 Fill the gaps with one of the following: ’at least’, ’exactly’, ’at most’ to
create a true statement.
a) Each prime number has ... two different divisors.
b) Two different lines in a plane have ... one common point.
c) The inequality $x^2 > 5$ is satisfied by ... three natural numbers.

1.16 Decide about the truth value of the following statements:

a) $x < 0 \Rightarrow |x - 5| > 0$
b) $\frac{(x-4)^2}{2} < -4x \Leftrightarrow x^2 < -16$
c) 10 is divisible by 2 and 15 is divisible by 2.
d) 10 is divisible by 3 or 15 is divisible by 3.

1.17 Write the following in summation notation:

a) The sum of all multiples of 11 smaller than 10000.
b) The sum of the squares of natural numbers from 1 to 50.
c) $x^2 - 2x^4 + 3x^6 - 4x^8 + \ldots + 99x^{198} - 100x^{200}$

1.18 a) Let $a_k = (k - 1)^2$ for $k \in \mathbb{Z}$. Determine $a_3$ and $\sum_{k=-1}^{2} a_k$.

b) Let $a_k = (-1)^k (k + 1)$ for $k \in \mathbb{Z}$. Determine $a_3$ and $\sum_{k=-1}^{2} a_k$.

1.19 Evaluate:
a) $\sum_{j=3}^{6} (2j - 5)$

b) $\sum_{k=-5}^{5} \frac{k^3}{3}$

c) $\sum_{i=-2}^{2} \frac{(4i+2)^i}{i+3}$

1.20 On Monday, a small bookshop sold 27 books. Suppose that every day,
they manage to increase their sales by 5 books. If we index the days of
that particular week starting from Monday with i ∈ {1, 2, . . . , 7} (i.e. i = 1
corresponds to Monday and i = 7 corresponds to Sunday; note that the
shops are open on Sundays for the purposes of this question), find

a) the number of books the shop manages to sell on Thursday.
b) the formula for the number of books sold in the whole week, given as a
sum.

1.21 During a 4-day holiday break, a group of students came together to
plant trees. During the first day, the students managed to plant 7 trees.
Following that, they planted 11, 15 and 19 trees during day 2, 3 and 4,
respectively. Index the days with i ∈ {1, 2, 3, 4}, denote by $a_i$ the number
of trees planted on day i, and provide a formula for each day as well as the
summation formula for the total number of trees planted in the four days.

1.6 Further readings

Most of the contents of this chapter are covered in Chapter 1 of [1]. In
particular, Section 1.2 covers statement logic. From this section, we suggest
exercises 1, 3 and 4 to practice the corresponding concepts, as well as exercise
5 from Review exercises of Chapter 1. Exercise 5 of Section 1.2 is more
difficult, but still a nice one to make sure that you understood the content
properly.

For more details on the rest of this chapter, we recommend Chapter 2 of
[1], in particular Sections 2.1-2.9. At the end of each section, as well as in
the section Review exercises, there are exercises that can help you check your
understanding.

Chapter 2

Getting started with R

In this chapter, we will introduce the programming language R, including its
user-friendly interface Rstudio.

R is a statistical programming language, which means that while it can
perform a wide range of tasks, it was particularly developed with statistical
applications in mind and is therefore very well suited for working with data.
This side of R will become clear in the course Quantitative Methods II; but
to be able to perform statistical analysis in R, one has to get to know the
language, its syntax and how particular types of tasks can be performed.
That is one of the goals of Quantitative Methods I: In this course, we give
you the necessary basic knowledge to be able to go on and analyze data not
only in Quantitative Methods II, but also in Business Analytics courses, your
bachelor thesis, or even in your career.

R is open source software. That is, it is free, and its capabilities are
quickly growing thanks to the large R community that works on providing
new functions for specific tasks in so-called packages (we will get to
know some packages soon enough, though in QM I, we will mostly work with
base R without packages). Nevertheless, before a package is formally
made available in the official package repository (CRAN), it has to go through
a round of checks by the repository maintainers, which helps to maintain a
high quality standard.

Figure 2.1: Rstudio upon opening after installation

2.1 Download, installation and basic view

Before we start working with R, you need to download it. The best way to do so
is from the official webpage of the R project. Then install R following the
instructions. Usually the default options are a good choice.

After installing R, one can start working with it. However, for a more
comfortable user experience, we suggest you also download and install Rstudio.
We will also use Rstudio in our practical sessions. Note that the order is
important here: You should always start with installing R and only then
proceed to installing Rstudio.

Let us briefly have a look at the user interface when first starting Rstudio in
the screenshot in Figure 2.1.

When Rstudio is first opened, the window is split into three parts.
The large part on the left is called the console. In the console, we write and
run commands, and the outputs, if there are any, are shown here. After
starting Rstudio, you will see some information about your current version
of R and about ways to obtain more information. At the
bottom, there is the prompt sign > that we will discuss in more detail in
the next section.

Figure 2.2: Rstudio with a file open

When working on a larger project rather than using R simply as a calculator,
one would usually write the whole code in an .R file, also called an Rscript
(or for example .Rnw or .Rmd if the outcome should be a concise .pdf or
.html file, but this you will be introduced to later by the tutors). If such a
file is open, the left part is further split into an upper part with the open
file(s) and a lower part with the console, as can be seen in Figure 2.2.

In the right part, we again have two windows, both of them with several
tabs. The most important tab of the upper window is Environment. In this
window, we can see the list of all user-defined objects currently stored in the
environment, also called the workspace. This includes for instance all values
assigned to variables as well as the user defined functions.

In the lower window, the most useful tabs are Files, Plots, Packages and
Help. We will mention each of these tabs at appropriate places in the further
text.

Once you get more familiar with R, you will realize that you might find a
different layout more comfortable, or even that some of the provided tabs
are not useful for you while others that you would like to have available are
not included. The good news is that you can customize the layout of your
Rstudio: To this end, choose Tools from the bar at the top of your Rstudio,
go to Global options and finally Pane layout. Here you can decide about the
position of the four basic windows, as well as what tabs should be included
in the two windows that in the basic view are on the right. We will refer to
the windows the way they are ordered in the basic view, i.e. the files in upper
left, console in lower left, environment etc. in upper right and help etc. in
lower right.

In the rest of this chapter, we will discuss how the user can communicate with
R and define variables, how to work with vectors, and how to understand
R's 'replies'. In later chapters, we will focus on further tasks, like defining,
evaluating and plotting functions, checking whether particular conditions are
satisfied and defining actions based on the result of this check, or using loops
that allow us to repeat a particular task as often as necessary in an automated
way. In the following, whenever mentioning a function or its arguments in
the text, we will use the computer font that makes them stand out.

2.2 R as a calculator

Though its power lies elsewhere, for the basic understanding of R, its syntax
and how to interact with it, it is good to have a look at the most basic way
of using it, namely as a calculator (in some situations, it is even easier to use
and possibly more precise than a calculator).

As already mentioned above, once you start R, you will see the prompt
sign > in the console. That means that R is ready to receive and carry
out commands. If you enter a (correct) line of code and hit enter, the
command you entered will be run. If there is an output, it will be
shown (printed), and the prompt will show up in the next line. Should you
instead see a plus sign, that means that the code you entered in the previous
line was incomplete and R expects more (for instance a closing bracket).

Let us now have a look at some basic calculations in R:

4*10^1 + 2*10^0 + 1*10^(-1) + 4*10^(-2)

## [1] 42.14

1/100000000

## [1] 1e-08

2.34e-02

## [1] 0.0234

13/11 # a cutoff after 6 digits (per default)

## [1] 1.181818

At this point, we would like to make a few remarks. Let us start by commenting
on the way code and its output look. We create these lecture notes with the
help of a very useful package called knitr that allows for creation of nice
documents with integrated formatted code. The colorful lines are the code
we enter in the console or run from a file. The lines below, starting with two
hashtags, show the output of each corresponding line of code. To run a line
of code from a file (open in the upper left window of your Rstudio), just place
the cursor anywhere in the corresponding line and use the key combination
Ctrl+Enter, or on some devices it might be Ctrl+R.

Now let us turn our attention to the code itself. As should be clear from the
first line of code, addition and multiplication work the usual way with the
signs + and *. Moreover, to raise a number to a certain power, one uses the
sign ^. The second line of code shows division using the slash (/) and looking
at the output of this line, we see that R makes use of the scientific notation
if the given number is very small (or large). The third line shows that not
only can the output be shown in scientific notation, R also understands if it
is given a number this way and can 'translate' it.

In the fourth line of code, we are interested in the result of dividing 13 by 11.
Although this is a rational number, its decimal expansion is periodic, which
means that in the decimal form, there are infinitely many digits after the
decimal point. However, R does not show all of them (by default, it displays
7 significant digits, here 6 of them after the decimal point). Note that we are
also given this information in the line of code in question. However, it seems
that R did not do anything about this part of the line. This is the case
because the words 'a cutoff after 6 digits (per default)' are entered after the
hashtag sign. In R, the hashtag starts a comment: Anything that comes after
this sign in a line will be ignored.
It is good practice to use comments, in particular when working on larger
projects, on things that you might need to come back to at a later point (one
can forget surprisingly quickly what one was thinking while writing the code
and why the variables were named in the seemingly illogical way they were),
or when working on a project with other people. Providing short information
about what a possibly tricky line of code does doesn't take too much time, but it
can save a lot of time for other people reading your code, or for yourself if you
come back to your code later on.

The number of digits that are shown of a number can be changed in several
ways. One of them, particularly useful if one only wants to change this
setting for one single value (we will touch upon a more long-term solution at
a later point), is to use the print function: We provide it with the number
to show and set a parameter called digits to the desired number of significant
digits to be shown.

print(13/11, digits = 9) # display 9 digits

## [1] 1.18181818

The order of operations is the one that we know from basic algebraic rules:
taking powers comes before multiplication and division, and these operations
come before addition and subtraction. Therefore we need to use brackets if a
different order of operations is desired, as is illustrated by the following lines:

5*2 + 4/2^2

## [1] 11

5*(2 + (4/2)^2)

## [1] 30

Note that since power comes before division, 4/2^2 corresponds to $\frac{4}{2^2}$. But
adding brackets where they are not necessary, for instance writing 4/(2^2)
instead of 4/2^2, does not interfere with the outcome. Therefore if you are
unsure about the order of operations, you may use brackets even when they
are not necessary. Next to the fact that you can then be more sure that R
does what you want it to, in some situations it also makes the code easier to
read.
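
Two further cases where brackets matter are worth trying out. The following lines are a small sketch in base R: a leading minus sign is applied after the power, and repeated powers are evaluated from right to left.

```r
-2^2        # power is applied before the unary minus: -(2^2)
(-2)^2      # brackets force the minus to be applied first
2^3^2       # powers are evaluated right to left: 2^(3^2)
(2^3)^2     # brackets change the order of the two powers
```

The four lines evaluate to -4, 4, 512 and 64, respectively.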

Of course, R can perform more than just the most basic tasks of addition,
subtraction, multiplication and division. For finding the square root (by
far the most used root) of a number, there is a specific command sqrt.
However, recall that taking the square root of a number is equivalent to
raising a number to one half, such that it should not come as a surprise that
the following two lines of code result in the same output:

sqrt(2)

## [1] 1.414214

2^0.5

## [1] 1.414214

However, for all the other roots there are no specific functions. To take the
n-th root of a number in R, just raise it to the power of $\frac{1}{n}$:

27^(1/3)

## [1] 3

(1/32)^(1/5)

## [1] 0.5
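
One caveat concerns negative numbers: although rule (1.17) also holds for a < 0 if n is odd, R returns NaN when a negative number is raised to a non-integer power. A possible workaround is sketched below; `odd_root` is a hypothetical helper name of our own, not a function provided by R.

```r
(-27)^(1/3)     # NaN: R does not return the real cube root here

# Hypothetical helper: real n-th root for odd n, for any sign of x
odd_root <- function(x, n) {
  stopifnot(n %% 2 == 1)      # only meaningful for odd n
  sign(x) * abs(x)^(1/n)      # take the root of |x| and restore the sign
}
odd_root(-27, 3)
```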

As with most other calculators, R does not need the leading zero for decimal
numbers smaller than 1 (in absolute value): 0.25 and .25 are considered the
same.

.0001^.25 #no need for leading zeros in front of '.'

## [1] 0.1

Finally, there are some numbers in mathematics that are so important and
often used that they have earned a specific name under which they are
known. The two most well known of them are π and e. R knows both
of these values. π can be obtained by just typing pi; this is a constant
that is saved in R without the need to be defined by the user. e is, maybe
somewhat surprisingly, not saved this way. However, clearly, e is the value
of the exponential function at base e at x = 1. R knows the exponential
function at base e as exp.

pi

## [1] 3.141593

e

## Error in eval(expr, envir, enclos): object 'e' not found

exp(1)

## [1] 2.718282

exp is what we call a function in R. Functions have one or more arguments
(recall that we already used the print function above to which we provided
two arguments), some of which have to be provided, while others are optional.
For the optional arguments, sometimes there are default values set, which
means that if we do not provide a value for the argument manually,
the default value will be used. In other cases, the function might perform
different tasks based on which arguments are provided. We will talk about
this in more detail later on. For now, let us focus on some more basic functions.
Next to the function exp, we will consider also the function abs that provides
the absolute value of a number.

abs(-5)

## [1] 5

The functions exp(x) and abs(x) take only one argument, indicated in the
brackets as x. That is the value at which the exponential function should
be evaluated for exp, or whose absolute value should be returned for abs.

Closely associated with the exp function is the function log(x, base =
exp(1)) that takes two arguments. The first argument, x, is clearly the
value at which the logarithm is evaluated. The second argument called base
defines the base at which the logarithm should be taken. The notation base
= exp(1) in the function introduction above tells us that the base argument
actually does not have to be provided, since it has a default value assigned,
and this default value is e. That means that simply entering log(x) will
compute the natural logarithm of x in R. To compute a different logarithm,
say the decadic (base 10) one, base has to be provided and set accordingly. Let us now
illustrate the use of these functions - in the last line, we effectively verify
that log computes the natural logarithm.

exp(5)

## [1] 148.4132

log(10)

## [1] 2.302585

log(10, 10)

## [1] 1

log(exp(5))

## [1] 5
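To make the role of the base argument more concrete, the following sketch computes the same base 10 logarithm in three ways. The function log10 is the special form for base 10 that is listed on the help page of log; the variable names are our own.

```r
x <- 1000

# Three equivalent ways of computing the base 10 logarithm of x.
log_named <- log(x, base = 10)      # general function with the base set
log_short <- log10(x)               # special form for base 10
log_changed <- log(x) / log(10)     # change of base via natural logarithms

c(log_named, log_short, log_changed)
```

All three values agree up to rounding, since $\log_{10} x = \frac{\ln x}{\ln 10}$.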

Another useful function is the function choose(n,k) which computes the
binomial coefficient $\binom{n}{k}$, that is, the number of possible ways of choosing k
objects out of n options. This is an example of a function that takes two
arguments. Generally, the order of the arguments is important. That means,
to find $\binom{5}{2}$, we need to enter choose(5,2) in the console; choose(2,5) will
simply result in 0 since there are 0 ways of choosing 5 out of 2 (without
repetition). However, the order of the arguments can be changed if we enter
them together with their names, as illustrated in the last line in the next
chunk of code:

choose(5,2)

## [1] 10

choose(2,5)

## [1] 0

choose(k = 2, n = 5)

## [1] 10

For such simple functions as those that we got to know so far, providing
the argument names is not necessary. Later on, however, we will encounter
functions that take far more (optional) arguments and this aspect of R’s
workings comes in handy if we only want to set a parameter that is far away
in the list of arguments the function takes.
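The same mechanism applies to log, whose arguments are called x and base. The sketch below is an additional illustration with variable names of our own choosing; it shows that naming the arguments makes their order irrelevant, while swapping unnamed arguments changes the meaning.

```r
# Positional matching: the first value is x, the second one is base.
by_position <- log(8, 2)

# Named matching: the order no longer matters.
by_name <- log(base = 2, x = 8)

# Swapping unnamed arguments gives the logarithm of 2 to the base 8.
swapped <- log(2, 8)

c(by_position, by_name, swapped)
```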

2.3 Help

To find out what arguments a particular function takes or even what it does,
we can consult the Help. One option is to search for the function in question
in the search bar of the Help tab in the lower right window. Alternatively,
one can call out the help page of a function directly from the console by
typing a question mark followed by the function’s name, for instance:

?log

Run the above line of code to see what the help page looks like. It will
appear in the Help tab in the lower right window. Often a function is part of a larger family
of functions that are connected to each other in various ways. In that case,
the help page that occurs for a particular function contains information not
only about that particular function, but about all other functions of that
family, too. For instance in the case of log, we are told about the basic
logarithm function, its special forms for bases 2 and 10 and for computing
log(1 + x), as well as about the exponential function and a variation of it. In
the Description part, we see the list of functions provided in the help page
and a short description of their workings.

In the part Usage, the full list of functions is given again, this time including
the list of their arguments in brackets. As already touched upon above, an
argument that is listed on a stand-alone basis is a mandatory argument, like
x in the case of log. If the name of an argument is followed by an equality
sign and a value, this value is the default value for this argument which
will be used if you do not provide a different value. Sometimes you will
see ... at the end of the argument list which means that there are many
more optional arguments that are usually common for many functions that
perform a certain type of tasks. For instance there is a whole set of graphic
arguments that can be provided to (almost) any plotting function – we will
discuss these when we learn about plotting.
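The argument list shown under Usage can also be inspected directly from the console. A minimal sketch, assuming the base R functions args and formals, which are not introduced in the text above:

```r
# args() prints the argument list of a function together with any defaults.
args(log)       # x is mandatory, base has a default value
args(choose)    # both n and k are mandatory

# The argument names can also be extracted as a character vector.
log_argument_names <- names(formals(args(log)))
log_argument_names
```

This is only a quick check of names and defaults; the help page remains the place to learn what the arguments mean.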

Section Arguments provides more details about what arguments enter the
functions, including what type of variable they should be or what role they
play in the underlying computations.

This is followed by the Details section that, as the name suggests, provides
details about the workings of the functions. Sometimes formulae used in the
computation or some technical details may be given.

In the part Value we are given information about what the outcome of
running the function is. Of course, for the logarithm or exponential function
we have a very good idea about the outcome. However, there are also far
more complicated functions for which this section is very helpful, and
the technical details of what type of object from R’s point of view the output
is might also be relevant for further computations.

There are several other sections in the help page of most functions. For now,
let us only mention the section usually located at the very bottom called
Examples that shows examples of the correct usage of the functions.

If you are not sure about a function’s name or you generally want to look
for information on a certain object or operation that is not a function, the
single question mark might not work. However, by doubling the question
mark and using quotation marks around the expression to be looked for, you
trigger a keyword search, i.e. R will not look for the help page associated
with the expression entered, but will instead search through all help pages. For
example, we may look for R’s inequality sign != to get more information
about comparison methods:

?? '!='

Again, please enter the above line of code into your console and run it to see the
outcome. You will be offered a list of help pages that contain the expression
you looked for. In this case, we were interested in comparison methods, so
upon looking at the brief description of the pages in the right column, we
choose the page on relational operators. We then learn for instance that the
’at most’ and ’at least’ operators translate to <= and >=, respectively, and to
find out whether two values are the same, we should use ==.

We will learn more about comparing values and about the outcome of such
comparisons later on. But before doing so, we turn to assigning and storing
values for later use.

2.4 Storing values

It is often useful and desirable to store values in variables, for instance to
simplify computations by doing intermediate calculations, or for later use.
In this section, we will learn about storing, accessing and rewriting values.

To assign a value to a particular variable, we use the ’gets’ operator which
looks like a simple arrow directed from the value that is to be assigned to
the variable’s name. Typically, one starts the line of code with the variable’s
name such that the arrow points from right to left; it then consists of a
’less’ sign and a dash. For those who struggle with various keyboards and/or
can’t find < easily, RStudio has a handy keyboard shortcut consisting of Alt
and the dash. This shortcut types <- for you. Writing the value first and
only then the name of the variable to which it should be assigned, as opposed
to the standard way of assignment, will work just as well; you just have to
remember to turn the arrow around (but there is no shortcut for that, sorry). Let us assign
our first variables:

r <- 4
13/11 -> somenumber

If you execute these lines of code, you will see two objects r and somenumber
in the upper right window in the environment tab. You can also let R print
the value of a particular variable in the console by typing only its name. If
you use a computation to assign a value, putting simple brackets around the
assignment will both assign the value and show it in the console.

r

## [1] 4

(r <- 1.2*r)

## [1] 4.8

As we see from the second output, we can also assign values iteratively in the
sense that we can use the old value of a variable to define its new value that
will simply be overwritten. At this point, we would like to point out that
unlike with written formulae, where the multiplication sign can be dispensed
with when working with variables (e.g. 3x and 3 · x are the same), in R it is
absolutely necessary to use the star whenever multiplication is intended. In
the above code, for example, writing 1.2r instead of 1.2*r would result in
an error (feel free to try it out).

Theoretically, the assignment can be done also with the equality sign:

a = 5

Though this will generally work, it is not the recommended way to do things.
One of the reasons is that the equality sign is meant for setting the values of
function arguments and it is good to have the differentiation between these
two types of code. A more severe reason is the fact that there are situations
in which it matters and if you use = instead of <- in these situations, your
code will either not work the way you would like it to, or it won’t work at all.
Even though these situations will not arise in this course, we will consistently
use the ’gets’ operator (usually in its typical way with the arrow pointing to
the left) and we recommend you get used to this way from the start, too.
After all, who knows, maybe some day you will use R at a level where it will
matter :)
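One of the situations in which the two notations differ is an assignment written inside a function call. The following sketch is our own illustration with hypothetical variable names; it uses the function mean, which returns the average of a vector and whose first argument is called x.

```r
# With <-, the assignment is carried out in the workspace and the value
# is then passed on to the function.
m_arrow <- mean(arrow_values <- c(2, 4, 6))

# With =, R reads the name on the left as an argument label of mean();
# the average is computed, but no variable is created in the workspace.
m_equal <- mean(x = c(2, 4, 6))

exists("arrow_values")    # the variable created by <- is available
c(m_arrow, m_equal)
```

Both calls return the same average; they differ only in what is left behind in the environment.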

2.5 Vectors

R is a vectorized language which means that the basic objects it works with
are vectors – you can think of them as lists or sequences of values – and the
basic operations can be performed on them elementwise. In fact, even the
few variables that we have assigned previously are strictly speaking vectors
of length 1.

The most basic way of creating a vector is the function c (stands for combine),
in which you provide the list of values you want to store – as single values,
but you can also combine several vectors into one this way. Let us define our
first vector:

r <- c(2,3,4,5)

To define another vector, now containing values from 1 to 4, let us investigate
another way of defining vectors. If you want to define a vector of consecutive
whole numbers, in increasing or decreasing order, you may use the : operator:
a:b will result in a vector with the first value being a, then moving in steps
of 1 if a<b or -1 if a>b until the last value b:

(r2 <- 1:4)

## [1] 1 2 3 4

The operator : takes precedence over the mathematical operations like
multiplication or addition (but not taking powers). Therefore it is important
to set the brackets properly to ensure that the operations are performed
in the desired order. To understand this, compare the following two code
outputs and think about how they came to be:

10:(2*2+1)

## [1] 10 9 8 7 6 5

10:4*2+1

## [1] 21 19 17 15 13 11 9
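A frequent practical instance of this rule is a sequence that should stop one step before some number n. The sketch below uses a hypothetical value n = 5 and variable names of our own choosing.

```r
n <- 5

# The colon is evaluated before the subtraction: this is (1:n) - 1.
shifted <- 1:n - 1

# Brackets make the subtraction happen first: this is 1:(n - 1).
shorter <- 1:(n - 1)

# Powers are evaluated before the colon: this is 1:(2^3).
up_to_eight <- 1:2^3

shifted
shorter
up_to_eight
```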

Now that we have two vectors, we can investigate what happens if we perform
basic operations on them:

r + r2

## [1] 3 5 7 9

r*r2

## [1] 2 6 12 20

r - r2

## [1] 1 1 1 1

r/r2

## [1] 2.000000 1.500000 1.333333 1.250000

r + 5

## [1] 7 8 9 10

r2*2

## [1] 2 4 6 8

r^2

## [1] 4 9 16 25

In the first four outputs we can see that the operations are performed elementwise,
which means that for instance for addition, the first entry of the first vector
is added to the first entry of the second vector, the second entries are added
to each other, etc. – and similarly with the other operations. Turning our
attention to the next output, we see that by adding a single number to a
vector, we increase each single entry by this number. Multiplying a vector
by a constant leads to multiplying each element by this constant, and the
square of a vector is a vector containing the squares of each individual entry.
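Elementwise operations are particularly convenient when working with data. As a small sketch with made-up numbers (the prices, quantities and the 20% surcharge are purely hypothetical), consider the revenue from three products; the function sum adds up all entries of a vector.

```r
price <- c(2.5, 4.0, 1.2)      # hypothetical price per unit
quantity <- c(10, 3, 25)       # hypothetical number of units sold

revenue_per_product <- price * quantity      # elementwise product
total_revenue <- sum(revenue_per_product)    # sum of all entries

price_with_surcharge <- price * 1.2          # every price scaled by 1.2

revenue_per_product
total_revenue
```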

Remark. Note that in mathematics, if we consider two vectors x, y, their
product is not a vector with the elementwise products of their entries. Something
like x/y or $x^2$ is not even defined. Still, it is often desirable to be able to do
these elementwise operations the way R does them, for instance if working
with data. We will defer the discussion of how to do ’mathematical vector
multiplication’ to a later chapter on matrix algebra.

So far, we have considered the combination of a vector and a number, or of two
vectors of the same length. Let us now study the case where the vectors are
not of the same length:

r3 <- 1:3
r + r3

## Warning in r + r3: longer object length is not a multiple of
## shorter object length

## [1] 3 5 7 6

We see a warning saying that the length of the longer vector is not a multiple
of the length of the shorter one. Nevertheless, something has happened. If
we recall the values in r and r3, we can reconstruct what R did: It did the
elementwise operation as far as was possible, and then started recycling the
shorter vector to fill it up to the length of the longer one. Effectively, r3
became (1 2 3 1) for the purposes of this operation. Because this recycling
had to be done but was not finished (values 2 and 3 were only used once
while 1 was recycled and used twice), R was not entirely sure this is what
the user wanted to do and showed a warning. Interestingly, if the length of
the two vectors is not the same, but one of the lengths is a multiple of the
other one, there will be no warning:

r4 <- 3:10
r2 + r4

## [1] 4 6 8 10 8 10 12 14
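The recycling described above can also be written out explicitly. The following sketch assumes the base R function rep_len, which is not used elsewhere in this chapter; the variable names are illustrative.

```r
long <- c(2, 3, 4, 5)
short <- 1:3

# rep_len() repeats a vector until the requested length is reached,
# which mirrors what R does implicitly when recycling.
short_recycled <- rep_len(short, length(long))

explicit <- long + short_recycled
implicit <- suppressWarnings(long + short)   # same values, warning muted

short_recycled
explicit
```

Writing the recycling out by hand avoids the warning and documents that the repetition is intended.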

As already mentioned, the c function can be used to combine not only single
values, but also to create one vector from several, for instance

c(r, r2, 85, 1:3)

## [1] 2 3 4 5 1 2 3 4 85 1 2 3

To get basic information about vectors, there is a handful of useful built-in
functions. One can easily find the minimal or maximal entry, the sum of all
entries, the number of elements in the vector, or even the average:

min(r)

## [1] 2

max(r)

## [1] 5

sum(r)

## [1] 14

length(r)

## [1] 4

mean(r)

## [1] 3.5

2.5.1 Vectors with patterns

We already talked about one particular way of creating vectors following a
particular pattern, namely the : operator to create vectors with the absolute
distance between consecutive elements being 1. To create sequences with
the steps between two consecutive entries being something other than 1, there
is the function seq(from = 1, to = 1, by = ((to - from)/(length.out
- 1)), length.out = NULL). These default settings mean that if we run the
function with no arguments (seq()), the result will be a vector with a single
entry equal to 1. Clearly a more useful case is one where we do provide some
arguments. The parameters from and to stand for the starting value of the
sequence, and the largest or smallest possible value (depending on the step
size, it might not be included). We then might specify the step size of the
sequence by setting the argument by, or the number of entries in the desired
vector in the argument length.out. In the first case, R will start at the
from value and change this value by adding the provided step size until the
value to is reached. If it cannot be reached exactly, the last value of the
resulting vector will be the largest smaller value that can be achieved (or the
smallest larger value if the sequence is decreasing). Clearly, if the provided
step size is a positive number, this will result in an increasing sequence of
numbers, whereas a negative step size will provide a decreasing sequence. In
the second case, where the sequence is defined by its length, the interval between from
and to will be cut evenly to achieve a sequence of the desired length. Let us
illustrate the use of seq:

seq(from = 10, to = 2, by = -2)

## [1] 10 8 6 4 2

seq(-5, 6, by = 3)

## [1] -5 -2 1 4

seq(0, 1, length.out = 11)

## [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

As can be seen in the above code, often one would provide the parameters
from and to without the names, whereas by or length.out would be specified
with the names. This is also because the first two are provided always,
whereas the other two are complementary to each other and one would
explicitly show which one is used for the sequence specification by providing
the parameter name.
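To see the difference between the two complementary arguments side by side, the following sketch builds two sequences on the same interval. The end points and variable names are chosen only for illustration.

```r
# by fixes the step size; 10 cannot be reached exactly with steps of 4,
# so the sequence stops at the last value that does not exceed it.
stepwise <- seq(1, 10, by = 4)

# length.out fixes the number of entries; the step size is derived from it.
evenly <- seq(1, 10, length.out = 4)

stepwise
evenly
```

With by, the end point is an upper bound that may be missed; with length.out, both end points are always part of the result.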

Another type of pattern that can easily be created by a function in R is
repetition. In the following chunk of code, we first recall the vector r to
make the workings of the function called rep better visible, and then use
this function in several variations. Observe the output of each line of code
to understand how various patterns can be achieved.

r

## [1] 2 3 4 5

rep(r, 2)

## [1] 2 3 4 5 2 3 4 5

rep(r, times = 2)

## [1] 2 3 4 5 2 3 4 5

rep(r, each = 2)

## [1] 2 2 3 3 4 4 5 5

rep(r, times = c(1, 2, 3, 4))

## [1] 2 3 3 4 4 4 5 5 5 5
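The arguments of rep can also be combined. The sketch below is an additional illustration; it assumes the argument length.out of rep, which is documented on its help page but not used above.

```r
# each is applied first (every entry doubled), then the result is
# repeated as a whole because of times.
pattern_a <- rep(1:3, each = 2, times = 2)

# length.out repeats the vector until the requested length is reached.
pattern_b <- rep(c(0, 1), length.out = 5)

pattern_a
pattern_b
```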

2.5.2 Comparison and subsetting

Sometimes one is only interested in particular entries of a vector rather than
all values. Accessing a certain value is also called indexing and in its basic
form is done, in the case of vectors, by means of square brackets. By
typing the name of a vector followed by an integer or a vector of integers in
square brackets, one obtains the values in the corresponding entries:

r[4]

## [1] 5

r[c(1, 3)]

## [1] 2 4

It is also possible to specify which values should not be accessed by typing
a minus sign in front of the non-desired integers (of course, this is not too
useful with a vector of length 4, but with longer vectors it can be very useful):

r[-4]

## [1] 2 3 4

r[-c(1, 3)]

## [1] 3 5
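Since the index inside the square brackets is itself just a number or a vector, it can be computed. The following sketch uses a vector v of our own (with the same values as r) and also shows that indexing works on the left-hand side of an assignment, which is how single entries are rewritten.

```r
v <- c(2, 3, 4, 5)

last_entry <- v[length(v)]     # length(v) is the index of the last entry
reversed <- v[length(v):1]     # indices in decreasing order

# Overwrite the second entry only; all other entries stay as they are.
v[2] <- 30

last_entry
reversed
v
```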

It is much more usual that one is interested in the values in a vector that
satisfy certain conditions rather than say the third entry in particular. In
the following, we will discuss the most basic such conditions one would like
to check, namely comparison of values for equality or inequality.

As mentioned above, to check whether two values are equal, one should use
a double equality sign. To see whether they are not equal, the comparison
operator != should be used. One can also check whether one value is smaller
than (<), at most (<=), greater than (>) or at least (>=) another value. The
outcome of such a comparison is what we call a logical – the truth value of a
statement. It can therefore be TRUE or FALSE. If several values are compared
at the same time, we obtain a vector of logicals as the output.

2 == 3

## [1] FALSE

2 < 3

## [1] TRUE

1 != 1

## [1] FALSE

r < 4

## [1] TRUE TRUE FALSE FALSE

r < r2

## [1] FALSE FALSE FALSE FALSE

r > 5:2

## [1] FALSE FALSE TRUE TRUE

The simple comparisons can be combined by the logical operators and and
or from Chapter 1. The sign for the logical and in R is the ampersand &
and the outcome of such a combination will be TRUE if both parts are TRUE
(and FALSE otherwise). The logical or is in R denoted by | and the outcome
is TRUE if at least one of the two conditions is TRUE (and FALSE if both
are FALSE). Both of these work elementwise in a vector, just like the simple
comparisons:

r < 5 & r > 2

## [1] FALSE TRUE TRUE FALSE

r < 3 | r > 3

## [1] TRUE FALSE TRUE TRUE

Remark. In some programming languages, ’and’ and ’or’ are used with
double signs, i.e. && and ||. In R these operators also exist but only work if
we work with single values rather than vectors. If the simpler conditions on
either side of these operators are vectors, we get an error:

r < 5 && r > 2

## Error in r < 5 && r > 2: 'length = 4' in coercion to 'logical(1)'

r < 3 || r > 3

## Error in r < 3 || r > 3: 'length = 4' in coercion to 'logical(1)'
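If a single truth value is needed for a whole vector, the comparison can first be collapsed. A minimal sketch, assuming the base R functions all and any, which are not introduced in the text; the vector v has the same values as r.

```r
v <- c(2, 3, 4, 5)

all_small <- all(v < 5)    # TRUE only if every entry is smaller than 5
any_large <- any(v > 4)    # TRUE if at least one entry is greater than 4

# Both sides are now single logicals, so the double operator is applicable.
both <- all_small && any_large

c(all_small, any_large, both)
```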

Now that we know about logicals, we can, as alluded to above, index vectors
based on comparisons. A vector can be indexed not only by directly providing
the indexes that should be considered. Another way is by providing a logical
vector of the same length as the vector to be indexed. Only the values that
correspond to TRUE in the logical vector will be displayed. To illustrate this
in more detail, in the first case we will first show the corresponding logical
vector and only then the indexing. In the latter examples, we will do the
indexing without first printing the logical used, but for a quick check, you
can of course execute the conditions in the square brackets to see their truth
values and compare with the outcome.

r < 4

## [1] TRUE TRUE FALSE FALSE

r[r < 4]

## [1] 2 3

r[r != 4]

## [1] 2 3 5

r[r < 4 & r > 2]

## [1] 3
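Sometimes the positions of the matching entries are of interest rather than the entries themselves. The sketch below assumes the base R function which, not covered in the text, and a made-up vector v.

```r
v <- c(12, 7, 3, 18, 25)       # hypothetical values

is_large <- v > 10             # logical vector, one entry per element
positions <- which(is_large)   # indices of the TRUE entries
values <- v[is_large]          # the matching entries themselves

positions
values
```

Indexing with positions gives the same entries as indexing with the logical vector.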

2.6 More on logicals

Let us now have a more detailed look at the logicals. While these are
a special type of values, they can easily be treated as numbers. If an
operation intended for numbers is applied to a logical, R does an internal
translation by assigning 1 to TRUE and 0 to FALSE. It is therefore possible to,
for instance, multiply a vector of logicals by a number or a vector. However,
the particularly useful advantage of this behavior is the possibility to use the
sum function: this makes it very easy to find out how many entries of a vector
satisfy a particular condition:

r > 2

## [1] FALSE TRUE TRUE TRUE

(r > 2)*1

## [1] 0 1 1 1

sum(r > 2)

## [1] 3
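The same translation of TRUE to 1 and FALSE to 0 makes the function mean useful for logicals as well: the average of a logical vector is the share of entries that satisfy the condition. A short sketch with a vector v that has the same values as r:

```r
v <- c(2, 3, 4, 5)

count_above_two <- sum(v > 2)     # number of entries greater than 2
share_above_two <- mean(v > 2)    # proportion of entries greater than 2

c(count_above_two, share_above_two)
```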

Finally, sometimes one might be interested to find out the entries of a vector
for which a particular condition is not satisfied. Instead of changing the
condition to its negation, one can make use of the exclamation mark – we
can observe already in the operator != that it stands for not.

!TRUE

## [1] FALSE

!FALSE

## [1] TRUE

!(r>2)

## [1] TRUE FALSE FALSE FALSE

r[!(r>2)]

## [1] 2
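Negating a combined condition follows the usual rules of logic: the negation of "a and b" is "not a or not b". The following sketch checks this for one example; the vector v and the variable names are our own.

```r
v <- c(2, 3, 4, 5)

# Entries that are NOT strictly between 2 and 5, written in two ways.
outside_negated <- !(v > 2 & v < 5)
outside_direct <- v <= 2 | v >= 5

identical(outside_negated, outside_direct)
v[outside_negated]
```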

2.7 Corner cases

As we already hinted at, R is much more than just a smart calculator. The
language is essentially a large set of rules and in this short section, we will
introduce some interesting corner cases.

We will start by introducing NaN. NaN stands for not a number and it is what
you obtain if you try to perform a ’prohibited action’, i.e. if the operation at
hand is not well defined. An example of such a situation is taking the square
root of a negative number:

sqrt(-40)

## Warning in sqrt(-40): NaNs produced

## [1] NaN

Interestingly, even though strictly speaking, division by 0 is also a ’prohibited
action’, the outcome of such an action is not NaN:

1/0

## [1] Inf

Inf stands for infinity and according to the rules of R, division by 0 leads to
this result. The underlying reason can loosely be explained by the fact that
$\lim_{x \to 0} \frac{1}{x} = \infty$. There is also a value for negative infinity, -Inf.

Finally, when dealing with data (wait for Quantitative Methods II or Business
Analytics for more information), sometimes there are missing values in your
measurements. R denotes these values by NA (not available). Rather than
a value in itself, NA is a marker indicating that the particular value is unknown.
Therefore the result of an arithmetic operation with it will generally be NA again: If you don’t
know what a particular value is, you cannot know what the result of adding
5 to it or multiplying it by 3 is.

NA + 5

## [1] NA

NA*3

## [1] NA
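In practice one usually wants to detect such special values and, where appropriate, leave them out of a computation. A minimal sketch with made-up measurements, assuming the base R functions is.na and is.nan and the argument na.rm of mean, none of which are introduced in the text:

```r
measurements <- c(4.2, NA, 5.1, 3.9)    # hypothetical data with one gap

missing_flags <- is.na(measurements)    # TRUE where the value is missing
n_missing <- sum(missing_flags)         # number of missing values

mean_with_na <- mean(measurements)                    # unknown, hence NA
mean_without_na <- mean(measurements, na.rm = TRUE)   # gaps are dropped

undefined <- 0/0                        # an operation that is not well defined
is.nan(undefined)
```

Whether dropping missing values is appropriate depends on the data at hand; na.rm = TRUE only makes the choice explicit.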

2.8 Some useful advice and workspace management

We will close this section by a handful of helpful functions for the management
of your workspace and a summary of some useful advice. Some of the points
mentioned below were already discussed before, but we collect them here
along with others in a compact way.

Although R can be used as a very smart calculator, one would typically use
it to work on larger projects including large amounts of functions, variables
and data. When working on such projects, name your variables and functions
in a smart way to avoid confusion and to make it easier for others and your
future self to understand your code. Try to avoid names that are already
reserved in R for functions and constants, such as c or pi. If you do use
these names, it might, at the very least, lead to confusion, or it might
even cause your code to not work properly.

If you have too many variables in your environment, you might lose track of them.
To obtain a list of all variables in your environment, you can use the function
ls. With the function rm, one may remove a certain object, like a variable,
(or several objects) from the environment, or even all variables by executing
rm(list = ls()).

ls()

## [1] "a" "r" "r2" "r3" "r4"
## [6] "somenumber"

rm(r, r2)
ls()

## [1] "a" "r3" "r4" "somenumber"

rm(list = ls())
ls()

## character(0)

R also allows you to save the objects in your current workspace for later use. To
know where to look for the saved file, you need to know the current working
directory. This can be checked by means of the command getwd(). If
necessary, the working directory can be changed by setwd(directory) or,
in RStudio, by choosing the tab Session in the upper toolbar and choosing
’Set Working Directory’. Here you can choose ’To Source File Location’ if you want
the directory to be the one where the file you are currently working in is
located, or ’Choose Directory’ to manually search for the desired folder.

Now that we know in which directory we are, we can save all objects from
the current workspace with save.image(file) by providing the name of the
file to save the data in, or only some of the variables with save(y, file = file)
by specifying what objects should be saved and in what file. By convention,
such files carry the extension .RData; R does not append it automatically, so
it has to be part of the file name if desired. To load such a file, just use the function
load(file). In both cases, file must be enclosed in quotation marks.

To illustrate the saving and loading of data, we will first create a few variables
to have something to save (remember, we have just cleared the workspace).
After saving them, we will clear the workspace to then reload them and check
whether they are back.

a <- 5
b <- 7:12
r <- 1.08
save(r, file = "only_r")
save.image("our_workspace")
ls()

## [1] "a" "b" "r"

rm(list = ls())
ls()

## character(0)

load("only_r")
ls()

## [1] "r"

load("our_workspace")
ls()

## [1] "a" "b" "r"
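The same round trip can be tried without leaving files behind in the working directory. The sketch below assumes the base R function tempfile, which returns a path to a throwaway file, and uses a hypothetical variable balance; the extension .RData is written out explicitly as part of the file name.

```r
balance <- 1500                                 # hypothetical value to keep
target_file <- tempfile(fileext = ".RData")     # throwaway file location

save(balance, file = target_file)               # write the object to the file
rm(balance)                                     # remove it from the workspace
removed <- !exists("balance")                   # confirm that it is gone

load(target_file)                               # restore it under its old name
balance
```

Note that load restores objects under the names they had when they were saved, overwriting objects of the same name that may exist in the workspace.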

Next to smart work with variables, another useful habit that can make your
code more understandable is using comments. In particular for longer scripts
or more complex chunks of code it is useful to write down short remarks
about what the individual values are and what various functions do.

The readability of the code is further enhanced by its visual form. You might
have noticed that we have consistently used spaces around the ’gets’ operator,
around the signs + and -, as well as after commas when defining vectors. We
also mentioned the use of brackets in some places where they are not entirely
necessary. While these are small details that do not influence the workings
of your code, they do make it easier on your (and others’) eyes. We have
also used one line of code per command. While it is possible to stack several
commands into one line of code if we separate them with a semicolon, this
strategy usually makes the readability worse.
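To illustrate these recommendations, the following sketch performs the same computation twice, once in a compressed and once in a readable style. The numbers (a capital of 1000 and an interest rate of 5% over three years) and all variable names are hypothetical.

```r
# Hard to read: no spaces, no comments, two commands on one line.
k<-1000;v<-k*(1+0.05)^3

# Easier to read: one command per line, spaces, and short comments.
capital <- 1000                      # starting capital
rate <- 0.05                         # yearly interest rate
value <- capital * (1 + rate)^3      # capital after three years

c(v, value)
```

Both versions give the same result; only the second one tells the reader what is being computed.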

Finally, we would like to reiterate our recommendation to get used to using
the ’gets’ operator <- for assignments not only because it is considered ’best
practice’, but also because later on you might encounter situations when
using the equality sign instead might cause issues.

2.9 Exercises
2.1 Define the following vectors making use of the : operator and the
functions rep and seq.
a) 1, 11, $11^2$, . . . , $11^{10}$
b) 1, 2, 3, 1, 2, 3, . . .; 15 times
c) 5, 5, 5, 6, 6, 6, 7, 7, 7, . . . , 18, 18, 18
d) −500, −450, −400, −350, . . . , −100
e) 55, 53, 51, 49, 47, . . . , 33
f) 2, 4, 4, 6, 6, 6, . . . , 14, 14, 14, 14, 14, 14, 14

2.2 Compute the following in one line of code (in a way that is as compact
as possible):
a) $\sum_{i=-25}^{25} i^2$
b) $\sum_{i=-10}^{10} 3^i$
c) 1 + 2 + 4 + . . . + 1024
d) $\sum_{i=1}^{100} i(10 - i)$
e) 4 + 7 + 10 + 13 + 16 + . . . + 259

2.3 Observe the outcomes of the following lines of code and think about why
they are what they are.
a) 1/Inf
b) 1/0 + 1/0
c) Inf - Inf
d) NA == NA
e) -5*Inf

2.4 Write a chunk of code that does the following:

1. It assigns the vector 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5 to the
variable called vector.
2. It combines vector with a predefined vector vector2 (assume it is
already available in your workspace) into one single vector called new_vector.
3. It prints the last 5 elements of new_vector.
4. It prints all elements of new_vector that are greater than 3.
5. It checks and prints how many elements of new_vector are smaller than
7 and at the same time not equal to 5.

2.10 Further readings

The contents of this chapter are discussed in Chapters 1 and 2 of [2]. The
introduction to R, the installation process and an overview of RStudio are
given in Chapter 1. Sections 2.1 and 2.3 discuss R as a calculator and the
use of vectors; at the end of each of these sections, there are also useful
exercises to practice your newly gained knowledge. Similarly, Section 2.7
focuses on logical vectors and includes exercises on this topic. Finally, Section
2.6 summarizes the use of R help.
Chapter 3

Functions of one variable

In mathematics (and other disciplines), functions are used to describe the
relationship between two or more quantities. In this chapter, we will consider
functions of one variable, that is functions that describe the relationship
between two quantities – one of them is the variable, the other one is the
function value. At a later point, we will consider functions of several variables.

3.1 The definition of a function

A function is in fact an assignment. The definition of a function f requires
three objects:

1. a domain A,

2. a target set, the so-called codomain B, and

3. a rule that assigns to any element of the domain A one element of the
codomain B.

We use the notation $f : A \to B$, $x \mapsto f(x)$ to describe the function, providing
its domain, codomain and the rule. The range (or image) of the function
$f : A \to B$ is the collection of all the values of f; we denote it by $f(A)$ or
$\mathrm{Im}(f)$ and write $f(A) = \{f(x) \mid x \in A\}$. Clearly, we have $f(A) \subseteq B$ ($f(A)$ is
a subset of B).
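For a finite domain, the range can be computed directly with the vector tools from the previous chapter. The following sketch uses a rule and a domain of our own choosing, $x \mapsto x^2$ on $A = \{-3, \ldots, 3\}$, and assumes the base R functions unique and sort, which are not introduced in the text.

```r
A <- -3:3                          # a small finite domain
values <- A^2                      # the rule applied to every element of A
image <- sort(unique(values))      # the range f(A): every value listed once

values
image
```

Here the integers could serve as a codomain, while the range consists of four numbers only, which illustrates that the range can be much smaller than the codomain.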

Note that the codomain of a function is not unique. Any set that contains
all possible values of the function can be used as its codomain; the smallest
possible codomain is the function’s range. Often one would use the range as
the codomain, but this is not strictly necessary. In some cases, for instance,
the range is a rather complicated set and therefore one would use a larger
set for the codomain with a more compact notation.
Example 3.1. A rule that assigns to each person in a room their age (in
whole years) is a function. Its domain is the set of all people in the room. A
possible codomain is the set of all natural numbers.

A rule that assigns to each age a person of this age in the Quantitative
methods lecture room is not a function since in a BBE cohort, there are
several people of the same age and therefore one cannot uniquely assign one
person of the given age to each age.
Problem 3.1. The total dollar cost of producing x units of a product is
given by C(x) = 100x√x + 500. What are the domain, codomain and range
of this function? What does its graph look like?
Solution. When considering the production of a product, it is usually impossible
to produce non-whole numbers of units. Therefore, strictly speaking, the
domain of the above function should be the natural numbers including zero, N0.
As a codomain, one could use the whole set of real numbers, but also the set
of positive numbers (clearly, C(x) ≥ 500 for any x ∈ N0). The range of this
function is then C(N0) = {100x√x + 500 | x ∈ N0}.

Looking at the range of this function, it is clear that there is no more
compact way of writing it down. Moreover, we will see later when discussing
optimization that it is not easy to, for instance, minimize a function like the
one given above if we only consider N0 as its domain. Therefore, in practice,
one would choose R0+ (the non-negative real numbers) as the domain. The
codomain can again be R or R0+. However, the range changes significantly:
if any non-negative number can be plugged in for x, the range will be
C(R0+) = [500, ∞).

To obtain the graph of the given function, we will make a small detour to
learn about defining and plotting functions in R.

3.2 Defining and plotting functions in R

In the previous chapter, we learned about built-in functions in R. In what
follows, we discuss how to define new functions, starting with the simpler
case of a mathematical function with one numeric input and one numeric
output. Moreover, we also discuss how to plot the graphs of such functions.
Later, we will extend the discussion of this section to more general functions
in various senses, e.g. considering several inputs or outputs or providing
default values for arguments.

Let us explain how functions are defined in R using an example.

f <- function(x) {
  return(x^2 + 3)
}

To define a function, we choose its name and assign to this name a particular
object. Using function informs R that whatever is provided in the
following chunk of code, enclosed in curly brackets, is part of the newly defined
function. Before coding the body of the function in curly brackets, we first
provide the argument(s) of the function in round brackets. In this case, the
function at hand takes one argument, a value x. In the body of the function
we see that, given x, the value x^2+3 is returned: we use the command
return to define the output of the function. We have successfully defined our
first function that, given a value x, computes the value of f(x) = x^2 + 3.
Let us now test this function:

f(0)

## [1] 3

f(-2)

## [1] 7

f(1:4)

## [1] 4 7 12 19

As we can see in the last output, this function is automatically vectorized,
that is, we can provide a vector as the argument and obtain a vector as
output. The entries of this output correspond to the values of the function
for each entry of the input. This is because, as already discussed in the
previous chapter, the basic mathematical operations, like raising to a power
or addition, are vectorized, too. If we directly executed the line of code
(1:4)^2 + 3, it would work and provide the same output as we see above.
Therefore, providing a vector for x works the same way for our defined
function.
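
One way to convince yourself of this is to compare the two results directly. The following is a minimal sketch; the variable name direct is only used for illustration.

```r
# Same function as defined above
f <- function(x) {
  return(x^2 + 3)
}

# Evaluate the expression directly on the vector 1:4 ...
direct <- (1:4)^2 + 3

# ... and compare with the function applied to the same vector
identical(f(1:4), direct)
```
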

Note that the above definition is a general one and the structure is necessary
for more complicated functions. However, with a function as simple as the
above, where only one line of code can be used to describe the value of the
function, neither the curly brackets nor the return command are necessary.
This can be easily verified:

f2 <- function(x) x^2 + 3

f2(1:4)

## [1] 4 7 12 19

Nevertheless, remember that whenever more than one action is to be done
within the function, the whole body of the function has to be enclosed in
curly brackets. Otherwise, only the first expression will be considered to be part of
the function. Therefore it might be a good idea, in particular while you are
still learning, to include the curly brackets in any case to make sure that you
do not forget them when they are necessary.
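
As a small illustration of a body with more than one action, the following sketch first stores an intermediate result and then returns a value computed from it. The function name g and the variable square are hypothetical and chosen only for this example.

```r
g <- function(x) {
  square <- x^2        # first action: store the square of x
  return(square + 3)   # second action: return the final value
}

g(1:4)
```

Here the curly brackets are required, since the body consists of two separate expressions.
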

To define a function with more than one argument, one simply lists all the
arguments in the brackets after the word function. Note that when
using the function, the arguments must be provided either in the exact order
of the function definition or by name.

f3 <- function(x, y) x^2 + y^3


# to compute 2^2 + 3^3:
f3(2, 3)

## [1] 31

f3(x = 2, y = 3)

## [1] 31

f3(y = 3, x = 2)

## [1] 31

# but not!
f3(3, 2)

## [1] 17
Now that we know how to define functions, let us turn to plotting them. As
you learn more R commands, you might notice that the names of many of
them are very intuitive. That is also the case for plotting: the command
to use is called plot. To plot pairs of points – which a graph of a function
consists of – you need to provide two vectors, the first vector being the x
coordinates of the points, and the second vector being their y coordinates.
x <- seq(-10, 10, by = 0.5)
y <- f(x)
plot(x, y)

[Figure: scatter plot of the points (x, y); x-axis "x" from −10 to 10, y-axis "y".]

In your RStudio, the plot will show up in the lower right corner. In some
cases, you might get the following error:
Error in plot.new() : figure margins too large
This might happen if the lower right window of your RStudio is too small to
reasonably show the plot. Just use your mouse to increase the size of this
window and try to plot again.

Let us now turn our attention to the plot we created. The first thing to
catch our attention is probably the fact that the plot does not really show
the graph of a function as one might expect it, instead only the provided
points are plotted. This can be changed by setting another parameter of the
plot function, namely type. Type "l" corresponds to plotting a line. In
this case, R will simply connect the provided points by a line. Note that R
does not distinguish between single and double quotation marks when dealing
with character variables/values, so both type = "l" and type = 'l'
lead to the same outcome (you may try this).

plot(x, y, type = "l")


[Figure: the same data drawn as a line (type = "l"); x-axis "x" from −10 to 10, y-axis "y".]


Clearly, the closer we choose the values of x to each other, the more exact
the plot. If, with the above function, we used x <- -10:10 instead of x <-
seq(-10, 10, by = 0.5), the plot would be more jagged than the one
presented above. Some further possible types are "b" or "o"; it is left to
the reader to inspect these.
The plot function belongs to the family of graphic functions and as such
takes many different graphic arguments. Many of them allow one to customize
the graph in various ways. For instance, one can use col to set the color of
the points/line. There are several ways to do this: colors can be passed to col
as character strings (any basic color and many others will work), as numbers
(e.g. 1 stands for black, 2 for red, 3 for green and 4 for blue) or in terms of
their RGB specification (which is beyond the scope of this text and will not
be discussed here).

plot(x, y, type = "l", col = "red")
lines(x, y + 2, col = 4)

[Figure: two line graphs for x from −10 to 10, y in red and y + 2 in blue.]

Arguments xlim and ylim allow the user to decide what portion of the x-axis
and the y-axis should be plotted. If they are not specified, they are
chosen such that all the provided points are visible (with small margins
around them). However, one can choose to limit the values to a smaller area or,
on the contrary, show a larger area. It is particularly important to think
about these arguments if you plot several functions in one plot whose ranges
(or domains) differ significantly. Each of these two arguments takes a vector
specifying the minimal and maximal value of the particular axis as its input.
The parameters xlab and ylab are used to control the names of the axes,
which by default are named after the vectors provided (compare the
code and the y-axis of the previous plots and the next one). Finally, the
argument main can be used to provide a title for the graph. However, it is
also possible to add the title after plotting the function itself, by means
of title.

plot(x, f(x), type = "l", xlim = c(-5, 5), main = "Some function")

[Figure: "Some function", the graph of f(x) for x between −5 and 5; x-axis "x", y-axis "f(x)".]


plot(x, y, type = 'l', ylim = c(-5, 150), xlab = 'x values', ylab = "y values")
title("Some function")

[Figure: "Some function", the graph of y for x from −10 to 10 with the y-axis running from −5 to 150; x-axis "x values", y-axis "y values".]

Note that, as mentioned above, the code works properly no matter whether we
use single or double quotation marks. However, for clean code, one should
decide on one of these two options and use it consistently.

Now that we know how to define and plot functions in R, we can plot the
function from Problem 3.1.

C <- function(x) {100*x*sqrt(x) + 500}

plot(0:100, C(0:100), type = "l",
     main = "Cost function", xlab = "x", ylab = "C(x)")

[Figure: "Cost function", the graph of C(x) for x from 0 to 100; y-axis "C(x)".]

3.2.1 Plotting several functions in one plot

You might have noticed that whenever we used the function plot, a new plot
was created. If one would like to add another function or just a few points
to an already existing plot, there are other commands that can be used.
Depending on whether only points or a line graph should be added, the basic
commands are points and lines. Both of these work very similarly to plot,
though some arguments of plot, like those for customizing the axes or the
plot title, will not work with them, since these arguments apply to the whole
plot being just created, whereas points and lines only add to the already
existing plot. Therefore, as already mentioned above, it might be a good
idea to think about the range of your functions before you start plotting, and
either start with the function that would need a larger area on the y-axis, or
adjust the axis limits accordingly.

To illustrate how one can plot several functions and some extra points in a
single figure, let us plot some more functions.

plot(0:100, C(0:100), type = "l",
     main = "Some functions", xlab = "x", ylab = "y")
lines(0:100, f(0:100), col = "red")
points(20, C(20))

[Figure: "Some functions", the graphs of C (black) and f (red) for x from 0 to 100, with the point (20, C(20)) marked; y-axis "y".]

Note that we have split the plotting command over two lines. R does not
necessarily interpret the end of a line as the end of a command. If the
command has not been finished in a line, for instance because brackets were
not closed, R will continue reading in the next line until it finds the end of
the command, e.g. the closing bracket of a function call. In particular when using
functions with many arguments, splitting commands into several lines can
make the code more readable.
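
The following sketch illustrates the difference between an unfinished and a finished line; the variable names total_a and total_b are made up for this example.

```r
# The first line ends with "+", so the expression is incomplete
# and R keeps reading on the next line.
total_a <- 1 + 2 +
  3

# Here the first line is already a complete command, so R stops there;
# the second line is evaluated separately (it is just the value +3).
total_b <- 1 + 2
+ 3

total_a   # 6
total_b   # 3
```

When splitting a command, it is therefore safest to break the line where the command is visibly unfinished, e.g. after a comma or an operator.
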

Now that we know what a function is, we will study some special types of
functions and their applications in the following sections.

3.3 Linear functions

The simplest and at the same time very important class of functions are
linear functions.
Definition 3.1. A linear function is a function f : R → R of the form
f (x) = ax + b
with a, b ∈ R. The two parameters a, b of a linear function are called the
slope and intercept of the function, respectively.

The graph of a linear function is a line, and the names of the parameters
also allude to their interpretation:

• The intercept b gives the intersection of the graph of the function with the
y-axis. That is, it is the value of the function at x = 0: f(0) = b.
• The slope a describes how much the function value y = f(x) changes
with a unit change in x. That is, if x increases by 1, f(x) changes
by a. If a > 0, the function is increasing (we will have a closer look
at increasing and decreasing functions at a later point), that is, if x
increases, so does f(x). For a < 0, we obtain a decreasing function for
which f(x) decreases with increasing x. Finally, for a = 0 the function
is constant.

By comparing the values of the function for two different values x1 ≠ x2 of the
argument x, we obtain the following equality that holds for any linear function:

\[ \frac{f(x_2) - f(x_1)}{x_2 - x_1} = a \tag{3.1} \]

(it is a simple exercise to show that this holds by plugging in the functional
values of f(x1) and f(x2)). This goes hand in hand with the interpretation
of a slope: If a unit change in x results in a change of a in f(x), then f(x)
has to change by a(x2 − x1) when changing x from x1 to x2.
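
Equality (3.1) can be checked numerically for a concrete case. In the sketch below, the slope 2.5, the intercept −1 and the points 3 and 7 are arbitrary values chosen for illustration, and lin is a hypothetical function name.

```r
# Hypothetical linear function with slope a = 2.5 and intercept b = -1
a <- 2.5
b <- -1
lin <- function(x) a * x + b

x1 <- 3
x2 <- 7

# Difference quotient from (3.1); for a linear function this equals a
(lin(x2) - lin(x1)) / (x2 - x1)
```
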

Since a line is uniquely determined by two points, the form of a linear function
can uniquely be determined if the values of the function for two points are
given. We will illustrate this in the following problem, which also shows an
application of linear functions. In particular, in simple economic models the
supply and demand for a product in a market are often modelled by a linear
function.

Problem 3.2. Suppose demand D is a linear function of the price per unit
P. When the price is €10, demand is 300 units, and when the price is €15, demand
is 250 units. Find the demand function.

Solution. From the given information, we know the following:

• D(P) = aP + b for some a, b ∈ R,
• D(10) = 10a + b = 300 and
• D(15) = 15a + b = 250.

From the information about D(10) and D(15), we obtain from (3.1) that

\[ a = \frac{D(15) - D(10)}{15 - 10} = \frac{-50}{5} = -10. \]

Alternatively, we can argue about a as follows: From the values of D(15) and
D(10), we see that an increase in the variable P by 5 leads to a decrease in
the function value by 50. An increase by 1 would therefore lead to a decrease
by 50/5 = 10, which gives us a = −10.

To obtain the value of b, we can plug a = −10 into one of the equations:
−10 · 10 + b = 300, leading to b = 400.

The demand function is thus given by D(P ) = −10P + 400.
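
The two conditions D(10) = 300 and D(15) = 250 form a system of two linear equations in a and b, which can also be solved in R. The sketch below uses the base R function solve; the names M, rhs and coefs are chosen only for this example.

```r
# Coefficient matrix: one row per condition, columns for a and b
#   10a + b = 300
#   15a + b = 250
M <- rbind(c(10, 1),
           c(15, 1))
rhs <- c(300, 250)

coefs <- solve(M, rhs)   # solves M %*% c(a, b) = rhs
coefs                    # first entry is a, second entry is b
```
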

Problem 3.3. Assume that the demand as a function of price is as in
Problem 3.2 and the supply function S is given by S(P) = 10P. Find
the equilibrium price P* with D(P*) = S(P*). Plot the demand and supply
functions and the equilibrium.

Solution. To find the equilibrium price, we simply solve a linear equation:

−10P + 400 = 10P
−20P = −400
P = 20

The equilibrium price is thus P* = 20, with both demand and supply being
200 units at this price. For the plot, we use R:

D <- function(P) -10*P + 400
S <- function(P) 10*P
p <- c(0, 40)
plot(p, D(p), type = "l", main = "Market equilibrium",
     xlab = "price", ylab = "demand and supply")
lines(p, S(p), col = "green")
points(20, 200)

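[Figure: "Market equilibrium", demand (black) and supply (green) as lines for prices from 0 to 40, with the equilibrium point (20, 200) marked; x-axis "price", y-axis "demand and supply".]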

Note that we have evaluated the functions at only two prices, 0 and 40. This
again relies on the fact that a linear function is determined by just two points.
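
The equilibrium price can also be found numerically, as a root of the difference between demand and supply. The following sketch uses the base R function uniroot; the name excess and the search interval from 0 to 40 are assumptions made for this example.

```r
D <- function(P) -10*P + 400
S <- function(P) 10*P

# Excess demand: positive below the equilibrium price, negative above it
excess <- function(P) D(P) - S(P)

# uniroot needs an interval on which the function changes its sign
eq <- uniroot(excess, interval = c(0, 40))
eq$root       # numerical approximation of the equilibrium price
D(eq$root)    # demand (and supply) at the equilibrium
```
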

3.3.1 Plotting linear functions in R

To plot a linear function in R, one can simply define the function and use the
plotting function as we did in Problem 3.3. However, since linear functions
are on the one hand very simple and on the other hand play a very
important role in mathematics, there is also a function abline that allows
one to plot linear functions (and lines in general) in a more direct way. However,
this function, similarly to lines and points, only adds a line to a plot; it
cannot be used to create a new plot. There are several ways in which this
command can be used:

• Plot a linear function: abline(a, b) adds a line determined by a linear
function with intercept a and slope b. Be careful, the roles of a and
b are exchanged in comparison to their roles as used in the classical
definition of a linear function!

• Plot a horizontal line: abline(h = H) plots a horizontal line of the
form y = H.

• Plot a vertical line: abline(v = V) plots a vertical line of the form
x = V.

Let us demonstrate this on the above example with the market equilibrium.
In addition to the demand and supply functions, we will also plot a horizontal line
at the level of the equilibrium demand and supply, and a vertical one at the
level of the equilibrium price. However, to make the plot easier to read, we
will use dashed lines for the horizontal and the vertical line. To this end,
we will use a graphical parameter that we have not used so far: lty can be
used to control the type of the line being plotted. By default it is equal to
1, which corresponds to a solid line; lty = 2 means a dashed line. Again, you
can play around with this parameter to see what different values mean.

D <- function(P) -10*P + 400
p <- c(0, 40)
plot(p, D(p), type = "l", main = "Market equilibrium",
     xlab = "price", ylab = "demand and supply")
abline(a = 0, b = 10, col = "green")
points(20, 200)
abline(h = 200, col = "blue", lty = 2)
abline(v = 20, col = "red", lty = 2)

[Figure: "Market equilibrium", demand and supply lines with dashed horizontal (blue) and vertical (red) lines through the equilibrium point (20, 200); x-axis "price", y-axis "demand and supply".]

3.4 Polynomial functions

Another important class of functions is the class of polynomial functions.
In fact, linear functions are also special cases of polynomial functions; in
particular, they are polynomials of degree 1 (if a ≠ 0).

Definition 3.2. A polynomial of degree n is a function f : R → R of the
form

\[ f(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0 \]

with $a_n \in R \setminus \{0\}$ and $a_{n-1}, \dots, a_0 \in R$.

Polynomial functions are generally considered to be 'nice' functions that are
'well-behaved'. They are very well understood from a mathematical point of
view; however, finding roots (values x with f(x) = 0), minima or maxima
can be tedious by hand. Generally, knowing the roots of polynomials is very
useful.
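
For concrete coefficients, roots can be computed numerically. The sketch below uses the base R function polyroot, which expects the coefficients in increasing order of the power; the polynomial itself is a made-up example.

```r
# Hypothetical example: f(x) = 2 - 3x + x^2 = (x - 1)(x - 2)
# Coefficients are given as a0, a1, a2 (increasing powers of x)
roots <- polyroot(c(2, -3, 1))

roots       # polyroot returns complex numbers
Re(roots)   # real parts; here the roots are (approximately) 1 and 2
```

Since polyroot works with complex numbers, real roots are returned with an imaginary part that is zero up to rounding error.
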

As already mentioned above, polynomials of degree 1 correspond to linear
functions. Polynomials of degree 2 are another special class of functions,
namely quadratic functions.
Definition 3.3. A quadratic function is a function f : R → R of the form

\[ f(x) = ax^2 + bx + c \]

with a ∈ R \ {0} and b, c ∈ R.

The graph of a quadratic function is a parabola that opens

• upwards if a > 0 and
• downwards if a < 0.

The minimum (if a > 0) or maximum (if a < 0) can easily be investigated
using the following identity:

\[ ax^2 + bx + c = a\left(x + \frac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a}. \tag{3.2} \]

Recall that we already mentioned this equality when deriving the solutions
of a quadratic equation in Section 1.2.1. Let us now study it in more detail.
Due to the square, we have $\left(x + \frac{b}{2a}\right)^2 \ge 0$, with
$\left(x + \frac{b}{2a}\right)^2 = 0$ for $x = -\frac{b}{2a}$. If a > 0, clearly 0 is
the smallest possible value that $a\left(x + \frac{b}{2a}\right)^2$ can
achieve, and by considering also the term $-\frac{b^2 - 4ac}{4a}$, which is a constant, i.e.
not dependent on x, we get that the smallest possible value the quadratic
function can achieve is $-\frac{b^2 - 4ac}{4a}$, and this happens if $x = -\frac{b}{2a}$.
If a < 0, the argument is very similar, the only difference being that f achieves
its maximum at $x = -\frac{b}{2a}$.
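
The two expressions read off from (3.2) translate directly into R. The following is a minimal sketch; the function name vertex, its argument names and the example coefficients are chosen only for illustration.

```r
# Location and value of the extremum of a*x^2 + b*x + const, using (3.2)
vertex <- function(a, b, const) {
  x_star <- -b / (2 * a)                        # where the extremum is attained
  value  <- -(b^2 - 4 * a * const) / (4 * a)    # value of the function there
  c(x = x_star, value = value)
}

# Hypothetical example: x^2 - 4x + 7 = (x - 2)^2 + 3, minimum 3 at x = 2
vertex(1, -4, 7)
```

Whether the result is a minimum or a maximum still has to be decided from the sign of a.
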
Problem 3.4. Consider the function $f(x) = -\frac{1}{2}x^2 - x + \frac{3}{2}$. Find its minimum
or maximum. Moreover, find the values $x_1, x_2$ with $f(x_i) = 0$ and use these
to reformulate f and to study how the sign of the function changes. Verify
your findings by plotting the function.
Solution. We have $a = -\frac{1}{2}$, b = −1 and $c = \frac{3}{2}$. Since a < 0, the parabola
of the function opens downwards and the function possesses a maximum.
To find it, we will make use of equality (3.2). By plugging in, we find that
$f(x) = -\frac{1}{2}(x + 1)^2 + 2$, which tells us that the maximum is achieved at
x = −1 with the value 2.

To find the roots of the function, we can use the formula for solving quadratic
equations, to find that f(x) = 0 for $x_1 = -3$ and $x_2 = 1$. From that it follows
that $f(x) = -\frac{1}{2}(x + 3)(x - 1)$. Let us now study the sign of the function. We
know that for $x_1$ and $x_2$, it is equal to 0, so these will be the points where the
sign changes. We can therefore split the real numbers into three intervals:
(−∞, −3), (−3, 1) and (1, ∞). In the first interval, we have (x + 3) < 0 and
(x − 1) < 0, which after multiplying the product of these two terms by $-\frac{1}{2}$ yields f(x) < 0.
A similar situation arises in the third interval, where both brackets are positive.
On the other hand, between −3 and 1, (x + 3) > 0 and (x − 1) < 0, which
implies f(x) > 0.

Finally, we can verify all of the above graphically in R:

f <- function(x) -1/2*x^2 - x + 3/2
x <- seq(-5, 5, by = 0.01)
plot(x, f(x), type = "l")
points(-1, 2, col = "red")
abline(h = 0)
points(c(-3, 1), c(0, 0), col = "green")
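
[Figure: the graph of f(x) for x from −5 to 5, with the maximum (−1, 2) marked in red and the roots (−3, 0) and (1, 0) in green; x-axis "x", y-axis "f(x)".]
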
3.5 Power functions

Definition 3.4. A power function is a function f : R+ → R of the form
$f(x) = Ax^r$ for A, r ∈ R. If r > 0, we may allow 0 to be in the domain of
f, with f(0) = 0. Moreover, if $r = \frac{a}{b}$ with a ∈ Z and b ∈ N odd, negative
values (and if a > 0 also 0) can be contained in the domain.

At a closer look, one can see that if r ∈ N, a power function is a special kind
of polynomial of degree r, with $a_{r-1} = \dots = a_0 = 0$.

Any power function always passes through the point (1, A) which can easily
be verified by plugging in 1 into the general formula of a power function.
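
A practical remark on negative values in the domain: although a power function with an exponent such as 1/3 is mathematically defined for negative x, R returns NaN when a negative number is raised to a non-integer power. The helper cube_root below is a hypothetical workaround sketched for this one exponent only.

```r
# R does not evaluate this as a real cube root; the result is NaN.
# (The parentheses matter: -8^(1/3) is read as -(8^(1/3)).)
(-8)^(1/3)

# Hypothetical helper: real-valued cube root that also accepts negative x
cube_root <- function(x) sign(x) * abs(x)^(1/3)
cube_root(c(-8, 0, 27))
```
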

Example 3.2. Assume that the relationship between the size of houses s (in
m²) and their selling price P (in €) follows approximately $P(s) = 40000 \cdot s^{0.4}$.
In that case, a house of no size, logically, costs 0 €. A tiny house of size 1
m² would cost P(1) = 40000 € and a house with an area of 10 m² would
cost approximately 100475.5 €. To get an idea about this relationship, we
can plot the function in R:

P <- function(s) 40000*s^0.4
s <- seq(0, 100, by = 0.1)
plot(s, P(s), type = "l", main = "House prices")

[Figure: "House prices", the graph of P(s) for s from 0 to 100; x-axis "s", y-axis "P(s)".]


3.6 Exponential and logarithmic functions

The last two classes of functions we will discuss in this section are closely
connected to each other, since they are what we call inverse functions to each
other – see Chapter 4. We will start by introducing exponential functions.

Definition 3.5. An exponential function is a function f : R → R of the form

\[ f(x) = A a^x \]

for A ∈ R and a ∈ R+.

A special case of an exponential function is what we call the natural exponential
function. This function arises with

\[ a = e = \sum_{n=0}^{\infty} \frac{1}{n!} \approx 2.718281828459045 \]

and A = 1, and is often denoted as exp(x).
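
The series for e converges very quickly, which can be seen by computing a few partial sums in R. This is only an illustrative sketch; the cut-off at n = 10 is an arbitrary choice.

```r
n <- 0:10

# Partial sums 1/0!, 1/0! + 1/1!, 1/0! + 1/1! + 1/2!, ...
partial_sums <- cumsum(1 / factorial(n))

partial_sums[c(1, 2, 3, 11)]   # sums up to n = 0, 1, 2 and 10
exp(1)                         # value of e as provided by R
partial_sums[11] - exp(1)      # remaining error after 11 terms
```
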

Exponential functions, and in particular the natural exponential function,
play a special role in many areas. One such example is probability and
statistics, where the density of the standard Gaussian (also called normal)
distribution is given by

\[ \frac{1}{\sqrt{2\pi}} e^{-x^2/2}. \]
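
As a quick plausibility check, the density of the standard normal distribution (mean 0, variance 1), written with the natural exponential function and the constant 1/sqrt(2*pi), can be compared with the built-in R function dnorm. The function name phi is chosen only for this sketch.

```r
# Standard normal density written out with the exponential function
phi <- function(x) exp(-x^2 / 2) / sqrt(2 * pi)

x <- seq(-3, 3, by = 0.5)

# dnorm with its default arguments is the standard normal density
all.equal(phi(x), dnorm(x))
```
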

The exponential function also appears in the densities or distribution functions
of many other distributions; some of them will be introduced to you in
Quantitative methods II.

To have a better idea about exponential functions, let us plot the natural
exponential function.

x <- seq(-3, 3, by = 0.1)
plot(x, exp(x), type = "l", main = "Exponential function")

[Figure: "Exponential function", the graph of exp(x) for x from −3 to 3; y-axis "exp(x)".]


The graph of $f(x) = A a^x$ will always pass through two points: (0, A) and (1, Aa) (this can be
easily verified by plugging in). The shape will always be similar to the one we observe for
the natural exponential function, but it might be scaled or mirrored around
the x-axis. Whether the function increases or decreases is governed
by both A and a. If we for a moment consider A > 0, the value of a decides
whether the function increases (for a > 1) or decreases (for a < 1). If A < 0,
it is exactly the other way round. Note that for A ≠ 0 the exponential function can
never reach the value 0, which is why its graph cannot intersect the x-axis.

The rules listed in Section 1.2.4 apply when working with exponential functions.
In particular, for an exponential function of the form $f(x) = a^x$ for some
a ∈ R+, for any x, y ∈ R we have the following:

\begin{align*}
f(x)f(y) &= a^x \cdot a^y = a^{x+y} = f(x + y), \\
f(xy) &= a^{xy} = (a^x)^y = (a^y)^x = f(x)^y = f(y)^x, \\
f(-x) &= a^{-x} = \left(\frac{1}{a}\right)^x = \frac{1}{f(x)}, \\
f(0) &= a^0 = 1.
\end{align*}

Generally, one can interpret an exponential function of the form $f(x) = A a^x$
as follows:

• If A > 0 and $a = 1 + \frac{p}{100}$ for some p > 0, then f(x) increases by p% for
every unit increase in x.
• If A > 0 and $a = 1 - \frac{p}{100}$ for some 0 < p < 100, then f(x) decreases by p%
for every unit increase in x.
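
The first statement can be checked numerically: the ratio of two consecutive function values is always 1 + p/100. The values A = 200 and p = 5 in this sketch are arbitrary.

```r
A <- 200
p <- 5
f <- function(x) A * (1 + p/100)^x

x <- 0:5

# Ratio of consecutive values: each one equals 1 + p/100 = 1.05,
# i.e. an increase of 5% per unit step in x
f(x + 1) / f(x)
```
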

As already hinted at in Section 1.2.4, exponential functions play an important
role in financial mathematics. There, the amount of money in an account
after t years can be described by an exponential function. In particular, if
the amount A is deposited in an account at time 0 and the interest on the
account is r · 100% per year, then after t years, the sum in the account is
$f(t) = A(1 + r)^t$.
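
In R, this formula can be wrapped in a small function. The following is a sketch under the assumptions above (yearly compounding, constant rate); the function name balance and the example values are made up.

```r
# Amount in the account after t years, for a deposit A and yearly rate r
balance <- function(t, A, r) A * (1 + r)^t

# Hypothetical example: 1000 deposited at 2% per year, after 0 to 3 years
balance(0:3, A = 1000, r = 0.02)
```

Since the power operator is vectorized, a whole vector of years can be passed for t at once.
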

In some situations, rather than being interested in the value of the exponential
function after some time, one is interested in finding the x for which the
function reaches a particular value. To do so, one takes the logarithm, as
defined in Definition 1.2. The logarithmic function is therefore very closely
related to the exponential function.
Definition 3.6. A logarithmic function is a function f : R+ → R of the form

\[ f(x) = \log_a x \]

with a ∈ R+.

As already mentioned in Section 1.2.5, sometimes we write log x or ln x. In
both of these cases, we mean $\log_e x$, unless stated otherwise.

Again, to get a first idea about the general form of logarithmic functions, let
us plot the natural logarithmic function.

x <- seq(0, 10, by = 0.1)
plot(x, log(x), type = "l", main = "Logarithmic function")

[Figure: "Logarithmic function", the graph of log(x) for x from 0 to 10; y-axis "log(x)".]

The graph of a general logarithmic function will always pass through the
points (1, 0) and (a, 1). Again, the general shape of the logarithmic function
is as observed in the graph of the natural one, but different values of a will
result in different scalings and possibly a mirrored version. For a > 1 we
have an increasing function whose graph opens downwards; for a < 1 we get
a decreasing function with the graph opening upwards.

Using the rules from Section 1.2.5, we can derive the following properties of
the logarithmic function $f(x) = \log_a x$ for x, y ∈ R+:

\begin{align*}
f(xy) &= \log_a (xy) = \log_a x + \log_a y = f(x) + f(y), \\
f(x^p) &= \log_a x^p = p \log_a x = p f(x), \\
f(a^x) &= \log_a a^x = x, \\
a^{f(x)} &= a^{\log_a x} = x.
\end{align*}

The last two properties show clearly how close the exponential and logarithmic
functions are.
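
These properties can be checked numerically in R, where the base of the logarithm is passed via the argument base. The concrete numbers below are arbitrary choices for this sketch.

```r
a <- 3      # base of the logarithm
x <- 5
y <- 7
p <- 2.5

f <- function(x) log(x, base = a)

all.equal(f(x * y), f(x) + f(y))   # logarithm of a product
all.equal(f(x^p), p * f(x))        # logarithm of a power
all.equal(f(a^x), x)               # logarithm undoes the exponential
all.equal(a^f(x), x)               # exponential undoes the logarithm
```

The comparison uses all.equal rather than == because the results agree only up to floating point rounding.
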
Problem 3.5. With the help of the properties of exponential and logarithmic
functions, derive the rule for $\log_a x - \log_a y$.
Solution. To find the rule, we start by writing $-\log_a y$ as

\[ -\log_a y = -1 \cdot \log_a y = \log_a y^{-1} = \log_a \frac{1}{y}. \]

Then, using the fact that $\log_a x + \log_a y = \log_a xy$, we can write

\[ \log_a x - \log_a y = \log_a x + \log_a \frac{1}{y} = \log_a \frac{x}{y}. \]

Problem 3.6. Simplify the function $f(x) = \exp(\ln x^2 - 2 \ln y)$.
Solution. We again make use of the properties of the logarithmic and exponential
functions (including the one derived in Problem 3.5) to write:

\[ f(x) = \exp(\ln x^2 - \ln y^2) = \exp\left(\ln \frac{x^2}{y^2}\right) = \frac{x^2}{y^2}. \]

We will close this section with a discussion about doubling times.

Problem 3.7. Assume that you invest €10 at an annual interest rate of 1%.
Determine f(t), the amount you have after t years. How long does it take
(approximately) for your investment to double? How long does it take to
quadruple?
Solution. We have r = 0.01 and A = 10, so $f(t) = A(1 + r)^t = 10 \cdot 1.01^t$.
To find out how long it takes for the investment to double, i.e. to reach €20,
we solve

\begin{align*}
10 \cdot 1.01^t &= 20 \\
1.01^t &= 2 \\
\log 1.01^t &= \log 2 \\
t \log 1.01 &= \log 2 \\
t &= \frac{\log 2}{\log 1.01} \approx 70
\end{align*}

after rounding to a whole number. Similarly, the time in which the investment
quadruples can be computed as $t = \frac{\log 4}{\log 1.01} \approx 139$.
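
The two values can be reproduced directly in R. This is a minimal sketch using the interest rate from Problem 3.7.

```r
r <- 0.01

log(2) / log(1 + r)   # doubling time, about 69.66 years
log(4) / log(1 + r)   # quadrupling time, about 139.32 years

# Since log(4) = 2 * log(2), quadrupling takes exactly twice as long as doubling
all.equal(log(4) / log(1 + r), 2 * log(2) / log(1 + r))
```
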

Problem 3.8. How long does it take for an amount x to double at a yearly
interest rate of i ∈ {1, 2, 3} %?

Solution. Let us find the doubling times for general i:

\begin{align*}
x \left(1 + \frac{i}{100}\right)^t &= 2x \\
\left(1 + \frac{i}{100}\right)^t &= 2 \\
t &= \frac{\log 2}{\log\left(1 + \frac{i}{100}\right)}
\end{align*}

If we now compute these values, we get

• for i = 1%: t ≈ 70,
• for i = 2%: t ≈ 35,
• for i = 3%: t ≈ 23.

In the above problem, we might observe the pattern that the doubling time
for i = 2% is about half of the time for i = 1%, and the time for i = 3% is
about a third of the time for i = 1%. In fact, this pattern can be observed
in general, and since we have t ≈ 70 for i = 1%, this pattern can be written as
t ≈ 70/i for general i. This rule of thumb is known as the rule of 70. This rule
can also be easily verified in R.

i <- 1:20
rule70 <- 70/i
exact_t <- log(2)/log(1+i/100)
plot(i, rule70, main = "Rule of 70",
     xlab = "Interest", ylab = "Doubling time")
points(i, exact_t, col = "red")

[Figure: "Rule of 70", the approximate doubling times 70/i (black) and the exact doubling times (red) for i from 1 to 20; x-axis "Interest", y-axis "Doubling time".]

rule70 - exact_t

## [1] 0.339283106 -0.002788781 -0.116438917 -0.172987685
## [5] -0.206699083 -0.228994379 -0.244768351 -0.256468342
## [9] -0.265453949 -0.272540897 -0.278248255 -0.282922041
## [13] -0.286801784 -0.290058556 -0.292817788 -0.295173539
## [17] -0.297197719 -0.298946245 -0.300463246 -0.301784017
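
The reason why the rule works can be sketched with a standard approximation that is not derived in this text: for small u, the natural logarithm satisfies log(1 + u) ≈ u. Applied to the doubling time with u = i/100, this gives

\[ t = \frac{\log 2}{\log\left(1 + \frac{i}{100}\right)} \approx \frac{\log 2}{i/100} = \frac{100 \log 2}{i} \approx \frac{69.3}{i}. \]

Since log(1 + u) is slightly smaller than u for u > 0, the exact doubling time is slightly larger than 69.3/i, which is one reason why the rounded constant 70 performs well for typical interest rates.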

3.7 Defining functions with default values in R

We have already mentioned default values for some functions in Chapter
2. As an example, we can mention the function log, for which we can find
the following in the help: log(x, base = exp(1)). To recall what this means,
we see that the function takes two parameters, x and base. Since x is listed
without further details, this is a mandatory argument that the user has to
provide. On the other hand, base is followed by an equality sign, which means
that this argument has a default value as listed after the equality sign, i.e.
in this case it is exp(1). That means that log by default computes the natural
logarithm. If the argument base is provided, this value will be used as the
base of the logarithm. However, if the user does not provide any value for
this parameter, R will automatically work with the default value exp(1).
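
The behaviour of the default value can be seen in a few calls; the number 8 is an arbitrary example.

```r
log(8)                   # base not provided: natural logarithm
log(8, base = exp(1))    # the same, with the default written out
log(8, base = 2)         # logarithm with base 2
log(8, 2)                # the same, with base provided by position
```
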

If a user wants to define a function with default values, it works in a very
similar way. When listing the function arguments in its definition, one simply
follows the name of the argument that should have a default value by an
equality sign and the default value itself. We will show how this works using
the example of defining a general linear function:

linfun <- function(x, a = 0, b = 0) {a*x + b}

This function, as we can see in its body, computes the value a*x+b. If the
values a and b are provided, these values will be used, otherwise their default
values 0 will be used. Let us study the results:

# a = 1, b = 2
linfun(1:5, 1, 2)

## [1] 3 4 5 6 7

linfun(1:5, a = 1, b = 2)

## [1] 3 4 5 6 7

# a = 2, b = 1
linfun(1:5, a = 2, b = 1)

## [1] 3 5 7 9 11

# a = 0, b = 0 by default
linfun(1:5)

## [1] 0 0 0 0 0

# a = 0 by default, set b = 1
linfun(1:5, b = 1)

## [1] 1 1 1 1 1

# be careful! If names are not provided, the values will be
# assigned in order:
linfun(1:5, 1) # this code means a = 1, b = 0 instead

## [1] 1 2 3 4 5

3.8 Exercises

3.1 Find the functions from the graphs. Note that all functions are linear,
quadratic, exponential or logarithmic.

[Figure: eight panels showing the graphs of f(x), g(x), h(x), i(x), j(x), k(x), l(x) and m(x), each plotted against x.]


3.2 Determine the domain and range of the following functions:

a) f(x) = x + 4
b) $g(x) = x^2 + 2x + 7$
c) h(x) = log(3 − x) + 2
d) $i(x) = \frac{1}{x^2 + 4x + 5}$

3.3 In the following scenarios, find the demand and supply function (unless
given explicitly), the equilibrium price(s) $P^*$ and the equilibrium demand and
supply $D(P^*) = S(P^*)$. Then write code in which you define the demand
and supply functions D and S and plot a graph showing both functions and
the market equilibrium that satisfies the following (note that the list is only
to maintain readability, the points do not necessarily show up in the code in
this order):
i) Both functions are plotted as lines, and there is a point/points that
shows the equilibrium/equilibria.
ii) The demand function, supply function and the equilibrium/equilibria
are all plotted in different colors.
iii) It shows the values of both functions for P between x1 and x2 (where
x1 and x2 are defined in each scenario).
iv) The title of the plot is Market equilibrium, the x-axis is called
Price and the y-axis is called Quantity.
The scenarios:
a) The demand for a particular product in dependence on the price P is
given by the quadratic function $D(P) = P^2 - 20P + 100$ and the supply
S follows a linear function. The supply at P = 0 is 0 units, whereas at
P = 3, the supply is 15 units.
b) The demand and supply in a market are both linear functions of price
P. If the price is 10 EUR, the demand is 80 pieces, whereas the supply
is only 30 pieces. On the other hand, if the price is 25 EUR, then the
demand is 50 pieces and the supply is 75 pieces.
c) The demand in a market is a quadratic function of price and supply in
the same market is a linear function of the price P. If the price is 2
EUR, the demand is 84 pieces, whereas the supply is only 18 pieces. On
the other hand, if the price is 6 EUR, then the demand is 28 pieces and
the supply is 54 pieces. Moreover, if the product were for free (P = 0),
the demand would be 100 pieces.
d) The demand in a market is a linear function of the price P and supply in
the same market is a quadratic function of price. If P = 2, the demand
is 228 pieces, whereas the supply is only 20 pieces. If P = 10, then
the demand is 84 pieces and the supply is 420 pieces. Moreover, if the
product were for free, there would be no supply.

3.4 In the following, you will find short tasks and corresponding pieces of R
code. However, in each code, there are some mistakes. Find and correct the
mistakes (and avoid building in new ones) such that in the end, you have a
functioning piece of code that fulfills the given task. Try to do so without using
R first, only then check your final code in R (using several different input
parameters). In each task, you may assume that the provided arguments
satisfy the conditions given, i.e. a "missing check" is not a mistake (e.g. if the
inputs are required to be integers, you may assume that the user will only
plug in integers).
a) Implement a function that evaluates $f(x, y) = \ln x + \frac{1}{x+y} + 3y$ for
$x, y \in \mathbb{R}$.
f(x, y) <- function {
part1 <- ln(x)
part2 <- 1/x+y
part3 <- 3y
return(part1 + part2 + part3)
}
b) Implement a function expon.inverse that for any positive number a
prints the value of the function $a^x$ and its inverse for some positive x.
The arguments of expon.inverse are x and a, where the default value
of a is Euler's number e (but of course, the function should work
properly for any value of a, not only the default).
expon.inverse <- function(x, a = e) {
print(a^x)
print(log(x))
}

c) Implement a function that evaluates
$f(x, y) = (x + y)^2 + \frac{\sqrt[8]{|x - y|}}{4x + 5y}$ for
positive x and y.
f <- function(x, y) {
part1 <- (x + y)^2
part2n <- abs(x-y)^1/8
part2d <- 4x+5y
print(part1 + part2n/part2d)
}

d) Implement a function that evaluates $f(x, y) = \log_{10}(x) + \sqrt{xy} + e^y$ for
positive x and y. Then use it to compute f(10, 1).
f <- function(x, y) {
p1 <- log(x)
p2 <- sqrt(xy)
p3 <- e^y
p1 + p2 + p3
}
f(y = 1, x = 10)

3.9 Further readings

Chapter 4 of [1] offers a detailed overview of the contents of this chapter.
In addition to the theoretical explanations, each section also offers many
exercises to check your understanding of the topics. We particularly suggest
you do the following exercises (though other exercises might be helpful, too):

• 8, 11 and 13-15 in Section 4.2,
• 1-7 in Section 4.4,
• 2-5 in Section 4.5,
• 3-8 in Section 4.6,
• 1 and 4-6 in Section 4.9,
• 1-9 in Section 4.10,
• 4, 6-8, 11, 13, 19, 20, 23 and 24 in Review exercises.
Chapter 4

More on functions

In this chapter we will extend the knowledge gained in the previous chapter in
several ways. We will study how simple operations like addition or multiplication
transform graphs of functions and how new functions can arise from (basic)
functions by combining several functions. We will introduce the notion of
an inverse function and finally, we will study some important properties of
functions.

4.1 Transformations of graphs

In the following, let us consider a function f : A → B.

We will start this section by considering adding a constant (positive or
negative, which means that this covers the case of subtracting a constant,
too). We distinguish between adding the constant to the function image and
to the variable.

• Consider the function g(x) = f(x) + c. The graph of g is the graph of
  f shifted by c along the y-axis (i.e. shifted upwards by c if c > 0 and
  downwards by |c| if c < 0). The domain of g is the same as the domain
  of f; the range of g is the range of f shifted by c.
• Consider the function h(x) = f(x + c). The graph of h is the graph of
  f shifted by −c along the x-axis (i.e. shifted to the left by c if c > 0
  and to the right by |c| if c < 0). The domain of h is the domain of f
  shifted by −c; the range of h is the same as the range of f.
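
The direction of the horizontal shift is easy to get wrong, so a quick numerical check may help. The following is a minimal sketch, assuming f(x) = x^2 and the shift constant c = 2 (stored in the illustrative variable c0); it locates the minimum of each shifted function on a grid:

```r
f <- function(x) x^2
c0 <- 2                        # the shift constant c
g <- function(x) f(x) + c0     # shifted upwards by 2
h <- function(x) f(x + c0)     # shifted to the left by 2

x <- seq(-5, 5, by = 0.5)
min(g(x))                      # smallest value of g on the grid
x[which.min(h(x))]             # grid point where h is smallest
```

The smallest value of g is 2 instead of 0 (shift upwards), and h attains its minimum at x = -2 instead of x = 0 (shift to the left).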

To illustrate these shifted functions, let us define a function f in R and then
transform it as described above to g, h (shifts by positive constants), g2 and
h2 (shifts by negative constants). We will plot all five functions in one figure
to compare them. Note that we change the argument ylim to make sure that
all values of the transformed functions can be shown (since some of them will
not be reached in the chosen interval for f).

f <- function(x) x^2
g <- function(x) f(x) + 2    # shift upwards by 2
h <- function(x) f(x + 2)    # shift to the left by 2
g2 <- function(x) f(x) - 2   # shift downwards by 2
h2 <- function(x) f(x - 2)   # shift to the right by 2
x <- seq(-3, 3, by = 0.1)
plot(x, f(x), type = "l", ylab = "y",
     ylim = c(-2, 25), main = "Transformed functions")
lines(x, g(x), col = 2)
lines(x, h(x), col = 3)
lines(x, g2(x), col = 4)
lines(x, h2(x), col = 5)
legend("top", c("f", "g", "h", "g2", "h2"), lty = 1, col = 1:5)

[Figure: "Transformed functions". Graphs of f, g, h, g2 and h2 for x between −3 and 3, with a legend at the top.]


Remark. Note that if you plot several functions in one graph, like in this
case, it is usually a good idea to provide a legend that allows the reader
to distinguish between the various functions. The command to do so is, not
surprisingly, called legend. In its first argument, one specifies the location of
the legend (besides keywords such as "top", "bottom", "topleft", "topright",
etc., it is also possible to specify it by its x and y coordinates). Next, one
provides the vector of texts to describe the various lines. lty controls what
type of lines is used in the legend – if it is only one number, all lines will
be of the same type; if it’s a vector, the line types will be assigned in the
same order as the texts of the legend. One can provide the color of the lines
by col, where the assignment of colors works the same way as with the line
type. To learn more about the possible arguments of legend, refer to R help.

Next, we consider how the graph of a function will be transformed upon
multiplication by a constant c ≠ 0 (again, by including values with absolute
value smaller than 1, this also covers the case of division). Again, the way the
graph will be transformed depends on whether the constant multiplies the image
of the function or its argument:

• Consider k(x) = cf(x). The graph of k is the graph of f stretched
  vertically if |c| > 1 and compressed vertically if |c| < 1. If c < 0, the
  graph is additionally flipped around the x-axis. The domain of k is the
  same as the domain of f; the range of k consists of the values in the
  range of f multiplied by c.
• Consider ℓ(x) = f(cx). The graph of ℓ is the graph of f compressed
  horizontally if |c| > 1 and stretched horizontally if |c| < 1. If c < 0,
  the graph is additionally flipped around the y-axis. The domain of ℓ
  consists of the values in the domain of f divided by c; the range of ℓ is
  the same as the range of f.
Remark. Note that c = 1 is not an interesting case since then the transformed
and the original function are the same. For c = −1, the function graph will
be neither compressed nor stretched; but it will be flipped around one of the
axes: f (−x) is the function f (x) mirrored about the y axis whereas −f (x)
is the function f (x) flipped around the x axis.
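
The statements about ranges and horizontal compression can be checked numerically. The following is a small sketch, assuming f(x) = sin(x) + 1 and c = 2 purely for illustration:

```r
f <- function(x) sin(x) + 1
x <- seq(-10, 10, by = 0.1)

# Vertical scaling: the range of 2*f is the range of f multiplied by 2
range(f(x))
range(2 * f(x))

# Horizontal scaling: f(2*x) takes at x the value that f takes at 2*x,
# so the graph of f is compressed towards the y-axis by the factor 2
all.equal(f(2 * 1.5), f(3))
```

On this grid, the second range is exactly twice the first, while the range of f(2*x) over all real x stays the same as the range of f.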

Once again, we resort to R to illustrate these graph transformations. We use
separate graphs to show the effects of multiplication with various constants
in a well visible manner. To better see the effects of the multiplication, we
also add the lines x = 0 and y = 0 (i.e. the axes) for these cases.

f <- function(x) sin(x) + 1
k <- function(x) 2*f(x)
l <- function(x) f(2*x)
k2 <- function(x) 1/2*f(x)
l2 <- function(x) f(1/2*x)
k3 <- function(x) -2*f(x)
l3 <- function(x) f(-2*x)
k4 <- function(x) -1/2*f(x)
l4 <- function(x) f(-1/2*x)
x <- seq(-10, 10, by = 0.1)
par(mfrow = c(2, 2))
# c = 2
plot(x, f(x), type = "l", ylab = "y",
     ylim = c(0, 5.5), main = "Transformed functions")
abline(v = 0, col = "grey")
abline(h = 0, col = "grey")
lines(x, k(x), col = 2)
lines(x, l(x), col = 3)
legend("top", c("f", "k", "l"),
       lty = 1, col = 1:3)
# c = 1/2
plot(x, f(x), type = "l", ylab = "y",
     ylim = c(0, 3), main = "Transformed functions")
abline(v = 0, col = "grey")
abline(h = 0, col = "grey")
lines(x, k2(x), col = 2)
lines(x, l2(x), col = 3)
legend("top", c("f", "k2", "l2"),
       lty = 1, col = 1:3)
# c = -2
plot(x, f(x), type = "l", ylab = "y",
     ylim = c(-4, 4.5), main = "Transformed functions")
abline(v = 0, col = "grey")
abline(h = 0, col = "grey")
lines(x, k3(x), col = 2)
lines(x, l3(x), col = 3)
legend("top", c("f", "k3", "l3"),
       lty = 1, col = 1:3)
# c = -1/2
plot(x, f(x), type = "l", ylab = "y",
     ylim = c(-1, 3.5), main = "Transformed functions")
abline(v = 0, col = "grey")
abline(h = 0, col = "grey")
lines(x, k4(x), col = 2)
lines(x, l4(x), col = 3)
legend("top", c("f", "k4", "l4"),
       lty = 1, col = 1:3)

[Figure: four panels, each titled "Transformed functions", showing f together with k and l (top left), k2 and l2 (top right), k3 and l3 (bottom left), and k4 and l4 (bottom right), for x between −10 and 10.]

Notice the line of code par(mfrow = c(2, 2)). You probably guessed that
it allows us to plot several graphs stacked next to and above each other, and
this guess was right. In particular, the first value in the vector specifies the
number of rows of graphs (graphs above each other) and the second one the
number of columns (graphs next to each other); the panels are filled row by
row. We will discuss this setting in more detail at a later point.
In the following example, we will use the above described transformations.

Problem 4.1. Suppose a person earning y euros per year pays

$$T(y) = \begin{cases} \max\left(0, \dfrac{y^2}{1000000} - 100\right) & \text{if } 0 \le y \le 100000 \\ 9900 + \dfrac{y - 100000}{4} & \text{if } y > 100000 \end{cases}$$

euros that year in income tax. To reduce taxes,

• A suggests allowing every individual to deduct 10000 euros before the
  tax is calculated.
• B suggests allowing every individual to deduct 5% before the tax is
  calculated.
• C suggests calculating the income tax on the full amount and then
  allowing each person a "tax credit" of 1000 euros.
• D suggests calculating the income tax on the full amount and then
  allowing each person a "tax credit" of 10%.

Illustrate the tax function graphically. Then visualize the suggestions for tax
reduction and comment on them.
Solution. Before we turn to R to visualize the tax function and the various
tax reductions, let us consider each reduction in turn and see what it means
in mathematical terms.

• A suggests first deducting 10000 euros before the tax is calculated.
  That means that the argument of the tax function is reduced by 10000,
  resulting in a new tax function $T_A(y) = T(y - 10000)$. As we know from
  the above discussion, this will shift the graph of the function by 10000 to
  the right.
• In B's suggestion, the earnings y are first reduced by 5% before the tax
  function is applied to them. This leads to $T_B(y) = T(0.95y)$. From
  the earlier discussion we know that this will result in a horizontal stretch
  of the tax function.
• According to C, one would subtract 1000 from the calculated tax in
  the form of a "tax credit". This would result in $T_C(y) = T(y) - 1000$
  and correspond to a shift of 1000 downwards. In this case, to avoid
  negative taxes, we should also consider the maximum between this
  value and 0, effectively making the transformed tax function
  $T_C(y) = \max(0, T(y) - 1000)$.
• Finally, D suggests subtracting 10% from the computed tax, which
  corresponds to $T_D(y) = 0.9\,T(y)$. Such multiplication results in a
  vertical compression of the function graph.

Now that we know how the tax function changes under the various suggestions,
we can proceed to plot the graphs of the functions. However, before we do
that, let us have a look at how we can implement the two cases (0 ≤ y ≤
100000 and y > 100000) of the function. Moreover, we will also consider
the way R computes the minimum or maximum when working with vectors.
This will allow us to implement the function T correctly.

4.2 if...else and ifelse

It is often the case in programming that one has to consider several cases of
what should be done based on some underlying conditions. The most basic
way to do this in R is if. The syntax of if is quite simple: one provides
the condition as the argument of if, followed by what should be done if this
condition is satisfied. If there is more than one command specifying the tasks
under the condition, it is necessary to enclose them all in curly brackets (if
there is only one line, the brackets may be used but are not necessary). Let
us illustrate this:

a <- 4; b <- 8
if(TRUE) print(a)

## [1] 4

if(TRUE) {
print(a)
print(b)
}

## [1] 4
## [1] 8

if(FALSE) print(a)
if(a > 3) print(a)

## [1] 4

if(a > 5) print(a)

In the above examples, we specified what should be done if the condition is
satisfied (i.e. its logical value is TRUE). Therefore whenever the condition of
if was FALSE, nothing was done.

Sometimes it is required to do one thing if a certain condition is satisfied
and another thing otherwise. In that case, we make use of if...else: we
simply program the if part as above, followed by the command else and
then the specification of what should be done if the condition of if is not
satisfied. Here, however, one should be more careful about brackets and line
breaks. If both if and else only contain one command, it is possible to write
the code without the curly brackets, but then it all has to be written within
one line of code for R to understand this properly. Therefore we recommend
always using curly brackets around the if and else tasks.

if(a > 3) print(a) else print(b)

## [1] 4

# recommended syntax
if(b < 5) {
print(a)
} else {
print(b)
}

## [1] 8
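
To see why the line break matters, one can ask R whether a piece of code is syntactically valid without running it. The following is a sketch using the base functions parse and tryCatch; the helper parses and the two code strings are only illustrative:

```r
# parse() reads the text the same way R reads top-level code in a script.
# In "bad", the if statement is already complete at the end of line 1,
# so the "else" on the next line has nothing to attach to.
bad  <- "if (TRUE) print(1)\nelse print(2)"
good <- "if (TRUE) {\n  print(1)\n} else {\n  print(2)\n}"

parses <- function(code) {
  # TRUE if the code is syntactically valid, FALSE otherwise
  tryCatch({ parse(text = code); TRUE }, error = function(e) FALSE)
}
parses(bad)
parses(good)
```

The first call returns FALSE and the second TRUE: placing else on the same line as the closing curly bracket of the if part keeps the statement in one piece.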

Now let us consider what happens if we would like to check some condition
for both a and b (the same condition for both), and in each case, the tasks for
TRUE and FALSE are the same. Clearly, we can do that separately, similarly to
the above by using one if...else construction for a and a separate one for
b. For instance, we would like to check for both of the values whether they
are greater than 5. If so, "Yeah" should be printed on the screen, otherwise
we should see "No". In the end, we should see two words; in particular,
knowing the values of a and b, we should see "No" "Yeah". However, this is
impractical if we have more than just two values to check. So let's try doing
it at the same time:

if(c(a, b) > 5) {
"Yeah"
} else {
"No"
}

## Error in if (c(a, b) > 5) {: the condition has length > 1

We obtain an error that at a closer look explains that this will not work since
if only expects one logical value – but as we know from Chapter 2, c(a, b)
> 5 results in a vector of two logical values.
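
If what is wanted is a single decision about the whole vector (a different task from the elementwise check discussed next), the vector of logicals can first be collapsed into one value. A minimal sketch with the base functions any and all, using the values of a and b from above:

```r
a <- 4; b <- 8
# any() is TRUE if at least one comparison is TRUE
if (any(c(a, b) > 5)) print("at least one value exceeds 5")
# all() is TRUE only if every comparison is TRUE
if (all(c(a, b) > 5)) print("both values exceed 5") else print("not all values exceed 5")
```

Here any(c(a, b) > 5) is TRUE and all(c(a, b) > 5) is FALSE, so each if receives exactly one logical value.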

For situations in which the same condition should be checked for several
values, and then the same task performed depending on the results of this
check, there is the function ifelse. The syntax of this function is the
following: ifelse(condition, task1, task2) where condition represents
the vector of logicals, provided usually in the form of a condition check, task1
specifies what should be done whenever condition has the value TRUE, and
task2 corresponds to what should be done whenever condition is FALSE.
So what we tried (and failed) to do in the above chunk of code can actually
be implemented as follows:

ifelse(c(a, b) > 5, "Yeah", "No")

## [1] "No" "Yeah"

As another example, we check the divisibility of numbers between 1 and 10
by 2:

ifelse((1:10)%%2 == 0, "divisible by 2", "not divisible by 2")

##  [1] "not divisible by 2" "divisible by 2"     "not divisible by 2"
##  [4] "divisible by 2"     "not divisible by 2" "divisible by 2"
##  [7] "not divisible by 2" "divisible by 2"     "not divisible by 2"
## [10] "divisible by 2"

Here, we used the modulo operator %%: It gives the remainder after dividing
by a certain number:

(1:10)%%2

## [1] 1 0 1 0 1 0 1 0 1 0
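
A few more calls may help to get a feeling for the operator. This is a small sketch with arbitrarily chosen numbers; note in particular the behaviour for a negative first operand:

```r
17 %% 5      # 17 = 3*5 + 2
(-3) %% 2    # in R the result takes the sign of the divisor
5.5 %% 2     # also works for non-integer values
```

The results are 2, 1 and 1.5. Since (-3) %% 2 is 1 and not -1, the check x %% 2 == 0 identifies even numbers correctly for negative integers as well.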

Due to the fact that ifelse can work with a vector of conditions, in contrast
to if...else, we also say that ifelse is the vectorized counterpart of
if...else. It is important to remember at this point that we already talked
about vectorized functions in Chapter 3. In particular, if we are interested
in finding the values of a function for several values of the argument, it is
important to have a vectorized function or, if it is not vectorized, to be aware
of this and to work around it. Therefore when we will be defining the function
T , we will make use of ifelse.

4.3 Minimum and maximum in R

We have already mentioned the functions min and max that find the minimum
or maximum, respectively, of a provided set of numbers (for instance a
vector). However, in some situations, we are not interested in the minimum
(or maximum) of a single set. Instead, one might want to find the minimum
between 0 and several other values, for each value separately (e.g. if the values
were -3, 5, -4, 2, we would like to get -3, 0, -4, 0, because for -3 and -4, the
minimum with 0 are these values, whereas both 5 and 2 are greater than 0).
Alternatively, we might have two (or more) vectors, for which we don’t want
to find their maxima, instead we want to find the elementwise maximum,
i.e. a vector in which the first entry is the maximum of the first entries, the
second is the maximum of the second entries etc. For instance, imagine that
in some course, you are given two tests, but only the better of the two results
counts towards your grade. For the teacher, obtaining the points that count
towards the grade for each student means finding the elementwise maximum
of two vectors, where the first contains the results of the first test and
the second the results of the second test.

In situations as described above, min and max will not work as we would
expect:

data <- 1:10


max(5, data)

## [1] 10

max(15, data)

## [1] 15

data2 <- 10:1


max(data, data2)

## [1] 10

Note that in all of these cases, max outputs a single value, in particular the
largest value of all considered values.

The vectorized versions of min and max that will actually perform the tasks
described above are called pmin and pmax.

pmax(5, data)

## [1] 5 5 5 5 5 6 7 8 9 10

pmax(data, data2)

## [1] 10 9 8 7 6 6 7 8 9 10

pmin(data, data2)

## [1] 1 2 3 4 5 5 4 3 2 1
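
The two motivating situations from the beginning of this section can be reproduced in the same way. In the following sketch, the test results are hypothetical numbers invented for illustration:

```r
# Hypothetical points of four students in two tests
test1 <- c(12, 18, 9, 15)
test2 <- c(14, 11, 9, 20)
pmax(test1, test2)           # the better result of each student

# Elementwise minimum with 0 for the values mentioned in the text
pmin(0, c(-3, 5, -4, 2))
```

The first call gives 14 18 9 20, and the second one gives -3 0 -4 0, as described above.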

Now we have all the ingredients we need to define the function T and its
transformations:

T1 <- function(y) { y^2 / 1e6 - 100 }


T2 <- function(y) { 9900 + (y - 1e5)/4 }
Tax <- function(y) {
ifelse(y < 1e5, pmax(0, T1(y)), T2(y))
}
y <- c(9000, 50000, 80000, 120000, 160000)
Tax(y)

## [1] 0 2400 6300 14900 24900

When defining the function Tax, note the use of ifelse and pmax. If instead
of ifelse we used if...else, the code would work well if we provided only
a single value of y. However, if we would like to compute the value of the
Tax function for several y’s at the same time, like above, we would obtain an
error because in fact, y < 1e5 would be a logical vector.

While with if...else we would quite easily find out about our mistake due
to the error message, things could get more dangerous if we used max instead
of pmax. The code would still run, however not the way intended. It is left
to you as an exercise to understand what would happen in that case.

Finally, let us plot the function and the suggested transformations.

y <- seq(0, 2e5, by = 100)


par(lwd = 1.5, lty=2)
plot(y, Tax(y), type = 'l')
lines(y, Tax(y - 1e4), col = 2)
lines(y, Tax(.95*y), col = 3)
lines(y, pmax(Tax(y) - 1e3, 0), col = 4)
lines(y, .9*Tax(y), col = 5)
legend("topleft", c("T - Regular tax",
"T_A - Deduct 10k before taxes",
"T_B - Deduct 5% before taxes",
"T_C - 1k tax credit",
"T_D - 10% tax credit"),
col = 1:5, lty = 2)

[Figure: the regular tax T and the four proposals T_A, T_B, T_C and T_D, plotted as Tax(y) against y for y between 0 and 200000, with a legend at the top left.]
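
To comment on the suggestions, it can also help to compare the taxes for a few concrete incomes. The following sketch repeats the definitions from above so that it runs on its own; the three incomes and the column names are chosen only for illustration:

```r
T1 <- function(y) { y^2 / 1e6 - 100 }
T2 <- function(y) { 9900 + (y - 1e5)/4 }
Tax <- function(y) {
  ifelse(y < 1e5, pmax(0, T1(y)), T2(y))
}

y <- c(20000, 50000, 150000)   # illustrative incomes
comparison <- data.frame(
  income    = y,
  T_regular = Tax(y),
  T_A       = Tax(y - 1e4),
  T_B       = Tax(0.95 * y),
  T_C       = pmax(Tax(y) - 1e3, 0),
  T_D       = 0.9 * Tax(y)
)
comparison
```

For an income of 50000 euros, for instance, the regular tax of 2400 euros drops to 1500 euros under A and to 1400 euros under C, whereas B and D lead to roughly 2156 and 2160 euros, respectively.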

4.4 New functions from old

Sums, differences, products and quotients of functions can easily be defined.
Let f : A → B and g : C → D be two functions.

• The sum of f and g, f + g, is given by (f + g)(x) = f(x) + g(x). The
  domain of f + g is A ∩ C (i.e. the domain of f + g consists of all elements
  that are included in both A and C).
• The difference of f and g, f − g, is given by (f − g)(x) = f(x) − g(x).
  The domain of f − g is A ∩ C.
• The product of f and g, fg, is given by (fg)(x) = f(x)g(x). The
  domain of fg is A ∩ C.
• The quotient of f and g, f/g, is given by (f/g)(x) = f(x)/g(x).
  The domain of f/g is (A ∩ C) \ {x ∈ C : g(x) = 0}. Note that due to
  the division by g, all x's with g(x) = 0 have to be excluded when
  considering the domain of the function f/g.

Note that we mentioned the domains of the new functions, but not their
ranges (or even codomains). The reason is that there is no general way
of saying what the range will be; it always depends on the particular two
functions. We can however use the codomains of the functions to find a
possible codomain. For instance, a possible codomain of f + g is the set
B + D, which is the set of all values that can possibly be the result of adding
x ∈ B and y ∈ D. However, in the case of the quotient, if D contains 0, it
clearly has to be excluded when defining the codomain in a similar way.

The above are the basic simple ways of combining two functions. In fact,
they are all special cases of a more general concept of a composition.

Definition 4.1. If g : A → B and f : B → C, then the composition f ◦ g is
defined by

$$f \circ g : A \to C, \quad x \mapsto f(g(x)).$$

The function g is often called the kernel, interior function or inner function,
whereas f is called the exterior function or outer function.

Remark. Note that in general:

• $f \circ g \neq f \cdot g$ and thus also $f^2 \neq f \circ f$.
• $f \circ g \neq g \circ f$.

Problem 4.2. Let $f(x) = 3x - x^3$ and $g(x) = x^3$. Compute and visualize:

• (f + g)(x),
• (f − g)(x),
• (fg)(x),
• (f/g)(x),
• (f ◦ g)(x),
• (g ◦ f)(x).

Evaluate (f ◦ g)(1) and (g ◦ f)(1).

Solution. We have the following:

• $(f + g)(x) = f(x) + g(x) = 3x - x^3 + x^3 = 3x$.
• $(f - g)(x) = f(x) - g(x) = 3x - x^3 - x^3 = 3x - 2x^3$.
• $(fg)(x) = f(x)g(x) = (3x - x^3)x^3 = 3x^4 - x^6$.
• $(f/g)(x) = f(x)/g(x) = (3x - x^3)/x^3 = 3/x^2 - 1$ for $x \neq 0$.
• $(f \circ g)(x) = f(g(x)) = 3g(x) - g(x)^3 = 3x^3 - (x^3)^3 = 3x^3 - x^9$.
  $(f \circ g)(1) = 2$.
• $(g \circ f)(x) = g(f(x)) = (f(x))^3 = (3x - x^3)^3 = 27x^3 - 27x^5 + 9x^7 - x^9$.
  $(g \circ f)(1) = 8$.
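
Expanded expressions like these are easy to get wrong, so it can be reassuring to compare them numerically with the direct compositions. A short sketch, with the grid of test points chosen arbitrarily:

```r
f <- function(x) {3*x - x^3}
g <- function(x) {x^3}

x <- seq(-2, 2, by = 0.25)
# Compare the compositions with the expanded polynomials
all.equal(f(g(x)), 3*x^3 - x^9)
all.equal(g(f(x)), 27*x^3 - 27*x^5 + 9*x^7 - x^9)
# Composition is not commutative
f(g(1))
g(f(1))
```

Both comparisons return TRUE, and the last two calls give 2 and 8, in line with the values computed by hand.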

To visualize the functions, we will again resort to R.

f <- function(x) {3*x - x^3}


g <- function(x) {x^3}

x <- seq(-5, 5, by = .01)


plot(x, f(x) + g(x), type = 'l', ylab = '')
lines(x, f(x) - g(x), col = "red")
lines(x, f(x) * g(x), col = "blue")
lines(x, f(x) / g(x), col = "green")

[Figure: graphs of f + g (black), f − g (red), fg (blue) and f/g (green) for x between −5 and 5.]

plot(x, f(g(x)), type = 'l', ylim = c(-10,10))


lines(x, g(f(x)), col = "red")

[Figure: graphs of f(g(x)) (black) and g(f(x)) (red) for x between −5 and 5, with the y-axis restricted to the interval from −10 to 10.]

4.5 Injections, surjections, bijections and inverse functions

Definition 4.2. Let us consider a function f : A → B. f is called

• injective if $f(x_1) \neq f(x_2)$ for any $x_1, x_2 \in A$ with $x_1 \neq x_2$.
• surjective if f(A) = B, that is, if the range of f is equal to its codomain.
• bijective if it is injective and surjective.

In other words, injectivity means that any value y ∈ B can be achieved for
not more than one x ∈ A. Graphically this means that any horizontal line
has at most one intersection with the graph of the function. If a function is
surjective, then for any value y ∈ B we can find some (at least one) x ∈ A
which results in f(x) = y. This corresponds to the above definition that the
codomain is equal to the range. Finally, a bijection is also called a one-to-one
and onto function or a one-to-one correspondence, since each x ∈ A
corresponds to exactly one y ∈ B and the other way round. In short,
injectivity means that there is at most one x ∈ A for any y ∈ B, surjectivity
means that there is at least one x ∈ A for any y ∈ B, and bijectivity, since
both injectivity and surjectivity should be satisfied at the same time,
corresponds to having exactly one x ∈ A for any y ∈ B. The definitions are
illustrated in Figure 4.1.

Example 4.1. Consider f : A → B, $f(x) = x^2$.

• If $A = \mathbb{R}$, $B = \mathbb{R}$, then f is neither injective, nor surjective. It is not
  injective because for instance f(−2) = f(2); it is not surjective because
  there is no x such that for instance f(x) = −2.
• If $A = \mathbb{R}$, $B = \mathbb{R}_0^+$, then f is surjective, but not injective.
• If $A = \mathbb{R}_0^+$, $B = \mathbb{R}$, then f is injective, but not surjective.
• If $A = B = \mathbb{R}_0^+$, then f is bijective.
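
The role of the domain in this example can be explored with a simple grid check. This is only a sketch: duplicated function values on a grid disprove injectivity, whereas their absence on a finite grid does not prove it. The grids below are chosen for illustration:

```r
f <- function(x) x^2

# On a grid that is symmetric around 0, several arguments share a value,
# so f cannot be injective when negative and positive arguments are allowed
x_all <- -3:3
anyDuplicated(f(x_all)) > 0      # TRUE, e.g. f(-2) = f(2)

# On a grid of non-negative arguments, all values are distinct
x_nonneg <- 0:3
anyDuplicated(f(x_nonneg)) > 0   # FALSE
```

The function anyDuplicated returns the index of the first duplicated entry, or 0 if there is none, so comparing it with 0 yields a logical answer.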

Recall that in the case of a bijection f, we have a one-to-one relationship
between A and B. That means that we can not only uniquely assign a value
from B to all values in A, but also the other way round. We can therefore
consider a function that reverses the assignment given by f and for each
value y ∈ B, look for the unique value x ∈ A with f(x) = y. This leads
to the notion of an inverse function.

[Figure 4.1: Injectivity, surjectivity and bijectivity. Four arrow diagrams between sets A and B: injective, not surjective; surjective, not injective; injective and surjective = bijective; neither injective nor surjective.]

Definition 4.3. Let f : A → B be a bijective function. A function g : B → A
is called the inverse of f if

$$g \circ f = \mathrm{Id}_A, \quad \text{that is, } g(f(x)) = x \text{ for } x \in A.$$

We denote the inverse of f also by $f^{-1}$.

Remark. Note that:

• If $g \circ f = \mathrm{Id}_A$, then $f \circ g = \mathrm{Id}_B$, that is, f(g(y)) = y for y ∈ B. That
  means that if $g = f^{-1}$, then also $f = g^{-1}$.
• Don't confuse the inverse $f^{-1}$ with the reciprocal 1/f!

Finding the inverse thus means finding the function that, when applied to
a value y from the range (and thus codomain; note that f is bijective) of
f, gives us the argument x from the domain that would lead to f(x) = y.
Looking for the inverse therefore basically means solving the equation y =
f(x) for x.

Problem 4.3. Find the inverse of $f : [-2, 2] \to [-9, 7];\ x \mapsto x^3 - 1$. Plot f
and $f^{-1}$.

Solution. As mentioned above, we will solve the equation y = f(x) for x:

$$\begin{aligned} y &= x^3 - 1 \\ y + 1 &= x^3 \\ \sqrt[3]{y + 1} &= x \quad\Rightarrow\quad f^{-1}(y) = \sqrt[3]{y + 1} \end{aligned}$$

Let us now plot the functions in R. First, let us recall that $\sqrt[3]{x}$ is the same
as $x^{1/3}$. Since R does not have specific commands for roots higher than 2, we
will need to use this form of the third root. However, before we can plot
(or even compute) the inverse function, we will need to solve a problem that
arises for negative values of y + 1:

(-8)^(1/3)

## [1] NaN

R returns NaN here because of how the operator ^ works on floating-point
numbers: the exponent 1/3 cannot be stored exactly, and a negative base
raised to a non-integer power has no real value in general, so the result is
reported as not a number. This means that we need to define our own
function that will allow us to take the cube root (or any odd root, for that
matter) of a negative number. This is actually fairly simple and can be
achieved in just one line of code. We will make use of a new function,
sign(x), that results in -1 for a negative x, in 0 for x equal to 0 and in 1
for a positive x.

nthroot <- function(x, n) {


ifelse(n%%2 == 1 | x >= 0, sign(x)*abs(x)^(1/n), NaN)
}
nthroot(-4, 2)

## [1] NaN

nthroot(-8, 3)

## [1] -2

Note that whenever the second argument, n, is even AND at the same time
x is negative, we still return NaN since even roots of negative numbers are
not well defined. For odd roots, we take the odd root of the absolute value
of the number, and adjust the sign based on the original sign of x.

Now that we know how to take the third root also for negative numbers, we
can turn to plotting f and its inverse.

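x <- seq(-2, 2, 0.01)
y <- seq(-9, 7, 0.01)

f <- function(x) {x^3 - 1}
f_inv <- function(y) {nthroot(y + 1, 3)}

# both axes cover [-9, 7] so that f and its inverse are shown in full
plot(x, f(x), type='l', xlim=c(-9,7), ylim=c(-9,7))
lines(y, f_inv(y), col=2)
abline(0, 1, col=3)   # the line y = x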

[Figure: graphs of f (black), its inverse (red) and the line y = x (green).]

Note that while we have f : [−2, 2] → [−9, 7], for the inverse it is
$f^{-1}$ : [−9, 7] → [−2, 2]. We took this into consideration when plotting the
first function to make sure that both functions are shown in full by setting
xlim properly.

You may notice that the graph of the inverse function is actually the graph of
the original function mirrored about the line y = x, which we purposefully
added to the figure. This is not a coincidence; in fact, it is a general property
of the inverse function and follows from the very definition of the inverse
function. This means that if we know what the graph of f looks like, it is easy
to draw the graph of $f^{-1}$ simply by reflecting it about this line.
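
Both the defining property of the inverse and the mirroring of the graph can be verified numerically. The following sketch repeats the definitions from above so that it runs on its own; the grids of test points are arbitrary:

```r
nthroot <- function(x, n) {
  ifelse(n%%2 == 1 | x >= 0, sign(x)*abs(x)^(1/n), NaN)
}
f     <- function(x) {x^3 - 1}
f_inv <- function(y) {nthroot(y + 1, 3)}

x <- seq(-2, 2, by = 0.5)
all.equal(f_inv(f(x)), x)     # g(f(x)) = x for x in the domain
y <- seq(-9, 7, by = 2)
all.equal(f(f_inv(y)), y)     # f(g(y)) = y for y in the codomain
```

Both comparisons return TRUE. The first one also explains the mirroring: if (x, f(x)) lies on the graph of f, then (f(x), x) lies on the graph of the inverse, which is the same point with its coordinates swapped.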

4.6 Common properties of functions

We will close this chapter with a discussion of some common properties of
functions. We start with some notions of symmetry.

Definition 4.4. Let f : A → B be a function. f is called

• symmetric about a if for all x it holds that

  f(a + x) = f(a − x).

  If we say that a function is symmetric, we mean that there is some a
  about which it is symmetric.

• even if it is symmetric about 0, i.e. for all x it holds that

  f(x) = f(−x).

• odd if for all x it holds that

  f(−x) = −f(x).

Remark. Note that if 0 is in the domain of f, f can only be odd if f(0) = 0,
otherwise f(−x) = −f(x) cannot be satisfied for x = 0. However, this is only
a necessary condition (recall Chapter 1); not every function with f(0) = 0 is
odd.
Also note that by definition, any even function is also a symmetric function.

Example 4.2. Typical examples of even functions are x^2 or |x|, but also cos x and many more. Shifting these functions along the y-axis preserves the property: x^2 − 2 is still an even function. After a shift along the x-axis they are no longer even, but they remain symmetric: for instance, (x − 2)^2 is symmetric about 2.
Typical examples of odd functions are x, x^3 or any odd power of x. Also sin x is an odd function, and of course there are many more. Note that unlike with even functions, shifting an odd function in either direction generally destroys this property.
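The claims of this example can be checked numerically with a small R sketch. The helper functions is_even and is_odd are hypothetical names introduced here; they only test the defining equations on a finite grid, so a TRUE is evidence rather than a proof.

```r
# Hypothetical helpers: test evenness/oddness on a grid symmetric about 0.
is_even <- function(f, x, tol = 1e-12) all(abs(f(x) - f(-x)) < tol)
is_odd  <- function(f, x, tol = 1e-12) all(abs(f(-x) + f(x)) < tol)

x <- seq(-3, 3, by = 0.25)

print(is_even(function(x) x^2, x))        # TRUE
print(is_even(function(x) x^2 - 2, x))    # TRUE: shift along the y-axis
print(is_even(function(x) (x - 2)^2, x))  # FALSE: shift along the x-axis
print(is_odd(function(x) x^3, x))         # TRUE
print(is_odd(function(x) x^3 + 1, x))     # FALSE: shift along the y-axis
```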

Next, we turn our attention to monotonicity properties.

Definition 4.5. Let f : A → B be a function. f is called

• increasing on an interval if for all a, b in the interval it holds that a < b ⇒ f(a) ≤ f(b).

• strictly increasing on an interval if for all a, b in the interval it holds that a < b ⇒ f(a) < f(b).

• decreasing on an interval if for all a, b in the interval it holds that a < b ⇒ f(a) ≥ f(b).

• strictly decreasing on an interval if for all a, b in the interval it holds that a < b ⇒ f(a) > f(b).

• (strictly) monotonic on an interval if it is (strictly) increasing on that interval or (strictly) decreasing on that interval.

Moreover, a local extremum of f is a point where f changes its monotonicity.

Remark. If we say that a function is monotonic (increasing, decreasing, ...) without naming an interval, we mean it is monotonic (increasing, decreasing, ...) on its domain.
There are two types of local extrema: a local minimum is a point where the function changes its monotonicity from decreasing to increasing; at a local maximum, the change is from increasing to decreasing.

Remark. We have already mentioned the monotonicity properties several times before properly introducing them, relying on their intuitive names and the fact that you should already know them from school. For instance, we mentioned that a linear function f(x) = ax + b is increasing whenever a > 0 (strictly increasing, in fact) and decreasing for a < 0 (strictly decreasing). The exponential function a^x is strictly increasing for a > 1 whereas it is strictly decreasing for 0 < a < 1, and similarly for the logarithmic function.
Note that a constant function, y = a for a ∈ R, is both increasing and decreasing, but at the same time it is neither strictly increasing nor strictly decreasing.
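The definitions can be tried out on a grid of points with the following R sketch. The helper names are hypothetical and introduced only for this illustration; since only finitely many points are compared, the result is a plausibility check, not a proof.

```r
# Hypothetical helpers: check monotonicity of f on an increasing grid x.
is_increasing          <- function(f, x) all(diff(f(x)) >= 0)
is_strictly_increasing <- function(f, x) all(diff(f(x)) > 0)
is_decreasing          <- function(f, x) all(diff(f(x)) <= 0)

x <- seq(-2, 2, by = 0.1)

print(is_strictly_increasing(function(x) 2^x, x))   # base a = 2 > 1
print(is_decreasing(function(x) 0.5^x, x))          # base 0 < a = 0.5 < 1

# A constant function is increasing and decreasing, but not strictly so.
const <- function(x) rep(3, length(x))
print(c(is_increasing(const, x), is_decreasing(const, x),
        is_strictly_increasing(const, x)))
```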

The last type of properties we will consider here are connected to the shape
of the function.

Definition 4.6. Let f : A → B be a function. f is called

• convex on an interval if for all a, b in the interval and all λ ∈ [0, 1] it holds that f(λa + (1 − λ)b) ≤ λf(a) + (1 − λ)f(b).

• strictly convex on an interval if for all a ≠ b in the interval and all λ ∈ (0, 1) it holds that f(λa + (1 − λ)b) < λf(a) + (1 − λ)f(b).

• concave on an interval if for all a, b in the interval and all λ ∈ [0, 1] it holds that f(λa + (1 − λ)b) ≥ λf(a) + (1 − λ)f(b).

• strictly concave on an interval if for all a ≠ b in the interval and all λ ∈ (0, 1) it holds that f(λa + (1 − λ)b) > λf(a) + (1 − λ)f(b).

Moreover, an inflection point of f is a point where f changes its convexity (from convex to concave or the other way round).

Remark. Graphically, convexity means that the function "opens upwards", whereas a concave function "opens downwards". The names themselves serve as a mnemonic: a conVex function might remind one of the upwards opening V, a concAve function of the downwards opening A.
Note that for the strict definitions, the points 0 and 1 have to be excluded for λ since for these values, there will always be equality in the condition (check this as an exercise).
If we say that a function is convex or concave without naming an interval, we mean that it is convex or concave on its domain.
What we said about the monotonicity of a constant function holds analogously for convexity, and not only for constant functions but for any linear function: any linear function is both convex and concave, but neither strictly convex nor strictly concave.
Finally, if f is a convex function, then −f is a concave function and the other way round. This relationship between convex and concave functions is often used in optimization.

At this point, we would like to give a geometric interpretation of convexity and concavity. Let us consider a function f : A → B as in the definition above and two points a, b from its domain. For some λ ∈ [0, 1], the point λa + (1 − λ)b is called a convex combination of a and b. Geometrically, the set of all convex combinations of two points corresponds to the whole line segment between a and b. Now, the definition of convexity on the interval [a, b] actually means the following: for any λ ∈ [0, 1], the value of the function at the corresponding convex combination is less than or equal to the convex combination of the individual values. In other words, the line segment between the points (a, f(a)) and (b, f(b)) on the graph always stays above the graph of the function itself (it might touch it, but never crosses below). Similarly, for a concave function the line segment that arises by connecting two points on the graph of the function will always stay below the graph of the function (or possibly touch it).
Figure 4.2: Line segment in a convex function, f(x) = x^2, λ = 0.25
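The geometric interpretation can be checked numerically with a minimal R sketch. The helper convexity_gap and the end points a = −1, b = 3 are assumptions chosen for illustration; the gap is the vertical distance between the line segment and the graph at the convex combination.

```r
f <- function(x) x^2

# Hypothetical helper: value on the line segment minus value of the function
# at the convex combination lambda * a + (1 - lambda) * b.
convexity_gap <- function(f, a, b, lambda) {
  lambda * f(a) + (1 - lambda) * f(b) - f(lambda * a + (1 - lambda) * b)
}

lambda <- seq(0, 1, by = 0.05)
gap <- convexity_gap(f, a = -1, b = 3, lambda = lambda)

# For a convex function the gap is never negative (small tolerance for rounding).
print(all(gap >= -1e-12))

# At lambda = 0 and lambda = 1 the segment meets the graph, so the gap is (close to) 0.
print(gap[c(1, length(gap))])
```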

Example 4.3. We will prove that f(x) = x^2 is a convex function.

To this end, consider some x1, x2 ∈ R and λ ∈ [0, 1]. We have

\[ f(\lambda x_1 + (1-\lambda)x_2) = (\lambda x_1 + (1-\lambda)x_2)^2 = \lambda^2 x_1^2 + 2\lambda(1-\lambda)x_1 x_2 + (1-\lambda)^2 x_2^2. \]

This we need to compare with λf(x1) + (1 − λ)f(x2). In particular, f is convex if f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2) or, equivalently, f(λx1 + (1 − λ)x2) − λf(x1) − (1 − λ)f(x2) ≤ 0. Let us check this:

\begin{align*}
f(\lambda x_1 + (1-\lambda)x_2) &- \lambda f(x_1) - (1-\lambda) f(x_2) \\
&= \lambda^2 x_1^2 + 2\lambda(1-\lambda)x_1 x_2 + (1-\lambda)^2 x_2^2 - \lambda x_1^2 - (1-\lambda)x_2^2 \\
&= (\lambda^2 - \lambda)x_1^2 - 2(\lambda^2 - \lambda)x_1 x_2 + (\lambda^2 - \lambda)x_2^2 \\
&= (\lambda^2 - \lambda)(x_1 - x_2)^2 \le 0
\end{align*}

where the inequality in the last step holds because λ ∈ [0, 1] implies λ^2 ≤ λ, and (x1 − x2)^2 ≥ 0. Hence, f(x) = x^2 is a convex function.

4.7 Exercises

4.1 For each of the following functions, find its range. Then decide whether it is injective, surjective or bijective as a function from R to i) R and ii) its range.

a)f1 (x) = 2x + 5

b)f2 (x) = x^2 − 4

c)f3 (x) = 2x

d)f4 (x) = |x + 1|

4.2 For the functions in the pictures below, find the largest interval A for
which they are bijective (if considered as functions mapping from A to f (A)).
[Figure: three panels showing the graphs of f(x), g(x) and h(x).]

4.3 For the following functions, decide whether they are even, odd or neither:

a) f1(x) = 4x
b) f2(x) = cos(x)
c) f3(x) = 1/x
d) f4(x) = 4x/(x^2 − x)
e) f5(x) = |x|
f) f6(x) = log x
g) f7(x) = 4x/(x^2 + 4)
h) f8(x) = 2^x
i) f9(x) = x^2/(|x| + 3)

4.4 For the functions below, choose all of the properties they satisfy in the shown region. Note that in each item, several properties might be satisfied by one function.

a)convex/strictly convex/concave/strictly concave/none,

b)symmetric/even/odd/none,

c)monotone increasing/monotone decreasing/neither,

d)injective/surjective/bijective/none.
[Figure: ten panels showing the graphs of f(x), g(x), h(x), i(x), j(x), k(x), l(x), m(x), n(x) and o(x).]

4.5 Consider the function f given in the graph below. Sketch the graphs of
the following functions:
a) g1(x) = f(x) + 2
b) g2(x) = f(x + 2)
c) g3(x) = f(x − 1) − 3
d) g4(x) = 2f(x)
e) g5(x) = (1/2) f(x)
f) g6(x) = −f(x)
g) g7(x) = −f(2x)

[Figure: graph of f(x) for x from −6 to 6.]


4.6 For the following functions, decide whether the function has an inverse
(if codomain = range). If the domain is not specified, consider the largest
possible domain; if it is not R, specify it. If there is an inverse, find it.
a)h1 (x) = 3x + 2
b)h2 (x) = − x2 + 4
c)h3 (x) = |x + 2|
d)h4 (x) = (1/2) x^2 for x ∈ [0, ∞)
e)h5 (x) = x^2 − 6x + 5
f)h6 (x) = x^2 − 6x + 5 for x ∈ (−∞, 3]
g)h7 (x) = 2^(x−1) − 4
h)h8 (x) = log_2 x + 1

i)h9 (x) = 4 ln( x + 4 − 2)

4.7 a)Consider the following functions:

f : R+ → R+, f(x) = x^3,
g : [1, ∞) → R+, g(x) = (x − 1)^2,
h : R+ → R+, h(x) = x/3.

Find the functions M(x) = (f/g)(x) and D(x) = (h ◦ f)(x), their domains and codomains. Decide whether M and D are surjective and injective. If an inverse exists, find it.
b)Consider the following functions:

f : R0+ → R0+ , f (x) = x,
g : R0+ → R0+ , g(x) = (x − 1)2 ,
h : R0+ → R0+ , h(x) = exp(x) − 1.

Find the functions M (x) = (g/f )(x) and D(x) = (h ◦ f )(x), their
domains and codomains. Decide, whether M and D are surjective and
injective. If an inverse exists, find it.
c)Consider the following functions:

f : (0, e^1.5 − 1) → (0, 9), f(x) = (2 ln(x + 1) − 3)^2,
g : (0, 3) → (0, e^1.5 − 1), g(x) = e^x − 1,
h : (0, 3) → (0, 9), h(x) = 9 − x^2.

Find the function M (x) = (9f ◦ g)(h), its domain and the smallest
possible codomain. Decide, whether M is surjective and injective. If an
inverse exists, find it.

4.8 For the following functions, decide whether they are increasing, decreasing
or neither. (Note that derivatives have not been discussed at this point so
even if you are tempted to use them, the idea is not to do so. Instead, use
the definition of monotonicity or your knowledge about basic functions and
how they may change monotonicity when composed with other functions.)
a)f : (−5, 0) → R, f (x) = (|x| − 5)^4 ;
b)g : (−10, 0) → R, g(x) = (ln(−x))^2 ;
c)h : (0, 2) → R, h(x) = 1/exp((x − 2)(x + 3)) ;
d)m : (−2, 0) → R, m(x) = 1 − |x|^0.5 ;
2
e)n : (0.5, 5) → R, n(x) = ln x1 .

4.9 In the following, you are given four R functions. For each function, find
out what they do without using R. (Of course you may verify your answer in
R once you find it, but try first to read the function and understand what it
does without using R. This will help you develop your algorithmic thinking
which is absolutely necessary for programming.)

a)
MyFunction1 <- function(a, b) {
  smaller <- min(a, b)
  check_values <- 1:smaller
  divisors <- ifelse(a%%check_values == 0 & b%%check_values == 0,
                     check_values, 0)
  max(divisors)
}
Find the result of executing MyFunction1(25, 60), MyFunction1(5, 67), and MyFunction1(a, b) for general natural numbers a and b.
b)
MyFunction2 <- function(a, b) {
  smaller <- min(a, b)
  check_values <- 1:smaller
  larger <- max(a, b)
  multiples <- larger*check_values
  commons <- multiples[multiples %% smaller == 0]
  min(commons)
}
Find the result of executing MyFunction2(7, 60), MyFunction2(6, 10) and MyFunction2(a, b) for general natural numbers a and b.
c)
MyFunction3 <- function(x) {
  MyResult <- x[1]
  if(length(x) > 1) {
    for(i in 2:length(x)) {
      if(sum(x[i] == MyResult) == 0) MyResult <- c(MyResult, x[i])
    }
  }
  MyResult
}
Find the result of executing MyFunction3(3), MyFunction3(1:5), MyFunction3(c(3, 1, 1, 2, 1, 2, 3, 1)) and MyFunction3(x) for a general vector x (which might be of length 1, i.e. a single number, too).
d)
MyFunction4 <- function(x, y) {
  MyResult <- rep(0, length(y))
  for(i in 1:length(y)) {
    MyResult[i] <- sum(x%%y[i] == 0)
  }
  MyResult
}
Find the outcomes of executing MyFunction4(12, 1:4), MyFunction4(1:4, 2), MyFunction4(c(4, 1, 2, 6, 2, 3, 1), c(2, 1, 4)) and MyFunction4(x, y) for general vectors x and y of natural numbers.

4.10 In the following, you will find short tasks and corresponding pieces of R
code. However, in each code, there are some mistakes. Find and correct the
mistakes (and avoid building in new ones) such that in the end, you have a
functioning piece of code that fulfills the given task. Try to do so without using
R first, only then check your final code in R (using several different input
parameters). In each task, you may assume that the provided arguments
satisfy the given conditions, i.e. a "missing check" is not a mistake (e.g. if the
inputs are required to be integers, you may assume that the user will only
plug in integers).
a)Implement the function division that takes two vectors a and b of the same length as arguments. The result of the function is again a vector of the same length as the inputs, with the i-th element being 0 if the larger of the numbers a[i] and b[i] is not divisible by the smaller one, and the result of the division otherwise. For instance, if you execute
a <- c(4, 3, 1, 6)
b <- c(2, 8, 5, 2)
division(a, b)
the result should be 2 0 5 3. The implementation:
division <- function(a, b) {
  smaller <- pmin(a, b)
  larger <- max(a, b)
  ifelse(larger %% smaller == 0, 0, larger/smaller)
}
b)Implement a function divisible that for any positive integers x and a checks all values between 1 and x (inclusive) for divisibility by a. For every value that is divisible by a, it will print "divisible", otherwise "not divisible". The default value of a should be 2.
divisible <- function(x, a = 2) {
  tocheck <- 1 to x
  if(tocheck %% a == 0) "divisible" else "not divisible"
}
c)Implement a function HApoints that takes a vector points of QM students' points achieved from the home assignments as input, and outputs a vector of final home assignment scores, where a student's final score is points/2.5 if this value is below 10, and 10 otherwise. For instance, if the input vector is c(15, 28, 22, 25.3), the output should be 6 10 8.8 10.
HApoints <- function(points) {
  divided <- points/2.5
  final_score <- min(10, divided)
  return(divided)
}
d)Write a function Performance that takes the vector ratings containing the ratings for a list of movies as input, and outputs a new vector of classifications based on these ratings. The classification criteria are as follows: a film is classified as good if the rating value is strictly greater than 50, and poor otherwise. For instance, if the input vector is c(55, 45, 65, 70), the output should be "good" "poor" "good" "good".
Performance <- function(ratings) {
  final.class <- if(ratings < 50, "poor", else "good")
  ratings
}

4.8 Further readings

Most of the contents of this chapter are discussed in Chapters 5.1-5.3 in
[1]. To practice the function transformations, we suggest exercises 1-5 in
Chapter 5.1. From Chapter 5.2, we particularly recommend exercises 2 and
4 (exercise 3 was used as an example in this chapter). To practice finding the
inverse function, we suggest exercises 1-7 and 9-11 from Chapter 5.3. Review
exercises at the end of Chapter 5 also provide a good source of practice for
the topics of this chapter.
Chapter 5

Derivatives

In the previous two chapters, we introduced the notion of a function and considered some properties of single-variable functions. In this chapter, we investigate in particular the monotonicity and convexity/concavity properties in more detail, introducing a tool for studying these properties.

In many situations, one is interested in the 'steepness' of the function or, in other words, in how the function changes if its argument is slightly changed. For instance, one can be interested in how much the costs of production change if the produced amount has to be slightly increased. While for some functions this change can be easily calculated and even stays constant no matter what the current production level is (think of linear functions), for other functions such calculations might be fairly complicated.

For a linear function of the form f(x) = ax + b, we know that its steepness at any point can be quantified by its slope, the parameter a. For every unit change in x, the function value changes by a. This can also be expressed by the relationship a = (f(x2) − f(x1))/(x2 − x1), which stays constant independently of the values x1 and x2 or even their difference x2 − x1.

For general functions, finding the slope of the function is not as straightforward.
The rate at which the function changes usually varies in dependence on the
point at which it is to be considered. To quantify this rate of change, one
uses the notion of a derivative.


5.1 The definition of a derivative

Definition 5.1. Let f : D → R be a function. The derivative f′ of f at a point x ∈ D is the limit
\[ f'(x) := \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \tag{5.1} \]
if the limit exists.
If the limit exists at every point of a subset C of the domain D, we say that f is differentiable on C. If the limit exists at every point of the domain D, we simply say that f is differentiable. In this case, f′ itself is a function with f′ : D → R.

For any point x0 such that f is differentiable at this point (i.e. the above
limit exists), f ′ (x0 ) in fact corresponds to the slope of the tangent line to
the graph of f at x0 . Since the tangent is a line that touches the graph
of the function at the given point, we therefore know that it is of the form
tx0 (x) = f ′ (x0 )x + b for some b ∈ R and that tx0 (x0 ) = f (x0 ). These two
pieces of information lead to the following functional form of the tangent line
at x0 :

tx0 (x) = f ′ (x0 )x + f (x0 ) − f ′ (x0 )x0 = f ′ (x0 )(x − x0 ) + f (x0 ).
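The tangent line formula can be tried out in R with a minimal sketch. The function f(x) = x^2, its derivative 2x and the point x0 = 1 are assumptions chosen for illustration.

```r
f <- function(x) x^2
fPrime <- function(x) 2 * x   # derivative of x^2, taken as known here

# Tangent line to f at x0: t(x) = f'(x0) * (x - x0) + f(x0)
tangent <- function(x, x0) fPrime(x0) * (x - x0) + f(x0)

x0 <- 1
xs <- seq(-1, 3, by = 0.01)
plot(xs, f(xs), type = "l")
lines(xs, tangent(xs, x0), col = 2)
points(x0, f(x0), pch = 16)

# The tangent touches the graph at x0 ...
print(tangent(x0, x0) == f(x0))
# ... and close to x0 it stays close to the function.
print(abs(tangent(1.1, x0) - f(1.1)))
```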

Let us use the above definition in (5.1) to find the derivative of a function.
Problem 5.1. Find the derivative f′(x) for f(x) = x^2.
Solution. According to the definition, we should find
\[ \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h}. \]
We have
\[ \frac{(x+h)^2 - x^2}{h} = \frac{x^2 + 2hx + h^2 - x^2}{h} = 2x + h. \]
As h gets closer and closer to 0, the second term vanishes and the only term that remains is 2x. Therefore f′(x) = 2x.
Remark. In the literature, you may see different notation for derivatives. The following all denote the same thing:
\[ \frac{df(x)}{dx} = \frac{\partial f(x)}{\partial x} = f'(x). \]
While we will mostly stick to f′(x) for the derivative of a (single-variable) function, the other two notations are also not unusual and we might use them interchangeably with f′(x) in the rest of this text. These two notations refer more directly to the interpretation of the derivative as the change in f in reaction to a (small) change in x, or the rate of change. The d in df(x)/dx can be thought of as difference (change), making the meaning of the formula 'difference in f divided by difference in x', which corresponds to the (average) difference in f for a unit difference in x.

Let us now comment on the meaning of the limit in the definition of a derivative. To compute the derivative f′ of f at a given point x ∈ D, one considers the relative change in the function between x and x + h and keeps decreasing the distance h between these two points. We illustrate this idea on the function f(x) = x^4 at the point x = 1. We will plot the function and the lines with slope of the form (f(1 + h) − f(1))/h for different values of h, starting at h = 1 and decreasing this value. To this end, we will also introduce the for cycle in R.

5.2 The for cycle

In some situations, one would like to repeat the same task several times, for instance for various values of a particular parameter. An example of such a situation is the case described above where we want to repeat the computation of (f(1 + h) − f(1))/h and the plotting of the corresponding line for several values of h. If the task should be repeated two or three times, it is alright to just repeat the corresponding lines of code with the adjustments of h. However, if the number of repetitions is high, copying the lines of code is not practical and, more importantly, it also gets error-prone. In such situations, the for cycle is a very useful feature. The syntax is the following:

for(i in sequence) {
  task
}

Here, i is a variable that changes its value, running through sequence. (Note that the variable might be called something other than i, too.) A typical start of a for cycle would be for instance for(i in 1:10), meaning that a certain task is to be repeated 10 times, and the variable i takes in turn the values from 1 to 10. task is one or several lines of code that specify what should be repeated, and this might depend on the variable i. For example, we might want to find the value of the sum of i^2 for i = 1, ..., n, for all natural numbers n from 1 to 5. This could be done as follows:

for(n in 1:5) {
  print(sum((1:n)^2))
}

## [1] 1
## [1] 5
## [1] 14
## [1] 30
## [1] 55

To define sequence, one of the simplest ways is of course to use 1:maxi where maxi is the last value for which task should be repeated. If it depends on the length of a particular vector vec, this could be done via 1:length(vec). Alternatively, seq_along(vec) also creates an arithmetic sequence with step size 1 starting at 1 and ending at length(vec). However, note that while the above examples showcase typical use of for cycles, sequence can be any sequence of values. It does not have to start at 1 or be an arithmetic sequence, and the values do not have to be integers or even positive.
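The following short sketch illustrates both points: a for cycle over an arbitrary vector of values, and one practical difference between 1:length(vec) and seq_along(vec) when vec happens to be empty. The vectors used are made up for illustration.

```r
# The loop variable can run through any vector, not only 1:n.
for (h in c(0.5, -1, 10)) {
  print(h^2)
}

# For an empty vector, 1:length(vec) and seq_along(vec) differ:
vec <- numeric(0)
print(1:length(vec))     # 1 0        -> a for cycle over this would run twice
print(seq_along(vec))    # integer(0) -> a for cycle over this would not run

count <- 0
for (i in seq_along(vec)) count <- count + 1
print(count)             # 0
```

For non-empty vectors both ways of writing the sequence give the same result.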

Now that we know how to achieve that the task of computing the slope
and plotting the corresponding line can be executed 10 times without copy-
pasting the code as many times, let us finally visualize the idea of the
derivative as a limit.

f <- function(x) x^4
fPrime <- function(x) 4*x^3
h <- seq(1, 1e-10, length.out = 10)
xs <- seq(.5, 2, by = .01)
plot(xs, f(xs), type = 'l')
cols <- rainbow(length(h))
# you may also try heat.colors(), terrain.colors()
at <- 1
for (i in seq_along(h) ) {
  slope <- (f(at + h[i]) - f(at)) / h[i]
  points(at, f(at), col = cols[i], pch = 16)
  points(at + h[i], f(at + h[i]), col = cols[i], cex = 2, pch = 16)
  abline(f(at) - slope*at, slope, col = cols[i])
  print(slope)
}
[Figure: graph of f(xs) for xs from 0.5 to 2, with the lines through (1, f(1)) and (1 + h, f(1 + h)) for decreasing values of h.]

## [1] 15
## [1] 13.19616
## [1] 11.55693
## [1] 10.07407
## [1] 8.739369
## [1] 7.544582
## [1] 6.481481
## [1] 5.541838
## [1] 4.717421
## [1] 4

5.3 Calculating derivatives

Even though for many functions, finding the limit in (5.1) is not an issue,
there are others for which this limit is much more complicated. Moreover,
having to compute the limit every time a derivative is needed is also quite
impractical. In the following, we therefore introduce some rules for computing
derivatives as well as the derivatives of some special functions. Let g, h be
differentiable functions.

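\begin{align}
f(x) = c,\ c \in \mathbb{R} \quad &\Rightarrow \quad f'(x) = 0 \tag{5.2}\\
f(x) = x^c,\ c \in \mathbb{R} \setminus \{0\} \quad &\Rightarrow \quad f'(x) = c\,x^{c-1} \tag{5.3}\\
f(x) = \exp(x) \quad &\Rightarrow \quad f'(x) = \exp(x) \tag{5.4}\\
f(x) = \ln(x) \quad &\Rightarrow \quad f'(x) = \frac{1}{x} \tag{5.5}\\
f(x) = \sin(x) \quad &\Rightarrow \quad f'(x) = \cos(x) \tag{5.6}\\
f(x) = \cos(x) \quad &\Rightarrow \quad f'(x) = -\sin(x) \tag{5.7}\\
f(x) = c\,g(x),\ c \in \mathbb{R} \quad &\Rightarrow \quad f'(x) = c\,g'(x) \tag{5.8}\\
f(x) = g(x) \pm h(x) \quad &\Rightarrow \quad f'(x) = g'(x) \pm h'(x) \tag{5.9}\\
f(x) = g(x)h(x) \quad &\Rightarrow \quad f'(x) = g'(x)h(x) + g(x)h'(x) \quad \text{(product rule)} \tag{5.10}\\
f(x) = \frac{g(x)}{h(x)} \quad &\Rightarrow \quad f'(x) = \frac{g'(x)h(x) - g(x)h'(x)}{h^2(x)} \quad \text{(quotient rule)} \tag{5.11}\\
f(x) = \frac{1}{g(x)} \text{ for } g(x) \neq 0 \quad &\Rightarrow \quad f'(x) = -\frac{g'(x)}{g(x)^2} \tag{5.12}\\
f(x) = g(h(x)) \quad &\Rightarrow \quad f'(x) = g'(h(x)) \cdot h'(x) \quad \text{(chain rule)} \tag{5.13}
\end{align}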

Note that the rule (5.12) is a direct consequence of the quotient rule and of the fact that dc/dx = 0 for any constant c.

With the knowledge of the derivatives of some basic functions and the derivative
rules for the sum (difference), product, quotient or composition of several
functions, one can find the derivatives of many other functions, too. We will
illustrate the use of the above rules on two examples next.
Problem 5.2. Find the derivative f′(x) for f(x) = tan(x) = sin(x)/cos(x).

Solution. We will make use of the quotient rule (5.11) with g(x) = sin(x) and h(x) = cos(x). From (5.6) and (5.7) we have that g′(x) = cos(x) and h′(x) = − sin(x). Therefore we can plug these into the quotient rule formula and write
\[ f'(x) = \frac{\cos^2(x) + \sin^2(x)}{\cos^2(x)} = \frac{1}{\cos^2(x)} \]
since sin^2(x) + cos^2(x) = 1.


Problem 5.3. Find the derivative f′(x) for f(x) = sin(1/exp(x^2) − x^2).
Solution. After a closer look at the function, we can see that f is a composition g(h(x)) with g(x) = sin(x) and h(x) = 1/exp(x^2) − x^2, so we will make use of the chain rule.

h(x) is in turn the difference of two functions. The term x^2 can easily be differentiated and we have d(x^2)/dx = 2x. The term 1/exp(x^2), on the other hand, is somewhat more complicated. To simplify it a bit, let us rewrite it as exp(−x^2). Now we can interpret it as a composite function k(ℓ(x)) with k(x) = exp(x) and ℓ(x) = −x^2. We have that k′(x) = exp(x) and ℓ′(x) = −2x. According to the chain rule in (5.13), this gives us dk(ℓ(x))/dx = k′(ℓ(x))ℓ′(x) = −2x exp(−x^2). Altogether, we get that h′(x) = −2x exp(−x^2) − 2x = −2x(exp(−x^2) + 1).

Coming back to the original composition g(h(x)) and recalling that g′(x) = cos(x), we finally get the derivative of f as

f′(x) = g′(h(x))h′(x) = cos(exp(−x^2) − x^2) · (−2x)(exp(−x^2) + 1).
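A derivative obtained by hand can be sanity-checked numerically. The sketch below assembles f′ from the pieces h, h′ and g′ found in the solution and compares it with a central difference quotient; the helper num_deriv, its step size and the test points are assumptions made for this illustration.

```r
f <- function(x) sin(1 / exp(x^2) - x^2)

# Pieces from the solution: inner function h, its derivative, and g'(x) = cos(x).
h <- function(x) exp(-x^2) - x^2
hPrime <- function(x) -2 * x * (exp(-x^2) + 1)
fPrime <- function(x) cos(h(x)) * hPrime(x)   # chain rule: g'(h(x)) * h'(x)

# Hypothetical helper: central difference quotient as a numerical reference.
num_deriv <- function(f, x, eps = 1e-6) (f(x + eps) - f(x - eps)) / (2 * eps)

xs <- c(-1.5, -0.5, 0, 0.3, 1, 2)
print(cbind(x = xs, chain_rule = fPrime(xs), numerical = num_deriv(f, xs)))
```

If the two columns agree up to small rounding differences, the hand computation is most likely correct.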

5.4 Derivatives and the properties of a function

While the interpretation of the derivative as the slope of the tangent to the function is useful in itself, derivatives can also be used to decide about common properties of a function like monotonicity and convexity or concavity.

Recall that a function f : A → B is called increasing if for x1 < x2 ∈ A we have f(x1) ≤ f(x2) (< for strict monotonicity). Note that if this is the case, then (f(x + h) − f(x))/h ≥ 0 for all x ∈ A and h ≠ 0 such that x + h ∈ A (and > 0 if f is strictly increasing). This means that for an increasing differentiable function, the derivative is non-negative. In fact, this relationship is an equivalence, i.e. a non-negative derivative on an interval implies that the function is increasing there. A similar argument can be made for decreasing functions.

Let us formalize and summarize these rules:

Theorem 5.1. A differentiable function f : A → B is

a) strictly increasing at a point x ∈ A if f′(x) > 0;

b) increasing on an interval C ⊆ A if and only if f′(x) ≥ 0 for all x ∈ C;

c) strictly increasing on an interval C ⊆ A if f′(x) > 0 for all x ∈ C;

d) strictly decreasing at a point x ∈ A if f′(x) < 0;

e) decreasing on an interval C ⊆ A if and only if f′(x) ≤ 0 for all x ∈ C;

f) strictly decreasing on an interval C ⊆ A if f′(x) < 0 for all x ∈ C.

Remark. Note that some of the points above mention f being strictly increasing or decreasing at a point. We say that f is strictly increasing (decreasing) at x0 if there is an open interval I with x0 ∈ I such that f is strictly increasing (decreasing) on I. Also note that only the conditions for (non-strict) monotonicity on an interval are "if and only if"; the conditions for strict monotonicity, on an interval or at a point, are sufficient only. For instance, f(x) = x^3 is strictly increasing on R although f′(0) = 0.

Remark. Notice that from the derivative of f at a given point, we can only decide about its monotonicity at the point if the corresponding derivative is not equal to 0. If the derivative is equal to 0 at the given point, f could be increasing, decreasing, or neither. As an example, consider the following single variable functions: f1(x) = x^2, f2(x) = x^3, f3(x) = −x^3. We have f′(0) = 0 for all three functions, but f1 is neither increasing nor decreasing at 0, while f2 is increasing and f3 is decreasing.

5.5 Higher order derivatives

Recall that the (first) derivative of a function is in itself a function. Therefore we could go on and take the derivative of this function, assuming that it exists. Following this logic, we could differentiate a function an arbitrary number of times, as long as the previous derivative is still a differentiable function.
Definition 5.2. Let f : A → B be a differentiable function. The second derivative of f is defined as
\[ f''(x) = \frac{df'(x)}{dx} \]
if it exists.
Similarly, the n-th derivative of f is defined as
\[ f^{(n)}(x) = \frac{df^{(n-1)}(x)}{dx} \]
where f^(n−1)(x) is the (n − 1)-th derivative, if it exists.

Clearly, the same way f′ is used to study the monotonicity of f, f′′ can be used to study the monotonicity of f′. However, f′′ can also be used to investigate the curvature (i.e. convexity or concavity) of f itself, as we summarize in the following theorem:
Theorem 5.2. A twice differentiable function f : A → B is

a) convex on an interval C ⊆ A if and only if f′′(x) ≥ 0 for all x ∈ C;

b) strictly convex on an interval C ⊆ A if f′′(x) > 0 for all x ∈ C;

c) concave on an interval C ⊆ A if and only if f′′(x) ≤ 0 for all x ∈ C;

d) strictly concave on an interval C ⊆ A if f′′(x) < 0 for all x ∈ C.

Note that the conditions in b) and d) are sufficient only: for instance, f(x) = x^4 is strictly convex on R although f′′(0) = 0.
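For simple expressions, R can compute derivatives symbolically with the function D() from the stats package, which is loaded by default. The sketch below uses it to obtain the second derivative of a made-up example function, f(x) = x^3 − 3x, and to read off the sign of f′′ on a grid; the example function and the grid are assumptions chosen for illustration.

```r
# D() differentiates simple expressions symbolically.
f_expr  <- quote(x^3 - 3 * x)
f1_expr <- D(f_expr, "x")    # first derivative
f2_expr <- D(f1_expr, "x")   # second derivative
print(f1_expr)
print(f2_expr)

# Evaluate the second derivative on a grid and read off its sign.
x <- seq(-2, 2, by = 0.5)
f2 <- eval(f2_expr, list(x = x))
print(data.frame(x = x, f2 = f2,
                 sign_of_f2 = ifelse(f2 > 0, "positive",
                                     ifelse(f2 < 0, "negative", "zero"))))
```

Here f′′(x) = 6x, so the function is concave on (−∞, 0] and convex on [0, ∞), with an inflection point at 0.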

5.6 Applications

In addition to studying the monotonicity and convexity of a function, derivatives have many other useful applications. In this section, we will mention how they can be used to approximate a function with another, possibly better understood function, and we will introduce elasticity, an application from the field of economics. At a later point in the course, we will use derivatives in some of the integration methods, and we will also use them extensively to optimize functions. However, these are by no means all the applications of derivatives. They are useful in solving so-called differential equations (very often used e.g. in financial mathematics or physics), for finding function limits in some specific cases (L'Hôpital's rule), etc.

5.6.1 Taylor approximation

We already know that the tangent of a differentiable function f at a particular point x0 from its domain can be expressed as a linear function of the form
tx0 (x) = f′(x0)(x − x0) + f(x0).
Close to x0, this linear function is in fact quite a good approximation of the function f (though how good depends on the function). Higher order derivatives can help us to obtain a more precise approximation of a given function by using polynomial functions.

Definition 5.3. The n-th order Taylor approximation of a function f : A → B about a point a ∈ A is given by
\[ T_a(x) = f(a) + \sum_{i=1}^{n} \frac{f^{(i)}(a)}{i!}(x-a)^i. \tag{5.14} \]

Note that the function Ta(x) given in (5.14) is a polynomial of degree at most n; therefore, it is sometimes also referred to as the Taylor polynomial. If n = 1, we obtain a linear function (and correspondingly a linear approximation of a given function) that corresponds to the tangent line. For n = 2, (5.14) results in a quadratic approximation of f, etc. For values close to a, Ta approximates f well and, for sufficiently well-behaved functions, the approximation gets better with increasing n. However, moving further away from a, the approximation is not necessarily good.

As we already mentioned in Chapter 3.4, polynomials are very well understood from a mathematical point of view and are generally considered 'nice' functions. One of the reasons for this is that they are easily and arbitrarily often differentiable, and while it may not be generally easy to find their roots or extrema analytically, there are numerical methods that can do so quite easily. Therefore being able to approximate a function by a polynomial, as explained above, is a big advantage.

Let us illustrate the Taylor approximation on the function e^x.

Problem 5.4. Find the n-th order Taylor approximation of f(x) = exp(x) about a = 0 for n ∈ {1, 2, 3, 4, 5}.

Solution. As we can see from the formula (5.14), for the n-th order approximation we will need derivatives up to order n at the point a. For f(x) = exp(x), this is not a difficult task: any derivative f^(n)(x) is the function itself, i.e. f^(n)(x) = exp(x). For a = 0, we have f^(n)(a) = 1 for any n ∈ N. Plugging this into the approximation formula, we therefore get:

• n = 1: T_0(x) = 1 + x.

• n = 2: T_0(x) = 1 + x + x^2/2.

• n = 3: T_0(x) = 1 + x + x^2/2 + x^3/6.

• n = 4: T_0(x) = 1 + x + x^2/2 + x^3/6 + x^4/24.

• n = 5: T_0(x) = 1 + x + x^2/2 + x^3/6 + x^4/24 + x^5/120.

We can even easily write down the approximation formula for any n ∈ N:
\[ T_0(x) = 1 + \sum_{i=1}^{n} \frac{x^i}{i!}. \]

To study how well the approximation mimics the behavior of the function,
let us plot the function itself and its approximations up to degree n = 5.

x <- seq(-3, 3, by = 0.1)

# Plot exp(x) and overlay its Taylor polynomials about a = 0
plot(x, exp(x), type = "l", ylab = "y", ylim = c(-2, 20))
# n-th order Taylor polynomial of exp about a = 0, evaluated at x
approximation <- function(x, n) {
  Ta <- 1
  for (i in 1:n) {
    Ta <- Ta + x^i / factorial(i)
  }
  return(Ta)
}
cols <- c("red", "green", "blue", "violet", "yellow")
for (i in 1:5) {
  lines(x, approximation(x, i), col = cols[i])
}
legend("topleft", c("n = 1", "n = 2", "n = 3", "n = 4", "n = 5"),
       col = cols, lty = 1)
[Figure: exp(x) and its Taylor polynomials of order n = 1, ..., 5 about a = 0 on the interval [−3, 3]; vertical axis y.]

As we can see, with increasing n, the Taylor polynomial gets closer and closer
to the function.
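To put numbers on this observation, the following sketch computes the largest absolute error of each approximation on a narrow and on a wide interval around a = 0. It is an illustration only; the helper names taylor_exp and max_error and the grid step 0.01 are assumptions introduced here.

```r
# Taylor polynomial of exp(x) about a = 0 of order n (hypothetical helper)
taylor_exp <- function(x, n) {
  Ta <- rep(1, length(x))
  for (i in seq_len(n)) {
    Ta <- Ta + x^i / factorial(i)
  }
  Ta
}

# Largest absolute error on a grid over [-r, r] (hypothetical helper)
max_error <- function(n, r) {
  x <- seq(-r, r, by = 0.01)
  max(abs(exp(x) - taylor_exp(x, n)))
}

errors <- data.frame(
  n = 1:5,
  near_zero = sapply(1:5, max_error, r = 0.5),  # interval [-0.5, 0.5]
  wide = sapply(1:5, max_error, r = 3)          # interval [-3, 3]
)
print(errors)
```

For each n, the error on the narrow interval should be much smaller than on the wide one, in line with the remark that the approximation is best close to a.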

5.6.2 Elasticity

In economics, derivatives are used to study how various quantities react to changes in other quantities. Typical examples are the price elasticity of demand or of supply. The elasticity of a function f with respect to its argument x is defined as follows:

Definition 5.4. Let f : A → B be differentiable at x ∈ A with x ≠ 0, f(x) ≠ 0. Then the elasticity of f w.r.t. x is
$$El_x f(x) = \frac{x}{f(x)}\, f'(x) = \frac{\;\frac{df(x)}{f(x)}\;}{\frac{dx}{x}}. \tag{5.15}$$

Recall the interpretation of df(x) as a small change in f (and similarly dx for x). This allows us to interpret equation (5.15) as follows: the elasticity is the (approximate) percentage change in f in reaction to a one percent change in x. If f is the demand and p the price, the price elasticity of demand gives the (approximate) percentage change in demand if the price changes by one percent.

Problem 5.5. Find the price elasticity of demand if the demand function
is given by D(P ) = −10P + 400. If the current price is P = 10, how much
does the demand change if the price changes by 1%?

Solution. Taking the derivative of D, we get D′(P) = −10. Following formula (5.15), we obtain
$$El_P D(P) = \frac{P}{-10P + 400} \cdot (-10) = \frac{-10P}{-10P + 400}.$$
For P = 10, we plug in and get $El_P D(10) = -\frac{100}{300} = -\frac{1}{3}$, which means that at a current price of 10, a price increase of 1% will lead to a demand decrease of approximately 0.33%.
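As a quick check of this calculation, the sketch below evaluates formula (5.15) for the demand function of Problem 5.5 and compares it with the relative change in demand after an actual 1% price increase. The function names D_prime and elasticity are assumptions introduced for this illustration.

```r
# Demand function from Problem 5.5 and its derivative
D <- function(P) -10 * P + 400
D_prime <- function(P) -10

# Elasticity El_P D(P) = P / D(P) * D'(P), see (5.15)
elasticity <- function(P) P / D(P) * D_prime(P)

el <- elasticity(10)
print(el)

# Relative change in demand (in percent) for a 1% price increase from P = 10
exact_change <- (D(10 * 1.01) - D(10)) / D(10) * 100
print(exact_change)
```

Because this demand function is linear, the two values coincide up to rounding; for nonlinear demand functions the elasticity only approximates the actual percentage change.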

5.7 Exercises
5.1 Find the derivatives of the following functions. Write down the domains
of both the functions and their derivatives.

f)f6 (x) = x2 − 71 x2
5
a)f1 (x) = x2 + x3
b)f2 (x) = 4x2 − x + 1
√ g)f7 (x) = 2 sin x + 3 cos x
c)f3 (x) = x + x−2
√ h)f8 (x) = 3x
d)f4 (x) = 6 3 x − 5
1 4
e)f5 (x) = x2
+ x3
i)f9 (x) = 9 log10 (x)
Hint for f9: You can make use of the fact that $\log_a b = \frac{\log_c b}{\log_c a}$, which easily lets you change the base.

5.2 Find the derivatives of the following functions. Where possible, simplify
before differentiating, but don’t forget to specify the domain accordingly.
Where necessary, make use of the product, quotient and/or chain rule.
(x+1)3
a)g1 (x) = x
f)g6 (x) = cos(2x + 4)
b)g2 (x) = (x + 1)6
2
g)g7 (x) = sin2 x

c)g3 (x) = 4x3 − x h)g8 (x) = sin(x2 )
p √
d)g4 (x) = x + 5x i)g9 (x) = ln(3 sin x + 8)
2
e)g5 (x) = (x − 1) sin x j)g10 (x) = esin x

5.3 For the following functions, find their derivatives and find the intervals on which they are increasing/decreasing.
a) $h_1(x) = \frac{(x^2+2)^2}{4}$
b) $h_2(x) = \frac{x^3+1}{x+1}$
c) $h_3(x) = \frac{2x-1}{x+3}$
d) $h_4(x) = \frac{x^2+2x}{1-x^2}$
e) $h_5(x) = \frac{1}{(3x^4+x^2)^{10}}$
f) $h_6(x) = (\sqrt{2x^3-1}+2)^8$
g) $h_7(x) = \ln(2x+4)$
h) $h_8(x) = e^x$

5.4 For the following functions, find their derivatives and then use them to decide whether the functions are increasing or decreasing at x = 0.
a) $f(x) = (2x + e^{-x})^3 + \frac{1}{x+2}$;
b) $g(x) = (x^2 + e^{-3x})^2 + \frac{1}{2-x}$.

5.5 For $f(x) = x^4 - \frac{x^3}{6} + 2$, find f′(0), f′(−1) and f′(2).

5.6 For g(x) = x3 − 6x2 + 2x + 5, find x for which

a)g ′ (x) = 2 b)g ′ (x) = −10 c)g ′ (x) = −16

5.7 Find the second derivative of the following functions:


√ 1
a)f1 (x) = 3x2 + 6x − 2 d)f4 (x) = 3 x + √ 4x

b)f2 (x) = x2 − 2 x + x1 e)f5 (x) = 4 sin x + 2 cos x
(x2 +4)(x−1)
c)f3 (x) = x4 + 4x3 f)f6 (x) = x2

For functions f1 , f2 and f3 , find the intervals on which they are convex/concave.

5.8 Find the following Taylor approximations for n = 5:
a) for f(x) = ln x and a = 1 and a = 2;
b) for g(x) = sin x and a = 0 and a = 2π;
c) for h(x) = cos x and a = 0 and a = π/2;
d) for m(x) = ln(x + 1) and a = 0;
e) for n(x) = 1/(1 + x) and a = 0.

5.9 The figure below shows the derivative f ′ of a function f : [−4, 2] → R.


[Figure: graph of the derivative f′(x) for x ∈ [−4, 2].]

a) Find the interval(s) on which f is increasing.
b) Find the interval(s) on which f is decreasing.
c) Find the interval(s) on which f is convex.
d) Find the interval(s) on which f is concave.
e) Find the inflection point(s) of f.

5.10 Assume that the demand for natural gas in Europe is given by $D(p) = 200p^{-2}$. Find the price elasticity of demand. Interpret the result.

5.11 Suppose that a producer faces the following demand function:
a) $q = D(p) = \frac{\exp(-p)}{2}$;
b) $q = D(p) = \exp\left(-\frac{p}{2}\right)$.
Find the price elasticity of demand and interpret the result. Moreover, compute the second order Taylor approximation of D(p) around p = 0.

5.12 The following code should implement a function A that takes two vectors
x and y of the same size and returns the element-wise minimum of their
element-wise product and their element-wise sum. However, there are some
mistakes in the code. Find and correct (and avoid building in new ones),
such that you end up with a functioning code that fulfills the task. You may
assume that the user only plugs in two vectors of the same size, that is, a
missing check for this condition is not a mistake.
A <- function(x, y) {
K <- length(x)
ews <- rep(0, times = K, each = 2)
for(i in 1:K) {
ews[i] <- x[i] + y[i]
}
ewprd <- x*y
min(ewprd, ews)

5.13 Assume that two vectors x and y of the same length have been assigned
in R. What does the following piece of code do? In particular, what is the
outcome of the last two lines?
m <- length(x)
res <- x[1]*y[1]
for(i in 2:m) res <- res + x[i]*y[i]
mxy <- max(x[1], y[1])
for(i in 2:m) mxy <- c(mxy, max(x[i],y[i]))
print(res)
print(mxy)

5.8 Further readings

Chapter 6 of [1] gives a detailed introduction to the topic of derivatives; for the purposes of this course, Section 6.5 can be left out. To practice taking derivatives and interpreting them, we particularly recommend the exercises of Sections 6.6 (1-4, 6), 6.7 (1-4, 8-10), 6.8 (1-10), 6.9 (1-7), 6.10 and 6.11, as well as the Review exercises at the end of the chapter (except perhaps exercises 1 and 2). Sections 7.4 and 7.5 discuss Taylor approximations of first and higher degree, whereas Section 7.7 introduces elasticities. Correspondingly, the exercises of these sections are useful to practice and check your understanding of these concepts. From the review exercises of Chapter 7, we recommend exercises 12, 15, 19 and 20. Finally, to read about convex and concave functions and their connection to derivatives, we recommend Sections 8.1, 8.2, 8.5 and 8.6, and exercises 1-6 in the Review exercises of that chapter for practicing the concepts.
Chapter 6

Integration

In the previous chapter, we learned about derivatives, including several simple rules that allow us to differentiate a wide range of functions. We also mentioned one of the economic applications of derivatives, the elasticity, and we will encounter derivatives again in the chapter about optimization later in this document. However, economists (and other professionals) are often interested in finding a function from given information about its derivative. This can be seen as reversing the differentiation process and is called integration. If the derivative f(x) of some function is given, then the function F with F′(x) = f(x) is called the indefinite integral or antiderivative of f. Among many other applications, the one most people would think of first, and which is also used most often to motivate the integral, is finding the area below the function graph. This is achieved by means of the definite integral, which is also very relevant for, e.g., finding the average value of a function, or for finding probabilities, expectations and variances in probability theory.

In this chapter, we will introduce both the indefinite and the definite integral,
some rules for calculating integrals, and we will close with some applications.

6.1 Indefinite integral

Assume that we don’t know the function F, but we know its derivative: $F'(x) = x^2$. Thinking back to the rules for differentiating power functions, it is clear that $F(x) = \frac{1}{3}x^3$ could be the function F we are looking for, because differentiating F leads to $x^2$. But this would also be the case if we set $F(x) = \frac{1}{3}x^3 + C$ for any C ∈ R, because additive constants disappear upon differentiation. This shows that given F′(x) (only), F(x) cannot be uniquely identified. However, it can quite easily be shown that all functions F with $F'(x) = x^2$ are of the form $F(x) = \frac{1}{3}x^3 + C$, C ∈ R. The fact that the antiderivative is in fact a whole class of functions rather than a single definite function is also why it is called the indefinite integral.

Mathematically, the indefinite integral is defined as follows:

Definition 6.1. Let f and F be such that f(x) = F′(x). The antiderivative of f is expressed mathematically as
$$\int f(x)\,dx = F(x) + C,$$
where the left-hand side of the equation reads as the indefinite integral of f with respect to x. The symbol $\int$ is the integral sign, f(x) is the integrand, dx indicates the variable of integration, and C is the integration constant.

6.1.1 Basic rules for indefinite integrals

By reversing some of the rules for calculating derivatives, one can create a
set of basic rules for finding the indefinite integral. Let f and g be functions
and a ∈ R.
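\begin{align}
\int a\,dx &= ax + C \tag{6.1}\\
\int a f(x)\,dx &= a \int f(x)\,dx \tag{6.2}\\
\int (f(x) + g(x))\,dx &= \int f(x)\,dx + \int g(x)\,dx \tag{6.3}\\
\int x^a\,dx &= \frac{1}{a+1}\,x^{a+1} + C \quad \text{for } a \neq -1 \tag{6.4}\\
\int \frac{1}{x}\,dx &= \ln(|x|) + C \tag{6.5}\\
\int e^x\,dx &= e^x + C \tag{6.6}\\
\int \cos(x)\,dx &= \sin(x) + C \tag{6.7}\\
\int \sin(x)\,dx &= -\cos(x) + C \tag{6.8}
\end{align}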

Note that in rule (6.3), the brackets around f (x) + g(x) are not strictly
necessary since the integrand is enclosed between the integral sign and dx
such that there is no danger of confusion. However, using brackets in similar
situations improves the readability.
Problem 6.1. Find the indefinite integral $\int (2x^3 + 5)\,dx$.
Solution. We make use of rules (6.1)-(6.4) to write
$$\int (2x^3 + 5)\,dx = 2\int x^3\,dx + \int 5\,dx = \frac{1}{2}x^4 + 5x + C.$$

6.1.2 Integration by parts

Clearly, the rules in the previous subsection will only work for fairly simple
functions. If the function to be integrated is more complicated, one will need
to use more advanced methods to find the antiderivative.

A method that often (but not always!) works well to find the indefinite integral of a product of two functions is called integration by parts. This method reverses the product rule for derivatives to find the integral.

Recall that for two differentiable functions f and g, we have (f · g)′ = f′ · g + f · g′. By applying the integral to both sides of this equality, we get
$$\int (f \cdot g)'(x)\,dx = (f \cdot g)(x) = \int f'(x) \cdot g(x)\,dx + \int f(x) \cdot g'(x)\,dx.$$
By rearranging the terms, we get
$$\int f(x) \cdot g'(x)\,dx = f(x)g(x) - \int f'(x) \cdot g(x)\,dx. \tag{6.9}$$
To find the antiderivative of a function using integration by parts, we therefore try to rewrite the integrand as the product of two functions, of which one is easily integrable (g′) and the other easily differentiable (f). Ideally, $\int f'(x) \cdot g(x)\,dx$ on the right-hand side of (6.9) is a much simpler integral than the original integral.

Let us solve a few problems to illustrate the use of this method.


Problem 6.2. Find the indefinite integral $\int x e^x\,dx$.
Solution. We note that the integrand is the product of two functions: x and $e^x$. While both integrating and differentiating $e^x$ results in the same function again, x will be simplified by differentiation and becomes slightly more complicated if we integrate it. We therefore write $x e^x = f(x)g'(x)$ with f(x) = x and $g'(x) = e^x$. Then we have f′(x) = 1 and $g(x) = e^x$, such that we have
$$\int x e^x\,dx = x e^x - \int e^x\,dx = x e^x - e^x + C.$$

Problem 6.3. Find the indefinite integral $\int e^x \cos(x)\,dx$.
Solution. Upon analyzing the functions in the integrand, we realize that it does not make a difference whether we integrate or differentiate cos(x); in both cases, the new function will be sin(x) (or − sin(x)). Let us therefore try and see what happens if we use integration by parts by setting $f(x) = e^x$ and g′(x) = cos(x). In this case, we obtain $f'(x) = e^x$ and g(x) = sin(x). The formula in (6.9) leads to
$$\int e^x \cos(x)\,dx = e^x \sin(x) - \int e^x \sin(x)\,dx.$$
At first sight, this does not seem too helpful, but before giving up, let us try applying integration by parts again, this time to $\int e^x \sin(x)\,dx$, while keeping $f(x) = e^x$ and setting g′(x) = sin(x). Note that then g(x) = − cos(x), such that in the integral part of the right-hand side of (6.10), we arrive at the original integral again:
$$\int e^x \sin(x)\,dx = -e^x \cos(x) + \int e^x \cos(x)\,dx. \tag{6.10}$$
Let us now combine both steps and denote $\int e^x \cos(x)\,dx$ by I:
$$I = \int e^x \cos(x)\,dx = e^x \sin(x) + e^x \cos(x) - I.$$
By solving for I, we get
$$I = \int e^x \cos(x)\,dx = \frac{e^x}{2}\,(\sin(x) + \cos(x)) + C.$$
Note that if we set $g'(x) = e^x$ and used f(x) = cos(x) and f(x) = sin(x) in the two applications of integration by parts, respectively, we would have arrived at the same result (try it as an exercise). However, it is important not to exchange the roles of the two parts of the product: if we differentiated $e^x$ in the first step and integrated it in the second step (or the other way round), we would end up with the true but uninformative equality I = I.
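A result obtained by integration by parts can always be checked by differentiating it again. The sketch below does this for Problems 6.2 and 6.3 with base R's symbolic differentiation function D(); the helper check_antiderivative and the evaluation grid are assumptions introduced for this illustration.

```r
# Differentiate a candidate antiderivative and compare it with the integrand
# on a grid of x values (hypothetical helper)
check_antiderivative <- function(F_expr, f_expr, x) {
  dF <- D(F_expr, "x")
  isTRUE(all.equal(eval(dF, list(x = x)), eval(f_expr, list(x = x))))
}

x_grid <- seq(-2, 2, by = 0.25)

# Problem 6.2: integrand x e^x, antiderivative x e^x - e^x
ok_62 <- check_antiderivative(quote(x * exp(x) - exp(x)),
                              quote(x * exp(x)), x_grid)

# Problem 6.3: integrand e^x cos(x), antiderivative e^x / 2 * (sin(x) + cos(x))
ok_63 <- check_antiderivative(quote(exp(x) / 2 * (sin(x) + cos(x))),
                              quote(exp(x) * cos(x)), x_grid)

print(c(problem_6.2 = ok_62, problem_6.3 = ok_63))
```

The integration constant C is omitted in both candidates since it disappears upon differentiation.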

6.1.3 Integration by substitution

The second (and last) method for finding more complicated indefinite integrals that we introduce is integration by substitution. In this method, one reverses the chain rule for derivatives.

Recall that for two differentiable functions f and g such that their composition f(g(x)) is well defined, the chain rule implies that (f(g(x)))′ = f′(g(x))g′(x). By applying the indefinite integral to both sides of this equality, we get
$$\int f'(g(x))\,g'(x)\,dx = f(g(x)) + C.$$

While mathematically this makes a lot of sense, finding the functions f and g requires a lot of practice. One needs to practice "seeing the derivative of a composite function" in the integrand. Generally we try and look for a product of two functions: a composite function, where the inner function is g, and the derivative of the inner function. Then we make the following substitution: u(x) = g(x). Recall that we can write $u'(x) = \frac{du}{dx}$. From $\frac{du}{dx} = g'(x)$, we thus obtain du = g′(x)dx. Then we can rewrite the original integral as follows:
$$\int f'(g(x))\,g'(x)\,dx = \int f'(u)\,du,$$
making u our new variable of integration. This way, we have (hopefully) simplified the integral and now only need to look for the antiderivative of f′. Upon finding it, we just substitute back for u to arrive at a function of the original variable x again:
$$\int f'(g(x))\,g'(x)\,dx = \int f'(u)\,du = f(u) + C = f(g(x)) + C.$$

Let us solve some problems using this method.

Problem 6.4. Find the indefinite integral $\int \frac{72x}{(9x^2+2)^5}\,dx$.

Solution. We note that the integrand $\frac{72x}{(9x^2+2)^5}$ can be rewritten as $\frac{1}{(9x^2+2)^5} \cdot 72x$ and that $(9x^2 + 2)' = 18x$. The first part can therefore be seen as a composite function with the inner function being $g(x) = 9x^2 + 2$, and the second part is a multiple of g′(x). We therefore use the substitution $u = 9x^2 + 2$ and obtain du = 18x dx. Then we can write
$$\int \frac{72x}{(9x^2+2)^5}\,dx = \int \frac{1}{(9x^2+2)^5} \cdot 4 \cdot 18x\,dx = 4\int \frac{1}{u^5}\,du = 4 \cdot \left(-\frac{1}{4}\,\frac{1}{u^4}\right) + C = -\frac{1}{(9x^2+2)^4} + C.$$

Sometimes, the derivative of g is not immediately available in the integrand, but simple reformulations while finding du can still lead to successfully finding the indefinite integral.
Problem 6.5. Find the indefinite integral $\int \frac{1}{3+\sqrt{x+8}}\,dx$.

Solution. A quick consideration tells us that integration by parts will not be helpful in this case. To try and find the integral, we consider substitution. To get a chance at simplifying as much as possible, we set $u = 3 + \sqrt{x+8}$. We get $du = \frac{1}{2}\,\frac{1}{\sqrt{x+8}}\,dx$. The factor $\frac{1}{\sqrt{x+8}}$ (or any multiple of it) is not present in the original integral, which is not good news. However, we notice that since $u = 3 + \sqrt{x+8}$, we can rewrite $\sqrt{x+8}$ as u − 3 and get, upon some equivalent transformations, 2(u − 3)du = dx. By substituting for $3 + \sqrt{x+8}$ and plugging in for dx, we get
$$\int \frac{1}{3+\sqrt{x+8}}\,dx = 2\int \frac{1}{u}\,(u-3)\,du = 2\int \left(1 - \frac{3}{u}\right)du = 2u - 6\ln(|u|) + C = 6 + 2\sqrt{x+8} - 6\ln(3+\sqrt{x+8}) + C.$$

Finally, let us derive a useful rule for integrating composite functions with
the inner function being a linear function.
Problem 6.6. Derive the rule for $\int f(ax + b)\,dx$ where a, b ∈ R, a ≠ 0 and f(x) = F′(x).
Solution. We use the substitution ax + b = u. Then we get du = a dx or, equivalently, $dx = \frac{1}{a}\,du$. Therefore it follows that
$$\int f(ax + b)\,dx = \frac{1}{a}\int f(u)\,du = \frac{1}{a}F(u) + C = \frac{1}{a}F(ax + b) + C.$$
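The rule can be checked numerically by differentiating the right-hand side with a central difference quotient. In the sketch below, the choice f = cos (so F = sin), the values a = 0.25 and b = 0.5, the step size h and the grid are assumptions made purely for illustration.

```r
# Rule from Problem 6.6 with f = cos, F = sin and illustrative values of a and b
a <- 0.25
b <- 0.5
integrand <- function(x) cos(a * x + b)
antiderivative <- function(x) sin(a * x + b) / a  # (1/a) * F(ax + b)

# Central difference quotient as a numerical derivative of the antiderivative
h <- 1e-5
x <- seq(-5, 5, by = 0.5)
numeric_derivative <- (antiderivative(x + h) - antiderivative(x - h)) / (2 * h)

# Largest deviation from the integrand on the grid
max_deviation <- max(abs(numeric_derivative - integrand(x)))
print(max_deviation)
```

A deviation close to zero indicates that the derivative of the candidate antiderivative reproduces the integrand, as the rule claims.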

6.2 Definite integral and areas

Since ancient times, there have been formulas for calculating the area of any rectangle and thus also of any polygon that, by definition, is entirely bounded by straight lines. However, the question of measuring the area of general plane surfaces, like circles, that are not bounded by straight lines (only), is more involved. The measurement of such areas is also related to integration.

Figure 6.1: Approximating the area under an irregular curve from below

Let us consider the area under an irregularly shaped curve, such as y = f (x),
between x = a and x = b. The area can be approximated by the area of
several rectangles that arise by dividing the interval [a, b] into n subintervals
and raising rectangles above these subintervals. The height of each rectangle
can e.g. be the function value in the left or right endpoint of the subinterval,
the middle of it, or the smallest or largest value of the function on this
subinterval. This idea, in the case of using the lowest value of the function
on the subinterval and therefore approximating the integral from below, is
illustrated in Figure 6.1.

The approximation generally gets better with an increasing number of subintervals, and if the number is increased such that n → ∞, the length of each subinterval becomes infinitesimal, that is, extremely small. The length of the subinterval $\Delta x_i$ becomes dx. The area under the curve can be expressed precisely by means of the definite integral.

Definition 6.2. The area under the graph of a continuous non-negative function f between x = a and x = b can be expressed as the definite integral of f(x) over the interval from a to b:
$$\int_a^b f(x)\,dx.$$

In this notation, the function f that is being integrated and the interval of integration from a to b are explicitly specified. It is read as "the integral from a to b of f(x)dx"; a and b are the lower and upper limits of integration, respectively.

Following the approximation idea described above, the definite integral can
be seen as an infinite sum or, in other words, the sum of the area of infinitely
many rectangles.
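The rectangle idea can be tried out directly. The following sketch approximates the area under f(x) = x² between 0 and 3 from below, using on each subinterval the smallest function value found on a fine grid. The helper lower_sum, its grid argument and the chosen numbers of subintervals are assumptions made for this illustration; the exact area is 9.

```r
# Lower sum: on each of the n subintervals, the rectangle height is the
# smallest value of f found on a fine grid (hypothetical helper)
lower_sum <- function(f, a, b, n, grid = 50) {
  breaks <- seq(a, b, length.out = n + 1)
  width <- (b - a) / n
  heights <- sapply(seq_len(n), function(i) {
    min(f(seq(breaks[i], breaks[i + 1], length.out = grid)))
  })
  sum(heights * width)
}

f <- function(x) x^2
n_values <- c(5, 50, 500, 5000)
sums <- sapply(n_values, function(n) lower_sum(f, 0, 3, n))
print(data.frame(n = n_values, lower_sum = sums))
```

The sums increase with n and approach the exact area from below, which is the behavior the limit n → ∞ formalizes.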
Remark. Note that the variable of integration x is a dummy variable in the sense that it can be replaced by any other variable that does not occur anywhere else in the expression. In that sense, it holds for instance that
$$\int_a^b f(x)\,dx = \int_a^b f(y)\,dy = \int_a^b f(\xi)\,d\xi.$$
However, $\int_a^b f(b)\,db$ is not a valid expression.

To evaluate the definite integral, one can use the fundamental theorem of calculus.
Theorem 6.1. Let f be a continuous function and F such that F′ = f. We have
$$\int_a^b f(x)\,dx = F(b) - F(a).$$

This theorem provides a powerful tool for calculating definite integrals. The difference F(b) − F(a) is denoted by $\big|_a^b F(x) = F(x)\big|_a^b = [F(x)]_a^b$, to indicate that b and a are to be substituted successively for x. Note that any constant disappears in the difference F(b) − F(a), such that any antiderivative of f can be used and the integration constant does not have to be considered for the definite integral.
Problem 6.7. Evaluate $\int_0^3 x^2\,dx$.
Solution. We have
$$\int_0^3 x^2\,dx = \left[\frac{x^3}{3}\right]_0^3 = \frac{3^3}{3} - \frac{0^3}{3} = 9.$$

6.2.1 Definite integral for general functions

As we have already mentioned, the definite integral (note the difference in the names: unlike the indefinite integral, the definite integral is unique) represents the accumulated area under the curve given by the graph of a non-negative function over the specified interval. However, the definite integral is also defined for functions that are not necessarily non-negative over the given interval. In general, the definite integral is in fact the signed area between the function graph and the x-axis, which can be positive or negative, depending on the sign of the function on the given interval. If the function changes its sign on the interval of integration, the definite integral (the signed area) may be negative, positive, or even zero. On the other hand, if we are interested strictly in the area between the function graph and the x-axis, this amounts to integrating the absolute value of the function.

Definition 6.3. The signed (net) area between the graph of a continuous function f and the x-axis between x = a and x = b is the definite integral of f(x) over the interval from a to b:
$$\int_a^b f(x)\,dx.$$

Note that the fundamental theorem of calculus does not require the function
to be non-negative such that it also applies in case of general functions.

Problem 6.8. Evaluate the following definite integrals:

• $\int_0^{\ln(3)} (e^x - 3)\,dx$,

• $\int_{-2\pi}^{2\pi} \sin(x)\,dx$.

Solution. We have

• $\int_0^{\ln(3)} (e^x - 3)\,dx = [e^x - 3x]_0^{\ln(3)} = 3 - 3\ln(3) - 1 = 2 - 3\ln(3) \approx -1.296$,

• $\int_{-2\pi}^{2\pi} \sin(x)\,dx = [-\cos(x)]_{-2\pi}^{2\pi} = -1 + 1 = 0$.

Problem 6.9. Calculate the area between the x-axis and the graph of the function $f(x) = e^x - 3$ between x = 0 and x = ln(3).

Solution. We note that the function f(x) is non-positive for all x ∈ [0, ln(3)]. Therefore, we have |f(x)| = −f(x) and the area we are looking for can be evaluated as
$$\int_0^{\ln(3)} |f(x)|\,dx = \int_0^{\ln(3)} -f(x)\,dx = -\int_0^{\ln(3)} f(x)\,dx = 3\ln(3) - 2 \approx 1.296.$$
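The difference between the signed area and the area itself can also be seen numerically. The sketch below uses base R's integrate() for the function from Problems 6.8 and 6.9; the variable names are assumptions introduced for this illustration.

```r
# Function from Problems 6.8 and 6.9
f <- function(x) exp(x) - 3

# Signed area: definite integral of f over [0, ln(3)]
signed_area <- integrate(f, lower = 0, upper = log(3))$value

# Area between the graph and the x-axis: integrate |f| instead
total_area <- integrate(function(x) abs(f(x)), lower = 0, upper = log(3))$value

print(c(signed = signed_area, total = total_area,
        closed_form = 3 * log(3) - 2))
```

Since f does not change its sign on [0, ln(3)], the two values differ only in sign here.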

In the following, we collect some rules for working with definite integrals.
\begin{align}
\int_a^b k f(x)\,dx &= k \int_a^b f(x)\,dx; \quad k \in \mathbb{R} \tag{6.11}\\
\int_a^b (f(x) \pm g(x))\,dx &= \int_a^b f(x)\,dx \pm \int_a^b g(x)\,dx \tag{6.12}\\
\int_b^a f(x)\,dx &= -\int_a^b f(x)\,dx \tag{6.13}\\
\int_a^a f(x)\,dx &= 0 \tag{6.14}\\
\int_a^b f(x)\,dx &= \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \tag{6.15}\\
\frac{\partial}{\partial t} \int_{a(t)}^{b(t)} f(x)\,dx &= f(b(t))\,b'(t) - f(a(t))\,a'(t) \tag{6.16}
\end{align}

Let us now comment on the above rules. (6.11) and (6.12) are direct consequences of (6.2) and (6.3).
(6.13) implies that the integral limits do not necessarily need to be ordered; it is possible for the lower limit of integration to be actually greater than the upper limit. The rule follows from the fundamental theorem of calculus:
$$\int_b^a f(x)\,dx = F(a) - F(b) = -(F(b) - F(a)) = -\int_a^b f(x)\,dx.$$
(6.14) also follows from the fundamental theorem of calculus: $\int_a^a f(x)\,dx = F(a) - F(a) = 0$. It also makes a lot of sense geometrically: The surface over a single point is just a line, which clearly has area zero.
(6.15) is very useful in cases where the function f is defined by several different expressions on subintervals of [a, b], and it can be used to easily find the area between the x-axis and the function graph for functions that change their sign on the interval of integration, as illustrated in Problem 6.10 below. Moreover, it allows us to evaluate definite integrals even for functions that are not continuous, by splitting the interval at the points of discontinuity. We will encounter such functions in the next section when presenting applications. Note that rule (6.15) is usually used for c ∈ (a, b), but in fact holds generally, also for c outside of the interval [a, b].
Finally, (6.16) deals with definite integrals with integration limits that are functions of some variable t, such as time. The rule is in fact a combination of the fundamental theorem of calculus and the chain rule for derivatives: If F is an antiderivative of f, we have
$$\frac{\partial}{\partial t} \int_{a(t)}^{b(t)} f(x)\,dx = \frac{\partial}{\partial t}\big(F(b(t)) - F(a(t))\big) = f(b(t))\,b'(t) - f(a(t))\,a'(t).$$
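Rule (6.16) can be checked numerically for a concrete case. In the sketch below, the integrand f(x) = x², the limits a(t) = t and b(t) = t², the evaluation point t0 and the step size h are all assumptions chosen for illustration; the derivative with respect to t is approximated by a central difference quotient.

```r
# Illustrative integrand and time-dependent limits with their derivatives
f <- function(x) x^2
a <- function(t) t
b <- function(t) t^2
a_prime <- function(t) 1
b_prime <- function(t) 2 * t

# G(t): definite integral of f from a(t) to b(t), evaluated numerically
G <- function(t) integrate(f, lower = a(t), upper = b(t))$value

t0 <- 1.5
h <- 1e-4
numeric_derivative <- (G(t0 + h) - G(t0 - h)) / (2 * h)

# Right-hand side of rule (6.16)
rule_value <- f(b(t0)) * b_prime(t0) - f(a(t0)) * a_prime(t0)

print(c(numeric = numeric_derivative, rule = rule_value))
```

Both numbers should agree up to the small error of the difference quotient.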

In the following problem, we illustrate the use of rule (6.15).

Problem 6.10. Calculate the area between the x-axis and the function f(x) = sin(x) between x = −π and x = π.

Solution. To find the area between the x-axis and the function f(x) = sin(x) on the given interval, we need to evaluate $\int_{-\pi}^{\pi} |\sin(x)|\,dx$. While it is not straightforward how to find the antiderivative of | sin(x)|, we notice that the function sin(x) is non-positive for x ∈ [−π, 0], such that | sin(x)| = − sin(x) on this interval, and non-negative for x ∈ [0, π]. Therefore we can use rule (6.15) to write
$$\int_{-\pi}^{\pi} |\sin(x)|\,dx = \int_{-\pi}^{0} (-\sin(x))\,dx + \int_0^{\pi} \sin(x)\,dx = [\cos(x)]_{-\pi}^{0} + [-\cos(x)]_0^{\pi} = 2 + 2 = 4.$$

We close this section by a little warning. Recall the method for finding
antiderivatives by substitution. Particularly when evaluating definite integrals,
it is important to remember to reverse the substitution at the end of the
process of finding the antiderivative, to return to the original variable of
integration, because the integration bounds are given in terms of this variable.
Alternatively, one can change the integration limits in the substitution step,
and in that case, the reversed substitution is not necessary at the end. We
showcase both methods in the following problem.
Problem 6.11. Calculate $\int_1^2 4x\sqrt{1 + x^2}\,dx$.
Solution. Method 1: We start by finding the indefinite integral $\int 4x\sqrt{1 + x^2}\,dx$. We will use the substitution $u = 1 + x^2$, which leads to du = 2x dx. We get
$$\int 4x\sqrt{1 + x^2}\,dx = 2\int \sqrt{u}\,du = \frac{4\sqrt{u^3}}{3} + C = \frac{4\sqrt{(1 + x^2)^3}}{3} + C.$$
From this it follows that
$$\int_1^2 4x\sqrt{1 + x^2}\,dx = \left[\frac{4\sqrt{(1 + x^2)^3}}{3}\right]_1^2 = \frac{4\sqrt{5^3}}{3} - \frac{4\sqrt{2^3}}{3} \approx 11.136.$$

Method 2: We again use the substitution $u = 1 + x^2$, which leads to du = 2x dx. Moreover, for x = 1 we have u = 2 and for x = 2 we have u = 5. Therefore we write
$$\int_1^2 4x\sqrt{1 + x^2}\,dx = 2\int_2^5 \sqrt{u}\,du = \left[\frac{4\sqrt{u^3}}{3}\right]_2^5 \approx 11.136.$$
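The two methods can be compared side by side in R, together with an independent numerical evaluation by integrate(). The function names F_x and F_u are assumptions introduced for this sketch.

```r
# Method 1: antiderivative in terms of x, original limits 1 and 2
F_x <- function(x) 4 * sqrt((1 + x^2)^3) / 3
method_1 <- F_x(2) - F_x(1)

# Method 2: antiderivative in terms of u = 1 + x^2, transformed limits 2 and 5
F_u <- function(u) 4 * sqrt(u^3) / 3
method_2 <- F_u(5) - F_u(2)

# Independent numerical check of the original integral
numeric_value <- integrate(function(x) 4 * x * sqrt(1 + x^2),
                           lower = 1, upper = 2)$value

print(c(method_1 = method_1, method_2 = method_2, numeric = numeric_value))
```

Note that Method 2 never returns to the variable x: once the limits are transformed, the whole calculation stays in terms of u.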

6.3 Applications

Like derivatives, integrals have a wide range of applications in various areas. In this section, we will introduce two economic applications.

6.3.1 Consumer and producer surplus

As already discussed in Chapter 3, the market equilibrium is the price and quantity at which demand and supply meet. If the demand and supply are given as functions of price, D(P) and S(P), the equilibrium price can easily be found by setting the two functions equal and solving for the price; the corresponding equilibrium quantity is the common value of the demand and supply function at the equilibrium price. Sometimes, demand and supply are described through the inverse demand and supply functions $D^{-1}(Q)$ and $S^{-1}(Q)$. The inverse demand function can be interpreted as the maximal price the consumer is willing to pay to buy a certain quantity of the product. Similarly, the inverse supply function can be interpreted as the minimal price at which a certain quantity of product can be produced. The graphs of the inverse demand and supply functions are also called the demand curve and supply curve, respectively.

The consumer surplus represents the overall benefit that consumers have in the equilibrium from paying a smaller price, the equilibrium price $P^*$, than what they are willing to pay according to the inverse demand function. At a given quantity level Q, this difference is $D^{-1}(Q) - P^*$. For consumer surplus, we are interested in all quantity levels below the equilibrium quantity $Q^*$. Usually the difference $D^{-1}(Q) - P^*$ will be positive, as fewer people are willing to buy at a higher price. The overall benefit of all consumers who are willing to buy at a higher than equilibrium price can therefore be calculated as
$$CS = \int_0^{Q^*} (D^{-1}(Q) - P^*)\,dQ. \tag{6.17}$$

On the other hand, the producer surplus represents the overall benefit that the producers gain in the equilibrium from selling at a higher price, the equilibrium price $P^*$, than the price they are willing to produce for. At a given quantity level Q, this difference is $P^* - S^{-1}(Q)$. The overall benefit of all producers willing to sell at a lower than equilibrium price is then given as
$$PS = \int_0^{Q^*} (P^* - S^{-1}(Q))\,dQ. \tag{6.18}$$

Let us now calculate the consumer and producer surplus in a market. In the
following problem, we will also graphically illustrate the two measures.
Problem 6.12. Given the inverse demand function $P(Q) = D^{-1}(Q) = 110 - Q^2$ and the inverse supply function $P(Q) = S^{-1}(Q) = \frac{29}{9}Q$, find the equilibrium price and quantity. Then, evaluate the consumer and producer surplus.
Solution. To find the equilibrium quantity, we set the two inverse functions equal to each other and solve for Q:
\begin{align}
110 - Q^2 &= \frac{29}{9}Q\\
Q^2 + \frac{29}{9}Q - 110 &= 0\\
Q &\in \left\{-\frac{110}{9},\, 9\right\}
\end{align}
Since Q is a quantity, we only consider the positive solution and get $Q^* = 9$, $P^* = 110 - 81 = 29$. Then we have
$$CS = \int_0^9 (110 - Q^2 - 29)\,dQ = \int_0^9 (81 - Q^2)\,dQ = \left[81Q - \frac{Q^3}{3}\right]_0^9 = 486$$
and
$$PS = \int_0^9 \left(29 - \frac{29}{9}Q\right)dQ = \left[29Q - \frac{29}{18}Q^2\right]_0^9 = 130.5.$$
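The same calculation can be reproduced numerically, which is convenient when the equilibrium cannot be found by hand. The sketch below uses base R's uniroot() for the equilibrium quantity and integrate() for formulas (6.17) and (6.18); the function names, the search interval [0, 10] and the tolerance are assumptions made for this illustration.

```r
# Inverse demand and supply functions from Problem 6.12
inv_demand <- function(Q) 110 - Q^2
inv_supply <- function(Q) 29 / 9 * Q

# Equilibrium quantity: root of inverse demand minus inverse supply on [0, 10]
Q_star <- uniroot(function(Q) inv_demand(Q) - inv_supply(Q),
                  interval = c(0, 10), tol = 1e-10)$root
P_star <- inv_demand(Q_star)

# Consumer and producer surplus, see (6.17) and (6.18)
CS <- integrate(function(Q) inv_demand(Q) - P_star, lower = 0, upper = Q_star)$value
PS <- integrate(function(Q) P_star - inv_supply(Q), lower = 0, upper = Q_star)$value

print(c(Q_star = Q_star, P_star = P_star, CS = CS, PS = PS))
```

The search interval has to be chosen such that the difference of the two functions changes its sign on it; here it is positive at Q = 0 and negative at Q = 10.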

Graphically, the situation is as follows:

[Figure: the demand and supply curves with the areas CS and PS marked; vertical axis P.]

6.3.2 Average function value

In some applications, one is interested in the average value a function f attains on a certain interval [a, b]. Geometrically, we can think of it as the height of a rectangle whose one side is the x-axis on this interval and whose (signed) area is the same as the (signed) area between the function graph and the x-axis. In other words, if the function value were constant throughout the interval, what would this function value have to be such that the definite integral of the constant function is the same as the definite integral of f?

Let us consider the area of the hypothetical rectangle. Since one of the sides is the x-axis on the interval [a, b], the length of this side is (b − a). The average value $\bar{f}$ of the function is the other side length, the height of the rectangle. Since we want the (signed) area of the rectangle to be the same as the (signed) area between the function graph of f and the x-axis on this interval, we can write
$$\bar{f}\,(b - a) = \int_a^b f(x)\,dx.$$
Simple rearranging of terms yields
$$\bar{f} = \frac{1}{b - a}\int_a^b f(x)\,dx. \tag{6.19}$$

In the following problem, we consider an application of finding the average function value.

Problem 6.13. The amount of products a company has on stock at a given time t ∈ [0, 10) can be described by the function
$$I(t) = \begin{cases} -t + 4, & \text{for } 0 \le t < 3,\\ -t + 8, & \text{for } 3 \le t < 5,\\ -t + 12, & \text{for } 5 \le t < 10. \end{cases}$$

What is the average inventory level in the given time interval?

Solution. To find the average inventory level $\bar{I}$, we need to calculate
$$\bar{I} = \frac{1}{10}\int_0^{10} I(t)\,dt.$$
We notice that the given inventory function is discontinuous, such that the fundamental theorem of calculus cannot be directly applied to it. However, rule (6.15) allows us to split the interval into three simpler intervals:
\begin{align}
\bar{I} &= \frac{1}{10}\left(\int_0^3 (-t + 4)\,dt + \int_3^5 (-t + 8)\,dt + \int_5^{10} (-t + 12)\,dt\right)\\
&= \frac{1}{10}\left(\left[-\frac{t^2}{2} + 4t\right]_0^3 + \left[-\frac{t^2}{2} + 8t\right]_3^5 + \left[-\frac{t^2}{2} + 12t\right]_5^{10}\right) = 3.8.
\end{align}
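The splitting of the interval carries over directly to a numerical calculation. In the sketch below, the inventory function is integrated piece by piece with integrate(), so that each call only sees a continuous function; the names inventory and pieces are assumptions introduced for this illustration.

```r
# Piecewise inventory function from Problem 6.13
inventory <- function(t) {
  ifelse(t < 3, -t + 4, ifelse(t < 5, -t + 8, -t + 12))
}

# Break points: the interval ends and the points of discontinuity
pieces <- c(0, 3, 5, 10)

# Integrate over each subinterval separately, as in rule (6.15), and add up
total <- sum(sapply(1:3, function(k) {
  integrate(inventory, lower = pieces[k], upper = pieces[k + 1])$value
}))

# Average value according to (6.19)
average <- total / (10 - 0)
print(average)
```

Passing the whole interval [0, 10] to a single integrate() call may also work, but splitting at the jumps avoids relying on how the numerical routine copes with discontinuities.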

Graphically, we can illustrate the situation as follows:

[Figure: the inventory level I(t) for t ∈ [0, 10).]

6.4 Exercises
6.1 Find the following indefinite integrals:

x4 −1+ x
R
a) (x3 + 6x2 − 2x)dx,
R
i) x 3 dx,
R
b) (3x + 5)dx, 1
R
j) 4x+15
dx,
c) (x−4 − x−5 )dx,
R

R ( x−1)2
R 1 1
k) x
dx,
d) (x 3 − 3x− 4 )dx, R
R 2 l) exp(2x + 5)dx,
e) ( x2 + x42 )dx, R
R √ 6
m) cos(0.5 + 0.25x)dx,
f) ( x + √ 3 x )dx, √
R √ 3
R
n) x5 dx,
g) 2 3xdx,
R √ R q
1
h) x x(1 + x√5 x )dx, o) x5
dx.

6.2 Find the following indefinite integrals:
a) $\int x \sin(x)\,dx$,
b) $\int x \ln(x)\,dx$,
c) $\int \ln(x)\,dx$,
d) $\int x e^{-3x}\,dx$,
e) $\int \sin(x)\cos(x)\,dx$,
f) $\int \frac{\ln(x)}{x}\,dx$,
g) $\int x^2 e^x\,dx$,
h) $\int e^x \sin(x)\,dx$,
i) $\int \sin^2(x)\,dx$,
j) $\int \cos^2(x)\,dx$.

6.3 Find the following indefinite integrals:


R R
a) 4x(x2 + 8)9 dx, e) cos(x)(sin(x) + 2)4 dx,
R R √
b) x5 (4 + x6 )7 dx, f) (2x + 5) x2 + 5x + 1dx,
R √ R x2
c) 5x2 4 x3 − 2dx, g) √1+x 3 dx,
R 2
d) x3x+8 dx,
R 5
h)6 x (4 + x6 )8 dx.

6.4 Find the function f for which the following hold: f ′ (x) = x2 + 32 , f (1) = 3.

6.5 Find the function f for which the following hold: f ′′ (x) = 12x − 6,
f (0) = 4, f (1) = 6.

6.6 Evaluate the following definite integrals:


R4 R1
a) 2
xdx d) −1 ex dx
R3 R3
b) −3 (2x− 5)dx e) −2 x2x+1 dx
R1 3 R1 3
c) −1 x dx f) 0 3x2 ex dx

6.7 Find the parameters a, b, c such that for the function $f(x) = ax^2 + bx + c$,
the following hold: $f'(1) = 8$, $f''(1) = 6$, $\int_1^2 f(x)\,dx = 14$.

6.8 Find the overall area between the graph of the function f (x) = x2 − 1
and the x-axis on the interval [−2, 2]. Hint: Try to plot the function first to
see its sign in various parts of the interval.

6.9 Find the consumer surplus in a market with the inverse demand function
given by D−1 (Q) = −3Q + 60 and equilibrium quantity Q∗ = 10. Hint: First
find the equilibrium price P ∗ .

6.10 Find the average value of the function f (x) = e4x between 0 and 2.5.

6.11 The amount of products in stock in a certain e-shop for a given time
t ∈ [0, 10] is given by

$$I(t) = \begin{cases} 3, & t \in [0, 4), \\ -t + 7, & t \in [4, 6), \\ -2t + 20, & t \in [6, 10]. \end{cases}$$

Find the average inventory in the given time interval.

6.12 The amount of products in stock in a certain e-shop for a given time
t ∈ [0, 10] is given by

$$S(t) = \begin{cases} 650\,e^{-0.25t}, & t \in [0, 5), \\ 330 - 10t, & t \in [5, 10]. \end{cases}$$

Find the average inventory in the given time interval.

6.13 At time point t ∈ [0, 50], a car travels at a speed of

$$s(t) = \begin{cases} 3t^2 + 5t, & t \in [0, 5), \\ 100, & t \in [5, 45), \\ 1000 - 20t, & t \in [45, 50]. \end{cases}$$

What is the average speed at which the car is traveling for t ∈ [0, 50]?

6.5 Further readings

Most of the content of this chapter (with the exception of average value of
a function) is discussed in [1] in Chapter 10. Parts of Section 10.4 and the
whole Section 10.7 in [1] are not covered in these lecture notes nor in the
Quantitative Methods 1 course.

For further practice, we recommend, as usual, the exercises at the end of the
individual sections, as well as the review exercises:

• exercises at the end of Section 10.1, perhaps with the exception of exercise 7;
• exercises at the end of Section 10.2, except for exercise 7;
• exercises 1-6 at the end of Section 10.3;
• exercises 6 and 7 at the end of Section 10.4;
• exercises at the end of Section 10.5;
• exercises 1-3 at the end of Section 10.6;
• exercises 1-3, 5 and 9-11 in the review exercises.

Chapter 7

Matrix algebra

Consider a company with several different outlets selling several different
products. A concise way of keeping track of stocks might be:

            Skis   Poles   Outfits
Outlet 1     120     110        90
Outlet 2      80     100       110
Outlet 3     140     175       120

In this table, one can see the amounts of the different products each outlet
has in stock, and one can easily get an overview of the overall stock of each
product or in each store. Such tables are also called matrices. They not only
give an overview of the situation, but also allow us to express complicated
systems in a simplified way. Moreover, they provide a way of solving (large)
systems of linear equations and even of deciding whether a solution of a
system exists before attempting to solve it.
We start this chapter by introducing the basic terminology for matrices.
After a section about defining and working with matrices in R, we continue
with matrix operations and a method for solving systems of linear equations,
Gauss elimination. Finally, we introduce the transpose and inverse as well as
the determinant.
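As a preview of the R tools discussed in Section 7.2 and Section 7.3.1, the stock table above can be sketched as a named matrix. The object name `stock` is an assumption made for this illustration; the numbers are those of the table.

```r
# Stock table from the introductory example, entered row by row
stock <- matrix(c(120, 110,  90,
                   80, 100, 110,
                  140, 175, 120),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("Outlet 1", "Outlet 2", "Outlet 3"),
                                c("Skis", "Poles", "Outfits")))
stock

colSums(stock)   # overall stock of each product
rowSums(stock)   # overall stock held by each outlet
```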

7.1 Matrix terminology

Definition 7.1. A matrix is a rectangular array of numbers, parameters, or
variables, each of which has a carefully ordered place within the matrix. The
numbers (parameters or variables) are referred to as elements or entries of
the matrix (and their places might also be referred to as cells). The numbers
in a horizontal line are called rows; the numbers in a vertical line are called
columns.
The number of rows r and the number of columns c together define the
dimension (also referred to as size or order) of the matrix, r × c, read as 'r
by c'.
Matrices of special sizes are:

• A matrix with r = c is called a square matrix.
• A matrix composed of a single row, i.e. a matrix of dimension 1 × c, is
  called a row vector (or c-row vector).
• A matrix composed of a single column, i.e. a matrix of dimension r × 1,
  is called a column vector (or r-column vector).

Note that in the dimension of the matrix, the number of rows comes first and
only then comes the number of columns. This is a general convention which
does not only apply to naming the matrix dimension, but also to naming
matrix entries. A general matrix A can be written in the form

$$A = (a_{ij})_{r \times c} = (a_{ij}) = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1c} \\ a_{21} & a_{22} & \dots & a_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1} & a_{r2} & \dots & a_{rc} \end{pmatrix}. \tag{7.1}$$

Note how each entry of the matrix is denoted with a double subscript, where
the first subscript refers to the row and the second subscript to the column.
$a_{ij}$ (or equivalently $A_{ij}$) therefore refers to the element in row i and
column j of matrix A.

7.1.1 Special matrices

Some special matrices are the zero matrix, the identity matrix and a diagonal
matrix.
The zero matrix of size m × n is denoted by $0_{m \times n}$ or, if the dimension is
clear from the context, simply 0; it is a matrix that contains 0 in each of
its cells.
A diagonal matrix is a square matrix whose nonzero entries can only appear on
the diagonal (but the diagonal entries might also be 0). That is, a square
matrix D is diagonal if $d_{ij} = 0$ for any i ≠ j.
The identity matrix of order n, denoted by $I_n$ or, if the order is clear from
the context, simply I, is a diagonal matrix that contains 1 in all diagonal
entries. That is, $I_{ij} = 0$ for i ≠ j and $I_{ij} = 1$ for i = j. The identity
matrix plays the role of 1 among matrices: in the matrix product (see Section
7.3 below), multiplying by the identity matrix leaves the other matrix
unchanged, just like multiplying a number by 1 leaves the number unchanged.

7.2 Matrices in R

7.2.1 Defining matrices

To define a matrix in R, one generally uses the function matrix. This
function takes as its main argument a vector of values that the matrix is to
be filled with. Other arguments control the number of rows (nrow), number
of columns (ncol) and whether the matrix is to be filled with the provided
values in a row-wise manner (byrow, by default FALSE). As usual, you can
learn more about the function on its help page (?matrix). To inspect the
behavior of matrix with different argument values, let us have a look at some
matrices:

matrix(1:6, 2, 3)

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

matrix(1:6, 2, 3, byrow = TRUE)

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

matrix(1:6, 2)

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

matrix(1:6, ncol = 3)

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

As we can observe from the first matrix, just like in the mathematical
notation, also in R the first dimension of a matrix is the number of its rows:
matrix(1:6, 2, 3) creates a matrix of 2 rows and 3 columns. Observe
the difference between the first and the second matrix: While in the first
matrix, the first column is filled with the values 1 and 2 before moving
on to the next column, the second matrix is filled row by row, as requested by
byrow = TRUE. The third and fourth matrix demonstrate that it is in fact
not necessary to provide both the number of rows and the number of columns:
providing only one of them is enough for R to determine the other dimension
from the number of values used. (Recall that when providing only
the number of columns, the argument name ncol must be
used explicitly, since the second argument of the function matrix is nrow
and thus, if no name is provided, the number will be used for nrow – compare
matrix(1:6, ncol = 3) and matrix(1:6, 3)).

In the examples above, we created matrices by providing exactly as many
values as entries in the resulting matrix. However, R can also create a matrix
if this is not the case, by means of recycling. Let us observe the outcomes
in such cases:

matrix(1:6, nrow = 4, ncol = 4)

## Warning in matrix(1:6, nrow = 4, ncol = 4): data length [6] is
## not a sub-multiple or multiple of the number of rows [4]

##      [,1] [,2] [,3] [,4]
## [1,]    1    5    3    1
## [2,]    2    6    4    2
## [3,]    3    1    5    3
## [4,]    4    2    6    4

matrix(1:8, nrow = 4, ncol = 4)

##      [,1] [,2] [,3] [,4]
## [1,]    1    5    1    5
## [2,]    2    6    2    6
## [3,]    3    7    3    7
## [4,]    4    8    4    8

matrix(1:6, ncol = 4)

## Warning in matrix(1:6, ncol = 4): data length [6] is not a sub-multiple
## or multiple of the number of columns [4]

##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    1
## [2,]    2    4    6    2

As we see, in all three cases, after all the provided values are used to fill the
cells of the matrix, recycling starts and the values are used all over again as
often as needed. When comparing the first and the second matrix, we note
that while in the first matrix, not all values were used the same number of
times, in the second case the number of matrix entries is a multiple of the
number of provided values, such that the values vector was recycled in full.
This explains why in the first case, we obtained a warning, whereas in the
second case, there was no warning.
If only one of the dimensions, the number of rows or the number of columns,
is provided, R creates the smallest matrix that has the desired number of
rows or columns and uses each of the provided values at least once. In the
case of the third matrix above, we provided 6 values to be used in 4 columns.
If only one row were used, this would only use 4 values such that 5 and 6
would remain unused. Therefore another row was used as well, but since now
8 values are necessary to fill the matrix, the values 1 and 2 were recycled.

This idea of recycling can also be used to create matrices with the same

value in each entry: Such a matrix can be defined by providing this single
value along with the desired number of rows and columns. In particular, this
allows a very simple definition of any zero matrix:

matrix(0, nrow = 3, ncol = 4)

## [,1] [,2] [,3] [,4]


## [1,] 0 0 0 0
## [2,] 0 0 0 0
## [3,] 0 0 0 0

For the definition of the identity matrix, recall that it is a special diagonal
matrix. Therefore it can be created in R with the use of the command diag,
while specifying the order of the identity matrix by a single argument. If
instead of a single number, we provide a whole vector of values, diag creates
a diagonal matrix with the specified values on the diagonal.

diag(4)

## [,1] [,2] [,3] [,4]


## [1,] 1 0 0 0
## [2,] 0 1 0 0
## [3,] 0 0 1 0
## [4,] 0 0 0 1

diag(c(2, 3, 1, 4))

## [,1] [,2] [,3] [,4]


## [1,] 2 0 0 0
## [2,] 0 3 0 0
## [3,] 0 0 1 0
## [4,] 0 0 0 4

7.2.2 Accessing matrix entries

Recall that to index vectors – to access their entries – in R, we use square
brackets. This is also the case for matrices, but since unlike vectors, matrices
have two dimensions (the number of rows and the number of columns), we
usually specify both dimensions when indexing matrices. Recall that when
indexing matrix elements, the first subscript refers to the row and the second
one refers to the column. This convention also holds when indexing matrices
in R. We start by assigning a matrix A that we will use for demonstration
purposes.

(A <- matrix(1:15, nrow = 5, byrow = TRUE))

## [,1] [,2] [,3]


## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [4,] 10 11 12
## [5,] 13 14 15

A[1, 2]

## [1] 2

A[2, 3]

## [1] 6

As we see, in line with the dimension convention, A[1, 2] returns the value
in the first row and second column of A. Note that using
a single value also works for indexing matrices:

A[5]

## [1] 13

A[8]

## [1] 8

By comparing these values with the matrix A, we can deduce after a short
consideration that when indexing by a single value only, R translates the matrix
into a vector containing the columns of A stored one after another, and accesses
the corresponding entry in this vector. The eighth entry of a matrix with
5 rows is therefore the third entry of the second column (8 = 5 + 3).
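The correspondence between a single index and a (row, column) pair can be sketched as follows. The formula k = (j − 1) · nrow(A) + i is a restatement of the column-wise storage described above; the variable names `k`, `i` and `j` are chosen here for illustration.

```r
A <- matrix(1:15, nrow = 5, byrow = TRUE)

# Position k in the column-wise vector corresponds to row i and column j via
# k = (j - 1) * nrow(A) + i
k <- 8
j <- (k - 1) %/% nrow(A) + 1   # column index
i <- (k - 1) %% nrow(A) + 1    # row index
c(row = i, column = j)

A[k] == A[i, j]      # both ways of indexing address the same entry
as.vector(A)[k]      # the same entry, taken from the stacked columns
```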

To access certain columns or rows of a matrix, one can provide the corresponding
row or column number and leave the space for the other dimension empty
– but the comma dividing the two dimensions must be used (otherwise we
are in the situation above). One might notice this way of indexing also when
looking at a matrix in R: The first row is denoted with [1,] and the first column
with [,1] (and the other rows and columns correspondingly).

A[1,]

## [1] 1 2 3

A[, 3]

## [1] 3 6 9 12 15

It might come as a surprise that when accessing the third column of A, the
output comes in the form of a row. The reason is that generally, vectors
in R are dimensionless and for usual operations, it does not matter whether
a vector is a row or a column vector. R therefore by default drops the
dimension, i.e. it does not keep the information whether the vector is a row or
a column. If for a specific application it is necessary to keep this information,
it is possible to override this default with the argument drop.

A[ , 1, drop = FALSE]

## [,1]
## [1,] 1
## [2,] 4
## [3,] 7
## [4,] 10
## [5,] 13

The concern of dropping dimensions is only present in the case of accessing a
single row or a single column. It is of course also possible to consider several
rows or columns at the same time. In that case, the entries that lie both
in the desired rows and in the desired columns will be returned.
Similarly to vectors, one can also specify which rows or columns are not to
be shown by means of negative indices.

A[2:3, c(1, 3)]

## [,1] [,2]
## [1,] 4 6
## [2,] 7 9

A[, c(1, 3)]

## [,1] [,2]
## [1,] 1 3
## [2,] 4 6
## [3,] 7 9
## [4,] 10 12
## [5,] 13 15

A[-1, ]

## [,1] [,2] [,3]


## [1,] 4 5 6
## [2,] 7 8 9
## [3,] 10 11 12
## [4,] 13 14 15

Finally, just like with vectors, also with matrices it is possible to access only
values that satisfy a certain condition. Applying a comparison to a matrix
will result in a matrix of logical values TRUE or FALSE; using the outcome of
such a comparison in square brackets will show only the values in the matrix
for which the result of the comparison is TRUE.

A > 5

##       [,1]  [,2]  [,3]
## [1,] FALSE FALSE FALSE
## [2,] FALSE FALSE  TRUE
## [3,]  TRUE  TRUE  TRUE
## [4,]  TRUE  TRUE  TRUE
## [5,]  TRUE  TRUE  TRUE

A[A > 5]

## [1] 7 10 13 8 11 14 6 9 12 15

Note that the result comes in the form of a vector (and by comparing to the
original matrix A, we may see that these values come again in a column-wise
manner) and in this case, no argument can change the fact. The reason
is simple: the number of the matrix entries that satisfy the condition in
different rows or columns will in general vary and the entries will be placed
within the matrix in an irregular pattern.
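If the positions of the selected entries are needed, one option is the base function `which` with the argument `arr.ind = TRUE`. The sketch below assumes the unnamed matrix A defined earlier; `pos` is a name introduced only for this illustration.

```r
A <- matrix(1:15, nrow = 5, byrow = TRUE)

# Row and column positions of all entries greater than 5,
# listed in the same column-wise order as A[A > 5]
pos <- which(A > 5, arr.ind = TRUE)
head(pos, 3)

# Indexing A by this two-column matrix returns the values of A[A > 5]
all(A[pos] == A[A > 5])
```

In this way, the values come as a vector, while the information about their rows and columns is kept separately in `pos`.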

7.2.3 Extending and combining matrices

Two handy functions in R that make it possible to extend and/or combine matrices
are rbind and cbind. As their names suggest, they are meant for binding,
in this case for binding matrices and vectors together. rbind combines the
rows of matrices, that is, when two or more matrices are provided, they will
be combined by binding them below each other (in the order in which they are
passed to the function). An important condition for this to work is that the
provided matrices have equal numbers of columns.

(R <- rbind(A, matrix(0:2, 2, 3)))

## [,1] [,2] [,3]


## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [4,] 10 11 12
## [5,] 13 14 15
## [6,] 0 2 1
## [7,] 1 0 2

cbind works very similarly to rbind, with the difference being that it combines
the matrix columns together. As a consequence, the condition is that the
matrices provided to the function must have the same number of rows. If a
vector is used, too, then it must consist of as many values as is the number
of rows in the other input(s).

(C <- cbind(A, matrix(1:10, 5, 2), -5:-1))

## [,1] [,2] [,3] [,4] [,5] [,6]


## [1,] 1 2 3 1 6 -5
## [2,] 4 5 6 2 7 -4
## [3,] 7 8 9 3 8 -3
## [4,] 10 11 12 4 9 -2
## [5,] 13 14 15 5 10 -1

7.2.4 Managing matrix dimensions

To obtain the dimensions of a matrix, one may use nrow for the number of
rows, ncol for the number of columns, or dim for the dimension, that is both
number of rows and number of columns (in this order).

nrow(A)

## [1] 5

ncol(A)

## [1] 3

dim(A)

## [1] 5 3

As in the introductory example of this chapter, in some situations it can
be handy to name the rows and columns of a matrix. In R, this is by
default not the case, which we can see by the fact that both rownames(A)
and colnames(A) result in NULL, which means that the vectors that store
the names of rows and columns of A are empty. However, this can easily be
changed by assigning vectors of names to them.

colnames(A) <- c("C1", "C2", "C3")


rownames(A) <- paste("R", 1:5, sep = "")

Let us make a small detour to inspect the way we assigned the row names
of A. The command paste can be used to combine several objects or values
into character vectors. In this particular case, it combines the character "R"
with each of the values 1:5 in turn. sep = "" specifies that R and the number are
to be separated by nothing (you can play around with the function inputs
to see what happens if you e.g. change the separator). As a result, we get a
character vector of length 5:

rownames(A)

## [1] "R1" "R2" "R3" "R4" "R5"

Now that we have defined the names of the rows and columns of A, they will
also be shown whenever we view the matrix, instead of the generic square
brackets:

## C1 C2 C3
## R1 1 2 3
## R2 4 5 6
## R3 7 8 9
## R4 10 11 12
## R5 13 14 15

Moreover, it is now also possible to access the rows and columns (and in fact
any entry) also by their names – but the usual indexing via row and column
numbers also remains valid.

A["R4", "C3"]

## [1] 12

A["R3",]

## C1 C2 C3
## 7 8 9

A[3, ]

## C1 C2 C3
## 7 8 9

While for a 5 × 3 matrix this may not seem too helpful, it is actually
very useful for larger matrices. A typical example would be some dataset
containing e.g. the information about several people, with each row corresponding
to one person (and being named by the person’s name or some identifier) and
each column corresponding to one type of information, like number of points
in different exams (and being named by the course name or exam date).
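Such a dataset could be sketched as follows. The person names, exam names and point values are invented purely for illustration, as is the object name `exam_points`.

```r
# Hypothetical exam points; all names and numbers are made up for illustration
exam_points <- matrix(c(18, 25, 30,
                        22, 15, 27,
                        30, 28, 24),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(c("Anna", "Ben", "Clara"),
                                      c("Exam1", "Exam2", "Exam3")))

exam_points["Ben", "Exam2"]     # a single entry, accessed by names
exam_points[, "Exam3"]          # all results of one exam (a named vector)
exam_points["Clara", ] >= 25    # exams in which Clara reached at least 25 points
```

Accessing entries by name has the advantage that the code stays readable and does not depend on the order of the rows and columns.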

7.3 Matrix operations

In this section, we shift our focus to operations with matrices. We start
with the operations in the mathematical sense and then move on to how matrix
operations are performed in R. Finally, we discuss some applications of matrix
multiplication.
Definition 7.2. Consider a scalar (i.e. real number) α ∈ ℝ and matrices A,
B of the same dimension m × n. Then the following hold:

• A and B are called equal, A = B, if all their elements are equal, that
  is, $a_{ij} = b_{ij}$ for all i ∈ {1, . . . , m}, j ∈ {1, . . . , n}.
• Matrix addition is defined element-wise, that is, S = A + B is given
  by $s_{ij} = a_{ij} + b_{ij}$ for all i ∈ {1, . . . , m}, j ∈ {1, . . . , n}.
• Scalar multiplication is defined element-wise, that is, M = αA is given
  by $m_{ij} = \alpha a_{ij}$ for all i ∈ {1, . . . , m}, j ∈ {1, . . . , n}.
Remark. Note that the assumption of equal dimensions is crucial for the
matrix addition. Adding two matrices of different dimensions (special case:
matrix plus a number, i.e. a 1 × 1 matrix) is not defined.
Combining the matrix addition and multiplication by a scalar, it is also easy
to see that matrix subtraction also works element-wise.

For the addition and scalar multiplication, the usual rules apply.
Theorem 7.1. For scalars α, β ∈ R and matrices A, B and C of the same
dimension m × n, the following hold:

• (A + B) + C = A + (B + C) (associative law),
• A + B = B + A (commutative law),
• (α + β)A = αA + βA and α(A + B) = αA + αB (distributive law).
While matrix addition or scalar multiplication work very intuitively and in
a straight-forward manner, this is not the case for matrix multiplication.
Definition 7.3. For matrices $A = (a_{ij})_{m \times n}$ and $B = (b_{ij})_{n \times p}$, the matrix
product AB is a matrix $AB = C = (c_{ij})_{m \times p}$ with

$$c_{ij} = \sum_{r=1}^{n} a_{ir} b_{rj}.$$

For a square matrix A, the k-th power of A is well defined as

$$A^k = \underbrace{A A \cdots A}_{k \text{ times}}.$$

Breaking down the definition of the matrix product, we have that cij is the
scalar product of the i-th row of A and the j-th column of B, that is, the
sum of element-wise products of the i-th row of A and the j-th column of
B.
Problem 7.1. Find the matrix product

$$\begin{pmatrix} 0 & 1 & 2 \\ 2 & 3 & 1 \\ 4 & -1 & 6 \end{pmatrix} \cdot \begin{pmatrix} 3 & 2 \\ 1 & 0 \\ -1 & 1 \end{pmatrix}.$$

Solution. We will refer to the left matrix in the product as A and to the
right matrix as B such that we are looking for the product C = AB. To fill
the entry in the first row and the first column of the product C, we consider
the first row of A and the first column of B: c11 = 0 · 3 + 1 · 1 + 2 · (−1) = −1.
For the first row and second column of C, we continue working with the first
row of A, but move on to the second column of B: c12 = 0·2+1·0+2·1 = 2.
We continue in a similar manner to get all the other entries:

c21 = 2 · 3 + 3 · 1 + 1 · (−1) = 8,     c22 = 2 · 2 + 3 · 0 + 1 · 1 = 5,
c31 = 4 · 3 − 1 · 1 + 6 · (−1) = 5,     c32 = 4 · 2 − 1 · 0 + 6 · 1 = 14.

Below, we put these together in the resulting matrix. We also use color-
coding to make it obvious what parts of which matrix were used to find
particular cells: In A, we use colored backgrounds to highlight its rows,
whereas in B, we use different entry colors for the two columns. In the
final matrix, the combination of the background color and of the color of
the resulting number tells you which row and column of the matrices of the
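product were used:

$$\begin{pmatrix} 0 & 1 & 2 \\ 2 & 3 & 1 \\ 4 & -1 & 6 \end{pmatrix} \cdot \begin{pmatrix} 3 & 2 \\ 1 & 0 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} -1 & 2 \\ 8 & 5 \\ 5 & 14 \end{pmatrix}.$$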

A way of keeping track which rows and columns are necessary for the entries
of the product matrix is the Falk’s scheme. In this scheme, one uses a kind
of a coordinate system. We start by entering the left matrix of the product
(A) in the lower left quadrant and the right matrix of the product (B) in
the upper right quadrant:

             |   3    2
             |   1    0
             |  -1    1
-------------+----------
  0   1   2  |
  2   3   1  |
  4  -1   6  |

Then the rows of A and columns of B can be seen as the names of the
rows and columns in the (so far empty) table which arose in the lower right
quadrant – this we will proceed to fill with the resulting matrix C = AB.
Each cell will be filled with the scalar product of its 'row name' and 'column
name'. We show a few steps below:

             |   3    2
             |   1    0
             |  -1    1
-------------+----------
  0   1   2  |  -1
  2   3   1  |
  4  -1   6  |

→

             |   3    2
             |   1    0
             |  -1    1
-------------+----------
  0   1   2  |  -1    2
  2   3   1  |
  4  -1   6  |

→ ··· →

             |   3    2
             |   1    0
             |  -1    1
-------------+----------
  0   1   2  |  -1    2
  2   3   1  |   8    5
  4  -1   6  |   5   14
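The cell-by-cell procedure of Definition 7.3 and of Falk's scheme can be mimicked in R. The following is a sketch only: the double loop is written out for illustration, whereas in practice one would use R's built-in operator %*% (discussed in Section 7.3.1), which is used here just for comparison.

```r
# Matrices from Problem 7.1
A <- matrix(c(0,  1, 2,
              2,  3, 1,
              4, -1, 6), nrow = 3, byrow = TRUE)
B <- matrix(c( 3, 2,
               1, 0,
              -1, 1), nrow = 3, byrow = TRUE)

# Entry c_ij is the scalar product of row i of A and column j of B
C <- matrix(0, nrow = nrow(A), ncol = ncol(B))
for (i in seq_len(nrow(A))) {
  for (j in seq_len(ncol(B))) {
    C[i, j] <- sum(A[i, ] * B[, j])
  }
}
C

all(C == A %*% B)   # compare with R's built-in matrix product
```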

Remark. Note that finding the matrix product is only possible if the left
matrix in the product has as many columns as the matrix on the right has
rows. This is in line with Definition 7.3 where the matrices are assumed
to be of size m × n and n × p. Moreover, we can notice in the solution of
Problem 7.1 that the resulting matrix has as many rows as matrix A and
as many columns as matrix B. This again agrees with Definition 7.3 where
the size of the product is said to be m × p. To remember these rules, one
can think about ’inner’ and ’outer’ dimensions of the matrices that are part
of the product. The ’inner’ dimensions must be the same for the product to
exist whereas the ’outer’ dimensions define the size of the resulting matrix:

$$\underset{m \times n}{A} \;\; \underset{n \times p}{B} \;=\; \underset{m \times p}{C}$$

Remark. Warning! In general, AB ≠ BA! This is obvious for matrices
that are not square: If we consider two matrices A and B of sizes m × n
(m ≠ n) and n × p, respectively, then AB will be of size m × p. BA, on the
other hand, will in general not even exist (try e.g. exchanging the order of
the matrices in Problem 7.1), and even if p = m and the product BA exists,
it will be of size n × n, whereas AB is of size m × m. Recall that two matrices
can only be equal if they are of the same size.
Though for n × n matrices A and B, both AB and BA exist and are of size
n × n, also in this case it is generally the case that AB ≠ BA. To see this,
try and find AB and BA for instance for

$$A = \begin{pmatrix} 5 & 3 \\ 13 & 7 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 3 & 1 \\ 1 & 1 \end{pmatrix}.$$

Nevertheless, there are a few exceptions to the rule. One of them is the
multiplication by the identity matrix (of an appropriate size). As we already
mentioned, the identity matrix plays the role of a 1 among matrices, and for
any m × n matrix A, we have $I_m A = A I_n = A$. If m = n, this means that
IA = AI = A.
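The example from the remark can be checked in R with the operator %*% (see Section 7.3.1). This is a small sketch; the name `I2` for the identity matrix of order 2 is chosen for this illustration.

```r
A <- matrix(c(5, 3, 13, 7), nrow = 2, byrow = TRUE)
B <- matrix(c(3, 1,  1, 1), nrow = 2, byrow = TRUE)

A %*% B
B %*% A
all(A %*% B == B %*% A)   # FALSE: the two products differ

# The identity matrix is one of the exceptions
I2 <- diag(2)
all(I2 %*% A == A) && all(A %*% I2 == A)
```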

While, as discussed above, matrix multiplication is not commutative, the
associative and distributive laws hold for matrix multiplication, too.

Theorem 7.2. The following hold for any matrices A, B and C of appropriate
dimensions:

• (AB)C = A(BC) (associative law),
• (A + B)C = AC + BC and A(B + C) = AB + AC (distributive law).

Problem 7.2. Expand $(A + B)^2$.

Solution. From the definition of matrix powers and the distributive law, we
have that

$$(A + B)^2 = (A + B)(A + B) = A(A + B) + B(A + B) = A^2 + AB + BA + B^2.$$

Recall that in general AB ≠ BA, such that the above cannot be further
simplified to $A^2 + 2AB + B^2$ (even though, being used to the binomial formula,
one might be very much tempted to do so)!
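A numerical illustration, using the two 2 × 2 matrices from the remark on AB ≠ BA and R's operator %*% (see Section 7.3.1), might look as follows. The names `lhs`, `expanded` and `binomial_guess` are introduced only for this sketch.

```r
A <- matrix(c(5, 3, 13, 7), nrow = 2, byrow = TRUE)
B <- matrix(c(3, 1,  1, 1), nrow = 2, byrow = TRUE)

lhs            <- (A + B) %*% (A + B)
expanded       <- A %*% A + A %*% B + B %*% A + B %*% B
binomial_guess <- A %*% A + 2 * (A %*% B) + B %*% B

all(lhs == expanded)         # the expansion from the solution holds
all(lhs == binomial_guess)   # the binomial formula fails, since AB and BA differ
```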

7.3.1 Matrix operations in R

In R, the usual operators like +, * etc. all work in an element-wise way. We
have witnessed this already for vectors, and it is no different for matrices.
That means that some of the operations which, strictly speaking, are not defined
from a mathematical point of view, can be performed in R in some way. Let
us recall the matrix A from earlier and then perform some operations on it.

## C1 C2 C3
## R1 1 2 3
## R2 4 5 6
## R3 7 8 9
## R4 10 11 12
## R5 13 14 15

A + 2

## C1 C2 C3
## R1 3 4 5
## R2 6 7 8
## R3 9 10 11
## R4 12 13 14
## R5 15 16 17

A*1.5

## C1 C2 C3
## R1 1.5 3.0 4.5
## R2 6.0 7.5 9.0
## R3 10.5 12.0 13.5
## R4 15.0 16.5 18.0
## R5 19.5 21.0 22.5

A + A

## C1 C2 C3
## R1 2 4 6
## R2 8 10 12
## R3 14 16 18
## R4 20 22 24
## R5 26 28 30

A/A

## C1 C2 C3
## R1 1 1 1
## R2 1 1 1
## R3 1 1 1
## R4 1 1 1
## R5 1 1 1

A^3

## C1 C2 C3
## R1 1 8 27
## R2 64 125 216
## R3 343 512 729
## R4 1000 1331 1728
## R5 2197 2744 3375

A*A

## C1 C2 C3
## R1 1 4 9
## R2 16 25 36
## R3 49 64 81
## R4 100 121 144
## R5 169 196 225

A+2 (an operation not defined mathematically) results in a matrix where
each entry of A has been increased by 2. This is a special case of
recycling, where in effect a matrix consisting only of 2's was added to A. As with
mathematical scalar multiplication and addition, A*1.5 gives a matrix where
each entry has been multiplied by 1.5, whereas A + A results in an element-wise
addition of two matrices (note that since the same matrix was added to
itself, the outcome of 2*A would be the same). A/A, however, does not have
a mathematical counterpart, since from a mathematical point of view matrix
division is not defined. In R, such division is possible and is, like all
the operations before it, performed element-wise. In this
case, we divide each entry A[i,j] by itself, such that A/A is a matrix consisting
of 1's only. The operations with which one should be especially careful are
raising a matrix to a power and matrix multiplication. The usual operators
^ and * work in an element-wise way and therefore do not correspond to
the mathematical notions of matrix powers and matrix multiplication. In
fact, for the 5 × 3 matrix A that we are using, $A^3$ and $A \cdot A$ are not even
defined from a mathematical point of view. But since these operators work in
an element-wise way in R, it is possible to perform these operations.

For matrix multiplication as defined in mathematics, R has another operator,
namely %*%. To demonstrate its use, let us define a matrix B with three rows
such that the matrix product of A and B exists. As an exercise, verify that
the outcome of A%*%B indeed corresponds to the matrix product of the two
matrices. Note that A*B will result in an error, since the element-wise
operator * requires both matrices to have the same dimensions.

B <- matrix(1:3, nrow = 3, ncol = 3)


A*B

## Error in A * B: non-conformable arrays

A%*%B

##    [,1] [,2] [,3]
## R1   14   14   14
## R2   32   32   32
## R3   50   50   50
## R4   68   68   68
## R5   86   86   86
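Since ^ works element-wise, a matrix power in the sense of Definition 7.3 has to be built from %*%. The function `mat_pow` below is a hypothetical helper written for this sketch (it is not a base R function), and the 2 × 2 matrix `S` is an arbitrary example.

```r
# Hypothetical helper: k-th matrix power of a square matrix M, integer k >= 1
mat_pow <- function(M, k) {
  stopifnot(nrow(M) == ncol(M), k >= 1)
  result <- M
  for (step in seq_len(k - 1)) {   # multiply by M another k - 1 times
    result <- result %*% M
  }
  result
}

S <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
S^3             # element-wise cubes of the entries
mat_pow(S, 3)   # matrix power, i.e. the product S S S
```

Comparing the two outputs shows that the element-wise power and the matrix power are different objects, even for a square matrix.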

In addition to the usual operations, R also offers useful functions that make it
easy to obtain the sums of each row and of each column of a matrix.

rowSums(A)

## R1 R2 R3 R4 R5
## 6 15 24 33 42

colSums(A)

## C1 C2 C3
## 35 40 45
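Row and column sums can also be viewed as matrix products with vectors of ones, which links these functions back to Definition 7.3. The sketch below uses an unnamed copy of A; `ones_c` and `ones_r` are names introduced for illustration.

```r
A <- matrix(1:15, nrow = 5, byrow = TRUE)

ones_c <- rep(1, ncol(A))   # vector of ones, one per column of A
ones_r <- rep(1, nrow(A))   # vector of ones, one per row of A

all(A %*% ones_c == rowSums(A))   # row sums as a matrix-vector product
all(ones_r %*% A == colSums(A))   # column sums as a vector-matrix product
```

Note that in a product with %*%, R treats a plain vector as a row or a column vector, whichever makes the dimensions fit.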

7.3.2 Applications of matrix multiplication and matrix powers

Let us close this section with two examples that are applications of storing
information in matrices in a concise way and of using matrix multiplication
or matrix powers to obtain further results.

Problem 7.3. The following diagram shows the numbers of flight connections
between the different airports in three countries:

[Diagram: airports a1 and a2 in Country A, b1 to b4 in Country B, and c1 to c3
in Country C; each edge between two airports is labelled with the number of
flight connections.]

What are the numbers of flight connections between different airports in
countries A and C?
Solution. Let us first consider the connections between airports a1 and c1
before we generalize to obtain the numbers of connections for any combination
of airports in A and C. From a1, there are two connections to b1, and from
there there is one connection to c1, which makes it 2 · 1 = 2 connections from
a1 to c1 through b1. Through b2, there are 1 · 2 = 2 connections from a1 to
c1. Since there are no connections from a1 to b3, one cannot fly from a1 to
c1 through b3 (there are 0 · 1 = 0 such connections) and similarly, one cannot
fly from a1 to c1 through b4, since there are no connections from b4 to c1
(there are 1 · 0 = 0 connections from a1 to c1 through b4). Therefore, there
are in total 2 · 1 + 1 · 2 + 0 · 1 + 1 · 0 = 4 connections from a1 to c1. We may
notice that this is actually the matrix product of a row vector
$\begin{pmatrix} 2 & 1 & 0 & 1 \end{pmatrix}$, that gives the number of connections
between a1 and the different airports in B, and a column vector
$\begin{pmatrix} 1 \\ 2 \\ 1 \\ 0 \end{pmatrix}$, that gives the number of connections between
the different airports in B and c1.
In fact, such an argument can be made for any combination of airports in
A and C. A concise overview of the flight connections between the different
airports of A and C can therefore be obtained as a matrix product of two
matrices: In matrix P , let us represent each of the airports in A by one row
and each of the airports in B by one column. In matrix Q, let us represent
each of the airports in B by one row, whereas each column will correspond
to one of the airports in C. In the final product R = P Q, each row will
represent one airport in A, whereas each column will represent one airport
in C. From the diagram, we can write

$$P = \begin{array}{c|cccc} & b1 & b2 & b3 & b4 \\ \hline a1 & 2 & 1 & 0 & 1 \\ a2 & 3 & 0 & 2 & 1 \end{array}, \qquad Q = \begin{array}{c|ccc} & c1 & c2 & c3 \\ \hline b1 & 1 & 0 & 2 \\ b2 & 2 & 0 & 0 \\ b3 & 1 & 0 & 4 \\ b4 & 0 & 1 & 0 \end{array}.$$

The multiplication then gives

$$R = PQ = \begin{array}{c|ccc} & c1 & c2 & c3 \\ \hline a1 & 4 & 1 & 4 \\ a2 & 5 & 1 & 14 \end{array}.$$
Note that in the final product R, the row-airport correspondence is the same
as in P , whereas the column-airport correspondence is the same as in Q. On
the other hand, when calculating the entries of R, each product consists of
two numbers corresponding to the same airport of B.
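
The product can be reproduced in R. The following is a minimal sketch; the variable names and the use of dimnames to label the airports are choices made here for illustration.

```r
# Connections from airports in A (rows) to airports in B (columns)
P <- matrix(c(2, 1, 0, 1,
              3, 0, 2, 1),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("a1", "a2"), c("b1", "b2", "b3", "b4")))
# Connections from airports in B (rows) to airports in C (columns)
Q <- matrix(c(1, 0, 2,
              2, 0, 0,
              1, 0, 4,
              0, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("b1", "b2", "b3", "b4"), c("c1", "c2", "c3")))
# Row labels are inherited from P, column labels from Q
R <- P %*% Q
R
# A single entry is the product of a row of P and a column of Q
a1_to_c1 <- sum(P["a1", ] * Q[, "c1"])
a1_to_c1
```

The entry in row a1 and column c1 equals the sum 2 · 1 + 1 · 2 + 0 · 1 + 1 · 0 computed by hand above.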
Problem 7.4. Initially, three firms, A, B, and C, share the market for a
certain commodity. Firm A has 20% of the market, B has 60% and C has
20%. In the course of the next year, the following changes occur:

• firm A keeps 85% of its customers, while losing 5% to firm B and 10%
to firm C;
• firm B keeps 55% of its customers, while losing 10% to firm A and 35%
to firm C;
• firm C keeps 85% of its customers, while losing 10% to firm A and 5%
to firm B.

Find the initial market share vector s and the transition matrix T , that is,
the matrix such that T s describes the market shares at the end of the next
year. Find and interpret the values T s, T (T s), T (T (T s)), etc.
Solution. Since initially, the market shares of the three firms A, B and C
are 20%, 60% and 20%, respectively, the market share vector s is of the form
\[
s = \begin{pmatrix} 0.2 \\ 0.6 \\ 0.2 \end{pmatrix}.
\]

To find out the shares of the firms at the end of the year, let us consider the
described changes.

• A keeps 85% of its own share and moreover gains 10% of B’s customers
and 10% of C’s customers. This corresponds to a final share of
0.85 · 0.2 + 0.1 · 0.6 + 0.1 · 0.2.

• B keeps 55% of its own share and moreover gains 5% of A’s customers
and 5% of C’s customers. This corresponds to a final share of
0.05 · 0.2 + 0.55 · 0.6 + 0.05 · 0.2.

• C keeps 85% of its own share and moreover gains 10% of A’s customers
and 35% of B’s customers. This corresponds to a final share of
0.1 · 0.2 + 0.35 · 0.6 + 0.85 · 0.2.

After carefully inspecting the final shares of the firms and comparing them
to the initial share vector s, we see that the transition matrix is
\[
T = \begin{pmatrix} 0.85 & 0.1 & 0.1 \\ 0.05 & 0.55 & 0.05 \\ 0.1 & 0.35 & 0.85 \end{pmatrix}.
\]

Note that the columns of the transition matrix contain the information about
how the current share of each firm will be distributed among the firms in the
next year, with each row corresponding to one (receiving) firm. In this case
$Ts$ really delivers the final shares described above and we have
\[
Ts = \begin{pmatrix} 0.25 \\ 0.35 \\ 0.40 \end{pmatrix}.
\]

Since this is the market share vector after one year, multiplying it by T
again gives the market share after two years, assuming that in the second
year, the same relative changes happen. Similarly, $T^k s$ gives the market
share after k years, assuming that the same relative changes happen every
year. To find the values, let us resort to R.

T <- matrix(c(85, 5, 10, 10, 55, 35, 10, 5, 85), nrow = 3)/100
s <- c(0.2, 0.6, 0.2)
T %*% s

## [,1]
## [1,] 0.25
## [2,] 0.35
## [3,] 0.40

T %*% T %*% s

## [,1]
## [1,] 0.2875
## [2,] 0.2250
## [3,] 0.4875

T %*% T %*% T %*% s

## [,1]
## [1,] 0.315625
## [2,] 0.162500
## [3,] 0.521875

T %*% T %*% T %*% T %*% s

## [,1]
## [1,] 0.3367187
## [2,] 0.1312500
## [3,] 0.5320312

(Note that if you copy this code into an R file, the T will be shaded in a
different color. The reason is that T is a short version of TRUE, which
we are now overwriting. This is generally not an issue if you always write TRUE
in full; however, if you are likely to use the short version T later in your
code, you should be aware of this issue and use a different name for your
transition matrix.)
Note that to obtain the higher powers of the transition matrix, we keep
repeatedly using matrix multiplication. At some point, this becomes tedious,
so it would be useful to have a function that can calculate matrix
powers directly. Unfortunately, base R does not provide such a function.
Fortunately, it is not too difficult to define our own function that will do just
that. Let us define it and then use it to inspect the market behavior over
the next 30 years. Note that $T^0 = I$, which is in line with the idea of the
identity matrix acting as 1 for matrices.
As an exercise, try to understand from the code below exactly what the matrix M
contains and how this information is used to create the plot.

# Returns the n-th power of the square matrix T (n is a non-negative integer)
T.power.n <- function(n, T) {
  # The zeroth power is the identity matrix of the same size as T
  if (n == 0) {return(diag(dim(T)[1]))}
  Z <- T
  if (n > 1) {
    for (j in 2:n) {
      Z <- Z %*% T
    }
  }
  Z
}
M <- matrix(0, nrow = 3, ncol = 31)
for (j in 1:31) {M[ , j] <- T.power.n(j - 1, T) %*% s}
plot(0:30, M[1, ], ylim = c(0, 1))
points(0:30, M[2, ], col = 2)
points(0:30, M[3, ], col = 3)
title("Black: A, Red: B, Green: C")

[Figure: plot titled "Black: A, Red: B, Green: C" showing the market shares of the three firms for the years 0 to 30; horizontal axis 0:30, vertical axis M[1, ] ranging from 0 to 1.]

In the plot, we can observe that after about 7 years, the market shares seem
to stabilize in a sort of equilibrium. Moreover, we see that B quickly loses the
majority it had at the beginning, due to the large proportion of customers
it loses to C every year. While after the first year, B still owns a proportion
of the market close to C, after the second year the market share of B is the
smallest of the three firms, with C almost having reached majority at this
point.
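
The stabilization seen in the plot can also be checked numerically. The sketch below is only an illustration: the name T_mat (chosen to avoid overwriting T) and the candidate equilibrium vector (0.4, 0.1, 0.5) are introduced here and do not appear in the problem statement.

```r
# Transition matrix and initial shares from Problem 7.4
T_mat <- matrix(c(85, 5, 10, 10, 55, 35, 10, 5, 85), nrow = 3) / 100
s <- c(0.2, 0.6, 0.2)

# Each column describes how one firm's customers are redistributed,
# so every column has to sum to 1
colSums(T_mat)

# Apply the yearly change 30 times
share <- s
for (year in 1:30) {
  share <- as.vector(T_mat %*% share)
}
share

# A vector that is left unchanged by T_mat is an equilibrium;
# (0.4, 0.1, 0.5) is a candidate that can be verified directly
equilibrium <- c(0.4, 0.1, 0.5)
change <- max(abs(T_mat %*% equilibrium - equilibrium))
change
```

If the shares after 30 years are close to the candidate vector and the candidate is (up to rounding) unchanged by another multiplication, this supports the impression from the plot that the market settles into an equilibrium.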

Remark. To inspect the market behavior over the years in Problem 7.4, we
defined our own function T.power.n to calculate matrix powers since
base R does not offer a similar function. However, there is such a function
in the package expm. This package is not part of the
basic R installation, so if you have not done so yet, you will need to install it in
order to be able to use its functions. The package can be installed using
install.packages("expm"), or alternatively, you can navigate to the tab
Packages in the lower right pane of RStudio, click on Install and look
for the package. After installing it, don’t forget to load the package using
library("expm") or library(expm) (or by ticking it in the Packages tab)
for R to be able to access its functions.
This package contains the operator %^% which is the matrix counterpart of
^, much like %*% is the matrix counterpart of *.

library("expm")
T %*% T

##        [,1]  [,2]  [,3]
## [1,] 0.7375 0.175 0.175
## [2,] 0.0750 0.325 0.075
## [3,] 0.1875 0.500 0.750

T %^% 2

##        [,1]  [,2]  [,3]
## [1,] 0.7375 0.175 0.175
## [2,] 0.0750 0.325 0.075
## [3,] 0.1875 0.500 0.750
7.4 Matrix transpose and symmetric matrices

An important matrix operation, not known from working with numbers, is
the transpose. This operation corresponds to exchanging the roles of rows
and columns in a matrix. The transpose of a matrix A is usually denoted
by A′ or by A⊺.
Definition 7.4. The transpose A′ = A⊺ of an m × n matrix A is given by
$A' = (a'_{ij})_{n \times m}$ where $a'_{ij} = a_{ji}$.

It is a mathematical convention that usually when talking about vectors, we
mean a column vector that can be transformed to a row by means of
transposition. Therefore from now on whenever we mention a vector x, this
is a column vector, whereas x′ is a row vector (unless specified otherwise).

For working with transposes, there are a few useful rules that often simplify
calculations.
Theorem 7.3. For two matrices A and B of appropriate dimensions, the
following hold:

• $(A')' = A$,
• $(A + B)' = A' + B'$,
• $(AB)' = B'A'$.

Remark. Note the change of order in the rule for the product! This is an
important part of the rule. In fact, if A and B are not the same size, it
might be the case that even though AB is well defined (and can therefore
be transposed to arrive at a new matrix), A′ B ′ might not even exist. Note
that by transposition, the dimension of an m × n matrix changes to n × m.
Use this for the dimensions of A and B to see that B ′ A′ actually does exist
whenever AB exists.
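
The change of order can be illustrated with two matrices of different sizes. This is a small sketch with arbitrarily chosen matrices (the values are not taken from the text); t is R's transpose function and try is used only to catch the expected error.

```r
A <- matrix(1:6, nrow = 2)     # a 2 x 3 matrix
B <- matrix(1:12, nrow = 3)    # a 3 x 4 matrix

# AB exists and is 2 x 4, so its transpose is 4 x 2
dim_AB <- dim(A %*% B)
dim_AB

# (AB)' coincides with B'A' ...
rule_holds <- all(t(A %*% B) == t(B) %*% t(A))
rule_holds

# ... whereas A'B' would multiply a 3 x 2 by a 4 x 3 matrix,
# which is not defined, so R reports an error
wrong_order <- try(t(A) %*% t(B), silent = TRUE)
wrong_order_fails <- inherits(wrong_order, "try-error")
wrong_order_fails
```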

In R, a matrix is transposed by a function called simply t.

t(A)

## R1 R2 R3 R4 R5
## C1 1 4 7 10 13
## C2 2 5 8 11 14
## C3 3 6 9 12 15

A special class of matrices are matrices that are not changed by transposition.
Definition 7.5. A matrix A is called symmetric if A′ = A, that is, $a_{ij} = a_{ji}$ for all i and j.
Remark. Recall that by transposition, the dimension of an m × n matrix
changes to n × m. This means that only square matrices can be symmetric.

Symmetric matrices play a special role in matrix algebra because they have
several properties that simplify more advanced operations. In this course,
we do not discuss such advanced concepts. However, in the following chapter
we will mention the very important Hessian matrices, which are also
symmetric.
Problem 7.5. For any matrix X, show that X ′ X and XX ′ are symmetric.
Solution. Recall that a matrix A is symmetric if A′ = A. Therefore, to
show that X ′ X is symmetric, we will transpose it and use the rules for
calculations with the transpose to show that its transpose is the same as
X ′ X itself. We have
(X ′ X)′ = X ′ (X ′ )′ = X ′ X.
The proof for XX ′ works analogously.
Remark. Note that both X′X and XX′ always exist and are square matrices.
If X is of dimension m × n, then X′X is of size n × n and XX′ is of size m × m.

t(A)%*%A

## C1 C2 C3
## C1 335 370 405
## C2 370 410 450
## C3 405 450 495

A%*%t(A)

## R1 R2 R3 R4 R5
## R1 14 32 50 68 86
## R2 32 77 122 167 212
## R3 50 122 194 266 338
## R4 68 167 266 365 464
## R5 86 212 338 464 590
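
Symmetry can also be tested directly with the base R function isSymmetric. The sketch below assumes that A is the 5 × 3 matrix filled row by row with the numbers 1 to 15, which is consistent with the output above (row and column names are omitted here); the random matrix X is an additional illustration.

```r
# Assumed definition of A, consistent with the printed results
A <- matrix(1:15, nrow = 5, byrow = TRUE)

G <- t(A) %*% A    # 3 x 3
H <- A %*% t(A)    # 5 x 5
dim(G)
dim(H)
isSymmetric(G)
isSymmetric(H)

# The same holds for any other matrix, e.g. a random 4 x 3 matrix
set.seed(42)
X <- matrix(rnorm(12), nrow = 4)
sym_XtX <- isSymmetric(t(X) %*% X)
sym_XXt <- isSymmetric(X %*% t(X))
c(sym_XtX, sym_XXt)
```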
7.5 Systems of equations in matrix form

A very important use of matrices is to express systems of (linear) equations


in a simplified way. More than that, matrices also provide a transparent way
of solving the systems, or even of deciding whether a particular system has
a solution before attempting to solve it.

Throughout this section, we will illustrate the use of the matrix form of
equations on the following example:

Example 7.1. A simple economy has three industries: fishing, forestry, and
boatbuilding. Producing

• 1 ton of fish requires α fishing boats,
• 1 ton of timber requires β tons of fish,
• 1 fishing boat requires γ tons of timber.

Assume further that this economy has a final demand of d1 tons of fish and
d2 tons of timber (and no final demand for boats). How many tons of fish
x1, how many tons of timber x2, and how many fishing boats x3 have to be
produced?

To describe the needed amounts of fish, timber and boats, we can write down
equations in the following way:

• There is a final demand of d1 tons for fish. However, this is not the
necessary amount x1, since, to produce the x2 tons of timber, some fish
will be necessary, too. Therefore, we have x1 = d1 + βx2.

• Timber is not only demanded by the final consumers (d2 tons), but is
also necessary to build the required x3 fishing boats. Therefore, the
amount of timber necessary is given by x2 = d2 + γx3, or, equivalently,
x2 − γx3 = d2.

• Finally, while there is no final demand for fishing boats, they will be
needed to produce the necessary amount of x1 tons of fish, such that
we get x3 = αx1.
Combining all three together, we get (after some equation reformulations)
the following system of three linear equations in three unknowns:
\[
\begin{aligned}
x_1 - \beta x_2 &= d_1 \\
x_2 - \gamma x_3 &= d_2 \\
-\alpha x_1 + x_3 &= 0.
\end{aligned}
\tag{7.2}
\]

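We may notice that the left hand side of the system actually is the outcome
of the matrix multiplication Ax where
\[
A = \begin{pmatrix} 1 & -\beta & 0 \\ 0 & 1 & -\gamma \\ -\alpha & 0 & 1 \end{pmatrix}
\quad \text{and} \quad
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\]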

(note that A contains the coefficients of x1 in the first column, of x2 in the
second column and of x3 in the third column, while each row of the matrix
corresponds to one of the equations). We therefore have the equation Ax = b
where the right hand side b is given by $b = \begin{pmatrix} d_1 & d_2 & 0 \end{pmatrix}'$. Such systems are
often written in the compact form (A|b), that is, this system can be written
in the matrix form as
\[
\left(\begin{array}{ccc|c}
1 & -\beta & 0 & d_1 \\
0 & 1 & -\gamma & d_2 \\
-\alpha & 0 & 1 & 0
\end{array}\right)
\tag{7.3}
\]

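The system (7.2) can in fact be easily solved by plugging in backwards: From
the last equation, we have x3 = αx1, which leads from the second equation
to x2 = d2 + αγx1, and that gives, upon plugging into the first equation,
(1 − αβγ)x1 = d1 + βd2. Therefore, provided that αβγ ≠ 1, we have
\[
\begin{aligned}
x_1 &= \frac{d_1 + \beta d_2}{1 - \alpha\beta\gamma} \\
x_2 &= d_2 + \alpha\gamma \, \frac{d_1 + \beta d_2}{1 - \alpha\beta\gamma} \\
x_3 &= \alpha \, \frac{d_1 + \beta d_2}{1 - \alpha\beta\gamma}.
\end{aligned}
\tag{7.4}
\]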

The method of plugging in backwards worked well in this particular case
due to the simple structure of the system, in particular due to the fact that
each equation only features two of the three variables. A more versatile
method that also works in situations when the presented method leads to
too complicated solutions is the so-called Gaussian elimination, or its slightly
augmented form, the Gauss-Jordan method. It relies on three basic facts
about equations:

• Interchanging the order of two (or several) equations does not change
the solution of the system.
• Multiplying one equation by a non-zero scalar does not change the
solution of the equation (and consequently of the system).
• Adding a multiple of one equation to another equation (while keeping
all other equations) does not change the solution of the system.
Let us illustrate these points on a simple system of two equations:
\[
\begin{aligned}
0.5x - y &= 3 \\
2x + y &= 17.
\end{aligned}
\]
Obviously, if we wrote the equations in reversed order, the system would still
have the same solution. Moreover, if 0.5x − y = 3, then clearly 2(0.5x − y) =
2 · 3, thus the system
\[
\begin{aligned}
x - 2y &= 6 \\
2x + y &= 17
\end{aligned}
\]
would be equivalent to the original system, too. Finally, clearly
(2x + y) − 2(x − 2y) must be equal to 17 − 2 · 6, which leads to
\[
\begin{aligned}
x - 2y &= 6 \\
5y &= 5
\end{aligned}
\]
being also equivalent to the original system. Note how we kept the first
equation and only changed the second one, in order not to lose any information.
The last step very conveniently reveals that y must be equal to 1, which
implies, from the first equation, x = 8.

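The matrix form allows us to simplify this procedure by only keeping track of
the coefficients next to the variables on the left hand side of the equations,
without having to write down the variable names themselves. This can save a
lot of time in case of larger systems. Let us write down the process of solving
the above system in matrix notation.
\[
\left(\begin{array}{cc|c} 0.5 & -1 & 3 \\ 2 & 1 & 17 \end{array}\right)
\sim
\left(\begin{array}{cc|c} 1 & -2 & 6 \\ 2 & 1 & 17 \end{array}\right)
\sim
\left(\begin{array}{cc|c} 1 & -2 & 6 \\ 0 & 5 & 5 \end{array}\right)
\sim
\left(\begin{array}{cc|c} 1 & -2 & 6 \\ 0 & 1 & 1 \end{array}\right)
\tag{7.5}
\]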

Gaussian elimination consists of various combinations of similar steps with
the goal of eliminating variables from equations and obtaining a kind of
staircase in which each following row has its first non-zero coefficient to the
right of that of the previous row. Then one can find the solution to the system
(if there is any) by plugging in in a backward manner, or one can continue
the elimination process up to a point where every leading coefficient (first
non-zero coefficient of an equation) is 1 and only has zeroes above it, too. In
the latter case, we talk about the Gauss-Jordan elimination. In the case of
the above system, there would be only one step left to finish the elimination
process presented in (7.5): By adding twice the second row to the first, we
eliminate the −2 in the second column of the first row to get
\[
\left(\begin{array}{cc|c} 1 & 0 & 8 \\ 0 & 1 & 1 \end{array}\right)
\]
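
The individual steps can be replayed in R by storing the system as one matrix and overwriting its rows. This is only a sketch of the hand calculation for this 2 × 2 example; the name M for the matrix (A|b) is chosen here.

```r
# The system 0.5x - y = 3, 2x + y = 17 in the form (A|b)
M <- matrix(c(0.5, -1,  3,
              2,    1, 17),
            nrow = 2, byrow = TRUE)

M[1, ] <- 2 * M[1, ]            # multiply the first row by 2
M[2, ] <- M[2, ] - 2 * M[1, ]   # subtract twice the first row from the second
M[2, ] <- M[2, ] / 5            # divide the second row by 5
M[1, ] <- M[1, ] + 2 * M[2, ]   # add twice the second row to the first
M
```

After the last step, the left part of M is the identity matrix and the last column contains the solution x = 8, y = 1.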

To sum up, in Gaussian and Gauss-Jordan elimination in the matrix form,
every step consists of one or several elementary row operations, which simplify
the system but, importantly, don’t change its solution:

• interchanging any pair of rows,
• multiplying any row by a non-zero scalar,
• adding any (non-zero) multiple of one row to a different row.

In some situations, notably when the system consists of fewer equations than
there are variables (but in other situations, too), it is possible that at the
end of the Gauss-Jordan elimination process, there will be non-zero entries
left that are not leading entries (i.e. they are not the first non-zero entry in
a row). The variables corresponding to these entries can be chosen freely, while
the variables corresponding to the leading entries will be expressed in terms of
them. The number of variables that can be chosen freely is called the number of
degrees of freedom.

Problem 7.6. Solve the system of equations given in matrix form as
\[
\left(\begin{array}{cccc|c}
1 & 0 & 1 & 2 & a \\
1 & 3 & -1 & 1 & b \\
1 & 9 & -5 & -1 & c
\end{array}\right).
\]

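Solution. With the elementary row operations, we can perform the following
elimination procedure:
\[
\left(\begin{array}{cccc|c}
1 & 0 & 1 & 2 & a \\
1 & 3 & -1 & 1 & b \\
1 & 9 & -5 & -1 & c
\end{array}\right)
\sim
\left(\begin{array}{cccc|c}
1 & 0 & 1 & 2 & a \\
0 & 3 & -2 & -1 & b - a \\
0 & 9 & -6 & -3 & c - a
\end{array}\right)
\sim
\]
\[
\left(\begin{array}{cccc|c}
1 & 0 & 1 & 2 & a \\
0 & 3 & -2 & -1 & b - a \\
0 & 0 & 0 & 0 & 2a - 3b + c
\end{array}\right)
\sim
\left(\begin{array}{cccc|c}
1 & 0 & 1 & 2 & a \\
0 & 1 & -\frac{2}{3} & -\frac{1}{3} & \frac{b - a}{3} \\
0 & 0 & 0 & 0 & 2a - 3b + c
\end{array}\right)
\]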

In the first step, we have subtracted the first row from the second as well as
the third row to eliminate the 1’s in the first column. In the second step, we
subtracted three times the second row from the third one, to eliminate the
9 in the second column of the third row. Finally, we divided the second row
by 3 to make the leading entry a 1.
From the last row, we see that the system only has a solution if 2a−3b+c = 0.
In this case, x3 and x4 are free variables, since they do not correspond to any
of the two leading entries. We get the solution

\[
\begin{aligned}
x_1 &= a - x_3 - 2x_4 \\
x_2 &= \frac{b - a}{3} + \frac{2}{3} x_3 + \frac{1}{3} x_4 \\
x_3, x_4 &\in \mathbb{R}.
\end{aligned}
\]

The solution can be checked simply by plugging it into the original system
of equations.
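
Such a check can be sketched in R for one particular choice of values. The numbers below are hypothetical and chosen only so that 2a − 3b + c = 0 holds; the right hand side c is stored under the name cc so that R's function c is not shadowed.

```r
# Coefficient matrix of the system in Problem 7.6
A <- matrix(c(1, 0,  1,  2,
              1, 3, -1,  1,
              1, 9, -5, -1),
            nrow = 3, byrow = TRUE)

# Hypothetical right hand sides with 2a - 3b + c = 0
a <- 1; b <- 2; cc <- 4
# Arbitrary values for the two free variables
x3 <- 0.7; x4 <- -1.2

# Remaining variables according to the solution formula
x <- c(a - x3 - 2 * x4,
       (b - a) / 3 + 2 / 3 * x3 + 1 / 3 * x4,
       x3,
       x4)

# Plugging in should reproduce the right hand sides a, b and c
lhs <- as.vector(A %*% x)
lhs
```

Changing x3 and x4 to any other values should leave the result unchanged, which reflects the two degrees of freedom.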

Remark. Note that there is no unique correct sequence of steps in the


elimination procedure. While some order of operations might make the whole
process easier than others, it is possible to exchange the order of elementary
row operations used or even use different operations. However, as long as
only elementary row operations are used (and performed correctly), the final
result will be correct no matter what sequence of steps was used.

Problem 7.7. Solve the system of equations from Example 7.1 for $\alpha = \frac{1}{2}$,
$\beta = \frac{1}{4}$, γ = 2, d1 = 100 and d2 = 80 with the elimination method.

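Solution. We start by rewriting the general system in matrix form with the
corresponding parameter values:
\[
\left(\begin{array}{ccc|c}
1 & -\frac{1}{4} & 0 & 100 \\
0 & 1 & -2 & 80 \\
-\frac{1}{2} & 0 & 1 & 0
\end{array}\right).
\]
The full elimination process (one version of it) is as follows (try to understand
the individual steps or, even better, perform them yourselves and compare
the final result):
\[
\left(\begin{array}{ccc|c}
1 & -\frac{1}{4} & 0 & 100 \\
0 & 1 & -2 & 80 \\
-\frac{1}{2} & 0 & 1 & 0
\end{array}\right)
\sim
\left(\begin{array}{ccc|c}
1 & -\frac{1}{4} & 0 & 100 \\
0 & 1 & -2 & 80 \\
0 & -\frac{1}{8} & 1 & 50
\end{array}\right)
\sim
\]
\[
\left(\begin{array}{ccc|c}
1 & -\frac{1}{4} & 0 & 100 \\
0 & 1 & -2 & 80 \\
0 & 0 & \frac{3}{4} & 60
\end{array}\right)
\sim
\left(\begin{array}{ccc|c}
1 & -\frac{1}{4} & 0 & 100 \\
0 & 1 & -2 & 80 \\
0 & 0 & 1 & 80
\end{array}\right)
\sim
\]
\[
\left(\begin{array}{ccc|c}
1 & -\frac{1}{4} & 0 & 100 \\
0 & 1 & 0 & 240 \\
0 & 0 & 1 & 80
\end{array}\right)
\sim
\left(\begin{array}{ccc|c}
1 & 0 & 0 & 160 \\
0 & 1 & 0 & 240 \\
0 & 0 & 1 & 80
\end{array}\right)
\]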

which gives the solution x1 = 160, x2 = 240 and x3 = 80. By plugging the
parameter values into (7.4), we see that this is also in line with the solution
we obtained for the general parameters.

In R, solving a system of equations that can be represented by a square matrix
can be achieved with the help of the function called solve. The function
requires as its arguments the matrix corresponding to the coefficients in the
system and a vector of the right hand sides. To solve Problem 7.7, we could
use the following code:

A <- matrix(c(1, -0.25, 0, 0, 1, -2, -0.5, 0, 1),
            nrow = 3, byrow = TRUE)
b <- c(100, 80, 0)
solve(A, b)

## [1] 160 240 80
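
The numerical result can be compared with the general solution (7.4). In the sketch below, the variable names (alpha_val and so on) are chosen here to avoid clashing with R's built-in functions beta and gamma.

```r
# Parameter values from Problem 7.7
alpha_val <- 1 / 2; beta_val <- 1 / 4; gamma_val <- 2
d1 <- 100; d2 <- 80

# Coefficient matrix and right hand side of system (7.2)
A <- matrix(c(1,          -beta_val,  0,
              0,           1,         -gamma_val,
              -alpha_val,  0,         1),
            nrow = 3, byrow = TRUE)
b <- c(d1, d2, 0)

# Solution obtained numerically
numeric_solution <- solve(A, b)

# Solution obtained from the closed form (7.4)
x1 <- (d1 + beta_val * d2) / (1 - alpha_val * beta_val * gamma_val)
closed_form <- c(x1, d2 + alpha_val * gamma_val * x1, alpha_val * x1)

rbind(numeric_solution, closed_form)
```

Both rows should agree. Writing the parameters as variables makes it easy to repeat the comparison for other values, as long as αβγ ≠ 1.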

7.6 The inverse and determinant

For scalars, we know that for α ≠ 0,
\[
\alpha \cdot \frac{1}{\alpha} = \frac{1}{\alpha} \cdot \alpha = 1.
\]
For square matrices, this concept is generalized by
\[
AX = XA = I \tag{7.6}
\]

since, as we mentioned on several occasions, the identity plays the role of a
1 among matrices.

Definition 7.6. The matrix X that satisfies (7.6), if it exists, is called the
inverse of A. We denote the inverse by $A^{-1}$. If A has an inverse, it is called
invertible or regular; a matrix that is not invertible is called singular.

Note that for (7.6) to make sense, both A and X must be square matrices.
If $A^{-1}$ exists, it is uniquely defined (try to prove this). Note also that this
is another exception to the rule AB ≠ BA: If the inverse exists, then
$AA^{-1} = A^{-1}A$.

In the following, we provide some useful rules that can help simplify calculations
with the inverse.

Theorem 7.4. For square invertible matrices A, B of the same size and a
scalar α ≠ 0, the following hold:

• $(A^{-1})^{-1} = A$,
• $(AB)^{-1} = B^{-1}A^{-1}$,
• $(A')^{-1} = (A^{-1})'$,
• $(\alpha A)^{-1} = \frac{1}{\alpha} A^{-1}$.
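
These rules can be checked numerically in R, where solve called with a single square matrix returns its inverse. The sketch below uses two arbitrarily chosen invertible matrices and the scalar 5 (illustrative values only) and compares both sides of each rule with all.equal, which allows for rounding errors.

```r
A <- matrix(c(2, 1,
              1, 0), nrow = 2, byrow = TRUE)
B <- matrix(c(1, 2,
              3, 4), nrow = 2, byrow = TRUE)
alpha <- 5

rules <- c(
  inverse_of_inverse   = isTRUE(all.equal(solve(solve(A)), A)),
  inverse_of_product   = isTRUE(all.equal(solve(A %*% B), solve(B) %*% solve(A))),
  inverse_of_transpose = isTRUE(all.equal(solve(t(A)), t(solve(A)))),
  inverse_of_multiple  = isTRUE(all.equal(solve(alpha * A), solve(A) / alpha))
)
rules

# With the order not reversed, the product rule fails for these matrices
not_reversed <- isTRUE(all.equal(solve(A %*% B), solve(A) %*% solve(B)))
not_reversed
```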

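Problem 7.8. Find the inverse of the matrices
\[
\begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad
\begin{pmatrix} a & b \\ c & d \end{pmatrix}.
\]
Solution. Let the inverse of $\begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix}$ be a matrix of the form $\begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix}$.
Then from the definition of the inverse, it must be the case that
\[
\begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix}
=
\begin{pmatrix} 2x_{11} + x_{21} & 2x_{12} + x_{22} \\ x_{11} & x_{12} \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
\]
We see immediately that $x_{11} = 0$ and $x_{12} = 1$. This then implies that $x_{21} = 1$
and $x_{22} = -2$.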
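Similar considerations for the second matrix yield that this matrix does not
have an inverse since we get
\[
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix}
=
\begin{pmatrix} x_{11} & x_{12} \\ 0 & 0 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\]
which clearly is not possible.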


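Finally, for a general matrix of the form
\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},
\]
we get
\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix}
=
\begin{pmatrix} ax_{11} + bx_{21} & ax_{12} + bx_{22} \\ cx_{11} + dx_{21} & cx_{12} + dx_{22} \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
\]
At a closer look, we identify two systems of linear equations in two variables
each:
\[
\begin{aligned}
ax_{11} + bx_{21} &= 1 & \qquad ax_{12} + bx_{22} &= 0 \\
cx_{11} + dx_{21} &= 0 & \qquad cx_{12} + dx_{22} &= 1
\end{aligned}
\]
The solution to both of these systems exists if ad − bc ≠ 0 and it leads to
\[
A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.
\]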
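
The formula for the 2 × 2 case translates directly into a small R function. This is an illustrative sketch: the function name inverse_2x2 is made up here, and the result is compared with R's own solve.

```r
# Inverse of a 2 x 2 matrix via the formula with ad - bc
inverse_2x2 <- function(A) {
  stopifnot(is.matrix(A), all(dim(A) == c(2, 2)))
  det_A <- A[1, 1] * A[2, 2] - A[1, 2] * A[2, 1]   # ad - bc
  if (det_A == 0) stop("ad - bc = 0: the matrix has no inverse")
  matrix(c( A[2, 2], -A[1, 2],
           -A[2, 1],  A[1, 1]),
         nrow = 2, byrow = TRUE) / det_A
}

# First matrix of Problem 7.8
A1 <- matrix(c(2, 1,
               1, 0), nrow = 2, byrow = TRUE)
inv_A1 <- inverse_2x2(A1)
inv_A1
same_as_solve <- isTRUE(all.equal(inv_A1, solve(A1)))

# Second matrix of Problem 7.8: here ad - bc = 0, so the function stops
A2 <- matrix(c(1, 0,
               0, 0), nrow = 2, byrow = TRUE)
A2_result <- try(inverse_2x2(A2), silent = TRUE)
A2_fails <- inherits(A2_result, "try-error")
```

For the first matrix, the function returns the entries x11 = 0, x12 = 1, x21 = 1 and x22 = −2 found by hand.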

Remark. As it turns out in Problem 7.8, looking for the inverse corresponds
to solving several systems of equations at the same time. For a 2 × 2 matrix,
these are $Ax = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $Ax = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, and this concept can be generalized
for larger matrices. The inverse can therefore actually be found by Gauss-Jordan
elimination with the identity matrix on the right side, i.e. the system
with the matrix form (A|I). If it is possible to achieve the identity matrix
on the left hand side, then A is invertible and whatever matrix is on the right at
the end of this process is in fact the inverse of A. This is also why in R, the
function that provides the inverse of a matrix is in fact the same that is used
to solve systems of equations: solve with a single argument, a square matrix,
outputs the inverse of the matrix (if it exists; it gives an error otherwise):

solve(A)

##           [,1]      [,2]      [,3]
## [1,] 1.3333333 0.3333333 0.6666667
## [2,] 1.3333333 1.3333333 2.6666667
## [3,] 0.6666667 0.1666667 1.3333333

A%*%solve(A)

##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1

In Problem 7.8, we found that the condition for the existence of the inverse of a
2 × 2 matrix
\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\]
is that ad − bc ≠ 0. The number ad − bc is a special number that can be
calculated for any square matrix A. It is called the determinant, denoted by
det(A) or |A|, and it helps us to determine whether a matrix is invertible or
not.

Theorem 7.5. A square matrix A is invertible if and only if det(A) ≠ 0.

In R, the determinant of a (previously defined) matrix A can be calculated
as det(A).

As we already mentioned above, for a 2 × 2 matrix
\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},
\]
det(A) = ad − bc. For a diagonal matrix, its determinant is the product of
its diagonal entries, that is, for
\[
D = \begin{pmatrix}
d_1 & 0 & \cdots & 0 \\
0 & d_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_n
\end{pmatrix},
\]
$|D| = d_1 d_2 \cdots d_n$.

We will now collect some properties of the determinant and then comment
on them, including on how they can be used to find the determinant of a
general square matrix, though calculating the determinant of a non-diagonal
matrix of size larger than 2 × 2 without R is beyond the scope of this course.
Theorem 7.6. For any n×n matrices A and B and a scalar α, the following
hold:

1. If all the elements in a row (or a column) of A are 0, then |A| = 0.


2. |A′ | = |A|.
3. If two rows (or columns) of A are proportional, that is, one row (or
column) is a multiple of another, then |A| = 0.
4. If all the elements in a single row (or column) of A are multiplied by
α, then the determinant is multiplied by α.
5. If two rows (or two columns) of A are interchanged, then the determinant
changes sign (but the absolute value remains unchanged).
6. If a multiple of one row (or one column) is added to a different row (or
column) of A, then the determinant remains unchanged.
7. |AB| = |A| · |B|.
8. $|\alpha A| = \alpha^n |A|$.

Note that points 4.-6. of Theorem 7.6 in fact describe what happens to the
determinant of the matrix if elementary row operations are performed. That
is, one can calculate the determinant of any square matrix with the help of
elimination by performing elementary row operations to arrive at a diagonal
matrix and keeping track of the changes to the determinant along the way.
Remark. Point 8. in Theorem 7.6 is in fact implied by point 4.: If A is
multiplied by α, that corresponds to every row being multiplied by α. Since
for each of the n rows, the determinant will be multiplied by α, we get
$|\alpha A| = \alpha^n |A|$.
Remark. Point 7. of Theorem 7.6 implies that the product of two matrices
is invertible if and only if both of the matrices are invertible, too, since for
the product of two numbers to be non-zero, both of them must be non-zero,
and for the product to be 0, at least one of the two numbers must be 0.
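
Several points of Theorem 7.6 can be checked numerically with det. The following sketch uses two arbitrarily chosen 3 × 3 matrices and the scalar 3 (illustrative values only); the comparisons use all.equal because det works with floating point numbers.

```r
A <- matrix(c(2, 1, 0,
              1, 3, 1,
              0, 1, 4), nrow = 3, byrow = TRUE)
B <- matrix(c(1, 2, 0,
              0, 1, 1,
              1, 0, 2), nrow = 3, byrow = TRUE)
alpha <- 3
n <- nrow(A)

A_scaled <- A
A_scaled[1, ] <- alpha * A[1, ]           # point 4: one row multiplied by alpha
A_swapped <- A[c(2, 1, 3), ]              # point 5: first two rows interchanged
A_added <- A
A_added[3, ] <- A[3, ] + 2 * A[1, ]       # point 6: multiple of row 1 added to row 3

checks <- c(
  point_2 = isTRUE(all.equal(det(t(A)), det(A))),
  point_4 = isTRUE(all.equal(det(A_scaled), alpha * det(A))),
  point_5 = isTRUE(all.equal(det(A_swapped), -det(A))),
  point_6 = isTRUE(all.equal(det(A_added), det(A))),
  point_7 = isTRUE(all.equal(det(A %*% B), det(A) * det(B))),
  point_8 = isTRUE(all.equal(det(alpha * A), alpha^n * det(A)))
)
checks
```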
7.6.1 Solving matrix equations

We will now apply our knowledge about matrix inverses and the general
rules for calculating with matrices to solve equations that feature matrices
not only as coefficients, but also in the role of the unknowns.
Example 7.2. Let us start with a simple system of linear equations that
in matrix form can be written as Ax = b where A is a square matrix of
coefficients, x is the vector of unknown variables and b is the vector of right
hand sides. If A is invertible, then we can multiply both sides of this system
by $A^{-1}$ – but be careful! In matrix multiplication, order matters, thus we
need to specify that in this case, we will be multiplying both sides of the
equation from the left to remove A from the left hand side:
\[
Ax = b \iff A^{-1}Ax = x = A^{-1}b.
\]
Note that if A is not invertible, it could mean either that the system does not
have a solution or that there are infinitely many solutions.
Problem 7.9. Find a matrix X that satisfies AB + CX = D. What
condition must be satisfied for the found X to exist?
Solution. Just like in equations with a single real valued variable, we start
by moving AB to the right hand side to only have terms with X on the left hand
side:
\[
CX = D - AB.
\]
Now, if $C^{-1}$ exists, we may multiply both sides of the equation by it from
the left, to obtain
\[
X = C^{-1}(D - AB)
\]
which clearly only exists if C is invertible.
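
A quick numerical check of the solution $X = C^{-1}(D - AB)$ can be sketched in R. The four matrices below are hypothetical examples (with C invertible) chosen only for illustration.

```r
A <- matrix(c(1, 2,
              0, 1), nrow = 2, byrow = TRUE)
B <- matrix(c(0, 1,
              1, 1), nrow = 2, byrow = TRUE)
C <- matrix(c(2, 1,
              1, 1), nrow = 2, byrow = TRUE)
D <- matrix(c(3, 0,
              1, 2), nrow = 2, byrow = TRUE)

# Multiply by the inverse of C from the left
X <- solve(C) %*% (D - A %*% B)

# Plugging X back into the equation should return D
equation_holds <- isTRUE(all.equal(A %*% B + C %*% X, D))
equation_holds

# solve(C, D - A %*% B) gives the same X without forming the inverse explicitly
same_result <- isTRUE(all.equal(solve(C, D - A %*% B), X))
same_result
```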
Problem 7.10. Find a matrix X that satisfies XA + 3X = B and formulate
a condition for this solution’s existence.
Solution. We start by noting that 3X = X(3I). This reformulation is
necessary for us to be able to use the distributive law since in its original
form XA + 3X, we cannot factor out X from the term on the left hand side
of the equation – the addition of a matrix and a number is mathematically
not defined. After this step, we may proceed as follows:
\[
\begin{aligned}
XA + X(3I) &= B \\
X(A + 3I) &= B \\
X &= B(A + 3I)^{-1}
\end{aligned}
\]
which only exists if A + 3I is a regular matrix. Note that in this case, we
multiplied by the inverse of A + 3I from the right since it was on the right
side of the product X(A + 3I).
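
The role of the order of multiplication can be seen in a small numerical example. The matrices below are hypothetical and chosen only for illustration; diag(2) creates the 2 × 2 identity matrix.

```r
A <- matrix(c(1, 2,
              0, 1), nrow = 2, byrow = TRUE)
B <- matrix(c(4, 0,
              2, 8), nrow = 2, byrow = TRUE)
I2 <- diag(2)

# Correct solution: the inverse is multiplied from the right
X <- B %*% solve(A + 3 * I2)
right_ok <- isTRUE(all.equal(X %*% A + 3 * X, B))
right_ok

# Multiplying from the left instead does not solve the equation here
X_left <- solve(A + 3 * I2) %*% B
left_ok <- isTRUE(all.equal(X_left %*% A + 3 * X_left, B))
left_ok
```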

Problem 7.11. Let C be a square matrix that satisfies $C^2 + C = I$. Show
that

• $C^{-1} = I + C$
• $C^3 = -I + 2C$
• $C^4 = 2I - 3C$

Solution. The equality $C^2 + C = I$ can be rewritten as C(C + I) = I.
Since I is a regular matrix, both C and C + I must be invertible, too, such
that we can multiply the equality by $C^{-1}$ (from the left) to obtain
\[
C + I = C^{-1}
\]
which shows the first equality.
For the second equality, we multiply $C^2 + C = I$ by C (note that in this
case, it does not matter whether we multiply from the left or from the right
since the equality only features powers of C itself and the identity). We get
\[
C^3 + C^2 = C.
\]
We move $C^2$ to the right hand side and plug in $C^2 = I - C$, which is implied
by the original equality, to get $C^3 = -I + 2C$. Finding the term for $C^4$ works analogously.
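
The three identities can be verified on a concrete matrix. The matrix C below is a hypothetical example, chosen here because it satisfies $C^2 + C = I$; it is not part of the problem.

```r
C <- matrix(c(0,  1,
              1, -1), nrow = 2, byrow = TRUE)
I2 <- diag(2)

# The assumption of the problem holds for this C
assumption <- all(C %*% C + C == I2)

# The three statements
inverse_ok <- isTRUE(all.equal(solve(C), I2 + C))
cube_ok    <- all(C %*% C %*% C == -I2 + 2 * C)
fourth_ok  <- all(C %*% C %*% C %*% C == 2 * I2 - 3 * C)

c(assumption, inverse_ok, cube_ok, fourth_ok)
```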

7.6.2 Determinant and definiteness

The determinant is closely connected to the concept of the definiteness of a
matrix. This is a property usually defined for symmetric matrices and, as we
will see in Chapters 8 and 9, very useful when studying the behaviour and
extremal values of functions.

Definition 7.7. Let A be a symmetric n × n matrix.

• A is called positive semidefinite if $x'Ax \geq 0$ for all $x \in \mathbb{R}^n$.
• A is called positive definite if $x'Ax > 0$ for all $x \in \mathbb{R}^n$, $x \neq 0$.
• A is called negative semidefinite if $x'Ax \leq 0$ for all $x \in \mathbb{R}^n$.
• A is called negative definite if $x'Ax < 0$ for all $x \in \mathbb{R}^n$, $x \neq 0$.
• A is called indefinite, if it is neither positive semidefinite nor negative
semidefinite, i.e. there are $x, y \in \mathbb{R}^n$ such that $x'Ax > 0$ and $y'Ay < 0$.

Example 7.3. Consider the matrix $A = \begin{pmatrix} 2 & -2 \\ -2 & 5 \end{pmatrix}$. For any $x \in \mathbb{R}^2$, we
have
\[
x'Ax = 2x_1^2 + 5x_2^2 - 4x_1x_2 = x_1^2 + x_2^2 + (x_1 - 2x_2)^2 \geq 0
\]
and this term is only equal to 0 if $x_1 = x_2 = 0$. A is thus positive definite.
On the other hand, the matrix $B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ is indefinite: $x'Bx = 2x_1x_2$,
which can be both positive and negative.
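
The quadratic forms can be evaluated numerically. The sketch below uses the symmetric matrix with rows (2, −2) and (−2, 5), which reproduces the quadratic form $2x_1^2 + 5x_2^2 - 4x_1x_2$; the helper name quad_form is made up here. Trying many random vectors cannot prove definiteness, it can only fail to find a counterexample. The last lines use eigenvalues as a cross-check, a standard criterion that is not covered in these notes: a symmetric matrix is positive definite exactly when all its eigenvalues are positive, and indefinite when it has both positive and negative eigenvalues.

```r
# Value of x'Mx for a matrix M and a vector x
quad_form <- function(M, x) as.numeric(t(x) %*% M %*% x)

A <- matrix(c( 2, -2,
              -2,  5), nrow = 2, byrow = TRUE)
B <- matrix(c(0, 1,
              1, 0), nrow = 2, byrow = TRUE)

qA <- quad_form(A, c(1, 1))          # 2 + 5 - 4
qB_pos <- quad_form(B, c(1, 1))      # 2 * 1 * 1
qB_neg <- quad_form(B, c(1, -1))     # 2 * 1 * (-1)
c(qA, qB_pos, qB_neg)

# x'Ax for 1000 random vectors: no non-positive value should appear
set.seed(1)
xs <- matrix(rnorm(2000), ncol = 2)
all_positive <- all(apply(xs, 1, function(x) quad_form(A, x)) > 0)
all_positive

# Cross-check via eigenvalues (assumption: criterion from outside these notes)
eig_A <- eigen(A, symmetric = TRUE)$values
eig_B <- eigen(B, symmetric = TRUE)$values
eig_A
eig_B
```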

Often, the definiteness of a 2 × 2 matrix can be determined from only its
entry in the first row and first column, and its determinant.
Theorem 7.7. Let A be a symmetric 2 × 2 matrix.

• A is positive definite if and only if $a_{11} > 0$ and det(A) > 0.
• A is negative definite if and only if $a_{11} < 0$ and det(A) > 0.
• A is positive semidefinite if and only if $a_{11} \geq 0$ and det(A) = 0.
• A is negative semidefinite if and only if $a_{11} \leq 0$ and det(A) = 0.
• A is indefinite if and only if det(A) < 0.



Warning! While Definition 7.7 holds for general n, Theorem 7.7 does not
easily generalize. On the one hand, for larger matrices, more than just
the first entry and the determinant need to be checked to determine the
definiteness of the matrix. What is more, while there are corresponding
sufficient and necessary conditions for A being positive or negative definite,
for semidefiniteness the generalized condition is only necessary and one would
have to resort to other methods to confirm semidefiniteness.
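Example 7.4. Recall the matrices from Example 7.3: For $A = \begin{pmatrix} 2 & -2 \\ -2 & 5 \end{pmatrix}$, we have $a_{11} = 2 > 0$
and $\det(A) = 2 \cdot 5 - (-2) \cdot (-2) = 6 > 0$, thus A is, as already shown, positive definite.
For B, we have $b_{11} = 0$ and det(B) = −1 < 0, which makes the matrix indefinite.
Let us finally have a look at the matrix $C = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$. In this case, we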
have c11 > 0 and det(C) = 0. That means, we cannot decide its definiteness
from
7.7 Exercises
7.1 Consider a firm that operates in 3 cities. The firm produces 3 tools that
are used in 4 different industries. A manager needs to analyze the data about
the firm for the last 4 years.
Four years ago, the average price of tools that the firm sold in cities {1, 2, 3}
was 3, 8 and 7 (in thousands of EUR), respectively, and the numbers of sales
in the same cities were 16, 10 and 14 (in thousands of units), respectively.
Each year, these prices increased by 2000 EUR and the number of sales
decreased by 1000 units in each city.
Find the matrix P that contains the average prices in each city (row) and
each year (column), the matrix S that contains the sales in each city and
each year, and then use R to find the matrix R that contains the revenue in
each city and each year.

7.2 Consider a firm that produces three types of products from two types of
raw materials. Product 1 requires 5 units of Material 1 and 2 units of Material
2, Product 2 requires 2 units of Material 1 and 4 units of Material 2, and
Product 3 requires 12 units of Material 1. The firm ships these products into
3 countries. The next shipment into Country 1 should consist of 1000, 2500
and 500 units of the products, respectively. The next shipment into Country
2 should contain 2200 units of Product 1 and 3000 units of Product 3. To
Country 3, 5000 units of Product 1, 2500 units of Product 2 and 1800 units of
Product 3 will be shipped. Find the matrix P that contains the production
requirements of the materials for the products and the matrix O that contains
the ordered amounts of the products for the three countries. Then use R to
find the matrix R that contains the amounts of raw materials required to
produce the shipments for the three countries and then the amounts of raw
materials required overall for all three countries combined.

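7.3 Consider the matrices
\[
A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 2 & 1 \end{pmatrix}, \quad
C = \begin{pmatrix} 1 & 0 \\ 2 & 1 \\ 0 & -2 \end{pmatrix}, \quad
D = \begin{pmatrix} 0 & 3 & 0 \\ 1 & -1 & 1 \\ 2 & 0 & 0 \end{pmatrix}.
\]
Compute the following expressions or explain why they are not well defined.

a) $A + B$
b) $B + C'$
c) $C + B'$
d) $AB$
e) $BA$
f) $BC$
g) $CB$
h) $A + BC$
i) $C'B' - D$
j) $ABC$
k) $BCA$
l) $BAC$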

Verify your results in R.

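7.4 Solve the following systems of linear equations using Gaussian elimination:

a)
$$
\begin{aligned}
2x_1 + 4x_3 &= -6 \\
x_1 + 2x_2 + 2x_3 &= 1 \\
-3x_1 + 2x_2 &= 1
\end{aligned}
$$

b)
$$
\begin{aligned}
3x_1 + 6x_3 - 9x_4 &= 18 \\
x_1 + 2x_2 + 5x_3 &= 9 \\
-x_1 + 3x_2 - 2x_3 + 2x_4 &= -5 \\
-2x_2 - 2x_3 + 7x_4 &= -11
\end{aligned}
$$

c)
$$
\begin{aligned}
x_1 + x_2 - 2x_3 + 4x_4 &= 0 \\
2x_1 + 2x_2 + x_3 + 3x_4 &= 5 \\
-x_1 - x_2 + x_3 - 3x_4 &= -1
\end{aligned}
$$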

7.5 Which of the following matrices must be equal to the matrix $(A + B)^2$
(for A and B of appropriate sizes)?
a) $(B + A)^2$
b) $A^2 + 2AB + B^2$
c) $A(A + B) + B(A + B)$
d) $(A + B)(B + A)$
e) $A^2 + AB + BA + B^2$
f) $B(A + B) + (B + A)A$

7.6 Show that for any square matrix A, the matrix $C = \frac{A + A'}{2}$
is symmetric.

7.7 Show that for an invertible matrix A, it holds that $|A^{-1}| = \frac{1}{|A|}$.

7.8 Assume that A, B and C are invertible matrices of dimension n × n.
For each of the following statements, decide whether they are true or false.
If a statement is false, correct it.
a) $(A - B)(A + B) = A^2 - B^2$.
b) $X = A^{-1}B$ is the solution of the matrix equation $XB^{-1}C = A^{-1}C$.
c) $AB + AC^{-1}$ can be calculated in R using A*(B+inv(C)) (assuming
that the matrices A, B and C have been assigned).
d) $(AB)^{-1} = A^{-1}B^{-1}$.
e) If B is symmetric, then $B^{-1}(BC)' = C'$.

7.9 Consider the matrix equation CDX = C + D for the unknown matrix
X.
a)Solve the equation and formulate conditions for the existence of X.
b) Find X for
$$
C = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad
D = \begin{pmatrix} 5 & 0 \\ 0 & 6 \end{pmatrix}.
$$
Check your results in R.
c)Assume that some matrices C and D such that the solution of the above
matrix equation exists have been assigned in R. Moreover, the dimension
n has been assigned to the variable n. Among the following lines of code,
choose all that deliver the correct solution X. (There might be more than
one.)
i)inv(D) + inv(D)%*%C%*%D
ii)inv(D) + inv(D)%*%inv(C)%*%D
iii)solve(D) + D%*%solve(C)%*%solve(D)
iv)solve(D) + solve(D)%*%solve(C)%*%D
v)D^(-1) + D^(-1)*C*D
vi)solve(D) + solve(D)*solve(C)*D
vii)solve(D)%*%(diag(n) + solve(C)%*%D)
viii)inv(D)*C*D + inv(D)

7.10 Solve the following matrix equations where A, B and C are square
matrices of the same size and X the unknown matrix, and provide the
conditions for the existence of the solution X.
a) $AXC^{-1} = A'BA$, where C is invertible. How does the solution
simplify if A is symmetric?
b) $B^{-1}XC = BC$, where B is invertible.

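7.11 Calculate:
a) $A^{-1} + BC^{-1}$ if
$$
A = \begin{pmatrix} 2 & 2 \\ 3 & 4 \end{pmatrix}, \quad
B = \begin{pmatrix} 4 & 3 \\ 2 & 1 \end{pmatrix}, \quad
C = \begin{pmatrix} 4 & 0 \\ 0 & 5 \end{pmatrix},
$$

b) $E'D^{-1} + F$ if
$$
D = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}, \quad
E = \begin{pmatrix} 2 & 4 & 1 \\ 1 & 3 & 1 \end{pmatrix}, \quad
F = \begin{pmatrix} 1 & 2 \\ 0 & -1 \\ 2 & -2 \end{pmatrix},
$$

c) $R' + P^{-1}Q$ if
$$
P = \begin{pmatrix} 4 & 6 \\ 1 & 2 \end{pmatrix}, \quad
Q = \begin{pmatrix} -1 & 1 & 0 \\ 2 & 1 & 3 \end{pmatrix}, \quad
R = \begin{pmatrix} 1 & 1 \\ 3 & 5 \\ 6 & -2 \end{pmatrix}.
$$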

7.12 Write R code that defines matrices P, Q and R with

$$
P = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \quad
Q = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}, \quad
R = \begin{pmatrix} 1 & 1 \\ -1 & 2 \\ 1 & 3 \end{pmatrix};
$$

and then calculates $Q'(PR)^{-1}$ and $(R'P')^{-1}Q$. Think first about how to
write it, ideally write it down on paper, and only then use R to check your
answer.

7.13 Assume that a matrix A has been defined in R. What does the following
piece of code do? In particular, what is printed in the last two lines?

m <- nrow(A)
n <- ncol(A)
v1 <- sum(A[1, ])
for(i in 2:m) v1 <- c(v1, sum(A[i, ]))
v2 <- min(A[ , 1])
for(j in 2:n) v2 <- c(v2, min(A[ , j]))
print(v1)
print(v2)

 
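7.14 Show that a diagonal matrix
$$
D = \begin{pmatrix}
d_1 & 0 & \dots & 0 \\
0 & d_2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & d_n
\end{pmatrix}
$$
is invertible if and only if $d_1, \dots, d_n$ are all non-zero and that in that case,
$$
D^{-1} = \begin{pmatrix}
\frac{1}{d_1} & 0 & \dots & 0 \\
0 & \frac{1}{d_2} & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & \frac{1}{d_n}
\end{pmatrix}.
$$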

7.15 Consider 4 × 4 matrices A, B and C with $|A| = 4$, $|B| = 5$ and
$|C| = 2$. Find the following:
a) $|B^{-1}ACB|$
b) $|3A|$
c) $|C'BC|$

7.16 Write a function inverse that takes a 2 × 2 matrix A as input and finds
its inverse, without using the in-built functions for computing the inverse or
for finding the determinant. If the inverse does not exist, it prints "A is not
invertible." You may assume that the provided input is a 2 × 2 matrix,
that is, you do not have to check the dimension of the given matrix A.
Hint: Recall that for a 2 × 2 matrix A, there is an explicit formula to find
$A^{-1}$.

7.17 Write a function normalizedA that takes an m × n (m, n ≥ 2) matrix
A as input, and outputs a new matrix that arises by normalizing the columns
of A to have sum 1, i.e. by dividing each column by its corresponding sum.
You may not use the functions colSums, rowSums. You may assume that
the provided input is a matrix of appropriate dimensions, you do not have
to check this. You may also assume that none of the column sums is equal
to 0, such that the output is well defined.

7.18 Below you will find a code with gaps that is supposed to accomplish the
following set of tasks: Create a 3 × 4 matrix from the vector 1:12, filled in
a row-wise manner. Afterwards, interchange the first and second row. Then
multiply the second row of the (new) matrix by 5. Finally, print the sum
of the elements for those columns where this sum over the column is larger
than 30.
a)Fill in the gaps (denoted by _____) in the code using the words/terms/expressions
from the word cloud below (one word per gap) such that the code
accomplishes the tasks given above. Note that some words remain
unused.

_____ <- matrix(1:12, _____ = 3, byrow = TRUE)
G[_____, ] <- G[c(2, 1), ]
G[_____, ] <- 5*G[2, ]
_____(i in 1:_____) {
  SC <- _____(G[ , _____])
  if(SC > _____) {
    print(_____("The sum in column ", i, " is ", _____, ".", sep = ""))
  }
}

Word cloud: M, G, for, if, max, min, 30, 100, TRUE, FALSE, nrow, ncol,
sum, mean, length, 1, 2, 3, 4, i, n, SC, paste, c(2, 1), c(1, 2)

b)What is the output of the code? (Provide the exact output, not a
description. Try to work it out without using R first, then check your
answer in R.)

7.19 For matrices A and B, what do the following functions do? (Do not
explain the code line by line. Instead, your task is to recognize what the
outcome of each function is in terms of the input.)

a)
MyFunction <- function(A, B) {
  if(ncol(A) == nrow(B)) {
    result <- matrix(0, nrow(A), ncol(B))
    for(i in 1:nrow(A)) {
      for(j in 1:ncol(B)) {
        result[i, j] <- sum(A[i, ]*B[ , j])
      }
    }
    return(result)
  } else {
    print('Not possible.')
  }
}

b)
MyFunction3 <- function(A) {
  if(ncol(A) != nrow(A)) {
    print('Not defined.')
  } else {
    result <- matrix(0, nrow(A), ncol(A))
    for(q in 1:ncol(A)) {
      for(p in 1:nrow(A)) {
        result[p, q] <- sum(A[, q]*A[p, ])
      }
    }
    result
  }
}

7.8 Further readings

Chapter 12, in particular Sections 12.1-12.8, of [1] gives a detailed introduction
to matrices, vectors, operations with matrices and Gaussian elimination.
The end-of-section exercises in these sections (with the exception of exercises 7
and 8b in Section 12.7) are great to verify your understanding of the topic,
and so are exercises 1-8 in the Review exercises of Chapter 12. Chapter
13 focuses on determinants and inverses. From this chapter, we suggest as
further reading and for more practice

• Section 13.1 about determinants of 2 × 2 matrices, including exercises
1 and 4-7;

• Section 13.4 about the basic rules for determinants, including exercises
1, 2 and 4;

• Section 13.6 about the inverse, including all exercises except 5c;

• Review exercises 1, 3, 7 and 9.

For interested students who want to understand all the details behind the
contents of this chapter, we also recommend a set of videos by 3Blue1Brown.com
(though some parts are well beyond the scope of this course).
Chapter 8

Functions of several variables

In Chapter 3, we introduced functions as mappings $f : A \to B$ between two
sets. While in Chapters 3 and 4, we proceeded to consider A and B as
subsets of $\mathbb{R}$, the definition allows for more general sets. In this chapter, we
will continue to consider B a subset of $\mathbb{R}$, but A will be a subset of $\mathbb{R} \times \mathbb{R}$,
or more generally of $\mathbb{R} \times \ldots \times \mathbb{R}$, which means that the function f will take
several real-valued arguments. You might also often see the
domain of a function being $\mathbb{R}^n$ for some $n \in \mathbb{N}$. This corresponds to the
argument of the function being an n-tuple (pair, triple, etc.) of values. While
formally these are different notions (with $A = \mathbb{R} \times \mathbb{R}$, f takes two real-valued
arguments, whereas with $A = \mathbb{R}^2$, f takes only one argument, which however
is a pair of two values, i.e. a vector), they result in the same concept, which is
why we use both notations interchangeably.

Example 8.1. Consider the function $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, $(x, y) \mapsto 2x + x^2 y^3$.
This is a function of two variables, and we can find its value for different
pairs of x and y by simply plugging in for these variables:

$$
\begin{aligned}
f(0, 1) &= 2 \cdot 0 + 0^2 \cdot 1^3 = 0, \\
f(-1, 0) &= 2 \cdot (-1) + (-1)^2 \cdot 0^3 = -2, \\
f(a + 1, b) &= 2(a + 1) + (a + 1)^2 b^3.
\end{aligned}
$$

Equivalently, we can write $f : \mathbb{R}^2 \to \mathbb{R}$, $x \mapsto 2x_1 + x_1^2 x_2^3$.
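
The two ways of writing the function in Example 8.1 translate directly into R. The following is a minimal sketch; the names f and f_vec are chosen here for illustration only.

```r
# Function from Example 8.1, written with two separate arguments x and y.
f <- function(x, y) 2*x + x^2 * y^3

f(0, 1)     # value at (0, 1)
f(-1, 0)    # value at (-1, 0)

# The same function taking a single vector argument x = (x1, x2).
f_vec <- function(x) 2*x[1] + x[1]^2 * x[2]^3

f_vec(c(-1, 0))   # same value as f(-1, 0)
```

Both versions return the same values; which one is more convenient depends on whether the surrounding code passes the variables separately or as one vector.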

Example 8.2. Let the milk consumption f(p, m) be a function of the relative
price of milk p and income per family m:

$$
f : \mathbb{R}^+ \times \mathbb{R}^+ \to \mathbb{R}, \quad f(p, m) = p^{-1.5} m^{2.08}.
$$

This type of function (a product of different powers of the individual variables)
is known as a Cobb-Douglas function; it is often used to model the production
of or demand for a certain product.

In Chapter 3, we introduced a way of plotting functions of one variable in


R. In the following section, we introduce a way of plotting functions of two
variables.

8.1 Plotting functions of two variables in R

As with many tasks in R, there are several ways to plot functions of two
variables. Here, we introduce the package emdbook and
its function curve3d, which can create different types of plots, both 3-
and 2-dimensional. emdbook is not part of the basic R installation, so install
it first if you have not done so yet, and don't forget to load it.

To introduce the plotting function, we will use the function from Example
8.2. Upon defining it, we use curve3d to create a first plot of the function.
As the name of the function suggests, it is designed specifically to create 3D
plots, and thus for functions of 2 variables. For such functions, the graph (the
collection of all combinations of function arguments and the corresponding
function value) is a 3-dimensional object – 2 dimensions correspond to the
function arguments, the third one to its value.

The function curve3d requires the function f to be plotted as its first argument.
We can further specify the intervals from which the values of the arguments
come. To this end, we provide a vector of smallest values in the argument
from and a vector of greatest values in the argument to (the default values are
c(0, 0) for from and c(1, 1) for to). If we wish to name the variables
differently than the generic x and y, we can do so with the help of the
argument varnames. Alternatively, one can change the axis labels with the
help of the arguments xlab, ylab and zlab. Finally, sys3d controls which 3D
plotting system is used. The value "wireframe" creates an image with
the view from the point from (also note the wire look of the function surface).
There are several other types of plots, and we will introduce some of the
other 3D plotting systems one after the other.

f <- function(p, m) p^(-1.5)*m^2.08
library(emdbook)
curve3d(f, from = c(.5, .5), to = c(1, 1),
        varnames = c("p", "m"), sys3d = "wireframe")

[Figure: wireframe plot of f(p, m), with axes p and m.]

As already mentioned above, the point from which the function is viewed
when using sys3d = "wireframe" is from. However, the plot gives us no information
about what this point is, what intervals are used for the two variables, or what
values the function takes. While we might get an idea about the general form
of the function, the information about the scale is important and usually one
would like to include at least a part of it in a figure. Unfortunately, it is not
possible to add scales when using sys3d = "wireframe". However, the good
news is that the default setting sys3d = "persp" allows for adding axis ticks.
This can be done with the help of another argument, ticktype. The default
value of this argument is "simple" and it results in arrows indicating the
direction of increase, as used with sys3d = "wireframe". With ticktype =
"detailed", we get axis ticks with variable values.

curve3d(f, from = c(.5, .5), to = c(1, 1), varnames = c("p", "m"),
        sys3d = "persp", ticktype = "detailed")

[Figure: perspective plot of f(p, m) with detailed axis ticks; p and m each range from 0.5 to 1.0.]

While 3D plots are great for getting a general idea about the behaviour of the
function, they are usually not too easy to read. For this, it is more common
to use contour plots. With curve3d, these can be created by setting sys3d =
"contour". In a contour plot, the function is visualized as the 3-dimensional
space being cut by various horizontal planes parallel to the xy-plane. The
lines in a contour plot are the so called level curves. Each curve corresponds
to a certain value – level – k of the function, and collects all points (x, y)
(pairs of variable values) for which f (x, y) = k. You have in fact possibly
encountered level curves if you ever used a hiking map: They allow hikers to
get a feeling for the steepness of the trail; the closer the level curves are to
each other, the steeper the trail. Note that while we only see level curves for
a few possible function values, the function takes on other values in between.
However, we would not aim to plot all the level curves as that would result
in a black mess that would not allow us to read any information from the
graph.

curve3d(f, from = c(.5, .5), to = c(1, 1),
        varnames = c("p", "m"), sys3d = "contour")

[Figure: contour plot of f(p, m) with labelled level curves; p on the horizontal axis and m on the vertical axis, each ranging from 0.5 to 1.0.]

In the contour plot above, one can nicely see that (in the plotted area), f
is decreasing in p and increasing in m: Moving along the axis corresponding
to p from left to right, the function value decreases, while with increasing
values of m (moving along the axis from bottom to top), the function value
increases. This may help us determine the sign of partial derivatives (which
will be introduced below) at a certain point.
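
The monotonicity read off the contour plot can also be checked numerically on a grid, without any plotting package. The sketch below uses only base R; the grid spacing of 0.1 and the variable names are assumptions made for illustration.

```r
# Milk consumption function from Example 8.2.
f <- function(p, m) p^(-1.5) * m^2.08

# Grid over the plotted area [0.5, 1] x [0.5, 1].
p_grid <- seq(0.5, 1, by = 0.1)
m_grid <- seq(0.5, 1, by = 0.1)

# z[i, j] = f(p_grid[i], m_grid[j]): rows correspond to p, columns to m.
z <- outer(p_grid, m_grid, f)

# diff() on a matrix takes differences between consecutive rows,
# so diff(z) compares neighbouring p values for each fixed m.
decreasing_in_p <- all(diff(z) < 0)

# Transposing first compares neighbouring m values for each fixed p.
increasing_in_m <- all(diff(t(z)) > 0)

decreasing_in_p
increasing_in_m
```

A grid check like this only inspects finitely many points, so it supports the visual impression but does not replace an argument based on the signs of the derivatives.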

Finally, there is the sys3d = "image" option, which creates a so-called heat
map. In a heat map, the values of the function are indicated by colors.
Low values are indicated by light yellow and as the function values increase,
the color changes to shades of red, dark red indicating the highest function
values in the plotted area. The color palette is a relative one; the colors do
not belong to particular values over all different functions, they depend on
the overall range of values the function achieves in the considered area.

curve3d(f, from = c(.5, .5), to = c(1, 1),
        varnames = c("p", "m"), sys3d = "image")

[Figure: heat map of f(p, m); p on the horizontal axis and m on the vertical axis, each ranging from 0.5 to 1.0.]

8.2 Partial derivatives

As with functions of a single variable, in the case of several variables
we are often interested in how functions react to changes in the underlying
variables. To this end, we study the partial derivatives of the function
with respect to its individual variables. We present the definition of partial
derivatives for two variables only, but the notion extends to n variables
(n > 2).

Definition 8.1. Let $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be a function of two variables, x and y.
The partial derivatives of f are defined as

$$
f_x = f'_x = f'_1 = \frac{\partial f}{\partial x} := \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h}
$$

and

$$
f_y = f'_y = f'_2 = \frac{\partial f}{\partial y} := \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h}.
$$

The vector of the partial derivatives

$$
\nabla f(x, y) = \left(f'_x(x, y), f'_y(x, y)\right)'
$$

is the gradient of the function.
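
To see the limit in Definition 8.1 at work, one can evaluate the difference quotients for shrinking values of h. The sketch below does this for the function from Example 8.1 at the point (1, 2); the point and the sequence of step sizes are arbitrary choices made for illustration.

```r
# Function from Example 8.1.
f <- function(x, y) 2*x + x^2 * y^3

# Point at which the partial derivatives are approximated (arbitrary choice).
x0 <- 1
y0 <- 2

# Step sizes h = 0.1, 0.01, ..., 1e-6.
h <- 10^(-(1:6))

# Difference quotients from Definition 8.1, one value per step size.
dq_x <- (f(x0 + h, y0) - f(x0, y0)) / h
dq_y <- (f(x0, y0 + h) - f(x0, y0)) / h

# Exact partial derivatives found by hand: f_x = 2 + 2*x*y^3, f_y = 3*x^2*y^2.
grad_exact <- c(2 + 2*x0*y0^3, 3*x0^2*y0^2)

cbind(h, dq_x, dq_y)
grad_exact
```

As h decreases, the two columns of difference quotients approach the entries of the gradient computed by hand.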


Remark. In the above definition, we present four different (but equivalent)
types of notation for partial derivatives. We will use them interchangeably
throughout the lecture notes.
Remark. While in this chapter we consider the codomain of the function
B to be a subset of the real numbers, there are also more general functions,
e.g. with $B \subseteq \mathbb{R}^n$. Consider for instance a firm that produces two products
and for the production of both of them, a raw material and electric energy
are required. One may describe the overall amount of the raw material
required when producing x units of the first product and y units of the second
one by a function r(x, y), and the required electric energy by a function
e(x, y). Very often, one would be interested in both values at the same
time, so instead of considering two separate functions, a vector-valued function
$$
f(x, y) = \begin{pmatrix} f_1(x, y) \\ f_2(x, y) \end{pmatrix} = \begin{pmatrix} r(x, y) \\ e(x, y) \end{pmatrix}
$$
may be considered. In this case, we would
have $f : \mathbb{R}_0^+ \times \mathbb{R}_0^+ \to \mathbb{R}^2$. Now, both functions r and e have their partial
derivatives with respect to x and y. These would be typically collected in a
Jacobi matrix or Jacobian. For a general function $f : \mathbb{R}^n \to \mathbb{R}^m$, the Jacobi
matrix contains the transposed gradients of the individual functions in its
rows:
$$
J(x) = \begin{pmatrix}
\frac{\partial f_1}{\partial x_1}(x) & \dots & \frac{\partial f_1}{\partial x_n}(x) \\
\vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1}(x) & \dots & \frac{\partial f_m}{\partial x_n}(x)
\end{pmatrix}
$$
In the special case of a single function, the Jacobian is simply the transpose
of its gradient.

Definition 8.1 suggests that to find the partial derivative of f with respect
to one of its variables, all other variables are considered constants. If the
function contains e.g. a product of x and y, y plays the role of a multiplicative
constant when looking for fx′ .

Similarly to single variable functions, the partial derivative of f with respect
to x gives the approximate change of f if x increases by 1 unit (and y is
kept constant), and the partial derivative with respect to y describes the
approximate change in the function value in reaction to a 1 unit increase in
y. The following interpretations of the derivative signs are therefore quite
intuitive:

• If $f_x(x, y) \geq (>)\, 0$ for all $(x, y) \in B$ for some convex set $B \subseteq \mathbb{R} \times \mathbb{R}$,
f is increasing (strictly increasing) in x on the set B.

• If $f_x(x, y) \leq (<)\, 0$ for all $(x, y) \in B$ for some convex set $B \subseteq \mathbb{R} \times \mathbb{R}$,
f is decreasing (strictly decreasing) in x on the set B.

• If $f_x(x_0, y_0) > 0$ for a point $(x_0, y_0)$ in the domain of f, then f is strictly
increasing in x at the point $(x_0, y_0)$.

• If $f_x(x_0, y_0) < 0$ for a point $(x_0, y_0)$ in the domain of f, then f is strictly
decreasing in x at the point $(x_0, y_0)$.

Similar interpretations hold also for y (and any other variable of f, if it has
more than two).

Remark. The first two points require a convex set B. A set $B \subseteq \mathbb{R}^n$ is
convex if any two points $x, y \in B$ can be connected by a
straight line without leaving the set, that is, for any $\lambda \in [0, 1]$, $\lambda x + (1 - \lambda) y \in B$.
Note that the corresponding results for single variable functions in Chapter 5
are also stated on convex sets, since the intervals used there
are convex sets.

Remark. Note that in the third and fourth point, we mention f being
increasing at a point. f is considered increasing in x at a point (x0 , y0 )
if there is an open interval I, x0 ∈ I such that f is increasing on I × {y0 }.

Problem 8.1. Find the partial derivatives of the function from Example
8.2, $f(p, m) = p^{-1.5} m^{2.08}$. Interpret them.

Solution. To find $f_p$, note that f can be seen as $p^{-1.5} \cdot C$ where $C = m^{2.08}$
acts, from the point of view of p, as a constant independent of the value of
p. Thus we have

$$
f_p = -1.5 p^{-2.5} C = -1.5 p^{-2.5} m^{2.08}.
$$

Similarly, to obtain $f_m$, considering $p^{-1.5}$ a multiplicative constant, we get

$$
f_m = 2.08 p^{-1.5} m^{1.08}.
$$

Since p and m are positive, we have $f_p < 0$ and $f_m > 0$ for any values of
p and m from the function domain. That means that f is decreasing in p
and increasing in m over the function's domain. This is in line with what we
observed in the contour plot.
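
The partial derivatives found by hand can be cross-checked with the base R function D(), which differentiates simple expressions symbolically. This is only a sketch: the evaluation point (p, m) = (0.8, 0.9) and the object names are assumptions chosen for illustration.

```r
# The function from Example 8.2 as an unevaluated expression.
f_expr <- quote(p^(-1.5) * m^2.08)

# Symbolic partial derivatives with respect to p and m.
fp_expr <- D(f_expr, "p")
fm_expr <- D(f_expr, "m")

# Evaluate them at an (arbitrarily chosen) point in the domain.
point <- list(p = 0.8, m = 0.9)
fp_val <- eval(fp_expr, point)
fm_val <- eval(fm_expr, point)

# Partial derivatives from the solution of Problem 8.1 at the same point.
fp_hand <- -1.5 * point$p^(-2.5) * point$m^2.08
fm_hand <- 2.08 * point$p^(-1.5) * point$m^1.08

c(fp_val, fp_hand)
c(fm_val, fm_hand)
```

Printing fp_expr and fm_expr shows the derivative expressions R has built; they may be arranged differently from the hand-derived formulas while being numerically equal.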
Example 8.3. Let D(p, q) and E(p, q) be the demands for two commodities
when the prices per unit are p and q, respectively. Suppose the commodities
are substitutes in consumption, such as butter and margarine. Then the
normal signs of the first order derivatives are as follows:

• $D_p(p, q) \leq 0$ and $E_p(p, q) \geq 0$: If the price of butter increases, the
consumers tend to buy less butter and even substitute it with margarine,
such that the demand for margarine increases.

• $D_q(p, q) \geq 0$ and $E_q(p, q) \leq 0$: If the price of margarine increases, the
consumers tend to buy less margarine and possibly substitute it
with butter, which causes the demand for margarine to decrease and
the demand for butter to increase.

Recall that the derivative gives the approximate absolute change of the
function value in reaction to a unit change in a variable, whereas elasticity
quantifies the approximate relative (percentage) change in reaction to a
relative (one percent) change in the variable. Knowing about partial derivatives,
the definition of elasticity for functions of several variables is a
straightforward generalization of the elasticity definition for single variable functions.

Definition 8.2. Let $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be a function of two variables $x_1, x_2$.
Then the elasticity of f with respect to the variable $x_i$ is
$$
\mathrm{El}_{x_i} f(x_1, x_2) = \frac{x_i}{f(x_1, x_2)} f_{x_i}(x_1, x_2).
$$
Remark. If $D_1(p_1, p_2)$ and $D_2(p_1, p_2)$ are two demand functions, both being
functions of the prices of two commodities (like D and E in Example 8.3),
then $\mathrm{El}_{p_2} D_1(p_1, p_2)$ and $\mathrm{El}_{p_1} D_2(p_1, p_2)$ are called cross demand elasticities.

Remark. Note that Definition 8.2 can easily be extended to functions of n > 2
variables. Just like such a function would have n partial derivatives,
there would be n elasticities of the function, one with respect to each individual
variable.
Problem 8.2. The demand for money in the United States for the period
1929 to 1952 was estimated as
$$
M = 0.14Y + 76.03(r - 2)^{-0.84}, \quad (r > 2),
$$
where Y is the annual national income, and r is the annual interest rate
measured in percentages. Plot the function in R. Find the partial derivatives
$M_Y$ and $M_r$ and discuss their signs. Find the income and interest elasticities
of the demand for money.

Solution. For the derivative of M with respect to Y , note that the second
part of the function after the plus sign does not depend on Y in any way,
such that it will be considered an additive constant and therefore disappear
in the differentiation process. Similarly, the part of the function before the
plus sign is just an additive constant when differentiating with respect to r.
We therefore get

$$
M_Y = 0.14, \tag{8.1}
$$
$$
M_r = -63.8652 (r - 2)^{-1.84}. \tag{8.2}
$$

Since MY > 0, the demand for money increases with national income. People
are more willing to take on credits and mortgages if they earn more and are
therefore confident to be able to repay them. On the other hand, Mr < 0
(for r > 2, which is assumed), such that the demand for money decreases
with increasing interest rates.
We proceed to find the elasticities. We use the partial derivatives we found
in (8.1) and (8.2) together with Definition 8.2 to get

$$
\mathrm{El}_Y M(Y, r) = \frac{Y}{M(Y, r)} M_Y = \frac{0.14Y}{0.14Y + 76.03(r - 2)^{-0.84}},
$$
$$
\mathrm{El}_r M(Y, r) = \frac{r}{M(Y, r)} M_r = \frac{-63.8652\, r (r - 2)^{-1.84}}{0.14Y + 76.03(r - 2)^{-0.84}}.
$$

If for instance the current levels are Y = 10000 and r = 2.5, we get
$\mathrm{El}_Y M(10000, 2.5) \approx 0.9114$, which means that a 1% increase in income leads
to an increase in the demand for money of about 0.91%. On the other hand,
$\mathrm{El}_r M(10000, 2.5) \approx -0.3721$, meaning that the demand for money reacts to
a 1% increase in the interest rate by a decrease of about 0.37%.
Finally, let us plot the function in R, for which we of course first need to define
it. To get an idea about the behaviour of the function in both variables,
we use the perspective from the starting point, i.e. sys3d = "wireframe".
However, you may use other options, too, to familiarize yourself with the
function.
M <- function(Y, r) .14*Y + 76.03*(r-2)^(-0.84)
curve3d(M, from = c(0, 2.01), to = c(40000, 2.5),
        varnames = c("Y", "r"), sys3d = "wireframe")

[Figure: wireframe plot of M(Y, r), with axes Y and r.]

Problem 8.3. Recall the function $f(p, m) = p^{-1.5} m^{2.08}$ from Example 8.2.
Find the price and income elasticities of milk consumption.

Solution. We use the partial derivatives found in Problem 8.1 to find the
elasticities:

$$
\mathrm{El}_p f(p, m) = \frac{p}{p^{-1.5} m^{2.08}} \cdot \left(-1.5 p^{-2.5} m^{2.08}\right) = -1.5,
$$
$$
\mathrm{El}_m f(p, m) = \frac{m}{p^{-1.5} m^{2.08}} \cdot 2.08 p^{-1.5} m^{1.08} = 2.08.
$$

Remark. Note that the elasticities in both cases are constant and they are
the corresponding exponents of p and m. This is a special property of the
Cobb-Douglas functions: Their elasticities are always constant and equal to
the exponents of the variables.
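
The constant elasticities of the Cobb-Douglas function can be illustrated numerically. The sketch below approximates the partial derivatives by central difference quotients and plugs them into the elasticity formula; the helper names el_p and el_m, the step size h and the evaluation points are assumptions made for illustration.

```r
# Milk consumption function from Example 8.2.
f <- function(p, m) p^(-1.5) * m^2.08

# Elasticity with respect to p: (p / f) * f_p, with f_p approximated
# by a central difference quotient with step h.
el_p <- function(p, m, h = 1e-6) {
  p / f(p, m) * (f(p + h, m) - f(p - h, m)) / (2*h)
}

# Elasticity with respect to m, built the same way.
el_m <- function(p, m, h = 1e-6) {
  m / f(p, m) * (f(p, m + h) - f(p, m - h)) / (2*h)
}

# Evaluate at two different (arbitrarily chosen) points.
el_p(0.6, 0.9)
el_p(2, 5)
el_m(0.6, 0.9)
el_m(2, 5)
```

Up to small numerical error, the price elasticity comes out as the exponent of p and the income elasticity as the exponent of m at both points.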

8.3 Implicit differentiation

In some situations, one observes an equality

$$
f(x, y) = k \tag{8.3}
$$

for a constant k and a function f of two variables x, y (in general, f may also
be a function of more than two variables). If one would like to change the
value of x while keeping the function value f(x, y) constant at k, y would have
to be adapted as well. Note that f(x, y) = k corresponds to one level curve
in the contour plot of f. If you consider a point $(x_0, y_0)$ on this level curve (i.e.
$f(x_0, y_0) = k$) and move away from this point along the x axis, thus moving
away from $x_0$, you would typically also need to move away from $y_0$ to stay on
the same level curve. In that sense, y can be seen as a function of x, and since
it is not given directly as a function, which would be an explicit definition,
but rather through an equality, we say that there is an implicit relation or
that it is given implicitly. That is, we have some function y = y(x) that we
usually can't find in an explicit form. Similarly, if one moves away from the
current value of y, x typically also has to be adjusted, such that x is also a
function of y, x = x(y). In the following, we will consider y as a function
of x, but the described concepts work the same also if we exchange the roles
of the variables.

Example 8.4. An example for a situation given above would be the production
of a certain product. Assume that the production output of a company when
using x and y units of different inputs (e.g. materials) is given as f (x, y) and
the company currently produces k units of the product. If the price of the
first material increases and the company would therefore like to decrease the
amount of the material used, while at the same time keeping the production
output at the same level, it will also need to adjust the amount y of the
second input used in the production.

To find out how y = y(x) needs to be adjusted if x changes by one unit, we
can use the derivative $y'(x)$. Depending on the function f(x, y), this might be
achieved fairly easily if the equality f(x, y) = k can be reformulated to find
y = F(x), where F is some function of a single variable, by differentiating F,
or, if such a reformulation is not (easily) available, by means of implicit
differentiation. In the latter method, we differentiate both sides of (8.3) while
treating y as a function of x. That way, f(x, y) = f(x, y(x)) becomes a
function of a single variable, x. The differentiation of f(x, y(x)) involves the
use of the chain rule, and very often the product rule will be needed, too. After
differentiating (8.3), we get the equation
$$
\frac{d f(x, y(x))}{dx} = 0
$$
which we solve for $y'(x)$. In fact, the chain rule lets us write

$$
\frac{d f(x, y(x))}{dx} = \frac{\partial f}{\partial x}(x, y) + \frac{\partial f}{\partial y}(x, y)\, y'(x)
$$
which, upon setting equal to 0 and solving for $y'(x)$, leads (provided that $f_y(x, y) \neq 0$) to

$$
y'(x) = -\frac{f_x(x, y)}{f_y(x, y)}. \tag{8.4}
$$

Let us solve some problems to understand the principle of implicit differentiation.

Problem 8.4. Find an expression for $y'$, when it holds that xy = 5. By how
much does y have to change (approximately) for each unit change in x, if
currently x = 2.5 and y = 2?

Solution. To make the dependence of y on x obvious, we rewrite the equality
as xy(x) = 5. Then we differentiate both sides and solve for $y'(x)$:

$$
y(x) + x y'(x) = 0
$$
$$
y'(x) = -\frac{y(x)}{x}. \tag{8.5}
$$
Note that in the differentiation, we used the product rule for xy(x), setting
f(x) = x and g(x) = y(x).
To find out by how much y has to change for each unit change in x at the
current levels, we just plug in the current levels of x and y into the expression
for $y'$: $y' = -\frac{2}{2.5} = -0.8$.
In this particular case, it is not difficult to reformulate the equality to get
$y = \frac{5}{x}$ (note that if x = 0, $xy \neq 5$, such that 0 not being in the domain of $\frac{5}{x}$ is
not an issue). Differentiating this function gives $y'(x) = -\frac{5}{x^2}$. On the other
hand, if we plug in $y(x) = \frac{5}{x}$ into the expression (8.5) we got by implicit
differentiation, we get
$$
y'(x) = -\frac{y(x)}{x} = -\frac{5}{x^2}
$$
which provides us an easy way to confirm that the implicit differentiation
worked correctly.
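
The same comparison can be carried out numerically. The following sketch evaluates the slope from implicit differentiation at the current levels and compares it with a central difference quotient of the explicit function; the step size h and all object names are assumptions made for illustration.

```r
# Left-hand side of the equality x*y = 5 and its partial derivatives.
f  <- function(x, y) x * y
fx <- function(x, y) y
fy <- function(x, y) x

# Current levels from Problem 8.4.
x0 <- 2.5
y0 <- 2

# Slope from implicit differentiation: y'(x) = -f_x / f_y.
slope_implicit <- -fx(x0, y0) / fy(x0, y0)

# In this problem y is also available explicitly as y = 5/x, so the slope
# can be approximated by a central difference quotient as well.
y_explicit <- function(x) 5 / x
h <- 1e-6
slope_numeric <- (y_explicit(x0 + h) - y_explicit(x0 - h)) / (2*h)

slope_implicit
slope_numeric
```

The two values agree up to numerical error, which mirrors the algebraic check carried out in the solution.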

Problem 8.5. Find an expression for $y'$ when it holds that $y^3 + 3x^2 y = 13$.

Solution. We notice that unlike in the previous problem, it is not easy to
describe the relation between x and y explicitly, that is, we don't know the
explicit form of the function y = y(x). We therefore have to resort to implicit
differentiation. We have
$$
\begin{aligned}
y^3(x) + 3x^2 y(x) &= 13 \\
3y^2(x) y'(x) + 6x y(x) + 3x^2 y'(x) &= 0 \\
y'(x)\left(3y^2(x) + 3x^2\right) &= -6x y(x) \\
y'(x) &= -\frac{6x y(x)}{3y^2(x) + 3x^2}.
\end{aligned}
$$
Note that to differentiate $y^3(x)$, we used the chain rule: We have $y^3(x) =
f(g(x))$ with $f(x) = x^3$ and $g(x) = y(x)$. To differentiate this composite
function, we differentiate the outer function and plug in the inner function
($f'(g(x)) = 3y^2(x)$) and then multiply this by the derivative of the inner
function ($g'(x) = y'(x)$).
Alternatively, we can use formula (8.4) to find $y'$. For $f(x, y) = y^3 + 3x^2 y$,
we have $f_x(x, y) = 6xy$ and $f_y(x, y) = 3y^2 + 3x^2$, which gives
$$
y'(x) = -\frac{f_x(x, y)}{f_y(x, y)} = -\frac{6xy}{3y^2 + 3x^2}.
$$
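
Even without an explicit form of y(x), the result can be checked numerically: for a given x, the matching y on the curve can be found with the base R root finder uniroot(), and the slope can then be approximated by a difference quotient. This is a sketch under stated assumptions: the point x = 1, the search interval [0, 3], the step size h and the helper names g and y_of_x are chosen here for illustration.

```r
# The implicit relation written as g(x, y) = 0.
g <- function(x, y) y^3 + 3*x^2*y - 13

# For a given x, find the y on the curve by root finding in y.
# g is increasing in y, so there is exactly one root; [0, 3] brackets it
# for x close to 1.
y_of_x <- function(x) {
  uniroot(function(y) g(x, y), interval = c(0, 3), tol = 1e-12)$root
}

x0 <- 1
y0 <- y_of_x(x0)

# Slope from the formula obtained by implicit differentiation.
slope_formula <- -6*x0*y0 / (3*y0^2 + 3*x0^2)

# Slope approximated by a central difference quotient of y_of_x.
h <- 1e-4
slope_numeric <- (y_of_x(x0 + h) - y_of_x(x0 - h)) / (2*h)

c(y0, slope_formula, slope_numeric)
```

The formula and the numerical approximation should agree to several decimal places; the remaining difference comes from the step size and the tolerance of the root finder.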
Remark. We use the notation y(x) to remind you that there is a relation
between x and y, that is, y is in fact considered a function of x. However,
this is not necessary and after you have had some practice with implicit
differentiation, you may drop the function argument and write y instead of
y(x). For instance the solution of the above problem would then look as
follows:
$$
\begin{aligned}
y^3 + 3x^2 y &= 13 \\
3y^2 y' + 6xy + 3x^2 y' &= 0 \\
y'(3y^2 + 3x^2) &= -6xy \\
y' &= -\frac{6xy}{3y^2 + 3x^2}.
\end{aligned}
$$

8.4 Higher order partial derivatives

For a function f(x, y), $f_x$ and $f_y$ are called first order partial derivatives.
These functions are, in general, again functions of two variables, and each
of them therefore again possesses two first order partial derivatives.
With respect to the original function, these would be second order partial
derivatives because we obtain them by differentiating the function twice. A
function of two variables therefore has four second order partial derivatives
(and a function of n variables has $n^2$ second order partial derivatives).
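Definition 8.3. The second order partial derivatives of a function $f : \mathbb{R} \times
\mathbb{R} \to \mathbb{R}$ are the partial derivatives of the first order partial derivatives.
Direct partial derivatives:
$$
f_{xx} = f''_{xx} = f''_{11} = \frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right), \quad
f_{yy} = f''_{yy} = f''_{22} = \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right)
\tag{8.6}
$$
Cross partial derivatives:
$$
f_{xy} = f''_{xy} = f''_{12} = \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right), \quad
f_{yx} = f''_{yx} = f''_{21} = \frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)
\tag{8.7}
$$
Second order partial derivatives are collected in the Hesse matrix or Hessian:
$$
H_f(x, y) = \begin{pmatrix}
\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\
\frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2}
\end{pmatrix}.
$$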

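Remark. For a function of n variables, the Hessian has $n^2$ entries, with each
row corresponding to the variable of the first derivative, and each column to
the variable with respect to which the second derivative was taken:
$$
H_f(x) = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \dots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \dots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}.
$$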

Problem 8.6. Consider again the function $f(p, m) = p^{-1.5} m^{2.08}$. Find all of
its second order partial derivatives and the Hessian.
Solution. We already found the first order partial derivatives in Problem
8.1. We therefore differentiate each of them with respect to p and m to
obtain
\[
\begin{aligned}
f_{pp} &= 3.75\, p^{-3.5} m^{2.08} \\
f_{mm} &= 2.2464\, p^{-1.5} m^{0.08} \\
f_{pm} &= -3.12\, p^{-2.5} m^{1.08} \\
f_{mp} &= -3.12\, p^{-2.5} m^{1.08}.
\end{aligned}
\]
Finally, we simply fill a matrix with them to get the Hessian:
\[
H_f(p, m) = \begin{pmatrix}
3.75\, p^{-3.5} m^{2.08} & -3.12\, p^{-2.5} m^{1.08} \\
-3.12\, p^{-2.5} m^{1.08} & 2.2464\, p^{-1.5} m^{0.08}
\end{pmatrix}.
\]
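The second order partial derivatives can also be cross-checked in R with the symbolic differentiation function D. This is a minimal sketch: the evaluation point (p, m) = (2, 3) and the object names are arbitrary choices of ours, not part of the problem.

```r
# f(p, m) = p^(-1.5) * m^2.08 as an unevaluated R expression
f_expr <- quote(p^(-1.5) * m^2.08)

# First order partial derivatives (still unevaluated expressions)
f_p <- D(f_expr, "p")
f_m <- D(f_expr, "m")

# Hessian evaluated at an arbitrarily chosen point (p, m) = (2, 3)
at <- list(p = 2, m = 3)
H <- matrix(c(eval(D(f_p, "p"), at), eval(D(f_p, "m"), at),
              eval(D(f_m, "p"), at), eval(D(f_m, "m"), at)),
            nrow = 2, byrow = TRUE)
H

# Closed-form entries from the solution, evaluated at the same point
H_exact <- with(at, matrix(c(3.75 * p^-3.5 * m^2.08, -3.12 * p^-2.5 * m^1.08,
                             -3.12 * p^-2.5 * m^1.08, 2.2464 * p^-1.5 * m^0.08),
                           nrow = 2, byrow = TRUE))
all.equal(H, H_exact)
```

The two off-diagonal entries of H coincide, in line with the values found for the two cross-derivatives.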
Remark. We quickly notice that $f_{pm} = f_{mp}$. This is no coincidence: such
an identity holds for any twice continuously differentiable function. While
we do not discuss the continuity of functions in detail in this course, you
may rest assured that we usually only work with functions that are "nice
enough" for this identity to hold. It in fact
implies that even for higher order derivatives, the order of differentiation
does not matter (as long as the function is differentiated the same number
of times with respect to each variable). For instance, $f_{xxyy} = f_{xyxy}$. This can
be seen by considering the function $g = f_x$. For the cross-derivatives of this
function, we have $g_{xy} = g_{yx}$. Remembering that $g = f_x$, we therefore have
$g_{xy} = f_{xxy} = f_{xyx} = g_{yx}$. Since $f_{xxy}$ and $f_{xyx}$ are the same function, differentiating
both of them once more with respect to y results in the same function
again, that is, $f_{xxyy} = f_{xyxy}$. An important consequence of this property is the theorem below.
Theorem 8.1. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a twice continuously differentiable function
(that is, f has all second order partial derivatives and they are all continuous).
Then
\[
\frac{\partial^2 f}{\partial x_i \partial x_j}(x) = \frac{\partial^2 f}{\partial x_j \partial x_i}(x)
\]
for any i and j, that is, the Hessian $H_f(x)$ is a symmetric matrix.

Convexity/concavity and second order partial derivatives

As in the case of single variable functions, the second order derivatives can
be used to study whether the function is convex or concave. Let
$f : A \to \mathbb{R}$ with $A \subseteq \mathbb{R}^2$ be a twice differentiable function, and
B be a convex subset of A.

The direct derivatives can be used to determine the behaviour of the function
with respect to one variable at a time (while keeping the other constant):

• If $f_{xx}(x, y) \geq (>)\, 0$ for all (x, y) ∈ B, f is convex (strictly convex) in x on B.

• If $f_{xx}(x, y) \leq (<)\, 0$ for all (x, y) ∈ B, f is concave (strictly concave) in x on B.

• If $f_{yy}(x, y) \geq (>)\, 0$ for all (x, y) ∈ B, f is convex (strictly convex) in y on B.

• If $f_{yy}(x, y) \leq (<)\, 0$ for all (x, y) ∈ B, f is concave (strictly concave) in y on B.

To study the overall convexity/concavity of the function with respect to
both variables at the same time (both variables changing together), we need
to combine the information from all second order derivatives. Let us first
formally define the concept.

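Definition 8.4. Let $f : A \to \mathbb{R}$, $A \subseteq \mathbb{R}^n$, be a function of n variables and B
a convex subset of A.

• f is convex on B if for all x, y ∈ B and for all λ ∈ [0, 1] it holds
that f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y). It is strictly convex on B if the
inequality is strict for all x ≠ y in B and all λ ∈ (0, 1).

• f is concave on B if for all x, y ∈ B and for all λ ∈ [0, 1] it holds
that f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y). It is strictly concave on B if the
inequality is strict for all x ≠ y in B and all λ ∈ (0, 1).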
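The defining inequality can be tested numerically on randomly drawn points. The sketch below uses two example functions of our own choosing, and the helper chord_above is a hypothetical name. Passing such a random check does not prove convexity, but a single failing pair of points disproves it.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Example functions written for a vector v = (x, y)
f <- function(v) v[1]^2 + v[2]^2    # x^2 + y^2
g <- function(v) -v[1]^2 + v[2]^3   # -x^2 + y^3

# TRUE if the convexity inequality holds for the points u, v and weight lambda
# (a small tolerance absorbs rounding errors)
chord_above <- function(fun, u, v, lambda) {
  fun(lambda * u + (1 - lambda) * v) <=
    lambda * fun(u) + (1 - lambda) * fun(v) + 1e-12
}

# 1000 random pairs of points in [-1, 1] x [-1, 1] with random weights
ok <- replicate(1000, {
  u <- runif(2, -1, 1)
  v <- runif(2, -1, 1)
  chord_above(f, u, v, runif(1))
})
all(ok)

# One pair of points for which the inequality fails for g
chord_above(g, c(-1, 0.5), c(1, 0.5), 0.5)
```

For g, the midpoint (0, 0.5) gives the value 0.125, while the average of the two endpoint values is −0.875, so g is not convex on a set containing these points.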

We will now discuss the convexity conditions for a function f : A → R of two
variables and a convex set B ⊆ A. We have the following:

• f is convex on B if and only if $H_f(x, y)$ is positive semidefinite
for all (x, y) ∈ B, that is, if $f_{xx}(x, y) \geq 0$, $f_{yy}(x, y) \geq 0$ and
$f_{xx}(x, y) f_{yy}(x, y) \geq f_{xy}(x, y) f_{yx}(x, y)$ for all (x, y) ∈ B. If $H_f(x, y)$ is
even positive definite for all (x, y) ∈ B (all three inequalities are strict),
then f is strictly convex on B.

• f is concave on B if and only if $H_f(x, y)$ is negative semidefinite
for all (x, y) ∈ B, that is, if $f_{xx}(x, y) \leq 0$, $f_{yy}(x, y) \leq 0$ and
$f_{xx}(x, y) f_{yy}(x, y) \geq f_{xy}(x, y) f_{yx}(x, y)$ for all (x, y) ∈ B. If $H_f(x, y)$ is
even negative definite for all (x, y) ∈ B (all three inequalities are strict),
then f is strictly concave on B.

Example 8.5. To highlight the difference between convexity/concavity in
one variable and overall convexity/concavity, we consider the following functions:

\[
f(x, y) = x^2 + y^2, \qquad g(x, y) = -x^2 + y^3.
\]

For f, we have $f_{xx} = f_{yy} = 2 > 0$, such that the function is convex in both
x and y. Moreover, we have $f_{xy} = 0$, thus $f_{xx} f_{yy} > f_{xy}^2$; the function is
therefore also convex overall. On the other hand, for g we have $g_{xx} = -2$,
$g_{yy} = 6y$ and $g_{xy} = 0$. Since $g_{xx} < 0$, g is concave in x. $g_{yy}$ is positive for
y > 0, thus g is convex in y for $y \in \mathbb{R}_+$. In this case, $g_{xx} g_{yy} < g_{xy}^2$; g is
therefore neither convex nor concave on $\mathbb{R} \times \mathbb{R}_+$. For $y \in \mathbb{R}_-$, $g_{yy} < 0$, which
makes g concave in y. In this case, we have $g_{xx} g_{yy} > g_{xy}^2$, which means that
g is also concave overall.
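The sign conditions used in this example can be bundled into a small helper. This is only a sketch of the two-variable conditions stated above; the function name hessian_type is hypothetical.

```r
# Classify a 2x2 Hessian from its entries fxx, fyy and fxy (= fyx)
hessian_type <- function(fxx, fyy, fxy) {
  det_H <- fxx * fyy - fxy^2
  if (fxx > 0 && fyy > 0 && det_H > 0) {
    "positive definite"
  } else if (fxx < 0 && fyy < 0 && det_H > 0) {
    "negative definite"
  } else if (fxx >= 0 && fyy >= 0 && det_H >= 0) {
    "positive semidefinite"
  } else if (fxx <= 0 && fyy <= 0 && det_H >= 0) {
    "negative semidefinite"
  } else {
    "indefinite"
  }
}

hessian_type(2, 2, 0)            # f at any point
hessian_type(-2, 6 * 1, 0)       # g at a point with y = 1
hessian_type(-2, 6 * (-1), 0)    # g at a point with y = -1
```

The three calls correspond to the three cases discussed in the example: convex, neither convex nor concave, and concave.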
Let us now plot these functions to observe their shape.

f <- function(x, y) x^2 + y^2


curve3d(f, from = c(-1, -1), to = c(1, 1), sys3d = "wireframe")

[Figure: wireframe plot of f(x, y) for x, y ∈ [−1, 1].]

We observe that f forms a "bowl": for any two points on the graph of
the function, the straight line connecting them does not cross
below the graph of the function. For g, such a line connecting two points
will always stay below the graph for y < 0 (see the plot of g for x, y ∈ [−1, 0]);
however, this is generally not the case if y can be positive.
g <- function(x, y) -x^2 + y^3


curve3d(g, from = c(-1, -1), to = c(0, 0), sys3d = "wireframe")

[Figure: wireframe plot of g(x, y) for x, y ∈ [−1, 0].]

curve3d(g, from = c(0, 0), to = c(1, 1), sys3d = "wireframe")

[Figure: wireframe plot of g(x, y) for x, y ∈ [0, 1].]

curve3d(g, from = c(-1, -1), to = c(1, 1), sys3d = "wireframe")

[Figure: wireframe plot of g(x, y) for x, y ∈ [−1, 1].]

Problem 8.7. Consider again the function $f(p, m) = p^{-1.5} m^{2.08}$. Decide
whether the function is convex/concave in p, m and (p, m).
Solution. We already found the Hessian in Problem 8.6:
\[
H_f(p, m) = \begin{pmatrix}
3.75\, p^{-3.5} m^{2.08} & -3.12\, p^{-2.5} m^{1.08} \\
-3.12\, p^{-2.5} m^{1.08} & 2.2464\, p^{-1.5} m^{0.08}
\end{pmatrix}.
\]
Note that both $f_{pp}$ and $f_{mm}$ are positive for all p, m > 0 (domain of f).
Therefore f is strictly convex in p and strictly convex in m. However,
\[
\det(H_f(p, m)) = (8.424 - 9.7344)\, p^{-5} m^{2.16} < 0,
\]
such that f is neither convex nor concave in (p, m).
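The conclusion can be illustrated with concrete numbers. In the sketch below, the points were picked by us for illustration: along one direction through (1, 1) the function value at the midpoint lies above the average of the endpoint values (so f is not convex), while along the p-axis it lies below (so f is not concave).

```r
f <- function(p, m) p^-1.5 * m^2.08

# Determinant of the Hessian, using the entries from Problem 8.6
det_H <- function(p, m) {
  fpp <- 3.75 * p^-3.5 * m^2.08
  fmm <- 2.2464 * p^-1.5 * m^0.08
  fpm <- -3.12 * p^-2.5 * m^1.08
  fpp * fmm - fpm^2
}

# The determinant is negative on a grid of positive values of p and m
grid <- expand.grid(p = seq(0.5, 5, by = 0.5), m = seq(0.5, 5, by = 0.5))
all(det_H(grid$p, grid$m) < 0)

# Midpoint (1, 1) of (0.9, 0.87) and (1.1, 1.13): value above the average
f(1, 1) > (f(0.9, 0.87) + f(1.1, 1.13)) / 2

# Midpoint (1, 1) of (0.5, 1) and (1.5, 1): value below the average
f(1, 1) < (f(0.5, 1) + f(1.5, 1)) / 2
```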

8.5 Exercises
8.1 Find the first order partial derivatives with respect to x and y of the
following functions:
q
a)f (x, y) = xy
3
c)h(x, y) = xy
2 +5xy
b)g(x, y) = ex d)i(x, y) = x3 ln(4xy)

For each of the functions above, plot the contour plot for x, y ∈ (0, 1). Decide
about the sign of the first order partial derivatives from the analytical form
(the derivatives you found) and confirm it visually from the contour plot.

8.2 For the functions, decide whether they are increasing or decreasing in x
and y on R × R and whether they are convex or concave in x, y and (x, y)
on R × R.
a) $f(x, y) = x^2 + 4xy + 8y^2$
b) $g(x, y) = -x^2 + 2xy - y^2$
c) $h(x, y) = -e^{x_1} - e^{x_1 + x_2}$

8.3 Consider the function $f(x, y) = 5x^{0.5} y^{0.2}$ for x, y > 0. Find its Hessian
and decide whether the Hessian is positive/negative definite (or neither/remains
undetermined) for any x, y > 0. Conclude about the convexity/concavity of
the function.
8.4 For the functions
\[
f : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}, \quad f(x, y) = \ln\left((x + y)^2\right) + \frac{x}{3y} \quad \text{and}
\]
\[
g : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}, \quad g(x, y) = e^{x^2 - y} - \sqrt{\frac{x}{y}},
\]
find all their first and second order partial derivatives. Then consider the
point $(x_0, y_0) = (1, 4)$ and decide whether the functions are:
a) increasing or decreasing with respect to x at $(x_0, y_0)$,
b) increasing or decreasing with respect to y at $(x_0, y_0)$,
c) convex or concave with respect to x on $B = (x_0 - \epsilon, x_0 + \epsilon) \times \{y_0\}$ for
some small ϵ > 0,
d) convex or concave with respect to y on $B = \{x_0\} \times (y_0 - \epsilon, y_0 + \epsilon)$ for
some small ϵ > 0.

8.5 Below you will find the contour plots of functions f , g, h and i of two
variables, with two points highlighted in each plot. For each of the plots
answer the following questions:
a)Is the value of the function equal at the two points A and B? If not, at
which point is the value greater?
b)Does the function attain both positive and negative values in the depicted
area?
c)What is the sign of the partial derivatives of the function with respect
to x and to y at the point A?

[Figure: contour plots of the functions f, g, h and i, each drawn over x (horizontal axis) and y (vertical axis) with labelled contour levels and the two highlighted points A and B.]

8.6 a) Assume that the demand for a certain product depends on its price
p1 and the price of a competitor’s product p2 , with the demand function
given as f (p1 , p2 ) = p10
1 p2
. Find both price elasticities of f .
b)Assume that the demand for a certain product depends on its price p1 ,
the price of a competitor’s product p2 and the average income level m,
with the demand function given as g(p1 , p2 , m) = exp(−0.5p1 + 0.7p2 +
1.1m). Find both price elasticities and income elasticity of demand.

8.7 Find an expression for y ′ if the following equations hold:

a) $x^2 + y^2 = 10$
b) $y^2 = 6x - 8$
c) $x^2 + y^2 + 4x - 2y - 20 = 0$
d) $2x^2 - 5xy^2 + 6y^3 + 4y^2 = 61$
e) $x^3 \cos(6y^3) = \sin(10y + 8x) + 70$
f) $\ln(xy) = 10$

8.8 Consider a firm operating under the production function

\[ Q(K, L) = 0.5K^2 - 2KL + L^2 \]

where Q represents output and K and L denote capital and labor inputs,
respectively. The current capital and labor levels are 100 units and 20 units,
respectively.
a)Find the current level of production.
b) Find and interpret the current marginal productivity of labor for
the given production function (that is, compute $\partial Q / \partial L$).
c)Compute and interpret the elasticity of production with respect to
capital at the given input levels.
d)Due to a recent government wage increase that has elevated the cost
of labor, the firm decided to reduce the labor input. Determine how
the firm should adjust its capital level to maintain its initial production
output if they decrease labor by h units. (Hint: Find K ′ .)

8.9 Consider a firm operating under the production function

\[ Q(x, y) = x^3 - xy^2 + 3y^3 + x^2 y \]

where Q represents output and x and y denote inputs. The current input
levels are x = 10 units and y = 20 units, respectively.
a)Find the current level of production.
b)Find and interpret the current marginal productivity of the input x for
the given production function.
c)Compute and interpret the elasticity of production with respect to input
y at the given input levels.
d) Due to a decrease in the availability of input x in the supply market, the
firm decided to reduce the amount of input x used in the production.
Determine how the firm should adjust its usage of input y to maintain
its initial production level if they decrease the usage of input x by h
units.

8.10 Anne is a student who loves chocolate. Her happiness from eating x
grams of dark chocolate and y grams of milk chocolate per week can be
described by the function
\[
H(x, y) = \left(\frac{x}{100}\right)^2 + \left(\frac{y}{100}\right)^2 + \ln\left(\frac{xy}{10000}\right).
\]
Currently, she eats 200 grams of dark chocolate and 50 grams of milk chocolate
per week.
a) What is Anne’s current happiness level from eating chocolate?
b) Anne realized that dark chocolate has become more expensive and
considers eating less dark and more milk chocolate to lower her chocolate
expenses. If she aims to maintain her current level of happiness from
eating chocolate, how much more milk chocolate does she have to eat if
she lowers her dark chocolate consumption by h grams per week?
c)If we denote by pd and pm the current prices of dark and milk chocolate,
respectively, what is the difference in her weekly chocolate expenses if
she lowers her dark chocolate consumption by h grams and increases
her milk chocolate consumption such that her chocolate happiness level
remains the same? What does the relationship between pd and pm have
to be such that she can save money by eating less dark and more milk
chocolate?

8.11 For the production of a certain powder, a firm uses two ingredients.
The amount (in kilograms) of the powder that can be produced using x kg
of the first ingredient and y kg of the second ingredient can be described by
the production function
P (x, y) = ln((x + y)2 ) + 5xy.
Currently, they are using x = 0.2 kg of the first ingredient and y = 0.8 kg of
the second ingredient.
a)What is the currently produced amount of the powder?


b) The managers are looking into changing the current ingredient combination
to reduce the costs of production. If they decrease x by h, approximately how
do they have to change y?
c)If we denote by p1 and p2 the current prices of the first and the second
ingredient, respectively, what is the difference in the overall costs for
the ingredients if x is decreased by h and the production level remains
the same? What does the relationship between p1 and p2 have to be,
such that the change in x by h and the corresponding change in y lead
to a cost reduction?

8.6 Further readings

The topics of this chapter are mostly presented in Chapter 14 (Sections
14.1-14.3, 14.5-14.6 and 14.8-14.10) of [1], with the exception of implicit
differentiation, which is discussed in Section 7.1 of [1]. As usual, we also
suggest the end of section exercises, as well as the review exercises at the end
of the chapter, perhaps with the exception of exercises 4, 8 and 9 in Section
14.3, exercises 3-5 in Section 14.5, exercises 8-11 in Section 14.6, and exercise
3c) in Section 14.8.
However, we would like to point out that convexity and concavity of
functions are presented there in a more advanced way
than in this course, including functions of more than two variables and using
concepts that will only be discussed at a later point in this document.
Chapter 9

Optimization

As we indicated in Chapter 5, a very important application of derivatives is
optimization. Optimization is an area of applied mathematics concerned with
finding minimal and/or maximal values of functions. In optimization we
distinguish between unconstrained and constrained optimization – in the first
case, one generally looks for extremal values of the function in its domain,
whereas in constrained optimization, the search area is further constrained
by some condition(s) on the variable of interest. In optimization, the function
to be minimized or maximized (we will in general say optimized unless we
have specifically minimization or maximization in mind) is often referred to
as the objective function, and we will also use this terminology.

In this chapter, we introduce the notions of local and global minima and
maxima and some conditions for these points in case of differentiable functions.
We first focus on single-variable functions and then shift our focus to functions
of several variables, discussing both the cases of unconstrained and constrained
optimization problems.

9.1 Single variable optimization

Definition 9.1. Let f : A → R with A ⊆ R be a function. Then the local
extrema of f are defined as follows:

• f attains a (strict) local minimum at a point c ∈ A if there is an interval
(α, β) with c ∈ (α, β) such that

f(x) ≥ (>) f(c) for all x ∈ (α, β) ∩ A, x ≠ c.

• f attains a (strict) local maximum at a point c ∈ A if there is an interval
(α, β) with c ∈ (α, β) such that

f(x) ≤ (<) f(c) for all x ∈ (α, β) ∩ A, x ≠ c.

The global extrema of f are defined as follows:

• f attains a global minimum at a point c ∈ A if

f(x) ≥ f(c) for all x ∈ A.

• f attains a global maximum at a point c ∈ A if

f(x) ≤ f(c) for all x ∈ A.

If f attains a local or global minimum or maximum at c, then c is called an
extreme point of f.

Recall that if at a particular interior point a of the domain of f the derivative is
positive, $f'(a) > 0$, then f is strictly increasing at this point. That means
that f cannot attain a minimum or maximum (local or global) at a, since in
any interval (α, β) around a, f achieves both greater and smaller values than
f(a). A similar argument shows that a cannot be an
extreme point of f if $f'(a) < 0$. This leads to the first order necessary
condition (FONC) for local extreme points:

Theorem 9.1. Let f be differentiable in an interval I and x be an interior
point of I. If x is a local extreme point of f in I, then $f'(x) = 0$.

Remark. For α < β, α, β ∈ R, the open interval (α, β) is the collection of
all interior points of the intervals (α, β), (α, β], [α, β) and [α, β].

Definition 9.2. A point c in the domain of a function f is called a critical
point or stationary point of f if it satisfies the first order necessary condition
for being an extreme point of f, that is, if $f'(c) = 0$.

The condition $f'(c) = 0$ is only a necessary condition, that is, it does not
guarantee that c is indeed an extreme point of f. Consider e.g. the function
$f(x) = x^3$: $f'(x) = 3x^2 = 0$ for x = 0, but f is increasing at x = 0 and x = 0
clearly is not an extreme point of f.

To decide whether at a critical point c – a candidate for an extreme point of
f – f indeed attains a local minimum or maximum, we need to verify further
conditions. One possibility of how to do this is to consider the behaviour of
the function to the left and to the right of the point. Clearly, if in a certain
interval (α, c) the function f is increasing (f ′ (x) ≥ 0 for x ∈ (α, c)), then
f (x) ≤ f (c) for any x ∈ (α, c). If moreover there is an interval (c, β) such
that f is decreasing in this interval (f ′ (x) ≤ 0), we also have f (x) ≤ f (c) for
x ∈ (c, β) and, consequently, f (x) ≤ f (c) for any x in (α, β). This means that
f attains a local maximum at c. On the other hand, if there are α < c < β
such that f ′ (x) ≤ 0 for x ∈ (α, c) (f decreasing) and f ′ (x) ≥ 0 for x ∈ (c, β)
(f increasing), then f attains a local minimum at c. We formalize this rule
as the first derivative test:

Theorem 9.2. Let c be a critical point of f, i.e. $f'(c) = 0$.

• If there are α < c < β such that $f'(x) \geq 0$ on (α, c) and $f'(x) \leq 0$ on
(c, β), then f attains a local maximum at c.

• If there are α < c < β such that $f'(x) \leq 0$ on (α, c) and $f'(x) \geq 0$ on
(c, β), then f attains a local minimum at c.

• If there are α < c < β such that $f'(x) > 0$ for all x ∈ (α, β) with x ≠ c, or
$f'(x) < 0$ for all x ∈ (α, β) with x ≠ c, then c is not a local extreme point of f.

Another way to check whether a stationary point c is a local extreme point of f
is to find out whether f is convex or concave at the given point. At a critical
point, a convex function (think e.g. of $x^2$) attains a minimum, whereas a concave
function (such as e.g. $-x^4$) attains a maximum. This of course holds not only for
functions that are convex or concave on the whole domain, but also on subsets of the
domain only – such minima and maxima are then only local. Formally,
the second derivative test works in the following way:

Theorem 9.3. Let f be a twice differentiable function on an interval I and
c be an interior point of I with $f'(c) = 0$.

• If $f''(c) > 0$, then f attains a strict local minimum at c.

• If $f''(c) < 0$, then f attains a strict local maximum at c.

• If $f''(c) = 0$, then the character of c remains undetermined (c can be
any type of extreme point or no extreme point at all).

Remark. Indeed, if $f''(c) = 0$, c could be any type of extreme point or no
extreme point at all. Consider the functions $x^4$, $x^3$ and $-x^4$ and the point x = 0.
In all three cases, the first and the second derivative both vanish at x = 0, yet
$x^4$ attains a local minimum at 0, 0 is no extreme point of $x^3$, and $-x^4$ attains
a local maximum at 0. That also
means that the first two conditions in Theorem 9.3 are sufficient conditions,
but not necessary conditions: a point c can be a local extreme point of f
even if neither of the two conditions is satisfied.

Summarizing Theorems 9.1-9.3, to find the local extremes of a differentiable
function f : A → R, the following procedure should be followed:

1. Differentiate f and find all x ∈ A with $f'(x) = 0$.

2. Determine the character of each x from point 1.:

(a) Check the sign of $f'$ in close proximity of x (in both directions,
i.e. in an interval around x).
(b) Alternatively, if f is twice differentiable, check the sign of $f''(x)$.
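The procedure can be sketched in R for a simple polynomial. The example function f(x) = x³ − 3x and all object names are our own choices; D differentiates symbolically and polyroot finds the roots of the (here quadratic) derivative.

```r
f_expr <- quote(x^3 - 3 * x)   # example function chosen for illustration
d1 <- D(f_expr, "x")           # first derivative, equal to 3x^2 - 3
d2 <- D(d1, "x")               # second derivative, equal to 6x

# Step 1: solve f'(x) = 3x^2 - 3 = 0.
# polyroot() expects the coefficients in increasing order of the power of x.
crit <- sort(Re(polyroot(c(-3, 0, 3))))

# Step 2(b): sign of the second derivative at each critical point
f2 <- sapply(crit, function(x) eval(d2, list(x = x)))
data.frame(x = crit, f2 = f2,
           type = ifelse(f2 > 0, "local minimum",
                  ifelse(f2 < 0, "local maximum", "undetermined")))
```

For this function, the critical points are x = −1 (where the second derivative is negative) and x = 1 (where it is positive).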

Problem 9.1. Maximize the profit Π of a firm, given the total revenue
function
\[ R(Q) = 4000Q - 33Q^2 \]
and the total cost function
\[ C(Q) = 2Q^3 - 3Q^2 + 400Q + 5000, \]
where Q denotes the quantity of goods, that is, assume Q > 0.

Solution. Profit is given as the total revenue minus the total costs, i.e. we
have
\[ \Pi(Q) = R(Q) - C(Q) = -2Q^3 - 30Q^2 + 3600Q - 5000. \]
In the first step, we differentiate Π and set the derivative equal to 0:
\[
\begin{aligned}
\Pi'(Q) = -6Q^2 - 60Q + 3600 &= 0 \\
Q^2 + 10Q - 600 &= 0 \\
(Q + 30)(Q - 20) &= 0.
\end{aligned}
\]
This gives us $Q_1 = -30$ or $Q_2 = 20$, but since we assume Q > 0, we
only consider $Q_2 = 20$. To check whether this really gives the maximum of the
profit function, let us perform the second derivative test. We have
$\Pi''(Q) = -12Q - 60$ and $\Pi''(20) = -300 < 0$, such that we have found a local maximum
Π(20) = 39000 at Q = 20.
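The analytical steps can be reproduced numerically. The sketch below is one possible check, with function names of our own choosing: polyroot solves the first order condition and the second derivative is evaluated at both roots.

```r
profit    <- function(q) -2 * q^3 - 30 * q^2 + 3600 * q - 5000
d2_profit <- function(q) -12 * q - 60   # second derivative of the profit

# Roots of -6Q^2 - 60Q + 3600 = 0 (coefficients in increasing order of the power)
crit <- sort(Re(polyroot(c(3600, -60, -6))))
crit              # approximately -30 and 20

d2_profit(crit)   # positive at -30, negative at 20
profit(20)        # profit at the admissible critical point
```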
We can also check our result graphically in R.

R <- function(q) {4000*q - 33*q^2}


C <- function(q) {2*q^3 - 3*q^2+400*q + 5000}
P <- function(q) {-2*q^3 - 30*q^2+3600*q - 5000}
curve(R, 0, 60, n = 100, xlab = "Q", ylab = " ", col = 2, lwd = 2)
curve(C, 0, 60, n = 100, add = TRUE, lwd = 2)
curve(P, 0, 60, n = 100, add = TRUE, lwd = 2, col = 5)
abline(v = 20, col="blue", lwd=1)
legend("topleft", c("R","C","P"), col = c(2,1,5), lty = 1)
[Figure: revenue R (red), costs C (black) and profit P (turquoise) plotted against Q ∈ [0, 60], with a vertical blue line at Q = 20.]

Note that above we used the function curve for plotting. This is a new way
of plotting functions compared to what we did so far. In curve, instead
of specifying the x- and y-coordinates of points to be plotted, we provide a
function that we want to plot along with the interval on which it is to be
plotted (here (0, 60)). The argument n controls at how many points the
function is actually evaluated in the process of plotting – it is the counterpart
of the length of the sequence of points used in the plot function. curve also
accepts graphical parameters that we know from other plotting functions,
like ylab, col or lwd. If we set add = TRUE, instead of starting a new plot,
the function will be added to an already existing plot (like when we use lines
instead of plot).
In the plot, we observe that the (positive) difference between the revenue (the
red line) and the costs (the black line), also represented by the turquoise line
as the profit, is maximized at Q = 20, marked in blue.
Note that if we went beyond the quantity and profit interpretation of Q and
Π and also considered the critical point $Q_1 = -30$, we would find that the
function Π attains a local minimum at this point since $\Pi''(-30) = 300 > 0$.

9.1.1 Single variable optimization in R

While Theorem 9.1 in theory gives a simple way of finding candidates for
local extremes, actually finding the critical points (and consequently local
extremes) can be difficult in practice. For many functions, the derivative
does not exist or is not readily available. Other functions, like polynomials
of degree more than 3, can be easily differentiated, but solving the equation
$f'(x) = 0$ presents a challenge. In such situations, one resorts to numerical
methods, that is, uses computer software, such as R, to optimize the
function at hand. In this subsection, we show how to optimize single variable
functions in R.

To optimize functions of one variable, R offers the function optimize (with
an alias optimise, that is, it recognizes both American and British spelling
of the word). As arguments, we need to pass to optimize the function to
be optimized f and an interval interval in which we would like to search
for the minimum (unlike analytical methods, numerical methods only search
bounded subsets of the domain). interval should be passed to the function
as a vector of the lower and upper bound of the interval. Alternatively, the
interval can be passed as two separate arguments, lower and upper. By
default, optimization functions in R search for the minimum of the function,
therefore in case we want to maximize the function, we also need to set
maximum = TRUE. Let us illustrate the use of the function on the profit
maximization problem in Problem 9.1.

out <- optimize(P, c(0, 60), maximum = TRUE)
out

## $maximum
## [1] 20.00001
##
## $objective
## [1] 39000

The output of the optimize function is a list. You can think of lists in R
as baskets containing several objects. These objects may all be of the same
type, but they do not have to be. Each of them may also have a name, and if they
do, as is the case with the output of optimize, they can be accessed by
using the dollar sign. In this case, the two parts of the output are called
maximum, which is the point at which the local maximum is attained,
and objective, which gives the value of the objective function at this point.

out$maximum

## [1] 20.00001

out$objective

## [1] 39000

As we see, the value out$maximum is very close to the value we found analytically,
but there is still a small difference. This goes back to the numerical, and
therefore only approximate, nature of the underlying algorithm. However,
the precision can be controlled to a certain degree, and the parameter that
is responsible for it is called tol. The smaller the tolerance, the more
precise the result. However, setting the tolerance too low could result in
long computation times or even in the algorithm not terminating at all, so
one needs to be careful not to set it too low (and it cannot be set to 0).
Let us try with 1e-10.

out2 <- optimize(P, c(0, 60), maximum = TRUE, tol = 1e-10)


optimizer2 <- out2$maximum
optimizer2

## [1] 20

out2$objective

## [1] 39000
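The same maximum can also be obtained with the default minimization behaviour by minimizing the negative of the profit function, here with the interval passed through the arguments lower and upper. This is a sketch; the names neg_P and out3 are our own, and P is repeated so that the block runs on its own.

```r
P <- function(q) -2 * q^3 - 30 * q^2 + 3600 * q - 5000

# optimize() minimizes by default, so we minimize -P instead of maximizing P
neg_P <- function(q) -P(q)
out3 <- optimize(neg_P, lower = 0, upper = 60)

out3$minimum      # point at which -P is minimal, close to 20
-out3$objective   # corresponding profit, close to 39000
```

Note that when minimizing, the first element of the output list is called minimum instead of maximum.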

9.2 Multivariate optimization

For functions of several variables, the notions of local extremes generalize in
a fairly straightforward manner.

Definition 9.3. Let f : A → R be a function of n variables and c ∈ A a
point in the domain.

• f attains a (strict) local minimum at c if there are intervals $(\alpha_i, \beta_i)$
with $\alpha_i < c_i < \beta_i$, i = 1, ..., n, such that

f(x) ≥ (>) f(c)

for all points x ∈ A with x ≠ c, $x_i \in (\alpha_i, \beta_i)$.

• f attains a (strict) local maximum at c if there are intervals $(\alpha_i, \beta_i)$
with $\alpha_i < c_i < \beta_i$, i = 1, ..., n, such that

f(x) ≤ (<) f(c)

for all points x ∈ A with x ≠ c, $x_i \in (\alpha_i, \beta_i)$.

• f attains a global minimum at c if f(x) ≥ f(c) for all x ∈ A.

• f attains a global maximum at c if f(x) ≤ f(c) for all x ∈ A.

If c satisfies any of the above, it is an extreme point of f.

One can use very similar considerations as in the case of functions of a single
variable to arrive at the first order necessary conditions that let us determine
the critical points, that is, candidates for a local minimum or maximum. We
only formulate them for functions of two variables, but they also extend to
functions of n variables (n > 2).
Theorem 9.4. Consider a function f : A → R of two variables and B ⊆ A.
If an interior point (x0 , y0 ) of B is a local minimum or maximum of f , then
fx (x0 , y0 ) = 0 and fy (x0 , y0 ) = 0.

In the case of multivariate functions, we must turn our attention to the
convexity or concavity of the function at the critical point to decide whether
it is a local minimum or maximum (or neither). Recall that to determine
the convexity or concavity of a function of two variables, we use its Hesse
matrix. The conditions presented in Chapter 8 therefore lie at the heart of
the following second order derivative test:
Theorem 9.5. Let f : A → R be a function of two variables and $(x_0, y_0)$ be
a critical point of f, i.e. $\nabla f(x_0, y_0) = 0$.

• If $H_f(x_0, y_0)$ is positive definite, then f attains a local minimum at
$(x_0, y_0)$.

• If $H_f(x_0, y_0)$ is negative definite, then f attains a local maximum at
$(x_0, y_0)$.

• If $H_f(x_0, y_0)$ is indefinite, then $(x_0, y_0)$ is a saddle point of f, that is,
it is a critical point but not an extreme point of f.

• If $H_f(x_0, y_0)$ is positive or negative semidefinite (but not definite), the
character of $(x_0, y_0)$ remains undetermined.
Remark. Note that $f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) < f_{xy}^2(x_0, y_0)$ holds in particular
whenever $f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) < 0$, i.e. when $f_{xx}$ and $f_{yy}$ are both non-zero
but of different signs at $(x_0, y_0)$. That means that the function is concave in one of the
variables and convex in the other at this point. The inequality can, however, also hold
when $f_{xx}$ and $f_{yy}$ share the same sign, provided that the cross-derivative is large enough
in absolute value.
Remark. Note that similarly to the single variable case, the first two points
of Theorem 9.5 are only sufficient conditions but a point that does not satisfy
any of them could still be a local minimum or maximum. However, further
verification would be necessary to decide in cases described in the fourth
point.
Example 9.1. A saddle point $(x_0, y_0)$ is an interesting point. It can for
instance be a point for which the function $g(x) = f(x, y_0)$ attains a
local maximum (or minimum) at $x_0$ and the function $h(y) = f(x_0, y)$ attains a
local minimum (or maximum) at $y_0$. However, f clearly cannot attain a local
minimum or maximum at such a point since both greater and smaller values
are achieved by the function in close proximity of $(x_0, y_0)$. An example of
a function that has a saddle point is the function $f(x, y) = (x - 2)^2 - (y + 3)^2$.
We have
\[ f_x(x, y) = 2(x - 2) \quad \text{and} \quad f_y(x, y) = -2(y + 3). \]
Clearly, for both first order derivatives to be 0, we need x = 2 and y = −3,
such that our candidate point is $(x_0, y_0) = (2, -3)$. If we look at the second
order partial derivatives, we get $f_{xx} = 2$, $f_{yy} = -2$ and $f_{xy} = 0$, such
that the point (2, −3) is a saddle point. Around such points, the graph of the function
indeed resembles a saddle (which is where the name comes from), which
we can confirm visually by plotting the function.

library(emdbook)
f <- function(x, y) (x - 2)^2 - (y + 3)^2
curve3d(f, from = c(0,-5), to = c(4,-1), sys3d = "wireframe")

[Figure: wireframe plot of f(x, y) for x ∈ [0, 4], y ∈ [−5, −1].]

Summarizing Theorems 9.4 and 9.5, to find the local minima and maxima of
a function of two variables, we follow this procedure:

1. Find the first order derivatives $f_x$ and $f_y$ and solve the system of
equations $f_x(x, y) = 0$, $f_y(x, y) = 0$.
2. For each candidate point from point 1., check the convexity or concavity
of the function:
(a) Check the direct second order partial derivatives at the point. If
they are nonzero and share the same sign, the critical point is a
candidate for a point where f attains a local minimum ($f_{xx} > 0$
and $f_{yy} > 0$) or a local maximum ($f_{xx} < 0$ and $f_{yy} < 0$). Otherwise
the point is a saddle point (if $f_{xx} f_{yy} - f_{xy}^2 < 0$) or the test is
inconclusive (if $f_{xx} f_{yy} - f_{xy}^2 = 0$).
(b) If $f_{xx} f_{yy} > 0$, check also the cross-derivatives. If $f_{xx} f_{yy} - f_{xy}^2 > 0$,
the candidate point is indeed a local extreme. Otherwise the test
is inconclusive (if $f_{xx} f_{yy} - f_{xy}^2 = 0$) or the point is a saddle point
(if $f_{xx} f_{yy} - f_{xy}^2 < 0$).
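Step 2 of this procedure can be written as a short R function. This is a sketch under the assumption that the second order partial derivatives at the critical point are already known; the function name classify_critical_point is hypothetical.

```r
# Second derivative test at a critical point, given fxx, fyy and fxy there
classify_critical_point <- function(fxx, fyy, fxy) {
  det_H <- fxx * fyy - fxy^2
  if (det_H < 0) return("saddle point")
  if (det_H == 0) return("inconclusive")
  # det_H > 0 implies that fxx and fyy are nonzero and share the same sign
  if (fxx > 0) "local minimum" else "local maximum"
}

classify_critical_point(2, -2, 0)   # (x - 2)^2 - (y + 3)^2 at (2, -3)
classify_critical_point(2, 2, 0)    # x^2 + y^2 at (0, 0)
classify_critical_point(0, 0, 0)    # e.g. x^4 + y^4 at (0, 0): no decision
```

The last call shows the limits of the test: the function x⁴ + y⁴ does attain a minimum at the origin, but the second order derivatives alone cannot confirm it.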
Problem 9.2. A firm producing two goods in amounts x and y has the profit
function
Π(x, y) = 64x − 2x2 + 4xy − 4y 2 + 32y − 14.
Find the profit maximizing values of x and y and the maximal profit.
Solution. In the first step, we find the first order partial derivatives and
set them equal to 0. We get the following system of two equations in two
variables:
Πx (x, y) = 64 − 4x + 4y = 0
Πy (x, y) = 4x − 8y + 32 = 0.
The solution of this system is the point (x0 , y0 ) = (40, 24). To check whether
at this point the function really attains a local maximum, we perform the
second derivative test. We have
Πxx (x, y) = −4,
Πyy (x, y) = −8,
Πxy (x, y) = 4.
Since $\Pi_{xx} < 0$, $\Pi_{yy} < 0$ and $\Pi_{xx}\Pi_{yy} > \Pi_{xy}^2$, we conclude that $(x_0, y_0) =
(40, 24)$ are the profit maximizing amounts. The profit at this point is
$\Pi(40, 24) = 1650$.
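As a numerical cross-check of the second derivative test, one can approximate the second order partial derivatives at the candidate point. The following is only a sketch: it assumes that the base R function optimHess (from the stats package) is an acceptable tool for this purpose, and the name profit_vec is introduced here purely for illustration.

```r
# Profit function of Problem 9.2, written as a function of one vector
# argument (profit_vec is a name chosen for this sketch).
profit_vec <- function(a) {
  x <- a[1]
  y <- a[2]
  64*x - 2*x^2 + 4*x*y - 4*y^2 + 32*y - 14
}

# Numerical approximation of the second order partial derivatives
# at the candidate point (40, 24).
H <- optimHess(c(40, 24), profit_vec)
round(H, 3)

# Second derivative test: both diagonal entries negative and
# H[1, 1] * H[2, 2] > H[1, 2]^2 indicate a local maximum.
H[1, 1] < 0 && H[2, 2] < 0 && H[1, 1] * H[2, 2] > H[1, 2]^2

# Profit at the candidate point
profit_vec(c(40, 24))
```

The entries of H should be close to the analytical values Πxx = −4, Πyy = −8 and Πxy = 4.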
Let us also visualize the function in R.

profit <- function(x, y) {64*x - 2*x^2 + 4*x*y - 4*y^2 + 32*y - 14}
curve3d(profit, from = c(0, 0), to = c(100, 100),
        varnames = c("x", "y"), sys3d = "wireframe")

[Figure: wireframe plot of profit(x, y) over x and y.]

curve3d(profit, from = c(20, 10), to = c(50, 30),
        varnames = c("x", "y"), sys3d = "contour", n = 100)
points(40, 24, col = 2, pch = 3)

[Figure: contour plot of profit(x, y) for x between 20 and 50 and y between 10 and 30, with the point (40, 24) marked.]

Problem 9.3. The demands for a monopolist’s two products are determined
by the equations
p = 25 − x, q = 24 − 2y
where p and q are prices per unit of the two goods, and x and y are the
corresponding quantities. The costs of producing and selling x units of the
first good and y units of the other are

\[ C(x, y) = 3x^2 + 3xy + y^2. \]

Find the monopolist’s profit Π(x, y) from producing and selling x units of
the first good and y units of the other. Then, find the values of x and y that
maximize Π (and verify that you have found the maximum) and calculate
the maximum profit.
Solution. The revenue from selling x units of a product at the price p is
px, so the monopolist’s revenue from selling x and y units,
respectively, of the two products is

\[ R(x, y) = px + qy = (25 - x)x + (24 - 2y)y. \]

Correspondingly, the profit, which is the difference between the revenue and
the costs, is given by

\[ \Pi(x, y) = R(x, y) - C(x, y) = 25x - 4x^2 + 24y - 3y^2 - 3xy. \]



Now that we have the profit function, we can proceed to write down the first
order conditions to find the critical point(s):

Πx (x, y) = 25 − 8x − 3y = 0
Πy (x, y) = 24 − 6y − 3x = 0.

The solution of this system is $(x_0, y_0) = (2, 3)$. We verify

\[ (-8) \cdot (-6) = \Pi_{xx}\Pi_{yy} > \Pi_{xy}^2 = (-3)^2 \]

(and both direct second order partial derivatives are negative), so
these really are profit maximizing quantities. We have $\Pi(2, 3) = 61$.
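Since the first order conditions of Problem 9.3 form a linear system, the critical point can also be computed in R. The sketch below uses the base R function solve; the names A, b, crit and profit3 are chosen here only for illustration.

```r
# First order conditions of Problem 9.3 written as A %*% c(x, y) = b:
#   8x + 3y = 25
#   3x + 6y = 24
A <- matrix(c(8, 3,
              3, 6), nrow = 2, byrow = TRUE)
b <- c(25, 24)
crit <- solve(A, b)
crit

# Profit function of Problem 9.3 (profit3 is a name chosen for this sketch)
profit3 <- function(x, y) 25*x - 4*x^2 + 24*y - 3*y^2 - 3*x*y
profit3(crit[1], crit[2])
```

The result should reproduce the critical point (2, 3) and the maximal profit 61 found above.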

9.2.1 (Unconstrained) Multivariate optimization in R

To optimize functions of several variables in R, one can make use of the
function optim. This function requires a starting point in the argument
par, which may be any point in the domain of the function that should
be optimized, and the function itself in the argument fn. However, it is
important to note here that fn must be defined as a function of one argument
only; several variables of the function therefore must be passed as a single
object, namely a vector. We will illustrate the use of optim on the profit
function from Problem 9.2. Before we continue to discuss how optim works,
let us redefine the function profit that we used for visualization in the
form needed by optim.

profit2 <- function(a) {
  profit(a[1], a[2])
}

Note that the function profit2 only takes one argument, but inside the body
of the function we access a[1] and a[2] which means that a is expected to
be a vector of (at least) two values. The first entry in a plays the role of the
variable x of the function profit, while the second entry plays the role of
y. Recall that we discussed these two possible ways of defining multivariate
functions at the beginning of Chapter 8.

Just like optimize, optim minimizes fn by default. If we want to maximize
it instead, this can be done by means of control =
list(fnscale = -1). This works because multiplying a function by -1 mirrors
its graph (about the x-axis in the case of a single variable function), which
means that any (local) minimum becomes a (local) maximum and any (local)
maximum becomes a (local) minimum. Like with optimize, the output of optim
is a list. In this case, it contains more objects; for our purposes, the
interesting parts are par, which contains the point at which the minimum or
maximum was found, and value, which gives the optimal objective value.

outmv <- optim(c(0.1, 0.1), profit2, control = list(fnscale = -1))


outmv$par

## [1] 40.00004 24.00006

outmv$value

## [1] 1650
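The effect of fnscale = -1 can be illustrated by minimizing the negative of the profit function instead. This is only a sketch; the names profit_vec, neg_profit and out_min are introduced here for illustration, and the profit function from Problem 9.2 is repeated so that the example runs on its own.

```r
# Profit function from Problem 9.2, as a function of one vector argument.
profit_vec <- function(a) {
  64*a[1] - 2*a[1]^2 + 4*a[1]*a[2] - 4*a[2]^2 + 32*a[2] - 14
}

# Maximizing profit_vec is the same as minimizing its negative.
neg_profit <- function(a) -profit_vec(a)

# No fnscale needed: optim minimizes neg_profit by default.
out_min <- optim(c(0.1, 0.1), neg_profit)

out_min$par      # approximately the maximizer (40, 24)
-out_min$value   # change the sign back to obtain the maximal profit
```

With this approach one has to remember to change the sign of value back, which is one reason why the fnscale option is convenient.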

9.3 Constrained optimization

In constrained optimization, the search area is further restricted by some
conditions on the variable(s). In the case of single variable functions, this
simply means that one focuses only on a particular interval (or several intervals).
Let us consider a function f. If the interval of interest is open, (a, b), any local
minimum or maximum must satisfy the first order necessary condition, as any
point of an open interval is its interior point. If the interval is (half) closed,
that is, we consider [a, b), (a, b] or [a, b], then any of its included endpoints
will in fact present a local minimum or local maximum. If f is increasing
(decreasing) at a, then f(a) ≤ (≥) f(x) for all a ≤ x < β for some β > a, and a is
a local minimum (maximum) of f on the given interval (if it is included in the
interval). Similarly, if f is increasing (decreasing) at b, then f(b) ≥ (≤) f(x)
for all β < x ≤ b for some β < b, and b is a local maximum (minimum) of f on
the interval (if it is included in the interval).

In the case of functions of several variables, the situation is more complicated.
There are various types of constrained problems. In these lecture notes, we
only focus on one particular type of constrained optimization problem,
namely problems where the objective function is a function of two variables
only, and there is only one constraint in the form of an equality. Formally,
this means that we deal with problems of the form

min(max) f(x, y) s.t. g(x, y) = c (9.1)

where g is another function of two variables and c ∈ R is a constant.
Together, g and c form the constraint. Generalizations of such problems
include functions with n > 2 variables, more than one constraint, or constraints
in the form of an inequality.

Typical examples of problems in the form (9.1) are minimizing production
costs while being able to produce required amounts of a certain product, or
maximizing profit while considering certain budget constraints. Problems of
the form (9.1) can often be solved by a method called the Lagrange multiplier
method.

In the Lagrange multiplier method (or Lagrange method for short), the first
step is to define the Lagrange function, also called the Lagrangian. To this
end, one introduces a new variable λ to combine the objective function and
the constraint:

L(x, y, λ) = f (x, y) − λ(g(x, y) − c). (9.2)

In the next step, one considers the first order necessary conditions applied to
the Lagrangian, as if we were optimizing this new function of three variables.
The idea behind the method is that we now consider an unconstrained
function in which, however, there is a penalty governed by λ, called the Lagrange
multiplier, if the constraint of the original problem is not satisfied, i.e. if
g(x, y) − c ≠ 0.

The first order necessary conditions amount to the system of equations

Lx (x, y, λ) = fx (x, y) − λgx (x, y) = 0 (9.3)
Ly (x, y, λ) = fy (x, y) − λgy (x, y) = 0 (9.4)
Lλ (x, y, λ) = −g(x, y) + c = 0 (9.5)

where we can notice that the last equation in the system, (9.5), is in fact
nothing else than the constraint of the original problem given in (9.1).

After solving this system of equations, we get the stationary points of the
Lagrangian, triples of values (x0 , y0 , λ0 ). Any (x, y) that are a solution of
the constrained optimization problem given in (9.1) must be part of such a
triple.

Theorem 9.6. If the point (x0 , y0 ) is a solution of the constrained optimization
problem (9.1), then there is λ0 ∈ R such that the triple (x0 , y0 , λ0 ) solves the
system of equations (9.3)-(9.5), that is, (x0 , y0 , λ0 ) is a stationary point of
the Lagrangian.

To find out whether (x0 , y0 ) of any candidate triple is in fact a solution to the
optimization problem, we need to check, similarly to unconstrained problems,
some second order derivatives. To this end, we will consider the second order
derivatives of the Lagrangian with respect to the variables of the original
problem, x and y.
Theorem 9.7. Consider the constrained problem in (9.1) and let $(x_0, y_0, \lambda_0)$
be a stationary point of its Lagrangian.

• If $L_{xx}(x_0, y_0, \lambda_0) > 0$, $L_{yy}(x_0, y_0, \lambda_0) > 0$ and moreover
$L_{xx}(x_0, y_0, \lambda_0) L_{yy}(x_0, y_0, \lambda_0) > L_{xy}(x_0, y_0, \lambda_0)^2$, then $(x_0, y_0)$ is a solution
for the minimization problem.
• If $L_{xx}(x_0, y_0, \lambda_0) < 0$, $L_{yy}(x_0, y_0, \lambda_0) < 0$ and moreover
$L_{xx}(x_0, y_0, \lambda_0) L_{yy}(x_0, y_0, \lambda_0) > L_{xy}(x_0, y_0, \lambda_0)^2$, then $(x_0, y_0)$ is a solution
for the maximization problem.
Remark. Note that Theorem 9.7 gives sufficient conditions for (x0 , y0 ) to
be a solution of the constrained optimization problem. If neither of the two
conditions holds, the point could still be a solution, but further verification
would be necessary. In these lecture notes, we do not go into the details of
the more involved tests for optimality.
Problem 9.4. Consider a firm that sells two products. If x and y units of
the two products are sold, respectively, the total profit is given by
\[ \Pi(x, y) = 80x - 2x^2 - xy - 3y^2 + 100y. \]
The firm’s maximum output capacity is x + y = 12. (For instance, both
products are produced using the same machine and in some given time period,
only 12 pieces altogether can be produced.) What is the maximal profit?
Solution. We have a constrained maximization problem, with the objective
function being the profit function, $f(x, y) = \Pi(x, y)$, and the constraint
given by the maximum output capacity, that is, $g(x, y) = x + y$ and $c = 12$.
To find the maximal profit under this constraint, we start by writing down the
Lagrangian:
\[ L(x, y, \lambda) = 80x - 2x^2 - xy - 3y^2 + 100y - \lambda(x + y - 12). \]

Upon differentiating L with respect to x, y and λ in turn, we get the following
system of equations:

Lx (x, y, λ) = 80 − 4x − y − λ = 0
Ly (x, y, λ) = −x − 6y + 100 − λ = 0
Lλ (x, y, λ) = −x − y + 12 = 0.

This system has exactly one solution, namely the triple $(x_0, y_0, \lambda_0) = (5, 7, 53)$.
To check that we have found a maximum of the objective function under the given
constraint, we check the second order derivatives of the Lagrangian. We
get $L_{xx} = -4$, $L_{yy} = -6$ and $L_{xy} = -1$. Since $L_{xx} < 0$, $L_{yy} < 0$ and
$L_{xx} L_{yy} > L_{xy}^2$, x = 5 and y = 7 is the profit maximizing product combination,
with the maximal profit $\Pi(5, 7) = 868$.
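Because the objective function of Problem 9.4 is quadratic and the constraint is linear, the first order conditions of the Lagrangian form a linear system in x, y and λ. As a sketch, this system can be solved with the base R function solve; the names A, b and sol are chosen here only for illustration.

```r
# First order conditions of the Lagrangian in Problem 9.4, rearranged
# into the linear system A %*% c(x, y, lambda) = b:
#   4x +  y + lambda = 80
#    x + 6y + lambda = 100
#    x +  y          = 12
A <- matrix(c(4, 1, 1,
              1, 6, 1,
              1, 1, 0), nrow = 3, byrow = TRUE)
b <- c(80, 100, 12)
sol <- solve(A, b)
names(sol) <- c("x", "y", "lambda")
sol
```

This shortcut only applies when the first order conditions happen to be linear; in general, the system has to be solved by other means.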

In the case of the simple constraint x + y = 12, there is an alternative way of
solving this problem. Clearly, if x + y = 12, then y = 12 − x. We can substitute
this for y in the objective function to get a single variable maximization problem
of the form
\[ \max \Pi_2(x) = 80x - 2x^2 - x(12 - x) - 3(12 - x)^2 + 100(12 - x) = -4x^2 + 40x + 768. \]
Differentiating this function gives the first order necessary
condition $-8x + 40 = 0$, which leads to x = 5, and since the second derivative is
−8, this is the maximum. Plugging this into y = 12 − x yields y = 7. However,
this strategy does not work for more involved constraints.
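The substitution approach can be sketched in R with the single variable optimizer optimize. The name profit_sub is introduced here for illustration, and the search interval [0, 12] is an assumption reflecting that both quantities should be nonnegative.

```r
# Profit of Problem 9.4 after substituting y = 12 - x
# (profit_sub is a name chosen for this sketch).
profit_sub <- function(x) {
  y <- 12 - x
  80*x - 2*x^2 - x*y - 3*y^2 + 100*y
}

res <- optimize(profit_sub, interval = c(0, 12), maximum = TRUE)
res$maximum        # approximately x = 5
res$objective      # approximately the maximal profit 868
12 - res$maximum   # the corresponding y, approximately 7
```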
Finally, let us plot the problem setting in R. The best way to consider (two-
variable) constrained problems is through a contour plot, in which next to
the level curves of the function, one also plots the constraint itself (red line).

P <- function(x, y) {80*x - 2*x^2 - x*y - 3*y^2 + 100*y}
curve3d(P, to = c(12, 12), sys3d = "contour")
abline(a = 12, b = -1, col = "red")
points(5, 7, col = 2, pch = 3)

[Figure: contour plot of P(x, y) for x and y between 0 and 12, with the constraint x + y = 12 drawn as a red line and the point (5, 7) marked.]

Maximizing the function under the constraint graphically means finding the
highest level curve that crosses or touches the constraint. The point at which
this level curve touches the constraint is the optimal solution of the problem.
The value at the level curve is the maximal possible value of the objective
function.

Interpretation of the Lagrange multiplier

While the ultimate goal of the Lagrange method is to find the x and y that
solve the optimization problem, the Lagrange multiplier is not only a means to
an end. It has an actual interpretation, which we give in terms of a particular
type of maximization problem, but which can be extended to general problems of
the form (9.1).

Consider a maximization problem in which c indicates the available stock of
a certain resource, while the objective function measures the profit obtained
through an activity that uses this resource as an input. Then the multiplier
gives the maximum increase in profit that could be obtained if one more unit
of the input (the resource) were available. Hence, it is the maximum amount
that the decision maker (e.g. the producer having profit from production) is
willing to pay for one additional unit of it. Due to this interpretation, the
Lagrange multiplier is also called the shadow price.

In the context of Problem 9.4, this means that if the maximal production
were increased by one unit, that is, if we had x + y = 13, this would
lead to an increase of approximately λ0 = 53 monetary units in the profit.
Therefore, the producer would be willing to pay at most 53 monetary units
for a chance to increase the overall production output by one unit.
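The word "approximately" can be made concrete with a short numerical sketch. It reuses the substitution y = c − x, which is possible for this particular constraint; the names max_profit and profit_sub are chosen here for illustration, and the search interval [0, cap] is an assumption reflecting nonnegative quantities.

```r
# Maximal profit of Problem 9.4 as a function of the capacity cap,
# obtained by substituting y = cap - x (max_profit is a name chosen
# for this sketch).
max_profit <- function(cap) {
  profit_sub <- function(x) {
    y <- cap - x
    80*x - 2*x^2 - x*y - 3*y^2 + 100*y
  }
  optimize(profit_sub, interval = c(0, cap), maximum = TRUE)$objective
}

# Actual change in the maximal profit when the capacity goes from 12 to 13
max_profit(13) - max_profit(12)
```

The difference is about 51.56, which is close to, but not exactly, λ0 = 53: the multiplier is the rate of change of the maximal profit with respect to c at c = 12, so it describes small changes of the constraint more accurately than a change by a whole unit.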
Problem 9.5. Minimize the function $x + 2y$ under the constraint $x^2 + y^2 = 1$.
Solution. Again, we start by writing down the Lagrange function of the
problem:
\[ L(x, y, \lambda) = x + 2y - \lambda(x^2 + y^2 - 1). \]
The first order partial derivatives give us the system
\begin{align}
L_x(x, y, \lambda) &= 1 - 2\lambda x = 0 \tag{9.6}\\
L_y(x, y, \lambda) &= 2 - 2\lambda y = 0 \tag{9.7}\\
L_\lambda(x, y, \lambda) &= -x^2 - y^2 + 1 = 0. \tag{9.8}
\end{align}
From equations (9.6) and (9.7), we obtain that $x = \frac{1}{2\lambda}$ and $y = \frac{1}{\lambda}$. If we plug
these expressions into (9.8) or, equivalently, into the original constraint,
we obtain
\[ \frac{1}{4\lambda^2} + \frac{1}{\lambda^2} = 1, \]
which has two solutions: $\lambda_1 = -\frac{\sqrt{5}}{2}$ and $\lambda_2 = \frac{\sqrt{5}}{2}$. If we plug these into
the expressions for x and y, we have two candidate points:
$(x_1, y_1, \lambda_1) = \left(-\frac{1}{\sqrt{5}}, -\frac{2}{\sqrt{5}}, -\frac{\sqrt{5}}{2}\right)$ and
$(x_2, y_2, \lambda_2) = \left(\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}}, \frac{\sqrt{5}}{2}\right)$.
The second order partial derivatives of the Lagrangian with respect to x
and y are $L_{xx}(x, y, \lambda) = -2\lambda$, $L_{yy}(x, y, \lambda) = -2\lambda$ and $L_{xy}(x, y, \lambda) = 0$. For
$\lambda_1 = -\frac{\sqrt{5}}{2}$, we get $L_{xx} = L_{yy} = \sqrt{5} > 0$ and $L_{xx}L_{yy} = 5 > 0 = L_{xy}^2$,
so the conditions for a minimum are satisfied and the point
$\left(-\frac{1}{\sqrt{5}}, -\frac{2}{\sqrt{5}}\right)$ is the solution of the problem. Note that
$\left(\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}}\right)$ would solve
the corresponding maximization problem.
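The result of Problem 9.5 can be checked numerically with a small sketch. It relies on the fact that every point of the unit circle can be written as (cos θ, sin θ), which is a reformulation used only for this check and not part of the Lagrange method; the names obj and res are chosen for illustration.

```r
# Points on the unit circle can be written as (cos(theta), sin(theta)),
# which turns Problem 9.5 into a single variable problem in theta.
obj <- function(theta) cos(theta) + 2 * sin(theta)

res <- optimize(obj, interval = c(0, 2 * pi))

# Compare with the analytical solution (-1/sqrt(5), -2/sqrt(5))
c(x = cos(res$minimum), y = sin(res$minimum))
c(-1 / sqrt(5), -2 / sqrt(5))

# Minimal value of the objective; compare with -sqrt(5)
res$objective
```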

9.4 Constrained optimization in R

In these lecture notes, we only focus on solving constrained optimization
problems with linear equality constraints, like in Problem 9.4, in R. Students
who are interested in functions that solve more general
classes of optimization problems are encouraged to have a look at the package
ROI.

To solve minimization problems with linear constraints, R offers the function
constrOptim. As its arguments, the function requires a starting point theta,
which must strictly satisfy the constraint(s); the function f to be optimized;
a function grad returning the vector of the first order partial derivatives of f
(which can, however, be set to NULL, in which case an algorithm that does not
require the derivatives will be used); and the constraint(s), provided through
a matrix ui and a vector ci.

As mentioned above, the constraint(s) must be provided in the form of a
matrix and a vector. Moreover, the function works for constraints in the form
of ≥ inequalities, that is, in our problem formulation, g(x, y) ≥ c. Therefore,
to be able to apply the function constrOptim to Problem 9.4, we first need
to rewrite the problem in the corresponding form. Note that

g(x, y) = c

is equivalent to the following system of inequalities:

g(x, y) ≥ c
g(x, y) ≤ c.

Moreover, the second inequality can be written as a ≥ inequality if we
multiply it by −1: −g(x, y) ≥ −c. Finally, to account for the approximate
nature of numerical algorithms, we relax the conditions a little bit to allow
for tiny deviations from the constraint g(x, y) = c. In practice, this amounts
to the following constraints:

\begin{align*}
g(x, y) &\ge c - 10^{-5} \\
-g(x, y) &\ge -c - 10^{-5}.
\end{align*}

Note that we could also use a different (small positive) constant instead of
$10^{-5}$.
Remark. The need to relax the equality constraint has to do with the fact
that the algorithms used by constrOptim belong to the class of so-called
interior point algorithms. Loosely speaking, this means that the starting point
of the algorithm must be a point from which one could move in any direction
and still be able to stay within the constraint with an appropriately small
step size. The details of this procedure, however, are well beyond the scope
of this course.

Moreover, recall that the variables of interest, x and y, are in fact quantities.
When solving the problem analytically, we keep this interpretation in
mind and could, if necessary, decide to only use a positive solution (in case
there were several). This information is only given implicitly, however, and R
would not be able to make this decision. Therefore we need to add the constraints
x ≥ 0 and y ≥ 0. The full set of ≥ constraints is therefore
\begin{align*}
1 \cdot x + 1 \cdot y &\ge 12 - 10^{-5} \\
(-1) \cdot x + (-1) \cdot y &\ge -12 - 10^{-5} \\
1 \cdot x + 0 \cdot y &\ge 0 \\
0 \cdot x + 1 \cdot y &\ge 0.
\end{align*}

We recognize that the above system can be written in the following way:
\[
\begin{pmatrix} 1 & 1 \\ -1 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
\ge
\begin{pmatrix} 12 - 10^{-5} \\ -12 - 10^{-5} \\ 0 \\ 0 \end{pmatrix}.
\]
We will use the matrix on the left hand side and the vector on the right hand
side to pass the constraint(s) to the function. The order of the rows does not
matter as long as each row of the matrix is paired with the corresponding entry
of the vector; the code below lists the nonnegativity constraints first.

constM <- matrix(c(1, 0, 0, 1, -1, -1, 1, 1), nrow = 4, ncol = 2,
                 byrow = TRUE)
constV <- c(0, 0, -12 - 1e-5, 12 - 1e-5)
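Before calling constrOptim, it can be useful to check that a candidate starting point lies strictly inside the region described by the matrix and the vector. The following sketch repeats the definitions of constM and constV so that it runs on its own; the names theta0 and slack are introduced here for illustration.

```r
# Constraint matrix and vector for Problem 9.4 (repeated from the text)
constM <- matrix(c(1, 0, 0, 1, -1, -1, 1, 1), nrow = 4, ncol = 2,
                 byrow = TRUE)
constV <- c(0, 0, -12 - 1e-5, 12 - 1e-5)

# constrOptim needs constM %*% theta - constV > 0 in every row.
theta0 <- c(6, 6)          # candidate starting point
slack <- as.vector(constM %*% theta0 - constV)
slack
all(slack > 0)             # TRUE means theta0 can be used as a starting point
```

A point such as c(12, 12) violates the third row and would therefore be rejected as a starting point.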

Like optim, the function constrOptim also requires the function to only use
one argument. Therefore we redefine the profit function before providing
all arguments to constrOptim. To make sure that our function will be
maximized, we use control = list(fnscale = -1), just like in the case
of optim. We set grad = NULL such that we don’t have to provide the first
order partial derivatives of the function.

P2 <- function(a) P(a[1], a[2])
solution <- constrOptim(c(6, 6), P2, grad = NULL, ui = constM,
                        ci = constV, control = list(fnscale = -1))

Again, the output of this optimization function is a list, of which we are
interested in the objects par (the minimal or maximal point) and value (the
optimal value of the objective function).

solution$par

## [1] 4.999023 7.000986

solution$value

## [1] 868.0005

Note that the output of the function does not contain the optimal value of
the Lagrange multiplier.
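If the multiplier is needed, it can be approximated from the numerical solution by means of the first order conditions (9.3) and (9.4). The following is only a sketch under the assumption that the numerical solution is close to the true optimum; it repeats the definitions from above so that it runs on its own, and the names lambda_from_x and lambda_from_y are introduced here for illustration.

```r
# Objective of Problem 9.4 as a function of one vector argument
P2 <- function(a) 80*a[1] - 2*a[1]^2 - a[1]*a[2] - 3*a[2]^2 + 100*a[2]

constM <- matrix(c(1, 0, 0, 1, -1, -1, 1, 1), nrow = 4, ncol = 2,
                 byrow = TRUE)
constV <- c(0, 0, -12 - 1e-5, 12 - 1e-5)
solution <- constrOptim(c(6, 6), P2, grad = NULL, ui = constM,
                        ci = constV, control = list(fnscale = -1))

x <- solution$par[1]
y <- solution$par[2]

# From (9.3) and (9.4) with g(x, y) = x + y, so that g_x = g_y = 1:
lambda_from_x <- 80 - 4*x - y      # f_x(x, y) / g_x(x, y)
lambda_from_y <- 100 - x - 6*y     # f_y(x, y) / g_y(x, y)
c(lambda_from_x, lambda_from_y)    # both should be close to 53
```

The two values differ slightly from each other because the numerical solution only satisfies the first order conditions approximately.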

9.5 Global extremes

Recall that if a function f attains a global minimum (maximum) over a subset
B of its domain (or over the domain if a specific set is not mentioned) at a
point c ∈ B, then f does not attain a lower (higher) value in any other point
of B. There may be several such points, e.g. c1 and c2 , but in that case, from
the definition of a global extreme, f (c1 ) = f (c2 ).

To find the global minimum (maximum), one needs to consider all of the
local minima (maxima) and compare them. Moreover, one needs to consider
the overall behaviour of the function. If a function does not have any local
minimum (maximum), then there is also no global minimum (maximum).
An example of a function that does not have any local or global extremes is
$x^3$.

Other functions do have local extremes, but no global extremes. For instance,
$f(x) = x^3 - 3x$ attains a local maximum at x = −1 with the value f (−1) = 2,
and a local minimum at x = 1 with the value f (1) = −2. But it does not
have any global extremes because as x approaches −∞, f (x) approaches
−∞, too, whereas for x approaching ∞, f (x) also increases towards ∞.

Finally, of course there are also functions that do have global extremes. A
very simple example is the function $x^2$.

We have already mentioned that if we consider a function on a (half) closed
interval, the (included) interval endpoints are local extremes of the function.
Therefore to find the global extremes of a function on an interval, it is often
enough to check all local extremes that are found by the first order necessary
conditions, as well as interval endpoints.

Example 9.2. Consider again $f(x) = x^3 - 3x$, now on the interval [−3, 1.5].
The two local extremes satisfying the first order necessary conditions are
x = −1 with f (−1) = 2 and x = 1 with f (1) = −2. When checking the
interval endpoints, we obtain f (−3) = −18 and f (1.5) = −1.125. Therefore,
on the given interval, the global minimum is at x = −3 and the global
maximum is at x = −1.
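The comparison in Example 9.2 can be sketched in R by evaluating the function at all candidate points. The names candidates and values are chosen here for illustration; the candidate points themselves are taken from the example.

```r
f <- function(x) x^3 - 3*x

# Candidates on [-3, 1.5]: the interval endpoints and the stationary
# points x = -1 and x = 1 found from f'(x) = 3x^2 - 3 = 0.
candidates <- c(-3, -1, 1, 1.5)
values <- f(candidates)
data.frame(x = candidates, f = values)

candidates[which.min(values)]  # location of the global minimum
candidates[which.max(values)]  # location of the global maximum
```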

9.6 Exercises
9.1 For the following functions, find all of their stationary points and decide
whether the function attains a local minimum, local maximum or neither at
these points.

a) $f_1(x) = \frac{(x^2+2)^2}{4}$
b) $f_2(x) = \frac{x^3+1}{x+1}$
c) $f_3(x) = -\frac{1}{(3x^4+x^2)^{10}}$
d) $f_4(x) = (x - 3)\, e^{x^2}$
e) $f_5(x) = 3x^4 + 4x^3 - 30x^2 + 36x + 10$
f) $f_6(x) = x \ln \frac{1}{x}$

Verify your results in R; choose the interval to search in appropriately.

9.2 The following plots show the derivatives of functions f : [0, 7] → R,
g : [0, 7] → R, h : [−5, 4] → R and i : [−3, 3] → R. Find
a) all stationary points of the functions;
b) all points x at which the functions attain their local minima;
c) all points x at which the functions attain their local maxima;
d) the intervals in which the functions are increasing;
e) the intervals in which the functions are decreasing;
f) the intervals in which the functions are convex (approximate - as well
as can be read off the graph);
g) the intervals in which the functions are concave (approximate - as well
as can be read off the graph).

[Figure: four panels showing the graphs of the derivatives f'(x), g'(x), h'(x) and i'(x), each plotted against x.]

9.3 Consider the function

\[ f(x) = x^3 - 18x^2 + 96x - 80. \]

Find all its local minima and maxima in the interval [3, 15]. Then find its
global minimum and maximum in this interval (if they exist).

9.4 Consider the function

\[ f(x) = x^4 - 8x^3 - 80x^2 + 15. \]

Find all its local minima and maxima in the interval (−20, 20). Then find
its global minimum and maximum in this interval (if they exist).

9.5 A firm produces a good with a cost function

\[ C(x) = 4x^2 - 20x + 1000 \]

and a revenue function

\[ R(x) = 2x^2 + 3000 \]

where x denotes the produced amount of the product (output). Find the
profit maximizing output and the corresponding maximal profit.

9.6 A firm produces a good with a revenue function

\[ R(x) = 3300x - 26x^2 \]

and a cost function

\[ C(x) = x^3 - 2x^2 + 420x + 750 \]

where x is the produced amount of the product (output). Find the profit
maximizing output and the corresponding maximal profit. Assume x > 0.

9.7 Find the stationary points of the following functions and classify them
(f attains local minimum, local maximum, saddle point, undetermined):

a) $g_1(x, y) = x^2 + y^2$
b) $g_2(x, y) = 3x^2 + 2y^3 - 6xy$
c) $g_3(x, y) = x^3 y - 3xy^3 + 8y$
d) $g_4(x, y) = x + 2y^4 - \ln(x^4 y^8)$

Use R to verify your answer; choose appropriate starting points to find all
extremes.
9.8 A monopolist offers two products at respective prices p1 and p2 . The
respective demands at these prices are

q1 = D1 (p1 , p2 ) = 240 − 4p1 + 3p2 ,
q2 = D2 (p1 , p2 ) = 400 + p1 − 2p2 .

The costs of producing one unit of the products are 21 and 35, respectively,
and the fixed costs are 2633. Find
a) the profit function as a function of the prices;
b) the profit maximizing prices p1 , p2 ;
c) the profit maximizing quantities q1 , q2 ;
d) the maximal profit and the costs at the profit maximizing prices.

9.9 The production outcome (in units) of a company producing a steel
component can be described by the function $P(h, s) = 500\, h^{1/4} s^{1/2}$ where h
represents the used number of human labor units and s the used number of
tons of steel. The costs are 15 EUR per human labor hour and 16 EUR per
ton of steel. The production has a production budget of 21000 EUR. Use the
Lagrange multiplier method to find the production maximizing values of h
and s under the given constraint and the shadow price of 1000 units of extra
budget.

9.10 A firm organizes a give-away in which products A and B will be distributed
among the winners. The total value of all wins should be 180. The individual
prices (values) are pA = 1 and pB = 8 per unit of A and B, respectively.
The cost of producing x units of product A and y units of product B is
$C(x, y) = 5x^2 - 2xy + 2y^2$. Find the cost minimizing combination of the
product amounts x and y (decimal numbers are accepted):
a) via the Lagrange multiplier method,
b) by converting the problem into a single variable optimization problem,
c) in R.

9.11 A firm produces two goods, whose produced amounts we denote by x
and y, respectively. The cost function is given by

\[ C(x, y) = 8x^2 - 2xy + 2y^2. \]

Due to some quota restrictions, the firm must produce overall x + y = 12
goods. The firm seeks to minimize the costs.
a) Write the Lagrangian and find the cost minimizing output levels via the
Lagrange method.
b) Find the minimal costs.
c) Find the (approximate) change in the value of the cost function caused
by a one unit change in the total number of goods that must be produced.
d) Which of the following codes can be used to solve the constrained
optimization problem stated above? Assume nonnegative amounts of
products produced. Try to find the answer without R first. Once you
have it, you may try and run it in R, which will also help you to verify
the answers to parts a) and b).

i)
f <- function(x, y) 8*x^2 - 2*x*y + 2*y^2
CM <- matrix(c(1, 0, 0, 1, 1, 1, -1, -1), nrow = 4, byrow = TRUE)
CV <- c(0, 0, 12 - 1e-5, -12 - 1e-5)
constrOptim(c(12, 12), f, ui = CM, ci = CV, grad = NULL)

ii)
f <- function(x) 8*x[1]^2 - 2*x[1]*x[2] + 2*x[2]^2
CM <- matrix(c(1, 0, 0, 1, 1, 1, -1, -1), nrow = 4, byrow = TRUE)
CV <- c(0, 0, 12 - 1e-5, -12 - 1e-5)
constrOptim(c(9, 3), f, ui = CM, ci = CV, grad = NULL)

iii)
f <- function(x) 8*x[1]^2 - 2*x[1]*x[2] + 2*x[2]^2
CM <- matrix(c(1, 0, 0, 1, 1, 1, -1, -1), nrow = 4, byrow = TRUE)
CV <- c(0, 0, 12 - 1e-5, -12 - 1e-5)
constrOptim(c(6, 6), f, ui = CM, ci = CV, grad = NULL,
            control = list(fnscale = -1))

9.12 A firm produces two goods, whose produced amounts we denote by x
and y, respectively. The cost function is given by

\[ C(x, y) = 45x^2 + 90xy + 90y^2. \]

Due to some quota restrictions, the firm’s output must satisfy 2x + 3y = 60.
The firm seeks to minimize the costs.
a) Write the Lagrangian and find the cost minimizing output levels via the
Lagrange method.
b) Find the minimal costs.
c) Find the (approximate) change in the value of the cost function caused
by a one unit change in the total number of goods that must be produced.

9.13 A firm produces two goods, whose produced amounts we denote by x
and y, respectively. The cost function is given by

\[ C(x, y) = 9y^2 - 23xy + 4x^2 - 50y - 40x + 991, \]

the revenue function is

\[ R(x, y) = 2x^2 - 21xy + 6y^2 - 10x - 30y + 1053. \]

Due to some quota restrictions, the firm’s output must satisfy x + 2y = 10.
a) Write the Lagrangian and find the profit maximizing output levels via
the Lagrange method.
b) Find the maximum profit.
c) Find the (approximate) change in the maximum profit caused by a one
unit change in the constraint on the total number of goods.
d) Which of the following codes can be used to solve the constrained
optimization problem stated above? Assume nonnegative amounts of
products produced. Try to find the answer without R first. Once you
have it, you may try and run it in R, which will also help you to verify
the answers to parts a) and b).

i)
f <- function(x, y) -2*x^2 + 2*x*y - 3*y^2 + 30*x + 20*y + 62
CM <- matrix(c(1, 0, 0, 1, 1, 2, -1, -2), nrow = 4, byrow = TRUE)
CV <- c(0, 0, 10 - 1e-5, -10 - 1e-5)
constrOptim(c(2, 4), f, ui = CM, ci = CV, grad = NULL,
            control=list(fnscale = -1))

ii)
f <- function(x) -2*x[1]^2 + 2*x[1]*x[2] - 3*x[2]^2 +
  30*x[1] + 20*x[2] + 62
CM <- matrix(c(1, 0, 0, 1, 1, 2, -1, -2), nrow = 4, byrow = TRUE)
CV <- c(0, 0, 10 - 1e-5, -10 - 1e-5)
constrOptim(c(8, 1), f, ui = CM, ci = CV, grad = NULL,
            control = list(fnscale = -1))

iii)
f <- function(x) -2*x[1]^2 + 2*x[1]*x[2] - 3*x[2]^2 +
  30*x[1] + 20*x[2] + 62
CM <- matrix(c(1, 0, 0, 1, 2, 1, -2, -1), nrow = 4, byrow = TRUE)
CV <- c(0, 0, 10 - 1e-5, -10 - 1e-5)
constrOptim(c(4, 3), f, ui = CM, ci = CV, grad = NULL,
            control = list(fnscale = -1))

iv)
f <- function(x,y) -2*x^2 + 2*x*y - 3*y^2 + 30*x + 20*y + 62
CM <- matrix(c(1, 0, 0, 1, 1, 2, -1, -2), nrow = 4, byrow = TRUE)
CV <- c(0, 0, 10 - 1e-5, -10 - 1e-5)
constrOptim(c(4, 3), f, ui = CM, ci = CV, grad = NULL)

9.14 A firm produces two goods, whose produced amounts we denote by x
and y, respectively. The cost function is given by
\[ C(x, y) = 10x^2 - 34xy + 3y^2 - 70x - 30y + 1073, \]
the revenue function is
\[ R(x, y) = y^2 - 32xy + 7x^2 - 50x + 990. \]
Due to some quota restrictions, the firm’s output must satisfy 4x + 2y = 20.
a) Write the Lagrangian and find the profit maximizing output levels via
the Lagrange method.
b) Find the maximum profit.
c) Find the (approximate) change in the maximum profit caused by a one
unit change in the constraint on the total number of goods.
d) Verify your calculations (answers to parts a) and b)) in R.

9.7 Further readings

Single variable optimization is the content of Chapter 9 of [1]. Specifically, we
suggest Sections 9.1-9.3 (global extremes) and Section 9.6 (local extremes)
as further readings for this topic. Of Chapter 9, we suggest the following
exercises to check your understanding of the topics:

• Section 9.2: Exercises 1-9;
• Section 9.3: Exercises 1a and 2-5;
• Section 9.6: Exercises 1-7;
• Review exercises 1-7, 8a-b, 9a and 10-12.

Chapter 17 of [1] covers the topic of multivariate unconstrained optimization.
The sections covered in this course are 17.1-17.4. The exercises we suggest
from this chapter are the following:

• all exercises of Section 17.1;
• Section 17.2: Exercises 1-5 and 7-8;
• Section 17.3: Exercises 1-2, 3a, 4a, 5;
• Section 17.4: Exercises 1a, 2a;
• Review exercises 1-5, 8a and 9a.

Finally, Chapter 18 of [1] discusses (multivariate) constrained optimization
with particular focus on the Lagrange multiplier method. In particular,
Sections 18.1-18.5 (with 18.4 offering a more detailed insight into why the
Lagrange multiplier method works) are relevant to the contents of the course.
To verify your understanding, we suggest the following exercises from this
chapter:

• Section 18.1: Exercises 1-6 and 8;
• Section 18.2: Exercises 3 and 4;
• Section 18.3: Exercises 1-4;
• Section 18.5: Exercises 2 and 4;
• Review exercises 1-3, 5a and 7a.
Chapter 10

Interest rates and time value of money

In this chapter, we consider the value of accounts, payments and investments
at different points in time, and how it is influenced by interest rates. We have
already introduced some basic notions of compounding in Sections 1.2.4 and
3.6. In this chapter, we will discuss the topics in more detail and will
particularly focus on how the frequency of (interest) payments changes the
value and on a special type of payment sequences, called annuities. We will
close the chapter with a short discussion about internal rate of return.

10.1 Interest periods and effective rates

In advertisements for accounts or loans, banks offer interest rates p.a., which
stands for per annum, that is, per year. These are also called nominal interest
rates. However, the interest usually does not accrue only once a year,
but several times a year. Often it accrues once a month, but there are
also instances of interest being added every day. In such cases, the nominal
interest is divided equally among all periods. For instance, if a bank offers
an interest rate of 1.5% p.a. with monthly compounding, then
every month an interest of $\frac{1.5}{12}\% = 0.125\%$ will be added to the account.

This can be formalized as follows: Consider an initial capital $S_0$ deposited
in a bank account compounded $n$ times a year at the nominal interest rate
$p\%$ p.a. Denote $r = \frac{p}{100}$. Let $S_n(t)$ be the value of the capital after $t$ years.
Then we have
$$S_n(t) = S_0\left(1 + \frac{r}{n}\right)^{nt}. \tag{10.1}$$
A special case is $n = 1$, where the interest accrues once a year. In that case,
as discussed in Problem 1.7, after $t$ years we have $S_1(t) = S_0(1 + r)^t$ in the
account. With $n$ compounding periods per year, not only does the periodic
interest (the interest rate corresponding to one compounding period) change
to $\frac{r}{n}$, but it is compounded $n$ times every year, thus in $t$ years, $nt$ times
altogether. The factor $q = 1 + \frac{r}{n}$ corresponding to one compounding period
is often referred to as the compounding factor.

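Assume now that we increase the payment frequency up to ∞. In that case,
we talk about continuous compounding and, with the substitution $m = \frac{n}{r}$
(for $r > 0$), the formula changes to
$$
\begin{aligned}
S_\infty(t) = \lim_{n\to\infty} S_n(t) &= \lim_{n\to\infty} S_0\left(1 + \frac{r}{n}\right)^{nt} && (10.2)\\
&= \lim_{m\to\infty} S_0\left(1 + \frac{1}{m}\right)^{mrt} && (10.3)\\
&= S_0\left(\lim_{m\to\infty}\left(1 + \frac{1}{m}\right)^{m}\right)^{rt} && (10.4)\\
&= S_0 e^{rt}. && (10.5)
\end{aligned}
$$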
Of course, in practice, there is a limit to how often in a year interest can be
added to an account. However, let us examine the behavior of the interest
with increasing $n$. We will use an interest rate of $r = 1$ (this corresponds to
$p = 100\%$; we use this unrealistic interest rate because it showcases the point we
are about to make particularly nicely) and examine the factor by which the
initial capital $S_0$ increases after $t = 1$ year. We define a function compound in
R that calculates the factor $\left(1 + \frac{r}{n}\right)^{nt}$ for a given interest rate r, number
of compounding periods n and number of years t and then consider the
factor within one year for $n = 1$ (yearly compounding), $n = 12$ (monthly
compounding), $n = 365$ (daily compounding) and $n = 365 \cdot 24 \cdot 60 \cdot 60$
(compounding every second). We then compare these factors to the value $e^{rt}$
for $r = 1$ and $t = 1$, i.e. $e$.

compound <- function(r = 0, n = 1, t = 1) {(1 + r/n)^(n*t)}


n <- c(1, 12, 365, 365*24*60*60)
compound(1, n)

## [1] 2.000000 2.613035 2.714567 2.718282



exp(1)

## [1] 2.718282

We can observe that with an increasing frequency of interest payments, the
factor by which the initial capital increases in one year increases, and it
approaches the value that would be obtained by continuous compounding.
For some more observations, let us plot the factor for n between 1 and
1000. To show how discrete (finite n) compounding approaches continuous
compounding in general, let us now choose a different (more realistic) value
of r.

r <- 0.05
n <- 1:1000
plot(n, compound(r, n))
abline(exp(r), 0, col=2)
[Figure: compound(r, n) plotted against n for n from 1 to 1000, with a horizontal line at exp(r).]

Just like previously, we observe that the factor increases with increasing $n$, and
in fact, it gets very close to the continuous factor relatively fast. Therefore,
while, as mentioned above, continuous compounding is not possible in practice,
it is an important theoretical tool used in many finance-mathematical and
economic models, since working with $e^{rt}$ is easier than working with $\left(1 + \frac{r}{n}\right)^{nt}$.

As is clear from the above considerations and results, the same interest rate
with a different frequency of interest payments leads to different outcomes.
Similarly, different nominal interest rates with different frequencies of payments
may lead to the same value in the account after the same period of time.
Therefore, to make interest rates with different compounding periods
comparable, we use the notion of the effective rate of interest. The effective rate
of interest is the rate at which the initial capital really increases within one
year. Note that with $n$ compounding periods, the capital after one year is
$$S_n(1) = S_0\left(1 + \frac{r}{n}\right)^{n},$$
which corresponds to an increase (change) by a factor of $\left(1 + \frac{r}{n}\right)^{n}$. Therefore,
the effective interest rate corresponds to
$$R = \left(1 + \frac{r}{n}\right)^{n} - 1. \tag{10.6}$$
Similarly, with continuous compounding, after one year the initial capital is
multiplied by a factor of $e^{r}$, which makes the effective rate of interest
$$R = e^{r} - 1. \tag{10.7}$$
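Formulas (10.6) and (10.7) can be sketched in R as follows. The function name effective_rate and the convention of passing n = Inf to request continuous compounding are assumptions made for this illustration only.

```r
# Effective annual rate for a nominal rate r compounded n times a year.
# Assumption for this sketch: n = Inf stands for continuous compounding.
effective_rate <- function(r, n = 1) {
  if (is.infinite(n)) exp(r) - 1 else (1 + r / n)^n - 1
}

effective_rate(0.05, 1)     # yearly compounding: equals the nominal rate
effective_rate(0.05, 12)    # monthly compounding
effective_rate(0.05, Inf)   # continuous compounding
```

For a positive nominal rate, the three values increase in this order, which matches the observation made with the function compound above.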

Though one usually thinks of interest rates as positive values, it is not always
the case. In fact, the past years saw not only 0, but even negative interest
rates.

Problem 10.1. In mid-2018, the nominal annual interest rate at the European
Central Bank for AAA-rated bonds with a maturity of 1 year was about
−0.7%. Compute the time it takes for such a bond to lose 10% of its original
value and the respective effective interest rate, assuming that the nominal
interest rate stays at this level, if the interest is ’paid’

a) yearly,

b) monthly,

c) continuously.

Solution. Note that we have $p = -0.7$, which corresponds to $r = -0.007$.
Denoting the original value of the bond by $S_0$, we are looking for $t$ such that
$S_n(t) = 0.9 S_0$ (losing 10% of value means being at 90% of the value). For
$n = 1$ (part a), this means solving
$$S_0 (1 - 0.007)^t = 0.9 S_0$$
for $t$, which gives
$$t = \frac{\ln 0.9}{\ln 0.993} = \log_{0.993} 0.9 \approx 14.9988.$$
The effective rate in this case is the same as the nominal rate.
For part b), we have $n = 12$, which changes the equality for $t$ to
$$S_0\left(1 - \frac{0.007}{12}\right)^{12t} = 0.9 S_0.$$
The solution is
$$t = \frac{\ln 0.9}{12 \ln\left(1 - \frac{0.007}{12}\right)} = \frac{\log_{1 - \frac{0.007}{12}} 0.9}{12} \approx 15.0471.$$
The effective rate is
$$R = \left(1 - \frac{0.007}{12}\right)^{12} - 1 \approx -0.006978.$$
Finally, with continuous compounding, we are looking for the solution of
$$S_0 e^{-0.007t} = 0.9 S_0,$$
which is
$$t = \frac{\ln 0.9}{-0.007} \approx 15.0515.$$
The effective interest rate is given by
$$R = e^{-0.007} - 1 \approx -0.006976.$$

Remark. Problem 10.1 shows that more frequent interest payments are not
only advantageous with positive interest rates (more earnings), but also with
negative interest rates as in that case, the value depreciates more slowly.
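The numbers in Problem 10.1 can be reproduced with a few lines of R. This is only a sketch of the calculation in the solution; the variable names are chosen for this illustration.

```r
r <- -0.007
n <- c(1, 12)   # yearly and monthly compounding

# Time until 90% of the value is left: solve (1 + r/n)^(n*t) = 0.9 for t
t_discrete <- log(0.9) / (n * log(1 + r / n))
# Effective rates according to formula (10.6)
R_discrete <- (1 + r / n)^n - 1

# Continuous compounding: solve exp(r*t) = 0.9 and use formula (10.7)
t_cont <- log(0.9) / r
R_cont <- exp(r) - 1

round(c(t_discrete, t_cont), 4)
round(c(R_discrete, R_cont), 6)
```

Because log and the arithmetic operators work element-wise on vectors, parts a) and b) are handled in a single call.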

10.2 Present and future value

Computing present values is equivalent to finding how much money would
need to be deposited in a bank account at a given interest rate today to
have a given amount of money in a (fictitious) bank account at a time point
in the future. If after $t$ years, a value of $K$ should be in the account with
compounding $n$ times per year at a nominal annual interest rate $r$, then the
present value $A$ is the number that satisfies
$$K = A\left(1 + \frac{r}{n}\right)^{nt},$$
that is, we have
$$A = \frac{K}{\left(1 + \frac{r}{n}\right)^{nt}} = K\left(1 + \frac{r}{n}\right)^{-nt}. \tag{10.8}$$
In case of continuous compounding, $A$ must satisfy
$$K = A e^{rt},$$
which leads to
$$A = \frac{K}{e^{rt}} = K e^{-rt}. \tag{10.9}$$
Note that the present value refers to how much a future capital $K$ is worth
today; as already hinted above, there is not necessarily a real bank account
connected to this. For each period between now and then, the future value
is divided once by the compounding factor $q = 1 + \frac{r}{n}$ corresponding to one period.
The reciprocal value of the compounding factor, $d = \frac{1}{q}$, is often referred
to as the discounting factor.

Computing present values is important to compare the worth of future payments
at different times. For instance, you might be offered a payment of €150
today or a payment of €175 in three years. While clearly 175 > 150, the
value of the future payment from today’s point of view depends on current
interest rates in the market. If today’s €150, deposited in an account, would
grow to more than €175 in three years, the ’lower’ value is the better
deal.
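To make the comparison concrete, the following sketch discounts the €175 at a few market rates and computes the break-even rate. The rates in the vector rates are hypothetical values chosen for illustration, and yearly compounding is assumed.

```r
# Present value of a single payment K due in t years (yearly compounding),
# i.e. formula (10.8) with n = 1
present_value <- function(K, r, t) K / (1 + r)^t

rates <- c(0.03, 0.05, 0.07)    # hypothetical market rates
present_value(175, rates, 3)    # compare each entry with 150

# Break-even rate: the r at which 150 today grows to exactly 175 in 3 years
r_star <- (175 / 150)^(1 / 3) - 1
r_star
```

For rates below r_star, the later payment of €175 has the higher present value; for rates above it, the immediate €150 is the better deal.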

If $n$ successive payments $a_1, \ldots, a_n$ are to be made at the end of the next $n$
years, then at a fixed interest rate $r$, the present value of all the payments
made is just the sum of the individual present values. Each payment is
discounted as often as there are discounting periods between the starting
time point and the time at which the payment is made. We therefore get
$$P_n = \frac{a_1}{1 + r} + \frac{a_2}{(1 + r)^2} + \ldots + \frac{a_n}{(1 + r)^n} = \sum_{i=1}^{n} \frac{a_i}{(1 + r)^i}. \tag{10.10}$$
Note that since the payments are being made at the end of the year, we
use yearly compounding. Were the frequency of the payments higher, the
compounding would also be adjusted correspondingly. Moreover, the first
payment a1 happens at the end of year 1. If the first payment were to be
made today, at time point 0, then the first payment would not be discounted
since its present value is equal to its value at its payment date.
The future value of the n payments is the value of the payments at the end of
year n. It can be seen as the value of the capital in a (fictitious) bank account
that increases not only due to interest payments, but also by new deposits
at the end of each year. If the first payment is deposited at the end of year
1, then up to the end of year n, it will be compounded n − 1 times. The last
payment, being only deposited at the time of the last interest payment, will
not be compounded at all. We have
$$F_n = a_1 (1 + r)^{n-1} + a_2 (1 + r)^{n-2} + \ldots + a_n = \sum_{i=1}^{n} a_i (1 + r)^{n-i}. \tag{10.11}$$

Notice that we have
$$P_n = \frac{F_n}{(1 + r)^n}, \tag{10.12}$$
or, equivalently,
$$F_n = P_n (1 + r)^n, \tag{10.13}$$
that is, we can simply move between the future and present value of any
sequence of payments by compounding or discounting an appropriate number
of times.
One can illustrate the payments and finding their present or future values
using a time line:

        a1    a2    a3          an−1    an
  0     1     2     3    ···    n−1     n      (periods)
  Pn                                    Fn

To calculate the value of a payment one period later, one compounds once;
to calculate the value one period earlier, one discounts once.

Problem 10.2. Toni agrees to pay Steph €1000 one year from now and
€2000 two years from now. In return, Steph pays Toni €3000 three years
from now. Determine the present and the future value of this contract when
the interest rate is 5% (annual compounding).

Solution. Note that the present and future value of the contract differs
depending on whether we calculate the values from the point of view of
Steph or Toni – but they only differ in the sign. Let us calculate it from
the point of view of Steph. From Steph’s point of view, there will be three
payments a1 = 1000, a2 = 2000 and a3 = −3000 one, two and three years
from now, respectively. Note that the last payment is negative since she will
not receive this payment, she will make it. At r = 0.05, the present value of
this contract is therefore
$$P_n = \frac{1000}{1.05} + \frac{2000}{1.05^2} - \frac{3000}{1.05^3} \approx 174.93.$$
The future value can easily be calculated from this using formula (10.13):

$$F_n = P_n \cdot 1.05^3 = 202.5.$$

If we calculated the values from the point of view of Toni, we’d get $P_n \approx -174.93$
and $F_n = -202.5$.
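Formulas (10.10), (10.11) and (10.13) translate almost literally into vectorized R code. The following sketch recomputes Problem 10.2 from Steph's point of view; the variable names PV and FV are chosen for this illustration.

```r
a <- c(1000, 2000, -3000)   # Steph's cash flows at the end of years 1, 2, 3
r <- 0.05
n <- length(a)

PV <- sum(a / (1 + r)^(1:n))         # formula (10.10)
FV <- sum(a * (1 + r)^(n - (1:n)))   # formula (10.11)

PV
FV
PV * (1 + r)^n                       # formula (10.13): should agree with FV
```

Changing the sign of a gives the values from Toni's point of view.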

10.3 Annuities and mortgage repayments

A sequence of payments with $a_1 = a_2 = \ldots = a_n = a$, that is, all payments
are the same, is called an annuity. An annuity where the first payment is
due immediately (at $t = 0$) is called an annuity due. A typical example of an
annuity is a mortgage.

Inspecting equalities (10.10) and (10.11) for an annuity, we might notice that
in fact, the future and present value are both sums of geometric sequences
– sequences in which the relative difference between successive terms is
constant, or in other words, the next term arises from the previous one by
multiplying it by a fixed constant k. For the sum of a geometric sequence, there
is in fact a simple formula.

Theorem 10.1. Let $k \neq 1$, $a \in \mathbb{R}$. Then it holds that
$$s_n = a + ak + ak^2 + \ldots + ak^{n-1} = a \sum_{i=0}^{n-1} k^i = a\,\frac{1 - k^n}{1 - k}. \tag{10.14}$$

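Formula (10.14) can in fact be easily proven: Consider $s_n - k s_n$. We have
$$
\begin{aligned}
s_n (1 - k) = s_n - k s_n &= a + ak + ak^2 + \ldots + ak^{n-1}\\
&\quad - ak - ak^2 - \ldots - ak^{n-1} - ak^n = a - ak^n = a(1 - k^n).
\end{aligned}
$$
Dividing both sides by $1 - k$, which is possible since $k \neq 1$, gives the formula.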
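A quick numerical check of formula (10.14) in R, with arbitrarily chosen illustrative values for a, k and n:

```r
a <- 3; k <- 0.9; n <- 20            # arbitrary illustrative values

direct  <- sum(a * k^(0:(n - 1)))    # add the n terms one by one
formula <- a * (1 - k^n) / (1 - k)   # closed form (10.14)

all.equal(direct, formula)
```

The function all.equal compares the two numbers up to a small numerical tolerance, which is the appropriate way to compare results of floating point calculations.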

Using formula (10.14), we can easily derive the formulae for present and
future values of any annuity or annuity due. Let us consider compounding
$m$ times a year and let us denote, as above, by $q = 1 + \frac{r}{m}$ the compounding
factor for one period and by $d = \frac{1}{q}$ the discounting factor for one period.

Theorem 10.2. Consider an annuity with $n$ payments at a frequency of $m$
payments per year and a nominal annual interest rate $r$. Then the present
value of this annuity is
$$P_n = ad + \ldots + ad^n = ad \sum_{i=0}^{n-1} d^i = ad\,\frac{1 - d^n}{1 - d} \tag{10.15}$$
and its future value is
$$F_n = aq^{n-1} + \ldots + a = a \sum_{i=0}^{n-1} q^i = a\,\frac{q^n - 1}{q - 1}. \tag{10.16}$$
Consider an annuity due with $n$ payments at a frequency of $m$ payments per
year. Then the present value of this annuity is
$$P_n = a + \ldots + ad^{n-1} = a \sum_{i=0}^{n-1} d^i = a\,\frac{1 - d^n}{1 - d} \tag{10.17}$$
and its future value is
$$F_n = aq^n + \ldots + aq = aq \sum_{i=0}^{n-1} q^i = aq\,\frac{q^n - 1}{q - 1}. \tag{10.18}$$
Remark. Note that in the future value formulae, we write $\frac{q^n - 1}{q - 1}$ instead
of $\frac{1 - q^n}{1 - q}$. Since both the numerator and the denominator are
multiplied by $-1$, these two fractions are in fact the same. However, since
usually $q > 1$, writing $\frac{q^n - 1}{q - 1}$ amounts to dividing two positive values, which is
often more comfortable than working with negative numbers.
Remark. Notice the similarity (and difference) between the two pairs of
formulae for present values and future values. In an annuity, the first payment
is made at the end of period 1 and the last at the end of period n, whereas in the
case of an annuity due, all payments are made one period earlier compared
to an annuity (since the end of one period corresponds to the beginning of
the next period). This is the reason why in formula (10.15), there is one
more discounting factor compared to (10.17), whereas formula (10.18) uses
one more compounding factor than formula (10.16).
Problem 10.3. Find the present and the future value of an annuity of €1000
per payment once per year for 10 years when the annual interest rate is 2%.
Solution. We have an annuity with $a = 1000$, $n = 10$, $m = 1$ and $r = 0.02$.
Consequently, $q = 1.02$ and $d = \frac{1}{1.02}$. Plugging these into (10.15) and (10.16),
we get
$$P_{10} = 1000 \cdot \frac{1}{1.02} \cdot \frac{1 - \left(\frac{1}{1.02}\right)^{10}}{1 - \frac{1}{1.02}} \approx 8982.59,$$
$$F_{10} = 1000 \cdot \frac{1.02^{10} - 1}{1.02 - 1} \approx 10949.72.$$
Problem 10.4. Find the present and the future value of an annuity due of
€1000 per payment once per year for 10 years when the annual interest rate
is 2%.
Solution. We have an annuity due with $a = 1000$, $n = 10$, $m = 1$ and
$r = 0.02$. Consequently, $q = 1.02$ and $d = \frac{1}{1.02}$. Plugging these into (10.17) and
(10.18), we get
$$P_{10}^{\text{due}} = 1000 \cdot \frac{1 - \left(\frac{1}{1.02}\right)^{10}}{1 - \frac{1}{1.02}} \approx 9162.237,$$
$$F_{10}^{\text{due}} = 1000 \cdot 1.02 \cdot \frac{1.02^{10} - 1}{1.02 - 1} \approx 11168.72.$$

Note that indeed, if we compare the results from Problems 10.3 and 10.4, we
obtain that $P_{10} = d \cdot P_{10}^{\text{due}}$ and $F_{10}^{\text{due}} = q \cdot F_{10}$.
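The four formulae (10.15)-(10.18) can be wrapped into two small R functions. This is a sketch only: the function names annuity_pv and annuity_fv and the logical argument due are assumptions made for this illustration, and the compounding factor for one period is taken to be 1 + r/m, with m the number of payments per year.

```r
# a: payment, r: nominal annual rate, n: number of payments,
# m: payments per year, due: TRUE if the first payment is made immediately
annuity_pv <- function(a, r, n, m = 1, due = FALSE) {
  d <- 1 / (1 + r / m)                # discounting factor for one period
  pv <- a * d * (1 - d^n) / (1 - d)   # formula (10.15)
  if (due) pv / d else pv             # (10.17): one discounting factor fewer
}

annuity_fv <- function(a, r, n, m = 1, due = FALSE) {
  q <- 1 + r / m                      # compounding factor for one period
  fv <- a * (q^n - 1) / (q - 1)       # formula (10.16)
  if (due) fv * q else fv             # (10.18): one compounding factor more
}

annuity_pv(1000, 0.02, 10)               # Problem 10.3, present value
annuity_fv(1000, 0.02, 10)               # Problem 10.3, future value
annuity_pv(1000, 0.02, 10, due = TRUE)   # Problem 10.4, present value
annuity_fv(1000, 0.02, 10, due = TRUE)   # Problem 10.4, future value
```

Writing the annuity due in terms of the ordinary annuity mirrors the relation between the two pairs of formulae noted at the end of Problem 10.4.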

Problem 10.5. Toni borrows €50000 at the beginning of a year and is
supposed to pay it off in equal installments at the end of each month
throughout the next 5 years, with an annual interest rate of 5%. Find the
monthly payment.
Solution. We have $P_n = 50000$ (the value borrowed at the beginning), $m = 12$,
$n = 5 \cdot 12 = 60$ and $r = 0.05$, thus $d = \frac{1}{1 + \frac{0.05}{12}}$. The value of each payment
$a$ is unknown and to be calculated. We use formula (10.15) to write
$$50000 = ad\,\frac{1 - d^{60}}{1 - d},$$
$$a \approx 943.56.$$
Remark. Problem 10.5 actually describes a mortgage and shows how monthly
mortgage payments are calculated. We can notice that in the end, Toni
repays $943.56 \cdot 60 \approx 56613.7$, which is about €6600 more than the originally
borrowed amount – that’s the interest on the mortgage.
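The calculation from Problem 10.5 can be sketched in R by solving formula (10.15) for a. The variable names below are chosen for this illustration.

```r
P <- 50000         # borrowed amount (present value of all payments)
r <- 0.05          # nominal annual interest rate
m <- 12            # payments per year
n <- 5 * m         # total number of payments

d <- 1 / (1 + r / m)                  # discounting factor for one month
a <- P * (1 - d) / (d * (1 - d^n))    # formula (10.15) solved for a

a            # monthly payment
a * n        # total amount repaid
a * n - P    # interest paid over the life of the loan
```

Changing P, r or the number of years allows other repayment plans of the same type to be compared quickly.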
Problem 10.6. Steph buys an e-bike for €5000 which should be repaid in
equal repayments of €500 (plus one final payment of less than €500), the
first one immediately and the following ones after each of the coming years. How
many years does Steph have to pay if the annual interest rate is 7%?
Solution. We have an annuity due with $P_n = 5000$, $a = 500$, $m = 1$,
$r = 0.07$, implying $d = \frac{1}{1.07}$, and $n$ unknown. Using formula (10.17), we
write
$$5000 = 500\,\frac{1 - d^n}{1 - d}$$
$$10(1 - d) - 1 = -d^n$$
$$n = \frac{\ln(1 - 10(1 - d))}{\ln d}$$
$$n \approx 15.7$$
Note that since we got $n$ between 15 and 16, the answer is 16: After 15
years, the present value of her payments will be less than 5000, but after 16
it would already be more than 5000, such that she pays €500 for 15 years
and then one last lower payment.
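The steps of this solution can be retraced in R. The sketch below solves formula (10.17) for n and then compares the present values of 15 and of 16 full payments with the price of the e-bike; the variable names are chosen for this illustration.

```r
P <- 5000; a <- 500; r <- 0.07
d <- 1 / (1 + r)

n <- log(1 - P / a * (1 - d)) / log(d)   # formula (10.17) solved for n
n
ceiling(n)                               # number of payments needed

# Present value of 15 and of 16 full payments of the annuity due
pv <- a * (1 - d^c(15, 16)) / (1 - d)
pv
```

The first entry of pv lies below 5000 and the second above it, which is why the last payment is smaller than €500.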

10.4 Internal rate of return

Consider $n + 1$ yearly successive payments, $a_0, a_1, \ldots, a_n$, representing net
returns of some project in each year. A typical example would be starting
the production of a certain product that requires purchasing some machine
at the beginning, thus $a_0 < 0$, but brings earnings in the next years, thus
$a_1, \ldots, a_n > 0$. Clearly, the net present value of the returns from this project
corresponding to some interest rate $r$ is given by
$$A = a_0 + \frac{a_1}{1 + r} + \ldots + \frac{a_n}{(1 + r)^n}. \tag{10.19}$$

The internal rate of return (IRR) of the project is such a value of r that
satisfies A = 0. In other words, it is an interest rate at which the present
value of all costs (e.g. the purchasing costs at the beginning of the project) is
equal to the present value of all revenues from the project. IRR is sometimes
used by companies to compare different investment opportunities: a project
with a higher IRR is considered better.

Note that finding r is equivalent to finding the root of a polynomial of degree
n. While this can be easily done for n ≤ 2, for longer projects such a solution
is typically not available. Therefore we will resort to R to solve the following
problem.

Problem 10.7. You give someone €10000, and they promise to pay back
€2000 after one year, €2500 after two years, €3000 after three years and
€3500 after four years. Calculate the internal rate of return.

Solution. Clearly, we have
$$A = -10000 + \frac{2000}{1 + r} + \frac{2500}{(1 + r)^2} + \frac{3000}{(1 + r)^3} + \frac{3500}{(1 + r)^4}.$$

As mentioned above, finding r such that A = 0 is not a trivial task. To do
so, we will use R’s function uniroot that looks for roots of a given function.
We will start by defining a function of r that calculates the present value
for a particular interest rate. Before providing it to the uniroot function,
we will graphically inspect a possible interval in which the solution could lie,
since uniroot requires a search interval, too (similarly to optimize).

a <- c(-10000, 2000, 2500, 3000, 3500)


PV_IRR <- function(r) sum(a/(1+r)^(0:(length(a)-1)))
rs <- seq(0, 0.1, by = 0.01)
PVs <- sapply(rs, PV_IRR)
plot(rs, PVs, type = "l")
abline(h = 0)
[Figure: PVs plotted against rs for rs between 0 and 0.1, with a horizontal line at 0.]

IRR <- uniroot(PV_IRR, c(0, 0.1))


IRR$root

## [1] 0.03584787

PV_IRR(IRR$root)

## [1] 0.006288343

The output of uniroot is a list, as we already know from the optimization
functions. The root itself is provided in the object root of the output. A
simple check of plugging in the outcome to the original function reveals that
the value at this point is still relatively far from 0, but we can adjust the
tolerance to get a better result.

IRR <- uniroot(PV_IRR, c(0, 0.1), tol = 1e-10)


IRR$root

## [1] 0.03584811

PV_IRR(IRR$root)

## [1] 2.273737e-13

The internal rate of return of the proposed deal is thus about 3.58%.
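As noted earlier, finding the IRR is equivalent to finding the root of a polynomial: with x = 1/(1 + r), the equation A = 0 becomes a polynomial equation in x whose coefficients are the payments. As an alternative cross-check of the result (a sketch, not the approach used in this course), base R's polyroot can find all complex roots of that polynomial, and the positive real one can then be translated back to r.

```r
a <- c(-10000, 2000, 2500, 3000, 3500)

# polyroot expects the coefficients in increasing order of the power of x
# and returns all (complex) roots of a[1] + a[2]*x + ... + a[5]*x^4
x <- polyroot(a)

# Keep the real, positive root (numerically: negligible imaginary part)
x_real <- Re(x[abs(Im(x)) < 1e-8 & Re(x) > 0])

irr <- 1 / x_real - 1   # translate x = 1 / (1 + r) back to r
irr
```

Since there is only one change of sign in the sequence of payments, the polynomial has exactly one positive real root, so the filter above selects a single value.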

10.5 Exercises
10.1 Assume that you want to deposit a capital $S_0$ for the next 5 years and
have three offers for a savings account:
• Bank A offers you an annual interest of 3%, compounding yearly.
• Bank B offers you an annual interest of 2.5%, compounding continuously.
• Bank C offers you an annual interest of 2.8%, compounding twice a
year.
Which of the three offers is the best?
10.2 Assume that you deposit €4000 in an account with monthly compounding
and after 10 years, you have €4983.30. What are the nominal and the effective
interest rate of the account?
10.3 Thomas won in a lottery and was offered two options: €10000 immediately
or €5100 in one year and €5100 in two years. Assuming an annual interest
rate of 5%, which offer should he choose?

10.4 After graduation, a BBE student got an offer from a master’s program
in another country. The student first arranged a room to stay that costs
€300. The remaining living expenses are estimated to be €500. The student
found a part-time job to cover the expenses. While the employer pays €1000
at the end of each month, the landlord requires rent payments at the
beginning of each month. On the other hand, the student uses a credit card
to pay for the other expenses, and the credit card bills are paid at the end of
each month. Suppose that from the former savings, the student has €1000
when moving to the new country. Assume that the annual interest rate is
2% and that the student will stay in the country for 2 years. Compute the
present and future value of the money the student will save over the two
years, after paying for the expenses in each month. Hint: Draw a timeline
with time points 0, 1, . . . , 23, 24 and indicate the cash inflows and outflows
that occur at the given time points and use formulae (10.15)-(10.18).

10.5 Erica plans to take a trip next year which she estimates will cost her
€5000. She wants to save up for the trip by repeated monthly deposits of
the same amount of money. If she deposits the first payment immediately,
saves for a year and the annual interest rate is 3%, how much does she have
to deposit each month?

10.6 Suppose that you are offered an annuity of 10 yearly payments of €1000
in exchange for an immediate payment of €7500. Use R to find the internal
rate of return.
10.7 Assume that two banks offer car loans, where the borrowed money
plus interest is repaid at the end of the loan period. Bank 1 offers loans
with an annual interest rate of 10.07% and with continuous compounding.
Bank 2 offers loans with an annual interest rate of 10.1% and with monthly
compounding.
a) If a borrower wants to buy a car for €40000 and get this amount financed
with a loan with a period of 5 years, which bank should they choose?
Justify your answer by computing and comparing the total payment
the borrower would make to each bank.
b) Calculate the effective annual interest rates of the two considered car
loans (that is, of both a loan from Bank 1 and a loan from Bank 2) from
part a).
c) Assume now that the borrower from part a) considers a third option
to finance a new car. Their close friend offers to lend them €40000
now and in return, the borrower promises to pay €850 at the end of each
month for the next 5 years. Assume an annual interest rate of 10% and
calculate the future value of the total payments the borrower makes to
their friend. Compare the result to the results from part a). Should
the borrower take the friend’s offer, or should they finance the car by
borrowing from one of the banks?
d) Does the answer to part c) change if the friend requires a payment of
€845 at the beginning of each month?

10.6 Further readings

The contents of this chapter are discussed in Sections 11.1-11.7 of [1], with
11.4 offering a much more detailed overview of geometric sequences and series
than necessary for this course. To practice and check your understanding of
the topics, we suggest the following exercises:

• all exercises of Sections 11.1 and 11.2,
• exercises 1 and 2 in Section 11.3,
• exercises 1-6 in Section 11.5,
• all exercises in Section 11.6,
• exercises 1 and 4 in Section 11.7,
• Review exercises 1-5 and 8-12.
Chapter 11

Further useful R functions and tips

In this chapter, we introduce a few more useful R functions that did not fit
with any of the previous topics, to give you a solid basis for writing more
involved code. We also share a few programming tips that can help to make
your code more compact and efficient.

11.1 while cycle

We start this chapter with another cycle that, unlike the for cycle, can be used
in situations when you don’t know upfront how many times a certain task
needs to be repeated.

Recall that in Section 5.2, we have introduced the following structure for the
for cycle:

for(i in sequence) {
task
}

where sequence is a vector of values, for which task (usually dependent on
i) should be performed. If, however, it is not clear from the beginning how
often a certain task should be repeated, for is not a good choice, since we
would not know how to choose sequence. Instead, we would make use of
while which has the following basic structure:

while(condition) {
task
}

condition is some logical expression (usually dependent on some variable).
Clearly, while can also be used in any situation in which for could be used,
and in our very first example we will use the same example as we used when
introducing for: For all natural numbers n ≤ 5, print the value $\left(\sum_{i=1}^{n} i\right)^2$.

n <- 1
while(n <= 5) {
print(sum(1:n)^2)
n <- n + 1
}

## [1] 1
## [1] 9
## [1] 36
## [1] 100
## [1] 225

Note that unlike with for, we started by defining some value for n outside
of the cycle. After that, the following happens: At the beginning, and after
each iteration of the cycle, while checks whether n is not more than 5. If so,
sum(1:n)^2 is printed and afterwards, we increase the value of n by 1. Once
the end of task is reached, the iteration is done and a new one starts with a new
check of the condition. The first time the condition returns FALSE, the task
will be skipped and R will go on with the rest of the code (if there is some
after).

As already mentioned, the above task could have been performed with a for
cycle, and even in a more elegant way. Let us therefore turn our attention
to a task that in fact requires while rather than for.
Problem 11.1. Find the smallest natural number n with $\sum_{i=1}^{n} i \geq 1000$.

Solution. This goal can be achieved e.g. by the following simple code:

n <- 1
s <- 0
while(s + n < 1000) {
s <- s + n
n <- n + 1
}
n

## [1] 45

Let’s look at the code in more detail: We started by defining n and an
auxiliary variable s that will keep track of intermediate sums. The condition
is simple: Continue as long as the sum of the current value of s and the
next value of n remains below 1000. As soon as 1000 is hit or crossed, the
condition will be evaluated as FALSE and the while cycle will terminate.
Note that we always increase s first and only then increase the value of n.
This way, if the sum for the previous n is below 1000, but adding the new
n makes it hit or cross 1000, we are not increasing it anymore. That is why
the value we are looking for is the value of n at the moment when while
terminates. (You can also easily check that the output is the correct value.)
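One possible way to carry out this check is sketched below: the sums for n = 44 and n = 45 are computed directly, and the result is confirmed without a cycle by means of cumsum. The upper bound 100 in the last line is an arbitrary choice that is assumed to be large enough.

```r
sum(1:44)   # still below 1000
sum(1:45)   # at least 1000

# Alternative without a cycle: first index at which the running sum
# of 1, 2, ..., 100 reaches 1000
which(cumsum(1:100) >= 1000)[1]
```

The vectorized alternative needs an upper bound for n in advance, which is exactly the information the while cycle does not require.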
Remark. Sometimes, the stopping condition of the while cycle is more
complicated, such that it might be advantageous to have an auxiliary variable
that simply takes on the values TRUE or FALSE and is updated within task,
instead of directly incorporating it into condition. We will write an alternative
code to accomplish the above task to demonstrate how this could be done.

continue <- TRUE


n <- 1
s <- 0
while(continue) {
s <- s + n
n <- n + 1
continue <- s + n < 1000
}
n

## [1] 45

Typical examples of algorithms in which the while cycle would be used,
in the context of QM1, are optimization or root finding algorithms. In
these algorithms, one typically starts at a certain starting point (recall the
optimization functions in R from Chapter 9) which is adjusted in several
iterations, until a certain condition is met. This condition is usually expressed
in terms of a tolerance (recall the tol parameter of the optimization functions).
For root finding algorithms, the cycle would continue as long as the function
value at the candidate point in absolute value is greater than the tolerance.
For optimization algorithms for differentiable functions, a similar condition
is formulated in terms of the function’s derivative.
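To illustrate the idea, here is a minimal sketch of a root finding algorithm based on a while cycle, namely the bisection method. The function name bisect and its arguments are hypothetical, and the sketch is not meant to describe how uniroot works internally. It assumes that f is continuous and has different signs at lower and upper, and that tol is not chosen so small that it cannot be reached in floating point arithmetic.

```r
# Bisection: repeatedly halve an interval [lower, upper] on which f
# changes sign, until |f| at the midpoint is at most tol.
bisect <- function(f, lower, upper, tol = 1e-8) {
  mid <- (lower + upper) / 2
  while (abs(f(mid)) > tol) {
    if (f(lower) * f(mid) <= 0) {
      upper <- mid   # the sign change lies in the left half
    } else {
      lower <- mid   # otherwise it lies in the right half
    }
    mid <- (lower + upper) / 2
  }
  mid
}

bisect(function(x) x^2 - 2, 0, 2)   # approximates sqrt(2)
```

The number of iterations is not known in advance; it depends on the function, the starting interval and the tolerance, which is why while rather than for is the natural choice here.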

11.2 return

In the previous chapters, we often used the function return to define the
output of the function. However, we also mentioned in Section 3.2 that
it is not absolutely necessary to use it, especially in simple functions (e.g.
when defining a mathematical function before plotting it). In fact, the usual
practice is not to use it in simple functions. We will now discuss what the
benefits of using it are and when one should do so.

Consider the following two functions defined in R:

Foo <- function(n) {


if(n <= 10) n
if(n <= 20) 2*n
3*n
}
Foo2 <- function(n) {
if(n <= 10) return(n)
if(n <= 20) return(2*n)
3*n
}

Looking at the functions, one can guess that the goal was to implement a
function that for n ≤ 10 outputs the value of n itself, for n ∈ (10, 20] twice
the value of n, and otherwise 3n. However, only one of the functions achieves
this. Let’s first look at the output of both functions for several values of n
to see what happens:

Foo(2)

## [1] 6

Foo2(2)

## [1] 2

Foo(15)

## [1] 45

Foo2(15)

## [1] 30

Foo(40)

## [1] 120

Foo2(40)

## [1] 120

We observe that for Foo, the output is always 3n, even though we used n from
all three different categories used in the function. However, Foo2 delivers the
correct outputs. The reason is that return terminates the function. That is,
the moment that return is first executed, the rest of the code in the function
is ignored. In Foo, no value is returned or printed in either of the if clauses,
but the value 3*n is automatically returned at the end of the function. If
we wanted to make sure that the function outputs the desired value for
any n without using return, we would have to use two nested if ... else cases. return makes
it possible to write the code in a more elegant way using only if, without
having to use else: If n ≤ 10, the rest of the code is never reached because
n is returned in the first clause already. Otherwise we know already that
n > 10, such that we don’t have to check this part of the second condition
anymore. Instead, we continue to check whether n ≤ 20, which would (in
combination with the first condition being FALSE) imply that n ∈ (10, 20]. If
this is the case, the value 2*n is returned and the last code line is ignored.
If, however, we also get FALSE here, then n must automatically be greater
than 20, implying 3*n as the function’s output.
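For comparison, a version with nested if ... else cases and without return could look as follows. The name Foo3 is chosen for this illustration only.

```r
Foo3 <- function(n) {
  if (n <= 10) {
    n
  } else if (n <= 20) {
    2 * n
  } else {
    3 * n
  }
}

Foo3(2)
Foo3(15)
Foo3(40)
```

Here the whole if ... else construct is the last expression evaluated in the function, so its value is the output of the function; the three calls give the same results as Foo2.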

11.3 Several outputs from a function

Sometimes one wants to provide several values or even objects as the outputs
from the function. While we did not explicitly mention it so far, we did define
several functions previously whose outputs were vectors or matrices – this is
a very natural way of outputting several values. However, these have to be
of the same type, i.e. one cannot mix numbers with characters, for instance
(if one tried to define a vector with both value types, all numbers would be
turned to characters – feel free to try it). And in other situations, one does
not only want to output several values of different types, but also different
objects, like a number, a vector and a matrix. Recall that we saw such
outputs in Chapter 9: all of the optimization functions always provided as
their output lists of objects, among them the point x at which the optimal
value is achieved, and the optimal value itself. In the case of functions of
several variables, x was a vector with as many entries as the number of
variables of the function.

In this section, we discuss how to create lists and a few important things
about them. Generally, lists are a handy way of combining several objects,
possibly of different types, into one bigger object. Let’s say we want to
combine the following three outputs:

some_output <- 1:5


other_output <- "Hello"
another_output <- matrix(1:9, nrow = 3)

As we can see, the first object is a numeric vector; the second one is a single
character string; and the last one is a numeric matrix. If we tried to combine
them e.g. in a vector, because of the presence of "Hello", all numbers would
be turned into characters. Moreover, we would lose the matrix form of
another_output. Therefore let’s keep their object types and instead put
them all in one ’basket’ – a list:

all_together <- list(some_output, other_output, another_output)


all_together

## [[1]]
## [1] 1 2 3 4 5
##
## [[2]]
## [1] "Hello"
##
## [[3]]
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9

The individual objects in a list can be accessed by the use of double square
brackets, and within them (if they are e.g. vectors or matrices), the usual
indexing works:

all_together[[1]]

## [1] 1 2 3 4 5

all_together[[3]][1, 3]

## [1] 7
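
The difference between single and double square brackets is easy to overlook, so here is a small sketch (first_as_list and first_as_vector are hypothetical names of our own): single brackets return a shorter list, whereas double brackets return the stored object itself.

```r
all_together <- list(1:5, "Hello", matrix(1:9, nrow = 3))

first_as_list <- all_together[1]      # still a list, of length 1
first_as_vector <- all_together[[1]]  # the integer vector itself

class(first_as_list)    # "list"
class(first_as_vector)  # "integer"
sum(first_as_vector)    # 15
```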

In Chapter 9, we have already mentioned that if the objects in the list are
named, they might also be accessed through this name, using the dollar sign.
Let’s try:
290 CHAPTER 11. FURTHER USEFUL R FUNCTIONS AND TIPS

all_together$some_output

## NULL

We see NULL, which means that in all_together, there is no object called
some_output, or, if there is one, it is empty. That shows us that the names
of the outputs are not preserved when assigning them into the list. With the
list having already been defined, we can simply assign names to the existing
objects using the function names. Alternatively, we can assign names already
when defining the list.

names(all_together) <- c("some_output", "other_output",
                         "another_output")
all_together$some_output

## [1] 1 2 3 4 5

all_together2 <- list(some_output = some_output,
                      other_output = other_output,
                      another_output = another_output)
all_together2

## $some_output
## [1] 1 2 3 4 5
##
## $other_output
## [1] "Hello"
##
## $another_output
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9

Let’s write our first function that gives a list as its output.

Problem 11.2. Implement an R function that for x ∈ R finds and outputs
the largest even natural number smaller than x, and provides the information
whether this number is divisible by 3 in a separate character object.
Solution. First we realize that for x ≤ 2, there is no even natural number
smaller than x. We will therefore start with the case of x ≤ 2 and then
continue to implement the rest of the function for values larger than 2.

foo <- function(x) {
  # for x <= 2, no even natural number lies below x
  if(x <= 2) {
    return("There is no even natural number smaller than x.")
  }
  # largest even value in the sequence 1:x
  max_div_2 <- max((1:x)[1:x %% 2 == 0])
  # if x itself is even, step down to the next smaller even number
  if(max_div_2 == x) max_div_2 <- x - 2
  div_3 <- ifelse(max_div_2 %% 3 == 0, "divisible by 3",
                  "not divisible by 3")
  list(multiple_of_2 = max_div_2, is_divisible_3 = div_3)
}

Note that we first define max_div_2 as the maximal even value in the sequence
1:x. If x is a natural number, it is included in the sequence, and if it is
moreover even, max_div_2 will be assigned this value. In that case, we have
to reduce it by 2 (to make sure that it is still natural, even, and
smaller than x). If x is a decimal number, 1:x will end at the largest
natural number below x.
Let us check the outcome:

foo(1)

## [1] "There is no even natural number smaller than x."

foo(3)

## $multiple_of_2
## [1] 2
##
## $is_divisible_3
## [1] "not divisible by 3"

foo(18.2)

## $multiple_of_2
## [1] 18
##
## $is_divisible_3
## [1] "divisible by 3"
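
Because the output is a list, its components can be stored and used individually. The sketch below uses a hypothetical helper, describe_vector, which is not part of the text above; it returns three objects of different kinds, which are then extracted by name.

```r
# Hypothetical helper: returns a number, a character string and a vector
describe_vector <- function(v) {
  list(total = sum(v),
       label = paste("vector of length", length(v)),
       doubled = 2 * v)
}

res <- describe_vector(c(1, 8, 2))
res$total       # 11
res$label       # "vector of length 3"
res$doubled[2]  # 16
is.list(res)    # TRUE
```

A check such as is.list(res) can be handy for functions like foo, which return a character string in one case and a list in the other.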

11.4 floor and ceiling

Let us briefly come back to the function in Problem 11.2. If x is a relatively
small value, that function will work perfectly well. For very large values of
x, however, it will take a while. That has to do with the fact that we are
unnecessarily checking whether all values up to x are even. One way to get
around this is to simply look at the largest natural number below x. This can be
achieved for instance by rounding x up to the next natural number y (y ≥ x) and then
checking the value y - 1. If it is even, it’s the value we are looking for; if it’s
odd, y - 2 is our desired output. Note that y - 1 is always the largest natural
number smaller than x. Here the function ceiling turns out to be helpful:
this function rounds any number to the next larger or equal integer.

foo2 <- function(x) {
  if(x <= 2) {
    return("There is no even natural number smaller than x.")
  }
  # y - 1 is the largest natural number smaller than x
  y <- ceiling(x)
  max_div_2 <- ifelse((y - 1) %% 2 == 0, y - 1, y - 2)
  div_3 <- ifelse(max_div_2 %% 3 == 0, "divisible by 3",
                  "not divisible by 3")
  list(multiple_of_2 = max_div_2, is_divisible_3 = div_3)
}

foo2(1)

## [1] "There is no even natural number smaller than x."

foo2(3)

## $multiple_of_2
## [1] 2
##
## $is_divisible_3
## [1] "not divisible by 3"

foo2(18.2)

## $multiple_of_2
## [1] 18
##
## $is_divisible_3
## [1] "divisible by 3"

A counterpart of ceiling that rounds downwards instead of upwards is the
function floor.
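
To see how the two functions differ, including for negative numbers, consider the following sketch. The helper largest_even_below is our own illustrative name, not a built-in function; it uses ceiling to compute the number asked for in Problem 11.2 in a single step, assuming x > 2.

```r
x <- c(-2.5, 3, 3.7)
ceiling(x)  # -2  3  4
floor(x)    # -3  3  3

# Hypothetical one-line alternative for x > 2:
# ceiling(x / 2) - 1 counts the even natural numbers strictly below x
largest_even_below <- function(x) 2 * (ceiling(x / 2) - 1)

largest_even_below(3)     # 2
largest_even_below(18)    # 16
largest_even_below(18.2)  # 18
```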

11.5 which

Another useful function we want to introduce here is the function which, which
allows us to find out which of the entries of a vector satisfy a certain condition
– it outputs not their values, but their indexes. For instance, let’s assume that a
vector v has been defined. max(v) finds the maximal value in v; which(v ==
max(v)) tells us where within v this value is located. Similarly, it can tell us all the
indexes of even values within v.

v <- c(1, 8, 2, 15, 3, 18, 4)


max(v)

## [1] 18

which(v == max(v))

## [1] 6

which(v%%2 == 0)

## [1] 2 3 6 7

To find out the index of the minimal or maximal value within a vector, there
are also the functions which.max and which.min.

which.max(v)

## [1] 6

which.min(v)

## [1] 1
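
One detail worth knowing is what happens when the maximum occurs more than once. In the sketch below, the vector w is our own example: which returns all matching indexes, while which.max returns only the first one. The indexes can also be used to extract the values themselves.

```r
w <- c(1, 8, 2, 18, 3, 18, 4)

which(w == max(w))  # 4 6
which.max(w)        # 4

even_idx <- which(w %% 2 == 0)
even_idx            # 2 3 4 6 7
w[even_idx]         # 8 2 18 18 4
```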

11.6 Computational efficiency

In Section 11.4, we have already briefly thought about the computational
efficiency of our function from Problem 11.2. Especially when dealing with
large amounts of data, such considerations can save a lot of computing time
and computer memory. In this section, we briefly talk about two tips on how to
make your code more efficient. There are other points to consider, too, but
they are beyond the scope of this course.

The first thing to consider is the number of mathematical operations used.
Generally, you should try to minimize this number to make your code elegant
and efficient. Let us consider the following two code chunks:

# chunk 1:
x <- 1:10000
a <- 5
sum(a*x)

## [1] 250025000

# chunk 2:
x <- 1:10000
a <- 5
a*sum(x)

## [1] 250025000

Before you continue reading, think about the following: Which chunk requires
fewer mathematical operations? Which of them do you believe should be less
computationally costly?

In both cases, the answer is chunk 2. Let us look at it in detail: In chunk 1,
one first performs 10000 multiplications to obtain the vector a*x, and then
sums all the values. On the other hand, in chunk 2, first the sum operation
is performed, and afterwards only a single multiplication. While in this case,
you would not really notice the difference, add a few zeroes in the first line
of each chunk and you will in fact see a difference in the run time of both
chunks.
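
To observe the difference yourself, the run time can be measured with system.time. This is a rough sketch only: the vector length 1e6 is an arbitrary choice, the timings depend on your machine, and x is converted to numeric as a precaution against integer overflow in the sum.

```r
x <- as.numeric(1:1e6)
a <- 5

# chunk 1: one multiplication per entry, then the sum
time1 <- system.time(res1 <- sum(a * x))

# chunk 2: the sum first, then a single multiplication
time2 <- system.time(res2 <- a * sum(x))

all.equal(res1, res2)  # both approaches give the same value
time1["elapsed"]
time2["elapsed"]
```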

The second thing you should bear in mind is that although for and while
cycles can be very useful, they should not be overused (though we do admit
that in order to demonstrate their use or to test your understanding of these
cycles as well as other concepts, we sometimes use them in a not very efficient
way – for example in some exercises at the end of this chapter). The reason is
simply that they are computationally costly, especially the while cycle that
in each iteration needs to check some condition to decide whether it should
keep running. Let’s add a third chunk to the above competition:

# chunk 3
x <- 1:10000
a <- 5
s <- 0
for(i in 1:length(x)) s <- s + a*x[i]
s

## [1] 250025000

Not only does this code use more mathematical operations than necessary, it
also uses a cycle. That makes it by far the least efficient of the three chunks,
and the difference in running times could actually already be observed for
much shorter vectors x than the difference between chunks 1 and 2.
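
The cost of the cycle can be made visible with system.time. In this sketch (the vector length is an arbitrary choice and the timings are machine-dependent), the for cycle and the vectorized expression a*sum(x) compute the same value.

```r
x <- as.numeric(1:1e6)
a <- 5

# for cycle: one multiplication and one addition per entry
time_loop <- system.time({
  s <- 0
  for(i in seq_along(x)) s <- s + a * x[i]
})

# vectorized version: built-in sum and a single multiplication
time_vec <- system.time(res <- a * sum(x))

all.equal(s, res)  # TRUE
time_loop["elapsed"]
time_vec["elapsed"]
```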

11.7 Exercises

For the following tasks, try first to solve them without using R. Though it
might seem counterintuitive, one of the best ways to learn programming is to
write or read code on paper rather than directly type and run it, as it makes
you think about it in a more detailed way. Once you believe you have found
an answer, check it in R.
11.1 For an integer n, explain what the following function does. (Please do
not explain the code line by line. Your task is to recognize what the outcome
of the function is.)

MyFunction <- function(n) {
  i <- 1
  curr_prod <- 1
  while(curr_prod <= n) {
    i <- i + 1
    curr_prod <- curr_prod*i
  }
  i
}

11.2 For an integer n, the code below should find the largest integer x for
which it holds that $\sum_{i=1}^{x} i < n$. However, there are some mistakes in the code.
Find and correct them.

MyFunction2 <- function(n) {
  i <- 1
  curr_sum <- 1
  while(i < n) {
    i <- i + 1
    curr_sum <- curr_sum + sum(1:i)
  }
  i
}

11.3 For a matrix A, the code below should find the sums of the rows and the
columns of the matrix (equivalently to the functions rowSums and colSums,
however, without using these functions). The outcome should be a list of two
vectors containing the row sums in the first one and the column sums in the
second one. However, there are some mistakes in the code. Find and correct
them.

MyFunction3 <- function(A) {
  result1 <- rep(0, nrow(A))
  result2 <- rep(0, ncol(A))
  for(i in 1:nrow(A)) result1[i] <- sum(A[i])
  for(i in 1:ncol(A)) result2[i] <- sum(A[i, ])
  c(result1, result2)
}

11.4 Implement a function divide2 in R that takes a natural number n as
input and outputs the number of times it can be divided by 2. E.g. for
n = 10, the output would be 1 because 10/2=5 and 5 is not divisible by 2.
For n = 24, it would be 3 because 24/2=12, 12/2=6, 6/2=3 and 3 is not
divisible by 2. For any odd number, it would be 0. Integrate a check whether
the provided n is really a positive integer; if it is not, print "Please provide
a natural number."
Answers to the exercises

1.1 a) I, b) N, c) I, d) Q.

1.2 a) number of empty seats, b) price of a children’s ticket, c) maximal
earnings, i.e. with full cinema and only adults, d) earnings on that night.

1.3 a) 0.30625y kg, b) $\frac{0.75}{0.30625} \approx 2.45$ euros, c) $\frac{y}{9.1}$ kg.

√ √
1.4 a) x ∈ −1 − 2, −1 + 2 , b) x = 21 , c) x ∈ ∅, d) x ∈ ∅,

9 1
e) x = 5, f) x ∈ −1, 16 , g) x ∈ {−15, 3}, h) x ∈ ∅ x = − 1+a if
9
a ∈ (−∞, −1) ∪ (0, ∞) and no solution otherwise, i) x = −3 if  a = −7,
√ 9
x = −7 if a = 1, x = −14 if a = 16, x = −3± 9 + 7a if a ∈ − 7 , 1 ∪(1, 16)∪(16, ∞)
and no solution otherwise.

1.5 300.

1.6 Q = 0 or Q = 50.

1.7 a) x > 1, b) x ∈ −1, 37 , c) x ∈ (−∞, −8] ∪ [2, ∞), d) x ∈ [3, ∞).


 

1.8 52 .

1.9 a) (x−3)(x−2), b) (x+4)(x−2), c) (x+2)(x−1), d) (x+5)(x+2),


e) (x + 11)(x − 1), f) cannot be factorized, g) − 12 (x − 3)(x + 1)

1.10 a) −3x + 7, b) 2x2 − 4x, x ̸= ±2, c) 7.5, x ̸= −4, x ̸= ±5, d) 2−x ,


√ 2(3y−5) 5 3 5
2
x ̸= ±7, x ̸= ± 7, e) 5x−3 , x ̸= ± 3 , x ̸= 5 , y ̸= 3 , f) 1, x = ̸ 0,
p−q
x ̸= −1.

1.11 a) 3, b) 21 , c) 0.1, d) 6, e) 0, f) − 21 .

1.12 a) €1800.94, b) €3377.82 (both values rounded to two digits after
the decimal point).

1.13 0.00916=0.916% (rounded)

1.14 can be both true and false

1.15 a) exactly (thus strictly speaking, at least and at most would be


correct, too), b) at most, c) at least.

1.16 a) true, b) true, c) false, d) true.

1.17 a) $\sum_{i=1}^{909} 11i$, b) $\sum_{i=1}^{50} i^2$, c) $\sum_{i=1}^{100} (-1)^{i+1} i x^{2i}$ (note that these are possible
ways but the notation is not unique).

1.18 a) $a_3 = 4$, $\sum_{k=-1}^{2} a_k = 14$, b) $a_3 = -4$, $\sum_{k=-1}^{2} a_k = 2$.

1.19 a) 16, b) 0, c) $\frac{778}{36}$.

1.20 a) 42, b) $\sum_{i=1}^{7} (27 + 5(i - 1))$.

1.21 $a_i = 7 + 4(i - 1)$, $\sum_{i=1}^{4} (7 + 4(i - 1))$.

2.1 The following code presents one possible option; it is not necessarily
the unique way. a) 11^(0:10), b) rep(1:3, 15), c) rep(5:18, each = 3),
d) seq(-500, -100, by = 50), e) seq(55, 33, by = -2), f) rep(seq(2,
14, by = 2), times = 1:7)

2.2 a) sum((-25:25)^2), b) sum(3^(-10:10)), c) sum(2^(0:10)),


d) sum((1:100)*(10 - 1:100)), e) sum(seq(4, 259, by = 3))

2.4 One possible chunk of code that would fulfill the given tasks is the
following:

vector <- rep(1:5, each = 4)


new_vector <- c(vector, vector2)
new_vector[(length(new_vector) - 4):length(new_vector)]
new_vector[new_vector > 3]
sum(new_vector < 7 & new_vector != 5)

3.1 $f(x) = 3x + 5$, $g(x) = -(x + 2)(x - 3)$, $h(x) = \log_2 x$, $i(x) = 5 \cdot 2^x$,
$j(x) = \frac{1}{2}(x - 1)^2$, $k(x) = \log_{1/3} x$, $l(x) = -\frac{1}{5}x$, $m(x) = -2\left(\frac{1}{2}\right)^x$

3.2 a) domain [−4, ∞), range R0+, b) domain R, range [6, ∞), c) domain
(−∞, 3), range R, d) domain R, range (0, 1].

3.3 a) S(P) = 5P, P∗ ∈ {5, 20}, D(5) = S(5) = 25, D(20) = S(20) = 100,
b) D(P) = −2P + 100, S(P) = 3P, P∗ = 20, D(P∗) = S(P∗) = 60, c) D(P) = −P^2 − 6P + 100,
S(P) = 9P, P∗ = 5, D(P∗) = S(P∗) = 45, d) D(P) = −18P + 264,
S(P) = 4P^2 + 2P, P∗ = 6, D(P∗) = S(P∗) = 156. To check your code,
compare the resulting figure with the list of required properties of the plot.
Pay particular attention to whether the y-axis is large enough by default or
needs to be adjusted to show all values of both functions in the given area
(this might also depend on which function you plot first).

3.4 The mistakes are: a) function arguments are only listed after the word
function; the function for logarithm is called log; brackets are necessary
in the denominator of part2; star is necessary to indicate multiplication in
part3, b) e in R is exp(1); log(x) is the natural logarithm, to work for
any a correctly, base = a must be included, c) the power 1/8 in part2n
must be in brackets; multiplication stars missing in part2d, d) p1 requires
base = 10; p2 requires the star for multiplication; p3 should be exp(y)

4.1 a) range R, bijective, b) range [−4, ∞), neither for i), surjective for
ii), c) range R+ , injective for i), bijective for ii), d) range R0+ , neither for
i), surjective for ii)

4.2 f : [0, 4], g: [−0.5, 0.5], h : (0, 4]

4.3 a) odd, b) even, c) odd (on R\{0}), d) neither, e) even, f) neither,


g) odd, h) neither, i) even

4.4 f is a) convex, strictly convex, b) none, c) monotone increasing,


d) injective, surjective, bijective.
g is a) none, b) symmetric, even, c) neither, d) surjective.
h is a) none, b) odd, c) monotone increasing, d) injective, surjective,
bijective.
i is a) none, b) none, c) neither, d) surjective.
j is a) convex, concave, b) neither, c) monotone increasing, d) injective,
surjective, bijective.
k is a) convex, b) none, c) neither, d) surjective.
l is a) convex, strictly convex, b) symmetric (about 2), c) neither, d) surjective.
m is a) concave, strictly concave, b) none, c) neither, d) surjective.
n is a) none, b) odd, c) neither, d) injective.
o is a) concave, strictly concave, b) none, c) monotone decreasing, d) injective,
surjective, bijective.

4.5 In grey the original function, in black the transformed function.

[Figure: seven panels, g1(x) to g7(x), each plotted against x.]


4.6 a) h−1 y−2 −1
1 (y) = 3 , b) h2 (y) = 2(4−y), c) h3 is not bijective, d) h−1
4 (y) = 2y,
−1 −1
p
e) h5 is not bijective, f) h6 (y) = 3− (y −4), g) h7 (y) = log2 (y +4)+1,
1
h) h−1
8 (y) = 2
y−1
, domain of h8 is R+ , i) h−1 4
2
9 (y) = (e + 2) − 4, domain of
h9 is R+

x3
4.7 a) M : (1, ∞) → R+ , M (x) = (x−1)2
. M is surjective, not injective,

3
inverse does not exist. D : R+ → R+ , D(x) = 3x . D is injective and

surjective. D−1 : R+ → R+ , D−1 (x) = 3 9x2 . Note that in both cases, if we
chose a larger codomain, the functions would not be surjective and thus D
2
would not have an inverse, either., b) M : R+ → R0+ , M (x) = (x−1) √
x
. M is
surjective, not injective, inverse does not exist. D : R+ → R+ , D(x) = exp(x−1)2 −1.
0 0

D is surjective, not injective, inverse does not exist. In both cases, a larger
codomain would destroy surjectivity.
item m : (0, 3) → (0, 9), M (x) = 9 3−x 3+x
. M is injective and surjective.
−1 −1 9−y
M : (0, 9) → (0, 3), M (x) = 3 9+y .

4.8 a) Increasing, b) neither, c) decreasing, d) increasing, e) neither.

4.9 a) MyFunction1(25, 60) results in 5, MyFunction1(5, 67) in 1, MyFunction1(a,
b) in the greatest common divisor of the numbers a and b, b) MyFunction2(7,
60) results in 420, MyFunction2(6, 10) in 30, MyFunction2(a, b) in the
least common multiple of the numbers a and b, c) MyFunction3(3) results
in 3, MyFunction3(1:5) in 1 2 3 4 5, MyFunction3(c(3, 1, 1, 2, 1,
2, 3, 1)) in 3 1 2, MyFunction3(x) for a general vector x in the vector x
from which the duplicates have been removed (i.e. the outcome are the unique
values appearing in x, in the order of their first appearance), d) MyFunction4(12,
1:4) results in 1 1 1 1, MyFunction4(1:4, 2) in 2, MyFunction4(c(4, 1,
2, 6, 2, 3, 1), c(2, 1, 4)) in 4 7 1, MyFunction4(x, y) for general
vectors x and y of natural numbers in a vector of the same length as y, with
i-entry containing the number of entries of x that are divisible by y[i].

4.10 The mistakes are: a) max should be pmax; in ifelse, the second
and third argument should be switched (or the condition should contain !=
instead of ==), b) : instead of to in the definition of tocheck; tocheck is a
vector such that ifelse is required instead of if...else, c) pmin instead
of min, d) ifelse instead of if; < should be <=; else should be removed;
the last line should be final.class instead of ratings

5.1 In the following, D(f ) denotes the domain of function f . a) f1′ (x) = 2x+3x2 ,
D(f1 ) = D(f1′ ) = R, b) f2′ (x) = 8x − 1, D(f2 ) = D(f2′ ) = R, c) 2√1 x − x23 ,
D(f3 ) = D(f3′ ) = R+ , d) f4′ (x) = √ 2 ′
3 2 , D(f4 ) = R, D(f4 ) = R \ {0},
x
e) f5′ (x) = − x23 − x124 , D(f5 ) = D(f5′ ) = R \ {0}, f) f6′ (x) = − x22 − 35
2 √1
5 3,
x
′ ′ ′
D(f6 ) = D(f6 ) = R \ {0}, g) f7 (x) = 2 cos x − 3 sin x, D(f7 ) = D(f7 ) = R,
h) f8′ (x) = 3x ln 3, D(f8 ) = D(f8′ ) = R, i) f9′ (x) = x ln9 10 , D(f9 ) = R+ ,
D(f9′ ) = R \ {0}

2
√ −1
5.2 a) g1′ (x) = 2x+3− x12 for x ̸= 0, b) g2′ (x) = 12x(x2 +1)5 , c) g3′ (x) = 212x
√ 4x3 −x
′ ′
/ {0, 2 }, d) g4 (x) = √
for x ∈ 1 2 5x+5

2
for x ̸= 0, e) g5 (x) = 2x sin x+(x −1) cos x,
4 5x2 +5x 5x
f) g6 (x) = −2 sin(2x + 4), g) g7′ (x) = 2 sin x cos x,

h) g8′ (x) = 2x cos(x2 ),
i) g9′ (x) = 3 3sincosx+8
x ′
, j) g10 (x) = esin x cos x

5.3 a) h′1 (x) = x3 + 2x, h1 is decreasing on (−∞, 0], increasing on [0, ∞),
b) h′2 (x) = 2x − 1 for x ̸= −1, h2 is decreasing on (−∞, −1) and (−1, 21 ],
increasing on [ 12 , ∞), c) h′3 (x) = (x+3)
7
2 for x ̸= −3, h3 is increasing on
2x2 +2x+2
(−∞, −3) and on (−3, ∞), d) h′4 (x) = (1−x2 )2
for x ̸= ±1, h4 is increasing

−10(12x3 +2x)
on (−∞, −1), on (−1, 1) and on (1, ∞), e) h′5 (x) = (3x4 +x2 )11
for x ̸= 0, h5
2

2x3 −1+2)7
is increasing on (−∞, 0) and decreasing on (0, ∞), f) h′6 (x) = 24x (√2x 3 −1
1 1 1 ′ 1
for x ̸= 3 2 , h6 is increasing on (−∞, 3 2 ) and ( 3 2 , ∞), g) h7 (x) = x+2
√ √ √ for

′ x √1
x > −2, h7 is increasing on its domain, h) h8 (x) = e 2 x for x ̸= 0, h8 is
increasing on (−∞, 0) and on (0, ∞)

5.4 a) f ′ (x) = 3(2x + e−x )(2 − e−x ) − (x+2)


1 ′
2 , f (0) > 0 such that f is strictly

increasing at x = 0, b) g ′ (x) = 2(x2 + e−3x )(2x − 3 e−3x ) + (2−x)1 ′


2 , g (x) < 0

such that g is strictly decreasing at x = 0.

5.5 f ′ (0) = 0, f ′ (−1) = − 29 , f ′ (2) = 30.

5.6 a) x ∈ {0, 4}, b) x = 2, c) x ∈ ∅.

5.7 a) f1′′ (x) = 6, f1 is convex on R, b) f2′′ (x) = 2 + 2 √ 1 2


3 x + x3 for x > 0,

f2 is convex on R+ , c) f3′′ (x) = 12x2 + 24x, f3 is convex on (−∞, −2) and


(0, ∞) and concave on (−2, 0), d) f4′′ (x) = − 9x √ 2 5
3 2 + 16x2 √
x
4 x for x > 0,

e) f5′′ (x) = −4 sin x − 2 cos x, f) f6′′ (x) = x83 − x244 for x ̸= 0

2 3 4 5
5.8 a)T1 (x) = (x − 1) − (x−1)
2
+ (x−1)
3
− (x−1)
4
+ (x−1)
5
;
2 3
x−2 1 (x−2) 1 (x−2) 1 (x−2)4 1 (x−2)5
T2 (x) = ln 2 + 2 − 4 2 + 8 3 − 16 4 + 32 5

x3 x5 (x−2π)3 (x−2π)5
b)T0 (x) = x − 6
+ 120
; T2π (x) = (x − 2π) − 6
+ 120

x2 x4 (x− π2 )3 (x− π2 )5
c)T0 (x) = 1 − 2
+ 24
; T π2 (x) = −(x − π2 ) + 6
− 120
5 k
(−1)n−1 xk!
P
d)T0 (x) =
i=1
5 k
(−1)n xk! .
P
e)T0 (x) = 1 +
i=1

5.9 a) [−3, 2], b) [−4, −3], c) [−4, −2] and [0, 2], d) [−2, 0], e) -2 and
0.

5.10 Elp D(p) = −2 for all p. No matter what the current price is, a price
increase of 1% leads to an (approximate) 2% decrease in demand.

5.11 a) Elp D(p) = − p6 – a 1% increase of the price leads to a decrease in


x2
demand by approximately p6 % (with p being the current price). T0 (p) = 1− 12
x
+ 216 ,
b) Elp D(p) = −p – a 1% increase of the price leads to a decrease in demand
2
by approximately p%. T0 (p) = 12 − x4 + x12

5.12 The mistakes are: each = 2 should be removed; pmin instead of min;
closing bracket missing.

5.13 res contains the sum of the elementwise product of x and y (equivalent
to sum(x*y)), mxy is the equivalent of pmax(x, y), that is, the elementwise
maximum of x and y.

x4 2 4 3
6.1 a) 4
+2x3 −x2 +C, b) 3x2 +5x+C, c) − 3x13 + x14 +C, d) 34 x 3 −4x 4 +C,
3 √ √ q
3 5
e) x6 − x4 + C, f) 32 x3 + 9 x2 + C, g) 4 x3 + C, h) 52 x 2 + 5x + C,
3

2 √
i) x2 + 2x12 − 3√2x3 + C, j) 14 ln |4x + 15| + C, k) x − 4 x + ln |x| + C,
√ √
l) 12 exp(2x+5)+C, m) 4 sin(0.5+0.25x)+C, n) 38 x8 +C, o) 45 x5 +C.
3 4

2 2
6.2 Use integration by parts. a) sin(x) − x cos(x) + C, b) x2 ln(x) − x4 + C,
c) x ln(x)−x+C, d) − 91 e−3x (3x+1)+C, e) 12 sin2 (x)+C, f) 21 ln2 (x)+C,
x
g) ex (x2 + 2x + 2) + C, h) e2 (sin(x) − cos(x)) + C, i) x−sin(x) 2
cos(x)
+ C,
x+sin(x) cos(x)
j) 2
+ C.

6 )8
6.3 Use substitution. a) 0.2(x2 +8)10 , b) (4+x
p
4 4
48
+C, c) 3
(x3 − 2)5 +C,
5 √
d) 31 ln |x3 +8|+C, e) (sin(x)+2)
p
5
+C, f) 23 (x2 + 5x + 1)3 +C, g) 23 1 + x3 +C,
6 9
h) (4+x9 ) + C.

x3 2x
6.4 f (x) = 3
+ 3
+ 2.

6.5 f (x) = 2x3 − 3x2 + 6x + 4.


6.6 a) 6, b) −30, c) 0, d) e − 1e , e) ln( 2), f) e −1

6.7 a = 3, b = 2, c = 4.

6.8 4.

6.9 P ∗ = 30, CS = 150.

6.10 f¯ = 0.1(e10 − e0 ) ≈ 2202.375

6.11 I¯ = 3.2

6.12 313 (rounded).

6.13 88.75.

   
3 5 7 9 16 15 14 13
7.1 P = 8 10 12 14 in thousands of EUR, S = 10 9 8 7 
7 9 11 13 14 13 12 11
in thousands of units; in R P*S (element-wise product) gives the revenues in
millions of EUR.

 
5 2 12
7.2 For instance P = , where the rows correspond to the materials
2 4 0  
1000 3000 5000
and the columns to the products; O = 2500 0 2500, where the rows
500 3000 1800
correspond to the products and the columns to the countries. Then R can
be obtained in R by P%*%O and the combined raw material amounts required
by rowSums(P%*%O).
 
  2 0  
2 2 −1 1 4 1
7.3 a) not defined, b) , c)  2 3 , d) , e) not
0 3 −1 0 2 1
  −1 −1
  1 0 −1    
1 2 2 4 9 2
defined, f) , g) 2 2 −1, h) , i) not defined, j) ,
4 0 4 1 4 0
  0 −4 −2
1 4
k) , l) not defined.
4 8

7.4 a) x1 = 1, x2 = 2, x3 = −2, b) x1 = −1, x2 = 0, x3 = 2, x4 = −1,


c) x1 = 2 − x2 − 2x4 , x3 = 1 + x4 , x2 , x4 ∈ R.

7.5 a) yes, b) no, c) yes, d) yes, e) yes, f) no

7.6 Use properties of transpose to show that C ′ = C.

7.7 Use that |AA−1 | = |I| = 1.

7.8 a) False. Correct: (A − B)(A + B) = A^2 + AB − BA − B^2. b) True.
c) False. Correct: The corresponding R code is A%*%solve(B+in(C)). d) False.
Correct: (AB)^{−1} = B^{−1}A^{−1}. e) False. Correct: If B is symmetric, then
B^{−1}(BC)′ = B^{−1}C′B.

 
−1 −1 −1 −1.8 1.2
7.9 a) X = D +D C D if both C and D are invertible, b) X = ,
1.25 − 31
c) iv, vii.

7.10 a) X = A−1 A′ BAC if A is invertible; if A is symmetric: X = BAC,


b) X = B 2 if C is invertible.

 
  −2 4.5  
3 −0.4 −6 1 −3
7.11 a) , b) −5 3.5, c) .
−1 1.2 5.5 6.5 4
1 −1

7.12 One possible way:

P <- matrix(1:6, nrow = 2, byrow = TRUE)
Q <- matrix(1, nrow = 2, byrow = 2)
R <- matrix(c(1, -1, 1, 1, 2, 3), nrow = 3)
t(Q)%*%solve(P%*%R)
solve(t(R)%*%t(P))%*%Q

7.13 v1 contains the sums of the rows of A, v2 the minimal values in each
column of A.

7.14 The determinant of a diagonal matrix is the product of its diagonal
entries, and it can only be non-zero if all the diagonal entries are non-zero.
Multiply D with D^{−1} to show that it’s the inverse.

7.15 a) 8, b) 324, c) 20.

7.16 One possible way:

inverse <- function(A) {
  # determinant of the 2x2 matrix A
  detA <- A[1, 1]*A[2, 2] - A[1, 2]*A[2, 1]
  if(detA == 0) {
    print("A is not invertible.")
  } else {
    invA <- matrix(c(A[2, 2], -A[1, 2], -A[2, 1], A[1, 1]),
                   nrow = 2, byrow = TRUE)/detA
    return(invA)
  }
}

7.17 One possible way:

normalizedA <- function(A) {
  m <- ncol(A)
  # divide each column by its sum
  for(i in 1:m) {
    curr_sum <- sum(A[ , i])
    A[ , i] <- A[ , i]/curr_sum
  }
  return(A)
}

7.18 a) G, nrow, c(1, 2), 2, for, 4, sum, i, 30, paste, SC, b) "The sum
in column 3 is 33." "The sum in column 4 is 40."

7.19 a) the outcome of the matrix multiplication A%*%B, b) the square of


the matrix A, outcome of A%*%A.

q q 2 +5xy
3 x
8.1 a) f1 (x, y) = 2 y
, f2 (x, y) = − 12 x
y3
, b) g1 (x, y) = (2x + 5y) ex ,
2
g2 (x, y) = 5x ex +5xy , c) h1 (x, y) = yxy−1 , h2 (x, y) = xy ln(x), d) i1 (x, y) =
3
3x2 ln(4xy) + x2 , i2 (x, y) = xy . Plot the functions in R, on a set (intervals
for x, y) of your choice. Use different types of plots to get some exercise on
the 3D plotting functions and observe the behaviour of the functions.

8.2 a) f1 = 2x + 4y, f2 = 4x + 16y – neither increasing, nor decreasing


in x, y; f11 = 2, f22 = 16, f12 = f21 = 4 – convex in x, y and (x, y),
b) f1 = 2x + 4y, f2 = 4x − 2y – neither increasing, nor decreasing in x, y;
f11 = f22 = −2, f12 = f21 = 4 – concave in x, y, neither convex nor concave
in (x, y), c) f1 = − ex1 − ex1 +x2 , f2 = − ex1 +x2 – decreasing in both x and y,
f11 = − ex1 − ex1 +x2 , f22 = f12 = f21 = − ex1 +x2 – concave in x, y and (x, y).

−1.25x−1.5 y 0.2 0.5x−0.5 y −0.8


 
8.3 H = . Since H < 0 and det(H) > 0
0.5x−0.5 y −0.8 −0.8x0.5 y −1.8
for any x, y > 0, the Hessian is negative definite and f is strictly concave.


x
8.4 fx (x, y) = x+y 2
+ 6√1xy , fy (x, y) = x+y− 3y 2
, fxx (x, y) = − (x+y)2
2 −
√1
12 x3 y
,
2 √1 2 √1
fxy (x, y) = fyx (x, y) = − (x+y) 2 − 6 xy 2 , fyy (x, y) = − (x+y)2 − 6 xy 2 . At
(x0 , y0 ) increasing in both x and y, around the point concave in both x and
y. q
2 2 2
gx (x, y) = 2x ex −y − 2√1xy , gy (x, y) = − ex −y + 21 yx3 , gxx (x, y) = 4x2 ex −y + √1 3 ,
4 x y
q
x2 −y 2 −y
gxy (x, y) = gyx (x, y) = −2x e + √ 3 , gyy (x, y) = e
1 x 3 x
− 4 y5 . At
x xy

(x0 , y0 ) decreasing in x, increasing in y, around the point convex in both x


and y.

8.5 f: a) at A greater than at B, b) yes, c) both positive. g: a) at
A greater than at B, b) yes, c) w.r.t. x positive, w.r.t. y negative. h:
a) at A greater than at B, b) yes, c) w.r.t. x probably slightly positive,
w.r.t. y positive. i: a) yes, equal at both points, b) yes, c) w.r.t. x positive,
d) w.r.t. y negative

8.6 a) Elp1 f (p1 , p2 ) = Elp2 (p1 , p2 ) = −1, b) Elp1 f (p1 , p2 , m) = −0.5p1 ,


Elp2 f (p1 , p2 , m) = 0.7p2 , Elm f (p1 , p2 , m) = 1.1m.

5y 2 −4x
8.7 a) y ′ = − xy , b) y ′ = y3 , c) y ′ = 2+x
1−y
, d) y ′ = 18y 2 +8y−10xy
, e)
′ 3x2 cos(6y 3 )−8 cos(10y+8x) ′ y
y = 18x3 y 2 sin(6y 3 )+10 cos(10y+8x)
, f) y = −x.

8.8 a) 1400, b) −2K + 2L, at current levels -160 – if labor increases by


2 −2KL
one unit, output decreases by approximately 160 units, c) 0.5KK2 −2KL+L 2 , at

current levels about 4.29 – an increase in capital by 1% leads to the output


increasing by about 4.29%, d) K ′ = 2K−2L
K−2L
, at current levels about 2.67 – if
labor is decreased by h units, capital needs to decrease by about 2.67h units.

−2xy 2 +9y 3 +x2 y


8.9 a) 23000, b) 3x2 − y 2 + 2xy, at current levels 300, c) x3 −xy 2 +3y 3 +x2 y
,
y −3x −2xy 2 2
at current levels about 2.87, d) y ′ = −2xy+9y 2 +x2 , at current levels − 11 1

h
decrease of h units in x requires increase of about 11 units in y.

8.10 a) 4.25, b) she must increase milk chocolate by 1.5h grams per week,
c) difference (old minus new expenses) is hpd − 1.5hpm , she saves if it’s
positive, i.e. if pd > 1.5pm .

8.11 a) 0.8 kg, b) y has to increase by (approximately) 2h, c) old minus


new expenses is hp1 − 2hp2 , costs are reduced if p1 > 2p2 .

9.1 a) x = 0 local

minimum, b) x =√ 21 local minimum, c) x = 0 local
maximum, d) 3−2 7 local maximum, 3+2 7 local minimum, e) x = −3 local
minimum, x = 1 neither, f) x = 1e local maximum

9.2 Function f : a) stationary points x = 1, x = 4 and x = 6, b) local


minima attained at x = 1 and x = 6, c) local maxima attained at x = 0,
x = 4 and x = 7 (do not forget about the interval endpoints!), d) increasing
on [1, 4] and [6, 7], e) decreasing on [0, 1] and [4, 6], f) convex between 0
and approximately 2.2, and between approximately 5.1 and 7, g) concave
approximately on [2.2, 5.1].
Function g: a) x = 1 and x = 4, b) x = 0 and x = 4, c) x = 1 and x = 7,
d) [0, 1] and [4, 7], e) [1, 4], f) [2.5, 7], g) [3, 2.5]
Function h: a) x = −4, x = 0 and x = 3, b) x = −5 and x = 3, c) x = −4
and x = 4, d) [−5, −4] and [3, 4], e) [−4, 3], f) approx. [−2.9, 0] and
[2.1, 4], g) approx. [−5, −2.9] and [0, 2.1]
Function i: a) x = −2 and x = 2, b) x = −3 and x = 2, c) x = −2 and
x = 3, d) [−3, −2] and [2, 3], e) [−2, 2], f) approx. [−3, 0.4], g) approx
[0.4, 3]

9.3 Local minima at x = 3 (with value 73) and x = 8 (with value 48),
local maxima at x = 4 (with value 80) and x = 15 (with value 685). Global
minimum at x = 8, global maximum at x = 15.

9.4 Local minima at x = −4 (with value −497) and x = 10 (with value
−5895), local maximum at x = 0 (with value 15). Global minimum at
x = 10, there is no global maximum (note that close to both the left and
right ends of the interval, the function achieves higher values than the local
maximum 15 at x = 0).

9.5 x = 5, Π(x) = 2050. Don’t forget to check the second order conditions.

9.6 x = 24, Π(x) = 40722. Don’t forget to check the second order conditions.

9.7 a) (0, 0) local minimum, b) (0, 0) saddle point, (1, 1) local minimum,
c) (0, 0) undetermined, (1, −1) and (1, 1) saddle points, d) (4, 1) and (4, −1)
local minima

9.8 a) Π(p1 , p2 ) = p1 (240−4p1 +3p2 )+p2 (400+p1 −2p2 )−(21(240−4p1 +3p2 )+


35(400 + p1 − 2p2 ) + 2633), b) p1 = 174, p2 = 275.75, c) q1 = 371.25,
q2 = 22.5, d) Π(p1 , p2 ) ≈ 59585.13, C(q1 , q2 ) = 11216.75.

9.9 $h = \frac{1400}{3}$, s = 875, 1000λ ≈ 2455.09

9.10 x ≈ 5.33, y ≈ 21.83

9.11 a) L(x, y, λ) = 8x2 −2xy+2y 2 −λ(x+y−12), x = 3, y = 9 (don’t forget


to check the second order conditions), b) C(3, 9) = 180, c) λ = 30, d) ii)
(in i), the function is not defined properly as necessary for constrOptim, in
iii), the function is being maximized).

9.12 a) L(x, y, λ) = 45x2 + 90xy + 90y 2 − λ(2x + 3y − 60), x = y = 12,


b) C(12, 12) = 32400, c) λ = 1080.

9.13 a) L(x, y, λ) = −2x^2 + 2xy − 3y^2 + 30x + 20y + 62 − λ(x + 2y − 10), x = 6,
y = 2 (don’t forget to check the second order conditions), b) Π(x, y) = 222,
c) λ = 10, d) ii (in i, f is not defined as necessary for constrOptim, in
iii, the constraint matrix is wrong, in iv, f is not defined properly and the
function is being minimized).

9.14 a) L(x, y, λ) = −3x2 +2xy−2y 2 +20x+30y−83−λ(4x+2y−20), x = 2,


y = 6 (don’t forget to check the second order conditions), b) Π(x, y) = 77,
c) λ = 5.

10.1 A (After 5 years, you have approximately 1.16S0 at A, 1.13S0 at B and
1.15S0 at C.)

10.2 Nominal 2.2%, effective 2.22% (both rounded to two digits).



10.3 €10000 today; the other offer has a present value of €9482.99.

10.4 €5689.68 (rounded).

10.5 €409.94 (rounded)

10.6 About 5.6%.

10.7 a) At Bank 1, he repays €66180.08, at Bank 2 it is €66139.5 (both
values rounded), thus Bank 2 is better, b) rounded: 10.5945% at Bank 1,
10.5809% at Bank 2, c) the friend’s offer is better than either of the banks’
offers (amounts to the future value of all payments being about €65821.51),
d) the friend’s offer is still better, though less so than in part c) (future value
€65979.61).

11.1 Finds the smallest natural number i for which i! > n.

11.2 Condition should be curr_sum < n, i instead of sum(1:i), return i - 1
instead of i.

11.3 A[i, ] instead of A[i], A[ , i] instead of A[i, ], list instead of c.

11.4 One possible way:

divide2 <- function(n) {
  # n must be a positive integer
  if(n <= 0 | ceiling(n) != n) {
    print("Please provide a natural number.")
  } else {
    counter <- 0
    # halve n as long as it is even, counting the divisions
    while(n%%2 == 0) {
      n <- n/2
      counter <- counter + 1
    }
    counter
  }
}
