Module 1 (Basics of Calculus)
Introduction
1. Describe the definitions of a function, images and preimages of elements, one-to-one functions, and the limit of a sequence of numbers.
2. Describe the definitions of a continuous function and the derivative and differential of a function.
Numerous examples are provided. There is complementary material at the end of the module in case you are interested in the topics we cover.
I hope you’re looking forward to this course—I certainly am. Let’s get started.
What is a Function?
Definition (#1)
Let A and B be sets. Then a mapping f : A → B is called a function if for every element in A there is exactly one corresponding element in B.
https://alt-5deff46c33361.blackboard.com/bbcswebdav/pid-15714881-dt-content-rid-125177766_1/courses/25summmetcs546so2/course/module1/all… 1/35
7/2/25, 7:39 AM Module 1
When we say the word “function,” we mean three things: a set A, a set B, and a mapping from A to B. The mapping is called a function if every element in A has exactly one corresponding element in B. If you look at Figure 1.1, you can see a set A, a set B, and a mapping from A to B. You can also see a couple of elements in A, x1 and x2, along with their corresponding values, f(x1) and f(x2), in set B.
Figure 1.1
Definition (#2)
The range of a function f is the set of all elements y in B for which there are elements x in A, such that y = f (x) .
Example (#3)
Let A be the set of all U.S. residents, let B be the set of all strings of length 9 made of digits from 0 to 9, and let f map A to B in such a way that each person is mapped to his or her Social Security number. (A Social Security number is a string of 9 digits.) So, essentially, the function f maps people to their Social Security numbers. There is only one Social Security number per person, so this mapping is indeed a function (every person is mapped to one string only). Each Social Security number is a string of 9 digits, but not every string of 9 digits is a Social Security number. So, the range of this function is the set of all strings that serve as Social Security numbers.
Definition (#4)
If x ∈ A, then f(x) ∈ B. f(x) is called the image of x.
Image of x
This is a definition of the image of an element of A. If x belongs to A, then, obviously, f (x) belongs to B. f (x) is called the image of x under f .
Definition (#5)
The set {x ∈ A|f (x) = y} (which reads as “the set of elements x in A, such that f (x) equals y” or “the set of elements in A mapped to y” or “the set of elements in A
whose image is y”) is called the preimage of y. In other words, the preimage of y is the set of all elements in A whose image is y.
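The definitions of image, range, and preimage can be tried out on a small finite set. The following is only a sketch: the set A and the mapping f(x) = x * x are assumptions chosen for illustration, not part of the lecture.

```python
# Hypothetical finite example: A is a small set of integers and f(x) = x * x.
A = {-2, -1, 0, 1, 2}


def f(x):
    """The mapping f : A -> B used in this sketch."""
    return x * x


def function_range(domain):
    """Range of f: every y that equals f(x) for some x in the domain."""
    return {f(x) for x in domain}


def preimage(y, domain):
    """Preimage of y: every x in the domain whose image is y."""
    return {x for x in domain if f(x) == y}


print("image of -2:", f(-2))
print("range:", function_range(A))
print("preimage of 4:", preimage(4, A))
print("preimage of 3:", preimage(3, A))  # empty: no element of A is mapped to 3
```

Note that a preimage is a set: it may contain several elements, one element, or none at all.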
Yes. Each product in A will have a single price that it is mapped to. No one product will be mapped to 2 different prices.
There may, however, be two different products mapped to the same price (but this is “allowable” in the definition of a
function).
No. This mapping cannot be a function, since it is likely that there will be joint ownership of at least a few of the cars. If so,
then the same car in set A will map to more than one individual in set B.
Definition (#6)
A function f : A → B is called a one-to-one function if from f(x1) = f(x2) it follows that x1 = x2.
One-to-One Function
Example (#7)
Prove that $f(x) = x^3 + 1$ is a one-to-one function.
We assume that $f(x_1) = f(x_2)$ and then prove that $x_1 = x_2$. Indeed, $f(x_1) = x_1^3 + 1$ and $f(x_2) = x_2^3 + 1$; so, from $f(x_1) = f(x_2)$ it follows that
$x_1^3 + 1 = x_2^3 + 1 \Rightarrow x_1^3 = x_2^3 \Rightarrow x_1 = x_2$. Since $x_1 = x_2$ follows from $f(x_1) = f(x_2)$, we have shown that $f$ is one-to-one.
So, how can we prove that a function f that maps A to B is one-to-one? This is what we do: We set the images of two elements x1 and x2 to be the same. That is, we set
f (x1 ) to be equal to f (x2 ). And then we have to prove that x1 equals x2 . That is, from the fact that the images of two elements are the same, it should follow that the
elements themselves are the same. If we succeed in proving that x1 equals x2 , then we say that the function f is one-to-one.
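On a finite sample of inputs, the same idea can be checked mechanically: look for two different elements with the same image. This is a sketch only; the helper name `is_one_to_one` and the sample range are assumptions. A finite check can expose a counterexample, but it does not replace the proof above.

```python
def is_one_to_one(f, domain):
    """Return True if no two distinct elements of the finite domain share an image."""
    seen = {}  # maps each image y to the element x that produced it
    for x in domain:
        y = f(x)
        if y in seen and seen[y] != x:
            return False  # two different elements with the same image
        seen[y] = x
    return True


sample = range(-10, 11)  # hypothetical sample: the integers -10..10
print(is_one_to_one(lambda x: x**3 + 1, sample))  # True
print(is_one_to_one(lambda x: x**2, sample))      # False, e.g. -1 and 1 both map to 1
```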
Figure 1.3
So, what is a graph of a function? Again, let f map R to R, where R is the set of real numbers—i.e., f maps the set of real numbers to the set of real numbers.
For every element x from R, there is a corresponding element f (x). So, we can form a pair of elements, with x being in the first place and f (x) being in the second. The
graph of the function f is the set of pairs of elements (x, f (x)) for every real number x.
Graphing a Function
Example (#9)
Let's say you want to draw a graph of the function $f(x) = x^2$.
We draw the graph of this function by plotting several points belonging to the graph, for instance, (0,0), (1,1), (−1,1), (2,4), (−2,4), and then connecting them.
Example (#10)
Let $f(x) = x^2$. Then the image of every $x$ is $x^2$.
The preimage of every positive number $y$ exists and consists of two numbers, $x_1 = \sqrt{y}$ and $x_2 = -\sqrt{y}$. For instance, the preimage of 4 consists of the two numbers $\sqrt{4}$ and $-\sqrt{4}$, that is, 2 and −2. The preimage of 11 consists of $\sqrt{11}$ and $-\sqrt{11}$. So $f(x) = x^2$ is not a one-to-one function: here $f(x_1) = f(x_2)$, but $x_1$ need not equal $x_2$. That is, $(-\sqrt{11})^2 = (+\sqrt{11})^2$ but $-\sqrt{11} \neq \sqrt{11}$. This function is not one-to-one since $x_1 = x_2$ does not follow from $f(x_1) = f(x_2)$. A single counterexample such as the one above is enough to prove this.
Figure 1.5
Example (#11)
Let $f(x) = x^3$. Then the image of each $x$ is $x^3$ and the preimage of each $y$ is $x = \sqrt[3]{y}$. So $f(x) = x^3$ is a one-to-one function. This follows because each $y$ has only a single preimage, and it can be proven “theoretically” from the definition of a one-to-one function, as shown in Example #7.
Figure 1.6
Example (#12)
The floor function f : x → ⌊x⌋ maps every real number x to ⌊x⌋, the largest integer less than or equal to x. For instance, 3.7 is mapped to 3 (3 is the largest integer less than or equal to 3.7), −5.99 is mapped to −6, and 8.97 is mapped to 8. What is the range of the floor function? Is it one-to-one?
The range of the floor function is the set of all integers. (Indeed, for each integer n, any number less than n + 1 and greater than or equal to n belongs to the preimage of n; for instance, any number less than 5 and greater than or equal to 4 belongs to the preimage of 4.) The floor function is not one-to-one, as all numbers x with n ≤ x < n + 1 are mapped to n. (That is, the preimage of each image contains more than one element.)
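The values in this example can be reproduced with Python's standard `math.floor`. The sample inputs below are the ones from the example, plus a few hypothetical points from the interval from 4 up to (but not including) 5.

```python
import math

samples = (3.7, -5.99, 8.97)
images = [math.floor(x) for x in samples]
print(dict(zip(samples, images)))

# Distinct inputs taken from [4, 5) all share the image 4,
# which is why the floor function is not one-to-one.
shared = {math.floor(x) for x in (4.0, 4.25, 4.5, 4.99)}
print(shared)
```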
Example (#13)
Let m be a positive integer. Then every integer n can be represented as n = k ⋅ m + r, where r is the remainder of n when divided by m. (For instance, if m = 5, then 13 = 2 ⋅ 5 + 3, so 3 is the remainder.) The mapping f : n → r is called the modulo function. What is the range of the modulo function for a given m? Is it one-to-one?
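The remainder r always satisfies 0 ≤ r ≤ m − 1, and each of these values occurs as a remainder, so for a positive integer m the range of the modulo function is the set {0, 1, …, m − 1}. It's quite clear that many integers divided by m will have the same remainder (for instance, 7, 12, 17, etc. divided by 5 have the same remainder of 2). Thus the modulo function is not one-to-one.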
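A quick way to explore the modulo function is Python's `%` operator together with `divmod`. This is a sketch under the assumption that m is positive; the value m = 5 and the sampled range of n are chosen only for illustration.

```python
m = 5  # hypothetical modulus, assumed positive

# Python's % operator returns a remainder in 0..m-1 when m is positive,
# even for negative n, which matches n = k*m + r with 0 <= r < m.
remainders = {n % m for n in range(-20, 21)}
print(sorted(remainders))

# 7, 12 and 17 share one remainder, so the mapping is not one-to-one.
same_remainder = [n % m for n in (7, 12, 17)]
print(same_remainder)

# divmod returns the pair (k, r) from the representation n = k*m + r.
k, r = divmod(13, m)
print(k, r)
```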
Example (#14)
Let A be a table of customers’ records in a relational database, and let B be the Social Security number field that serves as the key field. Then the function f : A → B that maps customers’ records to Social Security numbers is one-to-one (a key field holds a distinct value for every record).
Example (#15)
Let A be a set of items in a department store, and let B be a set of prices. Then the function f : A → B that maps items to their prices is not one-to-one (as we assume that more than one item may be mapped to the same price).
Definition (#16)
A mapping h: A → B is called a hash function if it’s not one-to-one, i.e., there are x1 ∈ A and x2 ∈ A, x1 ≠ x2 such that h(x1 ) = h(x2 ) .
Example (#17)
The floor function from Example #12 and the modulo function from Example #13 are examples of hash functions.
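To see a collision in the sense of Definition #16, here is a toy hash built on the modulo function. The function `toy_hash` and its modulus 7 are hypothetical and purely illustrative; it is not a hash function anyone should use in practice.

```python
def toy_hash(text, m=7):
    """Toy hash (illustrative only): sum of character codes modulo m."""
    return sum(ord(ch) for ch in text) % m


# "ab" and "ba" contain the same characters, so their sums (and hashes) agree:
# two different inputs x1 != x2 with h(x1) == h(x2).
for word in ("ab", "ba", "abc", "cab"):
    print(word, "->", toy_hash(word))
```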
Example (#18)
This example is for students having advanced knowledge of the computer security field.
Let $\{0, 1\}^*$ denote the set of all possible strings of finite length made of 0s and 1s, and let $\{0, 1\}^n$ denote the set of all possible strings of length n. Then the mapping $h: \{0, 1\}^* \to \{0, 1\}^n$ is called a cryptographic hash function if
The floor function and the modulo function are not cryptographic hash functions as they do not satisfy conditions b and c.
Example (#19)
Message Digest 4 and 5 (MD4, MD5) and Secure Hash Algorithms 1 and 2 (SHA-1, SHA-2) are examples of cryptographic hash functions.
SHA-1, for example, maps strings of any length made of 0s and 1s to strings of length 160.
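The fixed output length can be observed with Python's standard `hashlib` module. This is a minimal sketch: the helper `sha1_bits` and the sample messages are assumptions for illustration, and messages are given as bytes rather than as strings of 0s and 1s.

```python
import hashlib


def sha1_bits(message: bytes) -> int:
    """Length in bits of the SHA-1 digest of the given message."""
    return len(hashlib.sha1(message).digest()) * 8


# Inputs of very different lengths all produce a 160-bit digest.
for message in (b"", b"abc", b"a" * 1000):
    print(len(message), "bytes in ->", sha1_bits(message), "bits out")
```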
No, it is not a one-to-one function. Since there will likely be two different products mapped to the same price, this means
that the function is NOT one-to-one. A condition of a one-to-one function is that no two distinct (different) elements
(products in A) map to the same image (price in B).
Yes, it is a one-to-one function. Each distinct element in A is mapped to a single distinct element in B.
A Refresher
You may remember the following notations from your high school years, but here is a reminder.
(−∞, b) is the set of all real numbers x, such that −∞ < x < b
(−∞, ∞) is the set of all real numbers x, such that −∞ < x < ∞
(a, ∞) is the set of all real numbers x, such that a < x < ∞
Let f : A → R, i.e., a function f maps a set A to the set of real numbers. Then {x ∈ A | f(x) < b} (the set of all elements x in A such that f(x) is less than b) is the preimage of (−∞, b).
Figure 1.7
Let’s consider a function f that maps a set A to the set of real numbers. What is the preimage of the interval made of numbers from negative infinity to b? In Figure 1.7 you
can see a bold line going from negative infinity to b. A shaded area inside of a set A is a subset of a set A such that the image of every element from it belongs to the bold
line; in other words, the image of every element of it is less than b.
Remark (#22)
If b1 < b2, then the preimage of (−∞, b1) is a subset of the preimage of (−∞, b2).
Figure 1.8
Remark (#23)
The preimage of (−∞, ∞) is the whole set A.
Definition (#24)
Note that this definition is optional, but it’s desirable that you try to understand it.
Let xn be a sequence of real numbers, that is, for every positive integer n, xn is a real number. Then a is called the limit of xn as n → ∞ if for every number ε > 0, however small, there is a positive integer N such that for every n > N, a − ε < xn < a + ε, or in other words, xn is in the ε-neighborhood of a.
The idea here is that only a finite number of elements of the sequence will stay outside of the ε-neighborhood of a for every very small ε, while an infinite number of elements
will stay inside of the ε-neighborhood of a.
Figure 1.9
Here is a definition of a limit: Let’s consider a sequence of numbers. Then we say that a number a is the limit of the sequence of numbers xn if, for each small number ε > 0, only a finite number of elements of that sequence are located outside of the ε-neighborhood of a, so that all the remaining (infinitely many) elements are located inside it. Now let’s consider the ε1 and ε2 neighborhoods of a, where ε2 is smaller than ε1. For the smaller neighborhood, more elements may stay outside of it, but still an infinite number of elements will be located inside. The smaller the neighborhood gets, the more elements may stay outside of it. But for every neighborhood, an infinite number of elements will be located inside. So we can visualize a sequence whose limit is a as made up of elements flocking around a.
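The role of N in the definition can be made concrete with a simple sequence. The sketch below assumes the hypothetical sequence x_n = 1/n with limit a = 0, for which any N ≥ 1/ε works; it spot-checks a finite stretch of terms after N, which illustrates the definition but is not a proof.

```python
import math


def x(n):
    """Hypothetical sequence x_n = 1/n, whose limit is a = 0."""
    return 1.0 / n


def find_N(eps):
    """For x_n = 1/n and a = 0: every n > 1/eps gives |x_n - 0| < eps."""
    return math.ceil(1.0 / eps)


a = 0.0
for eps in (0.1, 0.01, 0.001):
    N = find_N(eps)
    # Spot-check the next 1000 terms after N (a finite check, not a proof).
    inside = all(a - eps < x(n) < a + eps for n in range(N + 1, N + 1001))
    print("eps =", eps, "N =", N, "terms after N inside:", inside)
```

A smaller ε forces a larger N: more terms are left outside the neighborhood, but still only finitely many.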
Examples (#25)
#25a: $x_n = \frac{1}{n^2+1} \Rightarrow \lim_{n \to \infty} x_n = \lim_{n \to \infty} \frac{1}{n^2+1} = 0$ (the denominator is growing while the numerator stays the same, thus the fraction is getting smaller and its limit is 0).
#25b: $x_n = \frac{n}{n^2+1} \Rightarrow \lim_{n \to \infty} x_n = \lim_{n \to \infty} \frac{n}{n^2+1} = 0$ (the denominator grows as fast as $n^2$ (when $n^2$ is huge, who cares about 1), so the fraction behaves like $\frac{n}{n^2}$, but $\frac{n}{n^2} = \frac{1}{n}$, and $\frac{1}{n} \to 0$).
#25c: $x_n = \frac{n^2+3}{n^2+1} \Rightarrow \lim_{n \to \infty} x_n = \lim_{n \to \infty} \frac{n^2+3}{n^2+1} = 1$ ($n^2 + 3$ grows as fast as $n^2$ and $n^2 + 1$ grows as fast as $n^2$, so the fraction behaves like $\frac{n^2}{n^2}$, but $\frac{n^2}{n^2} = 1$).
#25d: $x_n = \frac{n^3+n^2+3n+5}{n^2+1} \Rightarrow \lim_{n \to \infty} x_n = \lim_{n \to \infty} \frac{n^3+n^2+3n+5}{n^2+1} = \infty$ ($n^3 + n^2 + 3n + 5$ grows as fast as $n^3$ (the rate of growth of $n^2$ and $n$ is insignificant compared to the rate of growth of $n^3$) and $n^2 + 1$ grows as fast as $n^2$, so the fraction behaves like $\frac{n^3}{n^2}$, but $\frac{n^3}{n^2} = n$ and $n \to \infty$, thus the limit is ∞).
In Example #25, we find the limits of some sequences; in each of them $n$ runs to infinity. The first one is $1/(n^2 + 1)$. When you look at the fraction, the bottom part of it, $n^2 + 1$, is growing very rapidly, while the top of it, which is 1, stays the same. So what happens with the fraction if its denominator is getting larger? The fraction will be getting smaller, though staying positive, as both the top and the bottom are positive. Here the fraction eventually gets smaller than any positive number while staying positive, which means it runs to zero, so the limit of this sequence is zero. Let’s go to the next one, $x_n = n/(n^2 + 1)$. It’s quite clear that both the top and the bottom of this fraction are growing as $n$ is growing. But it turns out the denominator is growing a lot faster. The denominator grows as fast as $n^2$: adding 1 to a huge number barely changes it. So the top of the fraction grows as $n$, and the bottom of it grows as $n^2$. What is $n/n^2$? It’s $1/n$. When $n$ runs to infinity, what happens with $1/n$? It runs to zero. That’s why the limit is zero. Let’s go to the next example, $x_n = (n^2 + 3)/(n^2 + 1)$. The top grows as $n^2$. The bottom also grows as $n^2$. But $n^2/n^2$ is 1. So the limit is 1.
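The behavior described for the sequences in Examples #25 can be observed numerically by evaluating them for larger and larger n. This is only a sketch: the dictionary `sequences` and the sample values of n are assumptions, and printing a few terms suggests a limit without proving it.

```python
# The four sequences from Examples #25, keyed by their labels.
sequences = {
    "25a": lambda n: 1 / (n**2 + 1),
    "25b": lambda n: n / (n**2 + 1),
    "25c": lambda n: (n**2 + 3) / (n**2 + 1),
    "25d": lambda n: (n**3 + n**2 + 3 * n + 5) / (n**2 + 1),
}

for label, x in sequences.items():
    # Terms for n = 10, 1000, 100000: the first two sequences shrink toward 0,
    # the third settles near 1, and the fourth keeps growing roughly like n.
    print(label, [x(n) for n in (10, 1000, 100000)])
```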
$\frac{5}{2}$. $5x^2 + x$ grows as fast as $5x^2$. $2x^2 + 3$ grows as fast as $2x^2$. So the fraction behaves as $\frac{5x^2}{2x^2} = \frac{5}{2}$. Thus $\lim \frac{5x^2 + x}{2x^2 + 3} = \frac{5}{2}$.
Definition (#26)
e is one of the most important constants. Here is a definition of e:
The limit of the sequence $x_n = \left(1 + \frac{1}{n}\right)^n$ as $n \to \infty$ exists, and its value is called e. The approximate value of e is 2.718. e is a constant like others you may be familiar with (like π = 3.1415…).
The techniques used to find limits of sequences could be applied to find limits of functions. (We do not define the limit of a function; we assume it’s intuitively clear.)
Let’s consider the following sequence: $x_n = \left(1 + \frac{1}{n}\right)^n$. Here $n$ is growing; in other words, $n$ is running to infinity. It turns out that this sequence has a limit. The number that is the limit of this sequence is called e. e approximately equals 2.718.
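The convergence of this sequence is easy to watch numerically. The sketch below evaluates a few terms and prints `math.e` from Python's standard library for comparison; the sample values of n are arbitrary.

```python
import math


def x(n):
    """The n-th term of the sequence (1 + 1/n)**n."""
    return (1 + 1 / n) ** n


for n in (1, 10, 1000, 1000000):
    print(n, x(n))

print("math.e =", math.e)
```

The terms increase slowly toward e: the first term is 2, and even n = 1000 agrees with e only to about two decimal places.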
Example (#27)
lim (2x + 1) = ∞ when x → ∞ (as x is growing, so is 2x + 1, so we say that 2x + 1 is running to infinity, or lim (2x + 1) = ∞).
Similarly, let’s find the limit of 3x + 1. As x is growing, so is 3x, thus the limit of 3x + 1 is infinity.
Example (#28)
Let $f(x) = \frac{1}{x}$. Then $\lim \frac{1}{x} = 0$ when $x \to \infty$. (Indeed, the denominator is getting larger while the numerator stays the same, thus the fraction is getting smaller and smaller.)
Let’s consider the following function: $f(x)$ equals $\frac{1}{x}$. As $x$ runs to infinity, the denominator of this fraction is getting larger and larger, making the whole fraction smaller and smaller. So the limit of $\frac{1}{x}$ is zero.
Example (#29)
1. $\lim_{x \to \infty} \frac{x}{x^2+1} = 0$. Indeed, $x^2 + 1$ grows as fast as $x^2$, so the fraction behaves like $\frac{x}{x^2}$, but $\frac{x}{x^2} = \frac{1}{x}$, so $\frac{x}{x^2+1} \to 0$.
2. $\lim_{x \to \infty} \frac{x^3+x^2+3x-10}{x^4+1} = 0$. The growth of the numerator is determined by $x^3$ and the growth of the denominator is determined by $x^4$, so the fraction behaves like $\frac{x^3}{x^4}$, but $\frac{x^3}{x^4} = \frac{1}{x}$, so $\frac{1}{x} \to 0$.
3. $\lim_{x \to \infty} \frac{x^2+5}{x^2-5} = 1$. Both the top and the bottom of the fraction grow as fast as $x^2$. Thus $\frac{x^2+5}{x^2-5} \to 1$.
4. $\lim_{x \to \infty} \frac{2x^3+3}{x^3+x+1} = 2$. The top grows as fast as $2x^3$ and the bottom grows as fast as $x^3$, thus the fraction $\frac{2x^3+3}{x^3+x+1} \to 2$.
Let’s do more examples. Let’s find the limit of $\frac{x}{x^2+1}$ as $x$ runs to infinity. $x^2 + 1$ runs to infinity as fast as $x^2$. So the bottom of the fraction grows as $x^2$ while the top of it grows as $x$. Therefore the fraction behaves like $\frac{x}{x^2}$. But $\frac{x}{x^2}$ equals $\frac{1}{x}$. As $x$ runs to infinity, $\frac{1}{x}$ runs to zero, so the limit is zero.
Next example: $\frac{x^3+x^2+3x-10}{x^4+1}$. The top of the fraction runs to infinity and behaves like $x^3$, as $x^3$ is the fastest-growing term. But the bottom behaves like $x^4$. $\frac{x^3}{x^4}$ is equivalent to $\frac{1}{x}$. As $x$ runs to infinity, $\frac{1}{x}$ runs to zero, so the limit is zero. Let’s move on to the next example. Here $x^2 + 5$ is on the top of the fraction and $x^2 - 5$ is on the bottom. The top behaves like $x^2$. The bottom also behaves like $x^2$. But $x^2$ over $x^2$ is 1. So the limit is 1.
Next example. On the top of the fraction is $2x^3 + 3$. On the bottom is $x^3 + x + 1$. So the top behaves like $2x^3$, and the bottom behaves like $x^3$. $2x^3$ over $x^3$ is 2. Thus the limit is 2.
0. As x increases, g(x) is getting smaller and smaller. As such, it appears as though g(x) is going to 0 as x → ∞.
20. Since the function g(x) is “smooth” (or “continuous,” a term we will define shortly), the limit as x approaches 0 is equal to the function evaluated at x = 0. That is, the limit is equal to $g(0) = \frac{0 + 40}{0^2 + 2} = \frac{40}{2} = 20$.
Continuous Functions
Definition (#30)
A function f : R → R is called continuous at the point x0 if f(x0 + Δx) − f(x0) → 0 as Δx → 0 (Δx is pronounced “delta x” and is a very small value that runs to 0).
In Figure 1.10 you can see that f (x0 + Δx) − f (x0 ) = BC and Δx = AC , so that f (x) is continuous at x = x0 if BC is getting smaller as AC is getting smaller, i.e.,
the point B will be getting to the point A moving along the curve as Δx → 0 .
Figure 1.10
Here we introduce a very important definition: continuity of a function at a point. Let’s look at Figure 1.10. The point is x0, so the point A with coordinates x0 and f(x0) belongs to the graph of the function f. We do the following: We consider the point x0 + Δx, where Δx is a very small value. Thus x0 + Δx is not that far away from x0. Then we check the value of the function at x0 + Δx. The value is, of course, f(x0 + Δx). You can see a point B with the coordinates x0 + Δx and f(x0 + Δx) right there. So what is the difference between the value of the function at x0 + Δx and at x0? The difference is f(x0 + Δx) − f(x0), which is BC. The function f(x) is called continuous at x0 if BC gets close to zero as Δx gets close to zero. To prove that a function is continuous at x0, we must show that f(x0 + Δx) − f(x0) → 0 as Δx → 0.
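The quantity f(x0 + Δx) − f(x0) can be tabulated for shrinking Δx. The sketch below is an illustration under stated assumptions: it uses the hypothetical function x * x at x0 = 3, and contrasts it with the floor function from Example #12 at x0 = 1 approached from the left, where the difference does not shrink.

```python
import math


def difference(f, x0, dx):
    """f(x0 + dx) - f(x0): the length BC in the notation of the definition."""
    return f(x0 + dx) - f(x0)


def square(x):
    return x * x


for dx in (0.1, 0.001, 0.00001):
    print(
        "dx =", dx,
        "| square at 3:", difference(square, 3.0, dx),
        "| floor at 1 from the left:", difference(math.floor, 1.0, -dx),
    )
```

For the square function the difference shrinks together with Δx. For the floor function it stays at −1 no matter how small Δx gets, so the floor function is not continuous at 1.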
Example (#31)
$f(x) = \frac{1}{x}$ is not continuous at $x = 0$.
First of all, this function is not even defined at x = 0, but even if it were defined at 0, moving your pencil along the curve would never get you to the value of f(0). Try it!
Figure 1.11
Example (#33)
$f(x) = x^2$ and $f(x) = x^3 + x + 1$ are continuous on any interval.
$f(x) = x^3 + x + 1$ is the sum of three continuous functions: $f(x) = x^3$, $f(x) = x$, and $f(x) = 1$. The sum of continuous functions is a continuous function.
Example (#34)
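$f(x) = x^2$, so $f(x + \Delta x) = (x + \Delta x)^2$.
$f(x + \Delta x) - f(x) = (x + \Delta x)^2 - x^2 = x^2 + 2x \cdot \Delta x + (\Delta x)^2 - x^2 = 2x \cdot \Delta x + (\Delta x)^2 = \Delta x(2x + \Delta x) \to 0$, as $\Delta x \to 0$.
Thus, $f(x) = x^2$ is a continuous function.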
Example (#35)
Let $f(x) = x^2 + x$. Then
$f(x + \Delta x) - f(x) =$
$(x + \Delta x)^2 + (x + \Delta x) - (x^2 + x) =$
$x^2 + 2x \cdot \Delta x + (\Delta x)^2 + x + \Delta x - x^2 - x =$
$2x \cdot \Delta x + (\Delta x)^2 + \Delta x =$
$\Delta x(2x + \Delta x + 1) \to 0$, as $\Delta x \to 0$.
Thus, $f(x) = x^2 + x$ is a continuous function.
If a function is continuous at a point, it may have a derivative at that point.
Yes, the function is continuous at $x = 0$. The graph at this point is a smooth curve. If we knew the equation of the curve (i.e., $h(x) = \frac{1}{2}x^2 + x - 6$), then we could prove that the function is continuous as we did in some of the previous examples.
No, the function is not continuous at $x = 10$. The graph at this point is NOT a smooth curve. If we knew the equation of the curve, then we could prove that the function is not continuous as we did in some of the previous examples. The function, however, is continuous at other points (such as $x = 0$).
Definition (#36)
$\lim \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}$ (if it exists as $\Delta x \to 0$) is called the derivative of the function $f(x)$ at $x_0$ and is denoted by $f'(x_0)$, so that $f'(x_0) = \lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}$.
The geometrical meaning of the derivative at the point $x_0$ is the slope of the tangent line to the curve at the point $(x_0, f(x_0))$.
Figure 1.12
Here comes the definition of the derivative of a function at a point. Let’s go back to Figure 1.10. The derivative of a function f(x) at the point x0 is the limit of BC over AC when the point B travels along the curve of the function to the point A. As AC is getting smaller, so is BC, and the limit of BC over AC is called the derivative of f(x) at x0.
In Figure 1.12 you can see a part of the graph of a function f(x). At the point x0, the value of the function is f(x0). Let’s draw the tangent line to the curve at the point (x0, f(x0)). The slope of this tangent line is the derivative of the function f(x) at x0.
Remark (#37)
If the derivative of f (x) exists at each point of some interval (i.e., there exists the tangent line to the curve at each point of the interval), then we say that f (x) has a derivative
in the interval.
Find derivatives of the following functions using the definition of a derivative. In subsequent sections we will review shortcuts and rules that can be used to find the derivative
more easily. But first, we show how these can be derived by hand. You may note some patterns here that will present themselves as rules in the later sections.
a. $f(x) = c$ ($c$ is a constant) $\Rightarrow f'(x) = \lim \frac{c - c}{\Delta x} = \lim \frac{0}{\Delta x} = 0$, as $\Delta x \to 0$, so $c' = 0$, i.e., the derivative of a constant (2, 17, −11, … are examples of a constant) is zero.
b. $f(x) = x \Rightarrow f'(x) = \lim \frac{(x + \Delta x) - x}{\Delta x} = \lim \frac{\Delta x}{\Delta x} = \lim 1 = 1$, so $x' = 1$, i.e., the derivative of $x$ is 1.
… $= a$, as $\Delta x \to 0$, so $(ax + b)' = a$, i.e., the derivative of $ax + b$ is $a$.
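e. $f(x) = x^2 \Rightarrow f'(x) = \lim \frac{(x + \Delta x)^2 - x^2}{\Delta x} = \lim \frac{x^2 + 2x \cdot \Delta x + (\Delta x)^2 - x^2}{\Delta x} = \lim \frac{2x \cdot \Delta x + (\Delta x)^2}{\Delta x} = \lim \frac{\Delta x(2x + \Delta x)}{\Delta x} = \lim (2x + \Delta x) = 2x$, as $\Delta x \to 0$, so the derivative of $x^2$ is $2x$.
f. $f(x) = x^2 + 1 \Rightarrow f'(x) = \lim \frac{(x + \Delta x)^2 + 1 - (x^2 + 1)}{\Delta x} = \lim \frac{x^2 + 2x \cdot \Delta x + (\Delta x)^2 + 1 - x^2 - 1}{\Delta x} = \lim \frac{2x \cdot \Delta x + (\Delta x)^2}{\Delta x} = \lim (2x + \Delta x) = 2x$, as $\Delta x \to 0$, so $(x^2 + 1)' = 2x$.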
Let’s look at part e from above, where $f(x)$ equals $x^2$. We do exactly what we have done in the previous problems. First we find the value of $f(x)$ at $x + \Delta x$, which is $(x + \Delta x)^2$. Now we set up the formula for finding the derivative, square $x + \Delta x$, combine the terms, factor out $\Delta x$, observe that when $\Delta x$ runs to zero, $2x + \Delta x$ runs to $2x$, and finally obtain the result: the derivative of $x^2$ equals $2x$. And now let's move on to part f. Part f is almost the same as part e.
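The ratio from the definition of the derivative can also be evaluated numerically for shrinking Δx. This is a sketch, assuming the sample point x = 3 and a handful of step sizes; the helper name `difference_quotient` is not from the lecture.

```python
def difference_quotient(f, x, dx):
    """(f(x + dx) - f(x)) / dx, the ratio whose limit defines the derivative."""
    return (f(x + dx) - f(x)) / dx


def square(x):
    return x * x


x = 3.0  # hypothetical sample point
for dx in (0.1, 0.01, 0.001, 0.0001):
    # Algebraically the ratio equals 2x + dx, so it approaches 2x = 6.
    print(dx, difference_quotient(square, x, dx))
```

The printed values differ from 6 by about Δx, matching the algebraic result that the ratio equals 2x + Δx (up to floating-point rounding).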
Remark (#39)
If $f(x) = e^x$, then $f'(x) = e^x$, i.e., the derivative of $e^x$ is the same $e^x$.
If f (x) equals e to the power of x, then the derivative of this function will be exactly the same. So the derivative of e to the power of x is e to the power of x. It looks like the
exponential function doesn't really care if anyone attempts to take a derivative of it. It will stay exactly the same.
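This remark is stated without proof, but it can be checked numerically with `math.exp` from the standard library. The sketch below compares the slope over a small step with the value of the function itself; the step size 1e-6 and the sample points are assumptions for illustration.

```python
import math


def slope(f, x, dx=1e-6):
    """Approximate derivative of f at x using a small step dx."""
    return (f(x + dx) - f(x)) / dx


for x in (0.0, 1.0, 2.0):
    # The approximate slope of e**x at x is very close to e**x itself.
    print(x, slope(math.exp, x), math.exp(x))
```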
Now we'll talk about the rules of differentiation. Just using the definition of the derivative of a function, we can prove these four rules, but we have no time for that so we'll just
take them for granted.
1. The first rule is that if a function is the sum of two other functions, then the derivative of it is the sum of the derivatives of these functions.
2. The second rule is that if a function is a product of a constant and some other function, then the derivative of it is the constant times the derivative of the other function.
3. The third rule is that if a function f equals the product of two other functions, then the derivative of it is the derivative of the first function times the second function plus
the first function times the derivative of the second function.
4. The fourth rule is that if a function f (x) equals x to the power of n, then the derivative of it is n times x to the power of n − 1. It's really easy to remember this rule.
The exponent n just steps down in front of x and the exponent becomes n − 1.
1. In this case, we can use the sum rule, which tells us that the derivative of a sum is the sum of the derivatives. That is, the derivative of $x + 5$ is equal to the derivative of $x$ PLUS the derivative of 5. The derivative of $x$ is 1. We know this via the definition of a derivative or the power rule (which says that the derivative of $x^n$ is $n \cdot x^{n-1}$, so the derivative of $x = x^1$ is $1 \cdot x^{1-1} = 1 \cdot x^0 = 1 \cdot 1 = 1$). The derivative of 5 is 0 since the derivative of any constant is 0. As such, the derivative of $x + 5$ is equal to the derivative of $x$ (which is 1) PLUS the derivative of 5 (which is 0). That is, the derivative of $x + 5$ is $1 + 0 = 1$.
Answer: $10x^4$. In this case, we can use the constant rule and the power rule. The constant rule tells us that if our function is in the
form $c \cdot g(x)$, then the derivative is $c \cdot g'(x)$. In other words, the derivative of a constant times a function of $x$ is the
constant times the derivative of that function. Here, $f(x) = c \cdot g(x) = 2 \cdot x^5$, with $c = 2$ and $g(x) = x^5$. Thus,
$f'(x) = c \cdot g'(x) = 2 \cdot (x^5)'$.
Now, we just need to find the derivative of $x^5$. For this, we can use the power rule, which tells us that the derivative of
$x^n$ is $n \cdot x^{n-1}$. In this case, $n = 5$. The derivative of $x^5$ is $5 \cdot x^{5-1} = 5x^4$. Putting everything together, we
have $f'(x) = c \cdot g'(x) = 2 \cdot (x^5)' = 2 \cdot 5x^4 = 10x^4$.
Examples of Differentiation
The following examples demonstrate the rules of differentiation.
Remark (#41)
If $f(x) = e^{g(x)}$, then $f'(x) = e^{g(x)} \cdot g'(x)$, i.e., the derivative of e to the power of a function is e to the power of that function times the derivative of the function.
Remark 5 deals with what is known as the chain rule. We do not have enough time to prove it. Nevertheless, we'll try to explain how to take a derivative of a function that is e to
the power of some other function. This is what we do: We simply write down e to the power of the other function and multiply it by the derivative of the other function. Now let’s
move on to the examples.
Example (#42)
1. If $f(x) = e^{-x}$, then $f'(x) = e^{-x} \cdot (-x)' = -e^{-x}$
2. If $f(x) = e^{-x^2}$, then $f'(x) = e^{-x^2} \cdot (-x^2)' = -2x \cdot e^{-x^2}$
3. If $f(x) = xe^x$, then $f'(x) = e^x + xe^x = e^x \cdot (x + 1)$
4. If $f(x) = xe^{-x}$, then $f'(x) = e^{-x} + x(e^{-x})' = e^{-x} - xe^{-x} = e^{-x} \cdot (1 - x)$
5. If $f(x) = x^2 e^x$, then $f'(x) = 2xe^x + x^2 e^x = e^x \cdot (2x + x^2)$
OK, the first example is e to the power of negative x. In this case the other function is negative x. So what is the derivative of it? The answer is e to the negative x times the
derivative of negative x. So what is the derivative of negative x? Well, negative x is actually minus one times x, where minus one is a constant. So the derivative of minus one
times x is minus one times the derivative of x. But the derivative of x is one. Thus the final answer is minus e to the power of negative x.
The second example is e to the power of negative x squared. The derivative of it is e to the power of negative x squared times the derivative of negative x squared. What is
the derivative of negative x squared? It’s negative 2x. We multiply e to the power of negative x squared by negative 2x to get the answer.
The third example is x times e to the power of x. In this case we have to take a derivative of the product of two functions. The first function is x. The second one is e to the
power of x. Remember how we take the derivative of the product? It is the derivative of the first times the second plus the first times the derivative of the second. So let’s start.
The derivative of the first, that is the derivative of x is one. We multiply it by the second function, i.e., by e to the power of x. Plus the first, which is x times the derivative of the
second. But the second is e to the power of x. The derivative of e to the power of x is the same e to the power of x and that’s how we get that.
The fourth example is the product of x and e to the negative x. The derivative of the product is the derivative of x times e to the negative x plus x times the derivative of e to
the negative x. The derivative of e to the negative x is minus e to the negative x.
The fifth example is x squared times e to the x. So the derivative of it is the derivative of x squared times e to the x plus x squared times the derivative of e to the x.
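One way to gain confidence in the five results is to compare each stated derivative with a numerical estimate. The following is a minimal sketch under stated assumptions: the helper `numerical_derivative`, its step size, and the sample points are illustrative and are not part of the lecture.

```python
import math

def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x). The step h is an assumed value."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Each pair is (function, derivative as stated in the example).
examples = [
    (lambda x: math.exp(-x),         lambda x: -math.exp(-x)),
    (lambda x: math.exp(-x ** 2),    lambda x: -2 * x * math.exp(-x ** 2)),
    (lambda x: x * math.exp(x),      lambda x: math.exp(x) * (x + 1)),
    (lambda x: x * math.exp(-x),     lambda x: math.exp(-x) * (1 - x)),
    (lambda x: x ** 2 * math.exp(x), lambda x: math.exp(x) * (2 * x + x ** 2)),
]

def max_error(points=(-1.0, 0.0, 0.5, 2.0)):
    """Largest gap between the numerical estimate and the stated formula."""
    return max(
        abs(numerical_derivative(f, x) - df(x))
        for f, df in examples
        for x in points
    )

print(max_error())
```

A value close to zero indicates that the formulas agree with the numerical estimates at the sampled points.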
Once we know the derivative of a function in the interval, it's easy to find the derivative of a function at each point of the interval. So what do we do for that? We just plug in the
value of the point into the derivative. Let's have three examples.
In a), the function f is x squared plus one and we’d like to find the value of its derivative at the point x = 0. So we take the derivative of x squared plus one to obtain 2x and
then plug in zero for x. So the derivative of f at x = 0 is 0.
In b), the function f is e to the power of negative x and we’d like to find the value of the derivative at four. So we take the derivative of e to the negative x to obtain minus e to
the negative x. Then we plug in four to obtain the answer: minus e to the power of minus four.
In c), the function f is x times e to the x and we’d like to find the value of the derivative at 2. So we take the derivative of x times e to the x, which is e to the x times (x + 1),
and then plug 2 into the derivative to obtain 3 times e squared.
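The "differentiate first, then plug in the point" procedure for cases a), b), and c) can be written out as a short sketch. The dictionary names below are illustrative assumptions; the derivative formulas are the ones obtained with the rules of differentiation.

```python
import math

# Derivatives obtained with the rules of differentiation; each maps x to f'(x).
derivatives = {
    "a": lambda x: 2 * x,                  # f(x) = x^2 + 1
    "b": lambda x: -math.exp(-x),          # f(x) = e^(-x)
    "c": lambda x: math.exp(x) * (x + 1),  # f(x) = x * e^x
}

# Points at which each derivative is evaluated.
points = {"a": 0, "b": 4, "c": 2}

values = {label: derivatives[label](points[label]) for label in derivatives}
for label, value in values.items():
    print(f"{label}) f'({points[label]}) = {value:.6f}")
```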
Answer: $10x^4 e^x + 2x^5 e^x$.
In this case, we can use the product rule. The product rule tells us that if our function is in the form $g(x) \cdot h(x)$, then the
derivative is $g'(x) \cdot h(x) + g(x) \cdot h'(x)$. Here, $f(x) = g(x) \cdot h(x) = 2x^5 \cdot e^x$, where we can define $g(x) = 2x^5$
and $h(x) = e^x$. From the previous example, $g'(x) = 10x^4$. And $h'(x) = e^x$ per Remark 5, which says that the
derivative of $e^x$ equals $e^x \cdot (x)' = e^x \cdot 1$. Thus,
$f'(x) = g'(x) \cdot h(x) + g(x) \cdot h'(x) = (10x^4) \cdot (e^x) + (2x^5) \cdot (e^x) = 10x^4 e^x + 2x^5 e^x$.
Answer: $5e^{5x}$.
In this case, we can use Remark 5, which says that the derivative of $e^{g(x)}$ equals $e^{g(x)} \cdot (g(x))'$. Here, $f(x) = e^{g(x)}$,
where we can define $g(x) = 5x$, so $g'(x) = 5$. Putting everything together, $f'(x) = e^{g(x)} \cdot (g(x))' = e^{5x} \cdot 5 = 5e^{5x}$.
10 .
Differentials
a. $d(x + 3) = (x + 3)'\,dx = dx$
b. $d(2x) = 2\,dx$
c. $d(ax + b) = a\,dx$
d. $d(x^2 + 1) = 2x\,dx$
e. $d(e^x) = e^x\,dx$
A differential of a function is defined as the derivative of that function times dx, where dx is a very small value. Let’s have some examples.
In a), the function is x + 3. The derivative of it is 1, so the differential of x + 3 is 1 times dx, so the answer is dx.
In b), the function is 2x. The derivative of it is 2, so that the differential of 2x is 2 times dx.
In c), the function is ax + b. The derivative of it is a, so that the differential is a times dx.
In d), the function is x squared plus one. The derivative of it is 2x, so that the differential is 2x times dx.
In e), the function is e to the x. The derivative of it is e to the x, so that the differential is e to the x times dx.
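To see why the differential is useful, it can be compared with the actual change of the function over a small step. The sketch below uses example d), f(x) = x squared plus one. The sample point `x0` and the step sizes are assumptions chosen for illustration.

```python
def f(x):
    return x ** 2 + 1          # the function from example d)

def f_prime(x):
    return 2 * x               # its derivative

def differential(x, dx):
    """df = f'(x) * dx"""
    return f_prime(x) * dx

x0 = 3.0                       # assumed sample point
for dx in (0.1, 0.01, 0.001):  # assumed step sizes
    actual_change = f(x0 + dx) - f(x0)
    print(dx, differential(x0, dx), actual_change)
```

For this particular function the gap between the two columns is dx squared, so it shrinks much faster than dx itself.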
Complementary Material
The following complementary material is for the student who would like to learn more about the material presented in this
lecture. Needless to say, it is optional.
Figure 1.19
Definition 17
A function f : A → B is called a bijection if and only if it’s a one-to-one and onto function.
Remark 7
$f(x) = x^3$ is a bijection, as it is one-to-one and onto.
Example 21
Let $f : \mathbb{R} \to \mathbb{R}$ be such that $f(x) = 2x + 3$. Then the image of each $x$ is $2x + 3$, and the preimage of each $y$ can be found in the following way: we want to find all $x$
such that $2x + 3 = y$. Solving for $x$, we obtain $2x = y - 3$, or $x = \frac{y - 3}{2}$.
Once more than one function is given, we can define the composition of functions.
Definition 18
Let $f : A \to B$ and $g : B \to C$. Then $h = g \circ f$, mapping $A$ to $C$, where $(g \circ f)(x) = g(f(x))$, is called the composition of the functions $g$ and $f$. It is assumed that the
range of the function $f$ is contained in the domain of the function $g$.
Figure 1.20
Example 22
Let $f(x) = x^2 + 1$ and $g(x) = x^3$, where $x \in \mathbb{R}$.
Then $(g \circ f)(x) = (x^2 + 1)^3$ and $(f \circ g)(x) = (x^3)^2 + 1 = x^6 + 1$.
The domain for $f \circ g$ and for $g \circ f$ is $\mathbb{R}$. The range for $f \circ g$ and for $g \circ f$ is the set of all real numbers greater than or equal to 1.
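Composition translates directly into code: apply the inner function first, then the outer one. The following sketch uses the two functions from this example; the helper name `compose` is an illustrative assumption.

```python
def compose(outer, inner):
    """Return the function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))

f = lambda x: x ** 2 + 1
g = lambda x: x ** 3

g_after_f = compose(g, f)   # (g ∘ f)(x) = (x^2 + 1)^3
f_after_g = compose(f, g)   # (f ∘ g)(x) = x^6 + 1

for x in (-2, 0, 2):
    print(x, g_after_f(x), f_after_g(x))
```

The printed values differ for most inputs, which illustrates that the order of composition matters in general.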
Once there is a bijection between two sets, we can define the inverse function.
Figure 1.21
Example 23
The function $e^x$ maps $(-\infty, \infty)$ to $(0, \infty)$ and is a bijection.
So the inverse function g : (0, ∞) → (−∞, ∞) does exist and is also a bijection.
Example 24
Graph the function $f(x) = \ln x$.
As $e^0 = 1$, $\ln 1 = 0$. As $x \to \infty$, $e^x \to \infty$; thus $\ln x \to \infty$ as $x \to \infty$.
As $x \to -\infty$, $e^x \to 0$; thus $\ln x \to -\infty$ as $x \to 0^+$.
Figure 1.22
Remark 8
The derivative of $f(x) = \ln x$ is $f'(x) = \frac{1}{x}$.
Remark 9
A logarithmic function is a very important function, as its rate of growth is very small.
The limit of a sequence was introduced in Definition 8. The next example illustrates how this definition is used.
Example 25
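Prove that $\lim \frac{1}{n} = 0$ when $n \to \infty$.
Here $x_n = \frac{1}{n}$. Let $\varepsilon$ be a very small positive number and let $x_n < \varepsilon$.
From that follows $\frac{1}{n} < \varepsilon \Rightarrow n \cdot \varepsilon > 1 \Rightarrow n > \frac{1}{\varepsilon}$.
That is, for every $\varepsilon > 0$, there is $N = \frac{1}{\varepsilon}$, such that for every $n > N$, $x_n < \varepsilon$.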
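The choice of N in this proof can be tried out for concrete values of ε. The sketch below only checks a finite sample of indices beyond N, so it illustrates the definition rather than proving anything. The function names and the sampled values of ε are assumptions.

```python
import math

def threshold(eps):
    """N = 1/eps, as derived in the example."""
    return 1 / eps

def holds_beyond_threshold(eps, how_many=1000):
    """Check x_n = 1/n < eps for a finite sample of integers n > N."""
    first_n = math.floor(threshold(eps)) + 1
    return all(1 / n < eps for n in range(first_n, first_n + how_many))

for eps in (0.5, 0.25, 0.003):
    print(eps, threshold(eps), holds_beyond_threshold(eps))
```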
Definition 21
What does it mean that the limit of a sequence is ∞? The following definition provides the answer to this question.
We say that the limit of the sequence xn is +∞, if for every very large number M > 0 there is N , such that for every n > N : xn > M
Example 26
Prove that $\lim n^2 = \infty$. Let $M$ be a very large number. Then from $n^2 > M$ follows $n > \sqrt{M}$, so that $N = \sqrt{M}$.
Definition 22
The limit of a function when x → ∞ .
Let f : R → R be a function. Then we say that b is the limit of a function f when x → ∞ , if for every very small number ε > 0 , there is M > 0 such that if x > M then
b − ε < f (x) < b + ε .
Figure 1.23
Example 27
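Let $f(x) = \frac{1}{x}$. Then $\lim f(x) = 0$ when $x \to \infty$.
Let $\frac{1}{x} < \varepsilon \Rightarrow x > \frac{1}{\varepsilon}$, so we can say that for every $\varepsilon > 0$ there is $M = \frac{1}{\varepsilon}$, such that for every $x > M$: $x > \frac{1}{\varepsilon}$, and therefore $0 < f(x) = \frac{1}{x} < \varepsilon$.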
Practice Problems
Is f (x) a one-to-one function? If it is, prove it. If it is not, give a counterexample.
Is f (x) a one-to-one function? If it is, prove it. If it is not, give a counterexample.
$2x_1^3 - 11 = 2x_2^3 - 11 \Rightarrow x_1^3 = x_2^3 \Rightarrow x_1 = x_2$, so the function is one-to-one.
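$\lim \frac{x^{2016} + 2015}{x^{2016} + 2016} = \frac{2015}{2016}$
$\lim \frac{x^{2015} + 2016}{x^{2016} + 2015} = 0$
$\lim \frac{2x^{11} + x^2 + 12}{3x^{11} + x^2} = \frac{2}{3}$
$\lim \frac{3n^3 + 2n - 11}{3n^2 - 1} = \infty$, as $n^3 \to \infty$ faster than $n^2 \to \infty$.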
▶ $f(x) = 2x + 7$
$f(x) = 2x + 7 \Rightarrow f'(x) = 2 \cdot 1 + 0 = 2$
▶ $f(x) = 2e^x$
$f(x) = 2e^x \Rightarrow f'(x) = 2e^x$
▶ $f(x) = 4x^4 + 11$
$f(x) = 4x^4 + 11 \Rightarrow f'(x) = 4 \cdot 4x^3 + 0 = 16x^3$
▶ $f(x) = x^5 + x^4 + 4x^3 - 100$
$f(x) = x^5 + x^4 + 4x^3 - 100 \Rightarrow f'(x) = 5x^4 + 4x^3 + 12x^2$
▶ $f(x) = (x^2 - 3)e^x$
$f(x) = (x^2 - 3)e^x \Rightarrow f'(x) = 2x \cdot e^x + (x^2 - 3) \cdot e^x$ ($f(x)$ is the product of two functions, $x^2 - 3$ and $e^x$, so
we find the derivative of the product of two functions.)
▶ $f(x) = x(x + 2)^2 = x(x^2 + 4x + 4) = x^3 + 4x^2 + 4x$
$f(x) = x(x + 2)^2 = x(x^2 + 4x + 4) = x^3 + 4x^2 + 4x \Rightarrow f'(x) = 3x^2 + 8x + 4$
$f(x + \Delta x) - f(x) = (x + \Delta x)^2 + 5 - (x^2 + 5)$
$= x^2 + 2x \cdot \Delta x + (\Delta x)^2 + 5 - x^2 - 5$
$= 2x \cdot \Delta x + (\Delta x)^2$
$= \Delta x(2x + \Delta x).$
$f'(x) = 6x^2 - 1 \Rightarrow f'(-2) = 6(4) - 1 = 23$
▶ $f(x) = 2x^3 + x^2 - x$
$f(x) = 2x^3 + x^2 - x \Rightarrow d(2x^3 + x^2 - x) = (2x^3 + x^2 - x)'\,dx = (6x^2 + 2x - 1)\,dx$
▶ $f(x) = 2e^{x^2 - 1}$
$f(x) = 2e^{x^2 - 1} \Rightarrow d(2e^{x^2 - 1}) = 2 \cdot 2xe^{x^2 - 1}\,dx = 4xe^{x^2 - 1}\,dx$
▶ $f(x) = -e^{-x} - x^3 + 4$
$f(x) = -e^{-x} - x^3 + 4 \Rightarrow f'(x) = e^{-x} - 3x^2 \Rightarrow d(-e^{-x} - x^3 + 4) = (e^{-x} - 3x^2)\,dx$
Complementary Problems
The following complementary problems are for those of you who wanted to read the complementary material.
$\mathbb{R}$ is the domain for $f \circ g$ and $g \circ f$. The range for $f \circ g$ is the set of all nonnegative numbers, and the range for $g \circ f$ is
also the set of all nonnegative numbers. $(g \circ f)(x) = x^2$ and $(f \circ g)(x) = x^2$. In this example, $f \circ g$ and $g \circ f$
are the same.
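Here $x_n = \frac{1}{n^2}$. Let $\varepsilon$ be a very small number and let $x_n < \varepsilon$. From that follows
$\frac{1}{n^2} < \varepsilon \Rightarrow n^2 \cdot \varepsilon > 1 \Rightarrow n^2 > \frac{1}{\varepsilon} \Rightarrow n > \frac{1}{\sqrt{\varepsilon}}$.
That is, for every $\varepsilon > 0$, there is $N = \frac{1}{\sqrt{\varepsilon}}$, such that for every $n > N$, $x_n < \varepsilon$.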
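$f'(x) = \lim_{\Delta x \to 0} \frac{\frac{1}{x + \Delta x} - \frac{1}{x}}{\Delta x}$
$= \lim_{\Delta x \to 0} \frac{\frac{x - (x + \Delta x)}{x(x + \Delta x)}}{\Delta x}$
$= \lim_{\Delta x \to 0} -\frac{\Delta x}{x(x + \Delta x)\Delta x}$
$= \lim_{\Delta x \to 0} -\frac{1}{x(x + \Delta x)}$
$= -\frac{1}{x^2}$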
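The limit in this derivation can be watched numerically: as Δx shrinks, the difference quotient of 1/x approaches minus one over x squared. This is a small sketch; the sample point `x0` and the step sizes are assumptions.

```python
def f(x):
    return 1 / x

def difference_quotient(x, dx):
    """(f(x + dx) - f(x)) / dx, the expression under the limit."""
    return (f(x + dx) - f(x)) / dx

x0 = 2.0   # assumed sample point; the derivative there is -1/x0^2 = -0.25
quotients = [difference_quotient(x0, dx) for dx in (0.1, 0.01, 0.001, 0.0001)]
for q in quotients:
    print(q)
```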
What is the correct approach for proving that the function given in Problem 8 is a continuous function?
Plugging in the value x=0 in the function f (x) and verifying that the value of the function is 0.
This is false.
This is true.
The correct answer is "This is false."
After determining the correct answer, perform all the necessary steps to prove that the function f(x) is continuous.
Application Problems
This section contains problems that illustrate only a small part of the numerous areas in which the material taught in the
module can be applied. Even though working through the problems is optional, we recommend doing so as it can be
beneficial to get a solid sense about the application potential of these concepts in the industry.
In modern networks, we are using layered models for communication, which break down these concerns into separate layers. For example, the TCP/IP model is a layered
model for communication that is used worldwide and is the basis for the global internet. The application layer of this model generates useful messages that enable end-to-end
communication. Let’s denote the size of a useful message with the variable x.
The layers below the application layer serve all other functions explained above that help deliver messages end-to-end. Each of them defines a specific protocol that adds
overhead to the useful data. This overhead cannot be avoided, as none of the above functions would be possible without it. Let’s denote the total amount of overhead
introduced by the different protocols of the TCP/IP model with the parameter a.
The transmission efficiency is a measure of how efficiently we are using the resources available on the communication network. The transmission efficiency can be regarded
as a function $f(x)$, which is defined as follows:
$f(x) = \frac{x}{x + a}$
Having this in mind and using the concept of limits, answer the following questions:
Solution:
a. The variable $x$ and the parameter $a$ are positive real numbers. So, for any such number $x$, the numerator of $f(x)$ is positive and smaller than the denominator,
because $x < x + a$. For this reason, $0 < f(x) < 1$.
b. If you look at the definition above, mathematically speaking, there are basically two ways to increase the transmission efficiency. One is to increase $x$, while the other is
to decrease $a$. When $x$ is increased, the total amount of useful data increases relative to the overhead, making the transmission more efficient. Similarly, when $a$ is
reduced, the useful data $x$ makes up a larger share of the total $x + a$, which again results in more efficient communication.
c. $\lim_{x \to \infty} f(x) = \lim_{x \to \infty} \frac{x}{x + a} = \lim_{x \to \infty} \frac{x}{x} = 1$
We notice that the only way to approach the "ideal" case of using 100% of the resources of the network while having non-zero protocol overhead is to increase the amount
of useful data $x$ indefinitely.
d. No matter how attractive the idea about increasing x indefinitely looks, the reality is that this can never be achieved. The main reason why we cannot do this is related
to the imperfections of the real world. The data that we send through the network can get corrupted for a wide variety of reasons, ranging from noise and interference to
software bugs or even faulty equipment. Any communication channel features some probability of error that is a function of these factors. In other words, whatever data
you send may be corrupted with errors, and you will have to resend the corrupted packets until they are received correctly.
Now imagine that you send a really big packet through the network. Also imagine that it is received on the receiver side with errors (receivers have error-detection
mechanisms to discover that the data is corrupted). This means that the sender will have to resend that big packet. This takes time, as the packet is big. Now imagine
another error happens, and another, and so on. You are basically sending nothing because the packet is so big that it very often gets affected by the buggy channel,
and it never gets through! So, even though the packet is big (we try to simulate the “ideal” case of infinite x), we are actually limited by the probability of error and thus
unable to achieve the “ideal” case of 100% utilization. On the contrary, with such big packets, we can have a hard time getting any practically useful efficiency at all.
The interplay between choosing a good size for the packets—i.e. good x—and dealing with the probability of error is a subtle topic and not an easy one. In any case,
we will not be going into the details of that topic. It is sufficient to know why increasing x indefinitely is problematic.
e. As mentioned in (b), there are two ways to increase the transmission efficiency. One of them is viable, while the other is more of a hypothetical assumption. The first
approach is to increase the amount of useful data x that we send, just as we commented above. This is viable, but is limited because of the probability of error.
The second approach is to decrease the protocol overhead a. However, this is not viable because all the protocols operating at different layers in the TCP/IP stack are
well defined, meaning that their headers and definitions cannot be changed arbitrarily, just like that. If this is hypothetically done by a vendor, it will break the
interoperability with all other devices on the Internet that speak the TCP/IP language! As a result, the equipment manufactured by the vendor will not be able to connect
to other devices on the Internet.
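The behavior of the transmission efficiency for growing message sizes can be tabulated with a few lines of code. This is only a sketch: the overhead value `OVERHEAD` is a hypothetical number chosen for illustration, not a figure taken from any protocol specification, and the model ignores the transmission errors discussed in (d).

```python
def efficiency(x, a):
    """Transmission efficiency f(x) = x / (x + a)."""
    return x / (x + a)

OVERHEAD = 40  # hypothetical overhead a, in the same units as x

for x in (10, 100, 1_000, 10_000, 1_000_000):
    print(f"x = {x:>9}: f(x) = {efficiency(x, OVERHEAD):.6f}")
```

The values stay strictly between 0 and 1 and move toward 1 as x grows, in line with parts (a) and (c).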
Irrational numbers are one such example because they cannot be expressed as a fraction of integers and have an infinite number of decimal places. Numerically, irrational
numbers are normally approximated with a sequence. One of the most common irrational numbers is π. There are many sequences discovered that have π as a limit and can
be used for calculating this number with an arbitrary precision, one of which is the following:
$\pi = \sqrt{12} \sum_{k=0}^{\infty} \frac{(-3)^{-k}}{2k + 1}$
If you start adding up the first few terms, you will begin to get an approximation for π. Calculate the first 10 terms of the sequence and observe how they are getting closer to
the value of π (3.14159265359).
Solution:
Term Approximation of π
1 3.46410161513
2 3.07920143567
3 3.15618147156
4 3.13785289159
5 3.14260474566
6 3.14130878546
7 3.14167312698
8 3.14156871594
9 3.14159977381
10 3.14159051093
Now you can see how we get closer and closer to the value of π as we add up more terms!
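The approximations in the table can be reproduced by accumulating the terms of the series one at a time. The following is a minimal sketch; the function name `pi_partial_sums` is an illustrative assumption.

```python
import math

def pi_partial_sums(count):
    """Partial sums of sqrt(12) * sum((-3)**(-k) / (2k + 1)) for k = 0..count-1."""
    sums, running = [], 0.0
    for k in range(count):
        running += (-3.0) ** (-k) / (2 * k + 1)
        sums.append(math.sqrt(12) * running)
    return sums

for terms, approx in enumerate(pi_partial_sums(10), start=1):
    print(f"{terms:2d}  {approx:.11f}  error = {abs(math.pi - approx):.2e}")
```

The error column shows the approximations alternating around π while the distance shrinks with each added term.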
One such approximation is the linear approximation, which involves the first derivative of a function. The basic idea is that the tangent line is a good approximation of a
function near the point of interest. The linear approximation formula is defined as follows:
$f(x + \Delta x) \approx f(x) + f'(x)\Delta x$
Let’s illustrate this with the function $f(x) = \frac{1}{x}$ by answering the following questions:
Solution:
a. Plugging $x = 4$ into the function gives us $f(4) = \frac{1}{4} = 0.25$.
b. The first derivative is $f'(x) = -\frac{1}{x^2}$. So, the value of the first derivative at $x = 4$ is $f'(4) = -\frac{1}{4^2} = -0.0625$.
c. Plugging $x = 4.1$ into the function gives us $f(4.1) = \frac{1}{4.1} = 0.24390$.
d. $f(4.1) \approx f(4) + f'(4) \times 0.1 = 0.25 - 0.0625 \times 0.1 = 0.24375$
As can be seen, the approximate value of 0.24375 is very close to the real value of 0.24390. Now you understand how powerful this technique can be!
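The same calculation can be expressed as a short sketch that compares the linear approximation with the exact value. The function names are illustrative assumptions; the numbers are the ones used in the solution.

```python
def f(x):
    return 1 / x

def f_prime(x):
    return -1 / x ** 2

def linear_approximation(x, dx):
    """f(x + dx) is approximated by f(x) + f'(x) * dx."""
    return f(x) + f_prime(x) * dx

approx = linear_approximation(4, 0.1)
exact = f(4.1)
print(approx, exact, abs(exact - approx))
```

Trying a larger step, such as 1.0 instead of 0.1, shows how the approximation degrades farther away from the point of interest.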
Remark: The linear approximation is good for one set of applications, but it may not be sufficient for another set of applications. This can be improved by approximating a
function using a quadratic polynomial or a cubic polynomial. It is even possible to approximate a function with a polynomial of an arbitrary degree. These polynomials are
known as Taylor polynomials. Their limit is an infinite series called the Taylor series, and they provide a very good approximation to many functions.
$C = B \log_2\left(1 + \frac{S}{N}\right)$
This formula is important because it defines the upper limit of reliable data communications on a point-to-point channel, i.e., a channel having one sender
and one receiver, in the presence of additive white Gaussian noise.
While this wording might sound a bit too authoritative, it is actually very easy to understand. What all of this means is that if an information source (e.g., a laptop via its network
interface card) sends data at a data rate $R < C$, then reliable communication is possible. In other words, there exist error-correcting codes that can handle the
errors that occur during data transmission. Remember that the channel is noisy because of the presence of additive noise that degrades the quality of the
useful signal.
Conversely, if R > C —i.e., if the information source intends to send data at a data rate R, which is higher than the channel capacity C —then such error-correcting codes do
not exist. Thus, reliable communication is not achievable. This summarizes very briefly the main substance of Claude Shannon’s groundbreaking 1948 discovery, which
established the mathematical theory of communications that today encompasses all fields of information technology.
a. What is the limit of C when N goes to infinity while the remaining parameters in the formula remain unchanged?
b. The signal power and noise power are always finite, no matter how big they are. Under what condition would the channel capacity be negligibly small, assuming that the
channel bandwidth B is kept unchanged? Interpret the result you get from the perspective of data transmission and reliable communications.
c. One way to increase the channel capacity C is to increase the signal power relative to the noise power. Could you increase the signal power indefinitely? Explain your
reasoning.
Solution:
a. When $N \to \infty$ while keeping everything else unchanged, then $\frac{S}{N} \to 0$ because we divide a finite number $S$ by a number $N$ that grows arbitrarily large. For this
reason, we have:
$\lim_{N \to \infty} C = \lim_{N \to \infty} B \log_2\left(1 + \frac{S}{N}\right) = B \log_2\left(1 + \lim_{N \to \infty} \frac{S}{N}\right) = B \log_2 1 = 0$
This result comes from the fact that the logarithm of 1 is 0 irrespective of the base of the logarithm in question.
b. If B does not change, then C would be very small if the signal-to-noise ratio is small. This can happen only if N ≫ S , i.e., if the noise power N is considerably higher
than the signal power S .
What are the consequences of this observation from the perspective of data transmission and reliable communications? Well, if the noise power is so big that C
becomes very small, what this means is that the noise is so dominant compared to the useful signal that it hinders any normal communication between the sender and
receiver.
The reason why this happens is because there are so many errors occurring on the channel. These errors make it very difficult for the receiver to recover. As a result,
the upper limit of reliable communication (i.e., the channel capacity C ) is very small.
c. One way to minimize the negative effects caused by the noise is to increase the signal power. This can be done if the sender uses more power when sending its useful
information. For example, the antenna of a mobile phone will use more power for sending signals to communicate with the base station. As can be seen, this works well
because $C$ increases as $\frac{S}{N}$ increases.
However, S cannot increase indefinitely. While increasing S improves our ability to send more information, it comes at a price—we need to use more energy. If we
return to the mobile phone analogy, using more power to send signals through the antenna means using more battery power. If we keep on doing this, the battery
will be exhausted very quickly! Therefore, it is important to balance the need to send as much data reliably as possible with the wise use of available power.
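The effect of the noise power on the channel capacity can be explored with a short sketch of the formula. The bandwidth and signal power used below, and their units, are hypothetical values chosen only to show the trend.

```python
import math

def channel_capacity(bandwidth, signal_power, noise_power):
    """C = B * log2(1 + S/N)."""
    return bandwidth * math.log2(1 + signal_power / noise_power)

B, S = 1_000_000.0, 1.0   # hypothetical bandwidth (Hz) and signal power (W)
for N in (0.001, 1.0, 1_000.0, 1_000_000.0):
    print(f"N = {N:>11}: C = {channel_capacity(B, S, N):.2f} bit/s")
```

With B and S fixed, the capacity falls toward zero as N grows, matching the limit computed in (a) and the observation in (b).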
One of the most common operations performed on data in symmetric encryption is substitution. Assume a simplistic case in which every letter of the English alphabet from a
plaintext is mapped to its subsequent letter. For example, A is mapped to B, B is mapped to C , and so on, and Z is mapped to A. If this encoding is applied to the following
plaintext:
Solutions:
a. This mapping is a function because for every letter of the plaintext there is only one corresponding image. In other words, the encoding maps each letter to only one
letter.
b. By definition, a function f is a mapping from a set A to a set B, i.e., f : A → B . If we use the letter f to denote the encoding function, then the only thing we need to
determine are the sets A and B. But doing this is straightforward given that the plaintext consists of letters of the English alphabet without any restrictions. Therefore,
the sets A and B are the same—they are identical to the English alphabet.
c. By definition, a function f : A → B is called one-to-one if different elements in A are mapped to different elements in B. How can this be proven? If we take two
elements x1 and x2 from A for which f (x1 ) = f (x2 ) , then this holds true if and only if the two elements are the same, i.e., x1 = x2 .
From the way the encoding is constructed in this example, it is obvious that two images are the same f (x1 ) = f (x2 ) only if the preimages are the same x1 = x2 . For
example, f (x1 ) = f (x2 ) = b only if x1 = x2 = a . Alternatively, if the images are not the same, i.e., f (x1 ) ≠ f (x2 ) , this can be true only if x1 ≠ x2 . This explains
why the function is one-to-one.
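The encoding can be built as an explicit mapping, which makes the one-to-one property easy to test: the 26 letters must have 26 distinct images. In the sketch below, the sample plaintext is hypothetical (it is not the plaintext from the problem), and passing non-letters through unchanged is an assumption.

```python
import string

ALPHABET = string.ascii_uppercase

# A -> B, B -> C, ..., Z -> A
ENCODING = {letter: ALPHABET[(i + 1) % 26] for i, letter in enumerate(ALPHABET)}

def encode(plaintext):
    """Apply the substitution letter by letter; non-letters pass through (an assumption)."""
    return "".join(ENCODING.get(ch, ch) for ch in plaintext.upper())

def is_one_to_one(mapping):
    """Distinct preimages must have distinct images."""
    return len(set(mapping.values())) == len(mapping)

print(encode("HELLO"))          # hypothetical plaintext, not the one from the problem
print(is_one_to_one(ENCODING))
```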
Solutions:
a. Let’s first determine the number of binary sequences of length n = 1 . Obviously, there are only two such sequences—the binary digits 0 and 1 themselves. Similarly, it
can be easily seen that there are four binary sequences of length n = 2 , and they are 00, 01, 10, 11. The number of binary sequences of length n can be determined
after observing that there are two options for the binary digit in the first bit place, then two options for the second binary digit, and so on. Given that a binary sequence of
length $n$ consists of $n$ binary digits, and also by applying the product rule, we have $2 \times 2 \times \ldots \times 2 = 2^n$. Therefore, there are $2^n$ binary sequences of length $n$.
b. Selecting one of the two possible bit digits can be done with probability $\frac{1}{2}$. This gives the probability for the case $n = 1$. Similarly, selecting one option out of four binary
sequences for $n = 2$ can be done with probability $\frac{1}{4}$. This can be extended to any general value of $n$. Based on the answer in (a), the probability of selecting a binary
sequence out of all binary sequences of length $n$ is equal to $\frac{1}{2^n}$.
c. The sequence of all probabilities is defined by assigning the number $\frac{1}{2^n}$ to each $n$. In other words, we define the following sequence:
$x_n = \frac{1}{2^n}$
d. We notice that the denominator of $\frac{1}{2^n}$ is growing while the numerator stays the same. Therefore, we have:
$\lim_{n \to \infty} x_n = \lim_{n \to \infty} \frac{1}{2^n} = 0$
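The counting argument and the resulting probabilities can be checked by enumerating the sequences for small n. This is a sketch using the standard library; the function name `binary_sequences` is an illustrative assumption, and the probability printed assumes every sequence is equally likely.

```python
from itertools import product

def binary_sequences(n):
    """All binary sequences of length n, as strings."""
    return ["".join(bits) for bits in product("01", repeat=n)]

for n in range(1, 6):
    count = len(binary_sequences(n))
    print(n, count, 1 / count)   # count equals 2**n, probability equals 1/2**n
```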