Transition
Advanced Mathematics
                            Copyright © 2019
Version: 1.02
Date: June 10, 2019
Contents
Preface
I    Set Theory
     1 Sets, subsets, and set operations
        1.A What is a set?
        1.B Naming sets
        1.C Subsets
        1.D Cardinality
        1.E Power sets
        1.F Unions and intersections
        1.G Complements and differences
        1.H Exercises
     2 Products of sets and indexed sets
        2.A Cartesian products
        2.B Indices
        2.C Exercises
II   Logic
     3 Statements
        3.A What is a statement?
        3.B Compound statements
        3.C Logical equivalences
        3.D Tautologies and contradictions
        3.E Exercises
     4 Open sentences
        4.A Open sentences
        4.B Quantifiers
        4.C Implication and open sentences
        4.D The meaning of implication
        4.E Translating between English and symbolic logic
        4.F Exercises
     5 Multiple quantifiers and negating sentences
        5.A Statements with multiple quantifiers
        5.B Negating statements
        5.C Greatest and least elements
IV   Proof by Induction
     13 Mathematical induction
        13.A The principle of mathematical induction
        13.B Exercises
     14 More examples of induction
        14.A Starting induction somewhere else
        14.B Many base cases
        14.C Proof of generalized induction
        14.D Exercises
     15 Strong induction
        15.A The definition of strong induction
        15.B Strong induction by example
        15.C More examples of strong induction
        15.D Formalizing strong induction
        15.E Where to start?
        15.F Exercises
     16 The Binomial Theorem
        16.A Binomial coefficients and Pascal’s triangle
        16.B Proof of the Binomial Theorem
        16.C Exercises
VI   Relations
     20 Properties of relations
        20.A What is a relation?
        20.B Properties of relations on a set A
        20.C Exercises
This book is intended as the text for the Math 290 (Fundamentals of Mathematics)
class at Brigham Young University. It covers several fundamental topics in advanced
mathematics, including set theory, logic, proof techniques, number theory, relations,
functions, and cardinality. These topics are prerequisites for most advanced
mathematics classes, and it seems worthwhile to have a specific course in which they can
be learned by students.
     The prerequisites for understanding this material are surprisingly light. Typically,
only a small amount of college algebra (manipulating simple algebraic expressions)
and knowledge of decimal expansions and prime numbers are needed; most other
necessary material is covered in the text. The book is designed for a semester-long
class; each section contains an appropriate amount of material for an hour-long lecture.
The exercise sets at the end of each section give problems for the students to use to
practice the techniques learned in the section, and to develop their understanding of
the material. At BYU there are typically 42 class days in a semester; we have
included 36 sections in this book. This allows a few days for instructors to review for
exams, or cover additional topics of their choice.
     We are often asked if we will produce a solutions manual for the exercises. For
this particular course, a solutions manual is probably not a great benefit to the
student. Unlike most mathematics courses that students will have before studying
this material, the exercises in this book often do not have a single correct answer.
Indeed, as the student progresses further into the book, most of the problems ask
for proofs (or disproofs) of statements. Much of the learning in a course such as this
comes from the struggle to produce a proof, rather than studying the techniques used
by someone else to give a proof. Hence, providing a solutions manual would negate
a necessary aspect of the course. In addition, a solutions manual would be of very
little help in verifying the correctness of a proof, since there are many different ways
to prove almost any given statement, all equally correct.
     One aspect of a proof is that it should be a convincing argument that a statement
is correct. A student should consider their solution of a proof-type problem to be
aimed at an audience of students at their level; if they are unsure if it is a valid proof,
then their goal has not been met.
     In addition, it is important to note that most of the solutions to exercises in the
book will involve much more writing than is usual in previous mathematics classes.
Students will need to adapt their thinking so that they are prepared for this.
     One additional topic that instructors may want to include in a course based on this
book is writing mathematics using LaTeX. This is an important skill for mathematicians,
engineers, scientists, and mathematics educators. Because technology moves
quickly, we have not included instructional material on LaTeX in this text; an internet
search can easily find a plethora of such material. In our courses, we typically spend
one to two class days instructing students on the use of LaTeX, as well as giving a
number of assignments to help students develop their skills in LaTeX.
    We thank the many BYU students and instructors who have worked through
preliminary versions of this textbook. They have discovered many typographical and
other errors, which have been eliminated. If you discover any additional
errors in the text, please inform us so that we can correct them in future printings.
       Darrin Doud
       Pace P. Nielsen
Chapter I
Set Theory
There is surely a piece of divinity in us, something that was before the elements, and
owes no homage unto the sun.
    Sir Thomas Browne
    One of the benefits of mathematics comes from its ability to express a lot of
information in very few symbols. Take a moment to consider the expression
\[ \frac{d}{d\theta} \sin(\theta). \]
It encapsulates a large amount of information. The notation sin(θ) represents, for
a right triangle with angle θ, the ratio of the opposite side to the hypotenuse. The
differential operator d/dθ represents a limit, corresponding to a tangent line, and so
forth.
    Similarly, sets are a convenient way to express a large amount of information.
They give us a convenient language in which to do mathematics. This is
no accident, as much of modern mathematics can be expressed in terms of sets.
2                                                          CHAPTER I. SET THEORY
Example 1.3. Let S = {1, 5, {4, 6}, 3}. This set has four elements. We have
1, 5, 3, {4, 6} ∈ S, but 4 ∉ S. However, 4 ∈ {4, 6} and {4, 6} ∈ S.      △
    It can be confusing when sets are elements of other sets. You might ask why
mathematicians would allow such confusion! It turns out that this is a very useful
thing to allow; just like when moving, the moving truck (a big box) has boxes inside
of it, each containing other things.
  Advice 1.4. You can think of sets as boxes with objects inside. So we could
  view the set {1, 5, {4, 6}, 3} from the previous example as the following box,
  which contains another box:
[Figure: an outer box containing 1, 5, 3, and an inner box; the inner box contains 4 and 6.]
Figure 1.4: A box with a box inside, each containing some numbers.
   • The set of natural numbers is
     N = {1, 2, 3, . . .}.
     This is the first example we’ve given of an infinite set, i.e., a set with infinitely
     many elements.
   • The set of integers is
     Z = {. . . , −2, −1, 0, 1, 2, . . .}.
     The dots represent the fact that we are leaving elements unwritten in both
     directions. We use the fancy letter “Z” because the German word for “numbers”
     is “Zahlen.”
   Some sets are constructed using rules. For example, the set of even integers can
be written as
                              {. . . , −4, −2, 0, 2, 4, . . .}
but could also be written in the following ways:
(1.5)                               {2x : x ∈ Z}
(1.6)                       {x ∈ Z : x is an even integer}
(1.7)                       {x : x = 2y for some y ∈ Z}.
We read the colon as “such that,” so (1.5) is read as “the set of elements of the form
2x such that x is an integer.” Writing sets with a colon is called set-builder notation.
Notice that
                                  {x ∈ Z : 2x + 1}
doesn’t make any sense, since “2x + 1” is not a condition on x.
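Set-builder notation has a close analogue in programming. The following Python sketch mirrors (1.5), (1.6), and (1.7) using set comprehensions. Since a program cannot hold all of Z, it assumes a finite window of integers; the name `window` and its bounds are illustrative choices, not part of the text.

```python
# A finite stand-in for Z; the bounds are an arbitrary assumption for illustration.
window = range(-10, 11)

# (1.5): {2x : x in Z}, keeping only the results that land inside the window
evens_a = {2 * x for x in window if 2 * x in window}

# (1.6): {x in Z : x is an even integer}
evens_b = {x for x in window if x % 2 == 0}

# (1.7): {x : x = 2y for some y in Z}
evens_c = {x for x in window if any(x == 2 * y for y in window)}

# All three descriptions pick out the same elements of the window.
print(evens_a == evens_b == evens_c)
```

In each comprehension the part after `if` plays the role of the condition after the colon, which is why an expression such as "2x + 1" cannot stand there on its own.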
   Here are a few more examples. The set of prime numbers is
     {2, 3, 5, 7, 11, 13, . . .}.
    • The set of rational numbers is
Q = {a/b : a, b ∈ Z, b ≠ 0}.
      Note that there is no problem with the fact that different fractions can represent
      the same rational number, such as 1/2 = 2/4. Repetitions do not matter in
      sets. We will occasionally need the fact that we can always write a rational
      number as a fraction a/b in lowest terms: i.e., so that a and b have no common
      factor larger than 1. We will prove this in Section 18 (see Exercise 18.3).
    • The set of real numbers is denoted R.
    • The set of complex numbers is
C = {a + bi : a, b ∈ R, i^2 = −1}.
Example 1.8. Which of the named sets does π = 3.14159 . . . belong to? We have
π ∈ R and π ∈ C. On the other hand, since 3 < π < 4 we have π ∉ N and π ∉ Z. It
is true, but much harder to show, that π ∉ Q.                                △
    Warning 1.9. The empty set is not nothing. It has no elements, but the empty
    set is something. Namely, it is “the set with nothing in it”.
        Thinking in terms of boxes, we can think of the empty set as an empty box.
    The box is something even if it has nothing in it.
        The symbol ∅ does not mean nothing. It means { }.
1. SETS, SUBSETS, AND SET OPERATIONS                                               5
Example 1.10. Sometimes we want the empty set to be an element of a set. For
instance, we might take
                               S = {∅}.
The set S has a single element, namely ∅. We could also write S = {{ }}. In terms of
boxes, S is the box containing an empty box. Note that not all sets have the empty
set as an element.                                                                △
1.C    Subsets
In many activities in life we don’t focus on all the elements of a set, but rather on
subcollections. To give just a few examples:
   • The set of all phone numbers is too large for most of us to handle. The
     subcollection of phone numbers of our personal contacts is much more manageable.
   • If we formed the set of all books ever published in the world, this set would be
     very large (but still finite!). However, the subcollection of books we have read
     is much smaller.
   • If we want to count how many socks we own, we could use elements of the
     integers Z, but since we cannot own a negative number of socks, a more natural
     set to use would be the subcollection of nonnegative integers {0, 1, 2, 3, . . .}.
    Warning 1.15. Some authors use ⊂ instead of ⊆. Other authors use ⊂ instead
    of ⊊. Thus, there can be a lot of confusion about what ⊂ means, which is one
    reason why we will avoid that notation in this book!
    Warning 1.16. Many students learning about subsets get confused about the
    difference between being an element and being a subset. Consider your music
    library as a set. The elements are the individual songs. Playlists, which are
    collections of some of the songs, are subsets of your library.
Example 1.17. The elements of C are complex numbers like 3+6i or −2.7−5.9i. The
subsets of C are sets of complex numbers like {5.4 − 7.3i, 9 + 0i, −2.671 + 9.359i}. △
Example 1.18. (1) Let T = {1, 2, 3, 4, 5}. Is 2 an element or a subset of T ? It
     is an element, since it lives inside T . It is not a subset, since it isn’t a set of
     elements of T .
 (2) Let U = {−5, 6, 7, 3}. Is {6} an element or a subset of U ? It is not an element
     of U , since the set {6} isn’t in its list of elements. It is a subset because it is a
     box whose elements come from U .
 (3) Let X = {{6}, {7, 8}, {5, 8}}. Is 7 an element or a subset of X? Neither! It is
     not one of the three elements listed in X, and it is not a box of elements in X
     either.
     Is {7, 8} an element or a subset of X? It is an element, since it is one of the
     three listed elements. It is not a subset, even though it is a box, since it has
     elements which don’t belong to X.
 (4) Let Y = {5, {5}}. Is {5} an element or a subset of Y ? It is both! It is an
     element, since it is the second element listed inside Y . It is also a subset of Y ,
     since it is a box containing the first element of Y.                            △
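The distinction between "element of" and "subset of" in Example 1.18 can be checked mechanically. The sketch below uses Python sets; because Python's ordinary sets cannot contain other mutable sets, the inner sets are written as `frozenset`, which is an implementation detail rather than a mathematical one.

```python
# Inner sets are written as frozenset so that they can be elements of a set.
T = {1, 2, 3, 4, 5}
U = {-5, 6, 7, 3}
X = {frozenset({6}), frozenset({7, 8}), frozenset({5, 8})}
Y = {5, frozenset({5})}

# `in` tests membership (∈); `<=` tests the subset relation (⊆).
two_in_T = 2 in T                                           # (1) 2 is an element of T
six_elem, six_sub = frozenset({6}) in U, {6} <= U           # (2) {6}: subset only
pair_elem, pair_sub = frozenset({7, 8}) in X, {7, 8} <= X   # (3) {7, 8}: element only
five_elem, five_sub = frozenset({5}) in Y, {5} <= Y         # (4) {5}: both

print(two_in_T)
print(six_elem, six_sub)
print(pair_elem, pair_sub)
print(five_elem, five_sub)
```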
   It can be useful to construct sets satisfying certain properties in relation to one
another. In the following example we show how this can be done.
Example 1.19. We will find three sets A, B, C satisfying the following conditions:
 (1) A ⊆ B,
 (2) A ∈ C, and
 (3) C ⊆ B with C ≠ B (i.e., C ⊊ B).
   One method to solve this problem is to start with the simplest sets possible and
modify them as needed. So let’s start with
                         A = { },       B = { },      C = { }.
We see that condition (1) is fulfilled, but condition (2) is not. To force condition (2)
to be true, we must make A an element of C. Thus, our new sets are
                         A = { },      B = { },       C = {A}.
Condition (1) still holds, and condition (2) is now true. However, condition (3) doesn’t
hold. To make (3) true, we need B to have all the elements of C and at least one
more. So we take
                        A = { },      B = {A, 1},       C = {A}.
We double-check that all of the conditions hold (which they do), and so we have our
final answer.                                                                    △
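The double-check at the end of Example 1.19 can also be carried out by a short program. This is only a sketch of that verification, again using `frozenset` so that one set can be an element of another; the variable names `cond1`, `cond2`, `cond3` are illustrative.

```python
A = frozenset()          # A = { }
C = frozenset({A})       # C = {A}
B = frozenset({A, 1})    # B = {A, 1}

cond1 = A <= B           # (1) A ⊆ B
cond2 = A in C           # (2) A ∈ C
cond3 = C < B            # (3) C ⊊ B, i.e. C ⊆ B and C ≠ B

print(cond1, cond2, cond3)
```

Here `<` on sets means "proper subset," matching the symbol ⊊ used in this book.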
1.D     Cardinality
The number of elements of a set is called its cardinality. For instance, the set S =
{1, 2, 3} has 3 elements. We write |S| = 3 to denote that S has cardinality 3. Note
that |∅| = 0 but |{∅}| = 1. A set is finite if its cardinality is either 0 or a natural
number, and it is infinite otherwise. In a later section in the book, we will talk about
a better way to define cardinality for infinite sets.
Example 1.20. If T = {5, {6, 7, 8}, {3}, 0, ∅}, the cardinality is |T| = 5.           △
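For finite sets, cardinality corresponds to the length of a set in most programming languages. As a small illustration (with `frozenset` standing in for sets that are themselves elements), the cardinalities mentioned above can be computed as follows.

```python
empty = frozenset()                                        # ∅
T = {5, frozenset({6, 7, 8}), frozenset({3}), 0, empty}    # the set T of Example 1.20

card_T = len(T)                    # |T|
card_empty = len(empty)            # |∅|
card_box_of_empty = len({empty})   # |{∅}|

print(card_T, card_empty, card_box_of_empty)
```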
    In mathematics, we sometimes use the same symbols for two different things. The
meaning of the symbols must be deduced from their context. For instance, if we write
|−3.392| this is certainly not the cardinality of a set, but instead is probably referring
to the absolute value of a number. In the next example, we use | · | in two different
ways.
    Definition 1.22. Let S be a set. The power set of S is the new set P(S) whose
    elements are the subsets of S. In other words, A ∈ P(S) exactly when A ⊆ S.
P(S) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}.
Why is the empty set one of the subsets? Is it really a box containing only elements
of S? Yes, its elements (there are none!) all belong to S. Thinking about it in terms
of “throwing away” elements of S, we threw all of them away.
    Why is S ⊆ S? Because S is a box containing only elements of S. Thinking in
terms of “throwing away” elements, we threw away none of the elements.             △
    If S is a finite set, we can determine the size of the power set |P(S)| from |S|:
if |S| = n, then |P(S)| = 2^n. Here is a sketch of why this is true. To form a subset
of S, for each element in S we choose to keep or throw away that element. Thus, there
are 2 choices for each element. Since there are n elements, this gives 2^n options.
Example 1.25. For the set S = {1, 2, 3} we have |S| = 3. Thus the power set has
cardinality |P(S)| = 2^3 = 8. This is exactly the number of elements we listed in
Example 1.23.                                                                 △
Example 1.26. How many elements will the power set of U = {1, ∅} have? The set
U has two elements, so there should be 2^2 = 4 subsets. They can be listed as:
∅, {1}, {∅}, {1, ∅}.
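The counting argument for power sets can be tested on small examples. The following sketch builds the power set of a finite set by choosing, for each possible size, which elements to keep. The helper name `power_set` is our own, and subsets are returned as `frozenset` so that they can be collected into a set.

```python
from itertools import combinations

def power_set(s):
    """Return the set of all subsets of s, each as a frozenset."""
    elements = list(s)
    return {
        frozenset(combo)
        for size in range(len(elements) + 1)
        for combo in combinations(elements, size)
    }

S = {1, 2, 3}
P = power_set(S)
print(len(P) == 2 ** len(S))                  # checks |P(S)| = 2^|S| for this S
print(frozenset() in P, frozenset(S) in P)    # ∅ and S itself are both subsets of S

U = {1, frozenset()}                          # U = {1, ∅}
print(len(power_set(U)))
```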
S ∪ T = {x : x ∈ S or x ∈ T }.
This is the set of elements which belong to S or T or both of them. (When we use the
word “or” in this book, we will almost always use the inclusive meaning.) Pictorially,
we can view this set using a Venn diagram as follows.
[Venn diagram of S ∪ T: overlapping circles labeled S and T.]
   Similarly, given two sets S and T we can form the set of elements that belong to
both of them, called the intersection, and we write
S ∩ T = {x : x ∈ S and x ∈ T }.
[Venn diagram of S ∩ T: overlapping circles labeled S and T.]
Example 1.30. Let A = {1, 6, 17, 35} and B = {1, 5, 11, 17}. Then A ∪ B = {1, 5, 6, 11, 17, 35} and A ∩ B = {1, 17}.
                          T − S = {x : x ∈ T and x ∉ S}.
[Venn diagram of T − S: overlapping circles labeled S and T.]
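Union, intersection, and difference are built into Python's set type, which makes it easy to experiment with them. The sketch below applies the three operations to the sets A and B of Example 1.30; the variable names are illustrative.

```python
A = {1, 6, 17, 35}
B = {1, 5, 11, 17}

union = A | B           # {x : x ∈ A or x ∈ B}
intersection = A & B    # {x : x ∈ A and x ∈ B}
difference = B - A      # {x : x ∈ B and x ∉ A}

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Note that `|` is inclusive, just like the word "or" in this book: elements belonging to both sets appear in the union.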
1.H     Exercises
Exercise 1.1. Each of the following sets is written in set-builder notation. Write the
set by listing its elements. Also state the cardinality of each set.
  (a) S_1 = {n ∈ N : 5 < |n| < 11}.
  (b) S_2 = {n ∈ Z : 5 < |n| < 11}.
  (c) S_3 = {x ∈ R : x^2 + 2 = 0}.
  (d) S_4 = {x ∈ C : x^2 + 2 = 0}.
  (e) S_5 = {t ∈ Z : t^5 < 1000}. (This one is slightly tricky.)
Exercise 1.4. Give specific examples of sets A, B, and C satisfying the following
conditions (in each part, separately):
  (a) A ∈ B, B ∈ C, and A ∉ C.
  (b) A ∈ B, B ⊆ C, and A ⊈ C.
  (c) A ⊊ B, B ∈ C, and A ∈ C.
  (d) A ∩ B ⊆ C, A ⊈ C, and B ⊈ C.
  (e) A ∩ C = ∅, A ⊆ B, |B ∩ C| = 3.
Exercise 1.5. Let A = {1, 2}. Find P(A), and then find P(P(A)). What are the
cardinalities of these three sets?
Exercise 1.6. Let a, b ∈ R with a < b. The closed interval [a, b] is the set {x ∈ R :
a ≤ x ≤ b}. Similarly, the open interval (a, b) is the set {x ∈ R : a < x < b}. Let
P = [3, 7], Q = [7, 9] and R = [−3, 8]. Give simple descriptions of the following sets.
  (a) P ∩ Q.
 (b) P ∪ Q.
  (c) P − Q.
 (d) Q − P .
  (e) (R ∩ P ) − Q.
  (f) (P ∪ Q) ∩ R.
  (g) P ∪ (Q ∩ R).
Exercise 1.7. Consider the following blank Venn diagram for the three sets A, B, C.
[Venn diagram: three overlapping circles labeled A, B, and C.]
For each of the following sets, copy the Venn diagram above, and then shade in the
named region:
 (a) A − (B ∩ C).
 (b) A − (B − C).
 (c) B − (A − C).
 (d) (B ∩ C) ∩ (B ∪ A).
 (e) (A − B) ∪ (A − C).
Exercise 1.8. Two sets S, T are disjoint if they share no elements. In other words
S ∩ T = ∅. Which of the following sets are disjoint? Give reasons.
 (a) The set of odd integers and the set of even integers.
 (b) The natural numbers and the complex numbers.
  (c) The prime numbers and the composite numbers.
 (d) The rational numbers and the irrational numbers (i.e., real numbers which are
      not rational).
Exercise 1.9. Find some universal set U and subsets S, T ⊆ U such that |S − T| = 3,
|T − S| = 1, |S ∪ T| = 6, and |S̄| = 2, where S̄ is the complement of S in U. (Write
each of U, S, and T by listing their elements.)
2. PRODUCTS OF SETS AND INDEXED SETS                                                        13
cos(x^2) ≠ (cos(x))^2.
There are other situations where we want to keep things ordered. We will write (x, y)
for the ordered pair where x occurs first and y occurs second. Thus (x, y) ≠ (y, x)
whenever x ≠ y, even though {x, y} = {y, x}. Also, an element can be repeated in an ordered list,
such as (1, 1), while sets do not count repetitions.
    There is a very nice notation for sets of ordered pairs.
    Definition 2.1. Let S and T be two sets. The Cartesian product of these sets
    is the new set
                          S × T = {(s, t) : s ∈ S, t ∈ T }.
    This is the set of all ordered pairs such that the first entry comes from S and
    the second entry comes from T . We will often refer to S × T just as the product
    of S and T .
Example 2.2. Let S = {1, 2, 3} and T = {1, 2}. What is S × T ? It is the set
{(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)}. Notice that 3 can occur as a first coordinate
since 3 ∈ S, but not as a second coordinate since 3 ∉ T.
    While the order matters inside an ordered pair, we could have listed the elements
of S × T in a different order since S × T is itself just a set (and order is irrelevant in
sets). So we could have written
S × T = {(1, 2), (2, 2), (3, 1), (1, 1), (2, 1), (3, 2)}.
However,
               S × T ≠ T × S = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)}.           △
   You might notice that in the previous example we have |S×T | = 6 = 3·2 = |S|·|T |.
This is not an accident. In fact, the following is true, although we do not as yet have
the tools to prove it.
    Proposition 2.3. Let A and B be finite sets, with |A| = m and |B| = n. Then
    A × B is a finite set, with |A × B| = mn.
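Although we cannot yet prove Proposition 2.3, we can at least test it on examples. The sketch below forms the Cartesian products from Example 2.2 using `itertools.product` from Python's standard library, with tuples playing the role of ordered pairs. It checks one instance only and is not a proof.

```python
from itertools import product

S = {1, 2, 3}
T = {1, 2}

S_times_T = set(product(S, T))   # {(s, t) : s ∈ S, t ∈ T}
T_times_S = set(product(T, S))   # {(t, s) : t ∈ T, s ∈ S}

size_matches = len(S_times_T) == len(S) * len(T)   # |S × T| = |S| · |T| in this instance
same_product = S_times_T == T_times_S              # compares S × T with T × S

print(size_matches, same_product)
print((3, 1) in S_times_T, (1, 3) in S_times_T)
```

The pair (1, 3) is missing from S × T for the reason given in Example 2.2: 3 cannot occur as a second coordinate.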
Example 2.4. Let A = N and B = {0, 1}. What are the elements of A × B? They
are
A × B = {(1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1), . . .} = {(n, 0), (n, 1) : n ∈ N}.
Is A × B the same set as B × A? No, they have different elements. For instance,
(1, 0) ∈ A × B, but (1, 0) ∉ B × A since 0 ∉ A.                              △
    In each of the previous examples, we took the Cartesian product of two different
sets. If we take the product of a set with itself, we sometimes write A^2 = A × A. The
following example is one of the most useful products of a set with itself.
Example 2.5. The set R^2 = R × R is called the Cartesian plane. We view elements
in this set as points {(x, y) : x, y ∈ R}.
[Figure: the sets {1, 2, 3} × {1, 2} and N × {0, 1}, each plotted as points in the Cartesian plane.]
2.B     Indices
When we have a large number of sets, rather than writing them using different letters
of the alphabet
                                A, B, C, D, . . . , Z
it can be easier to use subscripts
                                A_1, A_2, A_3, A_4, . . . , A_26.
This notation is extremely powerful for the following reasons:
     • The notation tells us how many sets we are working with, using a small number
       of symbols. For instance, if we write A_1, A_2, . . . , A_132, we know that there are
       exactly 132 sets. (Try writing them down using different letters of the alphabet!)
     • We can even talk about an infinite number of sets A_1, A_2, A_3, . . . . Notice that
       the subscripts all come from the set N. We refer to N as the index set for this
       collection.
     • Using indices we can form complicated unions, intersections, and Cartesian
       products.
Example 2.7. Let A_1 = {1, 2, 4}, A_2 = {−3, 1, 5, 9}, and A_3 = {1, 6, 10}. We find
\[ \bigcup_{i=1}^{3} A_i = A_1 \cup A_2 \cup A_3 = \{-3, 1, 2, 4, 5, 6, 9, 10\} \]
and
\[ \bigcap_{i=1}^{3} A_i = A_1 \cap A_2 \cap A_3 = \{1\}. \]
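Indexed unions and intersections translate naturally into code once the family of sets is stored by index. The sketch below assumes the family of Example 2.7 is kept in a dictionary mapping each index i to the set A_i; that storage choice, and the names `big_union` and `big_intersection`, are ours.

```python
# The indexed family from Example 2.7, stored as index -> set.
A = {
    1: {1, 2, 4},
    2: {-3, 1, 5, 9},
    3: {1, 6, 10},
}

big_union = set().union(*A.values())               # union of A_i over i = 1, 2, 3
big_intersection = set.intersection(*A.values())   # intersection of A_i over i = 1, 2, 3

print(sorted(big_union))
print(sorted(big_intersection))
```

The keys of the dictionary play the role of the index set, so the same two lines work unchanged for any finite index set.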
Those who have seen summation notation
\[ \sum_{i=1}^{3} i^2 = 1^2 + 2^2 + 3^2 \]
The intersection is the set of elements which belong to every one of the sets, thus
\[ \bigcap_{n=1}^{\infty} B_n = \{\,\} = \emptyset. \]                                 △
    There is an alternate way to write intersections and unions, using index sets. For
instance, using the notation in the previous two examples, we could also write
\[ \bigcap_{i=1}^{3} A_i = A_1 \cap A_2 \cap A_3 = \bigcap_{i \in \{1,2,3\}} A_i \]
and
\[ \bigcup_{n=1}^{\infty} B_n = \bigcup_{n \in \mathbb{N}} B_n. \]
There is nothing to limit our index set, so we can make the following broad definition.
  Definition 2.9. Let I be any set, and let S_i be a set for each i ∈ I. We put
  \[ \bigcup_{i \in I} S_i = \{x : x \text{ belongs to } S_i \text{ for some } i \in I\} \]
  and
  \[ \bigcap_{i \in I} S_i = \{x : x \text{ belongs to } S_i \text{ for each } i \in I\}. \]
    The next example shows, once again, how mathematics has the uncanny ability
to express information in varied subjects using very simple notation.
Example 2.10. Let A = {a,b,c,d, . . . , z} be the “lowercase English alphabet set.”
This set has twenty-six elements. Let V = {a,e,i,o,u} be the “standard vowel set.”
Notice that V ( A.
   Given α ∈ A, we let W_α be the set of words in the English language containing
the letter α. Note that α is a dummy variable, standing in for an actual element of
A. For instance, if α = x then we have
                         W_x = {xylophone, existence, axiom, . . .},
while if α = t then we have
                         W_t = {terminator, atom, attribute, . . .}.
Each set of words W_α is a subset of the universal set of all words in the English
language.
    Try to answer the following questions:
  (1) What is \( \bigcap_{\alpha \in V} W_\alpha \)?
  (2) Is that set empty?
  (3) What is \( \overline{\bigcup_{\alpha \in V} W_\alpha} \)?
  (4) Is that set empty?
    Here are the answers. (Look at them only after you have your own!)
  (1) This is the set of words that contain every standard vowel.
  (2) It isn’t empty, since it contains words like “sequoia,” “evacuation,” etc.
  (3) This is the set of words with no standard vowels. (Don’t forget that there is a
      bar over the union.)
  (4) It isn’t empty, since it contains words like “why,” “tsktsk,” etc.           △
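The sets in Example 2.10 can be explored on a small scale by computer. The sketch below replaces "all words in the English language" with a tiny sample vocabulary, which is purely an assumption for illustration; the names `words`, `W`, `every_vowel`, and `no_vowel` are ours as well.

```python
# A tiny sample vocabulary standing in for the universal set of English words
# (an assumption made for illustration; the real set is far larger).
words = {"sequoia", "evacuation", "why", "tsktsk", "xylophone", "axiom", "atom"}
V = set("aeiou")   # the standard vowel set

# W[alpha] is the set of sample words containing the letter alpha.
W = {alpha: {w for w in words if alpha in w} for alpha in V}

every_vowel = set.intersection(*W.values())   # intersection of W_alpha over alpha in V
some_vowel = set().union(*W.values())         # union of W_alpha over alpha in V
no_vowel = words - some_vowel                 # complement of the union, within the sample

print(sorted(every_vowel))
print(sorted(no_vowel))
```

The complement is taken relative to the sample vocabulary, just as the bar in question (3) is taken relative to the universal set of all English words.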
    We finish with one more difficult example.
Example 2.11. We determine ⋃_{x∈[1,2]} [x, 2] × [3, x + 3].
    First, to get a footing on this problem, we try to understand what happens for
certain values of x. The smallest possible x value in the union is when x = 1.
There we get that [x, 2] × [3, x + 3] = [1, 2] × [3, 4]. This is the set of ordered pairs
{(x, y) ∈ R^2 : 1 ≤ x ≤ 2, 3 ≤ y ≤ 4}. This is just a box in the plane. Its graph is
the first graph below.
[Figure: two graphs in the xy-plane. First: the box [1, 2] × [3, 4]. Second: the segment {2} × [3, 5].]
    The largest possible x value in the union is when x = 2. There we get that
[x, 2] × [3, x + 3] = [2, 2] × [3, 5]. Notice that [2, 2] = {2} is just a single point. Now
{2} × [3, 5] is a line segment in the plane, where the x-value is 2 and the y-values
range from 3 to 5. Its graph is the second graph above.
    If we consider the intermediate value x = 1.5, we get the box [1.5, 2] × [3, 4.5],
graphed below on the left.
[Figure: two graphs in the xy-plane. Left: the box [1.5, 2] × [3, 4.5]. Right: the union ⋃_{x∈[1,2]} ([x, 2] × [3, x + 3]).]
   Taking the union over all x ∈ [1, 2], we get the region graphed above on the right,
boxed in by the lines x = 1, y = 3, x = 2, and y = x + 3.
                                                                                    △
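The conclusion of Example 2.11 can be checked numerically. The following is only a sketch: it samples finitely many values of x (the union itself ranges over every x ∈ [1, 2]), and the function names and the grid of test points are illustrative choices, not part of the text. Exact rational arithmetic is used so that boundary points are not affected by rounding.

```python
from fractions import Fraction

def in_box(x, a, b):
    """Return True when the point (a, b) lies in [x, 2] × [3, x + 3]."""
    return x <= a <= 2 and 3 <= b <= x + 3

def in_union(a, b, steps=100):
    """Sampled test: is (a, b) in the box for some x = 1 + k/steps in [1, 2]?"""
    return any(in_box(1 + Fraction(k, steps), a, b) for k in range(steps + 1))

def in_region(a, b):
    """The region boxed in by the lines x = 1, y = 3, x = 2, and y = x + 3."""
    return 1 <= a <= 2 and 3 <= b <= a + 3

# Compare the two descriptions on a grid of points with coordinates in tenths.
grid = [(Fraction(i, 10), Fraction(j, 10))
        for i in range(5, 26) for j in range(25, 56)]
mismatches = [(a, b) for a, b in grid if in_union(a, b) != in_region(a, b)]
print(len(mismatches))
```

The check works because a point (a, b) with 1 ≤ a ≤ 2 lies in some box exactly when it lies in the box for x = a, which is the tallest box whose left edge does not pass a.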
2.C     Exercises
Exercise 2.1. Sketch each of the following sets in the Cartesian plane R^2.
 (a) {1, 2} × {1, 3}.
 (b) [1, 2] × [1, 3].
 (c) (1, 2] × [1, 3]. (Hint: If an edge is missing, use a dashed, rather than solid, line
     for that edge.)
 (d) (1, 2] × {1, 3}.
Exercise 2.2. Let A = {s, t} and B = {0, 9, 7}. Write the following sets by listing
all of their elements.
  (a) A × B.
  (b) B × A.
  (c) A^2.
  (d) B^2.
  (e) ∅ × A.
Exercise 2.3. Answer each of the following questions with “True” or “False” and
then provide a reason for your answer.
  (a) If |A| = 3 and |B| = 4, then |A × B| = 7.
 (b) It is always true that A × B = B × A when A and B are sets.
  (c) Assume I is an indexing set, and let S_i be a set for each i ∈ I. We always have
      ⋂_{i∈I} S_i ⊆ ⋃_{i∈I} S_i.
  (d) There exist distinct sets S_1, S_2, S_3, . . ., each of which is infinite, but
      ⋂_{i=1}^{∞} S_i
Exercise 2.4. Using the notations from Example 2.10, write the following sets (pos-
sibly using intersections or unions).
  (a) The set of words containing all four of the letters “a,w,x,y.”
  (b) The set of words not containing any of the letters “s,t,u.”
  (c) The set of words containing both “p,r” but not containing any of the standard
      vowels. (Is this set empty?)
Exercise 2.5. For each number r ∈ R, consider the “parabola shifted by r” defined
as:
                        P_r = {(x, y) ∈ R^2 : y = x^2 + r}.
Describe the following sets in set-builder notation; the answer should have no reference
to “r.” Also graph the sets in the Cartesian plane.
  (a) ⋃_{r∈R} P_r.
  (b) ⋃_{r>0} P_r.
  (c) ⋃_{r≠0} P_r.
  (d) ⋂_{r∈R} P_r.
  (e) ⋂_{r>0} P_r.
Chapter II
Logic
I am convinced that the act of thinking logically cannot possibly be natural to the hu-
man mind. If it were, then mathematics would be everybody’s easiest course at school
and our species would not have taken several millennia to figure out the scientific
method. Neil deGrasse Tyson
                                     f′(c) = (f(b) − f(a)) / (b − a).
This is an implication; if a certain condition is true, then some result is implied and
can be inferred to be true.
    In addition to logical connectives, this chapter will introduce quantifiers, which
are also very common in mathematical writing. As an example, the Mean Value The-
orem (stated above) includes the quantified phrase “there is some c ∈ (a, b).” Proper
understanding and use of quantifiers is essential to understanding mathematical writ-
ing.
3 Statements
P : 2 is an even number
we are stating that the variable P will stand for the statement “2 is an even number.”
    Sentences that are not statements could include questions (“What is the color of
the sky?”), commands (“Solve the equation”), or opinions (“Chocolate ice cream is
the best”). Moreover, words that don’t actually form sensible sentences do not have
truth values (“Try apples is”). In addition, sentences involving variables, such as “x
is greater than 2,” are not statements, unless the value of x is known. (We will talk
more about sentences containing variables in the next section.)
P : 2 is an even number,
and
                                Q: 5 is an even number,
then the statement P ∧ Q would be
Example 3.5. For the statements P and Q in Example 3.3, P ∨ Q would be the
statement
                2 is an even number or 5 is an even number.
In this case P ∨ Q is true, since P is true.                                              △
  Warning 3.6. If both P and Q are true, then P ∨ Q is true. This is not the
  way the word “or” is usually interpreted in the English language. For instance,
  if I say to my daughter “You may have an ice cream cone or you may have a
  candy bar,” I typically do not mean that she can have both. Thus, P ∨ Q really
  means “either P or Q or both,” but that is too wordy for common use, so we
  just say “P or Q”.
¬P,
The symbols ¬P may be read as “not P .” If we wish to express the meaning behind
¬P we might say “it is not the case that P holds,” or “P is false.”
P : 2 is an even number
P : It is raining in London.
You probably do not know whether this statement is true or false. Nevertheless we
can still negate the sentence. The negation is
     Advice 3.10. To see that the meaning of the negation of P is the same as “P
     is false” consider two cases.
         If P is true, the statement “P is false” is then false. Hence, it has the opposite
     truth value as P .
         If, on the other hand, P is false, then the statement “P is false” is a true
     statement, so again, it has the opposite truth value as P .
         Often, it is easier to understand (or maybe just shorter to say) “P is false”
     than to say “it is not the case that P .” Appending the phrase “it is not the
     case that” has the advantage that it will generally form a grammatical sentence
     when combined with the words comprising P .
P : 2 is an even number
  (which is false), because the fact that these two statements have opposite truth
  values depends on knowing the truth values of each statement. A correct nega-
  tion of the statement P would be the statement
The next concept that we define is one of the most important in mathematics.
Example 3.13. Let P be the statement “2 is an even number” and Q be the state-
ment “5 is an even number,” as in Example 3.3. Then the statement P ⇒ Q would
be the statement
which is false, since P is true and Q is false. On the other hand, the statement Q ⇒ P
would be the statement
Remark 3.14. To better understand why implication is defined as it is, we can think
of an implication as a promise, or a contract. Suppose I tell my daughter
If you clean your room, then you will get ice cream.
We now examine this statement to find out under what conditions I am telling the
truth.
    First, if my daughter cleans her room and I let her have ice cream, then I have
told the truth and kept my promise. In other words, if P is true and Q is true, then
P ⇒ Q is true.
    Second, if my daughter cleans her room and I do not let her have ice cream, then
I have lied. In other words, if P is true and Q is false, then P ⇒ Q is false (a lie).
    Third, if my daughter does not clean her room and I let her have ice cream anyway
(perhaps because she did some other duty to deserve the ice cream), then I have not
lied, so my statement is true. I am under no obligation to give her ice cream, but I
do so anyway. Hence, if P is false and Q is true, then P ⇒ Q is true.
    Finally, if my daughter does not clean her room and I do not let her have ice
cream, then I have not lied; she did not fulfill the condition of the implication, so I
did not fulfill the conclusion. If P is false and Q is false, the implication P ⇒ Q is
true.                                                                                  ▲
     Warning 3.15. Many students have difficulty with the idea that the statement
     P ⇒ Q should be true when P is false and Q is true. It can help to think of
     P ⇒ Q as meaning “Whenever P is true, Q must also be true, but if P is false,
     anything can happen.”
   We define one final way to combine statements, called the biconditional, be-
low.
P ⇔Q
     is a statement that is true if P and Q have the same truth value, and false
     otherwise.
    The logical connectives that we will deal with in this text are ∧, ∨, ¬, ⇒, and ⇔
(although there are others). We summarize the definitions of these logical connectives
in Table 3.19. In this table, we have a column for each of P , Q, ¬P , P ∧ Q, P ∨ Q,
P ⇒ Q, and P ⇔ Q. We have a row for each possible combination of truth values
of P and Q, and the entries in each row indicate the truth value of the statement at
the top of the column.
                   P   Q ¬P      P ∧Q     P ∨Q     P ⇒Q P ⇔Q
                   T   T F         T        T        T    T
                   T   F F         F        T        F    F
                   F   T T         F        T        T    F
                   F   F T         F        F        T    T
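The table above can be regenerated mechanically. The sketch below is an illustration only: the helper names `implies`, `iff`, and `connective_rows` are hypothetical, and `implies` is written as (¬P) ∨ Q, an equivalence that Exercise 3.3(b) asks the reader to prove.

```python
from itertools import product

def implies(p, q):
    """P ⇒ Q is false only when P is true and Q is false."""
    return (not p) or q

def iff(p, q):
    """P ⇔ Q is true exactly when P and Q have the same truth value."""
    return p == q

def connective_rows():
    """One row per combination of truth values of P and Q, in the table's order."""
    rows = []
    for p, q in product([True, False], repeat=2):
        rows.append((p, q, not p, p and q, p or q, implies(p, q), iff(p, q)))
    return rows

header = ["P", "Q", "¬P", "P∧Q", "P∨Q", "P⇒Q", "P⇔Q"]
print("  ".join(header))
for row in connective_rows():
    # Pad each entry to the width of its column label.
    print("  ".join(("T" if v else "F").ljust(len(h)) for v, h in zip(row, header)))
```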
     Definition 3.23. Two compound statements are logically equivalent if they have
     the same truth value regardless of the truth values of the components. If R and
     S are compound statements that are logically equivalent, we write R ≡ S.
    The following example will give two compound statements that are logically equiv-
alent, and we show how to prove this equivalence.
Example 3.24. Let P and Q be statements. Then the statements
R : ¬(P ∧ Q)
and
                                   S : (¬P ) ∨ (¬Q)
are compound statements with components P and Q. In Table 3.25 below, we make
a truth table that indicates the truth values of these two compound statements for
every possible combination of truth values of P and Q.
Table 3.25: Truth table showing equivalence of ¬(P ∧ Q) and (¬P ) ∨ (¬Q)
   We notice immediately that the two boldface columns have identical truth values.
This shows that the statements R and S are logically equivalent, or, in symbols, that
¬(P ∧ Q) ≡ (¬P ) ∨ (¬Q).                                                           △
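Comparing two columns of a truth table is a finite check, so it can be automated. The following sketch assumes each compound statement is represented as a Python function of its components; the name `logically_equivalent` is an illustrative choice.

```python
from itertools import product

def logically_equivalent(r, s, n):
    """True when r and s agree for every assignment of truth values to n components."""
    return all(r(*vals) == s(*vals) for vals in product([True, False], repeat=n))

def R(p, q):
    return not (p and q)          # ¬(P ∧ Q)

def S(p, q):
    return (not p) or (not q)     # (¬P) ∨ (¬Q)

print(logically_equivalent(R, S, 2))
```

By contrast, `logically_equivalent(lambda p, q: p and q, lambda p, q: p or q, 2)` fails, since P ∧ Q and P ∨ Q differ when exactly one of P and Q is true.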
   When constructing a truth table, it is useful to organize the rows in such a way
that you can be sure that all possible combinations of the components are in fact
represented. If there are n components there will be 2^n rows. One convenient way to
organize them is to write each component as a label for a column of the truth table
(just as we did in Tables 3.19 and 3.25). For the last component (the rightmost one),
put alternating truth values of T and F in the column, until there are 2^n of them.
    For the second to the last component, fill the corresponding column with alter-
nating blocks of two T s and F s, until all the entries have been filled. Continue from
right to left, doubling the sizes of the blocks, until for the leftmost component, you
write just one block of T s and one block of F s.
    In this way, you can (in a systematic way) be certain that all possible combina-
tions of truth values are written. An example of how this is to be done with three
components is given below in Example 3.27.
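The block-doubling rule described above can be written out directly. This is a sketch under the assumption that T is represented by `True` and F by `False`; the function name `truth_rows` is hypothetical.

```python
from itertools import product

def truth_rows(n):
    """List all 2**n rows, using blocks of T's and F's that double from right to left."""
    rows = []
    for i in range(2 ** n):
        row = []
        for j in range(n):                     # j = 0 is the leftmost component
            block = 2 ** (n - 1 - j)           # block size is 1 for the rightmost column
            row.append((i // block) % 2 == 0)  # T in even-numbered blocks, F in odd ones
        rows.append(tuple(row))
    return rows

for row in truth_rows(3):
    print("  ".join("T" if v else "F" for v in row))
```

The same ordering is produced by `itertools.product([True, False], repeat=n)`, which varies the rightmost entry fastest.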
    Note that a truth table can have as many columns as is convenient to work out the
truth values of the statements in which we are interested; typically, we will include a
column for each intermediate step.
    A number of logical equivalences are important enough to be given standard
names.
              P   Q   R    P ∧ Q (P ∧ Q) ∧ R       Q∧R     P ∧ (Q ∧ R)
              T   T   T      T        T             T           T
              T   T   F      T        F             F           F
              T   F   T      F        F             F           F
              T   F   F      F        F             F           F
              F   T   T      F        F             T           F
              F   T   F      F        F             F           F
              F   F   T      F        F             F           F
              F   F   F      F        F             F           F
    Because the two boldface columns match, the corresponding statements are logically
equivalent.                                                                    △
     Definition 3.29. A compound statement that is false for every possible combi-
     nation of truth values of its components is called a contradiction.
     Definition 3.31. A compound statement that is true for every possible combi-
     nation of truth values of its components is called a tautology.
(P ∧ (P ⇒ Q)) ⇒ Q
                 P   Q   P ⇒ Q P ∧ (P ⇒ Q)       (P ∧ (P ⇒ Q)) ⇒ Q
                 T   T     T        T                    T
                 T   F     F        F                    T
                 F   T     T        F                    T
                 F   F     T        F                    T
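Definitions 3.29 and 3.31 both describe a property of the final column of a truth table, so a statement can be classified by computing that column. The sketch below is illustrative; `classify` and `implies` are hypothetical helper names, and statements are modeled as Python functions of their components.

```python
from itertools import product

def implies(p, q):
    return (not p) or q

def classify(statement, n):
    """Return 'tautology', 'contradiction', or 'neither' for a statement of n components."""
    column = [statement(*vals) for vals in product([True, False], repeat=n)]
    if all(column):
        return "tautology"
    if not any(column):
        return "contradiction"
    return "neither"

# The statement from the table above: (P ∧ (P ⇒ Q)) ⇒ Q.
def from_table(p, q):
    return implies(p and implies(p, q), q)

print(classify(from_table, 2))
print(classify(lambda p: p and not p, 1))
print(classify(implies, 2))
```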
Remark 3.34. All contradictions are logically equivalent to each other. Similarly,
all tautologies are logically equivalent to each other. (Can you see why?)      ▲
3.E    Exercises
Exercise 3.1. Determine whether the given sentence is a statement. If it is, indicate
its truth value (if you can).
  (a) The number 0 is an even integer.
  (b) Let x = 2.
  (c) If 2 is an even integer.
  (d) Either 2 is even or 4 is odd.
  (e) George Washington had seven children.
  (f) There are 7254 different species of ants in the United States.
Exercise 3.2. Let P , Q, and R be statements. Construct a truth table showing the
possible truth values for each of the following compound statements.
 (a) (P ∧ Q) ⇒ P .
 (b) P ⇒ (P ∨ Q).
 (c) ¬(P ⇒ Q) ∧ (¬P ).
 (d) (P ∨ Q) ∧ R.
 (e) P ∨ (Q ∧ R).
Exercise 3.3. Use truth tables to prove the given logical equivalences.
 (a) ¬(P ∧ Q) ≡ (¬P ) ∨ (¬Q).
 (b) P ⇒ Q ≡ (¬P ) ∨ Q.
 (c) (P ∨ Q) ⇒ R ≡ (P ⇒ R) ∧ (Q ⇒ R)
Exercise 3.4. Determine whether the two given compound statements are logically
equivalent.
  (a) ¬(P ⇒ Q) and P ∧ (¬Q).
 (b) (P ∧ Q) ⇒ R and P ⇒ (¬Q ∨ R).
  (c) P ⇒ (Q ∨ R) and (P ∧ ¬Q) ⇒ R.
 (d) (P ∨ Q) ⇒ R and (¬R ∧ P ) ⇒ (¬Q).
  (e) P ⇔ Q and (P ⇒ Q) ∧ (Q ⇒ P ).
Exercise 3.5. Let P , Q, and R be statements. Identify each of the following state-
ments as a tautology, a contradiction, or neither.
 (a) ((P ⇒ Q) ∧ (¬Q)) ⇒ (¬P ).
 (b) ((P ∨ Q) ∧ (¬P )) ⇒ Q.
 (c) (P ⇒ Q) ⇒ (P ⇒ R).
 (d) ((¬Q) ⇒ (¬P )) ∧ P ∧ (¬Q).
Exercise 3.6. Let P and Q be statements.
 (a) Prove that the compound statement P ⇒ Q is not logically equivalent to Q ⇒
     P.
 (b) The statement Q ⇒ P is called the converse of P ⇒ Q. Give an example of
     statements P and Q for which P ⇒ Q is true, but Q ⇒ P is false. For the
     specific statements P and Q in your example, state their truth values.
Exercise 3.7. The following exercise shows that the biconditional can be a useful
logical connective (if you are ever lost in grue-infested woods):
    After getting lost in the woods, you stumble upon a path. As you follow the path
it comes to a fork, and a grue blocks your way. There is a sign which reads:
     This grue either always tells the truth or always lies. You may ask the
     grue a single question. Any more than that, and it will eat you.
You are positive that one of the paths leads home and the other leads to certain
death. Explain why asking the grue the question “Is the left path the way home if
and only if you are a truth teller?” can help you decide which path to take. (Note
that the question is not “Is the left path the way home?”)
4      Open sentences
4.A      Open sentences
In many cases, we wish to write a sentence whose truth value depends on variables,
where the variables can take on many different values. Typically, we want each
variable to take on values in some specific set. We call each such set of values the
domain of the corresponding variable.
   Some authors do not explicitly write all open variables that occur in an open
sentence. We will always specify all open variables.
Example 4.2. Let x and y be variables, both with the same domain R. We may
define the open sentence
                              P (x, y): x > y.
With this definition we see that P (2, 3) is false (since the statement 2 > 3 is false),
P (5, 2) is true, and so on. Note that if we try to plug in a value of x or y outside the
domain, the resulting sentence may be meaningless. For instance P (apples, oranges) is
meaningless since the sentence “apples > oranges” makes no sense (you can’t compare
apples and oranges).                                                                   △
   Open sentences are very common throughout mathematics. They can express
quite important and general principles. Here are some examples.
Example 4.3. Let S be the set of triangles and let x be a variable with domain S.
Then we have an open sentence
The open sentence P (x) happens to be true for all choices of x ∈ S. This fact is a
very famous (and very old) theorem.
   Again taking x to be a variable with domain S, we may write the open sentence
In this case, Q(x) is not true for all x ∈ S; instead it is true exactly when x is a right
triangle.                                                                               △
     Warning 4.4. An open sentence is much like a function from calculus. It can
     be considered to be a rule in which, when we plug in a value for the variable (or
     variables), we obtain a truth value. Even if an open sentence is true for each
     value of the variable, it does not become a statement. For instance, if we let x
     be a variable with domain equal to the real numbers, the open sentence
P (x): x^2 > −1
f (x) = 2
     from algebra. The function f is not equal to the number 2; rather it is a rule
     that associates to each value of x the number 2.
Example 4.5. Let x be a variable with domain the set of all triangles, and define
the following open sentences:
and
                       Q(x): One of the angles of x is a right angle.
Then, if we plug in a specific right triangle t in place of x, the statement P (t) ∧ Q(t)
is true; however, if we take x to be an equilateral triangle e (so that it cannot be a
right triangle), the statement P (e) ∧ Q(e) is false (since Q(e) is false).            △
Example 4.6. Let x be a variable with domain the set of real numbers. We define
two open sentences:
                         P (x): x > 3, Q(x): x < 5.
   Consider the following compound open sentences:
 (1) P (x) ∧ Q(x),
 (2) P (x) ∨ Q(x),
 (3) P (x) ⇒ Q(x),
 (4) P (x) ⇔ Q(x),
 (5) ¬P (x) ∧ ¬Q(x).
Below, we give, for each of these open sentences, the set of values of x for which it
is true. Try it for yourself before looking at the answers.
   We will work out (3) in detail. Remember that for a given value of x, we have
                       P (x) ⇒ Q(x) ≡ ¬P (x) ∨ Q(x).
We know that ¬P (x) is true for x ∈ (−∞, 3]. Also Q(x) is true for x ∈ (−∞, 5).
Thus, ¬P (x) ∨ Q(x) is true for x ∈ (−∞, 3] ∪ (−∞, 5) = (−∞, 5).
All Answers:
 (1) (3, 5),
 (2) R,
 (3) (−∞, 5),
 (4) (3, 5),
 (5) ∅.                                                                                  △
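The answers to Example 4.6 can be spot-checked by evaluating each compound open sentence at sample values of x and comparing with the claimed truth set. This is a sketch only: a finite sample cannot prove the answers, and the dictionary names and sample points are illustrative assumptions.

```python
def P(x):
    return x > 3

def Q(x):
    return x < 5

def implies(p, q):
    return (not p) or q

# The five compound open sentences from the example.
sentences = {
    1: lambda x: P(x) and Q(x),
    2: lambda x: P(x) or Q(x),
    3: lambda x: implies(P(x), Q(x)),
    4: lambda x: P(x) == Q(x),
    5: lambda x: (not P(x)) and (not Q(x)),
}

# Membership tests for the claimed truth sets (3, 5), R, (−∞, 5), (3, 5), ∅.
answers = {
    1: lambda x: 3 < x < 5,
    2: lambda x: True,
    3: lambda x: x < 5,
    4: lambda x: 3 < x < 5,
    5: lambda x: False,
}

samples = [k / 4 for k in range(-40, 41)]  # quarter steps from -10 to 10
agree = {n: all(sentences[n](x) == answers[n](x) for x in samples) for n in sentences}
print(agree)
```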
Remark 4.7. The previous example demonstrates why logical equivalence is so use-
ful. We know that
                       P (x) ⇒ Q(x) ≡ ¬P (x) ∨ Q(x)
even before we know the value of x, hence before we know the truth value of P (x) or
Q(x). Thus, we can simplify some compound open sentences, before we even know
what the components say.                                                          ▲
4.B     Quantifiers
We have seen the expression “for all x ∈ S” several times. This expression is common
enough that we create a symbol for it.
  Definition 4.8. Let x be a variable with domain S and let P (x) be an open
  sentence. The expression
                                 ∀x ∈ S, P (x)
  is then a statement. It is true if P (x) is true for each choice of x ∈ S, and it is
  false if P (x) is false for one (or more) choice of x ∈ S.
Example 4.9. Let S be the set of all triangles and let x be a variable with domain
S. Let P (x) be the open sentence
                      P (x): The sum of the angles of x is 180°.
Then the statement “∀x ∈ S, P (x)” means “For all x in the set of triangles, the sum
of the angles of x is 180°.” We may simplify this a bit by reducing it to the sentence
“The sum of the angles of any triangle is 180°” without changing the meaning. We
note that, in this case, the statement ∀x ∈ S, P (x) is a true statement.            △
Example 4.10. Again letting x be a variable with domain S equal to the set of all
triangles, we will let Q(x) be the open sentence
                      Q(x): One of the angles of x is a right angle.
If we examine the statement ∀x ∈ S, Q(x), we find that it is false, since it is not the
case that all triangles contain a right angle; for example, none of the angles in an
equilateral triangle are right angles. However, if we let T be the subset of S consisting
of right triangles, then the statement ∀x ∈ T, Q(x) is a true statement.               △
   We call the symbol ∀ the universal quantifier; the expression ∀x ∈ S, P (x) makes
the assertion that the open sentence P (x) is universally true for any x in the domain
S. There is another quantifier, ∃, called the existential quantifier.
     Definition 4.11. Let x be a variable with domain S and let P (x) be an open
     sentence. The expression
                                    ∃x ∈ S, P (x)
     is then a statement. It is true if P (x) is true for at least one choice of x ∈ S,
     and it is false if P (x) is false for each choice of x ∈ S.
     We read the statement ∃x ∈ S, P (x) as “There exists some x in S such that P (x).”
Example 4.12. Let x be a variable with domain R and let P (x) be the open sentence
                                      P (x): x^2 = 2.
Then the statement ∃x ∈ R, P (x) is a true statement, because there is at least one
choice of x ∈ R, namely x = √2, for which P (x) is a true statement (in fact there
are two choices, since we could take x = −√2 as well). However, the statement ∃x ∈
Z, P (x) is false, since there are no integers x such that P (x) is a true statement. △
Example 4.13. Let A and B be sets and let x be a variable with domain A. We will
use quantifiers to describe some relationships between the sets A and B.
   Suppose that the statement
                                      ∀x ∈ A, x ∈ B
is true. This means that every element of A is an element of B. This is what it means
to say that A ⊆ B.
    Suppose that the statement
                                      ∀x ∈ A, x ∉ B
is true. In other words, every element of A is not an element of B. Then we know that
A and B have no elements in common; they are disjoint. In other words, A ∩ B = ∅.
    Suppose that the statement
                                      ∃x ∈ A, x ∈ B
is true. In other words, there is at least one element in A that is also in B. This tells
us that A ∩ B ≠ ∅; so A and B are not disjoint.                                      △
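For finite sets, the three quantified statements in Example 4.13 translate directly into Python's built-in `all` and `any`, which play the roles of ∀ and ∃. The function names and the sample sets below are illustrative assumptions.

```python
def every_element_in(A, B):
    """∀x ∈ A, x ∈ B   (that is, A ⊆ B)."""
    return all(x in B for x in A)

def no_element_in(A, B):
    """∀x ∈ A, x ∉ B   (that is, A ∩ B = ∅)."""
    return all(x not in B for x in A)

def some_element_in(A, B):
    """∃x ∈ A, x ∈ B   (that is, A ∩ B ≠ ∅)."""
    return any(x in B for x in A)

A, B, C = {1, 2}, {1, 2, 3}, {4, 5}
print(every_element_in(A, B), no_element_in(A, C), some_element_in(B, C))
```

Each function agrees with the corresponding built-in set operation: `A <= B`, `A.isdisjoint(B)`, and `bool(A & B)`.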
   Note that, in the case of open sentences with multiple variables, we can define
quantification in a way similar to Definitions 4.8 and 4.11. In this case, we may need
multiple quantifiers in order to turn the open sentence into a statement. In general,
one quantifier will be needed for each distinct variable in the open sentence. As an
example, if we let x and y be variables with the domain R and let P (x, y) be the open
sentence
                                   P (x, y) : x > y,
then the sentence ∀x ∈ R, P (x, y) is not a statement, because it still has a variable
that has not been specified or quantified (namely y). In order to make it a statement
we need to specify or quantify y; we could say ∀y ∈ R, ∀x ∈ R, P (x, y), which means that
       For all real numbers y and for all real numbers x, it holds that x > y.
Note that this is a false statement, since taking y = 2 and x = 1 gives 1 > 2.
Example 4.14. Suppose we wish to express the statement “Every even integer is a
sum of two odd integers” in symbolic logic. We can do this using multiple quantifiers.
Let Even denote the set of even integers and let Odd denote the set of odd integers.
The statement
                   ∀x ∈ Even, ∃y ∈ Odd, ∃z ∈ Odd, x = y + z
then means that for each even integer x, there is an odd integer y and there is an odd
integer z such that x = y + z. In other words, any even integer x is the sum of two
odd integers, y and z. (This is true.)                                              △
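The nested quantifiers in Example 4.14 can be explored on a finite range of integers. The sketch below searches for witnesses y and z for a given x; the function name and the search bound are assumptions made for illustration, and a finite search does not replace a proof.

```python
def odd_witnesses(x, bound=100):
    """Search for odd integers y, z with x = y + z and |y| <= bound; None if not found."""
    for y in range(-bound, bound + 1):
        z = x - y
        if y % 2 != 0 and z % 2 != 0:  # both y and z are odd
            return (y, z)
    return None

# ∀x ∈ Even (restricted to a finite range), ∃y ∈ Odd, ∃z ∈ Odd, x = y + z.
evens = range(-50, 51, 2)
print(all(odd_witnesses(x) is not None for x in evens))

# An odd integer is never the sum of two odd integers, so the search fails.
print(odd_witnesses(7))
```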
Example 4.15. Suppose we wish to express the statement “Every positive real number
has a positive square root” in symbolic logic. Let R_{>0} be the set of positive real
numbers. We might write
                             ∀x ∈ R_{>0}, ∃y ∈ R_{>0}, x = y^2.
Interpreted, this means that for each x in the positive real numbers, there exists a
positive real number y, such that x = y^2. In other words, y is a positive real square
root of x.                                                                           △
   We will return to a much more detailed discussion of multiple quantifiers in the
next section. They are quite important and appear in many parts of mathematics.
P (x) ⇒ Q(x),
P (x): x is even
Q(x): x − 2 is even
(with the domain of x being the integers in both cases). If we wish to write the
quantified statement
                             ∀x ∈ Z, P (x) ⇒ Q(x),
we may write the following:
Notice that no explicit quantifier is stated, but the quantifier is nevertheless under-
stood.
   Similarly the biconditional
means
                                ∀x ∈ Z, P (x) ⇔ Q(x),
even though we didn’t use the words “for all x ∈ Z.”                                 △
Example 4.17. Inclusion of sets can be understood through implication.
   Let A and B be subsets of a universal set U . Let x be a variable with domain U ,
and assume that
                             ∀x ∈ U, x ∈ A ⇒ x ∈ B
is true. Since the implication must be true for all elements of U , we see that whenever
x ∈ A, it must be the case that x ∈ B. (When x ∉ A, we know nothing about whether
it is in B.) This is exactly what it means to say that A ⊆ B. Hence, the statement
A ⊆ B can be interpreted (in terms of symbolic logic) as the statement above.
     Occasionally the universal set U is understood from context, and we will write
∀x, x ∈ A ⇒ x ∈ B. More simply, A ⊆ B could be written as ∀x ∈ A, x ∈ B, without
the use of implication or a universal set U.                                          △
∀x ∈ S, P (x) ⇒ Q(x)
to be true.
    We first examine how it could be false. We see that the only way for it to be false
is for there to be one or more values of x for which P (x) is true and Q(x) is false.
Therefore, in order for the statement ∀x ∈ S, P (x) ⇒ Q(x) to be true, it must be
the case that for each x ∈ S, either P (x) is false or Q(x) is true. Another way to put
this is that whenever P (x) is true, it must happen that Q(x) is also true. In other
words, if we know that P (x) is true, then Q(x) must be true. Hence, for any given
x ∈ S we have “If P (x), then Q(x).”
    There are a number of different ways of writing a sentence with the meaning “If
P , then Q.” Some of them are:
                         If P , then Q.
                         P implies Q.
                         P only if Q.
                         Q if P .
                         P is sufficient for Q.
                         Q is necessary for P .
                         Whenever P is true, then Q is true.
    For example, if we say “P is sufficient for Q,” this has the meaning that P being
true is sufficient for us to conclude that Q is true. In other words, “If P is true, then
Q is true.”
    You should think about the meaning of the other phrases above, and satisfy your-
self that they all have the same meaning as P ⇒ Q.
    Another sentence that can be expressed in several different ways in English is
P ⇔ Q. Some of the ways that it can be expressed are:
                          P   if and only if Q.
                          P   is equivalent to Q.
                          P   is necessary and sufficient for Q.
                          P   holds exactly when Q holds.
4.F    Exercises
Exercise 4.1. Let x be a variable with domain Z. Define the open sentences
                                P (x) : x > 1,
                                Q(x) : x^2 < 16, and
                                R(x) : x + 1 is even.
For each of the following compound open sentences, describe the subset of Z (by
listing its elements or using set-builder notation) where that open sentence is true.
  (a) P (x) ∧ Q(x).
  (b) Q(x) ∧ R(x).
   (c) (Q(x) ∨ ¬P (x)) ∧ ¬R(x).
  (d) (P (x) ⇒ Q(x)) ⇒ R(x).
(Hint: Simplify using logical equivalences.)
Exercise 4.2. For each x ∈ {1, 2, 3, 4, 5, 6}, write down the truth value of
            P (x): If x is an odd integer, then (3x + 5)/2 is an odd integer,
and then state whether you believe ∀x ∈ Z, P (x) is true or false.
Exercise 4.3. For each x ∈ {1, 2, 3, 4, 5, 6}, write down the truth value of
            Q(x): If x is an even integer, then 3x + 5 is an odd integer,
and then state whether you believe ∀x ∈ Z, Q(x) is true or false.
Exercise 4.4. Let A and B be two subsets of a universal set U . Write a symbolic
logic interpretation of the statement A = B. Explicitly write out any quantifiers
involved in the statement. (It is possible to do this with no reference to U .)
Exercise 4.5. Translate the following English sentences into symbolic logic. Explic-
itly write any quantifiers that are implied.
  (a) There is an integer strictly between 4 and 6.
  (b) The square of any odd integer is odd.
  (c) If the square of an integer is odd, then the original integer is odd.
  (d) If a real number is not rational, then it is not equal to 0.
  (e) The sum of two rational numbers is rational.
   (f) The square of any real number is a nonnegative real number.
  (g) There is an integer solution to the equation x^2 − 5x + 6 = 0.
  (h) Every real solution to x^2 − 5x + 6 = 0 is an integer.
Exercise 4.6. Translate the following symbolic logic statements into English.
 (a) ∃x ∈ R, x^2 = 2.
 (b) ∀x ∈ Z, (x is even) ⇔ (x^2 is even).
 (c) ∀x ∈ R, (x > 1) ⇒ (x^3 > 1).
 (d) ∀x ∈ R, (x^2 − 2x + 1 = 0) ⇒ (x = 1).
 (e) ∃x ∈ Q, 2x^3 − x^2 + 2x − 1 = 0.
 (f) ∀x ∈ R, ∃y ∈ Z, ∃z ∈ [0, 1), x = y + z.
42                                                              CHAPTER II. LOGIC
∀x ∈ R, ∃y ∈ R, P (x, y).
For each real number x, there is a real number y such that x > y,
or in other words, “For each real number x, there is a real number y that is smaller
than x.” We claim that this statement is true. To see this, suppose that x is any real
number. The statement asserts that no matter what x is, there is some real number
y such that x > y. If we let y = x − 1, we see that regardless of what the value of
x is, x > y. Hence, for each real number x, there is some y (for example y = x − 1)
with x > y.
Example 5.2. Let P (x, y) again be the open sentence x > y with the domain of
both x and y being the real numbers. We next examine the quantified statement
∃y ∈ R, ∀x ∈ R, P (x, y).
Note that this statement is the same as the statement in Example 5.1, except for the
order of the quantifiers. In words, this statement means
There is some real number y such that x > y whenever x is a real number,
or in other words, “there is a real number y such that every real number is larger
than y.”
    This statement is false, which we see as follows. If y is a real number, then y − 1
is not larger than y. Hence, there does not exist a real number y for which every real
number is larger than y.
   These two examples demonstrate that the order of quantifiers in a statement
with multiple quantifiers can change the truth value of the statement. This only
happens when the quantifiers are not the same type. Changing the order of existential
quantifiers that are next to each other, or universal quantifiers that are next to each
other, will not change the truth value of the statement. For example,
                                ∀x ∈ R, ∀y ∈ R, x > y
5. MULTIPLE QUANTIFIERS AND NEGATING SENTENCES                                         43
has the same truth value as
                                ∀y ∈ R, ∀x ∈ R, x > y
(in this case both are false). Since the order does not matter in this case, we will
abbreviate either of these statements as
                                ∀x, y ∈ R, x > y.
Similarly,
                                ∃x ∈ R, ∃y ∈ R, x > y
has the same truth value as
                                ∃y ∈ R, ∃x ∈ R, x > y
(in this case both are true). Since the order does not matter in this case, we often
abbreviate either of these statements as
∃x, y ∈ R, x > y,
which we might read as “there are real numbers x and y such that x > y.”
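
The effect of quantifier order can be checked mechanically when the domain is finite. The following is a minimal sketch, not a proof: it assumes a small finite stand-in domain (the examples above quantify over all of R, which cannot be enumerated) and an illustrative open sentence x = y that does not appear in the text. The helper names are hypothetical.

```python
# Finite stand-in for the domain (an assumption; the text works over all of R).
DOMAIN = range(-3, 4)

def P(x, y):
    """Illustrative open sentence (not the one from the text): x equals y."""
    return x == y

def forall_exists(pred, domain):
    """Truth value of: for all x, there exists y, pred(x, y)."""
    return all(any(pred(x, y) for y in domain) for x in domain)

def exists_forall(pred, domain):
    """Truth value of: there exists y, for all x, pred(x, y)."""
    return any(all(pred(x, y) for x in domain) for y in domain)

def forall_x_then_y(pred, domain):
    """Truth value of: for all x, for all y, pred(x, y)."""
    return all(all(pred(x, y) for y in domain) for x in domain)

def forall_y_then_x(pred, domain):
    """Truth value of: for all y, for all x, pred(x, y)."""
    return all(all(pred(x, y) for x in domain) for y in domain)

mixed_first = forall_exists(P, DOMAIN)    # each x has a matching y (take y = x)
mixed_second = exists_forall(P, DOMAIN)   # no single y equals every x
same_first = forall_x_then_y(P, DOMAIN)
same_second = forall_y_then_x(P, DOMAIN)

print("forall x, exists y:", mixed_first)
print("exists y, forall x:", mixed_second)
print("forall x, forall y:", same_first)
print("forall y, forall x:", same_second)
```

Swapping quantifiers of different types changes the answer for this open sentence, while swapping the two universal quantifiers does not.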
Remark 5.3. Sometimes, when the domain of a variable is well understood and we
wish to quantify a statement over a subset of the domain that is defined by some easy
mathematical condition, we might adapt the notation that we use for a quantifier.
For instance, if x and y are variables with domain R and we wish to express the
statement “Every positive real number has a positive real square root,” we might
write
                               ∀x > 0, ∃y > 0, x = y^2,
which is shorthand for
                              ∀x ∈ R_{>0}, ∃y ∈ R_{>0}, x = y^2.
    This modification of notation can only be used when the domain of definition
of the variables is explicitly stated, or completely clear from context. One common
example of the use of this notation will occur in the definition of a limit in Chapter IX,
where we will let ε, δ, and x be real variables, and define
                                      lim_{x→a} f(x) = L
to mean
              ∀ε > 0, ∃δ > 0, ∀x ∈ R, 0 < |x − a| < δ ⇒ |f(x) − L| < ε.
(Note that most authors remove ∀x ∈ R, leaving it to the reader to fill in that
gap.)
P ⇒ Q ≡ (¬P ) ∨ Q.
Hence,
             ¬(P ⇒ Q) ≡ ¬((¬P ) ∨ Q) ≡ (¬¬P ) ∧ (¬Q) ≡ P ∧ (¬Q),
where we have used De Morgan’s law and double negation.
    We can interpret this string of equivalences as follows. Saying “P implies Q is
false” is the same as asserting “P is true and Q is false.”
R : ∀x ∈ S, P (x),
What does this mean? Simply, that P (x) must be false sometime (i.e., for at least
one x ∈ S). More formally,
                            ¬R : ∃x ∈ S, ¬P (x).
    Similar reasoning can be used to see that the negation of ∃x ∈ S, P (x) is the
statement ∀x ∈ S, ¬P (x). We state our conclusions as an axiom.
  Axiom 5.5. Let P (x) be an open sentence, where x has domain S. Then the
  negation of the statement
                                ∀x ∈ S, P (x)
  is equivalent to the statement
                                ∃x ∈ S, ¬P (x).
  Similarly, the negation of the statement
                                ∃x ∈ S, P (x)
  is equivalent to the statement
                                ∀x ∈ S, ¬P (x).
Example 5.6. Suppose that we wish to negate the statement “All integers are posi-
tive.” We can write this statement with a quantifier as
                                   P : ∀x ∈ Z, x > 0.
According to the axiom, the negation of this statement is (equivalent to)
                                 ¬P : ∃x ∈ Z, ¬(x > 0).
In words, we could state this as “There exists an integer that is not positive,” or
“Some integer is not positive.” We note that the negation of the open sentence x > 0
is easily seen to be x ≤ 0. Hence, we could write ¬P : ∃x ∈ Z, x ≤ 0 for the negation.
(In this example which of the following is true: P or ¬P ?)
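
Over a finite set, both sides of the negation rule can be evaluated directly. The sketch below assumes a finite sample of integers in place of Z (an assumption made only so the check terminates) and compares "not (for all x, x > 0)" with "there exists x with x ≤ 0". The variable names are illustrative.

```python
# Finite sample standing in for Z (an assumption; Z itself is infinite).
SAMPLE = range(-5, 6)

def is_positive(x):
    """The open sentence x > 0."""
    return x > 0

# Left side of the rule: the negation of the universal statement.
not_forall = not all(is_positive(x) for x in SAMPLE)

# Right side of the rule: an existential statement about the negated sentence.
exists_not = any(not is_positive(x) for x in SAMPLE)

# Elements of the sample that make the negated open sentence x <= 0 true.
witnesses = [x for x in SAMPLE if x <= 0]

print(not_forall, exists_not, witnesses)
```

Any witness found in the sample is also an integer, so it is a witness for the existential statement over Z as well.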
   We can extend these methods to negate statements with multiple quantifiers.
Example 5.7. Let x, y, z be variables with domains S, T , and U , respectively, and
let P (x, y, z) be an open sentence. We will negate the statement
                           ∀x ∈ S, ∀y ∈ T, ∃z ∈ U, P (x, y, z).
In order to do this, we will use parentheses to make the order of the quantifiers clearer.
Hence, the statement that we wish to negate is
                         ∀x ∈ S, (∀y ∈ T, (∃z ∈ U, (P (x, y, z)))).
Proceeding one level at a time, we see that
                          ¬ (∀x ∈ S, ∀y ∈ T, ∃z ∈ U, P (x, y, z))
                     ≡      ∃x ∈ S, ¬(∀y ∈ T, ∃z ∈ U, P (x, y, z))
                     ≡      ∃x ∈ S, ∃y ∈ T, ¬(∃z ∈ U, P (x, y, z))
                     ≡      ∃x ∈ S, ∃y ∈ T, ∀z ∈ U, ¬P (x, y, z).
    Notice that negating this quantified statement was as simple as swapping all
existential and universal quantifiers, and negating the open sentence at the end.
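
The equivalence in Example 5.7 can be tested exhaustively when the domains are tiny. The following sketch assumes S = T = U = {0, 1} (a choice made for illustration) and runs through every possible open sentence P on those domains, represented as a lookup table of truth values. It is a finite sanity check, not a replacement for the argument above.

```python
from itertools import product

# Tiny illustrative domains (an assumption; the example allows arbitrary sets).
S = T = U = (0, 1)
TRIPLES = list(product(S, T, U))

def original(P):
    """for all x in S, for all y in T, there exists z in U, P(x, y, z)."""
    return all(all(any(P[(x, y, z)] for z in U) for y in T) for x in S)

def swapped(P):
    """there exists x in S, there exists y in T, for all z in U, not P(x, y, z)."""
    return any(any(all(not P[(x, y, z)] for z in U) for y in T) for x in S)

def all_open_sentences():
    """Yield every truth-value table on the 8 triples (2**8 tables in total)."""
    for values in product((False, True), repeat=len(TRIPLES)):
        yield dict(zip(TRIPLES, values))

# The negation of the original should agree with the swapped form for every table.
agree_everywhere = all((not original(P)) == swapped(P) for P in all_open_sentences())
print(agree_everywhere)
```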
     Warning 5.8. One mistake that many students make is getting carried away
     with negating quantifiers. In particular, when negating the statement
                                ∀x ∈ S, P (x)
     it is a mistake to also negate the membership condition and write
                                ∃x ∉ S, ¬P (x).
Example 5.9. The problem mentioned in the previous warning most commonly
happens when we use notation other than x ∈ S in our quantifiers. For instance, if x
is a real variable, and we wish to negate
∀x > 0, P (x),
P : ∀x ∈ R, ∃y ∈ R, x > y.
(Remember that this is the true statement from Example 5.1.) Then the negation is
¬P : ∃x ∈ R, ∀y ∈ R, x ≤ y.
Example 5.15. Let S = (0, 1] = {x ∈ R : 0 < x ≤ 1}. Then S has a lower bound;
if we take x = −1, we see that every element y ∈ S satisfies y ≥ x. Note that we
could have taken x = 0 or x = −3 as well. We remark that although S has a lower
bound, it has no least element.
   We will investigate the ideas of greatest and least elements and upper and lower
bounds more in the exercises.
                               ¬(¬P )    −→    P
                            ¬(P ∨ Q)     −→    ¬P ∧ ¬Q
                            ¬(P ∧ Q)     −→    ¬P ∨ ¬Q
                           ¬(P ⇒ Q)      −→    P ∧ ¬Q
                      ¬(∀x ∈ S, P (x))   −→    ∃x ∈ S, ¬P (x)
                      ¬(∃x ∈ S, P (x))   −→    ∀x ∈ S, ¬P (x)
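
The four propositional rules in this table can be confirmed with a truth table. The sketch below is one way to do that; the function `implies` and the dictionary `RULES` are names introduced here for illustration. The two quantifier rules are not covered, since they depend on a domain.

```python
from itertools import product

def implies(p, q):
    """Truth value of p => q."""
    return (not p) or q

# Each entry pairs the left side of a rule with its right side.
RULES = {
    "not (not P)  ->  P": (lambda p, q: not (not p), lambda p, q: p),
    "not (P or Q)  ->  (not P) and (not Q)": (
        lambda p, q: not (p or q),
        lambda p, q: (not p) and (not q),
    ),
    "not (P and Q)  ->  (not P) or (not Q)": (
        lambda p, q: not (p and q),
        lambda p, q: (not p) or (not q),
    ),
    "not (P => Q)  ->  P and (not Q)": (
        lambda p, q: not implies(p, q),
        lambda p, q: p and (not q),
    ),
}

# A rule holds if both sides agree on all four rows of the truth table.
results = {
    name: all(lhs(p, q) == rhs(p, q) for p, q in product((True, False), repeat=2))
    for name, (lhs, rhs) in RULES.items()
}

for name, holds in results.items():
    print(name, ":", holds)
```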
5.E    Exercises
Exercise 5.1. Write the negation of the following statements and open sentences. In
each case, the domain of each variable x, ε, and δ is the set of real numbers. (Write
any quantifiers and logical connectives using English.)
  (a) x > 2 and x < 3.
 (b) If x > 3, then x > 2.
  (c) If x > 3 and x ≠ 4, then x^2 ≠ 16.
  (d) If 3 < x < 4, then 9 < x^2 < 16.
  (e) If x = 2 or x = 3, then x^2 − 5x + 6 = 0.
  (f) For all x ∈ R, it happens that x^2 + 2x > 0.
  (g) There exists an x ∈ R such that x^2 + 2x > 0.
  (h) For each x ∈ R, there exists y ∈ R such that y > x^2.
  (i) There exists an x ∈ R, such that for all y ∈ R, it holds that y > x^2.
  (j) For each ε > 0, there exists some δ > 0 such that for each x ∈ R, if 0 < |x − 2| < δ,
       then |x^2 − 4| < ε.
Exercise 5.2. For each pair of statements, decide if they have the same truth value.
 (a) ∀x ∈ R, ∃y ∈ R, x + y = 0 and ∃y ∈ R, ∀x ∈ R, x + y = 0.
 (b) ∀x ∈ R, ∃y ∈ R, xy = 0 and ∃y ∈ R, ∀x ∈ R, xy = 0.
 (c) ∀x ∈ R, ∃y ∈ R, xy ≠ 0 and ∃y ∈ R, ∀x ∈ R, xy ≠ 0.
 (d) ∀x ∈ R, ∃y ∈ R, y + x^2 > 0 and ∃y ∈ R, ∀x ∈ R, y + x^2 > 0.
Exercise 5.3. Answer the following two problems.
 (a) Give an example of a set of real numbers that has an upper bound, but does
     not have a greatest element.
 (b) Can there be a set that has a greatest element, but does not have an upper
     bound? Explain.
Exercise 5.4. Let S = {1, 1/2, 1/3, 1/4, . . .} = {1/n : n ∈ N}.
Exercise 5.5. Let S = (0, 1) be the open interval of real numbers between 0 and 1.
 (a) Does S have an upper bound? If so, give an upper bound.
 (b) Does S have a greatest element? If so, what is it?
 (c) Does S have a lower bound? If so, give a lower bound.
 (d) Does S have a least element? If so, what is it?
     The last two chapters were an introduction to the language of mathematics. Knowing
the definitions and concepts of set theory and logic allows us to communicate
thoughts more clearly and succinctly. In this chapter we will put our new knowledge
to use in proving that statements are true.
     A good proof is like a good painting. It opens the viewer’s mind to deeper insights,
connections, and beauties. The purpose of a proof is not only to convince the reader
that something is true, but to do so in a way that aids in their understanding of why
it is true.
52                                         CHAPTER III. BASIC PROOF TECHNIQUES
6       Direct proofs
6.A       Terminology
All mathematical arguments need a foundation on which to stand; we need truths
which are taken to be self-evident. These truths are called axioms. Some axioms
which are commonly used by mathematicians are the following:
    • The empty set exists.
    • For each real number, there is an integer greater than it.
    • Any two lines either intersect in a single point or are parallel.
In this book we will not worry too deeply about which axioms we will use, trusting
that readers will learn by example what sorts of statements they may use freely.
    Once the language of mathematics is in place we also have definitions. We gave
many examples in the previous two chapters of concepts we have defined, such as
unions, intersections, logical connectives, and so forth. In this section, the following
two definitions will be very important.
Example 6.3. The integer 3 is odd, since 3 = 2 · 1 + 1. The integer 16 is even, since
16 = 2 · 8.
     Warning 6.4. If n is an even integer, this does not mean that n = 2k for all
     k ∈ Z. It is impossible for a single number to equal all of the even integers at
     once.
∀x ∈ S, P (x) ⇒ Q(x)
Proof. The conclusion 0 < 1 is always true. So the implication is trivially true.
Proof. The conclusion is always true; the number 2a is even since it is 2 times an
integer. Hence the implication is trivially true.
    Notice that in both cases the premise of the implication was irrelevant. In the
first proposition it didn’t matter whether or not x^2 < 73, since 0 < 1 is always true.
Similarly, in the second proposition it didn’t matter whether or not a is odd, because
2a is always even.
    Also notice the little square box at the end of the proof. This tells the reader that
you have finished the proof.
     Warning 6.8. The word “trivial” should not be used in a proof to mean “this
     step is easy, so I will skip it.”
Sometimes “trivial” proofs are not easy and take some work to prove.
     Advice 6.10. It is always a good idea to understand the premise and conclusion
     of an implication separately before trying to prove the implication.
∀x ∈ S, P (x) ⇒ Q(x)
    We use the word “vacuous” when the premise of an implication is false everywhere
in the domain because we think of the implication as an empty promise in that case.
In other words, the statement is true because it isn’t asserting anything!
    The following are some examples of vacuously true statements and their proofs.
Proof. The premise is −x^2 > 2, which is equivalent to 0 > x^2 + 2. The right side is
the sum of a square and a positive number; such a sum can never be negative. Thus,
the implication is vacuously true.
  Advice 6.14. To remember the difference between trivial and vacuous proofs,
  memorize the following phrase:
    Trivial, the Q is true.
    Vacuous, the premise is bogus.
Example 6.15. Consider the following statements, and determine if they are trivially
true, vacuously true, or neither trivial nor vacuous.
  (1) Given a ∈ R, if a^2 is a negative real number, then a = 5.
  (2) Given a ∈ C, if a^2 is a negative real number, then a ∈ C − R.
  (3) Let x ∈ N. If x ∈ ∅, then x > 0.
  (4) Let a ∈ N. Either a is even or a is odd.
The answers are as follows:
  (1) The statement is vacuously true, since the premise is never true. It is not trivial,
      because the conclusion can be false for some a ∈ R (such as a = 6).
  (2) This implication is not vacuous, since the premise is true for some a ∈ C (such
      as a = i = √−1). It is also not trivial since the conclusion is false for some
      a ∈ C (such as a = 0).
  (3) This statement is both vacuous and trivial! The premise is never true, and the
      conclusion is always true (for x ∈ N).
  (4) This is not an implication, so it is not trivial or vacuous.
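
The two labels can be explored on a finite sample of the domain. The sketch below defines a hypothetical helper `classify` that reports whether the premise is false at every sampled point and whether the conclusion is true at every sampled point. A finite sample can only suggest the answer: a premise that holds somewhere outside the sample would be missed, so this is a check of intuition and not a proof. The sample ranges are assumptions chosen for illustration.

```python
def classify(sample, premise, conclusion):
    """Return (looks_vacuous, looks_trivial) for 'premise(x) => conclusion(x)'.

    looks_vacuous: the premise is false for every x in the sample.
    looks_trivial: the conclusion is true for every x in the sample.
    """
    points = list(sample)
    looks_vacuous = not any(premise(x) for x in points)
    looks_trivial = all(conclusion(x) for x in points)
    return looks_vacuous, looks_trivial

# Statement (1): if a^2 is negative, then a = 5, sampled on a few integers in R.
statement_1 = classify(range(-10, 11), lambda a: a * a < 0, lambda a: a == 5)

# Statement (3): if x is in the empty set, then x > 0, sampled on part of N.
statement_3 = classify(range(1, 21), lambda x: x in set(), lambda x: x > 0)

print("statement (1):", statement_1)
print("statement (3):", statement_3)
```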
Proof. Let x ∈ Z be arbitrary. We will work directly. Assume x is even. This means
x = 2k for some k ∈ Z. Thus
   In the first sentence, we deal with the ∀x ∈ Z, by telling the reader we are letting
x be an arbitrary integer. The next sentence tells the reader what type of proof
technique we will use. In this case it is a direct proof. (We will have more options
available over the next few sections.) After telling the reader we are working directly,
we must assume the premise of the implication. So the third sentence does exactly
that; we assume x is even. We then do some work, and we finish by showing that the
conclusion holds.
   The basic outline of a direct proof is as follows.
Result to be proved. Given x ∈ S, if P (x) is true, then Q(x) is true.
Proof. Let x ∈ S.
We work directly.
Assume P (x).
Do some work (to be filled in).
Thus, Q(x) holds.
     Advice 6.17. When first writing proofs it can be useful to leave some space in
     the middle and write the last sentence on the bottom. That way, you can see
     where you need to end up.
6.E     Exercises
Exercise 6.1. Let x ∈ R. Prove that if x ≠ 3, then x^2 − 2x + 3 ≠ 0. (Would this
result be true if we took x ∈ C?)
Exercise 6.5. Let a, b, c ∈ Z. Prove that if a and c are odd, then ab + bc is even.
Exercise 6.6. Let n ∈ Z. Prove that if |n| < 1, then 3n − 2 is an even integer.
Exercise 6.7. Prove that every odd integer is a difference of two squares of integers.
(Hint: Try small cases; write 1, 3, 5, and 7 as differences of squares. It might help to
rephrase this statement as an implication, with a premise and conclusion.)
7      Contrapositive proof
While the technique of direct proof is a powerful tool, this section will introduce
another method which is very similar in spirit. This new method is called “contra-
positive proof.” You may have experienced it in your own life. For instance, consider
the following story:
   Alice is at work on Friday and tells her coworker Bob: “If it rains on Monday,
then I’m not coming in to work.” Bob has an important deadline on Monday, and so
works through the entire weekend. On Monday Bob sees Alice come into the office.
He concludes it must not be raining.
   Bob’s logic is sound and will be explained (and exploited) in this section.
R : P ⇒ Q.
P ⇒ Q ≡ ¬Q ⇒ ¬P.
Example 7.2. We will find the contrapositive of the statement: “If it rains on
Monday, then I’m not coming in to work on Monday.” The contrapositive (after an
easy application of double negation) is exactly: “If I come in to work on Monday,
then it is not raining on Monday.” In the story at the beginning of this section, Bob
used the contrapositive of Alice’s sentence to conclude it was not raining.
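
The equivalence P ⇒ Q ≡ ¬Q ⇒ ¬P can be read off a truth table. The sketch below builds the four rows and places the implication next to its contrapositive; the names `implies` and `rows` are introduced here for illustration.

```python
from itertools import product

def implies(p, q):
    """Truth value of p => q."""
    return (not p) or q

# Each row: (P, Q, P => Q, (not Q) => (not P)).
rows = [
    (p, q, implies(p, q), implies(not q, not p))
    for p, q in product((True, False), repeat=2)
]

print("P      Q      P=>Q   notQ=>notP")
for p, q, direct, contrapositive in rows:
    print(f"{p!s:<6} {q!s:<6} {direct!s:<6} {contrapositive!s:<6}")
```

In Bob's story, P is "it rains on Monday" and Q is "Alice does not come in to work"; the last two columns agree in every row, which is why his conclusion is sound.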
    We can also take the contrapositive of an implication between two open sentences,
as in the following example.
3x − 7 = 3(2k) − 7 = 6k − 7 = 2(3k − 4) + 1
is odd.
x = (3x − 7) − 2x + 7 = 2k − 2x + 7 = 2(k − x + 3) + 1
is odd.
    For the next proposition, try to decide for yourself whether you should work
directly or contrapositively.
    Do you want to work directly? If you do, you will assume x^2 − 6x + 7 is odd, and
try to show that x is even.
    Or do you want to work contrapositively? If you do, you will assume x is odd,
and try to show that x^2 − 6x + 7 is even.
is even.
Working directly probably will not work out well for the previous proposition.
7.B        Division
We have done many proofs using even and odd integers. We will now introduce a
new definition which we can use to prove more statements.
Example 7.7. (1) Does 3 divide 6? Yes, dividing 6 by 3 yields the integer 2. So
6 = 3 · 2.
   (2) Does 5 divide 9? No, dividing 9 by 5 yields a remainder of 4.
∃c ∈ Z, b = ac.
     Warning 7.9. The symbols a | b represent a sentence that means a divides into
     b. It means that the fraction b/a is an integer. It does not mean a ÷ b (which is
     just the number a/b, not a sentence).
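
The distinction between the sentence a | b and the number a/b shows up clearly in code: one is a function returning True or False, the other is a numerical value. The sketch below is one possible rendering of the condition ∃c ∈ Z, b = ac. The function name `divides` and the treatment of a = 0 (where b = 0 · c forces b = 0) are choices made here for illustration.

```python
def divides(a, b):
    """Return True when a | b, that is, when b = a * c for some integer c."""
    if a == 0:
        # b = 0 * c holds only when b is 0.
        return b == 0
    # For nonzero a, an integer c with b = a * c exists exactly when the
    # remainder of b on division by a is zero.
    return b % a == 0

print(divides(3, 6))   # a sentence about 3 and 6: True or False
print(divides(5, 9))
print(3 / 6)           # a number, not a sentence
```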
There are other interesting open sentences related to R. Consider the following:
    • The converse of R(x) is: Q(x) ⇒ P (x).
    • The inverse of R(x) is: ¬P (x) ⇒ ¬Q(x).
    • The contrapositive of R(x) is: ¬Q(x) ⇒ ¬P (x).
    • The negation of R(x) is: ¬R(x) ≡ ¬(P (x) ⇒ Q(x)) ≡ P (x) ∧ ¬Q(x).
    The contrapositive has the same truth table as R(x) (treating P (x) and Q(x) as
components of a compound sentence). However, the converse has a different truth ta-
ble. (Try it!) The negation has another, third, truth table. (Try it!) So the converse,
the contrapositive, and the negation are (in general) all very different sentences.
    What about the inverse? It does have the same truth table as one of the other
three sentences! But which one? (Try it!)
7.D       Biconditional
Over the past few sections, we have been focusing on proving (universally quanti-
fied) implications P (x) ⇒ Q(x). Another common sentence to prove is the bicon-
ditional P (x) ⇔ Q(x). To prove it, we show both P (x) ⇒ Q(x) and the converse
Q(x) ⇒ P (x). Each direction can be proved directly or contrapositively! Consider
the following example.
n^2 = (2k)^2 = 4k^2 = 2(2k^2)
is an even integer.
    We now prove conversely that if n is odd, then n^2 is odd. We work directly.
Assume n is odd. Then n = 2k + 1 for some k ∈ Z. Hence
                  n^2 = (2k + 1)^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1
is an odd integer.
   To help the reader, sometimes a proof writer will use arrows to tell the reader
which of the two directions is being proved. For instance, the previous proof could
be rewritten as follows.
7.E     Exercises
Exercise 7.1. Let a ∈ Z. Prove that if a^2 + 3 is odd, then a is even.
Exercise 7.2. Prove the following: Let x, y ∈ Z. If xy + y^2 is even, then x is odd or
y is even.
Exercise 7.3. Let s ∈ Z. Prove that s is odd if and only if s^3 is odd.
Exercise 7.4. Consider the following situation. A student is asked to prove the
statement: “Given x ∈ Z, if 2 | x, then x is even.” The student writes: “Assume,
contrapositively, that x is even. Then x = 2k for some k ∈ Z. Hence 2 | x.”
    Identify what is wrong with this student’s proof and write a correct proof.
Exercise 7.5. Let a, b, c, d ∈ Z. Prove that if a | c and b | d, then ab | cd.
Exercise 7.6. State the contrapositive of the implication in the previous exercise.
Exercise 7.7. Let a ∈ Z. Prove that if 4 ∤ a^2, then a is odd.
Exercise 7.8. Prove the following implication two ways (directly and contraposi-
tively): Given x ∈ Z, then 5x − 1 is even only if x is odd. (Be careful to prove the
correct implication. See Subsection 4.D for the meaning of “only if”.)
8       Proof by cases
8.A       Introductory examples
Some problems break down into natural cases. Here are some common examples:
   • Integers are either even or odd.
   • Real numbers (or integers) can be positive, negative, or zero.
   • Numbers can be zero or nonzero.
   • Real numbers can be rational or irrational.
   • Sets can be empty or nonempty.
   • Sets can be finite or infinite.
   Some problems can be handled by considering all possible cases separately. The
proof of the following proposition shows how this is to be done.
                 x^2 + x = (2k)^2 + 2k = 4k^2 + 2k = 2(2k^2 + k)
is even.
    Case 2: Suppose x is odd. Then x = 2k + 1 for some k ∈ Z. Then
        x^2 + x = (2k + 1)^2 + (2k + 1) = 4k^2 + 6k + 2 = 2(2k^2 + 3k + 1)
is even.
    In every case x^2 + x is even.
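
The case split in this proof translates directly into a branch in code. The sketch below follows the two cases to produce an integer m with x^2 + x = 2m, using m = 2k^2 + k when x = 2k and m = 2k^2 + 3k + 1 when x = 2k + 1 (the second expression comes from expanding (2k + 1)^2 + (2k + 1)). The function name `half_of_x_squared_plus_x` and the tested range are illustrative; checking finitely many values supports the proof but does not replace it.

```python
def half_of_x_squared_plus_x(x):
    """Return m with x*x + x == 2*m, computed case by case as in the proof."""
    if x % 2 == 0:
        # Case 1: x = 2k, so x^2 + x = 2(2k^2 + k).
        k = x // 2
        return 2 * k * k + k
    # Case 2: x = 2k + 1, so x^2 + x = 2(2k^2 + 3k + 1).
    k = (x - 1) // 2
    return 2 * k * k + 3 * k + 1

# Compare the case-by-case witness with the original expression.
checks = [x * x + x == 2 * half_of_x_squared_plus_x(x) for x in range(-50, 51)]
print(all(checks))
```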
   The key to working by cases is that we can break a problem into smaller “sub-
problems” that each can be handled separately. Sometimes it takes practice to rec-
ognize how to break a problem into smaller cases. On the other hand, sometimes a
problem shouts “Do me by cases!” To demonstrate, let’s introduce another definition.
     Definition 8.2. Let x, y ∈ Z. We say that x and y have the same parity if
     either they are both even or they are both odd. If this doesn’t happen, we say
     that x and y have the opposite parity.
Proof. Let x, y ∈ Z. We work contrapositively. Assume x and y have the same parity.
There are two cases.
   Case 1: Suppose x, y are both even. Then x = 2k and y = 2ℓ for some k, ℓ ∈ Z.
Then
                            x + y = 2k + 2ℓ = 2(k + ℓ)
is even.
    Case 2: Suppose x, y are both odd. Then x = 2k + 1 and y = 2ℓ + 1 for some
k, ℓ ∈ Z. Then
                     x + y = 2k + 1 + 2ℓ + 1 = 2(k + ℓ + 1)
is even.
    In every case x + y is even.
    Notice that we didn’t say that Case 2 is similar to Case 1. That’s because they
really are different!
    In some situations, the cases we should consider come from one of our assumptions.
    Why were those two cases enough to cover all possibilities? Aren’t we missing the
case where x is odd and y is odd? We are ignoring the case where x is odd and y
is odd! But we can ignore that case because we assumed the fact that x is even or
y is even (when working directly). Our assumption limited the number of cases we
needed to consider. If we hadn’t made that assumption, we would need to consider
that last possibility.
     Advice 8.6. If you assume P ∨ Q, then you have two cases: case 1 is when
     P holds, and case 2 is when Q holds. You could also break this up into three
     separate cases: case 1 is when P and Q both hold, case 2 is when P holds but
     Q fails, and case 3 is when P fails but Q holds.
     Warning 8.7. If you are trying to show P ∨ Q, then you do not just consider
     the two cases P or Q. You do not yet know those are the only two options!
        However, you do know that P or ¬P happens. So, perhaps your two cases
     could be as follows. Case 1 is when P holds, and you are done. Case 2 is when
     P fails, and you try to show that Q holds.
Question: In the backwards direction, (⇐), we had three cases. Why didn’t we have
those three cases in the forward direction?
Answer: In the backwards direction we made the assumption x ∈ {−1, 0, 1}, which
limited the possibilities for x to three cases. In the forward direction we didn’t have
such an assumption. So we had to consider every possibility.
   The following theorem is very useful and also demonstrates these same ideas again.
8.B     Congruence
If the clock on the wall says 9:00, and 37 hours pass, what time is it then? It isn’t
too difficult to figure out that the new time is 10:00. The way we figure this out is to
notice that each 12 hour block keeps the clock fixed, and so 37 hours looks the same
as 1 hour.
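
The clock computation is a remainder calculation. The sketch below assumes a 12-hour clock face labeled 1 through 12 (so a remainder of 0 is displayed as 12); the function name `clock_after` is introduced here for illustration.

```python
def clock_after(start_hour, elapsed_hours):
    """Hour shown on a 12-hour clock after elapsed_hours have passed."""
    hour = (start_hour + elapsed_hours) % 12
    # A clock face shows 12 where the remainder is 0.
    return 12 if hour == 0 else hour

print(clock_after(9, 37))   # 37 hours after 9:00
print(37 % 12)              # 37 hours moves the hands the same as this many hours
print(clock_after(9, 1))    # so the result matches 1 hour after 9:00
```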
    We do something similar when working with even and odd numbers. We know
that an even number plus an odd number will always be odd. The reason is that
adding any multiple of 2 doesn’t change whether a number is odd or even.
    These two situations are special cases of a much more general, and powerful,
technique. In the first situation, we are looking at numbers and treating multiples of
n = 12 as trivial. In the second case, we are treating multiples of n = 2 as trivial
blocks. The following definition does this for an arbitrary integer n.
a ≡ b (mod n)
    In the following example we work out some instances where this definition holds
or does not hold.
   In many cases, we can work with congruences almost as if they were equations.
We will later prove several theorems of this sort; the following proposition is an
example.
    This proposition says that when numbers are congruent, we can multiply by any
integer and they stay congruent. For instance, we have 5 ≡ −9 (mod 7). Multiply
by 5 to get 25 ≡ −45 (mod 7).
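
The numerical instance above can be checked with a one-line test for congruence. The sketch below assumes the usual characterization that a ≡ b (mod n) holds exactly when n divides a − b; the function name `congruent` and the range of multipliers are choices made for illustration.

```python
def congruent(a, b, n):
    """Return True when a is congruent to b modulo n, i.e. n divides a - b."""
    return (a - b) % n == 0

# The instance from the text: 5 and -9 are congruent modulo 7.
before = congruent(5, -9, 7)

# Multiplying both sides by 5 keeps them congruent.
after = congruent(5 * 5, 5 * (-9), 7)

# The same holds for every multiplier in a sample range.
all_multipliers = all(congruent(c * 5, c * (-9), 7) for c in range(-20, 21))

print(before, after, all_multipliers)
```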
Example 8.14. Another way to think about congruence is that two numbers are
congruent modulo n if they have the same remainder when we divide by n. When we
divide by 2, there are only two remainders, so every x ∈ Z is either odd or even. In
other words,
                    x ≡ 0 (mod 2)       or      x ≡ 1 (mod 2).
What happens if we work modulo 3? Now there are three remainders, and we get
          x ≡ 0 (mod 3),     x ≡ 1 (mod 3),     or     x ≡ 2 (mod 3).
In other words, every integer x is of exactly one of the forms 3k, 3k + 1, or 3k + 2 for
some k ∈ Z. (We will prove this later, but you can use it freely for now.)
   The previous example tells us that sometimes we can reduce questions about
division into cases according to remainders!
x^3 − x = (3k + 2)^3 − (3k + 2) = 27k^3 + 54k^2 + 36k + 8 − 3k − 2 = 3(9k^3 + 18k^2 + 11k + 2).
Example 8.16. The previous proposition asserts that 3 | (1037^3 − 1037). Check
it!
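
One way to carry out this check is with exact integer arithmetic. The sketch below computes 1037^3 − 1037, finds which of the forms 3k, 3k + 1, 3k + 2 the number 1037 takes, and compares the value with the factored expression 3(9k^3 + 18k^2 + 11k + 2), which applies when the remainder is 2. The variable names are illustrative.

```python
x = 1037
value = x**3 - x

# Write x = 3k + r with remainder r in {0, 1, 2}.
k, r = divmod(x, 3)

# Factored form of x^3 - x when x = 3k + 2.
factored = 3 * (9 * k**3 + 18 * k**2 + 11 * k + 2)

print("remainder of x modulo 3:", r)
print("3 divides x^3 - x:", value % 3 == 0)
print("factored form matches:", factored == value)
```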
  Warning 8.19. Many students have difficulty with the idea that |x| = −x,
  which is true anytime that x < 0. Part of this difficulty is that they think that
  an expression beginning with a negative sign (such as −x) must be negative.
  However, if x is negative, then −x is positive.
     Another instance where this problem can come up is if we compute |−x|. We
  note that this is not necessarily equal to x. In particular, if x is negative, |−x|
  never equals x.
   We will now prove some statements involving absolute values. Several of these are
important enough to be called Theorems (and one is even important enough to have
a name).
|x − a| ≤ b,
then a − b ≤ x ≤ a + b.
a − b ≤ x ≤ a + b.
a − b ≤ x ≤ a + b.
Proof. We divide the proof into four cases. Without loss of generality, we may assume
that x ≥ y, so that if only one of x, y is nonnegative, it is x.
Case 1: Suppose that x ≥ 0 and y ≥ 0. Then x + y ≥ 0, so
|x + y| = x + y = |x| + |y|.
8.D     Exercises
Exercise 8.1. Let x, y ∈ Z. Prove that if x and y have the same parity, then x2 + xy
is even.
Exercise 8.2. Let a, b, c ∈ Z. Prove that if a ∤ bc, then a ∤ b and a ∤ c. (The converse
is not true. Can you see why?)
Exercise 8.5. Prove, for any n ∈ Z, that 3 | n if and only if 3 | n^2. (Hint: Use the
idea in Example 8.14 to divide the proof into cases.)
Exercise 8.8. Prove Theorem 8.22; for any x, y ∈ R, we have |xy| = |x||y|.
9       Proof by contradiction
9.A       Basic technique and examples
In this section we explore a proof technique that can be applied not only to implica-
tions but to other statements as well; the technique is called “proof by contradiction.”
It is based upon the following simple idea:
Proof. If the conclusion of an implication is false, the only way for the implication to
be true is if the premise is also false. Hence ¬R is false. But this means R is true.
(Alternatively, draw a truth table.)
    Another way to think about this theorem is that if by assuming ¬R we can reach
some false statement, then R must have been true after all. (If an assumption leads
to nonsense, that assumption must have been false.) We will see this proof technique
in action, by proving the following proposition.
  Advice 9.3. An easy way to remember what to assume when proving an im-
  plication by contradiction is that you have the same assumptions as in both a
  direct proof and a contrapositive proof.
    Contradiction proofs can also involve cases. We just need to check that every case
ends in a contradiction, in order to show that that case could not have happened after
all. The following result demonstrates this idea.
  Proposition 9.6. If x ∈ Z is even, then x is not the sum of three integers with
  an odd number of them being odd.
Proof. Assume, by way of contradiction, that there is some even integer x that is the
sum of three integers a, b, c ∈ Z, an odd number of them being odd. There are two
cases.
x = a + b + c = 2k + 1 + 2ℓ + 1 + 2m + 1 = 2(k + ℓ + m + 1) + 1
x = a + b + c = 2k + 2ℓ + 2m + 1 = 2(k + ℓ + m) + 1
   Do we know that there exist any irrational numbers? Yes, and this fact was proved
thousands of years ago by the Pythagoreans. (Legend has it that the mathematician
who originally proved this fact was either killed or exiled for the proof, since it ran
counter to the doctrine of the times!) Here is the proof essentially unchanged from
that time.
     Theorem 9.8. The real number √2 is irrational.
Proof. Assume, by way of contradiction, that √2 ∈ Q. We can then write √2 =
a/b for some a, b ∈ N, with a/b in lowest terms. By squaring and then clearing
denominators, we have a^2 = 2b^2. Thus 2 | a^2, and hence 2 | a. (This follows from
Proposition 7.16 proved earlier.) Write a = 2x for some x ∈ Z.
    Plugging a = 2x into the equality a^2 = 2b^2 yields 4x^2 = 2b^2, or in other words
b^2 = 2x^2. Thus 2 | b^2, and hence 2 | b. However, now a and b are both even, which
contradicts the fact that a/b was assumed to be in lowest terms. Hence √2 is irrational.
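
The key equation in this proof, a^2 = 2b^2, can be searched for numerically. The sketch below looks for natural numbers a and b satisfying it, with b up to a chosen bound; the function name `find_sqrt2_fraction` and the bound are assumptions made for illustration. A search that finds nothing is consistent with the theorem but proves nothing on its own, since only finitely many denominators are tried.

```python
from math import isqrt

def find_sqrt2_fraction(max_denominator):
    """Search for natural numbers (a, b) with a*a == 2*b*b and b <= max_denominator.

    Returns the first pair found, or None if there is no such pair in range.
    """
    for b in range(1, max_denominator + 1):
        target = 2 * b * b
        a = isqrt(target)  # largest integer whose square is at most target
        if a * a == target:
            return (a, b)
    return None

result = find_sqrt2_fraction(10_000)
print(result)
```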
9.D       Advice
We end this section with two pieces of advice.
    First, sometimes one can tell that a result could be proved by contradiction be-
cause the statement R itself has some negative sounding words. For instance, in this
section we proved the following statements R:
    • There is no smallest positive rational number.
                     √
    • The number 2 is not rational.
    • If x is even, then 2 does not divide x^2 + 1.
It is usually easier to work with positive conditions, rather than negative conditions,
which is why proofs by contradiction work so well in these cases. The negative
conditions are turned positive after negating R.
    Second, if a proof can be done without contradiction, then that is usually a better
option, because you never enter an “imaginary” world where you assume something
you are hoping to show is false.
9.E    Exercises
Exercise 9.1. Let R and S be statements. Draw a truth table with columns labeled
R, S, ¬R, and (¬R) ⇒ S. Verify that the only row where S is false and (¬R) ⇒ S
is true occurs when R is true.
Exercise 9.2. Prove the following statement directly, contrapositively, and by con-
tradiction: Given x ∈ Z, if 3x + 1 is even, then 5x + 2 is odd.
Exercise 9.7. Prove: If we are given a nonzero rational number x and an irrational
number y, then the number xy is irrational. (Hint: Your proof should, somewhere,
use the fact that x ≠ 0, because when x = 0 the conclusion is false.)
Exercise 9.8. Prove there is no smallest positive irrational number. (Hint: Use the
result of the previous exercise.)
Proof. First, we see that since 28 and 6 are integers, we have (28, 6) ∈ Z × Z.
   Second, we check directly that 28 − 6 = 22 is divisible by 11. Hence 28 ≡ 6
(mod 11).
   Therefore (28, 6) satisfies all of the conditions to belong to this set.
∀x, x ∈ A ⇒ x ∈ B.
Most often we prove this implication using a direct proof. The steps are simple.
 (1) Assume x ∈ A.
 (2) Using that information, show x ∈ B.
We demonstrate how this is to be done with a few examples.
   In the next example, we make use of the tautology P ⇒ P ∨ Q. (If P is true, then
P is true or Q is true.)
   Case 2: Assume b ∉ X. Then, from our work above we know that b ∈ Y and
b ∈ Z. Hence b ∈ Y ∩Z. By tautology, b ∈ X or b ∈ Y ∩Z. Therefore b ∈ X ∪(Y ∩Z),
by definition of union.
   In every case we proved b ∈ X ∪ (Y ∩ Z), so we have proved the needed inclusion.
  Theorem 10.10. Let A, B, and C be sets. Assume that they all are subsets of
  some universal set U . The following properties hold:
     • Commutative laws.
            A ∩ B = B ∩ A.
            A ∪ B = B ∪ A.
     • Associative laws.
            (A ∩ B) ∩ C = A ∩ (B ∩ C).
            (A ∪ B) ∪ C = A ∪ (B ∪ C).
     • Distributive laws.
            A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
            A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
     • Identity laws.
            A ∪ ∅ = A.
            A ∩ U = A.
     • Complement laws.
            A ∪ \overline{A} = U.
            A ∩ \overline{A} = ∅.
    It is a deep fact that every other set equality involving only unions, intersections,
and complements, can be derived (algebraically) from these laws. However, there are
a few other common properties that can be proved quite easily just using the methods
of this section. (In fact, we proved part of one of them earlier!)
  Theorem 10.11. Let A and B be sets. Assume that they all are subsets of some
  universal set U . The following properties hold:
     • De Morgan’s laws.
             \overline{A ∪ B} = \overline{A} ∩ \overline{B}.
             \overline{A ∩ B} = \overline{A} ∪ \overline{B}.
     • Double negation.
             \overline{\overline{A}} = A.
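    These laws can be spot-checked on a small example. The sketch below assumes a universal set U with four elements (an arbitrary choice for illustration; the helper names are ours) and tests De Morgan's laws and double negation for every pair of subsets. Such a check does not replace the proof, but it is a useful sanity test.

```python
from itertools import combinations

U = frozenset(range(4))  # a small universal set, chosen only for illustration

def subsets(universe):
    """Yield every subset of `universe` as a frozenset."""
    items = sorted(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def complement(X):
    """Complement of X relative to the universal set U."""
    return U - X

def de_morgan_holds(A, B):
    """Check both of De Morgan's laws for the sets A and B."""
    first = complement(A | B) == complement(A) & complement(B)
    second = complement(A & B) == complement(A) | complement(B)
    return first and second

all_pairs_ok = all(de_morgan_holds(A, B) for A in subsets(U) for B in subsets(U))
double_negation_ok = all(complement(complement(A)) == A for A in subsets(U))
print(all_pairs_ok, double_negation_ok)
```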
10.E      Exercises
Exercise 10.1. For each element and set listed below, explain why the element does
or does not belong to the set.
  (a) Is 3 ∈ {1, 2, 3, 4, 5, 6, 7}?
 (b) Is π ∈ {1, 2, 3, 4, 5, 6, 7}?
  (c) Is π ∈ R?
 (d) Is 2/3 ∈ {x ∈ R : x < 1}?
  (e) Is 2/3 ∈ {x ∈ Z : x < 1}?
Exercise 10.3. Let X be the set of integers which are congruent to −1 modulo 6
and let Y be the set of integers which are congruent to 2 modulo 3. Prove X ⊆ Y .
Exercise 10.6. Given a set X, show that X ∪ ∅ = X. (Hint: If you have a case
where x ∈ ∅, then you know that case doesn’t actually happen.)
{x ∈ Z : n | x} = {x ∈ Z : x ≡ 0 (mod n)}.
A − (B ∩ C) ⊆ (A − B) ∪ (A − C).
  Warning 11.3. Some mathematicians use the word “let” when handling ex-
  istential statements. For instance, they might have started the previous proof
  with the sentence “Let n = 0.”
In some cases a statement asks for the existence of more than one element.
Proposition 11.5. Every odd integer is the sum of two consecutive integers.
     Proposition 11.6. One of the digits of √2 = 1.414213562 . . . occurs infinitely
     many times.
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
    Notice that in this example we did not actually find out which digit occurs in-
finitely often, which is why we say the proof is nonconstructive. We only proved that
at least one of the digits does show up over and over. It is not much more difficult to
show that in fact two digits must occur infinitely many times. (Sketch: If, by way of
contradiction, we assume that only one digit showed up infinitely many times, then
the decimal expansion for √2 would eventually just repeat that digit. This would
show that √2 is rational, which we previously proved it is not.) Quite surprisingly, it
is an open problem in mathematics whether three digits must occur infinitely often
in the decimal expansion of √2!
    Here is another example, where we almost construct an example.
   Case 2: Assume √2^√2 is irrational. In this case we fix a = √2^√2 and b = √2,
and calculate
                         a^b = (√2^√2)^√2 = (√2)^2 = 2,
which is rational.
   In this proof, we do not know (or care) which case is true. We just show that we
can solve the problem in either case. (If you do care, it is known that Case 2 is the
true case, but this is not easy to prove.) There does exist a constructive proof of the
previous proposition (take a = √2 and b = log_2(9) and prove b is irrational).
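    For the constructive choice just mentioned, a^b should equal 3, since √2^(log_2 9) = 2^((1/2) log_2 9) = 9^(1/2) = 3. A quick floating-point check (a sketch only; rounding error means this is evidence, not proof) agrees:

```python
import math

a = math.sqrt(2)
b = math.log2(9)
value = a ** b

# Compare with 3 up to a small relative tolerance, since floats are inexact.
print(value, math.isclose(value, 3.0, rel_tol=1e-12))
```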
   We finish with one more example of how to prove existence (in a nonconstructive
way). In the proof below we use some standard theorems from calculus which we will
not prove in this textbook.
Proof. The function given by the polynomial equation f(x) = x^5 + 2x − 1 is continuous
everywhere. We find that f(0) = −1 and f(1) = 2. By the Intermediate Value
Theorem, we know that f must take the intermediate value 0 for some input c ∈ (0, 1).
This number c is a solution.
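    The proof above is nonconstructive, but the same sign change can be used to approximate c numerically. The following sketch uses the bisection method (our choice of technique; the function names and tolerance are assumptions made for illustration), repeatedly halving an interval on which f changes sign.

```python
def f(x):
    return x**5 + 2*x - 1

def bisect_root(lo, hi, tol=1e-12):
    """Approximate a root of f in (lo, hi), assuming f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid   # the sign change is in the right half
        else:
            hi = mid   # the sign change is in the left half
    return (lo + hi) / 2

c = bisect_root(0.0, 1.0)
print(c, f(c))
```

The printed value of f(c) should be very close to 0, and c lies strictly between 0 and 1, as the Intermediate Value Theorem guarantees.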
11.C     Uniqueness
Some problems ask for more than mere existence; they want uniqueness as well. This
means that you are asked to prove two things: first that there is an element satisfying
the given condition, and second that there are no other solutions. For example, we
can improve the previous proposition to the following:
Proof. (Existence): We already proved, above, that the equation has at least one real
solution.
    (Uniqueness): We now show that the equation can have at most one solution.
Letting f(x) = x^5 + 2x − 1 we compute the derivative f′(x) = 5x^4 + 2 > 0. Thus,
by the first derivative test, the function f is strictly increasing. Hence, it can equal
0 only once.
Proof. Let x ∈ Z.
   (Existence): Fixing y = x + 1 ∈ Z works.
   (Uniqueness): Since (x + 2) − x = 2, these integers are exactly distance 2 apart.
Thus, there is at most one integer between them.
    One technique for showing uniqueness is to assume there are two solutions (not
necessarily distinct), and then show that those two solutions are in fact equal. We
will demonstrate this technique in the proof of the following proposition.
     Proposition 11.11. Every odd integer is uniquely the difference of two consec-
     utive squares.
(k + 1)^2 − k^2 = k^2 + 2k + 1 − k^2 = 2k + 1 = n
(x + 1)^2 − x^2 = (y + 1)^2 − y^2.
x^2 + 2x + 1 − x^2 = y^2 + 2y + 1 − y^2.
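    Both halves of Proposition 11.11 can be spot-checked by a finite search. In the sketch below (the function name and the search window are hypothetical choices), we list every integer k in a window with (k + 1)^2 − k^2 = n; for odd n we expect exactly one such k, namely k = (n − 1)/2.

```python
def consecutive_square_witnesses(n, window=range(-50, 51)):
    """Return all k in `window` with (k + 1)^2 - k^2 == n."""
    return [k for k in window if (k + 1)**2 - k**2 == n]

# Every odd n from -99 to 99 has exactly one witness, k = (n - 1) / 2.
unique_everywhere = all(
    consecutive_square_witnesses(n) == [(n - 1) // 2]
    for n in range(-99, 100, 2)
)
print(unique_everywhere)
```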
∃x ∈ R, x^2 < −1
∃a, b ∈ Z, a | b ∧ b | a ∧ a ≠ b.
  Advice 11.13. When asked to either prove or disprove a statement, clearly tell
  the reader which of the two you have decided to try.
11.E      Exercises
Exercise 11.1. Prove the following:
 (a) There exist a, b ∈ Q such that ab ∈ Q.
 (b) There exist a, b ∈ Q such that ab ∈ R − Q.
 (c) There exist a, b ∈ R − Q such that ab ∈ R − Q.
 (d) There exist a ∈ Q and b ∈ R − Q such that ab ∈ Q.
 (e) There exist a ∈ Q and b ∈ R − Q such that ab ∈ R − Q.
Exercise 11.4. Prove or disprove: There exists an integer x such that x^2 + x is odd.
Exercise 11.5. Prove or disprove: Given any positive rational number a, there is an
irrational number x ∈ (0, a).
Exercise 11.6. Prove that for any two real numbers x < y, there exists a rational
number in the interval (x, y). In this proof you may freely use the fact that if two
real numbers are more than 1 apart, then an integer lies between them.
    Idea 1: To help motivate the general proof, first consider the specific case when
y = 0.60100 . . . and x = 0.59922 . . .. They are not more than 1 apart, so you cannot
find an integer between them. However, if you multiply them both by 10^3 you get
two numbers which are more than one apart. The integer 600 lies between 10^3 y and 10^3 x, and so
the rational number 10^(−3) · 600 belongs to the interval (x, y).
    Idea 2: In the previous example, how did we know that we needed to multiply by
10^3? The decimal expansion of y − x is 0.001888 . . . > 0.00100 . . . = 10^(−3).
    Idea 3: Write the decimal expansion of y − x as d_k d_(k−1) . . . d_1 d_0 . d_(−1) d_(−2) d_(−3) . . ..
Since y − x > 0, at least one of the decimal digits is nonzero; call it d_ℓ. Prove that
y − x > 10^(ℓ−1), and so 10^(−ℓ+1) y is more than 1 away from 10^(−ℓ+1) x.
(A ⇒ B) ⇒ (C ⇒ D).
     Working directly (twice) you have the assumptions A ⇒ B and C. Your goal is
     to prove D. Note that you do not know yet that either A or B is true!
(12.5) B ≠ A ∪ B ⇒ A ≠ B.
∃x, x ∈ A − B or x ∈ B − A.
So, instead we should try to prove (12.5) contrapositively. Here is a proof outline.
    We stated above that it is usually difficult to work with unequal sets. However,
there is one situation where this piece of advice fails. Which of the following two
statements is easier to work with?
    • A = ∅.
    • A ≠ ∅.
The second statement is easier, because it tells us that A actually has an element.
To illustrate this fact, we will prove the following theorem.
12.D       Exercises
Exercise 12.1. For each natural number n, define the set S_n = {x ∈ Z : x^2 = n}.
Prove the following statement: If n = 4, then S_n = {2, −2}.
Exercise 12.2. Give a complete proof for Proposition 12.4, using the sketched out-
line.
Outline a proof of the statement. (Give as much detail as in the outline after Propo-
sition 12.4. You do not need to prove the statement.)
Proof by Induction
P(n) : n < 2^n
is true for each positive integer n, one might first check that it is true when n = 1.
In fact, it is easy to check it for many different values of n.
    Suppose we could prove that whenever P (k) is true for some positive integer k,
then P (k + 1) is true. We could use this to finish the problem as follows:
    Since P (1) is true, P (2) must be true; since P (2) is true, P (3) must be true; since
P (3) is true, P (4) must be true; and so on, forever.
    Induction is a technique for making clear what the phrase “and so on, forever”
means in the previous sentence. Anytime we find ourselves wanting to repeat a process
infinitely often in a proof, it is a sign that we should think about using induction.
13 Mathematical induction
   A proof by mathematical induction proceeds by verifying that (i) and (ii) are
true, and then concluding that P (n) is true for all n ∈ N. We call the verification
that (i) is true the base case of the induction and the proof of (ii) the inductive step.
Typically, the inductive step will involve a direct proof; in other words, we will let
k ∈ N, assume that P (k) is true, and then prove that P (k + 1) follows. If we are
using a direct proof we call P (k) the inductive hypothesis.
   A proof by induction thus has the following four steps.
  Identify P (n): Clearly identify the open sentence P (n). If P (n) is obvious, then
                  this identification need not be a written part of the proof.
     Base Case: Verify that P (1) is true. This will typically be done by direct
                  computation or by giving an example.
Inductive Step: Prove the implication P (k) ⇒ P (k + 1) for any k ∈ N. Typically
                  this will be done by a direct proof; assume P (k) and show P (k +1).
                  (Occasionally it may be done contrapositively or by contradiction.)
    Conclusion: Conclude that the theorem is true by induction. As with identify-
                  ing P (n), this may not need to be a written part of the proof.
  Warning 13.3. Note the importance of the base case. Without it, the inductive
  step shows that we can move from one rung of the ladder to the next higher one,
  but there is no evidence that we can reach the bottom of the ladder at all.
  Perhaps the ladder is suspended high above the ground; the base case shows
  that we can actually reach the bottom rung.
This fact, and variations of it, are often used in induction proofs involving summation.
Proof Strategy. We begin by identifying the open sentence P (n). In this case, P (n)
is the equality
                     P(n): \sum_{i=1}^{n} i = \frac{n(n + 1)}{2}.
    The base case, verifying that P (1) holds, is done by a simple computation (plug-
ging 1 in for n).
    For the inductive step, we assume P (k) and show P (k+1). Hence, we are assuming
for some k ∈ N that P (k) is true, so
                     \sum_{i=1}^{k} i = \frac{k(k + 1)}{2},
where the sum on the right is the sum involved in P (k). We now use our assumption
of P (k) to simplify the sum, and complete the proof.
     Now that we have sketched the proof method, let’s write a full and formal proof.
Proof. Let P(n) be the open sentence
                     P(n): \sum_{i=1}^{n} i = \frac{n(n + 1)}{2}.
Starting with the left-hand side, and simplifying with the right-hand side as a target,
we find that
    \sum_{i=1}^{k+1} i = 1 + 2 + 3 + · · · + k + (k + 1)
                       = (1 + 2 + 3 + · · · + k) + (k + 1)
                       = \left( \sum_{i=1}^{k} i \right) + (k + 1)
                       = \frac{k(k + 1)}{2} + (k + 1)             (by the inductive hypothesis)
                       = \frac{k(k + 1) + 2(k + 1)}{2}            (getting a common denominator)
                       = \frac{(k + 1)(k + 2)}{2}.                (factoring out k + 1)
So P (k + 1) is true.
   Hence, by induction, P (n) is true for all n ∈ N.
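    The formula is easy to test for many values of n. The sketch below (variable and function names are ours) mirrors the inductive step: at each stage it adds the next term to a running total and compares the result with the closed form. Checking finitely many cases is not a proof, which is exactly why the induction argument is needed.

```python
def formula(n):
    """The closed form n(n + 1)/2 claimed for 1 + 2 + ... + n."""
    return n * (n + 1) // 2

running_total = 0
for k in range(1, 1001):
    running_total += k                  # pass from the sum to k - 1 to the sum to k
    assert running_total == formula(k)  # P(k) holds in this instance

print(running_total)  # 500500
```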
Remark 13.6. It can be helpful to point out to the reader of your proofs where you
use the inductive hypothesis, as done above. Note that if you do not use the inductive
hypothesis, then you could have just proved the theorem without induction.
Remark 13.7. With practice you will become better at seeing how P (k) and P (k+1)
are related (especially with sums like the one above), and these proofs will go more
smoothly for you. For instance, with practice we could have gone directly to the
equality
                     \sum_{i=1}^{k+1} i = \left( \sum_{i=1}^{k} i \right) + (k + 1)
     which is incorrect, as it says that P(k) is the number k^2. Another incorrect use
     of P(k) is the following
                     \sum_{i=1}^{k+1} i = \sum_{i=1}^{k} i + (k + 1)
                                        = P(k) + (k + 1)
                                        = k^2 + (k + 1)
     Note that this also arises from thinking, incorrectly, of P (k) as equal to part of
     the statement that it represents.
Remark 13.9. It might appear that in an induction proof we are assuming what we
are attempting to prove. For instance, if we are trying to prove
∀n ∈ N, P (n)
by induction, then in the inductive step of the proof we will need to assume P (k). It
would indeed be a logical mistake to assume P (k) if our immediate goal is to prove
P (k).
    However, that is not the case. The goal of the inductive step is not to prove P (k),
but to prove that P (k + 1) follows from P (k). Hence, in fact, we are not assuming
what we wish to prove (namely that P (n) is true for each n ∈ N). Note also that
proving
                              ∀k ∈ N, P (k) ⇒ P (k + 1)
by itself does not prove that P (k) is true for any natural number; it just proves that
if P (k) is true for some k, then P (k + 1) must be true as well (which is why we also
need the base case to start the induction).
  Inductive step: Let k ∈ N and assume 2^k > k. We want to prove 2^(k+1) > k + 1.
We find
              2^(k+1) = 2 · 2^k
                      > 2 · k                (by the inductive assumption)
                      = k + k
                      ≥ k + 1.               (since k ≥ 1)
This finishes the inductive step, so by induction we know that 2^n > n for each
n ∈ N.
    Induction can often be used to prove facts about finite sets. In this case, the
general technique is to induct on the size of the sets. Typically a proposition will be
trivial to prove for the empty set, or for sets with a single element. We may assume
the proposition holds for sets of size k, and let A be a set of size k + 1. Removing
one element from A yields a set of size k, to which the inductive hypothesis applies.
Then, we only need to extend the proposition to A; how we do it depends on what
exactly we wish to prove. The following theorem is a typical example.
       P (n): For each m ∈ N, if m > n and m objects are placed in n bins, then
              two (or more) objects must share a bin.
    Base case: We verify that P (1) is true. If we have more than one object, and we
place them all in one bin, then all the objects must clearly share a bin.
    Inductive step: Let k ∈ N and assume P (k).
    We now prove P (k + 1). Let m ∈ N. Assume m > k + 1 and m objects are placed
into k + 1 bins. We need to show that two objects share a bin. Choose one of the
objects and call it x. We divide the proof into two cases.
Case 1. Suppose that the object x shares a bin with at least one other object. In
         this case two objects clearly share a bin, so we are finished.
Case 2. The object x is in a bin by itself; no other object shares the bin with x. In
         this case there are m − 1 remaining objects, none of which are in the same
         bin as object x. Hence, these m − 1 objects must all be placed into the k
         remaining bins. By the inductive hypothesis (that P (k) is true), we know
         that two of these objects must share a bin, since m − 1 > k.
In both cases two objects must share a bin, which completes the inductive step.
    Hence, by the principle of mathematical induction, P (n) is true for all n ∈ N.
Remark 13.13. The pigeonhole principle can be applied to many situations. For
instance, if we choose three integers, then two of them must have the same parity.
Here there are two bins; even and odd. If we choose three numbers, two of them
(possibly all three) must end up in the same bin, or in other words they have the
same parity.
    As another example, in a class of 30 people, if each person scores between 80 and
100 percent on an exam (with no fractional scores allowed), then two people must
have received the same score since there are 21 possible scores (bins) which must
contain the 30 people.
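    The exam example can be simulated. In the sketch below, the list of scores is invented purely for illustration, and the function name is our own; the function scans the objects and reports the first two that land in the same bin. The pigeonhole principle guarantees that a pair is found whenever there are more objects than bins.

```python
def find_shared_bin(bins):
    """Given bins[i] = the bin holding object i, return two objects in one bin, or None."""
    first_seen = {}
    for obj, b in enumerate(bins):
        if b in first_seen:
            return first_seen[b], obj   # these two objects share bin b
        first_seen[b] = obj
    return None

# Hypothetical exam: 30 people, whole-number scores from 80 to 100 (21 bins).
scores = [80 + (7 * i) % 21 for i in range(30)]
pair = find_shared_bin(scores)
print(pair, scores[pair[0]], scores[pair[1]])
```

With only 21 people it is possible for all scores to differ, in which case the function returns None.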
      In terms of the ladder analogy, induction proves that we can reach every rung
  of the ladder, but it cannot be used to prove that we can reach the top of the
  ladder (since the ladder actually has no top).
13.B     Exercises
Exercise 13.1. Prove that for each n ∈ N,
                     \sum_{i=1}^{n} (2i − 1) = n^2.
n < 3^n.
 (b) Prove that for each n ∈ Z, n < 3^n. (Hint: With part (a) in hand, you might
     not need induction for part (b).)
Exercise 13.5. Let x ∈ R − {1}. Prove that for each n ∈ N,
                     \sum_{i=0}^{n} x^i = \frac{1 − x^{n+1}}{1 − x}.
Exercise 13.6. Let x ∈ R and assume x > −1. Prove that for each n ∈ N,
                                   (1 + x)^n ≥ 1 + nx.
Exercise 13.7. Let S be any nonempty set of natural numbers. Prove that S has
a least element. (Hint: Use Proposition 13.11 and the fact that for any n ∈ N, any
subset of {1, . . . , n} is finite. You will not need to use induction in your proof, since
the induction is done in the proof of Proposition 13.11.)
    The fact that any nonempty subset of the natural numbers has a least element is
called the well-ordering principle.
where a is a fixed integer. Note that if a = 1, this is just a proof of a statement for
all natural numbers.
    Induction can be used to prove such statements. The only change is that our base
case starts at a instead of 1. We will give a proof of this fact at the end of this section,
but for now we demonstrate how this changes proofs by giving some examples.
    Before we start the proof, we make a few remarks. First, why are we restricting
to integers n ≥ 10? The reason is because the claim is false for some smaller integers.
The inequality is false when n = 9. (Try it!) Second, what is the open sentence P (n)?
It is just P(n) : 2^n > n^3. When we plug in k + 1 for n, we have
(k + 1)^3 = k^3 + 3k^2 + 3k + 1.
In the computation in the proof below we will slowly try to “peel off” each of the
terms k^3, 3k^2, 3k, and 1, one at a time, so that eventually we can end up with (k + 1)^3.
   We are now ready for the formal proof.
P(n): 2^n > n^3
   Inductive step: Let k ∈ N with k ≥ 10. Assume that P(k) is true. So we now
know that 2^k > k^3. We wish to prove P(k + 1), which states that 2^(k+1) > (k + 1)^3. We find
           2^(k+1) = 2 · 2^k
                   = 2^k + 2^k
                   > k^3 + k^3                        (2^k > k^3, by inductive hypothesis)
                   ≥ k^3 + 10k^2                      (since k ≥ 10)
                   = k^3 + 3k^2 + 7k^2                (peeling off 3k^2)
                   ≥ k^3 + 3k^2 + 70k                 (since k ≥ 10)
                   = k^3 + 3k^2 + 3k + 67k            (peeling off 3k)
                   > k^3 + 3k^2 + 3k + 1              (since 67k > 1)
                   = (k + 1)^3.
Hence, P (k + 1) is true.
   Therefore, by mathematical induction, P (n) is true for each n ≥ 10.
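    The choice of 10 as the starting point can be confirmed by direct computation. The following sketch (the function name and the upper bound 500 are arbitrary choices) lists the small values of n for which the inequality fails and then tests it on a range of larger values.

```python
def holds(n):
    """The open sentence P(n): 2^n > n^3."""
    return 2**n > n**3

failures = [n for n in range(1, 10) if not holds(n)]
print(failures)                                  # [2, 3, 4, 5, 6, 7, 8, 9]
print(all(holds(n) for n in range(10, 501)))     # True
```

Note that P(1) happens to be true, but P(2) through P(9) are false, so the induction cannot begin before n = 10.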
Remark 14.2. When trying to prove the inductive step, it can sometimes be difficult
to verify that P (k + 1) follows from P (k). Notice that in the previous example we
wanted to show that 2^(k+1) > (k + 1)^3. In order to do this, we wrote down one side
of the inequality (the left-hand side) and manipulated it in order to reach the other
side.
    In the previous example we were aided by the knowledge that the right-hand side
is
                             (k + 1)^3 = k^3 + 3k^2 + 3k + 1.
This gave us a target to shoot for. Moreover, the right-hand side cannot be simplified
much further, which is why we started with the left-hand side in the proof above. It
is often useful to manipulate both sides of an equation or inequality in order to work
out (on scratch paper) how to get from one side to the other. However, in the proof
we must be careful that our inequalities all go the same direction.
Example 14.4. If we wish to compute 5!, we use the formula repeatedly as follows:
                                 5! = 5 · 4!
                                    = 5 · 4 · 3!
                                    = 5 · 4 · 3 · 2!
                                    = 5 · 4 · 3 · 2 · 1!
                                    = 5 · 4 · 3 · 2 · 1 · 0!
                                    =5·4·3·2·1·1
                                    = 120.
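    The repeated use of the formula in this example is exactly a recursive computation. A minimal sketch, assuming the rules 0! = 1 and n! = n · (n − 1)! used above (the function name is ours):

```python
def factorial(n):
    """Compute n! for an integer n >= 0 using n! = n * (n - 1)! and 0! = 1."""
    if n == 0:
        return 1                     # the starting value 0! = 1
    return n * factorial(n - 1)      # unwind one step, as in the example

print(factorial(5))  # 120
```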
n! = n · (n − 1) · (n − 2) · · · 3 · 2 · 1.
P(n) : n! > 2^n
          (k + 1)! = (k + 1) k!
                   > (k + 1) 2^k             (since k! > 2^k, by the inductive hypothesis)
                   > 2 · 2^k                 (since k + 1 > 4 > 2)
                   = 2^(k+1).
   If we wish to prove facts about finite sets, it will often be convenient to start our
induction with the base case being any set of size 0 (namely, the empty set).
    Base case: P (0) is true; the only set with 0 elements is the empty set, and its
only subset is itself, so |P(∅)| = 1 = 2^0.
    Inductive step: Assume that P(k) is true for some k ≥ 0; namely, that for any
set A with k elements, |P(A)| = 2^k.
    We want to prove P (k +1). Let B be a set with k +1 elements. Choose an element
of B and call it b. We divide the power set of B into two collections of subsets. Let
S = {X ∈ P(B) : b ∈ X},
and let
                                  T = {X ∈ P(B) : b ∉ X}.
We note that T consists of the subsets of B − {b}; hence, T is just the power set
of B − {b}. Since B − {b} has k elements (one element less than B), our inductive
hypothesis tells us that |T| = 2^k.
   On the other hand, each element of S is uniquely the union of an element of T
with the set {b}. Hence, |S| = |T| = 2^k. Since S and T have no elements in common,
the number of elements in P(B) = S ∪ T is |S| + |T| = 2^k + 2^k = 2 · 2^k = 2^(k+1).
Hence, P (k + 1) is true.
   Therefore, by induction we see that P (n) is true for each n ≥ 0.
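    The inductive step of this proof is really a recipe for building a power set. The sketch below (a hypothetical helper, written only to mirror the argument) removes one element b, recursively builds the collection T of subsets that avoid b, and forms S by adding b to each member of T.

```python
def power_set(elements):
    """Return all subsets of `elements` (a list of distinct items) as frozensets."""
    if not elements:
        return [frozenset()]            # base case: the only subset of ∅ is ∅
    b, rest = elements[0], elements[1:]
    T = power_set(rest)                 # subsets that do not contain b
    S = [X | {b} for X in T]            # subsets that do contain b
    return T + S                        # disjoint collections, so sizes add

for n in range(6):
    print(n, len(power_set(list(range(n)))))
```

Each printed count is twice the previous one, matching |S| + |T| = 2 · 2^k in the proof.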
    We can often use induction to extend statements concerning two objects to state-
ments concerning any finite number of objects. For instance, the following proposition
is an extension of De Morgan’s law, from two terms to an arbitrary (finite) number
of terms.
Then we have
                2^((k+1)+1) = 2 · 2^(k+1)
                            > 2 · k^2                    (by the inductive assumption)
                            = k^2 + k^2
                            ≥ k^2 + 3k                   (since k ≥ 3)
                            = k^2 + 2k + k               (peeling off 2k)
                            > k^2 + 2k + 1               (since k > 1)
                            = (k + 1)^2
Hence, P (k + 1) is true.
   Therefore, by induction P (n) is true for each n ≥ 3. Since we have already
demonstrated P (1) and P (2), we see that P (n) is true for each n ∈ N.
  Advice 14.9. To decide whether or not to do extra cases, try the inductive step
  first (perhaps on scratch paper). If you need extra information (as we did above,
  to replace k^2 with 3k) this could be a reason to do extra base cases.
      Another reason to use extra cases is if you are working with a piecewise
  defined function. Doing small cases might help handle places where the piecewise
  function is different.
  Theorem 14.10. Let a ∈ Z, and let P (n) be an open sentence whose domain
  includes the set S = {n ∈ Z : n ≥ a}. If
     (i) P (a) is true and
    (ii) P (k) ⇒ P (k + 1) for all k ∈ S,
  then P (n) is true for all n ∈ S.
    This correspondence makes it clear that if we can prove P′(n) for each n ∈ N,
then we will have proved P(n) for each n ∈ S.
    Now, P′(1) = P(a) is true by (i).
    Further, for each k ∈ N, we see that P′(k) ⇒ P′(k + 1) holds since P(k + a − 1) ⇒
P(k + 1 + a − 1), by (ii).
    Hence, by the principle of mathematical induction, P′(n) is true for each n ∈ N,
so P(n) is true for each n ∈ S.
Remark 14.11. In order to illustrate the connection between P(n) and P′(n), we
describe the corresponding open sentences for Proposition 14.5. In that proposition,
it is asserted that the open sentence
P(n) : n! > 2^n
Proving that n! > 2^n is true for each n ≥ 4 is the same as proving that (n + 3)! > 2^(n+3)
is true for each n ≥ 1.
14.D      Exercises
Exercise 14.1. Prove that n! > 3^n for each natural number n > 6.
Exercise 14.2. Prove that if n is any natural number greater than 5, then n! > n^3.
Exercise 14.3. Prove that for each n ∈ N, we have 3^n ≥ n^3.
    (Hint: Demonstrate this by direct calculation for n = 1, 2, 3. Then use induction
to complete the proof for n ≥ 3, with n = 3 as your base case.)
Exercise 14.4. Prove that for any n ∈ N with n ≥ 2, if P_1, . . . , P_n are statements,
then
                    ¬(P_1 ∧ · · · ∧ P_n) ≡ (¬P_1) ∨ · · · ∨ (¬P_n).
(Note that for n = 2 this is just Theorem 8.21, the triangle inequality.)
Exercise 14.6. The Fibonacci numbers are a collection of natural numbers labeled
F_1, F_2, F_3, . . . and defined by the rule
F_1 = F_2 = 1,
Exercise 14.7. Using the definition of the Fibonacci numbers from the previous
problem, prove by induction that F_n > n^2 for any integer n > 12.
   (Hint: One possible idea for a proof is to let P(n) be the open sentence
Use induction to prove that P(n) is true for all n ≥ 14. This then implies that
F_n > n^2 for all n ≥ 13.)
15        Strong induction
15.A        The definition of strong induction
Sometimes, when trying to do a proof by induction, the inductive step is not feasible
because P (k) does not provide enough information to conclude P (k + 1). In this case,
a variation on induction called “strong induction” is often useful.
    The idea of strong induction is very intuitive. Recall the ladder analogy. If we
can climb to the kth rung, we just need to know that we can climb to the (k + 1)st
rung. However, we have more information available. If we have climbed up to the
kth rung, then we have also climbed all the steps below! It is possible to make use of
this extra information by making a stronger inductive hypothesis. In the inductive
step, instead of merely assuming P (k) we instead assume the stronger statement
Q(k) = P (1) ∧ P (2) ∧ . . . ∧ P (k). In other words, we assume that we climbed each of
the steps from the first to the kth.
    The only difference between a proof by “normal” induction and “strong” induc-
tion is that, in the inductive step, we make the stronger assumption Q(k) above.
Everything else is precisely the same—we still need a base case, and in the inductive
step we still want to conclude by showing P (k + 1).
    Because these two proof techniques are so similar it is not necessary to use the
word “strong” in such a proof, unless you want to emphasize this fact to the reader.
[Diagram: the cities A, B, C, D, and E, with a one-way road between each pair of cities]
    A natural question one might ask is whether or not it is possible to find a path
through each of the cities, without ever revisiting a city. In this diagram, one such
path passing through all the cities is B → A → C → E → D. Others are A → C →
B → E → D, and B → D → C → E → A, and E → D → A → C → B. This shows
that for the specific combination of one-way roads given in the diagram above there
are several paths.
    Does the answer to our question change if we change the directions of the one-way
roads? What if we change the number of cities?
    To answer these new questions we need to set up some notation. Let S be a finite
set of cities. We will call a collection of one-way roads, with a single road connecting
each pair of distinct cities in S, a system of one-way roads for S. If there is some
path through the cities, which follows that system of one-way roads and visits each
city exactly once, we will call it a valid path through the cities.
    We are now ready to answer our questions!
Proof. Let P (n) be the open sentence: “If S is a set of n cities with a system of
one-way roads, then there is a valid path through those cities.” We work by (strong)
induction to show that P (n) is true for each n ≥ 1.
    Base Case: Let n = 1. In this case, starting in the single city, we don’t need to
go anywhere to say that we have visited all the cities. Hence, P (1) is true.
    Inductive Step: Assume that for some k ∈ N, we know that P (1), P (2), . . . , P (k)
are each true. (This is the only place where our proof would look different from a
standard induction. Instead of merely assuming P (k) is true, we have assumed all
the steps from the first to the kth are true.) From this information, we will try to
prove P (k + 1).
    To that end let U be a set of k+1 cities, with a system of one-way roads connecting
them. As U is nonempty, we may fix one of the cities in U , call it X. There are a
total of k cities remaining that are not equal to X. We divide these k cities into two
disjoint sets:
and
    Note that in the first two cases of the proof above, we only needed to know that
P (k) is true. Thus, in those cases, normal induction would work. However, in the
third case we needed to use the fact that the theorem was true for networks of cities
and roads with an arbitrary number of cities smaller than k + 1, not just for networks
with exactly k cities.
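    The construction in the proof can be turned into a short recursive procedure. The sketch below is only an illustration, and it rests on our reading of the construction: we take the two disjoint sets to be the cities whose road leads into X and the cities whose road leads out of X. The names valid_path, road, and ROADS are hypothetical, and the road system is made up for the example (it is not either of the diagrams in this section).

```python
def valid_path(cities, road):
    """Return an ordering of `cities` that follows the one-way roads.

    `road(u, v)` must return True exactly when the one-way road between
    the distinct cities u and v runs from u to v.
    """
    if len(cities) <= 1:
        # Zero or one city: there is nowhere to travel (the base case).
        return list(cities)
    x, rest = cities[0], cities[1:]
    into_x = [c for c in rest if road(c, x)]        # roads run from these cities to x
    out_of_x = [c for c in rest if not road(c, x)]  # roads run from x to these cities
    # Both parts have fewer cities than `cities`, possibly far fewer,
    # which is exactly where the strong inductive hypothesis is needed.
    return valid_path(into_x, road) + [x] + valid_path(out_of_x, road)


# A hypothetical system of one-way roads on five cities.
ROADS = {("A", "B"), ("A", "C"), ("D", "A"), ("E", "A"), ("B", "C"),
         ("B", "D"), ("E", "B"), ("C", "D"), ("C", "E"), ("D", "E")}


def road(u, v):
    return (u, v) in ROADS


path = valid_path(list("ABCDE"), road)
print(" -> ".join(path))
```

    The path ending in the first part stops at a city whose road leads to X, and the path through the second part starts at a city reached by a road from X, so the three pieces join into one valid path.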
Example 15.2. Let U = {A, B, C, D, E} and consider the following system of one-
way roads for U (which is different from the system we considered previously).
[Diagram: a second system of one-way roads on the cities A, B, C, D, and E]
    It may be a useful exercise for the student to go back to the system of roads
introduced at the beginning of Subsection 15.B and, using the proof method above,
see which valid paths are constructed for each X. Also, it may be useful to note
which values of P (i) are being used to conclude P (5), for each choice of X. Can you
find a valid path which does not arise from the inductive proof?
  Before we begin the proof, we want to make a few remarks which will help explain
what we are trying to prove.
Remark 15.4 (Sums of one object). When mathematicians say that a number can
be written as a sum they allow the possibility of adding only one object. Hence, the
number 8 = 2^3 can be considered as a sum of a single power of two.
Remark 15.5 (Meaning of distinct). In English, the word “distinct” is often used
to mean “special” or “distinguished.” In mathematics, the word has a very precise
meaning, which is quite different; a list of objects is called distinct if no two of the
objects are equal.
   For instance, there are several ways to write the number 6 as a sum of powers of
two. We have
             6 = 2^0 + 2^0 + 2^0 + 2^0 + 2^0 + 2^0 = 2^1 + 2^0 + 2^0 + 2^0 + 2^0
               = 2^1 + 2^1 + 2^0 + 2^0 = 2^1 + 2^1 + 2^1 = 2^2 + 2^0 + 2^0 = 2^2 + 2^1.
Note that only the last way (6 = 4 + 2) gives 6 as a sum of distinct powers of 2 (all
the rest have repetition).
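   The list of ways to write 6 as a sum of powers of two can be checked by a small enumeration. The following sketch is an illustration only; the function name power_of_two_sums is our own, and sums are listed with their terms in non-increasing order so that reorderings are not counted twice.

```python
def power_of_two_sums(n, largest=None):
    """Yield each way to write n as a sum of powers of two (terms non-increasing)."""
    if n == 0:
        yield []
        return
    p = 1
    while p * 2 <= n:          # largest power of two not exceeding n
        p *= 2
    if largest is not None:    # keep the terms in non-increasing order
        p = min(p, largest)
    while p >= 1:
        for rest in power_of_two_sums(n - p, p):
            yield [p] + rest
        p //= 2


sums = list(power_of_two_sums(6))
distinct = [s for s in sums if len(set(s)) == len(s)]
for s in sums:
    print(" + ".join(map(str, s)))
print("distinct:", distinct)
```

   The enumeration finds six sums, and only 4 + 2 uses distinct powers of two.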
Remark 15.6 (Base 2). We illustrate the theorem for the first few natural numbers.
    We are now ready to begin the proof. Try to figure out why the proof will fail if,
in the inductive step, we only assume P (k).
Proof. Assume (i) and (ii) above. Let Q(n) be the open sentence
Note that Q(k + 1) ≡ Q(k) ∧ P (k + 1). Additionally, we see easily that since Q(k) ⇒
P (k + 1) for each k ∈ N by (ii), we have Q(k) ⇒ Q(k) ∧ P (k + 1), so that Q(k) ⇒
Q(k + 1). Applying the principle of mathematical induction to Q(n), we see that
Q(n) is true for all natural numbers. This immediately implies that P (n) is true for
all natural numbers.
Remark 15.8. When using strong induction, typically you will not explicitly write
out what Q(n) is. It is much more common when proving the inductive step to say
something like “Assume P (i) for all integers i in the range 1 ≤ i ≤ k.” Sometimes it
will be convenient to say “Assume that P (i) is true for all natural numbers less than
or equal to k.” Or even just “Assume P (1), P (2), . . . , P (k) are each true.”
  Theorem 15.9. Let a ∈ Z and let P (n) be an open sentence whose domain
  includes the set S = {n ∈ Z : n ≥ a}. For n ∈ S, let Q(n) be the open sentence
  If
     (i) P (a) is true and
    (ii) Q(k) ⇒ P (k + 1) for each k ∈ S,
  then P (n) is true for each n ∈ S.
  Proposition 15.10. Let n be any integer greater than 5. Any square can be
  subdivided into n squares.
We work by induction on n ≥ 6.
   Base case: We can verify that P (6) is true. We also verify that P (7) and P (8) are
true in the diagrams below. (We do these extra base cases to help with the inductive
step.)
   In addition, we have also given a picture showing how to subdivide a square into
4 smaller squares; we will use this in our proof.
    Inductive step: Let k ≥ 6, and assume that P (ℓ) is true for 6 ≤ ℓ ≤ k. We wish
to prove that P (k + 1) is true. If k = 6 or k = 7, we have already seen that P (k + 1)
is true, so we may assume that k ≥ 8.
    Since k ≥ 8, we have that k − 2 ≥ 6. Hence, by our inductive assumption, since
6 ≤ k − 2 ≤ k, we know that P (k − 2) is true. In other words, we know that we
can subdivide a square into k − 2 squares. Starting with this subdivision, we further
subdivide the upper-rightmost square into 4 squares. This adds three squares to the
subdivision. Thus, we have subdivided the square into k − 2 + 3 = k + 1 squares.
Hence, P (k + 1) is true.
    Therefore, by mathematical induction, P (n) is true for all n ≥ 6.
To see this proof in action, we demonstrate how to subdivide a square into 20 squares.
Start from the subdivision into 8 squares, and repeatedly divide the upper-rightmost
square into 4 smaller squares at each stage.
[Figure: subdivisions of a square into n = 11, n = 14, n = 17, and n = 20 squares]
We see easily how we could extend these diagrams to demonstrate the result for
n = 23, 26, 29, 32, . . . (although the upper-right square would quickly become too
small to see).
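    The recipe in this proof is easy to state as a procedure: step down from n by 3 until one of the base cases 6, 7, or 8 is reached, then refine that base picture once for each step. The sketch below records only the bookkeeping (not the pictures); the name subdivision_plan is our own.

```python
def subdivision_plan(n):
    """Return (base, steps): start from the base-case picture with `base`
    squares and split one square into four `steps` times to reach n squares."""
    if n < 6:
        raise ValueError("the proposition only covers n > 5")
    steps = 0
    while n > 8:      # each refinement adds exactly 3 squares
        n -= 3
        steps += 1
    return n, steps


print(subdivision_plan(20))
```

    For n = 20 the plan is to start from the subdivision into 8 squares and refine four times, passing through 11, 14, and 17 squares.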
    To motivate our last example of a statement that can be proved by strong induc-
tion, consider the following problem.
15. STRONG INDUCTION                                                              115
n = 5x + 7y
15.F     Exercises
Exercise 15.1. Prove by induction that for each integer n > 5, it is possible to sub-
divide an equilateral triangle into n equilateral triangles. (For example, a subdivision
into 6 equilateral triangles is given below.)
Exercise 15.2. For the network of nine cities with one-way roads below, find a
route that visits all nine cities. Do this using the method found in the proof of
Proposition 15.1, letting X be the city denoted by A.
   (Note that there are many routes that solve the problem, but only one that arises
from letting X = A in the proof of Proposition 15.1.)
[Diagram: nine cities A, B, C, D, E, F, G, H, and I arranged around a circle, joined by one-way roads]
Exercise 15.3. (a) Prove that every integer n > 13 can be written as n = 3x_n + 8y_n
     for some integers x_n, y_n ≥ 0 (where x_n and y_n depend on n).
 (b) Prove that 13 cannot be written as 3x + 8y for any integers x, y ≥ 0.
Exercise 15.4. Let n ∈ N. Prove (by induction) that n = 2^{k_n} m_n for some nonnegative
k_n ∈ Z and some odd m_n ∈ N. (Again, k_n and m_n may depend on n.)
Exercise 15.5. Prove that for each natural number n > 43, we can write
for some nonnegative integers x_n, y_n, z_n. Then prove that 43 cannot be written in this
form.
    (Hint: Write 44, 45, 46, 47, 48, and 49 in the given form. Use induction to prove
that any larger number can be written in the given form.)
Exercise 15.6. Find the largest postage that cannot be paid exactly with 4, 10, and
15 cent stamps. Prove that your answer is correct. (This proof will include showing
not only that the postage that you find cannot be achieved, but also that every larger
postage can be achieved. The correct solution is smaller than 30.)
Exercise 15.7. Recall the definition of the Fibonacci numbers from Exercise 14.6.
Prove that every positive integer is a sum of one or more distinct Fibonacci numbers.
(Hint: For each n > 0, we can find some m so that F_m ≤ n < F_{m+1}. Write n =
F_m + (n − F_m), and show that n − F_m is either 0 or (using induction) a sum of distinct
Fibonacci numbers, each smaller than F_m.)
Remark 16.3. It is not obvious from Definition 16.1 that the binomial coefficient is
an integer, but it will follow from the properties that we describe below.
   We read \binom{n}{k} as “n choose k” because the binomial coefficient \binom{n}{k} counts the
number of ways to choose k objects from among n objects. See Theorem 16.6 for a
proof of this fact.
   0   0   0   0   0   1   0   0   0   0   0
     0   0   0   0   1   1   0   0   0   0
   0   0   0   0   1   2   1   0   0   0   0
     0   0   0   1   3   3   1   0   0   0
   0   0   0   1   4   6   4   1   0   0   0
     0   0   1   5  10  10   5   1   0   0
   0   0   1   6  15  20  15   6   1   0   0
     0   1   7  21  35  35  21   7   1   0
   0   1   8  28  56  70  56  28   8   1   0
     1   9  36  84 126 126  84  36   9   1
Proof. We will use properties of the factorial function here. In particular, we note
that (k + 1)! = (k + 1) · k!, and (n − k)! = (n − k) · (n − k − 1)! (when 0 ≤ k < n).
We will use these facts to obtain a common denominator in the fractions defining the
binomial coefficients \binom{n}{k} and \binom{n}{k+1}.
   We break the proof into cases, doing the easiest cases first.
Case 1. Assume that k = n. Then both \binom{n}{k} and \binom{n+1}{k+1} are 1, and \binom{n}{k+1} = 0.
Case 2. Assume that k > n. Then all three binomial coefficients in the formula are 0.
Case 3. Assume that k = −1. Then \binom{n}{k} = 0 but \binom{n}{k+1} = \binom{n+1}{k+1} = 1.
Case 4. Assume that k < −1. Then all three binomial coefficients in the formula are 0.
Case 5. Assume that 0 ≤ k < n. Then 0 < k + 1 ≤ n, and all three binomial
         coefficients in the formula are defined by the rule involving factorials. Hence,

             \binom{n}{k} + \binom{n}{k+1}
                 = \frac{n!}{k!(n-k)!} + \frac{n!}{(k+1)!(n-k-1)!}
                 = \frac{n!(k+1)}{(k+1)!(n-k)!} + \frac{n!(n-k)}{(k+1)!(n-k)!}
                 = \frac{n!(k+1) + n!(n-k)}{(k+1)!(n-k)!}
                 = \frac{n!(n+1)}{(k+1)!(n-k)!}
                 = \frac{(n+1)!}{(k+1)!((n+1)-(k+1))!}
                 = \binom{n+1}{k+1}.
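   The identity just proved can also be checked numerically. The sketch below assumes the convention suggested by the cases of the proof: for n ≥ 0 the binomial coefficient is given by the factorial formula when 0 ≤ k ≤ n and is 0 for every other integer k. The helper names binom and pascal_row are our own.

```python
from math import factorial


def binom(n, k):
    """Binomial coefficient, taken to be 0 outside the range 0 <= k <= n."""
    if 0 <= k <= n:
        return factorial(n) // (factorial(k) * factorial(n - k))
    return 0


def pascal_row(n):
    """Row n of Pascal's triangle, built from row n - 1 by the identity."""
    row = [1]
    for _ in range(n):
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row


# Check the identity for 0 <= n < 12 and for k on both sides of the range.
identity_holds = all(
    binom(n, k) + binom(n, k + 1) == binom(n + 1, k + 1)
    for n in range(12)
    for k in range(-3, 15)
)
print(identity_holds)
print(pascal_row(9))
```

   Building each row by adding adjacent entries of the previous row is the same identity read from left to right, which is why the two computations agree.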
Proof. If n < 0, there are no sets of cardinality n, so the theorem holds in that case.
We will now deal with the case n ≥ 0 by induction. Let P (n) be the open sentence
     P (n): For each k ∈ Z, the binomial coefficient \binom{n}{k} counts, for any fixed
            set of cardinality n, the number of subsets of cardinality k.
   Base case: We verify that P (0) is true. For k ≠ 0, there are no subsets of
cardinality k of the empty set, matching the value \binom{0}{k} = 0. For k = 0 there is one
subset of cardinality k of the empty set, matching the value \binom{0}{0} = 1. Hence, P (0) is
true.
   Inductive step: Let m ≥ 0 be an integer and assume that P (m) is true. (We use
m here because k already has a meaning.) Thus, for each k ∈ Z, any set of cardinality
m has \binom{m}{k} subsets of cardinality k.
   Let S be a set consisting of m + 1 elements. Choose one of them and call it x.
Let k ∈ Z. If k is negative, then \binom{m+1}{k} = 0 is the number of k element subsets of
S. Similarly, if k = 0, then \binom{m+1}{k} = 1 is the number of k element subsets of S.
Therefore, in what follows, we may assume that k > 0.
   We will count subsets T of cardinality k in S by counting the subsets with x ∈ T
separately from those with x ∉ T and then adding the two counts together.
   Any subset T of cardinality k with x ∈ T corresponds to the set T − {x} which
has exactly k − 1 elements. We have T − {x} ⊆ S − {x}. Since |S − {x}| = m the
inductive hypothesis says that there are \binom{m}{k-1} such subsets.
16. THE BINOMIAL THEOREM                                                                  121
    Theorem 16.6 gives an easy way to prove that the binomial coefficients are all
integers, a fact that is not at all obvious from the definition.
  Theorem 16.7. Let n, k ∈ Z. Then the binomial coefficient \binom{n}{k} is an integer.
Proof. By Theorem 16.6, \binom{n}{k} counts the number of k-element subsets in a set of size
Proof. We prove the theorem by induction on n ≥ 0. Let P (n) be the open sentence
                         P (n): (x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^k.
   Now suppose that P (m) is true for some m ≥ 0. Then we know that
                           (x + y)^m = \sum_{k=0}^{m} \binom{m}{k} x^{m-k} y^k.
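   Before working through the induction it can be reassuring to test the formula on actual numbers. The sketch below is only a numerical spot check over small integers, not a proof; the function name binomial_expansion is our own, and math.comb supplies the binomial coefficients for 0 ≤ k ≤ n.

```python
from math import comb


def binomial_expansion(x, y, n):
    """Evaluate the sum of C(n, k) * x^(n-k) * y^k for k = 0, ..., n."""
    return sum(comb(n, k) * x ** (n - k) * y ** k for k in range(n + 1))


# Compare both sides of P(n) for small integer values of x, y, and n.
formula_holds = all(
    binomial_expansion(x, y, n) == (x + y) ** n
    for x in range(-4, 5)
    for y in range(-4, 5)
    for n in range(9)
)
print(formula_holds)
```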
16.C      Exercises
Unless otherwise noted, exercises in this section should not be done using induction.
Exercise 16.1. Use the definition of the binomial coefficient to prove that for each
integer n ≥ 0,
                \binom{n}{0} = \binom{n}{n} = 1     and     \binom{n}{1} = \binom{n}{n-1} = n.
Exercise 16.3. Let n, h, k ∈ Z. Using the definition of the binomial coefficient, prove
that
                \binom{n}{h} \binom{n-h}{k} = \binom{n}{k} \binom{n-k}{h}.
(Hint: You must deal with the four cases where n < 0, h < 0, k < 0, and h + k > n.)
Exercise 16.7. Use the definition of the binomial coefficient to prove that for any
n, k ∈ Z,
                k \binom{n}{k} = n \binom{n-1}{k-1}.
Exercise 16.8. Prove that for n ∈ N, the “middle” binomial coefficient
                \binom{2n}{n}
is an even integer.
    (Hint: To get an idea how to prove it, look at Pascal’s triangle. Use problem 2.)
Mathematics is the queen of the sciences and number theory is the queen of mathematics.
Carl Friedrich Gauss
126                                       CHAPTER V. THEORY OF THE INTEGERS
17      Divisibility
17.A      Divisibility and common divisors
We now prove several facts about divisibility, some of which we took for granted in
previous sections (often treating them as axioms).
Proof. Assume that a | b and b | a. Then by Theorem 17.1, |a| ≤ |b| and |b| ≤ |a|.
Hence, |a| = |b|, so a = ±b.
  Theorem 17.3. Let b ∈ Z with b ≠ 0. There are finitely many integers that
  divide b.
Proof. If a ∈ Z divides b, then |a| ≤ |b| by Theorem 17.1. Hence, a ∈ {−|b|, . . . , |b|}.
This set is finite, so there are only finitely many possibilities for a.
Example 17.4. If b = 12, the divisors of b are
Example 17.6. If a = 12 and b = 18, then the common divisors of a and b are ±1,
±2, ±3, and ±6.
  Theorem 17.7. Let a and b be integers, not both 0. The set of common divisors
  of a and b has a largest element.
Proof. Without loss of generality, let a ≠ 0. The set of divisors of a is finite and
includes the set of common divisors of a and b, so the set of common divisors is finite.
Since it is finite and nonempty (as 1 is an element), this set has a largest element.
17. DIVISIBILITY                                                                       127
  Definition 17.8. The greatest common divisor, or GCD, of two integers a and
  b (not both zero) is the largest common divisor of a and b. We will write the
  greatest common divisor of a and b as GCD(a, b).
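   The definition can be applied literally on small inputs: every common divisor lies between −max(|a|, |b|) and max(|a|, |b|), so a finite search finds them all. The sketch below does exactly that; it is meant only to mirror the definition, not to be efficient, and the function names are our own.

```python
def common_divisors(a, b):
    """List all common divisors of a and b (not both zero), in increasing order."""
    if a == 0 and b == 0:
        raise ValueError("every nonzero integer divides both 0 and 0")
    bound = max(abs(a), abs(b))
    return [d for d in range(-bound, bound + 1)
            if d != 0 and a % d == 0 and b % d == 0]


def gcd_by_definition(a, b):
    """The greatest common divisor: the largest element of the finite list."""
    return max(common_divisors(a, b))


print(common_divisors(12, 18))
print(gcd_by_definition(12, 18))
```

   For a = 12 and b = 18 the search returns ±1, ±2, ±3, and ±6, so the greatest common divisor is 6.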
n = qd + r
   In order to organize the proof of this theorem we first prove uniqueness of the
quotient and remainder as a lemma before proceeding with the remainder of the
proof. A portion of the proof is left for the reader in Exercise 17.8.
Proof of Lemma. Suppose that n = qd + r = q′d + r′ with 0 ≤ r < |d| and 0 ≤ r′ < |d|.
Without loss of generality, we may assume that r ≤ r′. Then we have
(17.15) (q − q′)d = r′ − r.
Note that since r′ < |d| and r ≥ 0 we have 0 ≤ r′ − r ≤ r′ < |d|, so that |r′ − r| < |d|.
However, by (17.15), we know d | (r′ − r). Hence, by Theorem 17.1, we see that it
must be the case that r′ − r = 0, so that r′ = r. Since d ≠ 0, (17.15) now implies
that q = q′.
    We will prove the existence of q and r only in the case when n ≥ 0 and d > 0.
The other cases of the proof (when n is negative, or when d is negative) will be left
to the exercises (see Exercise 17.8).
Partial proof of Theorem 17.13. Fix d > 0. We work by induction to prove that
  Advice 17.16. This proof of the division algorithm does not immediately give
  us an easy way to find the quotient and remainder. However, finding q and r is
  a simple task using standard long division with remainder, as taught in many
  elementary schools. Although we will not review long division, we demonstrate
  the work to compute q and r for n = 978 and d = 13.
                                           75
                                     13 ) 978
                                          91
                                           68
                                           65
                                            3
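   On a computer the quotient and remainder are usually one call away, with one caveat. In Python, the built-in divmod returns a remainder with the same sign as d, so for negative d a small adjustment is needed to obtain 0 ≤ r < |d|. The sketch below makes that adjustment; the function name division_algorithm is our own.

```python
def division_algorithm(n, d):
    """Return (q, r) with n == q * d + r and 0 <= r < abs(d)."""
    if d == 0:
        raise ValueError("d must be nonzero")
    q, r = divmod(n, d)   # here r is zero or has the same sign as d
    if r < 0:             # only possible when d < 0
        q, r = q + 1, r - d
    return q, r


print(division_algorithm(978, 13))
```

   The call above reproduces the long division: the quotient is 75 and the remainder is 3.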
Proof. Let S be the set of common divisors of a and b. Let T be the set of common
divisors of b and c. We will show that S = T . Once this is shown, the largest element
of S must be the same as the largest element of T , and the theorem will be proved.
    (S ⊆ T ): Assume that d ∈ S. Then d | a and d | b. Now c = a − xb, so we must
have d | c. Hence, d is a common divisor of b and c, so d ∈ T . Thus, S ⊆ T .
    (T ⊆ S): Now assume that d ∈ T . Then d | b and d | c. Since a = xb + c, we see
that d | a. Hence, d is a common divisor of a and b, so d ∈ S. Therefore, T ⊆ S.
    Hence, S = T .
  Advice 17.18. The previous theorem does not require that c < |b|, so it ap-
  plies in situations which can be more general than the division algorithm. The
  following example gives just one instance of how useful this theorem can be.
Example 17.19. Let n ∈ Z. We will compute the possible GCDs for the numbers
3n + 1 and n − 2. Notice that
                               3n + 1 = 3(n − 2) + 7.
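    Since 3n + 1 = 3(n − 2) + 7, the preceding theorem gives GCD(3n + 1, n − 2) = GCD(n − 2, 7), and the only positive divisors of 7 are 1 and 7. The sketch below is a numerical check of that reduction over a range of n; it uses Python's math.gcd, which returns a nonnegative result for any integer inputs.

```python
from math import gcd

# Compare the two GCDs for each n in a window around 0.
reduction_holds = all(gcd(3 * n + 1, n - 2) == gcd(n - 2, 7)
                      for n in range(-50, 51))

# Collect the values of GCD(3n + 1, n - 2) that actually occur.
values = sorted({gcd(3 * n + 1, n - 2) for n in range(-50, 51)})
print(reduction_holds, values)
```

    The value 7 occurs exactly when 7 divides n − 2, for instance at n = 2 and n = 9.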
  Algorithm 17.20. Given two integers a, b not both 0, assume that a ≠ 0 and
  that |a| ≥ |b| (if either of these does not hold, swap a and b so that both hold).
     If b = 0, then GCD(a, b) = |a|, and we are finished.
     Otherwise, apply the division algorithm multiple times, as follows.
         Divide a by b:                 a = q_1 b + r_1                with 0 ≤ r_1 < |b|,
         Divide b by r_1:               b = q_2 r_1 + r_2              with 0 ≤ r_2 < r_1,
         Divide r_1 by r_2:           r_1 = q_3 r_2 + r_3              with 0 ≤ r_3 < r_2,
                 ...
         Divide r_{n-1} by r_n:   r_{n-1} = q_{n+1} r_n + r_{n+1}      with 0 ≤ r_{n+1} < r_n.
  Continue to divide until we get a remainder r_{n+1} = 0 (we can’t go any further,
  since we can’t divide by 0).
      If r_1 = 0, then GCD(a, b) = |b| and we are finished.
      If r_{n+1} = 0 for n ≥ 1, then GCD(a, b) = r_n and we are finished.
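    The algorithm translates almost line for line into a loop. The sketch below is one possible rendering, with two choices of our own: it replaces a and b by their absolute values first (this does not change the set of common divisors), and it records each division as a tuple (dividend, quotient, divisor, remainder) so the work can be displayed. The function name euclidean_algorithm is our own.

```python
def euclidean_algorithm(a, b):
    """Return (g, steps): g = GCD(a, b) and the list of divisions performed."""
    a, b = abs(a), abs(b)
    if a == 0 and b == 0:
        raise ValueError("GCD(0, 0) is not defined")
    if a < b:
        a, b = b, a          # arrange that a >= b, as the algorithm assumes
    steps = []
    while b != 0:
        q, r = divmod(a, b)  # a = q * b + r with 0 <= r < b
        steps.append((a, q, b, r))
        a, b = b, r          # continue with the divisor and the remainder
    return a, steps          # a is now the last nonzero remainder (or the input itself)


g, steps = euclidean_algorithm(57, 39)
for dividend, q, divisor, r in steps:
    print(f"{dividend} = {q} * {divisor} + {r}")
print("GCD:", g)
```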
    To show that an algorithm works correctly there are two things that need to
be demonstrated. First, the answer that the algorithm computes must be correct.
Second, the algorithm must terminate after finitely many steps; it does us no good if
the algorithm takes forever to compute an answer. We will demonstrate that both of
these facts hold true.
    First, the algorithm must terminate since we have a strictly decreasing sequence
of nonnegative integers |b| > r_1 > r_2 > r_3 > r_4 > · · · ≥ 0. This sequence certainly
cannot have length more than |b| + 1.
    Now we show that the output is correct. Notice that if b = 0, then the algorithm
completes by asserting GCD(a, b) = |a|. By Example 17.10, this is the correct answer.
    Next, consider the case when r_1 = 0. The algorithm asserts that the GCD is |b|.
By Theorem 17.17, we have
                     GCD(a, b) = GCD(b, r_1) = GCD(b, 0) = |b|.
   Finally, choose n ∈ N so that r_{n+1} is 0. Then we have the following sequence of
equalities, from Theorem 17.17.
           GCD(a, b) = GCD(b, r_1) = GCD(r_1, r_2) = · · · = GCD(r_n, r_{n+1}).
Example 17.22. Suppose that we wish to compute the GCD of 39 and 57. We
perform our divisions as follows
57 = 1 · 39 + 18
39 = 2 · 18 + 3
18 = 6 · 3 + 0
464 = 3 · 145 + 29
145 = 5 · 29 + 0
The last nonzero remainder is 29, so the GCD of 1073 and 1537 is 29.
Remark 17.24. Notice that the number of divisions is actually significantly smaller
than |b|. In fact, it can be shown (although we will not prove it) that the number of
divisions required is never more than 5 times the number of decimal digits of |b|, a
bound of roughly 5 log_10 |b|. Hence, for example, if b is a four digit number, no
more than 20 divisions will ever be needed.
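    The bound on the number of divisions can be tested by brute force on small inputs. The sketch below counts the divisions performed for a ≥ b ≥ 1 and compares the count with five times the number of decimal digits of b, which is the form of the bound usually attributed to Lamé. It is a check over a finite range only, and the function name division_count is our own.

```python
def division_count(a, b):
    """Count the divisions the Euclidean algorithm performs on a >= b >= 1."""
    count = 0
    while b != 0:
        a, b = b, a % b
        count += 1
    return count


# Largest value of (divisions - 5 * digits of b) over the tested range.
worst = max(
    division_count(a, b) - 5 * len(str(b))
    for b in range(1, 200)
    for a in range(b, 400)
)
print(division_count(1537, 1073), worst)
```

    A value of worst that is at most 0 means that no pair in the range needed more than five divisions per digit of b.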
17.E     Exercises
Exercise 17.1. For the given values of n and d, compute the values of q and r
guaranteed by the division algorithm.
 (a) Let n = 17, d = 5.
 (b) Let n = 17, d = −5.
 (c) Let n = −17, d = 5.
 (d) Let n = −17, d = −5.
 (e) Let n = 256, d = 25.
  (f) Let n = 256, d = −25.
 (g) Let n = −256, d = 25.
 (h) Let n = −256, d = −25.
Exercise 17.2. Let a be an integer. Recall that a is even if there is some k ∈ Z such
that a = 2k, and a is odd if there is some ℓ ∈ Z such that a = 2ℓ + 1. Prove the
following statements, which we took for granted previously. (Hint: Use the division
algorithm with d = 2.)
  (a) Every integer is even or odd.
  (b) No integer is both even and odd.
Exercise 17.3. Write out all the divisors of 60 in a list, and then all the divisors of
42 in a separate list. Write the common divisors in a third list, and find the GCD.
(All the lists should be ordered from least to greatest.)
Exercise 17.4. Use the Euclidean algorithm to compute the following GCDs.
 (a) GCD(60, 42).
 (b) GCD(667, 851).
 (c) GCD(1855, 2345).
 (d) GCD(589, 437).
Exercise 17.5. Recall that the Fibonacci numbers are defined by the relations F_1 =
1, F_2 = 1, and for n > 2 the recursion F_n = F_{n-1} + F_{n-2}.
    Prove by induction that for each n ∈ N we have GCD(F_{n+1}, F_n) = 1.
  Theorem 18.1. Every nonempty subset of the natural numbers has a least ele-
  ment.
  We will use this theorem to prove an important and useful statement about
GCD(a, b). The following definition will help us to state the result.
Example 18.3. Let a = 16 and b = 21. We will list some of the linear combinations
of a and b.
    We see that 37 is a linear combination of 16 and 21, since
37 = 16 + 21 = a + b = a · 1 + b · 1.
1 = 16 · 4 + 21 · (−3).
There are no positive integers smaller than 1, so this is indeed the smallest.
Example 18.4. When performing the division algorithm, the remainder is a linear
combination of the numerator and denominator. Indeed,
r = n − qd = n · 1 + d · (−q).
  Theorem 18.5. Let a and b be integers, not both equal to 0. The smallest
  positive integral linear combination of a and b is GCD(a, b).
Proof. Let S be the set of positive integral linear combinations of a and b. In other
words,
                      S = {ax + by : x, y ∈ Z, ax + by > 0}.
It is clear that S is a subset of the natural numbers, since its elements are positive
integers. In addition, S is nonempty since it contains at least one of the following:
 a = a · 1 + b · 0,   −a = a · (−1) + b · 0,   b = a · 0 + b · 1,   −b = a · 0 + b · (−1).
Hence, S has a least element, which we will call s. Fix some x, y ∈ Z so that
s = ax + by. Note that s > 0.
   Let d = GCD(a, b). Then d | a and d | b, so by Theorem 7.15, d | ax + by, and we
have that d | s. Hence, d ≤ s.
   Now we use the division algorithm to write a = qs + r with 0 ≤ r < s. Then
                  r = a − qs = a − q(ax + by) = a(1 − qx) + b(−qy)
is an integral linear combination of a and b. If r were positive, then r would be an
element of S that is smaller than s (which would contradict the minimality of s).
Hence, r must be 0. Therefore a = qs and we see that s | a. A similar argument
shows that s | b. Since s is a common divisor of a and b, it cannot be larger than the
greatest common divisor d. Hence, s ≤ d.
    Combining the facts that d ≤ s and s ≤ d, we see that d = s.
Example 18.6. Let a = 6 and b = 9. The theorem asserts that 3 = GCD(6, 9)
should be the smallest positive linear combination of a and b.
    We see that 3 = 6(−1)+9(1) is indeed a linear combination. If we had 2 = 6x+9y,
then since 3 | 6 and 3 | 9, we would have 3 | 2, a contradiction. Similarly, 1 cannot be
a linear combination of a and b. Therefore 3 is indeed the smallest positive linear
combination and is the GCD.
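    The theorem can be illustrated by a direct search. The sketch below looks at every combination ax + by with x and y in a finite window and reports the smallest positive value. The window size is an assumption of the sketch: the search only finds the true minimum when the window contains a suitable pair (x, y), which is the case for the small examples used here. The function name is our own.

```python
from math import gcd


def smallest_positive_combination(a, b, bound=20):
    """Smallest positive a*x + b*y with -bound <= x, y <= bound."""
    values = (a * x + b * y
              for x in range(-bound, bound + 1)
              for y in range(-bound, bound + 1))
    return min(v for v in values if v > 0)


for a, b in [(6, 9), (16, 21)]:
    print(a, b, smallest_positive_combination(a, b), gcd(a, b))
```

    In both cases the smallest positive combination found by the search agrees with the GCD.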
The bottom right equation then expresses r_n as a linear combination of the previous
two remainders, r_{n-1} and r_{n-2}. We replace r_{n-1} in this equation by the integral linear
combination expressed in the equation on the preceding line, so
We now perform a similar replacement of r_{n-2} by the linear combination of r_{n-3} and
r_{n-4}, found on the preceding line. Repeating this process until we use all of the
equations in the right column, we have r_n written as a linear combination of a and b.
   We demonstrate how this works with a couple of examples.
Example 18.7. We find GCD(493, 391), and write it as 493x+391y for some x, y ∈ Z.
   We perform the Euclidean algorithm, and solve each of the resulting equations for
the remainder.
The last nonzero remainder is 17, so we know that GCD(493, 391) = 17.
   Now we see that 17 = 102 − 1 · 85, from the bottom right equation. Looking at
the preceding equation, we see an expression for 85 that we plug into this equation,
so
                            17 = 102 − 1 · (      85     )
                               = 102 − 1 · (391 − 3 · 102)
                               = 102 − 1 · 391 + 3 · 102
                               = 4 · 102 − 1 · 391.
Going one equation higher, we see an expression for 102; namely, 102 = 493 − 1 · 391.
We plug this into our expression for 17,
                           17 = 4 · (    102       ) − 1 · 391
                              = 4 · (493 − 1 · 391) − 1 · 391
                              = 4 · 493 − 4 · 391 − 1 · 391
                              = 4 · 493 − 5 · 391,
  Advice 18.8. Probably the most difficult part of this algorithm is the temp-
  tation to oversimplify the expression for the GCD. Taken to the extreme, each
  expression for 17 above can be simplified to equal 17. It is important to keep
  track of the remainders (perhaps by underlining them) and treat them as if they
  were variables rather than numbers.
Example 18.9. We will now find the GCD of 221 and 136, and write it as an integral
linear combination of 221 and 136.
    We perform the Euclidean algorithm, and solve each of the resulting equations for
the remainder. In order to remind ourselves to treat the original numbers and the
remainders as if they were variables, we will underline them.
The last nonzero remainder is 17, and we have 17 = 51 − 1 · 34 (from the bottom
equation on the right). The previous equation is 34 = 85 − 1 · 51. Substituting for
34, we obtain
                             17 = 51 − 1 · (     34    )
                                = 51 − 1 · (85 − 1 · 51)
                                = 51 − 1 · 85 + 1 · 51
                                = 2 · 51 − 1 · 85,
where we have been careful to treat underlined numbers as variables, and not combine
them with other numbers.
   The equation we use to substitute for 51 is 51 = 136 − 1 · 85.
                           17 = 2 · (    51      ) − 1 · 85
                              = 2 · (136 − 1 · 85) − 1 · 85
                              = 2 · 136 − 2 · 85 − 1 · 85
                              = 2 · 136 − 3 · 85.
Finally, the first equation gives 85 = 221 − 1 · 136, and substituting for 85 we obtain
                          17 = 2 · 136 − 3 · (     85      )
                             = 2 · 136 − 3 · (221 − 1 · 136)
                             = 2 · 136 − 3 · 221 + 3 · 136
                             = 5 · 136 − 3 · 221.
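The back-substitution in Examples 18.7 and 18.9 can also be carried out mechanically. The following Python sketch is one possible way to do this; the function name `extended_gcd` and the recursive formulation are illustrative choices of ours rather than notation from the text, and the sketch assumes nonnegative inputs that are not both zero.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = GCD(a, b) and g = a*x + b*y.

    Illustrative helper (hypothetical name); assumes a, b >= 0, not both zero.
    """
    if b == 0:
        # GCD(a, 0) = a = a*1 + 0*0
        return a, 1, 0
    # One step of the Euclidean algorithm: a = q*b + r with 0 <= r < b.
    q, r = divmod(a, b)
    # Recursively write g = b*x1 + r*y1, then substitute r = a - q*b.
    g, x1, y1 = extended_gcd(b, r)
    return g, y1, x1 - q * y1


for a, b in [(493, 391), (221, 136)]:
    g, x, y = extended_gcd(a, b)
    print(f"GCD({a}, {b}) = {g} = {a}*({x}) + {b}*({y})")
```

For these two inputs the sketch returns the same coefficients as the hand computations: 17 = 4 · 493 − 5 · 391 and 17 = 5 · 136 − 3 · 221. The substitution in the last line of the function is exactly the "treat the remainder as a variable" step from Advice 18.8.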
Example 18.12. Since GCD(15, 7) = 1, the numbers 15 and 7 are relatively prime.
On the other hand, GCD(5, 30) = 5, so 5 and 30 are not relatively prime.
Proof. Let a, b, c ∈ Z. Assume that a | bc and GCD(a, b) = 1. Since a | bc, we see that
bc = ak for some k ∈ Z. Also, for some x, y ∈ Z, we have 1 = ax + by. Multiplying
this last equation by c, we obtain
                    c = acx + bcy = acx + (ak)y = a(cx + ky).
Since cx + ky ∈ Z, we see that a | c.
   The next theorem gives us sufficient conditions under which we can expect the
product of two numbers to divide into another number.
138                                      CHAPTER V. THEORY OF THE INTEGERS
Proof. Let a, b, c ∈ Z. Assume that a | c and b | c and GCD(a, b) = 1. Then for some
k, ℓ, x, y ∈ Z, we have c = ak, c = bℓ, and 1 = ax + by. Multiplying this last equation
by c, we get
                    c = cax + cby = (bℓ)ax + (ak)by = ab(xℓ + ky).
Since xℓ + ky ∈ Z, we see that ab | c.
  Warning 18.17. Note that neither of the previous two theorems is true if we
  replace the assumption GCD(a, b) = 1 with a ∤ b. For the first theorem, taking
  a = 4, b = 6, and c = 2, we have that 4 | (6 · 2), and 4 ∤ 6, but it is not the case
  that 4 | 2.
     For the second theorem, taking a = 12, b = 18, and c = 36, we see that both
  a and b divide 36, but ab ∤ 36. (Can you find simpler counterexamples?)
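The two counterexamples in Warning 18.17 are easy to confirm numerically. The short Python check below is only a sketch; the helper name `divides` is ours, and `gcd` comes from the standard `math` module.

```python
from math import gcd


def divides(a, b):
    """Return True when a | b (assumes a is a nonzero integer)."""
    return b % a == 0


# First theorem, with GCD(a, b) = 1 weakened to "a does not divide b".
a, b, c = 4, 6, 2
first = (divides(a, b * c), divides(a, b), divides(a, c), gcd(a, b))
print(first)

# Second theorem, with the same weakened hypothesis.
a, b, c = 12, 18, 36
second = (divides(a, c), divides(b, c), divides(a, b), divides(a * b, c), gcd(a, b))
print(second)
```

In both cases the last entry shows that GCD(a, b) is larger than 1, which is the hypothesis that fails.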
18.D     Exercises
Exercise 18.1. For each pair of numbers a and b below, calculate GCD(a, b) and
find x, y ∈ Z such that GCD(a, b) = ax + by.
  (a) Take a = 15 and b = 27.
  (b) Take a = 29 and b = 23.
  (c) Take a = 91 and b = 133.
  (d) Take a = 221 and b = 377.
Exercise 18.2. Let a, n ∈ Z. Assume that GCD(a, n) = 1. Prove that there is some
b ∈ Z such that ab ≡ 1 (mod n).
    (Hint: Use Theorem 18.10.) This result says that there is an element b which acts
like the reciprocal of a, modulo n.
Exercise 18.3. The following exercise proves the existence and uniqueness of the
lowest terms representation of a rational number.
  (a) Let a, b ∈ Z, not both zero, and let d = GCD(a, b). Prove that
                                     GCD(a/d, b/d) = 1.
19        Prime numbers
Now that we have a good understanding of divisibility in the integers we are prepared
to define and study the multiplicative building blocks of the integers. As far as
multiplication is concerned, prime numbers are the “atoms” from which other integers
are formed.
  Definition 19.1. A prime number is an integer p > 1 such that the only positive
  divisors of p are 1 and p. An integer n > 1 that is not prime is said to be
  composite.
Example 19.2. We know that all positive divisors of an integer n are between 1 and
n, so we may easily check whether a given integer is prime. The integer 2 is prime,
since there are no integers between 1 and 2. The integer 3 is prime, since it is not
divisible by 2. Similarly 5 is prime, since it is not divisible by 2, 3, or 4. Note that 4
is not prime, since it is divisible by 2.
    The first few prime numbers are
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73.
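The check described in Example 19.2 translates directly into code. The following Python sketch tests Definition 19.1 by brute force; the name `is_prime` is an illustrative choice, and no attempt is made at efficiency.

```python
def is_prime(n):
    """Check Definition 19.1 directly: n > 1 and no divisor d with 1 < d < n."""
    if n <= 1:
        return False
    return all(n % d != 0 for d in range(2, n))


primes = [n for n in range(2, 74) if is_prime(n)]
print(primes)
```

Running this reproduces the list of primes up to 73 given above.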
  Theorem 19.3. Let a ∈ Z with a > 1. If a is composite, then there are positive
  integers b and c, both strictly between 1 and a, such that a = bc.
Proof. Since a is not prime, it has a positive divisor b with 1 < b < a. Fix c = a/b.
Since b > 1, we see that c < a. Also, since b < a and b is positive, dividing by b we
see that 1 < a/b = c. Hence, 1 < c < a. Finally bc = b(a/b) = a.
   When we combine Theorem 19.4 with Theorem 18.13 we obtain the following
important description of prime numbers. The implication (1) ⇒ (2) in the following
theorem is known as Euclid’s Lemma, since Euclid proved it in the Elements.
Proof. (1) ⇒ (2): Assume that a is a prime number, and that a | bc and a ∤ b. Then
GCD(a, b) = 1, so by Theorem 18.13 we see that a | c.
     (2) ⇒ (1): Working contrapositively, suppose that a is not prime, so that (1) is
false. Then a is composite, so a = bc for some integers b, c between 1 and a. Now
a | bc and a ∤ b and a ∤ c (since b and c are positive and smaller than a). Hence, (2) is
false.
   In Exercise 19.2 you will use induction to prove the following extension of Euclid’s
Lemma.
  Theorem 19.6. Let p be a prime number, let n be a natural number, and let
  a1 , . . . , an ∈ Z. If
                               p | a1 a2 · · · an
  then p | ai for some 1 ≤ i ≤ n.
S = {b ∈ Z≥2 : b | a}
    Next, we prove that every positive integer is a product of primes. Note that in
this theorem we allow the possibility that a number (namely 1) can be a product of
zero primes, or (if it is prime) a product of a single prime.
n = p1 p2 · · · pr
Proof. We have already seen that n can be written as a product of one or more primes.
We order the primes so that they are in nondecreasing order. All that remains to be
proved is the uniqueness statement, which we will prove by strong induction.
   Let P (n) be the open sentence
Base Case: Clearly, P (2) is true; 2 is prime, so the only way to factor it into
2 = p1 · · · pr is to have r = 1 and p1 = 2. A similar argument works for any prime p;
hence P (p) is true.
   Inductive Step: Assume that P (2), . . . , P (k) are each true for some k ≥ 2. In
other words, assume each integer from 2 to k has a unique prime factorization. We
wish to prove that P (k + 1) is true. We divide the proof into two cases.
19. PRIME NUMBERS                                                                                  143
k + 1 = p1 p2 · · · pm = q1 q2 · · · qℓ
has two prime factorizations, with p1 , p2 , . . . , pm , q1 , q2 , . . . , qℓ all prime, and such that
p1 ≤ p2 ≤ · · · ≤ pm and q1 ≤ q2 ≤ · · · ≤ qℓ . Note that p1 | k + 1, so
p1 | q1 q2 · · · qℓ .
Hence, by Exercise 19.2, we have that p1 | qi for some i. Since qi is prime and p1 ≠ 1,
we must have p1 = qi . This yields q1 ≤ qi = p1 .
    By a similar argument p1 ≤ q1 . Thus, q1 = p1 .
    Now, (k + 1)/p1 = p2 · · · pm = q2 · · · qℓ . Since 2 ≤ (k + 1)/p1 ≤ k, we see
that (k + 1)/p1 has a unique factorization into primes, by our inductive hypothesis.
Hence, m = ℓ and each pi = qi for i from 2 to m. Since p1 = q1 , we see that the
two factorizations that we had for k + 1 were identical. Hence, k + 1 has a unique
factorization into primes, and P (k + 1) is true.
    Thus, by induction, every integer greater than 1 has a unique factorization into
primes.
Remark 19.10. We note that typically the factorization of a number into primes
will be simplified by combining copies of the same prime together. For instance, if we
wish to factor 720, rather than writing 720 = 2 · 2 · 2 · 2 · 3 · 3 · 5 we might write
720 = 2^4 · 3^2 · 5.
In general, we can write
    n = p1^{a1} p2^{a2} · · · pk^{ak} = ∏_{i=1}^{k} pi^{ai},
with k ∈ N, each pi prime, p1 < p2 < · · · < pk , and with each ai ∈ N. (Note that the
symbol ∏_{i=1}^{k} works just like the symbol ∑_{i=1}^{k}, except for multiplication instead of
addition.) A factorization of this form has a special name as in the next definition.
   with p1 < · · · < pk each prime and with every ai ∈ N is called the canonical
   factorization of n.
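A canonical factorization can be computed by repeatedly dividing out the smallest remaining divisor. The Python sketch below is one simple way to do this; the function name `canonical_factorization` and the list-of-pairs output format are our own choices, and the sketch assumes an integer n ≥ 2.

```python
def canonical_factorization(n):
    """Return [(p1, a1), ..., (pk, ak)] with p1 < ... < pk prime and each ai >= 1.

    Illustrative trial-division sketch; assumes n is an integer with n >= 2.
    """
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            exponent = 0
            # Divide out every copy of p and count how many there were.
            while n % p == 0:
                n //= p
                exponent += 1
            factors.append((p, exponent))
        p += 1
    if n > 1:
        # What remains has no divisor up to its square root, so it is prime.
        factors.append((n, 1))
    return factors


print(canonical_factorization(720))
```

For n = 720 this yields the pairs (2, 4), (3, 2), (5, 1), matching the factorization in Remark 19.10. Because candidate divisors are tried in increasing order, the primes automatically come out sorted.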
Remark 19.15. We can test this proof in specific situations by selecting any finite
set of primes that we wish to consider, and constructing a prime not in that set. For
instance, let S = {2, 3, 5, 7, 11}. Then N = 2311. In this case, N is prime and N ∉ S.
    Now suppose that S = {2, 3, 5, 7, 11, 13, 17, 19}. Then N = 9699691 = 347 · 27953.
Both 347 and 27953 are primes not in S.
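The computations in Remark 19.15 can be repeated for any finite set of primes. The Python sketch below assumes that N is the product of the elements of S plus 1, which agrees with both values of N quoted in the remark; the helper names `smallest_prime_factor` and `prime_factors` are hypothetical.

```python
from math import prod


def smallest_prime_factor(n):
    """Smallest divisor of n that is greater than 1 (assumes n >= 2)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def prime_factors(n):
    """Distinct prime factors of n in increasing order (assumes n >= 2)."""
    found = []
    while n > 1:
        p = smallest_prime_factor(n)
        found.append(p)
        while n % p == 0:
            n //= p
    return found


for S in [{2, 3, 5, 7, 11}, {2, 3, 5, 7, 11, 13, 17, 19}]:
    N = prod(S) + 1  # assumed construction: product of S, plus 1
    new_primes = prime_factors(N)
    print(N, new_primes, all(p not in S for p in new_primes))
```

The second case illustrates that N itself need not be prime; what matters is that each of its prime factors lies outside S.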
19.D      Exercises
Exercise 19.1. For each of the following integers n, give its canonical prime factor-
ization.
    (a) n = 27.  (b) n = 3072.      (c) n = 60.
p | a1 a2 · · · an
Exercise 19.3. Let n > 1 be a natural number. Prove that the smallest divisor d of
n that is greater than 1 is prime.
Exercise 19.4. The goal of this exercise is to prove that there are infinitely many
primes which are congruent to −1 modulo 3. We will do this in a series of steps.
 (a) Prove that, with only one exception, every prime number is congruent to either
      1 or −1 modulo 3.
 (b) Prove that for any n ∈ N and any a1 , . . . , an ∈ Z, if each ai ≡ 1 (mod 3), then
      the product a1 a2 · · · an ≡ 1 (mod 3). (Use induction.)
  (c) Suppose that N ∈ N, and N ≡ −1 (mod 3). Prove that N is divisible by
      some prime p such that p ≡ −1 (mod 3). (Hint: Can N be divisible by the
      exceptional prime mentioned in part (a)? If not, can all its prime factors be
      congruent to 1 modulo 3? If not, what option remains?)
 (d) Prove that there are infinitely many primes p that are congruent to −1 modulo
      3. (Hint: Let {p1 , . . . , pn } be any finite set of primes that are congruent to −1
      modulo 3. Mimic the proof of Theorem 19.14, using 3p1 · · · pn − 1 in place of
      N .)
Exercise 19.5. Prove that there are infinitely many primes p such that
p ≡ −1 (mod 4).
(Hint: Do steps (a) through (d) of the previous exercise with 3 replaced by 4 every-
where.)
Chapter VI
Relations
148                                                      CHAPTER VI. RELATIONS
20        Properties of relations
20.A       What is a relation?
There are many relations that occur in mathematics. For instance, < on the real
numbers is a relation. Inclusion, ⊆, is a relation on sets. Divisibility on the natural
numbers is a relation. The following definition of a relation encompasses all of these
examples and others.
Under this relation, two words are related to each other exactly when they have the
same length. Thus “tree” is related to “yaks,” but “awesome” is not related to “gum.”
We might name this relation the “have the same length” relation.
      Sometimes we can define relations using symbols other than R. For instance:
Example 20.5. Let A = {1, 2, 3, 4, 5}, and let B = P(A). We define a relation from
A to B using the following set of ordered pairs:
R = {(a, X) ∈ A × B : a ∈ X}.
R = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (3, 4), (4, 3), (4, 4), (5, 5)}.
    There are many different ways to define a relation. One option is to simply list
all the possible ordered pairs in R. Another option we have seen is to write R using
set-builder notation. For instance, we defined the “have the same length” relation this
way. Sometimes set-builder notation is too clunky, and so we express the definition
of the relation in words. The following example shows how this is commonly done.
Example 20.7. Recall the relation defined in Example 20.3,
R = {(a, b) ∈ N × N : a − b = 2}.
To define the same relation, we could have instead said the following:
This is shorthand for the more complete sentence: “Let R be the relation on the
natural numbers, N, defined as the set of ordered pairs (a, b) ∈ N × N satisfying
a − b = 2.” When we define a relation R by a condition (such as a − b = 2) that
condition tells us exactly when we should expect a to relate to b.
  Warning 20.8. As with any definition, when we define a relation R with the
  word “if,” the proper interpretation is “if and only if.” For instance, in the
  previous example, the definition really means “aRb if and only if a − b = 2.”
  This is a standard convention of mathematical language that can take some
  getting used to.
R = {(x, y) ∈ R × R : x − y is negative}.
Because R × R is just R^2 we can graph the set R in the coordinate plane, as follows.
Notice that xRy means the same thing as x < y. In fact, this is the mathematical
relation “less than,” and it is usually just denoted by the symbol “<” rather than
being called R.
    Often, for commonly used relations, we will find that there is a standard symbol.
If there is not, we might use a nonstandard symbol, in which case we must be careful
to define what the symbol means.
    Some other standard symbols for relations on the real numbers are
≤, ≥, >, =, ≠.
In Exercise 20.2 you will be asked to graph the set R ⊆ R × R corresponding to each
of these relations.
    We end with one more example of a relation that is denoted with a standard
symbol.
Example 20.10. Let A be the set of all compound sentences formed from P and Q.
Define a relation R on A by xRy if x is logically equivalent to y. This relation R is
usually written ≡.
Example 20.12. Consider the “equality” relation on C. Which of the four properties
above hold for this relation?
   It is reflexive, since given any a ∈ C we know a = a. It is also symmetric, since
given a, b ∈ C if a = b then b = a. It is transitive; for given a, b, c ∈ C if we assume
a = b and b = c, then a = c. Finally, it is antisymmetric. (Can you fill in the proof?
Let a, b ∈ C. Assume a = b and b = a. Conclude a = b.)
Example 20.13. Consider the relation < on R. We will show that < is not reflexive,
is not symmetric, is transitive, and is antisymmetric.
    (Not reflexive): Fix 0 ∈ R. We have 0 ≮ 0.
    (Not symmetric): Fix 0, 1 ∈ R. We have 0 < 1 but 1 ≮ 0.
    (Transitive): Let a, b, c ∈ R. Assume a < b and b < c. It follows that a < c.
    (Antisymmetric): Let a, b ∈ R. Assume a < b and b < a. This is impossible,
so the implication is vacuously true.
Example 20.14. Let A = {1, 2, 3, 4}. Define a relation R on A by
        R = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3), (4, 4)}.
We will prove that R is reflexive, symmetric, and transitive, but not antisymmetric.
We can do this by just checking every possibility.
    (Reflexive): To see that R is reflexive, we note that each of the four elements of
A relates to itself; this is because (1, 1), (2, 2), (3, 3), and (4, 4) are in R.
    (Symmetric): To see that R is symmetric, we note that for any pair in R,
reversing the order of the elements in the pair yields another pair in R; for example,
(1, 2) ∈ R, and reversing the elements, (2, 1) is also in R. (What if a = 1 and b = 4?
In that case the implication is vacuously true, since the premise is false.)
    (Transitive): Transitivity is harder to see. Technically, you have 4 options for
each of a, b, and c, giving a total of 64 cases. Most of those cases involve a false
premise (hence are vacuously true). Let’s do a case that is not vacuous. Notice that
(1, 2) ∈ R. In addition, there are three elements c such that (2, c) ∈ R, namely
c = 1, 2, 3. Since each of (1, 1), (1, 2), and (1, 3) are in R, we see that whenever
(1, 2) ∈ R and (2, c) ∈ R, we have (1, c) ∈ R. Repeating this process with each
element of R in place of (1, 2), we see that R is transitive.
    (Not antisymmetric): Fix a = 1 and b = 2 in R. We find 1R2 and 2R1, but
1 ≠ 2.
Remark 20.15. Notice that when a relation R is given explicitly by a set, it can be
difficult to check transitivity. As we will see in the following examples, transitivity is
often easier to check when R is defined by a rule.
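When a relation on a finite set is given as an explicit list of pairs, the case checking described in Example 20.14 is tedious by hand but easy to automate. The Python sketch below tests the four properties by brute force; the function names are illustrative, and a relation is represented (by assumption) as a Python set of ordered pairs.

```python
def is_reflexive(A, R):
    return all((x, x) in R for x in A)


def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)


def is_transitive(R):
    # Only chains (x, y), (y, z) in R need checking; other cases are vacuous.
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)


def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)


# The relation from Example 20.14.
A = {1, 2, 3, 4}
R = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3),
     (3, 1), (3, 2), (3, 3), (4, 4)}

print(is_reflexive(A, R), is_symmetric(R), is_transitive(R), is_antisymmetric(R))
```

The checks skip the vacuous cases automatically, because they only loop over pairs that actually belong to R. For the relation of Example 20.14 the first three properties hold and antisymmetry fails, in agreement with the example.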
Example 20.16. Let R be the relation on R given by aRb if b − a ∈ [0, ∞). This
relation is really just ≤, so we will write it using that symbol. Let’s check each of the
four properties.
    (Reflexive): Let a ∈ R. We have a ≤ a.
    (Not symmetric): Fix a = 0 and b = 1 in R. We have 0 ≤ 1 but 1 ≰ 0.
    (Transitive): Let a, b, c ∈ R. Assume a ≤ b and b ≤ c. Then a ≤ c.
    (Antisymmetric): Let a, b ∈ R. Assume a ≤ b and b ≤ a. This forces a = b.
  Property          Definition                             Negation
  Reflexive         ∀x ∈ A, xRx                            ∃x ∈ A, x R̸ x
  Symmetric         ∀x, y ∈ A, xRy ⇒ yRx                   ∃x, y ∈ A, xRy ∧ y R̸ x
  Transitive        ∀x, y, z ∈ A, (xRy ∧ yRz) ⇒ xRz        ∃x, y, z ∈ A, xRy ∧ yRz ∧ x R̸ z
  Antisymmetric     ∀x, y ∈ A, (xRy ∧ yRx) ⇒ x = y         ∃x, y ∈ A, xRy ∧ yRx ∧ x ≠ y
Example 20.17. Let U be a nonempty set, and let S = P(U ). We can define a
relation R on S by ARB if A ⊆ B (where A, B ∈ S are subsets of U ). Then R is
easily seen to be reflexive (since every set is a subset of itself), transitive (since A ⊆ B
and B ⊆ C implies A ⊆ C), and antisymmetric (since A ⊆ B and B ⊆ A implies
A = B), but not symmetric. (Can you prove this last statement?)
Example 20.18. Let A be any nonempty set, and let R = ∅ be the empty relation
on A. We note that R is not reflexive (since, for a ∈ A, we have a R̸ a). On the
other hand, R is symmetric, transitive, and antisymmetric, since the implications
defining these properties are vacuously true for R (since no elements of A are related
by R).
Example 20.19. Let A be any nonempty set, and let R = A × A. Then, for any
a, b ∈ A, we have aRb. The relation R is easily seen to be reflexive, symmetric, and
transitive. It is antisymmetric if and only if |A| = 1.
Example 20.20. Let A = N, and for a, b ∈ N, define aRb if, for some x ∈ N,
ax = b. Note that saying that aRb is the same as saying that a | b; this relation is just
divisibility, so we will write a | b instead of aRb. Before reading further, try to decide
which of the four properties hold for this relation.
   (Reflexive): Given any a ∈ N, we know a | a (because a · 1 = a).
   (Transitive): If a | b and b | c then a | c, see Proposition 7.13.
   (Antisymmetric): In addition, R is antisymmetric, since a | b and b | a implies
that a = b. (See Corollary 17.2. Note that the relation in this example is defined
on N, not Z. The divisibility relation on Z is not antisymmetric; take a = 1 and
b = −1.)
   (Not symmetric): Note that 1 | 2 but 2 ∤ 1.
20.C      Exercises
Exercise 20.1. Let A = {1, 2, 3, 4, 5, 6} and let
        R = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 3), (2, 5), (2, 6), (3, 5), (4, 5), (4, 6)}.
20. PROPERTIES OF RELATIONS                                                           153
Exercise 20.8. Let A = {1, 2, 3, 4, 5} and let S = {{1, 2}, {4, 5}}. Define
21      Equivalence relations
Knowing that we have a relation R on a set A tells us very little about either R or A.
We can find many examples of relations on A, but without further information there
is very little that we can say about a relation.
    Placing conditions (such as “reflexive”, “symmetric”, and so forth) on a relation
reduces the number of examples that we can find, but increases the amount that we
know and can prove about each example.
    One class of relations that has proven useful to mathematicians is the class of
equivalence relations. Equivalence relations have enough conditions that we can prove
useful theorems about them, but they are general enough that there are many exam-
ples of them in all areas of mathematics.
  Equality is the prototypical example of an equivalence relation, but there are many
more examples.
R = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3), (4, 4)},
Example 21.3. Let A = Z and for a, b ∈ A, we let aRb if a and b have the same
parity. One easily checks that this relation is reflexive (an integer has the same parity
as itself), symmetric (if a and b have the same parity, then so do b and a), and
transitive (if a and b have the same parity, and b and c have the same parity, then a
and c have the same parity). Hence, R is an equivalence relation.
Example 21.4. Let A = R and for a, b ∈ A, we let aRb if |a| = |b|. Again, one
checks easily that this is reflexive (|a| = |a|), symmetric (|a| = |b| clearly implies that
|b| = |a|), and transitive (if |a| = |b| and |b| = |c|, then |a| = |c|). Hence R is an
equivalence relation.
Example 21.5. Let A be the set of all triangles. For a, b ∈ A, we let aRb if a is
similar to b. (Recall from geometry that two triangles are similar if they have the
same interior angles.) One sees easily that this is an equivalence relation.
    In many cases, equivalence relations are written using symbols other than R to
indicate that two elements are related. A common symbol to use for a generic equiv-
alence relation is ∼, which can be read “is equivalent to.” Other symbols that might
be used to represent equivalence relations include ≡, ≅, ≈, ≊, ≃, ≏. Using these
symbols for a relation that is not an equivalence relation can cause confusion. Note
that these symbols can have many different meanings, so if you use them it is impor-
tant to say what they mean. For instance, ≡ could mean congruence modulo n or
it could mean logical equivalence (or it could have another meaning) depending on
where it occurs.
Example 21.6. Let A = R × R. Define a relation ∼ on A by (a, b) ∼ (c, d) if
a^2 + b^2 = c^2 + d^2 . We check that ∼ is an equivalence relation.
    Note that for (a, b) ∈ R × R, we have a^2 + b^2 = a^2 + b^2 . Hence, (a, b) ∼ (a, b), so
∼ is reflexive.
    Suppose that (a, b) ∼ (c, d). Then a^2 + b^2 = c^2 + d^2 , so c^2 + d^2 = a^2 + b^2 , and
(c, d) ∼ (a, b). Hence, ∼ is symmetric.
    Finally, assume that (a, b) ∼ (c, d) and (c, d) ∼ (e, f ). Then a^2 + b^2 = c^2 + d^2 and
c^2 + d^2 = e^2 + f^2 , so a^2 + b^2 = e^2 + f^2 , and we see that (a, b) ∼ (e, f ). Hence ∼ is
transitive.
    Therefore, ∼ is an equivalence relation.
Example 21.7. Let P , Q, R be statements and let A be the set of all compound
statements formed from P , Q, and R. For a, b ∈ A, we let a ≡ b if a is logically
equivalent to b. One may check that ≡ is an equivalence relation.
Example 21.8. Let A = {1, 2, 3, 4, 5} and let S = {{1, 2}, {3, 4}, {5}}.
  Define a relation R on A by
(21.9)       R = {(a, b) ∈ A × A : for some X ∈ S, both a ∈ X and b ∈ X}.
This is the same relation that we constructed in Example 20.6, where we saw that
(21.10)        R = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (3, 4), (4, 3), (4, 4), (5, 5)}.
We will now prove that R is an equivalence relation. We note two facts about S that
will be important: (1) every element of A is a member of some element of S, and
(2) the elements of S are disjoint sets; no two of them share an element of A. We
will prove that R is an equivalence relation using (21.9), rather than the explicit list
(21.10).
    (Reflexive): Let a ∈ A. There is some X ∈ S such that a ∈ X. Hence,
(a, a) ∈ R, so aRa. Therefore, R is reflexive.
    (Symmetric): Let a, b ∈ A and assume aRb. Then (a, b) ∈ R, so for some X ∈ S
we have a, b ∈ X. Hence, b, a ∈ X, so (b, a) ∈ R, and we see that bRa. Therefore R
is symmetric.
    (Transitive): Let a, b, c ∈ A and assume aRb and bRc. Then there is some X ∈ S
such that a, b ∈ X, and there is some Y ∈ S such that b, c ∈ Y . Since the elements
of S are disjoint, b ∈ X and b ∈ Y implies that X = Y . Therefore, both a and c are
in X, and aRc. Therefore R is transitive.
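The passage from the set-builder description (21.9) to the explicit list (21.10) can be checked directly. In the Python sketch below, the variable names mirror those of Example 21.8, and the collection S is stored as a list of Python sets (an implementation choice of ours).

```python
A = {1, 2, 3, 4, 5}
S = [{1, 2}, {3, 4}, {5}]

# (21.9): a R b exactly when some X in S contains both a and b.
R = {(a, b) for a in A for b in A if any(a in X and b in X for X in S)}

# (21.10): the explicit list of ordered pairs.
explicit = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 3),
            (3, 4), (4, 3), (4, 4), (5, 5)}

print(R == explicit)
```

The same comprehension works for any collection S of subsets of A, although the resulting relation is only guaranteed to be an equivalence relation when S satisfies the two facts used in the proof above.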
21. EQUIVALENCE RELATIONS                                                                       157
[a] = {x ∈ A : a ∼ x}.
    Other common notations for the equivalence class of a are a, â, or ã. These
symbols are used to represent the equivalence classes for many different equivalence
relations; thus, if you use one of these notations, you must define it. Similarly, if
you see such symbols in mathematical writing, you should look to see how they are
defined.
Example 21.12. Let A = {1, 2, 3, 4} and define a relation R on A by
R = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3), (4, 4)}.
[Figure: the elements 1, 2, 3, 4 of A, grouped by equivalence class.]
Example 21.13. Let A = {1, 2, 3, 4, 5, 6}, and define a relation ∼ on A by
     R = {(1, 1), (1, 3), (2, 2), (2, 4), (3, 1), (3, 3),
          (4, 2), (4, 4), (5, 5), (5, 6), (6, 5), (6, 6)}.
    To determine [1], we look for all of the ordered pairs (1, x) ∈ R. We see that the
only x’s which work are 1 and 3. Hence, [1] = {1, 3}.
    Similarly, [2] = {2, 4}, [3] = {1, 3}, [4] = {2, 4}, [5] = {5, 6}, and [6] = {5, 6}. We
note that there are three equivalence classes, since [1] = [3], [2] = [4] and [5] = [6].
Because A is finite, we can easily draw a picture illustrating how the equivalence
classes divide A into three pieces.
[Figure: the elements of A drawn in three pieces, {1, 3}, {2, 4}, and {5, 6}.]
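The procedure used in Example 21.13 (collect every x with (a, x) ∈ R) is easy to express in code. The following Python sketch computes each class and then discards duplicates; the helper name `equivalence_class` is hypothetical, and classes are stored as frozensets so that equal classes can be identified.

```python
A = {1, 2, 3, 4, 5, 6}
R = {(1, 1), (1, 3), (2, 2), (2, 4), (3, 1), (3, 3),
     (4, 2), (4, 4), (5, 5), (5, 6), (6, 5), (6, 6)}


def equivalence_class(a, R):
    """[a] = {x : (a, x) in R}."""
    return frozenset(x for (first, x) in R if first == a)


classes = {a: equivalence_class(a, R) for a in sorted(A)}
distinct = set(classes.values())

for a, cls in classes.items():
    print(f"[{a}] = {sorted(cls)}")
print(len(distinct), "distinct classes")
```

Six class computations produce only three distinct sets, which is the observation [1] = [3], [2] = [4], and [5] = [6] made in the example.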
Example 21.14. Let A = Z, and let ∼ be the relation defined on A by a ∼ b if
a and b have the same parity. We saw in Example 21.3 that ∼ is an equivalence
relation. We will compute the equivalence classes.
    The equivalence class [0] consists of all numbers having the same parity as 0.
Hence, [0] = {even integers}. The equivalence class [1] consists of all numbers having
the same parity as 1. Hence, [1] = {odd integers}.
    Notice that if a is any even integer, [a] = {even integers} = [0], and if a is any
odd integer, [a] = {odd integers} = [1]. Hence, in this case, there are exactly two
equivalence classes, each containing infinitely many elements. Each class also has
infinitely many representatives. For instance, · · · = [−2] = [0] = [2] = [4] = · · · .
Example 21.15. Let A = R, and for a, b ∈ A, we let a ≏ b if |a| = |b|. We saw
in Example 21.4 that ≏ is an equivalence relation. For a ∈ A, we will denote the
equivalence class of a by [a].
    We see that [0] = {x ∈ R : 0 ≏ x} = {x ∈ R : |0| = |x|}. There is only one such
value of x, namely x = 0. Hence, [0] = {0}.
    Now, [1] = {x ∈ R : 1 ≏ x} = {x ∈ R : |1| = |x|}, or in other words the
real numbers with absolute value 1. There are two such numbers: 1 and −1. Hence,
[1] = {1, −1}.
    Moving to negative numbers, [−2] = {x ∈ R : −2 ≏ x} = {x ∈ R : |−2| = |x|},
or in other words, the real numbers with absolute value 2. There are two such
numbers: 2 and −2. Hence, [−2] = {2, −2}.
    In general, if a ∈ R and a ≠ 0, we see that [a] = {a, −a}.
    We see that there are infinitely many different equivalence classes in R, each
having one or two elements.
   We conclude by stating a very simple theorem that is really just a restatement of
the definition of an equivalence class. Nevertheless, the restatement is quite useful to
help us recall how to tell whether an element is in an equivalence class.
21.C     Exercises
Exercise 21.1. Let A = {1, 2, 3} and let R be the relation on A given by
R = {(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 2), (3, 3)}.
Exercise 21.3. Let R be an equivalence relation on the set A = {1, 2, 3, 4, 5}. Assume
that 1R3 and 3R4. Given these conditions, which ordered pairs must belong to R?
(Hint: There are at least 11 such elements.)
Exercise 21.4. Let A = {1, 2, 3, 4, 5} and let S = {{1, 2}, {3, 4}, {5}}. Define
as in Example 21.8. We have seen that R is an equivalence relation. What are the
equivalence classes of R?
Exercise 21.6. Let A be the set of humans with English names. Define a relation
≈ on A by α ≈ β if α and β have the same first letter in their first names. (For
instance, anyone named “Eugene” is related by ≈ to anyone named “Elizabeth”.)
  (a) Prove that ≈ is an equivalence relation on A.
  (b) Determine the equivalence classes of ≈.
   This theorem, along with Theorem 21.16 from the previous section, allows us to
prove three central properties for equivalence classes. Let A be a set and let ∼ be an
equivalence relation on A. The following all hold:
Example 22.3. Let A = {1, 2, 3}. Then the following are partitions of A:
                                 P1   = {{1, 2, 3}},
                                 P2   = {{1, 2}, {3}},
                                 P3   = {{1, 3}, {2}},
                                 P4   = {{2, 3}, {1}},
                                 P5   = {{1}, {2}, {3}}.
[Figure: diagrams of the five partitions P1 through P5 of A.]
P 0 = {S1 , S2 , S3 , S4 }.
This is a partition because every natural number is congruent to exactly one of the
four numbers 1, 2, 3, 4.
Example 22.5. There are fifteen different partitions of the set A = {1, 2, 3, 4}. We
do not list them all, but mention that there is one partition with one part, seven
partitions with two parts, six partitions with three parts, and one partition with
four parts. As an example, {{1, 2}, {3, 4}} is a partition of A with two parts, and
{{1}, {2}, {3, 4}} is a partition of A with three parts.
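The counts quoted in Example 22.5 can be confirmed by generating every partition of a small set. The recursive generator below is one standard way to do this and is offered only as a sketch; the name `partitions` and the representation of a partition as a list of lists are our own choices.

```python
from collections import Counter


def partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + smaller


all_partitions = list(partitions([1, 2, 3, 4]))
parts_count = Counter(len(p) for p in all_partitions)
print(len(all_partitions), sorted(parts_count.items()))
```

Each partition of the smaller set gives rise to partitions of the larger set in exactly the two ways shown in the comments, so no partition is produced twice. The same generator applied to [1, 2, 3] yields the five partitions of Example 22.3.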
22. EQUIVALENCE CLASSES AND PARTITIONS                                               163
   The following theorem asserts that from an equivalence relation we can build a
partition.
P = {[a] : a ∈ A}
is a partition of A.
Proof. We already proved above that the set of equivalence classes satisfies all three
of the defining conditions for a partition.
    The next two examples will demonstrate how we pass from an equivalence relation
to a partition.
R = {(1, 1), (2, 2), (2, 3), (3, 2), (3, 3)}.
We can easily confirm that R is an equivalence relation. The equivalence classes are
[1] = {1} and [2] = {2, 3} = [3]. Hence, the set of equivalence classes is {{1}, {2, 3}},
which is the partition P4 from Example 22.3.
Example 22.8. Consider the “same parity” relation on Z. The two equivalence
classes are the sets of even integers and odd integers. This partitions Z into two
disjoint sets.
   The last theorem in this section (below) allows us to pass from a partition to an
equivalence relation, thereby reversing the direction of the previous theorem.
    Together, Theorems 22.6 and 21.3 tell us that partitions of a set A and equivalence
relations on A correspond to each other; every partition gives an equivalence relation
and every equivalence relation gives a partition. The following example demonstrates
this fact concretely.
In other words, we partition Z into its negative, zero, and positive pieces.
    What is the corresponding equivalence relation? It is the “two elements are related
if they are both positive, both zero, or both negative” relation. More formally,
R = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (4, 4), (4, 5), (5, 4), (5, 5)}.
[1] = [2] = {1, 2}, [3] = {3}, [4] = [5] = {4, 5}.
Then the set {1, 3, 5} is a transversal, since it consists of a single element from each
equivalence class. Other transversals are {1, 3, 4}, {2, 3, 4}, and {2, 3, 5}.
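Since a transversal picks exactly one element from each class, all of them can be listed by taking a product over the classes. A minimal Python sketch (the variable names are illustrative):

```python
from itertools import product

# The three equivalence classes from the example, as sorted lists.
classes = [[1, 2], [3], [4, 5]]

# Each transversal chooses one representative from every class.
transversals = list(product(*classes))
print(transversals)  # [(1, 3, 4), (1, 3, 5), (2, 3, 4), (2, 3, 5)]
```

The count is the product of the class sizes, here 2 · 1 · 2 = 4, which matches the four transversals named in the example.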
  Advice 22.14. There are typically two steps to proving that a given set T is a
  transversal of an equivalence relation ∼ on a set A.
      First, you need to prove that every element of A is related to at least one
  element of T . Often this can be done by giving an explicit construction; for any
  element a of A, construct an element t ∈ T such that a ∼ t.
      Second, you need to prove that every element of A is related to at most
  one element of T . This can be done directly, by assuming, for t1 , t2 ∈ T , that
  a ∼ t1 and a ∼ t2 , and proving that t1 must equal t2 ; it can also be done
  simply by showing that any two elements of T that are related to each other are
  actually equal. (The reader should convince themselves that either of these two
  statements suffices.)
    Let (a, b) ∈ R×R. Then (0, b−a) ∈ T , and (a, b) ∼ (0, b−a), since a+(b−a) = b+0.
Hence, every element of R × R is related to at least one element of T .
    Now, suppose that (0, d1 ) ∼ (0, d2 ). Then 0 + d2 = d1 + 0, hence d1 = d2 . So
(0, d1 ) = (0, d2 ). This shows that no element of R × R is related to more than one
element of T . Therefore T is a transversal.
22.D      Exercises
Exercise 22.1. Let A = {1, 2, 3, 4}. List one partition of A with one part, seven
partitions of A with two parts, six partitions of A with three parts, and one partition
of A with four parts. This gives a total of fifteen partitions of A. (There are no more,
but you do not need to prove this.)
Exercise 22.2. Let A be a set with |A| = 10, and let ∼ be an equivalence relation
on A. Denote the equivalence classes of ∼ by [x] for x ∈ A. Suppose that we have
elements a, b, c ∈ A with |[a]| = 3, |[b]| = 5, and |[c]| = 1.
  (a) Are any of a, b, and c related by ∼?
  (b) How many equivalence classes for ∼ are there in A?
Exercise 22.3. Define a relation ∼ on R² by (a, b) ∼ (c, d) if a² + b² = c² + d². We
saw in Example 21.6 that ∼ is an equivalence relation.
  (a) Describe the equivalence class [(3, 4)], both as a set and geometrically.
  (b) For an arbitrary element (a, b) ∈ R², describe [(a, b)].
  (c) Prove that the set [0, ∞) × {0} is a transversal of ∼.
Exercise 22.4. Let W be the set of all words in the English language. Define a
relation on W by α ≈ β if α and β have the same first letter.
  (a) Prove that ≈ is an equivalence relation.
  (b) Let [α] be the equivalence class of α ∈ W . For α = “cat”, list six elements of
      [α].
  (c) How many equivalence classes are there in W for ≈?
  (d) Describe a transversal of ≈. (You do not need to write it down in full.)
Exercise 22.5. Let A be a set with n elements. Define a relation ∼ on P(A) by
X ∼ Y if |X| = |Y |, for any X, Y ∈ P(A).
 (a) Prove that ∼ is an equivalence relation.
 (b) Describe the equivalence classes for ∼.
 (c) How many equivalence classes are there for ∼?
 (d) Describe a transversal of ∼.
 (e) How many elements of P(A) are in each equivalence class?
Exercise 22.6. Let A = {1, 2, . . . , 10}. For i ∈ A, define
Exercise 22.7. Give another proof of Theorem 22.1, by proving that (1) implies (2),
that (2) implies (3), and that (3) implies (1). Note that your proof should not use
any theorems after or including Theorem 22.1; it should only use basic properties of
equivalence relations and equivalence classes.
Exercise 22.8. (Harder problem.) It is claimed above that every equivalence relation
corresponds uniquely to a partition. Prove the final piece of that claim, by showing
the following: Let A be a set. Let ∼ and ≈ be two equivalence relations. Show that
if their equivalence classes are the same, then the relations are the same. (In other
words, conclude that for all a, b ∈ A, we have a ∼ b if and only if a ≈ b.)
    (Hint: For ease of notation, write the equivalence classes for ∼ using [a], and the
equivalence classes for ≈ using a.)
168                                                                  CHAPTER VI. RELATIONS
23     Integers modulo n
In this section we will study integer congruence and prove it is an equivalence relation.
Proof. We first prove that the only congruence classes are the ones listed in the
theorem. Let a ∈ Z. By the division algorithm a = qn + r for some q, r ∈ Z and
0 ≤ r < n. Thus a − r = qn, so a ≡ r (mod n). Hence, $\overline{a} = \overline{r}$ is one of the n
congruence classes listed in the theorem.
   It remains to show that no two congruence classes listed in the theorem are equal.
Suppose that i, j ∈ Z, with 0 ≤ i < n and 0 ≤ j < n such that i ≠ j. We will show
that $\overline{i} \neq \overline{j}$, by showing that i ≢ j (mod n). Assume, without loss of generality, that
0 ≤ j < i < n. Then 0 < i − j < n, and since there are no multiples of n between 0
and n, we see that n ∤ (i − j). Hence i ≢ j (mod n), so $\overline{i} \neq \overline{j}$.
   Finally, we note that for any a ∈ Z,
                     $\overline{a}$ = {x ∈ Z : x ≡ a (mod n)}
                       = {x ∈ Z : x − a = kn for some k ∈ Z}
                       = {x ∈ Z : x = a + kn for some k ∈ Z}
                       = {a + kn : k ∈ Z}
as claimed.
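The first step of the proof, replacing a by its remainder r, is what Python's `%` operator computes for a positive modulus. The sketch below uses hypothetical helper names and an arbitrary modulus n = 5 chosen only for illustration.

```python
def representative(a, n):
    """Return the r with 0 <= r < n and a ≡ r (mod n), for n > 0."""
    # For a positive modulus, Python's % always returns a value in [0, n),
    # even when a is negative.
    return a % n


def same_class(a, b, n):
    """Return True when a and b lie in the same congruence class mod n."""
    return (a - b) % n == 0


n = 5  # arbitrary modulus for the illustration
print([representative(a, n) for a in (-7, -1, 0, 12)])  # [3, 4, 0, 2]
print(same_class(12, 2, n), same_class(1, 3, n))        # True False
```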
   The set of congruence classes modulo n is so important that we give it a special
symbol.
{0, 1, 2, . . . , n − 1}
23.C     Operations on Zn
Even though the elements of Zn are sets (not numbers), we still want to treat these
elements as if they were numbers. In fact, you have probably already done this
without realizing it. For instance, when you add clock times together you are doing
arithmetic in Z12 . When you use the fact that “odd plus even equals odd” you are
doing arithmetic in Z2 .
    The following theorem will help us make a sensible definition of addition and
multiplication on the set Zn .
Proof. Suppose that a ≡ b (mod n) and that c ≡ d (mod n). Then a − b = nk and
c − d = nℓ for some k, ℓ ∈ Z.
   In order to prove that (1) holds, we examine (a + c) − (b + d) and find
             (a + c) − (b + d) = (a − b) + (c − d) = nk + nℓ = n(k + ℓ),
so a + c ≡ b + d (mod n). Hence, (1) is true.
    To prove that (2) holds, we note that a = b + nk and c = d + nℓ. Multiplying, we
find that
      ac = (b + nk)(d + nℓ) = bd + bnℓ + nkd + n²kℓ = bd + n(bℓ + kd + nkℓ).
Thus ac − bd = n(bℓ + kd + nkℓ), so ac ≡ bd (mod n). Hence, (2) is true.
23. INTEGERS MODULO N                                                              171
   The next example demonstrates how using this theorem can help simplify com-
putations modulo n.
Example 23.9. We have that 28 ≡ 2 (mod 26) and 29 ≡ 3 (mod 26). Hence,
according to the theorem, 28 · 29 ≡ 2 · 3 (mod 26). Indeed, computation shows that
                     28 · 29 = 812 = 31 · 26 + 6 ≡ 6 (mod 26).
Multiplying 2 and 3 is much easier than multiplying 28 and 29. Thus, finding 28 · 29
modulo 26 is very easy. You just multiply 2 · 3 = 6.
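The shortcut is easy to confirm numerically. The following Python lines are a quick check, not part of the original example:

```python
n = 26
a, b = 28, 29

long_way = (a * b) % n               # multiply first, then reduce
short_way = ((a % n) * (b % n)) % n  # reduce first, then multiply

print(a * b)      # 812
print(long_way)   # 6
print(short_way)  # 6
```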
    The upshot of this theorem is that when adding and multiplying, if we are only
interested in what the result is modulo n, then we only need to worry about what the
inputs are modulo n. Motivated by this, we make a definition of what it means to
add and multiply congruence classes mod n. (The following definition may be strange
at first. If so, work through the examples which follow.)
   We will illustrate this definition by working modulo 7 and modulo 11. In these
examples, when we work in Zn we will always write our results as r with 0 ≤ r < n,
even though we may use numbers larger than n during the computations.
Example 23.12. Let n = 7. We perform the following computations in Z7 :
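                  $\overline{5} + \overline{6} = \overline{5 + 6} = \overline{11} = \overline{4}$,
                  $\overline{1} + \overline{6} = \overline{1 + 6} = \overline{7} = \overline{0}$,
                  $\overline{5} \cdot \overline{6} = \overline{5 \cdot 6} = \overline{30} = \overline{2}$,
                  $\overline{3} \cdot \overline{5} = \overline{3 \cdot 5} = \overline{15} = \overline{1}$.
If we want to add two equivalence classes, we choose representatives, add the
representatives, and then simplify (in this case modulo 7).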
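The recipe "choose representatives, operate, then reduce" can be sketched in Python, representing each class by its representative in {0, 1, ..., n − 1}. The function names are illustrative only.

```python
def add_mod(a, b, n):
    """Add the classes of a and b in Z_n; return the representative in [0, n)."""
    return (a + b) % n


def mul_mod(a, b, n):
    """Multiply the classes of a and b in Z_n; return the representative in [0, n)."""
    return (a * b) % n


# The four computations from the example, carried out in Z_7.
print(add_mod(5, 6, 7))  # 4
print(add_mod(1, 6, 7))  # 0
print(mul_mod(5, 6, 7))  # 2
print(mul_mod(3, 5, 7))  # 1
```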
  Warning 23.14. It is important to note that it is not obvious that the
  definitions of multiplication and addition in Zn make sense. We have defined X + Y
  for X, Y ∈ Zn by choosing integers a, b with $X = \overline{a}$ and $Y = \overline{b}$, and defining
  $X + Y = \overline{a + b}$. However, other choices of integers would be possible. Suppose
  that we had chosen (possibly different) representatives c and d with $X = \overline{c}$ and
  $Y = \overline{d}$. Then our definition would claim that $X + Y = \overline{c + d}$. If the result of
  addition depends on an arbitrary choice, then we do not have a good definition.
      Fortunately, in this case, Theorem 23.8 comes to our rescue. If we have
  $X = \overline{a} = \overline{c}$ and $Y = \overline{b} = \overline{d}$, then we know that a ≡ c (mod n) and b ≡ d
  (mod n). Hence, a + b ≡ c + d (mod n), by Theorem 23.8, so $\overline{a + b} = \overline{c + d}$.
  Hence, our choices made no difference in the definition of X + Y .
      When we have a definition that appears to depend on arbitrary choices,
  but for which the arbitrary choices can be shown to make no difference in the
  definition, we say that the object being defined is well-defined. In other words,
  we have shown that addition on Zn is well-defined.
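A finite experiment can illustrate (but of course not prove) what well-definedness means here: changing representatives within a class should never change the class of the result. The helper `respects_congruence`, the window of test integers, and the failing rule `abs(a) + b` are all hypothetical choices made for this sketch.

```python
def respects_congruence(op, n, window=range(-12, 13)):
    """Check on a finite window that a ≡ c and b ≡ d (mod n)
    always force op(a, b) ≡ op(c, d) (mod n)."""
    for a in window:
        for c in window:
            if (a - c) % n != 0:
                continue  # a and c are different classes; nothing to check
            for b in window:
                for d in window:
                    if (b - d) % n == 0 and (op(a, b) - op(c, d)) % n != 0:
                        return False  # same inputs as classes, different output
    return True


n = 6
print(respects_congruence(lambda a, b: a + b, n))       # True
print(respects_congruence(lambda a, b: a * b, n))       # True
# A made-up rule (not from the text) that depends on the representative:
print(respects_congruence(lambda a, b: abs(a) + b, n))  # False
```

The last rule fails because, for instance, 1 ≡ −11 (mod 6) while |1| and |−11| are not congruent modulo 6.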
Example 23.15. We write the complete addition and multiplication tables for Z6 .
                  + | 0  1  2  3  4  5          · | 0  1  2  3  4  5
                  --+-----------------          --+-----------------
                  0 | 0  1  2  3  4  5          0 | 0  0  0  0  0  0
                  1 | 1  2  3  4  5  0          1 | 0  1  2  3  4  5
                  2 | 2  3  4  5  0  1          2 | 0  2  4  0  2  4
                  3 | 3  4  5  0  1  2          3 | 0  3  0  3  0  3
                  4 | 4  5  0  1  2  3          4 | 0  4  2  0  4  2
                  5 | 5  0  1  2  3  4          5 | 0  5  4  3  2  1
In these tables we wrote every entry using 0, 1 . . . , 5. We can do this because S =
{0, 1, . . . , 5} is a transversal for the equivalence classes. We could have used any other
transversal such as {1, 2, 3, 4, 5, 6}, {−2, −1, 0, 1, 2, 3}, or even {−10, −5, 0, 5, 10, 15}
(but it is best to keep things simple).
     We notice some interesting facts about multiplication and addition in Zn that are
different from addition and multiplication in Z.
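     If a is an integer and a + a = 0, then a must equal 0. However, in Z6 we have
$\overline{3} + \overline{3} = \overline{0}$ even though $\overline{3} \neq \overline{0}$.
     If a, b ∈ Z with a · b = 0, then either a = 0 or b = 0. However, in Z6 we have
$\overline{2} \cdot \overline{3} = \overline{0}$ even though neither $\overline{2}$ nor $\overline{3}$ is equal to $\overline{0}$.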
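Both tables, and the unusual products, can be regenerated mechanically. In this Python sketch each class is represented by its representative in {0, ..., 5}; the variable names are illustrative.

```python
n = 6

# Row a, column b of each table, using representatives 0, ..., n - 1.
add_table = [[(a + b) % n for b in range(n)] for a in range(n)]
mul_table = [[(a * b) % n for b in range(n)] for a in range(n)]

# Pairs of nonzero classes (listed with a <= b) whose product is zero.
zero_products = [(a, b) for a in range(1, n) for b in range(a, n)
                 if (a * b) % n == 0]

print(mul_table[5])   # [0, 5, 4, 3, 2, 1]
print(zero_products)  # [(2, 3), (3, 4)]
```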
   This example shows that we cannot take facts about addition or multiplication in
Zn for granted. We must prove those facts about addition and multiplication which
we wish to use in Zn , rather than just assuming they hold. In the exercises you will
prove some basic facts about operations in Zn that do match our intuition from Z.
We demonstrate with an example.
$X + \overline{0} = \overline{a} + \overline{0} = \overline{a + 0} = \overline{a} = X$.
23.D     Exercises
Exercise 23.1. Let n ∈ N and a ∈ Z. Prove that 0 ∈ $\overline{a}$ if and only if n | a.
Exercise 23.2. Compute the following. Write the results as r, with r ∈ Z nonnega-
tive and as small as possible.
  (a) 6 + 7 in Z9 .
  (b) 6 · 7 in Z9 .
  (c) 59 · 119 in Z30 .
  (d) 6 · 5 + 85 in Z7 .
  (e) $\overline{2}^{10}$ in Z5 . (By $\overline{a}^{n}$ we mean $\underbrace{\overline{a} \cdot \overline{a} \cdots \overline{a}}_{n \text{ times}}$.)
Exercise 23.3. Create addition and multiplication tables for Z5 . Be sure to write
each entry of the tables as one of 0, 1, 2, 3, or 4.
Exercise 23.4. Let n ∈ N. Prove the following facts about addition and multiplica-
tion in Zn .
  (a) For all X, Y ∈ Zn , X + Y = Y + X.
  (b) For all X, Y ∈ Zn , X · Y = Y · X.
  (c) For all X ∈ Zn , X · 0 = 0.
  (d) For all X ∈ Zn , X · 1 = X.
  (e) For all X ∈ Zn , X · 2 = X + X.
  (f) For all X ∈ Zn , there is some Y ∈ Zn such that X + Y = 0.
  (g) For all X, Y, Z ∈ Zn , (X + Y ) · Z = (X · Z) + (Y · Z).
Exercise 23.6. Is it true that for each $X \neq \overline{0}$ in Z6 , there is some Y ∈ Z6 such that
$X \cdot Y = \overline{1}$?
Exercise 23.7. In this exercise we generalize what was done in the previous two
exercises.
Functions
176                                                          CHAPTER VII. FUNCTIONS
24     Defining functions
The main purpose of this section is to formally define what we mean by a function.
We will also give examples.
∀a ∈ A, ∃! b ∈ B, (a, b) ∈ f.
We see that f is a function, because each of the four elements of A is a first coordinate
of exactly one ordered pair in f .
    It is sometimes useful to visualize a function via a diagram. For instance, the
function f described above could be visualized in the following diagram.
[Diagram: arrows from the elements 1, 2, 3, 4 of A to the elements 1, 2, 3, 4, 5 of B.]
[Diagram: arrows from the elements a, b, c, d, e on the left to the elements 1, 2, 3 of B on the right.]
The element a has two arrows emanating from it; the element e has none.
    Since a function f from A to B is a relation from A to B, for a ∈ A and b ∈ B
such that (a, b) ∈ f we could write a f b. However, for functions we typically use a
different notation.
f (a) = b
  to mean that (a, b) ∈ f . In other words, f (a) is the second coordinate of the
  unique ordered pair having a as its first coordinate.
     When f (a) = b, we say that b is the image of a under the function f .
  Definition 24.10. Two functions f and g are equal if they have the same
  domain, the same codomain, and they are equal as sets of ordered pairs.
  Theorem 24.13. Let A and B be sets, and suppose that {P, Q} is a partition
  of A with two parts. If we are given a function g : P → B and a function
  h : Q → B, then f = g ∪ h defines a function f : A → B.
   The following example involves finite sets. You might try drawing the arrows
yourself to check that the diagram in the example is correct.
Example 24.15. Let A = P ∪ Q, where P = {1, 2, 3} and Q = {4, 5, 6}, and let
B = {1, 2, 3}. Then {P, Q} is a partition of A. If we define g : P → B by g(x) = 4−x,
and we define h : Q → B by h(x) = x − 3, we see that
[Diagram: arrows from the parts P = {1, 2, 3} and Q = {4, 5, 6} of A to the elements 1, 2, 3 of B.]
f = g ∪ h = {(1, 3), (2, 2), (3, 1), (4, 1), (5, 2), (6, 3)}.
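The union g ∪ h can be reproduced with Python dictionaries, where a dictionary plays the role of a set of ordered pairs with distinct first coordinates. This is a sketch; the dictionary encoding is an assumption of the illustration, not the text's definition.

```python
P, Q = {1, 2, 3}, {4, 5, 6}

g = {x: 4 - x for x in P}  # g : P -> B, g(x) = 4 - x
h = {x: x - 3 for x in Q}  # h : Q -> B, h(x) = x - 3

# The parts must be disjoint, otherwise the merge could overwrite a value.
assert P.isdisjoint(Q)

f = {**g, **h}  # f = g ∪ h, defined on A = P ∪ Q
print(sorted(f.items()))
# [(1, 3), (2, 2), (3, 1), (4, 1), (5, 2), (6, 3)]
```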
Example 24.16. Question: What is wrong with the following piecewise defined
function? Try to define f : R → R by
\[ f(x) = \begin{cases} x^2 + 2 & \text{if } x \ge 0, \\ -1 & \text{if } x \le 0. \end{cases} \]
   Answer: It has two different values at 0 (namely 2 and −1), and hence is not a
function. The problem is that the two domains “x ≥ 0” and “x ≤ 0” are not a
partition of R (they overlap).
24. DEFINING FUNCTIONS                                                            181
    In the following example, the conditions in the piecewise defined function implic-
itly define a partition on the domain. We have no need to explicitly name the parts
of the partition.
Example 24.18. We can define a function f : Z → Z by
\[ f(x) = \begin{cases} 3x + 1 & \text{if } x \text{ is odd}, \\ \frac{x}{2} & \text{if } x \text{ is even}. \end{cases} \]
Note that when x is even then x/2 ∈ Z, so this really does give a function. We have,
for instance, f (1) = 4, f (2) = 1, and so forth.
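A piecewise rule of this kind translates directly into a conditional. The Python sketch below mirrors the two cases; integer division `//` is exact here because it is applied only to even inputs.

```python
def f(x: int) -> int:
    """The piecewise function of the example, on the integers."""
    if x % 2 != 0:      # x is odd (this test also works for negative x)
        return 3 * x + 1
    return x // 2       # x is even, so x / 2 is an integer


print(f(1), f(2))    # 4 1
print(f(-3), f(-4))  # -8 -2
```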
   It is easy to generalize Theorem 24.13 to partitions of A with more than two parts.
You will do this in Exercise 24.7. For now, we give an example.
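Example 24.19. We can define a function f : R → R by
\[ f(x) = \begin{cases} -1 & \text{if } x < -1, \\ 0 & \text{if } -1 \le x \le 1, \\ 1 & \text{if } 1 < x. \end{cases} \]
We graph this function below, with the portions given by the separate conditions in
different colors.
[Graph of y = f(x), with x-axis ticks at −2, −1, 1, 2.]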
   To end this subsection, we define a special kind of (piecewise defined) function,
which is useful in many parts of mathematics.
Example 24.22. Let A = R and let S = [0, 1] be the closed interval from 0 to 1.
The graph of χS : R → {0, 1} is
[Graph of χS, with x-axis ticks at −2, −1, 1, 2.]
Example 24.28. Here is another example of a rule that does not yield a well-defined
function. Look for how elements in the domain can be represented in multiple ways.
   Suppose a professor wishes to assign grades to his class in a nontraditional way.
He takes X = {students in the class} and Y = {A, B, C, D, F }, and decides to assign
grades by a rule g : X → Y given by
\[ g(\text{student}) = \begin{cases} \text{the first letter of the student's name} & \text{if the result is in } \{A, B, C, D\}, \\ F & \text{otherwise.} \end{cases} \]
How would students react to this? Amelia Andrews would be overjoyed. Robert
Smith would likely say “Call me Bob.” What grade would John Adams receive?
Since the professor has not made clear whether the grade is determined by the first
or the last name, John could make a case that he should receive either an A or an F
(he would probably argue for the former).
    In this case, the proposed function that the professor wishes to use is not
well-defined. It depends on a choice of which name to use for the student.
  Warning 24.29. Note that if a proposed rule for a function f does not produce
  a well-defined function, we would be incorrect to say that “the function f is not
  well-defined,” since f is, in fact, not a function. We should instead say “the rule
  defining f does not produce a well-defined function,” “the proposed function is
  not well-defined,” “f is not a well-defined function,” or something similar (as
  long as we do not call f a function).
Example 24.30. In this example we will be working both modulo 6 and modulo 3,
so we need different notations for the different equivalence classes. As usual, we will
denote congruence classes modulo 6 by $\overline{a}$, where a ∈ Z. For this example, we will
denote congruence classes modulo 3 by [a], where a ∈ Z. If we define
                                     f : Z3 → Z6
by $f([a]) = \overline{a}$ and
                                     g : Z6 → Z3
by $g(\overline{a}) = [a]$, one of f and g is a function and the other is not well-defined. Which
is which?
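One way to experiment with this question before proving anything is a brute-force search for a bad pair of representatives. The helper `respects_classes` and its window of test integers are hypothetical; a search that finds no counterexample in a finite window suggests, but does not prove, well-definedness.

```python
def respects_classes(m, n, window=range(-30, 31)):
    """Check on a finite window whether sending the class of a modulo m
    to the class of a modulo n is independent of the representative:
    a ≡ b (mod m) must force a ≡ b (mod n)."""
    return all((a - b) % n == 0
               for a in window for b in window
               if (a - b) % m == 0)


print("f : Z3 -> Z6 passes the check?", respects_classes(3, 6))
print("g : Z6 -> Z3 passes the check?", respects_classes(6, 3))
```

Whichever rule fails the check comes with a concrete pair of congruent integers that are sent to different classes, which is exactly the kind of example needed to show a rule is not well-defined.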
24.D      Exercises
Exercise 24.1. Which of the following relations are functions from the set A =
{1, 2, 3, 4} to the set B = {1, 2, 3, 4, 5}?
 (a) f1 = {(1, 3), (2, 3), (3, 3), (4, 3)}.
 (b) f2 = {(1, 2), (2, 3), (3, 5), (4, 6)}.
  (c) f3 = {(1, 2), (2, 3), (2, 4), (4, 5)}.
 (d) f4 = {(1, 2), (1, 3), (2, 3), (3, 4), (4, 1)}.
  (e) f5 = {(1, 2), (2, 3), (4, 5)}.
  (f) f6 = {(1, 2), (1, 2), (2, 3), (3, 4), (4, 1)}. (Hint: Do repetitions matter?)
Exercise 24.3. Let A be a finite set and let B be any set. Let f : A → B be a
function. Considering f as a set of ordered pairs, prove that |f | = |A|.
Exercise 24.4. For a ∈ Z, denote the congruence class of a modulo 8 by $\overline{a}$, and the
congruence class of a modulo 4 by [a]. Determine which of the following definitions
give well-defined functions. For those that are well-defined, give a proof. For those
that are not well-defined, give an example to demonstrate this fact.
  (a) Define f : Z8 → Z4 by $f(\overline{a}) = [a]$.
  (b) Define g : Z4 → Z8 by $g([a]) = \overline{a}$.
  (c) Define h : Z4 → Z8 by $h([a]) = \overline{2a}$.
  (d) Define j : Z4 → Z8 by $j([a]) = \overline{3a}$.
Exercise 24.7. Let A and B be sets. Let I be an indexing set, and let P = {Pi :
i ∈ I} be an arbitrary partition of A. For each i ∈ I, let fi : Pi → B be a function.
Prove that the relation
\[ f = \bigcup_{i \in I} f_i \]
is a function from A to B and that the rule for f (as a piecewise defined function) is
    Thus, the vertical line test checks that each x-input yields at most one y-output.
If we turn this around, and ask which functions pass the horizontal line test (i.e., no
output comes from two different inputs) we get the following definition:
  In order to avoid working with inequalities, one often instead uses the contra-
  positive
    Injective functions are also called “one-to-one” (or 1-1) functions, meaning each
element of B is the image of at most one element from A. In other words, for each
b ∈ B either there is a unique a ∈ A such that f (a) = b, or there is no element in A
that maps to b. This is in contrast to functions that might take two, three, or more
elements of A to one fixed element of B. In other words, a function f : A → B is
injective exactly when:
    The name “one-to-one” does not refer to the fact that the function takes each
individual element of A to one element of B (as opposed to multiple elements of B);
this is a property that all functions have (corresponding to the vertical line test).
    The next example illustrates what injectivity looks like for functions between finite
sets.
[Diagram: arrows from the elements 1, 2, 3 of A to the elements 1, 2, 3, 4 of B.]
In terms of the diagram, we can think of injectivity as meaning that there is never
more than one arrow pointing at an element of B.
[Diagram: arrows from the elements 1, 2, 3 of A to the elements 1, 2, 3, 4 of B.]
We see from the diagram that f (2) = f (3) even though 2 ≠ 3. Thus, this function is
not injective.
    In most of the following examples we take Advice 25.3 to heart, and prove injec-
tivity by showing that (25.2) holds.
    The previous two examples show that injectivity depends as much on the domain
as on the rule used to define the function.
x1 x2 − 2x1 − x2 + 2 = x1 x2 − x1 − 2x2 + 2
Cancelling equal terms on both sides, and adding 2x1 + 2x2 to both sides yields
x2 = x1 .
Hence, f is injective.
Example 25.11. Define a function f : Z → Z by
\[ f(n) = \begin{cases} n + 1 & \text{if } n \ge 0, \\ n & \text{if } n < 0. \end{cases} \]
[Figure: two curves in the xy-plane, one on the left and one on the right.]
25. INJECTIVE AND SURJECTIVE FUNCTIONS                                                191
    The curve on the left is not a function because it fails the vertical line test. If we
are asking for a function from R to R, then the curve on the right also fails to be a
function because it is not defined everywhere. In other words, some vertical lines do
not hit the curve; there are “holes” for all negative values of x.
    Thus, any function f : A → B satisfies two basic properties:
    • (Vertical line test): For each input, there is at most one output.
    • (No holes in the domain): For each input, there is at least one output.
Putting these two properties together we get exactly the definition of a function (for
each element of the domain, there is exactly one output in the codomain).
    In the previous subsection we studied what happens if we reverse the roles of
the domain and codomain, turning the vertical line test into the horizontal line test.
Here, we study the same reversal, but with the “no holes in the domain” rule changed
to the “no holes in the codomain.”
(25.14) ∀b ∈ B, ∃a ∈ A, f (a) = b.
    Surjective functions are sometimes called “onto” (meaning, they map onto all
elements of the codomain). In terms of ordered pairs, surjectivity of a function
f : A → B means:
Example 25.16. Let A = {1, 2, 3, 4} and let B = {1, 2, 3}. Define a function f : A →
B by
                       f = {(1, 2), (2, 2), (3, 3), (4, 1)}.
We note that each element 1, 2, 3 ∈ B appears at least once as the second coordinate
of an ordered pair in f (in fact, 2 appears twice). Visualizing f as a diagram, we have
[Diagram: arrows from the elements 1, 2, 3, 4 of A to the elements 1, 2, 3 of B, following the ordered pairs of f .]
In terms of the diagram, we can think of surjectivity as meaning that there is always
at least one arrow pointing at any element of B.
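For functions between finite sets, these conditions can be checked directly on the set of ordered pairs. The helper names in this Python sketch are hypothetical, and the injectivity and surjectivity checks assume the input has already passed `is_function`.

```python
def is_function(f, A, B):
    """Every pair lies in A x B, and each a in A is a first coordinate exactly once."""
    return (all(a in A and b in B for a, b in f)
            and all(sum(1 for x, _ in f if x == a) == 1 for a in A))


def is_injective(f):
    """No element appears twice as a second coordinate."""
    outputs = [b for _, b in f]
    return len(outputs) == len(set(outputs))


def is_surjective(f, B):
    """Every element of B appears at least once as a second coordinate."""
    return {b for _, b in f} == set(B)


A = {1, 2, 3, 4}
B = {1, 2, 3}
f = {(1, 2), (2, 2), (3, 3), (4, 1)}

print(is_function(f, A, B), is_injective(f), is_surjective(f, B))
# True False True
```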
    In order to prove that f : A → B is surjective we normally verify (25.14). We
start by letting b ∈ B be an arbitrary element of B. We then need to somehow use b
to prove the existence of an element a ∈ A such that f (a) = b. We will demonstrate
how this is to be done with many examples.
Hence, f is surjective.
Remark 25.18. This proof may be unsatisfying because, like many existence proofs,
it clearly shows that a exists with f (a) = b, but it does so without explaining how a
might be found. This is due to the fact that the correct a was found using scratchwork,
but that work was not included in the proof. We will do that scratchwork now.
    Given b ∈ R, we wish to find a such that f (a) = b. Hence, we wish to solve
2a + 1 = b.
Solving for a, we find
\[ a = \frac{b - 1}{2}. \]
This gives the desired element.
    Notice that the preceding paragraph derives the desired a. However, the work done
there is not the proof of surjectivity. To do the proof we need to check two things.
First, that this a we found in our scratchwork belongs to the domain. Second, that
f (a) = b. You should do the work of calculating the desired a on scratch paper and
afterwards write out the proof.
Example 25.19. Define f : R → R by f (x) = x². Does this function pass the “no
holes in the codomain” test? No, we see that all negative values are missed. We will
now formally prove it is not surjective. Fixing b = −1, then for any a ∈ R we have
f (a) = a² ≠ −1 = b. Hence, f is not surjective.
Example 25.20. Define f : R → [0, ∞) by f (x) = x². Let b ∈ [0, ∞). Since b is
nonnegative it has a square root, so fix a = √b. Clearly a is real, hence it belongs to
the domain. Now,
                             f (a) = f (√b) = (√b)² = b.
Since b was an arbitrary element of the codomain, f is surjective.
   The previous two examples show that surjectivity can depend as much on the
codomain as on the rule used to define the function.
\[ f(x) = \frac{x - 1}{x - 2}. \]
We wish to show that f is surjective.
[Graph of y = f(x), with x-axis ticks from −10 to 10.]
(Note: This function only has a “hole” in the codomain if our codomain is R. We
remove that hole by limiting the codomain to R − {1}.)
   Let b ∈ R − {1}. Fix
\[ a = \frac{2b - 1}{b - 1}. \]
We need to show that a belongs to the domain R − {2}. Clearly a ∈ R (since b ≠ 1).
Assuming, by way of contradiction, that a = 2, we would have 2b − 1 = 2(b − 1).
Simplifying, we get −1 = −2, a contradiction. Hence, a ∈ R − {2}.
   Now
\[ f(a) = \frac{\frac{2b - 1}{b - 1} - 1}{\frac{2b - 1}{b - 1} - 2} = \frac{2b - 1 - (b - 1)}{2b - 1 - 2(b - 1)} = \frac{b}{1} = b. \]
Hence, f is surjective.
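The two checks in this proof (that a lies in the domain, and that f (a) = b) can be spot-checked with exact rational arithmetic. This Python sketch tests a few sample values of b chosen arbitrarily; it illustrates the computation and is no substitute for the proof. The name `preimage` is hypothetical.

```python
from fractions import Fraction


def f(x):
    """f(x) = (x - 1) / (x - 2), defined for x != 2."""
    return (x - 1) / (x - 2)


def preimage(b):
    """The candidate a = (2b - 1) / (b - 1), defined for b != 1."""
    return (2 * b - 1) / (b - 1)


# Arbitrary sample values of b in R - {1}, kept exact with Fraction.
samples = [Fraction(0), Fraction(2), Fraction(-3, 7), Fraction(5, 2)]
for b in samples:
    a = preimage(b)
    assert a != 2     # a belongs to the domain R - {2}
    assert f(a) == b  # f sends a to b
print("all sample checks passed")
```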
Example 25.23. Let S = {1, 2, 3, 4, 5}, and let A = P(S) − {∅}. Define a function
f : A → S by
                      f (X) = the least element of X,
for every X ∈ A. You may check directly that f is a function from A to S (for
instance f ({3, 4, 5}) = 3 and f ({2, 4}) = 2). We will now prove that f is surjective.
    Let b ∈ S. Fix a = {b}. Since {b} ∈ P(S) and {b} ≠ ∅ we have that a ∈ A.
Clearly, the least element of {b} is b. Hence, f (a) = f ({b}) = b. This proves that f
is surjective.
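Because S is small, the whole function can be tabulated and the argument of the proof checked by brute force. In this Python sketch subsets are modeled as `frozenset` objects; the variable names are illustrative.

```python
from itertools import combinations

S = [1, 2, 3, 4, 5]

# A = P(S) - {∅}: all nonempty subsets of S.
A = [frozenset(c) for r in range(1, len(S) + 1) for c in combinations(S, r)]

# f(X) = the least element of X.
f = {X: min(X) for X in A}

print(len(A))                                  # 31
print(f[frozenset({3, 4, 5})])                 # 3
print(set(f.values()) == set(S))               # True
print(all(f[frozenset({b})] == b for b in S))  # True
```

The last line mirrors the proof: the singleton {b} is a preimage of b for every b in S.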
Example 25.29. Let A = {1, 2, 3}, and let f : A → P(A) be defined by f (a) =
{a, 2}. The range of f is the set
             {f (a) : a ∈ A} = {f (1), f (2), f (3)} = {{1, 2}, {2}, {2, 3}}.
   A useful way to think about im(f ) is as the set of all elements of B that are second
coordinates of some element of f .
Proof. The function f is surjective if and only if every b ∈ B is equal to f (a) for some
a ∈ A. This happens if and only if B = {f (a) : a ∈ A} = im(f ).
25.E     Exercises
Exercise 25.1. Let A = {1, 2, 3} and B = {x, y}. List all functions from A → B,
and for each function state (without proof) whether it is injective, surjective, both
(bijective), or none of the above.
    Now do the same for all functions from B → A.
Exercise 25.2. For each of the following, determine (with proof) whether the func-
tion is injective and/or surjective.
  (a) Define f : Z → Z by f (n) = 2n + 1.
  (b) Define g : R → R by g(x) = x^2 + 2x + 2.
  (c) Define h : Z → Z by h(n) = n + 3.
Exercise 25.3. Define f : Z_5 → Z_5 by f (a) = 2a + 3.
 (a) Prove that f is well-defined.
 (b) Is f injective? Surjective? Give proofs. (Hint: You cannot divide by 2, but
     you can multiply by 3. Alternatively, write out the ordered pairs and check all
     cases.)
Exercise 25.4. Find a function f : R → R that is
  (a) neither injective nor surjective.
  (b) injective but not surjective.
  (c) surjective but not injective.
  (d) both injective and surjective.
In all cases give proofs. (Hint: For some of these, piecewise defined functions may be
useful.)
Exercise 25.5. Define f : R − {2} → R − {1} by
\[ f(x) = \frac{x - 3}{x - 2}. \]
 (a) Prove that f is a function from R − {2} to R − {1}. (The only question is
     whether f (x) ∈ R − {1} whenever x ∈ R − {2}.)
 (b) Prove that f is injective.
 (c) Prove that f is surjective.
Exercise 25.6. Define f : Z^2 → Z by f (m, n) = 3m − 2n. Is f injective? Surjective?
(Give proofs.)
Exercise 25.7. Describe (without proof) the image of each of the following functions
from R → R.
  (a) sin(x).
  (b) e^x.
  (c) x^3.
  (d) √|x|.
Exercise 25.8. Let f : R → R be a function. Suppose that we graph f in the xy-
plane, with the domain being the horizontal axis, and the codomain being the vertical
axis. Prove the following:
25. INJECTIVE AND SURJECTIVE FUNCTIONS                                              197
 (a) (The vertical line test): Since f is a function, every vertical line intersects the
     graph of f at most once.
 (b) (No holes in the domain): Since f is a function, every vertical line intersects
     the graph of f at least once.
 (c) (The horizontal line test): The function f is injective if and only if every hori-
     zontal line intersects the graph of f at most once.
 (d) (No holes in the codomain): The function f is surjective if and only if every
     horizontal line intersects the graph of f .
198                                                           CHAPTER VII. FUNCTIONS
26     Composition of functions
Fix A = {1, 2, 3, 4}, B = {5, 6, 7}, and C = {8, 9, 10, 11}, and fix functions f : A → B
and g : B → C defined by
                           f = {(1, 6), (2, 5), (3, 7), (4, 6)},
and
                              g = {(5, 8), (6, 9), (7, 10)}.
A picture of this situation appears below:
          [Figure: arrow diagram showing f : A → B and g : B → C on the sets A, B, and C.]
    One can ask what might happen if we were to begin at an element of A, and first
follow the arrow for the function f (to an element of B), and then follow the arrow for
the function g (to an element of C). We would then get a drawing like the following:
          [Figure: arrow diagram from A to C obtained by following f and then g.]
    Each arrow in the new diagram is obtained by traversing two arrows in the original
diagram. We notice that the new diagram describes a function, since each element
of A has a single arrow emanating from it. We call this function the composition. In
this section we will formally define compositions and study how function properties
(such as injectivity or surjectivity) behave on composites of functions.
Remark 26.2. We note that this definition makes sense: since a ∈ A, we have that
f (a) is in B, so g(f (a)) exists and is in C. ▲
  Advice 26.3. When the codomain of f is not equal to the domain of g, there
  is no composite function g ◦ f (i.e., it is not defined).
Example 26.4. Let A = {1, 2, 3, 4}, B = {5, 6, 7}, and C = {8, 9, 10, 11}, and
suppose we have functions f : A → B and g : B → C defined by
                           f = {(1, 6), (2, 5), (3, 7), (4, 6)},
and
                              g = {(5, 8), (6, 9), (7, 10)}.
   Since f (1) = 6 and g(6) = 9, we have (g ◦ f )(1) = g(f (1)) = g(6) = 9.
   Since f (2) = 5 and g(5) = 8, we have (g ◦ f )(2) = g(f (2)) = g(5) = 8.
   Similarly, (g ◦ f )(3) = 10 and (g ◦ f )(4) = 9. Hence, we have that
                        g ◦ f = {(1, 9), (2, 8), (3, 10), (4, 9)}.
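For functions on finite sets, the composite can be computed mechanically. The sketch below stores f and g from the start of this section as Python dictionaries; the helper name compose is an illustrative choice.

```python
f = {1: 6, 2: 5, 3: 7, 4: 6}   # f : A -> B
g = {5: 8, 6: 9, 7: 10}        # g : B -> C

def compose(g, f):
    """Return g ∘ f as a dict; every value of f must be a key of g."""
    return {a: g[f[a]] for a in f}

g_of_f = compose(g, f)
```

The lookup g[f[a]] mirrors the rule (g ◦ f )(a) = g(f (a)). It raises a KeyError if some value of f is not in the domain of g, which is the situation where the composite is not defined.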
   In this case, since the codomain of g is the same as the domain of f , we may also
construct the function f ◦ g : R → R. This function is given by
Notice that f ◦ g is not the same function as g ◦ f . The order in which the functions
are composed does matter. Because of this, we say that function composition is not
commutative. This is in contrast to operations like addition of real numbers, where
a + b = b + a. △
and
\[ \bigl((h \circ g) \circ f\bigr)(a) = (h \circ g)\bigl(f(a)\bigr) = h(g(f(a))). \]
Hence, we see that the two functions give the same image for a. Therefore, they are
equal.
Remark 26.7. A graphical representation of this theorem is given below.
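[Figure: the sets A, B, C, D with arrows f : A → B, g : B → C, and h : C → D, together with the composite arrows g ◦ f and h ◦ g.]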
The statement of the theorem is that following the arrow marked g ◦ f and then the
arrow marked h, in other words h ◦ (g ◦ f ), yields the same result as following f and
then following h ◦ g, in other words (h ◦ g) ◦ f . ▲
Remark 26.8. The fact that (h ◦ g) ◦ f = h ◦ (g ◦ f ) is called associativity of function
composition. Associativity is a property that shows up in many places in math.
For a, b, c ∈ R, we have two different forms of associativity, namely (a + b) + c =
a + (b + c) for addition and (ab)c = a(bc) for multiplication. If you have seen matrix
multiplication, then you know that if A, B, and C are matrices that can be multiplied,
then (AB)C = A(BC). Different manifestations of associativity are often, in some
way, related to the associativity of function composition. ▲
      There is one special function that behaves very well with respect to composition.
  Definition 26.9. Let A be a set. Define the function idA : A → A by idA (a) = a
  for each a ∈ A. This is the identity function on A.
(g ◦ f )(a1 ) = (g ◦ f )(a2 ).
\[ f(x) = \frac{x - 1}{x - 2} \]
xy − 2y = x − 1,
so that
                                  x(y − 1) = 2y − 1,
and, dividing by y − 1, we obtain
\[ x = \frac{2y - 1}{y - 1}. \]
\[ f^{-1}(y) = \frac{2y - 1}{y - 1}. \] △
26.E     Exercises
Exercise 26.1. Let f : A → B and g : B → C be functions.
 (a) Prove that if f and g are injective, then g ◦ f is injective.
 (b) Prove that if g ◦ f is surjective, then g is surjective.
\[ f(x) = \frac{3x + 1}{x - 5} \]
 (a)   Determine f −1 .
 (b)   Determine f ◦ f .
 (c)   Determine f ◦ f ◦ f .
 (d)   Define
\[ f_n = \underbrace{f \circ \cdots \circ f}_{n \text{ times}}. \]
Proof. As a collection of ordered pairs, f has |A| elements. Hence, there are at most
|A| second coordinates. Thus, there cannot be more than |A| elements in the image
of f .
    We now prove the last sentence. For the forward direction, assume that f is
injective. Then all second coordinates of pairs in f are distinct, so the number of
such second coordinates, which is |im(f )|, is equal to |f | = |A|.
    Conversely, assume that f is not injective. Then f has fewer than |A| distinct
second coordinates (since at least two of the |A| second coordinates must be equal).
Hence, |im(f )| < |A|.
  Theorem 27.3. Let A and B both be finite sets, and let f : A → B be a function.
   (1) If f is injective, then |A| ≤ |B|.
   (2) If f is surjective, then |A| ≥ |B|.
   (3) If f is bijective, then |A| = |B|.
Example 27.4. Let A = {1, 2, 3, 4} and let B = {1, 2, 3, 4, 5}. Then |A| < |B|.
Hence, by the contrapositive of part (2) above, we know that there can be no surjective
function from A to B. This also implies that there can be no bijective function from
A to B. △
  A special property of finite sets is that, in some cases, injectivity and surjectivity
may imply each other.
  Theorem 27.5. Let A and B be finite sets and assume |A| = |B|. A function
  f : A → B is injective if and only if it is surjective.
Proof. Suppose A and B are finite, |A| = |B|, and that f : A → B is a function.
(⇒): Assume f is not surjective. Then |im(f )| < |B| = |A|, so f is not injective.
(⇐): Assume f is not injective. Then |im(f )| < |A| = |B|, so f is not surjective.
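For small sets, the statement of Theorem 27.5 can be confirmed by brute force. The sketch below enumerates every function between two three-element sets; the particular sets and the helper names are illustrative choices, and checking one pair of sets is of course not a proof.

```python
from itertools import product

def is_injective(f):
    """No two inputs share an output."""
    return len(set(f.values())) == len(f)

def is_surjective(f, B):
    """Every element of B is an output."""
    return set(f.values()) == set(B)

def all_functions(A, B):
    """Yield every function A -> B as a dict."""
    A = list(A)
    for values in product(B, repeat=len(A)):
        yield dict(zip(A, values))

A = [1, 2, 3]
B = ["x", "y", "z"]
functions = list(all_functions(A, B))

# With |A| = |B|, injectivity and surjectivity agree on every function.
agree = all(is_injective(f) == is_surjective(f, B) for f in functions)
bijections = [f for f in functions if is_injective(f)]
```

Of the 3^3 = 27 functions, exactly 3! = 6 are injective, and these are exactly the surjective ones.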
  Warning 27.6. Theorem 27.5 does not apply if A and B are infinite sets.
  Firstly, we don’t yet know what it means for two infinite sets to have the same
  size. However, even if A = B, there are still problems. For instance, define
  f : N → N by
                                   f (n) = n + 1
  and define g : N → N by
\[ g(n) = \begin{cases} 1 & \text{if } n = 1, \\ n - 1 & \text{if } n > 1. \end{cases} \]
  One may quickly check that f is injective but not surjective, and g is surjective
  but not injective.
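The two claims in Warning 27.6 can be explored on a finite window of N. This is only a sketch: the window size 1000 and the variable names are arbitrary choices, and a finite check cannot establish a statement about all of N, although each comment records the general reason.

```python
def f(n):
    return n + 1

def g(n):
    return 1 if n == 1 else n - 1

N_window = range(1, 1001)   # finite window of N = {1, 2, 3, ...}

# f never takes the value 1, because n + 1 >= 2 for every n in N.
f_misses_one = all(f(n) != 1 for n in N_window)

# f is injective: distinct inputs give distinct outputs.
f_injective_on_window = len({f(n) for n in N_window}) == len(N_window)

# g repeats a value: g(1) = g(2) = 1, so g is not injective.
g_collision = (g(1) == g(2))

# Every b in N is hit by g, for instance b = g(b + 1).
g_hits_all = all(g(b + 1) == b for b in N_window)
```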
Remark 27.7. A common use of Theorem 27.5 occurs when A is a finite set and
f : A → A is a function. In this case, the sizes of the domain and codomain are clearly
equal, so f is injective if and only if f is surjective. ▲
functions can be somewhat more complicated than for functions defined by a single
simple rule, but by carefully using the definitions, and proceeding with a proof by
cases, it is usually not too difficult.
    We will now examine a special case of piecewise defined functions for which injec-
tivity and surjectivity are easy to prove. These functions will be useful to us later.
  with the rule f (a) = fi (a) if a ∈ Pi . We call f the function obtained by pasting
  together the fi .
[Figure: diagram with the elements 1 through 10 in a left column and the elements 1 through 5 in a right column.]    △
27. ADDITIONAL FACTS ABOUT FUNCTIONS                                                 209
    The following theorem tells us when a pasted together function is injective, sur-
jective, or both.
  Theorem 27.11 (The Pasting Together Theorem). Using the notation as given
  in Definition 27.9, each of the following hold:
    (1) If each fi is injective, then f is injective.
    (2) If each fi is surjective, then f is surjective.
    (3) If each fi is bijective, then f is bijective.
Proof. (1) Suppose that each fi is injective. We will show that f is injective, so
      let a1, a2 ∈ A and assume f (a1) = f (a2). We know that a1 ∈ Pi and a2 ∈ Pj
      for some i and j (since the P ’s partition A). Hence f (a1) = fi(a1) ∈ Qi, and
      f (a2) = fj(a2) ∈ Qj. Since f (a1) = f (a2) we have Qi ∩ Qj ≠ ∅. But the Q’s
      partition B, hence Qi = Qj, or in other words i = j. Thus
                         fi(a1) = f (a1) = f (a2) = fi(a2).
      Since fi is injective, a1 = a2.
 (2) Suppose that each fi is surjective. Let b ∈ B, so b ∈ Qi for some i. Since fi
     is surjective we can fix some a ∈ Pi with fi (a) = b. Hence, f (a) = fi (a) = b.
     Therefore, f is surjective.
 (3) This part follows from (1) and (2).
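The Pasting Together Theorem can be illustrated on small finite sets. In the sketch below, the partitions P1, P2 of the domain and Q1, Q2 of the codomain, the pieces f1 and f2, and the helper names paste and is_bijection are all hypothetical data chosen for illustration.

```python
def paste(pieces):
    """Paste together functions f_i : P_i -> Q_i given as dicts with disjoint domains."""
    pasted = {}
    for piece in pieces:
        assert pasted.keys().isdisjoint(piece.keys()), "domains must be disjoint"
        pasted.update(piece)
    return pasted

def is_bijection(f, codomain):
    """True when f hits each element of the codomain exactly once."""
    values = list(f.values())
    return len(set(values)) == len(values) and set(values) == set(codomain)

# Hypothetical data: P1 = {1, 2}, P2 = {3, 4, 5}; Q1 = {"a", "b"}, Q2 = {"c", "d", "e"}.
f1 = {1: "b", 2: "a"}
f2 = {3: "d", 4: "e", 5: "c"}
Q1, Q2 = {"a", "b"}, {"c", "d", "e"}

f = paste([f1, f2])
```

Since f1 and f2 are bijections onto Q1 and Q2, and Q1 and Q2 are disjoint, the pasted function f is a bijection onto Q1 ∪ Q2, as part (3) predicts.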
   One checks easily that f1 and f2 are bijections (see Exercise 27.2). Hence, by the
Pasting Together Theorem, f is a bijection. △
f |S : S → B
then
                               f |S = {(a, f (a)) : a ∈ S}.
Hence, f |S is just the set of ordered pairs in f whose first coordinate is in S. ▲
You should try to prove the following theorem before reading the proof.
f̂ : A → im(f )
  is a surjective function.
Proof. Since f̂ consists of the same ordered pairs as f , it is a function. (Each element
of the domain A is the first coordinate of exactly one ordered pair, and the second
coordinate is an element of im(f ).) For each b ∈ im(f ), there is some a ∈ A such
that f (a) = b (by the definition of the image). Then f̂(a) = b, so f̂ is surjective.
Remark 27.19. In essence, Theorem 27.18 says “a function f is surjective onto its
image.” ▲
f^{-1}(T) = {a ∈ A : f (a) ∈ T}.
  Warning 27.25. Although the same symbol is used for the preimage and the
  inverse function of f (if f is a bijection), we note that the two concepts are quite
  different. The preimage exists even if f is not a bijection, while we have seen
  that the inverse function only exists if f is a bijection.
f = {(1, 7), (2, 6), (3, 7), (4, 6), (5, 9)}.
Example 27.27. Let f : R → R be defined by f (x) = x^2. Then for x > 0 we have
f^{-1}({x}) = {√x, −√x}, for x < 0 we have f^{-1}({x}) = ∅, and f^{-1}({0}) = {0}. △
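For a function on a finite set, preimages can be computed directly from the defining formula f^{-1}(T) = {a ∈ A : f(a) ∈ T}. The sketch below uses the set of ordered pairs f = {(1, 7), (2, 6), (3, 7), (4, 6), (5, 9)} listed above, stored as a dictionary. Its domain is taken to be {1, 2, 3, 4, 5}, which is an assumption read off from the first coordinates, and the helper name preimage is illustrative.

```python
def preimage(f, T):
    """Return f^{-1}(T) = {a in A : f(a) in T} for a function stored as a dict."""
    return {a for a, b in f.items() if b in T}

f = {1: 7, 2: 6, 3: 7, 4: 6, 5: 9}

pre_7 = preimage(f, {7})        # two inputs map to 7
pre_8 = preimage(f, {8})        # nothing maps to 8
pre_6_9 = preimage(f, {6, 9})
```

Note that preimage never inverts f: it only filters the domain, which is why it makes sense even though this f is not a bijection.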
27.E     Exercises
Exercise 27.1. Prove Theorem 27.2.
Exercise 27.2. Prove that the functions f1 and f2 defined in Example 27.12 are both
bijections.
Exercise 27.3. Give an example of a bijective function f : Z → {0, 1}×N and include
a proof that it is bijective.
   (Hint: Partition Z into positive and nonpositive integers. Partition {0, 1} × N into
Define two bijections, and then use the Pasting Together Theorem.)
Cardinality
...it’s very much like your trying to reach Infinity. You know that it’s there, but you
just don’t know where—but just because you can never reach it doesn’t mean that it’s
not worth looking for.
        Norton Juster, The Phantom Tollbooth
    In the very first chapter of this book, we defined the cardinality of a finite set to
equal the number of its elements. Thus, for instance, the sets {a, b, c} and {1, 2, 3}
have the same cardinality, which is 3. For infinite sets we cannot define the cardinality
to be the number of elements, because such sets do not have any (finite) number of
elements.
    However, there is a reason we do not just define the cardinality of an infinite set
to be the symbol ∞; there is a better way to measure the size of sets! This came as
a shock to mathematicians in the late 1800’s, who expected all infinite sets to have
the same size. This theory was developed by Cantor, who showed that the set of real
numbers R has bigger cardinality than N. In this chapter, we develop Cantor’s theory
of cardinality, which has become an important part of modern mathematics.
216                                                           CHAPTER VIII. CARDINALITY
  Definition 28.1. Let S and T be sets. We say that S and T have the same
  cardinality if there exists a bijection f : S → T . If this holds, we write |S| = |T |.
      If there is no bijection from S to T , we say that they have different
  cardinalities and write |S| ≠ |T |.
Remark 28.2. We will prove, shortly, that this relation is an equivalence relation,
which will justify our use of an equality sign for the relation. ▲
    The following are some examples and nonexamples of sets with the same cardi-
nality.
Example 28.3. (1) Consider the three sets A = {a, b, c, d, e, f }, B = {1, 2, 3, 4, 5, 6},
and C = {0, 1, 2, 3, 4, 5}. It is easy to construct a bijection from A to B (since both
sets have exactly six elements). So |A| = |B|. There are also bijections from A to C,
and from B to C (since C also has six elements), so |A| = |C|, and |B| = |C|.
    (2) Let S = {1, 2, 3} and T = N. There is no bijection from S to T (since T has
more than 3 elements). Thus |S| ≠ |T |.
    (3) It can be tricky when working with infinite sets to tell whether they have the
same cardinality. For instance, does N have the same cardinality as 2N? The answer
is yes! There is a bijection f : N → 2N, given by f (n) = 2n. In other words,
                 f (1) = 2,        f (2) = 4,       f (3) = 6,    f (4) = 8, . . .
is a bijection from N to 2N. Thus, we do have
                                           |N| = |2N|.                                      △
      The following example is so important that we’ll call it a theorem.
28. DEFINITIONS REGARDING CARDINALITY                                                 217
                     n      1    2    3    4    5    6    7    8    9   10
                   f (n)    0    1   −1    2   −2    3   −3    4   −4    5
which yields the needed bijection between N and Z.
Remark 28.5. The previous proof was very informal. First, we didn’t prove that
the function f is injective and surjective. We’ll leave that as an exercise to be verified
later.
    Second, the definition of the function f is sloppy. To be more precise we should
define f as a piecewise function on n ∈ N by the rule
\[ f(n) = \begin{cases} n/2 & \text{if } n \text{ is even}, \\ -(n - 1)/2 & \text{if } n \text{ is odd}. \end{cases} \]
    Amazingly, it turns out that this function can be expressed by a (somewhat
complicated) single formula
\[ f(n) = \frac{1 + (-1)^n (2n - 1)}{4}. \]
(It is not expected that a student would be able to come up with this formula without
a lot of help!) Note that the inverse of this function is given in Example 27.12. ▲
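The agreement between the piecewise rule and the single formula is easy to test by machine. The following sketch compares the two on an initial segment of N and checks that the values listed so far hit each integer in a symmetric range exactly once; the function names and the cutoff 2001 are illustrative choices, and a finite check is not a proof of bijectivity.

```python
def f_piecewise(n):
    """n/2 for even n, -(n - 1)/2 for odd n."""
    return n // 2 if n % 2 == 0 else -(n - 1) // 2

def f_formula(n):
    """(1 + (-1)^n (2n - 1)) / 4; the numerator is always divisible by 4."""
    return (1 + (-1) ** n * (2 * n - 1)) // 4

first_ten = [f_piecewise(n) for n in range(1, 11)]

# The first 2001 values should be exactly the integers from -1000 to 1000.
values = [f_formula(n) for n in range(1, 2002)]
```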
  Advice 28.6. To prove that two sets have the same cardinality you are required
  to find a bijection between the two sets. In general there are usually lots of
  different bijections. Try to look for a simple one.
Example 28.7. We will prove that the open interval A = (0, 1) and the open interval
B = (1, 4) have the same cardinality. We thus want to construct a bijection between
these two sets. The most obvious option would be to stretch by a factor of 3 and
then shift right by 1. So we define g : (0, 1) → (1, 4) by the rule
g(x) = 1 + 3x.
  Theorem 28.8. The relation of “having the same cardinality” as given in Def-
  inition 28.1 is an equivalence relation on the collection of sets.
Proof. We first prove this relation is reflexive. Let X be any set. The identity function
idX : X → X is a bijection. Thus X is related to X.
    Next, we prove this relation is symmetric. Let X and Y be any sets, and assume
X relates to Y . In other words, assume there is a bijection f : X → Y . Then f has
an inverse function f^{-1} : Y → X which is also a bijection. Hence Y relates to X.
    Finally, we prove transitivity. Let X, Y , and Z be any sets, and assume there
are bijections f : X → Y and g : Y → Z. The composite function g ◦ f : X → Z is a
bijection, as needed.
    The equivalence classes of this equivalence relation are precisely the collections of
sets with the same cardinality.
    The observant reader will have noticed that we defined when two sets S and T have
the same cardinality, |S| = |T |, but that we have not defined what the cardinality
of an individual set is. Mathematicians solve this problem by choosing a (special)
transversal of this equivalence relation; the representatives in the transversal are the
cardinal numbers. Thus, the cardinality of a set, denoted |S|, is a special element of
the equivalence class of S under the relation “having the same cardinality.” There
are specific symbols used to represent the cardinality of a set. For finite sets, that
symbol is just the actual size of the set. Thus, we still have |{2, 79, −4}| = 3.
    For infinite sets things are much more complicated. (Did you expect otherwise?)
The smallest infinite cardinal |N| is written as ℵ0 (read as “aleph-nought”). The next
infinite cardinal is ℵ1 , and so forth. The diagram below gives some perspective to this
chain. (We put question marks in places where we do not yet have any examples.)
      The following are some examples and nonexamples involving these definitions.
Example 28.10. (1) The empty set is countable, since it is finite. It is not countably
infinite (since it isn’t infinite).
    (2) The set {1, 2, 93828283928} is countable and finite, but not infinite, and hence
not countably infinite.
    (3) Theorem 28.4 tells us that the integers are a countably infinite set. Similarly,
Example 28.3 tells us that 2N is a countably infinite set.
    (4) The set {n^2 : n ∈ Z} is infinite. We will see shortly that it is countably
infinite.
    (5) Are there any sets which are infinite but not countably infinite? These would
be sets which occur strictly above ℵ0 in the diagram below. We will prove in Section 30
that, yes, there are such sets! △
                Cardinalities      Examples
                      ⋮                 ⋮
                      ℵ2                ?                        (infinite)
                      ℵ1                ?                        (infinite)
                      ℵ0                N, 2N, Z, . . .          (infinite)
                      ⋮                 ⋮
                      2                 {1, 2}, {1, 3}, . . .    (finite)
                      1                 {1}, {2}, . . .          (finite)
                      0                 ∅                        (finite)
  Advice 28.11. To show that a set A is countably infinite, you just need to
  arrange its elements in a nonrepeating, infinite list
A = {a1 , a2 , a3 , . . .}.
    This is precisely what we did when we proved that Z is countably infinite: we put
its elements in the list 0, 1, −1, 2, −2, 3, −3, . . . .
  Warning 28.12. If you are proving that a set is countably infinite by putting
  its elements into a list, then do not skip elements and do not repeat elements.
  Otherwise, you didn’t really create a bijection.
Answer: Only (c) works. The list in (a) skips the negative integers. (However,
it does prove that the nonnegative integers are countably infinite.) The list in (b)
repeats 0. Of the choices, only (c) lists every integer exactly once, hence gives the
bijection with N.
Proof. Let A be a countably infinite set. We can write the elements of A in an infinite
list a1, a2, a3, . . . . Let B be an infinite subset of A.
    Let n1 be the smallest natural number with a_{n1} ∈ B, which exists since B ≠ ∅ as
B is infinite. Put b1 = a_{n1}.
    Next let n2 be the smallest natural number with n2 > n1 and a_{n2} ∈ B, which
exists since B − {b1} ≠ ∅ as B is infinite. Put b2 = a_{n2}.
    Repeating this process (by induction) we create an infinite list b1, b2, . . . . Clearly
there are no repetitions in this list. This new list covers every element of B because
we can also prove (by induction) that ni ≥ i for each i ∈ N; hence, we have worked
all the way through the list of elements of A.
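The construction in this proof is a procedure: scan the list a1, a2, a3, . . . in order and keep the entries that lie in B. The sketch below expresses that scan as a Python generator. The names list_subset and is_prime are illustrative, and the example takes A = N with the list a_n = n and B the set of primes. The generator only keeps producing values when B is infinite, which matches the hypothesis of the theorem.

```python
from itertools import count, islice
from math import isqrt

def list_subset(a, in_B):
    """Yield b1, b2, ... by scanning a(1), a(2), ... and keeping the members of B."""
    for n in count(1):
        if in_B(a(n)):
            yield a(n)

def is_prime(k):
    """Trial division; adequate for a small demonstration."""
    return k >= 2 and all(k % d for d in range(2, isqrt(k) + 1))

# A = N listed as a_n = n, and B = the set of primes.
first_primes = list(islice(list_subset(lambda n: n, is_prime), 7))
```

Each element of B appears exactly once, at the step where the scan reaches its position in the list for A.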
Example 28.15. Not every subset of N is countably infinite. For instance {3, 7, 19}
is a subset but not countably infinite.
    However, every infinite subset of N is countably infinite by Theorem 28.13. For
instance, since there are infinitely many primes (by Theorem 19.14), then we know
that the set of all primes
                                {2, 3, 5, 7, 11, 13, 17, . . .}
is countably infinite.
    Is S = {x^3 : x ∈ Z} = {. . . , −27, −8, −1, 0, 1, 8, 27, 64, . . .} countably infinite?
Yes! First, it is an infinite set since f (x) = x^3 is a strictly increasing (hence injective)
function on Z. As S is an infinite subset of the countably infinite set Z, we know S is
countably infinite by Theorem 28.13. △
\[ f(n) = \frac{1 + (-1)^n (2n - 1)}{2}. \] △
28.E      Exercises
Exercise 28.1. Declare whether the following statements are true or false, with
proof/reason or counterexample:
 (a) All finite sets have the same cardinality.
 (b) If f : A → B is a function between two sets, then |f | = |A| (thinking of f as a
      set of ordered pairs).
  (c) Every subset of N is countably infinite.
 (d) Every subset of an infinite set has cardinality ℵ0 .
  (e) If f : A → B is a surjective function then |f | = |B| (thinking of f as a set of
      ordered pairs).
Exercise 28.3. Prove that the set of those natural numbers with exactly one digit
equal to 7 is countably infinite. For instance, the number 103792 has exactly one of
its digits equal to 7, while 8772 has two digits equal to seven.
Exercise 28.5. Prove that the function in Theorem 28.4 is a bijection. (See the
remark following the theorem for a formal definition of the function. The Pasting
Together Theorem might be helpful.)
Exercise 28.6. Prove that |R| = |(0, 1)|. (Hint: Consider the tangent function.
Alternatively, use Exercise 28.2 in pieces.)
29.A Unions
   Advice 29.2. The type of argument used in the first paragraph of the proof
   above is referred to as reducing to a simpler situation. For instance, in the proof
   above we could say that we reduced to the case where the two sets are
   disjoint.
       After a reduction, mathematicians will simply assert that they now need
   only consider the simpler situation. For example, after the first paragraph of
   the proof above, we could simply say “We thus may assume S ∩ T = ∅.” This is
   because we would recognize that, after replacing T by T′, this situation actually
   occurs.
29.B        Products
Taking a union is not the only operation we can do with two sets. Another operation
is intersection. When we intersect two sets, the cardinality can get much smaller.
There is a third operation: the Cartesian product. Cantor came up with a very
clever method for showing that the product of two countably infinite sets is still
countably infinite. Thus we have:
29. MORE EXAMPLES OF COUNTABLE SETS                                                   223
   Travel along each arrow, starting at the smallest arrow, and passing to the next
smallest arrow. This allows us to list the elements of N × N as
(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3), . . . ,
according to when we pass through each ordered pair. We will hit each ordered pair
exactly once.
\[ f(m, n) = \frac{(m + n - 1)(m + n - 2)}{2} + n. \]
is also a bijection. (Proving that g is a bijection requires the lemma that every natural
number can be written as a unique power of 2 times a unique odd integer.)
    There are many other options. For instance, we could have used arrows pointing
down and to the left, instead of up and to the right. Alternatively, we could have
“snaked” back and forth along each finite diagonal. ▲
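The formula f(m, n) = (m + n − 1)(m + n − 2)/2 + n can be compared with the arrow-by-arrow listing (1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3), . . . given earlier. The sketch below generates that listing and checks that f assigns the positions 1, 2, 3, . . . in order. The helper name diagonal_listing and the cutoff of 100 pairs are illustrative choices, and the check covers only finitely many pairs.

```python
def f(m, n):
    """Position of (m, n) in the listing; (m+n-1)(m+n-2) is always even."""
    return (m + n - 1) * (m + n - 2) // 2 + n

def diagonal_listing(count):
    """First `count` pairs of N x N in the order (1,1), (2,1), (1,2), (3,1), ..."""
    pairs = []
    s = 2                       # s = m + n is constant along each arrow
    while len(pairs) < count:
        for n in range(1, s):   # n increases as we travel along the arrow
            pairs.append((s - n, n))
            if len(pairs) == count:
                break
        s += 1
    return pairs

listing = diagonal_listing(100)
positions = [f(m, n) for (m, n) in listing]
```

The term (m + n − 1)(m + n − 2)/2 counts the pairs on all earlier arrows, and n records how far along the current arrow the pair sits.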
Proof. Put the elements of Q+ into a diagram as below. (We put fractions which are
not in lowest terms as light gray.)
                                 1/1   2/1   3/1   4/1   ···
                                 1/2   2/2   3/2   4/2   ···
                                 1/3   2/3   3/3   4/3   ···
                                 1/4   2/4   3/4   4/4   ···
                                  ⋮     ⋮     ⋮     ⋮    ⋱
    Now, we just list the elements as before, skipping over the elements in light gray,
since they will be counted when they are in lowest terms. This counting procedure
never repeats elements (since we skip those fractions not in lowest terms), and con-
tinues forever since Q+ is infinite (since N is a subset; in other words, the top row of
the diagram is infinite).
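The listing procedure in this proof can be sketched as a generator that walks the diagonals of the diagram and skips fractions that are not in lowest terms. The direction of travel along each diagonal is an assumption made here (any fixed direction works), and the name positive_rationals is illustrative.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def positive_rationals():
    """Yield each element of Q+ exactly once, diagonal by diagonal."""
    s = 2                              # numerator + denominator on this diagonal
    while True:
        for den in range(1, s):
            num = s - den
            if gcd(num, den) == 1:     # skip fractions not in lowest terms
                yield Fraction(num, den)
        s += 1

first = list(islice(positive_rationals(), 10))
sample = list(islice(positive_rationals(), 200))
```

The first ten values produced are 1, 2, 1/2, 3, 1/3, 4, 3/2, 2/3, 1/4, 5. The gcd test plays the role of skipping the light gray entries, so no value is repeated.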
We finish with one more example of how to show a set is countably infinite.
    First, the set S is infinite, since the left column is infinite. Since S ⊆ N × N and
N × N is countably infinite, we know that S is countably infinite by Theorem 28.13.
    Alternatively, we can list the elements of S by using the “up arrow” argument
from earlier. (We can’t list the elements of S by going down columns, but could we
list the elements of S by traveling across the successive rows?) △
29.D      Exercises
Exercise 29.1. Finish the proof of Theorem 29.1.
(Hint: There are two unfinished cases: (a) both S and T′ are finite, or (b) one of
them is finite and the other infinite.)
Exercise 29.2. Prove that {0, 1} × N is countably infinite. (Hint: Use theorems in
the section.)
Exercise 29.3. Let A and B be countable sets. Prove that A × B is countable. (How
is this different from what was proved in Theorem 29.3?)
Exercise 29.6. Prove that if A1, A2, . . . are pairwise disjoint, countably infinite sets,
then ⋃_{i=1}^{∞} Ai is countably infinite. (Hint: Not induction. Think about “up arrow”
arguments. Alternatively, you could construct a bijection from N × N to ⋃_{i=1}^{∞} Ai.)
Exercise 29.7. Prove that the set of all finite subsets of N is countably infinite.
30        Uncountable sets
The results of this section will be centered around the following definition.
   We can think of the uncountable sets as those sets which are bigger than the
countably infinite sets, as in the following figure.
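[Figure: the chain of cardinalities 0, . . . , ℵ1, ℵ2, . . . , with the infinite cardinals ℵ1, ℵ2, . . . marked “Uncountable”; the only example shown is ∅ for cardinality 0.]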
    As is evident from this diagram, we still don’t have any examples of uncountable
sets. In this section we will see that there are many examples.
                                       1 = 1.00000 . . .
                                      √2 = 1.41421 . . .
                                   −π/13 = −0.24166 . . .
                           e^{−6} − 24/7 = −3.42609 . . .
      However, real numbers do not always have unique infinite decimal expansions. If a
number ends in repeating 9’s, we can shift up and end in repeating 0’s. For example,
                            0.99999 . . . = 1.00000 . . .
                          8.3929999 . . . = 8.3930000 . . .
                     −3928.83829999 . . . = −3928.83830000 . . .
To avoid nonuniqueness issues, we will always avoid writing decimal expansions which
end in repeating 9’s.
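As a small numerical illustration of why the two expansions name the same number, the sketch below uses exact rational arithmetic to measure the gap between 8.393 and the truncations 8.3929, 8.39299, 8.392999, and so on. The helper name `truncated_nines` is ours and does not come from the text. The gap after k nines is exactly 10^−(3+k), so it drops below any positive bound.

```python
from fractions import Fraction

def truncated_nines(prefix: str, nines: int) -> Fraction:
    """Exact value of the finite decimal `prefix` followed by `nines` 9's.

    `prefix` is a decimal string such as "8.392" (illustrative input).
    """
    return Fraction(prefix + "9" * nines)

target = Fraction("8.393")
for k in (1, 3, 6):
    gap = target - truncated_nines("8.392", k)
    # "8.392" has 3 decimal places, so the gap after k nines is 10^-(3+k).
    print(k, gap)
```

This only examines finitely many truncations; the equality of the infinite expansions is a statement about the limit.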
    Our goal now is to show that R is uncountable. From a previous homework
problem we know that |(0, 1)| = |R|, so it suffices to show that (0, 1) is uncountable.
(This set is easier to work with.) We know that (0, 1) is infinite, so to prove that it
is uncountable we must show that there does not exist any bijection f : N → (0, 1).
Cantor’s trick to do this is to show that every function f : N → (0, 1) is not surjective,
using what is now commonly called a “diagonalization argument.” Before we give the
technical proof, we demonstrate the idea with an example.
    Suppose f : N → (0, 1) is the function
                                f (1)   =    0.29838293 . . .
                                f (2)   =    0.43828183 . . .
                                f (3)   =    0.73826261 . . .
                                f (4)   =    0.20030000 . . .
                                f (5)   =    0.73724892 . . .
                                         ⋮
Our goal is to prove that f is not surjective. Thus, we must find some element
x ∈ (0, 1) that f does not map to. We will construct x digit by digit, so that it
doesn’t match any of the numbers on our list.
    First, we want x to be different from f (1) = 0.29838293 . . .. We can make sure this
is true by guaranteeing that the first digit (past the decimal point) of x is different
from the first digit of f (1). So, let’s change that first 2 to a 4, and put
                                        x = 0.4 . . . .
Notice that no matter what we do with the rest of the digits of x, it will not match
f (1).
    Second, we want x to be different from f (2) = 0.43828183 . . .. They do match on
their first digit, but we can make their second digits different by changing the 3 to a
4. So we put
                                     x = 0.44 . . .
and it will not equal f (1) or f (2).
   Third, we want x to be different from f (3) = 0.73826261 . . .. It already is different
because of our choice of the first two digits, but we probably should continue the
pattern we’ve already come up with, to make sure that the third digit is different. So
we change the 8 to a 4, and put
                                        x = 0.444 . . .
x = 0.4444 . . .
Remark 30.2. If we start with a different list of numbers, the number x we construct
will be different (depending on that list). ▲
To make this work more easily, define the digit change function dig by the rule
                                  dig(i) = 4   if i ≠ 4,
                                  dig(i) = 7   if i = 4.
Note that because the digit change function does not use 9’s, we don’t need to worry
about x ending in repeating 9’s.
Remark 30.3. There are many other digit change functions we could have used. This
is just one option. Because there are so many different options, you should always tell
your reader which digit change function you are using by giving the definition. ▲
    We are now prepared to give the formal proof that (0, 1) is uncountable. As
discussed above, the technique used in this proof is known as Cantor’s diagonalization
argument.
Proof. Let f : N → (0, 1) be any function. We will show that f is not surjective.
    Write f (n) using a decimal expansion f (n) = 0.d_{1,n} d_{2,n} d_{3,n} . . . (which doesn’t
end in repeating 9’s). Let x ∈ (0, 1) be the number with decimal expansion x =
0.x_1 x_2 x_3 . . . where x_n = dig(d_{n,n}). In other words, the nth digit of x is the digit
change of the nth digit of f (n). Since dig(d) ≠ d for every digit d, and neither expansion
ends in repeating 9’s, the numbers x and f (n) differ in their nth digits. Hence x ≠ f (n)
for each n ∈ N. Therefore f is not surjective, as x is not in the image.
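The construction in the proof can be tried on the five-row example from earlier in this section. The following is a minimal sketch, assuming each f (n) is stored as a string holding the digits after the decimal point; the function name `diagonal_digits` is ours. Only a finite prefix of x can be computed this way, so the code illustrates the construction rather than proving anything.

```python
def dig(i: int) -> int:
    """Digit change function from the text: 4 unless the digit is already 4."""
    return 7 if i == 4 else 4

def diagonal_digits(rows: list[str]) -> str:
    """Return the first len(rows) digits of x, where rows[n-1] holds the
    digits of f(n) after the decimal point (only a finite prefix is known)."""
    return "".join(str(dig(int(row[n]))) for n, row in enumerate(rows))

# Digits of f(1), ..., f(5) from the worked example.
rows = ["29838293", "43828183", "73826261", "20030000", "73724892"]
x = diagonal_digits(rows)
print("x = 0." + x + " . . .")

# x differs from f(n) in the nth digit, for every n on the list.
for n, row in enumerate(rows):
    assert x[n] != row[n]
```

Note that the fifth digit of f (5) is already a 4, which is exactly the case where dig switches to 7.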
    The cardinality of R is called the continuum, and we write |R| = c. You might
ask: Where does c fit in the chain of cardinalities? Is it just one step up from ℵ0 ?
    The answer is strange. It depends on the axioms you use! Some mathemati-
cians do assume c = ℵ1 ; this assumption is called the continuum hypothesis. Most
mathematicians simply do not worry about this question.
    We have already seen that |(0, 1)| = |R|, so (0, 1) also has continuum cardinality.
Here are some more examples of sets with continuum cardinality.
  (1) Any open interval (a, b) with a, b ∈ R. (We can also replace a with −∞, or b
      with ∞.)
  (2) Any half-open interval [a, b) with a, b ∈ R. (We can replace b with ∞.)
  (3) Any half-open interval (a, b] with a, b ∈ R. (We can replace a with −∞.)
  (4) Any closed interval [a, b] with a, b ∈ R.
    To give the idea behind how to prove these facts, we will show that (0, 1] has
continuum cardinality.
It is easy to see that f is a bijection from (0, 1] − S to the set (0, 1) − S (as it is
essentially the identity function on this set). It is also a bijection from S → (0, 1) ∩ S.
By pasting together, we have a bijection.
   We end this section with one last result which can be used to tell whether a set is
uncountable.
Proof. This is the contrapositive of Theorem 28.13, after noting that A and B must
be infinite.
30.B     Exercises
Exercise 30.1. Let a, b ∈ R with a < b. Construct a bijection f : (0, 1) → (a, b), and
prove it is a bijection. (This shows that bounded open intervals all have the same
cardinality.)
Exercise 30.2. Prove that the interval [0, 1) has continuum cardinality, by creating
a bijection [0, 1) → (0, 1).
Exercise 30.3. Prove that the interval [0, 1] has continuum cardinality.
Exercise 30.4. Prove that the irrational numbers are uncountable. (Hint: Theorem
29.1 may be useful, along with contradiction.) Find a subset of the irrational numbers
which is countably infinite.
  Advice 31.3. We can think of injections as giving only “half” of the information
  needed to construct a bijection, which is why we only get an inequality ≤.
    You might recall that in our tower of cardinalities (found at the beginning of the
previous section) we had an infinite list of infinite cardinalities ℵ0 < ℵ1 < ℵ2 < . . ..
But so far, we have only found two types of infinite cardinalities: the countably infinite
sets, and the sets of continuum size. In our next theorem we will prove that for any
set S we have |S| < |P(S)|. Thus, we have an infinite chain of increasing infinite
cardinalities
                          |N| < |P(N)| < |P(P(N))| < . . . .
    When S is a finite set, say |S| = n, then we know |S| < |P(S)| because n < 2^n.
But how does this process work when S is an infinite set? In that situation we
cannot simply count elements. Rather, we must prove that there is no bijection
g : S → P(S). Our approach will be similar to how we showed R is not countable.
Start with an arbitrary function g : S → P(S), and show that g is not surjective
by finding some set B ∈ P(S) which is not in the image of g. The hardest part is
constructing B. We will give an explicit example (using finite sets), and then give
the formal proof for arbitrary sets.
    Fix S = {1, 2, 3}. Hence
                P(S) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}.
Consider the function g : S → P(S) given by g(1) = {2}, g(2) = ∅, and g(3) =
{1, 2, 3}. We want to find a set B ∈ P(S) that we can prove is not equal to g(1),
g(2), or g(3). Of course, we could just pick one of the other five sets not in the image
of g in this case; but we want to come up with a method that will work for any set S.
    So, we ask the question: Is x ∈ g(x)?
    • Is 1 ∈ g(1) = {2}? The answer is no.
    • Is 2 ∈ g(2) = ∅? The answer is no.
    • Is 3 ∈ g(3) = {1, 2, 3}? The answer is yes.
We construct B ∈ P(S) by the following rule: If the answer to the question “Is
x ∈ g(x)?” is no then we put x ∈ B, but if the answer is yes then we leave x out of
B. Using the answers we had above, we see that B = {1, 2}.
    Notice that B will not equal g(x) because if x ∈ g(x) then x ∉ B, and vice versa.
Indeed, we see that
    • 1 ∉ g(1) but 1 ∈ B.
    • 2 ∉ g(2) but 2 ∈ B.
    • 3 ∈ g(3) but 3 ∉ B.
This forces B to be different from any element in the image.
    One more example is in order, to test our understanding. Suppose that S =
{1, 2, 3} as above, and suppose h : S → P(S) is the function defined by the rule
h(1) = {2, 3}, h(2) = {2}, h(3) = {2, 3}. If we follow the same pattern as above,
asking the question “Is x ∈ h(x)?” and using the answers to define B, what set B do
we get? (Before looking at the answer, try this construction yourself.)
    Answer: The set is {1}.
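Both examples above can be checked mechanically. The sketch below represents a function S → P(S) as a Python dictionary, which is our own encoding and not part of the text, and builds B by asking “Is x ∈ g(x)?” for each x.

```python
def barber_set(S: set, g: dict) -> set:
    """Return B = {x in S : x not in g(x)} for a function g given as a dict."""
    return {x for x in S if x not in g[x]}

S = {1, 2, 3}
g = {1: {2}, 2: set(), 3: {1, 2, 3}}
h = {1: {2, 3}, 2: {2}, 3: {2, 3}}

print(barber_set(S, g))   # the set B built from g
print(barber_set(S, h))   # the set B built from h
```

The same function works unchanged for any finite set S and any dictionary whose values are subsets of S.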
Remark 31.4. The set B is sometimes called the barber set. This is because there
is some connection with the following paradox: There lives a barber in a small town
who always obeys the rule that he will shave everyone in town who doesn’t shave
themselves, but if they shave themselves he will not shave them.
    Does the barber shave himself? If he does, then he cannot shave himself by his
own rule. But if he doesn’t, then he must shave himself by his rule.
    One way to resolve the paradox is to assume instead the barber does not live in
the town. This corresponds, roughly, to the fact that B is not in the image of the
function. ▲
      We are now ready to prove the theorem in general.
31. INJECTIONS AND CARDINALITIES                                                   233
Proof. Let S be any set. First, we prove that |S| ≤ |P(S)|, so we need to find an
injective function f : S → P(S). Define f by the rule f (s) = {s}. To prove that f
is injective, let a, b ∈ S and assume f (a) = f (b). Hence {a} = {b}. Therefore a = b,
since sets are equal exactly when they have the same elements.
    Next, let g : S → P(S) be an arbitrary function. We will show that g is never
surjective, and hence there is no bijection between S and P(S). Define the (barber)
set as
                                B = {x ∈ S : x ∉ g(x)}.
This is a subset of S, hence an element of P(S). We will show that B is not in the
image of g.
   Let s ∈ S be arbitrary. There are two cases.
Case 1: Assume s ∈ g(s). In this case s ∉ B, hence g(s) ≠ B, since s belongs to g(s) but not to B.
Case 2: Assume s ∉ g(s). Then s ∈ B, hence g(s) ≠ B, since s belongs to B but not to g(s).
   In every case B ≠ g(s). Since s ∈ S was arbitrary, this means B cannot be in the
image (since it does not equal any element of the image).
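For a small finite set, the conclusion of the proof can be confirmed by brute force. The sketch below is an illustration only, with helper names of our own choosing: it enumerates every function g : S → P(S) for S = {1, 2, 3} and checks that the barber set is never in the image. A finite check like this says nothing about infinite sets, which is why the proof above is needed.

```python
from itertools import combinations, product

def power_set(S):
    """All subsets of S, as frozensets."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def barber_set(S, g):
    """B = {x in S : x not in g(x)}, with g given as a dict."""
    return frozenset(x for x in S if x not in g[x])

S = {1, 2, 3}
P = power_set(S)
elements = sorted(S)

checked = 0
for images in product(P, repeat=len(elements)):  # every function g : S -> P(S)
    g = dict(zip(elements, images))
    assert barber_set(S, g) not in g.values()    # B is missed by g
    checked += 1

print(len(P), checked)
```

The loop visits 8 · 8 · 8 functions, one for each way of choosing the three images from the 8 subsets.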
Proof. We must construct a bijection between the two sets P(S) and F(S). The
map is this: send a subset A ⊆ S to its characteristic function, χA : S → {0, 1}. In
other words, we define f : P(S) → F(S) by the rule
f (A) = χA .
We first show that f is injective. Let A, B ∈ P(S) and assume f (A) = f (B). We
then have χA = χB . Plugging in an arbitrary element s ∈ S, we have χA (s) = χB (s).
The left-hand side is 1 when s ∈ A and 0 otherwise, and similarly for the right-hand
side. Thus, s ∈ A if and only if s ∈ B. In other words A = B.
    Finally, we show that f is surjective. Let ϕ : S → {0, 1} be any function. Put
A = {s ∈ S : ϕ(s) = 1}. We then check directly that ϕ = χA . (They have the same
domain and codomain, and the same rule.) Hence ϕ = f (A), so f is surjective.
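The correspondence A ↦ χA can be made concrete for a small set. In this sketch, which uses an encoding of our own, a function S → {0, 1} with S = {1, 2, 3} is stored as a tuple of its three values. The code checks that sending a subset to its characteristic function hits every such tuple exactly once, and that the set {s ∈ S : ϕ(s) = 1} from the surjectivity argument undoes it.

```python
from itertools import combinations, product

S = [1, 2, 3]

def chi(A):
    """Characteristic function of A, stored as the tuple (chi_A(s) for s in S)."""
    return tuple(1 if s in A else 0 for s in S)

def subset_of(phi):
    """Inverse direction: the set {s in S : phi(s) = 1}."""
    return frozenset(s for s, value in zip(S, phi) if value == 1)

subsets = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]
functions = list(product((0, 1), repeat=len(S)))   # all maps S -> {0, 1}

# f(A) = chi_A hits every function exactly once.
assert sorted(chi(A) for A in subsets) == sorted(functions)
# The two constructions undo each other.
assert all(subset_of(chi(A)) == A for A in subsets)
```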
Proof. The equality |P(N)| = |F(N)| follows from the previous theorem. We also
know |N| < |P(N)|, hence F(N) is uncountable.
31.C     Exercises
Exercise 31.1. Answer each of the following true or false problems, proving your
answer.
 (a) Every uncountable set has the same cardinality as (0, 1).
 (b) Let A and B be sets. If A ⊆ B, then |A| ≤ |B|.
 (c) For sets A and B, if A ⊊ B, then |A| < |B|.
 (d) Given sets A, B, and C, if A ⊆ B ⊆ C and both A and C are countably infinite,
      then B is countably infinite.
 (e) No subset of R has smaller cardinality than R.
  (f) For sets S and T , if |S| < |T | and S is finite, then T is infinite.
 (g) For sets S and T , if |S| < |T | and S is countable, then T is uncountable.
 (h) For sets S and T , if |S| < |T | and S is countably infinite, then T is uncountable.
  (i) For any set S, there exists another set T such that |S| < |T |.
Exercise 31.2. Let S = {a, b, c, d, e} and let g : S → P(S) be defined by the rule
g(a) = {b, d}, g(b) = {a, c, e}, g(c) = {a, c, d, e}, g(d) = ∅, g(e) = {e}. List the
elements of the barber set B = {s ∈ S : s ∉ g(s)}. Why is it not in the image of g?
Exercise 31.3. Find a set with cardinality bigger than that of R. Then find a set
with cardinality bigger than that.
Exercise 31.4. Theorem 27.5 says that for finite sets A and B, if |A| = |B| and
f : A → B is a function, then f is injective if and only if f is surjective. Prove that
this fails for infinite sets, by proving the following:
  (a) Find an infinite set S and a function f : S → S that is injective but not surjec-
      tive.
  (b) Find an infinite set S and a function g : S → S that is surjective but not
      injective.
In both parts prove that the function you construct has the requisite properties.
Exercise 31.5. Let A and B be sets with f : A → B a bijection. Define a new map
g : P(A) → P(B) by the rule g(S) = {f (s) : s ∈ S}, where S ⊆ A is an arbitrary
element of P(A). Prove that g is a bijection.
    Conclude that if |A| = |B| then |P(A)| = |P(B)|.
f (x) = {q ∈ Q : q ≤ x}.
Prove that f is injective. (Hint: For any two real numbers x < y, there is a rational
number strictly between them. See Exercise 11.6.)
    Using this injection, in conjunction with the previous exercise, derive the inequal-
ity |R| ≤ |P(N)|.
Exercise 31.7. Let A and B be nonempty sets. Prove that there exists an injection
f : A → B if and only if there exists a surjection g : B → A. (Hint: For the backwards
direction, given a surjection g : B → A define a function f : A → B by the rule f (a) =
one of the elements which mapped to a.)
Remark 32.2. The story of how this theorem came to be is long and somewhat
convoluted. Cantor was the first to state the theorem, but apparently he had no
proof. The first proof (that we know of) was found by Dedekind, but he did not
publish his work at the time.
   Schröder announced a proof, which was later shown to have an error. Finally, in
1897, Bernstein (who was only 19 years old, and a student of Cantor) presented a
proof. At nearly the same time Schröder independently found an error-free proof as
well. Hence, these two mathematicians have their names attached to the theorem. ▲
                                A = A1 ∪ A2 ∪ A3 ∪ A4
                                ↓h  ↓h1  ↓h2  ↓h3  ↓h4
                                B = B1 ∪ B2 ∪ B3 ∪ B4.
   The only information we have available comes from the two maps f and g that
we have given to us. We must somehow use the maps f and g to make any progress
on this problem. We might ask how these maps behave. Fix some element a0 ∈ A.
Applying f , we have a new element f (a0 ) ∈ B. We call this new element b0 . We can
think of a0 as the parent of b0 , because a0 gives rise to b0 (through the function f ).
We also call b0 the child of a0 .
32. THE SCHRÖDER–BERNSTEIN THEOREM                                                      237
[Figure: the element a0 of A is sent by f to its child b0 in B.]
     There are some very important facts we need to know about this parent-child
relationship. First, every element in A is the parent of exactly one child in B because
f is a function. Second, it may be the case that some element of B is parentless,
because f may not be surjective. Third, every element in B which actually is a child
has exactly one parent in A, because f is injective.
     We can also talk about elements of B having children, using the function g, and
the same facts we mentioned in the previous paragraph are still true.
     Does b0 have a child? Yes! We can pass back over to A by applying the map g to
b0 . Set a1 = g(b0 ) ∈ A, which is the child of b0 . There are two cases.
     Case 1: a0 = a1 .
[Figure: f sends a0 to b0, and g sends b0 back to a0.]
In this case, the maps f and g just send a0 and b0 back and forth to each other. Note
that a0 is its own grandparent! It seems very natural that in this case we would want
a0 and b0 to correspond under h.
    Case 2: a0 ≠ a1.
[Figure: f sends a0 to b0, and g sends b0 to a new element a1.]
We can now repeat the parent-to-child process. Let b1 = f(a1) ∈ B which is the child
of a1. Is it possible that b1 = b0? No! We have b0 = f(a0) ≠ f(a1) = b1, because f is
injective and a0 ≠ a1.
    Similarly, let a2 = g(b1) ∈ A which is the child of b1. We see that a1 = g(b0) ≠
g(b1) = a2, because g is injective and b0 ≠ b1. However, we cannot tell whether a0 and
a2 are the same or different, so we again have two cases. We will separately consider
when a0 = a2 and when a0 ≠ a2.
    Case 2A: a0 = a2 . We picture this situation as follows:
[Figure: the loop a0 → b0 → a1 → b1 → a0 under f and g.]
In this case, applying f and g sends the points a0 , b0 , a1 , b1 in a loop. (Here, each
element in the loop is its own great-great-grandparent.) It seems natural for a0 and
b0 to correspond, and for a1 and b1 to correspond, under h.
    Case 2B: a0 ≠ a2.
[Figure: the chain a0 → b0 → a1 → b1 → a2 under f and g.]
     At this point it is recommended that the readers work out for themselves that if
we let b2 = f (a2 ), then b2 is different from b0 and b1 (by injectivity of f ). Similarly,
if we let a3 = g(b2 ), then the readers should show that a3 is different from a1 and a2
(by injectivity of g). Again we have two cases: a3 could equal a0 and we have a loop,
or it is a new element.
     In general, for n ≥ 0, define b_n = f(a_n) to be the child of a_n, and define a_{n+1} =
g(b_n) to be the child of b_n. Working by induction one can prove that there are exactly
two options. First, these elements can end up in a loop with a_{n+1} = a_0, where
a_0, a_1, a_2, . . . , a_n are distinct elements of A and b_0, b_1, b_2, . . . , b_n are distinct elements
of B. The second option is that there is no loop, and so we have an infinite chain
of descendants: a_0, a_1, a_2, . . . are distinct elements of A and b_0, b_1, b_2, . . . are distinct
elements of B.
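The two options can be explored experimentally. The following is a minimal sketch, assuming f and g are supplied as Python functions; the name `descendants` and both sample pairs of maps are our own illustrations. It follows the chain for a bounded number of steps and reports whether the chain returns to its starting element. Testing only against the starting element is enough because, as the induction argument above shows, a chain built from injective maps can only loop back to a0.

```python
def descendants(f, g, a0, max_steps=50):
    """Follow a0 -> b0 -> a1 -> b1 -> ... where b_n = f(a_n), a_{n+1} = g(b_n).

    Returns ("loop", n + 1) if a_{n+1} == a0 for some n < max_steps (the loop
    contains n + 1 elements of A), or ("no loop found", max_steps) otherwise.
    """
    a = a0
    for n in range(max_steps):
        b = f(a)      # child of a_n
        a = g(b)      # child of b_n
        if a == a0:
            return ("loop", n + 1)
    return ("no loop found", max_steps)

# Illustrative finite example: injections on {0, 1, 2} given by rotation.
rotate = lambda k: (k + 1) % 3
identity = lambda k: k
print(descendants(rotate, identity, 0))  # loop through a0, a1, a2

# Illustrative infinite example on N: doubling never returns to a0.
double = lambda k: 2 * k
print(descendants(double, double, 1))
```

A result of "no loop found" only means that no loop appeared within `max_steps` steps; for the doubling maps the chain is strictly increasing, so it genuinely never loops.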
[Figure: the infinite chain of descendants a0 → b0 → a1 → b1 → a2 → b2 → · · · under f and g.]
   Our discussion has concentrated on passing from a parent to a child, but we can,
sometimes, reverse the process. Remember that elements can have at most one parent
(because f and g are injective). So, if our chain of descendants ends in a loop, then
when we go “backwards” up through the ancestors, we just cycle backwards through
the loop.
   What happens in the case where an element a0 had a nonlooping chain of descen-
dants? If a0 has no parent, we can say that a0 is an ultimate ancestor, and we have
the entire chain of descendants and ancestors.
However, if a0 does have a parent b−1 ∈ B, then our chain can be extended:
[Figure: the chain b−1 → a0 → b0 → a1 → b1 → a2 → b2 → · · ·, extended upward by the parent b−1 of a0.]
In this case, b_{−1} ≠ b_n for any n ≥ 0, as depicted in the picture above. This is because
g(b_{−1}) = a_0 ≠ a_{n+1} = g(b_n) and g is injective.
    It is possible that b−1 has no parent, and hence is the ultimate ancestor at which
the chain stops. On the other hand b−1 could have a parent a−1 , and a similar
argument shows that a−1 does not equal any of a0 , a1 , a2 , . . ..
    In total, we see that there are four types for the chain of descendants and ancestors
of an element:
  (1) The chain forms a finite loop (of the type described above).
  (2) The chain never loops and has an ultimate ancestor in A.
  (3) The chain never loops and has an ultimate ancestor in B.
  (4) The chain never loops and has no ultimate ancestor. (Thus, it is doubly infinite.)
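To see types (2) and (3) in a concrete setting, take A = B = N with f(a) = 2a and g(b) = 2b. These maps are an illustration of ours and are not taken from the text. An element has a parent exactly when it is even, and the parent is half of it, living in the other set. The sketch below walks up through the ancestors until it reaches an odd number, which is the ultimate ancestor.

```python
def ultimate_ancestor(a: int):
    """Trace the ancestors of a in A = N, where f(a) = 2a maps A -> B = N
    and g(b) = 2b maps B -> A (illustrative maps, not from the text).

    Returns (side, value): which set the ultimate ancestor lies in, and
    the ancestor itself.
    """
    if a < 1:
        raise ValueError("a must be a natural number (at least 1)")
    side, value = "A", a
    while value % 2 == 0:                 # an even number has a parent
        value //= 2                       # the parent is half of it ...
        side = "B" if side == "A" else "A"  # ... and lies in the other set
    return side, value

for a in (1, 2, 4, 6, 8, 12):
    print(a, ultimate_ancestor(a))
```

With these particular maps every chain has an ultimate ancestor, because a natural number can only be halved finitely many times, so types (1) and (4) do not occur here.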
We are now ready to describe a partition of A. We put A = A1 ∪ A2 ∪ A3 ∪ A4 where
a0 ↦ b0 ↦ a1 ↦ b1 ↦ · · · ↦ an ↦ bn ↦ a0.
The element b0 = f (a0 ) belongs to the same loop, and hence b0 ∈ B1 . We define
h1 = f |A1 : A1 → B1 .
32.C        Examples
The Schröder–Bernstein theorem is not only beautiful symbolically, but also quite
useful because it is sometimes very easy to describe injections back-and-forth between
two sets A and B, yet it may be difficult to describe a bijection. Here are some
standard examples.
Example 32.3. We will prove that the closed interval [3, 10] has the same cardinality
as (0, 1).
    Define f : [3, 10] → (0, 1) by the rule f (x) = (x − 2)/10. This is a linear function
with f (3) = 1/10 and f (10) = 8/10. So it maps [3, 10] into the interval [1/10, 8/10] ⊆
(0, 1) injectively.
    On the other hand, the map g : (0, 1) → [3, 10] given by g(x) = x + 3 is also an
injection.
    By the Schröder–Bernstein theorem, we are done. △
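The arithmetic in Example 32.3 is easy to spot-check with exact rationals. The sketch below evaluates f and g at sample points chosen by us and confirms that the images land in the stated targets without collisions. Sampling cannot prove injectivity; that follows from both maps being strictly increasing linear functions.

```python
from fractions import Fraction

def f(x):
    """f : [3, 10] -> (0, 1), f(x) = (x - 2)/10, as in Example 32.3."""
    return (x - 2) / Fraction(10)

def g(x):
    """g : (0, 1) -> [3, 10], g(x) = x + 3, as in Example 32.3."""
    return x + 3

# Sample points (illustrative test data) in each domain.
xs = [Fraction(3) + Fraction(7 * k, 20) for k in range(21)]   # 3, ..., 10
ys = [Fraction(k, 20) for k in range(1, 20)]                  # 1/20, ..., 19/20

assert all(0 < f(x) < 1 for x in xs)          # f lands in (0, 1)
assert all(3 <= g(y) <= 10 for y in ys)       # g lands in [3, 10]
assert len({f(x) for x in xs}) == len(xs)     # no collisions on the sample
assert len({g(y) for y in ys}) == len(ys)
print(f(Fraction(3)), f(Fraction(10)))
```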
      This next example is so important that we will call it a theorem.
Proof. In Exercise 31.6, we proved that |R| ≤ |P(N)|. (For an alternate proof of this
inequality, see Exercise 32.5 below.) By the Schröder–Bernstein theorem, it suffices
to now prove |P(N)| ≤ |R|.
    Define f : P(N) → R by the rule f (A) = 0.χA (1)χA (2)χA (3) . . .. (For instance, if
A = {1, 3, 4, 7, 9, . . .} then f (A) = 0.101100101 . . . ∈ R.) It just remains to show that
this function is injective. Let A, B ⊆ N be arbitrary, and assume f (A) = f (B). Thus
                    0.χA (1)χA (2)χA (3) . . . = 0.χB (1)χB (2)χB (3) . . . .
Since neither decimal expansion involves repeating 9’s, the two expansions are equal.
Hence χA (n) = χB (n) for each n ∈ N. This means that A and B have exactly the
same elements so A = B, which finishes showing that f is an injective function.
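The map f from the proof can be computed digit by digit. The sketch below, whose function name `expansion_prefix` is ours, returns a finite prefix of the decimal expansion of f(A). It uses only the elements 1, 3, 4, 7, 9 listed in the example, since the remaining elements of that set are not specified.

```python
def expansion_prefix(A, digits: int) -> str:
    """First `digits` decimal digits of f(A) = 0.chi_A(1) chi_A(2) chi_A(3)...

    A may be any container supporting `in`; only a finite prefix is computed.
    """
    return "0." + "".join("1" if n in A else "0" for n in range(1, digits + 1))

A = {1, 3, 4, 7, 9}
print(expansion_prefix(A, 9))

# A set that differs from A only at 9 and 10 first disagrees in the 9th digit.
B = {1, 3, 4, 7, 10}
print(expansion_prefix(B, 10))
```

Because every digit is 0 or 1, no expansion produced this way ends in repeating 9’s, which is the fact the injectivity argument relies on.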
32.D      Exercises
Exercise 32.1. Let X, Y , and Z be sets. Prove that if X ⊆ Y ⊆ Z and |X| = |Z|,
then |X| = |Y | as well.
Exercise 32.2. Prove that [5, 16) and (0, ∞) have the same cardinality.
Exercise 32.4. Complete the proof in case 2 (of Subsection 32.B) of the Schröder–
Bernstein theorem, by showing that f |A2 is a function from A2 to B2 , and also that
it is bijective.
Exercise 32.5. In Exercise 31.6 we showed that |R| ≤ |P(N)|. Here is another way
to do that.
    Define a function f : (0, 1) → P(N), by sending (the decimal expansion of) a real
number 0.a1 a2 a3 . . . (not ending in repeating 9’s) to the set
                    {a_n · 10^{n−1} : n ∈ N} − {0}.
(For instance, 0.03193 . . . maps to {0, 30, 100, 9000, 30000, . . .} − {0}.) Prove that this
is an injective function.
Chapter IX
Introduction to Analysis
The only way to discover the limits of the possible is to go beyond them into the
impossible. Arthur C. Clarke
    In the third century BC, the Greek mathematician Archimedes used the “method
of exhaustion” to estimate the circumference of a circle of diameter 1, and thus
estimate the value of π. His method involved inscribing a regular n-gon inside the
circle, circumscribing the circle by a regular n-gon, and bounding the circumference
of the circle between the perimeters of the two n-gons. For example, taking n = 4,
n = 5, and n = 6, we get the following approximations.
    n = 4: 2.8284 < π < 4.0000;   n = 5: 2.9389 < π < 3.6327;   n = 6: 3.0000 < π < 3.464.
As n gets larger the approximations get better; for n = 100 we get 3.141 < π < 3.143,
and for n = 1000 we get 3.141587 < π < 3.141603. This computation was among the
first uses in antiquity of the idea of a limit; however, it would be nearly 2,000 years
before the concept of limit was formally defined and given a logical foundation.
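The bounds quoted above can be reproduced with a few lines of code. For a circle of diameter 1, the inscribed regular n-gon has perimeter n·sin(π/n) and the circumscribed one has perimeter n·tan(π/n). The sketch below uses these formulas with Python's built-in value of π, so it is circular as a way of estimating π and is not how Archimedes worked; it is meant only to check the numbers.

```python
import math

def polygon_bounds(n: int) -> tuple[float, float]:
    """Perimeters of the inscribed and circumscribed regular n-gons for a
    circle of diameter 1: n*sin(pi/n) and n*tan(pi/n)."""
    angle = math.pi / n
    return n * math.sin(angle), n * math.tan(angle)

for n in (4, 5, 6, 100, 1000):
    lower, upper = polygon_bounds(n)
    print(f"n = {n:4d}: {lower:.6f} < pi < {upper:.6f}")
```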
    Newton and Leibniz used a concept of limit in the development of calculus, but
it was not until around 1820 that Bolzano and Cauchy formalized the definition of
limit. It was even later when it was finally written in the way most mathematicians
now use limits.
33     Sequences
The infinite list of numbers
                                  1/1, 1/2, 1/3, 1/4, 1/5, 1/6, . . .
is an example of what we shall call a sequence. Sequences arise naturally in many
contexts. For instance, you might measure the speed of a race car every second and
produce a list of speeds. Or you could measure, over time, the temperature of heated
metal. Maybe your list of numbers is the total population in a bacterial culture,
measured every morning in the lab.
    In all these cases, the numbers give us a brief glimpse at a process that could
continue on forever. Extrapolating from the data, we might make a guess about how
the sequence behaves, or perhaps fit it to a nice function. For instance, you might
guess that the list of numbers above comes from the function 1/n, as n ranges over
the natural numbers (and you’d be right!).
    A fundamentally important question we can ask is: Where are these numbers
headed? Scientists use the mathematical theory developed in this section to determine
the eventual behavior, or limit, of such sequences.
a1 , a2 , a3 , . . .
where an = f (n).
Example 33.2. It is important to be able to pass back and forth between a list of
numbers and the function that defines the list.
   For example, define a function f : N → R by the rule f (n) = 2n. Thus, the
nth term of our sequence is an = 2n, and the first few terms are given as follows:
a1 = 2, a2 = 4, a3 = 6, a4 = 8, a5 = 10, . . ..
   On the other hand, if you are given the list of numbers 2, 4, 8, 16, . . ., then you
might guess that this sequence arises from the function g : N → R given by the rule
g(n) = 2^n. △
Example 33.3. Try to figure out the rule for the following sequences:
 (1) −1, 1, −1, 1, −1, 1, . . .,
 (2) 1, 3, 5, 7, . . .,
 (3) 1, 0, 0, 0, 0, . . .,
 (4) −10, 18397863, 2, 939, −10383, . . ..
a1 , a2 , a3 , . . .
we could write (a_n)_{n∈N} or (2^n)_{n∈N}. Both notations would refer to the sequence
2, 4, 8, . . . . △
                                       1, 3, 5, 7, 9, . . .
                                 13, 8, 3, −2, −7, . . .
                                   6, 9, 12, 15, 18, . . .
    Answer: Each term in the sequence is a fixed distance from the previous term.
In the first sequence, the terms jump by adding 2, in the second sequence they jump
by adding −5, and in the last sequence the terms jump by adding 3.
    These types of sequences are so common we give them a special name.
a_n = c + (n − 1)d
Example 33.7. The arithmetic sequence with first term 2 and common difference 2
has terms
                a1 = 2, a2 = 4, a3 = 6, a4 = 8, a5 = 10, . . .
The nth term is given by the formula a_n = 2 + (n − 1)2 = 2n. △
   Can you find the first six terms of the arithmetic sequence with first term π and
common difference −e?
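The formula a_n = c + (n − 1)d translates directly into code. The sketch below, with a function name of our own, reproduces Example 33.7 and then prints floating-point approximations of the terms with first term π and common difference −e. The exact terms are π, π − e, π − 2e, and so on; the decimals shown by the code are only rounded values.

```python
import math

def arithmetic_term(c: float, d: float, n: int) -> float:
    """nth term a_n = c + (n - 1)d of the arithmetic sequence, for n >= 1."""
    return c + (n - 1) * d

# Example 33.7: first term 2, common difference 2, so a_n = 2n.
print([arithmetic_term(2, 2, n) for n in range(1, 6)])

# First term pi, common difference -e (floating-point approximations).
print([round(arithmetic_term(math.pi, -math.e, n), 4) for n in range(1, 7)])
```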
                                     1, 2, 4, 8, 16, 32, . . .
                              1, 1/2, 1/4, 1/8, 1/16, . . .
                                  3, −6, 12, −24, 48, . . .
   Answer: Each term in the sequence is a fixed multiple of the previous term. In
the first sequence we multiply each term by 2 to get the next term, in the second
sequence we multiply by 1/2, and in the third sequence we multiply by −2.
a_n = c · r^{n−1}
for n ∈ N is called a geometric sequence with first term c and common ratio r.
Example 33.9. The geometric sequence with first term 4 and common ratio 1/10 is
given by the formula
                              a_n = 4 (1/10)^{n−1}
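The formula a_n = c · r^{n−1} can be evaluated exactly with rational arithmetic. The sketch below, with a function name of our own, lists the first few terms of the sequence in Example 33.9, followed by the terms of the sequence 3, −6, 12, . . . from earlier in this section.

```python
from fractions import Fraction

def geometric_term(c, r, n: int):
    """nth term a_n = c * r**(n - 1) of the geometric sequence, for n >= 1."""
    return c * r ** (n - 1)

# Example 33.9: first term 4, common ratio 1/10.
terms = [geometric_term(4, Fraction(1, 10), n) for n in range(1, 6)]
print([str(t) for t in terms])

# The third sequence listed earlier: first term 3, common ratio -2.
print([geometric_term(3, -2, n) for n in range(1, 6)])
```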
                                    1/1, 1/2, 1/3, 1/4, 1/5, 1/6, . . .
It appears that the limit should be 0; the terms are getting closer and closer to 0.
    We will now give the formal definition of what it means for a sequence to approach
a limit. This definition is quite complicated, so we will explain the true meaning
behind the symbols afterwards.
    We will spend the rest of this section studying this definition and coming to an
understanding of what it means. We begin by peeling off each of the quantifiers.
    What is ε? The first quantifier is “∀ε ∈ R>0 .” The variable ε is used to help us
measure how close the sequence gets to the limit L. We want to be able to prove that
our sequence can get arbitrarily close to the limit L. Thus, we want to prove that our
sequence eventually gets within a distance of 1/100 of the limit, but also eventually
within a distance of 1/1000 of the limit, and eventually within 1/10000, and so forth.
Thus, we do not just take ε = 1/100, we allow it to be any positive constant.
    What is N ? The second quantifier is “∃N ∈ R.” The variable N helps us tell
how far along the sequence we must go, so that after that point the sequence stays
within a distance of ε from the limit.
    For instance, again consider the sequence 1, 1/2, 1/3, . . .. When ε = 1/2, how far
along the sequence do we need to travel until it stays within a distance 1/2 of the
limit L = 0? We see that by the time we reach the second term, all of the rest of
the terms are within a distance of 1/2 from 0. When ε = 1/100, we now must take
N = 100. For even smaller values of ε, we have to take larger values of N so that the
sequence stays that close to the limit.
    Note that the second quantifier is existential. This means that you must fix a
specific value of N (depending on ε) which will satisfy the definition of limit. This
value of N will usually be found in scratch work, outside the proof. This is often the
hardest part of a limit proof, and we will show how this is to be done shortly.
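The dependence of N on ε can be explored numerically. The following Python sketch is only an illustration: the helper name `stays_within` and the finite cutoff `n_max` are assumptions of this sketch, and a finite check of this kind is evidence, not a proof.

```python
from fractions import Fraction

def stays_within(a, L, eps, N, n_max=10_000):
    """Check |a(n) - L| < eps for every natural number n with N < n <= n_max."""
    return all(abs(a(n) - L) < eps for n in range(1, n_max + 1) if n > N)

def a(n):
    # The sequence 1, 1/2, 1/3, ... stored as exact fractions.
    return Fraction(1, n)

# eps = 1/2: past the second term, every checked term is within 1/2 of 0.
print(stays_within(a, 0, Fraction(1, 2), N=2))      # True
# eps = 1/100: N = 100 works, while N = 99 fails because a(100) = 1/100.
print(stays_within(a, 0, Fraction(1, 100), N=100))  # True
print(stays_within(a, 0, Fraction(1, 100), N=99))   # False
```

Exact fractions are used so that borderline comparisons such as 1/100 < 1/100 are not blurred by floating-point rounding.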
    What is n? The third and final quantifier is “∀n ∈ N.” The variable n is just
one of the subscripts in our sequence.
    What does the premise of the implication, n > N , say? The condition
n > N just tells us that we will only look at the terms in the sequence past N . When
looking at limits, we really only care about what eventually happens.
    What does the conclusion of the implication, |an − L| < ε, say? The
condition |an − L| < ε is just an easy way of saying that the nth term of our sequence
is within a distance of ε from the limit L. Equivalently, by removing the absolute
value signs, we may write L − ε < an < L + ε.
    Proving Limits. Every proof of a limit for a sequence will look essentially the
same. First you must deal with each of the quantified variables. The universal (for
all) variables must be left arbitrary. The existential variables must be fixed, but only
248                                   CHAPTER IX. INTRODUCTION TO ANALYSIS
Proof outline.
   Let ε > 0.
   Fix N = found from scratch work ∈ R.
   Let n ∈ N.
   Assume n > N .
   Do some work (usually by reversing the scratch work).
   Conclude |an − L| < ε.
  Theorem 33.12.
\[ \lim_{n\to\infty} \frac{1}{n} = 0. \]
Scratch. This work should not appear on your homework, or in your proof.
   Start with the conclusion |an − L| < ε. We know that an = 1/n and L = 0. So
the inequality becomes |1/n − 0| < ε. In other words |1/n| < ε. Since 1/n is positive,
the absolute value signs disappear, and we have 1/n < ε, or in other words n > 1/ε.
This will be our value for N.
Proof. Let ε > 0. Fix N = 1/ε ∈ R. Let n ∈ N. Assume n > N . Thus, n > 1/ε.
Taking reciprocals (noting that both sides of the inequality are positive), we get
1/n < ε. Hence
                          |an − L| = |1/n − 0| = 1/n < ε
as desired.
\[ \lim_{n\to\infty} a_n = 1. \]
Scratch. Start with |an − L| < ε. Plugging in the values for an and L, we have
\[ \left| \left( 1 - \frac{3}{n} \right) - 1 \right| < \varepsilon. \]
Simplifying we have |−3/n| < ε. Since n > 0 we have |−3/n| = 3/n. Hence, we may
write 3/n < ε. Solving for n, we get n > 3/ε. This is our value for N.
Proof. Let ε > 0. Fix N = 3/ε ∈ R. Let n ∈ N. Assume n > N . Thus n > 3/ε.
Since ε and n are positive, we get 3/n < ε. Thus
\[ |a_n - L| = \left| \left( 1 - \frac{3}{n} \right) - 1 \right| = \left| -\frac{3}{n} \right| = \frac{3}{n} < \varepsilon \]
as desired.
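As a sanity check on the choice N = 3/ε, here is a short Python sketch. It assumes the sequence is a_n = 1 − 3/n, as read off from the scratch work; the function names are hypothetical, and testing finitely many subscripts does not replace the proof.

```python
from fractions import Fraction

def a(n):
    # a_n = 1 - 3/n (assumed from the scratch work).
    return 1 - Fraction(3, n)

def N_for(eps):
    # The value fixed in the proof: N = 3/eps.
    return 3 / eps

for eps in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 1000)):
    N = N_for(eps)
    first_n = int(N) + 1  # smallest natural number strictly greater than N
    # Test the next 1000 subscripts past N.
    ok = all(abs(a(n) - 1) < eps for n in range(first_n, first_n + 1000))
    print(eps, N, ok)
```

Note that the bound is sharp here: for ε = 1/10 the term a_30 is exactly 1/10 away from the limit, so no smaller threshold than N = 30 would do.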
  Warning 33.14. In the previous examples, each term of the sequence is closer
  to the limit L than the previous term. It is tempting to think that this is a
  valid definition of a limit; i.e., the terms get closer and closer to L without ever
  actually reaching L or getting farther away. In the next example, we will see
  that this is not true.
  We have
\[ \lim_{n\to\infty} a_n = 0. \]
    This example is interesting because the sequence actually reaches the limit (every
even numbered term is equal to the limit) and moves away from the limit infinitely
often (each odd numbered term is farther from the limit than the previous term).
However, it does not move too far away from the limit.
Scratch. Start with |an − L| < ε. There are two cases.
   Case 1: Suppose n is odd. Then an = 1/n so our inequality becomes |1/n−0| < ε.
Solving for n, as before, we reach n > 1/ε.
   Case 2: Suppose n is even. Then an = 0 and our inequality becomes |0 − 0| < ε.
This is true no matter the value of n, so in this case any value of N will work.
   We must use an N that works in every case. Thus N = 1/ε should suffice.
   With our scratch work completed, the proof now follows.
Proof. Let ε > 0. Fix N = 1/ε. Let n ∈ N. Assume n > N . We have two cases to
consider.
   Case 1: Suppose n is odd. In this case we have
                       |an − L| = |1/n − 0| = 1/n < 1/N = ε.
   Case 2: Suppose n is even. In this case we have
                             |an − L| = |0 − 0| = 0 < ε.
   Thus, in every case we have |an − L| < ε.
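The behavior described in this example can be observed directly. The sketch below assumes the sequence is a_n = 1/n for odd n and a_n = 0 for even n, as in the two cases of the proof; the variable names are illustrative only.

```python
from fractions import Fraction

def a(n):
    # Odd-numbered terms are 1/n, even-numbered terms are 0.
    return Fraction(1, n) if n % 2 == 1 else Fraction(0)

terms = [a(n) for n in range(1, 9)]
print(terms)

# Subscripts at which the term is farther from the limit 0 than the previous term.
moves_away = [n for n in range(2, 9) if abs(a(n)) > abs(a(n - 1))]
print(moves_away)  # [3, 5, 7]

# The terms still stay within eps of 0 once n > N = 1/eps.
eps = Fraction(1, 50)
N = 1 / eps
print(all(abs(a(n)) < eps for n in range(int(N) + 1, int(N) + 1001)))  # True
```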
33.E      Divergence
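Remember that for a sequence a1 , a2 , a3 , . . . to converge, we have:
\[ \exists L \in \mathbb{R},\ \forall \varepsilon \in \mathbb{R}_{>0},\ \exists N \in \mathbb{R},\ \forall n \in \mathbb{N},\ (n > N \rightarrow |a_n - L| < \varepsilon). \]
Negating this statement, the sequence diverges when:
\[ \forall L \in \mathbb{R},\ \exists \varepsilon \in \mathbb{R}_{>0},\ \forall N \in \mathbb{R},\ \exists n \in \mathbb{N},\ (n > N \wedge |a_n - L| \geq \varepsilon). \]
As you can see, each of the four different quantifiers has been changed.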
    We start a divergence proof by letting L ∈ R be arbitrary. Subsequently, we must
find some ε so that our sequence will continue to have some terms at least ε away
from L. (Thus, ε usually depends on L.) We let N ∈ R be arbitrary, and must find
a subscript n, past N , such that an has distance more than ε from L.
    We will do one example, where ε does not depend on L.
  Proposition 33.16. The sequence (an )n∈N defined by an = (−1)^n is divergent.
Proof. Let L ∈ R. Fix ε = 1/2 ∈ R>0 . Let N ∈ R. To find a term in our sequence
that is at least distance 1/2 from L we consider two cases.
   Case 1: Suppose L ≥ 0. Fix n ∈ N to be the smallest odd number with n > N .
Since n is odd, an = −1. We find
\[ |a_n - L| = |-1 - L| = 1 + L \geq 1 > \varepsilon. \]
   Case 2: Suppose L < 0. In this case we fix n ∈ N to be the smallest even number
with n > N . Since n is even, an = 1. We find
\[ |a_n - L| = |1 - L| = 1 - L > 1 > \varepsilon. \]
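The case analysis in the proof of Proposition 33.16 can be mirrored in code. In this Python sketch the function `witness` is a hypothetical helper that, for given L and N, returns a subscript n > N with |a_n − L| ≥ 1/2, following the two cases. Only a handful of values of L and N are tested, so this illustrates the argument rather than proving it.

```python
import math

def a(n):
    # a_n = (-1)^n
    return (-1) ** n

def witness(L, N):
    """Return a natural number n > N with |a(n) - L| >= 1/2, following the two cases."""
    n = max(1, math.floor(N) + 1)   # smallest natural number greater than N
    want_odd = L >= 0               # Case 1 picks an odd n, Case 2 an even n
    if (n % 2 == 1) != want_odd:
        n += 1
    return n

for L in (-3.0, -0.25, 0.0, 1.0, 7.5):
    for N in (-2.5, 0, 10, 99.9):
        n = witness(L, N)
        assert n > N and abs(a(n) - L) >= 0.5
print("found a witness in every tested case")
```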
Scratch. Once again, we start our scratch work by considering |an − L| < ε, and try
to solve for n. Thus, we want
\[ \left| \frac{n+3}{2n-21} - \frac{1}{2} \right| < \varepsilon, \]
or in other words 27/|4n − 42| < ε. Thus, we reduce to |4n − 42| > 27/ε.
    There are two possibilities here, depending on whether 4n − 42 is positive or
negative. (Note: It is never zero since n ∈ N.) In the positive case we want
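\[ 4n - 42 > \frac{27}{\varepsilon}, \]
which reduces (after some algebra) to
\[ n > \frac{27 + 42\varepsilon}{4\varepsilon}. \tag{33.20} \]
   In the case when 4n − 42 is negative, we want
\[ 42 - 4n > \frac{27}{\varepsilon}, \]
which reduces to
\[ n < \frac{42\varepsilon - 27}{4\varepsilon}. \tag{33.21} \]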
    Inequality (33.21) does not help us because we want to find a lower bound on
n. Thus, we must guarantee that case never happens, or in other words, we must
guarantee that 4n − 42 is positive. This means we want to take 4n − 42 > 0, or
in other words n > 21/2. Combining this with (33.20), we see that we want N =
max(21/2, (27 + 42ε)/(4ε)), so that if n > N then n is greater than both 21/2 and
(27 + 42ε)/(4ε).
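The choice N = max(21/2, (27 + 42ε)/(4ε)) from this scratch work can be tested on a few values of ε. The sketch below takes a_n = (n + 3)/(2n − 21) and L = 1/2 from the scratch work; the function names are hypothetical, and only finitely many subscripts are checked.

```python
from fractions import Fraction

def a(n):
    # a_n = (n + 3) / (2n - 21); the denominator is never zero for natural n.
    return Fraction(n + 3, 2 * n - 21)

def N_for(eps):
    # N = max(21/2, (27 + 42*eps) / (4*eps)), as in the scratch work.
    return max(Fraction(21, 2), (27 + 42 * eps) / (4 * eps))

L = Fraction(1, 2)
for eps in (Fraction(1), Fraction(1, 10), Fraction(1, 500)):
    N = N_for(eps)
    start = int(N) + 1  # smallest natural number strictly greater than N
    ok = all(abs(a(n) - L) < eps for n in range(start, start + 2000))
    print(eps, N, ok)
```

For ε = 1/10 this gives N = 78, and the term a_78 is exactly 1/10 away from 1/2, which shows why the strict inequality n > N matters.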
33.G       Exercises
Exercise 33.1. Write the first six terms, and determine the nth term an , for each of
the following sequences.
  (a) An arithmetic sequence with first term 5 and common difference −3.
 (b) A geometric sequence with first term 4 and common ratio 2.
  (c) An arithmetic sequence with first term 1/2 and common difference 3/4.
 (d) A geometric sequence with first term 3/5 and common ratio 2/3.
Exercise 33.2. Translate the following phrases into symbolic logic. (Your answer
should include things like ∀ε ∈ R>0 , or other quantified variables.)
  (a) The sequence (an )n∈N defined by an = 3 − 4/n converges to L = 3.
 (b) The sequence (an )n∈N defined by an = 6 does not converge to L = 3. (Note:
      This sequence does converge to L = 6.)
Exercise 33.3. Let a, b, x ∈ R. Prove the following.
 (a) max(a, b) ≥ a and max(a, b) ≥ b.
 (b) min(a, b) ≤ a and min(a, b) ≤ b.
 (c) If x > max(a, b) then x > a and x > b.
Exercise 33.4. Prove that
\[ \lim_{n\to\infty} \frac{2}{n^2} = 0. \]
Exercise 33.5. Prove that
\[ \lim_{n\to\infty} \frac{3n-5}{2n+4} = \frac{3}{2}. \]
(Hint: When n ∈ N, then 2n + 4 is always positive, so you don’t have to worry about
when it is negative.)
Exercise 33.6. Prove or disprove: The sequence (an )n∈N defined by an = (n + 1)/n
converges. (Hint: On scratch paper, write out the first ten terms to see if the sequence
is going somewhere.)
Exercise 33.7. Let (an )n∈N be an arithmetic sequence with first term c and common
difference d. Prove that if d = 0, the sequence (an )n∈N converges to c. (In other
words, prove that the constant sequence c, c, c, . . . converges to c.)
Exercise 33.8. Prove that the sequence (an )n∈N defined by an = n does not converge
to L = 3.
Exercise 33.9. Prove that $\lim_{n\to\infty} \left( \sqrt{n^2+1} - n \right) = 0$.
Exercise 33.10. (Harder problem.) Let (an )n∈N be a geometric sequence with first
term c and common ratio r. Prove the following statements.
  (a) If |r| < 1, then an converges to 0.
  (b) If c ≠ 0 and an converges to 0, then |r| < 1.
  (c) If c > 0 and r > 1, then an diverges.
(Feel free to use laws of logarithms, and especially the fact that for 0 < r < 1 we
have ln(r) < 0, and for r > 1 we have ln(r) > 0.)
34       Series
In this section we introduce the idea of adding infinitely many objects together. There
are many applications for these ideas, which lead naturally into the development of
the integral in calculus.
\begin{align*}
s_1 &= a_1 \\
s_2 &= a_1 + a_2 \\
s_3 &= a_1 + a_2 + a_3 \\
s_4 &= a_1 + a_2 + a_3 + a_4 \\
&\ \,\vdots
\end{align*}
which is called the sequence of partial sums. These sums are only the beginning
portion of the series, which is why we call them partial sums. We can also define the
partial sums using the more compact formula
\begin{align*}
s_1 &= a_1 = 1 \\
s_2 &= a_1 + a_2 = 1 + 1 = 2 \\
s_3 &= a_1 + a_2 + a_3 = 1 + 1 + 1 = 3 \\
s_4 &= a_1 + a_2 + a_3 + a_4 = 1 + 1 + 1 + 1 = 4 \\
&\ \,\vdots
\end{align*}
\[ s_{k+1} = s_k + a_{k+1} = s_k + 1 = k + 1, \]
Example 34.3. Consider the sequence (bn )n∈N defined by the rule
\[ b_n = \frac{1}{n} - \frac{1}{n+1}. \]
The first few terms in this sequence are given by
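\begin{align*}
b_1 &= \frac{1}{1} - \frac{1}{2} = \frac{1}{2} \\
b_2 &= \frac{1}{2} - \frac{1}{3} = \frac{1}{6} \\
b_3 &= \frac{1}{3} - \frac{1}{4} = \frac{1}{12} \\
b_4 &= \frac{1}{4} - \frac{1}{5} = \frac{1}{20} \\
&\ \,\vdots
\end{align*}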
\begin{align*}
s_1 &= b_1 = \frac{1}{2} \\
s_2 &= b_1 + b_2 = \frac{2}{3} \\
s_3 &= b_1 + b_2 + b_3 = \frac{3}{4} \\
s_4 &= b_1 + b_2 + b_3 + b_4 = \frac{4}{5} \\
&\ \,\vdots
\end{align*}
as desired.
    Now that we know sn = 1 − 1/(n + 1), we can prove limn→∞ sn = 1. We will not
include the scratch work, but here is the proof.
Proof. Let ε > 0. Fix N = 1/ε − 1 ∈ R. Let n ∈ N. Assume n > N . We find
\[ |s_n - S| = \left| \left( 1 - \frac{1}{n+1} \right) - 1 \right| = \frac{1}{n+1} < \frac{1}{N+1} = \varepsilon \]
as desired.
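The partial sums in Example 34.3 are easy to compute exactly, which is a convenient way to carry out the "compute a few partial sums and conjecture a formula" step. The Python sketch below uses the standard library's `itertools.accumulate`; the variable names are illustrative, and agreement on the first ten terms is a check, not a substitute for the induction proof.

```python
from fractions import Fraction
from itertools import accumulate

def b(n):
    # b_n = 1/n - 1/(n+1)
    return Fraction(1, n) - Fraction(1, n + 1)

# Partial sums s_1, ..., s_10 of the series b_1 + b_2 + ...
partial_sums = list(accumulate(b(n) for n in range(1, 11)))
print(partial_sums[:4])  # [Fraction(1, 2), Fraction(2, 3), Fraction(3, 4), Fraction(4, 5)]

# Compare each partial sum with the closed form s_n = 1 - 1/(n+1).
print(all(s == 1 - Fraction(1, n + 1)
          for n, s in enumerate(partial_sums, start=1)))  # True
```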
  Advice 34.4. To prove that a series converges try the following steps:
   (1) Compute a few partial sums.
   (2) Conjecture a general formula for the partial sums.
   (3) Prove that formula by induction.
   (4) Using your formula for the partial sums, find the limit.
   (5) Finally, prove that the partial sums converge to that limit.
    We will demonstrate how to prove convergence of series with one more example,
leaving most of the work as an exercise.
\[ s_n = 1 - \frac{1}{2^n}. \]
These partial sums converge to 1.
    We now prove a powerful result that can often be used to show that a series does
not converge. We first state and prove the result in terms of convergence, and give
the contrapositive (in terms of divergence) as a corollary. Before reading this proof,
it might be helpful to review the triangle inequality (Theorem 8.21).
  Theorem 34.6. Let (an )n∈N be a sequence of real numbers. If the series
\[ \sum_{i=1}^{\infty} a_i \]
  converges, then
\[ \lim_{n\to\infty} a_n = 0. \]
for the partial sums of the series. The theorem asserts that if limn→∞ sn exists, then
limn→∞ an = 0. Note that for any n > 1, we have sn − sn−1 = an .
    Assume that the series converges. Then the sequence sn of partial sums converges
to some limit L.
    Let ε > 0 be arbitrary. We wish to find an N ∈ R such that for all natural
numbers n > N , we have |an | < ε.
    Since ε > 0, we also have that ε/2 > 0. Hence, since the sequence sn converges,
there is some M ∈ R such that for all natural numbers n > M , we have |sn −L| < ε/2.
Taking N = M + 1, we see that if n > N , then both n and n − 1 are greater than M .
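Hence, for any n > N , the triangle inequality gives
\[ |a_n| = |s_n - s_{n-1}| = |(s_n - L) - (s_{n-1} - L)| \leq |s_n - L| + |s_{n-1} - L| < \varepsilon/2 + \varepsilon/2 = \varepsilon. \]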
We thus see that for this value of N , it is true that for all n > N we have |an − 0| < ε.
Therefore
\[ \lim_{n\to\infty} a_n = 0. \]
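The two facts driving this proof, namely s_n − s_{n−1} = a_n and the triangle inequality estimate, can be checked on a concrete series. The Python sketch below reuses the series from Example 34.3, whose partial sums converge to 1; the function names are hypothetical and only three subscripts are tested.

```python
from fractions import Fraction

def b(n):
    # Terms of the series from Example 34.3.
    return Fraction(1, n) - Fraction(1, n + 1)

def s(n):
    # n-th partial sum, computed directly from the definition.
    return sum(b(k) for k in range(1, n + 1))

L = 1  # limit of the partial sums in Example 34.3
for n in (2, 5, 50):
    # Consecutive partial sums differ by exactly one term.
    assert s(n) - s(n - 1) == b(n)
    # Triangle inequality step used in the proof.
    assert abs(b(n)) <= abs(s(n) - L) + abs(s(n - 1) - L)
    print(n, b(n), abs(s(n) - L) + abs(s(n - 1) - L))
```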
\[ \lim_{n\to\infty} a_n \neq 0 \]
  then
\[ \sum_{i=1}^{\infty} a_i \]
does not exist (see Proposition 33.16), and hence does not equal 0.
Note that the converse of Theorem 34.6 does not hold; it is possible for
\[ \lim_{n\to\infty} a_n = 0 \]
to hold while
\[ \sum_{i=1}^{\infty} a_i \]
34.B        Exercises
Exercise 34.1. Consider the sequence (an )n∈N given by the rule an = n. Find the
first 6 terms of the sequence of partial sums sn . Conjecture a simple formula for sn
and prove it.
Exercise 34.2. Let c, d ∈ R and let (an )n∈N be the arithmetic sequence defined by
an = c + (n − 1)d (i.e., the arithmetic sequence with first term c and common difference
d). Find a formula for the nth partial sum $s_n = \sum_{k=1}^{n} a_k$ and prove it.
Exercise 34.5. (Harder problem.) In this exercise we will show that the harmonic
series $\sum_{k=1}^{\infty} \frac{1}{k}$ does not converge. Throughout the exercise, let
$s_n = \sum_{k=1}^{n} \frac{1}{k}$ be the nth partial sum, for each integer n ≥ 1.
  (a) For each n ≥ 1, define
\[ t_n = \sum_{k=2^{n-1}+1}^{2^n} \frac{1}{k}. \]
     Prove that $t_n \geq \frac{1}{2}$, for each n ≥ 1. (Hint: How many terms are being added?
     What is the smallest one?)
 (b) Show that $s_{2^n} \geq n/2$, for each n ≥ 0, by induction.
     (Hint: For n ≥ 0, we have $s_{2^{n+1}} = s_{2^n} + t_{n+1}$.)
 (c) Now show that the harmonic series does not converge.
35      Limits of functions
Let f : R − {0} → R be defined by
\[ f(x) = \frac{\sin x}{x}. \]
If we evaluate this function for some values of x near 0, we find the following interesting
behavior.
                           x      0.1     0.01     0.001
                         f (x) 0.998334 0.999983 0.999999
It appears that as x gets close to 0, the value of f (x) gets close to 1. Note that we
cannot just plug x = 0 into the function, since that would result in division by 0. A
graph of the function f (x) seems to confirm this behavior.
[Figure: graph of y = f(x) = sin(x)/x for x between −10 and 10.]
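The values of f near 0 can be reproduced with a few lines of Python. This is only a numerical illustration using the standard `math` module; the extra input −0.001 is an addition of this sketch, included to show that the same behavior occurs on both sides of 0.

```python
import math

def f(x):
    # f(x) = sin(x) / x, defined for x != 0.
    return math.sin(x) / x

for x in (0.1, 0.01, 0.001, -0.001):
    print(f"{x:>7}  {f(x):.6f}")
```

Because `:.6f` rounds to six decimal places, the value at x = 0.001 prints as 1.000000, whereas the table above appears to truncate it to 0.999999. Either way, the outputs approach 1.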
35.A      Windows
In the example above, there are two related quantities we focus upon: the input x of
the function, and the output f (x). We are interested in the behavior of the outputs
f (x) as x approaches some fixed constant, which we might call the point of interest.
In the previous example that point of interest is 0, but more generally we use the
letter a ∈ R to describe the place where x is headed. Sometimes we write x → a as
shorthand for the sentence “as x goes towards the point of interest a.”
    Similarly, we use the letter L to denote the limiting value that f (x) approaches
(if any) as x → a. We might write f (x) → L to mean that “f (x) is approaching the
limit L.”
    To formalize all of this, we start (just as with sequences) by letting ε > 0 denote
some (positive, arbitrarily small) quantity that tells us how close our function should
be to the limit L. For instance, in the example above, when ε = 1/4 we want our
function to stay between the two dashed lines in the graph below.
[Figure: graph of f(x) with two dashed horizontal lines.]
[Figure: graph of f(x) with horizontal and vertical lines bounding a window.]
in on the limit L. Notice that if the graph of our function has points between the two
green lines that are also either above the top red line or below the bottom red line
(i.e., directly above or below the window), then we have not placed the vertical lines
correctly.
    We let δ > 0 be a variable which measures (given some ε) how far x can vary
away from the point of interest, and still guarantee that f (x) stays within a distance
of ε from the limit value L. In other words, δ is some constant small enough that the
vertical lines x = a + δ and x = a − δ together with the horizontal lines y = L + ε
and y = L − ε produce a window for our function.
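Whether a given δ produces a valid window for a given ε can be probed by sampling. The Python sketch below is a rough numerical test, not a proof: the helper `window_ok` is hypothetical, it inspects only finitely many sample points, and it uses the function f(x) = sin(x)/x with a = 0 and L = 1 from the opening example.

```python
import math

def f(x):
    return math.sin(x) / x

def window_ok(f, a, L, eps, delta, samples=10_000):
    """Sample x with 0 < |x - a| < delta and test |f(x) - L| < eps at each sample."""
    for k in range(1, samples):
        h = delta * k / samples          # 0 < h < delta
        for x in (a - h, a + h):
            if not abs(f(x) - L) < eps:
                return False
    return True

print(window_ok(f, 0.0, 1.0, eps=0.25, delta=1.0))  # True
print(window_ok(f, 0.0, 1.0, eps=0.25, delta=2.0))  # False
```

With ε = 1/4, the choice δ = 1 keeps every sampled point inside the window, while δ = 2 is too wide because f(x) drops below 0.75 before x reaches 2.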
   We can think of a deleted neighborhood as a set of real numbers that contains all
the points “close to” a (but does not contain a itself).
Example 35.2. Let a = 2. Some deleted neighborhoods of a would include
   We are now equipped to give the formal definition of what we mean by a limit for
functions. Afterwards, we will explain all of the notation.
\[ \lim_{x\to a} f(x) = L \]
to mean that
  In this case we say that the limit of the function f , as x approaches the point
  of interest a, is the real number L.
    What is ε? As before, ε measures how close the function gets to the limit L. It
is allowed to get arbitrarily small.
    What is δ? The second quantifier is “∃δ ∈ R>0 ”. The variable δ measures how
close x must be to the point a in order for our function to stay within ε of the limit.
It must be fixed in terms of ε.
Proof outline.
   Let ε > 0.
   Fix δ = found in scratch work > 0.
   Let x ∈ S.
   Assume 0 < |x − a| < δ.
   Reverse the steps in scratch work.
   Conclude that |f (x) − L| < ε.
\[ \lim_{x\to 2} (3x + 2) = 8. \]
Scratch. Since this is the scratch work, we start with the conclusion |f (x) − L| < ε,
and try to eventually get information about the quantity |x − a| = |x − 2|. We have
Thus we want 3|x − 2| < ε, or in other words |x − 2| < ε/3. This tells us what value
for δ we should use.
   With the scratch work done, we can now give the formal proof.
Proof. Let ε > 0. Fix δ = ε/3 > 0. Let x ∈ R. Assume 0 < |x − 2| < δ. We then find
           |f (x) − L| = |(3x + 2) − 8| = |3x − 6| = 3|x − 2| < 3δ = 3(ε/3) = ε
as desired.
      Limits for other linear polynomials will work similarly.
Example 35.5. We find $\lim_{x\to -1} (-2x + 3)$.
    Set f (x) = −2x + 3. Plugging in the point of interest a = −1, we find f (−1) =
(−2)(−1) + 3 = 5. Thus we would guess the limit is L = 5. We now prove it (without
including our scratch work).
Proof. Let ε > 0. Fix δ = ε/2 > 0. Let x ∈ R. Assume 0 < |x − (−1)| < δ. We find
         |f (x) − L| = |(−2x + 3) − 5| = |−2x − 2| = 2|x + 1| < 2δ = 2(ε/2) = ε
as desired.
   When working with a linear function f (x) = cx + d (for some constants c, d ∈ R
with c ≠ 0) the best value for δ should be δ = ε/|c|. In this example we have c = −2
and indeed, the value for δ was ε/|c|.
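The rule δ = ε/|c| for linear functions can be tried out on the two limits proved above. The following Python sketch is illustrative: `delta_linear` and `check_linear` are hypothetical helper names, and the check samples finitely many points with 0 < |x − a| < δ using exact fractions.

```python
from fractions import Fraction

def delta_linear(c, eps):
    # delta = eps / |c| for f(x) = c*x + d with c != 0.
    return eps / abs(c)

def check_linear(c, d, a, eps, samples=1000):
    """Test |f(x) - L| < eps at sampled x with 0 < |x - a| < delta, where L = c*a + d."""
    delta = delta_linear(c, eps)
    L = c * a + d
    for k in range(1, samples):
        h = delta * Fraction(k, samples)   # 0 < h < delta, exact arithmetic
        for x in (a - h, a + h):
            if not abs((c * x + d) - L) < eps:
                return False
    return True

print(check_linear(3, 2, 2, Fraction(1, 100)))    # f(x) = 3x + 2 near a = 2
print(check_linear(-2, 3, -1, Fraction(1, 100)))  # f(x) = -2x + 3 near a = -1
```

Taking the absolute value of c is what makes the same formula work for the negative slope in Example 35.5.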
   There is one last trick which will help us to evaluate limits. In limit proofs we
have one assumption, namely
                                 0 < |x − a| < δ.
However, we also have some control on which δ we fix. If we guarantee that δ ≤ 1,
then our assumption gives us |x − a| < 1, or in other words
                                    a − 1 < x < a + 1.
This allows us to limit x a lot. (If we need an even smaller interval for x, we can take
δ even smaller.) We will use this idea to find the limit of a quadratic function.
   We will include the scratch work, so you can see how this is done. (Normally, it
should not appear in your proof.)
Scratch. Start with |f (x) − L| < ε. We have f (x) = x² + x and L = 2, and so
plugging in those values we want |x² + x − 2| < ε. If we factor the left side of the
inequality, we get
(35.7)                             |x − 1| · |x + 2| < ε.
    At this point, one might want to take δ = ε/|x + 2|, but δ must not depend on x.
(Why?) To handle this issue, we bound |x + 2| as follows. Assuming δ ≤ 1, we get
|x − 1| < δ ≤ 1, and hence −1 < x − 1 < 1. Thus, by adding 3 throughout, we have
2 < x + 2 < 4. Now 2 < |x + 2| < 4. Using this bound in (35.7), we see that we need
|x − 1| < ε/4.
Proof. Let ε > 0. Fix δ = min(1, ε/4) > 0. Let x ∈ R. Assume that we have
0 < |x − 1| < δ. First, since δ ≤ 1, this tells us |x − 1| < 1, so −1 < x − 1 < 1.
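Therefore 2 < x + 2 < 4, and taking absolute values we get |x + 2| < 4. We then find
\[ |f(x) - L| = |x^2 + x - 2| = |x - 1| \cdot |x + 2| < 4\delta \leq 4(\varepsilon/4) = \varepsilon \]
as desired.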
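The choice δ = min(1, ε/4) for f(x) = x² + x at a = 1 can be checked on sample points, along with the factorization used in the scratch work. This Python sketch is a numerical illustration only; the name `delta_for` is hypothetical.

```python
from fractions import Fraction

def f(x):
    return x * x + x

def delta_for(eps):
    # delta = min(1, eps/4), as fixed in the proof.
    return min(Fraction(1), eps / 4)

a, L = 1, 2
for eps in (Fraction(10), Fraction(1), Fraction(1, 1000)):
    delta = delta_for(eps)
    ok = True
    for k in range(1, 1000):
        h = delta * Fraction(k, 1000)       # 0 < h < delta
        for x in (a - h, a + h):
            # |x^2 + x - 2| factors as |x - 1| * |x + 2|.
            assert abs(f(x) - L) == abs(x - 1) * abs(x + 2)
            ok = ok and abs(f(x) - L) < eps
    print(eps, delta, ok)
```

For large ε (such as ε = 10) the minimum selects δ = 1, which is exactly the situation the bound δ ≤ 1 is meant to handle.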
  Proposition 35.8.
\[ \lim_{x\to 2} \frac{2x+1}{3x+2} = \frac{5}{8}. \]
   Before reading the proof below, try to do the scratch work yourself, and see if it
helps you figure out the choices made in the proof.
Proof. Let ε > 0. Fix δ = min(1, 40ε) > 0. Let x ∈ R−{−2/3}. (Notice that we have
to avoid x = −2/3 since the function is not defined there.) Assume 0 < |x − 2| < δ.
First, since δ ≤ 1, this tells us |x − 2| < 1 hence −1 < x − 2 < 1. Adding 2 we
get 1 < x < 3. Multiplying by 3 and adding 2, we get 5 < 3x + 2 < 11. Taking
reciprocals, we find that 1/11 < 1/(3x + 2) < 1/5. Then taking absolute values, we
have 1/|3x + 2| < 1/5. Hence,
\[ |f(x) - L| = \left| \frac{2x+1}{3x+2} - \frac{5}{8} \right| = \left| \frac{x-2}{8(3x+2)} \right| < \delta/40 \leq (40\varepsilon)/40 = \varepsilon. \]
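The choice δ = min(1, 40ε) in Proposition 35.8 can be tested in the same spirit. The Python sketch below evaluates f(x) = (2x + 1)/(3x + 2) exactly at sample points with 0 < |x − 2| < δ and records the worst deviation from 5/8; the variable names are illustrative and the sampling is finite.

```python
from fractions import Fraction

def f(x):
    return (2 * x + 1) / (3 * x + 2)

L = Fraction(5, 8)
for eps in (Fraction(1), Fraction(1, 40), Fraction(1, 4000)):
    delta = min(Fraction(1), 40 * eps)
    # Largest value of |f(x) - L| over the sampled points on both sides of 2.
    worst = max(
        abs(f(2 + sign * delta * Fraction(k, 500)) - L)
        for k in range(1, 500)
        for sign in (-1, 1)
    )
    print(eps, delta, worst < eps)
```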
  Advice 35.9. When simplifying |f (x) − L| you should expect that |x − a| is one
  of the factors. That can help you simplify the expression (and also gives a quick
  double-check that you have not made an algebra mistake).
35.D      Exercises
Exercise 35.1. Prove that
\[ \lim_{x\to 4} (2x + 3) = 11. \]
\[ \lim_{x\to a} (cx + d) = ca + d. \]
(Hint: Consider the cases in which c = 0 and c ≠ 0, separately. Be sure that your
proof properly handles the case where c < 0.)
\[ \lim_{x\to 5} (x^2 + 3x + 3) = 43. \]
\[ \lim_{x\to 3} (x^3 + x^2 + 2) = 38. \]
36     Continuity
36.A     Defining continuity
Intuitively, continuity for a function f : R → R means that there are no holes or
jumps in the function. Put another way, if we focus on a point of interest a ∈ R, we
need f (a) to be defined, and for x near a we want f (x) to be near f (a). Thus, there
are two separate conditions that combine to give the formal definition of continuity.
Example 36.3. Let f : R → R be the characteristic function of the set [0, 1]. Then
\[ f(x) = \begin{cases} 1 & \text{if } 0 \leq x \leq 1 \\ 0 & \text{otherwise.} \end{cases} \]
One can check that f is continuous for all a ∈ R − {0, 1}. We will show that f is not
continuous at x = 1. (Determining continuity at other points is left as an exercise.)
   Note that f (1) = 1. Then we need to show that
\[ \lim_{x\to 1} f(x) \neq 1. \]
Choose ε = 1/2. Let δ > 0 be arbitrary, and fix x = 1 + δ/2. We notice that
0 < |x − 1| = δ/2 < δ and, since x > 1,
|f (x) − 1| = |0 − 1| = 1 > ε.
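The argument of Example 36.3 can be replayed for a few concrete values of δ. In this Python sketch, which is an illustration and not a proof, each tested δ fails for ε = 1/2 because the point x = 1 + δ/2 lies within δ of 1 yet f(x) = 0.

```python
from fractions import Fraction

def f(x):
    # Characteristic function of the closed interval [0, 1].
    return 1 if 0 <= x <= 1 else 0

eps = Fraction(1, 2)
for delta in (Fraction(1), Fraction(1, 10), Fraction(1, 10**6)):
    x = 1 + delta / 2                 # the point fixed in the argument
    assert 0 < abs(x - 1) < delta     # x is in the deleted delta-neighborhood of 1
    assert abs(f(x) - f(1)) > eps     # yet f(x) is not within eps of f(1) = 1
print("every tested delta fails for eps = 1/2")
```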
(f g)(x) = f (x)g(x).
Example 36.6. Let f (x) = x² and let g(x) = 2x + 1 be functions defined on all real
numbers. Then (f + g)(x) = x² + 2x + 1 and (f g)(x) = x² (2x + 1) are functions
defined on all real numbers. Note that g(x) = 0 when x = −1/2. Hence, (f /g)(x) =
x² /(2x + 1) is a function defined on R − {−1/2}.
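The pointwise operations in Example 36.6 translate directly into code. The Python sketch below builds f + g, fg, and f/g as new functions; the helper names `add`, `mul`, and `div` are hypothetical, and the quotient is simply left undefined (it raises an error) where g(x) = 0.

```python
def add(f, g):
    # (f + g)(x) = f(x) + g(x)
    return lambda x: f(x) + g(x)

def mul(f, g):
    # (fg)(x) = f(x) g(x)
    return lambda x: f(x) * g(x)

def div(f, g):
    # (f/g)(x) = f(x) / g(x); raises ZeroDivisionError where g(x) == 0.
    return lambda x: f(x) / g(x)

f = lambda x: x ** 2
g = lambda x: 2 * x + 1

print(add(f, g)(3))   # 9 + 7 = 16
print(mul(f, g)(3))   # 9 * 7 = 63
print(div(f, g)(3))   # 9 / 7

try:
    div(f, g)(-0.5)   # g(-1/2) = 0, so the quotient is undefined here
except ZeroDivisionError:
    print("f/g is not defined at x = -1/2")
```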
\[ \lim_{x\to a} (f + g)(x) = L + M \]
Proof. Assume that limx→a f (x) = L and limx→a g(x) = M . We wish to show that
limx→a (f + g)(x) = L + M . We will do this by choosing an arbitrary ε and finding a
δ such that for all x ∈ S,
   Let ε > 0. Then ε/2 > 0. Hence, since limx→a f (x) = L, there is some δ1 > 0
such that for all x ∈ S, we have the implication
Similarly, since limx→a g(x) = M , there is some δ2 > 0 such that for all x ∈ S, we
have
   Hence,
\[ \lim_{x\to a} (f + g)(x) = L + M. \]
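The strategy of this proof, splitting ε into two halves and combining the two resulting values of δ, can be illustrated on a concrete pair of functions. The Python sketch below assumes the usual choice δ = min(δ1, δ2) and uses f(x) = 3x + 2 and g(x) = −2x + 3 at a = 2, with δ1 and δ2 taken from the rule δ = ε/|c| for linear functions. These specific functions and the name `delta_sum` are choices of this sketch, not of the text.

```python
from fractions import Fraction

f = lambda x: 3 * x + 2      # limit L = 8 as x -> 2
g = lambda x: -2 * x + 3     # limit M = -1 as x -> 2
a, L, M = 2, 8, -1

def delta_sum(eps):
    delta1 = (eps / 2) / 3   # guarantees |f(x) - L| < eps/2
    delta2 = (eps / 2) / 2   # guarantees |g(x) - M| < eps/2
    return min(delta1, delta2)

eps = Fraction(1, 10)
delta = delta_sum(eps)
for k in range(1, 200):
    h = delta * Fraction(k, 200)          # 0 < h < delta
    for x in (a - h, a + h):
        assert abs(f(x) - L) < eps / 2 and abs(g(x) - M) < eps / 2
        # Triangle inequality: the two halves add up to less than eps.
        assert abs((f(x) + g(x)) - (L + M)) < eps
print(delta)  # 1/60
```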
Proof. We first do the proof under the assumption that one of L and M is not zero.
Without loss of generality, we assume that L ≠ 0.
   Let ε > 0. Then ε/(2|L|) > 0, so, since limx→a g(x) = M , we can find a δ1 > 0 so
that for any x ∈ S, if 0 < |x − a| < δ1 , we have |g(x) − M | < ε/(2|L|). Note that this
conclusion further implies that |g(x)| < |M | + ε/(2|L|).
   Now
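\[ \frac{\varepsilon}{2 \left( \varepsilon/(2|L|) + |M| \right)} > 0, \]
so there is some δ2 so that for 0 < |x − a| < δ2 , we have
\[ |f(x) - L| < \frac{\varepsilon}{2 \left( \varepsilon/(2|L|) + |M| \right)}. \]
    Now, assume that 0 < |x − a| < δ. Then |x − a| < δ1 and |x − a| < δ2 . Hence,
|g(x) − M | < ε/(2|L|) and
\[ |f(x) - L| < \frac{\varepsilon}{2 \left( \varepsilon/(2|L|) + |M| \right)}. \]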
We then have
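\begin{align*}
|(fg)(x) - LM| &= |f(x)g(x) - LM| \\
&= |f(x)g(x) - Lg(x) + Lg(x) - LM| \\
&= |(f(x) - L)g(x) + L(g(x) - M)| \\
&\leq |(f(x) - L)g(x)| + |L(g(x) - M)| \\
&= |f(x) - L||g(x)| + |L||g(x) - M| \\
&< \frac{\varepsilon}{2 \left( \varepsilon/(2|L|) + |M| \right)} \left( |M| + \varepsilon/(2|L|) \right) + |L| \, \varepsilon/(2|L|) \\
&= \varepsilon/2 + \varepsilon/2 \\
&= \varepsilon.
\end{align*}
Hence,
\[ \lim_{x\to a} (fg)(x) = LM. \]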
   The proof in the case that both L and M are equal to 0 is much less complicated,
and is left to the reader (see Exercise 36.3).
   We can also prove that the limit of f /g is the limit of f over the limit of g (provided
that the limit of g is nonzero). We state the theorem here, but do not prove it.
    From the previous three theorems, we can deduce the following result about
continuity.
Proof. We prove the theorem for the sum; the proofs for the product and the quotient
are similar.
    Suppose that f and g are continuous at a. Then
\[ \lim_{x\to a} (f + g)(x) = \left( \lim_{x\to a} f(x) \right) + \left( \lim_{x\to a} g(x) \right) = f(a) + g(a) = (f + g)(a). \]
Hence, f + g is continuous at a.
Remark 36.13. Note that if f and g are continuous at all points in their domain,
then so are their sum and their product. The quotient f /g will be continuous at all
points of S at which g is nonzero.
Proof. We prove the theorem by induction on n, the exponent of the largest power
of x involved in the expression for f .
    Let P (n) be the open sentence
36.E     Exercises
Exercise 36.1. Prove that if a limit exists, the limit is unique.
Exercise 36.3. Prove Theorem 36.10 in the case that both L and M are equal to 0.
\[ f(x) = \frac{g(x)}{h(x)}, \]
where g(x) and h(x) are polynomials. Prove that f is continuous at all points where
it is defined. (You may use Theorem 36.11.)
Index
upper bound, 47
vacuously true, 54
variable, 22
Venn diagram, 9
vertical line test, 187, 191, 197