Foundations of Discrete Mathematics
This book is meant to be more than just a
text in discrete mathematics. It is a
forerunner of another book ‘Applied
Discrete Structures’ by the same author.
The ultimate goal of the two books is to
make a strong case for the inclusion of
discrete mathematics in the undergraduate
curricula of mathematics by creating a
sequence of courses in discrete
mathematics parallel to the traditional
sequence of calculus-based courses.
The present book covers the foundations
of discrete mathematics in seven chapters.
It lays a heavy emphasis on motivation and
attempts clarity without sacrificing rigour. A
list of typical problems is given in the first
chapter. These problems are used
throughout the book to motivate various
concepts. A review of logic is included to
gear the reader into a proper frame of
mind. The basic counting techniques are
covered in Chapters 2 and 7. Those in
Chapter 2 are elementary. But they are
intentionally covered in a formal manner so
as to acquaint the reader with the
traditional ‘definition-theorem-proof’ pattern
of mathematics. Chapter 3 introduces
abstraction and shows how the focal point
of today's mathematics is not numbers but
sets carrying suitable structures. Chapter 4
deals with Boolean algebras and their
applications. Chapters 5 and 6 deal with
more traditional topics in algebra, viz.,
groups, rings, fields, vector spaces and
matrices.
The presentation is elementary and
presupposes no mathematical maturity on
the part of the reader. Instead, comments
are inserted liberally to increase his
maturity. Each chapter has four sections.
Each section is followed by exercises (of
various degrees of difficulty) and by ‘Notes
and Guide to Literature’. Answers to the
exercises are provided at the end of the
book.
Foundations of
DISCRETE MATHEMATICS
K.D. Joshi
Department of Mathematics
Indian Institute of Technology
Powai, Bombay
India
JOHN WILEY & SONS
New York Chichester Brisbane Toronto Singapore
First Published in 1989 by
WILEY EASTERN LIMITED
4835/24 Ansari Road, Daryaganj
New Delhi 110 002, India
Distributors:
Australia and New Zealand:
JACARANDA WILEY LTD
P.O. Box 1226, Milton Qld 4064, Australia
Canada:
JOHN WILEY & SONS CANADA LIMITED
22 Worcester Road, Rexdale, Ontario, Canada
Europe and
JOHN WILEY & SONS LIMITED
Baffins Lane, Chichester, West Sussex, England
South East Asia:
JOHN WILEY & SONS, INC.
05-04, Block B, Union Industrial Building
37 Jalan Pemimpin, Singapore 21157
Africa and South Asia:
WILEY EASTERN LIMITED
4835/24 Ansari Road, Daryaganj
New Delhi 110 002, India
North and South America and rest of the world:
JOHN WILEY & SONS, INC.
605 Third Avenue, New York, NY 10158, USA
Copyright © 1989, WILEY EASTERN LIMITED
New Delhi, India
Library of Congress Cataloging-in-Publication Data
Joshi, K.D.
Foundations of discrete mathematics / K.D. Joshi.
p. cm.
Includes index.
1. Mathematics--1961-   2. Electronic data processing--Mathematics.
3. Combinatorial analysis.   I. Title.
QA39.2.J67 1989        88-14811
510--dc19        CIP
ISBN 0-470-21152-0 John Wiley & Sons, Inc.
ISBN 81-224-0120-1 Wiley Eastern Limited
Printed in India at Rajkamal Electric Press, Delhi, India.
To my discreet wife
SWARADA
for her continuous support
List of Standard Symbols
Symbol Meaning
the empty set
the set of natural numbers (= positive integers)
the set of all integers
the set of residue classes modulo a positive integer m
the set of rational numbers
the set of real numbers
the set of complex numbers
the set of ordered n-tuples of real numbers
the set of complex numbers of absolute value 1
cardinality of a set X
the power set of a set X
$P_r(X)$ the set of all r-subsets of X, i.e. $\{A \subseteq X : |A| = r\}$
‘n factorial’ ($= 1 \cdot 2 \cdot 3 \cdots (n-1)\,n$)
‘n choose m’ $\left(= \dfrac{n(n-1)\cdots(n-m+1)}{m!}\right)$
dihedral group of order 2n
symmetric group on n symbols
alternating group on n symbols
number of partitions of an integer n
number of partitions of a set with n elements.
(= nth Bell number)
nth Fibonacci number
nth Catalan number
nth harmonic number
order of group G
contained in (and possibly equal to)
contained in but not equal to
divides
greatest integer not exceeding x
least integer not less than x
Suggested Course Coverage
This book is intended to be covered in two one-semester courses with 14
weeks of instruction per semester. Because of the diversity of preferences on
the part of the instructors and the diversity of the backgrounds, the calibres
and the needs on the part of students, it is impossible to prescribe a uniform
style or schedule of coverage. However, the following is a pattern which
may be applied under ‘average' conditions.
The core of the text lies in Chapters 2 to 7. The level of the material
and the style of presentation is kept elementary. Also the answers to most
exercises are given at the end of the book. Because of these features, coupled
with some initiation from the instructor, it is expected that an average
sincere student should be able to read and understand most of the material
largely on his own. In that case each section can be covered generally in
one week (assuming 3 to 4 hours of instruction per week). The instructor
is strongly urged not to duplicate the proofs given here but instead to
supplement them with numerical and diagrammatic illustrations (which are
somewhat lacking in this book), with comments about the subtle points in
the proofs and with alternate proofs wherever possible. Depending on the
level of the class, the instructor may also wish to skip some of the exercises
and/or to supplement them with simpler exercises designed to give computa-
tional drill.
The suggested schedule of coverage is as follows :
First Course: Spend about 2 weeks on Chapter 1. Then spend about one
week per section in Chapters 2, 3 and 4.
Second Course: Spend about one week per section in Chapters 5, 6 and 7.
Allow a little extra time for Sections 6.2, 6.3 and 6.4.
Preface
This book is intended to be more than just a textbook of discrete mathematics.
Its ultimate goal is to make a strong case for the incorporation of
discrete mathematics into the basic core curriculum of undergraduate
mathematics. This is a rather ambitious task and deserves some elaboration.
Mathematics can be broadly divided into two parts: the continuous
mathematics and the discrete mathematics, depending upon the presence or
absence of the limiting process. (The distinction is brought out more fully
in Chapter 1.) Discrete mathematics is conceptually easier and more akin
to human experience than continuous mathematics. Ironically, it is these
very qualities which give the impression that discrete mathematics is elementary,
indeed trivial, and does not deserve to be studied at the collegiate
level. So it is the continuous mathematics which has long dominated the
scene. This is also consistent with the history of applications of mathematics.
Apart from the age-old applications of arithmetic, algebra, geometry and
trigonometry, nearly all real-life applications of mathematics till the end of
the nineteenth century were through physics. Even today, what is generally
understood by ‘applied mathematics’ consists of topics such as mechanics
of solids and fluids, heat transfer, electromagnetic theory etc. In the view
of the nineteenth century physics, energy and the other physical variables
were assumed to be continuous. This explains why the concept of a limit
acquired such a paramount position, a natural corollary being the standardisation
of the undergraduate mathematics curriculum so as to include a
sequence of two or three calculus courses, followed by one or two courses
in differential equations, numerical analysis, complex analysis and so on.
There is hardly any room for discrete mathematics in this programme.
What little discrete mathematics is needed is considered mostly a matter
of common sense or something which one just picks up along the way.
This picture began to change in the twentieth century. Applications of
mathematics to physics continued to flourish, especially during the first half
where they got a big boost because of the theory of relativity. But applica-
tions to other fields, such as statistics, operations research, economics, design
of circuits, logic, computer science etc. also got in full swing. An even more
important development was the change that took place in the conception of
mathematics. Its focal point changed from the concept of a number to the
concept of a set (more on this point in Chapter 3). By its very nature this new
non-numerical mathematics was more amenable to the methods of discrete
mathematics than to those of continuous mathematics. This resulted in a
tremendous increase in the real-life applications of discrete mathematics.
Things such as finite fields, which at one time were considered to be too
abstract to have any practical relevance were shown to have down-to-earth
applications (e.g. in designing economical codes).
As a result, the attitude towards discrete mathematics is changing
slowly but surely. The skepticism that once prevailed about its relevance is
waning. Rich research contributions are being made to it. At the pedagogical
level, specialised elective courses catering to various branches of discrete
mathematics such as applied algebra, combinatorics, graph theory, linear
programming have been started. Many books are being written on these
subjects.
Welcome as all these signs are, the undergraduate curricula are yet to
accord discrete mathematics its due place in the mainstream of mathematics.
The ‘core’ courses continue to be dominated, almost exclusively, by
continuous mathematics. Where specialised courses in discrete mathe-
matics such as those mentioned above do exist, they are generally treated
as peripheral, of interest only to certain classes of students (mostly major-
ing outside mathematics). They are rarely considered to be an integral part
of ‘basic’ mathematics. As a result, we have the paradoxical situation that
although in a compulsory calculus course a student is thoroughly drilled
into finding maxima and minima, these methods are rarely used in an actual
problem, whereas the methods that are actually used in practice (e.g. the
simplex method) have to be picked up from specialised, ‘elective’ courses!
Ironically, again, the very strengths of discrete mathematics have come
in the way of its entry into the mathematics core. One of the strong points
of discrete mathematics is its powerful applications to fields like computer
science, engineering and operations research. Indeed, a good deal of discrete
mathematics owes its development to problems in some of these areas. In
this respect discrete mathematics behaves no differently from continuous
mathematics. Many concepts of the classical continuous mathematics
also originated from applications to physical problems. However, in the case
of continuous mathematics, the process of isolating the underlying mathematical
thought from a particular application began a long time ago.
As a result, it is possible today to give coherent courses in continuous
mathematics (already named above), in which, although there are sufficient
hints about the applicability of the various concepts, the emphasis is on
the mathematical aspects of those concepts and not on any particular
applications.
In the case of discrete mathematics, there is an inherent difficulty in
separating the underlying mathematics from its applications. It is a fact that
many results of discrete mathematics, when stripped of their applications,
appear either too trivial or too abstract. In either case, their introduction
into the core courses in mathematics is not looked at favourably by the
pedagogists of mathematics. A good case in point is the well-known pigeon-
hole principle or the double counting argument or some results in graph
theory. When ingeniously applied, they work wonders. But when seen all
by themselves, they appear so trivial that one wonders if they deserve even
to be stated explicitly. As examples of the other kind, we have the theories
of groups and fields. Both are replete with profound results. But without
some down-to-earth applications (such as Polya’s theory of counting or
coding theory), they are likely to be disposed of as sterile intellectual
exercises. Many mathematicians are either unaware of these applications
or tend to dispose of them as too esoteric. Consequently,
although both the group theory and the field theory are highly respected
branches of graduate mathematics, they are still not considered to be parts
of ‘everybody’s mathematics’ the way double integrals and differential
equations are.
Because discrete mathematics is so wedded to its own applications, it is
not surprising that most of the currently available books on discrete
mathematics are applications-oriented. As a rule, they are written for
non-mathematicians such as computer scientists or engineers. In such books
mathematics is only a means and not the goal and often gets a treatment
which is at best utilitarian. As a result, even though some of these books
are widely acclaimed outside the mathematical circles, they often fail to
appeal to a traditional mathematician. He is not readily convinced that
their mathematical contents deserve to be studied mathematically regardless
of their particular applications.
Another strong point of discrete mathematics is its infinite variety of
interesting problems. Recently a number of books of the ‘problems and
solutions’ type have appeared on discrete mathematics, especially on
combinatorics and graph theory. Such books are an intellectual treat for
those who love to solve problems. They are also excellent sources of refer-
ences. In books of this kind, it is possible to pack within a few pages a huge
amount of information which, in a conventionally written book, would
probably occupy several times as much space. A mature reader, who is in
a position to supply the missing details, can learn a lot from such books.
Naturally, these books lack coherence and cannot be used as texts in regular
courses. There is very little motivation or elaboration of the ideas involved.
If not taken in the proper spirit, these books are liable to give the impression
that discrete mathematics is a scattered bunch of intellectual puzzles rather
than a coherent subject capable of a systematic study.
In short, the dilemma of discrete mathematics is that on one hand, in
order to firmly establish it as a fundamental branch of mathematics, it must
be delinked from problems and applications of a particular kind. But on the
other hand, in doing so there is a danger that the very heart of the subject
may be lost.
In the present book, I have made an attempt to find a way out of this
dilemma. Keeping in mind the role of problems in discrete mathematics, in
the first chapter I have given a fairly long list of some typical problems
and a few comments about them. Their solutions are, however, intentionally
postponed. Instead, these problems are used from time to time to provide
motivation for the various stages of development of the theory. The theory
itself is developed systematically, in the traditional definition-theorem-proof
pattern so characteristic of mathematics. Following the practice of
modern mathematics I have taken a set as a starting point. The general theme
is to show how a suitable set empowered with a suitable additional structure
provides a convenient mathematical model for a given problem. I am
aware that the average reader of this book may not be mathematically
mature. I have therefore made a conscious effort to develop his mathematical
maturity. This is done through comments about the nature of
discrete mathematics, its place in mathematics as a whole and its relation-
ship with continuous mathematics, a review of mathematical logic, and
through general comments about the process of abstraction. A special
emphasis is laid on motivating the definitions and results.
The applications of discrete mathematics have not been entirely ignored.
In fact, I have given quite a few of them. However, as indicated above,
the emphasis is more on the applicability of discrete structures rather than
on any particular applications. So I have generally avoided applications of
a technical nature which would demand a knowledge of some other fields,
such as computer science or economics. In the few places where technical
applications do occur (e.g. the applications of Boolean algebra to switching
networks), the necessary background has been provided. For all other
applications, I have chosen real-life problems which can be understood and
appreciated even by a layman. (Admittedly, some of these problems appear
rather contrived.) One can of course garb a problem, at least superficially,
so that it looks like a problem in some other field. For example, in a
problem of putting balls into boxes we can think of a ball as a piece of
data and a box as a memory register and the problem now becomes a
problem in computer science! Whatever be the selling value of such tricks,
I have generally refrained from them.
Inasmuch as the introduction of discrete mathematics into the undergraduate
curricula is still in its infancy, the contents and the degree of
coverage of the topics in a book like this are open to debate. Although
there are dozens of different texts on calculus, they more or less cover the
same topics and to the same degree of depth. A similar standardisation is
also slowly taking place in textbooks covering particular aspects of discrete
mathematics such as combinatorics, graph theory or applied algebra. The
present book, however, is meant to give a unified rather than a piece-meal
treatment of discrete mathematics. Naturally, I could not go as deep as a
book specialising in any one area. Still I have attempted to reach a reason-
able degree of depth.
The original plan was to include all the material in a single book
‘Introduction to Discrete Structures’. But this proved impracticable in view
of the size. So it was necessary to split it into two separate books, ‘Foundations
of Discrete Mathematics’ and ‘Applied Discrete Structures’. The first
book, the present one, gives the fundamental concepts and techniques of
discrete mathematics and a fairly thorough exposure to algebra. The second
book is more applications oriented. (See the Epilogue for a detailed
preview of ‘Applied Discrete Structures’.) It contains a review of the first
book, with the help of which it can be read independently of the first book.
The present book has seven chapters. The first chapter introduces the
subject matter. It also contains the list of problems mentioned earlier which
are nicknamed for a ready reference in future. The results of the second
chapter are elementary. Still, they are treated rather formally, so as to
familiarise the reader with the style of presentation to be encountered in
the later chapters. The third chapter introduces the process of abstraction,
studies two elementary structures on sets and gives the generalities about
algebraic structures. The next three chapters deal with specific algebraic
structures. The last chapter presents advanced counting techniques based
on generating functions and recurrence relations.
Each chapter is divided into four sections. Each section contains exercis-
es followed by ‘Notes and Guide to Literature’. The latter are generally
intended to direct the interested reader to appropriate references for further
reading or to acknowledge credit to the sources from where I have borrowed
something. A few historical remarks are also made occasionally.
However, such remarks and references are only indicative. No claim is
made about their being complete or most up-to-date.
There are virtually no prerequisites for reading this book. Some facts
about power series and differential equations will be referred to in Chapter 7.
But that is more by way of relating discrete mathematics with conti-
nuous mathematics rather than a strict pre-requisite. Although this book
is avowedly written to make a case for discrete mathematics, the idea is
definitely not to belittle continuous mathematics. A great majority of the
readers of this book will have already studied calculus or at least be studying
it concurrently. Consequently, I have occasionally digressed into
comments about continuous mathematics as well. While a mature reader
may find them platitudinous, I hope they will help the average reader gain
new insights into the nature of continuous mathematics and thereby
appreciate discrete mathematics even more.
Like most authors of textbooks, I had to strike a balance between
pairs of mutually conflicting virtues such as expanse versus depth, clarity
versus brevity and abstract versus concrete. The objectives of including
almost all standard topics, of reaching a reasonable depth and of building
up the mathematical maturity of the reader so that he can handle abstract
concepts soon proved to be somewhat incompatible with each other and
threatened to blow the size of the book out of proportion even after it was
split into two. As a result, I was forced to make a few unpleasant choices.
Some of the standard results had to be relegated to the exercises. It is hoped
that with the generous hints given, the reader can work them out. (Answers
to most of the exercises are given at the end of the book.) Also the diagram-
matic and the numerical illustrations have been kept to a minimum. The
emphasis is more on thoughts and less on numerical dexterity. I am, of
course, not unaware of the pedagogical importance of diagrams and worked-
out examples. Had space permitted, I would have loved to include more
of them. Another reason is that there already exist books which cater to
these aspects very nicely. For the algebraic part we recommend William
J. Gilbert’s ‘Modern Algebra with Applications’ and for the combinatorial
part, Alan Tucker’s ‘Applied Combinatorics’.
Exercises form an integral part of the book. The results of many of
them are used freely in the text. There are virtually no exercises whose sole
purpose is to provide numerical drill. Nearly all exercises require some
thinking for their solution, the degree and the quality of which obviously
vary considerably. Some of the exercises merely ask the reader to supply
parts of a proof (occasionally an entire proof). A few require the applica-
tion of the results proved in the text while a few others are meant to prove
some standard results which could not be incorporated in the text. Hints
for solution and comments about the significance are given liberally. The
degree of difficulty of a problem, especially one where some thought is
involved, is always a matter of personal opinion. Many challenging problems
look deceptively simple once you know their solutions. It is, therefore, very
difficult to rank the exercises quantitatively in terms of their difficulty and
to give the estimated time for working them out as is done by Donald
Knuth in his pioneering volumes of ‘The Art of Computer Programming’.
My own experience with such a time scale is, in fact, that when I could
not do a problem within the stipulated time, it unnecessarily created an
inferiority complex. I have therefore refrained from giving any quantitative
assessment of the difficulty of a problem. Still, some qualitative indication
seems to be in order. So I have put a star (*) over those problems which,
in my opinion, require a little originality of thought. Unusually demanding
exercises are doubly starred (**). These include some standard theorems
whose proofs are far from simple and also a few problems which, to my
knowledge, are unsolved. These exercises are not really meant to be solved,
even by the highly gifted student. He is merely expected to appreciate their
difficulty. Here I recall, with full agreement, a comment by I.N. Herstein in
his ‘Topics in Algebra’ that the value of a problem is not so much in
coming up with the answer as in the ideas and the attempted ideas it forces
on the would-be solver.
The material is so arranged that it can be covered in two courses of
one semester each. (See the ‘Suggested Course Coverage’ for more details
about this.) In fact, it is hoped that the present book, along with ‘Applied
Discrete Structures’ would eventually be used as texts for a sequence of
four one-semester courses in discrete mathematics. Such a sequence, coupled
with the traditional sequence in continuous mathematics, would provide a
college student with a solid mathematical background, regardless of what
he specialises in later on.
It was mostly by accident that I was prompted to write this book. My
own mathematical training did not include discrete mathematics and like
many others in my position I thought of it as something which is possibly
useful but inherently trivial. Then I happened to read C.L. Liu’s fine
book ‘Introduction to Combinatorial Mathematics’ and used it as a text for
a course to students of computer science. In doing so, I realised that the
mathematics in it was important enough to deserve a place in the main-
stream of mathematics. Other books which have influenced me profoundly
are those of Knuth and Herstein, mentioned above. The latter's influence
extends not only to the coverage of algebra in this book, but also to my
style in general. I am deeply indebted to these three authors.
My own expertise in discrete structures being limited to algebra, I
frequently had to consult others on many points. I am grateful to many
colleagues of mine who helped me by informal discussions and by pointing
out appropriate references. I must especially mention Prof. G.A. Patwardhan
and Prof. M.N. Vartak. If, despite their best help, any errors have
occurred, I am entirely responsible.
Financial support for the preparation of the manuscript of this book
was given by the Curriculum Development Cell at the Indian Institute of
Technology, Bombay and is hereby thankfully acknowledged. I also thank
Mr. Parameswaran for his sincerity and patience in typing the manuscript.
The introduction of discrete mathematics in the core programme is
still in the experimental stage and not yet fully implemented. I shall, there-
fore, be most interested in the views, comments and suggestions, not
only from mathematicians who may be teaching it but from others as well.
I hope that through such a dialogue a standard syllabus of discrete mathematics
will evolve in the near future. The twentieth century is coming to an end
and the talk of orienting ourselves for the twenty-first century has become
a fashion of the day. Let us hope that before the dawn of the twenty-first
century, the undergraduate mathematics curriculum is freed, at least parti-
ally, from the tight grip of the nineteenth century physics.
Bombay, K.D. JOSHI
February 20, 1989
Preface to the Reprint
I have added Exercise (7.4.32) and corrected a number of misprints
and minor errors. I thank all those, and particularly Prof. V.G. Tikekar,
who brought them to my attention.
Bombay K.D. JOSHI
August 22, 1991
Contents
List of Standard Symbols
Suggested Course Coverage
Preface
1. Introduction and Preliminaries
1.1 What is Discrete Mathematics?
1.2 Typical Problems
1.3 Comments about Typical Problems
1.4 Review of Logic
2. Elementary Counting Techniques
2.1 Sets and Functions
2.2 Cardinalities of Sets
2.3 Applications to Counting Problems
2.4 Principle of Inclusion and Exclusion
3. Sets with Additional Structures
3.1 Abstraction and Mathematical Structures
3.2 Binary Relations on Sets
3.3 Order Relations
3.4 Algebraic Structures
4. Boolean Algebras
4.1 Definition and Properties
4.2 Boolean Functions
4.3 Applications to Switching Networks
4.4 Applications to Logic
5. Group Theory
5.1 Groups and Subgroups
5.2 Cosets of Subgroups
5.3 Group Homomorphisms
5.4 Permutation Groups
6. Rings, Fields and Vector Spaces
6.1 Basic Concepts and Examples
6.2 Special Types of Integral Domains
6.3 Vector Spaces
6.4 Matrices and Determinants
7. Advanced Counting Techniques
7.1 Generating Functions of Sequences
7.2 Application to Enumeration Problems
7.3 Recurrence Relations
7.4 Applications of Recurrence Relations
Epilogue: Preview of ‘Applied Discrete Structures’
References
Answers to Exercises
Index
One
Introduction and Preliminaries
This chapter is meant to give an idea of the kind of things that would
be discussed in this book and also to review the prerequisites needed for
their understanding. The spirit of discrete mathematics, especially in comparison
to the continuous mathematics, is discussed in Section 1. A rather
long list of problems is given in Section 2. The comments about them in
Section 3 guide the reader to the chapters in which the machinery needed
to solve the problems will be developed. Section 4 gives a warm-up in
logic.
1. What is Discrete Mathematics?
Discrete mathematics began at least as early as man (or perhaps some
other animals) learnt to count. The fundamental idea behind counting is to
establish a one-to-one correspondence (or a bijection, as it is technically
called) between two sets of objects and even today this continues to be one
of the most widely used devices in discrete mathematics. Nearly all the
mathematics that we pick up in early school comes under discrete mathematics.
This includes the addition, multiplication and other arithmetical
operations we do with integers and rational numbers. The fairly interesting
topics of permutations and combinations and related problems in probability
are an important part of discrete mathematics.
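The idea of counting by a one-to-one correspondence can be made concrete with a small sketch. The function name `count_by_pairing` and the sample data are illustrative assumptions, not notation from the book: each object is paired with the next unused label 1, 2, 3, ..., and the number of pairs formed is the size of the set. The second half enumerates arrangements and selections of two letters out of five and compares the totals with the usual formulas.

```python
from itertools import combinations, permutations
from math import comb, factorial


def count_by_pairing(objects, labels):
    """Count `objects` by pairing each one with the next unused label.

    The pairing is a one-to-one correspondence between the objects and an
    initial segment of the labels, so the number of pairs is the count.
    """
    pairing = dict(zip(labels, objects))  # label -> object, no label reused
    return len(pairing), pairing


# Hypothetical data: four apples counted against the labels 1, 2, 3, ...
apples = ["apple_a", "apple_b", "apple_c", "apple_d"]
size, pairing = count_by_pairing(apples, range(1, 100))
print(size, pairing)

# Permutations and combinations of 2 letters chosen from 5.
letters = "abcde"
r_subsets = list(combinations(letters, 2))     # order ignored
arrangements = list(permutations(letters, 2))  # order matters

# The enumerated totals agree with n!/(n-r)! and 'n choose r'.
assert len(arrangements) == factorial(5) // factorial(3)
assert len(r_subsets) == comb(5, 2)
```

Listing the objects explicitly is feasible only for small sets; the formulas checked at the end are what make counting possible without enumeration.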
Thus ‘discrete mathematics’ is far from an innovation in the history
of mathematics. Perhaps the only new thing about it is its name. The
dictionary meaning of the word ‘discrete’ is ‘separate and distinct, unrelated,
made up of distinct parts’. But this only remotely reflects the way the word
is used in mathematics. In fact, the best way to understand the spirit of
discrete mathematics is by comparing it with non-discrete mathematics! The
latter is more popularly known as the continuous or continuum mathematics.
Let us, therefore, take a look at what continuous mathematics is and how
it came into being.
The earliest (and, till fairly recently, all) applications of mathematics
to real life involved the process of assigning numbers to describe the sizes
of various sets. The various acts to which these sets are subjected often
naturally correspond to various mathematical operations on the numbers
representing them. For example, suppose two bags of apples are to be
emptied into a third, larger, bag (which is originally empty) and we want
to find how many apples the third bag would contain. We can do so by
simply adding the sizes (that is, the numbers of apples) of the two bags.
Similarly cutting a piece of a wire into two equal parts amounts to dividing
its length (or rather, the number which represents its length) by two.
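The correspondence between acts on sets and operations on numbers can be sketched in a few lines. The apple names and the wire length of 5 units are made-up values for illustration only. Emptying two bags into a third is modelled as a union of disjoint sets, and halving the wire as division of an exact rational number.

```python
from fractions import Fraction

# Two bags of apples, modelled as disjoint sets (the apple names are made up).
bag_one = {"a1", "a2", "a3"}
bag_two = {"b1", "b2"}

# Emptying both bags into a third, empty bag is a set union ...
third_bag = bag_one | bag_two
# ... and, because the bags share no apple, the sizes simply add.
assert len(third_bag) == len(bag_one) + len(bag_two)

# A wire whose length is 5 chosen units, cut into two equal parts.
wire_length = Fraction(5)
half = wire_length / 2
print(len(third_bag), half)

# The size of the third bag is again a whole number, but the length of
# each half of the wire is not.
assert half.denominator != 1
```

The addition rule relies on the two bags having no apple in common; for overlapping sets the size of the union would be smaller than the sum of the sizes.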
The crucial question now is, which numbers should be assigned to the
sets under study? Should they be natural numbers, integers (including
negative integers as well as zero), rational numbers, real numbers or com-
plex numbers? Moreover, how do we assign them? The answer depends
upon the nature of the objects of study and also upon the type of study.
In the two examples given above, we regard each apple as an individual
unit. We ignore the differences in the sizes of the apples. The numbers
assigned to the two bags containing them are whole numbers, that is,
positive integers. On the other hand, in the case of a piece of a wire there
is no natural unit. We arbitrarily choose some unit of length (such as an
inch, a centimeter etc.) and assign to the wire the number by which this
unit must be multiplied so as to get a segment of the same length as the
given piece. This number need not be an integer and even when it is so, the
number that comes out as the answer (namely, the length of each part of
the wire after it is divided equally into two) need not be an integer. Thus,
integers are adequate to handle the first problem but not the second. It is
tempting to try to salvage the situation in the second problem by choosing
the unit of length so as to make the answer come out as a whole number.
But this trick may not work when more than one line segment is involved
simultaneously. For instance, if one of the segments represents the side
of a square and the other its diagonal, then it can be shown that it is
impossible to find a common unit of length which would make the lengths
of both integers, or even rational numbers. Technically, this is expressed
by saying that the side and the diagonal of a square are not commensurate.
As a more poignant example, the diameter and the circumference of a circle
are not commensurate with each other (although this fact is extremely non-
trivial to prove).
The difference between discrete and continuous mathematics is basically
the difference between a bag of apples and a piece of wire.* In the former,
the apples sit apart discretely from each other while in the latter, the
points on a wire spread themselves continuously from one end to the other.
Integers are adequate to handle the first but not the second. Given any posi-
tive real number α, we could conceive of a piece of wire whose length is α
times a fixed unit. Thus, in continuous mathematics, all real numbers must be
allowed as possible representatives of some real-life objects (hence the name
‘real’ numbers). Therefore the length of an object is called a continuous
variable. What applies to length applies equally well to many other quanti-
ties such as time and mass and hence also to all quantities derived from
them such as area, volume, density, speed, momentum, energy and so on.
All these are continuous variables. At least that was the view that domina-
ted mechanics till fairly recently; that is, till the quantum theory came on
the scene. The term ‘classical mechanics’ is now synonymous with ‘conti-
nuum mechanics’, ‘continuum’ being the name given to the real number
system.

*Other similes are also possible. An analog computer corresponds to continuous
mathematics while a digital one corresponds to discrete mathematics. An expert in Company
Law would probably regard the stock of a company as a continuous variable and its
share capital as a discrete variable!
The problems as well as the methods of solution in continuous mathe-
matics differ fundamentally from those in discrete mathematics. The
difference, in fact, starts right from the terminology used. In discrete
mathematics we ‘count’ the number of objects while in continuous mathe-
matics we ‘measure’ their sizes. Many concepts in either branch become
meaningless or at least inapplicable when carried to the other. There is,
nevertheless, an interplay between the two. A problem in continuous mathe-
matics can often be thought of as the limiting case of a similar problem in
discrete mathematics. Put another way, methods of discrete mathematics
can often be used to provide approximate solutions to problems in conti-
nuous mathematics.
We proceed to illustrate these remarks with a simple problem, which
will be referred to as the ‘House Problem’. Let us suppose there is a straight
road, one kilometer long, on which there is a house at every one-tenth of a
kilometer. There would thus be eleven houses on the road (including the
houses at the two ends of it). Suppose we pick two (distinct) houses at
random. What is the probability that the distance between them will not
exceed, say, one half kilometer? Because the houses are discretely located,
this is a typical problem in discrete mathematics. Let us number the houses
along the road by integers from 0 to 10. If x and y are two houses (with
$y \geq x$) then the distance between them is simply $\frac{1}{10}(y - x)$ kilometers.
It is tempting to think that the answer to our problem is 1/2. But this
is clearly wrong because there are more pairs of houses that are closely
located than those that are remotely located. It is easy to see that there are
in all $\frac{1}{2}(11 \cdot 10) = 55$ pairs of distinct houses. This is, therefore, the total
number of cases. As with all probability problems, we have to proceed on
the assumption that each of these 55 cases is equally likely. (This is, in fact,
the interpretation given to the phrase ‘at random'.) The next step in our
solution is to count the so-called ‘number of favourable cases’. For con-
venience, whenever we consider a pair of houses, say, (x, y), let us always
suppose that $x \leq y$. We then want the number of such pairs with $x \neq y$
(since the two houses are distinct) and for which $\frac{1}{10}(y - x) \leq \frac{1}{2}$. Here x
can take any of the eleven values from 0 to 10. When x is 0, y has to be
between 1 and 5 (both included). When x is 1, y can vary from 2 to 6. This
will go on till x is 5. Thereafter the variation of y will be restricted in its
upper bound. For example, when x is 7, y has to be either 8, 9 or 10 (see Figure
1.1). Keeping track of the number of possible values y can take for various
values of x and adding, we get the number of favourable cases as
5 + 5 + 5 + 5 + 5 + 5 + 4 + 3 + 2 + 1 + 0, that is, 40. Thus the answer to our problem,
namely the probability that the distance between two randomly selected
distinct houses be less than or equal to 1/2 kilometer, is 40/55 or 8/11.
[Figure 1.1: The discrete house problem]
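The count above can be checked by brute force. The following is only a sketch: the function name `discrete_house_probability` and its parameters are illustrative assumptions and do not come from the text. It enumerates all pairs of distinct houses and uses exact fractions, so no rounding is involved.

```python
from fractions import Fraction
from itertools import combinations

def discrete_house_probability(n, max_gap=Fraction(1, 2)):
    """Probability that two distinct houses, picked at random from the
    n + 1 houses at 0, 1/n, ..., 1, are at most max_gap kilometres apart."""
    pairs = list(combinations(range(n + 1), 2))   # all pairs with x < y
    favourable = sum(1 for x, y in pairs if Fraction(y - x, n) <= max_gap)
    return Fraction(favourable, len(pairs))

print(discrete_house_probability(10))   # 8/11
```

With n = 10 there are 55 pairs in all, 40 of which are favourable, in agreement with the hand count.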
Let us now try the continuous version of this problem. Let us suppose
that our one kilometer long road is literally packed with houses in the
extreme; that is, there is a house at every point of it. (This may sound like
an unrealistic problem, but if the road happens to be in a city like Bombay,
it is awfully close to reality!) We now ask the same question, namely, if
two distinct houses are picked at random, what is the probability that the
distance between them does not exceed 1/2 kilometer? The method of
solution to the earlier problem breaks down completely, because the total
number of cases and the number of favourable cases are both infinite now
and so the answer would be ∞/∞, which is meaningless. How do we tackle
the problem then?
There are two methods for doing this. One is to regard the continuous
version as a limiting case of the discrete version. Let us, therefore, revisit
the discrete house problem, assuming this time that there is a house at
every 1/n-th kilometer where n is a positive integer. (In the problem that we
just solved n was 10.) Let $p_n$ be the probability that the distance between
two distinct, randomly selected houses is at most 1/2. Now as n tends to
infinity the distance between consecutive houses tends to zero, the number
of houses tends to infinity and the problem becomes more and more like the
problem we are trying to solve. It is therefore reasonable to expect that if we
solve the problem for each n, that is, if we compute $p_n$ for every n and then
take the limit of the sequence $\{p_n\}$ as n tends to infinity, then this limit would
be the answer to our problem. Even if we are not able to actually evaluate
this limit (a fairly common situation with sequences), still, for sufficiently
large n, $p_n$ would at least give us an approximate answer.
If we take this approach, then, in order to compute $p_n$ we have to make
two cases, depending upon whether n is even or odd. (It will be a good
exercise for the reader to pin-point the reason for this.) If n is even, say
$n = 2k$ where k is a positive integer, then it is easy to show, by reasoning
similar to above, that $p_n = \frac{3k+1}{4k+2}$, while for n odd, say $n = 2k + 1$, $p_n$
comes out to be $\frac{3k}{4k+2}$. As $k \to \infty$ both these expressions tend to a com-
mon limit, namely 3/4. Thus $p_n$ converges to 3/4 and this is the required
probability.
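The limiting argument can be illustrated numerically. The sketch below assumes the closed forms (3k + 1)/(4k + 2) for n = 2k and 3k/(4k + 2) for n = 2k + 1, and checks them against a direct count; the function name `p` is an illustrative choice. It relies on the observation that, among the n + 1 houses, exactly n + 1 - d pairs are d steps apart.

```python
from fractions import Fraction

def p(n):
    """Exact probability for houses at every 1/n-th kilometre (n + 1 houses)."""
    total = Fraction(n * (n + 1), 2)              # number of pairs of distinct houses
    # A pair d steps apart is favourable when d/n <= 1/2, that is, 2d <= n.
    favourable = sum(n + 1 - d for d in range(1, n + 1) if 2 * d <= n)
    return favourable / total

for k in (1, 5, 50, 500):
    assert p(2 * k) == Fraction(3 * k + 1, 4 * k + 2)      # n even
    assert p(2 * k + 1) == Fraction(3 * k, 4 * k + 2)      # n odd
    print(k, float(p(2 * k)), float(p(2 * k + 1)))
```

The printed values approach 0.75 from above in the even case and from below in the odd case.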
There is also another method to solve the continuous house problem
which does not involve its approximation by the discrete version. We begin
by taking a new look at the solution given above for our original problem,
with eleven houses on the road. In Figure 1.1, we pictured them with dots
on a straight line. Let us now picture the ordered pairs of houses by dots
in a square as in Figure 1.2. We need not explain in detail how this is
done, because the idea is precisely that of Cartesian coordinates. For exam-
ple, the point (3/10, 8/10) (the circled point in the figure) represents the pair
consisting of the houses numbered 3 and 8. Because of our convention
that the house with a smaller number will be listed first, we have drawn
only the upper triangular half of the full square. There are in all 66 dots
in this triangle. Since we want pairs of distinct houses, we ignore the dots
on the diagonal. This leaves 55 dots, the same as the total number of cases.
The favourable cases correspond to the dots on and below the line $y = x + \frac{1}{2}$.
They are shown by enclosing them with a curve C. Their count comes out
to be 40, exactly as before.

[Figure 1.2: ... the solution to the discrete house problem]
So far we have not done anything new. We have merely given a geo-
metric interpretation to our earlier solution. But in this new formulation,
the solution can be easily adapted for the continuous house problem. Once
again, we picture pairs of houses by points in a square and consider only
the upper triangular half of it as in Figure 1.3. This corresponds to the set
of all possible cases. The set of favourable cases corresponds to the trape-
zoidal region between the parallel lines y = x and $y = x + \frac{1}{2}$. Both the sets
contain infinitely many points. So we cannot compare them on that ground.
But there is another way to compare them. We simply take their areas!
The area of the upper triangle is 1/2 square kilometers while that of the
trapezium is 3/8 square kilometers. By taking the ratio, we get 3/4 as the
required probability. This is of course the same as the answer obtained by
the first method. Note incidentally, that in the continuous version it does
not matter whether the two houses picked are distinct or not. Points corres-
ponding to pairs of identical houses lie on the line y = x. The line itself
has no area and so the area of the trapezium is unaffected whether points
of the boundary are included in it or not.
[Figure 1.3: Solution to the continuous house problem. The favourable region lies between the lines y = x and y = x + 1/2.]
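The two areas quoted above can be worked out as follows. This is a sketch of the arithmetic only; it assumes the unit square has corners (0,0), (1,0), (1,1) and (0,1), so that the part of the upper triangle lying above the line y = x + 1/2 is a right triangle with both legs equal to 1/2.

```latex
\[
\text{Area(upper triangle)} = \tfrac{1}{2}\cdot 1\cdot 1 = \tfrac{1}{2},
\qquad
\text{Area(corner above } y = x + \tfrac{1}{2}\text{)}
   = \tfrac{1}{2}\cdot\tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{8},
\]
\[
\text{Area(favourable region)} = \tfrac{1}{2} - \tfrac{1}{8} = \tfrac{3}{8},
\qquad
\text{probability} = \frac{3/8}{1/2} = \frac{3}{4}.
\]
```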
We have discussed this problem more extensively than it may appear to
deserve. But it does capture most of the salient features of the discrete and
the continuous mathematics and also of the relationship between them.
Some sort of a limiting process is essential for continuous mathematics. (It
is tempting to think that the second solution to the continuous house
problem was free of any limiting process. But it is not so. A rigorous defini-
tion of area for planar figures does involve limits.) Actually, even the
precise definition of a real number involves the limiting process in some
form or the other. In discrete mathematics, on the other hand, there is hardly
any scope for the limiting process. The sets involved are generally finite. (In
fact the phrase ‘finite mathematics’ is often taken to mean the same as
‘discrete mathematics’.) There is no way that a discrete variable can come
‘infinitesimally close’ to something. That is why concepts like instantaneous
speed have no meaning in discrete mathematics although they occur so fre-
quently in continuous mathematics. We may even write a symbolic equation:
Discrete mathematics + Limiting process = Continuous mathematics
This equation is suggestive in many ways. It shows that continuous mathe-
matics arises as a limiting case of discrete mathematics. An approximate
solution to a continuous problem can be found using discrete mathe-
matics. Sometimes the tables can be turned around. When a discrete
variable assumes values which are very close (although not infinitesimally
close) to each other, it may be thought of as a continuous variable for many
purposes and so the methods of continuous mathematics become applicable.
For example, in the house problem with n houses, for large values of n, the
problem is very nearly the continuous house problem. Since we have an
independent method of solving the latter, the answer (namely 3/4) can be
taken as a good approximation to the answer for the discrete problem for
all large n.
The limiting process is undoubtedly an important milestone in the history
of human thought. Because of this, for quite some time, discrete mathe-
matics was considered to belong to the infancy of man and continuous
mathematics to his manhood. In recent times, however, discrete mathe-
maticians are no longer considered as second class citizens in the mathe-
matical world. In fact, today discrete mathematics is one of the most
flourishing branches of mathematics. There are four good reasons for the
revival of interest in discrete mathematics. We discuss them briefly.
The first is a very practical one. Although in theory a continuous variable
can assume for its value any real number in some interval, in practice, even
with the best measuring techniques, we can measure its value only up to a
certain degree of accuracy. Thus, even when continuous mathematics gives
us the exact answer to a problem as a nice real number, to put it into
practice we have to resort to approximations, the commonest example of
this being the approximation of the ubiquitous π as 22/7 or as 3.1416. In
many problems, things are even worse. In such problems, continuous
mathematics gives the answer as a certain limit. But this limit cannot be
evaluated concretely in a closed form. A case in point is the perimeter of an
ellipse with semi-major axis a and semi-minor axis b. Using calculus it is
easy to write the perimeter as a definite integral (which, by definition, is the
limit of a sum), namely, as
$$\int_0^{2\pi} \sqrt{a^2 \sin^2 t + b^2 \cos^2 t}\; dt.$$
But there is no way to evaluate this integral in closed form. (If a = b then the ellipse
reduces to a circle and the integral can be evaluated to give the familiar
answer 2πa; but in general it cannot be so evaluated.) In other words, even
if we know the values of a and b there is no handy formula into which we
can plug them so that the answer would pop out. The best we can do is to
evaluate this integral approximately. This is essentially a discrete process
and the branch of mathematics which handles such problems is called
numerical analysis.
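As an illustration of such a discrete process, the perimeter integral can be replaced by a finite sum. The sketch below uses a midpoint Riemann sum, which is one possible choice and not a method prescribed by the text; the function name `ellipse_perimeter` and the default number of subintervals are assumptions made for the example.

```python
import math

def ellipse_perimeter(a, b, steps=10_000):
    """Approximate the integral of sqrt(a^2 sin^2 t + b^2 cos^2 t) over
    [0, 2*pi] by a midpoint Riemann sum with `steps` equal subintervals."""
    h = 2 * math.pi / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h                          # midpoint of the i-th subinterval
        total += math.sqrt((a * math.sin(t)) ** 2 + (b * math.cos(t)) ** 2)
    return total * h

print(ellipse_perimeter(1.0, 1.0))   # circle of radius 1: close to 2*pi
print(ellipse_perimeter(2.0, 1.0))   # a genuine ellipse
```

For a = b the integrand is constant and the sum reproduces 2πa up to rounding, which gives a quick sanity check of the code.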
Secondly, the very premise on which most of the applications of con-
tinuous mathematics to real life are based, has been seriously questioned
recently. As remarked earlier, classical mechanics held that physical quan-
tities such as mass and energy are continuous variables. The quantum
theory, on the other hand, holds that energy comes only in quanta, that is,
in whole multiples of some fixed unit, much like discrete objects like apples
or balls. Another example is that of the electrical charge. A charge which
is smaller in magnitude than the charge on an electron is not known to
exist. Any electrical charge must, therefore, be an exact multiple of the
charge on an electron. In other words, electrical charge is not a continuous
but a discrete variable. Of course, the charge on the electron is so small,
that for many practical purposes we may very well regard the electrical
charge as a continuous variable, the same way as we could think of the
discrete house problem as the continuous house problem when the houses
were very densely packed.* This is, in fact, actually done. Consequently,
continuous mathematics has not been entirely out-dated. But, at least in
principle, its role as the only ‘true’ representative of the real world (with
discrete mathematics playing a subordinate role) has been shaken. The
balance has now definitely tilted in favour of discrete mathematics.
*Another common example is that the population of a country is often taken as
a continuous variable, even though it is, most certainly, a discrete variable.

The third, and probably the most important, reason for the recent up-
surge in the interest in discrete mathematics is the change that has taken
place in mathematics itself! Till about a couple of centuries ago, number
was the soul of mathematics. Whenever mathematics was to be applied to
any problem, the first step was to assign suitable numbers to the various
entities in the problem. (Euclidean geometry did proceed without numbers
for quite some time; but with the invention of Cartesian co-ordinates, it too
was subsumed completely by numbers.) Things have changed considerably
during the last century. The focal point of today's mathematics is not num-
bers but sets, or more precisely, sets with additional structures (which we
shall study later). Of course, numerical problems continue to have their
importance. But the growth of non-numerical mathematics has also been
considerable. As an example of such a problem, consider the following
problem, which we call the ‘Dance Problem’: “Suppose at a dance party,
every boy dances with at least one girl and no girl dances with all the boys.
Prove that there must exist boys b, b′ and girls g, g′ in the party such that
b dances with g and b′ with g′, but neither b dances with g′ nor b′ with g.”
This problem involves no numbers whatsoever. Neither the continuous
variables (such as the heights of the boys or the dimensions of the girls) nor
the discrete variables (such as the number of boys or the number of girls) are
of any relevance in this problem! We shall present the solution in the third
chapter. It will depend on a certain ‘additional structure' on the set of girls.
Of course, not all problems of modern mathematics are so completely
divorced from numbers. Many of them do involve numbers. But their real
crux is not these numbers but rather certain additional structures on some
sets involved in the problem. This will be more fully discussed in the third
chapter. Suffice it to say here that the concept of additional structures on
sets has tremendously increased the applicability of mathematics. It has en-
riched both the discrete and the continuous mathematics. But from the
point of view of practical applications, it has benefited discrete mathema-
tics more than continuous mathematics, because the sets that we come
across in practice are usually finite sets. Discrete mathematics, empowered
with these additional structures, constitutes discrete structures, which is the
subject matter of this book.
Finally, the advances in computer technology have also given discrete
mathematics a big boost. On one hand they have made discrete mathema-
tics a workable proposition; while on the other hand they have also posed
certain problems the solution of which again requires discrete structures.
As mentioned earlier, we often have to content ourselves with an approxi-
mate solution to a problem. The techniques for doing this were known for
a long time. But their practical utility was marred by the fact that even to
get a moderately good approximation, the computations involved were
horrendous, if attempted by hand. High-speed computers have over-
come this difficulty. Therefore numerical methods (which belong to dis-
crete mathematics) are being increasingly used. If we want to draw the
graph of a function, then unless the function is of a particularly simple
type (say, a linear one), we can only plot a few points on it (and that, too,
only approximately) so as to get a rough idea of the exact graph. This is a
discrete process. But with modern advances in computer graphics, it can be
done to such a level of fineness that the naked eye simply cannot tell the
difference between the exact graph and its discrete, computerised version.
The recent trend is, in fact, to go for discrete approximations even when
the exact solution to a continuous problem can be found with just a little
hard work! (It is rumoured that solving differential equations would soon
become an obsolete art because of the ease with which series solutions can
be obtained to a high degree of accuracy using a computer; much the same
way as cameras have rendered obsolete the art of drawing objects exactly
as they are.)
We hope the reader has by now grasped the spirit and the importance
of discrete mathematics. In the exercises below we give a few more exam-
ples to stress the difference and interplay between discrete and continuous
mathematics.
Exercises
1.1 Prove that two line segments are commensurate if and only if the
ratio of their lengths is a rational number.
1.2 Prove that the side of a square is not commensurate with its diagonal.
1.3 Verify that the solution to the discrete house problem with $n = 2k$
is $\frac{3k+1}{4k+2}$ and with $n = 2k + 1$ is $\frac{3k}{4k+2}$. Why is it necessary to
make two separate cases depending upon whether n is even or odd?
1.4 Let α be a real number between 0 and 1. Generalise the continuous
house problem by finding the probability that the distance between
two distinct, randomly picked houses is at most α. (If α = 1/2, this
reduces to the problem we discussed.)
1.5 Similarly generalise and solve the discrete house problem with houses
at every 1/n-th kilometer, n being a positive integer. (You will find that
depending upon α, several cases may have to be made.)
1.6 In the discrete house problem with 11 houses, assume that the number
of persons living in the i-th house is $f_i$ for i = 0, 1, 2, ..., 10. What is
the average number of persons per house? What is the average
population per unit length? Which spot on the road is most densely
populated? (These are, of course, trivial questions. They are given
here because in the next exercise we shall study their continuous
versions and also for future reference).
1.7 When we consider the continuous version of the last exercise, we
encounter several difficulties. First of all, we have to let the number
of persons be a real variable, even though it may sound ridiculous to
think of a fractional human being. A more serious hindrance is that
we cannot simply let f(x) be the number of persons living at the point
x. If we do so, the total number of persons would be $\sum_{0 \leq x \leq 1} f(x)$. There
is no way to define an infinite sum like this. However, we can take
the help of definite integrals. We would like the function f(x) to have
the property that for any interval [a, b], the number of persons living
in it should equal the integral
$$\int_a^b f(x)\, dx.$$
Such a function is called the density function of the corresponding
variable (in this case, the number of persons). Prove that for every
$x_0$ in the interval [0, 1], $f(x_0)$ equals
$$\lim_{\Delta x \to 0} \frac{\text{Number of persons living in the interval } [x_0, x_0 + \Delta x]}{\Delta x}.$$
(This may, in fact, be taken as the definition of the density function,
or more precisely, the linear density function. Thus we see that even
a rigorous definition of it involves a limiting process, similar to the
definition of the instantaneous speed.) Now answer all the three ques-
tions of the last exercise. (The dimensions of the quantity f(x) would
be persons per unit length.)
1.8 Study the two-dimensional version of the last problem. That is, replace
the straight line road by a planar region, say, D (which may be thought
of as a town or a city). For a point (x, y) in D, define the laminar or
the areal population density at (x, y) as a certain limit. Use double
integrals instead of ordinary definite integrals.
1.9 If a discrete variable assumes values $x_1, x_2, \ldots, x_n$ (say) with proba-
bilities $p_1, p_2, \ldots, p_n$ respectively (so that $\sum_{i=1}^{n} p_i = 1$), then its expected
value (or average) is defined as $\sum_{i=1}^{n} p_i x_i$. In our original discrete house
problem, with 11 houses, find the expected distance between two
randomly picked houses.
1.10 When we attempt the continuous version of the last exercise we come
across difficulties analogous to those in Exercise (1.7). If x is a con-
tinuous variable taking values in an interval, say, [a, b], we can no longer
let $f(x_0)$ be the probability that x equals $x_0$. Instead we have to let $p(x_0)$
be the probability density at $x_0$, defined as a certain derivative as
before. Now, in the continuous house problem, find the expected
distance between two randomly chosen houses.
1.11 Suppose in Exercise (1.6), there is a shopping centre on the road at a
distance α from the house at 0. What is its average distance from the
houses on the road? What is the average distance which the persons
living on the road would have to traverse to go from their houses to
the shopping centre? (The two need not be the
same.)
1.12 Study the continuous version of the last problem.
1.13 Suppose a savings bank allows interest at the rate of 10 percent
per annum on deposits. If the interest is compounded n times every
year (with rests at every 1/n-th year), where n is a positive integer, and a
depositor invests one rupee, how much money will be at his credit
after one year? (Assume, for the purpose of this problem, that a
rupee can be divided into as many equal parts as we like, although
strictly speaking, such an assumption is contrary to the spirit of discrete
mathematics.)
1.14 In the continuous version of the last problem, the interest is said to
be, quite appropriately, continuously compounded. Obtain the answer
in this case both as a limit of the answer in the discrete case and also
directly by solving a suitable differential equation.
Notes and Guide to Literature
Although we shall primarily deal with discrete structures, some acquaint-
ance with continuous mathematics will be needed, at least to appreciate
discrete mathematics, if for nothing else. This may be had through any
… course in calculus (with …) in which the role of the
limiting process has been duly emphasised. There are many standard text
books on calculus, for example, Thomas and Finney [1] or Kreyszig [1].
We shall only need some very elementary concepts from probability
and statistics. An exposition through elementary college level courses should
be adequate. Two good references are Meyer [1] or Parzen [1]. There are
numerous treatises on numerical analysis. As a treatise which gives the
theoretical justifications for the methods used see, for example, Isaacson and
Keller [1]. For a more recent treatment which gives not only the methods
but computer programmes which can be implemented on personal compu-
ters, see Chapra and Canale [1].
Non-commensurability of the diameter and the circumference of a circle
amounts to saying that π is an irrational number. For a proof, see Niven
[1].
Continuously compounded interest is not such a hypothetical thing.
Some banks do allow it (although, of course, in the pass-books they can
enter only approximate amounts)! It is also nature's law of growth
(or decay) that the rate of growth be always proportional to the quantity
present. It is obeyed by radioactive substances.
2. Typical Problems
In this section we discuss the general types of problems that will be
studied in this book.* The very genesis of discrete mathematics is its
application to real life. Problems would therefore play a dominant role in
this book. Accordingly, we give in this section a list of some typical, con-
crete problems. The reader is invited to try them ‘by hand’. (A few of them
can indeed be so tackled.) Where he can solve them, he is encouraged to
modify them, generalise them or to prove stronger results. On the other
hand, even if you cannot get a problem, at least try some special cases of
it; see if you can solve the problem with some additional hypothesis. In
fact, this is a general piece of advice applicable to all the problems to
come. In order not to water things down, we give only the hard-core
problems here. Comments will follow in the next section along with gui-
dance to the chapters where they will be solved fully. But first a few
generalities about the problems of discrete mathematics.

*Actually, some of them will be covered in a different book. See the Epilogue.
It may appear at first sight that the problems of discrete mathematics
are simple, at least as compared to those of continuous mathematics be-
cause the latter involve a limiting process while the former do not. To
some extent this view is substantiated if we compare the discrete and the
continuous versions of various problems given in the exercises of the last
section. The former generally involve little more than arithmetical com-
putation while the latter require elaborate concepts such as derivatives and
integrals (even double integrals). As a concrete example, in Exercise (1.6),
we asked which spot on the road was most densely populated. This simply
amounts to comparing the eleven numbers $f_0, f_1, \ldots, f_{10}$ with each other
and finding which of them is the largest. In the continuous version in
Exercise (1.7), the problem amounts to maximising the density function
f(x) over the interval [0, 1]. Because there are infinitely many points in
the interval [0, 1], the method applicable to the discrete
version no longer works. The difficulty, in fact, starts right from the
existence of a maximum. As a simple example, suppose f(x) is defined by
$$f(x) = \begin{cases} x & \text{for } 0 \leq x < 1/2 \\ 0 & \text{for } x = 1/2 \\ 1 - x & \text{for } 1/2 < x \leq 1 \end{cases}$$
This function is bounded above (with 1/2 as the supremum or the least
upper bound). Yet, there is no point $x_0$ in [0, 1] at which $f(x_0) = 1/2$. In
other words, this function has a supremum but no maximum! Such an
apparently paradoxical situation cannot arise in the discrete case. Note
that the function f(x) here is not continuous at 1/2. If f(x) is continuous at
every point of [0, 1], then there is a theorem which asserts the existence of
a maximum for f(x), that is, a point $x_0$ in [0, 1] such that $f(x_0) \geq f(x)$ for
all x in [0, 1]. This is a fairly non-trivial result whose proof requires some
very basic properties of real numbers. Unfortunately, even with such a
deep result at our disposal, we are still a long way from the solution. The
trouble is that although the theorem guarantees the existence of a point $x_0$
as above, it gives no construction for finding it in finitely many steps.
There is, in fact, no such method available. If f(x) satisfies some additional
conditions (such as, having a continuous derivative) then there are methods
for finding approximate answers. But they are fairly involved.
Thus we see that at least in this problem, the discrete version of it
admits a very easy, almost mechanical solution. It is completely free from
the complications that arise in the continuous version because of the subtle
differences between a supremum and a maximum or between proving the
existence of something and actually finding it (the kind of differences which
a calculus teacher tries to emphasise to his students with all the zeal of a
[illegible] talking to [illegible])! It is because of reasons like this that
discrete mathematics was considered to belong to the infancy of man and
continuous mathematics to his manhood. As remarked earlier, this view no
longer holds good today. Nevertheless, there is still a class of persons who
lose interest in a problem the moment they see that it can be solved in a
finite number of steps by a mechanical procedure. ‘Give it to a computer
and forget about it’ is their reaction!
But, what might be the end for such persons is the very beginning
for some othersl This 6 y of r ' ' ‘ r p. g n
(or at least the brains behind them). All the problems they deal with are
finitistic in the sense that they can be solved in a finite number of steps by
a systematic, mechanical procedure (an algorithm, as it is technically called).
Do these persons then merely translate these algorithms into computer
programs and feed them toa machine? No. What an intelligent programmer
looks for is not just some algorithm that will work but one that will work
efficiently in terms of time and other factors, such as the amount of memory
used. The design and analysis of algorithms is a highly challenging branch
which we shall discuss briefly in a later chapter.¹ For the moment, we can
illustrate the difference between an inefficient and an efficient algorithm with
reference to the problem we have been discussing, namely, finding the
maximum out of the eleven numbers $f_0, f_1, \ldots, f_{10}$. The first method is to
go on scanning these numbers one by one and comparing each one of them
with all its predecessors. If it is less than at least one of them, obviously
it cannot be the largest. If, on the other hand, it is greater than or equal
to all its predecessors, then we call it the tentative maximum. The tentative
maximum that survives after all the numbers up to and including $f_{10}$ have
been exhausted, is obviously the maximum, the answer to our problem.
It is easy to see that this method requires 55 comparisons. (Remember this
was also the number of pairs of distinct houses in the discrete house pro-
blem in the last section.)
¹ See the Epilogue.
Now, are these 55 comparisons absolutely necessary? No. One way to
reduce them is to note that in the procedure above, the moment we find
some number (say $f_j$) less than some predecessor (say $f_i$) of it, it is
absolutely futile to go on comparing it with the subsequent predecessors of it
(in this case $f_{i+1}, \ldots, f_{j-1}$). If we omit such redundant comparisons,
how much saving do we effect? The answer depends upon the original sequence
$f_0, f_1, \ldots, f_{10}$. If it is monotonically increasing (that is,
$f_{i-1} < f_i$ for all i = 1, 2, ..., 10),
we would still need all 55 comparisons. But if it is strictly monotonically
decreasing, only 10 comparisons would do. In general, the answer would of
course lie between these two extremes. The best we can do is to give the
expected, or the average number of comparisons that would be needed by
this method. Finding such averages is itself a challenging problem.
In the present problem, though, the analysis of the last algorithm is only
of academic interest because there is another way to reduce the number of
comparisons still further. We start with $f_0$ as the tentative maximum. (This
was also the case in the last two methods.) Now, every time we come
across a new number $f_i$, we simply compare it with the tentative maximum
at that time. If it exceeds the tentative maximum then we set the tentative
maximum to be $f_i$ and consider the next number, $f_{i+1}$. Otherwise we go
directly to $f_{i+1}$. This way we can find the maximum of the eleven numbers
$f_0, f_1, \ldots, f_{10}$ in just ten comparisons. Can we do better than this? No! It is
a good exercise for the reader to show that to find the maximum out of 11
numbers, no matter which method we follow, at least ten comparisons will
be needed. In other words, from the point of view of number of comparisons,
the algorithm we have just considered is the best possible. We hasten to add
here that this is not a very common situation. In other words, in many
problems, it is generally far from easy to prove that a particular algorithm
is the best possible one. This is, therefore, yet another challenge to the desig-
ner of an algorithm!
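The three methods described above can be sketched in Python so that the comparison counts may be checked by experiment. This is only an illustrative sketch: the function names are our own, and in the second method we assume that each number is compared with its predecessors in the order in which they were scanned.

```python
def max_naive(nums):
    """First method: compare each number with ALL of its predecessors."""
    comparisons = 0
    tentative = nums[0]
    for i in range(1, len(nums)):
        beats_all = True
        for j in range(i):
            comparisons += 1
            if nums[i] < nums[j]:
                beats_all = False  # keep scanning anyway (the wasteful part)
        if beats_all:
            tentative = nums[i]
    return tentative, comparisons


def max_early_exit(nums):
    """Second method: stop as soon as some predecessor is found to be larger."""
    comparisons = 0
    tentative = nums[0]
    for i in range(1, len(nums)):
        beats_all = True
        for j in range(i):
            comparisons += 1
            if nums[i] < nums[j]:
                beats_all = False
                break  # further comparisons for nums[i] are futile
        if beats_all:
            tentative = nums[i]
    return tentative, comparisons


def max_tentative(nums):
    """Third method: compare each new number with the tentative maximum only."""
    comparisons = 0
    tentative = nums[0]
    for x in nums[1:]:
        comparisons += 1
        if x > tentative:
            tentative = x
    return tentative, comparisons


increasing = list(range(11))           # 0, 1, ..., 10
decreasing = list(range(10, -1, -1))   # 10, 9, ..., 0
for method in (max_naive, max_early_exit, max_tentative):
    print(method.__name__, method(increasing), method(decreasing))
```

Running the three functions on an increasing and on a strictly decreasing list of eleven numbers reproduces the two extremes discussed above for the second method, while the third method always uses one comparison fewer than the number of entries.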
In summary, we see that the simple (and, in fact, trivial from the point
of view of continuous mathematics) problem of finding the maximum of
eleven numbers leads to many interesting improvements and challenging
problems. On this ground alone, discrete mathematics is far from a trivial
subject, even though the sets involved in it are generally finite. But there is
more to come. Even in finite mathematics, an element of the infinite
creeps in because of the fact that although each integer is finite, the set of
all positive integers is infinite! In any actual problem of real life, we would
be dealing with a set whose size is a fixed positive integer. But naturally we
would like our method to be applicable to similar problems of a general,
undetermined size. For example, the problem we discussed above was to
find the maximum of 11 numbers. But there is nothing particularly sacro-
sanct about 11. We might as well be faced with the problem of finding the
maximum of 12 or of 100 numbers, say. We would obviously not like to
duplicate all our work every time. We, therefore, want a solution that will
work for every positive integer n. In the problem that we are discussing, there
are no difficulties involved in generalising our comments from the special
case of 11 numbers to the general case of n numbers. But this is not always
so. For many problems, although the special cases can be worked out ‘by
hand’, finding the answer in the general case can be a formidable task. We
shall see many instances of this later on.
Secondly, even when we are dealing with a very particular case, so that,
theoretically, the solution can be obtained by mechanically going through a
finite number of cases, their number can be so enormously large that even
the best computers available today (or in the near future) would take ages to
find the answer! Since we would like to have the answer in our life-time, it
is no solution to say ‘Let a computer do it’. One then has to look for the
solution by some very ingenious methods. Although we shall not study very
many problems of this type, we mention one such problem here just to
illustrate the point. We nickname it the ‘Religious Conference Problem’.
‘Suppose there are 10 recognized religions in India. At a religious
conference, delegations from 10 states in India are invited. Each state sends a
delegation of 10 persons, one each of the ten religions. Thus, in all, there
are 100 delegates, 10 from each state and also 10 of each religion. Can
these 100 delegates be arranged in a 10 × 10 square so that in each row (and
column) every state and every religion will be represented?’
The answer to this problem is of course, a simple ‘Yes’ or ‘No’, with
justification, that is, showing an actual arrangement if it exists or else
proving that it does not exist. We would of course get it eventually if we
examine all possible arrangements of the 100 delegates in a square. But the
number of such arrangements is 100!, that is, the product 1 × 2 × 3 × 4 ×
... × 98 × 99 × 100. This number is larger than a trillion (= $10^{12}$). True, we
can weed out many arrangements easily (for example, those in which the
first row contains some religion twice). But even then the number of
cases left is too large to allow a case by case study. It is therefore not very
surprising that this problem remained unsolved for about two centuries.
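To make the conditions of the problem concrete, here is a small Python sketch that checks whether a proposed square is acceptable and shows how large 100! is. The function names and the sample arrangement for 3 states and 3 religions are our own illustrative assumptions; delegates are written as pairs (state, religion) numbered from 0.

```python
import math


def is_valid_arrangement(grid):
    """Check an n x n square of (state, religion) pairs.

    Every delegate must appear exactly once, and every row and every
    column must contain all n states and all n religions.
    """
    n = len(grid)
    if any(len(row) != n for row in grid):
        return False
    everyone = {(s, r) for s in range(n) for r in range(n)}
    if {cell for row in grid for cell in row} != everyone:
        return False
    full = set(range(n))
    lines = [list(row) for row in grid] + [list(col) for col in zip(*grid)]
    return all(
        {s for s, _ in line} == full and {r for _, r in line} == full
        for line in lines
    )


def sample_grid_of_order_3():
    # Hypothetical example: the delegate in row i, column j comes from
    # state (i + j) mod 3 and follows religion (i + 2j) mod 3.
    return [[((i + j) % 3, (i + 2 * j) % 3) for j in range(3)] for i in range(3)]


print(is_valid_arrangement(sample_grid_of_order_3()))
# Number of decimal digits of 100!, the number of arrangements to examine.
print(len(str(math.factorial(100))))
```

A checker of this kind settles any single proposed arrangement at once; the difficulty described above lies entirely in the number of arrangements that a blind search would have to feed into it.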
Finally, we must remember that our subject matter is not discrete mathe-
matics in the traditional sense but rather discrete structures, that is, discrete
mathematics empowered with study of additional structures on finite sets.
When we study additional structures on sets, we find that some theorems
hold regardless of whether the underlying sets are finite or not. But there
are also many interesting and non-trivial results which hold only for finite
sets. What is needed to prove them is not some routine, brute-force compu-
tation but rather some cool, deductive reasoning (similar to the reasoning
required to prove theorems in geometry without the use of coordinates,
vectors or trigonometry). Indeed, the arguments would be so full of variety
and ingenuity that nobody can call discrete structures a dull or a trivial
subject.
With these generalities about the problems to be discussed in this book,
we now give a list of sample problems. Following our earlier practice we
nickname each one of them for ready reference in future.
1. The Tournaments Problem: In a knock-out tennis tournament, the
players at each round are paired off and a match is played between
the players of each pair. (In case of an odd number of players at any
round, one of them gets a bye). The winners enter the next round. This
process continues till a champion is found. If 1,000 players enter the
tournament, how many matches would be needed to decide the
champion?
2. The Locks Problem: A box contains secret documents and only 5
persons are privileged to have access to them. As a further security
measure, it is desired that when any three but no fewer of these 5 persons
come together they should be able to open the box. Design a
system of locks and keys which will achieve this.
3. The Envelopes Problem: 10 letters and 10 envelopes carry matching
addresses. If one letter is put in each envelope at random, what is the
probability that all the letters are in the wrong envelopes?
4. The Test Problem: 20 students appear for a test with 10 questions
each of which is to be answered ‘True’ or ‘False’. Prove that there
exist two (distinct) students who answer at least six questions identi-
cally.
5. The Postage Problem: In how many ways can a postage of two
rupees be affixed on an envelope, using stamps of denominations 10
paise, 20 paise and 30 paise?
6. The Division Problem: Is the number 2·3·5·7·11·13·17 + 1 divisible
by 19?
7. The Landlord Problem: A landlord shares a house with a tenant.
There is an electric lamp in the common passage with two switches,
one with the landlord and one with the tenant. In addition, the land-
lord has a ‘hidden' switch. Design a circuit in which, when the hidden
switch is closed, either of the two other switches can control the state
of the lamp independently but when the hidden switch is open, the
landlord’s switch has exclusive control of the state of the lamp.
8. The Business Problem: A certain socialist country puts the following
restrictions upon businesses:
(i) All businesses having import licenses as well as those not manu-
facturing essential commodities must employ local personnel and
must not employ any skilled personnel.
(ii) All businesses having import licenses as well as all those not
employing local personnel must employ skilled personnel and
must not manufacture essential commodities.
(iii) No business shall employ local personnel without obtaining an
import license.
Prove that it is impossible to do any business in that country.
9. The Stone Problem: You are given a stone weighing 40 kilograms.
Cut it into four parts using which it will be possible to weigh any
integral multiple of 1 kg up to 40 kilograms with only one use of a
balance, the parts of the stone being allowed to be placed in either
pan.
10. The Regions Problem: Into how many regions is a plane divided by
n straight lines no two of which are parallel and no three of which
are concurrent?
11. The Shares Problem: A certain company makes a fresh allotment of
shares. Thereafter, at the end of every year it issues one bonus share
per every share which has matured for two years or more. (The bonus
shares, after their maturity, also keep on acquiring further bonus
shares every year.) A shareholder buys one share at fresh allotment
and thereafter neither buys nor sells any shares, but simply keeps on
collecting the bonus shares due to him. Find how many shares he will
have during the nth year.
12. The Vendor Problem: A vendor sells tickets costing 1 rupee each.
100 customers approach him one by one. 50 customers tender a 1 rupee
coin each while 50 tender a 2 rupee coin each and claim a change of
1 rupee each. The vendor has no money to start with. What is the
probability that he will not run out of change when needed?
13. The Casino Problem: The management of a casino installs a gambling
machine in its lobby. On this machine, a player tosses a fair coin
at each round. Each toss costs him a rupee. The game is over when
three heads show in a row (called a ‘win’) or at the end of the tenth
round, whichever is earlier. Assume that a gambler continues to play
a game till there is some possibility of a win but not afterwards. The
machine is meant only to attract customers and the management
wants to run it on a ‘no profit no loss’ basis. What amount of a
reward for a win will ensure this in the long run?
14. The Capital Problem: There are three major cities A, B and C in a
newly born state. A capital is to be built at a place from which the
sum of the straight line distances to the three cities will be minimum.
At what point in the plane (containing A, B and C) should it be built?
15. The Head Office Problem: A company has branch offices at the
cities A, B, C, D, E, F, G which are joined by roads as shown in
Figure 1.4, where the numbers indicate the lengths of the roads in
some fixed unit. The company wants to build a head office from which
the sum of the distances to the branch offices (along the shortest
possible paths available) is minimised. Where should it be built?
[Figure 1.4: The Head Office Problem]
16. The Little Travelling Salesman Problem: Suppose there are n cities
and there is a direct path between every two of them. Construct a
circular tour of these cities in which every city is visited exactly once.
How many such tours are there? Prove or disprove that the shortest
such tour must visit the closest pair of cities consecutively.
17. The Diet Problem: The staple diet of a person consists of a combi-
nation of two cereals, A and B. The protein and fat contents and the
costs (all expressed in suitable units per kilogram) of these cereals are
as shown in the following table:
The person is advised to have a protein intake of at least 300 units
and a fat intake of at most 500 units per month. However, because of
personal tastes, he insists that at least 30 per cent of his staple diet
should consist of cereal A. Can he have a diet which meets these
requirements? If so, which is the most economical such diet?
18. The Cattle Problem: A farmer undertakes to supply a company 300
litres of milk and 450 kilograms of manure every day. The
company provides the farmer with a pasture of 100 acres on which
he can graze cattle free of charge. But he has to buy his own cattle.
The following table shows the cost (in rupees), the daily milk yield
(in litres), the daily manure output (in kilograms) and the pasture
requirement (in acres) of a cow and a buffalo.

Animal     Cost (Rs.)    Daily milk yield    Daily manure output    Pasture area
                         (litres)            (kg)                   (acres)
Cow        2,200.00      3                   3                      1.0
Buffalo    3,500.00      5                   7                      1.5

What is the minimum investment on the cattle the farmer has to make
to fulfill the contract?
Exercises
2.1 Prove that it is impossible to find the maximum out of 11 numbers
in less than ten comparisons.
2.2 Explain the relationship between the Tournaments problem and
the problem of finding the maximum out of 1,000 numbers.
2.3 If x, y are real numbers, prove that their maximum equals
$\frac{1}{2}(x + y + |x - y|)$.
Obtain a similar formula for their minimum.
2.4 Generalise the last exercise so as to get an inductive expression for
the maximum of n real numbers.
2.5 Does the last exercise provide a method for finding the maximum
of n real numbers which avoids comparisons of real numbers?
2.6 The Religious Conference Problem can obviously be stated for the
case of n religions and n states, where n is a positive integer. Prove
that for n = 2 it has no solution while for n = 3 it has a solution.
*2.7 Prove that the Religious Conference Problem also has a solution
for n = 4, 5, 7, 8 and 9 but no solution for n = 6.
2.8 According to a popular story about a famous mathematician, in his
student days at Cambridge, he was so confident that he would top
the merit list at the coveted tripos examination, that on the day the
results were put up he asked his servant to go and find out who
was the second in rank. The obedient servant returned and said ‘It
is you, Sir’! Poor fellow then had to send him again to find out
who was the first. In contrast with this, prove that any algorithm
for finding the second largest from a given set of real numbers must
necessarily also find the largest one. (In other words, although the
knowledge of the second largest element does not imply knowledge
of the largest element, any method for finding out the second largest
element will necessarily give, as a by-product, the largest element as
well. Thus there is a subtle difference between knowing something
and finding it out.)
2.9 Prove that any algorithm for finding the third largest from a given
set of real numbers must necessarily also give, collectively, which
two numbers are among the top two, although it may not necessarily
tell which is the largest and which is the second largest.
2.10 Prove that the second largest (and hence also the largest) element
of a set of 11 (distinct) real numbers can be determined with 13
comparisons.
(Hint: Use your answer to Exercise (2.2).)
2.11 Generalise the result of the last problem for the case of n real
numbers. (Note, incidentally, that the problem of finding the second
best maximum hardly arises in continuous mathematics. Why?)
*2.12 Prove that, in general, less than 13 comparisons will not suffice to
find the second largest element of a set of 11 numbers.
Notes and Guide to Literature
A proof of the theorem that every continuous real-valued function on a
closed and bounded interval has a maximum may be found in almost any
book on real analysis. Such theorems are called existence theorems because
they assert the existence of something without giving any method for finding
it. This intriguing fact has led to two lines of development. One is to look
for methods for actually finding the solution (in this case the maximum), or
at least an approximate solution. This has been going on for a long time.
But recently, some mathematicians have started questioning the very founda-
tions of mathematics which produce such a beautiful but ‘useless' result. It
turns out that the root cause lies in our classification of statements either as
‘true’ or ‘false’ even if we have no way of finding out to which category a
particular statement belongs. This has led to a new school of thought, the
so-called constructivist mathematics, in which a third category is kept for
statements which are not known to be either true or false at present. A
definitive reference on this type of mathematics is Bishop [1].
The Religious Conference Problem is a paraphrase of what is more
commonly known as the problem of finding mutually orthogonal Latin
squares. Euler proved that no solution exists for n = 6 and conjectured that
the same is true for all n of the form n = 4k + 2 (the case n = 10 being
the first unsettled one). Bose and Shrikhande [1] disproved Euler’s conjecture
by actually giving a construction of the desired type of arrangement. This
led to an even more challenging problem of whether there exists what is
known as a projective plane of order 10. It is still open. More details (both
on Latin squares and finite projective planes) may be found in the book of
M. Hall [1].
The anecdote about the mathematician in Exercise (2.8) is taken from
an article by Roth [1].
The simile of a missionary talking to cannibals was used originally by
Littlewood to comment upon the somewhat prolix style of Hardy in his
classic textbook ‘A Course of Pure Mathematics’ [1].
Results about the minimum numbers of comparisons needed to find out
the rth largest element from a given set of real numbers have been sum-
marised in Knuth [1] (volume 3, Section 5.3.3).
3. Comments about Typical Problems
We hope the reader has tried and solved a few (but not all, as otherwise
there would be little need to go further!) of the typical problems in the last
section. We now make comments about them one by one and indicate what
would be needed to solve them. In subsequent chapters these problems will
often be used to provide motivation for certain concepts to be studied. We
already considered the Dance Problem in Section 1. We now go to the
problems in the last section in the order they are listed.
1. The Tournaments Problem: The problem, as it is, is very straightforward
and the answer is obtained simply by summing up the number of matches
in each round. Thus 500 + 250 + 125 + 62 + 31 + 16 + 8 + 4 + 2 + 1
or 999 matches will be needed for finding the champion. You might wonder
if all problems in discrete mathematics are so trivial (after all, this is a
‘typical’ problem!). But if you are of the perceptive type, you will not fail
to notice that the answer (999) is awfully close to the number of participants
(1000). If further, you are scientifically minded, you would try the problem
with different numbers of participants say, 100; 10,000, or some random
figure such as 12,597,341. You will find that the number of matches to
decide the champion is, respectively, 99; 9,999 and 12,597,340. You are now
convinced that the number of matches is always one less than the number
of participants. But your own personal conviction and verification does
not constitute a mathematical proof! There are infinitely many integers and
you will need to prove the assertion for all of them, something which you
can never do by finitely many experiments. This only illustrates an earlier
remark that even in finite mathematics, an element of the infinite creeps
in.
So we need a proof that will show that for every positive integer n, n - 1
matches will be needed to decide the champion among n players by the
knock-out tournament method. One way to prove this is by induction (or
more precisely, the second principle of mathematical induction). We leave
this as an exercise. The real point of this problem is that there is an
extremely short, ingenious proof which uses only some elementary facts from
set theory. This will be presented in Chapter 2.
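The round-by-round count described above is easy to mechanise. The following Python sketch (the function name is our own) lists the number of matches in each round, allowing for a bye when the number of players in a round is odd. It is an experiment of exactly the kind described above and, of course, no substitute for a proof.

```python
def knockout_rounds(players):
    """Return the number of matches played in each round of a knock-out."""
    rounds = []
    while players > 1:
        matches = players // 2             # players are paired off
        rounds.append(matches)
        players = matches + players % 2    # winners, plus a possible bye
    return rounds


print(knockout_rounds(1000))
for n in (1000, 100, 10_000, 12_597_341):
    print(n, sum(knockout_rounds(n)))
```

For 1,000 players the list of rounds agrees with the sum written out above, and for each of the other sizes tried the total comes out one less than the number of participants.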
2. The Locks Problem: There is, of course, nothing special about the
figures 5 and 3. If m and n are any positive integers with m ≤ n then we
can ask the same problem with n persons of which any m but no fewer
should be able to open the box. The simplest interesting case arises for
m = n = 2. This corresponds to the safety locker arrangement commonly
provided at the banks. There the solution is to have two locks, one of which
can be opened only by the bank and the other only by the customer. It is
not difficult to generalise this solution. But it would be more natural to
conceive it using set theory. So again we defer it to Chapter 2.
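As a preview, here is a Python sketch of one well-known way to generalise the bank locker arrangement. It is offered as an assumption on our part and not necessarily the solution developed in Chapter 2: put one lock on the box for every group of m - 1 persons, and give its key to everybody outside that group. The function names are hypothetical.

```python
from itertools import combinations


def lock_scheme(n, m):
    """One lock per group of m - 1 persons; the key goes to everyone else.

    Returns a list of sets, each set being the key holders of one lock.
    """
    people = range(n)
    return [set(people) - set(excluded) for excluded in combinations(people, m - 1)]


def can_open(group, locks):
    """A group opens the box if it holds at least one key to every lock."""
    return all(holders & set(group) for holders in locks)


locks = lock_scheme(5, 3)
print(len(locks))                                                  # number of locks
print(all(can_open(g, locks) for g in combinations(range(5), 3)))  # any three succeed
print(any(can_open(g, locks) for g in combinations(range(5), 2)))  # any two fail
```

With n = m = 2 the same construction gives two locks, each of which can be opened by exactly one of the two parties, which is the bank locker arrangement mentioned above.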
3. The Envelopes Problem: This is what is more formally called the pro-
blem of finding all derangements of 10 symbols. (A derangement is defined
as an arrangement of objects in which no object is in its own position.) It
would be horrendous to do this by hand, and in any case we would like to
find the answer for any n (not just for n = 10). The technique adopted for
doing this is called the principle of inclusion and exclusion. This will be
studied in Chapter 2.
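For readers who wish to see numbers before the theory, the following Python sketch counts derangements in two ways: by the standard formula that the principle of inclusion and exclusion yields (stated here without proof), and by brute force for small n as a check. The function names are our own.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def derangements(n):
    """Inclusion-exclusion count: sum over k of (-1)^k * n! / k!."""
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))


def derangements_brute_force(n):
    """Count arrangements of 0..n-1 in which no symbol keeps its own position."""
    return sum(all(p[i] != i for i in range(n)) for p in permutations(range(n)))


# The two counts should agree wherever brute force is affordable.
print([derangements(n) for n in range(1, 8)])
print([derangements_brute_force(n) for n in range(1, 8)])

# Probability that all 10 letters land in wrong envelopes.
probability = Fraction(derangements(10), factorial(10))
print(probability, float(probability))
```

The brute-force count examines all n! arrangements and so becomes impractical very quickly, which is precisely why a formula valid for every n is wanted.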
4. The Test Problem: It is instructive to reformulate this problem. A
binary sequence of length n (where n is some positive integer) is defined as
a sequence of n terms each of which is either 0 or 1. Now if we let ‘1’
correspond to ‘True’ and ‘0’ to ‘False’ then the answers given by a student can
be represented by a binary sequence of length 10. The problem now amounts
to showing that given any 20 binary sequences of length 10, there are at
least two among them which differ from each other in at most 4 terms.
The solution is simple once we quantify the ‘difference’ between two binary
sequences of length 10. This amounts to putting an additional structure on
the set of such sequences. Consequently, this problem, like the Dance
Problem, will be relegated to the chapter on sets with additional structures
(Chapter 3).
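The reformulation can be tried out on a computer. In the Python sketch below (names and the random answer sheets are our own illustrative assumptions) the ‘difference’ between two binary sequences is taken to be the number of terms in which they differ, and the closest pair among 20 randomly generated answer sheets is found by examining all pairs.

```python
import random
from itertools import combinations


def difference(a, b):
    """Number of terms in which two binary sequences of equal length differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_pair_difference(sequences):
    """Smallest difference between two distinct members of the collection."""
    return min(difference(a, b) for a, b in combinations(sequences, 2))


random.seed(0)  # fixed seed so that the experiment is repeatable
answers = [tuple(random.randint(0, 1) for _ in range(10)) for _ in range(20)]
print(closest_pair_difference(answers))
```

Two students whose sequences differ in at most 4 terms have answered at least 6 questions identically, so the assertion of the problem is that the printed value can never exceed 4, whatever the 20 answer sheets may be.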
5. The Postage Problem: We assume that the stamps of each denomina-
tion are indistinguishable from each other, so that all that matters is how
many stamps of each kind are used and not which particular ones are used.
Also, from the statement of the problem, it must be understood that once
the stamps are selected, the manner in which they are affixed on the
envelope is irrelevant. For otherwise, the answer will be infinite since even a
single stamp can be placed on an envelope in infinitely many ways. (Problems
in which certain laminas are to be ‘tiled’ with given patterns are also
important but we shall not discuss them.)
With these simplifications the problem amounts to finding the number
of ways a total of two rupees (i.e. 200 paise) can be made up from the three
kinds of stamps. We might as well take 10 paise as a unit. The problem
then reduces to splitting a collection of 20 units into classes of sizes 1, 2
or 3 units. The technical name for such a splitting is a partition. We shall
formally define them in Chapter 2. Theoretically, the problem could be
done by hand by simply listing down all possible partitions of 20, in which
no part has size exceeding 3. If, for example, instead of a postage of 2
rupees we want to affix only a postage of 50 paise, then we have to
partition 5. This can be done in 5 ways: (i) 1 + 1 + 1 + 1 + 1 (corresponding
to 5 stamps of 10 paise) (ii) 2 + 1 + 1 + 1 (corresponding to one 20 paise
stamp and three 10 paise stamps) (iii) 2 + 2 + 1 (iv) 3 + 1 + 1 and
(v) 3 + 2. Similarly for the given problem it can be shown that the
answer comes out as 44. But there are many disadvantages in such
straightforward methods of counting. First, in listing the various possibilities (in
this case, partitions with part sizes at most 3), there is a danger that while
searching for them we may miss some of them and thereby get a wrong
count. This difficulty can be overcome by doing the search systematically
and exhaustively. But the time it takes is prohibitive. Secondly, we would
obviously like a method which will work for the partitions of any positive
integer n, not just for n = 20 as in this problem, because there is nothing
particularly great about a postage of 2 rupees; it could just as well be some
other amount. So let $a_n$ denote the number of partitions of n into parts of
sizes 1, 2 or 3. We computed $a_5$ as 5 and mentioned (without proof) that
$a_{20}$ (which is the answer to our problem) is 44. But what is $a_n$ for a ‘general’
value of n?
To answer questions like this we need advanced counting techniques, to
be studied in Chapter 7. The essential idea is very simple. A partition of n
amounts to a factorisation of $x^n$ into powers of x. For example, consider
the factorisation of $x^{20}$ as $(x^3)^2 (x^2)^5 (x)^4$. This corresponds to taking two 30
paise stamps, five 20 paise stamps and four 10 paise stamps. So the problem
amounts to finding the number of possible factorisations of $x^{20}$ into powers
of $x^3$, $x^2$ and $x$. Although such a paraphrase is not by itself a solution, it
suggests that the machinery of algebra can be used and we shall exploit it.
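Pending the algebraic treatment, the counts quoted above can be checked mechanically. The Python sketch below (the function name and the table-filling method are our own choice, not the technique of Chapter 7) builds up the number of ways of making each total, introducing one stamp size at a time. This is the same bookkeeping as collecting the powers of x in a product of series in $x$, $x^2$ and $x^3$.

```python
def count_partitions(total, parts=(1, 2, 3)):
    """Number of ways to write `total` as a sum of the given part sizes,
    the order of the parts being immaterial."""
    # ways[t] = number of ways to make t using the part sizes handled so far
    ways = [1] + [0] * total
    for part in parts:
        for t in range(part, total + 1):
            ways[t] += ways[t - part]
    return ways[total]


print(count_partitions(5))    # postage of 50 paise, in units of 10 paise
print(count_partitions(20))   # postage of 2 rupees
```

Because each part size is introduced only once, a selection such as 2 + 2 + 1 is counted once regardless of the order in which the stamps are listed, which matches the convention adopted above.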
The reader may wonder why on earth would anybody want to count all
possible ways of affixing the postage when in practice any one way would
suffice. As a somewhat contrived answer, suppose a spy wants to send
some secret messages which are to be coded in terms of the
combinations of stamps affixed on the envelopes. Then under the conditions
of the problem, he can send up to 44 distinct messages. More importantly,
the present problem is only a sample of problems involving counting.
Counting the number of all ways a certain thing can happen is important
in various connections, for example in finding out the probability of some
event or in predicting how long an algorithm may take to do a given pro-
blem and so on.
6. The Division Problem: Like the Tournaments Problem, this problem
also has at least one straightforward way for solution. By sheer
computation we see that the number 2·3·5·7·11·13·17 + 1 is 510,511 which is
indeed divisible by 19 (with quotient 26869). But the brute force method is not
always the best. In the present problem we are not interested in the
quotient, but only in seeing whether 2·3·5·7·11·13·17 + 1 is divisible by 19.
This can be done, without carrying out the given multiplication, much
more easily by the use of the so-called residue or modular arithmetic.
Although by way of examples, we shall mention this in Chapter 3, to put it
in proper perspective we need ring theory from Chapter 6.
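The idea of residue arithmetic can be illustrated by a short Python sketch (the function name is our own): instead of forming the full product, we keep only the remainder upon division by 19 after each multiplication, so that no intermediate number ever exceeds 18 × 17.

```python
def product_plus_one_mod(factors, modulus):
    """Residue of (product of factors) + 1, reducing after every step."""
    residue = 1
    for factor in factors:
        residue = (residue * factor) % modulus
    return (residue + 1) % modulus


first_seven_primes = [2, 3, 5, 7, 11, 13, 17]
print(product_plus_one_mod(first_seven_primes, 19))   # 0 means divisible by 19
```

The successive residues are 2, 6, 11, 1, 11, 10 and 18, and adding one to the last gives 19, whose residue is 0. This agrees with the brute force computation quoted above.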
By the way, there is a reason for considering the particular number
2·3·5·7·11·13·17 + 1 (although the method would apply to any product).
Note that this number is the product of the first seven prime numbers plus
one. Using an analogous construction, Euclid showed that there are
infinitely many prime numbers.
7. The Landlord Problem: If anybody thinks that discrete structures
have no practical uses, this problem should disprove it! If nothing else, it
helps landlords harass unwanted tenants! The circuit involved here is a
combination of two familiar circuits. The first, popularly known as the
‘staircase circuit’, is a circuit in which either of the two switches can control
the state of the lamp independently. The second circuit is even simpler,
because a single switch controls the state of the lamp. Consequently,
anybody with a little experience about designing circuits can do the problem
by hand. But a systematic solution requires the use of Boolean algebras.
They will be studied in Chapter 4. They are applicable to two-state devices
such as switches and lamps. (A switch is either open or closed; a lamp is
either on or off.)
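The requirement on the lamp can be written down as a function of three two-state variables even before any circuit is drawn. The Python sketch below is one possible specification under our own assumptions (True stands for a closed switch or a lit lamp, and the staircase behaviour is modelled by ‘exclusive or’); it is not claimed to be the circuit obtained in Chapter 4.

```python
from itertools import product


def lamp_is_on(landlord, tenant, hidden_closed):
    """Lamp state as: landlord XOR (hidden AND tenant).

    With the hidden switch open the tenant's switch is ignored; with it
    closed the two visible switches behave like a staircase circuit.
    """
    return landlord != (hidden_closed and tenant)


# Tabulate the lamp state for all eight settings of the three switches.
for landlord, tenant, hidden in product([False, True], repeat=3):
    print(landlord, tenant, hidden, lamp_is_on(landlord, tenant, hidden))
```

Reading the table confirms both requirements: when the hidden switch is open the lamp simply follows the landlord's switch, and when it is closed, changing either visible switch changes the state of the lamp.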
8. The Business Problem: We are, of course, not commenting about
socialism! Our interest is only in showing how Boolean algebras can help
streamline a jungle of clumsily written laws or statements. In fact Boole
(after whom Boolean algebras are named) originally called his theorems
laws of thought. We shall study this problem in applications of Boolean
algebras to logic. It can be argued that this problem can be done by
common sense. But actually, the logic that we shall study will not be significantly
different from common sense. In complicated problems, common sense may
bog down by itself, but not when it is aided by the machinery of Boolean
algebras.
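Since only four yes-or-no attributes of a business are involved, the three restrictions can also be tested on all 16 conceivable kinds of business. The Python sketch below does so; the variable names and the reading of each restriction as an implication are our own assumptions about how the laws are to be interpreted.

```python
from itertools import product


def implies(p, q):
    return (not p) or q


def allowed(import_license, essential, local, skilled):
    """Does a business with these four attributes satisfy all three laws?"""
    # (i) licensed or non-essential  =>  local and not skilled
    rule_i = implies(import_license or not essential, local and not skilled)
    # (ii) licensed or not local  =>  skilled and not essential
    rule_ii = implies(import_license or not local, skilled and not essential)
    # (iii) local personnel only with an import license
    rule_iii = implies(local, import_license)
    return rule_i and rule_ii and rule_iii


permitted = [kind for kind in product([False, True], repeat=4) if allowed(*kind)]
print(permitted)   # an empty list means no business is possible
```

Such an exhaustive check is feasible here only because there are so few attributes; the Boolean algebra approach shows why the laws are contradictory rather than merely that they are.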
9. The Stone Problem: Boolean algebras provide an appropriate tool
for handling two-state systems. In this problem, though, we have three
possible states for each part of the stone. It may be placed in the same
pan as the object to be weighed, or in the opposite pan or it may not be
used at all! Such devices are called ternary devices. Although Boolean
algebras are not the natural means for handling them, we shall club them
together because some of the ideas involved are the same.
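The three states of each part can be coded as +1 (placed in the pan opposite to the object), -1 (placed in the same pan as the object) and 0 (not used). The Python sketch below, with a function name of our own, checks whether a proposed way of cutting the stone works by running through all combinations of states. The parts 1, 3, 9 and 27 tried in the last lines are a well-known candidate which we supply as an assumption; it is not stated in the text above.

```python
from itertools import product


def weighable(parts, limit):
    """Can every whole weight from 1 to `limit` be balanced with these parts?"""
    reachable = set()
    for states in product((-1, 0, 1), repeat=len(parts)):
        # Net weight on the opposite pan, which is what the object must equal.
        reachable.add(sum(state * part for state, part in zip(states, parts)))
    return all(weight in reachable for weight in range(1, limit + 1))


print(weighable((1, 3, 9, 27), 40))
print(weighable((10, 10, 10, 10), 40))
```

With four parts there are only 3 × 3 × 3 × 3 = 81 combinations of states to examine, so the check itself is trivial; the real problem is to find the right parts.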
10. The Regions Problem: Let $a_n$ be the number of regions into which the
plane is decomposed by n straight lines, no two of which are parallel and
no three of which are concurrent. Without these restrictions, the number
of regions would go down and the problem would become much more
complicated, depending upon how many of the lines are parallel to a given
direction and how many sets of concurrent lines there are. The lines satis-
fying the given restrictions are said to be in general position, because,
although we shall not do it, it can be shown that given a finite collection of
straight lines at random, the probability that two of them will be parallel
or three of them will be concurrent is zero. (The argument resembles the
comment made in the continuous house problem in Section 1, that the case
of two houses being the same corresponds to points on a line and a line has
no area and so does not contribute to probability.)
For small values of n, $a_n$ can be found by actually drawing diagrams.
Thus we see $a_1 = 2$, $a_2 = 4$, $a_3 = 7$, $a_4 = 11$ and so on. We may also set
$a_0 = 1$ because with no line in the plane the whole plane is a single region.
If you are good at guessing (a highly rewarding ability in mathematics), you
would probably be able to guess, from these few values, a formula for $a_n$
in general. If your guess comes out to be correct for n = 5 and n = 6 (say)
you have reason to be convinced. But, once again, that is not a
mathematical proof. You will have to prove your guess for all n, say by induction.
But there is more to this. Guessing an answer and proving that it is in
fact correct is one thing. But finding it out is quite another. The latter is
always more desirable, whenever it is feasible. In the present problem it is
so. To do this, instead of trying to obtain a formula for 0,, directly in terms
of n, we express a. in terms of u,_,. Suppose the n lines are L,, L,, ...,
I...» L. (see Figure 1.5). Of these, the first n—l lines, namely, L» L,,...,
L._| have already decomposed the plane into a,_, regions. Some of these
regions will be divided into 2 each by the nth line, L... The difference
a,,-a,._l is therefore precisely the number of regions (out of u._l) through
which the line L,I passes. Because of our assumption, L. intersects each
one of the lines Ll, L,, ..., L...I in distinct points (which need not, however,
lie in the same order on the line L.). These n—l points of intersection cut
the-line L. into 71 parts; 2 unbounded and the remaining n—2 bounded.
Each of these parts obviously lies in a difierent region formed by the first
11—] lines. So the line L, passes through precisely In regions out of the a,,_1
regions formed by L., L,, ..., L._,. We thus get an important relationship,
a. = 11..-, + n for all n = l, 2, 3, .
(This is probably what you had guessed too.)
A relation like this is called a recurrence while. or a dlfl'erence equation.
To solve a recurrence relation means to find a formula for 1,, (as a
function of n) which will satisfy the relation subject to some initial conditions
(in this case, ”a = l is the initial condition.) Systematic methods for solv-
ing recurrence relations will be studied in Chapter 7. In the present problem,
though, there is an easy way to get the solution provided you know the
sum of the series 1 + 2 + ... + n.
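As a quick numerical check of the recurrence for the Regions Problem, the following sketch computes the number of regions step by step and compares it with the closed form obtained by summing 1 + 2 + ... + n. The function names are illustrative only, and the closed form is the one the summation suggests rather than something proved here.

```python
def regions_by_recurrence(n):
    """Number of regions for n lines in general position, using a_n = a_(n-1) + n."""
    a = 1  # a_0 = 1: with no line, the whole plane is one region
    for k in range(1, n + 1):
        a += k  # the k-th line passes through exactly k existing regions
    return a


def regions_closed_form(n):
    """Closed form obtained by summing the recurrence: 1 + (1 + 2 + ... + n)."""
    return 1 + n * (n + 1) // 2


print([regions_by_recurrence(n) for n in range(7)])  # [1, 2, 4, 7, 11, 16, 22]
```

Agreement on finitely many values is, of course, only evidence; the proof is the induction asked for in the exercises.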
11. The Shares Problem: This is another example on the use of recurrence
relations. We let $b_n$ be the number of shares held during the nth year. We see
that $b_1 = 1$, $b_2 = 1$, $b_3 = 2$ (because at the beginning of the third year, a bonus
share would come), $b_4 = 3$, $b_5 = 5$, $b_6 = 8$, $b_7 = 13$ and so on. We also
set $b_0 = 0$. We note that for every integer n ≥ 2, $b_n - b_{n-1}$ is simply
the number of bonus shares that have come at the beginning of the nth year.
But this is the same as the number of shares which have been in possession
for at least two years and hence equals $b_{n-2}$, the number of shares
possessed during the (n-2)th year. Thus we get the recurrence relation,
$b_n = b_{n-1} + b_{n-2}$
with the initial conditions $b_0 = 0$ and $b_1 = 1$. Unlike the last problem,
there is no easy way to solve this recurrence relation. The solution will
require the use of power series! Power series are associated with complex
analytic functions, which are the bastions of the classical, continuous
mathematics. The reader may rightly wonder what possible business they
have in discrete mathematics, in which there is no limiting process and
hence no room for concepts like differentiability. But there is really no
reason for surprise if we recall our earlier comment that even in finite
mathematics, an element of the infinite creeps in because of the fact that
the set of integers is infinite. And once infinity comes in, limiting process
is not far away. We shall see, in fact, in the seventh chapter that power
series are a powerful tool even for the discrete mathematician.
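Although a closed formula has to wait, the recurrence itself is easy to run. The sketch below simply iterates $b_n = b_{n-1} + b_{n-2}$ from the initial conditions $b_0 = 0$, $b_1 = 1$; the function name `shares` is an illustrative choice.

```python
def shares(n):
    """Return b_n, the number of shares held during the n-th year."""
    prev, curr = 0, 1  # b_0 and b_1, the initial conditions
    for _ in range(n):
        # slide the window one year forward: (b_k, b_(k+1)) -> (b_(k+1), b_(k+2))
        prev, curr = curr, prev + curr
    return prev


print([shares(n) for n in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]
```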
12. The Vendor Problem: What matters in this problem is the order in
which the customers approach the vendor. Each such order corresponds
to a sequence of 100 coins, of which 50 are of 1 rupee and 50 are of
2 rupees each. In the next chapter we shall see that there are in all
$$\frac{100!}{50!\,50!}$$
such sequences. This is, therefore, the total number of cases.
The crucial part now is to find the number of favourable cases, that is,
the number of those sequences in which at no time the number of 2 rupee
coins exceeds the number of 1 rupee coins. (If this happens, the vendor runs
out of change.) An unusually simple and elegant argument for this is due
to André and will be given in Chapter 2. But the standard method is to
use recurrence relations. For this we paraphrase the problem slightly.
Think of each 1 rupee coin as a left parenthesis, (, and each 2 rupee coin as
a right parenthesis, ). We denote by $a_n$ the number of ways to put n pairs of
parentheses in a balanced manner. It is easily seen that the number of
favourable cases in our original problem is $a_{50}$. It is not easy to find
$a_{50}$ directly (except by André's method). But we can write a recurrence
relation for $a_n$, solve it for all n and then substitute n = 50. This is in
fact the beauty of the method of recurrence relations. It is easier to obtain
the answer for all n than for a particular n.
Now for the recurrence relation itself. We are letting $a_n$ be the number
of balanced arrangements of n pairs of parentheses. In any such arrangement
the very first parenthesis must be a left one. There is a unique r ($1 \le r \le n$)
such that this first left parenthesis gets 'cancelled' after r pairs of
parentheses, as in Figure 1.6, where we indicate this mutually cancelling pair by
putting a sign X on their top. Now inside this pair of parentheses, there are
r-1 pairs of parentheses which must balance themselves. This can be done
in $a_{r-1}$ ways for r > 1. Moreover, after the r pairs of parentheses in our
original arrangement, the remaining n-r pairs must balance themselves.
This can be done in $a_{n-r}$ ways for r < n.
Figure 1.6: Parentheses of the Vendor Problem (the cancelling pair, marked X,
encloses r-1 pairs; these r pairs are followed by the remaining n-r pairs)
From the elementary counting techniques to be studied in the next chapter,
we shall get
$a_n = a_{n-1} + a_1 a_{n-2} + a_2 a_{n-3} + \cdots + a_{n-3} a_2 + a_{n-2} a_1 + a_{n-1}$
For the solution, we must wait till Chapter 7.
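The recurrence for the Vendor Problem can already be evaluated numerically, long before it is solved. The sketch below does so under one labelled assumption: we set $a_0 = 1$ (the empty arrangement counts once), which lets the two end terms $a_{n-1}$ be written as $a_0 a_{n-1}$ and $a_{n-1} a_0$. The function name is illustrative.

```python
from math import comb


def balanced_arrangements(n_max):
    """Return [a_0, a_1, ..., a_n_max] computed from the recurrence."""
    a = [1]  # assumption: a_0 = 1, so the end terms fit the same pattern
    for n in range(1, n_max + 1):
        # r = number of pairs up to and including the one that cancels
        # the very first left parenthesis
        a.append(sum(a[r - 1] * a[n - r] for r in range(1, n + 1)))
    return a


a = balanced_arrangements(50)
print(a[1:6])  # [1, 2, 5, 14, 42]

total_cases = comb(100, 50)  # 100! / (50! 50!), the total number of sequences
print(a[50] / total_cases)   # fraction of favourable cases
```

The numbers produced are the well-known Catalan numbers; the printed fraction gives the chance that the vendor never runs out of change.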
13. The Casino Problem: In all problems involving tosses of coins, a fair
coin (also called an unbiassed coin) means one for which the probability of
a head showing is 1/2. In the present problem, the outcome of the game
depends solely on the 'luck' of the player. The luckiest player can get a
reward having spent as little as 3 rupees, while the most unlucky player can
spend 10 rupees and still get no win. Good luck of a player of course spells
bad luck for the management! The problem here is to match the amount of
the reward with what an 'averagely' lucky player would spend. The trouble
is how to define average luck. In the present case we have to consider all
possible outcomes of the game, count the probability of each of them, add
up the total 'revenue' they generate and divide this amount equally among
the winners. Even then, there is no guarantee that in a particular day or a
particular week the management will wind up even. All we can say is that
this will be the case 'in the long run'.
The problem of counting the number of winners in this problem comes
under what is called ‘pattern recognition' and will be studied in Chapter 7 as
an application of recurrence relations. Whatever be the ethical objections to
applying mathematics to gambling, it is a fact that many developments in
probability theory owe their origin to gambling. Moreover, the ideas under-
lying gambling also figure in many serious applications where there is an ele-
ment of uncertainty. For example, the insurance companies are faced with a
similar problem when they have to decide how much premium should be
charged to cover a given risk.
14. The Capital Problem: Let the three cities be at the points A, B, C in
a plane. If A, B, C are collinear, the problem is trivial, because obviously
the capital should be built at the city which is in between the other two. So
suppose A, B, C are not collinear. We then look for a point P in the plane
spanned by A, B, C for which the function | PA | + | PB | + | PC | is
minimum, where | PA | denotes the length of the straight line segment from A to
P. This is a typical calculus problem and can be solved by the methods for
minimising a function of two variables. The answer comes out as follows:
(i) If the triangle ABC is obtuse angled with obtuse angle ≥ 120 degrees,
then P is the vertex of this obtuse angle; (ii) in all other cases, P is the unique
point (within the triangle ABC) at which all the three sides of the triangle
ABC subtend an angle of 120 degrees. Actually, we shall not be studying
problems of this type in this book. This one is given only so as to compare
it with the next problem.
15. The Head Office Problem: This problem differs from the last one in
that no new roads are to be built. We are looking for a point P from which
the sum of the distances to the cities along the shortest available paths is
minimum. Obviously the point P has to be either in one of the cities or on
some path between them. The latter possibility can be eliminated by a
simple argument which we leave as an exercise. This leaves only finitely many
possible candidates for P. Thus this problem belongs to discrete
mathematics unlike the last one which belongs to continuous mathematics. As with
many other problems in discrete mathematics, the crucial question here is
not the existence of a solution, but rather a method for finding it. Embodied
in this problem is the problem of finding the shortest distance between two
points in a map. This problem is of independent interest in itself and
several algorithms for it are known. Problems like this are studied in what is
called network analysis and in the chapter on graphs we shall make a brief
reference to it (see the Epilogue).
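Once the candidates for P are reduced to the cities themselves, the Head Office Problem becomes a finite search: compute the shortest road distance between every pair of cities and pick the city whose distances add up to the least. The sketch below uses the Floyd-Warshall method, one of several known shortest-path algorithms and not necessarily the one the book has in mind. The road network (cities numbered 0 to 3 and the road lengths) is invented purely for illustration.

```python
import math


def all_pairs_shortest(n, roads):
    """Floyd-Warshall: shortest road distance between every pair of n cities."""
    dist = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, length in roads:  # roads can be used in both directions
        dist[u][v] = min(dist[u][v], length)
        dist[v][u] = min(dist[v][u], length)
    for k in range(n):  # allow city k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def best_head_office(n, roads):
    """Return the city minimising the sum of shortest distances, and all sums."""
    dist = all_pairs_shortest(n, roads)
    totals = [sum(row) for row in dist]
    return min(range(n), key=totals.__getitem__), totals


# Hypothetical network: (city, city, road length)
roads = [(0, 1, 2), (1, 2, 3), (0, 2, 9), (2, 3, 4), (1, 3, 6)]
city, totals = best_head_office(4, roads)
print(city, totals)  # 1 [15, 11, 12, 18]
```

Note that the direct road of length 9 between cities 0 and 2 is never used, since the route through city 1 has length 5.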
16. The Little Travelling Salesman Problem: From the elementary
counting techniques we shall develop in the next chapter, the number of all
possible circular tours in which each city is visited exactly once will be n!.
But because the starting point on a tour could be any of the n cities, the
number of distinct tours is only (n-1)!. Further, it does not matter in
which sense the tour is completed. So the number of distinct tours comes
down to (n-1)!/2, which is still a fabulously large number even for relatively
small values of n, say n = 15 or 20.
The real problem now is to determine efficiently which of these tours is
the shortest. This is called the 'Travelling Salesman Problem' and is today
one of the most talked about problems in discrete mathematics. Once again,
the difficulty is not in showing the existence of the shortest tour. Because
there are only finitely many tours, one can always list them all down,
compute the length of each one of them and then take their minimum. But
because of the large number of tours, this is an exceedingly long procedure.
What people have been struggling for is an efficient algorithm, or else a
proof that no such algorithm exists. Now exactly, what is meant by
'efficient'? This is an important question and will be discussed in the chapter
on analysis of algorithms*. A rough idea can be given as follows. The time
taken by an algorithm to find the shortest tour of n cities will obviously be a
function of n and will increase as n increases. It is the qualitative rate of
growth of this function that will be used as a yardstick of efficiency of the
algorithm. Rate of growth of a function is a concept from continuous
mathematics. So once again, in the chapter on analysis of algorithms it will
not be surprising if methods from continuous mathematics come in.
The Little Travelling Salesman Problem given here is meant only to
give a little idea of the difficulties involved. A layman is most apt to start
the shortest tour with a pair of cities which are closest to each other. But
even for n = 4 this may fail as shown in Figure 1.7 where the triangle ABC
is isosceles and right angled at A and P lies on the altitude through A and
very close to the foot of the perpendicular. If the cities are P, A, B and C
then | PA | is clearly the shortest inter-city distance but the shortest tour is
P - C - A - B - P.
Figure 1.7: The Little Travelling Salesman Problem for n = 4.
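For four cities the 'list all tours' procedure is perfectly practical, and it confirms the claim about Figure 1.7. The sketch below assumes specific coordinates, which are not taken from the figure: A = (0, 1), B = (-1, 0), C = (1, 0), so that ABC is isosceles and right angled at A, and P = (0, 0.1), close to the foot of the altitude.

```python
from itertools import permutations
from math import dist

# Assumed coordinates for the configuration described in the text
cities = {"A": (0.0, 1.0), "B": (-1.0, 0.0), "C": (1.0, 0.0), "P": (0.0, 0.1)}


def tour_length(order):
    """Length of the closed tour visiting the cities in the given order."""
    n = len(order)
    return sum(dist(cities[order[i]], cities[order[(i + 1) % n]]) for i in range(n))


def shortest_tour(start="P"):
    """Brute force: fix the starting city and try every order of the others."""
    others = [c for c in cities if c != start]
    return min(((start,) + p for p in permutations(others)), key=tour_length)


best = shortest_tour()
print(best, round(tour_length(best), 3))
print(round(dist(cities["P"], cities["A"]), 3))  # the shortest inter-city distance
```

With these coordinates the tour found is P - B - A - C - P, which is the text's tour P - C - A - B - P traversed in the opposite sense. It does not use the road PA at all, even though PA is the shortest inter-city distance.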
*See the Epilogue.
17. The Diet Problem: This is yet another minimisation problem. Let
x, y denote the amounts (in kilograms) of the cereals A, B in a monthly
diet. Then the cost of the diet is 10x + 3y = f(x, y) (say). The problem is
to minimise f as a function of x and y. The variables x, y are continuous at
least in theory (although, in practice, you can't ask a grocer to give you
exactly, say, $2 + \sqrt{\pi}$ kilograms of rice!). The function f has continuous
partial derivatives of all orders. So this looks like a typical calculus problem
of finding the minimum of a function of several variables. However, if we
apply the calculus method we see that $\partial f/\partial x = 10$ and
$\partial f/\partial y = 3$ at all points. So the function f has no critical
points. The reason for failure is that the conditions of the problem put
certain constraints on the variables x and y. First
of all, x and y have to be non-negative, because a person cannot eat a
negative amount of cereal. The requirement about the protein intake
amounts to the inequality $40x + 60y \ge 300$ or equivalently, $2x + 3y \ge 15$.
Similarly the fat requirement is $8x + 3y \le 50$. As for the last requirement,
we have $\frac{x}{x+y} \ge \frac{3}{10}$, or in other words, $3y - 7x \le 0$.
The problem is therefore to minimise the function $f(x, y) = 10x + 3y$
subject to the constraints:
(i) $x \ge 0$, (ii) $y \ge 0$, (iii) $2x + 3y \ge 15$, (iv) $8x + 3y \le 50$, and
(v) $3y - 7x \le 0$.
Let D be the set of all points (x, y) in the plane satisfying all these five
inequalities. It is easy to sketch D. From elementary coordinate geometry
we know that a line with equation of the form ax + by = c decomposes
the plane into two half-planes of which it is the common boundary. In one
of the half-planes the inequality $ax + by \ge c$ holds while in the other, the
inequality $ax + by \le c$ holds. So to sketch D, we draw the lines 2x + 3y = 15,
8x + 3y = 50, and 3y - 7x = 0 and see by inspection that D is the triangle
shown in Figure 1.8. D is called the feasible region because its points
correspond to situations in which all the constraints are satisfied. In this case,
since D is non-empty it follows that the person can indeed have at least one
possible diet. To find the cheapest such diet amounts to minimising f over
the domain D in the x-y plane. Again the well-known calculus method of
Lagrange multipliers fails because here the set D is not the set of points
where some function g takes a constant value. The function f to be minimised
here is actually a very simple function called a linear function, a concept we
shall study in Chapter 6. The constraints in the problem are of the form of
what are called linear inequalities. We shall study problems of this kind in
a chapter* called linear programming. It turns out that many real-life
problems of optimisation can be paraphrased as linear programming problems.
Calculus methods are not quite adequate in solving them. The solution has
to be found by solving certain systems of linear equations.
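To see how 'solving certain systems of linear equations' can produce the answer, here is a minimal sketch for the Diet Problem. It relies on a standard fact that is not established in this section: a linear function on a bounded polygonal region attains its minimum at a corner. So we intersect the boundary lines in pairs (each pair is a 2 by 2 linear system), keep the intersection points that satisfy all five constraints, and compare the cost at those corners. The representation of constraints as triples is an illustrative choice.

```python
from fractions import Fraction
from itertools import combinations

# Each triple (a, b, c) stands for the constraint a*x + b*y >= c.
constraints = [
    (1, 0, 0),       # (i)   x >= 0
    (0, 1, 0),       # (ii)  y >= 0
    (2, 3, 15),      # (iii) 2x + 3y >= 15
    (-8, -3, -50),   # (iv)  8x + 3y <= 50
    (7, -3, 0),      # (v)   3y - 7x <= 0
]


def cost(x, y):
    return 10 * x + 3 * y


def corners():
    """Corners of the feasible region D, found by intersecting boundary lines."""
    points = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundary lines never meet
        x = Fraction(c1 * b2 - c2 * b1, det)  # Cramer's rule
        y = Fraction(a1 * c2 - a2 * c1, det)
        if all(a * x + b * y >= c for a, b, c in constraints):
            points.add((x, y))
    return points


best = min(corners(), key=lambda p: cost(*p))
print(best, cost(*best))  # corner (5/3, 35/9) with cost 85/3
```

Exact fractions are used so that points lying on a boundary line are not rejected by rounding error.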
*See the Epilogue.
Figure 1.8: The Diet Problem (the triangle D is the feasible region)
18. The Cattle Problem: This problem is of the same spirit as the last
one. If we denote by x and y the numbers of cows and buffaloes respectively,
then the problem is to minimise the function 2200x + 3500y subject to the
constraints
(i) $x \ge 0$, (ii) $y \ge 0$, (iii) $3x + 5y \ge 300$, (iv) $5x + 7y \ge 450$, and
(v) $x + (1.5)y \le 100$.
Here too the function to be minimised as well as all inequalities are linear.
But there is an important difference. The variables x, y in this problem can
assume only integral values. This problem therefore comes under what is
called 'integer programming'. One can draw the 'feasible region' thinking
of x, y as continuous variables. However, it is only the integral points (i.e.
points both of whose coordinates are integers) in the feasible region that we are
interested in. Consequently the methods needed are substantially different
and beyond our scope. We shall, therefore, make only a passing reference
to problems of integer programming.
We have now given the reader a glimpse of the type of the things that
will be done in subsequent chapters. He is advised to bear these problems in
mind and to read the comments about them again when they are referred to
later in this book.
Exercises
3.1 Using the second principle of mathematical induction, prove that if
there are n participants in a knock-out tournament, then n - 1 matches
will be needed to find the champion. (In this method, you prove the
assertion for n = 1 and to establish its truth for n = k, you are
allowed to assume it is true for all n < k and not just for n = k - 1
as is the case with the first principle of mathematical induction. In
the present problem, the cases of even n and odd n require slightly
different, but essentially similar arguments.)
3.2 Do the Envelopes Problem by hand for n = 2, 3 and 4.
3.3 Do the Postage Problem by hand and give a complete list of the
distinct combinations of stamps.
3.4 Do the Postage Problem, with the additional restriction that at
least two and at most four stamps of each denomination must be
used.
3.5 Prove that there are infinitely many prime numbers.
3.6 Do the Business Problem by ‘common sense’.
3.7 Sum the series 1 + 2 +...+ n.
3.8 Using the answer to the last exercise, do the Regions Problem.
(Hint: Sum both the sides of the recurrence relation from 1 to n.
Most terms will cancel.)
3.9 The numbers $b_n$ in the Shares Problem are popularly known as
Fibonacci numbers and are more standardly denoted by $F_n$. Although
we have not yet found a closed expression for $F_n$, a number of
properties about them can be proved without it by induction, simply
from the recurrence relation $F_n + F_{n+1} = F_{n+2}$. For example, prove
the following results, in which n, m, p, q are positive integers.
(a) $F_{m+n} = F_m F_{n+1} + F_{m-1} F_n$.
(Hint: Use induction on m.)
(b) $F_{n+1} F_{n-1} - F_n^2 = (-1)^n$.
(c) If p divides q then $F_p$ divides $F_q$.
(Hint: Use (a) and induction on the ratio q/p.)
(d) $F_n^2 + F_{n+1}^2$ is also a Fibonacci number.
*3.10 Do the Capital Problem.
3.11 Prove that in the Head Office Problem, we may as well suppose that
the head office is located at one of the cities. In other words, if Q
is a point on a road joining city X to city Y (say) such that the sum
of the shortest distances from Q to all the cities is the minimum
possible, then either Q is X or Y or else it does not matter where
on the road Q lies so that we might set Q = X or Q = Y
arbitrarily.
(Hint: The shortest possible path from Q to a given city must
pass either through X or through Y. Suppose m such paths pass
through X and n through Y. Consider the cases m > n, m < n and
m = n.)
3.12 Verify that in Figure 1.7, the shortest tour is in fact
P — C — A — B — P.
More generally, prove that this holds as long as P lies on the
altitude through A and its distance from the mid-point of BC does not
exceed $\frac{9 - 4\sqrt{2}}{14}\,|BC|$.
3.13 Sketch the ‘feasible region’ for the Cattle Problem.
Notes and Guide to Literature
In this section we have touched many topics. References to them will be
cited when they are more fully discussed later in this book. As a general
reference we cite the various volumes, and especially the first volume,
of Knuth's book, 'The Art of Computer Programming', where the
reader will find an encyclopaedic collection of known results on almost
anything even remotely related to discrete structures and also numerous
historical and bibliographical references. For topics in algebra we
recommend Herstein's book of the same title, with its highly readable style and
collection of problems. The book of Dornhoff and Hohn is a recent one
and treats algebra from the point of view of applications. Standard books
on combinatorics include Tucker [1], Liu [1], Krishnamurthy [1], M. Hall
[1], Riordan [1]. Although titled 'Combinatorial Mathematics for
Recreation', Vilenkin's [1] book goes fairly deep into the theory and has a
charming variety of problems.
Euclid (300 B.C.) is, of course, more famous for his treatise on geo-
metry. His proof of the fact that there are infinitely many primes is to this
date considered one of the best mathematical arguments.
The Fibonacci numbers appear in many apparently unrelated contexts.
In nature, certain lower organisms grow much like the shares in the Shares
Problem (if we think of a bonus share as an offspring). So, some of the
lower Fibonacci numbers appear frequently in nature, as the number of
petals in a flower or the number of scales on a snake's skin. There is a
huge literature on the Fibonacci numbers. There is even a journal, the
Fibonacci Quarterly which is devoted to research about them.
4. Review of Logic
The prerequisites needed for most of this book will be minimal. All the
chapters up to the sixth will be independent of calculus, except perhaps for
some examples (which may be omitted without loss of continuity). Of
course, as noted before, calculus will help the discerning reader compare the
various concepts in discrete mathematics with their analogues (if any) in
continuous mathematics. Serious use of calculus will be made only in the
seventh chapter. There we shall need some facts about sequences and series,
about complex analytic functions and about some special functions (such as
logarithm and exponential functions). These will be reviewed as and when
they are needed. An exposure to vector algebra and matrices will help in
Chapters 6 and 7. But these too are not stringent prerequisites as the
treatment will be largely self-contained.
What is needed more crucially is not a bundle of theorems in
mathematics but rather a certain frame of mind. This can be developed through
a working knowledge of mathematical logic and of set theory. We proceed
to review logic here. (Set theory will be discussed in the next chapter.)
Before doing so, we emphasise that the purpose here is not to give a bunch
of definitions and theorems but rather to inculcate a certain discipline of
thought. What counts is the spirit and not the technicalities.
The kind of logic that is used in mathematics is called deductive logic as
opposed to the inductive logic which is used in experimental science or in
everyday life. In the latter we generalise from a part to the whole, a com-
monly cited example being that since every dog around us barks, we conclude
that all dogs bark. As was made clear in the comments about the Tourna-
ments Problem, this type of reasoning cannot be used in mathematical proofs
(although, undoubtedly, it has some illustrative value). Mathematical
arguments must be strictly deductive in nature. This means the truth of the
statement to be proved must be established assuming the truth of some
other statements. For example, in geometry we deduce the statement that
the sum of the three angles of a triangle is 180 degrees from the statement
that an external angle of a triangle equals the sum of the other two angles
of the triangle. Of course, this latter statement has to be deduced from
some other statements. Eventually a stage will arise where the truth of the
statement to be proved cannot be deduced from that of some others. Such
statements are called axioms or postulates. Their truth is an article of faith.
As long as there is no inherent contradiction among themselves, any system
of statements may be taken as axioms and theorems may be deduced from
them. If the theory so developed is to have any ‘practical‘ applications, the
axioms must obviously conform to our experience (although, as noted
earlier, experience cannot provide a proof for them). As our experience and the
purpose of applications vary, we may modify the axioms and develop new
theories. For example, one of the axioms in the classical euclidean geo-
metry is that through any pair of distinct points there passes one and only
one line. When the surface of the earth was regarded as a planar one,
this was certainly consistent with experience, with the usual interpretation
of line as the shortest distance path. But with a spherical earth this is no
longer true for points which are diametrically opposite. This has led to
development of new geometries called Riemannian geometries whose axioms
are different from those of the euclidean geometry.
The point is that whatever be the axiom scheme, the rules of deducing
theorems from them do not change. In the chapter on Boolean algebras we
shall study these rules under 'valid arguments'. Suffice it to say here they all
conform to our common sense and we in fact use them almost instinctively
in everyday life. By way of illustration we state two such rules here. In
stating each of them, first we list a few statements, called the premises,
then draw a line and write another statement called the conclusion. Each
rule asserts that whenever the premises hold true so does the conclusion.
Rule 1: If it rains, the streets get wet.
        It rains.
        --------------------------------
        The streets get wet.
Rule 2: All men are mortal.
        Socrates is a man.
        --------------------------------
        Socrates is mortal.
Neither of these sounds very bright. But actually, the proofs of even the
most profound theorems* in mathematics consist of chains of such tiny bits of
reasoning. Genius is needed not for these individual bits but for combining
them suitably. Indeed, as we remarked in the last section, the logic we
shall study will not be significantly different from common sense. We,
therefore, conclude our preliminaries about logic by emphasising only those
points where a layman (or a person familiar only with the conventional,
computational mathematics) may experience a little difficulty.
*By the way, not all results proved in mathematics are called 'theorems'. Many of
them are called 'propositions', 'lemmas' or 'corollaries'. From a strict deductive point
of view, there is no difference among them. The distinction rests on some extrinsic
aspects such as utility, depth and beauty. A lemma is useful in a limited context (often
only as a preparatory step for some theorem) and is too technical to have an aesthetic
appeal. A proposition is like a mini-theorem. A 'true' theorem carries with it some
depth and a certain succinctness of form and often represents the culmination of some
coherent piece of work, while a corollary is like an outgrowth of a theorem.
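The statement that 'whenever the premises hold true so does the conclusion' can be checked mechanically for a rule of the shape of Rule 1. The sketch below is only an illustration under a labelled assumption: it reads 'if p then q' as the truth function '(not p) or q', a convention the formal treatment of valid arguments is deferred to the chapter on Boolean algebras to justify. The helper names are hypothetical.

```python
from itertools import product


def implies(p, q):
    """'If p then q', read as a truth function (assumption: material implication)."""
    return (not p) or q


def is_valid(premises, conclusion, n_vars):
    """An argument is valid if the conclusion is true in every case
    in which all the premises are true."""
    return all(
        conclusion(*values)
        for values in product([False, True], repeat=n_vars)
        if all(premise(*values) for premise in premises)
    )


# Rule 1 with p = 'it rains' and q = 'the streets get wet'
rule_1 = is_valid([lambda p, q: implies(p, q), lambda p, q: p], lambda p, q: q, 2)

# A tempting but invalid variant: from 'if p then q' and q, conclude p
converse = is_valid([lambda p, q: implies(p, q), lambda p, q: q], lambda p, q: p, 2)

print(rule_1, converse)  # True False
```

The second check fails because the streets may be wet for some other reason; the case p false, q true makes both premises true and the conclusion false. Rule 2 speaks about a whole class and is not covered by this simple table.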
1. Bivalued Logic: The kind of logic we shall use will be bi-valued in
that, for every statement there will be only two possibilities, either ‘true' or
'false'. Thus every statement is either true, or false but not both, regard-
less of whether or not we have a way of knowing which way it is. (It is this
very point on which constructivists differ. But we shall not follow them.)
There are many statements which at present are not known to be either
true or false. A well-known example of this is Goldbach's conjecture
which states that every even integer greater than 2 can be expressed as a
sum of two prime numbers. Although this has been verified for an impres-
sive range of cases, nobody has proved it for all cases. Nor has it been
disproved, that is, nobody has so far discovered even a single even
integer greater than 2 which cannot be expressed as a sum of two primes.
Still, even today, the statement is either true or false, even though it
may take years to find out which way it is; much the same way as a miss-
ing person is either alive or dead at a particular time even though at that
time there may be no way to find out which possibility holds.
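The kind of verification mentioned for Goldbach's conjecture is easy to reproduce on a small scale. The sketch below searches, for each even integer from 4 up to an arbitrarily chosen limit of 2000, for a way to write it as a sum of two primes. The function names and the limit are illustrative choices.

```python
def is_prime(n):
    """Trial division; adequate for the small range used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def goldbach_pair(n):
    """Return primes (p, q) with p + q = n and p <= q, or None if there are none."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return p, n - p
    return None


limit = 2000  # arbitrary cut-off for this illustration
failures = [n for n in range(4, limit + 1, 2) if goldbach_pair(n) is None]
print(goldbach_pair(100), failures)  # (3, 97) []
```

An empty list of failures settles nothing about the conjecture itself: as stressed earlier, checking cases is not a proof, while a single failure would have been a disproof.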
Another point to note about bi-valuedness is that there is nothing in
between 'true' and 'false'. 'Truth' means complete and absolute truth, without
qualification. There is no such thing as 'very true', 'almost true',
'substantially true', 'partially true', or 'having an element of truth', although we
commonly use such phrases in practice. The reason we use such expressions is that
the statement involves, directly or indirectly, some quantity and the degree
of the truth of the statement is measured by how close the actual quantity is to
the quantity implicit in the statement. For example, take a simple statement,
'John is tall'. In this statement, some standard of tallness is implicit, which
may, of course, depend upon the context. Let us suppose, for instance, that
this standard is a minimum height of 180 centimeters. Now if John's height
is, say, 150, 160, 170, 175, 179, 180, 183, 186 and 190 centimeters then we
would probably describe the statement respectively as grossly false, false,
with a shade of truth, substantially true, almost true, true, quite true, very
true and an understatement! From a mathematical point of view, however,
the statement is false ('equally false') in the first five cases and true ('equally
true') in the remaining four.
2. Statements about a Class: The remarks about quantification of truth
also apply to statements made about a class. The statement 'All rich men
are happy' is about the class of all rich men. Goldbach's conjecture above
is a statement about the class of all even integers greater than 2. A layman is
apt to regard these statements as true (or at least as 'nearly true') when they
hold in a large number of cases. Even if there are a few exceptions, he is
likely to ignore them and say 'The exception proves the rule!'. In
mathematics, this is not so. Even a single exceptional case (a counter-example, as
it is called) renders false a statement about a class. Thus even one unhappy
rich man makes the statement 'All rich men are happy' as false as millions
of such men would do. In other words, in mathematics we interpret the
words 'all' and 'every' quite literally, not allowing even a single exception.
If we want to make a true statement after taking the exceptional cases
into account, we would have to make a different statement such as, 'All rich
men other than Mr. X are happy’ or ‘At least ninety per cent rich men are
happy' and so on. But loose expressions such as ‘almost all’, ‘all except a
few’, ‘a great many’ cannot be used in mathematical statements, unless, of
course, they have been precisely defined earlier. (There is, indeed, one such
common interpretation given to the phrase ‘almost all’. When used in con-
nection with statements about an infinite class, ‘almost all' means ‘all with
the exception of finitely many'. For example the statement 'Almost all prime
numbers are odd' is true because there is only one even prime (namely 2) while
the statement ‘Almost all positive integers can be expressed as a sum of three
perfect squares’ is false because one can construct infinitely many positive
integers which are not so expressible. Phrases like ‘almost everywhere’ are
also used with a very precise meaning in certain contexts. But we shall not
study them. We only remark that the expression ‘lines in general position’
used in the comments on the Regions Problem is of this type.)
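The claim about sums of three perfect squares can be explored by direct search. A perfect square leaves remainder 0, 1 or 4 on division by 8, and no three of these remainders add up to 7 or 15, so no integer of the form 8k + 7 is a sum of three squares; this already gives infinitely many exceptions. The sketch below assumes that 0 counts as a perfect square (an assumption, since the text does not say), and the function name is illustrative.

```python
from math import isqrt


def is_sum_of_three_squares(n):
    """True if n = a*a + b*b + c*c for some integers 0 <= a <= b (0 is allowed)."""
    for a in range(isqrt(n) + 1):
        for b in range(a, isqrt(n - a * a) + 1):
            rest = n - a * a - b * b  # non-negative because b*b <= n - a*a
            c = isqrt(rest)
            if c * c == rest:
                return True
    return False


exceptions = [n for n in range(1, 51) if not is_sum_of_three_squares(n)]
print(exceptions)  # [7, 15, 23, 28, 31, 39, 47]
```

Note that 28 also appears although it is not of the form 8k + 7, so the remainder argument does not describe all the exceptions; it only shows that there are infinitely many of them.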
There is another type of statements made about a class. They do not
assert that something holds for all elements of the class but that it holds for
some, or at least one element. Take for example the statements, ‘There is
(or exists) a man who is eight feet tall', or '200 is divisible by some power
of 2'. These statements refer respectively to the class of all men and to the
class of all powers of 2. In each case, the statement says that there is at
least one member of the class having a certain property. It does not say
how many such members there are. Nor does it say which ones they are.
Thus the first statement tells us nothing by way of the name and the address
of the eight-footer and the second one does not say which power of 2 divides
200. These statements are, therefore, not as strong as, respectively, the
statements, say, 'Mr. X in Bombay is eight feet tall' or '$2^2$ divides 200', which
are very specific. A statement which merely asserts the existence of some-
thing without naming it or without giving any method for finding it is called
an existence statement. In Section 2 we already saw an existence theorem.
Because of our commitment to bi-valued logic we have to accept the possi-
bility of an existence statement being true even when it is not specific. (Once
again, this is a point of difference with constructivist logic.)
Note, incidentally, that we make no distinction between the statements
‘All men are mortal‘ and ‘Every man is mortal.’ They convey exactly the
same meaning and hence are logically equivalent in the sense that either both
are true together or else both are false. In general, we shall not distinguish
between statements which are paraphrases of each other. Thus, ‘I own this
house' will be taken the same as ‘This house belongs to me’. Whenever there
is a subtle mathematical difference between two statements which are likely
to be taken to mean the same in practice, it will be pointed out.
3. Negation of a Statement: The bi-valuedness of logic also has some
repercussions on negations of statements. Formally, the negation of a
statement is a statement which is true precisely when the original statement is
false and vice-versa. The simplest way to negate a statement is to precede it
with the phrase ‘It is not the case that ...’. Thus, the negation of ‘Ram is
rich’ is ‘It is not the case that Ram is rich’. But this is too mechanical and
is not very useful either. We therefore paraphrase this as ‘Ram is not rich'
or even as ‘Ram is poor’ provided we agree that ‘rich' and ‘poor’ are
antonyms, that is, words of opposite meaning. If a statement is denoted by
some symbol p, its negation is denoted by p’, ¬p or ~p and read as ‘not
p’. Where symbols are used in writing a statement, the negation is written
by putting a slash (/) over the symbol which incorporates the principal verb
of the statement. Thus, the negation of ‘x = y’ is ‘x ≠ y’.
Now, if ‘poor’ is the opposite of “rich’, the opposite of ‘very rich‘ should
be ‘very poor'. Thus we are apt to negate a statement, ‘Ram is very rich'
as ‘Ram is very poor.’ But this is incorrect. There are various degrees of
richness ranging from the very rich to the very poor. The original statement
is about Ram's degree of richness and its logical negation simply says that
he lacks the very high degree. But this does not mean that he is necessarily
at the other end. Perhaps he is just average. So the correct negation of
‘Ram is very rich’ is ‘Ram is not very rich’. In other words, negation should
not be confused with antithesis, when there is a whole spectrum of other
possibilities. The logical negation of ‘This ball is black’ is ‘This ball is not
black’ and not ‘This ball is white’. Similarly, the negation of ‘The book is
on the table’ is 'The book is not on the table’ and not ‘The book is below
the table‘.
Similar considerations apply when we deal with statements like ‘All
men are mortal'. Its negation is not ‘All men are immortal’. In view of the
comments we made about the truth of statements about a class, it is false,
even when it fails to hold just in one case, that is, when there is even one
man who is not mortal. So the correct logical negation is ‘There is (or exists)
an immortal man’. Not surprisingly, the negation of an existence statement
is a statement asserting that every member of the class (to which the exis-
tence statement refers) fails to have the property asserted by the existence
statement. Thus, the negation of 'There exists a rich man‘ is ‘No man is
rich’ or equivalently, ‘Every man is poor’. If we keep in mind these simple
facts, we can almost mechanically write down the negation of any complicated
statement. Note also that the double negation (that is, the negation of the
negation) of a statement is logically equivalent to the original statement.
In symbols (p’)’ is equivalent to p.
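The rules for negating statements about a class can be checked mechanically on a small finite class. The following is only a sketch: the class (the integers 1 to 20) and the property ‘is even’ are our own hypothetical choices, not examples from the text.

```python
# Hypothetical finite class: the integers 1..20 stand in for its members.
members = range(1, 21)


def is_even(n: int) -> bool:
    return n % 2 == 0


# 'Every member is even' versus 'There exists a member which is not even'.
every_even = all(is_even(n) for n in members)
some_not_even = any(not is_even(n) for n in members)

# 'There exists an even member' versus 'Every member fails to be even'.
some_even = any(is_even(n) for n in members)
every_not_even = all(not is_even(n) for n in members)

# Each statement is true precisely when its proposed negation is false.
negation_of_universal_ok = (not every_even) == some_not_even
negation_of_existential_ok = (not some_even) == every_not_even

# Double negation: (p')' has the same truth value as p.
double_negation_ok = all((not (not p)) == p for p in (True, False))
```

Note that the negation of the universal statement is an existence statement and vice versa; neither is obtained by simply replacing the property with its opposite.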
4. Vacuous Truth: An interesting (and often confusing to a beginner)
point arises while dealing with statements about a class. A class which
contains no elements at all is called a vacuous or empty or null class. For
example the class of all four-legged men is empty because no man has four
legs. In the House Problem, if there is only one house then the class
of pairs of distinct houses is empty. We can continue this list much longer.
But now consider a statement, ‘Every four-legged man is happy‘. Is it true
or false? We cannot call it meaningless. It has as definite a meaning as the
statement ‘Every rich man is happy‘. We may call the statement useless, but
that does not debar it from being true or false. Which way is it then? Here
the reasoning goes as follows. Because of bi-valued logic, the statement
‘Every four-legged man is happy’ has to be either true or false (but not both).
If it is false then its negation is true. But the negation is the statement
‘There exists a four-legged man who is not happy‘. But this statement can
never be true because there exists no four-legged man whatsoever (the
question of his being happy or unhappy not arising at all). So the negation
has to be false and hence the original statement is true! A layman may
hesitate in accepting this reasoning and we give some recognition to his
hesitation by calling such statements as vacuously true, meaning thereby
that they are true because there cannot be anything to render them false.
Of course, from a logical point of view truth has no further qualification
and so a vacuously true statement is just as true as a statement whose truth
has been established by long, hard work. Note, by the way, that the statement
‘Every four-legged man is unhappy' is also true (albeit vacuously). There
is no contradiction here because the statements ‘Every four-legged man is
happy’ and ‘Every four-legged man is unhappy’ are not the negations of
each other.
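Programming languages usually adopt the same convention. As a small illustration (a sketch in Python, with names of our own choosing), the built-in functions `all` and `any` applied to an empty class behave exactly as the reasoning above demands.

```python
# The class of all four-legged men, modelled as an empty list.
four_legged_men = []


def is_happy(person) -> bool:
    # Never called: there is no member of the class to test.
    raise AssertionError("the class is empty")


# 'Every four-legged man is happy' and 'Every four-legged man is unhappy'.
every_one_happy = all(is_happy(m) for m in four_legged_men)
every_one_unhappy = all(not is_happy(m) for m in four_legged_men)

# The negation of the first: 'There exists a four-legged man who is not happy'.
exists_unhappy = any(not is_happy(m) for m in four_legged_men)
```

Both universal statements come out true, and the existence statement comes out false, because no member exists to render the former false or the latter true.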
What is the use of vacuously true statements? Certainly, no mathemati-
cian goes on proving theorems which are known to be vacuously true. But
such statements sometimes arise as special cases of a more general problem.
For example, suppose we want to show that under certain conditions,
given n lines in a plane are in general position (which means that no two of
them are parallel and no three are concurrent). If we try to do this by
induction on n, we see that the starting step, namely the case n = 1, is
vacuously true.
5. Conjunction and Disjunction: The conjunction of two statements is
obtained by putting the word ‘and’ between them. It is true when both the
statements are true and false otherwise. There is absolutely no restriction on
the two statements. One can even form the conjunction of a statement and
its negation. Of course, such a conjunction will always be false.
The disjunction of two statements is obtained by putting ‘or’ between
them. It is true when at least one of them is true and false otherwise. The
only point to stress about it is the meaning of ‘or’. In practice we often use
it to mean either one but not both of the possibilities. This is called the
exclusive use of ‘or’. For example, ‘I shall spend my vacation in Bombay
or in Pune', is often taken to mean that the vacation is to be spent either in
Bombay or in Pune but not in both. Indeed, sometimes the very nature of
the two possibilities is that they cannot hold simultaneously. For example,
‘Either the book is on this shelf or else it is stolen’. If we want to indicate
that both the possibilities can hold simultaneously we add the words 'or
both' in practice. For example, ‘A person with such handwriting must be a
doctor or a crook or both'. In mathematics, however, it is unnecessary to
add ‘or both’, because the word ‘or’ is always used in the inclusive sense,
that is, so as to include the possibility of both the statements holding true.
This is, of course, consistent with the practice of the disjunction being true
when at least one (perhaps both) of the statements is true. If we want to
use ‘or’ in the exclusive sense in mathematics, we can do so only by speci-
fying ‘but not both’. Thus, “Either x divides y or y divides x but not
both.’
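The three connectives just described can be tabulated for all four combinations of truth values. This is a minimal sketch; the function names are ours and are chosen only for illustration.

```python
from itertools import product


def conjunction(p: bool, q: bool) -> bool:
    # True only when both statements are true.
    return p and q


def disjunction(p: bool, q: bool) -> bool:
    # Inclusive 'or': true when at least one statement is true.
    return p or q


def exclusive_or(p: bool, q: bool) -> bool:
    # 'p or q but not both': true when exactly one statement is true.
    return p != q


print("p      q      and    or     xor")
for p, q in product([True, False], repeat=2):
    row = (p, q, conjunction(p, q), disjunction(p, q), exclusive_or(p, q))
    print("  ".join(f"{str(value):<5}" for value in row))

# A statement conjoined with its own negation is always false.
contradiction_always_false = all(not conjunction(p, not p) for p in (True, False))
```

The inclusive and the exclusive ‘or’ differ only in the row where both statements are true.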
6. Implication Statements: These are the most frequently occurring state-
ments in mathematics and so deserve a careful study. If p and q are two
statements then by p -> q (or p => q) we denote the implication statement
‘p implies q’ or ‘If p then q’. The layman's interpretation of this is ‘Whenever
p holds q holds’ or ‘the truth of p forces the truth of q’. For example, if p
is the statement ‘It rains’ and q the statement ‘The streets get wet’ then
p -> q reads ‘If it rains then the streets get wet’ and means that whenever it
rains the streets must get wet. (Even when it does not rain the streets may
get wet for some other reasons but the implication statement is not saying
anything as to what happens when it does not rain.) Many statements about
a class can be expressed in an implication form so as to convey the same
meaning. For example, the statement ‘All rich men are happy‘ is equivalent
to ‘If a man is rich then he is happy’. In an implication statement ‘p -> q’
the statements p and q are called, respectively, the hypothesis and the con-
clusion.
It is interesting to note that the statement ‘p -> q’ is logically equivalent
to the implication statement q’ -> p’, called its contrapositive. We use this
equivalence frequently in the reductio-ad-absurdum argument. Instead of
showing directly that the truth of p implies that of q, we show that if q fails then
p cannot hold. In law, the defence of alibi is actually an instance of this.
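The equivalence of an implication with its contrapositive can be confirmed by going through all combinations of truth values. The sketch below assumes the usual truth-functional reading of ‘p -> q’ (false only when p is true and q is false); this reading is an assumption on our part, since the text has so far described implication only in words.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    # Assumed truth-functional reading: 'p -> q' fails only when
    # p holds and q does not.
    return (not p) or q


# Compare 'p -> q' with its contrapositive 'not q -> not p' in every case.
contrapositive_equivalent = all(
    implies(p, q) == implies(not q, not p)
    for p, q in product([True, False], repeat=2)
)
```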
The mathematical interpretation of an implication statement is the same
as a layman’s. But a few points need to be stressed. First, as noted above,
the implication statement ‘p -> q’ is completely silent as to what happens
if p does not hold. In particular it does not say that if p fails then q must
fail, although, in practice we often attach this extra meaning to it. For
example if a person says ‘If Monday is a holiday, I shall come’, we normally
take this to mean that he would come if Monday is a holiday and also that
he would not come if Monday is not a holiday. In mathematics, this extra
meaning is never implied. If it is also intended, it has to be expressed by a
separate implication statement, namely, p’ -> q’, or equivalently by its
contrapositive q -> p. The familiar name for this new statement is the
converse of the original implication statement ‘p -> q’. Note that the
hypothesis of the original statement is the conclusion of the converse and
vice-versa. The truth of an implication statement should not be confused with
that of its converse. The two are quite independent of each other. Numerous
examples can be given where an implication statement is true but the con-
verse is false and vice-versa.
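Both claims of this paragraph can be checked case by case: the statement p’ -> q’ always agrees with the converse q -> p, while the converse does not always agree with the original implication. As before, this is a sketch that assumes the truth-functional reading of implication, and the names are ours.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    # Assumed truth-functional reading of 'p -> q'.
    return (not p) or q


cases = list(product([True, False], repeat=2))

# "not p -> not q" has the same truth value as the converse "q -> p".
inverse_equals_converse = all(
    implies(not p, not q) == implies(q, p) for p, q in cases
)

# Truth-value assignments on which 'p -> q' and its converse disagree.
disagreements = [(p, q) for p, q in cases if implies(p, q) != implies(q, p)]
```

The list `disagreements` is non-empty, which is precisely why an implication and its converse require separate proofs.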
Sometimes, in the statement p -> q the hypothesis, that is, p, is itself the
negation of some statement, as, for example, the statement ‘If it does not
rain the crops will die‘. In such a case it is customary to replace the phrase
‘if not’ by the single word ‘unless’. With this change, the present statement
would become ‘Unless it rains, the crops will die’. We warn once again that
this statement says nothing whatsoever about the survival of the crops in
the event it does rain. Here again, a logician differs from a layman who
would interpret this present statement to mean that if it rains crops will be
saved. The safest way to correctly interpret statements involving ‘unless'
is to substitute for it ‘if not’.
In view of the immense importance of implication statements in mathe-
matics, let us consider some other ways of paraphrasing them. Suppose p
and q are any statements. Then p -> q can be read in any of the following
ways:
(i) p implies q.
(ii) q follows from p.
(iii) q is a (logical) consequence of p.
(iv) If p is true then q is true.
(v) If q is false then p is false.
(vi) p is false unless q holds.
(vii) p is a sufficient condition for q.
(viii) q is a necessary condition for p.
(ix) p is true only if q is true.
Item (i) is just the definition, while (ii), (iii) and (iv) are its paraphrases.
As we have seen before, (v) is the contrapositive of (i) and (vi) a rephrasing
of (v). The last three are the only versions which call for a comment. Of
these, (vii) is fairly straightforward. For example to say ‘If it rains the streets
get wet’ clearly amounts to saying that ‘Raining is a sufficient condition for
the streets to get wet’, or that ‘In order that the streets get wet, it suffices if
it rains’. Thus, the use of the word ‘sufficient’ here conforms to its ordinary
meaning.
It is a little confusing to use version (viii) in the case of some statements.
For example, in the example just given, the statement would read ‘Wetting
of streets is a necessary condition for it to rain’. This sounds absurd. The
trouble is with the word ‘condition’. In practice, it has the connotation of a
prerequisite, that is, something which is to exist prior to the happening of
some event. In the present case the question of streets getting wet arises
only after the rain and that is why it is hard to swallow that wetting of
streets is a necessary condition for it to rain. Perhaps, another example
would clarify the situation. Consider the statement, ‘If two triangles are
congruent then they are similar’. This means that in order that two triangles
be congruent they must at least be similar to each other. Congruency can
never occur if similarity does not hold. In other words, similarity of the
triangles is a necessary condition for them to be congruent. Whether it is
sufficient or not is not the concern of the statement, it is the business of the
converse statement. Necessity and sufficiency should never be confused with
each other. In a sense, they are converse to each other.
About the last version, ‘p is true only if q is true’, it is once again
necessary to distinguish a layman from a logician. When a layman says ‘I shall
come only if I am free‘, he generally means that he will come if he is free
but not otherwise. A logician, however, makes no such commitment when
he makes the same statement. All he is saying is that his being free is a
necessary condition for his coming, that is, his coming will be impossible if
he is not free. He is saying nothing at all as to what he will do if he is free.
Here, too, it is vital to distinguish between ‘if’ and ‘only if’.
There is one exception to the preceding remarks. When something is
defined in terms of a condition, it is customary to cite this condition as
sufficient, even though it is in fact sufficient as well as necessary. Thus,
when we say ‘A triangle is called equilateral if all its sides are equal’, it also
means that a triangle whose sides are not all equal will not be called
equilateral. In other words, here ‘if’ means ‘if and only if’. This usage is
unfortunate but very standard. Fortunately, it appears exclusively in definitions
and nowhere else.
In mathematics it often happens that we combine together an implication
statement along with its converse. For example take the well-known
theorem, ‘The sum of opposite angles in a cyclic quadrilateral is 180 degrees
and conversely’. If we let p be the statement ‘ABCD is a cyclic quadrilateral’
and q the statement ‘∠A + ∠C = 180 degrees’ then the statement of the
theorem is the conjunction of p -> q and q -> p. These types of statements
come up so frequently that it is convenient to have a shorter notation for
them. The most natural choice is to use arrows in both direction, that is, to
use p <-> q or p <=> q. Here again it is convenient to list down a number of
versions of this statement.
(i) p and q imply each other.
(ii) p and q are equivalent to each other.
(iii) p holds if and only if q holds.
(iv) q is a characterisation of p. (This version is generally used only
when p, q express some properties of the same object).
(v) q holds if p does and conversely.
(vi) q holds if p does, but not otherwise.
(vii) if p is true then q is true and if p is false so is q.
(viii) q is a necessary as well as a sufficient condition for p.
Of course many other formulations are possible in view of the symmetry
of p and q. Such statements are called ‘if and only if’ statements. The
expression ‘if and only if’ occurs so often in mathematics that it is customary
to abbreviate it to ‘iff’. Thus, the geometric theorem quoted above can be
stated as ‘A quadrilateral is cyclic iff the sum of its opposite angles is 180
degrees’.
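That the various versions listed above say the same thing can be verified case by case. The following sketch (assuming, as our own convention, the truth-functional reading of implication) compares ‘p iff q’ with the conjunction of the two implications and with version (vii).

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    # Assumed truth-functional reading of 'p -> q'.
    return (not p) or q


def iff(p: bool, q: bool) -> bool:
    # 'p if and only if q': p and q are true together or false together.
    return p == q


cases = list(product([True, False], repeat=2))

# 'p iff q' is the conjunction of 'p -> q' and its converse 'q -> p'.
matches_two_implications = all(
    iff(p, q) == (implies(p, q) and implies(q, p)) for p, q in cases
)

# Version (vii): if p is true then q is true, and if p is false so is q.
matches_version_vii = all(
    iff(p, q) == (implies(p, q) and implies(not p, not q)) for p, q in cases
)
```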
A theorem of this sort is really equivalent to two separate theorems
which are converses of each other. If we write the statement symbolically
as p <-> q (or as p <=> q) then the implication p -> q is called the direct
implication or the ‘only if’ part of the theorem while the other way
implication, q -> p, is called the converse implication or the ‘if’ part of the theorem.
In general, separate proofs are needed for both the parts. Occasionally, it
so happens that the steps used in the proofs of the direct implication are all
reversible. In such a case, the converse is said to follow by reversing the
proof of the direct implication. It is by no means the case that both the
implications are of the same degree of difficulty. There are many theorems
in which one of the implications is simple, almost to the point of being
trivial, but the other way implication is fairly involved. As an example
take the well-known remainder theorem which states, ‘Let f(x) be a
polynomial in the variable x. Then a real number b is a root of f (i.e. f(b) = 0)
iff (x - b) is a factor of f(x)’. In this case, the ‘if’ part is trivial, but the
‘only if’ part is not so immediate.
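The theorem just quoted can be tried out on a particular polynomial. The sketch below is our own illustration: the cubic x^3 - 6x^2 + 11x - 6 and the helper names are hypothetical choices, and division by (x - b) is carried out by the familiar synthetic division scheme.

```python
def evaluate(coeffs: list[int], x: int) -> int:
    """Value of the polynomial at x; coefficients are listed highest degree first."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


def divide_by_linear(coeffs: list[int], b: int) -> tuple[list[int], int]:
    """Synthetic division by (x - b): returns (quotient coefficients, remainder)."""
    running = []
    carry = 0
    for c in coeffs:
        carry = carry * b + c
        running.append(carry)
    return running[:-1], running[-1]


# Hypothetical example: f(x) = x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3).
f = [1, -6, 11, -6]

quotient, remainder = divide_by_linear(f, 2)

# b is a root of f exactly when (x - b) leaves remainder zero.
root_iff_factor = all(
    (evaluate(f, b) == 0) == (divide_by_linear(f, b)[1] == 0)
    for b in range(-5, 6)
)
```

A check of this kind over a few integers is, of course, only an illustration and not a proof of the theorem.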
The concept of implication leads naturally to that of comparison of
relative strengths of statements. In practice, we say that a certain statement
or piece of information is stronger than another if the knowledge of the
former subsumes knowledge of the other. For example we say it is stronger
to say that a certain person lives in Kerala than to say that he lives in
India. This is so because anyone can infer the latter from the former by
sheer common sense, provided of course, that he knows that Kerala is a
part of India.
Mathematically, we say that a statement p is stronger than a statement
q (or that q is weaker than p) if the implication statement p -> q is true. A
few comments are in order. First of all, ‘stronger‘ does not necessarily
mean ‘strictly stronger’. Note for example that every statement is stronger
than itself. The apparent paradox here is purely linguistic. If we want
to avoid it we should replace the word ‘stronger’ by the phrase ‘stronger
than or possibly as strong as’. However, the use of the word ‘stronger’ in
this context is fairly standard. If p -> q is true but its converse is false, then
we say that p is strictly stronger than q (or that q is strictly weaker than p).
For example it is strictly stronger to say that a given quadrilateral is a
rhombus than to say it is a parallelogram. Second, given two statements p
and q it may happen that neither is stronger than the other. Indeed, the
two statements may not be related at all. In such a case we say that their
strengths are not comparable to each other. For example, the statement
‘ABCD is a rectangle’ and the statement ‘ABCD is a rhombus’. The word
‘sharper’ is used sometimes for ‘stronger’. This usage is common when the
two statements deal with estimates or approximation of something.
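The comparison of strengths can be imitated on a finite sample. In the sketch below, which is entirely our own illustration, statements about an unspecified integer n (such as ‘n is divisible by 4’) play the role that statements about the quadrilateral ABCD play in the text, and the implication is tested for every n from 1 to 100. Passing such a test on a sample only suggests, and does not prove, that one statement is stronger than another.

```python
from typing import Callable

Statement = Callable[[int], bool]

# Hypothetical sample of integers on which the implications are tested.
sample = range(1, 101)


def stronger(p: Statement, q: Statement) -> bool:
    """True if, on the sample, p(n) -> q(n) holds for every n."""
    return all((not p(n)) or q(n) for n in sample)


def strictly_stronger(p: Statement, q: Statement) -> bool:
    return stronger(p, q) and not stronger(q, p)


def comparable(p: Statement, q: Statement) -> bool:
    return stronger(p, q) or stronger(q, p)


def divisible_by_2(n: int) -> bool:
    return n % 2 == 0


def divisible_by_3(n: int) -> bool:
    return n % 3 == 0


def divisible_by_4(n: int) -> bool:
    return n % 4 == 0
```

For instance, `strictly_stronger(divisible_by_4, divisible_by_2)` holds, every statement is stronger than itself, and `divisible_by_2` and `divisible_by_3` are not comparable.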
What happens if out of two statements, each is stronger than the other?
As we have already noted, in such a case we say that the two statements
have the same (or equal) strength or that they are (mutually) equivalent. For
example the statement, ‘ABCD is a cyclic quadrilateral’ and the statement,
‘ABCD is a quadrilateral in which ∠A + ∠C = 180 degrees’ are
equivalent to each other. Earlier in this section we defined the logical equivalence
of two statements to mean their simultaneous truth or falsehood. It is
obvious that in such a case they are of equal strength.
A large part of mathematics is concerned with the determination of
relative strengths of statements, that is, with the comparison of strengths
of statements. This is a task of varying degree of difficulty. In some cases
the comparison of strengths or the equivalence of two statements is a
matter of common sense or of using synonymous expressions. For example
the statement ‘I own this house’ is equivalent to the statement ‘This house
belongs to me’. In some cases, on the other hand, equivalence may not be
obvious and needs to be established by some proof (as in the case of the
statement ‘ABCD is a cyclic quadrilateral’ and the statement ‘ABCD is a
quadrilateral in which ∠A + ∠C = 180 degrees’).
7. Logical Precision in Mathematics: The importance of logic as a
discipline of thought, in mathematics, cannot be over-emphasised. Logical
reasoning being the soul of mathematics, even a single flaw of reasoning
can thwart an entire piece of a research work. We already pointed out that
in mathematics every theorem has to be deduced from the axioms in a
strictly deductive manner. (The term ‘mathematical induction’ is somewhat
misleading because it is actually a case of logical deduction. Logical
induction has no place in mathematics.) Of course, one can always use
theorems proved earlier by oneself or by others. But every step has to be
justified. This is the rule for all mathematics, whether pure or applied,
classical or modern. But it deserves to be emphasised here because in the
classical mathematics (except perhaps for Euclidean geometry) the concern
is usually with numbers, and the justifications are based
upon some very basic properties of numbers. These are few in number
and are used so frequently that their specific mention is rarely made. For
example, if we get an equation like 3(x + 3)(x − 3) = 30 in some problem, we
mechanically solve it in the following steps:
(x + 3)(x − 3) = 10
x² − 9 = 10
x² = 19
x = √19 or −√19
Although no justification is given for these steps, they require various
properties of real numbers such as commutativity and the cancellation
laws for multiplication and addition, distributivity of multiplication
over addition and finally, the existence of square roots of real numbers.
Because of over-familiarity we tend to ignore them. But in this book we
shall consider ‘abstract’ algebraic systems where some of these laws do not
hold. Then the justification for each step will have to be given carefully,
starting from the axioms. Considerations of space (and of the level of this
book) would, of course, not permit us to be always so complete, and we may
have to borrow some theorems without proof. Whether a proof is actually
given or not, the crucial point is to appreciate that a proof is needed even
when the statement may seem ‘obvious’.
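The following sketch does two things. First, reading the equation above as 3(x + 3)(x − 3) = 30, it checks numerically that √19 and −√19 do satisfy it. Second, it gives an illustration of our own (not taken from the text) of a system in which one of the familiar laws fails: in arithmetic modulo 12, the factor 3 cannot be cancelled.

```python
import math


def lhs(x: float) -> float:
    """Left-hand side of the equation, read here as 3(x + 3)(x - 3)."""
    return 3 * (x + 3) * (x - 3)


# The two values obtained by the mechanical solution.
roots = [math.sqrt(19), -math.sqrt(19)]
roots_check = [math.isclose(lhs(x), 30) for x in roots]

# Our own illustration: in arithmetic modulo 12 the cancellation law for
# multiplication fails, since 3x = 3y can hold for distinct x and y.
MODULUS = 12
cancellation_failures = [
    (x, y)
    for x in range(MODULUS)
    for y in range(x + 1, MODULUS)
    if (3 * x) % MODULUS == (3 * y) % MODULUS
]
```

In such a system the first step of the solution, dividing both sides by 3, would no longer be justified.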
The kind of logical perfection sought in insisting that every statement
be proved from axioms by a deductive reasoning, has another manifestation
in mathematics, which is relatively of recent origin. It is seen in definitions.
Just as every mathematical statement has to be proved no matter how
obvious it appears, every mathematical concept has to be rigorously defined
no matter how intuitively clear it may be. Just as every statement is deduced
from some others proved earlier, every concept is to be defined in terms of
other concepts defined earlier. For example, everybody understands what a
triangle is. But as a mathematical term it has to be defined, say, as ‘a figure
bounded by three line segments'. This definition, of course, requires that the
mathematical terms appearing in it, namely, ‘figure‘, ‘bounded’, ‘three‘
and ‘line segment’ be defined earlier. Just as in deducing theorems we ulti-
mately reach axioms whose truth has to be taken for granted, in trying to
give definitions for mathematical concepts, ultimately we reach some terms
which cannot be further defined. Such terms are called primitive terms. No
formal meaning is assigned to these terms and one is free to interpret them
as he likes as long as such interpretation is consistent with the axioms
involving them. Indeed such interpretations enlarge the applicability of
mathematics. For example, the primitive concepts of geometry are ‘point’,
‘line’ and the ‘incidence’ relation (that is, the relationship between a line and
a point ‘lying’ upon it). We generally interpret a point as a dot (having no
dimensions), a line as a set of points having only length but no breadth and
a point incident upon a line as a dot belonging to the line. But one is equally
entitled to think of a point as a lock, a line as a key and to think of a lock as
incident upon a key if it can be opened by that key. If such a system of locks
and keys satisfies the axioms of geometry (for example, that every two
distinct points are incident upon a unique common line) then all the theo-
rems proved from these axioms will be applicable to it.
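To make the lock-and-key interpretation concrete, here is a small sketch with seven locks and seven keys. The particular system is our own choice (it is the well-known seven-point configuration often called the Fano plane), and the code checks only the one axiom quoted above, not a full set of axioms of geometry.

```python
from itertools import combinations

# Hypothetical system: seven locks (the 'points') ...
locks = range(1, 8)

# ... and seven keys (the 'lines'), each described by the set of locks it opens.
keys = [
    {1, 2, 3}, {1, 4, 5}, {1, 6, 7},
    {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6},
]


def incident(lock: int, key: set[int]) -> bool:
    # A lock is 'incident upon' a key if that key opens it.
    return lock in key


def common_keys(a: int, b: int) -> list[set[int]]:
    return [k for k in keys if incident(a, k) and incident(b, k)]


# Axiom: every two distinct points are incident upon a unique common line.
axiom_holds = all(
    len(common_keys(a, b)) == 1 for a, b in combinations(locks, 2)
)
```

Any theorem deduced from this axiom alone therefore applies to the locks and keys just as it does to dots and straight lines.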
Although the perfection achieved through the practice of starting from a
few primitive terms and then defining everything else unambiguously is
aesthetically appealing (and also practically useful as indicated above), we shall
not always adhere rigidly to this discipline in giving definitions because it
has a pedagogical disadvantage that often, some very intuitive concepts
require highly clumsy and unappealing definitions. What is frequently done
in the interest of logical-precision is as follows. Suppose we want to define
something, say, A, which we understand intuitively but cannot describe
rigorously. We then look for something, say, B, which can be rigorously
described and which is so inexorably related to A that knowing A is as good
as knowing B. We then define A as B. For convenience, we call this the
‘definition trick’. A simple instance in which this trick is applied is in the
definition of an infinite sequence of real numbers. People have been
handling such sequences for centuries and calculus books contain a large number
of theorems about them. But what is a sequence of real numbers? We may
try to answer this something like, ‘A sequence of real numbers is an infinite
succession of real numbers'. But this is at the most a description of a
sequence. It cannot be taken as a definition of a sequence because it immedi-
ately raises the question, ‘What is a succession?‘. We may try to dodge it
by taking ‘succession’ as a primitive term. But then we might as well have
started with ‘sequence’ as a primitive term in the first place! Actually, there
is another way out. Let N and R denote, respectively, the sets of positive
integers and of real numbers. Now given a sequence, say, {a_n}, of real
numbers we define a function f from N into R by f(n) = a_n for all n ∈ N.
(We trust the reader is familiar with functions. Anyway, we shall study them
in the next chapter.) It is clear that knowing the sequence {a_n} is as good
as knowing the function f. So we simply define an infinite sequence
of real numbers as a function from N into the set of real numbers. More
generally, we define a sequence of complex numbers as a function from N
into the set of complex numbers, a sequence of monkeys as a function
from N into the set of monkeys and so on. These are perfectly rigorous
definitions, provided the terms ‘function’ and ‘positive integers’ have been
defined earlier. This can actually be done, taking a ‘set’ as a primitive
term.
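The ‘definition trick’ is exactly how a sequence would be represented in a program. In the sketch below an infinite sequence is nothing but a function defined on the positive integers. The sequence a_n = 1/n, the use of exact fractions in place of real numbers, and all the names are our own assumptions made for illustration.

```python
from fractions import Fraction
from typing import Callable

# A sequence is defined to be a function from the positive integers
# into the set of values (here exact fractions stand in for real numbers).
NumberSequence = Callable[[int], Fraction]


def reciprocals(n: int) -> Fraction:
    """The sequence a_n = 1/n, given as the function f with f(n) = a_n."""
    if n < 1:
        raise ValueError("a sequence is defined only on positive integers")
    return Fraction(1, n)


def first_terms(f: NumberSequence, count: int) -> list[Fraction]:
    """The terms a_1, ..., a_count recovered from the function f."""
    return [f(n) for n in range(1, count + 1)]


terms = first_terms(reciprocals, 4)
```

Knowing the function `reciprocals` is as good as knowing every term of the sequence, although no infinite list is ever written down.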
In fact, it turns out that if we take the concept of a set as a primitive
one, then almost every concept we come across in highly diverse branches
of mathematics can be expressed quite precisely in terms of it.
Everything that can be done using real numbers can be done using sets, because
real numbers themselves can be defined as certain special sets. The converse
is not true. There are many problems where real numbers are useless but
which can be handled through sets. One such problem is the Dance Problem
given in Section 1. It is for this reason that the focal point of today’s
mathematics is not numbers but sets. It is no exaggeration to say that sets
are the alphabet of the language in which mathematics is expressed today.
We shall briefly study them in the next chapter.
As indicated above, we shall not be very fussy about rigorous definitions
of concepts which are intuitively clear and whose formal definitions only
serve to make them precise. The purpose of this rather lengthy review of
logic was merely to develop a certain maturity on the part of the reader so
that when he later comes across ‘abstract‘ concepts, he will not be baffled.
We remark once again that basically all that is needed is common sense
along with a habit of carefully weighing the statements one sees or makes.
Expertise in logic cannot by itself be a substitute for mathematical
acumen. But it at least prevents you from going astray.
Exercises
4.1 Write the logical negations of the following statements:
(i) For every man there exists a woman who loves him.
(ii) Gopal is intelligent and rich.
(iii) Gopal is intelligent but not rich.
(iv) Gopal is either intelligent or rich.
(v) If it rains the streets get wet.
(vi) For every ε > 0, there exists δ > 0 such that for all x, y in X,
|x − y| < δ implies |f(x) − f(y)| < ε. (Readers familiar with
calculus will recognise this as the definition of uniform
continuity of the function f on the set X. But that is not important
here.)
4.2 Why is it that the correct negation of ‘If it rains the streets get wet’
is not ‘If it rains the streets do not get wet’?
4.3 Take a few theorems you know and cast them in the various forms
of an implication statement. Also, state their converses. Give
examples of ‘iff’ theorems in various versions.
4.4 Give an example of an ‘iff’ theorem where the converse implication
is proved merely by reversing the proof of the direct implication.
4.5 Give examples of theorems which are true but whose converses are
false.
4.6 Give examples of theorems whose converses are also true but where
the two differ considerably in their depth.
4.7 Let p, q, r be statements and suppose q is stronger than r. Show that
the implication statement p -> q is stronger than the implication
statement p -> r (i.e. (p -> q) -> (p -> r) is true given that q -> r is true). In
other words, among two implication statements with the same
hypotheses, the one with the stronger conclusion is stronger. What can be
said about the relative strengths of two implication statements having
the same conclusion but different hypotheses ?
4.8 In each of the following pairs of statements determine which of the
two statements is stronger. If the two statements are not comparable,
or are equivalent, justify why?
(i) Statement p : For every man there exists a woman who loves
him.
Statement q: There exists a woman who loves every man.
(ii) p: The diagonals of a parallelogram bisect each other.
q: The diagonals of a rhombus bisect each other.
(iii) p: The diagonals of a rhombus bisect each other.
q: The diagonals of a rhombus bisect each other at right angles.
(iv) p: One of my friends is an actor and one of my friends is a
cricketer.
q: One of my friends is an actor and a cricketer.
(v) p: If a man is rich he is also intelligent.
q: If all men are rich then all men are also intelligent.
(vi) p: If Monday is a holiday I shall come.
q: If Monday is not a holiday I shall not come.
(vii) p: π is an irrational number.
q: There exists an irrational number.
(viii) p: This glass is half filled.
q: This glass is half empty.
4.9 Decide which of the following arguments are logically valid. In each
case you are given some statements as premises and some statement
as a conclusion. Your problem is to decide whether the truth of the
conclusion necessarily follows from the truth of all the premises
simultaneously.
(i) Premises: No man is rich unless he is intelligent.
John is a rich man.
Conclusion: John is intelligent.
(ii) Premises: Every man is mortal.
Conclusion: John is mortal.
(iii) Premises: If it rains the streets get wet.
It does not rain.
Conclusion: The streets do not get wet.
(iv) Premises: If it rains the streets get wet.
If the streets get wet, accidents happen.
It rains.
Conclusion: Accidents happen.
(v) Premises: Same as in (iv), except the third premise.
Conclusion: If it rains, accidents happen.
(vi) Premises: The streets get wet only if it rains.
It rains.
Conclusion: The streets get wet.
(vii) Premises: If it rains the streets get wet.
If it rains, accidents happen.
The streets get wet.
Conclusion: Accidents happen.
(viii) Premises: Every human being is either a man or a woman.
Conclusion: Either every human being is a man or every
human being is a woman.
4.10 Point out the logical fallacy in the following proof which shows that
every cyclic quadrilateral is a rectangle.
Proof: Let ABCD be a cyclic quadrilateral. Draw a circle with the
diagonal AC as a diameter. Then B, D lie on this circle as ABCD is
given to be cyclic. But then, ∠B, ∠D are both angles in a semi-circle
and so each is a right angle. Similarly, drawing a circle with BD as
a diameter we see that ∠A and ∠C are also right angles. So ABCD
is a rectangle.
4.11 A vicious circle in logic consists of a finite sequence p₁, p₂, ..., pₙ of
terms in which the definition of p₁ involves p₂, that of p₂ involves
p₃, ..., that of pₙ₋₁ involves pₙ and the definition of pₙ involves p₁.
(For example if we define a fool as an idiot, an idiot as a lunatic
and a lunatic as a fool we get a vicious circle.) Prove that because
of primitive terms, there are no vicious circles in mathematics.
4.12 Take an English-into-English dictionary (any other language will
also do). Start with any word and note down any word occurring in
its definition as given in the dictionary. Take this new word and note
down any word appearing in its definition. Repeat the process with
this new word until a vicious circle is formed. Prove that a vicious
circle is unavoidable no matter which word one starts with.
(Caution: The vicious circle may not always involve the original
word).
Notes and Guide to Literature
The review of logic here is intended only to build a certain discipline of
thought. Many of these topics will be dealt with more formally when we
study Boolean algebras. But even there we shall not go very deep into
mathematical logic as such. A good reference on it is Mendelson [1].
The material here overlaps considerably with the first chapter of
Joshi [1].
Euclid’s geometry is even today considered a monument of axiomatic
deduction. It may be noted, however, that in the middle ages,
proofs as we understand them today were not always given. It is said that as
great a mathematician as Euler (1707—1783) derived numerous formulas
about series but did not prove the convergence of even one of them.
It is believed that Weierstrass (1815-1897) was the first one to insist
upon rigorous proofs. The insistence upon rigorous definitions is even
more recent. The need for it was pointed out most acutely by Russell’s
paradox (which we shall mention in the next chapter).
Although, because of bivalued logic, truth cannot be quantified, we
can nevertheless talk of the probability of a statement being true. For
example suppose A and B are classes of 100 students each. Let in class A
there be 30 intelligent students and 70 dumb students and in class B let the
respective figures be, say, 98 and 2. Then the probability that a ‘typical‘
student is intelligent is .30 in class A and .98 in class B. However, for each
particular student the statement that he is intelligent is either true or false,
there being no other possibility.
Precise formulations of terms like ‘almost all’ and ‘almost everywhere’
require the concept of measure. Standard references on measure theory are
Halmos [2], Royden [1].
Riemannian geometry, named after Riemann (1826-66), is the mathematical
base of Einstein’s theory of relativity.
Two
Elementary Counting
Techniques
As remarked at the end of the last chapter, set theory is the alphabet
of modern mathematics. In this chapter we acquaint the reader with this
alphabet in the first section. In the second section we study some elemen-
tary methods for counting the number of elements in a finite set. The third
section deals with the applications of these methods to some problems of
counting. The fourth section deals with the so-called principle of inclusion
and exclusion. More advanced counting techniques will be studied in
chapter 7.
1. Sets and Functions
Since we are going to take a set as a primitive concept, we do not have
to define it. It is, nevertheless, instructive to see what difficulties would
arise in an attempt to define it.
Cantor defined a set as a plurality conceived as a unity. In other words,
the concept of a set involves mentally putting together a number of things
(or ‘objects’ as they are technically called) and assigning to the things so
put together a collective identity as a whole, that is, an identity separate
from that of each of these things‘. The things are called 'elements’ or
‘members’ of the set obtained by conceiving them together. Thus, a set
I'A reader familiar with the elements of Company law may recall that a company
has its own legal personality. apart from that of each of its shareholders (that
is why, a company may go bankrupt even when all its shareholders thrive in wealth).
A partnership, on the other hand, has no legal personality. This is in fact, the basic
difl'erence between a company and a partnership.
54 mscam MATHEMATICS (Chapter Tara)
consists of or comprises its members but is itself a different entity
from any of its members. For example, a team (i.e. a set) of cricket players
is not equal to any one of its players (not even the captain). This distinc-
. tion is to be scrupulously observed even when a set consists of just one
element, thus a one—man team is to be distinguished from the lone player
in it. Actually, a team is a concept which has no material existence at all.
Of course we may assign to a set certain attributes in terms of the corres-
ponding attributes of its members. either by making aconvention or by
common sense implications. For example we may define 'A team of players
is going from Madras to Calcutta’ to mean that each player in it is going
from Madras to Calcutta. 0n the other hand a set has certain attributes
which cannot be described in terms of any attributes of individual
members. For example ‘a large flock of birds' is not the same as ‘a flock of
large birds’. Similarly the length of a line-segment is a property of the
segment as a whole and not of the individual points which constitute it.
It is obvious that sets of material objects are as old as human thought.
But we often have to consider sets of abstract objects, such as integers,
real numbers, lines etc. Indeed, we frequently come across sets whose
objects are themselves sets of some other objects, for example the set of
all teams participating in some tournament. However, a little care is necessary
in handling such sets. For example, a set of sets of lions is not a set
of lions, much the same way as a set of lions is not a lion.
If we can form sets whose elements are themselves sets, a question
naturally arises whether we can form the set of all sets, that is, all possible
sets at all. Note that this set will be extraordinary in the sense that it will,
unlike a set of lions or a set of real numbers, be a member of itself! Let us
agree to call a set ordinary if it is not a member of itself, and extraordinary
otherwise. Most of the sets that we come across are ordinary,
but some (e.g. the set of all sets just considered) are extraordinary. Now let
S be the set of all ordinary sets. A deceptively innocuous question is
whether the set S is ordinary. We are doomed to get a contradiction
whether the answer is affirmative or negative. (If S is ordinary then, being
an ordinary set, it belongs to S and so is extraordinary; if S is extraordinary
then it belongs to S, all of whose members are ordinary.) Such a situation is known
as a paradox. This particular paradox is due to the philosopher-mathematician
Russell. It has revolutionised the approach to set theory for it
clearly shows that a set cannot admit such a simplistic definition as ‘a
collection of objects’. Either we must put some restriction on these objects
(e.g. by requiring that they be material objects) or else we have to admit
that most, but not all collections of objects are sets. The second alternative
is the lesser of the evils.
The approach to set theory in which an attempt is made to define a set
and postulate a number of axioms about sets so as to avoid paradoxes is
known as the axiomatic set theory. Although it is of considerable interest in its
own right, we shall not follow this approach, because our interest in sets
is primarily as building blocks from which we can construct everything
else. It turns out that most of the concepts in mathematics can be expressed
very succinctly and precisely in terms of sets. Even the integers (and
hence the real numbers as well as the complex numbers) as well as their
addition, multiplication, etc. can be defined in terms of sets. Although we
shall not go this far, we shall have ample evidence to show the utility of
sets in giving precise expressions to what we can describe only intuitively
otherwise. It is no exaggeration to say that sets are the alphabet of
modern mathematics.
Because of our interest in sets as a means rather than as an end, we
follow the so-called naive approach to set theory. In this approach, the
paradoxes are avoided by confining ourselves only to certain sets. The
most common choice is to fix some set, say U (to be called the universal
set or the universe) and to agree that all the sets under consideration would
consist only of elements of U. The choice of the universe need not be specified,
but of course must be large enough to include all the symbols,
expressions, and concepts that we deal with. The existence of a universal
set with sufficiently nice properties is a matter of axiomatic set theory.
After these general remarks about set theory, let us now list, specifically,
the preliminaries we shall need about sets. For our purpose, ‘set’ will be
a primitive term and the terms ‘set’, ‘family’, ‘aggregate’ and ‘collection’
will be synonymous. The term ‘class’ will be of a wider import, in that it
will be used not only for sets but also when we put together too many
things to form a set. For example we shall speak of ‘the class of all sets’
rather than ‘the set of all sets’. A ‘member’ or an ‘element’ of a set will
be another primitive term, and the relation of being a member will be
denoted by ∈. The expression ‘a ∈ S’ will be read variously as ‘a is a
member (or an element) of S’ or as ‘a belongs to S’ or ‘a is contained
in S’ or ‘S contains a’. The last two versions, however, are not particularly
recommended to a beginner as they are likely to be confused with set
inclusions, which we are going to define. Sets will generally be denoted by
capital letters and their elements by small letters. To denote sets whose
elements are themselves sets, we shall generally (but not always) use script
letters, such as 𝒜, ℬ, 𝒞, ℱ, 𝒮 etc. and we shall describe such sets
as ‘families’ of or ‘collections’ of sets rather than as sets of sets. Occasionally
the word ‘point’ is also used instead of the word ‘element’ or
‘member’ and, in the same vein, ‘a ∈ S’ and ‘a ∉ S’ are sometimes expressed,
respectively, by ‘a lies or is in (or inside or within) S’ and ‘a is or lies
outside S’.
A set is completely determined by its elements. This means that two
sets having exactly the same elements must be identical. There are two
ways of specifying a set directly in terms of its elements. The first is to list
all the elements together within curly brackets. Thus {1, 2, dog} is the
(unique) set whose elements are 1, 2 and dog. In so doing, neither the
order of appearance of the elements nor the repetition of any elements
makes any difference. For example the sets {1, 2, dog}, {2, 1, 2, dog},
{dog, 2, 2, 1} are all identical. It is, of course, not necessary to write down
all the elements if it is possible (either from the context or because of some
convention) to infer all the elements from those that are actually listed.
Thus, {1, 2, 3, ..., 19, 20} is clearly the set of all integers from 1 to 20.
Similarly, {2, 4, 6, 8, 10, 12, 14, ..., 2n, ...} is the set of all positive even
integers.
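As an informal illustration (not part of the formal development), the listing convention can be tried out with Python's built-in `set` type, which also ignores order and repetition. The variable names below are ours, and the infinite set of even integers is necessarily cut off at a finite point.

```python
# Order and repetition make no difference: all three listings give the same set.
a = {1, 2, "dog"}
b = {2, 1, 2, "dog"}
c = {"dog", 2, 2, 1}
sets_identical = (a == b == c)

# {1, 2, 3, ..., 19, 20}: the ellipsis spelled out (range excludes its upper bound).
one_to_twenty = set(range(1, 21))

# The positive even integers form an infinite set; only a finite part can be stored.
first_evens = {2 * n for n in range(1, 8)}   # 2, 4, ..., 14
```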
Another method of specifying a set by specifying its elements is by
specifying what is known as a characteristic property of the set. A characteristic
property of a set is a property which is satisfied by each member
of that set and by nothing else. The same set may have more than one
characteristic property. Each set has a trivial characteristic property,
namely, the property of belonging to that set. This property is, of course,
useless if we want to describe the set in the first place. If, however, we
know some other characteristic property (and usually such a property is
present when we conceive a set) of a set, then the set can be described in
terms of it. The standard notation for a set so described is {x :   } or {x |   }.
Here x is a dummy symbol and could have been replaced by any other
dummy symbol. The space between : (or | ) and } is to be filled by a statement
to the effect that x has the property in question. For example, the
set of cows owned by a farmer F can be denoted by {x : x is a cow and
is owned by F} or {x | x is a cow and is owned by F}. Similarly, {y : there
exists a positive integer x such that y = x²} is the set whose elements are
1, 4, 9, 16, 25, ..., etc. In such cases it is customary to abbreviate the notation
as {y : y = x² for some positive integer x}. As in the present case, it
often happens that all the elements of a set can be expressed in a certain
common form in terms of one (or more) variables. In such a case it
is customary to denote the set by writing such a form in curly brackets and
specifying what values the variable can take. For example the set just
described could have been written as {x² : x a positive integer} or as
{x² : x = 1, 2, 3, ...}. Note that here too x is a dummy symbol.
When a set is described by specifying some characteristic property and
the statement expressing this property is the conjunction of several statements,
the word ‘and’ is often dropped and only commas are used to
denote the conjunction. For example {x : x real, x > 0, x² < 2} is the set
of those real numbers which are greater than 0 and whose squares are less
than 2. As in the present case, it often happens that the elements of the
set in question are required to come from some other set. In such cases it
is customary to write this requirement before the colon (:) rather than
after the colon. For example if we denote by R the set of real numbers
and by R⁺ the set of positive real numbers, then the present set could have
been denoted by {x ∈ R : x > 0, x² < 2} or by {x ∈ R⁺ : x² < 2}.
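The characteristic-property notation has a close analogue in Python's set comprehensions, which may help a reader experiment with it. The sketch below is only an approximation: infinite sets are truncated, and the real numbers are replaced by a hypothetical grid of multiples of 0.25, both assumptions of ours.

```python
from math import isqrt

# {x^2 : x a positive integer}, with x cut off at 5 so that the set is finite.
squares_by_form = {x ** 2 for x in range(1, 6)}

# The same set via a characteristic property: y is a perfect square (y up to 25).
squares_by_property = {y for y in range(1, 26) if isqrt(y) ** 2 == y}

# {x in R : x > 0, x^2 < 2}; the commas of the text become 'and'.
# R is replaced here by the sample grid -2, -1.75, ..., 1.75, 2.
grid = [k / 4 for k in range(-8, 9)]
positive_with_small_square = {x for x in grid if x > 0 and x * x < 2}
```

Renaming the dummy variable inside a comprehension changes nothing, just as with the dummy symbol x in the text.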
In the same vein, when a set is characterised by a number of conditions
which are of a similar form and can be written in terms of a dummy
variable, it is customary to write only the values assumed by this dummy
variable and to omit a specific mention of the requirement that all the
conditions obtained by assigning the various values to the dummy variable
are to be simultaneously satisfied. For example let S be the set of those
integers which are divisible by every integer from 1 to 20. Then S can be
written as S = {x : x an integer, x is divisible by i for all i = 1, 2, ..., 20}
or as S = {x : x an integer, x is divisible by i, i = 1, 2, ..., 20}. Note that
in the latter expression the words ‘for all’ are omitted and are to be understood.
A beginner is warned against interpreting such omission as ‘for
some’. The set {x : x an integer, x is divisible by some i, i = 1, 2, ..., 20}
will include many elements not present in the set S. The distinction between
the quantifiers ‘for all’ and ‘for some’ is vital and confusing the two
with each other may be disastrous. The situation is admittedly difficult for
a beginner since some authors tend to omit both the quantifiers and leave
it to the reader to infer from the context which of them is intended. To
avoid confusion, in this book the existential quantifier (‘for some’ or ‘for
at least one’) will always be mentioned specifically.
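The gap between the two quantifiers can be seen numerically. The following sketch scales the example down, taking divisors 1 to 6 instead of 1 to 20 and searching only among the integers 1 to 120; both bounds are our own choices, made so that the sets stay small. Python's `all` and `any` play the roles of ‘for all’ and ‘for some’.

```python
N = 6                        # scaled-down stand-in for the 20 of the text
candidates = range(1, 121)   # finite search range (an assumption of this sketch)

# x is divisible by i for ALL i = 1, ..., N
divisible_by_all = {x for x in candidates
                    if all(x % i == 0 for i in range(1, N + 1))}

# x is divisible by SOME i = 1, ..., N
divisible_by_some = {x for x in candidates
                     if any(x % i == 0 for i in range(1, N + 1))}
```

Since every integer is divisible by 1, the ‘for some’ set is the whole search range, while the ‘for all’ set contains only the multiples of 60.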
It is possible to conceive a set with no elements at all. Such a set is
variously known as an empty set or a void or a vacuous or a null set.
Examples of such sets are the set {x : x an integer and x² = 2} or the
set of all six-legged men. One can of course give many other examples.
Note, however, that all these sets are equal because they consist of identical
elements (viz. no elements at all). To say that two empty sets are
unequal would mean that at least one of them contains an element which
is not present in the other. Since such an element does not exist, we have
to agree that they are equal. By the same logic, any statement to the effect
that a certain property holds for every member of the empty set is true,
albeit vacuously so. The unique empty set is denoted by 0 or by ∅. These
notations are so standard that they are used even when the same symbols
may also represent something else. No confusion results because of such
a double use.
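Both remarks, the uniqueness of the empty set and vacuous truth, can be observed in a small Python experiment. The search range for integer square roots of 2 is an arbitrary choice of ours; the names are purely illustrative.

```python
# {x : x an integer and x^2 = 2}, searched over a finite range of integers.
integer_roots_of_two = {x for x in range(-100, 101) if x * x == 2}

# The set of all six-legged men, which has no members either.
six_legged_men = set()

# Two empty sets described differently are nevertheless equal.
empty_sets_equal = (integer_roots_of_two == six_legged_men)

# Any 'for every member' statement about the empty set holds vacuously,
# even two statements that could not both hold for an actual member.
every_member_exceeds_1000 = all(x > 1000 for x in integer_roots_of_two)
every_member_is_below_0 = all(x < 0 for x in integer_roots_of_two)
```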
Given two sets S and T we say that S is a subset of T (or T is a superset
of S, or S is contained in T or T contains S) if every element of S is
also an element of T. When this is the case we write S ⊂ T or T ⊃ S. If
S ⊂ T but S ≠ T, then we say that S is a proper subset of T and write
S ⊊ T. The reader is cautioned that some authors use the notations ⊆
and ⊂ where we use ⊂ and ⊊, respectively. Obviously, two sets are
equal if and only if each is a subset of the other. This is indeed the most
straightforward way of proving that two sets are equal. The words ‘subfamily’
and ‘subcollection’ are synonymous with subset. Words such as
‘subaggregate’ and ‘superfamily’ could be defined but are rarely used. The
term ‘subclass’ will not be defined formally, but has the same relationship
with a class as a subset has with a set. This relationship is known as
inclusion. We shall define set inclusion formally when we define a relation.
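For readers who like to experiment, Python's set type has both relations built in: `<=` tests for a subset in the sense just defined and `<` for a proper subset. The helper `equal_by_inclusion` is a hypothetical name of ours; it mirrors the ‘each is a subset of the other’ test for equality.

```python
S = {1, 2}
T = {1, 2, 3}

s_subset_of_t = S <= T            # every element of S is an element of T
s_proper_subset_of_t = S < T      # subset, and S differs from T
t_subset_of_t = T <= T            # every set is a subset of itself ...
t_proper_subset_of_t = T < T      # ... but never a proper subset of itself

def equal_by_inclusion(A, B):
    """Two sets are equal if and only if each is a subset of the other."""
    return A <= B and B <= A
```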
Given two sets S and T, the complement of S in T (or with respect
to T), denoted by T − S (or by T ~ S or T\S), is the set of those elements
which are in T but not in S. Thus T − S = {x : x ∈ T and x ∉ S} or
T − S = {x ∈ T : x ∉ S}. Note that we are not requiring here that S be a
subset of T, although the complement T − S is always a subset of T. When
the set with respect to which complements are considered is understood,
the complement of S is denoted by S′, ~S or by c(S).
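A brief sketch of the same idea in Python, where the operator `-` on sets gives the relative complement. The function `complement` and the sample universe are illustrative assumptions of ours; note that S below is deliberately not a subset of T.

```python
T = {1, 2, 3, 4}
S = {3, 4, 5}                      # S is not a subset of T

t_minus_s = T - S                  # elements of T which are not in S
same_by_property = {x for x in T if x not in S}

def complement(A, universe):
    """Complement of A with respect to an understood universal set."""
    return universe - A

U = set(range(1, 7))               # a sample universe {1, ..., 6}
s_complement = complement(S, U)
```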
If S is a set then the set of all subsets of S is called the power set of S
and will be denoted by P(S). Note that the empty set and the set S itself
are always members of the power set P(S). In particular, a power set is
never empty. It is easy to show that if S has n (distinct) elements then P(S)
has 2ⁿ elements and this is the reason for the name ‘power set’. Elements
of the power set are in general quite different from those of the original set
and the two should not be confused with each other. If x is an element
of the original set S, then {x} is a subset of S and hence {x} is an
element of the power set P(S); whereas x itself may or may not be an
element of P(S).
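The count of 2ⁿ subsets can be checked on a small example. The sketch below is one possible construction, not the only one: it collects the subsets of each size using `itertools.combinations`, and represents each subset as a `frozenset` because Python does not allow ordinary sets as members of a set. The function name `power_set` is ours.

```python
from itertools import combinations

def power_set(s):
    """Return the set of all subsets of s, each subset as a frozenset."""
    items = list(s)
    return {frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)}

S = {"a", "b", "c"}
P = power_set(S)
```

Here `frozenset({"a"})` is a member of `P` while the element `"a"` itself is not, which is the distinction between {x} and x drawn above.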
Given two sets S and T we define their union (sometimes called join) to
be the set {x : x ∈ S or x ∈ T}. It is denoted by S ∪ T or by S + T. It
consists of elements which belong to at least one of the two sets S and T.
We can similarly define the union of three or more sets, as the set of all
elements which belong to at least one of them. The intersection (also called
the meet) of two sets S and T is defined as the set {x : x ∈ S and x ∈ T}. It
is denoted by S ∩ T or S·T or even ST. It consists of those elements
which belong to both S and T. Similarly, the intersection of three or more
sets is the set of elements which belong to all of them. It may happen that
S ∩ T is empty even though neither S nor T is empty. If S ∩ T = ∅, then S
and T are said to be (mutually) disjoint. If S ∩ T ≠ ∅, the two sets S and T
are said to intersect (or meet) each other at points of S ∩ T or simply said to
intersect.
If the sets are denoted by some indices, say, A₁, A₂, ..., Aₙ then their
union and intersection are denoted, respectively, by $\bigcup_{i=1}^{n} A_i$ and $\bigcap_{i=1}^{n} A_i$. This
is analogous to the Σ notation for summation. Here i is a ‘dummy’ indexing
variable and could be replaced by any other symbol.
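These operations are available directly on Python sets, which gives a quick way of trying out small examples. The particular sets below are arbitrary, and the list `family` is our stand-in for an indexed collection A₁, A₂, A₃.

```python
S = {1, 2, 3}
T = {3, 4}

union = S | T                      # elements in at least one of S, T
intersection = S & T               # elements in both S and T
disjoint = {1, 2}.isdisjoint({3, 4})   # True when the intersection is empty

# An indexed family A_1, A_2, A_3 and its union and intersection.
family = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
big_union = set().union(*family)
big_intersection = set.intersection(*family)
```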
Because the concepts of complementation, union and intersection appear
so frequently, it is helpful to form some graphic intuition about them. This
is done using what are called Venn diagrams. In such diagrams, some universal
set is pictured as some planar region and various subsets of it as suitable
subregions of it. One such Venn diagram is shown in Figure 2.1. Here X is
the universal set, A and B are subsets of X (shaded by horizontal and vertical
lines respectively.) The various subsets that can be formed from A and B
are shown in the diagram. The advantage of such diagrams is that they
suggest some results which then can be verified by reasoning using definitions.
For example, this Venn diagram clearly shows that the three sets A − B, B − A
and A ∩ B are pairwise disjoint (i.e. every two of them are mutually disjoint)
and that their union is precisely A ∪ B. As a slightly more non-trivial example,
we see that A′ ∩ B′ equals the complement of A ∪ B, or in other words,
(A ∪ B)′ = A′ ∩ B′. An analytical proof, of course, has to be given as we do
in the following proposition which lists a number of elementary properties
of the concepts introduced so far. Most of them are known by certain names,
which are also given against them. The full significance of these names will
come only later.

Figure 2.1: Venn Diagram

(1.1) Proposition: Let X be any set. Then for any three subsets A, B, C of
X the following properties hold:
(i) A ∪ B = B ∪ A and A ∩ B = B ∩ A (Commutative Laws)
(ii) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
(Distributive Laws)
(iii) A ∪ ∅ = A and A ∩ X = A (Identities for ∪ and ∩)
(iv) A ∩ A′ = ∅ and A ∪ A′ = X, where ′ indicates complementation w.r.t.
X. (Properties of complements)
(v) A ∪ A = A and A ∩ A = A (Laws of Tautology)
(vi) A ∪ X = X and A ∩ ∅ = ∅ (Laws of Absorption)
(vii) (A ∪ B) ∪ C = A ∪ (B ∪ C) and (A ∩ B) ∩ C = A ∩ (B ∩ C) (Associative
Laws)
(viii) If A ∪ B = X and A ∩ B = ∅ then B = A′ (Uniqueness of complements)
(ix) (A′)′ = A (Law of Double Complementation)
(x) (A ∪ B)′ = A′ ∩ B′ and (A ∩ B)′ = A′ ∪ B′ (De Morgan's Laws).
Proof: Most of these properties are simple, almost to the point of being
trivial. They can all be established in the most straightforward manner of
proving that two sets are equal, namely, by showing that each is a subset
of the other. As an example, consider the first identity in (ii), namely, A ∩
(B ∪ C) = (A ∩ B) ∪ (A ∩ C). Let S and T denote, respectively, the sets on
the two sides; i.e. S = A ∩ (B ∪ C) and T = (A ∩ B) ∪ (A ∩ C). We have
to show that S = T. First we show S ⊂ T. So we start with an arbitrary
element, say x, of S. Then by definition, x ∈ A and also x ∈ B ∪ C.
The latter means x ∈ B or x ∈ C. In the first case, x ∈ A ∩ B
while in the second, x ∈ A ∩ C. In either case, x ∈ (A ∩ B) ∪ (A ∩ C),
i.e. x ∈ T. Thus every element of S is also in T, i.e. S ⊂ T. Now for the
other way inclusion let x ∈ T. Then either x ∈ A ∩ B or x ∈ A ∩ C. In
either case x ∈ A. Also in the first case, x ∈ B while in the second, x ∈ C.
In any case x ∈ B ∪ C. So x ∈ A ∩ (B ∪ C), i.e. x ∈ S. This proves that
T ⊂ S. Putting it all together, S = T as was to be shown. The proof of the
second identity in (ii) is similar.
We have given this proof in more detail than it deserves because it is
the first proof of this type. As another illustration we prove the first of the
two De Morgan’s laws, a little tersely this time. Let x ∈ (A ∪ B)′. Then x ∉
A ∪ B. Then x ∉ A since otherwise x ∈ A ∪ B. So x ∈ A′. Similarly x ∈ B′.
Hence x ∈ A′ ∩ B′. Conversely, let x ∈ A′ ∩ B′. Then x ∈ A′ and x ∈ B′.
Now if x ∈ A ∪ B then either x ∈ A or x ∈ B. The first case contradicts
that x ∈ A′ while the second case is ruled out because x ∈ B′. So x ∉
A ∪ B. Therefore x ∈ (A ∪ B)′.
We leave the proofs of the remaining assertions as exercises. ∎
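Before attempting the remaining proofs, a reader may find it reassuring to test the ten properties mechanically. The sketch below checks them for every choice of subsets A, B, C of one small universe X = {1, 2, 3}, which is our own choice. Such an exhaustive check on a single finite set is of course no substitute for a proof; it only confirms that no counter-example exists there. The helper names are hypothetical.

```python
from itertools import combinations

X = frozenset({1, 2, 3})
EMPTY = frozenset()

def subsets(s):
    """All subsets of s, each as a frozenset."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def comp(A):
    """Complement with respect to X."""
    return X - A

def laws_hold(A, B, C):
    """Check properties (i) to (x) for one choice of A, B, C."""
    return all([
        A | B == B | A and A & B == B & A,                                       # (i)
        A & (B | C) == (A & B) | (A & C) and A | (B & C) == (A | B) & (A | C),   # (ii)
        A | EMPTY == A and A & X == A,                                           # (iii)
        A & comp(A) == EMPTY and A | comp(A) == X,                               # (iv)
        A | A == A and A & A == A,                                               # (v)
        A | X == X and A & EMPTY == EMPTY,                                       # (vi)
        (A | B) | C == A | (B | C) and (A & B) & C == A & (B & C),               # (vii)
        not (A | B == X and A & B == EMPTY) or B == comp(A),                     # (viii)
        comp(comp(A)) == A,                                                      # (ix)
        comp(A | B) == comp(A) & comp(B) and comp(A & B) == comp(A) | comp(B),   # (x)
    ])

all_ok = all(laws_hold(A, B, C)
             for A in subsets(X) for B in subsets(X) for C in subsets(X))
```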
The interesting point to observe is that once the first four properties
are established, the remaining can be derived in a purely formal manner,
without actually looking at the elements of the sets. For example consider
the second law of tautology, namely, A ∩ A = A. We prove this as follows.
Because of (iii), we have A = A ∩ X. By (iv), X = A ∪ A′ and so A = A ∩
(A ∪ A′). Now, the first identity in (ii) is true for any three subsets
A, B and C of X. In particular, it is true if we take B as A and C as A′.
This gives A = A ∩ (A ∪ A′) = (A ∩ A) ∪ (A ∩ A′). Now once again, A ∩ A′
is ∅ by (iv) and so we get A = (A ∩ A) ∪ ∅. Finally, (iii) is true if we replace
A by any subset of X. We replace A by A ∩ A and get (A ∩ A) ∪ ∅ = A ∩ A.
Putting it all together, we get A = A ∩ A as was to be proved.
The reader may wonder what possible advantage this proof has when a
direct proof of the fact that A = A ∩ A is so immediate right from the
definition of the intersection of two sets as the set of points common to
both of them. This is a very pertinent question and its full answer will
come only when we study ‘abstract’ Boolean algebras. For the moment,
we leave it as an instructive exercise to the reader to prove the other assertions
above by this approach.
As another result about the basic concepts involving sets, we characterise
set inclusion in the following proposition. The reader is also urged
to verify its truth by drawing appropriate Venn diagrams.
(1.2) Proposition: Let X be a set and let A, B be any subsets of X. Then
the following statements are equivalent:
(i) A ⊂ B, i.e. A is a subset of B
(ii) A ∩ B′ = ∅
(iii) A ∩ B = A
(iv) A ∪ B = B
(v) B′ ⊂ A′.
Proof: Once again, the proof itself is trivial. But let us first see what the
proposition says. It does not say that any of the five assertions listed is true.
It merely says that the five assertions are mutually equivalent. That is,
when any one of them holds, all must hold. We shall see many propositions
of this type, where what is to be established is the relationship between
certain statements and not the truth of any one of them.
In proving a proposition like this, we would have to take each of the
given statements and prove that its truth implies that of every other. This
procedure can sometimes be instructive but is often too lengthy (in the
present instance, we would have to prove 20 implication statements).
Moreover, frequently, some of the statements do not directly imply some
others. In such propositions, therefore, we resort to a simple fact from logic.
Let p, q, r be any statements. If the implications p → q and q → r are true
then the implication p → r is also obviously true. This is called the law of
Syllogism. Because of this law, it suffices to prove the equivalence in a
‘cyclic’ manner. For this, we arrange the statements in a convenient order,
prove that each one implies the next and complete the cycle by showing
that the last statement implies the first. Sometimes it is more convenient to
have two or more cycles. Of course, then one has to further prove that
some member of one cycle is equivalent to some member of another. If
some statement is common to the two cycles, then obviously this verification
need not be done.
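The bookkeeping behind the cyclic method can be made concrete. In the following sketch (our own illustration, with hypothetical names) each proved implication is recorded as a pair, and the law of syllogism is applied repeatedly by following chains of pairs; the statements are all equivalent exactly when every statement can be reached from every other.

```python
def all_equivalent(statements, proved):
    """proved is a set of pairs (p, q), each meaning 'p implies q has been proved'.
    Returns True if, by chaining implications, every statement implies every other."""
    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            p = stack.pop()
            for a, b in proved:
                if a == p and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen
    return all(reachable(s) == set(statements) for s in statements)

statements = ["i", "ii", "iii", "iv", "v"]

# One cycle through all five statements: 5 implications instead of 20.
one_cycle = {("i", "ii"), ("ii", "iii"), ("iii", "iv"), ("iv", "v"), ("v", "i")}

# Two cycles sharing the statement (i).
two_cycles = {("i", "ii"), ("ii", "iii"), ("iii", "iv"), ("iv", "i"),
              ("i", "v"), ("v", "i")}

# A chain that is never closed up does not establish equivalence.
open_chain = {("i", "ii"), ("ii", "iii"), ("iii", "iv"), ("iv", "v")}
```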
With these generalities about propositions of this form, let us now get
down to the actual proof. We shall prove the equivalence of the statements
in two cycles (i) ⇒ (ii) ⇒ (iii) ⇒ (iv) ⇒ (i) and (i) ⇒ (v) ⇒ (i). (Many
other choices are also possible.) We prove each of these implications separately.
(i) ⇒ (ii). Here we are assuming (i) as true. That means we are given that
A is a subset of B. We have to prove that A ∩ B′ is the empty set. For
this, suppose there is some element, say x, in A ∩ B′. Then by definition
x ∈ A and x ∈ B′. But since A ⊂ B, x ∈ A implies that x ∈ B. We
thus get a contradiction to x ∈ B′. Thus there cannot be any element in
A ∩ B′, i.e. A ∩ B′ = ∅. So (ii) holds whenever (i) does.
(ii) ⇒ (iii). Here we are given A ∩ B′ = ∅ and we have to prove that A ∩ B
= A. Clearly A ∩ B is always a subset of A and so we always have A ∩ B ⊂ A.
For the other way inclusion, suppose x ∈ A. If x ∉ B then x ∈ B′,
whence x ∈ A ∩ B′ which is given to be empty; a contradiction. So x ∈ B
and thus x ∈ A ∩ B. Therefore A ⊂ A ∩ B, and hence A = A ∩ B.
(iii) ⇒ (iv). This also could be done by taking elements of the sets A ∪ B
and B. But we give an alternate argument, based on the properties proved
in the last proposition. We have,
B = B ∩ X (by Property (iii) in the last proposition)
= B ∩ (A ∪ A′) (by Property (iv))
= (B ∩ A) ∪ (B ∩ A′) (by Property (ii))
= (A ∩ B) ∪ (A′ ∩ B) (by Property (i))
= A ∪ (A′ ∩ B) (as we are given A ∩ B = A)
= (A ∪ A′) ∩ (A ∪ B) (Property (ii) again)
= X ∩ (A ∪ B) (Property (iv))
= A ∪ B (Property (iii)).
(iv) ⇒ (i). A is always a subset of A ∪ B, regardless of what B is. So A ⊂ A
∪ B. We now merely replace A ∪ B by B as we are assuming (iv).
(i) ⇒ (v). Suppose A ⊂ B. We have to show B′ ⊂ A′. Let x ∈ B′. Then
x ∉ B. If x ∈ A then x ∈ B, a contradiction. So x ∈ A′ as was to be
proved.
(v) ⇒ (i). This is similar to the last implication. ∎
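What the proposition asserts, namely that the five statements are true or false together, can be checked by brute force on a small universe. The sketch below does this for all pairs of subsets of X = {0, 1, 2, 3}, a choice of ours; it illustrates the statement and is not a replacement for the proof above.

```python
from itertools import combinations

X = frozenset(range(4))

def subsets(s):
    """All subsets of s, each as a frozenset."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def conditions(A, B):
    """Truth values of statements (i) to (v) for the given A and B."""
    A_c, B_c = X - A, X - B
    return [
        A <= B,                    # (i)   A is a subset of B
        A & B_c == frozenset(),    # (ii)  A and B' have empty intersection
        A & B == A,                # (iii)
        A | B == B,                # (iv)
        B_c <= A_c,                # (v)   B' is a subset of A'
    ]

# Equivalence: for each pair (A, B) the five truth values coincide.
always_agree = all(len(set(conditions(A, B))) == 1
                   for A in subsets(X) for B in subsets(X))
```

Note that the statements are not always true: for A = {0, 1} and B = {0} all five are false together.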
Besides union and intersection, there is one more way to generate new
sets from two (or more) sets. We often have to simultaneously consider
two variables, say, x and y of which x ranges over a set A and y over a
set B (say). (The sets A, B need not be distinct.) It is then convenient to
represent this situation by an equivalent single variable z which ranges
over a suitable set constructed from A and B. We already saw an instance
of this in the House Problem in Section 1 of the last chapter, where we
remarked that picking two houses on the road was equivalent to picking a
single point from the square constructed there. More generally, given any
two sets A and B, we define their cartesian product (or simply product), denoted
by A × B, to be the set of all ordered pairs (x, y) such that x is from
A and y is from B; in symbols A × B = {(x, y) : x ∈ A, y ∈ B}. We do not
define an ordered pair*, but remark that unless x = y, (x, y) is not the
same as (y, x). The word ‘cartesian’ here comes from cartesian coordinates,
because of which a point in the plane is represented by an ordered pair
of real numbers (its co-ordinates w.r.t. some fixed frame of reference).
The word ‘product’ is justified because if the sets A and B have m and
n elements respectively, then it can be shown that A × B has mn elements.
Note however, that as sets, A × B and B × A are in general not equal even though
they have the same number of elements. Similarly we can define the cartesian
product of three sets A, B, C as the set of all ordered triples (x, y, z)
in which x ∈ A, y ∈ B and z ∈ C. More generally, for any positive integer
n one can consider ordered n-tuples of elements and use them to define the
cartesian product of n sets, say, A₁, A₂, ..., Aₙ. It is not necessary that
these sets be all distinct. In fact, it may even happen that all of them are
equal to some set, say A. Then it is customary to call A₁ × A₂ × ... × Aₙ the
product of n copies of A or the nth power of A. Clearly, A² = A × A.
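Cartesian products of small sets are easy to generate by machine, which gives a quick check on the counts mentioned above. The sketch uses `itertools.product` from Python's standard library, with ordered pairs and triples represented by tuples; the sample sets are arbitrary choices of ours.

```python
from itertools import product

A = {1, 2, 3}          # m = 3 elements
B = {"x", "y"}         # n = 2 elements

A_times_B = set(product(A, B))       # all ordered pairs (a, b)
B_times_A = set(product(B, A))       # all ordered pairs (b, a)

# The third power of A: all ordered triples of elements of A.
A_cubed = set(product(A, repeat=3))
```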
We now turn to the concept of functions, which is of paramount importance
in mathematics. Intuitively, if X and Y are sets then a function, say f,
from X to Y is a rule of correspondence which assigns to every element of
X a unique element of Y. If x stands for an arbitrary element of X then
the unique element of Y which is assigned to x under f is denoted by f(x).
Common notations for a function f from a set X to a set Y are f : X → Y
or $X \xrightarrow{f} Y$. The sets X and Y are called, respectively, the
domain and the codomain of f. The set G_f defined by {(x, f(x)) : x ∈ X}
is called the graph of f. The term is obviously of geometric origin,
because if both X and Y are subsets of the real line then X × Y is a
subset of the cartesian plane and for a function f : X → Y, the set G_f
indeed represents the graph of f in the usual sense. Note that for every
x in X there is one and only one y in Y such that (x, y) ∈ G_f.
(Geometrically this means that every ‘vertical’ line through a point in X
meets the graph at precisely one point.) Note further that a function is
completely determined by its graph. This fact, along with the ‘definition
trick’ (mentioned in Section 4 of the last chapter) gives the formal
definition of a function. Formally, a function from a set X to a set Y is
defined as a subset G of X × Y having the property that for each x ∈ X,
there is one and only one y ∈ Y such that (x, y) ∈ G. (If X = ∅, then we
may take G = ∅. Thus there exists a function from the empty set to any
set.)
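The formal definition can be tested mechanically on finite sets. The sketch below is only an illustration; the name is_function and the sample sets are assumptions, not notation from the text. It checks that a set G of pairs is contained in X × Y and that every x in X occurs as a first coordinate exactly once.

```python
def is_function(G, X, Y):
    """Return True if G, a set of pairs, is a function from X to Y."""
    # G must be a subset of X x Y.
    if any(x not in X or y not in Y for (x, y) in G):
        return False
    # Each x in X must occur as a first coordinate exactly once.
    return all(sum(1 for (a, _) in G if a == x) == 1 for x in X)

X = {1, 2, 3}
Y = {"p", "q"}
good = {(1, "p"), (2, "p"), (3, "q")}
not_total = {(1, "p"), (2, "q")}      # 3 has no value assigned
two_valued = good | {(1, "q")}        # 1 has two values assigned
```

With X empty and G empty both conditions hold vacuously, in agreement with the remark that there is a function from the empty set to any set.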
* The formal definition is slightly clumsy. The ordered pair (x, y) is
defined as the set {{x, y}, x}. Note that mere {x, y} would not do because
{x, y} is the same as {y, x}, while we do not want (x, y) to be the same
as (y, x).
We can give numerous examples of functions. But before doing so,
some points need to be emphasised because the practice adopted by various
authors about them is not uniform. First, we require that every function
must be defined on the entire domain set. For example, if X and Y are each
the set of real numbers then the formula f(x) = 1/x does not define a
function from X to Y unless f(0) is also specified. Some authors do allow
such functions and then define the domain of a function to be the set of
those points where it is defined. We shall not adopt this practice.
Similarly, we require a function to be single-valued, that is, f(x) is
uniquely defined for every x ∈ X. For example, if X is the set of positive
real numbers and Y the set of all real numbers then we cannot define a
function f : X → Y simply by f(x) = square root of x. It is necessary to
specify which of the two square roots is to be taken. The choice may be
arbitrary. For example, we may define f : X → Y by
$$f(x) = \begin{cases} \text{non-negative square root of } x & \text{if } x \text{ is rational} \\ \text{negative square root of } x & \text{if } x \text{ is irrational.} \end{cases}$$
Note that there are many real numbers (for example, e + π) which are
not known to be rational or irrational. Even for such numbers the function
above is well-defined, because it has a unique value, the only trouble being
that we do not know this value today. (Once again such a situation is not
allowed in constructivist mathematics.)
Another point to note about functions is their notation. In calculus
books it is customary to denote a function by a symbol like f(x) where x
is a variable which ranges over the domain set. Since we shall have
occasion to consider sets whose elements are themselves functions, it is
necessary not to confuse a function either with a formula for it or with
any value assumed by it. We therefore denote functions by single symbols like
f, g, h, φ, θ, β etc. An expression like f(x) denotes the value of f at x,
that is, the element in the codomain set which is associated to x under
the function f. In this notation, x is sometimes called an argument of f.
As examples of ‘non-mathematical’ functions, let H be the set of all
human beings and M the set of all men. We can then define a function
f : H → M by f(x) = father of x. As another example, in the Tournaments
Problem, let M and P denote, respectively, the set of all matches played
and the set of all participants. We define two functions f and g from M to
P, to be called, respectively, the winner function and the loser function.
If m ∈ M then m is a match played and we let f(m) be the player who wins
it and g(m) be the player who loses it. The loser function considered here
is very useful as we shall see later.
The words ‘transformation’, ‘operator’, ‘map’ and ‘mapping’ are really
synonyms of ‘function’ although by convention they are used only in some
specific contexts. We shall use ‘function’ as the general term and reserve
the terms ‘transformation’ and ‘operator’ for certain special types of
functions.
Suppose f and g are functions such that the codomain of f coincides
with the domain of g; say f : X → Y and g : Y → Z. Then we define their
composite, denoted by g ∘ f (or by gf), to be the function from X to Z
given by (g ∘ f)(x) = g(f(x)) for x ∈ X. For example, let X, Y, Z each be
the set of real numbers. Let f, g be defined by f(x) = x² for x ∈ X and
g(y) = sin y for y ∈ Y. Then (g ∘ f)(x) = sin(x²). Note that in this case
the composite f ∘ g is also defined but is not equal to g ∘ f, because
(f ∘ g)(x) = sin² x which is in general different from sin(x²). We can
also give examples where even though g ∘ f is defined, f ∘ g is not
defined. If f, g, h are three functions for which g ∘ f and h ∘ g are both
defined then it is clear that h ∘ (g ∘ f) and (h ∘ g) ∘ f are both defined
and are easily seen to be equal because if x is any element of the domain
of f then [h ∘ (g ∘ f)](x) and [(h ∘ g) ∘ f](x) both equal h(g(f(x))). We
denote this function by h ∘ g ∘ f.
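Composition is easy to experiment with numerically. The following sketch assumes that f is the squaring function and g is the sine function (the exponent is our reading of the example in the text), and introduces a hypothetical third function h in order to try out associativity.

```python
import math

def compose(g, f):
    """Return the composite g o f, i.e. the function x -> g(f(x))."""
    return lambda x: g(f(x))

f = lambda x: x ** 2      # assumed form of f
g = math.sin
h = lambda t: t + 1.0     # hypothetical third function

g_of_f = compose(g, f)    # x -> sin(x^2)
f_of_g = compose(f, g)    # x -> (sin x)^2

# The two ways of bracketing a triple composite.
left = compose(h, compose(g, f))
right = compose(compose(h, g), f)
```

Evaluating g_of_f and f_of_g at a point such as 2.0 gives different values, while left and right agree wherever they are evaluated.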
The simplest functions are the so-called constant functions. They assume
the same value for all values of the argument and are often denoted by this
common value. For any set X, the function 1_X : X → X (or id_X : X → X)
defined by 1_X(x) = x for all x ∈ X is called the identity function on X.
More generally, if Y is a superset of X then the function i : X → Y defined
by i(x) = x for x ∈ X is called the inclusion function of X into Y. If
f : X → Y is a function and A ⊂ X then the restriction of f to A, denoted
by f|A : A → Y, is the function defined by (f|A)(x) = f(x) for all x ∈ A.
Equivalently, it is the composite of f and the inclusion of A into X.
A function f : X → Y is said to be injective (or one-to-one) if for all
x_1, x_2 ∈ X, f(x_1) = f(x_2) implies x_1 = x_2. In other words, a function
is injective if it takes distinct points of the domain to distinct points
of the codomain. A function f : X → Y is said to be surjective (or onto)
if for each y ∈ Y there is some x ∈ X such that f(x) = y. A function which
is both injective and surjective is called a bijective function or a
bijection. A bijection of a set onto itself is called a permutation of
that set. It is easy to show that a function f : X → Y is bijective iff
there exists a function g : Y → X such that g ∘ f = id_X and f ∘ g = id_Y.
When such a function exists it is unique and called the inverse function
of f. It is denoted by f⁻¹. Note that the inverse function is also a
bijection. The term ‘one-to-one correspondence’ is sometimes used for
‘bijection’ and is thus different from ‘one-to-one function’.
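For functions on finite sets these properties can be checked directly. In the sketch below a function is represented as a Python dict from domain elements to values; this representation, the helper names and the sample functions are assumptions made for illustration.

```python
def is_injective(f):
    """f is a dict representing a function on a finite domain."""
    return len(set(f.values())) == len(f)

def is_surjective(f, codomain):
    """True if every element of the codomain is a value of f."""
    return set(f.values()) == set(codomain)

def inverse(f, codomain):
    """Return the inverse of a bijection f as a dict; raise otherwise."""
    if not (is_injective(f) and is_surjective(f, codomain)):
        raise ValueError("only a bijection has an inverse function")
    return {y: x for x, y in f.items()}

f = {1: "a", 2: "b", 3: "c"}       # a bijection onto {"a", "b", "c"}
g = {1: "a", 2: "a", 3: "c"}       # neither injective nor onto that set
f_inv = inverse(f, {"a", "b", "c"})
```

Composing f_inv with f in either order returns every point to itself, which is the characterisation of bijections quoted above.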
Let f : X → Y be a function and suppose A, B are subsets of X, Y
respectively. Then the direct image (or simply, image) of A under f,
denoted by f(A), is defined as the set {f(x) : x ∈ A}. The set A is said
to be taken (or mapped) by (or under) f onto f(A). The inverse image (or
pre-image) of B under f, denoted by f⁻¹(B), is defined as the set
{x ∈ X : f(x) ∈ B}. The inverse image is defined even where f is not a
bijection. In case f is a bijection, it coincides with the direct image
under the inverse function f⁻¹ and so the notation causes no ambiguity.
The set f(X) is called the range of f. It is evident that a function is
onto iff its range is the whole codomain.
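Direct and inverse images translate into one-line set comprehensions. This is a minimal sketch for a function stored as a dict; the function f and its codomain here are hypothetical.

```python
def image(f, A):
    """Direct image f(A) of a subset A of the domain; f is a dict."""
    return {f[x] for x in A}

def preimage(f, B):
    """Inverse image of B: all x in the domain with f(x) in B."""
    return {x for x in f if f[x] in B}

f = {1: "a", 2: "a", 3: "b", 4: "c"}   # hypothetical, not injective
codomain = {"a", "b", "c", "d"}
range_of_f = image(f, set(f))           # the range f(X)
```

Note that preimage is defined although f has no inverse function, and that the range here is a proper subset of the codomain, so this f is not onto.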
Just as Venn diagrams provide graphic intuition for sets, arrows between
such diagrams graphically represent functions. One such diagram is shown
in Figure 2.2. Arrows are also drawn from several points in the domain to
the respective values of the function at these points. From the figure we
see at once that f is not injective. Similarly for a subset A of X, its
image under f is shown as a subset of Y and an arrow is put from A to
f(A). We see some simple properties of direct images, namely,
f(A_1 ∪ A_2) = f(A_1) ∪ f(A_2) and (g ∘ f)(A) = g(f(A)).
The proofs of these and other similar properties are left as exercises.
Figure 2.2: Functions and Composites
We conclude this section by illustrating how the basic concepts of a set
and a function can be used to provide precise formulations of various
concepts and problems. In the last chapter we already saw how a precise
definition of a sequence can be given as a function with domain N, the set
of positive integers.* Similarly an m × n matrix of real numbers can be
defined rigorously as a function from A × B into R, the set of real
numbers, where A and B are respectively the sets {1, 2, ..., m} and
{1, 2, ..., n}, instead of describing it loosely as a rectangular array
with m rows and n columns. An arrangement of distinct balls into distinct
boxes amounts to defining a function from the set of balls into the set of
boxes. A requirement such as no box be empty clearly amounts to saying
that the corresponding function is onto.
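Both formulations can be written down concretely. In the sketch below the matrix entries, the ball and box names and the helper no_box_empty are hypothetical; the point is only that a matrix is a function on A × B and that 'no box is empty' is the statement that the arrangement is onto.

```python
from itertools import product

m, n = 2, 3
A = range(1, m + 1)
B = range(1, n + 1)

# A 2 x 3 matrix as a function from A x B into the reals:
# the entry in row i, column j is the value at (i, j).
matrix = {(i, j): float(10 * i + j) for i, j in product(A, B)}

# An arrangement of balls into boxes as a function balls -> boxes.
boxes = {"red", "green"}
arrangement = {"b1": "red", "b2": "green", "b3": "red"}

def no_box_empty(arr, boxes):
    """True exactly when the arrangement is onto the set of boxes."""
    return set(arr.values()) == set(boxes)
```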
To illustrate the use of sets in representing problems, we consider two
problems, the Locks Problem and the Dance Problem. We shall only
paraphrase them set-theoretically. The solutions will be given later on.
* A finite sequence of length n is a function from {1, ..., n}.
In the Locks Problem let the five persons be p_1, p_2, p_3, p_4 and p_5.
Let us denote the set of locks to be put by L. Now for each
i = 1, 2, ..., 5 we let L_i be the set of those locks in L which can be
opened by the person p_i. Then L_1, L_2, ..., L_5 are subsets of L. The
requirement in the problem amounts to saying that the union of any three
of these five subsets be the whole set L while the union of any two should
not be the whole set L. Using De Morgan's laws, this can be translated
further as follows. For each i, let M_i be the complement of L_i in L.
Then the problem amounts to finding a suitable set L and some five subsets
M_1, M_2, ..., M_5 of L such that (i) for any i and j, M_i ∩ M_j ≠ ∅ and
(ii) for any three distinct i, j, k, M_i ∩ M_j ∩ M_k = ∅.
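Once the problem is stated in terms of sets, any proposed family M_1, ..., M_5 can be checked mechanically against conditions (i) and (ii). The sketch below is such a checker; the candidate family used to exercise it (one lock for every pair of persons) is an assumption made here for illustration and is not taken from the text.

```python
from itertools import combinations

def satisfies_locks_conditions(M):
    """M is a list of sets M_i; check conditions (i) and (ii)."""
    idx = range(len(M))
    # (i) any two of the sets have a non-empty intersection.
    pairs_ok = all(M[i] & M[j] for i in idx for j in idx)
    # (ii) any three distinct sets have an empty intersection.
    triples_ok = all(not (M[i] & M[j] & M[k])
                     for i, j, k in combinations(idx, 3))
    return pairs_ok and triples_ok

# Hypothetical candidate: one lock per pair of persons, and M_i is the
# set of locks whose pair contains person i (the locks i cannot open).
persons = range(5)
L = set(combinations(persons, 2))
M = [{lock for lock in L if i in lock} for i in persons]
```

The checker says nothing about whether a candidate uses the fewest possible locks; it only confirms the two set-theoretic conditions.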
In the Dance Problem, let us denote the girls present by g_1, g_2, ..., g_n.
Let B be the set of boys at the party. For each i = 1, 2, ..., n we let
B_i be the set of boys who dance with the girl g_i. Each B_i is a subset
of B. The information that every boy danced with at least one girl amounts
to saying that $\bigcup_{i=1}^{n} B_i = B$, while the statement that no
girl danced with all the boys means that no B_i equals the whole set B.
Now, for a paraphrase of the assertion of the problem, note first that for
any i and j, B_i ⊂ B_j is equivalent to saying that every boy who dances
with g_i also dances with g_j. Therefore to say that there is some boy who
dances with g_i but not with g_j is equivalent to saying that B_i is not a
subset of B_j. The problem now amounts to showing that there exist some
two girls, say, g_i and g_j such that neither B_i ⊂ B_j nor B_j ⊂ B_i.
Of course we have not yet solved either the Locks Problem or the Dance
Problem. But we have made a good start. We remark that reducing a problem
in terms of sets is not, by itself, a guarantee for its solution. The
solution may require a good deal more work, depending on the type and
the degree of difficulty of the problem. But such a reduction helps us
crystallise our thoughts and gain some insight as to the direction in which
to proceed for a solution. It also helps tremendously in presenting the
solution concisely and precisely. Experience with the two problems just
considered shows that clever laymen (not familiar with set theory) can also
sometimes hit the right ideas. But they are awfully clumsy in expressing
them, using unintelligible, vague expressions all the time and constantly
interspersing, ‘No, this is not what I meant’. If nothing else, set theory
at least provides us a means to say exactly what we mean and for this
reason alone deserves to be studied.
Exercises
1.1 Complete the proof of Proposition (1.1).
1.2 In Proposition (1.1), for properties from (v) onwards, give alternate
proofs, using the first four properties.
1.3 In Proposition (1.2), give a direct proof for each of the 20 possible
implication statements. (Our proof already covers 6 of these
implications.)
1.4 Let X be a set and A, B be any two subsets of X.
(a) Prove that A ∪ B is the smallest subset (in the sense of inclusion)
of X which contains both A and B. In other words, prove that
(i) A ∪ B contains both A and B, and (ii) if C is a subset of X which
contains A as well as B then C contains A ∪ B.
(b) Obtain a similar characterisation of A ∩ B.
1.5 Let X and Y be any sets. Suppose A ⊂ X and B ⊂ Y. Prove that
A × B ⊂ X × Y. (For obvious geometric reasons, A × B is called a
box with sides A and B.) Generalize this to the case of the cartesian
product of n sets.
1.6 In the last exercise, prove that the intersection of two boxes is
again a box (perhaps the empty box) but that the union of two
boxes need not be a box. If A = A_1 ∪ A_2 and B = B_1 ∪ B_2, express
A × B as a union of ‘smaller’ boxes.
1.7 Which of the following formulas define functions from the set of
real numbers to itself? Which of these functions are injective,
surjective, bijective?
(i) f(x) = x' for all x
(ii) f(x) = x' − x for all x
(iii) f(x) = VW
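(iv) f(x) = e^x for all x
(v) $f(x) = \begin{cases} \text{any prime which divides } x & \text{if } x \text{ is a positive integer} \\ 0 & \text{otherwise} \end{cases}$
(vi) $f(x) = \begin{cases} 1 & \text{if Goldbach's conjecture is true} \\ 0 & \text{otherwise.} \end{cases}$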
1.8 For any three sets X, Y, Z prove that there exists a bijection
between (X × Y) × Z and X × (Y × Z) and also a bijection between
X × Y and Y × X.
1.9 Extend the definition of the composite g ∘ f of two functions f and
g to the case where the range of f is contained in the domain of g.
Give an example where g ∘ f is defined but f ∘ g is not.
1.10 Let f : X → Y and g : Y → Z be functions. Prove that
(i) for any A ⊂ X, (g ∘ f)(A) = g(f(A)).
(ii) for any B ⊂ Z, (g ∘ f)⁻¹(B) = f⁻¹(g⁻¹(B)).
(iii) for any A ⊂ X, f⁻¹(f(A)) ⊃ A and for any B ⊂ Y, f(f⁻¹(B)) ⊂ B
and f⁻¹(Y − B) = X − f⁻¹(B).
(iv) for any two subsets A_1, A_2 of X and B_1, B_2 of Y,
f(A_1 ∪ A_2) = f(A_1) ∪ f(A_2)
and
f(A_1 ∩ A_2) ⊂ f(A_1) ∩ f(A_2);
f⁻¹(B_1 ∪ B_2) = f⁻¹(B_1) ∪ f⁻¹(B_2)
and
f⁻¹(B_1 ∩ B_2) = f⁻¹(B_1) ∩ f⁻¹(B_2);
A_1 ⊂ A_2 ⇒ f(A_1) ⊂ f(A_2)
and
B_1 ⊂ B_2 ⇒ f⁻¹(B_1) ⊂ f⁻¹(B_2).
1.11 (a) Let f : X → Y be a function. Prove that the following statements
are equivalent to each other:
(i) f is injective
(ii) for every A ⊂ X, f⁻¹(f(A)) = A
(iii) for every A_1, A_2 ⊂ X, f(A_1 ∩ A_2) = f(A_1) ∩ f(A_2)
(iv) for any set Z and for any two functions g, h : Z → X,
f ∘ g = f ∘ h implies g = h (in other words, f can be
cancelled from the left).
(v) either X = ∅ or there exists a function g : Y → X such
that g ∘ f = id_X (such a function g is called a left inverse
of f).
(b) Obtain similar characterisations of surjective functions.
(c) Find necessary and sufficient conditions for the composite of
two functions to be injective, surjective, bijective.
1.12 Let f : X → X be a function. The functions f ∘ f, f ∘ f ∘ f, ... are
denoted respectively by f², f³, ..., etc. By convention, f⁰ denotes 1_X.
A point x_0 ∈ X is called a fixed point of f if f(x_0) = x_0. Prove that
a fixed point of f is also a fixed point of f². Does the converse
hold?
1.13 Let Z_2 be the set {0, 1} and X any set. For A ⊂ X, the function
f_A : X → Z_2 defined by f_A(x) = 1 if x ∈ A and f_A(x) = 0 if x ∉ A is
called the characteristic function of A, so called because the subset
A is completely characterised by the function f_A. If A, B ⊂ X,
express f_{A∩B}, f_{A∪B} and f_{X−A} in terms of f_A and f_B. What are
f_∅ and f_X?
1.14 Let M be the set of all men in some town and let H, R and I be,
respectively, the subsets of all happy men, rich men and intelligent
men. Express the following subsets in terms of H, R, I and their
complements:
(i) the set of all men who are happy and rich but not intelligent.
(ii) the set of all men who are not intelligent but are either happy
or rich.
(iii) the set of all men who are exceptions to the statement ‘every
rich man is happy’.
(iv) the set of all men who conform to the statement ‘every rich
man is happy’.
1.15 Suppose that in the town in the last exercise, there is monogamy,
all men are married and live with their wives. Let W be the set of
all women in the town and let A, B, C be respectively the subsets
of all rich women, beautiful women and happy women. Define
f : M → W by f(x) = wife of x for x ∈ M. Express the following
sets in terms of R, H, I, A, B, C, their complements and f.
(i) the set of all married women (in the town)
(ii) the set of all unmarried women (including widows)
(iii) the set of all rich men who are married to beautiful women
(iv) the set of women whose husbands (if any) are intelligent but
neither happy nor rich
(v) the set of men who are exceptions to the rule ‘a happy man
has a happy wife’.
1.16 Paraphrase the Business Problem in terms of sets.
1.17 Prove that the loser function in the Tournaments Problem is
one-to-one. What is its range?
*1.18 This exercise shows how functions of two variables can be looked
upon in a certain way. The result itself is simple and will not be
frequently needed in the sequel. But we give it as a drill in ‘abstract’
reasoning. If A and B are any two sets then the set of all functions
from A to B will be denoted by B^A. The reason for this notation
will be explained in the next section. Now let X, Y, Z be any sets
and f be a function from the cartesian product X × Y into Z.
Fix some y ∈ Y. Define f_y : X → Z by f_y(x) = f(x, y) for x ∈ X. The
function f is of two variables while f_y is a function of only one
variable. It is said to be obtained from f by fixing the second variable.
f_y is also suggestively denoted by f(·, y). If we fix the first
variable, f would give rise to a function from Y to Z.
Figure 2.3: Function of Two Variables.
(a) Prove that f_y is effectively the restriction of f to the box
X × {y} contained in X × Y (see Figure 2.3).
(b) Define a function f̂ (read ‘f hat’) from Y to Z^X by f̂(y) = f_y for
y ∈ Y. Prove that f̂ determines f uniquely, that is, if we know
f̂, we can recover the original function f from it.
(c) Prove that there is a bijection between Z^(X×Y) and (Z^X)^Y. (By
analogy with exponentiation for real numbers, this is called the
exponential law for functions.)
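Readers who program may recognise the passage from f to f̂ as what is usually called currying. The sketch below is only an illustration on finite sets: the names curry and uncurry, the dict representation and the sample function are assumptions, and the round trip shown is not a substitute for the proof asked for in the exercise.

```python
def curry(f, X, Y):
    """Given f on X x Y (a dict keyed by pairs), return f_hat: Y -> Z^X."""
    return {y: {x: f[(x, y)] for x in X} for y in Y}

def uncurry(f_hat):
    """Recover f on X x Y from f_hat."""
    return {(x, y): z for y, f_y in f_hat.items() for x, z in f_y.items()}

X, Y = {0, 1, 2}, {"u", "v"}
f = {(x, y): f"{x}{y}" for x in X for y in Y}   # hypothetical f into strings
f_hat = curry(f, X, Y)
```

Here f_hat["u"] is the one-variable function obtained by fixing the second variable at "u", and uncurry(f_hat) returns the original f.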
Notes and Guide to Literature
By far, a most recommendable reference on set theory is the book
of Halmos [1]. The axiomatic approach to set theory along with the
construction of natural numbers as certain sets may be found in the Appendix
to Kelley [1]. Once natural numbers are constructed, then step-by-step one
can construct, integers, rational numbers, real numbers and finally complex
numbers. These constructions are fairly standard and may be found, for
example, in Joshi [1]. This elaborates our earlier remark that numbers
can be constructed from sets.
If f : X → Y is a function and x ∈ X, the value of f at x is denoted by
some authors by xf rather than by f(x). With this convention, the composite
of functions is denoted the opposite way. One advantage of this notation
is that the functions appear in the same order in which they act. It is
probably for this reason that the new notation is increasingly followed.
Nevertheless, we stick to the old notation. Note also that some authors
call ‘codomain’ as ‘range’.
We remarked that in specifying a set by its elements, it makes no
difference how many times a particular element is repeated. Sometimes,
however, such repetitions do matter. For example, when we consider the
set of roots of a polynomial, we would like to count each root according
to its multiplicity. Similarly when we consider the set of share-holders of a
company, we would like to count each share holder as many times as the
number of shares he holds. The appropriate mathematical concept to
handle such situations is a ‘multi-set‘, which we shall study in the next
chapter.
The perceptive reader must have sensed some resemblance between
complements and negation, between union and disjunction and between
intersection and conjunction. Such resemblance is not fortuitous and will
be fully brought out when we study Boolean algebras.
2. Cardinalities of Sets
The discussion in the last section was independent of the sizes of the
sets. The concept of the number of elements in a set never appeared in it
(except in the remark we made after defining cartesian products to justify
the term ‘product’). In practice, the size of a set is obviously a very
important concept. The technical name for it is the cardinality or the
cardinal number of the set. For finite sets it coincides with the notion
of the number of elements in a set. Since we shall be interested mostly in
finite sets, we shall omit the definition of cardinality, which is rather
technical. Thus, for our purpose, the cardinal number of a set S is simply
another name for the number of elements in S. It is variously denoted by
|S| or n(S) or #(S). For brevity, a set of cardinality n will be called an
n-set and a subset of cardinality m an m-subset. Note that a 1-subset (or
singleton) corresponds to a point of a set although conceptually the two
are different.
Now, what is a finite set? The word ‘finite’ comes from the Latin verb
‘finire’ which means ‘to end’. This is consistent with the intuitive meaning
of a finite set. If we start counting its elements one by one, this process
will end eventually. That brings another question. What is counting?
When we count, we go on assigning successively larger positive integers to
the elements of a set, making sure that no element gets counted more than
once. This gives a one-to-one function from the set S into the set of
positive integers. If the set is finite, the range of this function will be
a set of the form {1, 2, ..., n} for some positive integer n. We then get a
bijection between S and {1, 2, ..., n}. Therefore, formally we define:
2.1 Definition: A set S for which there exists a bijection between S and
the set {1, 2, ..., n} for some positive integer n, is called finite.
Note that this definition presupposes that positive integers are already
defined. As remarked in the ‘Notes and Guide to Literature’ of the last
section, positive integers can indeed be defined as certain sets and their
basic properties (popularly known as Peano axioms, one of which is the
principle of induction) can actually be derived as theorems. But we shall
make no attempt to achieve this degree of completeness. So we shall take
these properties for granted. Using them it can be shown that the integer
n that appears in the definition above is unique for a given set. This
unique integer n is formally called the cardinality of S. By convention we
also take the empty set to be finite and assign it the cardinal number 0.
Most of the sets that we come across in practice are finite. This
includes the set of students in a college, the set of persons living in a
country, the set of all sand particles on a beach and the set of all atoms in
some physical object. These are all finite sets, even though some of them
are so large that in practice we tend to think of them as infinite. Whenever
we deal with ‘practical’ problems, there will be a tacit assumption that the
sets involved are finite, unless otherwise stated or implied by the context.
For example, in the Dance Problem, we assume that the set of boys and
the set of girls are both finite. (Actually, if this assumption does not
hold, the result can be shown to be false.)
The fundamental problem of combinatorics is to find the cardinality
of a given set which is expressed in terms of some other sets of known
cardinalities*. Often we want to do something more than mere counting.
For example, we may want to list the elements of a set in some order. But
in general we are lucky if we can at least count how many elements there
are in a given set, because with the infinite variations in the problems that
arise in practice, there is no golden method which will work in all problems.
Some of the more elaborate methods will be studied in later chapters.
Here we study a few elementary counting techniques. They will all be based
upon the following simple property which we take as an axiom, popularly
called the principle of addition.
2.2 Axiom: If S and T are any two (finite) sets and S ∩ T = ∅, then
|S ∪ T| = |S| + |T|.
We remark that although we are taking this property as an axiom (and
therefore not giving any proof for it), it can actually be proved if we go
to the definition of positive integers as certain sets. As a matter of fact,
the very definition of addition involves the process of taking the union of
two mutually disjoint sets. So the property above is actually an easy
consequence of certain definitions. However, as remarked before, we are not
after such a degree of completeness. Therefore, we shall take it as an axiom.
It is certainly consistent with our experience. The condition S ∩ T = ∅
is necessary to ensure that no element in the union S ∪ T is counted
twice. Before we derive further consequences of this axiom, it is convenient
to have a slight extension of it to the case of the union of any number of
sets.
2.3 Proposition: Let A_1, A_2, ..., A_n be pairwise disjoint, finite sets.
Then
$$\left| \bigcup_{i=1}^{n} A_i \right| = \sum_{i=1}^{n} |A_i|.$$
Proof: We prove this by induction on n. There is nothing to prove for
n = 1, while the case n = 2 is covered by the axiom above. Assume the
result holds for n = k. We want to prove it for n = k + 1. For this, let
$S = \bigcup_{i=1}^{k} A_i$ and $T = A_{k+1}$. Then
$$S \cap T = \left( \bigcup_{i=1}^{k} A_i \right) \cap A_{k+1}$$
which is easily seen to be equal to
$$\bigcup_{i=1}^{k} (A_i \cap A_{k+1})$$
(cf. property (ii) in Proposition (1.1)). Since we are assuming that
A_i ∩ A_j = ∅ for all i ≠ j, it follows that S ∩ T = ∅. Therefore, from our
axiom, |S ∪ T| = |S| + |T|, or in other words,
$$\left| \bigcup_{i=1}^{k+1} A_i \right| = \left| \bigcup_{i=1}^{k} A_i \right| + |A_{k+1}|.$$
By induction hypothesis, the first term on the right hand side equals
$\sum_{i=1}^{k} |A_i|$. So the right hand side equals
$\sum_{i=1}^{k+1} |A_i|$, completing the proof. ∎
* The basic question in combinatorics is often put as ‘In how many ways
can a certain thing be done?’ We can paraphrase this using sets. Let S be
the set of all possible ways the particular thing can be done. The question
now is equivalent to asking the cardinality of S.
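The principle of addition for several sets is easy to try out on small examples. The following sketch uses a hypothetical family of pairwise disjoint sets; the helper pairwise_disjoint is an assumption introduced for the illustration.

```python
from itertools import combinations

def pairwise_disjoint(sets):
    """True if every two distinct members of the family are disjoint."""
    return all(not (a & b) for a, b in combinations(sets, 2))

sets = [{1, 2}, {3}, {4, 5, 6}, set()]   # hypothetical disjoint family
union = set().union(*sets)

# When the family is pairwise disjoint, the two counts agree.
sizes_agree = len(union) == sum(len(s) for s in sets)
```

For a family that is not pairwise disjoint, such as [{1}, {1, 2}], the sum of the sizes overshoots the size of the union.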
2.4 Corollary: If in the proposition above, all sets A_i are of the same
cardinality (say m) then $\left| \bigcup_{i=1}^{n} A_i \right| = mn$.
Proof: This is obvious, since m + m + ... + m (n times) is precisely mn.
(Actually this is the definition of multiplication of integers, but we are
not doing it that way.) ∎
As another easy consequence of Axiom (2.2) we prove:
2.5 Proposition: Let X be any (finite) set and A any subset of it. Then
|X − A| = |X| − |A|.
Proof: We let S = A and T = X − A and apply the axiom. We get,
|X| = |S ∪ T| = |S| + |T| = |A| + |X − A|.
The result now follows by subtracting |A| from both the sides. ∎
This simple proposition is useful in finding the probability of an event
by first computing what is known as its complementary probability, that is,
the probability that the event will not occur. It so happens sometimes that
it is easier to find the number of unfavourable cases than the number of
favourable cases. To find the latter, we apply the last proposition
(provided, of course, that we know the total number of cases). We shall see
instances of this later. The following corollary of the last proposition
conforms to our intuition that a part cannot be equal to the whole.
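As a small illustration of counting through the complement, consider a hypothetical problem not taken from the text: how many three-digit numbers contain the digit 7 at least once? The unfavourable cases are easier to count, and Proposition (2.5) gives the rest. The sketch also recounts directly by brute force.

```python
# Count three-digit numbers (100..999) containing at least one digit 7.
total = 900

# Unfavourable cases: no digit is 7. The first digit has 8 choices
# (1-9 except 7) and each of the other two has 9 (0-9 except 7).
without_seven = 8 * 9 * 9

with_seven = total - without_seven   # Proposition (2.5)

# Brute-force check by direct enumeration.
direct = sum(1 for k in range(100, 1000) if "7" in str(k))
```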
2.6 Corollary: If X, A are as above then |A| ≤ |X|. Also if A is a
proper subset of X then |A| < |X|.
Proof: We have, |X| = |A| + |X − A|. Since |X − A| ≥ 0, the first
assertion is clear. Also, if A is a proper subset of X then X − A is
non-empty and so |X − A| > 0, proving the second assertion. ∎
In Axiom (2.2) we required that the sets S and T be disjoint. What if
S ∩ T ≠ ∅? We can then still find |S ∪ T| by summing up |S| and |T|
but we have to make up for the fact that the elements of S ∩ T have been
counted twice. The situation is pictured in Figure 2.4. Although later on
we shall consider a much more general formula for the union of any number
of sets, this special case is worth mentioning separately because of its
intuitive nature.
Figure 2.4: Cardinality of the Union of Two Sets.
2.7 Proposition: For any two finite sets S and T,
|S ∪ T| = |S| + |T| − |S ∩ T|.
Proof: Let m, n and k denote respectively the cardinalities of the sets
S − T, S ∩ T and T − S. It is easy to see that these three sets are pairwise
disjoint and that their union is S ∪ T. So by Proposition (2.3),
|S ∪ T| = m + n + k. It is also clear that S is the union of the two
disjoint subsets S − T and S ∩ T. Hence, |S| = |S − T| + |S ∩ T| = m + n.
Similarly |T| = k + n. The desired result now follows by substitution. ∎
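The bookkeeping in this proof can be followed on a concrete pair of sets. The sets S and T below are hypothetical; m, n and k are the three cardinalities named in the proof.

```python
S = set(range(0, 30, 2))   # hypothetical: multiples of 2 below 30
T = set(range(0, 30, 3))   # hypothetical: multiples of 3 below 30

m = len(S - T)   # elements in S only
n = len(S & T)   # elements in both S and T
k = len(T - S)   # elements in T only

# |S| + |T| counts the n common elements twice, hence the correction.
union_size = len(S) + len(T) - len(S & T)
```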
Let us now see how the existence of functions with certain properties
implies some relationship between the cardinalities of their domains,
codomains and ranges. In the following proposition we list a number of such
results.
2.8 Proposition: Let f : X → Y be a function with range R, where X, Y
are finite sets. Then,
(i) if f is a bijection, then |X| = |Y|
(ii) if f is one-to-one, then |X| ≤ |Y|
(iii) if f is onto, then |Y| ≤ |X|
(iv) |R| ≤ |X| (without any restriction on f).
Proof: If X is empty then all the results are trivially true. So let us
suppose X ≠ ∅. Note that Y must also be non-empty because there cannot
exist any function from a non-empty set to the empty set. Let m = |Y|.
(i) By definition, there exists a bijection, say, g : Y → {1, 2, ..., m}.
The composite of two bijections is easily seen to be a bijection
(cf. Exercise (1.11)). So g ∘ f : X → {1, 2, ..., m} is a bijection.
But this means that m = |X|. So |X| = |Y|.
(ii) f is given to be one-to-one. If we consider it as a function from
X to R, it becomes onto also and hence a bijection. So by (i),
|X| = |R|. Now, by Corollary (2.6), |R| ≤ |Y|. Hence the
result.
(iii) Let the distinct elements of Y be y_1, y_2, ..., y_m. For each
i = 1, 2, ..., m let A_i = f⁻¹({y_i}). Then the sets A_1, A_2, ..., A_m
are pairwise disjoint and also $\bigcup_{i=1}^{m} A_i = X$. So by
Proposition (2.3), $|X| = \sum_{i=1}^{m} |A_i|$. Now because f is onto,
no A_i is empty and so |A_i| ≥ 1 for all i = 1, 2, ..., m. Hence
|X| ≥ m; that is, |X| ≥ |Y| as was to be proved.
(iv) This follows from (iii) by considering once again f as a function
from X onto R. ∎
Note that the argument used for proving (iii) also gives the slightly more
general result that if every point of the codomain has at least k preimages
for some positive integer k then |X| ≥ k|Y|. Similarly (i) can be
generalized to say that if every y ∈ Y has exactly k preimages then
|X| = k|Y|. This generalisation will be frequently used in the sequel.
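The argument for (iii) and its generalisation rest on splitting the domain into the preimages of the individual points of the codomain. The sketch below does this for a hypothetical function in which every point has exactly four preimages; the helper name fibres is an assumption introduced for the illustration.

```python
def fibres(f, Y):
    """Map each y in Y to its set of preimages under f (a dict)."""
    fib = {y: set() for y in Y}
    for x, y in f.items():
        fib[y].add(x)
    return fib

Y = {0, 1, 2}
f = {x: x % 3 for x in range(12)}   # hypothetical onto function, 4-to-1
fib = fibres(f, Y)

# Smallest number of preimages of any point of the codomain.
k = min(len(A) for A in fib.values())
```

The preimage sets are pairwise disjoint with union X, so their sizes add up to |X|, which is therefore at least k|Y|.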
The preceding proposition is hardly profound. But it has some very
interesting consequences. The ingenuity lies, of course, in defining a
suitable function. As an illustration let us do the Tournaments Problem with
n participants. Let P be the set of participants and M the set of matches
played. In the last section we defined the loser function f : M → P by
f(m) = the player who loses the match m, for m ∈ M. It is easy to show
that this function is one-to-one (cf. Exercise (1.17)). Let R be its range.
Then, as shown in the proof of (ii) in the proposition above, |M| = |R|.
But now what does R consist of? Clearly it contains those (and only those)
players who lose at least one match. The rules of the game are such that
the champion never loses a match but every other player does. So R is the
entire set P except the champion. Hence by Proposition (2.5),
|R| = |P| − 1 = n − 1. This means, |M| = n − 1, that is, n − 1 matches are
played to find the champion from n players. See how effortlessly the result
falls out!
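The count n − 1 can also be observed in a simulation. The sketch below assumes a knockout format in which the loser of each match is eliminated and the order of pairings is arbitrary; these rules and the function knockout are assumptions for illustration, since the statement of the Tournaments Problem is not reproduced here.

```python
import random

def knockout(players, rng):
    """Play a knockout tournament; return (champion, loser_of)."""
    alive = list(players)
    loser_of = {}                     # match number -> player who loses it
    match = 0
    while len(alive) > 1:
        a, b = alive.pop(), alive.pop()
        winner, loser = (a, b) if rng.random() < 0.5 else (b, a)
        loser_of[match] = loser
        match += 1
        alive.insert(0, winner)       # winner waits for a later match
    return alive[0], loser_of

champion, loser_of = knockout(range(10), random.Random(0))
```

The dict loser_of is the loser function: it is one-to-one and its range is every player except the champion, whatever the random outcomes.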
For other applications, it is convenient to reformulate the proposition
above slightly. We state the result without proof because basically the
proof simply amounts to taking the contrapositive of the assertions proved
above.
2.9 Theorem: Let $f: X \to Y$ be a function where $X$, $Y$ are finite sets.
Then,
(i) if $|X| = |Y|$ then $f$ is one-to-one if and only if it is onto.
(ii) if $|X| > |Y|$ then $f$ cannot be one-to-one. More generally, if $k$
is a positive integer such that $|X| > k|Y|$ then there exist at
least $k + 1$ distinct points, say, $x_1, x_2, \ldots, x_{k+1}$ in $X$ such that
$f(x_1) = f(x_2) = \cdots = f(x_{k+1})$. ∎
The reason for calling this result a 'theorem' is that the second statement
in it is very famous by a different name. It is called the pigeon-hole
principle. In essence it says that when we put letters into pigeonholes, if
there are more letters than there are pigeonholes, then at least one
pigeonhole must contain more than one letter. This seems to say little more
than common sense. But once again, although the pigeonhole principle itself is
trivial, when cleverly applied, it can yield nontrivial results. We illustrate
this with two problems.
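Before turning to the problems, the general form of the principle (more than $k|Y|$ points forcing $k + 1$ of them onto a common value) can be illustrated by a small search. This is a sketch only; the helper name `crowded_fibre` and the sample function are hypothetical choices made for the illustration.

```python
from collections import defaultdict

def crowded_fibre(f, X, k):
    """Return k + 1 distinct points of X sharing one image under f, or None."""
    fibres = defaultdict(list)          # fibres[y] collects the preimages of y
    for x in X:
        y = f(x)
        fibres[y].append(x)
        if len(fibres[y]) == k + 1:
            return fibres[y]
    return None

# Domain of size 25, codomain {0, 1, ..., 7} of size 8, and 25 > 3 * 8,
# so some value must be taken at least 4 times.
square_mod_8 = lambda x: (x * x) % 8
hits = crowded_fibre(square_mod_8, range(25), k=3)
assert hits is not None and len(hits) == 4
assert len({square_mod_8(x) for x in hits}) == 1
```

When $|X| \le k|Y|$ the search may legitimately fail, as it does for an injective function with $k = 1$.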
2.10 Problem: Suppose 14 students in a class appear at a university
examination. Prove that there exist at least two among them whose seat
numbers differ by a multiple of 13. (There is nothing very special about
the number 13 here. In Western countries there is a superstition that it
signifies evil.)
Solution: Let $S$ be the set of 14 students. The trick is to define a suitable
function $f$ on $S$ in such a way that $f(x) = f(y)$ would imply that the seat
numbers of the students $x$ and $y$ differ by a multiple of 13. For this we
note that given positive integers $p$ and $q$, we can divide $p$ by $q$ and get a
quotient $a$ and a remainder $r$; that is, we can find integers $a$ and $r$ such
that $p = aq + r$ where $r$, being the remainder, can only have the values
$0, 1, \ldots, q-1$. (This is formally known as the Euclidean Algorithm and we
shall study its consequences in the chapter on rings. Actually, $p$ can be a
negative integer, but we always require the remainder to be non-negative.)
So we define $f: S \to \{0, 1, \ldots, 12\}$ by $f(x)$ = remainder left when the seat
number of $x$ is divided by 13. Here the domain has cardinality 14 and the
codomain has cardinality 13. So the pigeon-hole principle applies and we get
that there exist distinct students $x$ and $y$ such that $f(x) = f(y)$. Let the seat
numbers of $x$ and $y$ be $m$ and $n$. Then, by definition, there exist integers
$a$ and $b$, respectively, such that $m = 13a + f(x)$ and $n = 13b + f(y)$.
Since $f(x) = f(y)$, we get that $m - n = 13(a - b)$ which is clearly a multiple
of 13 since $a - b$ is an integer. ∎
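The function used in this solution is easy to compute, so the argument can be tried on actual data. The sketch below assumes the seat numbers are distinct positive integers; the name `same_remainder_pair` and the randomly generated seat numbers are illustrative assumptions.

```python
import random

def same_remainder_pair(seat_numbers, q=13):
    """Return two of the given numbers with equal remainders modulo q, or None."""
    seen = {}                       # remainder -> first seat number having it
    for m in seat_numbers:
        r = m % q                   # Python's % is non-negative for positive q
        if r in seen:
            return seen[r], m
        seen[r] = m
    return None

seats = random.sample(range(1, 100000), 14)     # 14 distinct seat numbers
m, n = same_remainder_pair(seats)               # cannot be None: 14 > 13
assert m != n and (m - n) % 13 == 0
```

With only 13 students the conclusion can fail: the seat numbers 1, 2, ..., 13 have 13 different remainders.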
For the next problem, we urge the reader to try it first on his own. (It
can actually be done without the use of the pigeon-hole principle. But the
argument becomes clumsy.) Many problems in combinatorics (and more
generally in mathematics) look deceptively simple once you know their
solutions. In order to truly appreciate the solution it is necessary
to give the problem a serious try.
2.11 Problem: If 101 integers are selected from the set $\{1, 2, \ldots, 200\}$,
prove that among the selected integers there exist two integers such that
one of them is a multiple of the other.
Solution: Before giving the solution let us see what is so special about 101.
When one integer is a multiple of some other, it is at least twice as big as
the other. Therefore if two integers selected from $\{1, 2, \ldots, 200\}$ are both
'large', neither could be a multiple of the other. In particular if we select
the 100 integers from 101 to 200, the result does not hold. Now
if we select just one more integer, say $x$, then it has to be from 1 to 100
and so either $2x$ or $4x$ or $8x$ etc. is among 101 to 200 and so the result
holds. Of course, this does not solve the problem (because we are not given
that the selected integers must include all integers from 101 to 200), but it
suggests that twofolds may have some special role to play in the solution.
Now, for the solution itself, we let $S$ be once again the set of selected
integers and then construct a suitable function $f$ with domain $S$. Let $x \in S$.
Then either $x$ is odd or else it is a twofold of some integer $y$, i.e. $x = 2y$.
Again $y$ is itself odd or else it is a twofold of some $z$, $y = 2z$. We go on
repeating this until we get an odd integer (which may be equal to 1).
Formally, we write $x$ as $2^r t$ where $r$ is a non-negative integer and $t$ is odd. We
now define $f(x)$ to be this odd integer $t$. Because $x$ is from 1 to 200, $f(x)$ can
be from 1 to 199; thus $f$ takes values only in the set $\{1, 3, 5, 7, \ldots, 197, 199\}$.
This set has cardinality 100, less than that of the domain $S$. So by the
pigeonhole principle there exist distinct $x, y \in S$ such that $f(x) = f(y)$.
By definition, $x = 2^r f(x)$ and $y = 2^s f(y)$ for some non-negative integers $r$
and $s$. Since $f(x) = f(y)$ we see that $x/y = 2^{r-s}$ which is an integer if $r \ge s$,
while if $s \ge r$ then $y/x$ is an integer. In either case the assertion of the
problem holds. ∎
We have proved something stronger than the problem asks. We have
found $x$ and $y$ among the selected integers such that $x/y$ (or $y/x$) is not only
an integer but a power of 2. If the problem had been stated like this, perhaps
we would have got some hint of the proof. This is the case with some
problems. A stronger assertion is easier to prove because it carries with it a
built-in clue for the solution.
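The odd-part function of the solution to Problem 2.11 can be computed directly, which turns the existence proof into a search procedure. The following is a sketch under our own naming (`odd_part`, `multiple_pair`); note that it looks only for pairs with the same odd part, which is the stronger property established above.

```python
import random

def odd_part(x):
    """The odd integer t in the factorisation x = 2**r * t."""
    while x % 2 == 0:
        x //= 2
    return x

def multiple_pair(selected):
    """Return (a, b) with equal odd parts and a < b (so b = 2**k * a), or None."""
    seen = {}                           # odd part -> smallest number having it
    for x in sorted(selected):
        t = odd_part(x)
        if t in seen:
            return seen[t], x
        seen[t] = x
    return None

# Any 101 of the integers 1..200 contain such a pair.
a, b = multiple_pair(random.sample(range(1, 201), 101))
assert b % a == 0 and odd_part(b // a) == 1     # the quotient is a power of 2

# With only 100 integers the search can fail: 101..200 have distinct odd parts.
assert multiple_pair(range(101, 201)) is None
```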
The propositions we have proved so far also allow us to find the
cardinality of the cartesian product of sets (as remarked in the last section). The
result which follows is popularly called the principle of multiplication.
2.12 Proposition: Let $X$, $Y$ be finite sets. Then $|X \times Y| = |X| \times |Y|$.
Proof: Let $m = |X|$ and $n = |Y|$. If $n = 0$, then $Y$ and $X \times Y$ are both
empty and the result holds trivially. Assume $n > 0$ and let the distinct
elements of $Y$ be $y_1, y_2, \ldots, y_n$. For each $i = 1, 2, \ldots, n$ let $A_i$ be the set
$X \times \{y_i\}$. Then $A_i$ is a subset of $X \times Y$ (cf. Exercise (1.5)) and
$X \times Y = \bigcup_{i=1}^{n} A_i$.
Also for $i \ne j$, $y_i \ne y_j$ and so $A_i \cap A_j = \emptyset$. Further, for every $i$, the function
$f_i: X \to A_i$ defined by $f_i(x) = (x, y_i)$ is clearly a bijection. So by Proposition
(2.8), Part (i), $|A_i| = m$ for every $i$. The result now follows from Corollary
(2.4). (A pictorial representation of the proof is given in Figure 2.5.) ∎
[Figure 2.5: Cardinality of a Cartesian Product.]
The preceding result can be readily extended, by induction, to the case
of the product of n sets, the same way as Proposition (2.3) is an extension
of Axiom (2.2). The proof is left as an exercise.
2.13 Proposition: For any finite sets $A_1, A_2, \ldots, A_n$ we have
$|A_1 \times A_2 \times \cdots \times A_n| = |A_1| \times |A_2| \times \cdots \times |A_n|$.
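The decomposition used in the proof of the principle of multiplication, and its extension to several factors, can be checked on small sets. This is a sketch with hypothetical example sets; the helper `slices` is our own name for the family of sets $X \times \{y_i\}$.

```python
from itertools import product
from math import prod

def slices(X, Y):
    """The sets X x {y}, one for each y in Y, keyed by y."""
    return {y: {(x, y) for x in X} for y in Y}

X, Y = {"a", "b", "c"}, {1, 2}
A = slices(X, Y)

# Each slice is a copy of X, and the (disjoint) slices exhaust X x Y.
assert all(len(A_i) == len(X) for A_i in A.values())
assert set().union(*A.values()) == set(product(X, Y))
assert sum(len(A_i) for A_i in A.values()) == len(X) * len(Y)

# The same count for a product of several finite sets.
factors = [X, Y, {True, False}, range(5)]
assert len(list(product(*factors))) == prod(len(F) for F in factors)
```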
The result of Proposition (2.12) is often expressed by saying that if
something can be done in $m$ ways and when it is done in any one of these
ways, some other thing can be done in $n$ ways then the two things can
together be done in $mn$ ways. (In Proposition (2.12), the first 'thing' is to
choose an element of $X$ and the second is to pick an element of $Y$.)
Although this formulation is somewhat loose, it helps in many problems. In
fact, the number of ways to do the second thing need not be independent
of the way the first thing is done. Thus if the first thing can be done in $m$
ways and when it is done in the $i$-th way the second thing can be done in
$n_i$ ways, for $i = 1, 2, \ldots, m$, then the two things can be done together in $N$
ways where $N = n_1 + n_2 + \cdots + n_m$. The reasoning is analogous to the proof
of Proposition (2.12). As an application, if there are five men having 2, 3, 1, 0
and 2 wives respectively then a couple can be invited in 2 + 3 + 1 + 0 + 2
ways, i.e., in 8 ways. The extension to the case of $n$ things (instead of 2) is
also obvious.
Using this reasoning we count the number of functions from one set to
another. Because of its basic character, we record the result as a theorem.
2.14 Theorem: Let $X$ and $Y$ be finite sets. Then the number of distinct
functions from $X$ to $Y$ is $|Y|^{|X|}$.
Proof: If $X$ is empty then there is only one function from $X$ to $Y$ and the
assertion holds with the convention that any integer raised to 0 is 1. Assume
$X$ is non-empty. If $Y$ is empty, there can be no function from $X$ to $Y$ and
again the result holds. So assume $X$, $Y$ are both non-empty. Let the distinct
elements of $X$ be $x_1, x_2, \ldots, x_m$ where $m = |X|$. Similarly let the distinct
elements of $Y$ be $y_1, y_2, \ldots, y_n$ with $n = |Y|$. Now a function from $X$ to $Y$
assigns to each $x_i$ some $y_j$. Since there is no restriction on the functions,
each $x_i$ can be mapped independently of the others. Thus there are $n$
possible ways to map each element of $X$. Since $X$ has $m$ elements, it follows
that there are $n^m$ functions in all. ∎
Because of this result, the set of all functions from a set $X$ to a set $Y$ is
often denoted by $Y^X$.
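The count of functions, including the two degenerate cases treated at the start of the proof, can be reproduced by listing the functions explicitly. In this sketch a function is represented as a Python dictionary; that representation and the name `all_functions` are our assumptions.

```python
from itertools import product

def all_functions(X, Y):
    """List every function from X to Y as a dict, choosing each image freely."""
    X, Y = list(X), list(Y)
    return [dict(zip(X, images)) for images in product(Y, repeat=len(X))]

assert len(all_functions("abc", [0, 1])) == 2 ** 3   # |Y|^|X| with |X| = 3, |Y| = 2
assert len(all_functions("ab", "xyz")) == 3 ** 2
assert len(all_functions([], [0, 1])) == 1           # only the empty function
assert len(all_functions("ab", [])) == 0             # no function into the empty set
```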
Combining the last theorem with Proposition (2.8), we can count the
cardinality of the power set (i.e., the set of all subsets) of a given set.
Although the result can be derived directly by induction, the proof is
instructive as a culmination of the work done so far.
2.15 Theorem: Let $X$ be a finite set and $P(X)$ its power set. Then
$|P(X)| = 2^{|X|}$.
Proof: Let $Z_2$ be the set $\{0, 1\}$. Let $F$ be the set of all functions from $X$
to $Z_2$. By the theorem above, $|F| = 2^{|X|}$ and so the proof would be
completed if we find some bijection between $P(X)$ and $F$. For this, define
$\theta: P(X) \to F$ as follows. If $A$ is a subset of $X$, we let $f_A: X \to Z_2$ be the
characteristic function of $A$, defined by $f_A(x) = 1$ if $x \in A$ and $0$ if $x \notin A$ (see
Exercise (1.13)). We define $\theta(A) = f_A$. (A beginner often finds this definition
confusing. But it need not be so. Note that the codomain of the function
$\theta$ is $F$, which itself consists of functions from $X$ to $Z_2$. So if we take a typical
element, say $A$, of the domain of $\theta$ (viz., $P(X)$), then $\theta(A)$ itself should be a
function from $X$ to $Z_2$. We let this function be the characteristic function of $A$.
Thus we can write $[\theta(A)](x) = 1$ or $0$ according as $x \in A$ or $x \notin A$.) Now to
show that $\theta$ is a bijection, we first show it is one-to-one. Suppose $\theta(A) = \theta(B)$
where $A, B \in P(X)$. We must show $A = B$. Let $x \in A$. Then $[\theta(A)](x) = 1$.
So $[\theta(B)](x) = 1$, which means $x \in B$. Thus $A \subset B$. Similarly $B \subset A$ and so
$A = B$. Next, to prove that $\theta$ is onto, let $f \in F$. This means $f$ is some
function from $X$ into $Z_2$ and we have to find some $A \in P(X)$ such that $\theta(A) = f$,
i.e., such that $f$ equals the characteristic function of $A$. Obviously we have
no choice but to let $A = f^{-1}(\{1\})$. Then $A \subset X$, i.e., $A \in P(X)$. Now if
$x \in A$, then $x \in f^{-1}(\{1\})$ and so $f(x) = 1$; while if $x \notin A$ then $f(x) \ne 1$, but
then $f(x) = 0$ because $f$ assumes only two possible values, namely, 1 and 0.
Therefore $f(x) = 1$ or $0$ according as $x \in A$ or $x \notin A$. This is precisely the
definition of the characteristic function of $A$. So $f = f_A = \theta(A)$, proving that
$\theta$ is onto. Since $\theta$ is already injective, it is a bijection and, as noted earlier,
this completes the proof. ∎
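The bijection between subsets and characteristic functions can be made concrete. In the sketch below a function from $X$ to $\{0, 1\}$ is stored as a tuple of bits listed in a fixed order of the elements of $X$; this encoding and the names `theta` and `theta_inverse` are assumptions made for the illustration.

```python
from itertools import product

def theta(A, X):
    """Characteristic function of A, stored as a tuple of bits indexed like X."""
    return tuple(1 if x in A else 0 for x in X)

def theta_inverse(f, X):
    """The subset of X on which the 0/1 tuple f takes the value 1."""
    return frozenset(x for x, bit in zip(X, f) if bit == 1)

X = ["p", "q", "r"]
F = list(product((0, 1), repeat=len(X)))      # all functions from X to {0, 1}

# Distinct functions give distinct subsets, and every function is hit by theta.
subsets = {theta_inverse(f, X) for f in F}
assert len(subsets) == len(F) == 2 ** len(X)
assert all(theta(theta_inverse(f, X), X) == f for f in F)
```

The second assertion is the 'onto' half of the proof: the only candidate for a preimage of $f$ is the set where $f$ equals 1, and it works.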
In Theorem (2.14), there was no restriction on the functions (other than
those imposed by the very definition of a function, namely that it be
single-valued and defined over the entire domain set). If we want to count only
functions satisfying some additional requirements, the reasoning has to be
tailored to the type of requirements and in many cases is far from easy.
For example, we count below the number of injective functions from one
set to another. But there is no easy formula for the number of surjective
functions. Before proving the result, we paraphrase the problem slightly,
because it is the paraphrased version that is better known. Let $n$ and $r$
be positive integers. By an $r$-permutation of $n$ objects (or symbols) we mean
an ordered arrangement of $r$ of these objects, or in other words a sequence
of length $r$ in which no object appears more than once. We can make this
precise using the language of functions. Let $Y$ denote the set of $n$ objects.
A sequence of length $r$ of these objects obviously corresponds to a function
$f$ from the set $\{1, 2, \ldots, r\}$ to the set $Y$. The requirement that no object
appears more than once is equivalent to saying that the function $f$ is
injective. We can therefore state our results in the form of $r$-permutations of
$n$ objects.
2.16 Theorem: The number of $r$-permutations of $n$ objects (or equivalently,
the number of injective functions from a set of cardinality $r$ to a set of
cardinality $n$) is the product $n(n-1)(n-2) \cdots (n-r+1)$.
Proof: We simply have to modify the proof of Theorem (2.14) slightly. Let
$X = \{x_1, x_2, \ldots, x_r\}$ and $Y = \{y_1, y_2, \ldots, y_n\}$ be sets of cardinalities $r$ and $n$
respectively. Let $f$ be an injective function from $X$ to $Y$. Then $f(x_1)$ can be
anything from $\{y_1, y_2, \ldots, y_n\}$. Once $f(x_1)$ is fixed, say $f(x_1) = y_j$, then,
however, $f(x_2)$ cannot equal $y_j$ because $f$ is one-to-one. So, for $f(x_2)$ there are
only $n-1$ choices, namely $y_1, y_2, \ldots, y_{j-1}, y_{j+1}, \ldots, y_n$. When $f(x_2)$ has been
fixed, $f(x_3)$ will have $n-2$ possible choices. Continuing this way, we see that
$f$ can be constructed in $n(n-1)(n-2) \cdots (n-r+1)$ ways. This is,
therefore, the total number of injective functions from $X$ to $Y$, or equivalently,
the number of $r$-permutations of $n$ objects. ∎
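The product in Theorem 2.16 can be compared with a brute-force listing. The sketch below uses the standard library routines `itertools.permutations` and `math.perm`; the helper `falling_product` is our own name for the product $n(n-1)\cdots(n-r+1)$.

```python
from itertools import permutations
from math import perm

def falling_product(n, r):
    """The product n(n-1)...(n-r+1) of r factors (equal to 1 when r = 0)."""
    result = 1
    for i in range(r):
        result *= n - i
    return result

for n in range(6):
    for r in range(n + 3):
        listed = sum(1 for _ in permutations(range(n), r))   # injective sequences
        assert listed == falling_product(n, r) == perm(n, r)
```

The loop deliberately runs past $r = n$: for $r > n$ the product contains the factor 0 and the listing is empty.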
If $r > n$, then the expression in the last theorem has 0 as a factor and
so we see that there can be no $r$-permutation of $n$ objects for $r > n$. Of
course this is also obvious from the pigeon-hole principle. But it is always
nice (and comforting) to note how a general formula is consistent with its
special cases proved earlier by different reasonings.
The case $r = n$ is especially interesting. An $n$-permutation of $n$ objects is
simply called their permutation. We already defined this term in the last
section and in view of the comments we made before Theorem (2.16), the
two meanings are consistent with each other. From the formula above, the
number of permutations of a set with $n$ elements is $n(n-1)(n-2) \cdots 3 \cdot 2 \cdot 1$.
It is the product of the first $n$ positive integers. This expression appears so
frequently in combinatorics that it is convenient to have a shorthand
notation for it. It is denoted by $n!$ or by $\underline{|n}$ (read as '$n$ factorial'). By
convention we set $0! = 1$. The number of $r$-permutations of $n$ objects is
also often denoted by a special symbol, ${}_nP_r$ (or ${}^nP_r$). Clearly
${}_nP_r = \frac{n!}{(n-r)!}$ for $0 < r \le n$.
For example, let $n = 3$. Then $n! = 6$. The six permutations of a
3-element set, say $\{a, b, c\}$, are given by abc, acb, bca, bac, cab and cba. There are
12 2-permutations of a set with 4 elements, say a, b, c, d. They are ab, ac,
ad, ba, bc, bd, ca, cb, cd, da, db and dc. The function $n!$ grows very rapidly
as $n$ grows. Even for relatively small values of $n$ such as $n = 8$ (note that $8!$
exceeds 40,000), if we want to list all permutations of $n$ symbols, we cannot
do so haphazardly. If we do, we are very prone to miss some permutation
or to repeat some permutation. Systematic methods for listing the
permutations of a given set will be studied in the next chapter.
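For readers with a computer at hand, a library routine already lists permutations systematically. The sketch below relies on Python's `itertools.permutations`, which emits arrangements in lexicographic order of the input positions; it is not the method of the next chapter, only a convenient check of the small examples above.

```python
from itertools import permutations
from math import factorial

three = ["".join(p) for p in permutations("abc")]
print(three)          # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']

two_of_four = ["".join(p) for p in permutations("abcd", 2)]
assert len(two_of_four) == 12 and len(set(two_of_four)) == 12

# Nothing is missed or repeated even when the list is long: 8! = 40320.
eight = list(permutations(range(8)))
assert len(eight) == len(set(eight)) == factorial(8) == 40320
```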
We often have to consider arrangements of objects in which some of
the objects are indistinguishable from each other. For example, suppose we
have to form strings containing, say, 3 red beads, 2 blue beads and 1
white bead. Beads of the same colour are to be regarded as
indistinguishable and so two strings which result from each other by mere reshufflings
within beads of the same colour are to be regarded as identical. (As
another example, if we want to form different words by permuting the
letters of some given word, say, 'content', we do not distinguish between
the two n's and the two t's.) The general problem can be represented as
follows. Suppose we have $n$ objects which are classified into $k$ distinct
types, so that no two objects of the same type are to be distinguished from
each other. Suppose there are $n_1$ objects of type 1, $n_2$ objects of type 2, ..., $n_k$
objects of type $k$ where $n_1, n_2, \ldots, n_k$ are positive integers (it may happen
that $n_i = 1$ for some values of $i$). Clearly $n_1 + n_2 + \cdots + n_k = n$. The
question now is how many distinct permutations of these $n$ objects there
are. The following theorem provides the answer.
2.17 Theorem: The number of distinct permutations of a set with $n_1$
objects of type 1, $n_2$ objects of type 2, ..., $n_k$ objects of type $k$ equals
$$\frac{(n_1 + n_2 + \cdots + n_k)!}{n_1!\, n_2! \cdots n_k!}.$$
Proof: The proof will be an application of the generalisation of (i) in
Proposition (2.8) (see the comment following its proof). Let us denote the objects of
type 1 by $a_1, a_2, \ldots, a_{n_1}$. Even though they are to be regarded as identical,
because of our notation we distinguish among them for a moment. Similarly
denote the objects of type 2 by $b_1, b_2, \ldots, b_{n_2}$; those of type 3 by
$c_1, c_2, \ldots, c_{n_3}$ and so on. (It may be argued that if $k$ is large, we may run out of the
alphabet. But we can always introduce additional symbols if necessary.
Alternatively, we can use double suffixes. Thus, the objects of type 1 will
be denoted by $x_{11}, x_{12}, \ldots, x_{1n_1}$, those of type 2 by $x_{21}, x_{22}, \ldots, x_{2n_2}$ and so
on.)
Now let $n = n_1 + n_2 + \cdots + n_k$ and let $S$ be the set of these $n$ 'distinct'
objects. Let $X$ be the set of all permutations of $S$. By Theorem (2.16),
$|X| = n!$. Let $Y$ be the set of all permutations of the given objects, that is,
where we do not distinguish between two objects of the same type. Our
goal is to find the cardinality of $Y$ and we do so in terms of the
cardinality of $X$ by defining a suitable function $f: X \to Y$ as follows.
Take any typical element in the set $X$. This is some permutation of the
$n$ objects. Now we merely remove the subscripts of the objects in this
permutation. We then get an element of $Y$, i.e., a permutation of the
original set of objects. We denote this assignment by the function $f$. For
example, $f(b_1 a_3 c_2 a_1 a_2 c_1 d_1 b_2) = bacaacdb$. Let us now examine how many
pre-images each point of $Y$ has under $f$. Note that every element of $Y$ is a
permutation in which there are $n_1$ a's (without suffixes), $n_2$ b's, $n_3$ c's,
and so on. Now we can assign the $n_1$ suffixes to these $n_1$ a's (of course we
must not assign the same suffix twice), the $n_2$ suffixes to the $n_2$ b's and so on.
When we do this, we shall get an element of $X$ which will be mapped
under $f$ to the element of $Y$ with which we started. (For example if we take
$bacaacdb$ in $Y$, then
$b_1 a_1 c_1 a_2 a_3 c_2 d_1 b_2$, $b_1 a_1 c_2 a_2 a_3 c_1 d_1 b_2$, $b_2 a_3 c_1 a_2 a_1 c_2 d_1 b_1$
and many other elements of $X$ are all mapped to it under $f$.) Now, the $n_1$
suffixes can be put on the $n_1$ a's in $n_1!$ ways (because each suffix is to be used
only once). Similarly the assignment of suffixes to the b's can be made in
$n_2!$ ways, to the c's in $n_3!$ ways and so on. Together, for a given element of
$Y$, we can assign the suffixes in $n_1! \times n_2! \times \cdots \times n_k!$ ways and thereby get just
these many points of $X$ which are all taken to the same element of $Y$
under $f$.
Thus, every point of $Y$ has exactly $n_1! \times n_2! \times \cdots \times n_k!$ preimages under
$f$. So by the extension of Proposition (2.8),
$$|X| = (n_1! \times n_2! \times \cdots \times n_k!)\,|Y|.$$
Since $|X|$ is already known to be $n!$ and $n = n_1 + n_2 + \cdots + n_k$, it
follows that
$$|Y| = \frac{(n_1 + n_2 + \cdots + n_k)!}{n_1!\, n_2! \cdots n_k!}$$
as was to be proved. ∎
As an application, by rearranging the letters of the word 'content' we
get $\frac{7!}{2!\,2!}$ or 1,260 distinct words. (Of course, not all these words are
meaningful, but that is a problem of English rather than of mathematics.)
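The figure 1,260 can be confirmed in two ways: from the formula of Theorem 2.17, and by actually 'removing the suffixes', that is, collapsing all $7!$ arrangements of the seven letter positions into distinct words. The function name `distinct_arrangements` below is our own; the rest is standard library.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_arrangements(word):
    """(n_1 + ... + n_k)! / (n_1! ... n_k!) for the letter counts of word."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)      # exact: each quotient is an integer
    return total

assert distinct_arrangements("content") == 1260

# Brute force: the 7! = 5040 arrangements of positions collapse to 1260 words,
# so each word has 5040 / 1260 = 4 = 2! * 2! preimages.
words = Counter(permutations("content"))
assert len(words) == 1260
assert set(words.values()) == {factorial(2) * factorial(2)}
```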
The reasoning used above can also be used to find the number of
$r$-combinations of $n$ objects. Formally, an $r$-selection or $r$-combination of $n$
objects is an unordered arrangement of $r$ distinct objects out of these
objects. In the language of sets, if $S$ is the set of $n$ objects then an
$r$-combination of these objects is nothing but a subset of $S$ of cardinality $r$, or an
$r$-subset of $S$. To find the number of such $r$-subsets, let $X$ be the set of all
$r$-permutations and $Y$ the set of all $r$-combinations of $n$ objects, say
$a_1, a_2, \ldots, a_n$. We define $f: X \to Y$ by ignoring the order. That is, if we take
any $r$-permutation, say, $a_{i_1}, a_{i_2}, \ldots, a_{i_r}$, then we let $f(a_{i_1}, a_{i_2}, \ldots, a_{i_r})$ be
simply the set $\{a_{i_1}, a_{i_2}, \ldots, a_{i_r}\}$ which is an $r$-subset. If we permute
$a_{i_1}, a_{i_2}, \ldots, a_{i_r}$ among themselves, we get the same $r$-subset, i.e. the value of $f$ is
unaffected. But there are exactly $r!$ distinct permutations of $r$ objects. So
every point of $Y$ has exactly $r!$ preimages. So by the extension of
Proposition (2.8) (i), we get $|X| = r!\,|Y|$. We already know that
$$|X| = {}_nP_r = \frac{n!}{(n-r)!}.$$
We therefore get the following important result.
2.18 Theorem: The number of $r$-combinations of $n$ objects
equals $\frac{n!}{r!\,(n-r)!}$ (which also equals $\frac{n(n-1) \cdots (n-r+1)}{r!}$). ∎
This number appears so frequently in mathematics that it is convenient
to have a short notation for it. It is commonly denoted by ${}_nC_r$, ${}^nC_r$ or by
$\binom{n}{r}$ and read as 'n-choose-r' or 'n-c-r'. It is also called a binomial coefficient
because it appears in the well-known binomial theorem, which we shall study
later. Note incidentally that
$$\binom{n}{r} = \binom{n}{n-r}, \quad \text{for } 0 \le r \le n.$$
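The passage from $r$-permutations to $r$-combinations can be replayed literally: group the $r$-permutations by the set they determine and observe that every group has $r!$ members. This is a sketch; the name `count_by_fibres` is ours, and `math.comb` is used only as an independent check.

```python
from itertools import permutations
from math import comb, factorial

def count_by_fibres(n, r):
    """Count r-subsets of an n-set by grouping r-permutations with equal sets."""
    fibres = {}                                  # r-subset -> its preimages
    for p in permutations(range(n), r):
        fibres.setdefault(frozenset(p), []).append(p)
    # Every r-subset has exactly r! preimages, so |X| = r! |Y|.
    assert all(len(pre) == factorial(r) for pre in fibres.values())
    return len(fibres)

assert count_by_fibres(6, 3) == comb(6, 3) == 20
assert count_by_fibres(5, 5) == 1

# The symmetry noted above.
assert all(comb(10, r) == comb(10, 10 - r) for r in range(11))
```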
There are numerous identities involving ${}_nC_r$ (and sometimes ${}_nP_r$). They
are called combinatorial identities because they can generally be proved
by a combinatorial argument, which consists of showing that both the
sides of the identity represent the cardinality of the same set, counted in
two different ways. Of course sometimes it is just as easy to establish the
identity purely algebraically. For example in the identity
$$\binom{n}{r} = \binom{n}{n-r}$$
given above each side equals
$$\frac{n!}{r!\,(n-r)!}.$$
For a combinatorial proof, we observe that selecting $r$ objects out of $n$
amounts to rejecting the remaining $(n-r)$ objects. The $n-r$ objects to be
rejected can be chosen from $n$ objects in $\binom{n}{n-r}$ ways. As another
illustration, we prove one more identity.
2.19 Proposition: For $1 \le r < n$, we have
$$\binom{n}{r} = \binom{n-1}{r} + \binom{n-1}{r-1}.$$
Proof: An algebraic proof can be given by expanding the terms. For a
combinatorial proof, note that the left hand side represents the number of
ways to select $r$ objects out of $n$ objects. Let $S$ be the set of these $n$ objects.
Fix some $x \in S$ and let $T = S - \{x\}$. Then $|T| = n-1$. The $r$-subsets of $S$
are of two types: those containing $x$ and those not containing $x$. An
$r$-subset of the first type will have its remaining $r-1$ elements from $T$ while an
$r$-subset of the second type will have all its $r$ elements from $T$. Since
$|T| = n-1$, $T$ will have $\binom{n-1}{r-1}$ subsets of cardinality $r-1$ and $\binom{n-1}{r}$ subsets
of cardinality $r$. Together, there will be $\binom{n-1}{r-1} + \binom{n-1}{r}$ subsets of $S$
having $r$ elements each.
Although this argument is reasonably clear, it is instructive to formulate
it rigorously so as to bring out exactly which of the results in this section
have been used where. We let $P_r(S)$ be the set of all $r$-subsets of $S$. Now let
$Q = \{A \in P_r(S) : x \in A\}$ and $R = \{A \in P_r(S) : x \notin A\}$. Then $Q \cap R = \emptyset$ and
$Q \cup R = P_r(S)$. So by Axiom 2.2, $|P_r(S)| = |Q| + |R|$. There is evidently a
bijection between $Q$ and $P_{r-1}(T)$. Hence $|Q| = |P_{r-1}(T)|$ by Proposition
2.8. Similarly there is a bijection between $R$ and $P_r(T)$ and so $|R| = |P_r(T)|$.
The result now follows by substituting the values of $|P_r(S)|$, $|P_{r-1}(T)|$
and $|P_r(T)|$ given by Theorem 2.18. ∎
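The splitting of the $r$-subsets into those containing a fixed element and those avoiding it can be carried out explicitly for a small set. The sketch below uses hypothetical names (`split_r_subsets`, and the particular set and element chosen).

```python
from itertools import combinations
from math import comb

def split_r_subsets(S, r, x):
    """Split the r-subsets of S into Q (containing x) and R (not containing x)."""
    subsets = [frozenset(c) for c in combinations(S, r)]
    Q = [A for A in subsets if x in A]
    R = [A for A in subsets if x not in A]
    return Q, R

S, r, x = range(1, 8), 3, 1            # n = 7, so T = S - {x} has 6 elements
Q, R = split_r_subsets(S, r, x)

assert len(Q) == comb(6, 2)            # Q corresponds to the (r-1)-subsets of T
assert len(R) == comb(6, 3)            # R is exactly the set of r-subsets of T
assert len(Q) + len(R) == comb(7, 3)
```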
Combinatorial arguments can also be used to prove some other results.
As a somewhat trivial example, we prove:
2.20 Proposition: For every non-negative integer $n$, $n < 2^n$.
Proof: Let $X$ be a set with $|X| = n$. Let $P(X)$ be, as usual, the power set
of $X$. Define $f: X \to P(X)$ by $f(x) = \{x\}$ for $x \in X$. Then $f$ is one-to-one. So
by Proposition 2.8, $|X| = |R|$ where $R$ is the range of $f$. Note that $R$ is a
proper subset of $P(X)$, because $\emptyset$, the empty set, is in $P(X)$ but not in $R$.
So by Corollary 2.6, $|R| < |P(X)|$. Moreover, by Theorem 2.15, $|P(X)|
= 2^n$. The result now follows. ∎
The fact that the cardinality of a set must always be a whole number
implies that whenever it is expressed as a ratio of two integers, the
denominator must divide the numerator. As an application of this type of
reasoning we prove one result.
2.21 Theorem: The product of any $r$ consecutive positive integers is
divisible by $r!$.
Proof: Let the integers be $n, n+1, n+2, \ldots, n+r-1$. We have to show
that $n(n+1) \cdots (n+r-1)$ is divisible by $r!$. But
$\frac{n(n+1) \cdots (n+r-1)}{r!}$ is precisely $\binom{n+r-1}{r}$, which is the number of
$r$-subsets of a set with $n+r-1$ elements. So
$\frac{n(n+1) \cdots (n+r-1)}{r!}$ is an integer. ∎
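A numerical check makes the identification with a binomial coefficient visible. The following sketch tests a range of small cases only (it is not a proof); `consecutive_product` is our own helper name.

```python
from math import comb, factorial, prod

def consecutive_product(n, r):
    """The product n(n+1)...(n+r-1) of r consecutive integers starting at n."""
    return prod(range(n, n + r))

for n in range(1, 30):
    for r in range(1, 10):
        p = consecutive_product(n, r)
        assert p % factorial(r) == 0                      # divisible by r!
        assert p // factorial(r) == comb(n + r - 1, r)    # and the quotient counts r-subsets
```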
These applications, interesting as they are, are somewhat incidental.
The real applications of the preceding theorems are in actual counting
problems. They will be given in the next section. We conclude the present
section with a brief discussion of infinite sets. The results we have proved
hold only for finite sets. Most of them are either meaningless or false for
infinite sets. That is why the study of infinite sets requires substantially
different techniques.
The difficulty begins right from the definition of the cardinal number of
an infinite set because the intuitive description of it as the number of
distinct elements in the set fails. It is tempting to set the cardinality of all
infinite sets as $\infty$. But this solution is not satisfactory for certain purposes.
For example, let N be the set of positive integers and R the set of real
numbers. Then both are infinite sets. But it can be shown that there cannot
exist any bijection between them. So if we assign them the same cardinality,
then it will not be a true measure of their sizes. A bijection from one set
to another simply amounts to renaming the elements of the domain set
with the corresponding elements of the codomain. We would certainly not
like to sacrifice the existence of a bijection as a criterion for equality of
cardinality of two sets, whatever be the formal definition of cardinal
numbers.
The formal approach to the cardinality of sets (in particular, infinite
sets) starts by defining two sets to have the same cardinal number if there
exists a bijection between them. One then takes certain 'standard' sets and
shows that given any 'abstract' set $X$, there is precisely one standard set
$Y$ such that there exists a bijection between $X$ and $Y$. We then define the
cardinal number of $X$ as the set $Y$. It may appear strange at first sight that
we define equality of cardinal numbers even before we define cardinal
numbers. But it need not be so. The situation is very much like weighing objects
with a balance. A balance alone cannot tell the weight of an object. It
merely tells whether two objects are of equal weight. To find the weight of
a given object, we balance it with some standard weight, which is then
called the weight of the object. For finite sets, the 'standard' sets are, of
course, the sets of the form $\{1, 2, \ldots, n\}$ for positive integer $n$.
Having defined cardinal numbers, one goes on to define their addition
and multiplication and study their properties. This branch of mathematics
is called cardinal arithmetic. We shall not study it but remark that it is not
just a routine extension of the arithmetic of finite cardinal numbers. To
illustrate what goes wrong let $X$, $Y$, $Z$ be respectively the sets
$\{n \in N : n \ge 2\}$, $\{n \in N : n \text{ even}\}$ and $\{n \in N : n \text{ odd}\}$. These are all proper subsets
of $N$. But there exists a bijection between $N$ and each of these sets. For
example, define $f: N \to X$ by $f(n) = n + 1$ for $n \in N$; $g: N \to Y$ by $g(n) = 2n$
and $h: N \to Z$ by $h(n) = 2n - 1$. Thus the sets $X$, $Y$, $Z$ all have the same
cardinality as $N$. This defies our intuition that a part cannot be equal to
the whole (cf. Corollary (2.6)). Note also that $Y$ and $Z$ are complementary
subsets whose union is $N$. Still each of them has the same cardinality as $N$.
Thus when an infinite cardinal number is multiplied by 2, it is not increased!
Actually more bizarre things can happen with infinite cardinals. Let $X$
be a finite set with cardinality $n$. Then the cartesian product $X \times X$ has
cardinality $n^2$ which is much bigger than $n$ for large $n$. If, however, $X$ is
infinite then $X \times X$ has exactly the same cardinality as $X$ because there
exists a bijection between $X$ and $X \times X$. We shall not prove this. But we
shall illustrate it for the case $X = N$, the set of positive integers. To
construct a bijection $f$ from $N$ to $N \times N$ amounts to listing the elements of
$N \times N$ as an infinite sequence in which every element appears exactly once.
This is done pictorially in Figure 2.6 where we start from the point $(1, 1)$
and each arrow points to the next term in the sequence.
[Figure 2.6: the points $(1,1), (2,1), (3,1), \ldots$ of $N \times N$, joined by arrows in the order in which they are listed.]
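Since the figure itself is not reproduced here, the listing can be sketched in code. We assume the usual diagonal order in which the pair $(x, y)$ receives the index $\frac{1}{2}(x+y-1)(x+y-2) + y$, which is our reading of the formula that Exercise 2.21 asks the reader to prove; the direction of travel along each diagonal and the names `index_of` and `diagonal_pairs` are assumptions of this sketch.

```python
from itertools import count, islice

def index_of(x, y):
    """Assumed position of (x, y) in the listing: (x+y-1)(x+y-2)/2 + y."""
    return (x + y - 1) * (x + y - 2) // 2 + y

def diagonal_pairs():
    """Generate N x N one diagonal x + y = s at a time, starting from (1, 1)."""
    for s in count(2):
        for y in range(1, s):
            yield (s - y, y)

first = list(islice(diagonal_pairs(), 1000))

# Every listed pair is distinct and the n-th pair has index n, so on this
# initial segment the listing and the index formula are mutually inverse.
assert len(set(first)) == 1000
assert [index_of(x, y) for x, y in first] == list(range(1, 1001))
```

Any pair $(x, y)$ appears after finitely many steps, namely on the diagonal $s = x + y$, which is why the listing exhausts $N \times N$.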
We shall rarely deal with cardinalities of infinite sets. Still, the set of
positive integers deserves to be mentioned, for on one hand, it is the limit-
ing case of finite subsets while on the other hand, in a certain sense it is the
‘smallest’ infinite set. Figuratively the set of positive integers is the bridge
between the discrete and the continuous mathematics. Limiting process, in its
simplest form, appears as limits of sequences. And sequences, as we know,
are nothing but functions defined on the set of positive integers.
The cardinal number of the set of positive integers is denoted by a
special symbol $\aleph_0$ (read 'aleph naught' or 'aleph zero'). $\aleph$ is the first letter of
the Hebrew alphabet. Obviously any set $X$ for which there exists a bijection
between $N$ and $X$ also has cardinality $\aleph_0$, and conversely. There is a special
name for such sets.
2.22 Definition: A set $X$ is called enumerable (or denumerable) if there
exists a bijection $f: N \to X$ (such a bijection is often called an enumeration
of the set X). A set which is either finite or denumerable is called countable.
A set which is not countable is called uncountable.
For example, the set of all positive integers, the set of all even positive
integers, the set $N \times N$ are all denumerable and hence countable. The set
of real numbers is uncountable, although a proof of this fact is not easy
and requires a fairly deep property of the real number system (known as
completeness).
A few properties of countable sets will be given as exercises.
Exercises
2.1 Prove the generalisation of Proposition 2.8, parts (i) and (iii).
2.2 Prove Theorem 2.9.
2.3 Suppose there are $k$ boxes with capacities to hold, say, $n_1, n_2, \ldots, n_k$
objects respectively. Let $n = n_1 + n_2 + \cdots + n_k$. If $n$ objects are put
in these boxes, prove that every box is packed to its capacity.
How does this generalise the pigeonhole principle?
2.4 Suppose the figures from 1 to 12 on a clock dial are reshuffled
among themselves. Prove that there exists a pair of adjacent figures
which add up to at least 14.
2.5 Prove that the last result is the best possible in the sense that there
exists an arrangement of the figures 1 to 12 around a clock in
which no two adjacent figures add up to more than 14.
2.6 Given 5 points in a triangle whose longest side has length $a$, prove
that there exist at least two among them which are at a distance
at most $a/2$ from each other.
2.7 Do Problem 2.11 without using the pigeon-hole principle explicitly.
2.8 Given a positive integer $n$, prove that there exists a positive
integer which is divisible by $n$ and whose decimal representation
consists of 0's and 1's only. (Hint: Consider a suitable function on the
set $\{1, 11, 111, 1111, 11111, \ldots\}$.)
*2.9 Suppose that at a party there are at least 6 persons. Prove that
either there exist 3 persons every two of whom know each other
or else there exist 3 persons no two of whom know each other.
**2.10 Let $P_r(X)$ denote the set of all $r$-subsets of a set $X$. Prove that
given positive integers $n$, $r$ and $k$, there exists an integer $N$
(depending only on $n$, $r$ and $k$) such that whenever $X$ is a set with $|X| \ge N$
and $P_r(X)$ is written as a union of $n$ subsets, say, $P_r(X) = S_1 \cup S_2
\cup \cdots \cup S_n$, then there exists a $k$-subset $Y$ of $X$ such that $P_r(Y) \subset S_i$
for some $i = 1, 2, \ldots, n$. (This is known as Ramsey's theorem.)
2.11 Express the pigeonhole principle and the result of Exercise 2.9 as
special cases of Ramsey's theorem.
2.12 A $k$-ary sequence is defined as a sequence which takes only $k$
possible values, which are generally denoted by $0, 1, \ldots, k-1$. For
$k = 2, 3, 4, 5$ the resulting sequences are called, respectively, binary,
ternary, quaternary and quintary sequences. How many $k$-ary
sequences of length $n$ are there?
2.13 Prove Theorem 2.15 by induction on the cardinality of $X$.
2.14 Suppose we prove Theorem 2.15 as follows: 'Let the distinct elements
of $X$ be $x_1, x_2, \ldots, x_n$. For each subset $A$ of $X$ we define a binary
sequence $a_1, a_2, \ldots, a_n$ in which $a_i = 1$ or $0$ according as $x_i \in A$ or
$x_i \notin A$. This gives a bijection between $P(X)$ and the set of all binary
sequences of length $n$. Since there are $2^n$ such sequences, $|P(X)| = 2^n$.'
Is this proof significantly different from the proof given in the text?
5 Find .1”. using Proposition 2.12 and Proposition 2.5.
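2.16 Prove the following identities combinatorially:
(i) $\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n$
(ii) $\binom{n}{k}\binom{k}{r} = \binom{n}{r}\binom{n-r}{k-r}$, $r \le k \le n$
(iii) $\binom{2n}{2} = 2\binom{n}{2} + n^2$
(iv) $n! = 1 \times 1! + 2 \times 2! + 3 \times 3! + \cdots + (n-1) \times (n-1)! + 1$.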
2.17 Let $S \subset X \times Y$ where X, Y are finite sets. For $x \in X$, let
$G_x = \{y \in Y : (x, y) \in S\}$.
Similarly for $y \in Y$, let
$F_y = \{x \in X : (x, y) \in S\}$.
(a) Interpret the sets $G_x$, $F_y$ geometrically.
(b) Prove that $\sum_{x \in X} |G_x| = \sum_{y \in Y} |F_y|$. (An argument based on this
simple result is called a double counting argument. It is useful
in proving identities.)
2.18 At a party there are more boys than girls. If each boy dances with
exactly 2 girls, prove that there is at least one girl who dances with
at least 3 boys.
2.19 Let m, n, p, q be positive integers. Prove that:
(i) $(mn + pq)!$ is divisible by $(m!)^n (p!)^q$
*(ii) $(n^2)!$ is divisible by $(n!)^{n+1}$.
2.20 Assume that the probability of a person's birthday falling on a
given date is 1/365 (ignore leap years). In a class with n students,
what is the probability that two of them have the same birthday?
(Hint: Find the complementary probability. The required probability
is fairly high even for relatively low values of n. For n = 25,
there is more than 50% chance.)
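The numerical claim in the hint to Exercise 2.20 can be checked with a short computation. The following is only a sketch under the exercise's own assumptions (365 equally likely birthdays, no leap years); the function name `shared_birthday_probability` is ours and is not from the text.

```python
from fractions import Fraction


def shared_birthday_probability(n: int, days: int = 365) -> Fraction:
    """Probability that at least two of n people share a birthday.

    Computed through the complementary event (all n birthdays distinct).
    """
    if n > days:
        return Fraction(1)
    all_distinct = Fraction(1)
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        all_distinct *= Fraction(days - i, days)
    return 1 - all_distinct


p25 = shared_birthday_probability(25)
print(float(p25))  # a value above one half
```

Exact fractions are used so that no rounding enters until the final conversion to a decimal.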
2.21 Let $f: N \to N \times N$ be the bijection shown in Figure 2.6. Prove that
the inverse function $f^{-1}: N \times N \to N$ is given by
$f^{-1}(x, y) = \frac{1}{2}(x + y - 1)(x + y - 2) + y$.
2.22 Prove the following theorems:
(i) A subset of a countable set is countable.
(ii) If f: X—>Y is surjective and X is countable, then Y is
countable.
(iii) The cartesian product of two (and hence any finite number
of) countable sets is countable.
(iv) The union of two (and hence any finite number of) countable
sets is countable.
(v) If X is countable, then the set of all finite subsets of X is
countable.
2.23 Using the last exercise prove that the set of all integers, as well as
the set of all rational numbers is countable.
2.24 Using the decimal expansion of real numbers prove that the set of
real numbers is uncountable. Hence show that irrational numbers
exist and in fact their set is uncountable. (Hint: If $a_1, a_2, a_3, \ldots, a_n, \ldots$
is a denumeration of real numbers let x be the real number whose
decimal expansion is $0.b_1 b_2 \ldots b_n \ldots$ where $b_n$ is a digit between
1 and 8 which is different from the digit in the nth place of decimal
expansion of the number $a_n$. Then $x \neq a_n$ for all n. In constructing
x, the digits 0 and 9 are excluded so as to avoid the
possibility of a real number having two different expansions such
as 3.141600000000... = 3.141599999999....)
2.25 Prove that every infinite set contains a denumerable subset.
2.26 Using the last exercise prove that if X is an infinite set and Y is a
finite set then the sets $X \cup Y$ and $X - Y$ have the same cardinality
as X. (Figuratively, a finite subset of an infinite set is like a drop
in a bucket. See also the meaning given to the expression 'almost
all' in Section 1.4.)
*2.27 If X is a set, prove that there cannot be a bijection from X to P(X),
the power set of X. (This is easy for finite sets. For infinite sets
there is a surprisingly short but tricky proof due to Cantor).
2.28 What is wrong with the following ‘proof’ that every triangle is
equilateral? 'Let ABC be any triangle. Through every point P on
the side AB, there is a unique line parallel to BC. Let this line intersect
the side AC at a point Q. The function which takes P to Q is
a bijection. So there are as many points on the side AB as on AC.
Hence the two sides are equal. Similarly every other pair of sides
is equal.’
2.29 Find a bijection between the sets of points inside a square and
inside a circle.
2.30 Prove that a set X is infinite if and only if there exists a proper
subset Y of X and a bijection $f: X \to Y$.
Notes and Guide to Literature
Nearly all the results proved in this section are so obvious that they are
generally used without proofs, or sometimes even without formally stating
them. We have deliberately given a systematic approach starting from an
axiom not so much for the sake of the results, but to acquaint the reader
with the discipline of axiomatic deduction.
For cardinal arithmetic involving infinite cardinal numbers see, for
example, Halmos [1].
The argument given in the hint to Exercise (2.24) is an example of what
is called a diagonal argument. The name is suggestive if we
write the decimal expansions of $a_1, a_2, \ldots$ one below the other. The number
x is constructed by keeping off the 'diagonal' which consists of the nth
place of decimal in the expansion of $a_n$, n = 1, 2, 3, ....
It is interesting to note that Exercise (2.24) establishes the existence of
irrational numbers without proving any particular number (such as $\sqrt{2}$, e
or $\pi$) to be irrational. It is an example of proving existence through proving
abundance.
Ramsey's theorem is the starting point of a number of interesting
results in combinatorics. For a detailed exposition, see Graham,
Rothschild and Spencer [1].
While it is obviously impossible to make any sweeping generalisation,
a rule of thumb can be laid down that for infinite sets the methods of
continuous mathematics are needed. Methods of discrete mathematics
apply well for finite sets, especially those of relatively small cardinalities.
For finite sets of large cardinalities, the tool is statistics.
3. Applications to Counting Problems
In this section we apply the elementary techniques developed in the last
section to some counting problems. There is considerable variety in the
types of problems that arise and sometimes ingenuity is needed in selecting
the right technique. We can therefore only give a glimpse of the problems.
Real expertise has to be developed through practice.
As a simple example, we take the Locks Problem. In Section 1, we
paraphrased it in terms of sets. We are looking for a set L and five subsets
$M_1, M_2, M_3, M_4$ and $M_5$ of L such that (i) $M_i \cap M_j \neq \emptyset$ for all i, j and
(ii) $M_i \cap M_j \cap M_k = \emptyset$ for all distinct i, j, k. Thus, for every (unordered)
pair {i, j} of distinct indices there must be at least one element, say $x_{i,j}$, in
$M_i \cap M_j$. There are 10 such pairs by Theorem (2.18), because $\binom{5}{2} = 10$. Now
if we take two distinct unordered pairs, say {i, j} and {p, q}, then out of the
four indices i, j, p and q at least three are distinct. Therefore the element
$x_{i,j}$ cannot be equal to $x_{p,q}$, for otherwise it will be common to at least
three of the subsets $M_1, \ldots, M_5$. Thus L contains at least 10 distinct elements,
one corresponding to each pair of indices. This puts a lower bound on the
number of locks needed. We now show that a solution with 10 locks is in
fact possible. Indeed, let $L = \{x_{i,j} : 1 \le i \le 5,\ 1 \le j \le 5,\ i \neq j\}$ where we
regard $x_{i,j}$ and $x_{j,i}$ as the same element. Now for each i = 1, 2, ..., 5 let
$M_i = \{x_{i,j} : 1 \le j \le 5,\ j \neq i\}$. Then for any $i \neq j$, $M_i \cap M_j = \{x_{i,j}\} \neq \emptyset$,
while for any three distinct i, j, k, $M_i \cap M_j \cap M_k = \emptyset$. To translate the
solution back into the language in which the problem was posed,
we recall that $M_i$ is the set of locks which the person $p_i$ cannot open.
Each $M_i$ has 4 elements. So every person can open 6 locks. The distribution
of keys to each lock can be succinctly given as follows. Note that for every
two distinct persons $p_i$ and $p_j$, $x_{i,j}$ is a lock which neither of them can
open but which everybody else can open. So every lock has 3 keys. In all
there are 10 locks, 3 keys to each lock and every person has 6 keys. (The
same answer will be obtained in Chapter 4 by another method.)
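The construction for the Locks Problem is small enough to be checked mechanically. The sketch below builds one lock for each unordered pair of persons and verifies the two conditions and the key counts stated above. The variable names (`locks`, `M`, `keys`, `can_open_all`) are ours, chosen only for illustration.

```python
from itertools import combinations

persons = range(1, 6)

# One lock x_{i,j} per unordered pair {i, j}; a frozenset makes
# x_{i,j} and x_{j,i} the same element.
locks = [frozenset(pair) for pair in combinations(persons, 2)]

# M[i] is the set of locks which person i cannot open.
M = {i: {lock for lock in locks if i in lock} for i in persons}

# Person i holds a key to every other lock.
keys = {i: set(locks) - M[i] for i in persons}


def can_open_all(group):
    """True if the persons in `group` together hold keys to all locks."""
    opened = set().union(*(keys[i] for i in group))
    return opened == set(locks)


no_pair_succeeds = not any(can_open_all(g) for g in combinations(persons, 2))
every_triple_succeeds = all(can_open_all(g) for g in combinations(persons, 3))
keys_per_lock = {lock: sum(lock in keys[i] for i in persons) for lock in locks}

print(len(locks), no_pair_succeeds, every_triple_succeeds)
```

The check confirms only that 10 locks suffice; that fewer cannot work is the counting argument given in the text.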
As a far more ingenious application of the elementary counting techniques,
we present André's solution to the Vendor Problem. In Chapter 1,
Section 3, we reduced the problem to counting the number of balanced
arrangements of n pairs of parentheses. (In the original problem n = 50.)
Let S be the set of all arrangements of n pairs of parentheses and let B be
the subset of S consisting of balanced arrangements. |S| is easily seen to
be $\binom{2n}{n}$ (cf. Exercise (2.17)). To find |B| we find |S - B| and then apply
Proposition (2.5). In other words we find out the number of unbalanced
arrangements of n pairs of parentheses and subtract it from $\binom{2n}{n}$.
In any unbalanced arrangement of n pairs of parentheses, there will
always be a first stage at which the number of right parentheses exceeds
the number of left parentheses up to that point. Obviously the parenthesis
at this stage will be a right parenthesis. Figure 2.7(a) shows one such
unbalanced arrangement; the right parenthesis at which the 'balancing' breaks
down is shown with an arrow. Let us call this parenthesis the critical
parenthesis. After the critical parenthesis, there will be an odd number of
parentheses; the number of left parentheses among them will be one more
than the number of right parentheses. Let us interchange them. That is,
replace every right parenthesis occurring after the critical parenthesis by a
left one and vice-versa. The parentheses up to and including the critical
parenthesis are to be left unaffected. Figure 2.7(b) shows the new arrangement
obtained from the unbalanced arrangement in Fig. 2.7(a). Note that
in the new arrangement, there are n + 1 right parentheses and n - 1 left
parentheses.
Figure 2.7: André's solution to the Vendor Problem.
The crucial point now is to observe that the original unbalanced
arrangement of n pairs of parentheses can be recovered from the new
arrangement. In fact given any arrangement of n + 1 right and n - 1 left
parentheses, we simply go on scanning it from left to right till we come
across a point at which the number of right parentheses exceeds the number
of left ones. We again call this the critical parenthesis. Then we interchange
all the left and the right parentheses occurring after the critical
parenthesis. This gives an unbalanced arrangement of n pairs of parentheses.
These two operations are clearly inverses of each other. Thus, we
have defined a bijection between the set S - B, of all unbalanced arrangements
of n pairs of parentheses, and the set, X, say, of all arrangements of
n + 1 right and n - 1 left parentheses. Hence |S - B| = |X| by Proposition
(2.8). But from Theorem (2.17), it follows that
$$|X| = \frac{(2n)!}{(n+1)!\,(n-1)!}.$$
Hence
$$|S - B| = \frac{(2n)!}{(n+1)!\,(n-1)!}.$$
So,
$$|B| = |S| - |S - B| = \frac{(2n)!}{n!\,n!} - \frac{(2n)!}{(n+1)!\,(n-1)!} = \frac{(2n)!}{n!\,(n+1)!}.$$
As noted in Chapter 1, Section 3, this is the number of favourable cases,
while |S| is the total number of cases. So the required probability is simply
$\frac{1}{n+1}$. In the original Vendor Problem n was 50 and so the probability that
the vendor will not run out of change is $\frac{1}{51}$ or slightly less than 2 per cent.
In other words, even though there is an adequate number of 1 rupee coins
with the customers, if they approach the vendor randomly, he is almost sure
to run out of change.
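The bijection in André's argument can be tried out on small cases. The sketch below enumerates all arrangements for n = 4 (an arbitrary small value chosen here), applies the interchange after the critical parenthesis to each unbalanced one, and compares the images with the set X of arrangements of n + 1 right and n - 1 left parentheses. All function names are ours.

```python
from itertools import combinations
from math import comb


def arrangements(n_left, n_right):
    """Yield every string with n_left '(' and n_right ')'."""
    total = n_left + n_right
    for left_positions in combinations(range(total), n_left):
        chosen = set(left_positions)
        yield "".join("(" if i in chosen else ")" for i in range(total))


def critical_index(s):
    """Index of the first ')' at which right parentheses outnumber
    left ones, or None if the arrangement never goes out of balance."""
    balance = 0
    for i, ch in enumerate(s):
        balance += 1 if ch == "(" else -1
        if balance < 0:
            return i
    return None


def flip_after_critical(s):
    """Interchange '(' and ')' after the critical parenthesis."""
    i = critical_index(s)
    swapped = "".join(")" if ch == "(" else "(" for ch in s[i + 1:])
    return s[: i + 1] + swapped


n = 4
S = list(arrangements(n, n))
unbalanced = [s for s in S if critical_index(s) is not None]
X = set(arrangements(n - 1, n + 1))
images = {flip_after_critical(s) for s in unbalanced}
balanced_count = len(S) - len(unbalanced)

print(len(S), len(unbalanced), balanced_count, comb(2 * n, n) // (n + 1))
```

Since the same operation undoes itself, applying `flip_after_critical` twice returns the original arrangement, which is the 'inverse' property used in the text.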
In many problems we have to find the number of ways to do something
subject to some restriction. Mere paraphrasing of the problem in terms of
sets and functions may not be of much help. The tricky part in such pro-
blems is often to grasp the essence of the restriction imposed, that is, to
realise that every permissible way of doing the thing under question is
equivalent to a permutation or combination of some other objects associated
with the problem. These 'associated objects' may not be given in the
problem itself, and often some ingenuity is needed to conceive them, analogous
to the ingenuity needed in solving problems of euclidean geometry where
we construct some additional lines (such as perpendiculars, angle bisectors)
not present in the statement of the original problem. We illustrate this
technique with two examples.
3.1 Problem: There are n guests at a party. Two of them do not get
along well with each other. In how many ways can they be seated in a row
so that these two persons do not sit next to each other?
Solution: Let us name the guests as $x_1, x_2, \ldots, x_n$ with $x_1$ and $x_2$ as the
quarreling members. We can do the problem by considering the various
positions in which $x_1$ can be seated and then finding, for each such position,
the number of positions for $x_2$. But there is a better way. The total number
of ways to seat n persons is n!. We now count the number of forbidden
ways, i.e., those arrangements in which $x_1$ and $x_2$ do sit next to each other.
Such arrangements fall into two mutually disjoint categories: those in
which $x_1$ is immediately followed by $x_2$ (as viewed from one end of the row)
and those in which $x_2$ is immediately followed by $x_1$ (as viewed from the
same end). By symmetry both these categories have the same cardinality.
So we find the cardinality of the first category, i.e. the number of arrangements
in which $x_1$ is immediately followed by $x_2$. To do this we treat $x_1 x_2$
as a single person. (This is the new 'object' we are introducing.) This
'person' along with $x_3, \ldots, x_n$ gives a total of n - 1 persons, any permutation
of which corresponds to an arrangement of $x_1, x_2, \ldots, x_n$ in which $x_1$
is followed immediately by $x_2$. There are (n - 1)! permutations of n - 1
persons. The other category also has (n - 1)! arrangements. Since the two
categories are disjoint, the total number of forbidden ways is 2(n - 1)! and
the answer to the problem is n! - 2(n - 1)!. ∎
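For small n the answer to Problem 3.1 can be compared against a direct enumeration of all seatings. This is only a sanity check, not a substitute for the argument; the function name `seatings_apart` is ours, and guests 1 and 2 play the role of the quarreling pair.

```python
from itertools import permutations
from math import factorial


def seatings_apart(n):
    """Count rows of guests 1..n in which guests 1 and 2 are not adjacent."""
    count = 0
    for row in permutations(range(1, n + 1)):
        if abs(row.index(1) - row.index(2)) != 1:
            count += 1
    return count


for n in range(2, 8):
    # Brute-force count versus the formula n! - 2(n-1)!.
    print(n, seatings_apart(n), factorial(n) - 2 * factorial(n - 1))
```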
3.2 Problem: In how many ways can r men and s women be seated in a
row so that no two women sit next to each other?
Solution: This is a generalisation of the last problem. (The last problem
arises as a special case if we treat the quarreling guests as women and all
other guests as men.) But the method of the last problem cannot be applied
directly here. Let us call the men $x_1, \ldots, x_r$ and the women $y_1, \ldots, y_s$.
By the method of the last problem, we can count the number of arrangements
in which a particular pair of women is adjacent. But this will have
to be done for every pair and there will be many overlaps. For example, an
arrangement in which $y_1, y_2, y_3$ occurs will get discarded at least twice, once
because $y_1$ and $y_2$ are adjacent and once again because $y_2$ and $y_3$ are together.
There is a way to handle such overlaps and we shall study it in the
next section. But even that method does not work smoothly for this particular
problem.
We therefore try a new approach. Let us first arrange the r men. This
can be done in r! ways. Now for any one such arrangement of men, there
will be r - 1 gaps between adjacent pairs of men. Besides, there will be
two gaps, one at each end of the row. These 'gaps' are the extra objects we
are introducing into the problem. There are in all r + 1 of them. Now the
arrangement of the men and women can be completed by placing the s women
into these r + 1 gaps. The restriction that no two women be next to
each other amounts to saying that no gap should be assigned to more than
one woman. By Theorem (2.16) the number of ways to do this is
$$\frac{(r+1)!}{(r+1-s)!}.$$
This is the number of possible placements for every arrangement of
men. But there are r! ways to arrange the men. So the answer to the problem is
$$\frac{r!\,(r+1)!}{(r-s+1)!}.$$ ∎
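The 'gaps' argument of Problem 3.2 can likewise be checked by brute force for small r and s. The sketch below assumes the closed form r!(r + 1)!/(r - s + 1)! with the value 0 when s > r + 1 (there are then not enough gaps); the function names are ours.

```python
from itertools import permutations
from math import factorial


def count_by_formula(r, s):
    """r! (r+1)! / (r-s+1)!, taken as 0 when there are too few gaps."""
    if s > r + 1:
        return 0
    return factorial(r) * factorial(r + 1) // factorial(r + 1 - s)


def count_by_brute_force(r, s):
    """Enumerate all rows of r men and s women; keep those with no two
    women in adjacent seats."""
    people = [("M", i) for i in range(r)] + [("W", j) for j in range(s)]
    count = 0
    for row in permutations(people):
        if all(not (a[0] == "W" and b[0] == "W") for a, b in zip(row, row[1:])):
            count += 1
    return count


for r in range(1, 5):
    for s in range(1, 4):
        print(r, s, count_by_brute_force(r, s), count_by_formula(r, s))
```

Taking s = 2 and r = n - 2 gives (n - 2)!(n - 1)(n - 2), which agrees with n! - 2(n - 1)! from Problem 3.1.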
We now turn to problems about k-ary sequences (cf. Exercise (2.12)). A
k-ary sequence of length n can be formally defined as a function from the
set {1, 2, ..., n} into a set of cardinality k (which is generally taken to be
the set {0, 1, ..., k - 1}). This is also often called the alphabet, its elements
being called letters. In this context, a sequence of length n is also called a
word or a string of length n. The study of such strings is quite important in
the formation of languages for computers. Binary sequences are
especially important because they can be used to represent many things in
many contexts (see, for example, Exercise (2.14)). Also since only two symbols
are needed for every entry, mechanical implementation of binary
sequences becomes very easy with binary devices such as a switch or a
ferromagnetic bit.
The number of all k-ary sequences of length n is $k^n$, by Theorem (2.14).
We often want to count the number of such sequences satisfying some condition.
We prove one such result.
3.3 Proposition: The number of binary sequences of length n in which
the digit 1 occurs an even number of times* is $2^{n-1}$. This is also the number
of sequences of length n in which 1 occurs an odd number of times.
Proof: Let S be the set of all binary sequences of length n. Let A be the
set of those in which 1 occurs an even number of times and B be the set of
those in which 1 occurs an odd number of times. Clearly A and B are disjoint
and $A \cup B = S$. So |A| + |B| = |S|. Also we know $|S| = 2^n$ for all n.
Our interest is in showing that $|A| = 2^{n-1}$ and $|B| = 2^{n-1}$. Now define
$f: S \to S$ by $f(a_1 a_2 \ldots a_n) = a_1' a_2' \ldots a_n'$ where $a_i' = 0$ or 1 according as
$a_i = 1$ or 0 for i = 1, 2, ..., n. Verbally, f interchanges the 0's and 1's occurring
in a sequence. It is easy to see that if n is odd then f maps A into B
and B into A. Also f is clearly a bijection. So |A| = |B|. Hence
$$|A| = \tfrac{1}{2} \cdot 2^n = 2^{n-1}.$$
Thus the result holds for odd n.
For even n, the function f maps A into itself and B into itself and so
|A| and |B| cannot be related to each other directly. However, we can utilise
the fact that the result has already been proved for sequences of odd
length. Note that if n is even then n - 1 is odd. Now, let C be the subset
of S consisting of those sequences whose last digit is 1 and D be the set of
those whose last digit is 0. Obviously $|C| = |D| = 2^{n-1}$, because a sequence
in C (or in D) is completely determined by its first n - 1 entries. Also
$C \cap D = \emptyset$ and $C \cup D = S$. Because of this, A is the disjoint union of $A \cap C$
and $A \cap D$. So $|A| = |A \cap C| + |A \cap D|$. Now note that a sequence in $A \cap C$
is obtained by taking a sequence of length n - 1 containing an odd number
of 1's and appending a 1 at the end. Since n - 1 is odd, the number of sequences
of length n - 1 with an odd number of 1's is $2^{n-2}$. It follows that
$|A \cap C| = 2^{n-2}$. It can similarly be shown that $|A \cap D| = 2^{n-2}$. Hence
$|A| = 2^{n-2} + 2^{n-2} = 2^{n-1}$. Since |B| = |S| - |A|, it also follows that
$|B| = 2^{n-1}$. Thus we have established the result for even n also. ∎
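Proposition 3.3, and the behaviour of the map f used in its proof, can be observed directly for small n. The sketch below is an enumeration only; the names `parity_counts` and `complement` are ours, with `complement` standing for the function f that interchanges 0's and 1's.

```python
from itertools import product


def parity_counts(n):
    """Return (number of binary sequences of length n with an even number
    of 1's, number with an odd number of 1's)."""
    even = odd = 0
    for seq in product((0, 1), repeat=n):
        if sum(seq) % 2 == 0:
            even += 1
        else:
            odd += 1
    return even, odd


def complement(seq):
    """The map f of the proof: interchange the 0's and 1's."""
    return tuple(1 - a for a in seq)


for n in range(1, 9):
    print(n, parity_counts(n), 2 ** (n - 1))
```

For odd n, `complement` changes the parity of the number of 1's; for even n it preserves it, which is why the proof treats the two cases separately.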
Using the last proposition we can count the number of k-ary sequences
in which some digit, say 1, occurs an even number of times, for any $k \ge 2$.
We could give the extension right here but we prefer to defer it to the next
chapter where we will be in a position to present it avoiding unnecessary
clumsiness.
We conclude the section with a discussion of problems involving distri-
bution of objects into boxes. Such problems are important because many
other problems can be reduced to them. Depending upon whether we treat
the objects and/or the boxes as distinct or non-distinct there are four
possible cases. We consider them separately.
* A sequence in which 1 does not appear at all is also to be included here.
(i) Distinct Objects and Distinct Boxes: Every placement of distinct
objects into distinct boxes amounts to defining a function from the set of
objects into the set of boxes. The number of all such functions was found
in Theorem (2.14). A restriction that no box should contain more than one
object amounts to saying that the corresponding function be injective. The
number of such functions was obtained in Theorem (2.16). A requirement
that no box be empty is equivalent to saying that the corresponding func-
tion be surjective. As remarked earlier, in this case there is no easy formula
for the number of onto functions. We shall revert to this problem later in
this section.
When we put objects into boxes, sometimes the order of objects in
each box matters. In such cases, we have to distinguish between two ways
of putting the objects into the boxes even when each box contains the same
set of objects, but ordered in different ways. We illustrate such a situation
in the following problem.
3.4. Problem: A nouveau riche wants to wear four distinct rings on the
five fingers of his right hand. In how many ways can he do so? (Ignore the
differences in the sizes of the rings and the fingers. Also assume each finger
to be long enough to hold all the rings.)
Solution: Here the order in which rings appear on each finger is important.
Let us name the rings as $r_1, r_2, r_3$ and $r_4$. Now $r_1$ can be placed on any
of the 5 fingers. So there are 5 ways to place the ring $r_1$. Suppose it has
been placed in one of these ways. Then the number of ways to place the
ring $r_2$ is not 5 but 6, because when it is placed on the same finger as $r_1$,
it can be either above or below $r_1$. It is as if the ring $r_1$ cuts the finger on
which it appears into two fingers and so there are six different fingers on
which $r_2$ can be placed. By the same reasoning for any one arrangement of $r_1$
and $r_2$, there are 7 ways to place $r_3$, and then 8 ways to place $r_4$. Therefore
the total number of ways to wear the 4 rings on 5 fingers is 5 × 6 × 7 × 8 or 1,680.
There is another way to arrive at the answer. Let us name the fingers
as $f_1, f_2, f_3, f_4$ and $f_5$. After wearing the rings in any one manner, let us cut
the fingers (hypothetically) and arrange them one after the other gluing
adjacent fingers. This gives a linear arrangement of four rings and four
junction points where adjacent fingers are glued together. Conversely every
linear arrangement of four rings and four junction points determines a
way of wearing the rings on the five fingers. This correspondence gives a
bijection between the set of all ways of wearing the rings on five fingers
and the set of all linear arrangements of 4 rings and 4 junction points. So
our problem reduces to counting the latter. This is precisely the number of
arrangements of 8 objects (4 rings and 4 junction points) of which 4
(namely the junction points) are indistinguishable from each other. By
Theorem (2.17), this number equals $\frac{8!}{4!}$ which is 5 × 6 × 7 × 8. ∎
Note that the second solution is an instance of the technique of introducing
new objects into the problem and interpreting the problem in terms
of them. The generalisation to the case of r objects and n boxes is immediate
and left to the reader.
3.5. Proposition: The number of ways to put r objects into n boxes so
that the order in each box is important is n(n + 1)(n + 2)...(n + r - 1). ∎
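Proposition 3.5 can be tested on small cases by listing the placements explicitly. In the sketch below a placement is recorded as a tuple of boxes, each box being the ordered tuple of objects it holds; the function names are ours, and the enumeration deliberately generates each placement several times and lets a set remove the repetitions.

```python
from itertools import permutations, product
from math import prod


def ordered_placements(r, n):
    """Count placements of r distinct objects into n distinct boxes when
    the order of the objects inside each box matters."""
    seen = set()
    for order in permutations(range(r)):
        for boxes in product(range(n), repeat=r):
            contents = [[] for _ in range(n)]
            # Drop the objects one by one, in the chosen order, into
            # their assigned boxes.
            for obj, box in zip(order, boxes):
                contents[box].append(obj)
            seen.add(tuple(tuple(c) for c in contents))
    return len(seen)


def formula(r, n):
    """n (n+1) (n+2) ... (n+r-1)."""
    return prod(n + i for i in range(r))


print(ordered_placements(4, 5), formula(4, 5))  # the rings problem: 4 rings, 5 fingers
```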
(ii) Distinct Objects into Non-distinct Boxes: Suppose we have a set S of distinct
objects which are placed in boxes. The set of objects placed in
each box is a subset of S. This gives us a collection of subsets of S. Obviously
every two members of this collection are mutually disjoint and the union of
all of them is the whole set S. Therefore, a placement of objects of a set S
into non-distinct boxes is equivalent to a collection of mutually disjoint
subsets of S whose union is S. This gives us a paraphrase of the problem
intrinsically in terms of S, without involving the boxes.
When we consider a collection of subsets of a set S whose union is S,
there is obviously little point in including the empty subset in the collec-
tion. There is a special name for such collections.
3.6. Definition: A decomposition (or partition) of a set S is a collection
of mutually disjoint, non-empty subsets of S whose union is S. These subsets
are called the parts (or members) of the decomposition.
For example, {{1, 3}, {2, 5}, {4}} is a partition of the set {1, 2, 3, 4, 5}
into three parts. The problem of finding all partitions of a set S of cardinality
n into m parts is equivalent to the problem of placing n distinct objects
into m non-distinct boxes so that no box is empty. To find the number
of all such partitions is an important problem because many other problems
can be reduced to it. Unfortunately, there is no easy formula in terms
of n and m which will give us the number of partitions of a set S with n
elements into m parts, simply by substituting the values of n and m. But
the problem is too important to be given up. What do we do now?
The way out is quite ingenious and is frequently adopted in similar
dilemmas in other branches of mathematics as well. Whenever we cannot
find a formula for something, simply give it a name and a symbol! That is
exactly what we do in the following definition.
3.7. Definition: The number of ways to put n distinct objects into m
non-distinct boxes so that no box is empty (or equivalently the number of
partitions of a set with n elements into m parts), is denoted by $S_{n,m}$
and is called a Stirling number of the second kind*.
What have we achieved? Actually, nothing; because simply assigning a
name to the solution does not solve the problem. The real use of such names
or symbols is not in the problems from which they arise. Their importance
lies in the fact that the solutions to many other problems can be expressed
in terms of these symbols. As a very familiar example of this type of
situation, recall from elementary calculus that the integral $\int_1^x \frac{dt}{t}$ cannot be
expressed as a familiar function of x. (Here by 'familiar' we mean a combination
of algebraic and trigonometric functions.) So we give it a new
name, the natural logarithm** of x, denoted by ln x. Of course, this new
function is useless in evaluating the integral $\int_1^x \frac{dt}{t}$ unless we have some
other way of computing ln x (in which case it would amount to evaluating
the integral in the first place). But, as it turns out, many other integrals such
as $\int \tan x \, dx$ and $\int (1 + x^2)^{1/2} \, dx$ can be expressed in terms of the natural
logarithm. So it is worthwhile to study the properties of the logarithms and
also to find methods for evaluating them at least approximately.
So, that is what we shall do with Stirling numbers of the second kind.
First, we shall illustrate how some other problems can be reduced to them
and then we shall see a way to evaluate them. We begin with a problem
that was left open earlier in this section.
3.8. Proposition: The number of ways to put n distinct objects into m
distinct boxes so that no box is empty (or equivalently, the number of surjective
functions from a set with n elements to a set with m elements) is
$m!\,S_{n,m}$.
Proof: Let us denote the objects by $x_1, x_2, \ldots, x_n$ and their set by X and
the boxes by $B_1, B_2, \ldots, B_m$ and their set by B. We are assuming the boxes
to be distinct. Let S be the set of all arrangements of the $x_i$'s into the boxes
$B_j$'s so that no box is empty. Let T be the set of all arrangements of the
objects into the boxes so that no box is empty, if we do not distinguish
between the boxes. Then $|T| = S_{n,m}$ by definition. Our goal is to prove
that $|S| = m!\,S_{n,m}$ and to do this it would suffice to find a function
$f: S \to T$ such that the inverse image of each element of T has precisely m!
elements (cf. Proposition (2.8)). Construction of this function is quite
simple. Given an element of S, i.e. given an arrangement of the $x_i$'s into
the $B_j$'s, we simply view it as an arrangement into non-distinct boxes. It is
as if the boxes are carrying labels from 1 to m initially and we are removing
these labels. These labels can be put back onto the boxes in m! ways.
And when we do so we would be getting different arrangements as viewed
into distinct boxes but the same arrangement as viewed into non-distinct
boxes (see Figure 2.8, where n = 5 and m = 3).
Note that the requirement that no box be empty is crucial. If some of
the boxes are empty then an interchange of the labels of two empty boxes
will not give rise to a new arrangement as viewed into distinct boxes. Thus
m! distinct elements of the set S correspond to the same element of T. As
noted before, this completes the proof. ∎
* The usual definition is different but equivalent to ours. Also, as the name
indicates, there are Stirling numbers of the first kind as well.
But we shall not define them here.
** ln x can be defined differently. But the present definition is very standard.
Figure 2.8: Placement of objects into boxes
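The relation in Proposition 3.8 can be seen at work on small cases by counting both sides independently. In the sketch below, an arrangement into distinct boxes is a function from objects to box labels, and 'removing the labels' is modelled by keeping only the collection of parts. The function names are ours, and the enumeration is exhaustive, so it is meant for small n only.

```python
from itertools import product
from math import factorial


def count_surjections(n, m):
    """Number of ways to put n distinct objects into m distinct boxes
    with no box empty (surjective functions)."""
    return sum(
        1 for f in product(range(m), repeat=n) if set(f) == set(range(m))
    )


def count_partitions(n, m):
    """Number of partitions of an n-element set into m parts, obtained
    by forgetting the box labels of each surjective placement."""
    partitions = set()
    for f in product(range(m), repeat=n):
        if set(f) == set(range(m)):
            parts = frozenset(
                frozenset(i for i in range(n) if f[i] == box)
                for box in range(m)
            )
            partitions.add(parts)
    return len(partitions)


n, m = 5, 3  # the values used in Figure 2.8
print(count_surjections(n, m), factorial(m) * count_partitions(n, m))
```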
Next, we count the number of ways to put n distinct objects into r
non-distinct boxes, without any restriction about the boxes being non-empty.
3.9. Proposition: The number of ways to put n distinct objects into r
non-distinct boxes is $\sum_{m=0}^{r} S_{n,m}$.
Proof: In every arrangement there will be m non-empty boxes for a unique
m, $0 \le m \le r$. (The case m = 0 can occur only if n = 0 and we may
set $S_{0,0} = 1$ by convention.) For each m, the number of arrangements is
$S_{n,m}$ by definition. The result follows by summation. ∎
The Stirling numbers also appear in other problems but we cannot discuss
them here. We now consider the question of computing $S_{n,m}$ where
m, n are positive integers. Clearly, $S_{n,m} = 0$ for m > n and $S_{n,n} = 1$ because
there is only one way to put n objects into n non-distinct boxes so
that no box is empty, namely, to put one object in each box. $S_{n,1}$ is also
easily seen to be 1 for all n. As a less trivial result, the following proposition
computes $S_{n,2}$.
3.10 Proposition: For all $n \ge 2$, $S_{n,2} = 2^{n-1} - 1$.
Proof: Let X be a set with n elements. By definition, $S_{n,2}$ is the number
of partitions of X into two mutually disjoint non-empty subsets whose
union is X. These subsets must therefore be of the form A and X - A
for some subset A of X. Note that $A \neq \emptyset$, and also $A \neq X$ (otherwise X - A
would be $\emptyset$). Every other subset of X gives rise to a partition of X of
the desired type. Note, however, that the partition arising out of a
non-empty proper subset A of X is the same as the partition arising out
of its complement. So every partition gets counted twice. Formally,
we have defined a function f from the set $P(X) - \{\emptyset, X\}$ into the set, say S,
of all partitions of X into two parts by f(A) = {A, X - A}. Then f(A) =
f(X - A). So every point of S has exactly two pre-images. Hence
$|P(X) - \{\emptyset, X\}| = 2|S|$. But $|P(X)| = 2^n$. So $|P(X) - \{\emptyset, X\}| = 2^n - 2$. The result
now follows. ∎
For m > 2, there is no easy formula for $S_{n,m}$. For small values of n
and m one can find $S_{n,m}$ by actually considering all partitions of a set
with n elements. For higher values this becomes impracticable and also
unreliable (because some possible partitions are likely to slip past us). There is,
however, an important recurrence relation which allows us to compute
a Stirling number by first computing the lower Stirling numbers.
3.11. Theorem: For all positive integers m and n,
$$S_{n,m} = m\,S_{n-1,m} + S_{n-1,m-1}.$$
Proof: As usual let X be the set $\{x_1, x_2, \ldots, x_n\}$ and S be the set of all
partitions of X into m parts. Then $|S| = S_{n,m}$. We divide S into two
subsets as follows. In any partition of X there will be a unique part to
which $x_n$ belongs. This part may be simply $\{x_n\}$ or else it will contain
some other elements of X. Now let P be the set of those partitions
in which $\{x_n\}$ is a part and Q be the set of those in which the part containing
$x_n$ also contains some other elements of X. Clearly |S| = |P| + |Q|.
Computing the cardinality of P presents no problem because every partition
in P obviously corresponds to a partition of the set of remaining
elements, $\{x_1, x_2, \ldots, x_{n-1}\}$ into m - 1 parts. So $|P| = S_{n-1,m-1}$.
However, computing the cardinality of Q is not so immediate. Paraphrasing
the problem in terms of boxes, Q is the set of all arrangements
of the n - 1 objects $x_1, x_2, \ldots, x_{n-1}$ into m boxes so that no box is empty.
But these boxes are not all identical! The box containing the object $x_n$
stands out as different from the remaining m - 1 boxes which are
indistinguishable from each other. We now proceed as in the proof of
Proposition 3.8. We let R be the set of all arrangements of the objects
$x_1, x_2, \ldots, x_{n-1}$ into the m boxes $B_1, B_2, \ldots, B_m$ where we assume that $B_m$ is
the box containing $x_n$, and where we regard the boxes as distinct. By Proposition
3.8, $|R| = m!\,S_{n-1,m}$. Now every arrangement in R gives rise
to an arrangement of Q if we treat the boxes as identical, subject to
the restriction that $B_m$ is not to be identified with any other box. Then we
can reshuffle the labels of only m - 1 boxes among themselves and still
get the same element of Q. This can be done in (m - 1)! ways. So, by a
reasoning we have used a number of times so far, $|R| = (m-1)!\,|Q|$. But
$|R| = m!\,S_{n-1,m}$. So $|Q| = m\,S_{n-1,m}$.
Since |S| = |P| + |Q| and we have expressed all the three terms in
terms of Stirling numbers, the result follows. ∎
To see how this theorem can be applied, we calculate
$$\begin{aligned}
S_{5,3} &= 3S_{4,3} + S_{4,2} \\
        &= 3(3S_{3,3} + S_{3,2}) + S_{4,2} \\
        &= 3(3 + 4 - 1) + 8 - 1 \quad \text{(using Proposition 3.10 and the fact that } S_{n,n} = 1\text{)} \\
        &= 25.
\end{aligned}$$
Theoretically we can compute, by repeated applications of Theorem
3.11, $S_{n,m}$ for any $m$, $n$. But this is a time-consuming process. In view of
the applications of the Stirling numbers, ready-made tables listing their
values up to a fairly large value of $n$ are available. There are also numerous
identities about Stirling numbers. A few of them will be given as
exercises.
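The recurrence of Theorem 3.11 lends itself to mechanical computation. The
following is a minimal Python sketch, not part of the original text; the
function name `stirling2` and the boundary conventions ($S_{n,n} = 1$, including
$S_{0,0} = 1$, and $S_{n,m} = 0$ when $m = 0 < n$ or $m > n$) are our assumptions.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, m: int) -> int:
    """Number of partitions of an n-element set into m non-empty parts."""
    if n == m:
        return 1                      # S(n, n) = 1, including S(0, 0)
    if m == 0 or m > n:
        return 0                      # no partition is possible
    # Theorem 3.11: S(n, m) = m * S(n-1, m) + S(n-1, m-1)
    return m * stirling2(n - 1, m) + stirling2(n - 1, m - 1)


print(stirling2(5, 3))  # 25, as in the worked calculation
```

The cache plays the role of the 'ready-made table': each value is computed
once and then looked up.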
(iii) Non-distinct Objects and Distinct Boxes: In this case it is convenient
to think of the objects as balls, beads, or some tags, mechanically produced
so that you cannot tell one apart from the other. Suppose we have $r$ such
objects and we want to put them into $n$ distinct boxes. The number of
ways to do so can be obtained using Proposition 3.5. Call the boxes
$B_1, B_2, \ldots, B_n$. Even though the objects are non-distinct, let us temporarily
call them $x_1, x_2, \ldots, x_r$. Let $S$ be the set of ways to put these $r$ objects
(temporarily regarded as distinct) into the $n$ distinct boxes so that the order
in which the objects appear in each box matters. Now arrange the boxes
in a row in the order $B_1, B_2, \ldots, B_n$. This amounts to arranging the $r$ objects
in a row, separated by the $n-1$ 'walls' between adjacent boxes as in
Figure 2.9 where $r = 10$ and $n = 5$. Let $T$ be the set of all arrangements
of the $x_i$'s into the $B_j$'s where we do not distinguish between the $x_i$'s. As
usual we get a function from $S$ to $T$ obtained by forgetting the difference
between two objects. Under this function, $r!$ distinct elements of $S$ (obtained
by permuting the $x_i$'s among themselves) go to the same element of $T$. So
$|S| = r!\,|T|$. But by Proposition 3.5,
$$|S| = n(n+1)\cdots(n+r-1).$$
Therefore
$$|T| = \frac{n(n+1)\cdots(n+r-1)}{r!} = \binom{n+r-1}{r}.$$
It is also instructive to derive the result directly, by an argument analogous
to the second proof of Proposition 3.5. If we drop the suffixes on the $x_i$'s
in Figure 2.9, we see that every distribution of $r$ identical objects into $n$
distinct boxes is equivalent to a permutation of the $r$ objects along with
the $n-1$ interbox walls (denoted by $w$'s). By Theorem 2.17, the number
of such permutations is
$$\frac{(n+r-1)!}{(n-1)!\,r!} = \binom{n+r-1}{r}.$$
[Figure 2.9: Non-distinct Objects into Distinct Boxes. Ten objects in a row are separated by four walls ($w$) into the boxes $B_1, B_2, B_3, B_4, B_5$.]
As an example, 8 coins (of the same denomination) can be given to
3 children in $\binom{10}{8}$ or 45 ways. It is interesting to derive from the result
above a formula for the number of selections with repetitions allowed.
We often have a situation where there are $n$ piles of objects, the objects
in each pile being indistinguishable from each other. Suppose we want to
pick $r$ objects from this collection of $n$ types of objects with repetitions
allowed freely, which means that we can pick as many objects of each
type as we like as long as the total number of objects picked is $r$. Of
course, in order that this be feasible there must be an adequate supply of
each type of objects. Specifically, each of the $n$ piles must contain at least
$r$ objects. To play it safe, we take each pile as infinite although literally it
need not be so. The following theorem computes the number of such
$r$-selections with repetitions allowed.
3.12 Theorem: The number of $r$-selections from $n$ types of objects with
free repetitions allowed equals the number of ways to put $r$ non-distinct
objects into $n$ distinct boxes and hence equals $\binom{n+r-1}{r}$.
Proof: Let there be $n$ piles. The members of each pile will be called
'things' rather than 'objects' because we shall call something else
objects. Suppose the things of each type are stored in a box, so that there
are $n$ distinct boxes. We have to choose a total of $r$ things from these $n$
boxes. Let us indicate our selection of a thing by putting some label over
it. These labels will be indistinguishable from each other and in all there
will be $r$ of them. Because the things in each box are identical, what
matters is not which ones of them are selected but rather simply how
many of them are selected, or equivalently how many labels are put into
that box. Thus the problem is equivalent to distributing $r$ identical labels
into $n$ distinct boxes. These labels will be our objects. By the formula
above, the number of ways to put $r$ identical labels into $n$ distinct boxes is
$\binom{n+r-1}{r}$. This completes the proof. ∎
There is yet another way to derive the same result which will be given
as an exercise. We illustrate the applicability of this theorem by a problem.
3.13. Problem: Suppose three identical dice are rolled simultaneously.
Find the total number of distinct outcomes. (Instead of rolling three dice
simultaneously, essentially the same problem could be asked by rolling the
same die thrice in succession and ignoring the order of the three scores.)
Solution: In all problems involving dice, we assume, unless otherwise
specified, that each die has six faces marked with figures from 1 to 6. If we
have three such dice, it is as good as having six different boxes, the first
containing three replicas of the figure 1, the second, three replicas of the
figure 2, and so on. Each possible outcome on rolling three dice
simultaneously is equivalent to selecting 3 figures from these 6 boxes with
repetitions allowed. So by the last theorem, the number of distinct outcomes
is $\binom{6+3-1}{3} = \binom{8}{3} = 56$.
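The count $\binom{n+r-1}{r}$ of Theorem 3.12 can be cross-checked by listing the
selections explicitly. The sketch below is ours, not the author's; the helper
name `count_selections` is an assumption, and the listing is done with the
standard library function `itertools.combinations_with_replacement`.

```python
from itertools import combinations_with_replacement
from math import comb


def count_selections(n: int, r: int) -> int:
    """Count r-selections from n types (repetitions allowed) by listing them."""
    return sum(1 for _ in combinations_with_replacement(range(n), r))


# Three identical dice: 3-selections from the 6 figures.
print(count_selections(6, 3), comb(6 + 3 - 1, 3))   # 56 56

# 8 identical coins among 3 children: 8-selections from 3 types.
print(count_selections(3, 8), comb(3 + 8 - 1, 8))   # 45 45
```

Listing is feasible only for small $n$ and $r$; the formula is what one uses in
practice.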
(iv) Non-distinct Objects and Non-distinct Boxes: This case is analogous
to (ii) which was reduced to the problem of partitioning a set. However,
in the present case even the objects are non-distinct. Suppose we have
$n$ identical objects (say balls) which are to be put into $m$ identical boxes.
Once again, let us first consider the case where every box is to be
non-empty. Let $n_1, n_2, \ldots, n_m$ be the numbers of balls in these boxes. The
indexing here is purely arbitrary because, the boxes being identical, we cannot
call one of them the first, one the second. Each $n_i$ is a positive integer
and obviously $n_1 + n_2 + \cdots + n_m = n$. Note that the integers $n_i$ (each
with its multiplicity, if any) completely determine the arrangement
of the balls into the boxes. Thus the problem is reduced to partitioning
the integer $n$, i.e. expressing it as a sum of $m$ positive integers (with
repetitions allowed). In order to systematise them, we reshuffle the indices if
necessary and suppose that $n_1 \ge n_2 \ge n_3 \ge \cdots \ge n_m$ (an ascending order
would do as well). This leads to the following definition.
3.14 Definition: A partition of a positive integer $n$ into $m$ parts is a
sequence of positive integers $n_1, n_2, \ldots, n_m$ such that $n_1 \ge n_2 \ge \cdots \ge n_m$ and
$n_1 + n_2 + \cdots + n_m = n$. The number of such partitions is denoted by $P_{n,m}$.
For example 5 can be partitioned into three parts in two ways, either
as $3 + 1 + 1$ or as $2 + 2 + 1$. So we see $P_{5,3} = 2$. A recurrence relation
for $P_{n,m}$ (analogous to the one for Stirling numbers in Theorem 3.11) is
easily obtained and will be given as an exercise. Analogous to Proposition
3.9, we have the following result whose proof is also left to the reader.
3.15 Proposition: The number of ways to put $n$ non-distinct objects
into $r$ non-distinct boxes is $\sum_{m=1}^{r} P_{n,m}$. ∎
Sometimes the number of parts is not important. We want to consider
all partitions of an integer $n$, into whatever number of parts. Obviously,
in any such partition, the number of parts will be some integer between 1
and $n$. Consequently, the number of all partitions of $n$ equals
$\sum_{m=1}^{n} P_{n,m}$. This number is quite important in applications and is
commonly denoted by $p(n)$. For small values of $n$, $p(n)$ can be found by
explicitly writing down all possible partitions of $n$. For example we see that
$5 = 5 = 4 + 1 = 3 + 2 = 3 + 1 + 1 = 2 + 2 + 1 = 2 + 1 + 1 + 1 = 1 + 1 + 1 + 1 + 1$
and this gives $p(5) = 7$. However, there is no easy formula for $p(n)$. But the
answers to many other problems (appearing in many diverse branches of
mathematics) can be expressed in terms of the partition function $p(n)$. So
it has become one of the standard functions in mathematics and tables for
its values are available. (See the comments made after Definition (3.7).)
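For small $n$ the 'explicit writing down' of partitions can be delegated to a
short program. The following Python sketch is an illustration of ours, not
taken from the text; the names `partitions`, `P` and `p` are assumptions chosen
to mirror the notation $P_{n,m}$ and $p(n)$.

```python
def partitions(n, max_part=None):
    """Yield each partition of n as a non-increasing tuple of positive integers."""
    if max_part is None or max_part > n:
        max_part = n
    if n == 0:
        yield ()          # the empty sum
        return
    for first in range(max_part, 0, -1):
        # The remaining parts may not exceed the part just chosen.
        for rest in partitions(n - first, first):
            yield (first,) + rest


def P(n, m):
    """Number of partitions of n into exactly m parts."""
    return sum(1 for parts in partitions(n) if len(parts) == m)


def p(n):
    """Number of all partitions of n."""
    return sum(1 for _ in partitions(n))


print(list(partitions(5)))
print(P(5, 3), p(5))   # 2 7
```

Since $p(n)$ grows rapidly, this brute-force listing is only a check for small
values, not a substitute for the tables mentioned above.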
Numerous results are known about partitions. Some of them require
the use of generating functions and will therefore be deferred to Chapter 7.
However, a few results can be proved by some elementary but rather
tricky arguments. A most ingenious device to handle a partition is to look
at it graphically. If $n = n_1 + n_2 + \cdots + n_m$ is a partition of $n$ (where
$n_1 \ge n_2 \ge n_3 \ge \cdots \ge n_m \ge 1$) then we draw a block of dots in which the $m$ rows
from top to bottom have $n_1, n_2, \ldots, n_m$ dots and the dots in each row
appear below those of the upper row (if any), aligning from the left-most dot.
Such a block of dots is called the Ferrer's graph of the given partition.
Figure 2.10 shows the Ferrer's graph of the partition $7 + 4 + 4 + 2 + 1$
of 18.
• • • • • • •
• • • •
• • • •
• •
•
Figure 2.10. Ferrer's graph of a partition.
If we take the Ferrer's graph of a partition $n = n_1 + n_2 + \cdots + n_m$,
then clearly it has $n_1$ columns and their lengths do not increase as we go
from the left to the right. The left-most column has $m$ dots. If we take the
numbers of dots in each column from left to right, we get another partition of
$n$, into $n_1$ parts and in which the largest part has size $m$. This partition is
called the dual partition of the original partition. For example, the dual
partition of the partition $7 + 4 + 4 + 2 + 1$ of 18 whose Ferrer's graph is
shown in Figure 2.10 is $5 + 4 + 3 + 3 + 1 + 1 + 1$.
Many results about partitions can be visualised easily in terms of their
Ferrer's graphs. As an example, the concept of the dual partition just
introduced leads to the following result, which would otherwise appear
somewhat awkward.
3.16 Proposition: $P_{n,m}$, that is, the number of partitions of an integer $n$
into $m$ parts, also equals the number of partitions of $n$ in which the largest
part is of size $m$. Similarly the number of partitions of $n$ into at most $m$
parts is the same as the number of partitions of $n$ in which the part sizes
do not exceed $m$.
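Reading a Ferrer's graph by columns is easy to mechanise. The sketch below is
ours and only illustrative; the names `partitions` and `dual` are assumptions.
It computes a dual partition and then checks the first statement of
Proposition 3.16 by brute force for one value of $n$ (a check, not a proof).

```python
def partitions(n, max_part=None):
    """Yield each partition of n as a non-increasing tuple of positive integers."""
    if max_part is None or max_part > n:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(max_part, 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def dual(partition):
    """Column lengths of the Ferrer's graph whose rows are `partition`.

    Column c (counted from 0) holds one dot for every row longer than c.
    """
    if not partition:
        return ()
    return tuple(sum(1 for part in partition if part > c)
                 for c in range(partition[0]))


print(dual((7, 4, 4, 2, 1)))   # (5, 4, 3, 3, 1, 1, 1)

# Partitions of n with m parts versus partitions of n with largest part m.
n = 12
for m in range(1, n + 1):
    with_m_parts = sum(1 for q in partitions(n) if len(q) == m)
    with_largest_m = sum(1 for q in partitions(n) if q[0] == m)
    assert with_m_parts == with_largest_m
```

Applying `dual` twice returns the original partition, which is the bijection
underlying the proposition.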
Let us apply this proposition to the Postage Problem. We noted earlier
that this problem reduces to finding the number of partitions of 20 into
parts of sizes 1, 2 and 3. We denoted this number by $a_{20}$. In view of
Proposition 3.16, $a_{20}$ equals the number of partitions of 20 into at most
3 parts. In other words $a_{20} = P_{20,1} + P_{20,2} + P_{20,3}$. Evidently $P_{20,1} = 1$.
As for $P_{20,2}$, any partition of 20 into 2 parts is uniquely determined by the
size of the smaller part, which can be any integer from 1 to 10. So $P_{20,2} = 10$.
(More generally, for any positive integer $n$, $P_{n,2} = n/2$ if $n$ is even and
$P_{n,2} = (n-1)/2$ if $n$ is odd.) However, computing $P_{20,3}$ is not so easy. Here
the smallest part size, say $x$, can be any integer from 1 to 6. For a given
$x$ we have to consider the number of partitions of $20 - x$ into two parts both
of which are $\ge x$. It is easily seen that for $x = 1, 2, 3, 4, 5$ and 6, these
numbers are, respectively, 9, 8, 6, 5, 3 and 2. Summing together, $P_{20,3} = 33$.
Hence $a_{20} = 1 + 10 + 33 = 44$.
Note, however, that this method would not work if the denominations
of the stamps were, say, 10, 30 and 40, for in that case we would have to
consider partitions of 20 into parts of sizes 1, 3 or 4 and Proposition 3.16
would not apply. In Chapter 7 we shall do the Postage Problem again by
a method with a wider applicability.
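The arithmetic above can be verified in two independent ways: by counting
partitions of 20 into parts of sizes 1, 2 and 3 directly, and by counting
partitions into exactly 1, 2 and 3 parts. The Python sketch below is ours; the
function names are assumptions, and the first function uses a standard
table-filling count rather than any method from the text.

```python
def count_with_parts(n, sizes):
    """Number of partitions of n using only parts of the given sizes."""
    ways = [1] + [0] * n              # ways[t] = partitions of t found so far
    for size in sizes:                # admit one part size at a time
        for total in range(size, n + 1):
            ways[total] += ways[total - size]
    return ways[n]


def count_exact_parts(n, m, smallest=1):
    """Number of partitions of n into exactly m parts, each >= smallest."""
    if m == 1:
        return 1 if n >= smallest else 0
    # x is the smallest part; the other m - 1 parts must each be >= x.
    return sum(count_exact_parts(n - x, m - 1, x)
               for x in range(smallest, n // m + 1))


print(count_with_parts(20, [1, 2, 3]))                    # 44
print([count_exact_parts(20, m) for m in (1, 2, 3)])      # [1, 10, 33]
```

The first function also accepts part sizes such as 1, 3 and 4, the case for
which the duality argument is not available.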
Exercises
3.1 Do the generalised Locks Problem with n persons out of which
any m but no fewer are to be able to open the box.
3.2 Vary the Locks Problem by calling one of the five persons the
leader. The leader is to be able to open the box with the concurrence
of at least one other person out of the remaining four. For
the remaining four members, the requirement is the same, namely,
any three but no fewer of them should be able to open the box.
Design a system of locks and keys.
3.3 Vary the Vendor Problem by assuming that there are $n$ persons
with one 1 rupee coin each and $k$ with 2 rupee coins. Find the
probability that the Vendor will not run out of change. (Obviously,
the answer is 0 if $n < k$.)
3.4 As a further variation, suppose that the Vendor has $q$ one rupee
coins to start with. If in the line there are $n$ persons with 1 rupee
coins and $k$ with 2 rupee coins, where $q < k < q + n$ and they
approach the vendor randomly, prove that the probability of his
not running out of change is
$$\left[\binom{n+k}{n} - \binom{n+k}{n+q+1}\right] \Big/ \binom{n+k}{n}.$$
3.5 In how many ways can $r$ integers be selected from the set $\{1, 2, \ldots, n\}$ so
that no two consecutive integers are selected?
3.6 Do Problem (3.1) by the first method (that is, by considering the
various positions in which x, may be placed).
*3.7 Let $ABC$ be an isosceles triangle in which $AB = AC$ and $\angle A = 20$
degrees. Let $D$ and $E$ be points on the sides $AC$, $AB$ respectively
such that $\angle DBC = 60$ degrees and $\angle ECB = 50$ degrees. Prove that
$\angle EDB = 30$ degrees. (Hint: Consider a point $P$ on $AC$ such that
$\angle PBC = 20$ degrees. This problem has little to do with combinatorics.
It is meant just to illustrate the remark regarding some
ingenious constructions needed in problems in geometry.)
3.8 Suppose a city has m parallel roads going East-West and n parallel
roads going North-South. How many rectangles are formed with
their sides along these roads? If the distance between every con-
secutive pair of parallel roads is the same, how many shortest
possible paths are there to go from one corner of the city to its
diagonally opposite corner?
3.9 Take an 8 x 8 chess-board and remove two diagonally opposite
corner squares. Prove that it is impossible to pair off the remaining
62 squares in such a way that every pair contains two adjacent
squares, i.e. squares having one side in common. Do the problem
both without and with colouring the squares. This illustrates the
facility gained by introduction of an additional device, in this case,
the colouring of squares.
3.10 Suppose there are 20 players of different heights. These are to be
divided into two teams, A and B, of 10 players each so that for each
$i = 1, 2, \ldots, 10$, the $i$th tallest player in team A will be taller than
the $i$th tallest player in B. In how many ways can this be done?
3.11 Suppose in Exercise (3.8), $2^k$ persons start from the southwest
corner of the city where $k \le \min\{m-1, n-1\}$. Half of these persons
proceed eastward and half northward. They travel with the same
speed and reach the next junction at the end of one unit of time.
Persons arriving at each junction again bifurcate, half of them
going eastward and half northward and then continue to travel with
the same speed. If $0 \le j \le k$ find how many persons there will be
at each junction, at the end of $j$ units of time.
3.12 Suppose a person walks along a straight line starting at a point $O$.
He walks at a uniform speed of 100 meters per minute. But at the
end of every minute he is likely to reverse his direction with
probability 1/2. Find the probability that he will be at a spot $100k$
meters from $O$ at the end of $r$ minutes, where $k$, $r$ are integers and
$r$ is positive.
*3.13 In the last problem suppose there is a dangerous zone on the line
beginning at a distance of 1 km from $O$. The person may hit the
boundary of this zone and go back. But once inside this zone, his
motion stops. Find the probability that the person is still walking
at the end of the $r$th minute where $r$ is a positive integer.
3.14 How many paths are there from point A to point B in Figure 2.11
if no portion of a path is to be traversed more than once (whether
in the forward or in the backward sense)?
[Figure 2.11: Diagram for Exercise (3.14)]
3.15 Into how many parts are the diagonals of a convex octagon
decomposed, given that no three of these diagonals are concurrent
except at a vertex? Generalise to a convex $n$-gon.
3.16 Given an arithmetic progression of length n, how many subprogres-
sions of length 3 may be formed from it?
*3.17 Find the number of regions into which a convex $n$-gon is split by
its diagonals if no three of them are concurrent, except at a vertex.
3.18 Prove that for every positive integer $n$,
$$\binom{n}{0} + \binom{n}{2} + \binom{n}{4} + \cdots + \binom{n}{2r} + \cdots = \binom{n}{1} + \binom{n}{3} + \binom{n}{5} + \cdots + \binom{n}{2r+1} + \cdots$$
Deduce that $\sum_{j=0}^{n} (-1)^j \binom{n}{j} = 0$.
(Although each sum is superficially infinite, it is actually finite, in
view of the fact that $\binom{n}{r} = 0$ for $r > n$.)
3.19 A palindrome is a word which reads the same backward or forward
(e.g., 'MADAM', 'ANNA'). Find how many palindromes of length
$n$ can be formed from an alphabet of $k$ letters.
3.20 How many $k$-ary sequences of length $n$ are there in which no two
consecutive entries are the same?
3.21 How many ternary sequences of length $n$ are there which either
start with 012 or end with 012?
3.22 Prove Proposition (3.5).
3.23 A conference, attended by 100 delegates, is held in a hall. The hall
has 3 doors, marked A, B, C. At each door, an entry book is kept
and the delegates entering through that door sign in it in the order
in which they enter. If each delegate is free to enter anytime and
through any door he likes, how many different sets of three lists
would arise in all? (Assume every person signs only at his first
entry.)
3.24 Find $S_{n,n-1}$.
3.25 For positive integers $m$, $n$ prove that
$$S_{n+1,m+1} = \sum_{k=m}^{n} \binom{n}{k} S_{k,m} = \sum_{k=m}^{n} (m+1)^{n-k} S_{k,m}.$$
3.26 In how many ways can 3 blue, 4 white and 2 red balls be
distributed into 4 distinct boxes?
3.27 A person has three sons. He owns 101 shares of a company. He
wants to give these to his sons so that no son should have more
shares than the combined total of the other two. In how many
ways can he do so?
3.28 Prove that the number of ways to put $r$ identical objects into $n$
distinct boxes so that every box is non-empty is $\binom{r-1}{r-n}$. Interpret
this in terms of $r$-selections of $n$ types of objects with repetitions
allowed.
3.29 10 balls are picked from a large pile of red, blue and white balls.
How many such selections contain less than 5 red balls?
3.30 A sequence of real numbers is said to be monotonically increasing
if every term is greater than or equal to its predecessor, if any. A
monotonically decreasing sequence is defined similarly. If every
term is greater than its predecessor (i.e. if strict inequality holds)
the sequence is called strictly monotonically increasing. Prove that
a monotonically increasing sequence of length $r$ taking values in
the set $\{1, 2, \ldots, n\}$ corresponds uniquely to an $r$-selection of $n$ types
of objects with repetitions allowed.
3.31 Let $S$ be the set of all monotonically increasing sequences of
length $r$ taking values in the set $\{1, 2, \ldots, n\}$ and let $T$ be the set of
all strictly monotonically increasing sequences of length $r$ taking
values in the set $\{1, 2, \ldots, n + r - 1\}$. Prove that the function
$f: S \to T$ defined by
$$f(a_1, a_2, \ldots, a_r) = (a_1, a_2 + 1, a_3 + 2, \ldots, a_i + i - 1, \ldots, a_r + r - 1)$$
is a bijection. Use this to give an alternate proof of Theorem
(3.12).
3.32 For positive integers $m$, $n$ prove that
$$P_{n,m} = P_{n-1,m-1} + P_{n-m,m}.$$
Also show that for every $n$,
$$p(n) = P_{2n,n}.$$
3.33 Prove proposition (3.15).
3.34 What is wrong with the following argument which attempts to show
that
$$S_{n,m} = n!\,P_{n,m}?$$
'Let $X$ be a set with $n$ elements, say,
$$X = \{x_1, \ldots, x_n\}.$$
Given any partition of $X$ into $m$ parts, we get a partition of the
integer $n$ into $m$ parts if we count the number of elements in each
part. This gives us a function $f$ from the set of all partitions of the
set $X$ into the set of all partitions of $n$, each into $m$ parts. If we
reshuffle the indices of the $x$'s, we get the same partition of the
integer $n$. But the indices can be reshuffled in $n!$ ways. So under $f$,
every point has $n!$ preimages. Hence the result.'
3.35 A triangular partition of an integer $n$ is defined as a partition of $n$
into three parts such that the sum of any two exceeds the third.
Clearly such a partition corresponds to a triangle with integer
sides and perimeter $n$. Let $t_n$ be the number of triangular partitions
of $n$. Prove that, for all $n \ge 4$,
$$t_n = \begin{cases}
t_{n-3} & \text{if } n \text{ is even} \\
t_{n-3} + k & \text{if } n = 4k + 1 \text{ for some positive integer } k \\
t_{n-3} + k & \text{if } n = 4k - 1 \text{ for some positive integer } k
\end{cases}$$
3.36 A partition is called self-dual if it coincides with its dual partition.
Prove that the number of self-dual partitions of $n$ equals the
number of partitions of $n$ into parts of different and odd sizes.
*3.37 Suppose there are 10 houses in a row. The occupants of these
houses go on a vacation, one by one, in a random order. They
being good neighbours, every occupant, while leaving, checks that
things are in order in the two houses immediately neighbouring his
(one house in case of a house at an end). What is the probability
that every house, except that of the occupant leaving last, has been
checked by its neighbour after its occupant has left?
Notes and Guide to Literature
The problems in this section are standard. For more numerical
problems, see Vilenkin [1]. For more identities on Stirling numbers and for
tables of them see Knuth [1], Vol. 1. There is a huge literature about
partitions. On the basis of a few observations, Ramanujan conjectured and
proved a number of interesting congruence relationships about the
partition numbers; a few of them are given in M. Hall [1].
The movements in Exercises (3.11) and (3.12) are examples of what are
called random walks. They are important in statistical mechanics, where
the number of particles is huge and it is assumed that they move
randomly. A good reference is Parzen [1].
4. Principle of Inclusion and Exclusion
Suppose a set $S$ is expressed as the union of a finite number of its
subsets, say,
$$S = S_1 \cup S_2 \cup \cdots \cup S_n.$$
If $S_i \cap S_j = \emptyset$ for all distinct $i$ and $j$, then by Proposition (2.3),
$$|S| = |S_1| + |S_2| + \cdots + |S_n|.$$
But if the sets $S_i$ overlap this no longer holds, because if some element
is common to, say, two subsets $S_i$ and $S_j$ then in $|S|$ it will be counted
only once while in the right hand side it is counted more than once.
Ignoring such overlaps is in fact one of the most common pitfalls that occur in
solving combinatorial problems.
However, as remarked in the solution of Problem (3.2), there is a
method available in which we keep a systematic track of such overlaps
and get the cardinality of the union of a finite number of sets. The basic
idea is to take an element $x$, to include it in our count as many times as it
appears and then to make up for its excessive inclusion (because of
overlaps) by excluding it an appropriate number of times. That is why the
result is called the principle of inclusion and exclusion. A special case of
this was stated in Proposition (2.7), where we saw that
$$|S \cup T| = |S| + |T| - |S \cap T|.$$
Here on the left hand side, every element of $S \cup T$ is counted only once.
On the right hand side, elements which are present in only one of $S$ and $T$
count only once each. But elements of $S \cap T$ get counted twice, once in
$|S|$ and again in $|T|$. To correct this, we 'exclude' them once, i.e. we
subtract $|S \cap T|$.
Although the basic idea is the same, the result looks much more
complicated when the set is expressed as a union of $n$ sets with $n > 2$. The
difficulty can be illustrated with the case $n = 3$. Suppose
$$S = S_1 \cup S_2 \cup S_3.$$
Then in general,
$$|S| \ne |S_1| + |S_2| + |S_3|$$
because elements which are common to more than one $S_i$ get counted
more than once. So, to make up for this, we subtract
$$|S_1 \cap S_2| + |S_2 \cap S_3| + |S_1 \cap S_3|$$
from the right hand side and get
$$|S_1| + |S_2| + |S_3| - |S_1 \cap S_2| - |S_2 \cap S_3| - |S_1 \cap S_3|.$$
This may appear to be equal to $|S|$. But it need not be so! The reason
is that we have over-corrected the error. Points which are common to
all the three sets $S_1$, $S_2$, $S_3$ are counted in each of the first three terms. So
they are included thrice each. But these points also appear in each of the
terms that are subtracted. This means they are also excluded thrice.
Effectively, then, points of $S_1 \cap S_2 \cap S_3$ are not included in this expression.
To make up for this we must add $|S_1 \cap S_2 \cap S_3|$. Thus
$$|S| = |S_1| + |S_2| + |S_3| - |S_1 \cap S_2| - |S_2 \cap S_3| - |S_1 \cap S_3| + |S_1 \cap S_2 \cap S_3|.$$
This is the correct result. It can be graphically visualised through the Venn
diagram in Figure 2.12.
[Figure 2.12: Cardinality of the Union of Three Sets]
We now state and prove the general result.
4.1. Theorem (Principle of Inclusion and Exclusion). Let $S_1, S_2, \ldots, S_n$ be
finite sets and let
$$S = S_1 \cup S_2 \cup \cdots \cup S_n.$$
Then
$$|S| = s_1 - s_2 + s_3 - s_4 + \cdots + (-1)^{n+1} s_n$$
where
$s_1 = |S_1| + |S_2| + \cdots + |S_n|$ = sum of the cardinalities of the
sets $S_i$,
$s_2 = \sum_{1 \le i < j \le n} |S_i \cap S_j|$ = sum of the cardinalities of intersections of
the $S_i$'s taken two at a time,
$s_3 = \sum_{1 \le i < j < k \le n} |S_i \cap S_j \cap S_k|$ = sum of the cardinalities of the
intersections of the $S_i$'s taken three at a time, and so on.
(Caution: $s_2$ is often confused with the number of elements belonging to
at least two of the $S_i$'s. This interpretation is incorrect. Actually $s_2$ is a
sum of cardinalities of certain subsets. It is not, by itself, the cardinality of
any easily identifiable subset. A similar warning holds for $s_3, s_4, s_5, \ldots$.)
Proof: We proceed by induction on $n$. For $n = 1$, there is nothing to
prove. The case $n = 2$ is covered by Proposition 2.3 and will be used in
the proof of the inductive step.
Suppose the result holds for all finite sets expressed as the unions of
$k$ (or fewer) subsets. We shall show that it holds for sets which are expressed
as unions of $k + 1$ subsets. So let
$$S = S_1 \cup S_2 \cup \cdots \cup S_k \cup S_{k+1},$$
where each $|S_i|$ is finite. Now let $A = \bigcup_{i=1}^{k} S_i$. Then by Proposition 2.3,
$$|S| = |A| + |S_{k+1}| - |A \cap S_{k+1}|. \qquad (1)$$
Now, we apply the induction hypothesis to the set $A$ and get
$$|A| = t_1 - t_2 + t_3 - t_4 + \cdots + (-1)^{k+1} t_k$$
where for $1 \le r \le k$, $t_r$ is the sum of the cardinalities of the intersections
of the sets $S_1, S_2, \ldots, S_k$ taken $r$ at a time, i.e.
$$t_r = \sum_{1 \le i_1 < i_2 < \cdots < i_r \le k} |S_{i_1} \cap S_{i_2} \cap \cdots \cap S_{i_r}|.$$
Because of the distributivity of intersection over union (see Proposition
1.1 Part (ii)), $A \cap S_{k+1}$ can be written as $\bigcup_{i=1}^{k} (S_i \cap S_{k+1})$. So we can also apply
the induction hypothesis to it and get
$$|A \cap S_{k+1}| = u_1 - u_2 + u_3 - u_4 + \cdots + (-1)^{k+1} u_k$$
where for $1 \le r \le k$,
$$u_r = \sum_{1 \le i_1 < i_2 < \cdots < i_r \le k} |(S_{i_1} \cap S_{k+1}) \cap (S_{i_2} \cap S_{k+1}) \cap \cdots \cap (S_{i_r} \cap S_{k+1})|.$$
Therefore we get, by substitution in (1),
$$|S| = |S_{k+1}| + t_1 - (t_2 + u_1) + (t_3 + u_2) - (t_4 + u_3) + \cdots + (-1)^{r+1}(t_r + u_{r-1}) + \cdots + (-1)^{k+1}(t_k + u_{k-1}) + (-1)^{k+2} u_k.$$
To complete the proof we now show
(i) $s_1 = |S_{k+1}| + t_1$,
(ii) $s_r = t_r + u_{r-1}$ for $r = 2, 3, \ldots, k$,
and
(iii) $s_{k+1} = u_k$.
Of course (i) is obvious from the definitions of $s_1$ and $t_1$. (iii) also
poses no problem because $s_{k+1}$ is simply
$$|S_1 \cap S_2 \cap \cdots \cap S_k \cap S_{k+1}|.$$
But the set
$$S_1 \cap S_2 \cap \cdots \cap S_k \cap S_{k+1}$$
equals the set
$$(S_1 \cap S_{k+1}) \cap (S_2 \cap S_{k+1}) \cap \cdots \cap (S_k \cap S_{k+1})$$
(see again Proposition 1.1 parts (vii) and (v)); so its cardinality equals $u_k$.
A little argument is needed for (ii). By definition, $s_r$ equals
$$\sum |S_{i_1} \cap S_{i_2} \cap \cdots \cap S_{i_r}|$$
as the sum ranges over all $r$-tuples $(i_1, i_2, \ldots, i_r)$ of indices in which
$$1 \le i_1 < i_2 < \cdots < i_{r-1} < i_r \le k + 1.$$
We classify such $r$-tuples into two types: (I) those in which $i_r \le k$ and
(II) those in which $i_r = k + 1$. The summation over $r$-tuples of type (I)
obviously gives $t_r$. So the problem is reduced to proving that the
summation over $r$-tuples of type (II) equals $u_{r-1}$. A typical $r$-tuple of type (II) is
of the form $(i_1, i_2, \ldots, i_{r-1}, k + 1)$ where $1 \le i_1 < i_2 < \cdots < i_{r-1} \le k$. But
the set $S_{i_1} \cap S_{i_2} \cap \cdots \cap S_{i_{r-1}} \cap S_{k+1}$ equals, by the same argument as above,
the set
$$(S_{i_1} \cap S_{k+1}) \cap (S_{i_2} \cap S_{k+1}) \cap \cdots \cap (S_{i_{r-1}} \cap S_{k+1}).$$
But $u_{r-1}$ was, by definition, the sum of cardinalities of all such sets, for all
possible $(r-1)$-tuples
$(i_1, i_2, \ldots, i_{r-1})$ with $1 \le i_1 < i_2 < \cdots < i_{r-1} \le k$.
So $u_{r-1}$ equals the summation over $r$-tuples of type (II). As noted before,
this completes the proof. ∎
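The formula of Theorem 4.1 translates directly into a computation of the sums
$s_1, s_2, \ldots, s_n$. The Python sketch below is an illustration of ours; the
function name `union_size` and the sample sets (multiples of 2, 3 and 5 among
the integers from 1 to 100) are assumptions, not data from the text.

```python
from itertools import combinations


def union_size(sets):
    """|S_1 U ... U S_n| computed as s_1 - s_2 + s_3 - ... (Theorem 4.1)."""
    total = 0
    for r in range(1, len(sets) + 1):
        # s_r: sum of the sizes of the intersections taken r at a time.
        s_r = sum(len(set.intersection(*group))
                  for group in combinations(sets, r))
        total += (-1) ** (r + 1) * s_r
    return total


S1 = {x for x in range(1, 101) if x % 2 == 0}
S2 = {x for x in range(1, 101) if x % 3 == 0}
S3 = {x for x in range(1, 101) if x % 5 == 0}

print(union_size([S1, S2, S3]), len(S1 | S2 | S3))   # 74 74
```

Here $s_1 = 103$, $s_2 = 32$ and $s_3 = 3$, so the alternating sum is
$103 - 32 + 3 = 74$, in agreement with a direct count of the union.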
For application, it is convenient to paraphrase the last theorem in a
different form. We often have a set $S$ and some $n$ properties of elements
of $S$. We want to find the number of those elements of $S$ which have
exactly $r$ of these properties, where $0 \le r \le n$. Of course, every property
determines a subset of $S$, namely the set of those elements of $S$ which
have that property. So the question can also be framed as 'Given $n$ subsets,
say $S_1, S_2, \ldots, S_n$ of a set $S$, how many elements of $S$ are common to
precisely $r$ of these subsets?'. The case $r = 0$ is an immediate consequence
of the theorem above and is itself often called the principle of inclusion
and exclusion. The general case will be treated later.
4.2 Theorem: Let $S$ be a finite set and $S_1, S_2, \ldots, S_n$ be subsets of $S$.
Let $'$ denote complementation w.r.t. $S$. Then
$$\Big|\bigcap_{i=1}^{n} S_i'\Big| = s_0 - s_1 + s_2 - s_3 + \cdots + (-1)^n s_n$$
where $s_0 = |S|$ and $s_1, \ldots, s_n$ have the same meaning as in Theorem 4.1.
Proof: By de Morgan's laws (Proposition 1.1 part (x)),
$$\bigcap_{i=1}^{n} S_i' = S - \bigcup_{i=1}^{n} S_i$$
and so by Corollary 2.6,
$$\Big|\bigcap_{i=1}^{n} S_i'\Big| = |S| - \Big|\bigcup_{i=1}^{n} S_i\Big|.$$
The result now follows immediately from the last theorem. However, we
give an alternate argument because the ideas used in it are useful elsewhere.
We take a typical element $x$ of $S$ and find how many times it is counted
on either side of the equality to be proved. If these two counts tally for
every $x$, the equality is established. Now if $x \in S_i'$ for all $i = 1, 2, \ldots, n$
then it is counted once in the left hand side, viz., $\big|\bigcap_{i=1}^{n} S_i'\big|$. Also, in the
right hand side, $x$ is counted only in the first term, namely $|S|$. Hence
the two counts are equal. Suppose now that $x$ belongs to precisely $k$ of
the subsets $S_1, S_2, \ldots, S_n$ for some $k$, $1 \le k \le n$. Without loss of generality
we let $x \in S_i$ for $i = 1, 2, \ldots, k$ and $x \notin S_i$ for $i > k$. Now $x$ contributes
nothing to the left hand side. As for the right hand side, $x$ contributes
only to the first $k + 1$ terms, namely, $s_0, -s_1, s_2, \ldots, (-1)^k s_k$. The
contribution to $s_0$ is obviously 1. Let us see how often $x$ contributes to a typical
term, $(-1)^r s_r$, $1 \le r \le k$. Recall that $s_r$ is the sum of the cardinalities of
the intersections of the sets $S_1, S_2, \ldots, S_n$ taken $r$ at a time. The element $x$
will appear in one such intersection, say, $S_{i_1} \cap S_{i_2} \cap \cdots \cap S_{i_r}$, if and only
if the indices $i_1, i_2, \ldots, i_r$ are all from 1 to $k$. Such indices may be chosen in
$\binom{k}{r}$ ways. We conclude that $x$ contributes $(-1)^r \binom{k}{r}$ to the term $(-1)^r s_r$
on the right hand side. The proof will, therefore, be complete if we prove
the identity
$$1 - \binom{k}{1} + \binom{k}{2} - \binom{k}{3} + \binom{k}{4} - \cdots + (-1)^k \binom{k}{k} = 0.$$
But this is an easy consequence of Proposition 3.3. Note that the positive terms
add up to the number of binary sequences of length $k$ with an even
number of 1's while the terms with a negative sign add up to the number
of those with an odd number of 1's. Each equals $2^{k-1}$. (This is in fact the
solution to Exercise 3.18.) ∎
This argument may appear a bit elusive at first sight. But actually it
grasps the very essence of the principle of inclusion and exclusion. We
take an arbitrary element, find how many times it has been included in the
count and how many times it has been excluded and finally show that in
sum total, it has been counted just the right number of times (which will be
either 1 or 0 depending upon whether or not that element belongs to the
set whose cardinality is being found). For a reader who is still not
convinced, we shall present the same argument in a more rigorous form in an
exercise.
The general case can be obtained by a similar argument.
4.3 Theorem: Let $S$ and $S_1, S_2, \ldots, S_n$ be as before. Then the number of
elements which belong to exactly $m$ of these subsets equals
$$\sum_{r=m}^{n} (-1)^{r-m} \binom{r}{m} s_r.$$
Proof: Let T be the set of those elements of S which belong to precisely
m of the sets 3,, S,,..., S... We have to show that l T [ equals the given
sum. Now let x be an element of S. Suppose x belongs to exactly k of the
subsets S‘, S,...., S. and once again, without loss of generality suppose
xeS; fori < k and xeslfori> k. It‘k <mthenx¢ T. Also x doesnot
contribute anything to s, for r 2 m. So it contributes nothing to the sum—
mation. If k = m, then x is in T and in the summation it contributes only
m
to one term, namely, to ( )5... Here 5,. is the sum of the cardinalities
m
of the intersections of 51,... S, taken m at a time. Only one of these
intersections, namely, SlnSZn.“ nS... contains 2:. So 1: is counted only
once in the summation. The remaining case is k > m. Here x contributes
k
nothing to | T |. The contribution to s, is( ). If we letj = r—m then the
r
problem reduces to proving the identity,
k-m "1+1“ k
2 (—1)I( )( )=0.
1-0 m m+j
The proof of this is left as an exercise. I
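As a sanity check of Theorem 4.3 and of the identity at the end of its proof, here is a small Python sketch. The example sets and the function names are hypothetical; the code simply compares the formula with a direct count and tests the identity for small values of k and m.

```python
from itertools import combinations
from math import comb

def s(r, S, subsets):
    """s_r: sum of the cardinalities of the intersections taken r at a time (s_0 = |S|)."""
    if r == 0:
        return len(S)
    return sum(len(set.intersection(*family)) for family in combinations(subsets, r))

def exactly_m_formula(m, S, subsets):
    """Right hand side of Theorem 4.3."""
    n = len(subsets)
    return sum((-1) ** (r - m) * comb(r, m) * s(r, S, subsets) for r in range(m, n + 1))

def exactly_m_direct(m, S, subsets):
    """Count the elements lying in exactly m of the subsets, one element at a time."""
    return sum(1 for x in S if sum(x in T for T in subsets) == m)

def identity_lhs(k, m):
    """Left hand side of the identity at the end of the proof (expected to vanish for k > m)."""
    return sum((-1) ** j * comb(m + j, m) * comb(k, m + j) for j in range(k - m + 1))

# Hypothetical example data.
S = set(range(1, 13))
subsets = [{1, 2, 3, 4, 5}, {4, 5, 6, 7}, {1, 5, 7, 8, 9}]

for m in range(len(subsets) + 1):
    print(m, exactly_m_formula(m, S, subsets), exactly_m_direct(m, S, subsets))
```

The two columns agree for every m, and `identity_lhs(k, m)` returns 0 whenever k > m in the range tried.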
To apply the principle of inclusion and exclusion, we must first know,
either from the data directly or by some computations, the cardinalities of
the intersections of all possible sub-families of the given family of sets.
This can be a horrendous task, because if there are n sets $S_1, S_2, \ldots, S_n$
then there are $2^n - 1$ such intersections (Why?). But often, there is some
symmetry among these sets which simplifies the computations. Secondly, if
some intersection is known to be empty, then many others have to be so.
For example, if $S_1 \cap S_2 = \emptyset$, there is no need to consider intersections of
subfamilies of $\{S_1, S_2, \ldots, S_n\}$ in which both $S_1$ and $S_2$ are present.
We do a few illustrative problems.
4.4 Problem: In a language survey of students it is found that 80
students know English, 60 know French, 50 know German, 30 know English
and French, 20 know French and German, 15 know English and German
and 10 students know all the three languages. How many students know
(i) at least one language (ii) English only (iii) French and one but not both
out of English and German (iv) at least two languages?
Solution: Let E, F, G denote, respectively, the sets of students knowing
English, French and German. Because only three sets are involved, the
problem can be done vividly using a Venn diagram as shown in Figure 2.13.
The various regions in it represent the various subsets obtained by taking
intersections of E, F, G and their complements. For example the shaded
region represents $G \cap E' \cap F'$ where $'$ stands for complementation. Now in
each region we go on putting its cardinality. We start with $E \cap F \cap G$ which
is given to contain 10 elements. Since $E \cap F$ is the disjoint union of
$E \cap F \cap G$ and $E \cap F \cap G'$ and $|E \cap F| = 30$, it follows that
$|E \cap F \cap G'| = 20$.
After filling in completely it is easy to answer any of the given questions
simply by expressing an appropriate set as the disjoint union of some of
these seven regions. This gives the answers, (i) 135, (ii) 45, (iii) 30 and
(iv) 45.
Figure 2.13: Venn Diagram for Problem (4.4)
However, with more than three sets the Venn diagrams tend to be
cumbersome and at any rate it is instructive to be able to handle the
problem 'abstractly' as well. So we apply the principle of inclusion and
exclusion. (i) comes as a direct consequence of it. For (ii) we let
$S_1 = E \cap F$ and $S_2 = E \cap G$. Then
$$|S_1 \cup S_2| = 30 + 15 - 10 = 35.$$
We want the cardinality of $E - (S_1 \cup S_2)$ which is 80 - 35 = 45. For (iii) let
$S_1 = E \cap F$, $S_2 = F \cap G$. Then we want
$$|(S_1 \cup S_2) - (S_1 \cap S_2)|$$
which equals, by Theorem (4.1),
$$|S_1| + |S_2| - 2|S_1 \cap S_2|$$
or $30 + 20 - 2 \cdot 10$, i.e. 30. For (iv) let
$S_1 = E \cap F$, $S_2 = F \cap G$, $S_3 = E \cap G$.
Note that in this case, $S_1 \cap S_2$, $S_2 \cap S_3$, $S_1 \cap S_3$ and $S_1 \cap S_2 \cap S_3$
are the same set, namely $E \cap F \cap G$. So by Theorem (4.1)
$$|S_1 \cup S_2 \cup S_3| = |S_1| + |S_2| + |S_3| - 2|E \cap F \cap G| = 30 + 20 + 15 - 20 = 45.$$ ∎
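The Venn diagram method of the solution amounts to filling in the seven regions one after another. A minimal Python sketch of that bookkeeping follows; the variable names are illustrative choices, and only the survey figures come from the problem.

```python
# Survey data of Problem 4.4.
E, F, G = 80, 60, 50          # English, French, German
EF, FG, EG = 30, 20, 15       # pairwise intersections
EFG = 10                      # all three languages

# Cardinalities of the seven regions of the Venn diagram.
only_EF = EF - EFG            # English and French but not German
only_FG = FG - EFG            # French and German but not English
only_EG = EG - EFG            # English and German but not French
only_E = E - only_EF - only_EG - EFG
only_F = F - only_EF - only_FG - EFG
only_G = G - only_FG - only_EG - EFG

at_least_one = only_E + only_F + only_G + only_EF + only_FG + only_EG + EFG
english_only = only_E
french_and_exactly_one_other = only_EF + only_FG
at_least_two = only_EF + only_FG + only_EG + EFG

print(at_least_one, english_only, french_and_exactly_one_other, at_least_two)
```

Each answer is a sum of disjoint regions, so no correction terms are needed once the regions are known.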
4.5 Problem: How many permutations of the integers from 1 to n are there
in which at least one integer is left in its own place? (If we think of each
permutation as a bijection from the set {1, 2, ..., n} into itself, then the
question amounts to asking how many permutations have at least one
fixed point, cf. Exercise (1.12).)
Solution: Let $S_i$ be the set of all those permutations which fix i, i = 1,
2, ..., n. We want $|S_1 \cup S_2 \cup \cdots \cup S_n|$. Clearly $|S_i| = (n-1)!$ for each i,
because having fixed i, the permutation may reshuffle the remaining n - 1
elements among themselves and this can be done in (n - 1)! ways.* Now
*Note that we are not saying that a permutation in $S_i$ has i as its only fixed
point. It may have others. This point should always be kept in mind in applying the
principle of inclusion and exclusion. Whenever a statement is made to the effect that
an element has (or lacks) certain properties, it should be interpreted exactly as
it is, without making any unwarranted inferences as to whether it lacks (or has) the
remaining properties, as we often do in practice. If a student tells about his
examination results as 'I failed in chemistry' we tend to think that he cleared all other
subjects. In mathematics we must not do so; for the student may have failed in other
subjects too!
for $i \ne j$, $S_i \cap S_j$ consists of those permutations of {1, 2, ..., n} in which
both i and j are fixed. Since the remaining elements may be permuted
among themselves in (n - 2)! ways,
$$|S_i \cap S_j| = (n-2)!$$
for every pair {i, j} of distinct indices. There are $\binom{n}{2}$ such pairs. Similarly
there are $\binom{n}{3}$ triples of distinct indices and for every one such triple
{i, j, k}, $|S_i \cap S_j \cap S_k| = (n-3)!$.
Note the considerable symmetry in this problem. Continuing in this
manner, and applying Theorem (4.1), we get
$$|S_1 \cup \cdots \cup S_n| = \binom{n}{1}(n-1)! - \binom{n}{2}(n-2)! + \binom{n}{3}(n-3)! - \cdots + (-1)^{n+1}\binom{n}{n}\,0!$$
$$= \frac{n!}{1!\,(n-1)!}(n-1)! - \frac{n!}{2!\,(n-2)!}(n-2)! + \cdots + (-1)^{n+1}\frac{n!}{n!}$$
$$= n!\left(\frac{1}{1!} - \frac{1}{2!} + \frac{1}{3!} - \frac{1}{4!} + \cdots + (-1)^{n+1}\frac{1}{n!}\right).$$
As there is no way to sum this series in closed form, the answer has to be
left in this form. ∎
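For small n the count obtained in Problem 4.5 can be compared with a direct enumeration of all n! permutations. The Python sketch below does this; the function names are hypothetical, and positions are numbered from 0 purely for programming convenience.

```python
from itertools import permutations
from math import comb, factorial

def with_fixed_point_formula(n):
    """C(n,1)(n-1)! - C(n,2)(n-2)! + ... + (-1)^(n+1) C(n,n) 0!"""
    return sum((-1) ** (r + 1) * comb(n, r) * factorial(n - r) for r in range(1, n + 1))

def with_fixed_point_brute(n):
    """Count permutations of 0..n-1 that leave at least one element in its own place."""
    return sum(1 for p in permutations(range(n)) if any(p[i] == i for i in range(n)))

for n in range(1, 8):
    print(n, with_fixed_point_formula(n), with_fixed_point_brute(n))
```

For n = 4, for instance, both functions give 15 out of the 24 permutations.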
The last problem gives us a way to find the number of derangements
of n symbols. Recall that a derangement is an arrangement in which no
element is left in its original position. We considered such derangements
in commenting upon the Envelopes Problem. The number of derangements
of n symbols is often denoted by $D_n$.
4.6 Theorem: The number of derangements of n symbols, $D_n$, is
$$n!\left(1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n \frac{1}{n!}\right).$$
Consequently, the probability that a given permutation of n symbols has
no fixed points is
$$1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n \frac{1}{n!}.$$
Proof: The result follows immediately by subtracting from n!, the total
number of permutations, the number of those having at least one fixed point,
as obtained in the last problem. (Or one could have directly applied
Theorem 4.2 instead of 4.1.) ∎
As noted in the solution of Problem 4.5, there is no way to evaluate
$$1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n \frac{1}{n!}$$
in closed form. Of course, the first two terms cancel each other. But there is
some reason for retaining them. A reader familiar with calculus would know
that for every complex number z,
$$e^z = \sum_{n=0}^{\infty} \frac{z^n}{n!}.$$
In particular,
$$\frac{1}{e} = e^{-1} = \sum_{n=0}^{\infty} \frac{(-1)^n}{n!}.$$
Therefore the probability calculated above is a partial sum of this
series for 1/e. Since n! is very large for even relatively small values of n,
the series converges very rapidly. Hence even for small values of n (say,
n = 10), the probability is very nearly equal to 1/e which is approximately
0.3678794412.
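How quickly the probability approaches 1/e can be seen numerically. The following Python sketch (the function name `derangements` is an illustrative choice) evaluates $D_n$ from Theorem 4.6 with exact integer arithmetic and compares $D_n / n!$ with 1/e.

```python
from math import e, factorial

def derangements(n):
    """D_n = n! (1 - 1/1! + 1/2! - ... + (-1)^n / n!), computed with exact integers."""
    # n!/r! is an integer for every r <= n, so integer division is exact here.
    return sum((-1) ** r * (factorial(n) // factorial(r)) for r in range(n + 1))

for n in (2, 4, 6, 8, 10):
    probability = derangements(n) / factorial(n)
    print(n, derangements(n), probability, abs(probability - 1 / e))
```

Already at n = 10 the difference from 1/e is below $10^{-7}$, in line with the remark about rapid convergence.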
As another application of the principle of inclusion and exclusion we
return to the problem of finding the number of surjective functions from
one set to another.
4.7 Theorem: Let X, Y be finite sets with n and m elements respectively.
Then the number of onto functions from X to Y equals
$$\sum_{r=0}^{m} (-1)^r \binom{m}{r} (m-r)^n.$$
Proof: Let F be the set of all functions from X to Y. By Theorem 2.14,
$|F| = m^n$. Now let $Y = \{y_1, y_2, \ldots, y_m\}$. For each i = 1, 2, ..., m let $F_i$ be
the set of those functions from X to Y whose ranges do not contain the
point $y_i$. Every such function may be thought of as a function from X into
the set $Y - \{y_i\}$ whose cardinality is m - 1. So again by Theorem 2.14,
$$|F_i| = (m-1)^n.$$
For $i \ne j$, $F_i \cap F_j$ is the set of functions which take values in the set
$Y - \{y_i, y_j\}$ whose cardinality is m - 2. So $|F_i \cap F_j| = (m-2)^n$.
In general for any r-tuple $(i_1, i_2, \ldots, i_r)$ with
$$1 \le i_1 < i_2 < \cdots < i_r \le m,$$
we have
$$|F_{i_1} \cap F_{i_2} \cap \cdots \cap F_{i_r}| = (m-r)^n.$$
There are $\binom{m}{r}$ such r-tuples. The result now follows from Theorem 4.2,
because a function from X to Y is onto iff it is in none of the $F_i$'s. ∎
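The formula of Theorem 4.7 is easy to test against a brute-force count for small n and m. In the Python sketch below a function from X to Y is modelled as a tuple of n values from `range(m)`; this encoding and the function names are assumptions made for illustration only.

```python
from itertools import product
from math import comb

def onto_formula(n, m):
    """Sum over r of (-1)^r C(m, r) (m - r)^n, as in Theorem 4.7."""
    return sum((-1) ** r * comb(m, r) * (m - r) ** n for r in range(m + 1))

def onto_brute(n, m):
    """Enumerate all m^n functions and keep those whose range is all of Y."""
    return sum(1 for f in product(range(m), repeat=n) if len(set(f)) == m)

for n, m in [(3, 2), (4, 3), (5, 5), (7, 4)]:
    print(n, m, onto_formula(n, m), onto_brute(n, m))
```

The brute-force version examines all $m^n$ functions, so it is practical only for small sets, whereas the formula needs just m + 1 terms.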
As a consequence, we get an expression for Stirling numbers.
4.8 Theorem: For positive integers n and m,
$$S_{n,m} = \sum_{r=0}^{m-1} \frac{(-1)^r (m-r)^{n-1}}{r!\,(m-r-1)!}.$$
Proof: In Proposition (3.8) we counted the number of surjective
functions from a set with n elements to a set with m elements as $m!\,S_{n,m}$.
Equating this with the count in the last theorem, we get the result. ∎
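The expression in Theorem 4.8 can be evaluated exactly with rational arithmetic. The Python sketch below compares it with the values produced by the standard recurrence $S_{n,m} = S_{n-1,m-1} + m\,S_{n-1,m}$ for the number of partitions of an n-set into m blocks; that recurrence is a well-known fact assumed here for checking and is not taken from this section, and the function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import factorial

def stirling_formula(n, m):
    """S_{n,m} as the sum in Theorem 4.8, evaluated with exact fractions."""
    total = sum(
        Fraction((-1) ** r * (m - r) ** (n - 1), factorial(r) * factorial(m - r - 1))
        for r in range(m)
    )
    assert total.denominator == 1     # the sum is always an integer
    return int(total)

@lru_cache(maxsize=None)
def stirling_recurrence(n, m):
    """Assumed standard recurrence, used only as an independent check."""
    if n == 0 and m == 0:
        return 1
    if n == 0 or m == 0:
        return 0
    return stirling_recurrence(n - 1, m - 1) + m * stirling_recurrence(n - 1, m)

for n, m in [(4, 2), (5, 3), (7, 4)]:
    print(n, m, stirling_formula(n, m), stirling_recurrence(n, m))
```

Since $4!\,S_{7,4} = 24 \cdot 350 = 8400$, the value for (7, 4) also agrees with the count of surjections from a 7-set onto a 4-set.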
4.9 Problem: There is an elevator in a four storeyed building which
has a ground floor and four other floors marked 1, 2, 3, 4. Seven persons
get in the elevator at the ground floor. In how many ways can they be
discharged at the remaining floors, assuming that (i) at every floor at least
one person gets out and (ii) the order of persons coming out on the same
floor is immaterial?
Solution: Because of the second assumption, every possible way of
discharging the persons on the floors corresponds uniquely to a function which
assigns to each person, the floor at which he gets out. The first assumption
implies that this function is surjective. So the answer to the problem is
simply the number of surjective functions from the set of 7 persons to the
set of four floors. By Theorem 4.7 this equals
$$4^7 - \binom{4}{1} 3^7 + \binom{4}{2} 2^7 - \binom{4}{3} 1^7 = 16384 - 8748 + 768 - 4 = 8400.$$
Sometimes the data of a problem is not sufficient to give an exact count
of the set in question. In such cases, we can nevertheless find upper and
lower bounds for its cardinality. Such a bound is said to be sharp if we
can demonstrate at least one instance, consistent with the data of the
problem, in which it is attained, i.e. in which the cardinality of the set
equals the bound. As a general result of this type we have:
4.10 Theorem: Let $S_1, S_2, \ldots, S_n$ be finite sets. For r = 1, 2, ..., n let $s_r$
be the sum of the cardinalities of the intersections of these sets taken r at a
time. Let
$$t_r = \sum_{i=1}^{r} (-1)^{i+1} s_i.$$
Then, for every r = 1, 2, ..., n - 1, $\left|\bigcup_{i=1}^{n} S_i\right|$ lies between $t_r$ and $t_{r+1}$. More
specifically,
$$t_{r+1} \le \left|\bigcup_{i=1}^{n} S_i\right| \le t_r \quad \text{for r odd}$$
and
$$t_r \le \left|\bigcup_{i=1}^{n} S_i\right| \le t_{r+1} \quad \text{for r even.}$$
These bounds are sharp.
Proof: In computing $\left|\bigcup_{i=1}^{n} S_i\right|$, we take $s_1$; then subtract $s_2$ so as to
exclude the elements which have been included too often because of their
being common to at least two of the $S_i$'s; then again add $s_3$ so as to
include the elements that have been excluded too often because of their being
common to at least three of the $S_i$'s and so on. If in this process we go
up to $s_r$, then we have correctly counted all elements of $\bigcup_{i=1}^{n} S_i$ which belong
to exactly r or less of the $S_i$'s. But a correction is due for those which
belong to more than r of the $S_i$'s. This correction is positive or negative
depending upon whether r is even or odd. But in any case it is an
over-correction because an element which belongs to more than r + 1
subsets gets counted too often in this correction. So it follows that the
correct value of $\left|\bigcup_{i=1}^{n} S_i\right|$ lies somewhere between $t_r$ and $t_r$ + (the next
correction); i.e. between $t_r$ and $t_{r+1}$. Keeping in mind the signs of these
corrections (depending upon whether r is even or odd), we get the desired
inequalities. As for their sharpness, consider a situation in which no element
is common to more than r subsets, i.e. the intersection of every r + 1
subsets is empty. Then $s_{r+1} = 0$ and so $t_r = t_{r+1}$. Therefore equality holds. ∎
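The alternating behaviour of the partial sums $t_r$ can be observed experimentally. The Python sketch below computes the $t_r$ for given finite sets and checks the two inequalities of Theorem 4.10; the random test families, the seed and the helper names are arbitrary illustrative choices.

```python
import random
from itertools import combinations

def s(r, sets):
    """s_r: sum of the cardinalities of the intersections taken r at a time (r >= 1)."""
    return sum(len(set.intersection(*family)) for family in combinations(sets, r))

def partial_sums(sets):
    """Return [t_1, t_2, ..., t_n] where t_r = s_1 - s_2 + ... + (-1)^(r+1) s_r."""
    t, sums = 0, []
    for i in range(1, len(sets) + 1):
        t += (-1) ** (i + 1) * s(i, sets)
        sums.append(t)
    return sums

def bounds_hold(sets):
    """Check t_{r+1} <= |union| <= t_r for odd r and t_r <= |union| <= t_{r+1} for even r."""
    union_size = len(set().union(*sets))
    t = partial_sums(sets)                 # t[r - 1] holds t_r
    for r in range(1, len(sets)):
        lower, upper = (t[r], t[r - 1]) if r % 2 == 1 else (t[r - 1], t[r])
        if not lower <= union_size <= upper:
            return False
    return True

example = [{1, 2, 3, 4, 5}, {4, 5, 6, 7}, {1, 5, 7, 8, 9}]
print(partial_sums(example), len(set().union(*example)))

rng = random.Random(0)
trials = [[set(rng.sample(range(12), rng.randint(0, 8))) for _ in range(5)] for _ in range(200)]
print(all(bounds_hold(family) for family in trials))
```

For the example family the partial sums are 14, 8, 9 while the union has 9 elements, so the successive sums overshoot and undershoot the true value before reaching it at $t_n$.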
The preceding theorem combined with Corollary (2.6) often gives the
desired bounds on the cardinalities as we now illustrate.
4.11 Problem: Suppose in an examination with four subjects A, B, C, D
the percentages of candidates passing in them are respectively 70, 75, 80
and 85. Find upper and lower bounds for the percentage of candidates
passing all the courses. How are these bounds affected (i) if it is known
that everybody who clears A also clears C, (ii) if it is known that
everybody who clears C clears at least one of A and B?
Solution: Let S be the set of all candidates appearing for the examination.
We may suppose $|S| = 100$ for convenience. Let A, B, C, D also denote
the sets of those candidates who clear the corresponding subjects. Then we
are given that $|A| = 70$, $|B| = 75$, $|C| = 80$ and $|D| = 85$ and we have to
find upper and lower bounds for $|A \cap B \cap C \cap D|$.
Of course 100 and 0 are always two bounds. But obviously we look for
sharp bounds. Since $A \cap B \cap C \cap D$ is contained in each of A, B, C, D,
clearly 70 (the lowest of the cardinalities of the sets) is an upper bound.
(It is also true that $A \cap B \cap C \cap D$ is contained in $A \cap B$, $B \cap C \cap D$
etc. but we are not given the cardinalities of these sets.) Also this upper
bound is sharp, because we can have $A \subset B \subset C \subset D$, in which case
$A \cap B \cap C \cap D = A$. As for the lower bound, let $A' = S - A$ etc. Then by
Theorem 4.10,
$$|A' \cup B' \cup C' \cup D'| \le |A'| + |B'| + |C'| + |D'| = 90.$$
Since $A \cap B \cap C \cap D = S - (A' \cup B' \cup C' \cup D')$, it follows that
$|A \cap B \cap C \cap D| \ge 10$. Also we can have a situation in which $A', B', C', D'$
are mutually disjoint and consequently $|A \cap B \cap C \cap D| = 10$. So the
lower bound 10 is also sharp.
If it is known that $A \subset C$, then $A \cap B \cap C \cap D$ is simply $A \cap B \cap D$.
By the same reasoning as above, the upper and lower bounds for $|A \cap B \cap D|$
are obtained as 70 and 30. Both are easily seen to be sharp. So the lower
bound is improved while the upper one is unaffected.
If it is given that $C \subset A \cup B$ then $|C| \le |A \cup B|$. But
$$|A \cup B| = |A| + |B| - |A \cap B|.$$
This gives
$$|A \cap B| \le |A| + |B| - |C| = 70 + 75 - 80 = 65.$$
Since $A \cap B \cap C \cap D \subset A \cap B$ we get $|A \cap B \cap C \cap D| \le 65$. That this upper
bound is sharp is seen from the Venn diagram in Figure 2.14, in which
$C = A \cup B$ and the shaded area is $A \cap B$. Thus the upper bound is reduced.
But there is no change in 10 as the lower bound for $|A \cap B \cap C \cap D|$,
because in the case where $A', B', C', D'$ are mutually disjoint, $C \subset A \cup B$.
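The arithmetic behind these bounds is short enough to script. The Python sketch below is a hypothetical helper, not part of the text: it returns the lower bound obtained from the complements and the upper bound given by the smallest set, for sets of known sizes inside a universe of known size when nothing else is assumed about them.

```python
def intersection_bounds(total, sizes):
    """Bounds for the size of the intersection of sets with the given sizes
    inside a universe of `total` elements, with no further information."""
    upper = min(sizes)                                          # the intersection lies in every set
    lower = max(0, total - sum(total - size for size in sizes)) # complements cover at most their sum
    return lower, upper

# Original data: A, B, C, D with 70, 75, 80, 85 out of 100.
print(intersection_bounds(100, [70, 75, 80, 85]))

# Case (i): A is contained in C, so only A, B, D matter.
print(intersection_bounds(100, [70, 75, 85]))

# Case (ii): C is contained in the union of A and B, which caps |A n B| at 70 + 75 - 80.
lower, upper = intersection_bounds(100, [70, 75, 80, 85])
print(lower, min(upper, 70 + 75 - 80))
```

The three printed pairs correspond to the bounds (10, 70), (30, 70) and (10, 65) found in the solution.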
What happens if in numerical examples such as above the cardinality
of some subset comes out to be less than some lower bound for it? Then
obviously the data is inconsistent! For example in the problem above if it
was given that $|A \cap B| = 40$, this contradicts the estimate
$$|A \cap B| = |A| + |B| - |A \cup B| \ge |A| + |B| - |S| = 45.$$
So the data is inconsistent. Such inconsistency is known as numerical
inconsistency.
Figure 2.14: Venn Diagram for Problem 4.11.
Exercises
4.1 In how many ways can the four walls of a room be painted with
three colours so that no two adjacent walls have the same colour?
4.2 Suppose the room has two doors, one on each of a pair of opposite
walls. Now, in how many ways can the walls be painted with
three colours so that no adjacent walls have the same colour and
the walls with the doors in have the same colour?
4.3 Suppose n persons are given one card each. These cards are
collected, shuffled and again distributed to the persons, one to each.
What is the probability that no person gets the same card again?
4.4 How many permutations of the integers 1 to 8 are there in which
no even integer remains in its own place?
4.5 How many permutations of the integers 1, 2, ..., n are there in
which no two adjacent positions are filled by consecutive integers
(in an increasing order)? (For example, for n = 4, the permutation
1432 is allowed but not the permutation 1423.)
4.6 The finished products in a factory have to undergo two separate
tests for quality control under two machines. On the same product
the two tests may be performed in either order but not
simultaneously. Also no machine can handle more than one product at the
same time. If the time taken for each product for each test is one
hour, prove that the number of ways to carry out the testing of n
(distinct) products in n hours is $n!\,D_n$ where $D_n$ is the number of
derangements of n objects.
*4.7 What will be the answer to the last problem if each product is to
undergo three tests of equal duration which may be carried out in
any order but not simultaneously?
4.8 Suppose we have 5 balls each of m colours. In how many ways can
they be arranged in a row so that every ball is adjacent to at least
one ball of the same colour? (The answer may be expressed as a
summation.) What if we have only 4 balls of each colour?
4.9 In how many ways can six couples be seated in the 12 positions of
a clock dial so that no couple is seated diametrically opposite?
4.10 In a hotly fought battle, among the casualties, it was found that
50% of the soldiers lost an arm, 60% lost a leg and 45% lost an
eye. 40% of the soldiers lost at least two of these organs. Find
sharp bounds for the percentage of soldiers losing all the three
organs.
4.11 In how many ways can 10 men and 7 women board a bus with 20
seats of which 5 are reserved only for women?
4.12 10 persons go for a picnic. 3 of them are vegetarians. They carry
20 (distinct) food packets of which 10 are vegetarian and 10
non-vegetarian. In how many ways can these packets be
distributed so that each picnicker gets two packets?
4.13 Prove the identity at the end of the proof of Theorem 4.3. (Hint:
Use Exercise 2.16 (ii).)
4.14 The purpose of this exercise is to give a rigorous form to the
somewhat intuitive argument in the second proof of Theorem 4.2.
Let S be a finite set and P(S) its power set. Define a function
$f : S \times P(S) \to \mathbb{R}$ (the set of real numbers) by $f(x, A) = 1$ or 0
according as $x \in A$ or $x \notin A$ for $x \in S$ and $A \in P(S)$. Prove the
following properties:
(i) If we fix $A \in P(S)$ and consider f as a function of one
variable (see Exercise 1.18) then it is essentially the
characteristic function of the subset A (cf. Exercise 1.13).
(ii) For any $A, B \in P(S)$ and $x \in S$,
(a) $f(x, S - A) = 1 - f(x, A)$
(b) $f(x, A \cup B) = f(x, A) + f(x, B) - f(x, A \cap B)$
(c) $f(x, A \cap B) = f(x, A) f(x, B)$.
(iii) For any $A \in P(S)$, $|A| = \sum_{x \in S} f(x, A)$.
(iv) Let $S_1, S_2, \ldots, S_n$ be subsets of S. Let I be the index set
{1, 2, ..., n}. For every subset J of I, let $S_J = \bigcap_{i \in J} S_i$. (By
convention we let $S_\emptyset = S$.) For $0 \le r \le n$, let $P_r(I)$ be the set of
all r-subsets of I. Then $s_r$ equals $\sum_{J \in P_r(I)} |S_J|$ and hence
$\sum_{x \in S} \sum_{J \in P_r(I)} f(x, S_J)$.
(v) If an element $x \in S$ belongs to precisely k of the subsets
$S_1, S_2, \ldots, S_n$ then $\sum_{J \in P_r(I)} f(x, S_J)$ equals $\binom{k}{r}$, with the
understanding that $\binom{0}{0} = 1$.
(vi) For every $x \in S$,
$$f\left(x, \bigcap_{i=1}^{n} S_i'\right) = \sum_{r=0}^{n} (-1)^r \left[\sum_{J \in P_r(I)} f(x, S_J)\right].$$
Summing over the two sides of (vi) as x varies over S, we get
the proof of Theorem 4.2.
4.15 If n is a positive integer then let $\phi(n)$ denote the number of positive
integers $\le n$ which are relatively prime to n, i.e., which have no
common prime factor with n. (For example, if n = 20 then such
integers are 1, 3, 7, 9, 11, 13, 17 and 19 and so $\phi(20) = 8$.) The
function $\phi$ so defined is called the Euler $\phi$-function.
(i) Using the principle of inclusion and exclusion find $\phi(200)$,
$\phi(300)$ and $\phi(1{,}000)$.
(ii) If p is a prime prove that $\phi(p^r) = p^r - p^{r-1}$.
(iii) If n has a prime factorisation as $p_1^{r_1} p_2^{r_2} \cdots p_k^{r_k}$ prove that
$$\phi(n) = (p_1^{r_1} - p_1^{r_1 - 1})(p_2^{r_2} - p_2^{r_2 - 1}) \cdots (p_k^{r_k} - p_k^{r_k - 1}),$$
using the principle of inclusion and exclusion.
(iv) If m and n are positive integers which are relatively prime to
each other then prove that $\phi(mn) = \phi(m)\phi(n)$. (Because of this
property, the Euler $\phi$-function is called multiplicative.)
Notes and Guide to Literature
The principle of inclusion and exclusion covers all the counting
techniques studied in this chapter. Its application is somewhat limited by the
fact that computing the cardinalities of the various intersections can be
difficult, and even where it is easy, the answer has often to be left in a
summation form. Still its theoretical applications are important. It gives a
formula for the Stirling numbers. Also Exercise 4.15 shows an application
to number theory. The Euler $\phi$-function is very important in algebra and
number theory. We shall have occasions to consider it later. For a more
detailed discussion of this and other multiplicative functions, see Hus [1].
Three
Sets with Additional Structures
In this chapter we elaborate the remark made earlier that the focal
point of today’s mathematics is sets with additional structures. In the first
section we study some generalities about such structures and describe how
they arise from an attempt at abstraction. The second section deals with
binary relations, which are among the simplest additional structures on a
set. In the third section we restrict to order relations, which are
important in applications. The last section introduces some basic concepts
about algebraic structures. In the next three chapters we shall deal with
certain particular types of algebraic structures. The present chapter is a
prerequisite for them.
1. Abstraction and Mathematical Structures
Birbal, the legendary wizard in the court of the Mogul Emperor Akbar,
was noted for his wit. One of the anecdotes about him relates that he was
once asked by the Emperor to give one common answer to three highly
unrelated questions. The questions were (1) Why does the horse ail? (2)
Why does the earthen pot crack? and (3) Why does the bread* char? The
diversity of the questions would be baffling to anyone. But not so for
Birbal, who quipped instantaneously, "Because you fail to move it,
Jehanpenah." He was right. The meaning of movement would of course
change according to the context. For the horse it is the trotting exercise,
for the earthen pot it is the spinning motion on a potter's wheel and for
the bread it is a quick flipover to ensure uniform baking.
Although Birbal might not have intended it, he had hit one of the
major forces behind the development of mathematics, namely to look for
*Not the modern bread in the form of a loaf, but the traditional, flat, Indian
bread (roti) which is baked on a roasting pan or directly on fire.
similarities among apparently diverse things, to isolate what is common to
them and then to concentrate on this 'abstracted' portion (often forgetting
its origin!). For example, given two problems (1) If Shivaji the Great was
born in 1630 A.D. and died in 1680 A.D., for how long did he live? and
(2) If there were 1680 birds on a tree and 1630 of them flew away, how
many would be left?, a mathematician would hardly treat them as different.
He would consider them as the same mathematical problem put in
different garbs. This is so because the 'essence' or the 'abstract' of the two
problems is the same, namely to subtract 1630 from 1680 and get 50 as
the answer. The interpretations of these figures vary according to the
problem. But (unfortunately) they are not considered to be the business of
a mathematician.
Thus, the process of abstraction is hardly new. It has been there for
centuries. But till recently, the abstraction was numerical, that is, numbers
were assigned to the various objects involved in the real life problems. A
mathematician then would work with these numbers, formulate the
problem in terms of these numbers, inventing new concepts (such as a limiting
process) if necessary, then look for methods for solving it exactly (or
approximately). The final answer would be in the form of some numbers
which would be transferred back to the original problem, where they would
be interpreted in terms relevant to that problem. This is the gist of the
development of mathematics, or at least the applied part of it. Of course,
many collateral developments also took place. For example, although
differential equations originally arose in connection with problems in
mechanics, electricity etc., the theory for solving them became so rich that
it inspired many mathematicians to try to solve other equations, even
though the latter might not have originated from any 'practical' problems.
While this line of development still continues, as was remarked earlier,
in modern times the domain of mathematics has been enlarged to include
non-numerical problems as well. In Chapter 1, we gave the Dance Problem
as an example of such a non-numerical problem. Historically, one of the
most celebrated non-numerical problems is the Konigsberg Bridge Problem,
solved by Euler. Through the city of Konigsberg flowed the Pregel river,
with two islands, A and B, in it. The islands were connected to each other
and also to the banks by seven bridges as shown in Figure 3.1. The
problem is to start from any one of these islands and to return to it after
having traversed every bridge exactly once (without using any other means,
of course, such as rowing or swimming). It is all right to come to the same
island any number of times during the journey, but no bridge may be
traversed more than once, whether in the forward or in the backward sense.
Many people tried, unsuccessfully, to perform such a round walk and
were convinced that it is impossible to do it. But nobody, till Euler, could
prove rigorously that it cannot be done. (There is obviously a world of
difference between your being unable to do some thing and that thing being
impossible!)
Figure 3.1: Konigsberg Bridge Problem.
As we shall see later in this book,* to prove the impossibility of such a
walk is really very simple, once the problem is formulated properly. Why
did, then, the problem baffle so many? The reason is that people, probably
because of their pre-occupation with numbers, simply could not lay their
hands on the essence of the problem. Many quantities (such as the lengths
of the bridges, the areas of the islands, the speed of walking), which would
be important if the problem is looked upon as a problem of motion in
mechanics, are simply irrelevant in the present problem! (That is why they
are not given in the statement of the problem.) It follows that if we want
to abstract and isolate the essence of the problem then we must look away
from these numerical data. This is what Euler did as follows.
Since the shapes and sizes of the islands are irrelevant, we may as well
represent them by points. Similarly we represent the two banks of the river
by points marked C and D. We thus get four points A, B, C and D. Now
each of the seven bridges may be represented by a curve joining two of
these points. The lengths and curvatures of these curves are unimportant.
Perhaps the only thing we should insist is that these curves should not cross
each other except at end points. We then get the configuration of points and
curves shown in Figure 3.2, which represents the essence of the problem.
In the new formulation the problem amounts to asking whether there
exists a permutation of the seven curves $C_1, \ldots, C_7$ and a choice of
orientations for them such that (i) the terminal point of each curve coincides
with the initial point of the next curve, if any, and (ii) the terminal point
of the last curve coincides with the initial point of the first curve.
*See the Epilogue.
Figure 3.2: The Essence of the Konigsberg Bridge Problem.
What has been achieved because of the new formulation? For one
thing, we can now solve the problem. There are 7! possible permutations
of the seven curves $C_1$ to $C_7$. Also each curve has two possible orientations.
Thus, in all there are $7! \cdot 2^7$ cases and if we examine them one by one we
would either come across a case in which the arrangement of the curves
(along with their chosen orientations) gives an affirmative answer to the
problem or else, after exhausting all these $2^7 \cdot 7!$ cases we would be able to
say conclusively that a round walk of the desired type does not exist.
This will be a rigorous, mathematical proof, as opposed to a mere personal
conviction, of the impossibility of such a walk based on a few
unsuccessful trials. Of course, there are far better ways to arrive at the answer.
(Euler's original solution is, in fact, the easiest and will be studied in the
chapter* on graphs). But time-consuming as the method may be, it is sure
to give us the answer eventually.
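The exhaustive examination of all the $7! \cdot 2^7$ cases is tedious by hand but routine for a computer. The following Python sketch carries it out under one explicit assumption: since Figure 3.1 is not reproduced here, the bridges are taken in the classical arrangement (two between A and C, two between A and D, and one each between A and B, B and C, B and D). The names `BRIDGES`, `is_round_walk` and `count_round_walks` are hypothetical.

```python
from itertools import permutations, product

# Assumed arrangement of the seven bridges (classical Konigsberg layout).
BRIDGES = [("A", "C"), ("A", "C"), ("A", "D"), ("A", "D"),
           ("A", "B"), ("B", "C"), ("B", "D")]

def is_round_walk(oriented):
    """oriented is a list of (initial point, terminal point) pairs, one per curve."""
    for (_, end), (start, _) in zip(oriented, oriented[1:]):
        if end != start:                       # condition (i) fails
            return False
    return oriented[-1][1] == oriented[0][0]   # condition (ii)

def count_round_walks(bridges):
    """Examine every permutation and every choice of orientations."""
    cases = walks = 0
    for order in permutations(bridges):
        for flips in product((False, True), repeat=len(bridges)):
            cases += 1
            oriented = [(v, u) if flip else (u, v) for (u, v), flip in zip(order, flips)]
            if is_round_walk(oriented):
                walks += 1
    return cases, walks

cases, walks = count_round_walks(BRIDGES)
print(cases, walks)

# A triangle of three bridges, by contrast, does admit round walks.
triangle = count_round_walks([("A", "B"), ("B", "C"), ("C", "A")])
print(triangle)
```

The search goes through all 645120 cases and finds no round walk for the assumed arrangement, while the triangle used as a control yields 6 round walks out of 48 cases.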
But the real advantage of abstracting the essence of the problem is that
it tremendously increases the applicability of the solution. In arriving at
the abstraction, we stripped the original Konigsberg Bridge problem of the
irrelevant details (such as the shapes and sizes of the islands, the lengths
of the bridges and so on). It is quite possible that some other problem,
when stripped of its inconsequential details may reduce to exactly the same
problem as the Konigsberg Bridge problem. As a simple (and somewhat
artificial) example, suppose at a tea party, some tea spills over a table and
its stream stretches across the full length of the table, with two dry spots
in it. Suppose somebody lays seven spoons as shown in Figure 3.3. A few
sugar particles stick to these spoons. An ant trapped at one of the two dry
spots wants to eat all these particles. Can it do so and return to its origi-
nal position, without getting wet, without leaving the table top and with-
out going over the same spoon more than once? The solution to the
Konigsberg Bridge problem provides an answer to this problem as well.
But this is not very impressive because in this case the similarity between
the two problems is so obvious that most people would refuse to call them
different in the first place. So we consider another example.
*See the Epilogue.
Suppose we have four chemical compounds, say, A, B, C and D. Some
of these compounds can be converted to some others by putting them into
certain processors. Each processor is capable of carrying out two mutually
opposing chemical reactions and can be used either way. For example, if
compound A is obtained from compound B by oxidation through a
processor then B can be obtained from A by reduction through the same
processor. Suppose we have seven such processors, $P_1, P_2, \ldots, P_7$, and they
act as follows (we list only one way conversions because the other way
conversions are possible by 'reversing' the processor). $P_1$ and $P_2$ convert
A to C; $P_5$ and $P_6$ convert A to D; $P_3$ converts B to C; $P_4$ converts B to D
and $P_7$ converts A to B. Can we start with a compound, say, A, make it
pass through each processor exactly once and get back the same
compound?
Figure 3.3: Problem of the Spilled Tea.
This problem does look quite different from the Konigsberg Bridge
problem. But if we represent the four compounds by four points, and each
processor by a curve joining the points corresponding to the two
compounds on which that processor acts, then we get exactly the same
diagram as in Figure 3.2. So the solution to the Konigsberg Bridge problem
could conceivably have applications in chemistry!
We could continue to list further applications (just as we can cite
numerous examples in which the equation 1680 - 1630 = 50 is used). All we have
to do is to give suitable interpretations to the four points and to the seven
curves joining them. While such applications may be interesting to others,
a mathematician will be more concerned with the common underlying
structure on which they are built. It consists of a set of 11 elements, four
of which are points (often called 'vertices') and the remaining seven are
curves (often called 'edges'). There is some inter-relationship among these
elements which can be expressed by specifying, for each edge, the pair of
vertices which are joined by it. This amounts to defining a function from
the set of edges into the set of all 2-subsets of the set of vertices.
Formally, the mathematical structure we obtained consists of three
gadgets: (1) a set V, (2) a set E and (3) a function f from E into P_2(V)
(i.e. the set of all subsets of V having two elements each). The technical
name for such a structure is a graph.* For the moment our interest is not
so much in this particular structure but rather in its genesis. We start with
something, forget some of its details which are irrelevant for our purpose,
isolate the essence and form an ‘abstract’ mathematical structure. We
prove theorems about this structure in the abstract. Such theorems are
applicable not only in the original context but in any other situation in which
the ‘abstract’ structure has been given a ‘concrete’ interpretation.
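As a hedged illustration of the triple (V, E, f), the following Python sketch stores a small structure of this kind. The vertex labels, edge labels and incidences are hypothetical, chosen only so that there are four vertices and seven edges; they are not read off Figure 3.2. The helper name `degrees` is likewise an assumption made for this example.

```python
from collections import Counter

# Hypothetical data: four vertices, seven edges (not taken from Figure 3.2).
V = {"A", "B", "C", "D"}
E = {"e1", "e2", "e3", "e4", "e5", "e6", "e7"}

# f sends each edge to the 2-subset of V formed by the vertices it joins.
f = {
    "e1": frozenset({"A", "C"}),
    "e2": frozenset({"A", "C"}),
    "e3": frozenset({"A", "D"}),
    "e4": frozenset({"A", "D"}),
    "e5": frozenset({"A", "B"}),
    "e6": frozenset({"B", "C"}),
    "e7": frozenset({"B", "D"}),
}


def degrees(vertices, incidence):
    """Count, for every vertex, the number of edges having it as an end."""
    deg = Counter({v: 0 for v in vertices})
    for ends in incidence.values():
        for v in ends:
            deg[v] += 1
    return dict(deg)


if __name__ == "__main__":
    print(degrees(V, f))
```

Two distinct edges may be sent to the same 2-subset, which is why f is stored as a mapping on E rather than E being a set of pairs.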
As another illustration of this procedure, let us consider the Test problem
discussed in Chapter 1. We shall solve it using a mathematical structure
which arises from some other problem, which, at first sight, has little
resemblance to the Test problem. Suppose there are 10 kings with their
palaces located at points P_1, P_2, ..., P_10 in a planar region. Tired of
constantly fighting for territory, they come together and sign a pact.
According to this pact, each king will be given a territory bounded by a circle
with its centre at the king's palace. This territory is to be exclusively his.
Obviously no two such territories may overlap (except perhaps for points
on their boundaries). If all territories are to be of equal size, what will be
their maximum radius?
Figure 3.4: Kings with a Territorial Pact.
*More precisely, a multigraph.
This problem is quite simple. Our common sense tells us that if the
territories are the largest possible then at least two of them will touch each
other, because if there is some leeway left, we could enlarge the territories
slightly without overlapping. In such a case the radius of each territory
will be half the distance between the closest pair of palaces. But how do we
prove rigorously that with this choice for the radius the territories will
not overlap? The proof is not difficult. Consider all the possible
pairs of the points P_1, P_2, ..., P_10. Let b be the minimum of the distances
between these 45 pairs. (It may happen that more than one pair of
palaces is at a distance b apart.) Denote by T_1, T_2, ..., T_10 the
circular territories of radius b/2 each, centred at P_1, P_2, ..., P_10 respectively
as in Figure 3.4. Now suppose, if possible, some point P is common to the
interiors of two of these territories, say T_i and T_j. Then the distance of P from
P_i as well as from P_j is less than b/2. So the sum of these two distances is
less than b. By a well-known geometric property of triangles, the distance
between P_i and P_j is less than this sum. (If P_i, P and P_j are collinear, with
P lying between P_i and P_j, then it equals this sum.) But this means that P_i
and P_j are at a distance less than b from each other, contradicting the
definition of b. Thus it follows that the territories cannot have any common
interior points, i.e. they cannot overlap except perhaps for boundary points.
It is also clear that a radius larger than b/2 will not work, because with such
a radius, the midpoint of the line segment joining the closest pair of palaces
will be a common interior point to the territories centred at these palaces.
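The argument above can be tried out numerically. The sketch below is only an illustration: the function names `max_common_radius` and `interiors_overlap` and the sample coordinates are assumptions made for this example, not part of the text.

```python
from itertools import combinations
from math import dist


def max_common_radius(palaces):
    """Return b/2, where b is the smallest distance between two palaces."""
    b = min(dist(p, q) for p, q in combinations(palaces, 2))
    return b / 2


def interiors_overlap(p, q, r):
    """Open discs of radius r about p and q share a point iff dist(p, q) < 2r."""
    return dist(p, q) < 2 * r


if __name__ == "__main__":
    # Hypothetical palace positions, chosen only for illustration.
    palaces = [(0, 0), (4, 0), (0, 3), (10, 10)]
    r = max_common_radius(palaces)
    clashes = [pq for pq in combinations(palaces, 2) if interiors_overlap(*pq, r)]
    print(r, clashes)
```

With ten palaces, `combinations(palaces, 2)` runs over the 45 pairs mentioned in the proof.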
Thus we see that of all the many attributes of the plane, the most relevant
to this problem is the concept of the distance between various points.
Let us isolate this concept, so that it can be applied to other problems.
Let X denote the set of all points in the plane. If P, Q ∈ X, let d(P, Q)
denote the distance between P and Q. This is a real number depending on
both P and Q. Using the language of functions, we have a function d from
the cartesian product X × X into R, the set of real numbers, which assigns
to each ordered pair (P, Q) of points of the plane the distance between P
and Q. This distance is often called the euclidean distance, because it occurs
so very frequently in euclidean geometry.
Now if we want to isolate this concept and put it into an abstract,
general form we would have to let X be any ‘abstract’ set and d an ‘abstract’
real-valued function on X × X. If x, y are two points of X then the real
number d(x, y) will be called the distance between them. The resulting
mathematical structure is called a metric space. Examples of metric spaces
will be obtained by letting X be some concrete set and letting
d: X × X → R
be some concrete function. We already saw one such example where X was
the set of points in the plane and d was the euclidean distance function.
But we can give many others. We list a few below:
(i) X = set of real numbers, d(x, y) = |x − y|
(ii) X = set of real numbers, d(x, y) = x + y^2
(iii) X = set of all elephants,
      d(x, y) = 1 if the elephant x is taller than the elephant y,
      d(x, y) = 0 otherwise
(iv) X = set of real numbers, d(x, y) = xy
(v) X = set of real numbers, d(x, y) = x^2 + y^2
(vi) X = any set,
      d(x, y) = 1 if x ≠ y,
      d(x, y) = 0 if x = y
(vii) X = the set of four vertices associated with the Konigsberg Bridge
problem (Figure 3.2),
d(x, y) = number of curves joining x and y. (Thus d(A, A) = 0,
d(A, C) = 2, d(C, A) = 2, d(A, B) = 0, d(C, D) = 0, etc.)
We could continue this list much longer. In fact since we have not put
any restriction on the set X or on the function d (other than that it be
real-valued), it does not take much work or imagination to give examples of
metric spaces. Whatever theorems we prove for metric spaces in the abstract
would be applicable to all the examples given above, and also to any other
examples of metric spaces. The trouble, though, is that there is not much
one can prove, given only an abstract set X (about which nothing is known
in general) and an abstract function d: X × X → R. In order to prove some
non-trivial theorems, we would have to assume as axioms some properties
of the function d. Effectively this amounts to changing the definition of a
metric space so as to make it more restrictive. Obviously, in doing so we
must be careful not to exclude our original example, namely the plane and the
euclidean distance function. For example, if we assume as an axiom that the
distance between every two distinct points be the same then this axiom is
no longer satisfied by the euclidean distance function for the plane (although
it is satisfied in example (vi) above). This means that the properties which
we wish to assume as axioms for an abstract metric space must be chosen
from among those which are true for the euclidean distance function. Let
us therefore list some of the properties of the euclidean distance function.
We do this in the next proposition.
(1.1) Proposition: Let X be the set of points in a plane and let d(P, Q)
denote the euclidean distance for two points P and Q in X. Then:
1. d(P, Q) ≥ 0 and d(P, Q) = 0 if and only if P = Q. (This property
is called positivity.)
2. For all P, Q, d(P, Q) = d(Q, P). (Symmetry)
3. For all P, Q, R, d(P, R) ≤ d(P, Q) + d(Q, R) and equality holds
if and only if P, Q, R are collinear with Q lying between P and R.
(Triangle inequality)
4. For all P, Q there exists a unique point M in the plane such that
d(P, M) = d(M, Q) = ½ d(P, Q). (Mid-point property)
Proof: The proof depends upon how d(P, Q) is defined. In elementary
books on geometry, the euclidean distance is taken as a primitive concept.
No formal proofs are given for (1) and (2) (which effectively amounts to
taking them as axioms). The inequality in (3) is proved as a property of
triangles (hence the name), namely that the sum of the lengths of any two
sides of a triangle exceeds the length of the third. For (4), the existence
of the mid-point is again taken as an axiom, while its uniqueness follows
as a consequence of (3).
In an analytic approach to geometry, points of the plane are represented
by their cartesian coordinates w.r.t. some fixed frame of reference. If
P ≡ (x_1, y_1) and Q ≡ (x_2, y_2) then d(P, Q) is defined as:
$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$
Then the proofs of all the properties reduce to some simple facts about
real numbers (such as the fact that the square of every non-zero real
number is positive). We leave the proofs as exercises because our concern is
not in proving these properties per se but in studying their consequences. ∎
We can list more properties of the euclidean distance function. But
first let us take stock as to which of these properties are satisfied by the
examples of the ‘abstract’ distance function given above. Note that the
concept of collinearity has no meaning for an abstract set. Consequently
the second part of property (3) is meaningless. Therefore, for an abstract
distance function d: X × X → R, by triangle inequality we shall simply
mean d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ X (the points x, y, z
need not all be distinct). We list the results summarily, leaving their
verification as exercises.
(i) satisfies all four properties.
(ii) satisfies none of them.
(iii) satisfies only the third property and a part of the first
property, namely, if x = y then d(x, y) = 0.
(iv) satisfies symmetry but no others.
(v) satisfies symmetry and the triangle inequality. It also satisfies
a part of the positivity condition, namely, the distance between
any two elements is non-negative. It does not satisfy the
mid-point property.
(vi) satisfies the first three properties. The mid-point property is
satisfied if and only if the set X is either empty or consists of
just one point.
(vii) satisfies the first two properties but not the other two.
Now we come to a basic question. Which of these (or some other)
properties should we assume as axioms for an abstract metric space?
Questions like this arise every time we want to define a new mathematical
structure, starting from some particular example (in this case the euclidean
distance) as a model and isolating some of its properties. So some general
discussion regarding the selection of axioms is in order. Recall that the
very purpose of axioms is to provide foundations for the theorems to be
proved. Naturally, the stronger our axiom system, the more powerful
theorems we would be able to deduce from it. These profound theorems
would make our theory rich in terms of its depth. But there is a price we
have to pay for this. As our axiom system becomes more and more power-
ful, it also becomes more and more restrictive in the sense that the number
of examples that satisfy all the axioms goes on decreasing. Consequently,
although we can prove more and deeper theorems, the domain where they
become applicable will shrink. For example, if we take symmetry and
positivity as our axioms for an abstract distance function then among the
seven examples given above, our theory will be applicable to (i), (vi) and
(vii). If we further include triangle inequality then (vii) will have to go
out. And if the mid-point property is also added, then (i) will be the only
example to which our theory would apply.
Thus we see that depth and generality are conflicting virtues. So we
seek an optimum balance between the two. That is, we want our axiom
system to be strong enough to enable us to prove non-trivial theorems
from it and at the same time not so restrictive that we would be
forced to throw away interesting applications. To hit upon the right
scheme of axioms is not an easy task. There is also some room for a
difference of opinion as to which is the optimum balance, because depending
upon individual tastes, one can lean more towards generality than
depth or vice versa. Ultimately, it is only through the test of time that a
particular scheme of axioms gets recognition as one leading to a fairly
non-trivial and still sufficiently general theory.
The definition of a metric space which is now accepted in the mathe-
matical world has well stood this test of time. We shall give this definition
first, then we shall prove a theorem about metric spaces. Finally, we shall
apply this theorem to a suitably constructed metric space and thereby
obtain a solution to the Test Problem.
1.2 Definition: A metric space consists of a set X and a function
d: X × X → R, called the distance function or the metric, satisfying the
following conditions:
1. For all x, y ∈ X, d(x, y) ≥ 0 and equality holds if and only if
x = y.
2. For all x, y ∈ X, d(x, y) = d(y, x).
3. For all x, y, z ∈ X, d(x, z) ≤ d(x, y) + d(y, z).
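The three conditions can be tested mechanically on a finite sample of points. The sketch below is an illustration only; the function name `violated_axioms` and the sample of integers are assumptions made for this example. A sample can refute an axiom but cannot prove it for an infinite set.

```python
from itertools import product


def violated_axioms(points, d):
    """Return the names of the metric axioms that fail on the given sample."""
    failed = set()
    for x, y in product(points, repeat=2):
        # Positivity: d >= 0, and d(x, y) == 0 exactly when x == y.
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            failed.add("positivity")
        if d(x, y) != d(y, x):
            failed.add("symmetry")
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z):
            failed.add("triangle inequality")
    return failed


if __name__ == "__main__":
    sample = [-2, -1, 0, 1, 2]  # hypothetical sample of real numbers
    print(violated_axioms(sample, lambda x, y: abs(x - y)))
    print(violated_axioms(sample, lambda x, y: x * y))
    print(violated_axioms(sample, lambda x, y: x * x + y * y))
```

On this sample the absolute difference violates nothing, while the product and the sum of squares each fail at least one condition.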
The plane with the euclidean distance is of course a foremost example
of a metric space. In the list of the seven examples given above, (i) and
(vi) are metric spaces. Another important example will be given after
proving a general theorem about ‘abstract’ metric spaces. Note, by the
way, that on the same set it may be possible to define more than one
distance function. For example, on the set of real numbers, one distance
function is given by d_1(x, y) = |x − y| (this is the same as the example
(i) above) while another distance function is given by d_2(x, y) = |x^3 − y^3|.
Thus although the underlying set is the same, the metric space whose
distance function is d_1 is not the same as the metric space whose distance
function is d_2. (A beginner may find this a little difficult to swallow, but it
need not be so. Different metrics induce different structures on the
underlying set and it is the structure that counts. It is somewhat like this. The
same mound of clay may be moulded into various forms and then we call
it a jug, a plate and so on even though the material is the same.) To take
into account this fact, a metric space whose underlying set is X and whose
distance function is d is denoted by the ordered pair (X; d). In the example
just given, X = R and the two metric spaces (R; d_1) and (R; d_2) are to be
distinguished from each other. Of course, we do not have to be so fussy
all the time. Where the distance function d is understood, we may denote
a metric space (X; d) simply by X.
Now, for the theorem we are going to prove, we first define the abstract
analogue of the circular territorial regions in the problem of the ten kings
considered above.
1.3 Definition. Let (X; d) be a metric space and let x ∈ X. Let r be a
non-negative real number. Then the sets B_d(x, r) and C_d(x, r), defined by
B_d(x, r) = {y ∈ X : d(x, y) < r}
and
C_d(x, r) = {y ∈ X : d(x, y) ≤ r}
are called, respectively, the open and the closed balls of radius r and centre
x, or for short, open r-ball and closed r-ball with centre x.
If X happens to be the plane and d the euclidean distance then B_d(x, r)
consists of a circular region (without the boundary) while C_d(x, r) consists
of a circular region with the boundary included. The euclidean distance
can also be defined on the three dimensional space so as to give a metric
space. In this case, the open and the closed balls in fact turn out to be
solid balls (without and with boundary respectively) in space. Hence the
name. Of course for other metric spaces, the open and the closed balls
may not look like ‘balls’. Note also that they depend as much on the
metric d as on their centre and their radius. For example, consider the two
metric spaces (R; d_1) and (R; d_2) defined above. We invite the reader to
prove that the sets B_{d_1}(2, 1) and B_{d_2}(2, 1) are respectively the open intervals
(1, 3) and (7^{1/3}, 9^{1/3}) while the corresponding closed balls are the closed
intervals [1, 3] and [7^{1/3}, 9^{1/3}] respectively. (Hence the names ‘open’ and
‘closed’.) Once again, where only one distance function d is involved, we
may suppress it from notation and denote B_d(x, r) by B(x, r) and C_d(x, r)
by C(x, r).
Obviously, in any metric space C_d(x, r) contains B_d(x, r), the difference
between the two being the set {y ∈ X : d(x, y) = r}. In all the examples
given above this difference was much smaller as compared to the
open ball B_d(x, r) and consequently we tend to think that the open and the
closed balls are ‘nearly’ equal. However, this does not hold in all metric
spaces. For example let X be any set and define d: X × X → R by d(x, y) = 1
if x ≠ y and 0 otherwise. Then (X; d) is a metric space. Verification of
the positivity and the symmetry property is trivial. For triangle inequality,
let x, y, z ∈ X. We have to show d(x, z) ≤ d(x, y) + d(y, z). If the right
hand side is 0 then x = y = z and so the left hand side is also 0. If the
right hand side is not 0 then it is at least 1 and so the inequality holds
because the left hand side can be at most 1. Note that in this metric space
all distinct points are equidistant from each other. Let x be any point of
this space. Then B_d(x, 1) consists only of x but C_d(x, 1) is the whole set X!
Thus C_d(x, 1) is considerably larger than B_d(x, 1). This is inconsistent with
our intuition but we have to accept it as it is a logical consequence of our
definitions. The reason for this behaviour is that, unlike the earlier
examples, this metric does not have the mid-point property. In fact no point
in this space has any close neighbours, the nearest neighbours being at a
distance of 1. So every point is isolated. For this reason, this metric is
often called the discrete metric. It deserves to be studied, if for no other
reason, then at least because it is a namesake of the subject matter of this
book, namely ‘discrete’ structures! In fact our interest will be in finite
sets and it is not difficult to show that every metric on a finite set is discrete
in the sense that every point of it is isolated.
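The contrast between the open and the closed unit ball in the discrete metric can be seen on a small finite set. This is a minimal sketch; the function names and the ten-element set are assumptions made for the example.

```python
def open_ball(X, d, x, r):
    """B_d(x, r): the points of X at distance less than r from x."""
    return {y for y in X if d(x, y) < r}


def closed_ball(X, d, x, r):
    """C_d(x, r): the points of X at distance at most r from x."""
    return {y for y in X if d(x, y) <= r}


def discrete(x, y):
    """The discrete metric: 0 for equal points, 1 otherwise."""
    return 0 if x == y else 1


if __name__ == "__main__":
    X = set(range(10))  # hypothetical underlying set
    print(open_ball(X, discrete, 3, 1))
    print(closed_ball(X, discrete, 3, 1))
```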
Another point which defies our intuition is that open balls of the same
radius but with different centres need not be of the same ‘size’. They are
of the same size for the euclidean distance on the plane and also for the
metric space (R; d_1) considered above. But this is false for the metric space
(R; d_2): B_{d_2}(0, 1) is an open interval of length 2 but B_{d_2}(1, 1) is an
open interval of length 2^{1/3}.
We invite the reader to pinpoint the reason for this anomalous behaviour.
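The two lengths can be checked numerically. The sketch assumes that the second metric on R is d_2(x, y) = |x^3 − y^3|; since cubing is increasing, the open ball about c of radius r is then the interval from the cube root of c^3 − r to the cube root of c^3 + r. The helper names are hypothetical.

```python
import math


def real_cube_root(v):
    """Real cube root, valid for negative arguments as well."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)


def open_ball_d2(centre, r):
    """End points of {y : |centre^3 - y^3| < r} (assumed form of d_2)."""
    return real_cube_root(centre**3 - r), real_cube_root(centre**3 + r)


def length(interval):
    lo, hi = interval
    return hi - lo


if __name__ == "__main__":
    print(length(open_ball_d2(0, 1)))  # ball centred at 0
    print(length(open_ball_d2(1, 1)))  # ball centred at 1
```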
Instances like this are to be expected because, since we are not making
all properties of the euclidean distance as the axioms for an abstract dis-
tance function, it is but natural that something that holds for the euclidean
distance may fail in some other examples of metric spaces. However,
theorems which are based only on the three properties we have chosen
(namely positivity, symmetry and the triangle inequality) will hold good in
every metric space. We prove one such simple result.
1.4 Theorem. Let (X; d) be a metric space. Let x, y be distinct points of
X and suppose b = d(x, y). If 0 ≤ r ≤ b/2 then the open balls B_d(x, r)
and B_d(y, r) are mutually disjoint. If 0 ≤ r < b/2 then the closed balls
C_d(x, r) and C_d(y, r) are also mutually disjoint.
Proof. Note first that b > 0 because of the positivity condition since
x ≠ y. Now suppose 0 ≤ r ≤ b/2. Let, if possible, z be a common point
of B_d(x, r) and B_d(y, r). Then, by definition, d(x, z) < r and d(y, z) < r.
By symmetry we get that d(z, y) < r. Now by triangle inequality,
d(x, y) ≤ d(x, z) + d(z, y) < r + r = 2r ≤ b.
Thus we see that d(x, y) < b, contradicting that d(x, y) = b. Therefore the
sets B_d(x, r) and B_d(y, r) are mutually disjoint. The proof of the second
assertion is similar and left to the reader. ∎
This theorem, as applied to the plane with the euclidean distance, was
at the crux of the solution to the problem of the kings discussed above.
We shall now construct a new metric space and apply the theorem to it,
so as to get a solution to the Test Problem.
Let X be the set of all binary sequences of length n, where n is some
positive integer. If x_1 x_2 ... x_n is such a sequence we shall denote it by
x̄ for short. Now suppose ȳ = y_1 y_2 ... y_n is another such sequence.
Then we define d(x̄, ȳ) to be the sum
$\sum_{i=1}^{n} |x_i - y_i|$.
Note that each x_i and y_i is either 0 or 1 (because the sequences are
binary). So |x_i − y_i| = 0 if x_i equals y_i and |x_i − y_i| = 1 if x_i is
different from y_i. Therefore d(x̄, ȳ) is also equal to the number of i's for
which x_i ≠ y_i, i.e. to the number of places at which the two sequences
x_1 x_2 ... x_n and y_1 y_2 ... y_n do not match with each other. The number
d(x̄, ȳ) is called the Hamming distance between the binary sequences
x̄ = x_1 ... x_n and ȳ = y_1 ... y_n. For example, if n = 8, x̄ = 01010101
and ȳ = 10010001 then d(x̄, ȳ) = 3 because these two sequences differ in
the first, second and the sixth terms.
The importance of this concept stems from the following result.
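The definition translates directly into a few lines of Python. This is a minimal sketch in which binary sequences are assumed to be given as strings of the characters 0 and 1; the function names are chosen for this example.

```python
def hamming(x, y):
    """Hamming distance: the sum of |x_i - y_i| over all positions i."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(abs(int(a) - int(b)) for a, b in zip(x, y))


def mismatches(x, y):
    """The 1-based positions at which the two sequences differ."""
    return [i for i, (a, b) in enumerate(zip(x, y), start=1) if a != b]


if __name__ == "__main__":
    print(hamming("01010101", "10010001"))
    print(mismatches("01010101", "10010001"))
```

The second function gives the alternative reading of the distance as the number of mismatching places.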
1.5 Theorem. The set of binary sequences of length n along with the
Hamming distance is a metric space.
Proof: We have to show that the function d satisfies the three axioms of
a metric space, namely, positivity, symmetry and the triangle inequality.
Let
x̄ = x_1 x_2 ... x_n
and
ȳ = y_1 y_2 ... y_n ∈ X.
Then
$d(\bar{x}, \bar{y}) = \sum_{i=1}^{n} |x_i - y_i|$.
Each term in this summation is non-negative. So the sum is also
non-negative and will be 0 if and only if
|x_i − y_i| = 0, i.e. x_i = y_i
for all i = 1, 2, ..., n. But this is equivalent to saying that x̄ = ȳ. So
positivity is established. Symmetry is also immediate from the fact that for
every i,
|y_i − x_i| = |x_i − y_i|.
For the triangle inequality, suppose
z̄ = z_1 z_2 ... z_n ∈ X.
Then
$d(\bar{x}, \bar{z}) = \sum_{i=1}^{n} |x_i - z_i|$.
Now for each i,
|x_i − z_i| ≤ |x_i − y_i| + |y_i − z_i|.
(This is known as the triangle inequality for the absolute value of real
numbers. We hope the reader is familiar with it as it is used innumerably
often in calculus.) If we sum both the sides of this inequality for i = 1 to
n, we get that
d(x̄, z̄) ≤ d(x̄, ȳ) + d(ȳ, z̄).
This completes the proof. We have given the argument using the definition
of d as a certain sum. It will be instructive for the reader to paraphrase it,
especially for the triangle inequality, in terms of the interpretation of
d(x̄, ȳ) as the number of places where the sequences x̄ and ȳ differ from
each other. ∎
Because of this theorem, all the theorems about metric spaces become
applicable to the set of binary sequences. So far we have proved one such
theorem. To apply it, let us take a closer look at the open and the closed
balls in the metric space we have just constructed. Because the Hamming
distance assumes only integral values, it suffices to consider balls whose
radii are integers. Moreover, we need consider only the closed balls,
because an open ball of radius r is equal precisely to a closed ball (with the
same centre) of radius r − 1.
1.6 Lemma: Let X be the set of all binary sequences of length n and
let d be the Hamming distance. Then in the metric space (X; d), every
closed ball of radius r (where r is a positive integer) has precisely
$\sum_{k=0}^{r} \binom{n}{k}$ elements.
Proof: Let
x̄ = x_1 x_2 ... x_n ∈ X.
For each k, let
S_d(x̄, k) = {ȳ ∈ X : d(x̄, ȳ) = k}.
Then
$C_d(\bar{x}, r) = \bigcup_{k=0}^{r} S_d(\bar{x}, k)$.
Also if k_1 ≠ k_2 then
S_d(x̄, k_1) ∩ S_d(x̄, k_2) = ∅.
So by Proposition (2.2.3),
$|C_d(\bar{x}, r)| = \sum_{k=0}^{r} |S_d(\bar{x}, k)|$.
Thus our result will be established if we show that for every k,
$|S_d(\bar{x}, k)| = \binom{n}{k}$.
Now S_d(x̄, k) consists of those sequences ȳ = y_1 y_2 ... y_n in X whose entries
differ from those of x̄ in precisely k places (out of n). The choice of k
places from n places may be made in $\binom{n}{k}$ ways. For every such choice,
there is precisely one sequence whose entries differ from those of x̄ in those
k places. This is so because the sequences are binary. Hence if we are given
that y_i is different from x_i, then x_i uniquely determines y_i. In fact,
y_i = 1 if x_i = 0
and
y_i = 0 if x_i = 1.
Thus we see that there are $\binom{n}{k}$ elements of X which are each at a
distance k from x̄. As noted before this completes the proof. ∎
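For small n the count in the lemma can be confirmed by listing the ball outright. The following is a brute-force sketch, not an efficient method; the function names and the sample centre are assumptions made for the example.

```python
from itertools import product
from math import comb


def hamming(x, y):
    """Number of places at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


def closed_ball(centre, r):
    """All binary sequences within Hamming distance r of the centre."""
    n = len(centre)
    return [y for y in product((0, 1), repeat=n) if hamming(centre, y) <= r]


def predicted_size(n, r):
    """The count given by the lemma: the sum of C(n, k) for k = 0..r."""
    return sum(comb(n, k) for k in range(r + 1))


if __name__ == "__main__":
    centre = (1, 0, 1, 1, 0)  # hypothetical centre, n = 5
    for r in range(len(centre) + 1):
        print(r, len(closed_ball(centre, r)), predicted_size(len(centre), r))
```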
Having computed the size of the closed r-balls, we now get an upper
bound on how many of them can be packed in the space X without
overlapping (like the non-overlapping territories in the Kings problem above).
We first introduce a notation. If x is a real number, then by ⌊x⌋ we shall
denote the greatest integer not exceeding x and by ⌈x⌉ we denote the least
integer greater than or equal to x. For example,
⌊43.12⌋ = 43, ⌈43.12⌉ = 44, ⌊43⌋ = 43 = ⌈43⌉, ⌊−7.5⌋ = −8
and ⌈−7.5⌉ = −7. (In the literature, ⌊x⌋ is often denoted by [x] and is
called the integral part of x.) These notations appear frequently in discrete
mathematics because we often have to convert real numbers to integers.
Note that if m is an integer and x is a real number such that m ≤ x, then,
by definition, m ≤ ⌊x⌋.
1.7 Theorem: Let (X; d) be as above. Suppose there are m mutually
disjoint closed balls of radius r each in X where r is some non-negative
integer. Then
$m \le \left\lfloor \frac{2^n}{\sum_{k=0}^{r} \binom{n}{k}} \right\rfloor$.
Proof: By the lemma above, each closed r-ball has $\sum_{k=0}^{r} \binom{n}{k}$
elements. If there are m such balls and they are mutually disjoint, then by
Proposition (2.2.3), their union has cardinality $m \sum_{k=0}^{r} \binom{n}{k}$.
But this union is a subset of X and so its cardinality cannot exceed that of X
by Corollary (2.2.6). Now, |X| = 2^n by Proposition (2.3.3). So, putting it all
together, we get
$m \sum_{k=0}^{r} \binom{n}{k} \le 2^n$ or $m \le \frac{2^n}{\sum_{k=0}^{r} \binom{n}{k}}$.
This expression need not be an integer. But m is an integer. So
$m \le \left\lfloor \frac{2^n}{\sum_{k=0}^{r} \binom{n}{k}} \right\rfloor$. ∎
Combining the last theorem with Theorem (1.4), we now get an upper
bound for the number of points in X which are sufficiently far apart from
each other. We prefer to word the result in terms of its contrapositive,
because of the application to follow.
1.8 Theorem: Let (X; d) be as above. Let t be an odd positive integer
and let r = (t − 1)/2. Then if we take more than
$\left\lfloor \frac{2^n}{\sum_{k=0}^{r} \binom{n}{k}} \right\rfloor$
points in X, then at least two of these points must be at a distance less
than t from each other.
Proof: Suppose the number of distinct points taken is m. Suppose no
two of them are at a distance less than t from each other. Then for every
two distinct points, say, x̄ and ȳ among these m points, d(x̄, ȳ) ≥ t. So
r < d(x̄, ȳ)/2.
Therefore by Theorem (1.4), the closed balls C_d(x̄, r) and C_d(ȳ, r) are
mutually disjoint. Since this holds for every pair of distinct points from
the m points taken, it follows that X contains a family of m mutually
disjoint closed r-balls. Therefore, by the last Theorem,
$m \le \left\lfloor \frac{2^n}{\sum_{k=0}^{r} \binom{n}{k}} \right\rfloor$.
This contradicts our assumption that more than
$\left\lfloor \frac{2^n}{\sum_{k=0}^{r} \binom{n}{k}} \right\rfloor$
points have been taken, and establishes the result. ∎
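The theorem can be watched at work on a small case. The sketch below builds, by a simple greedy scan, a set of binary sequences of length n in which every two are at distance at least t, and compares its size with the bound. The greedy scan is a technique chosen for this illustration, not a method from the text, and it is not claimed to find a largest possible set; all function names are hypothetical.

```python
from itertools import combinations, product
from math import comb


def hamming(x, y):
    """Number of places at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


def bound(n, r):
    """floor(2^n / sum of C(n, k) for k = 0..r)."""
    return 2**n // sum(comb(n, k) for k in range(r + 1))


def greedy_separated_set(n, t):
    """Scan all sequences in order, keeping those at distance >= t from all kept so far."""
    chosen = []
    for candidate in product((0, 1), repeat=n):
        if all(hamming(candidate, c) >= t for c in chosen):
            chosen.append(candidate)
    return chosen


if __name__ == "__main__":
    n, t = 5, 3
    r = (t - 1) // 2
    code = greedy_separated_set(n, t)
    print(len(code), bound(n, r))
```

Whatever set the scan produces, its size cannot exceed the bound, which is exactly what the theorem asserts in contrapositive form.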
Put differently, the last result says that for every non-negative integer
r, the integer
$\left\lfloor \frac{2^n}{\sum_{k=0}^{r} \binom{n}{k}} \right\rfloor$
is an upper bound on the maximum number of binary sequences of length
n every two of which are at least 2r + 1 apart. This upper bound is
called the Hamming bound and is important in coding theory. (See the
Epilogue.)
We now have all the machinery needed to solve the Test Problem. In
this problem, 20 students appear for a test with 10 questions. As commented
in Chapter 1, Section 3, the answer-book of each student corresponds
to a binary sequence of length 10. To say that two students answer at least
six questions identically is equivalent to saying that their answers to at most
four questions are different, i.e. the Hamming distance between their
answer-books is 4 or less. We apply Theorem (1.8) with n = 10 and t = 5.
Then r = 2 and
$\left\lfloor \frac{2^n}{\sum_{k=0}^{r} \binom{n}{k}} \right\rfloor = \left\lfloor \frac{1024}{1 + 10 + 45} \right\rfloor = \lfloor 18.28\ldots \rfloor = 18$.
Since we have more than 18 students, it follows that the answer-books
of at least two of them must be at a distance less than 5 apart from each
other. Actually the problem can be solved even with 19 students instead of
20.
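The arithmetic above is easy to reproduce. The short sketch below is only a check of the numbers; the function name `hamming_bound` is chosen for this example.

```python
from math import comb


def hamming_bound(n, r):
    """floor(2^n / sum of C(n, k) for k = 0..r), using integer division."""
    return 2**n // sum(comb(n, k) for k in range(r + 1))


if __name__ == "__main__":
    n, t = 10, 5          # ten questions; distance 5 is the threshold
    r = (t - 1) // 2      # r = 2
    ball = sum(comb(n, k) for k in range(r + 1))
    print(ball, 2**n, hamming_bound(n, r))
```

Integer division is used so that no rounding of 1024/56 is involved.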
The reader may ask whether all this lengthy procedure is necessary to
arrive at a solution to the Test Problem. After all, the original problem does
not involve the Hamming distance. Could we not simply consider, for every
student, the set of all binary sequences which differ from the answer-book
of that student in at most two places and then show that these sets are
mutually disjoint? This question is important because a similar question
can be asked whenever we apply a general result about mathematical
structures of a certain type (in this case a metric space) to a particular
instance of it (in this case the Hamming distance). Strictly speaking the
answer to such questions is ‘yes’. Whatever can be done through the use of
general mathematical structures can, at least in theory, be done directly in
a particular case. So if our interest is in only one particular problem, then
abstract structures are not indispensable. Why do we study
them then? One answer was already given above, namely, the same struc-
ture may be applicable in more than one problem and so studying it in the
abstract saves duplication of work. The greater the number of instances
where a structure can be applied, the more worthwhile it is to study it in
the abstract. A fitting analogy would be that if you want to send just one
or two greeting cards to persons in your neighbourhood, you might as well
arrange to deliver them personally; but if you want to send them in bulk
you better mail them (and thus use the postal structure) even though one
of the cards may be addressed to your next-door neighbour !
However, there is one more reason to justify the study of abstract
mathematical structures. It is valid even when we have just one or two applications
of that structure. True, the abstract structure is not indispensable in the sense
that whatever can be done with it can also be done directly without it.
Sometimes, it is indeed preferable to do so. But many times the direct solution
looks so clumsy and contrived that one would wonder how on earth it was
thought of. For example in the direct solution to the Test Problem, it is
far from easy to see the significance of the set of all binary sequences which
differ from a given binary sequence in at most two places. But with the
introduction of the Hamming distance, the same set corresponds to a very
natural concept, namely, a closed ball of radius 2.
As another example, take the concept of the dual partition of a
given partition of a positive integer. We introduced it at the end of
Chapter 2, Section 3 and used it successfully to give a solution to the
Postage Problem. Strictly speaking, this is a purely set-theoretic concept
and we could have defined it as such. Consider a partition of a positive
integer n into k parts of sizes a_1, ..., a_k (say) with a_1 ≥ a_2 ≥ ... ≥ a_k > 0.
Here all the k parts have positive sizes. For each i, let b_i be the number of
parts whose size is ≥ i. Clearly b_1 = k and b_i = 0 for i > a_1. Also
b_1 ≥ b_2 ≥ ... ≥ b_{a_1} > 0.
For each i = 1, 2, ..., the number of parts of size i is b_i − b_{i+1}. It follows
that $\sum_{i=1}^{a_1} i\,(b_i - b_{i+1}) = n$. But if we actually write this sum out
we see, after cancellations, that it is nothing but b_1 + b_2 + ... + b_{a_1}. Thus
we have proved that b_1 + b_2 + ... + b_{a_1} is also a partition of n. It is not
hard to show, further, that this is precisely the dual of the original partition of n.
We could have therefore defined the dual partition in this manner. But this
definition does not appear as elegant as the earlier one, based on the
Ferrer's graph of a partition. When we draw the Ferrer's graph of a
partition, we are in essence putting an additional structure on the underlying
set, namely the structure derived from the geometry of the cartesian plane.
The earlier definition of a dual partition is not substantially different from
the one given above. But it appears more natural because of our long
familiarity with the geometry of the plane. The geometric structure also
helps in the proof of a result like that of Exercise (2.3.36), where a direct
proof would be quite awkward. As we shall see in Chapter 6, the geometric
structure is also inherent in certain concepts involving matrices.
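The set-theoretic description of the dual partition can be computed directly. This is a minimal sketch; the function names and the sample partition 7 = 4 + 2 + 1 are assumptions made for the example.

```python
def dual_partition(parts):
    """Return [b_1, ..., b_{a_1}], where b_i counts the parts of size >= i."""
    if not parts:
        return []
    return [sum(1 for a in parts if a >= i) for i in range(1, max(parts) + 1)]


def weighted_sum(b):
    """Compute the sum of i * (b_i - b_{i+1}), taking b_{a_1 + 1} = 0."""
    padded = list(b) + [0]
    return sum(i * (padded[i - 1] - padded[i]) for i in range(1, len(b) + 1))


if __name__ == "__main__":
    parts = [4, 2, 1]  # hypothetical partition of 7
    b = dual_partition(parts)
    print(b, sum(b), weighted_sum(b))
```

Applying `dual_partition` twice returns the original partition, in line with the picture of reflecting the Ferrer's graph.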
So far we introduced two mathematical structures, a graph and a metric
space. Typically, a mathematical structure consists of an ‘abstruct’ set,
called the underlying set, (sometimes more than one set as in the case of a
graph). On this set, we have an additional structure. Depending upon the
type of study We are interested in, the form of this structure will vary.
Usually it is in the form of some function or some set associated with the
underlying set. A few properties of such functions or sets are often assumed
as axioms. We shall study many such structures in this and the next few
chapters. We conclude this section with a mathematical structure which is
designed to take into account the repetitions in a set. According to our
convention, the repetitions of elements in specifying a set by listing its
elements do not affect the set. Thus we regard {1, 2, 1, 2, 2, 3, 2, 1} as the
same set as {1, 2, 3}. Many times, however, the frequency with which a
particular element appears in a list may be relevant. For example, when
we consider the set of marks obtained by the students in a class at some
examination, we would like to count each figure as many times as the
number of students who score it. Similarly, in considering the set of roots of
a polynomial, we would like to count each root according to its multiplicity.*
The appropriate concept to handle the repetitions of elements of a set
is a multiset. It can be formally defined as an ordinary set along with a
certain additional structure on it. This additional structure is in the form
of a function which tells us how many times a particular element is to be
repeated.
1.9 Definition: A multi-set is an ordered pair (S, f) where S is a set and
f : S → N is a function, called the frequency or multiplicity or weight function.
A convenient way of writing a multi-set is by writing each element as
many times as its frequency. Thus, the multi-set {1, 1, 2, 3, 3, 3, 3, 6} is the
multi-set (S, f) where S = {1, 2, 3, 6} and f : S → N is defined by f(1) = 2,
f(2) = 1, f(3) = 4 and f(6) = 1. The roots of the polynomial x^4 - x^3 form
the multi-set {0, 0, 0, 1}.
Many concepts for sets can be generalized for multi-sets. For example,
if (S, f) and (T, g) are two multi-sets then we say (S, f) is a submulti-set
of (T, g) if first of all, S ⊂ T and secondly, for every x ∈ S, f(x) ≤ g(x).
Given any set S we can define f : S → N by f(x) = 1 for all x ∈ S. Then
the multi-set (S, f) can obviously be identified with the set S, because no
element is repeated more than once. In this sense, the theory of multisets
is more general than the theory of sets. Some of the things that hold for
sets may fail for multisets. A few results about multisets will be given as
exercises.
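A multi-set (S, f) can be modelled quite directly on a computer. The sketch below is one possible representation, assuming Python: a dictionary whose keys form the underlying set S and whose values give the frequency function f. The names is_submultiset and as_multiset are hypothetical helpers introduced only for this illustration.

```python
def is_submultiset(small, big):
    """(S, f) is a submulti-set of (T, g) if S is contained in T and f(x) <= g(x) on S."""
    return all(x in big and count <= big[x] for x, count in small.items())


def as_multiset(s):
    """Identify an ordinary set with the multi-set in which every frequency is 1."""
    return {x: 1 for x in s}


# The multi-set {1, 1, 2, 3, 3, 3, 3, 6}: S = {1, 2, 3, 6} with f(1) = 2, f(2) = 1, ...
m = {1: 2, 2: 1, 3: 4, 6: 1}
sub = {1: 1, 3: 4}                 # the multi-set {1, 3, 3, 3, 3}
print(is_submultiset(sub, m))
```

The check fails as soon as some element occurs more often in the first multi-set than in the second, or does not occur in the second at all.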
Exercises
(Some of these exercises are meant only to develop your intuition.
Rigorous proofs may not be possible at this stage. Nevertheless, try to be
as precise as you can.)
1.1 Give an example of a problem (other than those given in the text)
which reduces to the same problem as the Königsberg bridge
problem.
1.2 For which of the graphs in Figure 3.5 is it possible to have a round
tour in which every edge is traversed exactly once?
1.3 Give an example of a graph in which there exists a tour with every
edge traversed exactly once but no such round tour exists.
*See Exercise (6.2.22) for definition.
[Figure 3.5: Graphs (a) and (b) for Exercise (1.2)]
1.4 Suppose there is a green meadow M with a little pond P in it as
shown in Figure 3.6. A herd of cows is originally at the point A.
Each cow moves from point A to point B without going through
the pond and without going around the pond. Prove that the cows
fall into two categories depending upon the position of the pond
vis-a-vis the path of the cow. Classify the three cows whose paths
are shown in Figure 3.6.
[Figure 3.6: Cows in the Meadow with a Pond.]
1.5 Which of the following attributes correctly describes the classification
of the cows in the last theorem? That is, given two cows
traversing paths C_1 and C_2 (say), which of the following quantities
tell us whether the two cows are in the same class or not?
(i) the lengths of C_1 and C_2,
(ii) the curvatures of C_1 and C_2,
(iii) the area bounded by C_1 and C_2.
1.6 (For those who know a little complex analysis). Prove that the
classification in Exercise (1.4) can be characterised rigorously in
terms of a certain line integral over the paths traversed by them.
(We shall not study problems like this. This one is meant only to
illustrate how, out of the various possible structures on the same
set, one is particularly relevant to a given problem.)
1.7 Verify the assertions made about the seven examples regarding
which of the four properties of the euclidean distance are shared by
them.
1.8 Prove Proposition (1.1) using the analytical definition of the eucli-
dean distance.
1.9 Represent points of the euclidean plane by cartesian coordinates
w.r.t. some fixed rectangular frame of reference. If P = (x_1, y_1)
and Q = (x_2, y_2) are two points define
d_1(P, Q) = |x_1 - x_2| + |y_1 - y_2|
and
d_2(P, Q) = max{|x_1 - x_2|, |y_1 - y_2|}.
Prove that both d_1 and d_2 give rise to metric spaces. Describe the
open and the closed balls w.r.t. these metrics.
1.10 What will be the answer to the problem of the kings with a
territorial pact if all the territories are to be squares of the same
size with sides parallel to the east-west and the north-south
directions?
1.11 Same as the last problem except that the diagonals of the squares
are to be in the east-west and the north-south directions.
*1.12 If in Exercise 1.10 we allow rectangles instead of squares, then
prove that the solution is not unique in general.
**1.13 Do Exercise 1.10 with a further relaxation, namely that the sides
of the rectangles need not all be parallel to the same direction.
1.14 Let X be a finite set and d a metric on X. Prove that there exists
a real number A > 0, such that for all 0 < r ≤ A, and for all
x ∈ X, B_d(x, r) consists only of the point x. (In this sense, every
metric on a finite set behaves locally like the discrete metric.)
1.15 Let (X_1; d_1) and (X_2; d_2) be metric spaces. Let X = X_1 × X_2. Define
d : X × X → R by any one of the following formulas:
(i) d((x_1, x_2), (y_1, y_2)) = √([d_1(x_1, y_1)]² + [d_2(x_2, y_2)]²)
(ii) d((x_1, x_2), (y_1, y_2)) = d_1(x_1, y_1) + d_2(x_2, y_2)
(iii) d((x_1, x_2), (y_1, y_2)) = max{d_1(x_1, y_1), d_2(x_2, y_2)}.
Prove that each of these is a metric on X_1 × X_2. Generalize to the case
when we have n metric spaces (X_1; d_1), (X_2; d_2), ..., (X_n; d_n).
(The first metric is called the Pythagorean metric. Can you
justify the name?)
1.16 Prove that the Hamming distance can be thought of as a special
case of the metric (ii) in the last exercise (generalised to the case
of n metric spaces). (Hint: Note that a binary sequence of length n
corresponds, in a natural way, to an element of the product
Z_2 × Z_2 × ... × Z_2
(n times) where Z_2 is the set {0, 1}.)
1.17 Prove that there do not exist three (distinct) binary sequences of
length 10, every two of which are at least at a distance 7 apart
from each other, even though the Hamming bound for the number
of such sequences is 5. (Thus the Hamming bound is not always
sharp.)
1.18 Let n, t be positive integers with t < n. Let
M = 2^n / Σ_{k=0}^{t-1} (n choose k).
Prove that there do exist at least M binary sequences of length n
every two of which are at least at a distance t apart from each
other. (This gives a lower bound for the number of such sequences.
It is called the Gilbert bound.)
1.19 For n = 10 and t = 6, prove that the Gilbert bound is also not
sharp.
1.20 Generalise the concept of Hamming distance and Theorem 1.7
to k-ary sequences of length n.
1.21 Define the intersection and union of two multi-sets in such a way
that they come out to be, respectively, the largest common
sub-multi-set of the two and the smallest common super-multi-set of
the two.
1.22 Suppose (A, f) is a submulti-set of a multiset (X, g). Define
B = (X - A) ∪ {x ∈ A : f(x) < g(x)}
and h : B → N by
h(x) = g(x) if x ∈ X - A,
h(x) = g(x) - f(x) if x ∈ A.
The multiset (B, h) is called the complement of (A, f) in (X, g).
Prove, however, that (A, f) ∩ (B, h) need not be empty.
1.23 The cardinality or the weight of a multi-set (S, f) is defined as
Σ_{s ∈ S} f(s). Generalise the results about the cardinalities of sets
(in Chapter 2, Section 2) to multi-sets.
1.24 Let (S, f) be a multi-set of cardinality n. A permutation of (S, f)
is defined as a function
σ : {1, 2, ..., n} → S
such that for every x ∈ S,
|σ^{-1}({x})| = f(x).
Show how this generalises the concept of a permutation of a set.
Using Theorem 2.2.17 obtain a formula for the number of
permutations of a multiset.
1.25 In a survey of 98 persons, each person was asked to indicate his
preference for two products by a rating from 0 (total dislike) to 9
(total liking) for each of them. No person showed total dislike for
both the products or total liking for both. Also no two persons
gave the same ratings to both the products. Prove that it is impossible
to pair off these 98 persons in such a way that in each pair the
two persons have a minimum difference of taste.
1.26 What would have happened if Birbal, instead of answering ‘Because
you fail to move it’ had answered ‘Because you fail to treat it
properly’?
Notes and Guide to Literature
This section gives the foundations of mathematical structures. Actual
structures will be studied in the chapters to come, where additional
comments will be made.
There is some variation in the definition of a graph. We shall study
them later (see the Epilogue). Standard references include Harary [1] or
Deo [1].
The theory of metric spaces is very important in analysis and topology.
See for example, Simmons [1].
The problem about the kings with a territorial pact is a special case of
a more general problem of fitting in a given figure, mutually disjoint
replicas of some other figure. The general problem is considerably complex.
The concept of Hamming distance also appears in coding theory (see
the Epilogue).
For permutations of multisets, see Knuth [1], vol. 3, Chapter 5,
Section 1.
2. Binary Relations on Sets
Given a set X, one of the simplest and yet one of the most frequently
occurring additional structures on X consists of what is called a binary
relation on X. As the name implies, a binary relation on a set X describes
some type of relationship which may exist between certain pairs of elements
of X. It is the relationship and not the two particular elements that
matters. For example, let X be the set of human beings in a town. Two
particular elements, say, A and B may be related to each other in any one or
more of the many possible ways, e.g. (i) A and B work at the same place
(ii) A and B are neighbours of each other (iii) A is exactly 7 inches taller
than B (iv) A loves B (v) A loves B but B does not care (vi) B is A’s wife
(vii) A and B live in the same town. There may be many other pairs of
persons who also exhibit some of these relationships between them. The
significance of these relationships of course varies depending upon our
point of view: (i), (ii) and (vi) may be relevant in a census; a dress
designer may be concerned only with (iii); (iv) and (v) may be vitally
interesting to a gossip. (vii) does not seem to say much because A and B are
already known to be from the same town. Still it is a relationship, a relationship
which exists between every two members of X, as a matter of fact. At the
other extreme, we may have an impossible relationship such as, (viii) A is
taller than B and also B is taller than A. There is no pair of persons who
are so related. Finally we may also have a relationship like (ix) A is more
than 18 years old. At the outset one may refuse to call this a relationship
between A and B because B is not at all ‘involved’ in it. Still, it is a
perfectly well-defined relationship and may be relevant, for example, in
opening a joint bank account if the rules require the first account holder to be
an adult.
Having given so many examples of relationships which may exist between
two persons at a time, let us now see how we can give a rigorous,
mathematical definition of a binary relation. Let us take example (iv)
above. So, let X be the set of all persons in a town. Now define
R = {(x, y) ∈ X × X : x loves y}.
Then R is a subset of X × X and consists of all ordered pairs of persons in
X in which the first person loves the second. (We are not requiring that x and
y be always different; it may happen that some person loves himself!) Thus
the relationship (iv) determines the set R, which is a subset of X × X.
Conversely the subset R determines the relationship (iv) completely. Given two
persons, say, A and B in the town, whether A loves B or not amounts to
asking whether the ordered pair (A, B) is in R or not. The same holds for
all other relationships considered above. Each one of them determines
some subset of X × X and this subset, in turn, determines the relationship.
These subsets, of course, vary depending upon the relationship. For the
relationships (vii) and (viii) they are respectively, X × X and ∅, the empty
set.
We now appeal to the ‘definition trick’ mentioned in Chapter 1,
Section 4. When we have two things, inexorably related to each other, we
define one of them to be the other. This leads to the following formal
definition.
2.1. Definition: A binary relation on a set X is defined as any subset of
X × X. If R ⊂ X × X and (x, y) ∈ R we say ‘x is R-related to y’ or ‘x is
related to y under R’ and often write ‘xRy’. More generally, for every
positive integer n, we can define an n-ary relation on X as a subset of
X × X × ... × X (n times). We shall rarely consider n-ary relations for n ≠ 2.
So by relation, from now on, we shall mean a binary relation unless
otherwise specified.
Despite the motivation given above, it may still appear somewhat
arbitrary and smug to define a relation on a set X as a subset of X × X. But
actually it need not be so. Conceiving a relationship through the set of
pairs which have it is not artificial, it is in fact natural. We often hint at
certain relationships without naming them, by citing some instances of the
relationship. Take, for example, the statement ‘New Delhi
is to India what London is to England or Paris to France’. We all
understand that the implied relationship is that of being the capital of the country.
(Strictly speaking it could be different. London and Paris are also the
largest cities of England and France respectively and if this was the
intended relationship then New Delhi would have to be replaced by Calcutta.
The difficulty here is quite understandable. A set cannot be uniquely
specified by merely giving two or three of its elements, as there may be other
sets containing these elements. That is why, when we identify a relation on
a set X with a subset of X × X, we take the set of all ordered pairs of
elements of X related under that relation.) As a poetic example, take the
statement, ‘without you I am like a fish out of water or like a night
without a moon’. Suggestion of a particular relationship through some instance
of it is, in fact, the very essence of a simile.
It may, of course, happen that two apparently different relationships
determine the same relation on a set X. The chances of this are more, the
smaller the set X. Suppose, for instance, that X consists of three persons,
say, A, B, C. Let the heights of these persons be, respectively, 190,
180 and 170 cms. and let their weights be 80 kg., 70 kg. and 60 kg.
respectively. Then the two relations, ‘x is taller than y’ and ‘x is heavier
than y’ are the same for the set X; each corresponds to the set
{(A, B), (B, C), (A, C)}.
There is nothing wrong in this, because as far as the set X is concerned,
higher weight is indeed equivalent to greater height.
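The identification of a relation with a set of ordered pairs is easy to try out on this small example. The following is a rough sketch in Python, with the dictionary and variable names chosen by us purely for illustration. Each relationship is turned into a subset of X × X, and the two subsets are then compared.

```python
# Heights (cm) and weights (kg) of the three persons, as given in the text
height_cm = {"A": 190, "B": 180, "C": 170}
weight_kg = {"A": 80, "B": 70, "C": 60}
X = set(height_cm)

# Each relationship determines a subset of X x X
taller = {(x, y) for x in X for y in X if height_cm[x] > height_cm[y]}
heavier = {(x, y) for x in X for y in X if weight_kg[x] > weight_kg[y]}

print(taller == heavier)   # the two relationships give the same relation on X
```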
Another point which a beginner sometimes finds confusing is that we
are allowing every subset of X × X to be a binary relation on X. As we saw
above, certain subsets of X × X arise from some relationship for elements
of X. Is this true of all subsets of X × X? For example, let X = N, the set
of positive integers. Let R = {(1, 1), (2, 2), (2, 4), (3, 3), (3, 9), (4, 4), (4, 16),
(5, 5), (5, 25), ..., (n, n), (n, n²), ...}. It is clear that R corresponds to the
relation ‘y equals either x or x²’. But what if we take
S = {(1, 39), (2, 105), (3, 87)}?
Does S correspond to some relationship for elements of N? The answer
is ‘yes’. But this relationship may not be as ‘natural’ as the one for R. Put
differently, what we want is a characteristic property of the set S (see
Chapter 2, Section 1). Theoretically, there is always one such property,
namely, the property of belonging to S. This may sound evasive but
philosophically it is not so. The very fact that out of all possible pairs of
positive integers, we chose to pick only the three pairs, (1, 39), (2, 105)
and (3, 87) and no others, itself serves to distinguish them. Admittedly the
choice was arbitrary. Somebody else may make a different choice. But he
too would get a perfectly well-defined binary relation.
With so much preamble about what a binary relation on a set is, let
us now get down to their properties. Since every subset, without restriction,
of X × X is a binary relation on X, it is clear that the number of binary
relations on a set with n elements is 2^(n²). There are two ways of visualising a
binary relation on a set X. One is the cartesian representation, where we think
of X as the real line (or some subset of it). Then a relation on X
determines a subset of the cartesian plane. One advantage of this representation
is that the points on the line whose equation (in cartesian co-ordinates) is
y = x correspond to ordered pairs both of whose elements are equal. Formally,
for any set X, the diagonal on X, denoted by ΔX, is defined as the set
{(x, x) ∈ X × X : x ∈ X}.
This is a subset of X × X and plays a crucial role in certain concepts
associated with binary relations. Figure 3.7 shows the relation R defined by
xRy iff x ≤ y on the interval [0, 1]. Note that points on the diagonal are
included in R. If the relation were x < y then they would not be included.
Figure 3.7: Cartesian Representation of a Relation.
Another method for visualising a binary relation R on a set X is known
as the graphic representation. It is convenient when the cardinality of the
set R (i.e. the number of pairs related under R) is relatively small. In this
method we picture elements of X by points in a plane. Whenever xRy, we
draw a directed arrow (which may be curved) from x to y. These arrows
may cross each other, but we ignore the crossings. (If X is a finite set, the
crossings can be eliminated by allowing the curves to lie in space instead
of just in the plane.) Note that if a point x is related to itself, then there
will be a closed curve or a ‘loop’ at x. Figure 3.8 shows the graphic
representation of the relation x ≤ y on the set {1, 2, 3, 6}.
Figure 3.8: Graphic Representation of a Binary Relation.
The reader will notice that this representation looks very much like a
graph, if we take points of X as vertices and the arrows as edges. This is
indeed so, except that the edges are now directed in a particular sense.
Note also the presence of loops. The structure that we get here is formally
called a digraph (a short form of directed graph) and we shall study it
later. (See the Epilogue).
Since we are allowing every subset of X × X to be a relation, obviously
there is little that can be proved for all binary relations in general. In order
to prove something non-trivial, we have to impose some additional
conditions on the relations. We list three such conditions in the following
definition.
2.2. Definition: A binary relation R on a set X is said to be
(i) reflexive if for every x ∈ X, xRx,
(ii) symmetric if for every x, y ∈ X, xRy implies yRx,
(iii) transitive if for every x, y, z ∈ X, xRy and yRz imply xRz.
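For a finite set these three conditions can be checked mechanically. The sketch below, assuming Python, stores a relation R as a set of ordered pairs; the function names are our own and are not taken from the text.

```python
def is_reflexive(X, R):
    """xRx for every x in X."""
    return all((x, x) in R for x in X)


def is_symmetric(R):
    """xRy implies yRx."""
    return all((y, x) in R for (x, y) in R)


def is_transitive(R):
    """xRy and yRz imply xRz."""
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)


X = {1, 2, 3, 6}
leq = {(x, y) for x in X for y in X if x <= y}   # the relation x <= y on X
print(is_reflexive(X, leq), is_symmetric(leq), is_transitive(leq))
```

Note that reflexivity needs the underlying set X as well as R, whereas symmetry and transitivity can be decided from the pairs in R alone.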
Before giving examples of relations having these properties, it is
instructive to interpret them in terms of the two representations of
relations we have studied. Let us first take the graphic representation.
Evidently, the first condition, reflexivity, is equivalent to saying that at every
point there is a loop. The second condition says that whenever there is an
arrow joining one point to another, there is also an opposite arrow joining
the second point to the first. (For loops, we do not draw opposite loops
because the symmetry condition is always satisfied when y = x.) The
transitivity condition, in graphic representation, means that whenever you
can ‘transit’ from x to y by an arrow and also from y to z by an arrow then
you can also transit directly from x to z by an arrow. (Hence the name).
For the cartesian representation, reflexivity of a relation R on a set X
is equivalent to saying that all points on the diagonal are included in R,
i.e. ΔX ⊂ R. The diagonal also plays a crucial role in the interpretation
of symmetry. From elementary co-ordinate geometry it follows that for
any real numbers x_0 and y_0, the point (y_0, x_0) is the mirror image of the
point (x_0, y_0) in the line x = y (i.e. the line segment joining (x_0, y_0) and
(y_0, x_0) is bisected at right angles by the line x = y). It is then clear that a
relation R is symmetric if and only if it is symmetrically situated about the
diagonal line. Hence the name. Unfortunately, there is no natural
interpretation of the transitivity of a relation in terms of its cartesian
representation.
As a simple application of these interpretations let us count the number
of reflexive and symmetric relations on a set.
2.3. Proposition: On a set with n elements, there are 2^(n² - n) reflexive
relations, 2^((n² + n)/2) symmetric relations and 2^((n² - n)/2) relations which are
both reflexive and symmetric.
Proof: Let X be a set with n elements. The diagonal ΔX contains n elements
and its complement, X × X - ΔX, contains n² - n elements. Now a
reflexive relation R on X contains ΔX and the remainder R - ΔX can be
any subset of X × X - ΔX. So the number of reflexive relations on X is the
same as the number of subsets of X × X - ΔX, which by Theorem (2.2.15)
is 2^(n² - n). Now, X × X - ΔX consists of all pairs (x, y) with x ≠ y and
x, y ∈ X. Group together (x, y) with (y, x). Then X × X - ΔX is decomposed
into (n² - n)/2 such groups, each containing two elements which are
mirror images of each other in the diagonal. Let Y be a set formed by picking
any one element from each of these duplets. (If X is a set of real numbers
then the choice can be made uniformly by taking only those pairs (x, y)
for which x < y. This gives Y as the ‘upper’ triangular half of the square
X × X, with the diagonal removed.) Now, a symmetric relation R on X is
determined completely by what points of ΔX are in it and what points of
Y are in it (because for all x ≠ y, R will either contain both (x, y) and (y, x)
or neither of the two and precisely one of these elements is in Y). Thus a
symmetric relation corresponds to a subset of ΔX ∪ Y while a relation
which is both symmetric and reflexive must contain ΔX and so corresponds
to a subset of Y. Since
|Y| = (n² - n)/2
and
|ΔX ∪ Y| = |ΔX| + |Y| = n + (n² - n)/2 = (n² + n)/2,
the result follows, again using Theorem (2.2.15). ∎
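For very small n the three counts in the proposition can be confirmed by listing all subsets of X × X. The following brute-force check is only a sketch, assuming Python and taking X = {0, 1, ..., n - 1}; the function name count_relations is hypothetical. It is practical only for n up to about 4, since there are 2^(n²) relations in all.

```python
from itertools import combinations, product


def count_relations(n):
    """Count reflexive, symmetric, and reflexive-and-symmetric relations on an n-set."""
    X = range(n)
    pairs = list(product(X, repeat=2))      # all n * n ordered pairs
    reflexive = symmetric = both = 0
    for size in range(len(pairs) + 1):
        for subset in combinations(pairs, size):
            R = set(subset)
            refl = all((x, x) in R for x in X)
            symm = all((y, x) in R for (x, y) in R)
            reflexive += refl
            symmetric += symm
            both += refl and symm
    return reflexive, symmetric, both


for n in (1, 2, 3):
    print(n, count_relations(n))
```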
The proposition above also tells us how to give examples of reflexive
and symmetric relations. We take any set X and choose suitable subsets of
X × X as in the proof. But let us give some ‘natural’ examples. The relation
x ≤ y on the set of real numbers is reflexive, transitive but not
symmetric. However, if on the same set we define xRy by x ≤ y² then R
has none of the three properties. If we define S to be the union of the last
two relations, we get a relation on ℝ which is reflexive but neither
symmetric nor transitive. Note that xSy iff x ≤ y or x ≤ y². Then 1S2 but
not 2S1. Also 50S(-10) and (-10)S7 but not 50S7. The relation x < y on ℝ is
transitive but neither reflexive nor symmetric. On any non-empty set X, the
empty relation is symmetric and transitive but not reflexive. The conditions for symmetry
and transitivity are satisfied vacuously in this case. If (X; d) is a metric
space and r is a positive real number then the relation R defined by xRy iff
d(x, y) ≤ r is reflexive and symmetric but not transitive in general.
As other examples, let X be a set of statements. If p, q ∈ X, define pRq
iff the implication statement p → q is true. Clearly R is reflexive and
transitive but not symmetric in general. Let X be the set of all straight
lines in a plane. If we define parallel lines as those not having any point
in common then parallelism is a symmetric relation which is neither
reflexive nor transitive. But if we define two lines as parallel whenever
they are both perpendicular to a line, then parallelism is reflexive,
symmetric as well as transitive. Relations which enjoy all these three
properties are very important and are given a special name.
2.4 Definition: A binary relation which is reflexive, symmetric and
transitive is called an equivalence relation. A few common notations for
equivalence relations are ≡, ~ or ≈.
As remarked just before the definition, parallelism (in the second sense)
is an equivalence relation on the set of all straight lines in a plane. But these
relations arise very frequently and so we give a few more examples.
1. Let X be a set of statements. For p, q ∈ X define pRq iff p ⇔ q
is true (i.e. iff both the implication statements p ⇒ q and q ⇒ p
are true). Then R is easily seen to be an equivalence relation. In
Chapter 1, Section 4 we called two statements p and q logically
equivalent iff the statement p ⇔ q is true. In other words, logical
equivalence is indeed an equivalence relation. More generally, this
is the case with many other types of equivalence. Whenever two
things are considered equivalent in some context, it turns out to
be an equivalence relation.
2. Suppose we want to count the number of distinct ways to seat n
distinct guests on n (indistinguishable) chairs placed evenly around
a circular table. In problems like this the crucial question is
to decide which arrangements are to be regarded as not distinct.
In the present problem, let S be the set of all possible ways to seat
the n guests on the n chairs. Then |S| = n!. But in this problem
what matters is the relative position of the guests. So we regard
two arrangements as equivalent to each other if one can be obtained
from the other by asking each guest to move the same number
of places to the right, because doing so does not affect the relative
position of the guests. We leave it to the reader to verify that this
is indeed an equivalence relation on the set S.
3. Let X = Z, the set of all integers. Let n be a fixed positive integer.
Define a relation R on X by aRb iff the integer a - b is divisible by
n, i.e. iff there exists an integer p such that a - b = np. Then aRa
for all a ∈ X, because a - a = 0 = n·0. If aRb then a - b = np for
some integer p. But then b - a = n·(-p) and since -p is also an
integer, it follows that bRa. For transitivity, suppose a - b = np
and b - c = nq, where p, q are integers. Then
a - c = (a - b) + (b - c) = n(p + q)
and since p + q is an integer we get aRc. Putting it all together,
we see that R is an equivalence relation. There is a special name
and notation for this relation. If aRb, we write a ≡ b (modulo n)
or a ≡ b (mod n) or sometimes simply a ≡_n b and read ‘a is
congruent to b modulo n’. For n = 2, this is also called the parity
relation. Instead of saying a ≡ b (mod 2), we also say a and b have
the same parity. Note that this happens iff a, b are both even or
both odd.
4. Let 𝒮 be a collection of sets. If X, Y are two elements of 𝒮, then
they are themselves sets. We say X and Y are equipollent if there
exists a bijection from X to Y. Then equipollency is an equivalence
relation: reflexivity follows from the fact that the identity function
on every set is a bijection, symmetry from the fact that the inverse
function of a bijection is a bijection and transitivity from the fact
that the composite of two bijections is a bijection. Note that two
sets have the same cardinality if and only if they are equipollent.
5. Let (X; d_1) and (Y; d_2) be two metric spaces. We say (X; d_1) is
isometric (or congruent) to (Y; d_2) if there exists a bijection f : X → Y
which preserves distance in the sense that for all x, y ∈ X,
d_1(x, y) = d_2(f(x), f(y)).
Such a bijection is called an isometry or congruence. By an
argument similar to that in the last example, it follows that
congruency is an equivalence relation. In school geometry, congruency
is studied for triangles only but as our definition shows, it is a
concept applicable for any two geometric figures, or, indeed, two
metric spaces. Obviously congruent objects would have the same
geometric properties such as areas and volumes.
6. In example (4) we dealt with sets and defined an equivalence
relation for them in the form of a bijection. In example (5), on the
other hand, we dealt with sets with some additional structure
(namely a distance function). To define the equivalence of two
such sets, we required the bijection to preserve this additional
structure. This is a typical situation. Two mathematical structures
of the same type are said to be equivalent if there is a bijection
between their underlying sets which preserves (or is compatible
with) their respective additional structures. The name for such
equivalence can change. In geometric context it is called
congruence, for algebraic structures (which we shall study later) it is
called an isomorphism, and so on. In all cases it is an equivalence
relation. We leave it to the reader to define the equivalence of
multi-sets.
7. As one more example let us consider the set, say X, of all quaternary
sequences of length n. If x̄ = x_1...x_n and ȳ = y_1...y_n ∈ X,
define x̄ ~ ȳ iff the 2’s and the 3’s occurring in them are the same.
For example, 012311220330 ~ 112301220331. It is clear that ~ is
an equivalence relation. This relation will be used later.
8. As the final example, let X, Y be any sets and f : X → Y any function.
Define a relation R on X by xRy iff f(x) = f(y). Then R is an
equivalence relation. By varying the sets X and Y and the function
f, we can get many examples of equivalence relations this way.
For example, the relations of being of equal age, having the same
hometown etc. on the set of human beings.
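Examples (3) and (8) lend themselves to a quick experiment. The sketch below assumes Python, and the names congruent and classes_by are ours, introduced only for illustration. It tests congruence modulo n and then groups the elements of a finite set according to the value of a function f, which is exactly the relation xRy iff f(x) = f(y).

```python
def congruent(a, b, n):
    """a is congruent to b modulo n iff n divides a - b."""
    return (a - b) % n == 0


def classes_by(f, X):
    """Group X into the classes of the relation xRy iff f(x) == f(y)."""
    groups = {}
    for x in X:
        groups.setdefault(f(x), set()).add(x)
    return list(groups.values())


print(congruent(17, 5, 12))                      # 17 - 5 = 12 is divisible by 12
print(classes_by(lambda a: a % 3, range(-3, 6)))  # grouping by remainder modulo 3
```

Taking f(a) to be the remainder of a on division by n recovers congruence modulo n as a special case of Example (8).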
Having given examples of equivalence relations, let us now see what
they do to the set on which they are defined. We begin with an important
definition.
2.5 Definition: Let R be an equivalence relation on a set X. Let x ∈ X.
Then the set {y ∈ X : xRy} is called the equivalence class of R determined
by x or the R-equivalence class determined by x and is denoted by R[x] or
simply by [x] when the relation R is understood.
Of course, a similar concept could have been defined for any relation
(not just an equivalence relation) on X. But, for equivalence relations it
has certain important properties which we now prove.
2.6 Proposition: Let R be an equivalence relation on a set X. Then
for every x ∈ X, x ∈ [x]. Also for any x, y ∈ X, either [x] = [y] or
[x] ∩ [y] = ∅.
Proof: The first assertion follows from reflexivity of R. For the second
assertion, suppose x, y ∈ X and [x] ∩ [y] ≠ ∅. Then we claim that [x] = [y].
Since [x] ∩ [y] ≠ ∅, there exists z ∈ X such that z ∈ [x] and z ∈ [y]. By
definition, this means that xRz and yRz. By symmetry of R, yRz implies
zRy. By transitivity of R, xRz and zRy give xRy. By symmetry of R, yRx.
Now suppose w ∈ [x]. Then xRw. This, coupled with yRx gives yRw by
transitivity. This means w ∈ [y]. Hence [x] ⊂ [y]. Similarly [y] ⊂ [x]. So
[x] = [y] as was to be shown.
Because of this proposition, it follows that every element of X belongs
to precisely one equivalence class of R. Let now 𝒟 be the collection of all
R-equivalence classes. Then every member of 𝒟 is non-empty; every two
members are mutually disjoint and the set X is the union of members of 𝒟.
From Definition (2.3.6), 𝒟 is a decomposition of X. Conversely given
any decomposition 𝒟 we define a relation R on X by letting xRy iff x and
y belong to the same member of 𝒟. Then R is easily seen to be an
equivalence relation on X and the equivalence classes of R are precisely the
members of 𝒟. We have thus proved the following theorem.
2.7 Theorem: There is a one-to-one correspondence between the set of
equivalence relations on a set X and the set of all decompositions of X.
This correspondence is obtained by assigning to each equivalence relation,
the collection of its equivalence classes. ∎
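Both directions of this correspondence can be traced on a finite set. The following is a small sketch, assuming Python; the function names are hypothetical and the relation is supplied as a two-argument test that is assumed to be an equivalence relation. The first function collects the equivalence classes, and the second rebuilds the relation, as a set of ordered pairs, from a decomposition.

```python
def equivalence_classes(X, related):
    """Split X into classes, assuming related(x, y) is an equivalence relation."""
    classes = []
    for x in X:
        for cls in classes:
            # By Proposition 2.6 it is enough to compare x with one member of the class
            if related(x, next(iter(cls))):
                cls.add(x)
                break
        else:
            classes.append({x})
    return classes


def relation_from_partition(blocks):
    """xRy iff x and y belong to the same block of the decomposition."""
    return {(x, y) for block in blocks for x in block for y in block}


X = range(8)
mod4 = lambda a, b: (a - b) % 4 == 0          # congruence modulo 4
classes = equivalence_classes(X, mod4)
R = relation_from_partition(classes)
print(classes)
```

Starting from congruence modulo 4 on {0, 1, ..., 7}, the rebuilt set R contains exactly the pairs (a, b) with a - b divisible by 4.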
As an illustration of the decomposition induced by an equivalence
relation, let X be the set of all students in a college. If we define two
students to be related iff they have the same height then we get an equivalence
relation (see Example (8) above) on X. In the corresponding decomposition
of X, all students of the same height will be grouped together. Students of
different heights will be in different groups. In other words, we have
classified the students according to their heights. Similarly the equivalence
relation of parallelism classifies the lines in a plane according to their
directions (we do not distinguish between opposite directions). Generally,
whenever we classify the elements of a set according to some criterion it
amounts to defining a suitable equivalence relation on it. In Example (7)
above, the equivalence relation classifies the quaternary sequences
according to the pattern of 2’s and 3’s appearing in them.
The concept of an equivalence relation and the classification it induces
is useful in various ways. First, it serves to clarify ideas when we wish to
regard certain elements of a set as indistinguishable from each other. For
example, the problem considered in Example (2) above really amounts to
counting the number of equivalence classes under the equivalence
relation defined there, because the relative position of the guests in two
arrangements is the same iff they are equivalent. Obviously each class has n
elements and so the answer is n!/n = (n − 1)!.
As another application, let us find the number of k-ary sequences
of length n in which 1 occurs an even number of times. In Proposition
(2.3.3) we answered a similar question for binary sequences and obtained
2^{n−1} as the answer, that is, half the total number of binary sequences of length
n for n ≥ 1. (For n = 0, there is only the null or the empty sequence and
by convention we regard that 1 occurs in it an even number of times. This
exceptional case will also figure in the proof below.)
2.8 Theorem: The number of k-ary sequences of length n in which 1
occurs an even number of times is [k^n + (k−2)^n]/2.
Proof: We have already proved the result for k = 2. Interestingly it also
holds for k = 1. For k = 1, the sequence will consist only of 1's and then
depending upon whether n is even or odd, this sequence will be included or
not included.
Assume now k > 2. The exceptional case n = 0 is covered by the fact
that the null sequence is to be regarded as having an even number of 1's.
So assume now n > 0. Now let X be the set of all k-ary sequences of length
n. We divide X into two subsets first. Let Y consist of those sequences
which do not contain any 0 or 1 and let Z consist of the remaining
sequences. Then a sequence in Y is a sequence of the remaining k−2 symbols.
So |Y| = (k−2)^n and hence |Z| = k^n − (k−2)^n. Now every sequence in
Y is to be included in our count because it contains 1 an even number of
times (namely 0 times). This is also like the exceptional case. Let us now
count how many sequences in Z have an even number of 1's. For this, we
classify them according to the pattern of the symbols 2, 3, ..., k−1 (cf.
Example (7) above where the case k = 4 was considered). This gives an
equivalence relation on Z. Let the corresponding equivalence classes be
Z_1, Z_2, ..., Z_p. Then Σ_{i=1}^{p} |Z_i| = |Z|. We do not know p nor what
each |Z_i| is. But we can do without finding it out. Let us find how many
sequences in a typical equivalence class, say Z_i, have an even number
of 1's. By definition, Z_i consists of all k-ary sequences which have a
common pattern of the symbols 2, 3, ..., k−1. Let r be the number
of places filled by these symbols. Then r < n, for if r = n then the sequence
would be in Y and not in Z. Thus all sequences in Z_i have these r terms
in common. The remaining n−r terms have to be either 0 or 1.
Hence an element of Z_i corresponds uniquely to a binary sequence of
length n−r. So |Z_i| = 2^{n−r}. (For example, let k = 4 and n = 10 and let Z_i
consist of all sequences of the form _ 2 3 _ _ 2 _ 3 3 _ where the blanks
can be filled with 0's and 1's. Then r = 5 and |Z_i| = 2^{10−5} = 2^5.) Now
clearly half of these sequences have an even number of 1's, by Proposition
(2.3.3). This holds for all i = 1, 2, ..., p. So the number of sequences in Z
having an even number of 1's is
Σ_{i=1}^{p} ½|Z_i| = ½ Σ_{i=1}^{p} |Z_i| = ½|Z| = ½[k^n − (k−2)^n].
Since all sequences in Y are also to be included, we get the total number
of sequences with an even number of 1's as
½[k^n − (k−2)^n] + (k−2)^n = [k^n + (k−2)^n]/2. ∎
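The formula of Theorem 2.8 is easy to test against a direct enumeration for small k and n. The sketch below is only a numerical sanity check, not a proof. It assumes that the k symbols are 0, 1, ..., k−1 (so it is restricted to k ≥ 2, where the symbol 1 exists), and the function names are ours.

```python
from itertools import product


def count_even_ones(k, n):
    """Count k-ary sequences of length n (symbols 0..k-1) with an even number of 1's."""
    return sum(1 for seq in product(range(k), repeat=n) if seq.count(1) % 2 == 0)


def formula(k, n):
    """The closed form [k^n + (k-2)^n]/2 from Theorem 2.8."""
    return (k**n + (k - 2)**n) // 2


# Compare the two counts for a few small cases, including n = 0.
for k in range(2, 6):
    for n in range(0, 7):
        assert count_even_ones(k, n) == formula(k, n)
print("formula agrees with enumeration for 2 <= k <= 5, 0 <= n <= 6")
```

Note that for k = 2 and n = 0 the check relies on the convention 0^0 = 1, which Python also follows. This matches the convention that the null sequence has an even number of 1's.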
The equivalence classes of Example (3) above deserve to be considered
in detail. There, the relation was that aRb iff a − b is divisible by n. It
follows that for any a ∈ Z, the equivalence class [a] consists of all integers
of the form a + kn where k is any integer. For example, for n = 12, the
equivalence class [17] is the set {..., −31, −19, −7, 5, 17, 29, 41, ...}.
Note that the difference between every two distinct elements of an
equivalence class is at least n. It follows that no equivalence class can contain
more than one integer from 0 to n − 1. On the other hand, by the euclidean
algorithm (see problem (21.10)), given any integer a we can find an integer
r from 0 to n − 1 such that a ≡ r (mod n), i.e. [a] = [r]. Thus we see that
every equivalence class contains precisely one integer from 0 to n − 1. So
in all there are n equivalence classes. They are called the congruence classes
modulo n or residue classes modulo n or simply the residues modulo n, the
last two names coming from the fact that the integer r above is called the
residue left upon division by n. The set of residue classes modulo n is often
denoted by Z_n. We shall have many occasions to consider it later.
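The following short Python sketch illustrates the residue classes just described. It is an illustration under our own naming (residue and sample_class are hypothetical helpers). It uses the fact that Python's % operator returns a value from 0 to n − 1 whenever n is positive, even for negative a.

```python
def residue(a, n):
    """The unique r with 0 <= r <= n-1 and a ≡ r (mod n), for n > 0."""
    return a % n


def sample_class(a, n, span=3):
    """A finite window of the class [a]: the integers r + k*n for -span <= k <= span."""
    r = residue(a, n)
    return [r + k * n for k in range(-span, span + 1)]


print(sample_class(17, 12))   # a window of the class [17] modulo 12
print(residue(-7, 12))        # the residue of -7 modulo 12

# Every integer in a test range falls in exactly one of the n classes.
print(len({residue(a, 12) for a in range(-50, 50)}))
```

With the default window, sample_class(17, 12) lists exactly the seven elements of [17] displayed above.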
The concept of an equivalence class is sometimes put to a theoretical
use, namely to give precise definitions of certain terms. Take once again
the example where X is the set of all students in a college and R is the
relation defined by xRy iff x and y have the same height. This relation
classifies the students according to their heights. Now suppose we are
given this classification beforehand without knowing that it has been
obtained on the basis of height. If an outsider takes a close look at these
classes, he would notice that all students in the same class have the
same height and no two students in different groups have the same height.
In other words, the height of a student is characterised by the equivalence
class to which he belongs.
Now we come to the crucial point. Can we define the height of a
student as the equivalence class to which he belongs? According to the
'definition trick' we often do this sort of a thing in mathematics. It does
sound extremely artificial to tell the height of a student not as so many
centimeters but as a certain equivalence class! But actually it is not
so far removed from practice. When we say 'John is 180 centimeters tall',
we have in mind some object of height 180 cms (which may be a ruler, a
tree, or, most likely, some other person of this height) and what we really
convey is that John is in the same equivalence class as this object. Indeed,
this is how a layman often tells height. So, after all, it is not all that arbitrary
to define height as an equivalence class under a suitable equivalence
relation. As another example, we may define the direction of a straight line in a
plane as its equivalence class under the relation of parallelism. We shall
not give many definitions of this type because we shall not be very
particular to give precise definitions where they tend to be clumsy. We have
discussed this point only to advise the reader not to get perplexed when
he sees such definitions in reading other books on mathematics. For
example, he will often find the cardinal number of a set X defined formally
as the class of all sets which are equipollent to X.
The correspondence between an equivalence relation and a decomposition
of a set, given by Theorem 2.7, allows us to visualise an equivalence
relation. Also any concept about decompositions can be translated to a
corresponding concept about equivalence relations and vice versa. We
discuss one such concept.
2.9 Definition: Let 𝒟 and ℰ be two decompositions of a set X. Then
𝒟 is said to be coarser than ℰ (or ℰ is said to be finer than 𝒟, or a
refinement of 𝒟) if every member of ℰ is contained in some member of 𝒟.
Let us denote the decompositions 𝒟 and ℰ respectively by
𝒟 = {D_1, D_2, ..., D_m}
and
ℰ = {E_1, E_2, ..., E_n}.
If ℰ is a refinement of 𝒟 then for every E_i there exists some D_j such that
E_i ⊂ D_j. This D_j is unique since the D's are mutually disjoint. It is clear
that each D_j is the union of those E_i's which are contained in it. In other
words, ℰ may be thought of as obtained from 𝒟 by further decomposing
each member of 𝒟. A simple example is to let X be some country, the
D_j's be its states and the E_i's the districts in the various states. We
illustrate this for a hypothetical country in Figure 3.9, where the state
boundaries are shown by thick lines and the district boundaries by dotted lines.
Figure 3.9: Coarser and Finer Relations.
(We may ignore points on these boundaries. We may make some arbitrary
convention by which such points are considered to lie in one of the
regions on whose boundaries they lie.)
Now let R and S be the equivalence relations corresponding to the
decompositions 𝒟 and ℰ respectively. What relationship between R and S
corresponds to the fact that 𝒟 is coarser than ℰ? The name is provided in
the following definition.
2.10 Definition: Let R and S be two binary relations on a set X. Then S
is said to be stronger than R (or R is said to be weaker than S) if for all
x, y ∈ X, xSy implies xRy. (In the terminology of Chapter 1, Section 4,
this says that the statement 'xSy' is stronger than the statement 'xRy'.
Hence the name. As subsets of X × X, the condition simply means that
S ⊂ R.)
We now state the relationship between the concepts defined by the last
two definitions. The proof is extremely simple and is left as an exercise,
for it will give the reader a chance to review many of the concepts defined
earlier.
2.11 Proposition: Let R and S be two equivalence relations on a set X
and let 𝒟 and ℰ be, respectively, the corresponding decompositions of X.
Then S is stronger than R if and only if ℰ is finer than 𝒟. ∎
Because of this proposition, a stronger relation is sometimes called a
finer relation. If we go back to the example of the states and the districts
above then the truth of the proposition above is obvious, because given
two persons, it is certainly stronger to say that they are from the same
district than to say that they are from the same state.
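Proposition 2.11 can be observed on a small finite example. In the Python sketch below, which is only an illustration, the 'states' and 'districts' are simulated by congruence modulo 2 and modulo 4 on {0, ..., 11}. This choice, like the function names, is our own assumption and not part of the text.

```python
def classes_of(X, related):
    """The decomposition of X into classes of the equivalence relation `related`."""
    return {frozenset(y for y in X if related(x, y)) for x in X}


def is_stronger(X, S, R):
    """S is stronger than R if xSy implies xRy for all x, y in X."""
    return all(R(x, y) for x in X for y in X if S(x, y))


def is_finer(E, D):
    """E is finer than D if every member of E is contained in some member of D."""
    return all(any(e <= d for d in D) for e in E)


X = range(12)
R = lambda x, y: x % 2 == y % 2   # plays the role of 'same state'
S = lambda x, y: x % 4 == y % 4   # plays the role of 'same district'
D, E = classes_of(X, R), classes_of(X, S)

print(is_stronger(X, S, R), is_finer(E, D))   # both conditions hold together
print(is_stronger(X, R, S), is_finer(D, E))   # and both fail together
```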
An especially instructive reformulation of this proposition arises when
the relation R is induced by some function on X in the manner of Example
8 above. The result is often expressed in a different terminology. So first
we introduce this terminology.
2.12 Definition: Let S be an equivalence relation on a set X and let ℰ be
the corresponding decomposition of the set X. Then ℰ is often denoted by
X/S and is called the quotient set of X by the relation S. The function
p: X → X/S defined by p(x) = [x], the equivalence class of S containing x, is
called the quotient function or the projection function.
2.13 Definition: Let S be an equivalence relation on a set X and suppose
Y is some other set. Then a function f: X → Y is said to be compatible with
S or to respect S if for all x, y ∈ X, xSy implies f(x) = f(y).
For example, let X be the set of all persons in a country and let S be
the relation, xSy iff x and y live in the same district. (The equivalence
classes induced by S are precisely the sets of persons living in the various
districts.) If we define f(x) to be the state the person x lives in then f respects
the relation S because two persons living in the same district obviously live
in the same state. But if we define f(x) = height of x then f does not respect
S (except in the unlikely event that all the persons in each district are of
equal height). As another example, let X be the set of all straight lines in
the plane which are not parallel to some fixed line, say, the x-axis. Let S
be the equivalence relation of parallelism, discussed above. For a line L,
let f(L) be the acute angle between L and the x-axis. Then f is compatible
with parallelism. But if we let f(L) be the point at which L meets the x-axis
then f is not compatible with parallelism.
Let us now consider the reformulation of Proposition 2.11.
2.14 Proposition: Let S be an equivalence relation on a set X and
p: X → X/S the corresponding quotient function. Also let f: X → Y be any
function where Y is some set and let R be the equivalence relation on X
induced by f, i.e. xRy iff f(x) = f(y).
Then the following conditions are equivalent:
(i) S is stronger than R
(ii) the function f respects the relation S
(iii) there exists a function g: X/S → Y such that g∘p = f (this is also
expressed by saying that f factors through p). Moreover, such g is
unique.
Proof: We prove the equivalence in a cyclic manner. (i) ⇒ (ii). Assume
S is stronger than R. To show that f respects S, let x, y ∈ X with xSy.
Then xRy because S is stronger than R. But then f(x) = f(y) by the
definition of R. (ii) ⇒ (iii). Assume that f respects S. Let us try to define a
function g from X/S to Y. A typical element of X/S is of the form [x],
where x ∈ X and [x] is the unique equivalence class of S containing x. It is
tempting to define g([x]) simply as f(x), because f(x) is an element of Y.
But there is a catch here. It is quite possible that the S-equivalence class [x]
is the same as the S-equivalence class [y] for some y ≠ x. Now,
g([y]) = f(y). But since [x] = [y], it is necessary to have g([x]) = g([y]),
or equivalently, f(x) = f(y). How do we ensure it? A question like this is
quite important and has to be scrupulously answered every time it arises.
When we denote an equivalence class by [x], and define g in terms of the
value of f at x, we are effectively choosing x as a representative of the
equivalence class. If the value of g would change just because we choose
a different representative for the same equivalence class then the function
g will not be well-defined because it will be, really speaking, a function
of the particular representative and not of the equivalence class as such.
The situation is analogous to a spokesman of a political party giving
the party's reaction to some event. Different parties may have different
reactions. But if two spokesmen of the same party issue two different
reactions, they can hardly be said to be giving the party's reaction; they
are only giving their personal reactions. Coming back to our original
problem, if x and y are in the same equivalence class under S, then by (ii),
f(x) = f(y). So g is well-defined in the sense that its value on a particular
class depends only on that class and not on a particular representative. It
remains to verify that the composite g∘p equals f. But this is immediate
from the very definition of g, because for every x ∈ X, p(x) = [x] and g([x])
= f(x). For uniqueness, suppose h: X/S → Y is such that h∘p = f. Then
h([x]) = h(p(x)) = f(x) = g([x]) for all [x] ∈ X/S, showing that g = h.
(iii) ⇒ (i). Assume that f factors through p, i.e. there exists a function
g: X/S → Y such that g∘p = f. We want to show that S is stronger than R.
So let x, y ∈ X and suppose xSy. Then by the very definition of p,
p(x) = p(y)
since both equal the same equivalence class under S. But then
g(p(x)) = g(p(y)),
giving
f(x) = f(y).
Recalling the definition of R, this gives xRy as was to be proved. ∎
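The well-definedness issue in the step (ii) ⇒ (iii) can be made concrete for a finite set. The Python sketch below is only an illustration under our own assumptions. It represents p and g as dictionaries, uses frozensets for equivalence classes, and returns None when f fails to respect S, so that no g exists. The names quotient_map and factor_through are hypothetical.

```python
def quotient_map(X, S):
    """The projection p as a dict x -> [x], each class being a frozenset."""
    return {x: frozenset(y for y in X if S(x, y)) for x in X}


def factor_through(X, S, f):
    """Try to build g on X/S with g(p(x)) = f(x); return None if g is not well defined."""
    p = quotient_map(X, S)
    g = {}
    for x in X:
        cls = p[x]
        if cls in g and g[cls] != f(x):
            # Two representatives of the same class give different values of f.
            return None
        g[cls] = f(x)
    return p, g


X = range(12)
same_mod_4 = lambda x, y: x % 4 == y % 4

f = lambda x: x % 2            # respects congruence modulo 4
p, g = factor_through(X, same_mod_4, f)
print(all(g[p[x]] == f(x) for x in X))   # the triangle commutes

h = lambda x: x % 3            # does not respect it: 0 and 4 are congruent mod 4
print(factor_through(X, same_mod_4, h))  # no factorisation exists
```

The value of g on a class is read off from whichever representative is met first. The check inside the loop is exactly the question 'does the value depend on the representative?' raised in the proof.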
This proposition is really not profound. But the reader may find it a
bit too abstract. To make it more visual, we paraphrase the statement
(iii) in it diagrammatically in Figure 3.10. In (a), we show a triangle
whose vertices are the sets X, X/S and Y. Two of the 'sides' of this
triangle correspond to the functions p and f.
Figure 3.10: Factoring through a Projection Function. (Panel (a): the sets X, X/S and Y with the arrows p: X → X/S and f: X → Y. Panel (b): the same triangle completed by the arrow g: X/S → Y.)
To complete the triangle we need a function g from X/S to Y as in (b).
But merely having some function g would not do. We want the function g
in such a way that in (b), when we go from X to Y we must get the same
result whether we go directly (by f) or through X/S (i.e. by the
composite function g∘p). A triangle with this property is said to be a
commutative triangle and this property is expressed by putting a circular
arrow inside it. A commutative square of functions is defined analogously.
The reader will find it very helpful to visualise equalities involving
composites of functions in terms of appropriate commutative diagrams.
To illustrate Proposition 2.14 with a concrete example, let X be the set
of all students in a residential school and let Y be the set of all hostels in
which these students are accommodated. Define f: X → Y by f(x) = the
hostel in which x is accommodated, for x ∈ X. Define a relation S on X by
xSy iff x and y are studying in the same year. Then S is an equivalence
relation and each equivalence class consists of all students in a particular
year. The quotient set X/S consists of all the batches of students. Now to
say that the function f respects the relation S amounts to saying that all
students studying in the same year are accommodated in the same hostel.
When this is the case, the concept of 'the hostel of a particular batch' is
well-defined and corresponds to the function g.
We conclude this section with a discussion of the concept of the equivalence
relation generated by a given relation R on a set X. Let S be the smallest
equivalence relation on X which contains R. The relation R need not be
reflexive, symmetric or transitive. But S, being an equivalence relation, has
to have all these three properties. So S has to be obtained from R by
adding some pairs of points. The extension from R to S can be done very
systematically in three steps. They can be conveniently visualised in terms
of the graphic representation of binary relations. First we let
R_1 = R ∪ Δ_X;
i.e. R_1 is obtained by adding to R all pairs of the form (x, x) for x ∈ X, if
necessary. Clearly R_1 is a reflexive relation. However R_1 need not be
symmetric. So we let
R_2 = R_1 ∪ R_1^{-1}
where R_1^{-1} is called the inverse relation of R_1, and is defined by
R_1^{-1} = {(y, x) ∈ X × X : (x, y) ∈ R_1}.
The name is suggestive because, in the graphic representation, R_1^{-1} is
obtained from R_1 by reversing the arrows. Clearly R_2 is symmetric. It is also
reflexive because it contains Δ_X. As the last step, we now extend R_2 to an
equivalence relation. If R_2 is transitive, then it is itself the desired
extension. If R_2 is not transitive then there exist x, y, z such that
(x, y) ∈ R_2, (y, z) ∈ R_2
but (x, z) ∉ R_2. So we must add (x, z) to R_2. Graphically, whenever there
are two arrows in succession, we put a direct arrow. But even if we do
this for all pairs of arrows in succession, the resulting relation may still
not be transitive. There might be four elements x, y, z, w such that (x, y),
(y, z) and (z, w) are in R_2. Then because of our construction, (x, z) will be
added. But in order to have transitivity we would also have to add (x, w).
More generally, whenever x_1, x_2, ..., x_n are elements of X such that
(x_1, x_2), (x_2, x_3), ..., (x_{n−1}, x_n) ∈ R_2
then we would have to include (x_1, x_n) in the extension. The following
proposition shows that if we do this for all possible finite sequences, then
we do get an equivalence relation.
2.15 Proposition: Let R_2 be a reflexive and symmetric relation on a set
X. Let S = R_2 ∪ {(x_1, x_n) : there exist x_2, ..., x_{n−1} ∈ X such that
(x_i, x_{i+1}) ∈ R_2 for i = 1, 2, ..., n − 1}. Then S is an equivalence relation on X.
Moreover, it is the smallest equivalence relation on X containing R_2 in the
sense that if T is an equivalence relation on X containing R_2 then S ⊂ T.
Proof: Since R_2 is reflexive and S contains R_2, S is reflexive. For
symmetry suppose (x, y) ∈ S. If (x, y) ∈ R_2 then (y, x) ∈ R_2 by symmetry of
R_2 and hence (y, x) ∈ S. If (x, y) ∉ R_2 then (x, y) is of the form (x_1, x_n)
where n is some positive integer and there exist x_2, x_3, ..., x_{n−1} ∈ X such
that for each i = 1, 2, ..., n − 1,
(x_i, x_{i+1}) ∈ R_2.
But then, by symmetry of R_2, (x_{i+1}, x_i) also belongs to R_2. Considering the
sequence x_n, x_{n−1}, ..., x_2, x_1 it now follows that (x_n, x_1) ∈ S, i.e. (y, x) ∈ S.
Thus we see that S is symmetric.
It only remains to prove that S is transitive. For this let (x, y) and
(y, z) ∈ S. Then there exist integers m and n (≥ 2) and
x_1, ..., x_m, y_1, ..., y_n ∈ X
such that
x_1 = x, x_m = y = y_1, y_n = z, (x_i, x_{i+1}) ∈ R_2
for all i = 1, 2, ..., m − 1 and (y_j, y_{j+1}) ∈ R_2 for all j = 1, 2, ..., n − 1.
Now let z_i = x_i for all i = 1, ..., m and z_i = y_{i−m} for i = m + 1, ..., m + n.
Then (z_i, z_{i+1}) ∈ R_2 for i = 1, 2, ..., m + n − 1. Also z_1 = x and
z_{m+n} = y_n = z. So (x, z) ∈ S.
(Graphically we have concatenated the chain of arrows from x to y with
the one from y to z.) Thus S is transitive and hence an equivalence
relation. If T is any other equivalence relation on X containing R_2 then T
must also contain, by repeated applications of transitivity, all pairs of the
form (x_1, x_n) where there exist x_2, ..., x_{n−1} ∈ X such that (x_i, x_{i+1}) ∈ R_2
for i = 1, 2, ..., n − 1. But this means T must contain S. Thus S is the
smallest equivalence relation on X containing R_2. ∎
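For a finite set the three steps from R to S can be carried out by a short program. The Python sketch below is only an illustration, and the function name and sample relations are our own. Note one difference from Proposition 2.15: instead of adding the end-points of all finite chains at once, the code repeatedly adds (x, w) whenever (x, y) and (y, w) are already present, until nothing new appears. On a finite set this yields the same relation.

```python
def generated_equivalence(X, R):
    """The equivalence relation on the finite set X generated by the pairs in R."""
    X = set(X)
    R1 = set(R) | {(x, x) for x in X}        # step 1: add the diagonal
    R2 = R1 | {(y, x) for (x, y) in R1}      # step 2: add the inverse relation
    S = set(R2)
    changed = True
    while changed:                           # step 3: close under transitivity
        new_pairs = {(x, w) for (x, y) in S for (z, w) in S if y == z} - S
        changed = bool(new_pairs)
        S |= new_pairs
    return S


# Hypothetical example: arrows 1 -> 2 -> 3 and 4 -> 5 on X = {1, ..., 5}.
S = generated_equivalence({1, 2, 3, 4, 5}, {(1, 2), (2, 3), (4, 5)})
print((3, 1) in S, (1, 4) in S)

# A finite analogue of Exercise 2.8: y = x + 4 on {0, ..., 11}.
S2 = generated_equivalence(range(12), {(x, x + 4) for x in range(8)})
print(S2 == {(x, y) for x in range(12) for y in range(12) if (x - y) % 4 == 0})
```

In the first example the 'islands' are {1, 2, 3} and {4, 5}. In the second, the generated relation is congruence modulo 4 restricted to {0, ..., 11}.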
2.16 Definition: The equivalence relation S obtained from R_2 (and hence
ultimately from R) is said to be generated by R.
In Figure 3.11 we show graphically the three steps of the transition
from R to S. For convenience, instead of drawing separate curves for two
opposite arrows, they are represented by putting two arrows in opposite
directions on the same curve. (For loops it is unnecessary to do so as
observed earlier.)
Fig. 3.11: Generating an Equivalence Relation. (Panels, from left to right: R, R_1, R_2, S.)
The process described above is reminiscent of the spread
of epidemics. As soon as something gets contaminated, so does
everything that is in touch with it and this process continues till we get
well-isolated 'islands' with no more room left for spreading. Generating from
an arbitrary subset, a subset of a particular type is a general process and
we shall see many instances of it. For the moment we remark that the
equivalence relation in Example (2) above is actually generated by a much
smaller relation. Let us call one arrangement of guests as adjacent to
another if the second arrangement is obtained from the first by asking
each guest to move to the seat immediately on his right. The adjacency
relation so defined on the set S is neither reflexive, nor symmetric nor
transitive. But it generates the equivalence relation in (2).
Exercises
2.1 For the relationships (i) to (ix) for the set of persons in a town,
see which relations are reflexive, symmetric and transitive.
2.2 Prove that the set S = {(1, 39), (2, 105), (3, 87)} is the same as the set
{(x, y) ∈ N × N : 1 ≤ x ≤ 3, y = −42x^2 + 192x − 111}.
(In this formulation the relation S appears to be less arbitrary,
because it is defined 'according to some formula'. But this is really
an illusion for two reasons. For one, the formula itself is very
arbitrary. Secondly, given any arbitrary numbers a_1, a_2, ..., a_r we can
always find a polynomial p(x) such that p(1) = a_1, p(2) = a_2, ..., p(r) = a_r
and thereby give the impression that the numbers a_1, a_2, ..., a_r follow
some regular pattern. The method for finding this polynomial is
due to Lagrange.)
2.3 Let R be a binary relation on a set X and let R^{-1} be its inverse
relation, i.e.
R^{-1} = {(y, x) ∈ X × X : (x, y) ∈ R}.
What is the interpretation of R^{-1} in terms of the cartesian
representation of R? Prove that the relations R ∪ R^{-1} and R ∩ R^{-1}
are symmetric for any R. Prove that R is symmetric if and only if
R = R^{-1}. Show also that (R^{-1})^{-1} = R.
2.4 Let X be a set of cardinality n. How many relations on X are
neither reflexive nor symmetric? Also express the number of
equivalence relations on X as a sum of Stirling numbers.
2.5 Let R be a binary relation on a set X and let Y be a subset of X.
Then Y × Y is a subset of X × X and R ∩ (Y × Y) is a subset of
Y × Y and hence a binary relation on the set Y. It is called the
restriction of R to Y and is denoted by R/Y. Prove that if R is
reflexive, symmetric or transitive then R/Y also has the corresponding
properties. (The relation R/Y is also said to be induced on Y by
R. This is a general construction. An additional structure on a set
induces a structure of a similar type on a subset of the original
set. Sometimes, the subset has to satisfy certain restrictions. Some
of the properties of the original structure hold good for such
‘substructures’ also. Such properties are called hereditary. Thus
the properties of reflexivity, symmetry, and transitivity are here-
ditary. But not all concepts pass so nicely to substructures as the
following exercise shows.)
2.6 Show by an example that if R is a relation on a set X, S is the
equivalence relation on X generated by R and Y is a subset of X
then S/Y need not be the same as the equivalence relation on Y
generated by R/Y.
2.7 What are the coarsest and the finest decompositions of a given set
X? What are the corresponding equivalence relations?
2.8 Let X = Z, the set of all integers, n some fixed positive integer and
let
R = {(x, y) ∈ X × X : y = x + n}.
Prove that the equivalence relation on X generated by R is
precisely the relation of congruence modulo n.
2.9 Let R be a transitive relation on a set X. By induction, prove the
following statement for all n ≥ 2:
'For any x_1, x_2, ..., x_n ∈ X, if
(x_i, x_{i+1}) ∈ R for all i = 1, 2, ..., n − 1
then
(x_1, x_n) ∈ R.'
(This result was actually used in the proof of Proposition 2.15.)
2.10 Let R_1 and R_2 be two equivalence relations on a set X. Prove that
R_1 ∩ R_2 is also an equivalence relation on X. Generalise to the
intersection of more than two equivalence relations. Show by an
example, however, that the union of two equivalence relations need
not be an equivalence relation.
2.11 Let R be a relation on a set X. Prove that the equivalence relation
S on X generated by R coincides with the intersection of all
equivalence relations on X containing R. (This gives an ‘external'
view of S. However, as a method of constructing S, it is not very
useful because it is impracticable to look at all possible equiva-
lence relations containing R. The construction we have given pro-
ceeds ‘internally' from R to S.)
2.12 If R_1 and R_2 are equivalence relations on a set X, how is the
decomposition corresponding to R_1 ∩ R_2 related to the
decompositions corresponding to R_1 and R_2?
2.13 Two decompositions 𝒟 and ℰ of a set X are said to be mutually
orthogonal if for all D ∈ 𝒟 and E ∈ ℰ, D ∩ E contains at most one
point. (For example, if X is a rectangular array of dots then the
decomposition of X into its rows is orthogonal to the
decomposition of X into its columns.) How does this concept translate for
the corresponding equivalence relations?
2.14 Let R be a symmetric and transitive relation on a set X. Suppose
for every x ∈ X there is some y ∈ X such that xRy. Prove that
R is also reflexive and hence an equivalence relation on X. (In
other words every symmetric and transitive relation is an
equivalence relation, provided it 'touches' all points.)
2.15 Let R be a symmetric and transitive relation on a set X. Prove
that there exists a subset Y of X such that R ⊂ Y × Y and R,
regarded as a relation on Y, is an equivalence relation.
2.16 Let f: X → Y be a function and T a relation on the set Y. Define
a relation R on X by
xRy iff (f(x), f(y)) ∈ T, for x, y ∈ X.
Prove that if T is reflexive, symmetric or transitive then R also has
the corresponding properties. How does this generalise Example 8
of equivalence relations?
2.17 (a) Prove that for positive integers n, k with k ≥ 2,
Σ_{i=0}^{⌊n/2⌋} (n choose 2i) (k−1)^{n−2i} = [k^n + (k−2)^n]/2.
(b) Prove that the number of k-ary sequences of length n in which
both 0 and 1 occur an even number of times each is
¼[k^n + 2(k−2)^n + (k−4)^n] for k ≥ 2.
2.18 Find the number of k-ary sequences of length n in which 1 appears an
even number of times and 0 appears an odd number of times.
2.19 Suppose S and T are equivalence relations on sets X and Y
respectively and f: X → Y is a function which is compatible with
them in the sense that for all x, y ∈ X, xSy implies
f(x) T f(y).
Let
p: X → X/S and q: Y → Y/T
be the projection functions. Prove that there exists a unique
function g: X/S → Y/T such that g∘p = q∘f. Express this result in
terms of commutative diagrams. Prove that the converse also holds.
How does this generalise Proposition (2.14)?
2.20 Let R and S be equivalence relations on the sets X and Y
respectively. Let Z = X × Y. Define a relation T on Z by letting
(x_1, y_1) T (x_2, y_2)
iff x_1 R x_2 and y_1 S y_2. Prove that T is an equivalence relation on Z.
What do the equivalence classes look like?
2.21 In Exercise 2.16 above, suppose f is a bijection and T is an
equivalence relation. Then R is an equivalence relation on X. How are
the equivalence classes of R related to those of T?
Notes and Guide to Literature
The material in this section is elementary and standard. Some authors
consider, more generally, a relation from one set X to another set Y as a
subset of X × Y. But any such relation is also a subset of Z × Z, where
Z = X ∪ Y. Therefore, there is really no loss in confining ourselves to the
case of a relation on a set.
3. Order Relations
An equivalence relation on a set serves to classify its elements into
mutually disjoint subsets according to some property. In the
present section we consider a relation which ranks them according to some
criterion for comparison. Such relations arise frequently in practice. Words
are listed in a dictionary in the alphabetical order, the winners of a
competition are ranked in the order of merit, the guests at a dinner are
seated according to the order of their importance, the letters in a file are
arranged in a chronological order and so on. In all these examples, the
objects are put in a sequential manner from the first to the last. But this
may not always be possible. For example, in a competition, several factors
may matter and a prize may have to be divided between two contestants
not because they are equal but because one is superior to the other in
some respects and inferior in some other respects. Similarly, if some of the
letters in the file are undated then we may not know where to place them.
From their contents we may be able to tell, in some cases, that they came
before or after certain other letters. But such a comparison may not
always be possible, with the result that certain pairs of letters will have to
be left as incomparable.
We now formally define an order relation on a set X.
3.1 Definition: A binary relation R on a set X is called a partial order
(or partial ordering) if,
(i) R is reflexive,
(ii) R is transitive and
(iii) R is anti-symmetric, i.e. for all x, y ∈ X, xRy and yRx implies
x = y.
Standard notations for partial order relations are ≤, ≼ (which are
generally read as 'is less than or equal to' or 'precedes') or ≥, ≽ (which
are read as 'is greater than or equal to' or 'follows'). Note that the inverse
relation of an order relation (see Exercise 2.3) is also an order relation
and is called the reverse order. If a partial order is denoted by ≤, ≼ etc.
then its reverse order is denoted almost exclusively by ≥, ≽ etc.
We already cited some examples of partial orders. Before giving other
examples, it is worthwhile to comment on the definition, especially because
different authors tend to adopt different conventions. Certainly,
transitivity is the very essence of ordering and any definition of it ought to include
it. Whether reflexivity should be included or not is largely a matter of
taste, depending upon whether in an inequality one wants to include a
possible equality or not. For those who do not, there is the following
definition of what is known as strict order.
3.2 Definition: A binary relation S on a set X is called a strict partial
order, if,
(i) S is irreflexive, i.e. for every x ∈ X, (x, x) ∉ S,
(ii) S is transitive and
(iii) S is asymmetric, i.e. for every x, y ∈ X, (x, y) ∈ S implies (y, x) ∉ S.
There is an obvious relationship between the two concepts which is
stated in the next proposition. The proof is left to the reader.
3.3 Proposition: If $R$ is a partial order on a set $X$ then $R - \Delta_X$ is a
strict partial order on $X$. If $S$ is a strict partial order on a set $X$ then
$S \cup \Delta_X$ is a partial order. (Here $\Delta_X$ is the diagonal on $X$.)
Thus we can convert partial orders to strict partial orders and vice
versa. If partial orders are denoted by $\le$, $\preceq$, $\subseteq$, etc. then the
corresponding strict orders are denoted by $<$, $\prec$, $\subset$ etc. If $x < y$, we say $x$ is
strictly (or properly) less than $y$. Sometimes partial orders are themselves
denoted by symbols like $<$, $\prec$ and $\subset$ and then the corresponding strict
orders are denoted by $\lneq$, $\precneqq$ and $\subsetneq$.
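The passage between the two kinds of order can be checked mechanically on a small finite set. The following is only a sketch: relations are modelled as Python sets of ordered pairs, and the function names and the sample set $\{1, 2, 3, 6\}$ under divisibility are illustrative choices, not taken from the text.

```python
from itertools import product

def is_transitive(rel):
    # (x, y) and (y, w) in rel must force (x, w) in rel
    return all((x, w) in rel for (x, y) in rel for (z, w) in rel if y == z)

def is_partial_order(rel, X):
    reflexive = all((x, x) in rel for x in X)
    antisymmetric = all(x == y for (x, y) in rel if (y, x) in rel)
    return reflexive and antisymmetric and is_transitive(rel)

def is_strict_partial_order(rel, X):
    irreflexive = all((x, x) not in rel for x in X)
    asymmetric = all((y, x) not in rel for (x, y) in rel)
    return irreflexive and asymmetric and is_transitive(rel)

# Hypothetical sample: divisibility on a small set of positive integers
X = {1, 2, 3, 6}
divides = {(a, b) for a, b in product(X, repeat=2) if b % a == 0}
diagonal = {(x, x) for x in X}

strict = divides - diagonal      # partial order minus the diagonal
restored = strict | diagonal     # strict order plus the diagonal
```

Here `strict` passes the strict partial order test and `restored` coincides with `divides`, as the proposition predicts.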
The presence of antisymmetry in the definition of a partial order (or
its counterpart, asymmetry, in the definition of a strict partial order) calls
for a comment. Some authors do not require it. Let us see what would
happen without it. Let X be the set of all students in a class and for
$x \in X$ let $h(x)$ be the height of $x$. Define $\le$ on $X$ by $x \le y$ iff $h(x) \le h(y)$.
Clearly $\le$ is reflexive and transitive. But it is not anti-symmetric if there
exist two distinct students, say $a$ and $b$, of equal heights. This does not
sound very strange. But look at the corresponding strict order, which is
defined by $x < y$ iff $x \le y$ and $x \ne y$. Then $a < b$ and $b < a$. This does
sound strange because we are apt to interpret $a < b$ as ‘$a$ is shorter than $b$’
(although this is not the correct interpretation).
Thus we see that if we do not have antisymmetry then the correspond-
ing strict order would not conform to its natural interpretation. We would
always have to be on the guard while handling it. On the other hand, if
we include antisymmetry in the definition of a partial order, then we
would have to sacrifice so many naturally occurring examples. One such
example is the set of students compared according to their heights. Another
would be the file of letters, to be arranged chronologically, with two letters
carrying the same dates.
In other words, although antisymmetry is desirable, its inclusion has a
price. Fortunately, the difficulty is not a serious one. In many cases it can
be supposed not to arise at all. For example, in the case of the heights of
the students, if we measure them very accurately then it is practically unlikely
that two distinct students would have exactly the same height. There will
always be some difference from the point of view of probability and even
an iota of difference would ensure antisymmetry. However, this solution is
not a sound one from our point of view because it is based on the assump-
tion that the height is a continuous variable and such an assumption is
contrary to the very spirit of discrete mathematics. So we have to accept the
possibility that two distinct students may have the same height. Even then
the situation is not so hopeless. After all, we are comparing the heights and
not the students per se. Let us group together students of the same height
(which amounts to defining an equivalence relation on the set of the
students). Now the concept of the height of a group is well-defined and if we
compare the groups in terms of it then antisymmetry does hold, because
no two distinct groups can have the same height. More generally, we have
the following result.
3.4 Proposition: Let $X$ be a set and $\le$ a binary relation on $X$ which is
reflexive and transitive. Define a binary relation $R$ on $X$ by $xRy$ iff $x \le y$ and
$y \le x$. Then $R$ is an equivalence relation on $X$. Let $X/R$ be the quotient set
and $p : X \to X/R$ the projection function. Then there is a partial order $\preceq$
on $X/R$ such that for all $x, y \in X$, $x \le y$ implies $p(x) \preceq p(y)$.
Proof: The verification that $R$ is an equivalence relation on $X$ is simple
and left as an exercise. Let now $X/R$ consist of the equivalence classes
under $R$. (Note that if $\le$ is itself antisymmetric to begin with then each
equivalence class contains only one element and we may as well identify
$X/R$ with $X$.) Elements of $X/R$ are of the form $[x]$ for $x \in X$, where $[\ ]$
denotes the $R$-equivalence class. If $[x]$, $[y]$ are two elements of $X/R$, we
define $[x] \preceq [y]$ iff $x \le y$ in $X$. To ensure that this gives a well-defined
binary relation on $X/R$, we must verify that it is independent of the choice of
representatives of the equivalence classes (see the comment in the proof of
Proposition 2.14). In other words, suppose $[x] = [z]$ and $[y] = [w]$. Then
we must show that $[x] \preceq [y]$ iff $[z] \preceq [w]$. So suppose $[x] \preceq [y]$, i.e. $x \le y$.
Since $[x] = [z]$ we have $z \le x$ (and also $x \le z$) and similarly $[y] = [w]$
gives $y \le w$ (and $w \le y$). Hence by transitivity of $\le$, we get $z \le w$, i.e.
$[z] \preceq [w]$. Similarly $[z] \preceq [w]$ implies that $[x] \preceq [y]$. Thus $\preceq$ is a well-defined
binary relation on $X/R$. By its very definition, $p(x) \preceq p(y)$ whenever $x \le y$
in $X$, because $p(x)$, $p(y)$ are simply $[x]$ and $[y]$ respectively. It only remains
to verify that $\preceq$ is a partial order on $X/R$. Reflexivity and transitivity of $\preceq$
follow from the corresponding properties of $\le$. For antisymmetry, suppose
$[x] \preceq [y]$ and $[y] \preceq [x]$. Then $x \le y$ and $y \le x$ which means $xRy$ and
hence $[x] = [y]$ (even though $x$ may not be equal to $y$). ∎
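The construction in the proof can be tried out on the example of students compared by height. The sketch below is illustrative only: the student names, the heights and the function names are all hypothetical, and equivalence classes are represented as frozensets.

```python
def quotient_classes(X, leq):
    """Group elements x, y with leq(x, y) and leq(y, x) into one class."""
    classes = []
    for x in X:
        for c in classes:
            rep = c[0]                      # any representative will do
            if leq(x, rep) and leq(rep, x):
                c.append(x)
                break
        else:
            classes.append([x])
    return [frozenset(c) for c in classes]

def class_leq(c1, c2, leq):
    """Induced order on classes, computed through arbitrary representatives."""
    return leq(next(iter(c1)), next(iter(c2)))

# Hypothetical data: a and b have the same height
heights = {"a": 160, "b": 160, "c": 172, "d": 155}

def leq(x, y):
    return heights[x] <= heights[y]

classes = quotient_classes(sorted(heights), leq)
```

The relation `leq` is reflexive and transitive but not antisymmetric on the students, since `a` and `b` are distinct and each precedes the other. On `classes` the induced relation `class_leq` is antisymmetric, because `a` and `b` now lie in the same class.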
The significance of this proposition is that whenever we have a reflexive
and transitive relation on a set, we can always assume antisymmetry by
passing to a suitable quotient set, if necessary, and in this process we regard
as equivalent such elements which would have been equal, had there been
antisymmetry. A construction like this appears frequently in mathematics.
Whenever we want two things to be equal but they are not, we define a
suitable equivalence relation $R$ under which they are equivalent and pass to
the quotient set. Of course, this equivalence relation must not be unnecessarily
large, or else we would be treating too many things as equal. This
procedure is expressed by saying that the two things under consideration
are equal modulo $R$. The proposition above says that every reflexive and
transitive relation is also antisymmetric modulo an equivalence relation. So
there is no significant loss of generality in retaining antisymmetry in the
definition of a partial order. From now onwards, we accept Definition
(3.1).
Finally, we answer one more question about the definition, namely,
why the word ‘partial’? ‘Partial’ as used here is the opposite of ‘total’. So
we must define a total ordering first, as we indeed do.
3.5 Definition: A total or linear or simple order on a set $X$ is a relation
$\le$ on $X$ which is reflexive, transitive, antisymmetric and which has the
following property, known as the law of dichotomy: for every $x, y \in X$
either $x \le y$ or $y \le x$. (Verbally, every two elements are comparable to
each other.)
Put differently, the law of dichotomy says that the relation $\le$, along
with its inverse relation $\ge$, covers the totality of all ordered pairs $(x, y)$
with $x, y \in X$. Hence the name ‘total’. The justification for the term
‘linear’ will be given a little later. Apparently, there is no simple justification
for the term ‘simple’. Probably it is used just to indicate that certain
theorems about partial orders take a particularly simple form when specialised
to simple orders. ‘Dichotomy’ literally means branching into two. Here the
two possibilities ($x \le y$ and $y \le x$) are the ‘branches’.
We now give examples of partial orders, some of which will be total
orders. Note that if $\le$ is a partial order on a set $X$ and $Y \subset X$ then $\le/Y$,
i.e., the restriction of $\le$ to $Y$ (see Exercise (2.5)), is also a partial order
and this way we can get more examples. Note further that if $\le$ is linear
so is $\le/Y$. It may happen, however, that $\le/Y$ is total but $\le$ is not. A
partially ordered set, abbreviated as p.o. set or a poset, is defined as a pair
$(X, \le)$ where $X$ is a set and $\le$ is a partial order on $X$. This is yet another
example of a mathematical structure. A linearly ordered subset of a poset
is called a chain, a tower or a nest.
Examples:
1. A foremost example is that of the usual ordering for real numbers.
This is a total order.
2. On the set of nonnegative integers, define $a \mid b$ to mean that ‘$a$
divides $b$’, i.e., ‘there exists an integer $n$ such that $b = na$’. It is
easily seen that $\mid$ is a partial order. It is not total; for example,
neither $10 \mid 15$ nor $15 \mid 10$. We could have defined a similar relation
on the set of all integers but it would not have been antisymmetric.
3. Let $X$ be any set and $P(X)$ its power set. Then the inclusion
defines a partial order on $P(X)$. This is also not a total order
because we can have subsets $A$, $B$ of $X$ such that neither $A \subset B$
nor $B \subset A$.
4. Let $D$ be the set of all partitions of a set $X$. If $\mathcal{P}, \mathcal{Q} \in D$, let $\mathcal{P} \le \mathcal{Q}$
mean $\mathcal{Q}$ is a refinement of $\mathcal{P}$ (see Definition (2.9)). Then it is easily
seen that $(D, \le)$ is a poset. Again, it is not totally ordered.
5. Let $(X_1, \le_1)$ and $(X_2, \le_2)$ be posets. Let $X = X_1 \times X_2$. We define a
partial order $\le$ on $X$ by
$(x_1, x_2) \le (y_1, y_2)$
iff either $x_1 <_1 y_1$ or
($x_1 = y_1$ and $x_2 \le_2 y_2$).
In other words, we first compare the first co-ordinates and if they
are equal then we compare the second co-ordinates. More generally,
this can be done for the cartesian product of any finite
number of posets, the order of these posets being crucial. (In
other words, although there is an obvious one-to-one correspondence
between $X_1 \times X_2$ and $X_2 \times X_1$, it is not compatible with
the orderings we get.) When we list words in a dictionary, this
is how they are arranged, first by their first letters and in case of
words having the same first letters, by their second letters and so on.
(In an actual dictionary, not all words are of the same length. But this
can be tackled by putting blanks at the end of the shorter words
and declaring that the blank is to be regarded as the first symbol
of the alphabet.) For this reason, the ordering defined here is
called lexicographic or dictionary ordering. Note that if $\le_1$, $\le_2$
are linear, so is $\le$.
6. Let $S$ be a set of statements. Define $p \le q$ to mean $p$ implies $q$.
Then $\le$ is a reflexive and transitive relation. It is not antisymmetric
in general. However, by the construction in Proposition (3.4),
we get a partial order on the set of equivalence classes of statements,
where two statements are regarded as equivalent if they are logically
equivalent.
7. If $(X_1, \le_1)$ and $(X_2, \le_2)$ are posets we can also define a partial
order on $X_1 \times X_2$ by
$(x_1, x_2) \le (y_1, y_2)$
iff $x_1 \le_1 y_1$ and $x_2 \le_2 y_2$. But unlike the lexicographic order, even
when $\le_1$ and $\le_2$ are total, $\le$ need not be so. If $x_1 <_1 y_1$ and
$y_2 <_2 x_2$, then the pairs $(x_1, x_2)$ and $(y_1, y_2)$ are incomparable.
(This was the essence of one of the examples
at the beginning, where two contestants in a competition are
incomparable.)
8. Let $X = \mathbb{R} \cup \{*\}$ where $*$ is some point not on the real line. Define
$\le$ on $X$ as
$\{(x, y) \in \mathbb{R} \times \mathbb{R} : x \le y \text{ in the usual order}\} \cup \{(*, *)\}$.
In other words, we retain the usual order on $\mathbb{R}$ and declare that the
point $*$ is not comparable with anything except itself. This seems a
very artificial way to define a relation, but it satisfies all the conditions
of a partial order and is useful as a counterexample.
9. Let $\mathbb{R}^* = \mathbb{R} \cup \{\infty, -\infty\}$ where $\infty$ and $-\infty$ are just two symbols
not representing any real number. Define $\le$ on $\mathbb{R}^*$ as
$\{(x, y) \in \mathbb{R} \times \mathbb{R} : x \le y \text{ in the usual order}\} \cup (\{-\infty\} \times \mathbb{R}^*) \cup (\mathbb{R}^* \times \{\infty\})$.
In other words, we once again retain the usual order on $\mathbb{R}$ but
extend it to the points $\infty$ and $-\infty$ by declaring that $-\infty$ is less
than everything else and $\infty$ is greater than everything else. $\mathbb{R}^*$ with
this ordering is called the extended real line. Note that it is a
chain.
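The contrast between Examples 5 and 7 is easy to see on pairs of integers. In the sketch below the function names and the sample pairs are illustrative assumptions; both factors carry the usual order on the integers.

```python
def lex_leq(p, q):
    """Lexicographic order of Example 5 on pairs."""
    return p[0] < q[0] or (p[0] == q[0] and p[1] <= q[1])

def product_leq(p, q):
    """Coordinatewise order of Example 7 on pairs."""
    return p[0] <= q[0] and p[1] <= q[1]

p, q = (1, 5), (2, 3)

lex_comparable = lex_leq(p, q) or lex_leq(q, p)
product_comparable = product_leq(p, q) or product_leq(q, p)

# Python's built-in comparison of tuples is itself lexicographic
pairs = [(a, b) for a in range(3) for b in range(3)]
agrees_with_builtin = all(lex_leq(u, v) == (u <= v) for u in pairs for v in pairs)
```

The pairs `p` and `q` are comparable lexicographically, but not under the coordinatewise order, although both factor orders are total.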
The study of partial orders can be taken in three directions. One is the
usual one, common to the study of all mathematical structures, namely, to
prove theorems about them in the abstract and then apply these theorems
to specific examples. But because of the frequent occurrence of order relations
in real life, two other problems are very important. One is to put on
a given set X a suitable ordering subject to certain constraints. The cri-
teria of suitability and the nature of the constraints would, of course,
change from problem to problem. Worded differently, the problem is to
list the elements of the given set X in a suitable manner. Many times these
elements are pieces of information or ‘data' as they are called technically.
Problems of this type are therefore studied under what is called ‘data
structures'. Although we shall not study them, we shall do one listing
problem here. Another problem commonly encountered in practice is to
transform, with a suitable permutation, elements listed in one linear order
to the same elements listed in some other order. This is known as the
sorting problem. We shall discuss it briefly later in this book. (See the
Epilogue).
Let us take the first line first. We give below a few definitions. Note
that if $\le$ is a partial order on a set then so is the reverse order, $\ge$. Whenever
a concept is defined for $\le$, the same concept for $\ge$, after translating
in terms of $\le$, gives what is known as the dual concept. For example, the
concept of a lower bound is dual to that of an upper bound because a
lower bound of a set is also an upper bound of the same set under the
reverse order relation. This saves the duplication of work because it is
unnecessary to give separate definitions of the dual concepts. Similarly
once a result is proved about such concepts, the corresponding ‘dual’ result
follows simply by duality.
3.6 Definition: Let $(X, \le)$ be a partially ordered set, $A \subset X$ and $x \in X$.
Then we say $x$ is an upper bound of $A$ if for all $a \in A$, $a \le x$. $x$ is called
a maximum (or largest or greatest) element of $A$ if $x \in A$ and $x$ is an
upper bound of $A$. $x$ is called a maximal element of $A$ if $x \in A$ and for
every $a \in A$, $x \le a$ implies $x = a$. The dual concepts are a lower bound,
a minimum (or smallest or least) element and a minimal element respectively.
$x$ is said to be a least upper bound (or l.u.b. or supremum) of $A$ if $x$
is a least element of the set of all upper bounds of $A$. The dual concept
is a greatest lower bound (or g.l.b. or infimum). $A$ is said to be bounded
above if it has at least one upper bound. The dual concept is ‘bounded
below’. Finally, a set which is both bounded above and below is called
bounded. If $X$ itself is bounded, the partial order $\le$ is said to be bounded.
The reader must have seen most of these concepts defined (and illustra-
ted) for the case of the usual order on the real line. For linear orders,
they behave much the same way as for the real line. But, for partial orders
which do not satisfy the law of dichotomy, one has to be cautious. Note,
for example, the distinction between a maximum and a maximal element.
If x is a maximum element of A then clearly it is also a maximal element
of A. The converse holds if A is a chain but not in general. The catch is
that the maximality of an element simply means that there is no element
which properly beats it, i.e., it beats every element with which it is compar-
able. It does not necessarily mean that it beats every other element.
Beating presupposes comparison. So if there is no comparability, auto-
matically there is no beating. Thus, an element can be a maximal element
without being a maximum element, because there may not be very many
elements with which it is comparable. For the same reason, note also that
while a set can have at most one maximum element, it may have more
than one maximal element. (If $x$ and $y$ are two maximum elements of a
set $A$ then $x \le y$ and $y \le x$, giving $x = y$ by antisymmetry. But if $x$ and $y$
are merely maximal elements of $A$, then this argument breaks down
because $x$ and $y$ may not be comparable.) As an example, in Example (2)
above, let $A = \{1, 2, \ldots, 10\}$. Then $A$ has 6, 7, 8, 9 and 10 as its maximal
elements. Even when a maximal element is unique it need not be a maximum
element. For example, in Example (8) above, the whole set $X$ has
$*$ as its only maximal element. Still it has no maximum element. Note
also that sometimes a set may have no maximal element; for example the
set of positive integers in Example (2) above. (Curiously, 0 is the maximum
element for the whole set there.)
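The distinction between maximal and maximum elements can be checked directly for the divisibility order of Example (2). The following sketch uses illustrative function names; the convention that 0 divides only 0 follows from the definition $b = na$.

```python
def divides(a, b):
    """a | b on the nonnegative integers: b = n*a for some integer n."""
    if a == 0:
        return b == 0
    return b % a == 0

def maximal_elements(A, leq):
    # x is maximal if no other element of A lies above it
    return [x for x in A if all(a == x or not leq(x, a) for a in A)]

def maximum_element(A, leq):
    # x is the maximum if every element of A lies below it; None if absent
    for x in A:
        if all(leq(a, x) for a in A):
            return x
    return None

A = list(range(1, 11))           # the set {1, 2, ..., 10}
A_with_zero = list(range(0, 11))

maximal_of_A = maximal_elements(A, divides)
maximum_of_A = maximum_element(A, divides)
maximum_with_zero = maximum_element(A_with_zero, divides)
```

For `A` the maximal elements are 6, 7, 8, 9 and 10 and there is no maximum, while adjoining 0 produces a maximum element, namely 0.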
Things are somewhat better for finite sets as we now show. The result,
although not profound, is important because in discrete mathematics we
often deal only with finite sets.
3.7 Theorem: Let $(X, \le)$ be a poset and $A$ a non-empty, finite subset
of $X$. Then $A$ has at least one maximal element. Also $A$ has a maximum
element iff it has a unique maximal element. (Similar assertions hold for
minimal elements.)
Proof: We start with any element, say $x_1$, of $A$. If $x_1$ is not a maximal
element of $A$ then, by definition, there exists an element $x_2 \in A$ such that
$x_1 < x_2$. If $x_2$ is not a maximal element of $A$, then again, there exists
$x_3 \in A$ such that $x_2 < x_3$. Note that $x_1 < x_3$ (by transitivity of the strict
order) and so $x_3 \ne x_1$. Again, if $x_3$ is not maximal, we get an element $x_4$ of
$A$, different from $x_1, x_2, x_3$, such that $x_3 < x_4$. Continuing in this manner we
get a sequence $x_1 < x_2 < x_3 < \cdots$. Since every time we are getting a new
element and the set is finite, this process must stop at some stage. But then
$A$ would have a maximal element.
For the second assertion, suppose $A$ has a maximum element, say $x$.
Then it is also a maximal element. Further there can be no other maximal
element in $A$, for if $y$ were such an element then $y \le x$ (since $x$ is a maximum
element), which would force $y = x$ by maximality of $y$. Thus if $A$
has a maximum element then $A$ has a unique maximal element. This part
does not require that $A$ is finite. However, for the converse implication,
we do need it. Suppose $A$ has a unique maximal element, say $x$. We claim
it is also a maximum element of $A$. If not, then there exists some $x_1 \in A$
such that $x_1$ is not comparable with $x$; for if $x_1$ were at all comparable
with $x$ then $x_1 \le x$ (by maximality of $x$) and if this holds for all $x_1 \in A$,
it would mean that $x$ is a maximum element. So $x_1$ is not comparable with
$x$. From now onwards the argument is similar to the one used in the
proof of the first assertion. If $x_1$ is not a maximal element of $A$ then
there exists $x_2 \in A$ such that $x_1 < x_2$. Note that $x_2$ cannot be comparable
with $x$ (otherwise, $x_2 \le x$ and this would give $x_1 < x$, contradicting that
$x_1$ was not comparable with $x$). Continuing we get a sequence
$x_1 < x_2 < x_3 < \cdots$
of elements of $A$ none of which is comparable with $x$. By finiteness of $A$,
this sequence must terminate and hence some $x_n$ would be a maximal
element of $A$, different from $x$, contradicting the assumption. This proves
the converse implication. The assertion about minimal elements follows
by duality. ∎
As an application of this theorem, we can do the Dance Problem. In
Chapter 2, Section 1 we paraphrased it by considering $B$ as the set of boys,
and $B_i$ as the set of those boys who danced with the $i$th girl, $g_i$, $i = 1, 2, \ldots, n$.
The problem reduces to showing that there exist $i, j$ such that neither
$B_i \subset B_j$ nor $B_j \subset B_i$. We interpret this in terms of a suitable partial order.
Let $P(B)$ be the power set of $B$, partially ordered by set inclusion $\subset$ (see
Example (3) above). Now let $A = \{B_1, B_2, \ldots, B_n\}$. Then $A$ is a finite
subset of $P(B)$. (We are assuming here that the number of girls is finite.
The assertion of the problem need not hold without this assumption.) We
have to show that at least two elements of $A$ are incomparable under the
relation $\subset$. If this is not the case then $A$ is a chain. By Theorem (3.7), $A$
has a maximal element, say, $B_r$ and since $A$ is a chain, $B_r$ is also the maximum
element, i.e., $B_i \subset B_r$ for all $i = 1, 2, \ldots, n$. But then
$$B_r = \bigcup_{i=1}^{n} B_i.$$
The conditions of the problem imply that
$$\bigcup_{i=1}^{n} B_i = B,$$
the set of all boys. But this means $B_r = B$, or in other words, that
the $r$th girl, $g_r$, danced with all boys, contradicting the data of the
problem. So $A$ cannot be a chain, and, as noted before, this completes the
proof.
Another application of finiteness is in simplifying the graphic representation
of an order relation $\le$ when the underlying set, say $X$, is finite.
Recall that in this representation, we picture elements of $X$ as points in a
plane and whenever $x \le y$ we draw an arrow from $x$ to $y$. Such diagrams
tend to be messy even for a relatively small set $X$. For example, if $|X| = 10$
and $\le$ is a total order on $X$ then its graphic representation would contain
55 arrows. (Fortunately, because of antisymmetry, there are no reverse
arrows.) It is possible to reduce the number of arrows and still retain the
information conveyed by the graphic representation of the partial order.
Let us see how this can be done.
First, all loops on elements of $X$ can be removed. Since $\le$ is always
reflexive, such loops can be understood even without being drawn. However,
this is not a very big saving. A really substantial saving comes because
of transitivity. Whenever $x, y, z \in X$ and there is an arrow from $x$ to $y$
and an arrow from $y$ to $z$, there is no need to draw a direct arrow from $x$
to $z$ because such an arrow can be inferred from transitivity of $\le$. Even the
arrow from $x$ to $y$ can be eliminated if we find some element, say $w$,
‘between’ $x$ and $y$, i.e., such that $x < w$ and $w < y$. Again we can look for
something in between $x$ and $w$. Unfortunately, if $X$ is an infinite set then
this process may go on forever. For example, if $X$ is the set of real numbers
and $\le$ the usual order on it, then between every two real numbers there
are infinitely many real numbers and so every arrow that we draw is superfluous.
But then, if we do not draw any arrows, we would not get the usual
ordering on $\mathbb{R}$. (The ordering that can be inferred from a diagram with no
arrows drawn is the trivial ordering in which $x \le y$ iff $x = y$. But this is
not the usual ordering on $\mathbb{R}$.)
However, for finite sets, things are better as we now show. (Even for
infinite sets, the argument given below applies if the ordering happens to
be what is called a well ordering. This is an important concept in mathematics
and although we shall not need it, it will be briefly touched upon
through the exercises.) First we need a definition.
3.8 Definition: Let $(X, \le)$ be a partially ordered set and let $x, y \in X$.
Then $y$ is said to cover $x$ if $x < y$ and there is no $z \in X$ such that $x < z$
and $z < y$.
In other words, $y$ covers $x$ iff $x < y$ and there is nothing between $x$
and $y$. As noted above, in the usual ordering on the real line no element
covers any element. But if we take the restriction of this order to the set
of integers, then an integer $n$ is covered by $n + 1$. As another example, in
Example (2) above note that $x$ is covered by $y$ if and only if $x \mid y$ and the
ratio $y/x$ is a prime number. This also shows that the same element may
be covered by more than one element.
In the next proposition, we characterise the concept of covering in terms
of the concepts defined earlier.
3.9 Proposition: Let $(X, \le)$ be a poset and $x \in X$. Let
$A = \{z \in X : x < z\}$.
Then an element $y \in X$ covers $x$ if and only if $y$ is a minimal element of $A$.
Proof: This is a straightforward consequence of the definitions and is left
as an exercise. ∎
Now, coming back to the problem of reducing the number of arrows
in the graphic representation of an order relation, we show that for finite
posets, it suffices to draw only those arrows which go from elements of the
set to those which cover them. Formally, the result can be stated as
follows:
3.10 Theorem: Let $X$ be a finite set and $\le$ a partial order on $X$. Define
a binary relation $R$ on $X$ by $xRy$ iff $y$ covers $x$ (w.r.t. $\le$). Then $\le$ is generated
by $R$, i.e., is the smallest order relation on $X$ containing $R$.
Proof: Clearly, as subsets of $X \times X$, $R$ is contained in $\le$. We have to
show that we can get $\le$ by suitably extending $R$. First we add $\Delta_X$ for the
sake of reflexivity. So let $R_1 = R \cup \Delta_X$. Then $R_1$ is reflexive. Now we
come to the critical part of the argument, namely, showing that the order
relation generated by $R_1$ is the original relation $\le$. For this we have to
show that given any $x, y \in X$ with $x \le y$, we can find some sequence of
elements $x_1, x_2, \ldots, x_n$ in $X$ such that $x = x_1$, $y = x_n$ and $(x_i, x_{i+1}) \in R_1$
for all $i = 1, 2, \ldots, n-1$ (cf. the proof of Proposition (2.15)). If $x = y$,
then $(x, y) \in \Delta_X$ and hence $(x, y) \in R_1$, so we take $x_1 = x$ and $x_2 = y$.
If $x < y$ then we proceed by induction on the number of elements in
between $x$ and $y$ (i.e. the number of elements $z$ such that $x < z$ and
$z < y$). Let $r$ be the number of such elements. If $r = 0$, then $y$ covers $x$
and so $(x, y) \in R$ by definition. So again $(x, y) \in R_1$. Suppose $r > 0$.
Let $A$ be the set $\{z \in X : x < z \text{ and } z < y\}$. Then, by definition, $|A| = r$.
Fix some $z \in A$ and let
$B = \{w \in X : x < w < z\}$
and
$C = \{w \in X : z < w < y\}$.
Then $B$ and $C$ are proper subsets of $A$ (since $z \notin B$ and also $z \notin C$) and
so $|B| < r$ and $|C| < r$. So by induction hypothesis, there exist sequences
$x_1, \ldots, x_n$ and $y_1, y_2, \ldots, y_m$ such that $x = x_1$, $z = x_n$, $z = y_1$ and $y = y_m$
and $(x_i, x_{i+1}) \in R_1$ for $i = 1, \ldots, n-1$ and $(y_j, y_{j+1}) \in R_1$ for $j = 1, 2, \ldots, m-1$.
Concatenating these two sequences (see again the proof of Proposition
(2.15)) we get a sequence of the desired type with first term $x$ and last
term $y$. This completes the induction and proves our assertion that if
$x < y$ in $X$ then the pair $(x, y)$ belongs to the smallest transitive relation
containing $R_1$. So it follows that $\le$ is generated by $R_1$. ∎
Intuitively we may think of the relation $R$ in this theorem as the
‘skeleton’ of the given relation $\le$. The graphic representation of $R$ is called
the Hasse diagram of $\le$. It is obtained by picturing elements of $X$ as points
and drawing an arrow from $x$ to $y$ whenever $y$ covers $x$. In Figure 3.12 we
show the Hasse diagrams of two posets. The first is the power set of a three
element set, say $\{a, b, c\}$, and the second is the set of integers from 1 to 10
partially ordered as in Example (2) above.
The Hasse diagram of a (finite) poset vividly describes most of the concepts
associated with the partial order. For example, two elements are
comparable iff there is a path from one of them to the other along the
arrows. An element is maximal iff there is no arrow issuing from it. Note
that if $X$ is linearly ordered then its Hasse diagram can be drawn by picturing
points of $X$ on a straight line and drawing arrows between consecutive
points. (This justifies the name ‘linear’.) Hasse diagrams are also useful in
constructing counterexamples. It is often easier to conceive a desired
counterexample through its Hasse diagram.
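The covering relation of a finite poset, and the fact that it regenerates the whole order, can be computed by brute force. The sketch below does this for the integers from 1 to 10 under divisibility; the function names are illustrative and the closure is computed naively, which is adequate only for small sets.

```python
def covering_pairs(X, leq):
    """Return the set of pairs (x, y) such that y covers x."""
    def lt(x, y):
        return x != y and leq(x, y)
    return {(x, y) for x in X for y in X
            if lt(x, y) and not any(lt(x, z) and lt(z, y) for z in X)}

def generated_order(X, R):
    """Smallest reflexive and transitive relation on X containing R."""
    rel = set(R) | {(x, x) for x in X}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(rel):
            for (c, d) in list(rel):
                if b == c and (a, d) not in rel:
                    rel.add((a, d))
                    changed = True
    return rel

def divides(a, b):
    return b % a == 0

X = range(1, 11)
hasse = covering_pairs(X, divides)              # arrows of the Hasse diagram
full_order = {(a, b) for a in X for b in X if divides(a, b)}
regenerated = generated_order(X, hasse)

# An element is maximal iff no arrow of the Hasse diagram issues from it
maximal = sorted(x for x in X if all(a != x for (a, _) in hasse))
```

Only the pairs whose ratio is prime survive in `hasse`, and `regenerated` coincides with `full_order`, in agreement with Theorem 3.10.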
Fig. 3.12: Hasse Diagrams of Posets
Although we defined the concept of a supremum in Definition (3.6), so
far we did not discuss it. This is not because the concept is only of peripheral
importance. On the contrary, it is one of the most pivotal concepts
in mathematics. Let us see how. Note first of all that if a set A has a
supremum then it is unique because it is the least element of some set,
namely the set of all upper bounds of A. Moreover, if A has a maximum
element then clearly this element is also its supremum. So the concept of
a supremum has an independent interest only in the absence of a maximum.
For linear orders, we saw in Theorem (3.7), that every finite subset has a
maximum. So, to get really interesting examples of suprema in linear orders
we must necessarily work with infinite sets. This leads to a limiting process
and therefore goes beyond the purview of discrete mathematics. For
example, let $A$ be the set $\{x \in \mathbb{R} : 0 < x < 1\}$. Then w.r.t. the usual order
on $\mathbb{R}$, $A$ has 1 as its supremum, but not as the maximum because $1 \notin A$.
$A$ contains points which are arbitrarily close to 1 but not the point 1 itself.
This is the very essence of a limiting process, coming infinitesimally close
without actually touching. Indeed, the very definition of a real number
involves the concept of a supremum (or some other, equivalent form of the
limiting process). Let $\mathbb{Q}$ be the set of all rational numbers and let
$$A = \{x \in \mathbb{Q} : x > 0 \text{ and } x^2 < 2\}.$$
Then $A$ is bounded above in $\mathbb{Q}$; for example, it is easy to show that 2 is
an upper bound of $A$. But $A$ has no supremum in $\mathbb{Q}$. As a subset of $\mathbb{R}$, $A$
does have a supremum, namely $\sqrt{2}$. But $\sqrt{2}$ is not a rational number.
Figuratively, absence of $\sqrt{2}$ amounts to a hole in the set of rational
numbers. Real numbers can be constructed by patching up these holes and
every real number can be obtained as the supremum of some subset of
rational numbers. Interesting as these topics are, they are beyond the scope
of this book and so we abandon this line here.
Thus, for total orders, a supremum either coincides with the maximum
or else leads to a limiting process. However, for partial orders which are not
total orders, the picture is quite different. In such a case, even a finite set
need not have a maximum. Still, it may have a supremum. Such suprema
(and infima) are very important and are given a special name, as in the
following definition. We assume that the set in question has only two
elements, because this is the most interesting case. (A set with only one
element trivially has its lone element for both its supremum and infimum.)
If every set with two elements has a supremum then it is easy to show, by
induction on the cardinality, that every finite set does (cf. the derivation
of Proposition (2.2.3) from Axiom (2.2.2)).
3.11 Definition: Let $(X, \le)$ be a poset and let $x, y \in X$. Then the
supremum and the infimum of the set $\{x, y\}$ (in case they exist) are called,
respectively, the join and the meet of $x$ and $y$ and are generally denoted
by $x \vee y$ (read ‘$x$ join $y$’ or ‘$x$ wedge $y$’) and $x \wedge y$ (read ‘$x$ meet $y$’). A
poset in which every pair of elements has a meet and a join is called a
lattice. The Hasse diagram of a lattice is called a lattice diagram.
Obviously if $x$ and $y$ are comparable under $\le$, then $x \vee y$ is the greater
of the two and $x \wedge y$ is the smaller of the two. Consequently, every totally
ordered set is a lattice. As other examples, the power set of any set, partially
ordered by inclusion, is a lattice. The intersection of two subsets is
their meet and their union is their join (cf. Exercise 2.1.4). Example 2
above of partial orders is also a lattice, with the meet of $x$ and $y$ being
their greatest common divisor and their join the least common multiple.
Example 6 is also a lattice, provided the set $S$ has the property that whenever
two statements, say, $p$ and $q$ are in $S$, their conjunction and disjunction
are also in $S$. Indeed it is easy to see that the conjunction of two
statements is their meet. Clearly the truth of the conjunction implies that
of each of the two statements and any statement whose truth implies that
of both $p$ and $q$ must also imply the truth of ‘$p$ and $q$’. Similarly the
disjunction of $p$ and $q$ is their join. For this reason the conjunction and the
disjunction of two statements, say $p$ and $q$, are often denoted by $p \wedge q$ and
$p \vee q$ respectively.
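That the meet and join under divisibility are the greatest common divisor and the least common multiple can be verified straight from Definition 3.11 on a small lattice. In the sketch below the choice of the divisors of 60 as the underlying set, and all function names, are illustrative assumptions.

```python
from math import gcd

def divides(a, b):
    return b % a == 0

def lcm(a, b):
    return a * b // gcd(a, b)

def meet(x, y, X, leq):
    """Greatest lower bound of {x, y} in X, or None if it does not exist."""
    lower = [z for z in X if leq(z, x) and leq(z, y)]
    best = [z for z in lower if all(leq(w, z) for w in lower)]
    return best[0] if best else None

def join(x, y, X, leq):
    """Least upper bound of {x, y} in X, or None if it does not exist."""
    upper = [z for z in X if leq(x, z) and leq(y, z)]
    best = [z for z in upper if all(leq(z, w) for w in upper)]
    return best[0] if best else None

# Hypothetical sample lattice: the positive divisors of 60
X = [d for d in range(1, 61) if 60 % d == 0]

is_lattice = all(meet(x, y, X, divides) == gcd(x, y) and
                 join(x, y, X, divides) == lcm(x, y)
                 for x in X for y in X)

# Removing elements can destroy the lattice property
no_join = join(2, 3, [2, 3], divides)
```

On the two-element poset `[2, 3]` the elements 2 and 3 have no common upper bound, so `no_join` is `None`.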
The poset in Example 8 is not a lattice. For any $x \in \mathbb{R}$, the pair $x$
and $*$ has no meet and no join. We can get many other examples by
removing certain elements from lattices. For example, the collection of all
non-empty subsets of a set $S$ is in general not a lattice, because two
subsets which are mutually disjoint would have no meet.
The theory of lattices is quite important and certain special types of
lattices will be studied as Boolean Algebras in the next chapter.
We now consider an example of the problem of listing. This amounts
to putting a linear order on a given set, which would generally be a finite
set for our purpose. Although the characteristics of a ‘good’ ordering
would depend on the particular context, one general, desirable feature
is that there should be an easy retrieval. This means that the index
(or rank) of an element should be given by an easy formula directly in
terms of the element (and without having to go down the list till that
particular element is encountered). Secondly the inverse of this function
should also be easily computable, i.e., given a positive integer $k$ (not
exceeding the cardinality of the set in question), we should be able to
tell the $k$th element of the list directly in terms of $k$ (and without having
to traverse the list physically). Actually, this is a very strong requirement
because the existence of such an indexing function for the given set, $X$,
establishes an order isomorphism between $X$ and the set $\{1, 2, \ldots, n\}$ where
$n = |X|$. The latter set is a very familiar set and consequently, it is easy to
answer any question about the ordering on the set $X$. For example given
$x \in X$, if we want to find its immediate successor (i.e. $y \in X$ such that $y$
covers $x$) we simply compute the index of $x$, add 1 to it and then find the
unique $y$ with this index. Similarly given $x$ and $y$ in $X$ we can tell which
way they are related by looking at their indices. In a good ordering, it is
desirable to have a mechanism to answer these questions directly, without
going through the computation of the index.
As an illustration, let X be the set of all permutations of a set with n
elements, say, the set $\{1, 2, \ldots, n\}$. Then $|X| = n!$. We put the usual
order on the set $\{1, 2, \ldots, n\}$. Now, every element of X can be thought
of as a 'word' of length n (with every 'letter' appearing exactly once). We
put the lexicographic ordering on X (see Example (5) above). For example, for
n = 3, the six permutations in X are ordered as
123 < 132 < 213 < 231 < 312 < 321.
In this order it is very easy to tell which of the two given permutations
comes first; we simply 'scan' both of them from left to right and when
we first come across a place in which their entries differ, the permutation
with the smaller entry in this place is the smaller. It is also easy to find
the successor of a given element, say, $a = a_1 a_2 \ldots a_n$ of X. We scan a
from right to left till the $a_i$'s keep on 'climbing' and first 'fall down',
i.e., we find the integer j such that $a_j < a_{j+1}$ and
$a_{j+1} > a_{j+2} > \ldots > a_n$.
(If such j does not exist then
$a_1 = n, \; a_2 = n-1, \; \ldots, \; a_{n-1} = 2, \; a_n = 1$
and this is the last permutation in X). Now let r be the unique integer
such that $a_r > a_j > a_{r+1}$ (set r = n if $a_n > a_j$). Leave
$a_1, \ldots, a_{j-1}$ as they are, replace $a_j$ by $a_r$ and arrange the
remaining elements in an ascending order. Thus the original permutation,
$a_1 a_2 \ldots a_{j-1} \, a_j \, a_{j+1} \ldots a_{r-1} \, a_r \, a_{r+1} \ldots a_n$
changes to,
$a_1 a_2 \ldots a_{j-1} \, a_r \, a_n a_{n-1} \ldots a_{r+1} \, a_j \, a_{r-1} \ldots a_{j+1}$.
It is easy to show that this is in fact the immediate successor of the original
permutation. For example, for n = 5, the successor of 23541 is 24135.
Here j = 2 and r = 4.
*For a formal definition, see Exercise 3.12.
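The successor rule just described can be sketched in a few lines. The function name `next_permutation` and the use of 0-based list positions are conventions adopted here, not part of the text. Swapping $a_j$ with $a_r$ and then reversing the descending tail is the same as arranging the remaining elements in ascending order.

```python
def next_permutation(a):
    """Return the lexicographic successor of the permutation a (a list),
    or None when a is the last permutation."""
    n = len(a)
    # Scan from the right while the entries keep climbing; stop at the
    # first position j (0-based) with a[j] < a[j+1].
    j = n - 2
    while j >= 0 and a[j] >= a[j + 1]:
        j -= 1
    if j < 0:
        return None  # a is n, n-1, ..., 2, 1
    # r is the last position whose entry exceeds a[j].
    r = n - 1
    while a[r] <= a[j]:
        r -= 1
    b = list(a)
    b[j], b[r] = b[r], b[j]
    # The tail after position j is still descending, so reversing it
    # arranges the remaining elements in ascending order.
    b[j + 1:] = reversed(b[j + 1:])
    return b
```

For instance, `next_permutation([2, 3, 5, 4, 1])` gives `[2, 4, 1, 3, 5]`, in agreement with the example above.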
It remains to find the index function for this ordering. We could do
this recursively, by a formula, which expresses the index of a permutation
of n symbols in terms of the index of some other permutation of n —1
symbols. Applying this formula again and again n —1 times, we get the
index of the original permutation. However, there is another method and
we give it here because it involves a concept, which is important elsewhere
also.
3.12 Definition: Let $x = x_1 x_2 \ldots x_n$ be a sequence of real numbers. By
an inversion in x, we mean a pair $(x_i, x_j)$ such that $i < j$ but
$x_i > x_j$. For each i, let $d_i$ = the number of inversions whose first
entry is $x_i$, i.e.
$d_i = |\{x_j : i < j, \; x_i > x_j\}|$.
Then the sequence $(d_1, d_2, \ldots, d_n)$ is called the inversion table or
inversion vector of x.
Intuitively, an inversion is a pair which is in the 'wrong' order. The
permutation $1\,2 \ldots n$ has no inversions and hence has $(0, 0, \ldots, 0)$
as its inversion table. At the other extreme, the permutation
$n\,(n-1) \ldots 2\,1$ has every pair as an inversion and its inversion table is
$(n-1, \; n-2, \; \ldots, \; 2, \; 1, \; 0)$.
The permutation 24351 has (1, 2, 1, 1, 0) as its inversion table. It is obvious
that for every $i = 1, 2, \ldots, n$,
$0 \le d_i \le n - i$.
What is not so obvious is that the inversion table of a permutation
completely determines it. For example, let us see how we may recover
the permutation 24351 from the inversion table (1, 2, 1,1,0). We are
given that $d_1 = 1$. This means that there is one integer less than the integer
in the first place, which follows it. So the first place must be filled by 2.
Then the second entry must be from 1, 3, 4, 5. Since $d_2 = 2$, we know that
it must be the third smallest number in this set. So the second entry is 4.
The next entry must be 1, 3 or 5 and the fact that $d_3 = 1$ fixes it as 3.
Continuing we get 24351 as the permutation.
The same procedure can be used to give the following theorem
due to M. Hall.
3.13 Theorem: Given integers $d_1, d_2, \ldots, d_n$ such that for all
$i = 1, 2, \ldots, n$,
$0 \le d_i \le n - i$,
there exists a unique permutation $x = x_1 x_2 \ldots x_n$ of
$\{1, 2, \ldots, n\}$ whose inversion table is $(d_1, d_2, \ldots, d_n)$.
Proof: Start with $d_1$. This is the number of integers from
$\{1, 2, \ldots, n\}$ which are smaller than $x_1$. So $x_1 = d_1 + 1$. Let
$S_{n-1} = \{1, 2, \ldots, n\} - \{x_1\}$.
Then $x_2 \in S_{n-1}$ and $d_2$ is the number of elements in $S_{n-1}$ which
are smaller than $x_2$. So $x_2$ is the $(d_2 + 1)$th smallest element in the
set $S_{n-1}$. Next let $S_{n-2} = S_{n-1} - \{x_2\}$. Then, once again, $x_3$
is the $(d_3 + 1)$th smallest element of $S_{n-2}$. Continuing in this manner
we determine $x_1, x_2, \ldots, x_{n-1}$ and $x_n$ in succession. (The
construction can be done conveniently by writing $1, 2, \ldots, n-1, n$ in a
row and scoring off the elements $x_1, x_2, \ldots$ as they are determined.
For example, in the example above the symbols not yet scored off are,
successively, (1, 2, 3, 4, 5), (1, 3, 4, 5), (1, 3, 5), (1, 5), (1) and ( ),
giving 24351 as the permutation.) Thus the permutation x is determined
uniquely. By the very construction of it, its inversion table is
$(d_1, d_2, \ldots, d_n)$. ∎
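The construction in the proof translates almost word for word into a short program. The sketch below uses names of our own choosing (`inversion_table`, `from_inversion_table`) and assumes the symbols are 1, 2, ..., n; the list `remaining` plays the role of the row from which elements are scored off.

```python
def inversion_table(x):
    """Return [d_1, ..., d_n], where d_i counts the later entries smaller than x_i."""
    n = len(x)
    return [sum(1 for j in range(i + 1, n) if x[i] > x[j]) for i in range(n)]

def from_inversion_table(d):
    """Rebuild the permutation of 1..n whose inversion table is d."""
    remaining = list(range(1, len(d) + 1))  # unused symbols, kept ascending
    x = []
    for d_i in d:
        # The next entry is the (d_i + 1)th smallest unused symbol;
        # popping it "scores it off" the row.
        x.append(remaining.pop(d_i))
    return x
```

Applied to the inversion table (1, 2, 1, 1, 0), the second function returns 2, 4, 3, 5, 1, as in the worked example.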
It is interesting to note that the number of all permutations of n
symbols can also be obtained from this theorem. Each $d_i$ can take
$(n - i + 1)$ values, independently of the others. So the number of possible
inversion tables is $n \times (n-1) \times \ldots \times 3 \times 2 \times 1$
or n!. This is also the number
of all permutations of n symbols.
In terms of inversion numbers we are now ready to give a formula for
the indexing function for the lexicographic order on permutations.
3.14 Theorem: In the lexicographic ordering for the set of all permutations
of n symbols, the permutation whose inversion table is
$(d_1, d_2, \ldots, d_n)$ has index (or rank)
$1 + d_1 (n-1)! + d_2 (n-2)! + \ldots + d_i (n-i)! + \ldots + d_{n-1} \cdot 1! + d_n \cdot 0!$.
(Actually, the last term need not be included because $d_n = 0$).
Proof: We prove this by induction on n. The case n = 1 is trivial because
then there is only one permutation and its inversion table is (0). Let us
denote the index function for the permutations of n symbols (arranged
lexicographically) by $\mathrm{Ind}_n$. Let $x = x_1 x_2 \ldots x_n$ be a
permutation of 1, 2, ..., n. Let $(d_1, d_2, \ldots, d_n)$ be the inversion
table of x. We have to prove that
$\mathrm{Ind}_n(x) = 1 + \sum_{i=1}^{n} d_i (n-i)!$.
Now, for each k with $1 \le k < x_1$, let $T_k$ be the set of those
permutations of $\{1, 2, \ldots, n\}$ in which the first entry is k. Clearly
$|T_k| = (n-1)!$. By the definition of the lexicographic ordering, all
elements of $T_k$ come before x, for $k = 1, 2, \ldots, x_1 - 1$. In all there
are $(x_1 - 1)(n-1)!$ such permutations. Besides these, among those
permutations whose first entry is $x_1$, some permutations will come before x.
Specifically, a permutation $x_1 y_2 y_3 \ldots y_n$ will come before
$x_1 x_2 \ldots x_n$ iff the permutation $y' = y_2 y_3 \ldots y_n$ comes before
the permutation $x' = x_2 \ldots x_n$ in the set of all permutations of the set
$\{1, 2, \ldots, n\} - \{x_1\}$, arranged lexicographically. Thus we have,
$\mathrm{Ind}_n(x) = (x_1 - 1)(n-1)! + \mathrm{Ind}_{n-1}(x')$     (*)
$\mathrm{Ind}_{n-1}(x')$ is known by the induction hypothesis. (Note that x'
need not be a permutation of the set $\{1, 2, \ldots, n-1\}$. Still we can
apply the induction hypothesis because we are proving the result for the set
of permutations of any symbols, provided there is some linear order on these
symbols, which is used to put the lexicographic order on the set of all
permutations of these symbols. The actual symbols are obviously immaterial.)
For the sake of uniformity in notation, let
$x_i' = x_{i+1}$ for $i = 1, 2, \ldots, n-1$.
Then x' is a permutation of the set
$\{x_1', x_2', \ldots, x_{n-1}'\}$.
Let
$(d_1', d_2', \ldots, d_{n-1}')$
be its inversion table. Clearly
$d_i' = d_{i+1}$ for $i = 1, 2, \ldots, n-1$.
Now
$\mathrm{Ind}_{n-1}(x') = 1 + \sum_{i=1}^{n-1} d_i' (n-1-i)!$
$= 1 + \sum_{i=1}^{n-1} d_{i+1} (n-1-i)!$
$= 1 + \sum_{i=2}^{n} d_i (n-i)!$
(taking i + 1 as the new index variable of summation).
Substituting this in (*) and noting that $d_1 = x_1 - 1$ completes the proof
of the inductive step. Thus we have established the index formula for the
permutations of the set $\{1, 2, \ldots, n\}$. But obviously the same formula
applies for the set of permutations of any n symbols, on which some
linear order is given. (As noted above, the generality so gained is vitally
needed in the inductive step.) This completes the proof. ∎
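The index formula of Theorem 3.14 is easy to evaluate mechanically. The following sketch (with the hypothetical name `lex_rank`, and a quadratic-time inversion count chosen for clarity rather than speed) assumes the permutation is given as a list of distinct, linearly ordered symbols.

```python
from math import factorial

def lex_rank(x):
    """Index (1-based) of the permutation x in the lexicographic order,
    computed as 1 + sum of d_i * (n - i)! over the inversion table."""
    n = len(x)
    rank = 1
    for i in range(n):
        # d is the number of later entries smaller than x[i]; with 0-based i
        # its weight (n - i)! of the theorem becomes (n - 1 - i)!.
        d = sum(1 for j in range(i + 1, n) if x[i] > x[j])
        rank += d * factorial(n - 1 - i)
    return rank
```

Since only comparisons between the entries are used, the same function applies to permutations of any linearly ordered symbols, in line with the remark at the end of the proof.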
For example, for n = 6, the index of the permutation 531462 is
$1 + 4 \times 5! + 2 \times 4! + 0 \times 3! + 1 \times 2! + 1 \times 1! + 0 \times 0! = 532$.
Thus we see just by inspection and a little calculation, that this element
ranks 532nd from the beginning in the lexicographic order, without
searching down the list till we encounter this element. What about the
reverse question? That is, given an integer k, how do we find the permutation
whose index is k? For the answer we need the following theorem:
3.15 Theorem: Let k and n be positive integers with $k \le n!$. Then there
exist unique integers $d_1, d_2, \ldots, d_n$ such that
$0 \le d_i \le n - i$ for $i = 1, 2, \ldots, n$
and
$k = 1 + \sum_{i=1}^{n} d_i (n-i)!$.
Proof: We prove the existence by induction on n. The case n = 1 is
trivial. If k = n!, then we set
$d_i = n - i$ for all $i = 1, 2, \ldots, n$.
Then
$\sum_{i=1}^{n} d_i (n-i)! = \sum_{i=1}^{n} (n-i)(n-i)! = \sum_{j=1}^{n-1} j \cdot j!$
$= \sum_{j=1}^{n-1} (j+1)! - \sum_{j=1}^{n-1} j! = n! - 1 = k - 1$
and this proves the result.
Suppose k < n!. Let $d_1$ be the largest non-negative integer such that
$d_1 (n-1)! < k$. (If $k < (n-1)!$ then $d_1 = 0$).
Then
$0 \le d_1 \le n - 1$.
Let
$r = k - d_1 (n-1)!$.
Then r is a positive integer and $r \le (n-1)!$ (otherwise
$(d_1 + 1)(n-1)! < k$,
contradicting the definition of $d_1$.) Now apply induction and express r as
$1 + \sum_{j=1}^{n-1} b_j (n-1-j)!$
where
$0 \le b_j \le n - 1 - j$
for all $j = 1, 2, \ldots, n-1$.
Now put $d_i = b_{i-1}$ for $i = 2, \ldots, n$. Then we get
$k = 1 + \sum_{i=1}^{n} d_i (n-i)!$.
This completes the inductive step and establishes the existence of the
$d_i$'s.
For uniqueness too, we can apply a similar argument. But there is a
better way out. Let S be the set of all sequences $(d_1, d_2, \ldots, d_n)$
such that
$0 \le d_i \le n - i$ for $i = 1, 2, \ldots, n$.
Clearly $|S| = n!$. Let T be the set $\{1, 2, \ldots, n!\}$. Define
$f: S \to T$ by
$f(d_1, d_2, \ldots, d_n) = 1 + \sum_{i=1}^{n} d_i (n-i)!$.
Then by what we just proved, f is onto. But $|S| = |T| = n!$. Hence by
Theorem (2.2.9), f is one-to-one which is equivalent to the uniqueness of
the $d_i$'s. ∎
With a slight change of notation, the preceding theorem says that
every non-negative integer can be uniquely expressed as
$\sum_{j=1}^{\infty} d_j \, j!$ where $0 \le d_j \le j$ for all j and
$d_j = 0$ for all except finitely many j's (thus the
sum is really finite). This expression is called the factorial representation
of the integer. For example
$100 = 2 \times 2! + 4 \times 4!$, $200 = 1 \times 2! + 1 \times 3! + 3 \times 4! + 1 \times 5!$.
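The factorial representation can be computed by repeated division, dividing first by 2, then by 3, and so on; the successive remainders are the coefficients of 1!, 2!, 3!, and so on. This is a minimal sketch under that observation, and the function name is ours.

```python
def factorial_representation(m):
    """Return [d_1, d_2, ...] with m = sum of d_j * j! and 0 <= d_j <= j.
    The list is empty for m = 0."""
    if m < 0:
        raise ValueError("m must be a non-negative integer")
    digits = []
    j = 1
    while m > 0:
        # Every term beyond d_j * j! is a multiple of (j + 1)!, so after the
        # earlier divisions the remainder modulo j + 1 is exactly d_j.
        m, d = divmod(m, j + 1)
        digits.append(d)
        j += 1
    return digits
```

For 100 this returns the coefficients 0, 2, 0, 4 and for 200 the coefficients 0, 1, 1, 3, 1, matching the two examples above.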
Theorem 3.15 also answers our question about finding the permutation
whose index equals a given integer. We obtain the integers
$d_1, d_2, \ldots, d_n$ and simply construct the permutation whose inversion
table is $(d_1, d_2, \ldots, d_n)$.
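Putting the two steps together gives the inverse of the index function. In the sketch below (the name `lex_unrank` is ours, and the symbols are assumed to be 1, 2, ..., n) the integers $d_i$ are obtained from k − 1 by successive division by (n−1)!, (n−2)!, ..., and each $d_i$ is used at once to pick the next entry of the permutation.

```python
from math import factorial

def lex_unrank(k, n):
    """Return the permutation of 1..n whose lexicographic index is k (1 <= k <= n!)."""
    if not 1 <= k <= factorial(n):
        raise ValueError("k must satisfy 1 <= k <= n!")
    r = k - 1
    remaining = list(range(1, n + 1))  # unused symbols, kept ascending
    x = []
    for i in range(1, n + 1):
        # d is d_i, the coefficient of (n - i)!, so that 0 <= d <= n - i.
        d, r = divmod(r, factorial(n - i))
        # The next entry is the (d + 1)th smallest unused symbol.
        x.append(remaining.pop(d))
    return x
```

As a check, `lex_unrank(532, 6)` recovers the permutation 531462 considered earlier.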
We conclude this section with a discussion of a particularly interesting
type of listing problems, known as topological sorting. When we put a
linear order on a set there is often a constraint that certain elements must
precede some others. For example, when an author writes a book, the
order in which the chapters may appear is constrained by some of them
being prerequisites to some others. When a number of jobs are to be per-
formed on a machine, one at a time, some of the jobs must be carried out
prior to some others and this puts a restriction on the order the jobs are
to be done. Such situations can be paraphrased easily using a suitable
mathematical structure. Let X be a set and $\le$ some given partial order on
X. The problem then is to find a total order $\propto$ on X which is consistent
with $\le$ in the sense that for all $x, y \in X$, $x \le y$ implies
$x \propto y$. As subsets of $X \times X$ this means that $\le$ is a subset of
$\propto$. In other words, we want to
'extend' the given partial order to a total order. This process is known
as topological sorting. The origin of this peculiar name is interesting.
‘Sorting’ as used here is simply a synonym for ‘ordering’. The adjective
‘topological’ comes from ‘topology'. This is a branch of mathematics in
which, among other things, one studies the problem of embedding one
figure into another, allowing the original figure to be 'stretched or bent
without cutting or gluing'. The graphic representation of the problem of
topological sorting is to embed the graph representing the partial order
$\le$ into the graph representing a linear order on the same underlying set.
Note that the extension need not be unique as seen from the example in
Figure 3.13.
[Fig. 3.13: Topological Sorting. (a) Hasse diagram of original partial
order; (b), (c) and (d) three possible linear orders containing the given
partial order: (b) 4 1 2 3 5 6 7 8, (c) 6 7 1 2 4 3 5 8, (d) 1 6 7 2 4 8 3 5.]
From the point of view of discrete mathematics, it is never enough
merely to prove the existence of something. We also want a systematic
procedure for constructing it. We give below one such simple procedure.
(Incidentally, we are tacitly assuming that the set X is finite. If X is
infinite, it is still true that every partial order on X can be extended to a
linear order. But the proof requires substantially different techniques.)
The procedure consists of outputting minimal elements of subsets of
X (w.r.t. the given partial order $\le$ on it), one by one. Let $X_1$ be X and
let $x_1$ be a minimal element of $X_1$. Since X is finite, such an element
exists by Theorem (2.6). Delete $x_1$ from $X_1$, i.e., consider the set
$X_2 = X_1 - \{x_1\}$. Let $x_2$ be a minimal element of $X_2$. Note that, in
the order $\le$, if at all $x_1$ and $x_2$ are related, then $x_1 < x_2$ (by
minimality of $x_1$). Next let $X_3 = X_2 - \{x_2\}$ and $x_3$ be a minimal
element of $X_3$. Once again, if at all $x_3$ is related to $x_1$ (or $x_2$),
it is greater than $x_1$ (or $x_2$). Continue this process till the set X is
exhausted. This gives a listing of the elements of X as
$x_1, x_2, \ldots, x_n$ (where $n = |X|$). In this listing for $i < j$, $x_i$
is either unrelated to $x_j$ (under $\le$) or else $x_i < x_j$ by minimality
of $x_i$. So if we define $x_i \propto x_j$ simply by $i \le j$, we get a
linear order which is an extension of the original partial order $\le$.
Note that because minimal elements are not, in general, unique, there is
considerable choice involved at every stage and consequently we get many
linear orders which are extensions of the same partial order.
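The procedure of repeatedly deleting a minimal element can be sketched as follows. The function name and the way the partial order is supplied (as a predicate `less_than` giving the strict order on a finite collection) are assumptions made for this illustration. Whenever several minimal elements are available, the sketch simply takes the first one it meets, which is one of the many permissible choices.

```python
def topological_sort(elements, less_than):
    """List the elements so that x precedes y whenever less_than(x, y) holds.
    less_than is assumed to be a strict partial order on a finite collection."""
    remaining = list(elements)
    listing = []
    while remaining:
        # An element is minimal in `remaining` if nothing there lies strictly
        # below it; finiteness guarantees that such an element exists.
        x = next(m for m in remaining
                 if not any(less_than(y, m) for y in remaining))
        listing.append(x)
        remaining.remove(x)
    return listing
```

For example, with the divisors of 12 ordered by proper divisibility, every listing produced this way places each divisor after all of its own proper divisors.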
Exercises
3.1 Prove Proposition (3.3).
3.2 Verify that the relation R in Proposition (3.4) is indeed an
equivalence relation.
3.3 What poset do we get if we apply Proposition (3.4) to the set of all
integers on which we define $x \le y$ iff x divides y?
3.4 Prove that a partial order $\le$ on a set X is total if and only if the
corresponding strict order < satisfies the following property, known
as the law of trichotomy: for all $x, y \in X$ either x < y or x = y
or y < x.
3.5 If A is a chain in a poset X, |A| is called the length of A. If
X = P(B) where B is a set with n elements and the partial order on
X is by set inclusion, prove that
(i) for $S, T \in X$, T covers S iff $S \subset T$ and T − S is a singleton set
(ii) the longest chain in X is of length n + 1
(iii) the number of such chains is n!.
3.6 Prove that the intersection of two chains is a chain but that their
union need not be a chain.
3.7 Let $(X, \le)$ be a poset and $\mathcal{C}$ be the set of all chains in X.
Clearly $\mathcal{C} \subset P(X)$ and if $\le$ is a linear order then
$\mathcal{C} = P(X)$. Partially order $\mathcal{C}$ by set inclusion (this
partial ordering is for certain subsets of X and should not be confused with
the original partial order $\le$ which is for elements of X). A maximal
element of $\mathcal{C}$ is called a maximal chain in X.
(i) Prove that a longest chain in X is maximal but the converse
need not hold.
(ii) Prove that every chain is contained in a maximal chain. (Assume
that the set X is finite as usual. For infinite posets the existence
of maximal chains is a very different matter.)
(iii) Prove that the set X can be expressed as a union of maximal
chains.
3.8 Let m be the largest possible number of mutually incomparable
elements of a poset X. Prove that X cannot be expressed as a union
of less than m chains.
**3.9 In the last exercise prove that X can be expressed as the union of
m chains. (This is known as Dilworth's theorem.)
*3.10 Prove that a sequence of real numbers of length mn + 1 (where m
and n are positive integers) must contain either a monotonically
increasing subsequence of length m + 1 or else a monotonically
decreasing subsequence of length n + 1.
3.11 Prove that the assertion of the Dance Problem may fail if we do
not assume the sets of boys and girls to be finite. (Actually, it
suffices to assume that at least one of these two sets is finite.)
3.12 Let $(X, \le)$ and $(Y, \le)$ be posets. A bijection $f: X \to Y$ is
called an order isomorphism (or order equivalence) if for all $x, y \in X$,
$x \le y$ iff $f(x) \le f(y)$. When such an order isomorphism exists, the
posets $(X, \le)$ and $(Y, \le)$ are said to be order isomorphic or to be of the
same order type. Prove that this defines an equivalence relation on
any collection of posets. Two posets having the same order type
have identical order theoretic properties. For example, if one of
them is a lattice, so is the other. If one of them has a chain of
certain length, so does the other. Prove that any two finite, linearly
ordered posets of the same cardinality are of the same order type.
Show that N and Q each with the usual order are not order
isomorphic.
3.13 Let two sets X, Y be equipollent (i.e., suppose there is a bijection
between them). Prove that the posets $(P(X), \subset)$ and $(P(Y), \subset)$ are
of the same order type.
3.14 Let X be the set of all positive integers which divide 30. Define
$\le$ on X by $x \le y$ iff $x \mid y$. Prove that there exists a set Y such that
|Y| = 3 and $(P(Y), \subset)$ is order isomorphic to $(X, \le)$. However,
if $\le$ is the usual order on X, prove that no such set Y exists.
*3.15 Prove that the poset $(P(N), \subset)$, where N is the set of positive
integers, contains an uncountable chain.
3.16 A linearly ordered set X is called dense-in-itself if for all
$x, y \in X$ with x < y, there exists $z \in X$ such that x < z and z < y.
For such a set, prove that:
(i) If |X| > 1, then X is infinite
*(ii) If $|X| = \aleph_0$ and X has neither a maximum nor a minimum
element then X has the same order type as Q with the usual
ordering.
3.17 Prove that the set of all decompositions of a set, partially ordered
by the refinement relation is a lattice. (cf. Example (4) above.)
3.18 Prove that in a lattice, every nonempty, finite subset has a
supremum and an infimum.
3.19 Let $(X, \le)$ be a lattice. A subset Y of X is called a sublattice of X
if for all $x, y \in Y$, $x \vee y$ and $x \wedge y$ are also in Y. (This
condition is commonly expressed by saying that Y is closed under the
operations $\vee$ and $\wedge$.)
(i) Prove that $(Y, \le/Y)$ (i.e., the set Y with the restriction of the
ordering $\le$ to Y) is itself a lattice.
(ii) Prove that the intersection of any two sublattices of X is also
a sublattice of X. Prove that this need not hold for their
union.
(iii) Let S be a set and T a subset of S. Prove that P(T) is a
sublattice of P(S) (partially ordered under $\subset$ as usual).
(iv) Let S be a set. Pick any two distinct elements, say, x and y
of S. Let Y be the set of those subsets of S which do not
'separate x from y', i.e. $Y = \{A \subset S$ : either both $x, y \in A$ or
neither x nor y is in A$\}$. Prove that Y is a sublattice of P(S).
(v) Prove that the set of all divisors of a positive integer n as well
as the set of all multiples of n are sublattices of the lattice in
Example (2).
3.20 Let X be the set $N \times N$. For
$(x_1, y_1), (x_2, y_2) \in X$
define
$(x_1, y_1) \le (x_2, y_2)$
iff $x_1 \le x_2$ and $y_1 \le y_2$ (cf. Example (7)). Prove that $\le$ is a
partial order on X and $(X, \le)$ is a lattice. Draw its lattice diagram.
3.21 A poset $(X, \le)$ is said to be well-ordered if every non-empty subset
of X has a least element.
(i) Prove that a well-ordered set is totally ordered.
(ii) Prove that the set of real numbers and the set of all integers
(with usual orders) are not well-ordered.
(iii) Prove that the set of positive integers is well ordered (Hint:
By induction on n, prove that every subset of N containing n
has a least element.)
(iv) Let $(X, \le)$ be a well-ordered set and let * be any symbol
not in X. Let $X' = X \cup \{*\}$. Define $\le'$ on X' by $x \le' y$ iff
either $x, y \in X$ and $x \le y$ or y = *
(i.e., $\le' \; = \; \le \cup \, \{(x, *) : x \in X'\}$).
Prove that $(X', \le')$ is well-ordered. (This shows how to
construct from a given well-ordered set, the 'next larger'
well-ordered set.)
(V) If X is well-ordered, prove that every element of X, except the
largest element, if any, is covered by a unique element of X,
called its successor.
3.22 Prove that the construction given for the next element in the
lexicographic ordering on the set of permutations in fact gives the
next permutation.
3.23 Let n, r be positive integers with $r \le n$. Denote an r-subset of
$\{1, 2, \ldots, n\}$ by listing its elements in an ascending order. Now put
the lexicographic ordering on the set $P_r(n)$ of all r-subsets of
$\{1, 2, \ldots, n\}$. For this ordering find formulas for (i) the next element
and (ii) the index of an element.
3.24 Obtain a linear ordering for the set of all subsets of
$\{1, 2, \ldots, n\}$, by first listing the empty set, then all singleton sets,
then all subsets with 2 elements and so on in the manner of the last exercise.
What is the index function for this ordering?
3.25 There is another way to put a linear order on the power set P(X)
where $X = \{1, 2, \ldots, n\}$. Let Y be the set $\{0, 1, 2, \ldots, 2^n - 1\}$.
Define $g: P(X) \to Y$ by $g(A) = \sum_{i=1}^{n} f_A(i) \, 2^{i-1}$
where $f_A$ is the characteristic function of A for $A \subset X$ (see Exercise
(2.1.13)). Prove that g is a bijection. Because of this, the usual
order on Y can be transferred to an order on P(X). Note that g
itself is the index function for this ordering.
3.26 Prove that the Hasse diagram of a poset cannot contain any cycle,
i.e., a sequence of consecutive arrows terminating where it starts.
3.27 Let $\le$ be a partial order on a set X. Prove that unless $\le$ is itself
linear, there are at least two different linear orders on X which
extend $\le$. (Hint: show that there will be at least one i such that
the set $X_i$ given in the algorithm for topological sorting has more
than one minimal element.)
3.28 Let $X = \{1, 2, \ldots, n\}$. Which of the linear orders in Exercises
(3.24) and (3.25) is an extension of the partial order on P(X) by set
inclusion?
3.29 Prove that the method given for topological sorting is exhaustive
in the sense that every possible linear extension of the given partial
order can be obtained by making an appropriate choice of minimal
elements at every stage. Obtain the three extensions of the partial
order shown in Figure 3.13 this way.
3.30 A linear order $\le$ on a set X is called complete if every non-empty
subset of X which is bounded above has a supremum in X (which
need not belong to that subset). Prove that:
(i) $\le$ is complete if and only if every non-empty subset of X
which is bounded below has an infimum.
(ii) every well order is complete.
*(iii) the usual order on Z (the set of all integers) and R is complete
but the usual order on Q is not complete.
Notes and Guide to Literature
Partial orders arise in highly diverse problems. The study of lattices is
especially important. A standard reference on them is Birkhoff [1].
Incidentally, the term 'lattice' is also used in a very different sense in
mathematics and physics. The name 'lattice' as used here probably comes because
the Hasse diagram of the lattice in Exercise (3.20) resembles the lattice of
tetravalent atoms such as carbon or silicon.
Our definition of an inversion differs slightly from that in Knuth [1]
Volume 3 where one full chapter (which comprises more than half the
third volume) is devoted to the problem of sorting and gives an exhaustive
list of sorting algorithms.
A proof of Dilworth's theorem may be found in Hall [1]. It can also be
proved using graph theory, see the Epilogue.
The proof of the fact that the usual ordering on R is complete requires
a careful study of the construction of real numbers. There are two standard
ways to construct real numbers from rational numbers, one due to Cantor
and one to Dedekind. The second method is designed to make it order
complete by its very construction. Of course, ultimately both the methods
yield the same properties of real numbers. For a discussion of both the
methods, and for a proof of their equivalence, see, for example, Goffman
[1]. Completeness of the continuum (which is the old name for the real
number system) may very well be considered to be the cornerstone of
continuous mathematics. It would be difficult to think of a really non-trivial
concept or theorem of calculus which does not require it.
For infinite posets, the existence of maximal chains requires what is
known as the axiom of choice. Simply stated, this means that given a
collection of mutually disjoint, non-empty sets, it is possible to choose one
element from each of them and form a set. This seems very obvious, at least
for finitely many sets. But many of its equivalent formulations are not so
obvious. One such formulation is, in fact, the existence of a maximal chain
in every poset. Another statement equivalent to the axiom of choice is that
every set can be well-ordered. For a discussion of this axiom and its use
in mathematics, see Halmos [1]. Extending a partial order to a linear order
also requires the axiom of choice, in case the underlying set is infinite.
Since we shall be dealing mostly with finite sets, we shall rarely need it.
We have mentioned it only to stress that the theory of infinite sets needs
substantially different techniques from that of finite sets.
4. Algebraic Structures
We have mentioned many times that the limiting process is the essence
of continuous mathematics. If the limiting process is taken away, what is
left is mostly algebraic manipulations. While the staunch lovers of
continuous mathematics may scoff at this algebraic part as 'raw
computation', it is precisely the algebraic aspect that is of interest to us.
When
properly abstracted, it leads to a powerful species of mathematical structures,
called the algebraic structures. It includes Boolean algebras, groups, rings,
fields and vector spaces. They will be studied one by one in subsequent
chapters. What we present here is some of the most basic concepts about
algebraic structures.
Whenever we do some algebraic manipulation such as adding, subtracting,
multiplying or dividing on numbers we do so on two numbers at a
time. We often encounter expressions like 'the sum of this series’ or ‘the
product of the first n integers‘. But they really involve the process of
summing or multiplying two numbers at a time and carrying this out
repeatedly. (In case of an infinite series, this way we get only the so-called
partial sums and then a limiting process is needed.) Two real numbers,
subjected to any one of these operations give rise to another real number.
It is clear that if we want to define abstract mathematical structures of
this type, we must first replace the set of real numbers by an ‘abstract’ set
X. To define an abstract ‘addition’ on this set X, we must specify for every
two elements x and y in X, some element of X to be called their sum and
to be denoted by x + y. This amounts to defining a function from the
cartesian product X x X into X. Such a function is given a name.
4.1 Definition: A binary operation on a set X is a function from $X \times X$
into X. More generally for every positive integer n, an n-ary operation on
X is a function from the product set
$X^n = X \times X \times \ldots \times X$ (n times) into
X. (For n = 1 and 3 the operation is called unary and ternary respectively.
Since n = 2 is the most important case, an 'operation' would mean a
binary operation unless otherwise stated.)
Thus addition and multiplication are binary operations on the set of
real numbers. When so viewed, the equation 3 + 4 = 7 should really be
expressed as +(3, 4) = 7 (or, more fussily, as +((3, 4)) = 7), and the
equation $3 + 4 \cdot 2 = 11$ as $+(3, \cdot(4, 2)) = 11$. This is obviously
clumsy. The usual notations are too well established and convenient to be
changed. If they are changed, most of the familiar theorems of algebra would
appear unintelligible. For example, the result that for all $x, y \in R$,
$(x + y) \cdot (x - y) = x^2 - y^2$
will take the form,
$\cdot[+(x, y), -(x, y)] = -[\cdot(x, x), \cdot(y, y)]$.
True, it is only a matter of habit. But if we want the familiar
binary operations for real numbers to serve as guiding lights in the process
of defining abstract algebraic structures, we better not alter them, even
notationally. On the other hand, we do want uniformity of notation
between the usual binary operations on R and the abstract binary
operations on abstract sets. The only way out is, therefore, to adopt suitable
notations for the latter, rather than change the notations for the former.
Thus, even though every binary operation on a set X is a function, it is
rarely denoted by the standard symbols for a function such as f, g, h etc.
The common notations for binary operations are $\cdot$, +, * and a few
other notations in specific contexts. If * is a binary operation on a set X
and $x, y \in X$ then *(x, y) is denoted by x * y. Also, to keep conformity
with the usual multiplication for real numbers, even for an abstract set,
if the binary operation is denoted by $\cdot$, $x \cdot y$ is often denoted by
xy. A few
other such conventions, designed to keep conformity with real numbers,
will be pointed out as we proceed.
Note that usual addition and multiplication are also well-defined binary
operations on N, Z and Q. However, if we take N, then subtraction does
not define a binary operation. For, if $x, y \in N$ and x < y then x − y is no
longer in N. (Of course we can define x − y arbitrarily as, say, 25 in all
such cases and get a well-defined binary operation on N; but it can
hardly be called subtraction.) On the other hand, on Z or Q, subtraction
does define a binary operation. What is crucially involved here is that
whenever one element of Z (or Q) is subtracted, as a real number, from
another element of Z (or Q) the result stays within Z (or Q). This is not
always true for N. When we perform subtraction on two elements of N, the
result may sneak out of N. It is this intuition that lies behind the peculiar
name given to this property in the following definition.
4.2 Definition: Let * be a binary operation on a set X. Then a subset
Y of X is said to be closed under (or w.r.t.) * if for all $x, y \in Y$,
$x * y \in Y$.
The term ‘closed under’ something appears frequently and in many
different contexts. But every time the connotation is the same, namely not
having to go out of that set, whenever that ‘something' is performed to
it. For example, a family $\mathcal{F}$ of sets is said to be closed under
unions if for every subfamily $\mathcal{G}$ of $\mathcal{F}$, the union of
members of $\mathcal{G}$ is also a member of $\mathcal{F}$ (although not
necessarily of $\mathcal{G}$). Similarly a set Y of real numbers is
said to be closed under suprema if for every subset A of Y, the supremum
of A, if at all it exists, is also in Y.
Coming back to Definition 4.2, suppose Y is closed under *. Then we
get a well-defined binary operation on Y, defined by * itself. This binary
operation is said to be induced on Y by the binary operation * on the
ambient set X. It may be denoted by */Y but is generally denoted by * itself
because as far as its action on any two elements of Y is concerned, it is
the same as the original binary operation * on X.
Before giving other examples of binary operations, it is convenient to
introduce names for binary operations satisfying certain conditions.
4.3 Definition: Let * be a binary operation on a set X. Two elements
x and y are said to commute with each other if x * y = y * x. * is called
commutative if for all x, y ∈ X, x * y = y * x. (In other words, * is
commutative iff every two elements commute with each other.) * is called
associative if for all x, y, z ∈ X, x * (y * z) = (x * y) * z.
For example, the usual addition on R is both commutative and
associative, but the binary operation of subtraction is neither commutative nor
associative. If we define * on R by x * y = (x + y)/2 then * is commutative but
not associative, while the operation ∘ defined by x ∘ y = x for all x, y ∈ R
is associative but not commutative.
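Claims of this kind can be tested mechanically on a small sample. The following Python sketch is only an illustration: the helper names is_commutative and is_associative, the sample of integers standing in for R, and the choice of the mean as a commutative but non-associative operation are assumptions made here. A finite sample can refute a law but can never prove it.

```python
from fractions import Fraction
from itertools import product

def is_commutative(op, sample):
    """True if op(x, y) == op(y, x) for every pair drawn from sample."""
    return all(op(x, y) == op(y, x) for x, y in product(sample, repeat=2))

def is_associative(op, sample):
    """True if op(x, op(y, z)) == op(op(x, y), z) for every triple from sample."""
    return all(op(x, op(y, z)) == op(op(x, y), z)
               for x, y, z in product(sample, repeat=3))

sample = range(-3, 4)   # a small, arbitrary set of integers standing in for R

add = lambda x, y: x + y
sub = lambda x, y: x - y
mean = lambda x, y: (Fraction(x) + Fraction(y)) / 2   # exact arithmetic
left = lambda x, y: x                                  # the operation x o y = x

for name, op in [("add", add), ("sub", sub), ("mean", mean), ("left", left)]:
    print(name, is_commutative(op, sample), is_associative(op, sample))
```

Each printed line gives the name of the operation followed by the outcome of the commutativity test and of the associativity test on the sample.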
Because of commutativity, the order in which the two elements appear
is immaterial. The symbol + is generally not used for non-commutative
binary operations. In absence of associativity, an expression like a * b * c
would be ambiguous since it can be interpreted either as a * (b * c) or as
(a * b) * c. If * is associative, it is unnecessary to put the parentheses.
More generally, if * is an associative binary operation on a set X then for
any a₁, a₂, ..., aₙ ∈ X the expression a₁ * a₂ * ... * aₙ can have only one
meaning. If further * is also commutative, then the aᵢ's can be permuted
among themselves in any order and all the resulting expressions would be
equal. This fact is used innumerably many times for the usual addition
and multiplication for real numbers often without being conscious of their
associativity and commutativity. In an abstract set, care must be exercised
while handling such expressions, making sure that no more properties are
used than are assumed as axioms or have already been proved from the
axioms. Note, for example, that without associativity, the familiar law of
indices which says that aᵐ⁺ⁿ = aᵐ · aⁿ for positive integers m and n would be
meaningless. Indeed, the difficulty would be to define the power of an
element. If * is a binary operation on a set X and a ∈ X then we may
define a² as a * a. But (a * a) * a need not be the same as a * (a * a) (e.g. let *
be subtraction). So we have difficulty in defining a³. We may try to bypass
the difficulty by defining a³ only as a² * a (and more generally, aⁿ
inductively as aⁿ⁻¹ * a when n ≥ 2). But then the law of indices would not hold.
We leave it to the reader to prove that the law does hold if * is associative
(commutativity is not needed). If * is denoted by +, the nth power of
a, aⁿ, is also commonly denoted by na. Note that as yet we have given
no meaning to powers where the exponent is not a positive integer.
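The failure of the law of indices without associativity can be seen in a small computation. The sketch below is an illustration under assumptions made here: the function names power and law_of_indices_holds are hypothetical, powers are defined inductively as aⁿ = aⁿ⁻¹ * a, and string concatenation is used as a convenient associative but non-commutative operation.

```python
def power(op, a, n):
    """n-th power of a for n >= 1, defined inductively as a^n = a^(n-1) * a."""
    result = a
    for _ in range(n - 1):
        result = op(result, a)
    return result

def law_of_indices_holds(op, a, m, n):
    """Check a^(m+n) == a^m * a^n for one choice of a, m and n."""
    return power(op, a, m + n) == op(power(op, a, m), power(op, a, n))

sub = lambda x, y: x - y       # not associative
concat = lambda u, v: u + v    # on strings: associative, not commutative

# With subtraction and a = 1: a^2 = 0, a^3 = -1, a^4 = -2, but a^2 * a^2 = 0.
print(law_of_indices_holds(sub, 1, 2, 2))

# With concatenation the law holds for every pair of exponents tried.
print(all(law_of_indices_holds(concat, "ab", m, n)
          for m in range(1, 5) for n in range(1, 5)))
```

The second check passes even though concatenation is not commutative, in line with the remark that only associativity is needed.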
We now list a number of additional examples of binary operations.
1. Let S be any set and X = P(S), the power set of S. Then ∪ and
∩ are two binary operations on X. Both are commutative and
associative.
2. More generally, if (X, ≤) is a lattice, then the meet ∧ and the
join ∨ both define binary operations on X. Both these operations
are also commutative and associative. Every sub-lattice of X is
closed under these operations.
3. Let S be a set and X the set of all functions from S to itself. For
f, g ∈ X, let f ∘ g be the composite of the functions f and g. Then
f ∘ g ∈ X. Thus ∘ defines a binary operation on X. This operation
is associative but not commutative in general. Note that the set of
all permutations of S is a subset of X and it is closed under ∘,
because the composite of two permutations of S is again a
permutation of S.
4. Suppose *₁ and *₂ are binary operations on two sets X₁ and X₂
respectively. Let X = X₁ × X₂. Then we can define a binary operation
* on X by
    (x₁, x₂) * (y₁, y₂) = (x₁ *₁ y₁, x₂ *₂ y₂).
This operation is said to have been obtained by co-ordinatewise
application of *₁ and *₂. The construction can obviously be
generalised for the product of any number of sets with binary operations.
This gives a way to construct new examples of binary operations.
(We assume the sets X₁, X₂ to be non-empty, or else their product
would be empty.) Clearly * is commutative if and only if *₁ and *₂
are so. A similar statement holds for associativity. Often the sets
X₁, ..., Xₙ and the binary operations *₁, ..., *ₙ are all equal. Then
we get a binary operation on the power Xⁿ. A common illustration
of this is the ‘co-ordinatewise addition’ of elements of the
n-dimensional euclidean space Rⁿ. (Caution: For n = 2, we may identify
R² with the set of complex numbers by letting an element (x, y) of
R² correspond to the complex number x + iy. Then the co-ordinatewise
addition corresponds to the addition of complex numbers.
But the co-ordinatewise multiplication of elements of R² does not
correspond to the usual multiplication of complex numbers.)
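The caution about R² can be checked on a single pair of points. In the sketch below the names coordinatewise and as_complex are hypothetical helpers introduced only for this illustration, and the two sample points are arbitrary.

```python
def coordinatewise(op1, op2):
    """Combine op1 on X1 and op2 on X2 into one operation on X1 x X2."""
    return lambda p, q: (op1(p[0], q[0]), op2(p[1], q[1]))

add = lambda x, y: x + y
mul = lambda x, y: x * y

pair_add = coordinatewise(add, add)
pair_mul = coordinatewise(mul, mul)

def as_complex(p):
    """Identify the point (x, y) of R^2 with the complex number x + iy."""
    return complex(p[0], p[1])

p, q = (1, 2), (3, 4)
print(pair_add(p, q), as_complex(p) + as_complex(q))   # the two sums agree
print(pair_mul(p, q), as_complex(p) * as_complex(q))   # the two products differ
```

For these points the co-ordinatewise product is (3, 8) while (1 + 2i)(3 + 4i) = −5 + 10i, so the identification respects addition but not multiplication.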
5. On R³, there is one more binary operation. We identify points of
R³ with vectors in three dimensional space. Specifically to (x₁, x₂, x₃)
in R³ we associate the vector x₁i + x₂j + x₃k where i, j, k is a right
handed orthonormal system. Then the cross product of vectors
defines a binary operation × on R³. This operation is neither
commutative nor associative. Note, incidentally, that the dot
product (also called the scalar product) of vectors does not define a
binary operation because the dot product of two vectors is not a
vector, it is only a scalar.
6. Let X, Y be sets and let F be the set of all functions from X to Y.
(In earlier notations, F = Y^X.) Suppose * is a binary operation on
Y. Then * induces a binary operation on F by what is known as
the pointwise application of *. This is done as follows. Let f, g ∈ F.
Then both f and g are functions from X to Y. Let x ∈ X. Then
f(x) and g(x) are elements of Y. So we can apply * to them and
get f(x) * g(x) which is another element of Y. This element depends
on x (f, g and * being fixed) and thus we get a function from X to
Y which associates to x ∈ X, the element f(x) * g(x) of Y. This
new function is denoted by f ⊛ g. Then f ⊛ g ∈ F whenever
(f, g) ∈ F × F and thus we get a binary operation on F. For example,
if X = Y = R and * is the usual addition, then for f(x) = eˣ and
g(x) = sin x we have f ⊕ g defined by (f ⊕ g)(x) = eˣ + sin x.
f ⊛ g is often denoted by f * g but a beginner occasionally finds
the double role of * confusing; on one hand it is a binary operation
on Y and the same symbol is used for a binary operation on
F, whose elements are functions into the set Y. It is clear that if *
is commutative or associative then ⊛ also has the corresponding
properties. There are many other properties which pass similarly
from * to ⊛. Because of this, the construction here is very useful
in providing new examples of algebraic structures of a given type.
Often, instead of taking the full set F, some of its subsets which are
closed under ⊛ are more interesting. For example, if X = Y = R
and * is the usual + or · for R, then the set of all continuous
functions from R to R is closed under the binary operations ⊕ and ⊙.
(In simple language this means that the sum and the product of two
continuous functions are continuous.)
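Pointwise application translates almost word for word into a program. The following sketch is a minimal illustration; the name pointwise is an assumption made here, and floating point numbers stand in for real numbers.

```python
import math

def pointwise(op):
    """Lift a binary operation on Y to one on functions from X to Y."""
    def lifted(f, g):
        # The new function sends x to f(x) op g(x).
        return lambda x: op(f(x), g(x))
    return lifted

add = lambda u, v: u + v
mul = lambda u, v: u * v

f_plus_g = pointwise(add)(math.exp, math.sin)    # x -> e^x + sin x
f_times_g = pointwise(mul)(math.exp, math.sin)   # x -> e^x * sin x

print(f_plus_g(0.0), f_times_g(0.0))
```

Note that pointwise(add) takes two functions and returns a function, which is exactly the double role of the symbol that the text warns about.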
7. As an especially instructive example, let n be a positive integer
and let Zₙ be the set of residue classes of integers modulo n. Recall
from Section 2, that these are the equivalence classes of the relation
R on Z defined by xRy iff x − y is divisible by n, for x, y ∈ Z. We
define the addition and the multiplication of these residue classes
as follows. Take any two residue classes in Zₙ. Then they will be
of the form [a] and [b] for some a, b ∈ Z. Since a, b are integers
we can add and multiply them under the usual binary operations
on Z. Thus we get a + b and a·b as elements of Z. So, [a + b]
and [a·b] are elements of Zₙ. It is tempting to define [a] + [b] as
[a + b] and [a]·[b] as [a·b]. But to make these definitions valid, we
must check that they are well-defined, i.e., independent of the choice
of the representatives of the equivalence classes (see the comments
in the proof of Proposition (2.14)). Specifically, we have to show
that if a, b, c, d are integers such that [a] = [c] (i.e., a ≡ c mod n)
and [b] = [d] (i.e., b ≡ d mod n) then [a + b] = [c + d] and [a·b]
= [c·d]. These verifications are simple. Since a ≡ c and b ≡ d
mod n, there exist integers p and q such that a − c = np and
b − d = nq. Now (a + b) − (c + d) = (a − c) + (b − d) = np + nq = n(p + q).
Since p + q is an integer, it follows that a + b ≡ c + d mod n.
For multiplication, note that ab − cd = ab − bc + bc − cd = b(a − c)
+ c(b − d) = bnp + cnq = n(bp + cq). Since p, q, b and c are
integers, so is bp + cq. It follows that [ab] = [cd]. We, therefore,
have two well-defined binary operations, + and · on Zₙ,
called respectively residue addition (or mod n addition) and residue
multiplication (or mod n multiplication). It is clear that both these
operations are commutative and associative because of the
commutativity and associativity of usual + and · on Z.
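The question of well-definedness can also be explored numerically. The sketch below is an illustration under assumptions made here: the function well_defined_mod and its parameter spread are hypothetical, each residue class is sampled through only a few representatives, and the maximum of two integers is included as an operation that does not pass to residue classes. A passing check is evidence, not a proof; the proof is the computation given in the text.

```python
def well_defined_mod(n, op, spread=3):
    """Check that the class of op(a, b) does not depend on the representatives.

    Each class [r] is sampled through r + n*t for t in -spread..spread."""
    shifts = [n * t for t in range(-spread, spread + 1)]
    for a in range(n):
        for b in range(n):
            expected = op(a, b) % n
            for s in shifts:
                for t in shifts:
                    if op(a + s, b + t) % n != expected:
                        return False
    return True

print(well_defined_mod(6, lambda a, b: a + b))   # residue addition
print(well_defined_mod(6, lambda a, b: a * b))   # residue multiplication
print(well_defined_mod(6, max))                  # max depends on representatives
```

For the maximum, [0] = [6] and yet max(0, 1) = 1 while max(6, 1) = 6, which lies in the class [0]; so no operation on Z₆ is induced.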
Using the residue addition and multiplication, we can do the
Division Problem elegantly, that is, without the brute force
computation. Let x = 2·3·5·7·11·13·17 + 1. We have to check whether x
is divisible by 19. Let us consider the residue classes modulo 19.
Then the problem reduces to asking whether [x] and [0] are the
same elements of Z₁₉. Now because of commutativity and
associativity of mod 19 multiplication we have,
    [x] = [2]·[3]·[5]·[7]·[11]·[13]·[17] + [1]
        = [2·11]·[3·13]·[5·7]·[17] + [1]
        = [22]·[39]·[35]·[17] + [1]
        = [3]·[1]·[−3]·[−2] + [1]
              (since 22 ≡ 3 mod 19 etc.)
        = [3·1·(−3)·(−2)] + [1]
        = [18] + [1]
        = [19]
        = [0].
Thus we see effortlessly that x is divisible by 19. The study of the
residue addition and multiplication is called the residue arithmetic.
The example here is an illustration of its use.
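The same computation can be carried out by machine in both ways. In the sketch below the variable names are chosen here for illustration; the brute force route builds the full integer, while the residue route reduces modulo 19 after every multiplication so that intermediate values stay small.

```python
from functools import reduce

n = 19
factors = [2, 3, 5, 7, 11, 13, 17]

# Brute force: form the whole product, add 1, then take the remainder.
x = reduce(lambda u, v: u * v, factors) + 1
brute = x % n

# Residue arithmetic: reduce modulo 19 at every step.
residue = reduce(lambda u, v: (u * v) % n, factors, 1)
residue = (residue + 1) % n

print(x, brute, residue)
```

Both routes give the remainder 0, and indeed x = 510511 = 19 · 26869.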
8. Let X be the set of all finite sequences whose entries come from
some set S. (We may think of S as an alphabet and then elements
of X as ‘words’.) Given two such sequences, say, (s₁, s₂, ..., sₘ) and
(t₁, t₂, ..., tₙ) we form a third sequence of length m + n, namely,
(s₁, s₂, ..., sₘ, t₁, t₂, ..., tₙ). This gives a binary operation on X,
called concatenation or juxtaposition. This operation is associative,
but not commutative. (The idea of concatenation was also used in
the proofs of Proposition (2.15) and Theorem (3.10).)
9. Given any binary operation * on a set X, we define its opposite
operation *' by x *' y = y * x for all x, y ∈ X. If * is commutative then *'
coincides with *. Note that (*')' = *. There is an obvious duality
between properties of * and *' (analogous to the duality between
the properties of a partial order and its reverse order).
10. When the set X is finite, there is a very handy way to define and
represent a binary operation on X, by drawing what is called the
table of the operation. If |X| = n, this table has n rows and n
columns, one for each element of X. If X = {x₁, x₂, ..., xₙ} and the
binary operation is *, then we put xᵢ * xⱼ in the ith row and jth
column. For example, in Figure 3.14, we show the table for the
binary operation of residue multiplication on the set Z₆ (see Example
(7) above).
         [0]  [1]  [2]  [3]  [4]  [5]
    [0]  [0]  [0]  [0]  [0]  [0]  [0]
    [1]  [0]  [1]  [2]  [3]  [4]  [5]
    [2]  [0]  [2]  [4]  [0]  [2]  [4]
    [3]  [0]  [3]  [0]  [3]  [0]  [3]
    [4]  [0]  [4]  [2]  [0]  [4]  [2]
    [5]  [0]  [5]  [4]  [3]  [2]  [1]
Fig. 3.14: Residue Multiplication for Z₆.
Conversely, we may fill the n² places in the table by elements of X
in any manner we like and get a well-defined binary operation on
X. This way we do not have to describe it by any formula; we are
defining it by exhaustively listing its values for all possible ordered
pairs of elements of X (that is why this method requires X to be
finite). Analogous to the cartesian representation of a binary
relation, certain properties of a binary operation have a vivid
interpretation in terms of its table. For example, commutativity is
equivalent to symmetry about the diagonal. (For associativity there is no
such obvious interpretation.)
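A table of this kind is easy to generate and inspect by program. The sketch below is an illustration with names chosen here (operation_table, is_symmetric, is_associative); residue classes modulo 6 are represented by the integers 0 to 5. Commutativity is read off as symmetry of the table, whereas associativity has to be tested on all triples.

```python
def operation_table(elements, op):
    """Return the table as a list of rows; entry [i][j] is elements[i] * elements[j]."""
    return [[op(x, y) for y in elements] for x in elements]

def is_symmetric(table):
    """Symmetry about the diagonal, i.e. commutativity of the operation."""
    size = len(table)
    return all(table[i][j] == table[j][i]
               for i in range(size) for j in range(size))

def is_associative(op, elements):
    """No shortcut from the table: test every triple."""
    return all(op(x, op(y, z)) == op(op(x, y), z)
               for x in elements for y in elements for z in elements)

Z6 = list(range(6))
mul6 = lambda a, b: (a * b) % 6

table = operation_table(Z6, mul6)
for row in table:
    print(row)
print(is_symmetric(table), is_associative(mul6, Z6))
```

The printed rows reproduce the entries of residue multiplication modulo 6 row by row.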
We now define two more concepts pertaining to binary operations.
4.4 Definition: Let * be a binary operation on a set X. Then an element
e of X is said to be a left identity for * if for all x ∈ X, e * x = x. A right
identity is defined similarly. An element which is both a right and a left
identity is called a two-sided identity or simply identity for *.
Obviously a left identity for an operation * is a right identity for its
opposite operation *' (Example (9) above), and vice-versa. So it suffices to give
examples or to prove theorems for one of these concepts. Also for a
commutative operation, the distinction between left and right identities
disappears. For non-commutative operations, several things can happen. For
example, let X be any set and define * on X by x * y = x for all x, y ∈ X.
Then every element of X is a right identity but there is no left identity for
* unless X is a singleton set. The operation of subtraction for real numbers
has 0 as the only right identity and no left identity. The following simple
but useful result shows that things are smooth if both types of identities
are present.
4.5 Proposition: Let * be a binary operation on a set X and suppose it
has both a left identity and a right identity. Then they are equal. Also
there are no other left or right identities.
Proof: It suffices to show that any left identity of * equals any right
identity for *. Let e and f be respectively such identities. Then the element
e * f is on one hand equal to f (since e is a left identity) and on the other
hand equal to e (since f is a right identity). So e = f.
For real numbers, 0 and 1 are, respectively, the identities for addition
and multiplication. Generally, even for an abstract set X, when a binary
operation on X is denoted by + (as noted before this is rarely done when it
is not commutative) its identity (if any) is denoted by 0. When the
operation is denoted by ·, its identity (if any) is denoted by 1 or by e or by
some other symbol. Note that the identity function on any set, 1_S: S → S
is also the identity element for the binary operation of composition on the
set of all functions from S to S (Example (3) above). Hence probably the
name. We leave it to the reader to find which operations given above have
left or right identities. If e is an identity for a binary operation * on a set
X and Y is a subset of X which is closed w.r.t. * then three things can
happen: (i) e ∈ Y, in which case e is also the identity of the induced binary
operation on Y, (ii) e ∉ Y, but the induced binary operation on Y has its
own identity, e.g. let X = R, * = · and Y = {0} and (iii) e ∉ Y and the
induced operation on Y has no identity, e.g. X = R, * = · and
Y = {x ∈ R : x > 2}.
We invite the reader to give more examples of these three possibilities.
Having discussed identities, let us discuss one more important concept,
that of an inverse of an element. We often make somewhat loose
statements such as ‘subtraction is the opposite (or inverse or reverse) of addition’
or ‘division is the opposite of multiplication’. What do they really mean?
Let us take the case of subtraction. When we take a real number a, subtract
some other real number b from it, and write the answer as c (= a − b), we
are really answering the question, ‘which real number, when added to b
will give us a?’ In other words, we are solving the equation b + x = a
where x is an unknown. The solution can be written as a + (−b) where
−b is the negative of b. Thus subtraction can be expressed in terms of
addition and the concept of the negative of a real number. Similarly division
can be expressed in terms of multiplication and the concept of the reciprocal
of a (non-zero) real number. There is something common to these two
concepts. When a real number and its negative are added, we always get 0,
the identity of addition. Similarly when a (non-zero) real number and its
reciprocal are multiplied, we always get 1, the identity of multiplication.
Let us now isolate this common feature and put in the wider setting of
an ‘abstract’ binary operation on an abstract set, keeping in line with the
spirit of abstraction discussed in Section 1. We have to be a little careful
because while both the operations above (namely + and · for R) are
commutative, our abstract binary operation need not be so.
4.6 Definition: Let e be an identity for a binary operation * on a set X.
Let x ∈ X. Then x is called right invertible if there exists y ∈ X such that
x * y = e. Any such element y is called a right inverse of x. The concepts
of left invertibility, left inverse, two sided invertibility (also simply called
invertibility) and two sided inverse (simply called inverse) are defined
similarly.
We could define these concepts when e is just a right identity or a left
identity (instead of being a two-sided identity). In that case, since right
identities need not be unique we would have to define these concepts
w.r.t. a particular right identity. The generality so gained is worthwhile
sometimes. But for most purposes, the definition above is adequate. As
foremost examples, we see that for the usual addition on R, every element
x is invertible and −x is the unique inverse of x. Similarly for the usual
multiplication on R, every non-zero x is invertible with its reciprocal as
the unique inverse. For the example (3) above, we see from Exercise (2.1.11),
that a function f is left-invertible if and only if it is injective (assuming
S ≠ ∅). However, simple examples show that the same function may have
more than one left inverse. For example, let S = N and define f: S → S
by f(x) = x + 1. Define g: S → S by g(y) = y − 1 for y > 1. Then g(1) may
be defined arbitrarily and any such g would be a left inverse of f. Similarly
f has a right inverse iff f is onto, but the right inverse may not be unique.
However, f is invertible iff f is a bijection and in this case the inverse is
unique; it is the inverse function f⁻¹: S → S. This notation is also used
in the case of an abstract binary operation, provided the inverse is unique.
If the binary operation is denoted by +, then instead of x⁻¹ we frequently
write −x. If every element of X has a unique inverse then we get a
well-defined function, called inversion, from X into itself, which assigns to each
x ∈ X, its (unique) inverse, x⁻¹. Note that in such a case (x⁻¹)⁻¹ = x for
all x ∈ X. In other words, the inversion function composed with itself gives
the identity function. Also note that inversion is a unary operation on the
set X.
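The non-uniqueness of left inverses for f(x) = x + 1 on N can be made concrete. The sketch below is an illustration only: make_g is a hypothetical helper, the value 42 is an arbitrary choice for g(1), and a finite range stands in for N = {1, 2, 3, ...}.

```python
def f(x):
    return x + 1    # injective on N, but 1 is not in its range

def make_g(value_at_1):
    """A left inverse of f; its value at 1 may be any natural number."""
    def g(y):
        return y - 1 if y > 1 else value_at_1
    return g

g1, g2 = make_g(1), make_g(42)
sample = range(1, 50)   # finite stand-in for N

print(all(g1(f(x)) == x for x in sample))   # g1 composed with f fixes every x
print(all(g2(f(x)) == x for x in sample))   # so does g2 composed with f
print(g1(1) == g2(1))                       # yet g1 and g2 are different functions
print(all(f(g1(y)) == y for y in sample))   # f composed with g1 moves y = 1
```

Since f is not onto, neither g1 nor g2 is a right inverse of f, which is why the last check fails.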
Unfortunately, without the associative law, the inverses do not behave the
expected way. The analogue of Proposition 4.5 does not hold, as shown
by the operation * on a set {a, b, c, d, e} whose table is drawn in Figure 3.15.
Here a is the identity. The element b has c as the unique right inverse and
d as the unique left inverse. Still it is not invertible. The element e has two
inverses, b and c.
[Table of the operation * on {a, b, c, d, e} not recoverable; surviving row: e e a a b c]
Fig. 3.15: Pathological Behaviour of Inverses.
Things are substantially improved if we have associative law as we now
show.
4.7 Proposition: Let * be an associative binary operation on a set X, with
identity e. Suppose an element a ∈ X is right invertible. Then,
(i) a can be cancelled from the right, i.e., whenever x, y ∈ X and
x * a = y * a, we have x = y.
(ii) if further, a is also left invertible, then its right and left
inverses are unique and equal. Hence a is invertible and has a
unique inverse.
Further, if a, b are any invertible elements, so is a * b and
    (a * b)⁻¹ = b⁻¹ * a⁻¹,
and for every positive integer n, aⁿ is invertible with inverse (a⁻¹)ⁿ. Finally
if a is invertible, then for every c ∈ X, the equations x * a = c and a * x = c
have unique solutions. (However, these solutions need not be equal.)
Proof: Suppose b is a right inverse of a. Then,
(i) x * a = y * a implies (x * a) * b = (y * a) * b, which by
associativity gives x * (a * b) = y * (a * b), or, x * e = y * e. But this
means x = y as desired.
(ii) Let c be a left inverse of a. By associativity,
    (c * a) * b = c * (a * b).
Now, the left hand side equals e * b, i.e. b. Similarly the right
hand side equals c. We have thus shown that any left inverse
of a and any right inverse of a are equal to each other. This
proves that there is only one inverse.
Now let a, b have inverses a⁻¹, b⁻¹ respectively. Then (a * b) * (b⁻¹ * a⁻¹)
equals, by associativity, a * [(b * b⁻¹) * a⁻¹] which reduces to e. Similarly
(b⁻¹ * a⁻¹) * (a * b) = e. Thus a * b is invertible with inverse b⁻¹ * a⁻¹. In
particular, if we take b = a then (a⁻¹)² is the inverse of a². The assertion
that aⁿ is invertible with inverse (a⁻¹)ⁿ can easily be proved by induction
on n. The case n = 1 is trivial. For n > 1, we simply write aⁿ as a * aⁿ⁻¹
(note again that associativity of * is used). Then set b = aⁿ⁻¹. Then by
induction hypothesis, b⁻¹ = (a⁻¹)ⁿ⁻¹ and so b⁻¹ * a⁻¹ = (a⁻¹)ⁿ. For the last
assertion, the equation x * a = c has x = c * a⁻¹ as a solution as seen by
direct calculation. Uniqueness of the solution follows by (i). The equation
a * x = c similarly has x = a⁻¹ * c as its unique solution. ∎
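The rule for the inverse of a product can be verified exhaustively in a small non-commutative example. The sketch below uses the six permutations of a three element set under composition; the representation of a permutation as a tuple of images and the names compose and inverse are assumptions made here for illustration.

```python
from itertools import permutations

def compose(p, q):
    """(p o q)(i) = p(q(i)); a permutation of {0, ..., n-1} is a tuple of images."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """The permutation that undoes p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

perms = list(permutations(range(3)))

# Inverse of a product versus product of inverses, in reverse and in same order.
reversed_order = all(inverse(compose(a, b)) == compose(inverse(b), inverse(a))
                     for a in perms for b in perms)
same_order = all(inverse(compose(a, b)) == compose(inverse(a), inverse(b))
                 for a in perms for b in perms)
print(reversed_order, same_order)
```

The reversed order works for every pair, while the same order fails for some pairs because composition of permutations is not commutative.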
The fact that the inverse of a product equals the product of the inverses
in the reverse order resembles our everyday experience that when we dress
up we first put on a shirt and then a coat but when we undress, we first
take off the coat and then the shirt! The resemblance will be brought out
more fully when we discuss actions of algebraic structures†. Of course, if *
is commutative (in addition to being associative), then (a * b)⁻¹ = a⁻¹ * b⁻¹.
Using the concept of an inverse, we now define the negative powers of an
invertible element a ∈ X. If n is any positive integer, we define a⁻ⁿ = (a⁻¹)ⁿ
which also equals (aⁿ)⁻¹ by the proposition above. By convention we set a⁰
to be the identity element. We then have the following ‘laws of indices’
whose proof is left as an exercise.
4.8. Proposition: Let a be an invertible element of a set X, with an
associative binary operation *, having identity e. Then for all integers m and n
we have
    (i) aᵐ⁺ⁿ = aᵐ * aⁿ
    (ii) (aᵐ)ⁿ = aᵐⁿ. ∎
The fact that an invertible element can be cancelled from the two sides
of an equation is frequently used in simplification of equations. This
property is given a name.
† See the Epilogue.
4.9 Definition: A binary operation * on a set X is said to satisfy the left
cancellation law if for all x, y, z ∈ X, x * y = x * z implies y = z. Right
cancellation law is defined similarly.
Because of (i) in Proposition (4.7), if * is associative and every element
is right invertible then it satisfies the right cancellation law. The converse is
false. The usual multiplication for the set of positive integers is associative,
has an identity and also obeys both the cancellation laws. Still, no element
except 1 is invertible in N. (As real numbers, all elements of N are
invertible; but we want the inverses to be elements of N.) Interestingly, for finite
sets the converse holds as we now prove.
4.10. Theorem: Suppose * is an associative binary operation on a finite
set X. If it satisfies both the cancellation laws, then * has an identity and
every element of X is invertible.
Proof: We first show that for any a, b ∈ X, the equation a * x = b has a
unique solution for x. Given such a, b, define a function f_a: X → X by
f_a(x) = a * x for x ∈ X. By left cancellation law, f_a is one-to-one. But since
X is finite, by Theorem (2.2.9), f_a is onto. In particular b is in the range of
f_a. This says precisely that the equation a * x = b has at least one solution.
Uniqueness of the solution follows from the left cancellation law. Similarly,
using the right cancellation law, we see that for any a, b ∈ X, the equation
x * a = b has a unique solution for x. (This solution may differ from that
of the equation a * x = b.)
Now fix any a ∈ X. Taking b = a, there exists a unique e ∈ X such
that a * e = a. We contend that e is a right identity for *. For this, suppose
c ∈ X. We have to show c * e = c. Now, by what we showed above, there
exists x ∈ X such that x * a = c. But then c * e = (x * a) * e = x * (a * e)
(by associativity) = x * a = c as was to be proved. So * has a right identity,
namely e. By similar reasoning, * has a left identity. But then, by
Proposition (4.5), * has a two sided identity, which we denote by e.
It remains to show that every element of X is invertible. Let a ∈ X. By
the existence of solutions proved above, there exist b, c ∈ X such that
a * b = e and c * a = e. Then b is a right inverse of a and c is a left inverse
of a. But * is associative. So by Proposition (4.7), a is invertible. ∎
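The conclusion of the theorem can be observed on small finite examples. The sketch below is an illustration with names chosen here (cancellative, identity_of, all_invertible); it uses the set {1, 2, 4} under multiplication modulo 7, which is closed, associative and cancellative, and contrasts it with multiplication modulo 6, where cancellation fails. In a table, the left cancellation law says that no row repeats an entry and the right cancellation law says the same for columns.

```python
def cancellative(elements, op):
    """Both cancellation laws: every row and every column has distinct entries."""
    size = len(elements)
    rows_ok = all(len({op(a, x) for x in elements}) == size for a in elements)
    cols_ok = all(len({op(x, a) for x in elements}) == size for a in elements)
    return rows_ok and cols_ok

def identity_of(elements, op):
    """Return a two-sided identity if there is one, else None."""
    for e in elements:
        if all(op(e, x) == x and op(x, e) == x for x in elements):
            return e
    return None

def all_invertible(elements, op, e):
    return all(any(op(a, b) == e and op(b, a) == e for b in elements)
               for a in elements)

X = [1, 2, 4]                      # closed under multiplication mod 7
mul7 = lambda a, b: (a * b) % 7
e = identity_of(X, mul7)
print(cancellative(X, mul7), e, all_invertible(X, mul7, e))

Z6 = list(range(6))                # multiplication mod 6 is not cancellative
mul6 = lambda a, b: (a * b) % 6
print(cancellative(Z6, mul6), identity_of(Z6, mul6), all_invertible(Z6, mul6, 1))
```

In the second example an identity exists but not every element is invertible, which is consistent with the theorem since its hypothesis fails there.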
Note that in this theorem it is not enough to have just one cancellation
law. For example, if X is any set and we define x * y = x for all
x, y ∈ X then * is associative and satisfies the right cancellation law. But
the left cancellation law does not hold and no element has a left inverse
(unless X is a singleton set). Note that in this example, * has no identity.
If we are given beforehand that * has an identity then the preceding
theorem can be proved using just one cancellation law. We leave this
as an exercise.
So far we discussed only one binary operation on a set at a time. We
frequently have situations where there are two (or more) binary operations
on the same set, and they are inter-related in some way. We define the most
important inter-relationship of this type.
4.11. Definition: Suppose · and * are two binary operations on a set X.
Then · is said to be left distributive over * if for all x, y, z ∈ X,
    x·(y * z) = (x·y) * (x·z).
Right distributivity and two-sided distributivity (or simply distributivity) are
defined analogously. [If · is commutative either distributivity implies the
other. But commutativity of * has no role to play.]
As a foremost example, the usual multiplication for real numbers is
distributive over the usual addition. Here, the addition is not distributive
over multiplication. However, sometimes we have two operations, each of
which is distributive over the other. For example, in Example (1) above,
∪ and ∩ are distributive over each other. As an example where only
one-sided distributivity holds, let X be any set and * be any binary operation
on X. Define another binary operation · on X by x·y = y for all
x, y ∈ X. Then for all x, y, z ∈ X, both the expressions x·(y * z) and
(x·y) * (x·z) equal y * z and hence · is left distributive over *. But in general
the equality (y * z)·x = (y·x) * (z·x) does not hold unless the operation *
is such that x * x = x for all x ∈ X. This example also shows how an
inter-relationship (in this case right distributivity) between two operations
imposes certain restrictions on one (or both) of them. We shall see more
instances of this in later chapters. For the moment we prove one result of this
type.
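The three situations described above can be checked on small samples. The sketch below is an illustration under assumptions made here: the helper names left_distributive and right_distributive are hypothetical, the power set of {1, 2, 3} stands for Example (1), a few integers stand in for R, and usual addition is taken as the arbitrary operation * in the one-sided example.

```python
from itertools import combinations, product

def left_distributive(dot, star, elements):
    """x . (y * z) == (x . y) * (x . z) for all x, y, z in elements."""
    return all(dot(x, star(y, z)) == star(dot(x, y), dot(x, z))
               for x, y, z in product(elements, repeat=3))

def right_distributive(dot, star, elements):
    """(y * z) . x == (y . x) * (z . x) for all x, y, z in elements."""
    return all(dot(star(y, z), x) == star(dot(y, x), dot(z, x))
               for x, y, z in product(elements, repeat=3))

S = [1, 2, 3]
subsets = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]
union = lambda a, b: a | b
meet = lambda a, b: a & b

ints = range(-3, 4)
add = lambda a, b: a + b
mul = lambda a, b: a * b
second = lambda x, y: y    # the operation x . y = y

print(left_distributive(union, meet, subsets), left_distributive(meet, union, subsets))
print(left_distributive(mul, add, ints), left_distributive(add, mul, ints))
print(left_distributive(second, add, ints), right_distributive(second, add, ints))
```

Each line prints a pair of outcomes: unions and intersections distribute over each other, multiplication distributes over addition but not conversely, and x·y = y is left but not right distributive over addition.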
4.12 Proposition: Let · and * be two binary operations on a set X with
· distributive over *. Suppose · has an identity. Assume * is associative
and satisfies both the cancellation laws. Then * is commutative.
Proof: Let 1 be the identity for ·. Let a, b ∈ X. We have to show
a * b = b * a. Let c = 1 * 1. Then,
    c·(a * b) = (c·a) * (c·b)   (by left-distributivity of · over *)
              = [(1 * 1)·a] * [(1 * 1)·b]
              = [(1·a) * (1·a)] * [(1·b) * (1·b)]   (by right distributivity)
              = (a * a) * (b * b).
But on the other hand
    c·(a * b) = (1 * 1)·(a * b)
              = [1·(a * b)] * [1·(a * b)]   (by distributivity again)
              = (a * b) * (a * b).
Hence
    (a * a) * (b * b) = (a * b) * (a * b).
Since * is associative we can remove the parentheses. Also because of
cancellation laws, we can cancel a from the left and b from the right. This
gives a * b = b * a as desired. ∎
By now we have discussed a fair number of general concepts about
binary operations. We close this section with a brief discussion
of abstract algebraic structures. The reader may possibly find it a little
intricate. In subsequent chapters we shall study many ‘concrete' algebraic
structures. Each such ‘concrete' structure will itself be abstract in the
sense that it will be defined on an abstract set and it is only after specify-
ing a particular choice of the underlying set that a concrete example of it
will be obtained. What we now propose to do is, therefore, an abstraction
of the second order. A beginner may find it useful to return to it again
after having seen several particular examples of algebraic structures.
4.13 Definition: An algebraic structure is an ordered triple of the form
(X, 𝒮, 𝒞) where X is a set, 𝒮 is a finite sequence of operations on X, say,
    𝒮 = (*₁, *₂, ..., *ₖ),
where k is some positive integer and for each i = 1, 2, ..., k, *ᵢ is an nᵢ-ary
operation on X (n₁, n₂, ..., nₖ being positive integers) and 𝒞 is a set of
conditions, each involving one (or more) of these operations on X. The
set X is called the underlying set of the algebraic structure.
For example, a monoid is a triple of the form (X, (*), 𝒞) where X is a set,
* is a binary operation on X and the set 𝒞 consists of two conditions, (i) * be
associative and (ii) * have a two-sided identity. There are, of course,
many other types of algebraic structures such as Boolean algebras, groups,
rings, fields etc. We classify them according to the following rule.
4.14 Definition: Two algebraic structures (X, 𝒮, 𝒞) and (X', 𝒮', 𝒞') are
said to be of the same type (or of the same category) if there exists a
bijection between 𝒮 and 𝒮' and a bijection between 𝒞 and 𝒞' and these two
bijections are compatible with each other.
Note that we are not requiring that there be a bijection between X and
X'. The definition simply means that if 𝒮 = (*₁, *₂, ..., *ₖ), with *ᵢ an
nᵢ-ary operation on X, then we can express 𝒮' as (*₁', *₂', ..., *ₖ') with *ᵢ' an
nᵢ-ary operation on X' and for every condition in 𝒞 if we replace each *ᵢ
occurring in it by *ᵢ', we get a condition in 𝒞' and vice versa. For example,
any two monoids, say, (X, (*), 𝒞) and (X', (*'), 𝒞') are algebraic structures
of the same type. On the other hand a monoid cannot be of the same
type as an algebraic structure in which two binary operations are involved,
or one in which only one binary operation is involved but the conditions
involving it are different.
While dealing with algebraic structures of the same type, we may omit
the sets of conditions, 𝒞 and 𝒞', from notation.
Obviously, at any one time we would be proving theorems about
algebraic structures of the same category. In general, there is very little that
would apply to all possible algebraic structures. Nevertheless a few
basic concepts can be defined in such generality.
4.14 Definition: Let (X, 𝒮, 𝒞) be an algebraic structure. Then an
algebraic structure (Y, 𝒮', 𝒞') is called a substructure of (X, 𝒮, 𝒞) if the
following conditions hold:
(i) Y is a subset of X and it is closed under each of the operations
in 𝒮,
(ii) 𝒮' is the sequence of the operations on Y induced by the operations
in 𝒮 and
(iii) 𝒞' consists of the conditions in 𝒞, with each binary operation in
𝒮 replaced by the corresponding induced operation on Y.
In this context the structure (X, 𝒮, 𝒞) is called the ambient structure.
Clearly, a substructure of an algebraic structure is of the same type as
the original structure. For example, a submonoid of a monoid is itself a
monoid.
A homomorphism from one algebraic structure to another of the same
type is a function between the underlying sets which preserves, or is
compatible with, the operations. A formal definition is as follows:
4.15 Definition: Let (X, (*₁, *₂, ..., *ₖ), 𝒞) and (X', (*₁', *₂', ..., *ₖ'), 𝒞')
be algebraic structures of the same type. Then a function f: X → X' is called
a homomorphism, if for every i = 1, ..., k and for all x₁, x₂, ..., x_{nᵢ} ∈ X
(where nᵢ is the integer such that both *ᵢ and *ᵢ' are nᵢ-ary operations on
X and X' respectively), we have
    f(*ᵢ(x₁, x₂, ..., x_{nᵢ})) = *ᵢ'(f(x₁), f(x₂), ..., f(x_{nᵢ})).
For example, if $(X, (*), \mathcal{C})$ and $(X', (*'), \mathcal{C}')$ are monoids then a function
$f: X \to X'$ will be called a homomorphism if for all $x, y \in X$,
$$f(x * y) = f(x) *' f(y).$$
In informal terms, the image of a product is the product of the images, or
in other words, whether you first multiply and then apply f or whether you
first apply f and then multiply, you get the same result.
Homomorphism between two algebraic structures is a fundamental
concept in the study of algebraic structures of a given type. Note that we
are not requiring the function f to be a bijection. If it is so, then we call f
an isomorphism, and say that the structures $(X, \mathcal{S}, \mathcal{C})$ and $(X', \mathcal{S}', \mathcal{C}')$ are
isomorphic to each other. Isomorphic structures may be considered as
replicas of each other. From an abstract point of view they are
indistinguishable from each other. In Section 1, we compared two different structures
on the same underlying set to two earthenwares of different types formed
from the same mound of clay. As a similar analogy, we may regard
isomorphic structures as two flasks of exactly the same shape and size but made
of possibly different metals.
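The homomorphism condition can be tried out numerically. The following is a minimal sketch, not part of the original text: the helper `is_homomorphism` and the two sample functions are illustrative names, and checking a finite sample only gives evidence for (or a counterexample to) the condition; it is not a proof.

```python
from itertools import product

def is_homomorphism(f, op_x, op_y, sample):
    """Check f(x * y) == f(x) *' f(y) for every pair drawn from `sample`."""
    return all(f(op_x(x, y)) == op_y(f(x), f(y))
               for x, y in product(sample, repeat=2))

add = lambda a, b: a + b   # operation of the monoid (N, +)
mul = lambda a, b: a * b   # operation of the monoid (N, .)

sample = range(0, 8)
power_of_two = lambda n: 2 ** n   # candidate map from (N, +) to (N, .)
successor = lambda n: n + 1       # candidate map from (N, +) to (N, +)

print(is_homomorphism(power_of_two, add, mul, sample))
print(is_homomorphism(successor, add, add, sample))
```

On this sample the first map passes, since $2^{x+y} = 2^x \cdot 2^y$, while the second fails because $(x + y) + 1 \neq (x + 1) + (y + 1)$.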
Suppose $(X, \mathcal{S}, \mathcal{C})$ is an algebraic structure. Let R be an equivalence
relation on the underlying set X. Let X/R be the quotient set (see Definition
(2.12)). We want to make X/R into an algebraic structure of the same type
as the original structure. To do this, we have to define corresponding
operations for the set of equivalence classes and the usual question of their
being well-defined will arise. The following definition is tailored to meet
precisely this difficulty.
4.16. Definition: An equivalence relation R on the underlying set X of an
algebraic structure $(X, (*_1, \ldots, *_k), \mathcal{C})$ is called a congruence relation if for
all $i = 1, 2, \ldots, k$, and for all $x_1, \ldots, x_{n_i}, y_1, \ldots, y_{n_i} \in X$, $x_j R y_j$ for all
$j = 1, 2, \ldots, n_i$ implies
$$[*_i(x_1, \ldots, x_{n_i})] \; R \; [*_i(y_1, \ldots, y_{n_i})].$$
For example, let $X = \mathbb{Z}$ and let $*_1 = +$, the usual addition, and $*_2 = \cdot$,
the usual multiplication. Then for any positive integer n, the equivalence
relation of congruency modulo n is a congruence relation (see Example (7)
above). Hence the name. The following basic theorem is now easily proved.
4.17. Theorem: Let R be a congruence relation on the underlying set of
an algebraic structure $(X, \mathcal{S}, \mathcal{C})$. Then the quotient set X/R can be made
into an algebraic structure in such a way that the quotient function
$p: X \to X/R$ is a homomorphism. (The resulting algebraic structure is called
a quotient structure of the original structure.)
Proof: Suppose $\mathcal{S} = (*_1, *_2, \ldots, *_k)$, where $*_i$ is an $n_i$-ary operation on X.
Let $[x_1], [x_2], \ldots, [x_{n_i}] \in X/R$. Define
$$*_i'([x_1], \ldots, [x_{n_i}]) = [*_i(x_1, \ldots, x_{n_i})].$$
This is well-defined since R is a congruence relation. This gives the algebraic
structure on X/R. Also, since $p(x) = [x]$ for all $x \in X$, it is clear that p is
a homomorphism. ∎
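The role of the congruence condition can be illustrated on the integers. The sketch below is an illustration under stated assumptions rather than anything from the original text: the names `is_congruence`, `mod_n`, `same_abs` and `quotient_op` are hypothetical, the check runs over a finite window of integers only, and each class $[x]$ modulo n is represented by `x % n`.

```python
from itertools import product

def is_congruence(related, op, sample):
    """Check on `sample` that x R x2 and y R y2 imply op(x, y) R op(x2, y2)."""
    for x, x2, y, y2 in product(sample, repeat=4):
        if related(x, x2) and related(y, y2) and not related(op(x, y), op(x2, y2)):
            return False
    return True

n = 5
sample = range(-6, 7)
add = lambda a, b: a + b
mul = lambda a, b: a * b

mod_n = lambda a, b: (a - b) % n == 0     # congruency modulo n
same_abs = lambda a, b: abs(a) == abs(b)  # an equivalence relation, but see below

def quotient_op(op, n):
    """Induced operation on Z/nZ, each class [x] being represented by x % n."""
    return lambda a, b: op(a, b) % n

p = lambda x: x % n          # the quotient function x -> [x]
add_mod = quotient_op(add, n)

print(is_congruence(mod_n, add, sample), is_congruence(mod_n, mul, sample))
print(is_congruence(same_abs, add, sample))
print(all(p(x + y) == add_mod(p(x), p(y)) for x, y in product(sample, repeat=2)))
```

The relation "same absolute value" is an equivalence relation on the integers but not a congruence for addition (1 is related to -1, yet 1 + 1 = 2 is not related to -1 + 1 = 0), so the recipe in the proof would not give a well-defined sum of classes for it.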
If we want to apply this theorem, we must have a way of finding
congruence relations on the underlying set. It turns out that if the structure
$(X, \mathcal{S}, \mathcal{C})$ is sufficiently strong (i.e. the conditions in $\mathcal{C}$ are sufficiently
powerful) then certain substructures of $(X, \mathcal{S}, \mathcal{C})$ induce congruence relations on
X. We shall see this in particular cases in the chapters to come.
A comment about the notations used for particular types of algebraic
structures is in order. First, when we are dealing with structures of the
same type, it is customary to suppress $\mathcal{C}$ from notation and to list the
conditions in $\mathcal{C}$ as the postulates or axioms of that particular type of structures.
Secondly, the structures we shall deal with will have only a few operations
(often one or two and never more than three). Consequently, it is
unnecessarily fussy to denote them by a sequence. For example, the reader will
often find a monoid defined simply as an ordered pair $(X, *)$ where X is a
set and * is an associative binary operation on X having an identity. Also
when the operation * is understood, we simply say X is a monoid. Another
common simplification of language is that attributes of the underlying set
are assigned, by a transfer of epithet, to the algebraic structure in question.
(This is also a common practice for other mathematical structures.) Thus
when we say that a monoid is finite, it simply means that its underlying set
is finite.
Exercises
4.1 How many binary operations are there on a set with n elements?
How many of them are commutative? How many have an identity?
4.2 Let * be a binary operation on a set X. Fix $a \in X$. Define $f_a: X \to X$
by $f_a(x) = a * x$ for $x \in X$. This function $f_a$ is called the left
translation by a. Similarly we define the right translation by a. (These
functions were already used in the proof of Theorem (4.10).) Prove
that a is a left identity for * if and only if the function $f_a$ is the
identity function on X. (This is probably another justification for the
name.)
4.3 It is sometimes helpful to paraphrase the various conditions about
binary operations in terms of commutative diagrams. Prove, for
example, that a binary operation * on a set X is commutative if
and only if the following triangle is commutative, where the function
$\sigma: X \times X \to X \times X$ is defined by $\sigma(x, y) = (y, x)$ for all $x, y \in X$.
[Diagram: a triangle with $\sigma: X \times X \to X \times X$ along the top and the
operation * leading from each copy of $X \times X$ down to X.]
Similarly characterise associativity, distributivity and presence of
identities.
4.4 Prove that the left cancellation law for a binary operation * on a set
X is equivalent to the assertion that for the function
$g: X \times X \times X \to X \times X$
defined by
$g(a, x, y) = (a * x, a * y)$ for $a, x, y \in X$, we have $g^{-1}(\Delta_X) = X \times \Delta_X$,
where $\Delta_X$ is the diagonal on X (see Section 2).
4.5 In the examples (1) to (8), find which operations have identities and
which elements are invertible.
4.6 Prove Proposition (4.8).
4.7 A lattice is called distributive if the binary operations $\wedge$ and $\vee$ are
distributive over each other. Prove, in fact, that for any lattice
distributivity of $\wedge$ over $\vee$ implies that of $\vee$ over $\wedge$ and vice versa.
(Hint: Use absorption laws, namely
$a \wedge (a \vee b) = a$ and $a \vee (a \wedge b) = a$
for all a, b in a lattice, besides other properties of $\wedge$ and $\vee$.)
4.8 Prove that the power set lattice, the lattice of all positive integers
with the partial order defined by divisibility, the lattice of state-
ments with partial order defined by implication and the lattice in
Exercise (3.20) are distributive.
4.9 Prove that the lattice of all partitions of a set, with the partial
order defined by refinement is not distributive. (Hint: Let the set
be the cartesian plane. Consider three partitions of it whose
members are, respectively, all horizontal lines, all vertical lines and all
lines of slope 1. More generally, any lattice which contains 3 elements
every two of which are non-comparable and every two of which have
the same meet and the same join, is non-distributive.)
4.10 Prove that the composition of two homomorphisms is a
homomorphism and the composition of two isomorphisms is an isomorphism.
Prove also that the inverse of an isomorphism is an isomorphism.
4.11 Prove that $(\mathbb{R}, +)$ and $(\mathbb{R}, \cdot)$ are both monoids. Prove also that
the function $f: \mathbb{R} \to \mathbb{R}$ defined by $f(x) = e^x$ is a monoid
homomorphism. Is it an isomorphism?
4.12 Given two algebraic structures with the same underlying set, say
$(X, \mathcal{S}, \mathcal{C})$ and $(X, \mathcal{S}', \mathcal{C}')$, we say $(\mathcal{S}', \mathcal{C}')$ is a stronger structure
than $(\mathcal{S}, \mathcal{C})$ if (i) $\mathcal{S}$ is a subsequence of $\mathcal{S}'$ and (ii) every condition
in $\mathcal{C}$ follows from $\mathcal{C}'$. In other words, a weaker structure has fewer
and less restrictive algebraic operations. Thus a monoid is a weaker
structure than the one in which the binary operation obeys, besides
associativity and presence of identities, some other laws such as
cancellation laws. Prove that strictly speaking, the structure on the
quotient set X/R, defined in Theorem (4.17), may be of a weaker
type than the original structure on X. For example, the cancellation
law may hold for X but not for X/R.
4.13 We often have situations where the underlying set of an algebraic
structure also carries some other structure. In such cases the
question of compatibility of the two structures is very important.
For example, let $(X, *)$ be a monoid and $\leq$ a partial order on X.
Then $\leq$ is said to be compatible with (or invariant under) * if for
all $a, b, c \in X$, $a \leq b$ implies $a * c \leq b * c$ and $c * a \leq c * b$. Prove
that if $\leq$ is compatible with * then for all $a, b, c, d \in X$, $a \leq b$
and $c \leq d$ implies $a * c \leq b * d$. Prove that the usual order on $\mathbb{R}$
is compatible with the usual addition but not with the usual multi-
plication on R.
4.14 Let $(X, *)$ be a monoid and d a metric on X. Then d is called
translation invariant if for all $a, b, c \in X$,
$d(a * c, b * c) = d(a, b) = d(c * a, c * b)$.
Prove that the usual metrics on $\mathbb{R}$ and on the euclidean plane $\mathbb{R}^2$
are translation invariant if the binary operations are the usual
addition. Prove that the metric d on $\mathbb{R}$ defined by $d(x, y) = |x^3 - y^3|$
is not translation invariant. If d is a translation invariant metric
on $\mathbb{R}$ (or on $\mathbb{R}^2$) then the open (or closed) balls of the same radius
but with different centres are of the same 'size'.
4.15 Let * be a binary operation on a set X. We then define a binary
operation on the power set P(X), which may be denoted by $\circledast$ (or
by * itself). Given $A, B \subset X$, we let $A \circledast B$ be the set
$\{x * y : x \in A, y \in B\}$.
Study which properties pass over from * to $\circledast$.
4.16 Using modulo 3 arithmetic, prove the ‘rule of three‘ which says
that a positive integer is divisible by 3 if and only if the sum of its
digits, written in decimal expansion, is divisible by 3. (Hint: Note
that $10 \equiv 1 \pmod 3$. Hence prove that all powers of 10 are
congruent to 1 modulo 3.)
4.17 Obtain similar criteria for divisibility of a positive integer by 9
and by 11.
4.18 According to a Western superstition 13 is an evil figure and when
the 13th of a month falls on a Friday, such ‘Friday the 13th‘ is
considered an especially evil day. Using modulo 7 arithmetic,
prove that in any calendar year there is at least one and at most
three Friday the thirteenths.
4.19 If three distinct integers are picked from 1 to 100, find the
probability that their sum is divisible by 3. (Hint: First see, in $\mathbb{Z}_3$, in
how many ways the residue class [0] can be expressed as a sum of
three elements.)
4.20 A seminar consists of 60 lectures, spread over a period of five
weeks. There is to be at least one lecture per day. Prove that no
matter how the lectures are scheduled, there will be a block of
consecutive days in which exactly 13 lectures will be given. Prove
also that this can be avoided if the seminar is to run only for
34 days.
4.21 Using residue classes modulo 8, prove that no integer of the form
8n + 7 (n an integer) can be expressed as a sum of three perfect
squares.
4.22 If * is an associative binary operation on a set X and $a_1, \ldots, a_n \in X$
then the expression $a_1 * a_2 * \cdots * a_n$ has only one meaning. Prove
that in absence of associativity it could have
$$\frac{1}{n}\binom{2n-2}{n-1}$$
possible meanings. [Hint: Each such meaning is given completely by putting
one pair of brackets around the result every time the operation *
is carried out. For example, the two meanings of $a_1 * a_2 * a_3$ are
$[[a_1 * a_2] * a_3]$ and $[a_1 * [a_2 * a_3]]$. Now ignore the left brackets and
the *'s. Change all right brackets to right parentheses. Also ignore
$a_1$ and change $a_2, a_3, \ldots, a_n$ to left parentheses. Thus $[[a_1 * a_2] * a_3]$
changes to ( )( ) while $[a_1 * [a_2 * a_3]]$ changes to (( )). The trick is
to recover from these the original interpretations. This can be
done by changing the left parentheses to $a_2, \ldots, a_n$, the right
parentheses to right brackets and then noting that as we scan from
left to right, each right bracket indicates that * has been performed
on the two elements of X immediately preceding it (one or both of
these elements may be the results of earlier applications of *). Show
that this gives a bijection between the set of all possible meanings
of $a_1 * a_2 * \cdots * a_n$ and the set of all balanced arrangements of $n - 1$
pairs of parentheses. The latter was counted in the solution to the
Vendor Problem in Chapter 2, Section 3.]
4.23 Given a binary operation * on a set X, an element $x \in X$ is called
idempotent if $x * x = x$. Prove that every right and every left identity
is an idempotent. If X is finite and * is associative, prove that
X contains at least one idempotent. (Hint: Consider a sequence of
suitable powers of any $x \in X$.)
4.24 In this exercise you may assume that every positive integer has a
unique factorisation into prime powers. This fact will be formally
proved in Chapter 6, Section 2.
Let S be the set of all functions from $\mathbb{N}$ to $\mathbb{R}$. Let $\delta$, c, i and $\mu$ be
the functions defined by
$$\delta(n) = \begin{cases} 1 & \text{if } n = 1 \\ 0 & \text{if } n > 1 \end{cases}$$
$$c(n) = 1 \text{ for all } n$$
$$i(n) = n \text{ for all } n$$
$$\mu(n) = \begin{cases} 1 & \text{for } n = 1 \\ 0 & \text{if } p^2 \mid n \text{ for some prime } p \\ (-1)^r & \text{if } n \text{ is the product of } r \text{ distinct primes} \end{cases}$$
For $f, g \in S$, define their convolute $f * g$ to be the function whose
value at $n \in \mathbb{N}$ is
$$\sum_{k \mid n} f(k)\, g\!\left(\frac{n}{k}\right)$$
where the sum ranges over all
positive integers k dividing n (including 1 and n). Prove that:
(i) the functions $\delta$, c, i and $\mu$ are all multiplicative (see Exercise
(2.4.15) for definition).
(ii) convolution defines a commutative and an associative binary
operation on S with $\delta$ as the identity element.
(iii) if f, g are multiplicative, so is $f * g$. (Hint: If m, n are relatively
prime and $k \mid mn$ then k can be uniquely expressed as uv where
$u \mid m$ and $v \mid n$.)
(iv) $f \in S$ is invertible (w.r.t. *) iff $f(1) \neq 0$. (Hint: In the converse
implication, define the value of the inverse function at n by
induction on n, assuming it is already defined on all proper
divisors of n.)
(v) the functions c and $\mu$ are inverses of each other.
(vi) the function $c * c$ gives the number of divisors of a positive
integer.
(vii) the function $i * c$ gives the sum of divisors of a positive
integer. ($i * c$ is often denoted by $\sigma$.)
(viii) if $\phi$ is the Euler function in Exercise (2.4.15) then $\phi * c = i$.
(Hint: Classify the integers in $\{1, \ldots, n\}$ according to their
greatest common divisors with n. If $k \mid n$, note that there are
precisely $\phi(n/k)$ integers whose g.c.d. with n is k.)
(ix) $\phi = i * \mu$.
(x) $\phi$ is a multiplicative function. (Another proof was given in
Exercise (2.4.15).)
[The function $\mu$ is called the Möbius function. For $f \in S$, $f * c$ and
$f * \mu$ are called, respectively, the Möbius transform and the inverse
Möbius transform of f. The terms 'transform', 'convolution' as
well as the peculiar notation $\delta$ come from analogous concepts in
continuous mathematics for functions from $\mathbb{R}$ to $\mathbb{R}$.]
4.25 The concept of a Möbius function can be extended to certain posets.
Let $\leq$ be a partial order on a set X. Then $\leq$ is a subset of $X \times X$
and we let S be the set of all functions from $\leq$ to $\mathbb{R}$. For
$(x, y) \in\ \leq$ (i.e. for $x, y \in X$ with $x \leq y$) we assume there are
only finitely many $z \in X$ with $x \leq z$ and $z \leq y$. (A poset satisfying
this property is said to be locally finite. Clearly every finite poset is
locally finite.) For $f, g \in S$, define their convolute
$f * g: \ \leq\ \to \mathbb{R}$
by
$$(f * g)(x, y) = \sum_{x \leq z \leq y} f(x, z)\, g(z, y).$$
Show that * is associative (although not necessarily commutative)
and has an identity. Prove that $f \in S$ is invertible w.r.t. * iff
$f(x, x) \neq 0$ for every $x \in X$. Define $\mu: \ \leq\ \to \mathbb{R}$ to be the inverse of
the constant function with value 1. (In case $X = \mathbb{N}$ and $\leq$ means
divisibility, prove that $\mu(x, y)$ is the same as $\mu(y/x)$ where $\mu$ is the
Möbius function of the last exercise.) Deduce that if
$$g(x, y) = \sum_{x \leq z \leq y} f(x, z)$$
for all $x \leq y$ then
$$f(x, y) = \sum_{x \leq z \leq y} g(x, z)\, \mu(z, y).$$
(This is called the Möbius inversion formula.)
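Several of the identities in Exercise (4.24) can be checked numerically before one attempts to prove them. The following is a rough sketch under stated assumptions: the function names are illustrative, `mu` is computed by plain trial division, `phi` by direct counting, and agreement on a finite range is only a sanity check, not a proof.

```python
from math import gcd

def divisors(n):
    """All positive divisors of n, including 1 and n."""
    return [k for k in range(1, n + 1) if n % k == 0]

def convolve(f, g):
    """Convolute f * g: its value at n is the sum of f(k) g(n/k) over k dividing n."""
    return lambda n: sum(f(k) * g(n // k) for k in divisors(n))

delta = lambda n: 1 if n == 1 else 0
c = lambda n: 1
i = lambda n: n

def mu(n):
    """Mobius function, computed by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p squared divides the original n
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

# Euler function by direct counting
phi = lambda n: sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

for n in range(1, 61):
    assert convolve(c, mu)(n) == delta(n)          # part (v)
    assert convolve(c, c)(n) == len(divisors(n))   # part (vi)
    assert convolve(i, c)(n) == sum(divisors(n))   # part (vii)
    assert convolve(phi, c)(n) == n                # part (viii)
    assert convolve(i, mu)(n) == phi(n)            # part (ix)
```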
Notes and Guide to Literature
The material in this section is basic and we shall elaborate more on it
in the chapters to come. For more on the residue arithmetic and its use
in computer science see Tremblay and Manohar [1].
Regarding Exercise (4.21), it can be shown that every positive integer
can be expressed as a sum of four perfect squares. This famous theorem is
called Lagrange's four square theorem. For a proof, see Herstein [1].
For more on Möbius transforms see Hua [1]. From Exercise (4.25), a
number of interesting results can be derived, by choosing the poset
appropriately. See Bender and Goldman [1].
Four
Boolean Algebras
Having discussed the generalities about algebraic structures in the
fourth section of the last chapter, we now proceed to study particular
algebraic structures. The logical starting point would, perhaps, be an algebraic
structure of a simple type, having only one binary operation and with as
few axioms as possible (keeping in mind the general remarks made in
Section l of the last chapter regarding depth versus generality). However,
we prefer to start with a relatively rich structure, called a Boolean algebra.
The reason for this is twofold. On one hand we already have with us
certain particular Boolean algebras (although we have not called them so).
Secondly, among all possible ‘abstract’ algebraic structures (other than
those modelled after the real number system), the Boolean algebras have
the most down-to-earth applications. Hopefully, this will convince the
reader of the need for abstraction.
In Section 1 we define Boolean algebras and study some of their pro-
perties. Section 2 is devoted to the study of Boolean functions. In the third
and the fourth sections we consider applications to the electric circuitry
and to logic respectively.
1. Definition and Properties
In practice we often come across things which have two natural states
and they exist, at any one time, in one and only one of them. For example,
a statement is either true or false, an electric switch is either closed or open,
a car is either at rest or in motion and so on. In some cases, depending upon
the needs of the problem, we may divide these states into further categories.
For example, we may classify moving cars according to their speeds. But
in many problems, this may be irrelevant. This is especially true of electri-
cal circuits. While in some problems, the magnitude of the current flowing
in a circuit may be crucial, in some other types of problems, the only thing
that matters is whether current is flowing in it or not. In case of statements,
we have already agreed not to consider the various degrees of truth but to
recognise only two categories, ‘true’ or ‘false’.
Devices which occur in two states are called binary or two-state devices.
These two states are called opposite or complementary states. We
arbitrarily assign the symbol 1 to one of these states and the symbol 0 to the
other. Two devices which are always in the opposite states are said to be
complementary to each other. We also define the conjunction of two binary
devices as another binary device which is in the state 1 iff both of them
are in state 1. The concept of disjunction of two binary devices is defined
similarly, as a device which is in state 1 iff at least one of them is in state 1.
The interpretation of these three concepts, namely, complement, conjunction
and disjunction would of course change depending upon the nature of the
devices.
The structure of a Boolean algebra is meant to isolate these three basic
concepts about two state devices. The actual definition is as follows:
1.1 Definition: A Boolean algebra is an ordered quadruple of the form
$(X, +, \cdot, ')$ where X is a non-empty set, $+$ and $\cdot$ are two binary operations
on X and $'$ is a unary operation on X (i.e., $'$ is a function from X to X and
if $x \in X$, we denote $'(x)$ by $x'$) satisfying the following conditions:
(B1) both $+$ and $\cdot$ are commutative,
(B2) both $+$ and $\cdot$ have identities, denoted by 0 and 1 respectively,
(B3) both $+$ and $\cdot$ are distributive over each other, and
(B4) for all $x \in X$, $x + x' = 1$ and $x \cdot x' = 0$.
If $x \in X$, then $x'$ is called the complement of x. If $x, y \in X$ then $x + y$
and $x \cdot y$ are sometimes called the disjunction and the conjunction of
x and y, respectively.
A foremost example of a Boolean algebra is provided by taking X as
the power set of some set, say S, defining $+$ as $\cup$, $\cdot$ as $\cap$ and $'$ as
complementation w.r.t. the set S. Then the empty set $\emptyset$ is the identity for $+$
while the whole set S is the identity for $\cdot$. For this reason, $\emptyset$ and S are
often denoted by 0 and 1 respectively. We must, of course, verify that the
four axioms (B1) to (B4) are satisfied. This is precisely the content of
Proposition (2.1.1), parts (i) to (iv). The reader may wonder which is the
two state system associated with this Boolean algebra. The answer is that
for every fixed element x of S, every subset of S is a two state system,
because for any such subset, say A of S, there are only two possibilities,
either $x \in A$ or $x \notin A$ (in which case $x \in A'$, the complement of A).
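For a small set S the four axioms can also be checked exhaustively by machine. The sketch below is only an illustration: the choice S = {1, 2, 3} and the names `join`, `meet`, `comp` and `satisfies_axioms` are assumptions made here, and such a check for one particular S does not replace the general proof cited above.

```python
from itertools import combinations, product

S = frozenset({1, 2, 3})
# X is the power set of S
X = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

join = lambda a, b: a | b   # plays the role of +
meet = lambda a, b: a & b   # plays the role of .
comp = lambda a: S - a      # plays the role of '
zero, one = frozenset(), S

def satisfies_axioms():
    """Return the truth values of (B1), (B2), (B3), (B4) for the power set of S."""
    b1 = all(join(a, b) == join(b, a) and meet(a, b) == meet(b, a)
             for a, b in product(X, repeat=2))
    b2 = all(join(a, zero) == a and meet(a, one) == a for a in X)
    b3 = all(meet(a, join(b, c)) == join(meet(a, b), meet(a, c))
             and join(a, meet(b, c)) == meet(join(a, b), join(a, c))
             for a, b, c in product(X, repeat=3))
    b4 = all(join(a, comp(a)) == one and meet(a, comp(a)) == zero for a in X)
    return b1, b2, b3, b4

print(satisfies_axioms())
```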
If we take $S = \emptyset$, then the Boolean algebra P(S) consists of only one
element. This is a trivial Boolean algebra. In this Boolean algebra, the
elements 0 and 1 coincide. As we shall see below, the elements 0 and 1 are
always distinct, provided the Boolean algebra has at least two elements.
The simplest non-trivial Boolean algebra has 0 and 1 as its only elements.
An example of such a Boolean algebra is obtained by taking the power set
of a singleton set. This Boolean algebra is very important for certain
purposes and is often denoted by $\mathbb{Z}_2$. (Recall that the same symbol is used for
the set of residue classes modulo 2. The underlying set is the same for both,
namely, $\{0, 1\}$. The operation of $\cdot$ also coincides with the modulo 2
multiplication. But the operation of $+$, as Boolean algebra, does not coincide
with that of the modulo 2 addition, because $1 + 1 = 1$ in a Boolean algebra,
as we shall soon prove, but in the modulo 2 addition, $1 + 1 = 0$. This
double usage of the same symbol is somewhat unfortunate. But the context
will always make it clear which algebraic structure is meant.)
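The difference between the two structures on {0, 1} can be tabulated directly. This is a small sketch with illustrative names, representing the Boolean operations on {0, 1} by Python's bitwise `|` and `&`, which is an encoding chosen here and not something the text prescribes.

```python
bool_add = lambda a, b: a | b        # + of the two-element Boolean algebra
bool_mul = lambda a, b: a & b        # . of the two-element Boolean algebra
mod2_add = lambda a, b: (a + b) % 2  # addition of residue classes modulo 2
mod2_mul = lambda a, b: (a * b) % 2  # multiplication of residue classes modulo 2

pairs = [(a, b) for a in (0, 1) for b in (0, 1)]

# The two multiplications agree everywhere
same_mul = all(bool_mul(a, b) == mod2_mul(a, b) for a, b in pairs)

# The two additions differ, and only at (1, 1)
differing_sums = [(a, b) for a, b in pairs if bool_add(a, b) != mod2_add(a, b)]

print(same_mul, differing_sums)
```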
Other examples of Boolean algebras will be given later on. Following
the general constructions given in Section 4 of the last chapter, we can
generate new Boolean algebras from old ones. For example, we can take the
cartesian product of two (or more) Boolean algebras, and make it into a
Boolean algebra by defining $+$, $\cdot$ and $'$ coordinatewise. Similarly, given a
Boolean algebra $(B, +, \cdot, ')$ and any set S, the set of all functions from S
to B is a Boolean algebra under pointwise operations. A third method is to
take subalgebras of a Boolean algebra. A non-empty subset Y of a Boolean
algebra $(X, +, \cdot, ')$ is called a subalgebra if it is closed under the operations
$+$, $\cdot$, $'$. We prove that Y itself is a Boolean algebra.
1.2. Proposition: Let Y be a subalgebra of a Boolean algebra $(X, +, \cdot, ')$.
Then Y, with the induced operations (which we continue to denote by the
same symbols $+$, $\cdot$ and $'$) is a Boolean algebra with the same identity
elements as X.
Proof: We first show that 0 and 1 are in Y. Since Y is non-empty, there
exists some $x \in Y$. But then $x' \in Y$, since Y is closed under
complementation. Since $x, x' \in Y$ and Y is closed under $+$ and $\cdot$, we get $x + x' \in Y$
and $x \cdot x' \in Y$. Thus both 0 and 1 are in Y. Now all the axioms of a Boolean
algebra are satisfied for Y. These follow from the corresponding properties
for X. ∎
As an instructive example of a subalgebra, let X be the power set Boolean
algebra of a set S. Let $\mathcal{D}$ be a decomposition of S whose members are, say,
$S_1, S_2, \ldots, S_k$. Let $Y = \{A \subset S : \text{for each } i = 1, 2, \ldots, k, \text{ either } S_i \subset A \text{ or }
S_i \cap A = \emptyset\}$. In other words, Y consists of those subsets of S which
completely contain those members of $\mathcal{D}$ which they intersect. Y cannot contain any
subset of S which intersects some $S_i$ only partly. It is easy to show that Y is a
subalgebra of P(S). As a Boolean algebra, it can be shown that Y is
isomorphic to the power set Boolean algebra $P(\mathcal{D})$.
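The subalgebra Y can be computed explicitly for a small decomposition. The following sketch assumes a particular six-element set and three blocks chosen purely for illustration; the names `blocks`, `subsets` and `to_block_set` are hypothetical. It checks closure under the three operations and that sending a member of Y to the set of blocks it contains is a bijection onto the power set of the decomposition which respects union and complement.

```python
from itertools import combinations

S = frozenset(range(6))
blocks = [frozenset({0, 1}), frozenset({2}), frozenset({3, 4, 5})]  # a decomposition of S

def subsets(s):
    """All subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

# Y: the subsets of S that completely contain every block they intersect
Y = {A for A in subsets(S) if all(B <= A or not (B & A) for B in blocks)}

closed = all((A | B) in Y and (A & B) in Y and (S - A) in Y for A in Y for B in Y)

def to_block_set(A):
    """Indices of the blocks contained in A."""
    return frozenset(i for i, B in enumerate(blocks) if B <= A)

all_indices = frozenset(range(len(blocks)))
images = {to_block_set(A) for A in Y}
preserves = all(
    to_block_set(A | B) == to_block_set(A) | to_block_set(B)
    and to_block_set(S - A) == all_indices - to_block_set(A)
    for A in Y for B in Y
)

print(len(Y), closed, len(images), preserves)
```

Here Y has $2^3 = 8$ members, one for each set of blocks.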
We could go on looking for more examples of Boolean algebras. But
there is an interesting theorem which stops us by saying that, from an
abstract point of view, our search will not yield any more variety. Specifically
the theorem says that every Boolean algebra is isomorphic to some
subalgebra of a suitable power set Boolean algebra. This famous theorem is
called the Stone Representation Theorem. We shall prove it for finite Boolean
algebras (the infinite case requires the axiom of choice). In studying abstract
algebraic structures we generally identify two structures that are isomorphic
to each other. Modulo this identification, the Stone representation theorem
says that if we study all power set Boolean algebras and their subalgebras,
then we have exhausted the world of Boolean algebras. Whatever theorems
hold for the subalgebras of power set Boolean algebras also hold for all
Boolean algebras.
Such representation theorems represent landmarks in the study of
mathematical structures because they represent abstract mathematical structures
as some concrete, familiar structures (hence the name 'representation
theorems'). They serve to measure the true generality arising out of abstraction.
We shall see a few other representation theorems later. However, despite
their theoretic achievements, such theorems generally do not provide a
direct simplification of the study of the respective abstract structures. For
example, in the case of the Stone representation theorem, to prove it
requires considerable spadework on abstract Boolean algebras. Some
representation theorems are surprisingly easy to prove (as we shall see for groups).
But then, their utility is correspondingly limited. Although they reduce the
study of the abstract structures of a particular type to that of certain
particular, concrete structures, the latter itself is a formidable task, not any less
difficult than the former.
Therefore, despite the Stone representation theorem (which, anyway,
has not been proved yet), we continue to work in terms of abstract Boolean
algebras. We begin with a few basic properties, which are derived directly
from the axioms (see the comments made after the proof of Proposition
(2.1.1)). The proofs provide an excellent example of axiomatic deduction.
1.3. Theorem: Let $(X, +, \cdot, ')$ be a Boolean algebra. Then the following
properties hold for all elements x, y, z of X:
(i) $x + x = x$ and $x \cdot x = x$ (Laws of Tautology or Idempotency)
(ii) $x + 1 = 1$ and $x \cdot 0 = 0$
(iii) $x + x \cdot y = x$ and $x \cdot (x + y) = x$ (Laws of Absorption)
(iv) $x + (y + z) = (x + y) + z$ and $x \cdot (y \cdot z) = (x \cdot y) \cdot z$
(Associative Laws)
(v) If $x + y = 1$ and $x \cdot y = 0$ then $y = x'$
(Uniqueness of Complements)
(vi) $(x')' = x$ (Law of Double Complementation)
(vii) $(x + y)' = x' \cdot y'$ and $(x \cdot y)' = x' + y'$
(De Morgan's Laws)
(viii) $0' = 1$ and $1' = 0$
(ix) $0 \neq 1$ unless X has only one element.
Proof: Before actually giving the proofs, we make an important
observation. There is an absolute symmetry between the operations $+$ and $\cdot$ in the
definition of a Boolean algebra. Consequently, whenever any identity holds
in a Boolean algebra, we can replace all occurrences of $+$ with $\cdot$ and vice
versa (with a corresponding interchange of 0 and 1) and get a new identity,
called the dual of the original identity. Its proof can be given by dualising
each step in the proof of the original identity. This is known as the
principle of duality. Formally, it states that if the ordered quadruple
$(X, +, \cdot, ')$ is a Boolean algebra, so is the ordered quadruple $(X, \cdot, +, ')$. We
see that in most of the assertions to be proved, there are two statements
which are the duals of each other. Because of the principle of duality, it
suffices to prove either one of the two statements.
We now prove the assertions one by one.
(i) We have,
$$\begin{aligned}
x &= x \cdot 1 && \text{(by property B2 in the definition)} \\
  &= x \cdot (x + x') && \text{(by B4)} \\
  &= (x \cdot x) + (x \cdot x') && \text{(by B3)} \\
  &= (x \cdot x) + 0 && \text{(by B4 again)} \\
  &= x \cdot x && \text{(by B2 again)}
\end{aligned}$$
Hence $x \cdot x = x$. The other assertion, $x + x = x$, follows by duality.
The reason for the name 'Laws of Tautology' (and also for the
name 'Laws of Absorption') will be given in Section 4.
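(ii)
$$\begin{aligned}
x + 1 &= (x + 1) \cdot 1 && \text{(by B2)} \\
      &= (x + 1) \cdot (x + x') && \text{(by B4)} \\
      &= x + (1 \cdot x') && \text{(by B3)} \\
      &= x + x' && \text{(by B2)} \\
      &= 1 && \text{(by B4)}
\end{aligned}$$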
(iii) From (ii) we have $y + 1 = 1$. Multiplying both sides by x we get
$(y + 1) \cdot x = 1 \cdot x$. By B3, B2 and B1 the left hand side reduces to
$x + (x \cdot y)$ while the right hand side equals x by B2. Hence
$x + (x \cdot y) = x$ as was to be proved.
(iv) This is really a remarkable property, because in most algebraic
structures, it is customary to assume associativity as an axiom. In
the case of Boolean algebras, this property can be proved as a
consequence of other axioms and hence it is redundant to include
it as an axiom. Of course, till we have proved it, we have to be
careful to insert parentheses properly to avoid ambiguity.
Let us prove $x \cdot (y \cdot z) = (x \cdot y) \cdot z$. Let the left and right hand sides be
denoted by a and b respectively. Instead of proving $a = b$ directly, we shall
first prove, separately, that $a + x = b + x$ and $a + x' = b + x'$. Then
multiplying the two we shall get $(a + x) \cdot (a + x') = (b + x) \cdot (b + x')$. By
B3, B4 and B2, the left hand side reduces to a while the right hand side
reduces to b. This would prove the result.
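Now,
$$\begin{aligned}
a + x &= [x \cdot (y \cdot z)] + (x \cdot 1) && \text{(by B2)} \\
      &= x \cdot [(y \cdot z) + 1] && \text{(by B3)} \\
      &= x \cdot 1 && \text{(by (ii) proved above)} \\
      &= x
\end{aligned}$$
while
$$\begin{aligned}
b + x &= [(x \cdot y) \cdot z] + x \\
      &= [(x \cdot y) + x] \cdot [z + x] && \text{(by B3)} \\
      &= [x + (x \cdot y)] \cdot [x + z] && \text{(by B1)} \\
      &= x \cdot (x + z) && \text{(by (iii) proved above)} \\
      &= x && \text{(again by (iii))}
\end{aligned}$$
Hence $a + x = b + x$.
Next,
$$\begin{aligned}
a + x' &= [x \cdot (y \cdot z)] + x' \\
       &= [x + x'] \cdot [(y \cdot z) + x'] && \text{(by B3)} \\
       &= 1 \cdot [(y \cdot z) + x'] && \text{(by B4)} \\
       &= (y \cdot z) + x' && \text{(by B2)} \\
       &= (y + x') \cdot (z + x') && \text{(by B3)}
\end{aligned}$$
while
$$\begin{aligned}
b + x' &= [(x \cdot y) \cdot z] + x' \\
       &= [(x \cdot y) + x'] \cdot [z + x'] && \text{(by B3)} \\
       &= [(x + x') \cdot (y + x')] \cdot (z + x') && \text{(by B3 again)} \\
       &= [1 \cdot (y + x')] \cdot (z + x') && \text{(by B4)} \\
       &= (y + x') \cdot (z + x') && \text{(by B2)}
\end{aligned}$$
So $a + x' = b + x'$ and, as noted before, this completes the
proof.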
(v) Suppose $x + y = 1$ and $x \cdot y = 0$. Then
$x' = x' \cdot 1 = x' \cdot (x + y) = (x' \cdot x) + (x' \cdot y) = 0 + (x' \cdot y) = x' \cdot y$.
Similarly $y = 1 \cdot y = (x' + x) \cdot y = x' \cdot y$ since $x \cdot y = 0$. Thus
both $x'$ and y equal $x' \cdot y$ whence $x' = y$. We have omitted the
justifications this time. By now the reader should be in a position
to supply them. This shows that the complement of an element x
is characterised by the property that when added to x it gives 1
and when multiplied with x it gives 0.
We caution the reader against attempting to prove the uniqueness
of complements from the equation $x + x' = x + y$ by cancelling
x from both sides. This is so because cancellation law has not been
established. Actually, from (ii) we have $x + 1 = y + 1\ (= 1)$ for
all $x, y \in X$. Thus we see that the cancellation law does not hold
for $+$. Similarly it fails for $\cdot$.
(vi) We have $x + x' = 1$ and $x \cdot x' = 0$. By B1, this gives $x' + x = 1$
and $x' \cdot x = 0$. But because of the uniqueness of complements just
proved, this means x is the complement of $x'$, i.e., $x = (x')'$.
(vii) Let $a = x + y$ and $b = x' \cdot y'$. We have to show $a' = b$. By
uniqueness of complements, it suffices to prove that $a + b = 1$ and
$a \cdot b = 0$. Now (because of associativity we need not put parentheses),
$$\begin{aligned}
a + b &= x + y + (x' \cdot y') \\
      &= x + [(y + x') \cdot (y + y')] \\
      &= x + [(y + x') \cdot 1] \\
      &= x + y + x' \\
      &= y + x + x' \\
      &= y + 1 \\
      &= 1 && \text{(by (ii))}
\end{aligned}$$
Also
$$\begin{aligned}
a \cdot b &= (x + y) \cdot (x' \cdot y') \\
          &= (x \cdot x' \cdot y') + (y \cdot x' \cdot y') \\
          &= (0 \cdot y') + (0 \cdot x') \\
          &= 0 + 0 && \text{(by (ii))} \\
          &= 0.
\end{aligned}$$
This proves $(x + y)' = x' \cdot y'$. The other assertion follows either by
duality or by replacing $x'$ by x, $y'$ by y and taking complements.
(viii) Since $0 + 1 = 1$ and $0 \cdot 1 = 0$ (from (ii)), it follows, by (v) again,
that $1 = 0'$ and $0 = 1'$.
(ix) Suppose $0 = 1$. We have to show that X has only one element. Let
$x \in X$. Then $x = x \cdot 1 = x \cdot 0 = 0$. So 0 (or 1) is the only element
of X. ∎
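The laws just proved can be watched at work in a Boolean algebra that does not look like a power set at first sight. The sketch below uses the divisors of 30 with least common multiple as $+$, greatest common divisor as $\cdot$ and $d \mapsto 30/d$ as complementation. That this is a Boolean algebra is a standard fact which is not taken from the text above, and the dictionary keys are merely descriptive labels. An exhaustive check on one example illustrates the theorem; it is of course no substitute for the axiomatic proof.

```python
from math import gcd
from itertools import product

N = 30                                      # a product of distinct primes
X = [d for d in range(1, N + 1) if N % d == 0]

add = lambda a, b: a * b // gcd(a, b)       # lcm plays the role of +
mul = gcd                                   # gcd plays the role of .
comp = lambda a: N // a                     # complement
zero, one = 1, N                            # identities for + and .

laws = {
    "(i) idempotency": all(add(x, x) == x and mul(x, x) == x for x in X),
    "(ii) x + 1 = 1, x.0 = 0": all(add(x, one) == one and mul(x, zero) == zero for x in X),
    "(iii) absorption": all(add(x, mul(x, y)) == x and mul(x, add(x, y)) == x
                            for x, y in product(X, repeat=2)),
    "(iv) associativity": all(add(x, add(y, z)) == add(add(x, y), z)
                              and mul(x, mul(y, z)) == mul(mul(x, y), z)
                              for x, y, z in product(X, repeat=3)),
    "(v) unique complements": all(y == comp(x) for x, y in product(X, repeat=2)
                                  if add(x, y) == one and mul(x, y) == zero),
    "(vi) double complement": all(comp(comp(x)) == x for x in X),
    "(vii) De Morgan": all(comp(add(x, y)) == mul(comp(x), comp(y))
                           and comp(mul(x, y)) == add(comp(x), comp(y))
                           for x, y in product(X, repeat=2)),
    "(viii) 0' = 1, 1' = 0": comp(zero) == one and comp(one) == zero,
    "(ix) 0 != 1": zero != one,
}

for name, holds in laws.items():
    print(name, holds)
```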
228 orscms MATHEMATICS (Chapter Four)
Using the various laws proved so far (or assumed as axioms) we can
simplify an algebraic expression involving elements of a Boolean algebra,
in much the same way as we simplify expressions involving real variables.
Because of commutativity and associativity, the sum or the product of any
number of terms is defined unambiguously, without having to insert
parentheses. As a further notational simplification, we make the convention
that, as with real numbers, · will rank higher than +. This means that an
expression like x·y + z·w will always be interpreted as (x·y) + (z·w) and
not as x·[y + (z·w)] or as x·(y + z)·w or as [(x·y) + z]·w etc. In view of
the symmetry between + and ·, we could just as well have decided to let +
rank higher than ·. But we prefer our convention because it is the one
adopted for the real number system with which we are so familiar. To keep
up further with real numbers, we shall often suppress · from the notation.
Of course, the analogy with expressions involving real numbers should not
be relied upon blindly.
Certain laws, such as the cancellation laws, which hold good for real
numbers no longer apply in Boolean algebras. On the other hand, there are
certain laws of Boolean algebras, such as tautology and distributivity of
+ over ·, which render possible certain simplifications which would not
hold for real numbers. As a commonly used example, x + x'y simplifies to
x + y since x + x'y = (x + x')(x + y) = 1·(x + y) = x + y. Another point
to note is that because of tautology, the same term (or factor) may be used
any number of times. For example, the expression xyz + xyz' + x'yz equals
xyz + xyz' + xyz + x'yz which further reduces to xy + yz or to y(x + z).
Yet another simplifying feature is provided by the absorption law. In any
expression we may ignore all terms which are multiples of some other term
and any factor which is obtained by adding some terms to some other
factor.
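The simplifications quoted above can be spot-checked mechanically. The following is only a sketch: it tests the identities in the two-element Boolean algebra {0, 1}, with + read as `|`, · as `&` and ' as `1 - x`. The helper names are ours and are not part of the text, and a check on {0, 1} is an illustration rather than a substitute for the axiomatic proofs.

```python
from itertools import product

def comp(x):
    """Complement in the two-element Boolean algebra {0, 1}."""
    return 1 - x

def holds_in_two_element_algebra(lhs, rhs, nvars):
    """True if lhs and rhs agree for every 0/1 assignment of nvars variables."""
    return all(lhs(*v) == rhs(*v) for v in product((0, 1), repeat=nvars))

# x + x'y = x + y
absorb_complement = holds_in_two_element_algebra(
    lambda x, y: x | (comp(x) & y),
    lambda x, y: x | y,
    2,
)

# xyz + xyz' + x'yz = y(x + z)
reuse_term = holds_in_two_element_algebra(
    lambda x, y, z: (x & y & z) | (x & y & comp(z)) | (comp(x) & y & z),
    lambda x, y, z: y & (x | z),
    3,
)

# Cancellation fails for +: triples with x + y = x + z although y != z.
cancellation_counterexamples = [
    (x, y, z)
    for x, y, z in product((0, 1), repeat=3)
    if (x | y) == (x | z) and y != z
]

print(absorb_complement, reuse_term, cancellation_counterexamples)
```

Every counterexample to cancellation found this way has x = 1, in line with the remark that x + 1 = y + 1 for all x and y.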
As an application of these methods we now present a systematic, algebraic
solution to the Business Problem. (Another solution will be presented
in the fourth section.) We let B be the set of all businesses in the country
and consider the power set Boolean algebra (P(B), ∪, ∩, '). We denote by
I, E, L and S respectively the subsets of B consisting of all businesses
having import licenses, manufacturing essential commodities, employing local
personnel and employing skilled personnel. We let V_1, V_2, V_3 be the subsets
of those businesses which violate the three given rules respectively. The
problem amounts to showing that V_1 + V_2 + V_3 = 1 = B. For this, we first
express each V_i as a Boolean expression involving I, E, L and S (which are
elements of our Boolean algebra), cf. Exercise (2.1.16). Now, the first rule
is violated by two types of businesses: (a) those which have import licenses
but either do not employ local personnel or do employ skilled personnel, and
(b) those which do not manufacture essential commodities and
either do not employ local personnel or do employ skilled personnel. The
set of businesses of type (a) is precisely I ∩ (L' ∪ S), or I(L' + S) in our
notation. Similarly the set of businesses of type (b) is E'(L' + S). So V_1,
the set of violators of the first rule, is the union of these two, i.e.,
    I(L' + S) + E'(L' + S)
or (I + E')(L' + S). By similar reasoning it follows that
    V_2 = (I + L')(S' + E)
and
    V_3 = LI'.
Now
    V_1 + V_2 + V_3 = (I + E')(L' + S) + (I + L')(S' + E) + LI'
                    = IL' + E'L' + IS + E'S + IS' + L'S' + IE + L'E + LI'
                    = L'(I + E' + S' + E) + I(S + S' + E) + E'S + LI'
                    = L'(1 + I + S') + I(1 + E) + E'S + LI'
                    = L'·1 + I·1 + E'S + LI'
                    = L' + I + E'S + LI'
                    = L' + I' + I + E'S      (since L' + LI' = L' + I')
                    = 1 + L' + E'S
                    = 1.
This shows that every business violates at least one rule and so it is
impossible to do any business in the country. Sometimes things are not so bad.
If the rules are different, the expression V_1 + V_2 + V_3 may not reduce to 1.
Still, it is worthwhile to simplify it, because then taking its complement we
get a handy expression for V_1'V_2'V_3', which is precisely the set of all lawful
businesses. Paraphrasing it gives a simplified, but equivalent, version of the
original rules. We illustrate this in the following problem.
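The conclusion of the algebraic solution can also be confirmed by brute force. A business is described, for this purpose, by four yes/no attributes (I, E, L, S), so there are only sixteen kinds to examine. The sketch below assumes the three violation sets exactly as derived above, namely V_1 = (I + E')(L' + S), V_2 = (I + L')(S' + E) and V_3 = LI'; the function and variable names are illustrative only.

```python
from itertools import product

def violations(i, e, l, s):
    """Return (v1, v2, v3): which of the three rules this kind of business violates."""
    v1 = (i or not e) and (not l or s)   # V_1 = (I + E')(L' + S)
    v2 = (i or not l) and (not s or e)   # V_2 = (I + L')(S' + E)
    v3 = l and not i                     # V_3 = L I'
    return v1, v2, v3

# Kinds of business (I, E, L, S) that violate none of the rules.
lawful = [
    kind
    for kind in product((False, True), repeat=4)
    if not any(violations(*kind))
]

print(lawful)
```

An empty list here corresponds to V_1 + V_2 + V_3 = 1: no combination of the four attributes escapes all three rules.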
1.4 Problem: Suppose that in the Business Problem, the third rule is
changed, by an amendment, to 'No business shall employ skilled personnel
without obtaining an import license', the first two rules being unaffected.
Simplify the rules to an equivalent set of rules.
Solution: We proceed as above. The sets V_1 and V_2 remain the same. But
V_3 changes to SI'. So V_1 + V_2 + V_3 would now reduce, instead of 1, to
L' + I + E'S + SI', which in turn becomes L' + E'S + I + S and finally
L' + I + S (since E'S is absorbed in S). Therefore, the set of lawful
businesses is (L' + I + S)', which equals LI'S' by De Morgan's laws. Hence
the system of rules is equivalent to the following simple system of rules:
(i') every business must employ local personnel, (ii') no business shall have
an import license and (iii') no business shall employ skilled personnel. ∎
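As a cross-check of this solution, the amended system can be compared with the simplified one on all sixteen combinations of the attributes (I, E, L, S). This is a sketch under the assumption that V_1 and V_2 are as derived earlier and that V_3 = SI'; the names are ours.

```python
from itertools import product

def is_lawful(i, e, l, s):
    """True if the business violates none of the rules, with the amended third rule."""
    v1 = (i or not e) and (not l or s)   # V_1 = (I + E')(L' + S)
    v2 = (i or not l) and (not s or e)   # V_2 = (I + L')(S' + E)
    v3 = s and not i                     # amended V_3 = S I'
    return not (v1 or v2 or v3)

def satisfies_simplified_rules(i, e, l, s):
    """The simplified system L I' S'; note that e plays no role."""
    return l and not i and not s

kinds = list(product((False, True), repeat=4))
rules_agree = all(is_lawful(*k) == satisfies_simplified_rules(*k) for k in kinds)
lawful = [k for k in kinds if is_lawful(*k)]

print(rules_agree, lawful)
```

The lawful kinds come in pairs that differ only in the second attribute E, which is the independence from essential commodities visible in the simplified rules.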
In this problem, the simplified version of the rules shows something
which is not obvious from the given rules, namely, that the original system
is independent of whether the business manufactures essential commodities
or not. The first two rules do ostensibly make references to essential
commodities. But our work shows that, if we consider the system of the three
rules as a whole, then businesses manufacturing essential commodities do
not have any advantage or disadvantage as compared to those businesses
which do not manufacture essential commodities.
Theorem (1.3) was a generalisation of Proposition (2.1.1) in the sense
that when Theorem (1.3) is applied to a power set Boolean algebra, we get
precisely the results of Proposition (2.1.1) (except, of course, the results (i)
to (iv) which correspond to verification of the axioms of a Boolean
algebra). Direct proofs, using elements of the set in question, were possible
for Proposition (2.1.1). Such arguments would not work for Theorem (1.3),
because the elements of an abstract Boolean algebra need not be subsets
of some set; they could be real numbers, some functions, some animals, in
fact anything at all. We know nothing about them save what is implied by
the axioms. Therefore, whatever theorems we prove about abstract Boolean
algebras must be deduced strictly from the axioms. (Things would be
different if we had a representation theorem at hand. Such a theorem
allows us to regard an abstract structure as a concrete one.)
This situation is fairly typical in the study of abstract mathematical
structures. Some particular example serves as a model. We take concepts
which are originally defined for this model and see if they can be suitably
paraphrased so that they can be defined for the general, abstract context.
Similarly, we inquire what theorems can be carried over from the concrete
to the general. In the present case, the power set Boolean algebras serve as
models for the abstract Boolean algebras. Let P(S) be the power set Boolean
algebra of some set S. Elements of P(S) are subsets of S and hence consist
of elements of S. We are used to doing things in terms of these elements. Let
us see which of these things can be carried over to an abstract Boolean
algebra. Let A ∈ P(S). Then |A|, the cardinality of A, is the number of
elements in A. There is no easy, direct way to generalise this concept for
elements of an abstract Boolean algebra. But let us take some other concept,
say, that of inclusion. Let A, B ∈ P(S). Then we say A ⊆ B if every
element of A is also an element of B, and this gives a partial order on P(S).
As it is, this concept cannot be generalised for an abstract Boolean algebra
X. If x, y ∈ X, we cannot define x ⊆ y to mean that every element of x
is also an element of y, because x may not be a set. It may be an elephant,
a flower and so on. So it is meaningless to talk of an element of x. (The
reader may ask whether it is not meaningless to talk of the sum of two
elephants or the complement of an elephant. Such a question is baseless.
We are starting with the assumption that (X, +, ·, ') is a Boolean algebra.
If elements of X happen to be elephants then it means that we are given
some rule to define the sum of two such elephants. Whether such a
definition is 'meaningful' is an extraneous question.)
Fortunately, in Proposition (2.1.2) we have characterised set inclusion
in various ways which involve only the operations ∪, ∩ and ' and do not
directly involve the elements of the set. Since these three operations
correspond to +, · and ' in an abstract Boolean algebra, the concept of set
inclusion can be generalised for an abstract Boolean algebra, as we now do.
1.5 Definition: Let (X, +, ·, ') be a Boolean algebra. If x, y ∈ X, we say
x ≤ y if x·y' = 0.
There is no harm in denoting x ≤ y by x ⊆ y and reading it as 'x is
contained in y' instead of 'x is less than or equal to y', as long as we
treat the notation ⊆ and the phrase 'contained in' like proper nouns.
This is another point, albeit a minor one, to be noted about abstraction.
When the inspiration for a concept comes from a particular example,
often the same notation and terminology are used even in the general context.
Their etymology may be interesting and instructive but should not be
stretched unduly. For example, in the present case, although for power set
Boolean algebras x ≤ y is the same as x ⊆ y, for other Boolean algebras
x ≤ y may have a very different interpretation. Similarly two elements x
and y in any Boolean algebra are called disjoint if x·y = 0, even though · may
not always stand for intersection.
The following theorem captures the important properties of the binary
relation just defined.
1.6 Theorem: The relation ≤ defined above makes the underlying set of
a Boolean algebra into a lattice. Moreover, 0 and 1 are the minimum and
the maximum elements of this lattice.
Proof: Let (X, +, ·, ') be a Boolean algebra. For x, y ∈ X, we have
defined x ≤ y as x·y' = 0. We want to show that (X, ≤) is a lattice. First
we verify that ≤ is a partial order on X. Reflexivity follows from the fact
that x·x' = 0 for all x ∈ X. For transitivity suppose x ≤ y and y ≤ z.
Then xy' = 0 and yz' = 0. Now
    xz' = x·1·z' = x(y + y')z' = xyz' + xy'z' = x·0 + 0·z' = 0 + 0 = 0,
proving that x ≤ z. It remains to prove that ≤ is anti-symmetric. For this,
let x, y ∈ X with x ≤ y and y ≤ x. Then xy' = 0 and yx' = 0. Now
    x = x·1 = x(y + y') = xy + xy' = xy + 0 = xy.
By symmetry y also equals xy. Hence x = y.
For the verification of the lattice properties, let x, y ∈ X. We claim that
the join of x and y is simply x + y and that their meet is x·y. For the first
assertion, note first that x·(x + y)' = x·x'y' = 0·y' = 0. So x ≤ x + y.
Similarly y ≤ x + y. Hence x + y is an upper bound of the set {x, y}. To
show that it is the least upper bound, suppose z is some other upper bound
of {x, y}, i.e. x ≤ z and y ≤ z. Then xz' = 0 and yz' = 0. So
    (x + y)z' = xz' + yz' = 0 + 0 = 0.
This means x + y ≤ z and hence x + y is the least upper bound of the set
{x, y}, which, by definition, is the join of x and y. The proof that the meet
of x and y is x·y is obtained by duality.
Finally, if x ∈ X then 0·x' = 0 and hence 0 ≤ x, showing that 0 is the
minimum element of X. Similarly, x·1' = x·0 = 0 for all x ∈ X shows
that 1 is the maximum element of X. This completes the proof. ∎
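For a concrete feel of Definition 1.5 and Theorem 1.6, the relation x ≤ y can be computed in a small power set Boolean algebra and compared with ordinary inclusion. The sketch below encodes subsets of a three-element set as bitmasks; the size 3, the encoding and all names are assumptions made for illustration. Joins and meets are found directly from the order, as least upper bounds and greatest lower bounds, and then compared with + and ·.

```python
N = 3                        # assumed size of the underlying set S = {0, 1, 2}
ONE = (1 << N) - 1           # bitmask of S itself, playing the role of 1
ELEMENTS = range(ONE + 1)    # every subset of S, encoded as a bitmask

def comp(x):
    """Complement x' relative to S."""
    return ONE & ~x

def leq(x, y):
    """x <= y in the sense of Definition 1.5, i.e. x . y' = 0."""
    return (x & comp(y)) == 0

def is_subset(x, y):
    """Ordinary inclusion, checked element by element."""
    return all((y >> k) & 1 for k in range(N) if (x >> k) & 1)

def least_upper_bound(x, y):
    uppers = [z for z in ELEMENTS if leq(x, z) and leq(y, z)]
    return next(u for u in uppers if all(leq(u, z) for z in uppers))

def greatest_lower_bound(x, y):
    lowers = [z for z in ELEMENTS if leq(z, x) and leq(z, y)]
    return next(g for g in lowers if all(leq(z, g) for z in lowers))

pairs = [(x, y) for x in ELEMENTS for y in ELEMENTS]
order_is_inclusion = all(leq(x, y) == is_subset(x, y) for x, y in pairs)
join_is_sum = all(least_upper_bound(x, y) == (x | y) for x, y in pairs)
meet_is_product = all(greatest_lower_bound(x, y) == (x & y) for x, y in pairs)
bounds_ok = all(leq(0, x) and leq(x, ONE) for x in ELEMENTS)

print(order_is_inclusion, join_is_sum, meet_is_product, bounds_ok)
```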
The preceding theorem shows that every Boolean algebra gives rise to a
lattice. An interesting question arises, whether every lattice can be obtained
from a Boolean algebra. The answer is obviously in the negative. As we
just saw, the lattice obtained from a Boolean algebra is always bounded
(i.e. has a smallest and a largest element). It turns out that a bounded lattice
satisfying a couple of other properties indeed arises from a Boolean algebra.
We already defined a distributive lattice as one in which the binary opera-
tions of join and meet are distributive over each other (see Exercise (3.4.7)).
We make one more definition.
1.7 Definition: Let (X, ≤) be a bounded lattice with 0 and 1 as its
minimum and maximum elements respectively. Then X is called complemented
if for every x ∈ X, there exists some y ∈ X such that x ∨ y = 1 and
x ∧ y = 0. Any such y is called a complement of x.
Note that we are not requiring complements to be unique. For example,
in the lattice of Exercise (3.4.9), any two of the three given partitions of the
plane are complementary to each other. This particular lattice is not dis-
tributive. As may be expected, things are much better for distributive
lattices. In fact, such lattices come very close to Boolean algebras as the
next theorem shows.
1.8 Theorem: Let (X, +, ·, ') be a Boolean algebra. Then the corresponding
lattice (X, ≤) is complemented and distributive. Conversely, if
(X, ≤) is a bounded, complemented and distributive lattice then there
exists a Boolean algebra structure (X, +, ·, ') on X such that the partial
order relation defined by this structure coincides with the given relation ≤.
Proof: The first part needs no proof: since ∨ and ∧ are precisely
+ and · respectively, the assertion follows right from the axioms of
a Boolean algebra along with the fact that 0 and 1 are the minimum and
the maximum elements of X respectively. It is the converse that is more
interesting. Suppose (X, ≤) is a bounded, complemented, distributive lattice
with 0 and 1 as the smallest and the largest elements. For x, y ∈ X define
    x + y = x ∨ y
and
    x·y = x ∧ y.
Then the binary operations + and · are commutative with 0 and 1 as their
respective identities. Their distributivity over each other follows from the
very definition of a distributive lattice. Thus the axioms B1 to B3 in
Definition (1.1) are verified. Now for each x ∈ X, we select any one complement
of x and denote it by x'. (Actually, using distributivity it is not difficult to
show that complements are unique. But this fact is not needed. In order
to have a Boolean algebra, all we need is at least one complement
for every element.) This gives a function ' : X → X such that the quadruple
(X, +, ·, ') is a Boolean algebra.
Now let ∝ be the partial order on X induced by this Boolean algebra
structure, i.e., for x, y ∈ X, x ∝ y iff xy' = 0. We have to show that ∝
coincides with ≤, i.e., for x, y ∈ X, x ∝ y iff x ≤ y. Suppose x ∝ y. Then
x·y' = 0. So x = x·1 = x·(y + y') = xy + xy' = xy + 0 = x·y.
This means x = x ∧ y. But x ∧ y ≤ y. So x ≤ y. Conversely suppose
x ≤ y. Then x = x ∧ y = x·y. Hence xy' = xyy' = x·0 = 0. So
x ∝ y. This completes the proof. ∎
Many authors define a Boolean algebra as a bounded, complemented,
distributive lattice. The preceding theorem shows that this definition is
equivalent to ours. It really does not matter which definition is adopted,
because both approaches will ultimately yield the same results. Still, certain
concepts may appear more natural in one approach than in the other and
certain results may be easier to prove in one approach than in the other.
For example, associativity is immediate for the binary operations ∧ and ∨
in a lattice. But to prove associativity for a Boolean algebra as we have
defined it requires a little work. On the other hand, in some examples such
as the algebra of circuits (which will be studied in Section 3), the partial
order ≤ is not a very natural one and so it is a little artificial to conceive
of them as lattices. Ultimately, however, it is more a matter of taste than of
real convenience as to which approach one adopts. Note that the partial
orders induced by a Boolean algebra (X, +, ·, ') and its dual Boolean
algebra (X, ·, +, ') are dual to each other.
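The role of complementation in Definition 1.7 can be seen on a small family of lattices. The sketch below uses the positive divisors of n ordered by divisibility, where the join is the least common multiple, the meet is the greatest common divisor, the minimum element is 1 and the maximum element is n. The choice of the values 30 and 12 and the function names are ours; the sketch only tests the existence of complements and does not verify distributivity.

```python
from math import gcd

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def lcm(a, b):
    return a * b // gcd(a, b)

def complements(x, n):
    """All y dividing n whose join with x is n and whose meet with x is 1."""
    return [y for y in divisors(n) if lcm(x, y) == n and gcd(x, y) == 1]

def is_complemented(n):
    """True if every divisor of n has at least one complement."""
    return all(complements(x, n) for x in divisors(n))

print(is_complemented(30), complements(6, 30))
print(is_complemented(12), complements(2, 12))
```

In the lattice of divisors of 12 the element 2 has no complement, so by Theorem 1.8 this lattice cannot come from a Boolean algebra, whereas every divisor of 30 has one.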
The partial order structure induced on the underlying set of a Boolean
algebra (X, +, ·, ') also enables us to prove the representation theorem
for it, at least when the set X is finite. We proceed as follows. Suppose,
to start with, that X is indeed the power set Boolean algebra of some set,
say S. Let S = {x_1, x_2, ..., x_n} and for each i = 1, 2, ..., n let A_i = {x_i}.
(Note again that x_i ≠ {x_i}: x_i ∈ S while {x_i} ⊆ S, i.e. {x_i} ∈ P(S).) Now
suppose A ∈ P(S), i.e. A ⊆ S. Then either A = ∅ or else A is of the form
    {x_{i_1}, x_{i_2}, ..., x_{i_r}}
for some positive integers r and
    1 ≤ i_1 < i_2 < ... < i_r ≤ n.
Also
    A = A_{i_1} ∪ A_{i_2} ∪ ... ∪ A_{i_r},
or,
    A = A_{i_1} + A_{i_2} + ... + A_{i_r}.
Thus we see that every non-zero element of the power set Boolean algebra
P(S) can be expressed as a sum of singleton sets. Since two distinct
singleton sets are disjoint, this expression is unique.
The key to the representation of an abstract Boolean algebra as a power
set Boolean algebra is to obtain a similar expression for an element of an
abstract Boolean algebra as a sum of certain 'basic' elements. Which
elements should play the role of these 'building blocks'? To get an answer
we take another look at singleton subsets. If the power set of any set is
partially ordered under inclusion, then the empty set is obviously the mini-
mum element and singleton subsets are characterised as minimal elements
in the set of all non-empty subsets. We are now in a position to define
their counterparts for an abstract Boolean algebra.
1.9 Definition: Let (X, +, ·, ') be a Boolean algebra. Then a minimal
element of the set X − {0} is called an atom of X.
Atoms of a power set Boolean algebra P(S) are precisely the singleton
subsets of S. As we saw above, every element of P(S) can be expressed as
a union of atoms, in much the same way as in chemistry every molecule is
obtained by combining atoms. (Hence probably the name 'atom'.) Actually,
the concept of an atom can be defined for any poset having a least element.
In the terminology of Definition (3.3.8), atoms are precisely those elements
which cover the unique least element. For example, if on the set N, we
define x ≤ y if x divides y for x, y ∈ N, then 1 is the minimum element
and prime numbers are the atoms.
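Since atoms are defined purely in terms of the order, they can be computed for any finite poset with a least element. The following sketch does this for two of the examples just mentioned. The function `atoms`, the restriction of N to the finite range 1 to 30 and the choice S = {0, 1, 2} are assumptions made only so that the search is finite.

```python
from itertools import combinations

def atoms(elements, leq, least):
    """Minimal elements of elements - {least} under the partial order leq."""
    nonzero = [x for x in elements if x != least]
    return [
        a for a in nonzero
        if not any(b != a and leq(b, a) for b in nonzero)
    ]

def divides(x, y):
    return y % x == 0

# Divisibility on {1, ..., 30}: the least element is 1.
number_atoms = atoms(range(1, 31), divides, 1)

# Inclusion on the power set of S = {0, 1, 2}: the least element is the empty set.
S = [0, 1, 2]
subsets = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]
set_atoms = atoms(subsets, frozenset.issubset, frozenset())

print(number_atoms)
print(set_atoms)
```

The first list consists of the primes up to 30 and the second of the singleton subsets of S.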
In the following proposition we prove a few simple properties about
atoms in a finite Boolean algebra and show how they serve as building
blocks.
1.10 Proposition: Let (X, +, ·, ') be a finite Boolean algebra. Then
(i) every non-zero element of X contains at least one atom,
(ii) every two distinct atoms of X are mutually disjoint,
and
(iii) every element of X can be uniquely expressed as a sum of atoms,
specifically if x e X, then x is the sum of all atoms contained in
x (with the understanding that an empty sum is 0).
Proof: (i) Let x ∈ X, x ≠ 0. Then either x is itself an atom (i.e., a minimal
element of X − {0}) or there is some x_1 ∈ X, x_1 ≠ 0 such that x_1 < x. If
x_1 is an atom we are done. Otherwise there is some x_2 ∈ X such that x_2 ≠ 0
and x_2 < x_1. Continuing in this manner, since X is a finite set, there will be
some n such that x_n is an atom and x_n ≤ x. (The same idea was used in
the proof of Theorem (3.17).)
(ii) Let a, b be two atoms of X. If a·b ≠ 0, then by (i), there exists an
atom c such that c ≤ a·b. Since a·b ≤ a, we have c ≤ a. Since a itself is
a minimal non-zero element of X, it follows that a = c. Similarly b = c.
Hence a = b. In other words, if a, b are distinct atoms then a·b = 0.
(iii) Let x ∈ X. If x = 0, then x contains no atoms and the assertion
holds because of the understanding that an empty sum is to be regarded
as 0. So suppose x ≠ 0. Let a_1, a_2, ..., a_k be the distinct atoms of X
contained in x. We assert that x = a_1 + a_2 + ... + a_k. At any rate, since
a_i ≤ x for all i = 1, 2, ..., k and a_1 + a_2 + ... + a_k is the supremum of
the set {a_1, a_2, ..., a_k}, we already have a_1 + a_2 + ... + a_k ≤ x. To show
x ≤ a_1 + a_2 + ... + a_k we must show, by De Morgan's laws, that
x a_1' a_2' ... a_k' = 0. If not, then by (i) there exists an atom b such that
b ≤ x a_1' a_2' ... a_k'. But x a_1' a_2' ... a_k' ≤ x. So b ≤ x, i.e. b is an atom
contained in x. So b = a_i for some i. But this means
a_i ≤ x a_1' a_2' ... a_k' ≤ a_i', giving a_i a_i = 0, i.e. a_i = 0 by the law of
tautology. Since no atom can be 0, we get a contradiction, proving that
x ≤ a_1 + a_2 + ... + a_k and hence that x = a_1 + ... + a_k.
As for uniqueness, suppose x = b_1 + b_2 + ... + b_r where each b_i is an
atom and b_i ≠ b_j for i ≠ j. (If b_i = b_j for some i ≠ j, then by the law
of tautology we may replace b_i + b_j by b_i itself.) Then
b_i ≤ x for all i = 1, 2, ..., r. But a_1, a_2, ..., a_k are all the atoms
contained in x and so b_i = a_j for some j. Since all the b's are distinct, it
follows that r ≤ k. Suppose r < k. Then there is some a_j which does not equal
any b_i. Without loss of generality suppose a_1 ≠ b_1, b_2, ..., b_r. We shall
derive a contradiction as follows. b_i + b_i' = 1 for all i = 1, ..., r. Hence
    a_1 = a_1 (b_1 + b_1') (b_2 + b_2') ... (b_r + b_r').
When this product is expanded, every term except a_1 b_1' b_2' ... b_r' will
contain the product of a_1 and at least one of the atoms b_1, ..., b_r. By (ii),
all such products are 0. Hence a_1 = a_1 b_1' b_2' ... b_r'. But a_1 ≤ x gives
a_1 x' = 0, which gives a_1 b_1' b_2' ... b_r' = 0 since x = b_1 + b_2 + ... + b_r.
So we get a_1 = 0, a contradiction. This shows that every a_j equals some b_i
and hence that the expression of x as a sum of atoms is unique. ∎
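The three parts of the proposition can be observed in a small example. The sketch below uses the Boolean algebra of Exercise (1.3) with n = 30: the elements are the divisors of 30, x + y is the least common multiple, x·y is the greatest common divisor and x' = 30/x, so that the zero element is 1 and the unit element is 30. The choice n = 30 and all function names are ours, and a check on one example is of course not a proof.

```python
from functools import reduce
from itertools import combinations
from math import gcd

N = 30                                   # square-free, as Exercise (1.3) requires
X = [d for d in range(1, N + 1) if N % d == 0]
ZERO, ONE = 1, N                         # identities for + and . respectively

def add(x, y):
    return x * y // gcd(x, y)            # x + y is the least common multiple

def mul(x, y):
    return gcd(x, y)                     # x . y is the greatest common divisor

def comp(x):
    return N // x                        # x' = n / x

def leq(x, y):
    return mul(x, comp(y)) == ZERO       # Definition 1.5: x . y' = 0

ATOMS = [
    a for a in X
    if a != ZERO and not any(b not in (ZERO, a) and leq(b, a) for b in X)
]

def atoms_below(x):
    return [a for a in ATOMS if leq(a, x)]

def sum_of(elements):
    return reduce(add, elements, ZERO)   # the empty sum is 0

def atom_sets_summing_to(x):
    """Every set of distinct atoms whose sum is x."""
    return [
        set(c)
        for r in range(len(ATOMS) + 1)
        for c in combinations(ATOMS, r)
        if sum_of(c) == x
    ]

every_nonzero_has_atom = all(atoms_below(x) for x in X if x != ZERO)
atoms_disjoint = all(mul(a, b) == ZERO for a in ATOMS for b in ATOMS if a != b)
unique_expression = all(
    atom_sets_summing_to(x) == [set(atoms_below(x))] for x in X
)

print(ATOMS, every_nonzero_has_atom, atoms_disjoint, unique_expression)
```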
As an important special case of (iii), the sum of all atoms equals 1.
We now have all the machinery needed to prove the Stone representation
theorem for finite Boolean algebras.
1.11 Theorem: Every finite Boolean algebra is isomorphic to a power
set Boolean algebra, specifically, to the power set Boolean algebra of the
set of all its atoms.
Proof: Let (X, +, ·, ') be a finite Boolean algebra. If |X| = 1, then X is
isomorphic to the power set Boolean algebra of the empty set. Assume |X| > 1.
Then 0 ≠ 1 (by (ix) in Theorem (1.3)) and so X has at least one atom.
Let a_1, ..., a_n be the distinct atoms of X. By the last proposition,
    a_1 + a_2 + ... + a_n = 1.
Now let S be the set {1, 2, ..., n}. We assert that the power set Boolean
algebra (P(S), ∪, ∩, ') is isomorphic to (X, +, ·, '). Define f : P(S) → X as
follows. A typical element, say I, of P(S) is some subset of {1, 2, ..., n}. We
let
    f(I) = Σ_{i ∈ I} a_i.
In other words, f(I) is the sum of those atoms whose indices
are in I. Clearly f(∅) = 0 and f(S) = 1. Because of part (iii) of the last
proposition, the function f is a bijection. In order to show that it is an
isomorphism of two Boolean algebras, we must show that it preserves
the corresponding operations. Specifically, for any two subsets I and J of
S, we must show: (i) f(I ∪ J) = f(I) + f(J), (ii) f(I ∩ J) = f(I)·f(J) and
(iii) f(S − I) = [f(I)]'. We verify these conditions one by one. For
notational convenience, let I = {k_1, ..., k_p, i_1, ..., i_q} and
J = {k_1, ..., k_p, j_1, ..., j_r}, where p, q, r are non-negative integers and
i_1, ..., i_q ∉ J and j_1, ..., j_r ∉ I. Then
I ∪ J = {k_1, ..., k_p, i_1, ..., i_q, j_1, ..., j_r} and I ∩ J = {k_1, ..., k_p}.
Now for (i),
    f(I ∪ J) = Σ_{s=1}^{p} a_{k_s} + Σ_{s=1}^{q} a_{i_s} + Σ_{s=1}^{r} a_{j_s}
             = Σ_{s=1}^{p} a_{k_s} + Σ_{s=1}^{q} a_{i_s} + Σ_{s=1}^{p} a_{k_s} + Σ_{s=1}^{r} a_{j_s}
                                                    (by the law of tautology)
             = f(I) + f(J).
For (ii),
    f(I)·f(J) = (Σ_{s=1}^{p} a_{k_s} + Σ_{s=1}^{q} a_{i_s})·(Σ_{s=1}^{p} a_{k_s} + Σ_{s=1}^{r} a_{j_s})
              = Σ_{s=1}^{p} a_{k_s}·a_{k_s}      (by (ii) of the last proposition)
              = Σ_{s=1}^{p} a_{k_s}              (by the law of tautology)
              = f(I ∩ J).
And finally,
    f(I) + f(S − I) = f(I ∪ (S − I)) = f(S) = 1
and
    f(I)·f(S − I) = f[I ∩ (S − I)] = f(∅) = 0
by what is proved earlier. So by uniqueness of complements f(S − I) = [f(I)]'.
This proves (iii) and completes the proof that f is an isomorphism. So X is
isomorphic to P(S). But obviously P(S) is isomorphic to the power set
Boolean algebra of the set {a_1, ..., a_n}. ∎
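The isomorphism f constructed in the proof is explicit enough to be computed. The sketch below builds it for the Boolean algebra of divisors of 210 described in Exercise (1.3), where + is the least common multiple, · is the greatest common divisor and x' = 210/x. The choice of 210, the indexing of atoms from 0 rather than 1 and all names are assumptions made for illustration.

```python
from functools import reduce
from itertools import combinations
from math import gcd

N = 210                                   # square-free: 2 * 3 * 5 * 7
X = [d for d in range(1, N + 1) if N % d == 0]

def add(x, y):
    return x * y // gcd(x, y)             # x + y is the least common multiple

def mul(x, y):
    return gcd(x, y)                      # x . y is the greatest common divisor

def comp(x):
    return N // x                         # x' = n / x

# Atoms: minimal elements of X - {0}, where the zero element is 1 and the
# order is divisibility.
ATOMS = [
    a for a in X
    if a != 1 and not any(a % b == 0 for b in X if b not in (1, a))
]

S = frozenset(range(len(ATOMS)))          # index set for the atoms
SUBSETS = [
    frozenset(c)
    for r in range(len(S) + 1)
    for c in combinations(sorted(S), r)
]

def f(index_set):
    """Sum of the atoms whose indices lie in index_set (the empty sum is 0)."""
    return reduce(add, (ATOMS[i] for i in index_set), 1)

is_bijection = sorted(f(sub) for sub in SUBSETS) == X
preserves_operations = all(
    f(a | b) == add(f(a), f(b))
    and f(a & b) == mul(f(a), f(b))
    and f(S - a) == comp(f(a))
    for a in SUBSETS
    for b in SUBSETS
)

print(ATOMS, len(X), is_bijection, preserves_operations)
```

In this example X has 16 = 2^4 elements and 4 atoms, as the bijection with the power set of S requires.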
Note that although the theorem merely asserts the existence of an
isomorphism between a given 'abstract' Boolean algebra and some concrete
power set Boolean algebra, the proof does more than that, because it
gives an explicit isomorphism. In fact, the proof shows that the structure
of an abstract, finite Boolean algebra is completely determined by the set of
its atoms. For this reason, this theorem (or rather its proof) is an example
of what are known as structure theorems in mathematics. Such theorems are
very important because they express an abstract mathematical structure in
terms of some concrete structure of the same type. As an application of
the last theorem (or rather, Proposition (1.10)), we can define the concept
of cardinality or of weight for elements of any finite, abstract Boolean
algebra. If x is an element of a finite abstract Boolean algebra X, we let
|x| be simply the number of atoms of X contained in x. This is consistent
with the definition of cardinality in case X is a power set Boolean algebra.
The results proved in Chapter 2, Section 2, about cardinalities of finite sets
can be generalised to any finite Boolean algebra.
As another application of the last theorem, we have the following
result, which is not so easy to establish directly.
1.12 Corollary: If X is a finite Boolean algebra then |X| = 2^n for some
non-negative integer n.
Proof: We simply note that the power set of a set with n elements has
cardinality 2^n. The result now follows by letting n be the number of atoms
in X. ∎
The Stone representation theorem actually deals with all Boolean algebras,
not just the finite ones. The idea of the proof is basically the same, namely
to consider the atoms and to express every element in terms of the atoms.
But there are two difficulties. The first is that we have to consider infinite
sums of atoms. This is not a very serious difficulty and can be circumvented
by considering suitable functions from the set of atoms into Z_2 (similar
to the characteristic functions). The real hurdle is to prove part (i) of
Proposition (1.10), namely that every non-zero element contains at least
one atom. The proof for the finite case no longer works because in an
infinite poset it is possible to have an infinite strictly descending sequence
    x_1 > x_2 > x_3 > ... > x_n > x_{n+1} > ... .
To get the existence of a minimal non-zero element we have to appeal to
the axiom of choice; see the notes at the end of Chapter 3, Section 3. Even
after using it, the result we get is that every Boolean algebra is isomorphic
to a subalgebra of the power set Boolean algebra of the set of its atoms.
There do exist Boolean algebras which are not isomorphic to the entire
power set Boolean algebra of any set. An example of such a Boolean
algebra will be given in the exercises.
Exercises
1.1 Let S be any set and Z_2 a Boolean algebra with two elements 0
and 1. Prove that the Boolean algebra of all functions from S to
Z_2 (under pointwise operations) is isomorphic to the power set
Boolean algebra of S. (Hint: Prove that the bijection constructed
in the proof of Theorem (2.2.15) is an isomorphism.)
1.2 Suppose S and T are disjoint sets. Prove that the product Boolean
algebra P(S) × P(T) is isomorphic to the Boolean algebra
P(S ∪ T).
1.3 A positive integer is called square-free if it is not divisible by the
square of any prime number. For example 30 is square-free but 45 and
120 are not. Suppose n is a square-free positive integer. Let X be
the set of all positive integers dividing n, i.e., X = {x ∈ N : x | n}. For
x, y ∈ X define x + y to be the least common multiple of x and y,
x·y to be the greatest common divisor of x and y and x' to be
n/x. Prove that (X, +, ·, ') is a Boolean algebra. Find a power set
Boolean algebra isomorphic to X. Why is it necessary to assume
that n is square-free?
1.4 Let X = {a, b, c, d}. Define two binary operations + and · on X by
the following tables:
     + | a b c d          · | a b c d
    ---+---------        ---+---------
     a | a b c d          a | a a a a
     b | b b d d          b | a b a b
     c | c d c d          c | a a c c
     d | d d d d          d | a b c d
Prove that there exists a function ' : X → X such that (X, +, ·, ') is
a Boolean algebra.
(Hint: Instead of verifying the axioms directly, it is much easier
to find some known Boolean algebra Y and a bijection f : X → Y
which is compatible with addition and multiplication.)
1.5 in the example given after Proposition (1.2), verify that the sub-
algebra Y is indeed isomorphic to PW).
1.6 (a) Prove that it is impossible to prove the laws of tautology using
only B1, B2 and one of the distributive laws in Definition (1.1).
(b) What is wrong if we attempt to prove part (ii) of Theorem (1.3)
as follows?
    x + 1 = x + x + x'     (since x + x' = 1 by B4)
          = x + x'         (since x + x = x by (i))
          = 1              (by B4).
1.7 Prove that in a distributive, bounded lattice, complements are
unique whenever they exist.
1.8 Although neither + nor · in a Boolean algebra (X, +, ·, ') satisfies
the cancellation law individually, prove that the 'simultaneous
cancellation law' holds, i.e. for any x, y, z ∈ X, x + y = x + z and
x·y = x·z together imply y = z.
1.9 If x_1, x_2, ..., x_n are elements of a Boolean algebra prove that
    x_1 + x_2 + ... + x_n = 0
iff x_i = 0 for all i = 1, 2, ..., n, and x_1·x_2·...·x_n = 1 iff x_i = 1 for
all i = 1, 2, ..., n.
1.10 If x, y are elements of a Boolean algebra, prove that x = y iff
xy' + x'y = 0. (This and the last result help in reducing any system
of equations to a single equation.)
1.11 Prove that the following system of statements:
(i) All happy, intelligent men are rich.
(ii) The only way for a poor man to be happy is to be intelligent.
(iii) Every rich man is either happy or married.
(iv) No married man is unhappy unless he is poor.
is equivalent to the single statement that a man is happy iff he is
rich.
1.12 A medical officer reported the following observations in a health
survey of a certain population:
(i) 40% of the population smoked cigarettes, 30% suffered from
cancer and 20% suffered from heart diseases.
(ii) All smokers suffering from heart diseases also suffered from
cancer.
(iii) Every person not suffering from heart diseases either refrained
from smoking or else had cancer.
Prove that the survey is a fake one.
1.13 A certain university offers four courses A, B, C and D and prescribes
the following rules for registration:
(i) Every student must register for at least two courses.
(ii) A student registering for course A must also register for one
of the courses B and C but not both.
(iii) Every student registering for course C must also register for at
least one of the courses B and D.
(iv) No student may register for courses A and C simultaneously.
(v) No student may register for courses C and D simultaneously
without registering for course B
Simplify the rules and prove that one of them is redundant (that
is, it is implied by the remaining four rules).
1.14 Suppose (X, g) is a lattice with a maximum element 1. Then a
maximal element of X — (l) is called a eo-atom. Prove that an ele-
ment :1 of a Boolean algebra is a co-atom ifl' its complement a' is an
atom. (Atoms and eo-atoms are dual concepts. Generally, the suffix
co- is used to name the dual concepts. Another such usage occurs
in the next exercise.) Prove that every element of a finite Boolean
algebra can be uniquely expressed as a product of co-atoms.
1.15 A subset A of a set S is called cofinlte, if its complement, S — A is
finite. Prove that:
(i) the union and the intersection of two cofinite sets is eofinite.
(ii) if S is infinite then every cofinite subset of S is infinite but not
every infinite subset is necessarily coflnite.
(iii) if S is countable then the set of all coflnite subsets of S is
countable. (of. Exercise (2.2.22) part (v).)
1.16 Let Y be the set of all subsets of N which are either finite or co-
flnite. Prove that Y is a subalgebra of the power set Boolean algebra
P(N).
1.17 Prove that Y, regarded as a Boolean algebra by itself, is not iso-
morphic to the full power set Boolean algebra of any set.
(Hint: Prove that there exists no setS such that |P(S) | equals R...)
1.18 Prove directly (without using Corollary (1.12)). that the number of
elements in a finite. nontrivial Boolean algabra is even. (Hint:
Pair off every element with its complement.)
1.1 Let (X, +, -, ’) be a Boolean algebra. For x, y E X define x A y
\D
as xy' + x’y. The binary operation A so defined is called symme- .
trlc ditterence or the exclusive or. A is also often denoted by e
and called the ring sum. Prove that:
(i) A is commutative, associate and has 0 as an identity.
(ii) Every element of X is invertible w.r.t. A.
(iii) If x, y are two-state devices then Ay is 1 when exactly one of
x and y equals 1. (Hence the name ‘exclusive or’.)
(iv) - is distributive over A,
(v) For all x e X, a” = xAl.
(vi) For all x, y e X, x + y = xAyA(x- y).
(Thus, + can be recovered from A).
1.20 Let S be any set and X a Boolean algebra. Let Y be the set of all
functions from S to X. Then Y is a Boolean algebra under pointwise
operations. If $f, g \in Y$, prove that $f \le g$ iff $f(s) \le g(s)$ in X
for all $s \in S$. (In other words, the partial order induced by pointwise
operations is the same as the pointwise partial order.) Fix some
$s_0 \in S$ and define $f : S \to X$ by
$$f(s) = \begin{cases} 0 & \text{for all } s \in S \text{ except } s = s_0, \\ \text{some atom of } X & \text{for } s = s_0. \end{cases}$$
Prove that f is an atom of Y.
1.21 Similar to the last exercise, obtain a description of the partial
order and of the atoms of the product of two Boolean algebras.
*1.22 Suppose X is a non-empty set, $\cdot$ is a binary operation on X and
$' : X \to X$ is a function such that (i) $\cdot$ is commutative and
associative, (ii) there exists an element 0 in X such that $x \cdot x' = 0$ for
all $x \in X$, (iii) for all $x, y \in X$, $x \cdot y = x$ iff $x \cdot y' = 0$. Prove
that a binary operation + on X can be so defined that $(X, +, \cdot, ')$
is a Boolean algebra. (This gives an alternate and highly compact
definition of a Boolean algebra.)
Notes and Guide to Literature
The definition of a Boolean algebra, as given here, is due to Huntington.
The lattice approach is equally popular, especially in logic. For more on
the definition in Exercise (1.22), see Dornhoff and Hohn [1]. A proof of the
Stone representation theorem for Boolean algebras may be found in
Kelley [1].
In the literature on Boolean algebras (and other algebraic structures)
the reader may often find additional axioms such as 'if $a, b \in X$ then
$a + b \in X$' and 'if $a = b$ then $a + c = b + c$', called respectively the
laws of closure and substitution (or uniqueness). This practice comes from
the days when a rigorous definition of a binary operation had not been
formulated. Now that we define a binary operation on a set X as a function
from $X \times X$ to X and require that a function always assume values in its
codomain and be single valued, it is redundant for us to include such
axioms.
2. Boolean Functions
It was remarked in the last section that Boolean algebras provide the
appropriate mathematical tool for handling two state devices. It often
happens that the state of some two state device depends upon the states of
some other two state devices. For example whether current flows through
an electrical circuit containing various switches depends upon the states of
these switches. The truth or falsehood of a complicated statement depends
upon that of several simpler statements. Such dependence can be described
by a function called a Boolean function. In this section we study the general
theory of such functions. Their applications to circuits and logic will be
considered in the next two sections.
We begin by giving a formal, mathematical name for a two-state device.
2.1 Definition: A Boolean variable is a variable which assumes only two
possible values, 0 and 1.
In other words, a Boolean variable takes values in the set $Z_2$. Since $Z_2$
is a Boolean algebra, we can define the addition, multiplication and
complements of Boolean variables. For example, if x, y are two Boolean
variables, $x + y$ is the Boolean variable which has value 1 when at least one of
x and y has value 1, and the value 0 otherwise (because in $Z_2$,
$0 + 1 = 1 + 0 = 1 + 1 = 1$ and $0 + 0 = 0$).
Conceptually, Boolean variables are similar to real variables with which
we are all too familiar. The obvious difference, of course, is that while a
real variable assumes infinitely many possible values, a Boolean variable
assumes only two possible values. Thus, a Boolean variable is the simplest
possible variable, because a variable which assumes only one possible value is
not a variable at all; it is a constant. A real variable is a continuous
variable while a Boolean variable is discrete.
Two Boolean variables x and y are called independent if each can
assume values independently of the other. This means any of the four
possibilities can occur, namely, $x = 0, y = 0$; $x = 0, y = 1$; $x = 1, y = 0$
and $x = 1, y = 1$. More generally, if $x_1, x_2, \ldots, x_n$ are n independent Boolean
variables, then together they can be assigned values in $2^n$ possible ways.
Note that two complementary variables x and x' can never be independent.
For real variables, we often replace two real variables, say x and y,
by a single variable (x, y) which takes values in the Cartesian plane $R \times R$,
or $R^2$. More generally, if we have n real variables, we can replace them
by a single variable ranging over the euclidean space $R^n$. A similar
construction is possible for Boolean variables. Let $Z_2^n$ denote the product set
$Z_2 \times Z_2 \times \cdots \times Z_2$ (n times). (Note that $Z_2^n$ is also the set of all binary
sequences of length n.) If $x_1, x_2, \ldots, x_n$ are Boolean variables, we can
replace them by a single sequential variable $(x_1, x_2, \ldots, x_n)$ taking values in
$Z_2^n$. Mutual independence of $x_1, x_2, \ldots, x_n$ is equivalent to saying that this
new variable $(x_1, x_2, \ldots, x_n)$ assumes all the $2^n$ values in $Z_2^n$. Analogous to the
concept of a real-valued function of several real variables, we have the
following definition.
2.2 Definition: A Boolean function of n variables is any function from
$Z_2^n$ to $Z_2$.
For example, suppose n = 3 and $f(x_1, x_2, x_3) = x_1 x_2 + x_2' x_3$. Then f is
a Boolean function of three variables. $f(1, 0, 1)$ is the value assumed by f
when $x_1 = 1$, $x_2 = 0$ and $x_3 = 1$. Since in this case $x_2' = 1$, we get
$$f(1, 0, 1) = 1 \cdot 0 + 1 \cdot 1 = 0 + 1 = 1.$$
Similarly $f(0, 1, 0) = 0$. Notice that computation of the values of a Boolean
function is considerably simple because 0 and 1 are the only two possible
values and also because the binary operations in the Boolean algebra $Z_2$
are so simple. For example, if we want to evaluate some function f which
is given by writing $f(x_1, x_2, \ldots, x_n)$ as a sum of terms, the moment we see
that at least one of these terms equals 1 for a particular choice of values of
the variables $x_1, \ldots, x_n$, it is unnecessary to evaluate the other terms
because no matter what they are, f will have value 1 for that particular
choice of values of the variables.
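As a small illustration of this remark, the function $x_1 x_2 + x_2' x_3$ can be evaluated mechanically. The following is only a sketch in Python, assuming that the values 0 and 1 are stored as ordinary integers; the name `f` and the two variable names at the end are chosen here for illustration. The operator `or` stops as soon as one term is found to be 1, which is exactly the saving described above.

```python
def f(x1, x2, x3):
    """Value of x1 x2 + x2' x3 in Z_2, for arguments that are 0 or 1."""
    # `or` short-circuits: if the first term is 1, the second is never evaluated.
    return int((x1 and x2) or ((not x2) and x3))


value_at_101 = f(1, 0, 1)  # the point worked out in the text
value_at_010 = f(0, 1, 0)
```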
There are two ways to denote a Boolean function f of n variables,
$x_1, x_2, \ldots, x_n$. The first method is to write the table of values of f, in which
we actually list down the values of f for all possible elements of the domain
set, namely $Z_2^n$. This is a workable proposition because the domain set is
finite. (Even in the case of a function of a real variable, such tables of
values are frequently prepared while drawing its graph. But such tables
can never be complete since the domain set is infinite.) In such a table
there are n + 1 columns, one for each $x_i$ and one for f. Sometimes some
auxiliary columns may be added as an aid to fill the column below f.
There are $2^n$ rows. In each row, we indicate one possible assignment of
the values 0 and 1 to the $x_i$'s by writing the value given to $x_i$ in the column
under $x_i$. In that row, the place in the column below f is filled by the value
of f for these values of the $x_i$'s. Although the order of the rows can be
arbitrary, one standard practice is to write them starting from (1, 1, ..., 1)
and then making changes from the right to the left, so that $x_n$ changes most
frequently and $x_1$ changes only once. In Figure 4.1 (a), we show the table
of values of the function f of three variables considered above, namely
$$f(x_1, x_2, x_3) = x_1 x_2 + x_2' x_3.$$
Row No.   x_1   x_2   x_3   f
   1       1     1     1    1
   2       1     1     0    1
   3       1     0     1    1
   4       1     0     0    0
   5       0     1     1    0
   6       0     1     0    0
   7       0     0     1    1
   8       0     0     0    0
Figure 4.1 (a): Table of Values of a Boolean Function
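The table of values can also be produced mechanically. The sketch below is one possible way of doing so in Python; the helper name `truth_table` is hypothetical, and the row order relies on `itertools.product` varying the last coordinate fastest, which matches the standard practice of starting from (1, 1, ..., 1) and changing the rightmost variable most frequently.

```python
from itertools import product


def truth_table(func, n):
    """Rows (x_1, ..., x_n, f) starting from (1, ..., 1), last variable changing fastest."""
    rows = []
    for point in product((1, 0), repeat=n):
        rows.append(point + (func(*point),))
    return rows


def f(x1, x2, x3):
    # x1 x2 + x2' x3, with + as OR, product as AND and ' as 1 - x
    return (x1 & x2) | ((1 - x2) & x3)


table = truth_table(f, 3)
for number, row in enumerate(table, start=1):
    print(number, *row)
```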
There is another way to write the table of values of a Boolean function,
which is a little more concise. The idea is to replace several (say k) Boolean
variables by a single variable taking $2^k$ possible values. For example,
instead of the two Boolean variables $x_1$ and $x_2$, we treat $(x_1, x_2)$ as a single
variable, taking 4 possible values, namely (1, 1), (1, 0), (0, 1) and (0, 0).
Given a function f of n Boolean variables $x_1, \ldots, x_n$, we take a suitable k
(usually an integer close to n/2). We consider $(x_1, \ldots, x_k)$ as a single
variable taking $2^k$ values and $(x_{k+1}, \ldots, x_n)$ as another single variable taking
$2^{n-k}$ values. The function f may then be thought of as a function of these
two variables and consequently its values can be listed in a table with $2^k$
rows and $2^{n-k}$ columns, similar to the table of a binary operation. For
example, consider the function $f(x_1, x_2, x_3) = x_1 x_2 + x_2' x_3$ for which a table
was constructed above. Here n = 3. We take k = 2. Then another table for
f can be drawn as shown in Fig. 4.1 (b). Both the tables, of course, convey
the same information.
(x_1, x_2) \ x_3    1    0
(1, 1)              1    1
(1, 0)              1    0
(0, 1)              0    0
(0, 0)              1    0
Figure 4.1 (b): Concise Table for a Boolean Function
Even with the concise version, it becomes impracticable to draw the
table of values of a Boolean function of n variables even for relatively small
values of n. So we look for another, compact method. Such a method is
provided by giving an algebraic formula for the function. In Chapter 2,
Section 1 we emphasised that a function should not be confused with a
formula for it. Indeed for functions of real variables, such closed formulas
may not always exist. As a well-known example take the function $f : R \to R$
defined by f(x) = the smallest prime greater than x. This is a perfectly
well-defined function but there is no known formula expressing f(x) as something
like $\exp(-\sin^2 x + x)$.
However, things are much better for Boolean functions as we prove in
the following theorem. First we introduce a shorthand notation. Let
$x_1, x_2, \ldots, x_n$ be Boolean variables which are mutually independent. For
$i = 1, 2, \ldots, n$ let $f_i : Z_2^n \to Z_2$ be the Boolean function defined by
$$f_i(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } x_i = 1 \\ 0 & \text{if } x_i = 0 \end{cases}$$
In other words, $f_i(x_1, x_2, \ldots, x_n) = x_i$. For this reason we shall denote $f_i$ by
$x_i$ itself. This double role of $x_i$ (once as a Boolean variable and again as a
Boolean function) should cause no confusion. (The function $f_i$ is often
called the projection on the ith factor and denoted by $\pi_i$.)
We now prove what may be called the structure theorems for Boolean
functions.
2.3 Theorem: Let
$$x_1, x_2, \ldots, x_n$$
be mutually independent Boolean variables. Then there are $2^{2^n}$ Boolean
functions of these n variables. The totality of such functions constitutes a
Boolean algebra. The atoms of this algebra are the $2^n$ functions of the
form
$$x_1^{\epsilon_1} x_2^{\epsilon_2} \cdots x_n^{\epsilon_n}$$
where each $x_i^{\epsilon_i}$ is either $x_i$ or $x_i'$ (with $x_i$ interpreted as above).
Proof: Let X be the set of all Boolean functions of the n variables. Every
element of X is a function from $Z_2^n$ to $Z_2$. Since $|Z_2^n| = 2^n$, it follows from
Theorem (2.2.14) that $|X| = 2^{2^n}$. $Z_2$ is a Boolean algebra. So under
pointwise operations (which we denote by +, $\cdot$ and ') X is a Boolean
algebra. We have to find the atoms, i.e., the minimal non-zero elements
in X. First we show that all functions of the form
$$x_1^{\epsilon_1} x_2^{\epsilon_2} \cdots x_n^{\epsilon_n}$$
are atoms. Let g be one such function. For notational simplicity we suppose
$$g = x_1 x_2 \cdots x_k x_{k+1}' \cdots x_n'$$
for some k. (In all other cases the argument is the same except for
notations.) Note that as a function from $Z_2^n$ to $Z_2$, g vanishes at all points of
$Z_2^n$ except the point (1, ..., 1, 0, ..., 0) where the first k entries are 1 and
the remaining $n - k$ entries are 0 (cf. Exercise (1.9)). For brevity let us call
this point $e_k$. Then $g(e_k) = 1$ but $g(x) = 0$ for all $x \in Z_2^n - \{e_k\}$. So g is
not an identically zero function. Hence g is a non-zero element of X. To
show it is a minimal non-zero element, suppose $f \in X$ and $f \le g$. Then
$f(x) \le g(x)$ for all $x \in Z_2^n$ (cf. Exercise (1.20)). But $g(x) = 0$ for all $x \ne e_k$.
So $f(x) = 0$ for all $x \ne e_k$. If $f(e_k) = 0$, then f is the zero element of X
and if $f(e_k) = 1$ then f = g. Thus g is not properly larger than any
non-zero element of X. In other words g is a minimal non-zero element of X, i.e.,
an atom of X. Thus every element of the form
$$x_1^{\epsilon_1} x_2^{\epsilon_2} \cdots x_n^{\epsilon_n}$$
where each $x_i^{\epsilon_i}$ is either $x_i$ or $x_i'$ is an atom of X. Since there are two
possibilities for each i, we get $2^n$ such elements. But by Corollary (1.12), the
number of atoms in a Boolean algebra with $2^{2^n}$ elements is precisely $2^n$.
Since we have already constructed $2^n$ atoms in X, it follows that there can
be no other atoms. ∎
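The key fact used in the proof, that each product of the above form takes the value 1 at exactly one point of the domain, can be checked by brute force for a small n. The following Python sketch does this for n = 3; the helper `minterm` and the encoding of a product by a tuple of 0s and 1s (1 for $x_i$, 0 for $x_i'$) are assumptions made here for illustration.

```python
from itertools import product


def minterm(signs):
    """The product x_1^{e_1} ... x_n^{e_n}; signs[i] = 1 stands for x_i, 0 for x_i'."""
    def g(*xs):
        # x_i equals 1 iff the argument is 1; x_i' equals 1 iff the argument is 0.
        return int(all(x == s for x, s in zip(xs, signs)))
    return g


n = 3
points = list(product((1, 0), repeat=n))

# For every product, count the points at which it takes the value 1.
support_sizes = [sum(minterm(signs)(*p) for p in points) for signs in points]

number_of_atoms = len(points)          # 2^n
number_of_functions = 2 ** (2 ** n)    # 2^(2^n)
```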
Now that we know the atoms of X, we can apply the theory proved
in the last section. We record the result as a theorem.
2.4 Theorem: Every Boolean function of n variables $x_1, x_2, \ldots, x_n$ can be
uniquely expressed as a sum of terms of the form
$$x_1^{\epsilon_1} x_2^{\epsilon_2} \cdots x_n^{\epsilon_n}$$
where each $x_i^{\epsilon_i}$ is either $x_i$ or $x_i'$. Specifically, for every element, say,
$$y = (y_1, y_2, \ldots, y_n)$$
of $Z_2^n$, let $f_y$ be the Boolean function from $Z_2^n$ to $Z_2$ defined by $f_y(y) = 1$
and $f_y(z) = 0$ for all $z \in Z_2^n - \{y\}$, and let $x^y$ be the term
$$x_1^{\epsilon_1} x_2^{\epsilon_2} \cdots x_n^{\epsilon_n}$$
where $x_i^{\epsilon_i} = x_i$ if $y_i = 1$ and $x_i^{\epsilon_i} = x_i'$ if $y_i = 0$. (For example, for n = 5
and
$y = (1, 0, 1, 0, 0)$, $x^y = x_1 x_2' x_3 x_4' x_5'$.)
Then every function
$$f : Z_2^n \to Z_2$$
equals $\sum x^y$ where the sum extends over all y such that $f(y) = 1$.
Proof: As proved above, elements of the form $f_y$ are precisely the atoms
of X, the Boolean algebra of all Boolean functions of $x_1, x_2, \ldots, x_n$. Note
that a function $f : Z_2^n \to Z_2$ contains the atom $f_y$ iff $f(y) = 1$ (see again
Exercise (1.19)). Hence by Proposition (1.10), f equals $\sum f_y$ where the sum
extends over all $y \in Z_2^n$ such that $f(y) = 1$. But, because of our
understanding to let $x_i$ denote the function which takes $(x_1, x_2, \ldots, x_n)$ to $x_i$, it
is easily seen that $f_y$ is nothing but $x^y$. Hence $f = \sum x^y$.
As a concrete example, consider the function f whose table of values is
shown in Figure 4.1(a). This function takes the value 1 at four points of
$Z_2^3$, namely,
(1, 1, 1), (1, 1, 0), (1, 0, 1) and (0, 0, 1).
The corresponding atoms are
$x_1 x_2 x_3$, $x_1 x_2 x_3'$, $x_1 x_2' x_3$ and $x_1' x_2' x_3$.
Hence f equals the sum of these four atoms, i.e.,
$$f = x_1 x_2 x_3 + x_1 x_2 x_3' + x_1 x_2' x_3 + x_1' x_2' x_3.$$
Upon simplification, this reduces to
$$x_1 x_2 + x_2' x_3.$$
Thus
$$f(x_1, x_2, x_3) = x_1 x_2 + x_2' x_3$$
for all
$x_1, x_2, x_3 \in Z_2$.
This was, of course, the original definition of f. What we have shown is
that even if we do not know the original formula for f, but simply know
(either from the table of values or by some other method) the points where
f takes the value 1, then we can always get an algebraic expression for f by
adding the corresponding atoms. Of course, the expression so obtained
may not be the simplest possible. As in the example above, we can often
reduce it by combining suitable terms and using the laws proved for
Boolean algebras. The point is that, unlike functions of real variables,
Boolean functions always have an algebraic formula. Indeed a few authors
define a Boolean function of n Boolean variables $x_1, \ldots, x_n$ as an algebraic
expression of these variables involving the operations +, $\cdot$ and '. We
refrain from doing so because the concept of a function is a general one
and although in a particular instance it coincides with a formula, this is no
reason to change the definition.
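The recipe of adding the atoms corresponding to the points where f takes the value 1 is easily mechanised. The Python sketch below is one way of doing it, under the assumption that terms are simply written out as strings such as `x1x2'x3`; the helper name `dnf_terms` is hypothetical.

```python
from itertools import product


def dnf_terms(func, n):
    """One term per point where func equals 1: x_i if the entry is 1, x_i' if it is 0."""
    terms = []
    for point in product((1, 0), repeat=n):
        if func(*point) == 1:
            term = "".join(
                f"x{i}" if bit else f"x{i}'"
                for i, bit in enumerate(point, start=1)
            )
            terms.append(term)
    return terms


def f(x1, x2, x3):
    # the function x1 x2 + x2' x3 of Figure 4.1 (a)
    return (x1 & x2) | ((1 - x2) & x3)


terms_of_f = dnf_terms(f, 3)
print(" + ".join(terms_of_f))
```

The four terms produced are the four atoms listed in the concrete example above.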
Although the algebraic expression for a Boolean function, as a sum of
certain atomic functions is generally not the shortest possible, it has
certain advantages. First, it can be written down just by inspection from
the table of values. Secondly, it has a certain regularity of form. It consists
of a sum of terms, each term has exactly the same number of factors (equal
to the number of variables), the order of these factors is regulated and
each factor is very simple, either some $x_i$ or $x_i'$. Because of this regularity
of form, the expression is given a special name.
2.5 Definition: The algebraic expression for a Boolean function of n
variables as a sum of atoms (as given by Theorem (2.4)) is called the
disjunctive normal form (abbreviated D.N.F.) of that function. The disjunc-
tive normal form of the function which identically equals 1 is called the
complete disjunctive normal form in n variables.
The word ‘disjunctive’ here refers to summation, which is more formally
called disjunction. The name comes because the function is expressed as a
sum of terms each having a certain ‘normal’ form.
As noted above, the disjunctive normal form of a function can be
written down by an inspection of its table of values. Conversely, given
the disjunctive normal form of a function of n variables we know
immediately at what points of $Z_2^n$ it assumes the value 1. Since at all other
points the function must vanish, we can construct the table of values of
the function. Because of this, we may regard the disjunctive normal form
of a function as a compact version of its table of values. For example, if
$$f(x_1, x_2, x_3, x_4) = x_1 x_2 x_3' x_4 + x_1' x_2 x_3' x_4 + x_1 x_2' x_3 x_4' + x_1' x_2' x_3' x_4 + x_1' x_2' x_3' x_4'$$
then f is in its disjunctive normal form and we see that f assumes the value
1 for 5 points in $Z_2^4$, namely, (1, 1, 0, 1), (0, 1, 0, 1), (1, 0, 1, 0), (0, 0, 0, 1)
and (0, 0, 0, 0). Therefore, in the table of values of f, in the column under
f, 1 will occur at 5 places, that is, in the rows corresponding to these 5
points. All the remaining 11 places will be filled by 0's.
Sometimes we are already given some function f in an algebraic form
(not necessarily in the disjunctive normal form) and we want to cast it
into its D.N.F. One method is to write the table of values of f. But this is
often too tedious. So we look for other methods. We first write f as a sum
of monomials, where by a monomial we mean a product in which every
factor is some variable or its complement. For example $x_1 x_2' x_4$ is a
monomial but $x_1(x_2' + x_3)x_4$ is not a monomial by itself; it is a sum of two
monomials, $x_1 x_2' x_4 + x_1 x_3 x_4$. Because $x_i \cdot x_i' = 0$ we ignore those
monomials in which some variable and its complement both occur. Also because
$x_i x_i = x_i$ (by the law of tautology), if the same variable occurs more than once
in a monomial, we retain only its first occurrence. Of course, not every
variable need appear in a given monomial. If neither $x_i$ nor $x_i'$ appears in a
monomial, we multiply the monomial by $x_i + x_i'$ (which equals 1) and split
it into two monomials, one of which contains $x_i$ and the other contains $x_i'$
and both of which contain all other variables occurring in the original
monomial. We repeat this process till every monomial in the original
expression of f is expressed as a sum of monomials in which every variable occurs.
Now once again apply the law of tautology to weed out the repetitions of
monomials. The resulting expression is the disjunctive normal form of the
given function. As an illustration we do the following problem.
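The procedure just described can be sketched in Python as follows. The encoding is an assumption made here for illustration: a monomial is a list of literals (i, b), where (i, 1) stands for $x_i$ and (i, 0) for $x_i'$, and each term of the D.N.F. is returned as the point of the domain at which it equals 1. The helper name `to_dnf` is hypothetical.

```python
from itertools import product


def to_dnf(monomials, n):
    """Expand a sum of monomials in x_1, ..., x_n into the set of D.N.F. terms.

    Each term is represented by the point (y_1, ..., y_n) at which it equals 1.
    """
    minterms = set()  # a set, so repeated terms are weeded out automatically
    for literals in monomials:
        fixed = {}
        contradictory = False
        for i, b in literals:
            # A repeated literal is harmless (law of tautology);
            # x_i together with x_i' makes the monomial 0.
            if fixed.setdefault(i, b) != b:
                contradictory = True
        if contradictory:
            continue
        # Multiply by (x_i + x_i') for every missing variable and split.
        missing = [i for i in range(1, n + 1) if i not in fixed]
        for bits in product((1, 0), repeat=len(missing)):
            full = dict(fixed)
            full.update(zip(missing, bits))
            minterms.add(tuple(full[i] for i in range(1, n + 1)))
    return minterms


# x1 x2' x4 + x1 x3 x4 in four variables
example = to_dnf([[(1, 1), (2, 0), (4, 1)], [(1, 1), (3, 1), (4, 1)]], 4)
```

Here the term $x_1 x_2' x_3 x_4$ arises from both monomials but is recorded only once.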
2.6 Problem: Write the following Boolean functions in their disjunctive
normal forms:
(i) $f(x_1, x_2, x_3) = (x_1 + x_2')x_3' + x_2 x_1'(x_3 + x_1' x_2)$
(ii) $g(a, b, c) = (a + b + c)(a' + b + c')(a + b' + c')(a' + b' + c')(a + b + c')$
Solution:
(i) $f(x_1, x_2, x_3) = x_1 x_3' + x_2' x_3' + x_2 x_1' x_3 + x_2 x_1' x_1' x_2$
$= x_1 x_2 x_3' + x_1 x_2' x_3' + (x_1 + x_1') x_2' x_3' + x_1' x_2 (x_3 + x_3') + x_1' x_2 x_3$
$= x_1 x_2 x_3' + x_1 x_2' x_3' + x_1 x_2' x_3' + x_1' x_2' x_3' + x_1' x_2 x_3 + x_1' x_2 x_3' + x_1' x_2 x_3$
$= x_1 x_2 x_3' + x_1 x_2' x_3' + x_1' x_2 x_3 + x_1' x_2 x_3' + x_1' x_2' x_3'$
This is the disjunctive normal form.
(ii) It would be too cumbersome to multiply the 5 factors out using
distributivity of $\cdot$ over +, because this would initially give a sum
of $3^5$, i.e. 243 terms. But we can use distributivity of + over $\cdot$. Then
the product of the first and the fifth factor is $a + b$ while that of
the second and the fourth is $a' + c'$. Because of tautology, the
same factor may be used again. So the product of the third and the
fourth factor is $b' + c'$.
Hence $g(a, b, c) = (a + b)(a' + c')(b' + c')$
$= (a + b)(a'b' + c')$
$= aa'b' + ba'b' + ac' + bc'$
$= ac' + bc'$
$= a(b + b')c' + (a + a')bc'$
$= abc' + ab'c' + abc' + a'bc'$
$= abc' + ab'c' + a'bc'$. ∎
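Answers of this kind are easily checked by comparing tables of values. The Python sketch below evaluates $f = (x_1 + x_2')x_3' + x_2 x_1'(x_3 + x_1' x_2)$ and $g = (a + b + c)(a' + b + c')(a + b' + c')(a' + b' + c')(a + b + c')$, as read from the statement of the problem, at all eight points and collects the points where each equals 1; the helper names `NOT` and `ones` are chosen here for illustration.

```python
from itertools import product


def NOT(x):
    return 1 - x


def f(x1, x2, x3):
    return ((x1 | NOT(x2)) & NOT(x3)) | (x2 & NOT(x1) & (x3 | (NOT(x1) & x2)))


def g(a, b, c):
    return ((a | b | c) & (NOT(a) | b | NOT(c)) & (a | NOT(b) | NOT(c))
            & (NOT(a) | NOT(b) | NOT(c)) & (a | b | NOT(c)))


def ones(func, n):
    """Points of the domain at which func equals 1, i.e. the terms of its D.N.F."""
    return {p for p in product((1, 0), repeat=n) if func(*p) == 1}


f_ones = ones(f, 3)  # five points, one for each term of the D.N.F. of f
g_ones = ones(g, 3)  # three points: abc', ab'c', a'bc'
```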
We recall once again that the disjunctive normal form arises by applying
the theory of atoms developed in the last section to the Boolean algebra of
Boolean functions of Boolean variables. In the proof of Theorem (1.11) we
saw how to add, multiply and take complements of elements expressed as
sums of atoms. This leads to the proof of the following result.
2.7 Proposition: Let two functions f and g of the same n Boolean
variables be expressed in their respective disjunctive normal forms. Then the
D.N.F. of $f + g$ is obtained by taking the sum of those terms which appear
in the D.N.F. of at least one of f and g; the D.N.F. of $f \cdot g$ is obtained by
summing terms which are common to the D.N.F.'s of both f and g; and the
D.N.F. of f' is obtained by omitting from the complete D.N.F. the terms
which appear in the D.N.F. of f.
Proof: As already noted, this is merely a special case of the proof of
Theorem (1.11). For the last statement, note that the function which
identically equals 1 is precisely the element 1 (i.e., the identity of multiplication)
of the Boolean algebra of Boolean functions of n variables. By (iii) in
Proposition (1.10), this function is the sum of all atoms, which is precisely the
complete disjunctive normal form. ∎
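If each term of a D.N.F. is identified with the point at which it equals 1, the proposition says that sum, product and complement of functions become union, intersection and complement of sets of points. A minimal Python sketch under that assumption follows; the three helper names are hypothetical, and the two sample sets are the D.N.F.'s of $x_1 x_2 + x_2' x_3$ and of $x_1 x_3' + x_2 x_3'$.

```python
from itertools import product


def dnf_sum(F, G):
    """Terms appearing in the D.N.F. of at least one of the two functions."""
    return F | G


def dnf_product(F, G):
    """Terms common to both D.N.F.'s."""
    return F & G


def dnf_complement(F, n):
    """Terms of the complete D.N.F. in n variables that do not appear in F."""
    return set(product((1, 0), repeat=n)) - F


F = {(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 0, 1)}   # x1 x2 + x2' x3
G = {(1, 1, 0), (1, 0, 0), (0, 1, 0)}              # x1 x3' + x2 x3'

sum_terms = dnf_sum(F, G)
product_terms = dnf_product(F, G)
complement_terms = dnf_complement(F, 3)
```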
The dual concept of a disjunctive normal form is the conjunctive normal
form. By Exercise (1.14), every element of X can be uniquely expressed as
a product of co-atoms of X. Since co-atoms are precisely the complements
of atoms, it follows that the co-atoms of X are all functions of the form
$x_1^{\epsilon_1} + x_2^{\epsilon_2} + \cdots + x_n^{\epsilon_n}$ where each $x_i^{\epsilon_i}$ is either $x_i$ or $x_i'$. When a Boolean
function of $x_1, x_2, \ldots, x_n$ is expressed as a product of factors of this form the
function is said to be expressed in its conjunctive normal form, abbreviated
C.N.F. Each factor in the C.N.F. of a function f corresponds to one point
of $Z_2^n$ where f vanishes. Therefore, like the D.N.F., the C.N.F. of a function
can be written down by an inspection of its table of values. For example,
for the function $f(x_1, x_2, x_3) = x_1 x_2 + x_2' x_3$, we see from its table in Figure
4.1 (a) that f vanishes at four points in $Z_2^3$, namely, (1, 0, 0), (0, 1, 1), (0, 1, 0)
and (0, 0, 0). Take the point (1, 0, 0). It corresponds to the factor
$x_1' + x_2 + x_3$, because this is the only co-atom which vanishes when $x_1 = 1$,
$x_2 = 0$ and $x_3 = 0$. Similarly, we determine the other factors and get
$$f = (x_1' + x_2 + x_3)(x_1 + x_2' + x_3')(x_1 + x_2' + x_3)(x_1 + x_2 + x_3).$$
We could also get this directly from
$$f(x_1, x_2, x_3) = x_1 x_2 + x_2' x_3$$
by dualising the procedure for obtaining the D.N.F. Thus,
$f(x_1, x_2, x_3) = x_1 x_2 + x_2' x_3$
$= (x_1 x_2 + x_2')(x_1 x_2 + x_3)$ (by distributive laws)
$= (x_1 + x_2')(x_1 + x_3)(x_2 + x_3)$
$= (x_1 + x_2' + x_3)(x_1 + x_2' + x_3')(x_1 + x_2 + x_3)(x_1 + x_2' + x_3)(x_1 + x_2 + x_3)(x_1' + x_2 + x_3)$
$= (x_1' + x_2 + x_3)(x_1 + x_2' + x_3')(x_1 + x_2' + x_3)(x_1 + x_2 + x_3).$
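Writing down the C.N.F. by inspection of the table is the exact dual of the procedure for the D.N.F. In the Python sketch below, which again writes factors as plain strings and uses the hypothetical helper name `cnf_factors`, each point where the function vanishes contributes the one co-atom that vanishes there: $x_i'$ if the entry is 1 and $x_i$ if it is 0.

```python
from itertools import product


def cnf_factors(func, n):
    """One factor per point where func equals 0; the factor vanishes exactly there."""
    factors = []
    for point in product((1, 0), repeat=n):
        if func(*point) == 0:
            literals = [
                f"x{i}'" if bit else f"x{i}"
                for i, bit in enumerate(point, start=1)
            ]
            factors.append("(" + " + ".join(literals) + ")")
    return factors


def f(x1, x2, x3):
    # x1 x2 + x2' x3
    return (x1 & x2) | ((1 - x2) & x3)


factors_of_f = cnf_factors(f, 3)
print("".join(factors_of_f))
```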
Properties of the conjunctive normal form are dual to those of the
disjunctive normal form and hence will not be stated separately. Because
of the principle of duality for Boolean algebras, the two forms are really
equivalent. But in a particular context, it may be more advantageous to
express a given function in its C.N.F. than in its D.N.F. or vice versa.
Note that for a function of n variables, if its D.N.F. has r terms then its
C.N.F. will have $2^n - r$ factors. Here r also equals the number of points
in $Z_2^n$ where the function takes the value 1. So, if a function vanishes at all
except a few points, it is more economical to take its D.N.F. than its
C.N.F., but if it vanishes at only a few points, the situation is reversed.
There is also a way to convert either of the two forms to the other
using double complementation. We illustrate it for the function f in Problem
2.6. There we obtained
$$f = x_1 x_2 x_3' + x_1 x_2' x_3' + x_1' x_2 x_3 + x_1' x_2 x_3' + x_1' x_2' x_3'.$$
Then by Proposition (2.7), f' is obtained by omitting these five terms
from the complete D.N.F. in the three variables (which has 8 terms). This
gives $f' = x_1 x_2 x_3 + x_1 x_2' x_3 + x_1' x_2' x_3$. We now again take complements,
but apply De Morgan's laws to the right hand side. Thus
$$(f')' = (x_1' + x_2' + x_3')(x_1' + x_2 + x_3')(x_1 + x_2 + x_3').$$
But $(f')' = f$ and thus we have obtained the C.N.F. of f.
Having discussed the generalities about Boolean functions, let us now
illustrate how they arise in real life problems. More examples will come
in the next two sections. But some initiation can be done right now.
Let us take the Locks Problem. As before, denote the five persons by
$p_1, p_2, p_3, p_4$ and $p_5$. Each person $p_i$ can either give his assent to open the
box (which means he hands over all the keys in his possession) or he may
refuse to do so. These being the only two possibilities, we associate a
Boolean variable $x_i$ to represent them. We set $x_i = 1$ if $p_i$ agrees to open
the box and $x_i = 0$ if he does not. The box also has two states, namely
open or closed, and we use the Boolean variable b to denote the state of
the box, $b = 1$ if the box is open and $b = 0$ if not. The problem now is to
express b as a function of the variables $x_1, \ldots, x_5$ in such a way that b will
be 1 iff at least three of the $x_i$'s are equal to 1. From this we can mentally
prepare the table of values of b and write b in its disjunctive normal form,
giving $b = b_1 + b_2 + b_3$ where $b_1$ is the sum of ten terms of the form
$x_1 x_2 x_3 x_4' x_5'$ (with three variables without ' and two with '), $b_2$ is the sum of
5 terms of the form $x_1 x_2 x_3 x_4 x_5'$ and $b_3 = x_1 x_2 x_3 x_4 x_5$.
We have not yet said anything about the locks. Indeed, the discussion
so far has been completely independent of the manner by which each
person signifies his assent or dissent to open the box. In the next section,
we shall consider the case where the assent is expressed by pressing an
electric switch and the outcome (that is, the state of the box) will be
indicated by the lighting of some lamp. For the moment, let us do the
problem of designing a system of locks and keys. Each lock L is also a
two state device and hence can be represented by a Boolean variable, say
y, which we set to 1 if L is open and 0 if L is closed. Now if the keys of
a particular lock are with persons $p_{i_1}, \ldots, p_{i_r}$ (say), then the corresponding
variable y equals 1 iff at least one of $x_{i_1}, \ldots, x_{i_r}$ equals 1 (regardless of the
states of the other $x_i$'s). So $y = x_{i_1} + x_{i_2} + \cdots + x_{i_r}$. The box will open iff
all the locks are open. Hence, b (that is, the Boolean variable representing
the state of the box) must be the product of Boolean functions representing
the states of the locks. So the problem will be solved if we factor b (for
which an expression was obtained above) into factors of the form
$(x_{i_1} + x_{i_2} + \cdots + x_{i_r})$. Corresponding to each such factor there will be a lock whose
keys will be given to $p_{i_1}, p_{i_2}, \ldots, p_{i_r}$. If b cannot be factorised into factors
of this form it means the problem has no solution.
To see whether b has a factorisation of the desired form, we first
express b in its conjunctive normal form. We convert the D.N.F. of b
obtained above as $b_1 + b_2 + b_3$ to the C.N.F. using double
complementation as illustrated earlier. We omit the details of the computation but
write the final answer as $b = c_1 \cdot c_2 \cdot c_3$ where $c_1$ is the product of ten factors
of the form $(x_1 + x_2 + x_3 + x_4' + x_5')$, $c_2$ is the product of five factors of
the form $(x_1 + x_2 + x_3 + x_4 + x_5')$ and $c_3 = x_1 + x_2 + x_3 + x_4 + x_5$. Now
we group together these 16 factors using distributivity of + over $\cdot$ and also
the law of tautology (which allows us to use the same factor again and
again). As a sample, the product of the four factors
$(x_1 + x_2 + x_3 + x_4' + x_5')$, $(x_1 + x_2 + x_3 + x_4' + x_5)$,
$(x_1 + x_2 + x_3 + x_4 + x_5')$ and $(x_1 + x_2 + x_3 + x_4 + x_5)$
comes out as $(x_1 + x_2 + x_3 + x_4')(x_1 + x_2 + x_3 + x_4)$ which equals
$x_1 + x_2 + x_3$. Similarly for every distinct i, j, k we can get $x_i + x_j + x_k$ as
the product of four factors in the C.N.F. of b. Thus b ultimately comes
out to be equal to the product of 10 factors of the form $x_i + x_j + x_k$
where i, j, k run through triples of distinct indices from 1 to 5.
Consequently there will be ten locks and each lock will have 3 keys. This is, of
course, exactly the same answer as obtained in Chapter 2, Section 3. But
the present method makes a systematic use of Boolean functions and
therefore can be applied to more general problems.
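The final factorisation can be confirmed by brute force over all 32 assignments. The Python sketch below assumes that persons are numbered 0 to 4 and that a lock is recorded as the triple of persons holding its keys; the function names are chosen here for illustration.

```python
from itertools import combinations, product


def box_opens(x):
    """b as specified: 1 iff at least three of the five persons assent."""
    return int(sum(x) >= 3)


# One lock for every triple of persons; its keys go to those three persons.
locks = list(combinations(range(5), 3))


def all_locks_open(x):
    """Product over all locks of the sums x_i + x_j + x_k."""
    return int(all(any(x[i] for i in holders) for holders in locks))


agree = all(
    box_opens(x) == all_locks_open(x)
    for x in product((0, 1), repeat=5)
)
```

Every triple contains an assenting person exactly when at most two persons dissent, which is why the two functions agree.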
As in this problem, the data often has a certain symmetry. In the Locks
Problem, whether the box can be opened or not depends only on how many
of the five persons want to open it and not on which particular persons
want to open it. (We can, of course, change the problem, say, by giving
some special powers to one of the persons. It would then no longer be
symmetric in all the five persons.) The Boolean functions arising in such
problems have a particularly simple form whose advantage will be more
apparent in the next section. In the present section we study this symmetry
condition. The motivation for the following definition comes from the
fact that in the Locks Problem, if any two persons exchange their sets of
keys, the solution to the problem remains unaffected.
2.8 Definition: A Boolean function f of n variables $x_1, x_2, \ldots, x_n$ is said
to be symmetric w.r.t. a pair of variables $x_i$ and $x_j$ if the value of f is
unaffected by interchanging $x_i$ and $x_j$, that is, (assuming $i < j$ without
loss of generality) if for all values of $x_1, x_2, \ldots, x_n$ we have
$$f(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_{j-1}, x_j, x_{j+1}, \ldots, x_n) = f(x_1, \ldots, x_{i-1}, x_j, x_{i+1}, \ldots, x_{j-1}, x_i, x_{j+1}, \ldots, x_n).$$
If f is symmetric w.r.t. every pair of variables out of $x_1, \ldots, x_n$ then f is
called symmetric among these variables (or simply symmetric).
For example, the function $x_1 x_2 + x_3$ is symmetric w.r.t. $x_1$ and $x_2$ but
not w.r.t. $x_2$ and $x_3$ nor w.r.t. $x_1$ and $x_3$. The function
$x_1 x_2 x_3 + x_4'(x_1' + x_2' + x_3')$ is symmetric among $x_1$, $x_2$ and $x_3$ but not among all the four
variables. The function b in the Locks Problem is a symmetric function of
all the five variables $x_1, \ldots, x_5$. The function $x_1 x_2' + x_2 x_3' + x_3 x_1'$ is also
symmetric in the three variables. This is not immediately obvious. But if
we rewrite $x_1 x_2' + x_2 x_3' + x_3 x_1'$ in its disjunctive normal form as
$$x_1 x_2' x_3 + x_1 x_2 x_3' + x_1' x_2 x_3 + x_1 x_2' x_3' + x_1' x_2' x_3 + x_1' x_2 x_3'$$
we see that it is symmetric.
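For a small number of variables, symmetry can also be tested directly from the definition by trying every interchange at every point. The following Python sketch does so; the helper names `is_symmetric_in` and `is_symmetric`, and the names `h` and `cyc` for the two sample functions $x_1 x_2 + x_3$ and $x_1 x_2' + x_2 x_3' + x_3 x_1'$, are chosen here for illustration.

```python
from itertools import combinations, product


def is_symmetric_in(func, n, i, j):
    """True if func is unaffected by interchanging x_i and x_j (1-based indices)."""
    for p in product((0, 1), repeat=n):
        q = list(p)
        q[i - 1], q[j - 1] = q[j - 1], q[i - 1]
        if func(*p) != func(*q):
            return False
    return True


def is_symmetric(func, n):
    """True if func is symmetric w.r.t. every pair of its n variables."""
    return all(
        is_symmetric_in(func, n, i, j)
        for i, j in combinations(range(1, n + 1), 2)
    )


def h(x1, x2, x3):
    return (x1 & x2) | x3


def cyc(x1, x2, x3):
    return (x1 & (1 - x2)) | (x2 & (1 - x3)) | (x3 & (1 - x1))
```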
More generally, whether a function is symmetric or not can be easily
decided by writing it in its D.N.F. (C.N.F. would do equally well). Suppose
f is written in its D.N.F. and $x_1 x_2 \cdots x_r x_{r+1}' \cdots x_n'$ is one of the terms,
where r is some integer, $0 \le r \le n$. Let $e_r = (1, \ldots, 1, 0, \ldots, 0)$ be the
binary sequence whose first r entries are all 1 and the remaining $n - r$
entries are all 0. Then $f(1, \ldots, 1, 0, \ldots, 0) = f(e_r) = 1$. Now because of
symmetry, we can interchange any two terms in the sequence $e_r$ and f will
still assume the same value, namely 1, at this new sequence. By performing
a series of such interchanges of two terms at a time, we can transform $e_r$
to any binary sequence of length n in which exactly r terms are 1 and the
remaining $n - r$ terms are 0. (A formal proof of this fact will be given in
a later chapter. For the moment the reader can convince himself by trying
a few cases. For example (1, 1, 0, 0, 0) can be transformed into (0, 1, 0, 1, 0)
by letting it go through (1, 0, 1, 0, 0), (0, 1, 1, 0, 0) and (0, 1, 0, 1, 0), there being
only one interchange at any stage. Alternatively, we can modify the
definition of a symmetric function so as to require (redundantly) that it
be invariant under any permutation of the variables.) The number of
sequences with r 1's and $(n - r)$ 0's is $\binom{n}{r}$. By symmetry, f takes the value
1 at all of them. Accordingly, the D.N.F. of f will contain all the $\binom{n}{r}$ terms
of the form $x_1^{\epsilon_1} \cdots x_n^{\epsilon_n}$ where $x_i^{\epsilon_i}$ equals $x_i$ for exactly r values of i and
equals $x_i'$ for the remaining $(n - r)$ values of i. On the other hand, if
$f(e_r) = 0$ then the D.N.F. of f cannot contain any of these $\binom{n}{r}$ terms.
We use this reasoning to obtain a handy representation of symmetric
functions. First we need a definition.
2.9 Definition: Let f be a symmetric function of n Boolean variables.
Then an integer r ($0 \le r \le n$) is called a characteristic number of f if
$f(e_r) = 1$ where $e_r$ is the binary sequence of length n whose first r entries
are 1 and the remaining $(n - r)$ entries are 0.
For example the symmetric function $f(x_1, x_2, x_3) = x_1 x_2 + x_2 x_3 + x_3 x_1$
has 2 and 3 as its characteristic numbers as we see by actual computation,
namely $f(e_0) = f(0, 0, 0) = 0$, $f(e_1) = 0$, $f(e_2) = 1$ and $f(e_3) = 1$. With a little
practice, the characteristic numbers can be found by inspection. The
characteristic numbers of $x_1 x_2' + x_2 x_3' + x_3 x_1' + x_1' x_2' x_3'$ are 0, 1 and 2.
The identically zero function has no characteristic numbers while the
function which is identically 1 has every integer between 0 and n as a
characteristic number.
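The actual computation mentioned above amounts to evaluating the function at the n + 1 sequences $e_0, e_1, \ldots, e_n$. A short Python sketch follows, with the hypothetical helper name `characteristic_numbers` and with the two sample functions $x_1 x_2 + x_2 x_3 + x_3 x_1$ and $x_1 x_2' + x_2 x_3' + x_3 x_1' + x_1' x_2' x_3'$ named `majority` and `not_all_ones` here for convenience.

```python
def characteristic_numbers(func, n):
    """All r with func(e_r) = 1, where e_r has r ones followed by n - r zeros."""
    return [
        r for r in range(n + 1)
        if func(*([1] * r + [0] * (n - r))) == 1
    ]


def majority(x1, x2, x3):
    return (x1 & x2) | (x2 & x3) | (x3 & x1)


def not_all_ones(x1, x2, x3):
    return ((x1 & (1 - x2)) | (x2 & (1 - x3)) | (x3 & (1 - x1))
            | ((1 - x1) & (1 - x2) & (1 - x3)))


numbers_of_majority = characteristic_numbers(majority, 3)
numbers_of_second = characteristic_numbers(not_all_ones, 3)
```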
Although Definition (2.9) could as well have been made for any
Boolean function of $n$ variables (not just for a symmetric function), it is
only for symmetric functions that the concept of characteristic numbers
becomes powerful. In fact, a symmetric function is completely characterised
by the set of its characteristic numbers, as we now show. This also
justifies the name.
2.10 Theorem: The characteristic numbers of a symmetric function
completely determine it. In other words, given integers
$0 \le r_1 < r_2 < \cdots < r_k \le n$
there exists one and only one symmetric function of $n$ Boolean variables
whose characteristic numbers are $r_1, r_2, \ldots, r_k$.
Proof: For an integer $r$, $0 \le r \le n$, let $S_r$ be the set of all $\binom{n}{r}$
terms of the form $x_1^{\epsilon_1} \cdots x_n^{\epsilon_n}$ where exactly $r$ of the $n$
variables occur without primes and the remaining $n - r$ variables occur with
a prime ($'$). Let $f$ be a symmetric Boolean function of $x_1, x_2, \ldots, x_n$ with
characteristic numbers $r_1, r_2, \ldots, r_k$. Then by the argument made before
Definition (2.9), the D.N.F. of $f$ contains all the terms in the sets
$S_{r_1}, S_{r_2}, \ldots, S_{r_k}$ and none of the terms in $S_r$ for $r \ne r_1, r_2, \ldots, r_k$.
Therefore, the D.N.F. of $f$ must be the sum of the terms in
$S_{r_1}, S_{r_2}, \ldots, S_{r_k}$. Since two distinct functions cannot have the same
D.N.F., $f$ is uniquely determined by this expression. This also shows that
given any subset, say $C$, of $\{0, 1, \ldots, n\}$ we can construct a symmetric
function of $n$ variables whose characteristic numbers are precisely the
elements of $C$. All we have to do is to set $S = \bigcup_{r \in C} S_r$ and take the
sum of the terms in $S$. ∎
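The construction at the end of the proof can be sketched in a few lines. This is an illustration under stated assumptions rather than part of the text: each D.N.F. term is represented by the 0/1 sequence at which it takes the value 1, and the names `dnf_terms` and `symmetric_function` are hypothetical.

```python
from itertools import product

def dnf_terms(C, n):
    """Union of the sets S_r for r in C; each term is encoded as the
    0/1 sequence of length n at which that term equals 1."""
    return [xs for xs in product((0, 1), repeat=n) if sum(xs) in C]

def symmetric_function(C, n):
    """The symmetric function of n variables whose D.N.F. is the sum of dnf_terms(C, n)."""
    terms = set(dnf_terms(C, n))
    return lambda *xs: int(tuple(xs) in terms)

# Five variables with characteristic numbers 3, 4 and 5.
b = symmetric_function({3, 4, 5}, 5)
print(len(dnf_terms({3, 4, 5}, 5)), b(1, 1, 1, 0, 0), b(1, 1, 0, 0, 0))
```

The number of terms is the sum of the binomial coefficients $\binom{5}{3} + \binom{5}{4} + \binom{5}{5}$, in agreement with the count of sequences given before Definition (2.9).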
The reader may note that in the Locks Problem, we had $b$ (the state
of the box) as a symmetric function of $x_1, \ldots, x_5$ with characteristic
numbers 3, 4 and 5. We expressed $b$ as $b_1 + b_2 + b_3$. These summands were
nothing but the sums of the terms in $S_3$, $S_4$ and $S_5$ respectively.
The concept of characteristic numbers also provides a structure theorem
for the set of all symmetric functions of given variables.
2.11 Theorem: The set of all symmetric Boolean functions of $n$ Boolean
variables $x_1, x_2, \ldots, x_n$ is a subalgebra of the Boolean algebra of all
Boolean functions of these variables. As a Boolean algebra, it is isomorphic
to the power set Boolean algebra of the set $\{0, 1, \ldots, n\}$.
Proof: Let $X$ denote, as before, the Boolean algebra of all Boolean
functions of $x_1, x_2, \ldots, x_n$ under pointwise operations and let $Y$ be the
set of all symmetric Boolean functions of $x_1, \ldots, x_n$. We show $Y$ is closed
under the three operations $+$, $\cdot$ and $'$. Let $f, g \in Y$. Then $f, g$ are
symmetric and hence each is unaffected under any interchange of two
variables. Obviously the same is true of $f + g$. Hence $f + g$ is symmetric,
i.e., $f + g \in Y$. So $Y$ is closed under $+$. Similarly, $Y$ is closed under
$\cdot$ and $'$. Since $Y$ is obviously non-empty (at least the constant
functions 0 and 1 are always in $Y$), it follows that $Y$ is a subalgebra of $X$.
Now let $T$ be the set $\{0, 1, \ldots, n\}$. We have to show that $Y$, as a
Boolean algebra by itself, is isomorphic to the power set Boolean algebra
$P(T)$. An explicit isomorphism can be defined as follows. For a symmetric
function $f$ of $x_1, \ldots, x_n$ let $C(f)$ denote the set of characteristic numbers
of $f$. Then $C(f) \subseteq T$, i.e., $C(f) \in P(T)$. Define a function
$\theta : Y \to P(T)$ by $\theta(f) = C(f)$ for $f \in Y$. In view of the last theorem,
$\theta$ is a bijection. To show that $\theta$ is an isomorphism, we have to show
that $\theta$ is compatible with the operations. This amounts to showing that
for all $f, g \in Y$, (i) $C(f + g) = C(f) \cup C(g)$, (ii) $C(f \cdot g) = C(f) \cap C(g)$
and (iii) $C(f') = T - C(f)$. For (i), suppose $r$ is a characteristic number of
$f + g$. Then $(f + g)(e_r) = 1$, i.e., $f(e_r) + g(e_r) = 1$. Since $f(e_r)$, $g(e_r)$ are
elements of $\mathbb{Z}_2$, this can happen iff at least one of $f(e_r)$ and $g(e_r)$
equals 1. Therefore, $r$ is either in $C(f)$ or in $C(g)$. This shows
$C(f + g) \subseteq C(f) \cup C(g)$. Conversely, if $r \in C(f) \cup C(g)$ then either
$f(e_r) = 1$ or $g(e_r) = 1$. In either case, $(f + g)(e_r) = 1$. So $r \in C(f + g)$.
Hence $C(f + g) = C(f) \cup C(g)$. Similarly we prove (ii) and (iii). Thus
$\theta$ is an isomorphism between $Y$ and $P(T)$. ∎
In particular, since $|T| = n + 1$ we see that there are $2^{n+1}$ symmetric
Boolean functions of $n$ variables. The importance of symmetric functions
comes from the fact that they arise frequently in applications and their
characteristic numbers can often be determined directly from the data (as
in the Locks Problem). Moreover, in electrical circuits (to be studied in
the next section), some particularly simple circuits can be designed for
such functions.
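For small $n$ the count $2^{n+1}$ can be confirmed by brute force. The sketch below is an illustration only (the function name `count_symmetric` is an assumption): it enumerates every truth table on $n$ variables and keeps those whose value at each point equals the value at the rearranged point with all the 1's moved to the front.

```python
from itertools import product

def count_symmetric(n):
    """Count Boolean functions of n variables that are symmetric."""
    points = list(product((0, 1), repeat=n))
    count = 0
    for table in product((0, 1), repeat=len(points)):
        f = dict(zip(points, table))
        # f is symmetric iff f(p) = f(e_r) for every point p with r ones;
        # sorting p in decreasing order produces exactly e_r.
        if all(f[p] == f[tuple(sorted(p, reverse=True))] for p in points):
            count += 1
    return count

for n in (1, 2, 3):
    print(n, count_symmetric(n), 2 ** (n + 1))
```

The enumeration grows as $2^{2^n}$, so this check is practical only for very small $n$.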
Boolean variables provide the appropriate mathematical tool for
handling two-state devices, such as statements, switches, boxes, locks,
persons (having only a 'yes' or 'no' opinion). However, there are a few
entities which have three natural states. For example, a member of a
committee may vote 'yes' or 'no' on a resolution or he may abstain. The
resolution itself may be either carried (by majority vote) or rejected or
the votes may be divided equally upon it. There are also physical devices
having three states, e.g. a balance, an iron bar (which may be magnetised
in two ways or else not magnetised). Although we shall not develop any
mathematical structure suitable for handling ternary devices, some of the
reasoning used for Boolean functions may be adapted for three-state
devices. We illustrate this by doing the Stone Problem. In this problem
we are given a balance and we are allowed to put a weight in either pan
(or else not use that weight). So we have a ternary device. Let us agree
to place the object to be weighed in the left pan. With this convention,
a weight of $m$ kilograms acts as $+m$ when placed in the right pan, 0 when
not used and as $-m$ when placed in the left pan. (For example, if the
balance is even with a 10 kg weight in the right pan and a 2 kg weight in
the left pan then the object has weight $10 - 2 = 8$.) The problem now is to
cut the stone weighing 40 kg into 4 parts so that every integral multiple
of 1 kg from 1 kg to 40 kg can be weighed with only one use of the
balance. By interchanging pans and allowing negative weights, this means
that every integral weight between $-40$ kg and 40 kg (both included) is
to be weighed with only one use of the balance. Note that this gives 81
distinct possible weights and 81 is a power of 3.
Let us now formulate the problem mathematically. Let us call the
four parts of the stone $p_1, p_2, p_3, p_4$ and let their weights be
$m_1, m_2, m_3, m_4$ (all weights are in kilograms) where $m_1, m_2, m_3, m_4$ are
positive real numbers with $m_1 \le m_2 \le m_3 \le m_4$ and
$m_1 + m_2 + m_3 + m_4 = 40$. Now let $T$ denote the set $\{-1, 0, 1\}$, and let $S$
be the set of all sequences of length 4 with values in $T$. There are
$3^4 = 81$ such sequences. Now every element, say $(x_1, x_2, x_3, x_4)$, of $S$
corresponds to a placement of the parts of the stone, if we make the
convention that for $i = 1, 2, 3, 4$ the part $p_i$ will be in the left pan if
$x_i = -1$, in the right pan if $x_i = 1$ and not be used if $x_i = 0$. (For
example, $(1, -1, 0, -1)$ indicates the placement where the left pan contains
$p_2$ and $p_4$ and the right pan contains $p_1$.) Now define
$f : S \to \mathbb{R}$ by $f(x_1, x_2, x_3, x_4) = m_1x_1 + m_2x_2 + m_3x_3 + m_4x_4$
for $(x_1, x_2, x_3, x_4) \in S$. The function $f$ may be called the weight function
because $f(x_1, x_2, x_3, x_4)$ gives the weight that can be weighed by the
placement of the $p_i$'s corresponding to $(x_1, x_2, x_3, x_4)$. (Note that the
values of $f$ may be negative also as we are allowing negative weights.) The
problem now reduces to determining the real numbers $m_1, m_2, m_3, m_4$ in such
a way that the range of $f$ contains the set $A$, where
$A = \{0, \pm 1, \pm 2, \ldots, \pm 40\}$.
One such solution is provided by taking $m_1 = 1$, $m_2 = 3$, $m_3 = 9$ and
$m_4 = 27$ (the verification that every integer between $-40$ and 40 can be
expressed as the weight of some arrangement is left to the reader; for
example, $16 = 27 - 9 - 3 + 1$ gives that a weight of 16 kg is obtained by
putting $p_1$ and $p_4$ in the right pan and $p_2$ and $p_3$ in the left pan).
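The verification left to the reader is easy to carry out by machine. The sketch below evaluates the weight function $f$ at all 81 placements; the names `weighable` and `placement_for` are illustrative assumptions, not notation from the text.

```python
from itertools import product

def weighable(weights):
    """Range of the weight function f(x) = sum of m_i * x_i over all x in {-1, 0, 1}^k."""
    return {sum(m * x for m, x in zip(weights, xs))
            for xs in product((-1, 0, 1), repeat=len(weights))}

def placement_for(target, weights):
    """One placement (x_1, ..., x_k) with f(x) = target, or None if there is none."""
    for xs in product((-1, 0, 1), repeat=len(weights)):
        if sum(m * x for m, x in zip(weights, xs)) == target:
            return xs
    return None

values = weighable((1, 3, 9, 27))
print(values == set(range(-40, 41)))
print(placement_for(16, (1, 3, 9, 27)))
```

For the target 16 the placement found is $(1, -1, -1, 1)$, that is, $p_1$ and $p_4$ in the right pan and $p_2$ and $p_3$ in the left pan, as in the example above.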
Although this answers the Stone Problem, it can hardly be said that we
have obtained the solution. It is as if we pulled the numbers 1, 3, 9 and 27
out of the blue and they luckily worked. This is all right for a puzzle but
not for a genuine mathematical problem. True, these figures are not
entirely at random. They are precisely the first four powers of 3 and 3 is a
vital number in the present problem. Still, we would like to arrive at the
answer by some reasoning. We would also like to know if there is some
other solution. Both these will be done through exercises. The crucial point
is that the domain of the weight function $f$ above has cardinality 81. So its
range, say $R$, can have cardinality at most 81 by Proposition 2.2.8, part
(iv). Since $|A|$ is also 81, it follows that if $R$ contains some element not in
$A$ then $R$ cannot contain $A$ (or else $|R|$ would be at least 82, a
contradiction). In particular, this forces all the weights $m_1, m_2, m_3$ and
$m_4$ to be integers since each $m_i$ is obviously in the range of $f$.
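Since the weights are forced to be integers, the question of other solutions can at least be explored by exhaustive search. The following sketch is not the reasoning asked for in the exercises, only a numerical check; the names `covers` and `integer_solutions` are assumptions made for illustration.

```python
from itertools import product

def covers(weights, top=40):
    """True if every integer weight 1, ..., top is in the range of the weight function."""
    reachable = {sum(m * x for m, x in zip(weights, xs))
                 for xs in product((-1, 0, 1), repeat=len(weights))}
    return all(w in reachable for w in range(1, top + 1))

def integer_solutions(total=40):
    """All integer quadruples m1 <= m2 <= m3 <= m4 with sum `total` that weigh 1..total."""
    found = []
    for m1 in range(1, total + 1):
        for m2 in range(m1, total + 1):
            for m3 in range(m2, total + 1):
                m4 = total - m1 - m2 - m3
                if m4 >= m3 and covers((m1, m2, m3, m4), total):
                    found.append((m1, m2, m3, m4))
    return found

print(integer_solutions())
```

The search only ranges over positive integers, which is justified by the cardinality argument above but does not replace it.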
Exercises
2.1 Prepare the tables of values of the following functions. Obtain
their disjunctive normal forms. Also obtain their conjunctive normal
forms both directly and by conversion from the disjunctive
normal forms.
(0 x,’ x, (7‘1, + x, + x! x.)
(ii) a + b + c’
(iii) $(xy + x'y + x'y')'(x + y)$.
2.2 Let $f, g$ be two Boolean functions of $n$ Boolean variables. Prove
that $f$ is a summand of $g$ (i.e., there exists a Boolean function $h$
such that $f + h = g$) if and only if for all
$y = (y_1, y_2, \ldots, y_n) \in \mathbb{Z}_2^n$, $f(y) = 1$ implies $g(y) = 1$.
2.3 Obtain a similar characterisation for $f$ to be a factor of $g$.
2.4 Let $f$ be a Boolean function. Prove that the conjunctive normal
form of $f$ provides the ultimate factorisation of $f$ in the sense that
if $f = f_1 f_2 \cdots f_k$ is a factorisation of $f$, then every factor $f_i$ is
the product of some factors in the C.N.F. of $f$. (This fact was
implicitly used in the solution to the Locks Problem.)
2.5 Suppose in the Locks Problem that the person p, has a special
veto power which he can exercise, in addition to and independently
of his decision to open the box on par with other persons (which
means even if he has agreed to open the box, as an ordinary member,
he can still exercise his veto). Now design a system of locks
and keys, using Boolean variables.
2.6 At an examination there are seven subjects A, B, C, D, E, F and G.
A candidate is given a grade of ‘pass' or ‘fail’ in each subject. The
rules for passing the examination are as follows:
(i) A candidate must pass in at least five subjects.
(ii) A candidate must pass in subject A and in at least two out of
B, C and F.
(iii) For candidates passing in C, D and G, requirement (ii) shall
be waived.
(iv) For candidates, coming from the scheduled castes, require-
ment (i) shall be waived.
Express the pass/fail state of a candidate as a Boolean function
of 8 Boolean variables (one for each subject and one for indicating
whether he comes from a scheduled caste).
2.7 There are five industries A, B, C, D and E applying for import
licenses. Their qualifications are as follows:
A is a large scale, urban industry employing skilled personnel.
B is a large scale, urban industry employing unskilled personnel.
C is a small scale, rural industry employing unskilled personnel.
D is a small scale rural industry employing skilled personnel.
E is a large scale, rural industry employing unskilled personnel.
The licensing authority is biased in favour of A, B, C and against
D, E. However, to dismiss the implications of favouritism, it wants
to frame rules so that only A, B, C would get the licenses. Design
a simple system of rules for this.
2.8 Decide which of the following functions are symmetric. In case of
symmetric functions find their characteristic numbers.
(i) (a + b'r) (b + c'a) + (c + n’b)
(ii) x,x,x,x" + x,x,x‘x1’ + x.x‘x,x,' + x.x.x,x,'
(iii) x,x.’x,x,’ + x.x,’x,x,’
2.9 Let $X$ be the Boolean algebra of all Boolean functions of $n$ Boolean
variables. Let $\le$ be the corresponding order structure on $X$ (cf.
Exercise (1.20)). Prove that for every $f \in X$, there exists a smallest
symmetric function $g$ such that $f \le g$ and a largest symmetric
function $h$ such that $h \le f$.
2.10 A Boolean function $f(x_1, x_2, \ldots, x_n)$ is said to be cyclically
symmetric if for all $(x_1, \ldots, x_n) \in \mathbb{Z}_2^n$,
$f(x_1, x_2, \ldots, x_n) = f(x_2, x_3, \ldots, x_n, x_1)$.
Prove that if $f$ is cyclically symmetric then $f(x_1, x_2, \ldots, x_n)$
depends only on the relative placement of $x_1, \ldots, x_n$ around a circle.
(Hence the name.) In other words, prove that
$f(x_1, \ldots, x_n) = f(x_2, \ldots, x_n, x_1) = f(x_3, \ldots, x_n, x_1, x_2) = \cdots = f(x_n, x_1, \ldots, x_{n-1})$.
2.11 Prove that every symmetric function is cyclically symmetric and
that the converse is not true in general.
2.12 Prove that for $n = 1, 2, 3$ every cyclically symmetric Boolean
function is symmetric.
2.13 The notions of symmetry and cyclic symmetry may be defined for
functions of $n$ real variables also. Prove that for such functions,
even for $n = 3$, cyclic symmetry does not imply symmetry.
2.14 Suppose $f$ is a cyclically symmetric function of $x_1, \ldots, x_n$
where $n$ is a prime. Let $x_1^{\epsilon_1} x_2^{\epsilon_2} \cdots x_n^{\epsilon_n}$ be an atom
other than $x_1x_2 \cdots x_n$ and $x_1'x_2' \cdots x_n'$. Prove that if the D.N.F. of
$f$ contains $x_1^{\epsilon_1} x_2^{\epsilon_2} \cdots x_n^{\epsilon_n}$
then it must also contain certain $n - 1$ other atoms obtained from
$x_1^{\epsilon_1} x_2^{\epsilon_2} \cdots x_n^{\epsilon_n}$. What happens if $n$ is not a prime?
*2.15 Using the last exercise show that the number of cyclically
symmetric functions of $n$ variables where $n$ is a prime is $2^m$ where
$m = \frac{2^n - 2}{n} + 2$.
2.16 In the Stone Problem prove that at least one of the four parts
must weigh 1 kg. (Hint: If not, show that it would be impossible
to weigh 39 kg.)
2.17 Let $A = \{0, \pm 1, \pm 2, \ldots, \pm 40\}$. Let $A_{-1}$, $A_0$ and $A_1$ be three
mutually disjoint subsets of $A$, each having 27 elements such that
(i) for all $x \in A_0$, $x + 1 \in A_1$ and $x - 1 \in A_{-1}$,
(ii) for all $x \in A_1$, $x - 1 \in A_0$,
(iii) for all $x \in A_{-1}$, $x + 1 \in A_0$ and (iv) $0 \in A_0$. Prove that
$A_0 = \{0, \pm 3, \pm 6, \ldots, \pm 39\}$.
2.18 In the Stone Problem, if we exclude the part weighing 1 kg (given
by Exercise 2.16), then prove that all multiples of 3 kg (up to 39
kg) can be weighed using the remaining three parts. (Hint: Use
the last exercise. Let $S$, $f$ be as in the discussion of the problem
and suppose $p_1$ has weight 1 kg, i.e., $m_1 = 1$.
Let
$S_{-1} = \{(x_1, x_2, x_3, x_4) \in S : x_1 = -1\}$.
Similarly define
$S_0 = \{(x_1, x_2, x_3, x_4) \in S : x_1 = 0\}$
and
$S_1 = \{(x_1, x_2, x_3, x_4) \in S : x_1 = 1\}$.
Let
$A_i = f(S_i)$, $i = -1, 0, 1$.)
2.19 Prove that the only solution to the Stone Problem is
$m_1 = 1$, $m_2 = 3$, $m_3 = 9$ and $m_4 = 27$.
(Hint: Let $Q_2, Q_3, Q_4$ be stones with weights
$\frac{m_2}{3}$, $\frac{m_3}{3}$ and $\frac{m_4}{3}$.
Then by the last exercise, every multiple of 1 kg up to 13 kg can
be weighed using $Q_2$, $Q_3$ and $Q_4$. Now apply induction.)
2.20 Let $f$ be a Boolean function of $n$ variables. Prove that for all
$(x_1, x_2, \ldots, x_n) \in \mathbb{Z}_2^n$,
$f(x_1, x_2, \ldots, x_n) = f(x_1, \ldots, x_{n-1}, 1)\,x_n + f(x_1, \ldots, x_{n-1}, 0)\,x_n'$.
What is the significance of this result?
2.21 For certain applications, it is more convenient to consider the
ring sum $x \oplus y$ of two Boolean variables $x$ and $y$ than the ordinary
sum (the disjunction) $x + y$ (cf. Exercise 1.19). If $x_1, x_2, \ldots, x_n$
are Boolean variables, prove that $x_1 \oplus x_2 \oplus \cdots \oplus x_n$ (which is
well-defined, since $\oplus$ is associative) equals 1 iff the number of $x_i$'s
having value 1 is odd.
*2.22 Prove that every Boolean function $f$ of $n$ Boolean variables can be
uniquely expressed as the ring sum of terms of the form
$x_{i_1} x_{i_2} \cdots x_{i_r}$
where $r$ is an integer, $0 \le r \le n$ (if $r = 0$, the term is an empty
product which is interpreted as 1, just as an empty sum is interpreted
as 0). (Hint: Note that $x_i' = x_i \oplus 1$. Apply properties of
the ring sum to convert the D.N.F. of $f$ to the desired form.) The
form of the function given by this exercise is called the ring normal
form.
Notes and Guide to Literature
This section is preparatory to the next one. For more on the disjunctive,
conjunctive and the ring normal forms, see Dornhoff and Hohn [1].
3. Applications to Switching Networks
In this section we apply the theory of Boolean functions of Boolean
variables to a very special context, where the variables represent the states
of electrical switches and the function represents the state of an electric
circuit in which these switches occur. For most of the time we shall deal
with what are called combinational circuits. In these circuits, the state of
the circuit (i.e. whether current is flowing through it or not) at any time
depends only on the combination of the states of the switches at that time
(i.e. on which of them are closed and which are open). If at two different
times, every switch is in the same state, so will be the circuit. On the other
hand, there are some circuits for which this is not necessarily true. In such
circuits, the state of the circuit is a function not only of the states of the
switches but also of time. Such circuits are called sequential circuits and
we shall mention them briefly.
The mechanical construction of switches will be unimportant for our
purpose. Our discussion will also be independent of the manner in which
the switches are operated, whether manually or automatically by some
sort of a feedback arrangement. That is why the theory has survived the
transitions in the mechanism of switches caused by advances in electronics.
For fixation of ideas we shall take a switch as a device with two pieces of
wire (called leads). The switch is said to be open when the two wires are
electrically insulated from each other and closed when current can flow
from one into the other. Admittedly this terminology is confusing to a
beginner, because these terms are used with precisely opposite meanings
in other contexts. For example, when a door is closed it prevents
passage through it and when it is open, such passage is allowed. But the
usage is too standard to be changed. A symbolic representation of a
switch is shown in Figure 4.2 (a). Two switches which are always in the
same state are to be regarded as equal. This means that their leads are
joined by some non-conducting device in such a way that the current will
flow or stop flowing simultaneously in every pair of leads (Figure 4.2 (b)).
Figure 4.2: Symbolic Representation of Switches. (a) open switch and closed
switch; (b) equal switches; (c) complementary switches.
On the other hand, two switches which are always in the opposite states
are said to be complementary. If one of them is denoted by $x$, the other is
denoted by $x'$. A symbolic representation for a pair of complementary
switches is given in Figure 4.2 (c).
Since a switch is a two state device it can be represented by a Boolean
variable. We assign the elements 1 and 0 of $\mathbb{Z}_2$ respectively to the closed
and the open states of a switch, i.e. when a switch is closed, the Boolean
variable representing it has value 1 and when it is open, the value 0. In
general, the same symbol will be used to denote both a switch and the
Boolean variable representing it. In diagrams, for notational brevity we
shall treat the two leads of a switch as parts of the same wire separated by
a gap which will be filled by the Boolean variable representing the switch,
for example $x$ or $y'$.
Besides complementation, there are two basic operations, $+$ and $\cdot$, for
Boolean variables. Let us see which physical devices serve to represent
them. Let $x, y$ be two switches. Then we want $x \cdot y$ to be a switch which
will be closed when both $x$ and $y$ are closed and open when at least one
of them is open. The simplest way to achieve this is to take one lead from
each of $x$ and $y$ and join them together. The current will flow from the other
lead of $x$ to the other lead of $y$ iff both $x$ and $y$ are closed. This is called the
series arrangement of the switches $x$ and $y$ and is shown in Figure 4.3 (a).
On the other hand, if we join together one lead of $x$ with one lead of $y$
and the other two leads together, then current will flow between these
junction points iff at least one of $x$ and $y$ is closed. Therefore, this
arrangement, called the parallel arrangement of the switches $x$ and $y$ and
pictured in Figure 4.3 (b), serves to represent $x + y$. Incidentally, 'parallel'
here simply means that the wires on which the switches $x$ and $y$ operate have
no electrical contact in between the two junction points. It does not literally
mean that they are parallel lines in the geometric sense. Indeed, the wires
need not be straight at all! They could be any arcs. There is no harm if
they cross each other, as long as they are insulated. Such crossings can be
avoided in space (by a result from graph theory). Since we shall be drawing
only planar diagrams for circuits, sometimes crossings are inevitable.
In such cases, we show the crossing by curving one of the wires slightly,
indicating that there is no electrical contact at the point of intersection.

Figure 4.3: Series and Parallel Arrangements of Switches. (a) series; (b) parallel.
We now turn to circuits. A complete electric circuit has three parts,
namely, a source of power, an output (such as a lamp, a bell etc.) and a
third part called the control part, in which we have a combination of
switches. For our purpose, the third part will be the most important.
We may not always draw the other two parts. When we draw them we
shall show the source by a suitable symbol and the output by some
symbol like L. In more complicated circuits, that is, in the sequential
circuits, some of the switches in the control part themselves appear as
output or are controlled by the output.* However, for the time being, we
shall restrict ourselves to the cases where the control part of a circuit is
independent of the other two parts and controls the flow of the current
between two points on the circuit (called terminals) depending upon the
states of the various switches in it. We shall call such circuits two
terminal circuits even though they are, really speaking, only the control
parts of some complete circuits. In Figure 4.4 (a) we show a complete
circuit and in Figure 4.4 (b), its control part as a two-terminal circuit.

* This is the essence of what is popularly called a feedback arrangement.

Figure 4.4: Electrical Circuit. (a) complete circuit; (b) its control part.

Even among two terminal circuits, we shall first consider only the
so-called series parallel circuits which are obtained by repeated series and
parallel combinations of the simplest circuits, namely, those in which
there is a single wire with only one switch on it. (Later we shall see how
any two terminal circuit can be replaced by an equivalent series-parallel
circuit.) For example, the circuit in Figure 4.4 (b) is obtained as follows:
(i) take the series combination of the switch $x$ and the switch $w$
(ii) take the series combination of the switch $y$ and the switch $z$
(iii) take the parallel combination of the circuits (i) and (ii)
and
(iv) take the series combination of the circuit obtained in (iii) and the
switch $z'$.
Note that the same switch may appear any number of times in a circuit.
In actually constructing the physical circuit the multiple occurrences of a
switch would have to be handled by switches capable of simultaneous
operation on various pairs of wires. Such multiple switches are costly and
are, as far as possible, avoided. This leads to the notion of simplification
of a circuit, that is, replacing it by an equivalent circuit in which the total
number of switches (counting multiplicities) is as small as possible.
Informally, two circuits are equivalent if, whenever every switch common
to them is in the same state in both of them, the two circuits are in the
same state, i.e. current flows either through both of them or through
neither of them. This idea can be neatly expressed if we use the language
of Boolean functions. We already remarked that every switch is a
Boolean variable. Now every circuit also has only two states, either some
current flows through it or else no current flows through it. (The magnitude
of the current flowing through various parts of a circuit may be relevant in
some problems, but not for our purpose.) In analogy with switches, we
associate a Boolean variable with a circuit and assign it the value 1 if
current flows through it (a closed or a completed circuit) and the value 0
if no current flows through it (an open or a broken circuit). This convention
is consistent with the fact that a series arrangement of two circuits is closed
iff both of them are closed and a parallel arrangement is closed iff at least
one of them is closed.
Evidently, the state of a circuit is a Boolean function of the switches
occurring in it. This function is called the closure function of the circuit,
because the circuit is closed for those states of the switches at which this
function has the value 1. If $f$ is the closure function of a circuit we say $f$
represents the circuit or is realised by the circuit. These definitions are
applicable for all two terminal circuits, not just for series parallel circuits.
However, in the case of series parallel circuits, closure functions can be
written down mechanically simply by inspection, keeping in mind that a series
arrangement corresponds to the Boolean multiplication and a parallel
arrangement to the Boolean addition. For example, the circuit in Figure 4.4
(b) realises the Boolean function
$f(x, y, z, w) = z'(xw + yz)$.
Conversely, given any Boolean function, we can mechanically construct
at least one circuit to realise it, even though there may be many others,
and some of them may be more economical.
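The rule "series is product, parallel is sum" can be turned directly into a small evaluator. The sketch below is an illustration under stated assumptions: a circuit is modelled as a Python function from a dictionary of switch states (0 for open, 1 for closed) to 0 or 1, and the helper names `switch`, `complement`, `series`, `parallel` and `equivalent` are hypothetical.

```python
from itertools import product

def switch(name):
    """A single switch: closed exactly when its variable has value 1."""
    return lambda state: state[name]

def complement(name):
    """The complementary switch: closed exactly when the variable has value 0."""
    return lambda state: 1 - state[name]

def series(*parts):
    """Series arrangement: closed iff every part is closed (Boolean product)."""
    return lambda state: int(all(p(state) for p in parts))

def parallel(*parts):
    """Parallel arrangement: closed iff at least one part is closed (Boolean sum)."""
    return lambda state: int(any(p(state) for p in parts))

def equivalent(c1, c2, names):
    """Compare two circuits at every combination of states of the named switches."""
    return all(c1(dict(zip(names, values))) == c2(dict(zip(names, values)))
               for values in product((0, 1), repeat=len(names)))

# The circuit of Figure 4.4 (b), built by steps (i) to (iv).
fig_4_4b = series(parallel(series(switch("x"), switch("w")),
                           series(switch("y"), switch("z"))),
                  complement("z"))

# Its closure function written as a formula: z'(xw + yz).
formula = lambda s: (1 - s["z"]) & ((s["x"] & s["w"]) | (s["y"] & s["z"]))

print(equivalent(fig_4_4b, formula, ["x", "y", "z", "w"]))
```

The same truth-table comparison decides whether two circuits are equivalent in the sense defined earlier, since equivalence means equality of closure functions.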
Now, simplification of a circuit can be done at least partly by
simplifying the function representing it. (If we allow circuits which are not
series parallel, then further simplification may be possible as we shall see
later.) For example, the circuit of Figure 4.4 (b) can be simplified by
simplifying its closure function
$f(x, y, z, w) = z'(xw + yz)$
as
$f(x, y, z, w) = z'xw$.
Thus an equivalent circuit can be drawn simply as a series combination
of three switches $x$, $w$ and $z'$, between the terminals $T_0$ and $T_1$. We could
also see this by inspection. In order that current can flow from $T_0$ to $T_1$,
it must either pass through $x$, $w$ and $z'$ or else through $y$, $z$ and $z'$. The
second possibility can never hold because, by definition, the switches $z$
and $z'$ can never be closed simultaneously. Hence, there is only one path
for the current to flow, namely through $x$, $w$ and $z'$, giving $xwz'$ as the
closure function. More generally, if we trace all possible simple paths (i.e.,
paths not having any loops in them) for the current to flow from one terminal
to the other, then each path gives rise to a monomial involving the
switches and summing these monomials gives the closure function. In small
circuits this method is easy to apply and has the additional advantage
that it works for any two terminal circuits, not just for the series parallel
ones. However, one has to be careful that all possible paths have been
traced. An algorithm for tracing all possible paths will be given when we
study graph theory*. As an application of this method, we see that the
closure function of a 'bridge circuit' in Figure 4.5 (a) (which is not a
series parallel circuit) is $f = ab + cd + aed + ceb$ because there are only 4
possible simple paths from $T_0$ to $T_1$, namely
$T_0$-A-B-D-$T_1$, $T_0$-A-C-D-$T_1$,
$T_0$-A-B-C-D-$T_1$ and $T_0$-A-C-B-D-$T_1$.
An equivalent series parallel circuit is shown in Figure 4.5 (b).

Figure 4.5: Bridge Circuit and its Series Parallel Equivalent.
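The path-tracing method for the bridge circuit can be sketched as a depth-first search. This is not the algorithm promised for the graph theory chapter, only an illustration; the labelling of the edges (switch $a$ on A-B, $b$ on B-D, $c$ on A-C, $d$ on C-D, $e$ on B-C) is an assumption inferred from the four paths and the closure function quoted above, since the figure itself is not reproduced here.

```python
from itertools import product

# Assumed labelling of the bridge: each wire between two points carries one switch.
EDGES = {("A", "B"): "a", ("B", "D"): "b", ("A", "C"): "c",
         ("C", "D"): "d", ("B", "C"): "e"}

def neighbours(node):
    """Yield (adjacent point, switch on the connecting wire) for every wire at node."""
    for (u, v), sw in EDGES.items():
        if u == node:
            yield v, sw
        elif v == node:
            yield u, sw

def simple_paths(node, goal, visited=()):
    """Yield the tuple of switches along every loop-free path from node to goal."""
    if node == goal:
        yield ()
        return
    for nxt, sw in neighbours(node):
        if nxt not in visited:
            for rest in simple_paths(nxt, goal, visited + (node,)):
                yield (sw,) + rest

PATHS = list(simple_paths("A", "D"))
monomials = sorted("".join(sorted(p)) for p in PATHS)

def closure(state):
    """Sum over all simple paths of the product of the switches on the path."""
    return int(any(all(state[sw] for sw in path) for path in PATHS))

print(monomials)
```

With the letters of each monomial sorted alphabetically, the four monomials found are `ab`, `ade`, `bce` and `cd`, which are the terms $ab$, $aed$, $ceb$ and $cd$ of the closure function.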
* See the Epilogue.

In practical problems, we have to construct a circuit with given switches
and some output which is to be on for certain combinations of the states
of the switches and off for the remaining combinations. We can do this by
putting the output (and the power source) in series with a circuit of these
switches whose closure function is first to be determined from the data of
the problem. As an illustration, let us do the Landlord Problem. Let $x$ and
$y$ denote the switches controlled by the landlord and the tenant respectively
and let $z$ be the hidden switch (with the landlord). When $z$ is closed, each
of $x$ and $y$ is to control the state of the lamp independently of the other,
which means that the change in the state of either one of them (with the
other switch remaining as it is) must cause a change in the state of the
lamp. We arbitrarily set the lamp on when $x$ and $y$ are both closed. We
then get the table of values for the closure function, say, $f$, of the lamp
(or rather, its circuit), shown in Figure 4.6 (a). (Note that when we go
from the second to the third row, both $x$ and $y$ change their states and
hence the lamp remains in the same state.) The last four rows of the table
indicate that when $z$ is open, $x$ has exclusive control of the lamp, i.e., a
change in $x$ changes the state of the lamp, regardless of whether $y$ changes
or not.

Row   z   x   y   f(x, y, z)
 1    1   1   1       1
 2    1   1   0       0
 3    1   0   1       0
 4    1   0   0       1
 5    0   1   1       1
 6    0   1   0       1
 7    0   0   1       0
 8    0   0   0       0
(a)

Figure 4.6: Solution to the Landlord Problem. (a) table of values; (b) circuit.

From this table, we get the D.N.F. of $f$ as $zxy + zx'y' + z'xy + z'xy'$
which upon simplification gives
$f(x, y, z) = z(xy + x'y') + z'x = zx'y' + x(z' + y)$.
A circuit realising this function is shown in Figure 4.6 (b), where L denotes
the lamp.
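The table and the two expressions for $f$ can be compared row by row. The sketch below is only a check of the algebra; the names `TABLE`, `lamp_dnf` and `lamp_simplified` are illustrative, and the table is keyed by $(z, x, y)$ in the column order used above.

```python
# Table of values of Figure 4.6 (a), keyed by (z, x, y).
TABLE = {(1, 1, 1): 1, (1, 1, 0): 0, (1, 0, 1): 0, (1, 0, 0): 1,
         (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 0, (0, 0, 0): 0}

def lamp_dnf(x, y, z):
    """D.N.F. read off the table: zxy + zx'y' + z'xy + z'xy'."""
    nx, ny, nz = 1 - x, 1 - y, 1 - z
    return (z & x & y) | (z & nx & ny) | (nz & x & y) | (nz & x & ny)

def lamp_simplified(x, y, z):
    """Simplified form: zx'y' + x(z' + y)."""
    return (z & (1 - x) & (1 - y)) | (x & ((1 - z) | y))

for (z, x, y), value in TABLE.items():
    print(z, x, y, value, lamp_dnf(x, y, z), lamp_simplified(x, y, z))
```

The simplified form uses five occurrences of switches, namely $z$, $x'$, $y'$, $x$ and one parallel pair $z'$, $y$ counted as two, giving six in all if every letter is counted; the check above concerns only the values, not the cost.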
It is a common practice for several circuits to share some common
parts. As the commonest example, the circuits of various appliances in a
house share the same source of power and also the main switch. Sometimes
the functions representing these circuits have common factors and if so,
portions representing these common factors may be shared. In such a case
we represent all the circuits with a common initial terminal, say, $T_0$. If
there are $n$ circuits, we put $n$ terminals $T_1, T_2, \ldots, T_n$ and then between
$T_0$ and $T_i$ put the circuit whose closure function is given (or calculated
from the data of the problem), sharing common factors wherever possible.
Each $T_i$ is then connected to the $i$th output and then to $T_0$ through the
power source. Sometimes, in order to share the switches so as to effect an
economy, it may be necessary to manipulate the closure functions using
the various laws in a Boolean algebra, as is illustrated in the following
problem.
3.1 Problem: An aircraft has three engines. Each engine is provided
with a switch which closes as soon as there is any mechanical fault in that
engine. Although the aircraft can run with just one engine operating, it
is desired to have a red lamp appear when there is a fault in any one of
the engines and an alarm to ring when there is a fault in any two of
them. Design a three terminal circuit for this, sharing switches wherever
possible.
Solution: Denote the three switches by x, y, z. Let f,g denote the closure
functions for the red lamp (R) and the alarm (A) respectively. A simple
calculation shows that [(x,y, z) = x + y + z and g(::, y, z) = xy + yz + zx.
If we draw separate circuits, we in all need 8 occurrences of switches
(because g can be simplified as x(y+ z) + yz). To see if any economy is
possible with sharing of switches, let us first see iff and 3 have any
common factors. For this we take their conjunctive normal forms (of.
Exercise (2.4)). f is already in its O.N.F. while
g=(X+y+2)(X+.v+2’)(x+y'+2)(X’+y+2)-
Here although x+ y+z is a common factor, it will not result in any
saving to share it because the remaining three factors of g, even upon
simplification would require at least five occurrences of switches anyway.
So we write fasf, +/; where f1=x and fg=y+z. Then g=(y+z)
268 mscam MATHEMATICS (Chapter Four)
(x + yz) and we may try to use the factor y + 2 common to f, and g. It
is tempting to try to do this by a circuit as in Figure 4.7 (I). But in this
circuit the alarm can ring even when only at is closed, which is not desired.
A path by which a current can flow when not desired is called “meek
path. It can be avoided by inserting the switch x' as in Figure 4.4 (b),
which is a correct solution. We could have drawn the second circuit
without the first if instead of writing f as a sum of x and y + 2 we write
l...—_.
E ;:j l g”? z y_z—_i_®—
(o) Circuit with 41 Sneak Path
ll
'1
.+
z y _zj—@‘
(b) Correct Solution
Figure 4.7 a Solution to Problem (3.1)
f as x + x'(y + z). Here the two summands are mutually disjoint (i.e.
their product is 0) and hence there is no possibility of a sneak path. Note
that there is a saving of switches because only 7 occurrences are needed.
We remark that in this problem both f and g are symmetric functions
with characteristic numbers 1, 2, 3 and 2, 3 respectively. Later on we shall
give an alternate circuit to realise symmetric functions of switches.
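The Boolean identities used in the solution of Problem (3.1) can be checked by brute force over the eight states of the three switches. The following is a minimal sketch in Python; the function names are chosen here purely for illustration, and 1 and 0 stand for a closed and an open switch.

```python
from itertools import product

def f(x, y, z):
    # Red lamp: a fault in at least one engine.
    return x | y | z

def g(x, y, z):
    # Alarm: a fault in at least two engines.
    return (x & y) | (y & z) | (z & x)

def g_cnf(x, y, z):
    # Conjunctive normal form of g quoted in the solution.
    nx, ny, nz = 1 - x, 1 - y, 1 - z
    return (x | y | z) & (x | y | nz) & (x | ny | z) & (nx | y | z)

def g_factored(x, y, z):
    # g written as (y + z)(x + yz).
    return (y | z) & (x | (y & z))

def f_disjoint(x, y, z):
    # f written as x + x'(y + z).
    return x | ((1 - x) & (y | z))

def agree(p, q):
    # True when p and q take the same value at all eight inputs.
    return all(p(*v) == q(*v) for v in product((0, 1), repeat=3))

def summands_disjoint():
    # The product of the summands x and x'(y + z) is identically 0.
    return all((x & ((1 - x) & (y | z))) == 0
               for x, y, z in product((0, 1), repeat=3))
```

Calling agree(g, g_cnf), agree(g, g_factored) and agree(f, f_disjoint) compares each rewritten form with the original function at every input.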
Let us now see how an arbitrary two terminal circuit can be replaced
by an equivalent series parallel circuit. One method for doing this was
discussed already, namely to look for all possible paths for current to flow
from one terminal to another. Instead of applying this method in a
heuristic manner (which makes it likely that some possible paths may be
missed), we now show how it can be applied in a systematic, step-by-step
manner to reduce a non-series-parallel circuit to a series parallel circuit.
Obviously, in a non-series parallel circuit, there must be at least one
point, other than a terminal, which is joined to more than two points by
wires and each of these wires carries some combination of switches. Such
a point is called a star, and the number of wires meeting at it is called its
degree. For example, in the bridge circuit of Figure 4.5, the points B and C
are stars of degree 3 each. (The points A and D are not regarded as stars
because they are effectively the terminals.) A star of degree 3 is called a
wye because of its resemblance to the capital letter Y.
The key step in reducing a non-series-parallel circuit is to go on
replacing its stars by equivalent arrangements of switches between every
pair of points which are joined to the star points. The simplest case is that
of a wye as shown in Figure 4.8 (a). The star point P is joined to A, B and
C by wires whose closure functions are f, g, h respectively. (Here f, g, h
may each be a single switch or some combination of switches.) Since the
(a) Given wye circuit (b) Equivalent delta circuit
Figure 4.8: Wye-to-delta Transformation
point P is not a terminal, we eliminate it by putting wires between A and
B, between B and C and between A and C. These wires are in addition to
whatever other wires may already be present. The resulting new circuit is called
a delta circuit, 'delta' being another name for a 'triangle'. The transformation
is called a wye-to-delta transformation. The new circuit will be equivalent
to the old one if and only if for every pair of vertices, the conditions
for the current to flow from one of them to the other are identical in the
two circuits. Let us take A and B. Then in the wye circuit current will flow
from A to B iff both f and g are 1. (We are, of course, not counting here
the other paths, if any, available to go from A to B.) This is equivalent to
saying that fg = 1. So the closure function of the side AB in the delta
circuit is fg. Similarly the other two closure functions are determined. Note
that the path A-C-B does not provide an additional path for the
current to flow from A to B because current can flow through it iff fh and
gh are both 1; but in that case fg = 1 anyway.
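The equivalence of the wye and the delta can be confirmed by listing all eight combinations of values of f, g, h. The sketch below is only an illustration in Python; the function names and the dictionary keys "AB", "BC", "AC" are our own labels for the three pairs of vertices.

```python
from itertools import product

def wye_connected(f, g, h):
    # In the wye, A, B, C are joined to the star point P through f, g, h,
    # so two vertices are connected iff both their wires conduct.
    return {"AB": f & g, "BC": g & h, "AC": f & h}

def delta_connected(f, g, h):
    # Sides of the delta carry fg, gh and fh. Each pair of vertices is
    # joined by its own side or by the detour through the third vertex.
    ab, bc, ac = f & g, g & h, f & h
    return {
        "AB": ab | (ac & bc),
        "BC": bc | (ab & ac),
        "AC": ac | (ab & bc),
    }

def transformation_is_valid():
    # Compare the two circuits for every choice of f, g, h.
    return all(wye_connected(*v) == delta_connected(*v)
               for v in product((0, 1), repeat=3))
```

The detour terms such as (ac & bc) never add anything new, which is exactly the remark made above about the path A-C-B.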
Thus we have eliminated a star of degree 3. Similar constructions apply
for eliminating stars of higher degrees. Using this method we can reduce
the bridge circuit of Figure 4.5, redrawn in Figure 4.9 (a). We eliminate the
star at B and show the resulting circuit in Figure 4.9 (b). Note that the
wires between A and C and between C and D present in the original circuit
are to be retained. Note also that there is no star in the circuit in (b).
Although four wires meet at C, the point C is not a star because it is joined
to only two points, namely to A and D, by two wires each. We replace
each pair of wires by a single wire whose closure function is the sum of
their closure functions. This gives the circuit in Figure 4.9 (c). This is a
series-parallel circuit with closure function ab + (c + ae)(d + eb) which
upon simplification reduces to ab + cd + aed + ceb, which was also obtained
earlier by inspecting all possible paths for the current to flow between
T0 and T1.
Figure 4.9: Reduction of a Bridge Circuit. (a) The bridge circuit; (b) after eliminating the star at B; (c) the series-parallel circuit, with ab between A and D, c + ae between A and C and d + eb between C and D.
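The reduction of the bridge circuit can be checked against a direct search for conducting paths. The following Python sketch assumes the placement of switches suggested by the labels in Figure 4.9 (c), namely a on A-B, b on B-D, c on A-C, d on C-D and the bridge e on B-C, with A and D acting as the terminals; the function names are illustrative only.

```python
from itertools import product

def bridge_conducts(a, b, c, d, e):
    # Assumed wiring of the bridge circuit (see the lead-in above).
    wires = [("A", "B", a), ("B", "D", b), ("A", "C", c),
             ("C", "D", d), ("B", "C", e)]
    reached = {"A"}
    grew = True
    while grew:  # spread along closed wires until nothing new is reached
        grew = False
        for u, v, closed in wires:
            if closed and (u in reached) != (v in reached):
                reached |= {u, v}
                grew = True
    return int("D" in reached)

def series_parallel(a, b, c, d, e):
    # Closure function of the reduced circuit: ab + (c + ae)(d + eb).
    return (a & b) | ((c | (a & e)) & (d | (e & b)))

def by_paths(a, b, c, d, e):
    # Simplified form: ab + cd + aed + ceb.
    return (a & b) | (c & d) | (a & e & d) | (c & e & b)

def all_agree():
    # Compare the three descriptions at all 32 switch settings.
    return all(
        bridge_conducts(*v) == series_parallel(*v) == by_paths(*v)
        for v in product((0, 1), repeat=5)
    )
```

The search in bridge_conducts makes no use of the formula, so agreement of the three functions is an independent check on the wye-to-delta reduction.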
It should be noted that although the series-parallel circuit obtained from
the bridge circuit is easier to analyse, it is the bridge circuit that is more
economical as far as the number of occurrences of the switches is concerned.
The wire containing the switch e is like a 'bridge' joining the paths
A-B-D and A-C-D (hence the name). By putting such
bridges one can construct non-series-parallel circuits which are considerably
cheaper than their series-parallel equivalents. However, it takes some
ingenuity to foretell which Boolean functions can be realised by a bridge
circuit. There is no easy method for doing this in general. However, in case
of symmetric functions there is a well-known circuit which allows us to
realise a symmetric function of n switches, say x1, x2, ..., xn, as soon as we
know its characteristic numbers. Such a circuit is shown in Figure 4.10.
The initial terminal is marked as T. We have a triangular grid of wires
with horizontal wires carrying the switches x1', x2', ..., xn' and slanting
wires carrying the switches x1, x2, ..., xn. Note that the switches appear with
Figure 4.10: Realisation of a Symmetric Function
different multiplicities; xi occurs 2i times (including complementary
occurrences), for i = 1, 2, ..., n. The terminals on the right are numbered
T0, T1, ..., Tn. The key to understanding the working of this circuit is the
observation that current can flow from T to Ti if and only if precisely i
of the switches are closed and the remaining n - i are open. To see this,
note first that the arrangement of the complementary switches is such that
when the current comes at any junction point (such as the point P in the
figure) it cannot go back towards T. For example, if the current is at P,
then it has come through Q or through R. If it came through Q it cannot
go back to Q (because no loops are allowed in the path of a current) and
it cannot go to R because the switch on the wire from R is open, its
complement on the wire from Q being closed. So the current at
any junction point has only two alternatives, either to go horizontally to
the right or else to 'climb up' along the slanting line by one unit, depending
upon whether the next switch is open or closed. Since every path from T
to Ti involves i climb-ups, current can flow from T to Ti precisely when
exactly i of the switches are closed. So if we consider a two terminal circuit
with T and Ti as the terminals, its closure function will be the sum of
$\binom{n}{i}$ terms of the form $x_{j_1} x_{j_2} \cdots x_{j_i} x'_{j_{i+1}} \cdots x'_{j_n}$. This is a symmetric function
with i as its only characteristic number. Now if we attach leads to the
points $T_{r_1}, T_{r_2}, \ldots, T_{r_k}$ (say) and fuse them together at a point S then current
will flow from T to S iff it flows from T to one of $T_{r_1}, T_{r_2}, \ldots, T_{r_k}$. Therefore
the closure function of the two terminal circuit between T and S is
the symmetric function of x1, x2, ..., xn whose characteristic numbers are
r1, r2, ..., rk. Conversely every symmetric function of n variables can be
realised by finding its characteristic numbers and joining the corresponding
terminals to a common terminal S. We illustrate this in the following
problem.
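Before turning to the problem, the working of the triangular grid can be tested on a small model. The Python sketch below is one possible encoding, not taken from the figure itself: junctions are labelled (j, level), where j counts the switches already passed and level counts the climb-ups; the wire from (j-1, level) to (j, level) carries xj' and the wire from (j-1, level) to (j, level+1) carries xj. Current is allowed to travel along closed wires in either direction, so the check also confirms that no sneak path exists.

```python
from itertools import product

def reachable_terminals(x):
    # x is a tuple of switch states (1 = closed). Returns the set of
    # indices i such that current can reach the terminal Ti from T.
    n = len(x)
    wires = []
    for j in range(1, n + 1):
        closed = x[j - 1]
        for level in range(j):
            # horizontal wire carrying the complemented switch xj'
            wires.append(((j - 1, level), (j, level), 1 - closed))
            # slanting wire carrying the switch xj
            wires.append(((j - 1, level), (j, level + 1), closed))
    reached = {(0, 0)}  # the initial terminal T
    grew = True
    while grew:
        grew = False
        for u, v, on in wires:
            if on and (u in reached) != (v in reached):
                reached |= {u, v}
                grew = True
    return {i for i in range(n + 1) if (n, i) in reached}

def symmetric_circuit(x, characteristic_numbers):
    # Closure function between T and S when the terminals Tr, for r in
    # characteristic_numbers, are fused together at S.
    return int(bool(reachable_terminals(x) & set(characteristic_numbers)))
```

For instance, reachable_terminals((1, 0, 1)) contains only the index 2, the number of closed switches. In this encoding xj occupies j horizontal and j slanting wires, in agreement with the count of 2j occurrences.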
3.2 Problem: There are five persons on a committee. Each person is
provided with a switch. When a bill is before the committee, every person
in favour of it closes the switch (abstention is not allowed). Design a circuit
in which a green lamp will light if the majority is in favour of the bill and
a red lamp otherwise.
Solution: Let the five switches be x1, x2, x3, x4, x5 and let f be the closure
function of the green lamp. Then f is to be 1 when at least three of the five
switches are closed and 0 otherwise. It follows that f is a symmetric function
of x1, x2, x3, x4, x5 with characteristic numbers 3, 4 and 5. The closure
function of the red lamp is the complement of f and consequently has 0, 1
and 2 as its characteristic numbers. A required circuit can now be drawn
as in Figure 4.11.
Figure 4.11: Solution to Problem (3.2)
Note that the disjunctive normal forms of f and f' contain 16 terms
each. A series-parallel circuit for the problem would be extremely costly.
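The count of 16 terms can be confirmed by listing the voting patterns directly. This is a small illustrative sketch in Python; the helper name minterms is our own.

```python
from itertools import product
from math import comb

def minterms(characteristic_numbers, n=5):
    # All switch settings whose number of closed switches is one of the
    # characteristic numbers; each gives one term of the D.N.F.
    wanted = set(characteristic_numbers)
    return [v for v in product((0, 1), repeat=n) if sum(v) in wanted]

green_terms = minterms({3, 4, 5})  # majority in favour
red_terms = minterms({0, 1, 2})    # complement of the above

# Number of terms predicted by the binomial coefficients.
predicted = comb(5, 3) + comb(5, 4) + comb(5, 5)
```

Both lists have length 16 and together they exhaust the 32 possible settings of the five switches.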
As another illustration of the circuit for symmetric functions, we can
do Problem (3.1) by connecting R to the terminals T1, T2 and T3 and the
alarm A to the terminals T2 and T3. The resulting circuit is not as compact
as that in Figure 4.7 (b). But it has a greater adaptability. For example,
if we want the red light to go off when the alarm is ringing all we have to
do is to cut off the wires joining R to T2 and T3.
Sometimes the closure function of a circuit is not symmetric in all the
variables. Even in such a case, the circuits for symmetric functions are useful
if a large factor or summand of the closure function is symmetric among
some of the variables. For example we invite the reader to modify the
circuit of Figure 4.11 if one of the persons has a special veto power
(cf. Exercise (2.5)).
Electrical circuits provide a physical means of expressing a Boolean function
of Boolean variables, by realising each Boolean variable as a switch. In
electronic data processors, it is convenient to think of a function as a
device which produces an output when some inputs are fed to it. This is
of course just another way of looking at the definition of a function. Let
f: X → Y be any function, expressed by y = f(x). Here the argument x
ranges over the set X. The elements of X are, therefore, the possible
inputs and when any one of them, say x1, is fed to the function f, it produces
as output the value of f at x1, i.e., the element f(x1) of Y. For functions of
several variables, say x1, x2, ..., xn, there are n inputs. In the case of
Boolean functions each input has only two possible values, 1 and 0, and
the output is also either 1 or 0.
In signal processing, a binary input is usually represented by the
presence or absence of a certain voltage along a wire, called an input
lead. The magnitude of the voltage is immaterial for our purpose. If f is a
Boolean function of n Boolean variables, say y = f(x1, x2, ..., xn), then we
want to construct a 'black box' with n input leads, one for each xi, and one
output lead such that a voltage will appear on the output lead precisely
for such values of the inputs at which the function f has value 1. A
symbolic representation of such a black box is shown in Figure 4.12.
Figure 4.12: Black Box Representation of Boolean Function
The black box representing a Boolean function can be constructed by
suitably combining some very elementary black boxes, called gates. These
gates are also called logic elements for reasons which will be apparent
from the next section. The internal mechanism of these gates is not our
concern. There are three such basic gates corresponding to the three basic
operations of a Boolean algebra, namely +, · and '. They are called respectively
the OR-gate, the AND-gate and the NOT-gate. These names
again come from logic and will be evident from the next section. The OR-gate
has two input leads and one output lead. A voltage appears on the output
lead when a voltage appears on at least one of the inputs. An AND-gate
also has two input leads and an output lead on which a voltage appears
iff a voltage appears on both the input leads. The NOT-gate has one
input and one output lead and there is a voltage on the output lead if
there is no voltage on the input lead and vice-versa. The symbolic representations
for these basic logic elements are shown in Figure 4.13. The inputs
are marked with Boolean variables and the function represented by the
output lead is shown along the output lead.
(a) OR-gate (b) AND-gate (c) NOT-gate
Figure 4.13: Basic Logic Elements (= Gates)
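The behaviour of the three basic gates, and the way they are combined into a larger black box, can be imitated by ordinary functions. The following Python sketch is only an illustration; the sample function (x + y)z' realised by black_box is a hypothetical example of ours and does not come from the text.

```python
from itertools import product

def OR(x, y):
    # Output 1 when at least one input is 1.
    return x | y

def AND(x, y):
    # Output 1 iff both inputs are 1.
    return x & y

def NOT(x):
    # Output 1 iff the input is 0.
    return 1 - x

def black_box(x, y, z):
    # Hypothetical example: one OR-gate, one NOT-gate and one AND-gate
    # wired together to realise (x + y)z'.
    return AND(OR(x, y), NOT(z))

def truth_table(box, n):
    # Value of an n-input black box at every combination of inputs.
    return {v: box(*v) for v in product((0, 1), repeat=n)}
```

For example, truth_table(black_box, 3) lists the output lead's value for each of the eight input combinations.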
It is now a simple matter to construct a black box to realise a given
Boolean function in terms of these basic gates. For example, the arrange-
ment of gates shown in Figure 4.14 realises the function xlx. + x,’ (x,' +
x,’)+ x,’x2‘. Note that all crossings of wires are insulated except those
which are circled. The ‘black box’ is shown by a dotted boundary.
Figure 4.14: Realising Boolean Functions Using Gates
Because of the various laws such as distributivity and associativity, which
hold in a Boolean algebra, the same function may often be represented by
more than one black box. Naturally, between two arrangements for the
same functions, the one with a smaller number of gates is preferable. For
example, to realise the function xy + xz as it is, it takes three gates: two
AND-gates and one OR-gate. But writing the same function as x(y + z)
(using distributive law), saves one AND-gate. Even between two black
boxes requiring the same number of gates of each type, one arrangement
may be better than the other on some other ground. For example, take
the function x + y + z + w of four Boolean variables. This can be realised
in two ways, each using three OR-gates as shown in Figure 4.15. But the
arrangement in (b) is superior to that in (a) because it is quicker. It takes
some time for each gate to operate. If we denote this time by T, then the
time taken by (a) is 3T because the gates must operate one after the other
(we are not counting here the time it takes for the signal to pass from one
end of a wire to another, this time is negligible as compared to the time
T). On the other hand, in (b), two OR-gates can function simultaneously
and hence the time taken to process the function x + y + z + w is only
2T. Simultaneous functioning of several gates is popularly known as
parallel processing.
(a) (b)
Figure 4.15: Two ways to realise x + y + z + w
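Both comparisons made above, the number of gates and the time taken, can be computed from a description of the arrangement. In the Python sketch below an arrangement is written as a nested tuple such as ("OR", "x", "y"); this encoding and the function names are assumptions made for the illustration, and the delay is measured in units of the gate time T.

```python
from itertools import product

def evaluate(node, env):
    # A string is an input lead; a tuple is a gate applied to sub-circuits.
    if isinstance(node, str):
        return env[node]
    op, *args = node
    vals = [evaluate(a, env) for a in args]
    if op == "OR":
        return vals[0] | vals[1]
    if op == "AND":
        return vals[0] & vals[1]
    if op == "NOT":
        return 1 - vals[0]
    raise ValueError(op)

def gate_count(node):
    # Total number of gates in the arrangement.
    return 0 if isinstance(node, str) else 1 + sum(gate_count(a) for a in node[1:])

def delay(node):
    # Longest chain of gates a signal passes through, in units of T.
    return 0 if isinstance(node, str) else 1 + max(delay(a) for a in node[1:])

def equivalent(p, q, names):
    # True when p and q realise the same Boolean function of the inputs.
    return all(
        evaluate(p, dict(zip(names, v))) == evaluate(q, dict(zip(names, v)))
        for v in product((0, 1), repeat=len(names))
    )

SUM_OF_PRODUCTS = ("OR", ("AND", "x", "y"), ("AND", "x", "z"))  # xy + xz
FACTORED = ("AND", "x", ("OR", "y", "z"))                       # x(y + z)
CHAIN = ("OR", ("OR", ("OR", "x", "y"), "z"), "w")              # as in (a)
BALANCED = ("OR", ("OR", "x", "y"), ("OR", "z", "w"))           # as in (b)
```

Here gate_count gives 3 and 2 for the two forms of xy + xz, while delay gives 3 for CHAIN and 2 for BALANCED although both use three OR-gates.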
We close the section with a brief discussion of sequential circuits. As remarked
at the beginning of this section, the state of such a circuit does not
depend exclusively on the states of the switches appearing in it. Obviously
in such circuits there must be some device which changes its state under a
change of certain conditions but which does not return to its original state
even when the initial conditions are restored. The simplest device of this type
is provided by what is known as a relay. There are various types of relays
such as magnetic or thermal relays. Every relay, regardless of its type, is
a binary device with a circuit called its control path. When no electric
current flows through the control path, the relay is in its natural or release
state. When current flows in the control path, the relay is activated and
is said to be in its operate state. It will continue to be in this state as long
as current flows in its control path. When the control path is broken, the
relay is deactivated, or released and returns to its release state. The
mechanism of change of states depends on the type of the relay. For
magnetic relays it is the magnetisation and demagnetisation of an iron
bar, for thermal relays it is the expansion and contraction upon a change
in temperature. But this is unimportant for our purpose because our
interest is not in the construction of relays but rather in their use.
In order to use a relay, it is necessary to attach to it an additional
feature called a contact. As the name suggests, a contact consists of two
thin metallic plates. One of these plates is steady and is symbolically denoted
by a vertical arrow (↑ or ↓). The other plate is capable of a slight
motion when the relay is activated and returns to its original position (by
a springlike arrangement) when the relay is released. Symbolically this movable
plate is shown by a horizontal line in its natural state and a
slightly slanting line in its activated state. The gap between these
plates is small. When the plates touch each other the contact is said to be
made and when they do not, it is said to be broken. There are two types
of contacts. In a make contact, the plates do not touch in their natural
position but when the relay is activated the movable plate moves and
touches the stationary plate. Symbolically, a make contact is drawn with the
movable plate clear of the arrow when it is broken and touching the arrow
when it is made. On the other hand,
in the other type of contact, called a break contact, the plates touch each
other in the released state and a gap is created when the relay is in its operate
state. A break contact is accordingly drawn with the plates touching in its
release state and apart in its operate state. There can be several contacts
on the same relay. As a rule, relays are denoted by capital letters such as
A, B, X etc. The make contacts on them are denoted by corresponding
lower case letters a, b, x etc. and the break contacts by lower case letters
with primes, a', b', x' etc. This notation is consistent with our earlier practice
of using ' for complementation because a contact has only two states,
one where it is made (i.e., the plates touch each other) and one where it is
broken (i.e., the plates do not touch each other). It is obvious that a make
contact and a break contact on the same relay will always be in opposite
states, whether the relay is operated or released. Symbolic representations
of relays and the contacts on them are shown in Figure 4.16. The standard
practice is to write the contacts directly above the relays and to show the
link between a relay and a contact on it by a dotted line. (Such a line is
only symbolic.) Unlike the circuits considered earlier in this section, it is
customary to show the control circuits of relays in full, that is, with
the power source included in them. The completion of a circuit may
be symbolised by earthing the two ends. As a student of
electricity knows, the earthing of a lead simply means maintenance of a
constant potential at its terminus and not necessarily a physical burying
into the earth.
If two pieces of a wire are joined to the two plates of a contact on a
relay, then the contact behaves exactly like a switch on that wire. The only
difference, perhaps, is that while a switch is generally operated manually,
a contact is operated through a relay whose control path may be operated
manually. Such a relay therefore provides an indirect way to control the
current in some other circuit and is useful when a direct manual control of
the current in the other circuit is undesirable for some reason, such as
Relay in release state; relay in operate state
Make contact with relay released; make contact with relay operating
Break contact with relay released; break contact with relay operating
Figure 4.16: Symbolic Representation of Relays and Contacts
safety. But conceptually this use of a relay is not very interesting. From the
point of view of Boolean algebra there is little difference between a circuit
consisting of manually operated switches and one consisting of contacts on relays whose
control paths contain switches, as long as these control paths do not mingle
with the contacts. However, things are quite different when the control path
of a relay passes through a contact on that relay itself or, more generally,
when the control paths of several relays contain contacts on each other. It
is this feature of a relay that truly distinguishes it from a switch although
both are binary devices. We study the simplest circuit of this type shown
in Figure 4.17. Here the relay X is provided with a make contact. When
the switch a is initially open no current flows through the control circuit
of X and so X is in the released state. (Normally, in all diagrams involving
relays, they are shown in the released state. What happens when any of
them are activated has often to be visualised mentally.) Now if a is closed,
current flows through the control circuit of X. So X is activated and
consequently the make contact x is made. A portion of the current will
flow through x also. Now even if the switch a is opened, current will
continue to flow through x and will hold the relay in the operate state.
(Hence the name 'operate and hold' circuit.) Such a circuit is the basic
memory device because the operate state of the relay 'remembers' that the
switch had been closed. If the relay is to be released, the control path will
Figure 4.17: Operate and Hold Circuit for a Relay
have to be broken at some other point (for example, between the battery
and the relay).
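The memory property of the operate and hold circuit can be seen in a step-by-step simulation. The Python sketch below rests on a reading of the description above rather than on the figure: the switch a and the make contact x are taken to be in parallel with each other and in series with the battery and the relay, so that the control path is complete iff the battery is connected and at least one of a, x conducts. Operate and release times are ignored, and the name operate_and_hold is our own.

```python
def operate_and_hold(events):
    # events is a list of pairs (a, power): the state of the switch a and
    # whether the battery is connected, one pair per step.
    x = 0  # state of relay X, and hence of its make contact x
    history = []
    for a, power in events:
        # Control path complete iff power is on and a or x conducts.
        x = int(power and (a or x))
        history.append(x)
    return history

# Switch open, closed, opened again, then the battery disconnected.
steps = [(0, 1), (1, 1), (0, 1), (0, 1), (0, 0), (0, 1)]
states = operate_and_hold(steps)
```

With these steps the relay stays operated after the switch is opened and is released only when the control path is broken at the battery.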
Another important feature of relays is that a relay never acts instantaneously.
There is always a definite time gap between the instant the
control path of a relay is complete and the instant when the relay actually
operates (which means all the make contacts on it are made and all break
contacts are broken). This time is called the operate time of the relay.
Similarly the release time of a relay is the time taken from the breaking
of its control path to its actual release. For simplicity we shall assume
that both these times are the same. They of course vary considerably
depending upon the type of the relay, ranging from a millionth of a second
for certain electronic relays to several seconds for thermal relays.
Because of the delay in relays, circuits can be designed which will
repeat some pattern of states periodically. The simplest such circuit, shown
in Figure 4.18, appears in a door bell.
Figure 4.18: A Door Bell Circuit
In this circuit, there is a break contact on a relay and the control path
of the relay passes through this contact. As soon as the switch a is closed,
the control circuit is complete. After the lapse of the operate time the
break contact is broken. This breaks the control circuit. However, another
time interval, equal to the release time of X, has to pass before the relay
is released, i.e. the break contact is made again. Now the control circuit is
complete again. This process repeats cyclically till the switch a is opened,
the contact being made and broken in alternate intervals of time. This
make-and-break arrangement is used in a door bell. It may be used, with
a relay of a delay time of several seconds, to have a lamp go on and off
periodically. All we have to do is to put into the circuit of the lamp, a
contact on the relay X (other than the break contact mentioned earlier),
as shown in Figure 4.18. Such a circuit is called an output circuit attached
to the relay circuit.
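The make-and-break cycle can be followed interval by interval. In the Python sketch below time is divided into intervals equal to the delay time of the relay (operate and release times being assumed equal, as above), and the state of X in the next interval is 1 exactly when its control path, which contains the switch a and the break contact x', is complete in the present one. The function name and the choice of a make contact for the lamp are assumptions made for the illustration.

```python
def door_bell(a_sequence):
    # a_sequence[k] is the state of the switch a during interval k + 1.
    # Returns the state of relay X during each interval.
    x = 0
    states = []
    for a in a_sequence:
        states.append(x)
        # Control path is complete iff a is closed and the break contact
        # x' is made, i.e. X is currently released.
        x = int(a and not x)
    return states

relay = door_bell([1, 1, 1, 1, 0, 0])
# An output circuit through a make contact on X: the lamp is on
# whenever the relay is operated.
lamp = list(relay)
```

With the switch held closed for four intervals the relay alternates between its two states and comes to rest once the switch is opened.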
As another illustration, consider the circuit of Figure 4.19 (a) in which
there are two relays X and Y and the control path of each passes through
a contact on the other (interlocked relays). For simplicity let us suppose
both the relays have the same delay time, T. The state of the circuit has
to be described for various intervals of time T, starting from the instant
the switch a is closed. During the first time interval both the relays are at
rest (even though the control circuit of Y is complete). At the end of the
first interval, the make contact on Y is made and the control path of X
is completed. However X is operated only at the end of the second interval.
In the third interval both X and Y are in the operate state but the control
path of Y is broken. Consequently, at the end of the third interval, Y
will come to rest. This will break the control path of X. So the relay X
will be released at the end of the fourth interval. We are now back to
square one, so to speak, and the states of the relays would repeat cyclically
in a sequence till the switch a is opened (hence the name sequential
circuits) as shown in the table in Figure 4.19 (b). By attaching suitable
output circuits, we can utilise this circuit to provide four consecutive
recurring actions of frequency 1/(4T) each.
Time Interval   1  2  3  4  5  6  7  8  9
X               0  0  1  1  0  0  1  1  0
Y               0  1  1  0  0  1  1  0  0
(b)
Figure 4.19: Interlocked Relays of Equal Time Delays
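The table in Figure 4.19 (b) can be regenerated from the description of the circuit. The Python sketch below assumes, as the discussion above suggests, that the control path of X passes through a make contact y on Y and that the control path of Y passes through a break contact x' on X, both in series with the switch a; each step of the loop represents one interval of length T. The function name is illustrative.

```python
def interlocked_relays(intervals, a=1):
    # Returns the pair (X, Y) of relay states during each interval.
    x = y = 0
    table = []
    for _ in range(intervals):
        table.append((x, y))
        # Each relay takes, one interval later, the state of its control
        # path: X follows a·y and Y follows a·x'.
        x, y = int(a and y), int(a and not x)
    return table

table = interlocked_relays(9)
x_row = [x for x, _ in table]
y_row = [y for _, y in table]
```

The two rows obtained in this way repeat with period four, so each of the four state combinations recurs once in every 4T.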
Further problems on the use of relays, both as memory devices and in
sequential circuits, will be given as exercises.
Exercises
3.1 Find the Boolean functions representing the circuits of Figure 4.20.
Simplify the circuits.
Figure 4.20: Circuits for Exercise (3.1)
3.2 Find the Boolean function realised by the circuit in Figure 4.21
both by finding all possible paths for the current and also by
replacing the star of degree 4 in it. (Caution: Such replacement
will result in the production of other stars of degree 4 which will
then have to be replaced.)
Figure 4.21: Circuit for Exercise (3.2)
3.3 A hall has 3 doors and a central lamp. At each door, a switch is
provided. Design a circuit in which each of these three switches
can control the lamp independently of the others.
3.4 Generalise the last problem by assuming that the hall has n doors.
Prove that the closure function of the lamp is a symmetric function
whose characteristic numbers are in an arithmetic progression
with common difference 2.
3.5 Prove that the circuit of Figure 4.22 can be used to realise the
function in the last problem. (This circuit is said to be obtained
from that in Figure 4.10 by the shifting down method, because of
the wires slanting forward that go downwards (from level 1 to
level 0) instead of upwards as in Figure 4.10.)
Figure 4.22: Shifting down Method (Exercise (3.5))
3.6 At a party there are several tables. At each table there is a switch
which closes if the number of persons sitting at that table is odd.
Design a circuit with these switches which will indicate whether
the total number of persons at the party is even or odd.
3.7 For Exercise (2.6), design a circuit with 8 switches (one for each
subject and one for indicating whether the candidate comes from
a scheduled caste) in which a green lamp will light if the candidate
passes the examination and a red lamp if he fails.
3.8 A concert is to be recorded on a series of cassettes (the concert
being too long to come on one cassette). To avoid gaps that may
result because of the time taken to change cassettes, two recording
machines are provided. Each machine has a switch. Design a
circuit such that when the position of either one of these two
switches is altered (the other switch remaining in the same position),
the states of both the machines would change (i.e., the machine
which is running would stop and the machine at rest would start).
3.9 Three students A, B and C share an apartment and take turns at
cooking as per the following schedule:
(i) When exactly two of them have a test the third student cooks.
(ii) When only A has a test, B cooks on even numbered days
and C on odd numbered days.
(iii) When only B has a test, C cooks on even numbered days
and A on odd numbered days.
(iv) When only C has a test, A cooks on even numbered days
and B on odd numbered days.
(v) In all other cases they eat out.
Each student is given a switch which closes iff he has a test. There
is also a fourth switch which closes precisely on even numbered
days. Design a circuit with these four switches in which a red lamp
lights if A is to cook, a blue one if B is to cook, a green one if
C is to cook and a white one if they are to eat out.
3.10 An obstacle track has five obstacles. At each obstacle, there is a
switch which closes automatically if a fault is committed while
clearing that obstacle. Design a circuit in which lamps of different
colours would light to indicate the number of faults committed.
3.11 Do the variation of Problem (3.2) in which one of the committee
members has a special veto power which he may exercise by
pressing an additional switch, independently of his opinion as an
ordinary member.
3.12 Suppose that abstention is allowed in Problem 3.2. Then a single
switch (having only two states) does not suffice to represent a
member's mind. So suppose each person pi is given two switches
xi and yi. The person closes xi if he is in favour and yi if he is
opposed to the bill. If he wishes to abstain, he closes neither.
(Closing both xi and yi may be either prohibited or taken to mean
abstention.) Now design a circuit with these 10 switches in which
a green lamp lights if the bill is carried by a majority of those
voting, a red lamp if the bill is defeated by a majority of those
voting and a white lamp if the votes are equally divided.
3.13 The last exercise illustrated how 2 Boolean variables can be utilised
to represent a ternary device (with some wastage). What is the
minimum number of Boolean variables needed to represent a
variable taking n possible values? (Any representation of a variable
in terms of several Boolean variables is known as a binary coding
of that variable.)
3.14 Design black box representations, using logic gates, for the following
functions. Use as few gates as possible.
(i) xy' + yz' + zx
(ii) x ⊕ y (that is, the 'exclusive or' or the 'ring sum' of x and y).
(iii) x1 ⊕ x2 ⊕ ... ⊕ xn (this is called the parity checker, see
Exercise (2.21)).
(iv) xyz' + x'y + xy'z.
3.15 Suppose a designer of black boxes has run out of AND-gates.
Show that he can still manage if he has an adequate supply of the
other two logical gates, namely the OR-gates and the NOT-gates.
(Technically, a set S of gates is called functionally complete if
every logical gate can be constructed from replicas of the gates in
S. Thus {OR, NOT} is a functionally complete set. So is {AND,
NOT}.)
3.16 Three other basic logic elements are the XOR-gate, the NAND-gate
and the NOR-gate. For inputs x and y, their outputs are, respectively,
x ⊕ y, (xy)' and (x + y)'.
(a) Construct these gates.
(b) Show that {NAND} is a functionally complete set.
(c) Show that {NOR} is also functionally complete.
3.17 Show that any delta in a circuit may be replaced by a wye. In
other words, prove that the circuits in (a) and (b) of Figure 4.23
are equivalent if the blanks in (b) are suitably filled. (This is
called delta-to-wye conversion. It is not quite the opposite of the
wye-to-delta conversion, because if the two are applied one followed
by the other then the circuit that results is not the same as the
original one, although it is of course equivalent.)
Figure 4.23: Delta-to-wye Conversion (Exercise 3.17)
3.18 In Figure 4.24, assume a and b are pushbutton switches which
cannot be operated simultaneously. Prove that if a is pressed and
released and then b is pressed and released then the relay Y will
remain operated but if first b is pressed and released and then a is
pressed, it will be at rest. (In other words, this circuit can distinguish
between the order in which two switches are pressed).
Figure 4.24: Circuit for Exercise 3.18
"3.19 Generalising the construction in the last exercise, design a toy with
three pushbuttons marked A, C and T (and using as many 'hidden'
relays as needed), in which a bell will ring iff the pushbuttons are
pressed in the order C-A-T, and a red lamp will light otherwise,
indicating that a fresh start is necessary. (The red lamp should not
light at press and release of C or at C-A, but it should light at
C-C, C-A-A, C-T-A etc.)
3.20 By attaching suitable output circuits to the relay diagram in
Figure 4.19 (a), design a traffic signal in which a red lamp lights
for 30 seconds, followed by a green lamp for 15 seconds and a
yellow lamp for 15 seconds and then this cycle repeats.
3.21 Generalising the arrangement in Figure 4.19 (a) to n relays with the
same delay times, obtain a cycle of length 2n of state combinations
of the relays, which repeats itself.
Notes and Guide to Literature
The material in this section is elementary and standard. Our treatment
is modelled after Whitesitt [1]. However, we have deviated from the practice
of '4 :na directly, " ‘ ', ‘ of ' " ' " we have
taken a circuit as a Boolean function of the switches occurring in it and
used the theory of Boolean functions. Because of the great importance of
the Boolean algebra of functions in switching networks, this algebra is
often itself called the switching algebra. For more on it, and for methods
of simplifying Boolean functions, see Dornhoff and Hohn [1].
Digital processing using logic elements is an important branch of
electronics today. See, for example, Strangio [1].
4. Applications to Logic
As remarked earlier, Boole originally formulated his theorems as laws
of thought. In other words, application to logic was the very motivation
for the study of Boolean algebras. Although we have treated them
axiomatically, keeping in line with the modern trend for abstraction, in
this section we show how the basic concepts and results from Boolean
algebras can be applied to streamline the thinking process, especially the
process of logical deduction.
The crux of the applicability of Boolean algebra to logic is that Boolean
algebras deal with two-state systems and in logic (or at least the kind of
logic to which we are confining ourselves, namely, bivalued logic)
every statement is a two-state system, because it can be either true
or false but not both simultaneously. Indeed this is the sine qua non of a
statement. Although a complete and rigorous definition of a statement
is beyond our scope (for such a definition would cut deep into linguistics
and philosophy), we shall take a statement (also called a proposition) to be
any declarative sentence having a single, definite meaning which is either
true or false but not both. Let us examine these three conditions, one by
one.
The first requirement, that of being a declarative sentence, rules out all
questions, commands and mere phrases from being statements unless, of
course, they are used either figuratively or for brevity to mean declarative
sentences. For example, 'Who does not want to be rich?' is a statement
despite being ostensibly a question. (Whether it is a true statement or a
false statement is another matter.) Similarly the phrase 'In the morning' is
not a statement by itself. But as an answer to the question 'When did you
arrive?' it is a statement, the words 'I arrived' being understood.
The second requirement of a statement, namely, that it should have a single
definite meaning, is more elusive because the meaning is subject to the
context. The sentence 'Today is Friday' is a different statement every day and
it is only when the day is understood that it becomes a statement. Similarly
a sentence like 'John is tall' would be a statement only when the particular
individual named John and the standard of tallness are expressed or at least
understood. To avoid the dependence upon the context, some authors
confine their discussion only to what may be called mathematical statements,
where, as noted in Chapter 1, every term, other than a primitive
term, is given a precise definition. While this chaste practice is
mathematically adequate (because all mathematical statements are included
anyway), if we adopt it we would have to sacrifice many 'real-life' statements.
Secondly, even if we restrict ourselves to only mathematical statements, some
imperfection of meaning will always occur because of the primitive terms.
Let us, therefore, not be very fussy about the question of meaning, which
is really a deep question of philosophy. However, two points deserve to be
noted. First, by 'meaningful' we do not necessarily mean 'sensible'.
Statements like 'Either Grandma chews gum or missiles are costly' and 'If there
is life on Mars then the postman delivers a letter' sound utterly nonsensical
and seldom appear in practice (except, of course, in surrealistic literature!).
But from the point of view of logic they are as valid examples of statements
as 'Either the set A is empty or else it has a least element' and 'If
two triangles are congruent then they have equal areas'. Another point
about 'meaning' is that, as emphasised in Chapter 1, Section 4, we interpret
statements to mean exactly what they say. We exclude all connotations and
implications.
It is the third requirement, namely that every statement must be either
true or false, that really calls for a comment. At first sight it may appear to
be superfluous, because every declarative sentence with a definite meaning
must be either true or false depending upon whether what it states holds
or not. For example the sentence 'The boiling point of water is 100°
centigrade' is true if the phenomenon asserted by it in fact holds, that is if
water indeed boils at 100°C (but at no lower temperature) and false
otherwise. The statement is silent about the conditions under which water is
boiled. As it stands, it is false. To make it true, these conditions would have
to be specified. But that is immaterial for our purpose. The point is that
every sentence about physical facts is either true or false according as its
contents are consistent or inconsistent with observations. The same holds
about mathematical statements. To be sure, there are many sentences which
are not known to be true or false at present. Goldbach's conjecture was
cited as an example in Chapter 1, Section 4. As a real life example we can
take the statement 'There is life outside our solar system'. What all these
examples show is that while it may be a matter of varying degree of
difficulty to ascertain the truth or falsehood of a sentence, the fact that it is
either true or false is obvious enough. Why do we then include it as an
additional requirement of a statement?
The answer lies in our attempt to avoid paradoxes. These paradoxes
are of the same spirit as Russell's paradox in Chapter 2, Section 1, which
arises if we let any collection of objects be a set. To avoid it, we had to
put some restriction on the collections. Similarly, let us see what happens
if we allow every declarative sentence (with a single definite meaning) to be
a statement. Consider the sentence 'I am telling you a lie'. This sentence
cannot be called true because if so then what it says is true and hence the
person uttering it is lying. But this means that what he is saying is not true
and hence the statement is false. Similarly if we say the sentence is false
then we are forced to conclude that it is true! This example shows that
there do exist sentences which cannot be assigned any truth value (i.e., which
cannot be called true or false without getting a contradiction). These
examples are different from those where we do not know the truth value of
a statement (such as Goldbach's conjecture). They also show that the
third requirement of a statement, that it should be either true or false but
not both, is not a superfluous one.
If we look at this example closely we see that the sentence ‘I am telling
you a lie’ differs from some sentence like ‘The boiling point of water is 100°C’
in one crucial respect, namely it makes a reference to its own truth value.
There are many variations of the paradox given above where a sentence
makes a self-reference directly or indirectly (for example, two sentences
referring to the truth values of each other). Such sentences cannot be assig-
ned truth values and consequently cannot be called statements. Sometimes
they are called metastatements. They share some of the properties of ordi-
nary statements, but not all. In the same vein terms like metaknowledge
are used to express knowledge about the presence or absence of knowledge
of something. For example, in the statement, ‘I know that I know nothing’
the second occurrence of the verb ‘to know’ conveys knowledge but the
first occurrence conveys metaknowledge. If we treat metastatements on par
with statements (or metaknowledge on par with knowledge) we get many
paradoxes (a few of which will be given as exercises).
We shall not pursue the definition of a statement further because as far
as the applications of Boolean algebra are concerned, what matters is not
the definition of a statement but how the truth or falsehood of a statement
depends on that of some other statements, much the same way as in the
applications of Boolean algebra to electricity, what matters is not the
construction of switches but the formation of circuits from these switches. Our
primary concern will thus be to express certain statements (or more
precisely, their truth values) as Boolean functions of some other statements
(or more precisely, of their truth values). We say the truth value of a
statement is 1 (or T) if it is true and 0 (or F) if it is false. Every statement is,
therefore, a Boolean variable, very much like a switch. Just as two switches
which are always in the same state are to be regarded as equal, two
statements which are always simultaneously true or simultaneously false are to be
regarded as equal. (Earlier we called such statements logically
equivalent. Now we are treating them as equal because our concern is only with
their truth values. An alternate approach would be to consider, instead of
statements per se, their equivalence classes under the relation of logical
equivalence. But we stick to the first approach.) We shall generally denote
statements by p, q, r, etc.
We already defined the basic operations of disjunction, conjunction and
negation of statements. If p, q are statements, their disjunction is denoted
by p + q (or by p ∨ q because the statements form a lattice). The conjunction
of p and q is denoted by p·q, pq or by p ∧ q. The negation of p is denoted
by ¬p, ~p or p'. To justify that the negation indeed corresponds
to complementation in a Boolean algebra, we first have to consider what
are the identity elements for the operations + and ·. A statement which
is always true is called a tautology and is denoted by 1 while a statement
which is always false is called a contradiction and is denoted by 0. (Although
both these terms have been used earlier, in the present context they play
the role of constant functions which are identically 1 and 0 respectively.)
It is clear that for any statement p, the statement p + 0 will be true iff p is
true (because 0 can never be true). Hence p + 0 always has the same truth
value as p. So by our convention, p + 0 = p, showing that 0 is an identity
for +. Similarly 1 is the identity for ·. Commutativity of + and · is trivial
while the distributivity of each over the other can be established directly as
follows (see also Exercise (3.4.8)). Let p, q, r be statements. Then p·(q + r)
is true iff p is true and at least one of q and r is true. This is equivalent to
saying that either p and q are true or else p and r are true. But that is
exactly the condition for pq + pr to be true. Thus p(q + r) = pq + pr.
Similarly the other distributivity can be proved. As for complementation,
it is clear that for any statement p, p + p' is always true and p·p' can never
be true. (It is here that bivaluedness of logic is crucially needed.)
Therefore if we let S be the set of all statements and consider the
operations +, · and ' defined respectively by disjunction, conjunction and
negation then we have verified all the axioms of a Boolean algebra. However, there
is a technical difficulty in calling the set S (along with these operations) a
Boolean algebra because S is not a set at all! We cannot form the set of
all statements, much the same way as we cannot form the set of all sets.
The difficulty involved here is a matter of axiomatic set theory and beyond
our scope. Still, to play it safe, we state the results in the following form:
4.1 Theorem: Let S be any non-empty set of statements which is closed
under +, · and ' (i.e. for every p, q ∈ S, p + q, p·q and p' ∈ S). Then
(S, +, ·, ') is a Boolean algebra. ∎
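The verification of the axioms above can also be carried out mechanically, since each law involves only finitely many truth values. The following is a minimal sketch in Python, not taken from the text; the names disj, conj, neg, holds_always and LAWS are our own, and truth values are represented by the integers 0 and 1.

```python
from itertools import product

# Truth values: 1 for true, 0 for false.
def disj(p, q):
    """p + q: true iff at least one of p, q is true."""
    return p | q

def conj(p, q):
    """p . q: true iff both p and q are true."""
    return p & q

def neg(p):
    """p': true iff p is false."""
    return 1 - p

def holds_always(law, n_vars):
    """Check a law for every assignment of 0/1 to its n_vars variables."""
    return all(law(*row) for row in product((0, 1), repeat=n_vars))

# description -> (law returning True/False, number of variables)
LAWS = {
    "p + 0 = p": (lambda p: disj(p, 0) == p, 1),
    "p . 1 = p": (lambda p: conj(p, 1) == p, 1),
    "p + q = q + p": (lambda p, q: disj(p, q) == disj(q, p), 2),
    "p . q = q . p": (lambda p, q: conj(p, q) == conj(q, p), 2),
    "p(q + r) = pq + pr": (
        lambda p, q, r: conj(p, disj(q, r)) == disj(conj(p, q), conj(p, r)), 3),
    "p + qr = (p + q)(p + r)": (
        lambda p, q, r: disj(p, conj(q, r)) == conj(disj(p, q), disj(p, r)), 3),
    "p + p' = 1": (lambda p: disj(p, neg(p)) == 1, 1),
    "p . p' = 0": (lambda p: conj(p, neg(p)) == 0, 1),
}

results = {name: holds_always(law, n) for name, (law, n) in LAWS.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```

Such a check only confirms the laws for truth values; it says nothing about the set-theoretic difficulty mentioned before the theorem.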
It often happens that the truth value of a statement depends on the
truth values of some other statements, say p, q, r, ... The number of
statements p, q, r, ... may be finite or infinite. For example, the truth value of
p + q depends only on that of p and q. But the truth of a statement like
'every positive integer can be expressed as a sum of four perfect squares'
depends on the simultaneous truth of the infinite sequence p_1, p_2, ..., p_n, ...
of statements, where, for each positive integer n, p_n is the statement 'n can
be expressed as a sum of four perfect squares'. More generally, the truth
of a statement about a class sometimes depends on the truth or falsehood
of statements about the individual members. (This is not always the case,
because as noted in Chapter 2, Section 1, there are some attributes of a set
which cannot be described in terms of any attributes of the individual
members.)
When the truth value of a statement f depends on the truth values of
infinitely many statements, there is no easy way to express f in terms of
those statements. We can try things like the conjunction of infinitely
many statements but that would involve a limiting process. However, if
the truth value of f depends upon those of only finitely many statements,
say p_1, p_2, ..., p_n, then f is a Boolean function of the Boolean variables
p_1, p_2, ..., p_n and the methods of Section 2 become applicable. The table
of values of f is known as the truth table of f because the values assumed
by f (and also by p_1, ..., p_n) are the truth values. Knowing the truth table
of f we can find the disjunctive normal form of f and hence express f as
an algebraic expression involving p_1, p_2, ..., p_n and the operations +, ·
and '.
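The passage from a truth table to the disjunctive normal form can be mechanised. The following is a minimal sketch in Python under our own naming (truth_table, dnf and implies are illustrative helpers, not notation from the text): each row of the table on which f takes the value 1 contributes one product term, with a variable primed when its value in that row is 0.

```python
from itertools import product

def truth_table(f, n):
    """Rows (inputs, value) of a Boolean function f of n variables.

    Inputs are listed from (1, ..., 1) down to (0, ..., 0).
    """
    return [(bits, int(bool(f(*bits)))) for bits in product((1, 0), repeat=n)]

def dnf(f, names):
    """Disjunctive normal form of f, written with + for 'or' and ' for 'not'."""
    terms = []
    for bits, value in truth_table(f, len(names)):
        if value == 1:
            # one product term per row on which f is true
            terms.append("".join(v if b else v + "'" for v, b in zip(names, bits)))
    return " + ".join(terms) if terms else "0"

def implies(p, q):
    """p -> q fails only when p is true and q is false."""
    return 0 if (p == 1 and q == 0) else 1

print(dnf(implies, ["p", "q"]))   # pq + p'q + p'q'
```

The sketch does not simplify the resulting expression; simplification is a separate step carried out by the methods of Section 2.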
We apply this procedure to an implication statement p → q where p
and q are some statements. As defined in Chapter 1, Section 4, p → q
means that whenever p is true so is q. It is a little easier to first write the
truth table of the negation of p → q, i.e. of (p → q)'. Note that (p → q)'
holds precisely when p → q fails. But to say that p → q fails (i.e. is false)
is equivalent to saying that p holds and still q fails, that is, p is true but q
is false. In all other cases p → q holds. Hence we get the truth tables of
(p → q)' and of p → q as in Figure 4.25.

Row   p   q   (p → q)'   p → q
 1    1   1      0         1
 2    1   0      1         0
 3    0   1      0         1
 4    0   0      0         1
Figure 4.25: Truth Table of Implication Statement

From this table we see that the disjunctive normal form of p → q is
pq + p'q + p'q'
which upon simplification gives p' + q. In other words, the implication
statement p → q is logically equivalent to 'either p fails or else q holds'. This is
consistent with common sense too. In practice we use a sentence like
'That plane is a jet or else my eyes are failing me' to mean 'If my eyes are
not failing me then that plane is a jet'. The crucial point in preparing the
truth table above was that the implication statement p → q is completely
silent as to the truth of q when p is false. And since it is not saying
anything as to what happens if p is false, we have to let it be true
regardless of whether q is true or not. The logic adopted here is the same as in
vacuous truth. Indeed, vacuously true statements can be expressed as
implication statements whose hypothesis is always false. For example
'Every four legged man is happy' can be expressed as 'If a man has four
legs then he is happy', which is true because its hypothesis is false.
We also note that equating p → q with p' + q is consistent with the
partial order ≤ defined on the set of statements by p ≤ q iff 'p implies q'
is true (cf. Chapter 3, Section 3). For, p' + q = 1 iff pq' = 0 (taking
complements) which is precisely the order relation for a Boolean algebra
in Definition 1.5.
Having obtained an algebraic expression for an implication statement,
we can now handle with ease problems involving logical implications. For
example we see at once that an implication statement p → q is logically
equivalent to its contrapositive, q' → p', because
p → q = p' + q and q' → p' = q + p'
(since (q')' = q). In real-life problems where a system of rules is given,
they are generally in the form of implication statements. To say that
a particular rule is obeyed is equivalent to saying that the corresponding
implication statement has truth value 1. Therefore, to say that a system of
rules is simultaneously satisfied is equivalent to saying that the conjunction
of the corresponding implication statements is true. Factorising this
conjunction into simpler factors leads to a simplification of the rules.
Similarly if the conjunction comes out to be 0 then it means the rules can
never hold simultaneously and hence that the system is inconsistent.
We illustrate this technique for the Business Problem. Take a particular
business and let p, q, r, s be respectively the statements that it has an
import license, it " nn- essential dities, it ,‘ , local
personnel and it employs skilled personnel. Then the three rules are
respectively (i) (p + q') → (rs'), (ii) (p + r') → (sq') and (iii) p' → r'. If
the first rule is to be satisfied then p'q + rs' = 1. Similarly (ii) holds iff
p'r + sq' = 1 and (iii) holds iff p + r' = 1. Thus the business in question
will satisfy all the three rules iff (p'q + rs')(p'r + sq')(p + r') = 1. The
product of the last two factors is sq'(p + r'), which, when multiplied with
(p'q + rs'), gives 0. So we get 0 = 1 which is a contradiction. Thus no
business can satisfy all the three rules. So it is impossible to do any business
in that country.
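The inconsistency of the three rules can be confirmed by exhausting all sixteen combinations of truth values of p, q, r, s. The sketch below is in Python and is only an illustration: the function names are ours, and the rules are encoded as (p + q') → rs', (p + r') → sq' and p' → r', the reading that agrees with the algebraic conditions p'q + rs' = 1, p'r + sq' = 1 and p + r' = 1 stated above.

```python
from itertools import product

def implies(a, b):
    """a -> b, i.e. a' + b."""
    return (not a) or b

def rules(p, q, r, s):
    """Truth values of the three rules for one business."""
    rule1 = implies(p or not q, r and not s)   # (p + q') -> rs'
    rule2 = implies(p or not r, s and not q)   # (p + r') -> sq'
    rule3 = implies(not p, not r)              # p' -> r'
    return rule1, rule2, rule3

# All assignments of truth values under which every rule is obeyed.
satisfying = [bits for bits in product((False, True), repeat=4)
              if all(rules(*bits))]

print(satisfying)   # an empty list: no business obeys all three rules
```

An exhaustive search of this kind grows as 2^n with the number n of statements, which is one reason the algebraic factorisation is preferable for larger systems of rules.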
In Section 1, we tackled the Business Problem by considering the set of
all businesses and certain subsets of it. It is instructive to compare the two
solutions. There we calculated, for each rule, the set of businesses which
violated that rule. It turns out that this set, (or its complement, namely the
set of businesses which obey a particular rule) is intimately related to the
implication statement which corresponds to that rule. We proceed to study
this relationship. First we need a definition which involves the formation
of statements.
4.2 Definition: A predicate (or more precisely a unary predicate) on a
set X is a sentence which contains a variable x such that when every
occurrence of x is substituted by an element of X we get a statement.
Intuitively, we may think of a predicate as a variable statement which
becomes specific when a particular value is assigned to the variable in it.
The terminology comes from grammar where every simple sentence has a
subject and a predicate. The predicate contains the verb and hence describes
the action in the sentence and the subject specifies a particular entity about
which the predicate says something.
For example, let X be the set of all positive integers. Consider the
sentence 'x is a prime and x² can be expressed as a sum of two squares'.
Here x is a variable. For convenience let us write this sentence as p(x).
When we substitute for x some value, say 5, from X we get the statement
'5 is a prime and 25 can be expressed as a sum of two squares'. We denote
this statement by p(5). Similarly p(3) is the statement '3 is a prime and 9
can be expressed as a sum of two squares', p(10) is the statement '10 is a
prime and 100 can be expressed as a sum of two squares' and so on. The
statement p(5) is true while the statements p(3) and p(10) are false. It is very
important to replace every occurrence of the variable by the same element
of the underlying set. Otherwise disastrous results are obtained (which
often form the gist of some jokes). For example if A says to B, 'I love my
wife' and B says 'So do I', then B's statement means that B loves his own
wife and not that B loves A's wife! Here there is an implicit predicate
p(x) = 'x loves x's wife' on the set of all married men (with only one wife
each!) and the statement made by A is p(A). If B wants to make a similar
statement, arising from the same predicate, it would be p(B), which reads
'B loves B's wife'. However, if A says 'I love Lucy' and B says 'So do I'
then B's statement means 'B loves Lucy'. Here the predicate is different,
namely q(x) = 'x loves Lucy'. If Lucy happens to be A's wife then by 'I
love Lucy' whether A means p(A) or q(A) is to be inferred from the context
in practice. Unintended interpretation would lead to a disaster.
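A predicate on the positive integers can be modelled as a function which returns a truth value for each element substituted for the variable. The following Python sketch illustrates the first example above; the helper names are ours, and we assume that 'a sum of two squares' means a sum of squares of two positive integers, since that is the reading under which p(3) is false as stated.

```python
from math import isqrt

def is_prime(n):
    """True iff n is a prime number."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, isqrt(n) + 1))

def is_sum_of_two_squares(n):
    """True iff n = a*a + b*b for some positive integers a and b (assumed reading)."""
    for a in range(1, isqrt(n) + 1):
        rest = n - a * a
        if rest >= 1 and isqrt(rest) ** 2 == rest:
            return True
    return False

def p(x):
    """The predicate 'x is a prime and x squared is a sum of two squares'."""
    # every occurrence of the variable is replaced by the same value x
    return is_prime(x) and is_sum_of_two_squares(x * x)

print(p(5), p(3), p(10))   # True False False
```

Note that the single argument x is used for both occurrences of the variable, which is exactly the substitution rule stressed in the text.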
We remark here that the word 'sentence' appearing in Definition (4.2)
is technically incorrect. An expression like 'x is intelligent' is not called a
sentence because of the variable x occurring in it. It is called just a 'formula'
or a 'string'. A sentence is technically defined as a formula with no free
variables. This distinction is important in the study of formal languages
but we shall ignore it for the moment here.
Note also that the variable x is a dummy variable and could be replaced
by any other variable which ranges over the same set as x. Generally, in
specifying a predicate, the choice of the variable is so made that the set on
which the predicate is defined would be automatically implied. For example,
in the predicate 'If a man is rich, he is happy' we can take 'a man' as the
variable and obviously it ranges over the set of all men. We can also word
the predicate as 'A rich man is happy' where the variable is 'a rich man'
and ranges over the set of all rich men.
We can also consider binary predicates (also called 2-place predicates) or
more generally n-ary predicates. For example if X is the set of all persons
in a town and Y is the set of all houses in it then 'x lives in y' is a binary
predicate with two variables x and y ranging over the sets X and Y
respectively. If the sets X and Y coincide then a binary predicate over X can be
identified with a binary relation on X. Given a binary relation R on X we
define a binary predicate on X by 'x is R-related to y', i.e., by '(x, y) ∈ R'.
Conversely every binary predicate p(x, y) on X determines a binary relation
R on X, if we let R = {(x, y): x, y ∈ X, p(x, y) is true}. It is, in fact, more
consistent with our intuition to think of a binary relation on a set X as a
binary predicate on X rather than as a subset of X × X as we have defined.
But, as just remarked, the two approaches are really equivalent.
Another way to look at a predicate p on a set X is to think of p as a
function from X to the set of statements. The various operations on statements,
such as disjunction etc. can then be defined for predicates by pointwise
application. Thus if p(x), q(x) are two predicates (i.e. unary predicates) on
a set X then we define their disjunction (p + q)(x) as the predicate
p(x) + q(x). Similarly p·q and p' are defined. With these operations, the
set of all predicates on a set X forms a Boolean algebra. We now propose
to investigate the structure of this Boolean algebra by showing that it is
isomorphic to a more familiar Boolean algebra. The key concept on which
this isomorphism is based is the following.
4.3 Definition: Let p(x) be a (unary) predicate on a set X. Then the truth
set of p(x) is defined as the set {y ∈ X : p(y) is true}.
In other words, the truth set of a predicate is the set of those values of
the variable for which the statement formed from the predicate is true.
For example, the truth set of the predicate p(x) on R (the set of real
numbers) defined by p(x) = 'x⁴ − 7 = 9' is {2, −2}.
If the same predicate is considered on the set of complex numbers then
the truth set would be larger, namely, {2, −2, 2i, −2i}. Obviously a
predicate serves as a characteristic property (see Chapter 2, Section 1) for
its truth set, provided the set on which it is defined is understood.
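The dependence of a truth set on the underlying set can be seen in a small computation. The Python sketch below is an illustration only: it reads the example predicate as x^4 − 7 = 9 (the reading consistent with the two truth sets quoted), and since the real and complex numbers cannot be enumerated, it replaces them by finite samples, namely the integers from −5 to 5 and the complex numbers a + bi with integer a, b in the same range. The names truth_set, real_sample and complex_sample are ours.

```python
def truth_set(predicate, domain):
    """The set of those y in the domain for which predicate(y) is true."""
    return {y for y in domain if predicate(y)}

def p(x):
    """The predicate 'x^4 - 7 = 9', written with repeated multiplication."""
    return x * x * x * x - 7 == 9

# Finite stand-ins for the real and the complex numbers (an assumption
# made so that the domains can be searched exhaustively).
real_sample = range(-5, 6)
complex_sample = [complex(a, b) for a in range(-5, 6) for b in range(-5, 6)]

print(truth_set(p, real_sample))      # the two real solutions 2 and -2
print(truth_set(p, complex_sample))   # four solutions: 2, -2, 2i and -2i
```

The same function p yields different truth sets on the two domains, which is why the underlying set must be understood before a predicate can characterise its truth set.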
It is quite possible that two different predicates, say, p(x) and q(x)
defined on the same set X have the same truth sets. The chances of this
are more, the smaller the set X is. For example if X is the set of all students
in a small class it may very well happen (by chance) that all intelligent
students in it are heavy smokers and vice-versa, even though we expect no
correlation between intelligence and smoking. (A similar remark was
made about binary relations, following Definition (3.2.1).) The purpose of
predicates is to distinguish among various elements of a set by means of
properties which some elements may have and others do not. We therefore
regard two predicates on a set as equal if their truth sets are identical.
This may sound absurd at the beginning; for example who would agree
that 'x is a smoker' is the same as 'x is intelligent'? But if we really look
at it from the point of the range of variation of x, it is no longer all that
absurd. The extreme case is that of an empty set on which any two
predicates are equal. When a set X is not empty but still fairly small, two
predicates on X may very well be equal, even though we would hesitate to
admit it. Our hesitation is generally due to our knowledge that the two
predicates are really the restrictions of two unequal predicates on some
superset of X. But if the set X is our whole world, there would be little
room for hesitation.
With this understanding about the equality of predicates on a set X,
we now determine the structure of the Boolean algebra formed by them.
4.4 Theorem: Let X be any set. Assume that two predicates on X
having the same truth sets are equal. Then the Boolean algebra of all
predicates on X is isomorphic to the power set Boolean algebra P(X).
Proof: For a predicate p(x), let T_p be its truth set. Let B be the set of all
predicates on X. Define f: B → P(X) by f(p(x)) = T_p. We verify easily that
f((p + q)(x)) = T_p ∪ T_q, f((p·q)(x)) = T_p ∩ T_q and f(p'(x)) = X − T_p.
So once we show that f is a bijection, it would follow that f is an
isomorphism. That f is one-to-one follows from the assumption that if T_p = T_q
then the predicates p(x) and q(x) are to be regarded as equal. It remains
to show that f is onto. Let A ∈ P(X). Then A is a subset of X. Let p(x) be
the predicate 'x ∈ A'. Clearly T_p = A, that is, f(p(x)) = A. So f is onto. ∎
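For a small finite set X the statements used in this proof can be checked exhaustively. The following Python sketch is only an illustration under our own naming (truth_set, member_of, subsets): it represents each predicate, up to the equality agreed upon, by a membership test 'x ∈ A', and confirms that passing to truth sets turns disjunction, conjunction and negation into union, intersection and complement.

```python
from itertools import combinations

X = frozenset({"a", "b", "c"})   # a hypothetical three-element set

def truth_set(pred):
    """f(p): the set of those x in X for which pred(x) is true."""
    return frozenset(x for x in X if pred(x))

def member_of(A):
    """The predicate 'x is in A' used in the proof that f is onto."""
    return lambda x: x in A

# All subsets of X, i.e. the elements of the power set P(X).
subsets = [frozenset(c) for k in range(len(X) + 1)
           for c in combinations(sorted(X), k)]

# f is onto: every subset A is the truth set of the predicate 'x is in A'.
onto = all(truth_set(member_of(A)) == A for A in subsets)

# f preserves the three operations.
preserves = True
for A in subsets:
    for B in subsets:
        p, q = member_of(A), member_of(B)
        p_or_q = lambda x, p=p, q=q: p(x) or q(x)
        p_and_q = lambda x, p=p, q=q: p(x) and q(x)
        not_p = lambda x, p=p: not p(x)
        preserves = (preserves
                     and truth_set(p_or_q) == A | B
                     and truth_set(p_and_q) == A & B
                     and truth_set(not_p) == X - A)

print(onto, preserves)   # True True
```

The check is of course no substitute for the proof, which works for an arbitrary set X.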
Although this theorem is not very profound, the isomorphism established
in it serves to link the two solutions we gave for the Business Problem.
Let p(x) and q(x) be two predicates on a set X. Let r(x) be the predicate
p(x) → q(x). Let us compute the truth set T_r of r(x). If y ∈ X then the
statement p(y) → q(y) is true iff either p(y) is false or else q(y) is true. This shows
T_r = (T_p)' ∪ T_q. It follows that the predicate r(x) is identically true (that
means p(x) → q(x) for all x ∈ X or equivalently, T_r = X) iff ((T_p)' ∪ T_q)' = ∅,
i.e., iff T_p ⊆ T_q. This is, of course, consistent with our interpretation of
p(x) → q(x) to mean that whenever p(x) is true so is q(x). Even when r(x)
is not identically true, its truth set, (T_p)' ∪ T_q, tells us in which cases it is
true.
Now in the Business Problem, the set X is the set of all businesses in the
country. In the second solution, we took some element of X and considered
the statements p, q, r, s about it. This actually amounts to considering
four predicates, p(x), q(x), r(x), s(x) on X. We expressed the three rules as
certain other predicates on X and showed that their conjunction was 0.
In the first solution, on the other hand, we considered the truth sets of the
predicates p(x), q(x), r(x) and s(x). For each rule, we found its truth set
(or rather its complement) and showed that their intersection was null (by
showing that the union of their complements was the whole set). The two
solutions are, therefore, just the translations of each other.
The algebraic expression for an implication statement also provides a
method for testing the validity of an argument. This was left largely to
common sense in Chapter 1, Section 4. But with the aid of the machinery
of Boolean algebras we can tackle it quite systematically. Recall that in
every argument we have a collection of statements, say, p_1, p_2, ..., p_n which
are called premises and a statement q called the conclusion. The argument
is called valid if whenever all the premises hold true, so does the conclusion.
Now the simultaneous truth of p_1, p_2, ..., p_n is equivalent to the truth of
their conjunction p_1 p_2 ... p_n. Therefore, an argument is valid or invalid
depending upon whether the implication statement (p_1 p_2 ... p_n) → q is
identically true or not. Since (p_1 p_2 ... p_n) → q equals p_1' + p_2' + ... + p_n' + q,
the problem of testing the validity of an argument can be solved by
simplifying this expression and seeing whether it comes out to be a
tautology (i.e., identically equal to 1).
To illustrate this consider the following argument:
Premises: If it rains, the streets get wet.
If the streets get wet, accidents happen.
Accidents do not happen.
Conclusion: It does not rain.
Let r, s, t denote respectively the statements 'it rains', 'the streets get wet'
and 'accidents happen'. Then the three premises are r → s, s → t and t',
or r' + s, s' + t and t', and the conclusion is r'. To test the validity, we
consider
(r' + s)' + (s' + t)' + (t')' + r'
which equals rs' + st' + t + r', which reduces to 1 through
rs' + s + t + r', r + s + t + r' and 1 + s + t.
So the argument is valid. We could also establish the validity by drawing
a table of values in which we show the truth values of the premises and
of the conclusion for every possible combination of truth values of the
statements r, s, t. If we find that for every combination in which all
premises are 1, the conclusion is also 1 we conclude that the argument is valid.
This is a time consuming method. Some saving can be effected by noting
that if in a particular row we find that some premise is 0 or the conclusion
is 1, we need not do any further computation in that row. Even then, the
method is not as efficient as the algebraic method.
If in the example above, the third premise was 'Accidents happen'
then to test the validity we would have to consider the statement
rs' + st' + t' + r'
which, upon simplification, equals r' + s' + t'. This is false when r, s, t
are all true. Thus there is at least one situation in which all the premises
are true but the conclusion is false. (In terms of the truth table, this means
there is at least one row in which the entries in the columns of the
premises are all 1 but the entry in the column of the conclusion is 0.) So the
argument is invalid.
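The truth table method described above is easy to mechanise. The following Python sketch, with names of our own choosing (implies, counterexamples, is_valid), lists the rows in which all premises are true but the conclusion is false; an argument is valid exactly when there is no such row. It is applied to the two arguments about rain, with r, s, t standing for 'it rains', 'the streets get wet' and 'accidents happen'.

```python
from itertools import product

def implies(a, b):
    """a -> b, i.e. a' + b."""
    return (not a) or b

def counterexamples(premises, conclusion, n_vars):
    """Rows where every premise is true and the conclusion is false."""
    return [row for row in product((False, True), repeat=n_vars)
            if all(prem(*row) for prem in premises) and not conclusion(*row)]

def is_valid(premises, conclusion, n_vars):
    """An argument is valid iff it has no counterexample row."""
    return not counterexamples(premises, conclusion, n_vars)

# Statements as functions of the truth values of r, s, t.
rain_wets = lambda r, s, t: implies(r, s)        # r -> s
wet_causes = lambda r, s, t: implies(s, t)       # s -> t
no_accidents = lambda r, s, t: not t             # t'
accidents = lambda r, s, t: t                    # t
no_rain = lambda r, s, t: not r                  # r'

first = is_valid([rain_wets, wet_causes, no_accidents], no_rain, 3)
second = is_valid([rain_wets, wet_causes, accidents], no_rain, 3)
bad_rows = counterexamples([rain_wets, wet_causes, accidents], no_rain, 3)

print(first, second, bad_rows)   # True False [(True, True, True)]
```

The single counterexample row is the one in which r, s, t are all true, in agreement with the algebraic computation.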
When the number of premises is large, it is hardly practicable to check
the validity of an argument either by the truth table or by a reasoning
similar to the above. In such cases we resort to what may be called the chain
rule about validity. Simply stated, it says that by chaining together two
(or more) valid arguments, we get a valid argument. A precise formulation
is as follows. Let A_1 be an argument with premises p_1, ..., p_k and conclusion
q_1. Let A_2 be an argument with premises q_1, p_{k+1}, ..., p_n and conclusion q_2.
Let A_3 be an argument with premises p_1, ..., p_k, p_{k+1}, ..., p_n and conclusion
q_2. Then, if A_1 and A_2 are valid, so is A_3. The proof of the chain rule is
easy and left to the reader.
The chain rule is used so frequently in mathematics that an explicit
mention is rarely made. Whenever in a proof we cite a theorem previously
proved, we are implicitly using the chain rule. By repeated applications of
the chain rule a valid argument can be split into a series of some very
simple arguments. Among these, probably the most frequently used argu-
ment is what is called modus ponens. Formally, this is an argument with
two premises, one of the form p → q and the other p, and whose conclusion
is q (where p, q are any statements whatever). Verbally, modus ponens is
the argument that if an implication statement holds and its hypothesis is
true, then so is its conclusion. This is certainly consistent with common
sense and it is a triviality to verify the validity of modus ponens. Using
modus ponens and the chain rule let us now establish the validity of the
argument given above. We follow the notation there and present the
reasoning in the form most commonly adopted in mathematical proofs,
as a sequence of steps beginning with the premises, ending with the conclu-
sion and defending each step.
(1) r -> s (given as a premise)
(2) s -> t (given as a premise)
(3) ¬t (given as a premise)
(4) ¬t -> ¬s (equivalent to (2))
(5) ¬s ((3), (4) and modus ponens)
(6) ¬s -> ¬r (equivalent to (1))
(7) ¬r ((5), (6) and modus ponens).
Hence the argument is valid. This method is generally not very useful
in proving that an argument is not valid, because even if we cannot reach
the conclusion through one particular sequence of simple arguments,
conceivably some other sequence could work. However, for proving validity,
it is the method most frequently resorted to in mathematics. The
desired conclusion appears at the end. But every step along the way is
itself the 'conclusion' of a valid argument. These are often called
minor or subordinate arguments.
We often come across arguments of the following type:
Premises: Every man is mortal
Socrates is a man.
Conclusion: Socrates is mortal.
The validity of this argument is obvious. But it is not strictly speaking
a case of modus ponens. If the first premise were ‘If Socrates is a man,
then Socrates is mortal' then this would be a case of modus ponens. As it
happens, the first premise is much stronger. Consider the predicate ‘If x
is a man then x is mortal’. Here the variable x can range over any set.
The first premise then says that this predicate is identically true, although
we are using it in only a particular instance, namely, when the variable x
is given the value 'Socrates'. For this reason, the first premise is called a
major premise and the second is called a minor premise. This form of an
argument is called instantiation because it consists of taking a particular
instance of a statement.
We urge the reader to go back to Exercise 1.4.9 and test the validity of
the arguments using the methods of Boolean algebra. The reasoning used for
proving validity of arguments can also be used to prove that a system of
statements is inconsistent. All we have to do is show that 0 can be drawn as
a valid conclusion of an argument whose premises are the given statements.
This provides yet another solution of the Business problem which will be
developed through exercises.
Exercises
4.1 Which of the following expressions are statements? Why?
(1) If it rains the streets get wet.
(2) If it rains the streets remain dry.
(3) It rains.
(4) If it rains.
(5) John is intelligent and John is not intelligent.
(6) If John is intelligent then John is intelligent.
(7) If John is intelligent then John is not intelligent.
(8) For every man there is a woman who loves him.
(9) There exists a woman for whom there exists no man who loves
her.
(10) There exists a woman such that no man loves her.
(11) This sentence is false.
(12) There is life outside our solar system.
4.2 On both the sides of a piece of paper it is written, 'The sentence on
the other side is false'. Are the two sentences so written statements?
Why? What if on one side 'The sentence on the other side is true'
is written and on the other side 'The sentence on the other side is
false'?
4.3 A barber in a village makes an announcement, 'I shave those (and
only those) persons in this village who do not shave themselves'. Is
this announcement a statement? Why?
4.4 Comment on the usage of the verbs ‘to learn‘ and ‘to teach’ in the
sentence ‘The only thing we learn from history is that it teaches
us nothing’.
4.5 Suppose God the Almighty can do anything at any time. Can He
then make a stone so heavy that He will not be able to lift it?
Comment.
4.6 What, if anything, is wrong in the following reasoning?
‘An instructor of a course announces to the class that he would
give a surprise test next week. The week runs from Monday to
Friday and by ‘surprise' is meant that on no day prior to the test
could the class tell with certitude that the test will fall on the next
day. Then the test could not be given on Friday because
in that case, by the evening of Thursday, the class would be in a
position to predict it. So the test has to be given on a day from
Monday to Thursday. But then by the same reasoning, it cannot
be given on Thursday, as otherwise, by Wednesday evening the
class would come to know about it. Continuing in the same
manner, the test would have to be given on Monday. But then it
is hardly a surprise test. So it is impossible to give a surprise test,
after announcing that there would be one.’
4.7 A clever inspector of police assured his suspects that he would not
resort to any third degree methods to extort confessions from
them, if they would agree to answer truthfully just two questions,
with simple 'yes' or 'no' answers. After a suspect agreed, the
inspector would ask him the first question and the suspect would
reply 'yes' or 'no'. Then the inspector would ask his second
question 'Are you guilty?' and to this the suspect had to answer
'yes'. Find out the inspector's first question that would do the trick.
Then comment upon this form of questioning. (Forget the lega-
lities involved. Just stick to the logic of it.)
*4.8 A and B are two mathematicians. A third person, C, selects two
(not necessarily distinct) integers x and y each less than 100 and
greater than 1. He tells A only their sum x + y and tells B only
their product xy. A is also told that B is given the product xy but
not the sum x + y and B is told that A is given x + y but not xy.
The following conversation takes place:
B : I do not know the two numbers x and y.
A : I already know you don't.
B : Now I know them.
A : So do I.
Assuming both A and B are telling the truth, find the numbers x
and y.
4.9 Verify the following identities for statements p, q, r by writing
truth tables for both the sides.
(i) p·(q + r) = p·q + p·r
(ii) p + (q·r) = (p + q)·(p + r)
(iii) (p + q)·(p·q)' = (p·q') + (p'·q)
4.10 Do Exercise (1.13) using implication statements.
4.11 Do Exercise (1.11) using implication statements.
4.12 A set X consists of 5 men named Ram. Rahim, Robert, Gopal and
Goliath. They have the following properties in regard to their
height, financial status, intelligence and age.
Person    Height   Financial status   Intelligence   Age
Ram       Tall     Rich               Dumb           Young
Rahim     Tall     Poor               Dumb           Old
Robert    Short    Poor               Intelligent    Young
Gopal     Tall     Rich               Intelligent    Old
Goliath   Short    Rich               Dumb           Young
Find the truth sets of the following predicates on X
(i) A man is rich and intelligent.
(ii) A man is rich or old but not necessarily tall.
(iii) A rich man is intelligent.
(iv) A man whose name begins with an ‘R’ is rich.
4.13 Prove the validity of modus ponens and of the chain rule.
4.14 Besides modus ponens and instantiation the following elementary
forms of valid argument are frequently needed. The names of some
of them are given against them. Establish the validity of each one
of them. (p, q, r, s are some statements)
(i)   p -> q
      q -> r
      ------   (Hypothetical Syllogism)
      p -> r
(ii)  pq
      --
      p
(iii) p
      -----   (Addition)
      p + q
(iv)  p
      q
      --   (Conjunction)
      pq
(v)   p -> q
      r -> s
      --------
      pr -> qs
(vi)  p -> r
      q -> s
      --------------
      p + q -> r + s
4.15 Prove that the rules in the Business Problem are mutually inconsistent
by drawing 0 as a valid conclusion from them. (Hint: Use the
various arguments in the last exercise.)
4.16 Can the inconsistency of the data in Exercise (1.12) be proved
using solely implication statements?
4.17 Prove that the following argument is invalid:
Premises: If it rains missiles are costly.
Missiles are costly.
Conclusion: It rains.
Can this argument be called invalid because the first premise is
absurd, there being no correlation between rain and the cost of
missiles?
4.18 If in the last exercise we keep the premises as they are but change
the conclusion to 'It does not rain', prove that the resulting argument
is still invalid. Why does this not contradict the bivaluedness
of logic because of which one of the two statements, 'It rains' and
'It does not rain' must be true?
Notes and Guide to Literature
This section, like the fourth section of Chapter 1, overlaps with the
first chapter of Joshi [1]. For a more formal treatment of predicates see,
for example, Tremblay and Manohar [1]. The branch of logic dealing with
statements (or propositions) and their Boolean algebra is called propositional
calculus. On the other hand, predicate calculus deals with the formation
of statements in a language.
Five
Group Theory
A group structure on a set is probably the most important algebraic
structure that can be induced on it with a single binary operation.
The theory of groups is rich both in terms of the profound
results in it and in terms of its applications to combinatorics, physics
and many other fields. In an elementary book like this, we have
to be content with only a few glimpses of the theory. In this chapter we
present some of the basic results about groups in the first three sections.
The last section deals with a special type of groups, called the permutation
groups. The properties proved here about them will be applied in the later
chapters. Other applications of groups will be given in the chapter on
group actions (see the Epilogue).
1. Groups and Subgroups
Suppose S is a set. We often have to consider functions of the set S into
itself. These functions are often called operations, transformations or some
other names depending upon the context. Let X denote the set of all func-
tions of S into itself. We are rarely interested in the whole set X. The types
of functions that are important will of course depend upon the context. In
mechanics we are interested in what are called rigid body motions, that is,
those transformations which preserve the distance between various points
(also called isometries). In the study of finite state machines, the set S is
often called the state space. Each member of S represents some configura-
tion in which a particular system is at a given time. The transformations
in this case are the results of giving some instructions to the machine, by
which one state is converted into another. As yet another example, which
has acquired recent popularity, take a Rubik’s cube. Let S be the set of
various patterns of squares on the six faces. Then every sequence of motions
of the Rubik's cube gives a function from S to S. (The object of the puzzle
is to find a sequence of such motions which will convert a given pattern
to a standard pattern in which all the squares on each face are of the
same colour).
So let S be the set and let G be the set of those functions of S into itself
in which we are interested. Then G ⊆ X, the set of all functions from S
into itself. The property of G that is most commonly true is that it is closed
under composition, that is, if two functions f : S -> S and g : S -> S are in
G then so is their composite f o g : S -> S. If G is not closed under
composition, then we usually enlarge it to a subset H of X which is closed under
composition. Another property which is often true of transformations is
that they are invertible and their inverses are also of the permitted type.
In symbols, if f ∈ G then f^{-1} exists and belongs to G. For example, in the
case of a Rubik's cube, if a particular sequence of motions changes one
pattern of squares into another we can get back the original pattern by
applying, in the reverse order, the inverse motions. Note that if G is a
non-empty subset of X which is closed under composition and under inversion
then the identity function 1_S is also in G, because we can take any f ∈ G and
write 1_S = f o f^{-1}.
Sets of functions which are closed under composition and inversion are
called groups. The origin of the term is obscure. Possibly, it lies in the fact
that functions grouped together so that their composites and inverses are
also included have certain nice properties 'as a group' which an arbitrary
set of functions assembled together does not have in general. (We shall
study such properties later.) At least historically, the term group was used
to mean a group of functions (or a group of substitutions as it was called).
As at many other places, abstraction set in. The set of functions was re-
placed by an abstract set G and the composition of functions was replaced
by an abstract binary operation on G, satisfying certain conditions. Such
abstraction is definitely worthwhile because there are many 'naturally
occurring' examples of groups other than groups of functions. Nevertheless,
the old ties with the groups of functions are important for two reasons.
First, in applications, it is these groups that will be important. Secondly,
we shall prove in Section 3 that every 'abstract' group is isomorphic to a
group of functions. Although this result will not be of much direct use, it
will show that, up to isomorphism, the groups of functions are the only
possible groups.
Let us now turn to the definition of an abstract group. As noted before
it is modelled after groups of functions. So we take an abstract set G, a
binary operation · on G and we want to assume as axioms certain properties
of · which are true of composition of functions. The foremost such
property is associativity. Without associativity, it would not be possible to
manipulate algebraic expressions involving · (see Chapter 3, Section 4).
Associativity by itself is not a very strong condition. Still there is a very
large number of examples where it holds and so there is a name for the
algebraic structure where it is the only axiom.
1.1 Definition: A semigroup is an ordered pair of the form (G, ·) where
G is a set and · is an associative binary operation on G.
The set of all functions from a set into itself is a semigroup under the
operation of composition. There are of course many other examples. The
reader should consult the list of examples of binary operations given in
Chapter 3, Section 4. As usual, there are three ways to construct new
semigroups from old ones. One is to take a sub-semigroup of a semigroup,
that is, a subset closed under the given binary operation. Another is to
take the cartesian product of two or more semigroups and define the binary
operation coordinatewise. The third method is to take the set of functions
from a set into a given semigroup and define the binary operation pointwise.
Note that we have not required the presence of an identity element
in the definition of a semigroup. If a two-sided identity exists (which would
then be unique by Proposition (3.4.5)), then the semigroup is called a monoid
as already defined after Definition (3.4.13). The set of all functions of a
set S into itself is actually a monoid. Under the operation of usual addition,
the set of non-negative integers is a monoid, while the set of positive integers
is a semigroup but not a monoid. An especially interesting example
of a monoid is obtained by taking the set of all words in an alphabet S (see
again the list of examples in Chapter 3, Section 4) and defining the binary
operation by concatenation. The empty word (that is, a sequence of length
0) plays the role of the identity element. This example is quite a general one in
that the choice of the alphabet is arbitrary. By choosing a suitable alphabet
we can get a monoid which will be important in a particular context. For
example, if we let S be the set of all elementary operations which a machine
can do then the monoid of all words in S consists of all series of operations
which the machine can do in a finite amount of time.
Thus we see that the theory of monoids would have many applications.
Unfortunately, the structure of a monoid is just not strong enough to
yield profound results. It turns out that the addition of one more axiom,
namely the existence of an inverse for each element, improves the situation
dramatically. The inevitable price to be paid is, of course, the sacrifice of
those monoids where this condition is not met. For example, in the monoid
of words in an alphabet considered above, no element other than the
identity element (that is, the empty word) is invertible. It turns out that
even after the exclusion of such examples, the list where the theory would
be applicable is still impressively large. As a matter of fact, even if we add
one more axiom, namely that the operation be commutative, we still have
a sizable number of cases where the theory would apply. We therefore
make the following two definitions.
1.2 Definition: A group is an ordered pair of the form (G, ·) where G is a
non-empty set and · is a binary operation on G which satisfies the following
conditions:
(i) · is associative,
(ii) · has a two-sided identity element and
(iii) every element of G is invertible.
If, further, · is commutative, then the group (G, ·) is said to be abelian (or
commutative). Otherwise it is called non-abelian.
Before we give examples, a comment is in order regarding the
notation. As usual, when the group operation is denoted by · it is often
suppressed from notation. Thus we write xy for x·y. For abelian groups,
the group operation is also often denoted by + and in that case the group
is said to be additively written. As with Boolean algebras, there is in
general no harm in using the same symbol to denote the group operations
of two different groups so long as this is not likely to cause any confusion.
A confusion would indeed arise if the underlying sets of the two groups
are the same. When the group operation is denoted by + (which is almost
never done for non-abelian groups), the identity element is generally
denoted by 0. In all other cases it is customary to denote it by e or by
1; we shall follow the first notation for 'abstract' groups. The inverse of
an element x (which is unique by Proposition (3.4.7)) is denoted by x^{-1}
in general and by −x in case of an abelian group which is additively
written. Following the comments preceding Proposition (3.4.8), if x is
a group element then the power x^n is well-defined for all integers n (positive,
zero or negative) and the laws of indices hold. If the group is abelian
and additively written, then x^n is denoted by nx and is called a multiple
rather than a power of x. Unless the group operation has some other
suitable name in a particular context, we shall call it 'multiplication' and
denote it by ·.
Now, as for examples of groups, the foremost example is the set G of
all bijections of a set X. If for f, g ∈ G we define f·g as the composite
f o g then (G, ·) is a group. This group is called the permutation group of
the set X, and may be denoted by S(X). If X is a finite set with n elements
then S(X) has n! elements. For n = 0, 1 and 2 this group is abelian.
For all other n it is non-abelian. To see this suppose X has at least three
distinct elements, say x, y and z. Define f : X -> X by f(x) = y, f(y) = x
and f(s) = s for all s ∈ X, s ≠ x, y. In words, f interchanges x and y and
leaves every other element of X fixed. Similarly, let g : X -> X be g(y) = z,
g(z) = y and g(s) = s for all s ∈ X − {y, z}. Then the composites g o f
and f o g are unequal because g o f(x) = g(y) = z while f o g(x) = f(x) = y.
Note here that f o f = g o g = 1_X = identity element of S(X). In other
words f and g are their own inverses. If we define h : X -> X by h(x) = y,
h(y) = z, h(z) = x and h(s) = s for all s ∈ X − {x, y, z} then h o h o h = 1_X.
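The computation with f, g and h can be replayed mechanically. The following is a minimal sketch, assuming a four-element set X = {x, y, z, w} (the extra element w is an illustrative choice) and representing each bijection as a Python dictionary; the helper `compose` is hypothetical.

```python
from itertools import permutations
from math import factorial

def compose(g, f):
    """Return the composite g o f, that is, the map s |-> g(f(s))."""
    return {s: g[f[s]] for s in f}

X = ["x", "y", "z", "w"]
identity = {s: s for s in X}

f = {**identity, "x": "y", "y": "x"}            # interchanges x and y
g = {**identity, "y": "z", "z": "y"}            # interchanges y and z
h = {**identity, "x": "y", "y": "z", "z": "x"}  # cycles x -> y -> z -> x

gf = compose(g, f)
fg = compose(f, g)
print(gf["x"], fg["x"])               # the two composites disagree at x
print(compose(f, f) == identity)      # f is its own inverse
print(compose(h, compose(h, h)) == identity)

# S(X) has n! elements when X has n elements.
number_of_bijections = sum(1 for _ in permutations(X))
print(number_of_bijections == factorial(len(X)))
```

Dictionaries are used here only because they make g(f(s)) read like the formula; any representation of a bijection would serve equally well.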
As examples of abelian groups one can take the groups of integers,
rational numbers, real numbers or complex numbers under usual addition.
The set of rational numbers is not a group under usual multiplication
because the element 0 has no inverse. However, the set of all non-zero
rational numbers is a group, in fact an abelian group under usual multi-
plication. The same holds for the set of non-zero real numbers and the
set of non-zero complex numbers.
Another important class of groups is provided by the sets of residue
classes modulo some positive integer. If m is a positive integer then in
Chapter 2, Section 2, we defined Z_m as the set of all equivalence classes of
Z (the set of all integers) under the equivalence relation of congruence
modulo m. In Chapter 3, Section 4 we saw that there are two well-defined
binary operations + and · on Z_m, namely, the addition and multiplication
of residue classes modulo m. Also both these operations are commutative
and associative. We claim (Z_m, +) is a group. Clearly the residue class
[0] is an identity for +. Also if [n] ∈ Z_m then [−n] is the additive inverse
of [n].
Under multiplication of residue classes, Z_m is not a group (except in
the trivial case m = 1), because the residue class [0] cannot have an inverse.
It is tempting to exclude [0] and ask whether Z_m − {[0]} is a group under
modulo m multiplication. Here the difficulty is that in general Z_m − {[0]}
is not closed under multiplication. For example, let m = 10. Then neither
[5] nor [2] equals the residue class [0] but [5]·[2] = [10] = [0] modulo 10.
So we do not have a well-defined binary operation on Z_10 − {[0]}. Notice
here that 10 is not a prime. If m is a prime then things are better as we
now show.
1.3 Proposition: Let m be a positive integer greater than 1. Then the
set of non-zero residue classes modulo m is closed under residue
multiplication modulo m if and only if m is a prime.
Proof: The direct implication is easy. If m is not a prime then m = a·b
for some 0 < a < m and 0 < b < m. Then [a] and [b] are both non-zero
in Z_m but [a][b] = [a·b] = [m] = [0]. So Z_m − {[0]} is not closed under
mod m multiplication.
The proof of the converse requires an important property of prime
numbers, namely if p is a prime and a, b are two integers such that p | ab
then either p | a or p | b. Since a more general result will be proved in
the next chapter, for the moment we take this fact for granted. Now
suppose m is a prime. Let [a], [b] be non-zero elements of Z_m. We have
to show [ab] is also a non-zero residue class. If not then ab ≡ 0 (modulo m)
which means m divides ab. But then, by the fact just mentioned, either
m | a or m | b. In the first case, [a] = [0] while, in the second, [b] = [0].
In either case we have a contradiction. Thus [a·b] ≠ [0], or in other
words, the set Z_m − {[0]} is closed under multiplication modulo m. ∎
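Proposition 1.3 is easy to test numerically for small moduli. The sketch below is only a sanity check, not a proof; the function names `nonzero_classes_closed` and `is_prime` are hypothetical, and residue classes are represented by the integers 0, ..., m − 1.

```python
def nonzero_classes_closed(m):
    """True if the product of any two non-zero residues mod m is non-zero."""
    return all((a * b) % m != 0 for a in range(1, m) for b in range(1, m))

def is_prime(m):
    """Trial-division primality test, adequate for small m."""
    return m > 1 and all(m % d != 0 for d in range(2, int(m ** 0.5) + 1))

# Pairs of non-zero classes mod 10 whose product is [0], such as [5] and [2].
zero_divisor_pairs_mod_10 = [(a, b) for a in range(1, 10) for b in range(1, 10)
                             if (a * b) % 10 == 0]

# Closure holds exactly for the prime moduli in the tested range.
agreement = all(nonzero_classes_closed(m) == is_prime(m) for m in range(2, 60))
print(zero_divisor_pairs_mod_10)
print(agreement)
```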
Because of this proposition, whenever m is a prime, there is a well-defined
binary operation of multiplication on the set of non-zero residue
classes modulo m. This operation is already known to be associative.
Also the residue class [1] is the identity element. Thus Z_m − {[0]} is a
monoid. To prove that it is in fact a group we use the following result
which is of independent interest.
1.4 Proposition: In any group both the cancellation laws hold. Conversely
any finite semigroup in which both the cancellation laws hold is a group.
Proof: The first assertion follows from the properties of inverses proved
in Proposition (3.4.7). The second assertion is just a rewording of Theorem
(3.4.10). ∎
Now to apply this result to Z_m − {[0]} when m is a prime, we simply
have to show that the cancellation laws hold in it. Because of commutativity,
only one cancellation law needs to be proved. So let [x], [y]
and [z] ∈ Z_m − {[0]} and suppose [x][z] = [y][z]. Then xz − yz ≡ 0
(mod m). Hence m divides the product (x − y)z. Since m is a prime,
either m | (x − y) or m | z (by the property of primes mentioned above).
The latter possibility means [z] = [0], contradicting the hypothesis. So m
divides x − y which means [x] = [y]. Thus cancellation holds in Z_m − {[0]}.
So it is a group. Note that it has m − 1 elements (all but one of the residue
classes modulo m). To record our work we introduce a terminology.
Let (G, ·) be a group. If the set G is finite then the group G is said to be
of finite order and |G| is called the order of G and is often denoted by
o(G). (If G is not finite, the group is said to be of infinite order.) With
this terminology, our result is as follows:
1.5 Theorem: For every positive integer m, Z_m is an abelian group of
order m under addition modulo m. If m is a prime, then Z_m − {[0]} is an
abelian group of order m − 1 under multiplication modulo m. ∎
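For a small modulus the statement of Theorem 1.5 can be checked by brute force against the group axioms. The following sketch assumes residue classes are represented by the integers 0, ..., m − 1; the checker `is_abelian_group` is a hypothetical helper written for this illustration.

```python
from itertools import product

def is_abelian_group(elements, op):
    """Brute-force check of closure, associativity, commutativity,
    existence of an identity and existence of inverses."""
    elems = set(elements)
    if not elems:
        return False
    if not all(op(a, b) in elems for a, b in product(elems, repeat=2)):
        return False  # not closed, so op is not a binary operation on elems
    assoc = all(op(op(a, b), c) == op(a, op(b, c))
                for a, b, c in product(elems, repeat=3))
    commutative = all(op(a, b) == op(b, a) for a, b in product(elems, repeat=2))
    identities = [e for e in elems if all(op(e, a) == a for a in elems)]
    if not (assoc and commutative and identities):
        return False
    e = identities[0]
    # By commutativity a one-sided inverse is automatically two-sided.
    return all(any(op(a, b) == e for b in elems) for a in elems)

m = 7  # a prime modulus chosen for illustration
additive = is_abelian_group(range(m), lambda a, b: (a + b) % m)
multiplicative = is_abelian_group(range(1, m), lambda a, b: (a * b) % m)
composite_case = is_abelian_group(range(1, 10), lambda a, b: (a * b) % 10)
print(additive, multiplicative, composite_case)
```

The last call uses the composite modulus 10 and fails already at the closure test, in line with Proposition 1.3.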
Further examples of groups can be obtained by taking cartesian products
of groups, sets of functions from any set into a group and by taking
subgroups. The last concept is especially interesting because various
subgroups of the group of permutations of a set are important in various
fields of study. Moreover, we shall show in Section 3 that every abstract
group is isomorphic to a subgroup of a group of permutations of some
set. Let us therefore study the concept of a subgroup carefully. Although
it is only a special case of the general concept of a substructure of an
algebraic structure defined in Definition (3.4.14), in view of its special
importance we restate it. A characterisation of it will be given in
Exercise (1.6).
1.6 Definition: Let (G, ·) be a group. Then a non-empty subset H of
G is called a subgroup of G if H is closed under · and inversion, that is, for
all x, y ∈ H we have (i) x·y ∈ H and (ii) x^{-1} ∈ H.
Trivially, if e is the identity element of G then {e} is a subgroup of G.
It is called the trivial subgroup*. G is also a subgroup of itself. It should
be noted that the conditions (i) and (ii) above are independent of each
other. For example, let Z be the group of all integers under the usual
addition, + . Then the set of all positive integers satisfies (i) but not
(ii) while the set of all odd integers satisfies (ii) but not (i). The set of
all even integers satisfies both (i) and (ii) and hence is a subgroup of Z.
More generally it is easy to show that for any positive integer m, the set
of all multiples of m, that is, the set {mx : x ∈ Z} is a subgroup of Z. It
is denoted by mZ. It is interesting to note that these are the only non-
trivial subgroups of Z, as we now show.
1.7 Proposition: The only subgroups of Z are {0} and those of the form
mZ for some positive integer m.
Proof: Let H be a subgroup of Z. If H ≠ {0}, then H contains some
non-zero integer x. Then −x is also in H. Since one of x and −x is positive,
H contains at least one positive integer. Let m be the smallest positive
integer in H. (The existence of such m follows from the fact that the set of
positive integers is well ordered, see Exercise 3.3.21.) We claim H = mZ.
First, to show mZ ⊆ H, we consider any multiple mx of m. If x is 0, then
mx = 0 ∈ H. If x is a positive integer we can show by induction on x that
mx ∈ H. (Clearly m·1 ∈ H and if m·(x − 1) ∈ H then mx = m·(x − 1) + m ∈
H.) If x is a negative integer then −x is positive. We write m·x = (−m)·(−x)
which belongs to H since −m is already in H by condition (ii) above.
In all cases m·x ∈ H, that is mZ ⊆ H. For the converse, we need the
euclidean algorithm mentioned in the solution to Problem 2.2.10. Let
h ∈ H. Write h = mp + q for some integers p and q with 0 ≤ q ≤ m − 1.
Then
q = h − mp = h + [m·(−p)] ∈ H
since h ∈ H and m·(−p) ∈ H. Now if q = 0, then h = mp and so
h ∈ mZ. If q ≠ 0, then q is a positive integer in H and q < m. But this
contradicts the very definition of m as the least positive integer in H. So
q = 0 must hold and hence h = mp ∈ mZ. Thus we have shown that
H ⊆ mZ. This proves that H, unless it is the trivial subgroup {0}, is of
the form mZ for some positive integer m. ∎
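The argument of the proof can be watched at work on a concrete subgroup. As an illustration (the example is an assumption of this sketch, not taken from the text), let H be the set of all integers of the form 12x + 18y; it is closed under addition and negation, hence a subgroup of Z. The code samples a finite window of H, finds its least positive element m and checks that every sampled element leaves remainder 0 on division by m, as the proof predicts.

```python
from math import gcd

def window_of_subgroup(a, b, bound=30):
    """A finite sample of H = {a*x + b*y : x, y integers},
    taking |x| <= bound and |y| <= bound."""
    return {a * x + b * y
            for x in range(-bound, bound + 1)
            for y in range(-bound, bound + 1)}

sample = window_of_subgroup(12, 18)

# Least positive element of the sample, playing the role of m in the proof.
m = min(h for h in sample if h > 0)

# Division with remainder h = m*p + q with 0 <= q <= m - 1, as in the proof.
remainders = {h % m for h in sample}
print(m, remainders)
print(m == gcd(12, 18))
```

Here m turns out to be the greatest common divisor of 12 and 18, which is the usual description of the generator of such a subgroup.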
The last result is exceptional in that in general, it is far from easy to
characterise all possible subgroups of a given group. (The reason we could
do it for Z is that it is a cyclic group, a concept we shall define shortly).
For example let R be the additive group of real numbers. Then R has as
subgroups Z, Q (the set of all rational numbers), the set Z + √2 Z defined
as the set of all numbers of the form m + √2 n with m, n ∈ Z, and many other
subgroups. All these are also subgroups of C, the group of all complex
*A group with only one element in it is called a trivial group.
numbers under addition. Let C* denote the group of non-zero complex
numbers under multiplication. Let
S^1 = {x + iy ∈ C* : x^2 + y^2 = 1}.
Geometrically, S^1 consists of all complex numbers which lie on a circle
of radius 1 centered at the origin. It is easy to show, using elementary
properties of absolute values of complex numbers, that S^1 is a subgroup of
C*. When regarded as a group by itself S^1 is called the circle group. This
group plays a crucial role in the study of certain other abelian groups.
Note that it is abelian, since every subgroup of an abelian group is
abelian.
Let us now take the permutation group S(X) of a set X. It often happens
that X carries some additional structure. Then instead of taking all bijections
of X into itself, we take only those which preserve this additional structure.
In most cases, such bijections constitute a subgroup of S(X). This subgroup,
when regarded as a group in its own right, is an important tool in
the study of that additional structure because it depends on that structure.
For example there may be some partial order ≤ on the set X. Then the
set of those bijections f : X -> X which are order preserving (that is, for
all x, y ∈ X, x ≤ y iff f(x) ≤ f(y)) is a subgroup of S(X).
An especially interesting example of a subgroup so constructed is obtained
by taking X to be a set of points in a plane or in a space. The additional
structure is that given by the euclidean distance. We take only such
permutations of X which preserve this distance, that is, bijections f : X -> X
which have the property that for all x, y ∈ X,
d(x, y) = d(f(x), f(y)),
where d denotes the euclidean distance between two points. In Chapter 3,
Section 2 we called such a bijection an isometry. Since the composite of
two isometries is an isometry and the inverse of an isometry is an isometry,
it follows that the set of all isometries of X is a subgroup of S(X). It is
called the group of isometries or the group of symmetries of X. To justify
the second name we note that the larger this group, the greater is the
degree of symmetry in the figure X. For example, suppose X consists of
the three vertices, A, B and C of a triangle. If no two sides of the triangle
ABC are equal, then there can be only one isometry from X into itself,
namely, the identity function 1_X. Suppose, however, that the triangle ABC
is isosceles with the sides AB, AC having equal length. Then the triangle
is symmetric about the altitude through A. If we define f : X -> X by
f(A) = A, f(B) = C and f(C) = B
then f is an isometry. So here the group of symmetries of X has two elements,
f and 1_X. If further, ABC is an equilateral triangle then the group
of symmetries of X is the entire permutation group S(X) of order 6. All the
six elements in this group have geometric interpretations. There are three
elements which represent rotations around the centre through angles of
0°, 120° and 240° (a rotation through angle 0° is of course the identity
function). The three other isometries correspond to the reflections through
the three altitudes of the triangle. All these six elements along with their
geometric representations are shown in Figure 5.1, where (a) represents the
identity function or the original state of the triangle. The function represented
by any other triangle is obtained by sending every labelled vertex
to the vertex at the corresponding position in (a). For example the triangle
in (b) represents the function which takes A to C, B to A and C to B.
[Figure: six labelled triangles, panels (a) to (f)]
Figure 5.1: Group of Symmetries of an Equilateral Triangle
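The three cases of the triangle (no two sides equal, isosceles, equilateral) can be checked by brute force over all bijections of the vertex set. The sketch below assumes particular coordinates for the three triangles, chosen only for illustration, and uses a hypothetical helper `symmetry_count` that counts the distance-preserving bijections.

```python
from itertools import permutations
from math import dist, isclose, sqrt

def symmetry_count(points):
    """Count the bijections of the vertex set that preserve all
    pairwise euclidean distances."""
    names = list(points)
    count = 0
    for image in permutations(names):
        f = dict(zip(names, image))
        if all(isclose(dist(points[p], points[q]),
                       dist(points[f[p]], points[f[q]]))
               for p in names for q in names):
            count += 1
    return count

# Illustrative coordinates (assumptions of this sketch).
scalene = {"A": (0, 0), "B": (4, 0), "C": (1, 2)}            # no two sides equal
isosceles = {"A": (0, 3), "B": (-1, 0), "C": (1, 0)}         # AB = AC
equilateral = {"A": (0, sqrt(3)), "B": (-1, 0), "C": (1, 0)}  # all sides equal

counts = (symmetry_count(scalene),
          symmetry_count(isosceles),
          symmetry_count(equilateral))
print(counts)
```

The tolerance-based comparison `isclose` is used because the equilateral coordinates involve a square root and are therefore only approximately represented in floating point.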
The groups of symmetries are very important in problems of placement
where we do not want to distinguish between placements which are essentially
the same, i.e. those which can be obtained from each other by applying some isometry.
We shall compute many such groups when we shall study group actions
(see the Epilogue). For the moment let us compute the group of symmetries
of a regular pentagon. Let us name its vertices clockwise as $v_1, v_2, v_3, v_4$
and $v_5$. Let C be its centre. Let $L_i$ be the line joining $v_i$ to C for $i = 1, \ldots, 5$.
Let G be the group of all symmetries of the pentagon. Then G contains
rotations through angles which are multiples of 72° and also reflections
(or 'flips') through the lines $L_1, \ldots, L_5$. To see that these are the only
possible isometries, suppose $f : \{v_1, \ldots, v_5\} \to \{v_1, \ldots, v_5\}$ is an isometry. Then
$f(v_1)$ can be any of the five vertices $v_1$ to $v_5$. However, the moment $f(v_1)$ is
fixed as $v_i$ (say) then $f(v_2)$ must be one of the two 'neighbouring' vertices
of $v_i$. As soon as $f(v_1)$ and $f(v_2)$ are fixed, we have no choice in defining
$f(v_3)$, $f(v_4)$, $f(v_5)$ because of the requirement that all the distances be
preserved. Hence there can be only 10 distinct isometries at the most.
Since we already have listed 10 isometries, we have exhausted the group G.
Thus G is a group of order 10. To study its structure further, let $r_i$ denote
clockwise rotation through an angle of $72i$ degrees for $i = 1, 2, 3, 4, 5$;
and let $f_i$ denote the reflection through the line $L_i$. For example, as
functions from $\{v_1, \ldots, v_5\}$ to itself, $r_2$ is given by
$r_2(v_1) = v_3$, $r_2(v_2) = v_4$, $r_2(v_3) = v_5$, $r_2(v_4) = v_1$ and $r_2(v_5) = v_2$,
while $f_4$ is given by $f_4(v_1) = v_2$, $f_4(v_2) = v_1$, $f_4(v_3) = v_5$, $f_4(v_4) = v_4$ and $f_4(v_5) = v_3$.
By actual computation, we see that the composite $r_2 \circ f_4$ equals $f_5$ while the
composite $f_4 \circ r_2$ equals $f_3$. (The reader is also urged to verify these by actually
doing physical experiments with a labelled pentagon.) Thus the group is
non-abelian. More generally, it can be shown that for all $n \geq 3$, the group
of symmetries of a regular n-gon is a non-abelian group of order $2n$. It is
called the dihedral group of order $2n$ (or sometimes of order n) and is
denoted by $D_n$.
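The counting argument and the two composites can be replayed mechanically. The sketch below is an illustration under stated assumptions rather than the book's own computation: the vertices $v_1, \ldots, v_5$ are stored as 0..4 in clockwise order, a composite `compose(p, q)` applies q first, and the helper names `rot`, `flip` and `is_isometry` are hypothetical.

```python
from itertools import permutations

N = 5  # vertices v_1..v_5 are stored as 0..4, listed clockwise

def is_isometry(f):
    # On a regular pentagon the distance between two vertices depends only on
    # how many edges separate them along the shorter arc, so it is enough to
    # compare that cyclic separation before and after applying f.
    def gap(a, b):
        d = (a - b) % N
        return min(d, N - d)
    return all(gap(f[a], f[b]) == gap(a, b) for a in range(N) for b in range(N))

G = [f for f in permutations(range(N)) if is_isometry(f)]

def rot(i):
    """r_i: clockwise rotation through 72*i degrees, sending v_k to v_(k+i)."""
    return tuple((k + i) % N for k in range(N))

def flip(i):
    """f_i: reflection in the line L_i through the vertex v_i (i = 1..5)."""
    c = i - 1
    return tuple((2 * c - k) % N for k in range(N))

def compose(p, q):
    """(p o q)(v) = p(q(v)); the map q is applied first (an assumed convention)."""
    return tuple(p[q[k]] for k in range(N))

print(len(G))
print(compose(rot(2), flip(4)) == flip(5), compose(flip(4), rot(2)) == flip(3))
```

With this convention the brute-force search finds exactly the five rotations and five flips, and the two composites of $r_2$ and $f_4$ differ, which is the non-commutativity noted above.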
For applications to organic chemistry, the group of isometries of a
regular tetrahedron is very important because in the tetrahedral model of
a carbon atom, a carbon atom is situated at the centre of a regular tetra-
hedron and is attached, by covalent bonds, to four radicals situated at the
vertices. All the 24 permutations of the four vertices of a regular tetra-
hedron are isometries. But, for figures in space the question of orientation
is important because we can distinguish between an object and its mirror
image (that is, reflection through some plane), as the two have opposite
orientations. This is shown in Figure 5.2 where the regular tetrahedron
Figure 5.2: Reflection in a Plane Reverses Orientation
ABCD in (a), after undergoing reflection in the plane containing A, B, and
M (the midpoint of CD) looks as in (b). The corresponding function
$f : \{A, B, C, D\} \to \{A, B, C, D\}$ is given by $f(A) = A$, $f(B) = B$, $f(C) = D$,
$f(D) = C$. Clearly f is an isometry. But it is not orientation preserving.
To see why it is so, suppose we traverse the boundary of the face BCD in
the sense B - C - D - B so that the interior of the face is on our left.
Then the fourth vertex A will be towards the head in (a). But if we
traverse the face BCD in (b) in the same manner then A will be towards
the opposite side. (The same idea can be expressed in terms of the right
handed and the left handed screw, or, more analytically, in terms of the
cross product of certain vectors.) Clearly the composite of two isometries
is orientation preserving if both of them are orientation preserving or
both are orientation reversing. When one of the two isometries preserves
and the other reverses the orientation. their composite is orientation
reversing. So the orientation preserving isometries form a subgroup of the
group of all symmetries. Generally, for figures in space, by the group of
their symmetries we mean the group of all orientation preserving symmetries.
For plane figures the question of orientation does not arise because an
orientation reversing transformation corresponds to a reflection in some
line L in the plane. But we can think of it in space as a rotation through
180° around the axis L. Hence, as a transformation in space it is
orientation preserving. Physically this amounts to realising a reflection through a
flipping of the figure. However, where flipping is not allowed, the question
of orientation does matter even for a plane figure. For example, in
the case of a regular pentagon which is laid on a table and is not to be
lifted from it, the group of isometries would consist only of the five
rotations.
Besides the groups discussed so far, there are two other groups which
are important in applications to physics. One of them is called the group
of quaternions and is denoted often by Q. It has eight elements denoted by
$1, i, j, k, -1, -i, -j$ and $-k$. The full table of multiplication is given
in Figure 5.3. But there is a very easy way to remember it. As far as the
six elements $\pm i, \pm j, \pm k$ are concerned, the multiplication is precisely
the cross product for the three coordinate vectors i, j, k and their
negatives (except for $i^2 = -1$ etc.). The element 1 is the identity. Multiplication
by $-1$ reverses the sign. The verification of the associativity is fairly
tedious. As for inverses, 1 and $-1$ are their own inverses while the inverse
of every other element is its negative. This group is non-abelian, because
$ij = k$ but $ji = -k$.
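Since the verification of associativity is tedious by hand, it is a natural candidate for a machine check. The following sketch encodes the rule of thumb described above; the representation of an element as a pair (sign, letter) and the names `qmul` and `neg` are assumptions made for illustration only.

```python
from itertools import product

def qmul(x, y):
    """Multiply two elements of Q, each written as a pair (sign, letter)."""
    (s, a), (t, b) = x, y
    if a == "1":
        return (s * t, b)
    if b == "1":
        return (s * t, a)
    if a == b:                                  # i*i = j*j = k*k = -1
        return (-s * t, "1")
    cyc = "ijk"
    ia, ib = cyc.index(a), cyc.index(b)
    third = cyc[3 - ia - ib]                    # the remaining letter
    sign = 1 if (ib - ia) % 3 == 1 else -1      # cyclic order i, j, k, i gives +
    return (s * t * sign, third)

def neg(x):
    return (-x[0], x[1])

Q = [(s, u) for u in "1ijk" for s in (1, -1)]
ONE, I, J, K = (1, "1"), (1, "i"), (1, "j"), (1, "k")

associative = all(qmul(qmul(x, y), z) == qmul(x, qmul(y, z))
                  for x, y, z in product(Q, repeat=3))
print(associative, qmul(I, J) == K, qmul(J, I) == neg(K))
```

The check runs over all 512 triples. The same function can be used to confirm that 1 and $-1$ are their own inverses while every other element is inverted by its negative.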
The other important group is variously known as the Klein group, the
axial group, the quadratic group or the four group (because it has four
elements). Let us denote it by K. It has four elements e, a, b, c and the
group operation is defined so as to make e the identity element. As for the
remaining elements, we let $a \cdot a = b \cdot b = c \cdot c = e$ and $a \cdot b = b \cdot a = c$,
$b \cdot c = c \cdot b = a$, $c \cdot a = a \cdot c = b$. It is an abelian group of order 4. The reader
will notice a certain relationship between Q, the quaternion group and K,
the Klein group because of the cyclical manner in which the multiplication
is defined for three elements. Indeed we may say that if we ignore the
minus sign in the quaternion group then we get precisely the Klein group.
Formally we identify 1 and $-1$ to a single element and call it e. Similarly
we let a, b, c correspond to $\pm i$, $\pm j$, $\pm k$ respectively. Then we get K from
Q. This is a special case of what is known as the construction of a quotient
group and will be studied in Section 2.
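(The entry in the row of x and the column of y is the product xy.)

        |   1    -1     i    -i     j    -j     k    -k
   -----+-----------------------------------------------
     1  |   1    -1     i    -i     j    -j     k    -k
    -1  |  -1     1    -i     i    -j     j    -k     k
     i  |   i    -i    -1     1     k    -k    -j     j
    -i  |  -i     i     1    -1    -k     k     j    -j
     j  |   j    -j    -k     k    -1     1     i    -i
    -j  |  -j     j     k    -k     1    -1    -i     i
     k  |   k    -k     j    -j    -i     i    -1     1
    -k  |  -k     k    -j     j     i    -i     1    -1

Figure 5.3: Multiplication Table for Quaternions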
It is clear that the intersection of two (and in fact, any number of)
subgroups of a group G is again a subgroup of G. However, the union of
two subgroups need not be a subgroup. For example in the quaternion
group just defined, let $H_1 = \{\pm 1, \pm i\}$ and $H_2 = \{\pm 1, \pm j\}$. Then
$H_1 \cap H_2 = \{\pm 1\}$ which is a subgroup of Q. But $H_1 \cup H_2 = \{\pm 1, \pm i, \pm j\}$ is
not a subgroup because it is not closed under multiplication; for example,
$ij = k$ does not belong to it.
We often have a group G and a subset S of G. We then want to consider
the smallest subgroup of G containing S. It is called the subgroup generated
by S, and we denote it by G(S). Of course, if S itself is a subgroup of G
then G(S) = S. Clearly, if S is the empty set, then G(S) is the trivial
subgroup $\{e\}$ of G. More generally, if S is a subset of G then we let $\mathcal{H}$ be the
collection of all subgroups of G which contain S. In symbols, $\mathcal{H} = \{H : S \subset H$
and H is a subgroup of $G\}$. Then $\mathcal{H}$ is non-empty since the whole group G
is a member of $\mathcal{H}$. If G is a finite group then obviously $\mathcal{H}$ will have only
finitely many elements, say, $H_1, H_2, \ldots, H_n$ (one of which will be G). We
now let $H = \bigcap_{i=1}^{n} H_i$. Then H is a subgroup of G. Also $S \subset H_i$ for all $i = 1, \ldots, n$
and hence $S \subset H$. So H is a subgroup of G containing S. To show that it is
the smallest such subgroup, suppose K is any other subgroup of G containing
S. Then $K = H_i$ for some i. But $H \subset H_i$ for all $i = 1, \ldots, n$. So $H \subset K$.
Therefore H = G(S).
Thus the existence of the subgroup generated by a subset has been esta-
blished at least for the case where the group is finite. The same argument
actually applies even for infinite groups. The collection $\mathcal{H}$ considered above
is possibly infinite. But we can take intersections even of infinite families
of sets. However, we do not want to go into it, because anyway the
construction given above for G(S) is only of theoretical importance. If we want
to apply it for a concrete set S, the difficulty starts right from deciding
which subgroups of G contain S. Moreover, most of the work done in
finding these subgroups is superfluous since we want only the smallest
among them. We therefore look for some intrinsic construction for G(S),
that is, some construction in which elements of G(S) are expressed
concretely in terms of S. The situation is analogous to the problem of finding
the equivalence relation generated by a given relation on a set (see
Proposition 3.2.15 and Exercise 3.2.11). In the proposition below we give
this ‘internal' description of G(S).
1.8 Proposition: Let $(G, \cdot)$ be a group and let S be a nonempty subset of
G. Let $H = \{x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r} : x_i \in S,\ n_i \in Z,\ r > 0\}$. Then H is a subgroup
of G and it is the subgroup generated by S.
Proof: H consists of all possible products of powers of elements of S.
(Negative powers are also included). These products need not be all distinct.
Clearly $S \subset H$ since every element $x \in S$ can be written as $x^1$. To show H
is a subgroup, by very definition H is closed under multiplication. As for
closure under inversion, we simply note that
$(x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r})^{-1} = x_r^{-n_r} x_{r-1}^{-n_{r-1}} \cdots x_2^{-n_2} x_1^{-n_1}$
(by Proposition (11.4.7)) which is again an element of H. Thus H is a
subgroup of G containing S. To show it is the smallest such subgroup, suppose
K is a subgroup of G and $S \subset K$. Then we claim $H \subset K$. Take a typical
element $x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r}$ of H. Since $x_i \in S$, $x_i \in K$ for all $i = 1, \ldots, r$. But
K, being a subgroup, is closed under multiplication and inversion. Hence
$x_i^{n_i} \in K$ for $i = 1, \ldots, r$ and finally $x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r} \in K$. So $H \subset K$. Thus H is
the smallest subgroup of G containing S. ∎
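For a finite group the internal description of Proposition 1.8 suggests a direct procedure: start from S together with the inverses of its elements and keep multiplying until nothing new appears. The sketch below is one possible way to do this and is not taken from the text; the function name `generated_subgroup` and the choice of the additive group of residues modulo 12 as a test case are assumptions.

```python
def generated_subgroup(gens, op, inverse):
    """Smallest set containing gens that is closed under op and inverse.

    Assumes gens is non-empty and that the subgroup it generates is finite,
    otherwise the loop does not terminate.
    """
    H = set(gens) | {inverse(g) for g in gens}
    frontier = list(H)
    while frontier:
        x = frontier.pop()
        for y in list(H):
            # Try the product in both orders, since the group need not be abelian.
            for z in (op(x, y), op(y, x)):
                if z not in H:
                    H.add(z)
                    frontier.append(z)
    return H

# Illustrative group: residues modulo 12 under addition.
def add12(a, b):
    return (a + b) % 12

def neg12(a):
    return (-a) % 12

print(sorted(generated_subgroup({8}, add12, neg12)))
print(sorted(generated_subgroup({4, 6}, add12, neg12)))
```

Every element produced is a product of elements of S and their inverses, so the result is contained in the set H of the proposition; conversely it is closed under the operation and inversion, so it is all of H.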
This proposition not only establishes the existence of the subgroup G(S)
generated by a subset S but also gives the nature of its elements in terms of
the elements of S. It may of course happen that the same element of G(S)
can be expressed in more than one way, say, $x_1^{n_1} \cdots x_r^{n_r}$ and $y_1^{m_1} \cdots y_s^{m_s}$. Note
also that we are not requiring the $x_i$'s to be distinct, because the elements
$x_1 x_2 x_1$ and $x_1^2 x_2$ need not be the same. However, in an abelian group, we
may require the $x_i$'s to be distinct. As examples, we see that in the
quaternion group Q, the subgroup generated by the singleton set $\{i\}$ is $\{\pm 1, \pm i\}$.
Note here that $i^2 = -1$, $i^3 = -i$, $i^4 = 1$ and the other powers of i do not give
rise to new elements, for example $i^5 = i^4 \cdot i = i$. The subgroup generated
by $\{i, j\}$ is the entire group Q. In Z, the singleton set $\{1\}$ generates the
whole group. There is a special name for groups with this property.
1.9 Definition: A group G is called cyclic if there exists $x \in G$ such that
G is generated by the singleton set $\{x\}$. Any such x is called a generator of
G. (We often say 'x generates G' instead of '$\{x\}$ generates G'.)
Thus, the group $(Z, +)$ is cyclic with 1 and $-1$ as generators. For
every positive integer m, the group $Z_m$ of residue classes under addition is
also cyclic, with [1] as a generator. Note that if n (> 1) divides m then [n]
cannot be a generator of $Z_m$ because if $k = m/n$ then the subgroup
generated by [n] will contain only k elements, namely, $[n], [2n], \ldots, [nk - n]$ and
$[nk]$ $(= [m] = [0])$. For example, [2] is not a generator for $Z_4$ but [1] and
[3] are. In the next chapter we shall determine all the generators of a
cyclic group.
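The remark about divisors of m is easy to test numerically. The sketch below is illustrative only (the function names `cyclic_subgroup` and `generators` are hypothetical); it lists the subgroup of $Z_m$ generated by a residue class and finds all generators by brute force.

```python
def cyclic_subgroup(n, m):
    """Subgroup of Z_m (under addition) generated by the residue class [n]."""
    elems, x = [], n % m
    while x not in elems:
        elems.append(x)
        x = (x + n) % m
    return sorted(elems)

def generators(m):
    """All residues n in 0..m-1 whose class [n] generates the whole of Z_m."""
    return [n for n in range(m) if len(cyclic_subgroup(n, m)) == m]

print(cyclic_subgroup(2, 4), generators(4))
print(cyclic_subgroup(3, 12), generators(12))
```

For m = 12 and n = 3 the subgroup has k = 12/3 = 4 elements, as the argument above predicts, so [3] is not a generator of $Z_{12}$.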
Since a cyclic group consists of all powers of an element and any two
powers of the same element commute with each other (by the laws of
indices, Proposition (3.4.8)), it follows that every cyclic group is abelian.
The converse is false. As a simple counter-example, the Klein group, K,
defined above is not cyclic. The subgroups generated by $\{e\}$, $\{a\}$, $\{b\}$ and
$\{c\}$ are respectively, $\{e\}$, $\{e, a\}$, $\{e, b\}$, $\{e, c\}$. None of these is the entire group.
Although a cyclic group consists of all powers of its generator, these
powers need not be distinct. Indeed if the group is finite then they are
bound to repeat (by the pigeon-hole principle). It turns out that this
repetition follows a regular pattern. The precise result is given below. The reader
is asked to first verify its truth for the cyclic group $Z_n$.
1.10 Theorem: Let G be a cyclic group of order n with a generator x.
Then n is the smallest positive integer such that $x^n = e$, the identity of the
group. For any two integers i and j, $x^i = x^j$ if and only if $i \equiv j$ (modulo
n). The group G consists of $e, x, x^2, \ldots, x^{n-1}$.
Proof: First we show that there exists a positive integer r such that
$x^r = e$. Consider the successive powers, $x, x^2, x^3, x^4, \ldots$ These are all
elements of G. But since G is finite, these elements cannot be all distinct.
So there exist positive integers i and j with $i < j$ such that $x^i = x^j$. Let
$r = j - i$. Then $x^r = x^{j-i} = x^j \cdot (x^i)^{-1}$ (by Proposition (3.4.3)). Hence
$x^r = e$. Since there exists at least one positive integer r with $x^r = e$, there
exists a least positive integer, say m, with this property. (Here again we
are using the fact that the set of positive integers is well-ordered, see
Exercise (3.3.21).) We have to show that $m = n$. First note that the elements
$e, x, x^2, \ldots, x^{m-1}$ are all distinct. (For if $x^p = x^q$ (say) with $0 \leq p < q < m$
then once again $x^{q-p} = e$ with $q - p < m$ contradicting the minimality of
m.) But G has only n distinct elements. So $m \leq n$. Note further that
$e, x, \ldots, x^{m-1}$ are the only distinct powers of x. To see this suppose $x^k$ is any
power of x. By the euclidean algorithm, we can write $k = sm + t$ where s,
t are integers, with $0 \leq t < m$. Then $x^k = x^{sm+t} = x^{sm} \cdot x^t = (x^m)^s \cdot x^t = e^s \cdot x^t$
$= e \cdot x^t = x^t$. Thus $x^k$ equals $x^t$ which is in the set $\{e, x, \ldots, x^{m-1}\}$. But
we are assuming that x is a generator for G. Hence the powers of x exhaust
the set G. So $G \subset \{e, x, \ldots, x^{m-1}\}$, giving $n \leq m$ (by comparison of
cardinalities). Thus we have shown $m = n$ which proves the first assertion. For
the second assertion, if $i \equiv j$ (modulo n) then we write $i = j + kn$ for
some integer k and show with laws of indices, that $x^i = x^j$. Conversely
suppose $x^i = x^j$. Then $x^{i-j} = e$. Once again we use the euclidean algorithm
and write $i - j = un + v$ where u, v are integers and $0 \leq v < n$. We then
get $x^v = e$. If $v \neq 0$, then this is a contradiction to the fact that n is the
smallest positive integer such that $x^n = e$. This shows $v = 0$ and hence
$i \equiv j$ (modulo n). The last assertion has already been proved. ∎
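The three assertions of Theorem 1.10 can be observed on a small example. The sketch below assumes, purely for illustration, the group of nonzero residues modulo 7 under multiplication with generator 3; it relies on Python's built-in three-argument `pow`, which from Python 3.8 onwards accepts negative exponents when the base is invertible modulo p.

```python
p, x = 7, 3                      # assumed example: nonzero residues mod 7, generator 3
G = set(range(1, p))
n = len(G)                       # order of the group

# The powers e, x, x^2, ..., x^(n-1)
powers = [pow(x, k, p) for k in range(n)]

# Smallest positive r with x^r = e
order = next(r for r in range(1, n + 1) if pow(x, r, p) == 1)

# x^i = x^j exactly when i and j are congruent modulo n (negative exponents included)
agree = all((pow(x, i, p) == pow(x, j, p)) == ((i - j) % n == 0)
            for i in range(-12, 13) for j in range(-12, 13))

print(powers, order, agree)
```

The list of powers runs through every element of the group once, the least positive exponent giving the identity is the order n = 6, and equality of powers matches congruence of exponents over the tested range.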
This theorem completely describes the structure of finite cyclic groups.
Even when a group G is not cyclic, the subgroup generated by an element,
say x, of G is a cyclic group when regarded as a group by itself. This sub-
group is often denoted by (x). If (x) is finite then we say the element x is
of finite order (or sometimes a torsion element). Further the order of the
group (x) is called the order of x. Obviously if the group G is finite then
every element is of finite order. But even in an infinite group some elements
may be of finite order. Trivially, the identity element is one such element.
As a non-trivial (and interesting) example, consider the circle group $S^1$
defined above. This consists of all complex numbers of absolute value 1
and the group operation is that of multiplication of complex numbers.
In polar form, every element of $S^1$ can be written as $e^{i\theta}$ (which equals
$\cos\theta + i \sin\theta$) where $\theta$ is a real number. Note that $e^{i\theta_1} = e^{i\theta_2}$ iff $\theta_1 - \theta_2$
is an integral multiple of $2\pi$. Also $e^{i\theta_1} \cdot e^{i\theta_2} = e^{i(\theta_1 + \theta_2)}$. Now let n be any
positive integer. Let $z = e^{2\pi i/n}$. Then z is an element of order n. All powers
of z are complex nth roots of 1. Hence the group (z) is called the group of
nth roots of unity and any generator for it is called a primitive nth root of
unity. (Unity is just another name for the number 1.) Note that $S^1$ also
contains elements of infinite order. We leave it to the reader to prove that
$e^{i\theta}$ is of finite order iff $\theta$ is a rational multiple of $\pi$.
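A numerical experiment can make the roots of unity concrete. The sketch below is only an approximation in floating-point arithmetic, with an assumed tolerance and an assumed search bound; the names `root_of_unity` and `order` are hypothetical. A result of `None` means that no power within the bound came close to 1, which is evidence for infinite order but not a proof.

```python
import cmath

def root_of_unity(n, k=1):
    """The complex number exp(2*pi*i*k/n), a power of z = exp(2*pi*i/n)."""
    return cmath.exp(2j * cmath.pi * k / n)

def order(z, max_order=1000, tol=1e-9):
    """Smallest r >= 1 with z**r equal to 1 up to tol, or None if none is found."""
    w = 1
    for r in range(1, max_order + 1):
        w *= z
        if abs(w - 1) < tol:
            return r
    return None

# Exponents k for which z^k is a primitive 6th root of unity
primitive_sixth = [k for k in range(1, 7) if order(root_of_unity(6, k)) == 6]

print(order(root_of_unity(6)), order(root_of_unity(6, 2)), primitive_sixth)
print(order(cmath.exp(1j)))      # theta = 1 is not a rational multiple of pi
```

For n = 6 the element $z^2$ has order 3, and only $z$ and $z^5$ generate the whole group of sixth roots of unity.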
Note that if x is an element of order n then $x^{n-1}$ is the inverse of x. In
other words, the inverse of x can be expressed as a positive power of x.
As a simple consequence of this fact we get the following result which,
although not profound, makes it a little easier to check whether a subset
of a finite group is a subgroup or not.
1.11 Proposition: Let G be a finite group and H a non-empty subset of
G. Then H is a subgroup of G if and only if H is closed under multiplica-
tion.
Proof: The necessity of the condition is obvious and does not require
that G is finite. For sufficiency, suppose G is finite and H is closed under
multiplication. To show H is a subgroup of G we merely have to verify
that H is closed under inversion, that is, if $x \in H$ then $x^{-1} \in H$. Now
every element of G is of finite order since G is itself finite. Let $x \in H$ and
n be the order of x. If $n = 1$, then $x = e$ and $x^{-1} = e \in H$. Suppose $n > 1$.
Then $x^{n-1}$ is the inverse of x. Since H is closed under multiplication and
$x \in H$, $x \cdot x \in H$. Then again $(x \cdot x) \cdot x \in H$. Continuing, $x^m \in H$ for
all positive integers m. In particular $x^{n-1} \in H$. But this means $x^{-1} \in H$.
This completes the proof. ∎
As an illustration, in the group of isometries of a regular pentagon
considered above, the five rotations form a subgroup because the composite
of two clockwise rotations through angles $72i°$ and $72j°$ is a clockwise
rotation through $72(i + j)°$.
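Proposition 1.11 can be confirmed exhaustively on a small group. The sketch below takes, as an assumed example, the group of all permutations of a three-element set and inspects every non-empty subset; the helper names are hypothetical.

```python
from itertools import combinations, permutations

S3 = list(permutations(range(3)))     # the six permutations of {0, 1, 2}
e = (0, 1, 2)

def compose(p, q):
    """(p o q)(k) = p(q(k))."""
    return tuple(p[q[k]] for k in range(3))

def inverse(p):
    inv = [0] * 3
    for k, image in enumerate(p):
        inv[image] = k
    return tuple(inv)

def closed_under_product(H):
    return all(compose(a, b) in H for a in H for b in H)

def is_subgroup(H):
    return e in H and closed_under_product(H) and all(inverse(a) in H for a in H)

# Every non-empty subset that is closed under multiplication
closed_subsets = [set(c) for r in range(1, 7) for c in combinations(S3, r)
                  if closed_under_product(set(c))]

print(len(closed_subsets), all(is_subgroup(H) for H in closed_subsets))
```

Finiteness matters here: in $(Z, +)$ the set of positive integers is closed under addition but is not a subgroup.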
So far we studied subgroups in general. Among all subgroups, certain
subgroups called normal subgroups stand out because of the important
properties they have. Although these properties will be studied in the next
two sections, we introduce the concept of normal subgroups here and give
some examples. Admittedly, the definition below may appear without
motivation. But this would come later.
1.12 Definition: A subgroup N of a group G is called normal in G if for
all $n \in N$ and all $g \in G$, $gng^{-1} \in N$.
Note that in this definition n ranges over N but g ranges over the
whole group G. If $g \in N$ and $n \in N$ then we certainly have $gng^{-1} \in N$,
whether N is normal or not (because N is closed under inversion and
multiplication). Therefore, whenever we want to test whether a subgroup
N of G is normal or not, it suffices to verify the condition $gng^{-1} \in N$ for
all $n \in N$ and for all $g \in G - N$. Note also that normality is relative to
the ambient group. We may have subgroups N, H with $N \subset H \subset G$ and
N normal in H but not in G.
In case the group G is abelian, $gng^{-1} = gg^{-1}n = en = n$ and thus every
subgroup of an abelian group is normal. So the concept of a normal
subgroup is interesting only when the group is not abelian. Trivially, for
any group G, the trivial subgroup $\{e\}$ and the whole group G are normal
in G. As some non-trivial examples let G be the group of symmetries of a
regular pentagon. We saw above that G consists of 5 rotations, $r_1, r_2, r_3, r_4$
and $r_5$ and 5 reflections (or flips) $f_1, f_2, \ldots, f_5$. Let $N = \{r_1, r_2, \ldots, r_5\}$. Then
as noted above, N is a subgroup of G. We claim it is a normal subgroup
of G. By the remark made above, it suffices to show that for all $i, j = 1,$
$2, \ldots, 5$, $f_i \circ r_j \circ f_i^{-1}$ equals some $r_k$. Note further that $f_i^{-1}$ equals $f_i$. As a
typical case consider $f_3 \circ r_2 \circ f_3$. Here $f_3$ as a function from $\{v_1, \ldots, v_5\}$ to
itself is given by $f_3(v_1) = v_5$, $f_3(v_2) = v_4$, $f_3(v_3) = v_3$, $f_3(v_4) = v_2$ and
$f_3(v_5) = v_1$. Also $r_2$ is given by $r_2(v_1) = v_3$, $r_2(v_2) = v_4$, $r_2(v_3) = v_5$, $r_2(v_4) = v_1$
and $r_2(v_5) = v_2$. Let g be the composite $f_3 \circ r_2 \circ f_3$. By direct computation
we see that $g(v_1) = v_4$, $g(v_2) = v_5$, $g(v_3) = v_1$, $g(v_4) = v_2$ and $g(v_5) = v_3$.
Thus g is precisely the rotation $r_3$. So $f_3 r_2 f_3^{-1} = r_3 \in N$. Similarly the
other cases can be verified. So N is a normal subgroup of G. (In the next
section we shall see a much quicker way of showing that N is normal in
G.) Similar reasoning shows that for each i, the subgroup $(f_i)$ is not
normal in G.
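The case-by-case verification can be delegated to a short program. The sketch below again stores the vertices $v_1, \ldots, v_5$ as 0..4 and uses hypothetical helper names; it checks the defining condition $gng^{-1} \in N$ for every g in the group, rather than only for the flips.

```python
N = 5  # vertices v_1..v_5 stored as 0..4, clockwise

def rot(i):
    """r_i: clockwise rotation through 72*i degrees."""
    return tuple((k + i) % N for k in range(N))

def flip(i):
    """f_i: reflection in the line through the vertex v_i."""
    c = i - 1
    return tuple((2 * c - k) % N for k in range(N))

def compose(p, q):
    """(p o q)(v) = p(q(v)); q is applied first."""
    return tuple(p[q[k]] for k in range(N))

def inverse(p):
    inv = [0] * N
    for k, image in enumerate(p):
        inv[image] = k
    return tuple(inv)

rotations = {rot(i) for i in range(1, 6)}
flips = {flip(i) for i in range(1, 6)}
G = rotations | flips

def is_normal(H):
    """True if g h g^(-1) lies in H for every g in G and h in H."""
    return all(compose(compose(g, h), inverse(g)) in H for g in G for h in H)

print(is_normal(rotations))
print(compose(compose(flip(3), rot(2)), flip(3)) == rot(3))
print([is_normal({rot(5), flip(i)}) for i in range(1, 6)])
```

Here `rot(5)` is the identity, so `{rot(5), flip(i)}` is the two-element subgroup generated by the flip $f_i$; none of these five subgroups passes the test.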
A few other examples of normal subgroups will be given in the exer-
cises. We conclude this section with two characterisations of normality.
Although both of them are mere reformulations of the definition, we state
them because of certain important concepts involved in them. Other
characterisations of normal subgroups will be given in the next two sections.
The first characterisation involves the process of extending a binary
operation on a set to a binary operation on its power set (cf. Exercise
(3.4.15)). Let $*$ be a binary operation on a set X and let P(X) be the power
set of X. If $A, B \in P(X)$, we define $A \circledast B$ to be the set $\{a * b : a \in A, b \in B\}$.
This is a subset of X. It consists of all possible elements which can be
expressed as products of two elements, one from A and the other from B.
Such expressions need not be unique. For example, if X = N, the set of
positive integers, $*$ is the usual multiplication, $A = \{1, 3, 4, 6\}$ and
$B = \{2, 3, 5\}$ then $A \circledast B$ is the set $\{2, 3, 5, 6, 8, 9, 12, 15, 18, 20, 30\}$. Thus
$\circledast$ is a binary operation on P(X), which is often denoted by $*$ as well.
This operation inherits many of the properties such as commutativity,
associativity and presence of identities of the original binary operation.
However, very few elements of P(X) have inverses. If A is a singleton
set, say $\{x\}$, then $A * B$ is often denoted by $x * B$. This set is often called
the left translate of the set B by the element x. Similarly if $B = \{y\}$ then
$A * B$ is denoted by $A * y$ and is called the right translate of A by y (cf.
Exercise (14.2)). This terminology has a geometric origin. If $*$ denotes the
usual addition for vectors in a euclidean space then a translate of a set is
nothing but what is popularly called its 'parallel translate', because it is
obtained by shifting every element of the original set by the same vector.
In the next section, we shall have a lot to do with the various translates of
a subgroup of a group. They will be called the cosets of the subgroup. For
the time being, we characterise normality of a subgroup in terms of certain
sets obtained from it by translating it on the left by an element of the
group and on the right by the inverse of that element.
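The extension of a binary operation to subsets is a one-line set comprehension. The sketch below reproduces the numerical example given above and also forms a translate; the function name `set_product` and the particular translate chosen are illustrative assumptions.

```python
from operator import add, mul

def set_product(A, B, op):
    """The set of all op(a, b) with a in A and b in B."""
    return {op(a, b) for a in A for b in B}

A = {1, 3, 4, 6}
B = {2, 3, 5}

AB = set_product(A, B, mul)                 # products of one element from each set
left_translate = set_product({10}, B, add)  # the translate 10 + B under addition

print(sorted(AB))
print(sorted(left_translate))
```

There are 12 pairs (a, b) but only 11 distinct products, because 12 arises both as 4 times 3 and as 6 times 2. This is the non-uniqueness of expressions mentioned above.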
1.13 Proposition: Let G be a group and H a subgroup of G. Then for
every $g \in G$, $gHg^{-1}$ is a subgroup of G. H is normal in G iff for all $g \in G$,
$gHg^{-1}$ coincides with H.
Proof: By definition $gHg^{-1}$ consists of all elements of the form $gxg^{-1}$ as
x varies over H. If G were abelian, then $gHg^{-1}$ would be the same as H. In
general, it could differ from H. To show it is a subgroup of G, we first
show it is closed under multiplication. Two typical elements of $gHg^{-1}$ are
of the form $gxg^{-1}$ and $gyg^{-1}$ for some $x, y \in H$. Then
$(gxg^{-1}) \cdot (gyg^{-1}) = gx(g^{-1}g)yg^{-1} = gxeyg^{-1} = g(xy)g^{-1}$.
But $xy \in H$ since H is a subgroup. So $g(xy)g^{-1} \in gHg^{-1}$, that is,
$(gxg^{-1})(gyg^{-1}) \in gHg^{-1}$. As for inversion, if $x \in H$, then $(gxg^{-1})^{-1} = (g^{-1})^{-1}$
$x^{-1} g^{-1} = gx^{-1}g^{-1}$ (by Proposition (3.4.8) again). But $x^{-1} \in H$. So
$gx^{-1}g^{-1} \in gHg^{-1}$. Hence $(gxg^{-1})^{-1}$ is in $gHg^{-1}$. Therefore $gHg^{-1}$ is closed both
under multiplication and inversion. So it is a subgroup of G. If H is normal
in G, then for all $x \in H$ and $g \in G$, $gxg^{-1} \in H$. Hence $gHg^{-1} \subset H$. But
actually, equality holds, because if $h \in H$ we can write $h = g(g^{-1}hg)g^{-1} =$
$g(xhx^{-1})g^{-1}$ where $x = g^{-1}$. By normality of H, $xhx^{-1} \in H$ and hence
$g(xhx^{-1})g^{-1} \in gHg^{-1}$. So $h \in gHg^{-1}$. Thus we have shown that if H is
normal in G then $gHg^{-1}$ equals H for all $g \in G$. The converse is clear because
if $gHg^{-1}$ coincides with H for all $g \in G$ then certainly for all $x \in H$ and
$g \in G$, $gxg^{-1} \in H$. ∎
As an application of this result we have the following interesting
corollary which is sometimes useful in proving that a subgroup is normal.
1.14 Corollary: Let G be a group. Suppose for some positive integer r,
G has a unique subgroup of order r. Then the subgroup is normal in G.
(G is not required to be finite.)
Proof: Let H be the unique subgroup of order r and let $g \in G$. Then by
the Proposition above, $gHg^{-1}$ is a subgroup of G. We claim that $gHg^{-1}$ is
also of order r. For this, we define a bijection from H onto $gHg^{-1}$. Define
$f : H \to gHg^{-1}$ by $f(x) = gxg^{-1}$. If $f(x) = f(y)$ for $x, y \in H$, then $gxg^{-1} = gyg^{-1}$
which implies $x = y$ by the cancellation laws (Proposition (1.4)). So f is
one-to-one. And by the very definition of $gHg^{-1}$, f is onto. So f is a
bijection and hence $gHg^{-1}$ has cardinality r. But H is the only subgroup of
order r. So $gHg^{-1}$ must equal H. By the last proposition, this means that
H is normal in G. ∎
For example, it can be shown that in the group G of all symmetries of
a regular pentagon, there is only one subgroup of order 5, namely the
subgroup N consisting of the five rotations. This gives an alternate proof
that N is normal in G. Note however, that this method is not always
applicable, because the converse of the corollary is false. For example in
the quaternion group Q, the subgroups (i), (j), (k) are all of order 4 each
and all of them are normal in Q.
The second characterisation of normality is based on the concept of
conjugacy which we now define.
1.15 Definition: Let G be a group. If x, y are elements of G, we say y is
conjugate to x (or a conjugate of x) in G if there exists $g \in G$ such that
$y = gxg^{-1}$. Similarly if H, K are subgroups of G, we say K is conjugate to
H in G if there exists $g \in G$ such that $K = gHg^{-1}$. (Sometimes we say that y
is a conjugate of x by g and K is a conjugate of H by g.)
The following proposition is an immediate consequence of the defini-
tions.
1.16 Proposition: Let G be a group and H a subgroup of G. Then the
following statements are equivalent:
(i) H is normal in G.
(ii) All conjugates of elements in H are in H.
(iii) The only conjugate of H in the set of all subgroups of G is H
itself.
Proof: The equivalence of (i) and (ii) is immediate from the definitions
of normality and conjugacy. The equivalence of (i) and (iii) follows from
Proposition (1.13) and the definition of conjugacy. ∎
Although the last proposition is far from profound, the concept of
conjugacy is very important. After developing some machinery needed to
apply it, we shall see its power in the chapter on group actions (see the
Epilogue). For the moment we show that it induces an equivalence relation.
1.17 Proposition: Let G be a group. Define a binary relation R on G by
$xRy$ iff y is conjugate to x in G. Then R is an equivalence relation.
Similarly conjugacy defines an equivalence relation on the set of all
subgroups of G.
Proof: Reflexivity of R is clear since $x = e \cdot x \cdot e^{-1}$ for all $x \in G$. For
symmetry suppose $xRy$. Then there exists $g \in G$ such that $y = gxg^{-1}$. But
then $x = g^{-1}yg = g^{-1}y(g^{-1})^{-1}$, showing that x is a conjugate of y in G,
that is, $yRx$. For transitivity, suppose $x, y, z \in G$ and $xRy$ and $yRz$. Then
there exist $g, h \in G$ such that $y = gxg^{-1}$ and $z = hyh^{-1}$. Then by
substitution we get $z = h(gxg^{-1})h^{-1} = (hg)x(g^{-1}h^{-1}) = (hg)x(hg)^{-1}$ showing that
$xRz$. So R is an equivalence relation on G. The proof of the second
assertion is similar and left as an exercise. ∎
1.18 Definition: The equivalence classes of the set G under the relation of
conjugacy are called conjugacy classes. The decomposition of G into
conjugacy classes is called the class decomposition of G.
In an abelian group, no two distinct elements can be conjugates of
each other. Consequently, its class decomposition is trivial. In a non-abelian
group, it is in general not easy to tell whether two given elements x and y
are in the same conjugacy class. A necessary condition is that they be of
equal order. But this condition is far from sufficient. In the next section we
shall find a formula for the size of the conjugacy class of an element.
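The class decomposition of a small group can be computed directly from Definition 1.15. The sketch below does this for the group of symmetries of a regular pentagon, with the vertices stored as 0..4 and hypothetical helper names, and also illustrates that equal order is necessary but not sufficient for conjugacy.

```python
N = 5  # vertices of the pentagon stored as 0..4, clockwise

def rot(i):
    return tuple((k + i) % N for k in range(N))

def flip(i):
    c = i - 1
    return tuple((2 * c - k) % N for k in range(N))

def compose(p, q):
    """(p o q)(v) = p(q(v))."""
    return tuple(p[q[k]] for k in range(N))

def inverse(p):
    inv = [0] * N
    for k, image in enumerate(p):
        inv[image] = k
    return tuple(inv)

identity = tuple(range(N))
G = {rot(i) for i in range(1, 6)} | {flip(i) for i in range(1, 6)}

def conjugacy_class(x):
    """All elements g x g^(-1) as g ranges over G."""
    return frozenset(compose(compose(g, x), inverse(g)) for g in G)

def order(x):
    k, y = 1, x
    while y != identity:
        y = compose(y, x)
        k += 1
    return k

classes = {conjugacy_class(x) for x in G}
print(sorted(len(c) for c in classes))
print(order(rot(1)), order(rot(2)), conjugacy_class(rot(1)) == conjugacy_class(rot(2)))
```

The rotations $r_1$ and $r_2$ both have order 5 but lie in different conjugacy classes, while all five flips form a single class.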
Exercises
1.1 Prove that the set of all invertible elements of a monoid forms a
group.
1.2 Give an example of a binary operation $*$ on a set X such that $*$ has
a two sided identity and every element of X is invertible but $*$
is not associative. (Hint: Take X to be a finite set and prepare the
table for $*$.)
1.3 Suppose $\cdot$ is an associative binary operation on a set G such that $\cdot$
has a right identity e and for every $x \in G$ there is some y such
that $x \cdot y = e$. Prove that $(G, \cdot)$ is a group. (Hint: First show that
the right cancellation law holds. Then show that e is a left identity
and finally that $x \cdot y = e$ implies $y \cdot x = e$.)
1.4 Show by an example that the last result fails if we assume e to be
a left identity (with no other change).
1.5 Suppose G is a group in which $(xy)^{-1} = x^{-1}y^{-1}$ for all $x, y \in G$.
Prove that G is abelian. More generally, prove that if the equality
$(xy)^n = x^n y^n$ holds for $x, y \in G$ for any three consecutive values of
the integer n then G is abelian.
1.6 Prove that a non-empty subset H of a group G is a subgroup of
G if for all $x, y \in H$, $xy^{-1} \in H$. (In other words, the two
conditions in Definition (1.6) can be replaced by a single condition.
This often shortens the work needed to verify that a subset is a
subgroup.)
Prove that the groups Z. - ([0]}, Z, — ([01) under respective
modular multiplications are cyclic.
1.8 Prove that for every prime p, the group $Z_p - \{[0]\}$ under modulo
p multiplication is cyclic.
1.9 Prove that if G is an abelian group, and n is a positive integer then
the set of all solutions of the equation $x^n = e$ in G, that is, the set
$\{y \in G : y^n = e\}$ is a subgroup of G. Show by an example that
this need not hold if G is non-abelian.
1.10 Let G be a cyclic group of order n. Suppose a positive integer m
divides n. Prove that the equation $x^m = e$ has exactly m solutions
in G and find them.
1.11 Prove that a subgroup of a cyclic group is cyclic.
1.12 Define $f : N \to N$ by $f(n) = n + 1$ if n is odd and $f(n) = n - 1$
if n is even. Define $g : N \to N$ by
$$g(n) = \begin{cases} 1 & \text{if } n = 1 \\ n + 1 & \text{if } n \text{ is even} \\ n - 1 & \text{if } n \text{ is odd and } n > 1 \end{cases}$$
Prove that both f, g are bijections. In the group S(N) of all
bijections of N onto itself, prove that both f, g are of finite order
but neither $f \circ g$ nor $g \circ f$ is of finite order.
1.13 In an abelian group prove that the product of two elements of
finite order is of finite order. Obtain an upper bound for its order
in terms of the orders of the two elements and show by an example
that it is sharp.
1.14 If x and y are two elements of any group G, prove that xy and yx
have the same order. (Hint: For every positive integer n prove
that $(xy)^{n+1} = x(yx)^n y$.)
1.15 Suppose G is a group of order n and $x_1, x_2, \ldots, x_n$ is a sequence
of (not necessarily distinct) elements of G. Prove that there exist
integers i, j with $1 \leq i \leq j \leq n$ such that $x_i x_{i+1} \cdots x_j = e$. (Hint:
Use the pigeon hole principle.)
1.16 Prove that every finite group of even order contains at least one
element of order 2. (Hint: Pair off every element with its inverse.)
1.17 A function $f : R \to R$ of the form $f(x) = ax + b$ where a, b are
real constants with $a \neq 0$, is called a non-singular linear function.
Prove that the set of all such functions is a subgroup of S(R). Is
it a normal subgroup?
1.18 Let C be the set of complex numbers, $\infty$ a symbol not in C and
$C^* = C \cup \{\infty\}$. $C^*$ is called the extended complex number system.
A function $f : C^* \to C^*$ is called a linear fractional transformation
(or L.F.T.) if it is of the form $f(z) = \frac{az + b}{cz + d}$ where a, b, c, d are
complex numbers with $ad - bc \neq 0$ with the understanding that
$f(\infty) = a/c$ and $f(z) = \infty$ if $cz + d = 0$. Prove that the set of
all such linear fractional transformations is a subgroup of $S(C^*)$.
1.19 Prove that the dihedral group $D_n$ contains a normal, cyclic subgroup
of order n. Prove that $D_n$ can be generated by two elements.
1.20 Prove that the group of all orientation preserving isometries of a space
figure is a normal subgroup of the group of all isometries of it.
1.21 Let $S_3$ be the group of all permutations of the set $X = \{1, 2, 3\}$.
Let $f : X \to X$ be $f(1) = 1$, $f(2) = 3$, $f(3) = 2$ and let $g : X \to X$
be $g(1) = 3$, $g(2) = 2$, $g(3) = 1$. Let H, K be the subgroups of
$S_3$ generated by f, g respectively. Prove that the sets HK and KH
are not equal and neither is a subgroup of $S_3$.
1.22 Let H, K be subgroups of a group G. Prove that the set HK is a
subgroup of G if and only if HK = KH. Prove further that when
either one of H and K is normal in G, this condition is always
Group Theory 321
satisfied and hence HK is a subgroup of G. If both H and K are
normal in G, prove that HK is normal in G.
Prove that every subgroup of the quaternion group is normal.
Let H be the subgroup generated by a subset S of a group G. If
gxg-l e H for all x e S and g e G, prove that His normal in G.
1.25 Prove that two elements which are conjugate to each other have the
same order. (A direct proof is not difficult. But the result can also
be derived slickly from Exercise (1.14) above.)
1.26 The centre Z(G) of a group G is defined as the set of all elements
of G which commute with every element of G, that is,
Z(G) = {z ∈ G : gz = zg for all g ∈ G}. Prove that
(i) Z(G) is a normal subgroup of G. In fact every subgroup of G
contained in Z(G) is normal in G.
(ii) G is abelian iff Z(G) = G. (Thus the centre of a group provides
a measure of 'how abelian' the group is.)
(iii) An element z ∈ G is in the centre iff the conjugacy class of z
in G consists only of z.
(iv) The centre of S_3 is {e} and that of Q (the quaternion group)
is {1, −1}.
1.27 Another measure of how abelian a group G is, is provided by the
so-called commutator subgroup C(G) of G defined as follows. For two
elements x, y in G, their commutator, denoted by [x, y], is defined as
the element xyx^{-1}y^{-1}. The subgroup generated by all such
commutators is called the commutator subgroup of G. Denote it by C(G).
Prove that:
(i) x and y commute with each other iff [x, y] = e.
(ii) G is abelian iff C(G) = {e}.
(iii) C(G) is a normal subgroup of G. (Hint: Use Exercise (1.24).)
(iv) C(Q) = {1, −1}.
(v) C (0..) is a subgroup of order 11.
1.28 Let G and H be groups. The cartesian product G × H is made a
group under co-ordinatewise multiplication. The purpose of this
exercise is to relate the properties of G × H to the corresponding
properties of G and H. Prove that:
(i) G × H is abelian iff both G, H are abelian.
(ii) If A, B are subgroups of G, H respectively then A × B is a
subgroup of G × H, which is normal iff A is normal in G
and B is normal in H.
(iii) If G × H is cyclic, so are G and H.
(iv) Even if both G and H are cyclic, G × H need not be cyclic.
(v) If S, T are subsets of G, H respectively and they generate
subgroups A, B then (S × {e}) ∪ ({e} × T) generates the
subgroup A × B.
1.29 Let G be a group, S any set and G^S the set of all functions from
S to G. Under pointwise multiplication, G^S is a group. Study
which properties of G pass over to G^S.
1.30 Let X be a set and S(X) the permutation group of X. For a subset
A of X, let F_A = {f ∈ S(X) : f(x) = x for all x ∈ A} and let
G_A = {f ∈ S(X) : f(A) = A}. In other words, elements of F_A
leave every element of A fixed individually while those of G_A leave
the set A fixed as a whole. Prove that both F_A and G_A are
subgroups of S(X) and F_A ⊆ G_A. Prove that neither F_A nor G_A is
normal in S(X) unless either the set A is ∅ or |X − A| ≤ 1.
1.31 Give an example of an infinite group in which every element is of
finite order.
1.32 Prove that a group cannot be expressed as a union of two of its
proper subgroups. Give an example of a group which can be
expressed as the union of three proper subgroups.
Notes and Guide to Literature
The literature on groups is so vast and diverse that it is hardly
possible to cite even typical references. Standard books on groups are
Burnside [1], Zassenhaus [1], or Kurosh [1]. Groups are often studied with
some other structure on them compatible with the group structure. The most
important among these are the so-called topological groups. A classic
reference on them is Pontryagin [1]. Groups, from the point of view of
applications to physics, are studied in Hamermesh [1] or G.G. Hall [1].
For applications to chemistry see Bishop, D.M. [1].
The algebraic structures of a semigroup and a monoid are considerably
weaker than that of a group. Still, recently, they have been studied because
of their applicability, especially to finite state machines. See, for example,
Dornhoff and Hohn [1].
Exercise 1.3 shows that there is some redundancy in the definition of
a group as we have given. Some authors do adopt the definition given by
Exercise 1.3.
For a generalisation of Exercise 1.5, see Problem No. B 2411 in the
American Mathematical Monthly, Vol. 81, p. 410 (1974). For a group
theoretic solution to the Rubik's cube puzzle, see Larsen [1].
2. Cosets of Subgroups
Every subgroup of a group induces a certain equivalence relation on
the underlying set of that group. The equivalence classes are called cosets
and they turn out to be nothing but the various translates of the subgroup.
They all have the same cardinality. If the subgroup is normal, these cosets
themselves form a group whose properties are closely related to those
of the original group and the normal subgroup. In this section we study
this coset decomposition and derive some interesting consequences.
Before we give the general definition of the equivalence relation induced
by a subgroup, let us take a couple of examples. Let us first consider Z,
the group of integers under the usual addition. For every positive integer
m, mZ is a subgroup of Z as we saw in the last section. mZ consists of all
multiples of m, that is,
mZ = {..., −3m, −2m, −m, 0, m, 2m, ...}.
Consider the translates of mZ by various integers. For fixation of ideas
take m = 10. Then 10Z + 1 is the set
{..., −29, −19, −9, 1, 11, 21, ...}.
Similarly, 10Z + (−44) is the set
{..., −54, −44, −34, −24, −14, −4, 6, 16, 26, ...}
which also equals the translate 10Z + 6 or the translate 10Z + 76. In
general we see that the translates mZ + x and mZ + y are equal iff x − y
is a multiple of m. It follows that there are exactly m distinct translates of
mZ, namely, mZ itself and
mZ + 1, mZ + 2, ..., mZ + (m − 1).
Moreover, these are precisely the equivalence classes under the equivalence
relation of congruency modulo m.
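The equality test for translates can be checked mechanically on a finite window of integers. The following Python lines are only an illustrative sketch; the function name `translate` and the window bound are our own assumptions and not part of the text.

```python
def translate(m, x, bound=100):
    """Elements of the translate mZ + x that lie in [-bound, bound]."""
    return {k for k in range(-bound, bound + 1) if (k - x) % m == 0}

m = 10
# 10Z + (-44), 10Z + 6 and 10Z + 76 should be one and the same translate.
same = translate(m, -44) == translate(m, 6) == translate(m, 76)

# Collect the translates mZ + x for many x; only m of them are distinct.
distinct = {frozenset(translate(m, x)) for x in range(-50, 50)}

print(same, len(distinct))
```

Within the window, two translates coincide exactly when the two shifts differ by a multiple of m, which is the congruence criterion stated above.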
The second example is geometric. Let R^2 be the cartesian plane
{(x, y): x ∈ R, y ∈ R}.
Under coordinatewise addition, R^2 is an abelian group. Let L be a straight
line passing through the origin, say L = {(x, y): y = 2x}. Then L is a
subgroup of R^2. (See Figure 5.4.) L consists of points of the form (t, 2t)
for t ∈ R. If (x_0, y_0) is any point of the plane then L + (x_0, y_0) is the set
{(t + x_0, 2t + y_0): t ∈ R}.
Eliminating t, this is precisely the set
{(x, y) ∈ R^2 : y − y_0 = 2x − 2x_0}.
Geometrically, this is the straight line through (x_0, y_0) parallel to
L. The whole plane is decomposed into the family of all lines parallel to
L (including L itself). Through every point of R^2 there passes one and only
one such line. Two points, say P and Q, are on the same line if and only if
the difference Q − P (which geometrically represents the vector joining P
to Q) is in the subgroup L.
[Figure 5.4: the line L and its translates L + (0, 1), L + (1, 0) and
L + (2, 0) in the plane, with points P and Q marked.]
We now want to extend the common feature of these two examples to
the case of a subgroup H of an arbitrary group G. In the two examples
above, the groups were abelian and so it did not matter whether we took
the left translates or the right translates. However, in a non-abelian group
we have to be careful. We develop the theory for left translates. An entirely
analogous theory holds for right translates and is left as an exercise. First
we give a name for the translates.
2.1 Definition: Let H be a subgroup of a group G. Then the left
translate of H by an element x of G (that is, the set xH = {xh : h ∈ H}) is
called the left coset of H by x.
We already saw two examples of this concept above. As another
example (which we shall visit frequently in this section), let G be the group
of symmetries of a regular pentagon discussed in the last section. G
consists of 5 rotations r_0, ..., r_4 and 5 flips f_1, ..., f_5. The rotation r_0 is the
identity element e. For every i = 1, ..., 5, {e, f_i} is a subgroup of G since
f_i ∘ f_i = e. Let H be one of these subgroups, say, H = {e, f_1}. Let us
determine the left coset r_1H. By definition it is the set {r_1 ∘ e, r_1 ∘ f_1}.
r_1 ∘ e is of course r_1. A direct computation shows that r_1 ∘ f_1 is f_4. So the
left coset r_1H is the set {r_1, f_4}. Note that this is also the left coset f_4H
because f_4 ∘ e = f_4 and f_4 ∘ f_1 = r_1. Thus, two distinct elements of G can
determine the same left coset of H. We leave it to the reader to show that
there are four other left cosets (H itself being one of them since eH = H).
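The count of left cosets can be confirmed by brute force. The sketch below models the symmetries of the pentagon as permutations of the vertex labels 0, ..., 4; this labelling, the names `rotations`, `flips`, `compose` and `left_coset`, and the choice of `flips[0]` as the generator of H are all our own assumptions and need not match the labels r_i, f_i of the text.

```python
def compose(p, q):
    """Return the permutation p o q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

n = 5
# rotations[k] turns the pentagon by k steps; rotations[0] is the identity.
rotations = [tuple((i + k) % n for i in range(n)) for k in range(n)]
# flips[c] is the reflection i -> c - i (mod 5).
flips = [tuple((c - i) % n for i in range(n)) for c in range(n)]
G = rotations + flips
e = rotations[0]
H = [e, flips[0]]  # a subgroup of order 2, since a flip is its own inverse

def left_coset(x):
    """The left coset xH as a frozenset."""
    return frozenset(compose(x, h) for h in H)

cosets = {left_coset(x) for x in G}
r1 = rotations[1]
# r1 and r1 o flips[0] are distinct elements naming the same left coset.
same_coset = left_coset(r1) == left_coset(compose(r1, flips[0]))
print(len(cosets), same_coset)
```

Each coset has two elements, so the ten symmetries fall into five left cosets, H being one of them.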
In the next proposition we characterise the left cosets of a subgroup
as the equivalence classes of a certain equivalence relation on G.
2.2 Proposition: Let H be a subgroup of a group G. Define a binary
relation R on G by xRy iff x^{-1}y ∈ H for x, y ∈ G. Then R is an
equivalence relation on G and its equivalence classes are precisely the left
cosets of H in G.
Proof: Reflexivity of R is trivial since for every x ∈ G, x^{-1}x = e and
every subgroup contains the identity element. For symmetry we note that
if x^{-1}y ∈ H then (x^{-1}y)^{-1} ∈ H (since every subgroup is closed under
inversion). But (x^{-1}y)^{-1} = y^{-1}(x^{-1})^{-1} = y^{-1}x. So xRy implies yRx.
Similarly transitivity of R follows from the fact that H is closed under
multiplication ((x^{-1}y)(y^{-1}z) = x^{-1}z for all x, y, z ∈ G). So R is an
equivalence relation. It remains to show that the equivalence classes are
precisely the left cosets of H in G. Let x ∈ G and let S be the equivalence
class under R containing x. Then, by definition, S = {y ∈ G : x^{-1}y ∈ H} and
we have to show that xH = S. A typical element of xH is of the form xh
for some h ∈ H. Then x^{-1}(xh) = (x^{-1}x)h = h ∈ H and so xh ∈ S. Hence
xH ⊆ S. Conversely if y ∈ S then x^{-1}y = h for some h ∈ H. But then
y = x(x^{-1}y) = xh showing that y ∈ xH. Hence xH = S as was to be
shown. ∎
Because of this proposition, the results about equivalence relations
become applicable to left cosets. We have, in particular,
2.3 Corollary: Two left cosets are either identical or mutually disjoint.
The group is the disjoint union of all the distinct left cosets.
Proof: This follows by applying Proposition 3.2.6 to the equivalence
relation above. ∎
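Proposition 2.2 and Corollary 2.3 can be verified exhaustively in a small group. The following sketch uses the group of all permutations of {0, 1, 2} and a subgroup of order 2; the helper names (`compose`, `inverse`, `related`) and the choice of group are illustrative assumptions only.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p o q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

G = list(permutations(range(3)))   # all 6 permutations of {0, 1, 2}
H = {(0, 1, 2), (1, 0, 2)}         # identity and one transposition

def related(x, y):
    """The relation xRy of Proposition 2.2: x^{-1} y lies in H."""
    return compose(inverse(x), y) in H

# Equivalence classes of R, and left cosets xH, both as sets of frozensets.
classes = {frozenset(y for y in G if related(x, y)) for x in G}
cosets = {frozenset(compose(x, h) for h in H) for x in G}

print(classes == cosets, len(cosets))
```

The distinct cosets are pairwise disjoint and together cover G, so their sizes add up to the order of the group.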
As remarked earlier, an entirely analogous proposition holds for
right cosets of a subgroup H in a group G. Two right cosets Hx and
Hy are equal if and only if xy^{-1} ∈ H. It should be noted that in general
the left cosets themselves are different from the right cosets (although for
an abelian group they are obviously equal). Two elements which are in
the same left coset need not necessarily be in the same right coset. For
example, consider again the group G of all symmetries of a regular
pentagon and let H be the subgroup {e, f_1}. The left coset r_1H is the set
{r_1, f_4}. But the right coset Hr_1 comes out to be the set {r_1, f_3}. Although
r_1 and f_4 are in the same left coset, they are in different right cosets.
(The right coset Hf_4 is the set {r_4, f_4}.) The precise relationship between
the left cosets and the right cosets is given by the following proposition.
2.4 Proposition: Let H be a subgroup of a group G. Then two elements
x, y ∈ G are in the same left coset of H in G iff their inverses x^{-1}, y^{-1} are
in the same right coset of H. Consequently every right coset of H is
obtained by taking the inverses of all elements of some left coset of H and
vice versa.
Proof: Let y, x ∈ G. Then xH = yH iff x^{-1}y ∈ H. The right cosets
Hx^{-1}, Hy^{-1} are equal iff x^{-1}(y^{-1})^{-1} ∈ H, that is, iff x^{-1}y ∈ H. So
xH = yH iff Hx^{-1} = Hy^{-1}. Similarly Hx = Hy iff x^{-1}H = y^{-1}H. Now let
T be a right coset of H, say T = Hx. Let S be the left coset x^{-1}H. We claim
T consists of precisely the inverses of elements of S. Let y ∈ T. Then
Hx = Hy. So y^{-1}H = x^{-1}H by what we proved earlier. But then y^{-1} ∈ S.
So y is the inverse of some element (namely y^{-1}) of S. Conversely if
z ∈ S then x^{-1}H = zH, giving Hx = Hz^{-1} and hence z^{-1} ∈ T. Thus the
right coset T consists of precisely the inverses of the elements of the left
coset S. Similarly every left coset is obtained by inverting all the elements
of a right coset. ∎
As an illustration, in the example above r_1^{-1} = r_4. The right coset Hr_4,
therefore, should consist of the inverses of the elements of the left coset
r_1H. This is indeed so, because r_1H = {r_1, f_4}, Hr_4 = {r_4, f_4},
r_4 = r_1^{-1} and f_4 = f_4^{-1}.
As a consequence of the preceding proposition, we get,
2.5 Corollary: The number of distinct left cosets of a subgroup equals
the number of its distinct right cosets.
Proof: Let ℒ be the set of all distinct left cosets of a subgroup H of a
group G and let ℛ be the set of all distinct right cosets of H in G. Define
a function g: ℒ → ℛ by g(xH) = Hx^{-1} for all x ∈ G. By the last
proposition, g is well-defined because xH = yH implies Hx^{-1} = Hy^{-1}.
Conversely Hx^{-1} = Hy^{-1} implies xH = yH. So g is one-to-one. Finally, given
any right coset Hy ∈ ℛ, Hy = g(y^{-1}H). So g is also onto. Therefore g is a
bijection and hence ℒ and ℛ have the same cardinality. ∎
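The relationship between left and right cosets can be observed concretely. The sketch below again models the pentagon group by permutations of the labels 0, ..., 4 (our own labelling, not that of the text) and checks that inverting the left cosets elementwise produces exactly the right cosets, even though the two families themselves differ.

```python
def compose(p, q):
    """Return the permutation p o q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

n = 5
rotations = [tuple((i + k) % n for i in range(n)) for k in range(n)]
flips = [tuple((c - i) % n for i in range(n)) for c in range(n)]
G = rotations + flips
H = [rotations[0], flips[0]]   # identity and one flip

left = {frozenset(compose(x, h) for h in H) for x in G}
right = {frozenset(compose(h, x) for h in H) for x in G}

# Proposition 2.4: inverting every element of a left coset gives a right coset.
inverted = {frozenset(inverse(y) for y in coset) for coset in left}

print(len(left), len(right))   # equal counts, as in Corollary 2.5
print(inverted == right)       # inversion matches the two families
print(left == right)           # but the families themselves differ
```

The last comparison fails because this H is not a normal subgroup; for a normal subgroup the two families would coincide.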
2.6 Definition: The number of distinct left (or right) cosets of a
subgroup H in a group G is called the index of H in G and is generally
denoted by (G : H) (or by [G : H]).
For example, for every positive integer m, (Z : mZ) = m. If G is the
group of symmetries of a regular pentagon and H is the subgroup
considered above then (G : H) = 5. If G = R^2 and H is the line L in Figure 5.4,
then (G : H) is infinite. In case G is a finite group, there is a simple formula
for the index of a subgroup which we shall prove a little later.
We just remarked that the left coset xH is in general not equal to the
right coset Hx. Of course the two are equal if the group is abelian. But
this is too strong a condition. It will suffice if, instead of the whole group
G being abelian, elements of H commute with all elements of G, that is, H
is contained in the centre of G (see Exercise (1.26)). Even this requirement
turns out to be stronger than is really needed. In order that the sets xH
and Hx should coincide, all we need is that for every h_1 ∈ H there exists
h_2 ∈ H such that xh_1 = h_2x and vice versa. This condition will be trivially
met if x and h_1 commute with each other (for then we can set h_2 = h_1).
But even if they do not commute, it will still be satisfied if the element
xh_1x^{-1} is in H (for then we can set h_2 = xh_1x^{-1}). This suggests that
normality of H might do the trick. This is a good guess. As a matter of fact, this
provides an important characterisation of normality which we now prove.
2.7 Theorem: Let H be a subgroup of a group G. Then the following
conditions are equivalent:
(i) H is normal in G.
(ii) For every x ∈ G, xH = Hx.
(iii) Every left coset of H in G equals some right coset of H in G
and vice versa. (That is, the left and the right cosets of H in G
coincide.)
Proof: (i) ⇒ (ii). Suppose H is normal in G and x ∈ G. Then for any
h ∈ H, xhx^{-1} ∈ H, and hence
xh = xh(x^{-1}x) = (xhx^{-1})x ∈ Hx.
Thus
xH ⊆ Hx.
Similarly for any h ∈ H,
hx = x(x^{-1}hx) = x[x^{-1}h(x^{-1})^{-1}] ∈ xH
by normality of H, showing Hx ⊆ xH. Hence xH = Hx.
(ii) ⇒ (iii). This hardly requires any proof because (ii) is more specific
than (iii).
(iii) ⇒ (ii). Let x ∈ G. Then the left coset xH equals some right
coset Hy. We are not given, nor is it necessarily true, that x = y. Still
we claim Hy = Hx. This follows from the fact that x = x·e ∈ xH = Hy.
So x is in the right coset Hy. But x is also in the right coset Hx since
x = e·x. So Hx ∩ Hy ≠ ∅ and hence Hx = Hy. Since Hy = xH it
follows that xH = Hx.
(ii) ⇒ (i). Let x ∈ G, h ∈ H. Then xh ∈ xH. But xH = Hx. So
xh ∈ Hx, i.e., there exists h_1 ∈ H such that xh = h_1x. But then xhx^{-1} =
h_1 ∈ H, showing that H is normal in G. ∎
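The three conditions of Theorem 2.7 can be tested side by side on small examples. In the sketch below, the group is that of all permutations of {0, 1, 2}; the function names and the two sample subgroups `A3` and `T` are illustrative assumptions.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p o q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

G = list(permutations(range(3)))
e = (0, 1, 2)

def cond_i(H):
    """(i) x h x^{-1} lies in H for all x in G and h in H."""
    return all(compose(compose(x, h), inverse(x)) in H for x in G for h in H)

def cond_ii(H):
    """(ii) xH = Hx for every x in G."""
    return all({compose(x, h) for h in H} == {compose(h, x) for h in H}
               for x in G)

def cond_iii(H):
    """(iii) the family of left cosets equals the family of right cosets."""
    left = {frozenset(compose(x, h) for h in H) for x in G}
    right = {frozenset(compose(h, x) for h in H) for x in G}
    return left == right

A3 = {e, (1, 2, 0), (2, 0, 1)}   # the two 3-cycles with the identity
T = {e, (1, 0, 2)}               # identity and one transposition

print([cond_i(A3), cond_ii(A3), cond_iii(A3)])
print([cond_i(T), cond_ii(T), cond_iii(T)])
```

For each subgroup the three answers agree, as the theorem predicts: all hold for `A3` and all fail for `T`.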
As a simple application, we get the following result which is often
useful to show that a subgroup is normal.
2.8 Theorem: A subgroup of index 2 is always normal.
Proof: Let H be a subgroup of a group G. Suppose (G : H) = 2. This
means there are two distinct left cosets of H in G and also that there are two
distinct right cosets of H in G. Now, in any group, the subgroup itself
constitutes one of the left cosets and also one of the right cosets because
H = eH = He. Since the two left cosets are mutually disjoint, it follows
that the other left coset must be precisely the complement of H in G. The
same holds for the other right coset. Thus we conclude that the left and
the right cosets of H in G coincide. Therefore H is normal in G. ∎
For example, if G is the group of symmetries of a regular pentagon
and N is the subgroup consisting of all five rotations then (G : N) = 2.
This can be proved directly by showing that N has only two left cosets,
one consisting of all 5 rotations (that is, N itself) and the other consisting
of all 5 flips. However, this will come much more easily from the formula
for the index which we shall prove later. We then see that N is normal in
G. Similarly let G be the group of isometries of a figure in space and let H
be the subgroup of orientation preserving isometries. We claim that H is
a normal subgroup of G. If there is no orientation reversing isometry in
G, then H = G and so H is normal in G. Suppose there is some orientation
reversing isometry θ. Then we claim that H and θH are the only two left
cosets of H in G. Note that all elements of θH are orientation reversing
because the composite of an orientation reversing and an orientation
preserving isometry is orientation reversing. Conversely if ψ is any orientation
reversing isometry then we write ψ = θ ∘ (θ^{-1} ∘ ψ). Note that θ^{-1} ∘ ψ is
orientation preserving, being the composite of two orientation reversing
isometries. So θ^{-1} ∘ ψ ∈ H. Hence ψ ∈ θH. Thus all orientation preserving
isometries constitute the set H while all orientation reversing isometries
constitute the set θH. Since there can be no other isometries, H and θH
exhaust the group G. So H has index 2 in G. By the theorem above H is
normal in G. (This could also be done directly as in Exercise (1.20). But we
want to illustrate Theorem (2.8).) Since the right and left cosets of a normal
subgroup are equal, from now onwards we shall simply call them cosets.
It is instructive to reformulate Theorem (2.7). In the last section
we saw how a binary operation ∗ on a set X induces a binary operation
(which we continue to denote by ∗) on the power set P(X). Let us do this
construction for the group operation · on a group G. If H is a subgroup
of G, and x ∈ G, then by definition, the left and right cosets xH and Hx
are respectively the elements {x}·H and H·{x} of P(G). If G is abelian
then the operation · on P(G) is commutative. Then Theorem (2.7) says
that the subgroup H is normal in G iff the element H of P(G) commutes
with all singleton subsets of G. Thus normality is a weaker form of
commutativity.
Let us now see whether P(G) is a group under this operation.
Associativity of · follows from that of the group operation. The singleton set
{e} is also easily seen to be an identity for ·. So (P(G), ·) is a monoid.
However, we are in trouble when we look for inverses. If A is a subset of
G with at least two elements then A, as an element of P(G), can never be
invertible under ·. If B is any subset of G, then A·B is empty (when B = ∅)
or else has at least two distinct elements (if a_1, a_2 ∈ A, a_1 ≠ a_2 and b ∈ B
then a_1b ≠ a_2b by cancellation law). In either case A·B cannot equal the
identity element {e}. It follows that the only invertible elements of P(G) are
the singleton sets.
It turns out that if H is a normal subgroup of G then the set of all
cosets of H in G is closed under the operation · on P(G). Moreover, under
the induced binary operation, it forms a group. This construction is
important and we examine it in detail.
Let H be a normal subgroup of a group G. Let G/H be the set of all
distinct cosets of H in G. Elements of G/H are of the form xH (which is the
same as Hx by normality) for x ∈ G. Then G/H is a subset of P(G). We
first show that G/H is closed under the binary operation · on P(G).
2.9 Proposition: The product of any two cosets of a normal subgroup
is again a coset of it.
Proof: Let H be a normal subgroup of a group G. Let xH, yH be two
cosets of H in G. Then (xH)·(yH) equals ({x}·H)·({y}·H) by definition
which further equals {x}·(H·{y})·H by associativity. But since H is
normal, H·{y} equals {y}·H. So (xH)·(yH) equals {xy}·(H·H). Now
H·H ⊆ H (since H is closed under the group operation). Also H = H·{e}
⊆ H·H. Therefore, H·H = H. This shows (xH)·(yH) equals (xy)H which
is again a coset of H. (A more direct argument would be to show that
every element of the form xh_1yh_2 for h_1, h_2 ∈ H can be expressed as xyh_3
for some h_3 ∈ H. This would show that (xH)·(yH) ⊆ (xy)H. Conversely,
for any h_3 ∈ H, xyh_3 = (x·e)·(yh_3) showing (xy)H ⊆ (xH)·(yH).) ∎
Note that normality of H was crucially used in the proof above. Let us
once again take G to be the group of symmetries of a regular pentagon
and H the subgroup {e, f_1}. Then the left coset r_1H is {r_1, f_4}. The
product (r_1H) ∘ (r_1H) consists of four elements r_1 ∘ r_1, r_1 ∘ f_4, f_4 ∘ r_1 and
f_4 ∘ f_4. Computing these composites we see that (r_1H) ∘ (r_1H) is the set
{e, f_1, f_2, r_2} which is not equal to any left coset of H in G. As a matter of
fact we leave it to the reader to prove that if a subgroup H has the property
that for all x, y ∈ G, (xH)·(yH) equals (xy)H then H is normal in G. (This
gives yet another characterisation of normal subgroups.)
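The failure of coset multiplication for a non-normal subgroup, and its success for a normal one, can both be seen by direct computation. As before, the sketch models the pentagon group by permutations of the labels 0, ..., 4; the labelling and the helper names are assumptions of ours and do not reproduce the indices r_i, f_i of the text.

```python
def compose(p, q):
    """Return the permutation p o q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def set_product(A, B):
    """The product A.B in P(G): all composites a o b with a in A, b in B."""
    return frozenset(compose(a, b) for a in A for b in B)

def left_cosets(H, G):
    return {frozenset(compose(x, h) for h in H) for x in G}

n = 5
rotations = [tuple((i + k) % n for i in range(n)) for k in range(n)]
flips = [tuple((c - i) % n for i in range(n)) for c in range(n)]
G = rotations + flips

# H = {identity, one flip} is not normal in G.
H = frozenset([rotations[0], flips[0]])
r1H = frozenset(compose(rotations[1], h) for h in H)
product = set_product(r1H, r1H)
print(len(product), product in left_cosets(H, G))

# N = all rotations is normal; here products of cosets are again cosets.
N = frozenset(rotations)
cosets_N = left_cosets(N, G)
closed = all(set_product(A, B) in cosets_N for A in cosets_N for B in cosets_N)
print(closed)
```

The product of the coset with itself has four elements, so it cannot be any left coset of the two-element subgroup H.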
Because of this proposition, we get a well-defined binary operation on
the set of all cosets of a normal subgroup. This operation is called, quite
appropriately, the coset multiplication. It is generally denoted by the same
symbol as the original group operation. We now proceed to show that
under this operation, the set of cosets is a group.
2.10 Theorem: Let H be a normal subgroup of a group G. Then under
coset multiplication, the set of cosets, G/H, is a group.
Proof: Recall once again that the coset multiplication is given by
(xH)·(yH) = (xy)H for x, y ∈ G. Associativity of this operation follows
from the associativity of the group operation of G. The coset H serves as a
two sided identity because for x ∈ G, (xH)·H = (xH)·(eH) = (xe)H = xH
and similarly H·(xH) = xH. Finally for inverses, we merely note that for
any x ∈ G, the coset x^{-1}H is the inverse of xH because
(xH)·(x^{-1}H) = (xx^{-1})H = eH = H
and similarly (x^{-1}H)·(xH) = H. Therefore G/H is a group under coset
multiplication. ∎
2.11 Definition: The group G/H constructed in the last theorem is called
the group of cosets of H in G or the quotient group of G by H.
The first name requires little elaboration. Justification for the second
name will come a little later. Let us study two examples of quotient groups.
Let Z be the group of integers under the usual addition. For every positive
integer m, mZ is a subgroup of Z of index m and the cosets are the
congruence classes modulo m as we saw above. In the last section, we saw that
the set of these residue classes, Z_m, is a group under residue addition. If
[x] and [y] are two residue classes then [x] + [y] is the residue class [x + y].
But since the residue classes [x], [y] and [x + y] are nothing but the cosets
mZ + x, mZ + y and mZ + (x + y) we see that the residue addition
coincides with the coset addition. Therefore, in this case, the quotient group
Z/mZ is the familiar group Z_m.
As another example, take the quaternion group Q. It has eight elements,
±1, ±i, ±j and ±k. The elements 1 and −1 form a subgroup H of Q.
This subgroup is in fact the centre of Q (see Exercise (1.26)(iv)). So H is
normal in Q. The cosets of H in Q are the sets {1, −1}, {i, −i}, {j, −j} and
{k, −k}. For brevity let us denote these by e, a, b and c respectively. Then
we see that e (which also equals H) is the identity for the coset
multiplication. We also see a·b = {i, −i}·{j, −j} = {k, −k} = c and similarly
b·a = c. Also a·a = {i, −i}·{i, −i} = {1, −1} = e. Completing these
computations we see that the quotient group Q/H is nothing but the Klein
group. This justifies our comment in the last section that the Klein group is
obtained from the quaternion group by ignoring the minus sign.
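The coset computations in Q/H can be completed by machine. The sketch below encodes an element of Q as a pair (sign, unit); this encoding, the multiplication table built from the usual rules i·j = k, j·k = i, k·i = j, and all function names are our own assumptions made for illustration.

```python
units = "1ijk"
table = {}
for u in units:                      # 1 is the identity
    table[("1", u)] = (1, u)
    table[(u, "1")] = (1, u)
for u in "ijk":                      # i.i = j.j = k.k = -1
    table[(u, u)] = (-1, "1")
for a, b, c in [("i", "j", "k"), ("j", "k", "i"), ("k", "i", "j")]:
    table[(a, b)] = (1, c)           # e.g. i.j = k
    table[(b, a)] = (-1, c)          # e.g. j.i = -k

def mul(x, y):
    """Multiply two elements of Q given as (sign, unit) pairs."""
    (s, u), (t, v) = x, y
    sign, w = table[(u, v)]
    return (s * t * sign, w)

H = frozenset([(1, "1"), (-1, "1")])   # the subgroup {1, -1}

def coset(x):
    return frozenset(mul(x, h) for h in H)

def coset_product(A, B):
    return frozenset(mul(p, q) for p in A for q in B)

# The four cosets called e, a, b, c in the text.
e, a, b, c = (coset((1, u)) for u in units)

print(coset_product(a, b) == c, coset_product(b, a) == c)
print(all(coset_product(x, x) == e for x in (a, b, c)))
```

Every non-identity coset squares to e and the product of any two distinct non-identity cosets is the third, which is the multiplication pattern of the Klein group, although Q itself is not abelian.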
We studied one more example of coset decomposition where the group
G was the cartesian plane R^2 and the subgroup H was a line L through the
origin. The identification of the quotient group G/H will be greatly
simplified by the concept of a group homomorphism to be studied in the next
section. There is, in fact, an intimate relationship between homomorphisms
and quotient groups. So we shall visit them again in the next section. For
the moment let us see what properties of a group pass over to its quotient
groups. It is clear that if G is abelian, so is the quotient group G/H.
Moreover, if G is cyclic so is G/H. To see this, suppose x is a generator for
G. We claim that the coset xH is a generator for G/H. Let yH be any
element of G/H, where y ∈ G. Then y equals some power, say x^n, of x. But
then yH = x^nH which equals (xH)·(xH)···(xH) (n times) if n is a positive
integer and (x^{-1}H)·(x^{-1}H)···(x^{-1}H) (−n times) if n is a negative integer.
In the first case, yH = (xH)^n and in the second case yH = (x^{-1}H)^{−n} which
again equals (xH)^n (since x^{-1}H and xH are inverses of each other in G/H).
If n = 0, then y = e and yH = H = (xH)^0. In all cases we have expressed
yH as a power of xH. So G/H is generated by xH and hence is cyclic.
It is tempting to think that the converse is also true, at least with the
additional hypothesis that H is cyclic. Such an expectation is intuitively
justified because knowing G/H is like knowing G modulo or up to H. This
knowledge, combined with the knowledge of H, ought to give the full
information about G. Unfortunately, this does not come out to be quite
true. For example let Q be the quaternion group once again and let H be
{1, −1}. Then H is abelian. Also the quotient group Q/H is the Klein
group which is also abelian. But Q is not abelian. In the same example
let N = {1, −1, i, −i}. Then N is a subgroup of Q and it is cyclic, because
N is generated by the element i. N has only two cosets in Q, namely N itself
and jN = {j, −j, k, −k}. So N is normal in Q by Theorem 2.8. The
quotient group Q/N has only two elements and is generated by the coset
jN. Thus we see that both N and Q/N are cyclic. But Q is not only not
cyclic, it is not even abelian.
The discussion in this section has so far been independent of the
cardinality of the groups. From the point of view of discrete mathematics, finite
groups are especially important. So we now consider what additional
results hold for finite groups. We begin with the promised formula for the
index of a subgroup of a finite group. The result is due to Lagrange and
is one of the basic theorems about finite groups.
2.12 Theorem: For finite groups, the order of a subgroup divides the
order of the group and the ratio equals the index of that subgroup. In
symbols, if G is a finite group and H a subgroup of G then
o(G)/o(H) = (G : H).
Proof: Note that o(G) and o(H) are simply the cardinalities of the sets
G, H respectively. Now consider the decomposition of G into the left
cosets of H in G, as given by Corollary (2.3). Let r be the number of
distinct left cosets. Then r = (G : H). We claim that all left cosets have
the same cardinality, namely o(H). Let xH be a left coset. Define f: H → xH
by f(h) = xh for h ∈ H. By cancellation laws in a group, f is one-to-one.
Also, by the very definition of the set xH, f is onto. So f is a bijection, proving
that |xH| = |H| = o(H). The set G is now decomposed into r mutually
disjoint subsets, each having cardinality o(H). By Corollary (2.2.4),
o(G) = |G| = r·o(H) = (G : H)·o(H). In particular we see that o(H) divides
o(G) since r is an integer. ∎
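Lagrange's formula can be confirmed numerically for every cyclic subgroup of a small group. The sketch below uses the 24 permutations of {0, 1, 2, 3}; the choice of group and the helper names `compose` and `generated_by` are assumptions made only for this illustration.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p o q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def generated_by(x, identity):
    """The cyclic subgroup generated by x, listed as powers of x."""
    H, y = [identity], x
    while y != identity:
        H.append(y)
        y = compose(y, x)
    return H

G = list(permutations(range(4)))     # a group of order 24
identity = tuple(range(4))

checks = []
for x in G:
    H = generated_by(x, identity)
    cosets = {frozenset(compose(g, h) for h in H) for g in G}
    # Every left coset has o(H) elements, and o(G) = (G : H) . o(H).
    same_size = all(len(c) == len(H) for c in cosets)
    checks.append(same_size and len(G) == len(cosets) * len(H))

orders = sorted({len(generated_by(x, identity)) for x in G})
print(all(checks), orders)
```

The list `orders` collects the orders of the elements, each of which is the order of a cyclic subgroup and hence a divisor of 24.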
For finite groups, the index of a subgroup is sometimes defined as the
ratio of the order of the group to that of the subgroup. Because of
Lagrange's theorem, the index is then an integer. However, we have not adopted
this definition because it is not applicable for infinite groups. The definition
we have given is applicable for all groups, whether finite or not. It
can happen that an infinite group may have a subgroup of finite index.
For example the index of mZ in Z is finite, but it cannot be expressed as
the ratio of o(Z) and o(mZ) because both are infinite.
Lagrange's theorem has a host of applications. To begin with, let G be
the group of all symmetries of a regular pentagon and let N be the
subgroup of the 5 rotations. Then o(G) = 10 and o(N) = 5. So (G : N)
= 10/5 = 2. Hence N is of index 2 in G and consequently is normal in G
by Theorem 2.8. This gives a truly effortless way of proving that N is
normal in G.
It is interesting to note that in many applications of Lagrange's
theorem, the index of the subgroup is not very important. All that matters
is that the order of a subgroup is a divisor of the order of the group.
Note that there is no restriction on the subgroup (such as it be normal).
Choosing the subgroup appropriately, we get various applications. We
begin with the following.
2.13 Proposition: The order of every element of a finite group divides
the order of the group.
Proof: Let G be a finite group and x ∈ G. Then the order of x is, by
definition, the order of (x), the subgroup of G generated by x. The result
now follows immediately from the last theorem. ∎
As a consequence we get the following result :
2.14 Corollary: If G is a finite group of order n then for every x ∈ G,
x^n = e, the identity element of G.
Proof: Let m be the order of x. Then by the last Proposition m divides
n. So n = mr for some integer r. Now x^m = e by Theorem 1.10. Hence
x^n = x^{mr} = (x^m)^r (by Proposition (3.43)) = e^r = e. ∎
Applying this corollary to a particular group, we get the following
interesting theorem due to Fermat.
2.15 Theorem: If p is a prime number then for every integer x, x^p ≡ x
(modulo p).
Proof: We consider two cases, x is divisible by p and x is not divisible
by p. If x is divisible by p then so is x^p. Therefore both x^p and x are
congruent to 0 modulo p and the result holds trivially. Alternatively we
can factorise x^p − x as x(x^{p−1} − 1) and see directly that it is divisible by p
(since p divides the first factor, x).
It is the other case that is more interesting. If x is not divisible by p
then the residue class, [x], of x modulo p is non-zero. So [x] ∈ Z_p − {[0]}.
Since p is a prime, by Proposition (1.3), Z_p − {[0]} is a group under mod
p multiplication. The order of this group is obviously p − 1. So by the
corollary above, [x]^{p−1} equals the identity element of this group, which is
simply the residue class [1]. Since [x]^{p−1} equals [x^{p−1}] we have x^{p−1} ≡ 1
(mod p). This means x^{p−1} − 1 is divisible by p. But then so is x(x^{p−1} − 1),
which equals x^p − x. This completes the proof. ∎
For example, if p = 5 and x = 3 then x^5 = 243 and we indeed see that
it is congruent to 3 modulo 5 because 243 − 3 = 240 = 5·48. We remark
that Fermat's theorem can also be proved directly by induction on x.
(In case x is negative, we work with −x.) But as with the solution to the
Tournaments Problem, the inductive proof is more like a verification. The
group theoretic proof given above, on the other hand, really 'explains'
why x^{p−1} − 1 is divisible by p when x is not a multiple of p.
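Both Fermat's theorem and the corollary it rests on are easy to check numerically for small primes. The following sketch relies only on Python's built-in three-argument `pow`; the list of primes and the range of x are arbitrary choices of ours.

```python
primes = [2, 3, 5, 7, 11, 13]

# Theorem 2.15: x^p is congruent to x modulo p, for negative x as well.
fermat_ok = all(pow(x, p, p) == x % p
                for p in primes for x in range(-30, 31))

# Corollary 2.14 in the group of non-zero residues mod p, of order p - 1:
# every non-zero residue class raised to the power p - 1 gives [1].
unit_ok = all(pow(x, p - 1, p) == 1
              for p in primes for x in range(1, p))

# The worked example of the text: p = 5, x = 3.
example = (3 ** 5, (3 ** 5 - 3) // 5)

print(fermat_ok, unit_ok, example)
```

A check of this kind is, like the inductive proof, only a verification; the explanation for why it works remains the group theoretic argument above.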
Lagrange's theorem puts a restriction on the order of a subgroup of
a finite group. For example, if G is a group of order 60 then a subgroup
of G must have for its order 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30 or 60, because
these are the only positive divisors of 60. But it does not say that G will
necessarily contain subgroups of all these orders. In Section 4 we shall
show that the group of all orientation preserving isometries of a regular
tetrahedron has order 12 but contains no subgroup of order 6, even
though 6 is a divisor of 12. Thus in general the converse of Lagrange's
theorem is false. It does hold for abelian groups. However, to prove it
will require considerable machinery from the theory of group actions
(see the Epilogue). For the time being, we prove it for a very special
type of abelian groups, namely, cyclic groups.
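The failure of the converse can be checked by brute force on a small model. The sketch below assumes the standard identification of the rotation group of a regular tetrahedron with the alternating group on its four vertices; that identification, and all function names, are assumptions of this example rather than something established in the text so far.

```python
from itertools import combinations, permutations

def compose(a, b):
    """Composite permutation: apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(a)))

def is_even(p):
    """Parity of a permutation, found by counting inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inversions % 2 == 0

def has_subgroup_of_order(G, k):
    """Brute force search. A finite non-empty subset of a group that is
    closed under the operation is automatically a subgroup."""
    identity = tuple(range(len(G[0])))
    others = [g for g in G if g != identity]
    for rest in combinations(others, k - 1):
        S = set(rest) | {identity}
        if all(compose(a, b) in S for a in S for b in S):
            return True
    return False

A4 = [p for p in permutations(range(4)) if is_even(p)]
assert len(A4) == 12
print({k: has_subgroup_of_order(A4, k) for k in (1, 2, 3, 4, 6, 12)})
# {1: True, 2: True, 3: True, 4: True, 6: False, 12: True}
```

The search is feasible only because the group is tiny; it is meant as a sanity check on the claim, not as a substitute for the argument promised in Section 4.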
2.16 Theorem: Let G be a cyclic group of order n and let m be a positive
integer dividing n. Then G contains a unique subgroup of order m.
Proof: By Theorem 1.10, G consists of $e, x, x^2, x^3, \ldots, x^{n-1}$ where x is any
generator of G. Now let $d = n/m$. Then d is a positive integer. Let
$H = (x^d)$, the subgroup generated by $x^d$. Clearly
$$H = \{e, x^d, x^{2d}, \ldots, x^{(m-1)d}\}$$
because the next power of $x^d$ is $x^{md}$ which equals $x^n$ (= e). Hence
$o(H) = m$. So G has a subgroup of order m. To show it is unique, suppose
K is another subgroup of G of order m. Let r be the smallest positive
integer such that $x^r \in K$. We claim $x^r$ generates K. Let $y \in K$. Then
$y = x^s$ for some $0 \le s \le n - 1$. By the euclidean algorithm, write s as
$ru + v$ where u, v are integers with $0 \le v < r$. Then
$$x^v = x^{s-ru} = x^s (x^r)^{-u} = y\,(x^r)^{-u} \in K$$
since $y \in K$ and $x^r \in K$. So v must be 0, for otherwise v will be a positive
integer less than r such that $x^v \in K$, contradicting the definition of r.
Therefore, $y = x^{ru} = (x^r)^u$, showing that $y \in (x^r)$. Hence $K = (x^r)$. Now
we claim that r = d, which would of course prove that H = K since
$H = (x^d)$ and $K = (x^r)$. Since K is a cyclic group of order m generated by
$x^r$, $(x^r)^m = e$, that is, $x^{rm} = e$. By Theorem 1.10, this means that rm is a
multiple of n. Set $rm = nq$ where q is a positive integer. But $n = md$. So we
get $r = dq$. In particular $r \ge d$, and $x^r = (x^d)^q \in H$, so that $K = (x^r)$ is
contained in H. As K and H both have m elements, K = H. Then $x^d \in K$ and
so $r \le d$ by the definition of r. (Alternatively, $r \ge d$ can be seen by
counting: if $d > r$, then $n = md > mr$ and the $m + 1$ distinct elements
$e, x^r, x^{2r}, \ldots, x^{mr}$ would all be in K, contradicting that the order of K
is m.) Thus r = d and H = K. So there is one
and only one subgroup of order m in G. ∎
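The statement can be checked on the additive group of residues modulo n, which is cyclic of order n. In the sketch below the function names are hypothetical, and the enumeration relies on the fact, shown in the proof above, that every subgroup of a cyclic group is generated by a single element.

```python
def generated_subgroup(n: int, a: int) -> frozenset:
    """Cyclic subgroup of Z_n (under addition mod n) generated by a."""
    elems, cur = {0}, a % n
    while cur != 0:
        elems.add(cur)
        cur = (cur + a) % n
    return frozenset(elems)

def subgroups_by_order(n: int) -> dict:
    """Map each order to the set of distinct subgroups of Z_n of that order."""
    found = {}
    for a in range(n):
        h = generated_subgroup(n, a)
        found.setdefault(len(h), set()).add(h)
    return found

n = 12
table = subgroups_by_order(n)
divisors = [m for m in range(1, n + 1) if n % m == 0]
assert sorted(table) == divisors                   # orders are exactly the divisors of n
assert all(len(table[m]) == 1 for m in divisors)   # one subgroup for each divisor
# The subgroup of order m is the one generated by d = n/m, as in the proof.
assert all(table[m] == {generated_subgroup(n, n // m)} for m in divisors)
```

In additive notation the element $x^d$ of the proof becomes the residue class of d, which is why `generated_subgroup(n, n // m)` plays the role of H.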
As the last application of Lagrange's theorem in this section, we apply
it to a subgroup whose significance may not be obvious at first sight.
However, later we shall relate it to the conjugacy classes defined in the
last section and as a consequence will get some interesting results about
groups whose orders are powers of a prime.
2.17 Definition: Let G be a group. For an element $x \in G$, the set
$\{g \in G : gx = xg\}$ is called the normaliser of x in G and is denoted by $N_x$
or by $N(x)$.
In other words, the normaliser of an element is the set of all elements
which commute with it. Obviously $N_x = G$ if and only if x is in the centre
of G. In an abelian group, $N_x = G$ for all elements x in G. In a non-abelian
group, $N_x$ is the measure of the extent to which x commutes with
other elements of G. Note that $N_x$ always contains the subgroup $(x)$ generated
by x, because all powers of x commute with x. In Q, we see that
$N_i = \{1, -1, i, -i\}$ because none of the remaining elements commute
with i, but $N_{-1} = Q$ because $-1$ is in the centre of Q.
Note that for any two elements x, y of a group G, $x \in N_y$ if and only
if $y \in N_x$. This simple fact is often useful.
2.18 Proposition: Let G be a group and $x \in G$. Then $N_x$ is a subgroup
of G. Further $(x) \subset N_x$ and $(x)$ is a normal subgroup of $N_x$ (although not
necessarily of G).
Proof: Let $g, h \in N_x$. Then $gx = xg$ and $hx = xh$. So $ghx = gxh
= xgh$ showing that $gh \in N_x$. Hence $N_x$ is closed under multiplication.
Similarly if $g \in N_x$, then $g^{-1}x = g^{-1}xgg^{-1} = g^{-1}gxg^{-1} = xg^{-1}$ showing
$g^{-1} \in N_x$. Thus $N_x$ is also closed under inversion, proving that $N_x$ is a
subgroup of G. We already noted $(x) \subset N_x$. Hence $(x)$ is a subgroup of $N_x$.
To show that $(x)$ is normal in $N_x$, let $y \in (x)$ and $g \in N_x$. Then $y = x^r$
for some integer r. Now $g \in N_x$ implies $x \in N_g$. But $N_g$ is a subgroup by
what we proved just now. So $x^r \in N_g$, that is, $y \in N_g$. This means $gy = yg$
or $gyg^{-1} = y \in (x)$. Thus $(x)$ is normal in $N_x$. ∎
The preceding proposition justifies the name 'normaliser'. Let us
now see how big the normaliser of an element x is. The condition $gx = xg$
is equivalent to $gxg^{-1} = x$. This means that the conjugate of x by g equals
x (see Definition 1.14). If this happens for a large number of elements g
then $N_x$ would be large and we would expect x to have very few conjugates.
(In the extreme case where x commutes with every element of G,
$N_x = G$ and x has only one conjugate, namely x itself.) Thus we are led
to believe that the larger the normaliser is, the smaller would be the number
of conjugates. This guess turns out to be quite correct as seen from
the following theorem.
2.19 Theorem: Let x be an element of a finite group G. Then the number
of (distinct) conjugates of x is the index of the normaliser $N_x$ in G,
and hence equals $o(G)/o(N_x)$.
Proof: Let $\mathcal{L}$ be the set of all left cosets of $N_x$ in G and let S be the set
of all elements in G which are conjugate to x. (In other words, S is the
conjugacy class of x.) We have to prove that the sets $\mathcal{L}$ and S have the
same cardinality. We do so by establishing a bijection between them.
Define $f : \mathcal{L} \to S$ by $f(gN_x) = gxg^{-1}$. We must first verify that f is
well-defined because it has been defined in terms of a representative of a coset.
Suppose the same coset $gN_x$ is represented as $hN_x$ where $h \in G$. Then
by Proposition 2.2, $g^{-1}h \in N_x$, which means $g^{-1}hx = xg^{-1}h$. This gives
$hx = gxg^{-1}h$ and finally, $gxg^{-1} = hxh^{-1}$. Thus $f(gN_x) = f(hN_x)$ and so
the function f is well-defined. The same argument can be read backwards
to show that if $f(gN_x) = f(hN_x)$ then $gN_x = hN_x$. This means f is
one-to-one. Also f is obviously onto because every element of S is of the form
$gxg^{-1}$ for some $g \in G$ and hence equals $f(gN_x)$ for some $g \in G$. Therefore
we have that f is a bijection and hence $|\mathcal{L}| = |S|$. But $|\mathcal{L}|$ is, by very
definition, the index of $N_x$ in G. By Lagrange's theorem $|\mathcal{L}|$ equals
$o(G)/o(N_x)$ and the proof is complete. ∎
Note that in the proof above, finiteness of the group G was used only
in the last step. Thus, even for an infinite group, if an element has only
finitely many conjugates then its normaliser will be of finite index.
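The counting statement of Theorem 2.19 can be verified directly in a small non-abelian group. The sketch below uses the symmetric group on four symbols, with permutations stored as tuples; this choice of group and all helper names are assumptions made for illustration.

```python
from itertools import permutations

def compose(a, b):
    """Composite permutation: apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(a)))

def inverse(a):
    """Inverse permutation."""
    inv = [0] * len(a)
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)

def normaliser(G, x):
    """All g in G with gx = xg."""
    return [g for g in G if compose(g, x) == compose(x, g)]

def conjugacy_class(G, x):
    """All distinct elements g x g^{-1} with g in G."""
    return {compose(compose(g, x), inverse(g)) for g in G}

G = list(permutations(range(4)))          # the symmetric group on 4 symbols, order 24
for x in G:
    # number of conjugates times order of the normaliser equals o(G)
    assert len(conjugacy_class(G, x)) * len(normaliser(G, x)) == len(G)

x = (1, 0, 2, 3)                          # a transposition
print(len(conjugacy_class(G, x)), len(normaliser(G, x)))   # 6 4
```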
As a corollary, we have
2.20 Corollary: Let G be a finite group. If two elements $x, y \in G$ are
conjugates of each other then $o(N_x) = o(N_y)$.
Proof: Let S be the conjugacy class containing x and y. By the theorem
above,
$$o(N_x) = \frac{o(G)}{|S|} = o(N_y).$$ ∎
What is really interesting is that the cardinality of each conjugacy class
of a finite group G is a divisor of the order of G, even though these conjugacy
classes are generally not subgroups of G. When $o(G)$ is such that its
divisors are only of a certain type, this information is very valuable. To
illustrate its power, we prove the following result.
2.21 Theorem: Every group whose order is the power of a prime has a
non-trivial centre.
Proof: Let G be a finite group with $o(G) = p^m$ for some prime number p
and some positive integer m. (If m = 0, the group is trivial and so is the
result.) Let Z be the centre of G. Then Z is a subgroup of G (cf. Exercise
(1.26)). We have to show $o(Z) > 1$. Let $r = o(Z)$.
Now consider the class decomposition of the set G, that is, the
decomposition of G into mutually disjoint conjugacy classes, say, $S_1,
S_2, \ldots, S_k$. By what we said above, for every i, $|S_i|$ is a divisor of $o(G)$
which equals $p^m$. But the only possible divisors of $p^m$ are powers of p (with
$p^0 = 1$ included as a possible divisor). Note also that a conjugacy class $S_i$
is a singleton set, say $\{x\}$, if and only if $x \in Z$ (see Exercise (1.26) again).
Therefore, there are exactly r singleton conjugacy classes. Without loss of
generality we may suppose that they are $S_1, S_2, \ldots, S_r$ where $r \le k$. Now,
for $i > r$, $|S_i| = p^{m_i}$ for some $m_i > 0$. Hence p divides $|S_i|$. Now, by
Proposition (2.2.3) we have
$$p^m = |G| = \sum_{i=1}^{k} |S_i| = \sum_{i=1}^{r} |S_i| + \sum_{i>r} |S_i| = r + \sum_{i>r} |S_i|.$$
Hence $r = p^m - \sum_{i>r} |S_i|$. Now $p \mid p^m$ (since $m > 0$). Also $p \mid |S_i|$ for all
$i > r$, as we saw just now. So p divides r. Since r is positive, r is at least
equal to p. This shows $r > 1$ as was to be proved. ∎
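The class decomposition used in this proof can be computed explicitly for a group of order $2^3$. The sketch below models the group of symmetries of a square as permutations of its four vertices; the vertex labelling, the two chosen generators and the helper names are assumptions of this example.

```python
def compose(a, b):
    """Composite permutation: apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(a)))

def inverse(a):
    """Inverse permutation."""
    inv = [0] * len(a)
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)

def generate(gens):
    """Close a set of permutations under composition. In a finite group
    this closure is the subgroup generated by the given elements."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

# Symmetries of a square with vertices 0, 1, 2, 3 in cyclic order.
rotation = (1, 2, 3, 0)
reflection = (0, 3, 2, 1)      # fixes vertices 0 and 2, swaps 1 and 3
G = generate([rotation, reflection])

classes = {frozenset(compose(compose(g, x), inverse(g)) for g in G) for x in G}
centre = {x for x in G if all(compose(g, x) == compose(x, g) for g in G)}

sizes = sorted(len(c) for c in classes)
print(len(G), sizes, len(centre))        # 8 [1, 1, 2, 2, 2] 2
assert sum(sizes) == len(G)              # the class decomposition counts every element once
assert sizes.count(1) == len(centre)     # singleton classes correspond to central elements
assert len(centre) % 2 == 0              # p = 2 divides o(Z)
```

Here r = 2, and the remaining classes all have size 2, in agreement with the divisibility argument.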
As an application of this result, we want to show that every group of
order $p^2$ where p is a prime is abelian. First we need subsidiary results. The
first is interesting by itself.
2.22 Proposition: Every group of prime order is cyclic.
Proof: Let G be a group of order p where p is a prime. Let x be any
element of G other than the identity. Let $H = (x)$, the subgroup generated
by x. Then $o(H)$ divides $o(G)$. But the only possible divisors of p are 1 and
p. Since $e, x \in H$, $o(H) > 1$. So $o(H) = p$. In other words H = G. Thus
$G = (x)$, showing that G is cyclic. ∎
2.23 Proposition: If G is a group with centre Z and the quotient group
G/Z is cyclic, then G is abelian.
Proof: The quotient group, as we recall, consists of the cosets of Z in G
and the operation is that of coset multiplication. (By Exercise (1.26), Z is
normal in G and so the quotient group G/Z is defined.) We are given that
G/Z is cyclic. Let a coset xZ be a generator for G/Z, for some $x \in G$.
Then for every $y \in G$, $yZ = (xZ)^r = x^rZ$ for some integer r. But this
means that $y = x^r z$ for some $z \in Z$. Now let a, b be any two elements of
G. By what we showed just now, there exist integers r, s and elements z, w
in Z such that $a = x^r z$ and $b = x^s w$. Now, $ab = x^r z x^s w = x^r x^s zw$ because
$zx^s = x^s z$ (since z is in the centre). So $ab = x^{r+s}zw$. But $ba = x^s w x^r z =
x^s x^r wz = x^{r+s}zw$ (since w also commutes with all elements of G). So ab = ba.
That is, G is abelian. ∎
There is a sort of vacuity about the way the last result is formulated.
Its conclusion says that G is abelian which means Z = G and the
quotient group G/Z is trivial. Thus the proposition in effect says that if Z is
the centre of a group and G/Z is cyclic then G/Z must be trivial. In other
words, the conclusion shows that the hypothesis can hold only vacuously
except in one case. It would be better to word the result as 'The quotient
group G/Z cannot be cyclic except in the trivial case when Z = G.' But the
formulation given above is fairly standard.
It should not, however, be supposed that the preceding proposition is
useless. It so happens sometimes that we do not know beforehand that G
is abelian. Still we may be in a position to prove that G/Z is cyclic. Then
it follows from the last proposition that G is abelian. This is exactly what
we do in the proof of the following theorem.
2.24 Theorem: Every group of order $p^2$ where p is a prime, is abelian.
Proof: Let G be a group of order $p^2$. Let Z be the centre of G. Then
$o(Z) = 1$, p or $p^2$, since these are the only divisors of $p^2$. The first
possibility is ruled out by Theorem (2.21). In the second case, $o(G/Z) = o(G)/o(Z)
= p^2/p = p$. Hence by Proposition (2.22), G/Z is cyclic. So by the last
proposition, G is abelian. If $o(Z) = p^2$, then of course Z = G and so G is
abelian. In any case the assertion holds. ∎
Thus we see that all groups of orders 4, 9, 25, ... are abelian. It is not
true that every group of order $p^3$ is necessarily abelian. We already had
two non-abelian groups of order 8 ($= 2^3$), the quaternion group Q and the
dihedral group $D_4$, that is, the group of isometries of a square. Of course,
by Theorem (2.21) every group of order $p^3$ has a non-trivial centre and
this fact helps in determining the structure of such a group. The converse
of Lagrange's theorem can be proved for groups whose orders are prime
powers. A proof can be based upon induction. A critical step is the follow-
ing result.
2.25 Proposition: Every group of order $p^m$ where p is a prime and m is a
positive integer, contains a normal subgroup of order p.
Proof: Let G be a group with $o(G) = p^m$. Let Z be the centre of G. Then
by Theorem (2.21), $o(Z) = p^r$ for some r with $1 \le r \le m$. Note that every
subgroup of Z is normal in G by Exercise (1.26) (i). So it suffices to show that
Z contains some subgroup of order p. Let x be any element of Z other than the
identity. Let $H = (x)$. Then $o(H) = p^k$ for some k with $1 \le k \le r$. Now H
is a cyclic group and p is a divisor of $o(H)$. So by Theorem (2.16) H contains
a subgroup, say N, of order p. N is also a subgroup of Z and, as
noted earlier, this completes the proof. ∎
Having proved that G has a normal subgroup N of order p, we now
consider the quotient group G/N, whose order is $o(G)/o(N) = p^{m-1}$. So by
induction, we can assume that G/N contains a subgroup of any order
which is a divisor of $p^{m-1}$. If we could relate subgroups of G/N to those of
G, we would get the existence of a desired subgroup of G. Such a relationship
indeed exists and will be studied in the next section, where we shall
prove the converse of Lagrange's theorem for groups whose orders are
prime powers.
Before closing this section we remark that the concepts 'conjugacy
class' and 'normaliser' are particular cases of what are called 'orbits' and
'isotropy subgroup' respectively, of a group action. When we shall study
group actions*, we shall again visit them, to motivate the general concepts,
if for nothing else. These more general concepts will also enable us to prove
deeper results about the structure of finite groups, one of which will be
the converse of Lagrange's theorem for abelian groups.
*See the Epilogue.
Exercises
2.1 Let G be the group $\mathbb{R}^3$, under coordinatewise addition. Let H be
the set $\{(x, y, z) \in \mathbb{R}^3 : 2x + 3y - 4z = 0\}$. Prove that H is a
subgroup of G. Find the various cosets of H in G.
2.2 With G as above let $K = \{(x, y, z) \in \mathbb{R}^3 : 2x + 3y - 4z = 0,
x - y + 2z = 0\}$. Prove that K is also a subgroup of G and find
its cosets.
2.3 Prove that the cosets of the subgroup $S^1$ of the group of non-zero
complex numbers are circles centred at the origin. Describe the
coset multiplication geometrically. (Hint: Use polar form of complex
numbers.)
2.4 Let H, K be subgroups of a group G. Prove that the intersection
of a left coset of H and a left coset of K is either empty or a left coset
of $H \cap K$.
2.5 Prove that the intersection of two subgroups of finite indices is a
subgroup of finite index.
2.6 Let G be a group. The function $f : G \to G$ defined by $f(x) = x^{-1}$ is
called inversion. Prove that f is a bijection. Give an alternate
formulation of the proof of Proposition (2.4) by applying Exercise
(3.2.21) to f.
2.7 Let G be a group and H, L subgroups of G with $L \subset H$. If L is of
finite index in H and H is of finite index in G then prove that L
is of finite index in G, and moreover, $(G : L) = (G : H)(H : L)$. Use
this result to give an alternate solution to Exercise (2.5).
2.8 Show by an example that a subgroup of index 3 need not be
normal.
2.9 Let $\mathcal{L}$ be the set of all left cosets of a subgroup H of a group G.
Suppose we attempt to define a binary operation on $\mathcal{L}$ by
$(xH) \cdot (yH) = (xy)H$ for $x, y \in G$. Prove that this is well-defined (that
is, independent of the left coset representatives x and y) if and
only if H is a normal subgroup of G.
2.10 Let $(G, \cdot)$ be a group with H a normal subgroup. When we defined
the operation $\cdot$ on the entire power set P(G), we remarked that
the singleton set $\{e\}$ is the identity for this operation. When we take
the restriction of this operation to the set of all cosets of H in
G (and call it coset multiplication), we find that H is the identity
for it. Why does this not contradict the uniqueness of identities,
established in Proposition (3.4.5)?
2.11 Prove Theorem (2.15) by induction on x.
2.12 If p is a prime and m is any positive integer, prove that for all
integers x, $x^{p^m} \equiv x \pmod{p}$.
2.13 Let G be a group and C(G) its commutator subgroup (Exercise
(1.27)). Prove that the quotient group G/C(G) is abelian. Prove
further that if N is any normal subgroup of G such that G/N is
abelian then $C(G) \subset N$. (Thus C(G) is the smallest normal subgroup
whose quotient group is abelian. The quotient group G/C(G) is
sometimes called the abelianised group G.)
2.14 Let H be a subgroup of index 2 in a group G. Suppose every element
of $G - H$ is of order 2. Prove that H is abelian. (Hint: For $h \in H$
and $x \in G - H$, prove that $hxh = x$.) Show by an example that
G need not be abelian.
2.15 Suppose two elements x and y of a group G are conjugates of each
other. Prove that their normalisers $N_x$ and $N_y$ are also conjugates
of each other. (This gives an alternate proof of Corollary (2.20).)
2.16 Prove that the centre of a non-abelian group of order $p^3$ where p is
a prime is of order p.
2.17 Let H, K be subgroups of a finite group G. Prove that the cardinality
of the set HK equals $o(H)\,o(K)/o(H \cap K)$. (Hint: Consider
the function $f : H \times K \to HK$ defined by $f(x, y) = xy$ for $x \in H$,
$y \in K$. Prove that every element of HK has exactly $o(H \cap K)$
preimages and apply Proposition (2.18).)
2.18 Suppose G is a group of order pq where p, q are primes with $p > q$
and that G contains a subgroup H of order p. Prove H is normal in
G. (Hint: Apply the last exercise together with Corollary (1.13).)
2.19 Prove that the centre of a group of order pq where p, q are primes
is either trivial or else the whole group.
2.20 Suppose H and K are normal subgroups of a finite group G. Suppose
$o(H)$ and $o(K)$ are relatively prime, that is, have no common divisor
except 1. Prove that every element of H commutes with every
element of K. (Hint: Show that their commutator is in $H \cap K$
which must be only $\{e\}$.)
2.21 Suppose G is a finite group and S is a subset of G with $|S| > \frac{3}{4}\,o(G)$.
Prove that for any four elements $x_1, x_2, x_3, x_4 \in G$, there exists
some $g \in G$ such that $x_i g \in S$ for i = 1, 2, 3, 4. (Hint: Apply
the principle of inclusion and exclusion to the four left translates
of S by $x_1^{-1}, x_2^{-1}, x_3^{-1}, x_4^{-1}$.)
2.22 Suppose G is a finite group in which the number of solutions to the
equation $x^2 = e$ exceeds $(3/4)\,o(G)$. Prove that every element of
G is a solution of this equation and hence that G is abelian. (Hint:
Apply the last exercise to the set of all solutions of this equation.)
2.23 Show by an example that the last result need not hold if the
number of solutions to $x^2 = e$ equals $(3/4)\,o(G)$. (Hint: Consider
the dihedral group of a suitable order.)
2.24 If G is a group of order $p^m$, where p is a prime and m a positive
integer and N is a normal subgroup of G with $o(N) > 1$, prove that
$o(N \cap Z) > 1$. (Hint: Argue as in the proof of Theorem (2.21),
after noting that all conjugates of elements of N are also in N and
thereby obtaining a decomposition of N into conjugacy classes.)
2.25 Suppose G is a finite abelian group with n elements $x_1, x_2, \ldots, x_n$.
Suppose G has exactly one element, say y, of order 2. Prove that
the product $x_1 x_2 \cdots x_n$ equals y. (Hint: cf. the hint to Exercise
(1.16).)
*2.26 Using the last exercise prove that for every prime p, $(p - 1)! + 1$
is divisible by p. (This is called Wilson's theorem.)
2.27 Suppose G is a group which has no nontrivial, proper subgroup
(i.e., a subgroup other than $\{e\}$ and G). Prove that G is a finite,
cyclic group of prime order.
2.28 Suppose G is a group of order 2n with trivial centre and an
element x of order n. Prove that x cannot commute with any
element of G except its own powers. (Hint: Consider $o(N_x)$.)
2.29 Analogous to the normaliser of an element, we can define the
normaliser of a subgroup which is sometimes helpful in checking
normality. Let G be a group and H a subgroup. Let
$$N(H) = \{g \in G : gHg^{-1} = H\}.$$
N(H) is called the normaliser of H in G. Prove that:
(i) N(H) is a subgroup of G and contains H.
(ii) N(H) is the largest subgroup of G containing H as a normal
subgroup.
(iii) H is normal in G if and only if N(H) = G.
(iv) the number of distinct conjugates of H in G equals the index
of N(H) in G. (Hint: The argument resembles the proof of
Theorem (2.19).)
Notes and Guide to Literature
The material in this section is basic for any study of structure of
groups, especially finite groups. The theorems of Fermat and Wilson are
illustrations of applications of group theory to number theory. Apparently,
Wilson's theorem was already proved by Leibnitz.
Fermat (1601-1665), although a lawyer by profession, made significant
contributions to number theory. But he is most famous for something
which he claimed he could do but never actually published. The equation
$x^2 + y^2 = z^2$ has many solutions for positive integers x, y, z (for example,
x = 3, y = 4, z = 5). Fermat claimed that for no integer $n > 2$, the
equation $x^n + y^n = z^n$ has solutions for positive integers x, y, z but did not
write the proof down. Although this has been proved for many values of
n, whether it holds for all $n > 2$ is still not known. Equations of this type
are called diophantine equations. Fermat's conjecture is popularly called
Fermat's last theorem. Although its solution is of no particular significance
from the point of view of applications, no other conjecture in mathematics
has engaged so many mathematicians for so long.
It is customary to define a quotient group in the manner of Exercise
(2.9). We have preferred to define it through a binary operation defined
for all subsets of the group, because this binary operation figures elsewhere
too (for example, Exercise (2.17)).
3. Group Homomorphisms
Our discussion of groups has, so far, involved only one group at a time
except when we constructed new groups from old ones. In this section we
study a particular relationship which exists between certain pairs of groups.
Whenever it does, the properties of one of the groups often throw some
light on those of the other.
This relationship is known as a homomorphism. We already defined
it (see Definition (14.15)) for general algebraic structures. But it is only
when the algebraic structure is sufficiently strong that non-trivial results
can be proved about homomorphisms. A group structure is such a structure.
So we recapitulate the definition of a group homomorphism. In the next
chapter we shall study an algebraic structure which is even stronger than
a group structure. So results proved here will have their analogues in the
next chapter.
3.1 Definition: Let G, H be groups. A function $f : G \to H$ is called a
group homomorphism (or simply a homomorphism) if for all $x, y \in G$, we
have $f(x \cdot y) = f(x) \cdot f(y)$ (where the same symbol is used to denote the
group operation in both G and H). A homomorphism which is injective,
surjective or bijective is respectively called a monomorphism, an epimorphism,
and an isomorphism. In the last case, we say G is isomorphic to H
and denote this symbolically by $G \cong H$. An isomorphism of a group onto
itself is called an automorphism.
A comment is in order about the terminology. The common part
'morphism' comes from 'morphos' which means structure. All the four
concepts defined above deal with the structures of the two groups. The
prefixes 'homo' and 'iso' both mean 'similar' but while 'homo' indicates
likeness, 'iso' stresses equality. The prefixes 'mono', 'epi' and 'auto'
mean respectively 'one', 'on' and 'self'. Two groups (or more generally
any two algebraic structures) which are isomorphic to each other are
indistinguishable from each other. They are like replicas of the same
object. A homomorphism is not such a strong concept as an isomorphism.
It simply means that the function $f : G \to H$ is compatible
with the group multiplication. Given two elements x, y in G, whether we
first multiply them (in G) and apply f to the product xy or whether we
first apply f to the elements x, y separately and multiply their images
f(x), f(y) in H, we get the same result. This condition can be graphically
represented by requiring that the following diagram is commutative
$$\begin{array}{ccc}
G \times G & \longrightarrow & G \\
\downarrow {\scriptstyle f \times f} & & \downarrow {\scriptstyle f} \\
H \times H & \longrightarrow & H
\end{array}$$
where the horizontal arrows represent the binary operations on G and H
respectively and $f \times f : G \times G \to H \times H$ is the function which sends
$(x, y) \in G \times G$ to $(f(x), f(y))$ in $H \times H$.
Of course, the concept of a homomorphism makes sense even if G, H
are algebraic structures of a weaker type than a group, say semigroups or
monoids. But, for a semigroup or a monoid homomorphism, there is not
much we can infer from the definition. For example, let $\mathbb{R}$ be the monoid
of real numbers with usual multiplication. If we define $f : \mathbb{R} \to \mathbb{R}$ by
$f(x) = 0$ for all $x \in \mathbb{R}$ then f is a monoid homomorphism. But f does not
take the identity element of $\mathbb{R}$ (namely 1) to the identity element of $\mathbb{R}$.
With group homomorphisms things are much better as we now show.
3.2 Proposition: Let $f : G \to H$ be a group homomorphism. Then f maps
the identity element of G to the identity element of H. Moreover, if $x \in G$
then $f(x^{-1})$ is the inverse of $f(x)$ in H. In other words, f is compatible with
or preserves inverses.
Proof: Let $e, e'$ denote, respectively, the identity elements of G, H. We
have to show $f(e) = e'$. Denote $f(e)$ by a. Then $a \cdot e' = a = f(e) = f(e \cdot e) =
f(e) \cdot f(e) = a \cdot a$. So $a \cdot e' = a \cdot a$. Since cancellation laws hold in a group,
it follows that $e' = a$ as was to be shown. As for inverses, let $x \in G$. Denote
$f(x), f(x^{-1})$ by y, z respectively. We have to show $z = y^{-1}$. Now
$$yz = f(x) \cdot f(x^{-1}) = f(x \cdot x^{-1}) = f(e) = e'$$
(as shown just now). So
$$z = e'z = y^{-1}yz = y^{-1}e' = y^{-1}.$$ ∎
The proof of the following proposition is left as an exercise. From now
onwards 'homomorphism' means 'group homomorphism' in this chapter.
3.3 Proposition: The composite of two homomorphisms is a homomorphism.
The composite of two isomorphisms as well as the inverse of an
isomorphism are isomorphisms. If $f : G \to H$ is a homomorphism, f(G) is
a subgroup of H. If f is a monomorphism then $f(G) \cong G$. (For this reason, a
monomorphism is also called an embedding or an imbedding.) ∎
We now give some examples of group homomorphisms.
(1) If G is any group, then the identity function $1_G : G \to G$ is an
isomorphism. Although this, by itself, is hardly profound, this fact combined with
the last proposition means that in any collection of groups, 'being isomorphic
to' is an equivalence relation. A property of groups is said to be invariant
under isomorphisms if whenever a group has it so does every group
isomorphic to it. For example, being an abelian group is such a property. Suppose
$f : G \to H$ is an isomorphism and G is abelian. Given any $x, y \in H$, let
$a = f^{-1}(x)$, $b = f^{-1}(y)$. Then $xy = f(a)f(b) = f(ab) = f(ba) = f(b)f(a) = yx$.
So H is abelian. Similarly being a cyclic group is a property which is
invariant under isomorphisms. The property of being equal to a given
group is not invariant under isomorphisms.
(2) Let G, H be any groups. The constant function from G to H which
sends every element of G to the identity element of H is a homomorphism.
It is called the trivial homomorphism. If H is an abelian group denoted
additively, this homomorphism is also called the zero homomorphism. Note
that the composite of two homomorphisms is trivial if either one of them
is trivial. The converse is false in general.
(3) Let G be an abelian group and n a fixed integer. Define $f : G \to G$
by $f(x) = x^n$. Then f is a homomorphism. Even if G is not abelian f may be
a homomorphism for some values of n. If f is a homomorphism for n = 2
or n = $-1$ then G must be abelian (cf. Exercise (1.5)).
(4) Let $\mathbb{R}$ be the additive group of real numbers. For a fixed $\lambda \in \mathbb{R}$
the function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = \lambda x$ for $x \in \mathbb{R}$ is a homomorphism.
If $\lambda = 0$, this is the trivial homomorphism. If $\lambda \neq 0$, it is an isomorphism.
Similar homomorphisms can be defined for $\mathbb{Q}$ or $\mathbb{C}$ instead of $\mathbb{R}$. We may
also replace $\mathbb{R}$ by a euclidean vector space $\mathbb{R}^n$ and for a fixed $\lambda \in \mathbb{R}$
define $f : \mathbb{R}^n \to \mathbb{R}^n$ by $f(x) = \lambda x$ (that is, the product of the scalar $\lambda$ with
the vector x). Note that in all these examples, the fact that f is a homomorphism
follows from distributivity of multiplication (by $\lambda$) over the addition.
(5) Let $\mathbb{R}$ be the additive group of real numbers and $\mathbb{R}^*$ the multiplicative
group of non-zero real numbers. Define $f : \mathbb{R} \to \mathbb{R}^*$ by $f(x) = e^x$.
Because of properties of the exponential function,
$$f(x + y) = e^{x+y} = e^x \cdot e^y = f(x)\,f(y).$$
Thus f is a homomorphism. Note that f is a monomorphism but not an
epimorphism. Its range is the set $\mathbb{R}^+$ of all positive real numbers. This is
a subgroup of $\mathbb{R}^*$ and if we view f as a homomorphism from $\mathbb{R}$ to $\mathbb{R}^+$, it
is an isomorphism. In other words, the additive group of real numbers is
isomorphic to the multiplicative group of positive real numbers.
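The homomorphism property in Example (5) can be spot-checked numerically. This is a floating-point sketch only: the tolerances, the sampling range and the fixed random seed are assumptions chosen for the illustration.

```python
import math
import random

def f(x: float) -> float:
    """The map x -> e**x from (R, +) to the positive reals under multiplication."""
    return math.exp(x)

random.seed(0)   # fixed seed so that the run is reproducible
for _ in range(1000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    # Homomorphism property, up to floating-point rounding.
    assert math.isclose(f(x + y), f(x) * f(y), rel_tol=1e-9)
    # The natural logarithm undoes f, which reflects that f is one-to-one.
    assert math.isclose(math.log(f(x)), x, rel_tol=1e-9, abs_tol=1e-9)

assert f(0.0) == 1.0   # the identity 0 of (R, +) goes to the identity 1
```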
(6) As the complex version of the last example, let $\mathbb{C}$, $\mathbb{C}^*$ be respectively
the additive group of complex numbers and the multiplicative group
of all non-zero complex numbers. Define $f : \mathbb{C} \to \mathbb{C}^*$ by $f(z) = \exp(2\pi i z)$.
Then f is a homomorphism. Note that f is not a monomorphism because of
the periodicity of the complex exponential function. However, it is an
epimorphism. The reason for putting the coefficient $2\pi i$ in the exponent is
that if we restrict f to $\mathbb{R}$, which is a subgroup of $\mathbb{C}$, we get a homomorphism
of $\mathbb{R}$ onto the circle group $S^1$ which sends a real number x to the complex
number $\exp(2\pi i x) = \cos 2\pi x + i \sin 2\pi x$. Note that the entire subgroup $\mathbb{Z}$
of $\mathbb{R}$ is mapped to the identity element 1 of $S^1$.
(7) Let m be a positive integer and $\mathbb{Z}_m$ the group of all residue classes
modulo m. These residue classes are the equivalence classes of $\mathbb{Z}$ under
the relation of congruency modulo m. We let $p : \mathbb{Z} \to \mathbb{Z}_m$ be the quotient
(or the projection) function (see Definition (3.2.12)). Then p is a
homomorphism, by the very definition of addition in $\mathbb{Z}_m$. More generally, let G
be any group and H any normal subgroup of G. In the last section we
defined G/H as the group of all cosets of H in G, under coset multiplication.
Define $p : G \to G/H$ by $p(x) = xH$ for $x \in G$. The function p assigns to
each element of G, the coset to which it belongs. Then p is a group
homomorphism, called the quotient homomorphism. It is an epimorphism.
This example is very important from a theoretical point of view, because
as we shall see below, essentially every epimorphism is of this type.
(8) Let G be a group. Fix any element $g \in G$. Define $\theta_g : G \to G$ by
$\theta_g(x) = gxg^{-1}$. In other words, the function $\theta_g$ is the conjugation by the
element g. Since
$$\theta_g(xy) = g(xy)g^{-1} = (gxg^{-1})(gyg^{-1}) = \theta_g(x)\,\theta_g(y)$$
we see that $\theta_g$ is a homomorphism. It is easy to show that $\theta_g$ is a bijection
(cf. the proof of Corollary (1.13)). So $\theta_g$ is an automorphism of G. An
automorphism of this type is called an inner automorphism.
(9) Certain homomorphisms associated with products of two (or more)
groups are very important. Let $G_1, G_2, \ldots, G_n$ be any groups. Let G be the
set $G_1 \times G_2 \times \cdots \times G_n$. Then G becomes a group under co-ordinatewise
multiplication. For each i, we define the function
$$\pi_i : G \to G_i, \quad \pi_i(x_1, x_2, \ldots, x_n) = x_i$$
called the projection onto the ith factor. It is easily seen that $\pi_i$ is an
epimorphism. Let us denote the identity elements of all $G_i$'s by e. Then
define $\lambda_i : G_i \to G$ by $\lambda_i(x) = (e, e, \ldots, e, x, e, \ldots, e)$ for $x \in G_i$, where the
x in the n-tuple is in the ith place. Then $\lambda_i$ is a monomorphism. The range
of $\lambda_i$ is the subgroup $\{e\} \times \cdots \times \{e\} \times G_i \times \{e\} \times \cdots \times \{e\}$ of G. Let us call
this subgroup $H_i$. Then by Proposition (3.3), $G_i$ is isomorphic to $H_i$. Thus
we see that every factor group in a product of groups is isomorphic to a
subgroup of the product.
(10) It is very easy to characterise which functions into a product of
groups are homomorphisms. Let the groups $G_i$ and G be as in (9). Let H
be any group and $f : H \to G$ a function. Then for each i = 1, 2, ..., n, the
composite $\pi_i \circ f$ is a function from H to $G_i$. Clearly if f is a homomorphism,
so is $\pi_i \circ f$ for all i = 1, ..., n by Proposition (3.3). Interestingly, the
converse is also true. For if $\pi_i \circ f$ is a homomorphism for all i, then given any
$x, y \in H$ we let f(x) and f(y) be respectively $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$.
Then $\pi_i(f(x)) = x_i$ and $\pi_i(f(y)) = y_i$ for all i = 1, ..., n. Let $f(xy) =
(z_1, z_2, \ldots, z_n)$. Then $z_i = \pi_i(f(xy)) = \pi_i(f(x)) \cdot \pi_i(f(y)) = x_i y_i$ for all
i = 1, ..., n. It follows that $f(xy) = f(x) \cdot f(y)$ since $f(x) \cdot f(y)$ is obtained
by co-ordinatewise multiplication.
(11) Let G, H be any two groups and suppose f, g : G → H are both
homomorphisms. As usual we define the pointwise multiplication of f
and g and get a new function h = fg : G → H defined by h(x) = f(x)g(x)
for x ∈ G. In general, h is not a homomorphism because if x, y ∈ G then
h(xy) = f(xy)g(xy) which equals f(x)f(y)g(x)g(y) because f and g are
homomorphisms but h(x)h(y) equals f(x)g(x)f(y)g(y). However, if H is abelian
then we see that h is a homomorphism. In that case the set of all
homomorphisms from G to H is itself a group under the operation just defined. This
group is denoted by Hom (G, H).
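The remark in (11) can be tested on a small case. The following Python sketch is only an illustration under assumptions of our own: it takes G = H = S_3, with permutations of {0, 1, 2} stored as tuples, and f = g = the identity homomorphism, so that the pointwise product is x ↦ x·x. For the abelian case it takes G = H = Z_6 with f(x) = 2x and g(x) = 3x. All helper names (compose, is_hom and so on) are hypothetical.

```python
from itertools import permutations

# Elements of S_3 as tuples p, where p[i] is the image of i (illustrative encoding).
S3 = list(permutations(range(3)))

def compose(p, q):
    """Return the permutation 'p after q'."""
    return tuple(p[q[i]] for i in range(len(q)))

def is_hom(h, G, op_G, op_H):
    """Check h(xy) = h(x)h(y) for all x, y in G."""
    return all(h(op_G(x, y)) == op_H(h(x), h(y)) for x in G for y in G)

# Non-abelian case: f = g = identity on S_3, so (fg)(x) = x.x
identity_map = lambda x: x
pointwise_square = lambda x: compose(identity_map(x), identity_map(x))
hom_in_S3 = is_hom(pointwise_square, S3, compose, compose)

# Abelian case: H = Z_6 under addition, f(x) = 2x and g(x) = 3x on G = Z_6.
Z6 = range(6)
add6 = lambda a, b: (a + b) % 6
f = lambda x: (2 * x) % 6
g = lambda x: (3 * x) % 6
h = lambda x: add6(f(x), g(x))
hom_in_Z6 = is_hom(h, Z6, add6, add6)

print(hom_in_S3, hom_in_Z6)
```

In S_3 the check fails (two distinct transpositions x, y give (xy)(xy) ≠ e while x·x·y·y = e), whereas in Z_6 it succeeds, in line with the remark about abelian H.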
(12) Let G be the group R² under co-ordinatewise addition. Define
f : G → R by f(x, y) = y − 2x. Then f is an epimorphism. Note that f⁻¹({0})
is the set {(x, y) ∈ R² : y − 2x = 0}. We already considered this set in
Figure 5.4 and saw that it is a subgroup of R².
We could continue the list further. But let us now turn to some basic
theorems about group homomorphisms. The key concept is the following:
3.4 Definition: Let f : G → H be a group homomorphism. Let e be the
identity element of H. Then the set f⁻¹({e}) is called the kernel of f.
For example, the kernel of the trivial homomorphism is the whole group
G. The kernel of a monomorphism consists only of the identity element.
Actually, this property characterises monomorphisms as we shall see shortly.
The kernel of the homomorphism in Example (6) above is the group of
integers. In Example (7), the kernel is precisely the subgroup mZ of Z.
We saw in the last section that the quotient group Z/mZ is precisely the
group Z_m, which is the range of the homomorphism. We now show that
upto an isomorphism the range of every homomorphism is the quotient of
its domain by its kernel. The following result is called the fundamental
theorem about group homomorphisms.
3.5 Theorem: Let K be the kernel of a group homomorphism f : G → H.
Let R be the range of f. Then K is a normal subgroup of G. Moreover,
there exists a unique isomorphism θ : G/K → R such that f = θ ∘ p where
p : G → G/K is the quotient homomorphism. In other words, in the
following diagram
[Diagram: p : G → G/K, f : G → R, and a dotted arrow θ : G/K → R.]
there is a unique way to fill the dotted arrow by a homomorphism, so as
to make it commutative.
Proof: First we show that K is a subgroup of G. If x, y ∈ K then
f(xy) = f(x)f(y) = e·e = e showing that xy ∈ K. Similarly f(x⁻¹) equals
[f(x)]⁻¹ by Proposition (3.2). So if x ∈ K, then f(x⁻¹) = e⁻¹ = e, showing
that x⁻¹ ∈ K. Thus, we see that K is a subgroup of G. As for its
normality, if g ∈ G and x ∈ K then f(gxg⁻¹) = f(g)f(x)f(g⁻¹) = f(g)e[f(g)]⁻¹
= f(g)[f(g)]⁻¹ = e, showing that gxg⁻¹ ∈ K. So K is a normal subgroup
of G, and G/K, the group of cosets, is well-defined.
Now we come to the more interesting part of showing that as a group,
G/K is isomorphic to R, the range of f. (By Proposition (3.3), R is a
subgroup of H and hence itself a group.) We define θ : G/K → R by θ(xK) =
f(x). We must of course verify that this is well-defined. So suppose xK
and yK are the same coset. Then xy⁻¹ ∈ K. So f(xy⁻¹) = e. But this means
that f(x)[f(y)]⁻¹ = e and so f(x) = f(y). Therefore θ(xK) = θ(yK). This
argument can be read backwards and proves that θ is one-to-one. Also
since R is the range of f, every element of R is of the form f(x) for some
x ∈ G and so θ is onto. It only remains to show that θ is a group
homomorphism. Recall that the group structure on G/K was defined by coset
multiplication, (xK)(yK) = (xy)K, for x, y ∈ G. So θ[(xK)(yK)] = θ[(xy)K] =
f(xy) = f(x)f(y) = θ(xK)θ(yK). This proves that θ is a group homomorphism
and hence an isomorphism. The construction of θ is the same as
that of the function g in Proposition (3.2.14). So it follows that θ ∘ p = f
and also that θ is the only isomorphism (in fact the only function) from
G/K to R with this property. ∎
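For a finite group the content of Theorem (3.5) can be verified mechanically. The following Python sketch is an illustration only, under assumptions of our own: G = H = Z_12 under addition and f(x) = 3x (mod 12). It computes the kernel K, the range R and the cosets of K, and then checks that θ(x + K) = f(x) is well-defined, bijective onto R and a homomorphism. The names theta and coset_sum are hypothetical.

```python
n = 12
G = range(n)
f = lambda x: (3 * x) % n          # a homomorphism Z_12 -> Z_12 (illustrative choice)

K = frozenset(x for x in G if f(x) == 0)                   # kernel of f
R = {f(x) for x in G}                                      # range of f
cosets = {frozenset((x + k) % n for k in K) for x in G}    # elements of G/K

# theta(x + K) = f(x) is well-defined iff f is constant on each coset.
well_defined = all(len({f(x) for x in c}) == 1 for c in cosets)
theta = {c: f(min(c)) for c in cosets}

def coset_sum(c, d):
    """Coset addition (x + K) + (y + K) = (x + y) + K, using representatives."""
    return frozenset((min(c) + min(d) + k) % n for k in K)

bijective = len(set(theta.values())) == len(cosets) and set(theta.values()) == R
homomorphic = all(theta[coset_sum(c, d)] == (theta[c] + theta[d]) % n
                  for c in cosets for d in cosets)

print(sorted(K), sorted(R), well_defined, bijective, homomorphic)
```

Here K = {0, 4, 8} and R = {0, 3, 6, 9}, so the four cosets of K correspond to the four elements of the range.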
This theorem is of profound significance. Given a homomorphism
f : G → H, obviously it is only the range of f that is directly related to G
and f. The entire group H plays no role. The theorem says that, upto an
isomorphism, this range R may be treated as a quotient group of G and
when so treated, the function f coincides with the quotient function from
G onto this quotient group. This justifies our comment in Example (7)
above, that essentially every epimorphism is a quotient homomorphism.
The isomorphism θ is often called the canonical isomorphism. (‘Canonical’
is a formal synonym for ‘standard’ or ‘simple’.)
Let us apply this theorem to a few examples. In Example (6), we
considered the homomorphism f : R → C defined by f(x) = e^(2πix) for x ∈ R.
The range of f is S¹, the circle group. The kernel is Z. Thus we see that
the quotient group R/Z is isomorphic to the circle group S¹. There is a
vivid, geometric way to visualise f and this isomorphism. Represent R by a
straight line. Then under f, a point at a distance α (say) from the origin goes
to a point on S¹ whose argument is 2πα. (Here α can also take negative
values.) Intuitively, f wraps the real line around the circle an infinite
number of times, with every interval of length 1 getting wrapped once
around the full circle S¹. The coset α + Z consists of all real numbers
whose distance from α is an integer. All these points are taken to the
same point of S¹. The construction of S¹ from R can be visualised as
follows. Cut the line at all points marked by integers as in (a). Glue the
end points of each segment of unit length. This results in an infinite family
of circles as in (b). Now ‘fuse together’ all these circles into a single circle
so that points which differ by an integer originally in R are fused together.
Then every coset of Z in R corresponds to a point of S¹ and vice versa.
[Figure 5.5: R/Z isomorphic to S¹. Panels (a) and (b).]
The second example is also geometric. In Example (12) above we
defined f : R² → R by f(x, y) = y − 2x. Then f is an epimorphism with
kernel L = {(x, y) ∈ R² : y − 2x = 0}. So by Theorem (3.5) the quotient
group R²/L is isomorphic to R. To see this isomorphism visually, draw
a line M perpendicular to L. Then the various cosets of L in R² are lines
perpendicular to M. Smash the plane to the line M by perpendicular
projection. Then every line perpendicular to M gets smashed to a point of M,
namely the point of its intersection with M. The line L itself is smashed to
the origin, while any line parallel to L at a perpendicular distance α is
smashed to the point on M at a distance α from the origin, where the
distance α may be positive or negative (Figure 5.6). Thus if we regard each
coset of L as a single point we get M as R²/L. M is, of course, isomorphic
to R as a group with a point at a distance α from 0 corresponding to
the real number α. (Instead of M we could take any line through 0 other
than L, but the ‘smashing’ will still have to be done parallel to L.)
[Figure 5.6: R²/L isomorphic to R. Panels (a) and (b).]
As the third example, we justify the term ‘quotient group’. As used in
algebra, ‘quotient’ is the opposite of ‘product’. If 12 is the product of 3
and 4 then 3 is the quotient of 12 by 4 and 4 is the quotient of 12 by 3.
Let G, H be two groups and G × H their product group. Denote the
projection on G by π (see Example (9) above). Then π(x, y) = x for all
x ∈ G, y ∈ H. π is a homomorphism, in fact an epimorphism. The kernel
K (say) of π is the subgroup {e} × H. Clearly K is isomorphic to H. (We
simply define f : H → K by f(y) = (e, y) for y ∈ H.) So upto an
isomorphism, we identify K with H. By Theorem (3.5), the quotient group
(G × H)/H is isomorphic to G. Similarly (G × H)/G is isomorphic to H.
This justifies the term ‘quotient’. (Another possible justification is simply
that for finite groups, the order of the quotient group equals the quotient
of the order of the group by that of the subgroup.)
Although every factor group of a product appears, upto an isomorphism,
as its quotient, the converse is not true. If G is a group and K is a normal
subgroup then we can form the quotient group G/K. But G need not
always be isomorphic to the product group K × G/K. As a simple
counter-example, let G be the quaternion group Q and let K be its centre. Then
K is abelian. Also G/K is the Klein group which is also abelian. Hence the
product K × G/K is abelian and so cannot be isomorphic to Q, which is
non-abelian.
In the course of the proof of Theorem (3.5), we have proved the
following result which is worth isolating.
3.6 Proposition: Let f : G → H be a group homomorphism with kernel
K and range R. Let y_0 ∈ H. Then the equation f(x) = y_0 has a solution
for x (in G) if and only if y_0 ∈ R. If x_0 is any solution, then every
solution is of the form x_0 k for some k ∈ K. Also f is one-to-one if and only if
K = {e}.
Proof: The first assertion is obvious. For the second, we have to show
that if f(x_0) = y_0 then f⁻¹({y_0}) is the coset x_0 K. This can be proved directly.
However, using the notation of Theorem (3.5), we have p(x_0) = x_0 K ∈ G/K
and p⁻¹({x_0 K}) = x_0 K ⊂ G. Now f = θ ∘ p gives f⁻¹({y_0}) = p⁻¹(θ⁻¹({y_0})) =
p⁻¹({x_0 K}) since θ(x_0 K) = f(x_0) = y_0 and θ is one-to-one. So we get
f⁻¹({y_0}) = x_0 K as was to be shown. The last statement is now obvious,
because if K = {e} then for every y_0 ∈ H, the equation f(x) = y_0 has at
most one solution. The converse was already proved. ∎
Two special cases of this result must be already familiar to the reader.
Consider a system of m linear equations in n real variables, say,
    a_11 x_1 + a_12 x_2 + ... + a_1n x_n = b_1
    a_21 x_1 + a_22 x_2 + ... + a_2n x_n = b_2
    ...
    a_m1 x_1 + a_m2 x_2 + ... + a_mn x_n = b_m        (1)
In the next chapter we shall see that the general solution of this system is
obtained by finding any one particular solution and adding it to the general
solution of the corresponding homogeneous system (which is obtained by
replacing all the b_i's by 0). We can interpret this in terms of the last
proposition as follows:
Consider the groups R^n and R^m, each under coordinatewise addition.
Denote their points by vectors. Thus x = (x_1, ..., x_n) ∈ R^n and
y = (y_1, ..., y_m) ∈ R^m. Define f : R^n → R^m by f(x_1, ..., x_n) = y = (y_1, ..., y_m)
where for each i = 1, 2, ..., m, y_i = a_i1 x_1 + a_i2 x_2 + ... + a_in x_n. Then f is
easily seen to be a group homomorphism (cf. Example (10) above). Let
b = (b_1, ..., b_m) ∈ R^m.
Then solving (1) amounts to solving f(x) = b. The corresponding
homogeneous system is given by f(x) = 0. The kernel of f is precisely the set of
all solutions of the homogeneous system. By adding these solutions to any
one particular solution of (1), we get a coset of the kernel of f and this
coset consists precisely of all solutions of (1). For n = 2, m = 1, the
solutions of (1) form a line in the plane, parallel to the line which represents
the kernel of the homomorphism. The result then geometrically means that
in order to know a line completely, it suffices to know any one point on it
and any one line parallel to it.
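The statement that the solution set is a coset of the kernel can be seen by direct enumeration if the real numbers are replaced by a finite group. The following Python sketch makes that substitution as an assumption of its own: it works in Z_5 × Z_5 with the analogue of the map f(x, y) = y − 2x of Example (12), taken modulo 5, and the right hand side y_0 = 3 is an arbitrary choice. The names add, kernel and coset are hypothetical.

```python
from itertools import product

p = 5
G = list(product(range(p), repeat=2))      # Z_5 x Z_5 under coordinatewise addition

def f(v):
    """Analogue of f(x, y) = y - 2x from Example (12), taken modulo 5."""
    x, y = v
    return (y - 2 * x) % p

def add(u, v):
    """Coordinatewise addition modulo 5."""
    return ((u[0] + v[0]) % p, (u[1] + v[1]) % p)

kernel = {v for v in G if f(v) == 0}        # solutions of the homogeneous equation
y0 = 3
solutions = {v for v in G if f(v) == y0}    # all solutions of f(v) = y0
x0 = min(solutions)                         # any one particular solution
coset = {add(x0, k) for k in kernel}        # x0 + K

print(sorted(solutions))
print(solutions == coset, len(kernel))
```

Every solution is the particular solution x0 plus an element of the kernel, which is the finite counterpart of 'a line is known from one point on it and one line parallel to it'.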
Similarly it can be shown that the general solution of a linear differential
equation is obtained by finding the general solution of the corresponding
homogeneous equation and adding to it any one particular solution of the
original equation. This is a special case of Proposition (3.6).
Let us now see what a homomorphism between two groups does to
their subgroups. Let f : G → H be a homomorphism with kernel K and
range R. Let A, B be subgroups of G, H respectively. If A ⊂ K, then
f(A) = {e}, the trivial subgroup of H. More generally, if A is any subgroup
of G then the part of it which is common with K (that is, the intersection
A ∩ K) will be taken to {e} by f. Intuitively, whatever happens inside K
will be masked by f. Similarly for any subgroup B of H, only the part it
has in common with R (that is, R ∩ B) will have any bearing on f. (If f
is an epimorphism then H = R and so the entire subgroup B will be
governed by f.) It follows that in order to have a non-trivial relationship
between subgroups of G and those of H, we must require the former to
contain K and the latter to be contained in R. With this restriction, we do
have an important relationship proved below.
3.7 Theorem: Let f : G → H be a group homomorphism with kernel
K and range R. Let 𝒢 be the collection of subgroups of G containing
K and ℋ be the collection of subgroups of H contained in R. Then
there is a one-to-one correspondence between 𝒢 and ℋ. This
correspondence preserves subgroups, normality and quotient groups. (The meaning
of this statement will be clear in the course of the proof.)
Proof: We leave it as an exercise to prove that if A is any subgroup
of G (not necessarily containing K) then f(A) is a subgroup of H. Also
obviously f(A) ⊂ f(G) = R. So f(A) ∈ ℋ. Similarly if B is any subgroup
of H, then f⁻¹(B) ∈ 𝒢. Note that in general, f⁻¹(f(A)) may be larger than
A and f(f⁻¹(B)) may be smaller than B. We claim that if A ∈ 𝒢 and B ∈ ℋ
then this cannot happen.
Let A ∈ 𝒢. That is, K ⊂ A and A is a subgroup of G. Certainly,
f⁻¹(f(A)) contains A. To show that equality holds, suppose x ∈ f⁻¹(f(A)).
Then f(x) ∈ f(A), which means f(x) = f(y) for some y ∈ A. But then
xy⁻¹ ∈ K as we saw in the proof of Theorem (3.5). Since K ⊂ A, xy⁻¹ ∈ A.
But y ∈ A. Hence x = (xy⁻¹)·y ∈ A. Thus f⁻¹(f(A)) ⊂ A as was to be
shown. Similarly let B ∈ ℋ. Then B ⊂ R and the equality f(f⁻¹(B)) = B
follows by a purely set theoretic argument.
So, if we define θ : 𝒢 → ℋ by θ(A) = f(A) and φ : ℋ → 𝒢 by φ(B) = f⁻¹(B),
then these functions are inverses of each other. Hence each is a one-to-one
correspondence, that is, a bijection. This proves the first assertion of the
theorem. To say that the correspondence between 𝒢 and ℋ preserves
subgroups means that whenever A_1, A_2 ∈ 𝒢 with A_1 ⊂ A_2 then f(A_1) ⊂ f(A_2)
and similarly whenever B_1, B_2 ∈ ℋ with B_1 ⊂ B_2, then f⁻¹(B_1) ⊂ f⁻¹(B_2).
This is a purely set-theoretic result. Suppose further A_1 is normal in A_2.
Then we claim f(A_1) is normal in f(A_2). Let x ∈ A_1, y ∈ A_2. We have to
show f(y)f(x)[f(y)]⁻¹ ∈ f(A_1). This follows since yxy⁻¹ ∈ A_1 and
f(y)f(x)[f(y)]⁻¹ = f(yxy⁻¹). Similarly if B_1 is normal in B_2 then f⁻¹(B_1) is
normal in f⁻¹(B_2). This is what is meant by preservation of normality.
Finally, we show that if A_1 is normal in A_2 for A_1, A_2 ∈ 𝒢 then the quotient
group A_2/A_1 is isomorphic to the quotient group f(A_2)/f(A_1). We could do
this directly. But an application of Theorem (3.5) would save a lot of
work. Simply define g : A_2 → f(A_2)/f(A_1) by g(x) = f(x)f(A_1), that is, g(x)
is that coset of f(A_1) in f(A_2) which contains f(x). That g is a homomorphism
follows from the fact that f is a homomorphism (and the definition of
coset multiplication). g is also obviously onto. The kernel of g is the set
{x ∈ A_2 : g(x) = the identity element of f(A_2)/f(A_1)}. But the identity element
of f(A_2)/f(A_1) is the coset f(A_1). Now, f(x)f(A_1) = f(A_1) if and only if
f(x) ∈ f(A_1). This is equivalent to saying that x ∈ f⁻¹(f(A_1)). But
f⁻¹(f(A_1)) = A_1, as we saw above (since A_1 contains K). Thus the kernel of
g is precisely A_1. So by Theorem (3.5), the quotient group A_2/A_1 is
isomorphic to the range of g, which equals f(A_2)/f(A_1). Conversely if B_1 is normal
in B_2, then f⁻¹(B_1) is normal in f⁻¹(B_2). So, by what we proved just now,
f⁻¹(B_2)/f⁻¹(B_1) is isomorphic to f(f⁻¹(B_2))/f(f⁻¹(B_1)) which is nothing
but B_2/B_1. We have now completely established the theorem. ∎
The following corollary is often useful.
3.8 Corollary: Let K be a normal subgroup of a group G. Then there is
a one-to-one correspondence between subgroups of the quotient group
G/K and subgroups of G containing K. This correspondence preserves
subgroups, normality and quotient groups.
Proof: Let p : G → G/K be the quotient homomorphism. Then p has kernel
K and is onto. So the result follows directly from the last theorem. ∎
As a concrete example, let G = Q, the group of quaternions, and let K
be its centre {1, −1}. Then the quotient group G/K is the Klein group
H = {e, a, b, c} where a = {±i}, b = {±j}, c = {±k} and e = {±1}. H
has five subgroups, {e}, {e, a}, {e, b}, {e, c} and H itself. Under the
correspondence in the corollary the corresponding subgroups of Q are K, (i), (j),
(k) and Q respectively.
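The correspondence of Corollary (3.8) can also be checked by enumeration in a small abelian case. The following Python sketch is an illustration only and uses an example of our own, not the quaternion group above: G = Z_12 and K = {0, 6}. It lists the subgroups of G containing K, lists the subgroups of G/K by brute force, and checks that taking images under the quotient map p matches the two lists. All function names are hypothetical.

```python
from itertools import combinations

n = 12
K = frozenset({0, 6})                       # a (normal) subgroup of Z_12

def generated(x):
    """Cyclic subgroup of Z_12 generated by x; every subgroup of Z_12 arises this way."""
    return frozenset((x * t) % n for t in range(n))

subgroups_G = {generated(x) for x in range(n)}
containing_K = {A for A in subgroups_G if K <= A}

def p(x):
    """Quotient map Z_12 -> Z_12/K, with each coset stored as a frozenset."""
    return frozenset((x + k) % n for k in K)

quotient = {p(x) for x in range(n)}

def coset_add(c, d):
    """Add two cosets using their smallest representatives."""
    return p(min(c) + min(d))

def is_subgroup(B):
    """A non-empty finite subset closed under the operation is a subgroup."""
    return bool(B) and all(coset_add(c, d) in B for c in B for d in B)

subgroups_Q = {frozenset(B)
               for r in range(1, len(quotient) + 1)
               for B in combinations(quotient, r)
               if is_subgroup(frozenset(B))}

# Image of each subgroup A containing K under the quotient map.
image = {frozenset(p(a) for a in A) for A in containing_K}

print(len(containing_K), len(subgroups_Q), image == subgroups_Q)
```

Both collections have four members here (subgroups of orders 2, 4, 6, 12 in Z_12 against those of orders 1, 2, 3, 6 in the quotient).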
As an application of this theorem, we prove the converse of Lagrange's
theorem for groups whose orders are powers of a prime, as promised in
the last section.
3.9 Theorem: Let G be a group with o(G) = p^m where p is a prime
number. Then for every r with 0 ≤ r ≤ m, G contains a subgroup of order p^r.
Proof: We argue by induction on m. If m = 0, G is the trivial group and
the assertion holds. Let m > 0. The case r = 0 is also trivial. So assume
r > 0. By Proposition (2.25), G contains a normal subgroup K of order p.
Then the quotient group G/K has order p^(m−1). By induction hypothesis, G/K
has a subgroup B of order p^(r−1). Let f : G → G/K be the quotient
homomorphism. By Theorem (3.7), f⁻¹(B) is a subgroup of G containing K and
f⁻¹(B)/K is isomorphic to B. Since o(B) = o(f⁻¹(B))/o(K) and o(K) = p,
it follows that o(f⁻¹(B)) = p^r. So G contains a subgroup of order p^r, for all
0 ≤ r ≤ m. This completes the inductive step and the proof. ∎
As remarked earlier, two groups which are isomorphic to each other
may be regarded as identical from the point of view of group theory. One
of the central problems in group theory is to show that any two groups
with certain common properties are isomorphic to each other, or that
every group with certain properties must be isomorphic to some standard
familiar group. This is known as the problem of group classification. The
most trivial result of this type is that any two trivial groups (that is, groups
having only one element each) are isomorphic to each other. This means
that upto isomorphism there is only one trivial group. That is why we
speak of ‘the trivial group’ (or ‘the zero group’ when we are dealing with
abelian groups).
As a slightly less trivial example, we show that when we form the
permutation group of a set, it is only the cardinality of that set that
matters.
3.10 Proposition: Suppose f : X → Y is a bijection of sets. Then the
permutation groups S(X) and S(Y) of X, Y respectively are isomorphic to
each other. In other words, two sets of the same cardinality have
isomorphic permutation groups.
Proof: Recall that S(X) consists of all bijections from X to itself. Given
any such bijection σ : X → X, f ∘ σ ∘ f⁻¹ is a bijection of Y onto itself
(because f and f⁻¹ are bijections). So we have a function λ : S(X) → S(Y)
defined by λ(σ) = f ∘ σ ∘ f⁻¹ for σ ∈ S(X). (λ is like conjugation by f, but
note that f is not an element of S(X) in general.) Then λ is itself a bijection
because the function μ : S(Y) → S(X) defined by μ(τ) = f⁻¹ ∘ τ ∘ f is clearly
the inverse function of λ. Also if σ, ψ ∈ S(X) then λ(σ ∘ ψ) = f ∘ (σ ∘ ψ) ∘ f⁻¹ =
(f ∘ σ) ∘ (ψ ∘ f⁻¹) = (f ∘ σ ∘ f⁻¹) ∘ (f ∘ ψ ∘ f⁻¹) = λ(σ) ∘ λ(ψ), showing λ is a
group homomorphism. Hence λ is a group isomorphism. That is, S(X) and
S(Y) are isomorphic to each other. ∎
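The map σ ↦ f ∘ σ ∘ f⁻¹ of this proof is easy to try out on three-element sets. The following Python sketch is only an illustration; the sets X and Y, the bijection f and the dictionary encoding of permutations are assumptions made here, and the names lam, compose and all_perms are hypothetical.

```python
from itertools import permutations

X = ['a', 'b', 'c']
Y = [1, 2, 3]
f = dict(zip(X, Y))                    # a bijection X -> Y (illustrative choice)
f_inv = {y: x for x, y in f.items()}

def all_perms(S):
    """All bijections of S onto itself, each stored as a dict."""
    return [dict(zip(S, image)) for image in permutations(S)]

def compose(s, t):
    """Return 's after t' for permutations stored as dicts."""
    return {x: s[t[x]] for x in t}

def lam(sigma):
    """lambda(sigma) = f o sigma o f^-1, a permutation of Y."""
    return {y: f[sigma[f_inv[y]]] for y in Y}

SX, SY = all_perms(X), all_perms(Y)
images = [lam(s) for s in SX]

# Onto S(Y) with equal finite sizes, hence a bijection.
is_bijection = all(t in images for t in SY) and len(images) == len(SY)
is_hom = all(lam(compose(s, t)) == compose(lam(s), lam(t)) for s in SX for t in SX)

print(is_bijection, is_hom)
```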
Because of this proposition, when we consider the permutation group
of a set with n elements, it does not matter which particular elements we
choose. The most standard choice is to take them as the positive integers
from 1 to n. The group of permutations of the set {1, 2, ..., n} is called the
symmetric group of degree n and is denoted by S_n. We already know that
o(S_n) = n!. We shall consider these groups in the next section.
The problem of classification of groups is fairly involved, even when
restricted to finite groups. In the last chapter we saw that any two finite
Boolean algebras with the same number of elements are isomorphic to
each other. This is far from the case for groups. The groups S_3 and Z_6 both
have order 6. But Z_6 is abelian while S_3 is not (see Section 1). So S_3 cannot
be isomorphic to Z_6. Thus equality of orders is necessary but far from
sufficient for two groups to be isomorphic. Later (see the Epilogue), we
shall obtain a criterion for two finite abelian groups of the same order to
be isomorphic. As a forerunner, we have the following simple result which
completely settles the case of cyclic groups.
3.11 Proposition: Any two finite cyclic groups of the same order are
isomorphic to each other. Also any two infinite cyclic groups are isomor-
phic to each other.
Proof: Let G be a cyclic group generated by an element x of it. Let Z be
the additive group of integers. Define f : Z → G by f(n) = x^n for n ∈ Z.
Because of the laws of indices (Proposition (3.48)), f is a homomorphism.
Also f is onto because every element of G is some power of x. So by
Theorem (3.5) again, G is isomorphic to Z/K where K is the kernel of f.
The proof now reduces to the computation of K.
If G is finite of order m (say) then by Theorem (1.10), x^n = e if and
only if n is a multiple of m. So in this case K is simply the subgroup mZ
of Z. But we have already seen that Z/mZ is the group Z_m of residues
modulo m. Thus every cyclic group of order m is isomorphic to Z_m. Hence
any two such groups are isomorphic to each other.
Suppose now that G is infinite. Then the kernel K contains only 0.
Otherwise, by Proposition (1.7), K would equal mZ for some positive
integer m. But then Z/K would be finite and isomorphic to G,
contradicting that G is infinite. So K = {0}, that is, f is a monomorphism. Since f is
also an epimorphism, it is in fact an isomorphism. Thus every infinite
cyclic group is isomorphic to Z and so any two such groups are mutually
isomorphic. ∎
In the course of the proof we have done something more than the
proposition asserts. Let 𝒞 be the class of all cyclic groups. We have
shown that if G is any member of 𝒞 then G is isomorphic either to Z or
to Z_m for some unique positive integer m. In other words the collection
of the groups {Z, Z_1, Z_2, ...} is a complete set of representatives (upto
isomorphism) for the class 𝒞. More generally, a collection ℛ of groups
is called a complete set of representatives for a class 𝒞 of groups if every
member of 𝒞 is isomorphic to one and only one member of ℛ. It follows,
in particular, that no two distinct members of ℛ are isomorphic to each
other, which means, intuitively, that ℛ is free of redundancy.
The problem of finding a complete set of representatives for a given
class of groups is an important one and varies in difficulty depending
upon the class. We have just solved it for the class of all cyclic groups.
The classes that are interesting include those which consist of all groups
of a given order. For a positive integer n, let 𝒞_n be the class of all groups
of order n. If n is a prime then a complete set of representatives for 𝒞_n
is given by the following proposition.
3.12 Proposition: Every group of a prime order p is isomorphic to Z_p.
That is, {Z_p} is a complete set of representatives for the class 𝒞_p of all
groups of order p.
Proof: Let G be a group of order p. Let x be any element of G other
than e. Let H = (x). Then o(H) > 1. But by Lagrange’s theorem o(H) is a
divisor of p. So o(H) = p. That is, G = H = (x), proving that G is a cyclic
group of order p. By the last proposition, G is isomorphic to Z_p. ∎
When n is not a prime but a product of two primes, a complete set of
representatives for 𝒞_n will be obtained later. In general, however, there is
no way that will work for all n. Considerable ingenuity is needed even
for particular values of n. By way of illustration, we settle the cases n = 4
and n = 6 (the first two cases not covered by the last proposition).
3.13 Proposition: For 𝒞_4, the Klein group and the group Z_4 form a
complete set of representatives. For 𝒞_6, the groups S_3 and Z_6 form a
complete set.
Proof: Let G be a group of order 4. If G contains an element of order 4
then G is cyclic, and isomorphic to Z_4. Suppose G contains no element of
order 4. Let the elements of G be e, x, y, z with e as the identity. Now
the order of x is either 1, 2 or 4 by Proposition (2.14). If o(x) = 1 then
x = e, which is not the case.
Also o(x) ≠ 4 by assumption. So o(x) = 2. Similarly y, z are of order 2
each. Now xy ≠ e, x or y as this would respectively imply y = x, y = e,
or x = e, none of which is true. So xy = z. Similarly yx = z. (Or we could
use Theorem (2.24) by which G is abelian.) Similar reasoning gives
yz = zy = x and zx = xz = y. If we define f : G → H (where H is the
Klein group) by f(e) = e, f(x) = a, f(y) = b and f(z) = c then f is an
isomorphism. Thus we have shown that every member of 𝒞_4 is isomorphic
either to Z_4 or to the Klein group. Since these two groups are not themselves
isomorphic to each other, they constitute a complete set of representatives
for 𝒞_4.
Next, suppose G is a group of order 6. We consider the cases G abelian
and G non-abelian separately. First suppose G is abelian. We show that
G has an element of order 6 which would imply G is isomorphic to Z_6. By
Exercise (1.16), G has at least one element, say x, of order 2. We claim G
has no other element of order 2. Because if y were such an element then
{e, x, y, xy} would be a subgroup of G. This contradicts Lagrange’s theorem
since 4 does not divide 6. Now let z be any element of G other than e and
x. By what we have shown, the order of z is 6 or 3. In the first case, we
are done. In the second, we leave it to the reader to show that zx has order
6. In any case G has an element of order 6 as was to be shown.
Finally, suppose G is a non-abelian group of order 6. Then G has no
elements of order 6. So every element other than the identity has order 2
or 3. We claim that there is at least one element of order 3. If not, then
the square of every element of G will be e and hence G would be abelian
(cf. Exercise (1.5)). So G has at least one element, say y, of order 3. Let
H = (y). Then H has index 2 in G and hence is normal in G, by Theorem
(2.9). Now let x be an element in G of order 2 (which exists by Exercise
(1.16)). Then certainly x ∉ H. Now x⁻¹ = x and since H is normal in G,
xyx⁻¹ = xyx ∈ H. So xyx = e or xyx = y or xyx = y². The first equality
implies xy = x which is impossible. The second equality, xyx = y, implies
xy = yx (since x² = e). But this means y commutes with x. Then xy would
have order 6, as can be easily shown, implying that G is cyclic. So the
second possibility is also ruled out, leaving xyx = y², or xy = y²x. The six
distinct elements of G are now seen to be e, y, y², x, xy (= y²x) and
yx (= xy²). With this knowledge about the structure of G, it is now easy
to establish an isomorphism f : G → S_3. We think of S_3 as the group of
isometries of an equilateral triangle (Figure 5.1). We let f(y) be the
clockwise rotation through 120° and f(x) be the reflection in any one of the
altitudes. It is easy to extend f to the remaining elements of G so as to
get an isomorphism of G onto S_3.
Summing up, every group of order 6 is isomorphic to Z_6 or to S_3
depending upon whether it is abelian or not. So Z_6 and S_3 constitute a
complete set of representatives of 𝒞_6. ∎
We remark that with the machinery to be developed later in this book,
the proposition can be proved much more easily. The direct proof above
is meant to illustrate the type of reasoning needed. It also gives a good
opportunity to review most of the basic concepts introduced in the last two
sections.
The problem of group classification seeks to find, for a given group G,
some known group H which is isomorphic to G. A related problem is that
of group representation. Here we are interested in finding a homomorphism
f from a given group G to a familiar group H. This homomorphism must
of course be non-trivial so as to cast some light on the properties of G in
terms of those of H. If f can be chosen to be a monomorphism then G is
isomorphic to a subgroup of H, namely f(G). When this is the case, we
know G upto an isomorphism, as soon as we know H and all its subgroups.
Thus the problem of group representation is to express an ‘abstract’
group as isomorphic to a subgroup of some ‘concrete’ group. This problem
is surprisingly simple as shown by the following theorem due to Cayley.
3.14 Theorem: Every group is isomorphic to a subgroup of a group of
permutations.
Proof: Let G be a given group. We consider the permutation group of the
set G itself. We denote it as usual by S(G). It consists of all bijections from
G to itself (they need not be isomorphisms) and the group operation is the
composition of functions. For each g ∈ G define a function T_g : G → G by
T_g(x) = gx for x ∈ G. In other words, T_g is nothing but the left
translation by the element g (cf. Exercise (14.2)). Because of the cancellation
laws in groups, T_g is one-to-one. Also if y ∈ G then T_g(g⁻¹y) = gg⁻¹y = y
showing that T_g is also onto. So T_g is a bijection of the set G onto itself
and hence is an element of S(G). (Note that we are not claiming, nor is it
necessarily true, that T_g is a group homomorphism. Actually, T_g will be a
group homomorphism iff g = e.) Now define f : G → S(G) by f(g) = T_g.
We assert that f is a monomorphism. First, let g, h ∈ G. We claim
T_gh = T_g ∘ T_h. Since both sides are functions from G into itself, it suffices
to show that for every x ∈ G, T_gh(x) = (T_g ∘ T_h)(x). But the left hand
side is, by definition, (gh)x, which by associativity equals g(hx), which is
precisely the right hand side. So T_gh = T_g ∘ T_h, that is f(gh) = f(g)f(h)
for all g, h ∈ G. This shows that f is a group homomorphism. As for its
kernel, suppose g ∈ G and f(g) is the identity element of S(G). This means
T_g is the identity function on G, that is, T_g(x) = x for all x ∈ G. But then
gx = x = ex for all x, giving g = e by the cancellation law. So the kernel
of f consists only of the identity element. Therefore f is a monomorphism,
showing that G is isomorphic to f(G) which is a subgroup of S(G). ∎
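The construction in this proof is entirely explicit and can be carried out for any finite group given by its multiplication. The following Python sketch is an illustration only; it assumes the Klein group realised as Z_2 × Z_2, and it stores each left translation T_g as the tuple of its values in a fixed listing of G. The names T, mul and compose are hypothetical.

```python
from itertools import product

G = list(product(range(2), repeat=2))      # Klein group as Z_2 x Z_2 (illustrative)
e = (0, 0)

def mul(g, h):
    """Group operation: coordinatewise addition modulo 2."""
    return ((g[0] + h[0]) % 2, (g[1] + h[1]) % 2)

def T(g):
    """Left translation by g, stored as a tuple of images in the fixed order of G."""
    return tuple(mul(g, x) for x in G)

def compose(s, t):
    """'s after t' for maps stored as tuples indexed like G."""
    return tuple(s[G.index(t[i])] for i in range(len(G)))

f = {g: T(g) for g in G}                   # f(g) = T_g

each_is_bijection = all(sorted(f[g]) == sorted(G) for g in G)
is_hom = all(f[mul(g, h)] == compose(f[g], f[h]) for g in G for h in G)
is_injective = len(set(f.values())) == len(G)

# T_g itself preserves the group operation only when g is the identity.
translations_that_are_homs = [
    g for g in G
    if all(mul(g, mul(x, y)) == mul(mul(g, x), mul(g, y)) for x in G for y in G)
]

print(each_is_bijection, is_hom, is_injective, translations_that_are_homs)
```

The last line of output lists only the identity, in agreement with the parenthetical remark in the proof.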
Cayley’s theorem is of the same spirit as the Stone representation
theorem for Boolean algebras mentioned in the last chapter. Both assert
that certain abstract algebraic structures are isomorphic to some very
concrete structures of the same category. There is, however, considerable
difference in the degree of their depth. While Cayley’s theorem is easy to
prove, the Stone representation theorem, even in the finite case, required
some work and in the infinite case (which we did not prove) requires the
use of the axiom of choice.
For finite groups, Cayley’s theorem can be given a still more concrete
form.
3.15 Theorem: Every finite group of order n is isomorphic to a subgroup
of S_n.
Proof: Let G be a group of order n. By the last theorem G is isomorphic
to a subgroup of S(G), the group of permutations of the set G. But by
Proposition (3.10), the permutation group of any set with n elements is
isomorphic to S(G). We take this set to be {1, 2, ..., n}. Then S(G) is
isomorphic to S_n. Consequently G is isomorphic to a subgroup of S_n. ∎
3.16 Corollary: For a positive integer n, there are only finitely many
distinct groups of order n upto isomorphism.
Proof: The group S_n is finite (having order n!) and so has only finitely
many subgroups. Since every group of order n is isomorphic to at least one
of these, it follows that upto isomorphism, there can be only finitely many
distinct groups of order n. ∎
Worded differently the corollary says that the class 𝒞_n of all groups of
order n has a finite, complete set of representatives. It should be noted that
the corollary does not, by itself, give any particular set of representatives
for 𝒞_n. To find such a set, we would have to find all subgroups of S_n having
order n, and then decide which of them are isomorphic to each other. By
picking exactly one from each type, we would get a complete set of
representatives for the class 𝒞_n. This procedure is theoretically possible but
highly inefficient. First, even for relatively small values of n, n! is very large.
So it is impracticable to list down all n-subsets of S_n and find out which of
them are subgroups. Even if we manage to do that, we still have to decide
which of these subgroups are isomorphic to each other. This is another
difficult problem. Given two groups, say G and H, of order n each, there is
no easy way known to tell whether G and H are isomorphic to each other.
Of course, we could consider all possible bijections from G onto H and
examine them one by one to see if at least one of them is an isomorphism.
This is theoretically a finitistic process. But once again, because there are
n! bijections, this is not a practical way. True, we could weed out quite
a few bijections right away, for example those which do not take the
identity element of G to that of H. But even then we are still left with a
large number of them. It is often easier to show that G and H are not
isomorphic to each other. We simply take a suitable property (invariant
under isomorphisms) which one of the groups has and the other does not.
For example if G is abelian and H is not, we see at once that G cannot be
isomorphic to H, without examining any bijection between them. Similarly
it‘G has an element of a particular order'and H has no such element, it
follows that G is not isomorphic to H. Such a property is said to distinguish
the two groups. Of course a property which serves to distinguish one pair
of groups may not work for some other pair. This is, in fact, the trouble.
Given two groups we may go on comparing their properties. If we come
across a property which distinguishes them we are done. But if not. we
cannot still say that the groups are isomorphic. We may of course be
convinced that they are so. But an actual isomorphism will have to be
constructed, which is again not so easy. Interestingly, for finite abelian
groups, there is indeed a way to tell whether two groups are isomorphic
by simply examining a few of their properties (or invariants as they are
technically called). We shall study them later (see the Epilogue).
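Both strategies described above can be tried on groups of order 4. The sketch below is only an illustration under stated assumptions: groups are given by a binary operation on the labels 0, ..., n−1, the function names `order_profile` and `find_isomorphism` are ours, and the three sample groups (the integers modulo 4, the Klein group via bitwise XOR, and the units modulo 5 relabelled as 0..3) are chosen for convenience. The list of element orders serves as a distinguishing invariant, while the search examines only those bijections that send identity to identity.

```python
from itertools import permutations

def order_profile(n, op, e):
    """Sorted list of the orders of all elements; an isomorphism invariant."""
    profile = []
    for g in range(n):
        k, x = 1, g
        while x != e:
            x, k = op(x, g), k + 1
        profile.append(k)
    return sorted(profile)

def find_isomorphism(n, op_g, e_g, op_h, e_h):
    """Examine the (n-1)! bijections taking e_g to e_h; return one that
    preserves the operation, or None if there is none."""
    rest_g = [x for x in range(n) if x != e_g]
    rest_h = [x for x in range(n) if x != e_h]
    for image in permutations(rest_h):
        phi = {e_g: e_h, **dict(zip(rest_g, image))}
        if all(phi[op_g(x, y)] == op_h(phi[x], phi[y])
               for x in range(n) for y in range(n)):
            return phi
    return None

z4 = lambda x, y: (x + y) % 4                      # integers modulo 4
klein = lambda x, y: x ^ y                         # Klein group on {0, 1, 2, 3}
units5 = lambda x, y: ((x + 1) * (y + 1)) % 5 - 1  # units modulo 5, label x means x + 1

print(order_profile(4, z4, 0), order_profile(4, klein, 0))
print(find_isomorphism(4, z4, 0, klein, 0))
print(find_isomorphism(4, z4, 0, units5, 0))
```

The differing order profiles already rule out an isomorphism between the first two groups, so the exhaustive search there is redundant; for the first and third groups the profiles agree and the search has to produce an actual isomorphism.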
We conclude this section with another representation of groups which
is sort of dual to that given by Cayley’s theorem. In Cayley’s theorem, we
express a given group, upto isomorphism, as a subgroup of a group of a
specific type, namely, a permutation group. In the representation to be
studied now, we shall express it as a quotient group of a group of a specific
type, called free groups. We begin by defining what they are.
In Section 1, we studied the concept of a subgroup generated by a subset
S of a group G. Suppose this subgroup happens to be G itself. By
Proposition (1.8), every element of G can be expressed in the form
$x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r}$ where $x_i \in S$, $n_i \in \mathbb{Z}$ and r is a non-negative integer. This
expression need not be unique. For example, the quaternion group Q is
generated by the set S = {i, j}. We can write $k \in Q$ as $ij$ or as $i^3 j^3$ and
also as $ji^{-1}$ or as $j^3 i$, and in many other ways. Such multiple expressions
result because of certain equalities which hold in a particular group. (For
example in the quaternion group, $i^4 = e$ and so $i^5 = i$ etc.) It is possible
to conceive of a group which is ‘free’ of such equalities. The only equalities
that hold in such a group are those that are implied by the laws of indices.
By making a convention that no two adjacent $x_i$’s in the expression
$x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r}$ are equal and also by requiring that no index $n_i$ is zero, we then
get a unique expression* for every element of G, as a product of powers of
elements of S. The following definition gives the name for such a group.
3.17 Definition: A group G is said to be free if there exists a subset
$S \subset G$ such that every element of G can be uniquely expressed as
$x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r}$
where $r \geq 0$, $x_i \in S$ and $n_i$ is a non-zero integer for i = 1, ..., r and no
two adjacent $x_i$’s are equal (r = 0 corresponds to the identity element of G).
We also say that the set S freely generates the group G. By convention, we
take the trivial group to be a free group generated by the empty subset.
The simplest non-trivial example of a free group is the group $\mathbb{Z}$ of
integers under addition. Here the set S may be taken as {1} or as {−1}.
None of the other groups we have considered so far is free. Indeed it is easy
to show, by the pigeon hole principle, that no finite group (except the
trivial group) can be free. Moreover, if the set S in the definition above
happens to contain at least two elements, say a and b, then G cannot be
abelian, or else ab and ba would give two different expressions for the same
element of G. Thus no abelian group, other than $\mathbb{Z}$ and the trivial group,
can be free. However, free groups do exist. In fact given any set, we can
construct a group which is freely generated by it as we show now.
3.18 Theorem: Let S be any set. Then there exists a group G having S as
a subset such that S freely generates G.
Proof: We have little choice in defining G. We let G be the set of all
monomials of the form $x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r}$ where $r \geq 0$ (r = 0 gives the null
monomial which is also an element of G, to be denoted by e), $x_i \in S$, the $n_i$
are non-zero integers and no two adjacent $x_i$’s are equal. Here $x_i^{n_i}$ is only
a formal symbol. It is not to be interpreted as the $n_i$th power of $x_i$. Such
an interpretation is meaningless at the moment because as yet no binary
operation has been defined. To avoid confusion, we might denote the
element $x_1^{n_1} \cdots x_r^{n_r}$ as a sequence of ordered pairs $((x_1, n_1), (x_2, n_2), \ldots, (x_r, n_r))$.
In other words, elements of G are all finite sequences taking values in the
set $S \times (\mathbb{Z} - \{0\})$, including the sequence of length 0, that is, the null
*Another equivalent way to ensure uniqueness of expression is to allow the
adjacent $x_i$’s to be equal, but to restrict each exponent $n_i$ to be either 1 or −1 and
require that in case $x_i = x_{i+1}$ then $n_i = n_{i+1}$. This avoids $x$ and $x^{-1}$ appearing adjacently
and thereby ensures uniqueness. With this convention $a^2 b^{-2} a^{-1}$ will have to be written
as $a a b^{-1} b^{-1} a^{-1}$. An expression of the former type is called the normal form while the
latter is called the reduced word form of an element.
sequence. We identify each $x \in S$ with the element $x^1$ (or equivalently,
the sequence ((x, 1))) in the set G. This way S is a subset of G.
We now turn to defining a binary operation on G which will make G
into a group freely generated by S. Let $x = x_1^{n_1} \cdots x_r^{n_r}$ and $y = y_1^{m_1} \cdots y_k^{m_k}$
be two elements of G. We shall define xy by induction on their ‘lengths’,
that is, on the integers r and k. When r = 0, x = e, the null sequence, and
we simply let xy = y for any y. Similarly if k = 0, xy = x for all x.
Suppose r > 0 and k > 0. If $x_r \neq y_1$ we simply juxtapose x and y and
define xy as $x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r} y_1^{m_1} y_2^{m_2} \cdots y_k^{m_k}$ which is indeed an element of G.
However, if $x_r = y_1$ this definition will not be valid because of the restriction
on the form of the elements of G. In this case, we proceed as follows:
Suppose $n_r + m_1 \neq 0$. Then we simply let
$xy = x_1^{n_1} \cdots x_{r-1}^{n_{r-1}} x_r^{n_r + m_1} y_2^{m_2} \cdots y_k^{m_k}$.
This is a sequence of length r + k − 1 and is in G because $x_r \neq y_2$ (since
$x_r = y_1$ and $y_1 \neq y_2$). The only remaining case is when $x_r = y_1$ and
$n_r + m_1 = 0$. (Intuitively in this case, $x_r^{n_r}$ and $y_1^{m_1}$ should cancel each other.)
Formally, in this case we let xy be the product of $x_1^{n_1} \cdots x_{r-1}^{n_{r-1}}$ and $y_2^{m_2} \cdots y_k^{m_k}$
which is a product of sequences of lengths r − 1 and k − 1 and hence is
already supposed to be defined by induction hypothesis. Thus we have
completely defined a binary operation on G. (The definition is not as clumsy
as it appears. The manner in which the product is defined is exactly the
manner in which the product of two such monomials in a group would be
simplified until no further reduction is possible.)
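The inductive definition of the product translates almost line by line into a short program. What follows is a minimal sketch under our own conventions, not a construction taken from the text: an element of G is stored as a tuple of (generator, exponent) pairs, exactly the sequence of ordered pairs mentioned above, and the names `multiply` and `inverse` are ours. The loop replaces the induction: each pass handles the case where the last factor of x and the first factor of y have the same base.

```python
def multiply(x, y):
    """Product of two normal forms. Each argument is a tuple of
    (generator, non-zero exponent) pairs with no two adjacent generators equal."""
    x, y = list(x), list(y)
    while x and y and x[-1][0] == y[0][0]:
        g, n = x.pop()
        _, m = y.pop(0)
        if n + m != 0:
            # The two powers merge into one non-zero power; nothing else can cancel.
            x.append((g, n + m))
            break
        # n + m == 0: the two powers cancel and we continue with the shorter words.
    return tuple(x + y)

def inverse(x):
    """Reverse the order of the factors and negate every exponent."""
    return tuple((g, -n) for g, n in reversed(x))

x = (("a", 2), ("b", -1))             # a^2 b^-1
y = (("b", 1), ("a", -2), ("b", 3))   # b a^-2 b^3
z = (("b", -3), ("a", 1))             # b^-3 a

print(multiply(x, y))
print(multiply(multiply(x, y), z), multiply(x, multiply(y, z)))
print(multiply(x, inverse(x)))
```

In the first product two successive cancellations occur before the words can be juxtaposed, which is the situation the induction hypothesis is meant to cover.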
Now we prove that with this binary operation G is indeed a group. By
very definition, the null sequence e is an identity element. The verification
of associativity is a little tedious and has to be done inductively. Let
$x = x_1^{n_1} \cdots x_r^{n_r}$, $y = y_1^{m_1} \cdots y_k^{m_k}$ and $z = z_1^{p_1} \cdots z_l^{p_l}$ be any three elements of
G. If $x_r \neq y_1$ and $y_k \neq z_1$, both (xy)z and x(yz) come out to be equal to
$x_1^{n_1} \cdots x_r^{n_r} y_1^{m_1} \cdots y_k^{m_k} z_1^{p_1} \cdots z_l^{p_l}$. When either $x_r = y_1$ or $y_k = z_1$ we have to
make several cases. Suppose for example, that $x_r = y_1$ and $y_k \neq z_1$. If
$n_r + m_1 \neq 0$ then both (xy)z and x(yz) equal $x_1^{n_1} \cdots x_r^{n_r + m_1} y_2^{m_2} \cdots y_k^{m_k} z_1^{p_1} \cdots z_l^{p_l}$.
On the other hand if $n_r + m_1 = 0$, then xy is defined inductively as x'y' where
$x' = x_1^{n_1} \cdots x_{r-1}^{n_{r-1}}$, $y' = y_2^{m_2} \cdots y_k^{m_k}$. In this case (xy)z = (x'y')z and x(yz) =
x'(y'z). Since x', y' are shorter than x, y, by induction hypothesis we have
(x'y')z = x'(y'z). Hence (xy)z = x(yz). Other cases are handled similarly,
resorting to induction when needed. This establishes the associativity of
the multiplication. Inverses are very easy to obtain. The inverse of $x_1^{n_1} \cdots x_r^{n_r}$
is simply $x_r^{-n_r} x_{r-1}^{-n_{r-1}} \cdots x_2^{-n_2} x_1^{-n_1}$. Thus we see that G is a group. That it is
freely generated by S follows by the very construction of G. ∎
3.19 Definition: The group G constructed above is called the free group
on the set S and is denoted by F(S).
Although Theorem (3.18) establishes the existence of free groups on any
set, unless the set S is empty or a singleton they are not very familiar as
noted above. Still they have some very interesting theoretical properties.
Just as the subgroups of permutation groups provide a complete picture
of all groups (upto isomorphisms), the quotient groups of free groups give
a complete picture of all groups (upto isomorphisms). In other words,
every group is isomorphic to a quotient group of a free group. A special
case of this result is already familiar to us. Every cyclic group is isomorphic
to a quotient group of $\mathbb{Z}$, which is the free group on a singleton set.
To prove the general result, we first need a property of free groups which
is important in its own right.
3.20 Theorem: Let S be a set and F(S) the free group on S. Let H be
any group and $f: S \to H$ any function. Then f can be uniquely extended
to a group homomorphism from F(S) to H. That is, there exists a unique
homomorphism $\theta: F(S) \to H$ such that $\theta(x) = f(x)$ for all $x \in S$.
Proof: Every element of F(S) is of the form $x_1^{n_1} \cdots x_r^{n_r}$, where $x_i \in S$,
$n_i \in \mathbb{Z} - \{0\}$ with the restrictions noted above. Now $\theta(x_i)$ has to be
$f(x_i)$ which is already defined for i = 1, ..., r. So if θ is to be a group
homomorphism, we have no choice but to define $\theta(x_1^{n_1} \cdots x_r^{n_r})$ as
$[f(x_1)]^{n_1} \cdots [f(x_r)]^{n_r} \in H$. θ(e) is defined as the identity element of
the group H. To show that θ is a homomorphism, let $x = x_1^{n_1} \cdots x_r^{n_r}$ and
$y = y_1^{m_1} \cdots y_k^{m_k} \in F(S)$. To prove that $\theta(xy) = \theta(x)\theta(y)$, we would again
have to proceed by induction on the lengths of x and y, keeping in mind
the definition of xy, just as we did while proving associativity. (This, by
the way, is a common feature in mathematics. Whenever something is
defined inductively, many of its properties, at least those where the definition
is directly involved, have to be proved by an inductive argument.)
So we get a well-defined homomorphism $\theta: F(S) \to H$. By its very construction,
$\theta(x) = f(x)$ for all $x \in S$. Also, because we had no choice in
defining θ, it follows that θ is the only homomorphism extending f. ∎
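The forced definition of the extension can be seen at work in a small case. The sketch below is only an illustration and rests on several assumptions of ours: the target group H is taken to be the permutation group of {0, 1, 2} with permutations stored as tuples, the function f sends a to a 3-cycle and b to a transposition, elements of F(S) are tuples of (generator, exponent) pairs, and the names `extend`, `power`, `compose` and `inverse` are hypothetical helpers.

```python
def compose(p, q):
    """Composite of permutations given as tuples: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def power(p, n):
    """n-th power of a permutation, for any integer n."""
    base = p if n >= 0 else inverse(p)
    result = tuple(range(len(p)))
    for _ in range(abs(n)):
        result = compose(result, base)
    return result

def extend(f, degree):
    """Return theta, the extension of f to words in normal form."""
    def theta(word):
        image = tuple(range(degree))   # theta(e) is the identity of H
        for x, n in word:
            image = compose(image, power(f[x], n))
        return image
    return theta

# Assumed sample data: f(a) is a 3-cycle and f(b) a transposition.
f = {"a": (1, 2, 0), "b": (1, 0, 2)}
theta = extend(f, 3)

x = (("a", 2), ("b", -1))      # a^2 b^-1
y = (("b", 1), ("a", 1))       # b a
xy = (("a", 3),)               # the product xy in normal form
print(compose(theta(x), theta(y)), theta(xy))
```

Here the product xy involves a cancellation followed by a merging of exponents, and the two printed permutations agree, as the homomorphism property requires.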
The result we are after is now an immediate consequence.
3.21 Theorem: Every group is isomorphic to a quotient group of a free
group.
Proof: Let G be a group. Let S be any subset of G which generates G
(not necessarily freely). We may even take S to be G itself. Let F(S) be
the free group on the set S. Elements of F(S) should not be confused with
those of G, even though both can be represented as products of powers of
elements of S. For example, G may be abelian. In that case, for $x_1, x_2 \in S$,
$x_1 x_2$ and $x_2 x_1$ are the same elements in G but as elements of F(S), they
are distinct. (In case the reader finds it necessary to avoid confusion,
elements of S may be denoted by putting bars over them while representing
elements of G. Thus $\bar{x}_1 \bar{x}_2$ may equal $\bar{x}_2 \bar{x}_1$ but $x_1 x_2 \neq x_2 x_1$. But with a little
care, this would not be necessary.) Now define $f: S \to G$ by f(x) = x for $x \in S$
(or as $f(x) = \bar{x}$ if we put bars to denote elements of G). In case S = G, f is
merely the identity function on the set G. By the theorem above there
exists a unique group homomorphism $\theta: F(S) \to G$ which extends f. In
fact θ is given by $\theta(x_1^{n_1} \cdots x_r^{n_r}) = x_1^{n_1} \cdots x_r^{n_r}$ where on the left hand side
$x_1^{n_1} \cdots x_r^{n_r}$ represents an element of F(S) and on the right hand side it
represents an element of G. Since S generates the group G, it follows that
θ is onto. Let K be the kernel of θ. Then by Theorem (3.5), G is isomorphic
to the quotient group F(S)/K. ∎
Although this theorem is of great theoretical significance, it is rarely
used in the form it is stated. Given a known group G, it is not particularly
helpful to express it as the quotient group of F(S) (S being some generating
set for G) by some normal subgroup K. In practice, we often use the
theorem the other way. We specify a free group and some normal subgroup
of it and use this information to define the group G. The theorem above
says that this method does define every group upto isomorphism. As a
concrete example, let us see how the group G of isometries of a regular
pentagon (also called the dihedral group $D_5$, see Section 1) can be defined
by this procedure. We already know the structure of G completely. We
know that G can be generated by two elements (for example, by a rotation
and a flip). So we consider a set S with two elements, say, S = {a, b} and
take the free group F(S) on S. We define $f: S \to G$ by $f(a) = r_1$, $f(b) = f_1$
(see Section 1). Then by Theorem (3.20), there is a unique group
homomorphism $\theta: F(S) \to G$ which extends f. Let K be the kernel of θ. If we
could compute the kernel K in some independent manner, then we could
define G, the dihedral group $D_5$, as the quotient group F(S)/K. Let us
therefore see what K looks like. In G we have the relations $r_1^5 = f_1^2 =$
identity and so $a^5, b^2 \in K$. Moreover, in G the composite $r_1 \circ f_1$ is again a flip,
which is an element of order 2. So $(ab)^2 \in K$. K contains many other elements.
But we claim that the three elements $a^5$, $b^2$ and $(ab)^2$ of F(S) generate K as
a normal subgroup, that is, K is the smallest normal subgroup of F(S) which
contains these three elements. To show this, suppose N is any normal
subgroup of F(S) which contains $a^5$, $b^2$ and abab. We show $K \subset N$. First
note that $baba \in N$, because we can write baba as $b(abab)b^{-1}$ which must
be in N by normality. Next we claim that $(a^2 b)^2 \in N$. We write $(a^2 b)^2$ as
$[a(abab)a^{-1}]\, ab^{-1}ab$. Since $abab \in N$, $a(abab)a^{-1} \in N$. Also $ab^{-1}ab = ab^{-2}a^{-1} \cdot abab$
which is again in N since $b^{-2} \in N$ gives $ab^{-2}a^{-1} \in N$ and $(ab)^2 \in N$.
Putting it together, we get that $(a^2 b)^2 \in N$. Similarly $(ba^2)^2 \in N$. By induction
on n, it can be shown that for every positive integer n, $(a^n b)^2$ and $(ba^n)^2$ are in
N. As for the negative powers, we already saw above that $ab^{-1}ab \in N$. Since
$b^{-2} \in N$, we get $(ab^{-1})^2 = ab^{-1}ab \cdot b^{-2} \in N$. Similarly $(b^{-1}a)^2 \in N$. It is then
seen by induction that $(a^n b^{-1})^2$ and $(b^{-1}a^n)^2$ are in N for all positive integers
n. By taking inverses, $(a^n b)^2$ and $(ba^n)^2$ are in N for all integral values of n.
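The word identities used in this spadework hold in the free group itself and can be checked mechanically by free reduction. The sketch below is an illustration under our own conventions: words are written in the reduced word form, with the capital letters A and B standing (by assumption) for the inverses of a and b, and the helper names `reduce_word` and `inv` are ours.

```python
def reduce_word(word):
    """Freely reduce a word; 'A' and 'B' denote the inverses of 'a' and 'b'."""
    stack = []
    for letter in word:
        if stack and stack[-1] == letter.swapcase():
            stack.pop()              # a letter next to its own inverse cancels
        else:
            stack.append(letter)
    return "".join(stack)

def inv(word):
    """Inverse of a word: reverse it and invert every letter."""
    return word[::-1].swapcase()

# b (abab) b^-1 = baba
print(reduce_word("b" + "abab" + inv("b")))
# [a (abab) a^-1] a b^-1 a b = (a^2 b)^2
print(reduce_word("a" + "abab" + inv("a") + "aBab"))
# a b^-2 a^-1 . abab = a b^-1 a b
print(reduce_word("aBB" + inv("a") + "abab"))
# a b^-1 a b . b^-2 = (a b^-1)^2
print(reduce_word("aBab" + "BB"))
```

Each line rewrites one side of an identity from the argument above; comparing the printed word with the other side confirms that the two sides are the same element of F(S), so no relation of the pentagon group is being used at that stage.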
With this spadework, we are now in a position to show that K, the
kernel of $\theta: F(S) \to G$, is contained in N. Let $x = x_1^{n_1} \cdots x_r^{n_r}$ be a typical
element of K. We are given that θ(x) = e and we have to show that $x \in N$.
The proof will be by induction on r, the length of x. Each $x_i$ is a or b and
since no two adjacent $x_i$’s are equal, they are alternately a’s and b’s.
Suppose r = 1. Then x is either $a^{n_1}$ or $b^{n_1}$ and accordingly θ(x) is either $r_1^{n_1}$
or $f_1^{n_1}$. We are given that θ(x) = e. Recalling that the orders of the elements
$r_1$ and $f_1$ in the group G are 5 and 2 respectively, it follows that in the
first case $n_1$ is a multiple of 5 while in the second case $n_1$ is a multiple of 2.
In either case $x_1^{n_1} \in N$ (since $a^5 \in N$ and $b^2 \in N$). Now suppose r > 1 and
the result holds for all elements of length less than r. First consider the
case r is even. Then either $x_1 = b$ or $x_1 = a$. In the first case,
$x = b^{n_1} a^{n_2} x_3^{n_3} \cdots x_r^{n_r}$, i.e., $x = b^{n_1} y$ where $y = a^{n_2} x_3^{n_3} \cdots x_r^{n_r}$. We now make two subcases.
First let $n_1$ be even. Then θ(x) = e gives $[\theta(b)]^{n_1} \theta(y) = e$ which implies
θ(y) = e since $(f_1)^{n_1} = e$ ($n_1$ being even). So $y \in K$. Since y has length less
than r, $y \in N$ by induction hypothesis. But $b^{n_1} \in N$ since $n_1$ is even and
$b^2 \in N$. Thus $x \in N$ in case $n_1$ is even. Now suppose $n_1$ is odd, say,
$n_1 = 2p + 1$. Then we write $x = b^{2p} z$ where $z = b a^{n_2} x_3^{n_3} \cdots x_r^{n_r}$. We again get
θ(z) = e. Now $z = (ba^{n_2})^2 w$ where $w = a^{-n_2} b^{-1} x_3^{n_3} \cdots x_r^{n_r} = a^{-n_2} x_3^{n_3 - 1} x_4^{n_4} \cdots x_r^{n_r}$
(since $x_3 = b$). Once again we show that $\theta[(ba^{n_2})^2] = e$ and hence θ(w) = e,
that is, $w \in K$. The length of w is r − 1 (or even less in case $n_3 = 1$). So by
induction hypothesis, $w \in N$. Since $(ba^{n_2})^2 \in N$ (as proved above) and
$b^{2p} \in N$ (since $b^2 \in N$), it follows that $z \in N$ and finally $x \in N$. We are
still in the case r is even. We have completely disposed of the subcase
$x_1 = b$. If $x_1 = a$ then $x_r = b$ (since r is even) and a similar argument holds.
Alternatively, we take $x^{-1}$ which equals $x_r^{-n_r} \cdots x_1^{-n_1}$ and apply the argument
above to show that $x^{-1} \in N$, whence $x \in N$. We still have to consider the
case r is odd. This is much easier. We simply write $x = x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r}$ as
$x_1^{n_1} (u) x_1^{-n_1}$ where u is $x_2^{n_2} \cdots x_r^{n_r} x_1^{n_1}$ which equals $x_2^{n_2} \cdots x_r^{n_r + n_1}$ (since $x_1 = x_r$,
r being odd). θ(x) = e gives $[\theta(x_1)]^{n_1} \theta(u) [\theta(x_1)]^{-n_1} = e$ which implies θ(u) = e,
showing that $u \in K$. Now the length of u is r − 1 (or even less in case
$n_r = -n_1$). So by induction hypothesis, $u \in N$. But N is normal. So
$x = x_1^{n_1} u x_1^{-n_1} \in N$. Thus in all cases, we have shown that $x \in N$. This
completes the proof that $K \subset N$ and shows that the kernel K of θ is the
smallest normal subgroup of F(S) containing the three elements $a^5$, $b^2$ and
$(ab)^2$.
Let us pause to recapitulate what we have achieved through this rather
sticky argument. We started with a known group G, the group of isometries
of a regular pentagon. We then expressed G as a homomorphic image
of the free group F(S) where S is a set with two elements a and b. We let K
be the kernel of this homomorphism. We then showed that K is the normal
subgroup of F(S) generated by a subset R of F(S), namely, $R = \{a^5, b^2, abab\}$.
Now suppose we do not start with the group G. We can still consider the
set S = {a, b}, the free group F(S) and the subset $R = \{a^5, b^2, abab\}$. We
then let K be the normal subgroup of F(S) generated by R, that is, the
smallest normal subgroup of F(S) containing R. We could still define the
quotient group F(S)/K. Our work shows that this quotient group is isomorphic
to the group of isometries of a regular pentagon. Strictly speaking,
elements of F(S)/K are cosets of K by elements of F(S). However, it is
customary to denote them by the same symbols as elements of F(S). In
particular, we denote the cosets aK, bK (which are elements of F(S)/K) by
a and b respectively. Then F(S)/K is generated (although not freely) by
these elements a and b. The fact that $a^5$, $b^2$ and abab are elements of K
means that in F(S)/K, the following identities hold: (i) $a^5 = e$, (ii) $b^2 = e$
and (iii) abab = e. In presence of (i) and (ii), (iii) is equivalent to $ba = a^4 b$.
These identities are called relations. It is customary to describe F(S)/K as
the group generated by two elements a and b satisfying (or subject to) the
relations (i), (ii) and (iii). Using these relations it is easy to see that F(S)/K
consists of elements of the form $a^i b^j$ where i, j are integers and where $a^i b^j$
is to be regarded as the same element as $a^m b^n$ if $i \equiv m$ (modulo 5) and
$j \equiv n$ (modulo 2). To multiply two such elements, say, $a^i b^j$ and $a^r b^s$, we
reduce $a^i b^j a^r b^s$ by repeated applications of the relation $ba = a^4 b$. For
example, $abab = a(ba)b = a a^4 b b = a^5 b^2 = e$. It is clear that F(S)/K is a
non-abelian group of order 10, generated by two elements.
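The reduction of words by repeated use of the relations can be imitated by string rewriting. The sketch below is only an illustration: the rule list is read off from the relations $a^5 = e$, $b^2 = e$ and $ba = a^4 b$, words use only the letters a and b (inverses are not needed because $a^{-1} = a^4$ and $b^{-1} = b$ in this group), and the names `RULES` and `normal_form` are ours. That different reduction orders lead to the same final word is taken on trust from the discussion above rather than proved by the program.

```python
# Rewriting rules read off from the relations a^5 = e, b^2 = e, ba = a^4 b.
RULES = [("aaaaa", ""), ("bb", ""), ("ba", "aaaab")]

def normal_form(word):
    """Rewrite a word in the letters 'a', 'b' until no rule applies.
    The result has the shape a^i b^j with 0 <= i < 5 and 0 <= j < 2."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            if lhs in word:
                word = word.replace(lhs, rhs, 1)
                changed = True
                break
    return word

# Collect every normal form reachable by multiplying on the right by a or b.
elements, frontier = {""}, [""]
while frontier:
    w = frontier.pop()
    for g in "ab":
        v = normal_form(w + g)
        if v not in elements:
            elements.add(v)
            frontier.append(v)

print(sorted(elements, key=lambda w: (len(w), w)))
print(repr(normal_form("abab")), normal_form("ab"), normal_form("ba"))
```

The search stops after finding ten normal forms, the word abab reduces to the empty word, and ab and ba reduce to different words, in agreement with the statement that the group has order 10 and is non-abelian.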
This procedure can be generalised. We let S be any set whatsoever and
form the free group on S, F(S). We also let R be any subset of F(S) and
define K to be the smallest normal subgroup of F(S) containing R. Then
the quotient group F(S)/K is said to have elements of S as generators. The
elements of R are called relations (often the corresponding equalities which
hold in F(S)/K are called relations). This method of specifying a group is
known as presenting a group through generators and relations. It is a very
compact way of specifying a group. For example, instead of defining the
quaternion group Q in terms of the elaborate table of multiplication
shown in Figure 5.3 (and then carrying out the horrendous verification of
associativity) we may simply define the quaternion group as the group
generated by two elements a and b subject to the relations $a^4 = b^4 = e$,
$ba = a^3 b = ab^3$. We leave it to the reader to verify that the group so
defined is indeed isomorphic to Q. As another example, the relations
$a^7 = b^3 = e$, $ba = a^2 b$ give rise to a non-abelian group of order 21.
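The last example can be explored numerically. The following is a minimal sketch under stated assumptions: elements are stored as pairs (i, j) standing for $a^i b^j$, the product rule $(a^i b^j)(a^r b^s) = a^{i + r k^j} b^{j + s}$ is the one obtained by moving b past a with the relation $ba = a^k b$, and the function name `make_group` is ours. The program checks that the 21 normal forms really do form a non-abelian group in which the relations hold; it does not by itself prove that the presented group has no fewer than 21 elements.

```python
from itertools import product

def make_group(m, n, k):
    """Pairs (i, j) stand for a^i b^j; the product rule comes from ba = a^k b.
    The rule is consistent only when k^n is congruent to 1 modulo m."""
    elements = list(product(range(m), range(n)))
    def mul(x, y):
        (i, j), (r, s) = x, y
        return ((i + r * pow(k, j, m)) % m, (j + s) % n)
    return elements, mul

elements, mul = make_group(7, 3, 2)
a, b, e = (1, 0), (0, 1), (0, 0)

is_associative = all(mul(mul(x, y), z) == mul(x, mul(y, z))
                     for x in elements for y in elements for z in elements)
is_abelian = all(mul(x, y) == mul(y, x) for x in elements for y in elements)

print(len(elements), is_associative, is_abelian)
print(mul(b, a), mul(mul(a, a), b))   # ba and a^2 b
```

With m = 7, n = 3 and k = 2 the consistency condition holds because 2 cubed leaves remainder 1 on division by 7.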
If the set S is finite, the group F(S)/K is said to be finitely generated.
If the set R is also finite, it is called finitely presented. Obviously every
finite group is finitely generated. It is not necessarily true that a subgroup
of a finitely generated group is also finitely generated. It is true but
non-trivial to prove that a subgroup of a free group is free. A proof can be
given using graph theory (see the Epilogue).
Theorem (3.21) is applicable to all groups, abelian as well as non-abelian.
Even if G is an abelian group, however, the free group whose
quotient group is isomorphic to G need not be abelian. This is sometimes
undesirable. Given an abelian group G, we often want to express G as a
quotient group of some abelian group whose properties are analogous to
those of free groups. Such groups are called free abelian groups. They
turn out to be the abelianised free groups (see Exercise (2.13)). The term
‘free abelian’ should not be interpreted as ‘free and abelian’. Indeed the
only free groups which are abelian are $\mathbb{Z}$ and the trivial group. The correct
meaning of ‘free abelian’ is the counterpart of free groups for the world
of abelian groups. Properties of free abelian groups, as well as the analogue
of Theorem (3.21), will be developed through exercises.
While closing the section, we remark that the topic of group represen-
tations is intimately related to that of group actions (see the Epilogue).
Exercises
3.1 For any three groups $G_1$, $G_2$, $G_3$, prove that $G_1 \times G_2$ is isomorphic
to $G_2 \times G_1$ and that $(G_1 \times G_2) \times G_3$ and $G_1 \times (G_2 \times G_3)$ are both
isomorphic to $G_1 \times G_2 \times G_3$.
3.2 Suppose $f: G \to H$ is a group homomorphism and $x \in G$ has
order n. Prove that the order of f(x) divides n. If f is onto and K
is a subgroup of index m in H, prove that $f^{-1}(K)$ has index m
in G.
3.3 Suppose G, H are finite groups and $f: G \to H$ a homomorphism
with range R. Prove that o(R) is a common divisor of o(G) and
o(H). Deduce that if o(G) and o(H) are relatively prime then the
only homomorphism from G to H is the trivial one.
3.4 Let G be any group. Let A(G) be the set of all automorphisms of
G onto itself. Prove that A(G) is a subgroup of S(G). (The group
A(G) is called the automorphism group of G.)
3.5 Let G, H be two groups. If G is isomorphic to H, prove that
A(G) is isomorphic to A(H). Show by an example that the converse
is false.
3.6 Prove that $A(\mathbb{Z}_p)$ is isomorphic to the group $\mathbb{Z}_p - \{[0]\}$ under
modulo p multiplication for any prime p. Also find $A(\mathbb{Z})$.
3.7 Let I(G) be the set of all inner automorphisms (see Example (8) in
the text) of a group G. Prove that I(G) is a subgroup of A(G) and
as a group I(G) is isomorphic to the quotient group G/Z where Z is
the centre of G.
3.8 Prove that the inversion function for a group (Exercise (2.6)) is an
automorphism if and only if the group is abelian.
*3.9 Prove that every group with more than two elements has a
non-trivial automorphism group, that is, has an automorphism other
than the identity function.
3.10 Suppose G is a finite group and $f: G \to G$ is an automorphism
which sends more than three fourths of the elements of G to their
inverses. Prove that $f(x) = x^{-1}$ for all $x \in G$ and G is abelian (cf.
Exercise (2.22)).
3.11 Let G be a group and K a normal subgroup of G. For any subgroup
H of G, HK is a subgroup of G by Exercise (1.22). Prove
that $H \cap K$ is normal in H and that the quotient group $H/(H \cap K)$
is isomorphic to the quotient group HK/K. (This is called the
Noether isomorphism theorem.) Illustrate this with $G = \mathbb{R}^2$ under
co-ordinatewise addition and H, K two lines through the origin.
3.12 Let T be a subset of a set S. Let G, H be the permutation groups
of T, S respectively. Define $f: G \to H$ as follows. Let $\theta \in G$.
Define f(θ) to be the function $\tau: S \to S$ such that τ(x) = θ(x) if
$x \in T$ and τ(x) = x if $x \in S - T$. Prove that f is a monomorphism.
(See also Exercise (1.30).)
3.13 Let $\mathbb{C}^*$ be the multiplicative group of non-zero complex numbers
and $\mathbb{R}^*$ the multiplicative group of non-zero real numbers. Define
$f: \mathbb{C}^* \to \mathbb{R}^*$ by $f(z) = |z|$ for $z \in \mathbb{C}^*$. (Here, if z = x + iy with
x, y real then $|z|$ means $\sqrt{x^2 + y^2}$.) Prove that f is a homomorphism
with kernel $S^1$. Verify the truth of Theorem (3.5) for this
homomorphism in the light of Exercise (2.3).
3.14 Prove that the Klein group is isomorphic to $\mathbb{Z}_2 \times \mathbb{Z}_2$.
3.15 Let G be an abelian group, H a subgroup of G and $p: G \to G/H$
the quotient homomorphism. Suppose there exists a homomorphism
$j: G/H \to G$ such that for all $xH \in G/H$, p(j(xH)) = xH. (In
other words p has a right inverse which is a group homomorphism.)
Prove that G is isomorphic to the product group $H \times G/H$. (Hint:
For $x \in G$, let h(x) = x − j(p(x)). Prove that h is a homomorphism
of G into itself with range H.)
3.16 Suppose G, H are groups with K, L as normal subgroups respectively.
Suppose $f: G \to H$ is a homomorphism which takes K into L
(that is, $f(K) \subset L$). Prove that there exists a homomorphism
$\psi: G/K \to H/L$ such that the following diagram (in which the
horizontal arrows represent the respective quotient homomorphisms)
is commutative:
        G ----> G/K
        |        |
      f |        | ψ
        v        v
        H ----> H/L
3.17 Suppose $f: G \to H$ is a homomorphism where H is abelian. Prove
that the kernel of f must contain C(G), the commutator subgroup
of G. Using this give an alternate solution to Exercise (2.13).
3.18 Let f and g be homomorphisms from a group G to a group H.
Suppose S is a subset of G and K is the subgroup of G generated
by S. If f(x) = g(x) for all $x \in S$, prove that f(x) = g(x) for all
$x \in K$. (Equivalently, a homomorphism is completely determined
by its values on any set of generators for the domain group.)
3.19 Prove that every group of order 10 is isomorphic either to $\mathbb{Z}_{10}$ or
to the group of isometries of a regular pentagon.
3.20 Prove that every group of order $p^2$ where p is a prime is isomorphic
either to $\mathbb{Z}_{p^2}$ or to $\mathbb{Z}_p \times \mathbb{Z}_p$.
*3.21 Prove that every non-abelian group of order eight is isomorphic
either to Q, the group of quaternions, or to $D_4$, the group of isometries
of a square. (Hint: start with Exercise (2.16). Note that the
quotient group by the centre cannot contain an element of order
4.)
3.22 Prove that every abelian group of order 8 is isomorphic to either
$\mathbb{Z}_8$, $\mathbb{Z}_2 \times \mathbb{Z}_4$ or $\mathbb{Z}_2 \times \mathbb{Z}_2 \times \mathbb{Z}_2$.
3.23 In the proof of Cayley’s theorem, what would happen if we associated
an element g to the right translation by g? Prove that an
alternate proof of Cayley’s theorem is possible if we associate an
element g to the right translation by $g^{-1}$.
3.24 Give an alternate proof of Corollary (3.16) (that is, one not based
on Theorem (3.15)).
3.25 Let G, H, K be abelian groups. Let $f: G \to H$ be a homomorphism.
Define $f^*: \mathrm{Hom}(H, K) \to \mathrm{Hom}(G, K)$ by $f^*(\lambda) = \lambda \circ f$ for $\lambda \in$
Hom (H, K). Similarly define $f_*: \mathrm{Hom}(K, G) \to \mathrm{Hom}(K, H)$ by
$f_*(\mu) = f \circ \mu$ for $\mu \in$ Hom (K, G). Prove that $f^*$ and $f_*$ are
homomorphisms.
3.26 Let S and T be sets of the same cardinality. Prove that the free
groups F(S) and F(T) are isomorphic to each other. (The converse
is true but not so easy to prove. It will be postponed to Exercise
(6.3.31).)
3.27 Prove that a group with two generators a, b with relations
$a^5 = b^2 = e$ and ab = ba is isomorphic to $\mathbb{Z}_5 \times \mathbb{Z}_2$ and hence to
$\mathbb{Z}_{10}$.
3.28 Suppose G is a group with two generators a and b with relations
$a^m = b^n = e$ and $ba = a^k b$, where m, n, k are positive integers.
Prove that in G, $(a^i b^j)(a^r b^s)$ equals $a^{i + r k^j} b^{j + s}$ for all integers i, j, r,
s.
3.29 In the last exercise suppose m = 7 and n = 3. Prove that G is a
non-abelian group of order 21 if k = 2 or 4, is the cyclic group of
order 21 if k = 1 and is the group $\mathbb{Z}_3$ if k = 3. (Hint: In each case
consider a homomorphism from F({a, b}) onto the appropriate
group and show that its kernel is generated by the three relations.)
3.30 Prove that the group generated by two elements a and b with relations
$a^4 = b^4 = e$, $ab^3 = a^3 b = ba$ is isomorphic to the group of
quaternions.
3.31 Let S be any set and F(S) the free group on S. Let C(S) be the
commutator subgroup of F(S). The quotient group F(S)/C(S) is
called the free abelian group on the set S. Denote this group by
A(S). We think of S as a subset of A(S) by identifying $x \in S$ with
the coset xC(S) in A(S). If G is an abelian group and $f: S \to G$
any function, prove that there exists a unique homomorphism
$\phi: A(S) \to G$ which extends f. Hence prove that every abelian
group is isomorphic to a quotient group of a free abelian group.
*3.32 Let S be any set. Let $\mathbb{Z}$ be the additive group of integers. Let $\mathbb{Z}^S$
be the set of all functions from S to $\mathbb{Z}$. Then $\mathbb{Z}^S$ is an abelian group
under pointwise addition of functions. Let B(S) be the subset
$\{f \in \mathbb{Z}^S : f$ vanishes at all except finitely many points of S$\}$. Prove
that B(S) is a subgroup of $\mathbb{Z}^S$ and that it is isomorphic to A(S), the
free abelian group on S. (Hint: Define $f: S \to B(S)$ by f(s) to be the
function $f_s: S \to \mathbb{Z}$ which has value 1 at s and 0 everywhere else.
Extend f to a homomorphism from F(S) to B(S) and show its
kernel to be precisely C(S).)
3.33 Identify an element $s \in S$ with the function $f_s$ defined in the hint
to the last exercise. With this identification S is a subset of B(S).
Prove that every element of B(S) can be expressed as a sum
$n_1 s_1 + n_2 s_2 + \cdots + n_r s_r$
where $r \geq 0$, $s_i \in S$, $n_i \in \mathbb{Z} - \{0\}$
and no two $s_i$’s are the same, and such an expression is unique
except for the order of the summands. (This gives a direct construction
for the free abelian group on a set S.)
3.34 Prove that the product of two free abelian groups is a free abelian
group. (In particular $\mathbb{Z} \times \mathbb{Z}$, $\mathbb{Z} \times \mathbb{Z} \times \mathbb{Z}$ are free abelian groups.
So the free abelian groups are not as ‘abstract’ as the free groups.)
3.35 Let G be the free group with two generators, a and b. Let H be
the set of those elements of G in which the sum of the exponents
of a equals the sum of exponents of b. (Thus, $ab$, $ba$, $a^{-1}b^{-1}$, $ab^2a$,
$abab$, $ba^2b^{-1}a^{-1}b$ are in H but $aba$, $abab^{-1}$, $a^{-1}b^{-1}ab^{-1}$ are not in H.)
Prove that:
(i) H is a normal subgroup of G,
(ii) H is a free group generated freely by the set
$\{a^n b^n : n \in \mathbb{N}\} \cup \{b^n a^n : n \in \mathbb{N}\}$;
another set of generators being $\{a^n b^n : n = \pm 1, \pm 2, \pm 3, \ldots\}$.
(iii) Let $\mathbb{Z} \times \mathbb{Z}$ be the group of all ordered pairs of integers under
coordinatewise addition. Consider the unique homomorphism
$f: G \to \mathbb{Z} \times \mathbb{Z}$ which takes a to (1, 0) and b to (0, 1)
(which exists by Theorem (3.20)). Then H is precisely $f^{-1}(D)$
where $D = \{(x, x) : x \in \mathbb{Z}\}$.
(iv) H is not finitely generated. (Hint: Otherwise a finite subset
from (ii) would generate H.)
Notes and Guide to Literature
All the concepts in this section are of fundamental nature. Their analogues
for other algebraic structures will come in the next chapter. Some of the
examples of homomorphisms we have given are actually linear transfor-
mations and will be visited again when we study vector spaces.
Free groups are important in many branches of mathematics, such as
‘knot theory’. For more on them, see for example Robinson [1]. For knot
theory, see Crowell and Fox [1].
Although the terminology used in this section is by now standard, the
reader should be wary in consulting old literature. For example, Carmichael
[1] calls ‘simple isomorphism’ what we call isomorphism.
4. Permutation Groups
As proved in the last section, the permutation groups of finite sets, along
with their subgroups, give a complete picture of all finite groups. For this
reason alone they deserve to be studied in detail. But there are other
reasons too. In applications we often consider the various transformations
of a set into itself. These form groups which are subgroups of the permutation
groups (not just isomorphic to them). In the chapter on group
actions*, we shall see many instances of this. Secondly, some of the
algebraic concepts we study in various branches of mathematics can be
expressed very simply in terms of the permutation groups. We shall illustrate
this in the next chapter by showing how the definition of a determinant
of a square matrix can be given in terms of permutation groups. Finally,
some of the properties of the permutation groups proved here are needed
when we want to prove the well-known impossibility of finding a formula for
the roots of a polynomial of degree 5 or more.** Apart from these theoretical
applications, permutation groups also provide interesting counter-examples.
As before, by S, we denote the group, under composition, of all
bijections of an n-elemeut set, generally taken to be the set {1, 2, ..., n)
where n is a positive integer. Elements of S, are functions from (I, 2, ..., n}
to itself and will generally be denoted by a, 1-, 9 etc. It is rather clumsy to
’See the Epilogue.
"See the Epilogue.
denote these functions in the usual functional notation, in the form
$f(1) = \ldots$, $\ldots$, $f(n) = \ldots$. It is convenient to have a compact notation. One
method is to agree to list $f(1), \ldots, f(n)$ always in this order and simply write
this arrangement. For example, 23541 is the (unique) permutation, say $\sigma$, of
1, 2, 3, 4, 5 which takes 1 to 2, 2 to 3, 3 to 5, 4 to 4 and 5 to 1. To stress
that we are thinking of 23541 as a function rather than as an arrangement
of symbols (although the two are equivalent), it is helpful to sacrifice a
little space and write $\sigma$ as
$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 3 & 5 & 4 & 1 \end{pmatrix}.$$
An advantage of this notation is that it is not necessary to have the top row
always in the increasing order (although it is generally so) provided the
entries in the bottom row are also shifted correspondingly. Thus $\sigma$ can as
well be expressed as
$$\begin{pmatrix} 1 & 3 & 5 & 2 & 4 \\ 2 & 5 & 1 & 3 & 4 \end{pmatrix} \quad \text{or as} \quad \begin{pmatrix} 1 & 2 & 3 & 5 & 4 \\ 2 & 3 & 5 & 1 & 4 \end{pmatrix}$$
or in many other similar ways. This helps in writing down the inverse of a
permutation as well as the composite of two permutations. For inversion we
simply interchange the top row with the bottom row and then reshuffle it, if
necessary, to bring it to the standard form. Thus, in the example above,
$\sigma^{-1}$ is the permutation
$$\begin{pmatrix} 2 & 3 & 5 & 4 & 1 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix},$$
which in the standard form equals
$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 1 & 2 & 4 & 3 \end{pmatrix}$$
or 51243. Similarly let $\tau$ be some other element of $S_5$, say
$$\tau = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 2 & 1 & 3 & 4 \end{pmatrix}.$$
To find the composite $\sigma \circ \tau$ we rewrite $\sigma$ so that its top
row coincides with the bottom row of $\tau$, that is,
$$\sigma = \begin{pmatrix} 5 & 2 & 1 & 3 & 4 \\ 1 & 3 & 2 & 5 & 4 \end{pmatrix}.$$
Then $\sigma \circ \tau$ is simply
$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 3 & 2 & 5 & 4 \end{pmatrix},$$
which is obtained by taking the bottom row of $\sigma$ (in the new form). The
procedure given here amounts to taking each element of $\{1, 2, 3, 4, 5\}$ and
tracing it under the action of $\tau$ and then that of $\sigma$. For example, 3 goes
to 1 under $\tau$ and 1 $(= \tau(3))$ goes to 2 under $\sigma$. So 3 goes to 2 under
the composite $\sigma \circ \tau$.
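The row-swapping and row-matching procedures above can be sketched in a few lines of Python. This is only an illustration: the tuple representation and the function names `inverse` and `compose` are assumptions made for the sketch, not notation from the text.

```python
# A permutation of {1, ..., n} is stored in one-line notation:
# p[i - 1] is the image of i. (Representation chosen for this sketch.)

def inverse(p):
    """Swap the two rows and re-sort: if p sends i to j, the inverse sends j to i."""
    inv = [0] * len(p)
    for i, j in enumerate(p, start=1):
        inv[j - 1] = i
    return tuple(inv)

def compose(s, t):
    """Return s o t, that is, apply t first and then s."""
    return tuple(s[t[i] - 1] for i in range(len(t)))

sigma = (2, 3, 5, 4, 1)   # the arrangement 23541
tau = (5, 2, 1, 3, 4)     # the arrangement 52134

print(inverse(sigma))       # (5, 1, 2, 4, 3)
print(compose(sigma, tau))  # (1, 3, 2, 5, 4)
```

Both outputs agree with the arrangements 51243 and 13254 obtained by hand above.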
We now turn to another method of expressing a permutation, which is
important not only notationally but conceptually as well. We begin by
defining a special type of permutation.
4.1 Definition: Let $n$, $r$ be integers with $1 \le r \le n$. A permutation
$\sigma \in S_n$ is said to be a cycle of length $r$ or an $r$-cycle if there exist distinct
elements $i_1, i_2, \ldots, i_r \in \{1, 2, \ldots, n\}$ such that
(i) $\sigma(i_1) = i_2$, $\sigma(i_2) = i_3$, $\ldots$, $\sigma(i_{r-1}) = i_r$, $\sigma(i_r) = i_1$
and
(ii) $\sigma(i) = i$ for all $i \in \{1, 2, \ldots, n\} - \{i_1, \ldots, i_r\}$.
In other words, a cycle of length $r$ permutes the elements of some
$r$-subset of $\{1, 2, \ldots, n\}$ in a cyclic manner and leaves all other elements
unaffected. A cycle of length 1 is of course the identity permutation. A
cycle of length 2 is a permutation which interchanges two elements of
$\{1, 2, \ldots, n\}$ and leaves other elements fixed. These are the simplest
non-trivial cycles and are called transpositions. Their importance in the
structure of $S_n$ will be brought out later. As examples, for $n = 4$, the
permutations
$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 2 & 4 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 2 & 1 & 3 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 3 & 1 & 2 \end{pmatrix}$$
are cycles of length 2, 3 and 4 respectively (the corresponding permuted
subsets being $\{2, 3\}$, $\{1, 4, 3\}$ and $\{1, 4, 2, 3\}$). However, the permutation
$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 3 & 2 & 1 \end{pmatrix}$$
is not a cycle by itself. It is, however, the product of the two 2-cycles
$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 2 & 3 & 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 2 & 4 \end{pmatrix}.$$
It is obvious that the order of an $r$-cycle as an element of $S_n$ is $r$.
There is a very compact way to denote permutations which are cycles.
Suppose $\sigma$ is a cycle of length $r$ and $i_1, i_2, \ldots, i_r$ are the elements which are
cyclically permuted by $\sigma$ (in the order of their listing). Then $\sigma$ is denoted
by $(i_1, i_2, \ldots, i_r)$ or simply by $(i_1 \, i_2 \ldots i_r)$. We could start with any $i_k$ and the
same $r$-cycle would be denoted by $(i_k \, i_{k+1} \ldots i_r \, i_1 \ldots i_{k-1})$. It follows that the
same $r$-cycle can be denoted in $r$ ways depending on the starting point. For
example the 4-cycle
$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 3 & 1 & 2 \end{pmatrix}$$
can be denoted by (1 4 2 3), (4 2 3 1), (2 3 1 4) or by (3 1 4 2). From this,
it is easy to compute the number of distinct cycles of length $r$. A
transposition (that is, a 2-cycle) which interchanges, say, $i$ and $j$ will be
denoted by $(i \, j)$ or $(j \, i)$.
What happens when we compose two cycles? The answer depends on
the two cycles. Suppose $\sigma, \tau \in S_n$ are cycles of length $r$, $s$ respectively,
say, $\sigma = (i_1 \, i_2 \ldots i_r)$ and $\tau = (j_1 \, j_2 \ldots j_s)$. If these two cycles are disjoint
(that is, the sets $\{i_1, \ldots, i_r\}$ and $\{j_1, \ldots, j_s\}$ are disjoint) then it is easy to see
that they commute with each other, that is $\sigma \circ \tau = \tau \circ \sigma$, because under
both of them, $i_k$ goes to $i_{k+1}$ (with $i_r$ going to $i_1$), $j_k$ goes to $j_{k+1}$ (with $j_s$ going
to $j_1$) and all other elements of $\{1, 2, \ldots, n\}$ remain unchanged. Note, however,
that this permutation is not a cycle (unless $r = 1$ or $s = 1$). Two non-disjoint
cycles do not in general commute with each other. For example, $(12) \circ (13)$ equals
(1 3 2) while $(13) \circ (12)$ equals (1 2 3).
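As a quick check of the two claims above, the following sketch builds cycles in one-line form and composes them right to left, which is the convention the text uses for $\sigma \circ \tau$. The helper names `cycle` and `compose` are assumptions made for illustration.

```python
def cycle(n, *elems):
    """One-line form of the cycle (elems[0] elems[1] ...) regarded as an element of S_n."""
    p = list(range(1, n + 1))
    for a, b in zip(elems, elems[1:] + elems[:1]):
        p[a - 1] = b          # each listed element goes to the next one, cyclically
    return tuple(p)

def compose(s, t):
    """Return s o t: apply t first, then s."""
    return tuple(s[t[i] - 1] for i in range(len(t)))

# Disjoint cycles commute.
a, b = cycle(5, 1, 2, 3), cycle(5, 4, 5)
print(compose(a, b) == compose(b, a))            # True

# Non-disjoint cycles need not commute.
c12, c13 = cycle(3, 1, 2), cycle(3, 1, 3)
print(compose(c12, c13) == cycle(3, 1, 3, 2))    # True
print(compose(c13, c12) == cycle(3, 1, 2, 3))    # True
```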
Cycles are the building blocks for elements of $S_n$ and behave, to some
extent, like the atoms in a Boolean algebra. The following theorem is the
analogue of Proposition (4.1.10), Part (ii).
4.2 Theorem: Every permutation of $n$ symbols, other than the identity,
is the product of mutually disjoint cycles of length greater than 1. This
expression is unique except for the order of the cycles (and except for the
different ways of expressing the same cycle).
Proof: We proceed by induction on $n$. For $n = 1$, there is no permutation
other than the identity permutation and so the result holds vacuously.
Suppose $n > 1$ and the result holds for all $m < n$. Let $\sigma$ be a permutation
of $n$ symbols, say, $1, 2, \ldots, n$. Since $\sigma \ne$ identity, there exists some $x$ such
that $\sigma(x) \ne x$. Set $i_1 = x$. Define $i_2 = \sigma(i_1) = \sigma(x)$, $i_3 = \sigma(i_2) = \sigma^2(i_1) = \sigma^2(x)$, $\ldots$
and in general $i_{k+1} = \sigma(i_k) = \sigma^k(i_1)$ for every positive integer $k$.
This gives an infinite sequence $(i_1, i_2, i_3, \ldots, i_k, \ldots)$. However, this sequence
takes values in the finite set $\{1, 2, \ldots, n\}$. So its terms cannot be all distinct.
Thus there exist integers $p$, $q$ with $p < q$ such that $i_p = i_q$. Let $q - p = s$.
Then $i_q = i_{p+s} = \sigma^{p+s-1}(i_1) = \sigma^{p-1}(\sigma^s(i_1)) = \sigma^{p-1}(i_{s+1})$ while $i_p = \sigma^{p-1}(i_1)$.
So $i_p = i_q$ implies $i_{s+1} = i_1$, since $\sigma^{p-1}$, as a function from $\{1, 2, \ldots, n\}$ into
itself, is injective. Thus we have shown that there exists a positive integer $s$
such that $i_{s+1} = i_1$. Let $r$ be the least positive integer with this property.
Then $i_1, i_2, \ldots, i_r$ are all distinct (by an argument analogous to the proof of
Theorem (1.10)). It is then clear that $(i_1 \, i_2 \ldots i_r)$ is an $r$-cycle. Call this $r$-cycle
$\sigma_1$. Define $\tau : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$ by $\tau(i_k) = i_k$ for all $k$ and $\tau(x) = \sigma(x)$
if $x \ne i_1, \ldots, i_r$. In other words, $\tau$ behaves exactly as $\sigma$ except on the set
$\{i_1, i_2, \ldots, i_r\}$, whose elements are left fixed by $\tau$. Clearly $\tau$ and $\sigma_1$ commute with
each other and the product $\sigma_1 \tau$ (or $\tau \sigma_1$) equals $\sigma$. Now we may as well
think of $\tau$ as a permutation of the set $T$, where $T = \{1, 2, \ldots, n\} - \{i_1, i_2, \ldots, i_r\}$.
Now $|T| = n - r < n$. If $\tau$ is the identity then $\sigma = \sigma_1$, which is a cycle. If
$\tau \ne$ identity, then by induction hypothesis $\tau$ can be expressed as a product
of disjoint cycles of length greater than 1, say, $\tau = \sigma_2 \sigma_3 \ldots \sigma_k$. Evidently
$\sigma_2, \ldots, \sigma_k$ are disjoint from $\sigma_1$. Every permutation of $T$ can also be regarded
as a permutation of $\{1, 2, \ldots, n\}$ (cf. Exercise (3.12)). So $\sigma = \sigma_1 \sigma_2 \ldots \sigma_k$, a
product of cycles. As for uniqueness, suppose $\sigma$ is also expressed as $\theta_1 \theta_2 \ldots \theta_u$
where each $\theta_i$ is a cycle of length greater than 1 and every two $\theta$'s are
mutually disjoint. We started with $x$ such that $\sigma(x) \ne x$. Clearly $x$ must be
in one of these cycles, as otherwise $\sigma(x) = x$. Because disjoint cycles
commute with each other we may suppose that $x$ appears in $\theta_1$ (otherwise
we reshuffle and reindex the cycles). We may also suppose that the cycle $\theta_1$
starts with $x$. But then $\theta_1$ must be $(x \; \sigma(x) \; \sigma^2(x) \ldots \sigma^{r-1}(x))$ which is precisely
$\sigma_1$. So $\theta_1 = \sigma_1$, giving $\sigma_1 \sigma_2 \ldots \sigma_k = \sigma_1 \theta_2 \ldots \theta_u$. By cancellation law,
$\sigma_2 \sigma_3 \ldots \sigma_k = \theta_2 \ldots \theta_u$. Both sides are permutations of the set $T$ above. Since $|T| < n$,
we apply induction hypothesis, because of which the factorisation is unique.
So $k = u$ and with a re-indexing and reshuffling of the $\theta$'s,
$\sigma_2 = \theta_2, \ldots, \sigma_k = \theta_k$.
So the factorisation of $\sigma$ as $\sigma_1 \sigma_2 \ldots \sigma_k$ is unique modulo the order of the
cycles. ∎
In the theorem above, we ignored the 1-cycles (and therefore had to
disallow the identity permutation) because they are redundant in a
factorisation. However, for certain purposes which we do not mention here,
it is important to ensure that every element of $\{1, 2, \ldots, n\}$ belongs to at least
one cycle even if it is a cycle of length 1. If we do so, we get the following
reformulation of Theorem (4.2).
4.3 Theorem: Every permutation (including the identity) of $n$ symbols
can be expressed as a product of mutually disjoint cycles the sum of whose
lengths is $n$. This factorisation is unique modulo the same restrictions as
above.
Proof: There is basically no change in the proof. But since we are allowing
cycles of length 1 also, the argument can be given a little more
systematically by an algorithm. We illustrate this with the permutation
$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 4 & 2 & 1 & 3 & 7 & 6 & 5 \end{pmatrix} \quad \text{in } S_7.$$
We start with 1 and consider $\sigma(1), \sigma^2(1), \sigma^3(1), \ldots$ till we get 1. In this case
we get (1 4 3) as the cycle containing 1. Now we take the first element not
covered so far. This element is 2. Since $\sigma(2) = 2$, (2) is the 1-cycle
containing 2. The next element not yet covered is 5 and (5 7) is the 2-cycle
containing it. Finally, the only element left is 6 which is in the 1-cycle (6). So $\sigma$ is
expressed as the product of 4 cycles, (143) (2) (57) (6). Since the process
is carried till all symbols are exhausted it is obvious that the sum of the
lengths of all cycles is $n$. For uniqueness, we apply induction again. ∎
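The algorithm illustrated in this proof can be written out as follows. This is a minimal sketch, assuming permutations are stored in one-line form; the function name `cyclic_decomposition` and the flag `keep_fixed_points` are names invented here, the flag switching between the conventions of Theorems (4.2) and (4.3).

```python
def cyclic_decomposition(p, keep_fixed_points=True):
    """Split the permutation p (one-line form, p[i - 1] = image of i) into disjoint cycles."""
    n = len(p)
    seen = [False] * n
    cycles = []
    for start in range(1, n + 1):          # first element not covered so far
        if seen[start - 1]:
            continue
        cyc = []
        x = start
        while not seen[x - 1]:             # follow start, p(start), p(p(start)), ...
            seen[x - 1] = True
            cyc.append(x)
            x = p[x - 1]
        if keep_fixed_points or len(cyc) > 1:
            cycles.append(tuple(cyc))
    return cycles

sigma = (4, 2, 1, 3, 7, 6, 5)
print(cyclic_decomposition(sigma))                           # [(1, 4, 3), (2,), (5, 7), (6,)]
print(cyclic_decomposition(sigma, keep_fixed_points=False))  # [(1, 4, 3), (5, 7)]
```

Each element is visited exactly once, so the lengths of the cycles returned with `keep_fixed_points=True` always add up to $n$.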
4.4 Definition: The factorisation of a permutation into mutually disjoint
cycles is called its cyclic decomposition.
Whether to include cycles of length 1 in a cyclic decomposition or not
is really a matter of convention. As remarked above, sometimes it is
important to include them. For the moment, however, we consider only cycles of
length greater than one in the factorisation. The cyclic decomposition
provides the most compact way of representing a permutation in $S_n$. For
example, let $G$ be the group of isometries of a regular pentagon, considered
in Section 1. If we number its vertices as 1, 2, 3, 4, 5 clockwise then $G$ is a
subgroup of $S_5$. A clockwise rotation through 72 degrees is a cycle of length
5, namely (1 2 3 4 5). Similarly, clockwise rotations through 144, 216 and
288 degrees are given by the cycles (1 3 5 2 4), (1 4 2 5 3) and (1 5 4 3 2)
respectively. Rotation through 360 degrees is of course the identity
permutation. As for reflections, or ‘flips’, let $f_1$ be the reflection in the line passing
through the centre of the pentagon and the vertex 1. Then the cyclic
decomposition of $f_1$ is (25) (34). Similarly the flips $f_2$, $f_3$, $f_4$, $f_5$ are given by
(13) (45), (24) (15), (12) (35) and (14) (23) respectively as can be verified
with a diagram (cf. Figure 5.1).
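The four rotation cycles listed above are just the successive powers of the 72-degree rotation, which can be confirmed mechanically. The sketch below assumes the one-line representation and the hypothetical helpers `cycle` and `compose`.

```python
def cycle(n, *elems):
    """One-line form of the cycle (elems[0] elems[1] ...) in S_n."""
    p = list(range(1, n + 1))
    for a, b in zip(elems, elems[1:] + elems[:1]):
        p[a - 1] = b
    return tuple(p)

def compose(s, t):
    """Return s o t: apply t first, then s."""
    return tuple(s[t[i] - 1] for i in range(len(t)))

rho = cycle(5, 1, 2, 3, 4, 5)                 # clockwise rotation through 72 degrees
powers = [rho]
for _ in range(4):
    powers.append(compose(rho, powers[-1]))   # rho^2, rho^3, rho^4, rho^5

print(powers[1] == cycle(5, 1, 3, 5, 2, 4))   # 144 degrees: True
print(powers[2] == cycle(5, 1, 4, 2, 5, 3))   # 216 degrees: True
print(powers[3] == cycle(5, 1, 5, 4, 3, 2))   # 288 degrees: True
print(powers[4] == (1, 2, 3, 4, 5))           # 360 degrees is the identity: True
```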
When a permutation is expressed in its cyclic decomposition form, its
inverse is also obtained immediately in its cyclic decomposition form. If
$(i_1, i_2, \ldots, i_r)$ is an $r$-cycle, its inverse permutation is the $r$-cycle read
backwards, that is, the $r$-cycle $(i_r, i_{r-1}, \ldots, i_2, i_1)$. (The proof that these two are
inverses of each other is trivial.) Since mutually disjoint cycles commute
with each other, the product of their inverses (in the same order) gives the
inverse of their product. As an example, in the proof of Theorem (4.3), we
considered
$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 4 & 2 & 1 & 3 & 7 & 6 & 5 \end{pmatrix}$$
whose cyclic decomposition is (143) (57). Hence $\sigma^{-1}$ = (341) (75). There is,
however, no easy way in general to get the cyclic decomposition of the
product of two permutations in terms of their cyclic decompositions. We
simply have to compute the product and find its cyclic decomposition by the
method above.
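The rule "read each cycle backwards" can be checked against a direct inversion. In this sketch `from_disjoint_cycles` and `inverse` are names assumed for illustration, and the first helper is only meant for cycles that are mutually disjoint.

```python
def from_disjoint_cycles(n, cycles):
    """One-line form of a product of mutually disjoint cycles in S_n."""
    p = list(range(1, n + 1))
    for c in cycles:
        for a, b in zip(c, c[1:] + c[:1]):
            p[a - 1] = b
    return tuple(p)

def inverse(p):
    """If p sends i to j, the inverse sends j to i."""
    inv = [0] * len(p)
    for i, j in enumerate(p, start=1):
        inv[j - 1] = i
    return tuple(inv)

cycles = [(1, 4, 3), (5, 7)]
sigma = from_disjoint_cycles(7, cycles)
backwards = [tuple(reversed(c)) for c in cycles]     # (3 4 1) and (7 5)

print(sigma)                                                # (4, 2, 1, 3, 7, 6, 5)
print(from_disjoint_cycles(7, backwards) == inverse(sigma)) # True
```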
Because of Theorem (4.2) (or its companion, Theorem (4.3)), when we
want to prove something about an arbitrary permutation in $S_n$, the problem
can often be reduced to proving it for a cycle. As an illustration, we have
the following result which underscores the importance of transpositions
(that is, cycles of length 2).
4.5 Theorem: Every permutation of $n$ symbols can be expressed as a
product of transpositions. Consequently, the subgroup of $S_n$ generated by
all transpositions is the entire group $S_n$.
Proof: The identity permutation is the product of zero 2-cycles. (It is the
empty product. We already treated empty sums as 0 and empty products
as 1 in Chapter 4. However, if this sounds absurd, we write the identity
permutation as (12) (21) assuming $n > 1$.) By Theorem (4.2) every
permutation $\sigma$, other than the identity permutation, is expressed as the product of
cycles of lengths $\ge 2$. So it suffices to show that every cycle of length 2 or
more can be expressed as a product of transpositions. This is very easy
because a direct calculation shows that an $r$-cycle $(i_1 \, i_2 \ldots i_r)$ equals the
product $(i_1 \, i_r)(i_1 \, i_{r-1}) \ldots (i_1 \, i_2)$. (Given $x \in \{1, 2, \ldots, n\}$, we consider two
cases, $x = i_k$ for some $1 \le k \le r$ and $x \ne i_1, i_2, \ldots, i_r$, and verify that both
the sides take $x$ to the same element in both the cases.) Thus it follows that
every element of $S_n$ can be expressed as a product of transpositions. The
last assertion is now clear because the subgroup generated by a subset must
contain all possible products of elements of that subset. ∎
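The "direct calculation" in the proof can be delegated to a short program. The sketch below, with helper names that are assumptions of this illustration, writes a cycle as $(i_1 \, i_r)(i_1 \, i_{r-1}) \ldots (i_1 \, i_2)$ and multiplies the factors back together right to left.

```python
def cycle(n, *elems):
    """One-line form of the cycle (elems[0] elems[1] ...) in S_n."""
    p = list(range(1, n + 1))
    for a, b in zip(elems, elems[1:] + elems[:1]):
        p[a - 1] = b
    return tuple(p)

def compose(s, t):
    """Return s o t: apply t first, then s."""
    return tuple(s[t[i] - 1] for i in range(len(t)))

def product(n, cycles):
    """Product of the listed cycles, the rightmost factor acting first."""
    p = tuple(range(1, n + 1))
    for c in cycles:
        p = compose(p, cycle(n, *c))
    return p

def cycle_to_transpositions(c):
    """(i1 i2 ... ir) = (i1 ir)(i1 i_{r-1}) ... (i1 i2)."""
    return [(c[0], x) for x in reversed(c[1:])]

factors = cycle_to_transpositions((1, 4, 2, 3))
print(factors)                                        # [(1, 3), (1, 2), (1, 4)]
print(product(4, factors) == cycle(4, 1, 4, 2, 3))    # True
```

An $r$-cycle is thus a product of $r - 1$ transpositions under this recipe.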
Note that it is not claimed, nor is it true, that the factorisation of a
permutation as a product of transpositions is unique. For example, the
permutation (2 3 1) of $S_3$ can be written as (21) (23) or as (32) (31) or as
(13) (12). Even the number of factors need not be the same as we see from
the equation (23) = (12) (13) (12).
As a simple application of the last theorem, we make good a promise
given in the last section about symmetric Boolean functions. If $f(x_1, \ldots, x_n)$
is a Boolean function of $n$ Boolean variables, the definition of symmetry
required only that $f$ be invariant for all possible interchanges of two
variables at a time. However, in applications we needed that $f$ is invariant
under any permutation of the variables. This can be proved as follows. Let
$H$ be the set of those permutations of the variables $x_1, \ldots, x_n$ under which
$f$ is invariant. In other words
$$H = \{\sigma \in S_n : f(x_1, \ldots, x_n) = f(x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(n)})\}.$$
We have to show that $H$ is the entire $S_n$. In any case $H$ is a subgroup of
$S_n$, because if $f$ is invariant under $\sigma$ and $\tau$ it would be so under $\sigma \circ \tau$ and
under $\sigma^{-1}$. Now, by the definition of symmetry, every transposition $(i, j)$
belongs to $H$. So the subgroup generated by the set of all transpositions
in $S_n$ is contained in $H$. But by Theorem (4.5) this subgroup is the entire $S_n$.
So $H = S_n$, which means $f$ is invariant under any permutation of the
variables.
We remark that in order to generate $S_n$, we do not need all transpositions.
For example, the $n - 1$ transpositions $(12), (13), \ldots, (1n)$ also generate $S_n$,
because every other transposition, say, $(i, j)$ can be written as $(1, i)(1, j)(1, i)$.
Although the number of transpositions into which a permutation is
factorised is not unique, as we saw above, interestingly, the parity of this
number is the same. This means that if we factorise a permutation $\sigma$ as a
product of transpositions as $\sigma = \tau_1 \tau_2 \ldots \tau_r$ and also as $\sigma = \theta_1 \theta_2 \ldots \theta_s$ then $r$ and $s$
need not be equal but either both are even or both are odd. This fact is
not immediately obvious. We prove it by first constructing a suitable
homomorphism defined on $S_n$. Let $P_2(n)$ denote the set of all 2-subsets of
$\{1, 2, \ldots, n\}$. Note that $|P_2(n)| = \binom{n}{2} = \frac{n(n-1)}{2}$.
4.6 Proposition: Let $\sigma \in S_n$. Define $f(\sigma)$ to be the product of all real
numbers of the form $\frac{\sigma(i) - \sigma(j)}{i - j}$ as $\{i, j\}$ ranges over all 2-subsets of
$\{1, 2, \ldots, n\}$, that is,
$$f(\sigma) = \prod_{\{i, j\} \in P_2(n)} \frac{\sigma(i) - \sigma(j)}{i - j}.$$
Then $f(\sigma)$ is either 1 or $-1$. The function $f : S_n \to \{1, -1\}$ is a homomorphism of the group
$S_n$ into the group $\{1, -1\}$ under multiplication.
Proof: $f(\sigma)$ is a product of $\binom{n}{2}$ factors. Each factor is non-zero since
$i \ne j$ implies $\sigma(i) \ne \sigma(j)$, $\sigma$ being one-to-one. Each 2-subset of $\{1, 2, \ldots, n\}$
gives rise to one factor. Note that we are considering 2-subsets $\{i, j\}$ (with
$i \ne j$) and not ordered pairs $(i, j)$. Whether we take $\frac{\sigma(i) - \sigma(j)}{i - j}$ or
$\frac{\sigma(j) - \sigma(i)}{j - i}$ we get the same factor. Now as $\{i, j\}$ runs over all possible
2-subsets of $\{1, 2, \ldots, n\}$, so does $\{\sigma(i), \sigma(j)\}$. Therefore, in the expression
for $f(\sigma)$, the product of the numerators numerically equals the product of
the denominators. Hence $f(\sigma) = 1$ or $-1$. (If we recall the concept of an
inversion of a permutation from Definition (3.3.12), we see that every
inversion gives a negative factor of $f(\sigma)$. So $f(\sigma)$ is 1 or $-1$ according as
the total number of inversions of $\sigma$ is even or odd.)
Now to show that $f$ is a homomorphism suppose $\sigma, \tau \in S_n$ and $\theta = \sigma \circ \tau$.
Then
$$\begin{aligned}
f(\theta) &= \prod_{\{i, j\} \in P_2(n)} \frac{\theta(i) - \theta(j)}{i - j} = \prod_{\{i, j\} \in P_2(n)} \frac{\sigma(\tau(i)) - \sigma(\tau(j))}{i - j} \\
&= \prod_{\{i, j\} \in P_2(n)} \frac{\sigma(\tau(i)) - \sigma(\tau(j))}{\tau(i) - \tau(j)} \cdot \frac{\tau(i) - \tau(j)}{i - j} \\
&= \left( \prod_{\{i, j\} \in P_2(n)} \frac{\sigma(\tau(i)) - \sigma(\tau(j))}{\tau(i) - \tau(j)} \right) \left( \prod_{\{i, j\} \in P_2(n)} \frac{\tau(i) - \tau(j)}{i - j} \right).
\end{aligned}$$
Now the second factor equals $f(\tau)$ by definition. In the first factor, $i$, $j$ are
only dummy variables. As $\{i, j\}$ runs through every element of $P_2(n)$, so
does $\{\tau(i), \tau(j)\}$, except possibly in a different order. But the multiplication
for real numbers is commutative as well as associative. So the first factor is
the same as
$$\prod_{\{u, v\} \in P_2(n)} \frac{\sigma(u) - \sigma(v)}{u - v},$$
which is nothing but $f(\sigma)$. So we
see that $f(\sigma \circ \tau) = f(\sigma) f(\tau)$. This proves that $f$ is a homomorphism. ∎
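For small $n$ the proposition can be verified exhaustively. The sketch below evaluates the product defining $f$ with exact rational arithmetic and checks, over all of $S_4$, that $f$ takes the value $(-1)^{\text{number of inversions}}$ and is multiplicative. The one-line representation and the helper names are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import combinations, permutations

def compose(s, t):
    """Return s o t: apply t first, then s."""
    return tuple(s[t[i] - 1] for i in range(len(t)))

def f(p):
    """Product of (p(i) - p(j)) / (i - j) over all 2-subsets {i, j} of {1, ..., n}."""
    result = Fraction(1)
    for i, j in combinations(range(1, len(p) + 1), 2):
        result *= Fraction(p[i - 1] - p[j - 1], i - j)
    return int(result)

def inversions(p):
    """Number of pairs i < j with p(i) > p(j)."""
    return sum(1 for i, j in combinations(range(len(p)), 2) if p[i] > p[j])

S4 = list(permutations(range(1, 5)))
print(all(f(p) == (-1) ** inversions(p) for p in S4))                  # True
print(all(f(compose(s, t)) == f(s) * f(t) for s in S4 for t in S4))    # True
```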
4.7 Theorem: When a permutation is expressed as a product of
transpositions, the number of factors has the same parity in all such expressions,
namely, the parity of the number of inversions in that permutation.
Proof: Let $f : S_n \to \{1, -1\}$ be the homomorphism defined in the last
proposition. If $\tau$ is any transposition, say, $(r, s)$ then we claim $f(\tau) = -1$. To
see this we count the number of inversions in $\tau$. Without loss of generality
suppose $r < s$. Then $\tau$ is given by
$$\begin{pmatrix} 1 & 2 & \cdots & r-1 & r & r+1 & \cdots & s-1 & s & s+1 & \cdots & n \\ 1 & 2 & \cdots & r-1 & s & r+1 & \cdots & s-1 & r & s+1 & \cdots & n \end{pmatrix}.$$
Suppose $(i, j)$ is an inversion pair for $\tau$, that is $i < j$ but $\tau(i) > \tau(j)$. The
only way this can happen is, (i) $i = r$ and $r + 1 \le j \le s$ and (ii) $r \le i \le s - 1$
and $j = s$. Both (i) and (ii) can occur $s - r$ times each. But the case $i = r$
and $j = s$ is common to both (i) and (ii). It follows that the number of
inversions of $\tau$ is $2(s - r) - 1$, which is odd. So in the expression of $f(\tau)$ as
$$\prod_{\{i, j\} \in P_2(n)} \frac{\tau(i) - \tau(j)}{i - j}$$
an odd number of factors is negative. Hence $f(\tau)$ is
negative. But $f(\tau) = 1$ or $-1$. So $f(\tau) = -1$ as we wanted to show. Thus
$f$ takes every transposition to $-1$.
It is now easy to complete the proof. Suppose a permutation $\sigma$ is
expressed as a product of transpositions in two ways, say, $\sigma = \tau_1 \tau_2 \ldots \tau_p$ and
$\sigma = \theta_1 \theta_2 \ldots \theta_q$. Since $f$ is a homomorphism and $f(\tau_i) = f(\theta_j) = -1$ for
all $i$, $j$, we have $f(\sigma) = (-1)^p = (-1)^q$. Also from the proof of the last
Proposition, $f(\sigma) = (-1)^r$ where $r$ is the number of inversions of $\sigma$. So $p$, $q$, $r$
are either all even or all odd. ∎
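The inversion count $2(s - r) - 1$ for a transposition $(r, s)$ is easy to confirm by brute force for small $n$. The helper names `transposition` and `inversions` below are assumptions of this sketch.

```python
from itertools import combinations

def transposition(n, r, s):
    """One-line form of the transposition (r s) in S_n."""
    p = list(range(1, n + 1))
    p[r - 1], p[s - 1] = s, r
    return tuple(p)

def inversions(p):
    """Number of pairs i < j with p(i) > p(j)."""
    return sum(1 for i, j in combinations(range(len(p)), 2) if p[i] > p[j])

print(inversions(transposition(4, 1, 3)))   # 3, which is 2 * (3 - 1) - 1

ok = all(
    inversions(transposition(n, r, s)) == 2 * (s - r) - 1
    for n in range(2, 9)
    for r, s in combinations(range(1, n + 1), 2)
)
print(ok)                                   # True
```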
Because of this theorem, we can classify all the permutations in $S_n$ into
two well-defined categories as follows:
4.8 Definition: A permutation is called even or odd according as the
number of factors in its expression as a product of transpositions is even or
odd.
Thus all transpositions are odd permutations. The product of any two
transpositions is an even permutation. If $(i_1, i_2, \ldots, i_r)$ is an $r$-cycle then, as
we saw above, $(i_1, i_2, \ldots, i_r)$ equals the product $(i_1 \, i_r)(i_1 \, i_{r-1}) \ldots (i_1 \, i_2)$ of
$r - 1$ transpositions. So an $r$-cycle is an even permutation if $r$ is odd and an
odd permutation if $r$ is even.
We have the following simple proposition which expresses the parity
of permutations obtained from other permutations.
4.9 Proposition: The product of two even or two odd permutations is
even while the product of an even permutation and an odd permutation is
odd. The inverse of a permutation has the same parity as that permutation.
Proof: If $\sigma$ is expressed as a product of $r$ transpositions and $\tau$ as a
product of $s$ transpositions (say), then $\sigma \circ \tau$ is the product of $r + s$
transpositions, where $r$, $s$ are positive integers. Now $r + s$ is even if $r$, $s$ have the
same parity and odd if they have opposite parity. This proves the first
assertion. For the second, we note first that each transposition is its own
inverse. So if $\sigma = \tau_1 \tau_2 \ldots \tau_r$ where each $\tau_i$ is a transposition then
$\sigma^{-1} = \tau_r \tau_{r-1} \ldots \tau_2 \tau_1$. Hence $\sigma$ and $\sigma^{-1}$ are either both odd or both even. ∎
As a consequence, we have the following simple but important result.
4.10 Proposition: For every integer $n > 1$ the set of all even
permutations of $n$ symbols is a subgroup of $S_n$. It has index 2 in $S_n$ and is a normal
subgroup of $S_n$.
Proof: Let $H$ be the set of all even permutations of $n$ symbols. By the last
proposition, $H$ is closed under composition and under inversion. So $H$ is a
subgroup of $S_n$. To find its index in $S_n$ fix any transposition $\tau \in S_n$. (Since
$n > 1$, such a transposition exists.) Then $\tau \notin H$. We claim that $H$ and $\tau H$
are the only left cosets of $H$ in $S_n$. This amounts to showing that $\tau H$ equals
the set of all odd permutations in $S_n$. For this, if $\sigma \in H$ then $\tau \sigma$ is odd by
the last proposition. Conversely, if $\theta$ is an odd permutation then $\tau \theta$ is even,
again by the last proposition. But $\theta = \tau(\tau \theta)$. So $\theta \in \tau H$. Thus $H$ has index
2 in $S_n$. From Theorem (2.9), $H$ is normal in $S_n$. (A direct proof of
normality is also easy, using the last proposition.) ∎
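The index computation can be illustrated numerically. The sketch below decides parity by counting inversions (legitimate by Theorem (4.7)), checks that exactly half of $S_n$ is even for a few small $n$, and confirms that the coset $\tau H$ in $S_4$ is the set of odd permutations. The helper names are assumptions of the sketch.

```python
from itertools import combinations, permutations
from math import factorial

def compose(s, t):
    """Return s o t: apply t first, then s."""
    return tuple(s[t[i] - 1] for i in range(len(t)))

def is_even(p):
    """Parity via the number of inversions, as in Theorem (4.7)."""
    return sum(1 for i, j in combinations(range(len(p)), 2) if p[i] > p[j]) % 2 == 0

for n in range(2, 7):
    evens = [p for p in permutations(range(1, n + 1)) if is_even(p)]
    assert len(evens) == factorial(n) // 2      # half of S_n is even

S4 = list(permutations(range(1, 5)))
H = {p for p in S4 if is_even(p)}
tau = (2, 1, 3, 4)                              # the transposition (12)
coset = {compose(tau, h) for h in H}
print(coset == set(S4) - H)                     # True
```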
4.11 Definition: The group of all even permutations of $n$ symbols is called
the alternating group of degree $n$ and denoted by $A_n$.
For $n = 1$, $A_n = S_n$. For $n > 1$, $o(A_n) = \frac{1}{2} o(S_n) = \frac{n!}{2}$ by the
proposition. Thus $A_2$ is the trivial group. $A_3$ is a group of order 3. It is the cyclic
group generated by the 3-cycle (123). The group $A_4$ has order 12 and
because of its interesting structure deserves to be studied in detail. Its
elements can be classified as follows: (i) the identity element, (ii) eight 3-cycles
of the form (123) and (iii) three elements, each of which is a product of
two disjoint transpositions, namely, (12) (34), (13) (24) and (14) (23). It is
easy to see that if we take elements of type (i) and (iii) then these four
elements form a subgroup $K$ of $A_4$. Further $K$ is isomorphic to the Klein
group. (To see this, note that (12) (34) (13) (24) equals (14) (23) and
similarly the product of any two elements of type (iii) is the third element.)
We claim that $K$ is a normal subgroup of $A_4$. Curiously it is easier to show
that $K$ is normal in the larger group $S_4$. For this we consider $N(K)$, the
normaliser of $K$ in $S_4$ (cf. Exercise (2.29)). We have to show that $N(K) = S_4$.
In view of Theorem (4.5), it suffices to show that every transposition is in
$N(K)$. Take a typical transposition $\tau = (12)$. Then $\tau e \tau^{-1} = e \in K$ where
$e$ is the identity permutation. Also by direct calculation we see that
$\tau (12)(34) \tau^{-1} = (12)(12)(34)(12) = (34)(12) = (12)(34) \in K$ and similarly
$\tau (13)(24) \tau^{-1} = (12)(13)(24)(12) = (14)(23) \in K$ and $\tau (14)(23) \tau^{-1} = (13)(24) \in K$.
So $\tau K \tau^{-1} = K$, that is $\tau \in N(K)$. Similarly every transposition
is in $N(K)$. So $N(K) = S_4$, which means $K$ is normal in $S_4$ and a fortiori
in $A_4$.
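The normality of $K$ in $S_4$ can also be confirmed by conjugating $K$ by every element of $S_4$, rather than only by the transpositions. This is a brute-force sketch under the one-line representation; the helper names are assumptions.

```python
from itertools import permutations

def compose(s, t):
    """Return s o t: apply t first, then s."""
    return tuple(s[t[i] - 1] for i in range(len(t)))

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p, start=1):
        inv[j - 1] = i
    return tuple(inv)

e = (1, 2, 3, 4)
a, b, c = (2, 1, 4, 3), (3, 4, 1, 2), (4, 3, 2, 1)   # (12)(34), (13)(24), (14)(23)
K = {e, a, b, c}

# Klein group behaviour: every element is its own inverse, and a * b = c.
print(all(compose(k, k) == e for k in K), compose(a, b) == c)   # True True

S4 = list(permutations(range(1, 5)))
normal = all({compose(compose(g, k), inverse(g)) for k in K} == K for g in S4)
print(normal)                                                    # True
```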
The alternating group $A_4$ provides certain interesting counterexamples.
First we show it contains no subgroup of order 6, which would prove that
the converse of Lagrange’s theorem is not true in general. Suppose, if
possible, $L$ is a subgroup of $A_4$ with $o(L) = 6$. Then $L$ has index 2 in $A_4$
and so $L$ is a normal subgroup of $A_4$ by Theorem (2.9). Now by
Exercise (1.16), $L$ must contain at least one element of order 2. But the
only elements of order 2 in $A_4$ are the three elements of type (iii) listed
above. So $L$ must contain at least one of them. Without loss of
generality we may suppose that $(12)(34) \in L$. Let $\tau = (123)$. Then $\tau \in A_4$
and $\tau^{-1} = (132)$. By direct calculation we see that $\tau (12)(34) \tau^{-1} = (123)(12)(34)(132) = (14)(23)$.
But $L$ is normal in $A_4$. So $(14)(23) \in L$. Hence
(13) (24), which is the product of (12) (34) and (14) (23), is also in $L$. In
other words $L$ contains the subgroup $K$. But this contradicts Lagrange's
theorem because $o(K) = 4$, $o(L) = 6$ and 4 does not divide 6. This
contradiction shows that $A_4$ cannot contain a subgroup of order 6, even though
6 is a divisor of $o(A_4)$.
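Since $A_4$ has only 12 elements, the absence of a subgroup of order 6 can also be confirmed by exhaustive search. The sketch below relies on the fact that a non-empty finite subset of a group that is closed under the product is a subgroup; the helper names are assumptions, and the search is a check on the argument above, not a replacement for it.

```python
from itertools import combinations, permutations

def compose(s, t):
    """Return s o t: apply t first, then s."""
    return tuple(s[t[i] - 1] for i in range(len(t)))

def is_even(p):
    return sum(1 for i, j in combinations(range(len(p)), 2) if p[i] > p[j]) % 2 == 0

def is_closed(subset):
    """A finite non-empty subset closed under products is a subgroup."""
    return all(compose(x, y) in subset for x in subset for y in subset)

e = (1, 2, 3, 4)
A4 = [p for p in permutations(range(1, 5)) if is_even(p)]
others = [p for p in A4 if p != e]

# Every candidate subgroup of order 6 consists of e and five other elements.
order6 = [set(c) | {e} for c in combinations(others, 5) if is_closed(set(c) | {e})]
print(len(A4), len(order6))   # 12 0
```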
Another interesting counter-example given by $A_4$ is regarding the
normality of a subgroup. As noted above $K$ is a normal subgroup of $A_4$. Let
$M = \{e, (12)(34)\}$. Then $M$ is a subgroup of $K$ and is normal in $K$ since $K$
is abelian (alternatively, $M$ has index 2 in $K$ and hence is normal in $K$). So
we have $K$ as a normal subgroup of $A_4$ and $M$ as a normal subgroup of $K$.
Still $M$ is not a normal subgroup of $A_4$! Indeed, as we saw above, the
element (14) (23) is a conjugate of (12) (34) by an element of $A_4$ (namely
the 3-cycle (123)). If $M$ were normal in $A_4$ then it would have to contain
all the conjugates of its elements. This example can be interpreted as
saying that ‘being a normal subgroup’ is not a transitive relation on the
set of all subgroups of a group.
The group $A_4$ also has an interesting geometric interpretation. In
Section 1 we considered the group of isometries of a regular tetrahedron.
If we label the four vertices of it as 1, 2, 3, 4 then every permutation of
$\{1, 2, 3, 4\}$ is an isometry. So the group of isometries is the entire group $S_4$.
However, some of these isometries do not preserve orientation. In terms
of even and odd permutations, it is very easy to tell which isometries are
orientation preserving and which are orientation reversing. Consider a
transposition, say, (34). This represents the reflection in the plane passing
through the vertices 1 and 2 and the midpoint of the edge joining 3 and 4
(see Figure 5.2). Such a reflection reverses orientation. Similarly every
other transposition is orientation reversing. The composite of two orientation
preserving or two orientation reversing isometries is orientation preserving
while the composite of an orientation preserving and an orientation reversing
isometry is orientation reversing. These facts combined with Proposition
(4.2) imply that the group of orientation preserving isometries of a regular
tetrahedron is precisely $A_4$. It is easy to see the geometric transformations
represented by various elements of $A_4$. The identity element of course
represents the identity transformation. A typical element of type (ii), say, the
3-cycle (234) represents a rotation around the altitude of the tetrahedron
passing through the vertex 1. A typical element, say, (12) (34) of type
(iii) represents a 180° rotation around the axis passing through the
mid-points of the edges 12 and 34.
The alternating groups of higher degree (that is, $A_n$ for $n > 4$) can also
be interpreted as groups of orientation preserving isometries of higher
dimensional figures. Such figures are called simplexes. A two dimensional
simplex is a triangle; a three dimensional simplex is a tetrahedron. However,
we shall not pursue this line further. Instead, we turn to a property for
which these groups are most famous.
4.12 Definition: A group $G$ is called simple if it has no non-trivial,
proper normal subgroups, that is, no normal subgroup other than $\{e\}$ and
$G$.
A group of prime order is simple because it not only has no proper
normal subgroups, it has no proper, non-trivial subgroups at all, by
Lagrange’s theorem. However, these are trivial examples. When we look for
non-trivial examples of simple groups, abelian groups are no help. Let $G$ be an
abelian group. Let $x$ be any element of $G$ other than the identity and let
$H = (x)$, the subgroup generated by $x$. If $H \ne G$, then $H$ is a proper normal
subgroup of $G$ (since $G$ is abelian). On the other hand if $H = G$, then $G$
is cyclic and so has a non-trivial proper subgroup except when its order is a prime.
(See Theorem (2.17). If $G$ is an infinite cyclic group generated by $x$ then
the subgroup $(x^2)$ is a proper subgroup.)
So, to get non-trivial examples of simple groups, we have to turn to
non-abelian groups. Using a case-by-case argument, it can be shown that no
group of order 59 or less is simple, except when its order is a prime. The
group $A_5$ of order 60 is simple as we now show. This group is, therefore,
the smallest non-trivial, simple group. Incidentally, the adjective ‘simple’ in
the definition above does not mean that the structure of a simple group is
very easy to determine. On the contrary, characterising and classifying
simple groups has been one of the most formidable problems in group
theory. The term ‘simple’ is used by an analogy with other algebraic
structures (such as simple rings, simple algebras which we shall not study).
If a group $G$ has a proper normal subgroup $N$ then the structure of $N$ and
of the quotient group $G/N$ give some clue about the structure of $G$ (although,
as we saw earlier, they do not completely determine it). In case of a simple
group no such simplification is possible. So ‘simple’ here simply means
something which cannot be further simplified and not something which is
very easy to handle.
4.13 Theorem: $A_5$, the alternating group of degree 5, is simple.
Proof: Let $N$ be a normal subgroup of $A_5$ other than the identity
subgroup. We have to show that $N$ is the entire group $A_5$. The proof will
proceed in three steps. First we shall show that $N$ must contain at least
one 3-cycle. Then we shall show that $N$ contains all 3-cycles. And finally
we shall prove $N = A_5$.
We begin by listing the elements of $A_5$ other than the identity element.
Keeping in mind that $A_5$ consists of all even permutations of the set $\{1, 2,
3, 4, 5\}$ and Theorem (4.2), elements of $A_5$ (other than the identity) fall
into 3 categories:
(i) 3-cycles, that is, elements of the form (123). There are 20 such
elements. (Ostensibly there are 60 3-cycles in a set with 5 elements,
but every 3-cycle can be represented in 3 ways.)
(ii) Products of two mutually disjoint transpositions, that is, elements
of the form (12) (34). In all there are 15 such elements. (Prove!)
(iii) 5-cycles, that is, elements of the form (12345). There are 24 such
elements.
The first step of our proof is to show that $N$ contains at least one
element of type (i). Since $N$ is non-trivial, it suffices to show that
whenever $N$ contains an element of type (ii) or type (iii) then it must also
contain an element of type (i). We do these two separately. First suppose,
without loss of generality, that $(12)(34) \in N$. Let $\theta = (12)(45)$. Then $\theta$
is an even permutation. So $\theta \in A_5$. Also $\theta^{-1} = \theta = (45)(12)$. Since
$(12)(34) \in N$ and $N$ is normal in $A_5$, $\theta (12)(34) \theta^{-1}$ is in $N$. By a direct
calculation, this element comes out as (12) (35). So $(12)(35) \in N$. But $N$
is also closed under multiplication. So $(12)(34)(12)(35) \in N$. This element
is precisely the 3-cycle (354). So $N$ contains a 3-cycle. In the second case,
suppose $N$ contains an element of type (iii) which, without loss of
generality, may be taken as the 5-cycle (1 2 3 4 5). Let $\sigma = (23)(45)$. Then $\sigma \in A_5$
and $\sigma (12345) \sigma^{-1}$ comes out as (13254) which is in $N$ by normality. But
then (12345) (13254), which equals (142), is also in $N$. Thus we have shown
that in all cases $N$ must contain at least one 3-cycle, completing the first
step.
Now in the second step we have to show that N contains every 3-cycle,
that is, all elements of type (i). For this, it suffices to show that any two
3-cycles are conjugate to each other in A_5, for then, by normality, whenever
N contains any one 3-cycle it will contain all 3-cycles. Let σ, τ be two
3-cycles. For proving that they are conjugate in A_5, the fact that conjugacy
is an equivalence relation (see Proposition (1.16)) will be very useful. If
σ = τ then they are conjugate to each other by reflexivity. So we suppose
σ ≠ τ. First consider the case where σ and τ consist of the same three
symbols. Then they can be written down in the form typified by σ = (123)
and τ = (132). Take θ = (23)(45). Then τ equals θσθ^{-1}, proving that σ is
conjugate to τ in A_5. Next, suppose that σ and τ have two symbols in
common. Then we can suppose without loss of generality that σ = (123),
τ = (124) or σ = (123), τ = (214). In the first case let λ = (345). Then
τ = λσλ^{-1}. In the second case τ = μσμ^{-1} where μ = (12)(34). In either
case σ, τ are conjugate to each other in A_5. So two 3-cycles which differ in
one symbol (and possibly in the order of the common symbols) are
conjugate to each other. This fact, used repeatedly along with transitivity of
the conjugacy relation, implies that any two 3-cycles are conjugate to each
other in A_5. As noted before, this completes the second step of our proof,
namely, that the subgroup N contains all 3-cycles.
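The conclusion of the second step can also be checked exhaustively. The sketch below is a brute-force check rather than the case analysis used in the proof, and its names are our own. It conjugates one fixed 3-cycle by every element of A_5 and compares the result with the set of all 3-cycles. Permutations act on {0, ..., 4} here for convenience.

```python
from itertools import permutations

def mul(f, g):
    """Product fg on {0, ..., 4}, with g applied first."""
    return tuple(f[g[x]] for x in range(len(g)))

def inv(f):
    out = [0] * len(f)
    for x, y in enumerate(f):
        out[y] = x
    return tuple(out)

def is_even(p):
    # Parity of a permutation equals the parity of its number of inversions.
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inversions % 2 == 0

S5 = list(permutations(range(5)))
A5 = [p for p in S5 if is_even(p)]
# A permutation that moves exactly three points is necessarily a 3-cycle.
three_cycles = {p for p in S5 if sum(p[x] != x for x in range(5)) == 3}

start = min(three_cycles)
conjugates = {mul(mul(s, start), inv(s)) for s in A5}
print(len(A5), len(three_cycles), conjugates == three_cycles)
```

The last value printed should be True: the conjugates of a single 3-cycle under A_5 already exhaust all 20 of them.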
For the third step, we must now show that the set of all 3-cycles
generates the entire group A_5. For this, it suffices to show that all elements
of type (ii) and (iii) above can be expressed as products of 3-cycles. This is
very easy. A typical element of type (ii), say, (12)(34) equals (132)(134)
while a typical element of type (iii), say, (12345) equals (123)(345).
We have thus shown that the normal subgroup N must be the entire A_5
unless N = {e}. So A_5 is simple. ∎
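For the third step, here is a small computational check. It is a sketch under our own conventions: permutations are tuples of images, the right-hand factor is applied first, and the helper names are hypothetical. It verifies the two factorisations quoted above and, independently, closes the set of 3-cycles under multiplication to see how large a subgroup they generate.

```python
from itertools import permutations

def cycles(*cs, n=5):
    """Permutation of {1, ..., n} built from disjoint cycles, as a tuple of images."""
    img = list(range(1, n + 1))
    for c in cs:
        for a, b in zip(c, c[1:] + c[:1]):
            img[a - 1] = b
    return tuple(img)

def mul(f, g):
    """Product fg, with g applied first and then f."""
    return tuple(f[g[x] - 1] for x in range(len(g)))

identity = cycles()
three_cycles = {cycles(t) for t in permutations(range(1, 6), 3)}

# In a finite group, closing under multiplication by the generators
# yields the subgroup they generate.
generated = {identity}
frontier = [identity]
while frontier:
    g = frontier.pop()
    for c in three_cycles:
        h = mul(c, g)
        if h not in generated:
            generated.add(h)
            frontier.append(h)

type_ii_ok = mul(cycles((1, 3, 2)), cycles((1, 3, 4))) == cycles((1, 2), (3, 4))
type_iii_ok = mul(cycles((1, 2, 3)), cycles((3, 4, 5))) == cycles((1, 2, 3, 4, 5))
print(len(three_cycles), len(generated), type_ii_ok, type_iii_ok)
```

The generated set should have 60 elements, the order of A_5.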
For n > 5 also, A_n is a simple group. (The proof is similar to the one
for A_5 but a little more elaborate.) However, the result above will be good
enough for our purpose. The groups A_1, A_2 and A_3 are also simple
(although trivially so). However, A_4 is not simple because we saw that it has
a normal subgroup K of order 4. In this sense, the group A_4 behaves
exceptionally among all alternating groups.
A few other standard results about permutation groups will be given as
exercises.
Exercises
4.1 In S_n let H = {σ ∈ S_n : σ(n) = n}. Prove that H is isomorphic to
S_{n-1}. This way we regard S_{n-1} as a subgroup of S_n. Prove that when
so done, S_{n-1} ∩ A_n equals A_{n-1}. Is S_{n-1} a normal subgroup of S_n?
4.2 In S_n prove that the number of distinct r-cycles is
n(n - 1)...(n - r + 1)/r.
4.3 Suppose (i_1 i_2 ... i_r) is an r-cycle in S_n. Let σ be any permutation in
S_n. Suppose j_k = σ(i_k) for k = 1,..., r. Prove that σ(i_1 i_2 ... i_r)σ^{-1}
is the r-cycle (j_1 ... j_r).
4.4 Suppose two elements, say, τ and θ are conjugate to each other in
S_n. Prove that for every r = 1,..., n the number of r-cycles in the
cyclic decompositions of τ and θ are the same.
*4.5 Prove that the converse of the last result is also true. (Hint: Write
down the complete cyclic decompositions of τ and θ, in descending
order of lengths. Using Exercise (4.3), find a permutation σ such
that τ = σθσ^{-1}.)
4.6 Prove that the number of distinct conjugacy classes in S_n equals
p(n), the number of partitions of the integer n (see comments after
Proposition (23.15)).
4.7 Let θ = (1, 2,..., n) ∈ S_n. Prove that θ commutes only with its
own powers. (Hint: Consider the number of elements conjugate to
θ and apply Theorem (2.20).)
4.8 Prove that for n > 2, the group S_n has a trivial centre.
4.9 Prove that for all n ≥ 3, A_n is generated by all 3-cycles. (This is,
in fact, the third step in the proof of Theorem (4.13) for n = 5.
Instead of listing all elements of A_n, a simpler proof can be based
upon the fact that every element of S_n can be written as a product
of transpositions of the form (1, i), i = 2,..., n.)
4.10 In fact, show that A_n, for n ≥ 3, can be generated by the (n - 2)
3-cycles of the form (1, 2, i), i = 3,..., n. (Hint: Express (1, r, s)
in terms of such cycles.)
4.11 Prove that S_n can be generated with just two elements, say by (1, 2)
and (1, 2, 3,..., n).
4.12 Prove that K is the only normal subgroup of A_4 other than {e}
and A_4.
4.13 Prove that the group of orientation preserving isometries of a cube
is a group of order 24. Describe its elements geometrically as
permutations of the set of its 8 vertices.
4.14 Prove that the group in the last exercise is in fact isomorphic to
S_4. (Hint: Note that every isometry of the cube must permute its
diagonals among themselves.)
4.15 If N is a normal subgroup of S_5, prove that N = {e}, A_5 or S_5.
(Hint: Consider N ∩ A_5.)
4.16 A group G is said to be solvable if there exists a finite sequence
of subgroups G = N_0 ⊃ N_1 ⊃ N_2 ⊃ ... ⊃ N_{r-1} ⊃ N_r = {e} such
that, for each i = 1,..., r, N_i is a normal subgroup of N_{i-1} and the
quotient group N_{i-1}/N_i is abelian. Prove that all abelian groups,
all dihedral groups D_n, the quaternion group Q and the groups
S_1, S_2, S_3, S_4 are solvable but the groups A_5 and S_5 are not
solvable.
4.17 Prove that a subgroup of a solvable group is solvable. Prove also
that every quotient group of a solvable group is solvable.
4.18 Prove that for n > 5, A_n and S_n are not solvable.
4.19 Let H be a normal subgroup of a group G. Prove that if H and
G/H are solvable, so is G. (Hint: Use Corollary (3.8).)
4.20 Let G be a group of order p^m where p is a prime. Prove that G
contains subgroups G_0 ⊂ G_1 ⊂ ... ⊂ G_m = G with o(G_i) = p^i,
i = 0,..., m and G_{i-1} normal in G_i. In particular G is solvable.
[Hint: Apply induction on m and Theorem (2.22). Consider G/H
where o(H) = p and H is a subgroup of the centre of G.]
*4.21 In S_9, let θ and σ be respectively the permutations (147)(258)
(369) and (456)(798). Let G be the subgroup of S_9 generated by
θ and σ. Prove that G is a non-abelian group of order 27 in which
every element except the identity has order 3. (Hint: Interpret θ
and σ in terms of Z_3.)
4.22 Prove that the group Z_3 × Z_3 × Z_3 also has 1 element of order 1
and 26 elements of order 3 but that it is not isomorphic to the
group G of the last exercise.
*4.23 Prove that A_n is simple for all n > 5. (Hint: Apply induction on n.)
4.24 Prove that the order of a permutation (as an element of the
permutation group) equals the least common multiple of the
lengths of its cycles.
Notes and Guide to Literature
The permutation groups have been studied for a long time, long before
‘abstract’ groups were defined. The cyclic decomposition of a permutation
is needed in what is called ‘Polya’s theory of counting’.
The term ‘solvable’ defined in Exercise (4.16) may appear strange. Its
justification will come later¹ where it will be related to the solubility
of a polynomial equation. The fact that S_n is not solvable for n ≥ 5 will
be used to give a proof, due to Galois, that it is impossible to give a
formula for the roots of a general polynomial of degree ≥ 5.
Exercise (4.21) is adapted from Carmichael [1], where the reader will
also find more about permutation groups. The significance of Exercises
(4.21) and (4.22) is noteworthy. If two groups G and H are isomorphic then,
trivially, for every positive integer n, the number of elements of order n in
G is the same as the number of elements of order n in H. The two exercises
show that the converse is false. In other words, in order to check whether
two groups are isomorphic it will not suffice to compare the numbers of
elements of each order. This illustrates the remark made in the last section,
that in general there is no efficient criterion to test whether two given
groups are isomorphic or not. However, for finite abelian groups, it turns
out that the converse does hold. The theory of finite abelian groups is
therefore considerably simpler than that of all finite groups.
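The phenomenon described in Exercises (4.21) and (4.22) is easy to observe numerically. The sketch below is our own illustration and does not replace the proofs asked for in the exercises. It generates the group G of Exercise (4.21) inside S_9, builds Z_3 × Z_3 × Z_3, and tabulates the orders of the elements of both. The helper names cycles, mul, add and order are hypothetical.

```python
from collections import Counter
from itertools import product

def cycles(*cs, n=9):
    """Permutation of {1, ..., n} built from disjoint cycles, as a tuple of images."""
    img = list(range(1, n + 1))
    for c in cs:
        for a, b in zip(c, c[1:] + c[:1]):
            img[a - 1] = b
    return tuple(img)

def mul(f, g):
    """Product fg, with g applied first and then f."""
    return tuple(f[g[x] - 1] for x in range(len(g)))

def add(u, v):
    """Coordinatewise addition modulo 3."""
    return tuple((a + b) % 3 for a, b in zip(u, v))

def order(x, op, identity):
    """Smallest k >= 1 such that x combined with itself k times gives the identity."""
    k, y = 1, x
    while y != identity:
        y = op(y, x)
        k += 1
    return k

theta = cycles((1, 4, 7), (2, 5, 8), (3, 6, 9))
sigma = cycles((4, 5, 6), (7, 9, 8))
e = cycles()

# Subgroup of S_9 generated by theta and sigma.
G = {e}
frontier = [e]
while frontier:
    g = frontier.pop()
    for s in (theta, sigma):
        h = mul(s, g)
        if h not in G:
            G.add(h)
            frontier.append(h)

H = list(product(range(3), repeat=3))
zero = (0, 0, 0)

orders_G = Counter(order(g, mul, e) for g in G)
orders_H = Counter(order(h, add, zero) for h in H)
G_abelian = all(mul(a, b) == mul(b, a) for a in G for b in G)
H_abelian = all(add(a, b) == add(b, a) for a in H for b in H)

print(len(G), dict(orders_G), G_abelian)
print(len(H), dict(orders_H), H_abelian)
```

Both groups should report 27 elements with identical order statistics, while only the second is abelian, so the order statistics alone cannot distinguish them.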
The alternating group A_5 is the ‘smallest’ non-abelian simple group, in
the sense of order. There are groups, other than A_n (n > 5), which are
simple. The search for and the classification of simple groups has occupied
group theorists for several decades. See Gorenstein [1] for a recent
comprehensive work on the subject.
¹ See the Epilogue.
Six
Rings, Fields and Vector Spaces
In the last chapter we studied, rather thoroughly, groups, which are
algebraic structures that arise from a single binary operation on a set. In
this chapter we study algebraic structures which are richer than groups,
because they arise by putting certain additional structures on a group
(actually an abelian group). The first such structure is a ring, in which we
have one more binary operation on an abelian group. These will be defined
and studied in the first section. The foremost example of a ring is the
familiar ring of integers and in the second section we shall study a class
of rings which share some of the pleasant properties of the ring of integers.
Fields are a very important type of rings. Field theory has several interest-
ing applications (see the Epilogue) and the present chapter will develop
the machinery needed.
The other algebraic structure that arises from an abelian group is that
of a vector space, which is obtained by specifying a rule whereby elements
of the group are multiplied by elements of some field, often called scalars.
They will be studied in Section 3. The theory of vector spaces is a generali-
sation of certain aspects of the euclidean spaces. Homomorphisms of vector
spaces are called linear transformations. They can be represented most
conveniently, by matrices. The theory of matrices (along with determinants)
will be briefly developed in Section 4.
1. Basic Concepts and Examples
Just as the definition of an abstract group was modelled after the
permutation groups, some familiar rings, notably the ring of integers,
serve as the model for the definition of a ring. On Z, the set of integers,
we have the binary operations + and ·. Of course Z has a lot of
other structure, for example, that imposed by the usual order relation <.
But from the point of view of algebra, the binary operations are most
important. Under addition, Z is an abelian group, in fact a cyclic group,
and we already abstracted this structure in the last chapter. If we now
include the other binary operation, namely, multiplication, on Z, then we
get an example of what is known as a ring. In order to define an abstract
ring, we take an abelian group (R, +) and suppose that there is another
binary operation · on R. The properties that we want to assume about ·
must come from those which are true for the usual multiplication on Z.
And once again we have to strike a balance between depth and generality.
As with group structures, associativity of · turns out to be vital, while its
commutativity is, although desirable, not indispensable. Moreover, we
must assume some inter-relationship between the two operations + and ·.
For, without it, the study of rings would merely amount to two separate
studies, one of the group (R, +) and one of the semigroup (R, ·).
These considerations motivate the following definitions:
1.1 Definition: A ring is a triple of the form (R, +, ·), where (R, +) is an
abelian group, (R, ·) is a semigroup and · is distributive over +. If further, · is
commutative, the ring is called commutative. If · has an identity element
(almost always denoted by 1), then the ring is said to have an identity.
As a foremost example, Z, the set of integers with the usual addition
and multiplication, is a ring. In fact, it is a commutative ring and also has
an identity element. The sets of real numbers, rational numbers and complex
numbers, each with the usual operations, are also commutative rings with
identity elements. Many other examples will be given soon. But first we
comment upon the definition. The definitions of a ring and of a commutative
ring are standard. As for the identity element, some authors require that it
be different from the zero element. As with Boolean algebras, it can be shown
that if 0 = 1 then the ring consists of only one element. Such a ring is called
a trivial ring. For every positive integer m, Z_m, the set of residue classes
modulo m, is a commutative ring with identity under modulo m addition
and multiplication. All the ring axioms follow from the corresponding
properties of Z (see Chapter 3, Section 4).
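For small m the ring axioms for Z_m can also be confirmed by exhaustive checking. The sketch below is an illustration under our own conventions: residue classes are represented by the integers 0, ..., m-1, and the function name is hypothetical. It tests every axiom of Definition 1.1 together with commutativity and the presence of an identity. It is not a substitute for the argument via the properties of Z, which covers all m at once.

```python
from itertools import product

def is_commutative_ring_with_identity(m):
    """Brute-force check of the ring axioms for the residues 0, ..., m-1 modulo m."""
    R = range(m)
    add = lambda a, b: (a + b) % m
    mul = lambda a, b: (a * b) % m
    one = 1 % m  # coincides with 0 when m = 1, the trivial ring
    for a, b, c in product(R, repeat=3):
        if add(add(a, b), c) != add(a, add(b, c)):        # + is associative
            return False
        if mul(mul(a, b), c) != mul(a, mul(b, c)):        # · is associative
            return False
        if mul(a, add(b, c)) != add(mul(a, b), mul(a, c)):  # left distributivity
            return False
        if mul(add(a, b), c) != add(mul(a, c), mul(b, c)):  # right distributivity
            return False
    for a, b in product(R, repeat=2):
        if add(a, b) != add(b, a) or mul(a, b) != mul(b, a):
            return False
    for a in R:
        if add(a, 0) != a or mul(a, one) != a:            # identities
            return False
        if not any(add(a, b) == 0 for b in R):            # additive inverses
            return False
    return True

results = {m: is_commutative_ring_with_identity(m) for m in range(1, 13)}
print(results)
```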
Before giving other examples of rings, let us prove a few simple
properties about rings. In any ring, the underlying abelian group will always
be denoted additively and its identity element by 0. The additive inverse of
an element x will be denoted by -x. As usual, when the multiplication
operation is denoted by ·, a·b will be denoted by ab. As with Boolean
algebras, addition will be a deeper operation than multiplication. This means,
an expression like ab + c will be interpreted as (a·b) + c and not as a·(b + c).
The first result establishes the expected behaviour (which we generally
take for granted because of over-familiarity) of the additive identity and
inverses w.r.t. multiplication.
1.2 Proposition: Let R be a ring. Then
(i) x·0 = 0·x = 0 for all x ∈ R
(ii) (-x)y = x(-y) = -(xy) and (-x)(-y) = xy for all x, y ∈ R
(iii) if R has an identity then -x = (-1)x = x(-1) for all x ∈ R
(iv) if R has an identity 1 and 1 = 0, then R has only one element.
Proof:
(i) Let x·0 be a. Now, x(0 + 0) equals, by left distributivity,
x·0 + x·0 or a + a. But 0 + 0 = 0. So x(0 + 0) = x·0 = a. Thus
a + a = a and hence a + a = a + 0. Since (R, +) is a
group, this implies a = 0. That is, x·0 = 0. Similarly, using
the other distributive law, 0·x = 0 for all x ∈ R.
(ii) -(xy) is, by definition, the additive inverse of xy. To show
that it equals (-x)y amounts to showing that (-x)y + xy = 0.
But (-x)y + xy = (-x + x)y = 0·y = 0 by (i). Similarly
x(-y) + xy = x(-y + y) = x·0 = 0 by (i), showing that
x(-y) = -(xy). Since the equation x(-y) = -(xy) holds for
all x, y it holds if we replace x by -x, giving (-x)(-y) =
-[(-x)y]. Call -x as z. Then -[(-x)y] is -(zy) which equals
(-z)y. But -z is simply x. So (-x)(-y) = xy.
(iii) We have 1 + (-1) = 0. So x(1 + (-1)) = x·0 = 0 by (i). By
distributivity, the left hand side equals x·1 + x(-1), that is,
x + x(-1). Thus x + x(-1) = 0, which means x(-1) is the
additive inverse of x. So x(-1) = -x. Similarly (-1)x = -x.
For (iv), if 1 = 0, then x = x·1 = x·0 = 0 for all x ∈ R. So
R = {0}. ∎
Note the vital use of the distributive law made throughout the proof.
In its absence, we can expect nothing about the multiplicative properties of
the additive identities and inverses. We proved a result similar to (i) for
Boolean algebras (see Theorem (4.1.3)). But there are two important
differences. First, in a ring, we do not have the symmetry between + and ·
which holds in a Boolean algebra. The dual result x + 1 = 1 is in fact false
except in the trivial ring. Also the proof of (i) differs from that of the
corresponding result for Boolean algebras. In the case of a ring, the operation
+ is fairly strong algebraically. To some extent, complements in a Boolean
algebra play the role of additive inverses in a ring. This similarity should
not, of course, be stretched too far. For example, (a + b)′ does not equal
a′ + b′ but equals a′b′ in a Boolean algebra.
There is, nevertheless, a close relationship between Boolean algebras
and certain types of rings defined below:
1.3 Definition: A ring R with identity is called a Boolean ring if x^2 = x
for all x ∈ R.
As the name suggests, there must be some intimate relationship between
Boolean algebras and Boolean rings. There is indeed one and to establish
it we first prove a result which is interesting in its own right.
1.4 Proposition: A ring R in which x^2 = x for all x ∈ R is commutative.
Moreover, in such a ring x + x = 0 for all x ∈ R. (In other words x = -x
and consequently the plus and minus signs can be used interchangeably in
such rings.)
Proof: Let x, y ∈ R. Then (x + y)^2 = x + y. On the other hand, we expand
(x + y)^2 using the distributive law as x^2 + xy + yx + y^2. (Note that we do
not know the ring to be commutative yet, so we cannot combine xy and yx
to write 2xy.) Since x^2 = x and y^2 = y we get x + y = x + xy + yx + y,
which further means xy + yx = 0 for all x, y ∈ R. In particular, putting
y = x, x^2 + x^2 = 0, that is, x + x = 0 for all x ∈ R, proving the second
part. Now to get the first assertion let x, y ∈ R. We already know xy + yx
= 0 and so xy = -yx. But because of what we just proved, -yx is the
same as yx. So xy = yx, that is, the ring is commutative. ∎
We remark that a ring in which x^3 = x for all elements is also
commutative. (The proof is left as an interesting exercise.) Actually a much more
general result is true, but it is not relevant to our discussion.
We are now ready to prove the relationship between Boolean algebras
and Boolean rings. To avoid confusion, for the purpose of this theorem
only, the addition in a ring will be denoted by ⊕, as + will be needed to
denote the addition in a Boolean algebra.
1.5 Theorem: Let (B, +, ·, ′) be a Boolean algebra. Define a binary
operation ⊕ on B by a ⊕ b = ab′ + ba′ for a, b ∈ B. Then (B, ⊕, ·) is a
Boolean ring. Conversely, let (R, ⊕, ·) be a Boolean ring. Define a binary
operation + on R by a + b = a ⊕ b ⊕ ab for a, b ∈ R and a unary
operation ′ : R → R by x′ = 1 ⊕ x for x ∈ R. Then (R, +, ·, ′) is a Boolean
algebra.
Proof: The first assertion is precisely the content of Exercise (4.1.19) and
will not be proved. We shall prove only the converse implication. Suppose
(R, ⊕, ·) is a Boolean ring and the operations +, ′ are defined as indicated.
Then · is commutative by the last proposition. Since ⊕ is also commutative
it follows that + is commutative. This verifies the first axiom of Boolean
algebras. As for the presence of identities, · is already given to have 1 as
its identity element. For +, 0 is the identity because for all a ∈ R, a + 0
= a ⊕ 0 ⊕ a0 which, in view of Proposition (1.2), equals a ⊕ 0 ⊕ 0, that
is a, since 0 is given to be the identity for ⊕. Thus another axiom of
Boolean algebras holds. We now prove that both · and + are distributive
over each other. Both the proofs are by direct computation. Let x, y, z ∈ R.
Then we have
    x·(y + z) = x·(y ⊕ z ⊕ yz)
              = xy ⊕ xz ⊕ xyz       (since · is distributive over ⊕)
              = xy ⊕ xz ⊕ x^2 yz    (since x^2 = x)
              = xy ⊕ xz ⊕ (xy)(xz)
                (since R is commutative by Proposition (1.4))
              = xy + xz
which proves distributivity of · over +. The other distributivity follows
from
    (x + y)(x + z) = (x ⊕ y ⊕ xy)(x ⊕ z ⊕ xz)
                   = x ⊕ xy ⊕ xy ⊕ xz ⊕ yz ⊕ xyz ⊕ xz ⊕ xyz ⊕ xyz
                     (since x^2 = x and · is commutative)
                   = x ⊕ yz ⊕ xyz   (since xy ⊕ xy = 0, xz ⊕ xz = 0 and
                                     xyz ⊕ xyz = 0)
                   = x + yz.
It only remains to verify the fourth axiom of a Boolean algebra, regarding
complementation. Let x ∈ R. We have to show x + x′ = 1 and xx′ = 0
where x′ = 1 ⊕ x. Now, by definition, xx′ = x(1 ⊕ x) = x ⊕ x^2 = x ⊕ x = 0
by Proposition (1.4). Also x + x′ = x ⊕ x′ ⊕ xx′ = x ⊕ x′ ⊕ 0 by what we
just proved. So x + x′ = x ⊕ x′ = x ⊕ 1 ⊕ x = x ⊕ x ⊕ 1 = 0 ⊕ 1 = 1,
again by Proposition (1.4). Thus we have verified that (R, +, ·, ′) is a
Boolean algebra. ∎
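Both directions of Theorem 1.5 can be seen concretely in the Boolean algebra of all subsets of a 3-element set. In the sketch below, which is our own illustration, subsets are encoded as the integers 0, ..., 7 (bit masks), so that + is bitwise OR, · is bitwise AND and ′ is complementation within the mask 7. All function names are hypothetical.

```python
N_BITS = 3
ONE = (1 << N_BITS) - 1          # the whole set, i.e. the element 1
B = range(ONE + 1)               # all subsets, as bit masks

def join(a, b):                  # a + b in the Boolean algebra
    return a | b

def meet(a, b):                  # a·b
    return a & b

def comp(a):                     # a'
    return ONE ^ a

def ring_add(a, b):
    """a ⊕ b = ab' + ba', as in the first half of the theorem."""
    return join(meet(a, comp(b)), meet(b, comp(a)))

# (B, ⊕, ·) behaves as a Boolean ring should.
is_symmetric_difference = all(ring_add(a, b) == a ^ b for a in B for b in B)
idempotent = all(meet(x, x) == x for x in B)
x_plus_x_is_zero = all(ring_add(x, x) == 0 for x in B)
distributive = all(
    meet(a, ring_add(b, c)) == ring_add(meet(a, b), meet(a, c))
    for a in B for b in B for c in B
)

# Converse: recover + and ' from ⊕ and ·.
def new_join(a, b):
    return ring_add(ring_add(a, b), meet(a, b))   # a ⊕ b ⊕ ab

def new_comp(x):
    return ring_add(ONE, x)                       # 1 ⊕ x

recovered = (all(new_join(a, b) == join(a, b) for a in B for b in B)
             and all(new_comp(x) == comp(x) for x in B))

print(is_symmetric_difference, idempotent, x_plus_x_is_zero, distributive, recovered)
```

In this model ⊕ is the symmetric difference of sets, and the operations recovered from the ring are again union and complement.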
This theorem is significant for several reasons. It provides us with some
examples of rings, because every Boolean algebra gives rise to a commuta-
tive ring with identity. Secondly, because of the converse statement in the
theorem, we get an equivalent formulation of the concept of a Boolean
algebra. (Another equivalent formulation, in terms of lattices, was given in
Chapter 4). We could as well work with Boolean rings instead of Boolean
algebras. Many algebraists actually do this. From their point of view, then,
Boolean algebras occupy only a corner in the world of all rings. However,
from the point of view of applications, this corner is an important one and
that is why we have devoted one whole chapter to it.
After this digression to Boolean algebras, let us return to the study of
rings in general. Because of property (i) in Proposition 1.2, the operation
of multiplication can never satisfy the cancellation laws except in the trivial
ring, because cancellation by 0 is impossible. In many rings, however,
cancellation by non-zero elements is possible. Such rings are important and
we give them a special name. To put the definition in the form in which it
is generally given, we introduce another term.
1.6 Definition: An element x in a ring R is called a zero-divisor if there
exists y ∈ R such that y ≠ 0 and either xy = 0 or yx = 0. (Depending
upon which of these equalities holds we may call x a left zero-divisor or
a right zero-divisor. However, since for the most part we shall study this
concept only for commutative rings, we may avoid the hair-splitting.)
The element 0 is of course a zero-divisor in any ring other than the
trivial ring. (From now onwards, we may not explicitly mention the trivial
exceptions that arise out of trivial rings. That is why some authors require,
as in the case of a Boolean algebra, that a ring should have at least two
elements.) In the ring of integers, there are no zero-divisors except 0. In a
ring obtained from a Boolean algebra, on the other hand, every element
except the identity is a zero-divisor.
The concept of a zero-divisor is intimately related to the cancellation law
as we see in the following proposition.
1.7 Proposition: Let R be a ring and x ∈ R. Then for all y, z ∈ R,
either of the equations xy = xz or yx = zx implies y = z if and only if x
is not a zero-divisor. In other words, cancellation by an element is possible
iff it is not a zero-divisor.
Proof: If x is a zero-divisor then there exists y ≠ 0 such that xy = 0
(or yx = 0). Take z = 0. Then xz = xy (or yx = zx) but y ≠ z, so
cancellation by x does not hold. Conversely, suppose x is not a zero-
divisor. Let y, z ∈ R and suppose xy = xz (the other case, yx = zx, is
similarly treated). Then xy - xz = 0. But by Proposition (1.2), -xz = x(-z).
So by distributivity once again, x(y - z) = 0. Since x is not a zero-divisor,
this forces y - z to vanish, i.e., y = z. ∎
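Proposition 1.7 can be watched in action in the finite rings Z_m. The sketch below is our own illustration with hypothetical function names. It computes the zero-divisors and the cancellable elements independently of each other and compares the two sets.

```python
def zero_divisors(m):
    """Elements x of Z_m with xy = 0 for some non-zero y."""
    return {x for x in range(m) if any((x * y) % m == 0 for y in range(1, m))}

def cancellable(m):
    """Elements x of Z_m for which xy = xz forces y = z."""
    return {
        x for x in range(m)
        if all(y == z
               for y in range(m) for z in range(m)
               if (x * y) % m == (x * z) % m)
    }

m = 12
print(sorted(zero_divisors(m)))
print(sorted(cancellable(m)))

# The two sets are complementary for every modulus tried.
complementary = all(
    cancellable(k) == set(range(k)) - zero_divisors(k) for k in range(2, 21)
)
print(complementary)
```

For m = 12 the cancellable elements should be exactly 1, 5, 7 and 11, the residues prime to 12.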
We are now ready to define the kind of rings where cancellation by
non-zero elements is possible.
1.8 Definition: A commutative ring with no non-zero zero-divisors is
called an integral domain or an entire ring.
Before giving examples of integral domains, we remark that there is a
little lack of unanimity about this definition. Some authors do not require
an integral domain to be commutative while some authors require it not
only to be commutative but also to have an identity element. The essence
of all the three definitions is the absence of zero-divisors. We shall stick to
the definition given above. As the foremost example of an integral domain
we have the ring of integers (which in fact justifies the word ‘integral’ in
the definition). Since we do not require an integral domain to have an
identity element, for every positive integer m, the set mZ of all integral
multiples of m is also an integral domain. Many other examples will be
given later on. In fact, the next section will deal with certain special types
of integral domains. For the moment we define a concept which is stronger
than that of an integral domain.
We saw in Chapter 3, Section 4, that for a binary operation, the existence
of an inverse is a stronger condition than cancellability. If we assume this
stronger condition, we get the following definition.
1.9 Definition: Let R be a commutative ring with identity. If every non-
zero element of R has a multiplicative inverse then R is called a field.
The most standard examples of fields are those of rational numbers, real
numbers and complex numbers. However, there are many other ‘abstract’
examples of fields. Indeed, the construction of fields with suitable
properties will occupy us for quite some time. Every field is an integral domain,
because if an element x has a multiplicative inverse x^{-1}, then x cannot be
a zero-divisor, since xy = 0 would imply x^{-1}xy = 0, i.e., 1·y = 0 or y = 0.
That the converse is false is proved by Z, the ring of integers. However,
for finite rings the converse does hold. To see this, we paraphrase the two
definitions slightly. Let R* be the set of all non-zero elements in a ring R.
To say that R has no non-zero zero-divisors is equivalent to saying that
R* is closed under multiplication. It is therefore a semigroup. If R is a
field then (R*, ·) is an abelian group while if R is an integral domain then
(R*, ·) is merely a commutative semigroup which obeys cancellation laws.
We proved in Proposition (5.1.4) that every finite semigroup satisfying
cancellation laws is a group. Using this proposition, we get the following
result.
1.10 Theorem: Every finite integral domain is a field.
As an example, let m be a positive integer and let Z_m be the ring of
residue classes modulo m. If m > 1 is not a prime then m can be expressed
as xy where 0 < x, y < m. Then [x] and [y] are non-zero elements of Z_m
whose product is [xy] which is the zero element of Z_m. Thus Z_m is not an
integral domain if m is a composite number. However, if m equals some
prime p, then, whenever xy is divisible by p, either x or y is divisible by p,
as will be proved later. So Z_p is an integral domain and hence a field by
the theorem above. (Alternatively, we could use Theorem (5.1.5) in which
it was proved that the non-zero residue classes modulo a prime form a
group. Both the arguments are, however, essentially the same because the
crux of both of them is Proposition (5.1.4).)
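The dichotomy just described can be tabulated for small moduli. The following sketch is a brute-force illustration with hypothetical function names, and it assumes nothing beyond the definitions of this section. For each m it checks independently whether m is prime, whether Z_m has no non-zero zero-divisors, and whether every non-zero element of Z_m is invertible.

```python
from math import isqrt

def is_prime(m):
    return m > 1 and all(m % d != 0 for d in range(2, isqrt(m) + 1))

def is_integral_domain(m):
    """True when Z_m has no non-zero zero-divisors."""
    return all((x * y) % m != 0 for x in range(1, m) for y in range(1, m))

def is_field(m):
    """True when every non-zero element of Z_m has a multiplicative inverse."""
    return all(any((x * y) % m == 1 for y in range(1, m)) for x in range(1, m))

table = {m: (is_prime(m), is_integral_domain(m), is_field(m)) for m in range(2, 31)}
for m, row in table.items():
    print(m, row)
```

In every row the three entries should agree, in line with Theorem 1.10 and the discussion above.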
Note that unlike the fields of rational, real and complex numbers, the
field Z_p, where p is a prime, is finite. It has p elements. Finite fields are
quite important in applications (see the Epilogue). For the time being we
only introduce an important concept about fields, or more generally, about
integral domains. We begin with a preliminary result.
1.11 Proposition: Let (R, +, ·) be an integral domain. Suppose for some
non-zero x ∈ R and some positive integer n, nx = 0. Then ny = 0 for all
y ∈ R.
Proof: From the last chapter, recall that nx means the sum x + ... + x
(n times). For any y ∈ R, n(xy) = (nx)y by distributivity. But by
commutativity n(xy) = x(ny). So if nx = 0, we get x(ny) = 0 for all y ∈ R.
Since R has no non-zero zero-divisors, this means ny = 0 as was to be
proved. ∎
1.12 Corollary: In any integral domain (R, +, ·) the order of every non-
zero element in the group (R, +) is the same. Moreover, this order, if
finite, is a prime.
Proof: By the last proposition, either all non-zero elements are of infinite
order or else all of them are of finite order. In the first case the assertion
holds. In the second case, let x, y be any two non-zero elements of R.
Suppose their orders in the group (R, +) are m, n respectively where m, n
are positive integers. Then by the last proposition, nx = 0 and so m divides
n (see Theorem (5.1.10)). Similarly my = 0 and so n divides m. But then
m = n. So all non-zero elements are of the same order, say n. If n is
composite, say, n = ij where i, j are proper divisors of n, then for any
non-zero x in R we have ix as a non-zero element of order j < n, a
contradiction. So n is a prime.
1.13 Definition: An integral domain in which every non-zero element is
of infinite order is said to be of characteristic 0. If every non-zero element
has order p (which must be a prime) then the integral domain is said to
be of characteristic p.
Thus the fields of real, complex and rational numbers, as well as the
ring of integers, have characteristic 0, while the field Z_p is of characteristic
p. Obviously, every field of characteristic 0 is infinite. Put differently,
every finite field is of prime characteristic. There do exist, however,
infinite fields of prime characteristic. The theory of the two types of fields
differs considerably. In the case of characteristic 0, in an equation nx = ny,
where n is a positive integer, we can cancel n and get x = y. If the
characteristic is a prime p, this cancellation will be valid only if n is not a
multiple of p.
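Corollary 1.12 is easy to observe in the rings Z_m. In the sketch below, our own illustration with hypothetical names, we compute the additive order of each non-zero residue. For a prime modulus all the orders coincide, whereas for a composite modulus, where Z_m is not an integral domain, they need not.

```python
def additive_order(x, m):
    """Order of x in the additive group of Z_m."""
    n, total = 1, x % m
    while total != 0:
        total = (total + x) % m
        n += 1
    return n

def nonzero_orders(m):
    """Set of additive orders of the non-zero elements of Z_m."""
    return {additive_order(x, m) for x in range(1, m)}

print(nonzero_orders(7))   # a field: a single common order
print(nonzero_orders(6))   # not an integral domain: several orders occur
```

In Z_7 every non-zero element has order 7, so Z_7 is of characteristic 7. In Z_6 the orders 2, 3 and 6 all occur.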
We remarked that in a ring R with no non-zero zero-divisors, the set
R* of all non-zero elements is a semigroup obeying both the cancellation
laws. In case of a field it is an abelian group. If we drop the commutativity
requirement, we get the following definition.
1.14 Definition: A ring with identity in which every non-zero element
has a multiplicative inverse is called a division ring.
The name is justified by the fact that in a division ring, division by
non-zero elements is possible, that is, equations of the form ax = b (as
well as those of the form xa = b) can be solved uniquely for x whenever
a ≠ 0. A division ring is just like a field except for the commutativity
requirement and is therefore also called a skew field or a non-commutative
field. (In the old literature, a division ring was called a field and what we
call a field was described as a commutative field.) Every field is a division
ring. A classic example of a division ring which is not a field is the ring of
quaternions. It is defined analogously to the group of quaternions and in
fact we shall denote it by the same symbol Q. As a set it consists of all
formal expressions of the form a_0 + a_1 i + a_2 j + a_3 k where a_0, a_1, a_2, a_3
are real numbers and i, j, k are some unknown symbols. Let
a = a_0 + a_1 i + a_2 j + a_3 k and b = b_0 + b_1 i + b_2 j + b_3 k be two
quaternions, that is, elements of Q. Then a + b is defined to be the
quaternion (a_0 + b_0) + (a_1 + b_1)i + (a_2 + b_2)j + (a_3 + b_3)k.
It is easily verified that (Q, +) is an abelian group. In fact (Q, +), as a
group, is isomorphic to the group ℝ^4 under coordinatewise addition. To
define multiplication of quaternions, we keep in mind the multiplication
table for the quaternion group and collect the coefficients of like terms. Thus
for a, b as above, a·b is defined as c_0 + c_1 i + c_2 j + c_3 k where
    c_0 = a_0 b_0 - a_1 b_1 - a_2 b_2 - a_3 b_3,
    c_1 = a_0 b_1 + a_1 b_0 + a_2 b_3 - a_3 b_2,
    c_2 = a_0 b_2 + a_2 b_0 + a_3 b_1 - a_1 b_3,
    c_3 = a_0 b_3 + a_3 b_0 + a_1 b_2 - a_2 b_1.
That (Q, +, ·) is a ring is a little tedious, but routine verification. The
element 1 + 0i + 0j + 0k (denoted by 1) is the identity. The definition of
quaternionic multiplication must have reminded the reader of the cross
product of two vectors in ℝ^3. There is actually a connection between the
two. In the third section (Exercise (3.26)) we shall give a definition of
quaternionic multiplication in terms of the dot and the cross product of
vectors. This definition will also make it easier to verify the ring axioms
for Q. If a = a_0 + a_1 i + a_2 j + a_3 k ∈ Q, then its norm N(a) is defined
as the real number a_0^2 + a_1^2 + a_2^2 + a_3^2. If a ≠ 0, it is clear that
N(a) > 0. For a non-zero quaternion a, its inverse is given by
b_0 + b_1 i + b_2 j + b_3 k where b_0 = a_0/N(a), b_1 = -a_1/N(a),
b_2 = -a_2/N(a) and b_3 = -a_3/N(a), as can be verified by direct
multiplication. However, the quaternionic multiplication is not commutative
(for example, ij ≠ ji where i = 0 + 1i + 0j + 0k etc.). Thus (Q, +, ·) is a
division ring which is not a field. This ring is important for applications in
number theory and physics.
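The 'direct multiplication' mentioned above is tedious by hand but immediate by machine. The sketch below is our own illustration. A quaternion a_0 + a_1 i + a_2 j + a_3 k is stored as the 4-tuple of its coefficients, the product follows the usual formulas for c_0, ..., c_3, and N(a) is taken to be the sum of the squares of the coefficients, which is what the inverse formulas require. Exact rational arithmetic (the standard fractions module) is used to avoid rounding, and the names qmul, norm and qinv are hypothetical.

```python
from fractions import Fraction

def qmul(a, b):
    """Quaternion product, coefficient by coefficient."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 + a2 * b0 + a3 * b1 - a1 * b3,
            a0 * b3 + a3 * b0 + a1 * b2 - a2 * b1)

def norm(a):
    """N(a), taken here as the sum of the squares of the coefficients."""
    return sum(x * x for x in a)

def qinv(a):
    """Inverse of a non-zero quaternion."""
    n = norm(a)
    a0, a1, a2, a3 = a
    return (a0 / n, -a1 / n, -a2 / n, -a3 / n)

one = (1, 0, 0, 0)
i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

print(qmul(i, j), qmul(j, i))            # ij and ji differ in sign

a = tuple(Fraction(x) for x in (1, 2, 3, 4))
print(qmul(a, qinv(a)) == one, qmul(qinv(a), a) == one)
```

The first line exhibits the failure of commutativity (ij = k while ji = -k). The second confirms that the stated inverse works on both sides for the sample quaternion 1 + 2i + 3j + 4k.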
As with other algebraic structures we have the concept of a ring
isomorphism. Two rings R, S are said to be isomorphic if there exists a
bijection f: R → S such that for all x, y ∈ R, f(x + y) = f(x) + f(y) and
f(xy) = f(x)f(y). Isomorphic rings may be regarded as replicas of each
other and have identical ring theoretic properties. An isomorphism between
two rings is also an isomorphism of their additive groups. But the converse
is false, as will be seen by many examples below. The more general concept
of a ring homomorphism will be studied through exercises.
We now study some of the most standard ways of constructing new
rings from old ones. This study will be instructive for two reasons. First,
along the way it will introduce us to many basic concepts. Secondly, with
suitable combinations of these methods, we get many interesting examples
of rings (for example, infinite fields of prime characteristics).
(1) Product of Rings: Let $R_1$ and $R_2$ be two rings. Let us denote the
binary operations in both by the same symbols. (This practice will generally
be followed whenever we consider more than one ring at a time. The natural
exceptions are, of course, those situations where this would lead to
confusion as in Theorem 1.5 for example.) Let $R = R_1 \times R_2$. As usual, we
define two binary operations on R coordinatewise. Then R becomes a ring.
Clearly R is commutative if and only if $R_1$ and $R_2$ are commutative. Also if
$R_1$, $R_2$ have identity elements (both of which are denoted by 1), then (1, 1)
is the identity for R. Note, however, that R has zero-divisors as soon as
neither $R_1$ nor $R_2$ is trivial. Let $x_1$, $x_2$ be non-zero elements of $R_1$, $R_2$
respectively. Then neither $(x_1, 0)$ nor $(0, x_2)$ is the zero element of R
(which is (0, 0)). But $(x_1, 0) \cdot (0, x_2) = (x_1 \cdot 0, 0 \cdot x_2) = (0, 0)$ by
Proposition (1.2). It follows that even if $R_1$, $R_2$ are fields, $R_1 \times R_2$ is not
even an integral domain. This does not mean that $R_1 \times R_2$ cannot be made into
a field with some other binary operation instead of coordinatewise
multiplication. For example, let $R_1$, $R_2$ each be $\mathbb{R}$, the field of real
numbers. Then $\mathbb{R} \times \mathbb{R}$ may be identified with $\mathbb{C}$, the set of complex
numbers, by thinking of an element (x, y) of $\mathbb{R} \times \mathbb{R}$ as the complex number
x + iy. Under this identification, the coordinatewise addition of elements of
$\mathbb{R} \times \mathbb{R}$ does correspond to the usual addition of complex numbers. But the
multiplicative structures are quite different. In $\mathbb{R} \times \mathbb{R}$ the product of
$(x_1, y_1)$ and $(x_2, y_2)$ is $(x_1 x_2, y_1 y_2)$. But the product of the corresponding
complex numbers $x_1 + iy_1$ and $x_2 + iy_2$ is $(x_1 x_2 - y_1 y_2) + i(x_1 y_2 + x_2 y_1)$.
As we saw above $\mathbb{R} \times \mathbb{R}$ is not a field. But $\mathbb{C}$ is a field. So as rings, they
are not isomorphic, although they are isomorphic as additive groups.
The construction of product rings can obviously be generalized to the
case of the product of any finite sequence of rings.
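The contrast between the two multiplications on pairs of real numbers can be seen in a few lines. This is a minimal sketch; the function names `coord_mul` and `complex_mul` are ours and pairs are plain Python tuples.

```python
def coord_mul(p, q):
    """Coordinatewise product in the product ring R x R."""
    return (p[0] * q[0], p[1] * q[1])

def complex_mul(p, q):
    """Product of the pairs p = (x1, y1), q = (x2, y2) read as the
    complex numbers x1 + i*y1 and x2 + i*y2."""
    (x1, y1), (x2, y2) = p, q
    return (x1 * x2 - y1 * y2, x1 * y2 + x2 * y1)

u, v = (1, 0), (0, 1)

# Two non-zero elements whose coordinatewise product is zero:
zero_product = coord_mul(u, v)

# Under complex multiplication the same pairs are 1 and i, with product i:
complex_product = complex_mul(u, v)
```

The same underlying set and the same addition thus carry one multiplication with zero-divisors and another that gives a field.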
(2) Subrings: As with other algebraic structures, we can consider
substructures of rings. They are called subrings. Formally, if (R, +, ·) is a
ring and $S \subset R$, then S is said to be a subring of R if first of all S is a
subgroup of the group (R, +) and secondly S is closed under multiplication.
These conditions can be succinctly put by saying that S is non-empty and for
all $x, y \in S$, both $x - y$ and $xy$ are in S (cf. Exercise 5.1.6). Clearly with the
restrictions of the binary operations, S itself is a ring. If R is commutative,
so is S. However, even if R has an identity, S need not. For example, for every
integer m > 1, $m\mathbb{Z}$ is a subring of $\mathbb{Z}$ but has no identity element. Sometimes
the subring may have its own identity element but it may be different from
that of the ring R (if any). For example let $R = \mathbb{R} \times \mathbb{R}$ under coordinatewise
operations. Let $S = \mathbb{R} \times \{0\}$. Then S is a subring of R. S has (1, 0) as
its identity element while the identity of R is (1, 1). If we take the product
of $\mathbb{R}$ with some ring not having an identity, for example $\mathbb{R} \times 2\mathbb{Z}$, then
$\mathbb{R} \times \{0\}$ is a subring with an identity element (1, 0) but the ambient ring
$\mathbb{R} \times 2\mathbb{Z}$ has no identity. (We remark that some authors do require a subring to
contain the identity element, if any, of the original ring.)
It is easily seen that a subring of an integral domain is an integral
domain. In particular if R is a field and S is a subring of R then S is an
integral domain. However, S need not be a field as we see from the fact
that $\mathbb{Z}$ is a subring of $\mathbb{R}$. In case the subring is also a field, it is called a
subfield. If F is a subfield of a field K, we also say K is an extension field
of F. Extension fields will be extensively studied later on (see the Epilogue).
(3) Rings of Functions: This is yet another instance of generating new
algebraic structures. Let (R, +, ·) be a ring, X any set and F be the set
$R^X$, consisting of all functions from X to R. Then under pointwise addition
and multiplication of functions F is a ring. If R is commutative, so is F. If
R has an identity element 1, then the constant function which assumes the
value 1 at every point of X is an identity element for F. Note, however,
that when R is an integral domain F is not so, unless the set X is a singleton
(in which case we might as well identify F with R). For let Y be a proper
non-empty subset of X. Let $f : X \to R$ be a function which vanishes on Y
and assumes some non-zero value on the complement, $X - Y$. Let $g : X \to R$
be a function which vanishes on $X - Y$ but not anywhere on Y. Then f, g
are both non-zero, but fg = 0.
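The zero-divisor argument can be made concrete for a small finite X. In this sketch (our own choice of example) X has four points, R is the ring of integers, Y = {0, 1}, and functions are stored as dictionaries.

```python
X = [0, 1, 2, 3]
Y = {0, 1}   # a proper non-empty subset of X

# f vanishes on Y and is non-zero on the complement X - Y.
f = {x: (0 if x in Y else 1) for x in X}
# g vanishes on X - Y and is non-zero everywhere on Y.
g = {x: (1 if x in Y else 0) for x in X}

# Pointwise product of f and g.
fg = {x: f[x] * g[x] for x in X}

f_is_nonzero = any(value != 0 for value in f.values())
g_is_nonzero = any(value != 0 for value in g.values())
fg_is_zero = all(value == 0 for value in fg.values())
```

At every point of X at least one of the two factors vanishes, which is exactly why the product is the zero function although neither factor is.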
We can identify R as a subring of F in various ways. For an element
$r \in R$, let $c_r$ be the constant function which takes the value r at all points
of X. Let C be the set of all such constant functions. Then C is a subring
of F. The function $\theta : R \to C$ defined by $\theta(r) = c_r$ for $r \in R$ is clearly a ring
isomorphism. So up to an isomorphism, R is a subring of F. Another way
to embed the ring R into F is to fix some $x_0 \in X$. For each $r \in R$, let
$d_r : X \to R$ be the function which takes the value r at $x_0$ and the value 0 at
all $x \neq x_0$ in X. Let D be the set of all such functions $d_r$ as r varies over R.
Then D is a subring of F which is isomorphic to R.
If X is a finite set we can think of F as the product of copies of R.
Specifically, let X = {1, 2, ..., n}. Then a function $f : X \to R$ determines a
unique n-tuple of elements of R, namely, (f(1), f(2), ..., f(n)). This gives a
function $\phi$ from F to the cartesian product $R \times R \times \cdots \times R$ (n times). This
function is a ring isomorphism.
It often happens that the sets X and R have some other structures in
addition to the ring structure on R. We then consider only such functions
from X to R which are interesting from the point of view of that structure.
Such functions often form subrings of F. For example let X be the unit
interval [0, 1] (= $\{x \in \mathbb{R} : 0 \le x \le 1\}$) and let $R = \mathbb{R}$, the field of real
numbers. Let C([0, 1]) be the set of all continuous functions from [0, 1] to
$\mathbb{R}$. Since the sum and the product of two continuous functions are
continuous, it follows that C([0, 1]) is a subring of the ring of all functions from
[0, 1] to $\mathbb{R}$. Similarly we have the ring of all differentiable functions from
[0, 1] to $\mathbb{R}$. Such rings are important in the study of that particular
structure on the set X.
(4) Rings of Matrices: In the last construction, both the addition
and multiplication of functions were defined pointwise. Depending upon
the nature of the set X, it is often possible to define multiplication of
functions in some other manner (without changing the definition of addition of
functions) and this gives new rings. We already had an instance of this.
As we just saw, we may identify the product $\mathbb{R} \times \mathbb{R}$ with the set of all
functions from the set {1, 2} to the set $\mathbb{R}$. The ring structure on $\mathbb{R} \times \mathbb{R}$
corresponds to that obtained by pointwise addition and multiplication of
functions from {1, 2} to $\mathbb{R}$. But suppose we define a new multiplication * of
two functions f, g by $(f * g)(1) = f(1)g(1) - f(2)g(2)$ and
$(f * g)(2) = f(1)g(2) + f(2)g(1)$. Then $(\mathbb{R} \times \mathbb{R}, +, *)$ becomes a ring which is
isomorphic to the field of complex numbers, because it corresponds to the
multiplication of complex numbers if we identify the function f with the
complex number f(1) + if(2). Similarly we ask the reader to view the ring
of quaternions as obtained from defining a suitable multiplication on the
set of all functions from the set {0, 1, 2, 3} to $\mathbb{R}$.
Now suppose X is the set {1, 2, ..., m} × {1, 2, ..., n} where m, n are
some fixed positive integers. As remarked in Chapter 2, Section 1, a
function from X into R is nothing but a matrix of order m × n (that is,
with m rows and n columns) with entries coming from the set R. A typical
such matrix, say A, is denoted as shown below* or, in a more compact form, by
$(a_{ij})_{1 \le i \le m,\ 1 \le j \le n}$, or simply by $(a_{ij})$. Here $a_{ij}$ is the value of the
function A at the point (i, j) in the domain set {1, ..., m} × {1, ..., n}, for
$1 \le i \le m$, $1 \le j \le n$. It is called the (i, j)th entry of the matrix. The matrix
A is said to be a matrix of order m × n over the ring R. If m = n, the matrix is
called a square matrix of order n.

$$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mj} & \cdots & a_{mn}
\end{bmatrix}$$

Here $a_{i1}, a_{i2}, \ldots, a_{in}$ form the ith row and $a_{1j}, a_{2j}, \ldots, a_{mj}$ form the jth column.

* It is equally common (although typographically a little clumsy) to enclose a
matrix by parentheses instead of brackets.
Given two matrices, say, $A = (a_{ij})$ and $B = (b_{ij})$ of the same order (that
is, with the same number of rows and the same number of columns) over
the same ring R we can add them entrywise to get another matrix of the
same order. Symbolically, $(a_{ij}) + (b_{ij}) = (a_{ij} + b_{ij})$. This amounts to pointwise
addition of the corresponding functions into the ring R. We can also define
the multiplication of two matrices of the same order entrywise, that is,
$(a_{ij})(b_{ij}) = (a_{ij} b_{ij})$. But this would correspond to the pointwise
multiplication of the corresponding functions and the resulting ring would be the
same as the ring of functions considered above. To get a different ring
structure we define a new multiplication called the matrix product. In
order that the matrix product of two matrices, say A and B (in this order),
is defined, it is necessary that the number of columns of A should equal
the number of rows of B. So suppose $A = (a_{ij})$ is an m × n matrix and
$B = (b_{jk})$ is an n × p matrix where m, n, p are some positive integers. Then
we define their matrix product (denoted by A·B or AB) to be the m × p
matrix whose (i, k)th entry is the sum $\sum_{j=1}^{n} a_{ij} b_{jk}$, for $1 \le i \le m$, $1 \le k \le p$.
For example

$$\begin{bmatrix} 1 & 2 & 3 & -1 \\ 0 & 1 & 2 & -2 \\ 1 & -1 & 4 & 0 \end{bmatrix}
\begin{bmatrix} 2 & \sqrt{2} \\ 1 & -3 \\ 0 & \pi \\ -3 & -2 \end{bmatrix}
= \begin{bmatrix} 7 & \sqrt{2} - 4 + 3\pi \\ 7 & 1 + 2\pi \\ 1 & \sqrt{2} + 3 + 4\pi \end{bmatrix}$$

where the three matrices are A, B and AB respectively.
The easiest way to visualise the matrix product AB is to note that the
entry in its ith row and kth column is obtained by taking the ith row of A
and the kth column of B (each of which has n entries), multiplying their
corresponding entries and summing the products. For example, in the
illustration above, the third row of A has entries 1, -1, 4, 0 and the
corresponding entries in the second column of B are $\sqrt{2}$, -3, $\pi$ and -2.
Multiplying them in that order and adding gives $\sqrt{2} + 3 + 4\pi$ which is
the entry in the third row and the second column of the product matrix AB.
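The row-by-column rule translates directly into code. The sketch below stores a matrix as a list of rows; the function name `mat_mul` is ours, and the entries of A and B are those of the worked illustration as we read them (a 3 × 4 matrix times a 4 × 2 matrix), with $\sqrt{2}$ and $\pi$ replaced by floating-point approximations.

```python
import math

def mat_mul(A, B):
    """Matrix product of an m x n matrix A and an n x p matrix B,
    both given as lists of rows."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])
    # Entry (i, k) is the sum over j of A[i][j] * B[j][k].
    return [
        [sum(A[i][j] * B[j][k] for j in range(n)) for k in range(p)]
        for i in range(len(A))
    ]

root2 = math.sqrt(2)

A = [[1, 2, 3, -1],
     [0, 1, 2, -2],
     [1, -1, 4, 0]]

B = [[2, root2],
     [1, -3],
     [0, math.pi],
     [-3, -2]]

AB = mat_mul(A, B)   # a 3 x 2 matrix
```

Calling `mat_mul(B, A)` raises `ValueError`, since B has 2 columns while A has 3 rows; this is an instance of AB being defined while BA is not.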
Matrix multiplication is not, strictly speaking, a binary operation on
the set of all matrices over a ring R, because not every pair of matrices
can be multiplied (we may even have matrices A, B such that AB is defined
but BA is not defined). Nevertheless matrix multiplication has properties
similar to certain attributes of binary operations. We list three such
properties:
(i) Associativity: Let A, B, C be matrices of orders m × n, n × p and
p × q (say). Then all the products AB, BC, (AB)C and A(BC) are defined
and the last two are equal. This is so because the (i, r)th entries in them are

$$\sum_{k=1}^{p} \Big( \sum_{j=1}^{n} a_{ij} b_{jk} \Big) c_{kr} \quad \text{and} \quad \sum_{j=1}^{n} a_{ij} \Big( \sum_{k=1}^{p} b_{jk} c_{kr} \Big)$$

respectively and they are equal because of the associativity and
distributivity of multiplication in the ring R.
(ii) Distributivity: Let A, B, C, D be matrices of orders m × n, n × p,
n × p and p × q respectively. Then A(B + C) equals AB + AC and (B + C)D
equals BD + CD. This is also proved by straightforward verification that
the corresponding entries are equal.
(iii) Identity matrices: Suppose the ring R has an identity element 1.
For every positive integer n, let $I_n$ be the n × n matrix in which all
diagonal entries are 1 and all other entries are 0. For example, $I_5$ is the
matrix

$$\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}$$

Then for any m × n matrix A, $A I_n = A$ and for any n × p matrix B, $I_n B = B$.
For every positive integer n, let $M_n(R)$ be the set of all n × n matrices
over R. Then matrix multiplication is a well-defined binary operation on
$M_n(R)$. This operation is associative, distributive over matrix addition and,
in case R has an identity, has $I_n$ as the identity element. Thus we see that
$M_n(R)$ is a ring, and has an identity in case R does. It is called the ring of
n × n matrices over R. For n = 1, it is isomorphic to R. For n > 1, it is in
general non-commutative even when R is commutative.
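A single pair of 2 × 2 integer matrices already shows the failure of commutativity. The matrices below are our own choice of example, and `mat_mul` is a small helper written for square matrices stored as lists of rows.

```python
def mat_mul(A, B):
    """Product of two n x n matrices given as lists of rows."""
    n = len(A)
    return [
        [sum(A[i][j] * B[j][k] for j in range(n)) for k in range(n)]
        for i in range(n)
    ]

A = [[0, 1],
     [0, 0]]
B = [[0, 0],
     [1, 0]]
I2 = [[1, 0],
      [0, 1]]

AB = mat_mul(A, B)   # [[1, 0], [0, 0]]
BA = mat_mul(B, A)   # [[0, 0], [0, 1]]
AA = mat_mul(A, A)   # the zero matrix, although A is non-zero
```

The last line also shows that the ring of 2 × 2 matrices over the integers has zero-divisors, even though the ring of integers is an integral domain.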
We shall return to matrices later in this chapter. Here we are interested
in them primarily as means of constructing new rings. Various subrings of the
ring of matrices often provide a 'concrete' representation of some abstract
rings. For example, the classic view of complex numbers is that they are
obtained by 'adjoining' to the real number system the 'imaginary' square root
of -1. Instead of this we can as well define a complex number to be a 2 × 2
matrix over $\mathbb{R}$ of the form $\begin{bmatrix} x & -y \\ y & x \end{bmatrix}$ (i.e., a 2 × 2 real matrix whose (1, 1)th
and (2, 2)th entries are identical and whose (1, 2)th and (2, 1)th entries are the
negatives of each other). Let S be the set of all matrices of this form as
x, y vary over all real numbers. Then it is easily seen that S is a subring of
$M_2(\mathbb{R})$. Further, under the correspondence which takes $\begin{bmatrix} x & -y \\ y & x \end{bmatrix}$ to x + iy,
S is isomorphic to the field of complex numbers. The number i (= 0 + i1)
corresponds to the matrix $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, while a real number x (= x + i0)
corresponds to $\begin{bmatrix} x & 0 \\ 0 & x \end{bmatrix}$. So we could as well define a complex number as a
2 × 2 matrix of a certain form. This representation is very important because
we shall consider many extension fields which are obtained by adjoining
certain new elements to some 'ground' or 'base' field. In all such cases, the
elements of the new fields can be 'concretely' realised as certain square
matrices of appropriate order over the ground field, and the extension field
turns out to be isomorphic to a subring of the ring of matrices.
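The correspondence between x + iy and the matrix with rows (x, -y) and (y, x) can be spot-checked against Python's built-in complex numbers. This is only a sketch on sample values, not a proof; the helper names are ours.

```python
def as_matrix(z):
    """The 2 x 2 real matrix [[x, -y], [y, x]] attached to z = x + iy."""
    x, y = z.real, z.imag
    return [[x, -y], [y, x]]

def mat_add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def mat_mul(A, B):
    return [
        [sum(A[i][j] * B[j][k] for j in range(2)) for k in range(2)]
        for i in range(2)
    ]

z, w = complex(1, 2), complex(3, -1)

# The correspondence respects both operations on these samples.
sum_ok = mat_add(as_matrix(z), as_matrix(w)) == as_matrix(z + w)
product_ok = mat_mul(as_matrix(z), as_matrix(w)) == as_matrix(z * w)

# The matrix of i squares to the matrix of -1.
i_squared = mat_mul(as_matrix(1j), as_matrix(1j))
```

Small integer values are used so that the floating-point comparisons are exact.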
(5) Rings of Polynomials: This is another important class of rings
that arise from defining the multiplication of functions other than
pointwise. Let $\mathbb{N}_0$ be the set {0, 1, 2, ...} and R be a ring with identity. Let
f, g be two functions from $\mathbb{N}_0$ to R. We define f + g pointwise. But we
define fg to be the function whose value at an element $n \in \mathbb{N}_0$ is the sum
$\sum_{i=0}^{n} f(i)\, g(n - i)$. We leave it to the reader to verify that this operation makes
the set F of all functions from $\mathbb{N}_0$ to R into a ring. This ring has an identity
element, namely, the function which takes the value 1 at 0 and the value 0
at all other points of $\mathbb{N}_0$.
Lest this multiplication sound too bizarre, let us view it a little
differently. Note that every function f from $\mathbb{N}_0$ to R is a sequence $\{a_i\}_{i=0}^{\infty}$ where
$a_i = f(i)$, for i = 0, 1, 2, .... Let x be some symbol not in R. Consider the
power series $\sum_{i=0}^{\infty} a_i x^i$. This power series is purely formal, that is, notational
and does not signify convergence or limiting process of any kind. Two such
power series are added term by term, that is,

$$\Big( \sum_{i=0}^{\infty} a_i x^i \Big) + \Big( \sum_{i=0}^{\infty} b_i x^i \Big) = \sum_{i=0}^{\infty} (a_i + b_i) x^i.$$

But to multiply them we use the law of indices $x^{i+j} = x^i x^j$. So in the
product $\big( \sum_{i=0}^{\infty} a_i x^i \big) \big( \sum_{i=0}^{\infty} b_i x^i \big)$ the coefficient of $x^n$ will be $\sum_{i=0}^{n} a_i b_{n-i}$. This is precisely
$\sum_{i=0}^{n} f(i)\, g(n - i)$ where f, g are functions from $\mathbb{N}_0$ to R defined by $f(i) = a_i$
and $g(i) = b_i$ for $i \in \mathbb{N}_0$. Thus we see that the definition of multiplication
of two functions is simply the multiplication for power series.* So far we
treated x as some unknown symbol (or an 'indeterminate' as it is technically
called). But we can identify it with the power series $\sum_{i=0}^{\infty} a_i x^i$ in which
$a_1 = 1$ and $a_i = 0$ for all $i \neq 1$. The ring defined above is called the ring of
power series in x over the ring R. We shall denote it by R{x}. The symbol
x may of course be replaced by any other symbol, say y. It is clear that
R{x} is commutative iff R is commutative.

* Another interpretation is that it is simply the convolution as defined in Exercise
(3.4.25) if the poset is $\mathbb{N}_0$ with the usual order, where we think of f(m, n) as f(n - m),
for $m \le n$.
Far more interesting is the subring consisting of the so-called finite
power series, or more popularly, polynomials. Formally, a polynomial in x
with coefficients in R is an expression of the form $a_0 + a_1 x + \cdots + a_i x^i + \cdots$ in
which all $a_i$'s after some i are 0. The largest n for which $a_n \neq 0$ is called the
degree of the polynomial and this element $a_n$ is called the leading coefficient
of the polynomial. (If $a_i = 0$ for all i, then the polynomial is the zero
polynomial and we assign no degree to it.) The set of polynomials is easily
seen to be a subring of the power series ring over R and is denoted by
R[x]. Note that if R is an integral domain, so is R[x] because if f(x), g(x)
are non-zero polynomials of degrees m and n then f(x)g(x) is a polynomial
with degree m + n, whose leading coefficient is the product of the leading
coefficients of f(x) and g(x). However, even if R is a field, R[x] is not a
field, it is only an integral domain.
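The multiplication rule, in which the coefficient of $x^n$ is the sum of $a_i b_{n-i}$ over i from 0 to n, is easy to sketch for polynomials stored as coefficient lists. In the code below `poly_mul`, `degree` and the optional `mod` argument (used to work over the integers modulo m) are our own illustrative names; input lists are assumed to have no trailing zeros.

```python
def poly_mul(f, g, mod=None):
    """Product of polynomials given as coefficient lists, f[i] being the
    coefficient of x**i. With mod=m the coefficients are reduced modulo m."""
    if not f or not g:
        return []                    # the zero polynomial is the empty list
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b        # x**i * x**j = x**(i + j)
    if mod is not None:
        h = [c % mod for c in h]
    while h and h[-1] == 0:          # strip zero leading coefficients
        h.pop()
    return h

def degree(f):
    """Degree of a non-zero polynomial; None for the zero polynomial."""
    return len(f) - 1 if f else None

# Over the integers (an integral domain) degrees add:
f = [1, 2]        # 1 + 2x
g = [3, 0, 1]     # 3 + x^2
fg = poly_mul(f, g)

# Over the integers modulo 6 (not an integral domain) they need not:
zero_mod_6 = poly_mul([0, 2], [0, 3], mod=6)   # (2x)(3x) = 6x^2 = 0
```

The second computation shows why the hypothesis that R is an integral domain is needed for the degree formula.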
The ring R itself may be identified with a subring of R[x], consisting of
the constant polynomials. Thus R[x] is an extension ring of R and will be
very important in the construction of extension fields.
We can also consider the ring of polynomials in two or more
indeterminates, say, $x_1, x_2, \ldots, x_k$. It is denoted by $R[x_1, x_2, \ldots, x_k]$. It consists of
all finite sums of the form $\sum a\, x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k}$ where the coefficients a are in R
and the $n_i$ are non-negative integers.
Their multiplication is defined using the laws of indices for powers of each
indeterminate. Alternatively, $R[x_1, \ldots, x_k]$ may be defined inductively as the
polynomial ring in $x_k$ over $R[x_1, \ldots, x_{k-1}]$ for k > 1.
(6) Quotient Rings: Just as we form quotient groups of groups we can
form quotient rings of rings. Recall that the quotient group consists of
cosets of a subgroup. In order to have a well-defined coset multiplication,
we have to have some restriction on the subgroup, namely that it be
normal. Similarly, in the case of a ring, we consider the cosets of a subring but
to define a ring structure on the set of these cosets, the subring must satisfy
a certain additional property. We define what it is.
1.15 Definition: Let (R, +, ·) be a ring. A subset I of R is called a left
ideal (or a right ideal) if I is a subgroup of (R, +) and for all $x \in I$, $r \in R$,
$rx \in I$ (respectively $xr \in I$). If I is both a left ideal and a right ideal then
it is called a two-sided ideal or simply an ideal.
Evidently every left ideal and every right ideal is a subring because the
second condition implies that it is closed under multiplication. The
converse is not true in general. For example, $\mathbb{Z}$ is a subring of $\mathbb{R}$ but it is not
an ideal. Indeed we see that if a ring R has an identity element 1 then no
proper left or right ideal can contain 1, for if it does then since $r \cdot 1 = r = 1 \cdot r$
for all $r \in R$, it would contain every $r \in R$. If a is any element of a ring
R then we let $Ra = \{ra : r \in R\}$. It is easily seen that Ra is a left ideal. It is
called the principal left ideal generated by a. Similarly $aR = \{ar : r \in R\}$ is a
right ideal, called the principal right ideal generated by a. If R is
commutative then of course Ra equals aR and is an ideal of R called the principal
ideal generated by a. If R has an identity then aR contains a, but otherwise it
need not. For example if $R = 2\mathbb{Z}$ and a = 2 then $2R = 4\mathbb{Z}$ which does not
contain 2. So the term ideal 'generated' by a is somewhat misleading.
Generally, it is used only for rings having identity elements.
To see how quotient rings are obtained from ideals, let I be a two-sided
ideal in a ring R. Then I is also a subgroup of the additive group of R.
We consider its cosets in R and denote them additively, that is, if $x \in R$
then x + I denotes the set $\{x + y : y \in I\}$ which is the coset of the
subgroup I containing x. Since (R, +) is an abelian group, I is normal in R and
so there is no difficulty in defining the coset addition. In fact for this
purpose it would suffice if I were merely a subring of R. But when we try
to define coset multiplication by (x + I)(y + I) = xy + I, in order to
show it is well-defined, the fact that I is an ideal is needed. For suppose
x + I = z + I and y + I = w + I. Then $x - z \in I$ and $y - w \in I$. We
have to show that xy + I = zw + I, that is, $xy - zw \in I$. For this we
write $xy - zw$ as $x(y - w) + (x - z)w$. Since $x \in R$ and $y - w \in I$,
$x(y - w) \in I$. Similarly, since $(x - z) \in I$ and $w \in R$, $(x - z)w \in I$.
I is also closed under addition. So $x(y - w) + (x - z)w \in I$, that is,
$xy - zw \in I$ as was to be shown. Having obtained well-defined operations
of addition and multiplication, it is now a routine matter to verify that R/I
is a ring. It is commutative if R is. If R has an identity 1 then the coset
1 + I is the identity element of R/I. As a simple example of quotient rings
let $R = \mathbb{Z}$ and $I = m\mathbb{Z}$ where m is some positive integer. Then R/I is
precisely $\mathbb{Z}_m$, the ring of residue classes modulo m, under modulo m addition
and multiplication. When m > 1 is not a prime, $\mathbb{Z}_m$ is not an integral domain
as we saw earlier. Thus we see that the quotient ring of an integral
domain need not be an integral domain. Actually, whether R/I will be an integral
domain (or a field) depends more on the ideal I than on the ring R. Since
we shall construct many fields as quotient rings of certain rings by suitable
ideals, we characterise those ideals whose quotient rings are fields.
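For the quotient of the integers by the ideal of multiples of m, the well-definedness argument can be tested by brute force on a window of representatives. The sketch below takes m = 6; the names `coset`, `well_defined` and `zero_divisor_pairs` are ours, and a coset is represented by its least non-negative member.

```python
m = 6

def coset(x):
    """Canonical representative of the coset x + mZ."""
    return x % m

representatives = range(-6, 7)

# If x + I = z + I and y + I = w + I then xy + I = zw + I.
well_defined = all(
    coset(x * y) == coset(z * w)
    for x in representatives
    for y in representatives
    for z in representatives
    for w in representatives
    if coset(x) == coset(z) and coset(y) == coset(w)
)

# Non-zero cosets whose product is the zero coset, so that the quotient
# is not an integral domain although the ring of integers is one.
zero_divisor_pairs = [
    (a, b)
    for a in range(1, m)
    for b in range(1, m)
    if coset(a * b) == 0
]
```

A finite check of this kind is of course no substitute for the proof, which rests on the identity xy - zw = x(y - w) + (x - z)w.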
1.16 Definition: An ideal I of a ring R is called maximal if it is a proper
ideal (that is, $I \subsetneq R$) and is not properly contained in any proper ideal of
R, that is, whenever J is an ideal of R such that $I \subset J \subset R$ then either
J = I or J = R.
If we consider the collection of all proper ideals of R and partially
order it by inclusion, then a maximal element of it is a maximal ideal and
conversely. This justifies the name. If $R = \mathbb{Z}$ and m is a positive integer,
then $m\mathbb{Z}$ is maximal if and only if m is a prime. (This will be proved in the
next section.) The importance of maximal ideals comes from the following
result.
1.17 Proposition: Let R be a commutative ring with identity and I an
ideal of R. Then the quotient ring R/I is a field if and only if I is a maximal
ideal of R.
Proof: Let I be a maximal ideal. R/I is already a commutative ring with
identity 1 + I. To prove it is a field we have to show that every non-zero
element of it has a multiplicative inverse. A non-zero element of R/I is a
coset of the form x + I, where $x \notin I$. Now let J be the set I + Rx, that is
the set of all elements of the form y + rx where $y \in I$ and $r \in R$. Taking
r = 0, we see that $I \subset J$. Also taking y = 0 and r = 1, $x \in J$. Since $x \notin I$,
I is a proper subset of J. We claim J is an ideal of R. For let $y_1 + r_1 x$
and $y_2 + r_2 x \in J$. Then $(y_1 + r_1 x) - (y_2 + r_2 x) = (y_1 - y_2) + (r_1 - r_2)x \in J$
since $y_1 - y_2 \in I$ and $r_1 - r_2 \in R$. This proves that J is a subgroup of
(R, +) (cf. Exercise (5.1.6)). For proving that J is an ideal, suppose $z \in R$
and $y + rx \in J$ where $y \in I$, $r \in R$. Then $z(y + rx) = zy + (zr)x \in J$
since $zy \in I$ (I being an ideal). So J is an ideal of R which properly
contains I. By maximality of I, J must equal R. In particular, 1 = y + rx for
some $y \in I$, $r \in R$. Then 1 + I = (y + I) + (r + I)(x + I). But y + I is
the zero element of R/I. Hence 1 + I = (r + I)(x + I). This means that
the element x + I has r + I as its multiplicative inverse. As noted before,
this proves that R/I is a field.
Conversely, suppose R/I is a field. We have to show that the ideal I is
maximal. Let J be an ideal of R which properly contains I. We have to
show J = R. As noted above, it suffices to show that $1 \in J$. Now since
$I \subsetneq J$, there exists $x \in J$ such that $x \notin I$. This means the coset x + I is a
non-zero element of R/I. Since R/I is a field, x + I has an inverse, say,
y + I for some $y \in R$. This means (x + I)(y + I) equals 1 + I. But
then $xy - 1 \in I$. Let z = xy - 1. Then $z \in J$ (since $I \subset J$). Also $xy \in J$
since $x \in J$ and J is an ideal. So $1 = xy - z \in J$ as was to be shown. ■
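Taken together with the statement that the ideal of multiples of m is maximal in the integers exactly when m is a prime, the proposition predicts that the ring of residue classes modulo m is a field exactly when m is prime. The brute-force sketch below checks this prediction for small m; the helper names `is_field` and `is_prime` are ours.

```python
def is_field(m):
    """True if every non-zero residue class modulo m (m >= 2) has a
    multiplicative inverse."""
    return all(
        any((a * b) % m == 1 for b in range(1, m))
        for a in range(1, m)
    )

def is_prime(m):
    """Trial-division primality test, adequate for small m."""
    return m > 1 and all(m % d != 0 for d in range(2, int(m ** 0.5) + 1))

# Moduli between 2 and 30 for which the quotient ring is a field.
field_moduli = [m for m in range(2, 31) if is_field(m)]
prime_moduli = [m for m in range(2, 31) if is_prime(m)]
```

The two lists coincide, in agreement with the proposition.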
The last proposition will be applied to the polynomial ring F[x] over a
field F, to get an extension field which will contain F as a subfield (up to
an isomorphism). Of course we must have some way of telling which
ideals in F[x] are maximal. This will be found in the next section.
Just as normal subgroups are intimately related to group
homomorphisms, ideals are related to ring homomorphisms. We shall define this
concept and give its properties (which are straightforward analogues of the
corresponding properties for groups) through the exercises.
(7) Field of Quotients: The last construction of forming new rings by
taking quotients of some rings by ideals is applicable for all types of rings
and results in fields for certain types of ideals. The construction to be
given now is applicable only for integral domains and in spite of the word
'quotient' in it has nothing to do with quotient rings. It is motivated by
the construction of rational numbers as ratios or quotients of integers.
(The word 'rational' in fact comes from 'ratio'.) Suppose R is an integral
domain and $x, y \in R$ with $x \neq 0$. We would like to define the ratio or
the quotient of y by x to be an element z such that xz = y. Because of
Proposition (1.7), such an element z, if at all it exists, is unique. But it
need not always exist. If it does, we say x divides y and write x | y. For
example, in $\mathbb{Z}$, 6 divides 30 but 8 does not divide 30. Of course we can
think of 30/8 as a rational number. We can say that the field of rational
numbers is obtained by adding to $\mathbb{Z}$ all such 'missing quotients' of pairs of
integers. We associate the pair (30, 8) with the rational number 30/8. Note
that 30/8 is the same rational number as 15/4. In other words, the pair
(30, 8) corresponds to the same rational number as the pair (15, 4). More
generally, if $a, b, c, d \in \mathbb{Z}$ with $b, d \neq 0$ then the pairs (a, b) and (c, d)
correspond to the same rational number iff ad = bc. The addition and
multiplication of rational numbers can also be described in terms of the
corresponding ordered pairs. Thus (a, b) + (c, d) = (ad + bc, bd) and
(a, b) · (c, d) = (ac, bd). (These are nothing but the rules we learn in early
school, if we think of the first member of the pair as numerator and the
second member as denominator.)
We now show how this construction can be generalized to the case of
an 'abstract' integral domain R. First we consider the set $S = R \times (R - \{0\})$,
consisting of all ordered pairs (a, b) of elements of R with $b \neq 0$. On S we
define a relation ~ by (a, b) ~ (c, d) iff ad = bc. It is easy to show that ~ is
an equivalence relation. The equivalence class containing an element (a, b)
of S will be denoted by [a, b]. Let Q be the set of all equivalence classes of
S under the equivalence relation ~. We are going to define a field structure
on Q. First we define two binary operations, still denoted by + and ·, by
[a, b] + [c, d] = [ad + bc, bd] and [a, b] · [c, d] = [ac, bd]. We must, of
course, verify that these are well-defined. We leave this as an exercise and
turn to proving that with these operations, Q is indeed a field (where,
too, we shall be very sketchy in the proof).
1.18 Theorem: With the operations defined above, Q is a field which
contains a subring T which is isomorphic to R. Every element of Q can be
expressed as a quotient of two elements of T.
Proof: The properties which are necessary to make Q a commutative ring
follow one-by-one from the corresponding properties of R. We assert that
even if R has no identity element, Q always does. We assume R is not
trivial. (If R has only one element, the set S is empty and the construction
degenerates, so this case is excluded.) Let x be any non-zero element of
R. We claim that [x, x] is the identity for multiplication on Q. For, let
$[a, b] \in Q$. Then [a, b] · [x, x] = [ax, bx]. But [ax, bx] = [a, b] because
(ax)b = (bx)a. (Note that if we took some other non-zero element y in R,
then we would get the same identity element, because [x, x] = [y, y].)
For proving that Q is a field, suppose [a, b] is a non-zero element of
Q. Then $a \neq 0$ (because ordered pairs of the form (0, b), $b \in R - \{0\}$,
constitute an equivalence class under ~ which represents the zero element of
Q). So $(b, a) \in S$. Now [a, b] · [b, a] = [ab, ba] which is the identity element
of Q, showing that [b, a] is the multiplicative inverse of [a, b]. Thus Q
is a field. Fix some non-zero $x \in R$. Let $T = \{[ax, x] : a \in R\}$. It is easy to
show that T is a subring of Q. Moreover, the function $f : R \to T$ defined
by f(a) = [ax, x] for $a \in R$ is a ring isomorphism. For the last assertion,
let $[a, b] \in Q$. Note that [x, bx] is the multiplicative inverse of [bx, x]. So
the quotient [ax, x]/[bx, x] equals [ax, x] · [x, bx] which equals $[ax^2, bx^2]$ and
hence [a, b]. ■
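The construction can be imitated in code for the integral domain of integers. In this sketch a class [a, b] is stored through any one representative pair, and equality of classes is tested by the rule ad = bc; the class name `Quot` and the function `embed` (the map taking a to [ax, x], here with x = 2) are our own illustrative names.

```python
class Quot:
    """The equivalence class [a, b] of a pair (a, b) of integers, b != 0."""

    def __init__(self, a, b):
        if b == 0:
            raise ValueError("second member of the pair must be non-zero")
        self.a, self.b = a, b

    def __eq__(self, other):
        # (a, b) ~ (c, d) iff ad = bc
        return self.a * other.b == self.b * other.a

    def __add__(self, other):
        return Quot(self.a * other.b + self.b * other.a, self.b * other.b)

    def __mul__(self, other):
        return Quot(self.a * other.a, self.b * other.b)

    def inverse(self):
        if self.a == 0:
            raise ZeroDivisionError("the zero class has no inverse")
        return Quot(self.b, self.a)

    def __repr__(self):
        return f"[{self.a}, {self.b}]"

def embed(a, x=2):
    """The map a -> [ax, x] from the integers into the field of quotients."""
    return Quot(a * x, x)

same_class = Quot(30, 8) == Quot(15, 4)
identity = Quot(5, 5)                     # [x, x] for any non-zero x
q = Quot(3, 7)
product_with_inverse = q * q.inverse()    # equals the identity class
```

No reduction to lowest terms is needed anywhere, because equality is decided on classes rather than on representatives.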
1.19 Definition: The field Q constructed above is called the field of
quotients of the integral domain R. The element [a, b] is generally denoted
by a/b. Also R is identified with T and hence treated as a subring of Q.
The field of quotients is sometimes called the quotient field. It should
not be confused with the concept of a quotient ring. It is not obtained from
any ideal. Instead it is constructed by adjoining to the integral domain R
the missing quotients. If in R, some element b does divide some other
element a with ratio c (say), then the pair (a, b) is equivalent to (cx, x) for any
non-zero x, and to (c, 1) in particular if R has an identity. In other words,
in such a case [a, b] is in T which is just a replica of R. In particular if R
itself is a field to start with then Q would coincide with T and so we do
not get anything new. The construction of Q from R typifies a common
theme underlying many apparently diverse constructions in mathematics.
Whenever you would like something to exist (in this case the ratios or the
quotients) and it does not exist in the domain you have, enlarge the domain
by adjoining certain hypothetical objects (in this case, equivalence classes
of pairs). Figuratively, these extra objects serve to fill the 'holes' which
arise because of non-existence of certain elements in the original domain.
Another instance of this philosophy is the construction of complex
numbers. In the real number system, the number -1 has no square root. So
we adjoin the hypothetical (or 'imaginary') number i such that $i^2 = -1$.
In order to have a field which contains all real numbers, plus this new
number i, we must also have numbers of the form a + ib where a, b are
real and this leads to the field of complex numbers.
Suppose now that the integral domain R is itself the ring F[x], that is,
the ring of polynomials over some field F (see Construction (5) above).
Then the field of quotients of F[x] is denoted by F(x). Elements of this
field are ratios of the form p(x)/q(x) where p(x), q(x) are polynomials with
coefficients in F and $q(x) \neq 0$. Such ratios are often called rational functions
of x. Consequently, the field F(x) is called the field of rational functions in
x. Note that even when F is a finite field, the polynomial ring F[x] is
infinite. So F(x) is also infinite, since it contains a replica of F[x]. Taking
$F = \mathbb{Z}_p$, where p is a prime, we get $\mathbb{Z}_p(x)$ as an example of an infinite field
of a prime characteristic.
Exercises
1.1 Prove that in a ring with identity, the commutative law for
addition is redundant, that is, can be derived from the remaining
axioms of a ring. (Hint: cf. Proposition 3.4.127)
1.2 An element x of a ring is called idempotent if $x^2 = x$. (Actually,
the definition makes sense for any binary operation on a set; see
Exercise 3.4.23.) Prove that an integral domain has at least one
and at most two idempotents. Prove that a Boolean ring cannot
be an integral domain unless it is trivial or isomorphic to $\mathbb{Z}_2$.
1.3 The centre Z of a ring R is defined as the set $\{z \in R : zr = rz$ for
all $r \in R\}$. Prove that the centre is a subring. Find the centres of
the ring of quaternions and the ring of 2 × 2 matrices over $\mathbb{R}$.
1.4 Prove that a ring in which $x^3 = x$ for all x is commutative. (Hint:
First show that the square of every element is in the centre. Then
rewrite $(xy)^3$ as $x(yx)^2 y$.)
1.5 For any two quaternions a, b, prove that
$N(a + b) \le N(a) + N(b)$ and $N(ab) = N(a)N(b)$.
1:6 Prove that the ring of quaternions contains subrings isomorphic to
R and C.
1.7 Obtain a representation of the quaternion ring as a subring of
M.(R).
«.3 Obtain a reprcresentation of the quaternion ring as a subring of
M.(C)- '
1.9 Prove that a field has characteristic 0 iff it contains a subfield
isomorphic to Q (the field of rationals) and that it has characteristic p
iff it contains a subfield isomorphic to Z_p. (Hint: in both cases,
consider the subfield generated by the element 1.)
1.10 Prove that the additive group in a field is not cyclic unless the
field is isomorphic to Z_p for some prime p.
1.11 Let R = R_1 × R_2, where R_1, R_2 are rings. If S_1, S_2 are subrings of
R_1, R_2 respectively, prove that S_1 × S_2 is a subring of R. Is every
subring of R of this type? Prove that the subring R_1 × {0} is an
ideal of R, which is isomorphic to R_1, and that the quotient ring
R/(R_1 × {0}) is isomorphic to R_2.
1.12 Let R be a ring and R_1, R_2 ⊂ R. Prove that if R_1, R_2 are subrings,
or left ideals or right ideals of R, so is R_1 ∩ R_2. Show by an example
that the intersection of a left ideal and a right ideal need not
be either a left or a right ideal. (It is however a subring.)
1.13 An n × n matrix A is called a diagonal matrix if all the entries off
its principal diagonal are zero, that is a_ij = 0 for i ≠ j. Let R be a
ring. Prove that the set of all diagonal n × n matrices is a subring
of M_n(R). Is it an ideal?
1.14 Let R be the ring of all continuous, real valued functions on the
unit interval [0, 1]. Fix some subset A ⊂ [0, 1]. Let M_A = {f ∈ R :
f(x) = 0 for all x ∈ A}. Prove that M_A is an ideal of R and that it
is maximal iff the set A is a singleton. (Hint: Note that if B ⊂ A,
then M_A ⊂ M_B.)
*1.15 In the last exercise, prove that every maximal ideal of R is of the
form M_{x_0} for some x_0 ∈ [0, 1]. (The proof requires the fact that
[0, 1] is compact.)
*1.16 Let R be the set of all analytic functions from C to C. Prove that
R is a subring of the ring of all functions from C to C. Prove
further that R is an integral domain. (The proof requires that the
set of zeros of a non-zero analytic function is discrete, that is, has
no limit points.)
1.17 Prove that the sets {a + √2 b : a, b ∈ Z} and {a + i√5 b : a, b ∈ Z}
are subrings of R and C respectively. Give other examples of this
type.
1.18 (a) For any ring R with identity verify that the multiplication of
power series with coefficients in R is indeed associative and
hence that R{x} is a ring.
(b) Let R be a ring with identity, n a positive integer and x an
indeterminate. Prove that the ring M_n(R[x]) of n × n matrices
over the polynomial ring R[x] may be identified with the ring
(M_n(R))[x] of polynomials over the ring M_n(R) of n × n matrices
over R.
1.19 Let F be a field and F{x} the ring of power series over F. Prove
that the power series 1 − x is invertible and has for its inverse the
power series 1 + x + x^2 + x^3 + ... + x^n + ... (Caution: This does
not mean that the geometric series 1 + x + x^2 + ... ‘converges’ to
1/(1 − x). Actually, we have not defined the concept of convergence
in an abstract ring. The result to be proved simply means that the
product of the two power series 1 − x (i.e., 1 − x + 0x^2 + 0x^3 +
0x^4 + ...) and 1 + x + x^2 + x^3 + ... is the power series 1 (that is,
1 + 0x + 0x^2 + 0x^3 + ...).) More generally, prove that a power
series Σ_{n=0}^∞ a_n x^n is invertible iff a_0 ≠ 0.
1.20 Just as the last exercise deals with formal inversion of power series,
we can study formal derivatives of power series, that is, derivatives
which arise purely from a particular form and not from any limiting
process as in calculus. If f(x) = Σ_{n=0}^∞ a_n x^n is a power series in x,
we define its derivative (denoted by f′(x)) to be the power series
Σ_{n=0}^∞ (n + 1) a_{n+1} x^n. If f(x), g(x) are any two power series, prove that
(f + g)′(x) = f′(x) + g′(x) and (fg)′(x) = f(x)g′(x) + f′(x)g(x).
1.21 Prove that a commutative ring with identity is a field if and only if
it has no ideals other than the zero ideal and the whole ring.
1.22 Let R, S be rings. A function f : R → S is called a ring homomorphism
if for all x, y ∈ R, f(x + y) = f(x) + f(y) and f(xy) = f(x)f(y).
The kernel of f is defined as the set {x ∈ R : f(x) = 0}. Prove that
the kernel of a ring homomorphism is an ideal and conversely every
ideal is the kernel of some ring homomorphism. If f : R → S is a
ring homomorphism with kernel K and range T, prove that T is a
subring of S and is isomorphic, as a ring, to the quotient ring R/K.
Also state and prove the analogue of Theorem 5.3.7.
1.23 Using the last two exercises, give an alternate proof of Proposition
1.16.
1.24 Let f : R → S be a ring homomorphism where R is a field. Prove that
f is either identically zero or else an isomorphism of R onto the
range of f.
1.25 Let Q be the field of quotients of an integral domain R. Let F be
any field. Given any one-to-one homomorphism f : R → F, prove
that there exists a unique ring homomorphism g : Q → F which
extends f (that is, g(x) = f(x) for all x ∈ R).
1.26 An ordered field is a field, which is totally ordered in a manner
compatible with the field structure. Formally it is a quadruple
(F, +, ·, <) such that (F, +, ·) is a field and < is a strict linear order
on the set F satisfying
(i) if b < c then a + b < a + c for all a, b, c ∈ F and
(ii) if a < b and 0 < c then ac < bc for all a, b, c ∈ F.
(The definition would make sense for any ring, not just for a field.
But it is only for integral domains and fields that the concept of
ordering has important consequences.)
In an ordered field an element x is called positive if x > 0 and
negative if x < 0. Prove that the square of every non-zero element
is positive. Hence show that an ordered field must be of
characteristic 0.
1.27 Show by an example, that in a ring, (x + y)(x − y) need not equal
x^2 − y^2. Prove, however, that this holds in a commutative ring.
Prove also, that in a commutative ring, the binomial theorem holds,
that is, for any two elements x and y and any positive integer n,
(x + y)^n equals
    x^n + \binom{n}{1} x^{n-1} y + \binom{n}{2} x^{n-2} y^2 + ...
        + \binom{n}{r} x^{n-r} y^r + ... + \binom{n}{n-1} x y^{n-1} + y^n.
Notes and Guide to Literature
Ring theory may very well be considered to be the centre of modern
algebra. Jacobson's three volumes [1] provide a thorough treatment.
Special references for commutative and non-commutative rings are Zariski and
Samuel [1] and Herstein [2] respectively.
The origin of the terms ‘ring’ and ‘ideal’ is somewhat obscure. The
former is attributed to the geometric representation of the ring Z_m by the
complex mth roots of unity. These points lie on a circle or a ‘ring’. ‘Ideal’
probably comes from the so called ‘ideal’ points in projective geometry.
The quaternions were invented by Hamilton. For more on them see
Herstein [1], where the reader will also find a proof of a remarkable theorem
due to Wedderburn that every finite division ring is a field. That is why,
although we can form quaternion rings over Z_p exactly as over R, they are
not division rings.
Exercises (1.15) and (1.16) highlight how the structure of certain rings
of functions is affected by the structure on the domain set. A good
reference on the rings of continuous functions is Gillman and Jerison [1].
The binomial theorem (Exercise 1.27) explains why the binomial
coefficients are called that way. It is one of the most well-known theorems
of algebra. For historical references to it and to the binomial coefficients
see Knuth [1], Vol. 1.
2. Special Types of Integral Domains
It was remarked in the last section that the ring of integers is a foremost
example of a ring. Still, the definition of a ring is far too general to capture
any of the interesting properties of Z. If we confine ourselves to integral
domains, some of the phenomena that take place in Z can be imitated,
such as the construction of the field of quotients. But the class of all integral
domains is still too large to permit generalisation of the deeper properties
of integers. So we want to restrict it further by putting some more
conditions on integral domains. Of course, these additional restrictions must not
be so strong that, up to isomorphism, Z is the only ring satisfying them! For,
in that case there is little point in the generalisation. In other words, we
are once again trying to gain some depth in our abstraction, without having
to sacrifice generality altogether.
In this section we consider three progressively stronger conditions on
an integral domain which make it resemble Z more and more. Yet, the class
of rings satisfying even the strongest of these three conditions will be large
enough to include in it, besides Z, the ring of polynomials over a field and
many other rings. The properties to be proved for rings satisfying this
condition will hold in particular for the ring of integers. These properties
will then be used to fill some gaps which were left in the earlier chapters
(for example, finding the number of generators for a cyclic group).
We begin with the strongest of the three conditions. It is intended to
imitate the euclidean algorithm for integers (which we already used without
proof a few times, for example in Problem (2.2.10) and Theorem (5.1.7)).
Recall that, according to this algorithm, given integers a, b with b > 0, we
can find integers q, r such that a = qb + r with 0 ≤ r < b. The integers
q and r are easily seen to be unique and are respectively called the quotient
and the remainder (or residue) obtained when a is divided by b. This
algorithm was originally used by Euclid to give a procedure for finding the
greatest common divisor of the two integers a and b. The next step is to
replace a by b and b by r and carry out the division once again to get a
new remainder, r_1 (say). We then continue, every time letting the old divisor
be the new dividend and the old remainder be the new divisor. This is a
familiar procedure which we all learn in schools. But what guarantee is there
that it will terminate with a remainder 0 (and give the divisor at this stage
as the greatest common divisor)? The crucial point is that the remainder,
if non-zero, is always less than the divisor. This, combined with the fact
that the set of positive integers is well-ordered (see Exercise (3.121)),
ensures that the process of successive divisions, with the remainders getting
smaller every time, will not go on indefinitely.
Now, how do we carry this out in an ‘abstract’ integral domain R? A
direct imitation would require that on R we should assume an order
structure with properties similar to the usual ordering on Z. Then, first of all,
the order relation has to be compatible with the binary operations on R
(cf. Exercise (1.26)). Such ordered integral domains can indeed be defined
and studied. But if we further require that the set of positive elements in R
be well-ordered (which is true in the case of Z), then this requirement would
be too strong and there would not be any ‘naturally occurring’ examples
of rings besides Z and its subrings that will satisfy it.
So we look for another generalisation. Instead of ordering the elements
of R directly we consider a suitable function d from R into the set of
non-negative integers (which is well-ordered) and work with the values assumed
by this function. We then make the following definition.
2.1 Definition: An integral domain R is called a euclidean ring (or a
euclidean domain) if there exists a function d : R − {0} → N_0 (the set of
non-negative integers), such that the following condition (*) holds.
    (*) for all a, b ∈ R, with b ≠ 0, there exist q, r ∈ R satisfying
        a = qb + r where either r = 0 or d(r) < d(b).
The function d is often called the degree function and d(a) is read as
‘degree of a’. This terminology comes from the ring of polynomials over a
field, which we shall show to be a euclidean ring a little later. A trivial
example of a euclidean ring is any field. If R is a field and we let d(a) = 0
for a ∈ R, a ≠ 0, then (*) holds because, since b^{-1} exists, we can take
q = ab^{-1} and r = 0.
As the first non-trivial example of a euclidean ring, we have,
2.2 Theorem: The ring of integers is a euclidean ring.
Proof: Define d : Z − {0} → N_0 by d(x) = |x|, that is, the absolute value
of x. We have to show that (*) is satisfied. If a = 0, then we set q = r = 0.
Suppose a, b are both non-zero. We make four cases.
(I) a > 0, b > 0. In this case we argue by induction on a. For a = 1,
we set q = 1, r = 0 if b = 1 and q = 0, r = 1 if b > 1. Now
suppose a = n > 1 and the result holds for all positive values of a
less than n. If n < b, we set q = 0 and r = n. Suppose n ≥ b.
Then n − b is either 0 or a positive integer less than n. So, in
either case we can find q_0, r_0 such that n − b = q_0 b + r_0, with
r_0 = 0 or d(r_0) < b. We then have n = (q_0 + 1)b + r_0 and so can
set q = q_0 + 1 and r = r_0.
(II) a > 0, b < 0. Let c = −b. Then a > 0, c > 0 and so by Case
I, there exist q_1, r_1 ∈ Z such that a = q_1 c + r_1 with r_1 = 0 or
|r_1| < |c|. Let q = −q_1 and r = r_1. Then a = qb + r and
either r = 0 or |r| = |r_1| < |c| = |b|, as desired.
(III) a < 0, b > 0. Replace a by −a and apply Case I, to get −a = q_1 b
+ r_1 (say). Then set q = −q_1 and r = −r_1.
(IV) a < 0, b < 0. Here too, replace a, b by −a, −b and convert to
Case I. ∎
Note that what we have called the euclidean algorithm is slightly different
from the condition (*) in Definition (2.1) applied to the ring Z. We assume
that b > 0 and require the remainder to be non-negative, whereas condition
(*) only gives |r| < |b| (unless r = 0). This is only a minor difference and
can be easily straightened out. Let a, b be integers with b > 0. By the last
theorem, we can find q, r such that a = qb + r with |r| < |b|. If r ≥ 0, then
of course we are done. If r < 0, then r must be in the set {−b + 1, ..., −1}
(because |r| < b). But then b + r is in the set {1, ..., b − 1} and
a = (q − 1)b + (b + r) gives the desired representation of a with the
remainder (that is, b + r) non-negative. Thus we have completely proved the
euclidean algorithm and thereby sanctified its uses in the previous chapter. (We could
have as well given the proof the first time the algorithm was used. But we
deferred it till now because the definition of a euclidean ring involves it
crucially.) Note that the essence of the inductive step in Case I of the proof
above is the same as finding the largest multiple of b which can be
subtracted from a. The trouble is how to find it. The inductive argument really
shows that the largest multiple of b which can be subtracted from a is the
next larger multiple than the largest multiple which can be subtracted
from a − b.
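The case analysis of Theorem (2.2) and the adjustment of the remainder described above can be tried out on a computer. The following Python sketch is only an illustration and not part of the formal development. The function names divide_case_one, divide_abs and divide_euclid are our own, and repeated subtraction is used as a stand-in for the inductive step of Case I.

```python
def divide_case_one(a, b):
    """Case I (here with a >= 0, b > 0): subtract b as long as possible."""
    q, r = 0, a
    while r >= b:
        r -= b
        q += 1
    return q, r  # a == q*b + r and 0 <= r < b


def divide_abs(a, b):
    """Condition (*) for Z with d(x) = |x|: a == q*b + r, r == 0 or |r| < |b|."""
    if b == 0:
        raise ValueError("b must be non-zero")
    q, r = divide_case_one(abs(a), abs(b))
    if a < 0:                # Cases III and IV: the remainder changes sign
        r = -r
    if (a < 0) != (b < 0):   # Cases II and III: the quotient changes sign
        q = -q
    return q, r


def divide_euclid(a, b):
    """For b > 0, return (q, r) with a == q*b + r and 0 <= r < b."""
    if b <= 0:
        raise ValueError("b must be positive")
    q, r = divide_abs(a, b)
    if r < 0:                # replace (q, r) by (q - 1, b + r)
        q, r = q - 1, r + b
    return q, r


print(divide_abs(-7, 3))     # (-2, -1)
print(divide_euclid(-7, 3))  # (-3, 2)
```

For a = −7 and b = 3, condition (*) alone yields the remainder −1, while the adjusted version yields the non-negative remainder 2.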
Because of the last theorem, whatever properties are true of all euclidean
rings would hold, in particular, for Z. But before proving such properties,
let us show that the class of euclidean rings is large enough to make it
worthwhile to prove theorems about them. In this vein we prove:
2.3 Theorem: For every field F, the polynomial ring F[x] is a euclidean
ring.
Proof: We define d : F[x] − {0} → N_0 to be the function which assigns to each
polynomial its degree as defined in the last section. Thus if f(x) = a_0 + a_1 x
+ ... + a_n x^n with a_n ≠ 0, then d(f(x)) = n. We assign no degree to the zero
polynomial. For the condition (*) let f(x), g(x) be polynomials of degree
m, n respectively, say, f(x) = a_0 + a_1 x + ... + a_m x^m and g(x) = b_0 + b_1 x +
... + b_n x^n with a_m ≠ 0 ≠ b_n. If m < n we write f(x) = 0·g(x) + f(x) and
this proves (*). If m ≥ n, then we subtract a suitable multiple of g(x) from
f(x). Specifically, let h(x) = f(x) − (a_m/b_n) g(x) x^{m−n}. Note that here for the
first time we are using that F is a field (whence a_m/b_n, that is, a_m (b_n)^{-1}, exists).
If h(x) = 0 then g(x) divides f(x) and we are done. Otherwise h(x) is a
polynomial of degree less than m. So we can proceed by induction on m
and write h(x) as q_0(x) g(x) + r_0(x) where r_0(x) is either 0 or a polynomial
of degree less than n. Then we simply write
    f(x) = [q_0(x) + (a_m/b_n) x^{m−n}] g(x) + r_0(x)
as desired. Thus condition (*) in the definition is satisfied. So F[x] is a
euclidean ring. ∎
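The inductive step of this proof is just the familiar long division of polynomials. As a hedged illustration, the following Python sketch carries it out for F = Q, using exact rational arithmetic. The representation of a polynomial as a list of coefficients (lowest degree first, with the empty list for the zero polynomial) and the names trim and poly_divmod are our own assumptions, not notation from the text.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients so that the last entry is the leading one."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_divmod(f, g):
    """Return (q, r) with f == q*g + r and r == [] or deg r < deg g, over Q."""
    f = trim(Fraction(c) for c in f)
    g = trim(Fraction(c) for c in g)
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    n = len(g) - 1                      # degree of g
    q = [Fraction(0)] * max(len(f) - n, 0)
    r = f
    while len(r) - 1 >= n:              # r is non-zero and deg r >= deg g
        m = len(r) - 1                  # current degree of r
        c = r[-1] / g[-1]               # a_m / b_n, available since Q is a field
        q[m - n] += c
        for i, b in enumerate(g):       # r := r - c * x^(m-n) * g
            r[i + m - n] -= c * b
        r = trim(r)                     # the degree of r has dropped
    return q, r


# (x^3 - 1) = (x^2 + x + 1)(x - 1) + 0
print(poly_divmod([-1, 0, 0, 1], [-1, 1]))
# (x^2 + 1) = (x/2)(2x) + 1
print(poly_divmod([1, 0, 1], [0, 2]))
```

The second example divides by 2x, which is possible only because 1/2 exists in Q. Over Z the same step would fail, which is the point of the remark that F must be a field.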
We mention yet another example of a euclidean ring, called the ring of
Gaussian integers. It consists of complex numbers whose real and imaginary
parts are integers, that is, {x + iy : x, y ∈ Z}. It is easily seen to be a
subring of the field of complex numbers (cf. Exercise (1.17)) and hence is an
integral domain. Let us denote this ring by G. If we define d : G − {0} → N_0
by d(x + iy) = x^2 + y^2, then it can be shown that with this function, G
becomes a euclidean ring. This ring has interesting applications to number
theory, a few of which will be given as exercises.
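The text leaves the verification of condition (*) for G to the reader. One standard way to obtain q and r, which we assume here since it is not spelled out above, is to compute the complex quotient a/b and round its real and imaginary parts to the nearest integers. The Python sketch below does this with integer arithmetic only. The representation of x + iy as a pair (x, y) and the name gauss_divmod are our own choices.

```python
def gauss_divmod(a, b):
    """a, b are pairs (x, y) standing for x + iy; b must be non-zero.

    Returns (q, r) with a == q*b + r and d(r) < d(b), where d(x + iy) = x^2 + y^2.
    """
    (ax, ay), (bx, by) = a, b
    n = bx * bx + by * by               # d(b)
    if n == 0:
        raise ZeroDivisionError("division by zero in G")
    # a / b = a * conjugate(b) / d(b) = (px + i*py) / n
    px = ax * bx + ay * by
    py = ay * bx - ax * by
    # nearest integers to px/n and py/n, computed without floating point
    qx = (2 * px + n) // (2 * n)
    qy = (2 * py + n) // (2 * n)
    # r = a - q*b
    rx = ax - (qx * bx - qy * by)
    ry = ay - (qx * by + qy * bx)
    return (qx, qy), (rx, ry)


# 27 + 23i = (4 + 2i)(8 + i) + (-3 + 3i), and d(-3 + 3i) = 18 < 65 = d(8 + i)
print(gauss_divmod((27, 23), (8, 1)))   # ((4, 2), (-3, 3))
```

Since each part of q differs from the exact quotient by at most 1/2, the remainder satisfies d(r) ≤ d(b)/2, which is why the rounding choice works.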
We now turn to proving properties of euclidean rings. The foremost is
the following:
2.4 Theorem: Every ideal of a euclidean ring is a principal ideal (that is,
consists precisely of all multiples of some element).
Proof: Let R be a euclidean ring and I an ideal of R. We have to show
that I is of the form Rx for some x ∈ R. Certainly if I = {0}, then we may
take x = 0. So we suppose I contains some non-zero elements. Let d : R
− {0} → N_0 be the degree function for R. Since the set N_0 is well-ordered,
among the non-zero elements of I, there exists some element, say x, such
that d(x) is the minimum value of d on I − {0}, that is, for all y ∈ I, y ≠ 0,
d(x) ≤ d(y). We claim that the principal ideal Rx coincides with I. Certainly
Rx ⊂ I because x ∈ I and I is an ideal. For the converse, suppose y ∈ I.
Since x ≠ 0, by condition (*) in Definition (2.1), there exist q, r ∈ R such
that y = qx + r where r = 0 or d(r) < d(x). We claim that only the first
possibility holds. For, since y ∈ I and qx ∈ I, we have that r ∈ I. If r ≠ 0,
then d(r) < d(x) would contradict the choice of x as an element of I of the
smallest possible degree. So r = 0. Hence y = qx ∈ Rx. That is, I ⊂ Rx.
Therefore I equals the principal ideal Rx. ∎
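As a small experiment with this proof in the ring Z, take I to be the ideal of all integers of the form xa + yb. With d(x) = |x|, an element of smallest degree in I is its smallest positive element. The Python sketch below finds it by a brute-force search over a bounded range of coefficients. The name smallest_positive_combination and the search bound are our own assumptions, and the search illustrates the statement without proving it.

```python
def smallest_positive_combination(a, b, bound=20):
    """Smallest positive value of x*a + y*b with |x|, |y| <= bound (None if none)."""
    best = None
    for x in range(-bound, bound + 1):
        for y in range(-bound, bound + 1):
            v = x * a + y * b
            if v > 0 and (best is None or v < best):
                best = v
    return best


g = smallest_positive_combination(12, 18)
print(g)                         # 6, attained for example as (-1)*12 + 1*18
print(12 % g == 0, 18 % g == 0)  # True True: both generators are multiples of g
```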
The theorem as well as the proof must have reminded the perceptive
reader of Proposition (5.1.7), in which all subgroups of Z were obtained.
Actually, the two proofs are not substantially different, because both of
them are based on the fact that N (or N_0) is well-ordered and on the euclidean
algorithm.
Note also that the proof above not only shows that every ideal of a
euclidean ring is a principal ideal, but also tells how to find a generator
(that is, the element x) of it. Any element of the smallest possible degree
among the non-zero elements of an ideal will be a generator for that ideal.
The importance of this theorem will be clear when we study some of its
consequences. There is a name for rings which have the property asserted
by its conclusion.
2.5 Definition: An integral domain in which every ideal is principal is
called a principal ideal domain (abbreviated p.i.d.) or a principal ideal ring.
This is the second of the three conditions we are studying. The
preceding theorem may be reworded as ‘Every euclidean ring is a p.i.d.’. The
converse is false. Motzkin has shown that the ring {a + ((1 + √−19)/2) b : a, b ∈ Z} is
a p.i.d. but not a euclidean ring. The proof is beyond our scope.
Not every integral domain is a principal ideal ring. In fact, as we are
about to prove, a p.i.d. must contain a unit element. Since we are not
requiring all integral domains to have identity elements, we can get many
integral domains such as 2Z, 3Z, which are not p.i.d.'s. As a less trivial
example, consider Z[x], the ring of polynomials over Z. Let I be the set
of those polynomials in Z[x], the sum of whose coefficients is even, that is,
I = {a_0 + a_1 x + ... + a_n x^n : Σ_{i=0}^n a_i is even}. It is easy to show that I is an
ideal of Z[x]. However, it is not a principal ideal. For, if it were, then there
would exist a polynomial f(x) in I, such that every element of I is a multiple
of f(x). What can be the degree of f(x)? Since the constant polynomial (of
degree 0) 2 is in I, there exists a polynomial g(x) in Z[x] such that 2 = f(x)g(x).
But since degree of f(x)g(x) = degree of f(x) + degree of g(x), this forces
f(x) (and also g(x)) to be constants. Further f(x) must be either 2 or −2
(since these are the only even integers which divide 2). But then 1 + x,
which is in I, cannot be expressed as a multiple of f(x).
Let us now study properties of principal ideal domains.
2.6 Proposition: Every principal ideal domain has an identity.
Proof: Let R be a p.i.d. Then R itself is an ideal of R. So R = Rx for
some x ∈ R. Then every element of R is a multiple of x. In particular there
exists e ∈ R such that x = ex. We claim that e is the identity for R. For
let y ∈ R. Then yex = yx. Obviously x ≠ 0 (as otherwise R = {0}, the
trivial ring). So cancelling by x, ye = y. By commutativity, ey also equals y.
So e is the identity element of R. ∎
In particular it follows that all euclidean rings have an identity element.
A minor but pleasant consequence of this proposition is that whenever an
ideal I of a p.i.d. is generated by an element x then x is in I, because
x = x·1 ∈ xR = I.
Before proving other properties of principal ideal domains, we prove a
few simple but useful results which show how certain concepts associated
with divisibility of one element by another can be characterised in terms of
the principal ideals generated by the two elements. First we characterise
divisibility itself. For the remainder of this section, unless otherwise stated,
R will denote an integral domain having an identity element. If a ∈ R,
(a) will denote the ideal Ra.
2.7 Proposition: Let a, b ∈ R. Then a divides* b if and only if (b) ⊂ (a).
Proof: Suppose a divides b. Then there exists c ∈ R such that b = ac.
Let x ∈ (b). Then x = rb for some r ∈ R. But then x = r(ca) = (rc)a,
showing that x ∈ (a). Thus (b) ⊂ (a). Conversely suppose (b) ⊂ (a).
Since R has an identity, b ∈ (b). So b ∈ (a), that is, b = ca for some c ∈ R.
In other words, a divides b. ∎
*Note that the relation ‘a divides b’ depends crucially on the ring R. In Z, 3 does
not divide 5. But viewed as elements of Q (which is an extension ring of Z), 3 does
divide 5. To stress the role of the ring R, we should perhaps write ‘a divides b in R’
instead of just ‘a divides b’. But, by context we always understand the ring R. This
also applies to all concepts involving divisibility such as g.c.d., units, primes, associates
to be studied later.
From this proposition we see at once that the relation | defined by
‘a | b’ iff a divides b is reflexive and transitive. (Of course, a direct proof
is equally simple.)
2.8 Definition: An element u of R is called a unit if it divides the identity
element of R.
For example, in Z, 1 and −1 are the only two units. In the ring of
Gaussian integers, 1, −1, i and −i are the units. In the ring of polynomials
over a field, all non-zero constant polynomials are units. It is easily
seen that all units of R form a multiplicative group. A unit should not be
confused with the unit element (which is just another name for the identity).
The name ‘unit’ probably comes from one of the characterisations listed
below. In mensuration problems, we try to choose the unit of measurement
in such a way that it divides all the quantities to be measured.
2.9 Proposition: Let u ∈ R. Then the following statements are equivalent:
(i) u is a unit.
(ii) u is invertible (i.e., has a multiplicative inverse in R).
(iii) u divides every element of R.
(iv) (u) = R.
Proof: The equivalence of (i) and (ii) is immediate from the definition.
If u is a unit then there exists some v such that uv = 1. Then for any
r ∈ R, r = r·1 = (rv)u. So u divides r. This proves (i) ⇒ (iii). The
implication (iii) ⇒ (i) needs no proof. Finally, the equivalence of (iii) and (iv) is
an immediate consequence of the definition of (u).
Because a unit divides every element of a ring, it is to be considered as
a trivial factor. A factorisation in which one of the factors is a unit is,
therefore, a trivial factorisation. There is a name for those elements of R
which have no nontrivial factorisations.
2.10 Definition: An element x is called a prime element, if whenever
x = ab then either a or b is a unit.
A few comments are in order about the definition. First, according to
this definition every unit is a prime element, because every factor of a unit
is itself a unit. In particular 1 is a prime element. Some authors specifically
exclude units from primes. This is really a matter of convention. Secondly,
in Z, the ring of integers, all the prime numbers, namely, 2, 3, 5, 7, ... are
prime elements. But the converse is false! Because −2, −3, −5, ... are also
prime elements. In general, if x is a prime element and u is any unit then
xu is also a prime element. Our goal is to prove a unique factorisation
theorem for principal ideal domains. For this, we would obviously not
like to treat x and xu (where u is a unit) as different factors. Such pairs
of elements are given a name.
2.11 Definition: An element a is called an associate of b if there exists
a unit u such that a = bu.
The following proposition characterises this concept.
2.12 Proposition: Let a, b ∈ R. Then the following statements are
equivalent:
(i) a is an associate of b
(ii) a and b divide each other
(iii) (a) = (b).
Proof: (i) ⇒ (ii). Suppose a is an associate of b. Then there exists a unit
u such that a = bu. So certainly b divides a. But since u is a unit, there
exists v ∈ R such that uv = 1. Then av = buv = b, which shows that a
also divides b. So (ii) holds.
(ii) ⇒ (i). Suppose a, b divide each other. Then there exist u, v ∈ R
such that a = bu and b = av. Then a = auv. Assuming a ≠ 0, this gives
uv = 1 and so u is a unit. But then a is an associate of b. If a = 0, then
b is also 0. Then a = 0 = b·1. Since 1 is a unit, a is an associate of b. So
(i) holds.
The equivalence of (ii) with (iii) is immediate from Proposition (2.7). ∎
2.13 Proposition: Associateship is an equivalence relation. Divisibility
defines a partial order on the set of the equivalence classes.
Proof: The first part follows from the characterisation (iii) above. For
a, b ∈ R, define a ≤ b to mean a divides b. Then ≤ is reflexive and
transitive. However, it need not be antisymmetric. In view of (ii) above,
a ≤ b and b ≤ a hold iff a is an associate of b. From Proposition (3.3.4)
it follows that on the set of equivalence classes we have a well-defined
partial order. ∎
Let us see if the poset just obtained is a lattice.
2.14 Theorem: Every two elements in a principal ideal domain have a
greatest common divisor (abbreviated g.c.d.). It is unique up to
associateship and is expressible as a linear combination of the two elements. Every
pair of non-zero elements has a least common multiple which is also unique
up to associateship.
Proof: Let a, b be two elements of a p.i.d. R. Let I be the set of all linear
combinations of a and b with coefficients from R, that is, I = {xa + yb :
x, y ∈ R}. It is easy to show that I is an ideal of R. I is evidently closed
under subtraction. Also, if xa + yb ∈ I and r ∈ R, then r(xa + yb) =
(rx)a + (ry)b which is in I. So I is an ideal and since R is a p.i.d., I is of
the form Rc for some c ∈ I. We claim that c is a g.c.d. of a and b. First,
a ∈ I since a = 1·a + 0·b. So a = λc for some λ ∈ R. This means c | a.
Similarly c | b. Thus c is a common divisor of a and b. To show it is the
greatest common divisor, let d be a common divisor of a and b, say a = md
and b = nd for m, n ∈ R. Now c, being an element of I, is a linear
combination of a and b, say c = x_0 a + y_0 b for x_0, y_0 ∈ R. Then c = (x_0 m + y_0 n)d,
showing that d divides c. So c is the greatest common divisor of a and b.
(Here ‘greatest’ is in the sense of the partial order defined above.) As for
its uniqueness, suppose c′ is any other g.c.d. of a and b. Then c must divide
c′ and c′ must also divide c. So c and c′ are associates of each other. Thus
the g.c.d. is unique up to associateship. That it is expressible as a linear
combination of a and b is already proved.
As for the least common multiple (l.c.m.), let a, b be non-zero elements
of R. Let c be their g.c.d. Then c ≠ 0. Let a = λc and b = μc, where
λ, μ ∈ R. Let d = λμc. We claim that d is the least common multiple of
a and b. Now, d = μa = λb and so d is a common multiple of a and b.
To show that it is the least common multiple, note first that c is a linear
combination of a and b, say c = αa + βb with α, β ∈ R. Then c = (αλ + βμ)c
which gives αλ + βμ = 1. Now suppose e is a common multiple of a and b,
say e = xa = yb with x, y ∈ R. Then e = λxc = μyc, giving λx = μy,
since c ≠ 0. From αλ + βμ = 1 we get x = αλx + βμx = αμy + βμx = μ(αy + βx)
and hence e = ax = aμ(αy + βx) = d(αy + βx) (since d = aμ = bλ).
This shows that d divides e. Thus d is the least common multiple of a and
b. Uniqueness of d (up to associateship) follows by the same argument as
that for uniqueness of g.c.d.
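In the ring Z the construction d = λμc in this proof is easy to carry out by machine. The following Python sketch is a minimal illustration for positive integers. The name lcm_from_gcd is ours, and math.gcd from the standard library is used to supply the g.c.d. c.

```python
from math import gcd


def lcm_from_gcd(a, b):
    """For positive integers a, b: write a = lam*c, b = mu*c and return lam*mu*c."""
    c = gcd(a, b)              # the g.c.d. of a and b
    lam, mu = a // c, b // c   # a = lam*c and b = mu*c
    return lam * mu * c        # d = lam*mu*c = mu*a = lam*b


d = lcm_from_gcd(1092, 195)
print(d)                       # 5460
print(d // 1092, d // 195)     # 5 28, that is, mu and lam
```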
This theorem shows that if R is a p.i.d., then the set of equivalence
classes (under associateship) of non-zero elements of R is a lattice.
The crux of the last theorem is not just the existence of a greatest
common divisor for every two elements of a p.i.d., but also the fact that
it can be expressed as a linear combination of the two elements. However,
the proof does not say how to express the g.c.d. as a linear combination
of the two elements. This is to be expected since the very definition of a
principal ideal domain is non-constructive in the sense that it merely says
that every ideal is generated by a single element, without saying anything
as to how to find this generator. However, in the case of a euclidean ring,
things are much better. The algorithm for finding the g.c.d. of two
positive integers, given at the beginning of this section, can obviously be
extended to any euclidean ring. The crucial step in proving that the
algorithm actually works is the following.
2.15 Proposition: Let R be a euclidean ring and a, b ∈ R. If a = qb + r
then the g.c.d. of a and b is the g.c.d. of b and r. If r = 0, then b is the
g.c.d. of a and b. (Here equality is up to associateship.)
Proof: Let c, d be, respectively, the g.c.d.'s of a and b and of b and r.
Then d divides b and r so it divides qb + r, i.e. d divides a. So d is a common
divisor of a and b. Hence d | c. Similarly, c, being a common divisor of a
and b, is a divisor of r (since r = a − qb). So c divides both b and r, and
hence c | d. Thus c and d are associates of each other. For the second
assertion, if r = 0, then b itself is a common divisor of a and b and obviously
is their greatest common divisor. ∎
It is now very easy to give an algorithm for finding the g.c.d. of two
elements a, b of a euclidean ring R. If b = 0 then a is their g.c.d. Otherwise
we set a_0 = a, b_0 = b and write a_0 = q_0 b_0 + r_0, where r_0 = 0 or
d(r_0) < d(b_0). If r_0 = 0, then b_0 is the g.c.d. of a and b and it is a linear
combination of a and b since b_0 = b = 1·b + 0·a. If r_0 ≠ 0 we set a_1 = b_0,
b_1 = r_0 and repeat the process, that is, write a_1 = q_1 b_1 + r_1 where r_1 = 0
or d(r_1) < d(b_1) = d(r_0). In the first case, b_1 is the g.c.d. of a_1 and b_1,
and hence also of a and b by the last proposition. Also b_1 = r_0 = a_0 − q_0 b_0 =
a − q_0 b, which is a linear combination of a and b. If r_1 ≠ 0, we continue
the procedure setting a_2 = b_1 and b_2 = r_1. In general, we continue
inductively. If r_i ≠ 0, we set a_{i+1} = b_i and b_{i+1} = r_i and write
a_{i+1} = q_{i+1} b_{i+1} + r_{i+1} in which r_{i+1} = 0 or d(r_{i+1}) < d(r_i).
By induction, it is seen that each r_i is a
linear combination of a and b. Since the degree function takes only
non-negative integers as values, there must exist some n such that r_n = 0. Then
r_{n−1} is the g.c.d. of a and b. As a numerical example we compute the g.c.d.
of 1092 and 195 as follows:
    1092 = 5 × 195 + 117
     195 = 1 × 117 + 78
     117 = 1 × 78 + 39
      78 = 2 × 39 + 0
So 39 is the g.c.d. of 1092 and 195. To express it as a linear combination,
we write 39 = 117 − 1 × 78 = 117 − (195 − 117) = 2 × 117 − 195 =
2(1092 − 5 × 195) − 195 = 2 × 1092 − 11 × 195.
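The procedure just described, together with the bookkeeping needed to express each remainder as a linear combination of a and b, can be sketched in Python for the ring Z. This is a minimal illustration under our own naming (extended_gcd) and is meant for non-negative integers. It tracks the coefficients forward during the divisions instead of substituting backwards as in the hand computation above.

```python
def extended_gcd(a, b):
    """Return (g, x, y) where g is the g.c.d. of a and b and g == x*a + y*b."""
    # Invariants: r0 == x0*a + y0*b and r1 == x1*a + y1*b.
    r0, x0, y0 = a, 1, 0
    r1, x1, y1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1                   # quotient of the current division
        r0, r1 = r1, r0 - q * r1       # old divisor becomes the new dividend
        x0, x1 = x1, x0 - q * x1       # the same step applied to the coefficients
        y0, y1 = y1, y0 - q * y1
    return r0, x0, y0


print(extended_gcd(1092, 195))         # (39, 2, -11)
```

The output agrees with the computation above: 39 = 2 × 1092 − 11 × 195.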
As noted above, for principal ideal domains which are not euclidean
rings, there is no algorithm for expressing the g.c.d. of two elements as a
linear combination of them. Still, the mere existence of such an expression
has some useful consequences. We prove one of them here. First we need a
definition.
2.16 Definition: Two elements $a, b$ of $R$ are said to be relatively prime
(or coprime) to each other if they have 1 (and hence any unit) as their
g.c.d.
For example, in $\mathbb{Z}$, 6 and 25 are relatively prime to each other.
It is immediate from the definition that two elements are relatively prime
iff they have no common factors except the units.
2.17 Proposition: If $a, b, c \in R$, $a \mid bc$ and $a, b$ are relatively
prime then $a \mid c$, provided $R$ is a principal ideal domain.
Proof: Let $bc = ax$ for some $x \in R$. Since $a, b$ are relatively prime,
their g.c.d. is 1 and since $R$ is a p.i.d., we can write 1 as
$\lambda a + \mu b$ for some $\lambda, \mu \in R$. Multiplying by $c$ we get
$c = \lambda a c + \mu b c = \lambda a c + \mu a x = a(\lambda c + \mu x)$,
showing that $a$ divides $c$ as desired. ∎
The following corollary, although a special case of the last result, is
worth singling out.
2.18 Corollary: Suppose $R$ is a p.i.d. and $p$ is a prime element of $R$.
Then, whenever $p$ divides $bc$ for $b, c \in R$, $p$ must divide either
$b$ or $c$.
Proof: Suppose $p$ does not divide $b$. The only factors of $p$ other than
units are $p$ and its associates. None of these divides $b$. So the only
common factors of $p$ and $b$ are units. Hence $p$ and $b$ are relatively
prime. By the last proposition, $p \mid c$. Hence if $p$ divides $bc$ then
either $p \mid b$ or $p \mid c$. ∎
We had already used this corollary in the proof of Theorem (5.1.5) in
which it was shown that if $p$ is a prime then the set of non-zero residue
classes modulo $p$ is a group of order $p - 1$ under multiplication modulo
$p$. The same fact was also used (without proof) in showing that
$\mathbb{Z}_p$ is a field, in the last section. Now that we have proved it,
we have filled the gap. Actually, now that we have at our disposal
Proposition (2.17), which is more general than Corollary (2.18), we can
prove a more general result.
2.19 Proposition: Let $m$ be a positive integer. Then the set of residue
classes modulo $m$ which are relatively prime to $m$, that is, the set
$\{[x] : x \text{ is relatively prime to } m\}$, is a group under modulo $m$
multiplication.
Proof: Let $\mathbb{Z}_m$ be the ring of residue classes modulo $m$. Let $S$
be the set of residue classes relatively prime to $m$. (Note that if
$x \equiv y$ (modulo $m$), then $m$ and $x$ are relatively prime iff $m$ and
$y$ are relatively prime. So the concept of a residue class being relatively
prime to $m$ is well-defined.) We assert that $S$ is closed under modulo $m$
multiplication. Let $[x], [y] \in S$. Then $x, y \in \mathbb{Z}$ and both
are relatively prime to $m$. So there exist $a, b, c, d \in \mathbb{Z}$ such
that $am + bx = 1$ and $cm + dy = 1$. Multiplying these two equations, we
see that $(acm + bcx + ady)m + bdxy = 1$. So $m$ and $xy$ are relatively
prime (any common factor of them will be a factor of 1 since 1 is a linear
combination of $xy$ and $m$). So $[xy] \in S$, that is $[x][y] \in S$. So
$S$ is closed under modulo $m$ multiplication. Therefore, mod $m$
multiplication defines a commutative and associative binary operation on
$S$. We claim that the cancellation law holds for this operation. For
suppose $[x], [y], [z]$ are in $S$ and $[x][z] = [y][z]$. Then $(x - y)z$ is
divisible by $m$. But $m$ is relatively prime to $z$. So by Proposition
(2.17), $m$ divides $x - y$. Thus $x \equiv y$ (modulo $m$) or $[x] = [y]$.
We have shown that $S$ is a semigroup in which the cancellation laws hold.
Obviously $S$ is finite, since $S \subset \mathbb{Z}_m$ and $\mathbb{Z}_m$
is finite. So by Proposition (5.1.4), $S$ is a group. This completes the
proof. But it is instructive to go a little further. Every element of $S$
is a unit in the ring $\mathbb{Z}_m$, because it has a multiplicative
inverse. Conversely, if $[x]$ is a unit in $\mathbb{Z}_m$ then there exists
$[y] \in \mathbb{Z}_m$ such that $xy \equiv 1$ (modulo $m$). But then $x$
must be relatively prime to $m$. (Prove.) So $[x] \in S$. In other words,
the elements of $S$ are precisely the units of the ring $\mathbb{Z}_m$. ∎
Let us denote this group by $R_m$. Elements of $R_m$ can be represented by
integers between 1 and $m - 1$ which are relatively prime to $m$. The number
of such integers is $\varphi(m)$, where $\varphi$ is the Euler
$\varphi$-function (see Exercise (2.4.15)).
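For small moduli the statement can be checked by brute force. The sketch below is ours (the helper names `units` and `is_group_under_mult` are hypothetical); it lists the residues relatively prime to $m$, for $m \geq 2$, and verifies closure and the existence of inverses directly, rather than through the cancellation argument used in the proof.

```python
from math import gcd


def units(m):
    """Residues 1, ..., m-1 that are relatively prime to m (assumes m >= 2)."""
    return [x for x in range(1, m) if gcd(x, m) == 1]


def is_group_under_mult(S, m):
    """Check closure and existence of inverses for S under multiplication mod m."""
    closed = all((x * y) % m in S for x in S for y in S)
    inverses = all(any((x * y) % m == 1 for y in S) for x in S)
    return closed and inverses


R12 = units(12)
print(R12)                                  # [1, 5, 7, 11]
print(is_group_under_mult(set(R12), 12))    # True
```

The length of `units(m)` is the value of the Euler function at $m$; for $m = 12$ it is 4.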
Integers which are relatively prime to a given integer figure many times
in various contexts. We cite one example.
2.20 Proposition: Let $G$ be a cyclic group of order $n$. Then $G$ has
$\varphi(n)$ distinct generators.
Proof: Let $x$ be a generator for $G$. Then $G$ consists of
$x, x^2, \dots, x^{n-1}$ and $x^n (= e)$. We claim that for $1 \leq m < n$,
$x^m$ is a generator of $G$ iff $m$ is relatively prime to $n$. Suppose
$x^m$ is a generator. Then every element of $G$ is expressible as some power
of $x^m$. In particular $(x^m)^r = x$ for some $r \in \mathbb{Z}$. But then
$x^{mr} = x$, which shows $mr \equiv 1$ (modulo $n$), by Theorem (5.1.10).
So $[m][r] = [1]$ in the ring $\mathbb{Z}_n$. This means $[m]$ is a unit in
$\mathbb{Z}_n$ and so $m$ is relatively prime to $n$. Conversely if $m$ is
relatively prime to $n$, we trace this argument backwards and get $x$ as
$(x^m)^r$ for some $r$. But then every $x^k$ can be expressed as
$(x^m)^{rk}$ and hence is in the subgroup generated by $x^m$. This
establishes our claim and shows that the number of generators of $G$ equals
the number of positive integers which are less than $n$ and are relatively
prime to $n$. By definition, this number equals $\varphi(n)$. ∎
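As an illustration, we may take $G$ to be the additive group of residues modulo $n$, with the residue 1 playing the role of $x$, so that the power $x^m$ becomes the residue $m$. The sketch below (function names are ours, and $n \geq 2$ is assumed) finds the generators by listing the subgroup each residue generates and compares the answer with the residues relatively prime to $n$.

```python
from math import gcd


def generated_subgroup(m, n):
    """All multiples of m modulo n: the additive analogue of the powers of x^m."""
    return {(m * r) % n for r in range(n)}


def generators(n):
    """Residues m with 1 <= m < n that generate the whole group (assumes n >= 2)."""
    return [m for m in range(1, n) if len(generated_subgroup(m, n)) == n]


print(generators(12))                                   # [1, 5, 7, 11]
print([m for m in range(1, 12) if gcd(m, 12) == 1])     # [1, 5, 7, 11]
```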
We apply this result to the group of all complex nth roots of unity.
Recall from Chapter 5, Section 1 that a generator for this group is called
a primitive nth root of unity. So we see that for every positive integer n,
the number of primitive nth roots of unity is $\varphi(n)$.
A few other results where the g.c.d. of two integers is needed will be
given as exercises. Let us go back to principal ideal domains in general.
From Proposition (2.19) it follows that if $p$ is a prime then
$\mathbb{Z}_p$ is a field. (We had already mentioned this in the last
chapter. But the proof was incomplete since it involved the use of
Corollary (2.18).) Note that $\mathbb{Z}_p$ is the quotient ring of
$\mathbb{Z}$ by the ideal $p\mathbb{Z}$, which is the principal ideal
generated by $p$, which is a prime element. It would be natural to inquire
if this can be generalised to all principal ideal domains. The affirmative
answer is provided by the following result.
2.21 Theorem: Let $R$ be a principal ideal domain and $p \in R$. Then $p$ is
a prime element if and only if the quotient ring $R/(p)$ is a field.
Proof: If p is a unit then (p) = R and the quotient ring is trivial. It is,
therefore, the trivial field. Conversely, if R/(p) is the trivial ring then R=(p)
and so p is a unit. By our convention, units are included among prime
elements. So the result holds trivially when p is a unit.
Suppose $p$ is not a unit. Then $(p)$ is a proper ideal of $R$. In view of
Proposition (1.16), we are reduced to proving that $(p)$ is a maximal ideal
iff $p$ is a prime element. Suppose $(p)$ is a maximal ideal and let
$p = ab$ be a factorisation of $p$. Then $p \in aR = (a)$, so
$(p) \subset (a)$. By maximality of $(p)$, $(a) = R$ or else $(a) = (p)$.
In the first case $a$ is a unit. In the second case, $p$ is an associate of
$a$ by Proposition (2.12). So $a = pu$ for some unit $u$. But then
$p = ab = pub$ whence $ub = 1$ and so $b$ is a unit. Thus in any
factorisation of $p$ at least one factor must be a unit. So $p$ is a prime
element.
Conversely suppose $p$ is a prime element. Let $I$ be an ideal such that
$(p) \subsetneq I \subset R$. We have to show that $I = R$. Since $R$ is a
p.i.d., $I$ is of the form $(q)$ for some $q \in R$. Since
$(p) \subset (q)$, by Proposition (2.7), $q$ divides $p$. So $p = aq$ for
some $a \in R$. Now $a$ cannot be a unit, for otherwise $q$ will be an
associate of $p$ and hence $(p) = (q)$, contradicting that $(p)$ is properly
contained in $I$. Since $p$ is a prime element, $p = aq$ and $a$ is not a
unit, the other factor, $q$, must be a unit. But then $(q) = R$, showing
$I = R$, as was to be proved. ∎
This theorem is of great importance in the construction of field
extensions. Let $F$ be a field. Then $F[x]$, the ring of polynomials over
$F$, is a euclidean ring and hence a p.i.d. The prime elements of this ring
are called irreducible polynomials over $F$. Suppose $f(x)$ is an
irreducible polynomial over $F$. Let $I$ be the ideal generated by $f(x)$.
$I$ consists of all polynomials of the form $f(x)g(x)$ where
$g(x) \in F[x]$. Let $K$ be the quotient ring $F[x]/I$. Then by the theorem
above $K$ is a field. If $f(x)$ is a constant (i.e., a degree zero)
polynomial then of course $I = F[x]$ and $K$ is trivial. But suppose $f(x)$
is of positive degree. Then the function $\theta : F \to K$ defined by
$\theta(a) = a + I$ for $a \in F$ is a ring homomorphism with kernel 0,
because the ideal $I$ contains no constant polynomial except 0. So the range
of $\theta$ is a subring of $K$ and is isomorphic to $F$. As usual we
identify $F$ as a subfield of $K$. Thus $K$ is a field extension of $F$.
This extension has a very interesting property. To study it, we introduce a
concept which itself is of great importance.
2.22 Definition: Let $f(x) = a_0 + a_1 x + \dots + a_n x^n$ be a polynomial
with coefficients in some ring $R$ (not necessarily an integral domain).
Then an element $\alpha$ of $R$ is called a root (or a zero) of $f(x)$ if
$$a_0 + a_1 \alpha + a_2 \alpha^2 + \dots + a_n \alpha^n = 0.$$
There is a handy way to look at this concept. In the polynomial $f(x)$, $x$
is just a symbol, an indeterminate. If we substitute for $x$ some element,
say $\alpha$, of the ring $R$, then we get an element of $R$ which we denote
by $f(\alpha)$. This way we get a function from $R$ to itself. We may denote
this function by $f$. A root of the polynomial $f(x)$ is then simply an
element at which the function $f$ takes the value 0. It should be noted,
however, that a polynomial is conceptually different from the function it
induces. Two different polynomials may induce the same function. For
example, let $R = \mathbb{Z}_2$, the ring of mod 2 residue classes. Let
$f(x) = x^2 + x + 1$ and $g(x) = 1$. Then $f(x)$ and $g(x)$ are different
polynomials but they determine the same function from $\mathbb{Z}_2$ to
itself because $f(0) = 0 + 0 + 1 = 1 = g(0)$ and
$f(1) = 1 + 1 + 1 = 1 = g(1)$.
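The distinction between a polynomial and the function it induces is easy to see in code if a polynomial is stored as its list of coefficients. In the sketch below (the representation and the name `evaluate` are our own choices) the two coefficient lists differ, yet they give the same table of values on the residues modulo 2.

```python
def evaluate(coeffs, a, m):
    """Value at a, modulo m, of the polynomial with the given coefficients.

    coeffs[k] is the coefficient of x^k; Horner's rule is used."""
    result = 0
    for c in reversed(coeffs):
        result = (result * a + c) % m
    return result


f = [1, 1, 1]   # 1 + x + x^2
g = [1]         # the constant polynomial 1

print(f == g)                                   # False: different polynomials
print([evaluate(f, a, 2) for a in (0, 1)])      # [1, 1]
print([evaluate(g, a, 2) for a in (0, 1)])      # [1, 1]: the same function
```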
2.23 Proposition: Suppose $R$ is a commutative ring with identity. Let
$f(x), g(x)$ be polynomials in $R[x]$. Let $h(x) = f(x) + g(x)$ and
$p(x) = f(x)g(x)$. Let $f, g, h, p$ be the functions from $R$ to $R$ defined
by the polynomials $f(x), g(x), h(x)$ and $p(x)$ respectively. Then $h$ is
the pointwise sum of $f$ and $g$ and $p$ is the pointwise product of $f$
and $g$.
Proof: We have to show that for every $\alpha \in R$,
$h(\alpha) = f(\alpha) + g(\alpha)$ and $p(\alpha) = f(\alpha)g(\alpha)$.
This may seem obvious and is indeed straightforward to prove, keeping in
mind the way polynomials are added and multiplied. Note, however, that for
proving $p(\alpha) = f(\alpha)g(\alpha)$, commutativity of $R$ is vitally
needed. If $R$ is the ring of quaternions, $f(x) = x$ and $g(x) = ix$, then
$p(x) = ix^2$. But $f(j)g(j) = jk = i$ while $p(j) = ij^2 = -i$. Thus
$f(j)g(j) \neq p(j)$ even though, as polynomials, $f(x)g(x) = p(x)$. The
crux of the matter is that when we think of $R$ as a subring of $R[x]$, the
indeterminate $x$ commutes with all elements of $R$, but when $x$ is
replaced by some element $\alpha$ (say) of $R$, this element need not
commute with other elements of $R$. If $R$ is commutative this difficulty
does not arise and the result holds. ∎
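The quaternion computation in the proof can be verified mechanically. In the sketch below a quaternion $a + bi + cj + dk$ is stored as the tuple `(a, b, c, d)`; the representation and the name `qmul` are our assumptions, while the multiplication rule itself is the standard Hamilton product.

```python
def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
k = (0, 0, 0, 1)

f_at_j = j                        # f(x) = x evaluated at j
g_at_j = qmul(i, j)               # g(x) = ix evaluated at j, which is k
p_at_j = qmul(i, qmul(j, j))      # p(x) = ix^2 evaluated at j

print(qmul(f_at_j, g_at_j))       # (0, 1, 0, 0), that is, i
print(p_at_j)                     # (0, -1, 0, 0), that is, -i
```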
Using the last proposition and the euclidean algorithm we get the
following characterisation of roots, sometimes called the 'remainder
theorem'.
2.24 Theorem: Let $F$ be a field and $f(x) \in F[x]$. Then an element
$\alpha$ of $F$ is a root of $f(x)$ if and only if $(x - \alpha)$ is a
factor of $f(x)$ in $F[x]$.
Proof: Suppose first that $x - \alpha$ is a factor of $f(x)$. Then
$f(x) = (x - \alpha)g(x)$ for some $g(x) \in F[x]$. By the last proposition,
$f(\alpha) = (\alpha - \alpha)g(\alpha) = 0$. So $\alpha$ is a root of
$f(x)$. Conversely suppose $\alpha$ is a root of $f(x)$. Then
$f(\alpha) = 0$. By the euclidean algorithm for $F[x]$, there exist
polynomials $q(x), r(x)$ in $F[x]$ such that
$f(x) = q(x)(x - \alpha) + r(x)$ where $r(x) = 0$ or
$\deg(r(x)) < \deg(x - \alpha) = 1$. In any case it follows that $r(x)$ is a
constant, say $\beta$. Now, by the last proposition again,
$f(\alpha) = q(\alpha)(\alpha - \alpha) + r(\alpha)$. But $f(\alpha) = 0$,
since $\alpha$ is a root of $f(x)$. So $r(\alpha) = \beta = 0$. Hence $r(x)$
is the zero polynomial. Thus $f(x) = q(x)(x - \alpha)$, showing that
$x - \alpha$ is a factor of $f(x)$. ∎
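The division by $x - \alpha$ used in the proof can be carried out by the familiar synthetic division scheme. The sketch below works over the rationals and uses our own function name `divide_by_linear`; the polynomial $x^3 - 6x^2 + 11x - 6$ is an example of ours, not one from the text. The remainder returned is always the value $f(\alpha)$, so it vanishes exactly when $\alpha$ is a root.

```python
from fractions import Fraction


def divide_by_linear(coeffs, alpha):
    """Divide f(x) by (x - alpha) by synthetic division.

    coeffs lists the coefficients of f from the highest degree down
    (assumed non-empty). Returns (quotient coefficients, remainder)."""
    acc = []
    carry = Fraction(0)
    for c in coeffs:
        carry = carry * alpha + c
        acc.append(carry)
    return acc[:-1], acc[-1]


f = [1, -6, 11, -6]                  # x^3 - 6x^2 + 11x - 6
quotient, remainder = divide_by_linear(f, 2)
print(quotient, remainder)           # quotient x^2 - 4x + 3, remainder 0
print(divide_by_linear(f, 4)[1])     # remainder 6, which is f(4)
```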
A consequence of this result is noteworthy.
2.25 Proposition: A polynomial of degree $n$ over a field can have at
most $n$ distinct roots in that field.
Proof: Let $F$ be a field and suppose $f(x) \in F[x]$ has degree $n$. We
prove the result by induction on $n$. For $n = 0$, the polynomial is a
non-zero constant polynomial and so has no roots. Suppose $n > 0$ and the
result has been proved for all polynomials of degree less than $n$. If
$f(x)$ has no root the result certainly holds. So suppose $f(x)$ has at
least one root, say $\alpha$. Then by the last theorem, we can write
$f(x) = (x - \alpha)g(x)$ where $g(x)$ is a polynomial over $F$. Clearly
$g(x)$ has degree $n - 1$. If $\alpha_1$ is any root of $f(x)$ other than
$\alpha$, then $f(\alpha_1) = 0 = (\alpha_1 - \alpha)g(\alpha_1)$, which
shows $g(\alpha_1) = 0$ since $\alpha_1 - \alpha \neq 0$. Thus all roots of
$f(x)$ except $\alpha$ are roots of $g(x)$. By induction hypothesis, $g(x)$
has at most $n - 1$ distinct roots in $F$. These roots, along with $\alpha$,
are the only roots of $f(x)$. So $f(x)$ has at most $n$ distinct roots. ∎
This proposition can be strengthened in two ways. First, it remains true
even if each root is counted as many times as its multiplicity. This version
will be given as an exercise. Secondly, it holds even for integral domains.
If $R$ is an integral domain we let $F$ be its field of quotients. Then
$R \subset F$ and so $R[x] \subset F[x]$. If $f(x) \in R[x]$ then a root of
$f(x)$ in $R$ is also a root of it in $F$. So the number of distinct roots
of $f(x)$ in $R$ cannot exceed the number of distinct roots of $f(x)$ in
$F$, which, by the proposition, is at most the degree of $f$. Thus the
hypothesis that the ground ring be a field can be relaxed a little. However,
commutativity cannot be dispensed with. In the ring of quaternions, the
polynomial $x^2 + 1$ of degree 2 has at least 6 distinct roots, namely
$\pm i$, $\pm j$ and $\pm k$. Similarly, absence of zero divisors is
necessary. In a Boolean ring, every element is a root of the polynomial
$x^2 - x$ (which also equals $x^2 + x$, since $-1 = 1$).
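The role of zero divisors can be seen by counting roots by brute force. The sketch below is an illustration of ours: it compares the polynomial $x^2 - x$ over the field of residues modulo 7 with the same polynomial over the residues modulo 6, and then over a small Boolean ring, taken here to be the subsets of a 3-element set coded as 3-bit integers, with symmetric difference as sum and intersection as product.

```python
def roots_mod(coeffs, m):
    """Roots, among the residues modulo m, of the polynomial sum of coeffs[k] * x^k."""
    def value(a):
        return sum(c * a ** k for k, c in enumerate(coeffs)) % m
    return [a for a in range(m) if value(a) == 0]


x2_minus_x = [0, -1, 1]              # x^2 - x

print(roots_mod(x2_minus_x, 7))      # [0, 1]: at most 2 roots over a field
print(roots_mod(x2_minus_x, 6))      # [0, 1, 3, 4]: 4 roots, since 6 = 2 x 3

# Boolean ring of subsets of {0, 1, 2}: XOR is the sum, AND is the product,
# so x^2 - x = x^2 + x becomes (s & s) ^ s.
boolean_roots = [s for s in range(8) if ((s & s) ^ s) == 0]
print(len(boolean_roots))            # 8: every element is a root
```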
After this digression about roots, let us go back to the field extension
obtained through an irreducible polynomial. Suppose $f(x)$ is an irreducible
polynomial of degree $n$ $(> 0)$ over a field $F$. If $n > 1$, then $f(x)$
can have no roots in $F$, because if $\alpha$ were a root, then
$(x - \alpha)$ would be a nontrivial factor of $f(x)$, contradicting that it
is a prime element in $F[x]$. (The converse is not true. The polynomial
$(x^2 + 1)^2$ has no root in $\mathbb{R}$, but is not irreducible in
$\mathbb{R}[x]$, as it is the product of $x^2 + 1$ with itself.) We let $I$
be the ideal of $F[x]$ generated by $f(x)$, and $K$ be the quotient ring
$F[x]/I$. We already saw that $K$ is a field. We identify $F$ with a
subfield of $K$, by letting $a \in F$ correspond to the coset $a + I$. Since
$F \subset K$, the polynomial $f(x)$ might as well be considered as a
polynomial over $K$. But, as a polynomial in $K[x]$, $f(x)$ is not
irreducible when $n > 1$. As a matter of fact it has a root and this root is
none other than the element $x + I$. To see this, suppose
$f(x) = a_0 + a_1 x + \dots + a_n x^n$ where $a_0, a_1, \dots, a_n \in F$.
Let $\alpha$ be the coset $x + I$ in $F[x]/I$. By definition of coset
multiplication, $\alpha^2 = (x + I)(x + I) = x^2 + I$. More generally, for
all $k \geq 1$, $\alpha^k = x^k + I$. Also for any $a \in F$, $a\alpha$
really means $(a + I)(x + I)$, which equals $ax + I$. Thus we see that
$f(\alpha) = a_0 + a_1 \alpha + \dots + a_n \alpha^n$ is simply
$(a_0 + a_1 x + \dots + a_n x^n) + I$. But this is the zero coset, $I$,
because $a_0 + a_1 x + \dots + a_n x^n = f(x) \in I$. So $f(\alpha) = 0$.
That is, $x + I$ is a root of $f(x)$ in $K$. We have thus proved the first
half of the following important result.
2.26 Theorem: Every irreducible polynomial of positive degree over a
field has a root in some extension field of it. Moreover, no such root can
satisfy a non-zero polynomial of lower degree with coefficients in the given
field.
Proof: The first part was already proved above. For the second assertion,
suppose $F$ is a field and $f(x)$ is an irreducible polynomial of degree $n$
in $F[x]$. Let $\alpha$ be a root of $f(x)$ in some extension field, say
$K$, of $F$. Let $J = \{g(x) \in F[x] : g(\alpha) = 0\}$. Clearly
$f(x) \in J$ and the assertion amounts to showing that $J$ cannot contain
any non-zero polynomial of degree less than $n$. It is easily seen that $J$
is an ideal of $F[x]$. Since $F[x]$ is a euclidean ring, by Theorem 2.4, $J$
is a principal ideal, i.e., there is some polynomial $h(x) \in F[x]$ such
that $J$ consists precisely of multiples of $h(x)$. In particular $f(x)$ is
a multiple of $h(x)$. Now $h(x)$ cannot be a unit, since a non-zero constant
does not vanish at $\alpha$. So, since $f(x)$ is irreducible, this can
happen only if $h(x)$ is an associate of $f(x)$. In that case, $J$ is also
generated by $f(x)$. Consequently every non-zero element of $J$ is a
multiple of $f(x)$ and hence has degree at least $n$. ∎
Lest the construction appear too abstract, we illustrate it with a
particular example. Let $F = \mathbb{R}$, the field of real numbers, and let
$f(x) = x^2 + 1$. Then $f(x)$ is an irreducible polynomial over
$\mathbb{R}$, because if $f(x) = p(x)q(x)$ where neither $p(x)$ nor $q(x)$
is a unit then both $p(x)$ and $q(x)$ would be polynomials of degree 1.
Since polynomials of degree 1 always have roots, and any root of $p(x)$ (or
of $q(x)$) is also a root of $f(x)$, this would imply that $f(x)$ has a root
in $\mathbb{R}$. But we know this not to be the case, since, in
$\mathbb{R}$, the square of every element is non-negative (cf. Exercise
1.26). So $f(x)$ is irreducible over $\mathbb{R}$. Let
$K = \mathbb{R}[x]/I$ as constructed above. We claim $\mathbb{R}[x]/I$ is
isomorphic to $\mathbb{C}$, the field of complex numbers. Define
$\theta : \mathbb{R}[x] \to \mathbb{C}$ by $\theta(g(x)) = g(i)$, for
$g(x) \in \mathbb{R}[x]$. In other words, if
$g(x) = b_0 + b_1 x + \dots + b_m x^m$ where
$b_0, \dots, b_m \in \mathbb{R}$, then $\theta(g(x))$ is the complex number
$b_0 + b_1 i + \dots + b_m i^m$. Clearly $\theta$ is a ring homomorphism and
$\theta$ is onto since every complex number, say $a + ib$, can be expressed
as $\theta(g(x))$ where $g(x) = a + bx$. We claim that the kernel of
$\theta$ is precisely the ideal $I$ generated by $x^2 + 1$. If $g(x) \in I$
then $g(x) = h(x)(x^2 + 1)$ for some $h(x) \in \mathbb{R}[x]$. But then
$g(i) = h(i)(i^2 + 1) = h(i) \cdot 0 = 0$, showing that
$g(x) \in \mathrm{Ker}(\theta)$. Conversely, suppose
$g(x) \in \mathrm{Ker}(\theta)$. Then $g(i) = 0$. By the euclidean algorithm
write $g(x) = q(x)(x^2 + 1) + r(x)$. Then $r(i) = 0$. Also $r(x)$ is either
0 or a polynomial of degree 0 or 1. But the complex number $i$ satisfies no
polynomial equation of degree 0 or 1 with real coefficients. (If it did, $i$
would be a real number.) So $r(i) = 0$ forces $r(x)$ to be identically zero.
Hence $g(x) = q(x)(x^2 + 1)$, showing $g(x) \in I$. Thus we have shown that
the kernel of $\theta$ is precisely the ideal $I$. By the fundamental
theorem about ring homomorphisms (cf. Exercise 1.22), it follows that
$\mathbb{R}[x]/I$ is isomorphic to $\mathbb{C}$. Thus the extension field we
constructed is isomorphic to the field of complex numbers. In this
isomorphism the root $x + I$ goes to the number $i$. We could have as well
defined $\psi : \mathbb{R}[x] \to \mathbb{C}$ by $\psi(g(x)) = g(-i)$. In
that case $x + I$ would correspond to $-i$.
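The arithmetic of the quotient ring is simple enough to program. Every coset has a unique representative of degree at most 1, so in the sketch below the coset $(a + bx) + I$ is stored as the pair `(a, b)`; the names `coset_mul` and `theta` are ours. Multiplying representatives and reducing by $x^2 = -1$ reproduces the usual rule for multiplying complex numbers.

```python
def coset_mul(u, v):
    """Product of the cosets (a + bx) + I and (c + dx) + I, where I = (x^2 + 1)."""
    a, b = u
    c, d = v
    # (a + bx)(c + dx) = ac + (ad + bc)x + bd x^2, and x^2 = -1 modulo I.
    return (a * c - b * d, a * d + b * c)


def theta(u):
    """Image of the coset (a + bx) + I under the isomorphism sending x + I to i."""
    a, b = u
    return complex(a, b)


alpha = (0, 1)                        # the coset x + I
print(coset_mul(alpha, alpha))        # (-1, 0): alpha is a root of x^2 + 1

u, v = (2, 3), (-1, 4)
print(theta(coset_mul(u, v)) == theta(u) * theta(v))   # True
```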
The argument above shows that we might as well have defined complex
numbers as elements of the extension field $\mathbb{R}[x]/I$ constructed
above. As mentioned in the last section, an alternate approach is to regard
a complex number as a $2 \times 2$ real matrix of a certain form. With this
representation the complex number $i$ corresponds to the matrix
$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. The smallest subring of
$M_2(\mathbb{R})$ containing all matrices of the form
$\begin{pmatrix} b & 0 \\ 0 & b \end{pmatrix}$ for $b \in \mathbb{R}$ and
the matrix $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ is isomorphic to
$\mathbb{C}$, and hence to $\mathbb{R}[x]/I$. A direct isomorphism between
$\mathbb{C}$ and this subring can be constructed by first defining a
homomorphism $\phi : \mathbb{R}[x] \to M_2(\mathbb{R})$ which takes a
polynomial
$$b_0 + b_1 x + \dots + b_m x^m$$
to the matrix
$$\begin{pmatrix} b_0 & 0 \\ 0 & b_0 \end{pmatrix}
+ \begin{pmatrix} b_1 & 0 \\ 0 & b_1 \end{pmatrix}
\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
+ \begin{pmatrix} b_2 & 0 \\ 0 & b_2 \end{pmatrix}
\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}^2
+ \dots
+ \begin{pmatrix} b_m & 0 \\ 0 & b_m \end{pmatrix}
\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}^m.$$
It can be shown that the kernel of $\phi$ is precisely the ideal $I$
generated by the polynomial $x^2 + 1$. The matrix
$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ is called the companion
matrix of the polynomial $x^2 + 1$. More generally, for every irreducible
polynomial $f(x)$ of degree $n$ over $F$, we can define its companion matrix
as a certain $n \times n$ matrix. Then the extension field formed from
$f(x)$ comes out to be isomorphic to the subring of $M_n(F)$ generated by
the companion matrix. This provides a 'concrete' representation of the
somewhat 'abstract' extension field constructed above. We could do this now,
but we defer it to the fourth section where we would be in a better position
to do it.
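A quick computation confirms that the companion matrix of $x^2 + 1$ behaves like the number $i$. The sketch below uses plain nested lists for $2 \times 2$ matrices; the helper names `matmul` and `as_matrix` are ours. It checks that the square of the companion matrix is the negative of the identity, and that the matrix attached to $a + ib$ multiplies the way complex numbers do.

```python
def matmul(A, B):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(A[r][t] * B[t][c] for t in range(2)) for c in range(2)]
            for r in range(2)]


J = [[0, -1], [1, 0]]                 # companion matrix of x^2 + 1


def as_matrix(a, b):
    """The matrix a*I + b*J, which represents the complex number a + ib."""
    return [[a, -b], [b, a]]


print(matmul(J, J))                   # [[-1, 0], [0, -1]]

# (2 + 3i)(-1 + 4i) = -14 + 5i, and the matrices agree:
print(matmul(as_matrix(2, 3), as_matrix(-1, 4)) == as_matrix(-14, 5))   # True
```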
It is natural to inquire whether Theorem (2.26) can be extended to any
non-constant polynomial. Suppose $f(x)$ is a polynomial of degree $n > 0$
over a field $F$. Can we always find some extension field $K$ of $F$ in
which $f(x)$ (regarded as a polynomial over $K$ now) has a root? If $f(x)$
is irreducible, the answer is 'yes' by Theorem (2.26). If $f(x)$ is not
irreducible but has some non-trivial factor $g(x) \in F[x]$ which is
irreducible, then also the answer is 'yes', because any root of $g(x)$ (in
whatever extension field of $F$) is evidently a root of $f(x)$ also. So the
problem reduces to proving that the polynomial $f(x)$ has at least one
irreducible factor. This can be done by repeated factorisation, or what
amounts to the same, by induction on the degree of $f(x)$. If $f(x)$ is not
irreducible, then it has a proper factor (i.e. a factor which is neither a
unit nor an associate of $f(x)$), say $f_1(x)$. Note that the degree of
$f_1(x)$ will be strictly less than that of $f(x)$. So by induction
hypothesis, $f_1(x)$ has an irreducible polynomial as a factor. This
polynomial will also be a factor of $f(x)$. Alternatively, we repeat the
argument for $f_1(x)$. If $f_1(x)$ is not irreducible, then it has a proper
factor $f_2(x)$. Again $\deg(f_2(x)) < \deg(f_1(x))$. We now apply the same
argument to $f_2(x)$. This process has to stop because the degree function
cannot go on strictly descending infinitely many times. Thus eventually we
shall find some irreducible factor of $f(x)$.
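The descent by repeated factorisation works in any euclidean ring, and it is easiest to watch in the ring of integers, where the size of a positive integer plays the part of the degree. The sketch below is an integer analogue of the argument, not the polynomial case itself, and its function names are ours. It keeps replacing a number by a proper factor until none exists; the chain is strictly decreasing, so it must stop, and its last term is a prime factor of the number we started with.

```python
def proper_factor(n):
    """A factor d of n with 1 < d < n, or None if n is prime (assumes n >= 2)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return n // d        # the larger cofactor, so that the chain is longer
        d += 1
    return None


def descending_chain(n):
    """n, a proper factor of n, a proper factor of that, ... ending in a prime."""
    chain = [n]
    factor = proper_factor(n)
    while factor is not None:
        chain.append(factor)
        factor = proper_factor(factor)
    return chain


print(descending_chain(360))     # [360, 180, 90, 45, 15, 5]
```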
Although this settles the question with which we started (namely,
whether Theorem (2.26) can be generalised to any non-constant polynomial),
it is instructive to see whether the argument given above for the existence
of irreducible factors can be generalised from the ring F[x] to some other
types of rings. For a euclidean ring $R$, there is little difficulty once
we show that the degree of a proper factor is less than that of the original
element. However, if we try to generalise it to a still wider class of
integral domains, namely, principal ideal domains, then we have to
reformulate the fact that a sequence of elements with strictly descending
degrees must be finite (i.e. cannot go on ad infinitum). As we have done
many times before, if we replace the elements with ideals then we get the
appropriate reformulation.
2.27 Proposition: In a principal ideal domain, every properly ascending
chain of ideals must be finite.
Proof: Let $R$ be a p.i.d. We have to prove that there cannot exist an
infinite sequence of ideals of $R$, $I_1, I_2, I_3, \dots, I_n, \dots$, in
which $I_n \subsetneq I_{n+1}$ for all $n = 1, 2, \dots$. Suppose such a
sequence exists. Then we shall get a contradiction as follows. Let
$I = \bigcup_{n=1}^{\infty} I_n$. The set $I$ is the union of the infinite
family of sets $\{I_1, I_2, I_3, \dots, I_n, \dots\}$. It is defined exactly
like the union of finitely many sets. That is, $I$ consists of those
elements of $R$ which belong to at least one $I_n$, or
$I = \{x \in R : x \in I_n \text{ for some } n, \text{ which may depend on } x\}$.
Note that each $I_n$ is contained in $I$. Now, in general, the union of
ideals need not be an ideal. However, in this case we show that $I$ is an
ideal. First, let $x, y \in I$. We have to show $x - y \in I$. By definition
there exist positive integers $m, n$ such that $x \in I_m$ and $y \in I_n$.
Now either $m \leq n$ or $n \leq m$. In the first case, $x \in I_n$ (since
$I_m \subset I_n$). Also $y \in I_n$. So, $I_n$ being an ideal,
$x - y \in I_n$. But then $x - y \in I$. In the second case $x, y$ are both
in $I_m$ whence $x - y \in I_m$ and hence $x - y \in I$. In either case
$x - y \in I$. For the second condition of an ideal, let $x \in I$ and
$r \in R$. Then $x \in I_n$ for some $n$. But then $rx \in I_n$, since $I_n$
is an ideal. So $rx \in I$. Thus we have shown that $I$ is an ideal of $R$.
Now we use the fact that $R$ is a p.i.d. So there is some $x \in I$ such
that $I = Rx$. By definition, $x \in I_n$ for some $n$. But then
$Rx \subset I_n$, $I_n$ being an ideal. So $I \subset I_n$. We already have
that $I_{n+1} \subset I$. So $I_{n+1} \subset I_n$. This contradicts our
assumption that $I_n$ is a proper subset of $I_{n+1}$. This contradiction
establishes our assertion, namely that the given sequence of ideals must
terminate. ∎
This proposition is the key step in proving the existence of a
factorisation of an element of a p.i.d. as a product of powers of distinct
prime elements. By distinct primes here, we mean primes which are not
associates of each other. It also turns out that the factorisation is
unique, where uniqueness is, again, up to associateship. We state and prove
both these results in the following theorem which may be called the unique
factorisation theorem.
2.28 Theorem: Let $R$ be a principal ideal domain and $x \in R$. Suppose
$x$ is not zero and not a unit. Then there exist a unit $u$ and distinct
prime elements $p_1, \dots, p_r$ in $R$ (which are not units) and positive
integers $n_1, n_2, \dots, n_r$ such that
$x = u p_1^{n_1} p_2^{n_2} \cdots p_r^{n_r}$. If $x$ also equals
$v q_1^{m_1} q_2^{m_2} \cdots q_s^{m_s}$ where $v$ is a unit, the $q_j$'s
are distinct prime elements other than units and the $m_j$'s are positive
integers, then $r = s$ and, with a re-indexing of the $q_j$'s, we have that
for each $i = 1, \dots, r$, $m_i = n_i$ and $p_i$ is an associate of $q_i$.
Proof: Let $S$ be the set of those elements of $R$ which are not zero, not
units and for which a factorisation into a unit times powers of distinct
prime elements is possible. We have to show that $x \in S$. Note first that
$S$ is closed under multiplication. For if $a, b \in S$ then certainly
$ab \neq 0$ and $ab$ cannot be a unit (the factors of a unit are themselves
units). Moreover, from factorisations of $a$ and $b$ as products of units
times prime powers, we easily get a similar factorisation for $ab$. (If some
prime appearing in $a$ is an associate of a prime appearing in $b$, then we
replace one of them with the other, add the exponents and adjust the unital
coefficient suitably.) Note also that $S$ contains all prime elements of $R$
other than units.
Now suppose $x \notin S$. Call $x$ as $x_1$ and let $I_1 = (x_1)$, the
ideal generated by $x_1$. Then $x_1$ is not a prime element and so has a
proper factorisation, say, $bc$ where neither $b$ nor $c$ is an associate of
$x_1$. Since $x_1 \notin S$ and $S$ is closed under multiplication it
follows that either $b \notin S$ or $c \notin S$. Let $x_2$ be the factor
which is not in $S$. (If both $b \notin S$ and $c \notin S$, make an
arbitrary choice.) Let $I_2 = (x_2)$. Since $x_2$ divides $x_1$,
$I_1 \subset I_2$. Also $I_1 \neq I_2$ as otherwise $x_2$ will be an
associate of $x_1$. Now $x_2 \notin S$. So we repeat the argument with $x_2$
replacing $x_1$. This gives an element $x_3 \notin S$ such that
$I_2 \subsetneq I_3$ where $I_3 = (x_3)$. Continuing in this manner we get a
strictly ascending infinite sequence of ideals. This is an outright
contradiction to the last proposition. So $x \in S$, proving the first part.
For uniqueness of factorisation suppose
$x = u p_1^{n_1} p_2^{n_2} \cdots p_r^{n_r} = v q_1^{m_1} q_2^{m_2} \cdots q_s^{m_s}$
where $u, v$ are units, the $p_i$'s are distinct primes and so are the
$q_j$'s, and the $n$'s, $m$'s are positive integers. Then the prime $p_1$
divides the product of the elements $v$, $q_1, \dots, q_1$ ($m_1$ times),
$\dots$, $q_s, \dots, q_s$ ($m_s$ times). Applying Corollary (2.18)
repeatedly, we see that $p_1$ must divide one of these elements. Since $p_1$
is not a unit, it cannot divide $v$, which is a unit. So $p_1$ must divide
one of the $q$'s. With a suitable re-indexing if necessary, we suppose $p_1$
divides $q_1$, say $q_1 = w_1 p_1$. Since $q_1$ is a prime, $w_1$ must be a
unit. Cancelling $p_1$, we get
$u p_1^{n_1 - 1} p_2^{n_2} \cdots p_r^{n_r} = v w_1 q_1^{m_1 - 1} q_2^{m_2} \cdots q_s^{m_s}$.
We now repeat the argument (if $n_1 = 1$, we repeat it with $p_2$ instead of
$p_1$). It follows eventually that $m_1$ must equal $n_1$ and hence
$u p_2^{n_2} \cdots p_r^{n_r} = v w_1^{m_1} q_2^{m_2} \cdots q_s^{m_s}$. We
again repeat the argument with $p_2$ and write one of the $q_j$'s as an
associate of $p_2$, say, $q_2 = w_2 p_2$ with $w_2$ a unit. Continuing in
this manner till all the $p_i$'s are exhausted, we see $r \leq s$. By
symmetry $s \leq r$. So $r = s$. Also in the course of the argument we
already showed that if $p_i$ and $q_i$ are associates of each other then
$m_i = n_i$. This gives the desired result. ∎
In particular, the last theorem is true for euclidean rings, of which
$\mathbb{Z}$ is an example. This gives the unique factorisation theorem for
positive integers.
It also shows that every non-zero polynomial over a field can be factorised
as a product of irreducible polynomials.
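For positive integers the factorisation into powers of distinct primes can be computed by trial division. The sketch below is a simple illustration under our own naming and is not meant to be efficient. Applied to the two numbers of the earlier numerical example, it shows the common primes 3 and 13, whose product 39 is the g.c.d. found before.

```python
def prime_power_factorisation(n):
    """List of (prime, exponent) pairs whose product of powers is n (assumes n >= 2)."""
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            e = 0
            while n % p == 0:    # divide out the full power of p
                n //= p
                e += 1
            factors.append((p, e))
        p += 1
    if n > 1:                    # whatever is left is a single prime
        factors.append((n, 1))
    return factors


print(prime_power_factorisation(1092))   # [(2, 2), (3, 1), (7, 1), (13, 1)]
print(prime_power_factorisation(195))    # [(3, 1), (5, 1), (13, 1)]
```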
As happens with many important theorems, we give a name for the
property appearing in the conclusion of the last theorem.
2.29 Definition: An integral domain $R$ with identity is called a unique
factorisation domain (or u.f.d. for short) if every non-zero, non-unit
element of $R$ can be expressed as a product of a unit and powers of
distinct prime elements and any two such factorisations are equal up to
associateships.
The last theorem can be reworded as saying that every p.i.d. is a u.f.d.
This is the last of the three conditions on an integral domain which seek to
capture some of the properties of $\mathbb{Z}$, the ring of integers. The
three conditions, namely being a euclidean ring, being a p.i.d. and being a
u.f.d., are progressively weaker, and so the classes of rings satisfying
them are progressively larger. The relationship between various classes is
shown diagrammatically in Figure 6.1.
[Figure 6.1: Various Classes of Rings. Nested regions, from innermost to
outermost: euclidean rings, principal ideal domains, u.f.d.'s, integral
domains.]
To show that the class of p.i.d.'s is a proper sub-class of u.f.d.'s,
consider $\mathbb{Z}[x]$, the ring of polynomials over $\mathbb{Z}$. We saw
earlier that $\mathbb{Z}[x]$ is not a p.i.d. However, it can be shown that
$\mathbb{Z}[x]$ is a u.f.d. Given a polynomial $f(x)$ in $\mathbb{Z}[x]$, we
consider it as an element of $\mathbb{Q}[x]$ (since
$\mathbb{Z}[x] \subset \mathbb{Q}[x]$). Now $\mathbb{Q}[x]$ is a u.f.d., in
fact a euclidean ring. So we can factorise $f(x)$ into irreducible
polynomials with coefficients as rational numbers. Taking out the common
denominators of these coefficients, it can be shown that $f(x)$ has a
factorisation into polynomials with integer coefficients. The details will
be given through exercises.
Some of the properties of p.i.d.'s such as the existence of g.c.d. and
l.c.m. hold for u.f.d.'s. (Indeed, as we recall from school, one of the
methods for finding the g.c.d. is to factorise the two elements into powers
of primes.) These results will also be given as exercises, where the reader
will also find examples of integral domains which are not u.f.d.’s. (A u.f.d.
is also often called a factorial ring.)
Exercises
2.1 For $\mathbb{Z}$, prove that a stronger version of the euclidean
algorithm is true. Given $a, b \in \mathbb{Z}$ with $b \neq 0$, there exist
integers $q, r$ such that $a = qb + r$ where $|r| \leq \frac{1}{2}|b|$.
2.2 Let $G$ be the ring of Gaussian integers, that is
$G = \{x + iy : x, y \in \mathbb{Z}\}$. Prove that if $z_1, z_2 \in G$, then
$d(z_1 z_2) = d(z_1)\,d(z_2)$. Suppose $x + iy \in G$ and $n$ is a positive
integer. Prove that there exist $p + iq \in G$ and $r + is \in G$ such that
$x + iy = n(p + iq) + (r + is)$ and $d(r + is) < n^2$.
(Hint: Apply the last exercise separately to the real and imaginary
parts of $x + iy$.)
2.3 Prove that $G$ is a euclidean ring. (Hint: In order to divide $x + iy$
by $a + ib$, divide $(x + iy)(a - ib)$ by $a^2 + b^2$ and use the last
exercise.)
2.4 Prove that 2 is not a prime element of $G$. More generally, prove
that any integer which is expressible as a sum of two squares of
integers is not a prime element of $G$.
2.5 Suppose p is a prime integer and x, y are relatively prime integers
such that x'+ y'= 2p for some integer c. Prove that p is not a
prime element of G. [l-lint: Factor x' + y‘ as (x + iy)(x—iy) and
apply Corollary 2.18.]
2.6 In the last exercise, prove further that p can be expressed as a sum
of two squares of positive integers. [Hintz If a + ib is a prime
divisor of p, so is a—ib.]
2.7 If p is a prime of the form 4n + 1, prove that there exists an inte-
ger x, l g x g p—l, such that x‘+ 1 is divisible by p. (Hint: Let
17—]
y = 1.2.. . ..q where q = —2—. Using Wilson’s theorem (Exercise
5.2.26), prove that y'E - l modulo p. Now reduce y modulo p
to a desired x) This result is sometimes expressed by saying that
—1 is a quadratic residue modulo p.
2.8 Prove that every prime of the form 4n + 1 can be expressed as a
sum of two square: of integers (e.g. 5 = 2' + l’; 29 = 5' + 2I etc.).
Prove further that this expression is essentially unique, that is if
p = x‘ + y‘ = a‘ + b' where x, y, a. b are positive integers then
(x, y}= (a, b}. [Hintz Use the last three exercises. Note that if
p = x‘ + y‘ then x -I- iy must be a. prime element of 6.]
2.9 Suppose p is a prime of the form 4n + 3. Prove that:
(i) there does not exist an integer x such that x^2 ≡ −1 (mod p),
or in other words, −1 is not a quadratic residue modulo p.
(Hint: If it is, then show that [x] is an element of order 4 in
the group Z_p − {[0]} under modulo p multiplication.)
(ii) p cannot be expressed as a sum of two squares of integers.
(iii) p is a prime element of G.
(Primes greater than 2 fall into two categories: those of the form
4n + 1 and those of the form 4n + 3. These exercises show that
the two types of primes behave differently and this difference is
brought out fully by the ring of Gaussian integers.)
2.10 Suppose a positive integer n is expressed as
2^k p_1^{m_1} … p_r^{m_r} q_1^{n_1} … q_s^{n_s} where
k, r, s ≥ 0, p_1, …, p_r are distinct primes of the form 4n + 1; q_1, …, q_s
are distinct primes of the form 4n + 3 and m_1, …, m_r, n_1, …, n_s are
positive integers. Prove that n can be expressed as a sum of two
squares of integers if and only if all the n_i's are even.
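The criterion of Exercise 2.10 can be compared against a brute-force search for small n. The Python sketch below is an illustration under our own naming (is_sum_of_two_squares and criterion are hypothetical helpers); it checks the statement experimentally and proves nothing.

```python
import math


def is_sum_of_two_squares(n):
    """Brute force: is n = a*a + b*b for some integers a, b >= 0?"""
    a = 0
    while a * a <= n:
        rest = n - a * a
        b = math.isqrt(rest)      # integer square root, exact for integers
        if b * b == rest:
            return True
        a += 1
    return False


def criterion(n):
    """True iff every prime of the form 4n + 3 divides n to an even power."""
    m = n
    q = 2
    while q * q <= m:
        if m % q == 0:
            exponent = 0
            while m % q == 0:
                m //= q
                exponent += 1
            if q % 4 == 3 and exponent % 2 == 1:
                return False
        q += 1
    # whatever is left (if greater than 1) is a prime occurring exactly once
    return not (m > 1 and m % 4 == 3)


mismatches = [n for n in range(1, 2001)
              if criterion(n) != is_sum_of_two_squares(n)]
print(mismatches)   # an empty list means the two tests agree up to 2000
```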
2.11 Let R be a principal ideal domain and let a, b ∈ R. Find
generators, in terms of a and b, for (i) the ideal (a) ∩ (b), (ii) the ideal
generated by (a) ∪ (b).
2.12 Let R be a commutative ring with identity. Two ideals A and B of
R are said to be relatively prime if there exist a ∈ A, b ∈ B such
that a + b = 1. Suppose R is a p.i.d. and x, y ∈ R. Prove that x and
y are relatively prime iff the ideals (x) and (y) are relatively prime.
2.13 Let m_1, …, m_k be positive integers which are pairwise relatively
prime, that is, for every i ≠ j, m_i and m_j are relatively prime. Let
m = m_1 m_2 … m_k. Prove that the ring Z_m is isomorphic to the product
ring Z_{m_1} × Z_{m_2} × … × Z_{m_k}. (Hint: Define θ : Z_m → Z_{m_1} × … × Z_{m_k} by
θ([x]) = ([x], [x], …, [x]) where the same symbol [x] is used to
denote the residue classes of x modulo different integers. There is
little difficulty in showing that θ is a well-defined ring
homomorphism and θ is one-to-one. Then use the pigeon hole principle to
show that θ is onto. This result is called the Chinese Remainder
Theorem.)
2.14 Using the Chinese Remainder Theorem, find 5832 × 6639 modulo
840. (Hint: 840 = 8·3·5·7. Perform the multiplication modulo 8, 3, 5
and 7 separately to get answers x_1, x_2, x_3, x_4 (say), respectively.
By the Chinese Remainder Theorem there exists an integer
x, 0 ≤ x < 840, such that x ≡ x_1 (mod 8), x ≡ x_2 (mod 3), x ≡ x_3
(mod 5) and x ≡ x_4 (mod 7). The trouble is that there is no easy
algorithm for finding such x, short of searching. If we have to
do a very large number of arithmetical operations modulo 840, it
is advantageous to prepare tables which give such x for each
quadruple (x_1, x_2, x_3, x_4) ∈ Z_8 × Z_3 × Z_5 × Z_7.)
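The procedure in the hint, including the search it mentions, can be sketched in Python as follows. The function name crt_search is our own; the search is the naive one described in the hint, not an efficient algorithm.

```python
def crt_search(residues, moduli):
    """Return the x with 0 <= x < product(moduli) and x ≡ r_i (mod m_i).

    Plain search over all candidates, as suggested in the hint.
    Assumes the moduli are pairwise relatively prime.
    """
    m = 1
    for mi in moduli:
        m *= mi
    for x in range(m):
        if all(x % mi == ri for ri, mi in zip(residues, moduli)):
            return x
    raise ValueError("no solution; are the moduli pairwise relatively prime?")


moduli = (8, 3, 5, 7)
a, b = 5832, 6639
# multiply separately modulo 8, 3, 5 and 7
residues = tuple(((a % mi) * (b % mi)) % mi for mi in moduli)
x = crt_search(residues, moduli)
print(residues, x)
```

The value found this way can be compared with the direct computation (a * b) % 840, which gives 528.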
*2.15 Generalise the Chinese Remainder Theorem to any commutative
ring R with identity as follows. Let A_1, …, A_k be ideals of R which
are pairwise relatively prime. Let A = A_1 ∩ … ∩ A_k. Then prove
that R/A is isomorphic to R/A_1 × R/A_2 × … × R/A_k. (Hint: proceed
as in Exercise (2.13). The only difficulty is in showing that θ
is onto. For this, prove by induction on k that given x_1, …, x_k ∈ R,
there exists x ∈ R such that x − x_i ∈ A_i for all i = 1, 2, …, k.)
2.16 Characterise the units of R_1 × R_2 × … × R_k where R_1, …, R_k are
rings with identities. Combining this with the Chinese Remainder
Theorem, give an alternate proof of the fact that the Euler
φ-function is multiplicative (cf. Exercise (2.4.15)).
2.17 Prove that the group of automorphisms of a cyclic group of order
m (cf. Exercise (5.3.6)) is isomorphic to the group of modulo m
residue classes relatively prime to m.
2.18 Suppose x is an element of a group G and m, n are relatively
prime integers. If x^m = x^n = e prove that x = e.
2.19 Suppose x, y are elements of a field F and m, n are relatively
prime positive integers such that x^m = y^m and x^n = y^n. Prove that
x = y. Show that this still holds if F is only an integral domain
but not necessarily if it is any arbitrary ring.
2.20 Generalise Proposition (2.23) as follows. Let S be a ring with
identity. Let Z be the centre of S. Suppose R is a subring of S such
that 1 ∈ R and R ⊂ Z. (Such a subring is called a central subring.)
Then for any a ∈ S and any two polynomials f(x), g(x) in R[x],
show that h(a) = f(a) + g(a) and p(a) = f(a)g(a), where
h(x) = f(x) + g(x) and p(x) = f(x)g(x).
2.21 Prove that a polynomial of degree 2 or 3 over a field is
irreducible if and only if it has no roots in that field. (Hint: In any
non-trivial factorisation, there will be at least one linear factor.)
2.22 The multiplicity of a root α of a polynomial f(x) over a field F is
defined as the largest integer m such that (x − α)^m divides f(x) in
F[x]. A root is called simple or multiple according as its
multiplicity equals or exceeds 1. Prove that if f(x) = a_0 + a_1 x + … + a_n x^n
(a_n ≠ 0) has distinct roots α_1, …, α_k in F, with multiplicities
m_1, …, m_k respectively, then m_1 + … + m_k ≤ n and equality holds iff f(x)
can be expressed as a product of n polynomials of degree 1 each.
(Such a factoring is called a complete splitting of f(x).) In the case
of equality prove further that
$$\sum_{i=1}^{k} m_i \alpha_i = -\frac{a_{n-1}}{a_n} \quad\text{and}\quad \prod_{i=1}^{k} \alpha_i^{m_i} = (-1)^n \frac{a_0}{a_n}.$$
2.23 Let f(x) be a polynomial of degree n (> 0) over a field F and let
f′(x) be its formal derivative (cf. Exercise (1.20)). Prove that α is a
multiple root of f(x) in F if and only if it is a common root of
f(x) and f′(x).
2.24 In a unique factorisation domain, obtain a criterion for divisibility
of one element by another in terms of the factorisations of the
two elements into prime powers.
2.25 Prove that in a u.f.d., every two elements have a g.c.d. and every
two non-zero elements have an l.c.m.. Prove also that Proposition
(2.17) holds for a u.f.d..
2.26 In Z[x], prove that the g.c.d. of 2 and x cannot be expressed as a
linear combination of 2 and x. (Thus the behaviour of the g.c.d.
of two elements of an integral domain improves in the hierarchy
of the three conditions. In a u.f.d. it simply exists, in a p.i.d. it
equals some linear combination of the two elements and in a
euclidean ring there is an algorithm for expressing it this way.)
2.27 Let R be the ring, under pointwise operations, of all
complex-valued functions which are analytic in the entire complex plane C
(see Exercise (1.16)). Prove that the units of R are those functions
which have no zero in C, while the prime elements of R are those
functions which have exactly one zero (of multiplicity 1) in C.
Prove that functions with infinitely many zeros (such as sin z, cos z)
cannot be factorised in R as products of finitely many primes.
(The solutions require only one basic property of analytic functions,
namely, if α is a zero of multiplicity m of an analytic function
f(z) then there exists an analytic function g(z) such that
f(z) = (z − α)^m g(z) and g(α) ≠ 0.)
2.28 Prove that in the ring Z + i√3 Z (that is, the subring of C
consisting of elements of the form a + i√3 b where a, b are integers),
4 can be expressed as a product of primes in two different ways.
(Thus we have an example of an integral domain with identity
which fails to be a u.f.d. because of non-uniqueness of
factorisation. The ring R in the last exercise, on the other hand, fails
to be a u.f.d. because of absence of factorisation.)
2.29 A polynomial a_0 + a_1 x + … + a_n x^n in Z[x] is said to be primitive
if the g.c.d. of its coefficients a_0, …, a_n is 1 (equivalently, there is
no prime number which divides all of them). Prove that the
product of two primitive polynomials is a primitive polynomial.
[Hint: Given a prime p, consider the terms of the smallest degree
in each polynomial whose coefficients are not divisible by p.]
2.30 Suppose f(x) is a primitive polynomial in Z[x] and f(x) = g(x)h(x)
in Q[x]. Then prove that f(x) can be factorised in Z[x]. (Hint:
Write g(x) as (p/q) g_1(x) where p, q are integers and g_1(x) is a
primitive polynomial in Z[x]. Do the same for h(x) and apply the last
exercise. This result is called Gauss' Lemma.)
2.31 Using the last exercise prove that Z[x] is a unique factorisation
domain. (Hint: Note that Z ⊂ Z[x] ⊂ Q[x]. Intuitively there are
two types of primality in Z[x], one that comes from the fact that
the coefficients are integers and the other that comes from the fact
that the elements are polynomials. Because of Gauss' Lemma these
two types of primalities can be separated from each other.)
2.32 Let F be a field and f(x) = x^2 + x + 1. Prove that if F = R or
Z_2, then f(x) is irreducible, but if F = Z_3 then it is reducible over
F. In the first two cases let I be the ideal (f(x)). If F = R, prove
that F[x]/I is isomorphic to the field of complex numbers, while if
F = Z_2, prove that F[x]/I is a field with 4 elements.
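The four-element field of Exercise 2.32 is small enough to tabulate. In the Python sketch below, which is our own illustration, an element a + bt of Z_2[x]/(x^2 + x + 1) is stored as the pair (a, b), where t denotes the class of x, so that t^2 = t + 1. The names add, mul and inverses are hypothetical.

```python
from itertools import product

# The four elements a + b*t, stored as pairs (a, b) with a, b in {0, 1}.
ELEMENTS = list(product((0, 1), repeat=2))
ZERO, ONE = (0, 0), (1, 0)


def add(u, v):
    """Coefficientwise addition modulo 2."""
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)


def mul(u, v):
    """(a + bt)(c + dt) = ac + (ad + bc)t + bd*t^2, reduced using t^2 = t + 1."""
    a, b = u
    c, d = v
    return ((a * c + b * d) % 2, (a * d + b * c + b * d) % 2)


# For each element, record an element whose product with it is 1 (if any).
inverses = {u: v for u in ELEMENTS for v in ELEMENTS if mul(u, v) == ONE}
print(inverses)
```

Every non-zero element appears as a key of inverses, which is what makes the quotient a field rather than merely a ring.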
*2.33 Using the field with 4 elements constructed above, solve the
Religious Conference Problem for n = 4. (These two exercises give an
idea of the construction and applicability of finite fields. It is
possible to study both in greater detail; see the Epilogue.)
2.34 Let f(x) = a_0 + a_1 x + a_2 x^2 + … + a_n x^n ∈ Z[x] and suppose
p/q is a root of f(x) where p, q are relatively prime integers. Prove
that p divides a_0 and q divides a_n. (This result, called the rational
root test, gives the possible rational roots of a polynomial with
integer coefficients. Each one of them is then checked individually
to see if it is in fact a root.)
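The rational root test translates directly into a short search. The Python sketch below is a minimal illustration; the function names divisors and rational_roots are our own, and the coefficient list is assumed to have a_0 ≠ 0 and a_n ≠ 0.

```python
from fractions import Fraction


def divisors(n):
    """Positive divisors of a non-zero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_roots(coeffs):
    """Rational roots of a_0 + a_1 x + ... + a_n x^n.

    coeffs = [a_0, a_1, ..., a_n] with integer entries, a_0 != 0, a_n != 0.
    Candidates are +-p/q with p dividing a_0 and q dividing a_n; each
    candidate is then checked individually using exact arithmetic.
    """
    a0, an = coeffs[0], coeffs[-1]
    candidates = {Fraction(sign * p, q)
                  for p in divisors(a0)
                  for q in divisors(an)
                  for sign in (1, -1)}

    def value(x):
        return sum(c * x ** k for k, c in enumerate(coeffs))

    return sorted(x for x in candidates if value(x) == 0)


# 2x^3 - 3x^2 - 3x + 2 = (x + 1)(2x - 1)(x - 2)
print(rational_roots([2, -3, -3, 2]))
```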
2.35 In Proposition (2.20), show more generally that if d is a positive
integer which divides n then G has exactly φ(d) elements of order
d. [Hint: Apply the proposition to a subgroup of G of order d, which
exists and is unique by Theorem (5.2.17).]
2.36 Using the last exercise give an alternate proof of Exercise (3.4.24),
(viii).
2.37 Let H be a group of order n. Prove that H is cyclic iff it has the
property that for every positive integer d dividing n, H contains at
most d elements satisfying the equation x^d = e. [Hint: For the
converse, for every positive divisor d of n, let
H_d = {x ∈ H : o(x) = d}.
From the condition H satisfies, show that if x is an element of
order d then the only elements of order d in H are x^m where m is
relatively prime to d. Hence |H_d| ≤ φ(d) for all d | n. So
n = o(H) = Σ_{d|n} |H_d| ≤ Σ_{d|n} φ(d) = n.
So for every d, we must have |H_d| = φ(d). In particular H_n ≠ ∅.]
2.38 Using the last exercise and Proposition (2.25), show that the
multiplicative group of a finite field is cyclic. More generally show that
every finite subgroup of the multiplicative group of any field is
cyclic.
2.39 Let p be an odd prime. Let f(x) = x^{p−1} − 1. Considering f(x) as a
polynomial over the field Z_p and applying Theorem (5.2.16) and
the result of Exercise (2.22), obtain another proof of Wilson's
theorem, (p − 1)! ≡ −1 (mod p). [Hint: Every non-zero element
of Z_p is a root of f(x).]
Notes and Guide to Literature
The treatment in this section follows largely the pattern of Herstein [1],
Chapter 3, except for one major difference. Herstein adds one more
condition to the definition of a euclidean ring, to the effect that the degree of
every element is greater than that of its proper divisor. This makes
it easier to apply induction on the degree and thereby give direct proofs
of the results. However, this extra condition seems superfluous, because
even without it, every euclidean ring is a principal ideal domain. Another
point of departure is that unlike Herstein, who confines himself to euclidean
rings most of the time, we have proved the results for p.i.d.'s. It is perhaps
debatable whether we have achieved any real generality in doing so,
especially because we have not given any examples of rings which are p.i.d.'s
but not euclidean rings. We have only quoted an example which is due
to Motzkin [1]. The reason we have preferred to work with p.i.d.'s rather
than euclidean rings is an important difference of technique. In a euclidean
ring we directly imitate what goes on in Z. In a principal ideal domain on the
other hand, we first translate the concepts in terms of ideals. When so
translated, they are often meaningful even for rings which are not p.i.d.'s,
for example, the concept of relatively prime ideals in Exercise (2.12). In
mathematics it often happens that a concept cannot be generalised as it is
to a wider setting, but when appropriately translated, the generalisation
becomes self-evident. Art lies in finding such 'appropriate' translations.
The Chinese Remainder Theorem is said to have been known to the
ancient Chinese. For more on its applications to computations, see Knuth
[1], Volume 2 or Tremblay and Manohar [1].
The result of Exercise (2.8) is due to Fermat. For an elementary proof,
see for example, Chandrasekhar [1]. See Ahlfors [1] for properties of
complex analytic functions.
3. Vector Spaces
Compared to the last two sections, the material in this section is
relatively light. [...] Another
reason for this is that we expect the reader to be familiar at least with the
algebra of the two and the three dimensional vectors as used in physics.
Indeed we count on this familiarity to motivate 'abstract' vector spaces to
be studied in this section.
In physics we take a vector as a quantity which has both a magnitude and
a direction and which obeys the parallelogram law for addition. Common
examples of vectors are velocity, force, acceleration, electrostatic field vector
and so on. Scalars on the other hand, have only magnitude and no direction.
They are added as real numbers. Speed, work, pressure etc. are scalars. The
vector spaces to be studied in this section are actually abstractions of the
vectors in physics. But unlike the definition of a ring (which is modelled after
the ring of integers), the linkage between abstract vectors and their
forerunners, namely, the vectors in physics, is a rather remote one. So instead of giving
right away the definition of an abstract vector space, we proceed to show
how it evolves from the vectors in physics.
As a first step we have a convenient mathematical representation of
physical vectors by directed line segments in the plane or in space.
(Sometimes a vector is defined as a directed line segment.) Here two directed line
segments which are of equal length and have the same direction are to be
regarded as representing the same vector. So, formally, a vector is an
equivalence class of directed line segments, under the equivalence relation
in which two line segments with the same length and direction are to be
regarded as equivalent. An alternate approach is to consider only those
directed line segments which start at the origin O. If P is any point then the
directed line segment OP joining O to P is called the position vector of P.
Position vectors are added according to the parallelogram law (Figure 6.2 (a)).
Similarly if P is a point and λ is a real number (a 'scalar' as it is called)
then λOP is the position vector of a point Q which lies on the same ray
through O as P or on the opposite ray according as λ ≥ 0 or λ < 0, and
whose distance from O is |λ| times that of P (Figure 6.2 (b)).
[Figure 6.2: Elementary Operations on Vectors. (a) Addition by the
parallelogram law: OR = OP + OQ. (b) Scalar multiplication:
OQ_1 = λ_1 OP (λ_1 > 0) and OQ_2 = λ_2 OP (λ_2 < 0).]
Besides the two operations of vector addition and scalar multiplication,
we have two types of products, the scalar (or the dot) product of two
vectors and the vector (or the cross) product of two vectors. A number of
identities about these products hold true. Most of the basic concepts of
geometry such as distance, angle, lines, planes, areas, volumes etc. can be
expressed in terms of vectors. This study is known as vector algebra. In
vector calculus, on the other hand, we study functions whose domains
and/or codomains consist of vectors. We study differentiability of such
functions, define things like the gradient, the divergence and the curl and prove
and apply such classic results as the theorems of Gauss and Stokes. This study
constitutes one of the richest applications of mathematics to physical
problems. Indeed many concepts of vector calculus owe their origin to
physical problems. (This vestige can be seen even from the nomenclature.
For example, a vector valued function is called a 'vector field' because the
electric field due to a charge is an important example of such a function.
Of course, 'field' as used here has little to do with the way we are using
the term 'field', namely to denote a certain kind of ring.)
In terms of depth, the theory of vector spaces which we are going to
develop will not come anywhere near the theory outlined above. First, we
shall not deal with any limiting process and this takes away the entire
vector calculus from our purview. Consequently, the abstraction we are
after will only be of the algebraic aspect of vectors. But even here, we shall
be sacrificing a large part, namely, the structure that arises because of the
dot and the cross product. It is indeed possible to abstract and generalise
the concept of the dot product. The structure so formed is called an inner
product space and is important in a branch of mathematics called functio-
nal analysis. But we shall not pursue it. As for the cross product, it is a
feature peculiar to vectors in space, that is, three dimensional vectors. It
cannot be generalised even to four dimensional vectors, let alone for
abstract vectors.
So, summing it up, the definition of an abstract vector space is designed
to generalise only those aspects which depend on the addition of vectors
and on the multiplication of a vector by a scalar. As usual, we list a few
properties of these two operations and assume some of them as axioms.
The set of vectors forms an abelian group under addition. We already
defined abelian groups in Chapter 5. Just as a ring is obtained by putting
an extra structure on an abelian group (and inter-relating it to the group
structure by means of the distributive law), an abstract vector space is
obtained from an abelian group by endowing it with an additional structure
corresponding to scalar multiplication. To see what this structure should
be, let us list some of the properties of the familiar scalar multiplication
for the vectors considered above. Let V be the set of such vectors. Then the
scalar multiplication is really a function from R × V into V which assigns
to an ordered pair (λ, u), where λ ∈ R and u ∈ V, the vector λu in V. This
function is not a binary operation on V (because it is not a function from
V × V into V). However, it is compatible with the binary operation of
addition in V in the sense that for all λ ∈ R and u, v in V, λ(u + v) equals
(λu) + (λv). The scalar multiplication is also compatible with the ring
operations in R, in the sense that for all λ, μ ∈ R and for all u ∈ V, we
have (λ + μ)u = (λu) + (μu) and (λμ)u = λ(μu). Moreover, the identity
element of R, namely 1, also behaves like an identity element for scalar
multiplication in the sense that for all u ∈ V, 1u = u.
To define an abstract vector space, we start with an abelian group V, a
field F and a function from F × V into V which plays the role of scalar
multiplication. The formal definition is as follows:
3.1 Definition: A vector space over a field F is a triple of the form
(V, +, ·) where (V, +) is an abelian group and · : F × V → V is a function
satisfying the following properties for all λ, μ ∈ F and all u, v ∈ V.
(i) λ·(u + v) = (λ·u) + (λ·v)
(ii) (λ + μ)·u = (λ·u) + (μ·u)
(iii) (λμ)·u = λ·(μ·u)
(iv) 1·u = u, 1 being the identity element of F.
Note that in (ii), + denotes the addition in F on the left hand side and
the addition in V on the right hand side. Also the same symbol · is used for
the multiplication in F and for the scalar multiplication. This double usage
causes no confusion. We emphasize, however, that the scalar multiplication
is not a binary operation either on F or on V. It is a sort of a hybrid. In
technical terms, the scalar multiplication represents the action of the field
F on the group V. We shall study other examples of such actions of algeb-
raic structures later (see the Epilogue).
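For a small finite example the four axioms of Definition 3.1 can be checked exhaustively. The Python sketch below is our own illustration: it takes F = Z_3 and V = Z_3 × Z_3 with coordinatewise operations (the choice of field and dimension is an assumption made for the example, as are the names vadd, smul and axioms_hold).

```python
from itertools import product

P, N = 3, 2                       # the field Z_3 and the space V = Z_3 x Z_3
FIELD = range(P)
VECTORS = list(product(FIELD, repeat=N))


def vadd(u, v):
    """Addition in V, coordinatewise modulo P."""
    return tuple((a + b) % P for a, b in zip(u, v))


def smul(lam, u):
    """Scalar multiplication F x V -> V, coordinatewise modulo P."""
    return tuple((lam * a) % P for a in u)


def axioms_hold():
    for lam, mu in product(FIELD, repeat=2):
        for u, v in product(VECTORS, repeat=2):
            if smul(lam, vadd(u, v)) != vadd(smul(lam, u), smul(lam, v)):
                return False                        # axiom (i) fails
            if smul((lam + mu) % P, u) != vadd(smul(lam, u), smul(mu, u)):
                return False                        # axiom (ii) fails
            if smul((lam * mu) % P, u) != smul(lam, smul(mu, u)):
                return False                        # axiom (iii) fails
    return all(smul(1, u) == u for u in VECTORS)    # axiom (iv)


print(axioms_hold())
```

Note that smul takes one argument from the field and one from V, which reflects the remark that scalar multiplication is not a binary operation on either set.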
The reader may wonder where we needed that F is a field. Will the
definition not make sense if instead of the field F we merely have a ring R with
identity? The answer is 'yes'. The algebraic structure that arises this way is
called a module (actually a left module) over the ring R. Thus vector spaces
are special cases of modules. The theory of modules is also an interesting
one. But we shall not study it here. (A few indicative results will be given as
exercises.) Because a field is a very special type of ring, it is but natural
that there would be some results which hold good for vector spaces but not
for all modules. These results would involve the special properties of a field,
namely, commutativity and, more importantly, the presence of
multiplicative inverses for non-zero elements. As a simple illustration, we have the
following proposition, where the first three parts hold good for modules as
well but the last one is true for vector spaces only. (As with other places
where a dot is used to denote a binary operation, we shall suppress it
while denoting the scalar multiplication. Thus if λ ∈ F and u ∈ V we write
λu instead of λ·u. Another minor point about notation is that in physics
the vectors are almost invariably denoted by bold face letters so as to
distinguish them from scalars. We shall not do so. Generally, field elements
will be denoted by lower case Greek letters and vectors by lower case
English letters. However, this is not a uniform convention since we shall
have occasions to consider vectors which are themselves elements of some
field. Ultimately, it is only through the context that a particular symbol
will represent a scalar or a vector. The symbol 0 will simultaneously
represent the zero element of the field F and also the zero vector, that is, the
identity for addition of vectors. By the way, the zero vector, also called the
null vector, should not be thought of as 'a vector with zero length' or as
'a vector having no particular direction' as is done with the ordinary
vectors. The concepts of length and direction are meaningless for abstract
vectors.)
3.2 Proposition: Let V be a vector space over a field F. Then we have,
for all λ ∈ F and u ∈ V,
(i) λ0 = 0
(ii) 0u = 0
(iii) (−λ)u = −(λu) = λ(−u)
(iv) if λu = 0 then either λ = 0 or u = 0.
Proof: The proof of the first three parts resembles that of Proposition (1.2)
and hence is left as an exercise. For the last part, suppose λu = 0. If λ ≠ 0,
then λ^{-1} exists in F. By (i), λ^{-1}(λu) = λ^{-1}0 = 0. But on the other hand,
λ^{-1}(λu) = (λ^{-1}λ)u = 1u = u, using the axioms of a vector space. So u = 0. ∎
The last part says, in a way, that there are no zero divisors for the
scalar multiplication in a vector space. As a corollary we have that if λ ≠ 0
then the equation λu = λv forces u = v. Similarly, if u ≠ 0, then λu = μu
implies λ = μ.
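The first of these cancellation laws follows from Proposition 3.2 in a few steps; the derivation below is a sketch of one way to fill in the argument. The second law follows in the same way, starting from (λ − μ)u = 0.

```latex
\begin{align*}
\lambda u = \lambda v
  &\;\Longrightarrow\; \lambda u + \bigl(-(\lambda v)\bigr) = 0 \\
  &\;\Longrightarrow\; \lambda u + \lambda(-v) = 0
      && \text{by Proposition 3.2 (iii)} \\
  &\;\Longrightarrow\; \lambda\,\bigl(u + (-v)\bigr) = 0
      && \text{by axiom (i) of Definition 3.1} \\
  &\;\Longrightarrow\; u - v = 0
      && \text{by Proposition 3.2 (iv), since } \lambda \neq 0 .
\end{align*}
```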
The concepts of a subspace of a vector space, the quotient space of a
vector space and homomorphism of vector spaces are defined in the expected
manner. Specifically, let V be a vector space over a field F. Then a subspace
of V is a subset W of V such that (i) W is a subgroup of the abelian group
V and (ii) W is closed under scalar multiplication, that is, for all λ ∈ F and
u ∈ W, λu ∈ W. It is clear that a subspace of a vector space is itself a
vector space over the same field. Also if W is a subspace of V then the
set V/W, consisting of the cosets of W in V, is already an abelian group
under coset addition. If we define λ(u + W) to be (λu) + W for λ ∈ F,
u ∈ V then we get a well-defined scalar multiplication on V/W and it makes
V/W a vector space called the quotient space of V by W. If V, W are any
two vector spaces over the same field F, then a vector space homomorphism
from V to W is a function T : V → W such that for all u, v ∈ V and λ ∈ F,
T(u + v) = T(u) + T(v) and T(λu) = λT(u). Vector space homomorphisms
are more popularly called linear transformations. The reason for this term
will be explained in the exercises. A linear transformation which is also a
bijection is called a vector space isomorphism. It is easy to show that the
kernel and the range of a linear transformation are subspaces of its
domain and co-domain respectively. The analogues of the theorems about
group homomorphisms hold for linear transformations. We leave it to
the reader to state and prove them.
We now give a few examples of vector spaces. Embodied in every
vector space, there is an abelian group. So, not surprisingly, some of the
examples here have already appeared as examples of groups. Still we state
them again, emphasising this time the scalar multiplication.
1) The vectors consisting of directed line segments originating at some
fixed point O form a vector space over R, the field of real numbers. We
considered above only the two and the three dimensional vectors. But more
generally, we can take the position vectors of points in any higher
dimensional euclidean space. In fact we may as well identify a point in a
euclidean space with its position vector and say that a euclidean space is
itself a vector space over the field of real numbers. Any line or plane
passing through the origin is a subspace of this vector space (cf. Figure 5.4,
where one such subspace was pictured only as a subgroup).
2) Trivially, every field F is a vector space over itself. The scalar
multiplication · : F × F → F is simply the field multiplication. More
generally, suppose R is a (not necessarily commutative) ring with identity 1
and F is a subfield of R such that 1 ∈ F. Then R is a vector space over F,
with the scalar multiplication · : F × R → R being merely the restriction of
the binary operation · : R × R → R. It is of course not necessary that R
should actually contain the field F. It suffices if R contains an isomorphic
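replica of F. For example, let R be the ring M_n(F) of all n × n
matrices over a field F. We identify an element λ ∈ F with the n × n matrix
whose diagonal entries are λ each and all of whose other entries are 0. If
A = (a_ij) is any n × n matrix over F then for any λ ∈ F, we have
$$\begin{pmatrix} \lambda & 0 & \cdots & 0 \\ 0 & \lambda & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} = \begin{pmatrix} \lambda a_{11} & \lambda a_{12} & \cdots & \lambda a_{1n} \\ \lambda a_{21} & \lambda a_{22} & \cdots & \lambda a_{2n} \\ \vdots & & & \vdots \\ \lambda a_{n1} & \lambda a_{n2} & \cdots & \lambda a_{nn} \end{pmatrix}$$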
Therefore, when M_n(F) is regarded as a vector space over F, the scalar
multiplication of an n × n matrix with an element λ of F simply amounts to
multiplying each entry of the matrix by λ. In view of this, we can even
consider matrices which are not square matrices. Let m, n be fixed positive
integers and let M_{m,n}(F) be the set of all m × n matrices over F. Then
M_{m,n}(F) is a vector space over F with scalar multiplication defined
entrywise. If we take only such matrices which have 0 in a particular place (say
the second row and the third column) then we get a subspace of M_{m,n}(F).
Similarly the set of all matrices with the property that the sum of the entries
in a particular row (or column) is 0 is a subspace of M_{m,n}(F).
3) As another example of an extension ring forming a vector space, let
F{x} be the ring of all formal power series in an indeterminate x over a
field F. Then F{x} is a vector space over F. The scalar multiplication by
λ ∈ F amounts to multiplying the coefficient of every power of x by λ. An
important subspace of this is the set of all polynomials in x, F[x]. If we
take the set of all power series in which the coefficient of a particular power
of x (say, of x^k for a fixed k) is 0 we get a subspace of F{x}. Another important class of
subspaces is the sets of those polynomials whose degrees are bounded by
some fixed positive integers (with the zero polynomial included). For
example all polynomials of degree 5 or less (with the zero polynomial
included) form a subspace of F[x] and hence of F{x}. Note that this
subspace is generated by the set {1, x, x^2, x^3, x^4, x^5} because it is clearly the
smallest subspace of F{x} which contains this set. F(x), the field of rational
functions in x over F, is also a vector space over F.
4) As with other algebraic structures, the product of two (or more)
vector spaces over the same field is a vector space over that field. The
scalar multiplication is defined coordinatewise. Similarly, if V is a vector
space over a field F, S is any set and W is the set of all functions from S
into V then W is a vector space over F. The scalar multiplication is defined
pointwise. That is, if f : S → V is an element of W and λ ∈ F then λf is the
function from S to V which assigns to a point s ∈ S the vector λf(s). A
particular vector space, which can be looked at as an instance of either of
these two constructions, is noteworthy. Let n be any positive integer. Let F^n
be the cartesian product F × F × … × F (n times). Elements of F^n are ordered
n-tuples of elements of F. They can also be considered as functions from
the set S = {1, 2, …, n} into F. F^n is a vector space over F. By convention
we let F^0 be the trivial vector space. The importance of vector spaces of
this type is that every finitely generated vector space over F is isomorphic
to F^n for some n. We shall prove this later. But as a special case, we can
see that the vector space of all directed line segments in space is
isomorphic to R^3, that is, to R × R × R. An explicit isomorphism is obtained by
setting any fixed rectangular frame of coordinates, say x, y, z. If a point P
has co-ordinates x, y and z, we associate the triple (x, y, z) to the position
vector of P. It may be recalled from geometry that the real numbers x, y, z
represent the components of the vector OP along the three axes. (It is
customary to let i, j, k be unit vectors along these axes. Then OP = xi +
yj + zk.) Something very similar will be used in the proof of the general
isomorphism theorem just mentioned.
5) Let V, W be vector spaces over a field F and let L(V, W) be the set of all
linear transformations from V to W. L(V, W) is a subset of the set of all
functions from V to W, which is a vector space under pointwise operations.
We claim that L(V, W) is a subspace and hence that it is a vector space
over F. For this we have to show that whenever T_1, T_2 ∈ L(V, W) and
λ ∈ F, then T_1 + T_2 ∈ L(V, W) and λT_1 ∈ L(V, W), that is, we have to
show T_1 + T_2 and λT_1 are linear transformations from V to W. Recall that
T_1 + T_2 is defined by pointwise addition of values of T_1 and T_2.
So, if u, v ∈ V and λ ∈ F then (T_1 + T_2)(u + v) = T_1(u + v) + T_2(u + v) =
T_1(u) + T_1(v) + T_2(u) + T_2(v) (since T_1, T_2 are linear) = [T_1(u) + T_2(u)] +
[T_1(v) + T_2(v)] (since W is an abelian group under +) = (T_1 + T_2)(u) +
(T_1 + T_2)(v), showing that T_1 + T_2 preserves addition. Also (T_1 + T_2)(λu) =
T_1(λu) + T_2(λu) = λT_1(u) + λT_2(u) = λ[T_1(u) + T_2(u)] = λ(T_1 + T_2)(u).
This proves that T_1 + T_2 ∈ L(V, W). The proof that λT_1 ∈ L(V, W) is
even simpler and left to the reader. Thus L(V, W) is a vector space over F.
It is also sometimes denoted by Hom_F(V, W) so as to distinguish it from
Hom(V, W) which simply consists of all group homomorphisms from the
abelian group V to the abelian group W (see Example (11) in Chapter 5,
Section 3).
A particularly interesting case of Hom_F(V, W) arises when the vector
space W is just the field F. Elements of Hom_F(V, F) are functions
T : V → F such that for all u, v ∈ V and λ ∈ F, T(u + v) = T(u) + T(v)
and T(λu) = λT(u). Such functions are called linear functionals on V and
Hom_F(V, F) is called the dual space of V. The dual of V is often denoted
by V′ or V*. (A linear functional is also often called a linear form.)
6) Let (B, +, ·, ′) be a Boolean algebra. By Theorem (1.5), (B, ⊕, ·)
is a Boolean ring. We can make (B, ⊕) a vector space over Z_2 (which is a
field as proved in Section 1). Since Z_2 has only two elements, namely 0 and
1, we have little choice in defining the scalar multiplication. We define
0·x = 0 and 1·x = x for all x ∈ B. The axioms of a vector space are
easily verified. Thus we see that a Boolean algebra (or rather, the
corresponding Boolean ring) is a vector space over Z_2. (Actually, the ring structure
is not very vital here. All that is needed is that x ⊕ x = 0 for every x ∈ B.
This will be pointed out in an exercise.)
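As a concrete instance of this example, take B to be the power set of a three-element set, with ⊕ realised as symmetric difference. The Python sketch below is our own illustration (the base set and the names vadd, smul and check are assumptions made for the example); it verifies the vector space axioms over Z_2 by exhaustive search.

```python
from itertools import combinations

BASE = (0, 1, 2)
EMPTY = frozenset()
# All subsets of BASE: the elements of the Boolean algebra B.
B = [frozenset(c)
     for r in range(len(BASE) + 1)
     for c in combinations(BASE, r)]


def vadd(x, y):
    """The ring addition of the Boolean ring: symmetric difference."""
    return x ^ y


def smul(lam, x):
    """Scalar multiplication by an element of Z_2: 0.x = 0 and 1.x = x."""
    return x if lam % 2 == 1 else EMPTY


def check():
    for lam in (0, 1):
        for mu in (0, 1):
            for x in B:
                for y in B:
                    if smul(lam, vadd(x, y)) != vadd(smul(lam, x), smul(lam, y)):
                        return False
                    if smul((lam + mu) % 2, x) != vadd(smul(lam, x), smul(mu, x)):
                        return False
                    if smul((lam * mu) % 2, x) != smul(lam, smul(mu, x)):
                        return False
    # the one property that is really needed: x (+) x = 0 for every x
    return all(vadd(x, x) == EMPTY for x in B)


print(check())
```

The case λ = μ = 1 of the second axiom reads 0·x = x ⊕ x, which is where the property x ⊕ x = 0 is used.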
We now turn to the determination of the structure of abstract vector
spaces. In Chapter 5, we remarked that the problem of classifying groups
according to isomorphism is an extremely difficult one. The same is true
for rings. As compared to this, vector spaces over a field behave remarkably
simply. Up to isomorphism, there are very few distinct vector spaces over
a field F. Moreover, their form can be easily written down. The situation
is comparable to the structure theorem for Boolean algebras (Theorem
4.1.11) where we classified all finite Boolean algebras. In a somewhat
similar manner we classify all finitely generated vector spaces over a field F.
We begin with a few basic definitions. Throughout, V will denote a vector
space over a field F. An expression of the form λ_1 v_1 + λ_2 v_2 + ... + λ_n v_n,
where n is a non-negative integer, v_1, ..., v_n ∈ V and λ_1, ..., λ_n ∈ F is called
a linear combination of the v_i's with the coefficients λ_i's. The term 'linear'
comes from the fact that the equation of a straight line through the origin
in Cartesian coordinates is obtained by equating to zero some linear
combination of the coordinates with constant coefficients. We consider only
finite linear combinations. Consideration of infinite linear combinations
would require some type of a limiting process. Thus even if S is an infinite
subset of V, by a linear combination in S, we shall mean a linear
combination of a finite number of vectors from S. In terms of linear
combinations we can easily describe the subspace generated by a subset.
3.3 Proposition: Let S ⊂ V and let L(S) be the set of all linear
combinations of elements of S. (L(S) is often called the linear span of S.)
Then L(S) is a subspace of V. Moreover, it is the smallest subspace of V
containing S. (If S = ∅, L(S) = {0}. This conforms to the convention that a
sum with no terms equals 0.)
Proof: First we show that L(S) is a subspace of V. If u, v ∈ L(S), then
each of u, v is a linear combination of vectors in S, say
u = λ_1 v_1 + ... + λ_n v_n and v = μ_1 w_1 + ... + μ_m w_m where the v_i's and
w_j's are in S and the λ_i's and μ_j's are in F. Now we simply write u + v as
λ_1 v_1 + ... + λ_n v_n + μ_1 w_1 + ... + μ_m w_m and see that it is in L(S).
(Some of the v's may overlap with some of the w's. But we are not requiring
that the vectors appearing in a linear combination be distinct. If we do,
then we can sort them out and add together the coefficients of equal
vectors.) Similarly for any α ∈ F, αu = αλ_1 v_1 + ... + αλ_n v_n and so
αu ∈ L(S). Thus L(S) is a subspace of V. If u ∈ S, then u = 1·u is a linear
combination in S and so u ∈ L(S). Thus S ⊂ L(S).
Finally, suppose W is any subspace of V containing S. By induction on n,
it is easy to show that for v_1, ..., v_n ∈ S and λ_1, ..., λ_n ∈ F,
λ_1 v_1 + λ_2 v_2 + ... + λ_n v_n ∈ W. So L(S) ⊂ W. Summing it up, L(S) is the
smallest subspace of V containing S. ∎
Because of this proposition, the subspace L(S) is said to be spanned
(rather than generated) by S. By way of an example, let V = ℝ^3 and let
S = {(1, 0, −1), (−2, 1, 3)}. Suppose (x_1, x_2, x_3) ∈ L(S). Then there exist
λ_1, λ_2 ∈ ℝ such that (x_1, x_2, x_3) = λ_1(1, 0, −1) + λ_2(−2, 1, 3). This is
equivalent to three simultaneous equations x_1 = λ_1 − 2λ_2, x_2 = λ_2 and
x_3 = −λ_1 + 3λ_2. Eliminating λ_1 and λ_2 we get x_1 − x_2 + x_3 = 0.
Conversely, if x_1, x_2, x_3 satisfy x_1 − x_2 + x_3 = 0 then we can solve for
λ_1, λ_2 and express (x_1, x_2, x_3) as a linear combination of (1, 0, −1) and
(−2, 1, 3). Thus L(S) is the subspace {(x_1, x_2, x_3) ∈ ℝ^3 : x_1 − x_2 + x_3 = 0}.
Suppose we add the vector (0, 2, 1) to S. Then it is very easy to show that
the span of the set {(1, 0, −1), (−2, 1, 3), (0, 2, 1)} is the entire space ℝ^3.
However, if we add the vector (1, 4, 3) to S then the linear span of the set
{(1, 0, −1), (−2, 1, 3), (1, 4, 3)} is the same as that of the original set S.
The reason is that the new vector added is already a linear combination of
elements in S, because (1, 4, 3) = 9(1, 0, −1) + 4(−2, 1, 3). So (1, 4, 3)
is redundant in the sense that it does not contribute anything new to the
linear span of S.
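The computations in this example can be reproduced mechanically. The sketch below is only an illustration: it works over the rationals with exact fractions, and it tests membership in a span by comparing ranks obtained from Gaussian elimination, a technique the text has not introduced at this point. The function names rank and in_span are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of vectors over Q, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        # Look for a row at or below position r with a non-zero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def in_span(v, S):
    """v lies in L(S) exactly when adjoining v does not raise the rank."""
    return rank(list(S) + [v]) == rank(S)


S = [(1, 0, -1), (-2, 1, 3)]
print(in_span((1, 4, 3), S))                      # True
print(in_span((0, 2, 1), S))                      # False
print(rank(S + [(0, 2, 1)]))                      # 3
print(tuple(9 * a + 4 * b for a, b in zip(*S)))   # (1, 4, 3)
```

The first two lines of output agree with the equation x_1 − x_2 + x_3 = 0: the vector (1, 4, 3) satisfies it and (0, 2, 1) does not.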
Given a vector space V over a field F, our intention is to find a subset
S of V which spans V. Obviously we would like this spanning subset S to
be as small as possible (or else we might simply take S to be V itself). To
ensure this, we must avoid redundancies of the kind illustrated above.
There is a very handy way to tell whether a set is free of such redundancies.
But first we need a definition.
3.4 Definition: A subset S ⊂ V is said to be linearly dependent (over F)
if there exist distinct vectors v_1, v_2, ..., v_n ∈ S and λ_1, ..., λ_n ∈ F, not
all 0, such that λ_1 v_1 + λ_2 v_2 + ... + λ_n v_n = 0. A set which is not linearly
dependent is called linearly independent.
In this definition, S is a set of vectors. According to our convention
about sets, when the elements of a set are listed, neither their order nor
their repetition matters. As was seen in Chapter 3, Section 1, if we want
to take into account the repetitions, the appropriate concept is a multiset.
In the study of vector spaces, we often encounter multisets of vectors. It is
convenient to extend the preceding definition of linear dependence when
S is a multiset. In that case we do not require that the elements v_1, ..., v_n
be distinct. We simply require that none of them appears more frequently
than it does in the multiset S. Note that if some element, say, v appears
at least twice in S, then S is linearly dependent because we can write
1·v + (−1)·v = 0. The linear dependence of a multiset is conceptually
different from that of a set. If v is a non-zero vector, then the multiset
(v, v) is linearly dependent but the set {v, v}, which is the same as the set
{v}, is linearly independent! Still, to avoid pedantic fussiness, we shall
speak only of linear dependence of a set, even though at times we mean
the linear dependence of a multiset. The context will always make it clear
which way we mean. By a further abuse of language, when S is a finite set
or a multiset, say S = {w_1, ..., w_k}, then instead of saying that S is linearly
dependent (or independent) we may say that the vectors w_1, ..., w_k are
linearly dependent (or independent).
Worded differently, a subset S is linearly independent iff no non-trivial
linear combination of elements of S vanishes. In the example given above,
the set {(1, 0, −1), (−2, 1, 3), (0, 2, 1)} is linearly independent while the
set {(1, 0, −1), (−2, 1, 3), (1, 4, 3)} is not because we have
9(1, 0, −1) + 4(−2, 1, 3) − 1(1, 4, 3) = (0, 0, 0) = 0.
In the vector space of all polynomials over a field, the set {1, x, x^2, x^3, ...}
is linearly independent. But if we add any one more polynomial to it,
it becomes linearly dependent. As with any other concepts about vector
spaces, the role of the ground field should not be ignored. Sometimes the
same abelian group may be made into a vector space over two fields. It
may happen that a subset S is linearly dependent over one of these fields
but not over the other. For example, consider ℂ, the set of complex
numbers, as a vector space over itself. Then the set {1, i} is linearly
dependent since i·1 + (−1)·i = 0. Since ℂ is an extension field of ℝ, we may
think of ℂ as a vector space over ℝ. But then {1, i} is linearly independent
over ℝ because if λ, μ ∈ ℝ and λ·1 + μ·i = 0 then λ, μ must both be 0.
Similarly the set {1, √2, √3, √5} is linearly dependent over ℝ. But it is
linearly independent over ℚ, the field of rational numbers. (The proof of
this fact is not so immediate. We shall indicate a proof in the exercises.)
The relationship between linear dependence and the redundancy
discussed above is brought out in the following proposition.
3.5 Proposition: For a subset S of V, the following statements are
equivalent:
(1) S is linearly dependent over F.
(2) There exists v ∈ S such that v is in the linear span of S − {v}.
(3) There exists a proper subset T of S such that L(S) = L(T).
Proof: (1) ⇒ (2). Suppose S is linearly dependent over F. Then, by
definition there exist distinct vectors, say, v_1, v_2, ..., v_n ∈ S and scalars
λ_1, ..., λ_n ∈ F, not all 0, such that λ_1 v_1 + λ_2 v_2 + ... + λ_n v_n = 0. Since
not all λ_i's are 0, we may suppose, without loss of generality, that λ_1 ≠ 0.
Then v_1 = μ_2 v_2 + ... + μ_n v_n where μ_i = −λ_i/λ_1 for i = 2, ..., n. Let v = v_1.
Then v_2, ..., v_n ∈ S − {v} and we see that v is in the linear span of S − {v}.
So (2) holds.
(2) ⇒ (3). Suppose v ∈ S can be expressed as v = λ_1 v_1 + ... + λ_n v_n
where v_1, ..., v_n ∈ S − {v} and λ_1, ..., λ_n ∈ F. Let T = S − {v}. Then T
is a proper subset of S. We claim that L(T) = L(S). Certainly, since
T ⊂ S, we have L(T) ⊂ L(S). For the other way inclusion let u ∈ L(S).
Then u can be expressed as μ_1 w_1 + ... + μ_m w_m for some w_1, ..., w_m ∈ S
and μ_1, ..., μ_m ∈ F. Now, if none of the w_i's equals v, then u is a linear
combination of elements of T and hence is in L(T). If w_i = v for some
values of i, we substitute each such w_i by λ_1 v_1 + ... + λ_n v_n in the
expression u = μ_1 w_1 + ... + μ_m w_m and thus get u as a linear combination
of elements of S other than v. But then u ∈ L(T). So L(S) ⊂ L(T) and
hence L(S) = L(T).
(3) ⇒ (1). Suppose T is a proper subset of S and L(T) = L(S). There
exists some v ∈ S such that v ∉ T. Since v ∈ S, v ∈ L(S) and hence
v ∈ L(T). But this means there exist v_1, ..., v_n ∈ T and λ_1, ..., λ_n ∈ F
such that v = λ_1 v_1 + ... + λ_n v_n. We may assume that the v_i's are distinct,
for otherwise we can add the coefficients of equal v_i's. Since v ∉ T, it
follows that v, v_1, ..., v_n are distinct vectors in S. Also
λ_1 v_1 + λ_2 v_2 + ... + λ_n v_n + (−1)v = 0. Since the coefficient −1 is
non-zero, it follows that S is linearly dependent. ∎
We are now in a position to prove the existence of a spanning set which
is free of redundancies. The idea is to go on ‘shrinking’ a given spanning
set, while retaining its linear span till we reach a stage where no more
shrinking is possible. By the proposition above, the spanning set at this
stage will be linearly independent. Now, how do we ensure that the process
of shrinking the spanning set will stop eventually? For finite sets there is
little difficulty as we see in the proof of the following proposition.
3.6 Proposition: Suppose S is a finite subset of V. Then there exists a
linearly independent subset T of S such that L(T) = L(S).
Proof: Let W = L(S). If S itself is linearly independent, we set T = S.
If not, then by the last proposition, there exists a subset S_1 ⊊ S such that
L(S_1) = W. Again if S_1 is linearly independent we set T = S_1. If not there
exists S_2 ⊊ S_1 such that L(S_2) = W. Continuing in this manner we get a
strictly descending sequence of sets S ⊋ S_1 ⊋ S_2 ⊋ S_3 ⊋ ... each of which
spans W. But S is a finite set. So this sequence must terminate at some
stage. Suppose it terminates at S_k. Then S_k is linearly independent and
spans W. We set T = S_k.
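The shrinking procedure in this proof can be sketched as a short program. The version below is an illustration under stated assumptions, not the author's algorithm: vectors are tuples of rationals, redundancy of a vector is detected by checking that its removal leaves the rank unchanged, and the names rank and shrink_to_independent are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of vectors over Q, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def shrink_to_independent(S):
    """Discard redundant vectors one at a time; the span never changes."""
    T = list(S)
    i = 0
    while i < len(T):
        rest = T[:i] + T[i + 1:]
        if rank(rest) == rank(T):
            # T[i] is in the span of the others, so dropping it keeps L(T).
            T = rest
        else:
            i += 1
    return T


S = [(1, 0, -1), (-2, 1, 3), (1, 4, 3), (0, 2, 1)]
T = shrink_to_independent(S)
print(T)                     # [(-2, 1, 3), (1, 4, 3), (0, 2, 1)]
print(rank(T) == rank(S))    # True
```

Which subset T is reached depends on the order in which vectors are examined; here (1, 0, −1) is dropped first because it lies in the span of the remaining three.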
This proposition is actually true without the hypothesis that S be
finite. But the argument given above breaks down. In an infinite set we
may have an infinite chain of strictly descending sets, say
    S ⊋ S_1 ⊋ S_2 ⊋ ... ⊋ S_n ⊋ S_{n+1} ⊋ ...
It is tempting to let T be the intersection of all these S_n's, that is,
    T = ⋂_{n=1}^{∞} S_n.
(Something similar was done in the proof of Proposition (2.27), where we
took the union of a strictly ascending sequence of ideals.) But the trouble
is that this T may not be linearly independent. So we may have to trim it
still further. The existence of the desired subset of S has to be established
by the use of axiom of choice. The manner in which it is used is analogous
to the manner in which it is used for proving that every partial order can
be extended to a linear order (see Chapter 3, Section 3). But we do not
pursue this line further. So we leave the result as it is. Its applicability is
thereby restricted, but adequate for our purpose. The type of spaces for
which it is applicable is defined below.
3.7 Definition: A vector space V over a field F is said to be
finite-dimensional (over F) if there exists a finite subset S of V such that
L(S) = V.
For example, the space F^n consisting of all ordered n-tuples of elements
of F is finite-dimensional. A finite spanning set is given by e_1, e_2, ..., e_n
where for each i, e_i is the n-tuple whose ith entry is 1 and all other entries
are 0. The space of all polynomials in x over F is not finite dimensional.
However, the subspace of it consisting of all polynomials of degree 5 or
less, say, is finite-dimensional with a spanning set {1, x, x^2, x^3, x^4, x^5}.
We could have as well taken {1, 1 + x, 1 + x + x^2, 1 + x + x^2 + x^3,
1 + x + x^2 + x^3 + x^4, 1 + x + x^2 + x^3 + x^4 + x^5} as a spanning set.
Many other choices are also possible. Note that although we have defined
a finite dimensional vector space, we have not yet defined the dimension
of a vector space. We shall do this a little later. For the time being,
therefore, 'finite dimensional' simply means 'finitely generated'.
Proposition (3.6) provides the key to obtain the structure of finite
dimensional vector spaces. First let us paraphrase it in terms of an ex-
tremely important concept, which we define.
3.8 Definition: A basis for a vector space V over a field F is a subset of
V which spans V and which is linearly independent.
For example, for F^n, the set {e_1, e_2, ..., e_n} is a basis. It is called the
standard basis for F^n. Many other bases are possible. For example, in
ℝ^3, the set {(1, 0, −1), (−2, 1, 3), (0, 2, 1)} is a basis. But the set
{(1, 0, −1), (−2, 1, 3)} is not a basis because although it is linearly
independent, its span, as calculated above, is a proper subspace of ℝ^3. On
the other hand, the set {(1, 0, −1), (−2, 1, 3), (0, 2, 1), (1, 1, 1)} is not a
basis, because although it spans ℝ^3 it is not linearly independent. Thus the
two requirements of a basis are mutually independent.
Proposition (3.6) can be paraphrased to say that every finite dimensional
vector space has a basis. As remarked after its proof, this holds for all
vector spaces. Although we shall not prove it, a few examples are
noteworthy. For the space F[x] of all polynomials over F, it is easy to see
that the set {1, x, x^2, ..., x^n, ...} is a basis. Since ℝ is an extension field
of ℚ, it is a vector space over ℚ. Any basis of ℝ over ℚ is called a Hamel basis.
It is interesting that no particular Hamel basis has yet been constructed.
But its mere existence is useful in many contexts. This may sound puzzling
to a beginner. But it is fairly typical of situations where the axiom of
choice is applied. We simply get the existence of something without any
explicit description of it.
We now prove what role a basis plays in the structure of a vector
space. In doing so we prove various simple characterisations of a basis.
Many authors take one of these characterisations as the definition of a
basis.
3.9 Theorem: Let B be a subset of a vector space V over a field F. Then
the following conditions are equivalent.
(1) B is a basis for V.
(2) B spans V and no proper subset of B spans V. (In other words B
is a minimal spanning set for V.)
(3) B is linearly independent and no proper superset of B is linearly
independent. (In other words, B is a maximal linearly independent
set.)
(4) Every element of V can be uniquely expressed as a linear com-
bination of elements of B. (The meaning of uniqueness will be
clarified in the proof).
Proof: (1) ⇒ (2). Suppose B is a basis for V. Then by definition, B spans
V. Also B is linearly independent. So by Proposition (3.5), no proper
subset of B can have the same span as B. So (2) holds.
(2) ⇒ (3). Suppose B is a minimal spanning set for V. Then no proper
subset of B has the same span as B and so again by Proposition (3.5),
B is linearly independent. To show it is a maximal such set suppose C is
a proper superset of B. Then L(B) ⊂ L(C). But L(B) = V. So L(C) = V.
This means that C has the same linear span as that of its proper subset B.
So by Proposition (3.5), C is linearly dependent. This proves (3).
(3) ⇒ (4). Suppose B is maximally linearly independent. We assert that
L(B) = V. For let v ∈ V. If v ∈ B then certainly v ∈ L(B). If v ∉ B, then
the set B ∪ {v} is a proper superset of B and hence by assumption, is linearly
dependent. So there exist distinct vectors v_1, v_2, ..., v_n ∈ B ∪ {v} and
λ_1, ..., λ_n ∈ F not all 0 such that λ_1 v_1 + ... + λ_n v_n = 0. One of these
v_i's must be v, for otherwise all v_i's would be in B proving that B is
linearly dependent, a contradiction. Without loss of generality suppose
v = v_1. Then λ_1 ≠ 0, for otherwise λ_2 v_2 + ... + λ_n v_n = 0 with
λ_2, ..., λ_n not all 0, again contradicting that B is linearly independent.
Since λ_1 ≠ 0, we can write v = v_1 = μ_2 v_2 + ... + μ_n v_n where
μ_i = −λ_i/λ_1 for i = 2, ..., n. This shows v ∈ L(B). Thus every vector in V is
in L(B). So L(B) = V, or in other words every element of V can be
expressed as a linear combination of elements of B. We have further to
show that this expression is unique. Here uniqueness means the following.
Suppose the same vector v is expressed both as λ_1 v_1 + ... + λ_n v_n and
also as μ_1 v_1 + ... + μ_n v_n where the v_i's are distinct elements of B and all
λ_i's and μ_i's are in F. Then λ_i = μ_i for all i = 1, 2, ..., n. To prove this
suppose v admits the two expressions λ_1 v_1 + ... + λ_n v_n and
μ_1 v_1 + ... + μ_n v_n. Let α_i = λ_i − μ_i for i = 1, ..., n. Then
α_1 v_1 + ... + α_n v_n = 0. Since B is linearly independent, this means each
α_i is 0, or λ_i = μ_i, as was to be proved.
(4) ⇒ (1). Suppose every element of V has a unique expression as a
linear combination of elements of B. Then certainly B spans V. To show B
is linearly independent, suppose v_1, ..., v_n are distinct elements of B and
λ_1 v_1 + ... + λ_n v_n = 0 for some λ_1, ..., λ_n ∈ F. Now the zero vector 0 also
equals 0v_1 + 0v_2 + ... + 0v_n. So by uniqueness of expression, λ_i = 0 for all
i = 1, 2, ..., n. In other words, no non-trivial linear combination of distinct
vectors from B can vanish. So B is linearly independent. Thus we have
shown that B spans V and also that B is linearly independent. By definition,
B is a basis for V. ∎
As a corollary, we get the following structure theorem for finite
dimensional vector spaces.
3.10 Corollary: Every finite dimensional vector space over a field F is
isomorphic to F^n for some integer n.
Proof: Let V be a finite-dimensional vector space over F. Then there
exists a finite set S such that L(S) = V. By Proposition (3.6), S contains a
linearly independent set B which also spans V. By definition, B is
a basis for V. Let n = |B|. If n = 0, then B = ∅ and V consists of only
the zero vector. In this case F^0 is also the trivial vector space by
convention and so the result holds. So suppose n > 0. Let v_1, v_2, ..., v_n be
the distinct elements of B. Define h: F^n → V by
    h(λ_1, λ_2, ..., λ_n) = Σ_{i=1}^{n} λ_i v_i.
Then h is a linear transformation. For, suppose (λ_1, ..., λ_n), (μ_1, ..., μ_n)
are in F^n. Then
    h[(λ_1, ..., λ_n) + (μ_1, ..., μ_n)] = h(λ_1 + μ_1, ..., λ_n + μ_n)
        = Σ_{i=1}^{n} (λ_i + μ_i) v_i
        = Σ_{i=1}^{n} λ_i v_i + Σ_{i=1}^{n} μ_i v_i
        = h(λ_1, ..., λ_n) + h(μ_1, ..., μ_n).
Similarly, for α ∈ F,
    h(α(λ_1, ..., λ_n)) = Σ_{i=1}^{n} αλ_i v_i = α h(λ_1, ..., λ_n).
So h is a vector space homomorphism. To prove that it is an isomorphism
we have to show that it is a bijection. But this is exactly what statement
(4) in the last theorem says.
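The map h of this proof is easy to experiment with. The sketch below assumes V = ℚ^3 with the basis {(1, 0, −1), (−2, 1, 3), (0, 2, 1)} from the earlier example; the names BASIS, h, vec_add and vec_scale are illustrative. It checks the two linearity identities on sample inputs and spot-checks injectivity on a finite grid of coefficients, which illustrates but of course does not prove the bijectivity asserted above.

```python
from fractions import Fraction
from itertools import product

# Assumption: V = Q^3 with the basis used in the earlier example.
BASIS = [(1, 0, -1), (-2, 1, 3), (0, 2, 1)]


def h(coeffs):
    """h(λ_1, ..., λ_n) = λ_1 v_1 + ... + λ_n v_n."""
    dim = len(BASIS[0])
    return tuple(sum(Fraction(c) * v[j] for c, v in zip(coeffs, BASIS))
                 for j in range(dim))


def vec_add(x, y):
    """Componentwise sum of two tuples."""
    return tuple(a + b for a, b in zip(x, y))


def vec_scale(alpha, x):
    """Componentwise multiple of a tuple by the scalar alpha."""
    return tuple(alpha * a for a in x)


lam, mu, alpha = (2, -1, 3), (0, 5, 1), Fraction(7, 2)
print(h(vec_add(lam, mu)) == vec_add(h(lam), h(mu)))         # True
print(h(vec_scale(alpha, lam)) == vec_scale(alpha, h(lam)))  # True

# Spot check of injectivity: distinct coefficient tuples give distinct vectors.
grid = list(product(range(-2, 3), repeat=3))
print(len({h(c) for c in grid}) == len(grid))                # True
```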
In case the field F is finite, the cardinality of the set F^n is |F|^n by
Proposition (2.2.13). This gives an alternate proof of Corollary (4.1.12),
about the cardinality of a finite Boolean algebra. We already saw how a
Boolean algebra may be regarded as a vector space over ℤ_2, a field with
two elements. If the Boolean algebra is finite, then as a vector space it is
certainly finite dimensional. So its cardinality is a power of 2. In the same
vein, we get the following result about cardinalities of finite fields.
3.11 Corollary: The number of elements in a finite field is a power
of some prime.
Proof: Let K be a finite field. In Section 1 we saw that the characteristic
of K must be a prime, say, p. Using the result of Exercise (1.9), K contains
a subfield F which is isomorphic to ℤ_p. Since K is an extension field of F,
it is also a vector space over F. Also it is finite dimensional, because we
may take K itself as a finite set which spans K. By the last Corollary, K
is isomorphic to F^n for some n. But then |K| = |F^n| = p^n. ∎
This corollary puts a restriction on the cardinality of finite fields. For
example, we see that there can be no field with 6 or with 10 elements.
However, the corollary does not say that given any prime power p^n, there
exists a field with p^n elements. The existence of such fields can indeed be
proved, using the method of construction of fields starting from
irreducible polynomials, given in the last section (cf. Exercise (2.32)).
Moreover, two fields with the same number of elements turn out to be
isomorphic to each other. So for every prime power p^n, up to isomorphism
there is one and only one field with p^n elements. The proof of this fact, as
well as other properties of finite fields will be given when we study
applications of fields (see the Epilogue).
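The restriction is simple to tabulate. The following sketch lists the integers up to 20 that are prime powers, and hence the only possible cardinalities of finite fields in that range; the function name is_prime_power is illustrative.

```python
def is_prime_power(q):
    """True when q = p**n for some prime p and some integer n >= 1."""
    if q < 2:
        return False
    p = 2
    while p * p <= q:
        if q % p == 0:
            # p is the smallest prime factor of q; strip it out completely.
            while q % p == 0:
                q //= p
            return q == 1
        p += 1
    return True  # no factor up to sqrt(q), so q itself is prime


print([q for q in range(2, 21) if is_prime_power(q)])
# [2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 17, 19]
print(is_prime_power(6), is_prime_power(10))  # False False
```

The sizes 6, 10, 12, 14, 15, 18 and 20 are missing from the list, in agreement with the corollary.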
Interesting as these corollaries are, there is still something missing. We
have not yet defined the dimension of a finite dimensional vector space.
According to Corollary (3.10), if V is a finite dimensional vector space
over a field F then V is isomorphic to F^n for some non-negative integer n.
It is tempting to define the dimension of V as this integer n. But there is a
catch here. Corollary (3.10) does not assert that the integer n is unique. If
we go into its proof, we see that the integer n was the cardinality of the
basis B for V. There was nothing canonical about this basis B. Suppose we
had some other basis, say, C for V with cardinality m. Then by the same
reasoning, V would come out as isomorphic to F^m. So the dimension of V
would be m as well as n. This would defeat our expectation that the
dimension of a vector space be uniquely defined.
Fortunately, this does not happen. The same vector space may have
many different bases, but they all have the same cardinality as we show.
First we prove a preliminary result.
3.12 Proposition: In any vector space, any set of n+1 (or more)
vectors in the linear span of n vectors is linearly dependent, where n is any
positive integer.
Proof: Let V be a vector space over a field F. Let v_1, v_2, ..., v_n ∈ V and
let S = {v_1, v_2, ..., v_n}. The statement of the proposition means that if we
take any n + 1 (or more) vectors, say, w_1, ..., w_m in L(S) then the set
{w_1, w_2, ..., w_m} is linearly dependent. Since every superset of a linearly
dependent set is linearly dependent, once we prove the assertion for n + 1
vectors, it automatically holds for more than n + 1 vectors. Our task is
therefore to show that the set {w_1, ..., w_{n+1}} where each
w_i ∈ L({v_1, ..., v_n}) is linearly dependent. For this, we apply induction
on n.
Suppose n = 1. Then w_1 = λ_1 v_1 and w_2 = λ_2 v_1 for some λ_1, λ_2 ∈ F. If
λ_2 = 0 then w_2 = 0 and 0·w_1 + 1·w_2 = 0 gives a non-trivial, vanishing
linear combination of w_1 and w_2. If λ_2 ≠ 0, then λ_2 w_1 + (−λ_1) w_2 = 0. In
either case we see that {w_1, w_2} is linearly dependent.
Suppose now that the result holds for all values of n less than some
positive integer k. We prove it for n = k. So let
w_1, ..., w_{k+1} ∈ L({v_1, ..., v_k}). Then we can find elements λ_{i,j} ∈ F
such that
        w_1     = λ_{1,1} v_1 + λ_{1,2} v_2 + ... + λ_{1,k} v_k
        w_2     = λ_{2,1} v_1 + λ_{2,2} v_2 + ... + λ_{2,k} v_k
        ...
        w_k     = λ_{k,1} v_1 + λ_{k,2} v_2 + ... + λ_{k,k} v_k
    and w_{k+1} = λ_{k+1,1} v_1 + λ_{k+1,2} v_2 + ... + λ_{k+1,k} v_k.
We may suppose that the coefficients in the last equation are not all
0. If they are, then w_{k+1} = 0 and we can write
0w_1 + 0w_2 + ... + 0w_k + 1w_{k+1} = 0, proving the linear dependence of
{w_1, ..., w_k, w_{k+1}}. So we assume that λ_{k+1,j} ≠ 0 for some j. We may
further suppose that j = k, that is, λ_{k+1,k} ≠ 0; as otherwise we merely
re-index the v's. Now, for each i = 1, 2, ..., k we let u_i = w_i − μ_i w_{k+1},
where μ_i = λ_{i,k}/λ_{k+1,k}. Then we see that u_i is a linear combination of
v_1, ..., v_{k−1}. So u_1, u_2, ..., u_k are k vectors in the linear span of the
k − 1 vectors v_1, ..., v_{k−1}. By induction hypothesis, the set
{u_1, ..., u_k} is linearly dependent. Hence, there exist α_1, ..., α_k ∈ F
not all 0 such that α_1 u_1 + ... + α_k u_k = 0. Substituting
u_i = w_i − μ_i w_{k+1} we get β_1 w_1 + β_2 w_2 + ... + β_k w_k + β_{k+1} w_{k+1} = 0
where β_i = α_i for i = 1, ..., k and
β_{k+1} = −(α_1 μ_1 + α_2 μ_2 + ... + α_k μ_k). Since not all α's are 0, not all
β's are 0. So {w_1, ..., w_k, w_{k+1}} is linearly dependent. This completes the
inductive step and proves the proposition. ∎
Worded differently, this proposition says that if B ⊂ L(S) where S is a
finite set, and B is linearly independent then |B| cannot exceed |S|. Using
this result, we are now in a position to settle the possible ambiguity about
dimension.
3.13 Theorem: Any two bases of a finite dimensional vector space V
over a field F have the same cardinality. This cardinality is also the unique
integer n such that V is isomorphic to F^n.
Proof: Suppose B and C are two bases for V. Since V is finite dimensional
there exists a finite set S ⊂ V such that L(S) = V. Then elements of B
are in the linear span of S. But B is also linearly independent. So by the
last proposition, |B| ≤ |S|. In particular B is finite. Similarly C is
finite. To show that B and C must have the same cardinality we note that
C is linearly independent and C ⊂ L(B). So again by the last proposition,
|C| ≤ |B|. Interchanging the roles of B and C, we get |B| ≤ |C|.
Hence |B| = |C|. Thus we have shown that any two bases of V have the
same number of elements. Let this number be n. We already saw in
Corollary (3.10) that V is isomorphic to F^n. To complete the proof we must
show that V is not isomorphic to F^m for m ≠ n. Since 'being isomorphic
to' is an equivalence relation, this ultimately reduces to proving that for
m ≠ n, F^n is not isomorphic to F^m. Let, if possible, h: F^n → F^m be an
isomorphism. Let B = {e_1, e_2, ..., e_n} be the standard basis for F^n discussed
earlier. Let C = h(B). We leave it to the reader to prove, from the fact
that h is an isomorphism, that C is a basis for F^m. Since h is a bijection,
|C| = |h(B)| = |B| = n. So F^m has a basis, C, with n elements. But the
standard basis for F^m has m elements. So by the first assertion, m = n, a
contradiction. Thus F^m and F^n are not isomorphic and hence n is the unique
integer such that V is isomorphic to F^n. ∎
With this theorem at hand, we can now make the following definition:
3.14 Definition: The dimension of a finite dimensional vector space V
is the cardinality of any basis for it. It is denoted by dim_F(V) or simply
by dim(V).
It is only now that we can say that a finite dimensional vector space
has finite dimension. Because of the standard basis for F^n, we see that its
dimension is n. Again the role of the ground field should not be forgotten.
For example, ℂ, as a vector space over itself has dimension 1. But as a
vector space over ℝ, its dimension is 2, because {1, i} is a basis for ℂ over
ℝ. If ℂ is regarded as a vector space over ℚ, the field of rational numbers,
then its dimension is infinite. (A proof of this is based on the fact that the
set ℚ is countable while ℂ is not.) Intuitively, when the elements of a vector
space V over a field F can be expressed in terms of the elements of F, then
the dimension of V equals the number of free choices that we can make in
specifying elements of V. As the simplest example, in F^n the elements are
ordered n-tuples of elements of F. Each of the n entries in such n-tuples
can be chosen freely, that is, without any restriction. So there are n free
choices, equal to the dimension of F^n over F. By a similar reasoning, in an
m × n matrix (a_{ij}) over F, each a_{ij} can be chosen independently of the
others. So the dimension of M_{m,n}(F) over F is mn. However, consider the
subspace V of M_{m,n}(F) consisting of those matrices in which the sum of the
entries in the first row is 0. In such a matrix the entries in all rows except
the first one can be chosen freely. However, in the first row, any n − 1
entries may be chosen arbitrarily. Once these are chosen, the remaining
entry is determined completely. So the number of free choices in all is
(m − 1)n + n − 1, that is, mn − 1. This is the dimension of the subspace V. As
yet another example, the dimension of ℝ^5 is 5. Let V be the subspace of
ℝ^5 defined by V = {(x_1, x_2, x_3, x_4, x_5) : 2x_1 − x_4 + πx_5 = 0}. For
elements of V we can choose x_1, x_2, x_3 and x_4 arbitrarily. But once they
are chosen x_5 must be defined by (x_4 − 2x_1)/π. So dim(V) = 4. (We could
also have chosen x_1, x_2, x_3 and x_5 arbitrarily and defined x_4 in terms of
them.) If we add one more restriction and define W as
{(x_1, x_2, x_3, x_4, x_5) : 2x_1 − x_4 + πx_5 = 0 and 3x_2 − x_3 + x_4 + x_5 = 0}
then the dimension would go down by one. In W, only x_1, x_2 and x_4 can be
chosen freely and so dim(W) = 3.
As a general rule, each additional restriction reduces the dimension by 1.
Of course, this additional restriction must be 'really new', that is,
something not implied by the other restrictions. For example, if we add one
more restriction to W, namely, 2x_1 + 6x_2 − 2x_3 + x_4 + (π + 2)x_5 = 0,
then the dimension of W does not change, because this new restriction can
be expressed in terms of the earlier restrictions as
(2x_1 − x_4 + πx_5) + 2(3x_2 − x_3 + x_4 + x_5) = 0. This topic will be
further taken up in the next section where we shall solve systems of linear
equations.
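The counting of free choices can be cross-checked numerically. The sketch below rests on two assumptions that go beyond the text at this point: that the dimension of the solution space of a homogeneous system in 5 unknowns equals 5 minus the rank of its coefficient rows, and that a floating-point rank computed with a small tolerance is adequate for coefficients involving π. The constraint rows c1, c2, c3 encode a pair of independent restrictions and a third that is their combination c1 + 2·c2; all names are illustrative.

```python
import math


def rank(rows, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    rows = [list(map(float, r)) for r in rows]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        # Choose the remaining row with the largest entry in column c.
        pivot = max(range(r, len(rows)),
                    key=lambda i: abs(rows[i][c]), default=None)
        if pivot is None or abs(rows[pivot][c]) <= tol:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


# Coefficients of (x_1, ..., x_5) in each restriction.
c1 = (2, 0, 0, -1, math.pi)       # 2x_1 - x_4 + pi*x_5 = 0
c2 = (0, 3, -1, 1, 1)             # 3x_2 - x_3 + x_4 + x_5 = 0
c3 = (2, 6, -2, 1, math.pi + 2)   # equals c1 + 2*c2, so nothing new

print(5 - rank([c1]))             # 4
print(5 - rank([c1, c2]))         # 3
print(5 - rank([c1, c2, c3]))     # 3
```

The third restriction leaves the dimension at 3 because it is implied by the first two.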
For dimensions of subspaces we have the following result. It conforms
to our intuition. But it is not trivial, because a similar result does not
hold for all algebraic structures. For example, a subgroup of a finitely
generated group need not be finitely generated. (A counter-example was
given in Exercise (5.3.35).)
3.15 Proposition: Let W be a subspace of a finite dimensional vector
space V. Then W is also finite dimensional and dim(W) ≤ dim(V).
Moreover, any basis of W can be extended to a basis for V.
Proof: Suppose W is not finite dimensional. Then no finite subset can
span W. Let w_1 be any non-zero vector in W. Let S_1 = {w_1}. Then
L(S_1) ⊊ W. So there exists w_2 ∈ W such that w_2 ∉ L(S_1). Let
S_2 = S_1 ∪ {w_2}. Then again L(S_2) ⊊ W. So find w_3 ∈ W − L(S_2). Let
S_3 = S_2 ∪ {w_3}. Continue this process indefinitely. It is easily seen, by
induction, that each S_n is linearly independent. Now let B be a basis for V.
Then L(B) = V. So S_n ⊂ L(B) for all n. But by Proposition (3.12), S_n is
linearly dependent for all n > |B|. This is a contradiction. So W must be
finite dimensional. Let A be a basis for W. Then A is linearly independent
and A ⊂ L(B). So again by Proposition (3.12), |A| ≤ |B|. So
dim(W) ≤ dim(V). For the last assertion, suppose C = {u_1, ..., u_m} is a
basis for W. We go on adding vectors to C one by one till we get a basis for
V. Let T_m = C. If L(T_m) = V then T_m is a basis for V. Otherwise let
u_{m+1} be any vector in V which is not in L(T_m). Let
T_{m+1} = T_m ∪ {u_{m+1}}. It is easily seen that T_{m+1} is linearly
independent. If it spans V, it is a basis for V. If not, enlarge it to a
linearly independent set T_{m+2}. This process must stop at T_n where
n = dim(V). For otherwise T_{n+1} would be a linearly independent set of
cardinality n + 1 in the n-dimensional space V, contradicting Proposition
(3.12) once again. Thus we have extended the given basis, C, for W to a
basis T_n for V. ∎
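The enlarging step of this proof can also be sketched in code. The proof allows any vector outside L(T) to be adjoined; the version below makes the particular (and purely illustrative) choice of trying the standard basis vectors of ℚ^n in order, and detects "not in L(T)" by an increase in rank. The names rank and extend_to_basis are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of vectors over Q, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def extend_to_basis(C, n):
    """Extend a linearly independent list C in Q^n to a basis of Q^n."""
    T = list(C)
    for i in range(n):
        e = tuple(1 if j == i else 0 for j in range(n))
        if rank(T + [e]) > rank(T):
            # e is not in L(T), so adjoining it keeps T linearly independent.
            T.append(e)
    return T


C = [(1, 0, -1), (-2, 1, 3)]      # a basis of the plane x_1 - x_2 + x_3 = 0
T = extend_to_basis(C, 3)
print(T)                          # [(1, 0, -1), (-2, 1, 3), (1, 0, 0)]
print(rank(T))                    # 3
```

After the loop every standard basis vector lies in L(T), so T spans ℚ^n; it is linearly independent because each vector was adjoined only when it raised the rank.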
The way the basis for V was constructed in this proof is exactly
opposite to that based on Proposition (3.6). There we start with a
spanning set S for V and go on shrinking it till it cannot be shrunk any
further. This minimal spanning set is then a basis for V. On the other hand,
in the construction here, we start with a linearly independent subset of V
and go on enlarging it till it is impossible to enlarge it any more without
making it linearly dependent. This maximal linearly independent subset of
V is then a basis for V. As we just saw, this construction is convenient
where subspaces are involved. Not surprisingly then, the earlier
construction is more convenient for quotient spaces. Using it, it can be
shown that dim(V/W) ≤ dim(V). The exact relationship between the dimensions
of V, W and V/W is that dim(V) = dim(W) + dim(V/W). This fact and a
few other results about dimensions will be given as exercises. We conclude
the present section with an application of the results proved in this section
to field extensions. Throughout the remainder of this section let F denote
a field, R will be a ring with identity containing F as a central subring.
(That is, F ⊂ R, elements of F commute with those of R and the element
1 in F is also the identity for R. R itself need not be commutative.) If
α ∈ R then F[α] will denote the smallest subring of R containing F and α.
It is easily seen that F[α] consists of all elements of the form f(α) where
f(x) ∈ F[x] (cf. Exercise (2.20)). More generally, if α_1, α_2, ..., α_n ∈ R
then F[α_1, ..., α_n] will denote the smallest subring of R containing F and
α_1, ..., α_n. We shall mostly deal with the case where R is a field. But some
of the concepts, such as the following, make sense for any R.
3.16 Definition: An element $\alpha \in R$ is called algebraic over F if there
exists a non-zero polynomial p(x) over F such that $p(\alpha) = 0$. Otherwise it
is called a transcendental element.
For example, let $F = \mathbb{Q}$, the field of rationals, and let $R = \mathbb{R}$, the field
of real numbers. Then $\sqrt{2}$, $\sqrt[3]{5}$ are algebraic over $\mathbb{Q}$ because $\sqrt{2}$ is a root
of the polynomial $x^2 - 2$ which has rational coefficients and similarly $\sqrt[3]{5}$
is a root of $x^3 - 5$. Trivially, every element of $\mathbb{Q}$ is algebraic over $\mathbb{Q}$ itself.
Given a real number, it is extremely difficult in general to decide whether
it is algebraic or transcendental over $\mathbb{Q}$. It is known that $e$ and $\pi$ are
transcendental over $\mathbb{Q}$. But about $e + \pi$ it is not known whether it is
algebraic or not. In fact it is not even known whether $e + \pi$ is rational or
irrational. Against this background, any result which asserts that a certain
number is algebraic is valuable. We shall prove that if R is a field extension
of F and $\alpha, \beta \in R$ are algebraic over F then $\alpha + \beta$ and $\alpha\beta$ are algebraic
over F. There seems no obvious way of doing this. Even if we are given
polynomials p(x), q(x) in F[x] having $\alpha$, $\beta$ as roots, it is far from clear how
to construct from these a polynomial having $\alpha + \beta$ as a root.
However, with a suitable characterisation, the result can be proved
easily. The characterisation we need is the following:
3.17 Theorem: Let K be a field extension of F. Then an element $\alpha \in K$
is algebraic over F if and only if the vector space $F[\alpha]$ is finite dimensional
over F.
Proof: Suppose $\alpha$ is algebraic over F. Then there exists a non-zero polynomial p(x)
of degree n (say) over F such that $p(\alpha) = 0$. Let $S = \{1, \alpha, \alpha^2, \ldots, \alpha^{n-1}\}$. We
claim that the vector space $F[\alpha]$ is spanned by S. As noted above, a typical
element of $F[\alpha]$ is of the form $f(\alpha)$ for some $f(x) \in F[x]$. Now F[x] is a
euclidean ring by Theorem (2.3). So there exist polynomials $q(x), r(x) \in F[x]$
such that $f(x) = p(x)q(x) + r(x)$ where $r(x) = 0$ or the degree of r(x) is
less than n. This gives $f(\alpha) = p(\alpha)q(\alpha) + r(\alpha) = r(\alpha)$, since $p(\alpha) = 0$ (cf.
Proposition (2.23)). If $r(x) = 0$ then $r(\alpha) = 0$. But then $f(\alpha) = 0$ and 0 is always
in the linear span of S. So suppose r(x) is not identically 0. Then r(x) is
a polynomial of degree $k \le n - 1$. Let $r(x) = a_0 + a_1 x + \cdots + a_k x^k$ where
$a_0, \ldots, a_k \in F$. Then $f(\alpha) = r(\alpha) = a_0 1 + a_1 \alpha + \cdots + a_k \alpha^k$. Since $k \le n - 1$,
we see that $f(\alpha)$ is in the linear span of S. So $F[\alpha]$ is finite dimensional
over F.
Conversely suppose $F[\alpha]$ is finite dimensional over F. Let n be the
dimension of $F[\alpha]$ over F. Then $F[\alpha]$ is spanned by a set S of cardinality n.
By Proposition (3.12), the $n + 1$ elements $1, \alpha, \alpha^2, \ldots, \alpha^n$ in the span of S
are linearly dependent. So there exist $a_0, a_1, \ldots, a_n \in F$, not all 0, such that
$a_0 1 + a_1 \alpha + a_2 \alpha^2 + \cdots + a_n \alpha^n = 0$. But this means $p(\alpha) = 0$ where p(x) is
the non-zero polynomial $a_0 + a_1 x + \cdots + a_n x^n$ in F[x]. So $\alpha$ is algebraic over F. ∎
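The first half of the proof is effectively an algorithm: to write $f(\alpha)$ in terms of $1, \alpha, \ldots, \alpha^{n-1}$, divide f(x) by p(x) and keep the remainder. The sketch below assumes $F = \mathbb{Q}$ and stores a polynomial as a list of coefficients, lowest degree first; the function name `poly_rem` is a hypothetical helper, not notation from the text.

```python
from fractions import Fraction

def poly_rem(f, p):
    """Remainder of f on division by p over Q.

    Polynomials are coefficient lists, lowest degree first, and the
    leading coefficient p[-1] is assumed to be non-zero.
    """
    r = [Fraction(c) for c in f]
    n = len(p) - 1  # degree of p
    while len(r) > n:
        lead = r[-1] / Fraction(p[-1])
        shift = len(r) - 1 - n
        # subtract lead * x^shift * p(x); this cancels the leading term of r
        for i, c in enumerate(p):
            r[shift + i] -= lead * c
        r.pop()
    return r

# alpha = sqrt(2) is a root of p(x) = x^2 - 2; take f(x) = x^3 + x^2 + 1.
p = [-2, 0, 1]
f = [1, 0, 1, 1]
coords = poly_rem(f, p)
print(coords)  # coordinates of f(alpha) with respect to {1, alpha}
```

For this choice the remainder is $3 + 2x$, in agreement with the direct computation $f(\sqrt{2}) = 2\sqrt{2} + 2 + 1 = 3 + 2\sqrt{2}$.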
In order to use this characterisation effectively, we need a definition.
3.18 Definition: An extension field K of a field F is said to be a finite
extension if K, as a vector space over F, is finite dimensional. The
dimension of K over F is called the degree of K over F and is denoted by
[K : F].
It should be noted that here the adjective 'finite' refers to the dimension
of K over F and not to the field K itself. The field K may very well be
infinite. There is a reason why this dimension is called the 'degree' of K over
F. In the proof of Theorem (2.26), we saw that if f(x) is an irreducible
polynomial of degree n in F[x] then there exists an extension field K of F
such that in K, f(x) has a root $\alpha$. Using essentially the same argument as
in the last theorem, it is easy to show that the field K constructed there has
dimension n over F, the same as the degree of the polynomial f(x) from
which it was constructed. Hence the name 'degree'.
As a consequence of the last theorem we have the following result:
3.19 Corollary: If K is a finite extension field of F then every element
of K is algebraic over F.
Proof: Let $\alpha \in K$. Consider the subring $F[\alpha]$ of K generated by F
and $\alpha$. Then $F[\alpha]$ is a subspace of K, regarded as a vector space over F.
By assumption, K is finite dimensional over F. So by Proposition (3.15),
$F[\alpha]$ is finite dimensional over F. Hence by Theorem (3.17), $\alpha$ is algebraic
over F. ∎
The leverage which Corollary (3.19) has over Theorem (3.17) in proving
that $\alpha$ is algebraic over F is that it may not be so easy to show directly
that $F[\alpha]$ is finite dimensional over F. But by Corollary (3.19) it is sufficient
to find some finite extension, say, K, of F which contains $\alpha$. As a further
aid, the following theorem shows that in order to prove that a field
extension is finite, we may use a chain of intermediate fields.
3.20 Theorem: Suppose F, L, M are fields with $F \subset L \subset M$. Then if L
is a finite extension of F and M is a finite extension of L then M is a finite
extension of F. Moreover, $[M : F] = [M : L] \cdot [L : F]$.
Proof: It is only the first assertion that will be of immediate use to us.
The second assertion is much stronger. Curiously, however, it is easier to
prove because it is also more specific and hence provides a clue for its
proof. (See the comments after the solution of Problem (2.2.11).) So we
shall proceed to prove the second assertion. Let $\{x_1, \ldots, x_m\}$ be a basis of
L over F where $m = [L : F]$ and $\{y_1, \ldots, y_n\}$ be a basis of M over L where
$n = [M : L]$. We want to find a basis of M over F having mn elements.
The most natural candidate would be the set
$S = \{x_i y_j : 1 \le i \le m,\ 1 \le j \le n\}$. We show, in fact, that S is a basis for M
over F by showing that it spans M and secondly that it is linearly independent
over F (which would also show that S has mn distinct elements). First, let
$\alpha \in M$. Then there exist $b_1, \ldots, b_n \in L$ such that
\[ \alpha = \sum_{j=1}^{n} b_j y_j. \]
Also for each $j = 1, \ldots, n$, there exist $a_{1j}, a_{2j}, \ldots, a_{mj} \in F$ such that
\[ b_j = \sum_{i=1}^{m} a_{ij} x_i. \]
Substitution gives
\[ \alpha = \sum_{j=1}^{n} \sum_{i=1}^{m} a_{ij} x_i y_j. \]
Thus we have expressed $\alpha$ as a linear combination of the $x_i y_j$'s with
coefficients from F. This shows that the linear span of S (over F) is M. As for
linear independence of S, suppose
\[ \sum_{j=1}^{n} \sum_{i=1}^{m} b_{ij} x_i y_j = 0 \]
where $b_{ij} \in F$ for $1 \le i \le m$, $1 \le j \le n$. We now work
backwards, that is, we group together the terms. For each $j = 1, \ldots, n$,
let $c_j = \sum_{i=1}^{m} b_{ij} x_i$. Then $c_j \in L$, also $\sum_{j=1}^{n} c_j y_j = 0$. But
$\{y_1, \ldots, y_n\}$ is linearly independent over L. So $c_j = 0$ for all $j = 1, \ldots, n$.
This means $\sum_{i=1}^{m} b_{ij} x_i = 0$, which, by linear independence of the $x_i$'s
over F, forces each $b_{ij}$ to be 0. So S is linearly independent over F. As noted
before, this completes the proof. ∎
None of the results we have proved so far about field extensions
is very profound. But, as it happens many times, simple results, when
ingeniously combined, yield non-trivial results. We are now ready to prove
that the sum and the product of two algebraic elements are algebraic,
something which is not easy to prove directly.
3.21 Theorem: Let K be an extension field of a field F. Suppose $\alpha, \beta \in K$
are algebraic over F. Then $\alpha + \beta$ and $\alpha\beta$ are algebraic over F.
Proof: Let L, M be respectively $F[\alpha]$ and $F[\alpha, \beta]$. By definition, these
are, respectively, the subrings of K generated by $F \cup \{\alpha\}$ and $F \cup \{\alpha, \beta\}$.
But we claim they are in fact subfields of K. First consider $F[\alpha]$. Since $\alpha$
is algebraic over F, it satisfies some non-zero polynomial equation. Let
I be the set of all polynomials f(x) in F[x] such that $f(\alpha) = 0$. It is easily
seen that I is an ideal of F[x]. Also it is a non-zero ideal because it
contains at least one non-zero polynomial. Now F[x] is a euclidean ring
by Theorem (2.3). Let p(x) be a non-zero polynomial of minimum possible degree
in I. Then by Theorem (2.4) (or rather its proof), I is generated by p(x).
That is, $I = (p(x))$.
The next step is to show that p(x) is irreducible over F. Let, if possible,
$p(x) = f(x)g(x)$ be a proper factorization of p(x) in F[x]. Then f(x) and
g(x) are of lower degrees than p(x). Now, $f(\alpha)g(\alpha) = p(\alpha) = 0$. But K is
a field and so has no zero divisors. (This is the first time that we are
crucially using that K is a field and not just a ring extension of F.) So
$f(\alpha) = 0$ or $g(\alpha) = 0$. In either case we get a polynomial in F[x] of a lower
degree than p(x) having $\alpha$ as a root, contradicting the choice of p(x). So
p(x) is irreducible over F. (Caution: p(x) is not, in general, irreducible over K.
In fact since $p(\alpha) = 0$, p(x) has a factor $x - \alpha$ in K[x] by Theorem (2.24),
so it is reducible in K[x] whenever its degree exceeds 1.)
The rest of the argument parallels the one in Theorem
(2.26). Define $\theta: F[x] \to K$ by $\theta(f(x)) = f(\alpha)$. Then $\theta$ is a ring
homomorphism (see again Exercise (2.20)). By definition, I is the kernel of $\theta$. Also
the range of $\theta$ is $F[\alpha]$. So by the fundamental theorem about ring
homomorphisms (Exercise (1.22)), $F[\alpha]$, as a ring, is isomorphic to the quotient
ring F[x]/I. But I is generated by p(x), which is a prime element in F[x].
So by Theorem (2.21), F[x]/I is a field. Hence $F[\alpha]$, which is isomorphic
to it, is also a field.
Therefore, L (that is, $F[\alpha]$) is not only a subring of K but a subfield of
K. By Theorem (3.17), L is a finite extension of F. Next, consider
$M = F[\alpha, \beta]$. Clearly $M = (F[\alpha])[\beta] = L[\beta]$. Now $\beta$ is algebraic over F and
hence all the more so over L (since $F \subset L$). So by exactly the same
argument as was used in showing that $F[\alpha]$ is a field, it follows that $L[\beta]$,
that is, M is a subfield of K. Also it is a finite extension of L. So by
Theorem (3.20), M is a finite field extension of F. Hence by Corollary
(3.19) (applied to M and not to K), every element of M is algebraic over F.
In particular, $\alpha + \beta$, $\alpha\beta$ are elements of M (since $\alpha, \beta \in M$). So they are
algebraic over F. ∎
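As a concrete check of the theorem, take $F = \mathbb{Q}$, $\alpha = \sqrt{2}$ and $\beta = \sqrt{3}$. By Theorem (3.20) the field $M = \mathbb{Q}[\sqrt{2}, \sqrt{3}]$ has degree 4 over $\mathbb{Q}$, with basis $\{1, \sqrt{2}, \sqrt{3}, \sqrt{6}\}$, so the five powers $1, \gamma, \ldots, \gamma^4$ of $\gamma = \alpha + \beta$ must be linearly dependent. The sketch below is an illustration only: elements of M are stored as 4-tuples of coordinates in that basis, the helper `mul` is hypothetical, and the polynomial $x^4 - 10x^2 + 1$ is one dependency that the code verifies rather than derives.

```python
def mul(a, b):
    """Product of two elements of Q[sqrt2, sqrt3].

    Each element is a 4-tuple of coordinates with respect to the
    basis (1, sqrt2, sqrt3, sqrt6).
    """
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 + 2 * a1 * b1 + 3 * a2 * b2 + 6 * a3 * b3,  # coefficient of 1
        a0 * b1 + a1 * b0 + 3 * (a2 * b3 + a3 * b2),        # sqrt3*sqrt6 = 3*sqrt2
        a0 * b2 + a2 * b0 + 2 * (a1 * b3 + a3 * b1),        # sqrt2*sqrt6 = 2*sqrt3
        a0 * b3 + a3 * b0 + a1 * b2 + a2 * b1,              # sqrt2*sqrt3 = sqrt6
    )

one = (1, 0, 0, 0)
gamma = (0, 1, 1, 0)          # sqrt2 + sqrt3
g2 = mul(gamma, gamma)        # gamma squared
g4 = mul(g2, g2)              # gamma to the fourth power

# Evaluate x^4 - 10 x^2 + 1 at gamma, coordinate by coordinate.
value = tuple(x4 - 10 * x2 + x0 for x4, x2, x0 in zip(g4, g2, one))
print(g2, g4, value)
```

All four coordinates of `value` vanish, so $\sqrt{2} + \sqrt{3}$ is a root of $x^4 - 10x^2 + 1$, a polynomial with rational coefficients.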
Exercises
3.1 In the definition of a vector space, prove that it is enough to
assume that (V, +) is a group (not necessarily an abelian group).
In other words show that in presence of all other axioms, the
commutativity of + can be proved and need not be assumed as an
axiom (Hint: See Exercise (1.1)).
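3.2 Which of the following subsets are subspaces of $\mathbb{R}^3$?
(i) $\{(x_1, x_2, x_3) \in \mathbb{R}^3 : 2x_1 + 3x_2 + 4x_3 = 0\}$
(ii) $\{(x_1, x_2, x_3) \in \mathbb{R}^3 : 2x_1 + 3x_2 + 4x_3 = 1\}$
(iii) $\{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_1 > 0,\ x_2 > 0,\ x_3 < 0\}$
(iv) $\{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_1, x_2, x_3 \text{ are rational}\}$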
(v) $\{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_1^2 + x_2^2 + x_3^2 = 1\}$.
3.3 Prove that a subset W of a vector space V is a subspace of V iff W
is closed under linear combinations, that is, all linear combinations
of elements of W are in W.
3.4 Let V be a vector space over a field F. Prove:
(a) the intersection of any family of subspaces of V is again a
subspace of V.
(b) if F is infinite, then the union of finitely many subspaces of V
is a subspace of V only if one of these subspaces contains all
others.
3.5 Let $T: V \to W$ be a function, where V, W are vector spaces over
the same field F. Prove that T is a linear transformation iff T
preserves linear combinations, that is, for all $v_1, \ldots, v_n \in V$ and
$\lambda_1, \ldots, \lambda_n \in F$, $T(\lambda_1 v_1 + \cdots + \lambda_n v_n) = \lambda_1 T(v_1) + \cdots + \lambda_n T(v_n)$. (Hence
the name 'linear' transformation.)
3.6 Let $W_1$, $W_2$ be vector spaces over a field F. Let $W = W_1 \times W_2$. Let
$\pi_1$, $\pi_2$ be projections of W onto $W_1$, $W_2$ respectively, that is,
$\pi_1(x_1, x_2) = x_1$ and $\pi_2(x_1, x_2) = x_2$ for $(x_1, x_2) \in W$. Prove that
$\pi_1$, $\pi_2$ are linear transformations. What are their kernels and ranges?
3.7 If V, $W_1$, $W_2$ are vector spaces over a field F, prove that
$\mathrm{Hom}_F(V, W_1 \times W_2)$ is isomorphic to $\mathrm{Hom}_F(V, W_1) \times \mathrm{Hom}_F(V, W_2)$.
3.8 Prove that if $T: V \to W$ is a linear transformation with kernel K
and range R then R is isomorphic to the quotient space V/K. (The
kernel of a linear transformation is also often called its null space.)
Let $v_0 \in V$ and $w_0 = T(v_0)$. Prove that the set of solutions to the
equation $T(v) = w_0$ is precisely $K + \{v_0\}$. (cf. Proposition (5.16).)
3.9 Give an alternate proof of Proposition (3.6) as follows. Let
$S = \{v_1, v_2, \ldots, v_n\}$. Go on picking these vectors one-by-one. After
$v_i$ is picked, see if $v_i$ is in the linear span of $\{v_1, \ldots, v_{i-1}\}$. If it is
not, include it in T. Prove that the set T that is obtained after all
elements of S are picked is linearly independent and has the same
span as S. (This proof is in the line of the construction in the
proof of Proposition (3.15). From the point of view of mechanical
implementation, this proof is better than the proof in the text.)
Let V = {(x,, x,, x3, x“ 1,) e R512x1—x,+1rx,=0, x,+xs—x.—x,
= 0}. Obtain a basis for V.
3.11 Let V be a vector space of dimension n over a field F. Let S be a
finite subset of V. Prove that any two of the following three
statements, taken together, imply the third:
(i) $|S| = n$.
(ii) S spans V.
(iii) S is linearly independent.
3.12 Using the fact v5, V3, V5 and 1/; are all irrational show that:
(i) {1, V5) is linearly independent over Q,
(ii) (I, f, V3} is also linearly independent over Q,
(iii) (1, v2, 1/5! 1/3} is linearly independent over Qand
(iv) {1, «5, V3, 1/3} is linearly independent over Q.
3.13 Let X, Y be subspaces of a finite dimensional vector space V. Let
$X + Y = \{x + y : x \in X,\ y \in Y\}$. Prove that $X + Y$ is a subspace
of V and that it is generated by $X \cup Y$. Prove further that
$\dim(X + Y) = \dim(X) + \dim(Y) - \dim(X \cap Y)$. (Hint: Start with a basis for
$X \cap Y$. Extend it to a basis for X and also to a basis for Y. Show
that the union of these two bases is a basis for $X + Y$.)
3.14 Prove that $\dim(V \times W) = \dim V + \dim W$.
3.15 Let W be a subspace of a finite-dimensional vector space V. Prove
that if $\{v_1 + W, v_2 + W, \ldots, v_k + W\}$ is a basis for V/W then
$\{v_1, \ldots, v_k\}$ is linearly independent. Hence show that
$\dim(V) = \dim(W) + \dim(V/W)$.
3.16 Let $T: V \to W$ be a linear transformation of finite dimensional
vector spaces, with kernel K and range R. The dimensions of K
and R are called respectively the nullity and the rank of T and are
often denoted by n(T) and r(T). Prove that $n(T) + r(T) = \dim(V)$.
If $T_1: V \to W$ and $T_2: W \to X$ are linear transformations, prove that
$r(T_2 \circ T_1) \le r(T_1)$ and also $r(T_2 \circ T_1) \le r(T_2)$.
3.17 Prove that a linear transformation $T: V \to W$ is one-to-one iff it
takes linearly independent sets in V to linearly independent sets
in W.
3.18 Let V, W be vector spaces of equal dimension. Prove that a linear
transformation $T: V \to W$ is one-to-one iff it is onto. (Thus, to
some extent, finite dimensional vector spaces behave like finite sets.)
3.19 Let X be a subspace of a finite dimensional vector space V. Prove
that there exists a subspace Y of V such that (i) $X \cap Y = \{0\}$ and
(ii) $\dim(Y) = \dim(V) - \dim(X)$. For any such space Y prove
that $X + Y = V$ and $X \times Y$ is isomorphic to V. (In a sense, X and
Y are complementary subspaces. However, for a given X, Y need
not be unique in general.)
3.20 Let G be an abelian group. Prove that G can be regarded as a
module over $\mathbb{Z}$, the ring of integers.
3.21 Let R be a ring and suppose M is a left ideal of R. Prove that M
can be regarded as a left module over R. Show by an example
that it may happen that $rm = 0$ but neither r nor m is 0.
3.22 Let K be a field extension of a field F and $[K : F] = n$. Let $\alpha \in K$
be a root of an irreducible polynomial p(x) in F[x]. Prove that the
degree of p(x) is a divisor of n. If n is prime, show that $F[\alpha]$ is
either F or K.
3.23 Prove that the set of those real numbers which are algebraic over
$\mathbb{Q}$ is countable. (Hint: Prove that there are only countably many
polynomials in $\mathbb{Q}[x]$. Each of them can have only finitely many
roots.)
3.24 Prove that transcendental real numbers exist and in fact their set
is uncountable. (See Exercise (2.2.24) and the comments on it.)
3.25 Let $V_1, V_2, \ldots, V_k$ and W be vector spaces. A function
$f: V_1 \times V_2 \times \cdots \times V_k \to W$ is called multilinear if for every $i = 1, \ldots, k$,
$f(v_1, \ldots, v_{i-1}, \alpha v_i + \beta w_i, v_{i+1}, \ldots, v_k) = \alpha f(v_1, \ldots, v_i, \ldots, v_k) + \beta f(v_1, \ldots, w_i, \ldots, v_k)$
for all scalars $\alpha, \beta$ and all $v_1 \in V_1, \ldots, v_i, w_i \in V_i, \ldots, v_k \in V_k$. In other words, a
multilinear function is linear in each variable when all other variables
are held constant. Compute $f(u_1 + v_1, \ldots, u_k + v_k)$ for a multilinear
function f. For k = 2, a multilinear function is called bilinear. If F
is a field, prove that the function $f: F^n \times F^n \to F$ defined by
$f((x_1, \ldots, x_n), (y_1, \ldots, y_n)) = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$ is bilinear.
3.26 Although we are not studying inner product spaces, we assume
the reader is familiar with the definitions and properties of the
usual dot and cross products of vectors in $\mathbb{R}^3$. For the purpose of
this exercise we shall denote vectors in $\mathbb{R}^3$ by bold faced letters.
$\mathbf{i}, \mathbf{j}, \mathbf{k}$ will denote three unit vectors forming a right handed,
orthogonal system as usual. If $\mathbf{u} = u_1\mathbf{i} + u_2\mathbf{j} + u_3\mathbf{k}$ and
$\mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k} \in \mathbb{R}^3$ then $\mathbf{u} \cdot \mathbf{v}$ denotes their dot product
$u_1 v_1 + u_2 v_2 + u_3 v_3$ and $\mathbf{u} \times \mathbf{v}$ denotes their cross product,
$(u_2 v_3 - u_3 v_2)\mathbf{i} + (u_3 v_1 - u_1 v_3)\mathbf{j} + (u_1 v_2 - u_2 v_1)\mathbf{k}$.
Given a quaternion $a = a_0 + a_1 i + a_2 j + a_3 k$,
we associate to it the ordered pair $(a_0, \mathbf{a})$ where $\mathbf{a}$ is the vector
$a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$. This sets up a bijection between Q and $\mathbb{R} \times \mathbb{R}^3$.
In terms of this bijection, prove that the quaternionic multiplication
takes the form
$(a_0, \mathbf{a}) \cdot (b_0, \mathbf{b}) = (a_0 b_0 - \mathbf{a} \cdot \mathbf{b},\ a_0\mathbf{b} + b_0\mathbf{a} + \mathbf{a} \times \mathbf{b})$.
This is highly analogous to complex multiplication. (The
quaternionic addition shows no remarkable change of form,
$(a_0, \mathbf{a}) + (b_0, \mathbf{b})$ is simply $(a_0 + b_0, \mathbf{a} + \mathbf{b})$.) In this formulation the
norm of a, N(a), is simply $\sqrt{|a_0|^2 + |\mathbf{a}|^2}$; note again the close
formal resemblance with the absolute value of a complex number.
Using this representation of quaternions and the various identities
about the dot and the cross product (which may be found in any
elementary textbook on vector algebra), verify that $(Q, +, \cdot)$ is a
division ring. Also give an easier proof of the multiplicative property
of the norm, that is, $N(ab) = N(a)N(b)$ for all $a, b \in Q$. This exercise
also shows why the quaternions are denoted in a peculiar manner
using the symbols i, j, k and not merely as ordered 4-tuples
$(a_0, a_1, a_2, a_3)$ which seems more logical.
3.27 Let G be an abelian group in which $2x = 0$ for all $x \in G$. Prove
that G can be considered as a vector space over the field $\mathbb{Z}_2$ in a
unique way. (This generalises Example (6) of vector spaces.)
3.28 Let G be any abelian group. For any positive integer n, let
$nG = \{ng : g \in G\}$. Prove that nG is a subgroup of G and that in the
quotient group G/nG, every element has an order which is a divisor
of n.
3.29 Let S be a finite set and G be the free abelian group on S (see
Exercise (5.3.32)). Prove that G/2G is a vector space of dimension
$|S|$ over $\mathbb{Z}_2$.
3.30 Let S, T be finite sets and let G, H be free abelian groups on S, T
respectively. Prove that G is isomorphic to H if and only if $|S| = |T|$.
(In view of this exercise the rank of a finitely generated free abelian
group G is well-defined. It is the cardinality of any finite set S such
that G is isomorphic to the free abelian group on the set S.)
3.31 Prove that the last exercise also holds for free groups. [Hint: Combine
the last exercise with Exercise (5.3.31).] Thus the rank of a
finitely generated free group is also well-defined; it equals the
number of elements in any set which freely generates it.
Notes and Guide to Literature
The results in this section are elementary because we are confining
ourselves to finite dimensional vector spaces. For existence of a basis for
an infinite dimensional space, see Lang [1], where the reader will also find
more about modules. The theory of modules over a ring depends crucially
on the properties of that ring. (Vector spaces correspond to the case where
the ring is a field.) See, for example, Herstein [2].
For a proof of transcendence of $e$ see Herstein [1]. It is much more
difficult to prove that $\pi$ is transcendental. This was proved by Lindemann
in 1882. See Niven [1]. While proving that a particular number is
transcendental can be quite difficult, interestingly enough, the mere existence of
such numbers (without showing a particular number to be transcendental)
can be established by a simple cardinality argument, as in Exercise (3.24).
This is another instance of proving existence through abundance.
The inner product spaces, meant to abstract the geometric structure of
euclidean spaces, form the starting point of a large area of a branch of
mathematics called functional analysis. In this, we study generalisations of
such topics as Fourier series and Bessel functions. There are numerous
treatises on functional analysis, for example Rudin [2] or Limaye [1].
The aspects of euclidean geometry which do not depend on the concepts
of distance and angles can be generalised to vector spaces over fields other
than the field of real numbers. Strange as it may seem, even after these two
basic concepts are removed, a surprisingly large amount of geometry can be
still salvaged and leads to the theory of what are called projective
geometries. We shall briefly touch them while studying applications of finite fields
(see the Epilogue).
Multilinear functions are more popularly called multilinear forms. Bilinear
forms are especially important. See Lang [1] or Hoffman and Kunze [1].
4. Matrices and Determinants
In Section 1, we introduced matrices to give examples of rings. In this
section we study one of the standard applications of matrices, namely, to
provide a handy representation for linear transformations of finite dimensio-
nal vector spaces. In the study of properties of a square matrix, the deter-
minant is an invaluable tool. We assume that the reader is familiar with the
properties of 3 x 3 determinants. We shall show how they can be extended to
n x n determinants, using the permutation groups, studied in the last chapter.
We shall also study the concept of the rank of a matrix, which is also
intimately related to the solution of a system of linear equations. Throughout
this section we shall consider only finite dimensional vector spaces over a
field F. If n is a positive integer, then the vector space $F^n$, as defined in the
last section, consists of all ordered n-tuples $(x_1, \ldots, x_n)$ of elements of F.
Each such element can be thought of as a matrix with one row (and n
columns) and hence is also called a row vector. Occasionally, we shall find
it useful to represent elements of $F^n$ by column vectors, that is by $n \times 1$
matrices
\[ \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \]
We can do so because the vector space of all row vectors
(of length n) is clearly isomorphic to the vector space of all column vectors
of length n. An $m \times n$ matrix may be thought of either as an ordered
collection of its row vectors or as an ordered collection of its column
vectors.
We begin with a simple proposition whose proof is a matter of
straightforward verification and hence left to the reader.
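4.1 Proposition: Let $A = (a_{ij})$ be an $m \times n$ matrix over a field F. Let
$F^m$, $F^n$ be respectively the vector spaces of column vectors of lengths m and
n over F. Define $T: F^n \to F^m$ by $T(x) = Ax$ for $x \in F^n$. (That is, if x is the
column vector with entries $x_1, \ldots, x_n$ then T(x) is the column vector with
entries $y_1, \ldots, y_m$ where
\[ \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} =
\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.) \]
Then T is a linear transformation from $F^n$ to $F^m$.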
Thus we see that every matrix gives rise to a linear transformation. In the
last section we saw that $F^n$ has $\{e_1, e_2, \ldots, e_n\}$ as a standard basis, where $e_i$
is a column vector of length n which has 1 in the i-th place and 0
everywhere else. Note that, under the linear transformation T, the image of $e_i$ is
precisely the i-th column vector of the matrix A. Let us denote this column
vector by $v_i$. Now a typical element, say u, of $F^n$ is of the form
$\lambda_1 e_1 + \cdots + \lambda_n e_n$ for $\lambda_1, \ldots, \lambda_n \in F$. Then
$T(u) = \lambda_1 T(e_1) + \cdots + \lambda_n T(e_n) = \lambda_1 v_1 + \cdots + \lambda_n v_n$.
Therefore, if R denotes the range of T, then R is the subspace of $F^m$ spanned
by the set $\{v_1, \ldots, v_n\}$. This set need not be linearly independent. However,
by Proposition (3.6), we can get a linearly independent subset, say,
$\{v_{i_1}, \ldots, v_{i_r}\}$ of it which will be a basis for R. Then the dimension of R is r. The
dimension of the range of a linear transformation is called its rank
(Exercise (3.16)). We therefore have the following result for the linear
transformation constructed in the last proposition.
4.2 Proposition: The rank of the linear transformation T obtained from
the matrix A as above equals the maximum number of linearly independent
columns of A. ∎
This result is often expressed by saying that the rank of T equals the
column rank of A, where the column rank of a matrix is defined as the
maximum number of its linearly independent columns. Similarly the row rank
of a matrix A is defined as the maximum number of linearly independent
rows in it. The row rank can also be interpreted as the rank of a suitable
linear transformation as follows:
4.3 Proposition: Let A be an $m \times n$ matrix over a field F. Let $F^m$, $F^n$ be
the vector spaces of row vectors of lengths m, n respectively over F. Define
$T: F^m \to F^n$ by $T(x) = xA$ for $x = (x_1, \ldots, x_m) \in F^m$. Then T is a linear
transformation. Moreover, the rank of T is the row rank of A. ∎
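The two propositions rest on the observation that multiplying A by a standard basis vector picks out a column (for $Ax$) or a row (for $xA$). A small sketch of this, using plain Python lists for matrices and vectors; the helper names `mat_vec`, `vec_mat` and `unit` are hypothetical, and the indices in the code are 0-based:

```python
def mat_vec(A, x):
    """Product A x, where x is treated as a column vector."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def vec_mat(x, A):
    """Product x A, where x is treated as a row vector."""
    return [sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

def unit(n, i):
    """Standard basis vector of length n with 1 in position i (0-based)."""
    return [1 if k == i else 0 for k in range(n)]

A = [[1, 2, 0],
     [3, 1, 4]]          # a 2 x 3 matrix, so m = 2 and n = 3

columns = [mat_vec(A, unit(3, i)) for i in range(3)]  # images of e_1, e_2, e_3
rows = [vec_mat(unit(2, i), A) for i in range(2)]     # images under x -> xA
print(columns)
print(rows)
```

The first list consists of the columns of A and the second of its rows, which is why the ranges of the two transformations are spanned by the columns and by the rows respectively.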
The proof is completely dual to that of the last two propositions. This
duality, which results by an interchange of rows and columns, is called
transposition. Formally, if $A = (a_{ij})$ is an $m \times n$ matrix over any ring R
then its transpose, denoted by $A'$ (or sometimes by $A^T$ or $A^t$), is defined as
the $n \times m$ matrix whose entry in the ith row and jth column is $a_{ji}$. In
symbols, $A' = (a'_{ij})$ where $a'_{ij} = a_{ji}$ for $i = 1, \ldots, n$; $j = 1, \ldots, m$. For example,
the transpose of the matrix
\[ \begin{pmatrix} 1 & 2 & 0 & -1 & 4 \\ \pi & e & -2 & \sqrt{3} & 9 \\ 0 & 1 & 2 & 1/3 & 1/2 \end{pmatrix} \]
is the $5 \times 3$ matrix
\[ \begin{pmatrix} 1 & \pi & 0 \\ 2 & e & 1 \\ 0 & -2 & 2 \\ -1 & \sqrt{3} & 1/3 \\ 4 & 9 & 1/2 \end{pmatrix}. \]
Similarly the transpose of a row vector is a column vector and vice
versa. A few simple properties of transposes are listed below. The simple
proof is again omitted.
4.4 Proposition: Let A, B be two $m \times n$ matrices over a commutative
ring R. Then for any $\alpha, \beta \in R$, $(\alpha A + \beta B)' = \alpha A' + \beta B'$. If C is any $n \times p$
matrix over R then the matrix products AC and $C'A'$ are both defined and
moreover, $(AC)' = C'A'$. Finally, for every matrix A, $(A')' = A$. ∎
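The identities of the proposition are easy to check numerically. The sketch below verifies $(AC)' = C'A'$ and $(A')' = A$ for one pair of integer matrices; the helper names `transpose` and `mat_mul` are hypothetical, and a check on one example is of course not a proof.

```python
def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    """Product of an m x n matrix A and an n x p matrix B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2, 0],
     [3, 1, 4]]          # 2 x 3
C = [[1, 0],
     [2, 1],
     [0, 5]]             # 3 x 2

lhs = transpose(mat_mul(A, C))             # (AC)'
rhs = mat_mul(transpose(C), transpose(A))  # C'A'
print(lhs, rhs)
```

Note the reversal of order: $A'C'$ would be a $3 \times 3$ matrix here, so it could not equal the $2 \times 2$ matrix $(AC)'$.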
Transposition is a convenient device by which concepts about rows of a
matrix can be translated to concepts about columns of the transpose, and
vice versa. For example, the row rank of a matrix A equals the column
rank of A'. What is not so obvious is that the row rank of A equals the
column rank of A itself. This common value is then called its rank. A
direct proof of this fact is somewhat awkward. After studying determinants
we would be in a position to prove it.
Propositions (4.1) and (4.3) show how matrices give rise to linear trans-
formations. We now want to proceed the other way. That is, given a linear
transformation T: V —> W where V. W are vector spaces of dimensions
n, m respectively, we want to show that T can be represented by an m x n
matrix over F. This representation makes it possible to study linear trans-
formations through matrices (which are easier to operate on machines).
But besides its utility, the very existence of such a representation is remark-
able, especially because there is no similar representation for homomorphisms
of other algebraic structures such as groups and rings. What is so nice
about vector spaces? The answer is the presence of bases. Every element
of a vector space can be uniquely expressed in terms of the basis elements.
In this respect vector spaces behave like free groups, where every element
can be uniquely expressed in terms of the generators (see Chapter 5,
Section 3, see also Exercise (5.3.33) where a similar expression is obtained
for elements of a free abelian group). Because of this property of a basis,
we have the following simple but useful result, which is the analogue of
Theorem (5.3.20):
4.5 Theorem: Let B be a basis for a finite dimensional vector space V
over a field F. Let W be any vector space over F. Then any function
$f: B \to W$ can be uniquely extended to a linear transformation from V to
W. (Note, in particular, that this says that if two linear transformations
agree on a basis, then they agree everywhere.)
Proof: Let $v_1, \ldots, v_n$ be the distinct elements of B. Then $f(v_i)$ is given as an
element of W for each $i = 1, \ldots, n$ and we want a linear transformation
$T: V \to W$ such that $T(v_i) = f(v_i)$ for $i = 1, \ldots, n$. Let $v \in V$. By Theorem (3.9),
there exist unique $\lambda_1, \ldots, \lambda_n \in F$ such that
\[ v = \sum_{i=1}^{n} \lambda_i v_i. \]
We define
\[ T(v) = \sum_{i=1}^{n} \lambda_i f(v_i). \]
Since the $\lambda_i$'s are unique there is no problem about T being well-defined.
To prove T is a linear transformation, suppose $v, w \in V$ with
$v = \sum_{i=1}^{n} \lambda_i v_i$ and $w = \sum_{i=1}^{n} \mu_i v_i$, where $\lambda_i, \mu_i \in F$, $i = 1, \ldots, n$.
Then $v + w = \sum_{i=1}^{n} (\lambda_i + \mu_i) v_i$. So
\[ T(v + w) = \sum_{i=1}^{n} (\lambda_i + \mu_i) f(v_i) = \sum_{i=1}^{n} \lambda_i f(v_i) + \sum_{i=1}^{n} \mu_i f(v_i) = T(v) + T(w). \]
Similarly for any $\alpha \in F$,
\[ T(\alpha v) = \sum_{i=1}^{n} \alpha \lambda_i f(v_i) = \alpha \sum_{i=1}^{n} \lambda_i f(v_i) = \alpha T(v). \]
Thus T is a linear transformation. Moreover, for
every i, $v_i = 0v_1 + \cdots + 0v_{i-1} + 1v_i + 0v_{i+1} + \cdots + 0v_n$. So
$T(v_i) = 0f(v_1) + \cdots + 0f(v_{i-1}) + 1f(v_i) + 0f(v_{i+1}) + \cdots + 0f(v_n) = f(v_i)$.
Thus T is a desired extension of f. As for uniqueness of T, suppose $S: V \to W$
is another linear extension of f. Then for $v = \sum_{i=1}^{n} \lambda_i v_i$, by linearity of S
(see Exercise (3.5)),
\[ S(v) = \sum_{i=1}^{n} \lambda_i S(v_i) = \sum_{i=1}^{n} \lambda_i f(v_i), \]
since $S(v_i) = f(v_i)$ for all $i = 1, \ldots, n$. This
shows $S(v) = T(v)$ for all $v \in V$. ∎
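The formula $T(v) = \sum_i \lambda_i f(v_i)$ used in the proof translates directly into code. In the sketch below a vector of V is represented by its coordinate tuple $(\lambda_1, \ldots, \lambda_n)$ relative to the basis B, and elements of W are tuples of numbers; these representations and the name `extend_linearly` are assumptions made for the illustration.

```python
def extend_linearly(images):
    """Return the linear map T with T(v_i) = images[i].

    A vector v is passed to T as its coordinate tuple
    (lambda_1, ..., lambda_n) relative to the chosen basis of V.
    """
    m = len(images[0])  # length of the tuples representing elements of W

    def T(coords):
        return tuple(
            sum(lam * img[j] for lam, img in zip(coords, images))
            for j in range(m)
        )

    return T

# Arbitrarily chosen values f(v_1), f(v_2), f(v_3) in W = Q^2.
images = [(1, 0), (1, 1), (0, 2)]
T = extend_linearly(images)

print(T((1, 0, 0)))    # the value at v_1, which should be f(v_1)
print(T((2, -1, 3)))   # the value at 2 v_1 - v_2 + 3 v_3
```

Any choice of `images` gives a linear map, which is the freedom of choice the theorem asserts.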
Verbally, a linear transformation is uniquely determined by its action
on a basis. Thus, to define a linear transformation from V to W it suffices
to take any basis B for V and any function $f: B \to W$. We are free to choose
this function any way we like. A judicious choice leads to interesting
consequences, one of which is given below:
4.6 Theorem: Let V, W be vector spaces of dimensions n, m respectively.
Then L(V, W), that is, the vector space of all linear transformations from
V into W, has dimension nm.
Proof: Let $B = \{v_1, \ldots, v_n\}$ and $C = \{w_1, \ldots, w_m\}$ be bases for V, W respectively.
For each $i = 1, \ldots, n$ and $j = 1, \ldots, m$ let $f_{ij}: B \to W$ be the function which
takes $v_i$ to $w_j$ and $v_k$ to 0 for $k \ne i$. ($f_{ij}$ is somewhat like the characteristic
function of the singleton set $\{v_i\}$, except that the value at $v_i$ is not 1 but $w_j$.)
By the last theorem, each $f_{ij}$ can be uniquely extended to a linear
transformation $T_{ij}$ from V to W. We claim that the set
$\{T_{ij} : 1 \le i \le n,\ 1 \le j \le m\}$ of nm linear transformations is a basis for L(V, W).
This would of course imply that L(V, W) has dimension nm.
First, let $T \in L(V, W)$. Then T is a linear transformation from V to W.
For each $i = 1, \ldots, n$, $T(v_i)$ is an element of W. Since $\{w_1, \ldots, w_m\}$ is a
basis for W, there exist unique scalars $a_{1i}, a_{2i}, \ldots, a_{mi}$ such that
\[ T(v_i) = \sum_{j=1}^{m} a_{ji} w_j. \]
Now let S be the linear transformation $\sum_{i=1}^{n} \sum_{j=1}^{m} a_{ji} T_{ij}$. We claim that S = T.
For this, in view of the last theorem again, it suffices to show that for every
$k = 1, \ldots, n$, $S(v_k) = T(v_k)$. Now,
\[ S(v_k) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ji} T_{ij}(v_k) = \sum_{j=1}^{m} a_{jk} w_j \]
because $T_{ij}(v_k) = 0$ for $i \ne k$ and for $i = k$, $T_{kj}(v_k) = w_j$. So $S(v_k) = T(v_k)$.
Hence S = T. That is, T is a linear combination of the $T_{ij}$'s, showing that
L(V, W) is spanned by them.
We only have to show now that the $T_{ij}$'s form a linearly independent
set. Suppose $a_{ji}$ are scalars with $\sum_{i=1}^{n} \sum_{j=1}^{m} a_{ji} T_{ij} = 0$. Then for every k,
the value of $\sum_{i=1}^{n} \sum_{j=1}^{m} a_{ji} T_{ij}$ on the element $v_k$ is 0. By the same computation
as before, this value is $\sum_{j=1}^{m} a_{jk} w_j$. But $\{w_1, \ldots, w_m\}$ is linearly independent.
So $a_{jk} = 0$ for all $j = 1, \ldots, m$. Since this holds for every $k = 1, \ldots, n$
it follows that $a_{ji} = 0$ for all i, j. Thus the set $\{T_{ij} : i = 1, \ldots, n;\ j = 1, \ldots, m\}$
is linearly independent. ∎
In particular we have the following corollary:
4.7 Corollary: The dimension of a space is the same as that of its dual.
Proof: We simply recall that the dual of a vector space V over a field F
is L(V, F) where F is regarded as a vector space over itself. Since the
dimension of F (over itself) is 1, the result follows by taking m = 1 in the
last theorem. ∎
The nm numbers $a_{ji}$ constructed in the proof of Theorem (4.6) determine
the transformation T completely in terms of the bases B and C. We
can get an m x n matrix by arranging these numbers into m rows and n
columns. The matrix so obtained will give a complete representation of
the transformation T. However, this matrix depends not only on the
transformation but also on the bases B and C. If we change either one of
them, we could get a different matrix for the same transformation. Actually,
the order of the elements in B and C also matters. If we reshuffle the
elements of B then the columns of the matrix will be permuted and if we
reshuffle the elements of C then its rows will be permuted.
To assign a unique matrix to a linear transformation, we introduce
the concept of an ordered basis. This is the same as a basis, except that
the order of the elements matters. Formally, an ordered basis for a vector
space V over a field F may be defined as a finite sequence $(v_1, v_2, \ldots, v_n)$ of
distinct elements of V such that the set $\{v_1, \ldots, v_n\}$ is a basis for V over F.
Thus, $(v_1, v_2, \ldots, v_n)$ is not the same ordered basis as $(v_2, v_1, \ldots, v_n)$ even
though, as sets, $\{v_1, v_2, \ldots, v_n\}$ is the same as $\{v_2, v_1, \ldots, v_n\}$. With this
concept, we are now ready to give the definition of the matrix associated with
a linear transformation.
4.8 Definition: Let $(v_1, \ldots, v_n)$ and $(w_1, \ldots, w_m)$ be ordered bases for vector
spaces V, W respectively over a field F. Let $T: V \to W$ be a linear
transformation. For each $k = 1, \ldots, n$ let $T(v_k) = \sum_{j=1}^{m} a_{jk} w_j$. Then the m x n
matrix $A = (a_{jk})$ is called the matrix of T w.r.t. the ordered bases
$(v_1, \ldots, v_n)$ and $(w_1, \ldots, w_m)$.
The easiest way to remember this matrix is to note that its kth column
is obtained by writing down the coefficients of $T(v_k)$ when it is expressed
as a linear combination of $w_1, \ldots, w_m$. If we start with an m x n matrix A
and construct the linear function $T: F^n \to F^m$ given by Proposition (4.1),
then the matrix of T w.r.t. the standard ordered bases for $F^n$ and $F^m$ comes
out to be A itself. But with a different choice of bases, it will be different.
To illustrate this, we work out a numerical example.
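4.9 Problem: Define $T: \mathbb{R}^2 \to \mathbb{R}^3$ by
$$T\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}$$
where $y_1 = 2x_1 - x_2$, $y_2 = -x_1 + x_2$ and $y_3 = x_1 + x_2$. Find the matrix of T
w.r.t. the ordered basis
$$\left( \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} -1 \\ 1 \end{pmatrix} \right)$$
for $\mathbb{R}^2$ and the ordered basis
$$\left( \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 2 \\ -1 \\ 2 \end{pmatrix}, \begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix} \right)$$
for $\mathbb{R}^3$.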
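Solution: The matrix of T w.r.t. the standard ordered bases would be
$\begin{pmatrix} 2 & -1 \\ -1 & 1 \\ 1 & 1 \end{pmatrix}$. But we have to find the matrix w.r.t. the given bases. First
$$T\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 3 \end{pmatrix}.$$
We want to express this as a linear combination of the given vectors in
$\mathbb{R}^3$. Suppose the coefficients are $\alpha, \beta, \gamma$. Then we get three equations:
$$\alpha + 2\beta + 3\gamma = 0$$
$$-\beta + \gamma = 1$$
$$\alpha + 2\beta = 3$$
Systematic methods for solving such equations will be discussed later
on. For the moment we solve them by hand to get $\alpha = 7$, $\beta = -2$,
$\gamma = -1$. This gives $\begin{pmatrix} 7 \\ -2 \\ -1 \end{pmatrix}$ as the first column of the desired matrix. A
similar procedure (which involves solving the system $\alpha + 2\beta + 3\gamma = -3$,
$-\beta + \gamma = 2$ and $\alpha + 2\beta = 0$) gives $\gamma = -1$, $\beta = -3$, $\alpha = 6$, so the second
column is $\begin{pmatrix} 6 \\ -3 \\ -1 \end{pmatrix}$. So the matrix of T w.r.t. the given ordered bases is
$$\begin{pmatrix} 7 & 6 \\ -2 & -3 \\ -1 & -1 \end{pmatrix}.$$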
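The hand computation in Problem (4.9) can be checked mechanically. The following Python sketch is only an illustration: the helper names `solve` and `matrix_of` are ours, and the data are those of the problem as read here, namely T(x1, x2) = (2x1 - x2, -x1 + x2, x1 + x2), the ordered basis ((1, 2), (-1, 1)) for the domain and ((1, 0, 1), (2, -1, 2), (3, 1, 0)) for the codomain. Each column of the matrix is found by solving a linear system in exact rational arithmetic.

```python
from fractions import Fraction

def solve(cols, rhs):
    """Solve c1*cols[0] + ... + cn*cols[n-1] = rhs by Gauss-Jordan elimination.

    cols is a list of n vectors of length n, assumed linearly independent.
    """
    n = len(rhs)
    # Augmented matrix: the basis vectors are the columns, rhs is the last column.
    aug = [[Fraction(cols[j][i]) for j in range(n)] + [Fraction(rhs[i])]
           for i in range(n)]
    for c in range(n):
        # Bring a row with a non-zero entry in column c into pivot position.
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] for i in range(n)]

def matrix_of(T, domain_basis, codomain_basis):
    """Matrix of T w.r.t. ordered bases: k-th column = coordinates of T(v_k)."""
    columns = [solve(codomain_basis, T(v)) for v in domain_basis]
    # Transpose the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]

def T(x):
    x1, x2 = x
    return (2 * x1 - x2, -x1 + x2, x1 + x2)

A = matrix_of(T, [(1, 2), (-1, 1)], [(1, 0, 1), (2, -1, 2), (3, 1, 0)])
for row in A:
    print([str(entry) for entry in row])
```

Working with `Fraction` rather than floating point keeps the coordinates exact, which matters when the answer is to be compared with a hand calculation.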
For a fixed choice of ordered bases for V and W the correspondence
from L(V, W) to $M_{m \times n}(F)$ which associates to $T \in L(V, W)$ its matrix A,
is a bijection. The inverse of this bijection is determined by essentially
the same construction as used in Proposition (4.1). Once we know the
matrix, A, of a linear transformation $T: V \to W$ w.r.t. ordered bases
$(v_1, v_2, \ldots, v_n)$ and $(w_1, \ldots, w_m)$, it is easy to evaluate T(v) for any $v \in V$.
First write v as $\sum_{i=1}^{n} \lambda_i v_i$ where $\lambda_1, \ldots, \lambda_n \in F$. Then $T(v) = \sum_{j=1}^{m} \mu_j w_j$ where the
coefficients $\mu_1, \ldots, \mu_m$ are given most conveniently by the matrix equation,
$$\begin{pmatrix} \mu_1 \\ \vdots \\ \mu_m \end{pmatrix} = A \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix}.$$
The proof of this fact is simple and hence omitted.
The importance of the assignment of matrices to linear transformations
comes from the fact that it is compatible with addition and scalar
multiplication. Moreover, matrix multiplication corresponds to composition of
linear transformations. We state these results precisely in the following
proposition.
4.10 Proposition: Let V, W, X be vector spaces with ordered bases
$(v_1, \ldots, v_n)$, $(w_1, \ldots, w_m)$ and $(x_1, \ldots, x_p)$ respectively. Let $T: V \to W$, $S: V \to W$
and $U: W \to X$ be linear transformations with matrices A, B, C respectively
w.r.t. these ordered bases. Then
(i) A + B is the matrix of the linear transformation $T + S: V \to W$,
(ii) for every scalar $\lambda$, $\lambda A$ is the matrix of $\lambda T$, and
(iii) CA is the matrix of $U \circ T: V \to X$.
(All matrices are w.r.t. the given ordered bases.)
Proof: The first two statements follow immediately from the definitions.
We prove the third statement. Let $D = (d_{ij})$ be the matrix of the composite
linear transformation $U \circ T$ from V to X. We have to show that D = CA.
Now for any $j = 1, 2, \ldots, n$, we have, by definition, $(U \circ T)(v_j) = \sum_{i=1}^{p} d_{ij} x_i$.
But $(U \circ T)(v_j)$ also equals $U(T(v_j))$. From $T(v_j) = \sum_{k=1}^{m} a_{kj} w_k$ and the
linearity of U, $U(T(v_j))$ equals $\sum_{k=1}^{m} a_{kj} U(w_k)$. But, once again,
$U(w_k) = \sum_{i=1}^{p} c_{ik} x_i$. So we get
$$\sum_{i=1}^{p} d_{ij} x_i = \sum_{k=1}^{m} a_{kj} \left( \sum_{i=1}^{p} c_{ik} x_i \right).$$
Since $\{x_1, \ldots, x_p\}$ is linearly independent, the coefficients of $x_i$
($i = 1, \ldots, p$) on both the sides must be equal. So $d_{ij} = \sum_{k=1}^{m} c_{ik} a_{kj}$. But
the right hand side is, by very definition, the (i, j)th entry in the product
matrix CA. So D = CA as was to be shown. ∎
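Statement (iii) of Proposition (4.10) is easy to test numerically for the standard ordered bases. The following Python sketch is an illustration only; the matrices `A` and `C` are arbitrary choices of ours and the helper names are hypothetical. It builds the matrix of the composite column by column, as in Definition (4.8), and compares it with the product CA.

```python
def mat_mul(C, A):
    """Product CA of a p x m matrix C and an m x n matrix A (lists of rows)."""
    return [[sum(C[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(C))]

def apply(M, v):
    """Apply the linear map with standard matrix M to the vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matrix_of_map(f, n):
    """Standard matrix of f on F^n: the j-th column is f(e_j)."""
    cols = [f([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    return [list(row) for row in zip(*cols)]

A = [[1, 2], [0, 1], [3, -1]]      # matrix of T: F^2 -> F^3 (arbitrary example)
C = [[2, 0, 1], [1, -1, 4]]        # matrix of U: F^3 -> F^2 (arbitrary example)

# Matrix of the composite U o T, computed directly from the composite map.
D = matrix_of_map(lambda v: apply(C, apply(A, v)), 2)
print(D == mat_mul(C, A))
```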
The preceding proposition may be used both ways. Sometimes a result
is easier to prove for linear transformations than for matrices or vice
versa. In such cases, we prove it where it is easier and transfer it to the
other. For example, associativity of composition of linear transformations
is a simple general property of functions. Using it, we can prove
the associativity of matrix multiplication (for matrices with entries in a
field F) as follows. Let A, B, C be m x n, n x p and p x q matrices. Let S, T,
U be the linear transformations obtained from them by Proposition (4.1).
Then $(S \circ T) \circ U$ is a linear transformation from $F^q$ to $F^m$ and its matrix
w.r.t. the standard ordered bases is (AB)C. Similarly $S \circ (T \circ U)$ has A(BC)
as its matrix w.r.t. the same ordered bases. Since $(S \circ T) \circ U = S \circ (T \circ U)$ and
the bases are the same, it follows that (AB)C = A(BC). This proves
associativity of matrix multiplication a little more elegantly than the direct
proof in Section 1. (Note however, that this proof is applicable only for
matrices over a field, while the earlier proof was valid for matrices over
any ring.)
As another application of the interplay between matrices and linear
transformations, we make good a promise given in Section 2. If F is a
field, p(x) is an irreducible polynomial in F[x] and I is the ideal generated
by p(x), then in the proof of Theorem (2.26) we saw that the quotient ring
F[x]/I is an extension field of F and that in this extension field the
polynomial p(x) has a root (namely, the coset x + I). There we promised that
using matrices we can get a 'concrete' representation of this extension field.
We now show how this representation is obtained. First we need a couple
of definitions.
4.11 Definition: A polynomial whose leading coefficient is 1 is called a
monic polynomial. ('Mono' means one.)
Thus a monic polynomial of degree n over a field F is of the form
$a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} + x^n$, where $a_0, \ldots, a_{n-1} \in F$.
If f(x) is any non-zero polynomial in F[x], we can write it uniquely as c g(x)
where $c \in F$ and g(x) is a monic polynomial. (We simply let c be the leading
coefficient of f(x) and let $g(x) = \frac{1}{c} f(x)$.) It is obvious that f(x) and g(x)
have the same degree. Any root of f(x) is also a root of g(x) and vice versa.
Also since non-zero elements of F are the units in F[x], f(x) and g(x) are
associates of each other. So f(x) is irreducible iff g(x) is so. Also the ideals
of F[x] generated by f(x) and g(x) are the same (see Proposition (2.12)). So,
passing from an arbitrary polynomial to a monic polynomial over a field
does not materially change its properties. It is merely a standardisation
device. The number of constants needed to determine a polynomial of degree
n is n + 1 in general. But if it is a monic polynomial it is only n, because
one of the constants (namely the leading coefficient) is already known to be
1. Using this simple fact, we associate an n x n matrix to a monic
polynomial as follows.
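4.12 Definition: Let $f(x) = a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} + x^n$ be a monic
polynomial. Then its companion matrix, C(f(x)), is defined to be the n x n
matrix
$$\begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & -a_0 \\
1 & 0 & 0 & \cdots & 0 & -a_1 \\
0 & 1 & 0 & \cdots & 0 & -a_2 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & -a_{n-1}
\end{pmatrix}.$$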
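As a small illustration of Definition (4.12), here is a Python sketch that builds the companion matrix from the list of non-leading coefficients. The function name `companion` and the list-of-rows representation are our own conventions, not the book's.

```python
def companion(coeffs):
    """Companion matrix of a_0 + a_1 x + ... + a_{n-1} x^{n-1} + x^n.

    coeffs = [a_0, a_1, ..., a_{n-1}]; the leading coefficient 1 is implicit.
    """
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1            # 1's just below the main diagonal
    for i in range(n):
        C[i][n - 1] = -coeffs[i]   # last column holds -a_0, ..., -a_{n-1}
    return C

print(companion([1, 0]))           # companion matrix of 1 + x^2
```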
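In the comments following Theorem (2.26), we remarked that the matrix
$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ corresponds to the complex number i, which is a root of the
polynomial $1 + x^2$ in $\mathbb{R}[x]$. Note that $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ is precisely the companion
matrix of the monic polynomial $1 + x^2$. If we identify an element $\lambda \in \mathbb{R}$
with the matrix $\begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}$ then we see that the matrix $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ satisfies
the equation $1 + \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}^2 = 0$, because by direct computation,
$$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}^2 = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}.$$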
Thus the companion matrix of $1 + x^2$ is a root of the polynomial $1 + x^2$,
regarded now as a polynomial over the ring $M_2(\mathbb{R})$ of all 2 x 2 matrices over
$\mathbb{R}$. We now generalise this and show that the companion matrix of every
monic polynomial over a field is a root of that polynomial, regarded as a
polynomial over the ring of matrices over that field.
4.13 Proposition: Let $f(x) = a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} + x^n$ be a monic
polynomial over a field F and let A = C(f(x)) be its companion matrix.
Then A is a root of f(x) in the ring $M_n(F)$, that is,
$$a_0 + a_1 A + a_2 A^2 + \cdots + a_{n-1} A^{n-1} + A^n = 0.$$
Moreover A cannot be a root of any polynomial in F[x] of degree less than n.
Proof: We could, of course, prove this by directly computing the various
powers of A and forming their linear combination. But there is a better way
out. Let $T: F^n \to F^n$ be the linear transformation obtained from A as in
Proposition (4.1). Then $a_0 + a_1 A + a_2 A^2 + \cdots + a_{n-1} A^{n-1} + A^n$ is the matrix
of the transformation $a_0 + a_1 T + a_2 T^2 + \cdots + a_{n-1} T^{n-1} + T^n$, by
Proposition (4.10). (Here $a_0$ really means $a_0$ times the identity transformation.) Let
us call this transformation S. We have to show that S is identically 0. Since
S is a linear transformation, it suffices to show that S vanishes on some
basis for $F^n$ (by Theorem (4.5)). We take the standard basis $\{e_1, e_2, \ldots, e_n\}$
for $F^n$. First consider $T(e_1)$. This is obtained by looking at the first column
of A and taking the linear combination of the $e_i$'s with coefficients coming
from this column. But the first column of A is $(0, 1, 0, \ldots, 0)^t$. So $T(e_1) = e_2$.
Similarly $T(e_2) = e_3$, $T(e_3) = e_4$, ..., $T(e_{n-1}) = e_n$. However $T(e_n)$ equals
$-a_0 e_1 - a_1 e_2 - \cdots - a_{n-1} e_n$. Now $T^2(e_1) = T(T(e_1)) = T(e_2) = e_3$,
$T^3(e_1) = T(T^2(e_1)) = T(e_3) = e_4$. In general $T^i(e_1) = e_{i+1}$ for $i = 1, \ldots, n-1$.
However, $T^n(e_1) = T(T^{n-1}(e_1)) = T(e_n) = -a_0 e_1 - a_1 e_2 - \cdots - a_{n-1} e_n$. It now
follows that
$$S(e_1) = (a_0 + a_1 T + \cdots + a_{n-1} T^{n-1} + T^n)(e_1) = a_0 e_1 + a_1 T(e_1) + a_2 T^2(e_1) + \cdots + a_{n-1} T^{n-1}(e_1) + T^n(e_1) = 0.$$
Similarly we can compute $S(e_2)$ and show that
it is 0. But there is a much better way. Note that $S(e_2) = S(T(e_1))$. But S
and T commute with each other, since S is a linear combination of powers
of T and T commutes with its own powers. So $S(T(e_1))$ is the same as
$T(S(e_1))$. But $S(e_1) = 0$ as was shown just now. So $T(S(e_1)) = 0$ and hence
$S(e_2) = 0$. Similarly $S(e_3) = 0$ because $S(e_3) = S(T(e_2)) = T(S(e_2)) = T(0) = 0$.
Continuing like this, we get $S(e_4) = 0$, ..., $S(e_n) = 0$. Thus S is identically 0.
This proves the first assertion, namely that T (and hence A) is a root of f(x).
For the second assertion, suppose $g(x) \in F[x]$ is a polynomial of degree
m, with m < n. Then we have to show that $g(A) \neq 0$ or equivalently $g(T) \neq 0$.
Let $g(x) = b_0 + b_1 x + \cdots + b_m x^m$, with $b_m \neq 0$ where $b_0, b_1, \ldots, b_m \in F$. Let
$U = b_0 + b_1 T + \cdots + b_m T^m$. To show that U is not identically 0, it suffices to
show that $U(e_1) \neq 0$. Using the computation done above,
$U(e_1) = b_0 e_1 + b_1 e_2 + \cdots + b_m e_{m+1}$ (we are using here that m < n). But the set
$\{e_1, \ldots, e_{m+1}\}$ is linearly independent. So $U(e_1) = 0$ would force $b_i = 0$ for all
$i = 0, \ldots, m$, contradicting that $b_m \neq 0$. So T and hence A cannot be a root of g(x). ∎
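Both assertions of Proposition (4.13) can be observed on a concrete example. The sketch below is illustrative only: the polynomial $2 - 3x + 5x^3 + x^4$ over the integers and the helper names `companion`, `mat_mul` and `evaluate` are our own choices. It evaluates a polynomial at a square matrix by Horner's rule, checks that f(A) is the zero matrix, and prints the first column of g(A) for a polynomial g of smaller degree, which by the proof above should list the coefficients of g.

```python
def companion(coeffs):
    """Companion matrix of a_0 + ... + a_{n-1} x^{n-1} + x^n, coeffs = [a_0..a_{n-1}]."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1
    for i in range(n):
        C[i][n - 1] = -coeffs[i]
    return C

def mat_mul(X, Y):
    """Product of two square matrices of the same size."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def evaluate(poly, A):
    """Evaluate poly[0]*I + poly[1]*A + poly[2]*A^2 + ... by Horner's rule."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(poly):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += c      # add c times the identity matrix
    return result

f = [2, -3, 0, 5]                  # f(x) = 2 - 3x + 0x^2 + 5x^3 + x^4
A = companion(f)
print(evaluate(f + [1], A))        # f(A): every entry should be 0

g = [4, 0, 7]                      # g(x) = 4 + 7x^2, of degree 2 < 4
first_column = [row[0] for row in evaluate(g, A)]
print(first_column)                # coordinates of g(T)(e_1)
```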
We are now ready to give the 'concrete' representation of the extension
field obtained from an irreducible polynomial.
4.14 Theorem: Let p(x) be a monic, irreducible polynomial of degree n
over a field F. Let A be its companion matrix. In the ring $M_n(F)$, let F(A)
be the subring generated by F and A (i.e., the smallest subring of $M_n(F)$
containing $F \cup \{A\}$). Then F(A) is a field, containing F as a sub-field.
Moreover, in this field, A is a root of p(x). Every element of F(A) is a unique
linear combination of $A^0, A^1, A^2, \ldots, A^{n-1}$, where $A^0 = I_n$ = the n x n identity
matrix.
Proof: The argument is essentially a duplication of the first part of the
proof of Theorem (3.21). Let I be the set of all polynomials in F[x] which
have A as a root in $M_n(F)$. Then I is an ideal of F[x]. (We need Exercise
(2.20) here, because of which if k(x) = f(x) g(x) in F[x], then
k(A) = f(A) g(A).) Now by the last proposition, p(x) is a polynomial of the least
possible degree among all non-zero polynomials in I. So, by the proof of
Theorem (2.4), the ideal I is generated by p(x). Here we are given that p(x)
is irreducible. In Theorem (3.21), we had to prove this. This is the only
point of difference. The rest of the proof is identical. Define $\theta: F[x] \to M_n(F)$
by $\theta(f(x)) = f(A)$. Then $\theta$ is a ring homomorphism with kernel I and range
F(A). So F(A) is isomorphic to the quotient ring F[x]/I which is a field.
Under this isomorphism, the element x + I of F[x]/I goes to A. From
Theorem (2.26), we already know that F[x]/I is an extension field of F and
x + I is a root of p(x). So the subring F(A) is a field, containing F, and A
is a root of p(x). Finally, for the last assertion, a typical element of F(A)
is of the form f(A) for some $f(x) \in F[x]$. (This f(x) need not be unique.) By
division algorithm in F[x], write f(x) as p(x) q(x) + r(x) where r(x) = 0 or
deg (r(x)) < n. In any case r(x) is a linear combination of $1, x, \ldots, x^{n-1}$. So
r(A) is some linear combination of $A^0, A^1, \ldots, A^{n-1}$. But
f(A) = p(A) q(A) + r(A) = 0 + r(A) = r(A). So f(A) is a linear combination of
$A^0, A^1, A^2, \ldots, A^{n-1}$.
As for uniqueness suppose
$b_0 A^0 + b_1 A^1 + \cdots + b_{n-1} A^{n-1} = c_0 A^0 + c_1 A^1 + \cdots + c_{n-1} A^{n-1}$.
Then A is a root of $(b_0 - c_0) + (b_1 - c_1) x + \cdots + (b_{n-1} - c_{n-1}) x^{n-1}$.
By the last proposition, this polynomial must be the zero polynomial.
So $b_i = c_i$ for all $i = 0, 1, \ldots, n - 1$. ∎
This theorem provides a handy method for constructing field extensions.
We take any irreducible polynomial of degree n (say) over a field F, convert
it to a monic polynomial, take its companion matrix A and take all
possible linear combinations of the powers of A, namely $A^0, A^1, A^2, \ldots, A^{n-1}$,
with coefficients from F. Unfortunately, there is no easy way, in general,
to tell whether a polynomial is irreducible or not. One sufficient condition
will be given in the exercises.
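The construction can be carried out by machine in a very small case. The sketch below is an illustration under our own choices: the field is $\mathbb{Z}_2$ and the polynomial is $1 + x + x^2$, which is irreducible over $\mathbb{Z}_2$ because it has degree 2 and neither 0 nor 1 is a root. All names in the code are hypothetical. It lists the four linear combinations $c_0 A^0 + c_1 A^1$ and checks that every non-zero one has an inverse among them.

```python
from itertools import product

P = 2                                  # arithmetic is done modulo 2
A = [[0, 1], [1, 1]]                   # companion matrix of 1 + x + x^2 modulo 2
I = [[1, 0], [0, 1]]                   # A^0, the identity matrix

def mat_mul(X, Y):
    """Product of two 2 x 2 matrices with entries reduced modulo P."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) % P for j in range(2)]
            for i in range(2)]

def combo(c0, c1):
    """The element c0*A^0 + c1*A^1 of F(A)."""
    return [[(c0 * I[i][j] + c1 * A[i][j]) % P for j in range(2)]
            for i in range(2)]

elements = [combo(c0, c1) for c0, c1 in product(range(P), repeat=2)]
nonzero = [X for X in elements if X != combo(0, 0)]

# Every non-zero element should have an inverse that is again in F(A).
has_inverse = all(any(mat_mul(X, Y) == I for Y in elements) for X in nonzero)
print(len(elements), has_inverse)
```

The result is a field with four elements realised as 2 x 2 matrices over $\mathbb{Z}_2$; the brute-force search for inverses is only feasible because the example is tiny.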
When we defined the matrix of a linear transformation $T: V \to W$ we
emphasised that it depends on what ordered bases we take for V and W
and that it would change if either of the two ordered bases is changed.
Let us now see the manner in which it changes. First we need a definition.
4.15 Definition: Let $(v_1, v_2, \ldots, v_n)$ and $(x_1, x_2, \ldots, x_n)$ be two ordered bases
for a vector space V. Then the matrix of the identity transformation
$1_V: V \to V$, where the domain V has the basis $(v_1, \ldots, v_n)$ and the codomain
V has the basis $(x_1, \ldots, x_n)$, is called the matrix of change of basis from
$(v_1, \ldots, v_n)$ to $(x_1, \ldots, x_n)$.
This is actually a special case of Definition (4.8) and therefore will not
be much elaborated. It is a square matrix whose jth column consists of
the coefficients of $v_j$ when it is expressed as a linear combination of
$x_1, x_2, \ldots, x_n$. The name 'change of basis matrix' may be justified as follows.
Call this matrix A. Suppose an element of V is expressed as $\sum_{i=1}^{n} \lambda_i v_i$ and
also as $\sum_{i=1}^{n} \mu_i x_i$ where the $\lambda$'s and $\mu$'s are in F. Then the relationship between
the $\lambda$'s and $\mu$'s is given by
$$\begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix} = A \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix}.$$
Thus if we know the 'components' of a vector w.r.t. the 'old' basis
then we can easily determine its components w.r.t. the 'new' basis.
If we interchange the two ordered bases we have the following expected
result.
4.16 Proposition: Given two ordered bases $(v_1, \ldots, v_n)$ and $(x_1, \ldots, x_n)$ for
V, the matrix of change of basis from $(v_1, \ldots, v_n)$ to $(x_1, \ldots, x_n)$ is the inverse
of the matrix of change of basis from $(x_1, \ldots, x_n)$ to $(v_1, \ldots, v_n)$.
Proof: Let A and B denote these two matrices respectively. Then by
Proposition (4.10), BA is the matrix of the identity transformation
$1_V: V \to V$ w.r.t. the ordered basis $(v_1, \ldots, v_n)$ for both the domain and the
codomain. So BA must be the identity matrix $I_n$. Similarly, AB, being the
matrix of $1_V: V \to V$ w.r.t. the ordered basis $(x_1, \ldots, x_n)$ for the domain as
well as codomain, equals $I_n$. So A and B are inverses of each other. ∎
We are now ready to study the effect of change of bases in general.
4.17 Proposition: Let $T: V \to W$ be a linear transformation. Let A be the
matrix of T w.r.t. the ordered bases $(v_1, \ldots, v_n)$ for V and $(w_1, \ldots, w_m)$ for
W. Let B be the matrix of T w.r.t. the ordered bases $(x_1, \ldots, x_n)$ for V and
$(y_1, \ldots, y_m)$ for W. Then $B = DAC^{-1}$ where C is the matrix of change of
basis from $(v_1, \ldots, v_n)$ to $(x_1, \ldots, x_n)$ and D is the matrix of change of basis
from $(w_1, \ldots, w_m)$ to $(y_1, \ldots, y_m)$.
Proof: The proof is short despite the prolix statement of the result.
Consider the following commutative diagram where near each vector
space we write an ordered basis for it and for each arrow, on one side we
write the linear transformation it represents and on the other side the
matrix of this linear transformation w.r.t. the given bases.
$$\begin{array}{ccccc}
(v_1, \ldots, v_n) & V & \xrightarrow[\;A\;]{\;T\;} & W & (w_1, \ldots, w_m) \\
 & {\scriptstyle 1_V} \big\downarrow {\scriptstyle C} & & {\scriptstyle 1_W} \big\downarrow {\scriptstyle D} & \\
(x_1, \ldots, x_n) & V & \xrightarrow[\;B\;]{\;T\;} & W & (y_1, \ldots, y_m)
\end{array}$$
Commutativity of this diagram, along with Proposition (4.10), implies
BC = DA. But by the last proposition, C is invertible. So $B = DAC^{-1}$. ∎
When we represent a linear transformation from a vector space into
itself by a matrix, almost invariably the same ordered basis is chosen both
for the domain and the codomain (except for some theoretical purposes
such as above). In this case the preceding proposition reduces to the
following.
4.18 Proposition: Let $T: V \to V$ be a linear transformation. Let A be
the matrix of T w.r.t. an ordered basis $(v_1, \ldots, v_n)$ (which is understood to
be both for the domain and the codomain). Let B be the matrix of T w.r.t.
the ordered basis $(x_1, \ldots, x_n)$. Then $B = CAC^{-1}$, where C is the matrix of
change of basis from $(v_1, \ldots, v_n)$ to $(x_1, \ldots, x_n)$.
Proof: We merely take W = V, m = n, $(w_1, \ldots, w_m) = (v_1, \ldots, v_n)$ and
$(y_1, \ldots, y_m) = (x_1, \ldots, x_n)$ in the last proposition. Then D = C and so
$B = CAC^{-1}$. ∎
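The relation $B = CAC^{-1}$ can be verified on a small example. The Python sketch below is an illustration only: the linear map `T` on $\mathbb{R}^2$ and the two ordered bases are arbitrary choices of ours, and the helper names are hypothetical. Matrices are computed column by column as in Definition (4.8), and the inverse of C is obtained, as Proposition (4.16) allows, as the matrix of change of basis in the opposite direction.

```python
from fractions import Fraction

def coords(basis, v):
    """Coordinates of v w.r.t. an ordered basis (b1, b2) of R^2, by Cramer's rule."""
    (p, q), (r, s) = basis
    det = Fraction(p * s - q * r)
    return [(v[0] * s - v[1] * r) / det, (p * v[1] - q * v[0]) / det]

def matrix_wrt(f, dom, cod):
    """Matrix of f w.r.t. ordered bases: k-th column = coordinates of f(dom[k])."""
    cols = [coords(cod, f(b)) for b in dom]
    return [list(row) for row in zip(*cols)]

def mat_mul(X, Y):
    """Product of two 2 x 2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def T(v):                                # an arbitrary linear map, for illustration
    return (3 * v[0] + v[1], v[0] - 2 * v[1])

def identity(v):
    return v

old = [(1, 2), (-1, 1)]                  # plays the role of (v_1, v_2)
new = [(2, 1), (1, 1)]                   # plays the role of (x_1, x_2)

A = matrix_wrt(T, old, old)
B = matrix_wrt(T, new, new)
C = matrix_wrt(identity, old, new)       # change of basis from old to new
C_inv = matrix_wrt(identity, new, old)   # change of basis from new to old

print(B == mat_mul(mat_mul(C, A), C_inv))
```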
For many purposes, the matrix C in the equation $B = CAC^{-1}$ is not
very important. What matters is that the matrices A and B are related to
each other like this by some invertible matrix C. We have encountered
this concept earlier in group theory, where two elements, say x and y, are
said to be conjugate to each other if there exists some g such that $y = gxg^{-1}$
(see Definition (5.1.14)). There is no reason why the same term cannot be
used to describe the analogous relationship for matrices. But a different
name is already too standard to be changed. We give it below:
4.19 Definition: Given n x n matrices A, B over a field F, we say B is
similar to A if there exists an invertible n x n matrix C such that
$B = CAC^{-1}$.
Using the same reasoning as in Proposition (5.1.16) it follows that
similarity is an equivalence relation on $M_n(F)$, the set of all n x n matrices
over F. Proposition (4.18) shows that although a change of the ordered
basis for V changes the matrix of a linear transformation from V to itself,
this variation is confined to the similarity class of the original matrix. One
of the major problems in linear algebra is to find, for a given linear
transformation $T: V \to V$, a suitable ordered basis for V w.r.t. which the matrix
of T will be of some simple, or canonical, form. In the language of
matrices, the problem amounts to finding, for a given square matrix A over a
field F, a matrix B which is similar to A and which is in a canonical form.
The search for such canonical forms leads to many interesting developments
involving the properties not only of the matrix A but also of the field F.
But we shall not pursue them.
An attribute of square matrices which is invariant under similarity is
called a similarity invariant. For example, invertibility is a similarity
invariant. For suppose A, B are similar matrices and A is invertible. Then
$B = CAC^{-1}$ for some invertible matrix C. By the general properties of inverses
under an associative binary operation (Proposition (3.4.7)), it follows that B is
invertible with $CA^{-1}C^{-1}$ as its inverse. When a property of matrices is
invariant under similarity, it can be defined unambiguously for a linear
transformation of a finite dimensional vector space into itself. If $T: V \to V$
is such a linear transformation, we choose any ordered basis for V and let
A be the matrix of T w.r.t. this ordered basis. If we had chosen a different
basis the matrix would possibly be different, say B. Still, A and B are
similar by Proposition (4.18). So, since the property in question is a
similarity invariant, either both A and B have it or else neither does. Accordingly
we say that T has or lacks that property.
Among the most important similarity invariants of a square matrix are
its eigenvalues, determinant, characteristic polynomial and trace. Of these,
we shall discuss only the determinant. In the exercises we shall briefly
mention eigenvalues and trace. (Actually, an eigenvalue is the basic concept.
The others can be defined in terms of eigenvalues. But we shall not go into
this.)
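The reader has undoubtedly studied determinants of real numbers of
order 2 and 3. (A determinant of order 1, $|a|$, is trivially the number a
itself.) A 2 x 2 determinant $\begin{vmatrix} a & b \\ c & d \end{vmatrix}$ is defined as $ad - bc$. A typical 3 x 3
determinant has the form
$$D = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
+ a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}$$
which comes out to be the sum of six terms
$$a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}.$$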
The theory of n x n determinants which we shall develop will be a
straightforward extension of this. But there is a minor conceptual difference.
A determinant, as defined above, is a number which is expressed in a
peculiar way, namely, as a square array of several numbers. For our purpose,
on the other hand, the determinant will be a function defined on the set of
square matrices, abbreviated as det. For example, the determinant D above
will be written as det (A) where A is the 3 x 3 matrix
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
Thus, the determinant of a square matrix of real numbers (say) is a real
number, but the determinant itself will be a function.
As a first measure of generalisation, we replace the field of real numbers
by a commutative ring R with identity. Many of the results about
determinants hold for this case too. A few results, however, require that R be a
field. Till we encounter such results, we shall assume we are dealing with
matrices over a commutative ring with identity.
Now comes the crucial question. How to define det (A) when A is an
n x n matrix if n > 3? For possible answers we look at the definition of a
3 x 3 determinant above. In the first expression it was written as a linear
combination of three 2 x 2 determinants, with coefficients coming from
the first row, with alternating + and - signs. A similar approach is possible
for any n. Thus, given an n x n matrix $A = (a_{ij})$, we consider its (n - 1) x (n - 1)
submatrices obtained by removing the first row and one of the columns.
Let $A_j$ be the submatrix obtained by removing the first row and the jth
column of A, $j = 1, \ldots, n$. Then define det (A) as $\sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det(A_j)$.
This is known as the inductive definition of the determinant. Its main
advantage is its simplicity. Also certain properties of determinants, where
an inductive argument is needed, are more amenable with this definition.
Its disadvantage is that the first row is given a special role, when there is
in fact nothing special about any row in particular. Also it is not clear that
the determinants of a matrix and its transpose are equal. We shall therefore
adopt another definition, which is based on the permutation group $S_n$ of n
symbols, which was studied in detail in Section 4 of the last chapter. Later
we shall show that our definition is equivalent to the inductive definition.
Of course, our definition too has its disadvantages, one of which is that it
requires the knowledge of $S_n$. So ultimately it is a matter of taste as to
which definition one prefers.
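The inductive definition translates almost word for word into a recursive procedure. The following Python sketch is illustrative; the function name `det_inductive` and the list-of-rows representation are our own. Columns are numbered from 0 in the code, so the sign $(-1)^{1+j}$ of the text becomes `(-1) ** j`.

```python
def det_inductive(A):
    """Determinant by expansion along the first row (the inductive definition)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Submatrix A_j: delete the first row and the j-th column.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det_inductive(minor)
    return total

print(det_inductive([[2, 0, 1], [1, 3, -1], [0, 5, 4]]))
```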
Let us take a closer look at the determinant D of the 3 x 3 matrix
$A = (a_{ij})$ given above. It consists of 6 terms, three of which are with a
+ sign and 3 with a - sign. Moreover, each term is a product of 3 factors
each of which is an entry of the matrix A. These entries are such that in each
term there is precisely one entry from each row and each column. So every
term is of the form $a_{1x} a_{2y} a_{3z}$, where x y z is a permutation of $\{1, 2, 3\}$. If we
denote this permutation by $\sigma$, we can write the term as $a_{1\sigma(1)} a_{2\sigma(2)} a_{3\sigma(3)}$. We
see further that the term appears with a + or - sign according as the
permutation $\sigma$ is even or odd. For example, if $\sigma = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}$ then $\sigma$ is even and
the term $a_{12} a_{23} a_{31}$ indeed appears with a + sign, while if $\tau = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}$
then $\tau$ is odd and the term $a_{13} a_{22} a_{31}$ appears with a negative sign. Thus, we see
that for the 3 x 3 matrix $A = (a_{ij})$, det (A) equals
$$\sum_{\sigma \in S_3} (-1)^{\sigma} a_{1\sigma(1)} a_{2\sigma(2)} a_{3\sigma(3)}$$
where $(-1)^{\sigma} = 1$ or $-1$ according as $\sigma$ is even or odd. The case of the
2 x 2 determinant is similar and in fact simpler.
In $S_2$ there are only two permutations, one of which is even and the
other is odd. So $\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$ also equals $\sum_{\sigma \in S_2} (-1)^{\sigma} a_{1\sigma(1)} a_{2\sigma(2)}$.
It is now clear how to define the determinant of any n x n matrix over
R.
4.20 Definition: Let A be an n x n matrix over a commutative ring R
with identity. Then the determinant of A (denoted by det (A), d(A) or by
$|A|$), is defined as the element $\sum_{\sigma \in S_n} (-1)^{\sigma} a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)}$ of R, where
$(-1)^{\sigma} = 1$ or $-1$ according as the permutation $\sigma$ is even or odd.
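Definition (4.20) can also be turned directly into a (very inefficient) program. The sketch below is an illustration with names of our own choosing. One assumption should be flagged: the sign of a permutation is computed by counting inversions, relying on the standard fact that the parity of the number of inversions agrees with the parity defined through transpositions.

```python
from itertools import permutations

def sign(perm):
    """+1 for an even permutation, -1 for an odd one, by counting inversions."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(A):
    """Sum over all permutations s of sign(s) * a[0][s(0)] * ... * a[n-1][s(n-1)]."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total

M = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
print(det(M))
```

The sum has n! terms, so this is useful only for checking small cases, not as a practical method of evaluation.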
It is obvious that in proving properties of determinants, the group $S_n$
will figure many times. As a typical example, we prove that A and its
transpose have the same determinant.
4.21 Proposition: Let B be the transpose of an n x n matrix A over R.
Then det (B) = det (A).
Proof: Let $A = (a_{ij})$. Then $B = (b_{ij})$ where $b_{ij} = a_{ji}$ for $i = 1, \ldots, n$; $j = 1, \ldots, n$.
By definition,
$$\det(B) = \sum_{\sigma \in S_n} (-1)^{\sigma} b_{1\sigma(1)} \cdots b_{n\sigma(n)} = \sum_{\sigma \in S_n} (-1)^{\sigma} a_{\sigma(1)1} a_{\sigma(2)2} \cdots a_{\sigma(n)n}.$$
Now $\sigma$ is a permutation of $\{1, \ldots, n\}$. So as i varies from 1 to
n, so does $\sigma(i)$, except possibly in a different order. Also the ring R is
commutative. So we can write $a_{\sigma(1)1} a_{\sigma(2)2} \cdots a_{\sigma(n)n}$ as $a_{1\tau(1)} a_{2\tau(2)} \cdots a_{n\tau(n)}$ where
$\tau = \sigma^{-1}$. Note further that as $\sigma$ ranges over $S_n$ so does $\tau$. Moreover $\sigma$ and $\tau$
have the same parity. Therefore det (B) equals $\sum_{\tau \in S_n} (-1)^{\tau} a_{1\tau(1)} a_{2\tau(2)} \cdots a_{n\tau(n)}$,
which, by definition, is det (A). ∎
Because of this proposition, whenever we prove a result regarding the
rows of a determinant, the corresponding result for columns also holds. The
technique in the proof above is noteworthy. The variable $\sigma$ is a dummy ranging
over $S_n$. If we replace it by any variable $\tau$ which also ranges over $S_n$, the sum
will be unchanged. The choice of $\tau$ is made as a function of $\sigma$; in the last
proof $\tau$ was $\sigma^{-1}$. Of course the function taking $\sigma$ to $\tau$ must be one-to-one and
onto, that is, it must be a permutation of $S_n$. It is this technique of change
of variable of summation (along with properties of the permutation groups)
that makes our definition of determinant easy to manipulate despite its
gigantic size. (Note that there are n! terms, each term being a product of n
elements.) As another illustration of this technique, we prove the following
result which tells what happens when two rows (or columns) are
interchanged.
4.22 Proposition: Let B be a matrix obtained from an n x n matrix A by
interchanging two rows (or two columns) of A. Then det (B) = - det (A).
Proof: In view of the last proposition we prove the result only for rows.
Let B be obtained from A by interchanging the rth row with the sth row
where r < s. Then $b_{ij} = a_{ij}$ for all j and for all $i \neq r, s$, and $b_{rj} = a_{sj}$,
$b_{sj} = a_{rj}$ for all j. Now
$$\det(B) = \sum_{\sigma \in S_n} (-1)^{\sigma} b_{1\sigma(1)} \cdots b_{r\sigma(r)} \cdots b_{s\sigma(s)} \cdots b_{n\sigma(n)}
= \sum_{\sigma \in S_n} (-1)^{\sigma} a_{1\sigma(1)} \cdots a_{s\sigma(r)} \cdots a_{r\sigma(s)} \cdots a_{n\sigma(n)}.$$
Let $\theta$ be the transposition (r s),
and let $\tau = \sigma \circ \theta$ for $\sigma \in S_n$. Then as $\sigma$ ranges over $S_n$, so does $\tau$. However,
$\sigma$ and $\tau$ are always of opposite parity. So $(-1)^{\sigma} = -(-1)^{\tau}$. Now, in the
expression for det (B) obtained above, $a_{1\sigma(1)} \cdots a_{s\sigma(r)} \cdots a_{r\sigma(s)} \cdots a_{n\sigma(n)}$ is nothing but
$a_{1\tau(1)} \cdots a_{r\tau(r)} \cdots a_{s\tau(s)} \cdots a_{n\tau(n)}$. So det (B) equals $-\sum_{\tau \in S_n} (-1)^{\tau} a_{1\tau(1)} \cdots a_{n\tau(n)}$.
Thus det (B) = - det (A). ∎
4.23 Corollary: Suppose B is obtained from A by a permutation of
rows. Let $\theta$ be this permutation. Then det (B) = det (A) if $\theta$ is even and
det (B) = - det (A) if $\theta$ is odd. (Similarly for permutation of columns.)
Proof: By Theorem (5.4.5), express $\theta$ as $\theta_1 \theta_2 \cdots \theta_k$ (say) where each $\theta_i$ is a
transposition. Now the permutation $\theta$ of rows of A can be effected by
applying, in succession, the interchanges of rows given by $\theta_k, \theta_{k-1}, \ldots, \theta_2, \theta_1$
to A. Every time the sign of the determinant changes. So $\det(B) = (-1)^k \det(A)$.
But k is even or odd according as $\theta$ is even or odd. Hence the
result. ∎
Let us now prove the well-known result that if two rows (or columns)
of a matrix A are identical then det (A) = 0. Suppose the rth and sth rows
of A are identical. The familiar proof is by interchanging these two rows.
Then the new matrix is also A. So by Proposition (4.22), det (A) = - det (A).
But we have to be wary in concluding from this that det (A) = 0. This is
valid for matrices over $\mathbb{R}$, and more generally over any commutative ring
R in which 2x = 0 implies x = 0. But not all rings are of this type. In fact,
in a Boolean ring, and also in any integral domain of characteristic 2,
x = -x for all x. So we give a different argument which is applicable for all
cases.
4.24 Proposition: If two rows (or columns) of a square matrix are
identical then its determinant is 0.
Proof: Let A be an n x n matrix whose rth and sth rows are identical with
r < s. Then $a_{rj} = a_{sj}$ for all $j = 1, 2, \ldots, n$. Then,
$$\det(A) = \sum_{\sigma \in S_n} (-1)^{\sigma} a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)}.$$
As before, let $\theta$ be the transposition (r s), and for $\sigma \in S_n$, let $\tau = \sigma \circ \theta$. Then
$a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)}$ is the same as $a_{1\tau(1)} a_{2\tau(2)} \cdots a_{n\tau(n)}$
because $a_{r\sigma(r)} = a_{s\sigma(r)} = a_{s\tau(s)}$ and similarly $a_{s\sigma(s)} = a_{r\sigma(s)} = a_{r\tau(r)}$, while for
$i \neq r, s$, $\sigma(i) = \tau(i)$. As noted above, $\sigma$ and $\tau$ have opposite parity. So
$(-1)^{\sigma} = -(-1)^{\tau}$. Thus we see that in the summation above every term appears
twice, but with opposite signs. (It should be noted that if we further
convert $\tau$ to $\tau \circ \theta$ then we get back $\sigma$. So these two terms cancel each other.)
Hence det (A) = 0. ∎
This proposition is useful in evaluation of determinants. Its utility is
enhanced by combining it with another property of determinants, called its
linearity in each row (or column). To state what it means, let $u_1, u_2, \ldots, u_n$
be the rows of an $n \times n$ matrix $A$ over $R$. Each $u_i$ is an ordered $n$-tuple,
$(a_{i1}, a_{i2}, \ldots, a_{in})$ and hence is an element of $R^n$. Since we are not yet
assuming that $R$ is a field, $R^n$ is not quite a vector space. Still the concept of a
linear combination is meaningful. So, suppose $u_i = \lambda v_i + \mu w_i$ for some
$\lambda, \mu \in R$, where $v_i, w_i \in R^n$, say $v_i = (b_{i1}, b_{i2}, \ldots, b_{in})$ and $w_i = (c_{i1}, c_{i2}, \ldots, c_{in})$.
This means, for each $j = 1, \ldots, n$, $a_{ij} = \lambda b_{ij} + \mu c_{ij}$. Now let us keep all
other rows of $A$ unchanged and form two matrices $B$ and $C$, where the rows
of $B$ are $u_1, \ldots, u_{i-1}, v_i, u_{i+1}, \ldots, u_n$ and those of $C$ are
$u_1, \ldots, u_{i-1}, w_i, u_{i+1}, \ldots, u_n$.
The following proposition relates the determinants of A, B and C.
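4.25 Proposition: With the notation above,
$$\det(A) = \lambda \det(B) + \mu \det(C).$$
Proof: Once the statement is understood, its proof is very simple. By
definition,
$$\det(A) = \sum_{\sigma \in S_n} (-1)^{\sigma}\, a_{1\sigma(1)} \cdots a_{i-1,\sigma(i-1)}\, a_{i\sigma(i)}\, a_{i+1,\sigma(i+1)} \cdots a_{n\sigma(n)}$$
$$= \sum_{\sigma \in S_n} (-1)^{\sigma}\, a_{1\sigma(1)} \cdots a_{i-1,\sigma(i-1)}\, (\lambda b_{i\sigma(i)} + \mu c_{i\sigma(i)})\, a_{i+1,\sigma(i+1)} \cdots a_{n\sigma(n)}$$
$$= \lambda \sum_{\sigma \in S_n} (-1)^{\sigma}\, a_{1\sigma(1)} \cdots b_{i\sigma(i)} \cdots a_{n\sigma(n)} + \mu \sum_{\sigma \in S_n} (-1)^{\sigma}\, a_{1\sigma(1)} \cdots c_{i\sigma(i)} \cdots a_{n\sigma(n)}$$
$$= \lambda \det(B) + \mu \det(C). \ ∎$$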
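As a quick sanity check of linearity in a single row, here is a small sketch over the integers. The helper `det` and the particular vectors are assumptions chosen for illustration; only the identity $\det(A) = \lambda \det(B) + \mu \det(C)$ comes from the proposition.

```python
from itertools import permutations

def det(a):
    """Determinant via the permutation-sum definition (fine for small n)."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[p] > perm[q] for p in range(n) for q in range(p + 1, n))
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total

lam, mu = 3, -2
v = [1, 4, 2]
w = [0, 5, -1]
u_i = [lam * x + mu * y for x, y in zip(v, w)]  # the row that is a linear combination

first_row, last_row = [2, 0, 1], [1, 1, 1]      # the rows left unchanged
A = [first_row, u_i, last_row]
B = [first_row, v, last_row]
C = [first_row, w, last_row]

lhs = det(A)
rhs = lam * det(B) + mu * det(C)
```

Here `lhs` and `rhs` agree, as the proposition predicts for the middle row.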
The last two propositions are the basis of one of the most frequently
used methods of simplifying a determinant, namely adding a multiple of
some row (or column) to another row (or column) (Exercise (4.14)).
We may very well represent an $n \times n$ matrix $A$ as an ordered $n$-tuple of
its rows, $(u_1, u_2, \ldots, u_n)$ where $u_i$ is the $i$th row of $A$. (Of course, $u_i$ itself is
an ordered $n$-tuple of elements of $R$.) We may therefore regard the
determinant of an $n \times n$ matrix as a function of $n$ variables $u_1, \ldots, u_n$, each
ranging over $R^n$. For $\det(A)$, we write $\det(u_1, u_2, \ldots, u_n)$. The preceding result
then says that the det function is separately linear in each variable, or in
the language of Exercise (3.25), it is a multilinear function. When more than
one row is expressed as a linear combination of row vectors, we can reduce
it, applying Proposition (4.25) to only one row at a time. For example,
suppose $u_1 = \lambda_1 v_1 + \mu_1 w_1$ and $u_2 = \lambda_2 v_2 + \mu_2 w_2$. Then
$$\det(u_1, u_2, u_3, \ldots, u_n) = \det(\lambda_1 v_1 + \mu_1 w_1, u_2, u_3, \ldots, u_n)$$
$$= \lambda_1 \det(v_1, u_2, u_3, \ldots, u_n) + \mu_1 \det(w_1, u_2, u_3, \ldots, u_n)$$
which further reduces to
$$\lambda_1\lambda_2 \det(v_1, v_2, u_3, \ldots, u_n) + \lambda_1\mu_2 \det(v_1, w_2, u_3, \ldots, u_n)$$
$$+\ \mu_1\lambda_2 \det(w_1, v_2, u_3, \ldots, u_n) + \mu_1\mu_2 \det(w_1, w_2, u_3, \ldots, u_n).$$
Repeated applications of multilinearity, coupled with earlier results
about determinants, yield the following result about the determinant of the
product of two matrices.
4.26 Theorem: If $A$, $B$ are square matrices of order $n$, then
$\det(AB) = \det(A) \det(B)$.
Proof: Let $A = (a_{ij})$, $B = (b_{ij})$. Denote $AB$ by $C = (c_{ij})$. Let $v_1, \ldots, v_n$ be
the rows of $B$ and $w_1, \ldots, w_n$ be the rows of $C$. From the definition of matrix
multiplication, it is easily seen that each $w_i$ is a linear combination of the
$v$'s with coefficients coming from the $i$th row of $A$. Specifically,
$w_i = \sum_{j=1}^{n} a_{ij} v_j$. So
$$\det(AB) = \det(C) = \det\Big(\sum_{j=1}^{n} a_{1j} v_j,\ \sum_{j=1}^{n} a_{2j} v_j,\ \ldots,\ \sum_{j=1}^{n} a_{nj} v_j\Big).$$
Since each variable is a linear combination of $n$ rows and there are $n$
such variables, if we apply the last proposition, $\det(C)$ will be a linear
combination of $n^n$ determinants in all, formed from the various $v$'s. This
looks like too large a number to manipulate. But most of these $n^n$
determinants vanish. A typical term in the expression for $\det(C)$ will be
$$a_{1j_1} a_{2j_2} \cdots a_{nj_n} \det(v_{j_1}, v_{j_2}, \ldots, v_{j_n}),$$
where each of $j_1, \ldots, j_n$ varies from 1 to $n$, independently of the others.
Whenever $j_r = j_s$ for some $r \neq s$, this determinant is 0 by Proposition (4.24). So
we have to consider only those terms where $j_1, \ldots, j_n$ are all distinct. This
means $(j_1, j_2, \ldots, j_n)$ is a permutation of $(1, \ldots, n)$. Thus there will be only $n!$
terms which appear in the expression for $\det(C)$ as a linear combination.
In a slightly different notation,
$$\det(C) = \sum_{\sigma \in S_n} a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)} \det(v_{\sigma(1)}, v_{\sigma(2)}, \ldots, v_{\sigma(n)}).$$
But by Proposition (4.23),
$$\det(v_{\sigma(1)}, v_{\sigma(2)}, \ldots, v_{\sigma(n)}) = (-1)^{\sigma} \det(v_1, \ldots, v_n) = (-1)^{\sigma} \det(B).$$
So
$$\det(C) = \det(B) \sum_{\sigma \in S_n} (-1)^{\sigma} a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)},$$
which equals $\det(A) \det(B)$ as desired. ∎
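The counting step in this proof, and the theorem itself, can be illustrated on a small case. This is only a sketch under stated assumptions: `det` and `matmul` are illustrative helpers and the two integer matrices are arbitrary. For $n = 3$ it lists all $n^n$ index tuples $(j_1, \ldots, j_n)$ and keeps those with distinct entries, which are exactly the $n!$ permutations.

```python
from itertools import permutations, product

def det(a):
    """Determinant via the permutation-sum definition (fine for small n)."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[p] > perm[q] for p in range(n) for q in range(p + 1, n))
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total

def matmul(a, b):
    """Product of two square matrices of the same order."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[2, 1, 0], [1, 3, -1], [0, 4, 5]]
B = [[1, 0, 2], [-1, 1, 1], [3, 2, 0]]

n = 3
all_index_tuples = list(product(range(n), repeat=n))                      # n**n candidate terms
surviving = [js for js in all_index_tuples if len(set(js)) == n]          # no repeated row of B

product_rule_holds = det(matmul(A, B)) == det(A) * det(B)
```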
An important generalisation of this theorem will be given as an exercise.
It is proved by an analogous argument.
This result can also be expressed by saying that the determinant function
is a monoid homomorphism from the monoid $M_n(R)$ to the monoid $R$ (both
under multiplication). Note, however, that the determinant is not a ring
homomorphism; $\det(A + B)$ is generally different from $\det(A) + \det(B)$. Each
row of $A + B$ is the sum of the corresponding rows of $A$ and $B$. So if we
apply Proposition (4.25), $\det(A + B)$ will be a sum of $2^n$ determinants, of
which $\det(A)$ and $\det(B)$ will be only two terms. Still, the fact that the
determinant preserves multiplication has important consequences, as the
following two corollaries show.
4.27 Corollary: If a square matrix $A$ is invertible so is its determinant
(as an element of the ring $R$). Also $\det(A^{-1}) = [\det(A)]^{-1}$.
Proof: Let $A$ be an $n \times n$ matrix over $R$. Suppose $B \in M_n(R)$ is an inverse
of $A$. Then $AB = I_n$ where $I_n$ is the identity matrix of order $n$. Then by the
last theorem, $\det(A) \det(B) = \det(I_n)$. But $\det(I_n)$ is evidently 1. So $\det(B)$
is the inverse of $\det(A)$ in $R$. (Note that $R$ is commutative. However, $M_n(R)$
need not be commutative. The argument here is valid even for a one-sided
inverse.) ∎
4.28 Corollary: Similar matrices have the same determinant. In other
words, the determinant is a similarity invariant.
Proof: Suppose $A$, $B$, $C$ are $n \times n$ matrices with $B = CAC^{-1}$. Then by
Theorem (4.26) and the last corollary, $\det(B) = [\det(C)]^{-1} \det(A) \det(C)
= \det(A)$ since $R$ is commutative. ∎
The converse of this corollary is false. For example, the $2 \times 2$ matrices
$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ have the same determinant but are not similar
(since $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ commutes with every $2 \times 2$ matrix, it cannot be similar to
any other matrix). However, the converse of Corollary (4.27) is true. That
is, if $\det(A)$ has an inverse in $R$, then $A$ is invertible. To prove this, we
shall first prove a result, which will also show that our definition of a
determinant of an $n \times n$ matrix is equivalent to the inductive definition. First we
define the gadgets needed in ‘expanding a determinant w.r.t. a particular
row’.
4.29 Definition: Let $A = (a_{ij})$ be an $n \times n$ matrix. Then the determinant
of the $(n-1) \times (n-1)$ submatrix of $A$ obtained by removing the $i$th row
and $j$th column of $A$ is called the $(i, j)$th minor of $A$ and is denoted by
$M_{ij}$. The element $(-1)^{i+j} M_{ij}$ is called the $(i, j)$th cofactor of $A$ and will be
denoted by $C_{ij}$.
In particular if $i = 1$, then $C_{1j}$ is the coefficient of $a_{1j}$ in the inductive
definition of the determinant. So the equivalence of the two definitions will
follow as a consequence of the following theorem.
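4.30 Theorem: For any $r = 1, \ldots, n$, $\det(A) = \sum_{j=1}^{n} a_{rj} C_{rj}$. Similarly,
$\det(A) = \sum_{i=1}^{n} a_{ir} C_{ir}$.
Proof: For $j = 1, \ldots, n$, let $T_j = \{\sigma \in S_n : \sigma(r) = j\}$. Then each $T_j$ has
$(n-1)!$ elements, $T_j \cap T_k = \emptyset$ for $j \neq k$ and $S_n = \bigcup_{j=1}^{n} T_j$. Hence
$$\det(A) = \sum_{\sigma \in S_n} (-1)^{\sigma} a_{1\sigma(1)} \cdots a_{n\sigma(n)} = \sum_{j=1}^{n} \sum_{\sigma \in T_j} (-1)^{\sigma} a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)}$$
$$= \sum_{j=1}^{n} a_{rj} \sum_{\sigma \in T_j} (-1)^{\sigma} a_{1\sigma(1)} \cdots a_{r-1,\sigma(r-1)}\, a_{r+1,\sigma(r+1)} \cdots a_{n\sigma(n)}.$$
So the proof of the first assertion will be complete if we show that for
each $j$,
$$C_{rj} = \sum_{\sigma \in T_j} (-1)^{\sigma} a_{1\sigma(1)} \cdots a_{r-1,\sigma(r-1)}\, a_{r+1,\sigma(r+1)} \cdots a_{n\sigma(n)} \qquad (1)$$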
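Let $B$ be the $(n-1) \times (n-1)$ submatrix of $A$ obtained by removing the $r$th
row and $j$th column. Note that for $i > r$, the $i$th row of $A$ becomes the
$(i-1)$th row of $B$ (with deletion of one element, namely $a_{ij}$) and similarly for
$k > j$, the $k$th column of $A$ becomes the $(k-1)$th column of $B$. In other
words, for $i = 1, \ldots, n-1$ and $k = 1, 2, \ldots, n-1$, the $(i, k)$th element $b_{ik}$
of $B$ is given by
$$b_{ik} = \begin{cases} a_{ik} & \text{if } i < r \text{ and } k < j \\ a_{i+1,k} & \text{if } i \ge r \text{ and } k < j \\ a_{i,k+1} & \text{if } i < r \text{ and } k \ge j \\ a_{i+1,k+1} & \text{if } i \ge r \text{ and } k \ge j \end{cases} \qquad (2)$$
Now, by definition,
$$C_{rj} = (-1)^{r+j} \det(B) = (-1)^{r+j} \sum_{\tau \in S_{n-1}} (-1)^{\tau}\, b_{1\tau(1)} b_{2\tau(2)} \cdots b_{n-1,\tau(n-1)}.$$
So, in (1) both sides are summations of $(n-1)!$ terms each. The result will
be established if we show that the terms on the two sides match one by one.
For this define $f : T_j \to S_{n-1}$ by $f(\sigma) = \tau$ for $\sigma \in T_j$, where
$\tau : \{1, 2, \ldots, n-1\} \to \{1, 2, \ldots, n-1\}$ is defined by
$$\tau(i) = \begin{cases} \sigma(i) & \text{if } i < r \text{ and } \sigma(i) < j \\ \sigma(i+1) & \text{if } i \ge r \text{ and } \sigma(i+1) < j \\ \sigma(i) - 1 & \text{if } i < r \text{ and } \sigma(i) > j \\ \sigma(i+1) - 1 & \text{if } i \ge r \text{ and } \sigma(i+1) > j \end{cases} \qquad (3)$$
$\tau$ is essentially the same permutation as $\sigma$, except that indices greater than
$r$ in the domain and those greater than $j$ in the codomain are lowered by 1. For
example, if $n = 9$, $r = 7$, $j = 3$ and
$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 2 & 9 & 1 & 4 & 5 & 8 & 3 & 7 & 6 \end{pmatrix}$$
then
$$\tau = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 2 & 8 & 1 & 3 & 4 & 7 & 6 & 5 \end{pmatrix}.$$
Clearly the function $f$ taking $\sigma$ to $\tau$ is a bijection. Also from (2) and (3) it
follows that for every $\sigma \in T_j$,
$a_{1\sigma(1)} \cdots a_{r-1,\sigma(r-1)}\, a_{r+1,\sigma(r+1)} \cdots a_{n\sigma(n)}$ equals $b_{1\tau(1)} b_{2\tau(2)} \cdots b_{n-1,\tau(n-1)}$. Thus the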
terms on the two sides of (1) are the same except possibly for sign. To
complete the proof we now merely have to show that $(-1)^{\sigma} = (-1)^{r+j} (-1)^{\tau}$
for all $\sigma \in T_j$. For this we compare the inversion pairs in $\sigma$ and in
$\tau$. If $(p, q)$ is an inversion pair of $\tau$ (i.e., if $p < q$ and $\tau(p) > \tau(q)$) then
there is a corresponding inversion pair in $\sigma$, which may be $(p, q)$ or
$(p+1, q)$ or $(p, q+1)$ or $(p+1, q+1)$ depending upon how $p$ is related
to $r$ and how $q$ is related to $r$. Conversely, to every inversion pair in $\sigma$,
there is a corresponding inversion pair in $\tau$, except for such inversion
pairs $(p, q)$ in $\sigma$ for which $p$ or $q$ equals $r$. Let $x$ be the number of inversion
pairs in $\sigma$ for which $q = r$ and $y$ the number of inversion pairs in $\sigma$ for
which $p = r$. Then, the number of inversion pairs in $\sigma$ = $x + y$ + number
of inversion pairs in $\tau$. From Theorem (5.4.7), a permutation is even or
odd according as the number of inversion pairs in it is even or odd. We
see that $(-1)^{\sigma} = (-1)^{x+y} (-1)^{\tau}$. We are given $\sigma(r) = j$. Let us write $\sigma$ in
the form
$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & r-1 & r & r+1 & \cdots & n \\ \sigma(1) & \sigma(2) & \cdots & \sigma(r-1) & j & \sigma(r+1) & \cdots & \sigma(n) \end{pmatrix}$$
Then $x$ is the number of entries in the second row which lie to the left of
$j$ and are greater than $j$. Similarly $y$ is the number of entries in the second
row which lie to the right of $j$ and which are less than $j$. So the number of
entries in the second row which are greater than $j$ is $x + (n - r - y)$. But
this number obviously is $n - j$. Thus we get $x - y = r - j$. (In the numerical
example given above, $r = 7$, $j = 3$, $x = 4$, $y = 0$.) We have not succeeded
in computing $x + y$. But it does not matter, because we are interested only
in its parity. Now $x + y$ has the same parity as $x - y$. Also $r - j$ has the
same parity as $r + j$. So $(-1)^{x+y} = (-1)^{r+j}$. Thus for every $\sigma \in T_j$, the
term $(-1)^{\sigma} a_{1\sigma(1)} \cdots a_{r-1,\sigma(r-1)}\, a_{r+1,\sigma(r+1)} \cdots a_{n\sigma(n)}$ equals
$(-1)^{r+j} (-1)^{\tau}\, b_{1\tau(1)} \cdots b_{n-1,\tau(n-1)}$. As noted before, this establishes (1) and completes the proof
of the first assertion. The second assertion is just the ‘transposed’ version
of the first assertion. ∎
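The theorem and the numerical example in its proof can both be checked mechanically. The sketch below is illustrative only: the names `sign`, `det`, `cofactor` and `reduce_permutation` are assumptions of this example, and matrix indices are 0-based (which does not affect the parity of $r + j$). It expands a $4 \times 4$ integer matrix along every row and every column, and it reproduces the passage from $\sigma$ to $\tau$ for $n = 9$, $r = 7$, $j = 3$.

```python
from itertools import permutations

def sign(perm):
    """+1 or -1 according to the parity of the number of inversion pairs."""
    n = len(perm)
    inversions = sum(1 for p in range(n) for q in range(p + 1, n) if perm[p] > perm[q])
    return -1 if inversions % 2 else 1

def det(a):
    """Determinant via the permutation-sum definition (the empty matrix gives 1)."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total

def cofactor(a, r, j):
    """(r, j)th cofactor: signed determinant of a with row r and column j removed."""
    sub = [[row[k] for k in range(len(a)) if k != j] for i, row in enumerate(a) if i != r]
    return (-1) ** (r + j) * det(sub)

def reduce_permutation(sigma, r, j):
    """Map sigma (1-based values, with sigma(r) = j) to tau as in (3)."""
    rest = sigma[:r - 1] + sigma[r:]                  # drop the r-th position
    return [v - 1 if v > j else v for v in rest]      # lower the values above j by 1

A = [[2, 0, 1, 3], [1, -1, 4, 0], [0, 5, 2, 1], [3, 1, 0, 2]]
row_expansions = [sum(A[r][j] * cofactor(A, r, j) for j in range(4)) for r in range(4)]
column_expansions = [sum(A[i][c] * cofactor(A, i, c) for i in range(4)) for c in range(4)]

sigma = [2, 9, 1, 4, 5, 8, 3, 7, 6]
tau = reduce_permutation(sigma, r=7, j=3)
```

All eight expansions agree with `det(A)`, and `sign(sigma)` equals $(-1)^{r+j}$ times `sign(tau)`, in line with the parity argument above.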
We are now ready to prove the converse of Corollary (4.27).
4.31 Theorem: A square matrix over a commutative ring with identity is
invertible if and only if its determinant is an invertible element of that
ring.
Proof: The direct implication was proved in Corollary 4.27. For the
converse, suppose $A = (a_{ij})$ is an $n \times n$ matrix over a commutative ring $R$ with
identity. Let $D$ be the transpose of the matrix of cofactors of $A$, that is, $D$
is the $n \times n$ matrix in which $d_{ij} = C_{ji}$. Let us compute the product matrix $AD$.
A typical diagonal entry of the product is of the form $\sum_{j=1}^{n} a_{rj} d_{jr}$. But this
equals $\sum_{j=1}^{n} a_{rj} C_{rj}$, which is $\det(A)$ by the last theorem. What about
non-diagonal entries? A typical such entry is $\sum_{j=1}^{n} a_{rj} d_{js}$ where $r \neq s$. We claim
this is 0. Now $\sum_{j=1}^{n} a_{rj} d_{js} = \sum_{j=1}^{n} a_{rj} C_{sj}$. Let $B$ be the matrix $(b_{ij})$ where $b_{ij} = a_{ij}$ for
all $i \neq s$ and all $j$, and $b_{sj} = a_{rj}$ for $j = 1, \ldots, n$. In other words $B$ is obtained
from $A$ by replacing its $s$th row with the $r$th row, leaving all other rows
unaffected. It follows that the cofactors $C_{sj}$ are the same for $A$ as well as for $B$. So
$\sum_{j=1}^{n} a_{rj} C_{sj} = \sum_{j=1}^{n} b_{sj} C_{sj}$, which equals $\det(B)$ by the last theorem. But
$\det(B) = 0$ since two rows of $B$ (namely, the $r$th row and the $s$th row) are
identical. Thus we have shown that all non-diagonal entries of $AD$ are 0.
Putting it all together, $AD$ equals $(\det A) I_n$. Similarly $DA$ equals $(\det A) I_n$.
We are assuming that $\det(A)$ is an invertible element of $R$. It follows that
$\frac{1}{\det(A)} D$ is the inverse of $A$. ∎
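To see the theorem at work over a commutative ring that is not a field, the following sketch inverts an integer matrix modulo 10. Everything here is an illustrative assumption rather than part of the text: the helper names, the sample matrix, and the choice of the ring Z/10Z. The modular inverse uses Python's built-in `pow(d, -1, m)`, which needs Python 3.8 or later and raises `ValueError` when `d` is not a unit modulo `m`.

```python
from itertools import permutations

def det(a):
    """Determinant via the permutation-sum definition (fine for small n)."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[p] > perm[q] for p in range(n) for q in range(p + 1, n))
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total

def adjugate(a):
    """Transpose of the matrix of cofactors: entry (i, j) is the cofactor C_ji."""
    n = len(a)
    def cofactor(r, j):
        sub = [[a[i][k] for k in range(n) if k != j] for i in range(n) if i != r]
        return (-1) ** (r + j) * det(sub)
    return [[cofactor(j, i) for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[1, 2, 0],
     [0, 1, 3],
     [2, 0, 1]]
n = len(A)
D = adjugate(A)
d = det(A)
scalar_matrix = [[d if i == j else 0 for j in range(n)] for i in range(n)]  # (det A) I_n

m = 10                               # work in Z/10Z, which is not a field
d_inverse = pow(d, -1, m)            # exists because det(A) is a unit modulo 10
A_inverse_mod = [[(d_inverse * x) % m for x in row] for row in D]
check = [[x % m for x in row] for row in matmul(A, A_inverse_mod)]
```

The identities $AD = DA = (\det A) I_n$ hold over the integers without any invertibility assumption; invertibility of $\det(A)$ is needed only for the last step.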
Note that in the proof of the converse, the fact that $\det A$ is invertible
in $R$ was used only at the end. Even without it, we still have $AD = DA =
(\det A) I_n$ for any $n \times n$ matrix $A$ over a commutative ring $R$. The matrix $D$
is called the adjoint matrix of $A$ and denoted by adj $(A)$.
Using this theorem we can get a method for solving a system of $n$ linear
equations in $n$ unknowns, known as Cramer's rule, which will be given as
an exercise. But it is a very inefficient method because it requires the
computation of so many $n \times n$ determinants, which is time-consuming. Later we
shall study a method, due to Gauss, which is efficient and also more general
in that it applies to a system of $m$ linear equations in $n$ unknowns, where
$m$ may be different from $n$.
Nevertheless, the last theorem has interesting theoretical consequences
and we proceed to study them. Our treatment of the determinant so far
was applicable to square matrices over any commutative ring with identity.
From now onwards we assume that we are dealing with matrices over a
field $F$. As before, if $A$ is an $n \times n$ matrix over $F$ we shall think of its rows
and columns as elements of $F^n$. Note, however, that now $F^n$ is a vector
space over $F$ and so the results of the last section become applicable. The
following proposition gives a handy characterisation of the linear
dependence or otherwise of the rows (or columns) of a square matrix.
4.32 Proposition: Let $A$ be an $n \times n$ matrix over a field $F$. Then the
following statements are equivalent:
(1) The rows of $A$ are linearly dependent over $F$ (as elements of $F^n$).
(2) The columns of $A$ are linearly dependent over $F$ (also as elements
of $F^n$).
(3) $\det(A) = 0$.
Proof: We shall prove first that (2) and (3) are equivalent. We shall then
apply this equivalence to $A'$, the transpose of $A$. Since $\det(A) = \det(A')$
by Proposition 4.21, it follows that (1) and (3) are equivalent, because the
columns of $A'$ are precisely the rows of $A$.
(2) ⇒ (3). Let the columns of $A$ be $c_1, c_2, \ldots, c_n$. If they are linearly
dependent, then one of them can be expressed as a linear combination of
the remaining columns (see Proposition 3.5). Suppose that the $j$th column
$c_j$ equals $\lambda_1 c_1 + \cdots + \lambda_{j-1} c_{j-1} + \lambda_{j+1} c_{j+1} + \cdots + \lambda_n c_n$ for some
$\lambda_1, \ldots, \lambda_{j-1}, \lambda_{j+1}, \ldots, \lambda_n \in F$. We form a matrix $B$ from $A$ by adding $-\lambda_i c_i$ to $c_j$ for
$i = 1, \ldots, j-1, j+1, \ldots, n$. By Proposition 4.25 (or rather, the comment
following it), $\det(B) = \det(A)$. But in $B$, the $j$th column is 0. So $\det(B)
= 0$. Hence $\det(A) = 0$.
(3) ⇒ (2). Suppose $\det(A) = 0$. Let $T : F^n \to F^n$ be the linear
transformation obtained from $A$ by Proposition 4.1. Let $R$ be the range of $T$.
As noted while proving Proposition 4.2, $R$ is spanned by the columns of
$A$. If the columns of $A$ were linearly independent, then dim $(R)$ would be $n$,
which would mean that $R = F^n$, that is, $T$ is onto. But then $T$ would also
be one-to-one (see Exercise 3.18). So $T$ would have an inverse $T^{-1} : F^n \to F^n$
and the matrix of $T^{-1}$ w.r.t. the standard basis would be the inverse of the
matrix of $T$. But the matrix of $T$ is $A$. So $A$ would be invertible. This
would contradict Theorem 4.31, according to which the determinant of an
invertible matrix cannot be 0. So the columns of $A$ are linearly dependent.
This proves the equivalence of (2) and (3) and, as noted before,
completes the proof. ∎
It is convenient to paraphrase this proposition. A square matrix A is
called non-singular (or regular) if it is invertible and singular otherwise.
(This is another instance of how a general concept is given a special name
in a particular context. Another example was the use of ‘similar’ to convey
the same meaning as ‘conjugate’.) The following proposition characterises
non-singular matrices.
4.33 Proposition: Let $A$ be an $n \times n$ matrix over a field $F$. Then the
following statements are equivalent.
(1) $A$ is non-singular.
(2) $A$ has a right inverse, that is, there exists an $n \times n$ matrix $B$ such
that $AB = I_n$.
(3) $A$ has a left inverse.
(4) $\det(A) \neq 0$.
(5) The row rank of $A$ is $n$.
(6) The column rank of $A$ is $n$.
(7) $Ax = 0 \Rightarrow x = 0$ for every column vector $x$.
(8) $xA = 0 \Rightarrow x = 0$ for every row vector $x$.
Proof: Clearly (1) ⇒ (2). If (2) holds, then $\det(A) \det(B) = \det(I_n) = 1$
showing that $\det(A) \neq 0$. So (2) ⇒ (4). From Theorem (4.31), (4) ⇒ (1),
because in a field every non-zero element is invertible. Thus we see that
(1), (2) and (4) are equivalent. By a similar argument, (1), (3) and (4) are
equivalent. The equivalence of (4), (5) and (6) follows from the last
proposition. It only remains to show that the last two statements are equivalent
to the rest.
Clearly (1) ⇒ (7), because $Ax = 0 \Rightarrow A^{-1}Ax = 0 \Rightarrow I_n x = 0 \Rightarrow x = 0$
for every column vector $x$ (of length $n$). Similarly (1) ⇒ (8).
To show that (7) ⇒ (1), we consider once again the linear
transformation $T : F^n \to F^n$ defined by $T(x) = Ax$. (7) is equivalent to saying that $T$ is
one-to-one. But then by Exercise (3.18), $T$ is also onto and hence has an
inverse, $T^{-1}$. As before the matrix of $T^{-1}$ is an inverse of $A$. So (1) holds.
To complete the proof, we only have to show that (8) ⇒ (1). For this,
we define $T : F^n \to F^n$ by $T(x) = xA$ and argue as above. Alternatively, let
$B$ be the transpose of $A$. Then (8) is equivalent to saying that $By = 0 \Rightarrow y = 0$
for all column vectors $y$. So from the equivalence of (1) and (7) applied to
$B$, $B$ is invertible. But then the transpose of the inverse of $B$ would be the
inverse of $A$. So (1) holds. ∎
We are now ready to prove the equality of the row rank and the column
rank of a matrix. A special case of this, where the matrix is a square one
and the row rank is the highest possible was proved in the last proposition.
We first extend this result to any rectangular (that is, not necessarily square)
matrix.
4.34 Proposition: Suppose $B$ is an $r \times n$ matrix of row rank $r$, over a field
$F$. Then the column rank of $B$ is also $r$.
Proof: Let $u_1, \ldots, u_r$ be the rows of $B$, regarded as vectors in $F^n$. Let
$c_1, c_2, \ldots, c_n$ be the columns of $B$, regarded as vectors in $F^r$. The hypothesis
means that $u_1, \ldots, u_r$ are linearly independent. In view of Proposition (3.12),
this implies $n \ge r$. Now let $k$ be the column rank of $B$. Then some $k$ columns
of $B$ are linearly independent and every other column is a linear
combination of these $k$ columns. For simplicity, suppose these columns
are $c_1, \ldots, c_k$. We have to show that $k = r$. Certainly, since $\{c_1, \ldots, c_k\}$ is a
linearly independent set of vectors in $F^r$, it follows, again from Proposition
(3.12), that $k$ cannot exceed $r$. To complete the proof, we show $k$ cannot
be less than $r$.
Suppose $k < r$. Let $C$ be the $r \times r$ submatrix of $B$ formed by the columns
$c_1, \ldots, c_r$. Let $v_1, \ldots, v_r$ be the rows of $C$. Note that each $v_i$ is a
truncation of $u_i$. Now $\{c_1, \ldots, c_r\}$ is linearly dependent since $c_r$ can be expressed
as a linear combination of $c_1, \ldots, c_k$ (and $k < r$). So by Proposition (4.32),
the rows of $C$, that is, $v_1, \ldots, v_r$ are also linearly dependent. This by itself
does not mean that $u_1, \ldots, u_r$ are linearly dependent, because the $v_i$'s are merely
truncations of the $u_i$'s. However, we show that $\{u_1, \ldots, u_r\}$ is linearly dependent.
This will contradict our hypothesis and thereby establish our result.
Since $\{v_1, \ldots, v_r\}$ is linearly dependent, there exist $\lambda_1, \ldots, \lambda_r \in F$, not all 0,
such that $\lambda_1 v_1 + \cdots + \lambda_r v_r = 0$. This means that for all $j = 1, \ldots, r$,
$\lambda_1 b_{1j} + \lambda_2 b_{2j} + \cdots + \lambda_r b_{rj} = 0$. We now show that this holds for every $j$, that is, even
for $r < j \le n$. Fix such $j$. Then $c_j$ is a linear combination of $c_1, \ldots, c_k$, say
$c_j = \mu_1 c_1 + \mu_2 c_2 + \cdots + \mu_k c_k$. This means that for every $i = 1, \ldots, r$,
$b_{ij} = \mu_1 b_{i1} + \mu_2 b_{i2} + \cdots + \mu_k b_{ik}$. So,
$$\lambda_1 b_{1j} + \lambda_2 b_{2j} + \cdots + \lambda_r b_{rj}$$
$$= \lambda_1 \Big(\sum_{p=1}^{k} \mu_p b_{1p}\Big) + \lambda_2 \Big(\sum_{p=1}^{k} \mu_p b_{2p}\Big) + \cdots + \lambda_r \Big(\sum_{p=1}^{k} \mu_p b_{rp}\Big)$$
$$= \mu_1 \Big(\sum_{q=1}^{r} \lambda_q b_{q1}\Big) + \mu_2 \Big(\sum_{q=1}^{r} \lambda_q b_{q2}\Big) + \cdots + \mu_k \Big(\sum_{q=1}^{r} \lambda_q b_{qk}\Big) \quad \text{(by regrouping of terms)}$$
$$= \mu_1 0 + \mu_2 0 + \cdots + \mu_k 0 \quad \Big(\text{since } \sum_{q=1}^{r} \lambda_q b_{qp} = 0 \text{ for all } p = 1, 2, \ldots, k\Big) = 0.$$
This shows that $\lambda_1 b_{1j} + \lambda_2 b_{2j} + \cdots + \lambda_r b_{rj} = 0$ for all $j = 1, 2, \ldots, n$.
This indeed means that the rows of $B$, that is $u_1, \ldots, u_r$ (and not merely
their truncations $v_1, \ldots, v_r$) are linearly dependent. ∎
Now we are ready to prove the equality of the two ranks for all
matrices. The proof is by linking both of them to a third, common number.
4.35 Theorem: The row rank and the column rank of an $m \times n$ matrix
$A$ are equal. This common number also equals the order of the largest
non-singular square submatrix of $A$.
Proof: Let $r$ be the row rank of $A$. We claim that $A$ contains a
non-singular $r \times r$ submatrix and also that it does not contain a non-singular
square submatrix of a higher order. Let $\{u_{i_1}, u_{i_2}, \ldots, u_{i_r}\}$ be a linearly
independent set of rows of $A$ and let $B$ be the $r \times n$ submatrix of $A$ formed by
these rows. Then the row rank of $B$ is $r$ and so by the last proposition its
column rank is also $r$. So there exist some columns $c_{j_1}, c_{j_2}, \ldots, c_{j_r}$ of $B$ which
are linearly independent. Let $C$ be the matrix formed by these $r$ columns
(of $B$). Then $C$ is an $r \times r$ submatrix of $A$. Also $C$ is non-singular by
Proposition (4.33). Thus $A$ contains a non-singular square submatrix of order
$r$. Now suppose $s > r$ and $D$ is a square submatrix of $A$ of order $s$. We
show $D$ is singular. Suppose $D$ is obtained by rows $u_{i_1}, u_{i_2}, \ldots, u_{i_s}$ and columns
$c_{j_1}, c_{j_2}, \ldots, c_{j_s}$ of $A$. Then the rows of $D$ are truncations of $u_{i_1}, \ldots, u_{i_s}$. Now,
since $s >$ row rank of $A$, $u_{i_1}, \ldots, u_{i_s}$ are linearly dependent and a fortiori
so are their truncations. Thus the rows of $D$ are linearly dependent. By
Proposition (4.33), $D$ is singular.
Thus we have shown that the row rank of every matrix equals the
order of the largest non-singular square submatrix. Let us apply this to
$A'$, the transpose of $A$. Then the row rank of $A'$ equals the order of the
largest non-singular square submatrix of $A'$. Now, a square submatrix of
$A'$ is the transpose of a square submatrix of the same order of $A$. Also a
square matrix is non-singular iff its transpose is non-singular (since the
two have the same determinants). It follows that the row rank of $A'$ equals
the order of the largest non-singular square submatrix of $A$. So row rank
of $A'$ = row rank of $A$. But row rank of $A'$ = column rank of $A$. This
completes the proof. ∎
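The three numbers linked by this theorem can be compared directly on a small example. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the rank is computed by elimination over the rationals (a technique the text introduces only later), and the search over square submatrices is brute force, so it is suitable only for tiny matrices.

```python
from fractions import Fraction
from itertools import combinations, permutations

def det(a):
    """Determinant via the permutation-sum definition (fine for small n)."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[p] > perm[q] for p in range(n) for q in range(p + 1, n))
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total

def rank_by_elimination(rows):
    """Number of linearly independent rows, found by elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][col] / m[rank][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank

def largest_nonsingular_order(a):
    """Order of the largest square submatrix with non-zero determinant (brute force)."""
    rows, cols = len(a), len(a[0])
    for s in range(min(rows, cols), 0, -1):
        for rs in combinations(range(rows), s):
            for cs in combinations(range(cols), s):
                if det([[a[i][j] for j in cs] for i in rs]) != 0:
                    return s
    return 0

A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]
A_transpose = [list(col) for col in zip(*A)]

row_rank = rank_by_elimination(A)
column_rank = rank_by_elimination(A_transpose)   # rows of the transpose are columns of A
minor_order = largest_nonsingular_order(A)
```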
The second assertion has an interesting theoretical consequence. In our
treatment so far, we have emphasised the role of the ground field. However,
as far as the rank of a matrix is concerned, it turns out to be independent of
the ground field, as long as the ground field is large enough to include all the
entries of the matrix. Let $A$ be an $m \times n$ matrix over a field $F$. Suppose $K$ is
an extension field of $F$. Then $A$ can also be thought of as a matrix over $K$.
Conceivably, the rank of $A$ as a matrix over $F$ could differ from its rank as
a matrix over $K$. But the theorem above shows that this cannot happen.
For, suppose $B$ is a square submatrix of $A$. Then $\det(B)$, being a sum of
the products of the entries of $B$, is independent of whether $A$ is regarded as
a matrix over $F$ or over $K$. Consequently, $B$ is non-singular as a matrix
over $F$ iff it is non-singular as a matrix over $K$. Hence the order of the
largest non-singular square submatrix of $A$ is independent of whether $A$ is treated
as a matrix over $F$ or over $K$.
Determination of the rank of a matrix is important in various
connections. For example, if we are given vectors $u_1, u_2, \ldots, u_m$ in $\mathbb{R}^n$ and are
asked to find the dimension of the subspace of $\mathbb{R}^n$ spanned by these vectors,
this is equivalent to finding the rank of the $m \times n$ matrix whose rows are
$u_1, \ldots, u_m$. The rank also plays a crucial role in solving systems of linear
equations as we shall see shortly. So it is desirable to have a method for
finding the rank of a given matrix. It is not a practicable proposition to
evaluate the determinants of all square submatrices and then apply the
last theorem. The method we give below is due to Gauss and is called
row reduction. In this method, given an $m \times n$ matrix $A$, we apply a series
of operations to its rows so as to get another $m \times n$ matrix $B$ which has
the same rank as $A$, but which is in a standard form called the echelon
form and consequently whose rank can be determined by inspection. Let
us first define this form. (The word ‘echelon’ means a steplike formation
and the name will be justified from the form.)
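4.36 Definition: An $m \times n$ matrix $B = (b_{ik})$ over a field $F$ is said to be
in the echelon form if there exists an integer $r$, $0 \le r \le m$, and a strictly
monotonically increasing sequence of positive integers $1 \le j_1 < j_2 < \cdots < j_r \le n$
such that
(i) for every $i = 1, \ldots, r$, $b_{ik} = 0$ for $k < j_i$ and $b_{ik} = 1$ for $k = j_i$,
(ii) for every $i$ with $r < i \le m$, and for all $k = 1, 2, \ldots, n$, $b_{ik} = 0$.
A matrix in the echelon form looks like
$$\begin{pmatrix}
0 & \cdots & 0 & 1 & * & \cdots & * & * & \cdots & * & * & \cdots & * \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & \cdots & * & * & \cdots & * \\
\vdots & & & & & & & & \ddots & & & & \vdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 1 & \cdots & * \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & & & & & & & & & & & \vdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}$$
where the leading 1's of the first $r$ rows stand in the columns $j_1, j_2, \ldots, j_r$
respectively and $*$ denotes an arbitrary entry.
Here all entries below the ‘staircase’ are 0. (Hence the name.) We now
compute the rank of this matrix.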
4.37 Proposition: The rank of the matrix $B$ above in the echelon form
is $r$.
Proof: Let $v_1, \ldots, v_r, \ldots, v_m$ be the rows of $B$. Then for $i > r$, $v_i$ is
identically 0. We claim that $v_1, \ldots, v_r$ are linearly independent. Let $\lambda_1, \ldots, \lambda_r \in F$
and suppose $\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_r v_r = 0$. Then for every $k$,
$\lambda_1 b_{1k} + \lambda_2 b_{2k} + \cdots + \lambda_r b_{rk} = 0$. Put $k = j_1$. Then $b_{2j_1} = b_{3j_1} = \cdots = b_{rj_1} = 0$ and
$b_{1j_1} = 1$. This gives $\lambda_1 = 0$. Hence $\lambda_2 b_{2k} + \lambda_3 b_{3k} + \cdots + \lambda_r b_{rk} = 0$ for all
$k$. Now put $k = j_2$ and apply the same reasoning to get $\lambda_2 = 0$.
Continuing in this manner, we get $\lambda_1 = \lambda_2 = \cdots = \lambda_r = 0$. Thus the first $r$ rows
of $B$ are linearly independent. But all the remaining rows are identically
0. So $r$ is the maximum number of linearly independent rows of $B$ and
therefore equals its rank. ∎
Now suppose $A$ is an $m \times n$ matrix. We claim that by performing certain
row operations on $A$, it can be reduced to a matrix in the echelon form,
without affecting its rank. These row operations are of three types:
(R1) interchange of any two rows,
(R2) multiplying one row by a non-zero scalar and
(R3) adding a scalar multiple of one row to another.
Although this description of the row operations is clear enough, for
theoretical purposes it is convenient to paraphrase them in terms of matrix
multiplication. Let $A$ be an $m \times n$ matrix. Suppose $1 \le i < j \le m$. Let
$P_1$ be the $m \times m$ matrix
$$P_1 = \begin{pmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & 0 & \cdots & 1 & & \\
& & \vdots & \ddots & \vdots & & \\
& & 1 & \cdots & 0 & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{pmatrix}$$
in which the two displayed off-diagonal 1's lie in the $i$th row, $j$th column and
in the $j$th row, $i$th column.
This matrix is the same as the identity matrix $I_m$ except that there is 0 in the
$(i, i)$th and $(j, j)$th place and there is 1 in the $(i, j)$th and in the $(j, i)$th place.
It is clear that $P_1 A$ is the same matrix as $A$ with the $i$th and $j$th rows
interchanged. Thus the row operation R1 is equivalent to multiplication by $P_1$
on the left. Note that $P_1$ is its own inverse. Similarly, R2, that is
multiplying one of the rows, say the $i$th row, by $\lambda$, is equivalent to multiplication
by $P_2$ where $P_2$ is like $I_m$ except that the $(i, i)$th place is $\lambda$. If $\lambda \neq 0$, then
$P_2$ is invertible (with the inverse having $1/\lambda$ in the $(i, i)$th place). For the
third operation R3, suppose $\lambda$ times the $i$th row of $A$ is added to the $j$th
row. Then it is easily seen that this is equivalent to forming $P_3 A$ where $P_3$
is the $m \times m$ matrix which is the same as $I_m$ except that the $(j, i)$th entry
is $\lambda$. $P_3$ is also invertible (with the inverse having $-\lambda$ in the $(j, i)$th place).
We leave it to the reader to verify (using this interpretation or directly)
that the rank of a matrix is unaffected under any of these operations and
hence under any composition of them in any order. We show that by
performing these operations in a suitable order, every matrix can be reduced
to an echelon form.
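The correspondence between row operations and left multiplication can be tried out on a small matrix. In this sketch the function names and the sample matrix are illustrative assumptions, and rows are indexed from 0 rather than from 1 as in the text. Each of the three kinds of matrices is built from the identity, applied on the left, and multiplied by its claimed inverse.

```python
from fractions import Fraction

def identity(m):
    return [[1 if r == c else 0 for c in range(m)] for r in range(m)]

def matmul(p, a):
    """Product of an m x m matrix p with an m x n matrix a."""
    return [[sum(p[r][k] * a[k][c] for k in range(len(a))) for c in range(len(a[0]))]
            for r in range(len(p))]

def swap_matrix(m, i, j):
    """Matrix of type P_1: left multiplication interchanges rows i and j (0-based)."""
    p = identity(m)
    p[i][i] = p[j][j] = 0
    p[i][j] = p[j][i] = 1
    return p

def scale_matrix(m, i, lam):
    """Matrix of type P_2: left multiplication multiplies row i by lam."""
    p = identity(m)
    p[i][i] = lam
    return p

def add_matrix(m, i, j, lam):
    """Matrix of type P_3: left multiplication adds lam times row i to row j."""
    p = identity(m)
    p[j][i] = lam
    return p

A = [[1, 2],
     [3, 4],
     [5, 6]]

swapped = matmul(swap_matrix(3, 0, 2), A)
scaled = matmul(scale_matrix(3, 1, Fraction(1, 2)), A)
added = matmul(add_matrix(3, 0, 2, -5), A)

lam = Fraction(1, 2)
swap_is_own_inverse = matmul(swap_matrix(3, 0, 2), swap_matrix(3, 0, 2)) == identity(3)
scale_inverse_ok = matmul(scale_matrix(3, 1, lam), scale_matrix(3, 1, 1 / lam)) == identity(3)
add_inverse_ok = matmul(add_matrix(3, 0, 2, lam), add_matrix(3, 0, 2, -lam)) == identity(3)
```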
4.38 Proposition: Let $A = (a_{ij})$ be an $m \times n$ matrix. Then by applying
the row operations above in a suitable order, $A$ can be reduced to a matrix
(of the same rank as $A$) which is in the echelon form.
Proof: If all columns of $A$ are identically 0 then $A$ is the zero matrix
which is already in the echelon form (with $r = 0$). Otherwise, let $j_1$ be the
smallest integer such that the $j_1$th column of $A$ is non-zero. Then $a_{ij_1} \neq 0$
for some $i$. Pick any such $i$ and interchange the $i$th row with the first row
(i.e. apply R1). This gives a matrix $C$ (say) in which all columns up to the $(j_1 - 1)$th
column are 0 and $c_{1j_1} \neq 0$. Divide the first row of $C$ by $c_{1j_1}$ (i.e. apply R2). This
gives a matrix $D$, in which the first $j_1 - 1$ columns are 0 and $d_{1j_1} = 1$. Now
apply R3 to $D$, $(m - 1)$ times. That is, for each $i = 2, \ldots, m$, subtract $d_{ij_1}$ times
the first row from the $i$th row. This gives a new matrix $E$ in which the $j_1$th
column has 1 at the top and 0's everywhere else. Let $A_1$ be the $(m-1) \times
(n - j_1)$ submatrix of $E$ obtained by removing the first row and the first $j_1$
columns of $E$. If $A_1 = 0$, we stop and let $B = E$ because the matrix $B$ is in
the echelon form. If $A_1$ is not 0, we subject it to the same procedure as $A$.
$A$ is then reduced to a matrix, say $G$, of the form
$$G = \begin{pmatrix}
0 & \cdots & 0 & 1 & * & \cdots & * & * & * & \cdots & * \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & * & \cdots & * \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & & & \\
\vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots & & A_2 & \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & & &
\end{pmatrix}$$
with the leading 1's in the $j_1$th and $j_2$th columns,
for some $j_2 > j_1$. Let $A_2$ be the submatrix of $G$ obtained by removing the
first two rows and the first $j_2$ columns of $G$. Once again if $A_2 = 0$, we
stop and let $B = G$ which is in the echelon form. Otherwise subject $A_2$ to
the same procedure. Note that $A_i$ has $m - i$ rows for $i = 1, 2, \ldots$. So this
process cannot go on beyond $m$ steps. So it must stop at some stage with
an $m \times n$ matrix $B$ in the echelon form. ∎
The construction is rather simple and will be illustrated in Problem
(4.41). The reader is also urged to illustrate it on his own with one or two
examples. Since the rank of a matrix in the echelon form is known by
Proposition (4.37), we now have a systematic procedure for computing the
rank of a matrix. The procedure is, in fact, so algorithmic in nature that a
computer program can be written to implement it.
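One possible such program is sketched below. It is an illustration, not the author's program: the function name `row_reduce`, the use of exact rational arithmetic via `fractions.Fraction`, and the 0-based column indices are all assumptions of this example. It follows the steps of the proof of Proposition (4.38): find the first non-zero column among the remaining rows, apply R1 to bring a non-zero entry to the top, apply R2 to make it 1, and apply R3 to clear the entries below it.

```python
from fractions import Fraction

def row_reduce(a):
    """Reduce a to an echelon form; return (echelon matrix, list of pivot columns)."""
    b = [[Fraction(x) for x in row] for row in a]
    m = len(b)
    n = len(b[0]) if m else 0
    pivots = []
    top = 0                                  # rows above `top` are already finished
    for col in range(n):
        if top == m:
            break
        pivot_row = next((i for i in range(top, m) if b[i][col] != 0), None)
        if pivot_row is None:
            continue                         # this column is zero in the remaining rows
        b[top], b[pivot_row] = b[pivot_row], b[top]          # R1
        lead = b[top][col]
        b[top] = [x / lead for x in b[top]]                  # R2
        for i in range(top + 1, m):                          # R3, once per lower row
            factor = b[i][col]
            b[i] = [x - factor * y for x, y in zip(b[i], b[top])]
        pivots.append(col)
        top += 1
    return b, pivots

A = [[0, 2, 4, 2],
     [0, 1, 2, 3],
     [0, 3, 6, 1]]

echelon, pivot_columns = row_reduce(A)
rank = len(pivot_columns)   # the number of non-zero rows, by Proposition (4.37)
```

For this matrix the leading 1's land in the second and fourth columns (indices 1 and 3 when counting from 0), so the rank is 2. Exact fractions are used so that the test for a zero entry is reliable; floating-point arithmetic would need a tolerance.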
Finally, we study how the row reduction of a matrix plays a crucial role
in the solution of a system of linear equations. Let us consider a system of
$m$ equations in $n$ unknowns, $x_1, \ldots, x_n$ given by
$$\begin{aligned}
a_{11} x_1 + \cdots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + \cdots + a_{2n} x_n &= b_2 \\
&\ \ \vdots \\
a_{m1} x_1 + \cdots + a_{mn} x_n &= b_m
\end{aligned} \qquad (*)$$
We represent this as $Ax = b$ where $x$, $b$ are column vectors of length $n$
and $m$ respectively and $A$ is the $m \times n$ matrix $(a_{ij})$. If $b = 0$, then $(*)$
becomes
$$Ax = 0 \qquad (**)$$
This is called a homogeneous system. From Proposition (4.2), the solutions
of $(**)$ form a vector subspace of $\mathbb{R}^n$ (namely the kernel of the linear
transformation $T$ associated with $A$). This subspace is called the solution space
of $(**)$. If the rank of $A$ is $r$, then by Exercise (3.10), the solution space has
dimension $n - r$. Also, from the discussion following Proposition (5.3.6),
the solution space of $(*)$ is either empty or it is a coset of the solution space
of $(**)$. So, to find the general solution of $(*)$, we first find the general
solution of $(**)$ and then any one particular solution of $(*)$. We do these two
parts separately.
The solution of $(**)$ is simplified by the following Proposition.
4.39 Proposition: Suppose the matrix $A$ is reduced to a matrix $B$ in the
echelon form by row reduction. Then the solution space of $(**)$ is the same
as that of $Bx = 0$.
Proof: We noted earlier that each row operation on a matrix corresponds
to multiplying it on the left by a non-singular matrix. So $B = PA$ where $P$
is an $m \times m$ matrix which is the product of a number of matrices of the form
$P_1$, $P_2$, $P_3$ (corresponding to the row operations R1, R2 and R3 respectively).
All these matrices are invertible. So $P$ is invertible, that is, non-singular.
Now any solution of $Ax = 0$ is also a solution of $PAx = 0$, i.e. $Bx = 0$.
Here we do not need that $P$ is non-singular. But for the converse, suppose
$x$ is such that $Bx = 0$, i.e. $PAx = 0$. Now we apply (7) in Proposition (4.33)
to the non-singular matrix $P$ and to the column vector $Ax$. Then $PAx = 0$
gives $Ax = 0$. So $x$ is a solution of $(**)$. Thus the solutions of $(**)$ are the same
as those of $Bx = 0$.
When $B$ is in the echelon form it is very easy to write down the solutions of $Bx = 0$. Let us adopt the notation of Definition (4.36). Then for $i \neq j_1, \ldots, j_r$, $x_i$ can be given any value. Once these values are fixed, $x_j$ for $j = j_1, \ldots, j_r$ is uniquely determined as follows. Consider $v_r$, the $r$th row of $B$. (This is the last non-zero row of $B$.) Since $Bx = 0$, we have, in particular, that $v_r x = 0$. But $v_r x$ is nothing but $x_{j_r} + b_{r,j_r+1}x_{j_r+1} + \cdots + b_{rn}x_n$. Since $x_{j_r+1}, \ldots, x_n$ are already determined, we must set $x_{j_r} = -(b_{r,j_r+1}x_{j_r+1} + \cdots + b_{rn}x_n)$. Now consider the $(r-1)$th row, $v_{r-1}$, of $B$. Then $v_{r-1}x = 0$. This gives

$$x_{j_{r-1}} + b_{r-1,\,j_{r-1}+1}\,x_{j_{r-1}+1} + \cdots + b_{r-1,\,n}\,x_n = 0.$$

In this equation, $x_i$ is determined already for all $i > j_{r-1}$. So $x_{j_{r-1}}$ is uniquely determined by this equation. Next consider the $(r-2)$th row of $B$ and determine $x_{j_{r-2}}$. Continuing in this manner, $x_{j_2}$ and finally $x_{j_1}$ is determined.
The original matrix $A$ has the same rank as $B$, namely, $r$. So we see that in the general solution of (**), there are $n - r$ arbitrary constants because $n - r$ variables out of $x_1, \ldots, x_n$ can be assigned values arbitrarily and the remaining $x_i$'s are determined in terms of them. Since the dimension of the solution space of (**) is $n - r$, this is consistent with the intuitive meaning of dimension (given in the last section) as the number of free choices possible.
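The back substitution just described can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the function names `pivot_columns` and `solve_homogeneous` are hypothetical, columns are indexed from 0 rather than 1, and the matrix is assumed to be already in echelon form with every leading entry equal to 1 and all zero rows at the bottom.

```python
from fractions import Fraction

def pivot_columns(B):
    """Column index of the leading non-zero entry of each non-zero row."""
    pivots = []
    for row in B:
        for j, entry in enumerate(row):
            if entry != 0:
                pivots.append(j)
                break
    return pivots

def solve_homogeneous(B, free_values):
    """Solve Bx = 0 for B in echelon form (leading entries 1, assumed).

    free_values maps each non-pivot column index to the value chosen
    for the corresponding unknown; pivot unknowns are then forced.
    """
    n = len(B[0])
    pivots = pivot_columns(B)
    x = [Fraction(0)] * n
    for j, value in free_values.items():
        x[j] = Fraction(value)
    # Start from the last non-zero row and work upwards.
    for i in reversed(range(len(pivots))):
        j_i = pivots[i]
        tail = sum((Fraction(B[i][k]) * x[k] for k in range(j_i + 1, n)),
                   Fraction(0))
        x[j_i] = -tail
    return x

# A small echelon matrix of rank 2 in 5 unknowns (illustrative data).
B = [
    [0, 1, -4, 7, Fraction(-5, 2)],
    [0, 0, 1, -1, Fraction(1, 2)],
    [0, 0, 0, 0, 0],
]
solution = solve_homogeneous(B, {0: 0, 3: 1, 4: 0})
```

Each choice of the $n - r$ free values gives one solution; choosing the free unknowns to be 1 one at a time (and 0 otherwise) would produce a basis of the solution space.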
Having found the general solution of the homogeneous system (**), let us now solve the original system (*), that is, $Ax = b$. Here $b$ is a column vector of length $m$. We think of $b$ as an $m \times 1$ matrix and perform exactly the same row operations on it as on $A$ to reduce (*) to

$$Bx = c \qquad (*')$$

where $B$ is in the echelon form and $c$ is a column vector with entries $c_1, c_2, \ldots, c_m$ (say). More specifically, suppose $B = PA$ where $P$ is a product of a number of matrices of the form $P_1$, $P_2$, $P_3$ corresponding to the row operations R1, R2, R3 respectively. Then $c = Pb$, and (*') is the same as $PAx = Pb$, or equivalently, $P(Ax - b) = 0$. Since $P$ is non-singular, it follows that the solutions of (*) and (*') are identical. So the problem reduces to solving (*'). The corresponding homogeneous system, $Bx = 0$, has already been solved. So we simply have to find any one solution of (*') or else prove that no solution exists. The following proposition answers this question completely.
4.40 Proposition: Suppose the matrix $B$ in (*') has the form in Definition (4.36). Then (*') has a solution iff $c_i = 0$ for all $i > r$.
Proof: If $u_i = (b_{i1}, b_{i2}, \ldots, b_{in})$ is the $i$th row of $B$ and $x$ is a solution of (*') then we have $c_i = b_{i1}x_1 + b_{i2}x_2 + \cdots + b_{in}x_n$. But for $i > r$, $u_i$ is identically 0. So if a solution exists for (*') then $c_i = 0$ for all $i > r$. Conversely, suppose $c_i = 0$ for all $i > r$. Then a solution can be obtained in a manner analogous to that for the homogeneous system. Set $x_j = 0$ for all $j \neq j_1, \ldots, j_r$. For $j = j_1, \ldots, j_r$ determine $x_j$ one by one starting with $j = j_r$. We have,

$$x_{j_r} + b_{r,\,j_r+1}\,x_{j_r+1} + \cdots + b_{rn}\,x_n = c_r.$$

This gives $x_{j_r} = c_r$. Next, we have,

$$x_{j_{r-1}} + b_{r-1,\,j_{r-1}+1}\,x_{j_{r-1}+1} + \cdots + b_{r-1,\,j_r}\,x_{j_r} + \cdots + b_{r-1,\,n}\,x_n = c_{r-1}.$$

This gives $x_{j_{r-1}} + b_{r-1,\,j_r}\,c_r = c_{r-1}$ and determines $x_{j_{r-1}}$. Similarly we determine $x_{j_{r-2}}, \ldots, x_{j_2}$ and finally $x_{j_1}$. This way we get a solution for (*'). ∎
We illustrate this with a numerical example.
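The consistency test and the construction in the proof can be sketched as follows. This is an illustrative sketch under stated assumptions: the name `particular_solution` is hypothetical, indices start at 0, and $B$ is assumed to be in echelon form with leading entries 1 and zero rows at the bottom.

```python
from fractions import Fraction

def particular_solution(B, c):
    """Return one solution of Bx = c, or None if the system is inconsistent.

    Follows the proof: all free unknowns are set to 0 and the pivot
    unknowns are found by back substitution from the last non-zero row.
    """
    n = len(B[0])
    pivots = []
    for row in B:
        lead = next((j for j, e in enumerate(row) if e != 0), None)
        if lead is not None:
            pivots.append(lead)
    r = len(pivots)
    # Solvable iff every entry of c beyond the r-th row is 0.
    if any(c[i] != 0 for i in range(r, len(B))):
        return None
    x = [Fraction(0)] * n
    for i in reversed(range(r)):
        j_i = pivots[i]
        tail = sum((Fraction(B[i][k]) * x[k] for k in range(j_i + 1, n)),
                   Fraction(0))
        x[j_i] = Fraction(c[i]) - tail
    return x

# Illustrative echelon matrix of rank 2 with two right-hand sides.
B = [
    [0, 1, -4, 7, Fraction(-5, 2)],
    [0, 0, 1, -1, Fraction(1, 2)],
    [0, 0, 0, 0, 0],
]
consistent = particular_solution(B, [1, 1, 0])
inconsistent = particular_solution(B, [0, 1, 1])
```

Adding the returned vector to the general solution of the homogeneous system would give the general solution of the full system, as described in the text.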
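We illustrate this with a numerical example.
4.41 Problem: Find the general solutions of the following systems in the real unknowns $x_1, x_2, x_3, x_4, x_5$.

$$
\text{(i)}\quad
\begin{aligned}
2x_3 - 2x_4 + x_5 &= 2\\
2x_2 - 8x_3 + 14x_4 - 5x_5 &= 2\\
x_2 + 3x_3 + x_5 &= 8
\end{aligned}
\qquad\qquad
\text{(ii)}\quad
\begin{aligned}
2x_3 - 2x_4 + x_5 &= 2\\
2x_2 - 8x_3 + 14x_4 - 5x_5 &= 0\\
x_2 + 3x_3 + x_5 &= 8
\end{aligned}
$$

Solution: (i) We write the system in the form $Ax = \begin{pmatrix} 2\\ 2\\ 8 \end{pmatrix}$ where $A$ is the matrix $\begin{pmatrix} 0 & 0 & 2 & -2 & 1\\ 0 & 2 & -8 & 14 & -5\\ 0 & 1 & 3 & 0 & 1 \end{pmatrix}$. We reduce $A$ to a matrix in the echelon form through the following sequence of matrices. We also indicate the row operation by which each matrix is obtained from the preceding one.

$$\begin{pmatrix} 0 & 0 & 2 & -2 & 1\\ 0 & 2 & -8 & 14 & -5\\ 0 & 1 & 3 & 0 & 1 \end{pmatrix} \quad \text{(given matrix } A\text{)}$$

$$\begin{pmatrix} 0 & 2 & -8 & 14 & -5\\ 0 & 0 & 2 & -2 & 1\\ 0 & 1 & 3 & 0 & 1 \end{pmatrix} \quad \text{(interchange of first two rows)}$$

$$\begin{pmatrix} 0 & 1 & -4 & 7 & -5/2\\ 0 & 0 & 2 & -2 & 1\\ 0 & 1 & 3 & 0 & 1 \end{pmatrix} \quad \text{(multiplying first row by } 1/2\text{)}$$

$$\begin{pmatrix} 0 & 1 & -4 & 7 & -5/2\\ 0 & 0 & 2 & -2 & 1\\ 0 & 0 & 7 & -7 & 7/2 \end{pmatrix} \quad \text{(subtracting 1 times the first row from the third)}$$

$$\begin{pmatrix} 0 & 1 & -4 & 7 & -5/2\\ 0 & 0 & 1 & -1 & 1/2\\ 0 & 0 & 7 & -7 & 7/2 \end{pmatrix} \quad \text{(multiplying the second row by } 1/2\text{)}$$

$$\begin{pmatrix} 0 & 1 & -4 & 7 & -5/2\\ 0 & 0 & 1 & -1 & 1/2\\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \quad \text{(subtracting 7 times the second row from the third row)}$$

This is in the echelon form. So this is the matrix $B$. Here $r = 2$, $j_1 = 2$ and $j_2 = 3$. So, the general solution of the homogeneous system is obtained by setting $x_1 = \lambda_1$, $x_4 = \lambda_4$ and $x_5 = \lambda_5$ arbitrarily and solving for $x_3$ and then for $x_2$. We have $x_3 - x_4 + \tfrac{1}{2}x_5 = 0$ from the second row. This gives $x_3 = \lambda_4 - \tfrac{1}{2}\lambda_5$. Also from the first row, $x_2 - 4x_3 + 7x_4 - \tfrac{5}{2}x_5 = 0$, giving $x_2 = 4x_3 - 7x_4 + \tfrac{5}{2}x_5 = 4(\lambda_4 - \tfrac{1}{2}\lambda_5) - 7\lambda_4 + \tfrac{5}{2}\lambda_5 = -3\lambda_4 + \tfrac{1}{2}\lambda_5$. Hence the general solution of the homogeneous system is $x_1 = \lambda_1$, $x_2 = -3\lambda_4 + \tfrac{1}{2}\lambda_5$, $x_3 = \lambda_4 - \tfrac{1}{2}\lambda_5$, $x_4 = \lambda_4$ and $x_5 = \lambda_5$ where $\lambda_1, \lambda_4, \lambda_5$ are arbitrary constants.
Now to solve (i), we apply the same row reduction to $\begin{pmatrix} 2\\ 2\\ 8 \end{pmatrix}$ and get successively, $\begin{pmatrix} 2\\ 2\\ 8 \end{pmatrix}$, $\begin{pmatrix} 1\\ 2\\ 8 \end{pmatrix}$, $\begin{pmatrix} 1\\ 2\\ 7 \end{pmatrix}$, $\begin{pmatrix} 1\\ 1\\ 7 \end{pmatrix}$ and finally $\begin{pmatrix} 1\\ 1\\ 0 \end{pmatrix}$. In this column vector all the entries after the 2nd row (note, $r = 2$) are 0. So a solution exists. To obtain a particular solution, we set $x_1 = x_4 = x_5 = 0$ and solve for $x_3$ and then for $x_2$. Once again, the second row of $B$ gives $x_3 - x_4 + \tfrac{1}{2}x_5 = 1$. Hence $x_3 = 1$. Next, the first row of $B$ gives $x_2 - 4x_3 + 7x_4 - \tfrac{5}{2}x_5 = 1$ which implies $x_2 = 5$. This gives a particular solution of (i). By adding it to the general solution of the homogeneous system, we get the general solution of (i) as

$$
\begin{pmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5 \end{pmatrix}
=
\begin{pmatrix} \lambda_1\\ -3\lambda_4 + \tfrac{1}{2}\lambda_5 + 5\\ \lambda_4 - \tfrac{1}{2}\lambda_5 + 1\\ \lambda_4\\ \lambda_5 \end{pmatrix}
$$

where $\lambda_1$, $\lambda_4$ and $\lambda_5$ are arbitrary constants, ranging over $\mathbb{R}$, the real field.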
(ii) Here the coefficient matrix is the same as in (i). So it reduces to the same matrix $B$ in the echelon form. When we apply the same row operations to $\begin{pmatrix} 2\\ 0\\ 8 \end{pmatrix}$, it reduces, successively, to $\begin{pmatrix} 0\\ 2\\ 8 \end{pmatrix}$, $\begin{pmatrix} 0\\ 2\\ 8 \end{pmatrix}$, $\begin{pmatrix} 0\\ 2\\ 8 \end{pmatrix}$, $\begin{pmatrix} 0\\ 1\\ 8 \end{pmatrix}$ and finally to $\begin{pmatrix} 0\\ 1\\ 1 \end{pmatrix}$. Here there is a non-zero entry in the third row. Since $3 > 2 = $ rank of $A$, by Proposition (4.40), the system has no solution. Such a system is therefore called inconsistent. ∎
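The two computations of Problem (4.41) can be checked mechanically. The sketch below is not from the text: the function name `echelon` is hypothetical, and it reduces the augmented matrix $[A : b]$ in one pass rather than reducing $A$ and $b$ separately, which performs the same row operations on both. Exact rational arithmetic is used so that entries such as $-5/2$ are not rounded.

```python
from fractions import Fraction

def echelon(rows):
    """Reduce a matrix to echelon form with leading entries equal to 1."""
    M = [[Fraction(e) for e in row] for row in rows]
    m, n = len(M), len(M[0])
    lead_row = 0
    for col in range(n):
        pivot = next((i for i in range(lead_row, m) if M[i][col] != 0), None)
        if pivot is None:
            continue
        # Interchange rows so the pivot sits in the current leading row.
        M[lead_row], M[pivot] = M[pivot], M[lead_row]
        # Scale the leading row so that its first non-zero entry is 1.
        p = M[lead_row][col]
        M[lead_row] = [e / p for e in M[lead_row]]
        # Subtract multiples of the leading row from the rows below it.
        for i in range(lead_row + 1, m):
            f = M[i][col]
            M[i] = [a - f * b for a, b in zip(M[i], M[lead_row])]
        lead_row += 1
        if lead_row == m:
            break
    return M

A = [[0, 0, 2, -2, 1], [0, 2, -8, 14, -5], [0, 1, 3, 0, 1]]
reduced_i = echelon([row + [b] for row, b in zip(A, [2, 2, 8])])
reduced_ii = echelon([row + [b] for row, b in zip(A, [2, 0, 8])])
```

In `reduced_i` the last row is entirely zero, so system (i) is consistent; in `reduced_ii` the last row is zero except in the final (right-hand side) column, which is the inconsistency found in (ii).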
To conclude we show how the method of row reduction can be used to find the inverse (if any) of a square $n \times n$ matrix $A$. Let $X$ be the (unknown) inverse of $A$. Here $X$ is an $n \times n$ matrix. Finding $X$ is equivalent to solving the equation $AX = I_n$. For each $j$, let $x_j$ be the $j$th column of $X$ and let $e_j$ be the column vector of length $n$ in which only the $j$th entry is 1 and the other entries are all 0. Then solving for $X$ really amounts to solving $n$ systems of linear equations, namely, $Ax_j = e_j$ for $j = 1, \ldots, n$. These systems are independent of each other. But since the coefficient matrix is the same, namely $A$, a lot of work is common. So we apply row reduction to both the sides of $AX = I_n$ and get a non-singular matrix $P$ such that $PAX = PI_n = P$, with $PA = B$ in the echelon form. If $r = $ rank of $B < n$, then there will be at least one $j$ for which the equation $Ax_j = e_j$ cannot be solved. If $r = n$, then $B$ will be of the form

$$
\begin{pmatrix}
1 & b_{12} & \cdots & b_{1n}\\
0 & 1 & \cdots & b_{2n}\\
\vdots & & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{pmatrix}
$$

where all entries below the diagonal are 0. Now for each $j$, we determine, successively, $x_{nj}, x_{(n-1)j}, \ldots, x_{2j}, x_{1j}$ from the system of equations

$$
B \begin{pmatrix} x_{1j}\\ x_{2j}\\ \vdots\\ x_{nj} \end{pmatrix} = Pe_j.
$$

The technique of determining $x_{j_r}, x_{j_{r-1}}, \ldots$ and finally $x_{j_1}$ from (*') is called back substitution. A slightly different approach will be indicated in Exercise (4.40).
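The procedure for the inverse can be sketched as below. This is a minimal illustration rather than a definitive implementation: the function name `inverse` is hypothetical, the identity matrix is carried alongside $A$ so that its right-hand block becomes $P$, and each column of $X$ is then found by back substitution from $Bx_j = Pe_j$.

```python
from fractions import Fraction

def inverse(A):
    """Inverse of a square matrix by row reduction and back substitution.

    Returns None when the rank is less than n (no inverse exists).
    """
    n = len(A)
    # Augmented matrix [A : I_n]; after reduction it is [B : P].
    M = [[Fraction(e) for e in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if M[i][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [e / p for e in M[col]]
        for i in range(col + 1, n):
            f = M[i][col]
            M[i] = [a - f * b for a, b in zip(M[i], M[col])]
    # Back substitution, one column of X at a time.
    X = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        for i in reversed(range(n)):
            tail = sum((M[i][k] * X[k][j] for k in range(i + 1, n)),
                       Fraction(0))
            X[i][j] = M[i][n + j] - tail
    return X

X = inverse([[2, 1], [3, 0]])
singular = inverse([[1, 2], [2, 4]])
```

Because the forward elimination is shared by all $n$ right-hand sides, only the cheap back substitution is repeated per column, which is the saving the text points out.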
Exercises
4.1 Prove Proposition (4.4).
4.2 A matrix $A = (a_{ij})$ is called symmetric if $a_{ij} = a_{ji}$ for all $i, j$. It is called skew-symmetric if $a_{ij} = -a_{ji}$ for all $i, j$. Obviously such matrices must be square matrices. Prove that a matrix $A$ is symmetric iff $A = A'$, the transpose of $A$, and that it is skew-symmetric iff $A = -A'$. For any square matrix $A$, prove that $A + A'$ is symmetric and $A - A'$ is skew-symmetric.
4.3 Prove that every square matrix over a field of characteristic different from 2 can be expressed as a sum of two matrices one of which is symmetric and the other skew-symmetric.
4.4 Prove that the property of the basis in Theorem (4.5) actually characterises a basis. In other words, suppose $B$ is a subset of a finite-dimensional vector space $V$ with the property that for every vector space $W$ and for every function $f: B \to W$, there exists a unique linear transformation $T: V \to W$ which extends $f$. Prove that $B$ is a basis for $V$.
4.5 Prove that two vector spaces over a field are isomorphic iff they have the same dimension. In particular prove that every vector space is isomorphic to its dual.
*4.6 Let $V$ be a vector space over $F$, $V^*$ its dual and $V^{**}$ its double dual (i.e., $V^{**}$ is the dual of $V^*$). For each $v \in V$, define $e_v: V^* \to F$ by $e_v(T) = T(v)$. This function $e_v$ is called, quite appropriately, the evaluation at $v$. Prove that $e_v$ is a linear transformation and hence $e_v \in V^{**}$. Define $\theta: V \to V^{**}$ by $\theta(v) = e_v$ for $v \in V$. Prove that $\theta$ is a vector space isomorphism.
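4.7 Define $T: \mathbb{R}^3 \to \mathbb{R}^3$ by $T\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix} = \begin{pmatrix} 2x_1 - x_2\\ x_1 + x_2 + x_3\\ x_1 - x_3 \end{pmatrix}$. Find the matrix of $T$ w.r.t. the ordered basis $\left( \begin{pmatrix} 1\\ 0\\ -1 \end{pmatrix}, \begin{pmatrix} 2\\ 1\\ 3 \end{pmatrix}, \begin{pmatrix} 1\\ -1\\ 1 \end{pmatrix} \right)$. Find the rank of $T$.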
4.8 Let $V$ be the set of all polynomials over a field $F$, of degree $\leq 4$, including the zero polynomial. Define $D: V \to V$ by $D(f(x)) = f'(x)$, the formal derivative of $f(x)$ (see Exercise (1.20)). Prove that $D$ is a linear transformation. (This is often expressed by saying that differentiation is a linear operator.) Find the matrix of $D$ w.r.t. the ordered basis $(1, x, x^2, x^3, x^4)$ and also w.r.t. some other ordered basis.
4.9 Let $\alpha, \beta, \gamma, \delta$ be elements of a field $F$. Define $T: M_2(F) \to M_2(F)$ by $T(A) = \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix} A$ for $A \in M_2(F)$. Prove that $T$ is a linear transformation and find its matrix w.r.t. the ordered basis
((3, EH: QM? 3H: 2’»-
4.10 Let $\sigma \in S_n$, i.e. $\sigma$ is a permutation of the set $\{1, 2, \ldots, n\}$. Let $(v_1, \ldots, v_n)$ be an ordered basis for a vector space $V$. Let $T_\sigma$ be the unique linear transformation which takes $v_i$ to $v_{\sigma(i)}$ for $i = 1, \ldots, n$. Let $P_\sigma$ be the matrix of $T_\sigma$ w.r.t. the ordered basis $(v_1, \ldots, v_n)$. $P_\sigma$ is called the permutation matrix of $\sigma$. Prove that in every row and column of $P_\sigma$ there is exactly one 1 and all remaining entries are 0. If $\sigma, \tau \in S_n$, prove that $P_{\sigma\tau} = P_\sigma P_\tau$ and $P_{\sigma^{-1}} = (P_\sigma)^{-1}$. Show that every $n \times n$ matrix with exactly one 1 in each row and column and zeros elsewhere equals $P_\sigma$ for a unique $\sigma \in S_n$. Finally, prove that $\det(P_\sigma) = (-1)^{\sigma}$.
4.11 Let $F$ be a field and $n$ a positive integer. Let $GL(n, F)$ and $SL(n, F)$ be respectively the sets $\{A \in M_n(F) : A \text{ is non-singular}\}$ and $\{A \in M_n(F) : \det(A) = 1\}$. Prove that:
(i) Both $GL(n, F)$ and $SL(n, F)$ are groups under matrix multiplication.
(ii) $SL(n, F)$ is a normal subgroup of $GL(n, F)$ and the quotient group is isomorphic to the multiplicative group $F^*$, i.e., the group of non-zero elements of $F$ under multiplication. (Hint: Note that the determinant function gives a group homomorphism from $GL(n, F)$ to $F^*$.)
(iii) If $P: S_n \to GL(n, F)$ is defined by $P(\sigma) = P_\sigma$, then $P$ is a monomorphism.
4.12 Given a finite group $G$ and a field $F$, prove that for some integer $n$, $G$ is isomorphic to a subgroup of $GL(n, F)$. (Theorem (4.14) shows how matrices provide a concrete representation of a field extension. This exercise shows how every finite group can be represented as a group of matrices.)
4.13 A square matrix is called upper triangular (or simply triangular) if all the entries below its diagonal are 0, i.e. $a_{ij} = 0$ for $j < i$. A lower triangular matrix is defined dually (or using the transpose). Prove that if $A = (a_{ij})$ is an $n \times n$ triangular matrix, then $\det(A) = a_{11}a_{22}\cdots a_{nn}$.
4.14 Prove that when a scalar multiple of a row (or column) of a square matrix is added to some other row (or column), its determinant is unchanged.
4.15 Generalise Theorem (4.26) as follows. Let $A$ be an $m \times n$ matrix with $m \leq n$. Let $B$ be an $n \times m$ matrix. Then $AB$ is an $m \times m$ matrix. If $j_1, \ldots, j_m$ are integers with $1 \leq j_1 < j_2 < \cdots < j_m \leq n$, let $A_{j_1, \ldots, j_m}$ denote the $m \times m$ submatrix of $A$ formed by its $j_1$th, $j_2$th, ..., and $j_m$th columns and let $B_{j_1, \ldots, j_m}$ denote the $m \times m$ submatrix of $B$ formed by its $j_1$th, ..., $j_m$th rows. Prove that

$$\det(AB) = \sum_{1 \leq j_1 < j_2 < \cdots < j_m \leq n} \det(A_{j_1, \ldots, j_m}) \det(B_{j_1, \ldots, j_m}).$$

4.16 Prove Cramer's rule. Suppose $A$ is a non-singular $n \times n$ matrix. Then show that for every column vector $b$ of length $n$, the equation $Ax = b$ has a unique solution for the column vector $x$, and further that this solution is given by $x_i = \dfrac{\det(A_i)}{\det(A)}$ where $A_i$ is the matrix obtained from $A$ by replacing its $i$th column by the column vector $b$.
4.17 The trace of an $n \times n$ matrix $A$ is defined as the sum of its diagonal elements, that is, as $\sum_{i=1}^{n} a_{ii}$. If $A$, $B$ are two $n \times n$ matrices prove that (i) trace $(A)$ = trace $(A')$, (ii) trace $(\lambda A + \mu B) = \lambda$ trace $(A) + \mu$ trace $(B)$ for any scalars $\lambda, \mu$ and (iii) trace $(AB)$ = trace $(BA)$.
4.18 Prove that the trace is a similarity invariant. (Hint: In (iii) in the last exercise, let $A = CD$ and $B = C^{-1}$.)
4.19 Let $T: V \to V$ be a linear transformation where $V$ is a vector space over a field $F$. An element $\lambda \in F$ is called an eigenvalue of $T$ if there exists a non-zero vector $v \in V$ such that $Tv = \lambda v$. Any such vector is called an eigenvector of $T$, corresponding to $\lambda$. Prove that all the eigenvectors of $T$ corresponding to $\lambda$, along with the zero vector, form a subspace $V_\lambda$ of $V$, called the eigenspace corresponding to $\lambda$. If $V$ has a basis consisting of eigenvectors of $T$ (not necessarily belonging to the same eigenvalue), prove that the matrix of $T$ w.r.t. this basis is diagonal.
4.20 Define $T: \mathbb{R}^2 \to \mathbb{R}^2$ by $T\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = \begin{pmatrix} 2 & 1\\ 3 & 0 \end{pmatrix} \begin{pmatrix} x_1\\ x_2 \end{pmatrix}$. Prove that $T$ has two eigenvalues $-1$ and $3$. Find the corresponding eigenspaces.
4.21 Let $V$ be the set of all functions $f: \mathbb{R} \to \mathbb{R}$ which have derivatives of all orders. Prove that $V$ is a vector space over $\mathbb{R}$ under pointwise addition and scalar multiplication. Prove that the function $D: V \to V$ defined by $D(f) = f'$, the derivative function of $f$, is a linear transformation. Prove that every real number is an eigenvalue of $D$. If $\lambda \in \mathbb{R}$, find the eigenvectors corresponding to $\lambda$. (Such eigenvectors are often called eigenfunctions. This term appears frequently in the solution of linear differential equations.)
4.22 Let $T: V \to V$ be a linear transformation where $V$ is an $n$-dimensional vector space over a field $F$. Let $A$ be the matrix of $T$ w.r.t. some fixed ordered basis for $V$. Prove that an element $\lambda \in F$ is an eigenvalue of $T$ if and only if the matrix $A - \lambda$ (i.e. the matrix $A - \lambda I_n$) is singular. (Quite often, $\lambda$ is called an eigenvalue of $A$.)
4.23 Let $A$ be an $n \times n$ matrix over a field $F$. Then $xI_n - A$ is an $n \times n$ matrix over the polynomial ring $F[x]$. Prove that the determinant of this matrix is a monic polynomial of degree $n$ in $F[x]$. This polynomial is called the characteristic polynomial of $A$. Prove that an element $\lambda \in F$ is an eigenvalue of $A$ iff it is a root of the characteristic polynomial of $A$. Verify this for Exercise (4.20) by showing that the characteristic polynomial of the matrix $\begin{pmatrix} 2 & 1\\ 3 & 0 \end{pmatrix}$ is $x^2 - 2x - 3$.
4.24 Prove that a linear transformation of an $n$-dimensional vector space into itself cannot have more than $n$ distinct eigenvalues.
*4.25 Let $f(x)$ be a monic polynomial of degree $n$ over a field $F$ and let $A$ be its companion matrix. Prove that the characteristic polynomial of $A$ is $f(x)$.
**4.26 Prove that every matrix is a root of its characteristic polynomial.
4.27 Prove that the characteristic polynomial is a similarity invariant. (Hence if $T$ is a linear transformation of a finite dimensional vector space into itself, we can unambiguously define its characteristic polynomial.)
*4.28 Suppose a linear transformation $T$ of an $n$-dimensional vector space $V$ into itself has $n$ distinct eigenvalues $\lambda_1, \ldots, \lambda_n$ with $v_1, \ldots, v_n$ as corresponding eigenvectors. Show that $\{v_1, \ldots, v_n\}$ is a basis for $V$, using induction on $n$. Then deduce that:
(i) the characteristic polynomial of $T$ is $(x - \lambda_1)(x - \lambda_2)\cdots(x - \lambda_n)$.
(ii) the trace of $T$ is $\lambda_1 + \lambda_2 + \cdots + \lambda_n$.
(iii) the determinant of $T$ is $\lambda_1\lambda_2\cdots\lambda_n$.
4.29 If $A$, $B$ are $m \times n$ and $n \times p$ matrices, prove that the rank of $AB$ can exceed neither the rank of $A$ nor that of $B$. (cf. Exercise (3.16).) Prove however that if $m = n$ and $A$ is non-singular then rank $(AB)$ = rank $(B)$. [Hint: Apply Exercise (3.16) to the product $A^{-1}(AB)$, to get rank $(B) \leq$ rank $(AB)$.] Similarly if $p = n$ and $B$ is non-singular then rank $(AB)$ = rank $(A)$. Deduce that the rank of a linear transformation equals the rank of its matrix w.r.t. any ordered bases for its domain and codomain.
4.30 For an $n \times n$ matrix $A$ prove that the following statements are equivalent:
(1) $A$ is non-singular
(2) For every $n \times p$ matrix $B$, rank $(AB)$ = rank $(B)$
(3) For every $m \times n$ matrix $B$, rank $(BA)$ = rank $(B)$.
(This gives two more characterisations of non-singular matrices.)
4.31 Using the last exercise, prove that the row operations R1, R2 and R3 do not affect the rank of a matrix.
4.32 Consider the system (*) (that is, $Ax = b$) of $m$ linear equations in $n$ unknowns. Let $[A : b]$ be the $m \times (n+1)$ matrix obtained by adding the column vector $b$ as one more column to $A$. Prove that (*) has a solution iff the rank of $A$ is the same as that of the matrix $[A : b]$. (This is essentially a paraphrase of Proposition (4.40). The matrix $[A : b]$ is often called the augmented matrix of (*).)
4.33 If $a_1, \ldots, a_n$ are elements of a commutative ring $R$ with identity then the determinant of the matrix

$$
\begin{pmatrix}
1 & 1 & \cdots & 1\\
a_1 & a_2 & \cdots & a_n\\
a_1^2 & a_2^2 & \cdots & a_n^2\\
\vdots & \vdots & & \vdots\\
a_1^{n-1} & a_2^{n-1} & \cdots & a_n^{n-1}
\end{pmatrix}
$$

is called a Vandermonde determinant. Prove that this determinant equals the product $\prod_{1 \leq i < j \leq n} (a_j - a_i)$ and that it is non-zero when $a_1, a_2, \ldots, a_n$ are distinct elements of a field. (Hint: From each row, subtract $a_1$ times the preceding row, starting from the last row.)
4.34 A matrix of the form

$$
\begin{pmatrix}
1 & 1/2 & 1/3 & \cdots & 1/n\\
1/2 & 1/3 & 1/4 & \cdots & 1/(n+1)\\
\vdots & \vdots & \vdots & & \vdots\\
1/n & 1/(n+1) & 1/(n+2) & \cdots & 1/(2n-1)
\end{pmatrix}
$$

is called a Hilbert matrix. Using the method of row reduction find its inverse for $n = 3$ and $4$. Can you guess a formula for its inverse for the general case?
4.35 Let $V$ be a vector space of dimension $n$ over a field $F$. Suppose $f: V \times V \to F$ is a bilinear function (also called a bilinear form, see Exercise (3.25) for definition). Let $(v_1, \ldots, v_n)$ be an ordered basis for $V$. Let $A$ be the $n \times n$ matrix $(a_{ij})$ where $a_{ij} = f(v_i, v_j)$. Prove that $f$ is uniquely determined by $A$. $A$ is called the matrix of the bilinear form $f$ w.r.t. the ordered basis $(v_1, \ldots, v_n)$. If $f: F^n \times F^n \to F$ is defined by $f((x_1, \ldots, x_n), (y_1, \ldots, y_n)) = \sum_{i=1}^{n} x_i y_i$, prove that the matrix of $f$ w.r.t. the standard basis is $I_n$. How are the matrices of the same bilinear form w.r.t. two different bases related to each other?
4.36 Suppose $f(x) = a_0 + a_1x + \cdots + a_nx^n$ is a primitive polynomial with integer coefficients. Suppose there is a prime number $p$ such that $p$ divides $a_0, a_1, \ldots, a_{n-1}$ but $p$ does not divide $a_n$ and moreover $p^2$ does not divide $a_0$. Prove that $f(x)$ is irreducible in $\mathbb{Z}[x]$. (Hint: Assume a factorisation and check divisibility by $p$ of the coefficients of the factors, using Corollary (2.18) to get a contradiction.)
4.37 In the exercise above prove that $f(x)$ is, in fact, irreducible over $\mathbb{Q}[x]$. (Hint: Apply Exercise (2.30).)
4.38 If $p$ is a prime number prove that the polynomial $x^{p-1} + x^{p-2} + \cdots + x + 1$ is irreducible in $\mathbb{Z}[x]$ and hence in $\mathbb{Q}[x]$. (Hint: Substitute $x = y + 1$ and show that the resulting polynomial in $y$ is irreducible. Then deduce that the original polynomial is also irreducible.)
4.39 An $m \times n$ matrix $A = (a_{ij})$ over a field $F$ is said to be partitioned if there exist positive integers $p$, $q$ with $p < m$, $q < n$ such that $a_{ij} = 0$ for (i) all $1 \leq i \leq p$, $q < j \leq n$ and (ii) all $p < i \leq m$, $1 \leq j \leq q$. Equivalently, $A$ has the form $\begin{pmatrix} B & 0\\ 0 & C \end{pmatrix}$ where $B$, $C$ are $p \times q$ and $(m-p) \times (n-q)$ matrices respectively and all other entries are 0. Prove that
(i) rank $(A)$ = rank $(B)$ + rank $(C)$
(ii) if $m = n$ and $p = q$ then $\det(A) = \det(B)\det(C)$.
4.40 In reducing the system of equations (*) to (*'), at various stages multiples of the $i$th row ($i = 1, \ldots, r$) were subtracted only from the subsequent rows. Suppose we subtract suitable multiples of the $i$th row from the earlier rows as well so that the $j_i$th column has a 1 in the $i$th row and 0 everywhere else. Show that from the resulting system, the general solution can be written down by mere inspection. Do Problem (4.41) this way.
Notes and Guide to Literature
The enormous length of this section should not come as a surprise. Apart from the monstrous notations needed for matrices, the topic of linear algebra is vast enough to occupy not only a whole chapter but a whole book. Our treatment constitutes only a glimpse at some important aspects of it. For more on canonical forms, eigenvalues etc. see Herstein [1].
We have discussed vector spaces over a general field $F$. When $F$ is the field of real (or complex) numbers, an additional structure can be imposed through the concept of an inner product, as noted in the last section. Among all linear transformations from one inner product space to another, if we take those which behave 'nicely' w.r.t. the inner product, we can prove deeper results about them. Among such transformations are the so-called unitary and Hermitian transformations. See again Herstein [1].
The construction of the isomorphism $\theta$ in Exercise (4.6) may appear rather contrived. But it has a profound significance. The mere existence of an isomorphism between $V$ and $V^{**}$ follows from Exercise (3.5), since both have the same dimension. But such an isomorphism would depend on the choice of particular bases for $V$ and $V^{**}$. The isomorphism $\theta$, on the other hand, does not depend on the choice of any particular basis for $V$. For this reason it is called the canonical isomorphism from $V$ to $V^{**}$. Elements of $V^*$ are linear functions on $V$, while those of $V^{**}$ are linear functionals on $V^*$. Identifying $V$ with $V^{**}$ (through $\theta$) amounts to interchanging the roles of functions and their arguments. Intuitively, we are recovering $V$ from $V^*$ by taking linear functionals on $V^*$. This technique is used in many branches of mathematics. To cite one instance, the Stone representation theorem for Boolean algebras is proved using this technique.
The result of Exercise (4.15) is called the Cauchy-Binet theorem. It is useful in the enumeration of certain graphs. (See the Epilogue.)
Exercise (4.26) is the famous Cayley-Hamilton theorem. A truly instructive proof of it requires a good dip into canonical forms. But a slick, elementary proof using only properties of determinants over the ring of polynomials is also possible. For such a proof and a generalisation see Greenberg [1].
Determinants are among the most classical algebraic forms and have been heavily studied. It is not our intention to give a drill in the manipulation of determinants. By way of sample we have mentioned only the Vandermonde determinant and the Hilbert matrix. For more exercises on determinants see Knuth [1], Vol. I.
Exercise (4.36) is called Eisenstein's criterion. It is one of the few criteria available for proving primality of a polynomial.
We recall that in this section we have confined ourselves to finite dimensional vector spaces. Nearly all concepts here can be generalised to infinite dimensional spaces. But the generalisations are far from trivial.
For more extensive coverage of linear algebra and its applications, see Noble [1].
The method given in Exercise (4.40) is called the Gauss-Jordan method.
Seven
Advanced Counting Techniques
One of the most fascinating features of mathematics is the interplay between two apparently unrelated branches of it. In Chapter 1 we stressed the essential difference between discrete and continuous mathematics, namely that in the former there is no limiting process. That is why, in the problems of discrete mathematics, the sets involved are generally finite. Handling infinite sets usually requires some kind of limiting process. We remarked, however, that even in discrete mathematics, an element of the infinite comes from the fact that the set of positive integers is infinite. Most of the counting problems involve an integer variable, often several such variables. For example, in the Shares Problem, $b_n$ is the number of shares during the $n$th year. Here the integer $n$ can take any value 1, 2, 3, .... In any particular instance, we may be interested only in one value of $n$, say $n = 50$. In that case, $b_{50}$ can be computed, in theory, by sheer arithmetical computations, starting from $b_0 = 0$, $b_1 = 1$ and using the recurrence relation $b_{n+2} = b_n + b_{n+1}$ with $n = 0, 1, \ldots, 48$. But this is hardly satisfying. Very often we have to know $b_n$ for several values of $n$ simultaneously. We would also like to know how rapidly $b_n$ grows with $n$. To answer questions like this, we have to study the sequence $\{b_n\}_{n=0}^{\infty}$ as a whole. We would also like to 'solve' the recurrence relation $b_n = b_{n-1} + b_{n-2}$, i.e., get the value of $b_n$ directly from $n$, without having to know $b_{n-1}$, $b_{n-2}$.
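The "sheer arithmetical computation" mentioned above is easy to sketch. The function name `shares` is hypothetical, and the initial values $b_0 = 0$, $b_1 = 1$ and the recurrence $b_{n+2} = b_n + b_{n+1}$ are taken as read from the passage; they should be treated as assumptions about the Shares Problem rather than a restatement of it.

```python
def shares(n):
    """Compute b_n step by step from b_0 = 0, b_1 = 1 (assumed initial values)."""
    previous, current = 0, 1  # b_0, b_1
    for _ in range(n):
        # Advance one step: (b_k, b_{k+1}) -> (b_{k+1}, b_k + b_{k+1}).
        previous, current = current, previous + current
    return previous

b_50 = shares(50)
```

The loop needs every earlier term to reach $b_{50}$, which is exactly the dependence on $b_{n-1}$ and $b_{n-2}$ that a closed-form solution of the recurrence would remove.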
Generating functions provide a means of studying sequences of real (or complex) numbers and of solving recurrence relations. Given a sequence, say $\{a_n\}_{n=0}^{\infty}$, we define its ordinary generating function as $\sum_{n=0}^{\infty} a_nx^n = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n + \cdots$. This definition is meaningless unless $x$ is specified. We could let $x$ be simply an indeterminate. In that case, the generating function is just a formal power series in $x$, discussed in Example 5 of Chapter 6, Section 1, where we defined the ring of formal power series. This approach has limited utility because the properties of the formal series have to be derived first and doing this may entail the same amount of work as proving results directly about the given sequence $\{a_n\}$ in the first place. There is nevertheless some notational simplification and sometimes seeing things in a familiar notation can inspire ideas leading to solutions.
A far more fruitful approach is to let $x$ be a real variable (or sometimes a complex variable, in which case it is often denoted by $z$). When this is done, the power series $\sum_{n=0}^{\infty} a_nx^n$ represents a function of the variable $x$ which can range continuously over the entire interval of convergence of the power series (or the disc of convergence in the case of a complex variable). This function, which is analytic, can be manipulated, differentiated, integrated and the terms of the original sequence $\{a_n\}$ can be recovered from it if desired. In other words, the rich machinery of continuous mathematics becomes available. Of course, before applying it, we must ensure that it is in fact applicable, that is, the hypotheses of its theorems are satisfied. Usually, this requires establishing the uniform convergence of the series in question. This care is often not exercised in elementary treatises on the subject. In other words, an attempt is made to tap the fruits of the theory of analytic functions without paying the price, namely, adhering to its rigour. This is probably in keeping with the historical development of the subject. It is a fact that as great a mathematician as Euler, who proved numerous identities using generating functions, rarely bothered about the kind of logical discipline that would be expected of a mathematician today. What is truly remarkable is that despite the consequent vulnerability to going wrong, every single result of Euler is true. In other words, even though the method used was formal and lacked in rigour, the results themselves were correct and today they can all be substantiated with rigorous justifications.
The approach we shall take will be a combination of the two, although more inclined towards the naive approach. We shall take $x$ to be a real (sometimes a complex) variable and use facts from the theory of power series. Although we shall omit the theoretical justifications, we shall point out where they are needed and where they can be found. In the first section we define generating functions and initiate their applications. More applications to counting problems (in particular, a solution to the Postage Problem) will be taken up in the second section. The third section deals with the methods of solving recurrence relations, one of which is based on generating functions. In the last section we present a number of problems which are solved by recurrence relations.
1. Generating Functions of Sequences
How do you remember a sequence of letters? If the sequence constitutes a 'meaningful' word like 'MAGIC' or 'INDUCTION', there is little difficulty in remembering and reproducing it. But what if it is a crazy sequence like 'A-S-T-C' or 'A-H-G-H-B-F-G-F-C'?* In such cases, a most frequently used mnemonic device is to form a phrase or sentence the first letters of whose words are the members of the given sequence. For example, in the examples just given, two popular 'long forms' are 'All Silver Tea Cups' and 'A Handsome Guy Having Brave Features Goes For Cocacola'. On the face of it, we are making the problem more complicated because instead of having to remember a sequence of single letters, we now have to remember whole words. Still, these memory tricks work because the phrase or the sentence has some meaning and hence is easier to remember. It is necessary, of course, that this meaning should appeal to us in some way or else we shall have a situation where the cure is worse than the disease. Further, in order that we can recover the original sequence of letters uniquely, the choice of the words should be such that the same meaning cannot be conveyed in more than one way. For example, if we agree that 'John is poor' means the same as 'John is not rich' then there would be an ambiguity as to whether the original sequence of letters was 'J-I-P' or 'J-I-N-R'. If we make a stipulation that the word 'Not' shall not be used, the second possibility is ruled out.
* Readers familiar with elementary trigonometry and coordinate geometry would recognise, of course, that neither sequence is all that crazy. The first is obtained from the first letters of 'All', 'Sine', 'Tangent' and 'Cosine' which describe which of the three trigonometric functions are positive in the various quadrants of the plane. The second sequence gives the entries in a $3 \times 3$ determinant which is crucial in analysing the general quadratic equation $Ax^2 + By^2 + 2Hxy + 2Gx + 2Fy + C = 0$.

The underlying idea behind generating functions of sequences of real numbers is similar, although, of course, they go far deeper than cheap memory aids. We select a family of functions $f_0(x), f_1(x), f_2(x), \ldots$ of a real variable $x$. These functions are called indicator functions. Given a sequence $\{a_n\}_{n=0}^{\infty}$ of real numbers, we form the sum $\sum_{n=0}^{\infty} a_nf_n(x)$. In general this sum is infinite and so the question of its convergence is important. Frequently, the indicator functions are real-valued and the convergence is the usual convergence for an infinite series of real numbers, namely, as the limit of the sequence of its partial sums. In other words

$$\sum_{n=0}^{\infty} a_nf_n(x) = \lim_{m \to \infty} S_m(x) \quad \text{where} \quad S_m(x) = \sum_{n=0}^{m} a_nf_n(x)$$

for $m = 0, 1, 2, \ldots$. This infinite sum represents a function of the variable $x$. Its domain is the set of all values of $x$ for which the series $\sum_{n=0}^{\infty} a_nf_n(x)$ is convergent. This function is called the generating function of the sequence $\{a_n\}$. Let us denote it by $A(x)$.
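The partial sums $S_m(x)$ can be evaluated numerically to see the limiting process at work. The sketch below is illustrative only: the name `partial_sum` is hypothetical, the sequence and the indicator functions are passed in as ordinary Python functions, and the example uses $a_n = 1$ with $f_n(x) = x^n$, for which the series is the geometric series with sum $1/(1 - x)$ when $|x| < 1$.

```python
def partial_sum(a, f, x, m):
    """S_m(x) = sum of a(n) * f(n, x) for n = 0, 1, ..., m."""
    return sum(a(n) * f(n, x) for n in range(m + 1))

def constant_one(n):
    """The sequence a_n = 1 for every n (illustrative choice)."""
    return 1

def power(n, x):
    """The indicator function f_n(x) = x^n (illustrative choice)."""
    return x ** n

s_3 = partial_sum(constant_one, power, 0.5, 3)
s_50 = partial_sum(constant_one, power, 0.5, 50)
```

At $x = 0.5$ the partial sums approach 2, the value of $1/(1 - x)$ there; outside the interval of convergence the same partial sums would grow without bound, which is why the domain of $A(x)$ has to be restricted.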
Sometimes, it is convenient to let the functions $f_n$ have complex numbers as their domains and codomains. In that case, we often denote the generating function by $A(z)$. It is defined as $\sum_{n=0}^{\infty} a_nf_n(z)$, where the convergence involved is that of a series of complex numbers. More generally, we could let the indicator functions take values in any vector space, say $V$, over $\mathbb{R}$ (or $\mathbb{C}$). If we have some sort of a convergence in $V$, then

$$\sum_{n=0}^{\infty} a_nf_n(x)$$

is a vector in $V$ for every $x$ for which the series is convergent. It is an infinite linear combination of the vectors $f_0(x), f_1(x), \ldots$. But most of the time we shall confine ourselves to the case where the functions $f_n(x)$ are real-valued. To spell out the analogy with the mnemonic devices, the original sequence $\{a_n\}_{n=0}^{\infty}$ corresponds to the sequence of letters, the terms of the series $\sum_{n=0}^{\infty} a_nf_n(x)$ are like the words whose first letters are from the sequence, while the sum function $A(x)$ plays the role of the phrase or the sentence formed by these words. Note that the generating function depends as much on the given sequence $\{a_n\}$ as it does on the choice of the indicator functions. We want to choose these indicator functions once and for all. When we do so, we can represent a given sequence $\{a_n\}$ by its generating function.
Now, what would be a good choice of indicator functions? The answer
is dictated by two requirements. First, the generating function A(x) should
be something easier to analyse, something whose properties are known to
us. (In the analogy given above, this corresponds to the phrase or the
sentence formed having a meaning which appeals to us.) Secondly, we should
be able to reconstruct the sequence $\{a_n\}$ if we know its generating function
A(x). In other words, we should be able to 'resolve' A(x) along the
indicator functions. This condition is analogous to, but stronger than, requiring
that the indicator functions be linearly independent as elements of the vector
space of all real valued functions of x. (Linear independence deals only with
finite linear combinations, whereas A(x) is an infinite linear combination of
the indicator functions.) For example, we cannot take $f_0(x) = x^2$, $f_1(x) = 3x$
and $f_2(x) = 2x^2 + 6x$, because then, no matter what the other $f_n$'s are, the
sequences $\{1, 1, 0, 0, 0, \ldots\}$ and $\{0, 0, \tfrac{1}{2}, 0, 0, 0, 0, \ldots\}$ would have the same
generating function and so we cannot uniquely get back to a sequence from
its generating function.
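The failure of uniqueness in this example can be checked mechanically. The following is a minimal sketch, assuming the three indicator functions are $x^2$, $3x$ and $2x^2 + 6x$ and all later terms of both sequences are zero; the helper name `combine` is hypothetical and only serves the illustration.

```python
from fractions import Fraction

def combine(coeffs, funcs, x):
    """Evaluate the finite combination sum of a_n * f_n(x)."""
    return sum(a * f(x) for a, f in zip(coeffs, funcs))

# The ill-chosen indicator functions f_0, f_1, f_2 from the example.
bad = [lambda x: x**2, lambda x: 3 * x, lambda x: 2 * x**2 + 6 * x]

seq1 = [Fraction(1), Fraction(1), Fraction(0)]      # 1, 1, 0, 0, ...
seq2 = [Fraction(0), Fraction(0), Fraction(1, 2)]   # 0, 0, 1/2, 0, ...

# Compare the two "generating functions" exactly at several rational points.
points = [Fraction(k, 3) for k in range(-6, 7)]
same = all(combine(seq1, bad, x) == combine(seq2, bad, x) for x in points)
print(same)
```

Both sequences produce the function $x^2 + 3x$, so the sequence cannot be recovered from it.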
The two requirements just laid down severely restrict the choice of
indicator functions. By far the most standard choice is to take $f_n(x) = x^n$
for n = 0, 1, 2, ... (with the understanding that $f_0(x) = 1$ for all x), or
some variation thereof. This way the generating function A(x) of a sequence
$\{a_n\}$ is simply the power series $\sum_{n=0}^{\infty} a_n x^n$. As such, all the nice theorems about
power series (a few of which will be listed shortly) become available.
Although we shall stick to this choice of the indicator functions in most of
what is to come, we remark that taking $f_n(x) = \cos nx$ or $\sin nx$ is also a
fruitful choice because the generating function so obtained is what is known
as a Fourier series and, next to power series, these are probably the most
thoroughly studied series of functions.
With this rather lengthy introduction we now come to the most basic
definition of this chapter.
1.1 Definition: Let $\{a_n\}_{n=0}^{\infty}$ be a sequence of real numbers. Then its ordinary
generating function (O.G.F.) is defined as $\sum_{n=0}^{\infty} a_n x^n$ and its exponential
generating function (E.G.F.) is defined as $\sum_{n=0}^{\infty} a_n \frac{x^n}{n!}$.
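For example, the O.G.F. of the sequence 1, 1, 1, 1, ... is
$$1 + x + x^2 + \cdots + x^n + \cdots = \sum_{n=0}^{\infty} x^n$$
while its E.G.F. is
$$\sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots.$$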
Most of the time x will be a real variable. However, occasionally we shall
need to assign it complex values, in which case we shall sometimes denote
it by z. As a rule, if the terms of a sequence are denoted by putting suffixes
on a lower case letter, its O.G.F. will be denoted by the corresponding
capital letter. Thus the O.G.F. of $\{a_n\}_{n=0}^{\infty}$ is $\sum_{n=0}^{\infty} a_n x^n = A(x)$, that of $\{b_n\}$
is B(x) and so on. There is no standard notation for the E.G.F. of a sequence.
It is not an essentially new concept, since the E.G.F. of a sequence
$\{a_n\}$ is precisely the O.G.F. of the sequence $\{a_n/n!\}$ and vice versa. The reason
for the word 'exponential' is that the E.G.F. of the sequence 1, 1, 1, ..., 1, ...
is $\sum_{n=0}^{\infty} \frac{x^n}{n!}$, which is precisely the power series expansion of the exponential
function $e^x$. Since we shall refer to the exponential generating function far
less often than the ordinary generating function, by ‘generating function’
we shall mean O.G.F. unless otherwise stated.
We now summarise the basic properties of generating functions.
1.2 Theorem: Let A(x), B(x) and C(x) be respectively the O.G.F.'s of the
sequences $\{a_n\}$, $\{b_n\}$ and $\{c_n\}$. Then
(i) (Uniqueness) A(x) = B(x) if and only if $a_n = b_n$ for all n.
(ii) (Linear Combinations) Suppose $\lambda$, $\mu$ are constants such that
$c_n = \lambda a_n + \mu b_n$ for all n. Then $C(x) = \lambda A(x) + \mu B(x)$.
(iii) (Products) Suppose for every n,
$c_n = a_0 b_n + a_1 b_{n-1} + a_2 b_{n-2} + \cdots + a_{n-1} b_1 + a_n b_0 = \sum_{i=0}^{n} a_i b_{n-i}$.
Then C(x) = A(x) B(x).
(iv) (Differentiation) A(x) is an infinitely differentiable function of x
and its derivatives can be obtained by term-by-term differentiation.
That is,
$$A'(x) = a_1 + 2a_2 x + 3a_3 x^2 + \cdots + n a_n x^{n-1} + \cdots = \sum_{n=1}^{\infty} n a_n x^{n-1}$$
$$A''(x) = 2a_2 + 6a_3 x + \cdots + n(n-1) a_n x^{n-2} + \cdots = \sum_{n=2}^{\infty} n(n-1) a_n x^{n-2}$$
$$A^{(k)}(x) = \sum_{n=k}^{\infty} n(n-1)\cdots(n-k+1)\, a_n x^{n-k}.$$
Moreover, for each k, $a_k = \frac{A^{(k)}(0)}{k!}$ (with the understanding that
$A^{(0)}(x) = A(x)$).
Proof: The meanings of and the arguments needed to prove these
assertions depend on which approach is taken. If we let x be a formal variable
(or an 'indeterminate'), then A(x), B(x) etc. are nothing more than formal
power series. In this approach, the various concepts about power series are
defined in such a way that the assertions (i) to (iv) would come out true.
Thus two series $\sum_{n=0}^{\infty} a_n x^n$ and $\sum_{n=0}^{\infty} b_n x^n$ are defined to be equal if $a_n = b_n$ for
all n. Similarly, their combinations and products are so defined that (ii)
and (iii) hold (cf. Example 5 in Chapter 6, Section 1). Note that the rule for
multiplication is the same as for multiplying polynomials and is based upon
$x^i x^j = x^{i+j}$. Finally, for (iv), the formula for A'(x) is the very definition
of the formal derivative of A(x) (cf. Exercise (6.110)). This derivative,
unlike the derivatives in calculus, is not obtained through any limiting
process. Even though it shares some of the properties of the derivatives in
calculus, they have to be established independently. Theorems of calculus
are of no use here. Note also that the very last statement is meaningless in the
formal approach because once we let x be just a formal variable, we cannot
assign it any value.
In the approach we have taken, however, x is a real variable and A(x),
B(x) etc. are functions of x, defined by the respective power series in x.
Their domains are the respective 'intervals of uniform convergence'
(specifically, the interval of uniform convergence of $\sum_{n=0}^{\infty} a_n x^n$ is (-R, R)
where $R = 1/\limsup_{n \to \infty} |a_n|^{1/n}$, with the understanding that $1/0 = \infty$; R is called
the radius of convergence of the power series) and the statements to be
proved here must be qualified by the phrase 'for all x in the intersection
of the domains'. The proofs are based on properties of power series. (ii)
is true for any series (not just power series). (iii) is also a consequence of
a theorem about products of series. For (iv) to make sense, it is necessary
to assume that the radius of convergence of A(x) is positive. The proof of
(iv) requires uniform convergence of the derived series. For notational
uniformity, it is convenient to rewrite these formulae as
$$A'(x) = \sum_{n=0}^{\infty} (n+1)\, a_{n+1} x^n, \quad A''(x) = \sum_{n=0}^{\infty} (n+1)(n+2)\, a_{n+2} x^n, \ \ldots,$$
$$A^{(k)}(x) = \sum_{n=0}^{\infty} (n+1)(n+2)\cdots(n+k)\, a_{n+k} x^n.$$
These expansions are valid in the interval of convergence of A(x). Since this
interval always contains 0, setting x = 0 we get $A^{(k)}(0) = a_k\, k!$ for all k = 0,
1, 2, .... This shows how we can recover the original sequence from its
O.G.F. In particular it proves (i). We omit the details of these basic theorems
because they belong to continuous mathematics and are available in any
standard treatise of the subject. ∎
In this theorem, we started with sequences of real numbers and studied
properties of their generating functions. For applications, it is convenient
to paraphrase certain parts of this theorem starting from the other end.
That is, we take a function A(x) of x. Assuming that A(x) can be expanded
by a power series in x, say $A(x) = \sum_{n=0}^{\infty} a_n x^n$, it follows that A(x) is the
generating function of a sequence of real numbers, namely the sequence $\{a_n\}_{n=0}^{\infty}$.
Uniqueness guarantees that this sequence is uniquely determined by the
function A(x). This means that the coefficient of each power of x is uniquely
determined. So it makes sense to speak of 'the coefficient of $x^n$ in A(x)' for
every n = 0, 1, 2, .... We shall often use this terminology. (For example,
the coefficient of $x^n$ in $\frac{1}{1-2x}$ is $2^n$.) Part (ii) of Theorem (1.2) can be
paraphrased by saying that if A(x), B(x) admit power series expansions
then for every n = 0, 1, 2, ..., the coefficient of $x^n$ in $\lambda A(x) + \mu B(x)$ is
the corresponding linear combination of the coefficients of $x^n$ in A(x) and
B(x). Similarly (iii) gives a formula for the coefficient of $x^n$ in the product
function A(x)B(x). Part (iv) gives a formula for the coefficient of $x^n$ in A(x)
in terms of the nth derivative of A(x).
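The product rule of Theorem (1.2) (iii) is easy to try out on truncated coefficient lists. The sketch below is only an illustration under stated assumptions: series are cut off after N terms, and the function name `series_mul` is hypothetical. It checks that the coefficients $2^n$ really belong to $1/(1-2x)$ by multiplying them with the coefficients of $1 - 2x$.

```python
def series_mul(a, b):
    """Cauchy product c_k = sum of a_i * b_(k-i), for truncated coefficient lists."""
    n = min(len(a), len(b))
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]

N = 8
geometric = [2**n for n in range(N)]       # claimed coefficients of 1/(1 - 2x)
one_minus_2x = [1, -2] + [0] * (N - 2)     # coefficients of 1 - 2x

# If the claim is right, the product series must be 1 + 0x + 0x^2 + ...
product = series_mul(one_minus_2x, geometric)
print(product)
```

Only the first N coefficients are meaningful here, which is all the uniqueness argument needs for each fixed power of x.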
A crucial question now is how to tell if a given function of x can be
expressed as a power series in x. Part (iv) gives a necessary condition for
this, namely that the function be infinitely differentiable. For real functions,
this condition is not sufficient. A classic counter-example is the function
A : R -> R defined by $A(x) = \exp(-1/x^2) = e^{-1/x^2}$ for $x \neq 0$ and A(0) = 0. Using
properties of growth of exponential functions it is not hard to show that A
is infinitely differentiable and that $A^{(k)}(0) = 0$ for all k = 0, 1, 2, .... So
$\sum_{k=0}^{\infty} \frac{A^{(k)}(0)}{k!} x^k = 0 \neq A(x)$ for $x \neq 0$. Thus A(x) cannot be expanded as a power series
in x. Surprisingly, the situation is much simpler for complex functions.
A well-known theorem of complex analysis asserts that if a complex-valued
function f(z) of a complex variable is differentiable in a neighbourhood of
0, then it can be expanded as a power series in z and that this expansion
is valid in any disc centered at 0 in which f is differentiable. This is why
the theory of complex power series expansions is easier to handle than that
of real power series expansions. The explanation for this apparently
paradoxical situation is that complex differentiability is a much stronger condition
than real differentiability. To stress this fact, complex differentiability is
often called analyticity.
These difficulties need not worry us, however. The real functions that
we shall consider will be expressible by power series since they will
generally be restrictions of complex analytic functions.
A special case of (iii) is worth mentioning separately. Let r be a
non-negative integer. Then the coefficient of $x^n$ in A(x) is the same as the
coefficient of $x^{n+r}$ in $x^r A(x)$. This follows by taking $B(x) = x^r$. In the formal
approach, this follows more easily by multiplying the power series for A(x)
term by term by $x^r$.
Because of this theorem, we can start from the O.G.F.'s of a few
standard sequences and build up those of many others. The next
theorem lists the generating functions of some of the most standard
sequences. Once again we omit the details of the proof because these results
will be found in continuous mathematics except for a slight change of
terminology. Instead of saying that A(x) is the O.G.F. of a sequence $\{a_n\}$,
a continuous mathematician is more apt to say that the power series
expansion of A(x) is $\sum_{n=0}^{\infty} a_n x^n$.
1.3 Theorem: (i) $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots = \sum_{n=0}^{\infty} \frac{x^n}{n!}$ is
valid for all x.
(ii) If n is a positive integer then
$(1+x)^n = 1 + \binom{n}{1} x + \binom{n}{2} x^2 + \cdots + \binom{n}{r} x^r + \cdots + \binom{n}{n-1} x^{n-1} + x^n = \sum_{r=0}^{n} \binom{n}{r} x^r$
is valid for all x.
(iii) For any real n (not necessarily a positive integer) and any positive
integer r, define $\binom{n}{r}$ to be $\frac{n(n-1)\cdots(n-r+1)}{r!}$ (and $\binom{n}{0} = 1$). Then
$(1+x)^n = \sum_{r=0}^{\infty} \binom{n}{r} x^r$ is valid for all x with |x| < 1.
(iv) For all |x| < 1, $\frac{1}{1-x} = 1 + x + x^2 + \cdots + x^n + \cdots$. ∎
Comments: (ii) is a special case of the binomial theorem of algebra (see
Exercise (6.128)). Note that the sum involved here is finite. By analogy
(iii) is called the binomial theorem of calculus. Note, however, that here the
numbers $\binom{n}{r}$ have no combinatorial significance. When n is in fact a
positive integer, $\binom{n}{r}$ in (iii) does coincide with its usual meaning as the
number of ways to select r objects out of n. Unlike (ii), (iii) is valid only
for |x| < 1. (iv) is really a special case of (iii) obtained by taking n = -1
and replacing x by -x. It can also be derived directly from the formula
for the sum of a geometric progression. More generally
$\frac{1}{1-ax} = 1 + ax + a^2x^2 + \cdots + a^n x^n + \cdots$ for all $|x| < \frac{1}{|a|}$, where a is any non-zero
real number. We shall occasionally need the complex version of this
result.
The case of (i) is peculiar. In some treatments, the exponential
function $e^x$ is defined by the power series $\sum_{n=0}^{\infty} \frac{x^n}{n!}$. In that case there is little
to prove in (i) except to show that this series converges for all x. The price
one has to pay is that all the properties of the exponential function
(including the multiplicative property, namely, $e^{x+y} = e^x e^y$) have to be
established from this series. Another approach is to define the exponential
function as the inverse of the natural logarithm. Its properties then follow
from those of the natural logarithm (which is usually defined as $\int_1^x \frac{dt}{t}$) and
we shall use some of these properties. When this approach is taken, (i) is
precisely the infinite Taylor series expansion of $e^x$ at 0.
We now illustrate the applications of generating functions. The
applications to problems of enumeration and to solving recurrence
relations will be relegated to the next two sections. In this section
we show how generating functions can be used to prove certain
combinatorial identities. Generating functions have real variables for their
arguments. Inasmuch as a real variable can assume infinitely many values,
it would appear that we could generate a host of identities by merely
assigning various values to the variable x. For example, by putting $x = \frac{1}{2}$
in (iv) of Theorem (1.3), we get $2 = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots + \frac{1}{2^n} + \cdots$.
Care has to be taken to see that x is given only permissible values, i.e.,
values within the interval of convergence of the particular power series, or
else some very absurd identities would result. For example, if we set x = 2
in (iv) of Theorem (1.3), we get the ridiculous result that $1 + 2 + 4 + 8 + \cdots = -1$.
Actually, it was absurdities like this that convinced
mathematicians of the need to exercise caution in handling infinite series.
If we assign x a permissible value then the resulting identity would of
course be mathematically valid. But that does not mean it will be worth
stating separately. In order for such identities to have appeals of their own
(other than as special cases of more general identities), it is necessary that
at least one of the two sides should have some special significance or some
elegance of form. Take for example, the binomial expansion in (ii) in
Theorem (1.3), namely,
$(1+x)^n = 1 + \binom{n}{1} x + \binom{n}{2} x^2 + \cdots + \binom{n}{n-1} x^{n-1} + x^n$,
which is valid for all x. Putting x = 10 we get the identity
$11^n = 1 + \binom{n}{1} 10 + \binom{n}{2} 100 + \cdots + 10^n$, which has no particular
significance. But if we put x = 1, we get
$2^n = 1 + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n-1} + 1$.
Here the right hand side has a special significance. Since
$\binom{n}{r}$ is the number of r-subsets of a set with n elements, the right hand
side is the total number of subsets of a set with n elements. Thus we get
an alternate proof of Theorem (2.2.15). (Cf. Exercise (2.2.16) (i) where
this identity was to be established combinatorially.) Similarly setting
x = -1, we get $0 = \sum_{r=0}^{n} (-1)^r \binom{n}{r}$, from which it follows that
$\sum_{r\ \mathrm{even}} \binom{n}{r} = \sum_{r\ \mathrm{odd}} \binom{n}{r}$. Since the sum of these two sums equals $2^n$
(as we just saw), each sum equals $2^{n-1}$, which means that the number
of subsets of even cardinalities of a set with n elements is $2^{n-1}$. This is
equivalent to Proposition (2.3.3).
Interesting as this is, a far more frequent use of generating functions
in proving combinatorial identities is through the uniqueness of power
series expansions (Part (i) of Theorem (1.2)). The way this technique works
is as follows. We take the given expression to be simplified (or one of the
sides of the identity to be proved) and, by some algebraic manipulations if
necessary, identify it as the coefficient of some power of x, say of $x^n$, in
some function A(x) of x. We then expand A(x) by power series using a
combination of some of the standard power series expansions. The
coefficient of $x^n$ in this expansion must equal the given expression.
As a typical illustration, we prove the following identity.
1.4 Proposition: For every positive integer n, $\sum_{r=0}^{n} \binom{n}{r}^2 = \binom{2n}{n}$.
Proof: We immediately recognise the right hand side as the coefficient
of $x^n$ in the expansion of $(1+x)^{2n}$. If we could show that the left hand
side equals the same, we would be through. The 'sum of the products'
form of the left hand side suggests that Part (iii) of Theorem (1.2) may be
useful. However, $\binom{n}{r}$ is the coefficient of $x^r$ in $(1+x)^n$ and if we want
the result to come out as the coefficient of $x^n$, then the coefficient of $x^r$
needs to be multiplied by that of $x^{n-r}$. This difficulty can be remedied by
noting that $\binom{n}{r}$ is the same as $\binom{n}{n-r}$. Consequently we have,
$$\sum_{r=0}^{n} \binom{n}{r}^2 = \sum_{r=0}^{n} \binom{n}{r} \binom{n}{n-r}$$
$$= \sum_{r=0}^{n} [\text{coefficient of } x^r \text{ in } (1+x)^n] \times [\text{coefficient of } x^{n-r} \text{ in } (1+x)^n]$$
$$= \text{coefficient of } x^n \text{ in } (1+x)^n (1+x)^n \text{ by Theorem (1.2) (iii).}$$
Since $(1+x)^n (1+x)^n = (1+x)^{2n}$, the result follows. ∎
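The argument can be mirrored numerically. The following sketch (an illustration only; the helper names `poly_mul` and `check_prop_1_4` are hypothetical) multiplies the coefficient list of $(1+x)^n$ by itself and compares the coefficient of $x^n$ with both sides of the identity.

```python
from math import comb

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def check_prop_1_4(n):
    row = [comb(n, r) for r in range(n + 1)]   # coefficients of (1 + x)^n
    square = poly_mul(row, row)                # coefficients of (1 + x)^(2n)
    lhs = sum(c * c for c in row)              # sum of squared binomial coefficients
    return lhs == square[n] == comb(2 * n, n)

print(all(check_prop_1_4(n) for n in range(1, 12)))
```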
As another example we do the following problem.
1.5 Problem: Let n, k, r be positive integers. Evaluate the sum
$$\binom{n}{k} + \binom{n+1}{k} + \cdots + \binom{n+r}{k}.$$
Solution: The given sum has for its summands the coefficients of $x^k$ in
$(1+x)^n, (1+x)^{n+1}, \ldots, (1+x)^{n+r}$. By (ii) in Theorem (1.2), the given sum
equals the coefficient of $x^k$ in $(1+x)^n + (1+x)^{n+1} + \cdots + (1+x)^{n+r}$. This
observation would be of little help unless we have some way of handling
this latter sum. Fortunately, we can do this by noting that it is the sum of
the first r + 1 terms of a geometric progression with common ratio (1 + x)
and first term $(1+x)^n$. Using the well-known summation formula for
geometric progressions, we see that
$$(1+x)^n + \cdots + (1+x)^{n+r} = \frac{(1+x)^{r+1} - 1}{(1+x) - 1} (1+x)^n = \frac{1}{x} \left[ (1+x)^{n+r+1} - (1+x)^n \right].$$
So the given sum equals the coefficient of $x^k$ in $(1/x)[(1+x)^{n+r+1} - (1+x)^n]$,
which is the same as the coefficient of $x^{k+1}$ in $[(1+x)^{n+r+1} - (1+x)^n]$. By
the binomial theorem the coefficients of $x^{k+1}$ in $(1+x)^{n+r+1}$ and $(1+x)^n$
are, respectively, $\binom{n+r+1}{k+1}$ and $\binom{n}{k+1}$ and so by (ii) in Theorem (1.2)
again, their difference equals the coefficient of $x^{k+1}$ in $[(1+x)^{n+r+1} - (1+x)^n]$.
So the given sum equals $\binom{n+r+1}{k+1} - \binom{n}{k+1}$. ∎
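A brute-force comparison over small parameters gives some reassurance about the closed form $\binom{n+r+1}{k+1} - \binom{n}{k+1}$. This is a sketch only; the function names are hypothetical, and `math.comb` returns 0 when the lower index exceeds the upper one, which matches the convention used here.

```python
from math import comb

def direct_sum(n, k, r):
    """C(n, k) + C(n+1, k) + ... + C(n+r, k), term by term."""
    return sum(comb(n + i, k) for i in range(r + 1))

def closed_form(n, k, r):
    """The value obtained from the geometric progression argument."""
    return comb(n + r + 1, k + 1) - comb(n, k + 1)

ok = all(direct_sum(n, k, r) == closed_form(n, k, r)
         for n in range(1, 9) for k in range(1, 9) for r in range(1, 9))
print(ok)
```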
In this problem, all the summands were coefficients of the same power
of x in various functions of x and hence Theorem (1.2) (ii) could be applied
directly. Sometimes when the summands represent coefficients of different
powers of x in various functions, we have to multiply these functions by
suitable powers of x before applying (ii) of Theorem (1.2). We illustrate
this technique in the next problem.
1.6 Problem: Let n, k, r be integers with $0 \le r \le k \le n$. Evaluate the
sum $\binom{n}{k} - \binom{n}{k-1} + \binom{n}{k-2} - \binom{n}{k-3} + \cdots + (-1)^r \binom{n}{k-r}$.
Solution: For i = 0, ..., r, $\binom{n}{k-i}$ is the coefficient of $x^{k-i}$ in $(1+x)^n$,
which is the same as the coefficient of $x^k$ in $x^i (1+x)^n$. So the given sum
equals the coefficient of $x^k$ in $\sum_{i=0}^{r} (-1)^i x^i (1+x)^n$. This is again a geometric
progression with common ratio -x. So
$$\sum_{i=0}^{r} (-1)^i x^i (1+x)^n = \frac{1 - (-1)^{r+1} x^{r+1}}{1+x} (1+x)^n = (1+x)^{n-1} + (-1)^r x^{r+1} (1+x)^{n-1}.$$
The coefficient of $x^k$ in this function is $\binom{n-1}{k} + (-1)^r \binom{n-1}{k-r-1}$. This,
therefore, is the given sum. ∎
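The closed form $\binom{n-1}{k} + (-1)^r \binom{n-1}{k-r-1}$ can be compared with the alternating sum directly. In the sketch below the wrapper `binom` is a hypothetical helper: it returns 0 for out-of-range lower indices, since `math.comb` raises an error for negative arguments.

```python
from math import comb

def binom(n, k):
    """Binomial coefficient with the convention C(n, k) = 0 outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0

def alt_sum(n, k, r):
    """C(n, k) - C(n, k-1) + ... + (-1)^r C(n, k-r)."""
    return sum((-1) ** i * binom(n, k - i) for i in range(r + 1))

def closed(n, k, r):
    return binom(n - 1, k) + (-1) ** r * binom(n - 1, k - r - 1)

ok = all(alt_sum(n, k, r) == closed(n, k, r)
         for n in range(1, 10) for k in range(n + 1) for r in range(k + 1))
print(ok)
```

The case n = r = k gives 0 on both sides, in line with the alternating row sum of binomial coefficients.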
Note that the special case r = 0 gives Proposition (2.2.19), while if we
put n = r = k, and use that $\binom{n}{i}$ equals $\binom{n}{n-i}$, we get
$$\binom{n}{0} - \binom{n}{1} + \binom{n}{2} - \binom{n}{3} + \cdots + (-1)^n \binom{n}{n} = 0,$$
which we obtained above by a different method, i.e. by setting x = -1 in
the binomial expansion of $(1+x)^n$.
So far we used the binomial expansion of $(1+x)^n$. It should be noted
that here x is a dummy variable. As such, we may replace it by any
expression. In particular we may replace x throughout by 2x, x - 3, $x^2$ or by
any other function of x. The same can be done for any other power series in x.
Care has to be taken to see that this function of x takes only those values
which lie in the interval of convergence of the original power series. For
example, replacing x by 2x in (iii) in Theorem (1.3), we get
$(1+2x)^n = \sum_{r=0}^{\infty} \binom{n}{r} (2x)^r = \sum_{r=0}^{\infty} 2^r \binom{n}{r} x^r$. This is valid whenever |2x| < 1,
i.e. whenever $|x| < \frac{1}{2}$. If the original expansion is valid for all x (as is the
case with (ii) in Theorem (1.3)) then of course so is the new one.
We illustrate this technique of changing the variable in the following:
1.7 Problem: Let m, n be positive integers with $m \le n$. Sum the series
$\sum_{k=0}^{m} (-1)^k \binom{n}{k} \binom{n}{m-k}$.
Solution: In the binomial expansion $(1+x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k$ if we replace
x by -x we get $(1-x)^n = \sum_{k=0}^{n} (-1)^k \binom{n}{k} x^k$. Using this and (iii) in
Theorem (1.2), we recognise the given sum as the coefficient of $x^m$ in the
product $(1-x)^n (1+x)^n$, i.e., in $(1-x^2)^n$. Replacing x by $-x^2$ in the
binomial theorem, $(1-x^2)^n = \sum_{k=0}^{n} (-1)^k \binom{n}{k} x^{2k}$. So we see that if m is
even, the given sum equals $(-1)^{m/2} \binom{n}{m/2}$ while if m is odd, it is 0. ∎
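The answer to Problem (1.7) is easy to test for small values. The following sketch (function names are hypothetical) evaluates the alternating convolution directly and compares it with $(-1)^{m/2} \binom{n}{m/2}$ for even m and with 0 for odd m.

```python
from math import comb

def alt_conv(n, m):
    """Sum over k of (-1)^k C(n, k) C(n, m-k), i.e. the coefficient of x^m in (1-x)^n (1+x)^n."""
    return sum((-1) ** k * comb(n, k) * comb(n, m - k) for k in range(m + 1))

def closed(n, m):
    """Coefficient of x^m in (1 - x^2)^n."""
    if m % 2 == 1:
        return 0
    return (-1) ** (m // 2) * comb(n, m // 2)

ok = all(alt_conv(n, m) == closed(n, m)
         for n in range(1, 12) for m in range(1, n + 1))
print(ok)
```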
There is yet another powerful technique by which the validity of an
identity can be extended considerably. It is based on Proposition (6.2.25)
according to which the number of roots of a non-zero polynomial over a
field cannot exceed its degree. As a consequence it follows that if f(x) and
g(x) are polynomials of degree $\le r$ (say) over a field F and if the polynomial
equation f(x) = g(x) holds for at least r + 1 distinct values of x in F,
then it must hold identically over the field F. (Consider roots of the
difference polynomial h(x) = f(x) - g(x).) The way this technique works
for proving identities is as follows. First we prove an identity for a variable
n which takes only positive integral values. Usually, this is done by induction,
a combinatorial argument or using generating functions as outlined above.
If the two sides of the identity are polynomials in n (of not necessarily
equal degree), then by what we just said, the identity must hold if we replace
n by a real variable x (or a complex variable z). We can then assign x any
real value, often a fraction or a negative integer. Converting the two
expressions we get a 'new' identity whose direct proof may not be so obvious.
We illustrate this technique in the following sequence of identities. The
first is proved by a combinatorial argument.
1.8 Proposition: Let m, n be positive integers. Then
$$\sum_{k=0}^{m} \binom{m}{k} \binom{n+k}{m} = \sum_{k=0}^{m} \binom{m}{k} \binom{n}{k} 2^k.$$
Proof: If $\binom{n+k}{m}$ were equal to $\binom{n}{k} 2^k$ then the identity would be
trivial to prove. But of course in general $\binom{n+k}{m}$ need not equal
$\binom{n}{k} 2^k$. The presence of the term $2^k$ suggests that a combinatorial
argument might work since $2^k$ is the number of all subsets of a set with k elements.
Let M, N be two mutually disjoint sets with m and n elements
respectively. Let $\mathcal{S} = \{(X, Y) : X \subset M,\ Y \subset N \cup X \text{ and } |Y| = m\}$. We apply
a double counting argument. Letting |X| = k, where k = 0, 1, ..., m and
summing over k, we see that the left hand side of the identity is precisely
$|\mathcal{S}|$. On the other hand, for $(X, Y) \in \mathcal{S}$, let $Y_1 = Y \cap M$ and $Y_2 = Y \cap N$.
(See Figure 7.1.) Note that $|Y_1| + |Y_2| = m$ and $Y_1 \subset X \subset M$.
Conversely, given any subsets $Y_1$, $Y_2$ of M, N respectively, with $|Y_1| + |Y_2| = m$
and a subset X with $Y_1 \subset X \subset M$, if we let $Y = Y_1 \cup Y_2$, we get an element
$(X, Y) \in \mathcal{S}$. As this correspondence is obviously one-to-one, we can find
$|\mathcal{S}|$ by counting ordered triples $(Y_1, Y_2, X)$ with $Y_2 \subset N$, $Y_1 \subset X \subset M$
and $|Y_1| + |Y_2| = m$.
[Figure 7.1: Proof of Proposition (1.8)]
Let $j = |Y_2|$. Then $0 \le j \le m$, $|Y_1| = m - j$
and $|M - Y_1| = m - (m - j) = j$. For a fixed j, $(Y_1, Y_2)$ can be chosen
in $\binom{m}{m-j} \binom{n}{j}$ ways. For each such choice, the set X can be obtained
by adding to $Y_1$ any subset of $M - Y_1$. Since there are, in all, $2^j$ subsets of
$M - Y_1$, it follows that $|\mathcal{S}| = \sum_{j=0}^{m} \binom{m}{m-j} \binom{n}{j} 2^j$. Noting that
$\binom{m}{m-j} = \binom{m}{j}$ and changing the index of summation from j to k we
get the right hand side of the identity, proving the result. ∎
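The double counting can be imitated by brute force for small m and n. The sketch below is an illustration under stated assumptions: M is modelled as {0, ..., m-1}, N as {m, ..., m+n-1}, and the function names are hypothetical. It enumerates all pairs (X, Y) with X a subset of M and Y an m-element subset of N ∪ X, and compares the count with both sides of Proposition (1.8).

```python
from itertools import combinations
from math import comb

def count_pairs(m, n):
    """Count pairs (X, Y) with X inside M, Y inside N ∪ X and |Y| = m by enumeration."""
    M = range(m)
    N = range(m, m + n)                    # disjoint from M by construction
    total = 0
    for k in range(m + 1):
        for X in combinations(M, k):
            pool = list(N) + list(X)       # the set N ∪ X
            total += sum(1 for _ in combinations(pool, m))
    return total

def lhs(m, n):
    return sum(comb(m, k) * comb(n + k, m) for k in range(m + 1))

def rhs(m, n):
    return sum(comb(m, k) * comb(n, k) * 2 ** k for k in range(m + 1))

ok = all(count_pairs(m, n) == lhs(m, n) == rhs(m, n)
         for m in range(1, 5) for n in range(1, 5))
print(ok)
```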
We now derive another identity from this, in the manner indicated
above.
1.9 Proposition: With m, n as before,
$$\sum_{k=0}^{m} \binom{m}{k} \binom{n+k}{m} = (-1)^m \sum_{k=0}^{m} \binom{m}{k} \binom{n+k}{k} (-2)^k.$$
Proof: Although the left hand side of this identity is identical to that in the
last proposition, the nature of the right hand side makes it unlikely that a
combinatorial argument would work. The two sides of the identity in the
last proposition are polynomials in n. Since the identity holds for all
positive integral values of n, it must hold if n is given any real value. In
particular it holds if we replace n by -n - 1, yielding
$$\sum_{k=0}^{m} \binom{m}{k} \binom{-n-1+k}{m} = \sum_{k=0}^{m} \binom{m}{k} \binom{-n-1}{k} 2^k. \qquad (*)$$
Here the binomial coefficients $\binom{-n-1+k}{m}$ and $\binom{-n-1}{k}$ have no
combinatorial significance. They are formally defined by the formula
$$\binom{x}{r} = \frac{x(x-1)\cdots(x-r+1)}{r!}.$$
We leave it as an exercise to check that
$$\binom{-x}{r} = (-1)^r \binom{x+r-1}{r} \quad \text{for all } x.$$
So the left hand side becomes
$(-1)^m \sum_{k=0}^{m} \binom{m}{k} \binom{n+m-k}{m}$. Let j = m - k. Then
$\binom{m}{k} = \binom{m}{j}$.
(Here we do need that m, k are integers.) So
$$\sum_{k=0}^{m} \binom{m}{k} \binom{n+m-k}{m} = \sum_{j=0}^{m} \binom{m}{j} \binom{n+j}{m},$$
which upon changing from j to k becomes $\sum_{k=0}^{m} \binom{m}{k} \binom{n+k}{m}$. Thus the
left hand side of (*) reduces to $(-1)^m \sum_{k=0}^{m} \binom{m}{k} \binom{n+k}{m}$. By similar
reasoning, the right hand side of (*) becomes $\sum_{k=0}^{m} \binom{m}{k} \binom{n+k}{k} (-2)^k$.
Multiplying both sides by $(-1)^m$ proves the result. ∎
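Both the negation rule left as an exercise and the resulting identity can be spot-checked with exact arithmetic. In this sketch `gbinom` is a hypothetical helper implementing the formal definition $x(x-1)\cdots(x-r+1)/r!$, and the ranges tested are arbitrary small values.

```python
from fractions import Fraction
from math import comb, factorial

def gbinom(x, r):
    """Generalised binomial coefficient x(x-1)...(x-r+1)/r! for an integer r >= 0."""
    num = Fraction(1)
    for i in range(r):
        num *= (x - i)
    return num / factorial(r)

# The rule C(-x, r) = (-1)^r C(x + r - 1, r), tested for integer x of either sign.
neg_rule = all(gbinom(-x, r) == (-1) ** r * gbinom(x + r - 1, r)
               for x in range(-5, 6) for r in range(6))

def lhs(m, n):
    return sum(comb(m, k) * comb(n + k, m) for k in range(m + 1))

def rhs(m, n):
    return (-1) ** m * sum(comb(m, k) * comb(n + k, k) * (-2) ** k
                           for k in range(m + 1))

identity = all(lhs(m, n) == rhs(m, n) for m in range(1, 8) for n in range(1, 8))
print(neg_rule, identity)
```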
In the applications so far, we did only algebraic manipulations to
generating functions. We now study a couple of applications where term-by-
term differentiation (Part (iv) of Theorem (1.2)) is used. Thus facts from
continuous mathematics will figure for the first time. We begin by obtaining a
formula for the sum 1 + 2 +...+ n where n is a positive integer. Although
this formula is well-known and can be obtained by very elementary methods,
we derive it here to illustrate the method used. First we prove a result
which may be of independent interest.
1.10 Proposition: Let A(x) be the O.G.F. of a sequence $\{a_n\}_{n=0}^{\infty}$. For each
$n \ge 0$, let $c_n = a_0 + a_1 + \cdots + a_n$. Then the O.G.F. of the sequence $\{c_n\}$ is $\frac{A(x)}{1-x}$.
Proof: By (iv) in Theorem (1.3), $\frac{1}{1-x} = 1 + x + x^2 + \cdots + x^n + \cdots$. The
result now follows from (iii) in Theorem (1.2), if we take $b_n = 1$ for all
n = 0, 1, 2, .... ∎
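In terms of coefficient lists, Proposition (1.10) says that multiplying by the all-ones series produces running totals. A minimal sketch, assuming series truncated to eight terms and using the hypothetical helper `series_mul`:

```python
from itertools import accumulate

def series_mul(a, b):
    """Cauchy product of two truncated coefficient lists."""
    n = min(len(a), len(b))
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]

a = [0, 1, 2, 3, 4, 5, 6, 7]     # a_n = n, so A(x) = x + 2x^2 + 3x^3 + ...
ones = [1] * len(a)              # coefficients of 1/(1 - x)

c = series_mul(a, ones)          # coefficients of A(x)/(1 - x)
print(c)
print(c == list(accumulate(a)))  # partial sums a_0 + ... + a_n
```

With $a_n = n$ the partial sums are the numbers 1 + 2 + ... + n.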
1.11 Theorem: For every positive integer n, $1 + 2 + \cdots + n = \frac{n(n+1)}{2}$.
Proof: In view of the last proposition, $1 + 2 + \cdots + n$ = coefficient of $x^n$
in $\frac{A(x)}{1-x}$ where $A(x) = 0 + x + 2x^2 + 3x^3 + \cdots + nx^n + \cdots$. So we first need
to find a closed form for A(x). There is no function in our list so far which
has $\sum_{n=0}^{\infty} n x^n$ for its power series expansion. However, we start with
$$\frac{1}{1-x} = 1 + x + x^2 + \cdots + x^n + \cdots.$$
Differentiating both the sides w.r.t. x and using (iv) in Theorem
(1.2), we get
$$\frac{d}{dx}\left(\frac{1}{1-x}\right) = \frac{1}{(1-x)^2} = \sum_{n=1}^{\infty} n x^{n-1}.$$
Multiplying both sides by x, we get $\frac{x}{(1-x)^2} = \sum_{n=0}^{\infty} n x^n = A(x)$. So $1 + 2 + \cdots + n$
would be the coefficient of $x^n$ in $\frac{x}{(1-x)^3}$, which is the same as the
coefficient of $x^{n-1}$ in $\frac{1}{(1-x)^3}$. To find it, we write $\frac{1}{(1-x)^3}$ as $(1-x)^{-3}$ and
expand it by the binomial theorem (Part (iii) of Theorem (1.3)) as
$\sum_{r=0}^{\infty} (-1)^r \binom{-3}{r} x^r$, where
$$\binom{-3}{r} = \frac{(-3)(-4)\cdots(-3-r+1)}{r!} = (-1)^r \frac{(r+2)!}{2 \cdot r!} = (-1)^r \binom{r+2}{2}.$$
So,
$$(1-x)^{-3} = \sum_{r=0}^{\infty} (-1)^{2r} \binom{r+2}{2} x^r = \sum_{r=0}^{\infty} \binom{r+2}{2} x^r.$$
For r = n - 1, $\binom{r+2}{2} = \binom{n+1}{2} = \frac{n(n+1)}{2}$ and as noted before, this equals
$1 + 2 + \cdots + n$. ∎
The crucial part in this proof was to get a closed form for a function
whose power series expansion was known to us. This is not always possible
to obtain and even when it is, considerable ingenuity may be needed to
find it. In the next problem we encounter a situation where a differential
equation has to be solved, hardly to be expected in discrete mathematics.
1.12 Problem: Evaluate the sum $\sum_{k=0}^{m} \binom{2k}{k} \binom{2m-2k}{m-k}$.
Solution: The nature of the sum indicates that it is the coefficient of $x^m$ in
$[A(x)]^2$ where $A(x) = \sum_{n=0}^{\infty} \binom{2n}{n} x^n$. We have to find a closed expression
for A(x). Differentiating we get,
$$A'(x) = \sum_{n=1}^{\infty} \binom{2n}{n} n x^{n-1} = \sum_{n=0}^{\infty} \binom{2n+2}{n+1} (n+1) x^n.$$
Now,
$$\binom{2n+2}{n+1} (n+1) = 2(2n+1) \binom{2n}{n} = 4n \binom{2n}{n} + 2 \binom{2n}{n}.$$
Multiplying by $x^n$ and summing over n, we get
$$A'(x) = 4 \sum_{n=0}^{\infty} \binom{2n}{n} n x^n + 2 \sum_{n=0}^{\infty} \binom{2n}{n} x^n
= 4x \sum_{n=1}^{\infty} \binom{2n}{n} n x^{n-1} + 2 \sum_{n=0}^{\infty} \binom{2n}{n} x^n = 4x A'(x) + 2A(x).$$
Hence $A'(x) = \frac{2A(x)}{1-4x}$. If we write y for A(x), this gives a differential
equation $\frac{dy}{dx} = \frac{2y}{1-4x}$ whose
general solution is $y^2 = \lambda (1-4x)^{-1}$, where $\lambda$ is a constant. When x = 0,
$y = \binom{0}{0} = 1$, giving $\lambda = 1$. So $y = \left(\frac{1}{1-4x}\right)^{1/2} = (1-4x)^{-1/2}$. (If we
expand $(1-4x)^{-1/2}$ by the binomial theorem we indeed see it comes out as
$\sum_{r=0}^{\infty} \binom{2r}{r} x^r$. But we had to arrive at the answer rather than verify it.)
Since $A(x) = \frac{1}{\sqrt{1-4x}}$, $[A(x)]^2 = \frac{1}{1-4x} = \sum_{n=0}^{\infty} (4x)^n$ by (iv) in Theorem
(1.3). So the given sum equals $4^m$. ∎
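The value $4^m$ is simple to confirm for small m. The sketch below (the function name `central_conv` is hypothetical) evaluates the convolution of central binomial coefficients directly.

```python
from math import comb

def central_conv(m):
    """Sum over k of C(2k, k) * C(2m - 2k, m - k), the coefficient of x^m in A(x)^2."""
    return sum(comb(2 * k, k) * comb(2 * m - 2 * k, m - k) for k in range(m + 1))

print([central_conv(m) for m in range(6)])
print(all(central_conv(m) == 4 ** m for m in range(15)))
```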
So far the variable x in the generating functions was assumed to be a
real variable. As remarked after the proof of Theorem (1.2), there are some
theoretical advantages in considering generating functions of a complex
variable. In the next problem we show that there are practical advantages
too. Because of the fundamental theorem of algebra, a real polynomial
can be split completely as a product of complex linear polynomials. This
factorization is helpful while resolving into partial fractions, where the real
factorisation may not be adequate. The problem also illustrates how
sometimes it pays to treat finite sums as infinite sums.
1.13 Problem: Evaluate the sum $\sum_{m=\lceil n/2 \rceil}^{n} (-1)^m \binom{m}{n-m}$ where n is a
positive integer.
Solution: The binomial coefficient $\binom{m}{n-m}$ is 0 when n - m < 0 and also
when n - m > m, i.e. when $m < \lceil n/2 \rceil$. (Note that $\lceil n/2 \rceil$ is the smallest
integer $\ge n/2$.) So the given sum might as well be taken as the infinite sum
$\sum_{m=0}^{\infty} (-1)^m \binom{m}{n-m}$. Since $(-1)^m \binom{m}{n-m}$ = coefficient of $x^{n-m}$ in
$(-1)^m (1+x)^m$ = coefficient of $x^n$ in $(-1)^m x^m (1+x)^m$, the given sum equals the
coefficient of $x^n$ in $\sum_{m=0}^{\infty} (-1)^m x^m (1+x)^m$. By (iv) in Theorem (1.3), this equals
$\frac{1}{1+x+x^2}$. We resolve $\frac{1}{1+x+x^2}$ into partial fractions. As a real
polynomial, $1 + x + x^2$ is irreducible. But as a complex polynomial, it factors
as $(1-\omega x)(1-\omega^2 x)$ where $\omega = \cos\frac{2\pi}{3} + i \sin\frac{2\pi}{3}$ is a primitive cube root
of 1. Writing $\frac{1}{1+x+x^2} = \frac{A}{1-\omega x} + \frac{B}{1-\omega^2 x}$ we get two equations for
A and B, namely A + B = 1 and $A\omega^2 + B\omega = 0$, which give $A = \frac{1}{1-\omega}$
and $B = \frac{1}{1-\omega^2}$. So
$$\frac{1}{1+x+x^2} = \frac{1}{1-\omega} \cdot \frac{1}{1-\omega x} + \frac{1}{1-\omega^2} \cdot \frac{1}{1-\omega^2 x}.$$
Expanding each by (iv) in Theorem (1.3) again, we see that the
coefficient of $x^n$ in $\frac{1}{1+x+x^2}$ is $\frac{\omega^n}{1-\omega} + \frac{\omega^{2n}}{1-\omega^2}$. This is, therefore, the
value of the given sum. However, since the sum is real, it is desirable to
write the answer in the real form. Since $\omega^3 = 1$, the answer will depend on
the congruence class of n modulo 3. If $n \equiv 0 \pmod 3$, then $\omega^n = \omega^{2n} = 1$
and
$$\frac{1}{1-\omega} + \frac{1}{1-\omega^2} = \frac{1-\omega^2+1-\omega}{1-\omega-\omega^2+\omega^3} = \frac{2-\omega-\omega^2}{2-\omega-\omega^2} = \frac{3}{3} = 1.$$
So the given sum equals 1 when $n \equiv 0 \pmod 3$. We leave it to the reader to check
that it comes out as -1 when $n \equiv 1 \pmod 3$ and as 0 when $n \equiv 2 \pmod 3$.
(In this problem, we could have dispensed with complex numbers by
writing $\frac{1}{1+x+x^2}$ as $\frac{1-x}{1-x^3}$. But we wanted to illustrate their
use.)
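The periodic pattern 1, -1, 0 in the residue of n modulo 3 can be observed by evaluating the finite sum directly. A short sketch, in which the function name `alt_sum` is hypothetical and $\lceil n/2 \rceil$ is computed as `(n + 1) // 2`:

```python
from math import comb

def alt_sum(n):
    """Sum of (-1)^m C(m, n-m) for m from ceil(n/2) to n."""
    return sum((-1) ** m * comb(m, n - m) for m in range((n + 1) // 2, n + 1))

expected = {0: 1, 1: -1, 2: 0}            # value according to n mod 3
values = [alt_sum(n) for n in range(1, 13)]
print(values)
print(all(alt_sum(n) == expected[n % 3] for n in range(1, 40)))
```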
Infinite series are by far the most common means to express a function
as the limit of some more elementary functions such as polynomials.
There are, however, other methods of doing the same, such as infinite
products and continued fractions. We shall not treat continued fractions.
But we include a brief discussion of infinite products because sometimes,
in combination with generating functions, they yield certain identities.
Just as infinite series are defined as limits of their sequences of partial
sums, an infinite product is defined as the limit of its sequence of partial
products. Specifically, $\prod_{n=1}^{\infty} a_n$ is defined as $\lim_{n \to \infty} p_n$ where, for each n,
$p_n = a_1 a_2 \cdots a_n = \prod_{i=1}^{n} a_i$. If this limit exists and is non-zero, then it is easy
to show that $a_n \to 1$ as $n \to \infty$. (This is analogous to the fact that the nth term
of a convergent series must tend to 0 as $n \to \infty$.) It is customary to write
$a_n$ as $1 + b_n$. Then $b_n \to 0$ as $n \to \infty$.
The usual rules of algebra, such as distributivity, carry over from finite
products to infinite products, if appropriate care is exercised regarding
convergence. As with series, in the past numerous identities were derived
through such manipulations, paying little heed to rigour. We prove one
such identity here. Its combinatorial significance will be apparent in the
next section.
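1.14 Proposition: If $|x| < 1$ then $(1 + x)(1 + x^2)(1 + x^4) \cdots (1 + x^{2^n}) \cdots$
$= \prod_{n=0}^{\infty} (1 + x^{2^n}) = 1 + x + x^2 + \cdots + x^n + \cdots$.
Proof: By (iv) in Theorem (1.3), the right hand side equals $\frac{1}{1-x}$. We
can rewrite this as $\frac{1+x}{1-x^2}$ and then again as $\frac{(1+x)(1+x^2)}{1-x^4}$. Continuing, we
have for every $n$,
$\frac{1}{1-x} = \frac{(1+x)(1+x^2) \cdots (1+x^{2^n})}{1 - x^{2^{n+1}}}$
or,
$(1+x)(1+x^2) \cdots (1+x^{2^n}) = \frac{1 - x^{2^{n+1}}}{1-x}$.
Since $|x| < 1$, $x^{2^{n+1}} \to 0$ as $n \to \infty$. So the infinite product $\prod_{n=0}^{\infty} (1 + x^{2^n})$
converges to $\frac{1}{1-x}$. ∎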
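The finite identity used in this proof can be checked mechanically. The sketch below (an illustration only; the helper names are assumptions, not from the text) multiplies out the first few factors $(1+x)(1+x^2)(1+x^4)\cdots$ as coefficient lists and also evaluates a partial product numerically at a point with $|x| < 1$.

```python
def partial_product_coeffs(num_factors):
    """Coefficient list of (1 + x)(1 + x^2)(1 + x^4)...(1 + x^(2^(num_factors - 1)))."""
    coeffs = [1]
    for k in range(num_factors):
        shift = 2 ** k
        # Multiplying by (1 + x^shift) adds a copy of the polynomial shifted by `shift`.
        new = coeffs + [0] * shift
        for i, c in enumerate(coeffs):
            new[i + shift] += c
        coeffs = new
    return coeffs

def partial_product_value(x, num_factors):
    """Numerical value of the same partial product at a real number x."""
    value = 1.0
    for k in range(num_factors):
        value *= 1 + x ** (2 ** k)
    return value

# Four factors give 1 + x + x^2 + ... + x^15: every coefficient equals 1.
print(partial_product_coeffs(4))
# For |x| < 1 the partial products approach 1/(1 - x).
print(partial_product_value(0.5, 6), 1 / (1 - 0.5))
```

Each coefficient being exactly 1 is the finite form of the statement that every integer from 0 to $2^n - 1$ is a sum of distinct powers of 2 in exactly one way.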
So far we have illustrated some standard methods by which generating
functions are used to prove identities. In a given problem
it is often a combination of several techniques that is needed and ultimately
it is only through practice that one learns to pull the right trick. It should
be noted, however, that despite the multitude of formulas and tricks, it is
not always possible to sum a given series, even when it consists of the first
few terms of a sequence whose generating function has a nice closed
form. For example, let $H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$. It is not hard to find a
closed form expression for $\sum_{n=1}^{\infty} H_n x^n$ (cf. Exercise (1.3)). It would then appear
that in view of Proposition (1.10), we can get a handy formula for $H_n$.
But this is not so. The trouble is that finding the coefficient of $x^n$ in a
function of $x$ may lead to a summation. The numbers $H_n$ are called harmonic
numbers, because of the fact that $n/H_n$ is the harmonic mean of the numbers
1, 2, ..., n. It is well known that $H_n \to \infty$ as $n \to \infty$. Because of their
frequent appearances in applications, the harmonic numbers have been
extensively studied. There are some highly accurate estimates available
for $H_n$. (It is known, for example, that $H_n \approx \ln n$ for large values of $n$;
see the Epilogue.) Still, there is no closed form for $H_n$ as there is for
$1 + 2 + \cdots + n$.
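Although the harmonic numbers have no closed form, they are easy to compute exactly for moderate $n$. The following sketch (illustrative names, not from the text) computes $H_n$ with exact rational arithmetic and prints it next to $\ln n$ so that the quality of the estimate can be seen.

```python
from fractions import Fraction
import math

def harmonic(n):
    """Exact value of H_n = 1 + 1/2 + ... + 1/n as a Fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

for n in (10, 100, 1000):
    h = harmonic(n)
    # The ratio H_n / ln n tends to 1, while the difference stays bounded.
    print(n, float(h), math.log(n), float(h) / math.log(n))
```

The exact values are summations in disguise, which is precisely the point made above: extracting a coefficient may lead back to a sum.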
We conclude this section with a brief discussion of generating functions
with two variables. A sequence is a function of an integer variable. In
combinatorics, we frequently come across expressions having several integer
variables (or 'parameters' as they are sometimes called). For example, the
binomial coefficient $\binom{n}{r}$, which gives the number of ways to choose $r$
objects from $n$ objects, depends on $n$ and $r$. Similarly in ${}_nP_r$, the number of
r-permutations of $n$ objects, there are two integer variables, $n$ and $r$. (In
both these examples, $r$ is restricted by $0 \le r \le n$, but we may remove this
restriction by setting $\binom{n}{r} = {}_nP_r = 0$ for $r > n$.) The number of ways to
distribute $n$ identical objects into $r$ distinct boxes so that no box contains
more than, say, $k$ objects is a function of three variables, namely, $n$, $r$
and $k$.
Just as we studied sequences (i.e. functions of a single integer variable)
by forming their generating functions w.r.t some indicator functions, we
can handle functions of several integer variables by associating with them
functions of the same number of real (or complex) variables. As with one
variable, the most common choice of indicator functions is polynomials
of several variables. We give below the definition for the case of two
variables. The extension for the case of more variables is similar.
1.15 Definition: Let $S$ be the set of all non-negative integers and let
$f: S \times S \to \mathbb{R}$ be a function. For $(m,n) \in S \times S$, write $a_{m,n}$ for $f(m,n)$.
Then the ordinary generating function of $f$ (or the O.G.F. of $\{a_{m,n}\}_{m,n=0}^{\infty}$)
is defined as $\sum_{m=0}^{\infty} \sum_{n=0}^{\infty} a_{m,n} x^m y^n$ and denoted by $A(x,y)$.
Again we may either let x and y be mere indeterminates in which case
A(x,y) is a formal power series, or we may let x and y be real (or complex)
variables. In the latter case the question of convergence arises and in fact
becomes more intriguing because although the order of summation is
unimportant for finite double summations, it need not be so for infinite
double summations. However, these questions can be answered satisfactorily
so as to give a rigorous treatment of generating functions of
several variables.
Let us obtain a closed form for the generating function $\sum_{m,n} a_{m,n} x^m y^n$
where $a_{m,n} = \binom{m}{n}$. Since $\binom{m}{n} = 0$ for $n > m$, for each fixed $m$, the sum
$\sum_{n} \binom{m}{n} x^m y^n$ is really finite and equals $x^m (1 + y)^m$ by the binomial theorem.
So $A(x,y) = \sum_{m=0}^{\infty} x^m (1 + y)^m = \frac{1}{1 - x - xy}$ by Theorem (1.3), part (iv).
As with generating functions of one variable, we can recover $\binom{m}{n}$ from
$A(x, y)$ by expanding it and taking the coefficient of $x^m y^n$.
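To see the recovery of coefficients in action, the sketch below (illustrative, with assumed helper names) expands $1/(1 - x - xy)$ up to a fixed order. It uses the relation $(1 - x - xy)A(x,y) = 1$, which forces $a_{m,n} = a_{m-1,n} + a_{m-1,n-1}$ for $(m,n) \neq (0,0)$, and compares the result with the binomial coefficients.

```python
from math import comb

def bivariate_coeffs(size):
    """a[m][n] = coefficient of x^m y^n in 1/(1 - x - x*y), for 0 <= m, n < size."""
    a = [[0] * size for _ in range(size)]
    for m in range(size):
        for n in range(size):
            # Comparing coefficients of x^m y^n in (1 - x - x*y) * A(x, y) = 1.
            value = 1 if (m == 0 and n == 0) else 0
            if m >= 1:
                value += a[m - 1][n]
                if n >= 1:
                    value += a[m - 1][n - 1]
            a[m][n] = value
    return a

table = bivariate_coeffs(7)
for m, row in enumerate(table):
    print(m, row)
```

The recurrence that falls out of the denominator is exactly the rule by which the Pascal triangle is built, and `math.comb(m, n)` returns 0 for $n > m$, matching the convention adopted above.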
An alternate way to handle functions of two integer variables is to fix
one of them. For each fixed value of one of the two variables, we get a
sequence. If $a_{m,n} = f(m,n)$ then for each fixed $m$, $\{a_{m,n}\}_{n=0}^{\infty}$ is a sequence.
Denote its generating function by $F_m(x)$. That is, $F_m(x) = \sum_{n=0}^{\infty} a_{m,n} x^n$. This
way we get a sequence $\{F_m(x)\}_{m=0}^{\infty}$ of generating functions. The original
$a_{m,n}$ can be recovered as the coefficient of $x^n$ in $F_m(x)$. For example, if
$a_{m,n} = \binom{m}{n}$, then for each $m = 0, 1, 2, \ldots$, $F_m(x) = (1 + x)^m$.
There is also a graphic way to represent functions of two integer
variables which has little to do with generating functions. If $a_n = f(n)$,
$n = 0, 1, 2, \ldots$ is a sequence, we can indicate it by marking the points 0, 1, 2, ...
on a number line and writing the value $a_n$ at the point marked $n$. Indeed
this is what we do implicitly when we list the sequence as $a_0, a_1, a_2, \ldots$. For
a function of two integer variables, say $a_{m,n} = f(m,n)$, we first identify the
domain set as the set of all points in the cartesian plane both of whose
coordinates are non-negative integers. (Such points are often called the
lattice points. They form a subset of the ring of Gaussian integers discussed
in Chapter 6, Section 2.) We then simply write the value $a_{m,n}$ at the point
$(m,n)$ in the cartesian plane. It may happen that $a_{m,n}$ is defined only for
some of these points. For example, the combinatorial definition of $\binom{m}{n}$
makes sense only for $n \le m$ (although, of course, the algebraic definition,
namely, $\frac{m(m-1) \cdots (m-n+1)}{n!}$, makes sense for any $n$). In such cases
it is customary to describe the subsets like $\{(m, n) : m = n\}$ as 'boundaries'
in analogy with functions of continuous variables, although, in the strict
topological sense, every subset of the lattice points is discrete and hence
has no boundary points.
In Figure 7.2(a), we give the graphic representation of $a_{m,n} = \binom{m}{n}$ for
$0 \le n \le m$. Note the 'boundary conditions', $a_{m,n} = 1$ for $n = 0$ and $n = m$.
In this particular example, another, although essentially equivalent, graphic
representation of the binomial coefficients is more common. It is popularly
known as the Pascal triangle and is shown in Fig. 7.2(b).
[Figure 7.2: Graphic Representation of Binomial Coefficients. (a) Cartesian representation. (b) Pascal triangle.]
Many properties
of the binomial coefficients can be paraphrased vividly in terms of the
geometric properties of the Pascal triangle. For example, the identity
$\binom{m}{n} = \binom{m}{m-n}$ is reflected in the symmetry of the triangle about the
vertical axis. The fact that $\binom{m}{0} + \binom{m}{1} + \cdots + \binom{m}{m} = 2^m$ is equivalent
to saying that the sum of the entries in the mth row is $2^m$.
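The two properties just mentioned are easy to confirm on the first few rows. The following sketch (an illustration; `pascal_rows` is an assumed helper name) builds the triangle row by row from the rule that each interior entry is the sum of the two entries above it.

```python
def pascal_rows(num_rows):
    """First num_rows rows (num_rows >= 1) of the Pascal triangle; row m lists C(m, 0), ..., C(m, m)."""
    rows = [[1]]
    for _ in range(num_rows - 1):
        prev = rows[-1]
        # Each interior entry is the sum of the two neighbouring entries in the row above.
        rows.append([1] + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [1])
    return rows

for m, row in enumerate(pascal_rows(7)):
    # Symmetry about the vertical axis, and the row sum 2^m.
    print(m, row, row == row[::-1], sum(row) == 2 ** m)
```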
Exercises
1.1 Find a closed form for the ordinary generating functions of the
following sequences:
(i) 1, -1, 1, -1, 1, -1, ...
(ii) 1, 0, 1, 0, 1, 0, 1, 0, ...
(iii) 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, ...
(iv) 1, -1, -1, 1, -1, -1, 1, -1, -1, 1, ...
(v) 1, 2, 3, 4, 5, ...
1.2 Find a closed form for the exponential generating functions of the
following sequences:
(i) 1, 2, 4, 8, 16, 32, ...
(ii) 1, -1, 1, -1, 1, -1, 1, -1, ...
(iii) 1, 0, 1, 0, 1, 0, 1, 0, ...
(iv) 1, 0, -1, 0, 1, 0, -1, 0, 1, 0, -1, 0, ...
(v) 0, 1, 0, -1, 0, 1, 0, -1, 0, 1, 0, -1, ...
(vi) ${}_nP_0, {}_nP_1, {}_nP_2, \ldots, {}_nP_n, 0, 0, 0, 0, \ldots$
(where n is some positive integer.)
1.3 If $A(x)$ is the O.G.F. of a sequence $\{a_n\}_{n=0}^{\infty}$, find sequences whose
O.G.F.'s are
(i) 14%) (ii) %) (iii) [4mm
0
Hence find the 0.G.F. of the sequence
l. i. L in-u -
1.4 Let $\xi$ be a primitive complex m-th root of unity where m is a positive
integer. Let $A(x) = \sum_{n=0}^{\infty} a_n x^n$. Prove that
$a_0 + a_m x^m + a_{2m} x^{2m} + \cdots + a_{km} x^{km} + \cdots = \frac{1}{m} \sum_{j=0}^{m-1} A(\xi^j x)$.
1.5 Prove the following identities for binomial coefficients where n, p,
q are positive integers.
.s.k<;>-——uz~
an 2 k-(Z) = "(u + me.
«mg (aw—(Z) = 1 +; +0.44
G”©-©+O-©+i
+M%C%C%C%JLP
[Hintz Consider the Ibsolute value of the complex number
(I + m1
. (IX 9 ) (p+q)
(v) 2 =
1'” ‘ j n —j n
. -(pX ‘q ) (p+q)
(VI) 2 = -
"' i n+i n+1-
1.6 Evaluate the following sums where n is a positive integer:
wayh)
. . n
W&%J
u a n I V
(iii) :3 .21L
J
1.7 If n is a positive integer, prove that the coefficients of $x^n$ in
(IT—fix).— and grit—i“: are respectively, 2" and 4". Hence eva-
luate
(1")“(27')+2*(2"J2)+'~-+2*(’";")
+-+.(:)
. .
1.8 Show that $\sum_{k=0}^{n} \binom{n+k}{k} \frac{1}{2^k} = 2^n$. (A simpler proof will be given in
Section 4.)
1.9 Prove that $\frac{1}{(1-x)^n} = (1 + x + x^2 + \cdots)^n = \sum_{r=0}^{\infty} \binom{n+r-1}{r} x^r$, $n$
a positive integer.
1.10 Let $f: X \to \mathbb{R}$ be a function whose range is finite. For each $\lambda \in \mathbb{R}$,
let $p_\lambda$ be the probability that $f$ assumes the value $\lambda$ ($p_\lambda = 0$ if $\lambda$ is
not in the range of $f$). Then the expected value of $f$ is defined as
$\sum_{\lambda \in \mathbb{R}} p_\lambda \lambda$ (which is a finite sum since the range of $f$ is finite).
Prove that in case $X$ is finite, the expected value of $f$ equals
$\frac{1}{|X|} \sum_{x \in X} f(x)$. (In other words, the expected value coincides with
the average value of $f$. However, it is customary to use the former
term when, instead of specifying the values of $f$ at various points
of $X$ directly, we are more interested in the probability that $f$ attains
a particular value. Note that two different functions may have the
same expected value.)
1.11 Three balls are drawn at random from an urn containing 4 red, 5
white and 6 green balls. What is the expected number of red balls
in the draw?
1.12 Suppose $n$ is a positive integer and $p$ is a real number with $0 \le p \le 1$.
Prove that
(a) $\sum_{r=0}^{n} r \binom{n}{r} p^r (1-p)^{n-r} = np$ and
(b) $\sum_{r \text{ even}} \binom{n}{r} p^r (1-p)^{n-r} = \frac{1}{2}[1 + (1-2p)^n]$.
[Hilm Setx =l—f-I, in :3 (1 + xy-and in W
respectively.]
1.13 Suppose $p$ is the probability that a head will show when a coin is
tossed. (For an unbiased coin, $p = \frac{1}{2}$.) If this coin is tossed $n$ times,
find (a) the expected number of occurrences of head (b) the probability
that a head occurs an even number of times.
1.14 Prove Theorem (1.11), using the result of Problem (1.5). Using
$n^2 = 2\binom{n}{2} + \binom{n}{1}$, similarly evaluate $1^2 + 2^2 + \cdots + n^2$,
where $n$ is a positive integer. Verify your answer by induction on
$n$. Also derive the same result from Proposition (1.10).
1.15 In the solution of Problem (1.12), verify that the power series expansion
of $(1 - 4x)^{-1/2}$ is indeed $\sum_{r=0}^{\infty} \binom{2r}{r} x^r$. Show that
$(1 - 4x)^{-1/2}$ is also the E.G.F. of the sequence $\{(2n)!/n!\}_{n=0}^{\infty}$.
1.16 Find the O.G.F. of the sequence $\{a_n\}_{n=0}^{\infty}$ where $a_0 = 0$ and for
$n \ge 1$, $a_n = \frac{1}{n} \binom{2n-2}{n-1}$. (The terms of this sequence are called
Catalan numbers. We already encountered them in the solution to
the Vendor Problem and also in Exercise (3.432).) Hence show
that for $n > 1$, $a_n = \sum_{i=1}^{n-1} a_i a_{n-i}$.
1.17 Verify the last assertion in the solution to Problem (1.13).
1.18 For a positive integer $n$, evaluate $\sum_{m=\lceil n/2 \rceil}^{n} \binom{m}{n-m}$.
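1.19 Suppose $F$ is a field and $f(x) = a_0 + a_1 x + \cdots + a_n x^n \in F[x]$ has
non-zero roots $\alpha_1, \alpha_2, \ldots, \alpha_n$ in $F$. Prove that
$\frac{1}{\alpha_1} + \frac{1}{\alpha_2} + \cdots + \frac{1}{\alpha_n} = -\frac{a_1}{a_0}$.
[Hint: Note that $f(x) = a_n (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n)$. Consider
$f(1/x)$.]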
1.20 Let $f(x) = 1 - \frac{x}{3!} + \frac{x^2}{5!} - \frac{x^3}{7!} + \frac{x^4}{9!} - \cdots + (-1)^n \frac{x^n}{(2n+1)!} + \cdots$. Show
that the zeros of $f$ are of the form $(n\pi)^2$ where $n$ is a positive integer.
[Hint: Clearly $f$ has no negative zeros. For $x > 0$, show that
$f(x) = \frac{\sin y}{y}$ where $y^2 = x$.]
1.21 Assuming that the result of Exercise (1.19) is valid for power series
(i.e. for 'polynomials of infinite degree'), and applying it to the
function $f(x)$ in the last exercise, prove that $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$.
1.22 If p, q, n are positive integers, prove that
" ” (”W“) ("X")
2
”(112)16
n k a, k
p+q
=
n
P a
.
[Hintz Write(p + )as 2 (j )( . Interchange the
+q 1" p + q —1
order of summation and make several uses of Exercises (2.2.16) (ii)
and (1.5) (v).]
1.23 Combining the result of the last exercise with the technique of the
proof of Proposition (1.9), prove that
$\sum_{k} \binom{p}{k} \binom{q}{k} \binom{n+p+q-k}{p+q} = \binom{n+p}{p} \binom{n+q}{q}$.
(The special case $p = q$ is called the Li-Jen-Shu Formula.)
1.24 Let a... be the number of ways to select n integers from(1.2, .. ., m}
so that no two e are ‘ "‘(cf.
(2.345)). Verify that the O.G.F. of {11...}; ,.-. is MW
What is the ‘boundary' in this case?
1.25 Suppose you start at the top apex of the Pascal triangle and in
each unit of time go to either of the two neighbouring points in the
row immediately below. Prove that the number of such paths from
the apex to any point in the Pascal triangle is precisely the entry at
the point. (One such path is shown in Fig. 7.2 (b).) Using this, give
a 'geometric' proof of the identity $\binom{m}{0} + \binom{m}{1} + \cdots + \binom{m}{m} = 2^m$.
1.26 Interpret and prove the following binomial identities in terms of
the Pascal triangle
('Z>=(“J’)+(Z’_‘I)
“"éu(i)(.3.)=(":")
1.27 Using the fact that the harmonic series $\sum_{n=1}^{\infty} \frac{1}{n}$ is divergent, prove that
the infinite product $\frac{2}{1} \cdot \frac{3}{2} \cdot \frac{5}{4} \cdot \frac{7}{6} \cdot \frac{11}{10} \cdot \frac{13}{12} \cdots = \prod_{p} \frac{p}{p-1}$, where $p$ ranges
over all primes greater than 1, is divergent.
Notes and Guide to Literature
For the theorems about power series which we have referred to, see Kreyszig
[1] or, for a more rigorous approach, Rudin [1]. The need for a rigorous
approach was stressed by Weierstrass. The formula $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$ and its
'proof' in Exercise (1.21) are due to Euler. It is a masterpiece of
unscrupulous reasoning yielding a beautiful result. The result can, however, be
established by rigorous methods, using, for example, Fourier series; see
again Rudin [1].
There are numerous combinatorial identities, many of them being rediscovered
from time to time all over the world. The binomial coefficients
have a long history. The Pascal triangle was actually known before Pascal,
but he used it for his work on probability theory. MacMahon [1] and
Riordan [1] give a full discussion of the use of generating functions.
Propositions (1.8) and (1.9) are from Lovasz [1].
2. Application to Enumeration Problems
In algebraic computations, we frequently expand an expression and then
simplify it by grouping the 'like' terms together. The coefficient of each
such term is the number of times it appears in the expansion and finding
it is essentially a combinatorial process. Consider, for example, the expansion
of $(x + y)^n$, where $x$, $y$ are elements of a commutative ring with
identity (cf. Exercise (6.128)). Since this is a product of $n$ factors, each
having two terms, the expansion consists of $2^n$ terms in all, of the form
$a_1 a_2 \cdots a_n$ where for each $i = 1, \ldots, n$, $a_i$ is either $x$ or $y$. Since $x$ and $y$ commute
with each other, for every $k$, $0 \le k \le n$, all terms in which exactly $k$
of the $a_i$'s equal $x$ (and the remaining $(n - k)$ $a_i$'s are equal to $y$) are equal
to $x^k y^{n-k}$. Thus for every choice of $k$ symbols out of the $n$ symbols $a_1, a_2, \ldots, a_n$,
we get a term which equals $x^k y^{n-k}$ and vice versa. It follows that the coefficient
of $x^k y^{n-k}$ is $\binom{n}{k}$ and hence that $(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$. This is
indeed the most direct way to prove the binomial theorem.
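The counting argument can be imitated literally for a small exponent. The sketch below (illustrative; the function name is an assumption) lists all $2^n$ words $a_1 a_2 \cdots a_n$ over the symbols x and y, groups them by the number of x's, and compares the group sizes with the binomial coefficients.

```python
from collections import Counter
from itertools import product
from math import comb

def count_terms_by_x_count(n):
    """Group the 2^n terms a_1 a_2 ... a_n (each a_i being x or y) by how many a_i equal x."""
    return Counter(word.count("x") for word in product("xy", repeat=n))

n = 5
counts = count_terms_by_x_count(n)
for k in range(n + 1):
    # counts[k] is the number of terms equal to x^k y^(n-k).
    print(k, counts[k], comb(n, k))
```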
As another example of the use of combinatorial reasoning, let us do a
simple problem.
2.1 Problem: Find the coefficient of $x^7$ in
$(1 + x + x^3)(x + x^2 + x^3)(x^4 + x^6)$.
Solution: Although this problem has no special significance, we present
several ways of doing it, because it will be used to motivate further discussion.
First, there is the brute-force method which consists of fully multiplying
out the three factors. This would give 18 terms, and we take those that
equal $x^7$. But this is laborious and most of the labour is wasted since we
want only the terms that equal $x^7$. So we look for a better solution. Note
that, in the expansion, a term $x^7$ will arise as the product $x^{n_1} x^{n_2} x^{n_3}$ where
$x^{n_1}$, $x^{n_2}$ and $x^{n_3}$ are summands in the first, second and the third
factor respectively. The nature of the factors indicates that $n_1$ has to be 0
(corresponding to the summand 1 which equals $x^0$), 1 or 3. Similarly $n_2$
has to be 1, 2 or 3 and $n_3$ has to be 4 or 6. Moreover,
$x^7 = x^{n_1} x^{n_2} x^{n_3} = x^{n_1 + n_2 + n_3}$
gives $n_1 + n_2 + n_3 = 7$. Thus the problem is equivalent to finding the number
of ordered triples $(n_1, n_2, n_3)$ of integers for which $n_1 + n_2 + n_3 = 7$,
$n_1 = 0$, 1 or 3, $n_2 = 1$, 2 or 3 and $n_3 = 4$ or 6. For $n_3 = 4$, $n_1 + n_2$ must
be 3 and this gives only two possibilities: $n_1 = 0$, $n_2 = 3$ and $n_1 = 1$, $n_2 = 2$.
If $n_3 = 6$, then $n_1 + n_2 = 1$ which can happen only when $n_1 = 0$, $n_2 = 1$. So
in all there are 3 triples satisfying the given conditions. Thus the coefficient
of $x^7$ in the expansion is 3.
This solution can be improved upon slightly. We note that the given
expression equals
$(1 + x + x^3) \cdot x(1 + x + x^2) \cdot x^4(1 + x^2) = x^5 (1 + x + x^3)(1 + x + x^2)(1 + x^2)$.
So the coefficient of $x^7$ in it is the same as the coefficient of $x^2$ in
$(1 + x + x^3)(1 + x + x^2)(1 + x^2)$.
As above, this equals the number of triples $(m_1, m_2, m_3)$ such that
$m_1 = 0$, 1 or 3; $m_2 = 0$, 1 or 2; $m_3 = 0$ or 2
and $m_1 + m_2 + m_3 = 2$. It is a little easier to count the number of such
triples. The count again comes to 3, which is the desired coefficient. ∎
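Both counts in this solution can be reproduced by direct enumeration. The sketch below (illustrative; `count_triples` is an assumed name) runs over all ordered triples with entries drawn from the allowed exponent sets and keeps those with the required sum.

```python
from itertools import product

def count_triples(total, first, second, third):
    """Number of ordered triples (n1, n2, n3), one entry from each set, with n1 + n2 + n3 == total."""
    return sum(1 for n1, n2, n3 in product(first, second, third) if n1 + n2 + n3 == total)

# Original formulation: exponents taken from the three factors as given.
print(count_triples(7, (0, 1, 3), (1, 2, 3), (4, 6)))
# Improved formulation: after taking out x and x^4 as common factors.
print(count_triples(2, (0, 1, 3), (0, 1, 2), (0, 2)))
```

Both calls return the same number because removing the factor $x^5$ merely shifts every exponent sum by 5.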
The solution given above clearly indicates that the preceding algebraic
problem is equivalent to the combinatorial problem of finding the number
of ways to distribute 7 identical balls into three (distinct) boxes so that the
first box contains 0, l or 3 balls, the second one contains 1, 2 or 3 balls
and the third one contains 4 or 6 balls.
We now turn the tables around. Instead of translating an algebraic pro-
blem into a combinatorial one, it is often more rewarding to go the other
way. This is so because algebra is a more developed branch of mathematics.
Numerous tricks are available to manipulate algebraic expressions. Not all
of these tricks may have combinatorial analogues. For example, if we encounter
a factor $1/(1-x)$ we can replace it by the infinite series $1 + x + x^2 +
\cdots + x^n + \cdots$ and vice versa. It is difficult to see the combinatorial analogue
of such a replacement. Even for those algebraic manipulations which do
have combinatorial interpretations, it is often the case that the algebraic
manipulations are easier to think of than their combinatorial equivalents.
For example, in the ‘improvement’ to the solution to the last problem, the
combinatorial interpretation is to start by placing one ball in the second
box and 4 balls in the third box and then find the number of ways to distribute
the remaining 2 balls. Although this trick is simple enough, a
person is not likely to think of it as readily as its corresponding algebraic
manipulation. So heavily drilled in algebra are we, that the moment we see
the expression $(1 + x + x^3)(x + x^2 + x^3)(x^4 + x^6)$, the very first thing that
comes to our mind is to take out $x$ and $x^4$ as common factors from the
second and the third brackets respectively.
It is worthwhile at this point to consider the relationship between
algebra and geometry. Suppose we want to solve simultaneously, the system
of equations: (1) $x - 7y + 25 = 0$ and (2) $x^2 + y^2 = 25$. We could translate
this problem geometrically. If $x$ and $y$ denote the cartesian coordinates of
a point in a plane w.r.t. a fixed rectangular frame of reference, then the
solutions to (1) constitute a straight line $L$ while (2) represents a circle $C$
of radius 5 with centre at the origin. The common solutions to (1) and (2),
therefore, correspond to the points of intersection of $L$ and $C$. If we sketch
$C$ and $L$ we find that they intersect at $(3, 4)$ and $(-4, 3)$. So the solutions
to the system (1) and (2) are: $x = 3$, $y = 4$ and $x = -4$, $y = 3$. However,
this method is rarely used. First of all, it is impossible to sketch curves or
to read the coordinates of points in the plane with absolute precision and
even a slight inaccuracy would lead to erroneous solutions. Secondly, as in
the present case, it is often easier to solve a system of equations algebraically.
That is why, instead of translating algebraic problems in terms of geometry,
we often go the other way. As the reader might have experienced, many
propositions in geometry which require considerable ingenuity to prove by
the methods of Euclid, follow rather routinely from algebra with a suitable
choice of coordinates. (This is not to suggest that geometry is useless. While
geometry may not give the exact values of a solution, it provides deep in-
sights into the nature of the solution).
The relationship between algebra and combinatorics is not quite so
lop-sided. Because of the immense variety of combinatorial problems, there
is no single algebraic model that can handle them all. There are many combi-
natorial results which are still best proved by combinatorics. There are even
algebraic results where combinatorial arguments are needed, the binomial
theorem being one such result. For the time being, however, we shall
concentrate on the use of algebraic manipulations in combinatorics. Specifically,
we shall study how generating functions can be used to formulate
problems of enumeration, that is, problems where we have to count the
number of ways to do a certain thing (such as distributing balls into boxes,
selecting subsets of a given set, permuting objects, traversing paths in a
given diagram) subject to certain restrictions.
The way this technique works is as follows. In a problem of enumeration
there is usually an integer parameter, say $n$. (Sometimes there is more
than one integer parameter.) In a given problem we may be interested only
in one particular value of $n$. For example, in the problem discussed above
we wanted to distribute 7 balls among 3 boxes subject to certain restrictions.
But there is nothing special about 7 and we might as well consider the
problem of enumerating the number of ways to distribute $n$ balls into those
boxes, where $n$ is any non-negative integer. (We could also change the
number of boxes, 3, to a variable, say $m$, but for the moment, let us not
get into this kind of generalisation.) So let $a_n$ be the number of ways to
put $n$ identical balls into three boxes so that the first box contains 0, 1 or
3 balls, the second contains 1, 2 or 3 balls and the third contains 4 or 6
balls. We now look for an expression $A(x)$ of a formal variable $x$, which
will have the property that each way of doing the particular thing (in the
present case, each distribution of the $n$ balls) corresponds to one occurrence
of the term $x^n$ in $A(x)$. Consequently, $a_n$ will be the coefficient of $x^n$ in $A(x)$,
and hence, $A(x)$ will be nothing but the generating function of the sequence
$\{a_n\}_{n=0}^{\infty}$. Having found $A(x)$, we can expand it as a power series and recover
$a_n$ as the coefficient of $x^n$. We often call $A(x)$ the enumerator for the combinatorial
problem concerned. More generally, the term enumerator is
applied to any function (possibly of more than one variable) which stores,
in some way (usually as the coefficients of some terms), the number of
ways to do something. As with generating functions, there are many kinds
of enumerators. But we shall be concerned with only two of them, the
ordinary enumerator (which we shall simply call the enumerator) and the
exponential enumerator. As a rule of thumb, the ordinary enumerator is
useful for problems involving combinations, while, for problems of permutation,
the exponential enumerator is more convenient. In the literature,
the term 'enumerator' is frequently used without a formal definition. The
reason probably is that, as a concept, an enumerator is not substantially
different from a generating function. Perhaps the best way to describe an
enumerator is as the generating function of a sequence associated with an
enumeration problem. In a nutshell, an enumerator is a generating function
put into action!
In essence, what we are doing here is to obtain an algebraic 'coding' of
each way of executing a combinatorial process. Thus, in the problem at
hand, putting $n_1$ balls in the first box, $n_2$ in the second and $n_3$ in the third
is coded by the expression $x^{n_1} x^{n_2} x^{n_3}$ which will equal $x^n$ if
and only if
$n_1 + n_2 + n_3 = n$. The restrictions on the numbers of balls in the boxes
translate into restrictions on the values $n_1$, $n_2$, $n_3$ can take.
Of course, merely paraphrasing a problem is no guarantee of its solution.
In our case, the success of the technique of introducing generating
functions in problems of enumeration depends on the answers to the following
questions: (i) How easy is it to identify the enumerator, say, $A(x)$?
and (ii) How easy is it to expand $A(x)$? Let us answer these questions
for the problem we are studying, namely, finding $a_n$, the number of ways
to put $n$ balls into 3 boxes, subject to the restrictions given above. $A(x)$ is
the O.G.F. of the sequence $\{a_n\}_{n=0}^{\infty}$. Is there any way to get $A(x)$, other
than by first computing $a_n$? The answer is 'yes'. Let us first consider enumerators
for each box. Let $B(x)$ be the ordinary enumerator for the first
box. This means $B(x)$ is the O.G.F. of the sequence $\{b_n\}_{n=0}^{\infty}$ where $b_n$ is the
number of ways to put $n$ balls in the first box. Because of the restriction
imposed, namely that the first box must contain 0, 1 or 3 balls, we have
$b_0 = b_1 = b_3 = 1$ and $b_n = 0$ for all other $n$. So $B(x) = 1 + x + x^3$.
Similarly, if we let $C(x)$ and $D(x)$ be the enumerators for the second and
the third box respectively, then $C(x) = x + x^2 + x^3$ and $D(x) = x^4 + x^6$.
The trick is now to observe that $A(x) = B(x) C(x) D(x)$. This is so because
every occurrence of $x^n$ in $A(x)$, i.e., every distribution of $n$ balls into the
three boxes, arises from a product of the form $x^{n_1} x^{n_2} x^{n_3}$ where $n_1 + n_2 + n_3 = n$
and $x^{n_1}$, $x^{n_2}$, $x^{n_3}$ are terms appearing in $B(x)$, $C(x)$ and $D(x)$ respectively.
Thus we see that question (i) above has an affirmative answer. Question
(ii) reduces to doing a problem like (2.1) (which deals with the case $n = 7$).
We can simplify $A(x)$ as $x^5 (1 + x + x^3)(1 + x + x^2)(1 + x^2)$. From this
we see that $a_n = 0$ for $n < 5$ and for $n > 12$. For $5 \le n \le 12$, $a_n$ is the
coefficient of $x^{n-5}$ in $(1 + x + x^3)(1 + x + x^2)(1 + x^2)$. This expression,
however, has to be expanded 'by hand'. It comes to $1 + 2x + 3x^2 + 4x^3
+ 3x^4 + 3x^5 + x^6 + x^7$. So ultimately, $A(x) = x^5 + 2x^6 + 3x^7 + 4x^8 + 3x^9
+ 3x^{10} + x^{11} + x^{12}$. We can now tell instantaneously, for every $n$, how many
ways there are to put $n$ balls into the 3 boxes subject to the given restrictions.
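The expansion 'by hand' is exactly the kind of mechanical step a machine does well. As a sketch (the list-of-coefficients representation and the name `poly_mul` are illustrative choices, not from the text), the three box enumerators can be multiplied as follows, with the entry at index $n$ holding the coefficient of $x^n$.

```python
def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (index = exponent)."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

B = [1, 1, 0, 1]              # 1 + x + x^3        (first box: 0, 1 or 3 balls)
C = [0, 1, 1, 1]              # x + x^2 + x^3      (second box: 1, 2 or 3 balls)
D = [0, 0, 0, 0, 1, 0, 1]     # x^4 + x^6          (third box: 4 or 6 balls)

A = poly_mul(poly_mul(B, C), D)
for n, a_n in enumerate(A):
    # a_n = number of admissible ways to distribute n balls
    print(n, a_n)
```

The entry at index 7 reproduces the answer to Problem (2.1), and the entries below index 5 are all zero.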
Summing up, in this problem we had a partial success. It was easy to
identify the enumerator A(x). But there was no slick way to expand it. In
some problems, even the second step is easy to carry out and we have com-
plete success. We shall see examples of such problems shortly. However.
even when we have only a partial success, the method is worthwhile be-
cause once we obtain A(x) in some form, expanding it is often a mechanical
process and can be carried out on machines. Secondly. in some problems
it is more important to get a closed form for the enumerator than to expand
it, because the closed form can provide us with the information we need,
as in the following problem.
2.2 Problem: At an international conference, 25 countries send teams of
4 delegates each. A committee is to be formed from these 100 delegates
subject to the following rules: (i) the number of members in the committee
shall be odd (ii) the committee shall include at least one and at most three
delegates from each country. How many such committees can be formed?
Solution: Let us ignore the first rule for the moment. For $n = 0, 1, \ldots$,
let $a_n$ be the number of committees with $n$ members satisfying the second
restriction. Let
$A(x) = \sum_{n=0}^{\infty} a_n x^n$.
Then $A(x)$ is the enumerator for the formation of the committee, ignoring the
first restriction. Every such committee is equivalent to 25 subcommittees,
one from each country, put together. Because of (ii), each such subcommittee
contains 1, 2 or 3 members from that country. From 4 delegates, we
can have 4 subcommittees consisting of 1 member, 6 subcommittees
consisting of 2 members and 4 subcommittees consisting of 3 members. It
follows that $A_i(x)$, the enumerator for formation of the subcommittee for the
ith country, is $4x + 6x^2 + 4x^3$, for $i = 1, \ldots, 25$. Once again, we claim
that the product $A_1(x) A_2(x) \cdots A_{25}(x)$ equals $A(x)$. This is so because for
every $n = 0, 1, 2, \ldots$ each occurrence of $x^n$ in this product is equivalent to
an ordered 25-tuple $(n_1, n_2, \ldots, n_{25})$ where $x^{n_i}$ occurs in $A_i(x)$ and $n_1 + n_2
+ \cdots + n_{25} = n$. So $A(x) = (4x + 6x^2 + 4x^3)^{25}$. It would be horrendous
to expand $A(x)$. But we can get by without it. Note that $A(x)$ is a polynomial
(of degree 75) and hence the expansion is valid for every value of
$x$. In particular, setting $x = 1$, we would get $A(1) = (4 + 6 + 4)^{25} =
14^{25} = a_0 + a_1 + a_2 + \cdots + a_n + \cdots$. This will be the total number of
committees that can be formed so as to satisfy (ii). (This number could
also be obtained without generating functions, since for every country there
are 14 possible ways its subcommittee can be formed and we could combine
the subcommittees any way we like.)
However, in view of the restriction (i), namely, that the number of committee
members be odd, the answer to the problem is not $a_0 + a_1 + a_2 + \cdots$
but $a_1 + a_3 + a_5 + \cdots$. There is a tricky way to evaluate this sum. Note
that $A(-x) = a_0 - a_1 x + a_2 x^2 - a_3 x^3 + \cdots$. So $a_1 x + a_3 x^3 + a_5 x^5 + \cdots
= \frac{A(x) - A(-x)}{2}$. Setting $x = 1$, we get
$a_1 + a_3 + a_5 + \cdots = \frac{A(1) - A(-1)}{2} = \frac{14^{25} - (-2)^{25}}{2} = \frac{1}{2}(14^{25} + 2^{25})$.
This is the desired answer and there is no
easy way to get to it without generating functions. ∎
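Although expanding $A(x)$ by hand would be horrendous, a machine does it at once, which gives an independent check of the trick with $A(1)$ and $A(-1)$. The sketch below is an illustration under the same coefficient-list convention (index = exponent); the helper names are assumptions.

```python
def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (index = exponent)."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

def poly_pow(p, exponent):
    """p raised to a non-negative integer power, by repeated multiplication."""
    result = [1]
    for _ in range(exponent):
        result = poly_mul(result, p)
    return result

def odd_coefficient_sum(p):
    """a_1 + a_3 + a_5 + ... for the polynomial p."""
    return sum(p[1::2])

# A(x) = (4x + 6x^2 + 4x^3)^25
A = poly_pow([0, 4, 6, 4], 25)
print(sum(A) == 14 ** 25)                                   # A(1): all committees satisfying (ii)
print(odd_coefficient_sum(A) == (14 ** 25 + 2 ** 25) // 2)  # committees of odd size
```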
In this problem, there is no easy formula for $a_n$ because $A(x)$
cannot be expanded easily. Still, by factoring $A(x)$ and applying the
binomial theorem, we get
the number of committees with $n$ members
$= a_n$ = coefficient of $x^n$ in $(4x)^{25} \left(1 + \frac{3}{2}x + x^2\right)^{25}$
= coefficient of $x^{n-25}$ in $4^{25} \left[1 + x\left(x + \frac{3}{2}\right)\right]^{25}$
= coefficient of $x^{n-25}$ in $4^{25} \sum_{r=0}^{25} \binom{25}{r} x^r \left(x + \frac{3}{2}\right)^r$
$= 4^{25} \sum_{r=0}^{25} \binom{25}{r} \cdot$ coefficient of $x^{n-r-25}$ in $\left(x + \frac{3}{2}\right)^r$
$= 4^{25} \sum_{r=0}^{25} \binom{25}{r} \binom{r}{n-r-25} \left(\frac{3}{2}\right)^{2r+25-n}$
$= \sum_{r=0}^{25} \binom{25}{r} \binom{r}{n-r-25} 2^{n-2r+25} \, 3^{2r+25-n}$.
This has to be left in the summation form. Thus, although the theory
of enumerators does not give a handy formula for $a_n$, it does express it as
a sum which can be evaluated mechanically for a given value of $n$.
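The summation can indeed be evaluated mechanically. The following sketch (illustrative names; terms in which $n - r - 25$ falls outside the range $0, \ldots, r$ are treated as zero, as the binomial coefficient convention requires) evaluates the sum with exact rational arithmetic and compares it with a direct expansion of $(4x + 6x^2 + 4x^3)^{25}$.

```python
from fractions import Fraction
from math import comb

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (index = exponent)."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

def expand_enumerator(countries=25):
    """Coefficient list of (4x + 6x^2 + 4x^3)^countries."""
    result = [1]
    for _ in range(countries):
        result = poly_mul(result, [0, 4, 6, 4])
    return result

def committees_by_formula(n, countries=25):
    """a_n from the summation formula; out-of-range binomial terms contribute zero."""
    total = Fraction(0)
    for r in range(countries + 1):
        j = n - r - countries
        if 0 <= j <= r:
            total += comb(countries, r) * comb(r, j) * Fraction(3, 2) ** (2 * r + countries - n)
    return 4 ** countries * total

A = expand_enumerator()
print(all(committees_by_formula(n) == A[n] for n in range(len(A))))
```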
As an example where it is easy to identify the enumerator as well
as to expand it, we give an alternate proof of Theorem (2.3.12) where it
was shown that the number of r-selections of $n$ types of objects, with unlimited
repetitions allowed, is equal to $\binom{n+r-1}{r}$. The proof there was a
rather tricky one. With enumerators, we can do the problem routinely. Let
$a_r$ be the desired number of selections (for a fixed $n$). Then $A(x) = \sum_{r=0}^{\infty} a_r x^r$
is the enumerator for the selection of objects of the given $n$ types. Let $A_i(x)$
be the enumerator for the selection of objects of type $i$, $i = 1, \ldots, n$. Since
there is no restriction on the number of objects of type $i$ that can be chosen
and since all objects of the same type are to be regarded as identical, it
follows that for every $m = 0, 1, 2, \ldots$ there is only one way to choose $m$
objects of type $i$. So $A_i(x) = 1 + x + x^2 + \cdots + x^m + \cdots = \frac{1}{1-x}$. Once
again $A(x) = A_1(x) A_2(x) \cdots A_n(x)$. So $A(x) = (1 + x + x^2 + \cdots)^n = (1 - x)^{-n}$.
Expanding this by the binomial theorem (cf. Exercise (1.9)) we get
$a_r = \binom{n+r-1}{r}$ as desired.
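For small values the formula can be compared with a direct listing of the selections. In the sketch below (illustrative; it relies on the standard library's `itertools.combinations_with_replacement`, which lists each multiset of size $r$ exactly once), the $n$ types are represented by the integers $0, \ldots, n-1$.

```python
from itertools import combinations_with_replacement
from math import comb

def count_selections(n, r):
    """Number of r-selections from n types of objects with unlimited repetition, by listing them."""
    return sum(1 for _ in combinations_with_replacement(range(n), r))

for n in range(1, 5):
    for r in range(0, 5):
        # Listed count next to the closed form C(n + r - 1, r).
        print(n, r, count_selections(n, r), comb(n + r - 1, r))
```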
At the heart of the arguments in the preceding examples is the simple
formula $x^i x^j = x^{i+j}$. Repeated applications of this formula allow us
to express the desired enumerator A(x) as the product of the enumerators
$A_1(x), A_2(x), \ldots$ for individual 'containers' (which were either the
boxes, the countries or the piles of objects) because the nature of the
problem in each case is such that the selections for each container can be
made independently of the others and consequently every term in each
$A_i(x)$ can be multiplied by every term in $A_j(x)$ for $i \neq j$. This
will no longer be the case if the conditions of the problem imply some
mutual dependence among the selections for some of the containers. In such
cases, such inter-related containers have to be
handled separately as we illustrate in the following variation of Problem
(2.2).
2.3 Problem: Suppose in Problem (2.2), two of the countries are super-
powers. How many committees can be formed if there is an additional res-
triction that these superpowers have an equal representation in the
committee?
Solution: Denote the enumerator by B(x) this time. Suppose the first two
countries are the superpowers. Then for i = 3, 4, ..., 25, $A_i(x)$ remains
the same, namely, $4x + 6x^2 + 4x^3$. However, we have to consider the two
superpowers together. If one of them has m members in the committee, so does
the other. This means a term $x^m$ for one of them can be multiplied only
with a term $x^m$ for the other and not with any other power of x. Since
there are $\binom{4}{m}$ ways to choose m members from 4 delegates, it
follows that the enumerator for the two superpowers together is no longer
the entire $(4x + 6x^2 + 4x^3)^2$ but only the part
$16x^2 + 36x^4 + 16x^6$ of it. Hence
$$B(x) = (16x^2 + 36x^4 + 16x^6)(4x + 6x^2 + 4x^3)^{23}.$$
As in Problem (2.2), the desired answer is
$\tfrac12[B(1) - B(-1)] = \tfrac12[68\cdot 14^{23} - 68\,(-2)^{23}] = 34\,(14^{23} + 2^{23})$. ∎
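As a sanity check on this answer, B(x) can be expanded explicitly. The sketch below is an illustration only; `poly_mul` and `superpower_enumerator` are hypothetical helper names. It relies on the fact that $\tfrac12[B(1) - B(-1)]$ is the sum of the coefficients of the odd powers of x in B(x), and it assumes nothing about Problem (2.2) beyond the expression used in the solution above.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = degree)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def superpower_enumerator():
    """Expand B(x) = (16x^2 + 36x^4 + 16x^6)(4x + 6x^2 + 4x^3)^23."""
    country = [0, 4, 6, 4]                   # 4x + 6x^2 + 4x^3
    product = [0, 0, 16, 0, 36, 0, 16]       # the two superpowers taken together
    for _ in range(23):
        product = poly_mul(product, country)
    return product

B = superpower_enumerator()
odd_total = sum(c for degree, c in enumerate(B) if degree % 2 == 1)
print(odd_total == 34 * (14**23 + 2**23))
```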
In the examples so far, the desired enumerator came out to be the product
of finitely many expressions. As remarked in the last section, with a little
care, it is possible to consider infinite products, and in Proposition (1.14)
we proved that the infinite product
$(1+x)(1+x^2)(1+x^4)(1+x^8)\cdots(1+x^{2^n})\cdots$ equals
$1 + x + x^2 + x^3 + \cdots$.
By interpreting both sides of this identity as enumerators, we can prove a
result from number theory. Although the result itself can be proved by
other methods and is not very deep, it illustrates how the theory of
enumerators can be applied to prove results from other branches of
mathematics.
2.4 Theorem: Every positive integer has a unique binary expansion, that
is, an expansion with base 2.
Proof: In a binary expansion, the only possible digits are 0 and 1. So the
assertion is equivalent to saying that every positive integer can be uniquely
expressed as a sum of distinct powers of 2. (Here $2^0 = 1$ is to be included
as a power, but not the fractional powers $\tfrac12, \tfrac14, \tfrac18, \ldots$.)
It is easy to interpret this assertion combinatorially. Suppose we have an
infinite number of boxes, $B_0, B_1, B_2, \ldots$. Assume that the box $B_i$
can contain either no ball or else $2^i$ balls for i = 0, 1, 2, ... and
consider the problem of distributing n identical
balls into these boxes. Obviously for every n all except finitely many of
these boxes will be empty. The enumerator $A_i(x)$ for $B_i$ is
$1 + x^{2^i}$, i = 0, 1, 2, .... Let A(x) be the enumerator for distributing
balls into the boxes. Then $A(x) = \prod_{i=0}^{\infty} A_i(x)$ because the
reasoning given earlier for finitely many boxes applies for this case as
well. (Note that when $(1+x)(1+x^2)(1+x^4)(1+x^8)\cdots(1+x^{2^n})\cdots$ is
expanded, each term will be an infinite product, all except finitely many
factors of which are 1. For example,
$x\cdot 1\cdot x^4\cdot x^8\cdot 1\cdot 1\cdot x^{64}\cdot 1\cdot 1\cdots$
equals $x^{77}$. The terms of the form $x^{n_1}x^{n_2}x^{n_3}\cdots$ where
infinitely many of the $n_i$'s are greater than 0, are 0 if |x| < 1, as we
assumed in Proposition (1.14).) By Proposition (1.14),
$A(x) = 1 + x + x^2 + \cdots + x^n + \cdots$. But this means that for every
n, there is only one way of putting n balls into the boxes
$B_0, B_1, B_2, \ldots$, which is equivalent to saying that there is only
one way to express n as a sum of powers of 2. ∎
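The identity used in this proof can be observed on a truncated product. The sketch below is illustrative only (`binary_product_coeffs` is an assumed name): it multiplies out the factors $1 + x^{2^i}$ with $2^i$ not exceeding a chosen degree and checks that every coefficient up to that degree is 1, i.e. that each n has exactly one representation.

```python
def binary_product_coeffs(limit):
    """Coefficients of (1 + x)(1 + x^2)(1 + x^4)... up to degree limit."""
    coeffs = [1] + [0] * limit
    power = 1
    while power <= limit:
        # Multiply by (1 + x^power). Going downwards keeps coeffs[k - power]
        # at its old value, so each box B_i is used at most once.
        for k in range(limit, power - 1, -1):
            coeffs[k] += coeffs[k - power]
        power *= 2
    return coeffs

print(all(c == 1 for c in binary_product_coeffs(100)))
```

Factors with $2^i$ larger than the chosen degree cannot contribute to the retained coefficients, which is why the truncation is harmless.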
The argument given above is partition theoretic. It amounts to saying
that every positive integer has only one partition in which the part sizes
are distinct powers of 2. Analogous arguments yield interesting results
about partitions. Recall (Definition (2.3.14)) that a partition of an integer
n into m parts is a sequence $(n_1, n_2, \ldots, n_m)$ such that
$n_1 \ge n_2 \ge \cdots \ge n_m \ge 1$ and $n_1 + n_2 + \cdots + n_m = n$.
The integers $n_1, n_2, \ldots, n_m$ are called the part sizes. $P_{n,m}$
denotes the number of partitions of n into m parts and p(n) denotes the
total number of partitions of n (into whatever the number of parts). Clearly
$p(n) = \sum_{m=1}^{n} P_{n,m}$. As mentioned in Chapter 2, Section 3,
there is no easy formula either for $P_{n,m}$ or for p(n). However, using
enumerators, it is easy to get formulas for their generating functions. We
begin by finding a closed form for $\sum_{n=0}^{\infty} P_{n,m}\,x^n$, for a
fixed m.
2.5 Proposition: For a positive integer m,
$$\sum_{n=0}^{\infty} P_{n,m}\,x^n = \frac{x^m}{(1-x)(1-x^2)\cdots(1-x^m)}.$$
Proof: By Proposition (2.3.16), $P_{n,m}$ equals the number of partitions of
n in which the largest part size is m. Consider m boxes,
$B_1, B_2, \ldots, B_m$. For i = 1, ..., m suppose $B_i$ contains infinitely
many bags each containing i balls. If we are not allowed to split the bags,
then the number of balls that can be picked from $B_i$ has to be a multiple
of i, i.e. 0, i, 2i, 3i, 4i, .... Any partition of an integer n with part
sizes not exceeding m is equivalent to a way to pick n balls by picking bags
from the boxes $B_1, \ldots, B_m$. All we have to do is to group together
parts of size i and think of each of them as a bag from the box $B_i$. For
example the partition (5, 5, 4, 3, 3, 3, 3, 1, 1) of 28 corresponds to
picking 2 bags from $B_1$, none from $B_2$, 4 bags from $B_3$, 1 from $B_4$
and 2 from $B_5$. The enumerator for the box $B_i$ is
$$1 + x^i + x^{2i} + x^{3i} + \cdots = \frac{1}{1-x^i}.$$
It follows that the enumerator for picking balls from the m boxes is
$$\frac{1}{1-x}\cdot\frac{1}{1-x^2}\cdots\frac{1}{1-x^{m-1}}\cdot\frac{1}{1-x^m}.$$
This would be the generating function for the number of partitions in which
all parts are of size ≤ m. But we want the largest part size to be m. So
there has to be at least one part of size m, which amounts to saying that
at least one bag from the box $B_m$ must be picked. Hence the enumerator
for $B_m$ will not be $1 + x^m + x^{2m} + x^{3m} + \cdots$ but
$x^m + x^{2m} + x^{3m} + \cdots$, which equals
$x^m(1 + x^m + x^{2m} + \cdots) = \frac{x^m}{1-x^m}$. The result is now
clear. ∎
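Proposition (2.5) lends itself to a direct numerical check. In the sketch below (an illustration, with assumed names `parts_exactly` and `series_coeffs`), $P_{n,m}$ is computed from the standard recurrence $P_{n,m} = P_{n-1,m-1} + P_{n-m,m}$, which is not taken from this section, and compared with the coefficients of $x^m/((1-x)\cdots(1-x^m))$.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def parts_exactly(n, m):
    """P_{n,m}: partitions of n into exactly m parts.

    Either the smallest part is 1 (remove it) or every part is at
    least 2 (subtract 1 from each of the m parts).
    """
    if n == 0 and m == 0:
        return 1
    if n <= 0 or m <= 0:
        return 0
    return parts_exactly(n - 1, m - 1) + parts_exactly(n - m, m)

def series_coeffs(m, limit):
    """Coefficients of x^m / ((1 - x)(1 - x^2)...(1 - x^m)) up to degree limit."""
    coeffs = [0] * (limit + 1)
    if m <= limit:
        coeffs[m] = 1                        # the numerator x^m
    for i in range(1, m + 1):
        for k in range(i, limit + 1):        # ascending: multiply by 1/(1 - x^i)
            coeffs[k] += coeffs[k - i]
    return coeffs

print(all(series_coeffs(m, 30)[n] == parts_exactly(n, m)
          for m in range(1, 7) for n in range(31)))
```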
2.6 Corollary: For every positive integer n, p(n), the number of partitions
of n, equals the coefficient of $x^n$ in the expression
$$\frac{x}{1-x} + \frac{x^2}{(1-x)(1-x^2)} + \frac{x^3}{(1-x)(1-x^2)(1-x^3)} + \cdots + \frac{x^n}{(1-x)(1-x^2)\cdots(1-x^n)}.$$
Proof: This is an immediate consequence of the last proposition and the
fact that $p(n) = \sum_{m=1}^{n} P_{n,m}$. ∎
It should be noted that this corollary does not give the generating
function for the sequence $\{p(n)\}_{n=1}^{\infty}$ because the expression
appearing in it depends on n. However, using essentially the same argument
as in the last proposition, we get the generating function for
(p(0), p(1), p(2), ...) where we set p(0) = 1.
2.7 Theorem: For every non-negative integer n, p(n) is the coefficient of
$x^n$ in the infinite product
$$\frac{1}{1-x}\cdot\frac{1}{1-x^2}\cdot\frac{1}{1-x^3}\cdots = \prod_{m=1}^{\infty}\frac{1}{1-x^m}.$$
Proof: We duplicate the argument in the proof of Proposition (2.5), except
that instead of m boxes we have an infinite collection of boxes
$B_1, B_2, \ldots$ with the ith box containing infinitely many bags each
containing i balls. ∎
As a corollary, for a fixed n we get an expression for p(n) which is a
little simpler.
2.8 Corollary: For every positive integer n, p(n) is the coefficient of
$x^n$ in
$$\frac{1}{(1-x)(1-x^2)\cdots(1-x^n)}.$$
Proof: We split the infinite product $\prod_{m=1}^{\infty}\frac{1}{1-x^m}$
as A(x)B(x) where $A(x) = \prod_{m=1}^{n}\frac{1}{1-x^m}$ and
$B(x) = \prod_{m=n+1}^{\infty}\frac{1}{1-x^m}$. When B(x) is expanded it
contains no terms of degree ≤ n, other than 1. So the coefficient of $x^n$
in $\prod_{m=1}^{\infty}\frac{1}{1-x^m}$ is the same as that in A(x). The
result now follows from the last theorem. ∎
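Corollary (2.8) says that only the first n factors matter for the coefficient of $x^n$. A minimal sketch of the resulting mechanical expansion follows; the function name `partition_numbers` is an assumption made for illustration.

```python
def partition_numbers(limit):
    """p(0), ..., p(limit) from the product of 1/(1 - x^m) for m = 1..limit."""
    coeffs = [1] + [0] * limit               # p(0) = 1 by convention
    for m in range(1, limit + 1):
        # Multiplying by 1/(1 - x^m) = 1 + x^m + x^(2m) + ... allows any
        # number of parts of size m; the ascending loop does exactly that.
        for k in range(m, limit + 1):
            coeffs[k] += coeffs[k - m]
    return coeffs

print(partition_numbers(10))
```

Factors with m larger than the chosen degree are omitted because, as in the proof above, they contribute nothing to the retained coefficients.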
When it comes to actually finding p(n) for a given n, the results above
are of little help because expanding the expressions as power series in x
so as to find the coefficient of $x^n$ takes about the same amount of work.
However, the arguments are useful when we are dealing with partitions
with some restrictions on the part sizes. Theorem (2.7) easily generalizes to
show that the number of partitions of n in which the part sizes have to be
from among $m_1, m_2, m_3, \ldots$ (say) is the coefficient of $x^n$ in the
product $\prod_{i}\frac{1}{1-x^{m_i}}$. Sometimes we may be able to expand
this product by other means and thereby get a closed formula for the number
of such partitions. As an illustration, we do the Postage Problem. Recall
that in this problem we have to find the number of ways in which a postage
of two rupees can be affixed on an envelope with stamps of denominations
10 paise, 20 paise and 30 paise.
We assume that the stamps of each denomination are indistinguishable
from each other and that they are available in infinite supply. (In the
present problem, it suffices to assume that at least 20, 10 and 6 stamps of
the three denominations are available respectively.) As remarked earlier,
taking 10 paise as a unit, the problem amounts to finding the number of
partitions of 20 into parts with sizes 1, 2 and 3. The enumerator for such
partitions is
$$\frac{1}{(1-x)(1-x^2)(1-x^3)} = \frac{1}{(1-x)^3(1+x)(1+x+x^2)}.$$
We can resolve this into partial fractions as follows. We have
$1 + x + x^2 = (1-\omega x)(1-\omega^2 x)$ where
$\omega = -\tfrac12 + \tfrac{\sqrt3}{2}\,i$ (cf. Problem (1.13)). Now, let
$$(*)\qquad \frac{1}{(1-x)(1-x^2)(1-x^3)} = \frac{A}{(1-x)^3} + \frac{B}{(1-x)^2} + \frac{C}{1-x} + \frac{D}{1+x} + \frac{E}{1-\omega x} + \frac{F}{1-\omega^2 x}.$$
By expanding the right hand side of (*) and comparing coefficients of
numerators we would get a system of linear equations in the 6 unknowns
A, B, C, D, E, F. Solving it would be a horrendous job. So we apply certain
well-known tricks. Multiplying both sides of (*) by 1 + x and then setting
x = -1, we get D = 1/8. Similarly, we get E = 1/9 and F = 1/9. If we
multiply (*) by $(1-x)^3$ and set x = 1, we get A = 1/6. However, finding
B and C is a little tricky. Multiplying (*) by $(1-x)^3$, we get
$$f(x) = A + B(1-x) + C(1-x)^2 + (1-x)^3 g(x),$$
where $f(x) = \frac{1}{(1+x)(1+x+x^2)} = \frac{1}{1+2x+2x^2+x^3}$ and
$g(x) = \frac{D}{1+x} + \frac{E}{1-\omega x} + \frac{F}{1-\omega^2 x}$.
Differentiating,
$$f'(x) = \frac{-3x^2-4x-2}{(1+x)^2(1+x+x^2)^2} = -B - 2C(1-x) - 3(1-x)^2 g(x) + (1-x)^3 g'(x).$$
We need not compute g'(x). We set x = 1 and get f'(1) = -B, which gives
B = -f'(1) = 9/36 = 1/4. Differentiating again, we could get C from f''(1).
Alternatively, in (*) we set x = 0, giving A + B + C + D + E + F = 1. Since
all other constants are known, C comes out as 17/72. Thus
$$\frac{1}{(1-x)(1-x^2)(1-x^3)} = \frac16\cdot\frac{1}{(1-x)^3} + \frac14\cdot\frac{1}{(1-x)^2} + \frac{17}{72}\cdot\frac{1}{1-x} + \frac18\cdot\frac{1}{1+x} + \frac19\cdot\frac{1}{1-\omega x} + \frac19\cdot\frac{1}{1-\omega^2 x}.$$
We expand each term on the right by Theorem (1.3), (iii) and (iv). The
coefficient of $x^n$ in $(1-x)^{-3}$ is
$$(-1)^n\,\frac{(-3)(-4)\cdots(-3-n+1)}{n!} = \frac{3\cdot 4\cdots(n+2)}{n!} = \binom{n+2}{2}.$$
Similarly the coefficient of $x^n$ in $(1-x)^{-2}$ is $\binom{n+1}{1}$. The
other four terms are expanded by the formula
$\frac{1}{1-ax} = \sum_{n=0}^{\infty} a^n x^n$. Summing it up, we get
$\frac{1}{(1-x)(1-x^2)(1-x^3)} = \sum_{n=0}^{\infty} a_n x^n$, where
$$a_n = \frac16\binom{n+2}{2} + \frac14(n+1) + \frac{17}{72} + \frac18(-1)^n + \frac19\,\omega^n + \frac19\,\omega^{2n}$$
$$\phantom{a_n} = \frac{1}{72}\Bigl[6(n+2)(n+1) + 18(n+1) + 17 + (-1)^n\,9 + 8\omega^n + 8\omega^{2n}\Bigr].$$
We are interested in $a_{20}$, which comes out as 44 since
$\omega^{20} + \omega^{40} = \omega^2 + \omega = -1$. So there are 44 ways
to put a total postage of 2 rupees from stamps of 10 paise, 20 paise and
30 paise. This is exactly the same answer obtained earlier in Chapter 2,
Section 3, but this time the method works for other denominations as well.
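The closed formula for $a_n$ can be compared against a direct expansion of the enumerator. The sketch below is an illustration with assumed names (`count_postage`, `closed_form`); it uses the fact that $\omega^n + \omega^{2n}$ equals 2 when n is a multiple of 3 and -1 otherwise, since $\omega$ is a primitive cube root of unity.

```python
def count_postage(total, denominations):
    """Coefficient of x^total in the product of 1/(1 - x^d) over denominations."""
    coeffs = [1] + [0] * total
    for d in denominations:
        for k in range(d, total + 1):        # unlimited stamps of value d
            coeffs[k] += coeffs[k - d]
    return coeffs[total]

def closed_form(n):
    """a_n from the partial fraction expansion, for stamps of value 1, 2, 3."""
    omega_sum = 2 if n % 3 == 0 else -1      # omega^n + omega^(2n)
    numerator = (6 * (n + 2) * (n + 1) + 18 * (n + 1) + 17
                 + 9 * (-1) ** n + 8 * omega_sum)
    return numerator // 72                   # the numerator is divisible by 72

print(count_postage(20, [1, 2, 3]), closed_form(20))
```

The counting routine takes arbitrary denominations, whereas the closed form is specific to parts of size 1, 2 and 3.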
It is tempting to try to apply the method of solution above to the
expression $\frac{1}{(1-x)(1-x^2)\cdots(1-x^m)}$ and thereby get a formula
for the number of partitions of an integer n into at most m parts (or
equivalently, the number of partitions into parts of size not exceeding m).
Unfortunately, the partial fraction resolution of
$\frac{1}{(1-x)(1-x^2)\cdots(1-x^m)}$ seems quite complicated. So, as
remarked before, there is still no easy formula for p(n). The reason for
applying generating functions to study partitions is not that they yield
such handy formulas. Their real advantage is that by manipulating the
enumerators algebraically we can sometimes show that two apparently
different combinatorial problems have the same enumerator. In such a case,
even if we are not able to expand the enumerator, we do get a relationship
between the counts associated with the two combinatorial problems which may
not be obvious directly. By way of illustration, we prove one such
relationship here.
2.9 Proposition: For every positive integer n, the number of partitions
of n in which the part sizes are distinct equals the number of partitions in
which the part sizes are all odd.
Proof: Let $a_n$ and $b_n$ be the respective numbers of partitions of n with
the stated restrictions. (For example, let n = 5. Then $a_5 = 3$ because
(5), (4, 1), (3, 2) are the partitions of 5 in which the part sizes are
distinct. Similarly $b_5 = 3$, because (5), (3, 1, 1) and (1, 1, 1, 1, 1)
are the partitions of 5 in which the parts are of odd size.) We set
$a_0 = b_0 = 1$. We have to show $a_n = b_n$ for all n. It would be great if
we could compute both $a_n$ and $b_n$ and show directly that they are equal.
But this is too ambitious. Nor is it necessary. It would suffice if we can
establish a one-to-one correspondence (i.e., a bijection) between the sets
of the two kinds of partitions of n. There are many instances where this
kind of an approach works (see e.g. Exercise (2.3.36)). But in the present
case, it is rather clumsy to convert a partition in which the parts are of
distinct size to a partition in which the parts are of odd size or vice
versa. Another approach would be to try induction on n. This would require
us to express $a_n$ in terms of $a_{n-1}$ (or in terms of $a_k$ for some
k < n) and similarly for $b_n$. In the next two sections we shall see
numerous examples where this approach works. But, again, in the present
case, there seems to be no easy way to implement it.
However, with generating functions, we can get the result elegantly. Let
$A(x) = \sum_{n=0}^{\infty} a_n x^n$ and $B(x) = \sum_{n=0}^{\infty} b_n x^n$.
In view of Theorem (1.2) part (i), it would suffice to show that
A(x) = B(x). Now A(x) is the enumerator for partitions in which the part
sizes are distinct. Referring to the proof of Proposition (2.5), this means
that from each box we can pick at most one bag. So the enumerator for the
ith box is only $1 + x^i$. It follows that
$$A(x) = (1+x)(1+x^2)(1+x^3)\cdots(1+x^m)\cdots = \prod_{m=1}^{\infty}(1+x^m).$$
As for B(x), we follow the proof of Theorem (2.7), with the restriction that
the part sizes must be odd. This gives
$$B(x) = \frac{1}{1-x}\cdot\frac{1}{1-x^3}\cdot\frac{1}{1-x^5}\cdots = \prod_{m=1}^{\infty}\frac{1}{1-x^{2m-1}}.$$
So we are reduced to proving the following identity about infinite products:
$$(*)\qquad (1+x)(1+x^2)(1+x^3)\cdots = \frac{1}{(1-x)(1-x^3)(1-x^5)\cdots}.$$
As usual, we assume that infinite products are amenable to the same
algebraic manipulations as are finite products. (A rigorous approach would
require a justification.) Rewriting $1 + x^m$ as $\frac{1-x^{2m}}{1-x^m}$,
the left hand side of (*) becomes
$$\frac{(1-x^2)(1-x^4)(1-x^6)\cdots(1-x^{2m})\cdots}{(1-x)(1-x^2)(1-x^3)\cdots(1-x^m)\cdots}.$$
We see that all the factors in the numerator cancel with the alternate
factors in the denominator. This leaves us precisely with the right hand
side of (*). ∎
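The equality of the two counts can be observed numerically by expanding both products up to a fixed degree. This is only a sketch with assumed function names; it illustrates Proposition (2.9) but is of course no substitute for the proof.

```python
def distinct_part_counts(limit):
    """a_0..a_limit: coefficients of (1 + x)(1 + x^2)(1 + x^3)..."""
    coeffs = [1] + [0] * limit
    for m in range(1, limit + 1):
        for k in range(limit, m - 1, -1):    # descending: part m used at most once
            coeffs[k] += coeffs[k - m]
    return coeffs

def odd_part_counts(limit):
    """b_0..b_limit: coefficients of 1/((1 - x)(1 - x^3)(1 - x^5)...)."""
    coeffs = [1] + [0] * limit
    for m in range(1, limit + 1, 2):
        for k in range(m, limit + 1):        # ascending: part m used any number of times
            coeffs[k] += coeffs[k - m]
    return coeffs

print(distinct_part_counts(50) == odd_part_counts(50))
```

The only difference between the two routines is the direction of the inner loop and the set of admissible part sizes, mirroring the factors $1 + x^m$ and $1/(1 - x^{2m-1})$.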
In the proof above, we proved an algebraic identity (namely (*)) by
direct arguments and interpreted its two sides suitably so as to yield a
combinatorial result. Sometimes it is easier to go the other way. That is,
we find the enumerators of two mutually equivalent combinatorial problems.
Equating these two enumerators gives an algebraic identity. Such
combinatorial proofs of algebraic identities are often ingenious and provide
a testimony to the fact that the relationship between algebra and
combinatorics is not a one way affair. Each branch has enriched the other.
By way of illustration, we prove here one such identity.
2.10 Proposition:
$$(1+x)(1+x^3)(1+x^5)\cdots = \sum_{k=0}^{\infty}\frac{x^{k^2}}{(1-x^2)(1-x^4)\cdots(1-x^{2k})}.$$
Proof: We immediately recognise the left hand side as the enumerator for
the number of partitions into odd, distinct part sizes. What we need is
another way of counting such partitions so that the enumerator would
come out to be the right hand side. The key to this alternate counting is
provided by the Ferrer's graph of a partition, introduced in Chapter 2,
Section 3. As an application of this concept, it is easy to show (cf. Exercise
(2.136)) that the number of partitions of an integer n into parts of odd,
distinct sizes equals the number of self-dual partitions of n, i.e., partitions
which coincide with their dual partitions. We now count the number of
self-dual partitions. Such partitions are characterised by the fact that their
Ferrer's graphs are symmetric in terms of rows and columns. Consider the
largest square of dots that is contained in the left hand top corner of the
Ferrer's graph of a self-dual partition. This square is called the Durfee
square of that partition. (Actually, this concept makes sense for any
partition, but here we need it only for self-dual partitions.) In Figure 7.3
we show the Durfee square of a self-dual partition of 26. It is a 4 × 4 square.

Figure 7.3: Durfee Square of a Partition

Conversely suppose we start with a k × k square of dots. We can then
construct a Ferrer's graph of some partition as follows. We add $n_i$ dots to
the ith row of the square and an equal number of dots below the ith column.
The resulting figure will be the Ferrer's graph of a self-dual partition of
the integer $k^2 + 2n_1 + 2n_2 + \cdots + 2n_k$ provided
$n_1 \ge n_2 \ge \cdots \ge n_k \ge 0$. (In Figure 7.3, $n_1 = 3$, $n_2 = 1$,
$n_3 = 1$ and $n_4 = 0$.) The original k × k square is the Durfee square of
this partition, which is uniquely determined by the integers k and
$n_1, \ldots, n_k$. Let $m = 2n_1 + \cdots + 2n_k$. Then the sequence
$(n_1, \ldots, n_k)$ corresponds to a partition of m into at most k parts of
even size. Conversely every such partition of m gives rise to a sequence
$(n_1, \ldots, n_k)$ (we simply divide the part sizes by 2 each). Following an
argument analogous to that in Proposition (2.5), the number of partitions of
m into at most k parts of even size is the coefficient of $x^m$ in
$\frac{1}{(1-x^2)(1-x^4)\cdots(1-x^{2k})}$, which is the same as the
coefficient of $x^{m+k^2}$ in $\frac{x^{k^2}}{(1-x^2)(1-x^4)\cdots(1-x^{2k})}$.
It now follows that the number of self-dual partitions of an integer n is the
coefficient of $x^n$ in the sum
$$\sum_{k=0}^{\infty}\frac{x^{k^2}}{(1-x^2)(1-x^4)\cdots(1-x^{2k})}.$$
For a fixed n, this sum is really finite since the terms with $k > \sqrt{n}$
contribute nothing to it. Thus we see that the enumerator for self-dual
partitions is
$$\sum_{k=0}^{\infty}\frac{x^{k^2}}{(1-x^2)(1-x^4)\cdots(1-x^{2k})}.$$
This is precisely the right hand side of the identity to be proved. As noted
before, this completes the proof. ∎
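Both sides of the identity in Proposition (2.10) can be expanded up to a fixed degree and compared. The sketch below is illustrative, with assumed function names; the right hand side is built one Durfee square size k at a time, using only those k with $k^2$ not exceeding the chosen degree.

```python
def odd_distinct_counts(limit):
    """Coefficients of (1 + x)(1 + x^3)(1 + x^5)... up to degree limit."""
    coeffs = [1] + [0] * limit
    for part in range(1, limit + 1, 2):
        for d in range(limit, part - 1, -1):     # each odd part used at most once
            coeffs[d] += coeffs[d - part]
    return coeffs

def durfee_sum_counts(limit):
    """Coefficients of sum over k of x^(k^2) / ((1 - x^2)(1 - x^4)...(1 - x^(2k)))."""
    total = [0] * (limit + 1)
    k = 0
    while k * k <= limit:
        term = [0] * (limit + 1)
        term[k * k] = 1                          # numerator: the k x k Durfee square
        for j in range(1, k + 1):
            step = 2 * j
            for d in range(step, limit + 1):     # multiply by 1/(1 - x^(2j))
                term[d] += term[d - step]
        total = [t + u for t, u in zip(total, term)]
        k += 1
    return total

print(odd_distinct_counts(60) == durfee_sum_counts(60))
```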
We now turn to an application of generating functions to probability.
Many counting problems can be paraphrased as problems in probability
because of the definition of probability as the ratio of the number of
favourable cases to the total number of cases, assuming the latter are all
equally likely. Of course, this simple-minded definition presupposes that
the total number of cases is finite. It is possible to extend the theory of
probability to the case where the set, say S, of all possible cases is infinite
using the concept of what is called a probability measure. The essential
idea can be explained as follows. Suppose X is a variable whose values
vary randomly over the set S. Such a variable is called a random (or a
stochastic) variable (or a variate). Now we would like to have a function
$\mu : P(S) \to \mathbb{R}$, such that for each $A \subset S$, $\mu(A)$ gives
the probability that the value assumed by X lies in the subset A. This
function must satisfy certain properties (for example,
$0 \le \mu(A) \le 1$ for all $A \subset S$), and when it does, the
pair $(S, \mu)$ is called a probability measure space. We encountered an
example of this in the Continuous House Problem, where the random variable
X was the ordered pair (x, y) of two houses on the road, the set S was the
unit square in the cartesian plane and for $A \subset S$, $\mu(A)$ was simply
the area of A.
Construction of a probability measure on an infinite set S involves
certain problems beyond our scope. The situation is somewhat tractable if
S is countably infinite. Specifically, let $S = \{s_1, s_2, s_3, \ldots\}$.
Let X be a random variable taking values in S. For each n, let $p_n$ be the
probability that X equals $s_n$. Obviously $0 \le p_n \le 1$ for all
$n \in \mathbb{N}$ and $\sum_{n \in \mathbb{N}} p_n = 1$. Now for any
$A \subset S$, we can define $\mu(A) = \sum_{n\,:\,s_n \in A} p_n$. Clearly
$\mu(A)$ is the probability that the value of X is in A. In case S is a
subset of $\mathbb{R}$, we can speak of the average (or the expected) value
of X, because of the algebraic structure on $\mathbb{R}$. Things are
especially nice if the set S consists of non-negative integers, because in
this case generating functions can be used handily as we shall soon see.
Moreover, this special case has a wide applicability because, as the
examples below show, there are many real-life situations where the outcome
is a non-negative integer.
So, formally we define a discrete random variable as a variable X which
assumes the value n with probability, say, $p_n$. We express this by saying
that $P(X = n) = p_n$ for n = 0, 1, 2, .... Obviously the numbers $p_n$ are
non-negative and $\sum_{n=0}^{\infty} p_n = 1$. The sequence
$\{p_n\}_{n=0}^{\infty}$ is called the probability distribution of X.
Let us consider a few examples of discrete random variables.
(1) Let X denote the figure that appears on top when an ordinary die
is rolled. Then X is a discrete random variable for which $p_n = 1/6$ for
$1 \le n \le 6$, and $p_n = 0$ for all other n.
(2) Let X denote the score when a pair of ordinary dice is rolled.
Here X can assume values from 2 to 12 but not with the same probabilities.
Enumerating the various possibilities it is easy to show that X is a discrete
random variable with probability distribution $\{p_n\}_{n=0}^{\infty}$ where
$p_0 = p_1 = 0$,
$$p_2 = p_{12} = \tfrac{1}{36},\quad p_3 = p_{11} = \tfrac{1}{18},\quad p_4 = p_{10} = \tfrac{1}{12},$$
$$p_5 = p_9 = \tfrac{1}{9},\quad p_6 = p_8 = \tfrac{5}{36},\quad p_7 = \tfrac{1}{6},$$
and $p_n = 0$ for n > 12.
(3) Suppose p is the probability of a head showing up when a coin is
tossed, where 0 < p < 1. Let X be the number of heads occurring in 10 tosses
of the coin (cf. Exercise (1.13)). Then X is a discrete random variable for
which $p_r = \binom{10}{r}\,p^r(1-p)^{10-r}$, r = 0, 1, 2, ..., 10. (A
probability distribution of this kind is called a binomial distribution.)
(4) In all the three examples above, the discrete random variable
assumed only finitely many values and hence the probability $p_n$ was 0 for
all sufficiently large n. As an example where $p_n > 0$ for infinitely many
n's, suppose an unbiassed coin is tossed until a head shows. It can be shown
that the probability of success for this experiment is 1, that is, it is
extremely unlikely that we will keep on tossing the coin endlessly and
getting a tail every time. A rigorous proof of this statement requires the
concept of a measure and is beyond our scope. However, for every positive
integer n, it is easy to calculate the probability that a head will appear
for the first time on the nth toss. We identify a head with 1 and a tail
with 0. Each outcome of tossing a coin n times corresponds to a binary
sequence of length n. There are $2^n$ such sequences and the assumption that
the coin is unbiassed makes all these sequences equally likely. The case
where the first head occurs during the nth toss corresponds to the binary
sequence 0...01, which occurs with probability $\frac{1}{2^n}$. So, if X
denotes the number of times the coin is tossed till a head shows, then X is
a discrete random variable with $p_0 = 0$ and $p_n = \frac{1}{2^n}$ for
n > 0. Note that
$$\sum_{n=0}^{\infty} p_n = \sum_{n=1}^{\infty}\frac{1}{2^n} = 1.$$
Let us now introduce generating functions into the study of discrete
random variables. So far we have been denoting the independent variable
in a generating function by x. Since a discrete random variable is itself
denoted by symbols like X, Y, it is customary to use the variable t for the
generating function instead of x. This is, of course, purely a matter of
notation and does not affect the underlying concept.
2.11 Definition: Let X be a discrete random variable with probability
distribution $\{p_n\}_{n=0}^{\infty}$. Then the probability generating
function of X, denoted by $p_X(t)$, is defined as
$\sum_{n=0}^{\infty} p_n t^n$.
In other words, the probability generating function of a discrete random
variable X is nothing but the ordinary generating function of its probability
distribution. Because of the uniqueness property of power series expansions
(Theorem (1.2) (i)), the random variable X (i.e. its probability distribution)
is uniquely determined by its probability generating function. In order to
show that the probability generating function is more than just a coding
device, we must be able to translate certain concepts about random
variables in terms of their probability generating functions. We discuss two
such concepts.
The first is about the average or expected value of a discrete random
variable. In Exercise (1.10), we defined this concept for a variable which
assumed only finitely many real values. The extension to the case of any
discrete random variable is straightforward.
2.12 Definition: The expected value of a discrete random variable X,
with probability distribution $\{p_n\}_{n=0}^{\infty}$, is defined as
$\sum_{n=0}^{\infty} n\,p_n$.
E(X) is the standard notation for the expected value of a random
variable X. Examples can be given to show that E(X) need not be finite. In
the examples given above, in (1), the expected value is
$$\tfrac16(1+2+3+4+5+6) = \tfrac72.$$
This is also the arithmetic mean of the values assumed by X, because all
these values are assumed with equal probability. This is no longer the case
in Example (2), where the expected value, E(X), is 7 by a direct calculation.
Exercise (1.12) gives the expected value of X in Example (3). In Example (4),
to get the expected value we have to sum the series
$\sum_{n=1}^{\infty}\frac{n}{2^n}$. This can be done by the methods of the
last section. The same methods yield the following alternate description of
the expected value.
2.13 Proposition: If $p_X(t)$ is the probability generating function of a
discrete random variable X, then $E(X) = p_X'(1)$, where
$p_X' = \frac{d}{dt}\,p_X$.
Proof: By definition, $p_X(t) = \sum_{n=0}^{\infty} p_n t^n$. By (iv) of
Theorem (1.2),
$$p_X'(t) = \sum_{n=1}^{\infty} n\,p_n\,t^{n-1}.$$
Setting t = 1, we get
$$p_X'(1) = \sum_{n=1}^{\infty} n\,p_n,$$
which is precisely the expected value of X. ∎
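Proposition (2.13) can be tried out on Examples (1), (2) and (4). The following sketch is an illustration only: `expected_value_from_pgf` is an assumed name, distributions are stored as lists with entry n equal to $p_n$, and the infinite distribution of Example (4) is truncated at 60 tosses, so the value obtained there falls slightly short of the full sum of the series.

```python
from fractions import Fraction

def expected_value_from_pgf(dist):
    """Return p_X'(1), where dist[n] = p_n are the coefficients of p_X(t)."""
    derivative = [n * p for n, p in enumerate(dist)][1:]   # coefficients of p_X'(t)
    return sum(derivative)                                  # evaluate at t = 1

# Example (1): one die.
one_die = [Fraction(0)] + [Fraction(1, 6)] * 6

# Example (2): score of a pair of dice, by enumerating all 36 outcomes.
two_dice = [Fraction(0)] * 13
for a in range(1, 7):
    for b in range(1, 7):
        two_dice[a + b] += Fraction(1, 36)

# Example (4): first head on the nth toss, truncated at n = 60.
first_head = [Fraction(0)] + [Fraction(1, 2**n) for n in range(1, 61)]

print(expected_value_from_pgf(one_die))
print(expected_value_from_pgf(two_dice))
print(float(expected_value_from_pgf(first_head)))
```

Exact fractions are used so that the first two values come out as 7/2 and 7 without rounding.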
In Example (4) above,
$$p_X(t) = \sum_{n=1}^{\infty}\frac{t^n}{2^n} = \frac{t/2}{1 - t/2} = \frac{t}{2-t} = \frac{2}{2-t} - 1.$$
So
$$p_X'(t) = \frac{2}{(2-t)^2}.$$
Hence $E(X) = p_X'(1) = 2$. This means that if we keep on tossing a coin
then the 'expected' number of times it will have to be tossed before a head
shows up is 2. This does not mean that a head will necessarily show in 2
tosses. Indeed, as we saw above, even for a thousand tosses there is always
some probability (namely $\frac{1}{2^{1000}}$) that no head will show. What,
then, is the 'real life' interpretation of the 'expected' value? This is
actually a deep question and tends to be more philosophical than
mathematical. The mathematical definitions of probability do not really
answer what probability is. They merely express one probability in terms of
some others. This is analogous to the stand taken in mathematical logic where
we never answer what is meant by 'true' or 'false'. Instead our concern is to
express the truth or the falsehood of a complicated statement (or, in other
words, its truth value) in terms of the truth values of the statements from
which it is formed. A vague but intuitively appealing description of expected
value is that it is the average over a very large sample. In the present
case, suppose one person tosses a coin till a head shows. He may or may not
get a head in 2 tosses. But suppose 10000 persons each do the same
experiment. Then the average number of tosses will be close to 2. Again this
is not a certainty. But the probability that it will differ from 2 by more
than, say, .0001 is small, which means that if this experiment in which large
numbers of persons flip coins is repeated a large number of times, then in a
large majority of them the averages will differ from 2 by at most .0001. But
again, we cannot be absolutely sure of this! This only serves to point to the
circuitous nature of probability.
We leave this philosophical discussion here and return to showing how
generating functions provide a succinct description of some of the concepts
associated with discrete random variables. Expected value is one such con-
cept. As another example, we consider a concept which deals with the
mutual relationship between two such variables, say. X and Y. Intuitively,
we say X and Y are independent, if the values assumed by either one of
them are completely independent of each other. In terms of probabilities,
we can state this condition as follows.
2.14 Definition: Two discrete random variables X and Y with probability
distributions $\{p_n\}_{n=0}^{\infty}$ and $\{q_n\}_{n=0}^{\infty}$
respectively are said to be (mutually) independent if for every pair (i, j)
of non-negative integers,
$P(X = i \text{ and } Y = j) = P(X = i)\cdot P(Y = j)$.
For example if a pair of dice is rolled and X and Y denote the figures
on the faces of the two dice then X and Y are independent. On the other
hand suppose we pick two cards, one after the other, randomly from six
cards marked 1 to 6. Let X, Y denote the numbers on the first and the
second cards respectively. Then X, Y are discrete random variables each
having the same probability distribution, namely,
$$\bigl(0, \tfrac16, \tfrac16, \tfrac16, \tfrac16, \tfrac16, \tfrac16, 0, 0, 0, \ldots\bigr).$$
Here X and Y are not independent, because they can never be equal.
Now let X, Y be any two discrete random variables. Then their sum,
X + Y, is also a discrete random variable. Its values are the sums of the
various possible values assumed by X and Y. However, very little can be
said about its probability distribution unless we know how X and Y are
related. In case X and Y are independent, the answer is simple and can be
expressed elegantly in terms of probability generating functions as we now
show.
2.15 Theorem: If two discrete random variables X and Y are independent then
$p_{X+Y}(t) = p_X(t)\,p_Y(t)$.
Proof: Let $\{p_n\}$, $\{q_n\}$, $\{r_n\}$ be, respectively, the probability
distributions of X, Y and X + Y. For a given non-negative integer n, X + Y
can assume the value n whenever X and Y assume values i and j for which
i + j = n. Hence
$P(X + Y = n) = \sum_{i=0}^{n} P(X = i \text{ and } Y = n - i)$. Since X, Y
are independent, $P(X = i \text{ and } Y = n - i) = p_i\,q_{n-i}$. So for
every $n \ge 0$, $r_n = \sum_{i=0}^{n} p_i\,q_{n-i}$ and hence
$$\sum_{n=0}^{\infty} r_n t^n = \sum_{n=0}^{\infty}\sum_{i=0}^{n} p_i t^i\,q_{n-i}t^{n-i} = \Bigl(\sum_{i=0}^{\infty} p_i t^i\Bigr)\Bigl(\sum_{j=0}^{\infty} q_j t^j\Bigr).$$
This proves the assertion. (In essence we are applying Theorem (1.2) (iii)
here.) ∎
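Theorem (2.15) can be illustrated with the dice of Examples (1) and (2): the score of a pair of dice is the sum of two independent single-die variables, so its distribution should be the coefficient sequence of the square of the single-die generating function. The sketch below uses an assumed helper name `pgf_product`.

```python
from fractions import Fraction

def pgf_product(p, q):
    """Coefficients of p_X(t) * p_Y(t): r_n = sum of p_i * q_(n-i)."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

die = [Fraction(0)] + [Fraction(1, 6)] * 6       # p_1 = ... = p_6 = 1/6
score = pgf_product(die, die)                    # distribution of the total score

print(score[2], score[6], score[7], sum(score))
```

The entries agree with the distribution listed in Example (2), for instance $p_7 = 1/6$ and $p_6 = 5/36$.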
We conclude this section with a discussion of the use of exponential
generating functions in problems of enumeration. It was remarked in the
last section that theoretically the E.G.F. is not an independent concept
inasmuch as the E.G.F. of the sequence $\{a_r\}_{r=0}^{\infty}$ is precisely
the O.G.F. of the sequence $\{a_r/r!\}_{r=0}^{\infty}$. However, problems
involving permutations can be handled more elegantly with exponential
enumerators than with ordinary enumerators. For example, let n be a fixed
positive integer. For each r, the number of r-permutations of n objects is
${}_nP_r = n(n-1)\cdots(n-r+1) = \frac{n!}{(n-r)!}$. So the ordinary
enumerator for permutations of n objects is
$\sum_{r=0}^{\infty} {}_nP_r\,x^r = \sum_{r=0}^{n}\frac{n!}{(n-r)!}\,x^r$.
This is really a finite sum since ${}_nP_r = 0$ for r > n. But there is no
closed form expression for it. The exponential enumerator in this case, on
the other hand, is
$\sum_{r=0}^{n}\frac{n!}{(n-r)!\,r!}\,x^r = \sum_{r=0}^{n}\binom{n}{r}x^r$,
which is nothing but $(1+x)^n$. Here the objects are distinct and we are not
allowing repetitions in the permutations. If we keep the n objects distinct,
but allow unlimited repetitions, then for every r (including r > n), the
number of permutations is $n^r$ and so the ordinary enumerator for
permutations of n objects, with repetitions allowed, is
$\sum_{r=0}^{\infty} n^r x^r$, which, now, is an infinite sum whose value is
$\frac{1}{1-nx}$. The exponential enumerator in this case is
$\sum_{r=0}^{\infty}\frac{n^r}{r!}\,x^r = \sum_{r=0}^{\infty}\frac{(nx)^r}{r!} = e^{nx}$.
Here although we do have a closed form expression for the ordinary
enumerator, the expression $\frac{1}{1-nx}$ does not suggest any alternate
way to arrive at it. The exponential enumerator, $e^{nx}$, on the other
hand, factors as $e^x\cdot e^x\cdots e^x$ (n times) and suggests that there
might be some other method to derive it. Such a method indeed exists and
since it has many other applications besides providing an alternate
derivation of the formula above, we proceed to study it. As with ordinary
enumerators, the crux of this method is the simple formula that
$x^i x^j = x^{i+j}$.
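The two exponential enumerators above can be checked by expanding $(1+x)^n$ and $(e^x)^n$ as truncated power series and multiplying the coefficient of $x^r$ by r!. This is a sketch under stated assumptions: `series_power` and `counts_from_egf` are illustrative names, and $e^x$ is truncated at a chosen degree, which leaves the retained coefficients exact.

```python
from fractions import Fraction
from math import factorial, perm

def series_power(series, n, limit):
    """Coefficients of series(x)^n up to degree limit (exact arithmetic)."""
    result = [Fraction(1)] + [Fraction(0)] * limit
    for _ in range(n):
        nxt = [Fraction(0)] * (limit + 1)
        for i, a in enumerate(result):
            for j, b in enumerate(series):
                if i + j > limit:
                    break
                nxt[i + j] += a * b
        result = nxt
    return result

def counts_from_egf(coeffs):
    """Recover a_r from the E.G.F. coefficients a_r / r!."""
    return [c * factorial(r) for r, c in enumerate(coeffs)]

n, limit = 5, 8
no_repeats = counts_from_egf(series_power([Fraction(1), Fraction(1)], n, limit))
exp_x = [Fraction(1, factorial(r)) for r in range(limit + 1)]
with_repeats = counts_from_egf(series_power(exp_x, n, limit))

print(no_repeats == [perm(n, r) for r in range(limit + 1)])
print(with_repeats == [n**r for r in range(limit + 1)])
```

Raising the one-object series to the nth power is the factorisation $e^{nx} = e^x\cdot e^x\cdots e^x$ carried out coefficient by coefficient.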
Recall that an r-permutation (or a permutation of length r) is an ordered
r-tuple of objects*. Equivalently it is an arrangement of r objects in a row,
i.e. a way of filling the cells of an r × 1 rectangle. Suppose we are
interested in finding, for each value of r, the number of permutations where
the objects to be placed in these cells come from two mutually disjoint
sources, say, $S_1$ and $S_2$. These permutations may be required to obey
certain conditions regarding how many times certain objects can appear. We
assume that the nature of these conditions is such that the sources $S_1$ and
$S_2$ are independent in the following sense. Let $(a_1, a_2, \ldots, a_r)$
be a 'permissible' permutation. Out of the r objects, let
$a_{i_1}, a_{i_2}, \ldots, a_{i_p}$ (say) come from $S_1$ and the remaining
r - p come from $S_2$, where $0 \le p \le r$ and
$1 \le i_1 < i_2 < \cdots < i_p \le r$. Call the remaining objects
$b_1, \ldots, b_q$ where q = r - p and the b's appear in the same order as in
$(a_1, \ldots, a_r)$. Then $(a_{i_1}, \ldots, a_{i_p})$ is a permutation of
objects from $S_1$ and $(b_1, \ldots, b_q)$ is a permutation of objects from
$S_2$. We assume that the nature of the conditions on the permutations is
such that $(a_1, a_2, \ldots, a_r)$ is a permissible permutation if and only
if both $(a_{i_1}, \ldots, a_{i_p})$ and $(b_1, \ldots, b_q)$ are permissible
permutations (i.e. satisfy the respective conditions for the sources $S_1$
and $S_2$). In other words every permissible permutation is composed by
freely merging (and not just concatenating) two permissible
'subpermutations', one coming from the source $S_1$ and the other from
$S_2$. (By free merging we mean that the entries of one permutation can
freely intersperse with those of the other, but the relative order of the
objects in each permutation must be maintained. For example, (a, b, a) can
be merged with (1, 2) to give (a, b, a, 1, 2), (a, b, 1, a, 2),
(a, b, 1, 2, a), (1, a, 2, b, a), (1, 2, a, b, a) etc. but not
(a, a, b, 1, 2) nor (a, 2, b, a, 1). It
is easy to show that two permutations of length p and q respectively can
* In Chapter 2, Section 2 we required that-the entries in a permutation be dis-
tinct. We remove that restriction here. as in some problem: wehave to conslder
permutations with repetitions.
552 mscxm MATHEMATICS (Chapter Seven)
. P+q
be merged together In ) ways. This fact is the essence of what is
P
to come).
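The count of free merges can be checked by brute force. The following Python sketch is only an illustration; the name `free_merges` is ours and does not come from the text. It builds every free merge of two tuples by choosing which cells receive the entries of the first tuple, as in the counting argument, and it assumes the two tuples draw their entries from disjoint sources, so that different choices of cells give different merges.

```python
from itertools import combinations
from math import comb


def free_merges(first, second):
    """Return all free merges of two tuples, keeping each tuple's internal order."""
    p, q = len(first), len(second)
    merges = []
    # Choose which p of the p + q cells receive the entries of `first`.
    for cells in combinations(range(p + q), p):
        chosen = set(cells)
        it_first, it_second = iter(first), iter(second)
        merges.append(tuple(next(it_first) if i in chosen else next(it_second)
                            for i in range(p + q)))
    return merges


merges = free_merges(("a", "b", "a"), (1, 2))
# The number of merges should agree with the binomial coefficient C(p+q, p).
print(len(merges), comb(5, 3))
print(("a", "b", 1, "a", 2) in merges)   # a legitimate merge
print(("a", "a", "b", 1, 2) in merges)   # breaks the order of (a, b, a)
```

Every merge produced this way keeps (a, b, a) and (1, 2) in their original relative order, which is exactly the requirement in the definition of free merging.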
This assumption about the mutual independence of the sources $S_1$ and
$S_2$ is not very restrictive and in fact, as we shall soon see, holds true in
many naturally occurring examples. But first let us see how it helps in
identifying the exponential enumerator for the permutations satisfying the
given conditions. For $r \ge 0$, let $a_r$ be the number of permissible
$r$-permutations. Let $A(x) = \sum_{r=0}^{\infty} \frac{a_r}{r!} x^r$.
Similarly, let $B(x) = \sum_{r=0}^{\infty} \frac{b_r}{r!} x^r$ and
$C(x) = \sum_{r=0}^{\infty} \frac{c_r}{r!} x^r$ where $b_r$ and $c_r$ are,
respectively, the numbers of permissible $r$-permutations for sources $S_1$
and $S_2$. We contend that for every $r$,

    $a_r = \sum_{p=0}^{r} \binom{r}{p} b_p c_{r-p}$.

To see this, note that every permissible $r$-permutation arises from merging
together a permissible $p$-permutation from $S_1$ and a permissible
$(r-p)$-permutation from $S_2$ for some $p$, $0 \le p \le r$. Every such pair
of permutations can be merged together in $\binom{r}{p}$ ways, because out of
$r$ cells, we can choose $p$ cells in $\binom{r}{p}$ ways and the cells so
chosen are to be used to accommodate the $p$-permutation from $S_1$ and the
remaining $(r-p)$ cells to accommodate the permutation from $S_2$. Recalling
that $\binom{r}{p} = \frac{r!}{p!\,(r-p)!}$, we get that
$\frac{a_r}{r!} = \sum_{p=0}^{r} \frac{b_p}{p!} \cdot \frac{c_{r-p}}{(r-p)!}$.
By Theorem (1.2), part (iii), this amounts to saying that $A(x) = B(x)\,C(x)$.
Note that there would have been no such simple relationship if, instead of
exponential enumerators, we had taken ordinary enumerators.
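The formula $a_r = \sum_p \binom{r}{p} b_p c_{r-p}$ can be tried out on a small case. In the Python sketch below, the two sources and all names are our own illustrative choices: $S_1 = \{a\}$ with the object used at most once, and $S_2 = \{c, d\}$ with unlimited repetitions. The counts from the formula are compared with a direct enumeration.

```python
from itertools import product
from math import comb


def binomial_convolution(b, c):
    """a_r = sum over p of C(r, p) * b_p * c_(r-p), for r below len(b) and len(c)."""
    size = min(len(b), len(c))
    return [sum(comb(r, p) * b[p] * c[r - p] for p in range(r + 1))
            for r in range(size)]


R = 6
b = [1, 1] + [0] * (R - 2)        # S1 = {a}: 'a' appears at most once
c = [2 ** r for r in range(R)]    # S2 = {c, d}: unlimited repetitions


def brute_force(r):
    """Count sequences of length r over {a, c, d} with at most one 'a'."""
    return sum(1 for w in product("acd", repeat=r) if w.count("a") <= 1)


a = binomial_convolution(b, c)
print(a)
print([brute_force(r) for r in range(R)])
```

Dividing $a_r$, $b_r$ and $c_r$ by $r!$ turns this binomial convolution into the ordinary product of power series, which is the content of $A(x) = B(x)\,C(x)$.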
The extension to the case of n sources is immediate and we record it as
a theorem.
2.16 Theorem: Let $S_1, S_2, \ldots, S_n$ be mutually disjoint sets of objects.
For $i = 1, \ldots, n$, let $A_i(x)$ be the exponential enumerator for a
certain class of permutations of the objects in the set $S_i$ (to be called
permissible permutations hereafter). Then the product
$A_1(x) A_2(x) \cdots A_n(x)$ is the exponential enumerator for those
permutations of the objects in $S_1 \cup S_2 \cup \cdots \cup S_n$ which are
obtained by freely merging together the permissible permutations from each
$S_i$.
Proof: We proceed by induction on $n$. For $n = 1$, there is nothing to
prove. The argument above proves the case $n = 2$ and also shows how to
do the inductive step. ∎
We are now in a position to give alternate derivations of the number of
permutations of $n$ distinct objects, say, $a_1, \ldots, a_n$. Let
$S_i = \{a_i\}$. First suppose no repetitions are allowed. Then for each
$i = 1, \ldots, n$, for $r = 0$ there is the empty permutation, for $r = 1$
there is the lone permutation $(a_i)$, while for $r > 1$ there are no
$r$-permutations of $S_i$. So each $A_i(x) = 1 + x$. Hence by the theorem
above, $A(x) = (1 + x)^n$. On the other hand, suppose unlimited repetitions
are allowed. Then for each $i = 1, \ldots, n$ and $r \ge 0$, there is exactly
one $r$-permutation of $S_i$, namely the $r$-tuple $(a_i, a_i, \ldots, a_i)$.
So, in this case,
$A_i(x) = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$. By (i)
in Theorem (1.3), $A_i(x) = e^x$. By the theorem above,
$A(x) = (e^x)^n = e^{nx}$.
As another example of the use of exponential enumerators, we give an
alternate proof of Theorem (3.2.8) where it was shown that for positive
integers k and n, the number of k-ary sequences of length n in which a
particular symbol, say 1, appears an even number of times is
$\frac{k^n + (k-2)^n}{2}$. The argument given earlier was somewhat tricky,
being based on a classification of all such sequences. With our present
technique the problem amounts to counting the permutations of $k$ objects,
say $a_1, \ldots, a_k$, in which one of the objects, say $a_1$, appears an
even number of times and there is no restriction on the number of occurrences
of the other objects. Once again, let $S_i = \{a_i\}$, for $i = 1, \ldots, k$.
Then the desired enumerator, $A(x)$, equals $A_1(x) A_2(x) \cdots A_k(x)$. As
before, for $i = 2, \ldots, k$,
$A_i(x) = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \cdots + \frac{x^r}{r!} + \cdots = e^x$.
However, since $a_1$ can appear only an even number of times, the enumerator
$A_1(x)$ is not the entire sum $1 + \frac{x}{1!} + \frac{x^2}{2!} + \cdots$ but
only the even powers in it, namely
$1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \cdots$. It is easily seen that this
equals $\frac{e^x + e^{-x}}{2}$. So

    $A(x) = \left(\frac{e^x + e^{-x}}{2}\right)(e^x)^{k-1} = \frac{1}{2}\left[e^{kx} + e^{(k-2)x}\right]$.

This can be expanded as
$\sum_{r=0}^{\infty} \left[\frac{k^r + (k-2)^r}{2\,r!}\right] x^r$. Since this
is the exponential and not the ordinary enumerator, the desired number of
permutations is $n!$ times the coefficient of $x^n$, proving the result.
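For small $k$ and $n$ the result can be confirmed by listing all sequences. The Python sketch below is only a sanity check under our own naming (`count_even_ones` and `closed_form` are illustrative names); it counts $k$-ary sequences of length $n$ over the symbols $1, \ldots, k$ in which the symbol 1 appears an even number of times.

```python
from itertools import product


def count_even_ones(k, n):
    """Brute-force count of k-ary sequences of length n with an even number of 1's."""
    symbols = range(1, k + 1)
    return sum(1 for seq in product(symbols, repeat=n) if seq.count(1) % 2 == 0)


def closed_form(k, n):
    """(k^n + (k-2)^n) / 2; the numerator is always even, so // is exact."""
    return (k ** n + (k - 2) ** n) // 2


for k in range(1, 5):
    print(k, [count_even_ones(k, n) for n in range(1, 6)],
          [closed_form(k, n) for n in range(1, 6)])
```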
As with ordinary enumerators, it can happen that in some problems it is
easy to identify the exponential enumerator but not easy to expand it.
Consider, for example, the problem of distributing distinct objects into
distinct boxes so that no box is empty. In Proposition (2.3.8), we showed
that the number of ways to distribute $n$ distinct objects into $m$ distinct
boxes so that no box is empty is $m!\,S_{n,m}$, where $S_{n,m}$ is the
Stirling number of the second kind, defined as the number of ways to put $n$
distinct objects into $m$ non-distinct boxes with no box empty, or
equivalently the number of ways of partitioning a set of cardinality $n$ into
$m$ mutually disjoint, non-empty subsets. It was remarked that there was no
easy formula for $S_{n,m}$. Note that $S_{n,m}$ is not the same as $P_{n,m}$,
which is the number of partitions of the integer $n$ into exactly $m$ parts.
Although there is no easy formula for $P_{n,m}$, in Proposition (2.5) we
obtained the O.G.F. of the sequence $\{P_{n,m}\}_{n=0}^{\infty}$ for fixed $m$.
Similarly, although there is no formula for the Stirling numbers $S_{n,m}$, it
is easy to obtain the E.G.F. of the sequence $\{S_{n,m}\}_{n=0}^{\infty}$ for a
fixed $m$.
2.17 Proposition: Let $m$ be a positive integer. Then

    $\sum_{n=0}^{\infty} \frac{S_{n,m}}{n!} x^n = \frac{1}{m!} (e^x - 1)^m$.

Proof: Let us first work with $(e^x - 1)^m$. Consider $m$ distinct boxes, say,
$B_1, B_2, \ldots, B_m$ and let us count the permutations of these boxes with
repetitions allowed and with the restriction that each box must appear at least
once. (Later on we shall put certain objects in these boxes. It is important
to note, however, that here we are dealing with the permutations of the
boxes and not of the objects that will be put in them. In other words, for
the time being, these boxes are our ‘objects’.) We repeat the reasoning in
the examples following Theorem (2.16). For $i = 1, \ldots, m$, let
$S_i = \{B_i\}$. Since the box $B_i$ must appear at least once, we have to
exclude the empty permutation (i.e., the permutation of length 0) from each
$A_i(x)$. Consequently, $A_i(x)$ is not $e^x$ but $e^x - 1$ for each
$i = 1, \ldots, m$. By Theorem (2.16), $(e^x - 1)^m$ is the exponential
enumerator for permutations of the $m$ boxes $B_1, \ldots, B_m$ with
repetitions allowed, in which every box appears at least once. We now look at
such permutations a little differently. Let $n$ be a fixed integer and let $T$
be a set with $n$ distinct elements, say $T = \{a_1, \ldots, a_n\}$. Let us
place these elements (or ‘objects’) in the boxes $B_1, \ldots, B_m$. Every such
placement is equivalent to a permutation $(B_{i_1}, B_{i_2}, \ldots, B_{i_n})$
where for $k = 1, \ldots, n$, $B_{i_k}$ is the box in which the object $a_k$ is
put. The requirement that every box shall appear at least once corresponds to
no box being empty while allowing unlimited repetitions translates into there
being no upper bound on how many objects can go into each box. Thus the
$n$-permutations of the boxes with the stipulated restrictions correspond to
ways of putting $n$ distinct objects in $m$ distinct boxes with no box empty.
By Proposition (2.3.8), this can be done in $m!\,S_{n,m}$ ways. Summing it up,
$(e^x - 1)^m$ is the exponential generating function of the sequence
$\{m!\,S_{n,m}\}_{n=0}^{\infty}$. To finish the proof we merely divide by
$m!$. ∎
Expanding $(e^x - 1)^m$ as a power series in $x$, we can now show that for
each $n$,
$S_{n,m} = \sum_{r=0}^{m-1} \frac{(-1)^r (m-r)^{n-1}}{r!\,(m-r-1)!}$.
This is the same as Theorem (2.4.8),
which was proved using the principle of inclusion and exclusion. We could
have as well started from Theorem (2.4.8) and obtained another proof of
Proposition (2.17). Whenever possible, it is generally more desirable to
obtain enumerators by combinatorial arguments.
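The two descriptions of $S_{n,m}$ can be compared numerically. The Python sketch below is an informal check with names of our own choosing: it evaluates the explicit sum (as we read it, $\sum_{r=0}^{m-1} (-1)^r (m-r)^{n-1} / (r!\,(m-r-1)!)$, for $n \ge 1$) with exact fractions and compares $m!\,S_{n,m}$ with a direct count of the placements of $n$ distinct objects into $m$ distinct boxes with no box empty.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def stirling2_formula(n, m):
    """Explicit sum for S(n, m), assuming n >= 1 and m >= 1."""
    total = sum(Fraction((-1) ** r * (m - r) ** (n - 1),
                         factorial(r) * factorial(m - r - 1))
                for r in range(m))
    assert total.denominator == 1   # the sum must come out as an integer
    return int(total)


def count_placements(n, m):
    """Ways to put n distinct objects into m distinct boxes, no box empty."""
    return sum(1 for boxes in product(range(m), repeat=n) if len(set(boxes)) == m)


for n in range(1, 6):
    print(n, [stirling2_formula(n, m) for m in range(1, n + 1)])
```

Each tuple `boxes` in `count_placements` is precisely an $n$-permutation of the boxes in the sense of the proof: its $k$th entry is the box receiving the $k$th object.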
Exercises
2.1 Find the coefficient of x“ in the following expressions:
(i) (1 + X’ + x')(x‘ + x‘)(x + x')
(ii) (1+ 5x +10x')‘
(iii) (1 + x3)"
2.2 Devise suitable combinatorial problems which can be solved using
the last exercise.
2.3 Find the enumerator for the selections of $k$ kinds of objects, if the
selection must not include more than $m_i$ objects of type $i$,
$i = 1, 2, \ldots, k$.
2.4 Find the enumerator for selections of books from two shelves
containing 10 and 7 books respectively if the selection must include an
odd number of books from the first shelf.
2.5 Find the enumerator for the total score if n distinct dice are tossed,
n being a fixed positive integer.
2.6 A ‘crazy’ dice is defined as one whose six faces are marked differently
than those on an ordinary dice. Prove that if two crazy dice, one
marked with figures 0, 1, 2, 3, 4 and 5 and the other with 2, 3, 4,
5, 6 and 7 are rolled together, each possible total score is as likely
as that for a pair of ordinary dice.
*2.7 Devise a pair of crazy dice, the markings on whose faces do not
form arithmetic progressions, which have the property that when
they are rolled together, each possible total score is as likely as that
for a pair of ordinary dice. [Hint: Factor
$(x + x^2 + x^3 + x^4 + x^5 + x^6)^2$ as a product of unequal factors, say
$A(x)$ and $B(x)$, such that $A(1) = B(1) = 6$.]
2.8 Suppose at an international conference, each country sends a team
of 4 delegates. There are 5 capitalist countries, 6 communist countries
and 7 non-aligned countries at the conference. Let $a_n$ be the
number of committees that can be formed in which there are $n$
members and in which the capitalist countries have the same
representation as the communist countries. Identify the O.G.F. of
$\{a_n\}_{n=0}^{\infty}$. Evaluate $a_n$ for $n < 10$.
2.9 Generalise Theorem (2.4) to any base $b$, by first proving the identity
$(1 + x + \cdots + x^{b-1})(1 + x^{b} + x^{2b} + \cdots + x^{(b-1)b}) \cdots = \frac{1}{1-x}$.
2.10 Negative coefficients rarely appear in enumerators. (Why?) However,
sometimes it is convenient to take the negatives of some
coefficients so as to indicate parity. Suppose, for example, a box contains
many bags each containing $m$ balls, where $m$ is a positive integer.
Prove that $1 - x^m + x^{2m} - x^{3m} + x^{4m} - x^{5m} + \cdots$ still
enumerates the number of ways to pick balls from this box (without breaking
any bag) and also indicates the parity of the number of bags
picked.
2.11 For every positive integer $n$, consider partitions of $n$ into parts
whose sizes are (not necessarily distinct) powers of 2. Among all
such partitions, let $a_n$ be the number of those in which the number
of parts is even and $b_n$ the number of those in which there is an
odd number of parts. Using the last exercise and the identity
$1 - x = \frac{1}{1+x} \cdot \frac{1}{1+x^2} \cdot \frac{1}{1+x^4} \cdot \frac{1}{1+x^8} \cdots$
prove that $a_n = b_n$ for all $n > 1$.
2.12 Prove that the number of partitions of $n$ into exactly $m$ parts equals
the number of partitions of $n - m$ into at most $m$ parts and
that the number of partitions of $n$ into exactly $m$ parts of distinct
sizes equals the number of partitions of $n - \frac{m(m-1)}{2}$ into exactly
$m$ parts. [Hint: Remove a suitable number of dots from the rows
of the Ferrers graph so as to get direct bijections between the
appropriate sets of partitions.]
2.13 Prove the following identity combinatorially:
$(1 + x)(1 + x^2)(1 + x^3) \cdots (1 + x^n) \cdots = \prod_{n=1}^{\infty} (1 + x^n) = 1 + \sum_{n=1}^{\infty} \frac{x^{n(n+1)/2}}{(1-x)(1-x^2) \cdots (1-x^n)}$.
[Hint: In the Ferrers graph of a partition consider the largest
isosceles triangle of dots at the top left corner.]
2.14 Find the number of ways a 10 rupee note can be exchanged for a
collection of coins of denominations 1 rupee, 1/2 rupee and 1/4
rupee.
2.15 What would be the answer to the last exercise if only 10 coins of
each denomination are available? What would be the answer if we
require further that the change must contain at least 2 half-rupees
and at least 4 quarter-rupees?
*2.16 In a triangle ABC, suppose AB = AC, D is the midpoint of BC, E
is the foot of the perpendicular drawn from D to AC and F is the
midpoint of DE. Prove that AF is perpendicular to BE. (Like
Exercise (2.3.1), this one too has little to do with combinatorics.
It is meant to illustrate our comment that through a suitable choice
of cartesian coordinates, a tricky geometric problem can be
transformed into a routine algebraic one.)
2.17 Use Theorem (2.16) to give an alternate proof of Theorem (2.2.17).
2.18 Use Theorem (2.16) to give an alternate solution to Exercises
(3.2.17)(b) and (3.2.18).
2.19 Using Proposition (2.17), prove that
$S_{n,m} = \sum_{r=0}^{m-1} \frac{(-1)^r (m-r)^{n-1}}{r!\,(m-r-1)!}$,
where $m$, $n$ are positive integers with $m \le n$, and $S_{n,m}$ is the
Stirling number.
2.20 Prove Proposition (2.17) from Theorem (2.4.8).
2.21 Let $t_n$ be the number of triangular partitions of an integer $n$ (cf.
Exercise (2.135)). Prove that if $n$ is even then $t_n$ equals the number
of partitions of $n/2$ into exactly 3 parts (these latter partitions are
not required to be triangular).
2.22 Using the last exercise determine $t_n$ if $n$ is even.
2.23 Prove that for all $n$,
$\sum_{m=0}^{n} S_{n,m}\, x(x-1) \cdots (x-m+1) = x^n$. [Hint:
First prove this when $x$ is a positive integer, using Theorem (2.4.7).
Then note that both sides of the identity are polynomials in $x$.]
2.24 The number of all possible partitions of a set of cardinality $n$ is
called the $n$th Bell number and denoted by $P_n$. Show that $P_n$ is
also the number of distinct equivalence relations on a set of
cardinality $n$ and that $P_n = \sum_{k=0}^{\infty} S_{n,k}$. (This sum is
really finite since $S_{n,k} = 0$ for $k > n$.)
*2.25 Combining the last exercise with Exercise (2.19), prove that
$P_n = \frac{1}{e} \sum_{k=0}^{\infty} \frac{k^n}{k!}$. [Hint: Interchange
the order of a double summation.]
2.26 The variance $V(X)$ of a discrete random variable $X$ with probability
distribution $\{p_n\}$ is defined as $\sum_{n=0}^{\infty} (n - E(X))^2 p_n$.
Prove that
$V(X) = p_X''(1) + p_X'(1) - [p_X'(1)]^2$.
Calculate the variance in Examples (3) and (4).
2.27 Let $Y$ be the discrete random variable denoting the number of
times a fair coin is tossed till a head shows for the second time.
Prove that $p_Y(t) = [p_X(t)]^2$ where $X$ is the discrete random variable
in Example (4). Hence show that $E(Y) = 4$.
2.28 (a) Using the fact that $\sum_{n=1}^{\infty} \frac{1}{n^2}$ is convergent
while $\sum_{n=1}^{\infty} \frac{1}{n}$ is not, construct a discrete random
variable $X$ for which $E(X)$ is not finite.
(b) What will be $E(X)$ in Example (4) if the probability of a head
showing is $p$ where $0 < p < 1$?
2.29 Let $Z$ be the discrete random variable denoting the number of
times a fair coin is tossed till two consecutive outcomes are the
same. Prove that the probability distribution of $Z$ is
$\{p_n\}_{n=0}^{\infty}$ with $p_0 = p_1 = 0$ and $p_n = \frac{1}{2^{n-1}}$
for $n \ge 2$. Hence show that $E(Z) = 3$.
(Hint: A binary sequence $(x_1, \ldots, x_{n-1}, x_n)$ in which $x_{n-1}$ and
$x_n$ is the only pair of consecutive identical digits is uniquely
determined by $x_n$.)
Notes and Guide to Literature
The enumeration techniques discussed here are classic. The identities in
Proposition (2.10) and Exercise (2.13) are due to Euler. For more on Bell
numbers as well as numerous other identities, see Lovász [1]. Exercises
(2.7) and (2.16) are from Newman [1] and Larson [1] respectively. All
these three books contain an interesting variety of mathematical problems.
A classic treatise on the theory of probability using measures is by
Kolmogorov [1]. For a recent treatment, see Tjur [1].
3. Recurrence Relations
In the last section we studied a few applications of generating functions.
In this section we study what are by far the most important applications
of generating functions, namely, towards the solutions of recurrence
relations. This is so because recurrence relations are such a powerful and
frequently occurring tool for mathematical problems that any method
for solving them is bound to have considerable importance. Recurrence
relations are to discrete mathematics what differential equations are to
continuous mathematics. This analogy is not based merely on the wide
applicability of the two. The relationship between recurrence relations and
differential equations is much deeper. Many concepts and results about
recurrence relations and their solutions have a striking resemblance with the
corresponding concepts and results for differential equations. Recurrence
relations are also called difference equations because they can be written in
terms of the differences between the consecutive terms of a sequence. When
so expressed, the analogy between them and differential equations
becomes even more apparent. But we shall not go into that. Instead, in this
section we shall study the methods for directly solving recurrence relations
and, by way of illustration, present the solutions to the Regions Problem,
the Shares Problem and the Vendor Problem. The emphasis here, however,
will not be on applications. In the next section we shall study various kinds
of applications of recurrence relations.
We shall not attempt to define recurrence relations formally. Although
such a definition would be necessary for a rigorous development of the
theoretical aspects, such as the existence and uniqueness of solutions, it
tends to be clumsy. Fortunately, as with differential equations, a formal
definition of recurrence relations is not necessary in order to really
understand and apply them. The essential idea in a recurrence relation is that
it expresses a general term of an (unknown) sequence as a (known) function
of its earlier terms. In symbols, if the sequence is
$\{x_n\}_{n=0}^{\infty}$, then a recurrence relation for it serves to express
$x_n$ in terms of $x_0, x_1, \ldots, x_{n-1}$, for all $n$ after some stage,
say, for all $n \ge r$. This integer $r$ is called the order of the
recurrence relation. The values of $x_0, x_1, \ldots, x_{r-1}$ have to be given
or found from some data. (In some cases we have an option to choose some of
these values.) These values are called the initial conditions. This is in
obvious analogy with differential equations. A differential equation of order
$r$ expresses the $r$th derivative of some function, say $y = f(x)$, in terms
of the lower order derivatives $y, y', y'', \ldots, y^{(r-1)}$ and the initial
conditions are usually the values of these derivatives at some point, say
$x_0$, in the domain of $f$. There is, however, an important difference. For
differential equations, the existence of a solution satisfying the given
initial conditions is far from obvious. For recurrence relations, on the other
hand, this is no problem at
all. Suppose we are given the values of $x_0, x_1, \ldots, x_{r-1}$, say
$x_i = a_i$, for $i = 0, \ldots, r-1$. Now, $x_r$ is given as a function of
$x_0, x_1, \ldots, x_{r-1}$ and we assume that the initial conditions are such
that the values $a_0, \ldots, a_{r-1}$ give us a point in the domain of this
function. (If this is not the case, the recurrence relation has no solution.
For example the recurrence relation $x_n = \frac{1}{x_{n-1} - 1}$, for
$n \ge 1$, has no solution if the initial condition is $x_0 = 1$.) Then $x_r$
is uniquely determined, say $x_r = a_r$. Now, $x_{r+1}$ is expressed as a
function of $x_0, x_1, \ldots, x_r$. Once again we assume that
$(a_0, a_1, \ldots, a_r)$ gives us a point in the domain of this function, or
else there would be no solution. (As an example, consider again the recurrence
relation $x_n = \frac{1}{x_{n-1} - 1}$ for $n \ge 1$. If $x_0 = 2$, we do get
$x_1 = 1$, but then $x_2$ is undefined.) We repeat (or recur) this process
ad infinitum. Either it will stop at some $n$ (in which case there is no
solution to the recurrence relation) or else we shall get a sequence
$\{a_n\}_{n=0}^{\infty}$, which is a solution of the recurrence relation. By
its very construction, this solution is unique, because at every stage the
function uniquely determines $a_n$ from $a_0, a_1, \ldots, a_{n-1}$. We record
this as a theorem.
3.1 Theorem: A recurrence relation of order $r$ has at most one solution
for a given set of initial conditions. ∎
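The step-by-step construction in this argument is easy to imitate on a machine. The Python sketch below is a rough illustration only; the names `iterate` and `example` are ours. It extends the given initial terms one at a time and stops as soon as the next term is undefined, using exact fractions for the relation $x_n = 1/(x_{n-1} - 1)$ discussed above.

```python
from fractions import Fraction


def iterate(step, initial, count):
    """Extend `initial` by x_n = step(n, earlier terms), up to `count` terms.

    Stops early if the next term is undefined (here: a division by zero).
    """
    terms = list(initial)
    while len(terms) < count:
        try:
            terms.append(step(len(terms), terms))
        except ZeroDivisionError:
            break
    return terms


def example(n, xs):
    """The relation x_n = 1 / (x_(n-1) - 1)."""
    return 1 / (xs[n - 1] - 1)


print(iterate(example, [Fraction(1)], 5))   # x_1 is already undefined
print(iterate(example, [Fraction(2)], 5))   # x_1 = 1, then x_2 is undefined
print(iterate(example, [Fraction(3)], 5))   # the process continues
```

At each stage there is only one possible next term, which is why a solution, when it exists, is unique.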
This theorem is too simple to be of any real value. When it comes to
computing $x_n$ for large values of $n$, say $n = 500$, it is hardly
practicable to do so by going through the computation of all earlier $x_i$'s.
If we want to ‘solve’ the recurrence relation in the true sense of the term,
we must express $x_n$ directly in terms of $n$, that is, we must have a
‘closed form’ expression for $x_n$, like $x_n = n^2 - 2n$,
$x_n = \binom{2n}{n}$ or $x_n = 3^{n+1}$. The trouble is that it is
very hard to give a rigorous definition of a ‘closed form expression’ even
though this term is widely used and everybody understands what it means.
To illustrate the kind of difficulties that arise in formalising it, consider
the sequence $\{x_n\}_{n=0}^{\infty}$ where $x_n = 3^{n+1}$ for all
$n \ge 0$. It is easily seen that this is the solution of the recurrence
relation $x_n = 3x_{n-1}$, with the initial condition $x_0 = 3$. On the face of
it, it appears that $3^{n+1}$ is a closed form expression for $x_n$. Given any
particular value of $n$, say $n = 500$, all we have to do is to plug 500 for
$n$ in it and get $x_{500} = 3^{501}$. But it is not quite as simple as
that. What do we really mean by $3^{501}$? If we go back to the very
definition of $3^{n+1}$, we see that it is defined as $3^n \cdot 3$ (cf.
Chapter 3, Section 4). So by writing $x_n = 3^{n+1}$, we are not really solving
the recurrence relation $x_n = 3x_{n-1}$; we are merely writing it
differently! Another example would be the relation $y_n = n\,y_{n-1}$ with
$y_0 = 1$. Here $y_n = n!$ because that is the very definition of $n!$.
These examples indicate that the concept of a closed form expression is
not easy to define. It will be helpful here to recall the comments made
after Definition (2.3.7). We often have to enlarge our bag of closed form
expressions by including in it certain frequently occurring expressions, even
though technically they may have been defined by processes which smack
of recursion. For example, we treat $3^n$ and $n!$ as closed form expressions
in $n$. Methods are available to evaluate them, at least approximately, for a
large number of values of $n$. Having let in a few basic expressions like
this, we try to express the solutions to other recurrence relations in terms
of these ‘familiar’ expressions. This is the best we can hope to do. But even
this may prove to be too ambitious. Sometimes the very nature of the
recurrence relation precludes any closed form solution to it. Recall that in a
recurrence relation of order $r$, we merely required that for every
$n \ge r$, $x_n$ be a function of $x_0, \ldots, x_{n-1}$. But we did not
require that it be the same function for every $n$. Indeed, as we shall see
below, in many interesting examples, this is not the case. So we have to allow
for the possibility that the way $x_n$ depends upon $x_0, \ldots, x_{n-1}$ can
itself vary with $n$. If this variation is of an arbitrary nature, then we can
hardly hope to get a closed form expression for $x_n$. For example let $a_n$
be the digit in the $n$th place of the decimal expansion of some irrational
number, say $\pi$. (Since $\pi = 3.14159265\ldots$, we have $a_1 = 1$,
$a_2 = 4$, $a_3 = 1$, $a_4 = 5$, $a_5 = 9$ etc.) Define a recurrence relation
by $x_n = x_{n-1} + a_n$ with the initial condition $x_0 = 0$. This is a
perfectly well-defined recurrence relation of order 1 and solving it we get
$x_1 = 1$, $x_2 = 5$, $x_3 = 6$, $x_4 = 11$, $x_5 = 20$ etc. But a closed
form expression for $x_n$ in this case would amount to getting a closed form
formula for $a_n$. So far, no such formula is known. (From the fact that $\pi$
is irrational, it is easy to show that the $a_n$'s cannot repeat periodically.
But little positive information is available.)
It follows that if we want a closed form solution to a recurrence relation,
then there must be some sort of a uniformity in the way each term depends
on the preceding ones. This will be the case in all the examples that we
shall consider. But to formalise it and incorporate it as a part of the
definition of a recurrence relation is no easy job. That is why a rigorous
definition of recurrence relations is beyond our scope. We shall therefore
adopt a somewhat naive approach in which, instead of defining recurrence
relations in the abstract, we shall concentrate on solving a few kinds of
recurrence relations. As foremost examples, we recall three problems
discussed in Chapter 1.
(1) In the Regions Problem, we let $a_n$ be the number of regions into
which a plane is cut by $n$ lines in general position (i.e., no three of which
are concurrent and no two parallel). We got the recurrence relation

    $a_n = a_{n-1} + n$ for $n \ge 1$.    (1)

The initial condition was $a_0 = 1$. In Exercise (3.8), we indicated a method
for solving this recurrence relation. Putting $n = 1, 2, \ldots, k$ (where $k$
is a positive integer) and summing both the sides, we get

    $\sum_{n=1}^{k} a_n = \sum_{n=1}^{k} a_{n-1} + \sum_{n=1}^{k} n$.

Since every term, except $a_k$, on the left hand side cancels with some term
on the right, we get $a_k = a_0 + (1 + 2 + \cdots + k)$. By Theorem (1.11),
$1 + 2 + \cdots + k = \frac{k(k+1)}{2}$. So
$a_k = a_0 + \frac{k^2 + k}{2} = \frac{k^2 + k + 2}{2}$.
Thus we see that $n$ lines in general position cut the plane into
$\frac{n^2 + n + 2}{2}$ regions.
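As a quick check of the closed form, the Python sketch below (with function names of our own) computes the number of regions both by repeated use of the recurrence $a_n = a_{n-1} + n$ with $a_0 = 1$ and from the formula $(n^2 + n + 2)/2$.

```python
def regions_by_recurrence(n):
    """Apply a_k = a_(k-1) + k for k = 1, ..., n, starting from a_0 = 1."""
    a = 1
    for k in range(1, n + 1):
        a += k
    return a


def regions_closed_form(n):
    """(n^2 + n + 2) / 2; n^2 + n is always even, so // is exact."""
    return (n * n + n + 2) // 2


print([regions_by_recurrence(n) for n in range(8)])
print([regions_closed_form(n) for n in range(8)])
```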
(2) In the Shares Problem, $b_n$ is the number of shares held during the
$n$th year. The conditions of the problem give the recurrence relation

    $b_n = b_{n-1} + b_{n-2}$    (2)

which is valid for all $n \ge 3$. If we set $b_0 = 0$, (2) becomes valid for
$n = 2$ as well. So (2) is a recurrence relation of order 2, with initial
conditions $b_0 = 0$ and $b_1 = 1$. Unlike (1), there is no slick way of
solving (2). In fact, the solution may come as a surprise to the untutored.
Here it is:

    $b_n = \frac{1}{\sqrt5}\left[\left(\frac{1+\sqrt5}{2}\right)^n - \left(\frac{1-\sqrt5}{2}\right)^n\right]$, $n = 0, 1, 2, \ldots$    (3)

This is unbelievable! The $b_n$'s, to say the least, are whole numbers (being
the numbers of shares during the $n$th year) and from (3) it does not appear
that $b_n$ is even an integer, much less that it satisfies (2). But the
surprise wears out if we look at (3) a little closely. First, if we apply the
binomial theorem and keep in mind that $(\sqrt5)^{2k} = 5^k$ and
$(\sqrt5)^{2k+1} = 5^k\sqrt5$, we see, after cancellations, that

    $b_n = \sum_{k=0}^{\lfloor (n-1)/2 \rfloor} \binom{n}{2k+1} \frac{5^k}{2^{n-1}}$.

This does not quite prove that $b_n$ is an integer, but at least it is a
rational number and the initial shock brought about by the presence of the
irrational number $\sqrt5$ is gone. We still do not know that (3) satisfies
(2). This is best done by direct calculation. For each $n \ge 2$, we have
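    $b_n = \frac{1}{\sqrt5}\left[\left(\frac{1+\sqrt5}{2}\right)^n - \left(\frac{1-\sqrt5}{2}\right)^n\right]$
    $= \frac{1}{\sqrt5}\left[\left(\frac{1+\sqrt5}{2}\right)^{n-2}\left(\frac{6+2\sqrt5}{4}\right) - \left(\frac{1-\sqrt5}{2}\right)^{n-2}\left(\frac{6-2\sqrt5}{4}\right)\right]$ (since $(1 \pm \sqrt5)^2 = 6 \pm 2\sqrt5$)
    $= \frac{1}{\sqrt5}\left[\left(\frac{1+\sqrt5}{2}\right)^{n-2}\left(1 + \frac{1+\sqrt5}{2}\right) - \left(\frac{1-\sqrt5}{2}\right)^{n-2}\left(1 + \frac{1-\sqrt5}{2}\right)\right]$
    $= \frac{1}{\sqrt5}\left[\left(\frac{1+\sqrt5}{2}\right)^{n-2} + \left(\frac{1+\sqrt5}{2}\right)^{n-1} - \left(\frac{1-\sqrt5}{2}\right)^{n-2} - \left(\frac{1-\sqrt5}{2}\right)^{n-1}\right]$
    $= b_{n-1} + b_{n-2}$.

Thus (3) satisfies (2). Putting $n = 0$ and $n = 1$ in (3) gives $b_0 = 0$ and
$b_1 = 1$. So the initial conditions also hold. By Theorem (3.1), (3) is the
(unique) solution of (2).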
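The rational form of (3) obtained from the binomial theorem lends itself to an exact check in integer arithmetic. The Python sketch below uses our own function names and assumes the sum runs over $k = 0, \ldots, \lfloor (n-1)/2 \rfloor$; it compares that sum with the terms generated by $b_n = b_{n-1} + b_{n-2}$, $b_0 = 0$, $b_1 = 1$.

```python
from math import comb


def shares_by_recurrence(n):
    """b_n from b_n = b_(n-1) + b_(n-2) with b_0 = 0, b_1 = 1."""
    b_prev, b_curr = 0, 1            # b_0, b_1
    for _ in range(n):
        b_prev, b_curr = b_curr, b_prev + b_curr
    return b_prev


def shares_by_binomial_sum(n):
    """Sum of C(n, 2k+1) * 5^k over k, divided by 2^(n-1)."""
    if n == 0:
        return 0                     # the sum is empty
    numerator = sum(comb(n, 2 * k + 1) * 5 ** k for k in range((n - 1) // 2 + 1))
    quotient, remainder = divmod(numerator, 2 ** (n - 1))
    assert remainder == 0            # the division must be exact
    return quotient


print([shares_by_recurrence(n) for n in range(12)])
print([shares_by_binomial_sum(n) for n in range(12)])
```

That the remainder is always zero is the integrality that formula (3) hides behind $\sqrt5$.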
Although we have now completely solved the recurrence relation (2), the
solution seems to have been pulled out of a hat, so to speak. We would
naturally like to know if there is a way to arrive at (3) rather than merely
verify it as we just did. As we shall soon see, generating functions provide
a way to do just that.
(3) In the Vendor's Problem, we saw that the essential part was to
count, for every positive integer $n$, the number of balanced arrangements
of $n$ pairs of parentheses. We denoted this number by $a_n$ and got the
recurrence relation

    $a_n = a_{n-1} + a_1 a_{n-2} + a_2 a_{n-3} + \cdots + a_{n-3} a_2 + a_{n-2} a_1 + a_{n-1}$    (4)

which is valid for all $n \ge 2$. If we set $a_0 = 1$, then (4) can be
rewritten as

    $a_n = \sum_{i=0}^{n-1} a_i a_{n-i-1}$    (5)

and is valid for all $n \ge 1$. Thus here we have a recurrence relation of
order 1 with the initial condition $a_0 = 1$. In this case, we already know
the answer by another method. In Chapter 2, Section 3 we counted the number of
all unbalanced arrangements of $n$ pairs of parentheses and thereby showed
that

    $a_n = \frac{(2n)!}{n!\,(n+1)!}$    (6)

However, the argument given there was a tricky, combinatorial argument.
So we look for a more systematic way of solving (5) and as we shall see
again, generating functions provide an answer. Unlike in the last examples,
in this example, it is not so easy to verify that (6) is a solution of (5).
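Although verifying (6) algebraically is not easy, a numerical comparison for small $n$ is straightforward. The Python sketch below is only a check, not a proof, and its function names are ours; it generates terms from relation (5) with $a_0 = 1$ and compares them with $(2n)!/(n!\,(n+1)!)$.

```python
from math import factorial


def balanced_by_recurrence(limit):
    """Terms a_0, ..., a_limit from a_n = sum of a_i * a_(n-i-1), a_0 = 1."""
    a = [1]
    for n in range(1, limit + 1):
        a.append(sum(a[i] * a[n - i - 1] for i in range(n)))
    return a


def balanced_closed_form(n):
    """(2n)! / (n! (n+1)!), computed with exact integer division."""
    return factorial(2 * n) // (factorial(n) * factorial(n + 1))


terms = balanced_by_recurrence(12)
print(terms)
print([balanced_closed_form(n) for n in range(13)])
```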
Let us now see how the method of generating functions provides a
systematic solution in all these three examples. The essential idea in this
method is to convert the given recurrence relation about a sequence, say
$\{a_n\}$, to an algebraic equation for its generating function $A(x)$. Most
of the time we take $A(x)$ to be the O.G.F. of $\{a_n\}$, although, with some
recurrence relations, the E.G.F. is a more convenient choice. Solving this
algebraic equation gives us a formula for $A(x)$. We then expand $A(x)$ as a
power series in $x$. Equating $a_n$ with the coefficient of $x^n$ gives a
closed form expression for $a_n$, thereby solving the recurrence relation.
Let us begin with (1). Multiplying both sides by $x^n$ we get

    $a_n x^n = a_{n-1} x^n + n x^n$, for $n \ge 1$    (7)

We now sum both the sides for $n = 1, 2, \ldots$. Denoting
$\sum_{n=0}^{\infty} a_n x^n$ by $A(x)$ and recalling that $a_0 = 1$, the left
hand side gives $A(x) - a_0$, i.e. $A(x) - 1$. The first term on the right
hand side gives $\sum_{n=1}^{\infty} a_{n-1} x^n$ which is the same as
$x \sum_{n=1}^{\infty} a_{n-1} x^{n-1}$, i.e., $x \sum_{n=0}^{\infty} a_n x^n$,
i.e. $xA(x)$. The second term yields $\sum_{n=1}^{\infty} n x^n$. In the proof
of Theorem (1.11), we already evaluated this sum as $\frac{x}{(1-x)^2}$. So,
from (7), we get

    $A(x) - 1 = xA(x) + \frac{x}{(1-x)^2}$    (8)

This is an algebraic equation in $A(x)$ and can be easily solved to give

    $A(x) = \frac{1}{1-x}\left[1 + \frac{x}{(1-x)^2}\right] = \frac{1}{1-x} + \frac{x}{(1-x)^3}$.

We now expand this term by term using Theorems (1.2) and (1.3) (cf.
Exercise (1.9)). The coefficient of $x^n$ is
$1 + \binom{n+1}{n-1} = 1 + \binom{n+1}{2} = 1 + \frac{n(n+1)}{2}$. So
$a_n = \frac{n^2 + n + 2}{2}$,
which is the same answer as before. Actually, this solution is not essentially
different from the earlier solution because to evaluate the sum
$1 + 2 + \cdots + n$
we used generating functions anyway. However, there are slicker methods
of evaluating this sum and if we use any one of them, then we do get a
solution of (1) without generating functions.
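The expansion of $A(x) = \frac{1}{1-x} + \frac{x}{(1-x)^3}$ can also be carried out mechanically. The Python sketch below is an illustration under our own assumptions: it multiplies truncated power series directly instead of appealing to Theorems (1.2) and (1.3), and all names in it are ours. It then compares the coefficients with the terms given by recurrence (1).

```python
def multiply(p, q, size):
    """Product of two power series, truncated to the first `size` coefficients."""
    out = [0] * size
    for i in range(size):
        for j in range(size - i):
            out[i + j] += p[i] * q[j]
    return out


SIZE = 12
geometric = [1] * SIZE                           # 1/(1-x) = 1 + x + x^2 + ...
cube = multiply(multiply(geometric, geometric, SIZE), geometric, SIZE)  # 1/(1-x)^3
shifted = [0] + cube[:SIZE - 1]                  # x/(1-x)^3
A = [g + s for g, s in zip(geometric, shifted)]  # coefficients of A(x)

# Terms of a_n = a_(n-1) + n with a_0 = 1, for comparison.
a = [1]
for n in range(1, SIZE):
    a.append(a[-1] + n)

print(A)
print(a)
```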
To really drive home the importance of generating functions, let us,
therefore, solve (2). Denote E 17.x" by B(x). Multiplying both sides of
III”
(2) by M and summing over for n 22 (which is the range of validity of
(2)), we get
"E: bnx" = .Ez bl-l. x" + “E: bII—g x" (9)
The sum on the left is E 17.x" — b,x - b. which equals B(x) — x since
n-O
b. = l and b. = 0. The first sum on the right equals x E. b.-|x"" which
.—
reduces to x i: b..x'* which equals x(B(x) — b0 = xB(x). Similerly the
.-
second sum reduces to x‘B(x). Thus (9) gives
80:) — x = x B(x) + x’B(x) ' (10)
solving which, we get $B(x) = \frac{x}{1-x-x^2}$. To expand this, we resolve
$\frac{x}{1-x-x^2}$ into partial fractions. Suppose $(1 - x - x^2)$ factors as
$(1 - \alpha x)(1 - \beta x)$. Then $\alpha + \beta = 1$ and $\alpha\beta = -1$, by comparing coefficients. It
follows that $\alpha, \beta$ are roots of the quadratic $x^2 - x - 1 = 0$. So we let
$\alpha = \frac{1+\sqrt{5}}{2}$ and $\beta = \frac{1-\sqrt{5}}{2}$. [Now we see how on earth these numbers
got into (3).] It is easily seen that
$$\frac{x}{1-x-x^2} = \frac{1}{\sqrt{5}}\left[\frac{1}{1-\alpha x} - \frac{1}{1-\beta x}\right].$$
So, by Theorem (1.3), $b_n$ = coefficient of $x^n$ in $B(x)$ = $\frac{1}{\sqrt{5}}(\alpha^n - \beta^n)$. So (3)
is the solution of the recurrence relation.
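As a quick numerical check of this formula, one may compare it with the terms produced directly by (2). The sketch below uses floating point arithmetic and rounds to the nearest integer, which is reliable only for moderate $n$; the function names are ours.

```python
from math import sqrt


def b_by_recurrence(n_max):
    """Return [b_0, ..., b_{n_max}] from b_n = b_{n-1} + b_{n-2}, b_0 = 0, b_1 = 1."""
    b = [0, 1]
    for n in range(2, n_max + 1):
        b.append(b[n - 1] + b[n - 2])
    return b[: n_max + 1]


def b_closed_form(n):
    """(alpha^n - beta^n) / sqrt(5), rounded to absorb floating point error."""
    alpha = (1 + sqrt(5)) / 2
    beta = (1 - sqrt(5)) / 2
    return round((alpha ** n - beta ** n) / sqrt(5))


terms = b_by_recurrence(30)
assert terms == [b_closed_form(n) for n in range(31)]
print(terms[:11])
```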
Before tackling the next relation, namely (5), by generating functions,
it is perhaps time to comment on the validity of this method. When we
recover the $n$th term of a sequence as the coefficient of $x^n$ in its O.G.F.
we are crucially using the uniqueness property of power series expansions
(Theorem (1.2) (i)). In essence we are saying that if the power series
$\sum_{n=0}^{\infty} a_n x^n$ and $\sum_{n=0}^{\infty} b_n x^n$ are equal then $a_n = b_n$ for all $n = 0, 1, 2, \ldots$. This
statement is not quite true as it is. As remarked in the proof of Theorem
(1.2), it has to be qualified by the expression 'for all $x$ in the common
interval of convergence of the two power series'. Some power series in $x$
may converge only for $x = 0$, that is, their radii of convergence are 0. For
example, the power series $\sum_{n=0}^{\infty} n^n x^n$ (where we let $0^0 = 1$) and the power
series $\sum_{n=0}^{\infty} n!\,x^n$ converge only for $x = 0$. For such power series, the statement
'$\sum_{n=0}^{\infty} a_n x^n = \sum_{n=0}^{\infty} b_n x^n$ for all $x$ in the common interval of convergence'
is vacuously true and does not really mean much. In particular, it does
not guarantee that $a_n = b_n$ for all $n$. In order to ensure this, we must
know beforehand that the two power series have positive radii of convergence.
For example, in solving (1), after we get
$$\sum_{n=0}^{\infty} a_n x^n = A(x) = \frac{1}{1-x} + \frac{x}{(1-x)^3},$$
before equating $a_n$ with the coefficient of $x^n$ in the right hand side, we
must ensure that both sides have positive radii of convergence. For the
right hand side, this is not much of a problem. It is a well-known theorem
that the power series expansion of $\frac{1}{1-x}$ as well as that of $\frac{x}{(1-x)^3}$ are both
valid for $|x| < 1$, i.e. the radius of convergence is 1. But when it comes to
finding the radius of convergence of $\sum_{n=0}^{\infty} a_n x^n$ we are in a dilemma. As
remarked in the proof of Theorem (1.2), the radius of convergence is given
by $R = 1/\limsup |a_n|^{1/n}$. But we cannot compute it without knowing $a_n$. We
would know $a_n$ only after solving the recurrence relation. But the solution
is not justified unless we show $R > 0$. This constitutes a vicious circle.
Fortunately, there is a way to break it. We do not really need the
exact value of $R$. It is enough for our purpose to show that $R > 0$, and
for this it would suffice if we show that the sequence $|a_n|^{1/n}$ is bounded.
For this, any crude upper bound on $|a_n|$ would do. For some recurrence
relations, such an upper bound is easy to guess and prove by induction,
even before solving them. All we have to ensure is that as $n$ grows, $|a_n|$
grows less rapidly than this upper bound. For example, in (1), $a_n$ exceeds
$a_{n-1}$ by $n$. Since $n^2$ exceeds $(n-1)^2$ by $2n - 1$, an upper bound of
the form $An^2$ for a suitable $A$ will work. In this case, we guess that $a_n \le n^2$
for all $n \ge 2$. For $n = 2$, equality holds. If $a_{n-1} \le (n-1)^2$, then
$a_n = a_{n-1} + n \le (n-1)^2 + n = n^2 - n + 1 < n^2$. So, by induction $a_n \le n^2$ for all
$n \ge 2$. Hence $|a_n|^{1/n} \le (n^2)^{1/n} = (n^{1/n})^2$. It is well-known that $n^{1/n} \to 1$ as
$n \to \infty$. So $|a_n|^{1/n}$ is bounded, which means $\limsup |a_n|^{1/n} < \infty$. This is all we
need to conclude that $R > 0$. As noted before, we now have an absolutely
rigorous justification to equate $a_n$ with the coefficient of $x^n$ in the power
series expansion of $A(x)$.
Similarly, in solving (2), we got $\sum_{n=0}^{\infty} b_n x^n = B(x) = \frac{x}{1-x-x^2}$. Since the
power series expansions of $\frac{1}{1-\alpha x}$ and $\frac{1}{1-\beta x}$ are valid for $|x| < |\alpha|^{-1}$
and $|x| < |\beta|^{-1}$ respectively, we see that the radius of convergence is
positive for the power series expansion of $\frac{x}{1-x-x^2}$. As for $\sum_{n=0}^{\infty} b_n x^n$, we
can easily prove from (2) and induction that $b_n < 2^n$ for all $n$. So $\limsup |b_n|^{1/n} \le 2$,
showing that $\sum_{n=0}^{\infty} b_n x^n$ also has a positive radius of convergence. We
are now justified in equating $b_n$ with the coefficient of $x^n$ in $\frac{x}{1-x-x^2}$.
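The two crude bounds used above, $a_n \le n^2$ for $n \ge 2$ and $b_n < 2^n$, are proved by induction, but it is reassuring to see them hold on an initial stretch of each sequence. The sketch below only spot-checks finitely many terms; the helper name and the cut-off 60 are arbitrary choices of ours.

```python
def first_terms(n_max):
    """Return ([a_0..a_{n_max}], [b_0..b_{n_max}]) for the recurrences (1) and (2)."""
    a = [1]                      # a_0 = 1
    for n in range(1, n_max + 1):
        a.append(a[n - 1] + n)   # a_n = a_{n-1} + n
    b = [0, 1]                   # b_0 = 0, b_1 = 1
    for n in range(2, n_max + 1):
        b.append(b[n - 1] + b[n - 2])
    return a, b[: n_max + 1]


a, b = first_terms(60)
assert all(a[n] <= n * n for n in range(2, 61))   # a_n <= n^2 for n >= 2
assert all(b[n] < 2 ** n for n in range(61))      # b_n < 2^n for all n

# The n-th roots stay bounded, which is what forces R > 0.
nth_roots = [a[n] ** (1.0 / n) for n in range(1, 61)]
print(max(nth_roots), nth_roots[-1])
```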
But there are limitations to this approach. For recurrence relations like
(5), it is far from easy to obtain an upper bound on $|a_n|$ before solving it.
An alternate method proceeds as follows. Instead of appealing to the
uniqueness of the power series expansion (Theorem (1.2)), we appeal to the
uniqueness of solutions of recurrence relations, satisfying the given initial
conditions (Theorem (3.1)). Essentially, this amounts to starting from the
O.G.F. and working our way backwards to show that its coefficients
satisfy the given recurrence relation. For example, in the case of (1), let us
start from the function
$$\frac{1}{1-x} + \frac{x}{(1-x)^3}.$$
Let the power series expansion of this function be $\sum_{n=0}^{\infty} c_n x^n$. (Note that we
have not yet shown that $a_n = c_n$ because we are not in a position to apply
Theorem (1.2).) This expansion is valid for $|x| < 1$ and, as we showed above,
$$c_n = \frac{n^2 + n + 2}{2}.$$
In particular $c_0 = 1$. If we reverse the steps in arriving at
$$\frac{1}{1-x} + \frac{x}{(1-x)^3}$$
(i.e., going from (7) to (8)), we can show that
$$\sum_{n=1}^{\infty} c_n x^n = \sum_{n=1}^{\infty} (c_{n-1} + n)\,x^n \quad (\text{for all } |x| < 1) \qquad (*)$$
Now comes the crucial step. The two sides of (*) are power series with
known coefficients and their radii of convergence are therefore known to
be positive (in this case both equal 1). We are now fully justified in applying
Theorem (1.2) (i) to conclude $c_n = c_{n-1} + n$ for all $n \ge 1$. So the known
sequence $\{c_n\}_{n=0}^{\infty}$ satisfies (1). Since the initial condition (viz. $c_0 = 1$) also
holds, it follows from Theorem (3.1) that $\{c_n\}_{n=0}^{\infty}$ is the only solution to (1).
So $c_n = a_n$ for all $n$. As we already know $c_n$, it now follows that
$$a_n = \frac{n^2 + n + 2}{2}.$$
This approach is somewhat cunning in that we are not answering the
question of how we thought of starting from the function
$$\frac{1}{1-x} + \frac{x}{(1-x)^3}.$$
It is as if we generated this function behind closed doors and then brought
it out in the open and showed that it really worked. Logically, of
course, this approach is perfectly unassailable. Nor is this an isolated
instance of this kind of reasoning. In mathematics, in many problems
we have to 'suspect' the solution first, that is, we have to do reasoning to
the effect that if at all the problem has a solution then it has to be this one.
Then of course we have to back our suspicion by a rigorous proof that
what we have in mind is in fact a solution. If mathematics is viewed as a
sterile science then it is only the proof that counts. In many problems,
however, the proof is a routine drill and the real art lies in guessing the
solution. In a murder trial, as far as the conviction goes, what matters
is the hard evidence put forth by the police officer and not what led him to
suspect the accused in the first place. In a mystery novel, it is just the other
way. (And often, the police officer is not the one who does the brain
work!)
Henceforth we shall not worry about the issue of convergence. It will
be a good exercise for the fussy minded ones to validate our solutions by
either one of the two approaches.
Let us now return to solving recurrence relations. Consider (5). Again
let $A(x) = \sum_{n=0}^{\infty} a_n x^n$. The very form of the right hand side of (5) shows
that it is the coefficient of $x^{n-1}$ in the product $A(x)A(x)$. So, once again
multiplying both sides of (5) by $x^n$ and summing over $n \ge 1$ (the range
of validity of (5)), we get $A(x) - a_0 = x[A(x)]^2$, i.e.,
$$x[A(x)]^2 - A(x) + 1 = 0. \qquad (11)$$
This is an algebraic equation for $A(x)$. Unlike the earlier two problems,
however, it is a quadratic and so, for each $x \ne 0$, we have, by the quadratic
formula, $A(x) = \frac{1 \pm \sqrt{1-4x}}{2x}$. From this, we must not hastily conclude that either
$$A(x) = \frac{1 + \sqrt{1-4x}}{2x} \quad \text{or} \quad A(x) = \frac{1 - \sqrt{1-4x}}{2x}$$
for all $x \ne 0$. Although for each $x \ne 0$, one of the two possibilities must
hold, there is nothing to guarantee that the same possibility must hold for
all $x$. To illustrate the point involved here, it is obviously true that every
human being is either a man or a woman. But that does not mean either
every human being is a man or every human being is a woman! [Cf. Exercise
(1.4.9) (viii).] In the present case the choice of sign in the expression
$$A(x) = \frac{1 \pm \sqrt{1-4x}}{2x}$$
could, a priori, vary with $x$. To rule this out we need
the continuity of the three functions
$$A(x), \quad \frac{1 + \sqrt{1-4x}}{2x} \quad \text{and} \quad \frac{1 - \sqrt{1-4x}}{2x}$$
in a sufficiently small open interval $(0, R)$. It can then be shown that
throughout this interval the same possibility holds, i.e., either
$$A(x) = \frac{1 + \sqrt{1-4x}}{2x}$$
for all $x$ or
$$A(x) = \frac{1 - \sqrt{1-4x}}{2x}$$
for all x. The details will be indicated in the exercises, because they belong
to continuous mathematics and if we take the cunning approach, we can
spare them anyway. We still have to decide which of the two possibilities
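holds. For this, we expand $\sqrt{1-4x} = (1-4x)^{1/2}$ by the binomial theorem;
it comes as $1 + \sum_{r=1}^{\infty} c_r x^r$, where
$$c_r = \frac{\frac{1}{2}\left(\frac{1}{2}-1\right)\left(\frac{1}{2}-2\right)\cdots\left(\frac{1}{2}-r+1\right)}{r!}\,(-4)^r.$$
After a little computation, $c_r$ simplifies to
$$c_r = -2\,\frac{(2r-2)!}{r!\,(r-1)!}.$$
It follows that
$$\frac{1 + \sqrt{1-4x}}{2x} = \frac{1}{x} + \frac{1}{2}\sum_{r=1}^{\infty} c_r x^{r-1}$$
for all $x$ in the interval $(0, R)$. But then
$$\frac{1 + \sqrt{1-4x}}{2x} \to \infty \text{ as } x \to 0^+,$$
whereas $A(x) \to a_0 = 1$ as $x \to 0^+$, $A(x)$ being continuous at 0. So it cannot
be the case that $A(x) = \frac{1 + \sqrt{1-4x}}{2x}$. This forces us to conclude that for
all sufficiently small positive values of $x$,
$$A(x) = \frac{1 - \sqrt{1-4x}}{2x} \qquad (12)$$
Since we already have the power series expansion of $\sqrt{1-4x}$, it follows
that
$$A(x) = \frac{1 - \left(1 + \sum_{r=1}^{\infty} c_r x^r\right)}{2x} = -\frac{1}{2}\sum_{r=1}^{\infty} c_r x^{r-1}.$$
So $a_n$ = coefficient of $x^n$ in $A(x)$
$$= -\frac{c_{n+1}}{2} = \frac{(2n)!}{n!\,(n+1)!}.$$
Thus finally we obtain (6) as the solution of (5).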
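The formula can again be checked against the recurrence. In the sketch below we assume, as the derivation of (11) indicates, that (5) reads $a_n = \sum_{i=0}^{n-1} a_i a_{n-1-i}$ for $n \ge 1$ with $a_0 = 1$; the function names are ours.

```python
from math import factorial


def a_by_convolution(n_max):
    """Return [a_0..a_{n_max}] assuming (5) is a_n = sum_{i<n} a_i * a_{n-1-i}, a_0 = 1."""
    a = [1]
    for n in range(1, n_max + 1):
        a.append(sum(a[i] * a[n - 1 - i] for i in range(n)))
    return a


def a_closed_form(n):
    """(2n)! / (n! (n+1)!), the value obtained from the generating function."""
    return factorial(2 * n) // (factorial(n) * factorial(n + 1))


def c(r):
    """Coefficient of x^r (r >= 1) in sqrt(1 - 4x): -2 (2r-2)! / (r! (r-1)!)."""
    return -2 * factorial(2 * r - 2) // (factorial(r) * factorial(r - 1))


terms = a_by_convolution(15)
assert terms == [a_closed_form(n) for n in range(16)]
assert all(-c(n + 1) == 2 * terms[n] for n in range(16))   # a_n = -c_{n+1} / 2
print(terms[:8])
```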
These three examples have hopefully convinced the reader of the power
of generating functions. In all these examples, we used the ordinary generating
function. Sometimes, it is better to consider the exponential generating
function of the given sequence. We illustrate this in the following recurrence
relation. In the next section we shall see that this particular recurrence
relation provides an alternate way to count the number of derangements
of $n$ objects (cf. Theorem (2.4.6)).
3.2 Problem: Solve the recurrence relation
$$d_n = n\,d_{n-1} + (-1)^n \qquad (13)$$
for $n \ge 1$, with the initial condition $d_0 = 1$.
Solution: Let us first see what would happen if we try to solve (13) by
letting $D(x)$ be the O.G.F. of $\{d_n\}$. As usual we multiply both sides of (13)
by $x^n$ and sum over $n \ge 1$. Rewriting $n\,d_{n-1}$ as $(n-1)\,d_{n-1} + d_{n-1}$ it
is easy to show that we get
$$D(x) - 1 = x^2 D'(x) + xD(x) - \frac{x}{1+x} \qquad (14)$$
This is a linear differential equation for $D(x)$ and has for its general solution
$$D(x) = \frac{1}{x e^{1/x}}\left[c - \int \frac{e^{1/x}\,dx}{x(1+x)}\right] \qquad (15)$$
where $c$ is a constant. Unfortunately, the integral in (15) cannot be evaluated
in a closed form and so this approach is not workable.
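However, if we let $E(x) = \sum_{n=0}^{\infty} \frac{d_n}{n!} x^n$ be the E.G.F. of $\{d_n\}_{n=0}^{\infty}$ then (13)
can be solved as follows. We multiply both its sides by $\frac{x^n}{n!}$ and sum over
$n \ge 1$. Since $\frac{n\,d_{n-1}}{n!} = \frac{d_{n-1}}{(n-1)!}$ and
$\sum_{n=1}^{\infty} \frac{(-1)^n}{n!} x^n = e^{-x} - 1$, we get
$$E(x) - 1 = xE(x) + e^{-x} - 1 \qquad (16)$$
Solving which, $E(x) = \frac{e^{-x}}{1-x}$. By Proposition (Lil), the coefficient of $x^n$ in
$\frac{e^{-x}}{1-x}$ is the sum of the first $n + 1$ coefficients of $e^{-x}$, i.e.,
$$1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n \frac{1}{n!}.$$
So,
$$d_n = n!\left(\frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!} + \cdots + (-1)^n \frac{1}{n!}\right).$$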
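A short check of this answer against (13), using exact rational arithmetic so that no rounding intervenes. The function names are ours.

```python
from fractions import Fraction
from math import factorial


def d_by_recurrence(n_max):
    """Return [d_0..d_{n_max}] from d_n = n d_{n-1} + (-1)^n with d_0 = 1."""
    d = [1]
    for n in range(1, n_max + 1):
        d.append(n * d[n - 1] + (-1) ** n)
    return d


def d_from_egf(n):
    """n! times the sum of the first n + 1 coefficients of e^{-x}."""
    partial_sum = sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))
    value = factorial(n) * partial_sum
    assert value.denominator == 1   # the product is always an integer
    return value.numerator


terms = d_by_recurrence(12)
assert terms == [d_from_egf(n) for n in range(13)]
print(terms[:7])
```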
The failure of ordinary generating functions in this problem could have
been predicted in advance by consideration of convergence. In the right
hand side of (13), the term $(-1)^n$ is insignificant as compared to the other
for large $n$. So
$$d_n \approx n\,d_{n-1} \approx n(n-1)\,d_{n-2} \approx \cdots \approx n(n-1)\cdots(n-r+1)\,d_{n-r} \approx \cdots \approx \frac{n!}{2}\,d_2.$$
(We do not go further because $d_1 = 0$.) Thus $d_n$ grows at a rate comparable
to that of $n!$. Since $(n!)^{1/n} \to \infty$ as $n \to \infty$, the power series $\sum_{n=0}^{\infty} d_n x^n$ will
converge only for $x = 0$. As noted before, this will make it impossible to
recover $d_n$ from $D(x)$, even if we could obtain a closed form for it. The
exponential generating function, $E(x)$, on the other hand has a positive
radius of convergence. $n!$ is a very large number and division by it 'brings
$d_n$ to proportion' so to speak. Thus the exponential generating functions
have a theoretical advantage over the ordinary generating functions.
Wherever the O.G.F. converges, so does the E.G.F. But the converse is not
true, as shown by this example.
Nevertheless, the attempted solution using $D(x)$ is noteworthy for another
reason. We converted the recurrence relation (13) to a differential equation
(14). We could just as well have reversed the process and gotten (13) from
(14). This is in fact more commonly done. This technique is popularly
known as 'series solution of a differential equation'. Although the basic
theorems guarantee the existence and uniqueness for a large class of
differential equations, they are of little help in actually finding the solution
of a differential equation. Various ingenious methods have to be used for
this purpose. Even then, it often happens that the solution cannot be found.
In such a case the problem is converted through Taylor series. For convenience,
let us suppose that $y = f(x)$ is the solution of the given differential
equation and that we are interested in the values of $f$ in a neighbourhood of
the point 0. Let $f(x) = \sum_{n=0}^{\infty} a_n x^n$ be the Taylor series expansion of $f$ near
0. Here the coefficients $a_n$ are related to $f$ by $a_n = \frac{f^{(n)}(0)}{n!}$. Solving for $f$ is
equivalent to determining $a_n$ for $n = 0, 1, 2, \ldots$. The differential equation
to be solved can be translated into a recurrence relation about the $a_n$'s.
This recurrence relation is usually not easy to solve, because if it could be
solved, then the original differential equation was probably easy to solve
directly anyway. However, starting from the initial conditions we can
calculate $a_n$ for a fairly large value of $n$, say $n = N$. Let $g_N(x) = \sum_{n=0}^{N} a_n x^n$.
Then, in a small neighbourhood of 0, $g_N(x)$ is very close to $f(x)$ and so
$y = g_N(x)$ can be taken as an approximate solution of the differential
equation over this neighbourhood.
Although our interest is not in solving differential equations we show
by an example how this method works. Suppose we want to solve the
differential equation $(x + 2)\,y'' + y = 0$, in a neighbourhood of 0, with
initial conditions $y(0) = 1$, $y'(0) = 1$. Let $y = f(x)$ be the solution and
$\sum_{n=0}^{\infty} a_n x^n$ its Taylor expansion at 0. Then $y'' = \sum_{n=0}^{\infty} (n+1)(n+2)\,a_{n+2}\,x^n$.
For every $n$, the coefficient of $x^n$ in $(x + 2)\,y'' + y$ must be 0. This gives
$$n(n+1)\,a_{n+1} + 2(n+1)(n+2)\,a_{n+2} + a_n = 0 \qquad (17)$$
which can be rewritten, with a shift of indices, as
$$a_n = \frac{-a_{n-2} - (n-2)(n-1)\,a_{n-1}}{2n(n-1)} \qquad (18)$$
for all $n \ge 2$. Starting from $a_0 = 1$ and $a_1 = 1$, we get successively
$$a_2 = \frac{-a_0 - 0}{4} = -\frac{1}{4}, \qquad a_3 = \frac{-a_1 - 2a_2}{12} = \frac{-1 + \frac{1}{2}}{12} = -\frac{1}{24}, \text{ etc.}$$
These computations can easily be done on a machine. If we compute $a_n$
for $n \le 50$ (say), then $y = \sum_{n=0}^{50} a_n x^n$ will be a very good approximation to
the actual solution.
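The machine computation mentioned above might look as follows. This is a minimal sketch using exact fractions; the names `taylor_coefficients` and `g` are ours, and the evaluation point 0.1 is an arbitrary illustration.

```python
from fractions import Fraction


def taylor_coefficients(n_max):
    """Return [a_0..a_{n_max}] from (18) with a_0 = y(0) = 1 and a_1 = y'(0) = 1."""
    a = [Fraction(1), Fraction(1)]
    for n in range(2, n_max + 1):
        numerator = -a[n - 2] - (n - 2) * (n - 1) * a[n - 1]
        a.append(numerator / (2 * n * (n - 1)))
    return a[: n_max + 1]


def g(x, coefficients):
    """Evaluate the truncated series g_N(x) = sum of a_n x^n."""
    return sum(float(a_n) * x ** n for n, a_n in enumerate(coefficients))


a = taylor_coefficients(50)
print(a[2], a[3], a[4])
print(g(0.1, a))

# Every coefficient relation (17) is satisfied exactly.
for n in range(49):
    assert n * (n + 1) * a[n + 1] + 2 * (n + 1) * (n + 2) * a[n + 2] + a[n] == 0
```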
This technique is a classic example of how discrete mathematics is used
to approximate continuous mathematics. The other way transition is not
very common. That is, differential equations are not generally used to solve
recurrence relations. But there are exceptions. Indeed we already saw an
instance of this. In Problem (1.12), we found the O.G.F. of the sequence
$\{a_n\}_{n=0}^{\infty}$ where $a_n = \binom{2n}{n}$. If we analyse the solution, we see that the
crucial step was the equation
$$\binom{2n+2}{n+1}(n+1) = 2(2n+1)\binom{2n}{n}.$$
This can be rewritten as $a_{n+1} = \frac{4n+2}{n+1}\,a_n$ or, with a change of indices, as
$a_n = \frac{4n-2}{n}\,a_{n-1}$ for $n \ge 1$. This is a recurrence relation and the first part
of the solution to Problem (1.12) essentially amounts to solving this recurrence
relation by converting it to a differential equation.
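That this first-order recurrence really does reproduce the central binomial coefficients is easily confirmed. The sketch below relies on `math.comb` (available from Python 3.8); the function name is ours.

```python
from math import comb


def central_binomials(n_max):
    """Return [a_0..a_{n_max}] from a_n = ((4n - 2) / n) a_{n-1} with a_0 = 1."""
    a = [1]
    for n in range(1, n_max + 1):
        # The quotient is an integer, so integer division loses nothing.
        a.append((4 * n - 2) * a[n - 1] // n)
    return a


terms = central_binomials(20)
assert terms == [comb(2 * n, n) for n in range(21)]
print(terms[:6])
```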
Among the various recurrence relations we studied so far, (1) and (2)
enjoy certain properties not shared by others. They belong to a class of
recurrence relations known as linear recurrence relations. As we shall see
in the next section, such equations arise in a large number of applications.
Consequently, although the method of generating functions is available to
solve them, it is worthwhile to have other methods which enable us to
write down the solutions more quickly. We study one such method here.
First we define linear recurrence relations and a few associated terms.
3.3 Definition: Let $r$ be a positive integer. Then a linear recurrence relation
(with constant coefficients) of order $r$ is an equation of the form
$$C_r a_n + C_{r-1} a_{n-1} + \cdots + C_1 a_{n-r+1} + C_0 a_{n-r} = f(n) \quad (n \ge r) \qquad (19)$$
where $C_0, C_1, \ldots, C_r$ are constants (with $C_r$ and $C_0$ non-zero) and $f(n)$ is a
function of $n$ only. If $f(n) \equiv 0$, the recurrence relation is said to be homogeneous.
Even when $f(n)$ is not identically 0, the equation obtained by replacing
the right hand side of (19) by 0, i.e. the equation
$$C_r a_n + C_{r-1} a_{n-1} + \cdots + C_1 a_{n-r+1} + C_0 a_{n-r} = 0 \quad (n \ge r) \qquad (20)$$
is called the associated homogeneous equation (A.H.E.) of (19).
The assumption that $C_0$, $C_r$ are non-zero is of a technical nature,
intended to ensure that the order r be uniquely defined. Note also that
nothing has been said about initial conditions. We shall return to them
later.
As examples, rewriting (1) and (2) we see that (2) is a homogeneous
linear recurrence relation of order 2, while (1) is a non-homogeneous linear
recurrence relation of order 1. Its A.H.E. is $a_n - a_{n-1} = 0$.
The terminology will undoubtedly remind the reader of the system of
linear equations we studied in Chapter 6, Section 4. Those familiar with
differential equations will also be reminded of the linear differential equations
with constant coefficients and will be in a position to predict what is coming.
Actually the resemblance between the two is not coincidental, nor just a
matter of form. Using exponential generating functions, it can, in fact, be
shown that the two are equivalent (see Exercise (3.32)). Consequently the
theory of linear recurrence relations can be developed using the
concepts and theorems from linear differential equations. Nevertheless we
shall develop this theory directly, because of its simplicity and also because
it gives us a chance to use a few basic facts about vector spaces. The
other approach is given as an exercise.
Let us first see what is ‘linear’ about linear recurrence relations. The term
comes from linear transformations. We show how the sets of all solutions
of (19) and (20) can be interpreted in terms of suitable linear transformations.
Although such an interpretation does not, by itself, give a method for solving
the recurrence relations, it provides valuable insight into the nature of
solutions.
Let $V$ be the set of all infinite sequences, say, $\bar{a} = (a_0, a_1, a_2, \ldots, a_n, \ldots)$
of real (or complex) numbers. Every such sequence is a function from the
set $\{0, 1, 2, \ldots\}$ into $\mathbb{R}$ (or $\mathbb{C}$). Consequently, under coordinatewise addition
and scalar multiplication, $V$ is a vector space over the field $\mathbb{R}$ (or $\mathbb{C}$) (see
Example (4) in Chapter 6, Section 3). Now let $r$ be a positive integer and
$C_0, C_1, \ldots, C_r$ be real (or complex) numbers with $C_r$, $C_0$ non-zero. (Some
of the other numbers may be 0.) Let $W$ be the vector space of all real (or
complex) valued functions with domain $\{r, r+1, \ldots\} = \{n \in \mathbb{N} : n \ge r\}$.
Let $f$ be a fixed element of $W$. We are now ready to look at (19) a little
differently.
3.4 Theorem: For a given choice of $r$, $C_0, \ldots, C_r$ (with $C_0$ and $C_r \ne 0$)
and $f$, the solutions to (20) constitute the kernel (or the null-space), say,
$K$, of a linear transformation $T : V \to W$ and those to (19) constitute a
translate (or a coset) of the kernel. Further, $K$ is an $r$-dimensional subspace
of $V$.
Proof: Define $T : V \to W$ by
$$T(\bar{a}) = T((a_0, a_1, a_2, \ldots)) = T_{\bar{a}}$$
where
$$T_{\bar{a}} : \{r, r+1, \ldots\} \to \mathbb{R} \text{ (or } \mathbb{C})$$
is defined by
$$T_{\bar{a}}(n) = C_r a_n + C_{r-1} a_{n-1} + \cdots + C_0 a_{n-r}$$
for $n \ge r$. Then $\bar{a}$ is a solution of (19) iff $T_{\bar{a}} = f$ and it is a solution of (20)
iff $T_{\bar{a}} = \bar{0}$ where $\bar{0}$ means the identically zero function, which is also the
zero vector in the vector space $W$. The linearity of $T$ is a routine verification.
If $\bar{a}, \bar{b} \in V$ and $\alpha, \beta \in \mathbb{R}$ (or $\mathbb{C}$), then for every $n \ge r$,
$$T_{\alpha\bar{a}+\beta\bar{b}}(n) = \sum_{i=0}^{r} C_{r-i}(\alpha a_{n-i} + \beta b_{n-i}) = \alpha \sum_{i=0}^{r} C_{r-i}\,a_{n-i} + \beta \sum_{i=0}^{r} C_{r-i}\,b_{n-i} = \alpha T_{\bar{a}}(n) + \beta T_{\bar{b}}(n).$$
Let $K$ be the kernel (i.e. the null-space) of $T$. Then the first assertion follows
from the very definition of the kernel.
For the other two assertions, we first show that $T$ is onto, i.e. for every
$f \in W$, there exists $\bar{a} \in V$, such that $T_{\bar{a}} = f$. The construction of such $\bar{a}$ is
direct and uses the assumption that $C_r \ne 0$. Choose $a_0, a_1, \ldots, a_{r-1}$ arbitrarily.
Having chosen them, $a_r$ is uniquely determined by
$$C_r a_r + C_{r-1} a_{r-1} + \cdots + C_0 a_0 = f(r),$$
specifically,
$$a_r = \frac{1}{C_r}\left(f(r) - C_{r-1} a_{r-1} - \cdots - C_0 a_0\right).$$
Similarly $a_{r+1}$ is uniquely determined by
$$a_{r+1} = \frac{1}{C_r}\left(f(r+1) - C_{r-1} a_r - \cdots - C_0 a_1\right),$$
$a_{r+2}$ is uniquely determined by
$$a_{r+2} = \frac{1}{C_r}\left(f(r+2) - C_{r-1} a_{r+1} - \cdots - C_0 a_2\right).$$
Continuing in this manner, we determine each $a_n$ for $n \ge r$. (Essentially
this is a special case of the construction preceding Theorem (3.1).) It now
follows that $T_{\bar{a}} = f$ and hence that $T$ is onto. In other words, (19) has at
least one solution for a given $f$. Once we know this, the other solutions of
(19) are obtained by adding to this 'particular solution' the elements of $K$:
see Exercise (6.3.8). This proves the second assertion. A popular way of
expressing it is that the 'general solution' of a linear recurrence relation is
obtained by adding any one particular solution of it to the general solution
of its associated homogeneous equation.
The construction above also shows that $\dim(K) = r$. If we take $f = \bar{0}$,
then the construction shows that every element $\bar{a}$ of $K$ is uniquely determined
by $a_0, \ldots, a_{r-1}$ which could be chosen arbitrarily. Thus we have a
bijection between $K$ and the $r$-dimensional vector space $\mathbb{R}^r$ (or $\mathbb{C}^r$). This
bijection is easily seen to be a vector space isomorphism. (Under this
bijection, every element of $K$ is truncated, by taking only its initial segment
of length $r$.) It follows that $\dim(K) = r$.
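The construction used in this proof is entirely mechanical and can be sketched in a few lines. In the sketch below the function name `extend_solution` and its argument conventions (the list `C` holds $C_0, \ldots, C_r$ in that order) are our own choices, not notation from the text. The last check illustrates the second assertion of the theorem on the recurrence (1): a particular solution plus an element of $K$ is again a solution.

```python
from fractions import Fraction


def extend_solution(C, initial, f, n_max):
    """Continue a_0..a_{r-1} to a_0..a_{n_max} so that
    C_r a_n + C_{r-1} a_{n-1} + ... + C_0 a_{n-r} = f(n) for all n >= r.
    C is the list [C_0, ..., C_r]; f is any function of n."""
    r = len(C) - 1
    if len(initial) != r:
        raise ValueError("exactly r initial values are needed")
    if C[r] == 0 or C[0] == 0:
        raise ValueError("C_r and C_0 must be non-zero")
    a = [Fraction(value) for value in initial]
    for n in range(r, n_max + 1):
        tail = sum(C[r - i] * a[n - i] for i in range(1, r + 1))
        a.append((Fraction(f(n)) - tail) / C[r])
    return a


# (1) rewritten as a_n - a_{n-1} = n, i.e. C_0 = -1, C_1 = 1.
particular = extend_solution([-1, 1], [1], lambda n: n, 6)
# An element of the kernel K: a solution of the A.H.E. a_n - a_{n-1} = 0.
homogeneous = extend_solution([-1, 1], [5], lambda n: 0, 6)
shifted = [p + h for p, h in zip(particular, homogeneous)]
assert shifted == extend_solution([-1, 1], [6], lambda n: n, 6)

# (2) rewritten as b_n - b_{n-1} - b_{n-2} = 0.
print(extend_solution([-1, -1, 1], [0, 1], lambda n: 0, 10))
```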
This theorem shows that the problem of solving a linear recurrence
relation can be broken into two parts: (i) finding the general solution of
the associated homogeneous equation and (ii) obtaining any one particular
solution of the original relation. We tackle these two parts separately.
(i) really amounts to finding a basis for the kernel $K$ in the theorem above.
One such basis was in fact constructed in the proof. It can be described
as follows: For $i = 0, \ldots, r-1$, let $\bar{e}_i = (0, \ldots, 0, 1, 0, \ldots, 0)$, i.e. the vector
having 1 in the $i$th place and 0 elsewhere. If we start with $\bar{e}_i$, that is, if we
let $a_k = 0$ for $k = 0, \ldots, r-1$, $k \ne i$, and $a_i = 1$ and apply the construction
in the proof above, then we get a solution of (20), i.e., a vector, say $\bar{u}_i$, in
$K$ which is a continuation of $\bar{e}_i$. The vectors $\bar{u}_0, \ldots, \bar{u}_{r-1}$ are linearly
independent, because their truncations, $\bar{e}_0, \ldots, \bar{e}_{r-1}$, are so. Since $\dim(K) = r$,
it follows from Exercise (6.3.11) that $\{\bar{u}_0, \ldots, \bar{u}_{r-1}\}$ is a basis for $K$. Consequently,
the general solution of (20) is of the form $c_0\bar{u}_0 + c_1\bar{u}_1 + \cdots + c_{r-1}\bar{u}_{r-1}$
where $c_0, \ldots, c_{r-1}$ are arbitrary constants. However, this form of the general
solution is of little practical value, because we have not identified the $\bar{u}_i$'s.
The only description we have of them is that they are some sequences
constructed recursively so as to satisfy (20). We have no closed form formula
for their typical terms.
So we look for a basis for $K$ whose elements can be easily described in
terms of the given recurrence relation. The key step in this search is the
following. Let $\alpha$ be a non-zero real (or a complex) number. Denote by $\bar{u}_\alpha$
the sequence $\{\alpha^n\}_{n=0}^{\infty} = (1, \alpha, \alpha^2, \ldots, \alpha^n, \ldots)$. The following proposition tells
us when $\bar{u}_\alpha \in K$.
3.5 Proposition: $a_n = \alpha^n$ for $n = 0, 1, 2, \ldots$ is a solution of (20) if and only
if $C_r\alpha^r + C_{r-1}\alpha^{r-1} + \cdots + C_1\alpha + C_0 = 0$. In other words, $\bar{u}_\alpha$ is in $K$ iff $\alpha$ is a
root of the polynomial $C_r x^r + C_{r-1} x^{r-1} + \cdots + C_1 x + C_0$.
Proof: Suppose $a_n = \alpha^n$ is a solution of (20). Then we have
$$C_r\alpha^n + C_{r-1}\alpha^{n-1} + \cdots + C_1\alpha^{n-r+1} + C_0\alpha^{n-r} = 0 \qquad (21)$$
for all $n \ge r$. In particular, setting $n = r$, we see that $\alpha$ is a root of
$C_r x^r + C_{r-1} x^{r-1} + \cdots + C_1 x + C_0$.
So the condition is surely necessary. Conversely if $C_r\alpha^r + C_{r-1}\alpha^{r-1} + \cdots +
C_1\alpha + C_0 = 0$ we merely multiply it by $\alpha^{n-r}$ and see that (21) holds for
all $n \ge r$. This proves sufficiency. ∎
In view of this result, the polynomial $C_r x^r + \cdots + C_1 x + C_0$ and its
roots are of tremendous importance in finding the general solution of (20).
As we shall soon see, these roots completely characterise all the solutions
of (20). This is probably the reason they are given the following name:
3.6 Definition: The polynomial $C_r x^r + \cdots + C_1 x + C_0$ is called the
characteristic polynomial of (20). Its roots are called characteristic roots of
(20). The polynomial equation $C_r x^r + \cdots + C_1 x + C_0 = 0$ is called the
characteristic equation of (20).
Note that because of our assumption that $C_0 \ne 0$, 0 can never be a
characteristic root. Proposition (3.5) shows how each characteristic root
gives rise to a vector in $K$. The following proposition goes a little further
towards obtaining the general solution.
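3.7 Proposition: If $\alpha_1, \alpha_2, \ldots, \alpha_k$ are distinct characteristic roots then the
vectors $\bar{u}_{\alpha_1}, \bar{u}_{\alpha_2}, \ldots, \bar{u}_{\alpha_k}$ are linearly independent.
Proof: By Proposition (6.2.25), $k \le r$. Let $\bar{v}_i = (1, \alpha_i, \ldots, \alpha_i^{k-1})$. Then $\bar{v}_i$ is
a truncation of $\bar{u}_{\alpha_i}$. In order to show that $\{\bar{u}_{\alpha_1}, \ldots, \bar{u}_{\alpha_k}\}$ is linearly
independent, it obviously suffices to show that the $\bar{v}_i$'s are linearly independent.
By Proposition (6.4.32), the latter is equivalent to showing that $\det(A) \ne 0$,
where $A$ is the transpose of the $k \times k$ matrix with rows $\bar{v}_1, \ldots, \bar{v}_k$, that is,
$$A = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ \alpha_1 & \alpha_2 & \alpha_3 & \cdots & \alpha_k \\ \alpha_1^2 & \alpha_2^2 & \alpha_3^2 & \cdots & \alpha_k^2 \\ \vdots & \vdots & \vdots & & \vdots \\ \alpha_1^{k-1} & \alpha_2^{k-1} & \alpha_3^{k-1} & \cdots & \alpha_k^{k-1} \end{pmatrix}.$$
This is a Vandermonde determinant and by Exercise (6.4.33) it is non-zero
since the roots $\alpha_1, \ldots, \alpha_k$ are distinct. ∎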
We are now in a position to completely describe the general solution of
a homogeneous linear recurrence relation whose characteristic roots are
distinct. We still have to consider the case of multiple roots. However, in
the case of most of the linear recurrence relations arising in applications,
it turns out that the characteristic roots are distinct. We therefore content
ourselves with giving the general solution in this case only, relegating the case
of multiple roots to the exercises.
3.8 Theorem: Suppose $\alpha_1, \alpha_2, \ldots, \alpha_r$ are the distinct characteristic roots
of (20). Then the general solution to (20) is
$$a_n = c_1\alpha_1^n + c_2\alpha_2^n + \cdots + c_r\alpha_r^n, \quad \text{for all } n \ge 0 \qquad (22)$$
where $c_1, c_2, \ldots, c_r$ are arbitrary constants taking values in $\mathbb{R}$ (or $\mathbb{C}$).
Proof: Apply the last proposition with $k = r$. The vectors $\bar{u}_{\alpha_1}, \bar{u}_{\alpha_2}, \ldots, \bar{u}_{\alpha_r}$ are
linearly independent elements of $K$. But by Theorem (3.4), $\dim(K) = r$. So
it follows from Exercise (6.3.11) that they span $K$. This means every element
of $K$, i.e. every solution, say $\bar{a}$, of (20) is of the form $c_1\bar{u}_{\alpha_1} + c_2\bar{u}_{\alpha_2} + \cdots + c_r\bar{u}_{\alpha_r}$
for some constants $c_1, \ldots, c_r$. Recalling that the vector space operations in
$V$ are termwise, we see that (22) holds. ∎
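In practice the constants in (22) are found by imposing the $r$ initial conditions, which gives a system of linear equations whose coefficient matrix is the Vandermonde matrix of Proposition (3.7). The following is a minimal sketch of that step, assuming the characteristic roots have already been found and are distinct; it uses plain Gaussian elimination in floating point complex arithmetic, and all function names are ours.

```python
def solve_linear_system(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting over the complex numbers."""
    size = len(rhs)
    rows = [[complex(x) for x in row] + [complex(v)] for row, v in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda i: abs(rows[i][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("singular system: are the roots distinct?")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for i in range(size):
            if i != col:
                factor = rows[i][col] / rows[col][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[col])]
    return [rows[i][size] / rows[i][i] for i in range(size)]


def constants_from_initial_values(roots, initial):
    """Solve c_1 alpha_1^n + ... + c_r alpha_r^n = a_n for n = 0, ..., r - 1."""
    r = len(roots)
    vandermonde = [[root ** n for root in roots] for n in range(r)]
    return solve_linear_system(vandermonde, initial)


def term(roots, constants, n):
    """Evaluate (22) at n."""
    return sum(c * root ** n for c, root in zip(constants, roots))


# Recurrence (2): characteristic polynomial x^2 - x - 1, with b_0 = 0, b_1 = 1.
roots = [(1 + 5 ** 0.5) / 2, (1 - 5 ** 0.5) / 2]
constants = constants_from_initial_values(roots, [0, 1])
print(constants)
print([round(term(roots, constants, n).real) for n in range(11)])
```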
As a simple application we return to (2). Rewriting it as
$$b_n - b_{n-1} - b_{n-2} = 0$$
we see its characteristic polynomial is $x^2 - x - 1$ and so the characteristic
roots are $\frac{1+\sqrt{5}}{2}$ and $\frac{1-\sqrt{5}}{2}$, which are obviously distinct. So its general
solution is $b_n = c_1\left(\frac{1+\sqrt{5}}{2}\right)^n + c_2\left(\frac{1-\sqrt{5}}{2}\right)^n$, where $c_1, c_2$ are constants,
to be determined from the initial conditions, which in this case are $b_0 = 0$,
$b_1 = 1$. Setting $n = 0$ and 1 we get respectively, $b_0 = 0 = c_1 + c_2$ and
$b_1 = 1 = c_1\left(\frac{1+\sqrt{5}}{2}\right) + c_2\left(\frac{1-\sqrt{5}}{2}\right)$. Solving these two equations simultaneously,
we get $c_1 = \frac{1}{\sqrt{5}}$ and $c_2 = -\frac{1}{\sqrt{5}}$. So the particular solution to (2) is
$$b_n = \frac{1}{\sqrt{5}}\left[\left(\frac{1+\sqrt{5}}{2}\right)^n - \left(\frac{1-\sqrt{5}}{2}\right)^n\right],$$
which is the same as (3). Note that the work involved is not qualitatively
different, because in the earlier approach, in order to resolve $\frac{x}{1-x-x^2}$ into
partial fractions we had to solve the quadratic $x^2 - x - 1$ anyway. The
advantage of Theorem (3.8) is that once we obtain the characteristic roots,
there is nothing more to do. Note also that whereas in the earlier approach
the generating function depended on the initial conditions, in the present
approach, the initial conditions figure only at the end. Even when the linear
recurrence relation is not homogeneous, we first obtain its general solution
and then determine the constants so as to satisfy the given initial conditions.
Although we are omitting discussion of the case of multiple characteristic
roots, the case where some of the characteristic roots are complex
deserves to be settled here. In our treatment of the linear recurrence
relations so far, it made little difference whether the coefficients $C_0, \ldots, C_r$
in (20) were real or complex. Indeed Theorem (3.8) is equally valid over
any field. In most applications, of course, the coefficients are real and in
fact integers. The trouble, though, is that even when all $C$'s are real, some
of the characteristic roots may be complex, as we see in the following
very simple recurrence relation of order 2.
$$a_n + a_{n-2} = 0 \quad (\text{for } n \ge 2) \qquad (23)$$
Here the characteristic equation is $x^2 + 1 = 0$ and the characteristic roots are
$i$ and $-i$. In such a case the solution (22) is valid only if we treat the
coefficients of the original recurrence relation as complex numbers. The
constants $c_1, \ldots, c_r$ in (22) must then be allowed to take complex values, even if
we want a particular solution in which all the $a_n$'s are real. Suppose, for
example, that in (23) the initial conditions are $a_0 = 5$ and $a_1 = 6$. By
inspection, in this case the solution is $(5, 6, -5, -6, 5, 6, -5, -6, 5, 6, \ldots)$.
Let us try to obtain this from (22). Let $a_n = c_1 i^n + c_2(-i)^n$. Setting $n = 0$
and 1 we get, from the initial conditions, $a_0 = 5 = c_1 + c_2$ and $a_1 = 6 = c_1 i
- c_2 i$. Solving these, we get $c_1 = 5/2 - 3i$ and $c_2 = 5/2 + 3i$. Consequently,
$a_n = (5/2 - 3i)\,i^n + (5/2 + 3i)(-i)^n$. Mathematically, there is nothing wrong
with this expression. Its values are real for integer values of $n$. Still, we
would prefer if $a_n$ can be expressed directly in a real form. We now show
that such an expression is always possible. That is, as long as the coefficients
$C_0, \ldots, C_r$ in (20) are real, even though some of the characteristic
roots are complex we can obtain an alternate basis for the vector space
$K$ so that every real solution of (20) can be expressed as a linear combination
of these basis elements with real coefficients.
The key idea is that complex roots of real polynomials must always
occur in conjugate pairs. Writing complex numbers in the polar form, if
$re^{i\theta} = r\cos\theta + ir\sin\theta$ is a characteristic root, so is $re^{-i\theta}$. The vector in
$K$ generated by $re^{i\theta}$ is $\bar{u}_{re^{i\theta}} = \{r^n e^{in\theta}\}_{n=0}^{\infty}$ while that generated by $re^{-i\theta}$ is
$\bar{u}_{re^{-i\theta}} = \{r^n e^{-in\theta}\}_{n=0}^{\infty}$. Now suppose $\bar{a} = \{a_n\}_{n=0}^{\infty}$ is a real sequence which
is a linear combination of $\bar{u}_{re^{i\theta}}$ and $\bar{u}_{re^{-i\theta}}$ with complex coefficients, say,
$c_1$ and $c_2$. That is, $a_n = c_1 r^n e^{in\theta} + c_2 r^n e^{-in\theta}$ for all $n \ge 0$. Let $c_1 = \rho_1 e^{i\theta_1}$
and $c_2 = \rho_2 e^{-i\theta_2}$. Then $a_n$ must equal the real part of $c_1 r^n e^{in\theta} + c_2 r^n e^{-in\theta}$,
which is $r^n\rho_1\cos(n\theta + \theta_1) + r^n\rho_2\cos(n\theta + \theta_2)$. Expanding $\cos(n\theta + \theta_1)$
and $\cos(n\theta + \theta_2)$ and rearranging the terms, we get $a_n = d_1 r^n\cos n\theta + d_2 r^n\sin n\theta$
where $d_1 = \rho_1\cos\theta_1 + \rho_2\cos\theta_2$ and $d_2 = -\rho_1\sin\theta_1 - \rho_2\sin\theta_2$. Note
that $d_1$ and $d_2$ are real. Thus we have shown that every real-valued linear
combination of
$$\{r^n e^{in\theta}\}_{n=0}^{\infty} \text{ and } \{r^n e^{-in\theta}\}_{n=0}^{\infty}$$
with complex coefficients can be expressed as a linear combination of
$$\{r^n\cos n\theta\}_{n=0}^{\infty} \text{ and } \{r^n\sin n\theta\}_{n=0}^{\infty}$$
with real coefficients. It follows that we can replace the pair of vectors
$\bar{u}_{re^{i\theta}}, \bar{u}_{re^{-i\theta}}$ in the complex vector space $K$ by the pair of vectors
$$\{r^n\cos n\theta\}_{n=0}^{\infty}, \quad \{r^n\sin n\theta\}_{n=0}^{\infty}$$
in the real vector space $K$. Repeating this procedure for all complex
characteristic roots (which we assume to be distinct), we get the following
result.
3.9 Theorem: Suppose the coefiicients Co, Cl, ..., C, in (20) are real and
that the characteristic roots are '
r.e="', r.e*"'. My", Bum. ..-. P.
where Bum, ..., p, are real. Then the general solution of (20) is
a. = dflcos n01 + d3; sin nil1 + dgr; cos 0; + dgr; sin n, +...+...
+ (in; cos 710;, + dgr" sin ’10,, + dunfl'g'.“ +...+ 49: (24)
where d1. ..., d,“ d", ..., dk’, (hm, ..., d, are real cousitauts. I
For example, returning to (23), we have $r = 2$, $k = 1$, $r_1 = 1$ and
$\theta_1 = \pi/2$. So the general solution of (23) is
$a_n = d_1\cos\frac{n\pi}{2} + d_1'\sin\frac{n\pi}{2}$,
where $d_1$, $d_1'$ are real constants. If the initial conditions are $a_0 = 5$ and
$a_1 = 6$, we get $5 = d_1\cos 0 + d_1'\sin 0 = d_1$ and
$6 = d_1\cos\frac{\pi}{2} + d_1'\sin\frac{\pi}{2} = d_1'$. So the particular
solution is $a_n = 5\cos\frac{n\pi}{2} + 6\sin\frac{n\pi}{2}$. This is
exactly the same as $a_n = (\frac52 - 3i)\,i^n + (\frac52 + 3i)(-i)^n$, but appears
in a more likable form.
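The agreement of the two closed forms can be checked numerically. The following is a minimal Python sketch, not part of the original text. It assumes that (23), which is not restated here, is the relation $a_n + a_{n-2} = 0$ (this is consistent with the characteristic roots $\pm i$ used above); the function names are illustrative only.

```python
import math

def real_form(n):
    # a_n = 5 cos(n*pi/2) + 6 sin(n*pi/2), the form given by Theorem (3.9)
    return 5 * math.cos(n * math.pi / 2) + 6 * math.sin(n * math.pi / 2)

def complex_form(n):
    # a_n = (5/2 - 3i) i^n + (5/2 + 3i) (-i)^n, the form with complex coefficients
    c1 = complex(2.5, -3.0)
    c2 = complex(2.5, 3.0)
    return c1 * (1j) ** n + c2 * (-1j) ** n

# Round away floating-point noise; the exact values are integers.
values = [round(real_form(n)) for n in range(8)]

# Assumed form of (23): a_n + a_{n-2} = 0 for n >= 2.
satisfies_23 = all(values[n] + values[n - 2] == 0 for n in range(2, 8))
forms_agree = all(abs(complex_form(n) - real_form(n)) < 1e-9 for n in range(8))
```

Both forms produce the same real sequence; the trigonometric form simply avoids complex arithmetic.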
Let us now turn to the second part of obtaining the general solution of
(19), namely finding any one particular solution of it. This solution
evidently depends on the nature of the function f(n) in (19). Inasmuch as
there is no restriction on this function, obviously there can be no golden
method which will work in all cases. Nevertheless a few observations
are useful. First, suppose f(n) is a linear combination of functions, say,
$f(n) = \lambda_1 f_1(n) + \cdots + \lambda_k f_k(n)$. Because of linearity, it is
easily seen that a particular solution to (19) is the corresponding linear
combination of the solutions to (19) with f(n) replaced by $f_i(n)$,
$i = 1, \ldots, k$. This reduces the task of finding a particular solution to
those cases where f(n) is a 'simple' or 'elementary' function in some sense.
In such cases, depending on the nature of the function, we try a particular
solution of a certain form with some unknown coefficients and then determine
these coefficients. The table in Figure 7.4 gives a list of what form of a
particular solution works for certain kinds of f(n), assuming that there are
no multiple characteristic roots.
No.  Form of f(n)                             Form of $a_n$

1    $\beta^n$, where $\beta$ is not a         $A\beta^n$, where A is a constant
     characteristic root
2    $\beta^n$, where $\beta$ is a simple      $An\beta^n$, where A is a constant
     characteristic root
3    a polynomial in n of degree k            a polynomial in n of degree k + 1
                                              or k according as 1 is or is not a
                                              characteristic root

Figure 7.4: Finding a Particular Solution of a Linear Recurrence Relation

Let us illustrate the procedure with the recurrence relation (1). Rewriting
it as $a_n - a_{n-1} = n$, we see that 1 is its characteristic root and so by
Theorem (3.8) the general solution of the A.H.E. is $a_n = c$ where c is a
constant. As for a particular solution, f(n) = n is a polynomial of degree 1.
Here 1 is a characteristic root. So we let $a_n$ be a polynomial in n of
degree 2, say $a_n = An^2 + Bn + C$, for a particular solution. Then
$a_{n-1} = A(n-1)^2 + B(n-1) + C$.
So $n = a_n - a_{n-1} = 2An + (-A + B)$. If this is to hold for all $n \ge 1$,
we must have $2A = 1$ and $-A + B = 0$; solving which, $A = B = \frac12$. C
is undetermined, which happens because 1 is a characteristic root. Setting
C = 0 arbitrarily, $a_n = \frac12 n^2 + \frac12 n$ is a particular solution of (1).
By Theorem (3.4), the general solution of (1) is
$a_n = c + \frac12 n^2 + \frac12 n$. The initial condition $a_0 = 1$ now gives
$c = 1$ and this is the same answer as we obtained earlier by the method of
generating functions.
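As a quick sanity check of this worked example, the closed form can be compared with direct iteration of the recurrence. The sketch below is illustrative and not from the original text; it uses exact rational arithmetic so that the halves cause no rounding.

```python
from fractions import Fraction

def by_recurrence(n_max):
    # Iterate a_n = a_{n-1} + n starting from a_0 = 1.
    a = [1]
    for n in range(1, n_max + 1):
        a.append(a[-1] + n)
    return a

def closed_form(n):
    # General solution c + n^2/2 + n/2 with c = 1 fixed by a_0 = 1.
    return 1 + Fraction(n * n, 2) + Fraction(n, 2)

iterated = by_recurrence(10)
formula = [closed_form(n) for n in range(11)]
```

The two lists coincide term by term, which confirms both the trial form $An^2 + Bn + C$ and the value of the constant c.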
Of course we have to verify that the table above really works. This
verification, as well as the consideration of the case of multiple roots, tends
to be messy if done directly. In the exercises we shall indicate how it can
be done elegantly using facts about vector spaces.
We have now described how to obtain the general solution of (19).
This general solution involves r mutually independent arbitrary constants.
To find a particular solution we need to determine these constants. This
would require a system of r mutually independent equations in these
constants. The initial conditions, i.e., the values of $a_0, a_1, \ldots, a_{r-1}$,
give one such set of equations. This, in fact, is the most common way of
specifying initial conditions, because in applications, the values of $a_n$ can
usually be found by inspection for lower values of n. In some problems,
however, $a_n$ has no natural meaning for some values of n. For example, let
$a_n$ be a number associated with a polygon with n sides. A non-degenerate
polygon must have at least three sides. So, for $n < 3$, $a_n$ is undefined. In
some problems $a_n$ may be defined for all n, but the recurrence relation may
fail for a few small values of n. In such cases, even if we know the values
of $a_0, \ldots, a_{r-1}$, they cannot be used as initial conditions.
In such situations we proceed as follows. Suppose the recurrence
relation (19) is valid at least for all $n \ge k$ and that the initial conditions
specify the values of $a_k, a_{k+1}, \ldots, a_{k+r-1}$ where k is some integer $\ge r$.
We can then work backwards and define (or redefine if necessary)
$a_{k-1}, a_{k-2}, \ldots, a_1, a_0$ successively from (19). This determination is
always possible because of our assumption that $C_0 \ne 0$. (19) is then valid
for all $n \ge r$. From $a_0, \ldots, a_{r-1}$ so defined, we get a particular solution.
Alternately, we can substitute the values of $a_k, \ldots, a_{k+r-1}$ directly in the
general solution and solve the resulting system of equations for the
arbitrary constants. It is not hard to show that this way the arbitrary
constants are uniquely determined. It is necessary, however, that we know
$a_n$ for r consecutive values of n. For example, in the case of (23), suppose
we know only two values whose indices are even and differ by 2, say
$a_m = a$ and $a_{m+2} = b$. If $a + b \ne 0$, then (23) has no solution with
these initial conditions. But if $a + b = 0$, then there are infinitely many
such solutions, because these initial conditions do not determine $a_n$ for
odd n.
The following numerical problem summarises almost everything we
have covered about linear recurrence relations.
3.10 Problem: Solve the recurrence relation
    $2a_n - 5a_{n-1} + 6a_{n-2} - 2a_{n-3} = 2^{-n} + 3^{n+1} - n$    (25)
subject to $a_4 = 5$, $a_5 = -3$ and $a_6 = 1$.
Solution: The characteristic polynomial is $2x^3 - 5x^2 + 6x - 2$. By the
rational root test (Exercise (6.2.34)), the possible rational roots of this
polynomial are $\pm 1$, $\pm 2$, $\pm\frac12$. By trial we find $\frac12$ is a root.
So $(x - \frac12)$ or, equivalently, $(2x - 1)$ is a factor. Carrying out the long
division we see $2x^3 - 5x^2 + 6x - 2 = (2x - 1)(x^2 - 2x + 2)$. The other
characteristic roots are the roots of $x^2 - 2x + 2$, namely $1 \pm i$. These are
complex and their polar form is $\sqrt2\,e^{\pm i\pi/4}$. So by Theorem (3.9),
the general solution of the A.H.E. is
    $a_n = c_1(\frac12)^n + c_2(\sqrt2)^n\cos\frac{n\pi}{4} + c_3(\sqrt2)^n\sin\frac{n\pi}{4}$    (26)
Now for a particular solution of (25), $2^{-n}$ equals $(\frac12)^n$ and since
$\frac12$ is a characteristic root, we try $An(\frac12)^n$, i.e., $An/2^n$. Next,
$3^{n+1} = 3\cdot3^n$ and since 3 is not a characteristic root we try $B3^n$. For
the third term, $-n$, since 1 is not a characteristic root, we try $Cn + D$. So
let a particular solution be
    $a_n = \dfrac{An}{2^n} + B3^n + Cn + D$    (27)
where the constants A, B, C, D are to be determined. Computing $a_{n-1}$,
$a_{n-2}$ and $a_{n-3}$, we get, after simplification,
    $2a_n - 5a_{n-1} + 6a_{n-2} - 2a_{n-3} = \dfrac{10A}{2^n} + 25B\cdot3^{n-3} + Cn + D - C$.
It follows that if (27) is to be a solution of (25), then we must have
$10A = 1$, $25B = 81$, $C = -1$ and $D - C = 0$, which gives $A = \frac1{10}$,
$B = \frac{81}{25}$, $C = -1$ and $D = -1$. So
$a_n = \dfrac{n}{10\cdot2^n} + \dfrac{3^{n+4}}{25} - n - 1$ is a particular
solution of (25). By Theorem (3.4), the general solution to (25) is
    $a_n = c_1(\frac12)^n + c_2(\sqrt2)^n\cos\frac{n\pi}{4} + c_3(\sqrt2)^n\sin\frac{n\pi}{4} + \dfrac{n}{10\cdot2^n} + \dfrac{3^{n+4}}{25} - n - 1$    (28)
To find the particular solution with the given initial conditions, we set
n = 4, 5, 6 in (28) and get the following system of equations:
    (i) $\frac{c_1}{16} - 4c_2 = a$    (ii) $\frac{c_1}{32} - 4c_2 - 4c_3 = b$    (iii) $\frac{c_1}{64} - 8c_3 = c$
where
    $a = 5 - \frac{1}{40} - \frac{6561}{25} + 5 = -\frac{50493}{200}$,
    $b = -3 - \frac{1}{64} - \frac{19683}{25} + 6 = -\frac{1254937}{1600}$
and
    $c = 1 - \frac{3}{320} - \frac{59049}{25} + 7 = -\frac{3766351}{1600}$.
Solving this system we get
    $c_1 = \dfrac{64(2a - 2b + c)}{5}$, $c_2 = \dfrac{3a - 8b + 4c}{20}$ and $c_3 = \dfrac{a - b - 2c}{20}$.
Substituting the values of a, b, c and then substituting in (28) gives the
answer.
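The arithmetic in this problem is heavy enough that a mechanical check is worthwhile. The following Python sketch is not part of the original text; it verifies with exact rational arithmetic that the solution obtained above meets the three initial conditions and satisfies (25). It relies on the observation that $(\sqrt2)^n\cos\frac{n\pi}{4}$ and $(\sqrt2)^n\sin\frac{n\pi}{4}$ are the real and imaginary parts of $(1 + i)^n$, so no floating point is needed. All function and variable names are illustrative.

```python
from fractions import Fraction as F

def gauss_pow(n):
    # (1 + i)^n as an integer pair (re, im):
    # re = (sqrt 2)^n cos(n pi/4), im = (sqrt 2)^n sin(n pi/4).
    re, im = 1, 0
    for _ in range(n):
        re, im = re - im, re + im   # multiply by (1 + i)
    return re, im

def particular(n):
    # Particular solution n/(10*2^n) + 3^(n+4)/25 - n - 1 of (25).
    return F(n, 10 * 2**n) + F(3**(n + 4), 25) - n - 1

# Right-hand sides of the system (i)-(iii) for a_4 = 5, a_5 = -3, a_6 = 1.
a_rhs = 5 - particular(4)
b_rhs = -3 - particular(5)
c_rhs = 1 - particular(6)

c1 = 64 * (2 * a_rhs - 2 * b_rhs + c_rhs) / 5
c2 = (3 * a_rhs - 8 * b_rhs + 4 * c_rhs) / 20
c3 = (a_rhs - b_rhs - 2 * c_rhs) / 20

def solution(n):
    # General solution (28) with the constants determined above.
    re, im = gauss_pow(n)
    return c1 * F(1, 2**n) + c2 * re + c3 * im + particular(n)

def residual(n):
    # Left side of (25) minus its right side; should vanish for n >= 3.
    lhs = (2 * solution(n) - 5 * solution(n - 1)
           + 6 * solution(n - 2) - 2 * solution(n - 3))
    return lhs - (F(1, 2**n) + 3**(n + 1) - n)
```

Working with `Fraction` keeps every intermediate value exact, so any slip in a, b, c or in the formulas for $c_1$, $c_2$, $c_3$ would show up as a nonzero residual.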
We conclude the section with a brief discussion of recurrence relations
with two indices. A sequence is a function of a single integer variable. As
remarked in Section 1, we often have functions of two (or more) integer
variables. Denote such a function by $\{a_{m,n}\}_{m,n=0}^{\infty}$. A recurrence
relation for such functions is an expression which expresses $a_{m,n}$ in terms
of $a_{i,j}$'s where $i \le m$, $j \le n$ and at least one inequality is strict. Such
recurrence relations can often be written down by combinatorial
considerations. For example, let $a_{m,n} = \binom{m}{n}$, the number of ways to
choose n objects from a collection of m objects. Even if we did not know a
formula for $\binom{m}{n}$, we can easily show by purely combinatorial arguments
(cf. Proposition (2.2.19)) that for all $m \ge 1$, $n \ge 1$,
    $a_{m,n} = a_{m-1,n-1} + a_{m-1,n}$    (29)
Here the 'boundary conditions' are $a_{m,0} = 1$ for all m and $a_{m,n} = 0$ for
$m < n$.
This is a recurrence relation with two indices, m and n. As a matter of
fact, solving (29), we can show $a_{m,n} = \binom{m}{n}$ as follows. Multiply both
sides of (29) by $x^n$ and sum over $n \ge 1$. Then we get
    $\sum_{n=1}^{\infty} a_{m,n}x^n = x\sum_{n=1}^{\infty} a_{m-1,n-1}x^{n-1} + \sum_{n=1}^{\infty} a_{m-1,n}x^n$    (30)
Now, for each i, let $F_i(x) = \sum_{n=0}^{\infty} a_{i,n}x^n$. In other words, $F_i(x)$
is the O.G.F. of the sequence $\{a_{i,n}\}_{n=0}^{\infty}$ for fixed i. With this
notation, and keeping in mind that $a_{i,0} = 1$ for all i, we get
    $F_m(x) - 1 = xF_{m-1}(x) + F_{m-1}(x) - 1$    (31)
or
    $F_m(x) = (1 + x)F_{m-1}(x)$  $(m \ge 1)$    (32)
(32) is a recurrence relation for the sequence of generating functions
$(F_0(x), F_1(x), F_2(x), \ldots, F_m(x), \ldots)$. Since $a_{0,0} = 1$ and $a_{0,n} = 0$
for $n > 0$, $F_0(x) = \sum_{n=0}^{\infty} a_{0,n}x^n = 1$. So $F_1(x) = 1 + x$,
$F_2(x) = (1 + x)^2$, ... and in general $F_m(x) = (1 + x)^m$ for all m, as can
be easily proved by induction on m. We have thus solved (32). (Of course we
could have also solved it by considering
$F(x, y) = \sum_{m=0}^{\infty} F_m(x)y^m$, which is the O.G.F. of the sequence
$\{F_m(x)\}_{m=0}^{\infty}$, and then deriving a formula for $F(x, y)$ by multiplying
both sides of (32) by $y^m$ and summing over $m \ge 1$. We would then get
$F(x, y) = \dfrac{1}{1 - (1 + x)y}$ and expanding this as a power series in y,
we get $F_m(x)$ = coefficient of $y^m$ = $(1 + x)^m$.)
It thus follows that $a_{m,n}$ is the coefficient of $x^n$ in $(1 + x)^m$. (Of course
we already know this from the binomial theorem. But the point was to
derive it strictly from (29), without knowing how $a_{m,n}$ came about.)
Knowing this, it follows that $a_{m,n}$ must equal $\binom{m}{n}$.
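The derivation can be mirrored computationally. The sketch below is illustrative and not from the original text. It fills a table directly from (29) with its boundary conditions, and separately builds the coefficients of $F_m(x)$ by repeated multiplication by $(1 + x)$ as in (32); both are then comparable with the binomial coefficients from Python's standard `math.comb`.

```python
from math import comb

def table_from_recurrence(m_max, n_max):
    # a[m][n] built only from (29) and the boundary conditions
    # a[m][0] = 1 for all m, a[0][n] = 0 for n > 0.
    a = [[0] * (n_max + 1) for _ in range(m_max + 1)]
    for m in range(m_max + 1):
        a[m][0] = 1
    for m in range(1, m_max + 1):
        for n in range(1, n_max + 1):
            a[m][n] = a[m - 1][n - 1] + a[m - 1][n]
    return a

def times_one_plus_x(coeffs):
    # Coefficient list of (1 + x) * F(x), given the list for F(x).
    shifted = [0] + coeffs      # x * F(x)
    padded = coeffs + [0]       # F(x)
    return [p + s for p, s in zip(padded, shifted)]

def generating_function(m):
    # F_m(x) obtained from F_0(x) = 1 by applying (32) m times.
    coeffs = [1]
    for _ in range(m):
        coeffs = times_one_plus_x(coeffs)
    return coeffs

a = table_from_recurrence(6, 6)
```

Row m of the table and the coefficient list of $F_m(x)$ are the same numbers, which is exactly the content of the argument above.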
It is instructive to look at recurrence relations with two indices
geometrically, in the light of the cartesian representation of $a_{m,n}$ (cf.
Figure 7.2(a)) introduced at the end of Section 1. Consider the rectangle of
dots with sides parallel to the axes, one vertex at (0, 0) and the diagonally
opposite vertex at (m, n). A recurrence relation then expresses the value of
the function at the corner (m, n) in terms of its values at some of the other
points of this rectangle. In general these points need not lie in the same
row or column as the point (m, n). (In the case of (29), both (m-1, n-1)
and (m-1, n) lie in the column on the left of (m, n).) In fact, if they did, the
corresponding recurrence relation would not be very interesting. For
example, consider the recurrence relation
    $b_{m,n} = b_{m,n-1} + b_{m,n-2} + n$    (33)
for $n \ge 2$.
In this case, if we let $G_m(x) = \sum_{n=0}^{\infty} b_{m,n}x^n$, then for each fixed
m, multiplying both sides of (33) by $x^n$ and summing over $n \ge 2$ yields a
separate equation for each $G_m(x)$. The generating functions $G_m(x)$ are not
interlinked in any way as the $F_m(x)$'s are interlinked by (32). In other
words, (33) is nothing more than a collection of mutually unrelated recurrence
relations of one index, namely, n. In (29), on the other hand, the terms of the
m-th sequence, $\{a_{m,n}\}_{n=0}^{\infty}$, are linked with those of the (m-1)-th
sequence, $\{a_{m-1,n}\}_{n=0}^{\infty}$. That is why solving (29) is more
interesting than solving (33).
Such inter-linkage can occur for two ordinary recurrence relations as
well. As we shall see in the next section, in some enumeration problems,
even though our primary interest may be in a sequence $\{a_n\}$, instead of
obtaining a recurrence relation for $\{a_n\}$ directly, it is more convenient to
introduce an auxiliary sequence $\{b_n\}$ (often several such sequences) and to
inter-relate the terms of the two sequences. This gives rise to what is known
as a system of simultaneous recurrence relations. The standard method for
solving such a system is to translate the inter-relationship between the two
sequences into an equation in their generating functions. We illustrate this
technique with an example.
3.11 Problem: Solve the system of recurrence relations:
    $a_n = a_{n-1} + b_n$  $(n \ge 1)$    (34)
    $b_n = a_{n-1} + b_{n-1}$  $(n \ge 1)$    (35)
with the initial conditions $a_0 = 1$, $b_0 = 0$.
Solution: As usual we let $A(x) = \sum_{n=0}^{\infty} a_nx^n$ and
$B(x) = \sum_{n=0}^{\infty} b_nx^n$. Multiplying throughout by $x^n$ and summing
over $n \ge 1$ gives
    $A(x) - 1 = xA(x) + B(x)$    (36)
and
    $B(x) = xA(x) + xB(x)$    (37)
Solving these equations simultaneously gives
$A(x) = \dfrac{1 - x}{1 - 3x + x^2}$ and $B(x) = \dfrac{x}{1 - 3x + x^2}$.
Expanding these, we get closed form expressions for $a_n$ and $b_n$. This is
left as an exercise. ■
In this problem we could have delinked the sequences $\{a_n\}$ and $\{b_n\}$ from
each other as follows. Substituting (35) in (34) and then again using (34)
with n - 1 replacing n, we get
    $a_n = 2a_{n-1} + b_{n-1} = 2a_{n-1} + (a_{n-1} - a_{n-2}) = 3a_{n-1} - a_{n-2}$.
Similarly, $b_n = a_n - b_n + b_{n-1} = b_{n+1} - 2b_n + b_{n-1}$. We can rewrite
these as
    $a_n - 3a_{n-1} + a_{n-2} = 0$  $(n \ge 2)$    (38)
and
    $b_n - 3b_{n-1} + b_{n-2} = 0$  $(n \ge 2)$    (39)
These are linear recurrence relations, and can be solved separately. But in
some problems such delinking may not be easy and so it is desirable to use
the method above.
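The generating functions found for this system can be checked against direct iteration. The Python sketch below is illustrative and not from the original text. It expands a rational function $P(x)/(1 - 3x + x^2)$ as a power series by the standard coefficient recursion $c_n = p_n + 3c_{n-1} - c_{n-2}$, which is the same statement as (38) and (39), and compares the result with the sequences generated from (34) and (35).

```python
def by_system(n_max):
    # Iterate (35) then (34), starting from a_0 = 1, b_0 = 0.
    a, b = [1], [0]
    for n in range(1, n_max + 1):
        b_n = a[n - 1] + b[n - 1]   # (35)
        a_n = a[n - 1] + b_n        # (34)
        a.append(a_n)
        b.append(b_n)
    return a, b

def series_coeffs(numerator, n_max):
    # Power series coefficients of numerator(x) / (1 - 3x + x^2),
    # where numerator is a list of polynomial coefficients.
    c = []
    for n in range(n_max + 1):
        p_n = numerator[n] if n < len(numerator) else 0
        prev1 = c[n - 1] if n >= 1 else 0
        prev2 = c[n - 2] if n >= 2 else 0
        c.append(p_n + 3 * prev1 - prev2)
    return c

a_seq, b_seq = by_system(10)
a_from_gf = series_coeffs([1, -1], 10)   # A(x) = (1 - x)/(1 - 3x + x^2)
b_from_gf = series_coeffs([0, 1], 10)    # B(x) = x/(1 - 3x + x^2)
```

The first few terms are 1, 2, 5, 13, 34 for $a_n$ and 0, 1, 3, 8, 21 for $b_n$; interleaved they read as consecutive Fibonacci numbers.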
Exercises
3.1 Let 0.9,9,9... ....9. be the decimal expansion of the number L353:
Show that $\beta_1 = 5$, $\beta_2 = 0$ and for $n \ge 3$, $\beta_n$ depends only on the
congruence class of n modulo 3.
3.2 Define a recurrence relation by $x_n = x_{n-1} + \beta_n$ with $x_0 = 0$ where
the $\beta$'s are as in the last exercise. Prove that the solution of this
recurrence relation is the same as that of the recurrence relation
$x_n - x_{n-3} = 10$ for $n \ge 5$ with the initial conditions $x_2 = 5$, $x_3 = 10$
and $x_4 = 13$. Hence obtain a closed form formula for $x_n$.
3.3 Prove that a real number is rational if and only if its decimal
expansion either terminates or repeats periodically after some
stage. (From the decimal expansion of any real number we get a
recurrence relation. If the number is rational then this recurrence
relation can be solved by generalising the method in the last exercise.
But in the case of an irrational number like $\pi$, there is no known
way of solving it as we remarked in the text.)
3.4 Recall from Exercise (1.3.9) that the numbers $b_n$ given by (3) are
called Fibonacci numbers and are more commonly denoted by $F_n$.
Prove that $\lim_{n\to\infty} \dfrac{F_n}{F_{n-1}} = \dfrac{1 + \sqrt5}{2}$. What is
wrong if we try to prove this as follows? Let
$L = \lim_{n\to\infty} \dfrac{F_n}{F_{n-1}}$. Dividing the recurrence relation
$F_n = F_{n-1} + F_{n-2}$ throughout by $F_{n-1}$ and letting $n \to \infty$, we get
$L = 1 + \dfrac1L$ or $L^2 - L - 1 = 0$, solving which
$L = \dfrac{1 + \sqrt5}{2}$, the other root being negative.
3.5 Suppose ABCD is a rectangle with the property that when a square
on its shorter side is removed from it, the remaining rectangle is
similar to the original rectangle (see Figure 7.5). Prove that the
ratio of the length to the width of ABCD is $\dfrac{1 + \sqrt5}{2}$. Prove that if
we go on removing squares on the shorter sides we keep on getting
rectangles of the same shape. (For this reason, the ratio
$\dfrac{1 + \sqrt5}{2}$ is called the golden ratio. It is believed that among all
rectangles which are not squares, the shape of a rectangle with the golden
ratio is most pleasing to the eye! A possible reason might be that
according to the last exercise, a rectangle whose sides are the
Fibonacci numbers $F_n$ and $F_{n-1}$ for a large n is very close to a
rectangle with a golden ratio and, as remarked in Chapter 1,
Section 2, Fibonacci numbers arise in nature in biological growth.
Maybe we have an innate fascination for anything related to
Fibonacci numbers!)
Figure 7.5: The Golden Ratio
3.6 Using only the recurrence relation (2) for the Fibonacci numbers
(and not the formula (3)), show by induction that $|F_n| \le 2^n$ for
all n.
*3.7 Let f, g, h be continuous real-valued functions defined on an open
interval (0, R), where R > 0. Suppose $g(x) \ne h(x)$ for all $x \in (0, R)$
and that for every $x \in (0, R)$ either $f(x) = g(x)$ or $f(x) = h(x)$.
Prove that either $f(x) = g(x)$ for all $x \in (0, R)$ or $f(x) = h(x)$ for
all $x \in (0, R)$. [Hint: Let $A = \{x \in (0, R) : f(x) = g(x)\}$ and
$B = \{x \in (0, R) : f(x) = h(x)\}$. Note that A, B are closed subsets of
(0, R). Use connectedness of (0, R).]
3.8 Starting from the power series expansion of Lu. work
2:
backwards to justify that (6) is indeed the solution of (5).
3.9 Solve the following recurrence relations using O.G.F.'s:
(i) a..+n._,=nforn22withau=1,41,:—l.
.. l . .
(n) a. = 2—,; a...I for 11> 1 With a, = 3 (this can also be solved by
inspection).
(iii) a. = 3" — 2"”o - 2""a1—... — 2M... — .H. for n 9 l, with
(10 = l.
3.10 Solve $a_n = na_{n-1} + 3^n$ with $a_0 = 0$ using exponential generating
functions.
3.11 Solve $a_n = 5a_{n-1}$, $a_0 = 2$ by three methods: by inspection, by
ordinary generating functions and by exponential generating functions.
3.12 Obtain the first four terms in the series solution of the following
differential equations at the point 0.
(i) $y'' + xy' + y = 0$, $y(0) = 2$, $y'(0) = 1$
(ii) $y'' + e^xy' + y = 0$, $y(0) = 1$, $y'(0) = 2$
3.13 Suppose $\alpha$ is a multiple characteristic root of (20). Prove that
$a_n = n\alpha^n$ is a solution of (20). [Hint: For each fixed $n \ge r$, note
that $\alpha$ is a multiple root of $C_rx^n + C_{r-1}x^{n-1} + \cdots + C_0x^{n-r}$.
Apply Exercise (6.2.23).]
3.14 Suppose $\alpha$ is a characteristic root of multiplicity k. Prove that
$a_n = \alpha^n$, $a_n = n\alpha^n$, $a_n = n^2\alpha^n$, ..., $a_n = n^{k-1}\alpha^n$ are all
solutions of (20). Prove further that the vectors $\{\alpha^n\}_{n=0}^{\infty}$,
$\{n\alpha^n\}_{n=0}^{\infty}$, ..., $\{n^{k-1}\alpha^n\}_{n=0}^{\infty}$ are linearly
independent in the subspace K of V. [Hint: Note that $\alpha \ne 0$. Take
suitable truncations of these vectors and a Vandermonde determinant.]
3.15 Generalise Theorem (3.8) to cover the case where some of the
characteristic roots are multiple roots.
3.16 (a) Let n, k be positive integers and $A = (a_{ps}) = (x_{ps} + iy_{ps})$ be
a $2k \times n$ matrix over $\mathbb{C}$ in which the entries in even numbered
rows are the complex conjugates of the corresponding entries
in the preceding rows, i.e., $a_{2p,s} = \overline{a_{2p-1,s}}$ for all
$p = 1, \ldots, k$; $s = 1, \ldots, n$. Let $B = (b_{ps})$ be the $2k \times n$ matrix
over $\mathbb{R}$ obtained by taking the real and the imaginary parts of the
entries in A; specifically for $p = 1, \ldots, k$ and $s = 1, \ldots, n$,
$b_{2p-1,s} = x_{2p-1,s}$ and $b_{2p,s} = y_{2p-1,s}$. Prove that A and B have the
same ranks.
(Hint: Using the formula
(1H .2 MEI)
construct a non-singular matrix C such that B = CA.)
(b) Suppose the coefficients $C_0, \ldots, C_r$ in (20) are real and $re^{i\theta}$ is a
complex characteristic root of multiplicity k. Prove that the
vectors $\{r^n\cos n\theta\}_{n=0}^{\infty}$, $\{r^n\sin n\theta\}_{n=0}^{\infty}$,
$\{nr^n\cos n\theta\}_{n=0}^{\infty}$, $\{nr^n\sin n\theta\}_{n=0}^{\infty}$, ...,
$\{n^{k-1}r^n\cos n\theta\}_{n=0}^{\infty}$, $\{n^{k-1}r^n\sin n\theta\}_{n=0}^{\infty}$ are in K
and are linearly independent over $\mathbb{R}$. (Hint: First work over
$\mathbb{C}$ and then use (a).)
3.17 Generalise Theorem (3.9) to cover the case of multiple complex
roots and thereby give a complete answer to finding the general
solution of (20) in all possible cases with real coefficients.
3.18 Let $\beta$ be a non-zero real (or complex) number and consider the
recurrence relation
    $C_ra_n + C_{r-1}a_{n-1} + \cdots + C_1a_{n-r+1} + C_0a_{n-r} = \beta^n$  $(n \ge r)$    (40)
(which is, of course, a special case of (19)). Prove that any solution
of (40) is also a solution of
    $C_ra_n + (C_{r-1} - \beta C_r)a_{n-1} + \cdots + (C_0 - \beta C_1)a_{n-r} - \beta C_0a_{n-r-1} = 0$  $(n \ge r + 1)$    (41)
which is a linear homogeneous recurrence relation of order r + 1.
3.19 Continuing the notation of Theorem (3.4), let K, K' and L be the
solution spaces of (20), (40) and (41) respectively. Prove that L
contains both K and K'. Further suppose $a \in L$ and $a \notin K$. Then
show that there exists a unique $\lambda \in \mathbb{R}$ (or $\mathbb{C}$) such that
$\lambda a \in K'$. [Hint: Note that K and L are subspaces of V of dimensions r and
r + 1 respectively and that K' is a coset of K. This exercise shows
how we can get a particular solution of (40) which is not a solution
of (20).]
3.20 Using the last exercise and Exercise (3.15), show that if $\beta$ is not a
characteristic root of (40), then there exists a unique $\lambda \ne 0$ such
that $a_n = \lambda\beta^n$ is a particular solution of (40). [Hint: Note that the
characteristic polynomial of (41) is simply $(x - \beta)$ times the
characteristic polynomial of (40).]
3.21 With the same reasoning, show that if $\beta$ is a characteristic root of
(40) of multiplicity k ($k \ge 1$), then there exists a unique $\lambda \ne 0$
such that $a_n = \lambda n^k\beta^n$ is a solution of (40). (These two exercises show
that the first two lines in the table in Figure 7.4 are correct and
also take care of the case of multiple characteristic roots. Direct
computational proofs tend to be clumsy.)
3.22 Suppose $\{b_n\}_{n=0}^{\infty}$ is a sequence which satisfies some homogeneous
linear recurrence relation of order p, say,
    $D_pb_n + D_{p-1}b_{n-1} + \cdots + D_1b_{n-p+1} + D_0b_{n-p} = 0$  $(n \ge p)$    (42)
Suppose further that $\{a_n\}_{n=0}^{\infty}$ is a sequence which satisfies a
(non-homogeneous) linear recurrence relation of order r, say
    $C_ra_n + C_{r-1}a_{n-1} + \cdots + C_1a_{n-r+1} + C_0a_{n-r} = b_n$  $(n \ge r)$.    (43)
(This is the same as (19) if we write $f(n) = b_n$.) Denote by D(x)
and C(x) respectively the characteristic polynomials of (42) and
(43). By direct calculation, show that $\{a_n\}_{n=0}^{\infty}$ satisfies a
homogeneous linear recurrence relation of order r + p whose
characteristic polynomial is D(x)C(x). (It is a little messy to write down
this recurrence relation. But as we shall see, we are more interested
in its characteristic polynomial than its individual coefficients.
Because of this exercise we can convert a solution of a non-homogeneous
relation to a solution of a homogeneous relation if
we happen to know a homogeneous relation satisfied by the right
hand side. Note that Exercise (3.18) is a special case of this
exercise with $D(x) = x - \beta$.)
3.23 Let k be a non-negative integer and let $b_n = n^k$, with the
understanding that $0^0 = 1$. Prove that $\{b_n\}_{n=0}^{\infty}$ satisfies a
homogeneous linear recurrence relation whose characteristic polynomial is
$(x - 1)^{k+1}$. [Hint: Apply Exercise (3.14) with $\alpha = 1$. A direct proof is
messy.]
3.24 With k as above, consider the linear recurrence relation of order r,
    $C_ra_n + C_{r-1}a_{n-1} + \cdots + C_1a_{n-r+1} + C_0a_{n-r} = n^k$  $(n \ge r)$.    (44)
Consider the homogeneous linear recurrence relation of order
r + k + 1,
    $E_{r+k+1}a_n + E_{r+k}a_{n-1} + \cdots + E_1a_{n-r-k} + E_0a_{n-r-k-1} = 0$    (45)
whose characteristic polynomial, say E(x), is the product $(x - 1)^{k+1}C(x)$
where C(x) is the characteristic polynomial of (44). Let K, K'
and L be the solution spaces of (20), (44) and (45) respectively.
Prove that L contains both K and its coset K'. [Hint: Apply
Exercise (3.22) with $b_n = 0$ and then again with $b_n = n^k$, D(x)
being equal to $(x - 1)^{k+1}$ in both cases.]
3.25 In (44), suppose 1 is not a characteristic root. Prove that there
exist constants $A_0, \ldots, A_k$ such that
$a_n = A_0 + A_1n + A_2n^2 + \cdots + A_kn^k$
is a particular solution of (44). [Hint: Use the last exercise. 1 is a
characteristic root of (45) with multiplicity k + 1. Apply
Exercise (3.15) to (45). For i = 0, 1, 2, ..., let $v_i$ be the vector
$(0^i, 1^i, 2^i, 3^i, \ldots, n^i, \ldots)$ in V. Prove that L is spanned by
$K \cup \{v_0, v_1, \ldots, v_k\}$.]
3.26 In (44), suppose 1 is a characteristic root of multiplicity m ($m \ge 1$).
Prove that there exist constants $A_m, A_{m+1}, \ldots, A_{m+k}$ such that
$a_n = A_mn^m + A_{m+1}n^{m+1} + \cdots + A_{m+k}n^{m+k}$ is a particular solution
of (44). [Hint: Same as the last exercise, except that this time 1 is
a characteristic root of (45) with multiplicity m + k + 1. The
vectors $v_0, v_1, \ldots, v_{m-1}$ are already in K and so L is spanned by
$K \cup \{v_m, v_{m+1}, \ldots, v_{m+k}\}$. With these exercises we have now
completely established the validity of the table in Figure 7.4.]
3.27 Solve the following recurrence relations:
(i) a” -— 5a,... + 6a,,_, = 2" + 1:
(ii) a, — 30..-, + 3a.,_, — a... = n“ with a, = 1,11, = — l,a,=‘/2
(iii) 6a,. + 7an—l + 60..., — a,_. = 0
3.28 Show that if the characteristic roots $\alpha_1, \ldots, \alpha_r$ of (19) are distinct
then for any $k \ge 0$ the following system in the unknowns $c_1, \ldots, c_r$
has a unique solution:
    $c_1\alpha_1^k + c_2\alpha_2^k + \cdots + c_r\alpha_r^k = f(k)$
    $c_1\alpha_1^{k+1} + c_2\alpha_2^{k+1} + \cdots + c_r\alpha_r^{k+1} = f(k + 1)$
    $\vdots$    (46)
    $c_1\alpha_1^{k+r-1} + c_2\alpha_2^{k+r-1} + \cdots + c_r\alpha_r^{k+r-1} = f(k + r - 1)$
Hence show that the values of any r consecutive $a_n$'s determine a
particular solution of (19).
[Hint: Vandermonde determinant again.]
3.29 Extend the result of the last exercise to the general case.
3.30 Solve the recurrence relation with two indices:
    $a_{m,n} = \sum_{i=0}^{n} \binom{n}{i} a_{m-1,i}$  $(m \ge 1, n \ge 0)$    (47)
with the boundary conditions $a_{0,0} = 1$ and $a_{0,n} = 0$ for $n > 0$, by
considering for each m, the exponential generating function of the
sequence $\{a_{m,n}\}_{n=0}^{\infty}$.
3.31 Solve the system of simultaneous recurrence relations:
Il-l n
(n + 1m + (—1)»]— Pram" - .Eob’b'”
a.=
2 (n 2 l)
and
Myra-
I _ n n
‘2‘ albn-l (n 2 I)
with
ao=l,a1=—1,bo- 1.
3.32 Let $E(x) = \sum_{n=0}^{\infty} a_n\dfrac{x^n}{n!}$ be the E.G.F. of a sequence
$\{a_n\}$. Prove that $\{a_n\}$ is a solution of (20) if and only if $y = E(x)$ is a
solution of
    $C_r\dfrac{d^ry}{dx^r} + C_{r-1}\dfrac{d^{r-1}y}{dx^{r-1}} + \cdots + C_1\dfrac{dy}{dx} + C_0y = 0$
which is a homogeneous linear differential equation with constant
coefficients. Hence give an alternate proof of Theorem (3.8).
Notes and Guide to Literature
The results in this section are classic. For more on difference equations,
see Goldberg [1] or Levy and Lessman [1]. Historically, the Fibonacci relation
(Equation (5)) was the first recurrence relation, published in 1202. It
appears in contexts so diverse, ranging from phyllotaxy (arrangement of
leaves on a branch) to searching algorithms, that its solution by De Moivre
(which is essentially the same as given here and appeared in the eighteenth
century) is probably the strongest testimony of the 'practical' utility of
the power series. For more on the golden ratio as well as the real life
occurrences of the Fibonacci numbers, see Coxeter [1].
There are numerous treatises on differential equations. See, for example,
Diwan & Agashe [1] or Piaggio [1]. As an example of an article illustrating
the comments about suspecting the solution to a problem, see Rudin [3].
4. Applications of Recurrence Relations
In the last section we studied how to solve recurrence relations and, as
applications, solved three problems, namely the Regions Problem, the
Shares Problem and the Vendor Problem. In this section we present a
greater variety of applications of recurrence relations. The principle behind
these applications is similar to that of mathematical induction. We take a
problem with an integer parameter n. We solve it by inspection for some
low values of n. Then we relate the solution of the case where n has value
m to the solutions of the cases where n has values lower than m. But there
is an important difference. In induction, we have to guess the answer
beforehand and we can only verify it. This is indeed a drawback because,
as in the Shares Problem, sometimes the answer to the general case is far
from easy to guess. The method of recurrence relations serves to patch up
this gap. That is why it is a powerful tool in enumeration problems,
comparable to differential equations in continuous mathematics.
As with differential equations, there are two major steps in applying
the technique of recurrence relations to a combinatorial problem: first, to
write down a recurrence relation for the quantity we are looking for and
secondly, to solve it. The success of this method depends on how easy it is
to carry out these two steps in a particular problem. The first step naturally
varies considerably from problem to problem and in some problems it
cannot be carried out satisfactorily. For the second step too, there is no
golden method that will work to solve every possible recurrence relation.
Although we discussed the method of generating functions in the last section,
as remarked in Section 2 its success hinges upon two factors: (i) identifying
the generating function in a closed form and (ii) expanding it. As we shall
see, in some problems, even if we can write a recurrence relation, there is
no way to solve it and so the answer has to be left in a summation form or
as the coefficient of a suitable power of x in some function of x. It should
also be noted that in some problems even though the method of recurrence
relations works, there may be other methods which are more elegant. We
already saw an example of this in the Regions Problem and, more
strikingly, in the Vendor Problem.
Despite these limitations, the method of recurrence relations is an
important one. First of all, whenever it works, it usually works very
systematically. Other methods may be more elegant. But they are also often
tricky and uncertain, involving ad-hoc reasonings. We recall our earlier
analogy that while it may be more artistic to do a problem by the methods
of Euclid, doing it with coordinates is often a surer way. Secondly, in some
problems, recurrence relations are the only means. An example of this was
the Shares Problem.
We first give a few applications where we already had obtained the
answer by other methods. For example, in Theorem (3.2.8) we showed
that the number of k-ary sequences of length n in which a particular
symbol, say 1, appears an even number of times is
$\dfrac{k^n + (k - 2)^n}{2}$. In Section 2 we rederived this using exponential
enumerators. With recurrence relations, the problem almost reduces to a
triviality. Let $a_n$ be the desired number. Then $a_0 = 1$. For $n \ge 1$,
$a_n = |A_n|$, where $A_n$ is the set of all k-ary sequences of length n in which
1 occurs an even number of times. Clearly $A_n = C_n \cup B_n$ where $C_n$
consists of those sequences in $A_n$ which end with 1 and $B_n = A_n - C_n$. If we
remove the last 1 appearing in a sequence in $C_n$, we get a sequence of
length n - 1 in which 1 appears an odd number of times. There are
$k^{n-1} - a_{n-1}$ such sequences ($k^{n-1}$ being the total number of k-ary
sequences of length n - 1). Conversely every such sequence of length n - 1
gives a sequence in $C_n$ if we append a 1 at its end. It follows that
$|C_n| = k^{n-1} - a_{n-1}$. As for $|B_n|$, every sequence in $B_n$ must end in one
of the remaining k - 1 digits other than 1. If we take out the last digit, we
get a sequence of length n - 1, in which 1 appears an even number of times.
By the definition of $a_{n-1}$, there are $a_{n-1}$ such sequences. Since each one
of them comes from k - 1 sequences in $B_n$, it follows that
$|B_n| = (k - 1)a_{n-1}$. So we have,
    $a_n = |A_n| = |C_n| + |B_n| = (k^{n-1} - a_{n-1}) + (k - 1)a_{n-1}$.
Or equivalently,
    $a_n - (k - 2)a_{n-1} = k^{n-1}$  $(n \ge 1)$    (1)
This is a linear non-homogeneous recurrence relation of order 1. (k - 2)
is the only characteristic root. So $a_n = c(k - 2)^n$ is the general solution of
the A.H.E. by Theorem (3.8). For a particular solution, we try $a_n = \lambda k^n$
for a constant $\lambda$. Substitution in (1) gives
$\lambda[k^n - (k - 2)k^{n-1}] = k^{n-1}$ or $\lambda = \frac12$. So the general
solution of (1) is $a_n = c(k - 2)^n + \frac12k^n$. The constant c is determined
by the initial condition $a_0 = 1$, giving $c = \frac12$. It follows that for
$n \ge 0$, $a_n = \dfrac{(k - 2)^n + k^n}{2}$, the same answer as before.
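For small k and n the count can be confirmed by brute force. The following Python sketch is illustrative and not from the original text. It enumerates all k-ary sequences with the standard `itertools.product`, iterates the recurrence (1), and evaluates the closed form; the three must agree. It assumes $k \ge 2$ so that the symbol 1 exists.

```python
from itertools import product

def brute_force(k, n):
    # Count sequences over {0, ..., k-1} of length n with an even number of 1's.
    return sum(1 for s in product(range(k), repeat=n) if s.count(1) % 2 == 0)

def by_recurrence(k, n):
    # Iterate (1): a_m = (k - 2) a_{m-1} + k^(m-1), starting from a_0 = 1.
    a = 1
    for m in range(1, n + 1):
        a = (k - 2) * a + k ** (m - 1)
    return a

def closed_form(k, n):
    # (k^n + (k-2)^n) / 2; the numerator is always even.
    return (k ** n + (k - 2) ** n) // 2

cases = [(k, n) for k in range(2, 5) for n in range(0, 7)]
```

Note that for k = 2 the closed form uses $0^0 = 1$ at n = 0, which matches Python's integer convention.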
As another example of this kind, which also serves to illustrate how it
is sometimes more convenient to consider a system of recurrence relations
rather than a single recurrence relation, we count the number of those
k-ary sequences of length n in which both 0 and 1 appear an even number of
times each (cf. Exercises (3.2.17)(b) and (3.2.18)).
4.1 Problem: For each n, find the number of k-ary sequences of length
n in which two symbols, say 0 and 1, appear an even number of times
each.
Solution: For simplicity denote the symbols by 0, 1, ..., k - 1. Denote the
desired number by $a_n$. Then $a_0 = 1$. However, it is not easy to express $a_n$
directly in terms of $a_{n-1}$. So we introduce other auxiliary sequences. Let
$b_n$ be the number of those k-ary sequences of length n in which 0 appears
an even number of times and 1 an odd number of times, $c_n$ the number of
those in which 0 appears an odd number of times and 1 an even number
of times and $d_n$ the number of those in which both 0 and 1 appear an odd
number of times each. There is considerable inter-relationship among $\{a_n\}$,
$\{b_n\}$, $\{c_n\}$ and $\{d_n\}$. For example, by symmetry $b_n$ obviously equals $c_n$.
Also $a_n + b_n + c_n + d_n = k^n$, this being the total number of k-ary sequences
of length n. Thus, apparently it is wasteful to introduce $c_n$ and $d_n$ when
they could have been expressed in terms of $a_n$ and $b_n$ as
    $c_n = b_n$  $(n \ge 0)$    (2)
    $d_n = k^n - a_n - 2b_n$  $(n \ge 0)$    (3)
But there is a reason for introducing all four sequences. Let $A_n$, $B_n$, $C_n$, $D_n$ denote respectively the sets of sequences whose cardinalities are $a_n$, $b_n$, $c_n$, $d_n$. Thus $A_n = \{\bar{x} = (x_1, \ldots, x_n) : 0 \le x_i \le k-1$ for all $i = 1, \ldots, n$, and $x_i = 0$ for an even number of $i$'s and $x_i = 1$ for an even number of $i$'s$\}$. Similarly define $B_n$, $C_n$, $D_n$. Now let $\bar{x} = (x_1, \ldots, x_n) \in A_n$. Let $\bar{y} = (x_1, \ldots, x_{n-1})$ be the truncation of $\bar{x}$ obtained by removing the last digit, $x_n$. Then $\bar{y}$ belongs to $C_{n-1}$, $B_{n-1}$ or $A_{n-1}$, according as $x_n = 0$, $x_n = 1$ or $x_n = 2, \ldots, k-1$. Conversely, starting from $\bar{y}$ in $C_{n-1} \cup B_{n-1} \cup A_{n-1}$ we can get $\bar{x} \in A_n$ by appending a suitable $x_n$, depending upon whether $\bar{y} \in C_{n-1}$, $\bar{y} \in B_{n-1}$ or $\bar{y} \in A_{n-1}$, in the last case there being $k-2$ possible choices for $x_n$. It follows that
$$a_n = c_{n-1} + b_{n-1} + (k-2)a_{n-1} \qquad (n \ge 1) \tag{4}$$
By a similar reasoning, we get
$$b_n = d_{n-1} + a_{n-1} + (k-2)b_{n-1} \qquad (n \ge 1) \tag{5}$$
$$c_n = a_{n-1} + d_{n-1} + (k-2)c_{n-1} \qquad (n \ge 1) \tag{6}$$
and
$$d_n = b_{n-1} + c_{n-1} + (k-2)d_{n-1} \qquad (n \ge 1) \tag{7}$$
It is now time to use (2) and (3) to get from (4) and (5),
$$a_n = (k-2)a_{n-1} + 2b_{n-1} \qquad (n \ge 1) \tag{8}$$
and
$$b_n = (k-4)b_{n-1} + k^{n-1} \qquad (n \ge 1) \tag{9}$$
[Note that (6) and (7) are not used. This is not surprising since, in the presence of (2) and (3), they convey no new information.]
We could solve this system using generating functions as was done in Problem (3.11). But the resulting expressions for $A(x)$ and $B(x)$ come out rather complicated. Fortunately, in the present case there is another way out. Note that (9) is a linear recurrence relation of order 1 and does not involve $a_n$. Solving it analogously to (1), with the initial condition $b_0 = 0$, we get
$$b_n = \frac{k^n - (k-4)^n}{4} \qquad (n \ge 0) \tag{10}$$
With this, (8) reduces to
$$a_n - (k-2)a_{n-1} = \frac{k^{n-1} - (k-4)^{n-1}}{2} \qquad (n \ge 1) \tag{11}$$
Solving this, with the initial condition $a_0 = 1$, we finally get
$$a_n = \frac{1}{2}(k-2)^n + \frac{k^n}{4} + \frac{(k-4)^n}{4} \qquad (n \ge 0) \tag{12}$$
as the answer to the problem. ∎
Notice that although our interest was in finding $a_n$, along the way we also found $b_n$ (which answers Exercise (3.2.18)). In fact, we had to find $b_n$. Without it, it would have been difficult to find $a_n$ directly. In other words, it was easier to find both $a_n$ and $b_n$ than to find $a_n$ alone!
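As a sanity check on the closed form (12), one may compare it with a direct enumeration for small values of k and n. The following Python sketch is only an illustration; the function names `count_even_even` and `formula_12` are ours and do not appear in the text.

```python
from itertools import product

def count_even_even(k, n):
    """Brute force: k-ary sequences of length n in which
    0 and 1 each appear an even number of times."""
    return sum(
        1
        for seq in product(range(k), repeat=n)
        if seq.count(0) % 2 == 0 and seq.count(1) % 2 == 0
    )

def formula_12(k, n):
    """Closed form (12), written over the common denominator 4."""
    total = 2 * (k - 2) ** n + k ** n + (k - 4) ** n
    return total // 4

# Compare the two counts for a few small alphabets and lengths.
for k in range(2, 6):
    for n in range(0, 7):
        assert count_even_even(k, n) == formula_12(k, n)
```

The enumeration takes $k^n$ steps, so it is useful only as a check; the closed form is what one would use in practice.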
As an illustration of the use of recurrence relations with two indices, we count, once more, the number of n-selections of m kinds of objects, with unlimited repetitions allowed. Denote this number by $a_{m,n}$ for $m \ge 0$, $n \ge 0$. A recurrence relation for $a_{m,n}$ is obtained as follows. Suppose $m \ge 1$, $n \ge 1$. Pick one kind of objects and classify the selections of n objects into two types, those that contain at least one object of this chosen kind and those that do not. In the first case, the remaining $n-1$ objects can be chosen from the m kinds without restriction. In the second case, all the n objects must be from the remaining $m-1$ kinds. Consequently,
$$a_{m,n} = a_{m,n-1} + a_{m-1,n} \qquad (m \ge 1,\ n \ge 1) \tag{12a}$$
The boundary conditions are $a_{m,0} = 1$ for $m \ge 0$ and $a_{0,n} = 0$ for all $n > 0$. Let $F_m(x) = \sum_{n=0}^{\infty} a_{m,n}x^n$, $m = 0, 1, \ldots$. Then from (12a) we get, as usual, $F_m(x) - 1 = xF_m(x) + F_{m-1}(x) - 1$, which gives
$$F_m(x) = \frac{1}{1-x}F_{m-1}(x) \qquad (m \ge 1) \tag{13}$$
The boundary conditions imply $F_0(x) = 1$. Solving (13) as a recurrence relation for the sequence of functions $\{F_m(x)\}_{m=0}^{\infty}$, we get
$$F_m(x) = (1-x)^{-m} \qquad (m \ge 0) \tag{14}$$
By Exercise (1.9), $a_{m,n}$ = coefficient of $x^n$ in $(1-x)^{-m}$ = $\dbinom{m+n-1}{n}$. This is the same as Theorem (2.3.12), except for a different notation.
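The two-index recurrence, together with its boundary conditions, can also be evaluated directly and compared with the binomial coefficient. The sketch below is an illustration under our own naming (the function `a` mirrors the symbol $a_{m,n}$); it is not part of the original development.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def a(m, n):
    """Number of n-selections from m kinds, via the two-index recurrence."""
    if n == 0:
        return 1              # boundary: a_{m,0} = 1 for m >= 0
    if m == 0:
        return 0              # boundary: a_{0,n} = 0 for n > 0
    # either at least one object of the chosen kind is used, or none is
    return a(m, n - 1) + a(m - 1, n)

# For m >= 1 the value should agree with the binomial coefficient C(m+n-1, n).
for m in range(1, 8):
    for n in range(0, 8):
        assert a(m, n) == comb(m + n - 1, n)
```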
Recurrence relations can also be used to count $d_n$, the number of derangements of n symbols, which was counted earlier by the principle of inclusion and exclusion (Theorem (2.45)). Recall that a derangement of n objects is an n-permutation without a fixed point. (Unlike in Section 2, here by a permutation, we again mean one without repetitions. Equivalently a permutation of a set S is a bijection of S onto itself.) Denote the objects by $1, 2, \ldots, n$ and let $D_n$ be the set of all derangements of these objects. Each such derangement must take 1 to some $i$ for $2 \le i \le n$. For $i = 2, \ldots, n$, let $E_i$ be the set of those derangements in $D_n$ which map 1 to $i$. Evidently $|E_2| = |E_3| = \cdots = |E_n|$ and $D_n = \bigcup_{i=2}^{n} E_i$. Further $E_i \cap E_j = \emptyset$ for $i \ne j$. So $d_n = (n-1)|E_2|$. We now classify the derangements in $E_2$ into two types, those which take 2 to 1 and those which take 2 to some other object. In the case of a derangement of the first type, the symbols 1 and 2 are interchanged and the remaining $n-2$ symbols are permuted among themselves, without a fixed point. It follows that in $E_2$ there are $d_{n-2}$ derangements of the first type. As for derangements of the second type, every such derangement is a bijection $f: \{2, 3, \ldots, n\} \to \{1, 3, 4, \ldots, n\}$ with no fixed points, and in which $f(2) \ne 1$. If we call 1 as 2 then this is the same as a bijection $f: \{2, 3, \ldots, n\} \to \{2, 3, \ldots, n\}$ with no fixed point. There are $d_{n-1}$ bijections like this. Putting together, $|E_2| = d_{n-2} + d_{n-1}$ and hence, finally, from $d_n = (n-1)|E_2|$ we get
$$d_n = (n-1)(d_{n-2} + d_{n-1}) \qquad (n \ge 2) \tag{15}$$
It is a little complicated to solve this recurrence relation. So we rewrite it as
$$d_n - n\,d_{n-1} = -\bigl(d_{n-1} - (n-1)d_{n-2}\bigr) \qquad (n \ge 2) \tag{16}$$
or as
$$a_n = -a_{n-1} \qquad (n \ge 2) \tag{17}$$
where $a_n = d_n - n\,d_{n-1}$. The initial conditions are $d_0 = 1$, $d_1 = 0$ and hence $a_1 = -1$. If we define $a_0 = 1$, (17) is valid for all $n \ge 1$. It can be solved by inspection to give $a_n = (-1)^n$ for $n \ge 0$. So we get,
$$d_n = n\,d_{n-1} + (-1)^n \qquad (n \ge 1) \tag{18}$$
with the initial condition $d_0 = 1$. We already solved this recurrence relation in Problem (3.2) to give
$$d_n = n!\left(1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n\frac{1}{n!}\right)$$
Note that in this case the answer has to be left in the summation form.
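The agreement between (15), (18), the summation formula and a direct count can be checked mechanically for small n. The following Python sketch is offered only as an illustration; all four function names are our own.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def derangements_brute(n):
    """Count permutations of {0, ..., n-1} with no fixed point."""
    return sum(
        1
        for p in permutations(range(n))
        if all(p[i] != i for i in range(n))
    )

def derangements_15(n):
    """Second-order recurrence (15) with d_0 = 1, d_1 = 0."""
    d = [1, 0]
    for m in range(2, n + 1):
        d.append((m - 1) * (d[m - 2] + d[m - 1]))
    return d[n]

def derangements_18(n):
    """First-order recurrence (18) with d_0 = 1."""
    d = 1
    for m in range(1, n + 1):
        d = m * d + (-1) ** m
    return d

def derangements_sum(n):
    """Summation form n! * sum_{j=0}^{n} (-1)^j / j!, in exact arithmetic."""
    total = sum(Fraction((-1) ** j, factorial(j)) for j in range(n + 1))
    return int(factorial(n) * total)

for n in range(0, 8):
    expected = derangements_brute(n)
    assert derangements_15(n) == expected
    assert derangements_18(n) == expected
    assert derangements_sum(n) == expected
```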
The only non-trivial, non-linear recurrence relation we solved in the last section by the method of ordinary generating functions was (5) of the last section. It arose in connection with the Vendor Problem. There are many apparently unrelated problems in which essentially the same relation holds. Recall from Chapter 1, Section 2 that the essence of the Vendor Problem was to count $a_n$, the number of balanced arrangements of n pairs of parentheses. It turns out that when suitably interpreted, a balanced arrangement of parentheses corresponds to a certain way of doing something in a wide variety of contexts. Exercises (2.3.10) and (3.4.22) already provide two examples of this. Not surprisingly, then, the numbers given by (6) in the last section appear in the solution to a number of problems. Because of their frequent occurrences, these numbers are given a name. Specifically, the nth Catalan number is defined as $\dfrac{1}{n}\dbinom{2n-2}{n-1}$ for $n > 0$ (see Exercise (1.16)) and is often denoted by $C_n$. Here we present one more problem whose solution involves the Catalan numbers.
4.2 Problem: Find the number of ways to divide a convex polygon
with n sides into triangles by non-intersecting diagonals.
Solution: Let $b_n$ denote this number. By inspection we see that $b_3 = 1$, $b_4 = 2$ and $b_5 = 5$. $b_0$, $b_1$, $b_2$ have no natural meaning. A recurrence relation for $b_n$ can be obtained directly as follows: Denote the vertices of the polygon by $A_0, A_1, \ldots, A_{n-1}$ in a particular order. Now, in any triangulation of the polygon, $A_0A_{n-1}$ will form a side of a unique triangle. Let $A_r$ be the third vertex of this triangle (see Figure 7.6), $1 \le r \le n-2$. For a given r in this range, let us count the number of triangulations in which one of the triangles is $A_0A_rA_{n-1}$. For $2 \le r \le n-3$, any such triangulation amounts to further triangulating the two subpolygons, $A_0A_1 \ldots A_r$ and $A_rA_{r+1} \ldots A_{n-2}A_{n-1}$. These polygons have $r+1$ and $n-r$ sides respectively and since each can be triangulated independently of the other, it follows that for $2 \le r \le n-3$, the number of triangulations in which $A_0A_rA_{n-1}$ is one of the triangles is $b_{r+1} \times b_{n-r}$. For $r = 1$ the first subpolygon is degenerate and the second can be triangulated in $b_{n-1}$ ways. For the sake of uniformity of notation we set $b_2 = 1$ and write $b_{n-1}$ as $b_2b_{n-1}$. This is also the number of triangulations for the other degenerate case, namely $r = n-2$. We now have
$$b_n = \sum_{r=1}^{n-2} b_{r+1}b_{n-r} \qquad (n \ge 3) \tag{19}$$

[Figure 7.6: Triangulation of a Convex Polygon]

We set $b_0 = b_1 = 0$. Then the right hand side of (19) can also be written as $b_0b_{n+1} + b_1b_n + b_2b_{n-1} + \cdots + b_{n-1}b_2 + b_nb_1 + b_{n+1}b_0$ which is precisely the coefficient of $x^{n+1}$ in $[B(x)]^2$ where $B(x)$ is the O.G.F. of $\{b_n\}_{n=0}^{\infty}$. Since $b_0 = b_1 = 0$, the coefficients of $x^0$, $x$, $x^2$ and $x^3$ in $B(x)B(x)$ are all 0. So, multiplying (19) by $x^{n+1}$ and summing over $n \ge 3$ gives
$$x[B(x) - x^2] = B(x)B(x) \tag{20}$$
which yields $B(x) = \dfrac{x \pm \sqrt{x^2 - 4x^3}}{2}$. $b_1 = 0$ forces the choice of the negative sign for the radical, giving $B(x) = \dfrac{x - \sqrt{x^2 - 4x^3}}{2}$. This is the same as $xA(x)$ where $A(x)$ is given by (12) in the last section. Consequently
$$b_n = \frac{(2n-4)!}{(n-1)!\,(n-2)!} = \frac{1}{n-1}\binom{2n-4}{n-2} = C_{n-1}. \qquad ∎$$
Instead of working our way through the generating function for $\{b_n\}_{n=0}^{\infty}$, we could have directly related $b_n$ to some other problem. Suppose $A_1, \ldots, A_{n-1}$ are elements of a set on which some non-associative operation $\#$ has been defined. Then every triangulation of the polygon corresponds to one possible meaning of the expression $A_1 \# A_2 \# \cdots \# A_{n-1}$. Specifically, a triangulation in which $A_0A_rA_{n-1}$ is a triangle corresponds to an interpretation of the form $(A_1 \# \cdots \# A_r) \# (A_{r+1} \# \cdots \# A_{n-1})$ where $1 \le r \le n-2$. By Exercise (3.4.22), the number of all possible interpretations of $A_1 \# A_2 \# \cdots \# A_{n-1}$ is $\dfrac{1}{n-1}\dbinom{2n-4}{n-2}$.
In the solution above, $b_0$, $b_1$, $b_2$ had no natural meanings. We set $b_0 = b_1 = 0$ and $b_2 = 1$ so as to simplify our computations. With some other values, we would have ultimately gotten the same answer, but the calculations would have been a little more clumsy. This point should always be kept in mind. In many problems, the first few terms of a sequence are not defined by the conditions of the problem and we are free to choose them. With a judicious choice, the computations become easier.
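The recurrence (19), with the conventions $b_0 = b_1 = 0$ and $b_2 = 1$, is easy to iterate, and the resulting values can be compared with the Catalan numbers as defined above. The Python sketch below is an illustration only; the names `triangulations` and `catalan` are ours.

```python
from math import comb

def triangulations(n_max):
    """Return [b_0, ..., b_{n_max}] computed from recurrence (19)."""
    b = [0, 0, 1]                      # conventions: b_0 = b_1 = 0, b_2 = 1
    for n in range(3, n_max + 1):
        # r runs over 1, ..., n-2 (the third vertex A_r of the triangle on A_0 A_{n-1})
        b.append(sum(b[r + 1] * b[n - r] for r in range(1, n - 1)))
    return b

def catalan(n):
    """n-th Catalan number in the indexing used here: C_n = (1/n) * C(2n-2, n-1)."""
    return comb(2 * n - 2, n - 1) // n

b = triangulations(12)
for n in range(3, 13):
    assert b[n] == catalan(n - 1)      # b_n = C_{n-1}
```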
Counting problems, by far, constitute the vast majority of the kind of problems that are amenable to the method of recurrence relations. Occasionally, however, the recurrence relations can be made to do some other jobs such as proving identities. Suppose the identity to be proved is of the form $a_n = b_n$ where both $\{a_n\}$ and $\{b_n\}$ are sequences. Usually, we can establish the identity by direct verification for lower values of n. Let us suppose that we can identify some recurrence relation satisfied by one of the sides of the identity, say by $b_n$. If we can then show that the other side, $a_n$, also satisfies the same recurrence relation, then, in view of the uniqueness of solutions of recurrence relations (Theorem (3.1)) it would follow that $a_n = b_n$ for all n. (An analogous technique is also used in continuous mathematics. If $f$, $g$ are functions of a real variable x, then in order to show that $f(x) = g(x)$ for all x, it suffices to show that $f(x_0) = g(x_0)$ for some $x_0$ and $f'(x) = g'(x)$ for all x, or more generally that $f$ and $g$ are solutions of some first order differential equation, satisfying the same initial condition.)
By way of illustration, we prove the identity in Exercise (1.8).
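4.3 Proposition: If $n \ge 0$, then $\displaystyle\sum_{k=0}^{n} \binom{n+k}{k}\frac{1}{2^k} = 2^n$.
Proof: If $b_n = 2^n$, the right hand side satisfies the recurrence relation $b_n = 2b_{n-1}$. Let $a_n$ denote the left hand side. Clearly $a_0 = 1$. Now, for $n \ge 1$ we have, by the identity $\binom{m}{r} = \binom{m}{m-r}$ and Proposition (22.19),
$$\begin{aligned}
a_n &= \sum_{k=0}^{n} \binom{n+k}{k}\frac{1}{2^k} \\
&= \sum_{k=0}^{n} \binom{n+k-1}{k}\frac{1}{2^k} + \sum_{k=1}^{n} \binom{n+k-1}{k-1}\frac{1}{2^k} \\
&= \sum_{k=0}^{n-1} \binom{n-1+k}{k}\frac{1}{2^k} + \binom{2n-1}{n}\frac{1}{2^n} + \sum_{k=1}^{n} \binom{n+k-1}{k-1}\frac{1}{2^k} \\
&= a_{n-1} + \frac{1}{2}\left[\binom{2n-1}{n}\frac{1}{2^{n-1}} + \sum_{j=0}^{n-1} \binom{n+j}{j}\frac{1}{2^j}\right] \\
&= a_{n-1} + \frac{1}{2}a_n,
\end{aligned}$$
where the last step uses $\binom{2n-1}{n}\frac{1}{2^{n-1}} = \binom{2n}{n}\frac{1}{2^n}$, which holds because $\binom{2n}{n} = \binom{2n-1}{n} + \binom{2n-1}{n-1} = 2\binom{2n-1}{n}$. In other words, $a_n = 2a_{n-1}$ for all $n \ge 1$. Thus the left hand side satisfies the same recurrence relation as the right hand side. The initial conditions also match because $a_0 = b_0 = 1$. So by Theorem (3.1), $a_n = b_n$ for all n, proving the result. ∎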
The argument above is not substantially different from a proof by induction. But there is a subtle difference. If, instead of proving the identity, the problem was to evaluate the sum $\displaystyle\sum_{k=0}^{n} \binom{n+k}{k}\frac{1}{2^k}$, then induction would not have worked unless we were able to guess the answer somehow (which, in the present problem, is not very easy). But the argument above would still allow us to get a recurrence relation for the sum in question and solving it we would have got the answer.
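When the value of such a sum is not known in advance, tabulating it for small n is often the quickest way to guess both the recurrence and the answer. The sketch below does this in exact rational arithmetic; the function name `lhs` is our own and the code is meant only as an illustration.

```python
from fractions import Fraction
from math import comb

def lhs(n):
    """Left hand side of Proposition 4.3, computed exactly."""
    return sum(Fraction(comb(n + k, k), 2 ** k) for k in range(n + 1))

values = [lhs(n) for n in range(0, 12)]

# Each term is twice the previous one, and the initial value is 1 ...
assert values[0] == 1
assert all(values[n] == 2 * values[n - 1] for n in range(1, 12))
# ... so the sum equals 2^n, as the proposition asserts.
assert all(values[n] == 2 ** n for n in range(0, 12))
```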
All the applications of recurrence relations considered in this section so far involved problems whose solutions were already known to us by other methods. Let us now turn to new problems where recurrence relations are useful. One very celebrated such problem is popularly called the Tower of Hanoi problem. In this problem there are three pegs. On the first one, there is a stack of n discs of different radii (with holes at the centres) arranged in descending order starting from the largest disc at the bottom to the smallest disc at the top (see Figure 7.7). The problem is to move these discs from one peg to another, one at a time, and to eventually stack them on another peg in the same order. But the catch is that at no time can a larger disc be placed over a smaller one on the same peg. Obviously, if $n > 1$, the transfer would be impossible without a third peg. For $n = 3$ a solution can be given as follows. Call the pegs A, B, C. Number the discs as 1 to n from the top to the bottom and suppose originally they are stacked on A. For short, denote by $(m, X)$ the movement which consists of moving the mth disc (from wherever it is) to the top of the stack on the peg X. Thus $(3, B)$ means 'move the disc 3 to the top of the peg B'. It is easy to see that the sequence of moves $(1, B), (2, C), (1, C), (3, B), (1, A), (2, B), (1, B)$ results in transferring the stack of 3 discs from A to B. Note that this requires 7 moves.

[Figure 7.7: The Tower of Hanoi Problem]
The problem now is to find the minimum number of moves needed to move a stack of n discs from one peg to another. Denote this number by $a_n$. We have to express $a_n$ as a function of n, or, in other words, find a closed form expression for $a_n$. By its very nature, the problem is twofold. We have to give some method to effect the transfer in $a_n$ moves and we also have to show that it cannot be done in less than $a_n$ moves. Obviously $a_0 = 0$ and $a_1 = 1$.
A recurrence relation for $a_n$ can be built as follows. Let $n > 1$. Ignore for the moment the disc n at the bottom of the stack on peg A. By our assumption, the stack of the remaining $n-1$ discs can be moved to the peg C in $a_{n-1}$ moves, without even touching the disc n. Now move n to B (which will be empty at that time). Keep it at the bottom of B and without touching it, move the stack of $n-1$ discs from C to B. This can be done in $a_{n-1}$ moves. So we have shown that the stack of n discs can be moved from one peg to another in $a_{n-1} + 1 + a_{n-1}$, i.e., in $2a_{n-1} + 1$ moves. This proves that $a_n \le 2a_{n-1} + 1$. To show that equality holds, we must prove that any method of transferring the stack of n discs would require at least $2a_{n-1} + 1$ moves. In any such method the disc n must move at least once. Let $b_{n-1}$ denote the number of moves (in that method) before the first move of the disc n and let $c_{n-1}$ be the number of moves after the last time disc n is moved. Then the total number of moves in the method is at least $b_{n-1} + 1 + c_{n-1}$. Now, the disc n can only be placed on an empty peg. This means, before it was first moved, the entire stack of $n-1$ discs must have been moved from one peg to another at least once (without even touching the disc n). Since $a_{n-1}$ is the minimum number of moves to do this, we have $b_{n-1} \ge a_{n-1}$. Similarly, after the last time the disc n is moved, the same stack must have been transferred at least once; this requires at least $a_{n-1}$ moves and so $c_{n-1} \ge a_{n-1}$. Putting it together we see that any method of transferring the entire stack will need at least $2a_{n-1} + 1$ moves, i.e. $a_n \ge 2a_{n-1} + 1$. Since the other way inequality holds already, we get
$$a_n = 2a_{n-1} + 1 \qquad (n \ge 2) \tag{21}$$
with initial conditions $a_0 = 0$, $a_1 = 1$.
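Both halves of the argument can be illustrated computationally. In the Python sketch below (our own illustration; the names `hanoi_moves` and `min_moves_bfs` are not from the text), the first function carries out the recursive method described above and lists the moves in the $(m, X)$ notation, while the second finds the true minimum for small n by a breadth-first search over all legal positions.

```python
from collections import deque

def hanoi_moves(n, source="A", target="B", spare="C"):
    """Moves (disc, peg) of the recursive method: n-1 discs to the spare peg,
    disc n to the target, then the n-1 discs onto the target."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)
        + [(n, target)]
        + hanoi_moves(n - 1, spare, target, source)
    )

def min_moves_bfs(n):
    """Minimum number of legal moves from peg 0 to peg 1, by exhaustive search.
    state[d] is the peg holding disc d+1 (disc 1 is the smallest)."""
    start, goal = (0,) * n, (1,) * n
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return dist[state]
        for peg_from in range(3):
            on_from = [d for d in range(n) if state[d] == peg_from]
            if not on_from:
                continue
            top = min(on_from)                      # smallest disc is on top
            for peg_to in range(3):
                if peg_to == peg_from:
                    continue
                on_to = [d for d in range(n) if state[d] == peg_to]
                if on_to and min(on_to) < top:      # would cover a smaller disc
                    continue
                nxt = state[:top] + (peg_to,) + state[top + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[state] + 1
                    queue.append(nxt)
    return None

for n in range(0, 7):
    assert len(hanoi_moves(n)) == 2 ** n - 1 == min_moves_bfs(n)
```

For $n = 3$ the call `hanoi_moves(3)` reproduces the seven moves listed earlier for transferring the stack from A to B.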
Solving (21) is a routine matter which we leave as an exercise. We then get $a_n = 2^n - 1$ for all $n \ge 0$. This number grows very rapidly with n. So even for a small sized stack of discs, it will take fairly long to transfer it to another peg. The story goes that one such stack is located at a temple in Hanoi and the priests there are constantly moving its discs so as to transfer the stack to another peg. The world is supposed to come to an end when they will be through! If true, there is nothing to worry about because assuming it takes one second to move one disc, a stack with as few as 55 discs would take more than a billion years. And if that is too short, we just have to add one more disc to double the life of the world!
The derivation of the recurrence relation in the Tower of Hanoi Problem was relatively easy because the nature of the problem was such that to solve it for the case of n discs we had to solve the 'sub-problem' for $n-1$ discs and once we did it, the solution to the original problem was very easy to express in terms of the solution to the subproblem. However, things are not always so easy, as we saw in Problem (4.1). The following problem is another illustration of this sort of a situation.
4.4 Problem: Find the number of ways to tile a $3 \times n$ rectangle by dominoes (i.e. by rectangles of size $2 \times 1$).
Solution: When n is odd, $3 \times n$ is odd and so there is no way to completely cover a $3 \times n$ rectangle with $2 \times 1$ rectangles. Let us therefore suppose n is even, say $n = 2m$. Let $a_m$ denote the number of ways to cover a $3 \times 2m$ rectangle by dominoes. We see by inspection that $a_0 = 1$ and $a_1 = 3$. Now suppose $m > 1$. Let us try to express $a_m$ in terms of $a_{m-1}$. Every tiling of a $3 \times (2m-2)$ rectangle can be readily extended to a tiling of a $3 \times 2m$ rectangle by covering the remaining $3 \times 2$ rectangle in any of the 3 possible ways. This gives $3a_{m-1}$ possible tilings of the $3 \times 2m$ rectangle. But the trouble is that not every tiling of a $3 \times 2m$ rectangle has to be an extension of some tiling of a $3 \times (2m-2)$ rectangle. Let us count those that are not. It is easy to see that every such covering must be either of type A or of type B as shown in Figure 7.8, where squares covered by the same domino are joined by a line segment. By symmetry there are as many coverings of type A as there are of type B. So it suffices to count coverings of type A. In Figure 7.8 (A), consider the figure with vertices $A_0$, $A_{m-1}$, $R$, $P$, $Q$, $B_{m-2}$ and $B_0$. It is a $3 \times (2m-4)$ rectangle to which a $2 \times 1$ rectangle has been attached at one end. Denote by $b_{m-2}$ the number of domino coverings of this figure. It is clear that this is also the number of domino coverings of type A. Consequently, we have
$$a_m = 3a_{m-1} + 2b_{m-2} \qquad (m \ge 2) \tag{22}$$
where $b_m$ = number of domino coverings of a $3 \times 2m$ rectangle to which a $2 \times 1$ rectangle has been attached. A domino covering of such a figure must end either like (C) or like (D) in Figure 7.8. It is then clear that
$$b_m = b_{m-1} + a_m \qquad (m \ge 1) \tag{23}$$

[Figure 7.8: Domino Coverings of a Rectangle, panels (A), (B), (C), (D)]

Substituting for the b's from (22) into (23), we get
$$a_m - 4a_{m-1} + a_{m-2} = 0 \qquad (m \ge 2) \tag{24}$$
with $a_0 = 1$ and $a_1 = 3$. This is a homogeneous linear relation with characteristic polynomial $x^2 - 4x + 1$. The characteristic roots, therefore, are $2 \pm \sqrt{3}$. By Theorem (3.8), the general solution is $a_m = c_1(2+\sqrt{3})^m + c_2(2-\sqrt{3})^m$ where $c_1$, $c_2$ are constants. The initial conditions give $c_1 + c_2 = 1$ and $c_1(2+\sqrt{3}) + c_2(2-\sqrt{3}) = 3$. Solving this system gives $c_1 = \dfrac{\sqrt{3}+1}{2\sqrt{3}}$ and $c_2 = \dfrac{\sqrt{3}-1}{2\sqrt{3}}$. Thus the number of domino coverings of a $3 \times 2m$ rectangle is
$$a_m = \frac{1}{2\sqrt{3}}\left[(\sqrt{3}+1)(2+\sqrt{3})^m + (\sqrt{3}-1)(2-\sqrt{3})^m\right] \tag{25}$$
∎
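The recurrence (24) and the closed form (25) can be checked against an independent count of the tilings. The Python sketch below counts domino tilings of a small board cell by cell, remembering which of the upcoming cells are already covered; this counting routine is a standard technique that we substitute for the argument in the text, and the names `count_tilings`, `a_recurrence` and `a_closed_form` are ours.

```python
from functools import lru_cache
from math import sqrt

def count_tilings(rows, cols):
    """Count domino tilings of a rows x cols board.
    Cells are numbered column by column; bit i of `mask` is set when
    cell pos + i has already been covered by an earlier domino."""
    total = rows * cols

    @lru_cache(maxsize=None)
    def fill(pos, mask):
        if pos == total:
            return 1
        if mask & 1:                               # cell already covered
            return fill(pos + 1, mask >> 1)
        ways = 0
        if pos % rows + 1 < rows and not (mask & 2):
            ways += fill(pos + 2, mask >> 2)       # domino within the column
        if pos + rows < total:
            # domino reaching into the next column covers cell pos + rows
            ways += fill(pos + 1, (mask >> 1) | (1 << (rows - 1)))
        return ways

    return fill(0, 0)

def a_recurrence(m):
    """a_m from (24) with a_0 = 1, a_1 = 3."""
    a = [1, 3]
    for i in range(2, m + 1):
        a.append(4 * a[i - 1] - a[i - 2])
    return a[m]

def a_closed_form(m):
    """Formula (25), evaluated in floating point and rounded."""
    s3 = sqrt(3)
    value = ((s3 + 1) * (2 + s3) ** m + (s3 - 1) * (2 - s3) ** m) / (2 * s3)
    return round(value)

for m in range(0, 7):
    assert count_tilings(3, 2 * m) == a_recurrence(m) == a_closed_form(m)
```

The floating-point evaluation of (25) is reliable only for moderate m; for large m the integer recurrence (24) is the safer route.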
It should be clear by now that in the applications of recurrence relations, the real art lies in constructing the recurrence relation and not so much in solving it. In this respect, the recurrence relations differ sharply from their continuous counterparts, namely the differential equations. In the applications of differential equations, writing the differential equation is a matter of recalling the appropriate laws of the particular science (such as physics, economics etc.) which govern the given problem. The real challenge is often in solving the differential equation, for which a large variety of methods is available. Comparatively, we have studied very few methods for solving recurrence relations. Of course, not every recurrence relation can be solved. But if at all the solution is within our scope then finding it is a rather routine matter. The construction of the recurrence relation, on the other hand, changes from problem to problem and so every problem offers a new challenge, requiring techniques peculiar to the nature of the problem. In the next problem we evaluate a determinant using recurrence relations. As is to be expected, the essential step is to relate an $n \times n$ determinant to a lower order determinant of a similar type.
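4.5 Problem: Evaluate the determinant of the $n \times n$ matrix $A_n$ which has all 1's on its principal diagonal and the two immediate subdiagonals and 0's everywhere else. (Equivalently $A_n = (a_{ij})$ where $a_{ij} = 1$ if $|i - j| \le 1$ and 0 otherwise.)
Solution: Let $a_n = \det(A_n)$ where $A_n$ is the $n \times n$ matrix,
$$A_n = \begin{bmatrix}
1 & 1 & 0 & 0 & \cdots & 0 & 0 \\
1 & 1 & 1 & 0 & \cdots & 0 & 0 \\
0 & 1 & 1 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & \ddots & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 1 & 1 & 0 \\
0 & 0 & \cdots & 0 & 1 & 1 & 1 \\
0 & 0 & \cdots & 0 & 0 & 1 & 1
\end{bmatrix}$$
If we expand $\det(A_n)$ w.r.t. the first row we get
$$a_n = \det(A_n) = \det(A_{n-1}) - \det(B_{n-1}) \tag{27}$$
where $B_{n-1}$ is the $(n-1) \times (n-1)$ matrix
$$B_{n-1} = \begin{bmatrix}
1 & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & 1 & 0 & \cdots & 0 & 0 \\
0 & 1 & 1 & 1 & \cdots & 0 & 0 \\
0 & 0 & 1 & 1 & \cdots & 0 & 0 \\
\vdots & & & \ddots & \ddots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 1
\end{bmatrix}$$
Expanding $\det(B_{n-1})$ by its first column we see
$$\det(B_{n-1}) = \det(A_{n-2}) \tag{28}$$
From (27) and (28), we get the recurrence relation
$$a_n - a_{n-1} + a_{n-2} = 0 \qquad (n \ge 3) \tag{29}$$
with the initial conditions $a_1 = 1$, $a_2 = 0$. The characteristic roots here are $\dfrac{1 \pm i\sqrt{3}}{2}$ or, in the polar form, $\exp\left(\pm\dfrac{i\pi}{3}\right)$. By Theorem (3.9), the general solution of (29) is $c_1\cos\dfrac{n\pi}{3} + c_2\sin\dfrac{n\pi}{3}$ where $c_1$, $c_2$ are real constants. The initial conditions imply $\dfrac{c_1}{2} + \dfrac{\sqrt{3}}{2}c_2 = 1$ and $-\dfrac{c_1}{2} + \dfrac{\sqrt{3}}{2}c_2 = 0$, giving $c_1 = 1$ and $c_2 = \dfrac{1}{\sqrt{3}}$. So the given determinant equals
$$\cos\frac{n\pi}{3} + \frac{1}{\sqrt{3}}\sin\frac{n\pi}{3}. \qquad ∎$$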
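The answer is periodic in n with period 6, which is easy to confirm by computing the determinants directly. The Python sketch below uses exact Gaussian elimination over the rationals, a general-purpose method that we use here only as an independent check; the names `band_matrix`, `det` and `closed_form` are ours.

```python
from fractions import Fraction
from math import cos, sin, pi, sqrt

def band_matrix(n):
    """The n x n matrix with a_ij = 1 if |i - j| <= 1 and 0 otherwise."""
    return [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]

def det(matrix):
    """Exact determinant by Gaussian elimination with row interchanges."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return 0                       # a zero column: singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result               # a row swap changes the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return int(result)

def closed_form(n):
    """cos(n*pi/3) + sin(n*pi/3)/sqrt(3), rounded to the nearest integer."""
    return round(cos(n * pi / 3) + sin(n * pi / 3) / sqrt(3))

for n in range(1, 13):
    assert det(band_matrix(n)) == closed_form(n)
```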
The rather impressive variety of problems considered so far might lead to the belief that recurrence relations are some kind of a golden tool that can tackle almost any problem which has an integer parameter. Unfortunately this is not the case. Take for example, the problem of determining $p(n)$, the number of partitions of a positive integer n. There is no easy recurrence relation for $p(n)$ and consequently the problem of determining $p(n)$ cannot be tackled by the method of recurrence relations. In fact, as we mentioned earlier, there is no known closed form expression for $p(n)$. Also, in some problems, even if we can write down a recurrence relation, there is no closed form solution to it. This is what happened with (15) which arose in connection with counting the number of derangements of n symbols. Although we could simplify (15) to (18), the solution to the latter had to be left in a summation form. Even for the relatively simple case of a linear recurrence relation, things are not always rosy. Theorem (3.8), along with its extensions, does give a complete answer, at least for the homogeneous case. But those theorems are applicable only after the characteristic roots are found. They do not tell us how to find the characteristic roots. This requires the solution of a polynomial equation. For polynomials of degree 2, there is the well-known 'quadratic formula' for the roots. For polynomials of degree 3 or 4 also there are formulas for roots but they are too complicated to be of much practical use. For polynomials of degree 5 or more, there is no general formula for the roots’. It follows that in general we cannot satisfactorily solve even a linear recurrence relation of order 3. In such cases we may have to be content with a generating function. Finally, there do exist enumeration problems which can be solved quite easily by elementary methods but whose generating functions do not have easy closed form expressions. For such problems, even if a recurrence relation can be written down, it is complicated, if not impossible, to solve it. An example of such a problem will be given in the exercises, e.g. Exercise (4.9) part (iv).
Although there is no recurrence relation for the number of all partitions of an integer n, things may be different if we confine ourselves to partitions of a particular type. For example, Exercise (2.3.35) gives a recurrence relation for the number of triangular partitions of an integer. Solving it, we can count the number of such partitions, or equivalently, the number of mutually incongruent triangles with integer sides having a given perimeter. We leave this determination as an exercise.
' See the Epilogue.

To conclude the section, we consider applications of recurrence relations to problems dealing with occurrences of a pattern. Let $\bar{x} = (x_1, x_2, \ldots, x_n)$ be a k-ary sequence of length n. (Usually we shall assume $k = 2$, that is, the case of binary sequences.) Suppose $\bar{a} = (a_1, a_2, \ldots, a_m)$ is a fixed sequence of length m over the same alphabet as $\bar{x}$. If there exists a block of m consecutive $x_i$'s which equal the a's in the respective order then we say that the pattern $\bar{a}$ occurs (or appears) in $\bar{x}$. In other words, the pattern $\bar{a}$ is said to occur in $\bar{x}$ if there exists some r such that $x_{r+i} = a_i$ for all $i = 1, \ldots, m$. (This in particular means $r + m \le n$.) For example the pattern 101 occurs in the binary sequences 011011101, 0001101011 and 10010100 but not in the sequence 1110010. Note that in the first two sequences, there are two subsequences each of which matches with 101. But in the case of the first sequence these two subsequences are non-overlapping while in the second one they overlap. In determining the occurrences of a pattern, we always 'scan' the sequence from the left to the right and require distinct occurrences to be non-overlapping. Each occurrence is denoted by underlining together the block of consecutive entries which matches with the given pattern. If this block ends with $x_j$ (say), then we say that the given pattern occurs at the jth digit. For example, in the sequence 011011101 the pattern 101 occurs twice, namely $01\underline{101}1\underline{101}$. Here the first occurrence is at the 5th digit and the second at the 9th. Note that because of our requirement that distinct occurrences of the pattern be non-overlapping, and because of the convention about scanning from left to right, in the sequence 0001101011 the pattern 101 occurs only once, at the 7th digit (as $0001\underline{101}011$) and not at the 9th digit. This may appear a little strange. But the requirement is designed to meet practical needs. Sometimes a particular pattern is undesirable and a provision is made to erase it as soon as it occurs. A subsequent occurrence of the same pattern must begin fresh. If we let it overlap with an earlier occurrence then the erasing mechanism may be falsely activated.
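The scanning convention can be stated very compactly as a procedure. The Python sketch below is one possible rendering of it, with sequences and patterns written as strings of digits; the function name `occurrences` is ours.

```python
def occurrences(sequence, pattern):
    """Scan `sequence` from left to right and return the (1-based) digits
    at which `pattern` occurs, with distinct occurrences non-overlapping."""
    ends = []
    m = len(pattern)
    i = 0
    while i + m <= len(sequence):
        if sequence[i:i + m] == pattern:
            ends.append(i + m)   # the block ends at digit i + m
            i += m               # the next occurrence must begin fresh
        else:
            i += 1
    return ends

# The examples discussed above.
assert occurrences("011011101", "101") == [5, 9]
assert occurrences("0001101011", "101") == [7]
assert occurrences("10010100", "101") == [6]
assert occurrences("1110010", "101") == []
```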
Because of the requirement that distinct occurrences of a pattern must not overlap, counting problems involving occurrences of patterns require careful handling. Some of the problems done earlier about k-ary sequences of length n can be viewed as problems involving patterns. For example, Problem (4.1) amounts to finding the number of k-ary sequences of length n in which the pattern 0 and the pattern 1 appear an even number of times each. As we saw, this problem can be done in a number of ways including the method of exponential enumerators (Exercise (2.18)). For non-trivial patterns (that is, patterns of length greater than 1), however, the method of exponential enumerators cannot be applied readily because it requires free merging of subsequences and the very idea of a pattern precludes free merging because in a pattern the given sequence of digits must appear consecutively, without anything interspersed. However, the method of recurrence relations goes through as we show.
4.6 Problem: Find the number of binary sequences of length n in which the pattern 101 occurs at the end.
Solution: Denote this number by $a_n$. For $n \ge 3$, there are $2^{n-3}$ binary sequences of length n which end with 101. But not all of them will have the pattern 101 at the end. For example, the sequence 0100110101 of length 10 ends with 101. But the pattern 101 already appears at the 8th digit. So the 8th digit cannot be counted in a later occurrence of the pattern. On the other hand, in the sequence 0001010101, the pattern does occur at the end (and also earlier, namely at the 6th digit). It should now be clear that a sequence of length n ending with 101 will have the pattern 101 appearing at the nth digit or else at the $(n-2)$th digit. As these two possibilities are mutually disjoint, we get the recurrence relation
$$a_n + a_{n-2} = 2^{n-3} \qquad (n \ge 3) \tag{30}$$
The initial conditions are $a_1 = 0$ and $a_2 = 0$. This is a linear relation with characteristic roots $i$ and $-i$, which, in the complex form are $e^{i\pi/2}$ and $e^{-i\pi/2}$. By Theorem (3.9), the general solution to the associated homogeneous equation is $a_n = c_1\cos\dfrac{n\pi}{2} + c_2\sin\dfrac{n\pi}{2}$. For a particular solution we try $a_n = \lambda 2^n$, since 2 is not a characteristic root. Then from (30), $\lambda(2^n + 2^{n-2}) = 2^{n-3}$, giving $5\lambda = \frac{1}{2}$ or $\lambda = \frac{1}{10}$. So the general solution to (30) is $a_n = c_1\cos\dfrac{n\pi}{2} + c_2\sin\dfrac{n\pi}{2} + \dfrac{2^n}{10}$ where $c_1$, $c_2$ are real constants. From the initial conditions we get $0 = a_1 = c_2 + \frac{1}{5}$ and $0 = a_2 = -c_1 + \frac{2}{5}$, giving $c_1 = \frac{2}{5}$ and $c_2 = -\frac{1}{5}$. So ultimately, for every $n > 0$, the number of binary sequences of length n having the pattern 101 at the end is
$$\frac{2\cos\dfrac{n\pi}{2} - \sin\dfrac{n\pi}{2} + 2^{n-1}}{5}$$
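Since the non-overlapping convention is easy to misapply, it is reassuring to confirm the formula by enumeration for small n. The Python sketch below is our own illustration: it restates the left-to-right scanning rule as a helper (`occurrences`), counts the sequences directly (`count_pattern_at_end`) and compares with the closed form (`closed_form`); none of these names come from the text.

```python
from itertools import product
from math import cos, sin, pi

def occurrences(sequence, pattern):
    """Digits (1-based) at which `pattern` occurs, scanning left to right
    and requiring distinct occurrences to be non-overlapping."""
    ends, m, i = [], len(pattern), 0
    while i + m <= len(sequence):
        if sequence[i:i + m] == pattern:
            ends.append(i + m)
            i += m
        else:
            i += 1
    return ends

def count_pattern_at_end(n, pattern="101"):
    """Brute force: binary sequences of length n with the pattern at the nth digit."""
    count = 0
    for bits in product("01", repeat=n):
        ends = occurrences("".join(bits), pattern)
        if ends and ends[-1] == n:
            count += 1
    return count

def closed_form(n):
    """(2 cos(n*pi/2) - sin(n*pi/2) + 2^(n-1)) / 5, rounded to an integer."""
    return round((2 * cos(n * pi / 2) - sin(n * pi / 2) + 2 ** (n - 1)) / 5)

a = {n: count_pattern_at_end(n) for n in range(1, 13)}
for n in range(1, 13):
    assert a[n] == closed_form(n)
for n in range(3, 13):
    assert a[n] + a[n - 2] == 2 ** (n - 3)     # recurrence (30)
```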
The sequences in this problem were allowed to have the pattern 101 occurring several times. Sometimes the occurrence of a particular pattern signifies the termination of some process. In that case obviously it occurs only once. Such situations give rise to problems of the following 'first time occurrence' kind.
4.7 Problem: Find the number of binary sequences of length n in which the pattern 101 appears at the nth digit for the first time.
Solution: This time let $b_n$ be the number of such sequences. Obviously $b_n \le a_n$ where $a_n$ is as in the last problem. We could study $b_n$ independently. But it is instructive to relate it to $a_n$, which we already know. By definition, $a_n = |A_n|$, where $A_n$ is the set of all sequences of length n in which the pattern 101 appears at the end (and possibly earlier). In any such sequence the first occurrence of the pattern will be at the rth digit for some r, $3 \le r \le n$. If $3 \le r \le n-1$, the number of sequences in $A_n$ in which the first occurrence of the pattern is at the rth digit is evidently $b_ra_{n-r}$, because if we break such a sequence at the rth digit then the first portion can be formed in $b_r$ ways and the second portion is a sequence of length $n-r$ having the pattern 101 at its end. This gives,
= 21M” + b. (n > 3) (31)
We now set $a_0 = 1$. This is not consistent with the formula for $a_n$ in the
last problem. But that formula was valid only for n > 0. Similarly we set
$b_0 = 0$. The advantage of doing this is that we can add dummy terms to
the right hand side of (31) and rewrite it as $\sum_{r=0}^{n} b_r a_{n-r}$,
which is precisely the coefficient of $x^n$ in the product A(x) B(x) where

$$A(x) = \sum_{n=0}^{\infty} a_n x^n \quad \text{and} \quad B(x) = \sum_{n=0}^{\infty} b_n x^n.$$

Also with our choices of $a_0$ and $b_0$, (31) is valid for all n > 0 but not for
n = 0. So multiplying (31) by $x^n$ and summing over $n \ge 1$, we get

$$A(x) - a_0 = A(x) B(x) - a_0 b_0 \tag{32}$$

Since $a_0 = 1$ and $b_0 = 0$, this gives

$$B(x) = 1 - \frac{1}{A(x)} \tag{33}$$

Using

$$a_n = \frac{2\cos\frac{n\pi}{2} - \sin\frac{n\pi}{2} + 2^{n-1}}{5} \qquad (n > 0)$$

or directly from (30), it is not hard to show that

$$A(x) = 1 + \frac{x^3}{1 - 2x + x^2 - 2x^3}.$$

So from (33),

$$B(x) = \frac{x^3}{1 - 2x + x^2 - x^3} \tag{34}$$
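The step from (33) to (34) is short enough to write out. The following is only a sketch of the algebra, using the expression for A(x) stated above and no other assumptions:

```latex
\begin{align*}
A(x) &= 1 + \frac{x^3}{1 - 2x + x^2 - 2x^3}
      = \frac{1 - 2x + x^2 - x^3}{1 - 2x + x^2 - 2x^3}, \\[4pt]
B(x) &= 1 - \frac{1}{A(x)}
      = 1 - \frac{1 - 2x + x^2 - 2x^3}{1 - 2x + x^2 - x^3}
      = \frac{x^3}{1 - 2x + x^2 - x^3}.
\end{align*}
```

The denominator of A(x) factors as $(1 - 2x)(1 + x^2)$, but that factorisation is lost in B(x), which is why the partial fraction approach becomes awkward.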
If we could resolve B(x) into partial fractions we could expand it and
compute $b_n$ as the coefficient of $x^n$. But this would require us to find the
roots of the polynomial $1 - 2x + x^2 - x^3$. It is easy to show that this
polynomial has one real and two complex roots. But there is no rational
root. So although the roots could theoretically be found by the formula
for solving a cubic, it is not practicable to do so. It is best to leave (34)
as it is. If we want $b_n$ for a particular value of n we could expand B(x) as

$$x^3 \sum_{r=0}^{\infty} (2x - x^2 + x^3)^r,$$

and take the coefficient of $x^n$ in

$$\sum_{r=0}^{\infty} x^{r+3} (x^2 - x + 2)^r.$$

Although this is an infinite sum, only the terms from r = 0 to r = n - 3
need be considered because the other terms involve only higher powers
of x.
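The truncated expansion can be carried out mechanically. The following is a minimal sketch, assuming polynomials are stored as coefficient lists with the lowest power first; the names `poly_mul` and `b_from_series` are illustrative and do not come from the text.

```python
def poly_mul(p, q, max_deg):
    """Multiply coefficient lists p and q, discarding powers above max_deg."""
    out = [0] * (max_deg + 1)
    for i, a in enumerate(p):
        if a == 0:
            continue
        for j, c in enumerate(q):
            if i + j > max_deg:
                break
            out[i + j] += a * c
    return out


def b_from_series(n):
    """Coefficient of x^n in sum_{r=0}^{n-3} x^(r+3) * (2 - x + x^2)^r."""
    if n < 3:
        return 0
    base = [2, -1, 1]              # 2 - x + x^2, lowest power first
    power = [1]                    # (2 - x + x^2)^0
    total = 0
    for r in range(0, n - 2):      # r = 0, ..., n - 3
        k = n - (r + 3)            # power of x still needed from (2 - x + x^2)^r
        if k < len(power):
            total += power[k]
        power = poly_mul(power, base, n)
    return total


if __name__ == "__main__":
    print([b_from_series(n) for n in range(3, 10)])
```

Each term contributes one coefficient of $(2 - x + x^2)^r$, so the work grows roughly quadratically with n.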
But there is a better way out and it is worth pointing out because of its
novelty. So far our approach has been to derive a recurrence relation
through a combinatorial argument (or some other consideration peculiar
to the problem) and then solve it with generating functions or some other
method. For the sequence $(b_n)$ we can turn the tables around. From (34)
we get

$$B(x)\,(1 - 2x + x^2 - x^3) = x^3 \tag{35}$$

For n > 3, the coefficient of $x^n$ in the left hand side is

$$b_n - 2b_{n-1} + b_{n-2} - b_{n-3}$$

while in the right hand side it is 0. So we get

$$b_n = 2b_{n-1} - b_{n-2} + b_{n-3} \qquad (n \ge 4) \tag{36}$$

with the initial conditions $b_1 = 0$, $b_2 = 0$ and $b_3 = 1$. We can now
compute the $b_n$ one by one, giving

$$b_4 = 2,\ b_5 = 3,\ b_6 = 5,\ b_7 = 9,\ b_8 = 16,\ b_9 = 28, \text{ etc.}$$
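Recurrence (36) is easy to check against a brute-force count for small n. The sketch below assumes the usual reading of 'first time occurrence', namely that the leftmost occurrence of 101 ends exactly at the nth digit. The function names are illustrative only.

```python
from itertools import product


def first_occurrence_counts(n_max):
    """Return [b_0, ..., b_{n_max}] computed from recurrence (36)."""
    b = [0, 0, 0, 1]                       # b_0 = b_1 = b_2 = 0, b_3 = 1
    for n in range(4, n_max + 1):
        b.append(2 * b[n - 1] - b[n - 2] + b[n - 3])
    return b[: n_max + 1]


def brute_force_count(n, pattern="101"):
    """Count binary strings of length n whose first `pattern` ends at digit n."""
    if n < len(pattern):
        return 0
    count = 0
    for bits in product("01", repeat=n):
        s = "".join(bits)
        # str.find gives the start index of the leftmost occurrence, or -1.
        if s.find(pattern) == n - len(pattern):
            count += 1
    return count


if __name__ == "__main__":
    b = first_occurrence_counts(12)
    for n in range(3, 13):
        print(n, b[n], brute_force_count(n))
```

The brute-force count examines all $2^n$ strings, so it is only practical as a check for small n; the recurrence needs just n additions.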
We could have derived (36) directly. Let $B_n$ be the set of all binary
sequences of length n in which the pattern 101 appears at the end for the
first time. Then $b_n = |B_n|$. If $\bar{x} = (x_1, x_2, ..., x_n)$, denote by
$\bar{y}$, $\bar{z}$ the successive truncations of $\bar{x}$ from the left, i.e.
$\bar{y} = (x_2, x_3, ..., x_n)$ and $\bar{z} = (x_3, x_4, ..., x_n)$. Clearly,
for $n \ge 4$, whenever $\bar{x} \in B_n$, we have $\bar{y} \in B_{n-1}$ and
$\bar{z} \in B_{n-2}$. Conversely let $\bar{y} \in B_{n-1}$. If $x_1 = 0$ then
$\bar{x} \in B_n$. If $x_1 = 1$, then also $\bar{x} \in B_n$ except when
$x_2 = 0$, $x_3 = 1$. So if we let

$$C_n = \{\bar{x} \in B_n : x_1 = 0,\ x_2 = 1\},$$

then we get $b_n = 2b_{n-1} - c_{n-1}$ where $c_n = |C_n|$. To get hold of $c_n$,
note that $\bar{x} \in C_n$ iff $\bar{z} \in B_{n-2} - C_{n-2}$. Thus
$c_n = b_{n-2} - c_{n-2}$. We already have $c_n = 2b_n - b_{n+1}$. Substituting
for $c_n$ and $c_{n-2}$ in $c_n = b_{n-2} - c_{n-2}$, we get (36).
Despite this, the earlier derivation of (36) from (34) has some desirable
features. In some problems, even though the recurrence relation is fairly
simple, a direct derivation of it may not be so obvious. In such cases, if we
are somehow able to get hold of the generating function, we can convert
it to a recurrence relation, the way we got (36) from (34). Note, however,
that we cannot solve (36). Solving (36) would amount to solving (31) in the
first place.
Although we were unable to get a closed form expression for $b_n$ in this
example, (36) gives a reasonable method of calculating it for small values
of n, which is worthwhile because first time occurrences of patterns are
important in many real life problems, where they may signify the termination
of an algorithm, the victory in a game etc. To determine the probability
of such events we have to know the number of sequences of a given length
which end with a first time occurrence of a given pattern. Once we know
the generating function B(x) for the number of such sequences, it is easy
to get the generating function for the number of sequences in which the
pattern occurs exactly twice, exactly thrice etc. (with or without the last
occurrence being at the end). Having obtained the generating function, we
can write a linear recurrence relation in the same manner as we got (36) from
(34). A few problems based on such variations will be given as exercises.
One of them will give the answer to the Casino Problem.
Exercises
4.1 For every positive integer n, prove that the number of ways to
stack 1 rupee and 2 rupee coins so that the total value of the
stack is n rupees is
$$\frac{1}{\sqrt{5}}\left[\left(\frac{1+\sqrt{5}}{2}\right)^{n+1} - \left(\frac{1-\sqrt{5}}{2}\right)^{n+1}\right].$$
[Hint: Show that the numbers in question satisfy the same recurrence
relation as the Fibonacci numbers but with a different
initial condition.]
4.2 Using the last exercise, prove that for every positive integer n,
$$\sum_{i=0}^{\lfloor n/2 \rfloor} \binom{n-i}{i} = \frac{1}{\sqrt{5}}\left[\left(\frac{1+\sqrt{5}}{2}\right)^{n+1} - \left(\frac{1-\sqrt{5}}{2}\right)^{n+1}\right].$$
(Essentially, the same sum was evaluated in Exercise (1.18) using
generating functions. The present solution is of the same spirit as
the proof of Proposition (4.3) because it consists of obtaining a
recurrence relation for the sum and solving it.)
4.3 Prove that Fibonacci numbers also arise in finding the number of
ways to cover a 2 × n rectangle by dominos.
4.4 Prove that for every positive integer n, the number of binary
sequences of length n in which the pattern 11 appears for the first
time at the end is
$$\frac{1}{\sqrt{5}}\left[\left(\frac{1+\sqrt{5}}{2}\right)^{n-1} - \left(\frac{1-\sqrt{5}}{2}\right)^{n-1}\right].$$
[Hint: Instead of following the method of Problem (4.7), it is much
easier to consider the sequence of length n - 2 obtained by removing
the pattern 11 at the end. This and the last three exercises
illustrate the ubiquitous nature of the Fibonacci numbers.]
4.5 Let X be the discrete random variable which denotes the number of
times a fair coin has to be tossed before two consecutive heads
show up. Prove that the probability generating function of X
is given by $\dfrac{x^2}{4 - 2x - x^2}$.
4.6 In a gambling game, the gambler has to pay one rupee for each
toss of a fair coin. The game is over and a reward of 5 rupees is
given if two consecutive heads show. If you are the gambler, would
you play this game? [Hint: Apply Proposition (2.13).]
4.7 Solve the recurrence relation (9) and then (11).
4.8 A restaurant serves three kinds of snacks for tiffin, say A, B, and
C costing rupees 1, 2 and 2 respectively. A person has a tiffin
allowance of n rupees. If he eats one snack each day till the allowance
is exhausted, in how many ways can he spend it? (Note that the
order of the snacks is important. That is, snack A today and B
tomorrow is not the same as B today and A tomorrow.)
4.9 For non-negative integers m, n (with $n \le m$) let
$a_{m,n} = \binom{m-n+1}{n}$
and $b_{m,n}$ be the number of ways to select n integers from {1, 2, ..., m}
so that no two consecutive integers are selected. Set $a_{m,n} = 0$ for
m < n.
(i) Prove that $a_{m,n} = a_{m-2,n-1} + a_{m-1,n}$ ($m \ge 2$, $n \ge 1$).
(ii) Prove combinatorially that $b_{m,n} = b_{m-2,n-1} + b_{m-1,n}$.
(iii) From (i), (ii) and induction prove that $b_{m,n} = \binom{m-n+1}{n}$.
(The kind of induction that is needed here is called double
induction because two indices are involved. It proceeds very
much like ordinary induction except that in order to prove the
truth for some values of m and n, we are allowed to assume
the truth for all pairs (i, j) where $i \le m$, $j \le n$ and at least
one inequality is strict.)
(iv) By a direct combinatorial argument show that $b_{m,n} = \binom{m-n+1}{n}$.
[Hint: See Exercise (2.3.5).]
(v) Let $F_m(y)$ be the O.G.F. of the sequence $(b_{m,n})_{n=0}^{\infty}$ for a fixed
m. From (ii) show that $(F_m(y))_{m=0}^{\infty}$ satisfies the recurrence relation
$$F_m(y) - F_{m-1}(y) - yF_{m-2}(y) = 0 \qquad (m \ge 2)$$
with the initial conditions $F_0(y) = 1$ and $F_1(y) = 1 + y$.
(vi) Let $B(x, y) = \sum_{m,n=0}^{\infty} b_{m,n} x^m y^n$ be the O.G.F. of the doubly infinite
sequence $\{b_{m,n}\}$ (and hence also of $\{a_{m,n}\}$). From (v) show that
$$B(x, y) = \frac{1 + xy}{1 - x - x^2 y}.$$
(This is the same as Exercise (1.24). But
there we merely verified the answer while now we are arriving
at it through (ii). Expanding this fraction, we can get yet another
proof of (iv).)
(vii) Using (vi), or directly from (v), show that for all $m \ge 0$,
$$\sum_{n} \binom{m-n+1}{n} y^n = \frac{(1 + \sqrt{1+4y})^{m+2} - (1 - \sqrt{1+4y})^{m+2}}{2^{m+2}\sqrt{1+4y}}.$$
(viii) Using (vii), prove that if m, n are positive integers with $m \ge n$
then
$$\sum_{k \ge 0} \binom{m+2}{2k+1}\binom{k}{n} = 2^{m-2n+1}\binom{m-n+1}{n}.$$
4.10 A stick of n units of length is to be broken into n parts of 1 unit
length. In how many ways can this be done if at any stage we
simultaneously cut every part of length greater than 1 into two
parts? What would be the answer if at every stage only one part
can be cut?
4.11 Define f and g by
$$f(x) = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!} \quad \text{and} \quad g(x) = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!} \quad \text{for } x \in R.$$
Let $h(x) = [f(x)]^2 + [g(x)]^2$. Prove that $f'(x) = g(x)$ and $g'(x) = -f(x)$
for all x. Hence show that $h'(x) = 0$ for all x. Deduce that
$h(x) = h(0) = 1$ for all $x \in R$.
(This is the analytic proof of the well-known trigonometric identity
$\sin^2 x + \cos^2 x = 1$. It illustrates the remark preceding Proposition
(4.3).)
4.12 Consider the set of all partitions of a set with n elements, partially
ordered by the refinement relation (cf. Example (4) in Chapter 3,
Section 3). Find the number of chains of length n in this set.
4.13 Find the number of regions into which a plane is divided by n
ellipses in it if every two ellipses intersect at two points and no
three of them pass through the same point.
4.14 Prove that for each n, the number of sequences $(a_0, a_1, ..., a_{2n})$ of
non-negative integers in which $a_0 = a_{2n} = 0$ and $|a_i - a_{i-1}| = 1$
for all i = 1, ..., 2n equals the Catalan number $\frac{1}{n+1}\binom{2n}{n}$. [Hint:
Associate a left parenthesis if $a_i - a_{i-1} = 1$ and a right parenthesis
if $a_i - a_{i-1} = -1$.]
4.15 Prove that the number of non-decreasing functions
$f: \{1, 2, ..., n\} \to \{1, 2, ..., n\}$ such that $f(x) \ge x$ for all x = 1, ..., n
equals $\frac{1}{n+1}\binom{2n}{n}$.
4.16 Prove that the number of non-decreasing functions
$f: \{1, 2, ..., n\} \to \{1, 2, ..., n\}$ such that $f(x) \le x$ for all x = 1, ..., n
also equals $\frac{1}{n+1}\binom{2n}{n}$. [Hint: Consider $g(x) = n + 1 - f(n + 1 - x)$.]
n+ l n
4.17 Solve the recurrence relation (21).
*4.18 Design an electrical version of the Tower of Hanoi problem. That
is, design a circuit with n switches $x_1, ..., x_n$, n relays $Y_1, Y_2, ..., Y_n$
and a lamp L such that
(i) $Y_1$ can be operated and released at will by closing or opening
the switch $x_1$ respectively,
(ii) for i = 2, ..., n, $Y_i$ can be operated or released by closing or
opening $x_i$ if and only if $Y_{i-1}$ is in the operate state and every $Y_j$
for j < i - 1 is in the released state (otherwise operation of
$x_i$ has no effect on the state of $Y_i$) and
(iii) the lamp L is on only when $Y_n$ is in the operate state and
$Y_1, ..., Y_{n-1}$ are in the released state. Find directly how many
flips of switches are necessary to light the lamp L. [Hint:
Introduce suitable contacts $y_1, y_2, ..., y_n$ on the relays and
auxiliary relays. First find the closure functions for their
operate and release paths in terms of the Boolean variables
$x_1, x_2, ..., x_n, y_1, ..., y_n$. Do the same for the control circuit of
the lamp. See Chapter 4, Section 3.]
**4.19 How many moves will be needed in the Tower of Hanoi problem
if instead of three pegs, there are four pegs, all other conditions
remaining the same?
*4.20 Find the number of ways to cover a 4 × n rectangle by dominos.
(These two problems illustrate how even a slight change can
considerably increase the complexity of a problem, a typical feature of
combinatorics, and more generally, of mathematics.)
4.21 Let r, λ be real numbers with $r \ne \lambda$. Let $A_n$ be the n × n matrix $(a_{ij})$
where $a_{ij} = \lambda$ for $i \ne j$ and $a_{ii} = r$ for $1 \le i \le n$. Using recurrence
relations prove that $\det(A_n) = (r - \lambda)^{n-1}(\lambda n + r - \lambda)$.
4.22 Generalise Problem (4.5) to the case where in the matrix $A_n$, $a_{ij} = \alpha$
for i = j, $a_{ij} = \beta$ if $|i - j| = 1$ and $a_{ij} = 0$ otherwise, where α, β are
some fixed positive real numbers. (Notice that the answer will differ
depending upon whether $\alpha > 2\beta$, $\alpha = 2\beta$ or $\alpha < 2\beta$.)
4.23 Show that the recurrence relation for $t_n$ in Exercise (2.3.35) is
equivalent to the linear recurrence relation
_ _ n_ x‘n-l_ ( _ 1)
~n-1 (”>0
1,. — 1..-, = 1111(1)
4.24 Solve the recurrence relation in the last exercise subject to the
initial conditions $t_1 = t_2 = 0$, $t_3 = 1$. Hence determine the number of
mutually incongruent triangles of perimeter n whose sides are
integers.
4.25 Prove that the number
$$\frac{2\cos\frac{n\pi}{2} - \sin\frac{n\pi}{2} + 2^{n-1}}{5},$$
obtained in the answer to Problem (4.6), equals the integer closest to
the real number $\frac{2^{n-1}}{5}$. (This makes it a little easier to calculate it and also gives
a rough idea of how rapidly it grows as n increases.)
4.26 Verify that the O.G.F. of the sequence $\{a_n\}_{n=0}^{\infty}$ where $a_0 = 1$ and
$$a_n = \frac{2\cos\frac{n\pi}{2} - \sin\frac{n\pi}{2} + 2^{n-1}}{5} \quad (n > 0)$$
is indeed $1 + \dfrac{x^3}{1 - 2x + x^2 - 2x^3}$.
[Hint: Work with (30). For a direct proof rewrite $\cos\frac{n\pi}{2}$ and
$\sin\frac{n\pi}{2}$ in terms of powers of i.]
4.27 Let $\bar{a} = (a_1, a_2, ..., a_m)$ be a fixed binary sequence of length m. For
each positive integer n, let $a_n$ be the number of those binary
sequences of length n which end with the pattern $\bar{a}$ and $b_n$ the number
of those which end with the first occurrence of the pattern $\bar{a}$. Set
$b_0 = 0$, $a_0 = 1$. For a positive integer k, let $b_{n,k}$ be the number of
those binary sequences of length n in which the pattern $\bar{a}$ occurs
exactly k times, the last occurrence being at the end (clearly $b_{n,1} = b_n$),
$c_{n,k}$ be the number of binary sequences of length n in which
the pattern $\bar{a}$ occurs at least k times (with the last occurrence not
necessarily at the end) and $d_{n,k}$ the number of those in which $\bar{a}$
occurs exactly k times (but not necessarily at the end). Let A(x),
B(x) be the O.G.F.'s of $\{a_n\}$, $\{b_n\}$ respectively. Prove that:
(i) the O.G.F. of $\{b_{n,k}\}_{n=0}^{\infty}$ is $B^k(x)$ (i.e. $[B(x)]^k$)
(ii) the O.G.F. of $\{c_{n,k}\}_{n=0}^{\infty}$ is $\dfrac{B^k(x)}{1 - 2x}$
(iii) the O.G.F. of $\{d_{n,k}\}_{n=0}^{\infty}$ is $\dfrac{B^k(x) - B^{k+1}(x)}{1 - 2x}$
(iv) $A(x) = \dfrac{1}{1 - B(x)}$ [Hint: Proceed as in (32) or sum (i) for
k = 1, 2, ....]
4.28 Combining the last exercise with Problems (4.6) and (4.7), obtain
linear recurrence relations for the number of binary sequences of
length n with one of the following properties:
(i) the pattern 101 occurs at least once in them
(ii) the pattern 101 appears exactly once in them.
4.29 Let $b_n$ be the number of binary sequences of length n in which the
pattern 111 appears for the first time at the end. Prove that $b_n$
satisfies the linear recurrence relation
$$b_n = b_{n-1} + b_{n-2} + b_{n-3} \qquad (n \ge 4)$$
with $b_1 = b_2 = 0$ and $b_3 = 1$. Count $b_n$ for $n \le 10$.
4.30 In a gambling game, at every round the gambler tosses a fair coin.
The game is over when three heads show consecutively (called a
win) or at the end of the tenth round whichever is earlier. Prove
that the probability of a win is 65/128. [Hint: Even after a win,
continue the game with dummy tosses of the coin till 10 rounds
are over. Among the 1024 (= $2^{10}$) equally likely outcomes count
those in which there is a win, using the last exercise.]
4.31 Using the last exercise, show that in the Casino Problem, the
amount of the reward for a win should be 14 rupees. [Hint: Calculate
the total amount of money the machine will get from all possible
$2^{10}$ outcomes corresponding to 10 tosses of the coin, keeping
in mind that the dummy tosses cost nothing. Note also that as soon
as a player is sure to have lost the game (for example if he does
not win by the 7th round or earlier and gets a tail on the eighth
round), his subsequent tosses must be considered as dummy tosses.
Divide this amount equally among all winners.]
4.32 In the Casino Problem, suppose the game is allowed to go on indefinitely
(instead of being terminated at the tenth round) and that every player
keeps on playing till he wins. Prove that the probability of winning the
game is 1 now but that the average cost of winning it is still 14 rupees.
[Hint: Let $p_n$ be the probability of winning at the end of the nth round,
but not earlier. A closed form expression for $p_n$ is not easy. But the
sums $\sum_{n=0}^{\infty} p_n$ and $\sum_{n=0}^{\infty} n p_n$ can be evaluated using the recurrence relation
in Exercise (4.29).]
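As a numerical sanity check on Exercises (4.29) and (4.30), and not a substitute for the proofs they ask for, the following sketch computes the win probability from the recurrence and compares it with a direct enumeration of all $2^{10}$ outcomes. It assumes heads is coded as the digit 1; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def first_111_counts(n_max):
    """b_n from Exercise (4.29): b_n = b_{n-1} + b_{n-2} + b_{n-3}, n >= 4."""
    b = [0, 0, 0, 1]                       # b_0, b_1, b_2, b_3
    for n in range(4, n_max + 1):
        b.append(b[n - 1] + b[n - 2] + b[n - 3])
    return b


def win_probability(rounds=10):
    """A win first completed at round n can be padded by 2^(rounds - n) dummy tosses."""
    b = first_111_counts(rounds)
    wins = sum(b[n] * 2 ** (rounds - n) for n in range(3, rounds + 1))
    return Fraction(wins, 2 ** rounds)


def brute_force_wins(rounds=10):
    """Count outcomes of `rounds` tosses containing three consecutive heads."""
    return sum("111" in "".join(t) for t in product("01", repeat=rounds))


if __name__ == "__main__":
    print(win_probability(), brute_force_wins())
```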
Notes and Guide to Literature
The Tower of Hanoi Problem is one of the most well-known problems in
mathematics and often features in books on recreational mathematics. The
solution to Problem (4.4) is due to Tomescu [1], another problem book,
from where Proposition (4.3) is also taken.
The matrix considered in Exercise (4.21) arises naturally in the theory
of designs (see the Epilogue).
Epilogue
Preview of ‘Applied Discrete Structures’
By now we have (hopefully) taken the reader fairly deep into the spirit of
discrete mathematics and have given him a rather thorough acquaintance
with its important techniques. It is high time that we apply the tools we
have picked. A beginning in this direction was already made in Chapter 4
where we applied Boolean algebras to the problems of designing switching
circuits. Also in Chapters 2 and 7 we solved a number of ‘real-life’ problems
(some of them being admittedly somewhat contrived) involving counting
techniques.
But there is a lot more to go. In the original plan of this book, the
last few chapters were devoted to applications of discrete mathematics
to a variety of interesting problems. Considerations of space have forced
us to carve them out in a separate book titled 'Applied Discrete Structures'.
Here we content ourselves by giving a rather detailed preview of that book.
(1) We begin with applications of groups, in a chapter titled 'Group
Actions'. Figuratively, group actions are groups put into action; the formal
definition is of course different. In essence an action of a group G on a
set S is a homomorphism, say θ, from G into P(S), the permutation group
of the set S. Quite frequently G itself is a subgroup of P(S) and in that
case θ is simply the inclusion function. For x, y ∈ S, we write x ~ y iff
there exists some g ∈ G such that (θ(g))(x) = y. It is easily seen that ~ is
an equivalence relation on the set S. The equivalence classes are called the
orbits. We shall prove theorems due to Burnside about the sizes of the
orbits and also the number of orbits. (Theorem (5.2.19) will come out as a
special case.)
The importance of these theorems stems from the fact that in many
counting problems certain apparently distinct objects are to be treated as
equivalent to each other and the problem therefore amounts to counting
the number of equivalence classes. Such problems can often be reduced to
problems of counting the number of orbits under suitable group actions.
As a typical example consider the problem of colouring the vertices of a
regular pentagon with 3 colours so that 2 vertices are red, 2 are yellow and
one is blue. Let $V = \{v_1, v_2, v_3, v_4, v_5\}$ be the set of vertices of the pentagon.
Let $R = \{r, y, b\}$ be the set of the three colours. Then a colouring with the
given restrictions amounts to a function $f: V \to R$ for which
$|f^{-1}(\{r\})| = |f^{-1}(\{y\})| = 2$. Let S be the set of all such functions. We easily calculate
|S| as 30. But that is not the answer to the problem. The pentagon being
regular, it is impossible to tell apart two colourings if one of them can be
obtained from the other by a rotation and/or a flip of the pentagon. Be-
cause of this, the problem reduces to counting the number of orbits under
the action of G on S where G is the group of isometries of a regular
pentagon. (We already identified G in Chapter 5, Section 1.)
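For so small an example the orbits can simply be enumerated by machine. The sketch below is an illustration only: it assumes the vertices are numbered 0 to 4 around the pentagon and builds the ten symmetries explicitly as permutations of those indices. The function names are hypothetical.

```python
from itertools import permutations


def dihedral_group(n):
    """The 2n symmetries of a regular n-gon, as permutations of vertex indices."""
    rotations = [tuple((i + k) % n for i in range(n)) for k in range(n)]
    reflections = [tuple((k - i) % n for i in range(n)) for k in range(n)]
    return rotations + reflections


def count_orbits(n=5, colours="rryyb"):
    """Return (|S|, number of orbits) for colourings using the given multiset."""
    colourings = set(permutations(colours))    # distinct functions V -> R
    group = dihedral_group(n)
    seen = set()
    orbits = 0
    for f in colourings:
        if f in seen:
            continue
        orbits += 1
        for g in group:                        # mark the whole orbit of f
            seen.add(tuple(f[g[i]] for i in range(n)))
    return len(colourings), orbits


if __name__ == "__main__":
    print(count_orbits())
```

Enumeration of this kind stops being feasible quickly as the set S grows, which is where orbit-counting theorems earn their keep.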
For certain applications, we need more information than just the number
of orbits. This information can be coded algebraically using polynomials
in several variables. This idea is analogous to that behind the enumerators
studied in Chapter 7, Section 2 where we saw that an enumerator is nothing
but an algebraic coding of a combinatorial problem. The resulting theory,
called Polya's theory of counting, is a powerful counting technique. (Polya
originally developed it for certain enumeration problems in chemistry. But
we shall not go into it.)
Finally the theory of group actions can be applied to study groups as
well. In Chapter 5, Section 4, we saw that the converse of Lagrange's
theorem is not true in general. That is, if G is a group of order n and m
is a divisor of n then G need not always contain a subgroup of order m.
However, this does hold if m is of the form $p^r$ where p is a prime. If $p^r$ is
the highest power of p dividing n, then a subgroup of G of order $p^r$ is
called a Sylow subgroup of G. We shall prove a number of results about
them using suitable group actions. As an application we shall obtain
certain invariants which will characterise a finite abelian group up to
isomorphism (see the comments following Corollary (5.1.16)).
(2) Next, we move to applications of a much richer algebraic structure,
namely fields. In the chapter titled 'Applications of Field Theory' we shall
consider two kinds of applications. Some will be purely theoretical while
some will be downright practical. The theoretical applications will be of
a novel type in that they will be aimed at showing not how something can
be done but rather at showing that certain things cannot be done. Take the
centuries old problem of trisecting a given angle using ruler and compass
only. In school geometry we learn an elementary construction for bisecting
an angle. It is but natural to look for a similar construction for trisecting
an angle. People tried this for centuries without success. But, of course,
that does not mean it is inherently impossible to trisect an arbitrary angle
with ruler and compass only. That it is in fact so can be proved quite
rigorously, using only a few simple properties of field extensions.
Another ‘impossibility result' we shall prove deals with solving a
polynomial equation. In Chapter 7, in the comments following Problem
(7.4.5), we remarked that while there are formulas for expressing the
roots of a polynomial of degree 4 or less in terms of its coefficients,
no such formula is available for polynomials of degree 5 or more. But
once again, our own inability to find such a formula does not by itself
constitute a proof that no such formula exists. That it is indeed so is a
remarkable theorem of Abel. But the proof we shall give is even more
remarkable. It is based on what is called Galois theory, one of the most
exciting achievements of the human mind. The key idea in it is to assign to
a field extension a certain group called its Galois group. The properties
of the field extensions are reflected in those of their Galois groups and
vice versa. In particular, solving a polynomial equation reduces to the
solvability of the corresponding group, as defined in Exercise (5.4.16).
The proof ultimately hinges on the result of Exercise (5.4.18) namely that
the permutation group $S_n$ is not solvable for $n \ge 5$. Incidentally this
explains the peculiar name ‘solvable group'.
For the ‘practical' applications of field theory, we first construct finite
fields. Theorem (6.2.26) gives a construction for field extensions starting
from an irreducible polynomial over a ‘ground’ field. (Proposition (6.4.13)
gives the same construction in a somewhat ‘less abstract' form.) Taking
the ground field as $Z_p$, where p is a prime and applying this construction
repeatedly, it is possible to construct a field with $p^m$ elements for any
positive integer m, thereby proving a converse to Corollary (6.3.11). The
existence of a field structure on a finite set permits certain combinatorial
constructions with certain subsets of that set. It can for example be shown
that the Religious Conference Problem with n states and n religions has a
solution if n is a prime power (cf. Exercise (6.2.33)). Actually a much
stronger combinatorial structure, namely a projective plane of order n
(see the Notes on Chapter 1, Section 2) exists if n is a prime power. We
shall study all this under what is called design theory, this peculiar name
coming from the design of experiments, where in order to have unbiased
samples, it is necessary that the trials of experiments be conducted on
batches formed in a way which displays some kind of symmetry. This is
an important real-life application, but we shall not go into its details.
We shall, however, consider another very practical application of finite
fields, to an area called coding theory. When we want to transmit some
messages across a communication channel, they first have to be coded as
‘strings' or sequences of digits coming from some alphabet S. Such
sequences are called codewords. For a given n, the codewords of length n
constitute a subset, say C, of S" (which is the set of all sequences of length
n over the alphabet S). The set C is called a code. Because of errors in
transmission, the received message may differ from the intended message.
The basic problem in coding theory is to design the code C in such a way
that even if a certain number of errors occur, the intended message can
still be deciphered unambiguously from the received message. To do this,
it is necessary that the Hamming distance (see Theorem (3.1.5) and
Exercise (3.1.20)) between every two distinct codewords be sufficiently
large. At the same time we do not want the length n to be too large,
because this increases the cost of transmission. It is a challenging problem
to design a code which meets these conflicting demands. In case the
alphabet S can be given a field structure, there is an elegant solution,
resulting in what are called Bose-Chaudhuri-Hocquenghem codes. We shall
study these codes and in doing so will use facts about principal ideal
domains and matrices.
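The role of the Hamming distance can be made concrete with a small sketch. It relies on the standard fact that a code whose minimum distance is d allows up to ⌊(d - 1)/2⌋ errors to be corrected by choosing the nearest codeword. The function names and the two sample codes are illustrative assumptions, not examples from the book.

```python
from itertools import combinations


def hamming_distance(u, v):
    """Number of positions in which two equal-length words differ."""
    if len(u) != len(v):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(u, v))


def minimum_distance(code):
    """Smallest Hamming distance between two distinct codewords."""
    return min(hamming_distance(u, v) for u, v in combinations(code, 2))


def correctable_errors(code):
    """Errors per word that nearest-codeword decoding is guaranteed to fix."""
    return (minimum_distance(code) - 1) // 2


if __name__ == "__main__":
    repetition = ["00000", "11111"]            # long words, distance 5
    parity = ["000", "011", "101", "110"]      # short words, distance 2
    print(minimum_distance(repetition), correctable_errors(repetition))
    print(minimum_distance(parity), correctable_errors(parity))
```

The two sample codes show the conflict described above: the repetition code tolerates two errors but spends five digits on a single bit of information, while the parity code is short but can only detect an error, not correct it.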
(3) Having discussed the applications of algebraic structures like groups
and fields, we move to a structure which is of a combinatorial nature and
is considerably more adaptable. It is called a graph. We already mentioned
this concept in connection with the Konigsberg Bridge Problem. In the
chapter titled ‘Graph Theory‘ we shall formally define graphs and give a
solution to this problem. But we shall do a lot more. We shall study the
internal structure of a graph. The most crucial concept will be that of a
path from a vertex to another. Many real-life problems can be paraphrased
in terms of finding such paths in suitably defined graphs. We shall see
many instances of this, some of them being solutions to some popular
puzzles (typically involving rowing a boat across a river).
Problems of colouring the vertices or the edges of a graph subject to
certain restrictions are important both theoretically and in applications
and will be studied briefly. Another very important problem about graphs
is to decide when they are planar, i.e. can be drawn in a plane in such a
way that the curves representing two distinct edges do not intersect except
possibly at the end points. Although we shall not prove a complete
characterisation of such graphs, we shall study a number of interesting conse-
quences of planarity.
Trees constitute an especially important class of graphs from the point
of view of applications, because they serve to abstract the process of
branching, which arises in one form or the other in a variety of contexts.
In many problems the solution cannot be obtained by an explicit formula
but has to be found out from a set of possible ‘candidate solutions' by a
search process consisting of performing some tests and eliminating some
possibilities depending on the outcome of a test. Such a search can be
modelled very conveniently in terms of trees, called search trees. Again
there are many applications, both of a serious nature and to puzzles
(typically involving the detection of a fake coin, given a collection of coins
and a balance).
For certain applications, it is necessary to assign certain real numbers
(which are called weights, lengths, costs, capacities etc. depending upon the
application) to the edges of a graph. The resulting structure is called a
network. Problems like the Head Office Problem or the Travelling Salesman
Problem can be paraphrased as certain optimisation problems in networks
and we shall give algorithms for solving them. We shall also discuss the
concept of a flow in a network and present a well-known algorithm, due
to Ford and Fulkerson, to obtain the maximum flow.
Although a graph is a purely combinatorial structure, it can be comple-
tely represented in terms of certain matrices. Two most important matrices
associated with a graph are its adjacency matrix and its incidence matrix.
This association gains strength from the fact that the graph theoretic
properties of a graph can often be translated into the algebraic properties
(such as eigenvalues or ranks) of the associated matrices. Thereby the
machinery of algebra becomes available for studying graphs. We shall
prove a few illustrative results of this kind.
(4) A graph being a very flexible structure, it is hardly surprising that
graphs can be applied to many problems. Indeed so varied and numerous
are these applications that even though we shall devote an entire chapter
'Applications of Graph Theory' to them, we shall barely scratch the
surface. We already noted that many combinatorial problems can be
reduced to the problems of paths and colouring in suitably defined graphs.
We shall present a number of examples of this kind. Among the more
profound applications to combinatorial problems, we shall study matching
theory. Using popular terminology, one of the results we shall prove can
be stated as follows: ‘Suppose in a town there are n unmarried men and
n unmarried women. If each one of them is acquainted with exactly k
persons of the opposite sex, then they can all be married to persons of
their acquaintance without committing bigamy’. That this can always be
done (whenever $k \le n$) is not immediately obvious. But this fact will
follow from the Ford-Fulkerson theorem about flows! Of course 'matching'
need not always be interpreted as a 'marriage'. Interpreting it suitably, a
variety of interesting applications results, one of them being a well-known
theorem of P. Hall on what are called ‘systems of distinct representatives’
and another being Dilworth’s theorem (See Exercise (3.3.9)).
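The marriage statement can be tried out on small cases. The text derives it from the Ford-Fulkerson theorem; the sketch below instead uses a plain augmenting-path search for bipartite matching, which is a different and simpler technique swapped in for illustration. The acquaintance pattern in the example (man i knows women i, i + 1, ..., i + k - 1 modulo n) is a made-up instance in which everybody knows exactly k persons.

```python
def maximum_matching(adj, n_right):
    """Bipartite matching by repeated augmenting-path search.

    adj[u] lists the right-hand vertices acquainted with left-hand vertex u.
    Returns (size, match_right), where match_right[v] is the left-hand
    vertex matched to v, or -1 if v is unmatched.
    """
    match_right = [-1] * n_right

    def try_assign(u, visited):
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            # v is free, or its current partner can be moved elsewhere.
            if match_right[v] == -1 or try_assign(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    size = 0
    for u in range(len(adj)):
        if try_assign(u, set()):
            size += 1
    return size, match_right


if __name__ == "__main__":
    n, k = 6, 3
    acquaintances = [[(i + j) % n for j in range(k)] for i in range(n)]
    print(maximum_matching(acquaintances, n))
```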
A graph is conceptually, a very uncomplicated structure. Most of the
graph theoretic concepts can be easily related to our everyday experience
and some of the theorems are so close to ‘common sense' that they are
sometimes criticised as lacking in depth. It is remarkable on this background
that these very concepts and these ‘utterly trivial' results of graph theory,
can work wonders when ingeniously applied. For example, certain non-
trivial theorems from other branches of mathematics can be proved using
some very elementary concepts about graphs. As illustrations of such
'theoretical' applications of graph theory, we shall prove two theorems.
One of them is a celebrated theorem of Nielsen and Schreier, which asserts,
among other things that a subgroup of a free group is free, a result we
mentioned without proof in Chapter 5, Section 3. The other result we shall
prove using graphs is called Brouwer’s fixed point theorem. It asserts that
if $f: D^n \to D^n$ is a continuous function where $D^n$ is the 'closed unit ball' in
the euclidean space $R^n$, i.e. $D^n = \{(x_1, ..., x_n) \in R^n : x_1^2 + x_2^2 + ... + x_n^2 \le 1\}$,
then f has a fixed point, i.e. a point $(a_1, ..., a_n) \in D^n$ such that
$f(a_1, ..., a_n) = (a_1, ..., a_n)$. This is one of the most classic theorems of continuous
mathematics. To prove it we shall first prove its discrete analogue, popularly
called Sperner's lemma, and then apply a limiting process, based on
the well-known Bolzano-Weierstrass theorem. The proof, therefore,
provides an excellent illustration of our symbolic equation in Chapter 1,
Section 1 that continuous mathematics equals discrete mathematics plus
the limiting process.
Graphs also have many ‘recreational' applications. We already mentioned
that many puzzles can be reduced to problems of finding certain paths in
suitably constructed graphs. It turns out that certain kinds of games can
also be represented in terms of graphs. The problem of finding a winning
strategy for such a game reduces to finding a certain set of vertices in its
associated graph, which leads to what is called ‘game theory’. We shall
take a brief look at it. The techniques are applicable not just to games but
in other competitive situations, like planning war strategies, economic
policies, etc.
(5) In Chapter 1, Section 2 we remarked that the problems of discrete
mathematics are generally finitistic in nature and hence can be done on a
computer. The real problem before a programmer is not to devise some
algorithm but to devise a good, i.e. an efficient algorithm. We take up this
line further in the chapter ‘Analysis of Algorithms’. This is a topic which
provides some of the most challenging problems of applied discrete
structures. The difficulty in fact begins from deciding what constitutes efficiency.
Various yardsticks for measuring efficiency will be discussed along with
illustrations. An important feature in the analysis of an algorithm is to
study what is called its asymptotic behaviour. The same algorithm operates
on different pieces of data. Let n be an integer variable which measures the
'size' of a given piece of data in some sense. As n increases, so will the time
it takes to process the data. Asymptotic behaviour involves the study of
how rapid this growth is. Methods from continuous mathematics are needed
to estimate the ‘order of magnitude‘ of a function. We shall study one such
method and apply it to derive two well-known asymptotic approximations.
The first says that for large n, the harmonic number $H_n$ (see Chapter 7,
Section 1) very nearly equals $\ln n + \gamma$, where $\gamma$ is a fixed constant. The
other result is called Stirling's formula, which asserts that for large n,
$n! \approx \sqrt{2\pi n}\, n^n e^{-n}$.
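The two approximations can be checked numerically. The following is only a sketch: it assumes the usual forms $H_n \approx \ln n + \gamma$ (with $\gamma$ Euler's constant, hard-coded below) and $n! \approx \sqrt{2\pi n}\,(n/e)^n$, and the function names are illustrative.

```python
import math

# Euler's constant, assumed here to be the fixed constant in H_n ~ ln n + gamma.
EULER_GAMMA = 0.5772156649015329


def harmonic(n):
    """Return H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def stirling(n):
    """Return the approximation sqrt(2*pi*n) * (n/e)**n to n!."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n


for n in (10, 100, 1000):
    # The gap shrinks roughly like 1/(2n).
    print(n, harmonic(n) - (math.log(n) + EULER_GAMMA))

for n in (5, 10, 20):
    # The ratio approaches 1 from below as n grows.
    print(n, stirling(n) / math.factorial(n))
```

Both printed columns show the error shrinking as n grows, which is what 'asymptotic' means here: the approximations are good for large n, not exact for any n.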
The Stirling formula and properties of search trees will be applied in
the study of sorting algorithms. The sorting problem is to list the elements
of a linearly ordered set in an ascending (or descending) order. Numerous
algorithms are known for sorting. We shall discuss four of them: Straight
Insertion, Bubble Sort, Merge Sort and Distribution Sort.
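As an illustration of what such an algorithm looks like, here is a minimal sketch of straight insertion that also counts comparisons. It is not taken from the text; the function name and the choice to count comparisons (one possible yardstick of efficiency) are assumptions made for the example.

```python
def straight_insertion_sort(items):
    """Sort a list in ascending order; return (sorted_list, comparisons)."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger elements one place right until key's slot is found.
        while j >= 0:
            comparisons += 1
            if a[j] <= key:
                break
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a, comparisons


print(straight_insertion_sort([5, 4, 3, 2, 1]))
print(straight_insertion_sort([1, 2, 3, 4, 5]))
```

On a list of n elements in reverse order the count is n(n - 1)/2, while on an already sorted list it is n - 1, so the same algorithm behaves very differently on different data of the same size.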
As a crude classification, an algorithm is considered efficient if the time it
takes to process data of size n is bounded by some polynomial
in n. Such algorithms are called polynomial time algorithms (or loosely
'good' algorithms). Problems for which such algorithms are available are
called (polynomially) tractable and their class is denoted by $\mathcal{P}$. For example,
the sorting problem is in $\mathcal{P}$. On the other hand, it is not known whether the
Travelling Salesman Problem is in $\mathcal{P}$ or not. No polynomial time algorithm
is known for it; nor is the non-existence of such an algorithm established.
This, in fact, is the case with hundreds of problems from highly diverse
branches of mathematics. What makes the situation so interesting is that
many of these problems are equivalent in some sense, so that the fate of
any one of them (namely, knowing whether it is in $\mathcal{P}$ or not) will decide
that of all of them. We shall see a few instances of such equivalences.
(6) We frequently come across optimisation problems, i.e. problems
where we have to maximise or minimise some real-valued function subject
to certain constraints. In symbols, we have to maximise (or minimise) a
function $f: D \to \mathbb{R}$ where D is a subset of $\mathbb{R}^n$ consisting of all points $(x_1, \ldots, x_n)$
which simultaneously satisfy constraints of the form $g_i(x_1, \ldots, x_n) \le c_i$ for
$i = 1, \ldots, k$ (say), where $g_1, \ldots, g_k$ are some real-valued functions on $\mathbb{R}^n$ and
$c_1, \ldots, c_k$ are some given constants. The simplest case is the one where the
functions $f, g_1, \ldots, g_k$ are all linear. Such problems are called linear programming
problems. The Diet Problem is a typical such problem. In the chapter
‘Linear Programming‘, we shall give a systematic method, called the
simplex method for solving such problems. We shall also consider briefly
variations of linear programming such as integer programming (see the
comments on the Cattle Problem).
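To make the shape of a linear programming problem concrete, here is a small sketch in two variables. It is not the simplex method: it simply tries every intersection point of two constraint boundaries, which is workable only for tiny problems with a bounded feasible region. The particular objective and constraints are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical problem: maximise 3x + 2y subject to the constraints below.
# Each constraint (a, b, c) stands for a*x + b*y <= c.
CONSTRAINTS = [
    (1, 1, 4),    # x + y <= 4
    (1, 3, 6),    # x + 3y <= 6
    (1, 0, 3),    # x <= 3
    (-1, 0, 0),   # x >= 0
    (0, -1, 0),   # y >= 0
]
OBJECTIVE = (3, 2)


def feasible(point, constraints):
    x, y = point
    return all(a * x + b * y <= c for a, b, c in constraints)


def vertices(constraints):
    """Yield the feasible intersection points of pairs of boundary lines."""
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries do not meet in a single point
        x = Fraction(c1 * b2 - c2 * b1, det)
        y = Fraction(a1 * c2 - a2 * c1, det)
        if feasible((x, y), constraints):
            yield (x, y)


def maximise(objective, constraints):
    """Return a vertex of the feasible region where the objective is largest."""
    p, q = objective
    return max(vertices(constraints), key=lambda v: p * v[0] + q * v[1])


print(maximise(OBJECTIVE, CONSTRAINTS))
```

The sketch relies on the fact that a linear function on a bounded region cut out by linear constraints attains its maximum at a corner of the region; exact rational arithmetic avoids rounding issues when testing feasibility.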
Although the simplex method works quite well for most linear programming
problems, in some pathological cases it is not efficient. It is not
a polynomial time algorithm. Recently, a polynomial time algorithm for
linear programming has been invented by Karmarkar. We shall study it
briefly.
Answers to Exercises
These answers are not to be looked at as complete solutions. They
are merely meant to help the reader after he has honestly tried a problem.
If he feels he has a correct solution, he may compare it with the answer
here. If the two match, most likely he is right. If they don't, he is advised
to check his solution again and/or to discuss it with others. If he still
believes the answer given here is incorrect, he is urged to write to the
author. In the case of thought-oriented problems, the answers given are
often in the form of extended hints. Again, if the reader believes there is a
flaw in the answer given here or that he has a more elegant solution, the
author would appreciate hearing from him.
In case the reader is unable to do a problem, he is strongly urged to
resist the temptation to give up too soon and look up the answer. Keep
trying. Do a few special cases of the problem to see if they give any clues.
Discuss the problem with others. If nothing comes up, simply leave the
problem temporarily. A flash may come after a few days. (In the personal
experience of the author, some of the problems took several months.) Look
up the solution only when you are convinced it is not worth your time to
try further.
No answers are provided in some of the following cases:
(i) where the problem asks for a straightforward verification of
something or for a routine calculation
(ii) where the problem is intentionally vague, as it asks the reader to
give examples of a general nature or make comments
(iii) where the hint given is either sufficiently detailed or sufficiently
incisive to reduce the rest of the work to a routine
(iv) where a reference has been given to the solution in the ‘Notes and
Guide to Literature’
(v) where, to the author’s knowledge, the problem is an unsolved
one.
CHAPTER 1
Section 1.1
1. Let the lengths be $\lambda_1$, $\lambda_2$. If $\lambda_1/\lambda_2 = p/q$, then $\lambda_1 = pu$ and $\lambda_2 = qu$,
where $u = \lambda_2/q$ is taken as the unit of length. The converse is clear.
This amounts to showing that $\sqrt{2}$ is irrational. If not, let $\sqrt{2} = p/q$
where p, q are positive integers with no common divisor other
than $\pm 1$. Then $p^2 = 2q^2$, whence $p^2$ and hence p is even, say
$p = 2k$. But then $q^2 = 2k^2$, whence q is also even, a contradiction.
Depending upon whether n is even or odd there do or do not exist
two houses exactly $\tfrac{1}{2}$ km apart.
$2a - a^2$.
2% where m is the unique integer such that
_< <:m+1
Ill
(IE f; ”11,310 f,, it]: house wheref, = max f.
0‘ I‘M
$\int_0^1 f(x)\,dx$, $\int_0^1 f(x)\,dx$, point in [0, 1] (if any) where f is maximum.
Let $f(x, y)$ = population density at $(x, y)$
$= \lim_{A \to 0} \dfrac{\text{population on } R}{A}$,
where R is a region containing $(x, y)$ and A is its area. The total
population now is $\iint_D f(x, y)\,dA$, and the average population density
equals
$\dfrac{\iint_D f(x, y)\,dA}{\text{area of } D}$.
%.(The distance between two distinct houses can have value 1—0
with probability1 155 ifori = 1, 2 ,10.)
10. $\tfrac{1}{3}$. (Let $\alpha$ be the distance between two houses. Then $\alpha$ assumes
values in the interval [0, 1]. For any r, $r + \Delta r$ in [0, 1], the probability
that $\alpha$ lies in $[r, r + \Delta r]$ is, by Exercise (1.4) above,
$2(r + \Delta r) - (r + \Delta r)^2 - 2r + r^2$.
Dividing by $\Delta r$ and taking limits as $\Delta r \to 0^+$, the probability
distribution of $\alpha$ is $p(r) = 2 - 2r$. The average distance
$= \int_0^1 r(2 - 2r)\,dr$.)
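The value of the integral of r(2 - 2r) over [0, 1] can be cross-checked numerically, together with a discrete version. The sketch below assumes, purely for illustration, houses placed at the equally spaced points 0, 1/n, ..., 1; that placement and the function names are not taken from the exercise.

```python
from fractions import Fraction


def discrete_average_distance(n):
    """Average distance between two distinct houses at 0, 1/n, ..., 1 (assumed layout)."""
    points = [Fraction(k, n) for k in range(n + 1)]
    pairs = [(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    return sum(q - p for p, q in pairs) / len(pairs)


def continuous_average_distance(steps=10000):
    """Midpoint-rule estimate of the integral of r * (2 - 2r) over [0, 1]."""
    h = 1.0 / steps
    return sum((k + 0.5) * h * (2 - 2 * (k + 0.5) * h) for k in range(steps)) * h


print(discrete_average_distance(10))
print(discrete_average_distance(1000))
print(continuous_average_distance())
```

Under the assumed layout the discrete average works out to (n + 2)/(3n), which tends to 1/3 as n grows, in line with the continuous answer.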
BMW—9)+SS—m(m+l) fl< m+I
ll. where 10 <u < T
11
(In an integer),
(Inf4l¢——) Efi.
12. eta—Hg] Ia—xuflxwx/j and»
13. (1+ $1).
14. 21110.
Section 1.2
Every number except the largest one must appear as the smaller
in at least one comparison.
Let numbers correspond to players and suppose when two numbers
‘play' (i.e. are compared), the larger is the winner.
Minimum of x, y is $\tfrac{1}{2}(x + y - |x - y|)$.
Let $u_n = \max\{x_1, \ldots, x_n\}$. Then for $n \ge 2$,
$u_n = \tfrac{1}{2}(u_{n-1} + x_n + |u_{n-1} - x_n|)$.
No, because to find |x| you still have to compare x with 0.
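The formulas for the minimum and maximum in terms of absolute values, and the recursion for the maximum of n numbers, can be sketched as follows. The function names are illustrative only.

```python
from functools import reduce


def max2(x, y):
    """Maximum of two numbers via (x + y + |x - y|) / 2."""
    return (x + y + abs(x - y)) / 2


def min2(x, y):
    """Minimum of two numbers via (x + y - |x - y|) / 2."""
    return (x + y - abs(x - y)) / 2


def max_of(values):
    """Fold max2 over the list: u_n = max2(u_{n-1}, x_n)."""
    return reduce(max2, values)


print(max2(3, 7), min2(3, 7), max_of([4, -1, 9, 2]))
```

As the answer points out, this does not avoid comparisons: the call to `abs` hides a comparison with 0.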
Let a, b, c, ... denote the states and 1, 2, 3, ... the religions. Let $x_i$
denote the representative of religion i from state x. For n = 3, a
desired arrangement is
$a_1$  $b_2$  $c_3$
$b_3$  $c_1$  $a_2$
$c_2$  $a_3$  $b_1$
For n = 2, without loss of generality, assume one of the rows of the
arrangement is $(a_1, b_2)$. Now the first column must be $\binom{a_1}{b_2}$. So
$b_2$ appears twice.
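The claim that an arrangement exists for n = 3 but not for n = 2 can be checked by brute force. The sketch below models an arrangement as a pair of Latin squares (one for states, one for religions) that are orthogonal, i.e. every (state, religion) pair occurs exactly once; this modelling and all names are assumptions made for the example.

```python
from itertools import permutations, product


def latin_squares(n):
    """Return all n x n Latin squares on the symbols 0..n-1 as tuples of rows."""
    rows = list(permutations(range(n)))
    squares = []
    for candidate in product(rows, repeat=n):
        columns_ok = all(len({row[c] for row in candidate}) == n for c in range(n))
        if columns_ok:
            squares.append(candidate)
    return squares


def orthogonal(a, b):
    """True if superimposing a and b yields every ordered pair exactly once."""
    n = len(a)
    pairs = {(a[r][c], b[r][c]) for r in range(n) for c in range(n)}
    return len(pairs) == n * n


def has_orthogonal_pair(n):
    squares = latin_squares(n)
    return any(orthogonal(a, b) for a in squares for b in squares)


print(has_orthogonal_pair(2), has_orthogonal_pair(3))
```

The exhaustive search is only practical for very small n; the number of Latin squares grows far too quickly for this approach to settle a case such as n = 6.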
4, 5, 7, 8 and 9 are prime powers. For these values of n, there
exists a field with n elements (see the comments following Corollary
(6.3.11)) using which the problem can be solved (cf. Exercise
(6.1.33)). The case n = 6 was disposed of by Tarry, who exhaustively
checked all possibilities.
Let y be the second largest element. In order to establish that y is
not the largest, y must have been 'defeated' sometime in the execution
of the algorithm. But the only element which defeats y is the
largest element.
The third largest element must have been defeated at least twice.
Given x, y, z, if we compare x with y and then the smaller of the
two with z we know the third largest without knowing the first
and the second largest.
10. Determine the largest out of 11 elements in 10 comparisons by the
process in the Tournaments Problem. The second largest element
must be from among those that have been compared with the largest.
There are at most 4 such elements. So with 3 more comparisons,
the second largest can be determined.
11. In general the second largest element of n elements can be found
with $n - 2 + \lceil \log_2 n \rceil$ comparisons, where $\lceil \log_2 n \rceil$ is the least integer
$\ge \log_2 n$.
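The tournament idea behind this bound can be sketched as follows: run a knockout to find the largest element, remember whom each winner has beaten, and then search only among the elements beaten by the champion. The pairing scheme (adjacent pairs, with a bye for an odd player out) and the names are assumptions made for the example; the input is assumed to hold at least two distinct numbers.

```python
import math


def comparison_bound(n):
    """The bound n - 2 + ceil(log2 n) on the number of comparisons."""
    return n - 2 + math.ceil(math.log2(n))


def second_largest(values):
    """Return (second largest, comparisons used) for a list of distinct numbers."""
    comparisons = 0
    # Each entry is (value, list of values it has beaten so far).
    players = [(v, []) for v in values]
    while len(players) > 1:
        next_round = []
        for i in range(0, len(players) - 1, 2):
            (a, beaten_a), (b, beaten_b) = players[i], players[i + 1]
            comparisons += 1
            if a > b:
                next_round.append((a, beaten_a + [b]))
            else:
                next_round.append((b, beaten_b + [a]))
        if len(players) % 2 == 1:
            next_round.append(players[-1])  # the odd player out gets a bye
        players = next_round
    champion, beaten = players[0]
    # The runner-up must be among those who lost directly to the champion.
    runner_up = beaten[0]
    for v in beaten[1:]:
        comparisons += 1
        if v > runner_up:
            runner_up = v
    return runner_up, comparisons


print(second_largest([7, 3, 10, 1, 9, 4, 8, 2, 6, 5, 0]), comparison_bound(11))
```

With this pairing the knockout has $\lceil \log_2 n \rceil$ rounds and nobody plays more than once per round, so the champion beats at most that many opponents, which is where the logarithmic term comes from.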
Section 1.3
1. If n = 2m, in the first round there are m matches. Out of the m
winners, the champion is found by playing m - 1 more matches by
induction hypothesis. A similar reasoning holds if n is odd.
i. i. and {-
Each solution corresponds to an ordered triple (i, j, k) of non-negative
integers for which 3i + 2j + k = 20, k being
determined by i and j. Possible values of i and j are given by the
following table:
Value of i    Possible values of j
0             0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
1             0, 1, 2, 3, 4, 5, 6, 7, 8
2             0, 1, 2, 3, 4, 5, 6, 7
3             0, 1, 2, 3, 4, 5
4             0, 1, 2, 3, 4
5             0, 1, 2
6             0, 1
4. Out of the 44 solutions above we take only those where each of i, j, k
lies between 2 and 4. Out of the 9 possible solutions in the table
above, the requirement $2 \le k \le 4$ rules out 6 of them. So there are
only 3 combinations.
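The counts 44 and 3 can be confirmed by direct enumeration of the triples (i, j, k) with 3i + 2j + k = 20. The function below is a sketch with illustrative parameter names.

```python
def solutions(total=20, low=0, high=None):
    """List triples (i, j, k) with 3i + 2j + k = total and low <= i, j, k <= high."""
    if high is None:
        high = total
    found = []
    for i in range(low, high + 1):
        for j in range(low, high + 1):
            k = total - 3 * i - 2 * j
            if low <= k <= high:
                found.append((i, j, k))
    return found


print(len(solutions()))
print(solutions(low=2, high=4))
```

The second call lists the surviving triples explicitly, so the three combinations can be read off rather than merely counted.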
If not, let $p_1, \ldots, p_n$ be the only primes. Let q be a prime dividing
$p_1 p_2 \cdots p_n + 1$. Then $q \ne p_i$ for $i = 1, \ldots, n$, as otherwise $p_i$ would
divide 1.
If a business has an import license then by (ii) it must employ
skilled personnel while by (i) it must not! On the other hand, if the
business has no import license, then by (iii) it cannot employ local
personnel and hence by (ii) it must employ local personnel!
Let $s_n = 1 + 2 + \cdots + (n - 1) + n$.
Then $s_n = n + (n - 1) + \cdots + 2 + 1$.
Adding, $2s_n = (n + 1) + (n + 1) + \cdots + (n + 1) + (n + 1)$
[n times].
So $s_n = \dfrac{n(n + 1)}{2}$.
i a, = '2 a,“ + i k aives. after cancellations.
3-1 k-l 1-1
9. (b) Let $c_n = F_{n+1}F_{n-1} - F_n^2$. Writing $F_{n+1}$ as $F_n + F_{n-1}$ and $F_n^2$
as $(F_{n-1} + F_{n-2})F_n$, we see that $c_n = -c_{n-1}$ for all $n \ge 2$.
Since $c_1 = -1$, the result follows by induction on n.
(c) Let q = kp where k is an integer $\ge 2$. Put n = kp - p and
m = p in (a).
(d) ‘From 0): F: + F3“ = n+|F~ + Fn+1F11—1 = Fun by (3)-
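The alternating-sign argument in part (b) can be checked numerically. The sketch below assumes the quantity in question is $c_n = F_{n+1}F_{n-1} - F_n^2$ and that the Fibonacci numbers are indexed with $F_0 = 0$, $F_1 = 1$; both are assumptions, chosen because they give $c_1 = -1$.

```python
def fibonacci(count):
    """Return [F_0, F_1, ..., F_count] with F_0 = 0, F_1 = 1 (assumed convention)."""
    fib = [0, 1]
    while len(fib) <= count:
        fib.append(fib[-1] + fib[-2])
    return fib


def cassini_values(count):
    """Return {n: F_{n+1} * F_{n-1} - F_n ** 2} for n = 1, ..., count - 1."""
    fib = fibonacci(count)
    return {n: fib[n + 1] * fib[n - 1] - fib[n] ** 2 for n in range(1, count)}


print(cassini_values(8))
```

Under these assumptions the values alternate between -1 and 1, which is exactly what the relation $c_n = -c_{n-1}$ predicts.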
10. The desired point P must be within (or on the boundary of) the
triangle ABC. For otherwise P is on the 'wrong' side of at least one
line, say, BC. But then its reflection, say Q, in BC will be a better
choice, since |QB| = |PB| and |QC| = |PC| but |QA|
< |PA|. Now call A, B, C as $A_1$, $A_2$, $A_3$ respectively. Let
$A_i = (x_i, y_i)$. If P = (x, y) is a point in the triangle, we have to
minimise $f(x, y) = \sum_{i=1}^{3} [(x - x_i)^2 + (y - y_i)^2]^{1/2}$. Setting the partial
derivatives of f equal to 0 gives $\cos\theta_1 + \cos\theta_2 + \cos\theta_3 = 0$ and
$\sin\theta_1 + \sin\theta_2 + \sin\theta_3 = 0$, where $\theta_i$ is the angle $PA_i$ makes with
the x-axis. From this,
$1 = \cos^2\theta_3 + \sin^2\theta_3 = (\cos\theta_1 + \cos\theta_2)^2 + (\sin\theta_1 + \sin\theta_2)^2$,
which ultimately gives $\cos(\theta_2 - \theta_1) = -\tfrac{1}{2}$ or
$\theta_2 - \theta_1 = 120$ degrees. Similarly $\theta_3 - \theta_2 = 120$ degrees. This
yields the answer in the text.
12. Let |BC| = 2x and O be the mid-point of BC. Then |OA| = x. Let
$|OP| = \lambda x$ where $0 < \lambda < 1$. Of the three tours, two have lengths
$(1 - \lambda)x + x\sqrt{1 + \lambda^2} + 2x + \sqrt{2}\,x$ while the third has length
$2x\sqrt{1 + \lambda^2} + 2\sqrt{2}\,x$. The three tours are of equal lengths iff
$\lambda = \dfrac{9 - 4\sqrt{2}}{7}$. For $\lambda < \dfrac{9 - 4\sqrt{2}}{7}$ the third tour is shorter.
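The critical value of $\lambda$ can be checked numerically. The sketch assumes the configuration that the tour lengths suggest: a right-angled isosceles triangle with |BC| = 2x, |OA| = x, |AB| = |AC| = $\sqrt{2}\,x$, and P on OA with |OP| = $\lambda x$. The value $(9 - 4\sqrt{2})/7$ is obtained by equating the two tour lengths under that assumption.

```python
import math


def tour_lengths(lam, x=1.0):
    """Return (length of a tour using edge PA, length of the tour avoiding PA)."""
    pa = (1 - lam) * x                  # |PA|, since |OA| = x and |OP| = lam * x
    pb = x * math.sqrt(1 + lam ** 2)    # |PB| = |PC|
    ab = math.sqrt(2) * x               # |AB| = |AC|
    bc = 2 * x
    return pa + pb + bc + ab, 2 * pb + 2 * ab


# Value of lambda at which all three tours have the same length (assumed setup).
CRITICAL = (9 - 4 * math.sqrt(2)) / 7

for lam in (0.2, CRITICAL, 0.8):
    print(round(lam, 4), tour_lengths(lam))
```

Below the critical value the tour avoiding PA is the shorter one; above it the other two tours win.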
13. [Figure: a graph in the xy-plane; the labelled points include (0, 450/7) on the y-axis and (100, 0) on the x-axis.]
Section 1.4
1. (i) There exists a man for whom there exists no woman who loves
him.
(ii) Gopal is either unintelligent or poor.
(iii) Gopal is either unintelligent or rich.
(iv) Gopal is neither intelligent nor rich.
(v) It rains and the streets do not get wet.
(vi) There exists $\epsilon > 0$ such that for every $\delta > 0$, there exist x, y
in X such that $|x - y| < \delta$ and $|f(x) - f(y)| \ge \epsilon$.
2. In general the negation of 'p -> q' is 'p and ~q', which is quite
different from p -> ~q.
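A truth table makes the point mechanically. The sketch below compares the negation of 'p -> q' with 'p and ~q' and with 'p -> ~q' on all four truth assignments; the variable names are illustrative.

```python
from itertools import product


def implies(p, q):
    """Truth value of 'p -> q'."""
    return (not p) or q


rows = list(product([True, False], repeat=2))

# ~(p -> q) agrees with (p and ~q) on every row of the truth table ...
negation_matches = all((not implies(p, q)) == (p and not q) for p, q in rows)

# ... but differs from (p -> ~q) on at least one row.
differs_from_p_implies_not_q = any(
    (not implies(p, q)) != implies(p, not q) for p, q in rows
)

print(negation_matches, differs_from_p_implies_not_q)
```

For instance, when p and q are both false, 'p -> q' is true, so its negation is false, while 'p -> ~q' is true.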
4. As a simple example, take the theorem in geometry which says that
a quadrilateral is a parallelogram iff its opposite sides are equal.
5. 'If two triangles are congruent, they have equal areas', or 'If a
series $\sum a_n$ is convergent then $a_n \to 0$ as $n \to \infty$'.
6. If a triangle ABC is equilateral then cos A + cos B + cos C = 3/2.
7. If $p \to q$ is true then $q \to r$ is stronger than $p \to r$.
8. (i) q is strictly stronger than p.
(ii) p is strictly stronger than q.
(iii) q is strictly stronger than p.
(iv) q is strictly stronger than p.
(v) p is strictly stronger than q. (When some men are poor, q holds
vacuously, but p may fail.)
(vi) The two are incomparable.
(vii) p is strictly stronger than q.
(viii) The two are equivalent.
9. (i), (iv), (v) are valid, the others are invalid. [In (ii), John is not
given to be a man.]
10. The circle with AC as a diameter need not be the same as the one
on which A, B, C, D are given to lie.
11. In mathematics, every term in the definition of a new term is either
previously defined or a primitive term.
12. Suppose the words encountered are $p_1, p_2, p_3, \ldots$. Since there are
only finitely many words, $p_i = p_j$ for some $i < j$. Then
$p_i, p_{i+1}, \ldots, p_{j-1}$
is a vicious circle.
CHAPTER 2
Section 2.1
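2. See Theorem (4.1.3).
4. (a) $x \in A \Rightarrow x \in A \cup B$. So $A \subset A \cup B$. Similarly $B \subset A \cup B$.
So (i) holds. For (ii), let $x \in A \cup B$. Then $x \in A$ or $x \in B$.
In either case $x \in C$. So $A \cup B \subset C$.
(b) $A \cap B$ is the largest subset of X contained in both A and B.
5. If $A_i \subset X_i$ for $i = 1, \ldots, n$ then $A_1 \times A_2 \times \cdots \times A_n \subset X_1 \times \cdots \times X_n$.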
(AIXBs)n(A.XBa)=(AsnAs)X(51nBl).A XB=(AI)(B‘)
UUIXBJUMsXWUbtJ.
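(i), (ii), (iv), (vi) are functions; (ii) is surjective, (iv) is injective.
None is bijective.
Define $f: (X \times Y) \times Z \to X \times (Y \times Z)$ by $f((x, y), z) = (x, (y, z))$
and $g: X \times Y \to Y \times X$ by $g(x, y) = (y, x)$ for all $x \in X$, $y \in Y$,
$z \in Z$.
Let $\mathbb{R}^+ = \{x \in \mathbb{R} : x > 0\}$. Let $f: \mathbb{R}^+ \to \mathbb{R}$ be $f(x) = \sqrt{x}$ for
$x \in \mathbb{R}^+$ and $g: \mathbb{R} \to \mathbb{R}$ be $g(y) = \sin y$ for $y \in \mathbb{R}$. Then $g \circ f$ is
defined but not $f \circ g$.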
11. (a) (i) $\Rightarrow$ (ii). We always have $A \subset f^{-1}(f(A))$. Conversely, let
$x \in f^{-1}(f(A))$. Then $f(x) \in f(A)$. So there is some $y \in A$ such
that $f(x) = f(y)$. Since f is one-to-one, x = y. So $x \in A$.
(ii) $\Rightarrow$ (i). Let $f(x) = f(y)$. Call {x} as A. Then $y \in f^{-1}(f(A)) = \{x\}$.
So y = x. Similarly each statement is proved equivalent
to (i). In the implication (i) $\Rightarrow$ (iv), fix some $a \in X$. For $y \in Y$
define $g(y) = x$ if $y = f(x)$ for some $x \in X$ (which is unique
by (i)) and $g(y) = a$ otherwise.
(b) The corresponding statements are:
(ii) for every $B \subset Y$, $f(f^{-1}(B)) = B$
(iii) for every $B_1, B_2 \subset Y$, $B_1 \cap B_2 = f(f^{-1}(B_1) \cap f^{-1}(B_2))$
(iv) for any set Z and any two functions $g, h: Y \to Z$,
$g \circ f = h \circ f$ implies $g = h$.
(v) there exists a function $g: Y \to X$ such that $f \circ g = \mathrm{id}_Y$. (To
construct g, for each $y \in Y$ choose some x such that
$f(x) = y$. This requires the axiom of choice.)
(c) Let $X \xrightarrow{f} Y \xrightarrow{g} Z$. Then
(i) $g \circ f$ is injective iff f is injective and $g/f(X)$ is injective.
(ii) $g \circ f$ is surjective iff there exists $h: Z \to Y$ such that
$g \circ h = \mathrm{id}_Z$ and $h(Z) \subset f(X)$.
(iii) $g \circ f$ is bijective iff f is injective and $g/f(X)$ is bijective.
12. $f(x_0) = x_0 \Rightarrow f(f(x_0)) = f(x_0) = x_0$. The converse is false. Let $X = \mathbb{R} - \{0\}$
and $f: X \to X$ be $f(x) = -x$ for all $x \in X$. Then f has no
fixed point. But $f^2 = \mathrm{id}_X$ has every $x \in X$ as a fixed point.
13. $f_{A \cap B}(x) = f_A(x) f_B(x)$, $f_{A \cup B}(x) = \max\{f_A(x), f_B(x)\}$, $f_{X - A}(x) = 1 - f_A(x)$
for all $x \in X$. $f_\emptyset$ is identically 0. $f_X$ is identically 1 on X.
14. (01m R n<M—I) (ii) (M—I) n(HUR) (iii) R n (M—H)
(= R— H)(iv)(M mun
15. (i)./(M)(ii) W—flM) (iii) R 0 14(3) (iv) (W-flM)) -U f(I-H-R)
(0H nf-‘(W —
16. See Chapter 4, Section 1.
17. Once a player loses a match, he never plays again. So the same
player cannot be a loser in more than one match. The range is the
set of all players except the champion.
18. (a) Define h: X+Xx (y) by h(x) = (x. y) for x e X. Then his
a bijection.f, = (fl)! x {y}) oh. If we identify (x. y) with x
(or equivalently treat I! as the identity function) then
f; =f/X X 0')-
(b) f(x, y) is nothing butfly) evaluated at x.
(0) Define a: zm —> (2!)! by (x f) = firm 0 i. a bijection.
Section 2.2
3. The pigeonhole principle results by taking each u, = l.
4. Consider the two neighbours of 12. At least one of them exceeds
1. Or use the generalized pigeon-hole principle. Think of each
adjacent pair as a box. The sum of the contents of the 12 boxes is
156. No two overlapping boxes can have the same sum.
5. In the clock positions 1 to 12 put, for example, 1, 7, 6, 8, 5, 9, 4,
10, 3,11, 2, and 12.
6. Divide the triangle into four mutually congruent subtriangles by
lines parallel to sides. At least two points are in the same subtri-
angle.
7. Let S be the set of selected integers; A = {1, ..., 100} and B = {101,
..., 200}. The idea is to show that the selection of k integers from
A prevents the selection of at least k integers from B, whence |S|
cannot exceed 100. Let $A_i = \{x \in A : 2^i x \in B\}$ for $i = 1, \ldots, 7$.
If $x \in A_i$ is in S, it forces $2^i x$ to be out of S. Show that the same
$y \in B$ does not get excluded because of two different x's.
Consider the remainder after division by n.
Represent the six persons by six vertices of a hexagon. Colour the
edge joining two vertices blue or red according as the corresponding
two persons know or do not know each other. The problem is
equivalent to showing that there is at least one triangle whose three
edges have the same colour. Pick any vertex x. Of the 5 edges
meeting at x at least three, say, xa, xb, xc have the same colour.
If ab also has this colour, xab is a desired triangle. Similarly for
bc and ca. Otherwise abc is a desired triangle.
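The statement about six persons can also be confirmed by exhaustive search over all two-colourings of the 15 edges, and the same search shows that five persons are not enough. This is a brute-force sketch with illustrative names, not the argument of the answer.

```python
from itertools import combinations, product


def has_monochromatic_triangle(n, colouring):
    """colouring maps each edge (i, j) with i < j to a colour (0 or 1)."""
    return any(
        colouring[(a, b)] == colouring[(a, c)] == colouring[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )


def every_colouring_has_triangle(n):
    """Check all 2-colourings of the edges of the complete graph on n vertices."""
    edges = list(combinations(range(n), 2))
    for colours in product((0, 1), repeat=len(edges)):
        if not has_monochromatic_triangle(n, dict(zip(edges, colours))):
            return False
    return True


print(every_colouring_has_triangle(5), every_colouring_has_triangle(6))
```

For six vertices there are $2^{15} = 32768$ colourings, which is small enough to enumerate; the pigeonhole argument in the answer explains why none of them escapes.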
11. In the pigeonhole principle, r = 1, k = 2 and N = n + 1. In
Exercise (2.9), n = 2, r = 2, k = 3 and N = 6.
12. k".
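13. Let $X = \{x_1, \ldots, x_{n-1}, x_n\}$; $Y = \{x_1, \ldots, x_{n-1}\}$. Let $\mathcal{P} = \{A \subset X : x_n \in A\}$
and $\mathcal{Q} = \{A \subset X : x_n \notin A\}$. Apply induction hypothesis to $\mathcal{P}$, $\mathcal{Q}$.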
14. No, because a binary sequence of length n is nothing but a function
from $\{x_1, \ldots, x_n\}$ into $\mathbb{Z}_2$.
15. ${}_nP_2 = |X \times X - \Delta X|$ where X is any set with n elements and $\Delta X$
is the 'diagonal' on X, i.e. $\Delta X = \{(x, x) : x \in X\}$.
16. (i) $\binom{n}{k}$ is the number of k-subsets of an n-set.
(ii) Suppose from n objects we choose k and put a white tag on
the selected objects. Then out of these k objects we select r
and put a black tag on those selected. This is equivalent to
first selecting r objects (and putting a white and a black tag on
each) and then selecting k - r objects from the remaining n - r
and putting a white tag on the selected objects.
(iii) Let $X = Y \cup Z$ where $Y \cap Z = \emptyset$ and |Y| = |Z| = n. If
two objects are chosen from X, either both can be from Y or
both from Z or one from Y and one from Z.
(iv) Let T be the set of all bijections from {1, 2, ..., n} to itself.
For $i = 1, \ldots, n$, let
$T_i = \{f \in T : i \text{ is the smallest integer such that } f(i) \ne i\}$. Then $|T_i| = (n - i) \times (n - i)!$.
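The counting arguments in (ii), (iii) and (iv) appear to establish, respectively, $\binom{n}{k}\binom{k}{r} = \binom{n}{r}\binom{n-r}{k-r}$, $\binom{2n}{2} = 2\binom{n}{2} + n^2$ and $\sum_{i=1}^{n} (n - i)(n - i)! = n! - 1$. These readings are inferred from the arguments rather than quoted from the exercise; under that assumption they can be spot-checked as follows.

```python
from math import comb, factorial


def identity_ii(n, k, r):
    """C(n, k) * C(k, r) == C(n, r) * C(n - r, k - r)."""
    return comb(n, k) * comb(k, r) == comb(n, r) * comb(n - r, k - r)


def identity_iii(n):
    """C(2n, 2) == 2 * C(n, 2) + n ** 2."""
    return comb(2 * n, 2) == 2 * comb(n, 2) + n ** 2


def identity_iv(n):
    """Sum over i = 1..n of (n - i) * (n - i)! == n! - 1."""
    return sum((n - i) * factorial(n - i) for i in range(1, n + 1)) == factorial(n) - 1


print(identity_ii(7, 4, 2), identity_iii(5), identity_iv(6))
```

In (iv) the missing 1 on the right-hand side corresponds to the identity bijection, which belongs to none of the sets $T_i$.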
17. (a) [Figure: a region in the plane with axes x and y.]
(b) Both sides equal | S | .
18. Let X, Y be the sets of boys and girls respectively. Let
$S = \{(x, y) \in X \times Y : x \text{ and } y \text{ dance together}\}$. In the notation of the
last exercise, for each $x \in X$, $|G_x| = 2$. So
$\sum_{y \in Y} |F_y| = \sum_{x \in X} |G_x| = 2|X|$,
which would be impossible if $|F_y| \le 2$ for all $y \in Y$,
since |Y| < |X|.
19. (i) Consider the number of arrangements of 0: objects each of n
types and p objects each of q types.
(ii) Write $\dfrac{(n^2)!}{(n!)^n}$ as $p_1 p_2 \cdots p_n$, where for $i = 1, \ldots, n$,
$p_i = \dfrac{(ni - n + 1)(ni - n + 2) \cdots (ni)}{n!}$.
Then $p_i = i q_i$ where $q_i = \dfrac{(ni - n + 1) \cdots (ni - 1)}{(n - 1)!}$.
So $\dfrac{(n^2)!}{(n!)^n} = n!\, q_1 q_2 \cdots q_n$. Each $q_i$ is an integer. For a combinatorial
proof, suppose there are n colours and there are n
chips of each colour. Count the number of different colour
contrast patterns that result by arranging these $n^2$ chips in a
row. Note that two contrast patterns are to be regarded the
same if one results from the other by a permutation of the
colours.
1.. 3_§5.364. ...(366 — n)
( 5)“ '
21. For an integer $k \ge 2$, the line x + y = k meets $\mathbb{N} \times \mathbb{N}$ in k - 1
points.
22. All assertions hold trivially for finite sets. So assume 'countable‘
means ‘denumerable’.
(i) Let $f: \mathbb{N} \to X$ be a bijection. Let $Y \subset X$. Then $f^{-1}(Y) \subset \mathbb{N}$.
So it suffices to show that every subset of $\mathbb{N}$ is countable. Let
$S \subset \mathbb{N}$. If S is not finite, let $n_1$ be the smallest integer in S.
For k > 1, define $n_k$ inductively as the smallest integer in
$S - \{n_1, \ldots, n_{k-1}\}$. Then $g: \mathbb{N} \to S$, $g(k) = n_k$, is a bijection.
(ii) Let $f: X \to Y$ be surjective. Then there exists $g: Y \to X$ such
that $f \circ g = \mathrm{id}_Y$. Let Z = g(Y). Then g gives a bijection between
Y and Z. If X is countable, so is Z and hence Y.
(iii) Let $g: \mathbb{N} \to X$, $h: \mathbb{N} \to Y$ be bijections. Then $\theta: \mathbb{N} \times \mathbb{N} \to X \times Y$,
defined by $\theta(m, n) = (g(m), h(n))$, is a bijection. Since
$\mathbb{N} \times \mathbb{N}$ is countable, so is $X \times Y$.
(iv) Let $g: \mathbb{N} \to X$, $h: \mathbb{N} \to Y$ be bijections. Define $f: \mathbb{N} \times \{1, 2\} \to X \cup Y$
by f(n, 1) = g(n) and f(n, 2) = h(n) for $n \in \mathbb{N}$. Then
f is a surjection. Apply (iii) and (ii) to get the countability of
$\mathbb{N} \times \{1, 2\}$ and of $X \cup Y$.
(v) Let X be denumerable. For each positive integer r, find an
injective function from $P_r(X)$ into $X^r$. Hence $P_r(X)$ is countable.
Let $f_r: P_r(X) \to \mathbb{N}$ be a bijection. Let F(X) = set of all
non-empty, finite subsets of X. Define $f: F(X) \to \mathbb{N} \times \mathbb{N}$ by
$f(A) = (f_r(A), r)$ where r = |A|. Then f is a bijection. So F(X)
is countable. By (iv), $F(X) \cup \{\emptyset\}$ is also countable.
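23. $\mathbb{Z} = \mathbb{N} \cup \{0\} \cup (-\mathbb{N})$ where $-\mathbb{N}$ is the set of all negative integers.
All three are countable. Similarly write $\mathbb{Q} = \mathbb{Q}^+ \cup \{0\} \cup \mathbb{Q}^-$ where
$\mathbb{Q}^+$, $\mathbb{Q}^-$ are sets of positive and negative rationals respectively. Since
$f: \mathbb{N} \times \mathbb{N} \to \mathbb{Q}^+$ defined by f(m, n) = m/n is a surjection, $\mathbb{Q}^+$ is
countable, which implies $\mathbb{Q}^-$ is also countable.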
25. Go on picking one element at a time.
26. Let Z be a denumerable subset of X. Then $Z \cup Y$ is countable. Get a
bijection between Z and $Z \cup Y$ and extend it suitably.
27. If possible, let $f: X \to P(X)$ be a bijection. Let $A = \{x \in X : x \notin f(x)\}$.
Consider $x_0 \in X$ such that $f(x_0) = A$. The contradiction reached is
of the same spirit as Russell's paradox.
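The diagonal construction can be watched in action on a small finite set: for every function f from X to its power set, the set A of all x not belonging to f(x) is never a value of f. The sketch below checks this exhaustively for a three-element X; it illustrates the argument and does not replace it, and the names are illustrative.

```python
from itertools import combinations, product


def power_set(elements):
    """All subsets of elements, as frozensets."""
    return [
        frozenset(c)
        for size in range(len(elements) + 1)
        for c in combinations(elements, size)
    ]


def diagonal_set(f):
    """A = {x : x not in f(x)} for a map f given as a dict from X to subsets of X."""
    return frozenset(x for x, image in f.items() if x not in image)


def diagonal_always_missed(elements):
    """True if, for every f: X -> P(X), the diagonal set is not a value of f."""
    subsets = power_set(elements)
    for images in product(subsets, repeat=len(elements)):
        f = dict(zip(elements, images))
        if diagonal_set(f) in f.values():
            return False
    return True


print(diagonal_always_missed([0, 1, 2]))
```

If A were equal to f(x) for some x, then x would belong to A exactly when it does not, which is the contradiction the answer refers to.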
A bijection need not preserve such attributes as lengths or areas.
(As a still more shocking example, it is possible to construct a
bijection from a line segment to a solid cube.)
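[Figure: the square ABCD inscribed in the unit circle with centre O; a radial segment OP of the square is stretched to the radius OQ of the circle.]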
Let S denote the set of points in the square ABCD inscribed in
the unit circle as shown in the figure and T denote the set of points
in the unit circle. The idea is to define $f: S \to T$ by 'stretching'
each segment like OP to the segment OQ. In terms of polar coordinates
$(r, \theta)$, f is given explicitly by $f(r, \theta) = (\lambda(\theta) r, \theta)$ where
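$\lambda(\theta) = \sqrt{2}\cos\theta$ if $-\pi/4 \le \theta \le \pi/4$,
$\lambda(\theta) = \sqrt{2}\sin\theta$ if $\pi/4 \le \theta \le 3\pi/4$,
$\lambda(\theta) = -\sqrt{2}\cos\theta$ if $3\pi/4 \le \theta \le 5\pi/4$,
$\lambda(\theta) = -\sqrt{2}\sin\theta$ if $5\pi/4 \le \theta \le 7\pi/4$.
To get a bijection between any other square and circle compose f
with suitable similarity transformations (which are all bijections).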
30. Let X be infinite. Let Z be a denumerable subset of X and $f: \mathbb{N} \to Z$
a bijection. Define $g: X \to X$ by g(x) = x if $x \notin Z$ and
$g(x) = f(f^{-1}(x) + 1)$ if $x \in Z$. Then g is a bijection from X to X - {f(1)}.
For the converse, let $x_1$ be any element in X - Y. For n > 1,
define $x_n$ inductively as $f(x_{n-1})$. Then the elements $x_1, x_2, \ldots$ are all
distinct.
Section 2.3
There are $\binom{n}{m-1}$ locks, each lock has n - m + 1 keys, each person
can open $\binom{n-1}{m-1}$ locks.
Let $p_1$ be the leader. Let $L_1$ be a lock which any of $p_2$, $p_3$, $p_4$, $p_5$
can open. For every 2-subset {i, j} of {2, 3, 4, 5}, let $L_{ij}$ be a lock
with 3 keys which neither i nor j can open.
[( 71:]: )_( If: )]/( ":1‘ )(Andre’s method goes
through.)
$\binom{n-r+1}{r}$. (Think of a selected integer as a woman and a
rejected integer as a man and modify Problem (3.2).)
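Assuming the exercise asks for the number of ways to select r integers from 1, ..., n with no two of them consecutive (an interpretation suggested by the hint, not stated in the answer), the formula $\binom{n-r+1}{r}$ can be checked against a direct count for small n.

```python
from itertools import combinations
from math import comb


def count_nonconsecutive(n, r):
    """Count r-subsets of {1, ..., n} containing no two consecutive integers."""
    return sum(
        1
        for subset in combinations(range(1, n + 1), r)
        if all(b - a > 1 for a, b in zip(subset, subset[1:]))
    )


for n, r in [(5, 2), (6, 3), (7, 3)]:
    print(n, r, count_nonconsecutive(n, r), comb(n - r + 1, r))
```

The two columns agree in every case tried, including the cases where r is too large for any valid selection and both counts are 0.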
When $x_1$ is at an end, $x_2$ has n - 2 possible places. When $x_1$ is not
at an end, $x_2$ has n - 3 possible places. So $x_1$ and $x_2$ can be seated
in 2(n - 2) + (n - 2)(n - 3) ways in all. For each such seating,
the remaining guests can be seated in (n - 2)! ways.
[Figure: for angles, x denotes x degrees.]
Join EP. The triangles BPC, BPD and BEC are isosceles. So
DP = BP = BC = BE. Since $\angle EBP = 60$ degrees, triangle EBP
is equilateral. So $\angle EPD = (100 - 60) = 40$ degrees and EP = DP,
whence $\angle EDP = \tfrac{1}{2}(180 - 40) = 70$ degrees. Since $\angle BDP = 40$
degrees, the result follows.
$\binom{m}{2}\binom{n}{2}$ rectangles; $\binom{m+n-2}{m-1}$ paths. (In all m + n - 2
inter-road distances must be covered of which m - 1 are North-
South and n - 1 are East-West. Each path corresponds to a
binary sequence with m - 1 0's and n - 1 1's.)
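As a quick check of the path count, the following sketch counts the paths recursively and compares the result with $\binom{m+n-2}{m-1}$. It assumes, as the answer suggests, that a path consists of m - 1 North-South steps and n - 1 East-West steps taken in any order; `count_paths` is a hypothetical helper name.

```python
from functools import lru_cache
from math import comb


def count_paths(m, n):
    """Number of shortest routes using m-1 North-South and n-1 East-West steps."""

    @lru_cache(maxsize=None)
    def paths(ns_left, ew_left):
        # With no steps of one kind left, the rest of the route is forced.
        if ns_left == 0 or ew_left == 0:
            return 1
        # Otherwise the next step is either North-South or East-West.
        return paths(ns_left - 1, ew_left) + paths(ns_left, ew_left - 1)

    return paths(m - 1, n - 1)


if __name__ == "__main__":
    for m in range(1, 6):
        print([count_paths(m, n) for n in range(1, 6)])
```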
Without loss of generality let the removed corners be white. In
the 62 remaining squares, 32 are black and 30 are white. In any
adjacent pair there is one black and one white square. So the
desired pairing is impossible.
For a solution without colouring, suppose the 62 squares can be
decomposed into 31 pairs of adjacent squares. Let us call the two
squares in each pair 'mates' of each other. Now, for i = 1, ..., 8
let
$T_i$ = the number of squares in the ith row (from the top) whose
mates lie immediately below them (i.e. in the (i + 1)th row).
$B_i$ = the number of squares in the ith row whose mates are above
them.
$R_i$ = the number of squares in the ith column whose mates are to
their left.
$L_i$ = the number of squares in the ith column whose mates are to
their right.
Clearly $T_i = B_{i+1}$ and $L_i = R_{i+1}$ for i = 1, ..., 7. Also $B_1 = T_8 =
R_1 = L_8 = 0$. Let $V = \sum T_i = \sum B_i$ and $H = \sum R_i = \sum L_i$. Then
V and H are the numbers, respectively, of 'vertical' and 'horizontal'
pairs, so that V + H = 31. In each row, the horizontal pairs
(if any) occupy an even number of places. So we get $T_i + B_i$ is
even for $2 \le i \le 7$ and odd for i = 1, 8. From this it follows that
all $T_i$'s are odd for i = 1, ..., 7. So V, being a sum of seven odd
numbers, is odd. Similarly H is odd.
But then V + H would be even, a contradiction.
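The impossibility can also be confirmed mechanically. The sketch below is not the argument of the text: it simply counts the ways of pairing off the squares of a board into adjacent pairs (domino tilings), always covering the first uncovered square, with memoisation on the set of covered squares. The function name and the `removed` parameter are illustrative assumptions.

```python
from functools import lru_cache


def count_pairings(rows, cols, removed=()):
    """Count decompositions of the board into pairs of adjacent squares."""
    full = (1 << (rows * cols)) - 1
    start = 0
    for r, c in removed:
        start |= 1 << (r * cols + c)  # treat removed squares as already covered

    @lru_cache(maxsize=None)
    def count(mask):
        if mask == full:
            return 1
        pos = 0
        while (mask >> pos) & 1:  # locate the first uncovered square
            pos += 1
        r, c = divmod(pos, cols)
        total = 0
        # Pair it with the square to its right, if that one is free.
        if c + 1 < cols and not (mask >> (pos + 1)) & 1:
            total += count(mask | (1 << pos) | (1 << (pos + 1)))
        # Pair it with the square below it, if that one is free.
        if r + 1 < rows and not (mask >> (pos + cols)) & 1:
            total += count(mask | (1 << pos) | (1 << (pos + cols)))
        return total

    return count(start)


if __name__ == "__main__":
    print(count_pairings(8, 8, removed=[(0, 0), (7, 7)]))
```

For the 8 x 8 board with two opposite corners removed the count is 0, in agreement with both arguments above.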
10. $\binom{20}{10} - \binom{20}{11}$. (Arrange the players in descending order of
heights. Associate a left parenthesis with those in team A and a
right one with those in team B.)
11. At a junction which is i roads to the East and j - i roads to the
North of the starting point, there will be $2^{n-j}\binom{j}{i}$ persons,
$0 \le i \le j$. (Because of the binomial coefficients, such a distribution
of persons is called a binomial distribution.)
12. 0 if t + k is odd; $\binom{t}{(t+k)/2} / 2^t$ if t + k is even.
k-Lrlz +5
13. ( )/2'. (Apply Exercise (3.4) with 11+ k - t and
k=|_rl2_|—5 k
g = 10).
14. 2'.
15. The answer for a convex n-gon is $2\binom{n}{4} + \frac{n(n-3)}{2}$. (There
are $\frac{n(n-3)}{2}$ diagonals and $\binom{n}{4}$ points of intersection. If k of
them lie on a diagonal, it is cut into k + 1 segments.)
16. m(m - 1) if n = 2m and $m^2$ if n = 2m + 1. (Consider the middle
term of the subprogression.)
17. $\binom{n}{4} + \binom{n-1}{2}$. (Let $r_n$ be the number of regions. Get a
recurrence relation $r_n = r_{n-1} + 1 + \binom{n-1}{3} + n - 3$ and solve
it by summing, using formulas for $\sum k$, $\sum k^2$ and $\sum k^3$. A some-
what easier method for summing $\sum k^3$ will be given later, in
problem (7.1.5). A much slicker argument to do the present exercise
is to suppose that the diagonals are drawn one by one in some
fixed but arbitrary order. Every time a new diagonal is drawn, the
number of regions goes up by k + 1 where k is the number of
previous diagonals which intersect the new diagonal.)
18. The first sum equals the number of binary sequences of length n in
which 0 occurs an even number of times.
19. $k^m$ if n = 2m; $k^{m+1}$ if n = 2m + 1.
k(k _ 1).”
21. Oifn < 2, 2.3"” if3 < n g 5, and 2.3""—3"'ifn 3 6.
23. 34-5... 101-102.
24. .2 .
n)
25. Both identities can be proved by induction. However, combinatorial
proofs are more interesting. Let $X = \{x_1, \ldots, x_n, x_{n+1}\}$ be a set with
n + 1 elements. Let B be the box in which $x_{n+1}$ is put. Let k be
the number of elements in the remaining m boxes. Then $k \ge m$.
These k elements can be chosen in $\binom{n}{k}$ ways and distributed in $S_{k,m}$
ways into m boxes. For the second formula, suppose the objects are
put into boxes one by one, in the order $x_1, x_2, \ldots, x_{n+1}$. Let $x_{k+1}$ be
the last object to fall into a box which is empty previously. Then
the first k objects occupy the remaining m non-distinct boxes and
the last n - k objects can be put into any of the m + 1 boxes
(which are now distinguished by their contents).
(3) (Z) (Z)-
3*“
27.
(‘Z’)-’ (22)-
First put one object in each box. Now put the remaining r— n
objects without any restriction.
45 (iii-(1))-
Label the types 1, 2, ..., n. A selection in which $k_i$ objects of type i
are selected for i = 1, ..., n can be represented by a sequence whose
first $k_1$ terms are 1, next $k_2$ terms are 2, ... and so on.
32. Suppose $a_1, a_2, \ldots, a_m$ is a partition of n with $a_1 \ge \ldots \ge a_m$. If $a_m > 1$
then $a_1 - 1, a_2 - 1, \ldots, a_m - 1$ is a partition of n - m. If $a_m = 1$,
then $a_1, \ldots, a_{m-1}$ is a partition of n - 1. For the second assertion,
given a partition of 2n into n parts, remove 1 from each part to
get a partition of n (into possibly less than n parts).
Given a partition of X into m parts, if under the reshuffle of the
indices, the $x_i$'s in each part get permuted among themselves, it is
the same partition of X. So, it is not true that n! distinct partitions
of X give rise to the same partition of n.
A triangular partition of n corresponds uniquely to an ordered triple
of integers $(a_1, a_2, a_3)$ in which $a_1 \ge a_2 \ge a_3 \ge 1$, $a_1 + a_2 + a_3 = n$
and $a_2 + a_3 > a_1$. If $(a_1, a_2, a_3)$ is a triangular partition of n, then
$(a_1 - 1, a_2 - 1, a_3 - 1)$ is a triangular partition of n - 3 except
when $a_3 = 1$ or when $a_2 + a_3 = a_1 + 1$. In either case n is odd and
so of the form $4k \pm 1$. In each case show that the number of
exceptions comes to k.
36. Given a self-dual partition of n, consider a new partition of n
whose parts are the 'L-shaped' subsets of the Ferrers graph
consisting of the first row and the first column, the second row and
the second column and so on.
37. $2^9/10!$. Suppose the rth person leaves last, $1 \le r \le 10$. Then for
i = 1, ..., r - 1, i must leave before i + 1 and for j = r, ..., 9, j must
leave after j + 1. So the number of favourable cases in which the
rth person leaves last is $\binom{9}{r-1}$.
Section 2.4
1. 18 (= 81 - 4 x 27 + 6 x 9 - 4 x 3 + 3).
2. 12 (= 27 - 2 x 9 + 3).
3. $\frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \ldots + (-1)^n \frac{1}{n!}$.
4. 8! - 4 x 7! + 6 x 6! - 4 x 5! + 4!
5. $\sum_{j=0}^{n-1} (-1)^j \binom{n-1}{j} (n-j)!$
6. The order of the products on the two machines determines a pair
(f, g) of permutations of {1, 2, ..., n} such that $f(i) \ne g(i)$ for each
i = 1, ..., n. This condition is equivalent to $f^{-1} \circ g$ being a
derangement.
7. This reduces to counting ordered triples (f, g, h) of permutations of
1, ..., n for which $f^{-1} \circ g$, $f^{-1} \circ h$ and $g^{-1} \circ h$ are all derangements.
This comes out to be $n!\,E_n$ where $E_n = |\{(\sigma, \tau) : \sigma, \tau \text{ and } \sigma^{-1} \circ \tau$ are all
derangements of $\{1, \ldots, n\}\}|$. But there seems no easy way to compute
$E_n$.
8. $(2n)! - n(2n-1)! + \binom{n}{2}(2n-2)! + \ldots + (-1)^n n!$. (For each colour,
consider two objects, one a triplet and the other a duplex of balls
of that colour. Arrangements in which these two objects are together
get counted again.) With 4 balls of each colour the answer is simply
$\frac{(2n)!}{2^n}$.
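9. $12! - \binom{6}{1} 12 \cdot 10! + \binom{6}{2} 12 \cdot 10 \cdot 8! - \binom{6}{3} 12 \cdot 10 \cdot 8 \cdot 6!
+ \binom{6}{4} 12 \cdot 10 \cdot 8 \cdot 6 \cdot 4! - \binom{6}{5} 12 \cdot 10 \cdot 8 \cdot 6 \cdot 4 \cdot 2! + 12 \cdot 10 \cdot 8 \cdot 6 \cdot 4 \cdot 2$.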
10. 40 and 15. For sharpness see, for example, the Venn diagrams
below where the figures represent the cardinalities of appropriate
subsets of the set of all casualties whose cardinality is assumed to
be 100.
[Venn diagrams not reproduced.]
_ l
11- 2 (-l)‘ (’5‘) M 10! Without the principle of inclusion
i-0
3! (IO-k)!‘
and exclusion, the answer comes out more @3i as (151’10). (10P7).
12. A solution using the principle of inclusion and exclusion
is clumsy.
Instead, first serve the three vegetarians in $\binom{10}{2}\binom{8}{2}\binom{6}{2}$ ways.
For each such serving, the remaining 14 packets can
be distributed
in % ways.
*-"
J; <—w< >< >= —w< '"
m +j
m In +1
k ""
;( )
k
m
k—
j
=(:)§;<-w(";"')
= 0 by Exercise (3.18).
15. (i) 80, 80 and 408.
(ii) x is relatively prime to $p^r$ iff p does not divide x. There are
$p^{r-1}$ positive integers $\le p^r$ which are divisible by p.
(iii) For i = 1, ..., k let $S_i = \{x : 1 \le x \le n,\ p_i \text{ divides } x\}$. Then $\phi(n)$
is simply $\left|\bigcap_{i=1}^{k} S_i'\right|$ which equals $\sum_{j=0}^{k} (-1)^j s_j$ in the usual notation.
Here $s_j$ is the sum of the cardinalities of sets of the form
$S_{i_1} \cap \ldots \cap S_{i_j}$ for $1 \le i_1 < i_2 < \ldots < i_j \le k$. It is easily seen
that $S_{i_1} \cap \ldots \cap S_{i_j}$ consists of multiples of $p_{i_1} p_{i_2} \ldots p_{i_j}$.
Consequently $|S_{i_1} \cap \ldots \cap S_{i_j}| = p_1^{m_1} p_2^{m_2} \ldots p_k^{m_k}$ where $m_u = r_u - 1$ if
$u = i_1, \ldots, i_j$ and $m_u = r_u$ otherwise. But these are also precisely
the terms in the expansion of $(p_1^{r_1} - p_1^{r_1 - 1}) \ldots (p_k^{r_k} - p_k^{r_k - 1})$.
(iv) Since m, n are relatively prime, the primes appearing in their
prime factorisations are all different. The result follows from
(iii).
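The inclusion-exclusion argument in (iii) can be tried out numerically. The sketch below computes $\phi(n)$ in three ways: by direct counting, by the alternating sum over products of distinct prime divisors, and by the product $(p_1^{r_1} - p_1^{r_1-1}) \ldots (p_k^{r_k} - p_k^{r_k-1})$. All function names are illustrative, and trial division is used only because it is short.

```python
from itertools import combinations
from math import gcd, prod


def prime_factorisation(n):
    """Return {prime: exponent} for n >= 1, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def phi_by_counting(n):
    return sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)


def phi_inclusion_exclusion(n):
    """Sum of (-1)^j * n / (product of j chosen distinct prime divisors)."""
    primes = list(prime_factorisation(n))
    total = 0
    for j in range(len(primes) + 1):
        for chosen in combinations(primes, j):
            total += (-1) ** j * (n // prod(chosen))
    return total


def phi_product(n):
    return prod(p**r - p ** (r - 1) for p, r in prime_factorisation(n).items())


if __name__ == "__main__":
    for n in (12, 36, 97, 360):
        print(n, phi_by_counting(n), phi_inclusion_exclusion(n), phi_product(n))
```

Part (iv) corresponds to the check that `phi_product(m * n) == phi_product(m) * phi_product(n)` whenever `gcd(m, n) == 1`.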
CHAPTER 3
Section 3.1
2. For (b).
3. The simplest example is a graph consisting of just two vertices and
an edge joining them.
4. $C_1$ and $C_2$ are in the same class. $C_3$ is not.
5. None. Simple counter-examples exist to show that equality of
lengths is neither necessary nor sufficient for $C_1$ and $C_2$ to be in the
same class. It is also possible to construct $C_1, C_2, C_3$ such that the
area between $C_1$ and $C_2$ = the area between $C_2$ and $C_3$ but $C_1, C_2$
are in the same class while $C_3$ is not. As for curvature, consider
mirror images.
6. Let $z_0$ be a point in the pond. Then it can be shown that $C_1$ and
$C_2$ are in the same class iff
\[ \int_{C_1} \frac{dz}{z - z_0} = \int_{C_2} \frac{dz}{z - z_0}. \]
8. Positivity and symmetry are immediate from the definition. The
triangle inequality amounts to proving that for any $x_1, y_1, x_2, y_2,
x_3, y_3 \in R$,
\[ (a+b)^2 + (c+d)^2 \le a^2 + b^2 + c^2 + d^2 + 2\sqrt{a^2 + c^2}\,\sqrt{b^2 + d^2} \]
where $a = x_1 - x_2$, $b = x_2 - x_3$, $c = y_1 - y_2$, $d = y_2 - y_3$. This
further reduces to proving $(ab + cd)^2 \le (a^2 + c^2)(b^2 + d^2)$ which
follows from the fact that $0 \le (ad - bc)^2$. For the midpoint
property, set $M = \left(\frac{x_1 + x_2}{2}, \frac{y_1 + y_2}{2}\right)$. For uniqueness of M, shift the
origin to $\left(\frac{x_1 + x_2}{2}, \frac{y_1 + y_2}{2}\right)$. Then P, Q take the form P = (h, k),
Q = (-h, -k). If M = (x, y) satisfies $d(M, P) = d(M, Q) = \frac{1}{2} d(P, Q)$
then $(x - h)^2 + (y - k)^2 = (x + h)^2 + (y + k)^2 = h^2 + k^2$ which
gives xh + yk = 0 and $x^2 + y^2 - 2xh - 2yk = 0$ implying
x = y = 0.
For $d_1$ the balls are squares with sides having slopes $\pm 1$; for $d_2$
they are squares with sides parallel to the axes. In both cases the
points on the sides are included in the closed balls but not in the
corresponding open balls.
10. The side of each square should equal the minimum of $\{d_2(P_i, P_j) : i \ne j\}$
where $d_2$ is as in the last exercise, with the axes going east-west and
north-south.
11. The diagonal of each square should equal
$\min\{d_1(P_i, P_j) : i \ne j\}$ where $d_1$ is as above.
12. If, for example, all $P_i$'s are on the x-axis, then the vertical side of
the rectangles can be any positive number.
14. Let the distinct elements of X be $x_1, \ldots, x_n$. Then
$r = \min\{d(x_i, x_j) : i \ne j,\ 1 \le i \le n,\ 1 \le j \le n\}$.
15. The arguments are analogous to those in Exercise (1.8) and (1.9).
The name Pythagorean comes from the well-known Pythagorean
theorem. If $d_1$ (and also $d_2$) is the usual metric for R, then by
Pythagorean theorem, the usual metric on $R^2$ is precisely that
given by (i).
16. In $Z_2$, let $d_1(x, y) = 0$ if x = y and 1 if $x \ne y$. Then $d_1$ is the
discrete metric on $Z_2$. If $\bar{x}, \bar{y}$ are two binary sequences of length n,
then $\sum_{i=1}^{n} d_1(x_i, y_i)$ is simply the number of i's for which $x_i \ne y_i$.
17. Let, if possible, $\bar{x}, \bar{y}, \bar{z}$ be three such sequences. Let $\bar{w}$ be the
'complementary' sequence of $\bar{z}$, i.e. $\bar{w} = (w_1, \ldots, w_{10})$ where for
i = 1, ..., 10, $w_i = 1 - z_i$. It is easy to check that for any $\bar{a}$,
$d(\bar{a}, \bar{z}) + d(\bar{a}, \bar{w}) = 10$. So we have $d(\bar{x}, \bar{w}) = 10 - d(\bar{x}, \bar{z}) \le 3$ and similarly
$d(\bar{y}, \bar{w}) \le 3$. But then $d(\bar{x}, \bar{y}) \le 3 + 3 = 6$, a contradiction.
18. Let X be the set of all binary sequences of length n. Start with any
$\bar{x} \in X$. If $C(\bar{x}, t - 1) \ne X$, pick $\bar{y} \in X - C(\bar{x}, t - 1)$. If
$C(\bar{x}, t - 1) \cup C(\bar{y}, t - 1)$
does not exhaust X, pick $\bar{z}$ outside it. Continuing, we are sure
to get at least M distinct points, every two of which are, by very
construction, at least t apart.
19. The Gilbert bound here is 2; but we can easily find at least 4
elements, for example, 0000000000, 1111110000, 1110000111 and
0001110111.
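The four sequences above can be checked directly. The sketch below computes their pairwise Hamming distances and, for comparison, a Gilbert-type bound. Two things are assumptions here rather than statements of the text: that the required minimum distance in the exercise is t = 6 (the smallest distance actually found among the four sequences), and that the bound meant is $2^n / \sum_{j=0}^{t-1} \binom{n}{j}$ rounded up.

```python
from itertools import combinations
from math import comb


def hamming_distance(u, v):
    """Number of positions in which two equally long sequences differ."""
    return sum(a != b for a, b in zip(u, v))


def gilbert_bound(n, t):
    """Ceiling of 2^n divided by the size of a ball of radius t - 1 (assumed form)."""
    ball = sum(comb(n, j) for j in range(t))
    return -(-(2**n) // ball)


codewords = ["0000000000", "1111110000", "1110000111", "0001110111"]
min_distance = min(hamming_distance(u, v) for u, v in combinations(codewords, 2))

if __name__ == "__main__":
    print(min_distance)          # 6
    print(gilbert_bound(10, 6))  # 2
```

With n = 10 and t = 6 the ball has 1 + 10 + 45 + 120 + 210 + 252 = 638 elements, so the bound is 2, while the four sequences are pairwise exactly 6 apart.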
20. In Theorem (1.7), the conclusion would be
\[ M \ge \frac{k^n}{\sum_{j=0}^{t-1} \binom{n}{j} (k-1)^j}. \]
21. Let (A, f), (B, g) be multisets. Then their union is $(A \cup B, h)$ while
their intersection is $(A \cap B, k)$ where the functions h, k are defined
by
\[ h(x) = \begin{cases} f(x) & \text{if } x \in A,\ x \notin B \\ g(x) & \text{if } x \in B,\ x \notin A \\ \max(f(x), g(x)) & \text{if } x \in A \cap B \end{cases} \]
and $k(x) = \min\{f(x), g(x)\}$ for all $x \in A \cap B$.
As the simplest counter example, let X = {x} and g(x) = 2. If
A = X and f(x) = 1, then B = A and h = f. Thus (A, f) equals
its own complement.
24. In the case of a set S with n elements f(x) = 1, for all $x \in S$. So
a permutation of the multiset (S, f) is simply a bijection
$\sigma : \{1, \ldots, n\} \to S$.
25. This is essentially a duplication of Exercise (2.3.9) except that
instead of an 8 x 8 chess-board, we have a 10 x 10 chess-board.
26 . Birbal would have been guilty of over-generalisation and consequent
loss of depth. Anything will go wrong, if you fail to treat it properly.
There is no big deal in saying it.
Section 3.2
1. Reflexive: (i), (vii)
Symmetric: (i), (ii), (vii)
Transitive: (i), (vii), (ix).
3. $R^{-1}$ is precisely the 'reflection' of R in the diagonal $\Delta_X$. Other
assertions are trivial.
2..- _ 2n‘—n _ you»: + 2~(-—l)ll; £1 SM".
n.
As the simplest counter-example, let X = {1, 2, 3} and R = {(1, 2),
(1, 3)}. Then $S = X \times X$. But if Y = {2, 3}, then $S/Y = Y \times Y$
while $R/Y = \emptyset$ and so the equivalence relation generated by R/Y
is merely $\Delta_Y$. (The key idea is that although both 2, 3 are in Y,
they are related not directly but through 1, which is not an element
of Y.)
{X} and $\{\{x\} : x \in X\}$ respectively. The corresponding equivalence
relations are $X \times X$ and $\Delta_X$.
Let S be the relation of congruency modulo n on Z and T be the
equivalence relation on Z generated by R. Then $R \subset S$ and so $T \subset S$.
For the other way inclusion, let $(x, y) \in S$. Then y - x = kn
for some integer k. If k = 0, then y = x and $(x, y) \in T$ by
reflexivity of T. If k > 0, apply induction on k to show that $(x, y) \in T$.
If k < 0, consider (y, x) and apply symmetry of T to show that
$(x, y) \in T$. So $S \subset T$, and hence S = T.
10. As the simplest counter-example, let X = {1, 2, 3}, $R_1 = \Delta_X \cup \{(1, 2),
(2, 1)\}$ and $R_2 = \Delta_X \cup \{(2, 3), (3, 2)\}$.
11. Let $\mathcal{T}$ be the collection of all equivalence relations on X which
contain R. $\mathcal{T}$ is non-empty since $X \times X \in \mathcal{T}$. Let $V = \bigcap_{T \in \mathcal{T}} T$. (If
X is finite, so is $\mathcal{T}$. If X is infinite $\mathcal{T}$ may be infinite. Still $\bigcap_{T \in \mathcal{T}} T$
makes sense. It is defined as the set $\{(x, y) \in X \times X : (x, y) \in T$
for every $T \in \mathcal{T}\}$.) Generalising the last exercise, V is an equivalence
relation on X. Also V contains R. So V contains S. On the other
hand, $S \in \mathcal{T}$ and so $V \subset S$.
12. Let $\mathcal{D}, \mathcal{E}, \mathcal{F}$ be the decompositions of X corresponding to $R_1$, $R_2$
and $R_1 \cap R_2$ respectively. Then members of $\mathcal{F}$ are of the form
$D \cap E$ for some $D \in \mathcal{D}$, $E \in \mathcal{E}$. Conversely every intersection of
this form is either empty or a member of $\mathcal{F}$.
13. Let R, S be the equivalence relations corresponding to $\mathcal{D}, \mathcal{E}$
respectively. Then $\mathcal{D}$ and $\mathcal{E}$ are mutually orthogonal iff $R \cap S = \Delta_X$.
14. If xRy, then yRx by symmetry and hence xRx by transitivity.
15. Let $Y = \{x \in X$: there exists some $y \in X$ such that $xRy\}$.
16. Example (8) results as a special case if T is the identity relation on
Y, i.e. $T = \Delta_Y$.
17. (a) Follows from Theorem (2.8), since the sum on the left is the
number of k-ary sequences of length n in which the symbol 1
occurs an even number of times.
(b) Let 1 occur 2r times in a k-ary sequence of length n. Then the
remaining entries constitute a (k - 1)-ary sequence of length
n - 2r. If 0 is to occur an even number of times in them, then
by Theorem (2.8), there are $\frac{(k-1)^{n-2r} + (k-3)^{n-2r}}{2}$ such
sequences. The result follows by applying (a) twice (once for k and
then for k - 2) and adding.
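The counts in this answer are easy to test by exhaustive enumeration for small k and n. The sketch below assumes that Theorem (2.8) gives $(k^n + (k-2)^n)/2$ for the number of k-ary sequences in which one fixed symbol occurs an even number of times (this is what the expression in (b) suggests); adding the two applications of (a) then gives $(k^n + 2(k-2)^n + (k-4)^n)/4$ for sequences in which both 0 and 1 occur an even number of times. The function names are illustrative.

```python
from itertools import product


def count_even(k, n, symbols):
    """Count length-n sequences over {0, ..., k-1} in which every listed
    symbol occurs an even number of times (brute force)."""
    return sum(
        all(seq.count(s) % 2 == 0 for s in symbols)
        for seq in product(range(k), repeat=n)
    )


def one_symbol_formula(k, n):
    return (k**n + (k - 2) ** n) // 2


def two_symbol_formula(k, n):
    return (k**n + 2 * (k - 2) ** n + (k - 4) ** n) // 4


if __name__ == "__main__":
    for k in range(2, 5):
        for n in range(0, 6):
            print(k, n, count_even(k, n, [1]), count_even(k, n, [0, 1]))
```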
18. EU: —(k—4) 1.
1 I ll
(find i( 2" )[(k— w ‘WM
19. Let D be an equivalence class under S. Let $x \in D$. Then g(D) has
to be defined as [f(x)], i.e., the equivalence class of f(x) under T.
g is well defined, because if $y \in D$, then xSy and hence f(x) T f(y)
so that [f(x)] = [f(y)]. The following diagram, in which p and q are the
quotient functions, is then commutative.
          p
    X ---------> X/S
    |             |
  f |             | g
    v             v
    Y ---------> Y/T
          q
If $T = \Delta_Y$, Y/T may be identified with Y and we get Proposition
(2.14).
20. Each equivalence class under T is a 'box' whose sides are
equivalence classes under R and S.
21. An equivalence class under R is a set of the form $f^{-1}(B)$ where B
is an equivalence class under T.
Section 3.3
3. The poset consists of {0} and all sets of the form {n, -n} where n
is a positive integer. Clearly this poset may be identified with $N_0$
(the set of non-negative integers) with the partial order '$x \le y$' iff
x divides y.
5. (i) If $S \subset T$ and T - S has at least two distinct elements say, a, b
then $S \cup \{a\}$ lies properly between S and T.
(ii) Let the distinct elements of B be $b_1, \ldots, b_n$. Then $\{\emptyset, \{b_1\},
\{b_1, b_2\}, \ldots, B\}$ is a chain of length n + 1. Since no two distinct
members of a chain can have the same cardinality, no chain
can be of length exceeding n + 1.
(iii) Each chain of length n + 1 corresponds to an arrangement
of the elements of B.
6. In P(B), where B = {1, 2}, $\{\emptyset, \{1\}\}$ and $\{\emptyset, \{2\}\}$ are chains, but their
union, $\{\emptyset, \{1\}, \{2\}\}$ is not.
7. (i) Let B = {1, 2, 3}, $X = \{\emptyset, \{1\}, \{2\}, \{1, 3\}\}$ and $\le$ be
defined by set inclusion. Then $\{\emptyset, \{2\}\}$ is a maximal chain in X.
But the longest chain has length 3.
(ii) Apply Theorem (3.7) to the poset $(\mathcal{C}, \subset)$. Given a chain $C \in \mathcal{C}$,
let $\mathcal{D}$ be the set of all chains which contain C. A maximal
element of $\mathcal{D}$ is also a maximal chain.
(iii) Apply (ii) to singleton chains, i.e. chains of the form {x} for
$x \in X$.
Let the ' x r of X be " ' , able Let
X = [[11 C, where C,,...,C. are chains. Then no two x’s can be in
the same chain. So u a m.
Let the terms of the sequence be $x_1, x_2, \ldots, x_r$ where r = mn + 1.
For each k, consider all monotonically increasing
subsequences beginning at $x_k$ and let f(k) be the length of the longest
such subsequence. If $f(k) \le m$ for all k, then by the pigeonhole
principle, f assumes the same value at at least n + 1 distinct points,
say $k_1, \ldots, k_{n+1}$ where $1 \le k_1 < k_2 < \ldots < k_{n+1} \le r$. But then we
must have $x_{k_1} > x_{k_2} > \ldots > x_{k_{n+1}}$; for if, say, $x_{k_i} \le x_{k_{i+1}}$, then any
monotonically increasing subsequence starting from $x_{k_{i+1}}$ can be
elongated to a similar subsequence starting from $x_{k_i}$ which would
mean $f(k_i) \ge f(k_{i+1}) + 1$.
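The function f used in this argument is easy to compute, which allows the statement to be checked exhaustively for small m and n. In the sketch below, `longest_from_each` returns, for every position k, the length of the longest monotone subsequence beginning at that position (non-strictly increasing or strictly decreasing, as in the argument); the names and the choice of test sizes are illustrative assumptions.

```python
from itertools import permutations


def longest_from_each(seq, increasing=True):
    """f[k] = length of the longest monotone subsequence starting at seq[k]."""
    n = len(seq)
    f = [1] * n
    for k in range(n - 1, -1, -1):
        for later in range(k + 1, n):
            ok = seq[k] <= seq[later] if increasing else seq[k] > seq[later]
            if ok:
                f[k] = max(f[k], 1 + f[later])
    return f


def has_long_monotone(seq, m, n):
    """True if seq has an increasing subsequence longer than m
    or a decreasing subsequence longer than n."""
    return (
        max(longest_from_each(seq, True)) > m
        or max(longest_from_each(seq, False)) > n
    )


if __name__ == "__main__":
    # Every arrangement of mn + 1 = 7 distinct terms (m = 2, n = 3) qualifies.
    print(all(has_long_monotone(p, 2, 3) for p in permutations(range(7))))
    # With only mn = 4 terms (m = n = 2) the conclusion can fail.
    print(has_long_monotone((2, 1, 4, 3), 2, 2))
```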
11. Suppose there are infinitely many boys $b_1, b_2, \ldots, b_n, \ldots$ and
infinitely many girls $g_1, g_2, \ldots, g_n, \ldots$. Consider the case where the
only dancing pairs are $(b_i, g_j)$ for which $j \ge i$.
If it is known only that B is finite, say $B = \{b_1, \ldots, b_n\}$, let $G_i$ be
the set of girls who danced with $b_i$, i = 1, ..., n. If $\{G_1, \ldots, G_n\}$ is
a chain then let $G_j$ be the smallest element of it. Then a girl in $G_j$
dances with all boys.
12. Proceed by induction on the cardinality. Let $(X, \le)$, $(Y, \le)$ be
linearly ordered with |X| = |Y| = n > 1. Let x, y be their
smallest elements. Apply induction hypothesis to X - {x} and
Y - {y} each with the restriction of the corresponding order.
If $f : N \to Q$ were an order isomorphism, then f(1) would have to
be the smallest element of Q. But Q has no smallest element.
13. Let $f : X \to Y$ be a bijection. Define $\theta : P(X) \to P(Y)$ by $\theta(A) = f(A)$
for $A \subset X$. Then $\theta$ is an order equivalence.
14. Let Y = {2, 3, 5}. Define $g : P(Y) \to X$ by g(A) to be the product of
the elements in A with the understanding that $g(\emptyset) = 1$. Then g is
an order equivalence.
If $\le$ is the usual order on X (= {1, 2, 3, 5, 6, 10, 15, 30}), then
$(X, \le)$ is a chain but $(P(Y), \subset)$ can never be a chain if $|Y| \ge 2$.
15. In view of Exercise (3.13), it suffices to show that $(P(Q), \subset)$
contains an uncountable chain. For every real number $\alpha$, let
$S_\alpha = \{x \in Q : x < \alpha\}$. Then for $\alpha < \beta$, $S_\alpha \subsetneq S_\beta$ as there are
rationals in the interval $(\alpha, \beta)$. So $\{S_\alpha : \alpha \in R\}$ is a chain in P(Q).
This chain is uncountable since the set R is uncountable.
16. (i) If X is finite, let x be the smallest element of X and y be the
smallest element of X - {x}. Then there is no z such that
x < z < y.
(ii) Enumerate Q and X as $q_1, q_2, \ldots, q_n, \ldots$ and $x_1, x_2, \ldots, x_n, \ldots$
respectively. Construct an order-preserving bijection $f : Q \to X$
by defining $f(q_n)$ inductively. Let $f(q_1) = x_1$. Let n > 1 and
suppose $f(q_1), \ldots, f(q_{n-1})$ have been defined. Suppose $q_{i_1} < q_{i_2}
< \ldots < q_{i_{n-1}}$, where $i_1 i_2 \ldots i_{n-1}$ is a permutation of {1, 2, ..., n - 1}.
Then $f(q_{i_1}) < f(q_{i_2}) < \ldots < f(q_{i_{n-1}})$. Now $q_n$ is precisely in one
of the intervals $(-\infty, q_{i_1}), (q_{i_1}, q_{i_2}), \ldots, (q_{i_{n-1}}, \infty)$. [By $(-\infty,
q_{i_1})$ we understand $\{q \in Q : q < q_{i_1}\}$, similarly for $(q_{i_{n-1}}, \infty)$.]
Consider the corresponding interval in X (which is nonempty)
and let r be the smallest integer such that $x_r$ is in that interval.
Define $f(q_n) = x_r$.
17. Let $\mathcal{D}, \mathcal{E}$ be two decompositions of a set X. Let R, S be the
corresponding equivalence relations on X. Then $\mathcal{D} \wedge \mathcal{E}$ is the
decomposition induced by $R \cap S$ which is an equivalence relation (cf. Exercise
(2.12)). Let T be the equivalence relation generated by $R \cup S$. Then
the decomposition induced by T is $\mathcal{D} \vee \mathcal{E}$.
18. Apply induction on the cardinality.
20. [Hasse diagram not reproduced; the vertex labels that survive in the source appear to be ordered pairs such as (1, 4), (2, 3), (3, 2), (4, 1) and (1, 1).]
21. (i) For $x, y \in X$, let z be the least element of {x, y}. If z = x,
then $x \le y$. If z = y, then $y \le x$.
(ii) Neither R nor Z has a least element. (Note that the set of non-
negative reals is also not well-ordered even though it has a
least element. The subset (0, 1) has no least element.)
(iii) Let $A \subset N$ and $n \in A$. If $k \in A$ for some k < n apply induction
hypothesis. Otherwise n is the least element of A. (Actually,
the fact that N is well-ordered is equivalent to the second
principle of mathematical induction which can be formally
stated as 'If $A \subset N$, $1 \in A$ and for every n > 1, ($k \in A$ for all
k < n) $\Rightarrow n \in A$, then A = N'. In a rigorous development of
number system, positive integers are first so defined that they
form a well-ordered set. Then the principle of induction is
proved as a theorem.)
(iv) Let $A \subset X'$ and $A \ne \emptyset$. If $X \cap A \ne \emptyset$, the least element of
$X \cap A$ is also the least element of A. Otherwise t is the least
element of A.
(v) Let $x \in X$. Let $A = \{y \in X : x < y\}$. If $A \ne \emptyset$, the least element
of A covers x.
22. With the notation in the text, let $\theta$ and $\sigma$ denote respectively the
old and the new permutations. To show that $\sigma$ is the successor of $\theta$,
we have to show two things: (i) $\theta < \sigma$ and (ii) if $\theta < \tau$ then $\sigma \le \tau$.
(i) follows easily from the fact that the first j - 1 entries of $\theta$ and $\sigma$
match while the jth entry of $\theta$ is less than that of $\sigma$. For (ii), let T
be the set of those permutations of {1, 2, ..., n} whose first j - 1
entries are $a_1, a_2, \ldots, a_{j-1}$ respectively. Let $M = \{1, 2, \ldots, n\} - \{a_1, \ldots,
a_{j-1}\}$. For any $\lambda \in T$, the jth entry of $\lambda$ must be from M. For each
$x \in M$, let $T_x = \{\lambda \in T$: the jth entry of $\lambda$ is $x\}$. Clearly
$T = \bigcup_{x \in M} T_x$. Then for $x, y \in M$, x < y implies that every element
of $T_x$ is less than every element of $T_y$. Note that $\theta$ is the largest
element of $T_{a_j}$ while $\sigma$ is the smallest element of $T_{a_k}$. Now suppose
$\theta < \tau$. Let i be the smallest integer such that the ith entry of $\theta$ is
less than that of $\tau$. If $i \le j - 1$, then certainly $\sigma < \tau$. If i > j - 1,
then $\tau \in T$. So $\tau \in T_x$ for some $x \in M$. Since $\theta < \tau$ we must have
$a_j < x$. Now note that in M, $a_k$ is the immediate successor of $a_j$.
So $a_k \le x$. If $a_k < x$, then again $\sigma < \tau$. If $a_k = x$ then $\sigma \le \tau$
since $\sigma$ is the least element of $T_{a_k}$.
Given an r-subset $\{a_1, a_2, \ldots, a_r\}$ (with $a_1 < a_2 < \ldots < a_r$) of {1, 2, ...,
n}, first locate the largest j such that $a_j < n - r + j$. Then $(a_1, a_2,
\ldots, a_{j-1}, a_j + 1, a_j + 2, \ldots, a_j + r - j + 1)$ is the successor. To
get the index of $(a_1, a_2, \ldots, a_r)$, note that the number of r-subsets
whose smallest element is less than $a_1$ is $\binom{n}{r} - \binom{n - a_1 + 1}{r}$. So
the index of $(a_1, a_2, \ldots, a_r)$ equals $\binom{n}{r} - \binom{n - a_1 + 1}{r}$ + the index
of $(b_1, b_2, \ldots, b_{r-1})$ where $b_i = a_{i+1} - a_1$ for i = 1, ..., r - 1 and
$(b_1, \ldots, b_{r-1})$ is an (r - 1)-subset of $\{1, 2, \ldots, n - a_1\}$. Working
inductively, the index of $(a_1, a_2, \ldots, a_r)$ comes out as
\[ \binom{n}{r} - \binom{n - a_1}{r} - \binom{n - a_2}{r - 1} - \ldots - \binom{n - a_r}{1}
= \binom{n}{r} - \sum_{k=1}^{r} \binom{n - a_k}{r - k + 1}. \]
For i < r S n, the index of (11,, 11...“, a.) is
r n I "‘0':
2 —2 ).
“(J ”(r—kw
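The successor rule and the index of an r-subset lend themselves to a direct check against the lexicographic listing produced by `itertools.combinations`. The sketch below is a reading of the answer rather than the book's own program: `successor` applies the rule "largest j with $a_j < n - r + j$", and `index` uses the closed form $\binom{n}{r} - \sum_{k=1}^{r} \binom{n - a_k}{r - k + 1}$, with indices starting from 1. Both function names are illustrative.

```python
from itertools import combinations
from math import comb


def successor(subset, n):
    """Next r-subset of {1, ..., n} in lexicographic order, or None if last."""
    a = list(subset)
    r = len(a)
    for j in range(r, 0, -1):  # j is 1-based, as in the text
        if a[j - 1] < n - r + j:
            base = a[j - 1]
            # Keep a_1..a_{j-1}; replace the rest by base+1, ..., base+r-j+1.
            return tuple(a[: j - 1] + [base + i for i in range(1, r - j + 2)])
    return None


def index(subset, n):
    """1-based position of the subset in the lexicographic listing."""
    r = len(subset)
    return comb(n, r) - sum(
        comb(n - a_k, r - k + 1) for k, a_k in enumerate(subset, start=1)
    )


if __name__ == "__main__":
    listing = list(combinations(range(1, 6), 3))
    for s in listing:
        print(s, index(s, 5), successor(s, 5))
```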
25. Use uniqueness of binary expansion of an integer. If $A, B \subset X$ and
g(A) = g(B), then $f_A(i) = f_B(i)$ for all i = 1, ..., n whence $f_A = f_B$
and hence A = B. So g is one-to-one. Since |P(X)| = |Y|, by
the pigeonhole principle g is also onto.
Let if possible $(a_1, a_2, \ldots, a_k)$ be a cycle. Then there is an arrow from
$a_i$ to $a_{i+1}$ for i = 1, ..., k - 1 and also an arrow from $a_k$ to $a_1$. But
then $a_1 < a_2 < \ldots < a_k$ and $a_k < a_1$ which is a contradiction in
view of transitivity and asymmetry of <. (In topological
terminology, the absence of cycles is called acyclicity.)
27. Follow the hint. If the assertion in it does not hold then at every
stage $x_i$ is the minimum element of $X_i$. So $x_1 < x_2 < \ldots < x_n$ in
the original order.
28. Both.
29. Let $(X, \le)$ be a poset and let $y_1, \ldots, y_n$ be any listing of elements
of X consistent with $\le$, i.e. if $y_i \le y_j$ then $i \le j$. Then for each i,
$y_i$ is a minimal element of $\{y_i, y_{i+1}, \ldots, y_n\}$. So there is a topological
sorting of X in which the elements output successively are $y_1, \ldots, y_n$.
30. (i) Let $\le$ be complete and $B \subset X$, $B \ne \emptyset$ and b be a lower bound
for B. Let A be the set of all lower bounds of B in X. Then
$b \in A$ and so $A \ne \emptyset$. Also A is bounded above, since any
element of B is an upper bound for A. The proof is completed
by showing that the supremum of A in X is also the infimum
of B. The converse is proved by a dual argument.
(ii) Every non-empty subset has a least element, which is also an
infimum. Apply (i).
(iii) Apply (ii) to N. That Q is not complete is seen from the set
$A = \{x \in Q : x^2 < 2\}$ which is bounded above (by 3 for
example), but has no supremum. (If y = p/q is the supremum
of A in Q, then $y \ne \sqrt{2}$, since $\sqrt{2}$ is not rational. If $y < \sqrt{2}$,
there exists $x \in A$ such that $y < x < \sqrt{2}$ contradicting that y
is an upper bound for A. If $y > \sqrt{2}$, there exists $b \in Q$ such
that $\sqrt{2} < b < y$. b is then an upper bound for A, contradicting
that y is the least upper bound for A.)
The completeness of R under usual order is a matter of how
the real numbers and their order are constructed. One approach
is to define a real number as a subset, say, S of Q having the
following properties:
(a) S is non-empty, bounded above but has no greatest
element
(b) for all $x, y \in Q$ if x < y and $y \in S$ then $x \in S$.
Given two real numbers S and T we say $S \le T$ if $S \subset T$ (as
subsets of Q). With this approach, if A is a non-empty subset
of real numbers which is bounded above then $\bigcup_{S \in A} S$ becomes
the supremum of A.
Section 3.4
”(nl); "uni-1)”; "(n—1H1.
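If $f : A \to B$ and $g : C \to D$ are two functions, let $f \times g : A \times C \to
B \times D$ be the function $(f \times g)(a, c) = (f(a), g(c))$ for $a \in A$, $c \in
C$.
Then associativity of a binary operation * on a set X is equivalent
to the commutativity of the diagram
[Diagram: $X \times X \times X$ is mapped to $X \times X$ by $* \times 1_X$ and by $1_X \times *$, each followed by $* : X \times X \to X$.]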
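Left distributivity of * over + is equivalent to the commutativity
of
[Diagram: from $X \times X \times X$ to X along two routes. One route is $\Delta_X \times 1_X \times 1_X$ into $X \times X \times X \times X$, then h, then $* \times *$ into $X \times X$, then +. The other route is $1_X \times +$ into $X \times X$, then *.]
where $\Delta_X : X \to X \times X$ is defined as $\Delta_X(x) = (x, x)$ for $x \in X$ and
$h : X \times X \times X \times X \to X \times X \times X \times X$ is defined by h(a, b, c, d) = (a, c,
b, d) (i.e. h interchanges the second and the third coordinates).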
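For $a \in X$, let a also denote the constant function with value a.
Then a is a right identity for * iff the diagram
[Diagram: $X \to X \times X$ given by $x \mapsto (x, a)$, followed by $* : X \times X \to X$, compared with the identity function on X.]
commutes.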
(1.x.y)er‘(AX)¢>s(a.X.y)eAX¢Mx=My~
$(a, x, y) \in X \times \Delta_X \Leftrightarrow (x, y) \in \Delta_X \Leftrightarrow x = y$.
Since left cancellation law holds iff [$x = y \Leftrightarrow a * x = a * y$] the result
follows.
5. In (1), $\emptyset$ and S are the identities for $\cup$ and $\cap$ respectively. Only the
identities are invertible. In a lattice the maximum and the minimum
elements are the identities for $\wedge$ and $\vee$ respectively. Also they are the
only invertible elements. In (3), the identity function is the identity
and bijections are the invertible elements. In (4), the identity and the
inverse is pointwise. (5) has no identity; in (6) $\circledast$ has an identity iff
* does. A function f is invertible iff f(x) is invertible for all $x \in X$.
In (7), [0] and [1] are respectively the identities for the modulo n
addition and multiplication. Every element is invertible
w.r.t. addition. For multiplication modulo n, [m] is invertible iff m is
relatively prime to n. (This fact is proved in Chapter 6.) In (8), the
null sequence is the identity and the only invertible element.
Both assertions hold trivially when either m or n is 0. If both m > 0
and n > 0, prove (i) by induction on n. If m > 0 and n < 0, write
n = -k so that k > 0. If $m \ge k$, then $a^{m+n} a^k = (a^m a^n) a^k$
(since both sides equal $a^m$). Now cancel $a^k$. If m < k, then
$a^{m+n} a^k = (a^{-1})^{k-m} a^k = (a^{-1})^{k-m} a^{k-m} a^m = a^m = (a^m)(a^{-1})^k a^k$.
Cancelling $a^k$ again, (i) holds in this case too. Proceed similarly if
m < 0 and n > 0. If m < 0 and n < 0, apply (i) with a replaced
by $a^{-1}$, m by -m and n by -n. So (i) holds in all cases.
Similarly for (ii), first proceed by induction when m > 0, n > 0.
If m < 0 and n > 0, let m = -k. Then $(a^m)^n = (a^{-k})^n = ((a^{-1})^k)^n
= (a^{-1})^{kn} = a^{-kn} = a^{mn}$. Similarly dispose of other cases.
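Suppose $\wedge$ is distributive over $\vee$. Let $x, y, z \in X$. Then
\begin{align*}
(x \vee y) \wedge (x \vee z) &= [(x \vee y) \wedge x] \vee [(x \vee y) \wedge z] \\
&= x \vee [(x \wedge z) \vee (y \wedge z)] = [x \vee (x \wedge z)] \vee (y \wedge z) \\
&= x \vee (y \wedge z).
\end{align*}
So $\vee$ is distributive over $\wedge$. For the converse interchange $\wedge$ and
$\vee$.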
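The argument is purely set theoretic for the power set lattice. The
argument for the lattice of statements is equally straightforward.
For the lattice of positive integers, consider prime factor
decomposition. If $a, b, c \in N$, find primes $p_1, \ldots, p_r$ so that
$a = p_1^{\alpha_1} p_2^{\alpha_2} \ldots p_r^{\alpha_r}$; $b = p_1^{\beta_1} p_2^{\beta_2} \ldots p_r^{\beta_r}$ and $c = p_1^{\gamma_1} p_2^{\gamma_2} \ldots p_r^{\gamma_r}$
where $\alpha_i$'s, $\beta_i$'s, $\gamma_i$'s are non-negative integers. Use the fact that $a \wedge b
= p_1^{\delta_1} p_2^{\delta_2} \ldots p_r^{\delta_r}$ and $a \vee b = p_1^{\epsilon_1} p_2^{\epsilon_2} \ldots p_r^{\epsilon_r}$ where for i = 1, ..., r,
$\delta_i = \min(\alpha_i, \beta_i)$ and $\epsilon_i = \max(\alpha_i, \beta_i)$. Similarly for $a \wedge c$ etc.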
Let $x_1, x_2, x_3$ be three mutually incomparable elements of a lattice
such that for $i \ne j$, $x_i \wedge x_j = a$ (say) and $x_i \vee x_j = b$. Then
$x_1 \vee (x_2 \wedge x_3) = x_1 \vee a = x_1$. But $(x_1 \vee x_2) \wedge (x_1 \vee x_3) = b \wedge b = b$.
10. This follows straight from definitions. However, it is more interest-
ing to prove such results in the context of particular algebraic
structures to be encountered in subsequent chapters.
11. That f is a monoid homomorphism follows from the properties of
exponential function, namely $e^{x+y} = e^x e^y$ and $e^0 = 1$. f is not a
bijection and hence not an isomorphism. However, if $(R_+, \cdot)$ is the
submonoid of $(R, \cdot)$ consisting of positive real numbers then
$f : R \to R_+$ is a monoid isomorphism.
12. For example, let X = N, * = usual multiplication. Then (N, *) is
a commutative monoid in which cancellation law holds. Let R be
the equivalence relation of congruency modulo 10. The quotient
structure X/R is a commutative monoid but does not satisfy
cancellation law.
13. dug a eds b ed. On R, we have 2<3but(—l)2>(——l)3.
So < is not compatible with multiplication.
14. For all $x, y, z \in R$, $d(x + z, y + z) = |(x + z) - (y + z)|
= |x - y| = d(x, y)$. Similarly for $(x_1, y_1)$, $(x_2, y_2)$ and $(x_3, y_3)$
in $R^2$: $d((x_1, y_1) + (x_3, y_3), (x_2, y_2) + (x_3, y_3)) = d((x_1 + x_3, y_1 + y_3),
(x_2 + x_3, y_2 + y_3)) = \sqrt{(x_1 + x_3 - x_2 - x_3)^2 + (y_1 + y_3 - y_2 - y_3)^2}
= \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} = d((x_1, y_1), (x_2, y_2))$. If $d(x, y)
= |x^3 - y^3|$ for $x, y \in R$ then d(0, 2) = 8 but d(0 + 1, 2 + 1) =
d(1, 3) = 26.
15. If * is commutative or associative, $\circledast$ has the corresponding
properties. If e is a right/left identity for *, then {e} is a right/left identity
for $\circledast$. However, even if every element of X is invertible, the same
is not always true for P(X). Similarly even if cancellation law
holds for *, it need not hold for $\circledast$ (e.g. let X = Z, * = +. Then
$Z \circledast \{1\} = Z \circledast \{2\}$ even though $\{1\} \ne \{2\}$).
16. Let $a_0, a_1, \ldots, a_k$ be the digits in the decimal expansion of an integer
n. Then $n = \sum_{i=0}^{k} a_i 10^i$. Since $10 \equiv 1 \pmod 3$, $100 \equiv 10^2 \equiv 1^2 \pmod 3$
and in general $10^i \equiv 1 \pmod 3$ for all i = 0, 1, 2, .... So $n \equiv \sum_{i=0}^{k} a_i
\pmod 3$. In particular, 3 divides n iff 3 divides $\sum_{i=0}^{k} a_i$.
17. Since $10 \equiv 1 \pmod 9$, a similar criterion holds for divisibility by 9
as for divisibility by 3. In case of 11, $10 \equiv -1 \pmod{11}$. So
$10^i \equiv (-1)^i \pmod{11}$. So $n = \sum_{i=0}^{k} a_i 10^i$ is divisible by 11 iff $a_0 - a_1 +
a_2 - a_3 + \ldots + (-1)^k a_k$ is divisible by 11.
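The three divisibility criteria can be confirmed numerically. In the sketch below the digits are taken with $a_0$ as the units digit, as in the answers; the helper names are illustrative.

```python
def digits(n):
    """Decimal digits of a non-negative integer, units digit first."""
    return [int(ch) for ch in reversed(str(n))]


def digit_sum(n):
    return sum(digits(n))


def alternating_digit_sum(n):
    """a_0 - a_1 + a_2 - ... with a_0 the units digit."""
    return sum((-1) ** i * a for i, a in enumerate(digits(n)))


if __name__ == "__main__":
    for n in (123456, 918082, 5038):
        print(n, digit_sum(n), alternating_digit_sum(n))
```

For instance 918082 = 11 x 83462, and its alternating digit sum is 2 - 8 + 0 - 8 + 1 - 9 = -22, a multiple of 11.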
18. Represent the days of the week by residue classes modulo 7, with
Sunday for [0], Monday for [1] etc. Suppose 13th January falls on
$[x]$. Then 13th February comes after 31 days and since $31 \equiv 3 \pmod 7$,
it will fall on $[x + 3]$. In an ordinary year the subsequent
13ths will fall on $[x + 3]$, $[x + 6]$, $[x + 1]$, $[x + 4]$, $[x + 6]$,
$[x + 2]$, $[x + 5]$, $[x]$, $[x + 3]$ and $[x + 5]$. Regardless of what $x$
is, all residue classes appear in this list. So every ordinary year has
at least one Friday the thirteenth. Also at most three of these classes
are equal. $[x + 3]$ occurs thrice in the list, namely February, March
and November. So the 13ths of these months fall on the same
day, which is a Friday iff $[x + 3] = [5]$, i.e. iff $[x] = [2]$, i.e. iff $x$
is Tuesday (or equivalently the year begins on Thursday).
For a leap year simply add [1] to the classes for the months
from March to December. Then January, April and July have their
13ths falling on the same day.
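The residue-class argument can be cross-checked against an actual calendar. The sketch below uses Python's standard `datetime` module; the helper name `friday_13th_months` is an illustrative choice, not notation from the text.

```python
from datetime import date


def friday_13th_months(year):
    """Months (1 to 12) of the given year whose 13th falls on a Friday."""
    # date.weekday() numbers Monday as 0, so Friday is 4.
    return [m for m in range(1, 13) if date(year, m, 13).weekday() == 4]


# 2015 was an ordinary year beginning on a Thursday;
# 2012 was a leap year whose 13th January was a Friday.
ordinary_example = friday_13th_months(2015)
leap_example = friday_13th_months(2012)
```

Under the argument above one expects every year to give a list of length between 1 and 3, with a length-3 list being February, March, November in an ordinary year and January, April, July in a leap year.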
19. $\dfrac{\binom{34}{3} + 2\binom{33}{3} + 34 \times 33 \times 33}{\binom{100}{3}}$.
(In $\mathbb{Z}_3$, $a + b + c = 0$ holds iff $a, b, c$ are distinct or all equal.)
20. For $i = 1, \ldots, 35$, let $x_i$ be the number of lectures on the $i$th day.
Let $y_i = x_1 + \cdots + x_i$. Then $1 \le y_1 < y_2 < \cdots < y_{35} = 60$. Let
$Y = \{y_1, y_2, \ldots, y_{35}\}$. We have to show that for some $i < j$, $y_j - y_i = 13$.
For $k = 1, \ldots, 13$ let $S_k = \{n : 1 \le n \le 60,\ n \equiv k \pmod{13}\}$. Then
$|S_k| = 5$ for $1 \le k \le 8$ and $|S_k| = 4$ for $9 \le k \le 13$. Also
$Y = \bigcup_{k=1}^{13} (Y \cap S_k)$. Now if the assertion fails then $Y \cap S_k$ can
contain at most 3 elements for $1 \le k \le 8$ and at most 2 elements for
$9 \le k \le 13$. But then $|Y| \le 34$, a contradiction. However, a set
$Y$ with 34 elements can be constructed so that no two of its elements
differ by 13. In fact let $Y = \{1, \ldots, 13\} \cup \{27, \ldots, 39\} \cup \{53, \ldots, 60\}$.
The corresponding $x_i$'s give the distribution of the lectures over 34 days.
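Both halves of this answer, the bound of 34 and the extremal set, can be checked mechanically. The following Python sketch uses illustrative names (`has_difference`, `max_size_without_difference`) that are not taken from the text.

```python
def has_difference(values, d):
    """True if two elements of `values` differ by exactly d."""
    s = set(values)
    return any(v + d in s for v in s)


def max_size_without_difference(n, d):
    """Largest subset of {1, ..., n} with no two elements differing by d.

    Each residue class mod d forms a chain r, r+d, r+2d, ...; from a chain
    of length L at most ceil(L / 2) elements can be chosen without taking
    two neighbours.
    """
    total = 0
    for r in range(1, d + 1):
        chain_length = len(range(r, n + 1, d))
        total += (chain_length + 1) // 2
    return total


# The extremal set given in the answer.
Y = list(range(1, 14)) + list(range(27, 40)) + list(range(53, 61))
```

With `n = 60` and `d = 13` the bound works out to 8 chains of length 5 and 5 chains of length 4, matching the count in the answer.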
21. Let $x$ be a positive integer. Write $x$ as $4m + k$ where $m, k$ are
integers and $k = 0, 1, 2$ or $3$. Then modulo 8, $x^2$ equals $k^2$ which
is 0, 1, 4 or 1. The result follows from the fact that in $\mathbb{Z}_8$, [7] cannot be
expressed as the sum of three elements from [0], [1], [4], with
repetitions allowed.
22. The hint gives one solution. The problem can also be done by
recurrence relations. Let $b_n$ be the number of possible interpretations
of $a_1 * a_2 * \cdots * a_n$. In any such interpretation, the last application
of $*$ must be of the form $x * y$ where for some $r$, $1 \le r \le n - 1$,
$x$ is an interpretation of $a_1 * \cdots * a_r$ and $y$ is an interpretation of
$a_{r+1} * \cdots * a_n$. So we get $b_n = \sum_{r=1}^{n-1} b_r b_{n-r}$ for $n \ge 2$.
This is analogous to the recurrence relation for the Vendor Problem
obtained in Chapter 1, Section 3. It can be solved by the methods in
Chapter 7.
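The recurrence can be evaluated directly for small $n$. A minimal Python sketch, assuming the initial value $b_1 = 1$ (a single operand admits exactly one interpretation); the function name is illustrative:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def interpretations(n):
    """Number b_n of ways to bracket a_1 * a_2 * ... * a_n."""
    if n == 1:
        return 1  # assumed base case: one operand, one interpretation
    # Split at the last application of *: the left part has r operands,
    # the right part has n - r operands.
    return sum(interpretations(r) * interpretations(n - r) for r in range(1, n))


first_values = [interpretations(n) for n in range(1, 7)]
```

Memoisation through `lru_cache` keeps the computation polynomial in $n$ instead of recomputing every split.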
23. Since $X$ is finite, in the sequence $x, x^2, x^3, x^4, \ldots, x^n, \ldots$, we must
have $x^m = x^n$ for some $m < n$. Then $x^j = x^{j + t(n-m)}$ for all $j \ge m$ and
$t \ge 0$, and so $x^{m(n-m)}$ is an idempotent.
24. (i) Suppose $m, n$ are relatively prime. Then $mn$ is divisible by a
prime square iff at least one of $m$ and $n$ is. When this is not
the case, the number of prime factors of $mn$ is the sum of the
numbers of prime factors of $m$ and $n$. This proves multiplicativity
of $\mu$. Others are trivially multiplicative.
(ii) For commutativity note that for $n \in \mathbb{N}$, both $(f * g)(n)$ and
$(g * f)(n)$ equal $\sum_{(a,b)} f(a)g(b)$ where the sum ranges over all
ordered pairs $(a, b)$ of positive integers for which $ab = n$.
Similarly both $((f * g) * h)(n)$ and $(f * (g * h))(n)$ equal
$\sum_{(a,b,c)} f(a)g(b)h(c)$ where $a, b, c$ are positive integers with $abc = n$.
(iii) Prove the hint using prime factorisations of $m, n$. With the
notation of the hint, if $k = uv$ then $\frac{mn}{k} = u'v'$ where $u' = \frac{m}{u}$,
$v' = \frac{n}{v}$. Note that $u, v$ are relatively prime and so are $u', v'$. So
$f(uv) = f(u)f(v)$ etc. Then $(f * g)(mn)$ comes out to be
$\sum f(u)f(v)g(u')g(v')$ where the sum ranges over all the quadruples
$(u, v, u', v')$ for which $uu' = m$ and $vv' = n$. This factors as
$\left(\sum_{(u,u')} f(u)g(u')\right)\left(\sum_{(v,v')} f(v)g(v')\right)$
which is precisely $(f * g)(m)$ times $(f * g)(n)$.
(iv) If $f * g = \delta$ for some $g \in S$, then $(f * g)(1) = 1 = f(1)g(1)$.
So $f(1) \neq 0$. For the converse follow the hint. Define
$g(1) = \frac{1}{f(1)}$. For $n > 1$, $g(n)$ has to be defined as
$-\frac{1}{f(1)} \sum f(a)g(b)$ where $(a, b)$ ranges over all ordered pairs of
positive integers for which $ab = n$ and $b < n$. Then $(f * g)(n) = 0$ for
$n > 1$.
(v) By (i) and (iii) $\mu * c$ is multiplicative. So it suffices to show that
$(\mu * c)(p^r) = 0$, for a prime $p$ and a positive integer $r$. The
only divisors of $p^r$ are $p^k$, $0 \le k \le r$. Since $\mu(p^0) = 1$, $\mu(p) = -1$
and $\mu(p^k) = 0$ for $k > 1$, the result follows.
(vi) and (vii) are immediate from definitions.
(viii) For $k = 1, \ldots, n$, let $G_k = \{x : 1 \le x \le n$, g.c.d. of $x$ and $n$
is $k\}$. Clearly $G_k = \emptyset$ if $k$ does not divide $n$. If $k$ divides $n$ and
the g.c.d. of $x$ and $n$ is $k$, we can write $x = kr$ and $n = ks$.
Then $r \le s$. Also $r$ is relatively prime to $s$, for any common
factor of them will give a larger common divisor of $x$ and $n$
than $k$. Conversely if $x = kr$ and $r$ is relatively prime to $s$,
then the g.c.d. of $x$ and $n$ is $k$. So $|G_k| = \phi(s) = \phi\left(\frac{n}{k}\right)$. Hence
$n = \sum_{k=1}^{n} |G_k| = \sum_{k \mid n} \phi\left(\frac{n}{k}\right) c(k)$, showing that
$i(n) = (\phi * c)(n)$ for all $n$.
(ix) Multiply both sides of $\phi * c = i$ by $\mu$ and use $c * \mu = \delta$.
(x) Use (i), (iii) and (ix).
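The identities in parts (v), (viii) and (ix) lend themselves to a brute-force check. The Python sketch below uses the symbols of the answer ($\mu$, $\phi$, $c$, $i$, $\delta$) as function names; the helper names `divisors` and `convolve` are illustrative assumptions.

```python
from math import gcd


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def convolve(f, g):
    """Convolution (f * g)(n) = sum of f(a) g(b) over all ab = n."""
    return lambda n: sum(f(a) * g(n // a) for a in divisors(n))


def mu(n):
    """Moebius function: 0 if n has a square factor, else (-1)^(number of primes)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divided the original n
            result = -result
        p += 1
    return -result if n > 1 else result


def phi(n):
    """Euler's function: count of x in 1..n relatively prime to n."""
    return sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)


def c(n):
    return 1  # the constant function


def i(n):
    return n  # the identity function


def delta(n):
    return 1 if n == 1 else 0  # identity element for convolution
```

For instance, `convolve(mu, c)` should agree with `delta`, `convolve(phi, c)` with `i`, and `convolve(i, mu)` with `phi` on every positive integer tried.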
25. The arguments are analogous to that of the last exercise except that
some care is necessary since the convolution is not necessarily
commutative. If $f(x, x) \neq 0$ for all $x \in X$, define the inverse, say $g$, of $f$
on $(x, y)$ by induction on the number of elements $z$ which satisfy
$x \le z \le y$. (First we construct a right inverse and a left inverse
for $f$ separately. Then apply Proposition (4.7).)
CHAPTER 4
Section 4.1
Define $f : P(S) \times P(T) \to P(S \cup T)$ by $f(A, B) = A \cup B$.
This is a generalisation of Exercise (3.3.14). If $n$ is not square-free
and $p$ is a prime such that $p^2 \mid n$, then $p$ and $n/p$ are not
complements of each other. Also the number of positive divisors of $n$ is
not a power of 2 in this case.
From the tables, if at all $X$ is a Boolean algebra then $a = 0$ and
$d = 1$. So $b$ and $c$ must be complements of each other. This
defines $'$. The axioms can then be verified. Alternatively, let
$Y = P(\{1, 2\})$.
Define $f(a) = \emptyset$, $f(b) = \{1\}$, $f(c) = \{2\}$ and $f(d) = \{1, 2\}$.
Observe that a subset $A$ of $S$ is in $Y$ iff it is the union of some
members of $\mathcal{D}$, i.e. iff there is some subcollection $\mathcal{C}$ of $\mathcal{D}$ such that
$A = \bigcup_{C \in \mathcal{C}} C$. (The collection $\mathcal{C}$ need not be finite. Even then the
union of its members is well defined. It consists of all elements of
$S$ which belong to at least one member of $\mathcal{C}$.) An isomorphism
between $Y$ and $P(\mathcal{D})$ is obtained by associating $A$ with $\mathcal{C}$.
(a) If such a proof were possible, it would apply equally well to
the algebraic structure $(\mathbb{R}, +, \cdot)$. But tautology does not hold
in $\mathbb{R}$ either for $+$ or for $\cdot$.
(b) In the second step, associative law has been tacitly assumed.
Although associativity does hold, its proof requires part
(ii) of Theorem (1.3).
Denote the smallest and largest elements by 0 and 1. Suppose for
some $x, y, z$ we have $x \vee y = x \vee z = 1$ and $x \wedge y = x \wedge z = 0$.
Then $y = y \wedge 1 = y \wedge (x \vee z) = (y \wedge x) \vee (y \wedge z) = 0 \vee (y \wedge z) = y \wedge z$.
Similarly $z = y \wedge z$. So $y = z$.
$y = y(x + y) = y(x + z) = yx + yz = zx + yz = z(x + y)
= z(x + z) = z$.
By absorption $x_i = x_i(x_1 + \cdots + x_n)$. If $x_1 + x_2 + \cdots + x_n = 0$,
then $x_i = x_i(x_1 + \cdots + x_n) = x_i \cdot 0 = 0$ for $i = 1, \ldots, n$. Converse
is trivial. The other assertion follows by duality.
10. If $xy' + x'y = 0$, then by the last exercise $xy' = 0$ and $yx' = 0$.
So $x = x(y + y') = xy + xy' = xy$. Similarly $y = xy$. So $x = y$.
Converse is trivial.
11. Let $H, I, R, M$ denote appropriate subsets of the set of all men.
Then (i) says $HIR' = 0$ while (ii) says $HR'I' = 0$. Adding, $HR' = 0$.
Similarly (iii) gives $RH'M' = 0$ and (iv) says $MRH' = 0$. So
$RH' = 0$. By the last exercise, $R = H$.
12. Let $S, H, C$ denote appropriate subsets. (ii) says $SHC' = 0$ while
(iii) says $H'SC' = 0$. Adding, $SC' = 0$, i.e. $S \subset C$. But this
contradicts (i).
13. Rule (iii) is redundant. The simplified system of rules is 'Every
student must register for course B and at least one more course'
and 'No student can register both for A and C'.
14. The first assertion follows from the fact that $x \le y$ iff $y' \le x'$.
For the second assertion apply Proposition (1.10) to the
complement of the given element.
15. (i) Let $S - A$, $S - B$ be finite. Then $S - (A \cup B)$ is a subset of
$S - A$ and hence is finite. $S - (A \cap B)$ equals $(S - A) \cup (S - B)$
and so is finite.
(ii) If $A$ and $S - A$ are both finite, $S$ would be finite. If $S = \mathbb{R}$
and $A = \mathbb{N}$ then $A$ is an infinite but not a cofinite subset of
$S$.
(iii) Let $\mathcal{C}$, $\mathcal{F}$ be the collections respectively of all cofinite subsets
and all finite subsets of $S$. Define $\theta : \mathcal{C} \to \mathcal{F}$ by $\theta(A) = S - A$
for $A \in \mathcal{C}$. Then $\theta$ is a bijection. By Exercise (2.2.22) (v),
$\mathcal{F}$ is countable.
16. Clearly $Y$ is closed under complementation. Let $A, B \in Y$. If at
least one of them is finite, so is $A \cap B$. If both are cofinite, then so
is $A \cap B$ by the last exercise. In either case $A \cap B \in Y$. Similarly
$A \cup B \in Y$.
17. Obviously $Y$ is infinite. However, with the notation of Exercise
(1.15) (iii) above, $Y = \mathcal{C} \cup \mathcal{F}$ and so $Y$ is countable by Exercise
(2.2.22) (iv). Hence $|Y| = \aleph_0$. Now if $S$ is finite so is $P(S)$. If $S$
is infinite, then by Exercise (2.2.25), $|S| \ge |\mathbb{N}|$, from which it
follows that $|P(S)| \ge |P(\mathbb{N})|$. But by Exercise (2.2.27),
$|P(\mathbb{N})| > \aleph_0$.
18. No element equals its own complement. So every pair contains
exactly two distinct elements.
19. (i) Commutativity of $\Delta$ is clear. For associativity, $(x \Delta y) \Delta z$
equals $(xy' + x'y)z' + (xy' + x'y)'z$ which simplifies to
$xyz + x'y'z + x'yz' + xy'z'$. By symmetry, $x \Delta (y \Delta z)$ also
reduces to the same.
(iv) $(xy) \Delta (xz) = xy(x' + z') + (x' + y')xz
= x(yz' + y'z) = x(y \Delta z)$.
(vi) Putting $z = xy$ in (i) above, $x \Delta y \Delta xy$ equals
$xyxy + x'y'xy + x'y(x' + y') + xy'(x' + y')$
which reduces to $xy + x'y + xy'$ and further to $x + y$.
20. The operations in $Y$ are pointwise. In particular,
$fg' = 0 \iff f(s)g'(s) = 0$
for all $s \in S \iff f(s) \le g(s)$ for all $s \in S$. For $s_0 \in S$, the function
$f$ defined is non-zero. If $g \le f$ then $g(s) = 0$ for all $s \neq s_0$, while
$g(s_0) \le f(s_0)$ implies $g(s_0) = 0$ or $f(s_0)$ since $f(s_0)$ is an atom of $X$.
So either $g = 0$ or $g = f$.
21. Let $X, Y$ be Boolean algebras. Denote the operations on them by
the same symbols. Then $(a, b) \le (c, d)$ iff $a \le c$ in $X$ and $b \le d$
in $Y$. $(x, y)$ is an atom of $X \times Y$ iff one of $x$ and $y$ is 0 and the
other is an atom.
Section 4.2
1. The disjunctive and conjunctive normal forms are:
(i) $x_1'x_2x_3' + x_1'x_2x_3$ and $(x_1 + x_2 + x_3)(x_1' + x_2 + x_3') \times
(x_1' + x_2 + x_3)(x_1 + x_2 + x_3')(x_1' + x_2' + x_3)(x_1' + x_2' + x_3')$.
(ii) $abc + abc' + ab'c + ab'c' + a'bc' + a'bc + a'b'c'$ and $(a + b + c')$.
(iii) 0 and the complete C.N.F., i.e.
$(x + y)(x + y')(x' + y)(x' + y')$.
If $g = f + h$, then for every $y$, $f(y) \le g(y)$. So $f(y) = 1 \Rightarrow g(y) = 1$.
For the converse, we can take $h$ to be $g$ itself. (Other choices are
also possible. But note that there is no such thing as $g - f$ in a
Boolean algebra.)
$f$ is a factor of $g$ iff $g$ vanishes wherever $f$ does.
Whenever $f = gh$, the C.N.F. for $f$ is obtained by multiplying the
C.N.F.'s of $g$ and $h$. (In view of tautology, the common factors
need not be repeated.)
Introduce a new Boolean variable $y$ which is set equal to 1 if the
veto power is not exercised and 0 if it is. Then $b$, i.e. the state of
the box, comes out as the product of 11 factors, namely $y$ and 10
factors of the form $(x_i + x_j + x_k)$. So the box has 11 locks, ten
as before but the eleventh lock has only one key which lies with
Pr
Let $a, b, \ldots, g$ be Boolean variables representing passage or failure
in A, B, ..., G respectively. Set $s = 1$ or $0$ according as the
candidate is or is not from a scheduled caste. Let $p$ denote passage in
the examination. Then 11 = pm. + P1P: + P1P. + hm. where
$p_1$ is a sum of 21 terms of the form $abcde$, $p_2 = a(bc + cf + bf)$,
$p_3 = cdg$ and $p_4 = s$.
Let $a, b, c$ be Boolean variables representing the sizes, the locations
and the personnel respectively. Let $f = 1$ or $0$ according as the
industry gets a license or not. The conditions of the problem imply
$f \ge abc + abc' + a'b'c'$ and $f' \ge a'b'c + ab'c'$. This does not
determine $f$ uniquely. The remaining three terms of the complete
D.N.F., namely $ab'c$, $a'bc$ and $a'bc'$ can be put at will either in $f$
or in $f'$, independently of each other. (Such terms are sometimes
called neutral or 'don't care' conditions.) They are to be used for
simplifying $f$. In the present case if we put $a'bc$ and $a'bc'$ in $f$, $f$
simplifies to $b + a'c'$. So one possible system of rules is that 'All
urban industries will get licenses' and 'All small scale industries
employing unskilled personnel will get licenses'.
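The effect of the 'don't care' terms can be confirmed with a truth table. In the Python sketch below the tuples list the values of $(a, b, c)$ at which $f$ is forced to be 1 or 0 by the two inequalities above; the variable and function names are illustrative.

```python
from itertools import product

# Points (a, b, c) where the conditions force f = 1: abc, abc', a'b'c'.
must_grant = [(1, 1, 1), (1, 1, 0), (0, 0, 0)]
# Points where the conditions force f = 0: a'b'c, ab'c'.
must_refuse = [(0, 0, 1), (1, 0, 0)]


def simplified_rule(a, b, c):
    """f = b + a'c', obtained by placing a'bc and a'bc' in f."""
    return bool(b or ((not a) and (not c)))


# The three remaining atoms are the 'don't care' conditions.
dont_care = [t for t in product((0, 1), repeat=3)
             if t not in must_grant and t not in must_refuse]
```

Any choice of values on `dont_care` gives an admissible $f$; the simplified rule is just one such choice that happens to have a short expression.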
(i) and (ii) are symmetric with characteristic numbers 1, 2, 3 and
3 respectively.
$g$ is the minimum of all symmetric functions $k$ such that $f \le k$. To
construct $g$ explicitly from $f$, suppose a term like
$x_1^{\epsilon_1} x_2^{\epsilon_2} \cdots x_n^{\epsilon_n}$ appears
in the D.N.F. of $f$ where, say, $r$ of the $\epsilon_i$'s are 1 and the remaining
$n - r$ $\epsilon_i$'s are 0. Then in $g$, put $\binom{n}{r}$ terms corresponding to all
possible binary sequences of length $n$ in which exactly $r$ terms are
1. Do this for every term in the D.N.F. of $f$. To construct $h$, take
only those terms in the D.N.F. of $f$ for which all possible
$\binom{n}{r}$ terms are also in the D.N.F. of $f$. ($g$ may be called the
symmetric closure of $f$.)
10. Apply the definition repeatedly.
11. The $n$-tuple $(x_2, \ldots, x_n, x_1)$ can be obtained from $(x_1, x_2, \ldots, x_n)$ by a
series of interchanges of variables; for example, going through
$(x_2, x_1, x_3, \ldots, x_n)$, $(x_2, x_3, x_1, x_4, \ldots, x_n)$, ..., $(x_2, x_3, \ldots, x_{n-1}, x_1, x_n)$.
A symmetric function must assume the same value at all these
points. A counter-example for the converse is given by (iii) in
Exercise (2.8).
12. The result is trivial for $n = 1, 2$. For $n = 3$, classify the terms in
the complete D.N.F. with 3 variables into four classes depending
upon the number of dashes. Thus, $A_0 = \{x_1x_2x_3\}$,
$A_1 = \{x_1x_2x_3', x_1x_2'x_3, x_1'x_2x_3\}$, $A_2 = \{x_1'x_2'x_3, x_1'x_2x_3', x_1x_2'x_3'\}$
and $A_3 = \{x_1'x_2'x_3'\}$.
Then $f(x_1, x_2, x_3)$ is symmetric iff it has the property that whenever
its D.N.F. contains any one term from $A_i$, it contains all terms of
$A_i$, $i = 0, \ldots, 3$. This is always true for $i = 0$ and 3. For $i = 1, 2$
it follows from cyclical symmetry of $f$.
13. For example, let $f(x, y, z) = x e^y \sin z + y e^z \sin x + z e^x \sin y$.
14. Classify the atoms as in the solution to Exercise (2.12) above, into
$n + 1$ classes $A_0, A_1, \ldots, A_n$. Then for $1 \le k \le n - 1$, if $f$ contains
an atom in $A_k$, it must also contain all atoms obtained by cyclical
permutations. The point is that if $n$ is a prime, the atoms so
obtained are all distinct. So there are $n - 1$ of them besides the original
one. This need not be so if $n$ is not a prime, e.g. if $n = 4$ then the
atom $x_1x_2'x_3x_4'$ gives only one more atom, $x_1'x_2x_3'x_4$. (Results like
this are special cases of a theorem on group actions. See the
Epilogue.)
15. Continuing the notation above, each class $A_k$, $k = 1, \ldots, n - 1$
gets divided into subclasses. Each subclass has $n$ distinct atoms.
A cyclically symmetric function must contain either none or all $n$
of these atoms. Thus in all the $2^n$ atoms get classified into classes,
two of which are singleton and the remaining have $n$ elements each.
The number of classes is $2 + \frac{2^n - 2}{n}$. Any cyclically symmetric
function of $n$ variables is determined uniquely by which of these
classes it contains.
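The class count $2 + (2^n - 2)/n$ for prime $n$, and its failure for $n = 4$, can be seen by enumerating the classes directly. A minimal Python sketch, in which an atom is represented by a tuple of 0s and 1s (1 for an undashed variable); the function name is illustrative:

```python
from itertools import product


def cyclic_classes(n):
    """Partition the 2^n atoms into classes under cyclic permutation."""
    seen, classes = set(), []
    for atom in product((0, 1), repeat=n):
        if atom in seen:
            continue
        # All cyclic shifts of this atom form one class.
        orbit = {atom[k:] + atom[:k] for k in range(n)}
        seen |= orbit
        classes.append(orbit)
    return classes


sizes_for_5 = sorted(len(cls) for cls in cyclic_classes(5))
sizes_for_4 = sorted(len(cls) for cls in cyclic_classes(4))
```

For $n = 5$ every non-singleton class has 5 atoms, while for $n = 4$ a class of size 2 appears, which is the phenomenon noted at the end of the previous answer.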
16. With the notation in the text, let $m_1 > 1$. Then $m_i \ge 2$ for all
$i = 1, 2, 3, 4$. If any $p_i$ is not used or is in the left pan then it is
impossible to weigh more than 38 kg. So all $p_i$'s must be in the
right pan. But then the weight is 40 kg. So it is impossible to
weigh 39 kg.
17. Start from $0 \in A_0$. By (i), $1 \in A_1$ and $-1 \in A_{-1}$. If $2 \in A_0$ then
by (i), $1 \in A_{-1}$, a contradiction. Similarly if $2 \in A_1$ then by (ii)
$1 \in A_0$, also a contradiction. So $2 \in A_{-1}$ whence by (iii), $3 \in A_0$
and hence by (i) again $4 \in A_1$. Consider 5. Continue like this.
Similarly work with negative integers. (Actually it is not necessary
to assume beforehand that $|A_i| = 27$ for $i = -1, 0, 1$.)
18. Show that the sets $A_{-1}, A_0, A_1$ in the hint satisfy the conditions in
the last exercise. For example, $A_0$ is precisely the set of weights
that can be weighed without using $p_1$. If $m$ is such a weight, then
putting $p_1$ in the right pan we can weigh $m + 1$. But this means
$m + 1 \in A_1$, because $A_1$ consists of precisely the weights that can
be weighed with $p_1$ in the right pan.
19. More generally, for every positive integer $n$, a stone weighing
$\frac{3^n - 1}{2}$ kgs can be uniquely cut into $n$ parts so that every integral
weight up to $\frac{3^n - 1}{2}$ kgs can be weighed with only one use of the
balance. The parts must weigh $1, 3, \ldots, 3^{n-1}$ kgs.
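For $n = 4$ the claim says that pieces of 1, 3, 9 and 27 kg weigh every integer from 1 to 40 kg. The Python sketch below enumerates all placements; the encoding of a placement as a sign ($-1$ for the left pan, $0$ for unused, $+1$ for the right pan) is an assumption made here for illustration.

```python
from itertools import product


def weighable(pieces):
    """Map each positive measurable weight to the placements that realise it."""
    results = {}
    for signs in product((-1, 0, 1), repeat=len(pieces)):
        weight = sum(s * p for s, p in zip(signs, pieces))
        if weight > 0:
            results.setdefault(weight, []).append(signs)
    return results


table = weighable([1, 3, 9, 27])
```

Each weight from 1 to 40 appears exactly once in `table`, reflecting the uniqueness of the balanced ternary representation.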
20. In the D.N.F. of $f$, combine all terms containing $x_n$ as well as all
those containing $x_n'$. The significance is that a Boolean function of
$n$ variables can be looked upon as a combination of two functions
of $n - 1$ variables, because each of the functions $f(x_1, \ldots, x_{n-1}, 1)$
and $f(x_1, \ldots, x_{n-1}, 0)$ is essentially a function of the $n - 1$ variables
$x_1, \ldots, x_{n-1}$. Note that such a result is possible only because a
Boolean variable assumes finitely many (in fact only two) values.
21. In $\mathbb{Z}_2$, $0 \oplus 0 = 1 \oplus 1 = 0$ and $1 \oplus 0 = 0 \oplus 1 = 1$. This proves
the assertion for $n = 2$. For the general case apply induction.
Section 4.3
1. (a) X] + 11'
(b) a'
(0) «(be + be: + W: + pc) + x(yz + pbc + m + pq).
Only (a) and (b) can be simplified.
2. $(ab + xy)(cd + zw)$
3. Let the switches be $x, y, z$ and $f$ denote the state of the lamp.
Without loss of generality we may suppose $f(1, 1, 1) = 1$. This gives
$f(0, 1, 1) = f(1, 0, 1) = f(1, 1, 0) = 0$;
$f(0, 0, 1) = f(0, 1, 0) = f(1, 0, 0) = 1$
and
$f(0, 0, 0) = 0$.
So
$f = xyz + x'y'z + x'yz' + xy'z'$.
4. Let the switches be $x_1, \ldots, x_n$. If we assume $f(0, 0, \ldots, 0) = 1$ then
$f(x_1, \ldots, x_n) = 1$ iff the number of $x_i$'s having value 1 is even. So $f$
is a symmetric function with characteristic numbers 0, 2, 4, 6, ... .
If $f(0, 0, \ldots, 0) = 0$, then also $f$ is symmetric, but its characteristic
numbers are 1, 3, 5, 7, ... .
5. Each of the $2^n$ possible combinations of the states of the switches
provides one path for the current starting at the initial terminal.
This path will terminate at the lower or the upper level depending
upon whether the number of $x_i$'s equal to 1 is even or odd.
6. Let $n_i$ be the number of persons at the $i$th table and $x_i$ be the
corresponding switch. $\sum n_i$ is odd iff an odd number of the $n_i$'s are odd,
i.e. iff an odd number of $x_i$'s are equal to 1. So the closure function
of the lamp is a symmetric function with characteristic numbers
1, 3, 5, 7, ... . Use the last exercise.
8. Let $x, y$ be the switches and $f, g$ be the closure functions of the two
machines. One choice is to let $f = xy + x'y'$ and $g = xy' + x'y$.
Using Exercise (3.5) again, a possible circuit is,
[Circuit diagram not reproduced.]
9. Let switches $a, b, c$ indicate whether A, B, C have tests respectively.
Let $d$ be the switch which closes on odd numbered days. Then a
possible circuit is
[Circuit diagram not reproduced.]
10. Attach lamps of different colours to the terminals in the circuit of
Fig. 4.11.
12. Introduce some shorthand notation. Let $X_5$, $Y_5$ denote replicas of
the circuit in Fig. 4.10 with $n = 5$. In $X_5$ the switches are $x_1, \ldots, x_5$;
in $Y_5$ they are $y_1, \ldots, y_5$. Also let $Y_{5,2}$ be the portion of $Y_5$ up to and
including level 2 (i.e. terminals $T_0, T_1, T_2$). Then a desired circuit
is
[Circuit diagram not reproduced.]
13. [105.111.
14. (1)
(iii) Let $B_n$ be a black-box representation of $x_1 \oplus \cdots \oplus x_n$. An
inductive construction for $B_n$ is
[Circuit diagram not reproduced.]
15. Use De Morgan’s law by which xy =(x’ + y')’.
16. (a) XOR was constructed in Exercise (3.14) (ii). For NAND and
NOR, let the respective outputs of AND and OR go through
NOT.
(b) For inputs $x$ and $y$, the output of NAND is often denoted by
$x \downarrow y$ (read '$x$ dagger $y$'), i.e., $x \downarrow y = (xy)' = x' + y'$. Now
$x \downarrow x = x'$. Hence $x + y = x' \downarrow y' = (x \downarrow x) \downarrow (y \downarrow y)$. Now
apply the last exercise.
(c) dual of (b).
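Part (b) says that NOT, OR and hence AND can all be built from NAND alone. A small Python sketch of these constructions over the values 0 and 1 follows; the function names are illustrative.

```python
def nand(x, y):
    """Output of a NAND gate: (xy)' = x' + y'."""
    return 1 - (x & y)


def not_(x):
    # NAND of x with itself gives x'.
    return nand(x, x)


def or_(x, y):
    # x + y = NAND(x', y') = NAND(NAND(x, x), NAND(y, y)).
    return nand(nand(x, x), nand(y, y))


def and_(x, y):
    # By De Morgan's law (the previous exercise), xy = (x' + y')' = NAND(x, y)'.
    return not_(nand(x, y))
```

Checking the four input combinations for each gate reproduces the usual truth tables.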
17. Between the terminals $T_1$ and $T_2$ the closure function is $a + bc$, i.e.
$(a + b)(a + c)$. Similarly for other pairs. The equivalent wye circuit is
[Circuit diagram not reproduced: a wye circuit joining the terminals $T_1$, $T_2$ and $T_3$.]
18. After $a$ is pressed contact $x$ is made and the control path of $X$
remains complete through $x$ even after $a$ is released. If now $b$ is
pressed and released, the control path of $Y$ remains complete
through $x$ and $y$. If $b$ is pressed while $x$ is not made, $Y$ does not
operate.
19. The tricky part is to construct a device which will be activated when
C is pressed twice in a row but not when pressed only once. This
can be done through a relay which is activated when C is released.
[Circuit diagram not reproduced.]
The relay Y 'remembers' that C had been pressed and released.
Similarly let Z be a relay which remembers that A had been pressed
and released after C.
[Circuit diagram not reproduced.]
Finally, let the circuits for the bell and the red lamp be respectively,
[Circuit diagrams not reproduced.]
(If desired, provide a master switch in the first circuit which is to
be turned off and on again before a fresh try is started.)
Take relays of delay 15 seconds each. Take the closure functions
for the red, green and yellow lamps as $x$, $xy$ and $xy'$ respectively.
Let the relays be $X_1, \ldots, X_n$. Let the closure function for the control
path of $X_1$ be $x'$, while that for $X_i$ ($2 \le i \le n$) be $x_{i-1}$.
Section 4.4
1. All except (4) and (11) are statements. (4) is not a sentence. (11)
is a sentence but cannot be assigned any truth value.
In essence we have here a pair of sentences $p$ and $q$. If $p$ says '$q$ is
false' and $q$ says '$p$ is false', then we can have $p$ true and $q$ false
(or vice versa). So both are statements. However, if $p$ says '$q$ is true'
and $q$ says '$p$ is false' then they cannot be assigned truth values and
hence are not statements.
Let $V$ be the set of villagers, $b$ the barber and $R$ the binary relation
on $V$ defined by $xRy$ iff $x$ shaves $y$. Then the announcement says
'For every $x \in V$, $bRx$ iff $\sim(xRx)$'. This is a statement which has to
be false, as otherwise $bRb$ iff $\sim(bRb)$. (This example is popularly
called the barber's paradox, and structurally resembles Russell's
paradox in Chapter 2, Section 1.) If the announcement were 'for
every $x \in V$, $x \neq b$, $bRx$ iff $\sim(xRx)$', then it could be either true
or false.
The verb ‘to learn' is used to convey metalearning, i.e. learning
about learning.
No. The given statement is thus false. Here again doing a thing
about which it is impossible to do something is ‘metadoing'. God
could possibly make a stone which nobody other than Himself can
lift.
There is nothing wrong in the reasoning. It sounds paradoxical
because of the definition of a surprise test, which involves
metaknowledge. In practice the instructor could either announce, 'There
may be a surprise test' or announce 'There will be a surprise test'
but change the definition of the surprise test to mean that on no
day except possibly Thursday can the class know definitely that
there would be a test the next day.
The first question should be 'If my next question is going to be
"Are you guilty", will your answer to it be the same as your
answer to this question?' This is, of course, a metaquestion.
Let $s = x + y$ and $p = xy$. To start with, A knows $s$ but not
$p$ and B knows $p$ but not $s$. Now $p$ cannot be a prime since
neither of the numbers is 1. If $p$ were a product of two
primes then B would know the two numbers. More generally, if $p$
is such that $p$ can be factored in only one way as a product of two
integers in the range 2 to 99, then also B would know the two
numbers (e.g. if $p = 730$, then the numbers have to be 10 and 73).
Let $q$ be the largest prime divisor of $p$. Then $p$ must have some
factor $k \ge 2$ such that $kq \le 99$, for otherwise the two numbers
are uniquely determined as $q$ and $p/q$. In particular we must have
$q \le 47$. This much we conclude from B's first statement.
Now A knows only $s$. But he knows for sure that B is unable to
determine the two numbers uniquely. That means the integer $s$ is
such that no matter how it is split into two parts, it is impossible
to determine the two parts uniquely from their product. For
example $s$ cannot be 52 because 52 can be split as 37 + 15 which
would give $p$ as 555 from which B could have determined the
numbers uniquely as 37 and 15. Let us call such a splitting a
'good' splitting of $s$. (47 + 5 is another good splitting of 52.) Now
$4 \le s \le 198$. Let $S = \{s : 4 \le s \le 198$, $s$ admits no good splitting$\}$.
Clearly no integer greater than 54 is in $S$ (since it has a good
splitting in which one part is 53). Similarly if $s$ is a sum of two primes
it has a good splitting. Also 51 has the good splitting 17 + 34, since
$578 = 2 \cdot 17^2$ can be factored within the range only as $17 \times 34$.
This leaves $S = \{11, 17, 23, 27, 29, 35, 37, 41, 47, 53\}$.
When B makes his second statement, he knows $p$ and he also
knows $x + y \in S$. For each $s \in S$, let $A_s$ be the set of products
of the parts in the various possible splittings of $s$. For example, if
$s = 11$, the possible splittings of $s$ are 2 + 9, 3 + 8, 4 + 7 and
5 + 6 and the corresponding products of parts are 18, 24, 28 and
30. So $A_{11} = \{18, 24, 28, 30\}$. Similarly $A_{17} = \{30, 42, 52, 60, 66,
70, 72\}$. B knows $p$. If $p \in A_s$ and $A_t$ for two distinct $s, t \in S$, then
$x + y$ could be $s$ or $t$ and B would not be able to determine the
numbers uniquely. So from B's second statement we conclude
$p \in A_s$ for a unique $s \in S$. For each $s \in S$, let $B_s$ be the subset of
$A_s$ consisting of those elements which do not belong to any $A_t$ for
$t \neq s$, i.e.,
$B_s = A_s - \bigcup_{t \neq s} A_t$. For example, $B_{11} = \{18, 24, 28\}$, $B_{17} = \{52\}$
etc. From B's second statement we know $p \in B_s$ for some $s \in S$.
Now consider A's position when he makes his second statement.
He already knows $s$. He also knows that $p \in B_s$. If $B_s$ had more
than one element, then A would not know $p$ uniquely and hence
would not be able to determine the two numbers. The fact that he
could do so means $B_s$ is a singleton. Calculating $B_s$ for all $s \in S$,
we see that $B_{17}$ is the only singleton set among them. So $s = 17$
and $p = 52$. Solving $x + y = 17$ and $xy = 52$ we get the two
numbers as 4 and 13.
It might appear at first sight that B's two statements contradict
each other and hence both cannot be true. But actually they are
not the negations of each other. The real meaning of 'to know' in
these statements is 'to be able to find out using the available
information'. The available information is different at the time of the
two statements made by B.
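The three stages of the argument ($S$, then the sets $A_s$ and $B_s$, then the singleton test) can be reproduced by exhaustive search. The Python sketch below assumes, as in the answer, that both numbers lie in the range 2 to 99 and that they may be equal; all variable names are illustrative.

```python
from collections import Counter
from itertools import combinations_with_replacement

LOW, HIGH = 2, 99  # assumed range of the two unknown numbers

pairs = list(combinations_with_replacement(range(LOW, HIGH + 1), 2))

# How many admissible pairs give each product.
product_count = Counter(x * y for x, y in pairs)

# All splittings of each possible sum.
splittings = {}
for x, y in pairs:
    splittings.setdefault(x + y, []).append((x, y))

# S: sums with no good splitting, i.e. every splitting has an ambiguous product.
S = {s for s, parts in splittings.items()
     if all(product_count[x * y] > 1 for x, y in parts)}

# A_s: products arising from the splittings of s.
A = {s: {x * y for x, y in splittings[s]} for s in S}

# B_s: products in A_s that lie in no other A_t with t in S.
owners = Counter(p for s in S for p in A[s])
B = {s: {p for p in A[s] if owners[p] == 1} for s in S}

# A can name the numbers only if B_s is a singleton.
solutions = [(s, min(B[s])) for s in sorted(S) if len(B[s]) == 1]
```

The list `solutions` holds the pairs (sum, product) consistent with all four statements; from a sum and a product the two numbers are recovered as the roots of a quadratic.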
10. Let $a$ be the statement that a particular student $x$ registers for
course A. Similarly define $b, c, d$. The system of five rules is
equivalent to $f = 1$ where $f = f_1 f_2 f_3 f_4 f_5$ and where
$f_1 = ab + ac + ad + bc + bd + cd$, $f_2 = a' + bc' + b'c$, $f_3 = c' + b + d$,
$f_4 = a' + c'$ and $f_5 = c' + d' + b$. $f_4 f_5 = c' + a'd' + a'b$ whence
$f_2 f_4 f_5 = a'c' + a'd' + a'b + bc'$ and thus $f_1 f_2 f_4 f_5$ comes out as
$b(a' + c')(a + c + d)$ after simplification. Now the C.N.F. of $f_3$
is $(a + c' + b + d)(a' + c' + b + d)$. Since both these factors
occur in the C.N.F. of $b$, $f_3$ divides $b$ and hence $f$ is the same as
$f_1 f_2 f_4 f_5$. So the third rule is redundant. The equation
$b(a' + c')(a + c + d) = 1$ also gives the simplified system of rules.
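The redundancy of the third rule can be confirmed by checking all 16 assignments to $a, b, c, d$. The Python sketch below encodes the five rules exactly as the expressions $f_1, \ldots, f_5$ given above; the Python function names mirror those symbols and `simplified` is an illustrative name.

```python
from itertools import product


def f1(a, b, c, d):
    # ab + ac + ad + bc + bd + cd: at least two of the four courses.
    return a + b + c + d >= 2


def f2(a, b, c, d):
    return (not a) or (b != c)          # a' + bc' + b'c


def f3(a, b, c, d):
    return (not c) or b or d            # c' + b + d


def f4(a, b, c, d):
    return (not a) or (not c)           # a' + c'


def f5(a, b, c, d):
    return (not c) or (not d) or b      # c' + d' + b


def simplified(a, b, c, d):
    return bool(b and ((not a) or (not c)) and (a or c or d))


def all_rules(v):
    return all(f(*v) for f in (f1, f2, f3, f4, f5))


def without_third(v):
    return all(f(*v) for f in (f1, f2, f4, f5))


assignments = list(product((0, 1), repeat=4))
```

Comparing `all_rules`, `without_third` and `simplified` on every element of `assignments` shows that the three agree.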
11. Letting $h, r, i$ and $m$ stand for appropriate statements, the system
is given by $f = 1$ where $f$ is the product of $(r + h' + i')$, $(i + r + h')$,
$(m + h + r')$ and $(h + m' + r')$. This reduces to $(r + h')(r' + h) = 1$,
i.e. $r + h' = 1$ and $r' + h = 1$. So $h \to r$ and $r \to h$. Hence $r \leftrightarrow h$.
12. (i) {Gopal} (ii) {Ram, Rahim, Gopal, Goliath}
(iii) {Rahim, Robert, Gopal} (iv) {Ram, Gopal, Goliath}.
13. In modus ponens we have to show that $(p' + q)p \to q$ is true or
equivalently that $[(p' + q)p]' + q = 1$. But $[(p' + q)p]' + q =
(pq)' + q = p' + q' + q = p' + 1 = 1$. So modus ponens is
valid. For validity of chain rule apply induction.
14. In (v) for example, we have to show $(p' + q)(s' + r) \to (ps)' + qr$
is true, or equivalently that $[(p' + q)(s' + r)]' + (ps)' + qr = 1$.
But $[(p' + q)(s' + r)]' + (ps)' + qr = (p' + q)' + (s' + r)' +
(p' + s') + qr = pq' + sr' + p' + s' + qr = (pq' + p') + (sr' +
s') + qr = q' + p' + r' + s' + qr = q' + qr + p' + r' + s' = q' +
r + p' + r' + s' = 1 + q' + p' + s' = 1$. Similarly the validity
of other arguments is proved.
15. With the notation in the text the argument proceeds as follows:
(1) $p + q' \to s'r$ (premise)
(2) $p + r' \to sq'$ (premise)
(3) $p' \to r'$ (premise)
(4) $p + p' \to p + r'$ (from (vi) in the last exercise)
(5) $p + p' \to sq'$ ((4), (2) and (i) above)
(6) $sq' \to s$ ((ii) above)
(7) $p + p' \to s$ ((5), (6) and (i))
(8) $s \to s + r'$ ((iii) above)
(9) $p + p' \to s + r'$ ((7), (8) and (i))
(10) $s + r' \to p'q$ (contrapositive of (1))
(11) $p + p' \to p'q$ ((9), (10) and (i))
(12) $p'q \to q$ ((ii) above)
(13) $p + p' \to q$ ((11), (12) and (i))
(14) $sq' \to q'$ ((ii) above)
(15) $p + p' \to q'$ ((5), (14) and (i))
(16) $(p + p')(p + p') \to qq'$ ((13), (15) and (v) above)
(17) $1 \to 0$ (since $p + p' = 1$ and $qq' = 0$ in (16)).
16. No. (ii) and (iii) imply that all smokers have cancer. This
contradicts (i) only because of numerical data.
17. With appropriate symbols, the premises are $r \to m$ and $m$ and the
conclusion is $r$. For the argument to be valid $((r' + m)m)' + r$
would have to be 1. But $((r' + m)m)' + r = m' + r$ which is 0 if
$m = 1$ and $r = 0$. So there is at least one instance (namely, when
missiles are costly and it does not rain) when all premises hold but
the conclusion does not. Hence the argument is invalid. However,
the question of correlation between rain and the cost of missiles is
irrelevant here. In mathematical logic, we are not concerned with
the truth or falsehood of statements per se but only with how the
truth of some statements leads to the truth or falsehood of some
other statements.
18. $((r' + m)m)' + r' = m' + r'$ is 0 if $m = 1$ and $r = 1$. There is no
contradiction here because in testing the validity of the arguments we
are concerned with the statement $((r \to m) \wedge m) \to r$ (which reduces
to $m' + r$ as shown above) and the statement $((r \to m) \wedge m) \to r'$
(which reduces to $m' + r'$). These two statements are not the
negations of each other, even though $r$ and $r'$ are negations of each
other.
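Validity of an argument, as used in Exercises 13 to 18, can be tested by running through all truth assignments. A minimal Python sketch follows; `implies` and `is_valid` are illustrative helper names.

```python
from itertools import product


def implies(p, q):
    return (not p) or q


def is_valid(premises, conclusion, n_vars):
    """An argument is valid iff the conclusion holds whenever all premises do."""
    return all(
        implies(all(pr(*v) for pr in premises), conclusion(*v))
        for v in product((False, True), repeat=n_vars)
    )


# Modus ponens: from p -> q and p, conclude q.
modus_ponens = is_valid(
    [lambda p, q: implies(p, q), lambda p, q: p],
    lambda p, q: q,
    2,
)

# Exercise 17: from r -> m and m, conclude r.
affirming_consequent = is_valid(
    [lambda r, m: implies(r, m), lambda r, m: m],
    lambda r, m: r,
    2,
)

# Exercise 18: same premises, conclusion r'.
negated_conclusion = is_valid(
    [lambda r, m: implies(r, m), lambda r, m: m],
    lambda r, m: not r,
    2,
)
```

Both arguments with premises $r \to m$ and $m$ come out invalid, which illustrates the remark that the two conditional statements are not negations of each other.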
CHAPTER 5
Section 5.1
1. Let $I$ denote the set of all invertible elements of a monoid $(M, *)$.
$I$ is closed under $*$ by Proposition (3.4.7). Clearly the identity
element is in $I$. Finally $a \in I$ implies $a^{-1} \in I$ since $a^{-1}$ is invertible
with inverse $a$.
2. For example let $X = \{e, a, b\}$ and $*$ be given by
* e a b
e e a b
a a e b
For each $x$, choose some $f(x)$ such that $x f(x) = e$. Now, $ax = bx \Rightarrow
ax f(x) = bx f(x) \Rightarrow a = b$. For $a \in G$, $a f(a) = e = e^2 = e a f(a)$, so by
right cancellation, $a = ea$. Finally, for any $a \in G$, $f(a) a f(a) =
f(a) e = e f(a)$ so again by cancellation, $f(a) a = e$. Thus every element
has a two-sided inverse.
For example let $X = \{e, a\}$ and $ae = e^2 = e$, $a^2 = a = ea$.
$e = (xy)^{-1} xy = x^{-1} y^{-1} xy$. So $yx = yxe = yx x^{-1} y^{-1} xy = xy$.
More generally, let $(xy)^n = x^n y^n$ for $n = k, k + 1, k + 2$. Then
$x^{k+1} y^{k+1} = (xy)^{k+1} = (xy)^k xy = x^k y^k xy$. So by cancellation,
$x y^k = y^k x$. Similarly, $x y^{k+1} = y^{k+1} x$. So $y^k xy = y^k yx$. Cancel $y^k$.
Clearly every subgroup $H$ has the property. For the converse, let
$a \in H$. Then $a a^{-1} \in H$, i.e. $e \in H$. Hence for any $y \in H$, $e y^{-1}
\in H$, i.e. $y^{-1} \in H$. Finally, for $x, y \in H$, $y^{-1} \in H$ and so
$x(y^{-1})^{-1} \in H$, i.e. $xy \in H$.
They are generated by [2] and [3] respectively.
This is a consequence of properties of finite fields (see the Epilogue).
The crucial step in the argument is that for every $m$ dividing $p - 1$,
there are precisely $m$ distinct residue classes $a$ whose $m$th power
is [1]. See Exercise (6.2.38).
$x^n = e = y^n$ implies $(x^{-1})^n = e$ in any group and $(xy)^n = e$ in an
abelian group. For a counter-example in non-abelian groups, let
$G$ be the group of isometries of an equilateral triangle and $n = 2$.
10. They are $y^d, y^{2d}, \ldots, y^{md}$ where $y$ is a generator for $G$ and $d = n/m$.
If $y^k$ is also a solution, find $q, r$ such that $k = qd + r$ with
$0 \le r < d$. If $r \neq 0$ then $y^r$ would also be a solution, a contradiction.
11. Let $G = (x)$. If $H$ is a non-trivial subgroup of $G$, show that $H = (x^k)$
where $k$ is the least positive integer such that $x^k \in H$.
12. $f, g$ have order 2 each. Let $h = f \circ g$. Then by induction, for every
positive integer $n$, $h^n(1) = 2n$. So $h^n$ cannot be the identity mapping
for any $n$. Similarly $g \circ f$ is of infinite order.
13. Let $m, n$ be the orders of $x, y$ respectively. Then $(xy)^{mn} = x^{mn} y^{mn} =
(e)^n e^m = e$. So $xy$ has order $\le mn$. A better upper bound is the
l.c.m. of $m$ and $n$. For sharpness, consider [2] and [3] in $\mathbb{Z}_6$.
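The sharpness example can be checked by computing orders in $\mathbb{Z}_n$ directly. The Python sketch below assumes the additive group $\mathbb{Z}_n$, in which the order of $[a]$ is $n / \gcd(a, n)$; the function names are illustrative.

```python
from math import gcd


def order(a, n):
    """Order of the residue class [a] in the additive group Z_n."""
    return n // gcd(a, n)


def order_by_definition(a, n):
    """Least positive k with k * a congruent to 0 (mod n)."""
    k = 1
    while (k * a) % n != 0:
        k += 1
    return k


orders_in_z6 = {a: order(a, 6) for a in range(6)}
```

Here `orders_in_z6[2]` is 3 and `orders_in_z6[3]` is 2, while their sum [5] has order 6, so the bound $mn$ is attained.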
14. From the hint, if $yx$ has order $n$, then $(xy)^{n+1} = xy$ and hence
$(xy)^n = e$. So order $(xy) \le$ order $(yx)$. By symmetry the other
inequality also holds.
15. Define $f : \{1, 2, \ldots, n\} \to G$ by $f(k) = x_1 x_2 \cdots x_k$. Either $f$ is onto or
there exist $p, q$ with $1 \le p < q \le n$ such that $f(p) = f(q)$. In the
latter case let $i = p + 1$, $j = q$.
16. Decompose $G$ into sets of the form $\{x, x^{-1}\}$ for $x \in G$. Each such
set contains 1 or 2 elements. Since $o(G)$ is even, the number of such
sets having only 1 element each is even. But $\{e, e\}$ is one such set.
So there is at least one more, say $\{y, y^{-1}\}$ where $y^{-1} = y$. But then
$y^2 = e$, $y \neq e$.
17. Let $f, g : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = ax + b$, $g(x) = cx + d$ for
$x \in \mathbb{R}$. Then $g \circ f(x) = g(ax + b) = cax + bc + d$ which is linear
and non-singular since $ca \neq 0$. $f^{-1}$ is given by $f^{-1}(x) = \frac{1}{a}x - \frac{b}{a}$
which is also non-singular, linear. Hence such functions form a
subgroup of $S(\mathbb{R})$. It is not a normal subgroup, e.g. let $f(x) = x + 1$,
$g(x) = x^3$. Then $g f g^{-1}(x) = (x^{1/3} + 1)^3$ which is not linear.
18. Analogous to the last exercise.
19. Consider $(r)$ where $r$ is a rotation through an angle of $\frac{360}{n}$ degrees.
$r$ and any flip generate $D_n$.
Follows from the comments regarding when the composite of two
functions preserves/reverses orientation.
22. Let HK be a subgroup of G. Certainly H = H{e} ⊂ HK. Let h ∈ H,
k ∈ K. Then kh = h^{-1}(hk)h is in HK. So kh ∈ HK, i.e. KH ⊂ HK.
For the other way inclusion, first (hk)^{-1} ∈ HK. So (hk)^{-1} = h_1 k_1
for some h_1 ∈ H, k_1 ∈ K. But then hk = k_1^{-1} h_1^{-1} ∈ KH.
Conversely suppose HK = KH. Then for h_1, h_2 ∈ H, k_1, k_2 ∈ K,
h_1 k_1 h_2 k_2 = h_1 (h_3 k_3) k_2 for some h_3 ∈ H, k_3 ∈ K. So HK is
closed under multiplication. As for inversion,
(h_1 k_1)^{-1} = k_1^{-1} h_1^{-1} ∈ KH = HK.
So HK is a subgroup of G.
If, say, H is normal, then for h ∈ H, k ∈ K, hk = k(k^{-1}hk) ∈ KH
and kh = (khk^{-1})k ∈ HK. So HK = KH. If both H, K are
normal in G, then for any h ∈ H, k ∈ K, g ∈ G, ghkg^{-1} = (ghg^{-1})
(gkg^{-1}) ∈ HK.
Let h = x_1^{n_1} ... x_k^{n_k} be a typical element of H. Then for g ∈ G,
ghg^{-1} = (g x_1 g^{-1})^{n_1} ... (g x_k g^{-1})^{n_k} ∈ H.
In Exercise (1.14), let x = ab and y = r1.
28. (iii) If (g, h) is a generator for G × H, then for every x ∈ G, there
is some n such that (g^n, h^n) = (x, e). In particular g^n = x. So
G = (g). Similarly H = (h). (iv) Z_2 × Z_2 is not cyclic.
If A ≠ ∅ and |X - A| ≥ 2, there exist a ∈ A, x, y ∈ X - A
and x ≠ y. Let g: X -> X interchange x and y, leaving all other
elements fixed. Let h: X -> X interchange a and x, leaving all other
elements fixed. Then g ∈ F_A. But hgh^{-1} ∉ F_A.
31. The elements of finite order in the circle group form an infinite
group.
32. Let G = A ∪ B. If A, B are proper subgroups, there exist x, y ∈ G
such that x ∉ A, y ∉ B. But then y ∈ A, x ∈ B and so xy ∉ A and
xy ∉ B. For the quaternion group Q, Q = (i) ∪ (j) ∪ (k).
Section 5.2
The cosets of H in G are planes parallel to H.
The cosets of K are straight lines parallel to the line K.
For w ∈ C - {0}, S^1 w is the set of all complex numbers of
absolute value |w|, or equivalently S^1 w = {(r, θ): r = |w|}, which is
a circle of radius |w| centred at 0. If S^1 w_1, S^1 w_2 are two such
circles, their product is a circle whose radius is the product of
their radii, i.e., |w_1| |w_2|.
If z ∈ xH ∩ yK show that xH ∩ yK = z(H ∩ K). Note that xH = zH
and yK = zK.
Let x_1H, ..., x_mH and y_1K, ..., y_nK be the distinct left cosets of H
and K respectively. Then H ∩ K has at most mn left cosets; they are
of the form x_iH ∩ y_jK, i = 1, ..., m; j = 1, ..., n.
Let x_1H, ..., x_mH be the distinct left cosets of H in G and y_1L, ...,
y_nL be those of L in H. Consider {x_i y_j L: 1 ≤ i ≤ m, 1 ≤ j ≤ n}.
Letting L = H ∩ K gives Exercise (2.5).
Consider 5,.
The left coset multiplication is well-defined iff H has the property
that for all x, y, z, w ∈ G, x^{-1}z ∈ H and y^{-1}w ∈ H imply that
y^{-1}x^{-1}zw ∈ H. If H is normal in G then this is the case because
y^{-1}x^{-1}zw = [y^{-1}(x^{-1}z)y] (y^{-1}w). Conversely if the condition holds
and g ∈ G, h ∈ H, put x = e, y = w = g^{-1} and z = h to get
ghg^{-1} ∈ H.
10. Let Σ be the set of all cosets of H in G. Then Σ ⊂ P(G) and Σ is
closed under ⊙. So ⊙ induces a binary operation, the coset
multiplication, on Σ. {e} is the identity for ⊙ on the larger set P(G)
while H is the identity for the induced binary operation on the
subset Σ. They need not be equal. Proposition (3.4.5) is not violated
since it deals with the identities of the same operation on the same
set (see the comments following that proposition).
11. (x + 1)^p = x^p + 1 + Σ_{k=1}^{p-1} C(p, k) x^k. Now p divides C(p, k)
for 0 < k < p (p being a prime), and x^p = x by induction hypothesis.
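The two facts used in this induction can be spot-checked by machine. The sketch below is only an illustration under assumed choices: the list of primes, the range of x and the function name `middle_binomials_divisible` are not from the text, and C(p, k) is computed with `math.comb`.

```python
from math import comb

def middle_binomials_divisible(p):
    """True if p divides C(p, k) for every k with 0 < k < p."""
    return all(comb(p, k) % p == 0 for k in range(1, p))

primes = [2, 3, 5, 7, 11, 13]        # small primes chosen for the check
checks = {p: middle_binomials_divisible(p) for p in primes}

# x^p and x leave the same remainder modulo p for each prime p above
fermat_ok = all(pow(x, p, p) == x % p for p in primes for x in range(30))
```

The divisibility genuinely depends on p being prime: for the composite number 4, C(4, 2) = 6 is not divisible by 4.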
12. Apply induction on m and use Theorem (2.16).
13. For all x, y ∈ G, (xy)^{-1}yx = y^{-1}x^{-1}yx ∈ C(G) and so xy and yx
are in the same coset of C(G). For the second part, (xN)(yN) =
(yN)(xN) implies (xy)N = (yx)N, i.e. y^{-1}x^{-1}yx ∈ N. Since such
elements generate C(G), C(G) ⊂ N.
14. For the hint, hx ∉ H and so hxhx = e, whence hxh = x since x^2 = e.
Now for h_1, h_2 ∈ H, h_2x ∈ G - H and so h_1(h_2x)h_1 = h_2x = h_2h_1xh_1.
So h_1h_2 = h_2h_1. For the example, take G = D_n, the dihedral group.
15. Suppose y = gxg^{-1}. Then h ∈ N_y iff g^{-1}hg ∈ N_x.
16. Let Z be the centre of G where o(G) = p^3. Then o(Z) = 1, p, p^2 or
p^3. Since G is non-abelian, Z ≠ G. So o(Z) ≠ p^3. Also by Theorem
(2.2.2), o(Z) > 1. If o(Z) = p^2 then G/Z would be cyclic by
Proposition (2.23) and G would be abelian by Proposition (2.24). This
leaves o(Z) = p.
17. Let x, h ∈ H and y, k ∈ K. If f(x, y) = hk, then h^{-1}x = ky^{-1} = u
(say). Then u ∈ H ∩ K and x = hu, y = u^{-1}k. Conversely for every
u ∈ H ∩ K, f(hu, u^{-1}k) = hk.
Let K be a subgroup of order p. Then H ∩ K is a subgroup
of H.
So o(H ∩ K) = 1 or p. In the first case |HK| = p^2 > |G|,
a contradiction. So o(H ∩ K) = p, whence K = H.
19. o(Z) = 1, p, p^2. If o(Z) ≠ 1, then G/Z is cyclic.
21. Let S_i = x_i^{-1}S for i = 1, 2, 3, 4. Then |S_i| = |S|. So
|G - ∩_{i=1}^{4} S_i| = |∪_{i=1}^{4} (G - S_i)| ≤ Σ_{i=1}^{4} |G - S_i|
= 4|G| - 4|S| < |G|.
So ∩_{i=1}^{4} S_i ≠ ∅.
22. Let S = {x ∈ G: x^2 = e}. Then S is closed under inversion. Also
let x, y ∈ S. By the last exercise there exists g ∈ G such that eg,
xg, yg and xyg are all in S. So x^2 = y^2 = g^2 = (xg)^2 = (yg)^2 = (xyg)^2
= e. But then gxg^{-1}x^{-1} = gxgx = e, showing x commutes with g.
Similarly, y commutes with g. So xy commutes with g. This gives
xyg = (xyg)^{-1} = y^{-1}x^{-1}g^{-1} = (gxy)^{-1} = yxg. So xy = yx. Hence
(xy)^2 = x^2y^2 = e, i.e. xy ∈ S. Thus S is closed under
multiplication. Hence S is a subgroup of G. Since o(S) > (3/4) o(G), S = G by
Lagrange's theorem.
25. Without loss of generality, let x_1 = e, x_2 = y. For any other
x ∈ G, x and x^{-1} are inverses of each other and are distinct. So
we may suppose for k > 1, x“ = x5}...
26. Let [y] ∈ Z_p - {[0]} where 1 ≤ y ≤ p - 1. If [y]^2 = e in Z_p -
{[0]}, then p divides y^2 - 1, whence either p divides y - 1 or p divides
y + 1. The first possibility gives [y] = e while the second gives
y = p - 1. So [p - 1] is the only element of order 2 in Z_p - {[0]}.
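The conclusion of Exercise 26 can be checked by brute force for small odd primes. This is a hedged sketch: the function name `elements_of_order_two` and the sample primes are illustrative choices, not part of the text.

```python
def elements_of_order_two(p):
    """Residues y in 1..p-1 with y != 1 and y*y = 1 modulo p."""
    return [y for y in range(1, p) if y != 1 and (y * y) % p == 1]

sample_primes = [3, 5, 7, 11, 13, 17]    # assumed sample of odd primes
results = {p: elements_of_order_two(p) for p in sample_primes}
```

For each odd prime in the sample the list has the single entry p - 1, in agreement with the argument above. For a composite modulus such as 8 the same search finds more than one element, which shows where primality is used.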
27. Let x ∈ G, x ≠ e. Then G = (x). If G is infinite, it has proper
subgroups, e.g. (x^2). Finally, apply Theorem (2.16).
28. Certainly, (x) ⊂ N_x. So o(N_x) ≥ n. If o(N_x) > n, then N_x = G and
x is in the centre.
Section 5.3
2. Let R, S be respectively the subgroups (x) and (f(x)). Then S = f(R)
and so S is isomorphic to a quotient group of R. Hence o(S)
divides o(R). But o(S) = order of f(x) while o(R) = n.
Let y_1K, ..., y_mK be the distinct left cosets of K in H. For each
i = 1, ..., m fix x_i ∈ f^{-1}({y_i}). Then x_1 f^{-1}(K), ..., x_m f^{-1}(K)
are the distinct left cosets of f^{-1}(K) in G.
3. R is isomorphic to G/K where K is the kernel of f. So
o(R) = o(G/K) = o(G)/o(K).
Hence o(R) divides o(G). (This fact was also used in the last
exercise with different notations.)
4. The composite of two isomorphisms as well as the inverse of an
isomorphism are isomorphisms.
5. If f: G -> H is an isomorphism, then for θ ∈ A(G), fθf^{-1} ∈ A(H).
This correspondence gives an isomorphism from A(G) onto A(H).
For the counter-example, take, for example, G = Z_3 and H = Z_4. Both A(G)
and A(H) have two elements each.
6. f ∈ A(Z_p) is uniquely determined by f([1]), which can be any
element of Z_p - {[0]}. A(Z) has 2 elements, the identity function
and the function f: Z -> Z defined by f(x) = -x for all x ∈ Z.
7. For g ∈ G, let T_g: G -> G be defined by T_g(x) = gxg^{-1} for x ∈ G.
Then T_g ∈ I(G). The function θ: G -> I(G) defined by θ(g) = T_g
is a homomorphism with kernel Z.
The inversion function is always a bijection. It is a homomorphism
iff for all x, y in the group, (xy)^{-1} = x^{-1}y^{-1}. Apply Exercise (1.5).
If G is non-abelian some inner automorphism is non-trivial. If G
is abelian, inversion is a non-trivial automorphism except when
x^2 = e for all x ∈ G. In this case G can be considered as a vector
space over the field Z_2 (see Exercise (6.3.27)) and from a basis for
the vector space, an automorphism for G can be constructed.
10. Let S = {x ∈ G: x f(x) = e}.
11. Define f: H -> HK/K by f(h) = hK for h ∈ H. Prove that f is a
homomorphism with kernel H ∩ K.
14. Define f: Z_2 × Z_2 -> K by f(1, 0) = a, f(0, 1) = b, f(1, 1) = c and
f(0, 0) = e.
15. Write G additively. For any x ∈ G, p(h(x)) = p(x) - p(j(p(x)))
= p(x) - p(x) (since pj = id_{G/H}). So h(x) ∈ Ker p = H.
Conversely if y ∈ H, then h(y) = y - j(p(y)) = y. So h has range H.
Define f: G -> (G/H) × H by f(x) = (p(x), h(x)). Show that f is
an isomorphism.
Define ψ(xK) = f(x)L. ψ is well-defined since
xK = yK ⇒ x^{-1}y ∈ K ⇒ f(x^{-1}y) ∈ L ⇒ f(x)^{-1}f(y) ∈ L.
17. A homomorphism takes a commutator to a commutator. In an
abelian group every commutator equals the identity. To get the second
assertion of Exercise (2.13), let H = G/N and f the quotient
homomorphism.
18. Let L = {x ∈ G : f(x) = g(x)}. L is easily seen to be a subgroup
of G. Also S ⊂ L. Hence K ⊂ L. An alternate solution is to
write a typical element of K in the form
x_1^{n_1} ... x_k^{n_k} for x_1, ..., x_k ∈ S, n_1, ..., n_k ∈ Z.
19. Analogous to the second part of Proposition (3.13).
20. o(G) = p^2. If G has no element of order p^2, let
x ∈ G, x ≠ e
and let y ∈ G - (x). Define
f: Z × Z -> G by f(m, n) = x^m y^n.
Show that
Ker f = {(m, n) ∈ Z × Z: p | m and p | n}.
Hence, get an isomorphism from Z_p × Z_p onto G.
21. Let G be a non-abelian group with order 8 and centre Z. Then
o(Z) = 2. So G/Z is isomorphic to the Klein group by Propositions
(3.13) and (2.24). Elements of G may be denoted by ±1, ±a,
±b, ±c where 1 is the identity, -1 is the other element of Z
and (-1)x is written as -x. G must have an element of order 4,
for otherwise every element except 1 would be of order 2 and G
would be abelian by Exercise (1.5). Let a be of order 4. Then
a^2 = -1 (because in G/Z, (aZ)^2 = a^2Z, but a^2 ≠ 1). Without loss
of generality, let ab = c (the other possibility being ab = -c).
Then ac = -b. Also ba = -c (otherwise ba = c, putting b in
N_a, which would mean o(N_a) ≥ 5 and hence N_a = G, i.e., a ∈ Z).
Similarly ca = b. This determines the multiplication on G except
for bc which must equal a or -a. Depending upon which
possibility holds, G is isomorphic to Q or to D_4.
22. Let G be an abelian group of order 8. If every element of G
(except e) is of order 2, choose x ∈ G - {e}, y ∈ G - (x) and
z ∈ G - H where H is the subgroup generated by {x, y}. Define
f: Z × Z × Z -> G by f(i, j, k) = x^i y^j z^k and work as in Exercise
(3.20). If G contains an element of order 8, G is isomorphic to Z_8.
The only case left is where G has an element, say x, of order 4 but
no element of order 8. Let H = (x). At least one element of
G - H must be of order 2. For otherwise, let y ∈ G - H. Then
xy ∈ G - H. y^2 and (xy)^2 are of order 2 and hence equal x^2, the
only element of order 2 in G. But this gives x^2 = e, a contradiction.
So some y ∉ H has order 2. Now, G/H = {H, yH}. Define
j: G/H -> G by j(H) = e and j(yH) = y. Exercise (3.15) is now
applicable.
If T_g is the right translation by g, then for g, h ∈ G, T_{gh} equals
T_h o T_g and not T_g o T_h. So the function T would not be a group
homomorphism. However, if f(g) = T_{g^{-1}} then f(gh) = T_{(gh)^{-1}}
= T_{h^{-1}g^{-1}} = T_{g^{-1}} o T_{h^{-1}} = f(g)f(h). So an alternate proof is possible.
24. Any group G of order n, induces a group structure on the set
S = {1, 2, ..., n}, through any bijection from G to S. Isomorphic
groups induce isomorphic group structures. But on S there are
only finitely many binary operations and only a few of them make
S into a group.
Let f: S -> T be a bijection with g: T -> S as its inverse. Think of
S, T as subsets of F(S), F(T), respectively. Regarding f as a
function from S to F(T) let F: F(S) -> F(T) be the unique homomorphism
which extends f. Similarly let G: F(T) -> F(S) extend g. Then G o F
is a homomorphism which extends the inclusion function from S to
F(S). Since the identity function 1_{F(S)} also does the same,
G o F = 1_{F(S)}. Similarly F o G = 1_{F(T)}. So F, G are isomorphisms.
27. Define θ: F({a, b}) -> Z_10 by θ(a) = [2], θ(b) = [5]. Show that the
kernel of θ is the normal subgroup of F({a, b}) generated by
a^5, b^2, aba^{-1}b^{-1}. The argument is analogous to that in the text for
D_5. More generally for any positive integer n, the relations a^n = b^2
= abab = e give D_n, while the relations a^n = b^2 = aba^{-1}b^{-1} = e
give Z_n × Z_2.
28. The problem reduces to proving b^j a^r = a^{rk^j} b^j. For j = 1, apply
induction on r. Then apply induction on j.
29. Let k be a positive integer. Define a binary operation * on the set
Z_7 × Z_3 by ([i], [j]) * ([r], [s]) = ([i + rk^j], [j + s]). Then * is
well-defined iff i ≡ i' (mod 7), r ≡ r' (mod 7), j ≡ j' (mod 3) and
s ≡ s' (mod 3) imply that i + rk^j ≡ i' + r'k^{j'} (mod 7) and
j + s ≡ j' + s' (mod 3).
This finally reduces to k^3 ≡ 1 (mod 7), giving k ≡ 1, 2 or 4 (mod 7).
For these values of k, * does make Z_7 × Z_3 into a group. On the
other hand if k = 3, then the relation b^j a^r = a^{rk^j} b^j gives, for j = 3
and r = 1, a = a^{27}. So a^{26} = e, i.e., a^5 = e and hence a = e (since a^7 = e).
But then G is generated by the single element b with relation
b^3 = e.
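The condition k^3 ≡ 1 (mod 7) can be explored by brute force. The sketch below is an illustration under assumptions: elements of Z_7 × Z_3 are stored as pairs of least non-negative residues, and the names `star` and `is_associative` are hypothetical. With fixed representatives the operation is always computable, so the failure of well-definedness for a bad k shows up here as a failure of associativity.

```python
from itertools import product

def star(u, v, k):
    """([i],[j]) * ([r],[s]) = ([i + r*k^j], [j + s]) on least residues."""
    (i, j), (r, s) = u, v
    return ((i + r * k ** j) % 7, (j + s) % 3)

ELEMENTS = list(product(range(7), range(3)))    # the 21 pairs ([i], [j])

def is_associative(k):
    """Check (u*v)*w == u*(v*w) for all triples of elements."""
    return all(star(star(u, v, k), w, k) == star(u, star(v, w, k), k)
               for u in ELEMENTS for v in ELEMENTS for w in ELEMENTS)

cube_roots_of_one = [k for k in range(1, 7) if pow(k, 3, 7) == 1]
good_k = [k for k in range(1, 7) if is_associative(k)]
```

Both lists agree, and k = 3 is excluded, matching the answer above.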
This time start with the homomorphism θ: F({a, b}) -> Q which
takes a to i and b to j.
31. First extend f: S -> G to a group homomorphism θ: F(S) -> G.
Now apply Exercises (3.17) and (3.16) (with appropriate changes
of notation). For the second assertion, take S = G.
32. Let f, g ∈ B(S). Suppose f vanishes outside a finite subset C of S
and g vanishes outside a finite subset D of S. Then f + g vanishes
outside the finite set C ∪ D and so f + g ∈ B(S). Similarly -f
vanishes outside C and hence is in B(S). So B(S) is a subgroup of
Z^S. Following the hint, let θ: F(S) -> B(S) be the unique
homomorphism which extends f. Let K = kernel of θ. By Exercise
(3.17), C(S) ⊂ K. Conversely let x ∈ K. Write x = s_1^{n_1} s_2^{n_2} ... s_r^{n_r}
where r ∈ N, s_1, ..., s_r ∈ S, n_1, ..., n_r ∈ Z - {0} and the adjacent
s_i's are unequal. To show x ∈ C(S), apply induction on r. Note
that θ(x) is the function g: S -> Z which equals n_1 f_{s_1} + n_2 f_{s_2} + ...
+ n_r f_{s_r}. In particular g(s_1) = 0. But g(s_1) = n_{i_1} + n_{i_2} + ... + n_{i_p}
where 1 ≤ i_1, ..., i_p ≤ r are the indices for which s_{i_1} = s_{i_2} = ... = s_{i_p}
= s_1 and we let i_1 = 1. Since n_1 ≠ 0, p ≥ 2. Thus there is some
k > 0 such that s_{k+1} = s_1. (Actually k > 1, since the adjacent s_i's
are unequal.) Now write x = yz where
y = s_1^{n_1} (s_2^{n_2} ... s_k^{n_k}) s_1^{-n_1} (s_2^{n_2} ... s_k^{n_k})^{-1}
and
z = (s_2^{n_2} ... s_k^{n_k}) s_1^{n_1 + n_{k+1}} s_{k+2}^{n_{k+2}} ... s_r^{n_r}.
Then y ∈ C(S) ⊂ K. So z ∈ K. Since length of z is r - 1 (or
even less if n_1 + n_{k+1} = 0), induction hypothesis applies.
33. Given f ∈ B(S), let s_1, ..., s_r be the points of S where f has non-zero
values and let n_1, ..., n_r be respectively these values. Then
f = Σ_{i=1}^{r} n_i f_{s_i}. For uniqueness suppose f also equals
Σ_{j=1}^{k} m_j f_{t_j}. Then {s_1, ..., s_r}
and {t_1, ..., t_k} are the same sets, since both consist of points where
f does not vanish. Now note that f(s_i) = n_i while f(t_j) = m_j.
34. Let G = A(S), H = A(T), where the sets S, T are disjoint. Every
pair of functions (f_1, f_2) ∈ B(S) × B(T) determines f: S ∪ T -> Z,
defined by f(x) = f_1(x) if x ∈ S and f(x) = f_2(x) if x ∈ T. This
gives an isomorphism from B(S) × B(T) onto B(S ∪ T) and hence
from A(S) × A(T) onto A(S ∪ T). If S ∩ T ≠ ∅, first disjunctify them
by replacing them with disjoint sets, S × {1} and T × {2}.
35. (i) can be proved directly or as a consequence of (iii).
(ii) let x e H. Write x as xi" x?...x:’. The xii are alternately a‘s
and 17’s. Apply induction on r. Consider various cases depending
upon whether xl=n or b, and n, > Oor n, < 0. If for example,
xx = a and n. > 0, write x = yz where y = (WM) and
z = (WI-"1 x’."... xi". Note that z E H and has length less then
r. For (iv), suppose S is a finite subset of H which generates
H. By (ii), write each element of S as a finite product of ele-
ments of the form n"b", n e Z (or their inverses). Since Sis
finite only finitely many elements of this form are involved.
So there exist n1...., Inez such that the set T={tz"l b'l : i= l,...,k}
also generates H. Now let m be an integer greater than all
| n, 1, i = l,..., k. Then WV" 5 Hbut cannot be expressed as
a product of elements in T (or their inverses).
Section 5.4
1. Let σ ∈ H. Then σ({1, 2, ..., n - 1}) = σ({1, ..., n} - {n}) = {1, ...,
n} - σ({n}), σ being a bijection. So σ induces θ: {1, 2, ..., n - 1}
-> {1, 2, ..., n - 1} defined by θ(x) = σ(x) for x = 1, ..., n - 1.
(θ is often called the restriction of σ to {1, 2, ..., n - 1}, although
strictly speaking when we take the restriction of a function, it is
only the domain and not the codomain that gets reduced.) The
correspondence σ -> θ gives the desired isomorphism. The second
assertion follows from the fact that σ and θ have the same parity.
S_{n-1} is not normal in S_n for n > 2 (consider conjugation by the
transposition (n - 1, n)).
2. Let X be the set of ordered r-tuples with distinct entries from
1, ..., n; and Y the set of all r-cycles in S_n. Define f: X -> Y by
f(x_1, ..., x_r) to be the cycle (x_1 x_2 ... x_r). Each y ∈ Y has r preimages.
Let τ = σθσ^{-1}. Express θ as a product of disjoint cycles say
θ = c_1 c_2 ... c_k. Then τ = (σc_1σ^{-1}) (σc_2σ^{-1}) ... (σc_kσ^{-1}). Apply the last
exercise.
Let θ = (i_1 ... i_{r_1}) (i_{r_1+1} ... i_{r_1+r_2}) ... (i_{r_1+...+r_{k-1}+1} ... i_{r_1+...+r_k}) and
τ = (j_1 ... j_{r_1}) (j_{r_1+1} ... j_{r_1+r_2}) ... (j_{r_1+...+r_{k-1}+1} ... j_{r_1+...+r_k}) where
r_1 ≥ r_2 ≥ ... ≥ r_k and r_1 + r_2 + ... + r_k = n. Let σ(i_p) = j_p for
p = 1, ..., n.
With the notation in the solution to the last exercise, the conjugacy
class of θ corresponds to the partition r_1 + r_2 + ... + r_k of n.
The conjugacy class of θ has (n - 1)! elements and so N_θ, the
normaliser of θ, has order n. Since the elements 1, θ, θ^2, ..., θ^{n-1} are
distinct, they exhaust N_θ.
By the last exercise any element in the centre is a power of every
n-cycle. For n > 2, the only common power of (1 2 3 ... n) and
(2 1 3 ... n) is the identity.
For i ≥ 2, j ≥ 2 and i ≠ j, (i j) = (1 j) (1 i) (1 j). So S_n is generated
by {(1 i): 2 ≤ i ≤ n}. Let θ ∈ A_n. Write θ = (1 i_1) (1 i_2) ... (1 i_{k-1}) (1 i_k)
where i_1, i_2, ..., i_k ≥ 2. Then k is even. If for any r = 1, ..., k/2,
i_{2r-1} = i_{2r}, then the product (1 i_{2r-1}) (1 i_{2r}) equals identity and may
be deleted from the expression for θ. If i_{2r-1} ≠ i_{2r}, then (1 i_{2r-1})
(1 i_{2r}) = (1 i_{2r} i_{2r-1}). So θ is a product of 3-cycles. [Actually, it
suffices to take 3-cycles of the form (1 r s).]
10. For 2 < r < s, (1 r s) = (1 s) (1 r) = (1 s) (1 2) (1 2) (1 r) = (1 2 s)
(1 r 2) = (1 2 s) (1 2 r)^{-1}.
11. Let θ = (1 2) and σ = (1 2 3 ... n-1 n). Let G be the subgroup
of S_n generated by θ and σ. Using Exercise (4.3) and induction on
i, it follows that σ^i θ σ^{-i} = (i + 1, i + 2) for i = 0, 1, ..., n - 1
(with the understanding that n + 1 = 1). Now for i = 2, ..., n - 1,
(1, i) (i, i + 1) (1, i) = (1, i + 1) and so it follows again by
induction that (1, i) ∈ G for i = 2, ..., n. But transpositions of this form
generate S_n, as noted in Exercise (4.9) above.
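The statement of Exercise 11 can be confirmed for small n by generating the subgroup explicitly. This is a sketch under stated assumptions: permutations are stored as tuples on the labels 0, ..., n - 1 (so the transposition (1 2) becomes the swap of 0 and 1), and the helper names are illustrative.

```python
from math import factorial

def compose(p, q):
    """Return the permutation p o q (q is applied first)."""
    return tuple(p[q[i]] for i in range(len(p)))

def generated_subgroup(generators):
    """Close the identity under left multiplication by the generators.
    In a finite group this yields the generated subgroup."""
    identity = tuple(range(len(generators[0])))
    seen, frontier = {identity}, [identity]
    while frontier:
        new = []
        for g in frontier:
            for s in generators:
                h = compose(s, g)
                if h not in seen:
                    seen.add(h)
                    new.append(h)
        frontier = new
    return seen

def transposition_and_cycle(n):
    swap = tuple([1, 0] + list(range(2, n)))        # (1 2) in 0-based labels
    cycle = tuple((i + 1) % n for i in range(n))    # the n-cycle i -> i + 1
    return [swap, cycle]

sizes = {n: len(generated_subgroup(transposition_and_cycle(n)))
         for n in range(2, 7)}
```

For each n tried, the generated subgroup has n! elements, so it is all of S_n.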
12. Let N be a normal subgroup of A_4. If N ≠ {e} and N ≠ K, then N
contains some 3-cycle σ. But then o(N) would have to be 12, 6 or
3. As shown in the text, A_4 contains no subgroup of order 6. So
if A_4 ≠ N, then N would have to be (σ) which is not normal in
A_4.
13. [Figure: a cube with vertices A, B, C, D, A', B', C', D'; MM' is an
axis through the midpoints of opposite faces and PQ an axis through
the midpoints of opposite sides.]
Let the cube be as shown and G the group of its orientation
preserving isometries. Let σ ∈ G. σ(A) can be any of the 8 vertices. σ(B)
must be adjacent to σ(A). So for a given σ(A), σ(B) has 3 choices.
Once σ(A), σ(B) are fixed σ(C) is uniquely determined because of
preservation of distance and orientation. Similarly σ is determined
at all other vertices. It follows that o(G) ≤ 24. That equality holds
is shown by enumerating 24 orientation preserving isometries. They
are (i) identity (ii) 3 rotations each about axes like MM', joining
midpoints of opposite faces; one of which has cycle structure
(ABCD)(A'B'C'D') (iii) six 180° rotations about axes like PQ joining
mid-points of opposite sides; one of them has cycle structure (A'C)
(C'A) (BB') (DD') and (iv) two 120° rotations about each of the
four diagonals, one of them has cycle structure (A) (A') (BDC')
(D'CB').
14. A pair of diagonal vertices is characterised as being √3 a units
away where a is the side of the cube. So every isometry of the
cube induces a permutation of the four diagonals. Indeed with the
answer to the last exercise, isometries of type (iii) correspond to
transpositions in S_4, those of type (iv) correspond to 3-cycles. As
for (ii), the 90° rotations give 4-cycles while 180° rotations give
elements of K ⊂ A_4.
15. By Exercise (3.11), N ∩ A_5 is a normal subgroup of A_5. If
N ∩ A_5 = A_5, then N = A_5 if o(N) = 60, while N = S_5 if o(N) >
60. If N ∩ A_5 = {e} then N = {e}, for otherwise every permutation
in N except e is odd. If θ, σ are two such permutations then θσ, θ^2
are even and must equal e. So N has exactly one element other than
e. But then N is not normal in S_5.
16. If G is abelian, take N_0 = G, N_1 = {e}. For D_n, take N_1 = the
subgroup of order n consisting of rotations and N_2 = {e}. For Q,
let N_1 be its centre and N_2 = {1}. For S_3 take N_1 = A_3 and N_2 = {e}.
For S_4 take N_1 = A_4, N_2 = K and N_3 = {e}.
17. For the first part take intersections of the subgroup with the N_i's.
For the second assertion, use Theorem (3.7).
18. Since A_5 is non-abelian and has no non-trivial normal subgroups,
it cannot be solvable. Also A_5 ⊂ A_n ⊂ S_n for all n ≥ 5. Apply the
last exercise.
19. Let G/H = N_0 ⊃ N_1 ⊃ ... ⊃ N_r = {H} and H = M_0 ⊃ M_1 ⊃ ... ⊃
M_s = {e} be the chains of subgroups satisfying the conditions in
the definition of solvability. Let f: G -> G/H be the quotient
homomorphism. For i = 0, ..., r, let K_i = f^{-1}(N_i), while for
i = r + 1, ..., r + s, let K_i = M_{i-r}. Then G = K_0 ⊃ K_1 ⊃ ... ⊃ K_r ⊃
K_{r+1} ⊃ ... ⊃ K_{r+s} = {e} is a chain of subgroups showing G is solvable.
Let z ≠ e be in the centre Z of G. Then o(z) = p^r for some r ≥ 1.
If r = 1, let H = (z). If r > 1, let H = (z^{p^{r-1}}). In either case
o(H) = p. So o(G/H) = p^{n-1}. Apply induction hypothesis and then
Theorem (3.7).
21. Under a suitable bijection between {1, 2, ..., 9} and Z_3 × Z_3, it is
easy to see that θ and σ are given respectively by θ(x, y) = (x, y + 1)
and σ(x, y) = (x + y, y) for x, y ∈ Z_3. Since θ^3 = σ^3 = e, the
various elements of G are functions f: Z_3 × Z_3 -> Z_3 × Z_3 of the form
f(x, y) = (x + ay + b, y + c) for some a, b, c ∈ Z_3. So o(G) ≤ 27.
That equality holds can be seen by showing that all 27 functions
of this form (resulting from various choices of a, b, c) are in fact
the composites of suitable sequences of θ and σ. (Actually, by
Lagrange's theorem, it suffices to do this verification for only 10 out
of these 27 functions.) It is clear that every function of this form
has order 3, except when a, b, c are all [0], which corresponds to the
identity function.
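The verification described in this answer can be delegated to a short program. The sketch below assumes the two generators are the maps (x, y) -> (x, y + 1) and (x, y) -> (x + y, y) on Z_3 × Z_3; the encoding of a map as a tuple of images and all helper names are illustrative choices, not the book's.

```python
from itertools import product

POINTS = list(product(range(3), repeat=2))       # the 9 points of Z_3 x Z_3
INDEX = {pt: i for i, pt in enumerate(POINTS)}

def table(func):
    """Encode a map on Z_3 x Z_3 as the tuple of images of POINTS."""
    return tuple(func(x, y) for x, y in POINTS)

def compose(f, g):
    """Table of the composite f o g (g is applied first)."""
    return tuple(f[INDEX[g[i]]] for i in range(len(POINTS)))

theta = table(lambda x, y: (x, (y + 1) % 3))
sigma = table(lambda x, y: ((x + y) % 3, y))
identity = table(lambda x, y: (x, y))

def generated(gens):
    """Subgroup generated by gens, found by closing under composition."""
    seen, frontier = {identity}, [identity]
    while frontier:
        new = []
        for g in frontier:
            for s in gens:
                h = compose(s, g)
                if h not in seen:
                    seen.add(h)
                    new.append(h)
        frontier = new
    return seen

G = generated([theta, sigma])
# all maps (x, y) -> (x + a*y + b, y + c) with a, b, c in Z_3
affine = {table(lambda x, y, a=a, b=b, c=c: ((x + a * y + b) % 3, (y + c) % 3))
          for a, b, c in product(range(3), repeat=3)}
all_cubes_trivial = all(compose(f, compose(f, f)) == identity for f in G)
```

The generated group coincides with the set of 27 maps of the stated form, every element satisfies f^3 = identity, and the two generators do not commute.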
22. Elements of Z_3 × Z_3 × Z_3 are of the form (x, y, z) for x, y, z ∈ Z_3.
Clearly 3(x, y, z) = (3x, 3y, 3z) = (0, 0, 0). Z_3 × Z_3 × Z_3 is abelian
while the group in the last exercise is not.
A_5 is simple by Theorem (4.13). Let n > 5 and suppose N ≠ {e} is
a normal subgroup of A_n. It suffices to show that N contains a
permutation, say φ, other than identity which has at least one fixed
point. For in that case we may think of φ as an element of A_{n-1},
whence by induction hypothesis N ∩ A_{n-1} would be the entire group
A_{n-1}. But then N contains a 3-cycle in A_{n-1} ⊂ A_n, from which it
can be shown that N contains all 3-cycles in A_n (as in the proof of
Theorem (4.13)) and finally that N = A_n (using Exercise (4.9)).
To show that N contains a permutation with a fixed point, let
θ ∈ N, θ ≠ e. If θ has a cycle of length 1, θ itself has a fixed point.
Otherwise let θ = θ_1θ_2...θ_r be the cyclic decomposition of θ. If
r = 1, then θ is an n-cycle which may be taken to be (1 2 3 ... (n-1) n).
Let σ = (1 2) (3 4). Then σ ∈ A_n and σθσ^{-1} = (2 1 4 3 5 ... (n-1) n).
Now 6 e 6 r1 is in N and has 2 as a fixed point. Next
suppose r 2 2 and at least one cycle, say 0,, has length > 2. Let
0, = (i1. i,,..., 1,) and 0, = Ul,j,.....j,,) with p :> 2. Again let
a = (1'1, 1'.) (1“, 1,). Then 0 a 0 v“E N and has i, and jI as fixed
points. Also 0 a 6 r1 as e since 11 > 2. The only case left is when
each 9, is a 2-cycle. In that case n = 4k for some integer k > i
and we may suppose 0 is (l 2) (3 4) (5 6)...(4Ie — 1, 4k). In this
caselet a =(13 5). Then 0 a e 0'“ = (I 53) (2 4 6)(78)...(4k—l,
4k) is in N and the earlier argument (where p > 2) can be applied
to it.
Let θ ∈ S_n and suppose θ = θ_1θ_2...θ_k is the cycle decomposition
of θ. Let n_i be the length of θ_i. Then n_i is the order of θ_i. Let m be
the least common multiple of n_1, ..., n_k. Then since the θ_i's commute
with each other, we certainly have θ^m = e. On the other hand
suppose θ^r = e. For each i, let x be any element in the cycle θ_i. Then
θ_i^r(x) = θ^r(x) = x gives n_i divides r. So m divides r. Hence m is
the order of θ.
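The result just proved (the order of a permutation is the l.c.m. of its cycle lengths) can be checked exhaustively in a small symmetric group. This is a minimal sketch, assuming permutations of 0, ..., n - 1 stored as tuples; the function names are illustrative.

```python
from functools import reduce
from itertools import permutations
from math import gcd

def cycle_lengths(perm):
    """Lengths of the disjoint cycles of a permutation given as a tuple."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:          # walk around the cycle containing start
            seen.add(x)
            x = perm[x]
            length += 1
        lengths.append(length)
    return lengths

def lcm_of(numbers):
    """Least common multiple of a list of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), numbers, 1)

def order_by_powers(perm):
    """Smallest r >= 1 with perm^r = identity, by repeated composition."""
    identity = tuple(range(len(perm)))
    power, r = perm, 1
    while power != identity:
        power = tuple(perm[i] for i in power)    # perm o power
        r += 1
    return r

agree_on_s5 = all(order_by_powers(p) == lcm_of(cycle_lengths(p))
                  for p in permutations(range(5)))
```

For instance, the permutation (1, 0, 3, 4, 2) is a 2-cycle times a 3-cycle and has order 6.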
CHAPTER 6
Section 6.1
2. 0 is always an idempotent. Suppose x is a non-zero idempotent of
an integral domain R. Suppose first R has an identity. Then x^2 =
x = x·1 implies x = 1. If R has no identity let F be the field of
quotients of R. Then x as an element of R corresponds to the
element [x^2, x] of F. But this is the same as [x, x] which is precisely
the identity element of F. So x is an identity for R, a contradiction.
In a Boolean ring every element is an idempotent.
The centre of Q consists of quaternions of the form a_0 + 0i + 0j
+ 0k, which is clearly isomorphic to R. The centre of M_2(R)
consists of all matrices of the form
    ( a  0 )
    ( 0  a )
for a ∈ R. It is also isomorphic to R.
Let u = xy3—y‘x. Then yuy = yxy'—y'xy = y-W*yx}’ = 0-
So yu = 010' = yuyuyu = 0. giving yxy‘ = yx and hence y'x =
y'xy‘. Similarly uy = 0 gives xy‘ = y‘xy'. This proves the hint.
Now for any x, y; x', y' and 000' commute with all elements. So
xy = 000’ = xovo'y = (nXy = yxyx‘y = W? = y'x’ = W:-
The first assertion is a consequence of a well-known result, called
the Cauchy-Schwarz inequality, which says that for any real numbers
a_1, ..., a_n, b_1, ..., b_n, B^2 ≤ AC where B = Σ_{i=1}^{n} a_i b_i,
A = Σ_{i=1}^{n} a_i^2 and C = Σ_{i=1}^{n} b_i^2. To prove it observe that
the quadratic f(x) = Ax^2 + 2Bx + C = Σ_{i=1}^{n} (a_i x + b_i)^2 is always
non-negative and hence its discriminant cannot be positive.
This second assertion can be proved directly by a lengthy
computation. Easier proofs are available using properties of vector
spaces. One such proof will be indicated in Exercise (3.26).
{a_0 + 0i + 0j + 0k: a_0 ∈ R} and {a_0 + a_1 i + 0j + 0k: a_0, a_1 ∈ R}
respectively.
Define f: Q -> M_4(R) by f(a_0 + a_1 i + a_2 j + a_3 k) to be
    ( a_0  -a_1  -a_2  -a_3 )
    ( a_1   a_0  -a_3   a_2 )
    ( a_2   a_3   a_0  -a_1 )
    ( a_3  -a_2   a_1   a_0 )
Then f is a one-to-one ring homomorphism. Note that since matrix
multiplication is associative, this exercise gives an easier proof of
the associativity of the quaternionic multiplication.
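That the 4 × 4 matrix assignment respects multiplication can be tested on sample quaternions. The sketch below is an illustration only: quaternions are assumed to be stored as 4-tuples (a_0, a_1, a_2, a_3), the product is written out from the usual rules i^2 = j^2 = k^2 = -1, ij = k, jk = i, ki = j, and the function names are hypothetical.

```python
from itertools import product

def quat_mul(p, q):
    """Hamilton product of quaternions given as 4-tuples."""
    a0, a1, a2, a3 = p
    b0, b1, b2, b3 = q
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def to_matrix(q):
    """The 4 x 4 real matrix assigned to a quaternion."""
    a0, a1, a2, a3 = q
    return [[a0, -a1, -a2, -a3],
            [a1,  a0, -a3,  a2],
            [a2,  a3,  a0, -a1],
            [a3, -a2,  a1,  a0]]

def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[r][t] * B[t][c] for t in range(n)) for c in range(n)]
            for r in range(n)]

samples = list(product([-1, 0, 2], repeat=4))    # small integer test quaternions
multiplicative = all(
    to_matrix(quat_mul(p, q)) == mat_mul(to_matrix(p), to_matrix(q))
    for p in samples[::5] for q in samples[::7])
```

Integer entries are used so that the comparison is exact. The check is a spot test on sample values, not a proof.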
To avoid confusion between complex numbers and quaternions
denote by θ the complex number √-1 (which is commonly denoted
by i). Every complex number z then has the form x + θy with
x, y ∈ R and its complex conjugate is z̄ = x - θy. Now define
g: Q -> M_2(C) by g(a_0 + a_1 i + a_2 j + a_3 k) =
    ( z  -w̄ )
    ( w   z̄ )
where
z=ao+0a, andw=al+0ar
10. Let (F, +, ·) be a field. Suppose x ∈ F generates the additive
group (F, +). Clearly x ≠ 0. First we claim that F must be
of a prime
characteristic. For otherwise, let u = 1 + 1 ∈ F. Then u ≠ 0. So
u^{-1} exists in F. Let y = u^{-1}x. Then it is easily seen that x = y + y
= 2y. Now y = mx for some m ∈ Z. So 2mx = x or (2m - 1)x = 0,
whence x is of finite order. By Corollary (1.11), F has characteristic
p for some prime p. But then px = 0 and so F consists of the p
elements 0, x, 2x, ..., (p - 1)x.
11. Let R_1 = R_2 = R. Then ΔR = {(x, x): x ∈ R} is a subring of
R × R, but is not of the form S_1 × S_2.
12. For example, in M_2(R), let R_1 be the set of all matrices of the form
    ( a_11  a_12 )
    (  0     0   )
with a_11, a_12 ∈ R, and let R_2 be the set of all matrices of the form
    ( a_11  0 )
    ( a_21  0 )
with a_11, a_21 ∈ R.
Then R_1 is a right ideal and R_2 is a left ideal of M_2(R). But R_1 ∩ R_2
is neither.
13. No.
14. If A = {x_0} and J is an ideal of R such that M_A ⊊ J, let f ∈ J,
f ∉ M_A. Let g: [0, 1] -> R be g(x) = (x - x_0)^2 and h(x) = (f(x))^2
+ g(x). Then g ∈ M_A ⊂ J and so h ∈ J. h is never 0 and hence is
an invertible element of R. So J = R. Conversely if A contains two
distinct elements say a, b, then the function g(x) = (x - a)^2 is in
M_{{a}} but not in M_A.
For f ∈ R, let Z(f) = {x ∈ [0, 1]: f(x) = 0}. Each Z(f) is a closed
subset of [0, 1]. Let M be a proper ideal of R and let Z be the
intersection of the sets Z(f) over all f ∈ M.
We claim Z ≠ ∅, for otherwise by compactness of [0, 1], there exist
f_1, ..., f_n ∈ M such that ∩_{i=1}^{n} Z(f_i) = ∅. Now the function
Σ_{i=1}^{n} f_i^2 is
in M and never vanishes. So M contains an invertible element,
contradicting M ≠ R. Let x_0 ∈ Z. Then M ⊂ M_{{x_0}}.
16. That R is a subring follows from the fact that the sum, the
difference and the product of two analytic functions are analytic. To show
R contains no zero divisors let f, g ∈ R. Then Z(fg) = Z(f) ∪ Z(g)
(where Z(f) = {z: f(z) = 0} etc.). Since Z(f), Z(g) are
discrete, so is Z(fg). So fg cannot vanish identically.
17. In general, for any positive integer m, {a + √m b: a, b ∈ Z} and
{a + i√m b: a, b ∈ Z} are subrings of R, C respectively. Unless
m is a perfect square, the expressions of the elements are unique,
i.e. a + √m b = c + √m d iff a = c and b = d etc. More generally
for any positive integers n and k, let a = (n)1l" em". Then complex
I n
numbers of the form? and where afs e 2 form a subnng of c.
-0
All these subrings are integral domains, but their properties depend
considerably on the integers n and k.
18. (a) This is a special case of Exercise (3.4.25) since the poset (N_0,
≤), where ≤ is the usual order, is locally finite. Given f: N_0
-> R, where R is a ring, we define f̃: ≤ -> R as f̃(x, y) = f(y
- x). In Exercise (3.4.25), the set R of real numbers could be
replaced by any ring in the proof of associativity of convolution.
(b) An element of M_n(R[x]) is an n × n matrix say F(x) = (f_ij(x))
where each f_ij(x) is a polynomial over R. These n^2 polynomials
may have different degrees. Let r be the maximum of their
degrees. Then we may identify F(x) with the polynomial
A_0 + A_1x + ... + A_rx^r where for k = 0, ..., r, A_k is an n × n
matrix whose (i, j)th entry is the coefficient of x^k in the
polynomial f_ij(x). Then A_k ∈ M_n(R). This correspondence is a
ring isomorphism.
19. This is also a special case of Exercise (3.4.25).
20. The first assertion is straightforward. For the second let (fg)(x) =
Σ_{n=0}^{∞} c_n x^n where for each n ≥ 0, c_n = Σ_{j=0}^{n} a_j b_{n-j}. The problem now
amounts to showing that for each n ≥ 0,
(n + 1)c_{n+1} = Σ_{r=0}^{n} a_r (n - r + 1) b_{n-r+1} + Σ_{r=0}^{n} (r + 1) a_{r+1} b_{n-r}.
This follows since for each j, 0 ≤ j ≤ n + 1, the coefficient of
a_j b_{n+1-j} in the first sum is (n - j + 1) while that in the second
sum is j.
21. The ideal generated by a non-zero element equals the whole ring
iff that element is invertible.
Define the isomorphism θ: R/K -> T by θ(x + K) = f(x) for x ∈ R.
Since x + K = y + K ⇔ x - y ∈ K ⇔ f(x) = f(y), θ is well-defined
and one-to-one. That it preserves the binary operations is routine
to verify. If U is a subring or an ideal of T, f^{-1}(U) is a subring
(respectively an ideal) of R. This gives a one-to-one correspondence
between subrings of T and subrings of R containing K etc.
Let f: R -> R/I be the quotient homomorphism. By the last
exercise I is a maximal ideal of R iff the ring R/I has no proper ideals.
Now apply Exercise (1.21) above.
24. Let K be the kernel of f. If f is not identically 0, then K ≠ R and
so K = {0}, the only other ideal of R. But then f is one-to-one.
25. For x, y ∈ R, with y ≠ 0, define g[x, y] = f(x)[f(y)]^{-1}.
If x > 0, then x·x > 0·0 = 0. If x < 0, then -x > 0, for
otherwise we would have x + (-x) < 0 + 0, i.e. 0 < 0, a contradiction.
So (-x)^2 > 0, i.e. x^2 > 0. In particular 1 = 1^2 > 0. If F has
characteristic p ≠ 0, then 1 + 1 + ... + 1 (p times) is > 0, a
contradiction.
27. For example, in Q, the ring of quaternions, if (i + j) (i - j) equals
i^2 - j^2, then either i + j or i - j would have to vanish, there being
no zero-divisors in Q. For binomial theorem, apply induction on
n along with Proposition (2.2.19).
Section 6.2
1. Apply the usual euclidean algorithm. If (1/2)|b| < r < |b|,
increase/decrease q by 1 depending upon whether b > 0 or b < 0 and change
r accordingly.
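The adjustment described here can be written out as a short routine. This is a minimal sketch, assuming the goal is a = bq + r with |r| ≤ |b|/2; the function name `nearest_division` is illustrative and not from the text.

```python
def nearest_division(a, b):
    """Return (q, r) with a == b*q + r and 2*|r| <= |b|, for b != 0."""
    if b == 0:
        raise ZeroDivisionError("b must be non-zero")
    r = a % abs(b)              # usual remainder, 0 <= r < |b|
    q = (a - r) // b            # exact, since b divides a - r
    if 2 * r > abs(b):          # remainder lies strictly above |b|/2
        q += 1 if b > 0 else -1     # increase q if b > 0, decrease if b < 0
        r -= abs(b)                 # and change r accordingly
    return q, r

q1, r1 = nearest_division(8, 3)     # 8 = 3*3 + (-1)
q2, r2 = nearest_division(8, -3)    # 8 = (-3)*(-3) + (-1)
```

A remainder equal to exactly |b|/2 is left unchanged, in line with the strict inequality in the answer.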
The first assertion is a standard property of complex numbers
(called ‘multiplicativity of absolute value’). For the second assertion,
following the hint, d(r + is) = r’ + s' < in' + {n' < n'.
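Let $n = a^2 + b^2 = (a + ib)(a - ib)$. Applying the last exercise
to $ax + yb + i(ay - bx)$ [which equals $(x + iy)(a - ib)$] find
$p, q, r, s \in Z$ such that $r^2 + s^2 < n^2$ and $(x + iy)(a - ib) =
n(p + iq) + (r + is)$. Dividing by $a - ib$, we see
$x + iy = (a + ib)(p + iq) + \frac{r + is}{a - ib}$, so that $\frac{r + is}{a - ib} \in G$.
Also $d\left(\frac{r + is}{a - ib}\right) = \frac{r^2 + s^2}{n} < n = d(a + ib)$.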
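The division step described above can be sketched in code. This is only an illustration under stated assumptions: Gaussian integers are represented as pairs of Python integers, and the helper names `nearest_quotient` and `gauss_divmod` are hypothetical, not taken from the text.

```python
def nearest_quotient(a, n):
    """Integer q with |a - q*n| <= n/2, for n > 0."""
    return (2 * a + n) // (2 * n)

def d(z):
    """The norm d(x + iy) = x^2 + y^2."""
    return z[0] * z[0] + z[1] * z[1]

def gauss_divmod(z, w):
    """Divide z = x + iy by w = a + ib (w != 0); return (quotient, remainder)."""
    (x, y), (a, b) = z, w
    n = d(w)
    # (x + iy)(a - ib) = (ax + yb) + i(ay - bx), as in the answer above
    re, im = a * x + y * b, a * y - b * x
    p, q = nearest_quotient(re, n), nearest_quotient(im, n)
    # remainder = z - (p + iq)(a + ib)
    r = x - (p * a - q * b)
    s = y - (p * b + q * a)
    return (p, q), (r, s)

quotient, remainder = gauss_divmod((27, 23), (8, 1))
print(quotient, remainder, d(remainder) < d((8, 1)))
```

Rounding each coordinate to the nearest multiple of $n$ is what keeps the norm of the remainder below $d(a + ib)$.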
$x^2 + y^2$ factors into $(x + iy)(x - iy)$. Neither factor is a unit.
If $p$ is a prime in $G$, then either $p \mid x + iy$ or $p \mid x - iy$ in $G$. In
the first case $x + iy = (a + ib)p$ for some $a, b \in Z$. But then
$x = pa$ and $y = pb$, contradicting that $x, y$ are relatively prime in
$Z$. Similarly $p \mid x - iy$ gives a contradiction.
In general, in $G$, $(x + iy) \mid (a + ib) \iff (x - iy) \mid (a - ib)$ because
of complex conjugation. So $a + ib$ is a prime in $G$ iff $a - ib$ is a
prime in $G$. Moreover, these two primes are not associates of each
other in $G$ unless either $a$ or $b$ is 0. If $a + ib$ is a prime divisor of
$p$ in $G$ then certainly neither $a$ nor $b$ is 0 (or else $p$ would not be a
prime in $Z$). So the prime factorisation of $p$ in $G$ contains both
$a + ib$ and $a - ib$. Hence their product, $(a^2 + b^2)$, divides $p$ in $G$
and hence in $Z$. But then $p = a^2 + b^2$.
For each $r = 1, 2, \dots, q$, $[p - r] = -[r]$ in the group $Z_p - \{[0]\}$ under
modulo $p$ multiplication. So modulo $p$, $y^2$ equals the product
$1 \cdot 2 \cdots q \cdot (q + 1) \cdots (p - 1)$ which equals $-1 \pmod{p}$. Let
$1 \le x \le p - 1$ be such that $y \equiv x$ (modulo $p$). Then $x^2 \equiv y^2 \equiv -1$
(modulo $p$).
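A quick numerical check of this argument, as a sketch only: it assumes (as the computation above suggests) that $y = q!$ with $q = (p - 1)/2$ and that $p$ is a prime of the form $4n + 1$. The function name is illustrative, and primality is not verified by the code.

```python
from math import factorial

def sqrt_of_minus_one(p):
    """Return x with 1 <= x <= p - 1 and x*x = -1 (mod p), for a prime p = 4n + 1."""
    if p % 4 != 1:
        raise ValueError("expected a prime of the form 4n + 1")
    q = (p - 1) // 2          # q is even because p = 4n + 1
    return factorial(q) % p   # y = q!, reduced modulo p

for p in (5, 13, 17, 29, 37, 41):
    x = sqrt_of_minus_one(p)
    print(p, x, (x * x + 1) % p)
```

For each listed prime the last printed value is 0, i.e. $x^2 + 1$ is divisible by $p$.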
For uniqueness, if $x + iy$ has a proper divisor, say $u + iv$, in $G$,
then $d(u + iv)$ would be a proper divisor of $d(x + iy)$ in $Z$ by
Exercise (2.2), contradicting that $x^2 + y^2$ is a prime in $Z$. So
$p = (x + iy)(x - iy)$ is a factorisation of $p$ in $G$ into primes. If
$p = (a + ib)(a - ib)$ is another such factorisation then by Theorem
(2.28), $a + ib$ is an associate of $x + iy$ or of $x - iy$. In either
case $(a, b) = (x, y)$.
(i) Follow the hint and get a contradiction since 4 does not divide
$4n + 2$, which is the order of the group $Z_p - \{[0]\}$.
(ii) Let $p = x^2 + y^2$. Obviously $0 < y < p$. So there exists $u$
such that $uy \equiv 1 \pmod{p}$, giving $(xu)^2 + 1 \equiv 0 \pmod{p}$,
contradicting (i).
(iii) Argue as in Exercise (2.6) to get a contradiction to (ii).
10. Let $S = \{n \in N : n$ can be expressed as a sum of squares of two
integers$\}$. If $x, y \in S$ then $xy \in S$. (For an easier proof of this, if
$x = (a + ib)(a - ib)$ and $y = (c + id)(c - id)$ then $xy = (u + iv)(u - iv)$
where $u + iv = (a + ib)(c + id)$.) Now $2, p_1, \dots, p_r$
and $q_1^2, \dots, q_s^2 \in S$. So the condition is sufficient. For necessity, let
$n = (x + iy)(x - iy)$. Since $q_i$ is a prime in $G$, $q_i$ divides either
$x + iy$ or $x - iy$. Obviously the same power of $q_i$ divides both
$x + iy$ and $x - iy$. So the power of $q_i$ dividing $n$ is even.
11. (i) an l.c.m. of $a$ and $b$ (ii) a g.c.d. of $a$ and $b$.
If $(x)$ and $(y)$ are relatively prime then $\lambda x + \mu y = 1$ for some
$\lambda, \mu \in R$. So $x, y$ are relatively prime in $R$. For the converse,
reverse the steps, using Theorem (2.14).
13. Since the $m_i$'s are pairwise relatively prime, it follows that $m$ is
their l.c.m. So for any $n \in Z$, $m \mid n$ iff $m_i \mid n$ for every $i = 1, \dots, k$.
Consequently for $x, y \in Z$, $x \equiv y \pmod{m}$ iff $x \equiv y \pmod{m_i}$ for
every $i = 1, \dots, k$. This proves that $\theta$ is well-defined and one-to-one.
Also its domain and codomain have the same cardinality.
14. Let $a = 5832$ and $b = 6639$. Then $a \equiv 0 \pmod 8$; so $ab \equiv 0
\pmod 8$. Similarly $ab \equiv 0 \pmod 3$ since $b \equiv 0 \pmod 3$. Since
$a \equiv 2 \pmod 5$ and $b \equiv 4 \pmod 5$, $ab \equiv 3 \pmod 5$. Finally,
$ab \equiv 1 \cdot 3 \equiv 3 \pmod 7$. So in the notation of the hint, $x_1 = x_2 = 0$
and $x_3 = x_4 = 3$. The first two conditions imply $x$ is a multiple
of 24 and this considerably reduces the search for $x$ in the present
problem. Let $x = 24y$. Then $x \equiv -y \pmod 5$, giving $y \equiv 2 \pmod 5$.
Similarly $y \equiv 1 \pmod 7$. By trial $y$ is of the form $22 + 35z$. Since
$24y < 840$, $y$ has to be 22. So $x = 528$.
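The same answer can be reproduced mechanically. The sketch below is a generic Chinese-remainder solver for pairwise coprime moduli, not the hand method of the answer; the function name `crt` is an assumption, and `pow(Mi, -1, m)` needs Python 3.8 or later.

```python
def crt(residues, moduli):
    """Solve x = r_i (mod m_i) for pairwise coprime moduli; return x in [0, M)."""
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(Mi, -1, m) is the inverse of Mi modulo m
    return x % M

a, b = 5832, 6639
moduli = [8, 3, 5, 7]
residues = [(a * b) % m for m in moduli]   # the x_1, ..., x_4 of the hint
x = crt(residues, moduli)
print(residues, x, (a * b) % 840)
```

The residues come out as 0, 0, 3, 3, and the solver returns the same value as reducing $ab$ modulo 840 directly.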
15. It suffices to show that there exist $y_1, \dots, y_k \in R$ such that for each
$i = 1, \dots, k$, $y_i \equiv 1 \pmod{A_i}$ (i.e. $y_i - 1 \in A_i$) and $y_i \equiv 0 \pmod{A_j}$
for $j \ne i$. Once such $y$'s are found, we need only set
\[ x = \sum_{i=1}^{k} x_i y_i. \]
To get such $y$'s, let $B$ be the ideal generated by all products of
the form $a_2 a_3 \cdots a_k$ with $a_j \in A_j$ for $j = 2, \dots, k$. ($B$ is often
denoted by $A_2 A_3 \cdots A_k$.) Show that $A_1$ and $B$ are relatively prime
to each other. Hence there exists $y_1 \in B$ such that $y_1 - 1 \in A_1$.
But then $y_1 \in A_j$ for all $j = 2, \dots, k$. Similarly find $y_2, \dots, y_k$.
16. $(u_1, \dots, u_k)$ is a unit of $R_1 \times R_2 \times \dots \times R_k$ iff $u_i$ is a unit of $R_i$ for
every $i = 1, \dots, k$. The second assertion follows from the fact that for
every $m \in N$, $\phi(m)$ is simply the number of units of the ring $Z_m$.
17. Every homomorphism $f: Z_m \to Z_m$ is uniquely determined by
$f([1])$. Let $f([1]) = [k]$. Then $f$ is an automorphism iff $[k]$ generates
$Z_m$, which is the case iff $k$ is relatively prime to $m$. Let $G$ be the
group of units of the ring $Z_m$. The function $\theta: A(Z_m) \to G$ defined
by $\theta(f) = f([1])$ is an isomorphism.
18. Find $a, b \in Z$ such that $am + bn = 1$. Then $x = x^{am + bn}
= (x^m)^a (x^n)^b = e^a e^b = e$.
19. If $y = 0$, then $x = 0$. If $y \ne 0$, then apply the last exercise to the
element $xy^{-1}$ of the multiplicative group of the field $F$. If $F$ is an
integral domain, pass over to its field of quotients. For a counter-example,
take $x = [3]$, $y = [0]$ in $Z_9$, $m = 2$, $n = 3$.
The proof of Proposition (2.23) goes through since all that really
matters is that $\alpha$ commute with elements of $R$.
22. For the first part write $f(x) = (x - \alpha_1)^{m_1} g(x)$ and apply induction
hypothesis to $g(x)$. For the second part write $f(x) = a_n \prod_{i} (x - \alpha_i)^{m_i}$
and compare coefficients of like powers.
23. If $\alpha$ is a multiple root of $f(x)$ then $f(x) = (x - \alpha)^2 g(x)$ for some
$g(x) \in F[x]$. It is easily seen that the formal derivative of $(x - \alpha)^2$
is $2(x - \alpha)$. By Exercise (1.20), $f'(x) = (x - \alpha)^2 g'(x) + 2(x - \alpha)g(x)$.
So $\alpha$ is a root of $f'(x)$. Conversely, if $\alpha$ is a common root of $f(x)$
and $f'(x)$, write $f(x) = (x - \alpha)^m g(x)$ where $m \ge 1$ and $g(\alpha) \ne 0$. If
$m = 1$, then $f'(x) = (x - \alpha)g'(x) + g(x)$ which does not have $\alpha$ as
a root, a contradiction. So $m \ge 2$.
Let $a = u p_1^{m_1} \cdots p_k^{m_k}$ and $b = v q_1^{n_1} \cdots q_l^{n_l}$ with the usual notation.
Then $a$ divides $b$ iff for every $i = 1, \dots, k$, there exists $j$ such that
$q_j$ is an associate of $p_i$ and $n_j \ge m_i$.
With the notation of the last answer, suppose without loss of generality
that $s$ is the integer $\ge 0$ such that for $i = 1, \dots, s$, $p_i = q_i$
and for $i > s$, $p_i$ is not an associate of any $q_j$ nor $q_i$ of any $p_j$. For
$i = 1, \dots, s$, let $\lambda_i = \min(m_i, n_i)$ and $\mu_i = \max(m_i, n_i)$. Then a
g.c.d. of $a$ and $b$ is $p_1^{\lambda_1} \cdots p_s^{\lambda_s}$ while an l.c.m. is
$p_1^{\mu_1} \cdots p_s^{\mu_s}\, p_{s+1}^{m_{s+1}} \cdots p_k^{m_k}\, q_{s+1}^{n_{s+1}} \cdots q_l^{n_l}$.
If $a, b$ are relatively prime then $s = 0$. In that
case if $a \mid bc$, the primes $p_1, \dots, p_k$ must appear in $bc$ and hence in
$c$ with powers at least $m_1, \dots, m_k$ respectively. So $a \mid c$.
Both 2 and $x$ are prime elements of $Z[x]$. Their g.c.d. is 1. Let, if
possible, $1 = 2f(x) + xg(x)$ where $f(x) = a_0 + a_1 x + \dots + a_m x^m$
and $g(x) = b_0 + b_1 x + \dots + b_n x^n \in Z[x]$, with $a_m \ne 0$, $b_n \ne 0$.
This gives $1 = 2a_0$, which is impossible.
27. Clearly a unit cannot vanish anywhere. Conversely if $f: C \to C$ is
analytic and never vanishes, then $g(z) = \frac{1}{f(z)}$ is analytic and is an
inverse of $f$. If $f$ has a single zero, say $\alpha$, of multiplicity 1 then
$f(z) = (z - \alpha)g(z)$ where $g(z)$ is analytic and never vanishes. So $f$
is an associate of the polynomial $z - \alpha$ which is clearly irreducible.
If, on the other hand, $f$ has two distinct zeros, say $\alpha_1, \alpha_2$, or a single
multiple zero, $\alpha$, then $f(z) = (z - \alpha_1)[(z - \alpha_2)g(z)]$ or $f(z) = (z - \alpha)
[(z - \alpha)g(z)]$ gives a non-trivial factorization of $f$. So primes in $R$
have the desired form. Any finite product of primes must therefore
have only finitely many zeros.
$4 = 2 \cdot 2 = (1 + i\sqrt{3})(1 - i\sqrt{3})$. Here 2 is a prime. For $2 =
(a + i\sqrt{3}\,b)(c + i\sqrt{3}\,d)$ would give $4 = (a^2 + 3b^2)(c^2 + 3d^2)$, the
only possible solutions of which are $a = \pm 1$, $b = \pm 1$, $c = \pm 1$,
$d = 0$ or $a = \pm 1$, $b = 0$, $c = \pm 1$, $d = \pm 1$. Similarly, $1 + i\sqrt{3}$,
$1 - i\sqrt{3}$ are primes. Further if $u + i\sqrt{3}\,v$ is a unit, then $u^2 + 3v^2$
will be a factor of 1 in $Z$. So $\pm 1$ are the only units in the ring
$Z + i\sqrt{3}\,Z$. Hence, neither $1 + i\sqrt{3}$ nor $1 - i\sqrt{3}$ is an associate
of 2.
Let $f(x) = a_0 + \dots + a_m x^m$, $g(x) = b_0 + b_1 x + \dots + b_n x^n \in Z[x]$
be primitive. Let $h(x) = f(x)g(x) = c_0 + c_1 x + \dots + c_{m+n} x^{m+n}$.
Given a prime $p$, let $i$ be the smallest integer such that $p \nmid a_i$ and
$j$ be the smallest integer such that $p \nmid b_j$. Let $k = i + j$. Then
\[ c_k = \sum_{r=0}^{k} a_r b_{k-r}. \]
For $r < i$, $p \mid a_r b_{k-r}$ while for $r > i$, $k - r < j$
and so $p \mid a_r b_{k-r}$. However, $p \nmid a_i b_{k-i}$. So $p \nmid c_k$. Hence there is
no prime which divides all the $c$'s, i.e. $h(x)$ is primitive.
30. Let $g(x) = \frac{p_0}{q_0} + \frac{p_1}{q_1}x + \dots + \frac{p_m}{q_m}x^m$ where $p_0, \dots, p_m, q_0, \dots, q_m \in Z$.
Let $q = q_0 q_1 \cdots q_m$. Then $g(x) = \frac{1}{q}k(x)$ where $k(x) \in Z[x]$. Let $p$ be
the g.c.d. of the coefficients of $k(x)$. Then $k(x) = p\,g_1(x)$ where
$g_1(x) \in Z[x]$ is primitive. Similarly let $h(x) = \frac{r}{s}h_1(x)$ with $r, s \in Z$
and $h_1(x) \in Z[x]$ primitive. Then $qs\,f(x) = pr\,g_1(x)h_1(x)$. Call
this common polynomial $\phi(x)$. Then the g.c.d. of the coefficients of
$\phi(x)$ is $qs$ since $f(x)$ is primitive and is also equal to $pr$ since
$g_1(x)h_1(x)$ is primitive. So $qs = pr$ and we get $f(x) = g_1(x)h_1(x)$
in $Z[x]$.
31. Let $f(x) \in Z[x]$, $f(x) \ne 0$. In view of the last exercise $f(x)$ is a prime
element of $Z[x]$ iff one of the two possibilities holds: (i) $\deg f(x) = 0$
and $f(x)$ is a prime element of $Z$ or (ii) $\deg f(x) > 0$, $f(x)$ is primitive
and irreducible as an element of $Q[x]$. Given $f(x) \in Z[x]$ write
$f(x) = c f_1(x)$ where $c$ is the g.c.d. of the coefficients of $f(x)$ and
$f_1(x)$ is primitive. Factor $c$ as an element of $Z$ and $f_1(x)$ as an element
of $Q[x]$, which is a u.f.d.
32. For any $\alpha \in R$, $\alpha^2 + \alpha + 1 = \alpha^2 + \alpha + \tfrac{1}{4} + \tfrac{3}{4} = (\alpha + \tfrac{1}{2})^2 + \tfrac{3}{4} > 0$.
So $f(x)$ has no root in $R$. In $Z_2$ there are only two elements 0 and
1, neither of which is a root of $f(x)$. By Exercise (2.21) $f(x)$ is
irreducible over $R$ and $Z_2$. In $Z_3[x]$, however, $f(x) = (x + 2)^2$. If
$F = R$, define $h: F[x] \to C$ by $h(g(x)) = g(\omega)$ where $\omega = e^{2\pi i/3}
= -\tfrac{1}{2} + i\sqrt{3}/2$. Show that $h$ is a ring homomorphism with kernel
$I$. To show $h$ is onto, show that for every complex number $z$, there
exist $\alpha, \beta \in R$ such that $z = \alpha + \beta\omega$.
If $F = Z_2$, call $F[x]/I$ as $K$. The elements of $K$ are cosets of the
form $g(x) + I$ for $g(x) \in F[x]$. From the fact that $(x^2 + x + 1)
\in I$ it follows that $x^2 + I$ is the same coset as $(x + 1) + I$.
(Note that $-1 = 1$ in $Z_2$.) Similarly $x^3 + I = (x^2 + I)(x + I)
= ((x + 1) + I)(x + I) = (x^2 + x) + I = (x + x + 1) + I = 1
+ I$. Repeating this argument, it follows that $K$ has only 4 distinct
elements, namely, $I$, $1 + I$, $x + I$ and $(x + 1) + I$. It is customary
to denote these elements by $0, 1, \omega$ and $\omega + 1$. (This $\omega$ bears only
a formal similarity to the complex number $\omega$ above since both are
roots of $f(x)$.)
33. Let $K$ be a field with 4 elements and denote them by $\alpha_0, \alpha_1, \alpha_2, \alpha_3$
with $\alpha_0 = 0$. Use these same symbols to denote the four religions
and the four states. Now form a $4 \times 4$ square so that the delegate
in the $i$th row and $j$th column has religion $\alpha_1\alpha_i + \alpha_j$ and state
$\alpha_2\alpha_i + \alpha_j$. Since $\alpha_1, \alpha_2 \ne 0$, every religion appears exactly once in
every row and in every column. Similarly for the states. Also given
any religion $r$ and state $s$ (i.e., given $r, s \in K$) the system of equations
$r = \alpha_1\alpha_i + \alpha_j$ and $s = \alpha_2\alpha_i + \alpha_j$ has a unique solution for $\alpha_i$
and $\alpha_j$, namely $\alpha_i = \frac{r - s}{\alpha_1 - \alpha_2}$ and $\alpha_j = \frac{\alpha_1 s - \alpha_2 r}{\alpha_1 - \alpha_2}$. In other words, for
every delegate, there is a unique place to go. Hence, an arrangement
of 16 delegates is possible.
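The arrangement can be verified by brute force. The sketch below makes several assumptions that are not in the text: the field with 4 elements is realised as $Z_2[x]/(x^2 + x + 1)$ with the integers 0, 1, 2, 3 encoding $0, 1, \omega, \omega + 1$ as bit patterns, and the two non-zero multipliers are taken to be $1$ and $\omega$. All names are illustrative.

```python
from itertools import product

def gf4_add(a, b):
    """Addition in the 4-element field: coefficientwise addition mod 2."""
    return a ^ b

def gf4_mul(a, b):
    """Multiplication, using w*w = w + 1 (bit 1 is the coefficient of w)."""
    result = 0
    if b & 1:
        result ^= a
    if b & 2:
        shifted = a << 1          # multiply a by w
        if shifted & 4:           # reduce w^2 to w + 1
            shifted ^= 0b111
        result ^= shifted
    return result

K = range(4)
# Row i, column j: religion = 1*i + j and state = w*i + j, computed in the field.
religion = [[gf4_add(gf4_mul(1, i), j) for j in K] for i in K]
state = [[gf4_add(gf4_mul(2, i), j) for j in K] for i in K]

def is_latin(square):
    rows_ok = all(sorted(row) == [0, 1, 2, 3] for row in square)
    cols_ok = all(sorted(col) == [0, 1, 2, 3] for col in zip(*square))
    return rows_ok and cols_ok

pairs = {(religion[i][j], state[i][j]) for i, j in product(K, repeat=2)}
print(is_latin(religion), is_latin(state), len(pairs))
```

Both squares are Latin and all 16 (religion, state) pairs occur, which is exactly the uniqueness statement in the answer.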
34. We get $a_0 q^n + a_1 p q^{n-1} + a_2 p^2 q^{n-2} + \dots + a_{n-1} p^{n-1} q + a_n p^n = 0$.
So $p$ divides $a_0 q^n$. By Proposition (2.17), $p \mid a_0$. Similarly $q \mid a_n$.
Section 6.3
1. For $u, v \in V$, expand $(1 + 1) \cdot (u + v)$ in two ways; once as
$(u + v) + (u + v)$ and then as $(u + u) + (v + v)$.
2. When R‘ is regarded as a vector space over R, only (i) is a sub-
space. When considered as a vector space over Q. both (i) and (iv)
are subspaces.
4. (b) The problem is equivalent to showing that if $W$ is a vector
space over an infinite field $F$ and $W_1, \dots, W_n$ are proper subspaces
of $W$ then $\bigcup_{i=1}^{n} W_i$ cannot equal $W$. Apply induction on $n$.
For each $i$, there exists $w_i \in W_i$ such that $w_i \notin W_j$ for all $j \ne i$;
for otherwise $\bigcup_{j=1}^{n} W_j$ is the same as $\bigcup_{j \ne i} W_j$ and the induction
hypothesis applies. Now $\alpha w_1 + w_2 \notin W_1$ for any $\alpha \in F$ (or else
$w_2 \in W_1$). For $i = 2, \dots, n$, there is at most one $\alpha \in F$ such that
$\alpha w_1 + w_2 \in W_i$. For if $\alpha w_1 + w_2 \in W_i$ and $\beta w_1 + w_2 \in W_i$ and
$\alpha \ne \beta$ then $w_1 = \frac{1}{\alpha - \beta}(\alpha w_1 + w_2 - \beta w_1 - w_2)$ is in $W_i$. Since $F$
is infinite there exists $\alpha$ such that $\alpha w_1 + w_2 \notin W_i$ for all
$i = 1, \dots, n$.
6. $\{0\} \times V_2$ and $V_1 \times \{0\}$ are the kernels of $\pi_1$, $\pi_2$ respectively. Both
are onto.
7. Define $\theta: \mathrm{Hom}_F(V, W_1 \times W_2) \to \mathrm{Hom}_F(V, W_1) \times \mathrm{Hom}_F(V, W_2)$
by $\theta(f) = (\pi_1 \circ f, \pi_2 \circ f)$.
Show that $\theta$ is an isomorphism.
As in the case of group homomorphisms, define $\theta: V/K \to R$ by
$\theta(v + K) = f(v)$. Show that $\theta$ is a vector space
isomorphism.
Let $T_{i-1}$ be the set obtained just before $v_i$ is picked. ($T_0 = \emptyset$.) By
induction on $i$, show that for every $i = 1, \dots, k$, $T_i$ is linearly
independent and has the same span as $\{v_1, \dots, v_i\}$. The set $T$ equals
$T_k$.
10. One basis is «Lu—13‘, o, —2),(o,1,—,o L+1)(o, o o, 1-1)}.
11. If (i) and (ii) hold but (iii) does not, then some proper subset of $S$
would span $V$ and hence $\dim(V) < n$, a contradiction. If (ii) and
(iii) hold then $S$ is a basis for $V$ and Theorem (3.13) applies. If (i)
and (iii) hold and $L(S) \ne V$, then for any $v \in V - L(S)$, $S \cup \{v\}$
is a linearly independent set of cardinality $n + 1$ in an $n$-dimensional
vector space, contradicting Proposition (3.12).
12. Evidently, (iii) $\Rightarrow$ (ii) $\Rightarrow$ (i). For (iii), linear dependence of $\{1, \sqrt{2},
\sqrt{3}, \sqrt{6}\}$ over $Q$ implies a relationship of the form $a + b\sqrt{2} =
\sqrt{3}(c + d\sqrt{2})$ for some $a, b, c, d \in Q$, not all 0. Squaring and
simplifying, $2\sqrt{2}(ab - 3cd) = 3(c^2 + 2d^2) - a^2 - 2b^2$ which implies
either $\sqrt{2}$ is rational or else $ab = 3cd$ and $3(c^2 + 2d^2) = a^2 + 2b^2$
which further implies $x^2 + y^2 = 3(u^2 + v^2)$ where $x = a + b$, $y = b$,
$u = c + d$, $v = d$. Here $x, y, u, v$ are rationals. However, writing
them with a common denominator we may suppose they are, in
fact, integers. Let $m = u^2 + v^2$ and $n = x^2 + y^2$. Then both $m, n$
are expressible as sums of squares of integers. Since $n = 3m$ and 3
is a prime of the form $4k + 3$, this contradicts the result of
Exercise (2.10). (iii) also implies (iv). Linear dependence of $\{1, \sqrt{2},
\sqrt{3}, \sqrt{5}\}$ over $Q$ would imply $\sqrt{5} = a + b\sqrt{2} + c\sqrt{3}$ for some
$a, b, c \in Q$. Squaring we get a relation contradicting (iii).
14. Apply Exercise (3.13) with $X = V \times \{0\}$ and $Y = \{0\} \times W$.
15. $\alpha_1 v_1 + \dots + \alpha_k v_k = 0$ implies $\alpha_1 v_1 + \dots + \alpha_k v_k \in W$, i.e., $\alpha_1(v_1 + W)
+ \dots + \alpha_k(v_k + W)$ equals the zero coset $W$. For the second
assertion, combine $\{v_1, \dots, v_k\}$ with a basis for $W$ to get a basis for
$V$. Another approach is to note that $V$ is always isomorphic, as a
vector space, to $W \times (V/W)$. For this, first one proves the analogue
of Exercise (5.3.15). Given a basis $\{v_1 + W, \dots, v_k + W\}$ for $V/W$
there exists a unique linear transformation $j: V/W \to V$ satisfying
$j(v_i + W) = v_i$ for $i = 1, \dots, k$, and hence $j(v + W) \in v + W$ for
all $v \in V$. This follows as a consequence of Theorem (4.5).
16. For the first assertion, apply the last exercise after noting that $R$
is isomorphic to $V/K$. As for the inequalities note that range of
$T_2 \circ T_1 \subset$ range of $T_2$, which implies $r(T_2 \circ T_1) \le r(T_2)$. Also
kernel of $T_1 \subset$ kernel of $T_2 \circ T_1$. So $n(T_1) \le n(T_2 \circ T_1)$. Since
$n(T_1) = \dim(V) - r(T_1)$ and $n(T_2 \circ T_1) = \dim(V) - r(T_2 \circ T_1)$,
the second inequality follows.
17. If $T$ is not one-to-one then there exists $v \ne 0$ such that $T(v) = 0$.
So $T$ takes the linearly independent set $\{v\}$ to $\{0\}$ which is linearly
dependent. On the other hand, if $T$ is one-to-one and $\alpha_1 T(v_1) + \dots
+ \alpha_k T(v_k) = 0$ then $T(\alpha_1 v_1 + \dots + \alpha_k v_k) = 0$ and so $\alpha_1 v_1 + \dots + \alpha_k v_k = 0$.
18. $T$ is one-to-one $\iff n(T) = 0 \iff r(T) = \dim V$ (by Exercise (3.16))
$\iff r(T) = \dim W \iff T$ is onto.
19. Get a basis $A$ for $X$ and extend it to a basis $B$ for $V$. Let $Y$ be the
subspace generated by $B - A$. For the second assertion, apply
Exercise (3.13) to calculate $\dim(X + Y)$ and conclude $X + Y = V$.
Since $X \cap Y = \{0\}$, every $v \in V$ can be uniquely expressed as $x + y$
with $x \in X$, $y \in Y$. This gives the desired isomorphism.
For $n \in Z$, $g \in G$, define $n \cdot g$ to be $g + g + \dots + g$ ($n$ times) if
$n > 0$, $0$ if $n = 0$ and $(-n)(-g)$ if $n < 0$.
21. For $r \in R$, $m \in M$, $rm \in M$. Define $r \cdot m$ to be $rm$. For the example,
take any ring with identity containing a zero divisor.
22. Let $I$ be the ideal of $F[x]$ generated by $p(x)$. Then $F[x]/I$ is a field.
Define $\theta: F[x] \to K$ by $\theta(f(x)) = f(\alpha)$. Then $\theta$ is a ring homomorphism
with kernel $I$ and range $F[\alpha]$. Hence $F[\alpha]$ is a field (isomorphic
to the field $F[x]/I$). Hence Theorem (3.20) can be applied to the chain
of fields $F \subset F[\alpha] \subset K$. From Theorem (2.26) and the proof of Theorem
(3.17), it is clear that $[F[\alpha] : F]$ equals the degree of $p(x)$.
23. Every polynomial in Q[x] is uniquely determined by a finite
sequence of rationals. Arguing as in Exercise (2.2.22) (v), the set of
such sequences is countable.
Let $A$ denote the set of real numbers which are algebraic over $Q$.
Then A is countable by the last exercise. If R — A were also
countable then R = A U (R — A) would be countable.
$f(u_1 + v_1, u_2 + v_2, \dots, u_k + v_k) = \sum f(w_1, w_2, \dots, w_k)$ where the sum
extends over all $2^k$ binary sequences $(x_1, x_2, \dots, x_k)$, and where
$w_i = u_i$ or $v_i$ according as $x_i$ is 0 or 1. (For example, $f(u_1 + v_1,
u_2 + v_2) = f(u_1, u_2) + f(u_1, v_2) + f(v_1, u_2) + f(v_1, v_2)$.)
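Associativity of quaternionic multiplication amounts to showing
that for any $(a_0, \mathbf{a})$, $(b_0, \mathbf{b})$, $(c_0, \mathbf{c})$, $u_0 = v_0$ and $\mathbf{u} = \mathbf{v}$ where
\[ u_0 = a_0 b_0 c_0 - (\mathbf{a}\cdot\mathbf{b})c_0 - a_0\,\mathbf{b}\cdot\mathbf{c} - b_0\,\mathbf{a}\cdot\mathbf{c} - (\mathbf{a}\times\mathbf{b})\cdot\mathbf{c}, \]
\[ v_0 = a_0 b_0 c_0 - a_0\,\mathbf{b}\cdot\mathbf{c} - b_0\,\mathbf{a}\cdot\mathbf{c} - c_0\,\mathbf{a}\cdot\mathbf{b} - \mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}), \]
\[ \mathbf{u} = a_0 b_0\mathbf{c} - (\mathbf{a}\cdot\mathbf{b})\mathbf{c} + a_0 c_0\mathbf{b} + b_0 c_0\mathbf{a} + c_0\,\mathbf{a}\times\mathbf{b} + a_0\,\mathbf{b}\times\mathbf{c} + b_0\,\mathbf{a}\times\mathbf{c} + (\mathbf{a}\times\mathbf{b})\times\mathbf{c} \]
and
\[ \mathbf{v} = a_0 b_0\mathbf{c} + a_0 c_0\mathbf{b} + a_0\,\mathbf{b}\times\mathbf{c} + b_0 c_0\mathbf{a} - (\mathbf{b}\cdot\mathbf{c})\mathbf{a} + b_0\,\mathbf{a}\times\mathbf{c} + c_0\,\mathbf{a}\times\mathbf{b} + \mathbf{a}\times(\mathbf{b}\times\mathbf{c}). \]
Equality of $u_0$ with $v_0$ follows from the identity $\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) =
\mathbf{c}\cdot(\mathbf{a}\times\mathbf{b}) = (\mathbf{a}\times\mathbf{b})\cdot\mathbf{c}$. For $\mathbf{u} = \mathbf{v}$, use the fact $\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) = (\mathbf{a}\cdot\mathbf{c})\mathbf{b} -
(\mathbf{a}\cdot\mathbf{b})\mathbf{c}$ and $(\mathbf{a}\times\mathbf{b})\times\mathbf{c} = -\mathbf{c}\times(\mathbf{a}\times\mathbf{b}) = \mathbf{c}\times(\mathbf{b}\times\mathbf{a}) = (\mathbf{c}\cdot\mathbf{a})\mathbf{b} -
(\mathbf{c}\cdot\mathbf{b})\mathbf{a}$. All other properties follow routinely from corresponding
properties of the vector operations. (For example, for distributivity,
use $\mathbf{a}\cdot(\mathbf{b} + \mathbf{c}) = \mathbf{a}\cdot\mathbf{b} + \mathbf{a}\cdot\mathbf{c}$ and $\mathbf{a}\times(\mathbf{b} + \mathbf{c}) = \mathbf{a}\times\mathbf{b} + \mathbf{a}\times\mathbf{c}$ etc.)
For multiplicativity of norm, note first that for any vector
$\mathbf{a}$, $|\mathbf{a}|^2 = \mathbf{a}\cdot\mathbf{a}$ and use the facts that $\mathbf{a}\cdot(\mathbf{a}\times\mathbf{b}) = \mathbf{b}\cdot(\mathbf{a}\times\mathbf{b}) = 0$ and
$(\mathbf{a}\cdot\mathbf{b})^2 + |\mathbf{a}\times\mathbf{b}|^2 = |\mathbf{a}|^2|\mathbf{b}|^2$.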
27. For any $x \in G$, $0 \cdot x$ and $1 \cdot x$ have to be defined as 0 and $x$ respectively.
This defines the scalar multiplication and makes $G$ a vector
space over $Z_2$.
For any $x \in G$, $nx \in nG$ and so in the quotient group $G/nG$,
$n(x + nG) = 0$.
29. $G/2G$ is a vector space over $Z_2$ by the last two exercises. A basis for
it consists of elements of the form $s + 2G$ for $s \in S$.
If $f: G \to H$ is an isomorphism then $f(2G) = 2H$ and so there is an
induced vector space isomorphism $G/2G \to H/2H$ (cf. Exercise
(5.3.16)). So $G/2G$ and $H/2H$ have equal dimensions. Now apply
the last exercise. The converse is similar to Exercise (5.3.26).
31. An isomorphism $f: G \to H$ must map $C(G)$ onto $C(H)$ and hence
induces an isomorphism from $G/C(G)$ onto $H/C(H)$. Apply the
last exercise. The converse was already proved in Exercise (5.3.26).
Section 6.4
1. The $(i, j)$th entry of $(AC)'$ equals the $(j, i)$th entry of $AC$ which by
definition is $\sum_{k} a_{jk} c_{ki}$. The $(i, j)$th entry of $C'A'$, on the other hand,
equals $\sum_{k} c'_{ik} a'_{kj}$. Since $c'_{ik} = c_{ki}$ and $a'_{kj} = a_{jk}$ and the ring $R$
is commutative, the result follows. The rest is routine and does not
require commutativity of $R$.
For the second assertion, instead of a direct argument note that by
Proposition (4.4), $(A + A')' = A' + (A')' = A + A'$, proving
$A + A'$ to be symmetric. Similarly $(A - A')' = A' - A = -(A - A')$,
whence $A - A'$ is skew-symmetric.
$A = \tfrac{1}{2}[(A + A') + (A - A')]$, where $\tfrac{1}{2}$ is the multiplicative inverse
of the field element $1 + 1$.
First $B$ is linearly independent. For otherwise some $b_1 \in B$ is a
linear combination of the remaining elements of $B$. Any linear
transformation which vanishes on $B - \{b_1\}$ must also vanish on $b_1$.
It follows that if we take $W = V$ and define $f: B \to W$ by $f(b) = 0$
for all $b \in B - \{b_1\}$ and $f(b_1) =$ any non-zero vector in $W$, then $f$ has
no linear extension. Secondly $B$ spans $V$. If not, enlarge $B$ to a basis,
say $C$, for $V$. Then the inclusion function $f: B \to V$ can be extended
to a linear transformation $T: V \to V$ in at least two ways, because
for $v \in C - B$, $T(v)$ may be defined arbitrarily.
5. Let $f: V \to W$ be a vector space isomorphism. Then $f$ must take a
basis of $V$ to a basis of $W$. So $\dim(V) = \dim(W)$. Converse follows
from Theorem (3.13). The second assertion now follows in view of
Corollary (4.7). The point to note is that the isomorphism between
$V$ and $V^*$ depends on the choice of the basis for $V$.
6. $e_v(T_1 + T_2) = (T_1 + T_2)(v) = T_1(v) + T_2(v) = e_v(T_1) + e_v(T_2)$.
Similarly $e_v(\alpha T) = (\alpha T)(v) = \alpha T(v) = \alpha\, e_v(T)$. So $e_v \in V^{**}$.
Linearity of $\theta$ amounts to proving that for all $v_1, v_2 \in V$, $e_{v_1 + v_2}
= e_{v_1} + e_{v_2}$ and for all $\lambda \in F$, $v \in V$, $e_{\lambda v} = \lambda e_v$. For any $T \in V^*$,
$e_{v_1 + v_2}(T) = T(v_1 + v_2) = T(v_1) + T(v_2) = e_{v_1}(T) + e_{v_2}(T) = (e_{v_1}
+ e_{v_2})(T)$. Similarly verify $e_{\lambda v} = \lambda e_v$. Now, $v \in \mathrm{Ker}\,\theta \Rightarrow e_v(T) = 0$
for all $T \in V^*$. If $v \ne 0$, extend $\{v\}$ to a basis $B$ for $V$. Then there
exists $T \in V^*$ with $T(b) = 1$ for all $b \in B$. In particular, $T(v) \ne 0$
and so $e_v(T) \ne 0$. So $\theta$ is one-to-one. Since $\dim(V^{**}) = \dim(V^*)
= \dim(V)$, $\theta$ is onto.
5 24 I9
7. § 3 13 3 . Thnsrank3.
3 —29 —4
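8. Matrix of $D$ w.r.t. the ordered basis $(1, x, x^2, x^3, x^4)$ is
\[ \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 4 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \]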
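As a small illustration of how this matrix acts, the sketch below builds it for polynomials of degree at most $n$ and applies it to a coefficient vector. The function names are hypothetical; a polynomial $c_0 + c_1 x + \dots + c_n x^n$ is stored as the list `[c0, c1, ..., cn]`.

```python
def derivative_matrix(n):
    """Matrix of D on polynomials of degree <= n w.r.t. the basis (1, x, ..., x^n).
    Column j holds the coordinates of D(x^j) = j * x^(j-1)."""
    M = [[0] * (n + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        M[j - 1][j] = j
    return M

def apply_matrix(M, coeffs):
    """Multiply the matrix M by the coordinate column vector coeffs."""
    return [sum(row[k] * coeffs[k] for k in range(len(coeffs))) for row in M]

D = derivative_matrix(4)
for row in D:
    print(row)
# Coordinates of the derivative of 5 + 3x^2 + 2x^4, namely 6x + 8x^3:
print(apply_matrix(D, [5, 0, 3, 0, 2]))
```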
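9. Call $\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$ as $B$. $T(\lambda_1 A_1 + \lambda_2 A_2) = B(\lambda_1 A_1 + \lambda_2 A_2) = \lambda_1 BA_1
+ \lambda_2 BA_2 = \lambda_1 T(A_1) + \lambda_2 T(A_2)$, proving linearity of $T$.
The matrix of $T$ w.r.t. the given basis is
\[ \begin{pmatrix} \alpha & 0 & \beta & 0 \\ 0 & \alpha & 0 & \beta \\ \gamma & 0 & \delta & 0 \\ 0 & \gamma & 0 & \delta \end{pmatrix} \]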
entries uniquely determine a. In det (P,) the only non-zero term in
l and its sign is + or — according as a is even or odd.
12 By Theorem (5.3. 15),G is isomorphic to a subgroup of S. for some
integer In. Now apply (iii) of the last exercise. (An important prob-
lem in group representations is to find for a given G. as small It as
possible so that G can be embedded in GL(n, 17).)
13. If $\sigma \in S_n$ and $\sigma$ is not the identity permutation, then there exists
$i$ such that $\sigma(i) < i$. (Consider any cycle of length $\ge 2$ of $\sigma$ and
take the largest symbol in it.) So every term except one in $\det(A)$
vanishes. An alternate proof can be given by induction.
14. Let the rows of $A$ be $u_1, u_2, \dots, u_i, \dots, u_j, \dots, u_n$ and those of $B$ be
$u_1, \dots, u_{i-1}, u_i + \lambda u_j, u_{i+1}, \dots, u_n$. Then $\det B = \det(u_1, \dots, u_{i-1},
u_i, u_{i+1}, \dots, u_n) + \lambda \det(u_1, \dots, u_j, \dots, u_j, \dots, u_n) = \det
A + \lambda \cdot 0 = \det A$.
15. Using the same notation as in the proof of Theorem (4.26), $\det
(AB) = \sum a_{1j_1} a_{2j_2} \cdots a_{mj_m} \det(v_{j_1}, v_{j_2}, \dots, v_{j_m})$ where the sum ranges
over all possible $m$-tuples $(j_1, \dots, j_m)$ where the $j$'s range from 1 to
$n$ independently of each other. Again out of the $n^m$ possible terms
we need consider only those of the form where the $j$'s are all distinct.
Each such term determines an $m$-subset $J = \{j_1, \dots, j_m\}$ of $\{1, \dots, n\}$.
Each such $m$-subset $J$ corresponds to $m!$ terms obtained by various
permutations of the indices in $J$.
Assume that $J = \{j_1, \dots, j_m\}$ with $1 \le j_1 < j_2 < \dots < j_m \le n$. Then
the $m!$ terms corresponding to a fixed $J$ are of the form
\[ a_{1j_{\sigma(1)}} a_{2j_{\sigma(2)}} \cdots a_{mj_{\sigma(m)}} \det(v_{j_{\sigma(1)}}, v_{j_{\sigma(2)}}, \dots, v_{j_{\sigma(m)}}) \]
where $\sigma$ is a permutation of $\{1, 2, \dots, m\}$. By Proposition (4.23),
this term equals $(-1)^{\sigma} a_{1j_{\sigma(1)}} \cdots a_{mj_{\sigma(m)}} \det(B_{j_1, \dots, j_m})$ and so their
sum equals $\det(A_{j_1, \dots, j_m}) \det(B_{j_1, \dots, j_m})$. Thus $\det(AB)$ is a sum
of $\binom{n}{m}$ terms of this form, each corresponding to a different
$m$-subset of $\{1, \dots, n\}$.
16. Ax = I; gives x= A-l = A—1 b. So the solution exists and is
unique. By Theorem (4.31), A-1 _L (d,,) where d1, = on. So
_ det A
Eda—1:41.: dub/a d—l‘et 41-2 b101, which by Theorem (4. 30),
equals det A det A 1
19. Let $v_1, \dots, v_n$ be eigenvectors with corresponding eigenvalues $\lambda_1, \dots, \lambda_n$.
If they form a basis for $V$ then the matrix of $T$ w.r.t. the ordered basis
$(v_1, \dots, v_n)$ is $(a_{ij})$ where $a_{ij} = 0$ for $i \ne j$ and $a_{ii} = \lambda_i$ for $i = 1, \dots, n$.
The system $2x_1 + x_2 = \lambda x_1$ and $3x_1 = \lambda x_2$ yields $\lambda^2 - 2\lambda - 3 = 0$,
i.e. $\lambda = 3$ or $\lambda = -1$. That these are in fact eigenvalues is shown
by actually solving the systems for these values of $\lambda$. The eigenspace
for $\lambda = 3$ is $\{(x, x) : x \in R\}$ while that for $\lambda = -1$ is $\{(x, -3x) :
x \in R\}$.
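The two eigenvalues and eigenspaces can be confirmed directly. This sketch reads the matrix off the system $2x_1 + x_2 = \lambda x_1$, $3x_1 = \lambda x_2$; the helper name `mat_vec` is illustrative.

```python
A = [[2, 1],
     [3, 0]]

def mat_vec(M, v):
    """Product of the matrix M with the column vector v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
# The characteristic polynomial is t^2 - trace*t + det = t^2 - 2t - 3.
for lam, v in ((3, [1, 1]), (-1, [1, -3])):
    is_root = lam * lam - trace * lam + det == 0
    is_eigenvector = mat_vec(A, v) == [lam * c for c in v]
    print(lam, v, is_root, is_eigenvector)
```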
$f$ is an eigenvector for $\lambda \iff f'(x) = \lambda f(x)$ for all $x \in R \iff f(x) = ke^{\lambda x}$
for all $x \in R$, where $k$ is an arbitrary constant.
22. $A - \lambda I$ is the matrix of $T - \lambda I$ which is singular iff $(T - \lambda I)v = 0$
for some non-zero $v \in V$. This is equivalent to saying that $\lambda$ is an
eigenvalue of $T$.
Call $\det(xI_n - A)$ as $f(x)$. In the expansion of the determinant,
only one term gives a polynomial of degree $n$, namely $(x - a_{11})
(x - a_{22}) \cdots (x - a_{nn})$. For the second assertion, $f(\lambda) = 0 \iff \det
(\lambda I_n - A) = 0 \iff \lambda I_n - A$ is singular. Apply the last exercise.
Combine last exercise with Proposition (2.25).
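The problem amounts to showing that for any $a_0, \dots, a_{n-1} \in F$,
$a_0 + a_1 x + \dots + a_{n-1} x^{n-1} + x^n$ equals the determinant of the
matrix
\[ \begin{pmatrix} x & 0 & 0 & \cdots & 0 & a_0 \\ -1 & x & 0 & \cdots & 0 & a_1 \\ 0 & -1 & x & \cdots & 0 & a_2 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -1 & x + a_{n-1} \end{pmatrix} \]
Expand the determinant w.r.t. the first row. Apply induction hypothesis
and Exercise (4.13) to the two minors.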
Let $f(x) = \det(xI_n - A)$ where $A \in M_n(F)$, $F$ being a field. It is
tempting to set $x = A$ and conclude $f(A) = \det(AI_n - A) = \det
(A - A) = \det(0) = 0$. But the catch is that in this calculation
$\det(AI_n - A)$ equals the determinant of an $n \times n$ matrix whose
entries are themselves $n \times n$ matrices, namely
\[ \begin{pmatrix} A - a_{11}I_n & -a_{12}I_n & \cdots & -a_{1n}I_n \\ -a_{21}I_n & A - a_{22}I_n & \cdots & -a_{2n}I_n \\ \vdots & & & \vdots \\ -a_{n1}I_n & -a_{n2}I_n & \cdots & A - a_{nn}I_n \end{pmatrix} \]
It is far from obvious why the determinant of this matrix should
equal the zero matrix. For a valid proof, regard $xI_n - A$ as a matrix
over $F[x]$, which is a commutative ring with identity, and let
$\mathrm{adj}(xI_n - A)$ be its adjoint (see the comments after Theorem
(4.31)). Then we have
\[ f(x)I_n = (xI_n - A)\,\mathrm{adj}(xI_n - A). \qquad (*) \]
This is an equality of two elements of the ring $M_n(F[x])$. By
Exercise (1.18)(b), we identify $M_n(F[x])$ with the polynomial ring
$(M_n(F))[x]$ (except that here we write the powers of the indeterminate
$x$ to the left of the coefficients rather than to the right). So $(*)$
may be regarded as an equality of two polynomials over the ring
$M_n(F)$. Let $f(x) = a_0 + a_1 x + \dots + a_n x^n$ for some $a_0, a_1, \dots, a_n \in F$.
(Actually $a_n = 1$.) Similarly $\mathrm{adj}(xI_n - A)$ equals a polynomial of the
form $B_0 + xB_1 + x^2B_2 + \dots + x^{n-1}B_{n-1}$ for some $B_0, B_1, \dots, B_{n-1} \in
M_n(F)$. Comparing the coefficients of the like powers of $x$ on both
sides of $(*)$ we get $(n + 1)$ equations, namely $a_0 I_n = -AB_0$,
$a_1 I_n = B_0 - AB_1$, $a_2 I_n = B_1 - AB_2, \dots, a_{n-1}I_n = B_{n-2} - AB_{n-1}$ and
$a_n I_n = B_{n-1}$. Multiply these equations on the left by $I_n, A, A^2, \dots,
A^{n-1}, A^n$ respectively and add to get the result.
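The conclusion $f(A) = 0$ can be tested numerically on a small example. The sketch below is not the proof given above: it computes the coefficients of $\det(xI_n - A)$ by the Faddeev-LeVerrier recurrence (a swapped-in technique, chosen only because it is short) and then evaluates $f(A)$ by Horner's rule. The sample matrix and all function names are assumptions made for illustration.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Coefficients [a_0, ..., a_n] of det(xI - A) via Faddeev-LeVerrier."""
    n = len(A)
    coeffs = [Fraction(0)] * (n + 1)
    coeffs[n] = Fraction(1)
    M = identity(n)
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        c = -Fraction(1, k) * sum(AM[i][i] for i in range(n))
        coeffs[n - k] = c
        # next auxiliary matrix: A*M + c*I
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs

def evaluate_at_matrix(coeffs, A):
    """Horner evaluation of a_0 I + a_1 A + ... + a_n A^n."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)
        result = [[result[i][j] + (c if i == j else 0) for j in range(n)]
                  for i in range(n)]
    return result

A = [[2, 1, 0],
     [3, 0, 1],
     [1, 4, 5]]
f = char_poly(A)
print(f)
print(evaluate_at_matrix(f, A))
```

The second printed matrix consists entirely of zeros, as the theorem predicts.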
27. If $A, B, C \in M_n(F)$ and $B = CAC^{-1}$ then $C$ is also invertible
when regarded as an element of the ring $M_n(F[x])$ (which contains
$M_n(F)$ as a subring) and has the same inverse, $C^{-1}$. Also $xI_n$ commutes
with $C$. So $xI_n - B = C(xI_n - A)C^{-1}$. Now apply Corollary
(4.28).
Let $W = L(\{v_1, \dots, v_{n-1}\})$. Clearly $T(W) \subset W$. By induction hypothesis
$\{v_1, \dots, v_{n-1}\}$ is linearly independent. If $v_n = \sum_{i=1}^{n-1} \alpha_i v_i$ for some
$\alpha_1, \dots, \alpha_{n-1} \in F$, then applying $T$ we get $\lambda_n v_n = \sum_{i=1}^{n-1} \alpha_i \lambda_i v_i$ and hence
$\sum_{i=1}^{n-1} \alpha_i(\lambda_i - \lambda_n)v_i = 0$ which forces each $\alpha_i = 0$ since $\lambda_i \ne \lambda_n$. The
three assertions follow by considering the matrix of $T$ w.r.t.
the ordered basis $(v_1, \dots, v_n)$.
29. Let $S: F^n \to F^m$ and $T: F^p \to F^n$ be the linear transformations
defined by $A$, $B$ respectively. Then $S \circ T$ is the linear transformation
defined by $AB$. Now apply Proposition (4.2) and Exercise (3.16).
30. (1) $\Rightarrow$ (2) follows from the last exercise. Conversely, if (2) holds,
choose $p = n$ and $B = I_n$. Then rank$(A) = n$ whence $A$ is invertible.
Similarly prove (1) $\iff$ (3).
31. Each row operation on a matrix amounts to multiplying it on the
left by a non-singular matrix.
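Following the hint, the given determinant equals that of the matrix
\[ \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 0 & a_2 - a_1 & a_3 - a_1 & \cdots & a_n - a_1 \\ 0 & a_2^2 - a_1 a_2 & a_3^2 - a_1 a_3 & \cdots & a_n^2 - a_1 a_n \\ \vdots & & & & \vdots \\ 0 & a_2^{n-1} - a_1 a_2^{n-2} & a_3^{n-1} - a_1 a_3^{n-2} & \cdots & a_n^{n-1} - a_1 a_n^{n-2} \end{pmatrix} \]
which, after subtracting the first column from all others and taking
common factors from them, equals $\prod_{i=2}^{n} (a_i - a_1)$ times the determinant
of the matrix
\[ \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 1 & \cdots & 1 \\ 0 & a_2 & a_3 & \cdots & a_n \\ \vdots & & & & \vdots \\ 0 & a_2^{n-2} & a_3^{n-2} & \cdots & a_n^{n-2} \end{pmatrix} \]
Apply induction on $n$.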
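Let $X = (x_{ij})$ be the inverse matrix. For $n = 3$, performing suitable
row operations we get:
\[ \begin{pmatrix} 1 & 1/2 & 1/3 \\ 1/2 & 1/3 & 1/4 \\ 1/3 & 1/4 & 1/5 \end{pmatrix} X = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
\[ \begin{pmatrix} 1 & 1/2 & 1/3 \\ 0 & 1 & 1 \\ 0 & 1 & 16/15 \end{pmatrix} X = \begin{pmatrix} 1 & 0 & 0 \\ -6 & 12 & 0 \\ -4 & 0 & 12 \end{pmatrix} \]
\[ \begin{pmatrix} 6 & 3 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} X = \begin{pmatrix} 6 & 0 & 0 \\ -6 & 12 & 0 \\ 30 & -180 & 180 \end{pmatrix} \]
Solving three systems of 3 equations each, we get $X$ as
\[ \begin{pmatrix} 9 & -36 & 30 \\ -36 & 192 & -180 \\ 30 & -180 & 180 \end{pmatrix} \]
For $n = 4$, a similar computation gives the inverse as
\[ \begin{pmatrix} 16 & -120 & 240 & -140 \\ -120 & 1200 & -2700 & 1680 \\ 240 & -2700 & 6480 & -4200 \\ -140 & 1680 & -4200 & 2800 \end{pmatrix} \]
However, it is far from easy to guess a formula for $x_{ij}$ in the general
case. The answer is
\[ x_{ij} = (-1)^{i+j}\,(i + j - 1)\binom{i + n - 1}{n - j}\binom{j + n - 1}{n - i}\binom{i + j - 2}{i - 1}^{2}. \]
(The last factor is the square of the binomial coefficient $\binom{i+j-2}{i-1}$.) It is obtained
by finding the inverse of a more general matrix called Cauchy's
matrix.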
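The closed formula can be checked against the two inverses displayed above. The sketch below assumes the matrix in question is the $n \times n$ matrix with $(i, j)$th entry $1/(i + j - 1)$ (as the $n = 3$ computation indicates) and uses exact rational arithmetic; the function names are illustrative, and `math.comb` needs Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def hilbert(n):
    """The n x n matrix with (i, j)th entry 1/(i + j - 1), indices from 1."""
    return [[Fraction(1, i + j - 1) for j in range(1, n + 1)]
            for i in range(1, n + 1)]

def inverse_by_formula(n):
    """Entries (-1)^(i+j) (i+j-1) C(i+n-1, n-j) C(j+n-1, n-i) C(i+j-2, i-1)^2."""
    def entry(i, j):
        return ((-1) ** (i + j) * (i + j - 1)
                * comb(i + n - 1, n - j) * comb(j + n - 1, n - i)
                * comb(i + j - 2, i - 1) ** 2)
    return [[entry(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_identity(M):
    return all(M[i][j] == (1 if i == j else 0)
               for i in range(len(M)) for j in range(len(M)))

for n in (3, 4, 5, 6):
    print(n, is_identity(mat_mul(hilbert(n), inverse_by_formula(n))))
print(inverse_by_formula(3))
```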
Let $f: V \times V \to F$ be a bilinear form. Let $A$ and $B$ be the matrices of
$f$ w.r.t. the ordered bases $(v_1, \dots, v_n)$ and $(w_1, \dots, w_n)$ respectively.
Then $A = C'BC$ where $C$ is the matrix of change of basis from
$(v_1, \dots, v_n)$ to $(w_1, \dots, w_n)$.
36. Let, if possible, $f(x) = g(x)h(x)$ where $g(x) = b_0 + \dots + b_m x^m$ and
$h(x) = c_0 + c_1 x + \dots + c_r x^r$ are in $Z[x]$. Then $a_0 = b_0 c_0$. Since
$p \mid a_0$ but $p^2 \nmid a_0$, $p$ divides $b_0$ or $c_0$ but not both. Without loss of
generality, suppose $p \mid b_0$ and $p \nmid c_0$. Now $a_1 = b_0 c_1 + b_1 c_0$. Since
$p \mid a_1$ and $p \nmid c_0$ we get $p \mid b_1$. From $a_2 = b_0 c_2 + b_1 c_1 + b_2 c_0$ and
$p \mid a_2$ we get $p \mid b_2$. Continuing like this, from $a_m = b_0 c_m + b_1 c_{m-1}
+ \dots + b_{m-1} c_1 + b_m c_0$, we get $p \mid b_m$. (We assume here that $m < n$,
for otherwise $h(x)$ will have degree 0 which would contradict the
primitivity of $f(x)$.) But then $p \mid a_n$ since $a_n = b_m c_r$.
39. Reduce $B$, $C$ separately to echelon forms by row operations. These
two sets of row operations are completely independent of each
other. When performed together they reduce $A$ to an echelon form,
proving (i). For (ii), note that a term in $\det(A)$ corresponding to
a permutation $\sigma \in S_n$ will be non-zero only if $\sigma(i) \le p$ for all
$i = 1, \dots, p$ and $\sigma(i) > p$ for $i = p + 1, \dots, n$. Every such $\sigma$ corresponds
uniquely to an ordered pair $(\theta, \tau)$ where $\theta$ is a permutation
of $\{1, \dots, p\}$ and $\tau$ is a permutation of $\{p + 1, \dots, n\}$. The number
of inversions in $\sigma$ clearly is the sum of the numbers of inversions
in $\theta$ and $\tau$. So $(-1)^{\sigma} = (-1)^{\theta}(-1)^{\tau}$. Hence every term in $\det(A)$
factors as the product of a term in $\det(B)$ and a term in $\det(C)$.
CHAPTER 7
Section 7.1
l. (i) 1%: (ii) -—1_1x= (“0 fin (iv) 17—335:
1
(V) (17).-
2. (i) e” (ii) 6" (iii) Egg cosh x) (iv) cosx
(v) sinx (vi) (1+ x)".
3. (i) an. 0, al, 0, 0., 0, a..... (ii) 0, a1, 0, a” 0, a” 0,... (iii) 0. a...
a a.
2-5, 3—,...” a... l l l
71.....1'he 0.G.F. of l, 5’ . —1
3, 1,... Is ? in (l — x).
4. $z^m - 1 = (z - 1)(z^{m-1} + z^{m-2} + \dots + z + 1)$. Since $\xi$ is a primitive
$m$th root of 1, for $m \nmid n$, $\xi^n \ne 1$ and so $\xi^n$ is a root of $z^{m-1} +
z^{m-2} + \dots + z + 1$. Thus we get $\sum_{j=0}^{m-1} (\xi^n)^j = m$ if $m \mid n$ and
$= 0$ if $m \nmid n$.
W W3 an) E. .2. MW :2. M" a. w =
III I w
. N J = _
a n
.
E max.
:7:
5. (i) Differentiate both sides of $\sum_{k=0}^{n} \binom{n}{k} x^k = (1 + x)^n$ w.r.t. $x$
and set $x = 1$.
(ii) Differentiate again and add (i).
(iii) Integrate both sides of $\frac{1 - x^n}{1 - x} = 1 + x + \dots + x^{n-1}$ w.r.t. $x$
over the interval $[0, 1]$. To evaluate $\int_0^1 \frac{1 - x^n}{1 - x}\,dx$ put $u = 1 - x$
and use binomial theorem.
(iv) Let $a_n$ and $b_n$ denote the sums in the two brackets. Then
$a_n + ib_n = (1 + i)^n$. So $a_n^2 + b_n^2 = |(1 + i)^n|^2 = |1 + i|^{2n}
= (\sqrt{2})^{2n} = 2^n$.
(v) Use $(1 + x)^p (1 + x)^q = (1 + x)^{p+q}$.
"(”>(‘) '(")( ' > (W’)
(V1):
I" J‘ n+1
=2
"° 1 q-n—i
=
q—n
p+q
11+».
. .. 2n—1 1-] "k n x" =nx( 1 +x) x
(1)6"(n)n(n_l)[Mutlpyk§o(k)
md§( n )x"=(1+x)'.]
H n—k
(iii) 2 [Dmemfim -1+§x + {xi + ...... and set
1-5::
L
#1.]
“1—1? ==(1+ x)' (I + x + x' +...). The coefficient of x- in
n
this equals i ( ) which is 2’I by Exercise (2.2.16) (i). Similarly
e-o k
. . (l + x)""" " 2n+
the eoefiicxent, say a, ot'x' in W. equals a. k . Let
2» 1
'2” r+ . Then «+5.12%1 by Exercise (2.2.16) (i) again.
p: r-ll+l
2n+l 2» +1
Alec a: Mince = for k = 0,..., It. So
k 2n+ l —k
a = 2" = 4". Finally, the given sum, say S, equals the coeflicient
of x" in 2'' i: (x + 3“ which, beingageometric series, simpli-
Ir-I
2(1 + 2x)" — 21-(1 + 2x)"+1 .
fies to -—l:27—‘ Putting 2x = y and usmg
the earlier parts, 5: 2-2'.2'— %.2"-4" = 4".
”k
The sum equals the coelficients of x' in 3: (x + a . Now pro-
_.
eeed as in the last exercise. k
,. —n
Applying Theorem (1.3) (iii), (1 ~— x)" =3 (—1)’( )x’, where
r
(—11) — (—n) (—n—l)...(—n—r + 1) ___ (_l)’(n + : — l)
r r
4.
11.
5
13. Use Exercise (1.12).
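14. Putting $k = 1$ and $n = 0$ in Problem (1.5), $1 + 2 + \cdots + r = \binom{r+1}{2} = \frac{r(r+1)}{2}$. Similarly $\sum_{i=1}^{r} i^2 = 2\sum_{i=1}^{r} \binom{i}{2} + \sum_{i=1}^{r} \binom{i}{1} = 2\binom{r+1}{3} + \binom{r+1}{2} = \frac{r(r+1)(2r+1)}{6}$. To get this result from Proposition (1.10), start from $\frac{x}{(1-x)^2} = \sum_{n=1}^{\infty} n x^n$. Differentiating, $\frac{1+x}{(1-x)^3} = \sum_{n=1}^{\infty} n^2 x^{n-1}$ and so $\sum_{i=1}^{n} i^2$ is the coefficient of $x^n$ in $\frac{x(1+x)}{(1-x)^4}$. Applying binomial theorem and noting that $\binom{-4}{n} = (-1)^n \binom{n+3}{3}$,
$\sum_{i=1}^{n} i^2 = \binom{n+1}{3} + \binom{n+2}{3} = \frac{n(n+1)(2n+1)}{6}$.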
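15. By binomial theorem, $(1 - 4x)^{-1/2} = \sum_{r=0}^{\infty} a_r x^r$ where
$a_r = (-4)^r \binom{-1/2}{r} = (-4)^r \frac{(-\frac{1}{2})(-\frac{3}{2})\cdots(-\frac{2r-1}{2})}{r!} = 2^r \, \frac{1 \cdot 3 \cdot 5 \cdots (2r-1)}{r!} = \frac{1 \cdot 3 \cdot 5 \cdots (2r-1)}{r!} \cdot \frac{2 \cdot 4 \cdots 2r}{r!} = \frac{(2r)!}{r! \, r!} = \binom{2r}{r}$.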
2r
For the second assertion, VP, rl .
-
r
16. From the last exercise, $(1 - 4t)^{-1/2} = \sum_{r=0}^{\infty} \binom{2r}{r} t^r$. Integrating both sides from 0 to $x$ and putting $n = r + 1$, $\frac{1}{2} - \frac{1}{2}(1 - 4x)^{1/2} = \sum_{n=1}^{\infty} \frac{1}{n} \binom{2n-2}{n-1} x^n$. Since $a_0 = 0$, the O.G.F. of $\{a_n\}$ is $A(x) = \frac{1 - \sqrt{1 - 4x}}{2}$. From this we get $[A(x)]^2 = A(x) - x$. For $n > 1$, the coefficient of $x^n$ in $[A(x)]^2$ is $\sum_{k=1}^{n-1} a_k a_{n-k}$ (since $a_0 = 0$) while on the R.H.S. it is $a_n$.
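The identity $[A(x)]^2 = A(x) - x$ can be checked numerically on the coefficients. The following is only a sketch: it assumes $a_n = \frac{1}{n}\binom{2n-2}{n-1}$ for $n \ge 1$ and $a_0 = 0$, as in the answer above, and the function names are illustrative, not taken from the book.

```python
from math import comb

def catalan_coeff(n):
    """a_n = (1/n) * C(2n-2, n-1) for n >= 1, with a_0 = 0 (assumed indexing)."""
    if n == 0:
        return 0
    return comb(2 * n - 2, n - 1) // n

def convolution_coeff(n):
    """Coefficient of x^n in [A(x)]^2, i.e. the sum of a_k * a_(n-k) for 1 <= k <= n-1."""
    return sum(catalan_coeff(k) * catalan_coeff(n - k) for k in range(1, n))

def check_identity(limit=12):
    """Compare coefficients of [A(x)]^2 and A(x) - x for 2 <= n <= limit."""
    return all(convolution_coeff(n) == catalan_coeff(n) for n in range(2, limit + 1))
```

Calling `check_identity()` compares the two sides coefficient by coefficient; it is a finite check, not a proof.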
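18. Since $\binom{m}{n-m} = 0$ for $m < \lceil n/2 \rceil$, the sum might as well be taken as $\sum_{m=0}^{n} \binom{m}{n-m}$ which equals the coefficient of $x^n$ in
$\sum_{m=0}^{n} x^m (1 + x)^m = \frac{1 - x^{n+1}(1 + x)^{n+1}}{1 - x - x^2}$.
Now $\frac{1}{1 - x - x^2}$ resolves into partial fractions as
$\frac{1}{\alpha - \beta}\left(\frac{\alpha}{1 - \alpha x} - \frac{\beta}{1 - \beta x}\right)$
where $\alpha = \frac{1 + \sqrt{5}}{2}$ and $\beta = \frac{1 - \sqrt{5}}{2}$. Since $x^n$ cannot appear in $x^{n+1}(1 + x)^{n+1}$, the answer is $\frac{1}{\alpha - \beta}(\alpha^{n+1} - \beta^{n+1})$.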
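$f\left(\frac{1}{x}\right) = a_0 + \frac{a_1}{x} + \frac{a_2}{x^2} + \cdots + \frac{a_n}{x^n} = a_n\left(\frac{1}{x} - \alpha_1\right) \cdots \left(\frac{1}{x} - \alpha_n\right) = \frac{a_n}{x^n}(1 - \alpha_1 x) \cdots (1 - \alpha_n x)$. So $a_0 x^n + a_1 x^{n-1} + \cdots + a_n = x^n f\left(\frac{1}{x}\right) = a_n(1 - \alpha_1 x) \cdots (1 - \alpha_n x)$. The roots of this polynomial are $\frac{1}{\alpha_1}, \ldots, \frac{1}{\alpha_n}$. So their sum equals $-\frac{a_1}{a_0}$, as we see by comparing the coefficients of $x^{n-1}$ in the equation
$a_0 x^n + a_1 x^{n-1} + \cdots + a_n = a_0\left(x - \frac{1}{\alpha_1}\right)\left(x - \frac{1}{\alpha_2}\right) \cdots \left(x - \frac{1}{\alpha_n}\right)$.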
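20. By Exercise (1.2)(v), for $y \neq 0$, $\frac{\sin y}{y} = 1 - \frac{y^2}{3!} + \frac{y^4}{5!} - \frac{y^6}{7!} + \cdots$.
21. For $f(x)$ in Exercise (1.20), $a_0 = 1$, $a_1 = -\frac{1}{6}$ and the roots are $(n\pi)^2$, $n = 1, 2, 3, \ldots$. So
$\sum_{n=1}^{\infty} \frac{1}{n^2 \pi^2} = \frac{1}{6}$, i.e. $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$.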
s(")("“>(‘) ("X")
"‘ a p—i 1 q p O
23. In the last exercise, both sides are polynomials in $n$ (of degree $p + q$). Since equality holds for infinitely many values of $n$ (namely $n = 1, 2, 3, \ldots$), it holds if $n$ is replaced by any real number. Replace $n$ by $-n - 1$. For any positive integer $r$,
$\binom{-n-1}{r} = (-1)^r \binom{n+r}{r}$.
Use this with $r = p$, $r = q$ and $r = p + q$ to get the result.
1+xy =1+:l:y 1 1+xy'° x’ "
2" l—x—yx‘ l—--xl (x’) =1—x.—o(l-—_x y"
— 1—): y
, . l w xln J‘In-iu.
which reduces to 1— x + .21 [W + WI y“, i.e. t0
l g W"l F 1 h . .
minimal. or n 2 . t e coeflictent of x" In
__ Is . . l .
(l—xy'“ the same as that ofxn- u +I Ill __
(1 —xy'+‘ , which by
n+l+m—2n+l—l m—n+l
Exercise (1.9) is . i.e. or
m — 2n + 1 711—211 + l
m— n + l
. But by Exercise (2.3.5), this is exactly a..,,.. For a
n
solution without Exercise ( 1.9),
l+xy _ 1+xy _" m
I —x—yx‘ _ l — x(l +xy)_.§o!(l+ xy
.. .. r+ l
= 2 Z 35"“ y‘.
1-. 11-13 It
To get the coeflicient of xfly" set k = n and l' = m — n.
25. Each path of length $m$ corresponds to a binary sequence of length $m$, $(a_1, a_2, \ldots, a_m)$ where for $j = 1, \ldots, m$, $a_j = 1$ or 0 depending upon whether in the $j$th time unit, you go forward or backward. For any $k$ with $0 \le k \le m$, this path terminates at the $k$th point in the $m$th row iff exactly $k$ of the terms in the binary sequence are 1. So there are $\binom{m}{k}$ paths terminating at it. The identity follows since there are $2^m$ paths of length $m$.
26. Let $P_{m,n}$ denote the $n$th point in the $m$th row ($m \ge 0$, $0 \le n \le m$). Then (i) says that to reach $P_{m,n}$ you must go either through $P_{m-1,n}$ or through $P_{m-1,n-1}$. For (ii), note that starting with any point in the Pascal triangle there is a replica of Pascal triangle (ignoring the entries) with that point as the apex. Now any path from the apex $P_{0,0}$ to the point $P_{p+q,n}$ will cross the $p$th row at a point $P_{p,k}$ for a unique $k$, $0 \le k \le n$. The remainder of the path can be thought of as the replica of a path from $P_{0,0}$ to $P_{q,n-k}$. For (iii), divide the paths arriving at $P_{m,n}$ into two kinds; let $a_n$ of them have their last segment going forward and $b_n$ going backward. (Clearly $a_n = \binom{m-1}{n-1}$ and $b_n = \binom{m-1}{n}$, but that is not very important here.) Then $a_0 = 0 = b_m$ and $a_n = b_{n-1}$ for $n = 1, \ldots, m$. Also $a_n + b_n = \binom{m}{n}$. We then get the first sum as $a_0 + a_1 + a_2 + a_3 + \cdots + a_m$ and the second as $b_0 + b_1 + b_2 + \cdots + b_m$.
27. $\frac{1}{1 - 1/p} = 1 + \frac{1}{p} + \frac{1}{p^2} + \frac{1}{p^3} + \cdots$. Now use the fact that every $n > 1$ has a unique factorization into prime powers.
Section 7.2
6 k
(i) 2. (ibéo (k ) 5‘2“ (10 k). The only non-zero tern" arise
from k n: 5 and k = 6. So the answer in 4,350,000. (iii) (5 ).
(Put y= x'.)
(Ii 4.0:) where A.(x)= 1+ 1: + x‘ + + w.
[x+<;°>»+<‘:>»+<:°>~~+<2°>v1u+x>a
#(1+x+x’+x'+x‘+x‘)'.
The enumerator for total score is
$(1 + x + x^2 + x^3 + x^4 + x^5)(x^2 + x^3 + x^4 + x^5 + x^6 + x^7)$
which is the same as that for a pair of ordinary dice.
Take $A(x) = x(1 + x)(1 + x + x^2)$ and $B(x) = x(1 + x^3)(1 + x^2 + x^4)$. The corresponding markings on the faces of the first die are 1, 2, 2, 3, 3 and 4 and those on the second are 1, 3, 4, 5, 6 and 8.
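The claim about the two unusually marked dice can be checked by multiplying enumerators directly. The sketch below represents a polynomial as a map from exponent to coefficient; the helper names are illustrative and the face markings are the ones quoted in the answer above.

```python
from collections import Counter

def enumerator(faces):
    """Return {exponent: coefficient} for the sum of x^f over the faces f of a die."""
    return Counter(faces)

def multiply(p, q):
    """Multiply two polynomials given as {exponent: coefficient} maps."""
    product = Counter()
    for i, a in p.items():
        for j, b in q.items():
            product[i + j] += a * b
    return product

ordinary = enumerator([1, 2, 3, 4, 5, 6])
first_die = enumerator([1, 2, 2, 3, 3, 4])
second_die = enumerator([1, 3, 4, 5, 6, 8])

# True when both pairs of dice give every total in the same number of ways.
same_totals = multiply(first_die, second_die) == multiply(ordinary, ordinary)
```

Here `same_totals` compares the full distribution of totals, not just its support.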
The enumerator is not the entire product (1 + 4x + 6x‘ + 4” +
x‘)“ but only (I + x)" ( '2' I’m») where amy - (1 + 4x + 6x'
"0 I
+ 4x' + x‘)‘and ataxia thepartof(l+4x+6x‘+4x'+x‘)‘
,-
20 24
consisting of terms npto x”. Then 1:, = ( and e, = ( )
r r
forO<r<20.So
(1.:
W21
2 5,01
( 28 )
" n—Zr
Jill? )( 2.“ XS)
9. 1 + x +...+ #‘1 = fi' Similarly, I + x‘ +...+ grub-1)
__ 3
= :_:: etc. Multiplying infinitely many factors of this form gives
the result.
10. In enumerators the coefficients generally represent the number of ways to do something and hence cannot be negative. In the example, the number of ways to pick $n$ balls is 1 or 0 according as $n$ is or is not a multiple of $m$. For every $n \ge 0$, the absolute value of the coefficient of $x^n$ is the number of ways to pick $n$ balls.
11. For $r \ge 0$, let $A_r(x) = 1 - x^{2^r} + x^{2 \cdot 2^r} - x^{3 \cdot 2^r} + \cdots + (-1)^k x^{k \cdot 2^r} + \cdots$. Let $A(x) = \prod_{r=0}^{\infty} A_r(x)$. Now consider a partition of $n$ in which $k_i$ parts are of size $2^{r_i}$ each for $i = 1, \ldots, m$ (say) where $0 \le r_1 < r_2 < \cdots < r_m$ and $k_1, \ldots, k_m$ are positive integers. Such a partition corresponds to the term $\pm x^{k_1 2^{r_1}} x^{k_2 2^{r_2}} \cdots x^{k_m 2^{r_m}}$ in $A(x)$. The sign of this term is 1 or $-1$ according as $\sum_{i=1}^{m} k_i$, i.e. the total number of parts, is even or odd. It follows that in $A(x)$, the coefficient of $x^n$ is $a_n - b_n$ for all $n \ge 1$. From the hint, $A(x)$ is simply $1 - x$. So $a_1 - b_1 = -1$, while $a_n - b_n = 0$ for $n > 1$.
12. Let $n = n_1 + n_2 + \cdots + n_m$ where $n_1 \ge n_2 \ge \cdots \ge n_m > 0$. Then $n - m = (n_1 - 1) + (n_2 - 1) + \cdots + (n_m - 1)$ is a partition of $n - m$ into at most $m$ parts. If $n_1 > n_2 > \cdots > n_m$ then $[n_1 - (m - 1)] + [n_2 - (m - 2)] + \cdots + [n_{m-1} - 1] + [n_m - 0]$ gives a partition of $n - \frac{m(m-1)}{2}$ into exactly $m$ parts.
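Both correspondences can be tested by brute force for small $n$. The sketch below enumerates partitions as non-increasing tuples and counts them by number of parts; the function names are hypothetical helpers introduced for the check.

```python
def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples (nothing if n < 0)."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def exactly_m_parts(n, m):
    """Number of partitions of n into exactly m parts."""
    return sum(1 for p in partitions(n) if len(p) == m)

def at_most_m_parts(n, m):
    """Number of partitions of n into at most m parts."""
    return sum(1 for p in partitions(n) if len(p) <= m)

def distinct_exactly_m_parts(n, m):
    """Number of partitions of n into exactly m parts of distinct sizes."""
    return sum(1 for p in partitions(n) if len(p) == m and len(set(p)) == m)
```

For instance, `exactly_m_parts(n, m)` can be compared with `at_most_m_parts(n - m, m)`, and `distinct_exactly_m_parts(n, m)` with `exactly_m_parts(n - m * (m - 1) // 2, m)`, over a range of small values.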
13. The coefficient of $x^n$ on the L.H.S. is the number of partitions of $n$ into parts of distinct size. If $n = n_1 + \cdots + n_m$ is any such partition with $n_1 > n_2 > \cdots > n_m > 0$, then $[n_1 - m] + [n_2 - (m - 1)] + \cdots + [n_{m-1} - 2] + [n_m - 1]$ is a partition of $n - \frac{m(m+1)}{2}$ into at most $m$ parts, whose dual is a partition in which the part size is at most $m$.
14. Taking i rupee as a unit the problem asks to find a“ where a. is
the number of partitions of n in partlgof size 1,2 or 4. In general
(n+2)(n+l) n___+1 n+1
:1, equals
‘T"+ 4 +32 H 1)- 16 “—1)"
1+11 ,1—1
32H") 16 +‘ 16
15. Let a. be as in the answer to the last exercise. Then the answer to
the first question is a“ — a“ — a1. + a, while to the second it is
an — all ‘ all + (1, ~
16. For a direct solution, let $M$ be the midpoint of $BD$. Then it suffices to show that $AF \perp MF$. Draw $MN$ and $BH$ perpendicular to $AC$. Join $DH$ and $FN$. Since $BH \parallel MN \parallel DE$ and $BM = MD = \frac{1}{2} DC$ we get $HN = NE = \frac{1}{2} EC$, whence it follows that $FN = \frac{1}{2} DH = \frac{1}{2} DC = MD$. So $MDFN$ is an isosceles trapezium and is therefore cyclic. Thus the points $M$, $D$, $F$, $N$ are concyclic. But since $\angle ADM = 90^\circ = \angle ANM$ the points $A$, $M$, $D$, $N$ are concyclic. So $A$, $M$, $D$, $F$ are concyclic, proving that $\angle AFM = \angle ADM = 90^\circ$.
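For a solution using coordinates, choose axes so that $D = (0, 0)$, $C = (a, 0)$, $B = (-a, 0)$, $A = (0, b)$. Let $F = (h, k)$. Then $E = (2h, 2k)$ and, since $E$ lies on $AC$, $\frac{2k - b}{2h} = -\frac{b}{a}$. But since $DE \perp AC$, we also have $\frac{k}{h} = \frac{a}{b}$. A little calculation gives $\frac{k - b}{h} \cdot \frac{2k}{2h + a} = -1$, which means $AF \perp BE$.
[Figure: triangle $ABC$ with the points $M$, $D$ on $BC$ and $H$, $N$, $E$ on $AC$.]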
17. Let $S_i$ consist of the $i$th object. Then $A_i(x) = \frac{x^{n_i}}{n_i!}$ since the only permissible permutation is of length $n_i$. So $A(x) = \frac{x^n}{n_1! \cdots n_k!}$ where $n = n_1 + \cdots + n_k$. The number of permutations in which the $i$th symbol occurs $n_i$ times is $n!$ times the coefficient of $x^n$ in $A(x)$, i.e. $\frac{n!}{n_1! \cdots n_k!}$.
18. For i = l,..., k, let S, = (I‘ — 1). For exercise (3.2.17) (b), A,(x)=
A,(x)= ex 1— 2“, while fori = 3,..., k, A,(x) = :2". By Theorem
(2.16). Atx) = (emf—{YIE‘HV = i re“ + M—nx+e<*-‘m The
coeflicient of I:ll in A(x)is—1![k" + 2(k — 2)“ + (k —— 4)“]. Multi-
plying it by n! gives the resulL For Exercise (3.2.18),
A,(x)= 2 Eu”; “(all Exercise (1.3) (iii)).
road fl
The other Ai(x)’s remain unchanged. So
A<x>= "—5"+“exp«k— 2)»)=‘+“”exp«k—4»)
= i» [exp (kx) -— exp ((k — 4)x)]. So the answer is W.
19. " n.m-
"a 1 l " m
x"=m(e*—l)"=m— !£(—l)’exp((m—r)x)(r>
_ (—l)' “(m — r_)" x“. Comparing the coefiicients
_ ,_., (m—r)lrl ,_, n ! of x"
gives the result.
20. Multiply both sides of S... = Z-‘(— l)' (m — r)"'"'1
by grind sum
.rl (m — r — I)!
over from n = 0 to co. The result follows by reversing the steps
in the solution to the last exercise.
21. Let $n = 2m = a + b + c$ where $a + b - c > 0$, $b + c - a > 0$ and $c + a - b > 0$. Note that $a + b - c$ has the same parity as $a + b + c$, which is even. So $a + b - c = 2x$ for some positive integer $x$. Similarly letting $b + c - a = 2y$ and $c + a - b = 2z$, $x + y + z$ is a partition of $m$ into exactly 3 parts. Moreover, $a$, $b$, $c$ can be uniquely recovered from $x$, $y$, $z$.
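The correspondence can be spot-checked by counting both sides. This sketch assumes that triangles are counted up to the order of their sides (so $a \le b \le c$) and that the sides are positive integers; the function names are illustrative.

```python
def triangles_with_perimeter(n):
    """Count integer-sided triangles a <= b <= c with a + b + c = n."""
    count = 0
    for a in range(1, n + 1):
        for b in range(a, n + 1):
            c = n - a - b
            # c must be the largest side and satisfy the triangle inequality
            if c >= b and a + b > c:
                count += 1
    return count

def partitions_into_three(m):
    """Count partitions m = x + y + z with x >= y >= z >= 1."""
    count = 0
    for z in range(1, m + 1):
        for y in range(z, m + 1):
            x = m - y - z
            if x >= y:
                count += 1
    return count
```

Under these assumptions `triangles_with_perimeter(2 * m)` and `partitions_into_three(m)` should agree for every positive `m`.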
22. Let $n = 2m$. Then $t_n = b_m$ = number of partitions of $m$ into exactly 3 parts. $b_m$ is the same as the number of partitions of $m$ in which the largest part size is 3. So $b_m$ = coefficient of $x^m$ in
$\frac{x^3}{(1 - x)(1 - x^2)(1 - x^3)}$.
From the solution to the Postage Problem, $b_m = a_{m-3}$
_ 6(m — l)(m — 2) + 18(m — 2) + 17 — (—l)'-9+ 89" +843”-
_ 72
23. Let $X$, $Y$ be sets with $x$ and $n$ elements respectively and let $F$ be the set of all functions from $Y$ to $X$. Then $|F| = x^n$. Classify the functions in $F$ according to their ranges which can be any non-empty subsets of $X$. For $1 \le m \le n$, the number of $m$-subsets of $X$ is $\binom{x}{m} = \frac{x(x-1)\cdots(x-m+1)}{m!}$. For any one such $m$-subset, say $R$, the number of functions from $Y$ to $X$ with $R$ as their range is $m! \, S_{n,m}$. Apply Theorem (3.2.7) for the first assertion. For the second, classify the partitions according to the number of parts in them.
The result of Exercise (2.19) can be expressed as
mih§(—1)--I( ),.,
I
m
since( .)=0forj>m.So.
J
. ‘_..1.. k -.
P.=.§lS.,k—k§l“fi§(—I)H<j)1
-3 ..5
_I-IJk-l(—l)‘.,1“(I"
_- 1 k
=31 jEl(—l)*-Ik—l(j)(smce(j)=0fork<1)
. k .
-
at P on 1 e2. 1—-
=
12-17!- 331‘ _1 )” (k—m =
91-11"
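From $p_X'(t) = \sum_{n=1}^{\infty} n p_n t^{n-1}$ we get $t p_X'(t) = \sum_{n=1}^{\infty} n p_n t^n$ and, after differentiation, $t p_X''(t) + p_X'(t) = \sum_{n=1}^{\infty} n^2 p_n t^{n-1}$. So $V(X) = \sum_{n=0}^{\infty} (n^2 - 2E(X) n + [E(X)]^2) p_n = p_X''(1) + p_X'(1) - 2E(X) \sum_{n=0}^{\infty} n p_n + [E(X)]^2 \sum_{n=0}^{\infty} p_n$. Since $\sum_{n=0}^{\infty} n p_n = E(X) = p_X'(1)$ and $\sum_{n=0}^{\infty} p_n = 1$, the result follows.
The variances in Examples (3) and (4) are, respectively, $10p(1 - p)$ and 2.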
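The formula $V(X) = p_X''(1) + p_X'(1) - [p_X'(1)]^2$ can be exercised on a finite distribution with exact arithmetic. The sketch below assumes, as the value $10p(1-p)$ suggests, that one of the examples is a binomial distribution with 10 trials; that identification and the function names are assumptions, not statements from the book.

```python
from fractions import Fraction
from math import comb

def pgf_derivatives_at_one(pmf):
    """Return (p'(1), p''(1)) for the p.g.f. p(t) = sum of pmf[n] * t^n."""
    first = sum(n * p for n, p in enumerate(pmf))
    second = sum(n * (n - 1) * p for n, p in enumerate(pmf))
    return first, second

def variance_from_pgf(pmf):
    """V(X) = p''(1) + p'(1) - [p'(1)]^2."""
    first, second = pgf_derivatives_at_one(pmf)
    return second + first - first ** 2

def binomial_pmf(trials, p):
    """Probabilities of 0, 1, ..., trials successes (assumed example)."""
    q = 1 - p
    return [comb(trials, k) * p ** k * q ** (trials - k) for k in range(trials + 1)]
```

With `p = Fraction(1, 3)`, `variance_from_pgf(binomial_pmf(10, p))` can be compared with `10 * p * (1 - p)` without rounding error.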
27. For every $n \ge 2$, the probability, say $q_n$, that a head will occur for the second time at the $n$th toss is $\frac{n-1}{2^n}$. In $[p_X(t)]^2$, the coefficient of $t^n$ is $\sum_{k=1}^{n-1} \frac{1}{2^k} \cdot \frac{1}{2^{n-k}} = \frac{n-1}{2^n}$. So $p_Y(t) = [p_X(t)]^2$. Alternatively, tossing a coin till a head shows for the second time is equivalent to tossing two coins separately till a head shows in each. Since the two tosses are independent and have the same probability generating function, namely $p_X(t)$, the result follows. For $E(Y)$, $p_Y'(t) = 2 p_X(t) p_X'(t)$. Since $p_X(1) = 1$ and $p_X'(1) = E(X) = 2$, we get $E(Y) = 4$.
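28. (a) Let $\alpha = \sum_{n=1}^{\infty} \frac{1}{n^2}$. (By Exercise (1.21), $\alpha = \frac{\pi^2}{6}$; but that is not very important here.) Let $p_0 = 0$ and $p_n = \frac{1}{\alpha n^2}$ for $n \ge 1$. Then $\{p_n\}_{n=0}^{\infty}$ is the probability distribution of a discrete random variable for which $E(X) = \sum_{n=1}^{\infty} \frac{n}{\alpha n^2} = \frac{1}{\alpha} \sum_{n=1}^{\infty} \frac{1}{n}$ is infinite.
(b) For $n \ge 1$, $p_n$ would be $q^{n-1} p$, where $q = 1 - p$. Hence $p_X(t)$ would come out to be $\frac{pt}{1 - qt}$, differentiating which, $E(X) = \frac{1}{p}$.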
Section 7.3
1. $\frac{50{,}482}{99{,}900} = 0.50 + \frac{532}{99{,}900}$. In the decimal expansion of $\frac{532}{999}$, after the first three stages the remainder is again 532, whence the expansion must recur with period 3. (In fact $\frac{50{,}482}{99{,}900} = 0.50532532532\ldots$)
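The recurrence of remainders can be watched directly by carrying out the long division. The sketch below records the remainder before each digit is produced; note that it divides by 99,900 throughout, so the repeating remainder it shows is 53,200 rather than the 532 that appears when dividing by 999. The function name is illustrative.

```python
def decimal_digits(numerator, denominator, places):
    """Long division: return (integer part, digits after the point, remainders).

    remainders[i] is the remainder in hand just before digit i is produced.
    """
    integer_part, remainder = divmod(numerator, denominator)
    digits, remainders = [], []
    for _ in range(places):
        remainders.append(remainder)
        digit, remainder = divmod(remainder * 10, denominator)
        digits.append(digit)
    return integer_part, digits, remainders

whole, digits, remainders = decimal_digits(50482, 99900, 11)
```

Once a remainder repeats, every later digit repeats with it, which is the argument used in the answer.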
-— 2— 4m+ aa" (21.) + 3) +0)" (21.) —- l)+10n
2. x.= ———3(l-+—2;)————+ for n 2 2,
where $\omega = \frac{-1 + i\sqrt{3}}{2}$ is a primitive cube root of 1.
3. Let $\alpha$ be a real number. If $\alpha = \frac{p}{q}$ where $p$, $q$ are integers and $q > 0$, then in the decimal expansion of $\alpha$, at any stage the remainder is some integer from 0 to $q - 1$. So after the digit in the unit's place of $p$, there can be at most $q$ distinct remainders. Whenever the same remainder appears, the digits in the decimal expansion will recur periodically. (Finding the length of this period is an interesting number theoretic problem.) Conversely if $\alpha$ has a recurring decimal expansion of period $r$, say, then for suitable integers $m$ and $n$, $10^m \alpha = n + \beta$ where $\beta = 0.a_1 a_2 \ldots a_r a_1 a_2 \ldots a_r a_1 a_2 \ldots a_r \ldots$. Since $(10^r - 1)\beta$ is an integer, $\beta$ is rational, whence $\alpha$ is rational too.
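From $F_n = \frac{1}{\sqrt{5}}(\alpha^n - \beta^n)$, where $\alpha = \frac{1 + \sqrt{5}}{2}$ and $\beta = \frac{1 - \sqrt{5}}{2}$, we have $\frac{F_{n+1}}{F_n} = \frac{\alpha^{n+1} - \beta^{n+1}}{\alpha^n - \beta^n} = \alpha \, \frac{1 - (\beta/\alpha)^{n+1}}{1 - (\beta/\alpha)^n}$. Since $\left|\frac{\beta}{\alpha}\right| = \left|\frac{1 - \sqrt{5}}{1 + \sqrt{5}}\right| < 1$, $(\beta/\alpha)^n \to 0$ as $n \to \infty$, so the ratio tends to $\alpha$. The attempted argument is valid to evaluate $L$ in case it is known to exist. But it does not show that $L$ exists and hence is not complete.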
Let the lengths $AD$ and $AB$ be $\lambda x$ and $x$ respectively. Then $FD = (\lambda - 1)x$. Since $FDCE$ is similar to $ABCD$, $\lambda = \frac{1}{\lambda - 1}$. Solving, $\lambda = \frac{1 + \sqrt{5}}{2}$. The property in question depends only on the shape of the rectangle. So the rectangles $FDCE$, $FDHG$, $FJIG$ all have it.
Since $F_1 = 1$, $|F_1| = 1 \le 2$. Also $|F_{n-1}| \le 2^{n-1}$ and $|F_{n-2}| \le 2^{n-2}$ give $|F_n| = |F_{n-1} + F_{n-2}| \le |F_{n-1}| + |F_{n-2}| \le 2^{n-1} + 2^{n-2} < 2^{n-1} + 2^{n-1} = 2^n$ for $n \ge 2$.
By assumption, $A \cup B = (0, R)$. Also $A \cap B = \emptyset$ since $f(x)$ cannot simultaneously equal $g(x)$ and $h(x)$ for any $x \in (0, R)$. So the interval $(0, R)$ has been expressed as the union of two mutually disjoint, closed subsets. By connectedness of $(0, R)$, either $A = \emptyset$ or $B = \emptyset$. In the first case, $B = (0, R)$, i.e. $f(x) = h(x)$ for all $x \in (0, R)$. In the second case $f(x) = g(x)$ for all $x \in (0, R)$.
For those not familiar with connectedness, but familiar with other versions of completeness of the real line, an alternate (but essentially equivalent) argument can be given as follows. If either $A = \emptyset$ or $B = \emptyset$, we are done. If not, pick $a_1 \in A$, $b_1 \in B$ and suppose without loss of generality that $a_1 < b_1$. If $\frac{a_1 + b_1}{2} \in A$, call it $a_2$ and $b_1$ as $b_2$. If $\frac{a_1 + b_1}{2} \in B$, call it $b_2$ and $a_1$ as $a_2$. In either case, $a_2 \in A$, $b_2 \in B$, $[a_2, b_2] \subset [a_1, b_1]$ and $b_2 - a_2 = \frac{1}{2}(b_1 - a_1)$. Repeat the argument with $[a_2, b_2]$ to get $[a_3, b_3] \subset [a_2, b_2]$ with $a_3 \in A$, $b_3 \in B$ and $b_3 - a_3 = \frac{1}{2}(b_2 - a_2)$. Continuing, we get sequences $\{a_n\}$, $\{b_n\}$ with $a_n \in A$, $b_n \in B$, $[a_{n+1}, b_{n+1}] \subset [a_n, b_n]$ and $b_n - a_n = \frac{1}{2^{n-1}}(b_1 - a_1)$ for all $n \ge 1$. By completeness of $\mathbb{R}$, these sequences converge to a common point, say $c$, of $(0, R)$. But then on one hand, $f(c) = \lim_{n \to \infty} f(a_n) = \lim_{n \to \infty} g(a_n) = g(c)$ while on the other, $f(c) = h(c)$, a contradiction.
This is essentially same as Exercise (”6), except that a, = l.
(i) a..= "—+—~+1H(4z—1)+(—i)n—-(—;L—1)
(A(x) comes out as (W + i :L 7;)
(ii) a. = 2,3n,.. (A(x) satisfies the difl’erential equation %’ = éy.)
(iii) $a_n = 3^{n-1}$ for $n \ge 1$. (Rewriting the relation as $3^n = \sum_{r=0}^{n} a_r 2^{n-r}$ for $n \ge 1$, $C(x) = A(x) B(x)$ where $A(x)$, $B(x)$, $C(x)$ are the O.G.F.'s of $\{a_n\}$, $\{2^n\}$ and $\{3^n\}$ respectively.)
__ 3 3‘ 3' 3"
10. (1,,—n!(l—!+2—l+—3—l +“"+rTl)'
ll. a. = 2.5"“. (For a solution using B.G.F., write
a. _1_ a,..l
" if! x" 501—1)!”-
summing which the E.G.F. of (0.} satisfies the difl‘erential equation
(.12 = 5y)
dx '
(i) y=2+x—x'-lx'+lx‘+...
(ii) y=l+2x—;x‘-ax'+-l3-x‘+..
13. Let $f(x) = C_r x^r + C_{r-1} x^{r-1} + \cdots + C_1 x + C_0$ and $g(x) = x^{n-r} f(x)$. Then $\alpha$ is a multiple root of $g(x)$. By Exercise (6.2.23), $g'(\alpha) = 0$ and hence $\alpha g'(\alpha) = 0$. Since $g(x) = C_r x^n + \cdots + C_0 x^{n-r}$, $x g'(x) = C_r n x^n + \cdots + C_0 (n - r) x^{n-r}$ and the result follows.
14. Follow the notation of the answer to the last exercise. Let $g_0(x) = g(x)$. For $i = 1, \ldots, k - 1$, let $g_i(x) = x g_{i-1}'(x)$. It is easily seen that $\alpha$ is a root of $g_i(x)$ for $i = 0, \ldots, k - 1$. Expanding the polynomials $g_i(x)$ gives the first assertion. For the second assertion it suffices to show that the matrix
a 1' ... . . . 01"“
a 21' ...... (k—l)¢""~
(k—1)2¢"‘1
a; 2M;- (k—l)"':‘¢"‘1
is non-singular. The determinant of this matrix equals “Mb-l)"
times that of the matrix
1 l l
c—
1
O
1 g! (k-Tl)’
can-O
i 2k'-I .....(k-13M
which by Exercise (6.4.33) is non-zero.
15. Let $\alpha_1, \ldots, \alpha_p$ be the distinct (possibly complex) characteristic roots with multiplicities $k_1, \ldots, k_p$ respectively so that $\sum_{i=1}^{p} k_i = r$. Then the general solution to (20) is
$a_n = a_n^{(1)} + a_n^{(2)} + \cdots + a_n^{(p)}$
where for each $i = 1, \ldots, p$, $a_n^{(i)} = c_{i,0} \alpha_i^n + c_{i,1} n \alpha_i^n + \cdots + c_{i,k_i-1} n^{k_i - 1} \alpha_i^n$, $c_{i,0}, \ldots, c_{i,k_i-1}$ being arbitrary constants. In proving this, for $i = 1, \ldots, p$ and for $j = 0, \ldots, k_i - 1$, let $w_{i,j}$ be the vector $\{n^j \alpha_i^n\}_{n=0}^{\infty}$ in $V$ with the understanding that $0^0 = 1$. We already know these vectors are all in $K$. It is a little clumsy to prove their linear independence directly using determinants. A different argument runs as follows. Suppose there exist $\lambda_{i,j} \in \mathbb{C}$ not all 0 such that $\sum_{i=1}^{p} \sum_{j=0}^{k_i - 1} \lambda_{i,j} w_{i,j} = 0$. Pick any $q$ for which there is some $j$ such that $\lambda_{q,j} \neq 0$ and among all such $j$'s, let $s$ be the largest. Then $\lambda_{q,s} \neq 0$ and $w_{q,s}$ is therefore in the linear span of the set $S$ consisting of the vectors $w_{i,j}$ for $i \neq q$ and the vectors $w_{q,j}$ for $0 \le j < s$. To get a contradiction it suffices to construct a homogeneous linear recurrence relation such that every member of $S$ is a solution of it but $w_{q,s}$ is not. The linear homogeneous relation whose characteristic polynomial is $(x - \alpha_q)^s \prod_{i \neq q} (x - \alpha_i)^{k_i}$ has this property.
1,6. (a) C — H) where D — (d,,) is a 2k><2k matrix over G in which
forp = 1: .... k: dry-h ”—1 = dII-l'”: ‘7 d"! u = i, dunp—a
= -—i and all other entries are 0. Generallsing Exercise
(6.4.39), det (D) a (2i)", whence D and hence C it non-singular.
Finally, apply Exercise (6.4.29).
(b) Since the coefficients are real, re-" is also a (complex) root of
multiplicity k. For each 0 g j s k—l. express {nlr' cos new."
and (rt/r" tin n0}.‘.’..a as linear combinations (over C) of the
vectors nlr"c"" and {nlr'r‘"’);°.a. both of which are in K
by Exercise (3.14) above. As for their linear independence, let
Bbe the 2k x 2k matrix whose rows are truncations of the
vectors {rt/r" cos n') and {rt/r" sin n‘) and A be the 2k >< 2k
matrix whose rows are truncations of the vectors {rdr'e’niii and
(mm-"0), 'By Exericse (3.15), A has rank 2k over 0. So by
(a). B has rank 2k over C, Le. its rows are linearly indepen-
dent over (1 and afartiart over R.
17. Let a“ ..., a, be real characteristic roots of multiplicities k,, ..., k,
respectively and r,e*"l. ..., r.e*"l be complex characteristic roots
of. multiplicities m,, "1,, m” "1,. ..., my In, (so that 5 k, +
‘ . -i
2 [El m,=r). Then a basis for K consists of the vectors of the form
("Mfr-o fori = 1, ..-, p. 0 S] < k. — l and vectors of the form
(uln‘ cos 1:0,)?» and (Mr; si n01)?” for i=1...., 4 and o < j 4
Mt -' I.
18. Since $C_r a_n + C_{r-1} a_{n-1} + \cdots + C_1 a_{n-r+1} + C_0 a_{n-r} = \beta^n$ and $C_r a_{n-1} + C_{r-1} a_{n-2} + \cdots + C_1 a_{n-r} + C_0 a_{n-r-1} = \beta^{n-1}$, the result follows by subtracting $\beta$ times the second equation from the first.
19. That K c L follows from subtraction times the equation 6,41,... +
Chg.-. + + Clap, + Coo...M =- 0 from the equation C,a,, +
C,_,a,..I +...+ 63a...” + C,a._, = 0. The last exercise shows
K’ C_L. For the second assertion. K’ is of the form K + 3 for
somov e L. Also K U (12} spans L. So 17 is of the form x + m for
some A and some :5 K. But then M - (—2) + is K’. If in? is
also in K' for M- an A then (ll—7‘)“ e K which would imply
fl (is — m e K, i.e. a e K, a contradiction. So A is unique.
20. Let a = {9"};0. Continuing the notation of the last exercise and
following the hint, a E L. But a e K. The result now follows from
the last exercise.
21. As a characteristic root of (41), p has multiplicity k + 1. So the
vector u = (11‘3"): o is in L but not in K.
This follows directly by substituting in (42). the values of b., b”.
.... 17..-, from (43) and regrouping the terms.
k+l
In (42),]etp = k + i and D,.: = (—l)’( . for i = 0.
X
....p. Then it becomes a linear homogeneous recurrence relation
whose characteristic polynomial is (x— 0*“. Both the null vector
and the vector { f_.are solutions of this relation. The former
implies K C L while the latter implies K’ C L, both in view of
Exercise (3.22).
By Exercise (3.15), applied to (45). L has a basis of the form
S U (v0, .. ., vi} where S consists of vectors corresponding to charm:-
teristic roots of (45) other than 1. These are precisely the charac-
teristic roots of (44). So 5 Is a basis for K Hence K U (v0, ~ ,vk}
spansL. Now K' is of the form K + v for some v e L (since
K’ c L). Write v as x + A.“ +_.. .+ An. fo_r somex E K and some
constants A0, ... “A; Then o. +...+ Aime K', implying the
result.
(i) a,l = (312‘ + c,3" — 112"n + '21 + %, where 0,. c, are arbitrary
l 5 l l
constants.(n)a,.—— l—(v—_2+ To)" +(7_i_.§)na+5na+ i;
.. 37
n‘ + £6715. (iii) a.=c, (—%) + C. G)" + c. 0052; + 0. sin 213—"
where cl, (3,, ca, c, are arbitrary constants. (Use Exercise (6.2.34) to
detect rational characteristic roots. By trial they are — fand i. The
remaining characteristic roots are ll__=2i_=.l/_3)
Use Cramer's rule (Exercise (6.4.16)) along with Exercise (6.4.33) and the fact that $\alpha_1, \ldots, \alpha_r$ are all non-zero.
Let $K$ be the solution space of (20). We know $\dim(K) = r$. Let $\{\bar{w}_1, \ldots, \bar{w}_r\}$ be a basis for $K$, where $\bar{w}_i = \{w_{i,n}\}_{n=0}^{\infty}$ for $i = 1, \ldots, r$. Let $k \ge 0$. For $i = 1, \ldots, r$, let $v_i = (w_{i,k}, w_{i,k+1}, \ldots, w_{i,k+r-1})$ and let $A$ be the matrix with rows $v_1, \ldots, v_r$. It suffices to show that $A$ is non-singular. However, as observed in the answer to Exercise (3.15), it is a little clumsy to do this by computing $\det(A)$. Instead, observe that if $A$ is singular then there exist $\lambda_1, \ldots, \lambda_r$, not all 0, such that $\sum_{i=1}^{r} \lambda_i w_{i,j} = 0$ for every $j = k, k+1, \ldots, k+r-1$. Now, for $i = 1, \ldots, r$, $\bar{w}_i$ is a solution of (20). Since $C_r \neq 0$, each $w_{i,n}$ is a linear combination of the preceding $r$ terms. So the equation $\sum_{i=1}^{r} \lambda_i w_{i,j} = 0$ holds, successively, for $j = k + r, k + r + 1, \ldots$. Similarly since $C_0 \neq 0$, it holds for $j = k - 1, k - 2, \ldots, 0$. So it holds for all $j = 0, 1, \ldots$. But that means the vectors $\bar{w}_1, \ldots, \bar{w}_r$ are linearly dependent, a contradiction.
HF.,.(x) = 30“": . then F,(x) = 1 and for m > i, F...(x) =
”lb—i“) So. F..(x) = t'“ by induction. Hence a..,. = m“.
31. Let $A(x)$ and $B(x)$ be the O.G.F.'s of $\{a_n\}$ and $\{b_n\}$ respectively. Then the two equations give $[A(x)]^2 + [B(x)]^2 = \frac{1}{(1 + x)^2} + \frac{1}{(1 - x)^2}$ and $A(x) B(x) = \frac{1}{1 - x^2}$. Solving, $A(x) = \frac{1}{1 + x}$ and $B(x) = \frac{1}{1 - x}$. (Other possibilities are ruled out by initial conditions.) So $a_n = (-1)^n$ and $b_n = 1$ for all $n$.
32. ‘_'1")1 x"'
IfE(X) = 2 —';nthenE(x)-= E (n: =E “—“lxn. E" (x)
= in a—“—“ x", .. .. E")(x) = “2: 91' x". The result follows by direct
11-0 Ill an) "i
substitution. It is well-known that e" is a solution of the differential
equation it? a is a characteristic root and that if up .... a, are distinct
characteristic roots, then the solutions 2“". W. ..., eh" constitute
a basis for the solution space. From this and the fact that e" is the
E.G.F. of (atn :0, we get an alternate proof of Theorem (3.8).
Section 7.4
Let $a_n$ denote the number of ways to form a stack of value $n$ rupees. Depending upon whether the coin at the top has value 1 or 2 rupees, the rest of the stack can be built in $a_{n-1}$ or in $a_{n-2}$ ways. So $a_n = a_{n-1} + a_{n-2}$ for $n \ge 2$. Also $a_0 = 1$ (the empty stack) and $a_1 = 1$.
In a stack worth $n$ rupees, let the number of 2-rupee coins be $k$. Then the total number of coins in it is $n - k$. Now $\binom{n-k}{k}$ is the number of stacks that can be formed with $k$ 2-rupee coins and $n - 2k$ 1-rupee coins.
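The two counts of stacks, one by the recurrence and one by the number of 2-rupee coins, can be compared numerically. The sketch below is only a sanity check under the conventions $a_0 = a_1 = 1$ stated above; the function names are illustrative.

```python
from math import comb

def stacks_by_recurrence(n):
    """a_n = a_(n-1) + a_(n-2) with a_0 = a_1 = 1."""
    a, b = 1, 1  # a_0, a_1
    for _ in range(n):
        a, b = b, a + b
    return a

def stacks_by_counting(n):
    """Sum over k (the number of 2-rupee coins) of C(n - k, k)."""
    return sum(comb(n - k, k) for k in range(n // 2 + 1))
```

Both functions should return the same value for every `n >= 0`, which is the identity the two answers together establish.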
Let $a_n$ be the number of ways to cover a $2 \times n$ rectangle with dominos. Any such covering must be one of the following forms, depending on how the last $2 \times 1$ column is covered. So clearly $a_n = a_{n-1} + a_{n-2}$.
[Figure: the two possible forms of a covering, on a rectangle with vertices $A_0, \ldots, A_{n-2}, A_{n-1}, A_n$ and $B_0, \ldots, B_{n-2}, B_{n-1}, B_n$.]
n—2 entries of any such sequence constitute a binary sequence of
length 71—2 in which every 1 is immediately followed by a 0. Treating
10 as a single symbol which occupies two spaces, the number of
such sequences is precisely 11..., where a, is as in the answer to
Exercise (4.1).
Clearly p0 —~ 0 while, by the last exercise, for n 3 l,
l+v5 "“ l-VS ""
_i—2 ) ‘i—z i ‘
Pr— V5 ,.
The result follows from by summing 5' pm.
I'-
The choice is, of course, yours! But monetarily it will be a good idea to play the game only if the expected number of tosses before winning the game does not exceed 5. By Proposition (2.13), this number equals $p_X'(1)$ where $p_X(t)$ is as in the last exercise. A direct computation gives $p_X'(1) = 6$. So if you are 'averagely' lucky it would cost you 6 rupees to win 5 rupees.
The problem is to count $b_n$, the number of sequences of A, B, C whose total cost is $n$. As in Exercise (4.1), we classify such sequences depending upon their last entries and get
$b_n = b_{n-1} + 2b_{n-2}$ $(n \ge 2)$ with $b_0 = 1$, $b_1 = 1$.
Solving, $b_n = \frac{2^{n+1} + (-1)^n}{3}$.
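The closed form can be compared against the recurrence for small $n$. This is a minimal sketch assuming the initial values $b_0 = b_1 = 1$ quoted above; the function names are illustrative.

```python
def sequences_by_recurrence(n):
    """b_n = b_(n-1) + 2 * b_(n-2) with b_0 = b_1 = 1."""
    prev, cur = 1, 1  # b_0, b_1
    for _ in range(n):
        prev, cur = cur, cur + 2 * prev
    return prev

def sequences_closed_form(n):
    """(2^(n+1) + (-1)^n) / 3; the numerator is always divisible by 3."""
    return (2 ** (n + 1) + (-1) ** n) // 3
```

The two functions should agree for every `n >= 0`.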
9. (i) is straight computation. For (ii) classify the selections into two
types: those containing m and those not containing m. (v) follows
by multiplying both sides of (ii) by y" and summing over from
n = 0 to an. Fuov) and F,(y) are obtained from the boundary
conditions, lam = l, b... = 0 for n > 0. bl... = I’m = 1. lip-FD
for u > i. Multiplying both sides of (v) hya‘" and summing over
from m = 2 to on gives V
30:. y) — l~ XU +y) —- xtfitx. .v) — l) -— yx‘B(x. y) = 0
which gives (vi). Thinking of 8(x, y) as a function of x and resolv-
ing into partial fractions and expanding again gives (vii).A1ternati-
vely (v) can be thought of as a homogeneous, linear recurrence
relation for the sequence {Fm(}’»:—o whose characteristic polynomial
is x‘ — x — y. The characteristic roots are W and the
initial conditions give (vii) as the solution. The R.H.S. of (vii) equsls
l'm+214
Tun-151 2k+1 drirub
(+y)*an '
(Vl)oows yexpandmg
(1 + 4y)“-
10. $\frac{1}{n}\binom{2n-2}{n-1}$ and $(n - 1)!$ respectively. (For the second question, the numbers, say $b_n$, satisfy the recurrence relation $b_n = (n - 1) b_{n-1}$ for $n \ge 2$. To see this, note that at the last stage a piece of length 2 must have been cut into two parts. This piece could be chosen in $(n - 1)$ ways and thought of as a single piece throughout except at the last cut. A solution without recurrence relations is also possible. Each cut must occur at one of the $n - 1$ 'nodes' on the stick and the order of these nodes determines the way the stick is cut.)
11. The key step in the argument is term-by-term differentiation, which is valid for all $x$ because both the series have $(-\infty, \infty)$ as their interval of convergence.
12. $\frac{n! \, (n-1)!}{2^{n-1}}$. (Similar to the second part of Exercise (4.10), except that this time the recurrence relation is $b_n = \binom{n}{2} b_{n-1}$.)
14. Follow the hint and apply the solution to the Vendor Problem.
15. Given any balanced arrangement of $n$ pairs of parentheses, define $f: \{1, \ldots, n\} \to \{1, 2, \ldots, n\}$ by $f(x)$ = number of left parentheses occurring before the $x$th right parenthesis. Then $f$ is monotonically increasing and $f(x) \ge x$ for all $x$. Also the original arrangement can be recovered from $f$.
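The correspondence between balanced arrangements and monotone functions can be tried out by brute force for a small number of pairs. In this sketch the arrangements are generated exhaustively, $f$ is stored as a tuple $(f(1), \ldots, f(n))$, and the function names are hypothetical.

```python
from itertools import product

def balanced_arrangements(n):
    """All balanced strings made of n pairs of parentheses (brute force)."""
    result = []
    for chars in product("()", repeat=2 * n):
        depth = 0
        for ch in chars:
            depth += 1 if ch == "(" else -1
            if depth < 0:
                break
        else:
            # no prefix went negative; balanced iff the final depth is zero
            if depth == 0:
                result.append("".join(chars))
    return result

def to_function(arrangement):
    """f(x) = number of left parentheses before the x-th right parenthesis."""
    values, lefts = [], 0
    for ch in arrangement:
        if ch == "(":
            lefts += 1
        else:
            values.append(lefts)
    return tuple(values)

def from_function(values):
    """Rebuild the arrangement from the tuple (f(1), ..., f(n))."""
    pieces, lefts = [], 0
    for v in values:
        pieces.append("(" * (v - lefts) + ")")
        lefts = v
    return "".join(pieces)
```

For each arrangement `s`, `from_function(to_function(s))` should give back `s`, and the tuple should be non-decreasing with `f(x) >= x`.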
16. The function $g$ in the hint is from $\{1, \ldots, n\}$ into itself, monotonically increasing and satisfies $g(x) \ge x$ for $x = 1, \ldots, n$. Apply the last exercise.
17. 2 is the only characteristic root. It has multiplicity 1. For a particular solution, try $a_n = A$. Then $A = 2A + 1$ gives $A = -1$. So the general solution to (21) is $a_n = c \cdot 2^n - 1$ where $c$ is an arbitrary constant. $a_0 = 0$ gives $c = 1$.
18. The crucial part of the construction is to design, for a given ordered pair, say $(a, b)$, of Boolean variables, a Boolean device $D(a, b)$ which can be controlled by $b$ if $a = 1$, but on which $b$ has no control if $a = 0$. ($D(a, b)$ cannot be expressed merely as a Boolean function of $a$ and $b$, because in that case it will not be able to distinguish between the orders in which $a$ and $b$ are set equal to 1.) One such device is a relay $D$, shown below (where $X$ is an auxiliary relay).
[Figure: circuit of the relay $D$ with the auxiliary relay $X$ and the inputs $a$ and $b$.]
Here a and b may themselves be Boolean functions of some other
Boolean variables. Now the desired circuit can be built as follows.
Let the control circuit of Y, be as shown below.
[Figure: control circuit of $Y_1$.]
For $i = 2, \ldots, n$, let $Y_i = D(y_1' y_2' \cdots y_{i-2}' y_{i-1}, x_i)$. Finally, let the closure function of the lamp be $y_1' y_2' \cdots y_{n-1}' y_n$. To count the number of flips directly, let $a_r$ be the number of flips needed to activate the relay $Y_r$ starting and ending in the position where $Y_1, \ldots, Y_{r-1}$ are all released. Then $a_r$ is also the number of flips needed to release $Y_r$ starting and ending in this position. This gives $a_{r+1} = 2a_r + 1$, with $a_1 = 1$, which is exactly the same recurrence relation as in the Tower of Hanoi problem.
20. Let a.., b," c. denote respectively the number of ways to cover the
following three figures.
0 I n- I n o l - In
Then we get
a. = a... + a.-. + 2"... + 0H (n 3 2)
bin-=01: +bn-1 0'} l)
c. = a. + c.._, (n 2 2).
Converting these into equations about their O.G.F.'s and solving, we get $A(x) = \sum_{n=0}^{\infty} a_n x^n = \frac{1 - x^2}{1 - x - 5x^2 - x^3 + x^4}$ (equivalently, $a_n = a_{n-1} + 5a_{n-2} + a_{n-3} - a_{n-4}$). But the roots of the polynomial $x^4 - x^3 - 5x^2 - x + 1$ are not easy to find and so a closed form expression for $a_n$ seems difficult.
21. Write the first row as the sum of $(r - \lambda, 0, 0, \ldots, 0)$ and $(\lambda, \lambda, \lambda, \ldots, \lambda)$. Then by Proposition (6.4.25), $\det(A_n) = (r - \lambda) \det(A_{n-1}) + f(n)$ where $f(n)$ is the determinant of the $n \times n$ matrix
$$\begin{pmatrix} \lambda & \lambda & \lambda & \cdots & \lambda \\ \lambda & r & \lambda & \cdots & \lambda \\ \lambda & \lambda & r & \cdots & \lambda \\ \vdots & & & \ddots & \vdots \\ \lambda & \lambda & \lambda & \cdots & r \end{pmatrix}.$$
Subtracting the first row from all others, by Exercise (6.4.13) $f(n) = \lambda (r - \lambda)^{n-1}$. So $\det(A_n)$ satisfies the linear recurrence relation
$b_n - (r - \lambda) b_{n-1} = \lambda (r - \lambda)^{n-1}$ with $b_1 = r$.
Solving this gives the result. (Since $r - \lambda$ is a characteristic root, for a particular solution try $b_n = An(r - \lambda)^n$.)
A solution without recurrence relations is also possible. Subtract the first column of $A_n$ from every other column and then add to the first row every other row.
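The recurrence for $\det(A_n)$ can be checked with exact arithmetic on small matrices. The sketch below assumes $A_n$ is the $n \times n$ matrix with $r$ on the diagonal and $\lambda$ everywhere else, and compares against the standard closed form $(r - \lambda)^{n-1}(r + (n - 1)\lambda)$, which is quoted here as a known fact rather than from the text; the function names are illustrative.

```python
from fractions import Fraction

def determinant(matrix):
    """Determinant by Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n, det = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap changes the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def a_matrix(n, r, lam):
    """n x n matrix with r on the diagonal and lam elsewhere (assumed form of A_n)."""
    return [[r if i == j else lam for j in range(n)] for i in range(n)]
```

With, say, `r = 5` and `lam = 2`, the values `determinant(a_matrix(n, r, lam))` for consecutive `n` can be substituted into the recurrence above.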
$\det(A_n)$ satisfies the homogeneous linear recurrence relation
$a_n - \alpha a_{n-1} + \beta^2 a_{n-2} = 0$ with $a_1 = \alpha$, $a_2 = \alpha^2 - \beta^2$.
The characteristic roots are real and distinct if $\alpha > 2\beta$, real and coincident if $\alpha = 2\beta$ and complex if $\alpha < 2\beta$. With suitable substitutions the answer in the three cases comes out as
(sec 0 + tan 0):“ — (see 0 ~ tan or“ 9,,
2 tan 0
I
where sec 0 = E
(11+ m- “. t = 29
d“ (A.)=
m—(s’iln-FTW 9" Whore cos 0 = 5‘5
It is interesting to note that the case $\alpha = 2\beta$ results from either of the other two cases by taking limit as $\theta \to 0$.
“NW“ fin). lf n is even then
n- n(— l)”and(— if": —-i"-' and sof(n)=0.1fn=-4k+l
then f(n) =
4k + I + (4" + I)
—I— l = k. Similarly verify
thecasen: 4k— 1.
= ___l_ 9—" 2‘: n‘ n 2u+3
"' 233+ 9 + 9 +I§+fi+(“w(—3T)
_ "(1 + (— 0') + ""0 —(— 1)”)
l6 l6 '
where, as usual to = — il + I VT.
3 .
Since a”, a” depend only on
the residue class of n modulo 3 while I" depend: on the residue
class of n modulo 4, the formula for t. can be replaced by 12
sepa-
Answers to Exera‘aes 733
rate. simpler formulas, depending upon the congruence class of n
I
modulo 12. For example, i. = 2—8 + ;+ 3% if n a 7 (mod 12). See
also Exercise (2.22) for u even.
Let x. be the integer closest to 2“1l5. Then
T—— if 2--‘-l(mod 5)
2".‘Lg if 2»q (mod 5)
LH+§ if 2-Iss(mod 5)
2: % if 2--‘E4(mod5)
If m a n (mod 4). it is easily seen thnt 2'”-| : 2'"1 (mod 5). Since
the values of sin mr/Z and cos rue/2 also depend only on the residue
class of n modulo 4, it sufices to prove the assertion only for
n a: l, 2, 3 and 4. This can be done by direct verification.
26. Multiplying both sides of (30) by $x^n$ and summing over $n$ from 3
to $\infty$ gives $A(x) - 1 + x^2 (A(x) - 1) = \dfrac{x^3}{1 - 2x}$, from which the
assertion follows.
27. (i) Apply induction on $k$. Let $B_{n,k}$ be the set of sequences of length $n$
in which the pattern occurs exactly $k$ times, with the last
occurrence at the end. Every member, say $\bar{x}$, of $B_{n,k}$ can be thought
of as a concatenation of a member of $B_{i,1}$ and a member of
$B_{n-i,k-1}$ for a unique $i$. So the O.G.F. of $(b_{n,k})$ is the product
of the O.G.F. of $(b_{n,1})$ and that of $(b_{n,k-1})$.
(ii) Define the set $C_{n,k}$ so that $|C_{n,k}| = c_{n,k}$. An element of $C_{n,k}$
is the concatenation of an element of $B_{i,k}$ and any binary
sequence of length $n - i$, where $i$ is such that the $k$th
occurrence of the pattern ends at the $i$th digit.
(iii) This follows from (ii) since $d_{n,k} = c_{n,k} - c_{n,k+1}$.
(iv) For $n \ge 1$, $a_n = \sum_{k=1}^{\infty} b_{n,k}$. (For a fixed $n$, the sum is actually
finite since $b_{n,k} = 0$ for $k > \lceil n/m \rceil$.) Since $a_0 = 1$, and
$b_0 = b_{0,k} = 0$ for $k > 1$, we get
$$A(x) - 1 = B(x) + B^2(x) + \dots + B^k(x) + \dots = \frac{B(x)}{1 - B(x)}.$$
28. (i) $c_n = 4c_{n-1} - 5c_{n-2} + 3c_{n-3} - 2c_{n-4}$. (The O.G.F. is $B(x)/(1 - 2x)$
where $B(x)$ is given by (34).)
(ii) $d_n = 6d_{n-1} - 14d_{n-2} + 18d_{n-3} - 17d_{n-4} + 12d_{n-5} - 5d_{n-6} + 2d_{n-7}$.
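A brute-force check of a recurrence of this kind can be sketched as follows. Two things here are assumptions, not statements from the text: that the pattern in question is 010 (a pattern of length 3 whose only self-overlap is at shift 2), and that $c_n$ counts the binary sequences of length $n$ containing the pattern at least once, which is what an O.G.F. of the form $B(x)/(1-2x)$ suggests. Under those assumptions the recurrence tested is $c_n = 4c_{n-1} - 5c_{n-2} + 3c_{n-3} - 2c_{n-4}$ with $c_0 = c_1 = c_2 = 0$, $c_3 = 1$.

```python
from itertools import product

PATTERN = "010"  # ASSUMPTION: the pattern is not named in this excerpt


def count_containing(n, pattern=PATTERN):
    # Number of binary strings of length n that contain the pattern somewhere.
    return sum(pattern in "".join(bits) for bits in product("01", repeat=n))


def by_recurrence(n_max):
    # c_n = 4 c_{n-1} - 5 c_{n-2} + 3 c_{n-3} - 2 c_{n-4}, seeded with c_0..c_3.
    c = [0, 0, 0, 1]
    for n in range(4, n_max + 1):
        c.append(4 * c[n - 1] - 5 * c[n - 2] + 3 * c[n - 3] - 2 * c[n - 4])
    return c[: n_max + 1]
```

Exhaustive enumeration is exponential in $n$, so this is only practical for short sequences.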
29. Let $\bar{x} = (x_1, \dots, x_n)$ be a binary sequence in which the pattern 111
appears for the first time at the end. Then for $n > 3$, $\bar{x}$ must begin
with 0 or with 10 or with 110. Adding the three possibilities gives
the recurrence relation. The values of $b_1, b_2, \dots, b_{10}$ are, respectively,
0, 0, 1, 1, 2, 4, 7, 13, 24 and 44.
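The ten listed values can be reproduced both by exhaustive enumeration and by the three-term recurrence $b_n = b_{n-1} + b_{n-2} + b_{n-3}$ that the three possible beginnings imply. The sketch below is illustrative; the function names are hypothetical.

```python
from itertools import product


def first_time_at_end(n, pattern="111"):
    # Counts binary strings of length n whose first occurrence of the pattern
    # is the one that ends exactly at position n.
    if n < len(pattern):
        return 0
    target = n - len(pattern)
    return sum("".join(bits).find(pattern) == target
               for bits in product("01", repeat=n))


def by_recurrence(n_max):
    # b_1 = b_2 = 0, b_3 = 1, then each term is the sum of the previous three.
    b = [0, 0, 1]
    while len(b) < n_max:
        b.append(b[-1] + b[-2] + b[-3])
    return b[:n_max]
```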
30. The number of favourable cases is
$\sum_{n=3}^{10} b_n 2^{10-n} = 128 + 64 + 64 + 64 + 56 + 52 + 48 + 44 = 520$.
31. Suppose there are 1024 players, corresponding to the $2^{10}$ binary
sequences of length 10. Out of these, 520 are winners as calculated
in the last exercise. In all there are 10,240 tosses. The number of
dummy tosses from the winners is $\sum_{n=3}^{10} b_n 2^{10-n} (10 - n) = 2{,}176$. To
count the dummy tosses from the losers, note that the number
of players who have won by the 7th round or earlier is 376. Out
of the 648 remaining players, 324 will get a tail on the eighth round
and stop playing, increasing the number of dummy tosses by 648
(= 2 × 324). Among the 324 players who get a head on the eighth
round, 52 win the game on the eighth round and have been already
accounted for, so 272 players remain. Of them, 136 get a tail on the
ninth round and hence contribute 136 dummy tosses. The remaining
136 players make no new contribution to the dummy tosses,
because the winners among them have already been counted and
the losers go for the tenth round anyway. The number of dummy
tosses is thus 2,176 + 648 + 136 = 2,960. Hence the total 'revenue'
got from all 1024 players is 10240 − 2960 = 7280. If this is divided
equally among the 520 winners, we get 14 rupees as the amount of
the reward for a win.
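The counts in Exercises 30 and 31 can be reproduced by enumerating all 1024 sequences. The rules coded below are an assumption inferred from the answer, since the exercise statement is not reproduced here: a player wins on getting three consecutive heads, stops tossing on winning, and also stops as soon as a win has become impossible within the 10 rounds. Every toss not actually made counts as a dummy toss.

```python
from itertools import product

ROUNDS = 10
TARGET = 3  # ASSUMPTION: three consecutive heads wins


def play(seq):
    # Returns (won, tosses_made) for one sequence of 'H'/'T' of length ROUNDS.
    run = 0
    for k, toss in enumerate(seq, start=1):
        run = run + 1 if toss == "H" else 0
        if run == TARGET:
            return True, k
        if ROUNDS - k < TARGET - run:  # too few rounds left to complete a run
            return False, k
    return False, ROUNDS


results = [play(seq) for seq in product("HT", repeat=ROUNDS)]
winners = sum(won for won, _ in results)
tosses = sum(made for _, made in results)
dummy = ROUNDS * len(results) - tosses
reward = tosses // winners
```

Under these assumed rules the enumeration should give 520 winners, 2,960 dummy tosses and a reward of 14, matching the figures in the answers.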
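32. We have $p_n = \dfrac{b_n}{2^n}$ where $b_n$ is as in Exercise (4.29).
Thus $p_3 = \frac{1}{8}$ and for all $n \ge 4$,
$$p_n = \tfrac{1}{2}\, p_{n-1} + \tfrac{1}{4}\, p_{n-2} + \tfrac{1}{8}\, p_{n-3},$$
and
$$n p_n = \tfrac{1}{2}(n-1)\, p_{n-1} + \tfrac{1}{4}(n-2)\, p_{n-2} + \tfrac{1}{8}(n-3)\, p_{n-3} + \tfrac{1}{2}\, p_{n-1} + \tfrac{1}{2}\, p_{n-2} + \tfrac{3}{8}\, p_{n-3}.$$
Summing over $n$ from 4 to $\infty$ gives the desired answer.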
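The summation can be sketched numerically. The excerpt does not state what the desired answer is; the sketch assumes it is the expected number of tosses $E = \sum n p_n$, and uses $\sum p_n = 1$. Summing the second identity over $n \ge 4$ then gives $E - 3p_3 = \frac{7}{8}E + \frac{11}{8}$. The names below are hypothetical.

```python
from fractions import Fraction


def probabilities(n_max):
    # p_1 = p_2 = 0, p_3 = 1/8, then the three-term recurrence for n >= 4.
    p = {1: Fraction(0), 2: Fraction(0), 3: Fraction(1, 8)}
    for n in range(4, n_max + 1):
        p[n] = p[n - 1] / 2 + p[n - 2] / 4 + p[n - 3] / 8
    return p


p = probabilities(400)
partial_total = float(sum(p.values()))                    # approaches 1
partial_mean = float(sum(n * pn for n, pn in p.items()))  # approaches E

# Solving E - 3 p_3 = (1/2 + 1/4 + 1/8) E + (1/2 + 1/2 + 3/8) for E:
exact_mean = (3 * Fraction(1, 8) + Fraction(11, 8)) / (1 - Fraction(7, 8))
```

Under this reading the exact value is 14, the same figure as the reward in Exercise 31, and the partial sums approach it geometrically.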
Index
8. (“mph mum"). 88 axial group (are Klein group)
Abel. 61 axiom. 36
abelian group, 303, 319, 337, 343, 368, axiomatic set theory, 54
384, 617 axiom of choice, 198, 224, 237, 445, 446,
abelianised group, 339, 365 637
absorption laws, 224
addition (as a logical argument), 298 Back substitution, 496
addition (principle of), 73 basis, 445
additively written group, 303 cardinality of-, 450
adjoint matrix, 484 change of-, 472
Amhe. 591 characterisation of-. 446. 497
m". 55 extension of-. 451
Ahlfon. 434 ordered-. 465
algebraic element, 452, 458 Bell numbers, 557, 558
algebraic structures, 212 Bender, 220
homomorphism of-, 213 bijection (= bijective function), 65
-of the same type, 212 bilinear function (= form), 459, 501
substructures of-, 213 binary coding, 282
-on a quotient set, 214, 217 binary device, 222
algorithm, 14, 621 binary operation, 199
'almost', 39, 52, 91 notations for-, 199
alphabet, 95, 618 -induced on a subset, 200
alternating group, 378 -on a cartesian product, 202
generators for-, 383 -on a set of functions, 202
non-solubility of-, 383, 618 -on residue classes, 203
simplicity of-, 380, 384 -on power set, 217
ambient structure, 213 opposite operation of-, 204
analytic functions, 28, 406, 431, 434, 511 table for-, 205
AND-gate, 273, 282 commutative diagrams for-, 215
Andre's solution, 92 binary predicate, 291
antisymmetry, 174 binary relation, 153, 291
antithesis, 40 binary sequence, 23, 89, 96, 141
argument (in logic), 37, 293 Binet, 503
argument (of a function), 64 binomial coefficients, 84, 512
associate, 415 identities for-, 85ff, 514ff
associativity, 201, 218, 224 sum of-, 109, 514
asymmetry, 175 binomial distribution, 547, 655
asymptotic behaviour, 621 binomial theorem, 84, 408, 512
atom, 234 -for arbitrary exponent, 512
augmented matrix, 501 Birbal, 129, 152
automorphism (of a group), 342, 365, 430 Birkhoff, 198
average (see expected value) Bishop, D.M., 322
Bishop. 13.. 21 solution to-. 29
bivalued logic. 37. 284. 299 cardinal arithmetic. 87. 91
black box, 273. 282 cardinal number (— cardinality). 72
Boole 25. 284 -of a disjoint union. 73
Boolean algebras 4)! a union. 75
definition of-, 222 —of product. 78, 79
alternate definition 01-. 241 -of power set. so
—as lattices. 232 -of set of functions. 80
as rings, 388. 440 -of infinite sets. 86. 91
basic properties of-. 224. 231. 239 —of NxN, 87
duality for‘, 225 -of a multiset. 151
structure of-. 235 Carmichael, 369. 384
representation theorem for-. 224, 235. mesian product. 62
237. 241. 441 canesian representation of a binary
subalsebras 01-. 223 relation. 155
product 01-. 223, 241 Casino Problem 18. 29
—o.f Boolean functions. 245. 255 solution to—. 614
—of functions. 223. 237. 240 Catalan numbers. 529. 596. 612
—of circuits. 284 (aim rte Vendor Problem)
—of predicates. 292 Cattle Problem, 19, 33. 35. 622
—oi' statements. 287 Cauhcy-Binet theorem. 503
Boolean function. 242 Cayley‘s theorem about groups. 356
tables for-. 243. 244 Cayley-Hamilton theorem. 503
structure of-. 246 Central subring. 430
disjunctive normal form 01-. 247 centre (of a group) 321. 334. 336
conjunctive normal form 01-. 250 centre (of a ring). 405
factorisation of-. 257 chain 177. 194. 612
ring normal form 01-. 260 chain rule. 294
black box representation of-. 273 Chandrasekhar. 434
Symmetry of-. (m symmetric Boolean change of basis, 472
function) China. 12
—representinz a circuit. 264 characterisation. 44
Boolean ring. 388. 440 characteristic (of an integral domnln)
Boolean variable. 242 392, 405
Bose, 2l characteristic function. 69. 80, 197
Bose-Choudhuri-Hocquenheim codes. 619 characteristic number. 253
bounded, 180 chnrncteristic polynomial (of a matrix),
box, 68 500
break contact, 276 charcteristic polynomial (of a recurr-
bridge circuit. 265. 270 ence relation), 575
broken (= open) circuit. 264 characteristic property. 56. 292
Brouwer's fixed point theorem. 620 Chinese remainder theorem. 430. 434
Burnside, 321. 616 circle group. 307. 314. 338. 344. 347. 366
Business Problem. 17. 25. 34, 70. :93 circuit with gates. 273
solution to-. 229. 289. 298 circuit with relays, 275
modification 01-. 229 circuit with switches
—with two terminals, 262
Canale. 12 seriu-paraflel-. 262
cancellation law. 210 equivolent-. 264
—in Boolean algebra, 239 closure function 01-. 264
canonical forms. 474. 502 —with a star. 268
canonical isomorphism. 347 —with a bridge, 265. 270
-with double dual. 503 -i'or a symmetric fimction. 271. 281
Cantor. 53. 91. 198 —for ternary devices. 281
Capital Problem. 18. 34 class. 55
statements about-. 38 commuter graphics. 9
class decomposition. 318. 336 concatenation. 204. 302
classification of groups. 352. 384 conculaion, 37, 42, 203
closed bull. 139. 150. 217. 620 commence (geometric). 160
closed form expression. 560 congruence (modulo an integer). 159,
cloeed circuit, 264 171. 214. 323
closed switch. 261 aruence classes. 163
closed under. 200 congruence relation (on an algebraic
closure function. 264 structure). 214
closure (laws of). 241 conjecture. 38
C.N.F. (see conjunctive normal form) continue? (in groups). 318. 321. 335
coarser decomposition. 164 conjunctive normal form. 250
eo-atom. 240. 250 conjunction. 41, 222, 298
coding theory, 145,- 618 consequence. 43
codomain. 63. 71 constant function. 65
coefficient of a power, 5l0 constructivist mathematica. 21. 37. 64
coiactor. 481 contact (on a relay). 276
coflnite. 240 containment (for sets). 57. 61
collection. 53 (also an inclusion)
colourinl a graph, 619 ContinuousHouse Problem. 4, 6, 10,
column rank, 462 546
column vector. 461 continuously compounded interest. 12
combination. 84 continuous mathematics, 1. 7. 198
combination-l circuit. 261 continuous variable. 3, 8
combinatorial identities, 84. 89. 110, 51417 continuum 3. 198
combinatorics. 72 contradiction. 236
-and algebra. 533. 544 contrapositive, 42. 289
commensurate. 2. 10. 12 converse. 42
commutative binary operation. 201 —imp1ication, 45
commutative diagrams. 167. 173 convex polygon problems. 109. 596
-for binary operations. 215 convolution. 219, 399
commutative group (see abelian group) co-ordinatewise operation, 202. 223, 321,
commutative rinz. 386. 408 394
commutative triangle. 167, 173 eoprime (=relatively prime)
commutator subgroup. 321. 339. 366. 368 corollary. 37
companion matrix. 424. 469. 500 cosets, 324. 328
comparable elements. 177 decomposition into—, 325
compatibility 165. 217 multiplication of- 320. 339
complement 58. 222. 232 group of— 330
complementary probability, 74, 90 of an ideoi- 401
complqnentary state. 222 center-example, 3B
complementary subspace, 458 countable sets. 88. 90. 195. 240, 458
complemented lattice, 232 cover (in a partial order). 1E3
completed (— closed) circuit, 264 Cexeter, 591
complete D.N.F.. 247 Crammer's rule, 484, 499
crazy dice, 555
cross product, 202. 310. 393, 435. 459
-01' real line 198. 722 Crowell. 369
complete set of representatives, 354 cunning approach to recurrence
complete splitting (of n polynomial). 431 relations. 567
complex numbezs 306. 320. 394. 424 cycles (of a permutation) 370, 384
-u matrices 39B cyclically symmetric function. 258
composite (= composition) 64. 71, 202. cyclic decomposition. 373, 382
302. 467 cyclic group. 313, 333, 353, 432
compound interest. 11 generators fot- 313. 419
automorphism group of— 430 -of a subspace, 451, 457. 458
—of a quotient space, 452. 458
Dance Problem. 9, 67. 72, 195 —of the space of linear tnnsforma-
solution to—. 182 tions. 464
dm structures, 179 Diophantine equations. 34]
decomposition (of I set), 98, 164, 17]. direct image, 65
195, 223, 557, 612 direct implication, 45
-by an equivalence reluion. 161. 165. Discrete House Problem. 4. B, 10
”I discrete mathemntics, 1, 7
Dedekind, 198 discrete metric, 140, 150
decutive logic. 36 discrete nndom variable, 546
deeper opention. 386 expected value of- 548
definition trick. 48 independence of- 549
degree (in a euclidean ring), 410 variance of— 557
degree (of an extension field), 453, 458 disjoint sets, 58
degree (of a polynomial). 400 disjunction, 41, 222
degree (of a star), 268 disjunctive normal form, 247
delta circuit. 269. 283 distance function, 138
De Morgan‘s laws, 59. 225 distribution problems, 96
De Moivre, 59o —distinct objects into distinct boxes,
dense-in-itself, 195 97
density function, 11 —distinct objects into non-distinct
denumerable set, 88 boxes. 98
Dec. 152 (also see decomposition)
derangement, 23, 120, 126. 595 man-distinct objects into distinct
derivative. 407, 431. 509, 520 boxes. 102
design theory, 618 -non-distinct objects into non-distinct
determinant. 476 boxes. 105
inductive definition of- 476,481 (also see partition of I nmnber)
—of transpose, 477 distributlvity. 211
multilinearity of— 479 4n a lattice, 216, 232
—ot"a product. 479 Division Problem, 17. 25
-and invertibility, 483 solution to—, 204
-invariance under similarity, 48l division ring, 392, 408
-and eigenvalues, 500 . divisibility relation, I78. 183,186. 234,
diagonal, 155, 157. 175, 216 40], 413 '
diagonalisation argument. 91 Diwan, 59l
diagonal matrix, 406, 499 domain (of a function), 63-
dice, 104, 546, 555 domein (= integnl domain), 390
dichotomy. 177 euclideAn—, 410
dictionary (-lexicographic) ordering. 178 principal ideal-, 412
Diet Problem, 19. 31. 622 unique factorisalion-. 427
difference equation. 558 domino coveringproblerns. 601. 609, 613
difl'erential equations don’t care conditions, 667
analogy with recurrence relations, door-bell circuit. 278
558, 59L 602 Domhofl', 35, 241. 260,284, 322
-associated with linear recurrence dot product, 202 435, 459
relations. 590 double complementation. 224
-applications to discrete mathematics. double counting argument. 90
520 double dual, 497, 503
—series solutions of— 10, 57]. 587 double induction. 610
iihedral group, 309, 320 dual (of a vector space) 440. 465, 497
Dilworth‘s theomln, 194. 198, 620 dual partition,'106. 147. 544
dimension. 450 duality
intuitive meaning of— 451 —for binary operations, 204
—for Boolean algebras 225. 234 exponential law (for functions), 71
~for partial orders, 180 extended complex number system, 320
Durfee square, 545 extended real line, 179
extension field, 395,» 399, 420, 432, 471
Echelon form. 488 617
reduction to-. 492
E.G.F. (- exponential generating Factorial, 82
function) stifling’a approximation to-, 621
eigenfunction, 500 factorial representation, 192
eigenspace. 499 factorial ring (— u, f. d.)
eigenvalue, 499 factoring a Boolean function, 257
eigenvector, 499 factoring through a function, 166
Eisenstein‘s criterion, 503 fair coin, 29
element. 55 family, 55
embedding, 343 feasible region, 33
empty class, 40 feedbadc arrangement, 263
empty set, 57 Format, 332, 341, 434
entire ring (rte integral domain) Ferrer's gr-Iph, 106, 147, 544
enurnerable set, 88 field, 39]
enumeration, 88 extension of-, (re: extension field)
enumerator, 534 fiinite-, 391, 432, 433, 448
Envelopes Problem, 17, 23 34 non-oommutative-, (sec division ring)
solution to-. 121 —of quotients, 404
epimorphism, 342 ordered-, 407
equality modulo a relation. 177 vector, 435
equipollent. 159, l95 Fibonacci numbers, 34, 35, 585, 609
equivalence (for orders), I95 finer decomposition, 164
equivalence (logical). 33, l58 finite algebraic structure, 215
equivalence class, 160 finite dimensional vector space, 445, 451
equivalence relation, 158 finite extension field, 453
associated decomposition, 161. 165. 172 finite field, 391, 405, 432, 448
-generated by a relation 170, 171, l72 multiplicative group of-, 433
Euclid, 25. 35. 409 finitely generated group, 364, 369, 451
euclidean algorithm, 77, 163, 306, 314, finitely presented group, 364
409 , finite mathematics. 7
-proof of, 410 finite set, 72
euclidean distance. 135, 137 finite sequence, 66
euclidean geometry, 8, 37, 533 finite state machine, 300
euclidean ring (— domain), 410 fixed point, 69, 119, 620
g,c.d. in-, 417 filp (m reflection)
euclidean space, 242 flow, 619
Euler, 51, 130, 505, 531 Ford, 619
Euler's function, 127, l28, 219, 419, 430 four group (see Klein group)
432 four square theorem, 220
evaluation, 497 Fourier series, 508
even permutation, 377 Fox, 369
exclusive or (:2: ring sum) free abeiian group, 365, 368, 459
exclusive use of 'or', 41 free group, 359, 367, 460
existence statement, 39 frequency, 148
existence theorem, 21 Friday the 13th, 217
existential quantifier, 57 Fullterson, 619
expected value, 528, 548 function, 63
exponential enumerator, 534. 550fi' basic teminology about- 64, 65, 71
exponential qenerating funuim, 508, -of two variables, 70
5501!, 557 functional analysis, 435, 460
thnctlonally complete not of ates, 282 0! order p" 336. 332. 383'
-of order pg 340
Galoil, 384, 618 homomorphism! of-, 342
me theory, 621 -of hommnorphilml. 346, 367
gate, 274 -of automorphism. 365. 430
Gill“. 435. 488 clarification of-. 352
(human intersen. 411. 428 symmetric-, 353
Gaussian method (see row reduction) -as subgroups of permutation groups,
Gauss-Jordan method. 503 356
Gauss' lemma. 432 —as quotients of free groups. 361
g.c.d. (=- greutest common divisor) presentation of-, 364
general position, 26. 39 simple. 380
generating functions, 507 solvable-. 383. 618
basic properties of-, 509 applications 01-. 309. 322, 615
of two variables-, 524 group actions, 338. 616
-applicntions to counting problems,
5313' Hall. 6.6. 322
applications to evaluating sums, 514. Hall. M.. 22, 35. 112, 188.198
523 Hall, P. 620
applications to probebility, 545 Halmos. 52. 71, 91. 198
application: to recurrence relations. Hamel basis, 446
563 Hammermesh, 322
generators (for a group). 364 Hamilton. 408
Gilbert bound, 151 Hamming hound. 145. 151
Gillmon. 408 Hamming distance (- metric) 141
3.1.1:. (- greatest lower bound) Harary. 152
Goflmln. 198 Hardy. 22
Goldbach's conjecture. 38. 285 harmonic numbers. 523
Goldberg, 590 Hose diagram, 184
golden ratio. 585, 591 Head Office Problem, 18, 30. 34
Goldman, 220 hereditary property. 171
Gorenstein. 384 Herstein, 35, 220, 480, 433, 460. 502. 503
Graham. 91 Hilbert matrix, 501
mph. 134, 152, 619 Hofl‘man, 460
-Of a function, 63 Hohn, 35, 241, 260, 284, 322
graphic ' (of n ‘ linear relation
154 572
greatest common divisor. 186. 415. 43] homogeneous system of linear equations
greatest element. 180 492
Greenberg, 503 homomorphism (of algebraic structures).
will”. 2l3, 216
definition of—, 302, 322 homomorphism (or groups), 342
—of substitutions. 301 examples of-, 343
—of isometries (— symmetries), 307 fundamental theorem about-. 346
—of permutations, 303. 353. 356 kernel of,— 346
—of residue classes, 304, 313. 418 -onto a quotient Broup. 345
-of quaternions. 310 homomorphism of rings, 407
—of roots of unity. 314 homomorphism (of vector spaces).
—order 01-. 305, 331 (see linear transformation),
class demcompoaition of. 318. 336 House Problem, 3
centre of-. 321. 336 Hua. 128. 220
product of-, 321 Huntington. 241
literature about-, 322 hypothesis. 42
topological-. 322 hypothetical syllogism. 298
—of octets. 330, 339
Ideal, 400, 408 inverse function. 65
idempotency (in Boolean llaebra). 224 inverse image, 65
idempotent. 218, 405 inverse relation. 168, 171
identity (for a binary operation), 205 inversion (in a permutation). 188, 376
identity function, 65 inversion (w.r.t. a binary operation). 207.
identity matrix, 398 339, 365
ill, 45 invertible element, 207
'il" in a definition. 44 invertible matrix, 480$
'if’ part, 45 irreducible polynomial, 420, 423, 431, 502
image, 65 irreflexive, 175
imbedding, 343 Isaecson, 12
implication statement. 42. 288 iwmetry, 160, 300, 307
inclusion (in a Boolean algebra), 231 group of- (see symmetry group)
inclusion (of sets). 58, 61. 178, 194 isomorphism, 160
inclusion and exclusion (see principle of —t‘or algebraic structures, 214, '216
-1‘or Boolean algebras, 223
inclusion function, 65 -1‘or groups. 342, 369
inclusive use of 'or‘. 42 —t'or partial orders, 195
inconsistency (numerical), 125 -for rials. 393
inconsistent system, 496 40: vector spaces, 447
independent Boolean variables, 242 isotropy subgroup, 338
independent discrete random variables,
Jacobson. 408
indeterminate, 400 Jerison, 408
index (of a subgroup). 326, 331 join. 58, 186
indexing function (for a linear order), Joshi. 52, 71. 299
Juxtaposition, 204
-for permutations of a set, 187. 196
indicator function, 506 Kannarkar, 622
induced binary operation, 200 k-ary sequences. 89. 109, 151, 162, 173.
induced relation, 171 553, 593
induction (second principle of), 34 Keller. 12
(see also mathematical induction) Kelley 71, 241
inductive logic, 36 kernel. 346
infimum. 180 Klein map. 310. 313, 330. 349, 352, 366.
infinite products, 622 78
lntinite sets, 86 knot theory, 369
initial condition, 559. 530 Knuth. 22. 35, 112, 152, I98, 408, 434.
infective function, 65 503
characterizations off. 69 Kolrnogorov. 558
inner automorphism, 345, 365 Konixsberg bridge problem, 130. 619
inner product space, 435 Kreyszig, 531
instantiation. 295 Krishnamurthy, 35
integer programming. 34, 622 Kunze, 460
integral domain, 390 Kurosh. 322
characteristic or- 392
field of quotients of- 404 Lagrange, 171, 220
—with special properties (at: domain) Lazranze‘s theorem (about groups). 331
integral part. 143 converse of—333, 352, 378, 617
interlocked relays. 279 Landlord Problem. 17, 25
intersection. 58 solution to— 265
interval of convergence. 510, 565 Lana, 460
invariant under, 217, 343 largest element. 180
inverse (of a matriit), 483, 496 Larsen. 322
inverse (of an element). 207 Larson. 558
Latin sqwes.22 linear transformations (ulna .12: matrix)
lattice (as a poset), 186,195 198, 216 438, 457
(also see Boolean algebra) nullity of— 458
lattice (in geometry). 198 rank of— 458
-polnts, 525 —defined by a matrix, 463
laws of indicea. 201. 209 matrix of— 465
leading coefficient, 400 vector space of—440, 464
least common multiple, 186. 415 Lindemnnn, 460
least element. 180 listing problem. 179, 186
least upper bound. 180 Lime Travelling Salesman Problem, 19.
l.c.m. (=least common multiple) 30
left cancellation law, 210 Littlcwood, 22
left coset, 324 Liu, C]... 35
—rel.ationship with right coset, 325 locally finite poset. 220
left distributivity. 211 Locks Problem, 17, ‘23. 66, :07
left ideal. 400 solution to— 92, 251
left identity, 205 logic element (—gate), 274
left inverse. 207 logically equivalent, 39. 158
left module. 437, 458 law/an. 531.558
left translation, 215. 316 lower bound. 180
Leibnitz. 341 lower triangular matrix. 498
lemma. 37 l.u.b. (see least upper bound)
Lessman, 590
Levy. 590 MacMahon. 531
lexicographic ordering, l78 major premise, 295
—for permutations, 187 make contact. 276
—for subsets. 197 Manohar. 220. 299. 434
Li-Jen-Shu formula, S30 mp (=mappins). 64
Limaye, 460 matching theory. 620
limiting process 7, 28, 185 mathematical induction, 34. 46. 72. 196.
Lindernann, 460 591, 610
linear combination. 441 matrix. 66, 396
linear differential equation, 350. 590 product of- 397
linear equations, 350 491 ~representing complex number. 3913
linear fractional transformation. 319 -representing quaternions, 405
linear functional (— form). 440 -representing field extension, 471
linearly dependent. 442 diagonal-406. 499
linearly independent. 442 rank 01-. 462, 4871?, 500
linear order, 177. 184 transpose of-, 462
linear programming 33, 622 -of a linear transformation. 465
linear recurrence relation. 572 —of a change of baris, 472
associated homogeneous equation similar- 474
(A.H.E.) s72 determinant 01-. 476
solution'rpac‘e of—573 invertibility of-. 485
characteristic root 01—. 575 adjoint of-. 484
—with distinct characteristic roots. 576 singular; 485
—with complex characteristic roots, triangular-. 498
578, 587 ~in echelon form, 488
finding particular solution of— 579. 588 —of a permutation. 498
—with two indioes. 582 —-of a bilinear form, 50?.
—with multiple characteristic roots. 587 partitioned- 502
worresponding linear diflerential eigenvalue of- 499
equation, 590 trace of-499
applications of 59111 characteristic polynomial of-. 499
linear span. 441 maximal chain. 194, 198
maximal element, 180 non-commutative field (sec division ring)
maximal ideal, 401 non-singular linear function. 320
maximum element, 180 non-singular matrix, 48517. 501
measure theroy. 52 NOR-gate. 282
meet (in a lattice). 1116 norm (of a quaternion). 393. 405. 459
meeting (for sets), 58 normal form (in free groups). 359
member. 55 normaliser. 334, 341
Mendelson. 52 normal subgroup, 315. 317, 318. 327
metaknowledge. 286 NOT-lute, 273
metastatemcnt. 286 nullity. 458
metric, 138, 2l7 null set. 57
Meyer, 12 null space, 457, 458
minimal element, 180 numerical Imlylio. 11. 12
minimum element. 180 numerical inconsistency. 125
minor (in a matrix). 481
minor argument, 295 Odd permutation. 377
minor premise, 295 0.6.1:. (- ordinary generating function)
Mobius inversion. 219 one-to-one correspondence. 65
Mobius transform, 219 one-tonne function. 65
modular arithmetic (- residue arithme- ’only if’. 44. 45
tic) onto function. 65
module, 437. 458 open ball. 139. 150. 217
modulo. )1 addition, 204 open circuit. 264
modulo, 1: congruence. 159, 203 open switch. 261
module In multiplication. 204 operate-and-hold circuits. 277
modus ponens. 294 operate state (of a relay). 275
monic polynomial. 468 operate time (of n rehy). 278
monoid, 212. 215, 302. 319. 322 operation. 199
monomial, 248 operator. 64
monomorphism. 342 opposite opention, 204
monotonically increasing sequence. 110. orbit, 338. 616
194. 612 order (of I group). 305. 331
Motzkin. 433 order (of a group element), 314. 320. 332
multigraph, 134 -and isomorphism. 5384
multilinear function. 458. 460 order (of n recurrence relation). 559
multiple root. 431 order (on I set). 174
multiplication (principle of), 78 -equivalence (— isomorphism). 195
multiplicative function. 127. 128, 219 -type, 195
multiplicity function. 148 ordered basis. 465
multiset, 71. 148. 151 ordered field, 407
ordered pair. 63
NAND-gate, 282 ordinary enumerator; 534
n~nry operation. 199 ordinary generating function (see nonem-
natural logarithm. 99. 523. 621. ing function)
necessary condition. 43 OR-glte. 273
negative element. 107 orientation preserving isometriea, 309,
negation. 39. 49
nest, 177 orthogonal decompositions. 172
network. 619 output circuit. 279
neutral conditions, 667
Newmann. 558 Palindrome, 109
Nielsen. 620 plrldox. 54. 286
Niven 12. 460 parallel arrangement of switches. 262
Noble. 503 parallel processing, 275
Nociher, 366 parallel trmslate. 316
parity, 159 binary operation on-. 202
-ot‘ permutations, 377 power series,
parity checker. 282 formal» 400. 406. 439. 504
partially ordered set. 177 48 a generating function. 508
partial order. 174 interval 01' convergence of-. 510
—extension to a linear order. 192, 198 power set, 58
partitions (of a number). 24. 105. 332 binary operations on-. 217
dunl 01-. 106. 147 -Boolean alaebra. 222
Ferrer‘s graph: 01-. 106 onrdinality all, 80. 91
triangular. 111, 557. 613 predicate, 290
enumerator for-. 540 equality of-. 292
‘Durfee square of-. 544 -calclllus, 299
-or restricted type. 543, 556 premise. 37, 293
partition (of a set). (an decomposition) ' prime (element), 414, 418. 419
partitioned matrix. 502 n1mively—. 127. 417
Farm, 12. 112 prime numbers. 414
Pascal triangle. 525. 530 infinity of-, 34, 35
pattern. 60511, 613 —as atoms. 234
Penna axioms. 72 -ofthe form 4n+ l. 429
permutations (of a set). 65. 81, 202. 551 primitive polynomial. 432
number 01-. 81 primitive root of unity, 314. 4l9, 527
-with identical objects, 82 primitive terms, 47
-of a multitet. 152 principal idea]. 401
-ordering (- listing) 01-. 187, 196 principal ideal domain, 412
inversion in-. 188 p.c.d. in-, 415
will) of-. 303. 353. 356 chain of ideals in-. 425
cyclic decomposition of-. 373 factorisation in-. 426
parity (— sign) 01-. 377 principle of addition, 73
-matrix. 498 principle of duality, 225
P'npxio 591 principle of inclusion and exclusion. 113
p.i.d. (- principal ideal domain) inductive proof of-, 114
pimn-bcle-principle. 77 alternate proof of-, 116, 126
planar nupb. 619 applications 011, 120. 127
point. 55 inequality form 01-, 123
pointwise defined operation, 202 generalised form of-, 117
Polya's theory of counting. 384. 617 principle of multiplication, 78
polynomials. 490 probability,
ring 01-. 400. 411 -and randomness. 3
irreducible-. 420 -and truth. 52
root (- zero) of-. 420, 431 oomplementary—. 74, 90
splitting 01-. 431 —density function. 11.
primitive. 432 mm. 545
monio. 468 —di-tribution. 546
' ' inproving" " $16 ' ‘ ‘ .547
polynomial time algorithm. 622 philosphical view about-, 549
Pontrynsin. 322 product (of sets). 62
pout (= partially ordered set), 177 binary operation on-, 202
position vector. 434 cardinality 01-. 7B. 79
positive element. 407 projection function, 165
positivity (for a metric). 136 —from a product, 345
Postage Problem. 17. 23. 34 projective non-retry, 460
solution to-, 107. 541 projective plane. 22, 618
postulate. 36 properly less. 175
power (of an element), 201. 303. 560 proper subset. 57
power (of a set), 63 proposition (as a statement). 284
proposition (as a theorem), 37 reflexive relation, 155
propositional calculus, 299 Regions Problem, 18, 26
Pythagorean metric. 150 solution to~, 34
regular matrix, 485
Quadratic group (see Klein group) relation (on a set),-154, 291
quadratic residue, 429 relations (for a group), 364
quantification of truth, 38, 52 relatively prime, 127, 417
quantifier, 57 -ideals, 429
quaternary sequenée, 89, 160 relays, 275
quaterion group, 301, 321, 330, 349, 352, —as memory devices, 278
358, 364, 367, 368 -with delays, 278
quatemions, 393, 405, 408, 421, 422, 459, interlocked-, 279
692 release state (of a relay). 275
quintary sequence, 89 release time (of a relay), 278
quotient field (tee field of quotients) Religious Conferenoe Problem, 16, 20.
quotient function, 165, 214 21, 432, 618
quotient group. 311, 330, 349,383 reminder theorem, 421
quotient homomorphism, 345 representation theorems, 224 230. 356,
quotient ring, 400 361, 447
quotient set, 165. 214, 217 residue (s residue class), 163, 203
quotient space, 438 arithmetic 25, 204, 220
quotient structure, 214 restriction (of a function), 65
restriction (of a relation). 171
Ramanujan, 112 reverse order, 174. 180
Ramsey’s theory, 89, 91 Riemannian geometry, 37, 52
Random, 3 right-, (see corresponding 'left‘ entries)
-Variable, 545 rigid body motions, 300
—walk, 112 rings. 408
range, 65, 71 definition 01-. 386
rank (of a free abelian group), 459 Boolean-, 388
rank (of a free group). 460 —of quaternions 393, 421, 422
tank (of a linear transformation), 458 Product of-, 394
rank (of a matrix). 462 487m 500 suhring 01-, 394
rational (etymology), 403 -of functions, 395, 406
rationai function 405, 439 -of matrices. 398, 439
rational root test, 432 —of power series, 399
real numbers -of polynomials. 400
-applioability to mathematics, 3 -of analytic functions, 406. 431
construction «pf-.198 ~01“ Gaussian integers. 4l1
completeness of-. 197, 722 homomorphism 01-. 407
recurrence relations, 27 Euclidean. 410
analogy with difl‘erential equations, ring normal fom. 260
558, 591, 602 ring sum, 240. 260. 282
-solution by 0.G.F. 5631f Riordan, 35, 531
cunning approach to- 567 Robinson, 369
-solution using, E.G. ., $69 roots (of a polynomial), 420
-solution using difi‘erential equation, roots of unity. 314. 403. 419. 527
590 Roth. 22
linear-, 57211 Rothschild, 91
withtwo indices-, 582, 594 row operations. 490
simultaneoua-, 584, 590, 594 row rank. 462
reduoed word form, 359 row reduction, 491
reductio-ad-absurdum argument, 42 row vector, 461
refill-out 164, 177, 195 Royden, 52
reflection, 308 Rubik’s cube, 300. 322
Rudin, 460, 531. 591 Spencer. 91
rule of three. 217 Sperner's lemma, 621
Russel‘s paradox, 52. 54. 586 squarefree intent. 238
square matrix. 396
Samuel. 408 standard basis, 445
scalar. 434 star (in a circuit). 268
scalar multiplication, 434 statement. 284
scalar product. 435 about a class. 38, 288
Schrier, 620 Stirling numbers. 99. 122
search trees, 619 recurrence relation for-. 101
second principle of mathematical identities about-. 110. 112
induction, 34 references to~ 112
selections (- combinations), 84 generating function for-. 554
-without repetitions. 84 Stirlinl‘s formula, 621
with repetitions 104. 110. 537 stochastic variable. 545
indexing (— listing) of-, 197 Stokes‘ theorem. 435
self-dual partition, 111. 544 Stone Problem, 18. 26
semigroup 301. 322 solution to-, 256. 259
sentence, 291 Stone representation theorem, 224, 235,
sequence, 66 Stone representation theorem, 224, 235,
sequential circuit, 261, 275 237, 241, 357, 503
series arrangement (of switches), 262 Strangio, 283
series solutions of differential equations, strictly stronger, 45
10, 571, 587 strictly weaker, 45
set, 53 strict partial order, 175
n-set, 72 string, 95
-with additional structure, 129 stronger algebraic structure, 216
Shares Problem, 18, 27, 34, 504 stronger relation, 165
solution to-, 562, 577 stronger statement, 45
sharp bound, 122 structure theorems, 236
sharper statement, 46 subfield, 395
Shrikhande, 21 subgroups, 305, 319
shifting down method, 281 order of, 317, 331
sign (of a permutation), (see parity) normal-, 315, 327
similarity invariant, 474 -generated by a subset, 311
similar matrices, 474, 481 -of Z, 306,
Simmons, 152 union of-, 311, 322
simple group, 380 product of-, 320
simple isomorphism, 369 -of commutators, 321
simple order, 177 cosets of, 324
simple root, 431 index of-, 325, 331
simplex, 380 -of orientation preserving isometries,
simplex method, 622 309, 320, 328
’ 584. L' ' _ 195
590, 594 subordinate argument, 295
singleton set, 72 subring, 394, 400
singular matrix, 485 central-, 430
skew field (see division ring) subset, 57
skew-symmetric matrix, 497 k-subset, 72
smallest element, 180 subspace, 438
sneak path, 268 -spanned (= generated) by a subset,
solvable group, 383, 618 1
solving a polynomial equation, 384, 618 uniform of-, 456
sorting problem, 179, 198, 621 complementary-, 458
span (= linear span) substitution, 301
laws of-, 241 topology, 192, 198
substructure, 171 torsion element, 314
-of an algebraic structure, 213 total order, 177
sufficient condition, 43 Tournament Problem, 16, 22, 64, 70
superset, 57 solution to-, 34, 76
supremum, 13, 14, 180 tower (= chain), 177
surjective function, 65 Tower of Hanoi Problem, 599, 612, 615
characterisation of-, 69 trace, 499, 500
switch, 261 tractable problems, 622
switching algebra, 284 transcendental element, 452, 458
switching circuit (see circuit with transform, 219
switches) transformation, 64
syllogism, 61, 298 transitive relation, 155
Sylow subgroup, 617 translation, 215
symmetric Boolean function, 252, 375 -invariant, 217
characteristic number-, 253 transpose, 462
structure theorem for-, 254 transposition, 371, 375
cyclic symmetry of-, 259 Travelling Salesman Problem, 31, 622
circuits for-, 270, 280 tree, 619
symmetric difference (see ring sum) Tremblay, 220, 299, 434
symmetric group (also see permutation triangle inequality, 136, 142
groups), 353 triangular matrix, 498
generators for-, 383 triangular partition, 111, 557, 613
non-solubility of-, 383, 618 trichotomy, 194
-and determinants, 477 trisecting an angle, 617
symmetric matrix, 497 trivial Boolean algebra, 22
symmetric relation, 155 trivial group, 306, 353
symmetry (for a metric), 136 trivial homomorphism, 344
symmetry (= isometry) trivial ring, 386
symmetry group (= group of symmetries), truth, 284
307, 310 quantification of, 38, 52
-of a triangle, 307 -table, 288
-of a regular pentagon, 308, 315, 324, -value, 286
326, 332, 362, 373 vacuous-, 41, 289
-of a regular n-gon, 309 -set, 291
-of a tetrahedron, 309, 379 Tucker, 35
-of a cube, 384 two-sided identity etc. (see identity
etc.)
Table for two-state device, 222, 242
-binary operation, 205 2-place predicate, 291
-Boolean function, 243, 244
-truth values, 288 u.f.d. (= unique factorisation domain)
tautology, 287 unary operation, 199
laws of-, 224, 228 unary predicate, 290
ternary device, 26, 256, 282 uncountable set, 88
ternary operation, 199 underlying set, 212, 215
ternary sequence, 89 union, 58
Tent Problem, 17, 23 unique factorisation domain (= u.f.d.),
solution to-, 146 427
theorem, 37 uniqueness (laws of), 241
Thomas and Finney, 13 unit, 414
Tjur, 558 unity, 314
Tomescu, 615 universal set, 55
topological groups, 322 universe, 55
topological sorting, 192, 197 'unless', 43
upper bound, 180 vicious circle, 51
upper triangular matrix, 498 Vilenkin, 35, 112
void (= null) set, 57
vacuous (= null) set, 57
vacuous truth, 41, 57, 289 weaker relation, 165
valid argument, 293 weaker statement, 45
value (of a function), 64 Wedderburn, 408
Vandermonde determinant, 501, 576, 590 Weierstrass, 51
variance, 557 weight function, 148
variate, 545 well-defined, 166
vector, 434 well-ordered, 183, 196, 198
vector field, 435 Whitesitt, 283
vector space, 436 Wilson's theorem, 340, 341, 429, 433
-homomorphism, 438 word, 95, 302
-of matrices, 439 wye circuit, 269, 283
-of polynomials, 430
-of ordered n-tuples, 440 XOR-gate, 282
-of linear transformations, 440, 464
finite dimensional-, 445 Zariski, 408
basis for-, 445 Zassenhaus, 322
dual of-, 440, 465, 497 zero (= root), 420
Vendor Problem, 12, 28 zero divisor, 390, 437
solution to-, 94 zero group (see trivial group)
-related problems, 107, 108, 218, 529 zero homomorphism, 344
Venn diagram, 58
QA 39.2 .J67 1989
Joshi, K. D.
Foundations of discrete
mathematics
K.D. Joshi obtained his doctorate in
Mathematics from Indiana University, USA.
After working in topology for some time
(and having authored a book 'Introduction
to General Topology'), he developed an
interest in discrete mathematics and taught
a number of courses in it at the Indian
Institute of Technology, Bombay. The
present book and the sequel 'Applied
Discrete Structures' are a culmination of
his interest and experience in this area of
mathematics.
Mathematical Modelling
Each chapter of the book deals with mathematical modelling through one or more
specified techniques. Thus there are chapters on mathematical modelling through
algebra, geometry, trigonometry and calculus, through ordinary differential
equations of first and second order, through systems of differential equations,
through difference equations, through partial differential equations, through
functional equations and integral equations, through delay-differential, differential-
difference and integro-differential equations, through calculus of variations and
dynamic programming, through graphs, through mathematical programming,
and ' entropy ' '
Each chapter contains mathematical models from physical, biological, social
management sciences and engineering and technology and illustrates unity in
diversity of mathematical sciences.
The book contains plenty of exercises in mathematical modelling and is aimed to
give a panoramic view of applications of modelling in all fields of knowledge. It
contains both probabilistic and deterministic models.
The book, only the ' "_, of ‘ Iain L ' and can
be used as a text book at senior undergraduate or post-graduate level for a one or
two ‘ course for ‘ of "social and
biological sciences and engineering. It can also be useful for all users of
mathematics and for all mathematical modellers.
J.N. Kapur
Prof. Kapur is Honorary Visiting Professor at IIT Delhi and Delhi University; Senior
Scientist with Indian National Science Academy; Academic Secretary, Indian
Mathematical Society; President, Association of Mathematics Teachers of Delhi and
is Convener of Science Education Forum of Indian Science Congress Association.
He is Fellow of Indian National Science Academy, National Academy of Sciences
and Indian Academy of Sciences.
Earlier he was Vice-Chancellor of Meerut University, Head of Mathematics
Department of IIT Kanpur, Visiting Professor at Carnegie Mellon, Arkansas,
Manitoba, Waterloo, Carleton, New South Wales and Flinders Universities. He has
also been President of Indian Mathematical Society, Calcutta Mathematical Society,
Bharat Ganit Parishad, Indian Society of Theoretical and Applied Mechanics, Indian
Society of Agricultural Statistics, Indian Science Congress (Maths Section), National
Academy of Sciences (Physical Science Section) and Mathematics Association of
India.
Prof. Kapur has published more than 350 research papers in reputed journals,
50 books and 400 scholarly articles. His books include Insight into Mathematical
. . J .. -1 L - - Models o! r ' ' 'M'odels In Biology
and“ J ' “ '- ‘ 'Statisiics, ‘” " ' Fluid Flow. and Generalised
Maximum Entropy Principle.
ISBN 0—470-2w
JOHN WILEY & SONS
New York Chichester BI
ISBN 0-470-21152-0