Algebra and Number Theory
Lecture Notes∗
Szymon Brzostowski
November 1, 2020
1. The discussion class mark is the average of the marks from two or more tests.
The mark may be increased in special cases (to students taking active part in
the classes) up to one level up.
2. There will be one regular retake (scheduled just before or during the final exam-
ination period) for those students who fail any of the aforementioned tests.
3. The lecture mark is the final exam mark (checking students’ theory knowledge),
scheduled during the end-of-term examinations.
4. The final mark is comprised of the discussion class mark (50%) and the lecture
mark (50%), provided that both are passing marks.
5. „Second sit” test/examination will take place during the resit examination period.
Dates:
∗ This document has been written using the GNU TeXmacs text editor (see www.texmacs.org).
This text is intended to be a more detailed and formal version of the lecture itself. In order to get the most of it, I recommend reading it with your discussion-class notes at hand. Hopefully, this should allow for full comprehension of the material given.
Szymon Brzostowski
Notation. The symbol „:=” will denote an assignment, for example a := 2 means that the name a carries the value 2. More generally, „:=” can be treated as a definition of a new concept (on its left side) using already known concepts (appearing on its right side). The phrase „if and only if” will be abbreviated as „iff”.
Table of contents
1 Numbers and operations . . . 3
2 Which numbers are „best”? . . . 5
3 The first abstraction – the concept of a group . . . 7
4 ...Divisibility in Z . . . 9
5 ...Second abstraction – the concept of a ring . . . 11
6 The concept of an isomorphism . . . 12
7 The direct sum . . . 15
8 The third abstraction – the concept of a field . . . 17
9 Congruences . . . 18
10 GCD . . . 20
11 Solving a · x ≡n b and a ·n x = b . . . 23
12 The group Gn and the φ function . . . 26
13 Fermat’s little theorem and its generalizations . . . 30
14 Chinese remainder theorem . . . 35
15 The prime numbers . . . 38
16 Systems of linear equations . . . 42
16.1 From one to several equations . . . 42
16.2 Triangular systems and back substitution . . . 45
16.3 Matrix notation for a linear system . . . 46
16.4 Row echelon form of a matrix . . . 50
17 The algebra of matrices . . . 53
18 Which matrices are invertible? . . . 57
19 Invertibility and determinants . . . 62
20 Linear systems revisited . . . 66
Further reading . . . 71
Note that by convention 0 ∉ N. The symbol N0 will denote the set N ∪ {0}.
Example.
1. Here’s how you add two complex numbers:
(2 + i) + (1 − i) = (skip the brackets) 2 + i + 1 − i = (collect similar terms) 3 + (1 − 1)·i = 3 + 0·i = 3.
2. And here is multiplication:
(2 + i) · (1 − i) = 2 − 2i + i − i² = 2 − i + 1 = 3 − i.
Analyzing the first example above, it is not hard to produce a general rule for the addition of complex numbers:
Addition Rule in C. ∀ a, b, c, d ∈ R: (a + b·i) + (c + d·i) := (a + c) + (b + d)·i.
(The universal quantifier ∀ is read „for all”.)
A similar formula for the product of two complex numbers holds (try to find it yourself!):
Multiplication Rule in C. ∀ a, b, c, d ∈ R: (a + b·i) · (c + d·i) := (a·c − b·d) + (a·d + b·c)·i.
Observe that thanks to using symbols instead of specific numbers, the above formulas work for every choice of (real) values of a, b, c, d. So although it is enough for you to know how to calculate with complex numbers to arrive at the correct result, these formulas give you the result almost automatically once you remember them. In algebra you generally work with symbols and you calculate using a certain set of rules for manipulating the symbols. Some of these rules you should already be familiar with.
The basic rules for computations with numbers are
a + b = b + a (commutativity)
• (−1 + 2) + 1/3 = 1 + 1/3 = 4/3 = −1 + 7/3 = −1 + (2 + 1/3), and – in symbols –
(a + b) + c = a + (b + c) (associativity)
• (2 + 1) · 1/3 = 3 · 1/3 = 1 = 2/3 + 1/3 = 2 · 1/3 + 1 · 1/3, and – in symbols –
(a + b) · c = a·c + b·c (distributivity)
(This law allows you to distribute multiplication over addition.)
Note that the first two rules above are also true when multiplying numbers (i.e. with + replaced by ·). This suggests that perhaps such rules could be interpreted more abstractly. Namely, the symbol + (or ·) might denote an unspecified binary operation. Of course, in such generality it might be appropriate to use some new symbols (like ⊕, ⊗, ∗ or ∘) for the binary operations and sometimes it is done so, especially when there is a danger of some confusion. Observe however, that in the case of complex numbers that we have considered, we used the ordinary symbols + and ·, rather than some new ones, and we called them „addition” and „multiplication”, respectively. This is natural, because those operations are in fact extensions of the usual operations of addition and multiplication from the real numbers to a wider domain (this means that the „new” binary operations work in the „old” way for those complex numbers which are real, e.g. (2 + 0i) + (3 + 0i) = 5 + 0i and similarly (2 + 0i) · (3 + 0i) = 6 + 0i). Also in computer science, a common practice is to implement some new binary operations and still denote them by + and ·, although in fact they overload or even totally redefine the standard operations of addition and multiplication.
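To connect the two threads above – the addition and multiplication rules for C and the practice of overloading + and · in programming – here is a minimal sketch in Python (the class name and layout are illustrative, not taken from the notes) of a complex-number type built directly from the two rules:

    # A sketch: a complex number a + b*i stored as a pair (a, b),
    # with "+" and "*" overloaded exactly according to the two rules above.
    class Cx:
        def __init__(self, a, b):
            self.a, self.b = a, b          # real part a, imaginary part b

        def __add__(self, other):
            # Addition Rule: (a + b i) + (c + d i) = (a + c) + (b + d) i
            return Cx(self.a + other.a, self.b + other.b)

        def __mul__(self, other):
            # Multiplication Rule: (a + b i)(c + d i) = (ac - bd) + (ad + bc) i
            return Cx(self.a * other.a - self.b * other.b,
                      self.a * other.b + self.b * other.a)

        def __repr__(self):
            return f"{self.a}+{self.b}i"

    print(Cx(2, 1) + Cx(1, -1))   # 3+0i, as in the first example above
    print(Cx(2, 1) * Cx(1, -1))   # 3+-1i, i.e. 3 - i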
Commentary. Proving the usual empirical properties of the operations + and · is not totally trivial, even for N. First, one needs to properly define the set N (this can be done using Set Theory) and the operations + and ·. Only then is it possible to actually prove their properties (this is done in a branch of mathematics called Theoretical Arithmetic). Fortunately, you have so much empirical evidence for their validity from everyday life that you should not feel uncomfortable using those properties.
Going back to our question again: are the various sets of numbers essentially different? Yes, they are... But only when you start thinking about solving equations in them. Suppose we want to solve the equation 2 + x = 1 in the naturals. Clearly, this is impossible because we have
x ∈ N:  1, 2, 3, ...
2 + x:  3, 4, 5, ...
– a result always greater than 1! But the same equation has a (unique!) solution x = −1 in Z. A difference!
The following table shows which equations can be solved in the various number sets:

Equation to solve   | N  | Z  | Q   | R   | C
a + x = b           | NO | YES| YES | YES | YES
a·x = b, a ≠ 0      | NO | NO | YES | YES | YES
x² = 2              | NO | NO | NO  | YES | YES
x² = −1             | NO | NO | NO  | NO  | YES    (1)
As we can see, Q is different than Z if considered with multiplication (e.g. 2·x = 1 has no solutions in Z; on the other hand the equation 3·x = 5 does have the solution x = 5/3 in Q). Also, C is „better” than the other sets with respect to solving equations. But what is the difference between Q and R? Part of the answer is shown in the above table. The full answer is beyond the scope of Algebra: the real difference is that in R there exist „limits” for all the sequences that „should have” a limit. The same property is true for C. Hence, R and C have different analytic features than Q.
(For example, the increasing sequence of rational numbers 1/1², 1/1² + 1/2², 1/1² + 1/2² + 1/3², ... is bounded, so it „should have” a limit, but its limit is not a rational number.)
Table (1) may seem to suggest that the complex numbers are „the best”. This is, indeed, true in many respects but not in all of them. For example, the numbers N, Z, Q, R are linearly ordered, thus are comparable and can be drawn on a line:

[a number line marking e.g. −1, 1/2, √2, 2, 3, π + 1]

This is not the case for C. Still, there exists a nice way to visualize complex numbers – they can be drawn on a plane.
Example (the well-ordering principle of N). The set A := {the natural numbers that are divisible by 2 and 5} = {10, 20, 30, ...} has its minimum equal to min A = 10.
We can sum up the above discussion by agreeing that there are various possible
viewpoints on the properties of numbers; hence there is no obvious choice of „the best
numbers”. The various viewpoints are developed in different branches of mathematics:
• Algebra (properties of binary operations),
• Number Theory (divisibility and prime numbers),
• Linear Algebra (vectors, matrices and operations on them).
The lectures will cover the very basics of these three branches of mathematics.
Definition 1. (Group) Let G be a non-empty set and let ∘ be a binary operation on G. The pair (G, ∘) is called a group if:
G1∘. ∀ a, b, c ∈ G: (a ∘ b) ∘ c = a ∘ (b ∘ c) (associativity),
G2∘. ∃! e ∈ G ∀ a ∈ G: a ∘ e = e ∘ a = a (existence of identity element),
G3∘. ∀ a ∈ G ∃ b ∈ G: a ∘ b = b ∘ a = e (existence of inverse element).
The element e ∈ G involved in G2∘ and G3∘ is one and the same and is called the identity, while the element b ∈ G satisfying G3∘ is called the inverse element of a ∈ G and denoted by a⁻¹. Hence, G3∘ can be written in the form a ∘ a⁻¹ = a⁻¹ ∘ a = e.
If moreover
G4∘. ∀ a, b ∈ G: a ∘ b = b ∘ a (commutativity)
holds, the group (G, ∘) is called abelian (or commutative).
(The existential quantifier ∃ is read „there exists”. The symbol ∃! means „there exists exactly one”.)
Exercise 2. Show that the condition G2∘ is equivalent to the condition G2′: ∃ e ∈ G ∀ a ∈ G: a ∘ e = e ∘ a = a; in other words, an element e ∈ G satisfying G2′ is automatically the unique such element.
Exercise 3. Show that in a group G the element a⁻¹ ∈ G inverse to an element a ∈ G is indeed uniquely determined.
At first sight, the definition of a group does not seem to be connected with the equations a ∘ x = b, a, b ∈ G. But it turns out that in a group such equations are always solvable. We list this together with some other basic properties of groups:
Property 1. Let (G, ∘) be a group. Then
1. ∀ a, b ∈ G: (a⁻¹)⁻¹ = a ∧ (a ∘ b)⁻¹ = b⁻¹ ∘ a⁻¹,
2. ∀ a, b, c ∈ G: ((a ∘ c = b ∘ c) ⇒ a = b) ∧ ((c ∘ a = c ∘ b) ⇒ a = b) (cancellation laws),
3. ∀ a, b ∈ G ∃! c ∈ G: a ∘ c = b and ∀ a, b ∈ G ∃! d ∈ G: d ∘ a = b (solvability of a ∘ x = b and x ∘ a = b).
Commentary. Property 1 works for all groups; in particular it is valid for the example
groups given below. This is a manifestation of the power of abstract thinking.
8
So far, we have not spoken explicitly about the division „/” and subtraction „ ”
8
operators. The reason for that is that in a group they can be defined using the group
operation (or +). Namely, in the additive case we put a;b 2G
a b :=a +( b) and in
a/b = a :=a b 1
the multiplicative case —
a;b 2G b .
Examples of groups. (Some of these will be discussed in greater detail at the classes.)
I. (C, +), (R, +), (Q, +), (Z, +), ({0}, +) are abelian groups with respect to the usual addition.
II. (C*, ·), (R*, ·), (Q*, ·), ({−1, 1}, ·), where the asterisk means we exclude the number 0, are abelian groups with respect to the usual multiplication (check this for C!).
III. (R+, ·), (Q+, ·), where R+ := (0, +∞), Q+ := Q ∩ R+, are also abelian groups.
VI. The set μn ⊆ C of the n-th roots of unity, that is, of the complex numbers ε such that ε · ε · ... · ε (n factors) = εⁿ = 1. With respect to (complex) multiplication, (μn, ·) is an abelian group.
VII. The circle S¹ ⊆ C with center 0 and radius 1 consists of the complex numbers of the form z = cos φ + i·sin φ, where φ ∈ [0, 2π). S¹ is an abelian group with respect to the complex multiplication. (This, as well as example VI, follows from the formula for the product of two complex numbers written in their trigonometric form: (cos φ + i·sin φ)(cos ψ + i·sin ψ) = cos(φ + ψ) + i·sin(φ + ψ).)
VIII. Let X be any set. The pair (2^X, △), where 2^X is the set of all the subsets of the set X and △ denotes the symmetric difference of sets, that is A △ B := (A ∪ B) ∖ (A ∩ B), constitutes an abelian group.
IX. Consider the set Bn of binary strings of length n. We may define a bitwise operation on such strings using the exclusive or (xor) operation. Namely, for bits we put:

Input | xor
0 0   | 0
0 1   | 1
1 0   | 1
1 1   | 0

so that e.g. 0 xor 0 = 0 and so on, and for two binary strings s = s1...sn, t = t1...tn ∈ Bn we put: s xor t := (s1 xor t1) ... (sn xor tn) (e.g. 01 xor 11 = (0 xor 1)(1 xor 1) = 10). The ordered pair (Bn, xor) is a group (check it!).
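A quick way to convince yourself that (Bn, xor) behaves like a group is to experiment with it; a small Python sketch (illustrative, not part of the original notes):

    def xor_strings(s, t):
        """Bitwise xor of two binary strings of the same length n."""
        assert len(s) == len(t)
        return "".join("1" if a != b else "0" for a, b in zip(s, t))

    print(xor_strings("01", "11"))        # "10", as in the example above
    # The identity element is the all-zero string, and every string is its own inverse:
    print(xor_strings("0110", "0000"))    # "0110"
    print(xor_strings("0110", "0110"))    # "0000"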
The last example turns out to be a very special case of a more general phenomenon,
having its source in Number Theory. Namely, this is an instance of modular arith-
metic... (see Section 7 for an explanation of this connection)
4 ...Divisibility in Z
The basic definition is:
Definition 2. Let k, l ∈ Z. We say that k divides l, and we write k | l or l ≡ 0 (mod k) (read: „l is congruent to 0 modulo k”), if there exists an integer m ∈ Z such that l = m·k. We may also say that k is a divisor of l and that l is a multiple of k.
Example. 3 | 6 because 3 · 2 = 6.
What happens if an integer k is not divisible by an integer l? We may perform a „partial division”:
Theorem 1. (Division with remainder) Let k, l ∈ Z, l ≠ 0. Then there exist (unique!) integers q, r such that
k = q·l + r and 0 ≤ r < |l|.    (2)
q is called the partial quotient and r – the remainder (on division of k by l).
(In the above theorem |·| denotes the modulus (or the absolute value) of a real number, which is equal to its distance from 0.)
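Theorem 1 differs slightly from the built-in integer division of many programming languages, which may return negative remainders; a small Python sketch (the helper name is mine, for illustration) producing a remainder satisfying 0 ≤ r < |l|:

    def div_with_remainder(k, l):
        """Return (q, r) with k = q*l + r and 0 <= r < |l| (Theorem 1); l must be nonzero."""
        q, r = divmod(k, abs(l))   # Python guarantees 0 <= r < abs(l) here
        if l < 0:
            q = -q                 # flip the quotient's sign, the remainder stays
        return q, r

    print(div_with_remainder(7, 3))    # (2, 1):   7 = 2*3 + 1
    print(div_with_remainder(-7, 3))   # (-3, 2): -7 = -3*3 + 2
    print(div_with_remainder(7, -3))   # (-2, 1):  7 = (-2)*(-3) + 1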
Definition 3. (Modular addition and multiplication) ∀ m, n ∈ Zl:
m +l n := (m + n)(mod l),
m ·l n := (m · n)(mod l).
Clearly, we defined two operations +l, ·l : Zl × Zl → Zl. What are their properties?
Theorem 2. (The algebraic structure of Zl)
1. (Zl, +l) is an abelian group with the number 0 ∈ Zl as the identity.
2. ·l is associative and commutative and has 1 ∈ Zl as the identity.
3. ·l is distributive over +l, that is, ∀ a, b, c ∈ Zl: (a +l b) ·l c = (a ·l c) +l (b ·l c).
Let us illustrate these properties with some specific examples.
Example.
• 2 +5 3 = (2 + 3) mod 5 = 5 mod 5 = 0; hence 3 is the +5-inverse element to 2 and this fact can be written as (cf. the comments on page 7 following Definition 1) −2 = 3 (in Z5); more generally, for a ∈ Zl it holds −a = (l − a)(mod l).
• (1 +7 2) +7 3 = (3 mod 7) +7 3 = 3 +7 3 = 6 mod 7 = 6 and 1 +7 (2 +7 3) = 1 +7 5 = 6 mod 7 = 6, in accordance with associativity.
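The operations +l and ·l of Definition 3 are one-liners in code; a short illustrative sketch in Python reproducing the Z5 and Z7 computations above:

    def add_mod(m, n, l):
        return (m + n) % l      # m +_l n, Definition 3

    def mul_mod(m, n, l):
        return (m * n) % l      # m ._l n, Definition 3

    print(add_mod(2, 3, 5))                    # 0, so 3 is the additive inverse of 2 in Z_5
    print(add_mod(add_mod(1, 2, 7), 3, 7))     # 6
    print(add_mod(1, add_mod(2, 3, 7), 7))     # 6, the same (associativity)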
Convention. To simplify the notation, we may write just + and · in place of +l and ·l (respectively) upon remarking that the calculations are performed in Zl.
Theorem 2 shows that the triple (Zl, +l, ·l) constitutes a quite nice algebraic structure. Note also that those nice properties seem to be inherited from (Z, +, ·), where they also hold. More generally, every triple (Q, +, ·), (R, +, ·), (C, +, ·) has such properties, too (check this for C!). Thus we have come to our...
3. there exists an identity element 1 ∈ R for the multiplication ·,
4. · is left and right distributive over +, that is, ∀ r, s, t ∈ R:
r · (s + t) = (r · s) + (r · t),
(s + t) · r = (s · r) + (t · r).
If, moreover, · is commutative, we say that R is a commutative ring (with unity).
Convention. It is customary to agree that · has greater priority than +. This allows one to skip some brackets. For example, we can simply write (a + b) · c = a·c + b·c because we understand that the products a·c, b·c should be computed first, and only after that may we compute the value of a·c + b·c by performing the addition.
Example.
I. (Z, +, ·), (Q, +, ·), (R, +, ·), (C, +, ·) are commutative rings with unity.
II. By Example VIII on page 9, (2^X, △) is an abelian group. One can check that (2^X, △, ∩) constitutes a commutative ring with unity (what are its identity elements?).
III. If R is a ring then the set R[X] of polynomials with coefficients in R (that is, „formal sums” of the form r0 + r1·X + ... + rn·Xⁿ, where r0, ..., rn ∈ R, n ∈ N0), together with the natural operations of addition and multiplication of polynomials, also constitutes a ring.
IV. Let X ≠ ∅ and let R be a ring. The set R^X of functions f: X → R is itself a ring with the operations defined as follows:
1. ∀ f, g ∈ R^X: f + g is the function defined by ∀ x ∈ X: (f + g)(x) := f(x) + g(x) (the addition on the right-hand side is the one in R),
2. ∀ f, g ∈ R^X: f · g is the function defined by ∀ x ∈ X: (f · g)(x) := f(x) · g(x) (the multiplication on the right-hand side is the one in R).
Moreover, if R is commutative then also (R^X, +, ·) is a commutative ring.
8
The reason why this does not lead to a confusion is that the set of constant functions
R R
in X is an isomorphic copy of the ring (see definition below), so one can calculate
with these constant functions in the same way as in R. If, for example, = Z then R
we have: 1+2=3 ,x2 x x x x
1( )+2( )=3( ), where 1( ) is the constant function sending
every x 2 X to 1 2 Z and similarly for 2(x), 3(x).
X
2. φ(0) = 0, φ(1) = 1, ... .
The bijection φ is then called an isomorphism (of the structures A and B). The notation A ≅ B or A ≅φ B will indicate that A and B are isomorphic (by means of the isomorphism φ).
If we do not require φ to be a bijection (so it is just a function from A to B) but the conditions 1 and 2 are satisfied for φ, we call such a φ a homomorphism of the structures A and B.
Definition 6. (Isomorphism of groups) Let (G, ∘), (H, ∘) be two groups. We say that G is homomorphic to H if there exists φ: G → H such that ∀ a, b ∈ G:
φ(a ∘ b) = φ(a) ∘ φ(b);
this φ is called a (group) homomorphism.
If, moreover, φ is a bijection, it is called an isomorphism (of groups) and the groups G and H are said to be isomorphic. In symbols: G ≅ H or G ≅φ H.
The question is: why in this definition do we not require φ to satisfy condition 2 of the general Definition 5? The answer is simple: in the case of groups this condition is automatically satisfied for a group homomorphism φ. Namely, if e ∈ G and e′ ∈ H are the identities (= „the distinguished elements”) of the respective groups, we have: φ(e) ∘ e′ = φ(e) = φ(e ∘ e) = φ(e) ∘ φ(e), and by Property 1 item 2 applied to the group H this implies that φ(e) = e′, which is the missing condition.
In general however, such good behaviour cannot be taken for granted. For example, we may define φ: (Z, +, ·) → (Z, +, ·) by the formula φ(k) := 0 for k ∈ Z. It is immediate to notice that φ(k + l) = 0 = φ(k) + φ(l) and φ(k · l) = 0 = φ(k) · φ(l) for all integers k, l, so one might be tempted to treat φ as a homomorphism of rings. This is ok if one considers rings without unity (sometimes it is done so), but according to our definition (Definition 4) a ring has got the distinguished element 1, so its homomorphism should carry 1 to 1.
Thus, in the case of rings (with unity) we must explicitly demand this condition:
Definition 7. (Isomorphism of rings) Let (R, +, ·), (S, ⊕, ⊙) be two rings. We say that R is homomorphic to S if there exists φ: R → S such that
1. ∀ a, b ∈ R: φ(a + b) = φ(a) ⊕ φ(b) ∧ φ(a · b) = φ(a) ⊙ φ(b),
2. φ(1) = 1;
this φ is called a (ring) homomorphism.
If, moreover, φ is a bijection, it is called an isomorphism (of rings) and the rings R and S are said to be isomorphic. In symbols: R ≅ S or R ≅φ S.
Again, here the condition φ(0) = 0 is automatic, since Definition 7 implies that this ring homomorphism φ is also a homomorphism of the additive groups (R, +) and (S, ⊕) in the sense of Definition 6.
Examples (check!).
1. For a group G (or a ring R) the identity function Id defined by Id(x) := x, for x ∈ G (x ∈ R), is always an isomorphism of groups (rings).
2. The group (Z, +) is isomorphic to the group (2Z, +), where 2Z := {2k : k ∈ Z} is the set of even numbers. The isomorphism can be specified as φ(x) := 2x, x ∈ Z.
3. For the group (R, +) one can define an automorphism (i.e. an isomorphism of a group (a ring, ...) with itself) by the formula φ(x) := −x, for x ∈ R.
4. Similarly, the function φ given by φ(z) := z̄, z ∈ C (here the bar denotes complex conjugation), is an automorphism of the ring (C, +, ·).
5. For the group (Q*, ·) one can define its homomorphism into the group (Q>0, ·) as φ(x) := |x|, x ∈ Q* (here Q>0 := {q ∈ Q : q > 0}). Another possible homomorphism ...
6. Let l ∈ N, l ≥ 2. The function φ given by φ(x) := x (mod l), x ∈ Z, is a homomorphism of the rings (Z, +, ·) and (Zl, +l, ·l).
7. The function φ: (R, +) → (R>0, ·) given by φ(x) := 2^x, x ∈ R, is a group isomorphism.
8. For a polynomial ring (R[X], +, ·) (see Example III on page 11) and a chosen element r ∈ R one can define the evaluation homomorphism φr by φr(w) := w(r), w ∈ R[X], into the ring (R, +, ·).
9. The group (μn, ·) of n-th roots of unity is isomorphic to the group (Zn, +n). The isomorphism can be given by the formula φ(cos(2πk/n) + i·sin(2πk/n)) := k, for k ∈ {0, ..., n − 1}.
1 ↦ 0, i ↦ 1, −1 ↦ 2, −i ↦ 3.
Under this identification, let us compare the multiplication (addition) tables of the respective groups:

·   | 1   i   −1  −i          +4 | 0 1 2 3
1   | 1   i   −1  −i          0  | 0 1 2 3
i   | i   −1  −i  1           1  | 1 2 3 0
−1  | −1  −i  1   i           2  | 2 3 0 1
−i  | −i  1   i   −1          3  | 3 0 1 2
This direct inspection shows that the above tables are essentially identical. Namely, let us try to forget about the actual values represented by the various numbers (1, i, ...; 0, 1, ...) and try to think of them as names („1”, „i”, ...; „0”, „1”, ...), and similarly we may think about an abstract operation ∘ instead of · and +4. Better yet, let us introduce new names, e.g. a, b, c, d. Then the above two tables can be written in a unified way:

∘ | a b c d
a | a b c d
b | b c d a
c | c d a b
d | d a b c
This table represents a group ({a, b, c, d}, ∘) much in the way Algebra actually treats the groups it studies – this representation is more abstract than the two specific representations above. Seeing this amounts to noting that the substitutions a := 1, b := i, c := −1, d := −i and ∘ := · lead to the table of the group (μ4, ·), while the substitutions a := 0, b := 1, c := 2, d := 3 and ∘ := +4 lead to the table of the group (Z4, +4) (see
above). Thus, the concept of an isomorphism gives us a way to say that two given
groups (rings,...) are just instances of one and the same abstract group (ring,...). This
„abstract structure” is what Algebra is actually interested in, because it can be studied
without contemplating „the nature of its elements” and the real interest, which lies in
the properties of the binary operation itself, is more transparently exposed.
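The check that the two tables really describe one and the same abstract group can be done mechanically; a small Python sketch (purely illustrative, not part of the original notes), encoding the identification 1 ↦ 0, i ↦ 1, −1 ↦ 2, −i ↦ 3:

    # mu_4 = {1, i, -1, -i} represented as complex numbers; phi(i^k) = k.
    mu4 = [1, 1j, -1, -1j]
    phi = {z: k for k, z in enumerate(mu4)}     # 1 -> 0, i -> 1, -1 -> 2, -i -> 3

    # Check phi(x * y) = phi(x) +_4 phi(y) for all x, y in mu_4.
    ok = all(phi[x * y] == (phi[x] + phi[y]) % 4 for x in mu4 for y in mu4)
    print(ok)   # True: the two group tables coincide up to renaming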
There is a close analogy between this point of view on groups and object-ori-
ented programming – specific groups are instances of an „abstract group” just the
way specific objects are instances of the class they instantiate. Keeping this analogy,
one might also say that Algebra is concerned only with the properties of the classes,
not the objects themselves.
To illustrate what sort of properties Algebra is interested in, let us define the notion of a zero-divisor in a ring: an element r of a ring R is called a (left) zero-divisor if there exists s ∈ R, s ≠ 0, such that r · s = 0. Let r ∈ R be a zero-divisor in a ring R and let (R, +, ·) ≅φ (R̂, +̂, ·̂). We have r · s = 0 for some element s ∈ R, s ≠ 0. Hence, φ(0) = φ(r · s) = φ(r) ·̂ φ(s). By the comments after Definition 7 we know that φ(0) = 0, which means that we get the relation 0 = φ(r) ·̂ φ(s). Since φ is one-to-one and s ≠ 0, also φ(s) ≠ 0. By the above definition this means that the element φ(r) (identified with r ∈ R by means of the isomorphism φ) is also a zero-divisor in the ring R̂. Thus the property of „being a zero-divisor in a ring” is an invariant of isomorphisms. Of course, one may similarly define and analyze the notion of a right zero-divisor. In the following, „a zero-divisor” will stand for a left or right (or maybe simultaneously left and right) zero-divisor.
We summarize some basic properties of morphisms, often used in practice:
Property 2. Let φ: G → H, ψ: H → K be homomorphisms of groups (rings). Then:
1. ψ ∘ φ is also a homomorphism,
2. ψ ∘ φ is an isomorphism if φ and ψ are isomorphisms,
3. if φ is an isomorphism then also φ⁻¹ is an isomorphism,
4. the set Aut(G) of automorphisms G → G is a group with respect to composition of maps ∘,
5. if G is a group then ∀ x ∈ G: φ(x⁻¹) = (φ(x))⁻¹.
Let us go back to example IX on page 9. The table for the xor operation on bits can be compared against the following table:

Input | +2
0 0   | 0
0 1   | 1
1 0   | 1
1 1   | 0
The immediate conclusion is that we have (Z2, xor) ≅ (Z2, +2), i.e. the operations xor and +2 are the same in Z2. Moreover, we extended this xor = +2 to binary strings of length n, bitwise. Whatever „a string” means to you, you should agree that working with binary strings in a bitwise way is the same (= the resulting algebraic structures are isomorphic) as working with n-tuples (k1, ..., kn) ∈ Z2 ⊕ ... ⊕ Z2 (n factors) componentwise, e.g. 01 xor 11 = 10 corresponds to (0, 1) ⊕ (1, 1) = (0 +2 1, 1 +2 1) = (1, 0).
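A short Python sketch (illustrative) of the claim that bitwise xor on strings and componentwise addition of Z2-tuples are the same operation, up to the obvious identification:

    def xor_strings(s, t):
        return "".join("1" if a != b else "0" for a, b in zip(s, t))

    def add_tuples(u, v):
        # componentwise addition in Z_2 + ... + Z_2 (n factors)
        return tuple((a + b) % 2 for a, b in zip(u, v))

    print(xor_strings("01", "11"))     # "10"
    print(add_tuples((0, 1), (1, 1)))  # (1, 0) -- the same element under the identification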
• Another example of a direct sum is just the n-dimensional space Rⁿ with coordinatewise addition of real numbers (here ⊕ becomes the usual addition of vectors).
Is there any relation between the groups G, H and their direct sum G ⊕ H? Well, note that usually when we think e.g. about the plane R², we think of it as embedding the line R, as in the following picture:
[picture: the line R drawn inside the plane R²]
Hence, the real numbers are identified with the subset R × {0} of R². A similar idea works in the general case. If (G, +) and (H, +̂) are groups with identities equal to 0, 0̂ (respectively), we may define the following maps:
ιG: G ∋ x ↦ (x, 0̂) ∈ G ⊕ H and ιH: H ∋ y ↦ (0, y) ∈ G ⊕ H.
It is straightforward to check that ιG, ιH are injective (= one-to-one) homomorphisms; hence in G ⊕ H there are isomorphic copies ιG(G) = G × {0̂} and ιH(H) = {0} × H of the groups G, H, respectively. As explained earlier, from the point of view of Algebra we may thus think that G ⊆ G ⊕ H and H ⊆ G ⊕ H. It is worth noting that similar identifications are made in other branches of mathematics; for example, in Analysis one also often treats R as a subset of Rⁿ.
• (Z2, +2, ·2) is a field.
• In Z3 = {0, 1, 2} we have of course 1 ·3 1 = 1 and moreover 2 ·3 2 = 1. Hence also (Z3, +3, ·3) is a field.
• In Z4 = {0, 1, 2, 3}, however, the element 2 has no multiplicative inverse: 2 ·4 1 = 2, 2 ·4 2 = 0, 2 ·4 3 = 2, so (Z4, +4, ·4) is not a field.
In the last example a direct inspection showed that 2 does not have a multiplicative inverse in Z4. Even worse, 2 is in fact a zero-divisor in this ring because 2 ·4 2 = 0. For larger rings Zn (i.e. for big n ∈ N) such verification could be very tedious (or even computationally infeasible) without some additional knowledge. For instance, checking if an element a ∈ Z1000 is invertible requires (potentially) the computation of 999 products. Hence, our next medium-term objective is to characterize the invertible elements in Zn and learn how to compute the inverses. This will be done in Section 11, because to accomplish this we need some new tools. But first, let us note a necessary condition for invertibility, suggested by the last example above.
Property 3. If a ∈ Zn is an integer such that 1 < a and a | n, then a is a zero-divisor in Zn and has no multiplicative inverse. More generally, in any ring a zero-divisor is not invertible.
The notion of an isomorphism in case of fields is essentially the same as the one for
rings. Since, however, fields have „better” multiplicative structure than rings, one can
state a seemingly weaker definition:
Definition 10. (Isomorphism of fields) Let (K, +, ·), (L, ⊕, ⊙) be two fields. We say that K is homomorphic to L if there exists φ: K → L such that φ ≠ 0 (i.e. φ is not identically equal to 0) and ∀ a, b ∈ K:
φ(a + b) = φ(a) ⊕ φ(b) ∧ φ(a · b) = φ(a) ⊙ φ(b);
this φ is called a (field) homomorphism (or an embedding).
If, moreover, φ is a bijection, it is called an isomorphism (of fields) and the fields K and L are said to be isomorphic. In symbols: K ≅ L or K ≅φ L.
In order to explain why this definition gives an isomorphism in the sense of the general Definition 5, we note the following: since φ of Definition 10 satisfies φ ≠ 0, we can choose x ∈ K such that φ(x) ≠ 0, and then we get φ(x) = φ(x · 1) = φ(x) ⊙ φ(1), which multiplied by φ(x)⁻¹ gives φ(1) = 1. This is all that we need, because the relation φ(0) = 0 holds for the same reasons as in the case of ring homomorphisms. But there is more to the story:
Property 4.
• A field has got no zero-divisors other than 0 (= it is an integral domain).
• Every field homomorphism is injective (hence the name embedding).
Note that the latter property means that in order to check if a field homomorphism is in fact a field isomorphism, it is enough to check if it is „onto”.
Property 3 explained that an invertible element of Zn cannot divide the number n. But this is not enough, as it turns out (e.g. 6 ·9 6 = 0 in Z9 and 6 ∤ 9). The invertible elements of Zn turn out to be exactly the non-zero-divisors. We will see this shortly, but first let us change our language a little...
9 Congruences
The concept of „calculating modulo n” can be extended to the whole set of integers, which has proved to be quite useful in applications. First we give the basic definition.
Definition 11. (Congruent numbers) Two integers k, l are said to be congruent modulo n, where n ∈ N, if k − l ≡ 0 (mod n) (see Definition 2), that is, if n | (k − l). For k and l congruent modulo n we will write k ≡ l (mod n) or k ≡n l.
Note that in particular k ≡ 0 (mod n) (in the new sense) iff n | (k − 0) = k, so the new notation is consistent with the notation introduced in Definition 2.
The relation ≡n interacts nicely with the arithmetic operations in Z:
Theorem 4. (Arithmetic properties of congruences) Let n ∈ N. Then
1. ∀ a, b, c, d ∈ Z: (a ≡n b ∧ c ≡n d) ⇒ (a + c ≡n b + d) (sidewise addition of congruences),
2. ∀ a, b, c, d ∈ Z: (a ≡n b ∧ c ≡n d) ⇒ (a·c ≡n b·d) (sidewise multiplication of congruences).
Example. 10 ≡ 26 (mod 4) and −1 ≡ 11 (mod 4), so 9 = 10 − 1 ≡4 26 + 11 = 37. Also 10 · (−1) ≡4 26 · 11.
Theorem 4 allows us to step out of the ring Zn and calculate more „freely” with integers, at the cost of not having strict equalities (which we had in Zn) but only relations (≡n in Z). But note that at the end of such calculations we can always produce a result that lies in Zn (by taking the remainder modulo n) and this, in fact, will be the correct result in this ring (if the arguments to the calculations also came from Zn).
Example. In Z15 we have
(10 +15 (−3)) ·15 13 = (10 +15 12) ·15 13 = 7 ·15 13 = 1.
On the other hand, we can perform the ordinary addition and multiplication and only reduce the final result to its remainder:
(10 + (−3)) · 13 = 7 · 13 = 91 ≡ 1 (mod 15).
As we can see, the result is the same.
Illustrative
as it is, the above example is somewhat misleading because the real
power of calculating with congruences comes from the exact opposite application of
Theorem 4: namely, one can always reduce any term of the expression to compute to
its remainder and still have a valid congruence.
Example. Let’s say we want to compute the remainder (134 · (−21) + 16)(mod 6). We know that 16 ≡ 4 (mod 6), 134 ≡ 2 (mod 6), −21 ≡ 3 (mod 6). Hence, by Theorem 4 item 2 we have 134 · (−21) ≡6 2 · 3 and then by item 1 – 134 · (−21) + 16 ≡6 2 · 3 + 4. Now it is easy to finish: 2 · 3 + 4 = 10 ≡ 4 (mod 6). And this is the answer to our problem.
If you think of what has just happened, you may feel a little amazed, because in order to find the result we didn’t need to compute the product 134 · (−21) at all!
Actually, such calculations can be performed in a much more flexible way than done above (again, this follows from Theorem 4). For instance, we can also do something like this: 134 · (−21) + 16 ≡ 2 · (−21) + 34 = −42 + 34 = −8 ≡ 4 (mod 6), where each replaced number is congruent modulo 6 to the one it replaces.
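Such manipulations are easy to double-check by machine; a tiny Python verification of the example above (illustrative only):

    print((134 * (-21) + 16) % 6)   # 4, computed directly
    print((2 * 3 + 4) % 6)          # 4, after reducing 134, -21 and 16 modulo 6 first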
At the level of congruences one can study „equations” similar to those considered in Zn, too:
a + x ≡n b and a · x ≡n b.
The first congruence is easy to solve. For given a, b ∈ Z all the solutions of a + x ≡n b are the integers x ∈ {b − a + k·n : k ∈ Z}. Of course, among such numbers there is a solution equal to (b − a)(mod n) ∈ Zn (for k = −⌊(b − a)/n⌋, where ⌊·⌋ denotes the floor function). If, moreover, a, b ∈ Zn then solving a + x ≡n b in integers amounts to solving a +n x = b in Zn and then using the (unique!) solution x = b −n a (see page 8 for the definition of −n) to produce all the integer solutions {(b −n a) + k·n : k ∈ Z} to the congruence a + x ≡n b. These observations indicate that there is no essential difference between the ability to solve a +n x = b and a + x ≡ b (mod n), but the latter is „more general” than the former in that we allow here a, b to be arbitrary integers (not just elements of Zn). Similarly, we will be able to solve the equation a ·n x = b in Zn once we can solve the congruence a·x ≡n b in integers. The tool that is needed to solve such congruences will be developed in the next section.
10 GCD
The basic definitions that we now give should be more or less familiar to you:
Definition 12. (GCD) Let k1, ..., kr ∈ Z. A natural number d is called the greatest common divisor (gcd) of the numbers k1, ..., kr if
1. ∀ 1 ≤ j ≤ r: d | kj,
2. if c ∈ Z is such that ∀ 1 ≤ j ≤ r: c | kj, then c | d.
The gcd of k1, ..., kr is denoted by gcd(k1, ..., kr) or simply by (k1, ..., kr).
Definition 13. (Coprime numbers) Two integers k, l are called coprime if (k, l) = 1. Coprime numbers are sometimes denoted by k ⊥ l.
The next theorem should not be a surprise to you:
Theorem 5. (On GCD) For any k1, ..., kr ∈ Z not all equal to zero, their gcd always exists and is uniquely determined.
In principle, Theorem 5 is a consequence of the following method of finding the gcd: find the sets D(kj) of all the positive divisors of the numbers kj, compute the intersection ⋂ D(kj) over 1 ≤ j ≤ r and take the maximum of this (finite! if not all kj are 0) set. This number will be the gcd(k1, ..., kr).
Examples.
• For any k ∈ Z we have gcd(k) = |k| because |k| is the greatest divisor of k.
• (4, 6) = 2 because D(4) = {1, 2, 4}, D(6) = {1, 2, 3, 6} so that max(D(4) ∩ D(6)) = max({1, 2}) = 2.
The method for computing gcd indicated above is not very practical, because it
involves factorization of integers (to find the divisors) and the problem of factorizing
integers is widely considered to be quite difficult computationally (and supposedly infea-
sible for large numbers). The popular public-key cryptosystem RSA is based on the
problem of factorization and so-called Euler’s theorem (see Section 13). Fortunately,
there is another, very old, method to compute gcd without factoring integers:
Theorem 6. (Euclidean algorithm) The gcd of two numbers k, l ∈ Z, of which at least one is different from zero, can be computed in the following way:
1. Put r0 := max(|k|, |l|), r1 := min(|k|, |l|).
2. As long as ri ≠ 0, put ri+1 := ri−1 (mod ri).
With such definitions, there exists an a ≥ 1 such that ra = 0 (so the above process ends). Moreover, then ra−1 = gcd(k, l).
Commentary. The sequence (r0, ..., ra) in the above theorem is defined using mathematical induction. This, in turn, can be implemented recursively, e.g. as follows (using the Maple language):
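A recursive sketch of the same computation, written here in Python for illustration (the notes refer to Maple), might look as follows:

    def euclid_gcd(k, l):
        """gcd(k, l) by the Euclidean algorithm of Theorem 6 (at least one of k, l nonzero)."""
        r0, r1 = max(abs(k), abs(l)), min(abs(k), abs(l))
        if r1 == 0:
            return r0                     # the process has ended: gcd = the previous remainder
        return euclid_gcd(r1, r0 % r1)    # replace (r0, r1) by (r1, r0 mod r1)

    print(euclid_gcd(224, 128))   # 32, as in the example below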
Example. gcd(224, 128) = ? We have r0 = 224, r1 = 128, r2 = r0 (mod r1) = 224 (mod 128) = 96, r3 = r1 (mod r2) = 128 (mod 96) = 32, r4 = r2 (mod r3) = 96 (mod 32) = 0. By Theorem 6, (224, 128) = r3 = 32.
The above calculations can be represented in a both more compact and more detailed way:

ri  | ri+1 | qi
224 | 128  | 1
128 | 96   | 1
96  | 32   | 3
32  | 0    |

Here qi = ⌊ri / ri+1⌋. This representation will be useful in what follows.
The Euclidean algorithm is a much more effective way of computing the gcd than the naïve method based on factorization of numbers indicated above. Specifically, one can show that the number of steps required in order to compute gcd(k, l) for two integers k, l such that l ≤ k is O(log l) (read: „... is big-O of log l”; this notation means that the number of steps is bounded by C·log l for some positive constant C ∈ R). From this it also follows that the running time of the Euclidean algorithm is O(len k · len l), where len denotes the binary length (that is, the number of bits) of an integer. It is worth noting that the time needed to compute the product k·l is also O(len k · len l), so finding the gcd is essentially as fast as multiplying integers.
Exercise 4. Show that the binary length len k of an integer k is equal to len k = ⌊log2 |k|⌋ + 1 if k ≠ 0, and len k = 1 if k = 0.
Theorem 6 gives us a method of finding the gcd of two integers without knowing
their factorizations. And what about the case of several numbers? We have:
Theorem 7. (GCD of several integers) Let k1, ..., kr ∈ Z be integers not all equal to 0. Then (k1, ..., kr) does not depend on the order of the arguments k1, ..., kr. More precisely, (k1, ..., kr) depends only on the set {k1, ..., kr}. If r > 2 and also k1, ..., kr−1 are not all equal to 0 then
(k1, ..., kr) = ((k1, ..., kr−1), kr).
Remark. Note that the properties of gcd given in Theorem 7 resemble the properties of binary operations in groups (commutativity and associativity) but, generally speaking, gcd is an n-ary operation on integers (n > 1) and such terminology is not used in this case. However, for n = 2 we have (k1, k2) = (k2, k1) and (k1, (k2, k3)) = ((k2, k3), k1) = (k2, k3, k1) = (k1, k2, k3) = ((k1, k2), k3), so one can say that the binary gcd, gcd2 : Z × Z → Z, is commutative and associative. Question: is (Z, gcd2) a group?
11 Solving a · x ≡n b and a ·n x = b
What is the connection between the gcd and the congruence a·x ≡n b? The following theorem provides a characterization of the gcd which turns out to be the link that we search for.
Theorem 8. (Bezout’s identity) The equation a1·x1 + ... + ar·xr = b, where a1, ..., ar ∈ Z are not all equal to 0 and b ∈ Z, has a solution in integers iff (a1, ..., ar) | b. If r = 2 and (x1⁰, x2⁰) ∈ Z² is a solution of a1·x1 + a2·x2 = b, then all solutions of this equation are of the form x1 = x1⁰ + k·a2/(a1, a2), x2 = x2⁰ − k·a1/(a1, a2), k ∈ Z.
(Equations of this type, to be solved in integers, are called Diophantine equations.)
How does the above theorem help to deal with the congruence a·x ≡n b? Well, recall first what such a congruence actually means: a·x ≡n b ⟺ (n | (a·x − b)) ⟺ ∃ y ∈ Z: a·x + n·y = b. Hence, in order to solve a·x ≡n b it is enough to solve a·x + n·y = b in Z, and by Theorem 8 we know when this is possible – exactly when (a, n) | b. Before actually showing how to solve the latter equation, let us finally answer the question touched upon in Property 3.
Corollary 1.
1. The congruence a·x ≡n b, where a, b ∈ Z, n ∈ N, has a solution in Z iff (a, n) | b. In particular,
2. the congruence a·x ≡n 1, where a ∈ Z, n ∈ N, has a solution in Z iff (a, n) = 1, or in words: iff the numbers a and n are coprime.
3. An element a ∈ Zn has a multiplicative inverse in (Zn, +n, ·n) iff (a, n) = 1.
Commentary. Note that if for some a ∈ Zn it is (a, n) > 1, then a, n/(a, n) ∈ Zn ∖ {0} and a ·n (n/(a, n)) ≡n (a/(a, n)) · n ≡n 0, which means that a is a zero-divisor in Zn. In other words, an element a ∈ Zn is ·n-invertible iff it is a non-zero-divisor (by item 3 of the above corollary).
Using Corollary 1 we can characterize those Zn that bear a field structure. For this,
we recall the following definition:
ri  | ri+1 | qi
224 | 128  | 1
128 | 96   | 1
96  | 32   | 3
32  | 0    |

where qi = ⌊ri / ri+1⌋ is the partial quotient on division of ri by ri+1. Using this relation and recalling the Division with Remainder Theorem (Theorem 1), we see that 128 = 1·96 + 32 (look at the second and third rows from the bottom), which can be rewritten as 32 = 128 − 1·96. Similarly, looking at the third and fourth rows from the bottom we get 96 = 224 − 1·128. Substituting the latter relation into the former gives
32 = 128 − 1·(224 − 1·128) = (−1)·224 + 2·128.
Now we can finally show how to solve the congruences a·x ≡n b and the related equations a ·n x = b.

ri | ri+1 | qi
44 | 37   | 1
37 | 7    | 5
7  | 2    | 3
2  | 1    | 2
1  | 0    |

Working from the bottom upwards, we get
1 = 7 − 3·2 = 7 − 3·(37 − 5·7) = 16·7 − 3·37 = 16·(44 − 1·37) − 3·37 = 16·44 − 19·37,
so that (−19)·37 ≡44 1 and hence 37⁻¹ = (−19)(mod 44) = 25 in Z44.
We can sum up the above considerations by giving the following schematized procedure of dealing with the various problems to solve:

Problem to solve:      a·x ≡n b                        a ·n x = b
Solutions exist iff:   (a, n) | b                      (a, n) | b
Solution domain:       x ∈ Z                           x ∈ Zn
Auxiliary equation (common to both problems): a·x + n·y = b, solution domain x, y ∈ Z.
A solution (x0, y0) ∈ Z² of the auxiliary equation: apply the extended Euclidean algorithm to a·x + n·y = (a, n) and scale the result by b/(a, n).
Solution set of the auxiliary equation (Sol ⊆ Z²): (x, y) ∈ Sol ⟺ x = x0 + k·n/(a, n) and y = y0 − k·a/(a, n), k ∈ Z.
Solution:              {x0 + k·n/(a, n) : k ∈ Z}       {x0 + k·n/(a, n) : k ∈ Z} ∩ Zn
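The extended Euclidean step in the middle of this scheme fits in a few lines of code; a Python sketch (illustrative function names) which, applied to 37 and 44, reproduces the inverse computed below:

    def ext_euclid(a, b):
        """Return (g, x, y) with g = gcd(a, b) = a*x + b*y."""
        if b == 0:
            return a, 1, 0
        g, x, y = ext_euclid(b, a % b)
        return g, y, x - (a // b) * y

    def inverse_mod(a, n):
        """Solve a*x = 1 (mod n); a solution exists iff (a, n) = 1 (Corollary 1)."""
        g, x, _ = ext_euclid(a, n)
        if g != 1:
            raise ValueError("a is not invertible modulo n")
        return x % n

    print(inverse_mod(37, 44))   # 25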
This theorem explains that the solution x = 25 to the equation 37 ·44 x = 1 is just the inverse element 37⁻¹ in the group G44. By Exercise 3 we know that an inverse element in a group is always unique; that is why x = 25 is the only solution to this equation. More generally, by Property 1 item 3 every equation a ·n x = b has a unique solution if a, b ∈ Gn, that is, if (a, n) = (b, n) = 1. Actually, the condition (b, n) = 1 is unnecessary for the uniqueness of the solutions; (a, n) = 1 is enough. This is a simple generalization of Exercise 3 and Property 1 item 3 (you will see this in practice at the classes).
We pose the definition of the counting function for the groups Gn:
Definition 15. (Euler totient function) The Euler totient (or φ) function is defined on the set of natural numbers as φ(n) := card Gn if n > 1, and φ(1) := 1. Hence, φ: N → N.
By the above example, φ(p) = p − 1 if p ∈ P. Is there a closed-form formula for φ in general? The answer is „none is known”, but there is a method of computation of the values of φ which, in principle, is as fast (or rather: as slow) as factorizing integers (see Theorem 12). Below we provide a picture of the graph of the function φ on the set {1, ..., 10000}, which shows why a direct formula for φ is unlikely.
To explain how to compute the values of the φ function, we introduce the following:
Definition 16. (Multiplicative function) We say that a function f: N → C is multiplicative if f(1) = 1 and for every coprime k, l ∈ N it is f(k·l) = f(k)·f(l).
There is a related notion of an additive function, which satisfies g(1) = 0 and g(k·l) = g(k) + g(l) for all naturals k ⊥ l.
Exercise 6. Show that if f: N → R is additive then N ∋ x ↦ 2^{f(x)} ∈ R is multiplicative.
Theorem 10. (On the totient function) The Euler function φ is multiplicative.
How does the above theorem help with calculating φ(n)? In order to understand this, we recall the following fact, which should be more or less familiar to you.
Property 6. Two natural numbers ≥ 2 are coprime iff the primes appearing in their canonical representations are all different. In particular, powers of different prime numbers are always coprime.
It should be clear to you that the above calculations are not accidental. Namely, we have:
Corollary 3. If f is a multiplicative function and n = p1^α1 · ... · pr^αr is a decomposition of a natural number n into pairwise different prime factors, then f(n) = f(p1^α1) · ... · f(pr^αr).
Once we have Corollary 3, the only missing information needed to compute (at least in principle) the values of the φ function is how to compute φ(p^α), where p ∈ P and α ∈ N. Right after stating Definition 15 we noticed that φ(p) = p − 1, and the reason for that is just the fact that all the elements of the set Zp ∖ {0} are coprime to p. This in turn is a consequence of Property 6: clearly, in the canonical representation of a natural number n less than a given prime p this prime does not appear, so the numbers n and p are coprime. Similarly, in order to compute φ(p^α) for a prime p and α ∈ N, we must count the numbers from the set Z_{p^α} that are coprime to p. But by Property 6 again, this reduces to counting those numbers from Z_{p^α} that do not contain p in their canonical representations or, what amounts to the same thing, those that are not divisible by the prime p. Now, it is easy to see that the numbers of the set Z_{p^α} that are divisible by p are exactly those of the form 0, p, 2·p, ..., (p^{α−1} − 1)·p, so there are exactly p^{α−1} such numbers. All the other elements of the set Z_{p^α} are not divisible by p, hence they are coprime to p^α, hence they belong to G_{p^α}, hence there are exactly p^α − p^{α−1} such numbers. We can sum up these observations:
φ(n) = (p1^α1 − p1^{α1−1}) · ... · (pr^αr − pr^{αr−1}) = n · (1 − 1/p1) · ... · (1 − 1/pr).
Example.
φ(2^5 · 3^2 · 5^3 · 11^2) = 2^5·3^2·5^3·11^2 · (1 − 1/2) · (1 − 1/3) · (1 − 1/5) · (1 − 1/11) = 4356000 · (1/2) · (2/3) · (4/5) · (10/11) = 1056000.
Similarly, φ(17) = 16 and φ(73) = 72, since 17 and 73 are prime.
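A direct translation of the formula above into code (Python, illustrative; trial-division factorization, so suitable only for small n) might look like this:

    def phi(n):
        """Euler's totient via phi(n) = n * prod(1 - 1/p) over the prime factors p of n."""
        result, m, p = n, n, 2
        while p * p <= m:
            if m % p == 0:
                result -= result // p      # multiply result by (1 - 1/p)
                while m % p == 0:
                    m //= p
            p += 1
        if m > 1:                          # one prime factor > sqrt(n) may remain
            result -= result // m
        return result

    print(phi(2**5 * 3**2 * 5**3 * 11**2))   # 1056000, as in the example above
    print(phi(17), phi(73))                  # 16 72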
Commentary. The formula for φ(n) given in Theorem 12 depends heavily on the possibility of factorization of the number n into its prime factors. Up to now, there is no known algorithm that can do this efficiently. The best proven complexity bound is for the so-called quadratic sieve algorithm and is O(e^{Cn·(ln n · ln ln n)^{1/2}}), where Cn tends to 1 as n → ∞. This is really slow. Since most of modern cryptographic methods depend in an essential way on the practical infeasibility of factoring numbers, there was a contest called RSA Factoring Challenge, active from 1991 to 2007, in which the RSA Laboratories funded money prizes for factoring large numbers. This acted as a method of verifying what key lengths should be considered safe given the current state of knowledge of factorizing algorithms and the available computing power. Before the project was canceled, the largest challenge that had been solved was RSA-640, i.e. the number to be factored consisted of 640 bits. It was done in 2005 and it was awarded with $20 000. After 2007 some bigger challenges were also factored (for free), among them RSA-704 (a number having 212 decimal digits). For the remaining challenges
see http://en.wikipedia.org/wiki/RSA_Factoring_Challenge. It is interesting to note
that there is a quantum computer factoring algorithm by P. Shor which works in
polynomial time i.e. fast enough to compromise all cryptographic systems based on
the problem of factorization. However, it is uncertain if powerful enough quantum
computers will soon be built, so this algorithm provides no real danger for world’s safety
;-) in the near future. Still, there is the possibility that an efficient real-world-computer
factoring algorithm might exist. Finding such an algorithm means fame.
As you have probably already noticed, calculations involving congruences are much
simpler to perform than the „usual ones”, especially when we are interested in a remainder
of some (possibly quite complex) expression modulo an integer which itself is not too
big. For instance, it turns out that raising an integer to a (large) power in Zn can
be realized rapidly by so-called repeated-squaring algorithm (see [Sho08, p. 65]).
Moreover, there are some special exponents for which the computation of the power
is trivial. To describe such a situation, we first note the following, somewhat unexpected:
Property 7. Let p be a prime number. Then for all a, b ∈ Z
(a + b)^p ≡ a^p + b^p (mod p),    (6)
and, correspondingly, for all a, b ∈ Zp
(a + b)^p = a^p +p b^p in Zp
(in the ring Zp the symbol x^p means x ·p ... ·p x, p factors).
The reason behind the above formulas is quite simple. As you may already know (at least in the special cases of the second and third powers), there is something called the Binomial theorem: (a + b)^n = Σ_{i=0}^{n} C(n, i)·a^i·b^{n−i}, where C(n, i) := n!/(i!·(n − i)!) is the binomial coefficient („n choose i”). Now, for i ∉ {0, n} the integer C(n, i) = (n·(n − 1)·...·(n − i + 1))/(1·2·...·i) has its numerator divisible by n and the factors in its denominator are less than n. This means that if n is a prime, the factor n in the numerator does not cancel out with any of the factors in the denominator (in fact it is coprime to them) and hence C(n, i) ≡ 0 (mod n). But this gives (a + b)^n = Σ_{i=0}^{n} C(n, i)·a^i·b^{n−i} ≡ a^n + b^n (mod n) if n is a prime number, so (6) holds.
Examples.
1. We have (a + b)² = a² + 2·a·b + b² ≡2 a² + b² for any a, b ∈ Z.
The above observation can be exploited even more. Fix a prime number p. Obviously, we have 0^p ≡p 0 and 1^p ≡p 1, but thanks to Property 7 we also have 2^p = (1 + 1)^p ≡p 1^p + 1^p = 1 + 1 = 2, hence 3^p = (2 + 1)^p ≡p 2^p + 1^p ≡p 2 + 1 = 3, and „so on”. This „so on” means that we notice a rule which allows us to use symbols instead of specific numbers and essentially repeat the above calculations, but in a more abstract way. Namely, let’s say we know that n^p ≡p n for some number n ∈ N0. Then we also have
(n + 1)^p ≡p n^p + 1^p ≡p n + 1,
where the first congruence follows by (6) and the second congruence follows from the assumption n^p ≡p n. Thus, we have found that ((n^p ≡p n) ⇒ ((n + 1)^p ≡p (n + 1))) for any n ∈ N0. This means we justified the following chain of implications:
(0^p ≡p 0) ⇒ (1^p ≡p 1) ⇒ ... ⇒ (n^p ≡p n) ⇒ ((n + 1)^p ≡p (n + 1)) ⇒ ...
Since the very first antecedent in this chain is true, we may infer that all its consequents are also true. In other words, we have used mathematical induction and we have proved the essential part of the following:
Theorem 13. (Fermat’s little theorem) Let p be a prime number.
1. If k ∈ Z then k^p ≡ k (mod p).
2. If k ⊥ p then k^{p−1} ≡ 1 (mod p).
3. If k ∈ Zp then k^p = k in Zp.
4. If k ∈ Gp then k^{p−1} = 1 in Gp.
Items 2 and 4 of the above theorem are direct consequences of items 1 and 3, respectively, combined with Corollary 1: e.g. the congruence k^p ≡ k (mod p) can be multiplied sidewise by an element l ∈ Z such that k·l ≡p 1 (which exists since k ⊥ p), and this gives item 2.
Examples.
1. (−234)^{28} ≡29 1 since 29 is a prime and (−234, 29) = 1.
2. 12^{17} = 12 in Z17 since 17 is a prime.
Fermat’s theorem can be formulated in a somewhat more general form (to see this,
it is enough to use the „sidewise multiplication trick”).
What about moduli which are not prime? First note an example: 3³ ≡4 3, and 3 ≢ 1 (mod 4), so item 2 of Theorem 13 does not hold for the non-prime modulus 4. This means that we need a different exponent than just p − 1 in general (if such an exponent exists at all). It turns out that there is a more general way of finding such numbers. And this method comes from group theory. Namely, we have:
Theorem 15. (Lagrange theorem) If (G, ∘) is a finite group with n elements and (H, ∘) is its subgroup (this means that H ⊆ G as a set and H is a group with respect to the same operation ∘ restricted to H), then card H | card G.
Example. The set H := {0, 2, 4, 6} is a subgroup of (Z8, +8) and card H = 4 | 8 = card Z8.
At this point, it may not be obvious how Lagrange theorem can help us find the
exponents that we search for. The missing ingredient is the following:
Property 8. If (G, ∘) is a finite group, then for every a ∈ G the set ⟨a⟩ := {a^k : k ∈ N} is its subgroup. Denote m := card ⟨a⟩. Then a^m = 1, where 1 is the identity of the group G. Moreover, for every l ∈ N such that l < m, we have a^l ≠ 1.
The group hai of Property 8 is said to be generated by a.
Warning. If a group (G, +) is in additive notation then ⟨a⟩ = {k·a : k ∈ N}, where k·a := a + ... + a (k addends); e.g. the group H of the previous example is equal to H = ⟨2⟩ (check!).
Let us see how the above two facts provide a natural generalization of Fermat’s little theorem. Fix n ∈ N, n ≥ 2, and let k ∈ Zn be a number coprime to n. This means that k ∈ Gn (see page 27 for the definition of Gn). Since (Gn, ·n) is a group, by Lagrange theorem we have that card ⟨k⟩ | card Gn. But we know that card Gn = φ(n), where φ is the totient function. Using Property 8 we get (in the group Gn)
k^{φ(n)} = (k^{card ⟨k⟩})^{φ(n)/card ⟨k⟩} = 1^{φ(n)/card ⟨k⟩} = 1.
Example. 34567^{233} ≡10 7^{233} and, since 7 ⊥ 10, we can use the fact that φ(10) = φ(2)·φ(5) = 1·4 = 4: as 233 = 4·58 + 1, we get 7^{233} = (7^4)^{58} · 7 ≡10 1^{58} · 7 = 7.
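The same reduction can be checked with Python’s built-in modular exponentiation (pow with three arguments implements the repeated-squaring idea mentioned earlier):

    print(pow(34567, 233, 10))   # 7, in agreement with the computation above
    print(pow(7, 233, 10))       # 7 as well, since 34567 = 7 (mod 10)
    # and the exponent may be reduced modulo phi(10) = 4, because 7 is coprime to 10:
    print(pow(7, 233 % 4, 10))   # 7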
Commentary. In Euler’s theorem, the exponent φ(n) is a number which works for all choices of elements k ∈ Gn, but it does not have to be the smallest such exponent. For instance, it is easy to check that for every k ∈ G8 = {1, 3, 5, 7} it holds k² = 1 in G8, while φ(8) = 8 − 4 = 4. The least universal exponent for Gn is given by the so-called Carmichael function λ(n). For a specific element of Gn a much smaller exponent may suffice, e.g. one always has 1¹ = 1 in any Gn. The smallest such exponent for an element k ∈ Gn is called its multiplicative order ordn k. By Property 8, ordn k = card ⟨k⟩ (in Gn). From Lagrange theorem we know that always ordn k | φ(n). This does not help much, because it turns out that finding ordn k is in principle as hard as factoring integers (first we should find φ(n)).
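For small n the multiplicative order can simply be found by brute force; a Python sketch (illustrative only):

    from math import gcd

    def mult_order(k, n):
        """Smallest m >= 1 with k**m = 1 (mod n); requires gcd(k, n) = 1."""
        assert gcd(k, n) == 1
        m, power = 1, k % n
        while power != 1:
            power = (power * k) % n
            m += 1
        return m

    print([mult_order(k, 8) for k in (1, 3, 5, 7)])   # [1, 2, 2, 2] -- the exponent 2 suffices in G_8
    print(mult_order(37, 44))                          # 5, which divides phi(44) = 20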
Having prepared all the necessary tools, we may illustrate the theory with a prac-
tical application:
The RSA (Rivest, Shamir, and Adleman) cryptographic system. This is an example of a
public key cryptosystem . To use it, Bob must generate a pair of keys – a public and a
private one. The public key is available in a secure repository (a public key server) and
it may be used by Alice to encrypt her message to Bob. After receiving the encrypted
message, Bob may use his private key to decrypt it, obtaining the original message.
Here is how Bob generates his keys. He should:
1. Generate two (large) random primes p and q, p ≠ q (this can be done efficiently).
2. Compute n := p·q (the number len n is the key length) and φ(n) = (p − 1)·(q − 1).
3. Choose e ∈ G_{φ(n)} (e does not have to be very big).
4. Compute d := e⁻¹ in G_{φ(n)}, i.e. the number d satisfying e·d ≡_{φ(n)} 1 (e.g. using the extended Euclidean algorithm).
Then: the public key is the pair (n, e) and the private key is (n, d).
message (the case of (mi, n) > 1 must be treated in another way, but still m̃i = mi).
Since G_{φ(n)} is an abelian group, the roles played by the numbers d and e can be interchanged, which gives another application: Bob may publish a file (F) along with a so-called hash of F, but this hash encrypted with Bob’s private key. Then, this new piece of information may work as Bob’s digital signature (D). Namely, anybody who downloaded a file that was supposed to be the one published by Bob may compute the hash of this file and compare it against the value of the signature D „decrypted” using Bob’s public key. If those two values do not match, this means the downloaded file is either a fake and probably does not come from Bob, or there was some net error during the transmission.
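A toy end-to-end run of the scheme in Python (the primes are tiny and purely illustrative; real keys use primes hundreds of digits long):

    p, q = 61, 53                         # step 1 (toy primes)
    n, phi_n = p * q, (p - 1) * (q - 1)   # step 2: n = 3233, phi(n) = 3120
    e = 17                                # step 3: (e, phi(n)) = 1
    d = pow(e, -1, phi_n)                 # step 4: d = e^(-1) modulo phi(n) (Python 3.8+), here 2753

    m = 65                                # a message, coprime to n
    c = pow(m, e, n)                      # encryption with the public key (n, e)
    print(c, pow(c, d, n))                # 2790 65 -- decryption with (n, d) recovers m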
A safety remark: you should see that in order to be able to „break the system”, it is enough to find Bob’s private key, which in turn amounts to finding the number d ∈ G_{φ(n)} satisfying e·d ≡_{φ(n)} 1. But in order to do it, first of all you must compute the modulus φ(n) somehow. And you know only n (part of the public key). If you can factor n into the (secret) prime factors p and q, then clearly you can compute φ(n). Otherwise, how to compute φ(n)?...
We now turn to systems of congruences of the form
x ≡ a1 (mod n1)
⋮
x ≡ ar (mod nr)    (7)
for x, in the integers. Here a1, ..., ar ∈ Z are arbitrary. However, it turns out that, in general, in order that the system (7) have solutions, we must impose some conditions on the natural numbers n1, ..., nr, as shown by the following:
Example. Consider the system x ≡ 2 (mod 4), x ≡ 1 (mod 6). One sees easily that all the solutions to the first congruence are the numbers of the form 2 + 4·k, where k ∈ Z, i.e. ..., −2, 2, 6, 10, ..., while the solutions to the second congruence are just the numbers 1 + 6·l, l ∈ Z, i.e. ..., −11, −5, 1, 7, ... . In particular, the first series of numbers consists of even integers and the second series – of odd integers. Hence, there is no common solution x ∈ Z to this system.
What sort of assumptions should be imposed then on the numbers n1, ..., nr in order to guarantee the existence of some solutions to (7)? To find a possible such condition, let us consider a two-congruence system x ≡ a (mod m), x ≡ b (mod n). This can be rewritten in an equivalent form as follows:
x + k·m = a
x + l·n = b,
where k, l ∈ Z are some integers to be found. If this system has some solution x, k, l ∈ Z, then upon subtracting its two equations we get that k and l must satisfy
k·m + l·(−n) = a − b.    (8)
But now recall that a and b are arbitrary integers. This means that on the right-hand side of this equation there can stand virtually any integer; in particular, it may happen that a − b = 1. Thanks to Bezout’s identity (Theorem 8) we know that the equation k·m + l·(−n) = 1 has a solution in integers iff (m, n) = 1, that is, iff m and n are coprime. On the other hand, under this assumption, we have that x = a − k·m = b − l·n if the integers k and l satisfy equation (8). This means that this value of x satisfies also the system of congruences that we wished to solve. Moreover, it is not hard to notice that all other solutions to this system are congruent to the given one modulo m·n. Thus, we are able to give all the integer solutions to the system (7) in the case of two congruences.
Example. Consider the system x ≡ 2 (mod 3), x ≡ 3 (mod 13). Equation (8) becomes k·3 + l·(−13) = 2 − 3 = −1, which has e.g. the solution k = 4, l = 1. Hence x = 2 − 4·3 = −10 is a solution of the system, and all its solutions form the set {−10 + 39·t : t ∈ Z}.
In principle, the above method is the idea behind a general algorithm of solving the system (7). One thing we see at once: if this system (of r ≥ 2 congruences) has a solution x, then x is obviously a solution of any two congruences of this system, too. This means that (if we want solutions to exist for arbitrary a1, ..., ar) for any i ≠ j it should hold (ni, nj) = 1, or in other words – the numbers n1, ..., nr should be pairwise coprime. It turns out that this condition is enough in order that the system (7) have solutions:
Theorem 17. (Chinese remainder theorem) Let a1, ..., ar ∈ Z and let n1, ..., nr ∈ N be natural numbers greater than 1. If n1, ..., nr are pairwise coprime, then the system
x ≡ a1 (mod n1)
⋮
x ≡ ar (mod nr)
has a solution. One solution is given by the formula
x := a1·s1·(n1·...·nr/n1) + ... + ar·sr·(n1·...·nr/nr),    (9)
where, for each j, the integer sj is chosen so that sj·(n1·...·nr/nj) ≡ 1 (mod nj) (such sj exist by Corollary 1). Moreover, the system has exactly one solution in Z_{n1·...·nr}.
It is not hard to see why Theorem 17 is true. Namely, every number aj·sj·(n1·...·nr/nj) is congruent to aj modulo nj, because of the relation verified by sj. But since n1·...·nr/nj ≡ 0 (mod nk) for k ≠ j, we get aj·sj·(n1·...·nr/nj) ≡ 0 (mod nk) for k ≠ j. This means that x defined by formula (9) satisfies x ≡ 0 + ... + 0 + aj + 0 + ... + 0 ≡ aj (mod nj) for j = 1, ..., r, so it is a solution to our system of congruences. Now, let x′ ∈ Z be another solution of this system. This implies that x − x′ ≡ 0 (mod nj) for every j = 1, ..., r. In other words, ∀ j ∈ {1, ..., r}: nj | (x − x′). Since n1, ..., nr are pairwise coprime, this means that also (n1·...·nr) | (x − x′), so that x and x′ differ by an integer multiple of n1·...·nr. Hence there exists a unique solution of our system of congruences in Z_{n1·...·nr}, and this solution is equal to x (mod n1·...·nr).
At the beginning of this section we mentioned that the Chinese remainder theorem may work as a connection between modular arithmetic and the usual one. The main idea behind such an application of Theorem 17 is the following. Suppose you want to compute the value of some expression E involving big integers. It is a common situation that you may predict some upper bound for the result. For our discussion, let us assume that this bound is M; more precisely, assume that the result is an integer lying in the interval [0, ..., M]. You may choose some sequence n1, ..., nr of (small) pairwise coprime numbers such that n1·...·nr > M, compute Ej := E (mod nj) for each j ∈ {1, ..., r} and then use the Chinese remainder theorem to recover the unique integer x ∈ Z_{n1·...·nr} satisfying the system of congruences x ≡_{nj} Ej. Since n1·...·nr > M, the only possibility is that x = E. In other words, you may compute the value of E by performing most of the calculations modulo the small numbers nj (which is very fast) and only at the end should you produce the final (possibly big) result x = E, working modulo n1·...·nr. While this may seem a lot of work, in practice this approach (when applicable) is usually much more efficient than a direct computation in Z, provided that the numbers involved are big enough. Moreover, if you have a computer with a multi-core processor, you may split the calculations of the Ej among several threads, which speeds things up even more. Another advantage is the memory usage – it may be much smaller for this method than for the direct one.
Let us see by example how the above method works (this example also shows that
for small numbers it is more efficient to use a direct computation):
Example. Let us say we want to compute the product = 13 63. Clearly, the result k
will not be bigger than 24+6 =210. We may choose e.g. 1 =4, 2 =5, 3 =7, 4 =9. We n n n n
n ::: n >
have: 1 4 k k
210 and (mod 4)=3, (mod 5)=4, (mod 7)=0, (mod 9)=0. So k k
we must solve the system
8
>
< xx3( mod 4)
4(mod 5) .
> x
: x0( mod 7)
0(mod 9)
Because of formula (9), in this case there are only two numbers that should be found:
s
1 5 7 9 s
1(mod 4) and 2 4 7 9 1(mod 5) or equivalently 1 ( 1) 1(mod 4) and s
s
2 2 s s
1(mod 5). This gives e.g. 1 =3 and 2 =3. Hence, =3 3 5 7 9+4 3 4 7 9+ x
t t t 2
4 5 7 9=5859 + 1260, Z. Since 5859 (mod 1260)=819, this is the value of . k
Commentary. The method of performing calculations with integers by first falling back
on modular arithmetic and then applying Chinese remainder theorem is widely used.
Some example applications of this method include so-called threshold secret sharing
(in cryptography) and also faster decryption in the RSA cryptosystem (typically about
three times faster). Also, it can be used to multiply big matrices or polynomials (with
integer coefficients).
There is another method of speeding up calculations, called rational reconstruc-
tion, which, loosely speaking, allows one to use modular arithmetic first and then re-
cover the result which can be a rational number (see [Sho08, Section 4.6]).
It is worth noting that the assumption that the numbers 1 n ;:::;nr are coprime (see
Theorem 17) is not in general necessary in order that the system (7) have solutions. It
may be proved that solutions exist iff 8i;j 2f ;:::;rg;a a
1 i n ;n
j (mod ( i j )). Moreover,
in such a case there is exactly one solution of the system in Z[n1;:::;nr], where [ 1 r] n ;:::;n
denotes the least common multiple of the numbers 1 n ;:::;n
r .
Proof. To the contrary, assume that there are only finitely many primes, e.g. 1 n. p ;:::;p
We may consider the number := 1 N p ::: p
n +1. Clearly, N
1(mod j) for every = p j
;:::;n N
1 , which means that is not divisible by any prime. This is not possible (why?).
While the above theorem is good news, it does not provide us with any direct way
of finding big prime numbers. A fascinating property of primes is that their distribution
among the naturals looks as a random one and nobody has found any direct formula
producing only prime numbers (not to mention a formula for, let’s say, -th prime k
number). So, a natural approach to generate big primes would be just to test subsequent
numbers starting from some big enough N
and hope that soon enough in the sequence
NN N
, +1, +2,... there appears a prime number.
Fact.
There are arbitrary long intervals of natural numbers without prime numbers.
n 23 4 5 6 7 8 9 10 ::: 20
p 3 5 5; 7 7 7; 11 11; 13 11; 13 11; 13; 17 11; 13; 17; 19 ::: 23; 29; 31; 37
2 n 4 6 8 10 12 14 16 18 20 ::: 40
Theorem 19 gives us some positive information, but it turns out one can do much
better. In order to state the next result, we introduce the following important definition:
Definition 17. (π function) Let x2
R+. Then ( ):=the number of primes less x
x
than or equal to . The function is called the prime-counting function.
Example.
x 1 2 3 4 5 6 7 8 9 10 11 ::: 106
(x) 0 1 2 2 3 3 4 4 4 4 5 ::: 78498
.
lim
(xx) =1.
x!1 ln x
x x
This theorem says that for big there are „roughly” ln x prime numbers less than . The x
assertion of the theorem can be written in a somewhat more suggestive way as follows:
(x)= lnxx (1+o(1)) as x!1,
where the small-o notation o(1) is just a way of saying that the values hidden by the
symbol o(1) tend to 0 (in this case as x!1).
A simple corollary of Theorem 20 is the following improvement of the
Bertrand–Chebyshev theorem:
Corollary 5. For every (small) real ">
0 there exist N 2N such that for all x>N
there is a prime number in the interval ( (1+ ) ). x; " x
This corollary does not say anything about the value of the number but there are also N
some specific results in this direction. For instance, there is always a prime between x
1
x x>
and (1 + 25 ln(2 x) ) if 396 738. These kind of results show that probably it
should not take „too long” to find a prime number starting from some value of and x
following the strategy described on page 38. So the answer to Question 1 is yes. There
are also much more „optimistic” predictions concerning the distribution of primes in
some specific intervals and these predictions have some heavy empirical evidence. One
such conjecture is Legendre’s conjecture, which states that there is a prime number
n n
between 2 and ( +1)2 for every n2 N. This conjecture is still unsolved . Of the same
p
status is Cramér’s conjecture which states that if is a prime number then in the
p;p O p
interval ( + ((ln )2)) there is also another prime.
Sometimes it is also important to be able to generate big prime numbers of a special
form. For instance in the RSA system, some additional restrictions may be imposed on
p q
the prime numbers and (see page 34). This is connected with the notion of a strong
prime, which in turn is a contra-answer to existing factorizing algorithms working faster
than usually in some special situations. An example of such an algorithm is Pollard’s
p 1 algorithm. In order for this method to be useless to factor = , one should n p q
p
assure that the prime number is of the form = p k r
+1, where the number is also r
q
a big prime (and similarly for the prime number ). Thus, the question is: do primes
of such a form exist and how many of them is there? The answer to this question
may be viewed as a generalization of Theorem 20:
Theorem 21. (Dirichlet’s theorem) If two positive integers and are coprime k l
then there are infinitely many prime numbers in the sequence ( + +2 ) k;k l;k l;:::
that is there are infinitely many prime number solutions to the congruence l . x k
x
More precisely, if k;l( ):=the number of these primes less than or equal to that x
k
are congruent to modulo then l
k;lx(x) = 1
x!1
lim
x '(l)
ln
or equivalently
k;l(x)= ' l xx (1+o(1)) as x!1.
1
( ) ln
Note that for k =1 and l =1 it holds k;l(x)=(x) and '(1)=1 so Dirichlet’s theorem
is indeed a generalization of Theorem 20. Moreover, this theorem guarantees that if you
take a (big) prime number r and consider the sequence (1; 1+ r; 1+2 r;:::), for sure
therein you will find a prime p =k r +1 for some k 2 N. If you generate only this sort
of primes p and q for the RSA algorithm, you are safe from the p 1 attack.
Example. In the sequence f8 n+1: n 2Ng there has to be infinitely many primes (by
Dirichlet’s theorem). The first few primes in this set are:
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;:::
17 41 73 89 97 113 137 193 233 241 257 281 313 337 353 401 409 433 449 457 521
In particular, ; (521)=21. On the other hand, ' l x evaluated at l =8 and x =521
x 1
gives the value 20.820844:::; so in this case the prediction of Theorem 21 is almost exact.
1 8
( ) ln
As you can see, the distribution of prime numbers, although chaotic, is also subject
to some restrictions and there are many observations concerning the counting function
. Perhaps the most famous of all unsolved problems in mathematics is the Riemann
Hypothesis and this problem can also be stated in terms of the function .
Millennium Problem (Riemann Hypothesis 1859). Prove that
(x)=Li(x)+O(px ln x),
where the Eulerian logarithmic integral Li is given by the formula Li( x):=R x t dt.
2
1
ln
The Riemann Hypothesis postulates that you may estimate the number of primes
6x by the number Lip(x) (which can be easily computed) and by doing so you make a
mistake which is O( x ln x), so a relatively small mistake. This hypothesis turns out
to be extremely important in mathematics as it implies many other unsolved problems
in Number Theory. That is why it has been chosen as one of Millennium Problems
and as such is worth $ 1000 000. The price is offered by Clay Institute (see http://
www.claymath.org/millennium-problems/millennium-prize-problems).
Although we answered Question 1 in the affirmative, one more information seems
to be crucial in order to be able to generate big primes. Namely, since there are no
formulas for prime numbers, how to check if the number under consideration is a prime
or not? Of course, for a given N2 pN
N you could employ the naïve method of successive
divisions ofN by naturals less than (or N
) but this approach is totally useless for
N
big . Notice also that this method would produce a (partial) factorization of as a by- N
N
product (for a composite ), which is both unnecessary and – as has been said already
– widely believed to be impossible to accomplish in a reasonable time. So we must ask:
GIMPS project if you wish. If your computer finds a new record 108-digit prime number
using GIMPS software, you will have to split the prize with the GIMPS project; also
awards of about $ 3 000 are offered by GIMPS itself for finding a new Mersenne prime
(see http://www.mersenne.org/legal/#mpa for details).
Unsolved problems. Besides the famous Riemann Hypothesis, there are tons of un-
solved problems in Number Theory. Some of them can be found at http://en.wikipe-
dia.org/wiki/Unsolved_problems_in_mathematics#Number_theory_.28general.29.
y
1
1 1 x
1
Let us recall that in Section 11 we sought for the integer solutions of this equation, which
in turn corresponds to finding integer points lying on the above line (e.g. the points
; ;
( 1 1) and (2 1)). In another words, if you had a big enough piece of grid paper, a
long enough ruler and if you were accurate enough, you would be able to solve (in Z)
all equations of the form a x b y c
+ = , where a;b;c 2 Z with almost no calculations.
It also makes sense to consider the equation 2 x +3 y =1 from the above example
over Zp, where p is a prime number and p >5 (of course in this situation + means +p
and means p). Indeed, multiplication by 3 in Zp produces an equivalent equation
1
in the last formula are to be understood in Zp but other than that, this formula for y is
3 3
in principle the same as the one over R which reads y = + x. This suggests that 1
3
2
such equations could be considered more abstractly (namely over fields) and that there
should be no major differences between solving them over R and over an abstract field
K.
x y
Example. Consider the equation 2 +3 =1 in Z25 (=Z5 Z5). We easily find that
3 1 =2 in Z5. Hence and by the above observations, our equation is equivalent to = y
x x
2 +5 ( 2) 5 2 5 =2 +5 . This means that becomes a function of . The table of y x
values of this function is as follows:
x 0 1 2 3 4.
y23401
Graphically, the „line” 2 x +3 y =1 in Z 2
5 looks like this:
y
5
4
3
2
1
0 12 34 5 x
In general, a linear equation may have many variables and may have coefficients in
any field. Here is the formal definition:
; ;
Definition 18. (Linear equation) Let (K + ) be a field. A linear equation in n
x ;x ;:::;x
variables 1 2 n over K has the form 1 1 + + n n = , where 1 a x ::: a x b
n K a ;:::;a 2
are the coefficients and b2 n
K is the constant term. The subset of K built of all S
the solutions to a linear equation is called (if =
/ =/ Kn ): ? S
a point (if n=1),
a line (if n =2),
a plane (if n =3),
a hyperplane (in general).
Example. 2 + πx y 2
z p
= 2 is a linear equation in
3
3 variables x;y;z over R. This
equation defines a plane in R3:
If you intersect several hyperplanes you get either an empty set or another hyper-
plane:
Examples of various configurations of planes in R3. Let’s say we are given two or
three equations of the form ax by cz d
+ + = , where a;b;c;d 2
R and let’s say we want
to solve them simultaneously. This means that we want to solve a system of linear
equations. The set of solutions of this system consists of exactly those points in R3
that lie in the intersection of all the hyperplanes defined by the equations of the system.
8
over K. Usually such a system is written in the following form:
1 1
a ;b 2
where ij i K. A solution (in Kn) of this system is any point =( 1 n) Kn a a ;:::;a 2
that satisfies all equations of this system (this means that the substitutions 1 = x
a ;:::;x a
1 n = n lead to genuine equalities in K). A linear system is called consistent
if it has at least one solution and inconsistent in the opposite case.
Our objective is to learn how to solve linear systems. We begin with some simple ones...
8
called a triangular system if it is of the form
>
>a x + a x + a x +::: + a n xn = b
< a x + a x +::: + a n xn = b
11 1 12 2 13 3 1 1
a x +::: + a n xn = b ;
22 2 23 3 2 2
>
> 33
3 3 3
: ann xn = bn
where a ;:::;ann are non-zero elements of the field of coefficients K.
11
Note that a triangular system always has the same number of variables as the number of
equations. Moreover, if we set 1 x >x >:::>x
n then every equation of a triangular system
x
2
has a different greatest variable j . This variable is called the leading variable of the
equation. In another words, one can say that a system is triangular (up to permutation
of the equations) if its every variable is a leading variable of exactly one equa-
tion of the system . Usually, when no ordering of variables is mentioned, one means the
lexical (=dictionary) ordering, or an ordering which is clear from the context.
8
Example of solving triangular systems using back substitution. Consider the system
>
< 3w + 24 xx + 73 yy + z == 10
> 5 y 2 z = 1 :
: 3z = 6
If we setw>x>y>z then we see that every equation has a different leading variable
(and in exact this order); hence this is indeed a triangular system.
In order to solve the system, we start with the simplest equation which is the one
z
involving the smallest variable as the leading variable. We get 3 =6 z ,z
=2. Now we
z
use the discovered value of and substitute this value into the other equations. The next
simplest equation that we get in this way is the second from the bottom of the system:
5 y 2 2 =1 ,5 y =5 ,y =1:
z
Now we also substitute y =1 into the other equations. The third one from the bottom
of the system becomes:
2 x 7 1 +2 = 1 ,2 x =4 ,x =2:
y z
Lastly, we also substitute this value of z into the first equation to get
3 w +4 2 +3 1 =0 , 3 w= 11 ,w = : 11
x y
3
Summing up, our system has the unique solution equal to ( ;;; w;x;y;z
)= 3 2 1 2 . This 11
means that this system corresponds to case g) of example on page 44 (although in R4).
Note also that we could consider this system over Zp where p> P and p2 11. In such
a case, at each step we should solve our equations in this Zp. It is easy to see that for
w
this particular system the solution would be the same except that = 11 p 3 1 where
p2 p6
3 1 is the multiplicative inverse of 3 in Zp. For P but 7 our system cannot be
p
considered over such Zp unless you reduce its coefficients modulo . If you do so than
p
various things may happen. For instance, if =7 then you still have a triangular system
with unique solution ( w;x;y;z ;;; p ;;
)=(6 2 1 2) (check!) but for =2 3 5 you do not have a
w;x;y;z ; ; ;
triangular system any more. In Z2 the solution is ( )=(1 0 1 0), in Z3 there are
w;x;y;z t ;t ;t;
no solutions and in Z5 the solutions are of the form ( )=(4 5 +5 3 +5 1 2),
where t2 Z5 (you will learn how to solve such systems later on).
The process of solving triangular systems using the method sketched in the above
example is called back substitution. Note that in each case in which we had a triangular
system there was only one solution to such system (see the example above). This is not
accidental, as it turns out:
Property 9. Every triangular system of equations (over any field K) has exactly
one solution in Kn. This solution can be found using back substitution.
(E ) 3
(E )
3
< 4 x + 6 y 10 z = 6 (E )
: w 2 x 4 y + 2 z == 294 (E ) :
1
(E ) 3
<w = 4 (E )
: 2x 4 y + 2 z = 29 (E ) :
1
(10)
14 y 14 z = 64
2
(E )
3
This system is quite similar to a triangular one but it is not such because the variable
z is not a leading variable of any equation of this system (a more basic reason for this
is that in our system there are less equations than variables so no matter what you do
you cannot reduce it to a triangular form).
To finish our work, we might apply the method of back substitution, but for this we
need a triangular system. This is easy to achieve – we can look at the non-leading vari-
z
able in the last equation of the system and treat it as a parameter (in our case varying
s
over R). So we introduce a new name, e.g. , and we set = . At this point, we may z s
s
treat as a (fixed but arbitrary) constant and rewrite our system in the following form:
8
<w = 4
: 2x 4 y = 29 2 s :
14 y = 64 +14 s
In this way we constructed a triangular system (in the variables ), which we can w;x;y
solve by back substitution. Applying this method we get the following solutions (exactly
s
one for each value of ):
(w;x;y;z)= 4; 5 +s; 4 +s;s , where s 2R .5 4
14 7
Exercise 8. Reduce the coefficients of the system from the example on page 45 modulo 5, and solve
this reduced system over the field Z5.
The above example revealed the allowed transformations on linear systems and also
showed that we cannot expect to be able to transform every given system to a triangular
one. Before we proceed any further, it will be convenient for us to change our language.
Namely, observe that in the examples worked out in this section little role was played by
the variables themselves. They are just names, placeholders and what matters most are
the coefficients of the systems. This suggest that we should be able to dismiss variables
from our notation, but doing so we must nevertheless retain the structure of our system
as to not loose any crucial information. This simple idea leads to the notion of a matrix .
Definition 21. (Matrix) A matrix of shape is a rectangular table of numbers mn
(more generally: of elements from a field K or from some other set) having m
n
rows and columns. The elements of a matrix are called its entries and the num-
mn
bers , are called its dimensions.
In the context of matrices with coefficients in a field K, the elements of K are often des-
ignated as scalars.
Notation. The set of ( mn
)-matrices with coefficients in a set will be denoted S
S
by m n. For a matrix A2S
m n its entry in the -th row and -th column will i j
A
be denoted by ij and the matrix itself will be written as =[ ij ]11 66 ij66mn or just A A
A A
=[ ij ] if the dimensions are clear from the context. A graphical presentation
A
of a matrix
2 A A ::: A n
looks like this:
3 0 A A ::: A n 1
A=664 A::: A::: ::: An 77 or A=B
@ A::: A::: :::::: A:::n
B C
C
11 12 1 11 12 1
21
::: :::
22 2
5 21 22 2
A.
Am Am ::: Amn
1 2 Am Am ::: Amn
1 2
1 1
2 a a ::: a n 3
we may associate its coefficient matrix and its augmented matrix :
2 a a ::: a n b 3
66 a a ::: a n
11 12 1
77 66 a a ::: a n b 77:
11 12 1 1
am am ::: amn
1 2 am am ::: amn bm
1 2
The augmented matrix contains all the necessary information about the system and we
may effectively work with this matrix to find the solutions of our system. The three basic
transformations of linear systems considered in the example on page 47 can be trans-
lated into appropriate transformations of rows of matrices associated to such systems:
Definition 22. (Elementary row operations on matrices) Let K be a field. The
three types of elementary row operations on matrices from Km n are:
1. Interchanging two different rows. (Ri $Rj)
2. Multiplication of a row by a nonzero scalar. ( Ri !Ri)
3. Addition of a multiple of one row to another one. ( Ri +Rj !Rj)
Two matrices are said to be row-equivalent when one can be obtained from the
other by some number of (successively applied) elementary row operations. We will
use the notation AB to indicate the fact that the matrices and are row-equiva- A B
lent. It is not hard to observe what follows:
Example. Let us rewrite the transformations performed on the system of the example
8
on page 47 in the language of elementary row operations on matrices. The system was:
< w + 2x + 3 y 5z = 1
:7 ww + 42xx 64 yy ++ 102 zz == 12 :
2 3
The augmented matrix associated to this system is:
1 2 3 5 1
4 1 4 6 10 25:
7 2 4 2 1
Now we shall perform row operations on this matrix mimicking the linear system trans-
2 3 2 3 2 3
formations applied in the aforementioned example:
1 2 3 5 1 2 4 6 10 2 2 4 6 2
4 5
1 4 6 10 2 4 5
1 4 6 10 2 4 1 0 0 0 45
10
7 2 4 2 1 R !R 7 2 4 2 1 R R !R 7 2 4 2 1 R $R
2 2 4 6 10 23 2 0 4 6 10 63
2 1 1 1+ 2 2 2 3
4 7 2 4 2 15 4 7 2 4 2 15
1 0 0 0 4 R R !R 1 0 0 0 4 R R3 ! R2
2 3 2 0 0 14 14 64 3 2 1 3
1 2 3 1 2 +7
0 4 6 10 6 0 0 0 4
4 0 2 4 2 29 5 4 0 2 4 2 29 5 4 0 2 4 2 29 5:
1 0 0 0 4 R R !R 1 0 0 0 4 R $R 0
1 2 2 1 1 3 0 14 14 64
This last matrix is the augmented matrix of linear system (10), which is an illustration
of the fact that working with matrices is equivalent to working with the systems them-
selves.
Terminology. Sometimes the leading entries of a matrix are called pivots, their loca-
tions in this matrix are called pivot positions and the rows (columns) containing a pivot
position are called pivot rows (columns).
Example. Consider
21 3 5 0
3 21 3 0 0
3
660 0 1 077 6600 1 077
40 0 0 25 400 0 1 5.
0 0 0 0 0 0 0 0
The first matrix is in ref and the second one is in rref. Moreover, the first matrix can
be transformed into the second one by a sequence of elementary row operations.
21 2 5 2 3 2 3 21 2 5 2
II. Transform matrix (11) into its rref. In this case
0 4
3
40 0 2 1 4 1 5 40 0 2 1 0 95
0 0 0 0 1 2 R R !R 0 0 0 0 1 2 R !R
21 2 5 2 0 4R3 R !R 21 2 0 3
4
1
2 3 2 2 2
2
1 3 3 1
0
640 0 1 0 75 6640 0 1 77.
1 37
5
2 2
0
1 9
1 9
2 2
0 0 0 0 1 2
2 2
R R !R 1 5 2 1
0 0 0 0 1 2
" "
w s = y t
=
This last matrix is already in rref. Now you may either proceed as before (apply
method I to this matrix) or treat the columns that do not contain leading entries
21 2 s
as corresponding to parameters ( =
0 t 0
3 w s;y 21t
= let’s say) and continue as shown:
0 0 0 0 +2s+ t7
3
66 77and then 66
1 37 37 1
40 0 5 40 t 75.
2 2 2 2
1 1
2
t 0 9
2
0 1 0 0 9
2
1
0 0 0 0 1 2 0 0 0 0 1 2
The system corresponding to the last matrix is trivial to solve with respect to
v;x z w sy t
and (but you have to remember that = , = is a part of the solution).
The two methods of solving linear systems, outlined in the above example, can be
loosely summarized as follows:
Gaussian elimination. In order to solve a linear system you should:
1. write down the augmented matrix of the system,
2. transform this matrix into a new matrix being in ref,
3. translate the new matrix into the linear system it represents,
4. treat all non-leading variables as parameters (introduce new names for
that) and move them to the right-hand side of the system, ( j = j) !x s
5. solve the triangular system you have arrived at using back substitution (re-
x s
member to take into account the equations j = j defining the parameters).
Gauss-Jordan elimination. In order to solve a linear system you should:
1. write down the augmented matrix of the system,
2. transform this matrix into its rref,
and then either
3a. repeat steps 3.–5. of Gaussian elimination,
or
3b. multiply the columns not containing leading entries by new variables treat-
ed as parameters (but not the last column!), ( j = j j j j) ! x s ;s C !C
4b. subtract the columns with parameters from the last column, zeroing them
afterwords, ( n+1 j !C
n+1) j 0
PC !C ;C !
5b. read off the solutions from this matrix (remember to include appropriate
x s
equations j = j defining the parameters).
Warning. In both above methods, the names of the new parameters (variables) must
all be different from each other.
Commentary. Another variation of Gauss-Jordan method would be to also add new
rows to the matrix from step 4b, namely those corresponding to the equations j = j . x s
In this way, you get a triangular system with all the necessary information in step 5b.
Note also that in Gauss-Jordan method you may work with columns. This is some-
thing new but such operations will reappear later on, in another context.
Example.
1. Let 21 23 2 0 33
A:=42 35and B :=4 4 35.
0 5 2 1
Then 21+0 2+33 2 1 13
A+B =42+4 3+35=4 6 65.
0 2 5+1 2 6
2. Let 21 23 1
4 5
A:= 2 3 and B := 0 0 2
1 3 .
0 5
Then
21 1+( 2) 0 1 0+( 2) ( 1) 1 2+( 2) ( 3)
3
A B = 4 2 1+3 0 2 0+3 ( 1) 2 2+3 ( 3)5=
21 2 83 0 0+5 (
0 1+5 0 1) 0 2+5 ( 3)
= 42 3 55.
0 5 15
Moreover,
1 1+0 2+2 0
1 ( 2)+0 3+2 5
B A = 0 1+( 1) 2+( 3) 0 0 ( 2)+( 1) 3+( 3) 5 =
1 8
= 2 18
.
3. Let
21 23
:= 3 and A:=42 35.
0 5
2 3 1 3 ( 2)3 2 3
Then
3 6
A=4 3 2 3 35=4 6 95.
3 0 3 5 0 15
AB BA
The above example shows in particular that even if both products and are
AB BA
defined, then in general =/ (cf. item 2). One might guess that this is because
AB BA
the above two products are of different sizes. But when are the sizes of and the
same? Well, if A2R B2R m nAB 2R
, n BA
p then m p . Since should also be defined,
p m AB2R
first of all we must have = . But then BA2R m m and n n. If these sizes are
m n A2 R B 2R
to be the same, we must also have = . This means that m m and m m.
A
Such matrices are called square; more precisely, a matrix is called square of order
n nn
if it is of size n R
. The set of all square matrices of order with entries in a ring
M n;R
will be denoted by ( A;B2M n;R AB2M n;R
). Note that if ( ) then also ( ) so the
n
operation of multiplication of square matrices of a given order is an inner operation
in (M n;R AB BA
). This is nice, but does equal for square matrices? The answer is no:
2 1
Example. Let
A:= 11 20 and B := 0 2 .
Then
1 2
AB = 22 51 and BA= 2 4 .
Hence AB =/BA.
You may wonder what sort of properties do hold for the matrix operations + and
. The following theorem provides a partial answer to such question.
Theorem 24. (General properties of matrix operations) Let R be a commuta-
tive ring. Then:
1. (Rmn; +) is an abelian group. The identity of this group is the zero matrix
O 2Rmn defined as 2 0 0 ::: 0 39 >
>
=
6
O :=6
0 0 ::: 0 7
7
4 ::: ::: ::: ::: 5> m
0 0 ::: 0 >
rows.
;
|||||||||n||||{z}}}}}}}}}}}}}}
columns
Theorem 24 exhibits interesting properties of the matrix operations. While all of them
are straightforward to check, the verification of associativity of matrix multiplication is
somewhat tedious (but try to do it!). As was the case in earlier sections, such associative
property means we can safely skip all the brackets when multiplying matrices (also
for the scalar multiplication). Similarly, we agree (as usually) that has a greater
priority than + so that we will skip even more brackets in expressions in which both +
and are involved, e.g. A B A C
+ means ( A B A C
)+( ). However, as we have seen
already, we cannot interchange the order of the factors when multiplying matrices.
0 0 ::: 1 ;
|||||||||n|||||{z}}}}}}}}}}}}}}
columns
It is easy to see that if A;B 2GL(n; K) then A 2GL(n; K) and A B 2GL(n; K). 1
Hence, we get:
Theorem 26. (The structure of GL(n, K)) (GL( n; K); ) is a group.
In principle, the answer to the question in the title of the section is simple: similarly
as was the case with the rings Zn (cf. Commentary on page 23), the invertible elements
M n;
of ( K) are exactly the non-zero-divisors of this ring. To see why this is indeed the
case, first we interpret the elementary row operations on matrices in a new way.
Definition 26. (Elementary matrix) Let K be a field. A matrix ( K) is A2M n;
called an elementary matrix if it is the result of a single elementary row operation
performed on the identity matrix In. More specifically:
! A is an elementary matrix of the first kind if it is the result of the elementary
operation Ri $Rj on In; we will write A=Eij.
! A is an elementary matrix of the second kind if it is the result of the elemen-
tary operation Ri !Ri on In (here = / 0); we will write A=Ei( ).
! A is an elementary matrix of the third kind if it is the result of the elemen-
tary operation Ri +Rj !Rj on In; we will write A=Eij ( ).
Remark. The distinction between elementary matrices of the first, second and third
kinds is merely a matter of terminology convenience. Thus, depending on author’s pref-
erences, the enumeration of these kinds may vary.
00 00
The general form of elementary matrices can be represented graphically as follows:
0 0 0 0
2 3 2 3
1
6
6
...
0 7
7
6
6
1 ...
0 7
7
6 7 6 7
6 0
6
... 1 7
7 ith row 6 0 ...
6
1 7
7 ith row
Eij = Eij( )=
6 0 0 ... 0 1 7 6 1 0 ... 0 0 7
00 00
6 7 6 7
0 1 ... 0 0 0 1 ... 0 0
6
6
6
6
7
7
7
7
6
6
6
6
7
7
7
7
6 0 0 ... 1 0 7 6 0 0 ... 1 0 7
6
6
6
6
1 0 ... 0 0
7
7
7
1 ... 0 7
j th row 6
6
6
6
0 ... 0 1
1 ...
7
7
7
0 7
j th row
6
4
7
5
6
4 7
5
0 ... 1 0 ... 1
2
0
66
Ei( )=66
00
0
1 ... 0
3
77
77 i
00
0 ... 1
1 0 0
4 0 0
0 0 1
1 ... 0
5 th row .
0 ... 1
Second kind
M ;
Example. Consider the set (3 C). The elementary matrix of the first kind corre-
R $R
20 3
sponding to the elementary operation 1 3 is
0 1
E =4 0
13 1 0 5,
1 0 0
The elementary matrix of the second kind corresponding to the elementary operation
1
R !R
2 2 is
21 3
=640 0 0
3
E 2
1
075 1
0 0 1
3 3
and the elementary matrix of the third kind corresponding to the elementary operation
R R !R
i 2 + 3 3 is 21 3
0 0
E (i)=40
23 1 05.
0 i 1
The first simple observation concerning elementary matrices is the following:
(Eij) =Eij ;
1
Ei( ) =Ei ; 1 1
Eij( ) =Eij( ).
1
Corollary 6. Let K be a field and let A2Kmn. Then there exist elementary matri-
E ;:::;E 2M m;
ces 1 r ( K) such that
RREF( A)=Er Er ::: E A. 1 1 (12)
Let us observe that equality (12) tells something useful about invertibility of the
matrix A. Namely, if A 2 M(n; K) then it may happen that RREF(A)=In. In such a
case, putting B :=Er Er ::: E we have that B 2 M(n; K) and In =B A. At this
E
1 1
point we may exploit Property 11. Namely, multiplying this last equality by r 1 from
E E ::: E A
left we get r 1 = r 1 E
. Now multiplying this equality by r from right we get
E ::: E A E E ::: E A E E
1
that also D:=CEr Er :::E = / O. Summing up, we have D= / O and DA=O which
1
means that A is a (right) zero-divisor and as such is non-invertible (see Property 3).
1 1
In another words, to transpose a matrix means to interchange its rows with its columns,
or to flip this matrix about the imaginary „diagonal line” emanating from the upper left
corner of the matrix.
Example. Let
21 3 22 13 1
A:=4 4 5; B :=43 05; C := 0 24 14 .
3 0 1
Then
2 2 1 0
3
AT =[1 4 3]; B = 1 30 1 ; 0 C =4 2 45.
1 4
The basic properties of the transpose operation are:
8
Property 12. Let R be a commutative ring with unity. Then:
8
1. (AT )T =A (in words: the operation T is an involution=self-inverse),
A 2 Rm n
8
2.
m n
(A+B)T =AT +BT,
A;B 2R
(c A)T =c AT,
8
3.
A2R m n
c 2R
4. (A B)T =BT AT,
A 2 Rm n
B 2Rn p
6. 8
A 2M n;R
(
;
5. OT =O (In)T =In,
)
(A is invertible )(AT ) =(A )T ).
1 1
All of the above facts are straightforward to check. We remark only that item 6 is a con-
sequence of items 4 and 5 because In =ITn =( A A
) =( 1)T T and similarly In =
1 T
A A
AA A
T ( 1)T which means that the inverse matrix to T is equal to ( 1)T and this A
is what is asserted in item 6.
Assume now that we are given a right zero-divisor matrix A2 M n; A
( K). Hence,
is non-invertible. This means that also T A
is non-invertible (otherwise, by item 6 of the
A
above property, we would have in particular that there exists (( T )T ) 1 which would
mean that A 1
also exists). By Theorem 28 and the analysis on page 59 we know that
A T must be a right zero-divisor; so CA C 2 M n;
T = O for some non-zero ( K). By
Property 12 items 4–5 we thus get O = O =( ) T A C A C
T T T = T C
. Clearly, T is also
A
non-zero and hence is a left zero-divisor in ( K). M n;
Similarly, if A2M n; A A
( K) is a left zero-divisor then is non-invertible so RREF( )= /
A
In and hence is a right zero-divisor.
We may sum up these observations:
Corollary 7. Let K be a field and A2M(n; K). The following conditions are equiv-
alent:
i. A is non-invertible,
ii. RREF(A)= / In,
Using the above informations we can indicate a method of finding the inverse matrix
A of a matrix A2GL(n; K). Namely, by Theorem 28, A2GL(n; K) iff
1
Er Er ::: E A=In, 1 1
for some elementary matrices E ;:::;Er 2M(n; K). Moreover, then A =Er Er ::: 1
E . But this means that also A =Er Er ::: E In. By Theorem 27 the elementary
1 1
1
matri-ces
we get:
Theorem 29. (First method of finding the inverse matrix) Let K be a field and
A2M n; A
( K). Then the matrix is invertible iff there exists a sequence of elemen-
tary row operations transforming A
into In. Moreover, if this happens then the
same sequence of elementary row operations performed on the identity matrix In
produces the matrix 1
. A
Note that if A
In then necessarily In =RREF( ) because In is in rref (cf. Theorem A
23). Hence, the above suggests using Gauss-Jordan elimination to find 1
(if it exists). A
A convenient method to accomplish this goal is to adjoin matrix In to the matrix and A
consider the matrix of the form [ A
In ]. If using Gauss-Jordan elimination you manage
A A
to transform the –part of this matrix to In then the whole matrix will be of the form
In 1
. If you find that RREF( )= A
/ In you may conclude that the matrix is singular. A
A2M(2; Q) be of the form
Example. Let
2 1
A:= 3 4 .
We adjoin I2 to this matrix and perform the Gauss-Jordan elimination (actually only its
reduction-to-rref part) on it:
2 2 2
A In = 3 14 10 01 1 1 0
6 8 0 2 1
0 5
1 0
3 2
R2 ! R2 R1 R2 ! R2 R1 ! R1
2 3 2 3
2 3 + 1
2
1
R2 ! R2
4 1 0 5 41 0 5= In A
5
1 1 8 1
2 2 10 5 1
.
0 1 3
5
2
5
0R1 12 R2!R1
1 3
5
2
2
If our calculations are correct, we have found that
3
A =4 5= 15 4 1
8 1
10
3 2
1
3
5
2
.
5 5
Thanks to Theorem 28 (second part) we do not have to check the other equality because
it must hold automatically. Thus, we have indeed found the matrix 1
. A
2. If n> 1 then det A:=a C +a C +:::+an Cn , where Cij is called the
cofactor of the entry aij and is defined as Cij :=( 1)i j Mij , Mij is called the
11 11 21 21 1 1
minor of the entry aij and is defined as the determinant of the (n 1)(n 1)
matrix obtained by deleting ith row and jth column from the matrix A.
Example.
i. It is easy to check that det In =1 and det O =0.
a a
ii. Let A=
a a 2M(2;R), R"– a commutative
#
11 12
ring with unity. Then
—————
11 12
11 22 22
"a j a #
12 22
and
—————
12
12
22
12 12
4 3
6
iii. Let A=41 3 17 5. We have
0 1 2
j2j 4 3 2jj 4 3 2jj 4 3
jAj = 2( 1) j1j 3 1 +1( 1) 1jj 3 1 +0( 1) 1jj 3 1
———————
1+1 2+1 ——————— 3+1
j0 1 2 0j 1 2 0j 1 2 ———————
=
13
(
2(32 ( 1)( 1)) 1(42 ( 1)3)+0(4( 1) 33)=25 111 +0= 1.
)
The example above shows how to compute determinants – this can be done using
step-by-step reduction of the dimension of the determinant(s) to compute, up to the
point where one can apply formula (13). While such a procedure always works (by the
very definition of determinant), often various speed-ups are possible. We will see this
shortly, but first let us note the following:
Theorem 30. (Laplace expansion) Let R be a commutative ring with unity and
A2M n;R
( ). Then for every 1 i;j 2f ;:::;ng it holds
det A=a j C j +::: +anj Cnj
1 1 (jth column expansion)
and
det A=ai Ci +::: +ain Cin i
( th row expansion),
Ckl is the cofactor of the entry akl.
1 1
where
22 4 3
3
6
Example. Let us consider the matrix A=41 3 17 5from example iii) on page 62.
0 1 2
We may compute its determinant using e.g. 3rd row expansion as follows:
2 4jj 3 2 4 j3j
jAj=0+( 1) ( 1) 1 j3j 1 +2 ( 1) 1 3 1jj =
3+2 3+3
0 1j 2
——————— 0 1 j2 ———————
=1 21 13 +2 21 43 =
13
1(2( 1) 13)+2(23 14)=1( 5)+22= 1.
( )
R
Corollary 8. Let be a commutative ring with unity and ( ). Then det A2M n;R A
A
=det T. In words: the determinants of a matrix and of its transpose are the same.
A computation speed-up is possible in the following special situations:
2 3 1 2 2 1 0 2 1 2 1 = 2.
1 1 0 C C=!C 1 1 0 C C=!C 0 1 0 =2 ( 1) 3+1
1 0
5 4 1 5 3 1
2+ 3 2
2 3 1 1+ 2 1
Property 14. Let K be a field, A2M(n; K) and 2K. Then det( A)= n det A.
Theorem 31 gives more information that can be seen at first sight. Namely, when
applied to elementary matrices it says that (cf. Definition 26): det Eij = det In = 1,
det Ei( )= det In = and det Eij ( )=det In =1. Combining these formulas with The-
orem 27, we may restate (the „row part” of) Theorem 31 in the following simple form:
E ;F
some elementary matrices i j . Successively applying Corollary 9 to these expressions
A B
for and we get
det(A B) = det(Er ::: E Fs ::: F )=det Er ::: det E det Fs ::: det F =
= det (Er ::: E ) det(Fs ::: F )=det A det B.
1 1 1 1
1 1
det
Cn ... Cnn 1
Example. Consider 2 2 3 1
3
A=4 1 1 0 5.
5 4 1
C =( 1) 1
0 =1, C = 3 1 = 1, C = 3 1 = 1,
We have
1 4 4 1 1 0
1 +1
11 21 31
C = 15 01
12 =1, C 2 1 = 3, C =
5 1
22 = 2 1 = 1, C = 1 1 =
1 0 5 4 32 13
1, C = 52
23
3 = 7, C = 2 3 = 1. Hence,
4 1 1 33
1 1 1 T
2 1 1 1
3 2 3
adj A=[Cij ]T =4
1 3 7 = 1 3 1 5.
5 4
1 1 1 1 7 1
From example on page 64 we know that det A= 2. Using Theorem 34, we thus get
2 1 1 13
A = 4 1 3 1 5. 1 1
1 7 1
2
Verification:
2 1 1 1 2 3 1
32 3 2 2 0 0
3
1
4 1 3 1 54 1 1 0 5= 1
4 0 2 0 5=I . 3
1 7 1 5 4 1 0 0 2
2 2
X =A B.
2x 3 2b 3
1
4 5 4 5 1 1
In particular, let us take m=1. This means that A=[aij ] 66ij66nm, X = xn and B = bn 1
1
so equation (15) looks as follows
2a a n 3 2x 3 2b 3
...
4 11
54 5=4 5.
1 1 1
(16)
an ... ann xn bn
1
2a x a x + ... + a n xn 3 2b 3
Computing the product gives the equation
+
4 11
1
12
5=4 5
2 1 1
an x + an x + ... + ann xn bn
1 1 2 2
(17)
1 1 + an x + ... + ann xn = bn
2 2
2x 3
This means that matrix equation (16) is equivalent to linear system (17) in the sense
that X =4 5is a solution of (16) iff (x ;:::;xn) is a solution of (17). Hence, the above
1
xn
1
analysis gives:
xn bn
( 17) has a unique solution which is equal to
X =A b. 1
It turns out that there is also another way of solving equation (16) or – what amounts
to the same thing – of solving system (17). Namely, let us multiply the equation
2x 3 2b 3
A4 5=4 5
1 1
(18)
xn bn
by adj A from left. By Theorem 34 we thus get
2x 3 2b 3
det A4 5=adj A4 5.
1 1
xn bn
Let us take a look at the right hand side of this equation. This is the matrix whose th i
C b C b ::: C b
row is equal to 1i 1 + 2i 2 + + ni n. But this is, according to Theorem 30, equal
i
to th column expansion of
2a ... a i b a i ... a n 3
det4 5.
11 1 1 1 1 +1 1
|||a||n|||||||...|||||||a||ni||||||||||{z}
1 1 bn }}}a}}}ni}}}}}}}}}}...}}}}}}}a}}nn
Ai =:
+1
}}}}}
In another words, for every i 2f1;:::;ng we get
det A xi =det Ai.
Now, assume we are working over a field K. Then we have two cases. If det =0 then A
A
we get 0=det i, for every i 2f ;:::;ng
1 , which is a necessary condition for solvability of
equation (18) in this case. If det = A
/ 0 then i = det Ai , for every
det A
x 1 . Summing i 2f ;:::;ng
up, we have:
Theorem 36. (Cramer’s rule) Let K be a field. Consider system ( 17). Let be A
the coefficient matrix of this system. Denote by i the matrix constructed fromA A
i b
by replacing its th column with the column of constants of system ( 17).
1. If / 0 then ( 17) has the unique solution (x ; :::;xn) 2Kn, where
det A= 1
Example.
1. Consider the system:
8
<2 x+3 y +4 z =2
:3 xx+47 yy+23zz=0
=1 .
2 32 3 2 3
Written as matrix equation, this system takes the form
2 3 4 x 2
4
1 4 354y5=415.
3 7 2 z 0
We have det A = 2
4 3 + ( 1) ( 1) 3 4 + 3 3 4 =
7 2 7 2 4 3
2 ( 13)+34 +3 ( 25)= 26 +34 75 = 67. Since det A=
/ 0 we know at this
22 3 43
point that our system has a unique solution. To find this solution, we compute:
0 7 2 7 2 7 2
2 2 2 43
det A =det4 1 1 35=2( 1)
1 3 +1 2 4 +0= 27+( 8)=
2
3 0 2 3 2 3 2 22,
2 2 3 23
det A =det4 1 4 15=2
1 4 +1( 1) 2 3 =2( 5) ( 23)=13.
3
3 7
3 7 0 3 7
x= A1 =
det
A
det
60
67
= 6067 ;
y = AA = = ; z = AA = = .
det
det
2 22
67
22
67
det
det
3 13
67
13
67
It is easy to check that this (x;y;z) is indeed the solution of our system.
2. Consider the system
8
<2 x+3 y +4 z =2
:2 xx+3
+4 y 3 z =1 .
y +4 z =0
Its coefficient matrix is 2 2 3 43
A=4 1 4 35.
2 3 4
Since the first and the third rows of this matrix are identical, by Property 13 item
A
2 we infer that det =0. On the other hand, one checks easily that det 1 = / 0, A
which by Theorem 36 item 2 means that this system is inconsistent.
3. Consider the system
8
<2x+3 y +4 z =1
:22xx+3
+3 y +4 z =2 .
y +4 z =3
Its coefficient matrix is 22 3 43
A=42 3 45.
2 3 4
Here, again by Property 13 item 2, we infer that det =0. Similarly, det i =0 A A
i ;; A
for =1 2 3, because in every i one of its columns is a multiple of another one
(cf. Property 13 item 3). Clearly, in this case the system is inconsistent because
by subtracting e.g. its first two equations we get 0=1 2= 1.
4. Consider the system
8
<2x+3 y +4 z =1
:22xx+3
+3 y +4 z =1 .
y +4 z =1
As above, its coefficient matrix is
22
3 4
3
A=42
3 45.
2 3 4
We have det A =0 and det Ai =0 for i =1; 2; 3 (same reasons as in the previous
example). But in this case we have infinitely many solutions, because the system
x y z
reduces to only one equation: 2 +3 +4 =1 – a plane in R3.
Examples 3 and 4 show that in general one cannot strengthen item 3 of Theorem 36
(both cases may happen). There are, however, some special cases when more can be said.
The first one is:
Property 15. Using the same notations as in Theorem 36, assume that system
x x
( 17) has only two variables 1 , 2 (and two equations as well). If det =0 and A
A A
det 1 =det 2 =0 then ( 17) has infinitely many solutions.
8
that is if the system has the form
<a x + a x + ... + a n xn = 0
.
:an x
11 1 12 2 1
1 1 + an x + ... + ann xn = 0
2 2
Obviously, a homogeneous system always has a solution (the trivial one, that is x =::: =
x
1
n =0). Hence:
Corollary 10. Using the same notations as in Theorem 36, assume that system
A
( 17) is homogeneous. If det =0 and det i =0 for every 1 , then ( 17) A i 2f ;:::;ng
has infinitely many solutions.
Property 16. Let K be a field and A2Kmn. Then rank A6min (m;n). Moreover,
A
rank =rank T. A
We have:
8
Theorem 37. (Rouché-Capelli) Let K be a field. Consider the system
<a x a x + ... + a n xn = b
+
:am x , ()
11 1 12 2 1 1
1 + am x + ... + amn xn = bm
1 2 2
where aij ;bi 2 K. Then system ( ) has a solution in K iff the ranks of the coef-
ficient matrix A and the augmented matrix [Ajb] are the same; in symbols – iff
rank A = rank [Ajb]. Moreover, if ( ) does have solutions, then the dimension
of the hyperplane of solutions is equal to n rank A. In particular, ( ) has
a unique solution iff n =rank A=rank [Ajb].
TBC...
Further reading 71
Further reading
[Gar07] Garrett, P. Abstract Algebra . Available at http://www-users.math.umn.edu/~garrett/m
/algebra/notes/Whole.pdf.
[Goo06] Goodman, F., M. Algebra: Abstract and Concrete. Available at http://homepage.math.
uiowa.edu/~goodman/algebrabook.dir/book.2.6.pdf.
[Sho08] Shoup, V. A Computational Introduction to Number Theory and Algebra . Available at
http://shoup.net/ntb