Cryptology and
Coding Theory
Information & Coding Theory
Christopher J Kawishe
MIS, BSc CS, ICDL
Author | Researcher | Trainer | Consultant
Expert in Cybersecurity, System Development & AI
The fundamental problem of
communication is that of
reproducing at one point either
exactly or approximately a
message selected at another point.
(Claude Shannon, 1948)
Claude E. Shannon (1916-2001)
To solve this problem, Shannon created a branch of
applied mathematics: Information Theory.
3
Prepared by Christopher James
Information Theory is a mathematical framework for quantifying, storing, and communicating information. It influences many areas of computer science, such as data compression, cryptography, machine learning, communications, and even algorithm design.
Applications in Computer Science
• Data Compression - Uses entropy to prioritize frequent patterns (lossless) or discard redundant data (lossy).
• Error-Correcting Codes - Ensure data integrity in storage (CDs) and transmission (Wi-
Fi, 5G). Hamming codes correct single-bit errors; LDPC codes approach Shannon limits.
• Cryptography - Information-theoretic security (e.g., one-time pads) guarantees
secrecy if keys are truly random and never reused.
• Machine Learning - Decision trees use entropy reduction (information gain) for splits.
Cross-entropy loss optimizes classification models.
• Telecommunications - Modems and MIMO systems leverage channel capacity for
high-speed wireless/data networks.
• Quantum Computing - Extends principles to qubits, enhancing secure communication
via quantum entropy.
Foundations of Information Theory
Digital communication
•Process of transmitting digital signals from a source to a
destination through a communication channel.
•This process encompasses
•Converting information into a digital form,
•Encoding it for efficient transmission,
•Conveying it through a possibly noisy channel, and
•Decoding it to retrieve the original message.
Shannon’s
Communication Model
[Block diagram of a digital communication system:
Message Source → Source Encoder → Channel Encoder → Modulator → RF → Physical Channel → RF → Demodulator → Channel Decoder → Source Decoder → Message Sink.
The source encoder reduces the number of bits required to convey a given amount of information (entropy, binary digits); the channel encoder adds redundant bits to the message to protect it against transmission errors; the signal-to-noise ratio (SNR) of the physical channel determines the bit error probability before decoding, which the channel decoder improves to a lower bit error probability after decoding.]
Digital Communication System
• Message Source - The origin of the information (e.g., voice, text, video)
that needs to be transmitted.
• Source Encoder - Reduce redundancy in the message to minimize the
number of bits required for transmission. E.g., Huffman coding.
• Channel Encoder - Protect the message against transmission errors by
adding redundant bits. E.g., Hamming codes, Reed-Solomon
• Modulator - Converts digital bits into analog signals suitable for
transmission over a physical channel (e.g., RF signals).
• Physical Channel - The medium (e.g., wired, wireless, optical fiber)
through which the signal travels. Signal-to-noise ratio (SNR) affects bit
error probability before decoding.
Digital Communication System
• Demodulator - Converts the received analog signal back into
digital bits.
• Channel Decoder - Corrects errors introduced during transmission using the redundant bits. Key metric: bit error probability after decoding (improved by error correction).
• Source Decoder - Reconstructs the original message from the
compressed data (reverses source encoding).
• Message Sink - The final destination where the information is
received (e.g., speaker, display).
Entropy and Data
Measurement
Measure of uncertainty or randomness
associated with a random variable.
H(X) = − ∑ p(x) log2 p(x), summed over all symbols x
Where:
• H(X): Entropy of random variable X. H(X) is always non-negative; its minimum value is 0
• p(x): Probability of occurrence of symbol x
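As a quick illustration, the following is a minimal Python sketch of this formula (the probability lists are illustrative examples, not taken from the slides):

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum of p(x) * log2 p(x), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair six-sided die: six equally likely outcomes
print(entropy([1/6] * 6))    # ~2.585 bits

# A heavily biased coin is far more predictable, hence lower entropy
print(entropy([0.9, 0.1]))   # ~0.469 bits
```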
Higher entropy means more uncertainty or
information content
• More randomness and unpredictability
• Stronger security (harder to guess or
brute-force)
• Examples: Tg$9k@Lm3Pq!, cryptographic
keys
• Less redundancy; harder to compress
• Ideal for secure passwords and encryption
Lower entropy implies predictability (e.g., a
biased coin).
• More predictable and ordered
• Weaker security (easier to guess or crack)
• Examples: 123456, password
• More redundancy; easier to compress
• Risky for passwords or secure data
For example,
• Tossing a die: outcomes are 1, 2, 3, 4, 5, 6
• Each occurs with probability 1/6
• The information provided by tossing the die is
H = − ∑_{i=1}^{6} p(i) log2 p(i) = − ∑_{i=1}^{6} (1/6) log2 (1/6) = log2 6 ≈ 2.585 bits
Wait!
The number 2.585 bits is not an integer!!
What does that mean?
• Using fixed-length encoding, each of 1,000,000 die rolls (6 possible outcomes) requires 3 bits, totaling 3,000,000 bits.
• But by applying entropy-based (variable-length) encoding, we can reduce the average to ≈2.585 bits per roll, needing only about 2,585,000 bits, roughly a 14% reduction in storage.
• This demonstrates how entropy-based compression saves space when data has patterns or unequal probabilities.
Let’s Compare
                  File Size         Compression Ratio
No Compression    8,000,000 bits    100%
Shannon           2,585,000 bits    32.31%
WinZip            2,930,736 bits    36.63%
WinRAR            2,859,336 bits    35.74%
Information Rate (R)
Information Rate (R) refers to the rate at which information is transmitted over a communication channel. It is typically measured in bits per second (bps).
R = rH
Where:
• R is the information rate in bits per second.
• r is the symbol rate (symbols per second).
• H is the entropy per symbol (average information content per symbol in bits).
Example: Find the information rate if a source emits 1 symbol/ms and its entropy is 7/4 bits/symbol.
R = (7/4 bits/symbol) × (1 symbol / 10⁻³ s) = 1750 bits/sec or 1.75 kbps
OR, since 1 second = 1000 milliseconds:
R = 1.75 bits/symbol × 1000 symbols/sec = 1750 bits/sec (bps) or 1.75 kbps
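The same calculation can be checked with a short Python snippet (values taken from the example above):

```python
r = 1 / 1e-3   # 1 symbol per millisecond = 1000 symbols/second
H = 7 / 4      # entropy: 1.75 bits per symbol
R = r * H
print(R)       # 1750.0 bits/second, i.e. 1.75 kbps
```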
Follow-up Story
Later, in 1952, David Huffman (1925-1999), while a graduate student at MIT, presented a systematic method to achieve the optimal compression ratio guaranteed by Shannon. The coding technique is therefore called the "Huffman code" in honor of his achievement.
Huffman codes are used in nearly every application that involves the compression and transmission of digital data, such as fax machines, modems, computer networks, and high-definition television (HDTV), to name a few.
Source vs. Channel Coding
Kraft Inequality
• If every encoded sequence can be decoded in only one way, the code is called uniquely decodable.
• Kraft's inequality helps to determine whether a code can be uniquely decodable.
• Satisfying Kraft's inequality does not guarantee that the code is uniquely decodable.
• But if the result is greater than 1, the code is not uniquely decodable.
∑_{k=1}^{L} 2^(−n_k) ≤ 1
Where:
• n_k is the number of bits in the k-th codeword
• L is the number of codewords in the code
Kraft Inequality

CODE A    RX
0         0
100       100
110       110
111       111

• CODE A satisfies Kraft's inequality (since the sum is 7/8 ≤ 1)
• This means CODE A can be a prefix code and uniquely decodable

CODE B    RX
0         0
10        10
010       0, 10 or 010
001       001

• CODE B also satisfies Kraft's inequality (since the sum equals 1)
• But it is not a prefix code and not uniquely decodable
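A short Python check of both codes (the codeword lists are taken from the tables above; the prefix test is a simple pairwise comparison):

```python
def kraft_sum(codewords):
    """Sum of 2^(-n_k) over all codeword lengths n_k."""
    return sum(2 ** -len(c) for c in codewords)

def is_prefix_free(codewords):
    """True if no codeword is a prefix of another codeword."""
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

code_a = ["0", "100", "110", "111"]
code_b = ["0", "10", "010", "001"]

print(kraft_sum(code_a), is_prefix_free(code_a))  # 0.875 (7/8), True
print(kraft_sum(code_b), is_prefix_free(code_b))  # 1.0, False ("0" prefixes "010" and "001")
```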
Average code length
• The Average Code Length indicates the expected number of bits per symbol in the encoded data.
• The average codeword length L per source symbol is defined by
L = ∑_{j=1}^{N} P(x_j) n_j
Code Efficiency
• Code Efficiency measures how close the average code length is to the theoretical lower bound (entropy). Higher efficiency means less wastage of bits.
• In an optimal coding scheme, the average code length can approach the entropy. Efficiency is defined as the ratio of the source entropy H to the average code length L.
• The code efficiency η is given by η = L_min / L, where L_min is the minimum possible value of L.
• The efficiency of a prefix code is therefore η = H(X) / L.
Code Redundancy
• Code Redundancy quantifies the amount of extra bits in the encoding compared to the theoretical optimum.
• It measures the inefficiency, or overhead, of the encoding scheme.
• Redundancy is the difference between the average code length and the source entropy, given by R = L − H, or in relative terms γ = 1 − η.
Shannon Fano Coding
A source emits 6 symbols with probabilities 0.3, 0.25, 0.2, 0.12, 0.08, 0.05 respectively. Construct the Shannon-Fano code and find the code efficiency (η).
Solution
• First arrange the probabilities from highest to lowest
• Then partition them so that the sums of the two groups are as close as possible, e.g. 0.3 + 0.25 = 0.55 and 0.2 + 0.12 + 0.08 + 0.05 = 0.45
• Assign 0 to the group above the partition and 1 to the group below it
• Repeat the steps on each group until every group contains only one symbol (see the sketch below)
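A minimal recursive Python sketch of this procedure (the symbol names s1 to s6 are placeholders; ties at the partition step may be broken differently than in a hand-worked solution):

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, probability) pairs. Returns {symbol: code}."""
    def split(items, prefix, codes):
        if len(items) == 1:
            codes[items[0][0]] = prefix or "0"
            return
        total = sum(p for _, p in items)
        # pick the partition whose upper-group sum is closest to half the total
        best_i, best_diff, running = 1, float("inf"), 0.0
        for i in range(1, len(items)):
            running += items[i - 1][1]
            diff = abs(running - (total - running))
            if diff < best_diff:
                best_i, best_diff = i, diff
        split(items[:best_i], prefix + "0", codes)   # upper group gets 0
        split(items[best_i:], prefix + "1", codes)   # lower group gets 1

    items = sorted(symbols, key=lambda sp: sp[1], reverse=True)
    codes = {}
    split(items, "", codes)
    return codes

probs = [("s1", 0.30), ("s2", 0.25), ("s3", 0.20),
         ("s4", 0.12), ("s5", 0.08), ("s6", 0.05)]
print(shannon_fano(probs))
# {'s1': '00', 's2': '01', 's3': '10', 's4': '110', 's5': '1110', 's6': '1111'}
```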
Shannon Fano Coding
Entropy H ≈ 0.521 + 0.5 + 0.464 + 0.367 + 0.292 + 0.216 ≈ 2.36 bits per symbol
Average Code Length
L = 0.6 + 0.5 + 0.4 + 0.36 + 0.32 + 0.2 = 2.38 bits per symbol
Code efficiency (η) is defined as η = H / L ≈ 2.36 / 2.38 ≈ 0.992, i.e. about 99.2%
Code Redundancy γ = 1 − η ≈ 0.008, i.e. about 0.8%
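These figures can be verified with a few lines of Python (the code lengths 2, 2, 2, 3, 4, 4 are those of the Shannon-Fano code constructed above):

```python
import math

p = [0.30, 0.25, 0.20, 0.12, 0.08, 0.05]
n = [2, 2, 2, 3, 4, 4]                      # codeword lengths

H = -sum(pi * math.log2(pi) for pi in p)    # source entropy
L = sum(pi * ni for pi, ni in zip(p, n))    # average code length
eta = H / L                                 # code efficiency
gamma = 1 - eta                             # relative redundancy

print(round(H, 2), round(L, 2))             # 2.36 2.38
print(round(eta, 4), round(gamma, 4))       # ~0.9917 and ~0.0083
```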
Huffman Coding
• Data compression technique that is used to compress data files.
• Developed by David A. Huffman in the 1950s.
• Uses a variable-length code for each character in the file. The
code lengths are determined by the frequency of each character
in the file.
• The most common characters are assigned the shortest codes,
and the less common characters are assigned longer codes.
• On average this should decrease the file size (often by roughly half)
Applications of Huffman Coding
• Used to reduce the amount of storage space required for digital
images, audio data.
• Useful for reducing the size of text documents or email
attachments.
• When data is transmitted over a network, it is often compressed
using Huffman coding to reduce the amount of bandwidth
required. This can help to speed up data transfer and reduce
costs associated with transmitting data.
Huffman Coding
A source emits 6 symbols with probabilities 0.3, 0.25, 0.2, 0.12, 0.08, 0.05 respectively. Construct the Huffman code and find the code efficiency (η).
Solution
• First arrange the probabilities from highest to lowest
• Add the last two (the two smallest probabilities)
• Arrange them from highest to lowest again and add the last two; repeat these steps until only two probabilities remain
• Then, working back through the partitions, write 0 on the upper branch and 1 on the lower branch of each combination
Huffman Coding
Entropy H ≈ 0.521 + 0.5 + 0.464 + 0.367 + 0.292 + 0.216 ≈ 2.36 bits per symbol
Average Code Length
L = 0.6 + 0.5 + 0.4 + 0.36 + 0.32 + 0.2 = 2.38 bits per symbol
Code efficiency (η) is defined as η = H / L ≈ 2.36 / 2.38 ≈ 0.992, i.e. about 99.2%
Code Redundancy γ = 1 − η ≈ 0.008, i.e. about 0.8%
Another Method - Huffman Coding
• Huffman's algorithm is an example of a greedy algorithm.
• In general, greedy algorithms make small-grained, locally minimal or maximal choices in an attempt to reach a global minimum or maximum.
• We pick the two nodes with the smallest frequency and combine them to form a new node.
• The two selected nodes are removed from the set and replaced by the combined node.
• This continues until we have only one node left in the set (a code sketch follows below).
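A compact Python sketch of this greedy procedure using a min-heap (a sketch only; the 1/0 assignment at each merge is arbitrary here, so the bit patterns may differ from the hand-drawn tree even though the code lengths match):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """freqs: dict of symbol -> frequency. Returns dict of symbol -> code string."""
    order = count()  # tie-breaker so the heap never compares the dict payloads
    heap = [(f, next(order), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # the two smallest-frequency nodes
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "1" + code for s, code in c1.items()}
        merged.update({s: "0" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(order), merged))
    return heap[0][2]

freqs = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 50, "F": 60}
codes = huffman_codes(freqs)
print(codes)
print(sum(freqs[s] * len(codes[s]) for s in freqs))  # 510 bits, as in the worked example
```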
Greedy Algorithm Example:
Alphabet: A, B, C, D, E, F
Frequency table:
A B C D E F
10 20 30 40 50 60
Total File Length: 210
Run Greedy Algorithm
Start: A 10, B 20, C 30, D 40, E 50, F 60
Merge steps (each step combines the two smallest-frequency nodes):
• A 10 + B 20 → X 30, leaving X 30, C 30, D 40, E 50, F 60
• X 30 + C 30 → Y 60, leaving D 40, E 50, Y 60, F 60
• D 40 + E 50 → Z 90, leaving Y 60, F 60, Z 90
• Y 60 + F 60 → W 120, leaving Z 90, W 120
Huffman Coding
• Now we assign codes to the tree by placing a 1 on every left branch and a 0 on every right branch
• A traversal of the tree from root to leaf gives the Huffman code for that particular leaf character
• Note that no code is the prefix of another code
Run Greedy Algorithm
• The final merge combines Z 90 and W 120 into the root V 210.
• In the resulting tree, V 210 branches to Z 90 (1) and W 120 (0); Z 90 branches to D 40 (1) and E 50 (0); W 120 branches to Y 60 (1) and F 60 (0); Y 60 branches to X 30 (1) and C 30 (0); X 30 branches to A 10 (1) and B 20 (0).
The Huffman encoding:
A: 0111
B: 0110
C: 010
D: 11
E: 10
F: 00
File Size: 10x4 + 20x4 + 30x3 + 40x2 + 50x2 + 60x2 =
40 + 80 + 90 + 80 + 100 + 120 = 510 bits
Note the savings
• The Huffman code:
  • Required 510 bits for the file.
• Fixed-length code:
  • Needs 3 bits per character for 6 distinct characters.
  • The file has 210 characters.
  • Total: 630 bits for the file.
Error Control Coding
Types of Bit Errors
Single-bit Error - occurs when exactly one bit in a data unit (or word) is
altered during transmission.
• Single-bit errors are the simplest form of corruption.
• Common in scenarios where minor, random noise influences the signal.
• The change of just one bit can be critical if it occurs in a sensitive part of
the data (for example, a parity bit or control flag).
Types of Bit Errors
Burst Error - group of consecutive bits being in error. The error typically spans
multiple bit positions.
• Often result from short-term, high-intensity disturbances like electrical spikes
or severe interference.
• Can be more challenging to detect and correct because the corruption is not
isolated to a single bit.
Types of Bit Errors
• The term burst error means that two or more bits in the data unit have changed from 1 to 0 or from 0 to 1.
• A burst error does not necessarily mean that the errors occur in consecutive bits; the length of the burst is measured from the first corrupted bit to the last corrupted bit. Some bits in between may not have been corrupted.
Error Detection vs.
Correction
Error Detection
•This technique involves identifying the occurrence of an
error in the transmitted data.
•Means to decide whether the received data is correct or
not without having a copy of the original message.
•Uses the concept of redundancy, which means adding
extra bits for detecting errors at the destination.
•Common techniques: parity check, checksums, cyclic redundancy check (CRC); a simple parity example is sketched below.
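As a minimal illustration of redundancy-based detection, the following Python sketch appends a single even-parity bit (a toy example; real systems apply checksums or CRCs over much larger blocks):

```python
def add_parity(bits):
    """Append an even-parity bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]

def check_parity(word):
    """Return True if the received word still has even parity."""
    return sum(word) % 2 == 0

sent = add_parity([1, 0, 1, 1])   # -> [1, 0, 1, 1, 1]
print(check_parity(sent))         # True: no error detected

received = sent.copy()
received[2] ^= 1                  # a single-bit error during transmission
print(check_parity(received))     # False: error detected (but not located)
```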
Redundancy
Hamming Code
• Uses multiple redundant bits inserted at specific positions,
enabling the detection and correction of single-bit errors.
• Memory systems (RAM) and low-latency communication
systems where real-time correction is essential.
Single-bit error
Error Detection
• Use XOR (modulo-2 addition) to add the bits when computing and re-checking the parity bits.
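A minimal Hamming(7,4) sketch in Python illustrating the XOR-based check (this assumes the common convention of parity bits at positions 1, 2 and 4, which may differ from the layout used in the slide figures):

```python
def hamming74_encode(d):
    """d: list of 4 data bits. Returns a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4           # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4           # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4           # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_correct(word):
    """Locate and fix a single-bit error using the XOR syndrome."""
    b = [0] + word              # 1-based indexing for convenience
    s1 = b[1] ^ b[3] ^ b[5] ^ b[7]
    s2 = b[2] ^ b[3] ^ b[6] ^ b[7]
    s4 = b[4] ^ b[5] ^ b[6] ^ b[7]
    pos = s1 + 2 * s2 + 4 * s4  # 0 means no single-bit error detected
    if pos:
        word[pos - 1] ^= 1      # flip the erroneous bit
    return word, pos

code = hamming74_encode([1, 0, 1, 1])   # -> [0, 1, 1, 0, 0, 1, 1]
code[4] ^= 1                            # corrupt position 5 during "transmission"
fixed, pos = hamming74_correct(code)
print(pos, fixed)                       # 5 [0, 1, 1, 0, 0, 1, 1]
```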
Individual Assignment
Digital Communication System
1. What is mutual information, and how is it useful in communication and machine
learning?
2. Define redundancy in the context of information theory. How does it help in
error detection and correction?
3. What are the trade-offs between redundancy and efficiency in communication
systems?
4. What is noise in communication systems, and what are its common sources?
5. How does noise introduce uncertainty and errors in the transmission of
messages?
6. How is reliability defined in digital communication systems?
7. What strategies are used to enhance reliability in the presence of noise?
Digital vs. Analog Communication
1. What is the key difference between analog and digital transmission in terms of signal
representation?
2. Why is analog transmission more prone to degradation over long distances compared to
digital transmission?
3. List three advantages of digital communication over analog communication.
4. How does digital transmission improve noise immunity compared to analog transmission?
5. What are two main disadvantages of digital communication?
6. Why does digital communication require more bandwidth compared to analog
communication?
7. How does digital transmission facilitate easier error detection and correction?
8. What role does redundancy play in improving the reliability of digital communication?
9. Why is analog-to-digital conversion necessary in digital communication, and what
challenge does it introduce?
10. How does the complexity of hardware in digital systems compare to that in analog
systems?
Data Compression
1. What is data compression, and what are its advantages?
2. Compare lossless and lossy compression methods. Provide
examples of each.
Source vs. Channel Coding
1. What is the primary goal of source coding? Give an example.
2. What is the primary goal of channel coding? Give an example.
3. Define the following terms: Source alphabet, Binary codeword and Length of
codeword
4. Compare fixed-length and variable-length codes. Provide an example of each.
5. What is a uniquely decodable code? How does a prefix-free code ensure
unique decodability?
6. What are instantaneous codes, and why are they useful in real-time decoding?
7. What is entropy coding, and how does it relate to the entropy of a source?
Name two examples of entropy coding techniques.
Channel Capacity
1. What does "channel capacity" mean in the context of information theory?
2. Why is it important to know the maximum rate at which information can be transmitted
reliably over a channel?
3. How does the capacity of a noiseless channel differ from that of a noisy channel?
4. Can you give an example of a real-world communication system that might approximate a
noiseless channel?
5. How does the probability p of a bit flip affect the reliability of a Binary Symmetric Channel?
6. What strategies could be used to mitigate errors in a BSC?
7. Why is the AWGN model considered realistic for many communication systems?
8. How does Gaussian noise impact the signal quality and the channel capacity?
9. How does the Binary Erasure Channel model packet loss in networks?
10. What are the implications of erasures (lost packets) for data transmission?
Error Control Coding
1. What is the purpose of error-correcting codes?
2. How does adding redundancy help in error correction?
3. What are the key objectives of a good error control
coding scheme?
4. At which layers of the OSI model are error detection and
correction typically implemented?
THANK YOU
©2025 Christopher has over a decade of experience in IT infrastructure, cybersecurity, and education, with a proven track record in designing, implementing, and supporting digital learning environments. His professional journey includes roles as a tutorial assistant, computer engineer, and assistant system analyst, where he enhanced institutional capabilities by developing custom web-based systems and providing training to over 10,000 users. In consultancy, he has led workshops on eLearning, cybersecurity, and artificial intelligence.