Parham and AmirAli’s work
Thursday, October 24, 2024
Huffman tree and Huffman coding
How does it happen?
Given a name Amir Ali
To construct Huffman codes for any thing (here a word specifically), follow these
steps:
Calculate Frequencies: Count the frequency of each character in the word.
Note: We're treating uppercase and lowercase letters as distinct characters. However, in
"AmirAli", all instances of 'A' are uppercase, and all instances of 'i' are lowercase.
Create a leaf node for each character and add it to a priority queue (min-
heap) based on their frequencies:
1. m (frequency 1)
2. r (frequency 1)
3. l (frequency 1)
4. A (frequency 2)
5. i (frequency 2)
Build the Huffman Tree:
We iteratively combine the two nodes with the lowest frequencies until only one
node (the root) remains.
1


Parham and AmirAli’s work
Iteration 1:
• Combine: m and r (both with frequency 1).
• Create: New node N1 with frequency 2.
• N1 becomes the parent of m and r.
Queue after Iteration 1:
• l (frequency 1)
• A (frequency 2)
• i (frequency 2)
• N1 (frequency 2)
Iteration 2:
• Combine: l (frequency 1) and N1 (frequency 2).
• Create: New node N2 with frequency 3.
• N2 becomes the parent of l and N1.
Queue after Iteration 2:
• A (frequency 2)
• i (frequency 2)
• N2 (frequency 3)
Iteration 3:
• Combine: A and i (both with frequency 2).
• Create: New node N3 with frequency 4.
• N3 becomes the parent of A and i.
Queue after Iteration 3:
• N2 (frequency 3)
• N3 (frequency 4)
Iteration 4:
• Combine: N2 (frequency 3) and N3 (frequency 4).
2

Parham and AmirAli’s work
• Create: New root node N4 with frequency 7.
• N4 becomes the parent of N2 and N3.
Assign Codes to Each Character:
Assign binary codes to each character by traversing the Huffman tree. Moving left
adds a '0' to the code, and moving right adds a ‘1'.
[N4:7]
/ \
'0'/ \'1'
/ \
[N2:3] [N3:4]
/ \ / \
'0'/ \'1' '0'/ \'1'
/ \ / \
[l:1] [N1:2] [A:2] [i:2]
/ \
'0'/ \'1'
/ \
[m:1] [r:1]
Final Huffman Codes:
1 0 0 1 0 1 1 0 1 1 1 0 0 1 1 or '10' '010' '11' '011' '10' '00' '11'
3

Parham and AmirAli’s work
Paths and Codes:
• 'l': Path '0' (to N2) + '0' (to 'l') = '00'
• 'm': Path '0' (to N2) + '1' (to N1) + '0' (to 'm') = '010'
• 'r': Path '0' (to N2) + '1' (to N1) + '1' (to 'r') = '011'
• 'A': Path '1' (to N3) + '0' (to 'A') = '10'
• 'i': Path '1' (to N3) + '1' (to 'i') = '11'
To decode the binary string 1001011011100011 back to "AmirAli", follow the
paths in the Huffman tree:
• Start at the root node.
• Read bits one by one:
◦ '1' then '0' → 'A'
◦ '0' '1' '0' → 'm'
◦ '1' '1' → 'i'
◦ '0' '1' '1' → 'r'
◦ '1' '0' → 'A'
◦ '0' '0' → 'l'
◦ '1' '1' → 'i'
Tips To Consider:
In a min-heap priority queue used for Huffman coding, nodes are organized based
on their frequencies—the node with the smallest frequency has the highest
priority (i.e., it's at the top of the heap). When multiple nodes have the same
frequency, the order in which they are selected can vary because their
frequencies are equal.
Why were 'm' and 'r' selected before 'l' in the min-heap priority queue?
• Equal Frequencies: In the word "AmirAli", the characters 'm', 'r', and 'l'
each have a frequency of 1.
4

Parham and AmirAli’s work
• Arbitrary Selection: When multiple nodes have the same minimum frequency,
any two of them can be selected for combination. The Huffman algorithm
doesn't specify which ones to pick first in this case.
• Our Example: We arbitrarily chose to combine 'm' and 'r' first. This was a
random choice for illustration purposes.
Detailed Explanation
Step-by-Step Selection in the Min-Heap
1. Initial Min-Heap Content:
◦ Nodes with frequency 1: 'm', 'r', 'l'
◦ Nodes with frequency 2: 'A', 'i'
2. Iteration 1:
◦ Select Two Nodes with Minimum Frequency:
▪ Nodes: 'm' (frequency 1) and 'r' (frequency 1)
▪ Why 'm' and 'r'? Because all three nodes ('m', 'r', 'l') have
the same frequency, we can pick any two. We chose 'm' and 'r'
arbitrarily.
◦ Combine Them:
▪ Create a new node N1 with frequency 1+1=2
1 + 1 = 2
1+1=2.
▪ N1 becomes the parent of 'm' and 'r'.
◦ Update Min-Heap:
▪ Remove 'm' and 'r'.
▪ Insert N1 (frequency 2).
3. Iteration 2:
◦ Select Two Nodes with Minimum Frequency:
▪ Nodes: 'l' (frequency 1) and N1 (frequency 2)
◦ Combine Them:
▪ Create a new node N2 with frequency 1+2=3
1 + 2 = 3
1+2=3.
5

Parham and AmirAli’s work
▪ N2 becomes the parent of 'l' and N1.
◦ Update Min-Heap:
▪ Remove 'l' and N1.
▪ Insert N2 (frequency 3).
Key Points
• Arbitrary Choice with Equal Frequencies:
◦ The selection between 'm', 'r', and 'l' was arbitrary because they
all had the same frequency.
◦ We could have combined 'l' and 'm', or 'l' and 'r', and the final
Huffman codes would still be optimal.
• Final Huffman Codes Unaffected:
◦ The total length of the encoded message and the efficiency of the
Huffman codes remain the same regardless of the order in which
equally frequent nodes are combined.
Alternative Scenario:
Suppose we had combined 'l' and 'm' first instead:
The Huffman tree structure would be slightly different, but the overall encoding
efficiency would remain the same.
The specific Huffman codes for each character might differ in their bit patterns,
but the total number of bits used to encode the message would be identical.
Additional Note
• If you have specific criteria or preferences (e.g., always selecting the
earliest character alphabetically), you can apply that consistently
throughout the algorithm.
• However, the standard Huffman algorithm does not mandate any particular
tie-breaking rule, focusing instead on minimizing the total encoded length.
6