Rene Nyffeneggers base64
Decoding Algorithm Explained
Michael Horii
Intro
Were using Rene Nyffeneggers code to
decode base64 strings.
Code is available at:
http://www.adpgmbh.ch/cpp/common/base64.html
The following slides explain how the algorithm
works.
Encoded Symbols
Suppose we have four encoded symbols to
decode, where each symbol is represented by
a 6-bit vector. For ease of illustration, well
choose the encoded symbols as follows:
Encoded Symbol 1: ES1 = 111111
Encoded Symbol 2: ES2 = 000000
Encoded Symbol 3: ES3 = 111111
Encoded Symbol 4: ES4 = 000000
Decoding the String
To decode the base64 string, we concatenate the 6bit vectors, and group them in 8-bit vectors.
ES=Encoded Symbol
ES1
ES2
ES3
ES4
111111000000111111000000
DS=Decoded Symbol
DS1
DS2
DS3
111111000000111111000000
C
Note we start with a
4-symbol encoded
string, and end up
with a 3-symbol
decoded string
We match each 8-bit vector to a symbol using a
look-up table.
11111100 C
00001111
11000000
A
T
The Catch
This decoding process is straightforward
conceptually, but the catch is that bytes are 8
bits, while the encoded symbols are
represented by 6-bit vectors.
That means well need to use three operators
to accomplish our goal of assembling the bits:
Bit shifting
Masking
ORing
Prepended Encoded Symbols
When we assign a 6-bit vector to a (1-byte) char
variable, two zeroes are prepended to the front
of the 6-bit vector:
111111 becomes 00111111
000000 becomes 00000000
Well refer to a prepended encoded symbol as PES:
PES1 = 00111111
PES2 = 00000000
PES3 = 00111111
PES4 = 00000000
(ES1 = 111111)
(ES2 = 000000)
(ES3 = 111111)
(ES4 = 000000)
Aligning Bit Vectors
Because were fitting 6-bit vectors into 8-bit
containers, we need to align the 6-bit vectors
using bit shifting.
ES1
ES2
ES3
ES4
111111000000111111000000
unsigned char PES1 = 0b111111;
unsigned char PES2 = 0b000000;
DS1
PES1 shifted left by 2
to align the bits
DS2
00111111
00000000
DS3
111111000000111111000000
00111111
00000000 Apply mask to PES2 to get the two digits
(underlined) needed for DS1, and then shift
right by 4 to align the bits
Plan of Action
The following slides show in detail how the
bytes are manipulated, from start to finish, to
achieve our goal.
Steps:
Populate Byte 1 (DS1) using ES1 and ES2
Populate Byte 2 (DS2) using ES2 and ES3
Populate Byte 3 (DS3) using ES3 and ES4
ES=Encoded Symbol
ES1
ES2
ES3
ES4
111111000000111111000000
DS=Decoded Symbol
DS1
DS2
DS3
111111000000111111000000
Populating Byte 1
Byte 1
Prepended ES1 = PES1
111111000000111111000000
00111111
Shift left by 2
11111100
11111100
Prepended ES2 = PES2
00000000
Mask 00110000
Apply Mask
00000000
Blue: Digits
from ES1
Orange: Digits
from ES2
PES1 << 2
To populate Byte 1, we only want the first two nonprepended digits of ES2, so we use a mask to get these
digits
PES2 & 0x30
Shift right by 4
00000000
00000000
(PES2 & 0x30) >> 4
11111100
(PES1 << 2) | ((PES2 & 0x30) >> 4)
Populating Byte 2
Byte 2
PES2
111111000000111111000000
00000000 To populate Byte 2, we only want the
Mask 00001111
last 4 digits of ES2, so we use a mask to
Apply Mask
get these digits
00000000 PES2 & 0xf
Shift left by 4
00000000
00000000
00111111
Mask 00111100
PES3
Apply Mask
00111100
Shift right by 2
(PES2 & 0xf) << 4
To populate Byte 2, we only want the first 4 nonprepended digits of ES3, so we use a mask to get
these digits
PES3 & 0x3c
00001111
00001111 (PES3 & 0x3c) >> 2
00001111 ((PES2 << 0xf) << 4) | ((PES3 & 0x3c) >> 2)
Populating Byte 3
Byte 3
PES3
111111000000111111000000
00111111
Mask 00000011
Apply Mask
00000011
PES3 & 0x3
11000000
00000000
11000000
(PES3 & 0x3) << 6
Shift left by 6
PES4
PES4
((PES3 & 0x3) << 6) |PES4
Putting It All Together
Now we have three bytes, and concatenated
we get the desired byte array.
Byte 1
Byte 2
Byte 3
111111000000111111000000
C
11111100
00001111
11000000
C
A
T
Decoding just requires using the lookup table
to find out what symbol goes with Byte 1, Byte
2, and Byte 3. Using the look-up table shown
earlier, well get CAT as the decoded string.