Motivation
• Data that is either transmitted over communication channel (e.g.
bus) or stored in memory is not completely error free
• Error can caused by:
Error Detection and Correction 1. Transmission Errors
¾ Signal distortion or attenuation
¾ e.g. sender and receiver out of syn, can happen if clocks are not
CIT 595 synchronized – systems distributed over network
Spring 2007 2. Storage Errors
¾ DRAM memory cell contents can change spuriously due to some
electromagnetic interference
¾ In magnetic storage devices such as disks, magnetic flux density
increases could cause one or more bits to flip (change that value)
CIT 595 16 - 2
Error Detection and Correction Parity
• Error detection is the ability to detect errors • The simplest and oldest error detection method
• Error correction has an additional feature that enables
• In this scheme, a binary digit called parity is used to
identification and correction of the errors
indicate whether the number of bits with value of one in
• Error detection always precedes error correction a given set of bits is even or odd and is appended to
original data
• Both can be achieved by having extra or redundant or
check bits in addition to data deduce that there is an
error • Usually used to detect transmission error
¾ Original Data is encoded with the redundant bit(s)
¾ The sender adds the parity bit to existing data bits before
¾ New data formed is known as code word transmission
¾ The receiver checks for the expected parity, If wrong parity
found, the received data is discarded and retransmission is
requested
CIT 595 16 - 3 CIT 595 16 - 4
1
Parity type: Even Parity type: Odd
• Forced an even number of one’s on total data sent • Forced an odd number of one’s
¾ 000 0001 Æ 1000 0001
¾000 0001 Æ 0 000 0001
¾ 000 0011 Æ 0001 0001
¾000 0011 Æ 1 000 0011
• Generating even parity bit is just an XOR function
• Data Received Examples: • Odd parity is generated using a XNOR function
¾ 0111 1111 - incorrect
¾ 1000 0000 - incorrect
¾ 1000 0001 - valid
• Note that error could be in data or parity
• Not entirely fool proof
CIT 595 16 - 5 CIT 595 16 - 6
Limitations of Parity Limitations of Parity (contd..)
• Cannot determine which bit Data Parity Code • What happens if the code word is subjected to two-bit
position has a problem Word Code Word error?
(Even)
¾E.g. 011 became 000 while transmission
00 0 000
• If 001 is encountered, it is ¾According to parity scheme, no error is detected but
not a valid code-word and 01 1 011 error!!
hence error is detected 10 1 101
11 0 110 • In general, If an odd number of bits (including the parity
• The correct code-word bit) are changed from a set of bits then parity bit will be
000 100
could either be 101 or 011 incorrect and will thus indicate that an error has occurred
001 101
but we cannot tell
010 110
011 111
8 possible, only 4
CIT 595 correct code-words 16 - 7 CIT 595 16 - 8
2
Hamming Distance and Error Detection Hamming Distance and Error Correction
• Hamming Distance = # of bit positions in which two code words differ • If there is a larger hamming distance between valid code words,
¾ E.g. 10001001 and 10110001 have distance of 3 ¾ Then we may be able to determine which valid codeword was
intended
• If hamming distance is d apart, then d-single bit errors are required to
convert any one valid code into another • Suppose a code needs just 2 different values, and we use:
¾ Implying that this error would not be detected ¾ One valid value = 0000 0000 and the other = 1111 1111
¾ Then distance between these is 8
• In previous parity example (slide 7)
¾ Could detect 1-bit error as 4 code words had hamming distance = 2
• Suppose we got 2 bit changes so that:
¾ But could not detect 2-bit error
¾ 0000 0000 became 0011 0000
• In general, to detect k-single bit error, minimum hamming distance
D(min) = k + 1 • Can we determine what the transmitted value was?
¾ Hence we need code words that have D(min) = 2 + 1 = 3 to detect 2-bit ¾ Was it more likely 0000 0000 or 1111 1111?
errors
CIT 595 16 - 9 CIT 595 16 - 10
Hamming Distance and Error Correction (contd..) Hamming Code (HC)
• The greater the distance between valid code words, • Hamming Code is type of Error Correcting Code (ECC)
the easier it is to figure what the correct codeword was ¾ Provides error detection and correction mechanism
• Adopt parity concept, but have more than one parity bit
• Requires additional redundant bits (> 1 parity bit) to
chose code words that are far apart • In general hamming code is code word of n bits with m
data bits and r parity (or check bits)
• D(min) = 2k + 1 is required for correcting k-errors ¾ i.e. n = m + r
• Can detect D(min) – 1 errors
• Can correct errors
• Hence to correct k errors, need D(min) = 2k + 1
¾ you need a least a distance of 3 to correct a single bit error
CIT 595 16 - 11 CIT 595 16 - 12
3
Determining # of Parity bits for single-bit correction Determining # of Parity bits for single-bit correction
• Hamming Code for single-bit error correction is the • An error could occur in any of the n bits of the code
most commonly used word, so each code word can be associated with n
erroneous words (at a hamming distance of 1)
¾Experiments (IBM study) show 98% time there are
¾ E.g. Previous Parity Example (slide 7)
single-bit errors
¾ 000 can become 001 or 010 or 100 due to single bit error
• Need determine r for m-data bits that provides code • Therefore, we have n + 1 bit patterns for each code
words of n-bits that has single-bit correction capabilities word: one valid code word, and n erroneous words
• This gives us the inequality: (n + 1) × 2 m ≤ 2 n
¾ 2 m is the number of legal code
CIT 595 16 - 13 CIT 595 16 - 14
Hamming Code: Determining Parity bits for single-bit
correction
Hamming Algorithm for Code Generation
• Because n = m + r, we can rewrite the inequality as: • Hamming Algorithm provides a technique to design
codes for single-bit error correction
(m + r + 1) × 2 m ≤ 2 m + r or (m + r + 1) ≤ 2 r
• This inequality gives us a lower limit on the number of • We will look at an example generating code word
parity bits that we need in our code words ¾ Steps/rules are provided in the book section 2.7.2
• Example: Suppose we have data words of length m = 4
(4 + r + 1) ≤ 2 r
¾ Implies that r must be greater than or equal to 3
¾ This means to build a code with 4-bit data words that will correct
single-bit errors, we must add 3 check bits
CIT 595 16 - 15 CIT 595 16 - 16
4
Example: Generation of 12-bit Code word Example: 12-bit Code word
• 8-bit data needs 4 parity bits, total of 12-bit code word • We then write each bit position, 1 through 12, in
• Using our code words of length 12, number each bit position powers of 2
starting with 1 in the low-order bit 1 = 20 5 = 22 + 2 0 9 = 23 + 2 0
2=2 1 6=2 +22 1 10 = 2 3 + 2 1
• Each bit position corresponding to an even power of 2 will be 3 = 2 1 + 2 0 7 = 2 2 + 2 1 + 2 0 11 = 2 3 + 2 1 + 2 0
occupied by a parity or check bit
4 = 22 8 = 23 12 = 2 3 + 2 2
• All other bit positions are for the data to be encoded
• 1 (= 20) contributes to all of the odd-numbered digits
• Each parity bit calculates the parity for some of the bits in the ¾ Hence parity at position 1 will based on bits in position 3, 5, 7,
code word 9, 11
• And so on…
CIT 595 16 - 17 CIT 595 16 - 18
Example: 12-bit Code word Example: 12-bit Code word
• 2 (= 21) contributes to positions 2, 3, 6, 7, 10, and 11
¾ Position 2 will contain the parity for bits 3, 6, 7, 10,
and 11
• For the bit values shown, we have a parity value of 0 in
the second bit position using even parity The completed code word is shown above:
• Bit 1 checks the positions 3, 5, 7, 9, and 11, so its
• Even parity = XOR i.e. 0 ^ 1 ^ 0 ^ 0 ^ 1 = 0 value is 1
• Bit 2 checks the positions 3, 6, 7, 10, and 11, so its
value is 0
• Bit 4 checks the positions 5, 6, 7, and 12, so its
value is 1
• Bit 8 checks the positions 9, 10, 11, and 12, so its
value is also 1
CIT 595 16 - 19 CIT 595 16 - 20
5
Example: 12-bit Code word Example: 12-bit Code word
• Suppose an error occurs in bit 5, as shown above • Parity bits 1 and 4 both check position 5 and 7
• Our parity bit values are: • Since parity bit 2 checks bit 7 and indicates no error
¾ Bit 1 checks positions, 3, 5, 7, 9, and 11. Its value is 1, but occurred in the subset of bits it checked that means
should be 0 that error occurred in bit 5
¾ Bit 2 checks positions 3, 6, 7, 10, and 11. The zero is
correct. • If we change bit 5 to a 1, all parity bits check and our
¾ Bit 4 checks positions 5, 6, 7, and 12. Its value is 1, but data is restored
should be 0
¾ Bit 8 checks positions 9, 10, 11, and 12. This bit is correct
CIT 595 16 - 21 CIT 595 16 - 22
Some notes on Hamming Code Some Notes on Hamming Code (contd..)
• Require Dmin = 3 to detect and correct 1-bit error • But sometimes cannot tell if 1 or 2 bit errors occured
¾ With 2-bit errors, hamming code ends up correcting wrong
• If any-one parity bit has error, the error is still single-bit position (which actually is innocent)
correctable
• Two determine whether we have 1-bit or 2-bit error we
• If two errors occur, Hamming code still produces an add need to add an extra check bit
error ¾ The extra check bit is to determine parity across all data and
existing parity bits
• This is because the Hamming distance between two
code words is enough that double bit errors can be ¾ If this extra check bit is calculated to match the received value,
then we either have two bit errors, or no errors
detected
¾ D(min) for 1-bit correction = 3 (2k + 1) ¾ no errors can be confirmed by further determining whether
¾ D(min) for 2-bit error detection = k + 1 = 2 + 1 = 3 parity at bit position 1, 2, 4, 8, etc are same as received parity
at those positions
CIT 595 16 - 23 CIT 595 16 - 24
6
Some notes on Hamming Code (contd..) Error Correcting Codes (ECC) in General
Overheard in terms of bits • Hamming Code is type of ECC
• Small number of bits data -> overheard is high ¾ Others include Reed-Solomon, Convolution Code
¾ E.g. 4 bits, need 3 parity + 1 for 2-bit detection, overhead is 50%
¾ E.g. 1 byte transfers, need 4 parity + 1, overhead is 38 % • Requires extra bits for maintaining information integrity
¾ E.g. in hamming code: 3 bits are added to a 4-bit data
• For larger transfer -> overhead decreases
¾ E.g. 6 parity bits for 57 bits (~7 bytes). If add another bit for 2-bit
error detection, we need 7 parity bits – still less than a byte • The overhead of extra bits does pay off
¾ Need to devote 1 out of each 8 bytes i.e. 12.5% to error ¾ Single-bit correction often costs less than sending the entire
correction/detection data twice
¾ If the storage is the only source of data (e.g. disk or DRAM)
Overhead in terms of hardware
then we want a error-correction to avoid crashing of programs
• Some part of storage will be utilized for the check bits
• Need hardware that will encode and decode code words on the fly –
added circuitry
¾ If done in software, will be slow – need processor’s involvement
CIT 595 16 - 25 CIT 595 16 - 26