Alphanumeric Codes Earlier computers were used only for the
purpose of calculations i.e. they were only used as a calculating device. But now
computers are not just used for numeric representations, they are also used to represent
information such as names, addresses, item descriptions etc. Such information is
represented using letters and symbols. Computer is a digital system and can only deal
with l's and 0s. So to deal with letters and symbols they use alphanumeric codes.
. Using these codes, we can interface input-output devices such as keyboards, monitors,
printers etc. with computer.
Several coding techniques have been invented that represent alphanumeric information
as a series of l's and O's. The earliest better-known alphanumeric codes were the Morse
code used in telegraph and 12-bit Hollerith code used when punch cards were used as a
medium of inputting and outputting data. As the punched cards have completely
vanished with the evolution of new mediums, the Hollerith code was rendered obsolete.
Now the ASCII and EBCDIC codes are the two most widely used alphanumeric codes.
The ASCII Code is very popular code used in all personal computers and workstations
whereas the EBCDIC code is mainly alphanumeric code, the UNICODE, has evolved to
overcome the limitation of limited character encoding as in case of ASCII and EBCDIC
code.
Types of alphanumeric code.
MORSE CODE
The Morse code, invented in 1837 by Samuel F.B. Morse, was the first alphanumeric
code used in telecommunication. It uses a standardized sequence of short and long
elements to represent letters, numerals and special characters of a given message. The
short and long elements can be formed by sounds, marks, pulses, on off keying and are
commonly known as dots and dashes. For example : The letter 'A' is formed by a dot
followed by a dash. The digit 5 is formed by 5 dots in succession. The International
Morse code treats a dash equal to three dots. To see the details of Morse code table you
can refer the Internet search engines.
Due to variable length of Morse code characters, the morse code could not adapt to
automated circuits. In most electronic communication, the Baudot code and ASCII code
are used.
Morse code has limited applications. It is used in communication using telegraph lines,
radio circuits. Pilots and air traffic controllers also use them to transmit their identity
and other information.
BAUDOT CODE
The Baudot code was another popular alphanumeric code used in early 1860's. It was
invented by the French engineer Emile Baudot in 1870. It is a 5-unit code (i.e. it uses
five elements to represent an alphabet). Moreover, unlike the Morse code, all the
symbols all of equal duration. This allowed telegraph transmission of the Roman
alphabet and punctuation and control signals.
HOLLERITH CODE
In 1896, Herman Hollerith formed a company called the Tabulating Machine Company.
This company developed a line of machines that used punched cards for tabulation.
After a number of mergers, this company was formed into the IBM, Inc. We often refer
to the punched-cards used in computer systems as Hollerith cards and the 12-bit code
used on a punched-card is called the Hollerith code.
A Hollerith string is a sequence of l2-bit characters; they are encoded as two ASCII
characters, containing 6 bits each. The first character contains punches 12,0,2,4,6,8 and
the second character contains punches 11, 1,3,5, 7, 9. Interleaving the two characters
gives the original 12 bits. To make the characters printable on ASCII terminals, bit 7 is
always set to 0 and bit 6 is said to the complement of bit 5. These two bits are ignored
when reading Hollerith cards.
Today, as punched cards are mostly obsolete and replaced with other storage medias so
the Hollerith code is rendered obsolete.
American Standard-Code for Information Interchange (ASCII)
The American Standard-Code for Information Interchange (ASCII) pronounced "as-kee"
is a 7-bit code based on the ordering of the English alphabets. The ASCII codes are used
to represent alphanumeric data in computer input/output.
Historically, ASCII developed from telegraphic codes. It was first published as a
standard in 1967. It was subsequently updated and many versions of it were launched
with the most recent update in 1986. Since it is a seven-bit code, it can almost represent
128 characters. These include 95 printable characters including 26 upper-case letters (A
to Z), 26 lowercase letters (a to z), 10 numerals (0 to 9) and 33 special characters such as
mathematical symbols, space character etc. It also defines codes for 33 non-printing
obsolete characters except for carriage return and/or line feed. The below table lists the
7 bit ASCII code containing the 95 printable characters.
The format of ASCII code for each character is X 6' Xs, X4, X3, X2, Xl' X0 where each X is
0 or 1. For instance, letter D is coded as .1000100. For making reading easier, we leave
space as follows: 1000100.
Similarly, from the above table, we see that letter 'A' has X 6 X5X4 of 100 and X3 X2 Xl Xo
of 0001 (A). Similarly, the digit '9' has X 6 X5 X4 values of 011 and X3 X2 Xl X0 of 1001 so
the ASCII-7 code for digit 9 is 0111001.
More examples are:
The ASCll-7 code for 'd' is 1100100 as seen from the table 3.4.
The ASCll-7 code for '+' is 0101011 as seen from the table 3.4.
An eight-bit version of the ASCII code, known as US ASCII-8 or ASCII-8, has also been
developed. Since it uses 8-bits, so this version of ASCII can represent a maximum of 256
characters.
The table below lists some ASCII-8 codes.
When the ASCII -7 codes was introduced, many computers dealt with eight -bit groups
(or bytes) as the smallest unit of information. This eight bit code was commonly used as
parity bit for detection of error on communication lines. Machines that did not use the
parity bit typically set the eighth bit to O. In that case, the ASCII code format would be
X7 X6 X5 X4 X3 X2 Xl X0' In case of ASCII-7 code, if this representation is chosen to
represent characters then X7 would always be zero. So, the eight-bit ASCII-7 code for 'A
would be 01000001 and for 4' would be 00101011.
Example 1: With an ASCII-7 keyboard, each keystroke produces the ASCII equivalent
of the designated character. Suppose that you type PRINT X. What is the output of an
ASCII-7 keyboard?
Solution: The sequence is as follows:
The ASCII-7 equivalent of P = 101 0000
The ASCII-7 equivalent of R = 101 0010
The ASCII-7 equivalent of I = 1001010
The ASCII-7 equivalent of N = 100 1110
The ASCII-7 equivalent of T = 1-010100
The ASCII-7 equivalent of space = 010 0000
The ASCII-7 equivalent of X = 101 1000
So the output produced is 1010000101001010010101001110101010001000001011000.
The output in hexadecimal equivalent is 50 52 49 4E 54 30 58
Example2: A computer sends a message to another computer using an odd-parity bit.
Here is the message in ASCII-8 Code.
1011 0001
1011 0101
1010 0101
1010 0101
1010 1110
What do these numbers mean?
Solution
On translating, the 8-bit numbers into their equivalent ASCII-8 code we get the word
1011 0001 (Q), 10110101 (U), 10100101 (E), 1010 0101 (E), 1010 1110 (N)
So, on translation we get QUEEN as the output.
Extended Binary Coded Decimal Interchange Code (EBCDIC)
The Extended Binary Coded Decimal Interchange Code (EBCDIC) pronounced as "ebi-si
disk" is another frequently used code by computers for transferring alphanumeric data.
It is 8-bit code in which the numerals (0-9) are represented by the 8421 BCD code
preceded by 1111. Since it is a 8-bit code, it can almost represent 23 (= 256) different
characters which include both lowercase and uppercase letters in addition to various
other symbols and commands.
EBCDIC was designed by IBM corp. so it is basically used by several IBM models. In
this code, we do not use a straight binary sequence for representing characters, as was in
the case of ASCII code. Since it is a 8-bit code, so it can be easily grouped into groups of
4 so as to represent in arm of hexadecimal digits. By using the hexadecimal number
system notation, the amount of digits used to represent various characters and special
characters using EBCDIC code is reduced in volume of one is to four. Thus 8-bit binary
code could be reduced to 2 hexadecimal digits which are easier to decode if we want to
view the internal representation in memory. The above table lists the EBCDIC code for
certain characters.
Read the above table as you read the graph. Suppose you want to search for EBCDIC
code for letter 'A. To that case, the value of X 3 X2 Xl X0 bits is 0001 and value X7 X6 X5 X4
bits is 1100.
Therefore, EBCDIC code for letter 'A is 11000001(A). Similarly, the EBCDIC code for 'B'
is 11000010(B).
The EBCDIC code '=' is 01111110
The EBCDIC code for '$' is 0101 1011
UNICODE
The ASCII and EBCDIC encodings and their variants that we have studied suffer from
some limitations.
1. These encodings do not have a sufficient number of characters to be able to encode
alphanumeric data of all forms, scripts and languages. As a result, they do not permit
multilingual computer processing.
2. These encoding suffer from incompatibility. For example: code 7A (in hex) represents
the lowercase letter 'Z' in ASCII code and the semicolon sign ';' in EBCDIC code.
To overcome these limitations, UNICODE also known as universal code was developed
jointly by the Unicode Consortium and the International Organization for
Standardization (ISO). The Unicode is a 16-bit code so it can represent 65536 different
characters. It is the most complete character encoding scheme that allows text of all
forms and languages to be encoded for use by computers. In addition to multilingual
support, it also supports a comprehensive set of mathematical and technical symbols,
greatly simplifying any scientific information interchange.
UNICODE has a number of uses
1. It is increasingly being used for internal processing and storage of text. Window NT
and its descendents, Java environment, Mac OS all follow Unicode as the sole internal
character encoding.
2. All World Wide Web consortium recommendations have used Unicode as their
document character set since HTML 4.0.
3. It partially addresses the new line problem that occurs when trying to read a text file
on different platforms. It defines a large number of characters that can be recognized as
line terminators.
Unicode is currently being adopted by top computer industry leaders like Microsoft,
Apple, Oracle, Sun, SAP and many more in their products.