CS2842 Computer Systems – Lecture IV
Data Formats
Dr. Sapumal Ahangama
Department of Computer Science and Engineering
DATA FORMATS
Computers
Process and store all forms of data in binary format
Human communication
Includes language, images and sounds
DATA FORMATS
Specifications for converting data into computer-usable form
Define the different ways human data may be represented,
stored and processed by a computer
The data must be able to move between computers
Metadata: information that describes or interprets the meaning of the
data
DATA FORMATS
Proprietary formats
Individual programs can store and process data in any format that
they want
Standard data representations
to be used as interfaces between different programs,
between a program and the I/O devices used by the program,
between interconnected hardware,
between systems that share data
COMMON DATA REPRESENTATIONS
ALPHANUMERIC DATA
Much of the data used in a computer is originally
provided in human-readable form:
Letters of the alphabet, numbers, and punctuation,
In English or some other language
Alphanumeric data are a combination of alphabetical and
numerical characters
Since alphanumeric data must be stored and processed within
the computer in binary form, each character must be
translated to a binary representation
ALPHANUMERIC DATA
Three alphanumeric codes are in common use,
ASCII (American Standard Code for Information Interchange)
EBCDIC (Extended Binary Coded Decimal Interchange Code)
Unicode
Nearly every system today uses Unicode or ASCII
ASCII
Each character is represented by a 7-bit code
2^7 = 128 characters
Consists of,
digits 0 to 9,
lowercase letters a to z,
uppercase letters A to Z,
punctuation symbols,
33 non-printing control codes
Extended to an 8-bit code (256 characters), e.g. Latin-1 (ISO 8859-1)
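Since each character is simply a number, the translation to binary can be sketched in a few lines of Python. This is a minimal illustration assuming Python 3.9 or later; `to_ascii_bits` is a hypothetical helper written for this example. It relies on the first 128 Unicode code points being identical to ASCII, so `ord()` returns the ASCII code directly, and on Python's built-in `latin-1` codec for the 8-bit extension.

```python
def to_ascii_bits(text: str) -> list[str]:
    """Return the 7-bit ASCII code of each character as a binary string."""
    codes = []
    for ch in text:
        code = ord(ch)  # for characters below 128 this is the ASCII code
        if code > 127:
            raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
        codes.append(format(code, "07b"))  # pad to exactly 7 binary digits
    return codes

print(to_ascii_bits("Hi!"))   # ['1001000', '1101001', '0100001']
print(to_ascii_bits("09"))    # digits are codes 48-57: ['0110000', '0111001']

# Latin-1 extends ASCII to 8 bits, so an accented letter still fits in one byte.
print("\u00e9".encode("latin-1"))  # b'\xe9' (code 233, the letter e with acute accent)
```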
ASCII
UNICODE
ASCII and EBCDIC have limitations
An 8-bit code limits the number of possible characters to 256
Not enough codes for the characters of other major languages
Some EBCDIC versions omit characters such as [, ], ^, {, }, ~
These issues led to a 16-bit standard: Unicode (its 16-bit encoding is UTF-16)
2^16 = 65,536 possible codes
49,000 are defined to represent the world’s most used
characters
6,400 16-bit codes are reserved for private use
Each character in this 16-bit range can be stored in 2 bytes; characters added later beyond it take 4 bytes in UTF-16
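The 2-byte storage in UTF-16 can be checked with Python's built-in `utf-16-be` codec (big-endian, no byte-order mark). The sample characters are chosen for illustration only: a Latin letter and a Sinhala letter both fit in 2 bytes, while an emoji outside the original 16-bit range needs a 4-byte surrogate pair. This is a sketch assuming Python 3.8 or later (for `bytes.hex` with a separator).

```python
# Compare how many bytes each character needs in UTF-16 (big-endian, no BOM).
samples = {
    "A": "Latin capital A (U+0041)",
    "\u0d85": "Sinhala letter ayanna (U+0D85)",
    "\U0001F600": "emoji (U+1F600), beyond the 16-bit range",
}

for ch, label in samples.items():
    encoded = ch.encode("utf-16-be")
    print(f"{label}: {encoded.hex(' ')} ({len(encoded)} bytes)")
```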
UNICODE
2 CLASSES OF CODE
Printing characters
Produced on the screen or printer
Control characters
Control the display, printer or communication line, e.g. tab, line feed, carriage return
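For 7-bit ASCII the two classes can be told apart by code value alone: codes 0 to 31 and 127 (DEL) are control characters, and 32 (space) to 126 are printing characters. The hypothetical `classify` function below applies that rule and confirms the count of 33 control codes given on the ASCII slide.

```python
def classify(code: int) -> str:
    """Classify a 7-bit ASCII code as a printing or a control character."""
    if not 0 <= code <= 127:
        raise ValueError("not a 7-bit ASCII code")
    # Codes 0-31 and 127 (DEL) are control characters; 32 (space) to 126 print.
    return "control" if code < 32 or code == 127 else "printing"

for ch in "Hi\tA\n":
    print(repr(ch), ord(ch), classify(ord(ch)))

control_count = sum(1 for c in range(128) if classify(c) == "control")
print(control_count)  # 33
```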
KEYBOARD INPUT
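Scan code
When a key is struck on the keyboard, the circuitry in the
keyboard generates a binary code, the scan code, that identifies the key
The computer then translates the scan code into the corresponding character code (e.g. ASCII or Unicode)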
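The key point is that the scan code identifies a physical key, not a character; software combines it with the Shift state to pick the character. The table below is illustrative only: it assumes a few make codes from the common PC "set 1" layout and a US-style Shift mapping, and `SCAN_TO_KEY`, `SHIFTED` and `scan_to_char` are hypothetical names, not a real driver API.

```python
# Hypothetical scan-code table (values assumed from the PC "set 1" layout).
SCAN_TO_KEY = {0x1E: "a", 0x30: "b", 0x2E: "c", 0x02: "1"}
SHIFTED = {"1": "!"}  # assumed US layout: Shift+1 gives '!'

def scan_to_char(scan_code: int, shift: bool = False) -> str:
    """Translate a key's scan code plus Shift state into a character."""
    key = SCAN_TO_KEY[scan_code]  # which physical key was struck
    if shift:
        return SHIFTED.get(key, key.upper())
    return key

print(scan_to_char(0x1E))              # a
print(scan_to_char(0x1E, True))        # A
print(ord(scan_to_char(0x1E, True)))   # 65: the character code passed on to programs
```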
KEYBOARD INPUT
Other alphanumeric inputs:
OCR (Optical Character Recognition)
Barcode
Magnetic Stripe Reader
RFID (Radio-Frequency Identification)
IMAGE DATA
Images come in many different shapes, sizes, textures, colors,
and shadings
Different applications require different forms of image
data, depending on:
Quality of the image
Storage space required
Time to transmit
Ease of modification
These competing requirements make it difficult to define a single universal format
IMAGE DATA
Two distinct categories
Bitmap or raster images
Characterized by continuous variations in shading, color, shape, and
texture
JPEG, GIF
Graphical objects
Made up of graphical shapes such as lines and curves that can be
defined geometrically
The nature of display technology makes it much more
convenient and cost-effective to display and print most images
as bitmaps
IMAGE DATA
BITMAP IMAGES
Bitmap image format
A rectangular image is divided into rows and columns
The junction of each row and column is a point known as a pixel
Each pixel is a set of one or more binary values that define its
visual characteristics (e.g. brightness, color)
Preferred when the image contains a large amount of detail and the
processing requirements are fairly simple
BITMAP IMAGES
Example: each pixel is represented by a 4-bit code corresponding
to 1 of 16 shades
Metadata
Pixel data
Stored from top to bottom, one row at a time
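The layout described above, metadata followed by 4-bit pixels stored row by row from the top, can be sketched as follows. The 3-byte header (width, height, bits per pixel) is a made-up format for illustration; real formats such as BMP or PNG use their own, larger headers. `pack_4bit` and `unpack_4bit` are hypothetical names. Two 4-bit pixels fit in each byte.

```python
# Hypothetical 4-bit image: each value 0-15 selects one of 16 shades.
pixels = [
    [0, 15, 15, 0],
    [15, 8, 8, 15],
    [0, 15, 15, 0],
]

def pack_4bit(rows):
    """Assumed layout: 3-byte header, then two pixels per byte, top row first."""
    height, width = len(rows), len(rows[0])
    metadata = bytes([width, height, 4])   # width, height, bits per pixel (each < 256)
    data = bytearray()
    for row in rows:                       # top to bottom, one row at a time
        for i in range(0, width, 2):
            hi = row[i]
            lo = row[i + 1] if i + 1 < width else 0  # pad odd-width rows
            data.append((hi << 4) | lo)
    return metadata + bytes(data)

def unpack_4bit(blob):
    """Read the header, then split each byte back into two 4-bit pixels."""
    width, height, bits = blob[0], blob[1], blob[2]
    if bits != 4:
        raise ValueError("expected 4 bits per pixel")
    bytes_per_row = (width + 1) // 2
    rows = []
    for r in range(height):
        start = 3 + r * bytes_per_row
        row = []
        for b in blob[start:start + bytes_per_row]:
            row.extend([b >> 4, b & 0x0F])
        rows.append(row[:width])  # drop padding pixel, if any
    return rows

packed = pack_4bit(pixels)
print(packed.hex(" "))                # 04 03 04 0f f0 f8 8f 0f f0
print(unpack_4bit(packed) == pixels)  # True
```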
BITMAP IMAGES
Data value representing a pixel
Could be as simple as one bit
For a color image, might consist of several bytes
RGB: one value each for red, green and blue (commonly 1 byte each)
Additional bytes for other characteristics such as transparency and
color correction.
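A color pixel with one byte each for red, green and blue, plus one extra byte for transparency (alpha), can be sketched like this. The RGBA byte order is an assumption; some formats store the channels in a different order (e.g. BGR), and `pack_rgba` is a hypothetical helper.

```python
def pack_rgba(r: int, g: int, b: int, a: int = 255) -> bytes:
    """One color pixel as 4 bytes: red, green, blue, alpha (255 = opaque)."""
    for v in (r, g, b, a):
        if not 0 <= v <= 255:
            raise ValueError("each channel must fit in one byte")
    return bytes([r, g, b, a])

orange = pack_rgba(255, 165, 0)          # fully opaque
half_blue = pack_rgba(0, 0, 255, 128)    # roughly 50% transparent
print(orange.hex(), half_blue.hex())     # ffa500ff 0000ff80
```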
BITMAP IMAGES
File size is affected by
Resolution: the number of pixels
Smaller pixels (and therefore more of them) improve detail
Levels: number of bits used to represent each pixel
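Both factors multiply together: uncompressed pixel data takes width × height × bits per pixel. The hypothetical `bitmap_size_bytes` below ignores metadata and any row padding a real format might add; it shows that doubling the resolution in both directions quadruples the size.

```python
def bitmap_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed pixel-data size, ignoring metadata and row padding."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes

print(bitmap_size_bytes(1024, 768, 1))    # 98304   (black and white)
print(bitmap_size_bytes(1024, 768, 8))    # 786432  (256 levels)
print(bitmap_size_bytes(1024, 768, 24))   # 2359296 (true color)
print(bitmap_size_bytes(2048, 1536, 24))  # 9437184 (double resolution: 4x the size)
```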
Image formats
GIF (Graphics Interchange Format)
JPEG (Joint Photographic Experts Group)
PNG (Portable Network Graphics)
OBJECT IMAGES
Object images are made up of simple elements such as straight or
curved lines, circles and arcs
Each element can be defined mathematically by parameters
A circle requires 3 parameters: the Cartesian coordinates of its center plus its radius
A straight line needs the coordinates of its end points
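A circle stored as three numbers can be drawn at any size and only becomes pixels when it is displayed. The sketch below, with hypothetical `Circle`, `Line`, `scale` and `rasterize` names, holds the parameters from the slide and converts a circle to a 1-bit bitmap by marking pixels within half a pixel of its outline. This naive loop is for illustration; real renderers use faster methods such as Bresenham's circle algorithm. Note the contrast in storage: 3 numbers versus a 21 × 21 grid of 441 pixels.

```python
import math
from dataclasses import dataclass

@dataclass
class Circle:
    cx: float  # x coordinate of the center
    cy: float  # y coordinate of the center
    r: float   # radius

@dataclass
class Line:
    x0: float
    y0: float
    x1: float
    y1: float

    def length(self) -> float:
        return math.hypot(self.x1 - self.x0, self.y1 - self.y0)

def scale(c: Circle, k: float) -> Circle:
    # Resizing an object image only changes its parameters; no detail is lost.
    return Circle(c.cx * k, c.cy * k, c.r * k)

def rasterize(c: Circle, width: int, height: int) -> list[list[int]]:
    """Convert the circle to a 1-bit bitmap: 1 for pixels near the outline."""
    return [
        [1 if abs(math.hypot(x - c.cx, y - c.cy) - c.r) <= 0.5 else 0
         for x in range(width)]
        for y in range(height)
    ]

circle = Circle(10, 10, 5)
grid = rasterize(circle, 21, 21)
print(grid[10][15], grid[10][10])  # 1 0: on the outline vs. the center
print(Line(0, 0, 3, 4).length())   # 5.0
print(scale(circle, 2))            # Circle(cx=20, cy=20, r=10)
```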
OBJECT IMAGES
Advantages
Require less storage space
Can be manipulated easily
Photographs as object images?
VIDEO DATA
Requires a large amount of data
1024 × 768 pixel true-color (24 bits = 3 bytes per pixel) images at a frame rate of 30 frames per
second?
1024 × 768 × 3 bytes × 30 frames/s ≈ 70.8 megabytes of data per second!
4.25 gigabytes per minute
How to reduce video size?
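The figures above follow from simple multiplication, assuming 3 bytes per true-color pixel and decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes):

```python
width, height = 1024, 768
bytes_per_pixel = 3  # true color: 8 bits each for red, green and blue
fps = 30             # frames per second

frame_bytes = width * height * bytes_per_pixel
per_second = frame_bytes * fps
per_minute = per_second * 60

print(f"{frame_bytes:,} bytes per frame")        # 2,359,296 bytes per frame
print(f"{per_second / 1e6:.1f} MB per second")   # 70.8 MB per second
print(f"{per_minute / 1e9:.2f} GB per minute")   # 4.25 GB per minute
```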
AUDIO DATA
Sound is naturally an analog wave that needs to be digitized
Sampling: measuring the height (amplitude) of the wave at regular time intervals
1000 samples per second = 1 kHz (kilohertz)
Example: audio CD sampling rate = 44.1 kHz
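Sampling can be sketched by evaluating a pure tone at evenly spaced instants. The 440 Hz sine wave below is an assumed stand-in for a real analog signal, and `sample_wave` is a hypothetical helper; at the CD rate, one second of sound yields 44,100 samples per channel.

```python
import math

def sample_wave(freq_hz: float, rate_hz: int, duration_s: float) -> list[float]:
    """Measure a sine wave's height at rate_hz evenly spaced instants per second."""
    n = int(rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

one_second = sample_wave(440, 44_100, 1.0)  # 440 Hz tone at the CD sampling rate
print(len(one_second))                      # 44100
print(len(sample_wave(440, 1_000, 1.0)))    # 1000 samples at 1 kHz
```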
AUDIO DATA
Sampling Rate
Height of each sample saved as:
8-bit number for radio quality recordings
16-bit number for high fidelity recordings
2 × 16 bits for stereo sound (one 16-bit sample per channel)
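Storing each sample height as an 8-bit or 16-bit number means rounding it to one of a fixed set of levels, and the sample size then sets the data rate. The sketch below assumes heights normalized to the range -1.0 to 1.0 and symmetric signed levels; `quantize` and `data_rate_bits` are hypothetical helpers. For CD audio it gives 44,100 × 16 × 2 = 1,411,200 bits per second.

```python
def quantize(x: float, bits: int) -> int:
    """Map a sample height in [-1.0, 1.0] to a signed integer of the given width."""
    max_level = 2 ** (bits - 1) - 1  # 127 for 8 bits, 32767 for 16 bits
    x = max(-1.0, min(1.0, x))       # clip out-of-range heights
    return round(x * max_level)

def data_rate_bits(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed audio data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

print(quantize(0.25, 8), quantize(0.25, 16))  # 32 8192
cd = data_rate_bits(44_100, 16, 2)            # stereo, 16 bits per channel
print(cd, cd // 8)                            # 1411200 176400 (bits/s, bytes/s)
```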
DATA COMPRESSION
Compression: reducing data so that it requires fewer bytes of
storage space
Compression ratio: original file size ÷ compressed file size (how much the file shrinks)
Lossless Compression
Inverse algorithm restores data to exact original form
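Examples: GIF, PCX, TIFF
Example: original data
05573200001473291000006682732732
Each run of zeros replaced by 0 followed by the run length
0155732041473291056682732732
The repeated sequence 732 replaced by Z
0155Z0414Z91056682ZZ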
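The two steps of the digit example (run-length encoding of zeros, then substituting the symbol Z for the repeated sequence 732) can be reproduced in code, together with the inverse that restores the exact original. This is a sketch tailored to this example; `encode_zero_runs` and `decode_zero_runs` are hypothetical names. It caps runs at 9 so each length fits in one digit, and the simple `replace` for 732 is only safe here because no run-length digit happens to combine with neighboring digits to form 732; a real scheme must prevent such accidental matches.

```python
def encode_zero_runs(digits: str) -> str:
    """Replace each run of '0' with '0' followed by the run length (1-9)."""
    out = []
    i = 0
    while i < len(digits):
        if digits[i] == "0":
            run = 1
            while i + run < len(digits) and digits[i + run] == "0" and run < 9:
                run += 1
            out.append("0" + str(run))
            i += run
        else:
            out.append(digits[i])
            i += 1
    return "".join(out)

def decode_zero_runs(encoded: str) -> str:
    """Inverse of encode_zero_runs: restores the exact original digits."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == "0":
            out.append("0" * int(encoded[i + 1]))
            i += 2
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)

original = "05573200001473291000006682732732"
step1 = encode_zero_runs(original)
step2 = step1.replace("732", "Z")  # Z stands for the repeated sequence 732

print(step1)                        # 0155732041473291056682732732
print(step2)                        # 0155Z0414Z91056682ZZ
print(len(original) / len(step2))   # 1.6, i.e. a 32:20 compression ratio
print(decode_zero_runs(step2.replace("Z", "732")) == original)  # True: lossless
```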
DATA COMPRESSION
Lossy Compression
Trades off data degradation for file size and download speed
Much higher compression ratios, often 10 to 1
JPEG
MPEG-2?
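Lossy compression can be illustrated by coarse quantization: each value keeps only which "bucket" it falls in, so fewer bits are needed, but the inverse can only approximate the original. This is a deliberately simplified sketch, not how JPEG or MPEG-2 actually work (they quantize transformed blocks of the image rather than raw values); `lossy_compress`, `decompress` and the sample brightness values are hypothetical.

```python
def lossy_compress(values: list[int], step: int) -> list[int]:
    """Keep only the bucket number of each value: fewer levels to store."""
    return [v // step for v in values]

def decompress(codes: list[int], step: int) -> list[int]:
    """Rebuild approximate values from bucket numbers (middle of each bucket)."""
    return [c * step + step // 2 for c in codes]

original = [52, 55, 61, 66, 70, 61, 64, 73]  # e.g. 8-bit brightness values
codes = lossy_compress(original, 16)          # 16 levels: 4 bits instead of 8
restored = decompress(codes, 16)

print(codes)                 # [3, 3, 3, 4, 4, 3, 4, 4]
print(restored)              # [56, 56, 56, 72, 72, 56, 72, 72]
print(restored == original)  # False: some detail is permanently lost
print(max(abs(a - b) for a, b in zip(original, restored)))  # 8, at most step // 2
```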
THANK YOU