Chapter 2 Multimedia Information Representation
Contents
▪ 2.1 Introduction
▪ 2.2 Digitization Principles
▪ 2.3 Text
▪ 2.4 Images
▪ 2.5 Audio
▪ 2.6 Video
2.1 Introduction
▪ Codeword: a fixed number of bits representing one symbol from a set of
symbols, e.g., ASCII code, FAX run-length code, … .
▪ Signal Encoder
▪ Signal Decoder
▪ CODEC (Coder-Decoder), e.g., an audio-video CODEC: performs the conversion using some codewords
[Figure: Host - Network - Host. The sending host converts data (or signal) into the signal (or data) carried by the network, and the receiving host converts it back into data (or signal).]
2.2 Digitization Principles (1)
(Analog ↔ Digital)
- Spectrum vs. bandwidth
- Signal bandwidth vs. channel (bandlimiting) bandwidth
- Cutoff frequency = min {signal bandwidth, bandlimiting bandwidth}
[Figure: Host → Encoder (analog-to-digital conversion) → Networks (transfer) → Decoder (digital-to-analog conversion) → Host. The A/D converter comprises a bandlimiting filter, a sampler, and a quantizer & coder; the D/A converter comprises a decoder and a lowpass filter.]
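[Figure: signal waveforms (labeled A to H, plotted against time) at each stage of the encoder and decoder.
Encoder: analog input signal → bandlimiting filter → sampler (sample-and-hold, driven by the signal clock) → quantizer → encoder.
Decoder: DAC → lowpass filter → analog output.
The example sample values 0, 4, 7, 3, -4, -5, -3, 5 are coded as 0 000, 0 100, 0 111, 0 011, 1 100, 1 101, 1 011, 0 101; each codeword, e.g., 0 101, is a 1-bit sign followed by a 3-bit signal amplitude magnitude.]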
2.2 Digitization Principles(2)
(Analog ↔ Digital)
▪ Analog Signal
  ▪ Bandwidth, B Hz, via band-limiting channel
▪ Encoder
  ▪ Band-limiting filter: an anti-aliasing filter for eliminating alias signals
  ▪ Sampling: at least 2B sps (samples per sec); aliasing may happen at lower rates!
  ▪ Quantizing
    ▪ quantization interval $q = 2V_{max}/2^{n}$
    ▪ quantization error/noise = $\pm q/2$
    ▪ n: # of bits; $V_{max}$: maximum positive (minimum negative) signal amplitude
▪ Decoder
  ▪ low-pass filter (= band-limiting filter = anti-aliasing filter)
2.2 Digitization Principles (3)
(Analog ↔ Digital)
▪ When does aliasing occur? If the sampling rate is lower than the Nyquist rate.
[Figure: amplitude vs. time. A 6 kHz sine wave (period T) is sampled at 8 ksps, lower than the Nyquist rate of 12 ksps (2 × 6 kHz). The samples trace a 2 kHz alias signal with period T' = 3T.]
▪ Conclusion: all frequency components in the source signal that are higher in
frequency than half the sampling frequency being used generate
related lower-frequency alias signals, which add to the components
making up the original signal and so distort it.
▪ Resolution: use a bandlimiting filter to pass only those frequency
components up to the limit determined by the Nyquist rate.
▪ bandlimiting filter = anti-aliasing filter = low-pass filter = reconstruction filter
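The 6 kHz example can be checked numerically. The following is a minimal sketch, assuming an ideal sampler and a single pure sine wave; `alias_frequency` is a hypothetical helper name introduced here for illustration only.

```python
def alias_frequency(f_signal_hz, f_sample_hz):
    """Apparent frequency (Hz) of a sine wave after ideal sampling."""
    # Sampling cannot tell f apart from f + k*fs, so reduce modulo fs first.
    f = f_signal_hz % f_sample_hz
    # Anything above fs/2 folds back (mirrors) into the band 0 .. fs/2.
    return min(f, f_sample_hz - f)


# 6 kHz sampled below and above the Nyquist rate of 12 ksps.
for fs in (8_000, 16_000):
    print(f"6 kHz sampled at {fs} sps appears at {alias_frequency(6_000, fs)} Hz")
```

With 8 ksps the function returns 2000 Hz, the alias signal shown on the slide; with 16 ksps the 6 kHz component is preserved.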
2.2 Digitization Principles (4)
(Analog ↔ Digital)
▪ Example 1
An analog signal has a dynamic range of 40 dB. Find the magnitude
of the quantization noise relative to the minimum signal amplitude
if the quantizer uses 1) 6 bits and 2) 10 bits.
▪ Solution
The dynamic range gives $40 = 20\log_{10}(V_{max}/V_{min})$, so
$V_{max}/V_{min} = 10^{2}$ and hence
$V_{min} = V_{max}/100$.
The quantization noise is $q/2$, where
$q$ is the quantization interval given by $q = 2V_{max}/2^{n}$.
Thus $q/2 = V_{max}/2^{n}$.
For n = 6, $q/2 = V_{max}/2^{6} = V_{max}/64 > V_{min} = V_{max}/100$
→ unacceptable!
For n = 10, $q/2 = V_{max}/2^{10} = V_{max}/1024 < V_{min} = V_{max}/100$
→ acceptable!
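The comparison in this solution can be reproduced with a few lines of code. This is only a sketch under the slide's own assumptions (noise magnitude q/2 and a 20 log10 amplitude ratio); the function names are hypothetical.

```python
def quantization_noise(v_max, n_bits):
    """Worst-case quantization error q/2 for an n-bit quantizer."""
    q = 2 * v_max / 2 ** n_bits  # quantization interval
    return q / 2


def min_amplitude(v_max, dynamic_range_db):
    """Smallest signal amplitude implied by the dynamic range in dB."""
    return v_max / 10 ** (dynamic_range_db / 20)


v_max = 1.0  # normalized; only the ratio to V_min matters
v_min = min_amplitude(v_max, 40)
for n in (6, 10):
    noise = quantization_noise(v_max, n)
    verdict = "acceptable" if noise < v_min else "unacceptable"
    print(f"n = {n}: q/2 = Vmax/{2 ** n}, Vmin = Vmax/100 -> {verdict}")
```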
▪ dB (decibel): the decibel measures the relative strength
of two signals, or of one signal at two different points, with powers $p_1$ and $p_2$:
$\mathrm{dB} = 10\log_{10}(p_2/p_1)$
If the signal power is reduced to half at the second point,
so that $p_2 = p_1/2$:
$10\log_{10}(p_2/p_1) = 10\log_{10}(0.5\,p_1/p_1) = 10\log_{10}(1/2) = 10\log_{10}1 - 10\log_{10}2 \approx -3$ dB
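A short sketch of the decibel calculation follows. It assumes the usual convention that power ratios use 10 log10 while amplitude ratios (as in the 40 dB dynamic range of Example 1) use 20 log10; the function names are illustrative only.

```python
import math


def power_ratio_db(p1, p2):
    """Relative strength of power p2 with respect to power p1, in dB."""
    return 10 * math.log10(p2 / p1)


def amplitude_ratio_db(v1, v2):
    """Same measure for amplitudes: power is proportional to amplitude squared."""
    return 20 * math.log10(v2 / v1)


print(round(power_ratio_db(1.0, 0.5), 2))      # halving the power
print(round(amplitude_ratio_db(1.0, 100.0), 2))  # amplitude ratio of 100
```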
2.3 Text
▪ Unformatted Text, Plaintext
  ▪ String of fixed-size characters
  ▪ ASCII, Mosaic Characters, … .
▪ Formatted Text
  ▪ String of characters of different sizes, styles &
    shapes with tables, figures (graphics) & images
  ▪ LaTeX, Acrobat, … .
▪ Hypertext
  ▪ Integrated set of documents comprising
    formatted & unformatted texts with linkages
    among them
  ▪ HTML, PostScript, … .
▪ Well-defined codewords are used for text creation & manipulation
2.4 Images
▪ Image (still picture) Classification
  ▪ Computer-generated images (computer graphics)
    e.g., palette files; VGA: 640 × 480 pixels, 8 bits/pixel
  ▪ Digitized images of documents and/or pictures
    e.g., fax-scanned files, scanned color-image files
▪ Graphics
  ▪ high-level language form: description of attributes of objects
  ▪ bit-map form: actual pixel images (VGA: Video Graphics Array)
    ▪ pixel (or pel): picture element
    ▪ gif: graphics interchange format
    ▪ tiff: tagged image file format
    ▪ srgp: simple raster graphics package
▪ Digitized Documents
  ▪ Facsimile (FAX) machine, about 2 Mbits/page (black-white, 1 bit/pixel)
  ▪ Pixel resolution: 8 pels per mm
  ▪ Line resolution: 3.85 or 7.7 lines per mm (about 100 or 200 lines per inch)
Digitized Pictures(1)
▪ m bits per pixel (pixel depth m)
  ▪ good-quality black-white picture: 8 bits/pixel (256 gray levels)
  ▪ color picture: 24 bits/pixel (8 bits each for R/G/B, yielding 16M colors)
▪ Coloring Principles: how is color produced and represented?
  ▪ Color gamut: the whole spectrum of colors
  ▪ Three primary colors: R (Red), G (Green), B (Blue)
    ▪ all colors are produced by using different proportions of
      these primary colors
  ▪ Additive color mixing on a black surface
  ▪ Subtractive color mixing on a white surface
▪ Raster-Scan Principles: TV screen or computer CRT monitor
  ▪ NTSC (National Television System Committee), USA
    ▪ 525 (active 480) lines/frame & refresh rate of 60 per second
  ▪ PAL (Phase Alternating Line)/CCIR/SECAM
    ▪ 625 (active 576) lines/frame & refresh rate of 50 per second
Digitized Pictures(2)
[Figure: raster scan of a frame of M × N pixels. The beam sweeps each of the N horizontal scan lines (1, 2, 3, …, N) across M pixel positions (1, 2, 3, …, M), with a retrace between lines.]
1. N = 525 (NTSC: National Television System Committee) or
   625 (PAL: Phase Alternating Line; SECAM: Sequential Color with Memory; CCIR)
2. refresh rate (Hz) = 60 (NTSC) or 50 (PAL/SECAM/CCIR)
3. M is determined by the aspect ratio (see the next slides)
▪ frame: a complete set of N horizontal scan lines
▪ frame refresh rate: # of frames per sec, at least 50 Hz to avoid flickering
Scanning Method
▪ Progressive scanning: 1→2→3→…→N, one frame per refresh at 60 or 50 Hz
▪ Interlaced scanning: 1→3→5→… (first half frame, or field), then
  2→4→6→… (second field); fields are refreshed at 60 or 50 Hz, so a complete
  frame is refreshed at 30 or 25 Hz
Digitized Pictures(3)
▪ Raster-Scan Principles
  ▪ Raster: a finely-focused electron beam
  ▪ Phosphor: a light-sensitive material that emits light when
    energized
    ▪ white-sensitive phosphor: a single electron beam used
    ▪ color-sensitive phosphor: each pixel comprises a set of three
      color-sensitive phosphors, one each for the R, G, B signals,
      called a phosphor triad
    ▪ spot size: 0.635 mm (0.025 inch)
  ▪ beam signal may be in either analog or digital form
▪ Pixel Depth: # of bits per pixel
▪ CLUT (Color Look-Up Table): 24 bits/pixel yields $2^{24}$ colors, but the eye
cannot discriminate between some ranges of colors; hence each pixel value is
used as an index into a CLUT of 256 colors (compression achieved!)
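The CLUT idea can be illustrated with a toy example. The sketch below assumes a hypothetical 4-entry palette and a 2 × 2 indexed image (a real CLUT as described above would hold 256 entries); the storage estimate simply adds the table size to the index data and is not taken from the slides.

```python
# Hypothetical palette: each entry is a full 24-bit (R, G, B) color.
palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]

# The image stores small indices into the palette instead of 24-bit colors.
indexed_image = [[0, 1],
                 [2, 3]]


def expand(indexed, clut):
    """Look up every pixel index in the CLUT to recover 24-bit colors."""
    return [[clut[i] for i in row] for row in indexed]


def storage_bits(width, height, bits_per_pixel, palette_entries=0):
    """Pixel data plus the palette itself (24 bits per palette entry)."""
    return width * height * bits_per_pixel + palette_entries * 24


print(expand(indexed_image, palette))
print(storage_bits(640, 480, 24))        # direct 24-bit color
print(storage_bits(640, 480, 8, 256))    # 8-bit indices plus a 256-entry CLUT
```

For a 640 × 480 image the indexed form needs roughly one third of the bits of direct 24-bit storage.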
Digitized Pictures(4)
▪ Aspect Ratio: ratio of the screen width to the screen height, used for representing a frame of M × N pixels under a particular aspect ratio
  ▪ NTSC, 525 scan lines/frame ⇒ 480 data lines (45 control lines)
    ▪ 4/3 aspect ratio ⇒ 480 × 4/3 (= 640) pixels/line
    ▪ 16/9 aspect ratio ⇒ 480 × 16/9 (= 853.33) pixels/line
  ▪ PAL/CCIR/SECAM, 625 lines/frame ⇒ 576 data lines (49 control lines)
    ▪ 4/3 aspect ratio ⇒ 576 × 4/3 (= 768) pixels/line
    ▪ 16/9 aspect ratio ⇒ 576 × 16/9 (= 1024) pixels/line
Computer Graphics Array
Standard                Resolution (W × H × bits)   # of colors   Bytes/frame
VGA (Video GA)          640 × 480 × 8               256           307.2K
XGA (Extended GA)       640 × 480 × 16              64K           614.4K
                        1024 × 768 × 8              256           786.432K
SVGA (Super Video GA)   800 × 600 × 16              64K           960K
                        1024 × 768 × 8              256           786.432K
                        1024 × 768 × 24             16M           2359.296K
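The pixels-per-line and bytes-per-frame figures follow from simple arithmetic, sketched below. The function names are hypothetical, and "K" is taken to mean 1000 bytes, as the table values imply.

```python
def pixels_per_line(active_lines, aspect_w, aspect_h):
    """Horizontal pixel count that gives square pixels for the aspect ratio."""
    return active_lines * aspect_w / aspect_h


def frame_bytes(width, height, bits_per_pixel):
    """Memory needed to hold one frame, in bytes."""
    return width * height * bits_per_pixel // 8


print(pixels_per_line(480, 4, 3), pixels_per_line(576, 16, 9))
for w, h, depth in [(640, 480, 8), (640, 480, 16), (800, 600, 16),
                    (1024, 768, 8), (1024, 768, 24)]:
    print(f"{w} x {h} x {depth}: {frame_bytes(w, h, depth) / 1000} Kbytes")
```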
Digitized Pictures(5)
▪ Example 2.3
Derive the time to transmit the following digitized images over both 64 kbps and
1.5 Mbps networks:
▪ a 640 × 480 × 8 VGA-compatible image
▪ a 1024 × 768 × 24 SVGA-compatible image
▪ Solution
The size of each image in bits is as follows:
▪ VGA image = 640 × 480 × 8 ≈ 2.46 Mbits
▪ SVGA image = 1024 × 768 × 24 ≈ 18.87 Mbits
The time to transmit each image is as follows:
▪ at 64 kbps: VGA = $(2.46 \times 10^{6})/(64 \times 10^{3})$ ≈ 38.4 sec.
  SVGA = $(18.87 \times 10^{6})/(64 \times 10^{3})$ ≈ 295 sec.
▪ at 1.5 Mbps: VGA = $(2.46 \times 10^{6})/(1.5 \times 10^{6})$ ≈ 1.64 sec.
  SVGA = $(18.87 \times 10^{6})/(1.5 \times 10^{6})$ ≈ 12.58 sec.
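The same calculation can be scripted for any image size and link rate. This is a minimal sketch that ignores protocol overhead and propagation delay (an assumption, as in the slide); the helper names are illustrative.

```python
def image_bits(width, height, bits_per_pixel):
    """Uncompressed image size in bits."""
    return width * height * bits_per_pixel


def transmit_seconds(bits, rate_bps):
    """Time to push the given number of bits through a link of rate_bps."""
    return bits / rate_bps


images = {"VGA": image_bits(640, 480, 8), "SVGA": image_bits(1024, 768, 24)}
for rate in (64_000, 1_500_000):
    for name, bits in images.items():
        print(f"{name} at {rate} bps: {transmit_seconds(bits, rate):.2f} s")
```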
Digitized Pictures(6)
▪ Digital Cameras & Scanners
  ▪ (Still-image cameras) 2-D grid of photosites (diodes), light-sensitive
    cells, made of charge-coupled devices (CCDs)
    ▪ the level of light intensity on each photosite is converted into
      a digital value using an A/D converter when the shutter is
      activated
  ▪ (Scanners) a single row of photosites is exposed in time
    sequence with the scanning operation
▪ How are color images obtained?
  ▪ General consumer: each photosite/pixel is coated with an R, G, or B filter & the
    color is determined by its level together with those of its
    8 neighbors in a 3 × 3 grid structure
  ▪ Photo studio: use of three separate exposures of a single photosite,
    first with an R filter, second with a G filter, and finally with a B filter
  ▪ Professional: use of three separate image sensors per pixel
  e.g., TIFF (tagged image file format), TIFF/EP for electronic
  photography
2.5 Audio
▪ Typical Audio Types
  ▪ Speech signal for interpersonal applications such as (video)
    telephony
  ▪ Music-quality audio such as CD-on-demand & broadcast TV
▪ synthesizer
▪ microphone
▪ loudspeaker
Basics on Audio Signals
1. Human speech: 50 Hz - 10 kHz (limited to 4 kHz in a plain-old-telephone system)
 - 2 × 10K or 2 × 4K sps for monaural (mono) speech
 - (2 × 10K) × 2 or (2 × 4K) × 2 sps for stereophonic speech
 - ideally, 12 bits/sample
2. Human audible music: 15 Hz - 20 kHz
 - 2 × 20K sps for monaural (mono) music
 - (2 × 20K) × 2 sps for stereophonic music
 - ideally, 16 bits/sample
PCM Speech(1)
▪ Human Voice over PSTN
  ▪ 200 Hz - 3.4 kHz bandlimiting channel: less than about 4 kHz
  ▪ 8K (2 × 4K) sps, 8 bits/sample: ITU-T G.711 (PCM) recommendation
▪ Companding (compressing/expanding)
  ▪ 1 bit: polarity, 3 bits: segment code, 4 bits: quantization code
[Figure: pure PCM signals → compander (compressor/expander) → enhanced PCM signals]
  ▪ Pure PCM: equal (linear) interval quantization. Irrespective of the
    magnitude of the input signal, the same level of quantization error is
    produced for both low (quiet) signals and high (loud) signals
  ▪ Enhanced PCM: non-linear (unequal) interval quantization, with narrower
    intervals for smaller-amplitude signals
▪ Why companding?
  Because the human ear is more sensitive to noise on quiet
  signals than on loud signals. Hence the effect of
  quantization noise (error) can be reduced with companding
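The benefit of companding for quiet signals can be demonstrated numerically. The sketch below assumes the continuous μ-law curve with μ = 255 and a simple uniform quantizer on the range [-1, 1); it is not the segment-based G.711 codeword scheme described above, and all function names are hypothetical.

```python
import math

MU = 255  # assumed compression parameter of the continuous mu-law curve


def compress(x):
    """Map x in [-1, 1] through the logarithmic mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y):
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def uniform_quantize(v, n_bits):
    """Equal-interval quantizer on [-1, 1); returns the interval midpoint."""
    q = 2.0 / 2 ** n_bits
    half = 2 ** (n_bits - 1)
    index = max(-half, min(half - 1, math.floor(v / q)))
    return (index + 0.5) * q


def companded_quantize(x, n_bits):
    """Compress, quantize uniformly, then expand again."""
    return expand(uniform_quantize(compress(x), n_bits))


for x in (0.01, 0.9):  # a quiet and a loud sample
    lin_err = abs(uniform_quantize(x, 8) - x)
    cmp_err = abs(companded_quantize(x, 8) - x)
    print(f"x = {x}: linear error {lin_err:.5f}, companded error {cmp_err:.5f}")
```

For the quiet sample the companded error is much smaller than the linear one, at the cost of a larger error on the loud sample, which is the trade-off the slide motivates.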
PCM Speech(2)
▪ Companding Example: 5 bits per sample (1-bit polarity, 2-bit segment code,
& 2-bit quantization code)
[Figure: compressor characteristic over the signal range -V to +V. Polarity bit 1 covers positive amplitudes and polarity bit 0 negative amplitudes. Each polarity is divided into four segments (segment codes 00, 01, 10, 11), and each segment into four linear quantization intervals (codes 00, 01, 10, 11). The intervals are narrower for smaller amplitudes.]
PCM Speech(3)
▪ Companding Example: 5 bits per sample (1-bit polarity, 2-bit segment code,
& 2-bit quantization code)
[Figure: the same 5-bit characteristic as on the previous slide, over the signal range -V to +V, with polarity bits (1, 0), segment codes (+/-), and linear quantization intervals (00 to 11) marked. The annotation on this slide reads "Wider intervals for smaller amplitude".]
PCM Speech(4)
▪ Two Companding Codewords for PCM
  ▪ μ-law: North America & East Asia
  ▪ A-law: Europe
Level   μ-law        A-law
+127    1 0000000    1 1111111
+96     1 0011111    1 1100000
+64     1 0111111    1 1000000
+32     1 1011111    1 0100000
+0      1 1111111    1 0000000
-0      0 1111111    0 0000000
-32     0 1011111    0 0100000
-64     0 0111111    0 1000000
-96     0 0011111    0 1100000
-127    0 0000000    0 1111111
The first bit is the sign bit (polarity); the μ-law magnitude bits are in 1's complement form.
CD-Quality Audio
The standard associated with these devices is the CD-DA (CD-Digital Audio)
standard
▪ Human audible bandwidth: 15 Hz - 20 kHz ⇒ 40 ksps
▪ In CD-ROMs, a higher rate of 44.1 ksps (sampled at 23 µs
intervals) & 16 bits/sample (65536 equal quantization intervals)
is used
▪ bit rate per channel = sampling rate × bits per sample
  = $44.1 \times 10^{3} \times 16$ = 705.6 kbps
▪ total rate required for stereophonic music
  = 2 × 705.6 kbps = 1.411 Mbps
▪ storage capacity for a 1-hour CD-ROM title
  = (1.411 × 60 × 60)/8 = 634.95 Mbytes
  this takes $(634.95 \times 10^{6} \times 8)/(10 \times 10^{6})$ ≈ 508 sec ≈ 8.5 min. of
  downloading time over a 10 Mbps network link!
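These CD-DA figures can be recomputed directly. The sketch below uses the exact stereo rate (1.4112 Mbps), so the storage figure comes out slightly above the slide's 634.95 Mbytes, which is based on the rounded 1.411 Mbps; the function name is illustrative.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Raw PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


mono = pcm_bit_rate(44_100, 16, 1)      # one channel
stereo = pcm_bit_rate(44_100, 16, 2)    # stereophonic music
one_hour_bytes = stereo * 60 * 60 // 8  # storage for a 1-hour title
download_s = one_hour_bytes * 8 / 10_000_000  # over a 10 Mbps link

print(f"mono {mono / 1e3} kbps, stereo {stereo / 1e6} Mbps")
print(f"1 hour: {one_hour_bytes / 1e6} Mbytes, {download_s / 60:.1f} min at 10 Mbps")
```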
• Assume the CD-DA standard is being used, derive:
i. the time to transmit a 30 second portion of the
title using a transmission channel of bit rate:
• 64 kbps
• 1.5Mbps
• A 30-second portion of the title = 1.411 Mbps × 30 s
= 42.33 Mbits.
• Hence the time to transmit this data:
• At 64 kbps = $(42.33 \times 10^{6})/(64 \times 10^{3})$ = 661.4 s
• At 1.5 Mbps = $(42.33 \times 10^{6})/(1.5 \times 10^{6})$ = 28.22 s
2.6 Video (Motion): Broadcast TV
Video Applications
▪ Entertainment: Broadcast TV, VCR/DVD Recordings
▪ Interpersonal: Video Telephony & Videoconferencing
▪ Interactive: Video Clips in PC Windows
▪ Scanning Sequences: Interlaced Scanning
  ▪ To minimize the amount of transmission bandwidth, a frame is divided into
    two halves called fields
  e.g., a 525-line system with a field rate of 60 per second:
   - 262.5 odd lines in the first field
   - 262.5 even lines in the second field
  In reality, a complete
  525-line frame is refreshed 30 times per second.
Broadcast TV(2)
▪ Color Signals
  ▪ Three properties of a color
    - Brightness, Hue (Tint) & Saturation
    - Brightness corresponds to luminance; hue & saturation correspond to chrominance
    - Yellow = R + G, Magenta = R + B, White = R + G + B
  ▪ Color production: a weighted sum of the R, G, and B phosphor outputs
    - 0.299 R + 0.587 G + 0.114 B, where 0.299 + 0.587 + 0.114 = 1
  ▪ Luminance refers to the brightness of a source; the hue & the saturation
    are called its chrominance characteristics
    - luminance $Y_s = 0.299R_s + 0.587G_s + 0.114B_s$
      $Y_s$: magnitude of the luminance signal
      $R_s, G_s, B_s$: magnitudes of the three primary color signals
  ▪ Since the luminance signal $Y_s$ is a function of all three of R, G, and B, only two color difference signals are needed:
    blue chrominance $C_b$ and red chrominance $C_r$
    - $C_b = B_s - Y_s$, $C_r = R_s - Y_s$
Broadcast TV(3)
▪ Chrominance Components
  ▪ Composite Video Signal for Transmission
    - the Ys, Cb, and Cr signals are combined, and the color difference
      signals are scaled down before transmission
  ▪ In PAL (Phase Alternating Line) (color encoding system)
    - $Y = 0.299R + 0.587G + 0.114B$
    - $U\,(C_b) = 0.493(B - Y) = -0.147R - 0.289G + 0.437B$
    - $V\,(C_r) = 0.877(R - Y) = 0.615R - 0.515G - 0.100B$
  ▪ In NTSC (National Television System Committee)
    - $Y = 0.299R + 0.587G + 0.114B$
    - $I\,(C_b) = 0.74(R - Y) - 0.27(B - Y) = 0.599R - 0.276G - 0.324B$
    - $Q\,(C_r) = 0.48(R - Y) + 0.41(B - Y) = 0.212R - 0.528G + 0.311B$
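The PAL equations translate directly into code. The following is a sketch that assumes R, G, B values normalized to the range 0 to 1; `rgb_to_yuv` is a hypothetical name, and only the coefficients come from the slide.

```python
def rgb_to_yuv(r, g, b):
    """Luminance and scaled color difference signals using the PAL weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.493 * (b - y)  # scaled blue color difference
    v = 0.877 * (r - y)  # scaled red color difference
    return y, u, v


for name, rgb in [("white", (1, 1, 1)), ("red", (1, 0, 0)), ("blue", (0, 0, 1))]:
    y, u, v = rgb_to_yuv(*rgb)
    print(f"{name}: Y = {y:.3f}, U = {u:.3f}, V = {v:.3f}")
```

For white (and any gray) both color difference signals vanish, so a black-and-white receiver can use the Y signal alone.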
Digital Video
▪ Advantages of DV
  ▪ Easy to store in a computer
  ▪ Easy to edit and integrate with other media types
  ▪ Easy to digitize the three RGB component signals
  ▪ Transmission bandwidth is reduced by using the luminance and
    two color difference signals instead of the RGB signals
    directly
▪ CCIR-601 Recommendation: standard for the digitization
  of video pictures
Digital Video(2)
▪ 4:2:2 format (CCIR-601), components Y, Cb, Cr
  ▪ Recommendation for use in TV studios
  ▪ The three component (analog) video signals may have bandwidths
    ▪ up to 6 MHz for the luminance ⇒ 12M sps
    ▪ less than 3 MHz for the two chrominance signals ⇒ 6M sps
  ▪ In reality, 13.5M sps for the luminance and 6.75M sps for each of the two
    chrominance signals
  ▪ In the NTSC (525-line) system, total line sweep time 63.56 μsec =
    retrace time 11.56 μsec + active line sweep time 52 μsec
  ▪ In the PAL (625-line) system, total line sweep time 64 μsec =
    retrace time 12 μsec + active line sweep time 52 μsec
Orthogonal sampling
  ▪ Luminance line sampling: $52 \times 10^{-6} \times 13.5 \times 10^{6} = 702$ samples/line; in reality, 720 samples/line
  ▪ Chrominance line sampling: $52 \times 10^{-6} \times 6.75 \times 10^{6} = 351$ samples/line; in reality, 360 samples/line
  ▪ 4 Y samples for every 2 Cb and 2 Cr samples (4:2:2)
Digital Video(3)
▪ 4:2:2 Format Bit Rate & Storage (NTSC 525-line)
  ▪ The number of active (visible) lines: 480
  ▪ The number of samples per line: 720
    Resolution of luminance Y = 720 × 480
    Two chrominance signals Cb = Cr = 360 × 480
  ▪ Line sampling rate: 13.5M sps for Y & 6.75M sps for each of Cb & Cr
  ▪ Bits per sample: 8 bits
    Bit rate per line = $13.5 \times 10^{6} \times 8 + 2(6.75 \times 10^{6} \times 8)$ = 216 Mbps
    Bits per line = 720 × 8 + 2(360 × 8) = 11.52 kbits
    Bits per frame = 480 × 11.52 kbits = 5.5296 Mbits
    Storage for 1.5 hrs of video at a refresh rate of 60 Hz = (5.5296 Mbits × 60 × 1.5 × 3600)/8
    = 223.9488 GBytes
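The 4:2:2 figures can be verified with integer arithmetic. This sketch follows the slide's assumptions (8 bits per sample, 480 active lines, 60 frames per second of stored video); the function names are hypothetical.

```python
BITS_PER_SAMPLE = 8


def sampling_bit_rate(y_rate_sps, c_rate_sps):
    """Bit rate from the Y sampling rate plus two chrominance sampling rates."""
    return (y_rate_sps + 2 * c_rate_sps) * BITS_PER_SAMPLE


def frame_bits(y_samples, c_samples, active_lines):
    """Bits in one frame: Y plus two chrominance components on every line."""
    bits_per_line = (y_samples + 2 * c_samples) * BITS_PER_SAMPLE
    return bits_per_line * active_lines


rate = sampling_bit_rate(13_500_000, 6_750_000)
frame = frame_bits(720, 360, 480)
storage_bytes = frame * 60 * 90 * 60 // 8  # 60 frames/s for 1.5 hours

print(f"bit rate {rate / 1e6} Mbps, frame {frame / 1e6} Mbits")
print(f"1.5 hours of video: {storage_bytes / 1e9} GBytes")
```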
Digital Video(4)
▪ 4:2:0 Format
  ▪ used in digital broadcast applications
  ▪ interlaced scanning with the absence of chrominance
    samples in alternate lines
  ▪ 525-line system
    ▪ Y = 720 × 480 (the same as 4:2:2 format), Cb = Cr = 360 × 240
  ▪ 625-line system
    ▪ Y = 720 × 576, Cb = Cr = 360 × 288
  ▪ bit rate per line: $13.5 \times 10^{6} \times 8 + 2(3.375 \times 10^{6} \times 8)$ = 162 Mbps
▪ HDTV Format
  ▪ used in High-Definition Television (four times bit rate)
  ▪ 4/3: 1440 × 1152 pixels (50/60 Hz refresh rate) & 16/9 wide-screen:
    1920 × 1152 pixels (25/30 Hz), with 1080 visible lines per frame
Digital Video(5)
▪ SIF (Source Intermediate Format), 4:1:1 Format
  ▪ used in Video Cassette Recorders (VCRs)
  ▪ progressive (non-interlaced) scanning since it is intended for
    storage applications
  ▪ Half the spatial resolution (subsampling) and half the temporal resolution of the 4:2:0 format
  ▪ 525-line system
    ▪ Y = 360 × 240, Cb = Cr = 180 × 120
  ▪ 625-line system
    ▪ Y = 360 × 288, Cb = Cr = 180 × 144
  ▪ bit rate per line
    ▪ $6.75 \times 10^{6} \times 8 + 2(1.6875 \times 10^{6} \times 8)$ = 81 Mbps
Digital Video(6)
▪ CIF (Common Intermediate Format), 4:1:1 format
  ▪ used in videoconferencing applications
  ▪ spatial resolution of the SIF 625-line system plus
    temporal resolution of the SIF 525-line system
    ▪ Y = 360 × 288, Cb = Cr = 180 × 144
    ▪ refresh rate: 30 Hz
  ▪ bit rate per line: $6.75 \times 10^{6} \times 8 + 2(1.6875 \times 10^{6} \times 8)$ = 81 Mbps
  ▪ many variants for videoconferencing using desktop PCs
    or ISDN/PSTN
    ▪ typically 4 or 16 × 64 kbps channels are used
    ▪ 4CIF: Y = 720 × 576, Cb = Cr = 360 × 288
    ▪ 16CIF: Y = 1440 × 1152, Cb = Cr = 720 × 576
Digital Video(7)
▪ QCIF (Quarter CIF), 4:1:1 Format
  ▪ used in video telephony applications
  ▪ half the spatial resolution of the CIF and
    either half or quarter the temporal resolution of the CIF
    ▪ Y = 180 × 144, Cb = Cr = 90 × 72
    ▪ refresh rate: 15 or 7.5 Hz
  ▪ bit rate per line:
    $3.375 \times 10^{6} \times 8 + 2(0.84375 \times 10^{6} \times 8)$ = 40.5 Mbps
  ▪ a lower-resolution version is typically used for a single 64 kbps channel
    over ISDN or PSTN with modems: sub-QCIF (SQCIF)
    ▪ Y = 128 × 96, Cb = Cr = 64 × 48
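The per-format bit rates quoted on these slides all come from one formula: the luminance sampling rate plus twice the chrominance sampling rate, times 8 bits per sample. A small sketch, with the sampling rates copied from the slides and a hypothetical dictionary layout:

```python
BITS_PER_SAMPLE = 8

# (luminance sps, chrominance sps for each of Cb and Cr), as given on the slides.
SAMPLING_RATES = {
    "4:2:2": (13_500_000, 6_750_000),
    "4:2:0": (13_500_000, 3_375_000),
    "SIF": (6_750_000, 1_687_500),
    "CIF": (6_750_000, 1_687_500),
    "QCIF": (3_375_000, 843_750),
}


def format_bit_rate(name):
    """Bit rate in bits per second for one of the formats above."""
    y_rate, c_rate = SAMPLING_RATES[name]
    return (y_rate + 2 * c_rate) * BITS_PER_SAMPLE


for name in SAMPLING_RATES:
    print(f"{name}: {format_bit_rate(name) / 1e6} Mbps")
```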
Digital Video(8)
▪ PC Video Digitization
Digitization format   System     Spatial resolution                    Temporal resolution
4:2:2                 525-line   Y = 640 × 480, Cb = Cr = 320 × 240    60 Hz
                      625-line   Y = 768 × 576, Cb = Cr = 384 × 288    50 Hz
SIF                   525-line   Y = 320 × 240, Cb = Cr = 160 × 120    30 Hz
                      625-line   Y = 384 × 288, Cb = Cr = 192 × 144    25 Hz
CIF                              Y = 384 × 288, Cb = Cr = 192 × 144    30 Hz
QCIF                             Y = 192 × 144, Cb = Cr = 96 × 72      15/7.5 Hz
- Video capture board or S/W required
- All PC monitors use "progressive (non-interlaced) scanning"