Images and Graphics
In this chapter, the digital image will be discussed; this is the computer representation most important for processing in multimedia systems. We present basic concepts of image representation. Further, computer processing of images is described with an overview of topics on image generation, recognition and transmission.
An image might be thought of as a function with resulting values of the light intensity at each point over a planar region. For digital computer operations, this function needs to be sampled at discrete intervals. The sampling quantizes the intensity values into discrete levels.
It is common to use a square sampling grid with pixels equally spaced along the two sides of the grid. The distance between grid points obviously affects the accuracy with which the original image is represented, and it determines how much detail can be resolved. The resolution depends on the imaging system as well.
Digital pictures are often very large. For example, suppose we want to sample and quantize an ordinary (525-line) television picture (NTSC) with a VGA (Video Graphics Array) video controller, so that it can be redisplayed without noticeable degradation. We must use a matrix of 640 x 480 pixels, where each pixel is represented by an 8-bit integer. This pixel representation allows 256 discrete gray levels. Hence, this image specification gives an array of 307,200 8-bit numbers, and a total of 2,457,600 bits. In many cases, even finer sampling is necessary.
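To make the arithmetic explicit: the raw size of a sampled, quantized image is simply resolution times quantization. A minimal sketch (the function is ours, written only to restate the example above):

    def image_size_bits(width, height, bits_per_pixel):
        """Raw size in bits of a sampled, quantized digital image."""
        return width * height * bits_per_pixel

    # The NTSC/VGA example above: 640 x 480 pixels, 8 bits (256 gray levels) each.
    pixels = 640 * 480                    # 307,200 pixels
    bits = image_size_bits(640, 480, 8)   # 2,457,600 bits
    print(pixels, bits)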
The image format is specified by two main parameters: spatial resolution, which is specified as pixels x pixels, and color encoding, which is specified by bits per pixel. Both parameter values depend on hardware and software for input/output of images. As an example, we will present image formats supported on SPARC and IRIS computers.
(color, RGB). Another video frame grabber is the Parallax XVideo, which includes a 24-bit frame buffer and 640 x 480 pixel resolution. The new multimedia kit in the new SPARCstations includes the SunVideo card (Sun Microsystems, Inc.), a color video camera and the CDware for CD-ROM discs. The SPARCstation 10 offers 24-bit image manipulation. The new SunVideo card is a capture and compression card; its technology captures and compresses 30 frames/second in real time under the Solaris 2.3 operating system. Further, SunVideo offers capture and compression of video at a resolution of 320 x 240 pixels in several formats [Moo94].
The color of each pixel can be represented in one of two ways:

o Three numbers representing the intensities of the red, green and blue components of the color at that pixel.

o Three numbers that are indices to tables of red, green and blue intensities.
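The two representations differ in whether a pixel stores intensities directly or indices into lookup tables. A small illustration (the table values below are hypothetical, chosen only to show the lookup):

    # Direct color: the pixel stores three intensities outright.
    direct_pixel = (200, 120, 40)          # (red, green, blue)

    # Indexed color: the pixel stores three indices into intensity tables.
    red_table = [0, 64, 128, 200, 255]     # hypothetical lookup tables
    green_table = [0, 60, 120, 180, 255]
    blue_table = [0, 40, 80, 160, 255]

    indexed_pixel = (3, 2, 1)              # indices, not intensities
    resolved = (red_table[indexed_pixel[0]],
                green_table[indexed_pixel[1]],
                blue_table[indexed_pixel[2]])
    print(resolved)                        # (200, 120, 40): the same color, in fewer bits per pixel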
In addition, each pixel may have other information associated with it; for example, three numbers indicating the normal to the surface drawn at that pixel. Thus, we consider an image as consisting of a collection of Red, Green and Blue channels (RGB channels), each of which gives some single piece of information about the pixels in the image.
Some current image file formats for storing images include GIF (Graphics Interchange Format), X11 Bitmap, Sun Rasterfile, PostScript, IRIS, JPEG, TIFF (Tagged Image File Format) and others.
Graphics image formats are specified through graphics primitives and their attributes. To the category of graphics primitives belong lines, rectangles, circles and ellipses, and text strings, specifying two-dimensional (2D) objects in a graphical image, or, e.g., polyhedra, specifying three-dimensional (3D) objects. A graphics package
determines which primitives are supported. Attributes such as line style, line width and color affect the appearance of the graphical image.
Graphics primitives and their attributes represent a higher level of image representation, i.e., the graphical images are not represented by a pixel matrix. This higher level of representation needs to be converted, at some point of the image processing, into the lower level of the image representation; for example, when an image is to be displayed. The advantage of the higher-level primitives is the reduction of data to be stored per graphical image and easier manipulation of the graphical image. The disadvantage is the additional conversion step from the graphical primitives and their attributes to their pixel representation. Some graphics packages like SRGP (Simple Raster Graphics Package) provide such a conversion, i.e., they take the graphics primitives and attributes and generate either a bitmap or pixmap.
A bitmap is an array of pixel values that map one by one to pixels on the screen; the pixel information is stored in 1 bit, so we get a binary image. Pixmap is a more general term describing a multiple-bit-per-pixel image. Low-end color systems have eight bits per pixel, allowing 256 colors simultaneously. More expensive systems have 24 bits per pixel, allowing a choice of any of 16 million colors. Refresh buffers with 32 bits per pixel and a screen resolution of 1280 x 1024 pixels are available on personal computers. Of the 32 bits per pixel, 24 bits are devoted to representing color and 8 bits to control purposes. Beyond that, buffers with 96 bits (or more) per pixel are available at 1280 x 1024 resolution on high-end systems [FDFH92]. SRGP does not convert the graphical image into primitives and attributes after generating a bitmap/pixmap. In this case, after the conversion phase, the graphics format is presented as a digital image format.
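To illustrate the conversion step, the sketch below scan-converts one primitive, an axis-aligned filled rectangle, into a pixmap. This shows only the principle; SRGP's actual interface is richer and differs in its details:

    def new_pixmap(width, height, background=0):
        """A pixmap as a row-major matrix of pixel values."""
        return [[background] * width for _ in range(height)]

    def fill_rectangle(pixmap, x0, y0, x1, y1, value):
        """Scan-convert a filled-rectangle primitive into pixel values."""
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                pixmap[y][x] = value

    pm = new_pixmap(640, 480)                    # lower-level representation: a pixel matrix
    fill_rectangle(pm, 100, 50, 300, 200, 255)   # higher-level primitive, now rasterized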
Computer graphics concern the pictorial synthesis of real or imaginary objects from their computer-based models, whereas the related field of image processing treats the converse process: the analysis of scenes, or the reconstruction of models from pictures of 2D or 3D objects. In the following sections, we describe basic principles of image synthesis (generation) and image analysis (recognition). The literature on computer graphics and image processing presents further and detailed information [FDFH92, KR82, Nev82, HS92].
Image synthesis is an integral part of all computer user interfaces and is indispensable for visualizing 2D, 3D and higher-dimensional objects. Areas as diverse as education, science, engineering, medicine, advertising and entertainment all rely on graphics.
Let us look at some representative samples:
o User Interfaces
The use of graphics for the creation and dissemination of information has
increased enormously since the advent of desktop publishing on personal com-
puters. Office automation and electronic publishing can produce both tradi-
tional printed documents and electronic documents that contain text, tables,
graphs and other forms of drawn or scanned-in graphics. Hypermedia sys-
tems that allow browsing networks of interlinked multimedia documents are
proliferating.
Interactive computer graphics is the most important means of producing images (pictures) since the invention of photography and television; it has the added advantage that we can make pictures not only of concrete, "real world" objects, but also of abstract, synthetic objects such as mathematical surfaces in 4D.
Dynamics in Graphics
Graphics are not confined to static pictures. Pictures can be dynamically varied; for example, a user can control animation by adjusting the speed, the portion of the total scene in view, the amount of detail shown, etc. Hence, dynamics is an integral part of graphics (dynamic graphics). Much of interactive graphics technology involves hardware and software for user-controlled motion dynamics and update dynamics:
Motion Dynamics
With motion dynamics, objects can be moved and tumbled with respect to a stationary observer. The objects can also remain stationary and the view around them can move. In many cases, both the objects and the camera are moving. A typical example is a flight simulator which contains a mechanical platform, which supports a mock cockpit and a display screen. The computer
controls platform motion, gauges and the simulated world of both stationary
and moving objects through which the pilot navigates.
Update Dynamics
Update dynamics is the actual change of the shape, color, or other properties of the objects being viewed. For instance, a system can display the deformation of an in-flight airplane structure in response to the operator's manipulation of the many control mechanisms. The smoother the change, the more realistic and meaningful the result. Dynamic, interactive graphics offer a large number of user-controllable modes with which to encode and communicate information, e.g., the 2D or 3D shape of objects in a picture, their gray scale or color and the time variations of these properties.
Images can be generated by video digitizer cards that capture NTSC (PAL) analog signals and create a digital image. These kinds of digital images are used, for example, in image processing for image recognition and in communication for video conferencing. In this section we concentrate on image generation via graphics systems. We discuss in more detail image and video generation via video digitizers in Chapter 5.
The application program handles user input. It produces views by sending to the third component, the graphics system, a series of graphics output commands that contain both a detailed geometric description of what is to be viewed and the attributes describing how the objects should appear.
The graphics system is responsible for actually producing the picture from the detailed descriptions and for passing the user's input to the application program for processing. The graphics system is thus an intermediary component between the application program and the display hardware. It effects an output transformation from objects in the application model to a view of the model. Symmetrically, it effects an input transformation from user actions to application program inputs that cause the application to make changes in the model and/or picture. The graphics system typically consists of a set of output subroutines corresponding to various primitives, attributes and other elements. These are collected in a graphics subroutine library or package. The application program specifies geometric primitives and attributes to these subroutines, and the subroutines then drive the specific display device and cause it to display the image.
At the hardware level, a computer receives input from interaction devices and outputs images to display devices.
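The division of labor described above can be made concrete with a toy output pipeline. The class and method names below are invented for illustration and do not correspond to any real graphics package:

    class GraphicsSystem:
        """Intermediary between the application program and the display hardware."""

        def __init__(self, display):
            self.display = display                 # hypothetical display driver
            self.attributes = {"line_style": "solid", "line_width": 1}

        def set_attribute(self, name, value):
            """Attributes modify how subsequent primitives appear."""
            self.attributes[name] = value

        def draw_line(self, p0, p1):
            """Output transformation: primitive + attributes -> pixels on the device."""
            self.display.scan_convert_line(p0, p1, dict(self.attributes))

    # The application program only issues output commands, e.g.:
    #   gs = GraphicsSystem(display)
    #   gs.set_attribute("line_width", 2)
    #   gs.draw_line((0, 0), (100, 50))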
Trackballs can be made to sense rotation about the vertical axis in addition to that about the two horizontal axes. However, there is no direct relationship between hand movements with the device and the corresponding movement in 3D space.
A spaceball is a rigid sphere containing strain gauges. The user pushes or pulls the sphere in any direction, providing 3D translation and orientation. In this case, the directions of movement correspond to the user's attempts to move the rigid sphere, although the hand does not actually move.
The data glove records hand position and orientation as well as finger movements. It is a glove covered with small, lightweight sensors. Each sensor consists of a short fiber-optic cable with a Light-Emitting Diode (LED) at one end and a phototransistor at the other. In addition, a Polhemus 3SPACE three-dimensional position and orientation sensor records hand movements. Wearing the data glove, a user can grasp objects, move and rotate them and then release them, thus providing very natural interaction in 3D [ZLB+87].
Audio communication also has exciting potential since it allows hands-free input and natural output of simple instructions, feedback, and so on.
Current output technology uses raster displays, which store display primitives in a refresh buffer in terms of their component pixels. The architecture of a raster display is shown in Figure 4.2. In some raster displays, there is a hardware display controller that receives and interprets sequences of output commands. In simpler, more common systems (Figure 4.2), such as those in personal computers, the display controller exists only as a software component of the graphics library package, and the refresh buffer is no more than a piece of the CPU's memory that can be read by the image display subsystem (often called the video controller) that produces the actual image on the screen.
Figure 4.2: Architecture of a raster display (display commands in; interaction data out).
The complete image on a raster display is formed from the raster, which is a set of horizontal raster lines, each a row of individual pixels; the raster is thus stored as a matrix of pixels representing the entire screen area. The entire image is scanned out sequentially by the video controller. The raster scan is shown in Figure 4.3. At each pixel, the beam's intensity is set to reflect the pixel's intensity; in color systems, three beams are controlled - one for each primary color (red, green, blue) - as specified by the three color components of each pixel's value.
Raster graphics systems have other characteristics. To avoid flickering of the image, a 60 Hz or higher refresh rate is used today; an entire image of 1024 lines of 1024 pixels each must be stored explicitly, and a bitmap or pixmap is generated.
Raster graphics can display areas filled with solid colors or patterns, i.e., realistic images of three-dimensional objects.
Figure 4.3: Raster scan (with horizontal and vertical retrace).
Dithering
The solution lies in our eye's capability for spatial integration. If we view a very small area from a sufficiently large viewing distance, our eyes average fine detail within the small area and record only the overall intensity of the area. This phenomenon is exploited in the technique called halftoning, or clustered-dot ordered dithering (halftoning approximation). Each small resolution unit is imprinted with a circle of black ink whose area is proportional to the blackness of the corresponding area in the original image.
Figure 4.4: Five intensity levels approximated with four 2 x 2 dither patterns.
Clustered-dot ordered dither is designed for devices which are not able to display individual dots well (e.g., laser printers). This means that these devices are poor at reproducing isolated 'on' pixels (the black dots in Figure 4.4). All pixels that are 'on' for a particular intensity must be adjacent to other 'on' pixels.
A CRT display is able to display individual dots; hence, the clustering requirement can be relaxed and a dispersed-dot ordered dither can be used. Monochrome dithering techniques can also be used to extend the number of available colors, at the expense of resolution. Consider a color display with three bits per pixel, one each for red, green and blue. We can use a 2 x 2 pattern area to obtain 125 colors as follows: each pattern can display five intensities for each color, by using the halftone patterns in Figure 4.4, resulting in 5 x 5 x 5 = 125 color combinations.
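A sketch of this halftoning approximation with 2 x 2 patterns follows. The dot-growth order inside the patterns is our choice; Figure 4.4 may order the dots differently:

    # Five intensity levels (0..4 dots 'on') as 2 x 2 dither patterns.
    PATTERNS = [
        [[0, 0], [0, 0]],   # level 0: all off
        [[0, 0], [1, 0]],   # level 1
        [[1, 0], [1, 0]],   # level 2   (dot order is illustrative)
        [[1, 0], [1, 1]],   # level 3
        [[1, 1], [1, 1]],   # level 4: all on
    ]

    def halftone(image):
        """Render a gray-level image (values 0..255) on a bilevel device,
        trading resolution for intensity levels: each pixel becomes 2 x 2 dots."""
        out = [[0] * (2 * len(image[0])) for _ in range(2 * len(image))]
        for y, row in enumerate(image):
            for x, value in enumerate(row):
                level = value * 4 // 255        # quantize to one of five levels
                pattern = PATTERNS[level]
                for dy in range(2):
                    for dx in range(2):
                        out[2 * y + dy][2 * x + dx] = pattern[dy][dx]
        return out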
Image analysis is concerned with techniques for extracting descriptions from images that are necessary for higher-level scene analysis methods. By itself, knowledge of the position and value of any particular pixel almost conveys no information related to the recognition of an object, the description of an object's shape, its position or orientation, the measurement of any distance on the object or whether the object is defective. Hence, image analysis techniques include computation of perceived brightness and color, partial or complete recovery of three-dimensional data in the scene, and so on.
Image enhancement deals with improving image quality by eliminating noise (ex-
traneous or missing pixels) or by enhancing contrast.
Scene analysis and computer vision deal with recognizing and reconstructing 3D models of a scene from several 2D images. An example is an industrial robot sensing the relative sizes, shapes, positions and colors of objects.
Image Recognition
o Infer explicitly or implicitly an object's position and orientation from the spatial configuration.
To infer an object's (e.g., a cup) position, orientation and category or class from the spatial configuration of gray levels requires the capability to infer which pixels are part of the object. Further, from among those pixels that are part of the object, it requires the capability to distinguish observed object features, such as special markings, lines, curves, surfaces or boundaries (e.g., edges of the cup). These features themselves are organized in a spatial relationship on the image and the object.
The kind of object, background, imaging sensor and viewpoint of the sensor all determine whether the recognition problem is easy or difficult. For example, suppose that the object is a white planar square on a uniform black background, as shown in the digital image (Table 4.1). A simple corner feature extractor could identify the distinguishing corner points, as shown in the symbolic image (Table 4.2). The match between the image corner features and the object corner features is direct. Just relate the corners of the image square to the corners of the object square in clockwise order, starting from any arbitrary correspondence. Then, use the corresponding points to establish the sensor orientation relative to the plane of the square. If we know the size of the square, we can completely and analytically determine the position and orientation of the square relative to the position and orientation of the camera. In
0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0 255 255 255 255 255   0   0   0   0
0   0   0   0 255 255 255 255 255   0   0   0   0
0   0   0   0 255 255 255 255 255   0   0   0   0
0   0   0   0 255 255 255 255 255   0   0   0   0
0   0   0   0 255 255 255 255 255   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0

Table 4.1: Numeric digital intensity image of a white square (gray tone 255) on a black (gray tone 0) background.
N   N   N   N   N   N   N   N   N   N   N   N   N
N   N   N   N   N   N   N   N   N   N   N   N   N
N   N   N   N   C   N   N   N   C   N   N   N   N
N   N   N   N   N   N   N   N   N   N   N   N   N
N   N   N   N   N   N   N   N   N   N   N   N   N
N   N   N   N   N   N   N   N   N   N   N   N   N
N   N   N   N   C   N   N   N   C   N   N   N   N
N   N   N   N   N   N   N   N   N   N   N   N   N
N   N   N   N   N   N   N   N   N   N   N   N   N

Table 4.2: Symbolic image, with the corner features (C) extracted from the numeric image of Table 4.1.
this simple instance, the unit of pixel is transformed to the unit of match between
image corners and object corners. The unit of match is then transformed to the
unit of object position and orientation relative to the natural coordinate system of
the sensor.
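For an image as clean as Table 4.1, a corner feature extractor can be a few lines: a white pixel is a corner of an axis-aligned square exactly when one horizontal and one vertical 4-neighbor are white. This simplified rule is ours and works only for scenes like this one:

    def extract_corners(img, white=255):
        """Label pixels that are corners of an axis-aligned white square."""
        h, w = len(img), len(img[0])

        def is_white(y, x):
            return 0 <= y < h and 0 <= x < w and img[y][x] == white

        corners = []
        for y in range(h):
            for x in range(w):
                if is_white(y, x):
                    horiz = is_white(y, x - 1) + is_white(y, x + 1)
                    vert = is_white(y - 1, x) + is_white(y + 1, x)
                    if horiz == 1 and vert == 1:   # an 'L' configuration: a corner
                        corners.append((y, x))
        return corners   # for Table 4.1, the four 'C' positions of Table 4.2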
On the other hand, the transformation process may be difficult. There may be a variety of complex objects that need to be recognized. For example, some objects may include parts of other objects, shadows may occur or the object reflectances may be varied, and the background may be busy.
Which kind of unit transformation must be employed depends on the specific nature
of the vision task, the complexity of the image and the kind of prior information
available.
We will give a brief overview of the recognition steps, but a deeper analysis of these
steps can be found in computer vision literature, such as [Nev82, HS92], etc.
Image formatting means capturing an image from a camera and bringing it into a digital form. It means that we will have a digital representation of an image in the form of pixels. (Pixels and image formats were described in Sections 4.1.1 and 4.1.2.) An example of an observed image is shown in Figure 4.6.
1. Conditioning
2. Labeling
Labeling is based on a model that suggests the informative pattern has structure as a spatial arrangement of events, each spatial event being a set of connected pixels. Labeling determines in what kinds of spatial events each pixel participates.
An example of a labeling operation is edge detection. Edge detection is an important part of the recognition process. Edge detection techniques find local discontinuities in some image attribute, such as intensity or color (e.g., detection of cup edges). These discontinuities are of interest because they are likely to occur at the boundaries of objects. An edge is said to occur at a point in the image if some image attribute changes in value discontinuously at that point. Examples are intensity edges. An ideal edge, in one dimension, may be viewed as a step change in intensity; for example, a step between high-valued and low-valued pixels (Figure 4.7). If the step is detected, the pixels at the step are labeled as edge pixels.
Figure 4.8: Edge detection of the image from Figure 4.6 (Courtesy of Jana Kosecka, GRASP Laboratory, University of Pennsylvania, 1991).
Edge detection recognizes many edges, but not all of them are significant. Therefore, another labeling operation must occur after edge detection, namely thresholding. Thresholding specifies which edges should be accepted and which should not; the thresholding operation filters only the significant edges from the image and labels them. Other edges are removed. Thresholding the image from Figure 4.8 is presented in Figure 4.9. (A minimal sketch of edge detection and thresholding follows this list.)
3. Grouping
The labeling operation labels the kinds of primitive spatial events in which the pixel participates. The grouping operation identifies the events by collecting together or identifying maximal connected sets of pixels participating in the same kind of event. When the reader recalls the intensity edge detection viewed as a step change in intensity (Figure 4.7), the edges are labeled as step edges, and the grouping operation constitutes the step edge linking.
Figure 4.9: Thresholding the image from Figure 4.8 (Courtesy of Jana Kosecka, GRASP Laboratory, University of Pennsylvania, 1991).
Figure 4.10: Line-fitting of the image from Figure 4.9 (Courtesy of Jana Kosecka, GRASP Laboratory, University of Pennsylvania).
The grouping operation involves a change of logical data structure. The observed image, the conditioned image and the labeled image are all digital image data structures. Depending on the implementation, the grouping operation can produce either an image data structure in which each pixel is given an index associated with the spatial event to which it belongs, or a data structure that is a collection of sets. Each set corresponds to a spatial event and contains the pairs of positions (row, column) that participate in the event. In either case, a change occurs in the logical data structure. The entities of interest prior to grouping are pixels; the entities of interest after grouping are sets of pixels. (A sketch of grouping, and of the extracting step that follows, appears after this list.)
4. Extracting

The grouping operation determines the new set of entities, but they are left naked in the sense that the only thing they possess is their identity. The extracting operation computes, for each group of pixels, a list of properties. Example properties might include its centroid, area, orientation, spatial moments, gray tone moments, spatial-gray tone moments, circumscribing circle, inscribing circle, and so on. Other properties might depend on whether the group is considered a region or an arc. If the group is a region, the number of holes might be a useful property. If the group is an arc, average curvature might be a useful property.
5. Matching

After the completion of the extracting operation, the events occurring on the image have been identified and measured, but the events in and of themselves have no meaning. The meaning of the observed spatial events emerges when a perceptual organization has occurred such that a specific set of spatial events in the observed spatial organization clearly constitutes an imaged instance of some previously known object, such as a chair or the letter A. Once an object or set of object parts has been recognized, measurements (such as the distance between two parts, the angle between two lines or the area of an object part) can be made and related to the allowed tolerance, as may be the case in an inspection scenario. It is the matching operation that determines the interpretation of some related set of image events, associating these events with some given three-dimensional object or two-dimensional shape.
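As promised above, a minimal sketch of the labeling step for one-dimensional step edges, with thresholding folded in; the threshold value is arbitrary:

    def label_step_edges(row, threshold=50):
        """Labeling: mark positions whose intensity changes discontinuously.
        Thresholding: keep only the significant changes."""
        edges = []
        for x in range(len(row) - 1):
            gradient = abs(row[x + 1] - row[x])   # local discontinuity measure
            if gradient >= threshold:             # filter out insignificant edges
                edges.append(x)
        return edges

    row = [0, 0, 0, 255, 255, 255, 0, 0]          # an ideal step edge, twice
    print(label_step_edges(row))                  # [2, 5]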
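And a sketch of grouping followed by extracting, using the set-of-sets data structure described in step 3; 4-connectivity is assumed:

    def group(labels):
        """Grouping: collect maximal 4-connected sets of labeled ('on') pixels."""
        h, w = len(labels), len(labels[0])
        seen, events = set(), []
        for y in range(h):
            for x in range(w):
                if labels[y][x] and (y, x) not in seen:
                    stack, event = [(y, x)], []
                    seen.add((y, x))
                    while stack:
                        cy, cx = stack.pop()
                        event.append((cy, cx))
                        for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                       (cy, cx - 1), (cy, cx + 1)):
                            if (0 <= ny < h and 0 <= nx < w
                                    and labels[ny][nx] and (ny, nx) not in seen):
                                seen.add((ny, nx))
                                stack.append((ny, nx))
                    events.append(event)   # one set of (row, column) pairs per event
        return events

    def extract(event):
        """Extracting: compute a property list (here, area and centroid) for one group."""
        area = len(event)
        cy = sum(y for y, _ in event) / area   # centroid row
        cx = sum(x for _, x in event) / area   # centroid column
        return {"area": area, "centroid": (cy, cx)}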
Image transmission takes into account the transmission of digital images through computer networks. There are several requirements on the networks when images are transmitted: (1) the network must accommodate bursty data transport, because image transmission is bursty (the burst is caused by the large size of the image); (2) image transmission requires reliable transport; (3) time-dependence is not a dominant characteristic of the image, in contrast to audio/video transmission.

Image size depends on the image representation format used for transmission, and there are several possibilities. For example, the transmission of raw image data with a resolution of 640 x 480 pixels and pixel quantization of 8 bits per pixel requires transmission of 307,200 bytes through the network.
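The example translates directly into a size and transmission-time estimate; the 10 Mbit/s figure below is an assumed network bandwidth, not one from the text:

    def transmission_time(width, height, bits_per_pixel, bits_per_second):
        """Time in seconds to transmit a raw (uncompressed) image."""
        return width * height * bits_per_pixel / bits_per_second

    size_bytes = 640 * 480 * 8 // 8                  # 307,200 bytes, as above
    t = transmission_time(640, 480, 8, 10_000_000)   # assumed 10 Mbit/s network
    print(size_bytes, round(t, 3))                   # 307200 0.246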
4.3 Comments
We have described in this section some characteristics of images and graphical objects. The quality of the presented media depends on the quality of the hardware, e.g., frame grabbers, displays and other input/output devices. The development of input and output devices continues at a rapid pace. A few examples should give a flavor of this development:
New multimedia devices
New scanners of photographic objects already provide high-quality digital images and become part of multimedia systems. The introduction of a new multimedia device (e.g., a scanner) implies a new multimedia format, because the new medium (e.g., photographic images) can be combined with other images and other media. An example of such a new multimedia format is the Photo Image Pac File Format introduced by Kodak. This format is a new disc format that combines high-resolution images with text, graphics and sound. Hence, it enables users to design interactive Photo-CD-based presentations [Ann94b].