0 ratings0% found this document useful (0 votes) 67 views264 pagesSystem Software
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content,
claim it here.
Available Formats
Download as PDF or read online on Scribd
Contents
Chapter 1 Background 1
1d
12
13
14
15
Introduction 1
System Software and Machine Architecture 3
The Simplified Instructional Computer (SIC) 4
1.3.1 SIC Machine Architecture 5
132 SIC/XE Machine Architecture 7
13.3. SIC Programming Examples 12
Traditional (CISC) Machines 21
14.1 VAX Architecture 21
142 Pentium Pro Architecture 25
RISC Machines 29
1.5.1 UltraSPARC Architecture 29
15.2 PowerPC Architecture 33
153 Cray T3E Architecture 37
Exercises 40
Chapter 2 Assemblers 43
21
22
23
24
25
Basic Assembler Functions 44
21.1 ASimpleSIC Assembler 46
2.1.2 Assembler Algorithm and Data Structures 50
Machine-Dependent Assembler Features 52
2.2.1 Instruction Formats and Addressing Modes 57
22.2 Program Relocation 61
Machine-Independent Assembler Features 66
23.1 Literals 66
232 Symbol-Defining Statements 71
23.3 Expressions 75
234 Program Blocks 78
235 Control Sections and Program Linking 83
Assembler Design Options 92
24.1 One-Pass Assemblers 92
24.2 Multi-Pass Assemblers 98
Implementation Examples 102
25.1 MASM Assembler 103
252 SPARC Assembler 105
ix25.3 AIX Assembler 108
Exercises 111
Chapter 3 Loaders and Linkers 123
3.1
3.2
33
34
35
Basic Loader Functions 124
3.1.1 Design of an Absolute Loader 124
3.1.2 ASimple Bootstrap Loader 127
Machine-Dependent Loader Features 129
3.2.1 Relocation 130
3.2.2 Program Linking 134
3.23 Algorithm and Data Structures for a Linking Loader 141
Machine-Independent Loader Features 147
3.3.1 Automatic Library Search 147
33.2 Loader Options 149
Loader Design Options 151
34.1 Linkage Editors 152
34.2 Dynamic Linking 155
34.3 Bootstrap Loaders 158
Implementation Examples 159
35.1 MS-DOS Linker 160
35.2 SunOS Linkers 162
35.3 CrayMPPLinker 164
Exercises 166
Chapter 4 Macro Processors 175
41
42
43
44
Basic Macro Processor Functions 176
4.1.1 Macro Definition and Expansion 176
4.1.2 Macro Processor Algorithm and Data Structures 181
Machine-Independent Macro Processor Features 186
42.1 Concatenation of Macro Parameters 186
422 Generation of Unique Labels 187
423 Conditional Macro Expansion 189
4.24 Keyword Macro Parameters 196
Macro Processor Design Options 197
43.1 Recursive Macro Expansion 199
43.2 General-Purpose Macro Processors 202
43.3 Macro Processing within Language Translators 204
Implementation Examples 206
4.4.1 MASM Macro Processor 206
44.2 ANSIC Macro Language 20944.3 The ELENA Macro Processor 213
Exercises 218
Chapter 5 Compilers 225
5.1
52
5.3
54
55
Basic Compiler Functions 225
5.1.1 Grammars 227
5.1.2 Lexical Analysis 231
5.1.3 Syntactic Analysis 242
5.14 Code Generation 260
Machine-Dependent Compiler Features 272
5.21 Intermediate Form of the Program 272
5.2.2 Machine-Dependent Code Optimization 274
Machine-Independent Compiler Features 278
53.1 Structured Variables 279
53.2 Machine-Independent Code Optimization 284
53.3. Storage Allocation 289
5.34 — Block-Structured Languages 294
Compiler Design Options 299
5.4.1 Division into Passes 299
5.42 Interpreters 301
5.43 P-CodeCompilers 302
544 Compiler-Compilers 304
Implementation Examples 305
55.1 SunOSC Compiler 305
55.2 GNUNYUAda Translator 308
553 Cray MPP FORTRAN Compiler 310
554 Java Compiler and Environment 313
555 The YACC Compiler-Compiler 315
Exercises 318
Chapter 6 Operating Systems 325
61
62
Basic Operating System Functions 325
6.1.1 Types of Operating Systems 327
6.12 User Interface 328
6.1.3 Run-Time Environment 329
Machine-Dependent Operating System Features 331
6.21 Interrupt Processing 332
622 Process Scheduling 339
623 1/OSupervision 344
6.24 Management of Real Memory 354
xiChapter 1
Background
This chapter contains a variety of information that serves as background for
the material presented later. Section 1.1 gives a brief introduction to system
software and an overview of the structure of this book. Section 1.2 begins a
discussion of the relationships between system software and machine architec
ture, which continues throughout the text. Section 1.3 describes the Simplified
Instructional Computer (SIC) that is used to present fundamental software
concepts. Sections 1.4 and 1.5 provide an introduction to the architecture of
several computers that are used as examples throughout the text. Further in-
formation on most of the machine architecture topics discussed can be found
in Tabak (1995) and Patterson and Hennessy (1996).
Most of the material in this chapter is presented at a summary level, with
many details omitted. The level of detail given here is sufficient background
for the remainder of the text. You should not attempt to memorize the material
in this chapter, or be overly concerned with minor points. Instead, it is recom-
mended that you read through this material, and then use it for reference as
needed in later chapters. References are provided throughout the chapter for
readers who want further information.
1.1 INTRODUCTION
This text is an introduction to the design and implementation of system soft-
ware. System software consists of a variety of programs that support the opera-
tion of a computer. This software makes it possible for the user to focus on an
application or other problem to be solved, without needing to know the de-
tails of how the machine works internally.
When you took your first programming course, you were already using
many different types of system software. You probably wrote programs in a
high-level language like C++ or Pascal, using a text editor to create and modify
the program. You translated these programs into machine language using a
compiler. The resulting machine language program was loaded into memory
and prepared for execution by a londer or linker. You may have used a debugger
to help detect errors in the program.Chapter 1 Background
In later courses, you probably wrote programs in assembler language. You
may have used macro instructions in these programs to read and write data,
or to perform other higher-level functions. You used an assembler, which prob-
ably included a macro processor, to translate these programs into machine lan-
guage. The translated programs were prepared for execution by the loader or
linker, and may have been tested using the debugger.
You controlled all of these processes by interacting with the operating sys-
tem of the computer. If you were using a system like UNIX or DOS, you proba-
bly typed commands at a keyboard. If you were using a system like MacOS or
Windows, you probably specified commands with menus and a point-and-
click interface. In either case, the operating system took care of all the ma-
chine-level details for you. Your computer may have been connected to a
network, or may have been shared by other users. It may have had many dif-
ferent kinds of storage devices, and several ways of performing input and out-
put. However, you did not need to be concerned with these issues. You could
concentrate on what you wanted to do, without worrying about how it was
accomplished.
As you read this book, you will learn about several important types of sys-
tem software. You will come to understand the processes that were going on
“behind the scenes” as you used the computer in previous courses. By under-
standing the system software, you will gain a deeper understanding of how
computers actually work.
The major topics covered in this book are assemblers, loaders and linkers,
macro processors, compilers, and operating systems; each of Chapters 2
through 6 is devoted to one of these subjects. We also consider implementa-
tions of these types of software on several real machines. One central theme of
the book is the relationship between system software and machine architec-
ture: the design of an assembler, operating system, etc,, is influenced by the ar-
chitecture of the machine on which it is to run. Some of these influences are
discussed in the next section; many other examples appear throughout the
text.
Chapter 7 contains a survey of some other important types of system soft-
ware: database management systems, text editors, and interactive debugging
systems. Chapter 8 contains an introduction to software engineering concepts
and techniques, focusing on the use of such methods in writing system soft-
ware. This chapter can be read at any time after the introduction to assemblers
in Section 2.1
The depth of treatment in this text varies considerably from one topic to
another. The chapters on assemblers, loaders and linkers, and macro proces-
sors contain enough implementation details to prepare the reader to write
these types of software for a real computer. Compilers and operating systems,
on the other hand, are very large topics; each has, by itself, been the subject of1.2. System Software and Machine Architecture
many complete books and courses. It is obviously impossible to provide a full
coverage of these subjects in a single chapter of any reasonable size. Instead,
we provide an introduction to the most important concepts and issues related
to compilers and operating systems, stressing the relationships between soft-
ware design and machine architecture. Other subtopics are discussed as space
permits, with references provided for readers who wish to explore these areas
further. Our goal is to provide a good overview of these subjects that can also
serve as background for students who will later take more advanced software
courses. This same approach is also applied to the other topics surveyed in
Chapter 7.
1.2 SYSTEM SOFTWARE AND
MACHINE ARCHITECTURE
One characteristic in which most system software differs from application soft-
ware is machine dependency. An application program is primarily concerned
with the solution of some problem, using the computer as a tool. The focus is
on the application, not on the computing system. System programs, on the
other hand, are intended to support the operation and use of the computer i
self, rather than any particular application. For this reason, they are usually re-
lated to the architecture of the machine on which they are to run. For example,
assemblers translate mnemonic instructions into machine code; the instruction
formats, addressing modes, etc., are of direct concern in assembler design.
Similarly, compilers must generate machine language code, taking into ac-
count such hardware characteristics as the number and type of registers and
the machine instructions available. Operating systems are directly concerned
with the management of nearly all of the resources of a computing system.
Many other examples of such machine dependencies may be found through-
out this book.
On the other hand, there are some aspects of system software that do not
directly depend upon the type of computing system being supported. For
example, the general design and logic of an assembler is basically the same on
most computers. Some of the code optimization techniques used by compilers
are independent of the target machine (although there are also machine-
dependent optimizations). Likewise, the process of linking together indepen-
dently assembled subprograms does not usually depend on the computer
being used. We will also see many examples of such machine-independent
features in the chapters that follow.
Because most system software is machine-dependent, we must include real
machines and real pieces of software in our study. However, most real com-
puters have certain characteristics that are unusual or even unique. It can beChapter 1 Background
difficult to distinguish between those features of the software that are truly
fundamental and those that depend solely on the idiosyncrasies of a particular
machine. To avoid this problem, we present the fundamental functions of each
piece of software through discussion of a Simplified Instructional Computer
(SIC). SIC is a hypothetical computer that has been carefully designed to in-
clude the hardware features most often found on real machines, while avoid-
ing unusual or irrelevant complexities. In this way, the central concepts of a
piece of system software can be clearly separated from the implementation de-
tails associated with a particular machine. This approach provides the reader
with a starting point from which to begin the design of system software for a
new or unfamiliar computer.
Each major chapter in this text first introduces the basic functions of
the type of system software being discussed. We then consider machine-
dependent and machine-independent extensions to these functions, and exam-
ples of implementations on actual machines. Specifically, the major chapters
are divided into the following sections:
1. Features that are fundamental, and that should be found in any
example of this type of software.
2. Features whose presence and character are closely related to the
machine architecture.
3. Other features that are commonly found in implementations of this
type of software, and that are relatively machine-independent.
4. Major design options for structuring a particular piece of software—
for example, single-pass versus multi-pass processing,
5. Examples of implementations on actual machines, stressing unusual
software features and those that are related to machine characteristics.
This chapter contains brief descriptions of SIC and of the real machines
that are used as examples. You are encouraged to read these descriptions now,
and refer to them as necessary when studying the examples in each chapter.
1.3 THE SIMPLIFIED INSTRUCTIONAL
COMPUTER (SIC)
In this section we describe the architecture of our Simplified Instructional
Computer (SIC). This machine has been designed to illustrate the most com-
monly encountered hardware features and concepts, while avoiding most of
the idiosyncrasies that are often found in real machines.1.3. The Simplified Instructional Computer (SIC)
Like many other products, SIC comes in two versions: the standard model
and an XE version (XE stands for “extra equipment,” or perhaps “extra expen-
sive”). The two versions have been designed to be upward compatible—that is,
an object program for the standard SIC machine will also execute properly on
a SIC/XE system. (Such upward compatibility is often found on real comput-
ers that are closely related to one another.) Section 1.3.1 summarizes the stan-
dard features of SIC. Section 1.3.2 describes the additional features that are
included in SIC/XE. Section 1.3.3 presents simple examples of SIC and
SIC/XE programming. These examples are intended to help you become more
familiar with the SIC and SIC/XE instruction sets and assembler language.
Practice exercises in SIC and SIC/XE programming can be found at the end of
this chapter.
1.3.1 SIC Machine Architecture
Memory
Memory consists of 8-bit bytes; any 3 consecutive bytes form a word (24 bits).
All addresses on SIC are byte addresses; words are addressed by the location
of their lowest numbered byte. There are a total of 32,768 (215) bytes in the
computer memory.
Registers
There are five registers, all of which have special uses. Each register is 24 bits
in length. The following table indicates the numbers, mnemonics, and uses of
these registers. (The numbering scheme has been chosen for compatibility
with the XE version of SIC.)
Mnemonic __Number _ Special use
A 0 Accumulator; used for arithmetic operations
x 1 Index register; used for addressing
L 2 Linkage register; the Jump to Subroutine (JSUB)
instruction stores the return address
in this register
PC 8 Program counter; contains the address of the
next instruction to be fetched for execution
sw 9 Status word; contains a variety of
information, including a Condition Code (CC)Chapter 1 Background
Data Formats
Integers are stored as 24-bit binary numbers; 2's complement representation is
used for negative values. Characters are stored using their 8-bit ASCII codes
(see Appendix B). There is no floating-point hardware on the standard version
of SIC
Instruction Formats
All machine instructions on the standard version of SIC have the following
24-bit format:
8 1 15
| opcode x address
The flag bit x is used to indicate indexed-addressing mode.
Addressing Modes
There are two addressing modes available, indicated by the setting of the x bit
in the instruction. The following table describes how the target address is calcu-
lated from the address given in the instruction. Parentheses are used to indi-
cate the contents of a register or a memory location. For example, (X)
represents the contents of register X.
Mode Indication _Target address calculation
Direct x=0 TA = address
Indexed — x=1 TA = address + (X)
Instruction Set
SIC provides a basic set of instructions that are sufficient for most simple
tasks. These include instructions that load and store registers (LDA, LDX, STA,
STX, etc.), as well as integer arithmetic operations (ADD, SUB, MUL, DIV). All
arithmetic operations involve register A and a word in memory, with the result
being left in the register. There is an instruction (COMP) that compares the
value in register A with a word in memory; this instruction sets a condition code
CC to indicate the result (<, =, or >). Conditional jump instructions (LT, JEQ,
JGT) can test the setting of CC, and jump accordingly. Two instructions are1.3. The Simplified Instructional Computer (SIC)
provided for subroutine linkage. JSUB jumps to the subroutine, placing the
return address in register L; RSUB returns by jumping to the address con-
tained in register L. *
‘Appendix A gives a complete list of all SIC (and SIC/XE) instructions, with
their operation codes and a specification of the function performed by each.
Input and Output
On the standard version of SIC, input and output are performed by transfer-
ring 1 byte at a time to or from the rightmost 8 bits of register A. Each device is
assigned a unique 8-bit code. There are three I/O instructions, each of which
specifies the device code as an operand.
The Test Device (TD) instruction tests whether the addressed device is
ready to send or receive a byte of data. The condition code is set to indicate the
result of this test. (A setting of < means the device is ready to send or receive,
and = means the device is not ready.) A program needing to transfer data must
wait until the device is ready, then execute a Read Data (RD) or Write Data
(WD). This sequence must be repeated for each byte of data to be read or writ-
ten. The program shown in Fig. 2.1 (Chapter 2) illustrates this technique for
performing I/O.
1.3.2 SIC/XE Machine Architecture
Memory
The memory structure for SIC/XE is the same as that previously described for
SIC. However, the maximum memory available on a SIC/XE system is
1 megabyte (220 bytes). This increase leads to a change in instruction formats
and addressing modes.
Registers
The following additional registers are provided by SIC/XE:
Mnemonic Number _ Special u:
B 3 Base register; used for addressing
s 4 General working register—no special use
T 5 General working register—no special use
F 6 Floating-point accumulator (48 bits)Chapter 1 Background
Data Formats
SIC/XE provides the same data formats as the standard version. In addition,
there is a 48-bit floating-point data type with the following format:
1 u 36
: ‘exponent fraction
The fraction is interpreted as a value between 0 and 1; that is, the assumed bi-
nary point is immediately before the high-order bit. For normalized floating-
point numbers, the high-order bit of the fraction must be 1. The exponent is
interpreted as an unsigned binary number between 0 and 2047. If the exponent
has value ¢ and the fraction has value f, the absolute value of the number rep-
resented is
f+ 2fe-1024),
The sign of the floating-point number is indicated by the value of s (0 =
positive, 1 = negative). A value of zero is represented by setting all bits
(including sign, exponent, and fraction) to 0.
Instruction Formats
The larger memory available on SIC/XE means that an address will (in gen-
eral) no longer fit into a 15-bit field; thus the instruction format used on the
standard version of SIC is no longer suitable. There are two possible options—
either use some form of relative addressing, or extend the address field to 20
bits. Both of these options are included in SIC/XE (Formats 3 and 4 in the fol-
lowing description). In addition, SIC/XE provides some instructions that do
not reference memory at all. Formats 1 and 2 in the following description are
used for such instructions.
The new set of instruction formats is as follows. The settings of the flag bits
in Formats 3 and 4 are discussed under Addressing Modes. Bit ¢ is used to dis-
tinguish between Formats 3 and 4 (¢ = 0 means Format 3, ¢ = 1 means Format
4), Appendix A indicates the format to be used with each machine instruction.
Format 1 (1 byte):
8
op13 The Simplified Instructional Computer (SIC)
Format 2 (2 bytes):
8 4 4
op n [2
Format 3 (3 bytes):
6 pratt 12
op
Format 4 (4 bytes):
6 satay 20
PP pple address
Addressing Modes
‘Two new relative addressing modes are available for use with instructions
assembled using Format 3. These are described in the following table:
Mode h jon Target address calculation
Base relative b=1,p=0 TA=(B)+disp (0< disp < 4095)
Program-counter_ b=0,p=1 TA=(PC)+disp (-2048 < disp < 2047)
relative
For base relative addressing, the displacement field disp in a Format 3 instruc-
tion is interpreted as a 12-bit unsigned integer. For program-counter relative ad-
dressing, this field is interpreted as a 12-bit signed integer, with negative
values represented in 2's complement notation
If bits b and p are both set to 0, the disp field from the Format 3 instruction
is taken to be the target address. For a Format 4 instruction, bits b and p are
normally set to 0, and the target address is taken from the address field of the
instruction. We will call this direct addressing, to distinguish it from the rela-
tive addressing modes described above.
Any of these addressing modes can also be combined with indexed ad-
dressing—if bit x is set to 1, the term (X) is added in the target address calcula-
tion. Notice that the standard version of the SIC machine uses only direct
addressing (with or without indexing).10
Chupter 1 Background
Bits i and 1 in Formats 3 and 4 are used to specify how the target address is
used. If bit i= 1 and 1 = 0, the target address itself is used as the operand
value; no memory reference is performed. This is called immediate addressing,
If bit ( = 0 and n = 1, the word at the location given by the target address is
fetched; the value contained in this word is then taken as the address of the
operand value. This is called indirect addressing. If bits i and 1 are both 0 or
both 1, the target address is taken as the location of the operand; we will refer
to this as simple addressing. Indexing cannot be used with immediate or indi-
rect addressing modes.
Many authors use the term effective address to denote what we have called
the target address for an instruction. However, there is disagreement concern-
ing the meaning of effective address when referring to an instruction that uses
indirect addressing. To avoid confusion, we use the term target address
throughout this book.
SIC/XE instructions that specify neither immediate nor indirect addressing
are assembled with bits 1 and i both set to 1. Assemblers for the standard ver-
sion of SIC will, however, set the bits in both of these positions to 0. (This is be-
cause the 8-bit binary codes for all of the SIC instructions end in 00.) All
SIC/XE machines have a special hardware feature designed to provide the up-
ward compatibility mentioned earlier. If bits » and i are both 0, then bits b, p,
and ¢ are considered to be part of the address field of the instruction (rather
than flags indicating addressing modes). This makes Instruction Format 3
identical to the format used on the standard version of SIC, providing the de-
sired compatibility.
Figure 1.1 gives examples of the different addressing modes available on
SIC/XE. Figure 1.1(a) shows the contents of registers B, PC, and X, and of se-
lected memory locations. (All values are given in hexadecimal.) Figure 1.1(b)
gives the machine code for a series of LDA instructions. The target address
generated by each instruction, and the value that is loaded into register A, are
also shown. You should carefully examine these examples, being sure you un-
derstand the different addressing modes illustrated.
For ease of reference, all of the SIC/XE instruction formats and addressing,
modes are summarized in Appendix A.
Instruction Set
SIC/XE provides all of the instructions that are available on the standard
version. In addition, there are instructions to load and store the new registers
(LDB, STB, etc.) and to perform floating-point arithmetic operations (ADDF,1.3 The Simplified Instructional Computer (SIC) n
(B) = 006000
. . (PC) = 003000
: : (x) = 000080
3030 | 003600
3600 | 103000
(006303
303 | 003030
@)
Machine instruction Value
loaded
Hex Binary into
register
op on i x bo pe displaddress A
032600 000000 «1:«1 «0 ~©0 1 0 0110 0000 0000 3600 103000
o3c300 000000 «1 «1110 0 _ 0022 0000 0000 6390 00303
022030 000000 ««:«8 = 0 1 0 0000 0011 0000 3030 103000
010030 900000 «9 «S10 «8 «OO 0000 0011 0000 30 000030
003600 000000. :« SO -««O-—-1_—«-1_—(0210 0000 0000 3600 103000
0310¢303 000000 ««1«1 «8 «© «010000 1100 0011 0000 0011 c303.«S(003030
)
Figure 1.1 Examples of SIC/XE instructions and addressing modes.2
Chapter 1 Background
SUBF, MULE, DIVF). There are also instructions that take their operands from
registers. Besides the RMO (register move) instruction, these include
register-to-register arithmetic operations (ADDR, SUBR, MULR, DIVR). A spe-
cial supervisor call instruction (SVC) is provided. Executing this instruction
generates an interrupt that can be used for communication with the operating
system. (Supervisor calls and interrupts are discussed in Chapter 6.)
There are also several other new instructions. Appendix A gives a complete
list of all SIC/XE instructions, with their operation codes and a specification of
the function performed by each.
Input and Output
The I/O instructions we discussed for SIC are also available on SIC/XE. In ad-
dition, there are I/O channels that can be used to perform input and output
while the CPU is executing other instructions. This allows overlap of comput-
ing and I/O, resulting in more efficient system operation. The instructions
SIO, TIO, and HIO are used to start, test, and halt the operation of I/O chan-
nels. (These concepts are discussed in detail in Chapter 6.)
1.3.3 SIC Programming Examples
This section presents simple examples of SIC and SIC/XE assembler language
programming. These examples are intended to help you become more familiar
with the SIC and SIC/XE instruction sets and assembler language. It is as-
sumed that the reader is already familiar with the assembler language of at
least one machine and with the basic ideas involved in assembly-level pro-
gramming.
The primary subject of this book is systems programming, not assembler
language programming. The following chapters contain discussions of various
types of system software, and in some cases SIC programs are used to illus-
trate the points being made. This section contains material that may help you
to understand these examples more easily. However, it does not contain any
new material on system software or systems programming. Thus, this section
can be skipped without any loss of continuity.
Figure 1.2 contains examples of data movement operations for SIC and
SIC/XE. There are no memory-to-memory move instructions; thus, all data
movement must be done using registers. Figure 1.2(a) shows two examples of
data movement. In the first, a 3-byte word is moved by loading it into register
Aand then storing the register at the desired destination. Exactly the same
thing could be accomplished using register X (and the instructions LDX, STX)
or register L (LDL, STL). In the second example, a single byte of data is moved
using the instructions LDCH (Load Character) and STCH (Store Character).1a The Simplified Instructional Computer (SIC)
These instructions operate by loading or storing the rightmost 8-bit byte of
register A; the other bits in register A are not affected.
Figure 1.2(a) also shows four different ways of defining storage for data
items in the SIC assembler language. (These assembler directives are discussed
in more detail in Section 2.1.) The statement WORD reserves one word of stor-
age, which is initialized to a value defined in the operand field of the state-
ment. Thus the WORD statement in Fig. 1.2(a) defines a data word labeled
FIVE whose value is initialized to 5. The statement RESW reserves one or
more words of storage for use by the program. For example, the RESW state-
ment in Fig. 1.2(a) defines one word of storage labeled ALPHA, which will be
used to hold a value generated by the program.
The statements BYTE and RESB perform similar storage-definition func-
tions for data items that are characters instead of words. Thus in Fig. 1.2(a)
CHARZ is a 1-byte data item whose value is initialized to the character “Z’
and C1 is a 1-byte variable with no initial value.
LDA FIVE LOAD CONSTANT 5 INTO REGISTER A
STA ALPHA STORE IN ALPHA
LOCH CHARZ LOAD CHARACTER ‘Z’ INTO REGISTER A
sTH cl STORE IN CHARACTER VARIABLE C1
ALPHA RESW
FIVE WORD
ONE-WORD VARIABLE
ONE-WORD CONSTANT
ONE-BYTE CONSTANT
i
4 .
cl RESB ONE-BYTE VARIABLE
@)
LDA O45 LOAD VALUE 5 INTO REGISTER A
STA ALPHA STORE IN ALPHA
LDA #90 LOAD ASCII CODE FOR 'Z' INTO REG A
stcH cl STORE IN CHARACTER VARIABLE Cl
ALPHA RESW 1 ONE-WORD VARIABLE
ct RESBO 1 ONE-BYTE VARIABLE
(b)
Figure 1.2 Sample data movement operations for (a) SIC and
(b) SIC/XE.
13vy
Chapter 1 Background
The instructions shown in Fig. 1.2(a) would also work on SIC/XE; how-
ever, they would not take advantage of the more advanced hardware features
available. Figure 1.2(b) shows the same two data-movement operations as
they might be written for SIC/XE. In this example, the value 5 is loaded into
register A using immediate addressing. The operand field for this instruction
contains the flag # (which specifies immediate addressing) and the data value
to be loaded. Similarly, the character “Z” is placed into register A by using im-
mediate addressing to load the value 90, which is the decimal value of the
ASCII code that is used internally to represent the character “2”.
Figure 1.3(a) shows examples of arithmetic instructions for SIC. All arith-
metic operations are performed using register A, with the result being left in
register A. Thus this sequence of instructions stores the value (ALPHA + INCR
~1) in BETA and the value (GAMMA + INCR ~ 1) in DELTA.
Figure 1.3(b) illustrates how the same calculations could be performed on
SIC/XE. The value of INCR is loaded into register S initially, and the register-
to-register instruction ADDR is used to add this value to register A when it is
needed. This avoids having to fetch INCR from memory each time it is used in
a calculation, which may make the program more efficient. Immediate ad-
dressing is used for the constant 1 in the subtraction operations.
Looping and indexing operations are illustrated in Fig. 1.4. Figure 1.4(a)
shows a loop that copies one 11-byte character string to another. The index
register (register X) is initialized to zero before the loop begins. Thus, during
the first execution of the loop, the target address for the LDCH instruction will
be the address of the first byte of STR1. Similarly, the STCH instruction will
store the character being copied into the first byte of STR2. The next instruc-
tion, TIX, performs two functions. First it adds 1 to the value in register X, and
then it compares the new value of register X to the value of the operand (in
this case, the constant value 11). The condition code is set to indicate the result
of this comparison. The JLT instruction jumps if the condition code is set to
“less than.” Thus, the JLT causes a jump back to the beginning of the loop if
the new value in register X is less than 11.
During the second execution of the loop, register X will contain the value
1. Thus, the target address for the LDCH instruction will be the second byte of
STRI, and the target address for the STCH instruction will be the second byte
of STR2. The TIX instruction will again add 1 to the value in register X, and the
loop will continue in this way until all 11 bytes have been copied from STRI to
STR2. Notice that after the TIX instruction is executed, the value in register X
is equal to the number of bytes that have already been copied
Figure 1.4(b) shows the same loop as it might be written for SIC/XE. The
main difference is that the instruction TIXR is used in place of TIX. TIXR
works exactly like TIX, except that the value used for comparison is taken
from another register (in this case, register T), not from memory. This makes1.8 The Simplified Instructional Computer (SIC) 15
the loop more efficient, because the value does not have to be fetched from
memory each time the loop is executed. Immediate addressing is used to ini-
tialize register T to thg value 11 and to initialize register X to 0.
BETA
DELTA
BETA
DELTA
‘LDA
ADD
SUB
STA
‘LDA
ADD
SUB
STA
ALPHA,
INCR
ONE
SA
ce
BETA
SA
#1
DELTA
LOAD ALPHA INTO REGISTER A
ADD THE VALUE OF INCR
SUBTRACT 1
STORE IN BETA
LOAD GAMMA INTO REGISTER A
ADD THE VALUE OF INCR
SUBTRACT 1
STORE IN DELTA
ONE-WORD CONSTANT
ONE-WORD VARIABLES
@
LOAD VALUE OF INCR INTO REGISTER S
LOAD ALPHA INTO REGISTER A
ADD THE VALUE OF INCR
SUBTRACT 1
STORE IN BETA
LOAD GAMMA INTO REGISTER A
ADD THE VALUE OF INCR
SUBTRACT 1
STORE IN DELTA
ONE WORD VARIABLES
©)
Figure 1.3 Sample arithmetic operations for (a) SIC and (b) SIC/XE.16
Chapter 1 Background
LOX ZERO INITIALIZE INDEX REGISTER TO 0
MOVECH LDCH = STRI,X LOAD CHARACTER FROM STR1 INTO REG A
SICH STR2,X STORE CHARACTER INTO STR2
TIX ELEVEN ADD 1 TO INDEX, COMPARE RESULT TO 11
JLT MOVECH LOOP IF INDEX IS LESS THAN 11
STRL BYTE C’TEST STRING’ 11-BYTE STRING CONSTANT
STR2 RESB lL 11-BYTE VARIABLE
: ONE-WORD CONSTANTS
ZERO WORD 0
@)
wor fll INITIALIZE REGISTER T TO 11
Lx #0 INITIALIZE INDEX REGISTER TO 0
MOVECH LDCH —STR1,x LOAD CHARACTER FROM STR1 INTO REG A
SICH STR2,X STORE CHARACTER INTO STR2
TDR oT ADD 1 TO INDEX, COMPARE RESULT TO 11
JLT = MOVECH Loop IF INDEX IS LESS THAN 11
STRL BYTE C/TEST STRING’ 11-BYTE STRING CONSTANT
‘STR2 RESB oll 11-BYTE VARIABLE
)
Figure 1.4 Sample looping and indexing operations for (a) SIC and
(b) SICIXE.
Figure 1.5 contains another example of looping and indexing operations.
The variables ALPHA, BETA, and GAMMA are arrays of 100 words each. In
this case, the task of the loop is to add together the corresponding elements of
ALPHA and BETA, storing the results in the elements of GAMMA. The gen-
eral principles of looping and indexing are the same as previously discussed.
However, the value in the index register must be incremented by 3 for each it-
eration of this loop, because each iteration processes a 3-byte (i.e., one-word)
element of the arrays. The TIX instruction always adds 1 to register X, so it is
not suitable for this program fragment. Instead, we use arithmetic and com-
parison instructions to handle the index value.ADDLP = LDK
ALPHA = RESW
BETA RESW
ZERO ‘WORD
300 ‘WORD
ADDLP
BETA RESW
ZERO
INDEX
ALPHA, X
BETA, X
100
100
100
#3
#300
#0
ALPHA, X
BETA, X
GAMA, X
8.X
XT
100
100
100
1.3 The Simplified Instructional Computer (SIC) 7
INITIALIZE INDEX VALUE TO 0
LOAD INDEX VALUE INTO REGISTER X
LOAD WORD FROM ALPHA INTO REGISTER A
ADD WORD FROM BETA
STORE THE RESULT IN A WORD IN GAMMA
ADD 3 TO INDEX VALUE
COMPARE NEW INDEX VALUE TO 300
LOOP IF INDEX IS LESS THAN 300
ONE-WORD VARIABLE FOR INDEX VALUE,
ARRAY VARIABLES--100 WORDS EACH
ONE-WORD CONSTANTS
(@)
INITIALIZE REGISTER S TO 3
INITIALIZE REGISTER T TO 300
INITIALIZE INDEX REGISTER TO 0
LOAD WORD FROM ALPHA INTO REGISTER A
ADD WORD FROM BETA
STORE THE RESULT IN A WORD IN GAMMA
ADD 3 TO INDEX VALUE
COMPARE NEW INDEX VALUE TO 300
LOOP IF INDEX VALUE IS LESS THAN 300
ARRAY VARIABLES--100 WORDS EACH
(eo)
Figure 1.5 Sample indexing and looping operations for (a) SIC and
(b) SICIXE.18
Chapter 1 Background
In Fig. 1.5(a), we define a variable INDEX that holds the value to be used
for indexing for each iteration of the loop. Thus, INDEX should be 0 for the
first iteration, 3 for the second, and so on. INDEX is initialized to 0 before the
start of the loop. The first instruction in the body of the loop loads the current
value of INDEX into register X so that it can be used for target address calcula-
tion. The next three instructions in the loop load a word from ALPHA, add the
corresponding word from BETA, and store the result in the corresponding
word of GAMMA. The value of INDEX is then loaded into register A, incre-
mented by 3, and stored back into INDEX. After being stored, the new value of
INDEX is still present in register A. This value is then compared to 300 (the
length of the arrays in bytes) to determine whether or not to terminate the
loop. If the value of INDEX is less than 300, then all bytes of the arrays have
not yet been processed. In that case, the JLT instruction causes a jump back to
the beginning of the loop, where the new value of INDEX is loaded into regis-
ter X,
This particular loop is cumbersome on SIC, because register A must be
used for adding the array elements together and also for incrementing the in-
dex value. The loop can be written much more efficiently for SIC/XE, as
shown in Fig. 1.5(b). In this example, the index value is kept permanently in
register X. The amount by which to increment the index value (3) is kept in
register S, and the register-to-register ADDR instruction is used to add this in-
crement to register X. Similarly, the value 300 is kept in register T, and the in-
struction COMPR is used to compare registers X and T in order to decide
when to terminate the loop.
Figure 1.6 shows a simple example of input and output on SIC; the same
instructions would also work on SIC/XE. (The more advanced input and out-
put facilities available on SIC/XE, such as 1/O channels and interrupts, are
discussed in Chapter 6.) This program fragment reads 1 byte of data from de-
vice F1 and copies it to device 05. The actual input of data is performed using
the RD (Read Data) instruction. The operand for the RD is a byte in memory
that contains the hexadecimal code for the input device (in this case, F1)
Executing the RD instruction transfers 1 byte of data from this device into the
rightmost byte of register A. If the input device is character-oriented (for ex-
ample, a keyboard), the value placed in register A is the ASCII code for the
character that was read.
Before the RD can be executed, however, the input device must be ready to
transmit the data. For example, if the input device is a keyboard, the operator
must have typed a character. The program checks for this by using the TD
(Test Device) instruction. When the TD is executed, the status of the addressed
device is tested and the condition code is set to indicate the result of this test.
If the device is ready to transmit data, the condition code is set to “less than”;
if the device is not ready, the condition code is set to “equal.” As Fig. 1.61.¥ The Simplified Instructional Computer (SIC)
INLOOP. 1D INDEV TEST INPUT DEVICE
JQ INLOOP LOOP UNTIL DEVICE IS READY
RD INDEV READ ONE BYTE INTO REGISTER A
STCH DATA STORE BYTE THAT WAS READ
courte = 1D ourDEV TEST OUTPUT DEVICE
JEQ (OUTLP LOOP UNTIL DEVICE IS READY
LOCH DATA LOAD DATA BYTE INTO REGISTER A
wD ouTDEV WRITE ONE BYTE TO OUTPUT DEVICE
INDEV BYTE X’F1l’ INPUT DEVICE NUMBER
OUTDEV BYTE = -x’05' OUTPUT DEVICE NUMBER
DATA RESB OL ONE-BYTE VARTABLE
Figure 1.6 Sample input and output operations for SIC.
illustrates, the program must execute the TD instruction and then check the
condition code by using a conditional jump. If the condition code is “equal”
(device not ready), the program jumps back to the TD instruction. This two-
instruction loop will continue until the device becomes ready; then the RD will
be executed.
Output is performed in the same way. First the program uses TD to check
whether the output device is ready to receive a byte of data. Then the byte to
be written is loaded into the rightmost byte of register A, and the WD (Write
Data) instruction is used to transmit it to the device.
Figure 1.7 shows how these instructions can be used to read a 100-byte
record from an input device into memory. The read operation in this example
is placed in a subroutine. This subroutine is called from the main program by
using the JSUB (Jump to Subroutine) instruction. At the end of the subroutine
there is an RSUB (Return from Subroutine) instruction, which returns control
to the instruction that follows the JSUB.
The READ subroutine itself consists of a loop. Each execution of this loop
reads 1 byte of data from the input device, using the same techniques illus-
trated in Fig. 1.6. The bytes of data that are read are stored in a 100-byte buffer
area labeled RECORD. The indexing and looping techniques that are used in
storing characters in this buffer are essentially the same as those illustrated in
Fig. 1.4(a).
Figure 1.7(b) shows the same READ subroutine as it might be written for
SIC/XE. The main differences from Fig. 1.7(a) are the use of immediate
addressing and the TIXR instruction, as was illustrated in Fig. 1.4(a).20 Chapter 1
RLOOP
INDEV
RECORD
100
RLOOP
RECORD
Background
JSUB READ
MDX = ZERO
™ INDEV
JEQ —-RLOOP
RD INDEV
STCH RECORD, X
TIX 100
JLT ——-RLOOP
RSUB
BYTE X’F’
RESB 100
WORD 0
WoRD 100
JSUB READ
LX #0
wor #100
.D INDEV
JEQ -RLOOP
RD INDEV
STCH RECORD, X
TR OT
JLT RLOOP.
RSUB
BYTE X'Fl’
RESB 100
CALL READ SUBROUTINE
SUBROUTINE TO READ 100-BYTE RECORD
INITIALIZE INDEX REGISTER TO 0
TEST INPUT DEVICE
LOOP IF DEVICE IS BUSY
READ ONE BYTE INTO REGISTER A
STORE DATA BYTE INTO RECORD
ADD 1 TO INDEX AND COMPARE TO 100
LOOP IF INDEX IS LESS THAN 100
EXIT FROM SUBROUTINE
INPUT DEVICE NUMBER
100-BYTE BUFFER FOR INPUT RECORD
ONE-WORD CONSTANTS
@
CALL READ SUBROUTINE
SUBROUTINE TO READ 100-BYTE RECORD
INITIALIZE INDEX REGISTER TO 0
INITIALIZE REGISTER T TO 100
TEST INPUT DEVICE
Loop IF DEVICE IS BUSY
READ ONE BYTE INTO REGISTER A
STORE DATA BYTE INTO RECORD
ADD 1 TO INDEX AND COMPARE TO 100
LOOP IF INDEX IS LESS THAN 100
EXIT FROM SUBROUTINE
INPUT DEVICE NUMBER
-100-BYTE BUFFER FOR INPUT RECORD
)
Figure 1.7 Sample subroutine call and record input operations for
(a) SIC and (b) SIC/XE.. 1.4 Traditional (CISC) Machines
1.4 TRADITIONAL (CISC) MACHINES
This section introducgs the architectures of two of the machines that will be
used as examples later in the text. Section 1.4.1 describes the VAX architecture,
and Section 1.4.2 describes the architecture of the Intel x86 family of proces-
sors.
The machines described in this section are classified as Complex Instruc-
tion Set Computers (CISC). CISC machines generally have a relatively large
and complicated instruction set, several different instruction formats and
lengths, and many different addressing modes. Thus the implementation of
such an architecture in hardware tends to be complex.
You may want to compare the examples in this section with the Reduced
Instruction Set Computer (RISC) examples in Section 1.5. Further discussion of
CISC versus RISC designs can be found in Tabak (1995).
1.4.1 VAX Architecture
The VAX family of computers was introduced by Digital Equipment
Corporation (DEC) in 1978. The VAX architecture was designed for compati-
bility with the earlier PDP-11 machines. A compatibility mode was provided at
the hardware level so that many PDP-11 programs could run unchanged on
the VAX. It was even possible for PDP-11 programs and VAX programs to
share the same machine in a multi-user environment.
This section summarizes some of the main characteristics of the VAX archi-
tecture. For further information, see Baase (1992).
Memory
The VAX memory consists of 8-bit bytes. All addresses used are byte ad-
dresses. Two consecutive bytes form a word; four bytes form a longword; eight
bytes form a quadword; sixteen bytes form an octaword. Some operations are
more efficient when operands are aligned in a particular way—for example, a
longword operand that begins at a byte address that is a multiple of 4.
All VAX programs operate in a virtual address space of 232 bytes. This vir-
tual memory allows programs to operate as though they had access to an ex-
tremely large memory, regardless of the amount of memory actually present
‘on the system. Routines in the operating system take care of the details of
memory management. We discuss virtual memory in connection with our
study of operating systems in Chapter 6. One half of the VAX virtual address
space is called system space, which contains the operating system, and is shared
by all programs. The other half of the address space is called process space, and
21Chapter 1 Background
is defined separately for each program. A part of the process space contains
stacks that are available to the program. Special registers and machine instruc-
tions aid in the use of these stacks.
Registers
There are 16 general-purpose registers on the VAX, denoted by RO through
R15, Some of these registers, however, have special names and uses. All gen-
eral registers are 32 bits in length. Register R15 is the program counter, also
called PC. It is updated during instruction execution to point to the next in-
struction byte to be fetched. R14 is the stack pointer SP, which points to the cur-
rent top of the stack in the program's process space. Although it is possible to
use other registers for this purpose, hardware instructions that implicitly use
the stack always use SP. R13 is the frame pointer FP. VAX procedure call con-
ventions build a data structure called a stack frame, and place its address in
FP. R12 is the argument pointer AP. The procedure call convention uses AP to
passa list of arguments associated with the call
Registers R6 through R11 have no special functions, and are available for
general use by the program. Registers RO through R35 are likewise available for
general use; however, these registers are also used by some machine instruc-
tions.
In addition to the general registers, there is a processor status longword
(PSL), which contains state variables and flags associated with a process. The
PSL includes, among many other items of information, a condition code and a
flag that specifies whether PDP-11 compatibility mode is being used by a
process. There are also a number of control registers that are used to support
various operating system functions.
Data Formats
Integers are stored as binary numbers in a byte, word, longword, quadword,
or octaword; 2's complement representation is used for negative values.
Characters are stored using their 8-bit ASCII codes.
There are four different floating-point data formats on the VAX, ranging in
length from 4 to 16 bytes. Two of these are compatible with those found on the
PDP-11, and are standard on all VAX processors. The other two are available
as options, and provide for an extended range of values by allowing more bits
in the exponent field. In each case, the principles are the same as those we dis-
cussed for SIC/XE: a floating-point value is represented as a fraction that is to
be multiplied by a specified power of 2.
VAX processors provide a packed decimal data format. In this format, each
byte represents two decimal digits, with each digit encoded using 4 bits of the
byte. The sign is encoded in the last 4 bits. There is also a numeric format that. 1.4 Traditional (CISC) Machines
is used to represent numeric values with one digit per byte. In this format, the
sign may appear either in the last byte, or as a separate byte preceding the first
digit. These two variations are called trailing numeric and leading separate nu-
meric.
VAX also supports queues and variable-length bit strings. Data structures
such as these can, of course, be implemented on any machine; however, VAX
provides direct hardware support for them. There are single machine instruc-
tions that insert and remove entries in queues, and perform a variety of opera
tions on bit strings. The existence of such powerful machine instructions and
complex primitive data types is one of the more unusual features of the VAX
architecture.
Instruction Formats
VAX machine instructions use a variable-length instruction format. Each in-
struction consists of an operation code (1 or 2 bytes) followed by up to six
operand specifiers, depending on the type of instruction. Each operand specifier
designates one of the VAX addressing modes and gives any additional infor-
mation necessary to locate the operand. (See the description of addressing
modes in the following section for further information.)
Addressing Modes
VAX provides a large number of addressing modes. With few exceptions, any
of these addressing modes may be used with any instruction. The operand it-
self may be in a register (register mode), or its address may be specified by a
register (register deferred mode). If the operand address is in a register, the reg-
ister contents may be automatically incremented or decremented by the
operand length (autoincrement and autodecrement modes). There are several
base relative addressing modes, with displacement fields of different lengths;
when used with register PC, these become program-counter relative modes.
All of these addressing modes may also include an index register, and many of
them are available in a form that specifies indirect addressing (called deferred
modes on VAX). In addition, there are immediate operands and several spe-
ial-purpose addressing modes. For further details, see Baase (1992).
Instruction Set
One of the goals of the VAX designers was to produce an instruction set that is
symmetric with respect to data type. Many instruction mnemonics are formed
by combining the following elements:
2324
Chapter 1 Background
1. a prefix that specifies the type of operation,
2. a suffix that specifies the data type of the operands,
3. a modifier (on some instructions) that gives the number of operands
involved.
For example, the instruction ADDW2? is an add operation with two operands,
each a word in length. Likewise, MULL3 is a multiply operation with three
longword operands, and CVTWL specifies a conversion from word to long-
word. (In the latter case, a two-operand instruction is assumed.) For a typical
instruction, operands may be located in registers, in memory, or in the instruc-
tion itself (immediate addressing). The same machine instruction code is used,
regardless of operand locations.
VAX provides all of the usual types of instructions for computation, data
movement and conversion, comparison, branching, etc. In addition, there are a
number of operations that are much more complex than the machine instruc-
tions found on most computers. These operations are, for the most part, hard-
ware realizations of frequently occurring sequences of code. They are
implemented as single instructions for efficiency and speed. For example, VAX
provides instructions to load and store multiple registers, and to manipulate
queues and variable-length bit fields. There are also powerful instructions for
calling and returning from procedures. A single instruction saves a designated
set of registers, passes a list of arguments to the procedure, maintains the
stack, frame, and argument pointers, and sets a mask to enable error traps for
arithmetic operations. For further information on all of the VAX instructions,
see Baase (1992)
Input and Output
Input and output on the VAX are accomplished by I/O device controllers.
Each controller has a set of control/status and data registers, which are as
signed locations in the physical address space. The portion of the address
space into which the device controller registers are mapped is called I/O space.
No special instructions are required to access registers in I/O space. An
1/0 device driver issues commands to the device controller by storing values
into the appropriate registers, exactly as if they were physical memory loca-
tions. Likewise, software routines may read these registers to obtain status in-
formation. The association of an address in 1/O space with a physical register
ina device controller is handled by the memory management routines.1.4 Traditional (CISC) Machines
1.4.2 Pentium Pro Architecture
The Pentium Pro microprocessor, introduced near the end of 1995, is the latest
in the Intel x86 family. Other recent microprocessors in this family are the
80486 and Pentium. Processors of the x86 family are presently used in a major-
ity of personal computers, and there is a vast amount of software for these
processors. It is expected that additional generations of the x86 family will be
developed in the future
The various x86 processors differ in implementation details and operating
speed. However, they share the same basic architecture. Each succeeding gen-
eration has been designed to be compatible with the earlier versions. This sec-
tion contains an overview of the x86 architecture, which will serve as
background for the examples to be discussed later in the book. Further infor-
mation about the x86 family can be found in Intel (1995), Anderson and
Shanley (1995), and Tabak (1995).
Memory
Memory in the x86 architecture can be described in at least two different ways.
At the physical level, memory consists of 8-bit bytes. All addresses used are
byte addresses. Two consecutive bytes form a word; four bytes form a double-
word (also called a dword). Some operations are more efficient when operands
are aligned in a particular way—for example, a doubleword operand that be-
gins at a byte address that is a multiple of 4.
However, programmers usually view the x86 memory as a collection of
segments, From this point of view, an address consists of two parts—a segment
number and an offset that points to a byte within the segment. Segments can
be of different sizes, and are often used for different purposes. For example,
some segments may contain executable instructions, and other segments may
be used to store data. Some data segments may be treated as stacks that can be
used to save register contents, pass parameters to subroutines, and for other
purposes.
It is not necessary for all of the segments used by a program to be in physi-
cal memory. In some cases, a segment can also be divided into pages. Some of
the pages of a segment may be in physical memory, while others may be
stored on disk. When an x86 instruction is executed, the hardware and the op-
erating system make sure that the needed byte of the segment is loaded into
physical memory. The segment/offset address specified by the programmer is
automatically translated into a physical byte address by the x86 Memory
2526
Chapter 1 Background
Management Unit (MMU). Chapter 6 contains a brief discussion of methods
that can be used in this kind of address translation.
Registers
There are eight general-purpose registers, which are named EAX, EBX, ECX,
EDX, ESI, EDI, EBP, and ESP. Each general-purpose register is 32 bits long (i.e.,
one doubleword). Registers EAX, EBX, ECX, and EDX are generally used for
data manipulation; it is possible to access individual words or bytes from
these registers. The other four registers can also be used for data, but are more
commonly used to hold addresses. The general-purpose register set is identi-
cal for all members of the x86 family beginning with the 80386. This set is also
compatible with the more limited register sets found in earlier members of the
family.
There are also several different types of special-purpose registers in the x86
architecture. EIP is a 32-bit register that contains a pointer to the next instruc-
tion to be executed. FLAGS is a 32-bit register that contains many different bit
flags. Some of these flags indicate the status of the processor; others are used
to record the results of comparisons and arithmetic operations. There are also
six 16-bit segment registers that are used to locate segments in memory.
Segment register CS contains the address of the currently executing code seg-
ment, and SS contains the address of the current stack segment. The other seg-
ment registers (DS, ES, FS, and GS) are used to indicate the addresses of data
segments.
Floating-point computations are performed using a special floating-point
unit (FPU). This unit contains eight 80-bit data registers and several other con-
trol and status registers.
All of the registers discussed so far are available to application programs.
There are also a number of registers that are used only by system programs
such as the operating system. Some of these registers are used by the MMU to
translate segment addresses into physical addresses. Others are used to con-
trol the operation of the processor, or to support debugging operations.
Data Formats
The x86 architecture provides for the storage of integers, floating-point values,
characters, and strings. Integers are normally stored as 8-, 16-, or 32-bit binary
numbers. Both signed and unsigned integers (also called ordinals) are sup-
ported; 2's complement is used for negative values. The FPU can also handle
64-bit signed integers. In memory, the least significant part of a numeric value
is stored at the lowest-numbered address. (This is commonly called1.4 Traditional (CISC) Machines
little-endian byte ordering, because the “little end” of the value comes first in
memory.)
Integers can also be stored in binary coded decimal (BCD). In the unpacked
BCD format, each byte represents one decimal digit. The value of this digit is
encoded (in binary) in the low-order 4 bits of the byte; the high-order bits are
normally zero. In the packed BCD format, each byte represents two decimal
digits, with each digit encoded using 4 bits of the byte.
There are three different floating-point data formats. The single-precision
format is 32 bits long. It stores 24 significant bits of the floating-point value,
and allows for a 7-bit exponent (power of 2). (The remaining bit is used to
store the sign of the floating-point value.) The double-precision format is 64
bits long. It stores 53 significant bits, and allows for a 10-bit exponent. The
extended-precision format is 80 bits long. It stores 64 significant bits, and
allows for a 15-bit exponent.
Characters are stored one per byte, using their 8-bit ASCII codes. Strings
may consist of bits, bytes, words, or doublewords; special instructions are
provided to handle each type of string.
Instruction Formats
All of the x86 machine instructions use variations of the same basic format.
This format begins with optional prefixes containing flags that modify the op-
eration of the instruction. For example, some prefixes specify a repetition
count for an instruction. Others specify a segment register that is to be used
for addressing an operand (overriding the normal default assumptions made
by the hardware). Following the prefixes (if any) is an opcode (1 or 2 bytes);
some operations have different opcodes, each specifying a different variant of
the operation. Following the opcode are a number of bytes that specify the
operands and addressing modes to be used. (See the description of addressing
modes in the next section for further information.)
The opcode is the only element that is always present in every instruction.
Other elements may or may not be present, and may be of different lengths,
depending on the operation and the operands involved. Thus, there are a large
number of different potential instruction formats, varying in length from
1 byte to 10 bytes or more.
Addressing Modes
The x86 architecture provides a large number of addressing modes. An
operand value may be specified as part of the instruction itself (immediate
mode), or it may be in a register (register mode).
2728
Chapter 1 Background
Operands stored in memory are often specified using vari
eral target address calculation
ions of the gen-
TA = (base register) + (index register) * (scale factor) + displacement
Any general-purpose register may be used as a base register; any general-
purpose register except ESP can be used as an index register. The scale factor
may have the value 1, 2, 4, or 8, and the displacement may be an 8-, 16-, or 32-
bit value. The base and index register numbers, scale, and displacement are
encoded as parts of the operand specifiers in the instruction. Various combina-
tions of these items may be omitted, resulting in eight different addressing
modes. The address of an operand in memory may also be specified as an ab-
solute location (direct mode), or as a location relative to the EIP register (relative
mode).
Instruction Set
The x86 architecture has a large and complex instruction set, containing more
than 400 different machine instructions. An instruction may have zero, one,
two, or three operands. There are register-to-register instructions, register-to-
memory instructions, and a few memory-to-memory instructions. In some
cases, operands may also be specified in the instruction as immediate values.
Most data movement and integer arithmetic instructions can use operands
that are 1, 2, or 4 bytes long. String manipulation instructions, which use repe-
tition prefixes, can deal directly with variable-length strings of bytes, words,
or doublewords. There are many instructions that perform logical and bit ma-
nipulations, and support control of the processor and memory-management
systems.
The x86 architecture also includes special-purpose instructions to perform
operations frequently required in high-level programming languages—for ex-
ample, entering and leaving procedures and checking subscript values against
the bounds of an array.
Input and Output
Input is performed by instructions that transfer one byte, word, or double-
word at a time from an I/O port into register EAX. Output instructions trans-
fer one byte, word, or doubleword from EAX to an I/O port. Repetition
prefixes allow these instructions to transfer an entire string in a single
operation.1.5. RISC Machines
1.5 RISC MACHINES
This section introducés the architectures of three RISC machines that will be
used as examples later in the text. Section 1.5.1 describes the architecture of the
SPARC family of processors. Section 1.5.2 describes the PowerPC family of mi-
croprocessors for personal computers. Section 1.5.3 describes the architecture
of the Cray T3E supercomputing system.
All of these machines are examples of RISC (Reduced Instruction Set
Computers), in contrast to traditional CISC (Complex Instruction Set
Computer) implementations such as Pentium and VAX. The RISC concept, de-
veloped in the early 1980s, was intended to simplify the design of processors.
This simplified design can result in faster and less expensive processor devel-
opment, greater reliability, and faster instruction execution times.
In general, a RISC system is characterized by a standard, fixed instruction
length (usually equal to one machine word), and single-cycle execution of
most instructions. Memory access is usually done by load and store instruc-
tions only. Alll instructions except for load and store are register-to-register op-
erations. There are typically a relatively large number of general-purpose
registers. The number of machine instructions, instruction formats, and ad-
dressing modes is relatively small.
The discussions in the following sections will illustrate some of these RISC
characteristics. Further information about the RISC approach, including its ad-
vantages and disadvantages, can be found in Tabak (1995).
1.5.1 UltraSPARC Architecture
The UltraSPARC processor, announced by Sun Microsystems in 1995, is the
latest member of the SPARC family. Other members of this family include a
variety of SPARC and SuperSPARC processors. The original SPARC architec-
ture was developed in the mid-1980s, and has been implemented by a number
of manufacturers. The name SPARC stands for scalable processor architecture.
This architecture is intended to be suitable for a wide range of implementa-
tions, from microcomputers to supercomputers.
Although SPARC, SuperSPARC, and UltraSPARC architectures differ
slightly, they are upward compatible and share the same basic structure. This
section contains an overview of the UltraSPARC architecture, which will serve
as background for the examples to be discussed later in the book. Further in-
formation about the SPARC family can be found in Tabak (1995) and Sun
Microsystems (1995a).
2930
Chapter 1 Background
Memory
Memory consists of 8-bit bytes; all addresses used are byte addresses. Two
consecutive bytes form a halfword; four bytes form a word; eight bytes form a
doubleword. Halfwords are stored in memory beginning at byte addresses that
are multiples of 2. Similarly, words begin at addresses that are multiples of 4,
and doublewords at addresses that are multiples of 8.
UltraSPARC programs can be written using a virtual address space of
264 bytes. This address space is divided into pages; multiple page sizes are sup-
ported. Some of the pages used by a program may be in physical memory,
while others may be stored on disk. When an instruction is executed, the hard-
ware and the operating system make sure that the needed page is loaded into
physical memory. The virtual address specified by the instruction is automati-
cally translated into a physical address by the UltraSPARC Memory Manage-
ment Unit (MMU). Chapter 6 contains a brief discussion of methods that can
be used in this kind of address translation.
Registers
The SPARC architecture includes a large register file that usually contains more
than 100 general-purpose registers. (The exact number varies from one imple-
mentation to another.) However, any procedure can access only 32 registers,
designated r0 through r31. The first eight of these registers (r0 through r7) are
global—that is, they can be accessed by all procedures on the system. (Register
10 always contains the value zero.)
The other 24 registers available to a procedure can be visualized as a win-
dow through which part of the register file can be seen. These windows over-
lap, so some registers in the register file are shared between procedures. For
example, registers r8 through r15 of a calling procedure are physically the
same registers as 124 through r31 of the called procedure. This facilitates the
passing of parameters
The SPARC hardware manages the windows into the register file. If a set of
concurrently running procedures needs more windows than are physically
available, a “window overflow” interrupt occurs. The operating system must
then save the contents of some registers in the file (and restore them later) to
provide the additional windows that are needed
In the original SPARC architecture, the general-purpose registers were
32 bits long. Later implementations (including UltraSPARC) expanded these
registers to 64 bits. Some SPARC irhplementations provide several physically
different sets of global registers, for use by application procedures and by vari-
ous hardware and operating system functions.
Floating-point computations are performed using a special floating-point
unit (FPU). On UltraSPARC, this unit contains a file of 64 double-precision
floating-point registers, and several other control and status registers.1.5 RISC Machines
Besides these register files, there are a program counter PC (which contains
the address of the next instruction to be executed), condition code registers,
and a number of othet control registers.
Data Formats
The UltraSPARC architecture provides for the storage of integers, floating-
point values, and characters. Integers are stored as 8-, 16-, 32-, or 64-bit binary
numbers. Both signed and unsigned integers are supported; 2's complement is
used for negative values. In the original SPARC architecture, the most signifi-
cant part of a numeric value is stored at the lowest-numbered address. (This is
commonly called big-endian byte ordering, because the “big end” of the value
comes first in memory.) UltraSPARC supports both big-endian and little-
endian byte orderings.
There are three different floating-point data formats. The single-precision
format is 32 bits long. It stores 23 significant bits of the floating-point value,
and allows for an 8-bit exponent (power of 2). (The remaining bit is used to
store the sign of the floating-point value.) The double-precision format is
64 bits long. It stores 52 significant bits, and allows for a 11-bit exponent. The
quad-precision format stores 63 significant bits, and allows for a 15-bit expo-
nent.
Characters are stored one per byte, using their 8-bit ASCII codes.
Instruction Formats
There are three basic instruction formats in the SPARC architecture. All of
these formats are 32 bits long; the first 2 bits of the instruction word identify
which format is being used. Format 1 is used for the Call instruction. Format 2
is used for branch instructions (and one special instruction that enters a value
into a register). The remaining instructions use Format 3, which provides for
register loads and stores, and three-operand arithmetic operations.
The fixed instruction length in the SPARC architecture is typical of RISC
systems, and is intended to speed the process of instruction fetching and de-
coding. Compare this approach with the complex variable-length instructions
found on CISC systems such as VAX and x86.
Addressing Modes
As in most architectures, an operand value may be specified as part of the in-
struction itself (immediate mode), or it may be in a register (register direct
mode). Operands in memory are addressed using one of the following three
modes:
3132
Chapter 1 Background
Mode Target address calculation
PC-relative TA = (PC) + displacement (30 bits, signed]
Register indirect TA = (register) + displacement
with displacement {13 bits, signed]
Register indirect indexed TA = (register-1) + (register-2)
PC-relative mode is used only for branch instructions.
The relatively few addressing modes of SPARC allow for more efficient im-
plementations than the 10 or more modes found on CISC systems such as x86.
Instruction Set
The basic SPARC architecture has fewer than 100 machine instructions, reflect-
ing its RISC philosophy. (Compare this with the 300 to 400 instructions often
found in CISC systems.) The only instructions that access memory are loads
and stores. All other instructions are register-to-register operations.
Instruction execution on a SPARC system is pipelined—while one instruc-
tion is being executed, the next one is being fetched from memory and de-
coded. In most cases, this technique speeds instruction execution. However, an
ordinary branch instruction might cause the process to “stall.” The instruction
following the branch (which had already been fetched and decoded) would
have to be discarded without being executed.
To make the pipeline work more efficiently, SPARC branch instructions (in-
cluding subroutine calls) are delayed branches. This means that the instruction
immediately following the branch instruction is actually executed before the
branch is taken. For example, in the instruction sequence
SUB LO, 11, LL
BA NEXT
Mov BL, 803
the MOV instruction is executed before the branch BA. This MOV instruction
is said to be in the delay slot of the branch. The programmer must take this
characteristic into account when writing an assembler language program.
Further discussions and examples of the use of delayed branches can be found
in Section 2.5.2.
The UltraSPARC architecture also includes special-purpose instructions to
provide support for operating systems and optimizing compilers. For exam-
ple, high-bandwidth block load and store operations can be used to speed1.5 RISC Machines
common operating system functions. Communication in a multi-processor
system is facilitated by special “atomic” instructions that can execute without
allowing other memoty accesses to intervene. Conditional move instructions
may allow a compiler to eliminate many branch instructions in order to opti-
mize program execution.
Input and Output
In the SPARC architecture, communication with 1/O devices is accomplished
through memory. A range of memory locations is logically replaced by device
registers. Each 1/O device has a unique address, or set of addresses, assigned
to it. When a load or store instruction refers to this device register area of
memory, the corresponding device is activated. Thus input and output can be
performed with the regular instruction set of the computer, and no special 1/O
instructions are needed.
1.5.2 PowerPC Architecture
IBM first introduced the POWER architecture early in 1990 with the RS/6000.
(POWER is an acronym for Performance Optimization With Enhanced RISC.)
It was soon realized that this architecture could form the basis for a new fam-
ily of powerful and low-cost microprocessors. In October 1991, IBM, Apple,
and Motorola formed an alliance to develop and market such microprocessors,
which were named PowerPC. The first products using PowerPC chips were
delivered near the end of 1993. Recent implementations of the PowerPC archi-
tecture include the PowerPC 601, 603, and 604; others are expected in the near
future.
As its name implies, PowerPC is a RISC architecture. As we shall see, it has
much in common with other RISC systems such as SPARC. There are also a
few differences in philosophy, which we will note in the course of the discus-
sion. This section contains an overview of the PowerPC architecture, which
will serve as background for the examples to be discussed later in the book.
Further information about PowerPC can be found in IBM (1994a) and Tabak
(1995).
Memory
Memory consists of 8-bit bytes; all addresses used are byte addresses. Two
consecutive bytes form a halfword; four bytes form a word; eight bytes form a
doubleword; sixteen bytes form a quadword. Many instructions may execute
3334
Chapter 1 Background
more efficiently if operands are aligned at a starting address that is a multiple
of their length.
PowerPC programs can be written using a virtual address space of 264
bytes. This address space is divided into fixed-length segments, which are 256
megabytes long. Each segment is divided into pages, which are 4096 bytes
long. Some of the pages used by a program may be in physical memory, while
others may be stored on disk. When an instruction is executed, the hardware
and the operating system make sure that the needed page is loaded into physi-
cal memory. The virtual address specified by the instruction is automatically
translated into a physical address. Chapter 6 contains a brief discussion of
methods that can be used in this kind of address translation.
Registers
There are 32 general-purpose registers, designated GPRO through GPR31. In
the full PowerPC architecture, each register is 64 bits long. PowerPC can also
be implemented in a 32-bit subset, which uses 32-bit registers. The general-
purpose registers can be used to store and manipulate integer data and
addresses.
Floating-point computations are performed using a special floating-point
unit (FPU). This unit contains thirty-two 64bit floating-point registers, and a
status and control register.
A 32-bit condition register reflects the result of certain operations, and can
be used as a mechanism for testing and branching. This register is divided into
eight 4-bit subfields, named CRO through CR7. These subfields can be set and
tested individually by PowerPC instructions.
The PowerPC architecture includes a Link Register (LR) and a Count
Register (CR), which are used by some branch instructions. There is also a
Machine Status Register (MSR) and variety of other control and status regis-
ters, some of which are implementation dependent.
Data Formats
The PowerPC architecture provides for the storage of integers, floating-point
values, and characters. Integers are stored as 8-, 16-, 32-, or 6+-bit binary num-
bers. Both signed and unsigned integers are supported; 2’s complement is
used for negative values. By default, the most significant part of a numeric
value is stored at the lowest-numbered address (big-endian byte ordering). It
is possible to select little-endian byte ordering by setting a bit in a control
register.1.5. RISC Machines
There are two different floating-point data formats. The single-precision
format is 32 bits long. It stores 23 significant bits of the floating-point value,
and allows for an 8-bit exponent (power of 2). (The remaining bit is used to
store the sign of the floating-point value.) The double-precision format is
64 bits long. It stores 52 significant bits, and allows for a 11-bit exponent.
Characters are stored one per byte, using their 8-bit ASCII codes.
Instruction Formats
There are seven basic instruction formats in the PowerPC architecture, some of
which have subforms. All of these formats are 32 bits long. Instructions must
be aligned beginning at a word boundary (i.e, a byte address that is a multiple
of 4). The first 6 bits of the instruction word always specify the opcode; some
instruction formats also have an additional “extended opcode” field.
The fixed instruction length in the PowerPC architecture is typical of RISC
systems. The variety and complexity of instruction formats is greater than that
found on most RISC systems (such as SPARC). However, the fixed length
makes instruction decoding faster and simpler than on CISC systems like VAX
and x86.
‘Addressing Modes
As in most architectures, an operand value may be specified as part of the in-
struction itself (immediate mode), or it may be in a register (register direct
mode). The only instructions that address memory are load and store opera-
tions, and branch instructions.
Load and store operations use one of the following three addressing
modes:
Mode Target address calculation
Register indirect TA = (register)
Register indirect with index TA = (register-1) + (register-2)
Register indirect with TA = (register) + displacement
immediate index {16 bits, signed]
‘The register numbers and displacement are encoded as part of the instruction.
35Chapter 1 Background
Branch instructions use one of the following three addressing modes:
Mode Target address calculation
Absolute TA = actual address
Relative TA = current instruction address +
displacement {25 bits, signed}
Link Register TA = (LR)
(CR)
Count Register. T/
The absolute address or displacement is encoded as part of the instruction.
Instruction Set
The PowerPC architecture has approximately 200 machine instructions. Some
instructions are more complex than those found in most RISC systems. For ex-
ample, load and store instructions may automatically update the index regis-
ter to contain the just-computed target address. There are floating-point
“multiply and add” instructions that take three input operands and perform a
multiplication and an addition in one instruction. Such instructions reflect the
PowerPC approach of using more powerful instructions, so fewer instructions
are required to perform a task. This is in contrast to the more usual RISC ap-
proach, which keeps instructions simple so they can be executed as fast as
possible.
In spite of this difference in philosophy, PowerPC is generally considered
to be a true RISC architecture. Further discussions of these issues can be found
in Smith and Weiss (1994).
Instruction execution on a PowerPC system is pipelined, as we discussed
for SPARC. However, the pipelining is more sophisticated than on the original
SPARC systems, with branch prediction used to speed execution. As a result,
the delayed branch technique we described for SPARC is not used on
PowerPC (and most other modern architectures). Further discussion of
pipelining and branch prediction can be found in Tabak (1995).
Input and Output
The PowerPC architecture provides two different methods for performing 1/O
operations. In one approach, segments in the virtual address space are
mapped onto an external address space (typically an I/O bus). Segments that
are mapped in this way are called direct-store segments. This method is similar
to the approach used in the SPARC architecture.1.5. RISC Machines
A reference to an address that is not in a direct-store segment represents a
normal virtual memory access. In this situation, 1/O is performed using the
regular virtual memory management hardware and software.
1.5.3 Cray T3E Architecture
The TSE series of supercomputers was announced by Cray Research, Inc., near
the end of 1995. The T3E is a massively parallel processing (MPP) system, de-
signed for use on technical applications in scientific computing. The earlier
Cray T3D system had a similar (but not identical) architecture.
‘A TBE system contains a large number of processing elements (PE),
arranged in a three-dimensional network as illustrated in Fig. 1.8. This net-
work provides a path for transferring data between processors. It also imple-
ments control functions that are used to synchronize the operation of the PEs
used by a program. The interconnect network is circular in each dimension.
Thus PEs at “opposite” ends of the three-dimensional array are adjacent with
respect to the network. This is illustrated by the dashed lines in Fig. 1.8; for
simplicity, most of these “circular” connections have been omitted from the
drawing.
Each PE consists of a DEC Alpha EV5 RISC microprocessor (currently
model 21164), local memory, and performance-accelerating control logic devel-
oped by Cray. A T3E system may contain from 16 to 2048 processing elements.
This section contains an overview of the architecture of the T3E and the
DEC Alpha microprocessor. Sections 3.5.3 and 5.5.3 discuss some of the ways
programs can take advantage of the multiprocessor architecture of this ma-
chine. Further information about the T3E can be found in Cray Research
(1995c). Further information about the DEC Alpha architecture can be found in
Sites (1992) and Tabak (1995).
Memory
Each processing element in the T3E has its own local memory with a capacity
of from 64 megabytes to 2 gigabytes. The local memory within each PE is part
Ra<— Interconnect network
4
"Processing element node
Figure 1.8 Overall T3E architecture.
3738
Chapter 1 Background
of a physically distributed, logically shared memory system. System memory
is physically distributed because each PE contains local memory. System mem-
ory is logically shared because the microprocessor in one PE can access the
memory of another PE without involving the microprocessor in that PE.
The memory within each processing element consists of 8-bit bytes; all
addresses used are byte addresses. Two consecutive bytes form a word; four
bytes form a longword; eight bytes form a quadword. Many Alpha instructions
may execute more efficiently if operands are aligned at a starting address that
is a multiple of their length. The Alpha architecture supports 64-bit virtual
addresses.
Registers
The Alpha architecture includes 32 general-purpose registers, designated RO
through R31; R31 always contains the value zero. Each general-purpose regis-
ter is 64 bits long. These general-purpose registers can be used to store and
manipulate integer data and addresses.
There are also 32 floating-point registers, designated FO through F31; F31
always contains the value zero. Each floating-point register is 64 bits long.
In addition to the general-purpose and floating-point registers, there is a
64-bit program counter PC and several other status and control registers.
Data Formats
The Alpha architecture provides for the storage of integers, floating-point val-
ues, and characters. Integers are stored as longwords or quadwords; 2's com-
plement is used for negative values. When interpreted as an integer, the bits of
a longword or quadword have steadily increasing significance beginning with
bit 0 (which is stored in the lowest-addressed byte).
There are two different types of floating-point data formats in the Alpha
architecture. One group of three formats is included for compatibility with the
VAX architecture. The other group consists of four IEEE standard formats,
which are compatible with those used on most modern systems.
Characters may be stored one per byte, using their 8-bit ASCII codes.
However, there are no byte load or store operations in the Alpha architecture;
only longwords and quadwords can be transferred between a register and
memory. As a consequence, characters that are to be manipulated separately
are usually stored one per longword.15. RISC Machines
Instruction Formats
There are five basic Instruction formats in the Alpha architecture, some of
which have subforms. All of these formats are 32 bits long. (As we have noted
before, this fixed length is typical of RISC systems.) The first 6 bits of the in-
struction word always specify the opcode; some instruction formats also have
an additional “function” field.
Addressing Modes
As in most architectures, an operand value may be specified as part of the in-
struction itself (immediate mode), or it may be in a register (register direct
mode). As in most RISC systems, the only instructions that address memory
are load and store operations, and branch instructions.
Operands in memory are addressed using one of the following two modes:
Mode Target address calculation
PC-relative TA = (PC) + displacement (23 bits, signed}
Register indirect TA = (register) + displacement
with displacement {16 bits, signed}
Register indirect with displacement mode is used for load and store opera-
tions and for subroutine jumps. PC-relative mode is used for conditional and
unconditional branches.
Instruction Set
The Alpha architecture has approximately 130 machine instructions, reflecting
its RISC orientation. The instruction set is designed so that an implementation
of the architecture can be as fast as possible. For example, there are no byte or
word load and store instructions. This means that the memory access interface
does not need to include shift-and-mask operations. Further discussion of this
approach can be found in Smith and Weiss (1994).
Input and Output
The T3E system performs I/O through multiple ports into one or more 1/O
channels, which can be configured in a number of ways. These channels are
39Chapter 1 Background
integrated into the network that interconnects the processing nodes. A system
may be configured with up to one I/O channel for every eight PEs. All chan-
nels are accessible and controllable from all PEs.
Further information about this “scalable” I/O architecture can be found in
Cray Research (1995c).
EXERCISES
Section 1.3
1. Write a sequence of instructions for SIC to set ALPHA equal to the
product of BETA and GAMMA. Assume that ALPHA, BETA, and
GAMMA are defined as in Fig. 1.3(a).
2. Write a sequence of instructions for SIC/XE to set ALPHA equal to
4* BETA - 9. Assume that ALPHA and BETA are defined as in Fig.
1.3(b). Use immediate addressing for the constants.
3. Write a sequence of instructions for SIC to set ALPHA equal to the
integer portion of BETA + GAMMA. Assume that ALPHA and BETA
are defined as in Fig. 1.3(a).
4, Write a sequence of instructions for SIC/XE to divide BETA by
GAMMA, setting ALPHA to the integer portion of the quotient and
DELTA to the remainder. Use register-to-register instructions to make
the calculation as efficient as possible.
5. Write a sequence of instructions for SIC/XE to divide BETA by
GAMMA, setting ALPHA to the value of the quotient, rounded to
the nearest integer. Use register-to-register instructions to make the
calculation as efficient as possible.
6. Write a sequence of instructions for SIC to clear a 20-byte string to all
blanks.
7. Write a sequence of instructions for SIC/XE to clear a 20-byte string
to all blanks. Use immediate addressing and register-to-register in-
structions to make the process as efficient as possible.
8. Suppose that ALPHA is an array of 100 words, as defined in Fig.
1.5(a). Write a sequence of instructions for SIC to set all 100 elements
of the array to 0.
9. Suppose that ALPHA is an array of 100 words, as defined in Fig.
1.5(b). Write a sequence of instructions for SIC/XE to set all 10010.
uu.
12.
13,
Exercises
elements of the array to 0. Use immediate addressing and register-to-
register instructions to make the process as efficient as possible.
Suppose that RECORD contains a 100-byte record, as in Fig. 1.7(a).
Write a subroutine for SIC that will write this record onto device 05.
Suppose that RECORD contains a 100-byte record, as in Fig. 1.7(b).
Write a subroutine for SIC/XE that will write this record onto device
05. Use immediate addressing and register-to-register instructions to
make the subroutine as efficient as possible.
Write a subroutine for SIC that will read a record into a buffer, as in
Fig. 1.7(a). The record may be any length from 1 to 100 bytes. The
end of the record is marked with a “null” character (ASCII code 00).
The subroutine should place the length of the record read into a vari-
able named LENGTH.
Write a subroutine for SIC/XE that will read a record into a buffer, as
in Fig. 1.7(b). The record may be any length from 1 to 100 bytes. The
end of the record is marked with a “null” character (ASCII code 00).
The subroutine should place the length of the record read into a vari-
able named LENGTH. Use immediate addressing and register-to-
register instructions to make the subroutine as efficient as possible.
41Chapter 2
Assemblers
In this chapter we discuss the design and implementation of assemblers. There
are certain fundamental functions that any assembler must perform, such as
translating mnemonic operation codes to their machine language equivalents
and assigning machine addresses to symbolic labels used by the programmer.
If we consider only these fundamental functions, most assemblers are very
much alike.
Beyond this most basic level, however, the features and design of an as-
sembler depend heavily upon the source language it translates and the ma-
chine language it produces. One aspect of this dependence is, of course, the
existence of different machine instruction formats and codes to accomplish
(for example) an ADD operation. As we shall see, there are also many subtler
ways that assemblers depend upon machine architecture. On the other hand,
there are some features of an assembler language (and the corresponding as-
sembler) that have no direct relation to machine architecture—they are, in a
sense, arbitrary decisions made by the designers of the language.
We begin by considering the design of a basic assembler for the standard
version of our Simplified Instructional Computer (SIC). Section 2.1 introduces
the most fundamental operations performed by a typical assembler, and de-
scribes common ways of accomplishing these functions. The algorithms and
data structures that we describe are shared by almost all assemblers. Thus this
level of presentation gives us a starting point from which to approach the
study of more advanced assembler features. We can also use this basic struc-
ture as a framework from which to begin the design of an assembler for a com-
pletely new or unfamiliar machine.
In Section 2.2, we examine some typical extensions to the basic assembler
structure that might be dictated by hardware considerations. We do this by
discussing an assembler for the SIC/XE machine. Although this SIC/XE as-
sembler certainly does not include all possible hardware-dependent features,
it does contain some of the ones most commonly found in real machines. The
principles and techniques should be easily applicable to other computers.
Section 2.3 presents a discussion of some of the most commonly encoun-
tered machine-independent assembler language features and their implemen-
tation. Once again, our purpose is not to cover all possible options, but rather
4344
Chapter 2. Assemblers
to introduce concepts and techniques that can be used in new and unfamiliar
situations.
Section 2.4 examines some important alternative design schemes for an as-
sembler. These are features of an assembler that are not reflected in the assem-
bler language. For example, some assemblers process a source program in one
pass instead of two; other assemblers may make more than two passes. We are
concerned with the implementation of such assemblers, and also with the en-
vironments in which each might be useful.
Finally, in Section 2.5 we briefly consider some examples of actual assem-
blers for real machines. We do not attempt to discuss all aspects of these as-
semblers in detail. Instead, we focus on the most interesting features that are
introduced by hardware or software design decisions.
2.1 BASIC ASSEMBLER FUNCTIONS
Figure 2.1 shows an assembler language program for the basic version of SIC.
We use variations of this program throughout this chapter to show different
assembler features. The line numbers are for reference only and are not part of
the program. These numbers also help to relate corresponding parts of differ-
ent versions of the program. The mnemonic instructions used are those intro-
duced in Section 1.3.1 and Appendix A. Indexed addressing is indicated by
adding the modifier “,X” following the operand (see line 160). Lines beginning
with ”.” contain comments only.
In addition to the mnemonic machine instructions, we have used the fol-
lowing assembler directives:
START Specify name and starting address for the program.
END Indicate the end of the source program and (optionally) specify
the first executable instruction in the program.
BYTE Generate character or hexadecimal constant, occupying as
many bytes as needed to represent the constant.
WORD _ Generate one-word integer constant.
RESB Reserve the indicated number of bytes for a data area.
RESW _ Reserve the indicated number of words for a data area.
The program contains a main routine that reads records from an input de-
vice (identified with device code F1) and copies them to an output device
(code 05). This main routine calls subroutine RDREC to read a record into a
buffer and subroutine WRREC to write the record from the buffer to the out-Line
10
15,
20
25,
30
35,
40
45
50
55
60
65
70
5
85
90
95
100
105
u10
15
120
125
130
135
140
145
150
155
160
165
170
175
180
185
190
195
200
205
210
215
220
225
230
235
240
245
250
255
Source statement
copy START 1000
FIRST STL RETADR
coop = JSUB_ REC
LDA, LENGTH
COMP ZERO
BQ ENDFIL
JSUB = WRREC
z ‘cLooP
ENDFIL «LOA EOF
ST! BUFFER
LOA, ‘THREE,
STA LENGTH
JSUB = WRREC
LDL RETADR
SUB
EOF BYTE C’EOF’
THREE = WORD 3
ZERO woRD 0
RETADR © -RESW 1
LENGTH = RESW 1
BUFFER RESB 4096
2.1 Basic Assembler Functions
COPY FILE FROM INPUT TO OUTPUT
SAVE RETURN ADDRESS
READ INPUT RECORD
TEST FOR BOF (LENGTH = 0)
EXIT IF BOF FOUND
WRITE OUTPUT RECORD
LOOP,
INSERT END OF FILE MARKER
SET LENGTH = 3
WRITE BOF
GET RETURN ADDRESS
RETURN TO CALLER
LENGTH OF RECORD
4096-BYTE BUFFER AREA
‘SUBROUTINE TO READ RECORD INTO BUFFER
ROREC LOK
LOA
RICOP TD
EXIT ed
2ERO
ZERO
INPUT
RLOOP
INPUT
ZERO
EXIT
BUFFER, X
MAXLEN
LOOP
LENGTH
xtFL
4096
CLEAR LOOP COUNTER
CLEAR A TO ZERO
TEST INPUT DEVICE
LOOP UNTIL READY
READ CHARACTER INTO REGISTER A
TEST FOR END OF RECORD (X'00")
EXIT LOOP IF FOR
STORE CHARACTER IN BUFFER
LOOP UNLESS MAX LENGTH
HAS BEEN REACHED
SAVE RECORD LENGTH
RETURN TO CALLER
CODE FOR INPUT DEVICE
SUBROUTINE TO WRITE RECORD FROM BUFFER
WRREC LOX
woop = 1D
ourpur BYTE
ZERO
ourpur
LOOP
BUFFER, X
oureur
LENGTH
LOOP
x05"
FIRST
CLEAR LOOP COUNTER
TEST OUTPUT DEVICE
LOOP UNTIL READY
GET CHARACTER FROM BUFFER
WRITE CHARACTER
LOOP UNTIL ALL CHARACTERS
HAVE BEEN WRITTEN
RETURN TO CALLER
CODE FOR OUTPUT DEVICE
Figure 2.1. Example of a SIC assembler language program.46
Chapter 2 Assemblers
put device. Each subroutine must transfer the record one character at a time
because the only I/O instructions available are RD and WD. The buffer is nec-
essary because the I/O rates for the two devices, such as a disk and a slow
printing terminal, may be very different. (In Chapter 6, we see how to use
channel programs and operating system calls on a SIC/XE system to accom-
plish the same functions.) The end of each record is marked with a null charac-
ter (hexadecimal 00). If a record is longer than the length of the buffer (4096
bytes), only the first 4096 bytes are copied. (For simplicity, the program does
not deal with error recovery when a record containing 4096 bytes or more is
read.) The end of the file to be copied is indicated by a zero-length record.
When the end of file is detected, the program writes EOF on the output device
and terminates by executing an RSUB instruction. We assume that this pro-
gram was called by the operating system using a JSUB instruction; thus, the
RSUB will return control to the operating system.
2.1.1 A Simple SIC Assembler
Figure 2.2 shows the same program as in Fig. 2.1, with the generated object
code for each statement. The column headed Loc gives the machine address
(in hexadecimal) for each part of the assembled program. We have assumed
that the program starts at address 1000. (In an actual assembler listing, of
course, the comments would be retained; they have been eliminated here to
save space.)
The translation of source program to object code requires us to accomplish
the following functions (not necessarily in the order given):
1. Convert mnemonic operation codes to their machine language
equivalents—e.g,, translate STL to 14 (line 10).
2. Convert symbolic operands to their equivalent machine addresses—
e.g., translate RETADR to 1033 (line 10).
3. Build the machine instructions in the proper format.
4, Convert the data constants specified in the source program into their
internal machine representations—e.g,, translate EOF to 454F46 (line
80).
5. Write the object program and the assembly listing,
All of these functions except number 2 can easily be accomplished by sequen-
tial processing of the source program, one line at a time. The translation of
addresses, however, presents a problem. Consider the statement
10 1000 FIRST = STL RETADR 141033Line
10
1s
20
25
30
35
40
45
50
55
65
70
5
20
85
90
95
100
105
110
15,
120
125
130
135,
140
145,
150
155
160
165
170
175
180
185
190
195
200
205
210
215
220
225
230
235
240
245,
250
255,
Loc
1000
1000
1003
1006
1009
00c
100F
1012
1015
1018
1018
101E
1023
1024
1027
102A
02D
1030
1033
1036
1039
2039
203¢
203F
2042
2045
2048
2048
2045
2051
2054
2057
205A
20D
205E
2061
2064
2067
206A
2060
2070
2073
2076
2079
Source statement
cory
First
cLooP
ENDFIL,
RLOOP
wWLoOP
START
sm
JSUB
TA
COMP
JEQ
JSUB
a
A
STA
mA
STA
JSUB
. 2.1 Basic Assembler Functions
‘Object code
1000
RETADR 141033
RDREC 482039
LENGTH 001036
ZERO 281030
ENDFIL 301015,
WRREC 482061,
ctor 31003,
EOF oo102a
BUFFER 0c1039
THREE 0102p
LENGTH oc1036
WRREC 482061,
RETADR 081033,
4co000
c'ROF’ 454F46
3 000003,
o ‘00000
1
1
4096
2ERO 041030
ZERO 001030
INPUT 50208
RLOOP 30203F
INPUT a20sD
ZERO 281030
Bar 302057
BUFFER, X 549039
MAXLEN 2¢205B,
RLOOP 38203F
LENGTH 101036
4c0000
x'FL FL
4096 001000
SUBROUTINE TO WRITE RECORD FROM BUFFER
ZERO 041030
ourrur £02079
WLooP 302064
BUFFER, X 509039
oureur pc2079
LENGTH 2c1036
‘WLOOP 382064
4co000
x'05" 05
FIRST
Figure 2.2 Program from Fig. 2.1 with object code.
4748
Chapter 2. Assemblers
This instruction contains a forward reference—that is, a reference to a label
(RETADR) that is defined later in the program. If we attempt to translate the
program line by line, we will be unable to process this statement because we
do not know the address that will be assigned to RETADR. Because of this,
most assemblers make two passes over the source program. The first pass
does little more than scan the source program for label definitions and assign
addresses (such as those in the Loc column in Fig. 2.2). The second pass per-
forms most of the actual translation previously described.
In addition to translating the instructions of the source program, the assem-
bler must process statements called assembler directives (or pseudo-instructions),
These statements are not translated into machine instructions (although they
may have an effect on the object program). Instead, they provide instructions
to the assembler itself. Examples of assembler directives are statements like
BYTE and WORD, which direct the assembler to generate constants as part of
the object program, and RESB and RESW, which instruct the assembler to re-
serve memory locations without generating data values. The other assembler
directives in our sample program are START, which specifies the starting
memory address for the object program, and END, which marks the end of the
program.
Finally, the assembler must write the generated object code onto some out-
put device. This object program will later be loaded into memory for execution.
The simple object program format we use contains three types of records:
Header, Text, and End. The Header record contains the program name, start-
ing address, and length. Text records contain the translated (ie., machine
code) instructions and data of the program, together with an indication of the
addresses where these are to be loaded. The End record marks the end of the
object program and specifies the address in the program where execution is to
begin. (This is taken from the operand of the program’s END statement. If no
operand is specified, the address of the first executable instruction is used.)
The formats we use for these records are as follows. The details of the for-
mats (column numbers, etc.) are arbitrary; however, the information contained
in these records must be present (in some form) in the object program.
Header record:
Col.1 H
Col. 2-7 Program name
Col. 8-13 Starting address of object program (hexadecimal)
Col. 14-19 Length of object program in bytes (hexadecimal)2.1 Basic Assembler Functions
Text record:
Col. 1 T
Col. 2-7 Starting address for object code in this record (hexadecimal)
Col. 89 Length of object code in this record in bytes (hexadecimal)
Col. 10-69 Object code, represented in hexadecimal (2 columns per
byte of object code)
End record:
Col.1 E
Col. 2-7 Address of first executable instruction in object program
(hexadecimal)
To avoid confusion, we have used the term column rather than byte to refer to
positions within object program records. This is not meant to imply the use of
any particular medium for the object program.
Figure 2.3 shows the object program corresponding to Fig. 2.2, using this
format. In this figure, and in the other object programs we display, the symbol
* is used to separate fields visually. Of course, such symbols are not present in
the actual object program. Note that there is no object code corresponding to
addresses 1033-2038. This storage is simply reserved by the loader for use by
the program during execution. (Chapter 3 contains a detailed discussion of the
operation of the loader.)
We can now give a general description of the functions of the two passes of
our simple assembler.
YEOPY 001000901074
7001009184 410334820390010362810393010154820613C10030010240C103900102D
{9010181 5010364820610810334C0000454F46000003000000
9020391804 103000103002050302038D820552810393020575490392C205E38203F
790205 7,161010366c0009F 1,001000041030£02079.302064509039DC20792c1036
90207307.3820644co00905
5901000
Figure 2.3 Object program corresponding to Fig. 2.2.
4950
Chapter 2 Assemblers
Pass 1 (define symbols):
1. Assign addresses to all statements in the program.
2. Save the values (addresses) assigned to all labels for use in Pass 2
3. Perform some processing of assembler directives. (This includes
processing that affects address assignment, such as determining
the length of data areas defined by BYTE, RESW, etc.)
Pass 2 (assemble instructions and generate object program):
1. Assemble instructions (translating operation codes and looking
up addresses)
2. Generate data values defined by BYTE, WORD, etc.
3. Perform processing of assembler directives not done during Pass 1.
4, Write the object program and the assembly listing.
In the next section we discuss these functions in more detail, describe the in-
ternal tables required by the assembler, and give an overall description of the
logic flow of each pass.
2.1.2 Assembler Algorithm and Data Structures
Our simple assembler uses two major internal data structures: the Operation
Code Table (OPTAB) and the Symbol Table (SYMTAB). OPTAB is used to look
up mnemonic operation codes and translate them to their machine language
equivalents. SYMTAB is used to store values (addresses) assigned to labels.
We also need a Location Counter LOCCTR. This is a variable that is used
to help in the assignment of addresses. LOCCTR is initialized to the beginning
address specified in the START statement. After each source statement is
processed, the length of the assembled instruction or data area to be generated
is added to LOCCTR. Thus whenever we reach a label in the source program,
the current value of LOCCTR gives the address to be associated with that
label
The Operation Code Table must contain (at least) the mnemonic operation
code and its machine language equivalent. In more complex assemblers, this
table also contains information about instruction format and length. During
Pass 1, OPTAB is used to look up and validate operation codes in the source
program. In Pass 2, it is used to translate the operation codes to machine lan-
guage. Actually, in our simple SIC assembler, both of these processes could be
done together in either Pass 1 or Pass 2. However, for a machine (such as
SIC/XE) that has instructions of different lengths, we must search OPTAB in
the first pass to find the instruction length for incrementing LOCCTR.. 2.1 Basic Assembler Functions
Likewise, we must have the information from OPTAB in Pass 2 to tell us
which instruction format to use in assembling the instruction, and any pecu-
liarities of the object eqde instruction. We have chosen to retain this structure
in the current discussion because it is typical of most real assemblers.
OPTAB is usually organized as a hash table, with mnemonic operation
code as the key. (The information in OPTAB is, of course, predefined when the
assembler itself is written, rather than being loaded into the table at execution
time.) The hash table organization is particularly appropriate, since it provides
fast retrieval with a minimum of searching. In most cases, OPTAB is a static
table—that is, entries are not normally added to or deleted from it. In such
cases it is possible to design a special hashing function or other data structure
to give optimum performance for the particular set of keys being stored. Most
of the time, however, a general-purpose hashing method is used. Further in
formation about the design and construction of hash tables may be found in
any good data structures text, such as Lewis and Denenberg (1991) or Knuth
(1973).
The symbol table (SYMTAB) includes the name and value (address) for
each label in the source program, together with flags to indicate error condi-
tions (e.g., a symbol defined in two different places). This table may also con-
tain other information about the data area or instruction labeled—for example,
its type or length. During Pass 1 of the assembler, labels are entered into
SYMTAB as they are encountered in the source program, along with their as-
signed addresses (from LOCCTR). During Pass 2, symbols used as operands
are looked up in SYMTAB to obtain the addresses to be inserted in the assem-
bled instructions.
SYMTAB is usually organized as a hash table for efficiency of insertion and
retrieval. Since entries are rarely (if ever) deleted from this table, efficiency of
deletion is not an important consideration. Because SYMTAB is used heavily
throughout the assembly, care should be taken in the selection of a hashing
function. Programmers often select many labels that have similar characteris-
tics—for example, labels that start or end with the same characters (like
LOOP1, LOOP2, LOOPA) or are of the same length (like A, X, Y, Z). It is im-
portant that the hashing function used perform well with such non-random
keys. Division of the entire key by a prime table length often gives good
results.
It is possible for both passes of the assembler to read the original source
program as input. However, there is certain information (such as location
counter values and error flags for statements) that can or should be communi-
cated between the two passes. For this reason, Pass 1 usually writes an inter-
mediate file that contains each source statement together with its assigned
address, error indicators, etc. This file is used as the input to Pass 2. This work-
ing copy of the source program can also be used to retain the results of certain
5152
Chapter 2 Assemblers
operations that may be performed during Pass 1 (such as scanning the
operand field for symbols and addressing flags), so these need not be per-
formed again during Pass 2. Similarly, pointers into OPTAB and SYMTAB may
be retained for each operation code and symbol used. This avoids the need to
repeat many of the table-searching operations.
Figures 2.4(a) and (b) show the logic flow of the two passes of our assem-
bler. Although described for the simple assembler we are discussing, this is
also the underlying logic for more complex two-pass assemblers that we con-
sider later. We assume for simplicity that the source lines are written in a fixed
format with fields LABEL, OPCODE, and OPERAND. If one of these fields
contains a character string that represents a number, we denote its numeric
value with the prefix # (for example, (OPERAND)).
At this stage, it is very important for you to understand thoroughly the al-
gorithms in Fig. 2.4. You are strongly urged to follow through the logi
these algorithms, applying them by hand to the program in Fig. 2.1 to produce
the object program of Fig. 2.3.
Much of the detail of the assembler logic has, of course, been left out to
emphasize the overall structure and main concepts. You should think about
these details for yourself, and you should also attempt to identify those func-
tions of the assembler that should be implemented as separate procedures or
modules. (For example, the operations “search symbol table” and “read input
line” might be good candidates for such implementation.) This kind of
thoughtful analysis should be done before you make any attempt to actually
implement an assembler or any other large piece of software.
Chapter 8 contains an introduction to software engineering tools and tech-
niques, and illustrates the use of such techniques in designing and implement-
ing a simple assembler. You may want to read this material now to gain
further insight into how an assembler might be constructed.
2.2 MACHINE-DEPENDENT
ASSEMBLER FEATURES
In this section, we consider the design and implementation of an assembler for
the more complex XE version of SIC. In doing so, we examine the effect of the
extended hardware on the structure and functions of the assembler. Many real
machines have certain architectural features that are similar to those we con-
sider here. Thus our discussion applies in large part to these machines as well
as to SIC/XE.
Figure 2.5 shows the example program from Fig. 2.1 as it might be rewrit-
ten to take advantage of the SIC/XE instruction set. In our assembler lan-
guage, indirect addressing is indicated by adding the prefix @ to the operand+2.2 Machine-Dependent Assembler Features
Pass 1:
begin
read first input line
if OPCODE = ‘START’ then
begin
save #[OPERAND] as starting address
initialize LOCCTR to starting address
write line to intermediate file
read next input line
end {if START)
els
initialize LOCCTR to 0
while OPCODE # ‘END’ do
begin
if this is not a comment line then
begin
if there is a symbol in the LABEL field then
begin
search SYMTAB for LABEL
4 found then
set error flag (duplicate symbol)
else
insert (LABEL,LOCCTR) into SYMTAB
end {if symbol)
search OPTAB for OPCODE
if found then
add 3 (instruction length} to LOccTR
else if OPCODE = ‘WORD’ then
add 3 to LOCCTR
else if OPCODE = ‘RESW’ then
add 3 * #{OPERAND) to LOCCTR
else if OPCODE = ‘RESB’ then
add #[OPERAND] to LOCCTR
else if OPCODE = ‘BYTE’ then
begin
find length of constant in bytes
add length to LOCCTR
end (if BYTE)
else
set error flag (invalid operation code)
end (if not a comment)
write line to intermediate file
read next input line
end (while not END)
write last line to intermediate file
save (LOCCTR - starting address) as program length
end (Pass 1)
Figure 2.4(a) Algorithm for Pass 1 of assembler.Chapter 2 Assemblers
Pass 2:
begin
read first input line (from intermediate file}
if OPCODE = ‘START’ then
begin
write listing line
read next input line
end (if START)
write Header record to object program
initialize first Text record
while OPCODE # ‘END’ do
begin
4f this is not a comment line then
begin
search OPTAB for OPCODE
Af found then
begin
if there is a symbol in OPERAND field then
begin
search SYMTAB for OPERAND
if found then
store symbol value as operand address
else
begin
store 0 as operand address
set error flag (undefined symbol)
end
end {if symbol}
el
store 0 as operand address
assemble the object code instruction
end (if opcode found}
else if OPCODE = ‘BYTE’ or ‘WORD’ then
convert constant to object code
Af object code will not fit into the current Text record then
begin
write Text record to object program
initialize new Text record
end
add object code to Text record
end (if not comment)
write listing line
read next input line
end (while not END)
write last Text record to object program
write End record to object program
write last listing line :
end (Pass 2)
Figure 2.4(b) Algorithm for Pass 2 of assembler.Line
10
12
B
15
20
25,
30
35
40
45
50
55
60
65
70
80
95
100
105
110
115
120
125
130
132
133
135
140
145
150
155
160
165
170
175
180
185
195
200
205
210
212
215
220
225
230
235
240
245
250
255,
Source statement
cory smrT 0
FIRST STL RETADR
0B #LENGTH
BASE LENGTH
coop = -+JSUB = RDREC
LDA LENGTH
comp 40
JEQ ENDFIL
+3SUB © WRREC
gz cLooP
ENDFIL «= LOA OF
STA BUFFER
WA 8B
STA LENGTH
+USUB_WRREC
J @RETADR
OF BYTE —C'EOF’
RETADR = -RESW 1
LENGTH «RESW1
BUFFER ESB (4096
*2.2 Machine-Dependent Assembler Features 55
COPY FILE FROM INPUT TO OUTPUT
‘SAVE. RETURN ADDRESS
ESTABLISH BASE REGISTER
READ INPUT RECORD
TEST FOR BOF (LENGTH = 0)
EXIT IF BOF FOUND
WRITE OUTPUT RECORD
L00P
INSERT END OF FILE MARKER
SET LENGTH = 3
WRITE BOF
RETURN TO CALLER
LENGTH OF RECORD
4096-BYTE BUFFER AREA
SUBROUTINE TO READ RECORD INTO BUFFER
CLEAR |X
CLEAR A
CLEARS
sur #4096
RWOOP = 1D INPUT
JEQ RLOOP
RD INPUT
COMPR A,S
EQ maT
SICH BUFFER, X
TR 7
our RLOOP
Exar sm LENGTH
RSUB
INPUT BYTE X'FL’
CLEAR LOOP COUNTER
CLEAR A TO ZERO
CLEAR $ TO ZERO
‘TEST INPUT DEVICE
‘WOOP UNTIL READY
READ CHARACTER INTO REGISTER A
TEST FOR END OF RECORD (X’00")
EXIT LOOP IF FOR
STORE CHARACTER IN BUFFER
LOOP UNLESS MAX LENGTH
HAS BEEN REACHED
SAVE RECORD LENGTH
RETURN TO CALLER
CODE FOR INPUT DEVICE
SUBROUTINE TO WRITE RECORD FROM BUFFER
CLEAR xX
Lor LENGTH
woop = 1D ourpur
BQ WLOOP
LOCH BUFFER, X
wo ourpur
TOR oT
our LOOP
SUB
ourpur BYTE «x"05"
=D FIRST
CLEAR LOOP COUNTER
TEST OUTPUT DEVICE
LOOP UNTIL READY
GET CHARACTER FROM BUFFER
WRITE CHARACTER
LOOP UNTIL ALL CHARACTERS
HAVE BEEN WRITTEN
RETURN TO CALLER
CODE FOR OUTPUT DEVICE
Figure 2.5 Example of a SIC/XE program.56
Chapter 2. Assemblers
(see line 70). Immediate operands are denoted with the prefix # (lines 25, 55,
133). Instructions that refer to memory are normally assembled using either
the program-counter relative or the base relative mode. The assembler direc-
tive BASE (line 13) is used in conjunction with base relative addressing. (See
Section 2.2.1 for a discussion and examples.) If the displacements required for
both program-counter relative and base relative addressing are too large to fit
into a 3-byte instruction, then the 4-byte extended format (Format 4) must be
used. The extended instruction format is specified with the prefix + added to
the operation code in the source statement (see lines 15, 35, 65). It is the pro-
grammer’s responsibility to specify this form of addressing when it is re-
quired.
The main differences between this version of the program and the version
in Fig. 2.1 involve the use of register-to-register instructions (in place of regis-
ter-to-memory instructions) wherever possible. For example, the statement on
line 150 is changed from COMP ZERO to COMPR A‘S. Similarly, line 165 is
changed from TIX MAXLEN to TIXR T. In addition, immediate and indirect
addressing have been used as much as possible (for example, lines 25, 55, and
70).
These changes take advantage of the more advanced SIC/XE architecture
to improve the execution speed of the program. Register-to-register instruc-
tions are faster than the corresponding register-to-memory operations because
they are shorter, and, more importantly, because they do not require another
memory reference. (Fetching an operand from a register is much faster than re-
trieving it from main memory.) Likewise, when using immediate addressing,
the operand is already present as part of the instruction and need not be
fetched from anywhere. The use of indirect addressing often avoids the need
for another instruction (as in the “return” operation on line 70). You may no-
tice that some of the changes require the addition of other instructions to the
program. For example, changing COMP to COMPR on line 150 forces us to
add the CLEAR instruction on line 132. This still results in an improvement in
execution speed. The CLEAR is executed only once for each record read,
whereas the benefits of COMPR (as opposed to COMP) are realized for every
byte of data transferred.
In Section 2.2.1, we examine the assembly of this SIC/XE program, focus-
ing on the differences in the assembler that are required by the new addressing
modes. (You may want to briefly review the instruction formats and target ad-
dress calculations described in Section 1.3.2.) These changes are direct conse-
quences of the extended hardware functions.
Section 2.2.2 discusses an indirect consequence of the change to SIC/XE.
The larger main memory of SIC/XE means that we may have room to load
and run several programs at the same time. This kind of sharing of the ma-
chine between programs is called multiprogramming. Such sharing often results
in more productive use of the hardware. (We discuss this concept, and its