UNIT : I
SYSTEM PROGRAMMING
II SEMESTER (MCSE 201)
PREPARED BY ARUN PRATAP SINGH
OVERVIEW OF LANGUAGE PROCESSORS :
System programming (or systems programming) is the activity of programming computer system
software. The primary characteristic that distinguishes systems programming from
application programming is that application programming aims to produce software which
provides services to the user (e.g. a word processor), whereas systems programming aims to
produce software which provides services to the computer hardware (e.g. a disk defragmenter).
It therefore requires a greater degree of hardware awareness.
The following attributes characterize systems programming:
The programmer will make assumptions about the hardware and other properties of the
system that the program runs on, and will often exploit those properties, for example by using
an algorithm that is known to be efficient when used with specific hardware.
Usually a low-level programming language or programming language dialect is used that:
can operate in resource-constrained environments
is very efficient and has little runtime overhead
has a small runtime library, or none at all
allows for direct and "raw" control over memory access and control flow
lets the programmer write parts of the program directly in assembly language
Systems programs often cannot be run in a debugger. Running the program in a simulated
environment can sometimes reduce this problem.
Systems programming is sufficiently different from application programming that programmers
tend to specialize in one or the other.
In systems programming, the available programming facilities are often limited. Automatic
garbage collection is uncommon, and debugging is sometimes hard to do. The runtime library, if
available at all, is usually far less powerful and does less error checking. Because of these
limitations, monitoring and logging are often used; operating systems may have extremely
elaborate logging subsystems.
Implementing certain parts of operating systems and networking software requires systems
programming, for example implementing paging (virtual memory) or a device driver for an operating system.
Division of language processors
ELEMENTS OF ASSEMBLY LANGUAGE PROGRAMMING :
An assembly language is a low-level programming language for a computer, or other
programmable device, in which there is a very strong (generally one-to-one) correspondence
between the language and the architecture's machine code instructions. Each assembly language
is specific to a particular computer architecture, in contrast to most high-level programming
languages, which are generally portable across multiple architectures, but
require interpreting or compiling.
A utility program called an assembler is used to translate assembly language statements into the
target computer's machine code. The assembler performs a more or less isomorphic translation
(a one-to-one mapping) from mnemonic statements into machine instructions and data. This is in
contrast with high-level languages, in which a single statement generally results in many machine
instructions.
Many sophisticated assemblers offer additional mechanisms to facilitate program development,
control the assembly process, and aid debugging. In particular, most modern assemblers include
a macro facility (described below), and are called macro assemblers.
Elements of Assembly Language :-
Assembly language is basically like any other language, which means that it has its words, rules
and syntax. The basic elements of assembly language are:
Labels;
Orders (instructions);
Directives; and
Comments.
Key concepts :
Assembler
An assembler is a program which creates object code by translating combinations of mnemonics
and syntax for operations and addressing modes into their numerical equivalents. This
representation typically includes an operation code ("opcode") as well as other control bits. The
assembler also calculates constant expressions and resolves symbolic names for memory
locations and other entities. The use of symbolic references is a key feature of assemblers, saving
tedious calculations and manual address updates after program modifications. Most assemblers
also include macro facilities for performing textual substitution, e.g., to generate common short
sequences of instructions inline, instead of as called subroutines.
Some assemblers may also be able to perform some simple types of instruction-set-specific
optimizations. One concrete example is the ubiquitous x86 assemblers from various vendors:
most of them can, on request, perform jump-instruction replacements (long jumps replaced by
short or relative jumps) in any number of passes. Others may even do simple rearrangement or
insertion of instructions; for example, some assemblers for RISC architectures can help produce
a sensible instruction schedule that exploits the CPU pipeline as efficiently as possible.
Like early programming languages such as Fortran, Algol, COBOL and Lisp, assemblers have been
available since the 1950s and the first generations of text-based computer interfaces. However,
assemblers came first, as they are far simpler to write than compilers for high-level languages.
This is because each mnemonic, along with the addressing modes and operands of an instruction,
translates rather directly into the numeric representation of that particular instruction, without
much context or analysis. There have also been several classes of translators and semi-automatic
code generators with properties similar to both assembly and high-level languages,
with Speedcode as perhaps one of the better known examples.
Number of passes
There are two types of assemblers based on how many passes through the source are needed
to produce the executable program.
One-pass assemblers go through the source code once. Any symbol used before it is defined
will require "errata" at the end of the object code (or, at least, no earlier than the point where
the symbol is defined) telling the linker or the loader to "go back" and overwrite a placeholder
which had been left where the as yet undefined symbol was used.
Multi-pass assemblers create a table with all symbols and their values in the first passes, then
use the table in later passes to generate code.
In both cases, the assembler must be able to determine the size of each instruction on the initial
passes in order to calculate the addresses of subsequent symbols. This means that if the size of
an operation referring to an operand defined later depends on the type or distance of the operand,
the assembler will make a pessimistic estimate when first encountering the operation, and if
necessary pad it with one or more "no-operation" instructions in a later pass or the errata. In an
assembler with peephole optimization, addresses may be recalculated between passes to allow
replacing pessimistic code with code tailored to the exact distance from the target.
The original reason for the use of one-pass assemblers was speed of assembly: often a second
pass would require rewinding and rereading a tape, or rereading a deck of cards. With modern
computers this has ceased to be an issue. The advantage of the multi-pass assembler is that the
absence of errata makes the linking process (or the program load, if the assembler directly
produces executable code) faster.
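The difference between the two approaches is easiest to see on a toy example. The following Python sketch of a two-pass assembler assumes a hypothetical machine: the mnemonics LOAD, ADD, STORE, JMP and HALT, the WORD directive, their opcodes and sizes are all invented for illustration. Pass 1 only advances a location counter and records label addresses in a symbol table; Pass 2 uses the completed table to emit code.

```python
# Hypothetical toy machine: mnemonic -> (opcode, size in words).
# Instructions of size 2 take one address operand.
OPTAB = {"LOAD": (0x1, 2), "ADD": (0x2, 2), "STORE": (0x3, 2),
         "JMP": (0x4, 2), "HALT": (0xF, 1)}


def parse(line):
    """Split 'LABEL: MNEMONIC OPERAND ; comment' into (label, mnemonic, operand)."""
    text = line.split(";", 1)[0].strip()
    label = None
    if ":" in text:
        label, text = (part.strip() for part in text.split(":", 1))
    fields = text.split()
    mnemonic = fields[0] if fields else None
    operand = fields[1] if len(fields) > 1 else None
    return label, mnemonic, operand


def size_of(mnemonic):
    # WORD is a directive that reserves one word holding a constant.
    return 1 if mnemonic == "WORD" else OPTAB[mnemonic][1]


def pass_one(lines, origin=0):
    """Pass 1: advance the location counter and record each label's address."""
    symtab, lc = {}, origin
    for line in lines:
        label, mnemonic, _ = parse(line)
        if label:
            if label in symtab:
                raise ValueError(f"duplicate label {label}")
            symtab[label] = lc
        if mnemonic:
            lc += size_of(mnemonic)
    return symtab


def pass_two(lines, symtab):
    """Pass 2: translate each statement using the now complete symbol table."""
    code = []
    for line in lines:
        _, mnemonic, operand = parse(line)
        if mnemonic is None:
            continue
        if mnemonic == "WORD":
            code.append(int(operand))
            continue
        opcode, size = OPTAB[mnemonic]
        code.append(opcode)
        if size == 2:
            code.append(symtab[operand])  # forward references are already known
    return code


program = [
    "START: LOAD X",
    "       ADD  Y",
    "       STORE X",
    "       JMP  DONE   ; forward reference",
    "DONE:  HALT",
    "X:     WORD 5",
    "Y:     WORD 7",
]

symtab = pass_one(program)
print(symtab)                      # {'START': 0, 'DONE': 8, 'X': 9, 'Y': 10}
print(pass_two(program, symtab))   # [1, 9, 2, 10, 3, 9, 4, 8, 15, 5, 7]
```

Because the symbol table is complete before Pass 2 starts, the forward reference in JMP DONE is resolved without any placeholder or errata; the price is reading the source twice.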
High-level assemblers
More sophisticated high-level assemblers provide language abstractions such as:
Advanced control structures
High-level procedure/function declarations and invocations
High-level abstract data types, including structures/records, unions, classes, and sets
Sophisticated macro processing (although available on ordinary assemblers since the late
1950s for the IBM 700 series and since the 1960s for the IBM System/360, amongst other machines)
Object-oriented programming features such as classes, objects, abstraction, polymorphism,
and inheritance
Assembly language
A program written in assembly language consists of a series of (mnemonic) processor instructions
and meta-statements (known variously as directives, pseudo-instructions and pseudo-ops),
comments and data. Assembly language instructions usually consist of an opcode mnemonic
followed by a list of data, arguments or parameters. These are translated by
an assembler into machine language instructions that can be loaded into memory and executed.
For example, the instruction below tells an x86/IA-32 processor to move an immediate 8-bit
value into a register. The binary code for this instruction is 10110 followed by a 3-bit identifier for
which register to use. The identifier for the AL register is 000, so the following machine code loads
the AL register with the data 01100001.
10110000 01100001
This binary computer code can be made more human-readable by expressing it in hexadecimal as
follows.
B0 61
Here, B0 means 'Move a copy of the following value into AL', and 61 is a hexadecimal
representation of the value 01100001, which is 97 in decimal. Intel assembly language provides
the mnemonic MOV (an abbreviation of move) for instructions such as this, so the machine code
above can be written as follows in assembly language, complete with an explanatory comment if
required, after the semicolon. This is much easier to read and to remember.
MOV AL, 61h ; Load AL with 97 decimal (61 hex)
In some assembly languages the same mnemonic such as MOV may be used for a family of
related instructions for loading, copying and moving data, whether these are immediate values,
values in registers, or memory locations pointed to by values in registers. Other assemblers may
use separate opcodes such as L for "move memory to register", ST for "move register to memory",
LR for "move register to register", MVI for "move immediate operand to memory", etc.
The Intel opcode 10110000 (B0) copies an 8-bit value into the AL register, while 10110001 (B1)
moves it into CL and 10110010 (B2) does so into DL. Assembly language examples for these
follow.
[6]
MOV AL, 1h ; Load AL with immediate value 1
MOV CL, 2h ; Load CL with immediate value 2
MOV DL, 3h ; Load DL with immediate value 3
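The encoding rule described above (the bits 10110 followed by a 3-bit register identifier, then the immediate byte) can be checked with a short Python sketch. It covers only the 8-bit register form of MOV with an immediate operand (opcodes B0-B7); the function name is hypothetical, and the register table uses the standard x86 numbering of the 8-bit registers.

```python
# Standard x86 3-bit codes for the 8-bit general registers.
REG8 = {"AL": 0b000, "CL": 0b001, "DL": 0b010, "BL": 0b011,
        "AH": 0b100, "CH": 0b101, "DH": 0b110, "BH": 0b111}


def encode_mov_r8_imm8(reg, value):
    """Encode 'MOV reg, value' as the byte 10110rrr followed by the 8-bit immediate."""
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate value must fit in 8 bits")
    opcode = 0b10110000 | REG8[reg]   # 10110 in the top five bits, register in the low three
    return bytes([opcode, value])


for reg, value in [("AL", 0x61), ("AL", 0x1), ("CL", 0x2), ("DL", 0x3)]:
    print(f"MOV {reg}, {value:X}h ->", encode_mov_r8_imm8(reg, value).hex(" ").upper())
# MOV AL, 61h -> B0 61
# MOV AL, 1h -> B0 01
# MOV CL, 2h -> B1 02
# MOV DL, 3h -> B2 03
```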
The syntax of MOV can also be more complex as the following examples show.
[7]
MOV EAX, [EBX] ; Move the 4 bytes in memory at the address contained in EBX into EAX
MOV [ESI+EAX], CL ; Move the contents of CL into the byte at address ESI+EAX
In each case, the MOV mnemonic is translated directly by an assembler into one of the opcodes
88-8C, 8E, A0-A3, B0-BF, C6 or C7, and the programmer does not have to know or remember
which.
Transforming assembly language into machine code is the job of an assembler, and the reverse
can at least partially be achieved by a disassembler. Unlike high-level languages, there is usually
a one-to-one correspondence between simple assembly statements and machine language
instructions. However, in some cases, an assembler may provide pseudo instructions (essentially
macros) which expand into several machine language instructions to provide commonly needed
functionality. For example, for a machine that lacks a "branch if greater or equal" instruction, an
assembler may provide a pseudo instruction that expands to the machine's "set if less than" and
"branch if zero (on the result of the set instruction)". Most full-featured assemblers also provide a
rich macro language (discussed below) which is used by vendors and programmers to generate
more complex code and data sequences.
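A minimal sketch of how an assembler might expand such a pseudo-instruction. It assumes MIPS-style syntax (the pseudo-instruction bge, the real instructions slt and beq, the assembler temporary register $at and the register $zero); the helper function names are hypothetical. Since rs >= rt is the same as NOT (rs < rt), the expansion sets a temporary register to (rs < rt) and branches when that result is zero.

```python
def expand_bge(rs, rt, label, temp="$at"):
    """Expand the pseudo-instruction 'bge rs, rt, label' (branch if rs >= rt).

    rs >= rt is equivalent to NOT (rs < rt): set temp to (rs < rt), then
    branch to label when temp is zero.
    """
    return [f"slt {temp}, {rs}, {rt}",
            f"beq {temp}, $zero, {label}"]


def expand_pseudo_ops(lines):
    """Replace every bge pseudo-instruction; copy real instructions unchanged."""
    out = []
    for line in lines:
        op, _, rest = line.strip().partition(" ")
        if op == "bge":
            rs, rt, label = (field.strip() for field in rest.split(","))
            out.extend(expand_bge(rs, rt, label))
        else:
            out.append(line.strip())
    return out


for line in expand_pseudo_ops(["bge $t0, $t1, done", "addi $t0, $t0, 1"]):
    print(line)
# slt $at, $t0, $t1
# beq $at, $zero, done
# addi $t0, $t0, 1
```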
Each computer architecture has its own machine language. Computers differ in the number and
type of operations they support, in the different sizes and numbers of registers, and in the
representations of data in storage. While most general-purpose computers are able to carry out
essentially the same functionality, the ways they do so differ; the corresponding assembly
languages reflect these differences.
Multiple sets of mnemonics or assembly-language syntax may exist for a single instruction set,
typically instantiated in different assembler programs. In these cases, the most popular one is
usually that supplied by the manufacturer and used in its documentation.
DESIGN OF ASSEMBLER :
ONE PASS ASSEMBLER :
One-pass assemblers go through the source code once. Any symbol used before it is defined will
require "errata" at the end of the object code (or, at least, no earlier than the point where the
symbol is defined) telling the linker or the loader to "go back" and overwrite a placeholder which
had been left where the as yet undefined symbol was used.
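A minimal Python sketch of the one-pass approach with errata. It assumes a hypothetical toy machine (the mnemonics LOAD, ADD, STORE, JMP, HALT, the WORD directive and the opcodes are invented) and a load origin of 0. A forward reference gets a placeholder word and a pending fix-up; after the single pass, the fix-ups are emitted as errata records (address, value), and a loader goes back and overwrites the placeholders.

```python
# Hypothetical toy machine: every instruction except HALT takes one address operand.
OPTAB = {"LOAD": 0x1, "ADD": 0x2, "STORE": 0x3, "JMP": 0x4, "HALT": 0xF}


def assemble_one_pass(lines):
    """Single pass over the source; returns (code, errata) for origin 0."""
    code, symtab, pending = [], {}, []
    for line in lines:
        text = line.split(";", 1)[0].strip()          # drop comments
        if ":" in text:
            label, text = (part.strip() for part in text.split(":", 1))
            symtab[label] = len(code)                 # location counter = words so far
        fields = text.split()
        if not fields:
            continue
        mnemonic = fields[0]
        if mnemonic == "WORD":                        # directive: one constant word
            code.append(int(fields[1]))
            continue
        code.append(OPTAB[mnemonic])
        if mnemonic == "HALT":
            continue
        operand = fields[1]
        if operand in symtab:                         # backward reference: address known
            code.append(symtab[operand])
        else:                                         # forward reference: placeholder
            pending.append((len(code), operand))
            code.append(0)
    # Errata emitted after the code: (address of placeholder, value to store there).
    errata = [(addr, symtab[name]) for addr, name in pending]  # KeyError => undefined
    return code, errata


def load(code, errata):
    """Loader side: copy the code, then patch each placeholder named in the errata."""
    memory = list(code)
    for addr, value in errata:
        memory[addr] = value
    return memory


program = [
    "START: LOAD X",
    "       ADD  Y",
    "       STORE X",
    "       JMP  DONE",
    "DONE:  HALT",
    "X:     WORD 5",
    "Y:     WORD 7",
]

code, errata = assemble_one_pass(program)
print(code)                # [1, 0, 2, 0, 3, 0, 4, 0, 15, 5, 7]
print(errata)              # [(1, 9), (3, 10), (5, 9), (7, 8)]
print(load(code, errata))  # [1, 9, 2, 10, 3, 9, 4, 8, 15, 5, 7]
```

The source is read only once, but every forward reference costs one erratum that must be processed later, which is the trade-off described above.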
MULTI-PASS ASSEMBLER:
Multi-pass assemblers create a table with all symbols and their values in the first passes, then
use the table in later passes to generate code.
In both cases, the assembler must be able to determine the size of each instruction on the initial
passes in order to calculate the addresses of subsequent symbols. This means that if the size of
an operation referring to an operand defined later depends on the type or distance of the operand,
the assembler will make a pessimistic estimate when first encountering the operation, and if
necessary pad it with one or more "no-operation" instructions in a later pass or the errata. In an
assembler with peephole optimization, addresses may be recalculated between passes to allow
replacing pessimistic code with code tailored to the exact distance from the target.
The original reason for the use of one-pass assemblers was speed of assembly: often a second
pass would require rewinding and rereading a tape, or rereading a deck of cards. With modern
computers this has ceased to be an issue. The advantage of the multi-pass assembler is that the
absence of errata makes the linking process (or the program load, if the assembler directly
produces executable code) faster.
The differences between one-pass and two-pass assemblers are:-
A one-pass assembler passes over the source file exactly once, in the same pass collecting the
labels, resolving forward references and doing the actual assembly. The difficult part is
resolving forward label references while assembling code in a single pass.
A two-pass assembler makes two passes over the source file (the second pass can be over a file
generated in the first pass). In the first pass, all it does is look for label definitions and
enter them, together with their addresses, in the symbol table. In the second pass, after the
symbol table is complete, it does the actual assembly by translating the operations and so on.
MACRO DEFINITION :
A macro (short for "macroinstruction") in computer science is a rule or pattern that specifies how
a certain input sequence (often a sequence of characters) should be mapped to a replacement
output sequence (also often a sequence of characters) according to a defined procedure. The
mapping process that instantiates (transforms) a macro use into a specific sequence is known as
macro expansion. A facility for writing macros may be provided as part of a software application or
as a part of a programming language. In the former case, macros are used to make tasks using
the application less repetitive. In the latter case, they are a tool that allows a programmer to
enable code reuse or even to design domain-specific languages.
Macros are used to make a sequence of computing instructions available to the programmer as
a single program statement, making the programming task less tedious and less error-
prone. (Thus, they are called "macros" because a big block of code can be expanded from
a small sequence of characters). Macros often allow positional or keyword parameters that dictate
what the conditional assembler program generates and have been used to create
entire programs or program suites according to such variables as operating system, platform or
other factors. The term derives from "macro instruction", and such expansions were originally
used in generating assembly language code.
A macro instruction (macro) is a notational convenience for the programmer.
It allows the programmer to write a shorthand version of a program (modular
programming).
The macro processor replaces each macro instruction with the corresponding
group of source language statements (expanding).
Normally, it performs no analysis of the text it handles.
It is not concerned with the meaning of the statements involved during macro
expansion.
The design of a macro processor generally is machine independent!
Macro:-
Macro instructions are single-line abbreviations for groups of instructions.
Using a macro, a programmer can define a single instruction to represent a block of
code.
Macro Expansion
Replacement of a macro call by the corresponding sequence of instructions is called
macro expansion.
Example of macro definition-
A macro invocation statement (a macro call) gives the name of the macro instruction
being invoked and the arguments to be used in expanding the macro.
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: statements of the macro body are expanded each time the macro is
invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how
many times the subroutine is called.
How does a programmer decide to use macro calls or procedure calls?
From the viewpoint of a programmer
From the viewpoint of the CPU
Example: more than one argument-
A 1,data1
A 2,data2
A 3,data3
:
A 1,data3
A 2,data2
A 3,data1
:
data1 DC F'5'
data2 DC F'6'
Two ways of specifying arguments to a macro call-
A) Positional arguments
Arguments are matched with dummy arguments according to the order in which they appear.
INCR A,B,C
A replaces the first dummy argument
B replaces the second dummy argument
C replaces the third dummy argument
B) Keyword arguments
This allows reference to dummy arguments by name as well as by position.
e.g.
INCR &arg1=A, &arg3=C, &arg2=B
e.g.
INCR &arg1 = &arg2 = A, &arg2 =C
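The two ways of matching actual arguments to dummy arguments can be sketched in Python. The function name and the list-of-strings call format are assumptions made for illustration, and the sketch assumes that positional arguments come before any keyword arguments in a call.

```python
def bind_arguments(dummies, actuals):
    """Bind the actual arguments of a macro call to the macro's dummy arguments.

    dummies: dummy argument names in prototype order, e.g. ["&arg1", "&arg2", "&arg3"]
    actuals: operands of the call, positional ("A") or keyword ("&arg3=C")
    """
    binding, seen_keyword = {}, False
    for position, actual in enumerate(actuals):
        if "=" in actual:                              # keyword argument: match by name
            seen_keyword = True
            name, value = (part.strip() for part in actual.split("=", 1))
            if name not in dummies:
                raise ValueError(f"unknown dummy argument {name}")
        else:                                          # positional argument: match by order
            if seen_keyword:
                raise ValueError("positional argument after a keyword argument")
            if position >= len(dummies):
                raise ValueError("too many positional arguments")
            name, value = dummies[position], actual.strip()
        if name in binding:
            raise ValueError(f"{name} is given more than once")
        binding[name] = value
    return binding


dummies = ["&arg1", "&arg2", "&arg3"]
print(bind_arguments(dummies, ["A", "B", "C"]))
# {'&arg1': 'A', '&arg2': 'B', '&arg3': 'C'}
print(bind_arguments(dummies, ["&arg1=A", "&arg3=C", "&arg2=B"]))
# {'&arg1': 'A', '&arg3': 'C', '&arg2': 'B'}
```

Both calls bind every dummy argument to the same value; only the way each actual argument is located differs.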
Two-Pass Macro Processor
General Design Steps
Step 1: Specification of the problem
Step 2: Specification of databases
Step 3: Specification of database formats
Step 4: Algorithm
Specification of the problem:-
In Pass I, the macro definitions are located and stored in the Macro Definition Table (MDT),
and an entry for each macro is made in the Macro Name Table (MNT).
In Pass II, the macro calls are identified, each call is replaced by the corresponding macro
definition, and the actual arguments are substituted in the appropriate places.
Specification of databases:-
Pass 1:-
The input macro source program.
The output macro source program, to be used by Pass 2.
Macro Definition Table (MDT), used to store the bodies of macro definitions.
Macro Definition Table Counter (MDTC), used to indicate the next available entry in the MDT.
Macro Name Table (MNT), used to store the names of macros.
Macro Name Table Counter (MNTC), used to indicate the next available entry in the MNT.
Argument List Array (ALA), used to substitute index markers for dummy arguments
before storing a macro definition.
Pass 2:-
The copy of the input from Pass 1.
The output expanded source, to be given to the assembler.
MDT, created by Pass 1.
MNT, created by Pass 1.
Macro Definition Table Pointer (MDTP), used to indicate the next line of text to be used
during macro expansion.
Argument List Array (ALA), used to substitute macro-call arguments for the index
markers in the stored macro definition.
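The databases above can be tied together in a minimal Python sketch of the two-pass macro processor. It assumes IBM 360-style MACRO/MEND syntax, comma-separated positional arguments only, and Python dicts and lists standing in for the tables: the MNT maps a macro name to its MDT index, and the MDTC is simply the current length of the MDT list. Pass 1 uses an ALA to replace dummy arguments with index markers #1, #2, ... before storing a definition; Pass 2 builds a fresh ALA for each call and substitutes the actual arguments back while stepping through the MDT with the MDTP.

```python
import re


def pass1(source):
    """Pass 1: store macro definitions in MDT/MNT and copy all other lines for Pass 2."""
    mnt, mdt, output = {}, [], []
    lines = iter(source)
    for line in lines:
        if line.strip() != "MACRO":
            output.append(line)
            continue
        prototype = next(lines).strip()               # e.g. "INCR &arg1,&arg2,&arg3"
        name, _, params = prototype.partition(" ")
        # ALA for this definition: dummy argument -> index marker (#1, #2, ...).
        ala = {d.strip(): f"#{i}" for i, d in enumerate(params.split(","), 1) if d.strip()}
        mnt[name] = len(mdt)                          # MNT entry: MDT index of the prototype
        mdt.append(prototype)
        for body_line in lines:                       # read up to and including MEND
            stored = re.sub(r"&\w+", lambda m: ala.get(m.group(0), m.group(0)),
                            body_line.strip())
            mdt.append(stored)
            if stored == "MEND":
                break
    return mnt, mdt, output


def pass2(mnt, mdt, source):
    """Pass 2: replace each macro call with its MDT body, substituting the actual arguments."""
    expanded = []
    for line in source:
        name, _, args = line.strip().partition(" ")
        if name not in mnt:
            expanded.append(line)
            continue
        # ALA for this call: index-marker number -> actual argument (positional only).
        ala = {str(i): a.strip() for i, a in enumerate(args.split(","), 1)}
        mdtp = mnt[name] + 1                          # MDTP: skip the prototype line
        while mdt[mdtp] != "MEND":
            expanded.append(re.sub(r"#(\d+)", lambda m: ala[m.group(1)], mdt[mdtp]))
            mdtp += 1
    return expanded


source = [
    "MACRO",
    "INCR &arg1,&arg2,&arg3",
    "A 1,&arg1",
    "A 2,&arg2",
    "A 3,&arg3",
    "MEND",
    "INCR data1,data2,data3",
    "INCR data3,data2,data1",
    "data1 DC F'5'",
    "data2 DC F'6'",
]

mnt, mdt, intermediate = pass1(source)
print(mnt)   # {'INCR': 0}
print(mdt)   # ['INCR &arg1,&arg2,&arg3', 'A 1,#1', 'A 2,#2', 'A 3,#3', 'MEND']
for line in pass2(mnt, mdt, intermediate):
    print(line)
```

Expanding the two INCR calls this way reproduces the A 1,data1 ... A 3,data1 sequence from the multiple-argument example earlier in this unit.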
Specification of database format:-
Macro Names Table (MNT):
RELOCATING AND LINKING CONCEPTS :
Relocation is the process of assigning load addresses to various parts of a program and adjusting
the code and data in the program to reflect the assigned addresses. A linker usually performs
relocation in conjunction with symbol resolution, the process of searching files and libraries to
replace symbolic references or names of libraries with actual usable addresses in memory before
running a program. Although relocation is typically done by the linker at link time, it can also be
done at execution time by a relocating loader, or by the running program itself.
Relocation is typically done in two steps:
1. Each object file has various sections such as code, data, .bss, etc. To combine all the objects
into a single executable, the linker merges all sections of similar type into a single section
of that type. The linker then assigns run time addresses to each section and each symbol.
At this point, the code (functions) and data (global variables) will have unique run time
addresses.
2. Each section contains references to one or more symbols; these references must be modified so
that they point to the correct run time addresses, based on information stored in a relocation
table in the object file.
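The two steps can be illustrated with a small Python sketch. The object-file format used here (a list of words for a single code section, a table of defined symbols, and a relocation table of (offset, symbol) entries) is a simplified, hypothetical stand-in for real formats; only one section type is merged, and the base address 0x1000 is arbitrary.

```python
def link(objects, base=0x1000):
    """Merge the code sections of several object files and apply their relocation tables.

    Each object is a dict in a simplified, hypothetical format:
      "text":    list of words in its code section
      "symbols": {name: offset within this object's code} for symbols it defines
      "relocs":  [(offset, name)]: words to overwrite with the run time address of name
    """
    image, symtab, starts = [], {}, []
    # Step 1: merge the sections and assign run time addresses to sections and symbols.
    for obj in objects:
        start = base + len(image)
        starts.append(start)
        for name, offset in obj["symbols"].items():
            if name in symtab:
                raise ValueError(f"multiple definitions of {name}")
            symtab[name] = start + offset
        image.extend(obj["text"])
    # Step 2: patch every reference listed in each object's relocation table.
    for obj, start in zip(objects, starts):
        for offset, name in obj["relocs"]:
            image[start - base + offset] = symtab[name]  # KeyError => undefined symbol
    return image, symtab


main_obj = {"text": [0x10, 0, 0x20, 0], "symbols": {"main": 0},
            "relocs": [(1, "helper"), (3, "counter")]}
lib_obj = {"text": [0x30, 0x40, 0], "symbols": {"helper": 0, "counter": 2},
           "relocs": []}

image, symtab = link([main_obj, lib_obj])
print([hex(w) for w in image])
# ['0x10', '0x1004', '0x20', '0x1006', '0x30', '0x40', '0x0']
print({name: hex(addr) for name, addr in symtab.items()})
# {'main': '0x1000', 'helper': '0x1004', 'counter': '0x1006'}
```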
LINKING CONCEPT :
DESIGN OF LINKER :
Execution of Programs-
A program has to be transformed before it can be executed.
Many of these transformations perform memory bindings.
Accordingly, an address is called a compiled address, linked
address, etc., depending on the transformation that bound it.
Linking
The process of collecting and combining various pieces of code and data
into a single file that can be loaded (copied) into memory and executed.
Linking time
Can be done at compile time, i.e. when the source code is
translated
Or, at load time, i.e. when the program is loaded into memory
Or, even at run time.
Static Linker
Performs the linking at compile time.
Takes a collection of relocatable object files and command-line arguments
and generates a fully linked executable object file that can be loaded and
run.
Performs two main tasks
Symbol resolution: associate each symbol reference with exactly
one symbol definition
Relocation: relocate code and data sections and modify symbol
references to the relocated memory locations
Dynamic Linker
Performs the linking at load time or at run time.
Performing Relocation
$$\mathit{relocation\_factor}_P = \mathit{l\_origin}_P - \mathit{t\_origin}_P$$
$$l_{symb} = t_{symb} + \mathit{relocation\_factor}_P$$
Relocation Requirement
Linking Requirement
Name Table (NTAB) with the fields Symbol name and Linked_address
eg. : Pg: 231 Example : 7.9
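The relocation formula and the NTAB can be combined in a small Python sketch. The program format below is hypothetical: each program P lists its translated origin, its linked origin, its words keyed by translated address (each word is treated simply as an address field), its relocation requirement (the translated addresses of address-sensitive words), the public symbols it defines, and the words that refer to external symbols. Relocation adds relocation_factor_P to each address-sensitive word; linking fills each external reference with the linked_address found in the NTAB.

```python
def relocate_and_link(programs):
    """Relocate each program P and resolve its external references through an NTAB.

    Each program is a dict in a hypothetical format:
      "t_origin", "l_origin": translated and linked origins of P
      "code":   {translated address: word}; a word here is just an address field
      "rr":     set of translated addresses of address-sensitive words
      "public": {symbol: translated address} for symbols P defines for other programs
      "extrn":  {translated address: symbol} for words that refer to external symbols
    """
    # NTAB: symbol -> linked_address, using l_symb = t_symb + relocation_factor_P.
    ntab = {}
    for p in programs:
        factor = p["l_origin"] - p["t_origin"]           # relocation_factor_P
        for symbol, t_addr in p["public"].items():
            ntab[symbol] = t_addr + factor
    memory = {}
    for p in programs:
        factor = p["l_origin"] - p["t_origin"]
        for t_addr, word in p["code"].items():
            if t_addr in p["rr"]:
                word += factor                            # relocation
            if t_addr in p["extrn"]:
                word = ntab[p["extrn"][t_addr]]           # linking via NTAB
            memory[t_addr + factor] = word
    return memory, ntab


P = {"t_origin": 500, "l_origin": 900,
     "code": {500: 540, 501: 0, 540: 7},
     "rr": {500}, "public": {"LOOP": 540}, "extrn": {501: "ALPHA"}}
Q = {"t_origin": 200, "l_origin": 1000,
     "code": {200: 42},
     "rr": set(), "public": {"ALPHA": 200}, "extrn": {}}

memory, ntab = relocate_and_link([P, Q])
print(ntab)    # {'LOOP': 940, 'ALPHA': 1000}
print(memory)  # {900: 940, 901: 1000, 940: 7, 1000: 42}
```

Here relocation_factor_P = 900 - 500 = 400, so the address-sensitive word at translated address 500, which held 540, becomes 940 at linked address 900.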
Self-Relocating Programs-
Programs are classified by whether they can be modified, or can modify themselves, to execute from a given load origin.
Classifications :
1) Non Relocatable Program
2) Relocatable Programs
3) Self Relocating Programs
Non Relocatable Program
1) Cannot execute in any memory area other than the one starting at its translated origin
2) Lacks information about its address-sensitive instructions
3) e.g. hand-coded machine language program
Relocatable Programs-
1) Can be relocated to execute in any desired memory area
2) Information about its address-sensitive instructions is available
3) e.g. object module
Self-Relocating Programs-
1) Can execute in any memory area
2) Contains information about its own address-sensitive instructions
3) Relocating logic is placed at the start of the program
4) Useful for time-sharing operating systems
Linking for Overlays-
Overlay:
An overlay is a part of a program that uses the same load origin as some other part of the program.
Advantages:
Keeps in memory only those instructions and data that are needed at any given time.
Needed when a process is larger than the amount of memory allocated to it.
Implemented by the user; no special support is needed from the operating system, but the
design of the overlay structure is complex.
Used to reduce main memory requirements.
An Overlay Tree
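The memory saving that an overlay tree gives can be estimated with a short Python sketch. It assumes that each segment is resident together with its ancestors while sibling overlays share one load origin, so at any time only one root-to-leaf path of segments needs to be in memory. The segment names and sizes (in KB) are hypothetical.

```python
def overlay_memory(tree, sizes, segment):
    """Memory needed for the overlay (sub)tree rooted at segment.

    A segment stays resident together with its ancestors, while sibling overlays
    share one load origin, so the requirement is the largest total size along
    any root-to-leaf path.
    """
    below = max((overlay_memory(tree, sizes, child) for child in tree.get(segment, [])),
                default=0)
    return sizes[segment] + below


# Hypothetical overlay structure: segment -> overlays loaded at a common origin after it.
tree = {"main": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
sizes = {"main": 20, "A": 30, "B": 25, "A1": 10, "A2": 15, "B1": 30}   # sizes in KB

print(overlay_memory(tree, sizes, "main"))  # 75 (main + B + B1)
print(sum(sizes.values()))                  # 130 if everything were loaded at once
```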
Loader
An operating system utility that copies programs from a storage device to main memory, where
they can be executed. In addition to copying a program into main memory, the loader can also
replace virtual addresses with physical addresses.
Most loaders are transparent, i.e., you cannot directly execute them, but the operating system
uses them when necessary.
An absolute loader can only load a program with load origin = linked origin.