System Programming
UNIT I
Basics of system
programming
Goals of This Class
Enforce your programming skill
Get you acquainted with programming
in ‘C’, Linux OS
Make you prepared for University Exam
& entrance exams
Goals of System software
User convenience: Provide convenient
methods of using computer system.
Efficient use: Ensure efficient use of
computer resources.
Non-interference: Prevent interference in
the activities of its users.
Introduction
Software
Application software usually used by end-user
It is concerned with the solution of some problem, using
the computer as a tool, instead of how computers
actually work.
System software
System software consists of a variety of programs that
support the operation of a computer (ex: text editor,
compiler, debugger)
One characteristic in which most system software differ
from application software is machine dependency
A system software programmer must know the target
machine structure(e.g. of assembler)
Introduction
What is the System Software?
Programs that supports the operation of a computer.
Examples of the System Software
Text editor, compiler, loader, linker, assembler, macro processor,
operating system.
What is the System Programming?
Design and implementation of system software.
“By understanding of the system software, you will gain a deeper
understanding of how computers actually works”
System Software
and Machine Architecture
Machine dependent system software
System programs are to support the operation and use
of the target computer.
The difference between different machine
Machine code
Instruction formats
Addressing mode
Registers
Machine independent system software
General design and logic is basically the same:
Code optimization
General design and logic of an assembler
System Software
The system software includes
Assembler
Linker
Loader
Macro processor
Text editor
Compiler
Operating system
Debugging system
Source Code Control System
Need ?
Programming Languages
Machine language:
It is computer’s native language having a sequence of zeroes
and ones (binary). Different computers understand different
sequences. Thus, hard for humans to understand e.g.0101001...
Assembly language:
It uses mnemonics for machine language. In this each instruction
is minimal but still hard for humans to understand:
e.g. ADD AH, BL
High-level languages:
FORTRAN, Pascal, BASIC, C, C++, Java, etc.
Each instruction composed of many low-level instructions,
closer to English. It is easier to read and understand:
e.g. hypot = sqrt(opp*opp + adj * adj);
File Handling
FILE:
It is defined in the header file stdio.h.
The link between our program and
operating system is a structure called FILE.
FILE structure contains information about
file being used , such as current size,
location in memory.
FILE *fp; where fp is file pointer of type
FILE.
Format of fopen()
FILE *fp;
fp= fopen(“file_name”, “type”);
File_name: character string that contains name of file to be
opened.
Type: File Meaning
Type
r Open existing file for reading only
w Opens a new file for writing if exists overwritten
a Opens existing file appending
r+ Read, write, modify existing file
w+ Read, write new file, existing file contents
destroyed
a+ Reading, appending existing file
NOTE: Returns NULL if file does not exists or if unable to
open the file
Closing file
fclose(fp);
flcose closes the file to which the file
pointer fp points to. It also writes buffered
data in the file before close is performed.
Character input/output in files
getc(): is used to read characters from a file opened
in read mode by fopen.
General Format:
getc(fp);
getc() function returns EOF when end of file
occurs or if it encounters an error.
putc():is used to write characters to a disk file that
can be opened using fopen in write mode
General Format:
putc(c, fp); where c is character to be written
Direct Input/Output
fread(): reads into array ptr upto
nitems data items of size size from
the steram fp and returns the number
of items read. If error returns EOF.
int fread(ptr, size, nitems,fp)
Char *ptr
int size, nitems
FILE *fp
rchar=fread(buf, sizeof(int), 20,input)
fwrite :appends at the most nitems
item of data of size size in the file to
which file pointer fp points to, from the
array to which pointer ptr points.
int fwrite(ptr, size, nitems,fp)
Char *ptr
int size, nitems
FILE *fp
rchar=fwrite(buf, sizeof(char), 80,output)
Program
Copy the contents of text file in to
another text file character by character.
Write program to accept characters
from user and write in to file.
#include<stdio.h>
void main()
{
FILE *fs, *ft;
char c;
fs=fopen("pr1.txt","r");
if(fs==NULL)
{ printf("unable to open file");
exit(0);
}
ft=fopen("pr2.txt","w");
if(ft==NULL)
{
printf("unabl to create new file");
fclose(fs);
exit(0);
}
while(1)
{
c=getc(fs);
if(c==EOF)
break;
putc(c,ft);
}
fclose(fs);
fclose(ft);
getch();
}
Contents
• Language processors
• Data structures for language
processing
• Scanning and parsing
• Assembler
Language Processor
Necessity of Language of processing
Application domain
Execution domain
Semantic gap
Draw back of semantic gap
Large development time
Large development effort
Poor quality of software
A. Language processors
A Language processor is a software
which bridges a specification or
execution gap
Specification Gap Execution Gap
Application Execution
PL Domain
Domain Domain
Language processors
A language translator
A detranslator
A preprocessor
Interpreter
Language Processing Activities
Program generation activities
Program execution activities
1.Program translation
2.Program interpretation
Program Generator
Program generator: It is s/w which accepts the
specification of a program to be generated and
generates a program in target PL
Program Generator Domain
EG between target PLD and ED is
bridged by compiler or interpreter for
the PL.
Specification Gap
Program
Application Target PL Execution
Generator
Domain Domain Domain
Domain
Program generator validates the data
during data entry.
It reduces the testing effort.
Economical than to develop problem
oriented language(large EG).
Employee Name
Address
Age
Program Execution Activities
Characteristics of program
translation model
Program must be translated before it
can be executed.
Translated program may be saved in
the file. The saved program may be
executed repeatedly.
A program may be retranslated
following modifications.
Program Interpretation
Interpreter Memory
PC Source
Program
+
Error Data
Characteristics of program
interpretation model
The source program is retained in the
source form itself i.e. no target program
exists.
A statement is analyzed during its
interpretation.
Qu: Define the term language processor and explain
various language processing activities (8)
comparison
asda asda
Fundamentals of Language
Processing
Language processing
= Analysis of source program
+ synthesis of target program
Lexical rules
Syntax rules
Semantic rules
Fundamentals of Language
Processing
Lexical Analysis:
• Identify lexical units->classification of units->
Tokens->class code and number
a:= b + I;
ID#2 OP#5 ID#3 OP#3 ID#1 OP#10
• ID#2 Stands for identifier occupying entry #2
in the symbol table
• Tokens: keywords, variable or arithmetic
operator
Continue…
Syntax analysis:
String of tokens->statement class->
build IC -> semantic analysis
Qu. Explain Syntax analysis, Lexical analysis and semantic
Analysis with example? (6-2012)
Qu. Why Lexical and syntax analyzer are separated out.
(2-2012)
Phases of a language processor
• Forward references
• Issues concerning memory requirements and
organization of a language processor`
Explain different phases of language processing?
(4-2011)
Forward Reference
A FR of a program entity is a reference
to the entity which precedes its
definition in the program
E. g. PR_PROFIT := (PROFIT * 100)/ COST_PRICE;
-------
MOVER AREG, PROFIT
Long profit;
MULT AREG,100
DIV AREG, COST_PRICE
MOVM AREG, PERCENT_PROFIT
---------------
PERCENT_PROFIT DW 1
COST_PRICE DW 1
PROFIT DW 1
Passes of a Language Processor
Pass I
Performs analysis of the source program
and note relevant information
Pass II
Perform synthesis of target program
Intermediate Representation
Desirable Properties of IR
Memory efficiency: IR must be
compact.
Processing efficiency: Efficient
algorithms must exist for constructing
and analyzing the IR.
Ease of use: IR should be easy to
construct and use
Language Processor
Development Tools
What is parsing? Explain LEX and YACC in brief.
Language Processor
Development Tools
Qu. What is Parsing? Explain LEX and YACC in brief.
(8-2011)
Assembly Language
Low level programming language
Mnemonic operation code
Symbolic operands
Data declarations.
Assembly language is machine
dependent.
Assembler
A tool called an assembler translates assembly language
into binary instructions.
Basic Assembler Functions
Translate mnemonic opcode to machine language.
Convert symbolic operands to their
machine(neumeric) values.
Build the machine instructions in proper format.
Convert data constants in to machine representation.
Error checking if provided.
Variables are represented by symbolic names.
Changes can be quickly and easily incorporated with
reassembly.
Assembler maintains several data structures.
Assembly language statements
Imperative statements
Indicate an action to be performed during the execution.
Declarative statements
[label] DS <constant>
[label] DC <value>
A DS 1
B DS 200
ONE DC ‘1’
Assembler directives
START <constant>
END [<operand spec >]
Literals
Write the value of a constant operand as a part of
the instruction that uses it, such an operand is called
a literal
Avoid having to define the constant elsewhere in the
program and make up a label for it
A literal is identified with the prefix =, which is
followed by a specification of the literal value
With a literal, the assembler generates the specified
value as a constant at some other memory location.
The address of this generated constant is used as the
target address for the machine instruction
Assembler Directive
Assembler directive instruct the
assembler to perform certain actions
during the assembly of program.
E.g. START <constant>
How literals are processed by assembler which does not
support immediate operands.(4-2012).
What is literal? What is its advantage in assembly
language programming?
Advantages of Assembly
Language
Reduced errors
Faster Translation time
Changes could made easier and Faster
Better than HLL where it is desirable to
use specific architectural features.
Disadvantages
Many instructions are required to
achieve small task.
Programmer requires knowledge of the
processor architecture and instruction
set.
Programs are machine dependent,
requires complete rewrite if hardware is
changed.
Design specification of assembler
Specify the problem.
Define format of data structure.
Specify algorithm.
Determine the processing necessary to
perform the task
Analysis Phase
Isolate the label, mnemonic opcode and
operand fields of statement.
If label is present enter the pair in a new
entry of symbol table.
Check validity of the mnemonic opcode
through a lookup in the Mnemonics table.
Perform the LC processing i.e. update the
value contained in LC by considering the
opcode and operands of the statement.
Synthesis Phase
Obtain the machine opcode
corresponding to the mnemonic from
mnemonic table.
Obtain the address of a memory
operands from the symbol table
Synthesize the machine instruction
Data structures of the assembler
Single Pass Translation
Forward reference.
Back patching.
e.g.MOV AREG, A
TII(instruction add and symbol)
Two pass Translation
Handles forward references.
Design of two pass Assembler
Pass - I
Separate the symbol, mnemonic opcode and
operand fields.
Build the symbol table.
Perform LC processing.
Construct intermediate representation
Pass – II
Synthesize the target program
Pass – I Databases
Input source program
LC: used to keep track of each instructions
location.
MOT: mnemonic and its length.
ST: stores each symbol and its
corresponding location.
LT: stores each literal and its
corresponding assigned location
Data structures of assembler pass I
An assembly program
Algorithm for pass-I of two
pass assembler
Read
If label
Yes: store label in ST with LC value
No: Search pseudo OPTAB
Search pseudo OPTAB
Found: type
DS, DC : Determine length of data space required and update LC
End: Go to pass-II
Not found: Search m/c OPTAB
Search m/c OPTAB(MOVEM AREG,A)
Get length of instruction
Process literals update LC
Read
Assembler First Pass
1. loc_cntr:=0;
POOLTAB[1]:=1;
pooltab_ptr:=1; littab_ptr:=1;
2. While next statement is not END
a) If label is present
this_label = symbol in label field;
Enter(this_label,loc_cntr) in SYMTAB
b) If an LTORG statement then
process literal and update location counter
c) If a START or ORIGIN statement
loc_cntr:= value in operand field
d) If an EQU statement then
i) this_addr:=value of <address spec>
ii) correct the symtab entry for this_label to
(this_label,this_addr)
e) If a declaration statement the
i)code:=codeof the declaration statement
ii)size:=size of memory area req. by DC/DS
iii)loc_cntr:=lcn_cntr+size
iv) Generate IC ’(DC,code)..’
f) If an imperative statement then
i) code:=machine opcode from OPTAB
ii)loc_cntr:=lcn_cntr+instruction length from OPTAB
iii)IF operand is literal then
this_literal:=literal in operand field;
LITTAB[littab_ptr]:=this_literal;
littab_ptr:=littab_ptr+1
else(operand is symbol)
this_entry:=SYMTAB entry number of operand
Generate IC(IS, code)(S, this_entry)
3. (Processing of END statement)
a) Perform step 2b
b) Generate IC ‘(AD,02)’
c) Go to Pass II
Intermediate code
IC is categorized based on:
Processing efficiency
Memory economy
Each IC unit consists of following 3
fields:
Address
Representation of mnemonic opcode
Representation of operands
Address Mnemonic Operands
Opcoed
Code for declaration statements and
directives
Intermediate code – variant I
Intermediate code – variant II
Memory requirements using variant I
and variant II
Processing of declarations
• All of the literal operands used in the program
are gathered together into one or more literal
pools
• Normally literals are placed into a pool at the
end of the program
• A LTORG statement creates a literal pool that
contains all of the literal operands used since
the previous LTORG
• Most assembler recognize duplicate literals: the
same literal used in more than one place and
store only one copy of the specified data value
LITTAB (literal table): contains the literal name,
the operand value and length, and the address
assigned to the operand when it is placed in a
literal pool
An assembly program
Two pass assembler
Read statement
Search pseudo opcode table
Found : check type
DC,DS: Determine length of data space update LC
END : clean up and exit
Not found: Search MOT
Search MOT
Get instruction length, format type and binary code
Evaluate operand expression by searching for values of
symbols
Assemble together the parts of instruction and update LC
Read statement.
Data structures for two pass
assembler
Input: pass I program(IC)
Location counter(LC)
Machine operation table (MOT):
Pseudo-Operation table(POT)
Symbol table(Prepared by Pass I)
Base Table (indicates which registers are
currently specified as base register)
Work space INST : used to hold each
instruction as its various parts(binary op-code,
register field, length field, displacement field)
are being assembled together.
Algorithm: Second pass
Continued
Error reporting in pass - I
Error reporting in pass - II
• Two pass assembler performs program listing and
error reporting in pass II.
• Errors detected in pass I are stored in an error table.
Thank you !!!
Symbol-Defining Statements
Assembler directive that allows the programmer to define symbols
and specify their values
General form: symbol EQU value
Line 133: +LDT #4096
MAXLEN EQU 4096
+LDT #MAXLEN
It is much easier to find and change the value of MAXLEN
Assembler directive that indirect assigns values to symbols ORG
STAB RESB 1100
ORG STAB
SYMBOL RESB 6
VALUE RESW 1
FLAGS RESW 2
ORG
STAB+1100
Context-Free Grammars
Grammars
Grammars express languages
Example: the English language
sentence noun _ phrase predicate
noun _ phrase article noun
predicate verb
article a
article the
noun cat
noun dog
verb runs
verb walks
80
A derivation of “the dog walks”:
sentence noun _ phrase predicate
noun _ phrase verb
article noun verb
the noun verb
the dog verb
the dog walks
81
More Notation
G V , T , S , P
Grammar
V: Set of variables
T: Set of terminal symbols
S: Start variable
P: Set of Production rules
82
Scanning and parsing
Scanning is the process of recognizing
the lexical components in a source
string
Parsing is the process of checking the
validity of a source string, and to
determine its syntactic structure
Top down parsing
Bottom up parsing
83
Errors Reporting in Pass-I
Source need not be preserved till pass II.
Conserves memory and avoids duplicate
processing.
Debugging difficult.(mapping of source
statement and target code is difficult )
Enlist the different types of errors that are handled by
Pass- I and pass- II of two pass assembler(6-2012)
Data structures for language
processing
In computer science, a data structure is a particular way of
storing and organizing data in a computer so that it can be used
efficiently.
Different kinds of data structures are suited to different kinds
of applications, and some are highly specialized to specific
tasks. For example, B-trees are particularly well-suited for
implementation of databases, while compiler implementations
usually use hash tables to look up identifiers.
1. Search data structure
2. Allocation data structures
85
Search data structure
Sequential search organization
Binary search organization
Hash table organization
Linked list and tree structured
organization
Nested search structures
86
Allocation data structures
Stacks
A linear data structure which permits
allocation and deallocation of entities in a
Last-in-first-out (LIFO) manner and only
the last entry is accessible at any time
Heaps
A nonlinear data structure which permits
allocation and deallocation of entities in a
random order
87
Symbol-Defining Statements
Assembler directive that allows the programmer to define symbols
and specify their values
General form: symbol EQU value
Line 133: +LDT #4096
MAXLEN EQU 4096
+LDT #MAXLEN
It is much easier to find and change the value of MAXLEN
Assembler directive that indirect assigns values to symbols ORG
STAB RESB 1100 STAB RESB 1100
SYMBOL EQU STAB ORG STAB
VALUE EQU STAB+6 SYMBOL RESB 6
FLAGS EQU STAB+9 VALUE RESW 1
FLAGS RESW 2
ORG STAB+1111