
System Software Overview & Layers

A pdf about BCA 3rd Sem System software paper

Uploaded by

Gyandeep

UNIT I: OVERVIEW

DEFINITION AND CLASSIFICATION OF SYSTEM SOFTWARE


System software comprises the programs dedicated to managing the computer itself, such as
the operating system, file management utilities, and the disk operating system (DOS).
System software provides the platform on which other software runs. Examples include
operating systems, antivirus software, disk formatting software, and computer language translators.
Such software is commonly supplied by computer manufacturers. It is often written in
low-level languages and interacts with the hardware at a very basic level.
System software serves as the interface between the hardware and the end users.
The most important features of system software include :
1. Closeness to the system
2. Fast speed
3. Difficult to manipulate
4. Written in low level language
5. Difficult to design

There are several different types of system software:


Operating Systems are a collection of programs that make the computer hardware conveniently
available to the user and also hide the complexities of the computer's operation. The Operating
System (such as Windows 7 or Linux) interprets commands issued by application software (e.g.
word processor and spreadsheets). The Operating System is also an interface between the
application software and computer. Without the operating system, the application programs would
be unable to communicate with the computer.
The most important tasks performed by the operating system are:
1. Memory Management: The OS keeps track of the primary memory (RAM), allocates it when a
process requests it, and de-allocates it when it is no longer required.
2. Processor Management: Allocates the processor (CPU) to a process and de-allocates it
when it is no longer required.
3. File Management: Allocates and de-allocates file-related resources and decides who gets
them.
4. Security: Prevents unauthorized access to programs and data by means of passwords.
5. Error-detecting Aids: Production of dumps, traces, error messages, and other debugging and
error-detecting methods.
6. Scheduling: The OS schedules processes using its scheduling algorithms.
Utility programs are small programs with a limited, focused capability. They are usually operated
by the user to keep the computer system running smoothly. Typical tasks include file
management, diagnosing problems, and reporting information about the computer. Notable
examples include file copy, paste, delete, and search tools, the disk defragmenter, and disk
cleanup. Some utilities ship with the operating system, while others are installed separately.
Library programs are compiled collections of subroutines; libraries make many functions and
procedures available when you write a program.
Translator software (Assembler, Compiler, Interpreter)
1. An assembler translates assembly language programs into machine code (binary code that the
machine can execute directly).
2. A compiler translates high-level language code into object code (the machine language of
the target machine).
3. An interpreter analyses and executes a high-level language program one line at a time. Execution
is typically slower than for the equivalent compiled code because the source code is analysed as it runs.
DISTINCTION BETWEEN SYSTEM SOFTWARE AND APPLICATION SOFTWARE
LAYERED ORGANIZATION OF SYSTEM SOFTWARE
The layered organization of system software is a highly structured model where each layer serves
a specific role in managing the complexity of computer systems. By dividing the software into
different layers, each with its own responsibilities, this approach ensures that the system is more
maintainable, scalable, and secure. Let’s explore each layer in greater detail, using a real-world
scenario to highlight how they interact and function together.
1. Hardware Layer
• Description: The hardware layer is the foundation of the computer system. It consists of the
physical components such as the Central Processing Unit (CPU), Random Access Memory
(RAM), storage devices (hard drives, SSDs), input/output devices (keyboards, mice,
monitors), and network interfaces.
• Role: The hardware layer provides the raw computational power, memory, and input/output
capabilities that all higher layers depend on. It directly executes machine-level instructions
(binary code), which are basic operations like arithmetic calculations, data storage, and data
retrieval.
• Real Example: Consider a gaming PC with an Intel Core i9 processor, 32GB of RAM, an
NVIDIA GeForce RTX graphics card, a 1TB SSD, and peripherals like a mechanical
keyboard and a gaming mouse. These components form the hardware layer, which is
responsible for executing all the low-level operations needed to run software.
2. Firmware Layer
• Description: Firmware is specialized, low-level software embedded into hardware
components. Unlike regular software, firmware is stored in non-volatile memory (like ROM
or flash memory) and is critical for initializing and controlling hardware. It bridges the gap
between hardware and the operating system by providing essential instructions.
• Role: Firmware initializes hardware components during the boot process and provides
basic control over the hardware. It is responsible for starting up the system and handing
control over to the operating system. Firmware often includes a bootloader, which loads
the operating system into memory and begins its execution.
• Real Example: The UEFI (Unified Extensible Firmware Interface) in modern PCs. When
you power on your computer, UEFI initializes the motherboard, checks that all connected
components (CPU, RAM, storage) are functioning correctly, and then loads the operating
system from the storage device into memory.
3. Kernel (Operating System) Layer
• Description: The kernel is the core component of an operating system (OS). It is
responsible for managing the system’s resources, such as CPU time, memory allocation,
and device access. The kernel acts as an intermediary between hardware and application
software, ensuring that all programs can run smoothly and efficiently.
• Role: The kernel manages system resources and provides essential services like process
scheduling, memory management, file systems, and device management. It handles
system calls from applications, which are requests for the kernel to perform operations on
their behalf. The kernel ensures that multiple applications can run simultaneously without
interfering with each other.
• Real Example: In a Linux-based Android smartphone, the Linux kernel manages all the
resources needed to run apps. When you open an app, the kernel allocates memory for it,
schedules CPU time, and manages input/output operations like reading data from storage
or sending data over the network.
4. Device Drivers
• Description: Device drivers are specialized software components that enable the operating
system to interact with specific hardware devices. Each hardware component, such as a
printer, graphics card, or network adapter, requires a corresponding driver that translates
generic OS commands into hardware-specific instructions.
• Role: Device drivers act as intermediaries between the operating system and hardware
devices, allowing the OS to control hardware without needing to know the details of how the
hardware operates. Drivers ensure that devices function correctly and provide the necessary
input/output services to applications.
• Real Example: The NVIDIA graphics driver in a Windows gaming PC. This driver allows the
Windows operating system to communicate with the NVIDIA GPU, enabling high-
performance graphics rendering for games and other applications that require intense
graphical processing.
5. System Libraries
• Description: System libraries are collections of pre-compiled functions or routines that
software applications can use to perform common tasks, such as file handling, memory
management, or mathematical calculations. Libraries abstract these tasks, allowing
developers to use standardized code without having to write it from scratch.
• Role: System libraries provide a set of standardized functions that applications can use to
interact with the operating system. By using these libraries, applications can perform
complex tasks in a consistent and efficient manner. Libraries also promote code reuse and
reduce the amount of redundant code in the system.
• Real Example: The Windows API (Application Programming Interface) libraries. These
libraries provide functions for creating windows, handling user input, interacting with the file
system, and managing processes. For example, when a developer writes a Windows
application, they might use the CreateFile function from the Windows API to open or create
a file on disk.
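To make the idea of a library call more concrete, here is a minimal sketch in Python. It is not the Windows API: Python's standard os module is used as a stand-in, and the function name write_and_read and the file name demo.txt are invented for this illustration. The sketch shows a program asking a library to create and write a file instead of talking to the disk hardware itself.

```python
import os
import tempfile


def write_and_read(path: str, text: str) -> str:
    """Write text through low-level library calls, then read it back."""
    # os.open asks the operating system for a file descriptor (an integer handle).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, text.encode("utf-8"))
    finally:
        os.close(fd)
    # The built-in open() is a higher-level library layer over the same OS services.
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.txt")  # hypothetical file name
        print(write_and_read(demo_path, "hello from a library call"))
```

The application code never deals with disk sectors or device registers; the library and the operating system below it take care of those details.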
6. System Utilities
• Description: System utilities are software tools that perform maintenance, diagnostic, and
management tasks within the operating system. These utilities help users and
administrators monitor the system’s performance, troubleshoot issues, and manage
resources effectively.
• Role: System utilities provide essential services that keep the operating system running
smoothly. They include tools for managing files, optimizing performance, monitoring system
health, and protecting against security threats. Utilities often run in the background,
performing tasks like disk defragmentation, virus scanning, or system updates.
• Real Example: The Disk Cleanup utility in Windows. This utility scans your hard drive for
unnecessary files (like temporary files, system cache, and old updates) and allows you to
delete them to free up space and improve system performance.
7. Shell (Command Line Interface)
• Description: The shell is a user interface that allows users to interact with the operating
system. A command-line shell serves as a command interpreter, executing text-based user
commands, running scripts, and managing system tasks. Shells can be command-line
interfaces (CLI) or graphical user interfaces (GUI); this layer is described here in terms of the CLI.
• Role: The shell provides a powerful interface for interacting with the operating system. It
allows users to perform a wide range of tasks, such as navigating the file system, managing
files, launching applications, and automating tasks through scripting. Shells are especially
useful for system administrators and developers who need precise control over the system.
• Real Example: The Bash shell in Linux. Developers and system administrators use Bash to
navigate the file system, execute commands, and run scripts. For example, you might use
Bash to compile a program by running the gcc command, move files with the mv command,
or check system resource usage with the top command.
8. User Applications
• Description: User applications are software programs that end-users interact with directly to
perform specific tasks. These programs rely on the underlying system software layers to
function, but they are designed to be user-friendly and provide a wide range of functionalities,
from word processing and web browsing to gaming and graphic design.
• Role: User applications provide the tools and functionality that users need to accomplish their
tasks. They interact with the operating system through system calls and libraries, allowing
users to perform a wide variety of activities without needing to understand the complexities of
the underlying system.
• Real Example: Microsoft Word, a popular word processing application. When you create a
document in Word, the application relies on system libraries to handle tasks like saving the
document to disk, printing it, or displaying it on the screen. Word also interacts with device
drivers to send the document to a printer, and with the operating system to manage
resources like memory and processing power.
Interplay Between Layers: A Real-World Scenario
Let’s consider a real-world scenario where you want to print a document using Microsoft Word on
a Windows PC. Here’s how each layer comes into play:
1. User Applications: You open Microsoft Word (a user application) and type your document.
When you’re ready to print, you click the "Print" button.
2. System Libraries: Word uses system libraries (such as the Windows API) to create a print
job. The library functions describe the pages in a device-independent form and hand the job
to the print spooler, so the application does not need to know the printer's own command
language.
3. Kernel (Operating System): The Windows kernel manages the print job by allocating CPU
time, memory, and other resources needed to process the print job. It ensures that the print
job is executed without interfering with other tasks running on the system.
4. Device Drivers: The print job is passed to the printer driver, a specialized piece of software
that translates the generic print commands from the operating system into specific
instructions that the printer hardware can understand. The driver might convert the
document into a language like PCL or PostScript, which the printer can then interpret to print
the document correctly.
5. Firmware: The printer’s firmware receives the commands from the printer driver. The
firmware is responsible for controlling the physical components of the printer, such as the
print head, paper feed mechanism, and ink or toner delivery system. It ensures that the
document is printed correctly on the paper.
6. Hardware: The hardware layer is the physical printer itself, which receives the signals from
the firmware and executes the print job. The printer takes the document data, moves the
paper through the printer, and applies ink or toner to produce the final printed document.
UNIT II: ASSEMBLER
OVERVIEW OF THE ASSEMBLY PROCESS

An assembler is a crucial piece of system software that converts assembly language programs into
machine code or object code, which the central processing unit (CPU) can execute. It
serves as a translator, converting human-readable mnemonics and symbols into the binary
instructions that make up machine language. This process is essential for low-level programming
and system development, where precise control over hardware is necessary.
Assemblers are similar to compilers in that they produce executable code. However, assemblers
are simpler, since they only convert low-level code (assembly language) to machine code.
Because each assembly language is designed for a specific processor, assembling a program is
largely a one-to-one mapping from assembly instructions to machine instructions. Compilers,
on the other hand, must convert generic high-level source code into machine code for a specific
processor.
Most programs are written in high-level programming languages and compiled to
machine code. However, in some cases, assembly code is used to hand-tune functions and
ensure they behave in a specific way. For this reason, IDEs often include
assemblers so they can build programs from both high-level and low-level languages.
The design of an assembler depends on the machine architecture, because the mnemonics of an
assembly language correspond to the instruction set of a particular processor.
Key Characteristics of an Assembler
1. Translation of Assembly Language to Machine Code:
o The primary function of an assembler is to convert assembly language, which uses symbolic
instructions (like MOV, ADD, SUB), into the corresponding binary code understood by a CPU.
o Every assembly instruction is mapped directly to a machine instruction, making this translation
straightforward but vital for efficient system execution.
2. Symbolic Addressing:
o Assemblers use symbols to represent memory addresses, making it easier for programmers to
reference data without needing to know the exact address. The assembler manages these
symbols in a symbol table and resolves them to actual memory addresses during translation.
3. Platform-Specific:
o Assemblers are highly architecture-specific. They generate machine code for a specific
processor type or instruction set architecture (ISA) such as x86, ARM, or MIPS.
o Each processor has its own set of instructions, registers, and addressing modes, so the
assembler must be tailored to that particular architecture.
4. Two Types of Assemblers:
o Single-Pass Assembler: Processes the source code once and generates machine code
immediately. Efficient but struggles with forward references.
o Two-Pass Assembler: Processes the code in two passes—first building a symbol table, then
generating the machine code. Easier to manage forward references but requires more time and
memory.
5. Macro Processing:
o Assemblers often support macros, which allow the definition of reusable code blocks. A macro
can be called multiple times, simplifying repetitive code patterns and reducing the overall
complexity of the source code.
6. Error Detection and Reporting:
o Assemblers perform error checking, ensuring that the syntax of the assembly language is correct
and that symbolic references are properly resolved. Errors like undefined labels, incorrect
instructions, and invalid operand types are flagged during assembly.
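As a concrete illustration of the direct mapping from an assembly instruction to a machine instruction, the sketch below encodes the register-to-register MOV instruction of the Intel 8085, whose opcode follows the bit pattern 01 DDD SSS (destination register code, then source register code). The choice of the 8085 and the helper name encode_mov are assumptions made for this example only; other processors use entirely different encodings.

```python
# 3-bit register codes used by the Intel 8085 in MOV r1, r2.
# "M" stands for the memory location addressed by the HL register pair.
REG_CODE = {
    "B": 0b000, "C": 0b001, "D": 0b010, "E": 0b011,
    "H": 0b100, "L": 0b101, "M": 0b110, "A": 0b111,
}


def encode_mov(dest: str, src: str) -> int:
    """Return the one-byte opcode for MOV dest, src (pattern 01 DDD SSS)."""
    if dest == "M" and src == "M":
        # The bit pattern 01 110 110 is used for HLT, so MOV M, M does not exist.
        raise ValueError("MOV M, M is not a valid 8085 instruction")
    return (0b01 << 6) | (REG_CODE[dest] << 3) | REG_CODE[src]


if __name__ == "__main__":
    opcode = encode_mov("A", "B")
    print(f"MOV A, B -> {opcode:08b} ({opcode:02X}h)")  # 01111000 (78h)
```

A real assembler holds tables like REG_CODE for every instruction format of its target processor, which is why assemblers are architecture-specific.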
Functions of an Assembler
1. Lexical Analysis: The assembler scans the source code, breaking it down into tokens
(mnemonics, labels, constants, directives, etc.). This process identifies the basic
elements of the assembly code that need to be translated into machine code.
2. Syntax Analysis: After lexical analysis, the assembler checks the syntax of each
instruction to ensure that it follows the rules of the assembly language. For example, it
verifies that the correct number and types of operands are used for each instruction.
3. Symbol Table Management: The assembler creates and maintains a symbol table,
which maps symbolic names (labels, variables) to memory addresses. During the
assembly process, symbolic addresses are replaced with actual memory addresses from
this table.
4. Instruction Encoding: The assembler translates each mnemonic instruction into its
corresponding binary machine code instruction. The exact bit pattern depends on the
processor's instruction set; on the Intel 8085, for example, MOV A, B is encoded as the
single byte 01111000 (78h).
5. Handling Directives: Assembly programs often include assembler directives (like
START, END, ORG, DB) that are instructions for the assembler itself, not for the
machine. These directives control aspects like where code is loaded in memory or how
data is initialized.
6. Forward Reference Handling: Forward references occur when a label or variable is
used before it is defined. Assemblers handle these either by backpatching (in single-pass
assemblers) or by resolving them in a second pass (in two-pass assemblers).
7. Literal Handling: The assembler also handles literals (constants used in the program)
and stores them in a literal table. References to these literals are replaced with their
actual values or storage addresses when generating machine code.
8. Error Handling: Assemblers detect and report various errors, such as:
• Syntax Errors: Mistakes in instruction formats, invalid operands, etc.
• Semantic Errors: Undefined symbols, incorrect label definitions.
The assembler outputs error messages to help the programmer debug the code.
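The first two functions in the list above, lexical analysis and syntax analysis, can be sketched as follows. This is a simplified illustration, not a real assembler front end: the Statement type, the tokenize and check_syntax helpers, and the OPERAND_COUNT table are all hypothetical, and the sketch assumes that a colon only ever appears after a label and a semicolon only before a comment.

```python
from typing import List, NamedTuple, Optional


class Statement(NamedTuple):
    label: Optional[str]
    mnemonic: Optional[str]
    operands: List[str]


# Hypothetical table: how many operands each mnemonic expects.
OPERAND_COUNT = {"MOV": 2, "ADD": 2, "JMP": 1, "HLT": 0}


def tokenize(line: str) -> Statement:
    """Lexical analysis: split one source line into label, mnemonic, operands."""
    code = line.split(";", 1)[0].strip()  # drop the comment, if any
    label = None
    if ":" in code:
        label, code = code.split(":", 1)
        label, code = label.strip(), code.strip()
    if not code:
        return Statement(label, None, [])
    parts = code.split(None, 1)  # mnemonic, then the rest of the line
    mnemonic = parts[0].upper()
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    return Statement(label, mnemonic, operands)


def check_syntax(stmt: Statement) -> None:
    """Syntax analysis: verify the mnemonic and its operand count."""
    if stmt.mnemonic is None:
        return  # blank line, comment, or label on its own
    expected = OPERAND_COUNT.get(stmt.mnemonic)
    if expected is None:
        raise ValueError(f"unknown mnemonic: {stmt.mnemonic}")
    if len(stmt.operands) != expected:
        raise ValueError(
            f"{stmt.mnemonic} expects {expected} operand(s), got {len(stmt.operands)}"
        )


if __name__ == "__main__":
    stmt = tokenize("LOOP: MOV A, B ; copy B into A")
    check_syntax(stmt)
    print(stmt)
```

Keeping tokenizing and checking as separate steps mirrors the list above: later stages (symbol table management, encoding) can then work on clean Statement values instead of raw text.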
Assembler Directives
Assembler directives are special instructions that guide the assembler during
the assembly process. These directives are not translated into machine code
and do not generate any executable instructions. Instead, they provide
information to the assembler about how to interpret and manage the assembly
code, such as memory allocation, symbol definition, and data initialization.
Assembler directives help control the organization of the program in memory,
initialization of data, label assignment, and macro definitions. They are often
prefixed by a dot (.), but this can vary based on the assembler and architecture.

Types of Assembler Directives


1. Data Definition Directives: These directives define data and allocate
space in memory for variables or constants.
• DB (Define Byte): Reserves a single byte of memory and initializes it with a
value.
o Example: VAR DB 10 allocates one byte of memory for VAR and initializes it
with the value 10.
• DW (Define Word): Reserves two bytes (a word) of memory and initializes it.
o Example: VAR DW 100 allocates two bytes of memory for VAR and initializes it
with the value 100.
• DD (Define Double Word): Reserves four bytes of memory for a double word.
o Example: VAR DD 12345678h allocates four bytes of memory for VAR with the
value 12345678h.
• DQ (Define Quad Word): Reserves eight bytes of memory (used for larger
integers or floating-point values).
• RESB, RESW, RESD (Reserve Byte/Word/Double Word): These directives
reserve uninitialized memory.
o Example: RESB 4 reserves 4 bytes of memory but does not initialize them with
any value.
2. Segment Definition Directives: These directives define segments, which
are blocks of code or data that need to be grouped together in memory.
• SEGMENT: Marks the beginning of a code or data segment.
o Example: DATA SEGMENT defines the start of the data segment.
• ENDS: Marks the end of a segment.
o Example: DATA ENDS ends the data segment.
3. Control Directives: Control directives guide how the assembler organizes
and assembles the program.
• ORG (Origin): Specifies the starting address for the code or data that follows. It
sets the assembler's location counter, so addresses are calculated as if the
program will be loaded at that location.
o Example: ORG 100h sets the starting address at memory location 100h.
• EQU (Equate): Defines a constant or an alias for a value, label, or address.
o Example: MAX EQU 100 creates a constant MAX with the value 100. It doesn't
allocate memory but associates MAX with the value 100.
• END: Indicates the end of the source code. The assembler ignores
anything beyond this point.
o Example: END marks the end of the program.
• START: In some assemblers, START gives the starting address of the
program. It tells the assembler the address from which to begin assigning
addresses to the program.
o Example: START 100h directs the assembler to begin assigning addresses at
address 100h.
4. Alignment Directives: Alignment directives ensure that data is aligned in
memory according to the processor's requirements, which can improve
performance.
• ALIGN: Aligns the next instruction or data to a specific byte boundary (e.g., 2,
4, or 8-byte alignment).
o Example: ALIGN 4 aligns the next instruction or data on a 4-byte boundary.
5. Conditional Assembly Directives: These directives control the conditional
inclusion of code during assembly, allowing portions of code to be assembled
or ignored based on certain conditions.
• IF/ENDIF: Assembles the enclosed code only if a condition is true.
o Example:
IF DEBUG
; Debugging code here
ENDIF
• IFDEF/ENDIF: Includes code only if a symbol has been defined earlier (for
example with EQU, or on the assembler's command line).
o Example:
IFDEF FLAG
; Code assembled if FLAG is defined
ENDIF
6. Macro Directives: These directives help define macros, which are blocks of
reusable code that can be expanded in multiple places in the program.
• MACRO: Defines a macro.
o Example:
INCREMENT MACRO VAR
INC VAR
ENDM
• ENDM: Marks the end of a macro.
7. Include and File Management Directives: These directives manage the
inclusion of external files in the assembly process.
• INCLUDE: Tells the assembler to include code from another file.
o Example: INCLUDE io.inc inserts the contents of the io.inc file at that point in
the program.
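To show how an assembler might act on some of the directives listed above, here is a small sketch that tracks a location counter while processing ORG, DB, DW, DD, RESB, ALIGN and EQU. It makes several assumptions that are not taken from the notes: values are stored little-endian (as on x86), reserved bytes are zero-filled for simplicity, ORG may only move forward, and each line holds at most a name, a directive and one numeric argument. The function names parse_int and process_directives are hypothetical.

```python
import struct

# struct format codes for little-endian 1-, 2- and 4-byte values (x86 assumption).
FORMATS = {"DB": "<B", "DW": "<H", "DD": "<I"}
DIRECTIVES = {"DB", "DW", "DD", "RESB", "ORG", "EQU", "ALIGN"}


def parse_int(token: str) -> int:
    """Accept decimal (100) or h-suffixed hexadecimal (100h) numbers."""
    return int(token[:-1], 16) if token.lower().endswith("h") else int(token)


def process_directives(lines):
    """Return (symbol table, address of first byte, emitted bytes)."""
    symbols = {}         # name -> address, or constant value for EQU
    image = bytearray()  # bytes emitted so far
    base = lc = 0        # base: address of image[0]; lc: location counter
    for line in lines:
        fields = line.split()
        name = None
        if fields[0].upper() not in DIRECTIVES:  # first field is a name
            name, fields = fields[0], fields[1:]
        op, arg = fields[0].upper(), parse_int(fields[1])
        if op == "EQU":
            symbols[name] = arg                  # constant only, no memory used
        elif op == "ORG":
            if image:
                image.extend(bytes(arg - lc))    # pad forward with zero bytes
            else:
                base = arg
            lc = arg
        elif op == "ALIGN":
            pad = (-lc) % arg                    # bytes needed to reach the boundary
            image.extend(bytes(pad))
            lc += pad
        else:
            if name:
                symbols[name] = lc               # the name refers to this address
            data = bytes(arg) if op == "RESB" else struct.pack(FORMATS[op], arg)
            image.extend(data)
            lc += len(data)
    return symbols, base, bytes(image)


if __name__ == "__main__":
    source = ["ORG 100h", "A DB 10", "ALIGN 4", "VAR DD 12345678h", "MAX EQU 100"]
    symbols, base, image = process_directives(source)
    print(symbols, hex(base), image.hex())
```

Note how EQU only adds an entry to the symbol table, while DB, DW and DD both define a symbol and emit bytes; this is the difference between naming a value and allocating memory.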

Working Example of Assembler Directives in Code


Here’s an example program that uses several assembler directives:
; Simple assembly program demonstrating assembler directives
; (syntax assumed to be MASM-style, for the 8086 under DOS)

DATA SEGMENT            ; Start of data segment
A      DB 10            ; Define a byte variable A and initialize it with 10
B      DB 20            ; Define a byte variable B and initialize it with 20
RESULT DW ?             ; Define a word variable RESULT (uninitialized)
DATA ENDS               ; End of data segment

CODE SEGMENT            ; Start of code segment
ASSUME CS:CODE, DS:DATA ; Tell the assembler which segment each register addresses
ORG 100h                ; Start code at offset 100h within the segment
START:                  ; Label to indicate the start of the program
MOV AX, DATA            ; Load the segment address of DATA into AX
MOV DS, AX              ; Initialize the data segment register

MOV AL, A               ; Load the value of A into AL register
ADD AL, B               ; Add the value of B to AL register
MOV AH, 0               ; Clear AH so that AX holds only the 8-bit sum
MOV RESULT, AX          ; Store the result in RESULT

MOV AX, 4C00h           ; DOS function 4Ch (terminate) with return code 0
INT 21h                 ; DOS interrupt to terminate the program
CODE ENDS               ; End of code segment

END START               ; End of program, specify starting point



Explanation:
• DATA SEGMENT and DATA ENDS: Define the start and end of the data
segment, where variables A, B, and RESULT are declared.
• ORG 100h: Tells the assembler to start assembling code at offset 100h
within the code segment.
• MOV Instructions: Regular assembly instructions that move values between
registers and memory.
• END START: Marks the end of the program and specifies the entry point as
START.

Summary of Common Assembler Directives


Directive     Meaning                           Example
DB            Define Byte                       A DB 10
DW            Define Word                       VAR DW 100
DD            Define Double Word                VAR DD 12345678h
ORG           Set Origin (starting address)     ORG 100h
EQU           Equate a symbol with a value      MAX EQU 100
SEGMENT       Start a memory segment            DATA SEGMENT
ENDS          End a memory segment              DATA ENDS
ALIGN         Align data or instructions        ALIGN 4
IF/ENDIF      Conditional assembly              IF FLAG ... ENDIF
MACRO/ENDM    Define and end a macro            INCREMENT MACRO VAR ... ENDM
INCLUDE       Include an external file          INCLUDE io.inc
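Of the directives summarized above, MACRO/ENDM is the one whose effect is easiest to see in code, because macro expansion is essentially text substitution. The sketch below collects macro definitions written in the NAME MACRO params ... ENDM form used in these notes and replaces each macro call with the macro body, substituting the arguments for the parameters. The function name expand_macros is hypothetical, and the sketch ignores features such as nested macros and local labels.

```python
import re


def expand_macros(lines):
    """Return the source with macro definitions removed and macro calls expanded."""
    macros = {}  # macro name -> (parameter names, body lines)
    output = []
    source = iter(lines)
    for line in source:
        fields = line.split()
        if len(fields) >= 2 and fields[1].upper() == "MACRO":
            # Definition: collect body lines until ENDM (not copied to the output).
            params = [p.strip() for p in " ".join(fields[2:]).split(",") if p.strip()]
            body = []
            for body_line in source:
                if body_line.strip().upper() == "ENDM":
                    break
                body.append(body_line)
            macros[fields[0]] = (params, body)
        elif fields and fields[0] in macros:
            # Call: substitute actual arguments for the formal parameters.
            params, body = macros[fields[0]]
            args = [a.strip() for a in " ".join(fields[1:]).split(",") if a.strip()]
            mapping = dict(zip(params, args))
            for body_line in body:
                output.append(
                    re.sub(r"\b\w+\b", lambda m: mapping.get(m.group(0), m.group(0)), body_line)
                )
        else:
            output.append(line)
    return output


if __name__ == "__main__":
    program = [
        "INCREMENT MACRO VAR",
        "    INC VAR",
        "ENDM",
        "MOV AX, 1",
        "INCREMENT AX",
        "INCREMENT BX",
    ]
    print("\n".join(expand_macros(program)))
```

Because the expansion happens before instructions are encoded, each macro call costs nothing at run time; it simply produces more source lines for the assembler to translate.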

Design of Assembler

An assembler generates instructions by evaluating the mnemonics in the operation field and looking
up the values of symbols and literals to produce machine code. If the assembler does all of this
work in one scan of the source, it is called a single-pass assembler.
Single-Pass Assembler: A single-pass assembler processes the source program in one pass.
It is faster but has limitations in handling forward references.
Scenarios for one-pass assemblers:
• The object code is generated in memory for immediate execution (a load-and-go assembler).
• External storage for the intermediate file between two passes is slow or inconvenient to use.
Main problem: forward references to
• data items
• labels on instructions
Solution:
• Require that all areas be defined before they are referenced. This is possible, although
inconvenient, for data items.
• Forward jumps to instruction labels cannot be easily eliminated. For these, insert
(label, address_to_be_modified) into SYMTAB. Usually, address_to_be_modified is stored in a
linked list.
Forward references in a one-pass assembler:
• Omit the operand address if the symbol has not yet been defined.
• Enter the undefined symbol into SYMTAB and mark it as undefined.
• Add the address of this operand field to a list of forward references associated with the
SYMTAB entry.
• When the definition of the symbol is encountered, scan the reference list and insert the
address.
• At the end of the program, report an error if any SYMTAB entries are still marked as
undefined.
• For a load-and-go assembler, search SYMTAB for the symbol named in the END statement and
jump to this location to begin execution if there is no error.

Operation of Single-Pass Assembler


• Objective: The assembler reads the source code only once and simultaneously generates the
symbol table and machine code.
• Key Activities:
1. Location Counter (LC) Initialization: The LC is initialized as in the Two-Pass Assembler, and it
tracks the memory addresses.
2. Reading the Source Code: The assembler reads each instruction from the source code and
tries to resolve symbols and generate machine code immediately.
3. Symbol Table Updates: If a label is encountered, it is added to the Symbol Table with its
address (from the LC). If the label is referenced before it is defined (forward reference), it is
temporarily unresolved.
4. Handling Forward References:
• Backpatching: To handle forward references, the assembler uses a technique called
backpatching. When a forward reference is encountered, the assembler leaves a placeholder in
the object code and continues assembling. Later, when the label is defined, the assembler goes
back and fills in the correct address in the object code.
• Until the label is defined, each unresolved reference is recorded (for example in a Fix-Up
Table) so that it can be patched later.
5. Opcode Conversion: Like in the Two-Pass Assembler, mnemonics are converted into their
corresponding machine codes as the code is read.
6. Object Code Generation: Machine code is generated in parallel with scanning the source code.
7. Error Handling: The assembler detects errors in symbol usage, undefined labels, and incorrect
syntax, stopping the assembly process if necessary.
• Output: The final output is the object code, but unlike the Two-Pass Assembler, this is done in a
single step.
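The single-pass operation described above, including backpatching, can be sketched in a few lines of Python. The instruction set here is a toy one invented for the illustration: every instruction occupies one slot holding an opcode and one address, addresses are simply slot numbers, and the OPCODES table and the function name assemble_one_pass are hypothetical.

```python
# Hypothetical opcodes for a toy machine; not taken from any real processor.
OPCODES = {"HLT": 0, "LOAD": 1, "ADD": 2, "STORE": 3, "JMP": 4}


def assemble_one_pass(lines):
    """Assemble in a single scan, backpatching forward references."""
    symtab = {}   # label -> address, for labels already defined
    pending = {}  # undefined label -> list of code slots waiting for its address
    code = []     # each slot is [opcode, operand_address]
    for line in lines:
        text = line.split(";", 1)[0].strip()
        if not text:
            continue
        if ":" in text:
            label, text = (part.strip() for part in text.split(":", 1))
            if label in symtab:
                raise ValueError(f"duplicate label: {label}")
            symtab[label] = len(code)  # location counter = next free slot
            for slot in pending.pop(label, []):
                code[slot][1] = symtab[label]  # backpatch earlier references
            if not text:
                continue
        fields = text.split()
        opcode = OPCODES[fields[0].upper()]
        operand = 0  # placeholder, also used when there is no operand
        if len(fields) > 1:
            symbol = fields[1]
            if symbol in symtab:
                operand = symtab[symbol]
            else:
                # Forward reference: remember which slot must be patched later.
                pending.setdefault(symbol, []).append(len(code))
        code.append([opcode, operand])
    if pending:
        raise ValueError("undefined symbols: " + ", ".join(sorted(pending)))
    return [tuple(slot) for slot in code], symtab


if __name__ == "__main__":
    program = [
        "      JMP DONE   ; forward reference, patched later",
        "TOP:  LOAD TOP",
        "      JMP TOP    ; backward reference, resolved at once",
        "DONE: HLT",
    ]
    print(assemble_one_pass(program))
```

The pending dictionary plays the role of the forward-reference list attached to each symbol: it is filled when an undefined label is used and emptied when the label is finally defined, and anything left over at the end is reported as an undefined symbol.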
Advantages of Single-Pass Assembler:
1. Faster Assembly: Since it only makes one pass over the source code, it is faster and more
efficient.
2. Lower Memory Usage: It doesn't need to store an intermediate file between passes,
although it still keeps a Symbol Table while assembling.
3. Simple Programs: For small, simple programs, it works efficiently without needing multiple
passes.
Disadvantages of Single-Pass Assembler:
1. Limited Forward Reference Handling: Forward references are difficult to manage, and the
assembler needs to rely on backpatching or other techniques.
2. Error Handling: Since it makes only one pass, error detection may be less thorough than in a
Two-Pass Assembler.
3. Complexity in Large Programs: For large programs with many forward references, it becomes
complex and less efficient to resolve symbols in a single pass.

Two-Pass Assembler

A Two-Pass Assembler makes two passes over the source program. This type
of assembler is commonly used because it resolves forward references
(instructions that refer to labels that are defined later in the program). Here's
how it works in depth:
Pass 1: Symbol Table Construction
• Objective: The first pass's primary goal is to gather information about all the
symbols (labels) in the source code.
• Key Activities:
1. Location Counter (LC) Initialization: The LC is initialized to the starting address
of the program (typically zero or a user-defined address). It keeps track of the
memory location of each instruction.
2. Scanning the Source Code: The assembler reads the source code line by line.
For each label encountered, it stores the label in the Symbol Table along with
its corresponding address (from the LC).
3. Address Assignment: Each instruction or label is assigned an address based
on the current value of the LC.
4. LC Increment: After each instruction, the LC is incremented by the instruction's
size, ensuring that the next instruction gets the correct address.
5. Error Handling: Pass 1 reports errors such as duplicate label definitions and
invalid mnemonics. A symbol that is used but never defined can only be confirmed
as undefined once the whole source has been scanned, so an undefined symbol
error is raised at the end of Pass 1 or during Pass 2.
• Output: At the end of the first pass, a Symbol Table and intermediate file are
created, containing the instruction addresses and symbol definitions.
Pass 2: Machine Code Generation
• Objective: The second pass's purpose is to generate the actual machine code
using the Symbol Table created in Pass 1.
• Key Activities:
1. Reading the Source Code and Intermediate File: The assembler reads the
intermediate file and uses the Symbol Table to resolve addresses for all
symbols.
2. Opcode Conversion: Each assembly instruction is converted into its
corresponding machine code (binary or hexadecimal) using the instruction's
mnemonic and the operand addresses.
3. Forward Reference Resolution: All forward references are resolved using the
Symbol Table. Since the symbol definitions were gathered in Pass 1, the
assembler can now assign the correct memory locations to instructions
referencing labels.
4. Object Code Generation: The assembler generates the object code for each
instruction and outputs it in a machine-readable form, typically in an object file
or for immediate execution.
5. Error Handling: It checks for errors such as undefined symbols and incorrect
syntax. If any error is detected, it halts the assembly process and reports the
issues.
 Output: The result is a fully assembled object code.
Advantages of Two-Pass Assembler:
1. Forward Reference Handling: Since it makes two passes, it handles forward
references smoothly.
2. Error Detection: More thorough error checking is possible since all symbols
and addresses are known after the first pass.
3. Modularity: It separates symbol resolution from machine code generation,
making the process more organized.
Disadvantages of Two-Pass Assembler:
1. Inefficient: The assembler has to scan the source code twice, which can be
slower for large programs.
2. Memory Usage: It requires additional memory to store the intermediate file and
Symbol Table across passes.
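The two passes described above can be sketched in a few lines of Python. This is a minimal illustration, not a real assembler: the instruction set in OPCODES, the one-word instruction size, the `LABEL:` syntax and the 16-bit encoding are all assumptions made up for the example.

```python
# Hypothetical toy instruction set: every instruction occupies one word.
OPCODES = {"LOAD": 0x1, "ADD": 0x2, "JMP": 0x3, "HALT": 0xF}

def parse(line):
    """Split 'LABEL: OP operand' into (label, op, operand)."""
    label = None
    if ":" in line:
        label, line = line.split(":", 1)
        label = label.strip()
    parts = line.split()
    op = parts[0] if parts else None
    operand = parts[1] if len(parts) > 1 else None
    return label, op, operand

def pass1(source, start=0):
    """Build the symbol table and an intermediate list of (address, op, operand)."""
    symtab, intermediate, lc = {}, [], start
    for line in source:
        label, op, operand = parse(line)
        if label is not None:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = lc          # label gets the current LC value
        if op is not None:
            intermediate.append((lc, op, operand))
            lc += 1                     # fixed instruction size in this toy machine
    return symtab, intermediate

def pass2(symtab, intermediate):
    """Resolve operands with the symbol table and emit (address, word) pairs."""
    object_code = []
    for addr, op, operand in intermediate:
        if op not in OPCODES:
            raise ValueError(f"unknown mnemonic: {op}")
        if operand is None:
            value = 0
        elif operand.isdigit():
            value = int(operand)
        elif operand in symtab:
            value = symtab[operand]     # forward references are resolved here
        else:
            raise ValueError(f"undefined symbol: {operand}")
        object_code.append((addr, (OPCODES[op] << 8) | value))
    return object_code

program = [
    "       JMP  END",   # END is used before it is defined
    "LOOP:  ADD  7",
    "END:   HALT",
]
symtab, inter = pass1(program, start=100)
code = pass2(symtab, inter)
```

Because Pass 1 has already recorded END in the symbol table, Pass 2 can encode the JMP at address 100 without any patching.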
Working of Assembler
Assembler divides tasks into two passes:
Pass-1
 Define symbols and literals and record them in the symbol table and literal table
respectively.
 Keep track of the location counter.
 Process pseudo-operations.
 Assign memory addresses to the symbols; no machine code is generated in this pass.
Pass-2
 Generate object code by converting each symbolic op-code into its numeric op-code.
 Generate data for literals and look up the values of symbols.
 Read the source code (or the intermediate form produced by Pass-1) a second time and
translate it into object code.
First, consider a small assembly language program to see what each pass does.
Assembly language statement format:
[Label] [Opcode] [operand]

Example: M ADD R1, ='3'


where, M - Label; ADD - symbolic opcode;
R1 - symbolic register operand; (='3') - Literal

Assembly Program:
Label   Op-code   Operand     LC value (location counter)
JOHN    START     200
        MOVER     R1, ='3'    200
        MOVEM     R1, X       201
L1      MOVER     R2, ='2'    202
        LTORG                 203
X       DS        1           204
        END                   205

Let’s take a look at how this program is working:


1. START: This directive sets the starting address of the program to 200 (the initial LC
value), and the label on START provides a name for the program (JOHN is the name of the
program).
2. MOVER: It moves the value of the literal (='3') into register operand R1.
3. MOVEM: It moves the content of the register into memory operand (X).
4. MOVER: It moves the value of the literal (='2') into register operand R2; this statement
carries the label L1.
5. LTORG: It assigns an address to literals (the current LC value).
6. DS (Data Space): It assigns a data space of 1 to symbol X.
7. END: It marks the end of the source program.
Working of Pass-1
Pass-1 builds the symbol and literal tables with their addresses. Note: A literal's address
is assigned at LTORG or END.
Step-1: START 200
(No symbol or literal is found here, so both tables are empty.)
Step-2: MOVER R1, ='3' 200
(='3' is a literal, so an entry is made in the literal table.)
Literal   Address
='3'      –––

Step-3: MOVEM R1, X 201
X is a symbol referenced before its declaration, so it is stored in the symbol table with a
blank address field.
Symbol    Address
X         –––

Step-4: L1 MOVER R2, ='2' 202
L1 is a label and ='2' is a literal, so both are stored in their respective tables.
Symbol    Address
X         –––
L1        202

Literal   Address
='3'      –––
='2'      –––

Step-5: LTORG 203
The first literal is assigned the address given by the LC value, i.e., 203.
Literal   Address
='3'      203
='2'      –––

Step-6: X DS 1 204
This is a data declaration statement: X is assigned a data space of 1. X was referenced in
step 3 but is only defined here in step 6. This is the Forward Reference Problem, where a
symbol is used before its declaration; it is handled by filling in the blank address once the
definition is reached (back-patching). The assembler now assigns X the address given by the
LC value of the current step.
Symbol    Address
X         204
L1        202

Step-7: END 205
END marks the end of the source program, and the remaining literal gets the address given by
the LC value of the END statement. The complete symbol and literal tables built by pass-1 of
the assembler are:
Symbol    Address
X         204
L1        202

Literal   Address
='3'      203
='2'      205

The tables generated by pass 1, along with the LC values, go to pass 2 of the assembler for
further processing of pseudo-opcodes and machine op-codes.
Working of Pass-2
Pass-2 of the assembler generates machine code by converting symbolic machine opcodes into
their respective bit configurations (machine-understandable form). Each machine opcode is
looked up in the MOT (machine opcode table), which holds the symbolic code, its length, and its
bit configuration. Pseudo-ops are processed using the POT (pseudo-op table). Various
databases required by pass-2:
1. MOT table(machine opcode table)
2. POT table(pseudo opcode table)
3. Base table(storing value of base register)
4. LC ( location counter)

[Figure: flowchart of Pass-2 of the assembler]

Comparison Between Two-Pass and Single-Pass Assemblers

Feature             Two-Pass Assembler                              Single-Pass Assembler
Number of Passes    Two passes over the source code                 Only one pass over the source code
Forward References  Handled easily due to the two passes            Requires backpatching or fix-up table to handle
Speed               Slower due to two passes                        Faster since only one pass is required
Error Detection     Better error detection and handling             Limited error detection due to single pass
Memory Usage        Requires more memory for intermediate files     More memory efficient due to lack of intermediate files
Implementation      Easier to implement for complex assembly tasks  More complex implementation for large programs
UNIT III: MACRO
INTRODUCTION TO MACROS
A macro is a sequence of instructions that can be invoked by a single name. Macros are
extensively used in assembly language programming and high-level languages to simplify
repetitive tasks, improve readability, and allow for more efficient code development. The core
idea is to encapsulate frequently used code blocks into a single construct, which, when called,
expands into the complete set of instructions.
 Definition: Macros are preprocessor directives that define a block of code under a single
identifier, which is expanded in-line where the macro is invoked. It reduces code redundancy and
the need for repeated writing of identical code blocks.
 Key Characteristics:
o Code is substituted at the place of invocation.
o The macro definition is not part of the final executable; only the expanded code is.
o Can take parameters to make the macro flexible.
 Purpose of Macros:
o Reduce Code Repetition: Instead of writing the same set of instructions multiple times, a macro
allows you to define it once and reuse it.
o Encapsulate Complex Operations: Complex and repetitive tasks can be encapsulated into
simple macros, making code easier to understand.
o Simplify Maintenance: Any changes needed in the macro logic can be made in one place
(macro definition), and the change will be reflected wherever the macro is invoked.
o Improve Readability: By using well-named macros, the intent of the code becomes clearer.
 Macro Expansion: When a macro is invoked in the source code, the assembler or preprocessor
replaces the macro call with the corresponding block of instructions defined in the macro. This
process is known as macro expansion.

2. Types of Macros
Macros can be classified into various types based on their complexity, functionality, and how they
are used. These types offer flexibility in how code is defined and reused.
a. Simple Macros
 Description: The most basic type of macro, which simply replaces the macro name with a fixed
block of code. There are no parameters or conditions in simple macros.
 Example:
ADD_VALUES MACRO
MOV AX, BX
ADD AX, CX
ENDM
Here, ADD_VALUES is a simple macro that moves the content of BX into AX and adds the value
of CX to it.
b. Parameterized Macros
 Description: Macros that accept arguments, allowing the same block of code to be reused with
different values or variables. The arguments act as placeholders that are replaced with actual
values during macro invocation.
 Example:
ADD_VALUES MACRO A, B
MOV AX, A
ADD AX, B
ENDM
When invoked with specific parameters like ADD_VALUES 5, 10, the macro expands to use
those values, making it more versatile than a simple macro.
c. Conditional Macros
 Description: Macros that incorporate conditional logic, typically using assembly language
directives like IF, ELSE, and ENDIF. This allows the macro to generate different code based on
certain conditions.
 Example:
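MAXIMUM MACRO A, B
IF A GT B
MOV AX, A
ELSE
MOV AX, B
ENDIF
ENDM
The IF condition is evaluated at assembly time, so A and B must be constants. Depending on
whether A is greater than B, the macro generates the instruction that moves the larger value
into the AX register.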
d. Nested Macros
 Description: Macros that are defined within other macros. This allows for complex hierarchies
and modularization within macro definitions.
 Example:
OUTER_MACRO MACRO X
NESTED_MACRO MACRO Y
MOV AX, X
ADD AX, Y
ENDM
NESTED_MACRO 5
ENDM
OUTER_MACRO defines NESTED_MACRO, which can then be invoked with parameters inside
OUTER_MACRO.
e. Recursive Macros
 Description: Macros that invoke themselves, either directly or indirectly. This kind of macro is
powerful but should be used carefully to avoid infinite recursion.
 Example:
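FACTORIAL MACRO N
IF N LE 1
MOV AX, 1
ELSE
FACTORIAL N-1
MOV BX, N
MUL BX
ENDIF
ENDM
Each expansion first expands FACTORIAL for N-1 and then multiplies AX by N (MUL takes a single
operand and multiplies it with AX). The recursion stops when N reaches 1.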

3. Design of Macro Processor


A macro processor is a system software that handles macro definition, expansion, and
invocation. Macro processors are part of the assembler or preprocessor system and are
responsible for expanding macros into the full set of assembly instructions before the actual
assembly or compilation of the source code.
The two main designs of macro processors are Single Pass and Double Pass processors.
DESIGN OF SINGLE PASS MACRO PROCESSOR: A single pass macro processor
reads and processes the source code in a single scan. It handles both macro definitions and
expansions during this single pass.
 Advantages:
o Faster as it only requires one pass over the source code.
o Simple to implement.
 Disadvantages:
o Cannot handle forward references (i.e., macros must be defined before they are used).
 Components:
o Macro Definition Table (MDT): Stores macro definitions.
o Macro Name Table (MNT): Stores macro names and points to corresponding entries in the MDT.
o Argument List Array (ALA): Stores arguments for each macro invocation.
 Operation:
1. The processor scans the source code line by line.
2. When a macro is defined, it is stored in the MDT and MNT.
3. When a macro is invoked, the processor fetches its definition from the MDT, substitutes
arguments from the ALA, and inserts the expanded code back into the source.

A one-pass macro processor that alternates between macro definition and macro expansion is
able to handle a "macro in macro" (a macro defined inside another macro). However, because of
the one-pass structure, the definition of a macro must appear in the source program before any
statement that invokes that macro. This restriction is reasonable and does not create any real
inconvenience.
Three main data structures are involved in a one-pass macro processor:
• DEFTAB: Stores the macro definition, including the macro prototype and macro body. Comment
lines are omitted. References to the macro instruction parameters are converted to a positional
notation for efficiency in substituting arguments.
• NAMTAB: Stores macro names and serves as an index to DEFTAB; each entry contains pointers to
the beginning and end of the definition.
• ARGTAB: Used during the expansion of macro invocations. When a macro invocation
statement is encountered, the arguments are stored in this table according to their position in
the argument list.
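The interplay of the three tables can be sketched as follows. This is a simplified illustration under stated assumptions: the `NAME MACRO params` / `ENDM` syntax follows the examples earlier in this unit, the `?1`, `?2` positional notation and the function name `expand_macros` are chosen only for the sketch, and nested definitions are not handled.

```python
import re

def expand_macros(lines):
    """One-pass macro processor sketch using DEFTAB, NAMTAB and ARGTAB."""
    deftab = []   # DEFTAB: body lines with parameters in positional notation (?1, ?2, ...)
    namtab = {}   # NAMTAB: macro name -> (start, end) indexes into DEFTAB
    output = []
    i = 0
    while i < len(lines):
        tokens = lines[i].split(None, 2)
        if len(tokens) >= 2 and tokens[1] == "MACRO":
            # Macro definition: record the body in DEFTAB and index it in NAMTAB.
            name = tokens[0]
            params = [p.strip() for p in tokens[2].split(",")] if len(tokens) > 2 else []
            start = len(deftab)
            i += 1
            while lines[i].strip() != "ENDM":
                body = lines[i].strip()
                for pos, param in enumerate(params, 1):
                    body = re.sub(rf"\b{re.escape(param)}\b", f"?{pos}", body)
                deftab.append(body)
                i += 1
            namtab[name] = (start, len(deftab))
        elif tokens and tokens[0] in namtab:
            # Macro invocation: fill ARGTAB by position, then substitute into the body.
            arg_text = lines[i].split(None, 1)[1] if len(tokens) > 1 else ""
            argtab = [a.strip() for a in arg_text.split(",")] if arg_text else []
            start, end = namtab[tokens[0]]
            for body in deftab[start:end]:
                output.append(
                    re.sub(r"\?(\d+)", lambda m: argtab[int(m.group(1)) - 1], body)
                )
        else:
            output.append(lines[i].strip())   # ordinary statement: copy through
        i += 1
    return output

source = [
    "ADD_VALUES MACRO A, B",
    "MOV AX, A",
    "ADD AX, B",
    "ENDM",
    "ADD_VALUES 5, 10",
    "ADD_VALUES CX, DX",
]
expanded = expand_macros(source)
```

Since definitions and expansions are handled in the same scan, an invocation that appears before its definition would simply be copied through unexpanded, which is the restriction noted above.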
DATA STRUCTURE

ALGORITHM

DESIGN OF DOUBLE PASS MACRO PROCESSOR: A double pass macro
processor reads the source code twice. The first pass records macro definitions and stores them
in tables, while the second pass performs the actual macro expansions.
 Advantages:
o Handles forward references, allowing macros to be used before their definitions.
o More flexible and robust than the single pass processor.
 Disadvantages:
o Slower because of the two passes over the code.
 Components:
o Similar to the single pass processor, with MDT, MNT, and ALA.
 Operation:
1. First Pass: The processor scans the code, identifies macro definitions, and stores them
in tables.
2. Second Pass: Macro invocations are expanded using the information from the first pass,
with argument substitution and code generation.

Like an assembler or a loader, a macro processor can be designed with two passes:
• First pass: process all macro definitions, and
• Second pass: expand all macro invocation statements.
However, such a macro processor cannot allow the body of one macro instruction to contain
definitions of other macros, because all macros would have to be defined during the first pass
before any macro invocations were expanded.
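A rough sketch of the two-pass arrangement is given below. The function names `pass1_collect` and `pass2_expand` and the INCR macro are hypothetical, and the `MACRO` / `ENDM` syntax is assumed from the earlier examples. The point it illustrates is that an invocation placed before its definition is still expanded.

```python
import re

def pass1_collect(lines):
    """First pass: store every macro definition, keep all other lines."""
    macros, rest = {}, []
    it = iter(lines)
    for line in it:
        tokens = line.split(None, 2)
        if len(tokens) >= 2 and tokens[1] == "MACRO":
            params = [p.strip() for p in tokens[2].split(",")] if len(tokens) > 2 else []
            body = []
            for inner in it:              # consume lines up to the matching ENDM
                if inner.strip() == "ENDM":
                    break
                body.append(inner.strip())
            macros[tokens[0]] = (params, body)
        else:
            rest.append(line.strip())
    return macros, rest

def pass2_expand(macros, rest):
    """Second pass: expand every invocation using the stored definitions."""
    out = []
    for line in rest:
        name, _, arg_text = line.partition(" ")
        if name in macros:
            params, body = macros[name]
            args = [a.strip() for a in arg_text.split(",")] if arg_text.strip() else []
            mapping = dict(zip(params, args))
            for stmt in body:
                # Replace each whole word that is a parameter name with its argument.
                out.append(re.sub(r"\b\w+\b",
                                  lambda m: mapping.get(m.group(0), m.group(0)),
                                  stmt))
        else:
            out.append(line)
    return out

source = [
    "INCR AX",          # used before the definition below
    "INCR MACRO R",
    "ADD R, 1",
    "ENDM",
    "INCR BX",
]
macros, rest = pass1_collect(source)
expanded = pass2_expand(macros, rest)
```

A definition nested inside another macro body would be swallowed as plain body text by the first pass here, which mirrors the limitation described above.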

DEBUGGER AND ITS FEATURES


A debugger is a tool that helps in identifying and correcting errors (bugs) in software. It allows
programmers to monitor the execution of a program, examine memory and register contents, and
control the flow of execution.
a. Breakpoints
 A breakpoint allows the program to stop at a specific line of code. This is useful for examining
the state of the program at critical points.
 Usage: To inspect variables or check if the control flow is working as expected.
b. Step-by-Step Execution
 Allows the program to be executed one instruction at a time.
 Step Into: Executes each line of code, including entering into function calls.
 Step Over: Skips over function calls but executes them as a whole.
 Step Out: Exits the current function and returns to the caller.
c. Variable Inspection
 The debugger allows inspection of variable values at runtime. You can monitor the state of
variables and how they change throughout execution.
 Usage: Helps to find incorrect values or detect unexpected behavior in data.
d. Watchpoints
 A watchpoint pauses execution when the value of a specific variable changes. It is like a dynamic
breakpoint for a particular data point.
 Usage: Useful for debugging memory issues or tracking down unexpected variable changes.
e. Memory Dump
 A memory dump displays the contents of memory locations. In low-level programming, it is often
necessary to examine memory to debug problems related to pointers or memory corruption.
 Usage: Helps in understanding how data is stored in memory and finding issues like buffer
overflows.
f. Call Stack Inspection
 A call stack shows the sequence of function calls that led to the current point of execution. It
helps track the program's flow and identify where errors occurred.
 Usage: Especially useful in recursive functions or debugging crashes that occur deep in the
function call hierarchy.
g. Register Inspection
 In assembly language and systems programming, the debugger allows you to inspect and modify
the contents of CPU registers.
 Usage: Crucial for debugging hardware-level interactions or assembly-level programs.
UNIT IV: LINKERS & LOADERS:
Linkers and loaders are essential tools in the software
development lifecycle, responsible for transforming human-
written code into a format that can be executed by the
computer's CPU. Both components play a pivotal role in
program execution, and understanding their functions, types,
and advantages is vital for developers and system architects.
Linkers and loaders are essential components in program
execution, ensuring that compiled code is correctly combined,
loaded into memory, and ready to run. Different types of
loaders serve different purposes, from the simplicity of
absolute loaders to the flexibility and efficiency of dynamic
linking loaders. Understanding these tools is crucial for
developing efficient, reliable, and flexible software systems.
The choice of loader affects both program performance and
system design, so developers must consider these factors
when building and deploying applications.

1. Introduction to Linkers & Loaders


Linkers
A linker is a utility program that takes multiple object files,
produced by a compiler, and combines them into a single
executable file. The primary responsibilities of a linker include
symbol resolution and relocation.
 Symbol resolution: Programs usually consist of multiple
modules. Functions or variables defined in one module may be
referenced in another. The linker resolves these references by
identifying their corresponding definitions.
 Relocation: Programs use memory addresses that are often not
final. The linker adjusts these addresses so that they align with
where the program will be loaded into memory.
Example:
Consider two source files: main.c and helper.c. main.c calls a
function that is defined in helper.c:
// main.c
extern void helperFunction(void);
int main(void) {
    helperFunction();
    return 0;
}

// helper.c
void helperFunction(void) {
    // Do something
}
When these are compiled into object files, main.o will contain
a reference to helperFunction, but it won't know its memory
location. The linker resolves this by assigning the correct
memory address of helperFunction from helper.o and creates
the final executable.
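Symbol resolution and relocation can be modelled with a small Python sketch. The object-module layout used here (a list of words in "code", a "defs" map and a "refs" list) is a made-up simplification for illustration and does not correspond to any real object file format.

```python
def link(modules):
    """Combine toy object modules into one image, resolving symbol references."""
    # Step 1: give each module a base address in the combined image.
    bases, base = {}, 0
    for name, mod in modules.items():
        bases[name] = base
        base += len(mod["code"])

    # Step 2: build the global symbol table from every module's definitions.
    symbols = {}
    for name, mod in modules.items():
        for sym, offset in mod["defs"].items():
            if sym in symbols:
                raise ValueError(f"duplicate definition of {sym}")
            symbols[sym] = bases[name] + offset

    # Step 3: copy each module's code and patch the unresolved references.
    image = []
    for name, mod in modules.items():
        code = list(mod["code"])
        for offset, sym in mod["refs"]:
            if sym not in symbols:
                raise ValueError(f"undefined reference to {sym}")
            code[offset] = symbols[sym]
        image.extend(code)
    return image, symbols

# main.o calls helperFunction but does not know its address (None placeholder).
modules = {
    "main.o": {
        "code": ["CALL", None, "RET"],
        "defs": {"main": 0},
        "refs": [(1, "helperFunction")],
    },
    "helper.o": {
        "code": ["NOP", "RET"],
        "defs": {"helperFunction": 0},
        "refs": [],
    },
}
image, symbols = link(modules)
```

In this model helper.o is placed right after main.o, so helperFunction ends up at address 3 and the placeholder in main.o is patched with that value.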
Loaders
A loader is a system program that loads the executable file
created by the linker into memory, prepares it for execution,
and then initiates the program. It typically performs the
following key tasks:
1. Loading: Reads the executable file from storage (e.g., disk) and
copies it into memory.
2. Relocation: Adjusts memory addresses if the program uses
relocatable memory references.
3. Memory allocation: Allocates memory segments for code,
data, and stack.
4. Linking libraries: Dynamically links libraries if the program uses
dynamic linking.
5. Transfer of control: Transfers execution to the program’s entry
point (e.g., main() in C programs).
Example:
When you execute a program like ./myProgram on a Linux
machine, the kernel maps the binary into memory, the dynamic
linker/loader (ld-linux) links any required shared libraries (e.g.,
libc.so), and control eventually reaches main().

2. Functions of a Loader
The loader performs several vital functions that are crucial for
the proper execution of programs. Below are the main tasks
that a loader typically handles:
1. Memory Allocation
The loader allocates memory for the program’s text segment
(where instructions are stored), data segment (for global/static
variables), and stack (for function calls and local variables).
Example:
If a program’s text segment needs 100 KB of memory and its
data segment requires 50 KB, the loader allocates 150 KB of
memory and ensures that each segment is placed in separate
memory locations.
2. Relocation
Relocation involves adjusting memory addresses based on
where the program is actually loaded. During the linking stage,
memory addresses are typically relative (e.g., starting from 0).
When the program is loaded, the loader updates these
addresses to reflect where the program actually resides in memory
(virtual addresses on systems that use virtual memory).
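As a rough illustration of load-time relocation, the sketch below adds the load address to every word listed in a relocation table. The image contents, the table and the load address 0x4000 are invented for the example.

```python
def relocate(image, relocation_table, load_address):
    """Return a copy of the image with relocatable words adjusted for load_address."""
    loaded = list(image)
    for index in relocation_table:
        loaded[index] += load_address   # address was relative to 0 at link time
    return loaded

# Words at indexes 1 and 3 hold addresses relative to 0; the rest are plain data.
image = [10, 3, 20, 0, 99]
relocation_table = [1, 3]
loaded = relocate(image, relocation_table, 0x4000)
```

Only the entries named in the relocation table change; the other words are copied unchanged, which is why the linker must record which words hold addresses.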
3. Symbol Resolution
In the case of dynamic loading, where a program depends on
external libraries, the loader resolves external symbols (e.g.,
function names, variable names) by finding the appropriate
libraries and linking them to the main program.
4. Dynamic Loading of Libraries
In modern systems, programs often rely on dynamically loaded
libraries (e.g., .dll files in Windows or .so files in Linux). The
loader is responsible for loading these libraries into memory at
runtime.
5. Transfer of Control
After all initializations and setups (memory allocation, symbol
resolution, etc.), the loader passes control to the program’s
entry point, such as the main() function in C/C++ programs.
Example of Loader Functionality:
When a user executes a program, the loader may find that it
depends on a dynamic library like libssl.so for SSL encryption.
The loader searches for this library, loads it into memory, links
it to the main program, and updates all references to libssl.so
functions.

3. Types of Loaders
Different types of loaders are used depending on the system
design and the specific needs of the application. The following
are common types:
1. Absolute Loader
In an absolute loader, the executable file contains absolute
memory addresses. The loader simply transfers the program to
a specific location in memory.
 Advantages:

1. Simple: No need for address adjustment or relocation.
2. Fast: The loader doesn’t perform relocation, making the
process quicker.
3. Straightforward design: Simple to implement.
4. Low overhead: Less CPU/memory usage during loading.
5. Efficient for embedded systems: Often used in embedded
systems where hardware is fixed.
 Disadvantages:

1. Inflexibility: The program can only be loaded into a specific
memory location.
2. Memory wastage: Multiple programs may not be able to share
memory efficiently.
3. Difficult to multitask: Harder to implement multitasking
systems.
4. Compatibility issues: Must be designed for specific
hardware/memory.
5. Manual adjustments required: The developer must carefully
manage memory layout.
Example: Early computers used absolute loaders since memory
allocation was static and predictable.
2. Relocating Loader
Relocating loaders adjust memory addresses at load time,
allowing the program to be loaded into any memory region.
This is more flexible than absolute loading.
 Advantages:

1. Flexibility: The program can be loaded anywhere in memory.
2. Memory efficiency: Multiple programs can share memory.
3. Supports multitasking: Ideal for modern operating systems.
4. Dynamic memory allocation: Programs do not require fixed
memory regions.
5. Easier to manage large programs: Efficiently handles large,
complex applications.
 Disadvantages:

1. Performance overhead: Relocation requires extra processing.
2. More complex implementation: Requires additional code to
handle relocation.
3. Time overhead: More time required during program loading.
4. Fragmentation issues: May lead to memory fragmentation.
5. Increased resource usage: Uses more system resources than
absolute loaders.
Example: Modern operating systems like Linux use relocating
loaders to dynamically allocate memory for processes.
3. Dynamic Linking Loader
Dynamic linking loaders load and link libraries at runtime
rather than copying them into the executable at link time. This
reduces the size of the executable and
allows multiple programs to share the same library.
 Advantages:

1. Reduced memory usage: Shared libraries are loaded only once
and used by multiple programs.
2. Smaller executable size: Since libraries are not included in the
executable, the size is reduced.
3. Efficient updates: Libraries can be updated independently of
programs.
4. Improves program portability: Programs can be distributed
without their libraries.
5. Supports plugin architectures: Dynamic loading makes it easier
to add or remove functionality via plugins.
 Disadvantages:
1. Runtime overhead: Library loading adds overhead at runtime.
2. Complexity: More complex to implement and manage
dependencies.
3. Potential for missing libraries: If a required library is not
available at runtime, the program fails to execute.
4. Version conflicts: Programs may encounter "DLL hell" where
incompatible versions of libraries cause errors.
5. Security risks: Dynamically loading libraries can pose a security
risk if malicious libraries are loaded.
Example: Windows uses .dll files for dynamic linking, while
Linux uses .so files. When you run a program, the loader
dynamically links the necessary libraries.
4. Bootstrap Loader
The bootstrap loader is responsible for loading the operating
system kernel when the system starts. It initializes hardware
and loads the OS from a non-volatile storage medium (e.g.,
hard drive or SSD).
 Advantages:

1. Automates system startup: Loads the operating system
automatically when the computer is powered on.
2. Enables hardware initialization: Sets up essential hardware
components.
3. Supports multiple boot environments: Can load different
operating systems or configurations.
4. Essential for embedded systems: Loads fixed applications on
embedded systems.
5. Versatile: Can be customized to support various booting
options.
 Disadvantages:

1. Limited functionality: Primarily focused on OS loading, with
minimal error handling.
2. Hardware dependencies: The bootstrap process is highly
dependent on the underlying hardware architecture.
3. Potential for failure: If the bootstrap loader is corrupted, the
system may fail to boot.
4. Harder to debug: Issues in the bootstrap process are often
difficult to diagnose.
5. Security risks: Vulnerable to boot-time attacks like rootkits.
Example: The BIOS or UEFI firmware in modern computers
starts the bootstrap process, locating and running a boot loader
that loads the OS kernel when the system is powered on.

4. Databases Used in Loaders


Loaders maintain internal databases that store information
necessary for efficient program loading and execution. These
databases include:
 Symbol Table: Stores information about all symbols (variables,
functions, etc.) used in the program.
 Relocation Table: Contains a list of addresses that need to be
adjusted during relocation.
 Import Table: Holds references to external functions or
variables that need to be resolved by dynamic linking.
 Export Table: Lists the symbols that a module or library makes
available to other programs.
 Error Table: Logs errors encountered during loading, such as
missing symbols or unresolved references.

5. Design of Loaders
1. Absolute Loader
An absolute loader directly loads programs without any
address modifications. It’s simple but lacks flexibility. Absolute
loaders are often used in embedded systems where programs
are always loaded into the same memory locations.
2. Dynamic Loading and DLL (Dynamic Link Library)
A dynamic loader loads shared libraries at runtime. This is
common in modern operating systems. For example, Windows
uses .dll files to dynamically load and link libraries to running
applications.
Advantages of DLLs:
 Memory efficiency: Shared libraries reduce memory
consumption.
 Code reusability: Libraries can be reused across different
programs.
 Ease of maintenance: Updating the library updates all
programs that depend on it.
Disadvantages of DLLs:
 Runtime overhead: Linking happens at runtime, adding
overhead.
 Compatibility issues: Different programs may require
incompatible versions of the same library, leading to "DLL hell."
Example of Dynamic Loading:
In Windows, a program may use a DLL for graphics rendering.
When the program is executed, the loader dynamically loads
the necessary DLL into memory, allowing the program to use
the library's functions.
UNIT V: BASICS OF COMPILER:
A SIMPLE COMPILER
A compiler is a computer program that transforms source code written in a high-level
language into low-level machine language. It translates the code written in one programming
language into another language without changing the meaning of the code. The compiler also
tries to make the resulting code efficient, optimizing it for execution time and memory space.
The compiling process includes basic translation mechanisms and error detection. The compiler
goes through lexical, syntax, and semantic analysis at the front end, and code generation
and optimization at the back end.

Features of Compilers
 Correctness
 Speed of compilation
 Preserve the correct meaning of the code
 The speed of the target code
 Recognize legal and illegal program constructs
 Good error reporting/handling
 Code debugging help

Difference between Interpreter, Assembler and Compiler


1. Compiler
 Definition: A compiler is a program that translates a high-level programming
language (like C, Java, etc.) into machine code (binary) that a computer's
processor can execute.
 Process: Compilation happens in a single go, translating the entire source
code into machine code before execution.
 Output: It produces an independent executable file that can be run multiple
times.
 Speed: Faster execution, as the code is already converted into machine
language.
 Error Handling: All errors are detected before execution, during the
compilation process.
 Examples: C and C++; Java is compiled to bytecode, which the JVM then
interprets or JIT-compiles.
2. Interpreter
 Definition: An interpreter translates high-level code into machine code line by
line and executes it immediately.
 Process: It processes the source code one line at a time, translating and
executing each line before moving to the next.
 Output: No separate executable file is generated; the source code is
interpreted every time it is run.
 Speed: Slower execution because the translation occurs at runtime.
 Error Handling: Errors are detected and reported one at a time, during the
execution phase.
 Examples: Python, Ruby, JavaScript, PHP, etc.
3. Assembler
 Definition: An assembler converts assembly language (a low-level language
closely related to machine language) into machine code.
 Process: The assembler translates assembly instructions directly to machine
instructions that the processor can execute.
 Output: Machine code or object code.
 Speed: Faster execution like compilers, as assembly language is very close
to machine language.
 Error Handling: Errors are detected during the assembly process.
 Examples: Programs written in assembly language for platforms like x86 or
ARM processors.

Feature             Compiler                                Interpreter                                           Assembler
Translation Method  Translates the entire program at once   Translates and executes line-by-line                  Translates assembly to machine code
Output              Produces an executable file             No intermediate executable file                       Produces object code
Execution Speed     Faster execution after compilation      Slower execution due to line-by-line interpretation   Fast execution after assembly
Error Detection     Detects all errors during compilation   Detects errors at runtime                             Detects assembly-level errors
Use Case            Suitable for large programs             Suitable for scripting and small tasks                Suitable for low-level programming

Types of Compiler
 Single Pass Compilers
 Two Pass Compilers
 Multipass Compilers
Single Pass Compiler

In a single pass compiler, the source code is transformed directly into machine code in one
pass. For example, compilers for the Pascal language.
Two Pass Compiler

A two pass compiler is divided into two sections, viz.
1. Front end: It maps legal code into Intermediate Representation (IR).
2. Back end: It maps IR onto the target machine.
The two pass compiler method also simplifies the retargeting process. It also allows multiple front
ends.
Multipass Compilers

The multipass compiler processes the source code or syntax tree of a program several times. It
divides a large program into multiple small programs and processes them, developing multiple
intermediate codes. Each pass takes the output of the previous pass as its input, so it
requires less memory. It is also known as a 'Wide Compiler'.

ANALYSIS - SYNTHESIS MODEL OF COMPILATION


There are two parts of compilation:
Analysis
Synthesis
>The analysis part breaks up the source program into constituent pieces and
creates an intermediate representation of the source program.
>The synthesis part constructs the desired target program from the intermediate representation.

> During analysis, the operations implied by the source program are determined and recorded in a
hierarchical structure called a tree.
>Often, a special kind of tree called a syntax tree is used.
>In syntax tree each node represents an operation and the children of the node represent the
arguments of the operation.
>For example, a syntax tree of an assignment statement is shown below.
The analysis-synthesis model is a foundational concept in compiler design, breaking the
compilation process into two major phases: Analysis and Synthesis.
Analysis Phase
The analysis phase is responsible for examining the source code and breaking it down into its
components. It consists of the following steps:
1. Lexical Analysis:
o This is the first step, where the compiler reads the source code and converts
it into tokens (basic syntactic units).
o The lexical analyzer (lexer) identifies keywords, operators, identifiers, and
literals using regular expressions and finite automata.
o Example: For the statement int a = 5;, the tokens generated might be int, a,
=, 5, and ;.
2. Syntax Analysis:
o In this step, the compiler parses the token sequence to ensure it adheres to
the grammatical structure of the language.
o The syntax analyzer (parser) constructs a syntax tree (or parse tree) that
represents the hierarchical structure of the program.
o Example: The parser might build a tree that shows the assignment operation,
with a as the variable being assigned and 5 as the value.
3. Semantic Analysis:
o This step verifies the logical correctness of the program, checking for
semantic errors such as type mismatches or undeclared variables.
o The semantic analyzer checks that operations are valid for the types involved
(e.g., you cannot add a string to an integer).
o Example: If b is declared as a string and the program attempts to add it to an
integer, a semantic error will be flagged.
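As a rough illustration of the lexical analysis step, the sketch below tokenizes the statement `int a = 5;` with regular expressions. The token class names, the small keyword set, and the function `tokenize` are assumptions made for this example; a real lexer for a full language would need many more rules.

```python
import re

# Token classes, tried in order; KEYWORD must come before IDENTIFIER
# so that "int" is not classified as an ordinary name.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:int|float|char|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("NUMBER",     r"\d+"),
    ("OPERATOR",   r"[=+\-*/]"),
    ("SEMICOLON",  r";"),
    ("SKIP",       r"\s+"),
    ("MISMATCH",   r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Convert a source string into a list of (token class, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":          # whitespace carries no meaning here
            continue
        if kind == "MISMATCH":      # a character no rule recognises
            raise SyntaxError(f"unexpected character {text!r}")
        tokens.append((kind, text))
    return tokens


print(tokenize("int a = 5;"))
# [('KEYWORD', 'int'), ('IDENTIFIER', 'a'), ('OPERATOR', '='), ('NUMBER', '5'), ('SEMICOLON', ';')]
```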
Synthesis Phase
The synthesis phase takes the analyzed representation and constructs the final machine code. It
consists of the following steps:
1. Intermediate Code Generation:
o After the analysis phase, the compiler produces an intermediate
representation (IR) of the code that is easier to manipulate and optimize than
the original source code.
o Example: The statement int a = 5; might be represented in IR as LOAD 5
INTO a.
2. Optimization:
o The compiler optimizes the intermediate code to improve performance and
reduce resource usage. Optimization can occur at different levels: local
optimizations (within a single basic block), global optimizations (across a
whole function), and interprocedural optimizations (across multiple functions).
o Example: If a variable is assigned a value that is not used later in the
program, the compiler can eliminate that assignment to save space and time.
3. Code Generation:
o The final step of the synthesis phase involves generating the target machine
code from the optimized intermediate representation.
o The code generator produces assembly or machine language instructions
specific to the target architecture.
o Example: The IR instruction LOAD 5 INTO a might be translated into a
machine instruction like MOV R1, 5 followed by MOV a, R1.
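The two examples from the synthesis steps above (removing an unused assignment, then lowering `LOAD 5 INTO a` to a pair of `MOV` instructions) can be sketched together. The tuple-based IR, the `PRINT` instruction, and the `OUT` mnemonic are hypothetical and exist only so the example has something that uses a variable.

```python
def eliminate_dead_loads(ir):
    """Drop LOAD instructions whose variable is never used afterwards.

    Assumption: in this toy IR the only instruction that uses a variable
    is ("PRINT", variable).
    """
    used = {instr[1] for instr in ir if instr[0] == "PRINT"}
    return [instr for instr in ir if instr[0] != "LOAD" or instr[2] in used]


def generate(ir):
    """Lower each IR instruction to instructions of a made-up target machine."""
    asm = []
    for instr in ir:
        if instr[0] == "LOAD":
            _, constant, variable = instr
            asm += [f"MOV R1, {constant}", f"MOV {variable}, R1"]
        else:  # PRINT
            asm.append(f"OUT {instr[1]}")
    return asm


# IR for:  int a = 5;  int unused = 7;  print(a);
ir = [("LOAD", 5, "a"), ("LOAD", 7, "unused"), ("PRINT", "a")]

optimized = eliminate_dead_loads(ir)
print(generate(optimized))  # ['MOV R1, 5', 'MOV a, R1', 'OUT a']
```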

THE PHASES OF A COMPILER


A compiler operates in phases, each of which transforms the source program from one representation to another. Every phase takes its input from the previous phase and feeds its output to the next phase of the compiler.
There are six phases in a compiler. Each phase helps convert the high-level language into machine code. The phases of a compiler are:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generator
5. Code optimizer
6. Code generator

Together, these phases convert the source code by dividing it into tokens, building parse trees, generating intermediate code, and optimizing that code before the target code is produced.
Lexical Analysis
The first phase of the compiler works as a text scanner. This phase scans the source code as a stream of characters and groups it into meaningful lexemes. The lexical analyzer represents these lexemes in the form of tokens as:
<token-name, attribute-value>

Syntax Analysis
The next phase is called syntax analysis or parsing. It takes the tokens produced by lexical analysis as input and generates a parse tree (or syntax tree). In this phase, the arrangement of tokens is checked against the grammar of the source language, i.e. the parser checks whether the expression formed by the tokens is syntactically correct.
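One common way to implement this phase is a recursive-descent parser, sketched below for simple arithmetic expressions. The grammar (`expr`, `term`, `factor`), the nested-tuple tree format, and the function name `parse` are assumptions made for the example; the notes do not prescribe a particular parsing technique.

```python
def parse(tokens):
    """Parse a list of token strings into a nested-tuple tree.

    Grammar (assumed for this sketch):
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NAME | NUMBER | '(' expr ')'
    Raises SyntaxError if the tokens do not fit the grammar.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        token = advance()
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is not None and token.isalnum():
            return token
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:                      # leftover input is an error
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse(["a", "+", "b", "*", "2"]))  # ('+', 'a', ('*', 'b', '2'))
```

Because `term` is called from inside `expr`, multiplication binds more tightly than addition without any explicit precedence table.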
Semantic Analysis
Semantic analysis checks whether the parse tree follows the rules of the language. For example, it checks that values are assigned between compatible data types and reports errors such as adding a string to an integer. The semantic analyzer also keeps track of identifiers, their types and expressions, and checks whether identifiers are declared before use. It produces an annotated syntax tree as output.
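A minimal sketch of such a check is shown below, assuming a tree of nested tuples `(operator, left, right)` in which strings are identifiers and Python integers are integer constants. The function name `annotate`, the `declared` dictionary, and the rule that both operands must have the same type are simplifications chosen for the example.

```python
def annotate(node, declared):
    """Return the tree with a type attached to every node.

    Leaves become (value, type); interior nodes become
    (operator, left, right, type). Raises NameError for an undeclared
    identifier and TypeError for mismatched operand types.
    """
    if isinstance(node, int):
        return (node, "int")
    if isinstance(node, str):
        if node not in declared:
            raise NameError(f"'{node}' used before declaration")
        return (node, declared[node])
    op, left, right = node
    left_annotated = annotate(left, declared)
    right_annotated = annotate(right, declared)
    left_type, right_type = left_annotated[-1], right_annotated[-1]
    if left_type != right_type:
        raise TypeError(f"cannot apply '{op}' to {left_type} and {right_type}")
    return (op, left_annotated, right_annotated, left_type)


declared = {"a": "int", "b": "string"}

print(annotate(("+", "a", 1), declared))
# ('+', ('a', 'int'), (1, 'int'), 'int')
```

Calling `annotate(("+", "a", "b"), declared)` raises `TypeError`, which corresponds to the string-plus-integer error mentioned in the text.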
Intermediate Code Generation
After semantic analysis, the compiler generates an intermediate code from the source code. It represents a program for some abstract machine and lies between the high-level language and the machine language. The intermediate code should be generated in such a way that it is easy to translate into the target machine code.
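Three-address code is one widely used form of intermediate code, in which each instruction has at most one operator. The sketch below flattens an expression tree into such instructions; the nested-tuple tree format, the temporary names `t1`, `t2`, and the function `to_three_address` are assumptions made for the example.

```python
from itertools import count


def to_three_address(tree):
    """Flatten a nested-tuple expression tree into three-address instructions."""
    code = []
    temps = count(1)                 # numbers for fresh temporaries t1, t2, ...

    def visit(node):
        if not isinstance(node, tuple):
            return str(node)         # identifier or constant: use it directly
        op, left, right = node
        left_name = visit(left)
        right_name = visit(right)
        if op == "=":
            code.append(f"{left_name} = {right_name}")
            return left_name
        temp = f"t{next(temps)}"     # hold the intermediate result
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    visit(tree)
    return code


# total = price + tax * 2
tree = ("=", "total", ("+", "price", ("*", "tax", 2)))
for line in to_three_address(tree):
    print(line)
# t1 = tax * 2
# t2 = price + t1
# total = t2
```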
Code Optimization
The next phase optimizes the intermediate code. Optimization removes unnecessary code and rearranges the sequence of statements so that the program runs faster and uses fewer resources (CPU, memory).
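Constant folding is one simple optimization of this kind: an operation whose operands are all known at compile time is computed by the compiler instead of at run time. The sketch below assumes instructions of the form `(dest, op, left, right)`, that every destination is assigned only once, and that folded destinations are temporaries not needed elsewhere; the function name `fold_constants` is illustrative.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(code):
    """Constant folding and propagation over straight-line three-address code."""
    known = {}       # names whose value is known at compile time
    result = []
    for dest, op, left, right in code:
        # Replace operands by their constant value where one is known.
        left = known.get(left, left)
        right = known.get(right, right)
        if isinstance(left, int) and isinstance(right, int):
            known[dest] = OPS[op](left, right)   # computed now, nothing emitted
        else:
            result.append((dest, op, left, right))
    return result


# t1 = 60 * 60 ; t2 = rate * t1
code = [("t1", "*", 60, 60), ("t2", "*", "rate", "t1")]
print(fold_constants(code))  # [('t2', '*', 'rate', 3600)]
```

The optimized program has one instruction instead of two and performs one multiplication fewer at run time.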
Code Generation
In this phase, the code generator takes the optimized representation of the intermediate code and maps it to the target machine language. The code generator translates the intermediate code into a sequence of (generally) relocatable machine code. This sequence of machine instructions performs the same task as the intermediate code.
Symbol Table
It is a data structure maintained throughout all the phases of a compiler. The names of all identifiers, along with their types, are stored here. The symbol table lets the compiler quickly find an identifier's record and retrieve it. It is also used for scope management.
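A symbol table with scope management is often implemented as a stack of dictionaries, one per open scope. The sketch below follows that idea; the class and method names are assumptions made for the example, and a production compiler would store more than just the type of each identifier.

```python
class SymbolTable:
    """Stack of scopes; each scope maps identifier names to their types."""

    def __init__(self):
        self.scopes = [{}]              # global scope; innermost scope is last

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_):
        if name in self.scopes[-1]:
            raise NameError(f"'{name}' already declared in this scope")
        self.scopes[-1][name] = type_

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"'{name}' is not declared")


table = SymbolTable()
table.declare("x", "int")
table.enter_scope()
table.declare("x", "float")             # shadows the outer x
print(table.lookup("x"))                # float
table.exit_scope()
print(table.lookup("x"))                # int
```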
Error Handling: Throughout the compilation process, the compiler must handle errors gracefully,
providing useful feedback to the programmer regarding syntax or semantic issues.

THE GROUPING OF PHASES


• Phases deal with the logical organisation of a compiler.
• In an implementation, activities from more than one phase are often grouped together.
• Front and Back Ends:
– The phases are collected into a front and a back end.
– The front end consists of those phases, or parts of phases, that depend primarily on the source language and are largely independent of the target machine.
• These normally include lexical and syntactic analysis, the creation of the symbol table, semantic analysis and intermediate code generation.
– The back end includes portions of the compiler that depend on the target machine.
• This includes part of code optimization and code generation.
• Passes:
– Several phases of compilation are implemented in a single pass consisting of reading an input file
and writing an output file.
• Reducing the number of passes:
– It is desirable to have few passes, because it takes time to read and write intermediate files.
– Grouping several phases into one pass, however, may force the entire program to be kept in memory, because one phase may need information in a different order than the previous phase produces it.
– Intermediate code and code generation are often merged into one pass using a technique called
backpatching.
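The idea behind backpatching is to emit a jump before its target address is known, remember where the gap is, and fill it in once the target has been generated. The sketch below shows this for a simple `if` statement; the instruction names `JUMP_IF_FALSE` and `EXEC`, the list-based instruction format, and the function `compile_if` are hypothetical.

```python
def compile_if(condition, then_body):
    """Emit code for 'if condition then then_body' in a single pass."""
    code = []
    patch_list = []

    # The jump must skip the body, but the body has not been emitted yet,
    # so the target is left as None and its position is remembered.
    code.append(["JUMP_IF_FALSE", condition, None])
    patch_list.append(len(code) - 1)

    for statement in then_body:
        code.append(["EXEC", statement])

    # Now the address just past the body is known: backpatch every gap.
    target = len(code)
    for index in patch_list:
        code[index][2] = target
    return code


for address, instruction in enumerate(compile_if("x < 10", ["y = 1", "z = 2"])):
    print(address, instruction)
# 0 ['JUMP_IF_FALSE', 'x < 10', 3]
# 1 ['EXEC', 'y = 1']
# 2 ['EXEC', 'z = 2']
```

Without backpatching, a second pass over the generated code would be needed just to fill in the jump targets.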
Compiler phases can be grouped into two major categories:
1. Front End: This part includes the lexical analysis, syntax analysis, and semantic
analysis phases, and usually intermediate code generation. The front end focuses on
understanding and validating the source code, independent of the target architecture.
2. Back End: This part includes the optimization and code generation phases. The
back end deals with transforming the intermediate representation into efficient
machine code tailored to a specific architecture.
This separation allows for modular design, enabling the front end to be used with different back
ends, making the compiler adaptable to various platforms.
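The benefit of this separation can be sketched with one toy front end feeding two different back ends through the same intermediate representation. Everything here is illustrative: the IR tuple `(dest, op, left, right)`, the function names, and both instruction sets are assumptions, and the front end only understands statements of the form `dest = left op right`.

```python
def front_end(source):
    """Toy front end: turn 'dest = left op right' into a target-independent IR."""
    dest, expression = [part.strip() for part in source.split("=")]
    left, op, right = expression.split()
    return (dest, op, left, right)


def register_back_end(ir):
    """Back end for a hypothetical register machine."""
    dest, op, left, right = ir
    mnemonic = {"+": "ADD", "*": "MUL"}[op]
    return [f"LOAD R1, {left}", f"{mnemonic} R1, {right}", f"STORE {dest}, R1"]


def stack_back_end(ir):
    """Back end for a hypothetical stack machine."""
    dest, op, left, right = ir
    mnemonic = {"+": "add", "*": "mul"}[op]
    return [f"push {left}", f"push {right}", mnemonic, f"pop {dest}"]


ir = front_end("total = price + tax")
print(register_back_end(ir))  # ['LOAD R1, price', 'ADD R1, tax', 'STORE total, R1']
print(stack_back_end(ir))     # ['push price', 'push tax', 'add', 'pop total']
```

Supporting a new target only requires writing another back-end function; the front end stays unchanged.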

COMPILER - CONSTRUCTION TOOLS


The process of compiler construction can be facilitated by various tools and technologies, helping
to automate and streamline development. Some common tools include:
1. Lexical Analyzers: Tools like Lex and Flex can generate lexical analyzers from
regular expressions, simplifying the tokenization process.
2. Parser Generators: Tools such as Yacc (Yet Another Compiler Compiler) and
Bison assist in creating parsers from context-free grammars, handling syntax
analysis efficiently.
3. Intermediate Code Generators: Frameworks like LLVM (originally short for Low-Level
Virtual Machine) provide infrastructure for generating and optimizing intermediate
code, allowing developers to focus on the front-end compilation processes.
4. Optimization Tools: Many modern compilers include built-in optimization
techniques; GCC, for example, offers extensive optimization options for
various target architectures.
5. Integrated Development Environments (IDEs): Many IDEs provide built-in
support for compiling and debugging code, offering an integrated approach to
software development.
6. Debuggers: Debugging tools assist in analyzing the generated code and identifying
issues during runtime, helping developers troubleshoot their programs effectively.
