
15CS72 ACA Module2Final

The document discusses various advanced processor technologies including CISC, RISC, superscalar, VLIW, superpipelined, vector, scalar, and symbolic processors, highlighting their architectures, advantages, and instruction execution methods. It compares the design space of these processors in terms of CPI and clock speed, and explains instruction pipelining and instruction set architecture (ISA). Additionally, it provides specific examples of CISC and RISC processors, detailing their features and performance characteristics.

Uploaded by

Tarun Sharma

Module 2

4.1 Advanced Processor Technology


The major processor families to be studied include:
1. CISC: Complex Instruction Set Computer
A complex instruction set computer (CISC, pronounced "sisk") is a computer in which a
single instruction can execute several low-level operations (such as a load from
memory, an arithmetic operation, and a memory store) or is capable of multi-step
operations, as the name "complex instruction set" suggests.
Take the example of multiplying two numbers stored at locations A and B: A = A * B.
A single MUL instruction performs the multiplication by carrying out the three low-level
operations given below.
1. Loads the two values into separate registers.
2. Multiplies the operands in the execution unit.
3. Stores the product in the appropriate location.
Thus, the entire task of multiplying two numbers can be completed with one instruction.
The programmer does not need to explicitly call any loading or storing functions.
Advantages:
1. The compiler has to do very little work to translate a high-level language statement
into assembly.
2. The code is relatively short.
3. Very little RAM is required to store instructions.
4. The emphasis is put on building complex instructions directly into the hardware.

2. RISC: Reduced Instruction Set Computer


RISC processors use only simple instructions that can each be executed within one clock
cycle. Thus, the "MUL" command described above is divided into three kinds of simpler
command: "LOAD", which moves data from the memory bank to a register; "PROD", which
finds the product of two operands located within the registers; and "STORE", which
moves data from a register to the memory bank. A programmer would need to code
four lines of assembly:
LOAD R1, A        ; load the value at location A into register R1
LOAD R2, B        ; load the value at location B into register R2
PROD R3, R1, R2   ; multiply R1 by R2 and place the product in R3
STORE R3, A       ; store the product back to location A
Advantages:
1. Because each instruction requires only one clock cycle to execute, the entire
program executes in approximately the same amount of time as the multi-cycle
"MUL" command.
2. These "reduced instructions" require fewer transistors of hardware space than the
complex instructions.
3. Pipelining is possible.
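The contrast between the single CISC instruction and the four RISC instructions can be sketched with a toy interpreter. This is only an illustration: the `run` function, the tuple encoding of instructions and the three-operand form of PROD are assumptions made here, not part of any real instruction set.

```python
# Toy machine: memory is a dict of named locations, registers are a dict.
# The opcodes and their operand order are hypothetical, chosen for illustration.
def run(program, memory):
    regs = {}
    for op, *args in program:
        if op == "LOAD":        # LOAD Rd, addr: register <- memory
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "PROD":      # PROD Rd, Rs1, Rs2: register-to-register multiply
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] * regs[rs2]
        elif op == "STORE":     # STORE Rs, addr: memory <- register
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "MUL":       # CISC-style MUL a, b: load, multiply and store in one instruction
            a, b = args
            memory[a] = memory[a] * memory[b]
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory

risc_program = [
    ("LOAD", "R1", "A"),
    ("LOAD", "R2", "B"),
    ("PROD", "R3", "R1", "R2"),
    ("STORE", "R3", "A"),
]
cisc_program = [("MUL", "A", "B")]

risc_result = run(risc_program, {"A": 6, "B": 7})
cisc_result = run(cisc_program, {"A": 6, "B": 7})
```

Both programs leave the same product in location A; the difference is in how many instructions the programmer (or compiler) has to emit and how much work each instruction hides.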

3. Superscalar: A superscalar processor is a CPU that implements a form of parallelism
called instruction-level parallelism within a single processor. A superscalar processor
can execute more than one instruction during a clock cycle by simultaneously
dispatching multiple instructions to different execution units on the processor.
A superscalar processor issues multiple instructions dynamically.

4. VLIW: Very long instruction word (VLIW) describes a computer processing architecture
in which a compiler or preprocessor breaks program instructions down into basic
operations that can be performed by the processor in parallel (that is, at the same
time). These operations are put into a very long instruction word, which the processor
can then take apart without further analysis, handing each operation to an appropriate
functional unit. Example: Itanium (3 operations). VLIW statically issues multiple
instructions at each cycle.

5. Superpipelined: Superpipelining attempts to increase performance by reducing the clock
cycle time. It achieves that by making each pipeline stage very shallow, resulting in a
large number of pipe stages. Superpipelining improves performance by
decomposing the long-latency stages (such as memory access stages) of a pipeline into
several shorter stages, thereby possibly increasing the number of instructions running in
parallel at each cycle.

6. Vector processor: A vector processor or array processor is a central
processing unit (CPU) that implements an instruction set containing instructions that
operate on one-dimensional arrays of data called vectors. Vector processors can greatly
improve performance on certain workloads, notably numerical simulation and similar
tasks. Vector processing is also used in video game consoles.

7. Scalar: Scalar processors represent a class of computer processors. A scalar processor
processes only one data item at a time, with typical data items being integers or floating
point numbers. A scalar processor is classified as a SISD processor (Single Instruction,
Single Data) in Flynn's taxonomy. Scalar processors execute one instruction
per cycle. A scalar processor can be a RISC scalar or a CISC scalar processor.

8. Symbolic: Symbolic processors are used in artificial intelligence. They are applied in
many areas, including text retrieval, machine intelligence, expert
systems, and so on. Symbolic processors are sometimes called PROLOG processors,
Lisp processors, or symbolic manipulators.

4.1.1 Design space of the processors

1. The goal of the processor manufacturers is to lower the CPI using innovative hardware
approaches. The figure below shows the comparison between CISC, RISC and Vector
processor with respect to CPI and Clock Speed.
2. Conventional processors like the Intel Pentium, M68040, VAX/8600, IBM 390, etc. fall into
the CISC architecture. The clock rate of today's CISC processors ranges up to a few GHz. The
CPI of CISC instructions varies from 1 to 20. CISC processors are in the upper part of
the design space.
3. RISC processors include SPARC, Power Series, MIPS, Alpha, ARM, etc. The average
CPI of a RISC instruction is around one or two clock cycles. Hence RISC is shown in the lower
part of the design space given below.
4. There are two categories of RISC, i.e., superscalar RISC and scalar RISC. The CPI of
superscalar RISC is even lower than that of scalar RISC.
5. Vector processors are supercomputers which use multiple functional units for
concurrent scalar and vector operations. The effective CPI of these processors is very
low, and they are positioned at the lower right corner of the design space as shown in the
figure.

Instruction Pipelines
1. Pipelining is an implementation technique in which the execution of multiple
instructions is overlapped. The pipeline has four stages, namely fetch, decode,
execute and writeback.
2. Basic definitions involved with the instruction pipeline:
Instruction pipeline cycle: the clock period of the instruction pipeline.
Instruction issue latency: the time (in cycles) required between the issuing of two
adjacent instructions.
Instruction issue rate: the number of instructions issued per cycle, also called the
degree of a superscalar processor.
Simple operation latency: simple operations make up the vast majority of instructions
executed by a machine, such as integer adds, loads, stores, branches, moves, etc. In
contrast, complex operations are those requiring a longer latency, such as divides,
cache misses, etc. These latencies are measured in number of cycles.
Resource conflicts: a situation where two or more instructions demand
use of the same functional unit at the same time.
3. A base scalar processor in which one instruction is issued per cycle is shown below.
There is one cycle latency between instruction issues. The pipeline is fully utilized if all
instructions are issued at the rate of one per clock cycle. This is an ideal pipeline where
the effective CPI rating is 1.

4. In an underpipelined processor with two cycles per instruction issue, the pipeline is
underutilized as shown in the figure below. The CPI will be 2.

5. Another underpipelined situation is shown in the figure given below. Here the pipeline
cycle time is doubled by combining pipeline stages. The fetch and decode phases are
combined into one stage, and the execute and writeback phases are combined into another
stage. This results in poor pipeline utilization.
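The three cases above (base scalar, two cycles per issue, and doubled cycle time) can be compared with a small cycle-count sketch. The function names and the choice of 1000 instructions are assumptions made here for illustration; the model ignores stalls and hazards.

```python
def total_cycles(n_instructions, n_stages=4, issue_latency=1):
    """Pipeline cycles needed to complete n instructions when a new
    instruction is issued every `issue_latency` cycles (no stalls assumed)."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles; each later one finishes
    # issue_latency cycles after its predecessor.
    return n_stages + (n_instructions - 1) * issue_latency

def effective_cpi(n_instructions, n_stages=4, issue_latency=1):
    return total_cycles(n_instructions, n_stages, issue_latency) / n_instructions

n = 1000
base = total_cycles(n)                          # one issue per cycle
underpipelined = total_cycles(n, issue_latency=2)  # one issue every two cycles
# Two combined stages, each taking two base clock periods:
merged_stages = total_cycles(n, n_stages=2) * 2
```

For long instruction streams the effective CPI of the base case approaches 1 and that of the two-cycle issue case approaches 2. The merged-stage case takes about as many base clock periods as the two-cycle issue case, which is why both are described as underpipelined.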

4.1.2 Instruction Set Architecture


The instruction set architecture (ISA) is the interface between the software and hardware. The ISA
defines the primitive commands or machine instructions, data types, registers, addressing
modes, opcode specification and flow control mechanisms used. The two popular instruction set
architectures are CISC and RISC.

Architectural distinctions between RISC and CISC


The CISC architecture has a unified cache for both instructions and data, whereas the RISC
architecture has a separate instruction cache and data cache (split cache). CISC has a
microprogrammed control unit, which generates the control signals on the basis of
microinstructions. RISC has a hardwired control unit, which is a logic circuit that generates
the control signals on the basis of its state. The hardwired control unit performs better than
the microprogrammed control unit and helps in reducing the CPI.

Comparison of RISC and CISC

Architectural consideration | CISC | RISC
Instruction set size and format | Large set of instructions with variable format (16-64 bits per instruction) | Small set of instructions with fixed format (32 bits per instruction)
Addressing modes | 12-24 | 3-5
General-purpose registers and cache design | 8-24 GPRs and unified cache | 32-192 GPRs and split cache
CPI | Between 2 and 16 | Average CPI < 1.5
CPU control | Microprogrammed | Hardwired

4.1.3 CISC Scalar processor

A CISC scalar processor may have an integer unit, a floating point unit, or even multiple such
units. It also supports pipelining. Some early representative CISC scalar processors
are the VAX 8600, Motorola MC68040, Intel i486, etc.

Digital Equipment VAX 8600 processor architecture


1. It was introduced by Digital Equipment Corporation in 1985. It has microprogrammed
control.
2. The instruction set has about 300 instructions with 20 addressing modes.
3. It contains two functional units i.e integer unit and floating point unit which are pipelined.
4. It has unified cache. The pipeline is designed with six stages.
5. The Translation Lookaside Buffer(TLB) is built in memory unit for fast generation of
physical address from the virtual address. The CPI ranges from 2 to 20 clock cycles.

The Motorola MC68040 microprocessor architecture


1. It has over 100 instructions and supports 18 addressing modes and has 16 general
purpose registers.
2. It has a split cache, i.e., a 4-Kbyte data cache and a 4-Kbyte instruction cache, with separate
memory management unit (MMU) and address translation cache (ATC).
3. The integer unit is pipelined with 6 stages and the floating point unit is pipelined with 3
stages.
4. It contains more than 1.2 million transistors.
5. The data formats range from 8 to 80 bits, with provision for the IEEE floating point
standard.
6. Snooping logic is built into the memory units for monitoring bus events for cache
invalidation.

4.1.4 RISC Scalar processor


Four representative RISC-based processors from the year 1990 are the Sun SPARC CY7C601,
Intel i860, Motorola M88100, and AMD 29000. All of them use 32-bit instructions. The instruction sets
consist of 51 to 124 basic instructions. On-chip floating-point units are built into the i860 and
M88100, while the SPARC and AMD use off-chip floating point units. All of them issue
essentially only one instruction per pipeline cycle.
Sun Microsystems SPARC architecture
1. SPARC stands for scalable processor architecture. The floating point unit is
implemented on a separate chip. The figure below shows the CY7C601 SPARC processor
architecture and the Cypress CY7C602 FPU. The SPARC instruction set contains 69
basic instructions. Fourteen out of the 69 are for floating point operations.
2. SPARC runs each procedure using a set of thirty-two 32-bit IU registers. Of the 32
registers, eight are global registers which are shared by all procedures and 24
are window registers associated with only one procedure.
3. The current window is specified by the current window pointer (CWP) field in the
processor state register (PSR). Window overflow and underflow are detected via the
window invalid mask (WIM) register.
4. The windows overlap as shown in the figure given below. The 24 window
registers are divided into three register sections labelled Ins, Locals and Outs. The
local registers are only locally addressable by each procedure, but the Ins and Outs are
shared among procedures.
5. The window of the currently running procedure is called the active window.
6. The overlapping windows save the time required for interprocedure communication,
resulting in fast context switching among procedures.
7. The floating point unit (FPU) features 32 single-precision (32-bit) or 16 double-precision
(64-bit) floating point registers.
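The overlap between windows can be sketched as a mapping from a window-relative register number to a slot in a circular physical register file. This is a simplified model: the window count `N_WINDOWS`, the slot layout and the function name `physical_register` are assumptions made here, with the callee taken to run in window CWP - 1 so that the caller's Outs become the callee's Ins.

```python
N_WINDOWS = 8  # assumed value; the notes do not fix the number of windows

def physical_register(cwp, r, n_windows=N_WINDOWS):
    """Map logical register r (0..31) of window `cwp` to a physical slot.

    r0-r7   globals (shared by all procedures)
    r8-r15  Outs, r16-r23 Locals, r24-r31 Ins
    Slots 0..7 hold the globals; the other 16 * n_windows slots form a
    circular buffer in which adjacent windows overlap by 8 registers.
    """
    if not 0 <= r < 32:
        raise ValueError("register number must be in 0..31")
    if r < 8:
        return r
    offset = r - 8  # 0..7 Outs, 8..15 Locals, 16..23 Ins
    return 8 + (cwp * 16 + offset) % (16 * n_windows)

caller, callee = 3, 2  # the callee runs in the next window (CWP - 1 in this model)
shared = physical_register(caller, 8) == physical_register(callee, 24)
```

In this layout a parameter written by the caller into one of its Outs is visible to the callee as one of its Ins without any copying, while each procedure's Locals stay private.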
4.2 Superscalar and Vector Processors.

4.2.1 Superscalar processors: Multiple instructions are issued per cycle and multiple results
are generated per cycle. In superscalar processors it is possible to exploit instruction level
parallelism by executing the independent instructions in parallel without causing a wait state.
Pipelining in superscalar processors
The degree of superscalar processor is equal to the number of instructions issued per cycle. A
superscalar processor of degree m can issue m instructions per cycle.

1. In order to fully utilize the superscalar processor, it is necessary to execute m
instructions in parallel. This may not be possible in all clock cycles. In that case,
some of the pipelines may be stalling in a wait state.
2. Some of the representative superscalar processors include IBM RS/6000, DEC Alpha
21064, Intel i960CA etc. Due to reduced CPI and higher clock rate superscalar
processors outperform scalar processors.
3. A typical superscalar architecture for a RISC processor is shown in the figure. The
instruction cache supplies multiple instructions per fetch. However, the actual number of
instructions issued to various functional units may vary according to data dependencies
and resource conflicts among instructions that are simultaneously decoded.
4. Multiple functional units are built into the integer unit and into the floating-point unit.
5. The maximum number of instructions issued per cycle ranges from two to five in these
superscalar processors. Typically, the register files in the IU and FPU each have 32
registers. Most superscalar processors implement both the IU and the FPU on the same
chip.
IBM RS/6000 architecture
1. It is superscalar processor with three functional units called branch processor, fixed point
unit and floating point unit.
2. Branch processor unit arranges the execution of five instructions per cycle.
3. It uses hardwired rather than microcoded control logic.
4. The system used a number of wide buses ranging from one word (32 bits) for the FXU
to two words (64 bits) for the FPU, and four words for the I-cache and D-cache, respectively.
These wide buses provided the high instruction and data bandwidths required for
superscalar implementation.
5. The architecture is shown below.

4.2.2 VLIW Architecture

1. The instructions of the program are broken down into some operations which can be
executed simultaneously by multiple functional units and a very long instruction word is
formed.
2. Each VLIW instruction word ranges from say 256 to 1024 bits and there are multiple
functional units which share a common register file.
3. The VLIW instruction word is formed by the compiler, which can predict branch
outcomes using elaborate heuristics or run-time statistics. This compile-time packing of
operations into long instruction words is called code compaction.
As shown in the figure above, a single instruction word has 3 operations, so a CPI of 0.33
is possible for this example.
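A rough feel for code compaction can be had from a greedy packer that places each operation in the earliest instruction word that comes after all the operations it depends on. This is only a sketch under stated assumptions: the `compact` function and its dependency encoding are hypothetical, every word takes one cycle, and functional-unit types and branch prediction are ignored.

```python
def compact(deps, width=3):
    """Greedily pack operations into long instruction words.

    deps[i] is the set of indices of earlier operations that operation i
    depends on. Returns a list of words, each a list of operation indices.
    """
    words = []
    placed = {}  # operation index -> index of the word it was put in
    for op, needs in enumerate(deps):
        # An operation must come after every operation it depends on.
        w = max((placed[d] + 1 for d in needs), default=0)
        while w < len(words) and len(words[w]) >= width:
            w += 1  # word already full, try the next one
        if w == len(words):
            words.append([])
        words[w].append(op)
        placed[op] = w
    return words

independent = compact([set() for _ in range(6)])
dependent = compact([set(), set(), set(), {0, 1}, {2}, {3, 4}])

cpi_independent = len(independent) / 6  # words per operation
cpi_dependent = len(dependent) / 6
```

With six independent operations the packer fills two words of three operations each, which gives the CPI of about 0.33 quoted above. With the dependency chain in the second example only three words can be formed, so the CPI rises to 0.5 and some slots stay empty.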

Differences between VLIW and superscalar


1. The decoding of VLIW instruction is much easier than superscalar instructions.
2. The code density of superscalar machine is better than VLIW when the instruction level
parallelism is less.
3. The CPI of VLIW processor can be even lower than that of superscalar processor.
Example: Multiflow trace computer allows up to seven operations to be executed
concurrently with 256 bits per VLIW instruction.
4. The success of VLIW processor depends on the efficiency of code compaction.
5. By explicitly encoding parallelism in the long instruction, a VLIW processor eliminates the
need for hardware or software to detect parallelism at run time. It also has a simple hardware
design and instruction set. It performs well for scientific applications where the program
behavior is more predictable, but it is not popular for general-purpose applications.

4.2.3 Vector Processors.

1. A vector processor operates on vectors, which are arrays of operands. It is generally used in
supercomputers.
2. There are two types of vector processor, i.e., the register-to-register vector processor and
the memory-to-memory vector processor.
3. Example of a register-to-register vector processor: CRAY supercomputer. Example of a
memory-to-memory vector processor: CDC Cyber 205.

Vector Instructions

Some register-based vector operations are found in register-to-register vector processors,
as listed below. A vector register of length n is denoted as Vi, a scalar register as Si, a
memory array of length n as M(1:n), and an operation is denoted by a small "o".

A reduction is an operation on one or two vector operands whose result is a
scalar, such as the dot product between two vectors or the maximum of all
components in a vector.
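The kinds of operation described above can be illustrated with plain Python lists standing in for vector registers. The function names below are chosen here for illustration and do not correspond to instructions of any particular machine.

```python
import operator

def vector_op(op, v1, v2):
    """Element-wise operation on two vectors: V1 o V2 -> V3."""
    return [op(a, b) for a, b in zip(v1, v2)]

def scale(op, s, v):
    """Combine a scalar with every element of a vector: S1 o V1 -> V2."""
    return [op(s, a) for a in v]

def dot_product(v1, v2):
    """Reduction on two vector operands with a scalar result."""
    return sum(a * b for a, b in zip(v1, v2))

def vector_max(v):
    """Reduction on one vector operand with a scalar result."""
    return max(v)

v1, v2 = [1, 2, 3], [4, 5, 6]
v3 = vector_op(operator.add, v1, v2)
v4 = scale(operator.mul, 10, v1)
s1 = dot_product(v1, v2)
s2 = vector_max(v1)
```

On a real vector processor each of these would be a single instruction streamed through a pipelined functional unit rather than a software loop.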
Memory-based vector operations are found in memory-to-memory vector processors
such as those in the early supercomputer CDC Cyber 205. Listed below are a few
examples:

Here M1(1:n) and M2(1:n) are two vectors of length n and M(k) is a scalar quantity
stored in memory location k.
Vector pipeline can be attached to any scalar or superscalar processor. The pipeline for
scalar and vector execution is shown below

In case of scalar pipeline execution single operation is performed for a single data
element. In case of vector pipeline same operation is performed for each data element of
the vector.

Symbolic Processors
Symbolic processors are used in many areas such as pattern recognition, expert systems,
knowledge engineering, text retrieval, machine intelligence, etc. They are also known as
Prolog processors, Lisp processors or symbolic manipulators.
Characteristics of Symbolic Processing

4.3 Memory Hierarchy Technology


The storage devices like register, cache, main memory, disk devices and backup storage
devices are organized in a form of hierarchy as shown below. The cache is at level 1,
main memory at level 2, disk at level 3 and backup storage at level 4.
Memory devices at a lower level are faster to access, smaller in size, and more
expensive per byte, having a higher bandwidth and using a smaller unit of transfer as
compared with those at a higher level.
The access time $t_i$ refers to the round-trip time from the CPU to the $i$th-level memory.
The memory size $s_i$ is the number of bytes or words in level $i$. The cost of the $i$th-level
memory is estimated by the product $c_i s_i$. The bandwidth $b_i$ refers to the rate at which
information is transferred between adjacent levels. The unit of transfer $x_i$ refers to the
grain size for data transfer between levels $i$ and $i+1$.

Also $t_{i-1} < t_i$, $s_{i-1} < s_i$, $c_{i-1} > c_i$, $b_{i-1} > b_i$ and $x_{i-1} < x_i$ for $i = 1, 2, 3, 4$ in the hierarchy.


Registers and caches.
The register transfer operations are controlled by processor after instructions are decoded.
Register transfer is conducted at processor speed in one clock cycle.
Cache is controlled by MMU and is programmer transparent. The cache can be implemented at
one or multiple levels, depending on the speed and application requirements.
Main Memory
The main memory is usually larger than the cache and is implemented using DDR SDRAMs, i.e., double
data rate synchronous dynamic RAMs. The main memory is managed by the MMU and the operating
system.
Disk Drives and Backup storage
Disk storage is online memory; it holds system programs such as the OS and compilers, as well as
user programs and data. Optical disks and magnetic tape units are off-line memory for use as
archival and backup storage. They hold copies of present and past user programs and
processed results and files.
The memory characteristics of a typical mainframe computer in 1993 are shown below.
Peripheral Technology
The peripheral devices include printers, plotters, terminals, monitors, graphics displays, optical
scanners, image digitizers, output microfilm devices, etc. The technology of peripheral devices
has improved rapidly in recent years. For example, dot matrix printers were used in the past; now
laser printers are used.
Inclusion, Coherence and Locality
The information stored in memory hierarchy satisfies three important properties: inclusion,
coherence and locality.
Inclusion
The inclusion property is stated as $M_1 \subset M_2 \subset M_3 \subset \dots \subset M_N$. The inclusion relationship implies
that all information items are originally stored in the outermost level $M_N$. During processing,
subsets of $M_N$ are copied into $M_{N-1}$. Similarly, subsets of $M_{N-1}$ are copied into $M_{N-2}$, and so on.
Hence if a word is found in $M_i$, then the same word can be found in $M_{i+1}$, $M_{i+2}$ and so on, but it may not
be found in $M_{i-1}$.
Information transfer between the CPU and cache is in terms of words (4 or 8 bytes each
depending on the word length of a machine). The cache is divided into cache blocks. Each block
may be typically 32 bytes. Blocks are the units of data transfer between the cache and main
memory. The main memory (M2) is divided into pages, say, 4 Kbytes each. Each page contains
128 blocks for the example in Fig. 4.18 (4 Kbytes / 32 bytes = 128). Pages are the units of
information transferred between disk and main memory. Scattered pages are organized as a
segment in the disk memory; for example, segment F contains page A, page B, and other pages.
Data transfer between the disk and backup storage is handled at the file level, such as
segments F and G illustrated in Fig. 4.18.
Coherence
The coherence property requires that copies of the same information item at successive
memory levels must be consistent. If a word is modified in the cache, copies of that word must
be updated immediately or eventually at all higher levels. In general, there are two strategies for
maintaining the coherence in a memory hierarchy.
The first method is called write-through (WT), which demands immediate update in $M_{i+1}$ if a word
is modified in $M_i$.
The second method is write-back (WB), which delays the update in $M_{i+1}$ until the word being
modified in $M_i$ is replaced or removed from $M_i$.
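The difference between the two strategies shows up in how often the next level is written. The sketch below models only two adjacent levels; the class `TwoLevel`, its method names and the counter `lower_writes` are hypothetical names introduced here for illustration.

```python
class TwoLevel:
    """Two adjacent levels M_i (upper) and M_{i+1} (lower) with a write policy."""

    def __init__(self, policy):
        if policy not in ("WT", "WB"):
            raise ValueError("policy must be 'WT' or 'WB'")
        self.policy = policy
        self.upper = {}        # M_i: address -> value
        self.lower = {}        # M_{i+1}: address -> value
        self.dirty = set()     # addresses modified in M_i but not yet in M_{i+1}
        self.lower_writes = 0  # number of updates sent to M_{i+1}

    def _update_lower(self, addr, value):
        self.lower[addr] = value
        self.lower_writes += 1

    def write(self, addr, value):
        self.upper[addr] = value
        if self.policy == "WT":
            self._update_lower(addr, value)  # immediate update
        else:
            self.dirty.add(addr)             # update is postponed

    def evict(self, addr):
        value = self.upper.pop(addr)
        if addr in self.dirty:               # write-back happens on replacement
            self.dirty.discard(addr)
            self._update_lower(addr, value)

wt, wb = TwoLevel("WT"), TwoLevel("WB")
for memory in (wt, wb):
    for value in (1, 2, 3):
        memory.write(0x10, value)
    memory.evict(0x10)
```

After three writes to the same word followed by its replacement, both policies leave the same final value in the lower level, but write-through has updated it three times and write-back only once.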
Locality of References
The CPU references memory to access either instructions or data. The memory references can
be clustered according to time, space or ordering. Hence there are three dimensions of locality
of reference, i.e., temporal, spatial and sequential locality.
Temporal Locality: Recently referenced items (i.e., instructions or data) are likely to be referenced
again in the near future. This is often caused by special program constructs such as iterative
loops, process stacks, temporary variables, or subroutines. Once a loop is entered or a
subroutine is called, a small code segment will be referenced repeatedly many times.
Spatial Locality: This refers to the tendency for a process to access items whose addresses are
near one another. For example, operations on tables or arrays involve accesses of a certain
clustered area in the address space.
Sequential Locality: In typical programs, the execution of instructions follows a sequential order
unless branch instructions create out-of-order executions. The ratio of in-order execution to
out-of-order execution is roughly 5 to 1 in ordinary programs. Besides, the access of a large data
array also follows a sequential order.

Figure 4.19 shows the memory reference patterns of three running programs or three software
processes. As a function of time, the virtual address space (identified by page numbers) is
clustered into regions due to the locality of references. The subset of addresses (or pages)
referenced within a given time window (t, t+Δt) is called the working set.
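The working set can be computed directly from a page reference trace. In this sketch the trace is a made-up list in which the element at index k is the page referenced at time k, and the window is taken to cover times t through t + dt - 1; both the trace and that convention are assumptions made here.

```python
def working_set(trace, t, dt):
    """Set of pages referenced at times t, t+1, ..., t+dt-1."""
    return set(trace[t:t + dt])

# Hypothetical trace: the program first loops over pages 1-3, then moves to pages 7-9.
trace = [1, 2, 1, 3, 1, 2, 7, 8, 7, 8, 7, 9]

early = working_set(trace, 0, 6)
late = working_set(trace, 6, 6)
```

Because of locality, each window touches only a few distinct pages even though it contains several references, and the working set shifts when the program moves to a different region.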
4.3.3 Memory Capacity Planning

Hit Ratio
Consider memory levels $M_i$ and $M_{i-1}$ in a hierarchy, $i = 1, 2, \dots, n$. The hit ratio $h_i$ at $M_i$ is the
probability that an information item will be found in $M_i$. The miss ratio at $M_i$ is defined as $1 - h_i$.
The CPU starts searching for the data at level $M_1$ and continues outward to the outermost memory $M_n$. The
access frequency to $M_i$ is defined as
$$f_i = (1-h_1)(1-h_2)\cdots(1-h_{i-1})\,h_i$$
This is the probability of successfully accessing $M_i$ when there are $i-1$ misses at the
lower levels and a hit at $M_i$. Note that
$$\sum_{i=1}^{n} f_i = 1 \quad \text{and} \quad f_1 = h_1$$

Due to the locality property, the access frequencies decrease very rapidly from low to high levels;
that is, $f_1 > f_2 > f_3 > \dots > f_n$. This implies that the inner levels of memory are accessed more often than
the outer levels.

Effective Access Time


Every time a miss occurs the penalty has to be paid to access the next higher level of memory
hierarchy. The misses are called block misses in cache and page faults in main memory. The
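effective access time of the memory hierarchy, $T_{eff}$, is given as follows.
$$T_{eff} = \sum_{i=1}^{n} f_i t_i = h_1 t_1 + (1-h_1)h_2 t_2 + (1-h_1)(1-h_2)h_3 t_3 + \dots + (1-h_1)(1-h_2)\cdots(1-h_{n-1})\,t_n$$
The last term carries no factor $h_n$ because the outermost level always holds the item, i.e., $h_n = 1$.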
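The two formulas can be checked numerically. The sketch below uses made-up hit ratios and access times for a three-level hierarchy (the values and the function names are assumptions, and the outermost level is given a hit ratio of 1).

```python
def access_frequencies(hit_ratios):
    """f_i = (1-h_1)(1-h_2)...(1-h_{i-1}) * h_i for each level i."""
    freqs = []
    miss_so_far = 1.0  # probability of missing at every level below i
    for h in hit_ratios:
        freqs.append(miss_so_far * h)
        miss_so_far *= 1.0 - h
    return freqs

def effective_access_time(hit_ratios, access_times):
    """T_eff = sum of f_i * t_i over all levels."""
    freqs = access_frequencies(hit_ratios)
    return sum(f * t for f, t in zip(freqs, access_times))

# Hypothetical three-level hierarchy; times are in arbitrary units.
h = [0.95, 0.99, 1.0]
t = [1, 10, 1000]

f = access_frequencies(h)
t_eff = effective_access_time(h, t)
```

With these numbers the frequencies are roughly 0.95, 0.0495 and 0.0005, they sum to 1, and the effective access time comes out near 1.945 units. Although the outermost level is reached only five times in ten thousand accesses, its long access time still contributes about a quarter of the total.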

Hierarchy Optimization
The total cost of a memory hierarchy is estimated as follows:
$$C_{total} = \sum_{i=1}^{n} c_i s_i$$

Also $t_{i-1} < t_i$, $s_{i-1} < s_i$, $c_{i-1} > c_i$, $b_{i-1} > b_i$ and $x_{i-1} < x_i$ for $i = 1, 2, 3, 4$ in the hierarchy. The optimal design of
a memory hierarchy should have $T_{eff}$ close to $t_1$ of $M_1$ and a total cost close to the cost of $M_n$.

4.4 Virtual Memory Technology


The size of physical memory is limited and it is not possible to load in all programs fully and
simultaneously. Hence the concept of virtual memory was introduced. The virtual memory is
formed of physical memory and back up memory such as disk arrays.
Only active programs or only portion of them are brought into the physical memory at a time.
Active portions of programs can be loaded in and out from disk to physical memory dynamically
under the coordination of the operating system. To the users, virtual memory provides almost
unbounded memory space to work with. Without virtual memory, it would have been impossible
to develop the multiprogrammed or time-sharing computer systems that are in use today.

Address Space
Each word in physical memory has a unique physical address. Virtual addresses are those used
by machine instructions making up an executable program.
The virtual addresses must be translated into physical addresses at run time. A system of
translation tables and mapping functions is used in this process. Hence we have a physical
address space and a virtual address space.
Address Mapping
Let V be the set of virtual addresses generated by a program running on a processor. Let M be
the set of physical addresses allocated to run this program. A virtual memory system demands
an automatic mechanism to implement the following mapping:
$$f_t : V \to M \cup \{\phi\}$$
This mapping is a time function which varies from time to time because the physical memory is
dynamically allocated and deallocated. For any virtual address $v$, the mapping is formally
defined as follows.

$$f_t(v) = \begin{cases} m, & \text{if } m \in M \text{ has been allocated to store the data identified by virtual address } v \\ \phi, & \text{if data } v \text{ is missing in } M \end{cases}$$
Thus $f_t(v)$ translates the virtual address $v$ into the physical address $m$ if there is a memory hit and
returns $\phi$ if there is a miss.
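The time-varying nature of the mapping can be sketched with a table that is updated as memory is allocated and deallocated. The class `AddressMap` and its method names are hypothetical, and Python's `None` is used here to play the role of the missing value.

```python
class AddressMap:
    """A time-varying mapping from virtual addresses to physical addresses."""

    def __init__(self):
        self.table = {}  # virtual address -> physical address

    def translate(self, v):
        """Return the physical address for v, or None on a miss."""
        return self.table.get(v)

    def allocate(self, v, m):
        self.table[v] = m

    def deallocate(self, v):
        self.table.pop(v, None)

amap = AddressMap()
before = amap.translate(0x4000)   # not yet allocated: a miss
amap.allocate(0x4000, 0x0100)
during = amap.translate(0x4000)   # allocated: a hit
amap.deallocate(0x4000)
after = amap.translate(0x4000)    # deallocated again: a miss
```

The same virtual address gives different results at different times, which is why the mapping carries the subscript t.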

There are two virtual memory models


1. Private Virtual Memory
2. Shared Virtual Memory

Private Virtual Memory


Example: VAX/11
Each processor will have its own virtual space. The virtual space contains pages which are
mapped to physical memory which is shared by all processors. The advantage of using private
virtual memory is that all processors have their own private virtual space and private memory
maps which don’t require locking.
Disadvantage: the synonym problem (different virtual addresses from different virtual spaces may get
mapped to the same physical address in main memory).
Shared Virtual Memory
Examples: IBM 801, RT, RP3, System 38, the HP Spectrum, the Stanford Dash, MIT Alewife,
Tera
Here all processors have a single globally shared virtual space. Each processor is given a
portion of the shared virtual memory, and different processors can also use disjoint spaces. The
page table is shared between all processors. Therefore, mutual exclusion (locking) is needed to
enforce protected access.
Advantage: all the virtual addresses are unique and there is no synonym problem.

TLB, Paging, and Segmentation


The pages of virtual memory are allocated to page frames in main memory. The virtual address
is translated to a physical address using translation maps. These translation maps are stored in
a separate cache or in main memory. A mapping function is applied to the virtual address to access
these maps. Mapping is achieved using a hashing or congruence function. Hashing is a simple
computing technique to convert a long page number into a short one with fewer bits.

Translation Look-aside Buffer(TLB)

1. It is a hardware(cache) used to provide fast lookup.


2. It stores most recently referenced page entries.
3. The virtual address is divided into three parts, i.e., page number, block number and word
address.
4. The page number is searched through the TLB to retrieve the frame number.
5. If it is found, it is a HIT.
6. In case of a MISS, a hashed pointer identifies one of the page tables from which the
frame number can be retrieved.
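The lookup sequence above can be sketched as follows. Several simplifications are assumed here: the block number and word address are lumped together as the offset within a 4-Kbyte page, the page table is a plain dictionary rather than a hashed structure, and the TLB replaces its least recently used entry. The names `TLB` and `translate` are hypothetical.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # 4 Kbytes per page (assumed)

class TLB:
    """Small cache of the most recently referenced page entries."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # page number -> frame number, oldest first
        self.hits = 0
        self.misses = 0

    def lookup(self, page):
        if page in self.entries:
            self.entries.move_to_end(page)  # mark as most recently used
            self.hits += 1
            return self.entries[page]
        self.misses += 1
        return None

    def insert(self, page, frame):
        self.entries[page] = frame
        self.entries.move_to_end(page)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used entry

def translate(vaddr, tlb, page_table):
    page, offset = divmod(vaddr, PAGE_SIZE)
    frame = tlb.lookup(page)
    if frame is None:              # TLB miss: fall back to the page table
        frame = page_table[page]   # a missing key here would be a page fault
        tlb.insert(page, frame)
    return frame * PAGE_SIZE + offset

page_table = {0: 5, 1: 9}
tlb = TLB()
first = translate(4100, tlb, page_table)   # page 1, offset 4: TLB miss
second = translate(4104, tlb, page_table)  # same page again: TLB hit
```

The first reference to a page pays for the page-table access, and later references to the same page are served from the TLB.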

Paged Memory

1. Paging is a technique of partitioning both the physical memory and virtual memory as
fixed sized blocks.
2. Physical memory => Frames
3. Virtual memory => Pages
4. A page table consists of frame numbers indexed by page numbers.
5. Paging adds address-translation overhead to memory references, which can lower performance.
6. It results in internal fragmentation but no external fragmentation.
Segmented Memory
1. Used for logical structuring of a program.
2. Unlike pages, segments are of varied sizes.
3. Segmented memory is arranged as 2-D address space.
4. Virtual address has two parts: Segment number and an offset.
5. The offset addresses within each segment form 1-D contiguous addresses.
6. The segment number, not necessarily contiguous forms the second dimension.
7. Only external fragmentation, no internal fragmentation.

Paged Segments

1. Combination of Paging and Segmentation.


2. Within each segment, the addresses are divided into fixed size pages.
3. Paged segments offer advantages of both paging and segmentation.
4. For users the programs are logically structured.
5. For OS virtual memory is better managed with fixed size pages within each segment.

Inverted Paging
1. Direct paging works well for small address space like 32 bits.
2. A large virtual address space demands either large Page Tables or multilevel paging
which slows down performance.
3. An inverted page table is created containing information about the frames in the physical
memory.
4. Only one inverted page table can be used by all the processes.
5. The size of the inverted page table is governed by the size of the physical memory.
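An inverted page table can be sketched as one entry per physical frame, shared by all processes and searched by the pair (process, page). The class below is a minimal illustration: the names `InvertedPageTable`, `map_page` and `lookup` are hypothetical, a Python dictionary stands in for the hashed search, and no replacement policy is modelled.

```python
class InvertedPageTable:
    """One entry per physical frame, shared by all processes."""

    def __init__(self, n_frames):
        self.frames = [None] * n_frames  # frame number -> (pid, page) or None
        self.index = {}                  # (pid, page) -> frame number

    def map_page(self, pid, page):
        """Place (pid, page) in the first free frame and return its number."""
        key = (pid, page)
        if key in self.index:
            return self.index[key]
        frame = self.frames.index(None)  # raises ValueError if memory is full
        self.frames[frame] = key
        self.index[key] = frame
        return frame

    def lookup(self, pid, page):
        """Return the frame holding (pid, page), or None if it is not resident."""
        return self.index.get((pid, page))

ipt = InvertedPageTable(n_frames=4)
frame_a = ipt.map_page(pid=1, page=0)
frame_b = ipt.map_page(pid=2, page=0)  # same page number, different process
```

The table has four entries because there are four frames, regardless of how large the virtual address spaces of the two processes are.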
