Course No: CSE 4117
Course Name: Parallel and Distributed Processing
Uniprocessor Architecture
Book: Computer Architecture & Parallel Processing (Kai Hwang & Briggs) - 1.2
❑ A typical uniprocessor computer consists of three major components:
1. Central Processing Unit (CPU)
▪ Master controller of the VAX system.
▪ Set of general purpose registers along with program counter.
▪ A special-purpose CPU status register for storing the current state of the CPU and of the program under execution.
▪ One arithmetic logic unit.
▪ One local cache memory.
2. Main Memory
3. Input-Output (I/O) subsystem
❑ In addition, there is a common synchronous bus architecture for communication between the CPU, the main memory, and the I/O subsystem.
❑ Example: Supermini VAX-11/780 Uniprocessor system
Supermini VAX-11/780 Uniprocessor system
The first VAX model sold was the VAX-11/780, which was introduced on October 25, 1977.
(extra)
Watch this if you are more interested: (For VAX-11/780 watch from 14 min)
https://www.youtube.com/watch?v=VEf2Xf7Urn8&ab_channel=NancyLovesDogs
Uniprocessor Architecture
Figure: Architecture of Supermini VAX-11/780 Uniprocessor system
Uniprocessor Architecture
❖ The CPU contains the master controller of the VAX system.
❖ There are sixteen 32-bit general-purpose registers, one of which is the Program Counter (PC). There is also a special CPU status register containing information about the current state of the CPU and the program being executed.
❖ The CPU contains an ALU with an optional floating-point accelerator, and some local cache memory.
❖ The operator can intervene in CPU operation through the console, which is connected to a floppy disk.
❖ The CPU, the main memory (2^32 words of 32 bits each), and the I/O subsystem are all connected to a common bus, the synchronous backplane interconnect (SBI).
❖ Through this bus, all I/O devices can communicate with each other, with the CPU, or with the memory.
❖ I/O devices can be connected to the SBI through the Unibus and its controller, or through a Massbus and its controller.
Parallelism in Uniprocessor Architecture
Book: Computer Architecture & Parallel Processing (Kai Hwang & Briggs) - 1.2.2
❑ Parallelism can be achieved in uniprocessor systems using two main methods:
1. Hardware Method
▪ Multiplicity of functional units
▪ Parallelism and pipelining within the CPU
▪ Overlapped CPU and I/O operations
▪ Use of hierarchical memory system
▪ Balancing of subsystem bandwidth
2. Software Method
▪ Multiprogramming
▪ Time sharing
1. Multiplicity of Functional Units
❖ Early computers had a single ALU that performed only one operation at a time, which made execution slow.
❖ Many of the functions of the ALU can be distributed to multiple and specialized functional units which can operate in parallel.
❖ CDC-6600 (designed in 1964) has 10 functional units built into its CPU.
❖ These 10 units are independent of each other and may operate simultaneously.
❖ A scoreboard is used to keep track of the availability of the functional units and registers being demanded.
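The scoreboard idea can be sketched as follows. This is a minimal illustrative model, not the actual CDC-6600 logic: it only tracks which functional units are busy and which result registers are still in flight, and refuses to issue an instruction that would conflict.

```python
# Hypothetical sketch of a scoreboard: it records which functional units
# are busy and which result registers are reserved, and only issues an
# instruction when its unit and its registers are all free.

class Scoreboard:
    def __init__(self, units):
        self.busy_units = {u: False for u in units}   # unit -> busy?
        self.reserved_regs = set()                    # result registers in flight

    def can_issue(self, unit, dest_reg, src_regs):
        # Structural hazard: the unit is busy. Data hazard: a register is reserved.
        if self.busy_units[unit] or dest_reg in self.reserved_regs:
            return False
        return all(r not in self.reserved_regs for r in src_regs)

    def issue(self, unit, dest_reg):
        self.busy_units[unit] = True
        self.reserved_regs.add(dest_reg)

    def complete(self, unit, dest_reg):
        self.busy_units[unit] = False
        self.reserved_regs.discard(dest_reg)

sb = Scoreboard(["adder", "multiplier"])
sb.issue("adder", "R1")
assert not sb.can_issue("adder", "R4", ["R5"])       # adder already busy
assert not sb.can_issue("multiplier", "R6", ["R1"])  # R1 still in flight
sb.complete("adder", "R1")
assert sb.can_issue("multiplier", "R6", ["R1"])      # both hazards cleared
```

The unit and register names here are invented for illustration; the real CDC-6600 scoreboard also distinguished read, execute, and write stages per instruction.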
2. Parallelism and Pipelining within the CPU
❖ The CPU makes use of parallel adders, such as carry-lookahead adders and carry-save adders, instead of slow bit-serial addition.
Figure: Bit-serial adder
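The carry-lookahead principle can be sketched in software (an illustrative model of the logic, not hardware): each bit position computes a generate signal g_i = a_i AND b_i and a propagate signal p_i = a_i XOR b_i, and every carry is then expressible directly in terms of g, p, and the carry-in, rather than rippling bit by bit.

```python
# Sketch of carry-lookahead addition. Bit lists are LSB-first.
# c_{i+1} = g_i OR (p_i AND c_i), which expands into a two-level
# expression over g, p and c0, so all carries can be formed in parallel.

def carry_lookahead_add(a_bits, b_bits, c0=0):
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    carries = [c0]
    for i in range(len(a_bits)):
        carries.append(g[i] | (p[i] & carries[i]))
    s = [p[i] ^ carries[i] for i in range(len(a_bits))]
    return s, carries[-1]                         # sum bits and carry-out

# 7 + 5 = 12: [1,1,1,0] + [1,0,1,0] -> [0,0,1,1], carry-out 0
s, cout = carry_lookahead_add([1, 1, 1, 0], [1, 0, 1, 0])
assert s == [0, 0, 1, 1] and cout == 0
```

In hardware the loop above is flattened into combinational logic, which is what removes the serial carry chain.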
2. Parallelism and Pipelining within the CPU
❖ Various phases of instruction execution are now pipelined, including instruction fetch, decode, operand fetch, arithmetic logic execution, and storing of the result.
❖ To facilitate overlapped instruction executions through the pipe, instruction prefetch and data buffering
techniques have been developed.
❖ The instructions are prefetched and related data is buffered so that the instructions can be overlapped
through pipes (memory queues).
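The overlap gained from pipelining the five phases above can be sketched with a toy timeline. Assuming one cycle per stage and no hazards, N instructions finish in S + (N - 1) cycles instead of S * N:

```python
# Toy model of a five-stage instruction pipe: instruction fetch (IF),
# decode (ID), operand fetch (OF), execute (EX), store result (ST).

STAGES = ["IF", "ID", "OF", "EX", "ST"]

def pipeline_timeline(n_instructions):
    # timeline[cycle] lists the "instruction:stage" pairs active that cycle
    total_cycles = len(STAGES) + n_instructions - 1
    timeline = [[] for _ in range(total_cycles)]
    for i in range(n_instructions):
        for s, stage in enumerate(STAGES):
            timeline[i + s].append(f"I{i}:{stage}")
    return timeline

tl = pipeline_timeline(4)
assert len(tl) == 8    # 5 + (4 - 1) cycles, versus 5 * 4 = 20 unpipelined
# In cycle 4 all four instructions are overlapped, one per stage:
assert tl[4] == ["I0:ST", "I1:EX", "I2:OF", "I3:ID"]
```

Prefetch and data buffering exist precisely to keep such a timeline full; a stall in any stage would insert bubbles into it.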
3. Overlapped CPU and I/O operations
❖ I/O operations can be performed simultaneously with the CPU computations by using separate I/O
controllers, channels, or I/O processors.
❖ The direct-memory-access (DMA) channel can be used to provide direct information transfer between
the I/O devices and the main memory.
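The overlap can be imitated in software with a background thread standing in for the DMA channel; this is only an analogy (real DMA transfers data without any processor thread), but it shows the structure: start the transfer, keep computing, and synchronise only when the data is needed.

```python
# Analogy for overlapped CPU and I/O: a worker thread plays the role of
# a DMA channel filling a memory buffer while the main thread computes.

import threading
import time

buffer = []

def dma_transfer(words):
    # Stand-in for a device-to-memory transfer done without the CPU
    time.sleep(0.05)
    buffer.extend(words)

io = threading.Thread(target=dma_transfer, args=([10, 20, 30],))
io.start()                  # "I/O" proceeds in the background

partial = sum(range(1000))  # CPU computation overlapped with the transfer
io.join()                   # synchronise before consuming the data

assert partial == 499500
assert buffer == [10, 20, 30]
```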
4. Use of hierarchical memory system
❖ A hierarchical memory system can be used to close up the speed gap between the CPU and memory.
❖ Memories are divided into different levels.
4. Use of hierarchical memory system
❖ The innermost level is the register file, directly addressable by the ALU.
❖ Cache memory can be used to serve as a buffer between the CPU and the main memory.
❖ Block access of the main memory can be achieved through multiway interleaving across parallel memory modules (Fig. 1.4).
❖ Virtual memory space can be established with the use of disks and tape units at the outer
levels.
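The multiway interleaving mentioned above can be sketched as an address mapping. This assumes low-order interleaving (the common scheme, where consecutive word addresses go to consecutive modules); the module count 4 is just an example:

```python
# Sketch of m-way low-order interleaving: the low-order bits of a word
# address select the module, the remaining bits select the word within
# it, so a block of m consecutive words spans all m modules in parallel.

def interleave(address, m):
    return address % m, address // m   # (module number, word within module)

# With 4-way interleaving, addresses 0..3 fall in modules 0..3 at word 0,
# so one block access touches every module exactly once.
assert [interleave(a, 4) for a in range(4)] == [(0, 0), (1, 0), (2, 0), (3, 0)]
assert interleave(9, 4) == (1, 2)      # address 9 -> module 1, word 2
```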
5. Balancing of subsystem bandwidth
❑ Among the three subsystems of a computer (CPU, main memory, and I/O devices), the CPU is the fastest unit and the I/O devices are the slowest.
❑ The performance of these subsystems is measured in terms of bandwidth.
❑ The bandwidth of a system is defined as the number of operations performed per unit time. In the case of memory, the memory bandwidth is measured by the number of words that can be accessed per unit time.
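As a worked example of the definition (the cycle time and module count below are assumed, not from the text): a memory of m interleaved modules with cycle time t can deliver at most m words every t, so its maximum bandwidth is m / t words per second.

```python
# Illustrative memory-bandwidth arithmetic with assumed numbers.

cycle_time_ns = 500   # assumed memory cycle time: 500 ns
modules = 4           # assumed 4-way interleaving

# maximum bandwidth = modules / cycle time, in words per second
words_per_second = modules * 1_000_000_000 // cycle_time_ns
assert words_per_second == 8_000_000   # 8 million words/s
```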
❑Bandwidth balancing between CPU and Memory
▪ The speed gap between CPU and memory can be closed up by using cache memory
between them.
5. Balancing of subsystem bandwidth
❑Bandwidth balancing between memory and I/O devices
• To balance the speed gap between the I/O devices and the memory, I/O channels of different speeds are used between slow I/O devices and the main memory. An I/O channel is an extension of the DMA concept; the processor does not execute the I/O instructions itself. Channels are logically self-contained, with sufficient logic and working storage to handle I/O tasks.
• These I/O channels perform buffering and multiplexing functions (a multiplexer channel is a DMA controller that can handle multiple devices at the same time) to transfer data from multiple disks into the main memory by stealing cycles from the CPU.
❑ Why is there no mention of bandwidth balancing between the CPU and the I/O devices?
• Because most I/O tasks are done without the CPU.
Software Method for Parallelism
❖Multiprogramming
▪ Usually, every process or program consists of CPU-bound instructions, I/O-bound instructions, or a combination of both.
▪ When a system has many processes, the CPU can be busy with a CPU-bound process while, at the same time, a waiting I/O-bound process is allocated the I/O resources it needs for its execution. This is called multiprogramming.
▪ The program interleaving is intended to promote better resource utilization through overlapping I/O and CPU operations.
▪ Here, processes do not have to wait for each other to complete, and hence can execute concurrently.
Software Method for Parallelism
❖Time Sharing
▪ Multiprogramming on a uniprocessor is centered around the sharing of the CPU by many
programs. The concept of time sharing extends from multiprogramming by assigning fixed
or variable time slices to multiple programs.
▪ In a time-sharing system, we assign a time slice to each and every process, and a preemptive strategy is automatically used to preempt a process when its allocated time span is over.
▪ The preempted process goes into the waiting state and, when given a chance by the scheduler, is again assigned the CPU.
▪ This happens till all the processes are finished.
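The preempt-and-requeue cycle described above is essentially round-robin scheduling, which can be sketched as follows (process names and burst times are invented for illustration):

```python
# Sketch of preemptive time sharing with a fixed time slice (round robin):
# each process runs for at most one quantum, is preempted, and rejoins
# the ready queue until its remaining work is finished.

from collections import deque

def round_robin(burst_times, quantum):
    # burst_times: {pid: remaining work}; returns the order of CPU turns
    ready = deque(burst_times.keys())
    remaining = dict(burst_times)
    schedule = []
    while ready:
        pid = ready.popleft()
        schedule.append(pid)
        remaining[pid] -= min(quantum, remaining[pid])
        if remaining[pid] > 0:
            ready.append(pid)   # preempted: back to the ready queue
    return schedule

# P1 needs 3 quanta, P2 needs 1, P3 needs 2
assert round_robin({"P1": 6, "P2": 2, "P3": 4}, quantum=2) == \
    ["P1", "P2", "P3", "P1", "P3", "P1"]
```

A real scheduler also accounts for I/O waits and priorities; this sketch models only the time-slice rotation.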