Csa Module I Notes

computer system architecture

Uploaded by

serinajena2206
Basic Functional Unit of a Computer

Figure 1.1. (a) Basic functional units of a computer


What is a computer?

 A computer is a sophisticated electronic calculating machine that:
 Accepts input information,
 Processes the information according to a list of
internally stored instructions and
 Produces the resulting output information.
 Functions performed by a computer are:
 Accepting information to be processed as input.
 Storing a list of instructions to process the information.
 Processing the information according to the list of
instructions.
 Providing the results of the processing as output.
 What are the functional units of a computer?

Functional units of a computer
 Input unit accepts information from:
 Human operators,
 Electromechanical devices (keyboard),
 Other computers.
 Memory stores information:
 Instructions,
 Data.
 Arithmetic and logic unit (ALU):
 Performs the desired operations on the input information as
determined by instructions in the memory.
 Control unit coordinates various actions:
 Input,
 Output,
 Processing.
 Output unit sends results of processing:
 To a monitor display,
 To a printer.

[Figure: block diagram. Input and Output form the I/O group; Memory holds Instr1, Instr2, Instr3, Data1, Data2; Arithmetic & Logic and Control form the Processor.]

Information in a computer -- Instructions

 Instructions specify commands to:


 Transfer information within a computer (e.g., from memory to
ALU)
 Transfer information between the computer and I/O devices
(e.g., from keyboard to computer, or computer to printer)
 Perform arithmetic and logic operations (e.g., Add two numbers,
Perform a logical AND).
 A sequence of instructions to perform a task is called a
program, which is stored in the memory.
 Processor fetches instructions that make up a program from
the memory and performs the operations stated in those
instructions.
 What do the instructions operate upon?

Information in a computer -- Data

 Data are the “operands” upon which instructions operate.


 Data could be:
 Numbers,
 Encoded characters.
 Data, in a broad sense means any digital information.
 Computers use data that is encoded as a string of binary
digits called bits.
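
As a small illustration of the point above, the following Python sketch writes a number and an encoded character as strings of bits. The 8-bit width and the use of ASCII codes are assumptions made for this example; the notes do not fix a particular encoding, and `to_bits` is a hypothetical helper name.

```python
def to_bits(value: int, width: int = 8) -> str:
    """Return the unsigned binary form of value, padded to `width` bits."""
    if value < 0 or value >= 2 ** width:
        raise ValueError("value does not fit in the given width")
    return format(value, f"0{width}b")


number_bits = to_bits(5)        # the number 5
char_bits = to_bits(ord("A"))   # the character 'A' (ASCII code 65)

print(number_bits)  # 00000101
print(char_bits)    # 01000001
```

Both the number and the character end up as the same kind of object, a bit string; only the interpretation differs.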

Input unit
Binary information must be presented to a computer in a specific format. This
task is performed by the input unit:
- Interfaces with input devices.
- Accepts binary information from the input devices.
- Presents this binary information in a format expected by the computer.
- Transfers this information to the memory or processor.
[Figure: real-world input devices (keyboard, audio input, ...) feed the Input Unit, which passes information to the Memory and the Processor inside the computer.]

Memory unit
 Memory unit stores instructions and data.
 Recall, data is represented as a series of bits.
 To store data, memory unit thus stores bits.
 Processor reads instructions and reads/writes data from/to
the memory during the execution of a program.
 In theory, instructions and data could be fetched one bit at a
time.
 In practice, a group of bits is fetched at a time.
 Group of bits stored or retrieved at a time is termed as “word”
 Number of bits in a word is termed as the “word length” of a
computer.
 In order to read/write to and from memory, a processor
should know where to look:
 “Address” is associated with each word location.
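
The ideas of word, word length, and address can be sketched with a toy model. This is only an illustration: the class name `WordMemory`, the 16-bit word length, and the 8-word size are assumptions chosen for the example, not values from the notes.

```python
class WordMemory:
    """A toy word-addressable memory: one word per address."""

    def __init__(self, num_words: int, word_length: int = 16):
        self.word_length = word_length
        self.words = [0] * num_words

    def write(self, address: int, value: int) -> None:
        # Keep only the low `word_length` bits, since a word cannot hold more.
        self.words[address] = value & ((1 << self.word_length) - 1)

    def read(self, address: int) -> int:
        return self.words[address]


mem = WordMemory(num_words=8, word_length=16)
mem.write(3, 0x1234)
mem.write(4, 0x12345)    # too wide: only the low 16 bits are kept

print(hex(mem.read(3)))  # 0x1234
print(hex(mem.read(4)))  # 0x2345
```

The processor supplies an address and gets back a whole word, never an individual bit.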

Memory unit (contd..)
 Processor reads/writes to/from memory based on the
memory address:
 Access any word location in a short and fixed amount of time
based on the address.
 Random Access Memory (RAM) provides fixed access time
independent of the location of the word.
 Access time is known as “Memory Access Time”.

 Memory and processor have to “communicate” with each
other in order to read/write information.
 In order to reduce “communication time”, a small amount of
RAM (known as Cache) is tightly coupled with the processor.
 Modern computers have three to four levels of RAM units with
different speeds and sizes:
 Fastest, smallest known as Cache
 Slowest, largest known as Main memory.

Memory unit (contd..)

 Primary storage of the computer consists of RAM units.


 Fastest, smallest unit is Cache.
 Slowest, largest unit is Main Memory.

 Primary storage is insufficient to store large amounts of
data and programs.
 Primary storage can be added, but it is expensive.
 Store large amounts of data on secondary storage devices:
 Magnetic disks and tapes,
 Optical disks (CD-ROMS).
 Access to the data stored in secondary storage is slower, but this
takes advantage of the fact that some information may be
accessed infrequently.
 Cost of a memory unit depends on its access time; shorter
access time implies higher cost.

Arithmetic and logic unit (ALU)

 Operations are executed in the Arithmetic and Logic Unit
(ALU).
 Arithmetic operations such as addition, subtraction.
 Logic operations such as comparison of numbers.
 In order to execute an instruction, operands need to be
brought into the ALU from the memory.
 Operands are stored in general purpose registers available in
the ALU.
 Access times of general purpose registers are faster than the
cache.
 Results of the operations are stored back in the memory or
retained in the processor for immediate use.

Output unit
•Computers represent information in a specific binary form. Output units:
- Interface with output devices.
- Accept processed results provided by the computer in specific binary form.
- Convert the information in binary form to a form understood by an
output device.

[Figure: inside the computer, the Memory and the Processor feed the Output Unit, which drives real-world output devices (printer, graphics display, speakers, ...).]

Control unit

 Operation of a computer can be summarized as:


 Accepts information from the input units (Input unit).
 Stores the information (Memory).
 Processes the information (ALU).
 Provides processed results through the output units (Output
unit).
 Operations of Input unit, Memory, ALU and Output unit are
coordinated by Control unit.
 Instructions control “what” operations take place (e.g. data
transfer, processing).
 Control unit generates timing signals which determine
“when” a particular operation takes place.

A Typical Instruction

 Add LOCA, R0
 Add the operand at memory location LOCA to the operand in
a register R0 in the processor.
 Place the sum into register R0.
 The original contents of LOCA are preserved.
 The original contents of R0 are overwritten.
 Instruction is fetched from the memory into the processor
– the operand at LOCA is fetched and added to the
contents of R0 – the resulting sum is stored in register R0.
Separate Memory Access and ALU Operation

 Load LOCA, R1
 Add R1, R0
 Whose contents will be overwritten?
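
The two versions above can be compared with a short Python sketch. The dictionaries standing in for memory and registers, the function names, and the initial values (LOCA = 7, R0 = 5, R1 = 99) are all assumptions made for illustration.

```python
def run_single_instruction(memory, regs):
    # Add LOCA, R0: the memory operand is added to R0; LOCA is unchanged.
    regs["R0"] = memory["LOCA"] + regs["R0"]


def run_two_instructions(memory, regs):
    # Load LOCA, R1: copy the memory operand into R1 (old R1 is lost).
    regs["R1"] = memory["LOCA"]
    # Add R1, R0: add R1 to R0 and place the sum in R0 (old R0 is lost).
    regs["R0"] = regs["R1"] + regs["R0"]


memory = {"LOCA": 7}
regs_a = {"R0": 5, "R1": 99}
regs_b = dict(regs_a)

run_single_instruction(memory, regs_a)
run_two_instructions(memory, regs_b)

print(regs_a)  # {'R0': 12, 'R1': 99}
print(regs_b)  # {'R0': 12, 'R1': 7}
print(memory)  # {'LOCA': 7}
```

Both versions leave the same sum in R0 and preserve LOCA, but the two-instruction version also overwrites R1.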
How are the functional units connected?
•For a computer to achieve its operation, the functional units need to
communicate with each other.
•In order to communicate, they need to be connected.

[Figure: the Input, Output, Memory, and Processor units are all attached to a single Bus.]

•Functional units may be connected by a group of parallel wires.


•The group of parallel wires is called a bus.
•Each wire in a bus can transfer one bit of information.
•The number of parallel wires in a bus is equal to the word length of
a computer

Organization of cache and main memory

[Figure: Main memory, Cache memory, and Processor connected by a Bus.]

Why is the access time of the cache memory shorter than the
access time of the main memory?

Registers
 In addition to the ALU and the control circuitry, the processor contains
a number of registers used for several different purposes.
 Instruction register (IR)
 It holds the instruction that is currently being executed.
 Its output is available to the control circuits, which generate the
timing signals that control the various processing elements involved
in executing the instruction.
 Program counter (PC)
 PC is another specialized register.
 It keeps track of the execution of a program. It contains the memory
address of the next instruction to be fetched and executed.
 During the execution of an instruction, the contents of the PC are
updated to correspond to the address of the next instruction to be
executed.
 PC points to the next instruction that is to be fetched from the memory.
 Two registers facilitate communication with the memory:
 Memory address register (MAR)
 MAR holds the address of the memory location to be accessed.
 Memory data register (MDR)
 MDR contains the data to be written into or read out of the addressed
location.
 General-purpose registers (R0 – Rn-1)
Computer Components: Top-Level View

Connections between Processor and memory


Basic Operational Concepts
Typical Operating Steps
 Programs are placed in the memory through input devices
 PC is set to point to the first instruction
 The contents of PC are transferred to MAR
 A Read signal is sent to the memory
 The first instruction is read out and loaded into MDR
 The contents of MDR are transferred to IR
 Decode and execute the instruction
 Get operands for ALU
 General-purpose register
 Memory (address to MAR – Read – MDR to ALU)
 Perform operation in ALU
 Store the result back
 To general-purpose register
 To memory (address to MAR, result to MDR – Write)
 During the execution, PC is incremented to the next instruction
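
The operating steps above can be traced with a minimal Python sketch. It is not a model of any real processor: the three-operation instruction set (`LOAD`, `ADD`, `STORE`), the tuple encoding of instructions, and the memory layout are all hypothetical, chosen only to show how PC, MAR, MDR, and IR take part in each step.

```python
def run(memory, start, steps):
    """Execute `steps` instructions starting at address `start`.

    Instructions are tuples: ("LOAD", addr, reg), ("ADD", addr, reg),
    ("STORE", reg, addr). This tiny instruction set is hypothetical.
    """
    regs = {"R0": 0}
    pc = start                      # PC points to the first instruction
    for _ in range(steps):
        mar = pc                    # contents of PC are transferred to MAR
        mdr = memory[mar]           # Read: the instruction is loaded into MDR
        ir = mdr                    # contents of MDR are transferred to IR
        pc += 1                     # PC is incremented to the next instruction

        op = ir[0]                  # decode
        if op == "LOAD":
            _, addr, reg = ir
            mar = addr              # address to MAR
            mdr = memory[mar]       # Read
            regs[reg] = mdr         # MDR to general-purpose register
        elif op == "ADD":
            _, addr, reg = ir
            mar = addr              # address to MAR
            mdr = memory[mar]       # Read: operand goes from MDR to the ALU
            regs[reg] = regs[reg] + mdr   # perform the operation in the ALU
        elif op == "STORE":
            _, reg, addr = ir
            mar = addr              # address to MAR
            mdr = regs[reg]         # result to MDR
            memory[mar] = mdr       # Write
        else:
            raise ValueError(f"unknown operation {op!r}")
    return regs, pc


# Addresses 0-2 hold the program, addresses 3-5 hold data.
memory = [
    ("LOAD", 3, "R0"),
    ("ADD", 4, "R0"),
    ("STORE", "R0", 5),
    10, 32, 0,
]
regs, pc = run(memory, start=0, steps=3)
print(regs["R0"], memory[5], pc)  # 42 42 3
```

The program adds two numbers held in memory and stores the result back, with every memory access going through MAR and MDR.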
A Partial Program Execution
Example: Add 2 numbers and store the results
Interrupt

 Normal execution of programs may be interrupted if some
device requires urgent servicing
 To deal with the situation immediately, the normal execution
of the current program must be interrupted
 Procedure of interrupt operation
 The device raises an interrupt signal
 The processor provides the requested service by executing
an appropriate interrupt-service routine
 The state of the processor is first saved before servicing
the interrupt
• Normally, the contents of the PC, the general registers,
and some control information are stored in memory
 When the interrupt-service routine is completed, the state
of the processor is restored so that the interrupted
program may continue
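
The save, service, restore sequence described above can be sketched as follows. The dictionary used as processor state, the list used as the save area in memory, and the routine `keyboard_routine` are hypothetical stand-ins for illustration only.

```python
def service_interrupt(cpu, memory_stack, service_routine):
    """Save processor state, run the service routine, then restore state."""
    # Save the PC and the general registers in memory before servicing.
    memory_stack.append({"pc": cpu["pc"], "regs": dict(cpu["regs"])})

    service_routine(cpu)            # the routine may freely change the state

    # Restore the saved state so the interrupted program can continue.
    saved = memory_stack.pop()
    cpu["pc"] = saved["pc"]
    cpu["regs"] = saved["regs"]


def keyboard_routine(cpu):
    # Hypothetical routine: it runs at its own address and uses R0.
    cpu["pc"] = 900
    cpu["regs"]["R0"] = 0xFF


cpu = {"pc": 104, "regs": {"R0": 12, "R1": 7}}
stack = []
service_interrupt(cpu, stack, keyboard_routine)
print(cpu)  # {'pc': 104, 'regs': {'R0': 12, 'R1': 7}}
```

Although the service routine changed both the PC and R0, the interrupted program sees exactly the state it had before the interrupt.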
Classes of Interrupts
 Program
 Generated by some condition that occurs as a result of an
instruction execution such as arithmetic overflow, division
by zero, attempt to execute an illegal machine instruction,
or reference outside a user’s allowed memory space
 Timer
 Generated by a timer within the processor. This allows the
operating system to perform certain functions on a regular
basis
 I/O
 Generated by an I/O controller, to signal normal completion
of an operation or to signal a variety of error conditions
 Hardware failure
 Generated by a failure such as power failure or memory
parity error
Bus Structures

 A group of lines that serves as a connecting path for several
devices is called a bus
 In addition to the lines that carry the data, the bus must
have lines for address and control purposes
 The simplest way to interconnect functional units is to use a
single bus, as shown below
Drawbacks of the Single Bus Structure
 The devices connected to a bus vary widely in their speed of
operation
 Some devices are relatively slow, such as printers and
keyboards
 Some devices are considerably faster, such as optical disks
 Memory and processor units are the fastest parts of
a computer
 An efficient transfer mechanism is thus needed to cope with this
problem
 A common approach is to include buffer registers with the
devices to hold the information during transfers
 Another approach is to use a two-bus structure and an
additional transfer mechanism
• A high-performance bus, a low-performance bus, and a bridge
for transferring the data between the two buses. ARMA
Bus belongs to this structure
Software
 In order for a user to enter and run an application
program, the computer must already contain some system
software in its memory

 System software is a collection of programs that are
executed as needed to perform functions such as
 Receiving and interpreting user commands
 Running standard application programs such as word
processors or games
 Managing the storage and retrieval of files in secondary
storage devices
 Controlling I/O units to receive input information and
produce output results
Functions of System Software
 Translating programs from source form prepared by
the user into object form consisting of machine
instructions
 Linking and running user-written application programs
with existing standard library routines, such as
numerical computation packages
 System software is thus responsible for the
coordination of all activities in a computing system
Operating System
 Operating system (OS)
 This is a large program, or actually a collection of routines,
that is used to control the sharing of and interaction among
various computer units as they execute application programs
 The OS routines perform the tasks required to assign computer
resources to individual application programs
 These tasks include assigning memory and magnetic disk
space to program and data files, moving data between
memory and disk units, and handling I/O operations
 In the following, a system with one processor, one disk, and one
printer is given to explain the basics of OS
 Assume that part of the program’s task involves reading a
data file from the disk into the memory, performing some
computation on the data, and printing the results
How the OS manages to execute one application program
Consider a system with one processor, one disk, and one printer.
 Sequence of the steps involved in running one application program:
 Assume that the application program has been compiled from a high-level
language form into a machine language form and stored on the disk.
 The first step is to transfer this file into the memory.
 When the transfer is completed, execution of the program is started.
 Assume that part of the program's task involves reading a data file from
the disk into the memory, performing some computation on the data, and
printing the results.
 When execution of the program reaches the point where the data file is
needed, the program requests the OS to transfer the data file from the
disk to the memory.
 The OS performs this task and passes execution control back to the
application program, which then proceeds to perform the required
computation.
 When the computation is completed and the results are ready to be
printed, the application program again sends a request to the OS.
 An OS routine is then executed to cause the printer to print the results.
 That is, execution control passes back and forth between the application
program and the OS routines.
 A convenient way to illustrate this sharing of the processor execution
time is by a time-line diagram, such as that shown in Figure 1.4.

Execution of more than one application program at a time

 Computer resources can be used more efficiently if several application
programs are to be processed.
 Notice that the disk and the processor are idle during most of the time
period (t4 to t5).
 The OS can load the next program to be executed into the memory from
the disk while the printer is operating.
 Similarly, during t0 to t1, the OS can arrange to print the previous
program's results while the current program is being loaded from the disk.
 Thus, the OS manages the concurrent execution of several application
programs to make the best possible use of computer resources.
 This pattern of concurrent execution is called multiprogramming or
multitasking.

User Program and OS Routine Sharing

time-line diagram
 During the time period t0 to t1, an OS routine initiates loading the
application program from disk to memory, waits until the transfer is
completed, and then passes execution control to the application program.
 A similar pattern of activity occurs during period t2 to t3 and period
t4 to t5, when the OS transfers the data file from the disk and prints
the results.
 At t5, the OS may load and execute another application program.

Multiprogramming or Multitasking
Performance
 The speed with which a computer executes programs is
affected by the design of its hardware and its machine
language instructions

 Because programs are usually written in a high-level
language, performance is also affected by the compiler
that translates programs into machine language

 For best performance, the following factors must be
considered
 Compiler
 Instruction set
 Hardware design
Performance
 Processor circuits are controlled by a timing signal called a clock
 The clock defines regular time intervals, called clock cycles
 To execute a machine instruction, the processor divides the
action to be performed into a sequence of basic steps, such that
each step can be completed in one clock cycle
 Let P be the length of one clock cycle; its inverse is the clock rate,
R = 1/P
 Basic performance equation
 T = (N × S)/R, where T is the processor time required to execute
a program, N is the number of instruction executions, and S is
the average number of basic steps needed to execute one
machine instruction
 Note: N, S, and R are not independent of each other
 How to improve T?
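
A quick numeric check of the basic performance equation, as a hedged sketch: the workload figures below (2 million instruction executions, 4 basic steps each, a 500 MHz clock) are invented for illustration, and `processor_time` is a hypothetical helper name.

```python
def processor_time(n_instructions: float, steps_per_instruction: float,
                   clock_rate_hz: float) -> float:
    """Basic performance equation: T = (N x S) / R, in seconds."""
    return n_instructions * steps_per_instruction / clock_rate_hz


# Hypothetical workload: N = 2 million, S = 4, R = 500 MHz.
t = processor_time(2_000_000, 4, 500e6)
print(t)  # 0.016

# Doubling the clock rate alone halves T, provided N and S stay the same.
t_fast = processor_time(2_000_000, 4, 1000e6)
print(t_fast)  # 0.008
```

In practice N, S, and R interact, so changing one of them rarely leaves the other two untouched.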
Pipeline and Superscalar Operation

 Instructions are not necessarily executed one after
another.
 The value of S doesn’t have to be the number of clock
cycles to execute one instruction.
 Use of Pipelining – overlapping the execution of
successive instructions.
 Use multiple functional units
 Goal is to reduce S (could become <1!)
Performance Improvement
 Pipelining and superscalar operation
 Pipelining: by overlapping the execution of successive
instructions
 Superscalar: different instructions are concurrently
executed with multiple instruction pipelines. This means that
multiple functional units are needed.
 The processor and a relatively small cache memory can be
fabricated on a single integrated circuit chip
Performance Improvement

 Clock rate improvement


 Improving the integrated-circuit technology
makes logic circuits faster (R increases), which
reduces the time needed to complete a basic step.
 Reducing amount of processing done in one basic
step also makes it possible to reduce the clock
period, P.
 However, if the actions that have to be performed
by an instruction remain the same, the number of
basic steps needed may increase
 Reduced instruction set computers (RISC) and complex
instruction set computers (CISC)
CISC and RISC

 Reduce the number of basic steps to execute
 Tradeoff between N and S
 A key consideration is the use of pipelining
 S is close to 1 even though the number of basic steps per
instruction may be considerably larger
 It is much easier to implement efficient pipelining in
processors with simple instruction sets
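
Why S approaches 1 under pipelining can be seen with a small calculation. The sketch assumes an ideal pipeline with no stalls, in which each instruction needs k basic steps and a new instruction enters the pipeline every clock cycle; the values n = 1000 and k = 5 are arbitrary.

```python
def cycles_without_pipelining(n: int, k: int) -> int:
    # Each of the n instructions runs all k steps before the next one starts.
    return n * k


def cycles_with_pipelining(n: int, k: int) -> int:
    # The first instruction takes k cycles; each later one completes
    # one cycle after its predecessor (ideal pipeline, no stalls).
    return k + (n - 1)


n, k = 1000, 5
print(cycles_without_pipelining(n, k) / n)  # 5.0
print(cycles_with_pipelining(n, k) / n)     # 1.004
```

The effective value of S falls from 5 to about 1 even though every instruction still goes through 5 basic steps.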
Compiler

 A compiler translates a high-level language program into a
sequence of machine instructions.
 To reduce N, we need a suitable machine instruction set and
a compiler that makes good use of it.
 Goal – reduce N×S
 A compiler may not be designed for a specific processor;
however, a high-quality compiler is usually designed for, and
with, a specific processor.
Performance Measurement
 T is difficult to compute.
 Measure computer performance using benchmark programs.
 System Performance Evaluation Corporation (SPEC) selects and
publishes representative application programs for different
application domains, together with test results for many
commercially available computers.
 SPEC rating is a measure of the combined effect of all factors
affecting performance (compiler, OS, CPU, and memory of the computer
being tested); it indicates how much faster the computer under test
is than the reference computer.
 Compile and run (no simulation)
 Reference computer

$$\text{SPEC rating} = \frac{\text{Running time on the reference computer}}{\text{Running time on the computer under test}}$$

$$\text{SPEC rating} = \left( \prod_{i=1}^{n} \text{SPEC}_i \right)^{1/n}$$

where n is the number of programs in the suite.
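
The two SPEC formulas can be tried out with a short Python sketch. The running times below are invented for illustration and are not real SPEC results; the function names are hypothetical.

```python
import math


def spec_rating(reference_time: float, test_time: float) -> float:
    """Rating for one program: reference running time / test running time."""
    return reference_time / test_time


def overall_spec_rating(ratings):
    """Geometric mean of the per-program SPEC ratings."""
    n = len(ratings)
    return math.prod(ratings) ** (1.0 / n)


# Hypothetical running times in seconds: (reference computer, computer under test).
times = [(100.0, 50.0), (400.0, 50.0)]
ratings = [spec_rating(ref, test) for ref, test in times]

print(ratings)                       # [2.0, 8.0]
print(overall_spec_rating(ratings))  # 4.0
```

The geometric mean gives 4.0 here, whereas an arithmetic mean would give 5.0.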
Multiprocessors and Multicomputers

 Multiprocessor computer
 Execute a number of different application tasks in parallel
 Execute subtasks of a single large task in parallel
 All processors have access to all of the memory – shared-
memory multiprocessor
 Cost – processors, memory units, complex interconnection
networks
 Multicomputers
 Each computer only has access to its own memory
 Exchange messages via a communication network – message-
passing multicomputers
The Performance Equation

The performance equation analyzes execution time as a product of three factors
that are relatively independent of each other. It is given by:

T=(N*S)/R = N*S*P = IC*CPI*CT

where ‘T’ is the execution time or the processor time, ‘N’ is the instruction count
(IC), ‘S’ is the number of clock cycles per instruction or CPI, ‘R’ is the clock rate
or clock frequency and ‘P’ is the Clock Time (CT) which is the reciprocal of the
clock frequency, i.e., P=1/R.

For example, a 1 GHz processor has a cycle time of 1.0 ns and a 4 GHz processor
has a cycle time of 0.25 ns.

This equation remains valid if the time units are changed on both sides of the
equation. The left-hand side and the factors on the right-hand side are discussed in
the following sections.

The three factors are, in order, known as the instruction count (IC), clocks per
instruction (CPI), and clock time (CT). CPI is computed as an effective value.

Instruction Count

Computer architects can reduce the instruction count by adding more powerful
instructions to the instruction set. However, this can increase either CPI or clock
time, or both.

Clocks Per Instruction

Computer architects can reduce CPI by exploiting more instruction-level
parallelism. If they add more complex instructions, it often increases CPI.

Clock Time

Clock time depends on transistor speed and the complexity of the work done in a
single clock. Clock time can be reduced when transistor sizes decrease. However,
power consumption increases when clock time is reduced. This increases the
amount of heat generated.

Instruction Count

Instruction count (IC) is a dynamic measure: the total number of instruction
executions involved in a program. It is dominated by repetitive operations such as
loops and recursions.
Instruction count is affected by the power of the instruction set. Different
instruction sets may do different amounts of work in a single instruction. CISC
processor instructions can often accomplish as much as two or three RISC
processor instructions. Some CISC processor instructions have built-in looping so
that they can accomplish as much as several hundred RISC instruction executions.

For predicting the effects of incremental changes, architects use execution traces of
benchmark programs to get instruction counts. If the incremental change does not
change the instruction set then the instruction count normally does not change. If
there are small changes in the instruction set then trace information can be used to
estimate the change in the instruction count.

For comparison purposes, two machines with different instruction sets can be
compared based on compilations of the same high-level language code on the two
machines.

Clocks Per Instruction

Clocks per instruction (CPI) is an effective average. It is averaged over all of the
instruction executions in a program.

CPI is affected by instruction-level parallelism and by instruction complexity.
Without instruction-level parallelism, simple instructions usually take 4 or more
cycles to execute. Instructions that execute loops take at least one clock per loop
iteration. Pipelining (overlapping execution of instructions) can bring the average
for simple instructions down to near 1 clock per instruction. Superscalar pipelining
(issuing multiple instructions per cycle) can bring the average down to a fraction of
a clock per instruction.

For computing clocks per instruction as an effective average, the cases are
categories of instructions, such as branches, loads, and stores. Frequencies for the
categories can be extracted from execution traces. Knowledge of how the
architecture handles each category yields the clocks per instruction for that
category.

Clock Time

Clock time (CT) is the period of the clock that synchronizes the circuits in a
processor. It is the reciprocal of the clock frequency.

For example, a 1 GHz processor has a cycle time of 1.0 ns and a 4 GHz processor
has a cycle time of 0.25 ns.

Clock time is affected by circuit technology and the complexity of the work done
in a single clock. Logic gates do not operate instantly. A gate has a propagation
delay that depends on the number of inputs to the gate (fan in) and the number of
other inputs connected to the gate's output (fan out). Increasing either the fan in or
the fan out slows down the propagation time. Cycle time is set to be the worst-case
total propagation time through gates that produce a signal required in the next
cycle. The worst-case total propagation time occurs along one or more signal paths
through the circuitry. These paths are called critical paths.

For the past 35 years, integrated circuit technology has been greatly affected by a
scaling equation that tells how individual transistor dimensions should be altered as
the overall dimensions are decreased. The scaling equations predict an increase in
speed and a decrease in power consumption per transistor with decreasing size.
Technology has improved so that about every 3 years, linear dimensions have
decreased by a factor of 2. Transistor power consumption has decreased by a
similar factor. Speed increased by a similar factor until about 2005. At that time,
power consumption reached the point where air cooling was not sufficient to keep
processors cool if they ran at the highest possible clock speed.

Problem Statement 1

Suppose a program (or a program task) takes 1 billion instructions to execute on a
processor running at 2 GHz. Suppose also that 50% of the instructions execute in 3
clock cycles, 30% execute in 4 clock cycles, and 20% execute in 5 clock cycles.
What is the execution time for the program or task?

Solution

We have the instruction count: $10^{9}$ instructions. The clock time can be computed
quickly from the clock rate to be $0.5 \times 10^{-9}$ seconds. So we only need to compute
clocks per instruction as an effective value:

Value   Frequency   Product
3       0.5         1.5
4       0.3         1.2
5       0.2         1.0
CPI =               3.7

Then we have

Execution time = $1.0 \times 10^{9} \times 3.7 \times 0.5 \times 10^{-9}$ sec = 1.85 sec.
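
The arithmetic of this solution can be checked with a few lines of Python. The helper name `effective_cpi` is an assumption; the numbers are those given in the problem statement.

```python
def effective_cpi(mix):
    """mix is a list of (cycles, fraction_of_instructions) pairs."""
    return sum(cycles * fraction for cycles, fraction in mix)


mix = [(3, 0.5), (4, 0.3), (5, 0.2)]
instruction_count = 1e9      # 1 billion instructions
clock_rate_hz = 2e9          # 2 GHz

cpi = effective_cpi(mix)
execution_time = instruction_count * cpi / clock_rate_hz

print(round(cpi, 2), round(execution_time, 2))  # 3.7 1.85
```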

Problem Statement 2

Suppose the processor in the previous example is redesigned so that all instructions
that initially executed in 5 cycles now execute in 4 cycles. Due to changes in the
circuitry, the clock rate has to be decreased from 2.0 GHz to 1.9 GHz. No changes
are made to the instruction set. What is the overall percentage improvement?
Solution Form

We can determine the percentage improvement quickly by first finding the ratio
between before and after performance. The performance equation implies that this
ratio will be a product of three factors: a performance ratio for instruction count, a
performance ratio for CPI or its reciprocal, instruction throughput, and a
performance ratio for clock time or its reciprocal, clock frequency. We can ignore
the first factor in this problem since it is 1.0: the instruction count has not changed.
We are left with determining the performance ratio for CPI and, since we are given
clock frequencies, the performance ratio for clock frequencies.

Solution

The performance ratio for frequencies must be less than 1.0: if other factors are the
same then a slower clock rate implies worse performance. So this factor of the
improvement ratio must be 1.9/2.0.

For the clocks per instruction, we had a value of 3.7 before the change. We
compute clocks per instruction after the change as an effective value:

Value   Frequency   Product
3       0.5         1.5
4       0.3         1.2
4       0.2         0.8
CPI =               3.5

Now, lower clocks per instruction means higher instruction throughput and thus
better performance, so we expect this part of the performance ratio to be greater
than 1.0; that is, 3.7/3.5.

Then we have

$$\text{performance ratio} = \frac{3.7}{3.5} \times \frac{1.9}{2.0} = \frac{7.03}{7.0} = 1.0043$$

This is a 0.43% improvement, which is probably not worth the effort.
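
As a cross-check, the same result can be obtained by computing the two execution times directly and dividing them. This is only a sketch using the numbers from the two problem statements; `effective_cpi` and `execution_time` are hypothetical helper names.

```python
def effective_cpi(mix):
    """mix is a list of (cycles, fraction_of_instructions) pairs."""
    return sum(cycles * fraction for cycles, fraction in mix)


def execution_time(instruction_count, mix, clock_rate_hz):
    return instruction_count * effective_cpi(mix) / clock_rate_hz


n = 1e9  # the instruction count is unchanged by the redesign
t_before = execution_time(n, [(3, 0.5), (4, 0.3), (5, 0.2)], 2.0e9)
t_after = execution_time(n, [(3, 0.5), (4, 0.3), (4, 0.2)], 1.9e9)

ratio = t_before / t_after
print(round(ratio, 4))              # 1.0043
print(round((ratio - 1) * 100, 2))  # 0.43
```

The lower CPI almost exactly cancels against the slower clock, which is why the net gain is so small.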
