
UNIT 1

OVERVIEW & INSTRUCTIONS


Syllabus:
Eight ideas - Components of a computer system - Technology - Performance - Power wall - Uniprocessors to multiprocessors; Instructions: operations and operands - representing instructions - Logical operations - control operations - Addressing and addressing modes.

1.1 EIGHT IDEAS:
IDEA 1: Design for Moore's Law
The one constant for computer designers is rapid change, which is driven largely
by Moore's Law.
It states that integrated circuit resources double every 18 to 24 months.
Moore's Law resulted from a 1965 prediction of such growth in IC capacity made
by Gordon Moore, one of the founders of Intel.
As computer designs can take years, the resources available per chip can easily
double or quadruple between the start and finish of the project. We use an "up and to the
right" Moore's Law graph to represent designing for rapid change.

IDEA 2: Use Abstraction to Simplify Design
Both computer architects and programmers had to invent techniques to make themselves
more productive, for otherwise design time would lengthen as dramatically as resources grew by
Moore's Law.
A major productivity technique for hardware and software is to use abstractions to
represent the design at different levels of representation; lower-level details are hidden to offer a
simpler model at higher levels.


IDEA 3: Make the common case fast
Making the common case fast will tend to enhance performance better than
optimizing the rare case. Ironically, the common case is often simpler than the rare case
and hence is often easier to enhance.

IDEA 4: Performance via parallelism
Since the dawn of computing, computer architects have offered designs that get
more performance by performing operations in parallel.

IDEA 5: Performance via pipelining
A particular pattern of parallelism is so prevalent in computer architecture that it
merits its own name: pipelining.

IDEA 6: Performance via prediction
Following the saying that it can be better to ask for forgiveness than to ask for permission, the next
great idea is prediction: in some cases it can be faster on average to guess and start working than to
wait until you know for sure.
IDEA 7: Hierarchy of memories
Programmers want memory to be fast, large, and cheap, as memory speed often
shapes performance, capacity limits the size of problems that can be solved, and the cost
of memory today is often the majority of computer cost.
Architects have found that they can address these conflicting demands with a
hierarchy of memories, with the fastest, smallest, and most expensive memory per bit at
the top of the hierarchy and the slowest, largest, and cheapest per bit at the bottom.
Caches give the programmer the illusion that main memory is nearly as fast as the
top of the hierarchy and nearly as big and cheap as the bottom of the hierarchy.
The hierarchy is usually drawn as a triangle whose shape indicates speed, cost, and size: the closer
a layer is to the top, the faster and more expensive per bit the memory; the wider the base of a
layer, the bigger the memory.

IDEA 8: Dependability via redundancy
Computers not only need to be fast; they need to be dependable.
Since any physical device can fail, we make systems dependable by including
redundant components that can take over when a failure occurs and to help detect
failures.




COMPONENTS OF A COMPUTER SYSTEM
A computer system consists of both hardware and information stored on hardware.
Information stored on computer hardware is often called software.
The hardware components of a computer system are the electronic and mechanical parts.
The software components of a computer system are the data and the computer programs.
The major hardware components of a computer system are:
Processor
Main memory
Secondary memory
Input devices
Output devices
For typical desktop computers, the processor, main memory, secondary memory, power supply,
and supporting hardware are housed in a metal case. Many of the components are connected to
the main circuit board of the computer, called the motherboard.
The power supply supplies power for most of the components. Various input devices (such as the
keyboard) and output devices (such as the monitor) are attached through connectors.
HARDWARE COMPONENTS:

The terms input and output indicate whether data flows into or out of the computer. In a block
diagram of these components, arrows show the direction of data flow.
A Bus is a group of wires on the main circuit board of the computer. It is a pathway for
data flowing between components. Most devices are connected to the bus through a Controller
which coordinates the activities of the device with the bus.
The Processor is an electronic device about one inch square, covered in plastic. Inside
the square is an even smaller square of silicon containing millions of tiny electrical parts. A
modern processor may contain billions of transistors. It does the fundamental computing within
the system, and directly or indirectly controls all the other components.
The processor is sometimes called the Central Processing Unit or CPU. A particular
computer will have a particular type of processor, such as a Pentium processor or a SPARC
processor.
MEMORY:
The processor performs all the fundamental computation of the computer system. Other
components contribute to the computation by doing such things as storing data or moving data
into and out of the processor. But the processor is where the fundamental action takes place.
A processor chip has relatively little memory. It has only enough memory to hold a few
instructions of a program and the data they process. Complete programs and data sets are held in
memory external to the processor.
This memory is of two fundamental types: main memory, and secondary memory.
Main memory is sometimes called volatile because it loses its information when power is
removed. Main memory is sometimes called main storage.
Secondary memory is usually nonvolatile because it retains its information when power is
removed. Secondary memory is sometimes called secondary storage or mass storage.
The two types compare as follows:
a. Main memory is closely connected to the processor; secondary memory is connected to main
memory through the bus and a controller.
b. Data stored in main memory is quickly and easily changed; data stored in secondary memory is
easily changed, but changes are slow compared to main memory.
c. Main memory holds the programs and data that the processor is actively working with;
secondary memory is used for long-term storage of programs and data.
d. Main memory interacts with the processor millions of times per second; data and programs in
secondary memory must be copied into main memory before they can be used.
e. Main memory needs constant electric power to keep its information; secondary memory does
not need electric power to keep its information.

Main memory is where programs and data are kept when the processor is actively using
them. When programs and data become active, they are copied from secondary memory into
main memory where the processor can interact with them. A copy remains in secondary memory.
Main memory is intimately connected to the processor, so moving instructions and data
into and out of the processor is very fast.
Main memory is sometimes called RAM. RAM stands for Random Access Memory.
"Random" means that the memory cells can be accessed in any order. However, properly
speaking, "RAM" means the type of silicon chip used to implement main memory.
Secondary memory is where programs and data are kept on a long-term basis. Common
secondary storage devices are the hard disk and optical disks.
The hard disk has enormous storage capacity compared to main memory.
The hard disk is usually contained inside the case of a computer.
The hard disk is used for long-term storage of programs and data.
Data and programs on the hard disk are organized into files.
A file is a collection of data on the disk that has a name.
A hard disk might have a storage capacity of 500 gigabytes (room for about 500 x 10^9
characters). This is about 100 times the capacity of main memory. A hard disk is slow compared
to main memory. If the disk were the only type of memory the computer system would slow
down to a crawl. The reason for having two types of storage is this difference in speed and
capacity.
Large blocks of data are copied from disk into main memory. The operation is slow, but
lots of data is copied. Then the processor can quickly read and write small sections of that data in
main memory. When it is done, a large block of data is written to disk.
Often, while the processor is computing with one block of data in main memory, the next
block of data from disk is read into another section of main memory and made ready for the
processor. One of the jobs of an operating system is to manage main storage and disks this way.
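
The following Python sketch illustrates the idea of this overlap; it is only an illustration, not how any particular operating system implements it, and the file path and block size are placeholders.

```python
import threading

BLOCK_SIZE = 64 * 1024   # placeholder block size (64 KB)

def process(block):
    # stand-in for the computation the processor does on a block in main memory
    return sum(block)

def process_file_overlapped(path):
    """Process a file block by block, prefetching the next block from disk
    while the current block is being processed."""
    results = []
    with open(path, "rb") as f:
        current = f.read(BLOCK_SIZE)
        while current:
            prefetched = {}
            reader = threading.Thread(
                target=lambda: prefetched.setdefault("block", f.read(BLOCK_SIZE)))
            reader.start()                    # the next disk read proceeds in the background
            results.append(process(current))  # the CPU works on data already in memory
            reader.join()                     # wait for the prefetch to finish
            current = prefetched["block"]
    return results
```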

INPUT AND OUTPUT DEVICES
Input and output devices allow the computer system to interact with the outside world by
moving data into and out of the system.
An input device is used to bring data into the system. Some input devices are:
Keyboard
Mouse
Microphone
Bar code reader
Graphics tablet
An output device is used to send data out of the system. Some output devices are:
Monitor
Printer
Speaker
A network interface acts as both input and output. Data flows from the network into the
computer, and out of the computer into the network.
I/O
Input/output devices are usually called I/O devices. They are directly connected to an
electronic module attached to the motherboard called a device controller.
For example, the speakers of a multimedia computer system are directly connected to a
device controller called an audio card, which in turn is plugged into a bus on the motherboard.
Sometimes secondary memory devices like the hard disk are called I/O devices.
EMBEDDED SYSTEMS
A computer system that is part of a larger machine and which controls how that machine
operates is an embedded system. Usually the processor constantly runs a single control program
which is permanently kept in ROM (Read Only Memory).
ROM is used to make a section of main memory read-only. Main memory looks the same
as before to the processor, except a section of it permanently contains the program the processor
is running. This section of memory retains its data even when power is off.
A typical embedded system is a cell phone. Digital cameras, DVD players, medical
equipment, and even home appliances contain dedicated processors.
SOFTWARE
Computer software consists of both programs and data.
Programs consist of instructions for the processor.
Data can be any information that a program needs: character data, numerical data, image
data, audio data, and countless other types.
TYPES OF PROGRAMS
There are two categories of programs.
Application programs (usually called just "applications") are programs that
people use to get their work done. Computers exist because people want to run
these programs.
Systems programs keep the hardware and software running together smoothly.
The difference between "application program" and "system program" is fuzzy.
Often it is more a matter of marketing than of logic.
The most important systems program is the operating system. The operating system is
always present when a computer is running. It coordinates the operation of the other hardware
and software components of the computer system.
The operating system is responsible for starting up application programs, running them,
and managing the resources that they need. When an application program is running, the
operating system manages the details of the hardware for it.
Modern operating systems for desktop computers come with a user interface that
enables users to easily interact with application programs (and with the operating system itself)
by using windows, buttons, menus, icons, the mouse, and the keyboard. Examples of operating
systems are Unix, Linux, Android, Mac OS, and Windows.

OPERATING SYSTEMS
An operating system is a complex program that keeps the hardware and software
components of a computer system coordinated and functioning.
Most computer systems can potentially run any of several operating systems. For
example, most Pentium-based computers can run either Linux or Windows operating systems.
Usually only one operating system is installed on a computer system, although some computers
have several.
In any case, only one operating system at a time can be in control of the computer
system. The computer user makes a choice when the computer is turned on, and that operating
system remains in control until the computer is turned off.







TECHNOLOGY:
To plan for the evolution of a machine, the designer must be especially aware of rapidly occurring
changes in implementation technology. Three implementation technologies, which change at a dramatic
pace, are critical to modern implementations:
Integrated circuit logic technology:
Transistor density increases by about 50% per year, quadrupling in just over three years. Increases in die
size are less predictable, ranging from 10% to 25% per year.
The combined effect is a growth rate in transistor count on a chip of between 60% and 80% per year.
Device speed increases nearly as fast; however, metal technology used for wiring does not improve,
causing cycle times to improve at a slower rate.
Semiconductor DRAM:
Density increases by just under 60% per year, quadrupling in three years.
Cycle time has improved very slowly, decreasing by about one-third in 10 years.
Bandwidth per chip increases as the latency decreases.
In addition, changes to the DRAM interface have also improved the bandwidth. In the past, DRAM
(dynamic random-access memory) technology has improved faster than logic technology. This difference
has occurred because of reductions in the number of transistors per DRAM cell and the creation of
specialized technology for DRAMs.
Magnetic disk technology:
Recently, disk density has been improving by about 50% per year, almost quadrupling in three years.
It appears that disk technology will continue the faster density growth rate for some time to come.
Access time has improved by one-third in 10 years.
These rapidly changing technologies impact the design of a microprocessor that may, with speed and
technology enhancements, have a lifetime of five or more years. Even within the span of a single product
cycle (two years of design and two years of production), key technologies, such as DRAM, change
sufficiently that the designer must plan for these changes. Traditionally, cost has decreased at very
close to the rate at which density increases.
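
To see how such annual rates compound, a quick calculation (the rates below are the ones quoted above):

```python
# Compound annual growth: capacity after n years = (1 + rate) ** n
for rate in (0.50, 0.60, 0.80):                  # 50%, 60%, 80% per year
    print(f"{rate:.0%} per year -> {(1 + rate) ** 3:.1f}x in three years")
# 50% per year -> 3.4x in three years (quadrupling takes just over three years)
# 60% per year -> 4.1x in three years
# 80% per year -> 5.8x in three years
```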
PERFORMANCE:
The computer user is interested in reducing response time, the time between the start and the
completion of an event, also referred to as execution time.
The manager of a large data processing center may be interested in increasing throughput, the total
amount of work done in a given time.
In comparing design alternatives, we often want to relate the performance of two different machines, say
X and Y. The phrase "X is faster than Y" is used here to mean that the response time or execution time
is lower on X than on Y for the given task.
In particular, "X is n times faster than Y" will mean

    n = Execution time_Y / Execution time_X

Since execution time is the reciprocal of performance, the following relationship holds:

    n = Execution time_Y / Execution time_X = (1 / Performance_Y) / (1 / Performance_X) = Performance_X / Performance_Y
The phrase "the throughput of X is 1.3 times higher than Y" signifies here that the number of tasks
completed per unit time on machine X is 1.3 times the number completed on Y. Because performance and
execution time are reciprocals, increasing performance decreases execution time.
The key measurement is time:
The computer that performs the same amount of work in the least time is the fastest. The
difference is whether we measure one task (response time) or many tasks (throughput).
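
As a small worked example (the execution times below are made up for illustration):

```python
# "X is n times faster than Y" means n = execution_time_Y / execution_time_X
time_X = 12.0   # seconds to run the task on machine X (hypothetical)
time_Y = 18.0   # seconds to run the same task on machine Y (hypothetical)

n = time_Y / time_X
print(f"X is {n:.1f} times faster than Y")       # X is 1.5 times faster than Y

# Equivalently, since performance is the reciprocal of execution time:
perf_X, perf_Y = 1 / time_X, 1 / time_Y
assert abs(n - perf_X / perf_Y) < 1e-12
```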

MEASURING PERFORMANCE:
Even execution time can be defined in different ways depending on what we count. The most
straightforward definition of time is called wall-clock time, response time, or elapsed time, which is the
latency to complete a task, including disk accesses, memory accesses, input/output activities, and
operating system overhead.
CPU time means the time the CPU is computing, not including the time waiting for I/O or
running other programs. It can be further divided into the CPU time spent in the program, called user
CPU time, and the CPU time spent in the operating system performing tasks requested by the program,
called system CPU time.
Choosing Programs to Evaluate Performance
A computer user who runs the same programs day in and day out would be the perfect candidate
to evaluate a new computer.
To evaluate a new system the user would simply compare the execution time of her workload: the
mixture of programs and operating system commands that users run on a machine.
There are four levels of programs used in such circumstances, listed below in decreasing order
of accuracy of prediction.
1. Real programs:
Real programs have input, output, and options that a user can select when running the program.
2. Kernels:
Livermore Loops and Linpack are the best known examples. Unlike real programs, no user would
run kernel programs, for they exist solely to evaluate performance.
Kernels are best used to isolate performance of individual features of a machine to explain the
reasons for differences in performance of real programs.
3. Toy benchmarks:
Toy benchmarks are typically between 10 and 100 lines of code and produce a result the user
already knows before running the toy program. Programs like Sieve of Eratosthenes, Puzzle, and
Quicksort are popular because they are small, easy to type, and run on almost any computer.
4. Synthetic benchmarks:
Synthetic benchmarks try to match the average frequency of operations and operands of a large
set of programs. Whetstone and Dhrystone are the most popular synthetic benchmarks.

Benchmark Suites:
Benchmark suites are made of collections of programs, some of which may be kernels, but
many of which are typically real programs.
The programs in the popular SPEC92 benchmark suite were used to characterize performance in
the workstation and server markets. The programs in SPEC92 vary from collections of kernels (nasa7)
to small program fragments (tomcatv, ora, alvinn, swm256) to applications of varying size (spice2g6,
gcc, and compress).


Reporting Performance Results:
The guiding principle of reporting performance measurements should be reproducibility: list
everything another experimenter would need to duplicate the results.
A SPEC benchmark report requires a fairly complete description of the machine, the compiler
flags, as well as the publication of both the baseline and optimized results.
An example is the SPECfp92 report for an IBM RS/6000 Powerstation 590.
In addition to hardware, software, and baseline tuning parameter descriptions, a SPEC report
contains the actual performance times, shown both in tabular form and as a graph.
The importance of performance on the SPEC benchmarks motivated vendors to add many
benchmark-specific flags when compiling SPEC programs; these flags often caused transformations that
would be illegal on many programs or would slow down performance on others.

Total Execution Time: A Consistent Summary Measure
The simplest approach to summarizing relative performance is to use the total execution time of
the two programs. Using the execution times of programs P1 and P2 on three machines A, B, and C,
this gives, for example:
B is 9.1 times faster than A for programs P1 and P2.
C is 25 times faster than A for programs P1 and P2.
C is 2.75 times faster than B for programs P1 and P2.
This summary tracks execution time, our final measure of performance. If the workload
consisted of running programs P1 and P2 an equal number of times, the statements above would predict
the relative execution times for the workload on each machine.
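
A short calculation reproduces the ratios quoted above; the execution times for machines A, B, and C on programs P1 and P2 are the classic textbook example values and are assumed here.

```python
# Assumed execution times (seconds) of programs P1 and P2 on machines A, B, and C
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}
totals = {m: sum(t) for m, t in times.items()}    # A: 1001 s, B: 110 s, C: 40 s

print(f"B is {totals['A'] / totals['B']:.1f} times faster than A")   # 9.1
print(f"C is {totals['A'] / totals['C']:.0f} times faster than A")   # 25
print(f"C is {totals['B'] / totals['C']:.2f} times faster than B")   # 2.75
```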
An average of the execution times that tracks total execution time is the arithmetic mean:

    Arithmetic mean = (1/n) x (sum over i of Time_i)

where Time_i is the execution time for the ith program of a total of n in the workload. If
performance is expressed as a rate, then the average that tracks total execution time is the harmonic
mean:

    Harmonic mean = n / (sum over i of (1 / Rate_i))

where Rate_i is a function of 1/Time_i, the execution time for the ith of n programs in the workload.

Weighted Execution Time
The first approach when given an unequal mix of programs in the workload is to assign a
weighting factor w_i to each program to indicate the relative frequency of the program in that workload.
If, for example, 20% of the tasks in the workload were program P1 and 80% of the tasks in the
workload were program P2, then the weighting factors would be 0.2 and 0.8. (Weighting factors add up to
1.) By summing the products of weighting factors and execution times, a clear picture of the performance
of the workload is obtained. This is called the weighted arithmetic mean:

    Weighted arithmetic mean = sum over i of (Weight_i x Time_i)

where Weight_i is the frequency of the ith program in the workload and Time_i is the execution
time of that program.
The weighted harmonic mean of rates will show the same relative performance as the weighted
arithmetic mean of execution times. The definition is

    Weighted harmonic mean = 1 / (sum over i of (Weight_i / Rate_i))
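
Continuing with the same assumed example times for machine B and the 20%/80% mix from the text, a minimal sketch of the weighted means:

```python
times = [10, 100]        # assumed execution times (s) of P1 and P2 on machine B
weights = [0.2, 0.8]     # relative frequency of each program in the workload

weighted_arithmetic = sum(w * t for w, t in zip(weights, times))     # 0.2*10 + 0.8*100 = 82.0

rates = [1 / t for t in times]                                       # performance as a rate (tasks/s)
weighted_harmonic = 1 / sum(w / r for w, r in zip(weights, rates))   # 1 / (0.2/0.1 + 0.8/0.01)

# The reciprocal of the weighted harmonic mean of rates equals the weighted
# arithmetic mean of times, so both rank the machines the same way.
assert abs(1 / weighted_harmonic - weighted_arithmetic) < 1e-9
```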
Quantitative Principles of Computer Design
Make the Common Case Fast
Perhaps the most important and pervasive principle of computer design is to make the common case fast:
In making a design trade-off, favor the frequent case over the infrequent case. This principle also applies
when determining how to spend resources, since the impact on making some occurrence faster is higher if
the occurrence is frequent. Improving the frequent event, rather than the rare event, will obviously help
performance, too. A fundamental law, called Amdahl's Law, can be used to quantify this principle.
Amdahl's Law
The performance gain that can be obtained by improving some portion of a computer can be calculated
using Amdahl's Law.
Amdahl's Law states that the performance improvement to be gained from using some faster mode
of execution is limited by the fraction of the time the faster mode can be used.
Amdahl's Law defines the speedup that can be gained by using a particular feature.
What is speedup? Suppose that we can make an enhancement to a machine that will improve performance
when it is used. Speedup is the ratio

    Speedup = Execution time for the entire task without using the enhancement / Execution time for the entire task using the enhancement when possible
Speedup tells us how much faster a task will run using the machine with the enhancement as opposed
to the original machine.
Amdahl's Law gives us a quick way to find the speedup from some enhancement, which depends on
two factors:
1. The fraction of the computation time in the original machine that can be converted to take advantage
of the enhancement.
For example, if 20 seconds of the execution time of a program that takes 60 seconds in total can
use an enhancement, the fraction is 20/60. This value, which we will call Fraction_enhanced, is always
less than or equal to 1.
2. The improvement gained by the enhanced execution mode; that is, how much faster the task would
run if the enhanced mode were used for the entire program.
This value is the time of the original mode over the time of the enhanced mode: if the enhanced
mode takes 2 seconds for some portion of the program that can completely use the mode, while the
original mode took 5 seconds for the same portion, the improvement is 5/2. We will call this value, which
is always greater than 1, Speedup_enhanced.

The execution time using the original machine with the enhanced mode will be the time spent
using the unenhanced portion of the machine plus the time spent using the enhancement:

    Execution time_new = Execution time_old x ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)

The overall speedup is the ratio of the execution times:

    Speedup_overall = Execution time_old / Execution time_new = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
Amdahl's Law expresses the law of diminishing returns:
The incremental improvement in speedup gained by an additional improvement in the
performance of just a portion of the computation diminishes as improvements are added.
An important corollary of Amdahl's Law is that if an enhancement is only usable for a
fraction of a task, we can't speed up the task by more than the reciprocal of 1 minus that fraction.
Amdahl's Law can serve as a guide to how much an enhancement will improve performance
and how to distribute resources to improve cost/performance. The goal, clearly, is to spend
resources proportional to where time is spent.
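
A small sketch of Amdahl's Law using the numbers from the example above (20 of 60 seconds can use an enhancement that is 5/2 = 2.5 times faster):

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

overall = amdahl_speedup(20 / 60, 5 / 2)
print(f"overall speedup = {overall:.2f}")              # 1.25

# Corollary: even if the enhanced mode were infinitely fast, the overall
# speedup could not exceed 1 / (1 - fraction_enhanced).
print(f"upper bound     = {1 / (1 - 20 / 60):.2f}")    # 1.50
```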
The CPU Performance Equation
Most computers are constructed using a clock running at a constant rate. These discrete time
events are called ticks, clock ticks, clock periods, clocks, cycles, or clock cycles. Computer designers
refer to the time of a clock period by its duration (e.g., 2 ns) or by its rate (e.g., 500 MHz). CPU time for
a program can then be expressed two ways:

    CPU time = CPU clock cycles for a program x Clock cycle time

or

    CPU time = CPU clock cycles for a program / Clock rate
In addition to the number of clock cycles needed to execute a program, we can also count the
number of instructions executed, known as the instruction path length or instruction count (IC). If we
know the number of clock cycles and the instruction count we can calculate the average number of clock
cycles per instruction (CPI):

    CPI = CPU clock cycles for a program / Instruction count
By transposing instruction count in the above formula, clock cycles can be defined as IC x CPI.
This allows us to use CPI in the execution time formula:

    CPU time = Instruction count x CPI x Clock cycle time
Expanding the first formula into the units of measure shows how the pieces fit together:

    (Instructions / Program) x (Clock cycles / Instruction) x (Seconds / Clock cycle) = Seconds / Program = CPU time
CPU performance is dependent upon three characteristics:
clock cycle (or rate),
clock cycles per instruction, and
Instruction count.
Furthermore, CPU time is equally dependent on these three characteristics: A 10% improvement
in any one of them leads to a 10% improvement in CPU time.
It is difficult to change one parameter in complete isolation from others because the basic
technologies involved in changing each characteristic are also interdependent:
Clock cycle time: hardware technology and organization
CPI: organization and instruction set architecture
Instruction count: instruction set architecture and compiler technology

Sometimes it is useful in designing the CPU to calculate the number of total CPU clock cycles as

    CPU clock cycles = sum over i of (IC_i x CPI_i)

where IC_i represents the number of times instruction i is executed in a program and CPI_i
represents the average number of clock cycles for instruction i. This form can be used to express CPU
time as

    CPU time = (sum over i of (IC_i x CPI_i)) x Clock cycle time
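
A worked example with an assumed instruction mix (the counts and CPI values below are hypothetical), using the 2 ns (500 MHz) clock mentioned earlier:

```python
# (instruction count, CPI) per instruction class -- hypothetical values
mix = {
    "ALU":        (40_000_000, 1),
    "load/store": (30_000_000, 2),
    "branch":     (10_000_000, 3),
}
clock_cycle_time = 2e-9                                       # seconds (500 MHz clock)

total_cycles = sum(ic * cpi for ic, cpi in mix.values())      # sum of IC_i x CPI_i = 130e6
total_ic = sum(ic for ic, _ in mix.values())                  # 80e6 instructions
average_cpi = total_cycles / total_ic                         # 1.625
cpu_time = total_cycles * clock_cycle_time                    # 0.26 s

print(f"average CPI = {average_cpi}")
print(f"CPU time    = {cpu_time:.2f} s")
```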
Locality of Reference
While Amdahl's Law is a theorem that applies to any system, other important fundamental
observations come from properties of programs.
The most important program property that we regularly exploit is locality of reference:
Programs tend to reuse data and instructions they have used recently.
A widely held rule of thumb is that a program spends 90% of its execution time in only 10% of
the code.
Locality of reference also applies to data accesses, though not as strongly as to code accesses.
Two different types of locality have been observed.
Temporal locality states that recently accessed items are likely to be accessed in the near future.
Spatial locality says that items whose addresses are near one another tend to be referenced close
together in time.
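
A small Python sketch of the two kinds of locality (Python hides the hardware caches, but the access patterns are what matter):

```python
data = list(range(1_000_000))        # a large array in memory

# Spatial locality: the sequential sweep touches neighbouring elements one
# after another, so each block brought into the cache is used many times.
total = 0
for x in data:
    total += x

# Temporal locality: the small `counts` table (and `total` above) is reused
# on every iteration, so it tends to stay in the fastest levels of the hierarchy.
counts = [0] * 16
for x in data:
    counts[x % 16] += 1
```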


POWER WALL:

Three major reasons why the growth in uniprocessor performance became unsustainable:

1. The Memory Wall:

Increasing gap between CPU and Main memory speed

2. The ILP Wall:

Diminishing amount of "work" (instruction-level parallelism) available to keep the processor busy

3. The Power Wall:

Increasing power consumption of the processor

Power wall

Power is expensive, while transistors are effectively free: more transistors can be placed on a chip
than can affordably be powered.

The power wall: it is not possible to keep running at consistently higher clock frequencies without
hitting power/thermal limits.


UNIPROCESSOR TO MULTIPROCESSOR:

Introduction- Uniprocessor

A uniprocessor system is defined as a computer system that has a single central
processing unit that is used to execute computer tasks.
The typical Uniprocessor system consists of three major components:
the main memory,
the Central processing unit (CPU) and
the Input-output (I/O) sub-system.
The CPU contains an arithmetic and logic unit (ALU) with an optional floating-point
accelerator, and some local cache memory with an optional diagnostic memory.
The CPU, the main memory, and the I/O subsystems are all connected to a common bus,
the synchronous backplane interconnect (SBI). Through this bus, all I/O devices can communicate
with each other, with the CPU, or with the memory.
A number of parallel processing mechanisms have been developed in uniprocessor
computers; they include multiplicity of functional units, parallelism and pipelining within the CPU,
overlapped CPU and I/O operations, use of a hierarchical memory system, and multiprogramming
and time sharing.


Limitations of Uniprocessor Machine
Clock speed cannot be increased further.
In a digital circuit, the maximum signal speed is bounded by the speed of light.
Higher speed generates more heat.

Introduction-Multiprocessor
A multiprocessor system is an interconnection of two or more CPUs with memory and
input-output equipment.
The components that form a multiprocessor are the CPUs, the IOPs connected to input-output
devices, and a memory unit that may be partitioned into a number of separate modules.
Multiprocessors are classified as multiple instruction stream, multiple data stream
(MIMD) systems.
Why Choose a Multiprocessor?
A single CPU can only go so fast; use more than one CPU to improve performance:
Multiple users
Multiple applications
Multi-tasking within an application
Responsiveness and/or throughput
Share hardware between CPUs
Multiprocessor System Benefits
Reliability
Multiprocessing improves the reliability of the system so that a failure or error in one part has a
limited effect on the rest of the system. If a fault causes one processor to fail, a second processor
can be assigned to perform the functions of the disabled processor.
Improved System Performance
The system derives its high performance from the fact that computations can proceed in parallel
in one of two ways.
1. Multiple independent jobs can be made to operate in parallel.
2. A single job can be partitioned into multiple parallel tasks.
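
As a sketch of the second way, the following Python snippet partitions a single summation across several worker processes; the data size and number of workers are arbitrary.

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # each worker processes its own partition of the job
    return sum(chunk)

if __name__ == "__main__":
    data = range(10_000_000)
    n_workers = 4
    step = len(data) // n_workers
    chunks = [data[i * step:(i + 1) * step] for i in range(n_workers)]

    with Pool(n_workers) as pool:
        # the partial sums are computed in parallel and then combined
        total = sum(pool.map(partial_sum, chunks))
    print(total)
```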
How are multiprocessors classified?
Multiprocessors are classified by the way their memory is organized, mainly
into two types:
1. Tightly coupled multiprocessor
2. Loosely coupled multiprocessor

Tightly coupled Multiprocessor
A tightly coupled multiprocessor is a computer system having two or more processing
units (multiple processors), each sharing main memory and peripherals, in order to
process programs simultaneously.
A tightly coupled multiprocessor is also known as a shared memory system.
Loosely-coupled multiprocessor
Loosely coupled multiprocessor systems (often referred to as clusters) are based on
multiple standalone single- or dual-processor commodity computers interconnected via a
high-speed communication system.
A loosely coupled multiprocessor is also known as a distributed memory system.
Example
A Linux Beowulf cluster
Difference b/w Tightly coupled and Loosely coupled multiprocessor
Tightly coupled
Tightly coupled systems are physically smaller than loosely coupled systems.
More expensive.
Loosely coupled
Just the opposite of a tightly coupled system: physically larger.
Less expensive.






Chip Manufacturing Process

You might also like