COSC 403:
COMPUTER ARCHITECTURE
Image Source: http://www.thedailycrate.com/wp-content/uploads/2014/08/o-COMPUTER-SCIENCE-facebook.jpg - Retrieved Online on January 11, 2016
multiprocessing
MODULE SEVEN
Process
• A process, or running process, is a program in execution: a collection of instructions being carried out by the computer processor.
Process
• A process’s execution must proceed sequentially. For example, a computer program is written in a text file, but once it is executed, it becomes a process and performs all the tasks specified in the program in sequence.
Differences between Process and thread
• A process is an instance of a running program, while a thread is a unit of execution within a process
• Threads are lightweight compared to processes
• A process takes longer to terminate than a thread
• Creating a process takes more time than creating a thread
Differences between Process and thread
• Processes require more time for context switching, whereas threads need less time
• Unlike threads, which share the memory of their process, a process is largely isolated from other processes
• By default, data isn’t shared across processes, but it is shared between the threads of the same process
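The last two points can be observed directly. The sketch below is a minimal illustration, assuming Python on a POSIX system (it requests the `fork` start method of the standard `multiprocessing` module); the names `counter`, `increment`, `run_in_thread` and `run_in_process` are hypothetical. A thread's update to a global variable is visible to the rest of its process, while a child process only updates its own copy.

```python
import multiprocessing
import threading

counter = 0  # module-level variable that each worker tries to increment


def increment() -> None:
    global counter
    counter += 1


def run_in_thread() -> int:
    """Run increment() in a new thread; the thread shares this process's memory."""
    global counter
    counter = 0
    t = threading.Thread(target=increment)
    t.start()
    t.join()
    return counter  # the thread's update is visible here


def run_in_process() -> int:
    """Run increment() in a child process, which works on its own copy of memory."""
    global counter
    counter = 0
    # Assumption: a POSIX system where the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    p = ctx.Process(target=increment)
    p.start()
    p.join()
    return counter  # the child's update is NOT visible in the parent


if __name__ == "__main__":
    print("after thread: ", run_in_thread())   # 1
    print("after process:", run_in_process())  # 0
```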
MULTIPROCESSING
• Multiprocessing is the use of two or more central processing units (CPUs) in a single computer system: the concurrent execution of multiple processes on multiple CPUs. During multiprocessing, different CPUs can execute different processes simultaneously.
MULTIPROCESSING
• A system that can accommodate more than one processor at once is referred to as a multiprocessing, multiprocessor, or clustered system.
MULTIPROCESSING
• The main benefits of multiprocessing are
increased throughput and reduced
latency, as multiple processes can be
executed at the same time without
interfering with each other.
Multiprocessing involves running
multiple processes on multiple CPUs.
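As a software-level illustration of running multiple processes on multiple CPUs, the sketch below spreads independent, CPU-bound jobs over a pool of worker processes using Python's standard `multiprocessing.Pool`. It is a minimal example under stated assumptions: a POSIX system (the `fork` start method), one worker per CPU reported by `os.cpu_count()`, and a hypothetical prime-counting workload chosen only because it keeps a CPU busy.

```python
import multiprocessing
import os


def count_primes(limit: int) -> int:
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def parallel_prime_counts(limits: list[int]) -> list[int]:
    """Run count_primes once per limit, spread over a pool of worker processes."""
    # Assumption: a POSIX system where the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    # One worker process per CPU reported by the OS; the OS decides which
    # CPU each worker actually runs on.
    with ctx.Pool(processes=os.cpu_count()) as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    print(parallel_prime_counts([10, 100, 1000]))  # [4, 25, 168]
```

Each limit is an independent job, so the pool can run them on different CPUs at the same time; the results come back in the same order as the inputs.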
Types of multiprocessing system
• Asymmetric multiprocessing (AMP): this is the first type of multiprocessing system, in which each processor is given a predetermined task. An AMP system has a primary-secondary relationship: all the processors receive instructions from the primary processor. AMP is cheaper than SMP.
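The primary-secondary relationship can be modelled as a dispatch decision made by a single primary. The sketch below is a toy model, not how any particular AMP hardware works: the processor names, roles, and task types are hypothetical, and each secondary only ever receives work matching its predetermined role.

```python
# Hypothetical AMP configuration: each secondary processor has one
# predetermined role, fixed in advance.
SECONDARY_ROLES = {"cpu1": "io", "cpu2": "math", "cpu3": "graphics"}


def primary_dispatch(tasks: list[tuple[str, str]]) -> dict[str, list[str]]:
    """The primary processor decides which secondary runs each task.

    Each task is a (task_type, task_name) pair. A secondary never picks its
    own work: it only receives the tasks that match its predetermined role.
    """
    cpu_for_role = {role: cpu for cpu, role in SECONDARY_ROLES.items()}
    plan: dict[str, list[str]] = {cpu: [] for cpu in SECONDARY_ROLES}
    for task_type, name in tasks:
        # A task type with no assigned secondary raises KeyError.
        plan[cpu_for_role[task_type]].append(name)
    return plan


if __name__ == "__main__":
    work = [("math", "matmul"), ("io", "read_disk"), ("math", "fft"), ("graphics", "draw")]
    print(primary_dispatch(work))
    # {'cpu1': ['read_disk'], 'cpu2': ['matmul', 'fft'], 'cpu3': ['draw']}
```

Because the roles are fixed, cpu2 ends up with two tasks while cpu1 and cpu3 get one each: in this model the primary cannot hand math work to an idle I/O processor.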
Types of multiprocessing system
• Symmetric multiprocessing (SMP): in SMP, or tightly coupled multiprocessing, the processors share memory and the input/output (I/O) bus or data path. A single copy of the OS manages all the processors. SMP, also known as a shared-everything system, doesn’t usually exceed 16 processors.
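Shared memory is what lets SMP processors work on the same data, and also why they must coordinate access to it. A minimal software analogy, assuming Python's `multiprocessing` module on a POSIX system: several worker processes (standing in for processors; the single OS decides where each one runs) increment one integer stored in shared memory, and a lock serializes the updates. The names `add_many` and `smp_counter` are hypothetical.

```python
import multiprocessing


def add_many(total, lock, n: int) -> None:
    """Increment the shared counter n times."""
    for _ in range(n):
        with lock:  # the read-modify-write below must not interleave
            total.value += 1


def smp_counter(workers: int = 4, n: int = 1000) -> int:
    """Several processes update one integer held in shared memory."""
    # Assumption: a POSIX system where the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    total = ctx.Value("i", 0, lock=False)  # a C int placed in shared memory
    lock = ctx.Lock()
    procs = [ctx.Process(target=add_many, args=(total, lock, n)) for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return total.value


if __name__ == "__main__":
    print(smp_counter())  # 4000
```

Without the lock, two workers could read the same old value and both write back old + 1, losing an update.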
How SMP shares memory space with different processors
Cache
• The CPU does not access RAM directly: modern CPUs have one or more layers of cache between the processor and RAM, because the CPU can perform calculations much faster than RAM can feed data to the CPU.
Cache
• Cache memory is faster than system RAM and sits closer to the CPU, because it is physically located on the processor chip. The cache stores data and instructions so that the CPU does not have to wait for them to be retrieved from RAM. When the CPU needs data (program instructions are also treated as data), the cache determines whether the data is already resident and, if so, provides it to the CPU.
Cache
• If the requested data is not in the cache, it is retrieved from RAM, and the cache controller uses predictive algorithms to move more data from RAM into the cache. The controller analyzes the requested data, tries to predict what additional data will be needed from RAM, and loads the anticipated data into the cache. By keeping some data closer to the CPU in a cache that is faster than RAM, the CPU can remain busy instead of wasting cycles waiting for data.
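The prediction step can be illustrated with the simplest predictive scheme, next-line prefetching: on a miss, the controller also loads the line after the requested one. This is a minimal sketch under stated assumptions (a hypothetical 8-line, fully associative LRU cache with 4-word lines); real cache controllers use more elaborate predictors.

```python
from collections import OrderedDict

LINE_SIZE = 4  # hypothetical: words per cache line


class Cache:
    """Tiny fully associative LRU cache with optional next-line prefetching."""

    def __init__(self, num_lines: int, prefetch: bool) -> None:
        self.lines = OrderedDict()  # line number -> None, least recently used first
        self.num_lines = num_lines
        self.prefetch = prefetch
        self.hits = 0
        self.misses = 0

    def _load(self, line: int) -> None:
        """Bring a line in from "RAM", evicting the least recently used line if full."""
        self.lines[line] = None
        self.lines.move_to_end(line)
        if len(self.lines) > self.num_lines:
            self.lines.popitem(last=False)

    def read(self, address: int) -> None:
        line = address // LINE_SIZE
        if line in self.lines:
            self.hits += 1
            self.lines.move_to_end(line)  # mark as most recently used
        else:
            self.misses += 1
            self._load(line)  # fetch the requested data
            if self.prefetch:
                self._load(line + 1)  # predict that the next line is needed soon


if __name__ == "__main__":
    for prefetch in (False, True):
        cache = Cache(num_lines=8, prefetch=prefetch)
        for address in range(64):  # a sequential scan, e.g. walking an array
            cache.read(address)
        print(f"prefetch={prefetch}: misses={cache.misses}, hits={cache.hits}")
    # prefetch=False: misses=16, hits=48
    # prefetch=True: misses=8, hits=56
```

For a sequential scan the prediction is always right, so prefetching halves the misses; for irregular access patterns the same guess could waste cache space.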
Massively parallel processing (MPP)
• In MPP, or loosely coupled processing, 200 or more processors can work on the same application. Each processor has its own OS and memory, but an interconnect arrangement of data paths lets messages be sent between processors. Setting up an MPP system is typically more complicated, requiring thought about how to partition a common database among the processors and how to assign work among them. An MPP system is also known as a shared-nothing system.
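The shared-nothing style can be mimicked in software: each worker process owns one partition of the data in its private memory and reports its result only by sending a message. The sketch below assumes Python's `multiprocessing` on a POSIX system, uses a `Queue` as a stand-in for the interconnect, and uses a round-robin split as one hypothetical way of partitioning the "database"; `node` and `mpp_sum_of_squares` are illustrative names.

```python
import multiprocessing


def node(node_id: int, partition: list[int], outbox) -> None:
    """One 'processor': works only on its own partition and reports by message."""
    local_sum = sum(x * x for x in partition)  # private memory, nothing shared
    outbox.put((node_id, local_sum))           # send the result over the "interconnect"


def mpp_sum_of_squares(data: list[int], nodes: int = 4) -> int:
    # Partition the common data among the nodes (round-robin split).
    partitions = [data[i::nodes] for i in range(nodes)]
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX system
    outbox = ctx.Queue()
    procs = [ctx.Process(target=node, args=(i, part, outbox))
             for i, part in enumerate(partitions)]
    for p in procs:
        p.start()
    results = [outbox.get() for _ in procs]  # one message per node
    for p in procs:
        p.join()  # join only after draining the queue, to avoid blocking
    return sum(partial for _, partial in results)


if __name__ == "__main__":
    print(mpp_sum_of_squares(list(range(100))))  # 328350
```

The extra work the slide mentions is visible even here: someone has to choose the partitioning and collect the partial results.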
Advantages of Multiprocessing
• Multiprocessing environments are
widely adopted and offer a wide
range of advantages such as increased
speed, throughput and reliability
Advantages of Multiprocessing
• Reliability: if one processor in a multiprocessor system fails, the other processors can pick up the slack and continue to function. While the loss of one processor might cause a gradual slowdown, the system can still function smoothly. This makes multiprocessing systems highly reliable
Advantages of Multiprocessing
• Increased throughput: throughput is the number of processes executed per unit of time. Because multiprocessor systems use many CPUs to handle data, increased performance is expected when the system uses parallel processing. This means more tasks can be accomplished in a shorter amount of time, as they’re divided among different processors
Advantages of Multiprocessing
• Cost saving: multiprocessing systems are more economical than multiple single-processor systems, because the processors within a single system share the same memory, disk space, buses, and peripherals.
Multiprocessing challenges
• Multiprocessing is expensive: it is relatively cheaper to maintain one processor than several processors
Multiprocessing challenges
• Deadlock: deadlock can occur when two processors each hold a resource, such as an I/O device, that the other needs, and each waits for the other to release it; neither can then proceed.
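The scenario can be reproduced with threads standing in for processors and locks standing in for I/O devices (both assumptions of this sketch; `disk` and `printer` are hypothetical names). In `opposite_order`, each worker holds one device and waits for the other, a circular wait; a timeout is used only so the demonstration terminates instead of hanging. `same_order` shows one standard prevention technique, lock ordering: if every worker requests the devices in the same global order, a circular wait cannot form.

```python
import threading


def opposite_order() -> dict[str, bool]:
    """Two workers take the same two devices in opposite orders (circular wait)."""
    disk, printer = threading.Lock(), threading.Lock()  # hypothetical I/O devices
    barrier = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            barrier.wait()                    # each worker now holds one device
            ok = second.acquire(timeout=0.2)  # the other device is held by the peer
            got_second[name] = ok
            if ok:
                second.release()
            barrier.wait()                    # keep holding until both have tried

    threads = [threading.Thread(target=worker, args=("A", disk, printer)),
               threading.Thread(target=worker, args=("B", printer, disk))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second  # both False: without the timeout, both would wait forever


def same_order() -> list[str]:
    """The fix: every worker acquires the devices in one global order."""
    disk, printer = threading.Lock(), threading.Lock()
    finished = []

    def worker(name):
        with disk:         # always the disk first...
            with printer:  # ...then the printer, so no circular wait can form
                finished.append(name)

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("A", "B")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(finished)


if __name__ == "__main__":
    print(sorted(opposite_order().items()))  # [('A', False), ('B', False)]
    print(same_order())                      # ['A', 'B']
```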
Multiprocessing challenges
• Extra memory requirement: due to their improved computing capacity, multiprocessor computers are widely used. However, they come with increased memory requirements. In a multiprocessing architecture, memory is shared across all processes, and each processor requires memory space. All processors work together and access main memory directly and simultaneously, which increases memory consumption.
Multiprocessing challenges
• Complex operating system: in multiprocessing OSes, the operating system assigns each processor several minor tasks and distributes the load among the processors; in some designs, each CPU runs its own copy of the OS. However, the use of multiple processors makes it more complex for the OS to function
CPU clock and control unit
• All of the CPU components must be synchronized to work together smoothly. The control unit performs this function at a rate determined by the clock speed and is responsible for directing the operations of the other units by using timing signals that extend throughout the CPU
Multiprogramming
• Multiprogramming is the interleaved execution of two or more programs by a processor: the concurrent execution of multiple programs on a single CPU. The CPU switches rapidly between programs, providing the illusion that they’re all running simultaneously. The main benefit of multiprogramming is increased CPU utilization, since the CPU does not sit idle while a program waits (for example, for I/O). Multiprogramming involves running multiple programs on a single CPU
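The utilization benefit can be seen in a tick-by-tick simulation. This is a simplified model, not an OS scheduler: it assumes switching costs nothing, that different programs' I/O can proceed at the same time, and that the CPU only switches when the current program blocks or finishes. Programs are hypothetical strings of 'C' (one tick of CPU work) and 'I' (one tick of waiting for I/O), with single-letter names.

```python
def multiprogram(programs: dict[str, str]) -> str:
    """Simulate multiprogramming on one CPU, one tick at a time.

    Each program is a string of steps: 'C' needs the CPU for one tick, and
    'I' waits one tick for I/O. The CPU keeps running the current program
    until it blocks on I/O or finishes, then switches to the first program
    that is ready. Returns the CPU timeline: the running program's name for
    each tick, or '.' when the CPU is idle.
    """
    pos = {name: 0 for name in programs}

    def next_step(name):
        steps = programs[name]
        return steps[pos[name]] if pos[name] < len(steps) else None

    timeline = []
    current = None
    while any(next_step(name) is not None for name in programs):
        waiting = [name for name in programs if next_step(name) == "I"]
        if current is None or next_step(current) != "C":
            # Current program blocked or finished: switch to a ready one.
            current = next((name for name in programs if next_step(name) == "C"), None)
        if current is not None:
            pos[current] += 1
        for name in waiting:  # I/O proceeds while the CPU runs something else
            pos[name] += 1
        timeline.append(current if current is not None else ".")
    return "".join(timeline)


if __name__ == "__main__":
    print(multiprogram({"A": "CCIIC", "B": "CIICC"}))  # AAB.ABB
```

Interleaved, the two programs finish in 7 ticks with the CPU idle for only one of them; run one after the other, they would take 10 ticks, 4 of them with the CPU idle.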
Multitasking or time sharing
• Multitasking or time sharing is the
management of programs and the
system services they request as tasks
that can be interleaved
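One common way to interleave tasks in a time-sharing system is round-robin scheduling, where each task runs for at most a fixed time slice (quantum) before the CPU moves on. The sketch below assumes all tasks are ready at time 0 and that switching costs nothing; the task names, burst lengths, and quantum are hypothetical.

```python
from collections import deque


def round_robin(bursts: dict[str, int], quantum: int) -> tuple[str, dict[str, int]]:
    """Time sharing on one CPU: each task runs for at most `quantum` ticks,
    then goes to the back of the queue. Returns the CPU timeline and the
    tick at which each task completes."""
    remaining = dict(bursts)
    queue = deque(bursts)  # tasks in arrival order, all ready at time 0
    clock = 0
    timeline = []
    finish = {}
    while queue:
        task = queue.popleft()
        run = min(quantum, remaining[task])
        timeline.append(task * run)  # one character per tick of CPU time
        clock += run
        remaining[task] -= run
        if remaining[task] > 0:
            queue.append(task)  # not finished: wait for its next time slice
        else:
            finish[task] = clock
    return "".join(timeline), finish


if __name__ == "__main__":
    print(round_robin({"A": 5, "B": 2, "C": 3}, quantum=2))
    # ('AABBCCAACA', {'B': 4, 'C': 9, 'A': 10})
```

The short task B completes at tick 4 instead of waiting behind all of A, which is what makes time sharing feel responsive to each user.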
Multithreading
• Multithreading is the management of
multiple execution paths through the
computer or of multiple users sharing
the same copy of a program.
Multithreading
• A multithreaded CPU is not, strictly speaking, a parallel architecture: multithreading is obtained through a single CPU, but it allows a programmer to design and develop applications as a set of programs that can virtually execute in parallel, namely threads.
Multithreading
• Multithreading is a solution for avoiding wasted clock cycles while missing data is fetched: the CPU manages several peer threads concurrently, so if one thread gets blocked, the CPU can execute instructions of another thread, thus keeping the functional units busy.
Multithreading
•Each thread must have a
private Program Counter and
a set of private registers,
separate from other threads.
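The latency-hiding idea and the private per-thread state described above can be sketched as a toy cycle-level simulation. All parameters are hypothetical assumptions: one instruction issued per cycle, a fixed 4-cycle penalty for a cache miss, and a core that always picks the first thread (in a fixed order) that is not stalled. Each `ThreadContext` carries its own PC and registers.

```python
from dataclasses import dataclass, field

MISS_PENALTY = 4  # hypothetical: cycles needed to fetch missing data


@dataclass
class ThreadContext:
    """Per-thread hardware state: a private PC and a private register set."""
    program: list[str]  # "alu" = ordinary instruction, "miss" = load that misses
    pc: int = 0
    regs: dict[str, int] = field(default_factory=lambda: {"r0": 0})
    ready_at: int = 0  # first cycle at which this thread may issue again


def simulate_core(threads: list[ThreadContext]) -> int:
    """Issue at most one instruction per cycle, from the first thread that is
    not stalled. Returns the cycles needed until every thread has finished."""
    cycle = 0
    while any(t.pc < len(t.program) or t.ready_at > cycle for t in threads):
        ready = [t for t in threads if t.pc < len(t.program) and t.ready_at <= cycle]
        if ready:
            t = ready[0]
            op = t.program[t.pc]
            t.pc += 1          # only this thread's PC advances
            t.regs["r0"] += 1  # stand-in for work on this thread's own registers
            if op == "miss":
                t.ready_at = cycle + 1 + MISS_PENALTY  # stalled while data is fetched
        cycle += 1  # a cycle with no ready thread is wasted
    return cycle


def make_threads(n: int) -> list[ThreadContext]:
    return [ThreadContext(program=["alu", "miss", "alu", "alu"]) for _ in range(n)]


if __name__ == "__main__":
    one = simulate_core(make_threads(1))
    print("one thread:", one, "cycles")                            # 8
    print("two threads, back to back:", 2 * one, "cycles")         # 16
    print("two threads, interleaved:", simulate_core(make_threads(2)), "cycles")  # 10
```

With these numbers, two threads finish in 10 cycles instead of 16, because the core issues the second thread's instructions while the first waits for memory.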
How multithreading works with four program threads
Multicore processors
• Multicore processors today can easily have 12, 24, or even more processor cores on the same chip, enabling the effective and concurrent processing of numerous tasks.
Architecture of multicore processors
Performance Definition
• What makes a computer “faster” depends on the user. An end user says a computer is faster when it runs their program in less time, measured from the moment the program is launched until the results are obtained; this is the so-called wall-clock time. A system manager, on the other hand, says a computer is faster when it completes more jobs per unit of time.
Performance Definition
• As a user, you are interested in reducing the response time (also called the execution time or latency). The computer manager is more interested in increasing the throughput (also called bandwidth), the number of jobs done in a certain amount of time.
Performance Definition
• Response time, execution time, and throughput are usually associated with tasks and whole computational events. Latency and bandwidth are mostly used when discussing memory performance.
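A small worked example, with hypothetical numbers, shows how the two viewpoints can disagree. Machine A runs one job at a time and finishes each in 2 s; machine B runs four jobs side by side, each taking 5 s. The sketch assumes identical jobs and ignores queueing delays.

```python
def response_and_throughput(job_time_s: float, jobs_at_once: int) -> tuple[float, float]:
    """For a machine that runs `jobs_at_once` identical jobs side by side, each
    taking `job_time_s` seconds of wall-clock time, return
    (response time of one job in s, throughput in jobs per s)."""
    response_time = job_time_s
    throughput = jobs_at_once / job_time_s
    return response_time, throughput


if __name__ == "__main__":
    # Hypothetical machines: A runs one job at a time, B runs four slower jobs at once.
    print("A:", response_and_throughput(2.0, 1))  # A: (2.0, 0.5)
    print("B:", response_and_throughput(5.0, 4))  # B: (5.0, 0.8)
```

The end user prefers A (shorter response time), while the system manager prefers B (higher throughput).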
CPU Performance
• CPU time used in running a program
is driven by a constant rate clock
generator
• $\mathrm{CPU_{time}} = \mathrm{Clock\ cycles_{program}} \times T_{ck}$
• where $T_{ck}$ is the clock cycle time
CPU Performance
• The formula computes the time the CPU spends running a program, not the elapsed time: it does not make sense to compute the elapsed time as a function of Tck, mainly because the elapsed time also includes I/O time, and the response time of I/O devices is not a function of Tck.
CPU Performance
• Let the Instruction Count (IC) be the number of instructions executed from the start of the program until its very end. Then the average number of clock cycles per instruction (CPI) can be computed as
CPU Performance
• $CPI = \frac{\mathrm{Clock\ cycles_{program}}}{IC}$
• The CPU time can then be expressed as:
• $\mathrm{CPU_{time}} = IC \times CPI \times T_{ck}$
CPU Performance
• The designer’s goal is to lower the CPU time, and the following parameters can be modified to achieve that:
• IC: the instruction count, which depends on the instruction set architecture and compiler technology
• CPI, which depends on machine organization and instruction set architecture. RISC tries to reduce the CPI
CPU Performance
• Tck, which depends on hardware technology and machine organization. RISC machines have lower Tck due to simpler instructions.
• These parameters depend on each other, so changing one usually affects the others.
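The CPU performance equations can be checked with a worked example. The instruction mix, cycles per instruction class, and 2 GHz clock below are hypothetical; the code computes CPI as total clock cycles divided by IC, then CPU time as IC × CPI × Tck with Tck = 1 / clock rate.

```python
def average_cpi(cycles_per_class: dict[str, int], count_per_class: dict[str, int]) -> float:
    """CPI = clock cycles for the program / IC."""
    clock_cycles = sum(cycles_per_class[c] * n for c, n in count_per_class.items())
    instruction_count = sum(count_per_class.values())
    return clock_cycles / instruction_count


def cpu_time(instruction_count: int, cpi: float, clock_hz: float) -> float:
    """CPU time = IC * CPI * Tck, where Tck = 1 / clock rate (in seconds)."""
    t_ck = 1.0 / clock_hz
    return instruction_count * cpi * t_ck


if __name__ == "__main__":
    # Hypothetical instruction mix and clock rate.
    cycles = {"alu": 1, "load": 3, "branch": 2}
    counts = {"alu": 500_000, "load": 300_000, "branch": 200_000}
    cpi = average_cpi(cycles, counts)
    seconds = cpu_time(sum(counts.values()), cpi, 2e9)
    print(f"CPI = {cpi}, CPU time = {seconds * 1e3:.2f} ms")  # CPI = 1.8, CPU time = 0.90 ms
```

Here 1,800,000 clock cycles over 1,000,000 instructions give CPI = 1.8, and at Tck = 0.5 ns the program needs 0.9 ms of CPU time. Lowering any one factor lowers CPU time only if the others do not rise to compensate.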
Amdahl’s law
• Amdahl’s law states that “the overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used.” It is a formula that gives the theoretical speedup in latency of the execution of a task at a fixed workload that can be expected of a system whose resources are improved.
Amdahl’s law
• Amdahl’s law is often used in parallel
computing to predict the theoretical
speedup when using multiple
processors.
Amdahl’s law
• For example, suppose a program needs 20 hours to complete using a single thread, but a one-hour portion of the program cannot be parallelized, so only the remaining 19 hours of execution time (p = 0.95) can be parallelized. Then, regardless of how many threads are devoted to a parallelized execution of this program, the minimum execution time cannot be less than one hour. Hence, the theoretical speedup is limited to at most 20 times the single-thread performance ($\frac{1}{1-p} = 20$).
Amdahl’s law
• Amdahl’s law can be formulated in the following way:
• $S_{latency}(s) = \frac{1}{(1-p) + \frac{p}{s}}$
• where
• $S_{latency}$ is the theoretical speedup of the execution of the whole task
Amdahl’s law
• $s$ is the speedup of the part of the task that benefits from improved system resources
• $p$ is the proportion of execution time that the part benefiting from improved resources originally occupied
Amdahl’s law
• Suppose you somehow enhance your machine to make it run faster; the speedup is defined as:
• $\mathrm{Speedup} = \frac{T_{old}}{T_{new}}$
• where $T_{old}$ represents the execution time without the enhancement, and $T_{new}$ is the execution time with the enhancement. In terms of performance, the speedup can be defined as:
Amdahl’s law
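• $\mathrm{Speedup} = \frac{\mathrm{Performance}_{new}}{\mathrm{Performance}_{old}}$
• Amdahl’s law gives us a way to compute the speedup when an enhancement is used only some fraction of the time:
• $S_{overall} = \frac{1}{(1 - F_{enhanced}) + \frac{F_{enhanced}}{S_{enhanced}}}$
Amdahl’s law
• where $S_{overall}$ represents the overall speedup, $F_{enhanced}$ is the fraction of the time that the enhancement can be used (measured before the enhancement is applied), and $S_{enhanced}$ is the speedup of the enhanced part while it is in use.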
Amdahl’s law
• As can easily be seen, the running time after the enhancement is applied is:
• $T_{new} = (1 - F_{enhanced}) \times T_{old} + \frac{F_{enhanced} \times T_{old}}{S_{enhanced}}$
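The two forms of Amdahl’s law above can be evaluated on the 20-hour example (p = 0.95). This is a minimal sketch; the function names `amdahl_speedup` and `t_new` are hypothetical, and `math.inf` stands in for an unlimited number of threads.

```python
import math


def amdahl_speedup(p: float, s: float) -> float:
    """Speedup of the whole task when a fraction p of the original execution
    time is sped up by a factor s: 1 / ((1 - p) + p / s)."""
    return 1.0 / ((1.0 - p) + p / s)


def t_new(t_old: float, f_enhanced: float, s_enhanced: float) -> float:
    """Running time after the enhancement: the unenhanced part keeps its time,
    while the enhanced part shrinks by a factor s_enhanced."""
    return (1.0 - f_enhanced) * t_old + f_enhanced * t_old / s_enhanced


if __name__ == "__main__":
    # The 20-hour example: 19 of the 20 hours can be parallelized (p = 0.95).
    for threads in (2, 8, 64, math.inf):
        hours = t_new(20.0, 0.95, threads)
        print(f"{threads} threads: {hours:.3f} h, speedup {amdahl_speedup(0.95, threads):.2f}")
```

Dividing the old time by `t_new` gives the same value as `amdahl_speedup`, and even with unlimited threads the time never drops below the one serial hour, so the speedup approaches but never exceeds 20.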
Locality of reference
• This is a widely exploited property of programs. It describes the fact that the addresses generated by a typical program tend to be clustered in small regions of the total address space.
Locality of reference
(1) Temporal locality: this refers to the fact that recently accessed items from memory are likely to be accessed again in the near future; loops in a program are a good illustration of temporal locality.
Locality of reference
(2) Spatial locality: items that are
near to each other in memory tend
to be referenced near one another in
time; data structures and arrays are
good illustrations for spatial locality.
Locality of reference
•It is the locality of
reference that allows us to
build memory hierarchies.
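Spatial locality is what makes cache lines pay off. The sketch below focuses on that property: it walks a hypothetical 16 × 16 array stored row by row, once in storage order and once column by column, and counts misses in a small cache. The cache model (fully associative, LRU, 4 lines of 8 words) is an assumption chosen for illustration.

```python
from collections import OrderedDict

LINE_WORDS = 8   # hypothetical: words per cache line
CACHE_LINES = 4  # hypothetical: lines the cache can hold


def misses(addresses) -> int:
    """Count misses of a tiny fully associative LRU cache over an address trace."""
    cache = OrderedDict()
    count = 0
    for addr in addresses:
        line = addr // LINE_WORDS
        if line in cache:
            cache.move_to_end(line)  # hit: mark as most recently used
        else:
            count += 1
            cache[line] = None
            if len(cache) > CACHE_LINES:
                cache.popitem(last=False)  # evict the least recently used line
    return count


def row_major(n: int) -> list[int]:
    """Visit an n x n array stored row by row, in storage order."""
    return [r * n + c for r in range(n) for c in range(n)]


def column_major(n: int) -> list[int]:
    """Visit the same array column by column, jumping n words at each step."""
    return [r * n + c for c in range(n) for r in range(n)]


if __name__ == "__main__":
    print("row order:   ", misses(row_major(16)), "misses")     # 32
    print("column order:", misses(column_major(16)), "misses")  # 256
```

In row order each line is fetched once and its remaining words are hits; in column order consecutive accesses land in different lines, and every access misses.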
QUESTIONS?
Image Source: http://iamforkids.org/wp-content/uploads/2013/11/j04278101.jpg - Retrieved Online on January 11, 2016
END OF MODULE