Multiprocessor Architecture Overview
OF
CSE-211
Topic: Multiprocessor Architecture System.
Submitted by:
MANMEET SINGH
Roll.no- RE2801B40
Reg.no- 10801620
Course- B Tech.(IT)-M Tech.
Submitted to:
Lect. Ruchika Dhall
Multiprocessor Architecture System
Multiprocessing:
Multiprocessing is the use of two or more central processing units (CPUs) within a single
computer system. The term also refers to the ability of a system to support more than one
processor and/or the ability to allocate tasks between them. There are many variations on this
basic theme, and the definition of multiprocessing can vary with context, mostly as a function of
how CPUs are defined (multiple cores on one die, multiple chips in one package, multiple
packages in one system unit, etc.).
Types:
Processor symmetry:
In a multiprocessing system, all CPUs may be equal, or some may be reserved for special
purposes. A combination of hardware and operating-system software design considerations
determines the symmetry (or lack thereof) in a given system. For example, hardware or software
considerations may require that only one CPU respond to all hardware interrupts, whereas all
other work in the system may be distributed equally among CPUs; or execution of kernel-mode
code may be restricted to only one processor (either a specific processor, or only one processor at
a time), whereas user-mode code may be executed in any combination of processors.
Multiprocessing systems are often easier to design if such restrictions are imposed, but they tend
to be less efficient than systems in which all CPUs are utilized.
Systems that treat all CPUs equally are called symmetric multiprocessing (SMP) systems. In
systems where all CPUs are not equal, system resources may be divided in a number of ways,
including asymmetric multiprocessing (ASMP), non-uniform memory access (NUMA)
multiprocessing, and clustered multiprocessing.
Processor coupling:
Tightly-coupled multiprocessor systems contain multiple CPUs that are connected at the bus
level. These CPUs may have access to a central shared memory (SMP or UMA), or may
participate in a memory hierarchy with both local and shared memory (NUMA). The IBM p690
Regatta is an example of a high-end SMP system. Intel Xeon processors dominated the
multiprocessor market for business PCs until the release of AMD's Opteron range of processors
in 2003. Both ranges of processors had their own onboard cache but
provided access to shared memory; the Xeon processors via a common pipe and the Opteron
processors via independent pathways to the system RAM.
Chip multiprocessors, also known as multi-core processors, involve more than one processor
placed on a single chip and can be thought of as the most extreme form of tightly-coupled
multiprocessing. Mainframe systems with multiple processors are often tightly-coupled.
Tightly-coupled systems perform better and are physically smaller than loosely-coupled systems
(clusters of standalone computers linked by a network, such as a Beowulf cluster), but have
historically required greater initial investments and may depreciate rapidly; nodes in a
loosely-coupled system are usually inexpensive commodity computers and can be recycled as
independent machines upon retirement from the cluster.
Flynn's taxonomy:
                   Single Instruction    Multiple Instruction
Single Data        SISD                  MISD
Multiple Data      SIMD                  MIMD
SISD multiprocessing:
In a single instruction stream, single data stream (SISD) computer, one processor sequentially
processes instructions, and each instruction processes one data item.
SIMD multiprocessing:
In a single instruction stream, multiple data stream (SIMD) computer, one processor handles a
stream of instructions, each of which can perform calculations in parallel on multiple data
locations.
SIMD multiprocessing is well suited to parallel or vector processing, in which a very large set of
data can be divided into parts that are individually subjected to identical but independent
operations. A single instruction stream directs the operation of multiple processing units to
perform the same manipulations simultaneously on potentially large amounts of data.
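To make the contrast with SISD concrete, the Python sketch below simulates adding two arrays both ways. It is a model for illustration, not real vector hardware: the four-lane width and the names `sisd_add` and `simd_add` are assumptions, and the lanes that hardware would run in lockstep are represented by an ordinary loop over one chunk.

```python
from typing import List, Tuple


def sisd_add(a: List[int], b: List[int]) -> Tuple[List[int], int]:
    # SISD: one add instruction per data item, executed in sequence.
    result, issued = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        issued += 1
    return result, issued


def simd_add(a: List[int], b: List[int], lanes: int = 4) -> Tuple[List[int], int]:
    # SIMD (simulated): each issued instruction adds `lanes` element pairs at once.
    # The inner generator stands in for lanes that hardware would run in lockstep.
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length")
    result, issued = [], 0
    for start in range(0, len(a), lanes):
        chunk_a, chunk_b = a[start:start + lanes], b[start:start + lanes]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        issued += 1  # one instruction covers the whole chunk
    return result, issued


a, b = list(range(8)), [10] * 8
print(sisd_add(a, b))  # ([10, 11, 12, 13, 14, 15, 16, 17], 8)
print(simd_add(a, b))  # ([10, 11, 12, 13, 14, 15, 16, 17], 2)
```

With 8 elements and 4 lanes, the SIMD version issues 2 instructions where the SISD loop issues 8; the results are identical, and the saving comes entirely from applying one instruction to many data items.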
For certain types of computing applications, this type of architecture can produce enormous
increases in performance, in terms of the elapsed time required to complete a given task.
However, a drawback to this architecture is that a large part of the system falls idle when
programs or system tasks are executed that cannot be divided into units that can be processed in
parallel.
Additionally, programs must be carefully and specially written to take maximum advantage of
the architecture, and often special optimizing compilers designed to produce code specifically for
this environment must be used. Some compilers in this category provide special constructs or
extensions to allow programmers to directly specify operations to be performed in parallel (e.g.,
DO FOR ALL statements in the version of FORTRAN used on the ILLIAC IV, which was a
SIMD multiprocessing supercomputer).
SIMD multiprocessing finds wide use in certain domains such as computer simulation, but is of
little use in general-purpose desktop and business computing environments.
MISD multiprocessing:
MISD multiprocessing offers mainly the advantage of redundancy, since multiple processing
units perform the same tasks on the same data, reducing the chances of incorrect results if one of
the units fails. MISD architectures may involve comparisons between processing units to detect
failures. Apart from the redundant and fail-safe character of this type of multiprocessing, it has
few advantages, and it is very expensive. It does not improve performance. It can be
implemented in a way that is transparent to software. It is used in array processors and is
implemented in fault tolerant machines.
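One common way to realize this kind of redundancy is triple modular redundancy: three units compute the same result on the same data and a voter takes the majority. The Python sketch below only simulates that idea; `healthy` and `faulty` are hypothetical stand-ins for processing units, and a real fault-tolerant machine would perform the comparison in hardware.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def redundant_run(units: Sequence[Callable[[int], int]], data: int) -> Tuple[int, List[int]]:
    """Run the same task on the same data in every unit and majority-vote the result.

    Returns the agreed value and the indices of units whose output disagreed,
    i.e. the comparison between processing units used to detect failures.
    """
    results = [unit(data) for unit in units]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(units) // 2:
        raise RuntimeError(f"no majority among unit outputs: {results}")
    suspects = [i for i, r in enumerate(results) if r != value]
    return value, suspects


def healthy(x: int) -> int:
    return x * x


def faulty(x: int) -> int:
    return x * x + 1  # simulated hardware fault: result is off by one


print(redundant_run([healthy, faulty, healthy], 7))  # (49, [1])
```

Note that the three units do three times the work to produce one answer, which is why this style of multiprocessing adds reliability but not performance.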
MIMD multiprocessing:
MIMD multiprocessing architecture is suitable for a wide variety of tasks in which completely
independent and parallel execution of instructions touching different sets of data can be put to
productive use. For this reason, and because it is easy to implement, MIMD predominates in
multiprocessing.
Processing is divided into multiple threads, each with its own hardware processor state, within a
single software-defined process or within multiple processes. Insofar as a system has multiple
threads awaiting dispatch (either system or user threads), this architecture makes good use of
hardware resources.
MIMD does raise issues of deadlock and resource contention, however, since threads may
collide in their access to resources in an unpredictable way that is difficult to manage efficiently.
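A classic way such collisions turn into deadlock is two threads taking the same pair of locks in opposite orders. A common remedy, which the text above does not describe, is to acquire locks in one global order. The Python sketch below uses `id()` as that order purely for illustration; the `transfer` function and the lock names are hypothetical.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
completed = []


def acquire_in_order(*locks):
    # Every thread sorts the locks the same way before taking them, so no thread
    # can hold one lock while waiting for a lock that another thread already holds.
    ordered = sorted(locks, key=id)
    for lk in ordered:
        lk.acquire()
    return ordered


def release_all(ordered):
    for lk in reversed(ordered):
        lk.release()


def transfer(first, second, label, rounds=1000):
    for _ in range(rounds):
        held = acquire_in_order(first, second)
        try:
            completed.append(label)  # critical section touching both resources
        finally:
            release_all(held)


# The two threads name the locks in opposite orders. Naive nested acquisition in
# that order could deadlock; ordered acquisition cannot.
threads = [
    threading.Thread(target=transfer, args=(lock_a, lock_b, "A->B")),
    threading.Thread(target=transfer, args=(lock_b, lock_a, "B->A")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(completed))  # 2000
```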
MIMD requires special coding in the operating system of a computer but does not require
application changes unless the programs themselves use multiple threads (MIMD is transparent
to single-threaded programs under most operating systems, if the programs do not voluntarily
relinquish control to the OS). Both system and user software may need to use software constructs
such as semaphores (also called locks or gates) to prevent one thread from interfering with
another if they should happen to cross paths in referencing the same data.
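The locking described above can be sketched with Python's standard `threading` module. Here a binary semaphore (`threading.Semaphore(1)`) acts as the gate around a shared counter; the class and function names are hypothetical. In CPython the global interpreter lock limits true parallelism, but it does not make `+=` on a shared attribute atomic, so the gate is still what guarantees the final count.

```python
import threading


class SharedCounter:
    """A counter shared by several threads and guarded by a binary semaphore."""

    def __init__(self) -> None:
        self.value = 0
        self._gate = threading.Semaphore(1)  # at most one thread inside at a time

    def increment(self) -> None:
        with self._gate:
            # Read-modify-write is not atomic, so it must run under the gate.
            self.value += 1


def run_workers(n_threads: int = 4, per_thread: int = 10_000) -> int:
    counter = SharedCounter()

    def work() -> None:
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run_workers())  # 40000
```

Every increment pays for acquiring and releasing the gate, which is the performance cost of locking mentioned in the next paragraph.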
This gating or locking process increases code complexity, lowers performance, and greatly
increases the amount of testing required, although not usually enough to negate the advantages of
multiprocessing.
Similar conflicts can arise at the hardware level between processors (cache contention and
corruption, for example), and must usually be resolved in hardware, or with a combination of
software and hardware (e.g., cache-clear instructions).
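Hardware usually resolves these conflicts with a cache-coherence protocol. The sketch below is a toy Python model of MSI (Modified, Shared, Invalid), one widely taught write-invalidate protocol; it is not claimed to be what any particular machine in this paper uses. It tracks a single memory word shared by a few caches on a hypothetical snooping bus.

```python
from enum import Enum
from typing import List, Optional


class State(Enum):
    MODIFIED = "M"  # this cache holds the only copy, possibly newer than memory
    SHARED = "S"    # clean copy; other caches may hold it too
    INVALID = "I"   # no usable copy


class Cache:
    def __init__(self, bus: "Bus") -> None:
        self.bus = bus
        self.state = State.INVALID
        self.value: Optional[int] = None

    def read(self) -> int:
        if self.state is State.INVALID:
            # Read miss: a cache holding the line Modified must write it back first.
            for other in self.bus.others(self):
                if other.state is State.MODIFIED:
                    self.bus.memory = other.value
                    other.state = State.SHARED
            self.value = self.bus.memory
            self.state = State.SHARED
        return self.value

    def write(self, value: int) -> None:
        # Write: invalidate every other copy so no processor keeps a stale value.
        for other in self.bus.others(self):
            if other.state is State.MODIFIED:
                self.bus.memory = other.value  # write back before invalidating
            other.state = State.INVALID
            other.value = None
        self.value = value
        self.state = State.MODIFIED


class Bus:
    """A snooping bus shared by all caches, tracking a single memory word."""

    def __init__(self, n_caches: int) -> None:
        self.memory = 0
        self.caches: List[Cache] = [Cache(self) for _ in range(n_caches)]

    def others(self, me: Cache) -> List[Cache]:
        return [c for c in self.caches if c is not me]


bus = Bus(2)
c0, c1 = bus.caches
c0.read()
c1.read()                     # both caches now hold the line in S
c0.write(5)                   # c1's copy is invalidated
print(c1.state)               # State.INVALID
print(c1.read(), bus.memory)  # 5 5 (c0 wrote the line back on c1's miss)
```

The extra bus traffic for invalidations and write-backs is one reason SMP performance lags slightly behind the processor count.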
Symmetric Multiprocessing:
SMP systems allow any processor to work on any task no matter where the data for that task are
located in memory; with proper operating system support, SMP systems can easily move tasks
between processors to balance the workload efficiently.
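A minimal sketch of this property, assuming Python threads as stand-ins for processors: all tasks sit in one shared queue and any idle worker takes the next one, so no task is tied to a particular processor. The function name `run_smp` and the squaring "work" are hypothetical. CPython's global interpreter lock means these threads illustrate the scheduling pattern rather than delivering real parallel speedup.

```python
import queue
import threading
from typing import Dict, Iterable, Tuple


def run_smp(tasks: Iterable[int], n_workers: int = 4) -> Dict[int, Tuple[int, int]]:
    """Run tasks from one shared queue; any worker ("processor") may take any task.

    Returns {task: (worker_id, result)}. Which worker runs which task is decided
    at run time, which is the load balancing that SMP makes possible.
    """
    work: "queue.Queue[int]" = queue.Queue()
    for t in tasks:
        work.put(t)
    done: Dict[int, Tuple[int, int]] = {}
    done_lock = threading.Lock()

    def worker(worker_id: int) -> None:
        while True:
            try:
                task = work.get_nowait()
            except queue.Empty:
                return  # no work left: this worker goes idle
            result = task * task  # stand-in for real work
            with done_lock:
                done[task] = (worker_id, result)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return done


results = run_smp(range(20))
print(sorted(results)[:3], results[3][1])  # [0, 1, 2] 9
```

Which worker handled each task varies from run to run; only the set of results is fixed.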
Figure: Diagram of a typical SMP system. Three processors are connected to the same memory module through a bus or crossbar switch.
Alternatives:
SMP represents one of the earliest styles of multiprocessor machine architectures, typically used
for building smaller computers with up to 8 processors.
Larger computer systems might use newer architectures such as NUMA (Non-Uniform Memory
Access), which dedicates different memory banks to different processors. In a NUMA
architecture, processors may access local memory quickly and remote memory more slowly.
This can dramatically improve memory throughput as long as the data is localized to specific
processes (and thus processors). On the downside, NUMA makes the cost of moving data from
one processor to another, as in workload balancing, more expensive. The benefits of NUMA are
limited to particular workloads, notably on servers where the data is often associated strongly
with certain tasks or users.
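A back-of-the-envelope model shows why locality matters on NUMA machines. The sketch below averages local and remote access latencies; the default 100 ns and 200 ns figures are placeholders chosen for illustration, not measurements of any real system.

```python
def numa_average_latency(local_accesses: int, remote_accesses: int,
                         local_ns: float = 100.0, remote_ns: float = 200.0) -> float:
    """Average memory latency when some accesses hit local and some hit remote memory.

    The default latencies are hypothetical placeholders.
    """
    total = local_accesses + remote_accesses
    if total == 0:
        raise ValueError("need at least one access")
    return (local_accesses * local_ns + remote_accesses * remote_ns) / total


# Data kept near its processor (90% local) versus data scattered across nodes (50% local).
print(numa_average_latency(900, 100))  # 110.0
print(numa_average_latency(500, 500))  # 150.0
```

Under these assumed latencies, raising the local share from 50% to 90% cuts the average from 150 ns to 110 ns, which is why NUMA pays off when data stays with the processor that uses it and costs more when workload balancing moves tasks away from their data.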
Other systems include asymmetric multiprocessing (ASMP), which uses separate specialized
processors for specific tasks (which increases complexity), and clustered multiprocessing
(such as Beowulf), in which not all memory is available to all processors.
Examples of ASMP include many media processor chips that consist of a relatively slow base
processor assisted by a number of hardware accelerator cores. High-powered 3D chipsets in modern
video cards could be considered a form of asymmetric multiprocessing. Clustering techniques are
used fairly extensively to build very large supercomputers. In this discussion a single processor
is denoted as a uniprocessor (UP).
SMP has many uses in science, industry, and business which often use custom-programmed
software for multithreaded (multitasked) processing. However, most consumer products such as
word processors and computer games are written in such a manner that they cannot gain large
benefits from concurrent systems. For games this is usually because writing a program to
increase performance on SMP systems can produce a performance loss on uniprocessor systems.
Recently, however, multi-core chips are becoming more common in new computers, and the
balance between installed uni- and multi-core computers may change in the coming years.
Because the programming methods differ, supporting both uni-processor and SMP systems with
maximum performance would generally require two separate code trees. Programs
running on SMP systems may experience a performance increase even when they have been
written for uni-processor systems. This is because hardware interrupts that usually suspend
program execution while the kernel handles them can execute on an idle processor instead. The
effect in most applications (e.g. games) is not so much a performance increase as the appearance
that the program is running much more smoothly. In some applications, particularly compilers
and some distributed computing projects, one will see an improvement by a factor of (nearly) the
number of additional processors.
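Amdahl's law, a standard model that the text does not mention, makes the word "nearly" precise: if a fraction p of a program's work can run in parallel on N processors, the speedup is 1 / ((1 - p) + p / N). The near-linear gains described above correspond to p close to 1. The function name below is illustrative.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / N).

    p is the fraction of the work that can run in parallel; N is the processor count.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("need at least one processor")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


for p in (1.0, 0.95, 0.5):
    print(p, round(amdahl_speedup(p, 8), 2))
```

For 8 processors, p = 1 gives a speedup of 8, while p = 0.95 already drops it to about 5.9; the serial part of a program, like serialized memory access, caps the benefit of adding processors.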
In situations where more than one program executes at the same time, an SMP system will have
considerably better performance than a uni-processor because different programs can run on
different CPUs simultaneously.
Systems programmers must build support for SMP into the operating system: otherwise, the
additional processors remain idle and the system functions as a uni-processor system.
In cases where an SMP environment processes many jobs, administrators often experience a loss
of hardware efficiency. Software programs have been developed to schedule jobs so that the
processor utilization reaches its maximum potential. Good software packages can achieve this
maximum potential by scheduling each CPU separately, as well as being able to integrate
multiple SMP machines and clusters.
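Such schedulers are often built around greedy heuristics. As one illustration (the text does not say which algorithm real packages use), the sketch below implements Longest Processing Time first (LPT): jobs are sorted by run time and each is handed to the currently least-loaded CPU. The job durations are made up.

```python
import heapq
from typing import Dict, List, Tuple


def schedule_lpt(job_times: List[int], n_cpus: int) -> Tuple[Dict[int, List[int]], int]:
    """Longest Processing Time first: give each job, longest first, to the least-loaded CPU.

    Returns ({cpu: [job indices]}, makespan), where makespan is the finishing
    time of the busiest CPU.
    """
    if n_cpus < 1:
        raise ValueError("need at least one CPU")
    heap = [(0, cpu) for cpu in range(n_cpus)]  # (current load, cpu id)
    heapq.heapify(heap)
    assignment: Dict[int, List[int]] = {cpu: [] for cpu in range(n_cpus)}
    for job, duration in sorted(enumerate(job_times), key=lambda jd: jd[1], reverse=True):
        load, cpu = heapq.heappop(heap)  # least-loaded CPU right now
        assignment[cpu].append(job)
        heapq.heappush(heap, (load + duration, cpu))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


plan, makespan = schedule_lpt([7, 5, 4, 3, 3, 2], n_cpus=2)
print(plan, makespan)  # {0: [0, 3, 5], 1: [1, 2, 4]} 12
```

Here 24 units of work finish in 12 time units on two CPUs, so neither processor sits idle while the other is still busy.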
Access to RAM is serialized; this and cache coherency issues cause performance to lag slightly
behind the number of additional processors in the system.
Entry-level systems:
Before about 2006, entry-level servers and workstations with two processors dominated the SMP
market. With the introduction of dual-core devices, SMP is found in most new desktop machines
and in many laptop machines. The most popular entry-level SMP systems use the x86 instruction
set architecture and are based on Intel’s Xeon, Pentium D, Core Duo, and Core 2 Duo
processors or AMD’s Athlon 64 X2, Quad FX, or Opteron 200 and 2000 series processors.
Servers use those processors and other readily available non-x86 processor choices including the
Sun Microsystems UltraSPARC, Fujitsu SPARC64 III and later, SGI MIPS, Intel Itanium,
Hewlett-Packard PA-RISC, DEC Alpha (Digital Equipment Corporation was acquired by Compaq, which
later merged with Hewlett-Packard), IBM POWER and Apple Computer PowerPC (specifically
G4 and G5 series, as well as earlier PowerPC 604 and 604e series) processors. In all cases, these
systems are available in uniprocessor versions as well.
Earlier SMP systems used motherboards that had two or more CPU sockets. More recently,
microprocessor manufacturers introduced CPU devices with two or more processors in one
device, for example, the POWER, UltraSPARC, Opteron, Athlon, Core 2, and Xeon all have
multi-core variants. Athlon and Core 2 Duo multiprocessors are socket-compatible with
uniprocessor variants, so an expensive dual socket motherboard is no longer needed to
implement an entry-level SMP machine. It should also be noted that dual socket Opteron designs
are technically ccNUMA designs, though they can be programmed as SMP for a slight loss in
performance.
Mid-level systems:
The Burroughs B5500 first implemented SMP in 1961. It was implemented later on other
mainframes. Mid-level servers, using between four and eight processors, can be found using the
Intel Xeon MP, AMD Opteron 800 and 8000 series and the above-mentioned UltraSPARC,
SPARC64, MIPS, Itanium, PA-RISC, Alpha and POWER processors. High-end systems, with
sixteen or more processors, are also available with all of the above processors.
Sequent Computer Systems built large SMP machines using Intel 80386 (and later 80486)
processors. Some smaller 80486 systems existed, but the major x86 SMP market began with the
Intel Pentium technology supporting up to two processors. The Intel Pentium Pro expanded SMP
support with up to four processors natively. Later, the Intel Pentium II and Intel Pentium III
processors allowed dual CPU systems, except for the respective Celerons. This was followed by
the Intel Pentium II Xeon and Intel Pentium III Xeon processors which could be used with up to
four processors in a system natively.
In 2001 AMD released its Athlon MP, or MultiProcessor CPU, together with the 760MP
motherboard chipset as its first offering in the dual processor marketplace. Although several
much larger systems were built, they were all limited by a physical memory addressing
limit of 64 GiB. With the introduction of 64-bit memory addressing on the AMD64 Opteron
in 2003 and Intel 64 (EM64T) Xeon in 2005, systems are able to address much larger amounts of
memory; their theoretical addressable limit of 16 EiB is not expected to be reached in the
foreseeable future.
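Both limits follow directly from the number of address bits. The arithmetic below assumes the 64 GiB ceiling comes from the 36-bit physical addressing of x86 Physical Address Extension (PAE); the 16 EiB figure is the theoretical 64-bit limit, and actual processors implement fewer physical address bits than that. The helper name is illustrative.

```python
def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with the given number of address bits."""
    return 2 ** address_bits


GiB = 2 ** 30
EiB = 2 ** 60

print(addressable_bytes(36) // GiB, "GiB")  # 64 GiB: 36-bit physical addresses (x86 PAE)
print(addressable_bytes(64) // EiB, "EiB")  # 16 EiB: the full 64-bit range
```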
Asymmetric multiprocessing:
Whereas a symmetric multiprocessor or SMP treats all of the processing elements in the system
identically, an ASMP system assigns certain tasks only to certain processors. In particular, only
one processor may be responsible for fielding all of the interrupts in the system or perhaps even
performing all of the I/O in the system. This makes the design of the I/O system much simpler,
although it tends to limit the ultimate performance of the system. Graphics cards, physics cards
and cryptographic accelerators which are subordinate to a CPU in modern computers can be
considered a form of asymmetric multiprocessing. SMP is extremely common in the modern
computing world; when people refer to "multi core" or "multiprocessing", they are most
commonly referring to SMP.
Introduction:
Although hardware-level ASMP may not be in use, the idea and logical process is still
commonly used in applications that are multiprocessor intensive. Unlike SMP applications,
which run their threads on multiple processors, ASMP applications will run on one processor but
outsource smaller tasks to another. Although the system may physically be an SMP, the software
is still able to use it as an ASMP by simply giving certain tasks to one processor and deeming it
the "master", and only outsourcing smaller tasks to "slave" processors.
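The master/slave pattern can be sketched on ordinary symmetric hardware with two Python threads. In this hypothetical example the "master" runs the main loop and hands one smaller job per frame to a single dedicated "slave" thread through a queue; the function name and the multiply-by-ten "job" are placeholders for, say, a physics step.

```python
import queue
import threading
from typing import Dict, List, Optional, Tuple


def master_with_slave(frames: List[int]) -> Tuple[List[str], Dict[int, int]]:
    """Software ASMP on symmetric hardware: a "master" thread runs the main loop
    and offloads one smaller job per frame to a single dedicated "slave" thread."""
    requests: "queue.Queue[Optional[int]]" = queue.Queue()
    replies: "queue.Queue[Tuple[int, int]]" = queue.Queue()

    def slave() -> None:
        while True:
            frame = requests.get()
            if frame is None:  # sentinel: the master has finished
                return
            replies.put((frame, frame * 10))  # stand-in for the offloaded task

    helper = threading.Thread(target=slave)
    helper.start()

    log = []
    for frame in frames:
        requests.put(frame)  # hand the small task to the slave
        log.append(f"master handled frame {frame}")  # master keeps doing its own work
    requests.put(None)
    helper.join()

    offloaded = dict(replies.get() for _ in frames)
    return log, offloaded


log, offloaded = master_with_slave([1, 2, 3])
print(offloaded)  # {1: 10, 2: 20, 3: 30}
```

Either thread could run on either processor; the asymmetry exists only in how the software divides the work.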
Asymmetric hardware systems commonly dedicated individual processors to specific tasks. For
example, one processor may be dedicated to disk operations, another to video operations, and the
rest to standard processor tasks. These systems don't have the flexibility to assign processes to
the least-loaded CPU, unlike an SMP system.
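The cost of that inflexibility shows up in a small load comparison. In the sketch below the processor roles (disk, video, general) follow the example above, while the task list and costs are hypothetical; ASMP sends each kind of task to its dedicated processor, and SMP sends each task, in arrival order, to whichever processor is least loaded.

```python
from typing import Dict, List, Tuple

Task = Tuple[str, int]  # (kind of work, cost in arbitrary time units)


def asmp_loads(tasks: List[Task], dedicated: Dict[str, int], n_cpus: int) -> List[int]:
    """ASMP: each kind of task may only run on its dedicated processor."""
    loads = [0] * n_cpus
    for kind, cost in tasks:
        loads[dedicated[kind]] += cost
    return loads


def smp_loads(tasks: List[Task], n_cpus: int) -> List[int]:
    """SMP: each task, in arrival order, goes to whichever processor is least loaded."""
    loads = [0] * n_cpus
    for _kind, cost in tasks:
        cpu = min(range(n_cpus), key=loads.__getitem__)
        loads[cpu] += cost
    return loads


tasks = [("video", 5), ("video", 5), ("video", 5), ("disk", 1), ("general", 2)]
dedicated = {"disk": 0, "video": 1, "general": 2}  # hypothetical CPU roles
print(asmp_loads(tasks, dedicated, 3))  # [1, 15, 2]
print(smp_loads(tasks, 3))              # [6, 7, 5]
```

With a video-heavy workload, the dedicated video processor ends up with 15 units while the others sit mostly idle; the SMP policy finishes the same work with a maximum load of 7.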
History:
Asymmetric multiprocessors date back to 1970, when they were pioneered by MIT and DEC. Their
original design and product was called the PDP-6/KA10. In 1972 DEC rewrote the TOPS-10
monitor software that ran on the PDP-10, a change that allowed the computer to use asymmetric
multiprocessing. In 1981 DEC continued its research into ASMP and produced asymmetric
multiprocessor models of the VAX-11: the VAX-11/782, which had two processors, and the
VAX-11/784, which had four processors.
After 1981, asymmetric processing research and design faded and later disappeared.
Symmetric processing emerged during the same period and saw wider adoption, including use by
larger companies such as Intel. As a result, ASMP seemed to disappear into history while SMP
began to flourish. This was largely because ASMP was very complex and convoluted in its design,
and most of the technology was optimized for very specific applications (e.g., video editing
applications that could outsource rendering to a separate processor). SMP is simply a collection
of identical processors, each capable of processing any information it is given. Thus writing
software and operating systems that are multiprocessing capable was much more realistic for SMP
architecture.
Currently there are no consumer-level production computers that use asymmetric multiprocessor
designs. There are, however, computers that are able to distribute tasks asymmetrically. In
principle, a symmetric multiprocessor can be used for asymmetric computation: a programmer can
choose to use one processor as the main processor and offload only certain tasks to the other
processor. Although each physical or logical processor is able to complete any given task,
priority is given to one as the "master" processor, and the other is given the position of "slave".
The hardware architecture was abandoned in the early 1980s and lost out to symmetric
multiprocessors, which were much easier to work with and simpler to build. It is common to see
some applications using asymmetric traits within a symmetric processing system. One example is
a video game that runs on one "master" processor and offloads physics calculations onto a
"slave" processor.
Even though both processors are non-unique and equal, software can choose to use the
processors in a master/slave fashion.
ATI pioneered a technology that allows its video cards to be used asymmetrically (e.g., using
one for rendering and another for physics), but this is once again a form of software
ASMP. The hardware is identical and thus symmetric, but is being used asymmetrically through
software intervention.
The Sony PS3 is an example of an asymmetric multiprocessor design. Its Cell processor pairs one
general-purpose PowerPC core with several specialized Synergistic Processing Elements that handle
only certain tasks, though the PS3 is a game console rather than a general-purpose computer.
Below are examples of what a cluster of asymmetric multiprocessors would look like. Observe
the highly specialized nature of these designs and how only one processor has access to the I/O
part of the system. As stated before, these systems work best at, and were originally designed
for, very specific tasks. One processor may simply do physics calculations while another is
dedicated to rendering 2D video. Above those two processors sits a master processor that assigns
tasks. Notice also that the main memory is not accessible by all of the processors. The master
processor usually relays information to the slave processors on a "need to know" basis.
Figure: An ASMP system where only one processor has direct access to I/O.
Figure: Multiple processors with unique access to memory and I/O.