Module II
• Processors and memory hierarchy: Advanced processor technology, Design space of processors, Instruction set architectures, CISC scalar processors, RISC scalar processors, Superscalar and vector processors, Memory hierarchy technology. Assignment (24/09/18).
INSTRUCTION SET ARCHITECTURE
• Instruction set: the primitive commands or machine instructions that a programmer can use in programming the machine.
• The complexity of an instruction set is attributed to its:
– Instruction formats
– Data formats
– Addressing modes
– General purpose registers
– Opcode specification
– Flow control mechanisms used
• Two types of architecture: CISC and RISC.
No. | CISC | RISC
1 | 120 to 350 instructions using variable instruction/data formats. | Fewer than 100 instructions with a fixed instruction format (32 bits); 3 to 5 simple addressing modes are used.
2 | Uses a small set of 8 to 24 general-purpose registers. | Most instructions are register-based. A large register file is used to support fast context switching among multiple users.
3 | Large number of memory-reference operations based on a large set of addressing modes. | Memory access is done only by load/store instructions.
4 | HLL statements are directly implemented in hardware/firmware. | Most instructions execute in one cycle with hardwired control; the entire processor is implementable on a single VLSI chip.
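To make the fixed versus variable format row concrete, here is a minimal Python sketch of how a decoder finds instruction boundaries. The encodings are hypothetical: `TOY_LENGTHS` is an invented opcode-to-length table, not any real CISC, and the RISC side assumes only the fixed 32-bit format stated in the table.

```python
from typing import Callable

def risc_boundaries(code: bytes) -> list[int]:
    """Fixed 32-bit format: instruction i always starts at byte offset 4*i."""
    if len(code) % 4:
        raise ValueError("code size must be a multiple of 4 bytes")
    return list(range(0, len(code), 4))

def cisc_boundaries(code: bytes, length_of: Callable[[int], int]) -> list[int]:
    """Variable format: each instruction's length is known only after its
    opcode has been examined, so boundaries must be found one by one."""
    starts, pc = [], 0
    while pc < len(code):
        starts.append(pc)
        size = length_of(code[pc])      # look up the length from the opcode byte
        if size <= 0:
            raise ValueError(f"bad instruction length at offset {pc}")
        pc += size
    if pc != len(code):
        raise ValueError("last instruction is truncated")
    return starts

# Hypothetical opcode -> instruction length (bytes) table for a toy CISC.
TOY_LENGTHS = {0x01: 1, 0x02: 2, 0x03: 5}
```

For example, `risc_boundaries(bytes(12))` returns `[0, 4, 8]`, while `cisc_boundaries(bytes([0x01, 0x03, 0, 0, 0, 0, 0x02, 0]), TOY_LENGTHS.__getitem__)` returns `[0, 1, 6]` only after walking the stream sequentially. With the fixed format every boundary is known in advance, which is one reason RISC decoding can be simple and hardwired.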
Example: SPARC implementations, Cypress CY7C601
• The SPARC processor architecture essentially contains a RISC integer unit (IU) implemented with 2 to 32 register windows.
• The Sun SPARC instruction set contains 69 basic instructions, a significant increase from the 39 instructions in the original Berkeley RISC II instruction set.
• The SPARC runs each procedure with a set of thirty-
two 32-bit IU registers.
• Eight of these registers are global registers shared by all procedures, and the remaining 24 are window registers associated with each procedure only.
• The concept of using overlapped register windows
is the most important feature introduced by the
Berkeley RISC architecture.
• A total of 136 registers are implemented in the
Cypress 601.
• Each register window is divided into three eight-register sections, labeled Ins, Locals, and Outs.
• The local registers are addressable only by their own procedure. The Ins and Outs are shared among procedures.
• The calling procedure passes parameters to the called procedure via its Outs (r8 to r15) registers, which are the Ins registers of the called procedure.
• The window of the currently running procedure is called the active window.
• A window invalid mask is used to indicate which window is
invalid.
• The overlapping windows can significantly reduce the time required for interprocedure communication, resulting in much faster context switching among cooperative procedures.
• The FPU features 32 single-precision (32-bit) or 16 double-precision (64-bit) floating-point registers.
• Fourteen of the 69 SPARC instructions are for
floating-point operations.
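The overlapped register windows described above can be modelled in a few lines. This is a simplified Python sketch, not the Cypress hardware: the physical numbering (globals first, then a circular windowed file) and the names `physical_index` and `save` are hypothetical. It assumes 8 windows, which is what the 136-register total implies if each window contributes 16 new registers (8 Locals + 8 Outs, its Ins being the next window's Outs) on top of the 8 globals. It follows the SPARC convention that a procedure call (SAVE) moves to window CWP - 1 and checks the window invalid mask.

```python
NWINDOWS = 8        # assumption: 136 = 8 globals + 8 windows x 16 registers
NGLOBALS = 8
STRIDE = 16         # each window adds 8 locals + 8 outs; its 8 ins are the next window's outs
TOTAL_REGISTERS = NGLOBALS + NWINDOWS * STRIDE

def physical_index(cwp: int, r: int) -> int:
    """Map logical register r0..r31 of window `cwp` to a physical register number
    (hypothetical numbering: globals first, then a circular windowed file)."""
    if not 0 <= r < 32:
        raise ValueError("each procedure sees only r0..r31")
    if r < 8:                                   # globals r0..r7, shared by all procedures
        return r
    if r < 16:                                  # outs r8..r15
        offset = cwp * STRIDE + (r - 8)
    elif r < 24:                                # locals r16..r23, private to the window
        offset = cwp * STRIDE + 8 + (r - 16)
    else:                                       # ins r24..r31 = outs of window cwp + 1
        offset = (cwp + 1) * STRIDE + (r - 24)
    return NGLOBALS + offset % (NWINDOWS * STRIDE)

def save(cwp: int, wim: int) -> int:
    """Procedure call: move to window (cwp - 1) mod NWINDOWS. If the window
    invalid mask marks that window invalid, real hardware takes a
    window-overflow trap; here that is modelled as an exception."""
    new_cwp = (cwp - 1) % NWINDOWS
    if (wim >> new_cwp) & 1:
        raise OverflowError(f"window overflow: window {new_cwp} is invalid")
    return new_cwp
```

With a caller in window 3, `save(3, 0)` returns 2, and `physical_index(3, 8) == physical_index(2, 24)`: the caller's first Out register is physically the callee's first In register, so parameters pass without any copying. The Locals (`r16`) of the two windows map to different physical registers.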
Superscalar processor
• Designed to exploit more instruction-level parallelism in user programs.
• Only independent instructions can be executed in parallel without causing a wait state.
• The amount of instruction-level parallelism depends on the type of code being executed.
• The instruction issue degree of a superscalar processor has been limited to 2 to 5.
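The point that only independent instructions can issue together can be illustrated with a toy issue model. This Python sketch rests on assumptions that are not in the notes: in-order issue, a one-cycle latency for every instruction, and enough functional units of every type, so the only things that stop co-issue are the issue degree and a true (RAW) or output (WAW) dependence on an instruction already issued in the same cycle. `Instr` and `in_order_issue` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    dest: str                 # register written
    srcs: tuple[str, ...]     # registers read

def in_order_issue(program: list[Instr], degree: int) -> list[list[Instr]]:
    """Group instructions into issue cycles for a superscalar of the given degree."""
    cycles: list[list[Instr]] = []
    current: list[Instr] = []
    written: set[str] = set()          # registers written earlier in this cycle
    for ins in program:
        depends = any(s in written for s in ins.srcs) or ins.dest in written
        if len(current) == degree or depends:
            cycles.append(current)     # close the cycle; this instruction waits
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        cycles.append(current)
    return cycles
```

Three independent instructions issue in a single cycle on a degree-3 machine, whereas a dependence chain such as `r1 <- r10; r2 <- r1; r3 <- r2` needs three cycles regardless of the degree, which is why the usable parallelism depends on the code being executed.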
Pipelining in superscalar processor
• The example pipeline is that of a 3-issue processor.
• Superscalar processors were originally developed as an alternative to vector processors.
• A superscalar processor of degree m can issue m instructions per cycle.
• They have separate integer and floating-point units.
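The effect of the issue degree m can be quantified with the usual idealized model. The assumptions are ours, not the notes': a k-stage base pipeline, N independent instructions with N a multiple of m, and no branches or resource conflicts. The scalar pipeline time, the degree-m superscalar time, and the resulting speedup are then:

```latex
T(1,1) = k + N - 1 \\
T(m,1) = k + \frac{N - m}{m} \\
S(m,1) = \frac{T(1,1)}{T(m,1)} = \frac{m\,(N + k - 1)}{N + m\,(k - 1)} \;\longrightarrow\; m \quad (N \to \infty)
```

For example, with k = 4, m = 3 and N = 12: T(1,1) = 15 cycles, T(3,1) = 4 + 3 = 7 cycles, and S(3,1) = 15/7, about 2.14, approaching 3 only for long runs of independent instructions.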
Example (figure): A typical superscalar RISC processor architecture consisting of an integer unit and a floating-point unit.
• Besides the register files, reservation stations and reorder buffers can be used to establish instruction windows.
• The purpose is to support instruction lookahead
and internal data forwarding, which are needed to
schedule multiple instructions simultaneously.
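One job of a reorder buffer is to let instructions finish out of order while still retiring them in program order. The Python sketch below shows only that behavior; `ReorderBuffer` and its methods are hypothetical, tags are assumed unique, and a real reorder buffer also holds result values for data forwarding and handles exceptions and branch mispredictions, all of which this toy omits.

```python
from collections import deque

class ReorderBuffer:
    """Toy reorder buffer: entries are allocated in program order, may
    complete in any order, but retire only from the head, in order."""

    def __init__(self, size: int):
        self.size = size
        self.order = deque()   # instruction tags in program (issue) order
        self.done = {}         # tag -> True once its result is available

    def allocate(self, tag: str) -> bool:
        if len(self.order) == self.size:
            return False       # buffer full: issue must stall
        self.order.append(tag)
        self.done[tag] = False
        return True

    def complete(self, tag: str) -> None:
        if tag not in self.done:
            raise KeyError(tag)
        self.done[tag] = True  # result ready, possibly out of order

    def retire(self) -> list[str]:
        """Commit finished instructions from the head only."""
        committed = []
        while self.order and self.done[self.order[0]]:
            tag = self.order.popleft()
            del self.done[tag]
            committed.append(tag)
        return committed
```

If `i1`, `i2`, `i3` are allocated and `i3` completes first, `retire()` returns `[]` because `i1` at the head is not finished. Once `i1` completes it retires alone, and when `i2` completes both `i2` and `i3` retire together.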
Example 4.5 The IBM RS/6000 architecture
• The branch processor could arrange the execution
of up to five instructions per cycle.
• These included one branch instruction in the branch
processor, one fixed-point instruction in the FXU,
one condition-register instruction in the branch
processor, and one floating point multiply-add
instruction in the FPU, which could be counted as
two floating-point operations.
• The RS/6000 used hardwired rather than microcoded control logic.
• The system used a number of wide buses ranging
from one word (32 bits) for the FXU to two words
(64 bits) for the FPU, and four words for the I-cache
and D-cache, respectively. These wide buses
provided the high instruction and data bandwidths
required for superscalar implementation.
• The RS/6000 design was optimized to perform well
in numerically intensive scientific and engineering
applications, as well as in multiuser commercial
environments.
VLIW Architecture
• Very Long Instruction Word machine.
• Instruction words are hundreds of bits in length (e.g., 256 or 1024 bits).
• The architecture is generalized from two concepts: horizontal microcoding and superscalar processing.
• Multiple functional units are used concurrently in
VLIW processor.
• All functional units share the use of a common large
register file.
• Different fields of the long instruction word carry
the opcodes to be dispatched to different functional
units.
• Programs written in conventional short instruction words (say, 32 bits) must be compacted together to form the VLIW instructions.
• This code compaction must be done by a compiler.
• Instruction parallelism and data movement in a VLIW architecture are completely specified at compile time.
• VLIW machines behave much like superscalar
machines with three differences: First, the
decoding of VLIW instructions is easier than that of
superscalar instructions.
• Second, the code density of the superscalar machine is better when the available instruction-level parallelism is less than that exploitable by the VLIW machine.
• This is because the fixed VLIW format includes bits for non-executable operations, while the superscalar processor issues only executable instructions.
• Third, VLIW machines exploiting different amounts of parallelism would require different instruction sets.
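Compile-time code compaction, and the NOP slots that cost VLIW its code density, can be sketched with a greedy packer. This is far simpler than the trace scheduling real VLIW compilers rely on: it keeps ops in program order and opens a new long word whenever a dependence or a missing free slot prevents placement. The word layout `SLOTS`, the op tuple format, and `compact` are all hypothetical.

```python
SLOTS = ("int", "int", "fp", "mem")   # hypothetical 4-slot VLIW word layout

def compact(ops, slots=SLOTS):
    """Pack short ops, kept in program order, into long instruction words.

    Each op is (unit, dest, srcs, text). An op joins the current word only if
    a free slot of its unit type remains and it neither reads nor rewrites a
    register written by an op already in that word; otherwise a new word is
    started. Unused slots stay "nop".
    """
    words = []
    word, written = None, set()
    for unit, dest, srcs, text in ops:
        if unit not in slots:
            raise ValueError(f"no functional unit of type {unit!r}")
        slot = None
        if word is not None and dest not in written and not written.intersection(srcs):
            slot = next((i for i, u in enumerate(slots)
                         if u == unit and word[i] == "nop"), None)
        if slot is None:                  # dependence or no free slot: start a new word
            word, written = ["nop"] * len(slots), set()
            words.append(word)
            slot = slots.index(unit)
        word[slot] = text
        written.add(dest)
    return words
```

Packing `load r1`, `add r2`, `fmul f1`, then `sub r3` (which reads r1 and r2) yields two words, `["add r2", "nop", "fmul f1", "load r1"]` and `["sub r3", "nop", "nop", "nop"]`. Four of the eight slots are NOPs, the non-executable bits that a superscalar encoding of the same four instructions would not carry.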
Pipelining
• In general-purpose applications, the VLIW architecture may not be able to perform well.
• Due to its lack of compatibility with conventional hardware and software, the VLIW architecture has not entered the mainstream of computers.
Although the idea seems sound in theory, the
dependence on trace-scheduling compiling and
code compaction has prevented it from gaining
acceptance in the commercial world.
Vector and symbolic processors
• A vector processor is a coprocessor specially designed to perform vector computations.
• A vector instruction involves a large array of operands. In other words, the same operation is performed over an array or a string of data.
• Specialized vector processors are generally used in
supercomputers.
• A vector processor can assume either a register-to-register architecture or a memory-to-memory architecture.
• The former uses shorter instructions and vector
register files.
• The latter uses memory-based instructions which
are longer in length, including memory addresses.
• Vector-to-vector operations
It should be noted that the vector lengths of the two operands used in a binary vector instruction must be equal.
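The two vector styles described above can be contrasted in a short Python sketch, using plain lists for vector registers and for memory. `vv_op` and `mm_op` are hypothetical names; the stride parameter is an assumption added to show why a memory-to-memory instruction must carry addresses, which is what makes it longer. The equal-length rule for binary vector instructions is enforced in `vv_op`.

```python
import operator

def vv_op(op, v1, v2):
    """Register-to-register style: V3 <- V1 op V2, element by element."""
    if len(v1) != len(v2):
        raise ValueError("operands of a binary vector instruction must have equal length")
    return [op(a, b) for a, b in zip(v1, v2)]

def mm_op(op, memory, a, b, c, n, stride=1):
    """Memory-to-memory style: M[c + i*stride] <- M[a + i*stride] op M[b + i*stride]
    for i in 0..n-1. The instruction must carry the addresses a, b and c."""
    for i in range(n):
        memory[c + i * stride] = op(memory[a + i * stride], memory[b + i * stride])

# Example: V3 <- V1 + V2 on two vector registers
print(vv_op(operator.add, [1, 2, 3], [10, 20, 30]))   # [11, 22, 33]
```

In the memory-to-memory case, `mem = [1, 2, 3, 10, 20, 30, 0, 0, 0]` followed by `mm_op(operator.mul, mem, 0, 3, 6, 3)` leaves `[10, 40, 90]` in the last three locations, with no vector registers involved.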
Memory-based
• Memory-based vector operations are found in
memory-to-memory vector processors such as
those in the early supercomputer CDC Cyber 205.
Vector pipeline
Symbolic processor
• Symbolic processing has been applied in many areas, including theorem proving, pattern recognition, expert systems, knowledge engineering, text retrieval, cognitive science, and machine intelligence.
• Symbolic processors have also been called Prolog processors, Lisp processors, or symbolic manipulators.
Characteristics of Symbolic Processing
Attributes | Characteristics
Knowledge representations | Lists, relational databases, scripts
Common operations | Searching, sorting, pattern matching, …
Memory requirements | Large memory with intensive access patterns
Communication patterns | Message traffic varies in size and destination
Properties of algorithms | Nondeterministic, possibly parallel and distributed computation
Input/output requirements | User-guided programs; input can be graphical and audio as well as from the keyboard
Architecture features | Dynamic memory allocation, dynamic load balancing, hardware-supported garbage collection, stack processor architecture
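Pattern matching over lists, one of the common operations in the table, is the kind of workload symbolic processors target. The Python sketch below is only an illustration of that workload, not of any symbolic processor: a Lisp-style matcher in which strings beginning with `?` are pattern variables (a hypothetical convention), returning a dictionary of bindings or `None` on failure.

```python
def match(pattern, datum, bindings=None):
    """Match a nested-list pattern against a datum.

    Strings starting with '?' are variables; a variable that is already bound
    must match the same value again. Returns the bindings dict, or None.
    """
    if bindings is None:
        bindings = {}
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == datum else None
        return {**bindings, pattern: datum}        # bind a fresh variable
    if isinstance(pattern, list) and isinstance(datum, list):
        if len(pattern) != len(datum):
            return None
        for p, d in zip(pattern, datum):           # match element by element
            bindings = match(p, d, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == datum else None  # constants must be equal
```

For instance, `match(["parent", "?x", "bob"], ["parent", "alice", "bob"])` gives `{"?x": "alice"}`, while `match(["same", "?x", "?x"], ["same", "a", "b"])` fails with `None`. The recursion over lists and the creation of new binding structures show why such code stresses dynamic memory allocation and garbage collection, as listed under architecture features.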