The Evolution of DSP Processors
Presented by MUHAMED SHEREEF P M2 AEI
Outline
• DSP applications
• Digital filtering as an example
• The first generation of DSPs, with an example
• Comparison of DSP processors to general-purpose processors
• DSP evolution continues... later-generation DSPs and alternatives
• Modern DSP-enhanced general-purpose processors
• Conclusions
DSP
• DSP is a key enabling technology for many types of electronic products
• Computational demands of DSP-intensive tasks are increasing very rapidly
• In many embedded DSP applications, general-purpose microprocessors are not competitive with DSP-oriented processors today
Example DSP Applications
• Digital cell phones
• Automated inspection
• Vehicle collision avoidance
• Voice-over-Internet
• Motor control
• Videoconferencing
• Toys, game consoles
• Music synthesis, effects
• Satellite communications
• Seismic analysis
• Secure communications
• Sonar
• Digital cameras
• Noise cancellation
• Medical ultrasound
• Radar
And more....
DSP Tasks for Microprocessors
• Speech and audio compression
• Filtering
• Modulation and demodulation
• Error correction coding and decoding
• Audio processing (e.g., surround-sound, noise reduction, equalization, sample rate conversion, echo cancellation)
• Image processing
• Speech recognition
• Signal synthesis (e.g., music, speech)
• Servo control
What Do DSP Processors Need to Do Well?
Most DSP tasks require:
• Repetitive numeric calculations
• Attention to numeric fidelity
• Fixed- vs floating-point
• Standards
• High memory bandwidth
• Real-time processing
Processors must perform these tasks efficiently while minimizing:
• Cost
• Power consumption
• Memory use
• Development time
FIR Filtering as an Example
Each tap (M+1 taps total) nominally requires:
• Two data fetches
• Multiply
• Accumulate
• Memory write-back to update delay line
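The per-tap work listed above can be sketched in a few lines of Python. This is only an illustrative model, not code from the presentation: the function name `fir_filter` and the list-based delay line are assumptions made for the example, and a real DSP would do the fetches, multiply, accumulate, and delay-line update in hardware.

```python
def fir_filter(coeffs, samples):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k]."""
    # Delay line holds the most recent len(coeffs) inputs, newest first.
    delay = [0.0] * len(coeffs)
    out = []
    for x in samples:
        # Memory write-back: shift the delay line and insert the new sample.
        delay = [x] + delay[:-1]
        acc = 0.0
        for h, d in zip(coeffs, delay):  # two data fetches per tap
            acc += h * d                 # multiply, then accumulate
        out.append(acc)
    return out
```

Feeding a unit impulse through the filter returns the coefficients themselves, which is a quick sanity check on the tap ordering.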
First-Generation DSP (1982):
Texas Instruments TMS32010
• 16-bit fixed-point
• Harvard architecture
• Accumulator
• Specialized instruction set
• 390 ns MAC time
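To put the MAC time in perspective, the following sketch estimates how many filter taps fit into one sample period. The 8 kHz sample rate is an assumption chosen for illustration (it is not stated in the slides), the helper name `max_taps` is hypothetical, and the result is an upper bound because it ignores loop overhead, I/O, and delay-line updates.

```python
def max_taps(mac_time_ns, sample_rate_hz):
    """Upper bound on MAC operations that fit in one sample period."""
    # Integer nanoseconds avoid floating-point rounding surprises.
    sample_period_ns = 1_000_000_000 // sample_rate_hz
    return sample_period_ns // mac_time_ns

# Assumed example: 390 ns MAC time at an 8 kHz sample rate.
taps_at_8khz = max_taps(390, 8000)
```

Under these assumptions the budget is about 320 MACs per sample, so one tap per MAC gives a filter of roughly that length at most.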
Features Common to Most DSP Processors
• Data path configured for DSP
• Specialized instruction set
• Multiple memory banks and buses
• Specialized addressing modes
• Specialized execution control
• Specialized peripherals for DSP
Comparison
Data Path
DSP Processor
• Specialized hardware performs all key arithmetic operations in 1 cycle
• Hardware support for managing numeric fidelity:
  • Shifters
  • Guard bits
  • Saturation
General-Purpose Processors
• Multiplies often take > 1 cycle
• Shifts often take > 1 cycle
• Other operations (e.g., saturation, rounding) typically take multiple cycles
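A small software model may help show what saturation and guard bits buy. The sketch below assumes 16-bit Q15 fixed-point data (an assumption; the slides do not name a format), and the helper names `wrap16`, `saturate16`, and `mac_q15` are hypothetical. Python's unbounded integers stand in for a wide accumulator with guard bits.

```python
Q15_MAX = 32767
Q15_MIN = -32768

def wrap16(value):
    """Two's-complement wraparound, as on hardware without saturation."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def saturate16(value):
    """Clamp to the 16-bit range instead of wrapping."""
    return max(Q15_MIN, min(Q15_MAX, value))

def mac_q15(pairs):
    """Sum Q15 products in a wide accumulator, then round and saturate once."""
    acc = 0  # unbounded int models an accumulator with guard bits
    for a, b in pairs:
        acc += a * b  # each product is in Q30
    # Round to nearest, rescale Q30 -> Q15, and clamp the final result.
    return saturate16((acc + (1 << 14)) >> 15)
```

For example, `30000 + 10000` wraps to a large negative value but saturates to 32767, and an intermediate sum that temporarily exceeds the 16-bit output range still yields the correct result as long as the final sum fits.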
Instruction Set
DSP Processor
• Specialized, complex instructions
• Multiple operations per instruction
General-Purpose Processors
• General-purpose instructions
• Typically only one operation per instruction
Memory Architecture
DSP Processor
• Harvard architecture
• 2-4 memory accesses per cycle
• No caches; on-chip SRAM
General-Purpose Processors
• Von Neumann architecture
• Typically 1 access per cycle
• May use caches
Addressing
DSP Processor
• Dedicated address-generation units
• Specialized addressing modes
  • Autoincrement
  • Modulo (circular)
  • Bit-reversed (for FFT)
General-Purpose Processors
• Often, no separate address generation units
• General-purpose addressing modes
• Favor compiler-generated code
• Good immediate data support
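The two less familiar addressing modes, modulo and bit-reversed, can be modeled in software as follows. This is a sketch of the index arithmetic only; the function names `modulo_next` and `bit_reverse` are assumptions for the example, and on a DSP an address-generation unit computes these in parallel with the data path at no extra cycle cost.

```python
def modulo_next(index, step, length):
    """Advance an index through a circular buffer of the given length."""
    return (index + step) % length

def bit_reverse(index, num_bits):
    """Reverse the low num_bits bits of index (FFT input/output ordering)."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

# Access order for an 8-point FFT (3 address bits).
fft8_order = [bit_reverse(i, 3) for i in range(8)]
```

Modulo addressing lets a delay line be updated by moving a pointer rather than copying samples, which removes most of the write-back traffic in an FIR loop.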
Second-Generation DSPs (1987-):
Motorola DSP56001
• 24-bit data, instructions
• 3 memory spaces (X, Y, P)
• Parallel moves
• Single- and multi-instruction hardware loops
• Modulo addressing
• 75 ns MAC (21 ns today)
• Other second-generation processors: AT&T DSP16A, Analog Devices ADSP-2100, Texas Instruments TMS320C50
Third Generation DSPs (1995)
Examples: Motorola DSP56301, TI TMS320C541
• Enhanced conventional DSP architectures
• 3.0 or 3.3 volts
• More on-chip memory
• Application-specific function units in data path or as co-processors
• More sophisticated debugging and application development tools
• 20 ns MAC (10 ns today)
Architectural innovation mostly limited to adding application-specific function units and miscellaneous minor refinements.
• Also, multiple processors/chip (TI TMS320C80, Motorola MC68356)
Fourth Generation (1997-2000)
Ex: TMS320C6201/6701, LSI401Z, MMX Pentium
DSP performers adopt architectures far different from conventional DSP processor designs.
• SIMD
  • Single instruction, multiple data (e.g., MMX, AltiVec, MDMX)
• VLIW
  • Very long instruction word
  • Compile-time scheduling and parallel execution of multiple simple instructions (e.g., TMS320C6201/C6701)
• Superscalar
  • Run-time scheduling and execution of > 1 (usually 2-4) instructions per cycle (e.g., Pentium, PowerPC, ZSP164xx)
• User-defined instructions
VLIW
Very long instruction word (VLIW) architectures are garnering increased attention for DSP applications. Notable recent introductions include Texas Instruments TMS320C62xx and Philips TM1000. Major features:
• Multiple independent operations per cycle
• Packed into a single large instruction or packet
• More regular, orthogonal, RISC-like operations
• Large, uniform register sets
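The idea of packing independent operations into one instruction packet at compile time can be illustrated with a toy scheduler. This is a simplified sketch, not how any particular VLIW compiler works: the function name `pack_vliw` and the tuple format `(dest, src1, src2)` are hypothetical, every register is assumed to be written only once, and all operations are assumed to take a single cycle on any execution unit.

```python
def pack_vliw(ops, width):
    """Greedily pack operations into packets of at most `width` slots.

    ops is a list of (dest, src, ...) tuples in program order. An operation
    may only be placed in a packet after the one that produces its sources.
    Returns a list of packets, each a list of operation indices.
    """
    packets = []
    ready_at = {}  # register -> first packet index where its value is usable
    for i, (dest, *srcs) in enumerate(ops):
        # Earliest packet allowed by data dependences (0 for external inputs).
        p = max((ready_at.get(s, 0) for s in srcs), default=0)
        # Skip packets that are already full.
        while p < len(packets) and len(packets[p]) >= width:
            p += 1
        if p == len(packets):
            packets.append([])
        packets[p].append(i)
        ready_at[dest] = p + 1
    return packets

# Hypothetical four-operation program on a 2-wide machine.
example_ops = [("a", "x", "y"), ("b", "x", "z"), ("c", "a", "b"), ("d", "c", "x")]
example_packets = pack_vliw(example_ops, 2)
```

In this example the first two operations are independent and share a packet, while the last two each wait on an earlier result, so half the issue slots go unused. That gap is one source of the code size growth associated with VLIW designs.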
VLIW
Advantages:
• Increased performance
• More regular architectures
• Potentially easier to program; better compiler targets
• Scalable
Disadvantages:
• New kinds of programming/compiler complexity
• Code size bloat
• High program memory bandwidth requirements
• High power consumption
SIMD
Single Instruction, Multiple Data
• Virtually all high-performance CPUs (and some modern DSPs) support SIMD operations
• One SIMD instruction performs the same operation on multiple (independent) sets of data
• For each SIMD instruction, you can get 2x (or 4x, or 8x, ...) the work
• Two ways to implement SIMD
  • Split execution units
  • Multiple execution units (or data paths) operating in lock-step
SIMD
Split Execution Unit
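A split execution unit treats one wide register as several narrower lanes and prevents carries from crossing lane boundaries. The sketch below models a 32-bit adder split into two 16-bit lanes; the lane widths and the function names `pack2x16`, `unpack2x16`, and `add2x16` are assumptions made for illustration and do not describe any specific processor.

```python
def pack2x16(hi, lo):
    """Pack two 16-bit values into one 32-bit word."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def unpack2x16(word):
    """Split a 32-bit word into its (high, low) 16-bit lanes."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF

def add2x16(a, b):
    """Add two packed words lane by lane, with no carry between lanes."""
    lo = ((a & 0xFFFF) + (b & 0xFFFF)) & 0xFFFF
    hi = (((a >> 16) & 0xFFFF) + ((b >> 16) & 0xFFFF)) & 0xFFFF
    return (hi << 16) | lo

# The low lanes overflow (0xFFFF + 1); the high lanes must be unaffected.
split_sum = unpack2x16(add2x16(pack2x16(1, 0xFFFF), pack2x16(2, 1)))
# An ordinary 32-bit add lets the carry leak into the high lane.
plain_sum = unpack2x16((pack2x16(1, 0xFFFF) + pack2x16(2, 1)) & 0xFFFFFFFF)
```

The comparison between `split_sum` and `plain_sum` shows why the hardware has to break the carry chain: the plain add corrupts the upper result whenever the lower lane overflows.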
SIMD Characteristics
• Each instruction performs lots of work
• Algorithms, data organization must be amenable to data-parallel processing
• Most effective on algorithms that process large blocks of data
• May support multiple data widths (e.g., 16-bit and 8-bit)
Processor DSP Speed: BDTImarks
Conclusions
• DSP processor performance has increased considerably
• Multi-issue architectures dominate the field of new high-performance processors
• Processor architectures for DSP will be increasingly specialized for applications, especially communications applications
• General-purpose processors will become viable for many DSP applications
• Users of processors for DSP will have an expanding array of choices
• Selecting processors requires a careful, application-specific analysis