The Evolution of DSP Processors
Presented by MUHAMED SHEREEF P M2 AEI
Outline
DSP applications
Digital filtering as an example
The first generation of DSPs, with an example
Comparison of DSP processors to general-purpose processors
DSP evolution continues... later-generation DSPs and alternatives
Modern DSP-enhanced general-purpose processors
Conclusions
DSP
DSP is a key enabling technology for many types of electronic products
Computational demands of DSP-intensive tasks are increasing very rapidly
In many embedded DSP applications, general-purpose microprocessors are not competitive with DSP-oriented processors today
Example DSP Applications
Digital cell phones
Automated inspection
Vehicle collision avoidance
Voice-over-Internet
Motor control
Videoconferencing
Toys, games consoles
Music synthesis, effects
Satellite communications
Seismic analysis
Secure communications
Sonar
Digital cameras
Noise cancellation
Medical ultrasound
Radar
And more....
DSP Tasks for Microprocessors
Speech and audio compression
Filtering
Modulation and demodulation
Error correction coding and decoding
Audio processing (e.g., surround-sound, noise reduction, equalization, sample rate conversion, echo cancellation)
Image processing
Speech recognition
Signal synthesis (e.g., music, speech)
Servo control
What Do DSP Processors Need to Do Well?
Most DSP tasks require:
Repetitive numeric calculations
Attention to numeric fidelity
Fixed- vs floating-point standards
High memory bandwidth
Real-time processing
Processors must perform these tasks efficiently while minimizing:
Cost
Power consumption
Memory use
Development time
FIR Filtering as an Example
Each tap (M+1 taps total) nominally requires:
Two data fetches
Multiply
Accumulate
Memory write-back to update delay line
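The per-tap work listed above can be sketched in Python as follows. This is only an illustration, not any particular processor's implementation: the function name `fir_step` and the list-based delay line are assumptions made for the example. Each loop iteration performs the write-back that ages one sample in the delay line, the two data fetches (coefficient and delayed sample), the multiply, and the accumulate.

```python
def fir_step(coeffs, delay, x_new):
    """Compute one FIR output sample and update the delay line in place.

    coeffs: M+1 filter coefficients b[0..M]
    delay:  M+1 most recent inputs; delay[k] holds x[n-k] after the update
    x_new:  newest input sample x[n]
    """
    taps = len(coeffs)
    acc = 0
    # Walk from the oldest tap to the newest: shift each sample one slot
    # toward the old end of the delay line, then use it.
    for k in range(taps - 1, 0, -1):
        delay[k] = delay[k - 1]        # memory write-back (delay-line update)
        acc += coeffs[k] * delay[k]    # two fetches, multiply, accumulate
    delay[0] = x_new                   # newest sample enters the delay line
    acc += coeffs[0] * delay[0]
    return acc


# Usage: a 3-tap running sum (all coefficients equal to 1).
delay_line = [0, 0, 0]
outputs = [fir_step([1, 1, 1], delay_line, x) for x in [1, 2, 3, 4]]
```

A general-purpose processor would typically spend several instructions on each pass through this loop, whereas the DSP features described in the following slides aim to complete one tap per cycle.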
First-Generation DSP (1982):
Texas Instruments TMS32010
16-bit fixed-point
Harvard architecture
Accumulator
Specialized instruction set
390 ns MAC time
Features Common to Most DSP Processors
Data path configured for DSP
Specialized instruction set
Multiple memory banks and buses
Specialized addressing modes
Specialized execution control
Specialized peripherals for DSP
Comparison
Data Path
DSP Processor
Specialized hardware performs all key arithmetic operations in 1 cycle
Hardware support for managing numeric fidelity: shifters, guard bits, saturation
General-Purpose Processors
Multiplies often take > 1 cycle
Shifts often take > 1 cycle
Other operations (e.g., saturation, rounding) typically take multiple cycles
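The numeric-fidelity operations named above (shifting, rounding, saturation) can be illustrated with a short Python sketch. It assumes 16-bit signed data in Q15 fixed-point format, and the helper names `sat16`, `wrap16`, and `q15_mul` are hypothetical, chosen for this example only. Python integers are unbounded, so the intermediate product here never overflows; on a DSP, a wide accumulator with guard bits plays that role.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def sat16(value):
    """Clamp an integer to the signed 16-bit range (saturation)."""
    return max(INT16_MIN, min(INT16_MAX, value))


def wrap16(value):
    """Two's-complement wraparound to 16 bits, as a plain 16-bit add behaves."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def q15_mul(a, b):
    """Multiply two Q15 values, round to nearest, and saturate the result."""
    product = a * b                          # Q30 result, wider than 16 bits
    rounded = (product + (1 << 14)) >> 15    # add half an LSB, shift back to Q15
    return sat16(rounded)


# Usage: the same overflowing sum with and without saturation.
clipped = sat16(30000 + 10000)     # stays at the positive limit
wrapped = wrap16(30000 + 10000)    # flips sign, which is audible in audio
```

On a general-purpose processor each of these steps is typically a separate instruction or a short sequence, which is the cost the comparison above refers to.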
Instruction Set
DSP Processor
Specialized, complex instructions
Multiple operations per instruction
General-Purpose Processors
General-purpose instructions
Typically only one operation per instruction
Memory Architecture
DSP Processor
Harvard architecture
2-4 memory accesses per cycle
No caches; on-chip SRAM
General-Purpose Processors
Von Neumann architecture
Typically 1 access per cycle
May use caches
Addressing
DSP Processor
Dedicated address-generation units
Specialized addressing modes: autoincrement, modulo (circular), bit-reversed (for FFT)
Good immediate data support
General-Purpose Processors
Often, no separate address-generation units
General-purpose addressing modes
Favor compiler-generated code
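The two specialized addressing modes named on this slide, modulo (circular) and bit-reversed, can be sketched in Python as index calculations. The function names `circular_index` and `bit_reverse` are hypothetical, and the sketch only shows the arithmetic; a DSP's address-generation unit produces these addresses in hardware, in parallel with the data path.

```python
def circular_index(index, step, length):
    """Advance an index by step, wrapping within a buffer of the given length."""
    return (index + step) % length


def bit_reverse(index, bits):
    """Reverse the low `bits` bits of index, as used to reorder FFT data."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)   # move the lowest bit to the top
        index >>= 1
    return result


# Usage: access order for an 8-point FFT (3 address bits).
fft_order = [bit_reverse(i, 3) for i in range(8)]
```

With modulo addressing, a delay line can stay in place while only a pointer moves, which avoids copying every sample on each new input.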
Second-Generation DSPs (1987-):
Motorola DSP56001
24-bit data, instructions
3 memory spaces (X, Y, P)
Parallel moves
Single- and multi-instruction hardware loops
Modulo addressing
75 ns MAC (21 ns today)
Other second-generation processors: AT&T DSP16A, Analog Devices ADSP-2100, Texas Instruments TMS320C50
Third Generation DSPs (1995)
Examples: Motorola DSP56301, TI TMS320C541
Enhanced conventional DSP architectures
3.0 or 3.3 volts
More on-chip memory
Application-specific function units in data path or as co-processors
More sophisticated debugging and application development tools
20 ns MAC (10 ns today)
Architectural innovation mostly limited to adding application-specific function units and miscellaneous minor refinements.
Also, multiple processors/chip (TI TMS320C80, Motorola MC68356)
Fourth Generation (1997-2000)
Ex: TMS320C6201/6701, LSI401Z, MMX Pentium
DSP performers adopt architectures far different from conventional DSP processor designs.
SIMD: single instruction, multiple data (e.g., MMX, AltiVec, MDMX)
VLIW: very long instruction word; compile-time scheduling and parallel execution of multiple simple instructions (e.g., TMS320C6201/C6701)
Superscalar: run-time scheduling and execution of > 1 (usually 2-4) instructions per cycle (e.g., Pentium, PowerPC, ZSP164xx)
User-defined instructions
VLIW
Very long instruction word (VLIW) architectures are garnering increased attention for DSP applications. Notable recent introductions include Texas Instruments TMS320C62xx and Philips TM1000. Major features:
Multiple independent operations per cycle
Packed into a single large instruction or packet
More regular, orthogonal, RISC-like operations
Large, uniform register sets
VLIW
Advantages:
Increased performance
More regular architectures
Potentially easier to program; better compiler targets
Scalable
Disadvantages:
New kinds of programming/compiler complexity
Code size bloat
High program memory bandwidth requirements
High power consumption
SIMD
Single Instruction, Multiple Data
Virtually all high-performance CPUs (and some modern DSPs) support SIMD operations
One SIMD instruction performs the same operation on multiple (independent) sets of data
For each SIMD instruction, you can get 2x (or 4x, or 8x, ...) the work
Two ways to implement SIMD:
Split execution units
Multiple execution units (or data paths) operating in lock-step
SIMD
Split Execution Unit
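The idea of a split execution unit can be imitated in software: one 32-bit adder is treated as two independent 16-bit lanes by preventing carries from crossing the lane boundary. The sketch below is an illustration under that assumption; the name `packed_add16` and the two-lane 32-bit layout are hypothetical choices for the example, not a description of any specific processor.

```python
LANE_MASK = 0x7FFF7FFF   # low 15 bits of each 16-bit lane
SIGN_MASK = 0x80008000   # top bit of each 16-bit lane


def packed_add16(a, b):
    """Add two 32-bit words as two independent 16-bit lanes (wraparound)."""
    # Add the low 15 bits of each lane. A carry can reach bit 15 of its
    # own lane but cannot cross into the neighbouring lane.
    low = (a & LANE_MASK) + (b & LANE_MASK)
    # Fold in the top bit of each lane with XOR, which discards the carry out.
    return (low ^ ((a ^ b) & SIGN_MASK)) & 0xFFFFFFFF


# Usage: the low lane overflows (0xFFFF + 1) without disturbing the high lane.
packed = packed_add16(0x0001FFFF, 0x00020001)    # lanes: 1+2 and 0xFFFF+1
plain = (0x0001FFFF + 0x00020001) & 0xFFFFFFFF   # ordinary add leaks the carry
```

In hardware the same effect comes from breaking the carry chain of the adder at the lane boundary, so one instruction yields two 16-bit results.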
SIMD Characteristics
Each instruction performs lots of work
Algorithms, data organization must be amenable to data-parallel processing
Most effective on algorithms that process large blocks of data
May support multiple data widths (e.g., 16-bit and 8-bit)
Processor DSP Speed: BDTImarks
Conclusions
DSP processor performance has increased considerably
Multi-issue architectures dominate the field of new high-performance processors
Processor architectures for DSP will be increasingly specialized for applications, especially communications applications
General-purpose processors will become viable for many DSP applications
Users of processors for DSP will have an expanding array of choices
Selecting processors requires a careful, application-specific analysis