Introduction To DSP Processors

The document discusses digital signal processing (DSP) applications and DSP processors. It describes several DSP tasks including filtering, modulation, error correction, audio processing, and image processing. These tasks often require repetitive numeric computations and real-time processing. Common DSP applications include voice mail, digital cameras, audio production, and medical ultrasound. The document outlines the evolution of DSP processors from early generations with specialized instruction sets and memory architectures to more advanced designs like the TI TMS320C6xxx with multiple functional units and register files.

Uploaded by

sai420

16. INTRODUCTION TO DSP PROCESSORS

16.1 DSP Applications

Voice mail
Digital cameras
Navigation equipment
Modems
Audio production
Noise cancellation
Videoconferencing
Medical ultrasound
Pagers
Patient monitoring
Music synthesis, effects
Radar

611 37100 Lecture 16-4

INTRODUCTION TO DSP PROCESSORS
16.1 DSP Applications
16.2 The Evolution of DSP Processors
16.3 The TMS320C6xxx Core CPU Architecture
16.4 The TMS320C6xxx Memory Architecture
16.5 The TMS320C6xxx Peripherals
16.6 The TMS320C6xxx Software Development

DSP tasks for microprocessors

Speech and audio compression
Filtering
Modulation and demodulation
Error correction coding and decoding
Servo control
Audio processing (e.g., surround sound, noise reduction, equalization, sample rate conversion, echo cancellation)
Signaling (e.g., DTMF detection)
Speech recognition
Signal synthesis (e.g., music, speech synthesis)
Image processing

Digital cellular phones
Satellite communications
Automated inspection
Seismic analysis
Vehicle collision avoidance
Secure communications
Voice over Internet
Tapeless answering machines
Motor control
Sonar
Consumer audio
Cordless phones

DSP tasks requirements

Repetitive numeric computations
Attention to numeric fidelity
High memory bandwidth, mostly via array accesses
Real-time processing

Advantages and disadvantages

Advantages
Flexibility: Easy to modify and upgrade in software while using the same hardware
Reproducibility: The performance of a DSP can be repeated precisely from one unit to another
Reliability: The memory and logic of DSP hardware do not deteriorate with age
Complexity: Allows sophisticated applications

Disadvantages
Speed and cost
Longer hardware and software design time
Finite word-length errors

16.2 The Evolution of DSP Processors

Common DSP Features

Data path configured for DSP
Specialized instruction set
Multiple memory banks and buses
Specialized addressing modes
Specialized execution control
Specialized peripherals for DSP

DSP manufacturers and products

Data path

DSP Processors
Specialized hardware performs all key arithmetic operations in 1 cycle.
Hardware support for managing numeric fidelity: shifters, guard bits, saturation.

General-Purpose Processor
Multiplies often take >1 cycle.
Shifts often take >1 cycle.
Other operations (e.g., saturation, rounding) typically take multiple cycles.
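The numeric-fidelity features named for the DSP data path (saturation and guard bits) can be illustrated with a short Python sketch. It does not model any particular processor: the 16-bit word, the 32-bit and 40-bit accumulator widths, and all function names are assumptions chosen for illustration.

```python
def wrap(value, bits):
    """Two's-complement wraparound: what a plain adder does on overflow."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value


def saturate(value, bits):
    """Clamp to the largest or smallest value representable in `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def accumulate(values, acc_bits):
    """Sum `values` in an accumulator `acc_bits` wide, saturating on every add."""
    acc = 0
    for v in values:
        acc = saturate(acc + v, acc_bits)
    return acc


# Overflowing a 16-bit add: wraparound flips the sign, saturation clamps.
print(wrap(30000 + 10000, 16))      # -25536
print(saturate(30000 + 10000, 16))  # 32767

# A running sum that temporarily exceeds 32 bits but ends up back in range.
terms = [1 << 30] * 4 + [-(1 << 30)] * 3
print(accumulate(terms, 40) == 1 << 30)  # True: 8 guard bits absorb the peak
print(accumulate(terms, 32) == 1 << 30)  # False: the 32-bit sum was clipped
```

The last two lines show why extra accumulator bits matter: with guard bits the intermediate overflow is harmless, while the narrower accumulator loses information as soon as it clips.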

General-purpose DSP market share
(Source: Forward Concepts, 1998)

Instruction set

DSP Processors
Specialized, complex instructions
Multiple operations per instruction:
    mac x0,y0,a x:(r0)+,x0 y:(r4)+,y0

General-Purpose Processor
General-purpose instructions
Typically only one operation per instruction:
    mov *r0,x0
    mov *r1,y0
    mpy x0,y0,a
    add a,b
    mov y0,*r2
    inc r0
    inc r1
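To make the comparison concrete, here is a minimal Python sketch of the computation these instruction sequences serve: a direct-form FIR filter built from multiply-accumulate (MAC) steps. The function name and the sample values are illustrative assumptions, not taken from the slides.

```python
def fir(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]  # one multiply-accumulate (MAC)
        out.append(acc)
    return out


print(fir([1, 2, 3, 4], [0.5, 0.5]))  # [0.5, 1.5, 2.5, 3.5]
```

The inner statement is the work a DSP aims to complete in a single instruction, together with the two operand fetches and pointer updates, whereas a general-purpose instruction set typically spends one instruction on each of those steps.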

Typical DSP algorithms

Memory architecture

DSP Processors
Harvard architecture
2-4 memory accesses/cycle
No caches; on-chip SRAM

General-Purpose Processor
Von Neumann architecture
Typically 1 access/cycle

Characteristics of DSP-based systems

Algorithms
Specify the arithmetic operations to be performed

Sampling rates
The rate at which samples are consumed, processed, or produced

Clock rates
MIPS (Million Instructions per Second)
MFLOPS (Million Floating-point Operations per Second)

Numeric representations
Fixed-point numeric
Floating-point numeric

Addressing

DSP Processors
Dedicated address generation units
Specialized addressing modes, e.g.: autoincrement, modulo (circular), bit-reversed (for FFT)
Good immediate data support

General-Purpose Processor
Often, no separate address generation unit
General-purpose addressing modes
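The two specialized addressing modes named for DSP processors, modulo (circular) and bit-reversed, can be sketched in Python as plain index arithmetic. This is a software illustration of what a dedicated address generation unit computes in hardware; the function names and parameters are assumptions made for the example.

```python
def circular_addresses(start, step, length, count):
    """Modulo (circular) addressing: offsets wrap inside a buffer of `length` words."""
    return [(start + i * step) % length for i in range(count)]


def bit_reversed(index, bits):
    """Reverse the lowest `bits` bits of `index` (the reordering used by the FFT)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


print(circular_addresses(6, 1, 8, 4))          # [6, 7, 0, 1]
print([bit_reversed(i, 3) for i in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Circular addressing lets a delay line be reused without copying samples, and bit-reversed addressing yields the data ordering of an 8-point radix-2 FFT in the second line of output.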

Specialized peripherals for DSP

Host ports
Bit I/O ports
On-chip DMA controller
Clock generators
Synchronous serial ports
Parallel ports
Timers
On-chip A/D, D/A converters

First generation DSP (Texas Instruments TMS32010)

Introduced in 1982
16-bit fixed-point
Harvard architecture
Accumulator
Specialized instruction set
390 ns MAC time (228 ns today)

Second Generation DSPs

Example: Motorola DSP56001 (1987)
24-bit data, instructions
3 memory spaces (X, Y, P)
Parallel moves
Single- and multi-instruction hardware loops
Modulo addressing
75 ns MAC (21 ns today)
Other second-generation processors: AT&T DSP16A, Analog Devices ADSP-2100, Texas Instruments TMS320C50

Processor DSP speed

Third generation DSPs (1995)

Examples: Motorola DSP56301, TI TMS320C541 (1995)
Enhanced conventional DSP architectures
3.0 or 3.3 volts
More on-chip memory
Application-specific function units in data path or as coprocessors
More sophisticated debugging and application development tools
DSP cores (Pine and Oak from DSP Group, cDSP from TI)
20 ns MAC (10 ns today)
Architectural innovation mostly limited to adding application-specific function units and miscellaneous minor refinements.
Also, multiple processors/chip (TI TMS320C80, Motorola MC68356)

TI TMS320 family evolution

Fourth Generation DSP

TI TMS320C6201, Intel Pentium with MMX (1997-1998)
Today's top DSP performers adopt architectures far different from conventional DSP processor designs.
Blazing clock speeds and superscalar architectures make some general-purpose processors, such as the PowerPC 604e, top floating-point performers, despite lack of many DSP features.
Multimedia SIMD extensions, such as MMX, offer strong fixed-point performance on general-purpose processors.
VLIW-like architectures, such as that of the TI TMS320C6201, achieve top performance via high parallelism and increased clock speeds.
3 ns MAC throughput... but expensive, power-hungry

16.3 The TMS320C6xxx Core CPU Architecture

The block diagram of TMS320C6000
[Figure labels: CPU, MEMORY, PERIPHERALS, EMIF, INTERNAL BUS (DATA, ADDRESS)]

The program control unit

These units operate in an assembly-line fashion and are necessary for the CPU to execute instructions:

Program fetch unit
To retrieve a fetch packet (FP), which is a group of eight instructions, four phases are required:
PG phase: the CPU generates a fetch address
PS phase: the CPU sends the address to the memory
PW phase: the CPU waits for the data to be ready
PR phase: the CPU reads the opcode

Instruction dispatch unit

Instruction decode unit

The block diagram of TMS320C6000

Core CPU
    Eight functional units
    Data path
    Control unit
    32 32-bit registers
On-chip memory
    Data RAM
    Program RAM
On-chip peripherals
Internal buses
EMIF (External Memory Interface)

Instruction dispatch unit

Eight instructions are received if the memory is internal or a single instruction if the memory is external.
[Figure labels: fetch phases PG, PS, PW, PR* (memory access); DP (1 cycle); DC (1 cycle) [8 units]]

The Central Processing Unit (CPU)

Program Control Unit
    Program Fetch
    Instruction Dispatch
    Instruction Decode
Data Path 1
Data Path 2
Control Registers
Test, Emulation, Control and Interrupt Logic

CPU data paths

Two general-purpose register files (A and B)
    16 32-bit registers (A0-A15) for file A
    16 32-bit registers (B0-B15) for file B
Eight functional units
    .L1, .S1, .M1, and .D1 for A
    .L2, .S2, .M2, and .D2 for B
Two load-from-memory paths (LD1 and LD2)
Two store-to-memory paths (ST1 and ST2)
Two register file cross paths (1X and 2X)
Two data address paths (DA1 and DA2)

CPU data paths
[Figure labels: Program Control Unit; Data Path 1 with Register File A and units .L1 .S1 .M1 .D1; Data Path 2 with Register File B and units .L2 .S2 .M2 .D2; Control Registers; Test, Emulation, Control and Interrupt Logic]

Data cross paths

The data cross paths can also be referred to as the register file cross paths.
They allow operands from one side to cross to the other side.
Basic arithmetic instructions: MPY, ADD, SUB
Only one cross path per direction per execute packet is permitted.
The destination register is always on the same side as the unit used.

CPU data paths and control

Address cross paths

The advantage of using an address cross path is that the address can be generated using one register file while the data is accessed from the other register file.
[Figure labels: Register File A (A1, *A0), .D1, DA1, Data path 1; Register File B (B1, *B0), .D2, DA2, Data path 2]

Cross paths

Cross paths enable linking of one side of the CPU to the other.
Types of operand cross paths:
Data cross path
Address cross path

Address cross paths

Only one address cross path per direction per execute packet is allowed.
When an address cross path is used, the destination register for the load (LD) instructions and the source register for the store (ST) instructions should come from the opposite side of the unit, or simply the register pointers must come from the same side of the .D unit used.
If both .D units are to be used, then either none or both of the address cross paths should be used.

Functional units

There are four functional units for each data path:
.L units
.M units
.S units
.D units
All the units operate on 32-bit operands and execute instructions simultaneously.
Eight functional units can be used in parallel to execute up to 8 instructions in a given cycle.
If the cycle time for one instruction is 5 ns (200 MHz clock), this results in 1600 MIPS of performance.

Functional units - .M units

.M units are hardware multiplier units capable of performing 16-bit by 16-bit multiplications producing 32-bit results (for the C62xx).
The operands may come from the 16 MSB or the 16 LSB of the registers used.

Instruction      Register a x Register b = Register c
MPY   a,b,c      a x b
MPYH  a,b,c      A x B
MPYHL a,b,c      A x b
MPYLH a,b,c      a x B
(Uppercase denotes the 16 MSBs of a register, lowercase the 16 LSBs.)
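The four multiply variants differ only in which 16-bit half of each source register feeds the multiplier. The following Python sketch models that selection, assuming signed 16-bit operands and registers held as unsigned 32-bit integers; the helper names `lo16` and `hi16` are hypothetical and exist only for this illustration.

```python
def _signed16(value):
    """Interpret the low 16 bits of `value` as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def lo16(reg):
    """16 least significant bits of a 32-bit register, as a signed value."""
    return _signed16(reg)


def hi16(reg):
    """16 most significant bits of a 32-bit register, as a signed value."""
    return _signed16(reg >> 16)


def mpy(a, b):    # low half of a times low half of b
    return lo16(a) * lo16(b)


def mpyh(a, b):   # high half of a times high half of b
    return hi16(a) * hi16(b)


def mpyhl(a, b):  # high half of a times low half of b
    return hi16(a) * lo16(b)


def mpylh(a, b):  # low half of a times high half of b
    return lo16(a) * hi16(b)


a = 0x0003FFFE  # high half = 3, low half = -2
b = 0x00050007  # high half = 5, low half = 7
print(mpy(a, b), mpyh(a, b), mpyhl(a, b), mpylh(a, b))  # -14 15 21 -10
```

Packing two 16-bit samples into one 32-bit register and choosing the half at multiply time avoids separate unpacking instructions.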

Functional units - .L units

40-bit integer Arithmetic and Logic Units (ALUs)
32/40-bit arithmetic and compare operations
32-bit logical operations
Normalization and bit count operations
Saturated arithmetic for 32/40-bit operations

Functional units - .S units

The .S units contain 32-bit integer ALUs and 40-bit shifters
32-bit arithmetic, logic and bit field operations
32/40-bit shifts
Branches (.S2 only when using a register)
Register transfers to and from control registers (.S2 only)
Constant generation

Functional units - .D units

Load and store with 5-bit constant offset
Load and store with 15-bit constant offset (.D2 only)
32-bit additions/subtractions
Linear and circular address calculation

Control registers

The C62xx devices have 10 registers for control purposes, while the C67xx have 13 control registers.
Reading and writing to the control registers can only be performed via the .S2 unit.
All the control registers can only be accessed by the MVC (move between control file and register file) instruction.

Register files

Each data path contains a register file composed of 16 32-bit general-purpose registers (A0-A15 for data path 1 and B0-B15 for data path 2).
These registers can support 32- and 40-bit fixed-point data or, in the case of the C67xx, 64-bit double-precision floating-point data.
The general-purpose registers can be used for:
Data
Data address pointers
Condition registers

Control registers

Control registers: extension for floating-point operations

16.4 The TMS320C6xxx Memory Architecture

Memory map

Memory map 0 (direct execution)
Internal program RAM: 64K/256K from 01400000H
Internal data RAM: 64K/256K from 80000000H
Internal peripherals: 8MB from 01800000H
Expansion bus: 1GB from 40000000H on C62
External memory space (CE0-CE3)

Memory map 1 (boot mode)
Internal program RAM: 64K/256K from 01400000H
Internal data RAM: 64K/256K from 80000000H
Internal peripherals: 8MB from 01800000H
Expansion bus: 1GB from 40000000H on C62
External memory space (CE0-CE3)
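As a reading aid for the memory map, the Python sketch below turns the listed base addresses and sizes into an address-to-region lookup. It assumes that "K" means kilobytes and arbitrarily takes the 64K option for the internal RAMs; the external CE0-CE3 spaces are left out because no addresses are given for them here. The names `REGIONS` and `region_of` are hypothetical.

```python
KB, MB, GB = 1 << 10, 1 << 20, 1 << 30

# (name, base address, size in bytes), taken from the list above.
# Assumption: the 64K option is used for both internal RAMs.
REGIONS = [
    ("internal program RAM", 0x01400000, 64 * KB),
    ("internal peripherals", 0x01800000, 8 * MB),
    ("expansion bus", 0x40000000, 1 * GB),
    ("internal data RAM", 0x80000000, 64 * KB),
]


def region_of(address):
    """Return the name of the listed region containing `address`, or None."""
    for name, base, size in REGIONS:
        if base <= address < base + size:
            return name
    return None  # not covered by the regions listed above


print(region_of(0x01400000))  # internal program RAM
print(region_of(0x80000004))  # internal data RAM
print(region_of(0x01410000))  # None
```

A lookup like this is handy when checking linker command files or debugging a stray pointer against the device's address space.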

Data memory access

The CPU and DMA controller requests are made through the Data MEMory Controller (DMEMC).
The DMA controller only uses the DMEMC for internal data memory access.
The CPU uses the DMEMC for data requests to internal data memory as well as to on-chip peripherals or to the External Memory InterFace (EMIF).
For the CPU/DMA access, the arbitration is performed by the DMEMC.

Cache architecture

Data memory access

CPU-DMEMC memory bank connections (C6201 Rev.2)

External memory interface (EMIF)

The external memory interface (EMIF) connects the CPU and external memory, such as synchronous dynamic RAM (SDRAM), synchronous burst static RAM (SBSRAM), and asynchronous memory.
The EMIF also provides 8-bit-wide and 16-bit-wide memory read capability to support low-cost boot ROM memories (flash, EEPROM, EPROM, and PROM).
The EMIF supports high throughput interfaces to SDRAM, including burst capability.

Internal memory

The amount and location of internal memory depend on the particular device. The C6201, C6202, and C6701 have separate program and data memories (Harvard architecture), while the C6211 has a portion of its internal memory that may be used for either program or data.

16.5 The TMS320C6xxx Peripherals

TMS320C6000 Peripherals

Host-Port Interface (HPI)

Direct Memory Access (DMA) controller

The direct memory access (DMA) controller transfers data between regions in the memory map without intervention by the CPU.
The DMA allows movement of data to and from internal memory, internal peripherals, or external devices to occur in the background of CPU operation.
The DMA has four independently programmable channels allowing four different contexts for operation. In addition, a fifth (auxiliary) channel allows the DMA to service requests from the host-port interface (HPI) or the Expansion Bus (XB).

Boot configuration logic

The C62x and C67x devices provide a variety of boot configurations for proper device initialization. These configurations determine what actions the C62x/C67x performs after device reset to prepare for initialization. These boot configurations, which are set by external input pins, determine:
The memory map the device selects. The memory map determines whether internal or external memory is mapped at address 0.
The type of external memory at address 0 (if external memory is mapped at address 0)
The boot process used to initialize the memory at address 0 before the CPU is allowed to run.

Three types of boot processes are available:
No boot process (direct-execution startup): The CPU simply starts running from the memory located at address 0. When this memory location resides on SDRAM, the CPU is held until SDRAM initialization finishes.
ROM boot process: A section of external memory is copied to address 0 by the DMA/EDMA controller. Although the boot process begins when the device is released from external reset, this transfer occurs while the CPU is held in reset internally.
Host-boot process: In the host-boot process, the CPU is held in reset while the remainder of the device is released from reset. During this period, an external host can initialize the CPU's memory space as necessary through the HPI or expansion bus.

Host-Port Interface (HPI)

The Host-Port Interface (HPI) is a 16-bit wide parallel port through which a host processor can directly access the CPU's memory space.
The host device functions as a master to the interface, which increases ease of access.
The host and CPU can exchange information via internal or external memory. The host also has direct access to memory-mapped peripherals.
The HPI is connected to the internal memory via a set of registers. Either the host or the CPU may use the HPI Control register (HPIC) to configure the interface.
The host can access the host address register (HPIA) and the host data register (HPID) to access the internal memory.
The HPIC is a memory-mapped register, which allows the CPU access.

Multichannel Buffered Serial Port (McBSP)

The standard serial port interface provides:
Full-duplex communication
Double-buffered data registers, which allow a continuous data stream
Independent framing and clocking for reception and transmission
Direct interface to industry-standard codecs, analog interface chips (AICs), and other serially connected A/D and D/A devices
External shift clock generation or an internal programmable frequency shift clock

Internal timers

The C62x/C67x has two 32-bit general-purpose timers that you can use to:
Time events
Count events
Generate pulses
Interrupt the CPU
Send synchronization events to the DMA controller

The timer has two signaling modes and can be clocked by an internal or an external source.
The timer has an input pin (TINP) and an output pin (TOUT). The TINP pin can be used as a general-purpose input, and the TOUT pin can be used as a general-purpose output.
With an internal clock, the timer can signal an external A/D converter to start a conversion, or it can trigger the DMA controller to begin a data transfer.
With an external clock, the timer can count external events and interrupt the CPU after a specified number of events.

Multichannel Buffered Serial Port (McBSP) Registers

Interrupts

The C62x/C67x CPU has 14 interrupts. These are reset, the nonmaskable interrupt (NMI), and interrupts 4-15. These interrupts correspond to the RESET, NMI, and INT4-INT15 signals on the CPU boundary.
In some C62x/C67x devices, these signals may be tied directly to pins on the device, connected to on-chip peripherals, or may be disabled permanently by being tied inactive on chip.
Generally, RESET and NMI are connected directly to pins on the device.

Characteristics of interrupt servicing include:
The IACK pin from the CPU is used to acknowledge an interrupt request.
The INUM0-INUM3 pins indicate which interrupt vector is being serviced.
Interrupt vectors are relocatable.
Interrupt vectors consist of one fetch packet, which provides for quick servicing.

Interrupt control registers

When an interrupt occurs, the CPU automatically recognizes the source of the interrupt and jumps to the interrupt vector location.
Each vector location can accommodate eight instructions, which correspond to a fetch packet. Such a location is known as the Interrupt Service Fetch Packet (ISFP) address.
When the CPU branches to the ISFP address, the ISP can be fitted in a single fetch packet or in multiple (successive or non-successive) fetch packets.
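Because each vector holds one fetch packet of eight instructions, consecutive ISFP addresses are one fetch packet apart. The Python sketch below computes an ISFP address from a vector slot number, assuming 32-bit instructions (so 32 bytes per fetch packet), slot 0 for reset, slot 1 for NMI, and slot n for INTn. The function name and the `table_base` parameter are hypothetical; the base stands in for wherever the relocatable vectors have been placed.

```python
FETCH_PACKET_BYTES = 8 * 4  # eight instructions, assumed 4 bytes each


def isfp_address(slot, table_base=0x0000):
    """Address of the interrupt service fetch packet for a vector slot (0-15)."""
    if not 0 <= slot <= 15:
        raise ValueError("vector slot must be in the range 0-15")
    return table_base + slot * FETCH_PACKET_BYTES


print(hex(isfp_address(0)))          # 0x0   (reset)
print(hex(isfp_address(1)))          # 0x20  (NMI)
print(hex(isfp_address(4)))          # 0x80  (INT4)
print(hex(isfp_address(15)))         # 0x1e0 (INT15)
print(hex(isfp_address(4, 0x0400)))  # 0x480 (INT4 with relocated vectors)
```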

Interrupt Service Table (IST)

Interrupt source    ISFP address
Reset               0x0000
NMI                 0x0020
Reserved            0x0040
Reserved            0x0060
INT4                0x0080
INT5                0x00A0
INT6                0x00C0
INT7                0x00E0
INT8                0x0100
INT9                0x0120
INT10               0x0140
INT11               0x0160
INT12               0x0180
INT13               0x01A0
INT14               0x01C0
INT15               0x01E0

16.6 The TMS320C6000 Software Development

Instruction set

Instruction syntax

Label
Mnemonic
    Instruction
    Directive
Unit specifier
Operand field
    A register (e.g. A1) or a register pointer (e.g. *A1)
    A symbol (e.g. loop) or constant (e.g. 390)
    An expression (e.g. if label >= 100)
Comment field

Fetch phases of the pipeline

The fetch phases of the pipeline are:
PG: Program address generate
PS: Program address send
PW: Program access ready wait
PR: Program fetch packet receive

Example of assembly language syntax

Decode phases of the pipeline

The decode phases of the pipeline are:
DP: Instruction dispatch
DC: Instruction decode

Pipeline operation

The pipeline phases are divided into three stages:
Fetch
Decode
Execute
All instructions in the C62x/C64x instruction set flow through the fetch, decode, and execute stages of the pipeline.
The fetch stage of the pipeline has four phases for all instructions, and the decode stage has two phases for all instructions. The execute stage of the pipeline requires a varying number of phases, depending on the type of instruction.
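The stage structure described above (four fetch phases, two decode phases, and a variable number of execute phases) can be sketched in Python as follows. This is an idealized model that assumes one packet enters the pipeline per cycle with no stalls; the execute-phase labels E1, E2, ... and both function names are assumptions made for the illustration.

```python
FETCH_PHASES = ["PG", "PS", "PW", "PR"]
DECODE_PHASES = ["DP", "DC"]


def phases(execute_phases):
    """Ordered list of phases an instruction passes through."""
    return FETCH_PHASES + DECODE_PHASES + [f"E{i}" for i in range(1, execute_phases + 1)]


def completion_cycle(packet_index, execute_phases):
    """Cycle (counting from 1) in which packet number `packet_index` (from 0) finishes."""
    return packet_index + len(phases(execute_phases))


print(phases(1))               # ['PG', 'PS', 'PW', 'PR', 'DP', 'DC', 'E1']
print(completion_cycle(0, 1))  # 7
print(completion_cycle(7, 1))  # 14
```

Once the pipeline is full, one packet completes per cycle in this model, which is why the eighth packet finishes only seven cycles after the first.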

Execution phases of the pipeline

Operations occurring during pipeline phases

Pipeline operation: One execute packet per fetch packet

Software development tools

A complete development tool set for both the PC and Sun workstations includes the following:
C compiler
Assembly optimizer
Assembler
Linker
Evaluation tools
The C6000 platform's C compiler eliminates the need for extensive knowledge of DSP architecture while the inherent performance benefits of the advanced VLIW architecture are maintained.


Process to develop DSP code phase 1

TMS320C6201 Evaluation Module, EVM

The C6x EVM is a full-size PCI board with approximate dimensions of 4.2 inches wide, 12.28 inches long, and 0.49 inches high.
The C6x EVM can be operated stand-alone on a desktop with the use of an external power supply and XDS510 or XDS510WS emulator.
The C6x EVM has a C6201 or C6701 DSP onboard that allows full-speed verification of C6x code with the included source debugger.

Process to develop DSP code phase 2

Process to develop DSP code phase 3

EVM features

Uses the TMS320C6201 DSP processor
DSP clock: 32.25 MHz, 50 MHz, 133 MHz or 200 MHz
External memory:
    64K x 32, 133 MHz SBSRAM
    1M x 32, 100 MHz SDRAM (bank 0)
    1M x 32, 100 MHz SDRAM (bank 1)
Expansion memory interface provided
PCI or JTAG Emulator Interface
Stereo codec interface (44.1 kHz sampling)
Code generation tools and support library included

C6000 EVM