DSPAA Notes

1. A DSP system consists of several key blocks: an anti-aliasing filter, an ADC, and a DSP processor. Issues to consider in design include arithmetic format, data width, speed, memory organization, ease of development, and cost.
2. Shifters are needed in DSP applications for scaling operands to prevent overflow and underflow. A 4-bit barrel shifter allows shifting the input by 0-3 bit positions using a multiplexer structure.
3. The MAC unit performs multiply-accumulate operations efficiently for DSP algorithms. It contains a multiplier and an accumulator. The ALU performs basic arithmetic and logical operations.

Uploaded by Kushagra Sharma
Copyright © All Rights Reserved
DSPA

1. Sketch a neat figure of DSP system and explain the various blocks. Explain
the issues to be considered in designing and implementing a DSP system with
a neat figure

a.) Anti-aliasing filter: An anti-aliasing filter (AAF) is a filter used before a signal sampler to
restrict the bandwidth of a signal to satisfy the Nyquist–Shannon sampling theorem over the band
of interest.

b.) ADC: In electronics, an analog-to-digital converter (ADC, A/D, or A-to-D) is a system that
converts an analog signal, such as a sound picked up by a microphone or light entering a digital
camera, into a digital signal.

c.) DSP: Digital signal processing (DSP) is the use of digital processing, such as by computers or
more specialized digital signal processors, to perform a wide variety of signal processing
operations.

Issues to be considered in designing a DSP system are:

a.) Arithmetic Format. One of the most fundamental characteristics of a programmable digital
signal processor is the type of native arithmetic used in the processor.

b.) Data Width

c.) Speed

d.) Memory Organization. ...

e.) Ease of Development. ...

f.) Multiprocessor Support. ...

g.) Power Consumption and Management. ...

h.) Cost.

2. Explain the need for shifters in DSP applications. Also explain the implementation of a 4-bit Barrel Shifter and its working, showing its block diagram
Shifters are used to either scale down or scale up operands or results. The following scenarios show why a shifter is necessary.

A. When adding N numbers, each n bits long, the sum can grow up to n + log2 N bits long. If the accumulator is only n bits long, an overflow error will occur. This can be overcome by using a shifter to scale each operand down by log2 N bits before the addition.

B. Similarly, when calculating the product of two n-bit numbers, the product can grow up to 2n bits long. Generally the lower n bits are discarded, and the product is shifted so that the sign of the product is preserved in the retained bits.

C. Finally, when adding two floating-point numbers, one of the operands has to be shifted appropriately to make the exponents of the two numbers equal.

From the above cases it is clear that a shifter is required in the architecture of a DSP.

Explanation and implementation of a 4-bit barrel shifter:


3. Describe the functionality of MAC and ALU unit in DSP processors with a
neat figure
Most DSP applications require computing the sum of the products of a series of successive multiplications. To implement such functions, a special unit called a Multiply and Accumulate (MAC) unit is required.

A MAC consists of a multiplier and a special register called the Accumulator. MACs are used to implement functions of the type A + BC. A typical MAC unit is as shown in the figure.
ALU:

A typical DSP device should be capable of handling arithmetic instructions like ADD, SUB, INC, DEC, etc., and logical operations like AND, OR, NOT, XOR, etc. The block diagram of a typical ALU for a DSP is as shown in the figure:


4. What is Overflow/underflow? Explain its need and the various ways of
controlling overflow/underflow in DSP computations.
Overflow occurs when an operation, such as the addition of two numbers, produces a result with more bits than can fit within a processor's register.
Underflow occurs when a result is less than the minimum value the format can represent.
Its Need:
Overflow and underflow must be controlled so that the resulting values are as close as possible to the "real" values we would get if operating without constraints.
5. a. Briefly explain how the Circular addressing mode is used in DSP
processors
b. A DSP has a circular buffer with start and end addresses 0200h and 0310h. What is the circular buffer size? What would be the new values of the address pointer of the above buffer if, in the course of address computation, it gets updated to i) 0212h ii) 01FCh?

c. Repeat the problem if the start and end addresses of the circular buffer are 0210h and 0201h, respectively.
b. & c.
6. With the help of a neat block diagram explain the need and operations of
a. Address Generation Unit
b. Program Sequencer
7. Explain the implementation of an 8-tap FIR Filter given by y(n) = Σ h(i) x(n-i) using

a. Parallel Implementation

b. Pipeline Implementation

SAME AS Q.13

8. Explain the significance of Q-notation in DSP processors


DSP algorithm implementations deal with signals and coefficients. To use a fixed-point DSP device efficiently, one must consider representing filter coefficients and signal samples using fixed-point 2's complement representation. Ex: for N = 16 bits, the range is $-2^{N-1}$ to $2^{N-1}-1$ (-32768 to 32767). Typically, filter coefficients are fractional numbers. To represent such numbers, the Q-notation has been developed. The Q-notation specifies the number of fractional bits.

A commonly used notation for DSP implementations is Q15. In the Q15 representation, the least significant 15 bits represent the fractional part of a number. In a processor where 16 bits are used to represent numbers, the Q15 notation uses the MSB to represent the sign of the number, and the rest of the bits represent the value of the number. In general, the value of a 16-bit Q15 number N represented as $b_{15} \dots b_1 b_0$ is

$N = -b_{15} + b_{14}2^{-1} + b_{13}2^{-2} + \dots + b_0 2^{-15}$

Range: $-1$ to $1 - 2^{-15}$

9. What is an Interpolation filter? Explain the need and working of Interpolation and Decimation filters in DSP
10. Explain the working and implementation of FIR and IIR filters
11. Find the magnitude and phase response of an FIR filter represented by the difference equation

y (n) = 0.5 x(n) + 0.5 x(n-1)
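One way to work Q.11, using the same $H(e^{j\theta})$ notation as the FIR frequency response in Q.15: take the frequency response and factor out $e^{-j\theta/2}$.

```latex
H(e^{j\theta}) = 0.5 + 0.5\,e^{-j\theta}
               = e^{-j\theta/2}\,\frac{e^{j\theta/2} + e^{-j\theta/2}}{2}
               = e^{-j\theta/2}\cos(\theta/2)

|H(e^{j\theta})| = |\cos(\theta/2)|, \qquad
\angle H(e^{j\theta}) = -\theta/2 \quad (-\pi < \theta < \pi)
```

The magnitude falls from 1 at $\theta = 0$ to 0 at $\theta = \pi$ (a low-pass averager), and the phase is linear, as expected for an FIR filter with symmetric coefficients.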


12. Obtain the transfer function of the IIR filter whose difference equation is given by

y (n)= 0.9 y (n-1) + 0.1 x (n)
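Taking the z-transform of both sides (zero initial conditions assumed) gives the transfer function, which matches the general IIR form in Q.15 with $b_0 = 0.1$ and $a_1 = 0.9$:

```latex
Y(z) = 0.9\,z^{-1}Y(z) + 0.1\,X(z)
Y(z)\,(1 - 0.9\,z^{-1}) = 0.1\,X(z)
H(z) = \frac{Y(z)}{X(z)} = \frac{0.1}{1 - 0.9\,z^{-1}}
```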


13. Consider the implementation of an 8-tap FIR filter given by

y (n) =Σ h(i) x(n-i)

Illustrate the pipelined implementation of this filter using

i. Eight MACs

ii. Parallel Implementation using Two MACs


14.

a. Examine the structure and working of a 4 x 4 Braun Multiplier. Illustrate the effects of speed and bus widths in the multipliers and explain how they are handled.

b. Consider a MAC unit whose inputs are 16-bit numbers. If 256 products
are to be summed up in this MAC, how many guard bits should be provided
for the accumulator to prevent overflow condition from occurring?
b.
15. Interpret the need and role of Digital filters. Sketch a neat figure and explain
the structure of a Digital filter. Compare the salient features of FIR and IIR filters
with relevant equations showing their advantages and drawbacks

Filters are used to remove unwanted components from a sequence. They are characterized by the impulse response h(n). The general difference equation for an Nth order filter is given by

$y(n) = \sum_{k=1}^{N} a_k y(n-k) + \sum_{k=0}^{L} b_k x(n-k)$

A typical digital filter structure is as shown in figure 1.7.


Values of the filter coefficients vary with respect to the type of the filter. Design of a digital
filter involves determining the filter coefficients. Based on the length of the impulse response,
digital filters are classified into two categories viz Finite Impulse Response (FIR) Filters and
Infinite Impulse Response (IIR) Filters.

FIR Filters
FIR filters have impulse responses of finite length. In FIR filters the present output depends only on the past and present values of the input sequence, not on previous output samples. Thus they are non-recursive and hence inherently stable.
FIR filters can be designed to have an exactly linear phase response (for example, when the coefficients are symmetric). Hence they are well suited to applications requiring a linear phase response.
The difference equation of an FIR filter is $y(n) = \sum_{k=0}^{L} b_k x(n-k)$.
The frequency response of an FIR filter is $H(e^{j\theta}) = \sum_{k=0}^{L} b_k e^{-jk\theta}$,
and its transfer function is $H(z) = \sum_{k=0}^{L} b_k z^{-k}$.
The major drawback of FIR filters is that they require more filter coefficients than IIR filters to realize a desired response. Thus the computational time required is also greater.

IIR Filters
Unlike FIR filters, IIR filters have an infinite number of impulse response samples. They are recursive filters, as the output depends not only on the past and present inputs but also on past outputs. They generally do not have linear phase characteristics. The typical system function of such filters is given by

$H(z) = \dfrac{b_0 + b_1 z^{-1} + b_2 z^{-2} + \dots + b_L z^{-L}}{1 - a_1 z^{-1} - a_2 z^{-2} - \dots - a_N z^{-N}}$
16. Compute the indices for a 16-point FFT using Bit-Reversed Addressing Mode, showing all the intermediate steps

17. What are the memory addresses of the operands in each of the following cases
of indirect addressing modes? In each case, what will be the content of the addreg
after the memory access? Assume that the initial contents of the addreg and the
offsetreg are 0200h and 0010h, respectively.

ADD *addreg-

ADD +*addreg

ADD offsetreg+,*addreg

ADD *addreg, offsetreg-

18.

a. Interpret the need for shifters in DSP applications. Also explain the implementation of a 4-bit Barrel Shifter and its working, showing its block diagram

SAME AS Q.2
b. It is required to find the sum of 128 numbers each represented by 32-bits.
How many bits should the accumulator have so that the sum can be
computed without the occurrence of overflow error or loss of accuracy?

The sum of 128 32-bit numbers can grow up to (32 + log2 128) = 39 bits long. Hence the accumulator should be 39 bits long in order to avoid overflow.

c. If, for the previous problem, it is decided to have an accumulator with only 32 bits but shift the numbers before the addition to prevent overflow, by how many bits should each number be shifted?

As the length of the accumulator is fixed, the operands have to be shifted by log2 128 = 7 bits prior to the addition, in order to avoid overflow.

19. What are guard bits? Interpret its need in a MAC unit of a DSP device.
Consider a MAC unit whose inputs are 24-bit numbers. How many guard bits
should be provided if 512 products have to be added in the accumulator to prevent
overflow condition? What is the overall size of the accumulator required?
20.

a. Explain the need and working of a MAC unit in DSP applications with the
help of its block diagram. Discuss how overflow and underflow can be
overcome while performing multiplications
b. How many bits should the accumulator require to find the sum of 128
numbers each represented by 16 bits, so that the sum can be computed
without the occurrence of overflow error or loss of accuracy?

The sum of 128 16-bit numbers can grow up to (16 + log2 128) = 23 bits long. Hence the accumulator should be 23 bits long in order to avoid overflow.
21. Explain the implementation of an 8-tap FIR Filter given by

y(n) = Σ h(i) x(n-i) using Parallel and Pipeline Implementation

SAME AS Q.13

22. Explain the implementation of a 4-bit shift right barrel shifter with a neat figure.
A Barrel Shifter is to be designed with 16 inputs for left shifts from 0 to 15 bits. How
many control lines are required to implement the shifter?

As the input is 16 bits wide and there are 16 possible shift amounts (0 to 15), log2 16 = 4 control lines are required.
UNIT 2

1. Examine the data types and floating point formats supported in TMS320C6x processors showing the respective ranges and resolution

Some data types are:


2. Write an Assembly program calling an Assembly function to compute the dot product of two given arrays using the instruction set of TMS320C6x architecture showing relevant comments
3. Examine the functionality of prolog, loop kernel and epilog with relevant examples
Software pipelining uses available resources to obtain efficient pipelined code. The aim is to use all eight functional units within one cycle. However, substantial coding effort is required using the software pipelining technique. There are three stages to a pipelined code:
1. Prolog
2. Loop kernel (or loop cycle)
3. Epilog.

The first stage, prolog, contains instructions to build the second-stage loop cycle, and the epilog stage (last stage) contains instructions to finish all loop iterations. Software pipelining is used by the compiler when optimization option level -o2 or -o3 is invoked. The most efficient software pipelined code has loop trip counters that count down: for example,

for (i = N; i != 0; i--)

A dot product example with word-wide hand-coded pipelined code results in (N/2) + 8 cycles to obtain the sum of products of two arrays, with N numbers in each array. This translates to 108 cycles to find the sum of products of 200 numbers, as illustrated in Chapter 8. This efficiency is obtained using instructions such as LDW to load a 32-bit word, and multiplying the lower and higher 16-bit numbers separately with the two instructions mpy and mpyh, respectively.
Removing the epilog section can also reduce the code size. The available options -msn (n = 0,1,2) direct the compiler to favor code size reduction over performance.
Hand-coded software pipelined code can be produced by first drawing a dependency graph and setting up a scheduling table [8]. In Chapter 8 we discuss software pipelining in conjunction with code efficiency.
4. Interpret the need and working of cross paths. Explain software pipelining and its various stages used in implementing DSP applications.

Software pipelining: SAME AS Q.3

5. Examine the following constraints in the memory considerations of TMS320C6x processors
i. Data Allocation
ii. Data alignment
iii. Pragma Directives
iv. Memory models

3.17.1 Data Allocation

Blocks of code and data can be allocated in memory within sections specified in the linker command file. These sections can be either initialized or uninitialized. Initialized or uninitialized sections, except .text, cannot be allocated into internal program memory. The initialized sections are:
1. .cinit: for global and static variables
2. .const: for global and static constant variables
3. .switch: contains jump tables for large switch statements
4. .text: for executable code and constants
The uninitialized sections are:
1. .bss: for global and static variables
2. .far: for global and static variables declared far
3. .stack: allocates memory for the system stack
4. .sysmem: reserves space for dynamic memory allocation used by the malloc, calloc, and realloc functions
The linker can be used to place sections, such as .text, in fast internal memory for most efficient operation.

3.17.2 Data Alignment

The C6x always accesses aligned data, which allows it to address bytes, half-words, and words (32 bits). The data format consists of four byte boundaries, two half-word boundaries, and one word boundary. For example, to perform a 32-bit load with LDW, the address must be aligned with a word boundary so that the lower 2 bits of the address are zero. Otherwise, incorrect data can be loaded. A double-word (64 bits) can also be accessed. Both .D1 and .D2 can be used to execute the double-word instruction LDDW to load two 64-bit double words, for a total of 128 bits per cycle.

3.17.3 Pragma Directives

Pragma directives tell the compiler how to treat particular symbols, functions, or loops. Pragmas include DATA_ALIGN, DATA_SECTION, and so on. The DATA_ALIGN pragma has the syntax
#pragma DATA_ALIGN (symbol,constant);
which aligns symbol to a boundary. The constant is a power of 2. This pragma directive is used later in conjunction with FFT examples to align data in memory.
The DATA_SECTION pragma has the following syntax:
#pragma DATA_SECTION (symbol,"my_section");
which allocates space for symbol in the section named my_section.
Another useful pragma directive,
#pragma MUST_ITERATE (20,20)
tells the compiler that the following loop will execute 20 times (minimum and maximum of 20 times).

3.17.4 Memory Models

The compiler generates a small memory model code by default. Every data object is handled as if declared near unless it is specifically declared far. If the DATA_SECTION pragma is used, the object is specified as a far variable. How run-time support functions are called can be controlled by the option -mr0, with the run-time support data and calls near, or by the option -mr1, with the run-time support data and calls far. Using the far method to call functions does not imply that those functions must reside in off-chip memory.
Large-memory models can be generated with the linker options -mlx (x = 0 to 4). If no level is specified, data and functions default to near. These models can be used if calling a function that is more than 1 M word away.

6. Write a C program calling an Assembly function to compute the factorial of a number using the instruction set of TMS320C6x architecture showing relevant comments
7. Write a C program to compute linear convolution of two given
sequences x1 =[1, 2, 3, 4] and x2 =[1, 2, 2, 1] for a TMS320C6x
architecture showing relevant comments

8. Examine the different types of instructions supported by TMS320C6x with relevant examples
9. Write a C program to generate a Sine and Random waveform for the
TMS320C6x architecture showing relevant comments
10. What are Assembler Directives? Examine the commonly used assembler directives in TMS320C6x processor
An assembler directive is a message for the assembler (not the compiler) and is not an instruction. It is resolved during the assembling process and does not occupy memory space as an instruction does. It does not produce executable code.
Addresses of different sections can be specified with assembler directives. For example, the assembler directive .sect "my_buffer" defines a section of code or data named my_buffer. The directives .text and .data indicate a section for text and data, respectively. Other assembler directives, such as .ref and .def, are used for undefined and defined symbols, respectively. The assembler creates several sections indicated by directives such as .text for code and .bss for global and static variables.
Other commonly used assembler directives are:
1. .short: to initialize a 16-bit integer.
2. .int: to initialize a 32-bit integer (also .word or .long). The compiler treats a long data value as 40 bits, whereas the C6x assembler treats it as 32 bits.
3. .float: to initialize a 32-bit IEEE single-precision constant.
4. .double: to initialize a 64-bit IEEE double-precision constant.
Initialized values are specified by using the assembler directives .byte, .short, or .int. Uninitialized variables are specified using the directive .usect, which creates an uninitialized section (like the .bss section), whereas the directive .sect creates an initialized section. For example, .usect "variable", 128, 2 designates an uninitialized section named variable, the section size in bytes, and the data alignment in bytes, respectively.

11. Examine the different constraints in the memory considerations of TMS320C6x processors and how they can be solved with examples

12. Examine the memory constraints, cross-path constraints, load/store constraints and pipelining constraints in TMS320C6x processor
