Data Path Subsystem Design

Unit 4 notes: VLSI important topics (class notes), R18 ECE (Electronics and Communication Engineering)


Data path Subsystem

By
A N SAI CHAKRAVARTHY
M.Tech (VLSI System Design)
One/zero Detectors
• Detecting all ones or zeros on wide N-bit words requires large fan-in AND or NOR gates.
• By De Morgan's law, AND, OR, NAND, and NOR are fundamentally the same operation except for possible inversions of the inputs and/or outputs. You can build a tree of AND gates, as shown in the figure.
• Here, alternate NAND and NOR gates have been used.
• The path has log2 N stages. In general, the minimum logical effort is achieved with a tree alternating NAND gates and inverters, and the path logical effort is
  $G = (4/3)^{\log_2 N} = N^{\log_2(4/3)} \approx N^{0.415}$

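• A rough estimate of the path delay driving a path electrical effort of H using static CMOS gates is
  $D \approx \log_4(F)\, t_{FO4}$, with path effort F = GH,
  where G is the path logical effort of the tree and tFO4 is the fanout-of-4 inverter delay (each FO4 delay covers a factor of 4 in path effort).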


Fig: zero/one detectors

Fig: Fast detector using pseudo nMOS
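The detector's function and the logical-effort expression can be checked with a small behavioral model. This is a sketch under stated assumptions, not a circuit: it evaluates the AND tree functionally (it does not model the alternating NAND/NOR gates), and the helper names `and_tree`, `ones_detect`, `zeros_detect`, and `path_logical_effort` are hypothetical.

```python
import math


def and_tree(bits):
    """Reduce 0/1 bits with a tree of 2-input ANDs.

    Returns (result, levels); each level halves the number of signals,
    so an N-bit word needs ceil(log2 N) levels.
    """
    level = list(bits)
    levels = 0
    while len(level) > 1:
        nxt = [level[i] & level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an odd leftover signal passes straight through
            nxt.append(level[-1])
        level = nxt
        levels += 1
    return level[0], levels


def ones_detect(bits):
    return and_tree(bits)[0]


def zeros_detect(bits):
    # All zeros = NOR of the bits = AND of the complemented bits (De Morgan)
    return and_tree([1 - b for b in bits])[0]


def path_logical_effort(n):
    # log2 N levels of NAND2 (g = 4/3), each followed by an inverter (g = 1)
    return (4 / 3) ** math.log2(n)
```

For N = 16 the tree has log2 16 = 4 levels, and the NAND2/inverter path logical effort is (4/3)^4 ≈ 3.16.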


Comparators
Magnitude comparator:
• A magnitude comparator determines the larger of two binary numbers.
• To compare two unsigned numbers A and B, compute B – A = B + ~A + 1, where ~A is the bitwise complement of A.
• If there is a carry-out, A ≤ B; otherwise, A > B. A zero detector indicates that the numbers are equal.
• A 4-bit unsigned comparator built from a carry-ripple adder and two’s complementer is shown in the figure.
• The relative magnitude is determined from the carry-out (C) and zero (Z) signals according to the table.
Equality comparator:
• An equality comparator determines if A = B. This can be done more simply and rapidly with XNOR gates and a ones detector, as shown in the figure below.
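A word-level sketch of both comparators, assuming unsigned N-bit operands held in Python integers; the function names are illustrative. `magnitude_compare` forms B + ~A + 1 and returns the carry-out C and zero flag Z, `relations` derives the usual comparisons from them, and `equal` mirrors the XNOR-plus-ones-detector structure.

```python
def magnitude_compare(a, b, n):
    """N-bit unsigned comparator built from B + ~A + 1 (i.e. B - A).

    Returns (C, Z): C is the adder carry-out, Z is 1 when the N-bit
    difference is zero.
    """
    mask = (1 << n) - 1
    total = b + (~a & mask) + 1        # two's complement subtraction B - A
    c = (total >> n) & 1               # carry-out of the N-bit adder
    z = int((total & mask) == 0)       # zero detector on the sum bits
    return c, z


def relations(c, z):
    """Decode the relative magnitude from carry-out C and zero flag Z."""
    return {
        "A=B": z == 1,
        "A!=B": z == 0,
        "A<=B": c == 1,
        "A>B": c == 0,
        "A<B": c == 1 and z == 0,
        "A>=B": c == 0 or z == 1,
    }


def equal(a, b, n):
    # XNOR each bit pair, then AND (ones-detect) the N results
    xnor = [1 - (((a >> i) & 1) ^ ((b >> i) & 1)) for i in range(n)]
    return int(all(xnor))
```

Checking all 4-bit pairs against ordinary integer comparison is a quick way to confirm the C/Z decoding.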
Counters
• Two commonly used types of counters are binary counters and linear-feedback shift registers.
• An N-bit binary counter sequences through 2^N outputs in binary order.
• An N-bit linear-feedback shift register sequences through up to 2^N – 1 outputs in pseudo-random order.
• The LFSR has a short minimum cycle time independent of N, so it is useful for extremely fast counters as well as pseudo-random number generation.
• Some of the common features of counters include the
following:
• Resettable: counter value is reset to 0 when RESET is
asserted (essential for testing)
• Loadable: counter value is loaded with N-bit value when
LOAD is asserted
• Enabled: counter counts only on clock cycles when EN is
asserted
• Reversible: counter increments or decrements based on
UP/DOWN input
• Terminal Count: TC output asserted when counter
overflows (when counting up) or underflows (when
counting down)
– In general, divide-by-M counters (M < 2^N) can be built using an ordinary N-bit counter and circuitry to reset the counter upon reaching M.
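A behavioral sketch of a counter with the features listed above. The class name, method name, and signal arguments (`reset`, `load`, `d`, `en`, `up`) are hypothetical, and this model gives reset priority over load and load priority over enable, which is one reasonable choice rather than a fixed rule. The divide-by-M helper uses a synchronous reset when the count reaches M − 1, a variant of the reset-on-M scheme above that yields exactly M states.

```python
class Counter:
    """Behavioral N-bit synchronous counter: resettable, loadable, enabled,
    reversible, with a terminal-count (TC) output. Priority in this sketch:
    reset, then load, then enabled counting."""

    def __init__(self, n):
        self.mask = (1 << n) - 1
        self.q = 0

    def clock(self, reset=0, load=0, d=0, en=1, up=1):
        """Apply one clock edge. Returns TC as seen just before the edge:
        1 when enabled and about to overflow (up) or underflow (down)."""
        if up:
            tc = int(bool(en) and self.q == self.mask)
        else:
            tc = int(bool(en) and self.q == 0)
        if reset:
            self.q = 0
        elif load:
            self.q = d & self.mask
        elif en:
            step = 1 if up else -1
            self.q = (self.q + step) & self.mask   # wrap modulo 2**N
        return tc


def divide_by_m(m, n, cycles):
    """Divide-by-M sketch: an N-bit counter synchronously reset when it
    reaches M - 1, so it cycles through exactly M states (M < 2**N)."""
    counter = Counter(n)
    values = []
    for _ in range(cycles):
        values.append(counter.q)
        counter.clock(reset=int(counter.q == m - 1))
    return values
```

For example, `divide_by_m(5, 3, 12)` steps a 3-bit counter through 0, 1, 2, 3, 4 and then back to 0.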
Binary Counters
• The simplest binary counter is the asynchronous ripple-carry counter, as shown in the figure.
• It is composed of N registers connected in toggle configuration, where the falling transition of each register clocks the subsequent register.
• Therefore, the delay can be quite long. It has no reset signal, making it difficult to test.
• A general synchronous up/down counter is shown in Figure 11.49(a).
• It uses a resettable register and full adder for each bit position. The cycle time is limited by the ripple-carry delay.
• If only an up counter (also called an incrementer) is required, the full adder degenerates into a half adder, as shown in Figure 11.49(b).
• Figure 11.50 shows a fully featured resettable loadable
enabled synchronous up/down counter.
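The two designs above can be compared with a bit-level sketch (hypothetical helper names; `bits[0]` is the LSB). `increment` is the half-adder chain of the up counter, while `ripple_counter_step` models the asynchronous counter, in which each bit toggles only on a falling transition of the previous bit, so a carry from bit 0 to bit N − 1 must pass through every register.

```python
def half_adder(a, b):
    return a ^ b, a & b                     # (sum, carry)


def increment(bits):
    """Synchronous incrementer: bits[0] is the LSB. A carry-in of 1 ripples
    through one half adder per bit position."""
    carry = 1
    out = []
    for b in bits:
        s, carry = half_adder(b, carry)
        out.append(s)
    return out, carry                        # carry = 1 when the count wraps


def ripple_counter_step(bits):
    """Asynchronous ripple counter: bit 0 toggles on every clock, and each
    later bit toggles only when the previous bit falls from 1 to 0."""
    out = list(bits)
    for i in range(len(out)):
        out[i] ^= 1
        if out[i] == 1:                      # rising transition: the ripple stops
            break
    return out


def value(bits):
    return sum(b << i for i, b in enumerate(bits))
```

Both step through the same binary sequence; they differ in timing, which this functional model does not capture.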
Fast Binary Counters
• The speed of the counter in Figure 11.49 is limited by the adder.
• This can be overcome by dividing the counter into two or more
segments.
• For example, a 32-bit counter could be constructed from a 4-bit prescaler counter and a 28-bit counter, as shown in Figure 11.51.
• The TC output of the prescaler enables counting on the more significant segment. Now, the cycle time is limited only by the prescaler speed because the 28-bit adder has 2^4 = 16 cycles to produce a result.
• Prescaling does not suffice for up/down counters because the
more significant segment may have only a single cycle to
respond when the counter changes direction.
• To solve this, a shadow register can be used on the more
significant segments to hold the previous value that should be
used when the direction changes.
• Figure 11.52 shows the more significant segment
for a fast up/down counter. On reset (not shown in
the figure), the dir register is set to 0, Q to 0, and
shadow to –1.
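A cycle-by-cycle sketch of the prescaled up counter, assuming a 4-bit prescaler and a 28-bit upper segment as in the example above (the function and variable names are hypothetical, and the shadow-register up/down scheme is not modeled). It records the cycles on which the upper segment is enabled, showing that the upper segment only needs to produce a new value once every 2^4 = 16 cycles.

```python
def prescaled_count(cycles, pre_bits=4, hi_bits=28):
    """Counter split into a low prescaler segment and a high segment that
    counts only when the prescaler's terminal count (TC) is asserted.
    Returns (combined count, list of cycles on which the high segment counted)."""
    lo, hi = 0, 0
    lo_max = (1 << pre_bits) - 1
    hi_mask = (1 << hi_bits) - 1
    hi_updates = []
    for cycle in range(cycles):
        tc = lo == lo_max                # prescaler about to wrap
        lo = (lo + 1) & lo_max
        if tc:                           # TC enables the more significant segment
            hi = (hi + 1) & hi_mask
            hi_updates.append(cycle)
    return (hi << pre_bits) | lo, hi_updates
```

After 1000 cycles the combined value is 1000, the same as an ordinary 32-bit counter, while the upper segment was enabled only on every sixteenth cycle.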
Ring and Johnson Counters
• A ring counter consists of an M-bit shift register
with the output fed back to the input, as shown in
Figure 11.53(a).
• On reset, the first bit is initialized to 1 and the
others are initialized to 0. TC toggles once every M
cycles.
• A Johnson or Mobius counter is similar to a ring
counter, but inverts the output before it is fed back
to the input, as shown in Figure 11.53(b).
• The flip-flops are reset to all zeros and count
through 2M states before repeating.
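A minimal state-sequence sketch of both counters (hypothetical function names; the state is a tuple of register outputs, first register first). The ring counter circulates a single 1, and the Johnson counter feeds back the inverted last bit.

```python
def ring_counter(m, cycles):
    state = [1] + [0] * (m - 1)              # reset: first bit 1, others 0
    states = []
    for _ in range(cycles):
        states.append(tuple(state))
        state = [state[-1]] + state[:-1]     # last output fed back to the input
    return states


def johnson_counter(m, cycles):
    state = [0] * m                          # reset: all zeros
    states = []
    for _ in range(cycles):
        states.append(tuple(state))
        state = [1 - state[-1]] + state[:-1]  # inverted feedback
    return states


def period(states):
    """Cycles until the reset state first reappears."""
    return states.index(states[0], 1)
```

For M = 4 the ring counter repeats every 4 cycles and the Johnson counter every 8, matching the M and 2M state counts above.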
Linear-Feedback Shift Registers
• A linear-feedback shift register (LFSR) consists of
N registers configured as a shift register.
• The input to the shift register comes from the XOR
of particular bits of the register, as shown in Figure
11.54 for a 3-bit LFSR.
• On reset, the registers must be initialized to a
nonzero value (e.g., all 1s). The pattern of outputs
for the LFSR is shown in Table 11.7.
• The inputs fed to the XOR are called the tap sequence and are often specified with a characteristic polynomial. For example, this 3-bit LFSR has the characteristic polynomial 1 + x^2 + x^3 because the taps come after the second and third registers.
• LFSRs are used for high-speed counters and pseudo-random number generators. The pseudo-random sequences are handy for built-in self-test and bit-error-rate testing in communications links.
• Table 11.8 lists characteristic polynomials for some
commonly used maximal-length LFSRs. For certain
lengths, N, more than two taps may be required. For
many values of N, there are multiple polynomials
resulting in different maximal-length LFSRs.
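A sketch of the 3-bit LFSR as a Fibonacci-style shift register, assuming the state is listed as (Q1, Q2, Q3), Q1 receives Q2 XOR Q3, and each register shifts into the next; these labels and the function names are illustrative, not taken from Figure 11.54. Starting from all 1s, it visits all 2^3 − 1 = 7 nonzero states before repeating. The period helper assumes the last register is tapped (so every state has a unique predecessor), and the all-zero seed locks up, which is why the registers must be reset to a nonzero value.

```python
from itertools import islice


def lfsr_states(taps, seed):
    """Yield successive states (Q1, ..., QN) of a Fibonacci-style LFSR.

    Q1 receives the XOR of the tapped registers (taps are 1-based register
    positions); every other register takes the value of the one before it.
    """
    state = tuple(seed)
    while True:
        yield state
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = (fb,) + state[:-1]


def first_states(taps, seed, count):
    return list(islice(lfsr_states(taps, seed), count))


def lfsr_period(taps, seed):
    """Cycles until the seed state recurs. Assumes the last register is
    tapped, so the next-state map is invertible and the seed must recur."""
    gen = lfsr_states(taps, seed)
    start = next(gen)
    for i, state in enumerate(gen, start=1):
        if state == start:
            return i
```

With taps (2, 3) and seed (1, 1, 1) the sequence is 111, 011, 001, 100, 010, 101, 110, then 111 again.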
Coding
• Error-detecting and error-correcting codes are
used to increase system reliability.
• The simplest form of error-detecting code is
parity, which detects single-bit errors.
• More elaborate error-correcting codes (ECC) are
capable of single-error correcting and double-
error detecting (SEC-DED). Gray codes are
another useful alternative to the standard
binary codes.
• All of the codes are heavily based on the XOR
function
Parity generator
• A parity bit can be added to an N-bit word to indicate whether the number of 1s in the word is even or odd. In even parity, the extra bit is the XOR of the other N bits, which ensures the (N + 1)-bit coded word has an even number of 1s:
  $PG = A_0 \oplus A_1 \oplus \cdots \oplus A_{N-1}$
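A minimal sketch of an even-parity generator and checker (hypothetical names), with words as lists of 0/1 bits. The checker XORs all N + 1 bits, so any single-bit error produces a 1, while two flipped bits cancel and go undetected.

```python
from functools import reduce
from operator import xor


def even_parity_bit(word):
    """Even-parity generator: XOR of all N data bits."""
    return reduce(xor, word, 0)


def parity_error(coded):
    """Checker: XOR of all N + 1 coded bits; 1 means an odd number of bits flipped."""
    return reduce(xor, coded, 0)
```

For the word 1011 the parity bit is 1, giving the coded word 10111 with four 1s.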
Shifters
• Shifts can either be performed by a constant or
variable amount.
• Constant shifts are trivial in hardware,
requiring only wires. They are also an efficient
way to perform multiplication or division by
powers of two.
• A variable shifter takes an N-bit input, A, a shift amount, k, and control signals indicating the shift type and direction. It produces an N-bit output, Y.
• There are three common types of variable shifts,
each of which can be to the left or right:
• Rotate: Rotate numbers in a circle such that empty
spots are filled with bits shifted off the other end
○ Example: 1011 ROR 1 = 1101; 1011 ROL 1 = 0111
• Logical shift: Shift the number to the left or right and fill empty spots with zeros.
○ Example: 1011 LSR 1 = 0101; 1011 LSL 1 = 0110
• Arithmetic shift: Same as logical shift, but on right shifts fills the most significant bits with copies of the sign bit (to properly sign-extend two’s complement numbers when using a right shift by k for division by 2^k).
○ Example: 1011 ASR 1 = 1101; 1011 ASL 1 = 0110
• Conceptually, rotation involves an array of N N-input multiplexers to
select each of the outputs from each of the possible input positions.
This is called an array shifter.
• The array shifter requires a decoder to produce the 1-of-N-hot shift
amount.
• A logarithmic shifter instead uses log_v N levels of v-input multiplexers.
• A left rotate by k bits is equivalent to a right rotate by N – k bits. Computing N – k requires a subtracter in the critical path.
• Taking advantage of two’s complement arithmetic, N – k = ~k + 1 (modulo N), where ~k is the bitwise complement of the log2 N-bit shift amount. Therefore the left rotate can be performed by preshifting right by 1, then doing a right rotate by the complemented shift amount.
• Logical and arithmetic shifts are similar to rotates, but must replace
bits at one end or the other with a kill value (either 0 or the sign bit).
• The two major shifter architectures are funnel shifters and barrel
shifters.
• In a funnel shifter, the kill values are incorporated at the beginning,
while in a barrel shifter, the kill values are chosen at the end.
• Both barrel and funnel shifters can use array or logarithmic implementations.
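The three shift types and the rotate identity can be sketched on N-bit values stored in Python integers (function names are illustrative). `rol` uses the "left rotate by k = right rotate by N − k" identity directly, and `asr` assumes 0 ≤ k < N. The examples in the shift list above serve as checks.

```python
def mask(n):
    return (1 << n) - 1


def ror(a, k, n):
    """Rotate right: bits shifted off the LSB end re-enter at the MSB end."""
    k %= n
    return ((a >> k) | (a << (n - k))) & mask(n)


def rol(a, k, n):
    return ror(a, (n - k) % n, n)            # left rotate by k = right rotate by N - k


def lsr(a, k, n):
    return (a & mask(n)) >> k                # zeros fill the MSBs


def lsl(a, k, n):
    return (a << k) & mask(n)                # zeros fill the LSBs (ASL is identical)


def asr(a, k, n):
    """Arithmetic right shift, assuming 0 <= k < n: MSBs filled with the sign bit."""
    sign = (a >> (n - 1)) & 1
    fill = (mask(k) << (n - k)) if sign else 0
    return ((a & mask(n)) >> k) | fill
```

For instance, `ror(0b1011, 1, 4)` gives `0b1101`, and `asr(0b1010, 1, 4)` gives `0b1101`, i.e. −6 / 2 = −3 in 4-bit two's complement.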
Funnel Shifter
• The funnel shifter creates a (2N – 1)-bit input word Z from A and/or the kill values, then selects an N-bit field from this input word, as shown in Figure 11.61.
• The simplest funnel shifter design consists of an array of N N-input multiplexers accepting 1-of-N-hot select signals (one multiplexer for each output bit). Such an array shifter is shown in Figure 11.62 using nMOS pass transistors for a 4-bit shifter.
• The shift amount is conditionally inverted
and decoded into select signals that are fed
vertically across the array. The outputs are
taken horizontally.
• The array shifter works well for small shifters in transistor-
level designs, but has high parasitic capacitance in larger
shifters, leading to excessive delay and energy. Moreover,
array shifters are not amenable to standard cell designs.
• Figure 11.64 shows a 4-bit logarithmic shifter based on multiple levels of 2:1 multiplexers (which, of course, can be transmission gates). The XOR gates on the control inputs conditionally invert the shift amount for left shifts.
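A word-level sketch of the funnel shifter under stated assumptions: N is a power of two, Y is the N-bit field Z[s + N − 1 : s], right shifts use s = k, and left shifts use s = ~k (the complemented shift amount). The particular ways of forming the (2N − 1)-bit word Z for each operation are one consistent choice shown for illustration, and the function name and operation strings are hypothetical.

```python
def funnel_shift(a, k, n, op):
    """Build a (2N-1)-bit word Z from A and the kill values, then select
    the N-bit field Z[s+N-1 : s]. Assumes N is a power of two, 0 <= k < N."""
    m = (1 << n) - 1
    a &= m
    kbar = (n - 1) - k                         # bitwise complement of the log2 N-bit k
    sign = (a >> (n - 1)) & 1
    sign_fill = ((1 << (n - 1)) - 1) if sign else 0   # N-1 copies of the sign bit
    if op == "lsr":
        z, s = a, k                            # Z = {0...0, A}
    elif op == "asr":
        z, s = (sign_fill << n) | a, k         # Z = {sign...sign, A}
    elif op == "ror":
        z, s = ((a & (m >> 1)) << n) | a, k    # Z = {A[N-2:0], A}
    elif op == "lsl":
        z, s = a << (n - 1), kbar              # Z = {A, 0...0}
    elif op == "rol":
        z, s = (a << (n - 1)) | (a >> 1), kbar # Z = {A, A[N-1:1]}
    else:
        raise ValueError(f"unknown operation {op!r}")
    return (z >> s) & m                        # the N-bit "funnel" output
```

Because the kill values are placed into Z before the selection, the same field-select hardware serves every operation; only the construction of Z and the choice of k or ~k change.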
Barrel Shifter
• A barrel shifter performs a right rotate
operation. As mentioned earlier, it handles left
rotations using the complementary shift
amount.
• Barrel shifters can also perform shifts when
suitable masking hardware is included.
• Barrel shifters come in array and logarithmic
forms; we focus on logarithmic barrel shifters
because they are better suited for large shifts.
• Figure 11.68(a) shows a simple 4-bit barrel
shifter that performs right rotations
• Notice how, unlike funnel shifters,
barrel shifters contain long wrap-around wires.
• Figure 11.68(b) shows an enhanced version that can rotate left by prerotating right by 1, then rotating right by the complemented shift amount ~k.
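A sketch of a logarithmic barrel shifter for N a power of two (hypothetical names): stage i either passes the data through or rotates it right by 2^i, depending on bit i of k, which corresponds to one level of 2:1 multiplexers with wrap-around wires. The left rotate uses the prerotate-right-by-1 approach described above, since 1 + ~k = N − k modulo N.

```python
def log_barrel_ror(a, k, n):
    """Right rotate through log2 N stages of 2:1 muxes (N a power of two)."""
    m = (1 << n) - 1
    y = a & m
    for i in range(n.bit_length() - 1):       # log2 N stages
        amt = 1 << i
        if (k >> i) & 1:                      # mux selects the rotated word
            y = ((y >> amt) | (y << (n - amt))) & m   # wrap-around wires
    return y


def log_barrel_rol(a, k, n):
    """Left rotate: prerotate right by 1, then rotate right by ~k."""
    kbar = (~k) & (n - 1)                     # complement of the log2 N-bit amount
    return log_barrel_ror(log_barrel_ror(a, 1, n), kbar, n)
```

An 8-bit instance needs only three mux levels (rotate by 1, 2, and 4), and no subtracter is needed for left rotates.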
Multiplication
• Multiplication is less common than addition, but is still essential
for microprocessors, digital signal processors, and graphics
engines. The most basic form of multiplication consists of
forming the product of two unsigned (positive) binary numbers.
• M × N-bit multiplication P = Y × X can be viewed as forming N partial products of M bits each, and then summing the appropriately shifted partial products to produce an (M + N)-bit result P. Multiplying two single bits is equivalent to a logical AND operation.
• Therefore, generating partial products consists of the logical
ANDing of the appropriate bits of the multiplier and
multiplicand. Each column of partial products must then be
added and, if necessary, any carry values passed to the next
column
• Figure 11.72 illustrates the generation, shifting, and summing of partial products in a 6 × 6-bit multiplier.
• Large multiplications can be more conveniently
illustrated using dot diagrams. Figure 11.73
shows a dot diagram for a simple 16 × 16
multiplier. Each dot represents a placeholder
for a single bit that can be a 0 or 1.
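A bit-level sketch of partial-product generation for unsigned operands (hypothetical names). Partial product j ANDs every bit of Y with bit j of X and is then shifted left by j; summing the N shifted rows gives the (M + N)-bit product.

```python
def partial_products(y, x, m, n):
    """Partial products of an M x N unsigned multiply P = Y * X.

    Row j ANDs each of the M bits of Y with bit j of X (M AND gates),
    then shifts the row left by j positions.
    """
    rows = []
    for j in range(n):
        xj = (x >> j) & 1
        bits = [((y >> i) & 1) & xj for i in range(m)]
        rows.append(sum(b << i for i, b in enumerate(bits)) << j)
    return rows


def multiply(y, x, m, n):
    p = sum(partial_products(y, x, m, n))   # sum the shifted rows
    assert p < 1 << (m + n)                 # the product fits in M + N bits
    return p
```

For the 6 × 6 case, `multiply(45, 53, 6, 6)` returns 2385.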
Unsigned Multiplication (Array Multiplier)
• Fast multipliers use carry-save adders (CSAs, see Section
11.2.4) to sum the partial products.
• Figure 11.74 shows a 4 × 4 array multiplier for unsigned
numbers using an array of CSAs. Each cell contains a 2-
input AND gate that forms a partial product and a full
adder (CSA) to add the partial product into the running
sum.
• The first row converts the first partial product into carry-save redundant form. Each later row uses the CSA to add the corresponding partial product to the carry-save redundant result of the previous row and generate a carry-save redundant result. The least significant N output bits are available as sum outputs directly from the CSAs; the remaining most significant bits are still in carry-save form and must be combined by a carry-propagate adder (CPA).
• In Figure 11.74, the CPA is implemented as a carry-ripple adder. The array is regular in structure and uses a single type of cell, so it is easy to design and lay out.
• Assuming the carry output is faster than the sum
output in a CSA, the critical path through the array is
marked on the figure with a dashed line.
• In practice, circuits are assigned rectangular blocks
in the floorplan so the parallelogram shape wastes
space. Figure 11.75 shows the same adder squashed
to fit a rectangular block.
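A word-level sketch of carry-save accumulation (hypothetical names): `csa` applies a row of full adders to three words without propagating carries, each partial product is added by one CSA row, and a single carry-propagate add produces the final product. It checks the arithmetic only; it does not model the array's geometry or which low-order bits leave the array early.

```python
def csa(a, b, c):
    """Carry-save add of three words: a row of independent full adders.
    Returns (sum, carry) with a + b + c == sum + carry."""
    s = a ^ b ^ c                                  # full-adder sum bits
    cy = ((a & b) | (a & c) | (b & c)) << 1        # majority = carry bits, weighted x2
    return s, cy


def array_multiply(y, x, n):
    """Accumulate the N partial products of an unsigned multiply in
    carry-save form, then resolve the result with one carry-propagate add."""
    s, c = 0, 0
    for j in range(n):
        pp = (y if (x >> j) & 1 else 0) << j       # row j of AND gates
        s, c = csa(s, c, pp)                       # one CSA row per partial product
    return s + c                                   # final carry-propagate adder (CPA)
```

Only the last step propagates carries, which is why the CSA rows themselves stay fast regardless of word width.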
Adders
Array Subsystems

By
A N SAI CHAKRAVARTHY
M.Tech (VLSI System Design)
Memory Arrays
 Random Access Memory
o Read/Write Memory (RAM) (Volatile): Static RAM (SRAM), Dynamic RAM (DRAM)
o Read Only Memory (ROM) (Nonvolatile): Mask ROM, Programmable ROM (PROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), Flash ROM
 Serial Access Memory
o Shift Registers: Serial In Parallel Out (SIPO), Parallel In Serial Out (PISO)
o Queues: First In First Out (FIFO), Last In First Out (LIFO)
 Content Addressable Memory (CAM)
Slide 2 SRAM
Introduction

 Random access memory is accessed with an address and has a latency independent of the address.
 Serial access memories are accessed sequentially so no address is necessary.
 Content addressable memories determine which address(es) contain data that matches a specified key.
 An array stores 2^n words of 2^m bits each.
 If n >> m, fold by 2^k into fewer rows of more columns.
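The folding arithmetic can be sketched as follows (hypothetical names). Folding by 2^k keeps the number of bits constant: 2^(n−k) rows of 2^(m+k) columns. The address split assumes the high n − k bits drive the row decoder and the low k bits select a word within the row, which is one common choice rather than a requirement.

```python
def fold_array(n, m, k):
    """Physical shape of a memory with 2**n words of 2**m bits, folded by 2**k."""
    assert 0 <= k <= n
    return {
        "rows": 2 ** (n - k),
        "columns": 2 ** (m + k),
        "row_address_bits": n - k,      # decoded by the row decoder
        "column_address_bits": k,       # selects one of 2**k words in a row
    }


def split_address(addr, k):
    """Assumed split: high bits pick the row, low k bits pick the word in that row."""
    return addr >> k, addr & ((1 << k) - 1)
```

As an illustrative case, 2^12 words of 2^4 bits unfolded would be 4096 rows by 16 columns; folding by 2^4 gives a square 256 × 256 array.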

6T SRAM Cell
 The fundamental building block of a static RAM is the SRAM memory cell.
 6T SRAM Cell
 Used in most commercial chips
 Data stored in cross-coupled inverters
 Read:
 Precharge bit and bit_b high
 Raise word line; one bitline is pulled down by the SRAM cell through the access transistor
 Write:
 Drive the data onto bit and bit_b so that one of them is low; this low value overpowers the cell to write the new value
 Raise word line
SRAM Read
 Precharge both bitlines high
 Then turn on wordline
 One of the two bitlines will be pulled down by the cell
 Ex: A = 0, A_b = 1
 bit discharges, bit_b stays high
 But A bumps up slightly
 Read stability
 A must not flip
 N1 >> N2
[Figure: 6T cell with bitlines bit and bit_b, wordline word, transistors P1, P2, N1 to N4, and nodes A and A_b; read waveforms of word, bit, bit_b, A, and A_b (V) versus time (ps)]
SRAM Write
 Drive one bitline high, the other low
 Then turn on wordline
 Bitlines overpower cell with new value
 Ex: A = 0, A_b = 1, bit = 1, bit_b = 0
 Force A_b low, then A rises high
 Writability
 Must overpower feedback inverter
 N2 >> P1
[Figure: 6T cell with bitlines bit and bit_b, wordline word, transistors P1, P2, N1 to N4, and nodes A and A_b; write waveforms of word, bit_b, A, and A_b (V) versus time (ps)]
SRAM Sizing
 High bitlines must not overpower inverters during reads
 But low bitlines must write new value into cell
[Figure: 6T cell sizing with bit, bit_b, and word: pull-up transistors weak, access transistors medium, pull-down transistors strong]
SRAM
 SRAM memory cells:
 12-transistor cell
• Separate read and write signals are used in place of a single wordline
• Larger but fast
• Used for small RAMs and register files
 6-transistor cell
• Operation (nMOS transistors pass a strong "0"):
For reads, bitlines are precharged high and one is pulled down by the cell
For writes, one bitline is driven low and this low value overwrites the cell
Simple CMOS Memory Circuits: The SRAM Cell
 Circuit schematic: [Figure: SRAM cell with bit lines B0 and B1, word line WL, supply Vdd, transistors T1 to T6, and storage nodes X0 and X1]
 4 n-FETs and 2 p-FETs: T1 & T2 are called the active devices; T3 & T4 are called the I/O devices; T5 & T6 are sometimes called the loads.
 The cell is composed of two cross-coupled inverters (positive feedback).
 2 vertical lines (bit lines B0 & B1) are used for sensing the state of the cell and writing data into the cell.
 1 horizontal line (word line WL) is used to select a row of cells for writing or reading and to prevent the unselected rows of cells from being disturbed.
Circuit Operation:
 The cell has two stable states: “0” and “1”.
 “0” State = Node X0 high and Node X1 low; T2 & T5 are ON, T1 & T6 are OFF.
 “1” State = Node X1 high and Node X0 low; T1 & T6 are ON; T2 & T5 are OFF.
 No dc current flows in either state.
 Read: precharge the bit lines to ½ Vdd, then raise WL to Vdd; the selected cell applies a small ΔV across the bit line pair.
 Write: raise WL to Vdd; pull one bit line high and pull the other bit line low.
Simple CMOS Memory Circuits: The SRAM Array
 READ Operation:
 Word decode circuitry selects one of n word lines and drives it high to Vdd (say WL2); the other word lines are held at gnd.
 Bit lines are all precharged to half Vdd.
 The selected cell’s I/O devices turn ON and apply a ΔV to the bit line pair.
 The sense amp triggers on the bit line ΔV and stores the read data “0” or “1”.
 WRITE Operation:
 The selected WL is driven high to Vdd by the word decode (row decode) circuitry, turning ON the I/O devices in the selected cells.
 The selected bit column has one BL pulled high to Vdd and the other pulled low to gnd, thus writing the selected cell.
 Unselected bit columns merely perform a READ operation.
[Figure: 3 × 3 array of SRAM cells (11 to 33) with word decode (row decode) driven by the word address; bit decode (column decode) and write drivers driven by the bit address and Data In; sense amplifiers and off-chip drivers/buffers producing Data Out]
R. W. Knepper, SC571, page 1-15
DRAM (Dynamic RAMs)
 Store contents as charge on a capacitor rather than in a
feedback loop.
 Smaller than SRAM, but the cell must be periodically read
and refreshed.
 High density
Read: the bitline is precharged to VDD/2 and the wordline rises; the capacitor shares its charge with the bitline, the resulting voltage change on the bitline is detected, and the cell is rewritten after each read.
Write: the bitline is driven high or low and the voltage is forced onto the capacitor.
Subarray architecture
 Large DRAMs are divided into multiple subarrays (256-word by 512-bit)
 A sense amplifier is needed to detect the small bitline voltage change
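The charge-sharing step of a DRAM read can be estimated with a short calculation. This sketch assumes ideal, lossless charge redistribution between the cell capacitance C_cell and the bitline capacitance C_bit, with the bitline precharged to VDD/2; the function name is hypothetical and the capacitance and voltage values in the example are illustrative, not taken from these notes. The result is ΔV = (V_cell − VDD/2) · C_cell / (C_cell + C_bit).

```python
def bitline_swing(v_cell, c_cell, c_bit, vdd):
    """Bitline voltage change when a cell at v_cell shares charge with a
    bitline precharged to vdd / 2 (ideal, lossless charge sharing)."""
    v_pre = vdd / 2
    v_shared = (c_cell * v_cell + c_bit * v_pre) / (c_cell + c_bit)
    return v_shared - v_pre     # equals (v_cell - vdd/2) * c_cell / (c_cell + c_bit)
```

With a hypothetical 25 fF cell, 250 fF bitline, and VDD = 1.0 V, the swing is only about ±45 mV, which is why a sense amplifier is needed. The cell node also settles at the shared voltage, so the read is destructive and the value must be rewritten.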


DRAM (Dynamic RAMs)
 SRAMs typically use six transistors per bit of storage.
 DRAMs use only one
transistor per bit:
 1/0 = capacitor
charged/discharged

Lect #15 Rissacher EE365


DRAM read operations

 Precharge bit line to VDD/2.


 Take the word line HIGH.
 Detect whether current flows into or out of the cell.
 Note: cell contents are destroyed by the read!
 Must write the bit value back after reading.



DRAM write operations

 Take the word line HIGH.


 Set the bit line LOW or HIGH to store 0 or 1.
 Take the word line LOW.

 Note: The stored charge for a 1 will eventually leak off.


DRAM charge leakage

 Typical devices require each cell to be refreshed once every 4 to 64 ms.
 During “suspended” operation, notebook computers use
power mainly for DRAM refresh.
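The refresh requirement translates into simple scheduling arithmetic. This sketch assumes distributed refresh (rows refreshed evenly across the retention period, one row per refresh operation); the function names, row count, and per-row refresh time are hypothetical.

```python
def row_refresh_interval_us(retention_ms, rows):
    """Average time between row refreshes when every row must be refreshed
    once per retention period and the refreshes are spread evenly."""
    return retention_ms * 1000 / rows


def refresh_overhead(retention_ms, rows, t_refresh_ns):
    """Fraction of time the array is busy refreshing (ns converted to ms)."""
    return rows * t_refresh_ns * 1e-6 / retention_ms
```

For example, with a 64 ms retention period and a hypothetical 8192 rows, a row must be refreshed every 7.8125 µs on average; if each row refresh takes a hypothetical 50 ns, refresh occupies about 0.64% of the time.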



Read Only Memory (ROM): NOR
 ROM cells can be built with only one transistor per bit of storage.
 A ROM array is commonly implemented as a single-ended NOR array using any of the NOR structures, including the pseudo-nMOS and the footless dynamic NOR gate.
 The contents of the ROM can be symbolically represented with a dot diagram in which dots indicate the presence of 1s.
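A behavioral sketch of reading a NOR ROM from its dot diagram (hypothetical names). A one-hot row decoder raises one wordline, and each bitline is pulled low if the selected row has a transistor on it (pseudo-nMOS NOR). The sketch assumes an inverting output buffer so that a dot reads as 1, matching the dot-diagram convention above; actual arrays may differ in polarity.

```python
def nor_rom_read(dots, addr):
    """Read word `addr` from a NOR ROM described by a dot diagram.

    dots[w][b] == 1 means a pull-down transistor connects wordline w to
    bitline b. Assumes pseudo-nMOS bitlines and an inverting output buffer.
    """
    n_words = len(dots)
    wordline = [int(w == addr) for w in range(n_words)]   # one-hot row decoder
    data = []
    for b in range(len(dots[0])):
        pulled_down = any(wordline[w] and dots[w][b] for w in range(n_words))
        bitline = 0 if pulled_down else 1                 # NOR of selected transistors
        data.append(1 - bitline)                          # inverting output buffer
    return data
```

Each stored bit costs at most one transistor: a dot where the bit is 1 and nothing where it is 0.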
Serial Access Memories
 Serial access memories do not use an address
 Using the basic SRAM and/or registers, we can construct a variety of serial access memories including
 Shift Registers
o Serial In Parallel Out (SIPO)
o Parallel In Serial Out (PISO)
 Queues (FIFO, LIFO)

Shift Register
 Shift registers store and delay data
 Simple design: cascade of registers
 Watch your hold times!

[Figure: cascade of registers clocked by clk, from Din to Dout (bus width 8)]

Denser Shift Registers
 Flip-flops aren’t very area-efficient
 For large shift registers, keep data in SRAM instead
 Move read/write pointers to RAM rather than data
 Initialize read address to first entry, write to last
 Increment address on each cycle

[Figure: dual-ported SRAM addressed by a read-address counter (reset to 00...00) and a write-address counter (reset to 11...11), both clocked by clk; Din is written and Dout is read]
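A cycle-level sketch of the SRAM-based shift register (hypothetical class name), following the reset values above: the read counter starts at the first entry and the write counter at the last. The model reads before it writes in each cycle, so with this convention a depth-D array delays data by D − 1 cycles.

```python
class SramShiftRegister:
    """Shift register kept in a dual-ported SRAM: the data stays put and
    the read/write address counters move instead."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.depth = depth
        self.readaddr = 0               # read counter starts at the first entry (00...00)
        self.writeaddr = depth - 1      # write counter starts at the last entry (11...11)

    def clock(self, din):
        dout = self.mem[self.readaddr]  # read port sees the old contents
        self.mem[self.writeaddr] = din  # write port stores the new input
        self.readaddr = (self.readaddr + 1) % self.depth      # increment both
        self.writeaddr = (self.writeaddr + 1) % self.depth    # addresses each cycle
        return dout
```

Only two address counters toggle each cycle, instead of every stored bit moving, which is where the area and power savings come from.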
 One variant of a shift register is a tapped delay line that offers a variable number of stages of delay.
 Delay blocks are built from 32-, 16-, 8-, 4-, 2-, and 1-stage shift registers.
 Other variants are Serial In Parallel Out (SIPO) and Parallel In Serial Out (PISO).
Tapped Delay Line
 A tapped delay line is a shift register with a programmable number
of stages
 Set number of stages with delay controls to mux
 Ex: 0 – 63 stages of delay

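[Figure: Din passes through SR32, SR16, SR8, SR4, SR2, and SR1 shift-register blocks to Dout; each block can be bypassed by a mux controlled by delay5 to delay0 respectively; all blocks clocked by clk]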
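A behavioral sketch of the tapped delay line (hypothetical names): the 6-bit delay value selects which of the 32-, 16-, 8-, 4-, 2-, and 1-stage blocks are in the path, with delay5 as the most significant bit, so the total delay equals the binary value of delay5..delay0. Registers are modeled as Python lists, and all blocks update together on each clock.

```python
class TappedDelayLine:
    """Shift-register blocks of 32, 16, 8, 4, 2 and 1 stages, each either
    in the path or bypassed by a 2:1 mux controlled by delay5..delay0."""

    SIZES = (32, 16, 8, 4, 2, 1)

    def __init__(self, delay):
        assert 0 <= delay <= 63
        bits = [(delay >> (5 - i)) & 1 for i in range(6)]   # delay5 first
        self.blocks = [[0] * size if bit else None
                       for size, bit in zip(self.SIZES, bits)]

    def clock(self, din):
        x = din
        for reg in self.blocks:
            if reg is not None:         # mux selects this block's shift register
                reg.insert(0, x)        # new value enters the first stage
                x = reg.pop()           # old last-stage value feeds the next block
        return x                        # Dout
```

For example, a delay value of 37 (binary 100101) enables the 32-, 4-, and 1-stage blocks.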


Serial In Parallel Out
 1-bit shift register reads in serial data
 After N steps, presents N-bit parallel output

[Figure: Sin shifts through four registers with outputs P0, P1, P2, P3, clocked by clk]

Parallel In Serial Out
 Load all N bits in parallel when shift = 0
 Then shift one bit out per cycle

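[Figure: four registers that either load P0 to P3 in parallel or shift, selected by shift/load, clocked by clk, with serial output Sout]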
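Minimal sketches of both structures on lists of bits (hypothetical names). The SIPO assumes Sin enters P0 and older bits move toward P(N−1); the PISO loads the word, then shifts one bit per clock out of the last stage. Feeding the SIPO output into the PISO returns the original serial stream.

```python
def sipo(serial_bits, n):
    """Serial In Parallel Out: shift one bit in per clock from Sin into P0,
    moving older bits toward P(N-1). Returns [P0, ..., P(N-1)] after N clocks."""
    reg = [0] * n
    for b in serial_bits[:n]:
        reg = [b] + reg[:-1]
    return reg


def piso(parallel_bits):
    """Parallel In Serial Out: load all N bits (shift/load selects load),
    then shift one bit per clock out of the last stage to Sout."""
    reg = list(parallel_bits)           # parallel load
    sout = []
    for _ in range(len(reg)):
        sout.append(reg[-1])            # Sout is the last register
        reg = [0] + reg[:-1]            # shift toward Sout
    return sout
```

After four clocks, `sipo([1, 0, 1, 1], 4)` holds `[1, 1, 0, 1]`, with the first bit received in the last stage.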
Queues
 Queues allow data to be read and written at different rates.
 Read and write each use their own clock, data
 Queue indicates whether it is full or empty
 Build with SRAM and read/write counters (pointers)

[Figure: Queue block with WriteClk, WriteData, and FULL on the write side and ReadClk, ReadData, and EMPTY on the read side]
FIFO, LIFO Queues
 First In First Out (FIFO)
 Initialize read and write pointers to first element
 Queue is EMPTY
 On write, increment write pointer
 If write almost catches read, Queue is FULL
 On read, increment read pointer
 Last In First Out (LIFO)
 Also called a stack
 Use a single stack pointer for read and write
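A single-clock sketch of both queues (hypothetical class names); a real queue with separate read and write clocks also needs clock-domain-crossing logic, which is not modeled here. The FIFO follows the pointer rules above: both pointers start at the first element, and it reports FULL when the write pointer is one step behind the read pointer, so a depth-D buffer holds D − 1 entries in this scheme.

```python
class Fifo:
    """Circular-buffer FIFO with read and write pointers (single clock)."""

    def __init__(self, depth):
        self.mem = [None] * depth
        self.depth = depth
        self.rd = 0
        self.wr = 0                     # both pointers start at the first element

    def empty(self):
        return self.rd == self.wr

    def full(self):
        # write pointer has "almost caught" the read pointer
        return (self.wr + 1) % self.depth == self.rd

    def write(self, x):
        if self.full():
            raise OverflowError("FIFO full")
        self.mem[self.wr] = x
        self.wr = (self.wr + 1) % self.depth

    def read(self):
        if self.empty():
            raise IndexError("FIFO empty")
        x = self.mem[self.rd]
        self.rd = (self.rd + 1) % self.depth
        return x


class Lifo:
    """Stack: a single pointer is used for both push (write) and pop (read)."""

    def __init__(self, depth):
        self.mem = [None] * depth
        self.sp = 0

    def push(self, x):
        if self.sp == len(self.mem):
            raise OverflowError("stack full")
        self.mem[self.sp] = x
        self.sp += 1

    def pop(self):
        if self.sp == 0:
            raise IndexError("stack empty")
        self.sp -= 1
        return self.mem[self.sp]
```

Leaving one slot unused lets FULL and EMPTY be told apart from the two pointers alone.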

Content Addressable Memory
 The CAM acts as an ordinary SRAM that can be read or written given ‘adr’ and ‘data’, but also performs a ‘matching’ operation.
 Matching asserts a matchline output for each word of the CAM that contains a specified key.
 A common application of CAMs is translation lookaside buffers (TLBs) in microprocessors supporting virtual memory.
Fig:10T and 9T CAM cell
implementations
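A behavioral sketch of the CAM operations (hypothetical class and method names): ordinary read and write by address, plus a match that raises one matchline per stored word equal to the key. In a TLB, for example, the key would be a virtual page number.

```python
class Cam:
    """Content-addressable memory: normal read/write by address plus a
    match operation that returns one matchline per stored word."""

    def __init__(self, words, width):
        self.mask = (1 << width) - 1
        self.mem = [0] * words

    def write(self, adr, data):
        self.mem[adr] = data & self.mask

    def read(self, adr):
        return self.mem[adr]

    def match(self, key):
        # Each matchline is the AND over the word of per-bit XNOR(stored, key)
        return [int(((w ^ key) & self.mask) == 0) for w in self.mem]
```

Several matchlines can be asserted at once if the same key is stored in more than one word.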
