Example 41
Memory: Random-Access Memory
In this example we will learn how to create random access memory (RAM) using
the Xilinx Core Generator, CoreGen. We will also learn the difference between
implementing RAM using distributed resources versus the FPGA block RAM.
Prerequisite knowledge:
Example 40 – Memory: Read-Only Memory
41.1 Random-Access Memory
In Example 40, we discussed read-only memories and implemented a
programmable read-only memory in an FPGA using VHDL. While read-only memories
contain information that can only be read by hardware, random-access memories can
be either written to or read from. Figure 41.1 shows a block diagram for a typical
256Kx16 RAM. The first number, 256K, refers to the number of slots or rows in the
memory. In this case, there are 256K of them. The second number, 16, refers to the
number of bits in the word stored in each row. In this case, each data word contains 16
bits.
[Block diagram: a 256K x 16 RAM with inputs A[17:0], D[15:0], WE (active low), and CLK, and output O[15:0]]
Figure 41.1 A Standard Block Diagram for a 256K x 16 RAM
To address 256K address spaces, the address line must be an 18-bit bus since
2^18 = 262,144 = 256K. The address bus, A, is an 18-bit input. Data are input when writing
to the memory through the data input bus, D. A write enable line, WE, is used to control
whether data are being written to the memory or read from the memory. Often, the write
enable line is active low, as indicated by the bar over the signal name in the schematic. To
write data to the RAM, WE is driven low; to read data from the RAM, WE is held high.
Since RAMs are synchronous storage elements, an input clock, CLK, is necessary. Finally,
the data output bus, O, is where data at address A are retrieved from memory.
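The relationship between address width and memory depth can be sketched as a behavioral VHDL model. This is only an illustrative sketch, not a model of any particular device: the entity name ram_model, the port name WE_n (standing in for the barred WE), and the asynchronous read are all assumptions, and it uses the numeric_std package. A simulator may handle a signal array of this size slowly.

```vhdl
-- Behavioral sketch of the RAM in Figure 41.1 (assumed names and read timing).
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram_model is                       -- hypothetical entity name
    generic (
        ADDR_WIDTH : positive := 18;      -- 2**18 = 262,144 = 256K rows
        DATA_WIDTH : positive := 16       -- bits per word
    );
    port (
        CLK  : in  std_logic;
        WE_n : in  std_logic;             -- active-low write enable (assumed name)
        A    : in  std_logic_vector(ADDR_WIDTH - 1 downto 0);
        D    : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
        O    : out std_logic_vector(DATA_WIDTH - 1 downto 0)
    );
end ram_model;

architecture behavioral of ram_model is
    -- The depth follows directly from the address width.
    type mem_type is array (0 to 2**ADDR_WIDTH - 1)
        of std_logic_vector(DATA_WIDTH - 1 downto 0);
    signal mem : mem_type := (others => (others => '0'));
begin
    -- Write on the rising clock edge while WE_n is low.
    process(CLK)
    begin
        if rising_edge(CLK) then
            if WE_n = '0' then
                mem(to_integer(unsigned(A))) <= D;
            end if;
        end if;
    end process;

    -- Assumption: the word at address A is read asynchronously.
    O <= mem(to_integer(unsigned(A)));
end behavioral;
```

Changing ADDR_WIDTH to 4 and DATA_WIDTH to 8 would describe a 16 x 8 memory of the size built later in this example.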
There are many variations between random-access memories. These variations
include different speeds or maximum clock frequencies, ways to load the address bus,
and ways in which data are written to or retrieved from memory, to name a few. For
example, some memories will retrieve data from memory asynchronously. In that case,
the clock is not relevant to reading data from memory. Simply placing an address on the
address bus with the active-low WE high will result in data being output to the output
bus, O, after a short gate delay. On the other hand, some memories have registered
address and data output buses. These memories need one clock cycle to register A and
WE internally and another to register the data on the output bus, O, so a read takes two
clock cycles. A hardware designer must take these factors into account when designing a
memory controller for a particular RAM.
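The two-cycle read of a memory with registered inputs and a registered output can be sketched as follows. This is a minimal behavioral illustration under assumed names (ram_registered, WE_n, and the internal register signals are hypothetical) and an assumed 16 x 8 size; a real device may differ in its details.

```vhdl
-- Behavioral sketch: registered address/control and registered data output.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram_registered is                  -- hypothetical entity name
    port (
        CLK  : in  std_logic;
        WE_n : in  std_logic;             -- active-low write enable (assumed name)
        A    : in  std_logic_vector(3 downto 0);
        D    : in  std_logic_vector(7 downto 0);
        O    : out std_logic_vector(7 downto 0)
    );
end ram_registered;

architecture behavioral of ram_registered is
    type mem_type is array (0 to 15) of std_logic_vector(7 downto 0);
    signal mem      : mem_type := (others => (others => '0'));
    signal a_reg    : std_logic_vector(3 downto 0) := (others => '0');
    signal d_reg    : std_logic_vector(7 downto 0) := (others => '0');
    signal we_n_reg : std_logic := '1';
begin
    process(CLK)
    begin
        if rising_edge(CLK) then
            -- First clock edge: capture the address, data, and write enable.
            a_reg    <= A;
            d_reg    <= D;
            we_n_reg <= WE_n;

            -- Second clock edge: act on the values captured one edge earlier.
            if we_n_reg = '0' then
                mem(to_integer(unsigned(a_reg))) <= d_reg;
            else
                O <= mem(to_integer(unsigned(a_reg)));
            end if;
        end if;
    end process;
end behavioral;
```

In this sketch an address placed on A appears in a_reg after one rising edge, and its data appears on O after the following edge, which is the two-cycle read described above.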
In this example, we will use the Xilinx Core Generator to design and implement a
RAM module in a Spartan 3 FPGA. We will also design a state machine that writes a
sequence of numbers to the memory and then reads the numbers back from the memory
displaying them on the LEDs to demonstrate writing to and reading from the memory.
The steps are similar for creating RAM for other target FPGAs; however, the
specifications for memories in the target FPGA are not all exactly the same.
41.2 Creating a RAM Module using Core Generator
There are two different ways to implement a RAM module in an FPGA: as
distributed memory or as block RAM. Distributed memory routes CLBs or slices on the
reconfigurable FPGA fabric together to implement a RAM module of the size specified in
the Core Generator, provided that it will fit on the FPGA. Distributed memory uses
reconfigurable resources just like any other custom hardware would in an FPGA. The
alternative is block memory. Most FPGAs have some
memory internal to the FPGA that is not reconfigurable. That is, it does not use CLBs or
slices; instead, it uses the FPGA’s pre-fabricated RAM. Refer to the FPGA
documentation to determine how much onboard block memory is available in your
device. Block RAM often behaves differently depending on the specific model of
FPGA, so it is important to read the documentation before creating hardware that
uses block memory.
In this example, we will first create distributed memory along with a simple state
machine to write to and read from this memory. Finally, we will use block memory and
alter the state machine accordingly, for a Xilinx Spartan 3 FPGA, to write to and read
from the block memory. The first step is to create the memory module using the Xilinx
Core Generator. This can be launched from Xilinx ISE, from the Aldec ActiveHDL simulator,
or independently of any IDE from the Microsoft Windows Start menu.
From Aldec ActiveHDL, click the ‘Tools’ icon and then the ‘CoreGen &
Architecture Wizard’. This will launch the Xilinx Core Generator and Architecture
Interface shown below.
Be sure that the boxes are checked for adding the files for simulation and adding
the files for implementation since we will simulate the memory and implement it to
download it to the FPGA. Click the ‘Run CORE Generator’ button. This will launch the
Core Generator Menu shown below. First, select Project → Options and select the
correct device from the pull down lists.
Then, select Memories & Storage Elements → RAMs & ROMs → Distributed Memory (ver 7.1).
A wizard will appear where the particulars for the memory module can be
entered. Name the component ram16x8. For this example, we will create a memory that
has an 8-bit data bus with 16 memory slots. A schematic of the RAM module will appear
to the left of the parameters. Click ‘Next’.
Next, the wizard prompts for more parameters including whether or not the input
and output should be registered. The most common choice is to have both the input and
output unregistered. Therefore, no clock cycles are required to either latch the input into
the memory or to register the output. Thus, the output can be available based on the
current input without any extra clock cycles. Click the ‘Data Sheet’ button if you wish to
read more about the options and details for the distributed memory. Click the ‘Next’
button.
The third and final step in the wizard allows you to set the initial contents of the
RAM, either with a .coe file or by specifying a single value (zero by default) that is used
for all cells during initialization. For our example, we will accept zero as the
initial value; however, by creating a .coe file, the initial contents of each memory slot can
be specified. To create a .coe file, use a simple text editor such as Microsoft Notepad.
The file extension must be .coe. The contents include two basic parameters,
memory_initialization_radix and memory_initialization_vector. Valid radix choices are
2, 10, and 16. An example .coe file is given below in Listing 41.1 for initializing our
16x8 memory with arbitrary values in hex. The values in the vector can be separated by
commas or white space. The values specified below are for the sixteen memory slots: the
first would contain 01 and the last would contain 2E. After creating this file, you would
select it using the ‘Load Coefficients’ button on step 3 of the memory wizard.
Listing 41.1 sample.coe for a 16x8 memory
; Sample initialization file for a 16x8 memory
memory_initialization_radix = 16;
memory_initialization_vector = 1 A 5 2 6
5 23 F 48 0 B
23 4 1D 7 2E;
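To make the mapping from vector entries to memory slots explicit, the same sixteen values can be written out as a VHDL constant, one byte per address, in the order they appear in the vector (address 0 first). This is only an illustration of the contents; the package and type names are hypothetical, and the Core Generator does not require or produce such a file.

```vhdl
-- Illustration only: the contents of sample.coe, padded to 8 bits per word.
library ieee;
use ieee.std_logic_1164.all;

package sample_init_pkg is                -- hypothetical package name
    type ram16x8_type is array (0 to 15) of std_logic_vector(7 downto 0);

    constant SAMPLE_INIT : ram16x8_type := (
        X"01", X"0A", X"05", X"02", X"06",          -- addresses 0 to 4
        X"05", X"23", X"0F", X"48", X"00", X"0B",   -- addresses 5 to 10
        X"23", X"04", X"1D", X"07", X"2E"           -- addresses 11 to 15
    );
end package sample_init_pkg;
```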
Since we will use zero as an initial value for all data in the RAM for our example,
click the ‘Generate’ button in the lower left hand corner of the wizard. Several files are
generated by the Core Generator. A list of commonly used files and descriptions is given
in Listing 41.2.
Listing 41.2 Commonly used files generated by the Core Generator
The following files were generated for 'ram16x8' in directory
c:\My_Designs\RAM\xilinxcoregen\:
ram16x8.edn:
Electronic Data Netlist (EDN) file containing the information required to implement the module in a
Xilinx (R) FPGA.
ram16x8.vhd:
VHDL wrapper file provided to support functional simulation. This file contains simulation model
customization data that are passed to a parameterized simulation model for the core.
ram16x8.vho:
VHO template file containing code that can be used as a model for instantiating a CORE Generator
module in a VHDL design.
ram16x8_readme.txt:
Text file indicating the files generated and how they are used.
In the Core Generator and Architecture Wizard Interface, the distributed memory
is now listed. Close the interface to return to Aldec ActiveHDL’s main menu. In
Aldec ActiveHDL, add the ram16x8.vhd file to your project from the xilinxcoregen
directory where the generated files were output. At this point, we have generated a 16x8
RAM module. Now, we will develop a state machine that will write incremental
numbers, starting with zero, to each data location in memory. When it is finished, each
memory location will contain data that matches its address. Finally, we will read the data
from each location, displaying the data on the LEDs. The state machine shown in
Fig. 41.2 will write and read accordingly.
[State diagram: START (addr = 0) → WRITE (WE = 1, addr = addr+1; repeats while addr < 15) → CLR (addr = 0) when addr = 15 → READ (addr = addr+1; repeats while addr < 15) → START when addr = 15]
Figure 41.2 State machine for writing to each location in the 16x8 RAM
followed by reading the data from each location
In the start state, the address counter is set to zero. For the memory created by the
Core Generator, we is active high and is therefore low by default. In the write state,
we is set high and the address counter is incremented. The state machine remains in the
write state until the address counter reaches 15 and then transitions to the clr state
(named clear in the VHDL code). The
clr state resets the address counter to zero. Finally, the read state is the same as the write
state in that it increments the address counter; however, we remains low. The state
machine remains in the read state until the address counter reaches 15 and then
transitions to the start state. The VHDL code for the state machine is shown in Listing
41.3. The VHDL code for the top level, which wires the state machine to the memory, is
shown in Listing 41.4.
Listing 41.3 RAMsm.vhd
-- Simple state machine for writing to and reading from RAM 16x8
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity RAMsm is
    port ( clk  : in  std_logic;
           clr  : in  std_logic;
           we   : out std_logic;
           data : out std_logic_vector(7 downto 0);
           addr : out std_logic_vector(3 downto 0)
         );
end RAMsm;
architecture RAMsm of RAMsm is
    type state_type is (start, write, clear, read);
    signal present_state, next_state: state_type;
    signal addrcnt : std_logic_vector(3 downto 0);
begin
    sreg: process(clk, clr)
    begin
        if clr = '1' then
            present_state <= start;
        elsif clk'event and clk = '1' then
            present_state <= next_state;
            case present_state is
                when start =>
                    addrcnt <= X"0";          --clear address variable
                when write =>
                    addrcnt <= addrcnt + 1;   --incr address var
                when clear =>
                    addrcnt <= X"0";          --clear address variable
                when read =>
                    addrcnt <= addrcnt + 1;   --incr address var
                when others =>
                    null;
            end case;
        end if;
    end process;

    C1: process(present_state, addrcnt)
    begin
        case present_state is
            when start =>
                next_state <= write;
            when write =>
                if addrcnt < 15 then
                    next_state <= write;
                else
                    next_state <= clear;
                end if;
            when clear =>
                next_state <= read;
            when read =>
                if addrcnt < 15 then
                    next_state <= read;
                else
                    next_state <= start;
                end if;
            when others =>
                null;
        end case;
    end process;
    C2: process(present_state, addrcnt)
    begin
        we <= '0';
        if present_state = write then
            we <= '1';
        end if;
    end process;

    addr <= addrcnt;
    data <= "0000" & addrcnt;  --connect data to addr since we want to write
                               --the addr as the data for each location
end RAMsm;
Listing 41.4 RAMtest.vhd (top level)
-- Top level for RAM16x8 and the simple RAM test state machine
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity RAMtest is
    port ( clk     : in  std_logic;
           clr     : in  std_logic;
           RAMdata : out std_logic_vector(7 downto 0)
         );
end RAMtest;

architecture RAMtest of RAMtest is
    component ram16x8 is
        port ( A:   in  std_logic_vector(3 downto 0);
               CLK: in  std_logic;
               D:   in  std_logic_vector(7 downto 0);
               WE:  in  std_logic;
               SPO: out std_logic_vector(7 downto 0));
    end component;

    component RAMsm is
        port ( clk  : in  std_logic;
               clr  : in  std_logic;
               we   : out std_logic;
               data : out std_logic_vector(7 downto 0);
               addr : out std_logic_vector(3 downto 0));
    end component;

    signal data : std_logic_vector(7 downto 0);
    signal addr : std_logic_vector(3 downto 0);
    signal we   : std_logic;
begin
    RAM: ram16x8 port map(
        A => addr, CLK => clk, D => data, WE => we, SPO => RAMdata);

    SM: RAMsm port map(
        clk => clk, clr => clr, we => we, data => data, addr => addr);
end RAMtest;
Fig. 41.3 shows the simulation for the state machine RAMsm. Addr and data are
the address and data busses, respectively, that will be connected to the RAM. The we
signal is the write enable. Addrcnt is the address counter signal within the state machine.
Present_state and next_state are state signals.
Figure 41.3 Simulation for the test state machine, RAMsm
Fig. 41.4 shows the simulation for the top level combining the state machine and
16x8 RAM. RAMdata is the output of the RAM, addr is the RAM address, data is the
data input to the RAM, we is the write enable signal, and present_state is the current state
of the test state machine, RAMsm.
Figure 41.4 Simulation for the RAM test component, RAMtest
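A simulation like the one in Fig. 41.4 could be driven by a small testbench such as the sketch below. It is not the testbench used to produce the figure: the entity name RAMtest_tb, the 20 ns clock period, and the reset and run lengths are all assumptions. It also assumes that RAMtest, RAMsm, and the ram16x8 simulation model have been compiled into the working library.

```vhdl
-- Testbench sketch for RAMtest (assumed clock period and run length).
library ieee;
use ieee.std_logic_1164.all;

entity RAMtest_tb is                      -- hypothetical testbench name
end RAMtest_tb;

architecture sim of RAMtest_tb is
    component RAMtest is
        port ( clk     : in  std_logic;
               clr     : in  std_logic;
               RAMdata : out std_logic_vector(7 downto 0));
    end component;

    constant CLK_PERIOD : time := 20 ns;  -- assumed value
    signal clk     : std_logic := '0';
    signal clr     : std_logic := '1';
    signal RAMdata : std_logic_vector(7 downto 0);
    signal done    : boolean := false;
begin
    UUT: RAMtest port map (clk => clk, clr => clr, RAMdata => RAMdata);

    -- Free-running clock that stops once the stimulus process is finished.
    clk_gen: process
    begin
        while not done loop
            clk <= '0';
            wait for CLK_PERIOD / 2;
            clk <= '1';
            wait for CLK_PERIOD / 2;
        end loop;
        wait;
    end process;

    -- Hold clr high briefly, release it, then let the state machine run.
    stim: process
    begin
        clr <= '1';
        wait for 2 * CLK_PERIOD;
        clr <= '0';
        -- One pass is 1 start + 16 write + 1 clear + 16 read = 34 cycles;
        -- 40 cycles leaves a small margin.
        wait for 40 * CLK_PERIOD;
        done <= true;
        wait;
    end process;
end sim;
```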
Now that we have successfully simulated the top-level RAMtest, we are ready to
create a constraints file, synthesize, and implement the design for onboard testing.
Before synthesizing the design, it is important to copy the ram16x8.edn file from the
xilinxcoregen directory into the src directory for the current project. Otherwise,
implementation will fail stating that the .edn file for ram16x8 cannot be found. Listing
41.5 shows the Design Summary from the map report. The state machine and memory
occupied 11 slices.
Listing 41.5 Design Summary for RAMtest including the 16x8 Distributed RAM
Design Summary
--------------
Logic Utilization:
  Number of Slice Flip Flops:                       6 out of 3,840    1%
  Number of 4 input LUTs:                           8 out of 3,840    1%
Logic Distribution:
  Number of occupied Slices:                       11 out of 1,920    1%
  Number of Slices containing only related logic:  11 out of 11     100%
  Number of Slices containing unrelated logic:      0 out of 11       0%
Total Number 4 input LUTs:                         16 out of 3,840    1%
  Number used as logic:                             8
  Number used as 16x1 RAMs:                         8
Number of bonded IOBs:                             10 out of 173      5%
Number of GCLKs:                                    1 out of 8       12%
Number of RPM macros:                               1
Total equivalent gate count for design: 1,123
Additional JTAG gate count for IOBs: 480
Peak Memory Usage: 128 MB
41.3 Using Block RAM to implement a 16x8 RAM module
Instead of using distributed memory where the RAM uses reconfigurable
resources on the FPGA, the RAM can be implemented using the FPGA’s block memory.
The Spartan 3-200, for example, has 216 Kbits (27 Kbytes) of block memory. Create a
new project and launch the Xilinx Core Generator. Instead of selecting distributed
memory, double click ‘Single Port Block Memory’. Enter the component name, memory
width and depth, port configuration, and write mode.
Since we are implementing a RAM, the port configuration desired is ‘Read And
Write’. There are three write modes to choose from. ‘Read After Write’ means that during a
write the RAM output changes to reflect the data just written to the location. ‘Read Before
Write’ means that the RAM output will reflect the data that was in the location before it
was overwritten with the new value. ‘No Read On Write’ means that the value on the data
output bus remains unchanged during a write and does not change until the next read (where
we is low). For this example, select ‘Read After Write’ so that it will behave similarly to the
distributed RAM created in Section 41.2. Click the ‘Next’ button.
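The difference between the three write modes can be sketched behaviorally. The model below is an illustration only, not the simulation model that the Core Generator produces: the entity name and the WRITE_MODE generic with its string values are hypothetical, while the port names follow the block RAM component used in this section.

```vhdl
-- Behavioral sketch of the three write modes (hypothetical generic and names).
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity bram_write_mode_model is           -- hypothetical entity name
    generic (
        -- "READ_AFTER_WRITE", "READ_BEFORE_WRITE", or "NO_READ_ON_WRITE"
        WRITE_MODE : string := "READ_AFTER_WRITE"
    );
    port ( addr : in  std_logic_vector(3 downto 0);
           clk  : in  std_logic;
           din  : in  std_logic_vector(7 downto 0);
           dout : out std_logic_vector(7 downto 0);
           we   : in  std_logic);
end bram_write_mode_model;

architecture behavioral of bram_write_mode_model is
    type mem_type is array (0 to 15) of std_logic_vector(7 downto 0);
    signal mem : mem_type := (others => (others => '0'));
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if we = '1' then
                mem(to_integer(unsigned(addr))) <= din;
                if WRITE_MODE = "READ_AFTER_WRITE" then
                    -- The output shows the value being written.
                    dout <= din;
                elsif WRITE_MODE = "READ_BEFORE_WRITE" then
                    -- mem is a signal, so this still reads the old contents.
                    dout <= mem(to_integer(unsigned(addr)));
                end if;
                -- "NO_READ_ON_WRITE": dout is left unchanged during a write.
            else
                -- Ordinary synchronous read.
                dout <= mem(to_integer(unsigned(addr)));
            end if;
        end if;
    end process;
end behavioral;
```

In all three modes the output changes only on a rising clock edge; the modes differ only in what the output shows on an edge where we is high.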
Page 2 of 4 gives optimization and design options. For more information about
these choices, click the ‘Datasheet’ button. These can be left as default; click ‘Next’.
Page 3 of 4 gives implementation options and pin polarity options. Leave these options
as default; an active-high we will behave the same as in the distributed memory created in
the previous section. Click ‘Next’ to go to page 4 of 4, where initialization values,
including those from a .coe file, can be specified. For our example, we will continue
initializing all memory locations to zero. Click ‘Generate’ to generate the files for
the component ram16x8block. Copy the state machine for the test program, RAMsm.vhd,
and the top level, RAMtest.vhd, into this new project. Finally, instead of port mapping the
ram16x8 component from the previous section, port map the ram16x8block created in
this section. The port names are different. Listing 41.6 shows the component declaration and
port map statement for the block RAM in the top level.
Listing 41.6 Component declaration and port map for the 16x8 Block RAM in RAMtest.vhd
...
component ram16x8block IS
    port (
        addr: IN  std_logic_VECTOR(3 downto 0);
        clk:  IN  std_logic;
        din:  IN  std_logic_VECTOR(7 downto 0);
        dout: OUT std_logic_VECTOR(7 downto 0);
        we:   IN  std_logic);
END component;
...
RAM: ram16x8block port map(
    addr => addr, clk => clk, din => data, we => we, dout => RAMdata);
...
The block RAM behaves very differently from the distributed RAM in one
important respect: the block memory performs a synchronous read. This means that
whereas the distributed memory outputs the data at address addr irrespective of the
clock, the block memory outputs the data at address addr only on the rising edge of the
clock. Fig. 41.5 shows a simulation of RAMtest.vhd with the block RAM.
Figure 41.5 Simulation for the Block RAM test component, RAMtest
Addr and data are the address and data buses from the state machine into the block
RAM. The we signal is the write enable. RAMdata is the output data bus from the block
RAM. Finally, addr_q is an address signal internal to the block RAM. The output data
bus, RAMdata, always lags the address by one clock cycle, since a clock edge is required to
output the data. For example, when address 3 is presented for reading (we is low),
RAMdata shows the contents of address 3 only after the next rising clock edge. Compare this
with Fig. 41.4, where the RAMdata output for the distributed memory is updated
immediately (asynchronously) when the address changes. This is a good example of how
memories differ in behavior. It is important that you read the memory specifications
before implementing a memory controller to ensure that it will work properly.
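One way a controller might account for the one-cycle read latency is to delay the address by one clock, so that the delayed address lines up with the data appearing on RAMdata. The sketch below shows only this idea; the entity and signal names are hypothetical, and it is not part of the RAMtest design in this example.

```vhdl
-- Sketch: delay the read address by one clock to match synchronous read data.
library ieee;
use ieee.std_logic_1164.all;

entity read_addr_delay is                 -- hypothetical helper entity
    port ( clk    : in  std_logic;
           addr   : in  std_logic_vector(3 downto 0);
           addr_d : out std_logic_vector(3 downto 0));  -- addr, one clock later
end read_addr_delay;

architecture rtl of read_addr_delay is
begin
    process(clk)
    begin
        if rising_edge(clk) then
            -- After this edge, addr_d holds the address whose data the
            -- block RAM is presenting on its output during the same cycle.
            addr_d <= addr;
        end if;
    end process;
end rtl;
```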
After placing the ram16x8block.edn file in the src directory for this project and
creating a constraints file, we can synthesize and implement the design and download it
to the board for testing. Listing 41.7 shows the Design Summary from the map report for
the test state machine and block RAM.
Listing 41.7 Design Summary for RAMtest including the 16x8 Block RAM
Design Summary
--------------
Logic Utilization:
  Number of Slice Flip Flops:                       6 out of 3,840    1%
  Number of 4 input LUTs:                           8 out of 3,840    1%
Logic Distribution:
  Number of occupied Slices:                        7 out of 1,920    1%
  Number of Slices containing only related logic:   7 out of 7      100%
  Number of Slices containing unrelated logic:      0 out of 7        0%
Total Number of 4 input LUTs:                       8 out of 3,840    1%
Number of bonded IOBs:                             10 out of 173      5%
Number of Block RAMs:                               1 out of 12       8%
Number of GCLKs:                                    1 out of 8       12%
Total equivalent gate count for design: 65,635
Additional JTAG gate count for IOBs: 480
Peak Memory Usage: 129 MB
This Design Summary shows that the entire block RAM design used 7 Slices, while the
design with distributed memory used 11 Slices as shown in Listing 41.5. Also, Listing 41.7
indicates that one of the twelve 18 Kbit blocks of block RAM is used.