Memories

•  Memories in Verilog
•  Memories on the FPGA
•  External Memories
-- SRAM (async, sync)
-- DRAM
-- Flash

6.111 Fall 2008 Lecture 7 1


Memories: a practical primer
•  The good news: huge selection of technologies
–  Small & faster vs. large & slower
–  Every year capacities go up and prices go down
–  New kid on the block: high density, fast flash memories
•  Non-volatile, read/write, no moving parts! (robust, efficient)
•  The bad news: perennial system bottleneck
–  Latencies (access time) haven’t kept pace with cycle times
–  Memory uses a separate technology from logic, so data must travel between chips, and physical limitations (# of pins, resistance, capacitance, inductance) limit bandwidth
•  New hopes: capacitive interconnect, 3D ICs
–  Likely the limiting factor in cost & performance of many digital
systems: designers spend a lot of time figuring out how to keep
memories running at peak bandwidth
–  “It’s the memory, stupid”



Memories in Verilog
•  reg bit;                // a single register
•  reg [31:0] word;        // a 32-bit register
•  reg [31:0] array[15:0]; // 16 32-bit regs

•  wire [31:0] read_data,write_data;
   wire [3:0] index;

   // combinational (asynch) read
   assign read_data = array[index];

   // clocked (synchronous) write
   always @(posedge clock)
     array[index] <= write_data;



Multi-port Memories (aka regfiles)
reg [31:0] regfile[30:0]; // 31 32-bit words

// Beta register file: 2 read ports, 1 write
wire [4:0] ra1,ra2,wa;
wire [31:0] rd1,rd2,wd;

assign ra1 = inst[20:16];
assign ra2 = ra2sel ? inst[25:21] : inst[15:11];
assign wa = wasel ? 5'd30 : inst[25:21];

// read ports (register 31 always reads as zero)
assign rd1 = (ra1 == 5'd31) ? 32'd0 : regfile[ra1];
assign rd2 = (ra2 == 5'd31) ? 32'd0 : regfile[ra2];

// write port
always @(posedge clk)
  if (werf) regfile[wa] <= wd;

assign z = ~| rd1; // used in BEQ/BNE instructions



FIFOs
[Figure: FIFO block with inputs clk, reset, wr, din (WIDTH bits) and rd; outputs dout (WIDTH bits), full, empty and overflow; 1<<LOGSIZE locations]

// a simple synchronous FIFO (first-in first-out) buffer
// Parameters:
// LOGSIZE (parameter) FIFO has 1<<LOGSIZE elements
// WIDTH (parameter) each element has WIDTH bits
// Ports:
// clk (input) all actions triggered on rising edge
// reset (input) synchronously empties fifo
// din (input, WIDTH bits) data to be stored
// wr (input) when asserted, store new data
// full (output) asserted when FIFO is full
// dout (output, WIDTH bits) data read from FIFO
// rd (input) when asserted, removes first element
// empty (output) asserted when fifo is empty
// overflow (output) asserted when WR but no room, cleared on next RD
module fifo #(parameter LOGSIZE = 2, // default size is 4 elements
WIDTH = 4) // default width is 4 bits
(input clk,reset,wr,rd, input [WIDTH-1:0] din,
output full,empty,overflow, output [WIDTH-1:0] dout);

endmodule
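The fifo module above is shown with an empty body. One possible body is sketched below, assuming a circular buffer indexed by a write pointer and a read pointer plus an element count; a write that arrives when the FIFO is full is dropped and sets overflow. The internal names (data, wptr, rptr, count, over) are illustrative, not from the lecture.

```verilog
// Sketch of one possible body for the fifo module (assumption: circular
// buffer with read/write pointers and an element count).
module fifo #(parameter LOGSIZE = 2, // default size is 4 elements
                        WIDTH = 4)   // default width is 4 bits
  (input clk, reset, wr, rd, input [WIDTH-1:0] din,
   output full, empty, overflow, output [WIDTH-1:0] dout);

  reg [WIDTH-1:0] data [(1<<LOGSIZE)-1:0]; // storage locations
  reg [LOGSIZE-1:0] wptr, rptr;            // next location to write / read
  reg [LOGSIZE:0] count;                   // number of stored elements
  reg over;                                // sticky overflow flag

  assign empty    = (count == 0);
  assign full     = (count == (1 << LOGSIZE));
  assign overflow = over;
  assign dout     = data[rptr];            // first element (valid when !empty)

  wire do_wr = wr & ~full;                 // accept a write only if there is room
  wire do_rd = rd & ~empty;                // remove an element only if one exists

  always @(posedge clk) begin
    if (reset) begin                       // synchronously empty the FIFO
      wptr <= 0; rptr <= 0; count <= 0; over <= 0;
    end else begin
      if (do_wr) begin
        data[wptr] <= din;
        wptr <= wptr + 1;                  // wraps around at 1<<LOGSIZE
      end
      if (do_rd) rptr <= rptr + 1;
      count <= count + do_wr - do_rd;
      if (wr & full) over <= 1;            // asserted when WR but no room
      else if (rd)   over <= 0;            // cleared on next RD
    end
  end
endmodule
```

The count register is one bit wider than the pointers so that it can tell a full FIFO (1<<LOGSIZE elements) apart from an empty one, even though both have wptr equal to rptr.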
FIFOs in action
// make a fifo with 8 8-bit locations
fifo #(.LOGSIZE(3),.WIDTH(8))
f8x8(.clk(clk),.reset(reset),
.wr(wr),.din(din),.full(full),
.rd(rd),.dout(dout),.empty(empty),
.overflow(overflow));



FPGA memory implementation
•  Regular registers in logic blocks
–  Expensive use of resources, but convenient & fast if small
•  [Xilinx Virtex-II] use the LUTs:
–  Single port: 16x(1,2,4,8), 32x(1,2,4,8), 64x(1,2), 128x1
–  Dual port (1 R/W, 1R): 16x1, 32x1, 64x1
–  Can fake extra read ports by cloning memory: all clones are written with the same addr/data, but each clone can have a different read address
•  [Xilinx Virtex-II] use block RAM:
–  18K bits: 16Kx1, 8Kx2, 4Kx4
   with parity: 2Kx(8+1), 1Kx(16+2), 512x(32+4)
–  Single or dual port
–  Pipelined (clocked) operations
–  Labkit XC2V6000: 144 BRAMs, 2592K bits total (144 × 18K)
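A minimal sketch of the cloning trick, assuming a hypothetical module ram_2r1w with one write port and two asynchronous read ports: both copies receive every write, and each copy answers one read address.

```verilog
// Sketch (hypothetical module): two read ports built from two cloned
// memories that share one write port.
module ram_2r1w #(parameter AW = 4, DW = 8)
  (input clk, we, input [AW-1:0] waddr, input [DW-1:0] wdata,
   input [AW-1:0] raddr_a, raddr_b,
   output [DW-1:0] rdata_a, rdata_b);

  reg [DW-1:0] copy_a [(1<<AW)-1:0];
  reg [DW-1:0] copy_b [(1<<AW)-1:0];

  // every write goes to both clones with the same address and data
  always @(posedge clk)
    if (we) begin
      copy_a[waddr] <= wdata;
      copy_b[waddr] <= wdata;
    end

  // each clone has its own (combinational) read address
  assign rdata_a = copy_a[raddr_a];
  assign rdata_b = copy_b[raddr_b];
endmodule
```

Whether each copy lands in a dual-port LUT RAM (write through the R/W port, read through the read-only port) is up to the synthesis tool; many tools may also perform this replication automatically when one array has two read ports.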



LUT-based RAMs



LUT-based RAM Modules

// instantiate a LUT-based RAM module
RAM16X1S #(.INIT(16'b01101111001101011100)) // msb first
  mymem(.D(din),.O(dout),.WE(we),.WCLK(clock_27mhz),
        .A0(a[0]),.A1(a[1]),.A2(a[2]),.A3(a[3]));
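To see what the primitive does, here is a behavioral sketch with the same ports as RAM16X1S except that the four address pins are grouped into one bus A. It assumes, consistent with the "msb first" comment, that bit n of INIT is the initial contents of location n. The module name lut_ram16x1 is hypothetical, and this is not Xilinx's simulation model.

```verilog
// Behavioral sketch (hypothetical module) of a 16x1 LUT RAM:
// asynchronous read, synchronous write, INIT bit n = location n.
module lut_ram16x1 #(parameter [15:0] INIT = 16'h0000)
  (input D, WE, WCLK, input [3:0] A, output O);

  reg [15:0] bits;
  initial bits = INIT;            // power-up contents

  always @(posedge WCLK)
    if (WE) bits[A] <= D;         // write on the rising clock edge

  assign O = bits[A];             // combinational (asynchronous) read
endmodule
```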



Tools will often build these for you…
From Lab 2:

reg [7:0] segments;
always @ (switch[3:0]) begin
  case (switch[3:0])
    4'h0: segments[6:0] = 7'b0111111;
    4'h1: segments[6:0] = 7'b0000110;
    4'h2: segments[6:0] = 7'b1011011;
    4'h3: segments[6:0] = 7'b1001111;
    4'h4: segments[6:0] = 7'b1100110;
    4'h5: segments[6:0] = 7'b1101101;
    4'h6: segments[6:0] = 7'b1111101;
    4'h7: segments[6:0] = 7'b0000111;
    4'h8: segments[6:0] = 7'b1111111;
    4'h9: segments[6:0] = 7'b1100111;
    4'hA: segments[6:0] = 7'b1110111;
    4'hB: segments[6:0] = 7'b1111100;
    4'hC: segments[6:0] = 7'b1011000;
    4'hD: segments[6:0] = 7'b1011110;
    4'hE: segments[6:0] = 7'b1111001;
    4'hF: segments[6:0] = 7'b1110001;
    default: segments[6:0] = 7'b0000000;
  endcase
  segments[7] = 1'b0; // decimal point
end

Synthesis report:
=============================================
*                HDL Synthesis              *
=============================================
Synthesizing Unit <lab2_2>.
    Related source file is "../lab2_2.v".
    ...
    Found 16x7-bit ROM for signal <$n0000>.
    ...
    Summary:
        inferred 1 ROM(s).
    ...
Unit <lab2_2> synthesized.
=============================================
Timing constraint: Default path analysis
  Total number of paths / destination ports: 28 / 7
-------------------------------------------------
Delay:               7.244ns (Levels of Logic = 3)
  Source:            switch<3> (PAD)
  Destination:       user1<0> (PAD)

  Data Path: switch<3> to user1<0>
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name
    ----------------------------------------  ------------
     IBUF:I->O             7   0.825   1.102  switch_3_IBUF
     LUT4:I0->O            1   0.439   0.517  Mrom__n0000_inst_lut4_01
     OBUF:I->O                 4.361          user1_0_OBUF
    ----------------------------------------
    Total                      7.244ns (5.625ns logic, 1.619ns route)
                                       (77.7% logic, 22.3% route)



Block Memories (BRAMs)

(WDATA + WPARITY)*(LOCATIONS) = 18K bits

WDATA + WPARITY   LOCATIONS
1                 16K
2                 8K
4                 4K
8 + 1             2K
16 + 2            1K
32 + 4            512


BRAM Operation
[Figure: BRAM in single-port configuration, with inputs Data_in, Address, WE and CLK, and output Data_out]
Source: Xilinx App Note 463
BRAM timing
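Block RAM reads are clocked: the address is sampled on a rising edge and the data appears after that edge, a cycle later than the combinational read of a LUT RAM. A minimal behavioral sketch of this, assuming a hypothetical module bram_sp; synthesis tools commonly map this pattern onto a BRAM. On a write it returns the new data, which is our reading of the wizard's "Read After Write" mode.

```verilog
// Sketch (hypothetical module): single-port RAM with a registered read.
module bram_sp #(parameter AW = 10, DW = 8)
  (input clk, we, input [AW-1:0] addr, input [DW-1:0] din,
   output reg [DW-1:0] dout);

  reg [DW-1:0] mem [(1<<AW)-1:0];

  always @(posedge clk) begin
    if (we) begin
      mem[addr] <= din;
      dout <= din;        // read after write: output shows the new data
    end else
      dout <= mem[addr];  // clocked read: data valid after this edge
  end
endmodule
```

Changing addr between edges does not change dout; the new location's data appears only after the next rising edge.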



Using BRAMs (e.g., a 64Kx8 RAM)
•  From menus: Project → New Source…
–  Select “IP”
–  Fill in name
–  Click “Next” when done…

BRAM Example
•  Click open folders
•  Select “Single Port Block Memory”
•  Click “Next” and then “Finish” on next window

BRAM Example
•  Fill in name (again?!)
•  Select RAM vs ROM
•  Fill in width & depth
•  Usually “Read After Write” is what you want
•  Click “Next” …

BRAM Example
•  Can add extra control pins, but usually not
•  Click “Next” …

BRAM Example
•  Select polarity of control pins; active high default is usually just fine
•  Click “Next” …

BRAM Example
•  Click to name a .coe file that specifies initial contents (e.g., for a ROM)
•  Click “Generate” to complete
.coe file format
memory_initialization_radix=2;
memory_initialization_vector=
00000000,
00111110,
01100011,
00000011,
00000011,
00011110,
00000011,
00000011,
01100011,
00111110,
00000000,
00000000,

Memory contents are listed with location 0 first, then location 1, etc. You can specify the input radix; in this example we’re using binary. MSB is on the left, LSB on the right. Unspecified locations (if memory has more locations than given in the .coe file) are set to 0. The full vector is terminated by a semicolon.
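For comparison, the same contents can be written directly as a Verilog ROM that synthesis tools can infer, as in the Lab 2 example. This is a sketch with a hypothetical module name, glyph_rom, sized for 16 locations; the rows above appear to draw the digit "3", and the default case mirrors the .coe rule that unspecified locations are 0.

```verilog
// Sketch (hypothetical module): the .coe rows above as a 16x8 ROM.
module glyph_rom (input [3:0] addr, output reg [7:0] row);
  always @(*)
    case (addr)
      4'd0:    row = 8'b00000000;
      4'd1:    row = 8'b00111110;
      4'd2:    row = 8'b01100011;
      4'd3:    row = 8'b00000011;
      4'd4:    row = 8'b00000011;
      4'd5:    row = 8'b00011110;
      4'd6:    row = 8'b00000011;
      4'd7:    row = 8'b00000011;
      4'd8:    row = 8'b01100011;
      4'd9:    row = 8'b00111110;
      default: row = 8'b00000000; // locations 10-15 are 0
    endcase
endmodule
```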
Using result in your Verilog
•  Look at the generated Verilog for the module definition:

module ram64x8 (addr,clk,din,dout,we);
  input [15 : 0] addr;
  input clk;
  input [7 : 0] din;
  output [7 : 0] dout;
  input we;
endmodule

•  Use it to instantiate instances in your code:

ram64x8 foo(.addr(addr),.clk(clk),.we(we),
            .din(din),.dout(dout));



Memory Classification & Metrics

•  Read-Write Memory
–  Random Access: SRAM, DRAM
–  Sequential Access: FIFO
•  Non-Volatile Read-Write Memory: EPROM, E2PROM, FLASH
•  Read-Only Memory: Mask-Programmed ROM

Key Design Metrics:
1.  Memory Density (number of bits/mm²) and Size
2.  Access Time (time to read or write) and Throughput
3.  Power Dissipation



Static RAMs: Latch Based Memory

[Figure: three storage options: a set-reset latch, a flip-flop, and a register memory whose words are selected by Address]

•  Works fine for small memory blocks (e.g., small register files)
•  Inefficient in area for large memories
•  Density is the key metric in large memory circuits

How do we minimize cell size?
Memory Array Architecture
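Small cells → small MOSFETs → small dV on bit line
[Figure: a $2^L \times M$ memory. Address bits $A_K \ldots A_{L-1}$ drive the row decode, which selects one word line in a cell array of $2^{L-K}$ rows by $M \cdot 2^K$ columns; each storage cell sits on a word line and a bit line. Sense amps/drivers amplify the bit-line swing to rail-to-rail amplitude. Address bits $A_0 \ldots A_{K-1}$ drive the column decode, which selects the appropriate word (i.e., acts as a multiplexer) for the M-bit input-output.]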


Static RAM (SRAM) Cell (The 6-T Cell)
[Figure: 6-T cell: cross-coupled inverters M1–M4 hold Q and its complement Q̄; access transistors M5 and M6, gated by word line WL, connect the cell to bit lines BL and BL̄]
Write: Set BL, BL̄ to (0, VDD) or (VDD, 0), then enable WL (= VDD)
Read: Disconnect drivers from BL and BL̄, then enable WL (= VDD). Sense a small change in BL or BL̄
•  State held by cross-coupled inverters (M1-M4)
•  Retains state as long as power supply turned on
•  Feedback must be overdriven to write into the memory
Using External Memory Devices

[Figure: memory chip internals: Address pins drive the row decoder and column decoder around the memory matrix; sense amps/drivers connect the matrix to the bidirectional Data pins through write logic and read logic, controlled by Chip Enable, Write Enable and Output Enable. Inset: tri-state driver: if enable = 0, out = Z; if enable = 1, out = in]

•  Address pins drive row and column decoders
•  Data pins are bidirectional: shared by reads and writes (concept of a “data bus”)
•  Output Enable gates the chip’s tristate driver
•  Write Enable sets the memory’s read/write mode
•  Chip Enable/Chip Select acts as a “master switch”


MCM6264C 8K x 8 Static RAM
On the outside:
[Figure: MCM6264C symbol with 13-bit Address, Chip Enables E1 and E2, Write Enable WE, Output Enable OE, and 8-bit data bus DQ[7:0]]
•  Same (bidirectional) data bus used for reading and writing
•  Chip Enables (E1 and E2)
–  E1 must be low and E2 must be high to enable the chip
•  Write Enable (WE)
–  When low (and chip enabled), values on the data bus are written to the location selected by the address bus
•  Output Enable (OE or G)
–  When low (and chip is enabled), the data bus is driven with the value of the selected memory location

On the inside:
[Figure: internal organization and pinout: A2, A3, A4, A5, A7, A8, A9 and A11 drive the row decoder (256 rows); A0, A1, A6, A10 and A12 drive the column decoder (32 columns); sense amps/drivers connect the memory matrix to DQ[7:0] under control of E1, E2, W and G]
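A behavioral sketch of how the MCM6264C's control pins combine, assuming a hypothetical module async_sram_8kx8 with no timing delays (real parts have the access, setup and hold times of the timing slides). Names ending in _n are active low; E1_n, E2, W_n and G_n stand for the E1, E2, W and G pins.

```verilog
// Behavioral sketch (hypothetical module, no timing) of an 8K x 8
// asynchronous SRAM with MCM6264C-style control pins.
module async_sram_8kx8 (input [12:0] A, input E1_n, E2, W_n, G_n,
                        inout [7:0] DQ);
  reg [7:0] mem [0:8191];
  wire enabled = ~E1_n & E2;   // E1 low and E2 high select the chip

  // Read: drive DQ only when selected, output-enabled and not writing;
  // otherwise leave the bus tristated.
  assign DQ = (enabled & ~G_n & W_n) ? mem[A] : 8'bz;

  // Write: while selected with W_n low, the addressed location follows DQ,
  // so the value present when W_n (or E1_n) rises is the one that stays.
  always @(*)
    if (enabled & ~W_n)
      mem[A] = DQ;
endmodule
```

Because the write is modeled as level-sensitive, an address change while W_n is low also writes the new location, which is the "glitches to address can cause writes to random addresses" hazard.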



Reading an Asynchronous SRAM
[Timing diagram: Address becomes valid; E1 and OE go low; Data leaves tristate after the bus enable time and becomes valid after the access time (measured from address valid and from enable low); after OE or E1 goes high, Data returns to tristate after the bus tristate time. E2 assumed high (enabled), W = 1 (read mode).]

•  Read cycle begins when all enable signals (E1, E2, OE) are active
•  Data is valid after read access time
–  Access time is indicated by full part number: MCM6264CP-12 → 12ns
•  Data bus is tristated shortly after OE or E1 goes high


Address Controlled Reads

[Timing diagram: with E1 and OE held low, the Address steps through Address 1, 2, 3 and Data follows as Data 1, 2, 3, each valid an access time after its address is valid and held for a contamination time after the address changes; bus enable and bus tristate times apply at the start and end. E2 assumed high (enabled), WE = 1 (read mode).]

•  Can perform multiple reads without disabling chip
•  Data bus follows address bus, after some delay


Writing to Asynchronous SRAM

[Timing diagram: Address valid, with an address setup time before and an address hold time after the write pulse; E1 and WE low for the write pulse width; Data valid for the data setup time before and the data hold time after the end of the pulse. E2 and OE are held high.]

•  Data latched when WE or E1 goes high (or E2 goes low)
–  Data must be stable at this time
–  Address must be stable before WE goes low
•  Write waveforms are more important than read waveforms
–  Glitches to address can cause writes to random addresses!
Sample Memory Interface Logic
[Timing diagram: a write cycle followed by a read cycle, showing Clock/E1, OE, WE, Address (address for write, then address for read) and Data (data for write, then data read). The write occurs when E1 goes high; read data can be latched at the end of the read cycle.]

•  Drive data bus only when clock is low
–  Ensures addresses are stable for writes
–  Prevents bus contention
–  Minimum clock period is twice memory access time

[Figure: FPGA-side interface to the SRAM: a control FSM (inputs write, read, reset) produces ext_chip_enable, ext_write_enable and ext_output_enable for the SRAM's E1, W and G pins (E2 tied to VCC, Clock also shown at E1); write data is registered (int_data) onto Data[7:0]; read data is captured from ext_data into a register; the address is registered (ext_address) onto Address[12:0]]


Tristate Data Buses in Verilog
[Figure: write data is registered into int_data on clk and driven onto ext_data through a tristate buffer controlled by CE (active low) and OE (active low); read data is registered from ext_data]

output CE,OE; // these signals are active low
inout [7:0] ext_data;
reg [7:0] read_data,int_data;
wire [7:0] write_data;

always @(posedge clk) begin
  int_data <= write_data;
  read_data <= ext_data;
end

// Use a tristate driver to set ext_data to a value
// (drive only while the chip is enabled and its outputs are off)
assign ext_data = (~CE & OE) ? int_data : 8'hZZ;
Synchronous SRAM Memories
•  Clocking provides input synchronization and encourages more
reliable operation at high speeds
[Figure: synchronous SRAM block diagram (address and data pins, row decoder, memory matrix, sense amps/drivers, column decoder, write and read logic with Write Enable, Chip Enable and Output Enable); a long “flow-through” combinational path creates a high CLK-Q delay]

[Timing diagram: cycles R1 R2 W3 R4 W5 on CE, WE and CLK, with Address A1–A5 and Data Q1 Q2 D3 Q4 D5; the difference between read and write timings creates wasted cycles (“wait states”)]


ZBT Eliminates the Wait State
•  The wait state occurs because:
–  On a read, data is available after the clock edge
–  On a write, data is set up before the clock edge
•  ZBT (“zero bus turnaround”) memories change the rules for writes
–  On a write, data is set up after the clock edge
(so that it is read on the following edge)
–  Result: no wait states, higher memory throughput

[Timing diagram: cycles R1 R2 W3 R4 W5 on CE, WE and CLK, with Address A1–A5 and Data Q1 Q2 D3 Q4 D5 back to back. The write to A3 is requested on one edge and data D3 is loaded on the next; likewise the write to A5 and data D5.]
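A behavioral sketch of the ZBT rule for writes, assuming a hypothetical module zbt_like_ram with separate din and dout ports in place of the shared data bus, and flow-through (non-pipelined) reads. Each rising edge captures an address and a write flag; read data for that address appears after the edge, while write data is supplied after the edge and captured on the following edge.

```verilog
// Behavioral sketch (hypothetical module) of ZBT-style late writes.
module zbt_like_ram #(parameter AW = 4, DW = 8)
  (input clk, we, input [AW-1:0] addr, input [DW-1:0] din,
   output [DW-1:0] dout);

  reg [DW-1:0] mem [(1<<AW)-1:0];
  reg [AW-1:0] addr_q;  // address captured at the last edge
  reg          we_q;    // was the last captured cycle a write?

  always @(posedge clk) begin
    if (we_q) mem[addr_q] <= din; // write data arrives one cycle after its address
    addr_q <= addr;
    we_q   <= we;
  end

  // flow-through read of the captured address (valid after the edge)
  assign dout = mem[addr_q];
endmodule
```

A read of the same address in the very next cycle still sees the new value, because the late write commits on the same edge that captures the read address.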



Pipelining Allows Faster CLK
•  Pipeline the memory by registering its output
–  Good: Greatly reduces CLK-Q delay, allows higher clock (more throughput)
–  Bad: Introduces an extra cycle before data is available (more latency)
[Figure: ZBT synchronous SRAM block diagram with a pipelining register added at the data output]
As an example, see the CY7C147X ZBT Synchronous SRAM

[Timing diagram: cycles R1 R2 W3 R4 W5 on CE, WE and CLK, with Address A1–A5; Data Q1 Q2 D3 Q4 D5 now arrive with one-cycle latency (ZBT writes to A3 and A5)]


EEPROM
Electrically Erasable Programmable Read-Only Memory

EEPROM – The Floating Gate Transistor
[Figure: Intel floating-gate transistor [Rabaey03]: avalanche injection under high programming voltages (up to 20 V) places charge on the floating gate; removing the programming voltage leaves the charge trapped]
This is a non-volatile memory (retains state when supply turned off)


Usage: Just like SRAM, but writes are much slower than reads
(write sequence is controlled by an FSM internal to the chip)
Common application: configuration data (serial EEPROM)
Interacting with Flash and (E)EPROM
•  Reading from flash or (E)EPROM is the same as reading from SRAM
•  Vpp: input for programming voltage (12V)
–  EPROM: Vpp is supplied by programming machine
–  Modern flash/EEPROM devices generate 12V using an on-chip charge pump
•  EPROM lacks a write enable
–  Not in-system programmable (must use a special programming machine)
•  For flash and EEPROM, write sequence is controlled by an internal FSM
–  Writes to device are used to send signals to the FSM
–  Although the same signals are used, one can’t write to flash/EEPROM in the same
manner as SRAM

[Figure: Flash/EEPROM block diagram: Vcc (5V), Address, Data, Chip Enable, Output Enable and Write Enable pins; an internal programming FSM and a charge pump that generates the programming voltage (12V). EPROM omits the FSM, charge pump, and write enable.]
Dynamic RAM (DRAM) Cell
[Figure: 1-T DRAM cell [Rabaey03]: access transistor M1, gated by word line WL, connects bit line BL (capacitance CBL) to storage capacitor CS at node X; waveforms for Write "1" and Read "1" (BL precharged to VDD/2, then sensing); cross-section of the special capacitor structures DRAM uses (cell plate, capacitor insulator, storage node poly, Si substrate, field oxide)]
To Write: set Bit Line (BL) to 0 or VDD & enable Word Line (WL) (i.e., set it to VDD)
To Read: set Bit Line (BL) to VDD/2 & enable Word Line (i.e., set it to VDD)
•  DRAM relies on charge stored in a capacitor to hold state
•  Found in all high density memories (one bit/transistor)
•  Must be “refreshed” or state will be lost – high overhead
Asynchronous DRAM Operation
[Timing diagram: the Address pins carry the Row address and then the Col address, taken on the falling edges of RAS and CAS respectively; Data goes from tristate to Q (data from RAM); WE is set high/low before asserting CAS. RAS-before-CAS performs a read or write; CAS-before-RAS performs a refresh.]

•  Clever manipulation of RAS and CAS after reads/writes provides more efficient modes: early-write, read-write, hidden-refresh, etc. (See datasheets for details)
Addressing with Memory Maps
•  Address decoder selects memory
–  Example: ‘138 3-to-8 decoder
–  Produces enable signals
•  SRAM-like interface often used for peripherals
–  Known as “memory mapped” peripherals

[Figure: Address[15:13] drive inputs C, B, A of a ‘138 decoder (G1 tied to +5V, ~G2A/~G2B driven by Bus Enable); its outputs drive the ~E1 enables of SRAM 1, SRAM 2, the EPROM and an ADC with an analog input. The memories share Address[12:0] (the ADC uses Address[2:0]) and Data[7:0]; OE and WE drive the ~G and ~W pins.]

Memory Map:
0xE000–0xFFFF   EPROM
0xC000–0xDFFF   SRAM 2
0xA000–0xBFFF   SRAM 1
0x2000–0x9FFF   (not assigned)
0x0000–0x1FFF   ADC
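A minimal Verilog sketch of the ‘138-style decoding for this memory map, assuming a hypothetical module mem_map_decode with an active-high bus_en (the ‘138 itself uses active-low ~G2A/~G2B enables). Address[15:13] selects one 8 KB region, and each device gets an active-low chip enable.

```verilog
// Sketch (hypothetical module): decode Address[15:13] into active-low
// chip enables for the memory map above.
module mem_map_decode (input [15:0] addr, input bus_en,
                       output eprom_e1_n, sram2_e1_n, sram1_e1_n, adc_e1_n);
  wire [2:0] region = addr[15:13];                  // one of eight 8 KB regions

  assign eprom_e1_n = ~(bus_en & (region == 3'd7)); // 0xE000-0xFFFF
  assign sram2_e1_n = ~(bus_en & (region == 3'd6)); // 0xC000-0xDFFF
  assign sram1_e1_n = ~(bus_en & (region == 3'd5)); // 0xA000-0xBFFF
  assign adc_e1_n   = ~(bus_en & (region == 3'd0)); // 0x0000-0x1FFF
endmodule
```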



Memory Devices: Helpful Knowledge
•  SRAM vs. DRAM
–  SRAM holds state as long as power supply is turned on. DRAM
must be “refreshed” – results in more complicated control
–  DRAM has much higher density, but requires special capacitor
technology.
–  FPGAs are usually implemented in a standard digital process technology and use SRAM technology
•  Non-Volatile Memory
–  Fast Read, but very slow write (EPROM must be removed from
the system for programming!)
–  Holds state even if the power supply is turned off
•  Memory Internals
–  Memories contain quite a bit of analog circuitry internally; pay particular attention to noise and PCB integration
•  Device details
–  Don’t worry about them, wait until 6.012 or 6.374



You Should Understand Why…
•  control signals such as Write Enable should be registered
•  a multi-cycle read/write is safer from a timing perspective
than the single cycle read/write approach
•  it is a bad idea to enable two tri-states driving the bus at the
same time
•  an SRAM does not need to be “refreshed” while a DRAM
requires refresh
•  an EPROM/EEPROM/FLASH cell can hold its state even if the
power supply is turned off
•  a synchronous memory can result in higher throughput
