Digital Design
Chapter 5: Register-Transfer Level (RTL) Design
Slides to accompany the textbook Digital Design, First Edition, by Frank Vahid, John Wiley and Sons Publishers, 2007.
http://www.ddvahid.com
Copyright 2007 Frank Vahid
Instructors of courses requiring Vahid's Digital Design textbook (published by John Wiley and Sons) have permission to modify and use these slides for customary course-related activities, subject to keeping this copyright notice in place and unmodified. These slides may be posted as unanimated pdf versions on publicly-accessible course websites. PowerPoint source (or pdf with animations) may not be posted to publicly-accessible websites, but may be posted for students on internal protected sites or distributed directly to students by other electronic means. Instructors may make printouts of the slides available to students for a reasonable photocopying charge, without incurring royalties. Any other use requires explicit permission. Instructors may obtain PowerPoint source or obtain special use permissions from Wiley; see http://www.ddvahid.com for information.
Digital Design
Copyright 2006
Frank Vahid
5.1 Introduction
Chapter 3: Controllers
  Control input/output: single bit (or just a few) representing event or state
  Finite-state machine describes behavior; implemented as state register and combinational logic
[Figure: FSM: combinational logic with FSM inputs (bi) and FSM outputs (bo), plus a state register (s1, s0) clocked by clk and loaded from next-state signals n1, n0]
Chapter 4: Datapath components
  Data input/output: multiple bits collectively representing single entity
  Datapath components included registers, adders, ALU, comparators, register files, etc.
[Figure: register, comparator, ALU, register file]
This chapter: custom processors
  Processor: controller and datapath components working together to implement an algorithm
[Figure: processor: a controller (combinational logic and state register) connected to a datapath (register file and ALU)]
Note: Slides with animation are denoted with a small red "a" near the animated items
RTL Design: Capture Behavior, Convert to Circuit
Recall
  Chapter 2: Combinational Logic Design
    First step: Capture behavior (using equation or truth table)
    Remaining steps: Convert to circuit
  Chapter 3: Sequential Logic Design
    First step: Capture behavior (using FSM)
    Remaining steps: Convert to circuit
RTL Design (the method for creating custom processors)
  First step: Capture behavior (using high-level state machine, to be introduced)
  Remaining steps: Convert to circuit
5.2 RTL Design Method
RTL Design Method: Preview Example
Soda dispenser
  c: bit input, 1 when coin deposited
  a: 8-bit input having value of deposited coin
  s: 8-bit input having cost of a soda
  d: bit output, processor sets to 1 when total value of deposited coins equals or exceeds cost of a soda
[Figure: soda dispenser processor with inputs c, a, s and output d; example with s = 50: two coins of value 25 (a = 25) are deposited, each signaled by a pulse on c; tot goes to 25, then 50, and d then becomes 1]
How can we precisely describe this processor's behavior?
Preview Example: Step 1 - Capture High-Level State Machine
Declare local register tot
Init state: Set d=0, tot=0
Wait state: wait for coin
  If see coin, go to Add state
Add state: Update total value: tot = tot + a
  Remember, a is present coin's value
  Go back to Wait state
In Wait state, if tot >= s, go to Disp(ense) state
Disp state: Set d=1 (dispense soda)
  Return to Init state
High-level state machine (x' means NOT x):
  Inputs: c (bit), a (8 bits), s (8 bits)
  Outputs: d (bit)
  Local registers: tot (8 bits)
  Init: d=0, tot=0; go to Wait
  Wait: on c, go to Add; on c'*(tot<s), stay in Wait; on c'*(tot<s)', go to Disp
  Add: tot=tot+a; go to Wait
  Disp: d=1; go to Init
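A small C simulation can make the high-level state machine's semantics concrete. The sketch below is hypothetical (the names `SodaState`, `Soda`, `soda_step`, and `soda_d` are ours, not from the slides); it assumes each call is one clock cycle, that transition conditions read tot as it was at the start of the cycle, and that d is 1 only while in Disp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cycle-by-cycle model of the soda dispenser's high-level state machine. */
typedef enum { INIT, WAIT, ADD, DISP } SodaState;

typedef struct {
    SodaState state;
    uint8_t tot;            /* local register tot (8 bits) */
} Soda;

/* Output d: 1 while in Disp, 0 otherwise (assumption, as in the controller FSM). */
bool soda_d(const Soda *m) { return m->state == DISP; }

/* One clock cycle. c = coin detected, a = coin value, s = soda cost.
   Conditions read the current tot; register and state updates happen together. */
void soda_step(Soda *m, bool c, uint8_t a, uint8_t s) {
    assert(m != NULL);
    SodaState next = m->state;
    uint8_t tot_next = m->tot;
    switch (m->state) {
    case INIT:                                /* d=0, tot=0 */
        tot_next = 0;
        next = WAIT;
        break;
    case WAIT:
        if (c)               next = ADD;      /* coin seen */
        else if (m->tot < s) next = WAIT;     /* c'*(tot<s) */
        else                 next = DISP;     /* c'*(tot<s)' */
        break;
    case ADD:                                 /* tot = tot + a, 8-bit adder wraps */
        tot_next = (uint8_t)(m->tot + a);
        next = WAIT;
        break;
    case DISP:                                /* d=1 */
        next = INIT;
        break;
    }
    m->state = next;
    m->tot = tot_next;
}
```

For example, with s = 50, depositing two coins of 25 takes the machine through Add twice; Wait then sees tot = 50, which is not less than s, and moves to Disp.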
Preview Example: Step 2 - Create Datapath
Need tot register
Need 8-bit comparator to compare s and tot
Need 8-bit adder to perform tot = tot + a
Wire the components as needed for above
Create control inputs/outputs, give them names
[Figure: datapath: 8-bit tot register with load input tot_ld and clear input tot_clr; an 8-bit adder computes tot + a and feeds the register; an 8-bit comparator computes tot < s, output tot_lt_s]
Preview Example: Step 3 - Connect Datapath to a Controller
Controller's inputs
  External input c (coin detected)
  Input from datapath comparator's output, which we named tot_lt_s
Controller's outputs
  External output d (dispense soda)
  Outputs to datapath to load and clear the tot register
[Figure: controller with input c and output d, connected to the datapath by tot_ld and tot_clr (controller to datapath) and tot_lt_s (datapath to controller); the datapath has 8-bit inputs s and a]
Preview Example: Step 4 - Derive the Controller's FSM
Same states and arcs as high-level state machine
But set/read datapath control signals for all datapath operations and conditions
Controller FSM (x' means NOT x):
  Inputs: c, tot_lt_s (bit)
  Outputs: d, tot_ld, tot_clr (bit)
  Init: d=0, tot_clr=1; go to Wait
  Wait: on c, go to Add; on c'*tot_lt_s, stay in Wait; on c'*tot_lt_s', go to Disp
  Add: tot_ld=1; go to Wait
  Disp: d=1; go to Init
[Figure: controller FSM connected to the datapath (tot register, 8-bit adder, 8-bit comparator) via tot_ld, tot_clr, and tot_lt_s]
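Preview Example: Completing the Design
Implement the FSM as a state register and logic
  As in Ch3
  Table shown below
Inputs: c, tot_lt_s (bit); Outputs: d, tot_ld, tot_clr (bit)
State encoding (s1 s0): Init = 00, Wait = 01, Add = 10, Disp = 11
| State | s1 | s0 | c | tot_lt_s | n1 | n0 | d | tot_ld | tot_clr |
|-------|----|----|---|----------|----|----|---|--------|---------|
| Init  | 0  | 0  | 0 | 0        | 0  | 1  | 0 | 0      | 1       |
| Init  | 0  | 0  | 0 | 1        | 0  | 1  | 0 | 0      | 1       |
| Init  | 0  | 0  | 1 | 0        | 0  | 1  | 0 | 0      | 1       |
| Init  | 0  | 0  | 1 | 1        | 0  | 1  | 0 | 0      | 1       |
| Wait  | 0  | 1  | 0 | 0        | 1  | 1  | 0 | 0      | 0       |
| Wait  | 0  | 1  | 0 | 1        | 0  | 1  | 0 | 0      | 0       |
| Wait  | 0  | 1  | 1 | 0        | 1  | 0  | 0 | 0      | 0       |
| Wait  | 0  | 1  | 1 | 1        | 1  | 0  | 0 | 0      | 0       |
| Add   | 1  | 0  | 0 | 0        | 0  | 1  | 0 | 1      | 0       |
| Add   | 1  | 0  | 0 | 1        | 0  | 1  | 0 | 1      | 0       |
| Add   | 1  | 0  | 1 | 0        | 0  | 1  | 0 | 1      | 0       |
| Add   | 1  | 0  | 1 | 1        | 0  | 1  | 0 | 1      | 0       |
| Disp  | 1  | 1  | 0 | 0        | 0  | 0  | 1 | 0      | 0       |
| Disp  | 1  | 1  | 0 | 1        | 0  | 0  | 1 | 0      | 0       |
| Disp  | 1  | 1  | 1 | 0        | 0  | 0  | 1 | 0      | 0       |
| Disp  | 1  | 1  | 1 | 1        | 0  | 0  | 1 | 0      | 0       |
[Figure: the controller FSM (Init: d=0, tot_clr=1; Wait; Add: tot_ld=1; Disp: d=1) implemented as combinational logic plus a state register]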
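The table's logic can be minimized to equations. The ones below are our own two-level simplification of the table (they are not given on the slide): n1 = s1'·s0·(c + tot_lt_s'), n0 = s0' + s1'·c', d = s1·s0, tot_ld = s1·s0', tot_clr = s1'·s0'. The hypothetical C functions `soda_ctrl_logic` and `soda_ctrl_check` evaluate them and assert that they reproduce the FSM's next state and outputs for all 16 rows.

```c
#include <assert.h>
#include <stdbool.h>

/* Combinational logic of the soda dispenser controller, assuming the
   encoding Init=00, Wait=01, Add=10, Disp=11 (s1 s0). */
typedef struct { bool n1, n0, d, tot_ld, tot_clr; } SodaCtrlOut;

SodaCtrlOut soda_ctrl_logic(bool s1, bool s0, bool c, bool tot_lt_s) {
    SodaCtrlOut o;
    o.n1 = !s1 && s0 && (c || !tot_lt_s);   /* Wait -> Add (10) or Wait -> Disp (11) */
    o.n0 = !s0 || (!s1 && !c);              /* next state is Wait (01) or Disp (11) */
    o.d = s1 && s0;                         /* Disp */
    o.tot_ld = s1 && !s0;                   /* Add */
    o.tot_clr = !s1 && !s0;                 /* Init */
    return o;
}

/* Self-check: compare the equations with the FSM's arcs for every row. */
void soda_ctrl_check(void) {
    /* next_of[state][c*2 + tot_lt_s]; states: 0=Init, 1=Wait, 2=Add, 3=Disp */
    static const int next_of[4][4] = {
        {1, 1, 1, 1},   /* Init -> Wait */
        {3, 1, 2, 2},   /* Wait: c'tot_lt_s' -> Disp, c'tot_lt_s -> Wait, c -> Add */
        {1, 1, 1, 1},   /* Add -> Wait */
        {0, 0, 0, 0},   /* Disp -> Init */
    };
    for (int state = 0; state < 4; state++) {
        for (int in = 0; in < 4; in++) {
            bool s1 = state >> 1, s0 = state & 1, c = in >> 1, t = in & 1;
            SodaCtrlOut o = soda_ctrl_logic(s1, s0, c, t);
            int next = next_of[state][in];
            assert(o.n1 == (bool)(next >> 1) && o.n0 == (bool)(next & 1));
            assert(o.d == (state == 3) && o.tot_ld == (state == 2) && o.tot_clr == (state == 0));
        }
    }
}
```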
Step 1: Create a High-Level State Machine
Let's consider each step of the RTL design process in more detail
Step 1
  Soda dispenser example
  Not an FSM because:
    Multi-bit (data) inputs a and s
    Local register tot
    Data operations tot=0, tot<s, tot=tot+a
  Useful high-level state machine:
    Data types beyond just bits
    Local registers
    Arithmetic equations/expressions
[Figure: the soda dispenser high-level state machine (Init: d=0, tot=0; Wait; Add: tot=tot+a; Disp: d=1) with inputs c (bit), a (8 bits), s (8 bits), output d (bit), and local register tot (8 bits)]
Step 1 Example: Laser-Based Distance Measurer
Example of how to create a high-level state machine to describe desired processor behavior
Laser-based distance measurement: pulse laser, measure time T to sense reflection
  Laser light travels at speed of light, $3 \times 10^{8}$ m/sec
  Distance is thus $D = T \text{ sec} \times 3 \times 10^{8} \text{ m/sec} / 2$
[Figure: laser and sensor aimed at an object of interest at distance D; the light travels 2D in T seconds, so $2D = T \text{ sec} \times 3 \times 10^{8}$ m/sec]
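As a quick numeric check of the formula, here is a hypothetical helper `laser_distance_m` (our name) using the slide's speed of light, 3×10^8 m/sec: a 2 microsecond round trip gives 2×10^-6 × 3×10^8 / 2 = 300 m.

```c
#include <assert.h>

#define LIGHT_SPEED_M_PER_S 3.0e8   /* value used on the slide */

/* Distance to the object from round-trip time t_sec: the light covers 2D
   in T seconds, so D = T * 3e8 / 2 meters. */
double laser_distance_m(double t_sec) {
    assert(t_sec >= 0.0);
    return t_sec * LIGHT_SPEED_M_PER_S / 2.0;
}
```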
Step 1 Example: Laser-Based Distance Measurer
Inputs/outputs
  B: bit input, from button to begin measurement
  L: bit output, activates laser
  S: bit input, senses laser reflection
  D: 16-bit output, displays computed distance
[Figure: laser-based distance measurer block with input B from button, output L to laser, input S from sensor, and 16-bit output D to display]
Step 1 Example: Laser-Based Distance Measurer
Step 1: Create high-level state machine
Begin by declaring inputs and outputs
  Inputs: B, S (1 bit each)
  Outputs: L (bit), D (16 bits)
Create initial state, name it S0
  Initialize laser to off (L=0)
  Initialize displayed distance to 0 (D=0)
[Figure: state S0 with actions L = 0 (laser off) and D = 0 (distance = 0)]
Step 1 Example: Laser-Based Distance Measurer
Inputs: B, S (1 bit each); Outputs: L (bit), D (16 bits)
Add another state, call it S1, that waits for a button press (B' means NOT B)
  B' (button not pressed): stay in S1, keep waiting
  B (button pressed): go to a new state S2
Q: What should S2 do? A: Turn on the laser
[Figure: S0 (L=0, D=0) goes to S1; S1 loops on B' and exits on B]
Step 1 Example: Laser-Based Distance Measurer
Inputs: B, S (1 bit each); Outputs: L (bit), D (16 bits)
Add a state S2 that turns on the laser (L=1)
Then turn off laser (L=0) in a state S3
Q: What next? A: Start timer, wait to sense reflection
[Figure: S0 (L=0, D=0) -> S1 -> on B, S2 (L=1, laser on) -> S3 (L=0, laser off)]
Step 1 Example: Laser-Based Distance Measurer
Inputs: B, S (1 bit each); Outputs: L (bit), D (16 bits)
Local registers: Dctr (16 bits)
Stay in S3 until sense reflection (S)
To measure time, count cycles for which we are in S3
  To count, declare local register Dctr
  Increment Dctr each cycle in S3
  Initialize Dctr to 0 in S1. S2 would have been O.K. too
[Figure: S1 (Dctr = 0, reset cycle count); S2 (L=1); S3 (L=0, Dctr = Dctr + 1, count cycles) loops on S' (no reflection, S' means NOT S) and exits on S (reflection) to a state not yet defined]
Step 1 Example: Laser-Based Distance Measurer
Inputs: B, S (1 bit each); Outputs: L (bit), D (16 bits)
Local registers: Dctr (16 bits)
Once reflection detected (S), go to new state S4
  Calculate distance
  Assuming clock frequency is $3 \times 10^{8}$ Hz, light travels 1 m per cycle, so Dctr holds the number of meters of the round trip, and D = Dctr / 2
After S4, go back to S1 to wait for button again
[Figure: complete high-level state machine: S0 (L=0, D=0) -> S1 (Dctr = 0) -> on B, S2 (L=1) -> S3 (L=0, Dctr = Dctr + 1) -> on S, S4 (D = Dctr / 2, calculate D) -> S1]
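The finished high-level state machine can be simulated the same way. In this hypothetical sketch (`Laser`, `laser_step`, and `laser_L` are our names), one call to `laser_step` is one clock cycle, L is treated as 1 only in S2, and D is held in a register. Because the clock is assumed to be 3×10^8 Hz, 10 cycles in S3 mean a 10 m round trip, so D = 5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cycle-by-cycle model of the distance measurer's high-level state machine. */
typedef enum { S0, S1, S2, S3, S4 } LaserState;

typedef struct {
    LaserState state;
    uint16_t D;      /* data output D (16 bits), held in a register */
    uint16_t Dctr;   /* local register: cycles spent in S3 */
} Laser;

/* Output L: the laser is on only in S2. */
bool laser_L(const Laser *m) { return m->state == S2; }

/* One clock cycle; B = button pressed, S = reflection sensed.
   No transition reads a register, so actions and transitions can share one step. */
void laser_step(Laser *m, bool B, bool S) {
    assert(m != NULL);
    switch (m->state) {
    case S0: m->D = 0;           m->state = S1;          break;
    case S1: m->Dctr = 0;        m->state = B ? S2 : S1; break;
    case S2:                     m->state = S3;          break;
    case S3: m->Dctr++;          m->state = S ? S4 : S3; break;
    case S4: m->D = m->Dctr / 2; m->state = S1;          break;
    }
}
```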
Step 2: Create a Datapath
Datapath must
  Implement data storage
  Implement data computations
Look at high-level state machine, do three substeps
  (a) Make data inputs/outputs be datapath inputs/outputs
  (b) Instantiate declared registers into the datapath (also instantiate a register for each data output)
  (c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations
Instantiate: to introduce a new component into a design.
Step 2 Example: Laser-Based Distance Measurer
Inputs: B, S (1 bit each); Outputs: L (bit), D (16 bits)
Local registers: Dctr (16 bits)
(a) Make data inputs/outputs be datapath inputs/outputs
(b) Instantiate declared registers into the datapath (also instantiate a register for each data output)
(c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations
[Figure: datapath with Dctr, a 16-bit up-counter (clear input Dctr_clr, count input Dctr_cnt), and Dreg, a 16-bit register (clear input Dreg_clr, load input Dreg_ld) whose output Q drives the 16-bit output D]
Step 2 Example: Laser-Based Distance Measurer
Inputs: B, S (1 bit each); Outputs: L (bit), D (16 bits)
Local registers: Dctr (16 bits)
(c) (continued) Examine every state and transition, and instantiate datapath components and connections to implement any data computations
  D = Dctr / 2 (in S4) is implemented by a shift right by one bit (>>1) between Dctr and Dreg
[Figure: datapath: Dctr (16-bit up-counter; Dctr_clr, Dctr_cnt) output Q feeds a >>1 shifter whose 16-bit output feeds Dreg (16-bit register; Dreg_clr, Dreg_ld); Dreg drives output D]
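Step 2 Example Showing Mux Use
Local registers: E, F, G, R (16 bits)
  T0: R = E + F
  T1: R = R + G
Introduce mux when one component input can come from more than one source
[Figure: (a) through (d) build a datapath that shares one adder between T0 and T1: a 2x1 mux on adder input A (select add_A_s0) chooses E or R, and a 2x1 mux on adder input B (select add_B_s0) chooses F or G; the adder output loads R]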
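One way to model the shared adder in C: each adder input gets a 2x1 mux, and the controller picks the sources. The mapping of select values (add_A_s0 = 0 picks E, 1 picks R; add_B_s0 = 0 picks F, 1 picks G) is our assumption, as are the names `MuxRegs`, `mux2`, and `shared_adder_cycle`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 2x1 mux: select 0 passes i0, select 1 passes i1. */
static uint16_t mux2(bool s0, uint16_t i0, uint16_t i1) { return s0 ? i1 : i0; }

typedef struct { uint16_t E, F, G, R; } MuxRegs;   /* local registers (16 bits) */

/* One cycle of a datapath with a single shared 16-bit adder; R loads the sum. */
void shared_adder_cycle(MuxRegs *r, bool add_A_s0, bool add_B_s0) {
    assert(r != NULL);
    uint16_t a = mux2(add_A_s0, r->E, r->R);   /* adder input A: E or R */
    uint16_t b = mux2(add_B_s0, r->F, r->G);   /* adder input B: F or G */
    r->R = (uint16_t)(a + b);                   /* 16-bit result, wraps on overflow */
}
```

In T0 the controller would drive both selects to 0 (R = E + F); in T1 it would drive both to 1 (R = R + G).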
Step 3: Connecting the Datapath to a Controller
Laser-based distance measurer example
Easy: just connect all control signals between controller and datapath
[Figure: controller (input B from button, input S from sensor, output L to laser, 300 MHz clock) drives Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt into the datapath (Dctr up-counter, >>1 shifter, Dreg register), whose 16-bit output D goes to the display]
Step 4: Deriving the Controller's FSM
FSM has same structure as high-level state machine
  Inputs/outputs all bits now
  Replace data operations by bit operations using datapath
Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt
| State | L | Dreg_clr | Dreg_ld | Dctr_clr | Dctr_cnt | Meaning                               |
|-------|---|----------|---------|----------|----------|---------------------------------------|
| S0    | 0 | 1        | 0       | 0        | 0        | laser off, clear D reg                |
| S1    | 0 | 0        | 0       | 1        | 0        | clear count                           |
| S2    | 1 | 0        | 0       | 0        | 0        | laser on                              |
| S3    | 0 | 0        | 0       | 0        | 1        | laser off, count up                   |
| S4    | 0 | 0        | 1       | 0        | 0        | load D reg with Dctr/2, stop counting |
Transitions as in the high-level state machine: S0 -> S1; S1 -> S2 on B; S2 -> S3; S3 -> S4 on S; S4 -> S1
[Figure: the controller and datapath of Step 3 (300 MHz clock), beside the high-level state machine]
Step 4: Deriving the Controller's FSM
Using shorthand: outputs not assigned are implicitly assigned 0
Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt
  S0: L=0, Dreg_clr=1 (laser off, clear D reg)
  S1: Dctr_clr=1 (clear count)
  S2: L=1 (laser on)
  S3: L=0, Dctr_cnt=1 (laser off, count up)
  S4: Dreg_ld=1, Dctr_cnt=0 (load D reg with Dctr/2, stop counting)
[Figure: the FSM with all five outputs assigned in every state, beside the same FSM in shorthand form]
Step 4
Implement FSM as state register and logic (Ch3) to complete the design
[Figure: controller (inputs B, S; outputs L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt; 300 MHz clock) implementing the shorthand FSM, connected to the datapath (Dctr 16-bit up-counter, >>1, Dreg 16-bit register) that drives the 16-bit output D to the display]
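A signal-level C sketch of the completed design, assuming one call per 300 MHz clock cycle and the shorthand FSM's outputs (S0: Dreg_clr; S1: Dctr_clr; S2: L; S3: Dctr_cnt; S4: Dreg_ld). The controller's outputs depend only on its state, and the datapath reacts to them at the clock edge. All names are hypothetical, and giving clear priority over count or load is our assumption (this controller never asserts both together).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the laser measurer: controller FSM plus datapath. */
typedef enum { S0, S1, S2, S3, S4 } CtrlState;
typedef struct { bool L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt; } CtrlOut;
typedef struct { uint16_t Dctr, Dreg; } LaserDatapath;

/* Controller outputs per state; outputs not assigned are 0. */
CtrlOut ctrl_outputs(CtrlState s) {
    CtrlOut o = { false, false, false, false, false };
    switch (s) {
    case S0: o.Dreg_clr = true; break;   /* laser off, clear D reg */
    case S1: o.Dctr_clr = true; break;   /* clear count */
    case S2: o.L = true;        break;   /* laser on */
    case S3: o.Dctr_cnt = true; break;   /* laser off, count up */
    case S4: o.Dreg_ld = true;  break;   /* load D reg with Dctr/2 */
    }
    return o;
}

/* Controller next-state logic; inputs B (button) and S (sensor). */
CtrlState ctrl_next(CtrlState s, bool B, bool S) {
    switch (s) {
    case S0: return S1;
    case S1: return B ? S2 : S1;
    case S2: return S3;
    case S3: return S ? S4 : S3;
    default: return S1;                  /* S4 */
    }
}

/* Datapath registers update at the clock edge from their current values. */
void datapath_clock(LaserDatapath *dp, CtrlOut o) {
    assert(dp != NULL);
    uint16_t half = (uint16_t)(dp->Dctr >> 1);   /* >>1 computes Dctr / 2 */
    if (o.Dctr_clr)      dp->Dctr = 0;
    else if (o.Dctr_cnt) dp->Dctr++;
    if (o.Dreg_clr)      dp->Dreg = 0;
    else if (o.Dreg_ld)  dp->Dreg = half;
}

/* One clock cycle of the whole processor; output D is dp->Dreg. */
void laser_clock(CtrlState *s, LaserDatapath *dp, bool B, bool S) {
    assert(s != NULL);
    CtrlOut o = ctrl_outputs(*s);
    datapath_clock(dp, o);
    *s = ctrl_next(*s, B, S);
}
```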
5.3 RTL Design Examples and Issues
We'll use several more examples to illustrate RTL design
Example: Bus interface
  Master processor can read register from any peripheral
    Each register has unique 4-bit address
    Assume 1 register/periph.
  Sets rd=1, A=address
  Appropriate peripheral places register data on 32-bit D lines
    Periph's address provided on Faddr inputs (maybe from DIP switches, or another register)
[Figure: master processor connected by a bus (rd, 4-bit A, 32-bit D) to peripherals Per0 through Per15; each peripheral has a bus interface with a 4-bit Faddr input and a 32-bit Q input from the peripheral's main part]
RTL Example: Bus Interface
Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)
Outputs: D (32 bits)
Local register: Q1 (32 bits)
Step 1: Create high-level state machine
  State WaitMyAddress
    Output nothing (Z) on D, store peripheral's register value Q into local register Q1
    Wait until this peripheral's address is seen (A=Faddr) and rd=1
  State SendData
    Output Q1 onto D, wait for rd=0 (meaning main processor is done reading the D lines)
High-level state machine (x' means NOT x):
  WaitMyAddress: D = Z, Q1 = Q; on ((A = Faddr) and rd)', stay; on (A = Faddr) and rd, go to SendData
  SendData: D = Q1; on rd, stay; on rd', go to WaitMyAddress
RTL Example: Bus Interface
[Figure: timing diagram of clk, input rd, the state (W = WaitMyAddress, SD = SendData), and output D: while in W, D is Z; after rd rises with a matching address the state becomes SD, and D carries Q1 until rd falls]
RTL Example: Bus Interface
Step 2: Create a datapath
  (a) Datapath inputs/outputs
  (b) Instantiate declared registers
  (c) Instantiate datapath components and connections
[Figure: bus interface datapath: 32-bit register Q1 loads Q when Q1_ld = 1; a 4-bit equality comparator of A and Faddr produces A_eq_Faddr; a 32-bit tri-state buffer enabled by D_en drives Q1 onto D]
RTL Example: Bus Interface
Step 3: Connect datapath to controller
Step 4: Derive controller's FSM (x' means NOT x)
  Inputs: rd, A_eq_Faddr (bit)
  Outputs: Q1_ld, D_en (bit)
  WaitMyAddress: D_en = 0, Q1_ld = 1; on (A_eq_Faddr and rd)', stay; on A_eq_Faddr and rd, go to SendData
  SendData: D_en = 1, Q1_ld = 0; on rd, stay; on rd', go to WaitMyAddress
[Figure: controller connected to the bus interface datapath via Q1_ld, D_en, and A_eq_Faddr]
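A cycle-level C sketch of the bus interface, assuming one call per clock cycle. The tri-state output is modeled by a `driven` flag (false stands for Z); `BusIf`, `BusLines`, `a_eq_faddr`, and `bus_step` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the bus interface controller and datapath. */
typedef enum { WAIT_MY_ADDRESS, SEND_DATA } BusState;

typedef struct {
    BusState state;
    uint32_t Q1;                 /* local register Q1 (32 bits) */
} BusIf;

/* Value on the D lines: driven == false models the tri-state output Z. */
typedef struct { bool driven; uint32_t value; } BusLines;

/* Datapath 4-bit equality comparator producing A_eq_Faddr. */
static bool a_eq_faddr(uint8_t A, uint8_t Faddr) { return (A & 0xFu) == (Faddr & 0xFu); }

/* One clock cycle; returns what the interface drives on D during this cycle. */
BusLines bus_step(BusIf *m, bool rd, uint32_t Q, uint8_t A, uint8_t Faddr) {
    assert(m != NULL);
    BusLines d = { false, 0 };
    if (m->state == WAIT_MY_ADDRESS) {          /* D_en = 0, Q1_ld = 1 */
        m->Q1 = Q;
        if (a_eq_faddr(A, Faddr) && rd) m->state = SEND_DATA;
    } else {                                    /* D_en = 1, Q1_ld = 0 */
        d.driven = true;
        d.value = m->Q1;
        if (!rd) m->state = WAIT_MY_ADDRESS;
    }
    return d;
}
```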
RTL Example: Video Compression: Sum of Absolute Differences
Video is a series of frames (e.g., 30 per second)
Most frames similar to previous frame
Compression idea: just send difference from previous frame
[Figure: (a) digitized frame 1 and frame 2, 1 Mbyte each; (b) digitized frame 1 (1 Mbyte) plus the difference of 2 from 1 (0.01 Mbyte); the only difference between the frames is a moving ball]
RTL Example: Video Compression: Sum of Absolute Differences
Each grid square is a pixel; assume represented as 1 byte (actually, a color picture might have 3 bytes per pixel, for intensity of red, green, and blue components of pixel)
Need to quickly determine whether two frames are similar enough to just send difference for second frame
  Compare corresponding 16x16 blocks
    Treat 16x16 block as 256-byte array
  Compute the absolute value of the difference of each array item
  Sum those differences; if above a threshold, send complete frame for second frame; if below, can use difference method (using another technique, not described)
[Figure: corresponding 16x16 blocks of frame 1 and frame 2 being compared]
RTL Example: Video Compression: Sum of Absolute Differences
Want fast sum-of-absolute-differences (SAD) component
When go=1, sums the absolute differences of element pairs in arrays A and B, outputs that sum
[Figure: SAD component with inputs A and B (256-byte arrays) and go, and integer output sad]
RTL Example: Video Compression: Sum of Absolute Differences
Inputs: A, B (256 byte memory); go (bit)
Outputs: sad (32 bits)
Local registers: sum, sad_reg (32 bits); i (9 bits)
  S0: wait for go
  S1: initialize sum and index
  S2: check if done (i>=256)
  S3: add difference to sum, increment index
  S4: done, write to output sad_reg
High-level state machine:
  S0: on !go, stay; on go, go to S1
  S1: sum = 0, i = 0; go to S2
  S2: on i<256, go to S3; on !(i<256), go to S4
  S3: sum = sum + abs(A[i] - B[i]), i = i + 1; go to S2
  S4: sad_reg = sum; go to S0
Output sad is driven by sad_reg
RTL Example: Video Compression: Sum of Absolute Differences
Step 2: Create datapath
[Figure: SAD datapath: a 9-bit counter i (i_clr, i_inc) supplies AB_addr and feeds a <256 comparator producing i_lt_256; the 8-bit A_data and B_data feed a subtract-and-abs unit; a 32-bit adder and the sum register (sum_ld, sum_clr) accumulate; register sad_reg (sad_reg_ld) drives the 32-bit output sad]
RTL Example: Video Compression: Sum of Absolute Differences
Step 3: Connect to controller
Step 4: Replace high-level state machine by FSM
  S0: on !go, stay; on go, go to S1
  S1: sum_clr=1 (sum=0), i_clr=1 (i=0)
  S2: on i_lt_256, go to S3; on !(i_lt_256), go to S4
  S3: sum_ld=1, AB_rd=1 (sum=sum+abs(A[i]-B[i])); i_inc=1 (i=i+1)
  S4: sad_reg_ld=1 (sad_reg=sum)
[Figure: controller (input go, input i_lt_256 from the datapath; outputs AB_rd, i_clr, i_inc, sum_clr, sum_ld, sad_reg_ld) connected to the datapath, which supplies AB_addr to the A and B memories]
RTL Example: Video Compression: Sum of Absolute Differences
Comparing software and custom circuit SAD
  Circuit: Two states (S2 & S3) for each i; 256 i's x 2 states = 512 clock cycles
  Software: Loop (for i = 1 to 256), but for each i, must move memory to local registers, subtract, compute absolute value, add to sum, increment i; say about 6 cycles per array item: 256*6 = 1536 cycles
  Circuit is about 3 times (300%) faster
  Later, we'll see how to build SAD circuit that is even faster
[Figure: states S2 (check i<256) and S3 (sum=sum+abs(A[i]-B[i]), i=i+1) of the SAD high-level state machine]
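The cycle estimate for the circuit can be checked with a hypothetical C model of states S1 through S4 (`sad_circuit` is our name) that counts one cycle per visit to S2 or S3. It counts 513 loop cycles: 256 × 2 = 512, plus the final S2 check that exits the loop, which the rough estimate ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of SAD states S1-S4 that also counts the clock cycles
   spent in the S2/S3 loop (one cycle per state visit). */
uint32_t sad_circuit(const uint8_t A[256], const uint8_t B[256], unsigned *loop_cycles) {
    assert(A != NULL && B != NULL);
    uint32_t sum = 0;                 /* S1: sum = 0 */
    unsigned i = 0;                   /* S1: i = 0 (9-bit register in hardware) */
    unsigned cycles = 0;
    for (;;) {
        cycles++;                     /* S2: check i < 256 */
        if (!(i < 256)) break;
        cycles++;                     /* S3: sum = sum + abs(A[i]-B[i]); i = i + 1 */
        sum += (uint32_t)abs((int)A[i] - (int)B[i]);
        i++;
    }
    if (loop_cycles != NULL) *loop_cycles = cycles;
    return sum;                       /* S4: sad_reg = sum */
}
```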
RTL Design Pitfalls and Good Practice
Common pitfall: Assuming register is updated in the state it's written
Local registers: R, Q (8 bits)
  A: R=99, Q=R; go to B
  B: R=R+1; on R<100, go to C; on R>=100, go to D
Final value of Q? Final state?
Answers may surprise you
  Value of Q unknown
  Final state is C, not D
Why?
  State A: R=99 and Q=R happen simultaneously
  State B: R not updated with R+1 until next clock cycle, simultaneously with state register being updated
[Figure: (a) the state machine; (b) timing diagram: R becomes 99 only when entering B and 100 only when entering C, so the condition evaluated in B still sees R = 99]
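The surprise comes from register-transfer semantics: every right-hand side and transition condition reads register values from the start of the cycle, and all writes land together at the clock edge. This hypothetical C sketch (`PitfallMachine`, `pitfall_step` are our names) models that by building the next register values from the current ones; R's unknown initial value is stood in for by 7, and we assume C and D are final states that simply hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the pitfall example: A: R=99, Q=R; B: R=R+1, then
   R<100 -> C, R>=100 -> D. C and D are assumed to be final states. */
typedef enum { ST_A, ST_B, ST_C, ST_D } PState;

typedef struct { PState state; uint8_t R, Q; } PitfallMachine;

/* One clock cycle: all reads use current values; all writes happen together. */
void pitfall_step(PitfallMachine *m) {
    assert(m != NULL);
    PitfallMachine next = *m;                      /* values loaded at the clock edge */
    switch (m->state) {
    case ST_A:
        next.R = 99;
        next.Q = m->R;                             /* reads R before it becomes 99 */
        next.state = ST_B;
        break;
    case ST_B:
        next.R = (uint8_t)(m->R + 1);
        next.state = (m->R < 100) ? ST_C : ST_D;   /* condition sees the old R */
        break;
    default:                                       /* C, D: hold */
        break;
    }
    *m = next;
}
```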
RTL Design Pitfalls and Good Practice
Solutions
  Read register in following state (Q=R)
  Insert extra state so that conditions use updated value
  Other solutions are possible, depends on the example
Local registers: R, Q (8 bits)
[Figure: (a) the state machine with Q=R moved from A into the following state, and an extra state B2 inserted after B so that the R<100 / R>=100 check happens in B2 and leads to D; (b) timing diagram showing the check in B2 now sees R = 100]
RTL Design Pitfalls and Good Practice
Common pitfall: Reading outputs
  Outputs can only be written
  Solution: Introduce additional register, which can be written and read
(a) Inputs: A, B (8 bits); Outputs: P (8 bits)
  P=A, then P=P+B (reads output P)
(b) Inputs: A, B (8 bits); Outputs: P (8 bits); Local register: R (8 bits)
  R=A, P=A, then P=R+B
RTL Design Pitfalls and Good Practice
Good practice: Register all data outputs
  In fig (a), output P would show spurious values as addition computes
    Furthermore, longest register-to-register path, which determines clock period, is not known until that output is connected to another component
  In fig (b), spurious outputs reduced, and longest register-to-register path is clear
[Figure: (a) registers R and B feed an adder whose output drives P directly; (b) the adder output is stored in register Preg, which drives P]
Control vs. Data Dominated RTL Design
Designs often categorized as control-dominated or data-dominated
  Control-dominated design: Controller contains most of the complexity
  Data-dominated design: Datapath contains most of the complexity
  General, descriptive terms; no hard rule that separates the two types of designs
  Laser-based distance measurer: control dominated
  Bus interface, SAD circuit: mix of control and data
Now let's do a data dominated design
Data Dominated RTL Design Example: FIR Filter
Filter concept
  Suppose X is data from a temperature sensor, and particular input sequence is 180, 180, 181, 240, 180, 181 (one per clock cycle)
  That 240 is probably wrong!
    Could be electrical noise
  Filter should remove such noise in its output Y
  Simple filter: Output average of last N values
    Small N: less filtering
    Large N: more filtering, but less sharp output
[Figure: digital filter with 12-bit input X, 12-bit output Y, and clock clk]
Data Dominated RTL Design Example: FIR Filter
FIR filter
  Finite Impulse Response
  Simply a configurable weighted sum of past input values
  y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
    Above known as 3 tap
    Tens of taps more common
    Very general filter: User sets the constants (c0, c1, c2) to define specific filter
RTL design
  Step 1: Create high-level state machine
    But there really is none! Data dominated indeed.
  Go straight to step 2
Data Dominated RTL Design Example: FIR Filter
Step 2: Create datapath
  Begin by creating chain of xt registers to hold past values of X
  Suppose sequence is: 180, 181, 240
[Figure: 3-tap FIR filter, y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2): the 12-bit input X feeds the register chain xt0 -> xt1 -> xt2, holding x(t), x(t-1), x(t-2); after the sequence, xt0 = 240, xt1 = 181, xt2 = 180]
Data Dominated RTL Design Example: FIR Filter
Step 2: Create datapath (cont.)
  Instantiate registers for c0, c1, c2
  Instantiate multipliers to compute c*x values
[Figure: 3-tap FIR filter: registers c0, c1, c2 beside xt0, xt1, xt2, with a multiplier for each pair]
Data Dominated RTL Design Example: FIR Filter
Step 2: Create datapath (cont.)
  Instantiate adders
[Figure: 3-tap FIR filter: adders sum the three products c0*xt0, c1*xt1, c2*xt2 to produce Y]
Data Dominated RTL Design Example: FIR Filter
Step 2: Create datapath (cont.)
  Add circuitry to allow loading of particular c register
[Figure: 3-tap FIR filter: a 2x4 decoder with enable CL and address inputs Ca1, Ca0 selects which c register loads the value on input C; the adder output is stored in yreg, which drives Y]
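A C sketch of the 3-tap datapath, assuming 12-bit unsigned samples and integer coefficients. It shifts X into the xt0/xt1/xt2 chain and then forms y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2); for simplicity it ignores the extra cycle of latency that yreg adds in hardware. `Fir3`, `fir3_clock`, and `fir3_load_coeff` are hypothetical names, and leaving decoder output 3 unused is our assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the 3-tap FIR datapath registers. */
typedef struct {
    int32_t c[3];    /* coefficient registers c0, c1, c2 */
    int32_t xt[3];   /* xt0, xt1, xt2 hold x(t), x(t-1), x(t-2) */
    int32_t yreg;    /* output register driving Y */
} Fir3;

/* When CL = 1, the 2x4 decoder driven by Ca1 Ca0 (here ca) enables one
   c register to load C. Decoder output 3 is unused in this sketch. */
void fir3_load_coeff(Fir3 *f, bool CL, unsigned ca, int32_t C) {
    assert(f != NULL && ca < 4);
    if (CL && ca < 3) f->c[ca] = C;
}

/* One clock: shift X into the chain, then compute the weighted sum
   (yreg's extra cycle of latency is ignored). */
int32_t fir3_clock(Fir3 *f, int32_t X) {
    assert(f != NULL && X >= 0 && X < 4096);    /* 12-bit sample assumed */
    f->xt[2] = f->xt[1];
    f->xt[1] = f->xt[0];
    f->xt[0] = X;
    f->yreg = f->c[0] * f->xt[0] + f->c[1] * f->xt[1] + f->c[2] * f->xt[2];
    return f->yreg;
}
```

With c0 = c1 = c2 = 1, feeding 180, 181, 240 yields 601 on the third clock, and the chain then holds xt0 = 240, xt1 = 181, xt2 = 180.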
Data Dominated RTL Design Example: FIR Filter
Step 3 & 4: Connect to controller, Create FSM
  y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
  No controller needed
  Extreme data-dominated example
  (Example of an extreme control-dominated design: an FSM, with no datapath)
Comparing the FIR circuit to a software implementation
  Circuit
    Assume adder has 2-gate delay, multiplier has 20-gate delay
    Longest path goes through one multiplier and two adders
      20 + 2 + 2 = 24-gate delay
    100-tap filter, following design on previous slide, would have about a 34-gate delay: 1 multiplier and 7 adders on longest path (the 100 products summed by a tree of adders 7 levels deep, since 2^7 = 128 >= 100): 20 + 7*2 = 34
  Software
    100-tap filter: 100 multiplications, 100 additions. Say 2 instructions per multiplication, 2 per addition. Say 10-gate delay per instruction.
    (100*2 + 100*2)*10 = 4000 gate delays
  Circuit is more than 100 times faster (10,000% faster). Wow.
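The delay estimates can be reproduced in a few lines of C, assuming the products are summed by a balanced adder tree, which has ceil(log2(taps)) adder levels: 2 levels for 3 taps and 7 levels for 100 taps. The function names are hypothetical.

```c
#include <assert.h>

/* Levels of a balanced adder tree that sums n products: ceil(log2(n)). */
unsigned adder_tree_levels(unsigned n) {
    assert(n >= 1);
    unsigned levels = 0;
    for (unsigned width = 1; width < n; width *= 2)
        levels++;
    return levels;
}

/* Circuit: one multiplier, then the adder tree (in gate delays). */
unsigned fir_circuit_gate_delay(unsigned taps, unsigned mult_delay, unsigned add_delay) {
    return mult_delay + adder_tree_levels(taps) * add_delay;
}

/* Software: taps multiplications + taps additions, each taking instr_per_op
   instructions of delay_per_instr gate delays. */
unsigned fir_software_gate_delay(unsigned taps, unsigned instr_per_op, unsigned delay_per_instr) {
    return (taps * instr_per_op + taps * instr_per_op) * delay_per_instr;
}
```

The ratio 4000 / 34 is about 117, which is where "more than 100 times faster" comes from.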
5.4 Determining Clock Frequency
Designers of digital circuits often want fastest performance
  Means want high clock frequency
Frequency limited by longest register-to-register delay
  Known as critical path
  If clock is any faster, incorrect data may be stored into register
  Longest path in the figure is 2 ns
    Ignoring wire delays, and register setup and hold times, for simplicity
[Figure: two registers clocked by clk with 2 ns of combinational delay between them]
Critical Path
Example shows four paths
  a to c through +: 2 ns
  a to d through + and *: 7 ns
  b to d through + and *: 7 ns
  b to d through *: 5 ns
Longest path is thus 7 ns: Max(2, 7, 7, 5) = 7 ns
Fastest frequency
  1 / 7 ns = 142 MHz
[Figure: registers a and b feed an adder (2 ns delay); the adder output goes to register c and, together with b, to a multiplier (5 ns delay) whose output goes to register d]
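The critical-path computation is just a maximum over path delays followed by a reciprocal. In this hypothetical C sketch, delays are in ns and frequencies in MHz (1 / x ns = 1000 / x MHz); 1000 / 7 is about 142.9 MHz, and padding a 3 ns path to 4 ns gives 250 MHz.

```c
#include <assert.h>
#include <stddef.h>

/* Critical path: the longest register-to-register delay (ns). */
double critical_path_ns(const double *path_delays_ns, size_t n) {
    assert(path_delays_ns != NULL && n > 0);
    double longest = path_delays_ns[0];
    for (size_t k = 1; k < n; k++)
        if (path_delays_ns[k] > longest)
            longest = path_delays_ns[k];
    return longest;
}

/* Fastest clock: 1 / (critical path). 1 / (x ns) = 1000 / x MHz. */
double max_freq_mhz(double critical_ns) {
    assert(critical_ns > 0.0);
    return 1000.0 / critical_ns;
}
```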
Critical Path Considering Wire Delays
Real wires have delay too
  Must include in critical path
  Example shows two paths
    Each is 0.5 + 2 + 0.5 = 3 ns
Trend
  1980s/1990s: Wire delays were tiny compared to logic delays
  But wire delays not shrinking as fast as logic delays
    Wire delays may even be greater than logic delays!
Must also consider register setup and hold times, also add to path
Then add some time to the computed path, just to be safe
  e.g., if path is 3 ns, say 4 ns instead
[Figure: two register-to-register paths, each with 0.5 ns of wire delay, 2 ns of logic delay, and 0.5 ns of wire delay, for 3 ns total]
A Circuit May Have Numerous Paths
Paths can exist
  In the datapath
  In the controller
  Between the controller and datapath
May be hundreds or thousands of paths
Timing analysis tools that evaluate all possible paths automatically: very helpful
[Figure: the soda dispenser's controller (combinational logic, state register s1 s0, clk) and datapath (tot register, 8-bit adder, 8-bit comparator), with example paths in the datapath, in the controller, and between them]
5.5 Behavioral Level Design: C to Gates
Earlier sum-of-absolute-differences example
  Started with high-level state machine
  C code is an even better starting point -- easier to understand
C code:
#include <stdint.h>
#include <stdlib.h>

typedef uint8_t byte;                 // 8-bit pixel value

uint32_t SAD(const byte A[256], const byte B[256])
{
    uint32_t sum;                     // 32-bit sum, like register sum
    uint16_t i;                       // needs 9 bits to count to 256
    sum = 0;
    i = 0;
    while (i < 256) {
        sum = sum + abs(A[i] - B[i]); // bytes promote to int, so the difference may be negative
        i = i + 1;
    }
    return sum;
}
[Figure: the SAD high-level state machine: S0 (wait for go), S1 (sum = 0, i = 0), S2 (i<256 check), S3 (sum=sum+abs(A[i]-B[i]), i=i+1), S4 (sad_reg = sum)]
Behavioral-Level Design: Start with C (or Similar Language)
Replace first step of RTL design method by two steps
  Capture in C, then convert C to high-level state machine
  How convert from C to high-level state machine?
Step 1A: Capture in C
Step 1B: Convert to high-level state machine
Converting from C to High-Level State Machine
Convert each C construct to equivalent states and transitions
Assignment statement
  target = expression;
  Becomes one state with assignment (target = expression)
If-then statement
  if (cond) {
    // then stmts
  }
  Becomes state with condition check, transitioning to "then" statements if condition true, otherwise to ending state
    "then" statements would also be converted to states
  [Figure: condition-check state with arc cond to (then stmts) and arc !cond to (end); (then stmts) continue to (end)]
Converting from C to High-Level State Machine
If-then-else
  if (cond) {
    // then stmts
  }
  else {
    // else stmts
  }
  Becomes state with condition check, transitioning to "then" statements if condition true, or to "else" statements if condition false
  [Figure: condition-check state with arc cond to (then stmts) and arc !cond to (else stmts); both continue to (end)]
While loop statement
  while (cond) {
    // while stmts
  }
  Becomes state with condition check, transitioning to while loop's statements if true, then transitioning back to condition check
  [Figure: condition-check state with arc cond to (while stmts), which return to the condition check, and arc !cond to (end)]
Simple Example of Converting from C to High-Level State Machine
Inputs: uint X, Y
Outputs: uint Max
(a)
  if (X > Y) {
    Max = X;
  }
  else {
    Max = Y;
  }
Simple example: Computing the maximum of two numbers
  Convert if-then-else statement to states (b)
  Then convert assignment statements to states (c)
[Figure: (b) condition-check state with arc X>Y to (then stmts) and arc !(X>Y) to (else stmts), both continuing to (end); (c) the same with states Max=X and Max=Y]
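To show what the conversion produces, here is the Max example written as an explicit state machine in C, one state per converted construct, with one loop iteration standing for one clock cycle. The state names and `max_hlsm` are hypothetical; a real high-level state machine would also keep Max in a register across cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state-machine form of:
     if (X > Y) { Max = X; } else { Max = Y; }
   COND is the condition-check state; THEN and ELSE hold the assignments. */
typedef enum { COND, THEN, ELSE, END } MaxState;

uint32_t max_hlsm(uint32_t X, uint32_t Y, unsigned *cycles) {
    MaxState state = COND;
    uint32_t Max = 0;
    unsigned n = 0;
    while (state != END) {
        n++;                                      /* one cycle per state visited */
        switch (state) {
        case COND: state = (X > Y) ? THEN : ELSE; break;
        case THEN: Max = X; state = END;          break;
        case ELSE: Max = Y; state = END;          break;
        default:                                  break;
        }
    }
    assert(state == END);
    if (cycles != NULL) *cycles = n;
    return Max;
}
```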
Example: Converting Sum-of-Absolute-Differences C code to High-Level State Machine
Convert each construct to states
  Simplify when possible, e.g., merge states
From high-level state machine, follow RTL design method to create circuit
Thus, can convert C to gates using straightforward automatable process
  Not all C constructs can be efficiently converted
  Use C subset if intended for circuit
  Can use languages other than C, of course
(a) C code:
Inputs: byte A[256], B[256]
        bit go;
Output: int sad
main()
{
   uint sum; short uint i;
   while (1) {
      while (!go);
      sum = 0;
      i = 0;
      while (i < 256) {
         sum = sum + abs(A[i] - B[i]);
         i = i + 1;
      }
      sad = sum;
   }
}
[Figure: (b) through (g) convert the code step by step: while (!go); becomes a state that loops on !go and exits on go; the assignments sum=0 and i=0 become states, then are merged into one; the inner while loop becomes a condition-check state with arc i<256 to the loop body (sum=sum+abs(...), i=i+1), which returns to the check, and arc !(i<256) to the state sad = sum]
5.6 Memory Components
Register-transfer level design instantiates datapath components to create datapath, controlled by a controller
A few more components are often used outside the controller and datapath
MxN memory
  M words, N bits wide each
Several varieties of memory, which we now introduce
[Figure: MxN memory: M words, each N bits wide]
Random Access Memory (RAM)
RAM: Readable and writable memory
  "Random access memory"
    Strange name: created several decades ago to contrast with sequentially-accessed storage like tape drives
  Logically same as register file: memory with address inputs, data inputs/outputs, and control
    RAM usually just one port; register file usually two or more
RAM vs. register file
  RAM typically larger than roughly 512 or 1024 words
  RAM typically stores bits using a bit storage approach that is more efficient than a flip flop
  RAM typically implemented on a chip in a square rather than rectangular shape: keeps longest wires (hence delay) short
[Figure: 16x32 register file from Chpt. 4 (W_data, W_addr, W_en, R_data, R_addr, R_en) and the RAM block symbol for a 1024x32 RAM (32-bit data, 10-bit addr, rw, en)]
RAM Internal Structure
Similar internal structure as register file
Decoder enables appropriate word based on address inputs
rw controls whether cell is written or read
Let's see what's inside each RAM cell
[Figure: internal structure of a 1024x32 RAM. Let A = log2 M. An AxM decoder, enabled by en, takes addr0..addr(A-1) and drives word-enable lines d0..d(M-1); each word is a row of N bit storage blocks (aka cells) with inputs wdata(N-1)..wdata0 and outputs rdata(N-1)..rdata0; rw and clk go to all cells; each RAM cell has word enable, rw, and data connections]
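The decoder's job can be sketched in a few lines of C. This is a minimal illustration, assuming a small RAM (A_BITS = 3, so M_WORDS = 8) rather than the 1024x32 RAM in the figure, which would use A_BITS = 10; decode() and the macro names are hypothetical.

```c
#include <assert.h>

#define A_BITS 3                 /* A = log2(M); kept small for illustration */
#define M_WORDS (1u << A_BITS)   /* M = 2^A words */

/* AxM decoder: with en=1, only word-enable line d[addr] is 1, so only
   that word's cells are written (rw=1) or read (rw=0).
   With en=0, no word is enabled. */
static void decode(unsigned addr, int en, int d[M_WORDS]) {
    assert(addr < M_WORDS);
    for (unsigned k = 0; k < M_WORDS; k++)
        d[k] = en && (k == addr);
}
```

Because exactly one word line is active at a time, every cell can share the same wdata/rdata column lines without conflict.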
Static RAM (SRAM)
SRAM cell
6 transistors (recall inverter is 2 transistors)
Writing this cell
word enable input comes from decoder
When 0, value d loops around inverters
That loop is where a bit stays stored
When 1, the data bit value enters the loop
data is the bit to be stored in this cell
data' (the complement of data) enters on the other side
Example shows a 1 being written into cell
[Figure: RAM internal structure (1024x32 RAM, AxM decoder, word-enable lines, bit storage cells) with one SRAM cell expanded: two cross-coupled inverters holding d, connected through word-enable-controlled transistors to the data and data' lines; data=1, data'=0 shown being written]
Static RAM (SRAM)
Reading this cell
Somewhat trickier
When rw set to read, the RAM logic sets both data and data' to 1
The stored bit d will pull either the left line or the right line down slightly below 1
Sense amplifiers detect which side is slightly pulled down
The electrical description of SRAM is really beyond our scope; just the general idea here, mainly to contrast with DRAM...
[Figure: SRAM cell during a read: data and data' both set to 1; the stored d pulls one line slightly below 1 (<1); both lines go to sense amplifiers]
Dynamic RAM (DRAM)
DRAM cell
1 transistor (rather than 6)
Relies on large capacitor to store bit
Write: Transistor conducts, data voltage level gets stored on top plate of capacitor
Read: Just look at value of d
Problem: Capacitor discharges over time
Must "refresh" regularly, by reading d and then writing it right back
[Figure: RAM internal structure with one DRAM cell expanded: (a) word enable gates a single transistor connecting the data line to a capacitor holding d; (b) the capacitor slowly discharging]
Comparing Memory Types
MxN memory implemented as a:
Register file
Fastest
But biggest size
SRAM
Fast
More compact than register file
DRAM
Slowest
And refreshing takes time
But very compact
Use register file for small items, SRAM for large items, and DRAM for huge items
Note: DRAM's big capacitor requires a special chip design process, so DRAM is often a separate chip
[Figure: size comparison of register file, SRAM, and DRAM for the same number of bits (not to scale)]
Reading and Writing a RAM
Writing
Put address on addr lines, data on data lines, set rw=1, en=1
Reading
Set addr and en lines, but put nothing (Z) on data lines, set rw=0
Data will appear on data lines
Don't forget to obey setup and hold times
In short: keep inputs stable before and after a clock edge
[Figure: (a) timing diagram over clock cycles 1 to 3: writes of 500 to address 9 and 999 to address 13 (rw=1; 1 means write), after which RAM[9] now equals 500 and RAM[13] now equals 999; then a read of address 9 with Z on the data lines, where 500 appears after the access time. (b) addr, data, rw, and en must be valid for a setup time before and a hold time after the clock edge]
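A behavioral model helps separate what a RAM does from how it is timed. The sketch below is a minimal C model of a single-port 1024x32 RAM, one call per clock edge; the Ram type and ram_clock() are hypothetical, and the model deliberately ignores setup, hold, and access times (it returns read data immediately).

```c
#include <assert.h>
#include <stdint.h>

#define RAM_WORDS 1024   /* 1024x32 RAM, as in the block symbol */

/* Hypothetical behavioral model of a single-port RAM. */
typedef struct {
    uint32_t mem[RAM_WORDS];
} Ram;

/* One clock edge. Returns the word read when en=1 and rw=0; otherwise
   returns 0 as a stand-in for the RAM not driving the data lines (Z). */
static uint32_t ram_clock(Ram *r, int en, int rw, unsigned addr, uint32_t wdata) {
    assert(addr < RAM_WORDS);
    if (!en) return 0;                 /* RAM disabled */
    if (rw) {                          /* rw=1: write */
        r->mem[addr] = wdata;
        return 0;
    }
    return r->mem[addr];               /* rw=0: read */
}
```

Replaying the timing diagram's sequence: ram_clock(&r, 1, 1, 9, 500), then ram_clock(&r, 1, 1, 13, 999), then ram_clock(&r, 1, 0, 9, 0) returns 500.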
RAM Example: Digital Sound Recorder
Behavior
Record: Digitize sound, store as series of 4096 12-bit digital values in RAM
We'll use a 4096x16 RAM (12-bit wide RAM not common)
Play back later
Common behavior in telephone answering machine, toys, voice recorders
To record, processor should read a-to-d, store read values into successive RAM words
To play, processor should read successive RAM words and enable d-to-a
[Figure: a microphone feeds an analog-to-digital converter (controls ad_ld, ad_buf; 12-bit output) wired to the 16-bit data lines of a 4096x16 RAM; the processor drives the RAM's addr, rw, and en through Ra, Rrw, Ren; the RAM data lines also feed a digital-to-analog converter (control da_ld) that drives a speaker]
RAM Example: Digital Sound Recorder
RTL design of processor
Create high-level state machine
Begin with the record behavior
Keep local register a
Stores current address, ranges from 0 to 4095 (thus need 12 bits)
Create state machine that counts from 0 to 4095 using a
For each a
Read analog-to-digital conv.: ad_ld=1, ad_buf=1
Write to RAM at address a: Ra=a, Rrw=1, Ren=1
[Figure: record behavior high-level state machine. Local register: a (12 bits). State S: a=0; next state: ad_ld=1, ad_buf=1, Ra=a, Rrw=1, Ren=1; transitions a<4095 (through state U: a=a+1, then repeat) and a=4095 (done)]
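The record behavior above reduces to a counted loop. This minimal C sketch assumes the a-to-d converter's successive 12-bit outputs can be modeled as an array samples[], and the 4096x16 RAM as an array ram[]; record() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_WORDS 4096

/* Sketch of the record behavior. samples[] stands in for the successive
   values the a-to-d converter puts on the bus (ad_ld=1, ad_buf=1);
   ram[] stands in for the 4096x16 RAM. */
static void record(uint16_t ram[RAM_WORDS], const uint16_t samples[RAM_WORDS]) {
    uint16_t a = 0;                        /* S: a=0 (12-bit local register) */
    for (;;) {
        assert(a < RAM_WORDS);
        ram[a] = samples[a] & 0x0FFF;      /* Ra=a, Rrw=1, Ren=1: store 12-bit sample */
        if (a == 4095) break;              /* a=4095: done */
        a = a + 1;                         /* a<4095: U, a=a+1 */
    }
}
```

Testing a=4095 before incrementing is what lets a 12-bit register suffice: a never needs to hold 4096.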
RAM Example: Digital Sound Recorder
Now create play behavior
Use local register a again, create state machine that counts from 0 to 4095 again
For each a
Read RAM
Write to digital-to-analog conv.
Note: Must write d-to-a one cycle after reading RAM, when the read data is available on the data bus
The record and play state machines would be parts of a larger state machine controlled by signals that determine when to record or play
[Figure: play behavior high-level state machine. Local register: a (12 bits). State V: a=0; next state: ad_buf=0, Ra=a, Rrw=0, Ren=1 (read RAM); state X: da_ld=1, a=a+1; transitions a<4095 (repeat) and a=4095 (done)]
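The play loop can be sketched the same way, with the one-cycle read delay noted in comments. This is a minimal C sketch: the loop does not model clock timing itself, and dac_log[] (a hypothetical record of every value loaded into the d-to-a converter) and play() are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_WORDS 4096

/* Sketch of the play behavior. In the read state the RAM gets Ra=a,
   Rrw=0, Ren=1 (with ad_buf=0 so the a-to-d converter leaves the bus
   free); the word is on the data bus only in the following state,
   where da_ld=1 loads it into the d-to-a converter. */
static void play(const uint16_t ram[RAM_WORDS], uint16_t dac_log[RAM_WORDS]) {
    uint16_t a = 0;                 /* V: a=0 (12-bit local register) */
    for (;;) {
        assert(a < RAM_WORDS);
        uint16_t data_bus = ram[a]; /* read state: data valid on bus next cycle */
        dac_log[a] = data_bus;      /* X: da_ld=1 loads the bus value */
        if (a == 4095) break;       /* a=4095: done */
        a = a + 1;                  /* a<4095: back to the read state */
    }
}
```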
Read-Only Memory: ROM
Memory that can only be read from, not written to
Data lines are output only
No need for rw input
Advantages over RAM
Compact: May be smaller
Nonvolatile: Saves bits even if power supply is turned off
Speed: May be faster (especially than DRAM)
Low power: Doesn't need power supply to save bits, so can extend battery life
Choose ROM over RAM if stored data won't change (or won't change often)
For example, a table of Celsius to Fahrenheit conversions in a digital thermometer
[Figure: 1024x32 RAM block symbol (10-bit addr, 32-bit data, rw, en) beside 1024x32 ROM block symbol (10-bit addr, 32-bit data, en; no rw)]
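The thermometer example can be sketched in C with a const array standing in for the ROM contents. This assumes a hypothetical 16x8 ROM covering 0 to 15 degrees Celsius, each entry holding the Fahrenheit value (F = C * 9/5 + 32) rounded to the nearest degree; rom_read() and C_TO_F_ROM are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16x8 ROM: address = degrees Celsius (0..15),
   data = degrees Fahrenheit, rounded to nearest. The const array
   plays the role of bits fixed when the ROM is programmed. */
static const uint8_t C_TO_F_ROM[16] = {
    32, 34, 36, 37, 39, 41, 43, 45, 46, 48, 50, 52, 54, 55, 57, 59
};

/* ROM read: address in, data out; no rw input. */
static uint8_t rom_read(unsigned addr) {
    assert(addr < 16);
    return C_TO_F_ROM[addr];
}
```

The lookup replaces a multiply, divide, and add datapath with a single memory read, which suits data that never changes.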
Read-Only Memory: ROM
Internal logical structure similar to RAM, without the data input lines
[Figure: 1024x32 ROM block symbol (10-bit addr, 32-bit data, en) and its internal structure. Let A = log2 M. An AxM decoder, enabled by en, takes addr0..addr(A-1) and drives word-enable lines d0..d(M-1); each word is a row of ROM cells (bit storage blocks) driving data outputs rdata(N-1)..rdata0]
ROM Types
If a ROM can only be read, how are the stored bits stored in the first place?
Storing bits in a ROM is known as programming
Several methods
Mask-programmed ROM
Bits are hardwired as 0s or 1s during chip manufacturing
Word enable (from decoder) simply passes the hardwired value through a transistor
Notice how compact, and fast, this memory would be
[Figure: ROM internal structure (Let A = log2 M; AxM decoder driving word-enable lines d0..d(M-1); outputs data(N-1)..data0) with a mask-programmed 2-bit word expanded: each cell's transistor, gated by word enable, connects a hardwired value to its data line; the 2-bit word shown stores 10]
ROM Types
Fuse-Based Programmable ROM
Each cell has a fuse
A special device, known as a programmer, blows certain fuses (using higher-than-normal voltage)
Those cells will be read as 0s (involving some special electronics)
Cells with unblown fuses will be read as 1s
Also known as One-Time Programmable (OTP) ROM
[Figure: ROM internal structure with a fuse-based 2-bit word expanded: one cell with an intact fuse, one with a blown fuse; the 2-bit word shown stores 10]
ROM Types
Erasable Programmable ROM (EPROM)
Uses floating-gate transistor in each cell
Special programmer device uses higher-than-normal voltage to cause electrons to tunnel into the gate
Electrons become trapped in the gate
Only done for cells that should store 0
Other cells (without electrons trapped in gate) will be 1
Details beyond our scope; just the general idea is necessary here
To erase, shine ultraviolet light onto chip
Gives trapped electrons energy to escape
Requires chip package to have window
[Figure: ROM internal structure with a 2-bit word of floating-gate transistor cells expanded, showing trapped electrons in one cell during programming; the 2-bit word shown stores 10]
ROM Types
Electronically-Erasable Programmable ROM (EEPROM)
Similar to EPROM
Uses floating-gate transistor, electronic programming to trap electrons in certain cells
But erasing done electronically, not using UV light
Erasing done one word at a time
Flash memory
Like EEPROM, but all words (or large blocks of words) can be erased simultaneously
Became common relatively recently (late 1990s)
Both types are in-system programmable
Can be programmed with new stored bits while in the system in which the ROM operates
Requires bi-directional data lines, and write control input
Also need busy output to indicate that erasing is in progress; erasing takes some time
[Figure: 1024x32 EEPROM block symbol: 10-bit addr, 32-bit bidirectional data, en, write input, busy output]
ROM Example: Talking Doll
Doll plays prerecorded message, triggered by vibration
Message must be stored without power supply: use a ROM, not a RAM, because ROM is nonvolatile
And because message will never change, use a mask-programmed ROM or OTP ROM
Processor should wait for vibration (v=1), then read words 0 to 4095 from the ROM, writing each to the d-to-a
[Figure: "Hello there!" audio divided into 4096 samples, stored in a 4096x16 ROM; the processor (input v from a vibration sensor; outputs Ra, Ren, da_ld) reads the ROM and loads each 16-bit word into a digital-to-analog converter that drives a speaker]
ROM Example: Talking Doll
High-level state machine
Create state machine that waits for v=1, and then counts from 0 to 4095 using a local register a
For each a, read ROM, write to digital-to-analog converter
[Figure: high-level state machine. Local register: a (12 bits). Initial state a=0, waiting while v'; on v: Ra=a, Ren=1 (read ROM); then da_ld=1, a=a+1; transitions a<4095 (repeat) and a=4095 (return to waiting)]
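Below, the doll's state machine is written as a cycle-accurate C step function, one call per clock cycle. The state names, the Doll struct, and doll_step() are hypothetical; the ROM is modeled as an array whose word appears on the data lines in the read state and is loaded into the d-to-a converter in the next state.

```c
#include <assert.h>
#include <stdint.h>

#define ROM_WORDS 4096

typedef enum { D_WAIT, D_READ, D_LOAD } DollState;

typedef struct {
    DollState state;
    uint16_t a;      /* local register a (12 bits used) */
    uint16_t rdata;  /* value on the ROM data lines */
    uint16_t dac;    /* value held by the d-to-a converter */
} Doll;

/* One clock cycle; v is the vibration-sensor input.
   Returns 1 in cycles where da_ld is asserted. */
static int doll_step(Doll *d, const uint16_t rom[ROM_WORDS], int v) {
    int da_ld = 0;
    switch (d->state) {
    case D_WAIT:                          /* a=0; stay until v=1 */
        d->a = 0;
        if (v) d->state = D_READ;
        break;
    case D_READ:                          /* Ra=a, Ren=1 */
        assert(d->a < ROM_WORDS);
        d->rdata = rom[d->a];
        d->state = D_LOAD;
        break;
    case D_LOAD:                          /* da_ld=1, a=a+1 */
        da_ld = 1;
        d->dac = d->rdata;
        /* transition uses the old a: a<4095 repeats, a=4095 finishes */
        d->state = (d->a < ROM_WORDS - 1) ? D_READ : D_WAIT;
        d->a = (uint16_t)((d->a + 1) & 0x0FFF);   /* 12-bit register */
        break;
    }
    return da_ld;
}
```

After v=1, the message takes two cycles per sample, 8192 cycles in all, before the machine returns to waiting.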
ROM Example: Digital Telephone Answering Machine Using a Flash Memory
Want to record the outgoing announcement
When rec=1, record digitized sound in locations 0 to 4095
When play=1, play those stored sounds to digital-to-analog converter
What type of memory?
Should store without power supply: ROM, not RAM
Should be in-system programmable: EEPROM or Flash, not EPROM, OTP ROM, or mask-programmed ROM
Will always erase entire memory when reprogramming: Flash better than EEPROM
[Figure: a microphone feeds an analog-to-digital converter (ad_ld, ad_buf; 12-bit output) connected to the 16-bit data lines of a 4096x16 Flash with busy output bu; the processor (inputs rec, play, bu; outputs Ra, Rrw, Ren, er, da_ld) records the announcement "We're not home." and plays it through a digital-to-analog converter to a speaker]
ROM Example: Digital Telephone Answering Machine Using a Flash Memory
High-level state machine
Once rec=1, begin erasing flash by setting er=1
Wait for flash to finish erasing by waiting for bu=0
Execute loop that sets local register a from 0 to 4095, reading analog-to-digital converter and writing to flash for each a
[Figure: high-level state machine. Local register: a (13 bits, so that a can hold the value 4096). States S (a=0, er=1), T (er=0, wait while bu), U (ad_ld=1, ad_buf=1, Ra=a, Rrw=1, Ren=1, a=a+1), and V; transitions labeled rec, bu, bu', a<4096, and a=4096]
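The erase-then-record sequence can be sketched in C against a simplified flash model. All names here are hypothetical, and several details are assumptions, not part of the slides: erasing takes a fixed ERASE_CYCLES clock cycles, erased words read as all 1s (0xFFFF), and a write simply stores the word (real flash programming is more restricted). Each state of the machine is modeled as one clock cycle.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_WORDS 4096
#define ERASE_CYCLES 100   /* hypothetical erase time, in clock cycles */

typedef struct {
    uint16_t mem[FLASH_WORDS];
    int busy_left;         /* erase cycles remaining; bu = (busy_left > 0) */
} Flash;

static int flash_bu(const Flash *f) { return f->busy_left > 0; }

/* One clock cycle of the simplified flash. er=1 starts a full erase;
   a write (Ren=1, Rrw=1) is ignored while busy. */
static void flash_clock(Flash *f, int er, int ren, int rrw, unsigned addr, uint16_t wdata) {
    if (f->busy_left > 0) {
        if (--f->busy_left == 0)
            for (unsigned k = 0; k < FLASH_WORDS; k++) f->mem[k] = 0xFFFF;
        return;
    }
    if (er) { f->busy_left = ERASE_CYCLES; return; }
    if (ren && rrw) {
        assert(addr < FLASH_WORDS);
        f->mem[addr] = wdata;
    }
}

/* S: er=1; T: er=0, wait while bu; U: write sample a, a=a+1 until a=4096.
   Returns the number of clock cycles used. */
static long record_announcement(Flash *f, const uint16_t samples[FLASH_WORDS]) {
    long cycles = 0;
    uint16_t a = 0;                               /* 13-bit local register */
    flash_clock(f, 1, 0, 0, 0, 0); cycles++;      /* S: a=0, er=1 */
    while (flash_bu(f)) {                         /* T: er=0, wait for bu=0 */
        flash_clock(f, 0, 0, 0, 0, 0); cycles++;
    }
    while (a < 4096) {                            /* U: Ra=a, Rrw=1, Ren=1 */
        flash_clock(f, 0, 1, 1, a, samples[a]); cycles++;
        a = a + 1;
    }
    return cycles;
}
```

Under these assumptions, recording takes 1 + ERASE_CYCLES + 4096 cycles; the 13-bit register is what allows the loop to test a<4096 after incrementing.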
Blurring of Distinction Between ROM and RAM
We said that
RAM is readable and writable
ROM is read-only
But some ROMs act almost like RAMs
EEPROM and Flash are in-system programmable
Essentially means that writes are slow
Also, number of writes may be limited (perhaps a few million times)
And, some RAMs act almost like ROMs
Non-volatile RAMs: Can save their data without the power supply
One type: Built-in battery, may work for up to 10 years
Another type: Includes ROM backup for RAM; controller writes RAM contents to ROM before turning off
New memory technologies evolving that merge RAM and ROM benefits, e.g., MRAM
Bottom line
Lot of choices available to designer; must find best fit with design goals
[Figure: spectrum from ROM through Flash, EEPROM, and NVRAM to RAM]
5.7 Queues
A queue is another component sometimes used during RTL design
Queue: A list written to at the back, and read from the front
Like a list of waiting restaurant customers
Writing called a push, reading called a pop
Because first item written into a queue will be the first item read out, also called a FIFO (first-in-first-out)
[Figure: queue with a back and a front; write items to the back of the queue; read (and remove) items from the front of the queue]
Queues
Queue has addresses, and two pointers: rear and front
Initially both point to 0
Push (write)
Item written to address pointed to by rear
rear incremented
Pop (read)
Item read from address pointed to by front
front incremented
If front or rear reaches 7, next (incremented) value should be 0 (for a queue with addresses 0 to 7)
[Figure: queue with addresses 0 to 7 showing the rear (r) and front (f) pointers, initially both at 0, and their positions after pushes and a pop]
Queues
Treat memory as a circle
If front or rear reaches 7, next (incremented) value should be 0 rather than 8 (for a queue with addresses 0 to 7)
Two conditions of interest
Full queue: no room for more items
In 8-entry queue, means 8 items present
No further pushes allowed until a pop occurs
Causes front=rear
Empty queue: no items
No pops allowed until a push occurs
Causes front=rear
Both conditions have front=rear
To detect whether front=rear means full or empty, need state machine that detects if previous operation was push or pop, sets full or empty output signal (respectively)
[Figure: queue memory drawn as a circle of addresses 0 to 7 with rear and front pointers]
Queue Implementation
Can use register file for item storage
Implement rear and front using up counters
rear used as register file's write address, front as read address
Simple controller would set control lines for pushes and pops, and also detect full and empty situations
FSM for controller not shown
[Figure: 8-word 16-bit queue: 8x16 register file (wdata, waddr, wr, rdata, raddr, rd); 3-bit up counter rear (clr, inc) drives waddr; 3-bit up counter front (clr, inc) drives raddr; an equality comparator (=) on rear and front produces eq; the controller (inputs reset, wr, rd, eq; outputs full, empty) drives the counters and register file]
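Since the controller FSM is not shown, here is one possible C sketch of the whole 8-word, 16-bit queue. It assumes the controller remembers a single bit, whether the last operation was a push, to tell full from empty when front=rear; the Queue type and function names are hypothetical. One design choice differs from the hardware: this sketch refuses a push to a full queue (or a pop from an empty one) and returns -1, whereas the hardware would be left in an unknown state.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t rf[8];      /* 8x16 register file */
    uint8_t rear;        /* 3-bit up counter: write address */
    uint8_t front;       /* 3-bit up counter: read address */
    int last_was_push;   /* controller state bit */
} Queue;

static void queue_reset(Queue *q) { q->rear = q->front = 0; q->last_was_push = 0; }

/* front=rear means empty after a pop (or reset), full after a push. */
static int queue_empty(const Queue *q) { return q->rear == q->front && !q->last_was_push; }
static int queue_full(const Queue *q)  { return q->rear == q->front &&  q->last_was_push; }

/* Returns 0 on success, -1 if the queue is full. */
static int queue_push(Queue *q, uint16_t item) {
    if (queue_full(q)) return -1;
    q->rf[q->rear] = item;
    q->rear = (uint8_t)((q->rear + 1) & 7);    /* wraps 7 -> 0 */
    q->last_was_push = 1;
    return 0;
}

/* Returns 0 on success, -1 if the queue is empty. The item stays in the
   register file; it is just no longer accessible. */
static int queue_pop(Queue *q, uint16_t *item) {
    assert(item);
    if (queue_empty(q)) return -1;
    *item = q->rf[q->front];
    q->front = (uint8_t)((q->front + 1) & 7);  /* wraps 7 -> 0 */
    q->last_was_push = 0;
    return 0;
}
```

Replaying the usage example that follows (push 9, 5, 8, 5, 7, 2, 3; pop; push 6; push 3) leaves rear = front = 1 with full asserted, so the next push of 4 is rejected.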
Common Uses of a Queue
Computer keyboard
Pushes pressed keys onto queue, meanwhile pops and sends to computer
Digital video recorder
Pushes captured frames, meanwhile pops frames, compresses them, and stores them
Computer network routers
Pushes incoming packets onto queue, meanwhile pops packets, processes destination information, and forwards each packet out over appropriate port
Queue Usage Example
Example series of pushes and pops
Note how rear and front pointers move
Note that popping doesn't really remove the data from the queue, but that data is no longer accessible
Note how rear (and front) wraps around from address 7 to 0
Note: pushing a full queue is an error, as is popping an empty queue
[Figure: queue states. Initially empty queue (r=f=0). 1. After pushing 9, 5, 8, 5, 7, 2, 3 (r=7, f=0). 2. After popping (data: 9; r=7, f=1). 3. After pushing 6 (r=0, f=1). 4. After pushing 3 (r=f=1, full). 5. After pushing 4: ERROR! Pushing a full queue results in unknown state]
5.8 Hierarchy: A Key Design Concept
Hierarchy
An organization with a few items at the top, with each item decomposed into other items
Common example: A country
1 item at the top (the country)
Country item decomposed into state/province items
Each state/province item decomposed into city items
Hierarchy helps us manage complexity
To go from transistors to gates, muxes, decoders, registers, ALUs, controllers, datapaths, memories, queues, etc.
Imagine trying to comprehend a controller and datapath at the level of gates
[Figure: map of Country A showing just the top two levels of hierarchy (Provinces 1, 2, and 3), and a map showing all levels of hierarchy (Cities A through G within the provinces)]
Hierarchy and Abstraction
Abstraction
Hierarchy often involves not just grouping items into a new item, but also associating higher-level behavior with the new item, known as abstraction
e.g., an 8-bit adder has an understandable high-level behavior: it adds two 8-bit binary numbers
Frees designer from having to remember, or even from having to understand, the lower-level details
[Figure: 8-bit adder block with inputs a7..a0, b7..b0, and ci, and outputs s7..s0 and co; beside it, the map of Country A with its provinces]
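The adder example can be made concrete in C: the lower level is a full adder built from gate operations, the higher level is an 8-bit adder composed of eight of them (a ripple-carry chain, one possible implementation), and the abstraction is that add8() simply computes a + b + ci. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Lower level: a full adder expressed as gates. */
static void full_adder(int a, int b, int ci, int *s, int *co) {
    assert(a <= 1 && b <= 1 && ci <= 1);     /* single-bit inputs */
    *s  = a ^ b ^ ci;
    *co = (a & b) | (a & ci) | (b & ci);
}

/* Higher level: 8-bit adder from 8 full adders, carries rippling from
   bit 0 to bit 7. Abstract behavior: s = (a + b + ci) mod 256, co = carry out. */
static uint8_t add8(uint8_t a, uint8_t b, int ci, int *co) {
    uint8_t s = 0;
    int carry = ci;
    for (int k = 0; k < 8; k++) {
        int sk;
        full_adder((a >> k) & 1, (b >> k) & 1, carry, &sk, &carry);
        s |= (uint8_t)(sk << k);
    }
    *co = carry;
    return s;
}
```

A user of add8() only needs the one-line behavior in its comment; the gate-level details stay hidden one level down.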
Hierarchy and Composing Larger Components from Smaller Versions
A common task is to compose smaller components into a larger one
Gates: Suppose you have plenty of 3-input AND gates, but need a 9-input AND gate
Can simply compose the 9-input gate from several 3-input gates
Muxes: Suppose you have 4x1 and 2x1 muxes, but need an 8x1 mux
s2 selects either top or bottom 4x1
s1s0 select particular 4x1 input
Implements 8x1 mux: 8 data inputs, 3 selects, one output
[Figure: two 4x1 muxes (inputs i0..i3 and i4..i7, selects s1 s0) feeding a 2x1 mux selected by s2, producing output d]
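Both compositions on this slide can be sketched as C functions built only from smaller "components". The names are hypothetical, and mux4() is treated as a given primitive rather than built from mux2().

```c
#include <assert.h>

/* Primitive 3-input AND gate. */
static int and3(int a, int b, int c) { return a & b & c; }

/* 9-input AND from four 3-input ANDs: three take the inputs, one combines. */
static int and9(const int in[9]) {
    return and3(and3(in[0], in[1], in[2]),
                and3(in[3], in[4], in[5]),
                and3(in[6], in[7], in[8]));
}

/* Primitive 2x1 and 4x1 muxes. */
static int mux2(int i0, int i1, int s0) { return s0 ? i1 : i0; }
static int mux4(const int i[4], int s1, int s0) { return i[(s1 << 1) | s0]; }

/* 8x1 mux: s1s0 select within each 4x1 mux;
   s2 picks the top (i0..i3) or bottom (i4..i7) 4x1 output. */
static int mux8(const int i[8], int s2, int s1, int s0) {
    assert((s2 | s1 | s0) <= 1);   /* selects are single bits */
    return mux2(mux4(&i[0], s1, s0), mux4(&i[4], s1, s0), s2);
}
```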
Hierarchy and Composing Larger Components from Smaller Versions
Composing memory is very common
Making memory words wider
Easy: just place memories side-by-side until desired width obtained
Share address/control lines, concatenate data lines
Example: Compose 1024x8 ROMs into 1024x32 ROM
[Figure: four 1024x8 ROMs sharing the 10-bit addr and en lines; their 8-bit data outputs concatenate into data(31..0) of the resulting 1024x32 ROM]
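Widening in C amounts to reading every ROM at the same address and concatenating the bytes. In this minimal sketch the ROMs are arrays, rom8_read() and rom32_read() are hypothetical names, and the assignment of roms[0] to data(7..0) and roms[3] to data(31..24) is an assumed ordering.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 1024

/* One 1024x8 ROM read. */
static uint8_t rom8_read(const uint8_t rom[WORDS], unsigned addr) {
    assert(addr < WORDS);
    return rom[addr];
}

/* 1024x32 ROM from four 1024x8 ROMs: all share addr (and en);
   their 8-bit outputs are concatenated, roms[3] in the top byte. */
static uint32_t rom32_read(const uint8_t *roms[4], unsigned addr) {
    uint32_t data = 0;
    for (int k = 3; k >= 0; k--)
        data = (data << 8) | rom8_read(roms[k], addr);
    return data;
}
```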
Hierarchy and Composing Larger Components from Smaller Versions
Creating memory with more words
Put memories on top of one another until the number of desired words is achieved
Use decoder to select among the memories
Can use highest order address input(s) as decoder input
Although actually, any address line could be used
Example: Compose 1024x8 memories into 2048x8 memory
a10 just chooses which memory to access
To create memory with more words and wider words, can first compose to enough words, then widen.
[Figure: 2048x8 ROM built from two 1024x8 ROMs; address lines a9..a0 go to both ROMs; a10 drives a 1x2 decoder (enabled by en) whose outputs d0 and d1 enable one ROM each; addresses 00000000000 through 01111111111 (a10=0) map to one ROM and 10000000000 through 11111111111 (a10=1) to the other; the 8-bit data lines are shared]
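The 2048x8 example can be sketched in C with an explicit 1x2 decoder on a10. The ROM arrays and rom2048_read() are hypothetical; the sketch assumes decoder output d0 enables roms[0] (addresses 0 to 1023) and d1 enables roms[1] (addresses 1024 to 2047).

```c
#include <assert.h>
#include <stdint.h>

#define SMALL_WORDS 1024   /* each component ROM is 1024x8 */

/* 2048x8 ROM from two 1024x8 ROMs. a9..a0 go to both ROMs; a 1x2
   decoder on a10 enables exactly one, so only it drives the data lines. */
static uint8_t rom2048_read(const uint8_t *roms[2], unsigned addr) {
    assert(addr < 2 * SMALL_WORDS);
    unsigned a10 = (addr >> 10) & 1;   /* decoder input */
    unsigned low = addr & 0x3FF;       /* a9..a0 */
    int d0 = !a10, d1 = (int)a10;      /* decoder outputs = ROM enables */
    uint8_t data = 0;
    if (d0) data = roms[0][low];
    if (d1) data = roms[1][low];
    return data;
}
```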
Chapter Summary
Modern digital design involves creating processor-level components
Four-step RTL method can be used
1. High-level state machine
2. Create datapath
3. Connect datapath to controller
4. Derive controller FSM
Several examples
Control dominated, data dominated, and mix
Determining fastest clock frequency
By finding critical path
Behavioral-level design: C to gates
By using method to convert C (subset) to high-level state machine
Additional RTL components
Memory: RAM, ROM
Queues
Hierarchy: A key concept used throughout Chapters 2-5