Clockless Computing
Montek Singh
Thu, Sep 6, 2007
Review: Logic Gate Families
A classic asynchronous pipeline by Williams
1
Review:
Logic Gate Families
Static CMOS logic (“standard”)
Transmission gates, or “pass-transistor” logic
Dynamic logic, or “domino” logic
2
Static CMOS logic: Summary
Advantages:
output always strongly driven
pull-up and pull-down networks are fully-complementary;
always exactly one of them is “on”
good immunity from noise and leakage
both inverting and non-inverting functions implementable
each gate is inverting
cascade two gates together to get non-inverting logic
Disadvantages:
slow/big PMOS devices needed (in addition to NMOS)
greater chip area
higher power consumption
slower switching speed
3
Complementary CMOS
Complementary CMOS logic gates (a.k.a. static CMOS)
– nMOS pull-down network
– pMOS pull-up network
[Figure: the inputs drive a pMOS pull-up network and an nMOS pull-down network; the node between the two networks is the output]
                 Pull-up OFF    Pull-up ON
Pull-down OFF    Z (float)      1
Pull-down ON     0              X (crowbar)
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College 4
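The four pull-up/pull-down combinations on this slide can be written as a small lookup. This is only an illustrative sketch: the function names `cmos_output` and `inverter` are made up for this example, and the gate is treated as an ideal switch network with no timing.

```python
def cmos_output(pull_up_on: bool, pull_down_on: bool) -> str:
    """Output state of a CMOS gate, given which of its networks conduct."""
    if pull_up_on and pull_down_on:
        return "X"  # crowbar: both networks conduct at once
    if pull_up_on:
        return "1"  # output driven high by the pull-up network
    if pull_down_on:
        return "0"  # output driven low by the pull-down network
    return "Z"      # neither network conducts: output floats


def inverter(a: int) -> str:
    """Static CMOS inverter: one pMOS pull-up, one nMOS pull-down."""
    # pMOS conducts when its gate is 0; nMOS conducts when its gate is 1.
    return cmos_output(pull_up_on=(a == 0), pull_down_on=(a == 1))


if __name__ == "__main__":
    for a in (0, 1):
        print(a, inverter(a))
```

In a complementary gate such as the inverter, only the "0" and "1" cases are ever reached; "Z" and "X" arise only when the two networks are not complements of each other.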
Series and Parallel
nMOS: 1 = ON
pMOS: 0 = ON
Series: both must be ON
Parallel: either can be ON
[Figure: two transistors with gates g1 and g2 between terminals a and b; conduction from a to b for each input combination]
g1 g2:                00    01    10    11
(a) nMOS, series:     OFF   OFF   OFF   ON
(b) pMOS, series:     ON    OFF   OFF   OFF
(c) nMOS, parallel:   OFF   ON    ON    ON
(d) pMOS, parallel:   ON    ON    ON    OFF
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College 5
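The series and parallel rules above can be checked by enumerating the four input combinations. The sketch below treats each transistor as an ideal switch; the helper names (`nmos_on`, `pmos_on`, `conduction_table`) are hypothetical and chosen only for this example.

```python
from itertools import product


def nmos_on(g: int) -> bool:
    return g == 1  # nMOS: 1 = ON


def pmos_on(g: int) -> bool:
    return g == 0  # pMOS: 0 = ON


def series(on1: bool, on2: bool) -> bool:
    return on1 and on2  # series: both must be ON


def parallel(on1: bool, on2: bool) -> bool:
    return on1 or on2  # parallel: either can be ON


def conduction_table() -> dict:
    """Conduction between a and b for g1 g2 = 00, 01, 10, 11."""
    cases = {
        "nMOS series": (nmos_on, series),
        "pMOS series": (pmos_on, series),
        "nMOS parallel": (nmos_on, parallel),
        "pMOS parallel": (pmos_on, parallel),
    }
    table = {}
    for name, (device, combine) in cases.items():
        table[name] = [
            "ON" if combine(device(g1), device(g2)) else "OFF"
            for g1, g2 in product((0, 1), repeat=2)
        ]
    return table


if __name__ == "__main__":
    for name, row in conduction_table().items():
        print(f"{name:14s}", *row)
```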
CMOS Gate Design
Activity:
– Sketch a 4-input CMOS NOR gate
[Figure: gate with inputs A, B, C, D and output Y]
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College 6
CMOS Gate Design
Activity:
– Sketch a 4-input CMOS NAND gate
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College 7
Conduction Complement
Complementary CMOS gates always produce 0 or 1
Ex: NAND gate
– Series nMOS: Y=0 when both inputs are 1
– Thus Y=1 when either input is 0
– Requires parallel pMOS
[Figure: 2-input NAND gate with inputs A and B and output Y]
Rule of Conduction Complements
– Pull-up network is complement of pull-down
– Parallel -> series, series -> parallel
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College 8
Compound Gates
Compound gates can do any inverting function
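Ex: $Y = \overline{A \cdot B + C \cdot D}$ (AND-AND-OR-INVERT, AOI22)
[Figure: panels (a) to (f), step-by-step construction of the AOI22 gate from series and parallel transistor groups on inputs A, B, C, D, ending with the complete gate and output Y]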
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College 9
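The rule of conduction complements (swap series and parallel to get the pull-up network from the pull-down network) can be tried out on the AOI22 example. The sketch below is an assumption-laden illustration, not part of the original slides: networks are represented as nested tuples, and the names `dual`, `conducts`, `gate_output`, and `aoi22` are invented for this example.

```python
# A network is either an input name (one transistor) or a tuple
# ("series", n1, n2, ...) / ("parallel", n1, n2, ...) of sub-networks.

def dual(net):
    """Conduction complement: series <-> parallel, same inputs."""
    if isinstance(net, str):
        return net
    kind, *parts = net
    swapped = "parallel" if kind == "series" else "series"
    return (swapped, *[dual(p) for p in parts])


def conducts(net, values, on_level):
    """True if the network conducts; on_level is 1 for nMOS, 0 for pMOS."""
    if isinstance(net, str):
        return values[net] == on_level
    kind, *parts = net
    results = [conducts(p, values, on_level) for p in parts]
    return all(results) if kind == "series" else any(results)


def gate_output(pull_down, values):
    """Output of a complementary gate described by its nMOS pull-down."""
    down = conducts(pull_down, values, on_level=1)
    up = conducts(dual(pull_down), values, on_level=0)
    assert up != down, "exactly one network should conduct"
    return 1 if up else 0


# AOI22 pull-down: (A in series with B) in parallel with (C in series with D)
AOI22_PULL_DOWN = ("parallel", ("series", "A", "B"), ("series", "C", "D"))


def aoi22(a, b, c, d):
    return gate_output(AOI22_PULL_DOWN, {"A": a, "B": b, "C": c, "D": d})


if __name__ == "__main__":
    print(dual(AOI22_PULL_DOWN))
    print(aoi22(1, 1, 0, 0), aoi22(0, 1, 0, 1))
```

The internal assertion never fires for networks built this way, which is the point of the rule: the dual pull-up conducts exactly when the pull-down does not.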
Transmission (“Pass”) Gates
Key Idea:
transistors used in a different configuration
when switched on: instead of connecting output to Vdd or
Gnd, they connect output to the input
Advantage:
very efficient for implementing switches and multiplexers
Disadvantage:
signal degradation unless both NFET and PFET passgates are
used in a complementary configuration
10
Pass Transistors
Transistors can be used as switches
[Figure: nMOS and pMOS transistors drawn as switches between terminals s and d]
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College 11
Pass Transistors
Transistors can be used as switches
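nMOS pass transistor (gate g, terminals s and d):
g = 0: s and d disconnected
g = 1: s and d connected
with g = 1: input 0 gives a strong 0 at the output; input 1 gives a degraded 1
pMOS pass transistor (gate g, terminals s and d):
g = 0: s and d connected
g = 1: s and d disconnected
with g = 0: input 0 gives a degraded 0 at the output; input 1 gives a strong 1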
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College 12
Transmission Gates
Single pass transistors produce degraded outputs
– pMOS good only for transmitting “1”
– nMOS good only for transmitting “0”
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College 13
Transmission Gates
Single pass transistors produce degraded outputs
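Complementary transmission gates pass both 0 and 1 well
g = 0, gb = 1: a and b disconnected
g = 1, gb = 0: a and b connected
with g = 1, gb = 0: input 0 gives a strong 0 at the output; input 1 gives a strong 1
[Figure: transmission gate schematic and three equivalent circuit symbols, each with terminals a and b and controls g and gb]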
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College 14
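A rough way to see why single pass transistors degrade one logic level is a first-order threshold-drop model. Everything numeric here is an assumption made for illustration: the supply `VDD`, the threshold magnitude `VT` (taken equal for both device types), and the simplification that an nMOS with its gate at `VDD` cannot pull its output above `VDD - VT`, while a pMOS with its gate at 0 cannot pull its output below `VT`.

```python
VDD = 1.0  # assumed supply voltage (illustrative value)
VT = 0.3   # assumed threshold magnitude for nMOS and pMOS (illustrative)


def nmos_pass(v_in: float) -> float:
    """nMOS with gate at VDD: passes low levels fully, clips high levels."""
    return min(v_in, VDD - VT)


def pmos_pass(v_in: float) -> float:
    """pMOS with gate at 0: passes high levels fully, clips low levels."""
    return max(v_in, VT)


def tgate_pass(v_in: float) -> float:
    """Transmission gate: nMOS and pMOS in parallel, both switched on."""
    n = nmos_pass(v_in)
    # If the nMOS passes the level unchanged, use it; otherwise the pMOS,
    # which handles the high end of the range, finishes the job.
    return n if n == v_in else pmos_pass(v_in)


def quality(v_in: float, v_out: float) -> str:
    return "strong" if v_out == v_in else "degraded"


if __name__ == "__main__":
    for name, device in (("nMOS", nmos_pass), ("pMOS", pmos_pass),
                         ("T-gate", tgate_pass)):
        for v_in in (0.0, VDD):
            print(name, v_in, quality(v_in, device(v_in)))
```

Under these assumptions the model reproduces the qualitative pattern on the slides: the nMOS gives a strong 0 and a degraded 1, the pMOS the reverse, and the complementary pair passes both levels intact.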
Multiplexers
2:1 multiplexer chooses between two inputs
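S    D1   D0  |  Y
0    X    0   |  0
0    X    1   |  1
1    0    X   |  0
1    1    X   |  1
[Figure: 2:1 mux symbol with select S, data input D0 on the "0" port, data input D1 on the "1" port, and output Y]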
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College 15
Transmission Gate Mux
Nonrestoring mux uses two transmission gates
– Only 4 transistors
[Figure: two transmission gates, one on D0 and one on D1, sharing the output Y and controlled by S]
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College 16
Gate-Level Mux Design
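$Y = S D_1 + \overline{S} D_0$ (too many transistors)
How many transistors are needed? 20
[Figure: AND-OR implementation of the mux from D1, D0, and S to Y, redrawn with per-gate transistor counts of 4 and 2; the labeled counts sum to 20]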
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College 17
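The mux equation and the two transistor counts quoted on these slides (20 for the gate-level design, 4 for the transmission-gate design) can be checked with a few lines. The per-gate breakdown in `GATE_LEVEL_COUNTS` is one reading of the figure labels (three 4-transistor gates, each followed by a 2-transistor stage, plus one more 2-transistor inverter) and should be treated as an assumption; the function and constant names are invented for this sketch.

```python
from itertools import product


def mux2(s: int, d1: int, d0: int) -> int:
    """2:1 multiplexer: Y = S*D1 + (not S)*D0."""
    return (s & d1) | ((1 - s) & d0)


# Assumed per-gate transistor counts, read off the figure labels.
GATE_LEVEL_COUNTS = [4, 2, 4, 2, 4, 2, 2]

# Two transmission gates, each one nMOS plus one pMOS.
TGATE_COUNTS = [2, 2]


if __name__ == "__main__":
    print("S D1 D0 | Y")
    for s, d1, d0 in product((0, 1), repeat=3):
        print(s, d1, d0, "|", mux2(s, d1, d0))
    print("gate-level transistors:", sum(GATE_LEVEL_COUNTS))
    print("transmission-gate transistors:", sum(TGATE_COUNTS))
```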
Dynamic Logic, or “domino”
Key idea:
only use NMOS’s to compute function
use a single PMOS to reset
Advantages:
significantly fewer transistors, hence smaller chip area
higher speed, lower power
less “loading” on wires (drive fewer transistors)
for async: no storage elements needed
Disadvantages:
need extra control input to precharge
logic is typically non-inverting only
more vulnerable to noise and leakage effects
18
Dynamic Logic, or “domino” (contd.)
Gate has 2 phases:
precharge (=reset): output reset to ‘0’
evaluate: output is computed; it either stays ‘0’ or switches to ‘1’
PC = 0 (asserted): precharge
PC = 1 (de-asserted): evaluate
[Figure: dynamic gate; the control input PC controls the pull-up network (“precharge”), and the data inputs control the pull-down network (“evaluation”), which drives the data output]
Pull-up and pull-down must never both be simultaneously active:
ensure that data inputs are reset while gate is precharging
or, add a “footer” device 19
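The two-phase behavior described above can be sketched as a tiny state machine. This is a behavioral illustration only, under several assumptions: the class name `DominoGate` and its `footed` flag are invented here, the output follows the slide's convention (reset to 0, may rise to 1 once per evaluate phase), and electrical effects such as leakage and noise are ignored.

```python
class DominoGate:
    """Behavioral sketch of a dynamic gate with a precharge input PC."""

    def __init__(self, pull_down, footed=False):
        self.pull_down = pull_down  # function of the data inputs -> truthy
        self.footed = footed        # a footer cuts off pull-down in precharge
        self.output = 0

    def step(self, pc, *inputs):
        conducting = bool(self.pull_down(*inputs))
        if pc == 0:  # PC asserted: precharge (reset)
            if conducting and not self.footed:
                # The slide's rule: both networks must never be active.
                raise ValueError("pull-up and pull-down both active")
            self.output = 0
        elif conducting:  # PC de-asserted: evaluate
            self.output = 1
        # Otherwise the output holds its value until the next precharge.
        return self.output


if __name__ == "__main__":
    and_gate = DominoGate(lambda a, b: a and b)
    print(and_gate.step(0, 0, 0))  # precharge
    print(and_gate.step(1, 1, 1))  # evaluate, inputs valid
    print(and_gate.step(1, 0, 0))  # inputs withdrawn, output held
    print(and_gate.step(0, 0, 0))  # precharge again
```

The third call shows why a dynamic gate can act as its own storage: once it has evaluated, the output stays put even after the inputs are reset, until the next precharge.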
Outline: Several Pipeline Styles
Classic static logic pipeline: Sutherland
Recent static logic pipeline: MOUSETRAP
Classic dynamic logic pipeline: Williams/Horowitz’s PS0
20
A Classic Asynchronous
Dynamic Pipeline
Williams and Horowitz’s PS0 pipeline:
Structure
Operation
Performance
21
A Classic Approach: PS0 Pipeline
Williams/Horowitz (Stanford U.) [1986-91]:
successfully used in fabricated chips [Stanford ’87] [HAL ’90s]
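[Figure: three pipeline stages (Stage 1, Stage 2, Stage 3) between “Data in” and “Data out”; each stage has a Processing Block on the data path and a Completion Detector that generates ack]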
Implemented using “dynamic logic” 22
PS0 Pipeline Stage
A PS0 stage consists of dynamic gates and a
completion detector:
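[Figure: Processing Block drawn as a dynamic gate with precharge control PC, a pull-down network driven by the data inputs, and a “keeper” on the data outputs; a Completion Detector on the data outputs generates ack]
23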
Dual-Rail Completion Detector
Combines dual-rail signals
Indicates when all bits are valid (or reset)
OR together 2 rails per bit
Merge results using “C-element”
C-element:
if all inputs = 1, output = 1
if all inputs = 0, output = 0
else, maintain output value
[Figure: bit0, bit1, ..., bitn each feed an OR gate; the OR outputs feed a C-element (C) whose output is Done]
24
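The completion detector above can be modeled directly from its description: one OR per dual-rail bit, merged by a C-element. The class names `CElement` and `CompletionDetector` are chosen for this sketch, and representing each bit as a `(rail0, rail1)` pair is an assumption about encoding, with `(0, 0)` as the reset (spacer) value.

```python
class CElement:
    """Output goes to 1 when all inputs are 1, to 0 when all inputs are 0,
    and otherwise keeps its previous value."""

    def __init__(self, initial=0):
        self.output = initial

    def update(self, inputs):
        inputs = list(inputs)
        if all(v == 1 for v in inputs):
            self.output = 1
        elif all(v == 0 for v in inputs):
            self.output = 0
        return self.output


class CompletionDetector:
    """OR together the two rails of each bit, then merge with a C-element."""

    def __init__(self):
        self.c = CElement()

    def update(self, bits):
        # bits: iterable of (rail0, rail1) pairs; (0, 0) means "reset".
        return self.c.update([r0 | r1 for r0, r1 in bits])


if __name__ == "__main__":
    cd = CompletionDetector()
    print(cd.update([(0, 0), (0, 0)]))  # all bits reset
    print(cd.update([(1, 0), (0, 0)]))  # only one bit valid so far
    print(cd.update([(1, 0), (0, 1)]))  # all bits valid
    print(cd.update([(0, 0), (0, 1)]))  # partially reset: value is held
    print(cd.update([(0, 0), (0, 0)]))  # all bits reset again
```

The hold behavior in the fourth call is what lets Done change only after every bit has become valid, or every bit has been reset.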
PS0 Protocol
PRECHARGE N: when N+1 completes evaluation
delete data: after next stage has copied it
EVALUATE N: when N+1 completes precharging
accept new data: after next stage is emptied
[Figure: stages N, N+1, and N+2 with events numbered 1 to 6; the labeled events are stages evaluating, a stage precharging, and completion detectors indicating “done”]
Complete cycle: 6 events
Evaluate → Precharge: 3 events
Precharge → Evaluate: another 3 events
25
PS0 Performance
[Figure: pipeline stages with the cycle's events numbered 1 to 6]
Cycle Time = $3\,T_{EVAL} + T_{PRECH} + 2\,T_{DETECT}$
$T_{EVAL}$ = Evaluation Time
$T_{PRECH}$ = Precharge Time
$T_{DETECT}$ = Completion Detection Time
26
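The cycle-time formula can be read as the sum of six event delays. The sketch below lists one ordering of those events that is consistent with the formula (three evaluations, two completion detections, one precharge); the exact event labels are an assumption, since the figure is not reproduced here, and the delay values in the example are arbitrary illustrative numbers.

```python
def ps0_cycle_events(t_eval, t_prech, t_detect):
    """One reading of the six events in a PS0 cycle, with their delays."""
    return [
        ("N evaluates", t_eval),
        ("N+1 evaluates", t_eval),
        ("N+2 evaluates", t_eval),
        ("N+2 completion detector indicates done", t_detect),
        ("N+1 precharges", t_prech),
        ("N+1 completion detector indicates done", t_detect),
    ]


def ps0_cycle_time(t_eval, t_prech, t_detect):
    """Cycle Time = 3*T_EVAL + T_PRECH + 2*T_DETECT."""
    return 3 * t_eval + t_prech + 2 * t_detect


if __name__ == "__main__":
    # Arbitrary delays in arbitrary time units, for illustration only.
    t_eval, t_prech, t_detect = 2, 2, 1
    for name, delay in ps0_cycle_events(t_eval, t_prech, t_detect):
        print(f"{name}: {delay}")
    print("cycle time:", ps0_cycle_time(t_eval, t_prech, t_detect))  # 10
```

With these example delays, evaluation accounts for 6 of the 10 time units in the cycle.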
Summary: PS0 Pipelining
Datapaths are latch-free:
dynamic gates themselves provide implicit latches
+: chip area savings
+: extremely low latency
Data items kept separate by control
stage deletes data: only after next stage has copied it
stage accepts new data: only if next stage is empty
distinct data items always separated by “spacers”
Control is extremely simple: each controller = single wire
completion detector directly controls previous stage
+: chip area savings
+: low control overhead
27
Comparison to a Clocked Pipeline
How would you design the pipeline if you actually had a clock?
1. Replace handshaking with “magic clocking”
each stage gets its own clock
successive clocks are slightly skewed
essentially, clocked simulation of asynchronous handshaking!
– need multiple clock phases!
2. Use a single clock, but insert latches between stages
latches are simple, level-sensitive
consecutive stages receive complementary clock signals
[Figure: a latch between stages, with consecutive stages clocked by Ck and Ck’]
28
Drawbacks of PS0 Pipelining
1. Poor throughput:
long cycle time: 6 events per cycle
data “tokens” are forced far apart in time
2. Limited storage capacity:
max only 50% of stages can hold distinct tokens
data tokens must be separated by at least one spacer
My Research Goals have been: address both issues
still maintain very low latency
29
Homework #4 (due Tue Sep 18)
1. Enumerate ALL of the timing assumptions inherent
in Williams’ PS0 style
Assume all gate and wire delays can be arbitrary
For which scenarios can there be a malfunction?
2. Compare the cycle times of PS0 with an ideal
clocked dynamic pipeline (slide #28)
30