Chapter 5
Lecture 5
Dr. Shoab A. Khan
New Trends in FPGAs
Embedding in FPGAs
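Figure: an embedded system on an FPGA. PowerPC and MicroBlaze CPUs are connected over the PLB (with a PLB arbiter) and OPB buses to an OPB memory controller that drives external flash (address, data and control lines) holding the FPGA configuration bit-stream and the program, GPIO of the CPU, an Ethernet MAC with an RJ45 port, a UART and an SDRAM controller.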
Digital Design of Signal Processing Systems, John Wiley & Sons by Dr. Shoab A. Khan
Embedded Arithmetic Units in FPGAs
(a) DSP48 in Virtex™-4 FPGA (derived from Xilinx documentation)
Figure: two cascaded DSP48 slices. In each, 18-bit inputs A and B drive an 18x18 multiplier whose product passes through the X and Y multiplexers to a 48-bit adder/subtracter (SUBTRACT control, carry-in CIN); the 48-bit C input and the cascade input PCIN (optionally through a wire shift right by 17 bits) also feed the adder, which produces the 48-bit output P. The B operand cascades through BCIN/BCOUT and the result through PCIN/PCOUT.
(b) 18x18 multiplier and adder in Altera FPGA
Figure: four 18-bit data buffers (DATA-1 to DATA-4) and four coefficient buffers (COEFFICIENT-1 to COEFFICIENT-4) feed four 18x18 multipliers with 36-bit products; adders combine the products in pairs (37-bit results), followed by a summation unit and an add-accumulate circuit.
8x8 multiplier and 16-bit adder in Quick Logic FPGA
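Figure: datapath with the 16-bit operand buses ABUS and BBUS (8-bit operands to the 8x8 multiplier, 16-bit operands to the adder), a 3-bit control bus CBUS and a 17-bit result bus RBUS.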
18x18 multiplier in Virtex-II, Virtex-II Pro and Spartan™-3 FPGA
Figure: 18-bit inputs A and B feed an 18x18-bit multiplier that produces a 36-bit product.
Instantiation of Embedded Blocks
ISE-provided template
Embedded Multipliers
Automatic instantiation
$$w[n] = a_1 w[n-1] + a_2 w[n-2] + x[n]$$
$$y[n] = b_0 w[n] + b_1 w[n-1] + b_2 w[n-2]$$
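As a golden reference for the two equations above, here is a floating-point sketch of the Direct Form II recursion. The function name `iir_df2` and its argument order are illustrative assumptions; the feedback coefficients a1 and a2 enter with a plus sign, exactly as written in the equations.

```python
def iir_df2(x, a1, a2, b0, b1, b2):
    """Second-order IIR filter in Direct Form II (floating-point reference).

    w[n] = a1*w[n-1] + a2*w[n-2] + x[n]
    y[n] = b0*w[n] + b1*w[n-1] + b2*w[n-2]
    """
    w1 = w2 = 0.0              # the single shared delay line: w[n-1], w[n-2]
    y = []
    for xn in x:
        wn = xn + a1 * w1 + a2 * w2            # feedback section
        y.append(b0 * wn + b1 * w1 + b2 * w2)  # feedforward section
        w2, w1 = w1, wn                        # shift the delay line
    return y
```

Because both sections share one delay line, only two state registers are needed, which is what the block diagram on the next slide realizes in hardware.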
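Block diagram of a 2nd Order IIR filter in Direct Form II Realization
Figure: the 16-bit input x[n] is added to the 32-bit feedback sum to form w[n], which is quantized (Q) to 16 bits and stored in the delay line w[n-1], w[n-2]. The 16x16 multipliers a1, a2 (feedback) and b0, b1, b2 (feedforward) produce 32-bit products that are summed to give y[n].
(a)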
RTL schematic generated by Xilinx’s Integrated Software Environment
(b)
Synthesis of the design on a Spartan™-3 FPGA: the multiplications are mapped onto the embedded 18x18 multipliers
Number of IOs: 50
(c)
RTL schematic generated by Xilinx ISE for a Virtex™-4 target device. The multiplication and addition operations are mapped onto DSP48 multiply-accumulate (MAC) embedded blocks
(d)
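module iir(xn, clk, rst, yn);
  // ... port declarations, delay line and multiply/add logic not shown on the slide
  assign wn = wfn[30:15] + xn;  // keep bits [30:15] of the 32-bit feedback sum (Q1.15), then add x[n]
endmodule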
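Figure: 8-tap direct-form FIR filter. The input passes through a tapped delay line; each tap is multiplied by one of the coefficients b0 to b7, the products are summed by a chain of seven adders (40 bits wide), and the sum is quantized (Q) to the 16-bit output y[n] (data_out), registered on clk.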
Example of Optimized Mapping
// -------------------------------------------------------------
// Module: fir_filter
// -------------------------------------------------------------
// Discrete-Time FIR Filter
// -------------------------------
// Filter Structure : Direct-Form FIR
// Filter Order     : 7
// Input Format     : Q1.15
// Output Format    : Q1.15
// -------------------------------------------------------------
module fir_filter (
input clk,
input signed [15:0] data_in, //Q1.15
output reg signed [15:0] data_out //Q1.15
);
Contd…
// Constants: filter designed using MATLAB FDATool; all coefficients are in Q1.15 format
parameter signed [15:0] b0 = 16'b1101110110111011;
parameter signed [15:0] b1 = 16'b1110101010001110;
parameter signed [15:0] b2 = 16'b0011001111011011;
parameter signed [15:0] b3 = 16'b0110100000001000;
parameter signed [15:0] b4 = 16'b0110100000001000;
parameter signed [15:0] b5 = 16'b0011001111011011;
parameter signed [15:0] b6 = 16'b1110101010001110;
parameter signed [15:0] b7 = 16'b1101110110111011;

// Tap delay line and 40-bit sum of products
// (declarations and the sum complete the slide's listing)
reg signed [15:0] xn [0:7];
wire signed [39:0] yn = xn[0]*b0 + xn[1]*b1 + xn[2]*b2 + xn[3]*b3
                      + xn[4]*b4 + xn[5]*b5 + xn[6]*b6 + xn[7]*b7;

// Block Statements
always @(posedge clk)
begin
  xn[0] <= data_in;
  xn[1] <= xn[0];
  xn[2] <= xn[1];
  xn[3] <= xn[2];
  xn[4] <= xn[3];
  xn[5] <= xn[4];
  xn[6] <= xn[5];
  xn[7] <= xn[6];
  data_out <= yn[30:15];  // Q2.30 sum -> Q1.15 output (truncation, no saturation)
end
endmodule // fir_filter
Synthesis reports: (a) Eight 18x18-bit embedded multipliers and seven adders built from generic logic blocks are used on a Spartan™-3 family FPGA
Selected device: 3s200pq208-5
Number of IOs: 33
(a)
(b) Eight DSP48 embedded blocks are used when the design is mapped on a Virtex™-4 family FPGA
Selected device: 4vlx15sf363-12
Number of Slices: 9 out of 6144 (0%)
Number of IOs: 33
(b)
Optimized Mapping
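Figure: the FIR filter redrawn as a chain of eight multiply-add stages with coefficients b0 to b7; the running sum starts at 0 and passes through the eight adders to produce y[n].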
Optimized Mapping
Number of IOs: 33
module fir_filter_pipeline (
  input clk,
  input signed [15:0] data_in,   // Q1.15
  output signed [15:0] data_out  // Q1.15
);
// Constants: filter designed using MATLAB FDATool; all coefficients are in Q1.15 format
parameter signed [15:0] b0 = 16'b1101110110111011;
parameter signed [15:0] b1 = 16'b1110101010001110;
parameter signed [15:0] b2 = 16'b0011001111011011;
parameter signed [15:0] b3 = 16'b0110100000001000;
parameter signed [15:0] b4 = 16'b0110100000001000;
parameter signed [15:0] b5 = 16'b0011001111011011;
parameter signed [15:0] b6 = 16'b1110101010001110;
parameter signed [15:0] b7 = 16'b1101110110111011;
reg signed [15:0] xn [0:13];  // one-stage-pipelined input-sample delay line
// ... pipelined multiply-add chain not shown on the slide
endmodule
$$g_i = a_i b_i, \qquad p_i = a_i \oplus b_i, \qquad c_{i+1} = g_i + p_i c_i$$
Fast Carry Logic in Virtex™-II Pro FPGA Slice
Figure: a slice contains two 4-input function generators (each usable as a LUT, dual-port RAM, ROM or shift register), dedicated carry logic (MUXCY, the XORG gate and MULT_AND) and two storage elements configurable as flip-flops or latches; clock, clock-enable and set/reset are shared between the X and Y halves. One half-slice forms a 1-bit adder: the LUT computes the propagate signal from A and B, MUXCY passes the carry along the dedicated carry chain from Cin to Cout, and XORG produces Y = A + B + Cin.
Fast Carry Logic
Figure: carry chains in a CLB. Each slice (S0, S1, ...) holds LUTs, MUXCY carry multiplexers and flip-flops; COUT of S0 drives CIN of S1, and COUT of S1 continues to S0 of the next CLB, while a secondary carry chain runs through S2 and continues to CIN of S2 of the next CLB.
Fast Carry Chain
1 CLB = 4 slices = two 4-bit adders
Figure: a 64-bit adder built on the fast carry chain. CLB 0 adds A[3:0] and B[3:0] to give Y[3:0], CLB 1 adds bits [7:4], CLB 2 adds bits [11:8], and so on up to CLB 15, which adds A[63:60] and B[63:60]; the final carry out gives Y[64].
Adders
Half Adder using Data Flow Modeling
module HALF_ADDER(input ai, bi, output si, cout);  // module name assumed, following FULL_ADDER
  assign {cout, si} = ai + bi;
endmodule
Half Adder using Data Flow Modeling
Figure: half adder with s_i = a_i ⊕ b_i (XOR gate) and c_out = a_i b_i (AND gate).
Full Adder Truth Table
x y z | C S
0 0 0 | 0 0
0 0 1 | 0 1
0 1 0 | 0 1
0 1 1 | 1 0
1 0 0 | 0 1
1 0 1 | 1 0
1 1 0 | 1 0
1 1 1 | 1 1
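Full Adder
Figure: a full adder built from two half adders. The first half adder adds a_i and b_i (outputs SiHA1, cOutHA1), the second adds SiHA1 and c_i (outputs s_i, cOutHA2), and an OR gate combines the two carries into c_out.
$$s_i = a_i \oplus b_i \oplus c_i, \qquad c_{i+1} = a_i b_i + c_i (a_i \oplus b_i)$$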
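The two-half-adder construction can be checked exhaustively against the truth table with a small bit-level model. `half_adder`, `full_adder` and `ripple_carry_add` are hypothetical helper names; chaining the full adder as in `ripple_carry_add` gives the ripple carry adder of the following slides.

```python
def half_adder(a, b):
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b

def full_adder(a, b, c):
    """Full adder from two half adders and an OR gate."""
    s1, c1 = half_adder(a, b)   # first HA: a + b
    s, c2 = half_adder(s1, c)   # second HA: (a XOR b) + c
    return s, c1 | c2           # carry out = a b + c (a XOR b)

def ripple_carry_add(a, b, cin, width):
    """Add two width-bit unsigned numbers by chaining full adders."""
    s, c = 0, cin
    for i in range(width):
        si, c = full_adder((a >> i) & 1, (b >> i) & 1, c)  # carry ripples to bit i+1
        s |= si << i
    return s, c
```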
Gate-level design options for a full adder
Figure panels (a) and (b): two gate-level realizations with inputs a_i, b_i and c_i.
Contd…
Figure panels (c) and (d): realizations built from the propagate signal p_i and generate signal g_i, one of them using a multiplexer, producing s_i and c_{i+1}.
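Full Adder: Implementation in Verilog
module FULL_ADDER(input ai, bi, cin, output si, cout);
  assign si   = ai ^ bi ^ cin;                  // sum
  assign cout = (ai & bi) | (cin & (ai ^ bi));  // carry out
endmodule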
Full Adder Using Data Flow Modeling
module FULL_ADDER(ai,bi,cin,si,cout);
input ai,bi;
input cin;
output si,cout;
assign {cout,si} = ai + bi + cin;
endmodule
Ripple Carry Adder
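Figure: six full adders (FA) in cascade; each stage's carry out drives the carry in of the next stage, and the last stage produces cout.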
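module ripple_carry_adder #(parameter W = 16)
  (input clk,
   input [W-1:0] a, b,
   input cin,
   output reg [W-1:0] s_r,
   output reg cout_r);

  wire [W-1:0] s;
  wire cout;
  reg [W-1:0] a_r, b_r;
  reg cin_r;

  // Input and output registers (body completed for this listing), so the
  // timing report measures the register-to-register adder path
  always @(posedge clk)
  begin
    a_r <= a;
    b_r <= b;
    cin_r <= cin;
    s_r <= s;
    cout_r <= cout;
  end

  assign {cout, s} = a_r + b_r + cin_r;  // typically mapped onto the fast carry chain
endmodule

module ripple_carry_adder_6 (a, b, cin, s, cout);  // hypothetical name: header not on the slide
  input [5:0] a, b;
  input cin;
  output [5:0] s;
  output cout;
  reg [5:0] s, c;
  reg cout;
  integer i;

  always @(a or b or cin)
  begin
    {c[0], s[0]} = a[0] + b[0] + cin;
    for (i = 1; i < 6; i = i + 1)
      {c[i], s[i]} = a[i] + b[i] + c[i-1];  // carry ripples from the previous bit
    cout = c[5];
  end
endmodule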
Important Observation
Non-uniform Group 12-Bit Carry Select Adder
Carry Generate and Propagate Logic
$$g_i = a_i b_i, \qquad p_i = a_i \oplus b_i$$
$$c_{i+1} = g_i + p_i c_i, \qquad s_i = p_i \oplus c_i$$
Group Carry and Group Propagate
$$c_1 = g_0 + p_0 c_0$$
$$c_2 = g_1 + p_1 c_1 = g_1 + p_1 g_0 + p_1 p_0 c_0$$
$$c_3 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0$$
$$c_4 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_0$$
Let $G_0 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$ and $P_0 = p_3 p_2 p_1 p_0$; then we can write
$$c_4 = G_0 + P_0 c_0$$
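A bit-level sketch of a 4-bit carry look-ahead adder that evaluates the expanded carry equations directly, assuming unsigned 4-bit operands; `cla4` is an illustrative name. It also returns the group signals G0 and P0 so that c4 = G0 + P0 c0 can be checked.

```python
def cla4(a, b, c0):
    """4-bit carry look-ahead adder using the two-level carry equations."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]   # generate  g_i = a_i b_i
    p = [((a ^ b) >> i) & 1 for i in range(4)]          # propagate p_i = a_i XOR b_i
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    G0 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P0 = p[3] & p[2] & p[1] & p[0]
    c4 = G0 | (P0 & c0)                                 # c4 = G0 + P0 c0
    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4)) # s_i = p_i XOR c_i
    return s, c4, G0, P0
```

Every carry depends only on the g, p signals and c0, which is why each can be formed in two gate delays once g and p are available.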
CLA logic for computing carries in two-gate-delay time
(a) Figure: per-bit logic producing p_i and g_i from a_i and b_i, and s_i = p_i ⊕ c_i.
(b) Figure: two-level AND-OR logic computing c1 to c4 from g0 to g3, p0 to p3 and c0.
A 16-bit carry look-ahead adder using two levels of CLA logic
Figure: sixteen 1-bit adders in four groups of four (bits 3:0 to 15:12). Each adder outputs s_i and its (p_i, g_i) pair; each group's CLA block (CLA00 to CLA03) computes the carries within the group and the group signals (P0, G0) to (P3, G3), and the second-level block CLA10 computes the group carries C4, C8 and C12.
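A 64-bit carry look-ahead adder using three levels of CLA logic
Figure: 4-bit carry-lookahead adders at the first level produce group signals such as (P00, G00) to (P03, G03); second-level blocks CLA10 to CLA13 each form a 16-bit carry-lookahead section with signals (P10, G10) to (P13, G13); and the third-level block CLA20 combines them into (P20, G20).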
A 12-bit Hybrid Ripple Carry and Carry Look-ahead Adder
Binary Carry Look-ahead Adder (BCLA)
$$g_i = a_i b_i, \qquad p_i = a_i \oplus b_i$$
$$(G_i, P_i) = (g_i, p_i) \bullet (g_{i-1}, p_{i-1}) \bullet \cdots \bullet (g_0, p_0)$$
and the problem can be recursively solved as:
$(G_0, P_0) = (g_0, p_0)$
for $i = 1$ to $N-1$
  $(G_i, P_i) = (g_i, p_i) \bullet (G_{i-1}, P_{i-1})$   (Eq1)
  $c_{i+1} = G_i + P_i c_0$
end
where the dot operator is given as:
$$(g_i, p_i) \bullet (G_{i-1}, P_{i-1}) = (g_i + p_i G_{i-1},\; p_i P_{i-1})$$
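The recursion in Eq1 can be sketched as a serial loop over the dot operator. `dot` and `bcla_add` are hypothetical names, and the carries use c_{i+1} = G_i + P_i c_0 as above. This is the serial (linear-depth) evaluation; the parallel prefix adders that follow reorganize the same dot operations into trees.

```python
def dot(left, right):
    """Dot operator: (g, p) o (G, P) = (g + p G, p P)."""
    g, p = left
    G, P = right
    return g | (p & G), p & P

def bcla_add(a, b, c0, n):
    """n-bit binary carry look-ahead adder, serial evaluation of Eq1."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a ^ b) >> i) & 1) for i in range(n)]
    G, P = gp[0]                      # (G_0, P_0) = (g_0, p_0)
    carries = [c0, G | (P & c0)]      # c_0 and c_1 = G_0 + P_0 c_0
    for i in range(1, n):
        G, P = dot(gp[i], (G, P))     # (G_i, P_i) = (g_i, p_i) o (G_{i-1}, P_{i-1})
        carries.append(G | (P & c0))  # c_{i+1} = G_i + P_i c_0
    s = sum((gp[i][1] ^ carries[i]) << i for i in range(n))  # s_i = p_i XOR c_i
    return s, carries[n]
```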
Binary carry look-ahead adder Serial Implementation
Figure: serial prefix graph over bit positions 15 to 0; each position i combines its (g_i, p_i) with the result of position i-1, producing the sums S15 to S0.
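module BinaryCarryLookaheadAdder (/* port list not shown on the slide */);
  integer i;
  always @(*)
  // ... loop applying Eq1 for i = 1 to N-1 (body not shown on the slide)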
Brent–Kung adder
Figure: prefix graph over bit positions 15 to 0, arranged in levels, producing the sums S15 to S0.
Ladner–Fischer parallel prefix adder
Figure: prefix graph over bit positions 15 to 0 in four stages, producing the sums S15 to S0.
Kogge–Stone parallel prefix adder
Figure: prefix graph over bit positions 15 to 0 in four stages, producing the sums S15 to S0.
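The Kogge-Stone structure can be modeled by applying the dot operator at distances 1, 2, 4, ..., so n = 16 bits take the four stages of the figure. This sketch folds the carry-in into position 0 (an assumption of the model, not shown in the figure); `kogge_stone_add` is an illustrative name.

```python
def kogge_stone_add(a, b, cin, n=16):
    """n-bit Kogge-Stone parallel prefix adder (functional model)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a ^ b) >> i) & 1 for i in range(n)]
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & cin)          # fold the carry-in into position 0
    d = 1
    while d < n:                        # one stage per distance d = 1, 2, 4, ...
        G_new, P_new = G[:], P[:]
        for i in range(d, n):           # every position combines with i - d
            G_new[i] = G[i] | (P[i] & G[i - d])
            P_new[i] = P[i] & P[i - d]
        G, P = G_new, P_new
        d *= 2
    carries = [cin] + G[:n - 1]         # carry into bit i is the prefix G over bits 0..i-1
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, G[n - 1]
```

Every position does work in every stage, which gives the minimum depth and fan-out at the cost of the most dot-operator cells; Brent-Kung and Ladner-Fischer trade some of those cells for extra levels or higher fan-out.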
Han–Carlson parallel prefix adder
Figure: prefix graph over bit positions 15 to 0 in five stages, producing the sums S15 to S0.
Regular layout of an 8-bit Brent-Kung Adder
Figure: the pairs (g0, p0) to (g7, p7) combined in a regular tree layout that produces the carries c1 to c7.
Carry Skip Adder
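If every bit of a group propagates (all p_i = 1), the group's carry out equals its carry in, so the carry from the previous block bypasses (skips) the group and goes directly to the next block. For a k-bit group starting at bit i, the group propagate signal is
$$P_i = p_i\, p_{i+1}\, p_{i+2} \cdots p_{i+k-1}, \qquad p_i = a_i \oplus b_i$$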
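A functional sketch of a carry skip adder with equal k-bit ripple groups, assuming unsigned operands and n a multiple of k; the names are illustrative. In hardware the skip multiplexer only shortens the critical path: when the group propagate P is 1 the rippled carry equals the incoming carry anyway, so the result matches ordinary addition.

```python
def carry_skip_add(a, b, cin, n=16, k=4):
    """n-bit carry skip adder built from k-bit ripple carry groups."""
    s, c = 0, cin
    for base in range(0, n, k):
        group_in = c
        P = 1                                    # group propagate P = p_i ... p_{i+k-1}
        for i in range(base, base + k):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            pi = ai ^ bi
            P &= pi
            s |= (pi ^ c) << i                   # sum bit
            c = (ai & bi) | (pi & c)             # ripple inside the group
        c = group_in if P else c                 # skip multiplexer
    return s, c
```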
A 16-bit equal-group carry skip adder
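Figure: four 4-bit ripple carry adders (RCA) between c_in and c_out; each group's propagate signal (P[3:0], P[7:4], P[11:8], P[15:12]) drives a skip multiplexer that forwards the incoming carry past the group, giving the group carries C4, C8 and C12; OV marks the overflow output.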
Conditional Sum Adder
The process that led to the two-level carry select adder can be continued recursively.
Taken to the extreme, with single-bit adders at the top level, it yields a conditional-sum adder whose delay grows logarithmically with the word length.
Principle
Example
$$s^0_i = a_i \oplus b_i, \qquad s^1_i = a_i \oplus \overline{b_i}$$
$$c^0_i = a_i b_i, \qquad c^1_i = a_i + b_i$$
Conditional Cell (CC)
Figure: the conditional cell takes a_i and b_i and outputs s0_i, c0_i (for carry-in 0) and s1_i, c1_i (for carry-in 1).
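Addition of three-bit numbers using a conditional sum adder (here we assume the actual cin = 0)
Figure: adding a = 111 and b = 101. The conditional cells give s0 = 010, c0 = 101 (carry-in 0) and s1 = 101, c1 = 111 (carry-in 1). Merging adjacent groups with multiplexers and finally selecting with cin = 0 gives the sum 100 with carry out 1, i.e. 111 + 101 = 1100.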
Example: Conditional Sum Adder
Figure: a worked 16-bit example with a = 1001 1001 1100 1101 and b = 0011 0110 1101 0110. For group widths 1, 2, 4, ... the table lists, for each group, the conditional sum and block carry out for carry-in 0 (S0_i, C0_i) and for carry-in 1 (S1_i, C1_i).
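A sketch of the conditional-sum scheme: each conditional cell yields results for carry-in 0 and 1, and each level merges adjacent groups by selecting the upper group's result with the lower group's carry. `conditional_sum_add` is a hypothetical name, and moving an unpaired most significant group up unchanged (so widths need not be powers of two) is an assumption of this model.

```python
def _cell(ai, bi):
    """Conditional cell: ((s0, c0), (s1, c1)) for carry-in 0 and carry-in 1."""
    return (ai ^ bi, ai & bi), (1 - (ai ^ bi), ai | bi)

def _merge(lo, hi):
    """Merge two adjacent groups, each given as (width, result_cin0, result_cin1)."""
    w_lo, lo0, lo1 = lo
    w_hi, hi0, hi1 = hi

    def select(res):
        s_lo, c_lo = res
        s_hi, c_hi = hi1 if c_lo else hi0        # 2-to-1 mux on the lower group's carry
        return s_lo | (s_hi << w_lo), c_hi

    return w_lo + w_hi, select(lo0), select(lo1)

def conditional_sum_add(a, b, cin, n=16):
    groups = [(1, *_cell((a >> i) & 1, (b >> i) & 1)) for i in range(n)]
    while len(groups) > 1:                       # one multiplexer level per iteration
        merged = [_merge(groups[j], groups[j + 1])
                  for j in range(0, len(groups) - 1, 2)]
        if len(groups) % 2:
            merged.append(groups[-1])            # unpaired MSB group moves up as-is
        groups = merged
    _, res0, res1 = groups[0]
    return res1 if cin else res0
```

Assuming carry-in 0 for the 16-bit example above, the model gives the sum 1101 0000 1010 0011 (0xD0A3) with no carry out, and for the three-bit example it gives 100 with carry out 1.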
A 16-bit Conditional Sum Adder
Figure: conditional cells CC0 to CC15 produce (S0_i, C0_i, S1_i, C1_i) for each bit; successive levels of 2-to-1 multiplexers merge groups of 2, 4, 8 and 16 bits, each level selecting the upper group's sums and carry with the carry out of the lower group, to produce the sum bits S0 to S15 and the carry out.
Hybrid Adder Designs
Hybrid adders combine ideas from:
Carry-skip adders
Carry-select adders
Conditional-sum adders
to obtain greater cost-effectiveness.
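A 16-bit uniform-groups carry select adder
Figure: four 4-bit groups between Cin and Cout, each followed by a 4-bit 2-to-1 multiplexer; the group carries C[4], C[8] and C[12] select between the results precomputed for carry-in 0 and carry-in 1.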
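A functional sketch of the uniform-group carry select adder, assuming unsigned operands and n a multiple of the group size k; the names are illustrative. Each group's two speculative sums are computed in parallel in hardware, so after the first group only one multiplexer delay per group lies on the carry path.

```python
def carry_select_add(a, b, cin, n=16, k=4):
    """n-bit carry select adder with uniform k-bit groups."""
    mask = (1 << k) - 1
    s, c = 0, cin
    for base in range(0, n, k):
        ga, gb = (a >> base) & mask, (b >> base) & mask
        sum0 = ga + gb                  # precomputed for carry-in 0 (k + 1 bits)
        sum1 = ga + gb + 1              # precomputed for carry-in 1
        chosen = sum1 if c else sum0    # 2-to-1 mux driven by the incoming group carry
        s |= (chosen & mask) << base
        c = chosen >> k                 # carry into the next group
    return s, c
```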
Hierarchical CSA (carry select adder)
Figure: the N-bit operands are split into four N/4-bit slices. The lowest slice is added with Cin by one N/4-bit adder; each higher slice has two N/4-bit adders computing (N/4+1)-bit results for carry-in 0 and carry-in 1. 2-to-1 multiplexers driven by the carries C_{N/4} and C_{N/2} select among these results hierarchically to give the upper sum bits and Cout.
Barrel Shifter
(a) Design of a logic shifter for an 8-bit operand (b) Design of a logic and arithmetic shifter for an 8-bit signed operand
Figure (a): a 16-to-1 multiplexer with a 4-bit select s. Input 0 is x, inputs 1 to 7 are x shifted right by 1 to 7 with zero fill (x[7:1], x[7:2], ..., x[7]), input 8 is 8'b0, and inputs 9 to 15 are x shifted left by 7 down to 1 ({x[0], 7'b0}, {x[1:0], 6'b0}, ..., {x[6:0], 1'b0}), so in effect s is a 4-bit two's-complement shift amount.
Figure (b): the same multiplexer, where an L/A control bit selects zero fill for logical right shifts or sign fill for arithmetic right shifts ({x[7], x[7:1]}, {2{x[7]}, x[7:2]}, ..., {7{x[7]}, x[7]}).
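Design of a Barrel Shifter performing shifts in multiple stages
(a) Single cycle design
Figure (a): a 16-bit barrel shifter built from five 2-to-1 multiplexer stages controlled by s4 to s0. Stage s4 selects the 31-bit word y0 = {15{fill}, x[15:0]} (right-shift path, where L/A selects the sign fill {15{x[15]}} or 15'b0) or {x[14:0], 16'b0} (left-shift path). Stages s3, s2, s1 and s0 then conditionally shift right by 8, 4, 2 and 1, keeping only the bits still needed: y1 = y0[30:8] or y0[22:0] (23 bits), y2 = y1[22:4] or y1[18:0] (19 bits), y3 = y2[18:2] or y2[16:0] (17 bits), and y = y3[16:1] or y3[15:0] (16 bits).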
Design of a Barrel Shifter performing shifts in multiple stages
(b) Pipelined design
Figure (b): the same datapath with pipeline registers y0_reg, y1_reg, y2_reg and y3_reg between stages; the shift control is delayed through s_reg, sp_reg, spp_reg and sppp_reg so that each stage uses the control bits of its own operand.
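A bit-level model of the single-cycle multistage datapath, following the stage widths in the figure. The function names and the `arith` flag (standing in for the L/A select, True meaning sign fill) are assumptions. Tracing the stages shows that with s4 = 0 the unit shifts right by s[3:0], and with s4 = 1 and s[3:0] = 16 - k it shifts left by k, so s behaves as a 5-bit two's-complement shift amount; the tests check both cases.

```python
def _bits(v, hi, lo):
    """Return v[hi:lo] as an unsigned integer (Verilog-style part select)."""
    return (v >> lo) & ((1 << (hi - lo + 1)) - 1)

def barrel_shift16(x, s, arith):
    """Single-cycle multistage 16-bit shifter; s is the 5-bit control s4..s0."""
    x &= 0xFFFF
    fill = 0x7FFF if (arith and (x >> 15) & 1) else 0              # {15{x[15]}} or 15'b0
    if (s >> 4) & 1:
        y0 = _bits(x, 14, 0) << 16                                  # {x[14:0], 16'b0}
    else:
        y0 = (fill << 16) | x                                       # {15{fill}, x[15:0]}
    y1 = _bits(y0, 30, 8) if (s >> 3) & 1 else _bits(y0, 22, 0)    # 23 bits
    y2 = _bits(y1, 22, 4) if (s >> 2) & 1 else _bits(y1, 18, 0)    # 19 bits
    y3 = _bits(y2, 18, 2) if (s >> 1) & 1 else _bits(y2, 16, 0)    # 17 bits
    return _bits(y3, 16, 1) if s & 1 else _bits(y3, 15, 0)         # 16 bits
```

The pipelined version computes the same function; it only inserts registers between the stages.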
Carry Save Adders and Compressors
Carry save addition saves the carry at the next bit location instead of propagating it
The CSA does not ripple any carry
It has a delay of one FA
The CSA concept is effective in designing partial product compression/reduction logic
Example: a 3:2 compressor reduces three N-bit operands to an N-bit sum row s and a carry row c that carries weight one bit higher (N+1 bits):
a0 = 0 0 1 0 1 1
a1 = 0 1 0 1 0 1
a2 = 1 1 1 1 0 1
s  = 1 0 0 0 1 1
c  = 0 1 1 1 0 1
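On whole words, a row of full adders is just three bitwise operations, which is why a CSA has the delay of one FA regardless of width. A minimal sketch with an illustrative function name; the test reuses the slide's operands, where s + 2c = 35 + 58 = 93 = 11 + 21 + 61.

```python
def carry_save_add(x, y, z):
    """3:2 carry-save addition of three words, with no carry propagation.

    s holds the per-bit sums and c the per-bit carries; c has weight 2,
    so x + y + z == s + 2 * c.
    """
    s = x ^ y ^ z                          # sum output of each full adder
    c = (x & y) | (x & z) | (y & z)        # carry (majority) output of each full adder
    return s, c
```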
Dots are used to represent each bit of the partial product
Parallel multiplier architecture
Designing Customized Multipliers
Three components of a multiplier
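Figure: the N-bit multiplicand and N-bit multiplier enter PP generation; PP reduction compresses the partial products to two rows, and a carry propagate adder (CPA) produces the 2N-bit product.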
Partial Product Generation for a 6x6 Multiplier
Multiplicand b5 to b0, multiplier a5 to a0; each partial product bit is $PP_{ij} = a_i b_j$.
Partial Product Generation Verilog Code
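module multiplier (
  input [5:0] a, b,
  output reg [11:0] prod);

  reg [5:0] pp [0:5];  // six 6-bit partial products
  integer i;

  always @*
  begin
    // PP generation: row i is b ANDed with a[i]
    for (i = 0; i < 6; i = i + 1)
    begin
      pp[i] = b & {6{a[i]}};
    end
    // Added for completeness: weight each row by 2^i and sum (a real design
    // reduces the rows with a compression tree followed by one CPA)
    prod = 12'd0;
    for (i = 0; i < 6; i = i + 1)
      prod = prod + (pp[i] << i);
  end
endmodule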
Three dots are shown
12x12 Carry Save Reduction Scheme
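PP reduction for a 12x12 Multiplier using Carry Save Reduction Scheme
Figure: at level 0 a row of half and full adders (HA FA FA FA FA HA) reduces the first three partial products to a sum row and a carry row; at each of levels 1, 2 and 3 a further row of adders adds the next partial product to these two rows. At each level the least significant bit becomes final, giving the free product bits P0 to P4, and a CPA adds the two remaining rows.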
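A compact model of the carry save reduction scheme for an n x n unsigned multiplier: level 0 compresses the first three PPs, each later level folds in one more PP, and a single CPA finishes. Rows are treated as whole integers with bitwise full-adder logic; the helper names are illustrative, and n >= 3 is assumed.

```python
def csa(x, y, z):
    """One row of full adders applied bitwise: returns (sum, carry << 1)."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1

def carry_save_multiply(a, b, n):
    """n x n unsigned multiply; returns (product, number of reduction levels)."""
    rows = [(b << i) if (a >> i) & 1 else 0 for i in range(n)]  # PP generation
    s, c = csa(rows[0], rows[1], rows[2])                        # level 0: first 3 PPs
    levels = 1
    for row in rows[3:]:                                         # levels 1, 2, ...
        s, c = csa(s, c, row)
        levels += 1
    return s + c, levels                                         # final CPA
```

The number of levels grows linearly (n - 2), which is what the Wallace and Dadda trees on the following slides reduce to a logarithmic count.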
Dual Carry Save Reduction
Wallace Tree Multipliers
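The partial products are divided into groups of three rows, and each group is reduced to two rows (sum and carry) by a layer of full and half adders
These rows, together with the rows from the other partial product groups, form a new reduced matrix
Wallace reduction is applied iteratively to each newly generated matrix
The process continues until only two rows are left
These final two rows are added by a carry propagate adder to give the final product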
Wallace Reduction Tree applied on 12 PPs
Figure: twelve partial products (bit columns 0 to 23) are reduced over levels 1 to 5 until only two rows remain; the final partial product rows need a carry propagate adder, while the low-order free product bits are already final.
Wallace Reduction layout for a 6x6 array of PPs
Figure: rows of HA and FA cells in three levels reduce the six partial products; product bits P0 to P3 are final, and the remaining sum bits PS4 to PS11 and carry bits PC4 to PC10 are added by the final CPA.
Dadda Reduction uses the Wallace Reduction Table
Adder Levels in Wallace Tree Reduction Scheme
Number of partial products (n) | Number of full-adder levels
3 | 1
4 | 2
5 ≤ n ≤ 6 | 3
7 ≤ n ≤ 9 | 4
10 ≤ n ≤ 13 | 5
14 ≤ n ≤ 19 | 6
20 ≤ n ≤ 28 | 7
29 ≤ n ≤ 42 | 8
43 ≤ n ≤ 63 | 9
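The range boundaries in the table follow the sequence 2, 3, 4, 6, 9, 13, 19, 28, 42, 63, where each term is floor(3/2) of the previous one, since each full-adder level turns every three rows into two. A sketch that regenerates the table and the Dadda target heights; `reduction_heights` is an illustrative name.

```python
def reduction_heights(num_pps):
    """Levels needed to reduce num_pps rows to two, plus Dadda target heights.

    Uses d_1 = 2, d_{j+1} = floor(1.5 * d_j): j levels can reduce at most
    d_{j+1} rows to two.
    """
    heights = [2]
    while heights[-1] < num_pps:
        heights.append(heights[-1] * 3 // 2)
    levels = len(heights) - 1
    targets = heights[:-1][::-1]       # row heights after each level, e.g. 8 PPs -> 6, 4, 3, 2
    return levels, targets
```

For eight PPs this gives four levels with target heights 6, 4, 3 and 2, the Dadda schedule shown on the following slides.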
Dadda Reduction
Dadda reduction levels for reducing eight PPs to two
A Decomposed Multiplier
$$(a_L + 2^8 a_H)(b_L + 2^8 b_H) = a_L b_L + (a_L b_H + a_H b_L)\,2^8 + a_H b_H\,2^{16}$$
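A direct transcription of the decomposition for unsigned 16-bit operands, with an illustrative `mul8x8` standing in for one 8x8 multiplier block; signed operands would need the signed-multiplication treatment later in the lecture.

```python
def mul8x8(x, y):
    """Stand-in for one 8x8 unsigned multiplier block."""
    assert 0 <= x < 256 and 0 <= y < 256
    return x * y

def mul16_decomposed(a, b):
    """16x16 unsigned multiply from four 8x8 products, per the equation above."""
    aL, aH = a & 0xFF, a >> 8
    bL, bH = b & 0xFF, b >> 8
    return (mul8x8(aL, bL)
            + ((mul8x8(aL, bH) + mul8x8(aH, bL)) << 8)   # middle terms weighted by 2^8
            + (mul8x8(aH, bH) << 16))                    # high term weighted by 2^16
```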
A 16x16 bit Multiplier decomposed into four 8x8 multipliers
Figure: the four 16-bit partial results aL x bL, aL x bH, aH x bL and aH x bH are aligned at their weights (2^0, 2^8, 2^8 and 2^16) and added to form the 32-bit product.
The results of these multipliers are appropriately shifted and added to get the final product
Figure: a three-stage arrangement (Stage 1 to Stage 3) built around the 8x8 multipliers.
Optimized Compressors
(a) Candidate implementation of a 4:2 compressor, with carry input c_in and carry output c_out
(b) Concatenation of 4:2 compressors to create wider tiles
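One common candidate for panel (a) is two cascaded full adders; this sketch assumes that structure, and the names are illustrative. The key property is that c_out depends only on three of the inputs, never on c_in, so concatenated cells (panel (b)) do not ripple: each cell's c_out simply becomes the next column's c_in.

```python
def full_adder(a, b, c):
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def compressor_4_2(x1, x2, x3, x4, cin):
    """4:2 compressor from two full adders: x1+x2+x3+x4+cin == s + 2*(carry + cout)."""
    s1, cout = full_adder(x1, x2, x3)     # cout is independent of cin
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout

def compress_4_rows(w, x, y, z, n):
    """Tile of n cells reducing four n-bit words to a sum row and a carry row."""
    s_row, c_row, cin = 0, 0, 0
    for i in range(n):
        bits = [(v >> i) & 1 for v in (w, x, y, z)]
        s, carry, cin = compressor_4_2(*bits, cin)   # cout feeds the next column's cin
        s_row |= s << i
        c_row |= carry << (i + 1)
    return s_row, c_row, cin << n                    # last cout has weight 2^n
```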
Contd…
(c) Use of 4:2 compressors in Wallace tree reduction of an 8x8 multiplier
(d) Use of 4:2 compressors in Dadda reduction of an 8x8 multiplier
Single- and Multiple-column Counters
Figure: a 6:3 compressor (counter) for one column of six partial product bits P0 to P5; three 6-input LUTs each see all six bits and together output the 3-bit count of 1s (Sum and two Carry bits of increasing weight).
Counters compressing a 15x15 matrix
15:4, 4:3 and 3:2 counters working in cascade to compress a
15x15 matrix
A (3,4,5:5) GPC compressing three columns with 3, 4,
and 5 bits to 5 bits in different columns
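A GPC (k_{m-1}, ..., k_0; n) counts the 1s in each input column, weights column j by 2^j and outputs the total in n bits, so n is fixed by the largest possible total. A sketch with illustrative names, listing columns most significant first as in the GPC notation:

```python
def gpc_output_bits(column_counts):
    """Output width n of a GPC whose input columns hold column_counts bits (MSB column first)."""
    max_value = sum(k << j for j, k in enumerate(reversed(column_counts)))
    return max_value.bit_length()

def gpc_value(columns):
    """Evaluate a GPC: columns is a list of bit lists, most significant column first."""
    return sum(sum(bits) << j for j, bits in enumerate(reversed(columns)))
```

For (3, 4, 5; 5) the largest total is 3*4 + 4*2 + 5 = 25, which needs 5 bits; single-column counters are the special case of one column, e.g. 15:4 and 6:3.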
Compressor tree synthesis by compressing two columns of 5 bits each into 4 bits using (5, 5; 4) GPCs
Compressor tree mapping by (a) 3:2 counters and (b) a (3, 3; 4) GPC
Two’s Complement Signed Multiplier
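For an $N_1$-bit multiplier $a$ and an $N_2$-bit multiplicand $b$, both in two's complement:
$$PP[i] = (a_i 2^i)\left(-b_{N_2-1} 2^{N_2-1} + \sum_{j=0}^{N_2-2} b_j 2^j\right), \qquad i = 0, 1, \ldots, N_1-2$$
$$PP[N_1-1] = -(a_{N_1-1} 2^{N_1-1})\left(-b_{N_2-1} 2^{N_2-1} + \sum_{j=0}^{N_2-2} b_j 2^j\right)$$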
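A numeric check of the two PP equations: summing PP[0] to PP[N1-1] built from the bit patterns must reproduce the signed product. The function name is illustrative; operands are given as Python integers and converted to their two's-complement bits.

```python
def signed_pp_product(a, b, n1, n2):
    """Sum PP[0] .. PP[N1-1] as defined above for N1-bit a and N2-bit b."""
    abit = [(a >> i) & 1 for i in range(n1)]          # two's-complement bits of a
    bbit = [(b >> j) & 1 for j in range(n2)]          # two's-complement bits of b
    b_value = -bbit[n2 - 1] * 2 ** (n2 - 1) + sum(bbit[j] * 2 ** j for j in range(n2 - 1))
    pps = [abit[i] * 2 ** i * b_value for i in range(n1 - 1)]   # PP[i], i = 0 .. N1-2
    pps.append(-(abit[n1 - 1] * 2 ** (n1 - 1)) * b_value)       # PP[N1-1] carries the minus sign
    return sum(pps)
```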
Optimized GPC for FPGA Implementation
FPGAs are well suited to counter- and GPC-based compression trees
LUTs in many FPGAs come in groups of two with shared 6-bit inputs
A (3, 3; 4) GPC best utilizes 6-LUT-based FPGAs
Figure: (a) An Altera FPGA Adaptive Logic Module (ALM) contains two 6-LUTs with shared inputs; a 6-input, 3-output GPC has 3/4 logic utilization, leaving one LUT unused. (b) A 6-input, 4-output GPC has full logic utilization.
Showing 4x4-bit signed-by-signed multiplication
(b)
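Sign-Extension Elimination and CV Formulation for Signed-by-Signed Multiplication
Figure: each partial product (sign bit S, data bits X) would otherwise need sign extension up to the MSB of the product. Instead the sign bits are flipped, the last PP is one's complemented with a 1 added at its LSB, and all the constant 1s (at the sign-bit positions, at the LSB, and the extension 1s) are summed into a single correction vector (CV).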
Multiplying two numbers, 0010 and 1101
(a) Figure: partial products of 0010 and the negative number 1101 formed with sign extension, alongside the correction vector formulation.
Contd…
(b) Figure: the same multiplication of 0010 and 1101 using the correction vector instead of sign extension.
Contd…
The MSBs of all the PPs except the last one are flipped, a 1 is added at the sign-bit location, and the number is extended by all 1s
For the last PP, the two’s complement is computed:
Flip all the bits and add 1 at the LSB position
The MSB of the last PP is flipped again, and a 1 is added at this bit location for sign extension
All these 1s are added to find a correction vector (CV)
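The recipe above can be checked with a bit-level sketch for n x n operands; `cv_signed_multiply` is an illustrative name. "A 1 at the sign-bit position extended by all 1s" is the constant -2^k modulo 2^(2n), so all such constants fold into one precomputed CV, and only the flipped bits vary with the data.

```python
def cv_signed_multiply(a, b, n):
    """n x n two's-complement multiply with sign-extension elimination and a CV.

    Rows 0..n-2: flip the sign bit of each PP; constant -2**(n-1+i).
    Last row: flip the lower bits (its MSB is flipped twice, so unchanged);
    constants +2**(n-1) (the +1 at its LSB) and -2**(2n-2).
    """
    width = 2 * n
    mask = (1 << width) - 1
    cv = -(1 << (2 * n - 2)) + (1 << (n - 1))
    for i in range(n - 1):
        cv -= 1 << (n - 1 + i)
    cv &= mask                                         # precomputed correction vector
    total = cv
    for i in range(n):
        ai = (a >> i) & 1
        row = [ai & ((b >> j) & 1) for j in range(n)]  # PP bits a_i b_j
        if i < n - 1:
            row[n - 1] ^= 1                            # flip the sign bit
        else:
            row = [bit ^ 1 for bit in row[:n - 1]] + [row[n - 1]]
        total += sum(bit << j for j, bit in enumerate(row)) << i
    return total & mask
```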
Application of the string property
0 0 1 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1 0 1
String
0 0 1 1 1 1 0 1 1 0 1 1 1 1 1 0 0 1 0 1
String
0 0 1 1 1 1 0 1 1 1 0 0 0 0 1 0 0 1 0 1
String
0 0 1 1 1 1 1 0 0 1 0 0 0 0 1 0 0 1 0 1
String
0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 1 0 1
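Hence the number of 1s is reduced from 14 to 6, and both representations have the same value (the 1 placed at the bottom of each replaced string carries a bar and has weight -1).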
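Applying the string property from the LSB upwards is closely related to the non-adjacent form (canonical signed digit) recoding, sketched here as a swapped-in standard algorithm; `naf` is an illustrative name. For the 20-bit example above (value 252893) it reproduces the final row: +1 at bit positions 0 and 18 and barred (-1) digits at positions 2, 5, 10 and 13, i.e. 2^18 - 2^13 - 2^10 - 2^5 - 2^2 + 1.

```python
def naf(n):
    """Non-adjacent form of a non-negative integer: digits in {-1, 0, 1}, LSB first."""
    digits = []
    while n > 0:
        if n & 1:
            d = 2 - (n & 3)    # +1 if n mod 4 == 1, -1 (barred 1) if n mod 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits
```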
Generation of four PPs
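Figure: the multiplier 10 10 11 01 is recoded two bits at a time into four signed digits; each digit selects a multiple of the multiplicand as a partial product, which is sign-extended and shifted by two bits per group, and the four PPs are added.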
An 8 x 8 bit modified Booth recoder multiplier
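Figure: the multiplier bits b0 to b7 drive four Booth recoders (BR0 to BR3), each examining three bits and selecting one of 0, a, -a, 2a or -2a (9 bits wide); the four partial products and the correction vector (CV) are compressed by a Wallace tree reduction scheme and added by a 16-bit CSA.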
Pre-calculated part of the CV
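Figure: the constant part of the correction vector is precomputed; only the bits that depend on the partial-product sign bits s remain to be added.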
Algorithm Transformations for CSA
…
  sel = 0;
else
  sel = 1;
To transform the logic for optimal use of a compression tree, the algorithm is modified as:
Example: Multi-Operand Addition
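Example illustrating use of compression tree in multi-operand addition
Figure: four operands in the formats Q1.5, Q5.3, Q4.7 and Q6.6 are aligned at the implied place of the binary point; the sign bit of each operand is inverted and the correction vector (CV) is added as a fifth layer. The resulting dot matrix is compressed by 5:4, 4:3 and 3:2 counter layers (using HAs and FAs where needed) down to two rows.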
Algorithm Transformations for CSA
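Multi-operand addition should use a compression tree and a single CPA
Figure: a dataflow graph with inputs a[n], b[n], c[n], e[n], the delayed value d[n-1] and output y[n]; the cascaded adders of the original graph are replaced by one CSA so that only one carry propagation remains.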
Compression tree replacement for an Add Compare and Select Operation
In many applications, multi-operand addition is hidden and can be extracted
This example performs an Add-Compare-Select (ACS) operation
As written, the operation requires three CPAs: two for the additions and one for the comparison
The statements can be transformed to exploit a compression tree
Figure: the two adders and the comparator (<) are replaced by a compression tree (CT) whose result sign drives the selection of S.
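The slide does not spell out the ACS statements; this sketch assumes the common form "select a + b if a + b < c + d", as used in Viterbi decoders, and computes only the select signal. Since a + b - (c + d) = a + b + ~c + ~d + 2 in two's complement, the four operands and the constant 2 go through one compression tree and a single CPA, whose sign bit is the select. Names and the width convention are assumptions.

```python
def csa(x, y, z, mask):
    """Bitwise 3:2 compression: x + y + z == s + c (mod 2**width)."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s & mask, c & mask

def acs_select(a, b, c, d, width):
    """Return 1 if a + b < c + d, using one compression tree and one CPA.

    width must hold the signed difference (W + 2 bits for W-bit unsigned operands).
    """
    mask = (1 << width) - 1
    rows = [a & mask, b & mask, ~c & mask, ~d & mask, 2]
    while len(rows) > 2:                          # compression tree of 3:2 CSAs
        s, cy = csa(rows[0], rows[1], rows[2], mask)
        rows = rows[3:] + [s, cy]
    diff = (rows[0] + rows[1]) & mask             # the only carry-propagate addition
    return (diff >> (width - 1)) & 1              # sign bit of a + b - c - d
```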
Transforming the add and multiply operations to use one CPA and a compression tree
Apply the distributive property of multiplication
Generate PPs for the two multiplications
Use one compression tree to reduce all PPs to two layers
Use one CPA to add these two layers
Figure: two PP generation blocks feed a single compression tree, followed by one CPA that produces y.
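A sketch of the transformation, assuming the operation is y = (a + b) * x with unsigned operands and an n-bit multiplier x: by distributivity both products' PPs go into one compression tree, and only the final two rows see a CPA. The function names are illustrative.

```python
def csa(x, y, z):
    """Bitwise 3:2 compression: x + y + z == s + c."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1

def compress_to_two(rows):
    """Compression tree: apply 3:2 CSAs until two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        s, c = csa(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, c]
    return rows

def pp_rows(m, x, n):
    """PP generation for m * x, where x is an n-bit unsigned multiplier."""
    return [(m << i) if (x >> i) & 1 else 0 for i in range(n)]

def sum_times_x(a, b, x, n):
    """y = (a + b) * x evaluated as a*x + b*x with one compression tree and one CPA."""
    s, c = compress_to_two(pp_rows(a, x, n) + pp_rows(b, x, n))
    return s + c          # the single carry-propagate addition
```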
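Transformation to use compression trees and a single CPA to implement a cascade of multiplication operations
Figure: PP generation (PPG) for Op1 x Op2 feeds a compression tree (CT) that leaves the result in carry-save form (S1, C1). Instead of adding S1 and C1, the PPs of S1 x Op3 and C1 x Op3 are generated and compressed to (S2, C2), and likewise with Op4 to (S3, C3); a single CPA adds S3 and C3 to produce the final product.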
String Property
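7 = 111 = 8 - 1 = 100$\bar{1}$
31 = 11111 = 32 - 1 = 10000$\bar{1}$
A 1 with a bar on it has weight -1
We go to the end of the string of 1s, place a 1 one position above it, and place a barred 1 at its least significant bit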
Instead of multiplying by a single bit at a time
We multiply by two bits at a time, which halves the number of partial products
Booth Recoding Basic Idea
A= 10 10 11 01
B= 1 10 11 0
0 1
For each pair of bits, Booth’s algorithm restricts the value to be one of (-2, -1, 0, +1, +2):
+2 means shift A left by one
+1 means copy A into the answer
0 means copy all 0s
-1 means take the 2’s complement of A and then copy
-2 means take the 2’s complement of A and then shift left by one
Booth’s Algorithm
10101101 0 (a 0 is appended to the right of the LSB)
Use the MSB of the previous pair to check for the string property on the current pair; use 0 for the first pair
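As the string property is applied on three bits, there are the following eight possibilities (the two bits of the pair have weights 2^1 = 2 and 2^0 = 1; the third bit is the MSB of the previous pair):
b(2i+1) b(2i) b(2i-1) | Value
0 0 0 | 0
0 0 1 | 1
0 1 0 | 1
0 1 1 | 2
1 0 0 | -2
1 0 1 | -1
1 1 0 | -1
1 1 1 | 0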
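The table translates directly into a radix-4 Booth recoder: each digit selects 0, ±a or ±2a, and digit i is weighted by 4^i. A sketch for n-bit two's-complement multipliers (n even) with illustrative names; for the multiplier 10101101 used above it yields the digits (from the LSB) +1, -1, -1, -1, i.e. 1 - 4 - 16 - 64 = -83, its signed value.

```python
BOOTH_DIGIT = {  # (b_{2i+1}, b_{2i}, b_{2i-1}) -> digit, from the table above
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}

def booth_digits(b, n):
    """Radix-4 Booth recoding of an n-bit two's-complement multiplier (LSB digit first)."""
    bits = [(b >> i) & 1 for i in range(n)]
    digits, prev = [], 0                       # the appended 0 to the right of the LSB
    for i in range(0, n, 2):
        digits.append(BOOTH_DIGIT[(bits[i + 1], bits[i], prev)])
        prev = bits[i + 1]                     # MSB of this pair for the next group
    return digits

def booth_multiply(a, b, n=8):
    """a * b from n/2 partial products, each 0, +-a or +-2a shifted by 2i bits."""
    return sum(d * a * 4 ** i for i, d in enumerate(booth_digits(b, n)))
```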