Chapter 5

The document discusses different architectures for embedding computational blocks such as multipliers and adders within field-programmable gate arrays (FPGAs). It provides examples of multiplier blocks from FPGA manufacturers including Xilinx, Altera, and QuickLogic. It also demonstrates how these embedded blocks can be instantiated and used to implement signal processing functions such as filters.

Uploaded by

Muhammad Malik
Basic Building Blocks

Lecture 5

Dr. Shoab A. Khan

New Trends in FPGA Architecture

Embedded Units in FPGAs
Embedding in FPGAs

 FPGA with PowerPC, MicroBlaze, Ethernet MAC and other embedded interfaces

[Figure: FPGA-based system with a PowerPC and a MicroBlaze on the PLB and OPB buses (with a PLB arbiter), a memory controller driving external address, data and control lines to flash memory (holding the FPGA bit-stream and the CPU program), an SDRAM controller with SDRAM, an Ethernet MAC with RJ45, a UART and GPIO.]
Digital Design of Signal Processing Systems, John Wiley & Sons, by Dr. Shoab A. Khan
Embedded Arithmetic Units in FPGAs
(a) DSP48 in Virtex™-4 FPGA (derived from Xilinx documentation)

[Figure: two cascaded DSP48 slices. Each slice multiplies the 18-bit inputs A and B, accepts a 48-bit input C, and feeds X and Y multiplexers into a 48-bit adder/subtractor (with CIN and SUBTRACT controls) that produces the 48-bit output P. Slices cascade through BCIN/BCOUT (18 bits) and PCIN/PCOUT (48 bits), with a wired right shift by 17 bits on the PCIN path.]
(b) 18x18 multiplier and adder in Altera FPGA

[Figure: four 18-bit data buffers (DATA-1 to DATA-4) and four 18-bit coefficient buffers (COEFFICIENT-1 to COEFFICIENT-4) feed four multipliers with 36-bit products. The products are added pairwise by two adders (37-bit results) and then combined in a summation unit followed by an add-accumulate circuit.]
8x8 multiplier and 16-bit adder in QuickLogic FPGA

[Figure: 16-bit ABUS and BBUS feed an ADD block (16-bit operands), a REGISTER block and a MULTIPLY block (8-bit operands); a 3-bit CBUS carries the instruction and the 17-bit RBUS carries the result. INSTRUCTION, FIFOs and RAM blocks are also shown.]
18x18 multiplier in Virtex™-II, Virtex™-II Pro and Spartan™-3 FPGA

[Figure: an 18x18-bit multiplier block with 18-bit inputs A and B and a 36-bit output.]
Instantiation of Embedded Blocks

 ISE-provided template

// MULT18X18: 18 x 18 signed asynchronous multiplier
// Virtex-II/II-Pro, Spartan-3
// Xilinx HDL Language Template, version 9.1i
MULT18X18 MULT18X18_inst (
    .P(P), // 36-bit multiplier output
    .A(A), // 18-bit multiplier input
    .B(B)  // 18-bit multiplier input
);
// End of MULT18X18_inst instantiation
Embedded Multipliers

 Automatic instantiation

$w[n] = a_1 w[n-1] + a_2 w[n-2] + x[n]$
$y[n] = b_0 w[n] + b_1 w[n-1] + b_2 w[n-2]$
(a) Block diagram of a 2nd order IIR filter in Direct Form II realization

[Figure: the 16-bit input x[n] is added to the feedback terms to form w[n], which a quantizer Q reduces to 16 bits. Delays hold w[n-1] and w[n-2]. The feedback multipliers a1, a2 and the feed-forward multipliers b0, b1, b2 take 16-bit operands and give 32-bit products, which 32-bit adders combine into the 32-bit output y[n].]
(b) RTL schematic generated by Xilinx's Integrated Software Environment (ISE)
(c) Synthesis summary of the design on a Spartan™-3 FPGA; the five multiplications are mapped on embedded MULT18X18 multipliers

Selected Device: 3s400pq208-5
Minimum period: 10.917 ns (Maximum Frequency: 91.597 MHz)
Number of Slices: 58 out of 3584 (1%)
Number of Slice Flip Flops: 32 out of 7168 (0%)
Number of 4 input LUTs: 109 out of 7168 (1%)
Number of IOs: 50
Number of bonded IOBs: 50 out of 141 (35%)
Number of MULT18X18s: 5 out of 16 (31%)
Number of GCLKs: 1 out of 8 (12%)
(d) RTL schematic generated by Xilinx ISE for a Virtex™-4 target device. The multiplication and addition operations are mapped on DSP48 multiply accumulate (MAC) embedded blocks.
module iir(xn, clk, rst, yn);

// x[n] is in Q1.15 format
input signed [15:0] xn;
input clk, rst;

// y[n] is in Q2.30 format
output signed [31:0] yn;

// Full precision w[n] in Q2.30 format
wire signed [31:0] wfn;

// Quantized w[n] in Q1.15 format
wire signed [15:0] wn;

// w[n-1] and w[n-2] in Q1.15 format
reg signed [15:0] wn_1, wn_2;

// all the coefficients are in Q1.15 format
wire signed [15:0] b0 = 16'ha7b0;
wire signed [15:0] b1 = 16'hf2b2;
wire signed [15:0] b2 = 16'h7610;
wire signed [15:0] a1 = 16'h5720;
wire signed [15:0] a2 = 16'h1270;

// w[n] in Q2.30 format with one redundant sign bit
assign wfn = wn_1*a1 + wn_2*a2;

/* throw away the redundant sign bit, keep the
   16 MSBs and add x[n] to get w[n] in Q1.15 format */
assign wn = wfn[30:15] + xn;

// computing y[n] in Q2.30 format with one redundant sign bit
assign yn = b0*wn + b1*wn_1 + b2*wn_2;

always @(posedge clk or posedge rst)
begin
    if (rst)
    begin
        wn_1 <= 0;
        wn_2 <= 0;
    end
    else
    begin
        wn_1 <= wn;
        wn_2 <= wn_1;
    end
end

endmodule
An 8-tap Direct Form (DF)-I FIR filter

[Figure: the 16-bit input data_in, clocked by clk, enters a delay line holding x[0] ... x[7]. Each tap is multiplied by its coefficient b0 ... b7, the eight products are summed by a chain of seven adders into a 40-bit result y[n], and a quantizer Q reduces it to the 16-bit output data_out.]
Example of Optimized Mapping

// -------------------------------------------------------------
// Module: fir_filter
// -------------------------------------------------------------
// Discrete-Time FIR Filter
// -------------------------------
// Filter Structure : Direct-Form FIR
// Filter Order     : 7
// Input Format     : Q1.15
// Output Format    : Q1.15
// -------------------------------------------------------------

module fir_filter (
    input clk,
    input signed [15:0] data_in,        // Q1.15
    output reg signed [15:0] data_out   // Q1.15
);

// Constants, filter is designed using Matlab FDATool, all coeffs are in Q1.15 format
parameter signed [15:0] b0 = 16'b1101110110111011;
parameter signed [15:0] b1 = 16'b1110101010001110;
parameter signed [15:0] b2 = 16'b0011001111011011;
parameter signed [15:0] b3 = 16'b0110100000001000;
parameter signed [15:0] b4 = 16'b0110100000001000;
parameter signed [15:0] b5 = 16'b0011001111011011;
parameter signed [15:0] b6 = 16'b1110101010001110;
parameter signed [15:0] b7 = 16'b1101110110111011;

reg signed [15:0] xn [0:7]; // input sample delay line
wire signed [39:0] yn;      // Q10.30 (sum of eight Q2.30 products)

// Block Statements
always @(posedge clk)
begin
    xn[0] <= data_in;
    xn[1] <= xn[0];
    xn[2] <= xn[1];
    xn[3] <= xn[2];
    xn[4] <= xn[3];
    xn[5] <= xn[4];
    xn[6] <= xn[5];
    xn[7] <= xn[6];
    data_out <= yn[30:15]; // bring the output back in Q1.15 format
end

assign yn = xn[0] * b0 + xn[1] * b1 + xn[2] * b2 +
            xn[3] * b3 + xn[4] * b4 + xn[5] * b5 +
            xn[6] * b6 + xn[7] * b7;

endmodule // fir_filter
Synthesis reports: (a) Eight 18x18-bit embedded multipliers and seven adders from generic logic blocks are used on a Spartan™-3 family FPGA

Selected device: 3s200pq208-5
Minimum period: 23.290 ns (Maximum frequency: 42.936 MHz)
Number of Slices: 185 out of 1920 (9%)
Number of Slice Flip Flops: 144 out of 3840 (3%)
Number of 4 input LUTs: 217 out of 3840 (5%)
Number of IOs: 33
Number of bonded IOBs: 33 out of 141 (23%)
Number of MULT18X18s: 8 out of 12 (66%)
Number of GCLKs: 1 out of 8 (12%)
(b) Eight DSP48 embedded blocks are used once the design is mapped on a Virtex™-4 family FPGA

Selected device: 4vlx15sf363-12
Minimum period: 16.958 ns (Maximum frequency: 58.969 MHz)
Number of Slices: 9 out of 6144 (0%)
Number of Slice Flip Flops: 16 out of 12288 (0%)
Number of IOs: 33
Number of bonded IOBs: 33 out of 240 (13%)
Number of GCLKs: 1 out of 32 (3%)
Number of DSP48s: 8 out of 32 (25%)
Optimized Mapping

[Figure: pipelined FIR structure. The taps x[n], x[n-2], x[n-4], ..., x[n-14] are multiplied by b0 ... b7 and accumulated through a chain of eight adders, starting from 0, to give y[n].]
Optimized Mapping: synthesis report

Selected Device: 4vlx15sf363-12
Minimum period: 1.891 ns (Maximum Frequency: 528.821 MHz)
Number of Slices: 9 out of 6144 (0%)
Number of Slice Flip Flops: 16 out of 12288 (0%)
Number of IOs: 33
Number of bonded IOBs: 33 out of 240 (13%)
Number of GCLKs: 1 out of 32 (3%)
Number of DSP48s: 8 out of 32 (25%)
module fir_filter_pipeline (
    input clk,
    input signed [15:0] data_in,        // Q1.15
    output reg signed [15:0] data_out   // Q1.15
);

// Constants, filter is designed using Matlab FDATool, all coeffs are in Q1.15 format
parameter signed [15:0] b0 = 16'b1101110110111011;
parameter signed [15:0] b1 = 16'b1110101010001110;
parameter signed [15:0] b2 = 16'b0011001111011011;
parameter signed [15:0] b3 = 16'b0110100000001000;
parameter signed [15:0] b4 = 16'b0110100000001000;
parameter signed [15:0] b5 = 16'b0011001111011011;
parameter signed [15:0] b6 = 16'b1110101010001110;
parameter signed [15:0] b7 = 16'b1101110110111011;

reg signed [15:0] xn [0:14];   // one stage pipelined input sample delay line (xn[14] is read below)
reg signed [31:0] prod [0:7];  // pipeline product registers in Q2.30 format
wire signed [39:0] yn;         // Q10.30
reg signed [39:0] mac [0:7];   // pipelined mac registers in Q10.30 format
integer i, j;

always @(posedge clk)
begin
    xn[0] <= data_in;
    for (i=0; i<14; i=i+1)
        xn[i+1] <= xn[i];
    data_out <= yn[30:15]; // bring the output back in Q1.15 format
end

always @(posedge clk)
begin
    prod[0] <= xn[0] * b0;
    prod[1] <= xn[2] * b1;
    prod[2] <= xn[4] * b2;
    prod[3] <= xn[6] * b3;
    prod[4] <= xn[8] * b4;
    prod[5] <= xn[10] * b5;
    prod[6] <= xn[12] * b6;
    prod[7] <= xn[14] * b7;
end

always @(posedge clk)
begin
    mac[0] <= prod[0];
    for (j=0; j<7; j=j+1)
        mac[j+1] <= mac[j] + prod[j+1];
end

// the slide text is damaged here; the output is assumed to be taken from the last mac register
assign yn = mac[7];

endmodule // fir_filter_pipeline
Carry Chain Logic in FPGAs

$c_{i+1} = g_i + p_i c_i$
$p_i = a_i \oplus b_i$
$g_i = a_i b_i$
Fast Carry Logic in Virtex™-II Pro FPGA Slice

[Figure: datapath of one half-slice. The LUT (which can also act as dual-port RAM, shift register or ROM) forms A + B, and the dedicated carry logic (MUXCY and XORG, with the ORCY and MULTAND gates) produces Y = A + B + Cin and passes Cout up the chain. One half-slice implements a 1-bit adder. The flip-flop/latch registers the output, with CE, CLK and SR shared between the X and Y registers.]
Fast Carry Logic

[Figure: a CLB with four slices S0 to S3, each containing two LUT, MUXCY and FF stages. Two carry chains, labelled fast carry chain and secondary carry chain, run from CIN to COUT through pairs of slices and continue to S0 of the next CLB and to the CIN of S2 of the next CLB.]
Fast Carry Chain

1 CLB = 4 slices = two 4-bit adders

[Figure: a 64-bit adder built from a column of CLBs. CLB 0 adds A[3:0] and B[3:0] to give Y[3:0], CLB 1 gives Y[7:4], CLB 2 gives Y[11:8], and so on up to CLB 15, which gives Y[63:60] and the carry-out Y[64].]

CLBs must be in the same column
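As a minimal sketch of how such a carry chain is usually reached from HDL, the 64-bit adder in the figure can be described with the `+` operator; synthesis tools for these FPGA families generally map this operator onto the dedicated carry logic, although the exact mapping depends on the tool and device. The module name `adder64` and its port names are illustrative assumptions, not taken from the slides.

```verilog
// Hypothetical 64-bit adder; the tool is expected (not guaranteed)
// to place it on a fast carry chain in one column of CLBs.
module adder64 (
    input  [63:0] a, b,
    input         cin,
    output [63:0] y,
    output        cout    // corresponds to Y[64] in the figure
);
    // 65-bit result: carry-out concatenated with the 64-bit sum
    assign {cout, y} = a + b + cin;
endmodule
```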
Parallel multiplier architecture
Designing Customized Multipliers

Adders

 Used in addition, subtraction, multiplication and division

 Speed of a signal processing or communication system ASIC depends heavily on these functional units
Half Adder using Data Flow modeling

module HALF_ADDER(ai, bi, si, cout);

input ai, bi;
output si, cout;

// data flow modeling
assign {cout,si} = ai + bi;

endmodule
Half Adder

[Figure: half adder block with inputs a_i, b_i and outputs s_i and c_out.]

$c_{out} = a_i b_i$
$s_i = a_i \oplus b_i$
Full Adder

Truth Table

x y z | C S
0 0 0 | 0 0
0 0 1 | 0 1
0 1 0 | 0 1
0 1 1 | 1 0
1 0 0 | 0 1
1 0 1 | 1 0
1 1 0 | 1 0
1 1 1 | 1 1
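Full Adder

[Figure: full adder built from two half adders. The first half adder adds a_i and b_i, giving SiHA1 and cOutHA1; the second adds SiHA1 and Cin, giving Si and cOutHA2; cOut is the OR of cOutHA1 and cOutHA2.]

$c_{i+1} = (a_i \oplus b_i) c_i + a_i b_i$
$s_i = a_i \oplus b_i \oplus c_i$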
Gate-level design options for a full adder

[Figure: four gate-level realizations (a) to (d) of a full adder with inputs a_i, b_i, c_i and outputs s_i, c_{i+1}. Among them are a design built from two half adders (HA), designs that form p_i and g_i explicitly, and a multiplexer-based design.]
Full Adder: Implementation in Verilog

module FULL_ADDER(ai,bi,cin,si,cout);

input ai,bi;
input cin;
output si,cout;
wire SiHA1,CoutHA1,CoutHA2;

HALF_ADDER HA1(ai,bi,SiHA1,CoutHA1);   // instance HA1
HALF_ADDER HA2(SiHA1,cin,si,CoutHA2);  // instance HA2
or (cout,CoutHA1,CoutHA2);             // using or gate primitive

endmodule
Full Adder Using Data Flow Modeling

module FULL_ADDER(ai,bi,cin,si,cout);

input ai,bi;
input cin;
output si,cout;

// through data flow level of abstraction
assign {cout,si} = ai + bi + cin;

endmodule
Ripple Carry Adder

[Figure: six full adders (FA) in a chain. Stage i adds a[i], b[i] and the carry c[i] (the first stage takes cin) to produce s[i] and c[i+1]; c[6] is cout, also marked as overflow.]
module ripple_carry_adder #(parameter W=16)
(input clk,
 input [W-1:0] a, b,
 input cin,
 output reg [W-1:0] s_r,
 output reg cout_r);

wire [W-1:0] s;
wire cout;
reg [W-1:0] a_r, b_r;
reg cin_r;

assign {cout,s} = a_r + b_r + cin_r;

always@(posedge clk)
begin
    a_r <= a;
    b_r <= b;
    cin_r <= cin;
    s_r <= s;
    cout_r <= cout;
end

endmodule
RCA: Dataflow modeling

Six bit ripple carry adder through data flow modeling

// SIX BIT FULL ADDER
module fulladder_6bit(s,cout,a,b,cin);

output cout;
output [5:0] s;
input [5:0] a,b;
input cin;
reg [5:0] s,c;
reg cout;
integer i;

always@(a or b or cin)
begin
    {c[0],s[0]} = a[0] + b[0] + cin;
    for(i=1; i<6; i=i+1)
        {c[i],s[i]} = a[i] + b[i] + c[i-1]; // through data flow
    cout = c[5]; // the slide is cut off after the loop; the carry-out is assumed to be the last carry
end

endmodule
Important Observation

 Do we have to wait for the carry to show up to begin doing useful work?
 We do have to know the carry to get the right answer.
 But it can only take on two values, so both results can be computed in advance and the right one selected once the carry arrives.
Non-uniform Group 12-Bit Carry Select Adder

 Three partitions of 3 bits, 4 bits and 5 bits are made
 The cout of the first (3-bit) block is ready earlier than that of a 4-bit block, so the select signal for the next group arrives sooner
 So the non-uniform group carry select adder is faster than the uniform group 12-bit carry select adder
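The following is a minimal sketch of the 3-4-5 partition described above, not the author's implementation. Each group after the first computes its sum for both possible carry-ins and a multiplexer picks the right one. The module name `carry_select_12` and all internal signal names are assumptions made for illustration.

```verilog
// Hypothetical 12-bit non-uniform carry select adder (groups of 3, 4 and 5 bits)
module carry_select_12 (
    input  [11:0] a, b,
    input         cin,
    output [11:0] s,
    output        cout
);
    // Group 0 (bits 2:0): the real carry-in is known, so one adder suffices
    wire c3;
    assign {c3, s[2:0]} = a[2:0] + b[2:0] + cin;

    // Group 1 (bits 6:3): both outcomes are precomputed, c3 selects one
    wire [4:0] g1_c0 = a[6:3] + b[6:3];          // assumes carry-in = 0
    wire [4:0] g1_c1 = a[6:3] + b[6:3] + 1'b1;   // assumes carry-in = 1
    wire c7;
    assign {c7, s[6:3]} = c3 ? g1_c1 : g1_c0;

    // Group 2 (bits 11:7): same idea, selected by c7
    wire [5:0] g2_c0 = a[11:7] + b[11:7];
    wire [5:0] g2_c1 = a[11:7] + b[11:7] + 1'b1;
    assign {cout, s[11:7]} = c7 ? g2_c1 : g2_c0;
endmodule
```

The groups grow in width because a later group has more time before its select signal arrives, which is the reason the slides give for preferring the non-uniform partition.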
Carry Generate and Propagate Logic

$g_i = a_i b_i$
$p_i = a_i \oplus b_i$
$c_{i+1} = g_i + p_i c_i$
$s_i = c_i \oplus p_i$
Group Carry and Group Propagate

$c_1 = g_0 + p_0 c_0$
$c_2 = g_1 + p_1 c_1 = g_1 + p_1 g_0 + p_1 p_0 c_0$
$c_3 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0$
$c_4 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_0$

Let $G_0 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$ and $P_0 = p_3 p_2 p_1 p_0$; then we can write

$c_4 = G_0 + P_0 c_0$
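The carry equations above translate almost line by line into Verilog. The sketch below is one possible 4-bit carry look-ahead block; the module name `cla4` and the port names are assumptions, while `G0` and `P0` follow the group generate and group propagate signals defined above.

```verilog
// Hypothetical 4-bit carry look-ahead adder block
module cla4 (
    input  [3:0] a, b,
    input        c0,
    output [3:0] s,
    output       c4,
    output       G0, P0    // group generate and group propagate
);
    wire [3:0] g = a & b;   // g_i = a_i b_i
    wire [3:0] p = a ^ b;   // p_i = a_i xor b_i

    // every carry is a two-level function of g, p and c0
    wire c1 = g[0] | (p[0] & c0);
    wire c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
    wire c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                   | (p[2] & p[1] & p[0] & c0);

    assign G0 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                     | (p[3] & p[2] & p[1] & g[0]);
    assign P0 = &p;                    // p3 p2 p1 p0
    assign c4 = G0 | (P0 & c0);

    assign s = p ^ {c3, c2, c1, c0};   // s_i = c_i xor p_i
endmodule
```

Exporting `G0` and `P0` is what lets a second level of look-ahead logic combine four such blocks into a 16-bit adder.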
CLA logic for computing carries in two-gate delay time

[Figure: (a) a 1-bit cell producing p_i, g_i and s_i from a_i, b_i and c_i; (b) two-level AND-OR logic computing c1, c2, c3 and c4 directly from g0 ... g3, p0 ... p3 and c0.]
A 16-bit carry look-ahead adder using two levels of CLA logic

[Figure: sixteen 1-bit adders (inputs a_i, b_i and carry C_i; outputs s_i, p_i, g_i) arranged in four groups of four. The first-level blocks CLA00 to CLA03 compute the carries within each group and the group signals (P0, G0) to (P3, G3), which feed the second-level block CLA10.]
A 64-bit carry look-ahead adder using three levels of CLA logic

[Figure: four 16-bit carry look-ahead adders, each built from 4-bit carry-lookahead adders. First-level group signals feed the second-level blocks (CLA10 and its peers), whose outputs (P10, G10) onward feed the third-level block CLA20 with outputs (P20, G20). The carries c0, c4, c8, c12, c16, c32 and c48 are marked.]
A 12-bit Hybrid Ripple Carry and Carry Look-ahead Adder

[Figure: twelve adder cells (FA and RFA) with inputs a_i, b_i, outputs s_i, g_i, p_i and carries c1 ... c12, arranged in three 4-bit groups, each served by a lookahead logic block.]
Binary Carry Look-ahead Adder (BCLA)

$g_i = a_i b_i$ and $p_i = a_i \oplus b_i$

$(G_i, P_i) = (g_i, p_i) \cdot (g_{i-1}, p_{i-1}) \cdots (g_0, p_0)$

And the problem can be recursively solved as (Eq1):

$(G_0, P_0) = (g_0, p_0)$
for i = 1 to N-1
    $(G_i, P_i) = (g_i, p_i) \cdot (G_{i-1}, P_{i-1})$
    $c_{i+1} = G_i + P_i c_0$
end

Where the dot operator is given as:

$(G_i, P_i) = (g_i + p_i G_{i-1},\; p_i P_{i-1})$
Binary carry look-ahead adder: serial implementation

[Figure: prefix graph over bit positions 15 down to 0 in which the dot operators are chained serially, producing S15 ... S0.]
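module BinaryCarryLookaheadAdder
#(parameter N = 16)
(input [N-1:0] a, b,
 input c_in,
 output reg [N-1:0] sum,
 output reg c_out);

reg [N-1:0] p, g, P, G;
reg [N:0] c;
integer i;

always@(*)
begin
    for (i=0; i<N; i=i+1)
    begin
        // generate all ps and gs
        p[i] = a[i] ^ b[i];
        g[i] = a[i] & b[i];
    end

    // The slide ends after the loop above. The rest of this block is a
    // completion that follows the serial recursion (Eq1); it is an
    // assumption, not text recovered from the slides.
    P[0] = p[0];
    G[0] = g[0];
    for (i=1; i<N; i=i+1)
    begin
        P[i] = p[i] & P[i-1];
        G[i] = g[i] | (p[i] & G[i-1]);
    end

    c[0] = c_in;
    for (i=0; i<N; i=i+1)
    begin
        c[i+1] = G[i] | (P[i] & c[0]);
        sum[i] = p[i] ^ c[i];
    end
    c_out = c[N];
end

endmodule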
Brent–Kung adder

[Figure: Brent–Kung prefix graph over bit positions 15 down to 0, drawn level by level, producing S15 ... S0.]
Ladner–Fischer parallel prefix adder

[Figure: Ladner–Fischer prefix graph over bit positions 15 down to 0, drawn in Stages 1 to 4, producing S15 ... S0.]
Kogge–Stone parallel prefix adder

[Figure: Kogge–Stone prefix graph over bit positions 15 down to 0, drawn in Stages 1 to 4, producing S15 ... S0.]
Han–Carlson parallel prefix adder

[Figure: Han–Carlson prefix graph over bit positions 15 down to 0, drawn in five stage rows, producing S15 ... S0.]
Regular layout of an 8-bit Brent-Kung Adder

[Figure: the inputs (g0,p0) ... (g7,p7) are combined by a regular array of dot-operator cells that produces the carries c1 ... c7.]
Carry Skip Adder

 If any group generates a carry, it passes it to the next group

 If the group does not generate its own carry and every bit in the group propagates, the group simply bypasses the carry from the previous block to the next block

For a group of k bits starting at bit i, the group propagate is

$P_i = p_i p_{i+1} p_{i+2} \cdots p_{i+k-1}$

$p_i = a_i \oplus b_i$
A 16-bit equal-group carry skip adder

[Figure: four 4-bit RCAs add a[3:0]+b[3:0], a[7:4]+b[7:4], a[11:8]+b[11:8] and a[15:12]+b[15:12]. The carries C4, C8, C12 and c_out (OV) pass through skip multiplexers controlled by the group propagate signals P[3:0], P[7:4], P[11:8] and P[15:12]; c_in enters the first group.]
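A minimal sketch of the 16-bit equal-group structure described above is given here under stated assumptions: each 4-bit group is written with the `+` operator instead of an explicit ripple carry adder, and the module name `carry_skip_16` and internal names are illustrative.

```verilog
// Hypothetical 16-bit carry skip adder with four equal 4-bit groups
module carry_skip_16 (
    input  [15:0] a, b,
    input         c_in,
    output [15:0] s,
    output        c_out
);
    wire [15:0] p = a ^ b;   // bit propagate p_i = a_i xor b_i
    wire [4:0]  c;           // c[k] is the carry into group k
    assign c[0] = c_in;

    genvar k;
    generate
        for (k = 0; k < 4; k = k + 1) begin : grp
            wire rca_cout;                       // carry out of the group adder
            wire P = &p[4*k+3 : 4*k];            // group propagate
            assign {rca_cout, s[4*k+3 : 4*k]} =
                   a[4*k+3 : 4*k] + b[4*k+3 : 4*k] + c[k];
            // skip multiplexer: bypass the group when every bit propagates
            assign c[k+1] = P ? c[k] : rca_cout;
        end
    endgenerate

    assign c_out = c[4];
endmodule
```

The skip multiplexer does not change the arithmetic result; it only shortens the carry path, so a synthesis tool may simplify it unless the structure is preserved by constraints.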
Conditional Sum Adder

 The process that led to the two-level carry select adder can be continued . . .
 A logarithmic time conditional-sum adder results if we proceed to the extreme:
 single bit adders at the top
 A conditional-sum adder is actually a (log2 k)-level carry-select adder
 Implemented in multiple levels
 Built using Conditional Cells (CC) and MUX(s)
Principle

 The conditional cell generates a pair of sum and carry bits at each bit position i (s0_i, c0_i, s1_i, c1_i)
 One pair assumes a carry_in of one (s1_i, c1_i) and the other assumes a carry_in of zero (s0_i, c0_i)
 The correct sums and carries are then selected using a tree of multiplexers
 All level one bits are paired up
 The sum and carry of the next bit position, brought down to level 2, are selected by the least significant carry
 This continues until all the sums and carries are resolved in the last level
Example

$s0_i = a_i \oplus b_i$
$s1_i = \sim(a_i \oplus b_i)$
$c0_i = a_i b_i$
$c1_i = a_i + b_i$
Conditional Cell (CC)

[Figure: a conditional cell with inputs a_i and b_i and outputs s0_i, c0_i, s1_i and c1_i.]
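As an illustration of the conditional cell equations and of one level of selection, the sketch below builds a 2-bit conditional sum adder. The module names `cond_cell` and `cond_sum_2bit`, and the group signals `S0`, `C0`, `S1`, `C1`, are assumptions chosen for this example.

```verilog
// Hypothetical conditional cell: sum and carry for both possible carry-ins
module cond_cell (input a, b, output s0, c0, s1, c1);
    assign s0 = a ^ b;       // sum assuming carry-in = 0
    assign c0 = a & b;       // carry assuming carry-in = 0
    assign s1 = ~(a ^ b);    // sum assuming carry-in = 1
    assign c1 = a | b;       // carry assuming carry-in = 1
endmodule

// Hypothetical 2-bit conditional sum adder built from two cells
module cond_sum_2bit (
    input  [1:0] a, b,
    input        cin,
    output [1:0] s,
    output       cout
);
    wire [1:0] s0, c0, s1, c1;
    cond_cell cc0 (a[0], b[0], s0[0], c0[0], s1[0], c1[0]);
    cond_cell cc1 (a[1], b[1], s0[1], c0[1], s1[1], c1[1]);

    // Level 2: pair bits 0 and 1. For each assumed group carry-in,
    // the conditional carry of bit 0 selects the result of bit 1.
    wire [1:0] S0 = {c0[0] ? s1[1] : s0[1], s0[0]};
    wire       C0 =  c0[0] ? c1[1] : c0[1];
    wire [1:0] S1 = {c1[0] ? s1[1] : s0[1], s1[0]};
    wire       C1 =  c1[0] ? c1[1] : c0[1];

    // Final selection by the actual carry-in
    assign {cout, s} = cin ? {C1, S1} : {C0, S0};
endmodule
```

Wider adders repeat the pairing step, doubling the group width at each level, which gives the logarithmic depth mentioned in the slides.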
Addition of three bit numbers using a conditional sum adder (here we are assuming actual cin = 0)

a_i  = 1 1 1
b_i  = 1 0 1
s0_i = 0 1 0   (cin = 0)
c0_i = 1 0 1   (cin = 0)
s1_i = 1 0 1   (cin = 1)
c1_i = 1 1 1   (cin = 1)

[Figure: the multiplexer levels then select the final sum 1 0 0 with carry-out 1.]
Example: Conditional Sum Adder

Bit positions 15 down to 0:

a_i = 1 0 0 1  1 0 0 1  1 1 0 0  1 1 0 1
b_i = 0 0 1 1  0 1 1 0  1 1 0 1  0 1 1 0

Group width 1:
carry-in 0:  s0_i = 1 0 1 0  1 1 1 1  0 0 0 1  1 0 1 1
             c0_i = 0 0 0 1  0 0 0 0  1 1 0 0  0 1 0 0
carry-in 1:  s1_i = 0 1 0 1  0 0 0 0  1 1 1 0  0 1 0     (bits 15 to 1)
             c1_i = 1 0 1 1  1 1 1 1  1 1 0 1  1 1 1     (bits 15 to 1)

[Table: the remaining rows give the group sums and block carry-outs for wider groups (2, 4, ...), each selected by the carry of the less significant group.]
A 16-bit Conditional Sum Adder

[Figure: sixteen conditional cells CC0 to CC15 take a_i and b_i and produce S0_i, C0_i, S1_i and C1_i. Successive levels of multiplexers of increasing width (2, 3, 5, ...) merge neighbouring groups, each selected by the carry of the less significant group, until the final sums S0 ... S15 and the carry C15 are resolved.]
Hybrid Adder Design

 Hybrids are obtained by combining elements of:
 Ripple-carry adders
 Carry-lookahead (generate-propagate) adders
 Carry-skip adders
 Carry-select adders
 Conditional-sum adders
 You can obtain adders with
 higher performance
 greater cost-effectiveness
 lower power consumption
A 16-bit uniform-groups carry select adder

[Figure: the operands are split into four 4-bit groups. The first group is added by one 4-bit ripple carry adder with Cin; each remaining group is added by two 4-bit ripple carry adders, one with carry-in 0 and one with carry-in 1. 4-bit 2-to-1 multiplexers controlled by C[4], C[8] and C[12] select S[7:4], S[11:8] and S[15:12], and Cout.]
Hierarchical CSA

[Figure: an N-bit carry select adder built from N/4-bit adders. The upper N/4-bit slices of a and b are each added twice, with carry-in 0 and 1. A hierarchy of 2-to-1 multiplexers controlled by C_{N/4} and C_{N/2} selects the N/4+1 and N/2+1 bit results, giving s[N-1:N/2], s[N/2-1:N/4], s[N/4-1:0] and Cout.]
Barrel Shifter

(a) Design of a logic shifter for an 8-bit operand (b) Design of a logic and arithmetic shifter for an 8-bit signed operand

[Figure: in both designs a 16-to-1 multiplexer with a 4-bit select s produces the 8-bit output y. Input 0 is x. Inputs 1 to 7 are the right shifts of x by 1 to 7: { x[7], x[7:1] }, { 2{x[7]}, x[7:2] }, ..., { 7{x[7]}, x[7] } for the arithmetic case, with the L/A control in design (b) choosing between x[7] and 0 as the fill bit. Input 8 is 8'b0. Inputs 9 to 15 are the left shifts { x[0], 7'b0 }, { x[1:0], 6'b0 }, ..., { x[6:0], 1'b0 }.]
Design of a Barrel Shifter performing shifts in multiple stages (a) Single cycle design

[Figure: the 16-bit operand x[15:0] is first extended to 31 bits, either as { 15{sgn}, x[15:0] }, where the L/A control chooses {15{x[15]}} or 15'b0 as the extension, or as { x[14:0], 16'b0 }. Four multiplexer stages then optionally shift right by 8, 4, 2 and 1, narrowing the data from 31 to 23, 19, 17 and finally 16 bits (selecting [30:8] or [22:0], [22:4] or [18:0], [18:2] or [16:0], and [16:1] or [15:0]). The select bits are s4 to s0 and the result is y.]
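The staged right-shift path in the figure can be sketched as below. This is a simplified illustration that covers only right shifts by 0 to 15 with a logical or arithmetic fill; the module name `barrel_shift_right16`, the `arith` control and the 4-bit shift amount `s` are assumptions, and the left-shift input of the figure is omitted.

```verilog
// Hypothetical 16-bit right barrel shifter built from four mux stages
module barrel_shift_right16 (
    input  [15:0] x,
    input  [3:0]  s,       // shift amount 0..15
    input         arith,   // 1 = arithmetic (sign fill), 0 = logical (zero fill)
    output [15:0] y
);
    wire        fill = arith & x[15];
    wire [30:0] y0 = {{15{fill}}, x};                 // 31-bit extended operand
    wire [22:0] y1 = s[3] ? y0[30:8] : y0[22:0];      // shift by 8 or pass
    wire [18:0] y2 = s[2] ? y1[22:4] : y1[18:0];      // shift by 4 or pass
    wire [16:0] y3 = s[1] ? y2[18:2] : y2[16:0];      // shift by 2 or pass
    assign      y  = s[0] ? y3[16:1] : y3[15:0];      // shift by 1 or pass
endmodule
```

Each stage keeps only the bits that a later stage can still need, which is why the widths shrink from 31 to 23, 19, 17 and 16 bits. Registering `y1`, `y2` and `y3`, together with the remaining select bits, would give a pipelined variant.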
Design of a Barrel Shifter performing shifts in multiple stages (b) Pipelined design

[Figure: the same four-stage structure as the single cycle design, with pipeline registers y0_reg, y1_reg, y2_reg and y3_reg between the stages. The select bits are delayed alongside the data in s_reg, sp_reg, spp_reg and sppp_reg, each passing one bit fewer to the next stage.]
Carry Save Adders and Compressors
Carry Save Addition saves the carry at next bit location

 The CSA does not ripple any carry
 It has a delay of one FA
 The concept of CSA is effective in designing partial products compression/reduction logic

Example: a 3:2 compressor takes three N-bit operands a0, a1, a2 and produces an N-bit sum word s and a carry word c (to be weighted one bit position higher)

a0 = 0 0 1 0 1 1
a1 = 0 1 0 1 0 1
a2 = 1 1 1 1 0 1
s  = 1 0 0 0 1 1
c  = 0 1 1 1 0 1
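A carry save adder is simply a row of independent full adders, which the sketch below expresses with bitwise operators. The module names `csa_3to2` and `csa_3to2_tb` are assumptions; the small testbench applies the operand values 001011, 010101 and 111101 and checks that s plus the carry word shifted left by one equals the sum of the three inputs.

```verilog
// Hypothetical N-bit 3:2 compressor (carry save adder)
module csa_3to2 #(parameter N = 6) (
    input  [N-1:0] a0, a1, a2,
    output [N-1:0] s,    // bitwise sum
    output [N-1:0] c     // bitwise carry, to be weighted by 2
);
    assign s = a0 ^ a1 ^ a2;
    assign c = (a0 & a1) | (a0 & a2) | (a1 & a2);
endmodule

// Hypothetical self-checking testbench
module csa_3to2_tb;
    reg  [5:0] a0, a1, a2;
    wire [5:0] s, c;

    csa_3to2 #(.N(6)) dut (.a0(a0), .a1(a1), .a2(a2), .s(s), .c(c));

    initial begin
        a0 = 6'b001011; a1 = 6'b010101; a2 = 6'b111101;
        #1;
        // s + 2c must equal a0 + a1 + a2 (evaluated in 8 bits)
        if (({2'b00, s} + {1'b0, c, 1'b0}) === ({2'b00, a0} + a1 + a2))
            $display("PASS: s=%b c=%b", s, c);
        else
            $display("FAIL: s=%b c=%b", s, c);
        $finish;
    end
endmodule
```

No carry travels between bit positions, so the delay is that of one full adder regardless of N.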
Dot Notation

 Dot notation facilitates description of different reduction schemes
 Dots are used to represent each bit of the partial product

[Figure: the multiplicand bits b[3] b[2] b[1] b[0] and the multiplier bit a[0] produce the partial product pp[0], drawn as a row of dots.]
Parallel multiplier architecture
Designing Customized Multipliers

Three components of a multiplier

 N-bit input operands
 Partial Product Array Generation = N shifted binary numbers
 Partial Product Array Reduction = reduction to 2 binary numbers
 Final addition = 2N-bit final product
Three components of a multiplier

[Figure: the N-bit multiplicand and N-bit multiplier enter PP Generation, followed by PP Reduction and a CPA that produces the 2N-bit product.]
Partial Product Generation for a 6x6 Multiplier

[Figure: the multiplicand b5 ... b0 and the multiplier a5 ... a0 are combined to form the array of partial product bits PPij.]
Partial Product Generation Verilog Code

module multiplier (
input [5:0] a,b,
output [11:0] prod);

integer i;
reg [5:0] pp [0:5]; // 6 partial products

always@*
begin
    for(i=0; i<6; i=i+1)
    begin
        pp[i] = b & {6{a[i]}};
    end
end

// the slide is cut off after the pp[2] term; the remaining terms follow the same pattern
assign prod = pp[0]+{pp[1],1'b0}+{pp[2],2'b0}+
              {pp[3],3'b0}+{pp[4],4'b0}+{pp[5],5'b0};

endmodule
Reducing number of dots in a column

[Figure: dots at level n map to dots at level (n+1). A full adder (FA) turns three dots into a sum dot and a carry dot, a half adder (HA) turns two dots into a sum dot and a carry dot, and a single dot passes through with no operation]

 Three dots are shown
 Each symbolizes a partial product bit
 Using an FA reduces these to two bits
 One has the weight of 2^0 (sum)
 The other has the weight of 2^1 (carry)
 This type of reduction is known as 3 to 2 reduction or carry save reduction
 Two dots in a column are reduced to a sum dot and a carry dot using an HA

Partial Products Reduction Schemes

 Carry Save Reduction Scheme

 Dual Carry Save Reduction Scheme

 Wallace Tree Reduction Scheme

 Dadda Tree Reduction Scheme

12x12 Carry Save Reduction Scheme

 Considers three rows at a time
 Take the first three rows and use a CSA to reduce them to two
 Iteratively take the two layers from the previous reduction and a new layer from the PPs, and reduce them to two using a CSA
 Finally produces two layers
 Also produces free product bits
 The two layers are added using any CPA

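The steps above can be sketched behaviorally in Verilog. This is a sketch under assumptions: the module name `csr_multiplier`, the unsigned operands, and the loop-based description (one 3:2 CSA level per iteration) are illustrative choices. A structural design would instead place FAs and HAs per column and route the free product bits straight to the output.

```verilog
// Hypothetical behavioral model of carry save reduction for an NxN
// unsigned multiplier (N >= 2). The first CSA level consumes partial
// products 0, 1 and 2; every later level consumes the two layers of the
// previous level plus one new partial product.
module csr_multiplier #(parameter N = 6) (
  input  [N-1:0]   a, b,
  output [2*N-1:0] prod
);
  integer i;
  reg [2*N-1:0] s, c, row, t;

  always @* begin
    s = b & {N{a[0]}};                 // partial product 0
    c = (b & {N{a[1]}}) << 1;          // partial product 1, shifted
    for (i = 2; i < N; i = i + 1) begin
      row = (b & {N{a[i]}}) << i;      // next partial product
      t   = s ^ c ^ row;               // CSA sum layer
      c   = ((s & c) | (s & row) | (c & row)) << 1; // CSA carry layer
      s   = t;
    end
  end

  assign prod = s + c;                 // final CPA on the last two layers
endmodule
```

For N = 6 the loop runs four times, which matches the four CSA levels (Level 0 to Level 3) of the 6x6 layout.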
PP reduction for a 12x12 Multiplier using Carry Save Reduction
Scheme
[Figure: dot diagram starting from the first 3 partial products and proceeding through Level 0, Level 1 and the following levels, ending in the final partial product rows that need a carry propagate adder, plus the free product bits]

Carry Save Reduction Scheme Layout for a 6x6 Multiplier

Level 0 HA FA FA FA FA HA

Level 1 FA FA FA FA FA HA

Level 2 FA FA FA FA FA HA

Level 3 FA FA FA FA FA HA

CPA
Free product bits: P4 P3 P2 P1 P0

Dual Carry Save Reduction

 The partial products are divided into 2 equal-size groups
 The carry save reduction scheme is applied on both groups simultaneously
 This results in two partial product layers in each group
 The four layers are then reduced using carry save reduction
 The last two layers are added using any CPA

Wallace Tree Multipliers

 One of the most commonly used multiplier architectures
 It is a log-time array multiplier
 The number of adder levels increases logarithmically as the number of partial product rows increases

Wallace Tree Multipliers

 Make groups of three rows and apply CSA reduction to all groups in parallel
 Each CSA layer produces two rows
 These rows, together with the rows from the other partial product groups, form a new reduced matrix
 Iteratively apply Wallace reduction on the newly generated matrix
 This process continues until only two rows are left
 The final rows are added together for the final product

Wallace Reduction Tree applied on 12 PPs
[Figure: dot diagram of Wallace reduction on 12 PPs across Level 0 to Level 5, ending in the final partial product rows that need a carry propagate adder, plus the free product bits]

Wallace Reduction layout for a 6x6 array of PPs

Level 1: HA FA FA FA FA HA   (first group of three PPs)
Level 1: HA FA FA FA FA HA   (second group of three PPs)
Level 2: FA FA FA FA FA HA
Level 3: HA HA HA FA FA FA HA HA HA

PS11 PS10 PC10 PS9 PC9 PS8 PC8 PS7 PC7 PS6 PC6 PS5 PC5 PS4 PC4 P3 P2 P1 P0

Dadda Reduction uses the Wallace Reduction Table

Adder Levels in Wallace Tree Reduction Scheme

Number of partial products    Number of full adder levels
3                             1
4                             2
5 ≤ n ≤ 6                     3
7 ≤ n ≤ 9                     4
10 ≤ n ≤ 13                   5
14 ≤ n ≤ 19                   6
20 ≤ n ≤ 28                   7
29 ≤ n ≤ 42                   8
43 ≤ n ≤ 63                   9

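The table can be reproduced by repeating the Wallace grouping rule: every complete group of three rows becomes two rows, and leftover rows pass through unchanged. The sketch below is a hypothetical helper (the names `wallace_levels` and `count_levels` are not from the slides) that counts how many such steps are needed to reach two rows.

```verilog
// Hypothetical helper: number of adder levels needed to reduce N partial
// product rows to two rows with Wallace grouping.
module wallace_levels #(parameter N = 12) (
  output [7:0] levels
);
  function integer count_levels;
    input integer rows;
    integer r;
    begin
      r = rows;
      count_levels = 0;
      while (r > 2) begin
        r = 2 * (r / 3) + (r % 3);   // groups of three -> two rows each
        count_levels = count_levels + 1;
      end
    end
  endfunction

  assign levels = count_levels(N);
endmodule
```

For example, 12 rows shrink as 12, 8, 6, 4, 3, 2, which is five levels and agrees with the 10 ≤ n ≤ 13 row of the table.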
Dadda Reduction

 Minimizes the number of HAs and FAs
 Reduction considers each column separately
 Reduces the number of dots in each column to the maximum number of layers in the next level of the Wallace Reduction Table

Dadda reduction levels for reducing eight PPs to two

A Decomposed Multiplier

 Four multipliers of size N x N can be combined to make a 2N x 2N multiplier

$$(a_L + 2^8 a_H) \times (b_L + 2^8 b_H) = a_L b_L + 2^8 a_L b_H + 2^8 a_H b_L + 2^{16} a_H b_H$$

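The decomposition can be sketched in Verilog as follows. The module name `mult16_from_8x8` and the assumption of unsigned operands are illustrative; the slides do not state the signedness. The four 16-bit products are written with `*` for brevity, and the final sum stands in for the staged reduction.

```verilog
// Hypothetical 16x16 unsigned multiplier built from four 8x8 products.
module mult16_from_8x8 (
  input  [15:0] a, b,
  output [31:0] prod
);
  wire [7:0] aL = a[7:0], aH = a[15:8];
  wire [7:0] bL = b[7:0], bH = b[15:8];

  wire [15:0] ll = aL * bL;   // weight 2^0
  wire [15:0] lh = aL * bH;   // weight 2^8
  wire [15:0] hl = aH * bL;   // weight 2^8
  wire [15:0] hh = aH * bH;   // weight 2^16

  // Align each product by its weight and add
  assign prod = ll + (lh << 8) + (hl << 8) + (hh << 16);
endmodule
```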
A 16x16 bit Multiplier decomposed into four 8x8 multipliers

aL X bL 16-Bits
aL X bH 16-Bits
aH X bL 16-Bits
aH X bH 16-Bits
32-Bits

The results of these multipliers are appropriately added
to get the final product

[Figure: the outputs of the four 8x8 multipliers are reduced in three stages (Stage 1, Stage 2, Stage 3)]

Optimized Compressors

[Figure: (a) Candidate implementation of a 4:2 compressor built from (3,2) counters, with c_in and c_out. (b) Concatenation of 4:2 compressors to create wider tiles]

Contd…

[Figure: (c) Use of 4:2 compressor in Wallace tree reduction of an 8x8 multiplier. (d) Use of 4:2 compressor in an 8x8 multiplier in Dadda reduction]

Single- and Multiple-column Counters

 A 6:3 counter reducing six layers of multiple operands to three

 A 6:3 counter is mapped on three 6-input LUTs

[Figure: six 6-bit partial products P5..P0 are compressed column by column. In each column of weight 2^i, three 6-LUTs form a 6:3 compressor that produces Sum (weight 2^i), Carry0 (weight 2^(i+1)) and Carry1 (weight 2^(i+2))]

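A 6:3 counter simply counts the 1s among six bits of equal weight. A minimal sketch, with the hypothetical module name `counter_6_3`; each of the three output bits is a function of the same six inputs, which is why it fits three 6-input LUTs.

```verilog
// Hypothetical 6:3 counter: count[0] is Sum, count[1] is Carry0 and
// count[2] is Carry1 for one column of six dots.
module counter_6_3 (
  input  [5:0] x,
  output [2:0] count
);
  assign count = x[0] + x[1] + x[2] + x[3] + x[4] + x[5];
endmodule
```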
Counters compressing a 15x15 matrix
 15:4, 4:3 and 3:2 counters working in cascade to compress a
15x15 matrix

[Figure: three compression stages with compression ratios 15:4, 4:3 and 3:2, followed by the CPA]

A (3,4,5:5) GPC compressing three columns with 3, 4,
and 5 bits to 5 bits in different columns

Compressor Tree Synthesis using compression of two
columns of 5 bits each into 4 bit (5, 5; 4) GPCs

 Two columns of 5 bits each are compressed into 4 bits
 This GPC is represented as (5,5;4)

Compressor tree mapping by (a) 3:2 counters and (b) a (3,3;4) GPC

Two’s Complement Signed Multiplier

 The sign bit in 2’s complement representation plays a critical role in a signed multiplier

$$x = -x_{N-1} 2^{N-1} + \sum_{i=0}^{N-2} x_i 2^i$$

$$PP[i] = a_i 2^i \left( -b_{N_2-1} 2^{N_2-1} + \sum_{j=0}^{N_2-2} b_j 2^j \right) \quad \text{for } i = 0, 1, \ldots, N_1-2$$

$$PP[N_1-1] = -a_{N_1-1} 2^{N_1-1} \left( -b_{N_2-1} 2^{N_2-1} + \sum_{j=0}^{N_2-2} b_j 2^j \right)$$

Optimized GPC for FPGA Implementation
 FPGAs are best suited for counters and GPC-based compression trees
 LUTs in many FPGAs come in groups of two with shared 6-bit input
 A GPC (3,3;4) best utilizes 6-LUT-based FPGAs

[Figure: (a) An Altera FPGA Adaptive Logic Module (ALM) contains two 6-LUTs with shared inputs; a 6-input, 3-output GPC has 3/4 logic utilization (one LUT is not used). (b) A 6-input, 4-output GPC has full logic utilization]

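A (3,3;4) GPC takes three dots from each of two adjacent columns and returns their weighted count in 4 bits (the largest value is 3 + 2 x 3 = 9). A minimal sketch with hypothetical names; all four output bits depend on the same six inputs, which is the property that fills the shared-input LUT pairs.

```verilog
// Hypothetical (3,3;4) generalized parallel counter.
// col0 holds three dots of weight 1, col1 three dots of weight 2.
module gpc_3_3_4 (
  input  [2:0] col0,
  input  [2:0] col1,
  output [3:0] y
);
  wire [1:0] n0 = col0[0] + col0[1] + col0[2];  // dots in lower column
  wire [1:0] n1 = col1[0] + col1[1] + col1[2];  // dots in upper column
  assign y = {n1, 1'b0} + n0;                   // 2*n1 + n0
endmodule
```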
Showing 4 x 4-bit signed by signed multiplication

 To cater for the sign bit:
 The sign bits of the first three PPs are extended
 Two’s complement of the last PP is taken
 HW implementation results in additional sign extension logic

                1 1 0 1
              x 1 1 0 1
        ---------------
        1 1 1 1 1 1 0 1
        0 0 0 0 0 0 0 X
        1 1 1 1 0 1 X X
        0 0 0 1 1 X X X
        ---------------
        0 0 0 0 1 0 0 1

Sign - extension Elimination

(a) Positive number

B, sign-extended                      = 0 0 0 0 0 0 . 1 1 0 1 0 1 1
flip the sign bit, extend all 1s      = 1 1 1 1 1 1 . 1 1 0 1 0 1 1
add 1 at the location of sign bit     +           1
result                                = 0 0 0 0 0 0 . 1 1 0 1 0 1 1

(b) Negative number

B, sign-extended                      = 1 1 1 1 1 1 . 1 1 0 1 0 1 1
flip the sign bit, extend all 1s      = 1 1 1 1 1 0 . 1 1 0 1 0 1 1
add 1 at the location of sign bit     +           1
result                                = 1 1 1 1 1 1 . 1 1 0 1 0 1 1

Sign-Extension Elimination and CV Formulation for signed
by signed Multiplication

[Figure: each sign-extended partial product row SXXXXXXXXXX is replaced by the row with its sign bit flipped and its extension filled with 1s; the last row is replaced by its 1's complement with 1 added at the LSB. All the 1s are then added into a single correction vector]

Multiplying two numbers, 0010 and 1101

 All the 1s in red area are added to get CV
 The CV is 8’b0001_0000

[Figure (a): 0010 multiplied by the negative number 1101, with the sign-extension 1s that are added into the correction vector marked]

Contd…

 CV is simply added as one of the PPs
 In case of an NxN multiplier, CV is always a 1 at the (N+1)th bit location

              0 0 1 0
            x 1 1 0 1
        -------------
            1 1 0 1 0      (first PP with flipped MSB; the leading 1 is the correction vector)
            1 0 0 0 X
          1 0 1 0 X X
        0 1 0 1 X X X
        -------------
        1 1 1 1 0 1 0

(b)

Contd…

 The MSBs of all the PPs except the last one are flipped, a 1 is added at the sign-bit location, and the number is extended by all 1s
 For the last PP, the two’s complement is computed by flipping all the bits and adding 1 at the LSB position
 The MSB of the last PP is flipped again and 1 is added at this bit location for sign extension
 All these 1s are added to find a correction vector (CV)

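A behavioral Verilog sketch of the steps above, assuming N ≤ 15 and hypothetical names (`signed_mult_cv`, `CV`). One deliberate difference from the slides: the sketch keeps all 2N product bits, and adding every eliminated 1 modulo 2^(2N) then leaves a 1 at bit 2N-1 in addition to the 1 at bit N (for N = 4, 8'b1001_0000). That extra bit is derived here rather than taken from the slides; the low 2N-1 bits are the same either way. The running sum stands in for a compression tree plus CPA.

```verilog
// Hypothetical NxN two's complement multiplier using sign-extension
// elimination: no partial product is sign-extended, and all the
// extension 1s are folded into one constant correction vector.
module signed_mult_cv #(parameter N = 4) (
  input  [N-1:0]   a,     // multiplier
  input  [N-1:0]   b,     // multiplicand
  output [2*N-1:0] prod   // two's complement product
);
  localparam [2*N-1:0] CV = (1 << (2*N-1)) | (1 << N);

  integer i;
  reg [N-1:0]   row;
  reg [2*N-1:0] acc;

  always @* begin
    acc = CV;                           // CV enters as one more row
    for (i = 0; i < N; i = i + 1) begin
      row = b & {N{a[i]}};
      if (i == N-1) row = ~row;         // last PP: 1's complement (+1 is in CV)
      row[N-1] = ~row[N-1];             // flip the sign bit of the row
      acc = acc + ({{N{1'b0}}, row} << i);
    end
  end

  assign prod = acc;
endmodule
```

For a = 1101 and b = 0010 the four rows are 1010, 1000, 1010 and 0101, as on the slide, and the result is 1111_1010 (-6).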
Application of the string property

(1̄ denotes a digit of value -1)

0 0 1 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1 0 1
String
0 0 1 1 1 1 0 1 1 0 1 1 1 1 1 0 0 1̄ 0 1
String
0 0 1 1 1 1 0 1 1 1 0 0 0 0 1̄ 0 0 1̄ 0 1
String
0 0 1 1 1 1 1 0 0 1̄ 0 0 0 0 1̄ 0 0 1̄ 0 1
String
0 1 0 0 0 0 1̄ 0 0 1̄ 0 0 0 0 1̄ 0 0 1̄ 0 1

Hence the number of non-zero digits has been reduced from 14 to 6. Both representations have the same value.

Generation of four PPs

Multiplicand          10 10 11 01
Recoded multiplier    -2  +1  -1  +1

11111111 10 10 11 01      (+1 x multiplicand)
00000001 01 00 11         (-1 x multiplicand, shifted left by 2)
11111010 11 01            (+1 x multiplicand, shifted left by 4)
00101001 1                (-2 x multiplicand, shifted left by 6)
00100101 01 00 10 01      (product)

An 8 x 8 bit modified Booth recoder multiplier

[Figure: block diagram. The multiplier bit pairs (b1 b0), (b3 b2), (b5 b4), (b7 b6) drive Booth recoders BR0 to BR3; each 3-bit recoder output selects one of 2a, a, 0, -a, -2a. The four selected partial products and the CV are reduced by a Wallace tree reduction scheme, followed by a 16-bit addition]

Pre-calculated part of the CV
1 0 1 0 1

1 1 1 1 s
1 1 s
s
0 1 0 1 1
Algorithm Transformations for CSA

sum1 = op1 + op2;
sum2 = op3 + op4;
if (sum1 < sum2)
    sel = 0;
else
    sel = 1;

To transform the logic for optimal use of a compression tree, the comparison is replaced by the sign of a single multi-operand sum: sum1 - sum2 = op1 + op2 + ~op3 + ~op4 + 2'b10

Example: Multi Operands addition

 Multiple operands addition should use a compression tree
 Avoid multiple instantiations of CPA
 The example adds Q1.5, Q5.3, Q4.7, and Q6.6 format signed numbers
 Compute CV using the sign extension elimination technique
 Add it as the 5th layer
 Compress using a Dadda tree
 The last two rows can be added using any CPA

Example illustrating use of compression tree in multi-operand
addition
[Figure: dot diagram of the four operands (Q1.5, Q5.3, Q4.7, Q6.6) aligned on the implied place of decimal, with inverted sign bits marked and the CV added as the fifth layer. HAs and FAs reduce the layers in three steps: 5:4, 4:3 and 3:2]

Algorithm Transformations for CSA
 Multi operands addition should use compression tree and one
CPA
[Figure: (a) FSFG with multi operand addition: a[n], b[n] and c[n] are added with two adders to give d[n], which is delayed to d[n-1] and multiplied by e[n] to give y[n]. (b) Modified FSFG in which a CSA reduces the three operands to two]

Compression tree replacement for an Add Compare and
Select Operation
 In many applications multi operands addition is hidden and can be extracted
 This example performs an Add-Compare-Select operation
 The operation requires three CPAs
 The statements can be transformed to exploit compression tree

[Figure: left, two adders compute Op1 + Op2 and Op3 + Op4 and a comparator (<) produces S. Right, Op1, Op2, Op3, Op4 and the constant 2'b10 enter one compression tree (CT), and the sign of the result gives S]

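A sketch of the transformed Add-Compare-Select, under stated assumptions: signed N-bit operands, the comparison direction sum1 < sum2 as drawn in the figure, and hypothetical names (`acs_ct`, `W`). Since -x = ~x + 1, subtracting op3 and op4 becomes adding their 1's complements plus the constant 2, which is the 2'b10 input of the compression tree.

```verilog
// Hypothetical Add-Compare-Select with one compression tree and one CPA.
module acs_ct #(parameter N = 8) (
  input  [N-1:0] op1, op2, op3, op4,   // two's complement operands
  output         sel                   // 0 when op1+op2 < op3+op4, else 1
);
  localparam W = N + 3;                // guard bits so the sum cannot overflow

  // Sign-extend; invert op3 and op4 and fold their two +1s into k = 2'b10
  wire [W-1:0] x1 =   {{3{op1[N-1]}}, op1};
  wire [W-1:0] x2 =   {{3{op2[N-1]}}, op2};
  wire [W-1:0] x3 = ~ {{3{op3[N-1]}}, op3};
  wire [W-1:0] x4 = ~ {{3{op4[N-1]}}, op4};
  wire [W-1:0] k  = 2;

  // Compression tree: three 3:2 levels reduce five rows to two
  wire [W-1:0] s1 = x1 ^ x2 ^ x3;
  wire [W-1:0] c1 = ((x1 & x2) | (x1 & x3) | (x2 & x3)) << 1;
  wire [W-1:0] s2 = s1 ^ c1 ^ x4;
  wire [W-1:0] c2 = ((s1 & c1) | (s1 & x4) | (c1 & x4)) << 1;
  wire [W-1:0] s3 = s2 ^ c2 ^ k;
  wire [W-1:0] c3 = ((s2 & c2) | (s2 & k) | (c2 & k)) << 1;

  // Single CPA; only the sign of (op1+op2) - (op3+op4) is used
  wire [W-1:0] diff = s3 + c3;
  assign sel = ~diff[W-1];
endmodule
```

Compared with the direct form, the two adders and the comparator (three CPAs) shrink to one CPA behind a carry-free tree.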
Transforming the add and multiply operations to use one CPA and
a compression tree
 Apply distributive property of multiplication
 Generate PPs for the two multiplications
 Use one compression tree to reduce all PPs to two layers
 Use one CPA to add these two layers

op1 x (op2 + op3) = op1 x op2 + op1 x op3


[Figure: left, Op2 and Op3 are added and the sum is multiplied by Op1 to give y. Right, PP generation for Op1 x Op2 and for Op1 x Op3 feeds one compression tree, followed by one CPA that gives y]

Transformation to use compression trees and single CPA to implement a cascade of multiplication operations

[Figure: left, a cascade of multipliers computing Op1 x Op2 x Op3 x Op4. Right, each stage uses PP generation (PPG) and a compression tree (CT) whose sum and carry outputs (S1/C1, S2/C2, S3/C3) are passed to the next stage, with one CPA at the end producing Prod]

String Property

 7=111=8-1=1001
 31= 1 1 1 1 1 =32-1
Or 1 0 0 0 0 1=32-
1=31

 Replace string of 1s in multiplier with


 In a string when ever we have the least significant 1, we put

a bar on it
 We go to the end of the string

 We replace all the 1(s) with 0

 We put a 1 where the string ends

Digital Design of Signal Processing Systems, John Wiley & Sons by Dr. Shoab A. 129
Khan
 Instead of multiplying with a single bit
 We multiply with two bits at a time, which halves the number of partial products

Booth Recoding Basic Idea

A= 10 10 11 01
B= 1 10 11 0
0 1

For these two bits, Booth’s algorithm restricts the value to be one of (-2, -1, 0, +1, +2)
+2 means shift A left by one
+1 means copy A in the answer
0 means copy all 0’s
-1 means 2’s complement and then copy
-2 means 2’s complement and then shift left

Booth’s Algorithm

 Form pairs using string property

10101101 0

 Use the MSB of the previous group to check for the string property on the pair; use 0 for the first pair

As the string property is applied on three bits, there are the following eight possibilities:

2^1 = 2   2^0 = 1   Previous MSB   Recoded value
0         0         0               0
0         0         1              +1
0         1         0              +1
0         1         1              +2
1         0         0              -2
1         0         1              -1
1         1         0              -1
1         1         1               0

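The table maps directly onto a case statement. Below is a behavioral sketch of an 8x8 radix-4 Booth multiplier built around it; the module name `booth_radix4_mult` and the running sum (standing in for a compression tree and CPA) are illustrative assumptions.

```verilog
// Hypothetical 8x8 signed multiplier using radix-4 Booth recoding.
// Each pair of multiplier bits, plus the MSB of the previous pair,
// selects one of 0, +a, +2a, -a, -2a: four partial products instead of 8.
module booth_radix4_mult (
  input  signed [7:0]  a,     // multiplicand
  input  signed [7:0]  b,     // multiplier
  output signed [15:0] prod
);
  wire [8:0] bx = {b, 1'b0};  // append 0 to the right of the first pair

  integer i;
  reg signed [15:0] acc, pp;

  always @* begin
    acc = 0;
    for (i = 0; i < 4; i = i + 1) begin
      case ({bx[2*i+2], bx[2*i+1], bx[2*i]})
        3'b000, 3'b111: pp = 0;             //  0
        3'b001, 3'b010: pp = a;             // +1: copy
        3'b011:         pp = a <<< 1;       // +2: shift left
        3'b100:         pp = -(a <<< 1);    // -2: complement, shift left
        default:        pp = -a;            // -1 (3'b101, 3'b110)
      endcase
      acc = acc + (pp <<< (2*i));           // each pair has weight 4^i
    end
  end

  assign prod = acc;
endmodule
```

With multiplicand 10101101 and multiplier 10001101 the recoded digits are -2, +1, -1, +1 and the product is 0010_0101_0100_1001.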
