18/11/2024
Digital System Design
CS-431
Embedded Systems Design: A Unified Hardware/Software Introduction, (c)
2000 Vahid/Givargis
7 Series FPGAs: Clock Management Tile (CMT)
• The clock management tile (CMT) provides
– Frequency synthesis
– Clock de-skew
– Jitter filtering
– A high input frequency range
CMT
[Figure: a clock signal from the outside world enters through a special clock pin and pad and feeds the clock manager, which produces daughter clocks used to drive internal clock trees, output pins, etc.]
CMT: Jitter Removal
• In the real world, clock edges may arrive a little early or a little late.
• Because of the delays encountered, the result is a fuzzy clock (jitter).
• The FPGA clock manager can be used to detect and correct for this jitter and
provide a “clean” daughter clock signal for use inside the device.
[Figure: an ideal clock signal compared with a real clock signal with jitter over cycles 1 to 4; superimposing the four cycles shows the fuzzy clock edges]
CMT: Frequency Synthesis
• The clock manager can be used to generate daughter clocks with frequencies
that are derived by multiplying or dividing the original signal.
[Figure: daughter clock waveforms at 1.0 x, 2.0 x, and 0.5 x the original clock frequency]
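The multiply/divide relationship above can be sketched in a few lines of Python. This is only an illustration: the function name `daughter_frequency`, its `multiply` and `divide` parameters, and the 100 MHz input clock are assumptions made for the example, not CMT attribute names or values from the slides.

```python
from fractions import Fraction


def daughter_frequency(f_in_hz, multiply=1, divide=1):
    """Return f_in * multiply / divide as an exact Fraction (hypothetical helper)."""
    if multiply <= 0 or divide <= 0:
        raise ValueError("multiply and divide must be positive")
    return Fraction(f_in_hz) * Fraction(multiply, divide)


f_in = 100_000_000  # assumed 100 MHz input clock, for illustration only

same = daughter_frequency(f_in)                 # 1.0 x original clock frequency
double = daughter_frequency(f_in, multiply=2)   # 2.0 x original clock frequency
half = daughter_frequency(f_in, divide=2)       # 0.5 x original clock frequency
```

Using `Fraction` keeps non-integer ratios exact, which is convenient when checking whether a target frequency is reachable with integer multiply and divide values.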
CMT: Phase Shifting
• Certain designs require the use of clocks that are phase shifted (delayed) with
respect to each other.
• Some clock managers allow you to select from fixed phase shifts of common values
such as 120° and 240° (for a three-phase clocking scheme) or 90°, 180°, and 270°
(for a four-phase clocking scheme).
[Figure: clock waveforms phase shifted by 0°, 90°, 180°, and 270°]
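Since a phase shift is a delay expressed as a fraction of the clock period, it can be converted to time as in the sketch below. The helper name `phase_delay_ns` and the 100 MHz clock are assumptions chosen for illustration.

```python
def phase_delay_ns(phase_deg, f_clk_hz):
    """Delay in nanoseconds equivalent to a phase shift at the given clock frequency."""
    period_ns = 1e9 / f_clk_hz
    return (phase_deg % 360) / 360 * period_ns


f_clk = 100e6  # assumed 100 MHz clock, i.e. a 10 ns period

# Delays for the four-phase and three-phase values mentioned on the slide
delays = {deg: phase_delay_ns(deg, f_clk) for deg in (0, 90, 120, 180, 240, 270)}
```

At this assumed frequency a 90° shift corresponds to a quarter of the 10 ns period.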
Growing DSP Performance Gap
Typical DSP Operation
[Figure: diagram of a typical FIR filter]
- Parallel computing process by nature
- N taps
- N multiplications should happen in parallel
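A minimal software sketch of an N-tap FIR filter makes the parallelism visible. It assumes the usual direct-form definition y[n] = sum over k of h[k] * x[n-k], with samples before the start taken as zero; the function and variable names are illustrative.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n-k], with x[n] = 0 for n < 0."""
    n_taps = len(coeffs)
    delay_line = [0] * n_taps
    out = []
    for x in samples:
        # Shift the newest sample into the delay line
        delay_line = [x] + delay_line[:-1]
        # These N products do not depend on each other, so hardware can
        # compute them in parallel; software computes them one by one.
        products = [c * d for c, d in zip(coeffs, delay_line)]
        out.append(sum(products))
    return out


# Feeding in an impulse returns the coefficients themselves
impulse_response = fir_filter([1, 0, 0, 0], [1, 2, 3])
```

In software the list of products is built sequentially, while an FPGA can dedicate one multiplier to each tap.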
Serial vs. Parallel DSP Processing
Embedded Multipliers/ DSP Slices
• Some functions, such as multipliers, are inherently slow if they are implemented by
connecting a large number of programmable logic blocks together.
Main Components and Functionality
• Multiplier: A high-speed multiplier capable of performing signed and unsigned integer and
fixed-point multiplications.
• Adder/Accumulator: Allows results of multiplications to be accumulated or added/subtracted,
enabling the implementation of multiply-accumulate (MAC) operations, essential for many DSP
algorithms.
• Pre-Adder: Supports pre-addition of inputs, useful for symmetric FIR filter implementations.
• Pipeline Registers: To increase the operating frequency, the DSP Slice includes several
pipelining stages, which help reduce delays and achieve high clock speeds.
• Control and Configuration Logic: The DSP Slice provides configuration settings to control its
operational modes and to manage inputs and outputs dynamically.
DSP Slices
• All 7 series FPGAs contain DSP48E1 cells, which have the following
features:
– 25x18 signed multiplier
– 48-bit add/subtract/accumulate
– 25-bit pre-adder
– Pipeline registers for high speed
– Pattern detector
– SIMD operators
– Cascade paths
– Dynamic pipeline control
• DSP48E1 slices can be inferred, instantiated, or accessed using IP cores
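The bit widths listed above can be explored with a small behavioral model of a multiply-accumulate step. This is a sketch under simplifying assumptions, not a model of the real slice: it ignores pipelining, the pre-adder, and all control inputs, and the names `to_signed` and `mac_step` are hypothetical.

```python
def to_signed(value, bits):
    """Wrap an integer into the two's-complement range of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac_step(acc, a, b):
    """One multiply-accumulate: acc + a*b with 25-bit a, 18-bit b, 48-bit acc."""
    product = to_signed(a, 25) * to_signed(b, 18)
    return to_signed(acc + product, 48)


# Accumulate three signed products: 3*4 + (-5)*6 + 7*(-8)
acc = 0
for a, b in [(3, 4), (-5, 6), (7, -8)]:
    acc = mac_step(acc, a, b)
```

The 48-bit accumulator is much wider than the 43-bit product of a 25x18 multiplication, which leaves headroom for long accumulations before the result wraps.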
Typical Slice Features
FIR Filter Mapped to DSP Slices
Non-DSP Functions (Addition)
DSP48E1 cell
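[Figure: DSP48E1 block diagram]
- Data inputs: A (30 bits), B (18 bits), C (48 bits), D (25 bits)
- Control inputs: INMODE (5 bits), OpMode (7 bits), ALUMode (4 bits), CarryInSel (3 bits), CarryIn
- Cascade inputs: ACIN, BCIN, PCIN, CARRYCASCIN, MULTSIGNIN
- Cascade outputs: ACOUT, BCOUT, PCOUT, CARRYCASCOUT, MULTSIGNOUT
- Internal blocks: dual A and D registers with pre-adder, dual B register, 25 x 18 multiplier (M), X, Y, and Z multiplexers (Z inputs include 17-bit right-shifted versions), 48-bit adder/subtractor driving P, and pattern detector (PATTERN_DETECT)
- Outputs: P (48 bits), CARRYOUT (4 bits), PATTERN_DETECT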
X, Y, and Z Multiplexers
• Adder/subtractor operates on X, Y, Z, and CIN operands
– Table shows basic operations
• X, Y, and Z multiplexers allow for dynamic OPMODEs
• Multiplier output requires both X and Y multiplexers

ALUMODE   Operation
0000      Z + X + Y + CIN
0001      -Z + (X + Y + CIN) - 1
0010      -Z - X - Y - CIN - 1
0011      Z - (X + Y + CIN)
Others    Logic operations

[Figure note: normal or 17-bit right shifted with MSB fill, for multi-precision arithmetic]
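The four arithmetic ALUMODE settings in the table can be checked with a short behavioral sketch. It assumes 48-bit two's-complement wraparound and treats X, Y, Z, and CIN as plain integers; the function name `alu` is hypothetical, and the logic-operation modes are deliberately left out.

```python
MASK48 = (1 << 48) - 1


def alu(alumode, x, y, z, cin=0):
    """Behavioral sketch of the four arithmetic ALUMODEs, wrapped to 48 bits."""
    s = x + y + cin
    if alumode == 0b0000:
        result = z + s
    elif alumode == 0b0001:
        result = -z + s - 1
    elif alumode == 0b0010:
        result = -z - s - 1
    elif alumode == 0b0011:
        result = z - s
    else:
        raise ValueError("other ALUMODE values select logic operations")
    return result & MASK48  # unsigned 48-bit view of the two's-complement result


add_result = alu(0b0000, x=1, y=2, z=3)   # Z + X + Y + CIN
sub_result = alu(0b0011, x=1, y=2, z=10)  # Z - (X + Y + CIN)
```

The "- 1" terms in modes 0001 and 0010 follow from two's complement, where a bitwise NOT of a value equals its negation minus one: mode 0001 is NOT(Z) + X + Y + CIN, and mode 0010 is NOT(Z + X + Y + CIN).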
Apply Your Knowledge
OPMODE
Controls the behavior of X, Y, and Z multiplexers
1) Given this OPMODE table, what is the
OPMODE for the following functions?
– C + A:B
– A*B + C
– P + C + PCIN
Two-Input Logic Functions
ALUMODEs
• 48-bit logic operations
– XOR, XNOR, AND, NAND, OR, NOR, NOT

Logic Unit Mode    OPMODE[3:2]   ALUMODE[3:0]
X XOR Z            00            0100
X XNOR Z           00            0101
X XNOR Z           00            0110
X XOR Z            00            0111
X AND Z            00            1100
X AND (NOT Z)      00            1101
X NAND Z           00            1110
(NOT X) OR Z       00            1111
X XNOR Z           10            0100
X XOR Z            10            0101
X XOR Z            10            0110
X XNOR Z           10            0111
X OR Z             10            1100
X OR (NOT Z)       10            1101
X NOR Z            10            1110
(NOT X) AND Z      10            1111

[Figure: the X multiplexer selects among 0, P, and A:B; the Y multiplexer selects 0 or 1; the Z multiplexer selects among 0, PCIN, P, and C. OPMODE[3:0] drives the X and Y multiplexers, and ALUMODE[3:0] selects the logic function that produces P.]
Pattern Detect and SIMD
Dual B Register
● B input to multiplier is controlled by INMODE[4]
- Dynamically selects B1/B2 pipeline level
● B input to X MUX and BCOUT cascade outputs
are statically controlled by bitstream options
Dual A, D Registers and Pre-Adder
● A input to multiplier is controlled by INMODE[3:0]
- Dynamically selects A1/A2 pipeline level
- Dynamically selects add/subtract
- Dynamically selects Zero for A or D
● A input to X MUX and ACOUT cascade output are statically
controlled by bitstream options
Pre-Adder
• The pre-adder can add or subtract the two 25-bit operands on the A and
the D inputs before the result drives the multiplier
• Benefits
– Perfect for operations using symmetrical coefficients
– Doubles the efficiency of symmetric FIR and symmetric IIR and transpose
convolution filters
– Half the power consumption compared to architectures without a pre-adder
– A small change with a big benefit
Symmetrical Filters
When the coefficients are symmetrical
- The pre-adder reduces the number of multiplications by 50%
- Factorizing the taps replaces one multiplication with a pre-addition
(or pre-subtraction)
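The factorization can be illustrated with a six-tap example in which taps that share a coefficient are added first and multiplied once. This is a sketch assuming an even-length filter whose coefficients are a mirror image of themselves; the function names and sample values are made up for the example.

```python
def symmetric_fir_output(window, half_coeffs):
    """One output of an even-length symmetric FIR using pre-additions.

    window: the N most recent samples; half_coeffs: the first N/2 coefficients,
    with the full set assumed to be half_coeffs followed by its mirror image.
    """
    n = len(window)
    if n != 2 * len(half_coeffs):
        raise ValueError("window must be twice as long as half_coeffs")
    # Pre-add the two samples that share a coefficient, then multiply once
    return sum(c * (window[i] + window[n - 1 - i]) for i, c in enumerate(half_coeffs))


def direct_fir_output(window, coeffs):
    """Reference version: one multiplication per tap."""
    return sum(c * x for c, x in zip(coeffs, window))


window = [1, 2, 3, 4, 5, 6]
half = [7, 8, 9]
full = half + half[::-1]  # symmetric coefficient set: 7, 8, 9, 9, 8, 7

with_pre_adder = symmetric_fir_output(window, half)     # 3 multiplications
without_pre_adder = direct_fir_output(window, full)     # 6 multiplications
```

Both versions give the same output, but the pre-added form needs three multiplications instead of six, which is the 50% reduction stated above.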
Six-Tap Transpose FIR Filter Without Pre-Adder
Six-Tap Transpose FIR Filter Using the Pre-Adder
Optimized implementation supported by XST using only
three DSP slices
Dynamic Pipeline Control
• The 7 series FPGA DSP slice has dynamic pipeline control on the A and B
registers
– User can select which of the two pipeline registers to use for calculations on a
clock-by-clock basis
• Benefits
– Allows an operation to reuse the same operand in subsequent cycles
Application: Sequential Complex Multiply
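The slide's diagram is not reproduced here, so the following is only a sketch of the general idea: a complex product (a + jb)(c + jd) = (ac - bd) + j(ad + bc) computed with a single multiplier over four steps. The schedule and all names are assumptions for illustration and do not describe the actual register usage in the slice.

```python
def sequential_complex_multiply(a, b, c, d):
    """(a + jb) * (c + jd) using one multiplier over four steps (assumed schedule)."""
    # Each step: (first operand, second operand, subtract?, accumulator name)
    schedule = [
        (a, c, False, "re"),  # re  = a*c
        (b, d, True, "re"),   # re -= b*d
        (a, d, False, "im"),  # im  = a*d
        (b, c, False, "im"),  # im += b*c
    ]
    acc = {"re": 0, "im": 0}
    for op1, op2, subtract, name in schedule:
        product = op1 * op2  # the single shared multiplier
        acc[name] += -product if subtract else product
    return acc["re"], acc["im"]


re_part, im_part = sequential_complex_multiply(1, 2, 3, 4)
```

Each of the four operands appears in two of the four products, so being able to reuse an operand in a later cycle avoids reloading it.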