
Fpgas 29032016

The document discusses the basic building blocks of FPGAs including gates, flip flops, and interconnect. It provides a history of programmable logic devices including PLDs, CPLDs, and FPGAs. Key concepts like logic interconnect, I/Os, and propagation delay are also covered.

Uploaded by

Roshan Raju

FPGAs!

Basic Concepts – Building Blocks

• There are three fundamental building blocks found in digital devices:
– Gates
– Flip-Flops
– Interconnect (or routing)

[Figure: gates and D flip-flops (D input, Q output, clock input) connected by interconnect]
Digital Logic Landscape
The following slides provide a history of the various logic devices
[Figure: design capacity (gates) versus development time (hours, days, weeks, months, years). From lowest to highest: Standard Logic, SPLD, CPLD, FPGA, Gate Array, Standard Cell, Full Custom. SPLD, CPLD, and FPGA are grouped as Programmable Logic.]
Digital Logic History - PLDs
• Developed in the late 70s
• Major player today: Lattice
• First device that needs software
• 50 – 200 gates

Packaging note: A very common low cost IC package has pins on all 4 sides and is called a Plastic-Leaded Chip Carrier (PLCC).

[Figure: PLD structure showing interconnect, gates, and flip flops]
PLD Example

Digital Logic History - Gate Array
Definition: A pre-built IC consisting of a regular arrangement of gates and interconnect (routing) where the interconnect is modified to achieve a customer’s desired functions.
– The customer designs the behaviors/functions
– The vendor manipulates/changes the metal interconnect to arrive at the customer’s specified functions (that is, the vendor hooks up the gates)
– Sometimes called an Uncommitted Logic Array (ULA).

Packaging Enhancement: To increase the number of I/Os (Inputs/Outputs), the pin thickness and spacing (pitch) are dramatically reduced in this Thin Quad FlatPack package (TQFP).

[Figure: Gate Array in a TQFP package, 1,000,000+ gates, showing gates and interconnect]
Gate Array
• The ultimate building tool set for digital designers
• Advantages
– Very dense (today over 10,000,000 gates (10 million))
– Fast performance (200 – 500 MHz)
– Very low unit cost
• Disadvantages
– Long turn around time (3 - 6 months)
– $50K - $500K NRE
• NRE = Non-Recurring Engineering charges, which are one-time “set-up” charges to ready the “fab” to build the custom part (“fab” = the “factory” where the ICs are manufactured; the “fabrication plant”)
– Risk of re-spins
Digital Logic History - Standard Cell
• This device features a series of customized “cells”
– Each cell is optimized for its “standard” function
• Cells are chosen from the Standard Cell vendor's library, customized, and connected to the other cells and the routing on the part.
• There are no standard layers to the device; each layer is a unique
design
• Advantages:
– More optimized die size compared to GA
– Cheaper device price compared to GA
– Can add analog functions
• Disadvantages:
– Extremely high NRE charges (up to $1M)
– Requires more than 250k units/year
– Much longer development time
– Much higher risk (re-spins, etc.)
CPLDs, FPGAs
[Figure: design capacity (gates) versus development time (hours to years) for Standard Logic, SPLD, CPLD, FPGA (the Programmable Logic group), Gate Array, Standard Cell, and Full Custom]
Digital Logic History - CPLD
Complex Programmable Logic Device
Definition: A CPLD contains a bunch of PLD blocks whose inputs and outputs are connected together by a global interconnection matrix.

A CPLD has two levels of programmability:
– Each PLD block can be programmed
– The interconnection between the PLDs can be programmed.

CPLD technology was introduced in the late 80s.

[Figure: CPLD with interconnect and macrocells, 32-1024 macrocells]
CPLDs
• Vendors: Altera, Lattice, Cypress, Xilinx
• 2 Primary Technologies
– EEPROM (old technology)
– FLASH (technology used by Xilinx CPLDs)
• FPGAs vs. CPLDs
– FPGAs have much greater capacity
– CPLDs are faster for some small applications
– Both are easy to design
Digital Logic History - FPGA
Field Programmable Gate Array
Definition:
• An array of “logic cells” surrounded by substantial routing, both of which are under the user’s control
• The CLB (Configurable Logic Block) is/was the fundamental building block of the logic cell, although today’s FPGAs use a very sophisticated collection of gates that goes beyond the original CLB design
– The early Xilinx CLBs contained a 4-input look-up table (LUT), a flip-flop, and “carry logic”

[Figure: FPGA with interconnect and logic cells, >10 million gates]
FPGA Building Blocks

An Early Xilinx CLB

Digital Logic History
FPGA - Field Programmable Gate Array
2 types of FPGAs
• Reprogrammable (SRAM-based)
– Xilinx, Altera, Lattice, Atmel
• One-time Programmable (OTP)
– Actel, Quicklogic, EZchip

[Figure: SRAM logic cell (a LUT feeding a flip flop) and OTP logic cell (gates feeding a flip flop)]
Basic Concepts - Logic Interconnect
• Method to hook up gates inside a single device
• Need to have enough routing to connect most gates
• Larger gate counts result in lots of routing, bigger die size, increased cost

[Figure: grid of vertical and horizontal interconnect between gates, with the used interconnect path from A to B marked]
Basic Concepts - I/Os
Inputs and Outputs
• All signals on & off chip must go through an I/O buffer
• User can choose many I/O buffer options

[Figure: silicon die with an I/O buffer (input I, output O) connected to a package pin]
Basic Concepts
Propagation Delay (tPD)

Definition: The time required for a signal to travel from A to B, measured in nanoseconds (ns).

[Figure: Gate Delay, “A” to “B” through a gate, tPD = 3ns. Interconnect Delay, “A” to “B” along a net, tPD = 1ns.]
Basic Concepts
Path Delay
Definition: The sum of all the gate and net delays from starting to ending point.

[Figure: path from “A” to “B” with a fanout of 2 to “C”. Delays along the path, in order: tPD = 3ns, tPD = 1.2ns, tPD = 3ns, tPD = 1.8ns, tPD = 3ns.]

Path Delay “A” to “B” = sum of all gate + net delays
= 3ns + 1.2ns + 3ns + 1.8ns + 3ns
= 12ns
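The path delay sum can be reproduced in a few lines. This is a minimal sketch: the delay values come from the slide, while the function name `path_delay_ps` and the choice of integer picoseconds (to sidestep floating-point rounding) are assumptions made for illustration.

```python
def path_delay_ps(delays_ps):
    """Return the total path delay: the sum of every gate and net delay."""
    return sum(delays_ps)

# Delays from the slide, converted from ns to ps
a_to_b = [3000, 1200, 3000, 1800, 3000]

total_ps = path_delay_ps(a_to_b)
print(f"Path delay A to B = {total_ps / 1000} ns")
```

Any extra gate or net on the path simply adds another entry to the list, which is why long routes and deep logic both lengthen the path delay.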
Basic Concepts
Maximum System Performance (fMAX)
Definition: The fastest speed a circuit containing flip-flops can operate, measured in Megahertz (MHz).

Circuit Events per Second:
1 = 1 Hertz (Hz)
1,000 = kilo (kHz)
1,000,000 = mega (MHz)
1,000,000,000 = giga (GHz)

[Figure: flip-flop to flip-flop path with tCQ = 2.5ns, tPD = 1ns, tPD = 2ns, tPD = 0.5ns, tPD = 2ns]

fMAX = 1 / (longest flip-flop path delay)
fMAX = 1/(flip-flop delay + gate delays + net delays)
= 1/(2.5 + 1 + 2 + 0.5 + 2)ns
= 1/(8ns)
= 125 MHz
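The fMAX arithmetic can be checked with a short calculation. This is a sketch under the slide's own simplification (flip-flop clock-to-Q delay plus gate and net delays, with no setup time term); the function name `fmax_mhz` and the use of picoseconds are assumptions for illustration.

```python
def fmax_mhz(t_cq_ps, path_delays_ps):
    """fMAX in MHz for a flip-flop to flip-flop path, delays in picoseconds."""
    period_ps = t_cq_ps + sum(path_delays_ps)
    # A 1 MHz clock has a period of 1,000,000 ps, so MHz = 1e6 / period_ps
    return 1_000_000 / period_ps

# tCQ = 2.5ns, then tPD = 1ns, 2ns, 0.5ns, 2ns from the slide
f = fmax_mhz(2500, [1000, 2000, 500, 2000])
print(f"fMAX = {f} MHz")
```

The longest path sets fMAX, so in a real design the same calculation would be applied to the slowest flip-flop to flip-flop path.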
Xilinx FPGA Architecture
How are they arranged

[Figure: Spartan 6 layout showing 18Kbits Dual Port RAM, 18×18 Multiplier, CLB (Configurable Logic Block) = 4 Slices, and 124 multi-standard I/O with JTAG. Each Slice shows LUTs with inputs I0 to I3 and output O feeding D flip-flops with SET, CE, and RST controls.]
Low Cost Design 22


How they are arranged
Kintex-7 FPGA
Typical FPGA Logic Structure
• LUT
• Flip flop

Typical 4 Input LUT
• 4 Inputs
• One Output
• Any 4 input Logic function can be implemented.

Flip Flop
• Input D
• Input Clock
• Input Clock Enable CE
• Input Set
• Input Reset
• Output Q
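Why a 4-input LUT can implement any 4-input logic function becomes clearer when the LUT is modeled as a 16-entry truth table addressed by its inputs. The sketch below is illustrative only: the names `lut4` and `init_for`, and the bit ordering of the table, are assumptions and not taken from any vendor primitive.

```python
def lut4(init, i3, i2, i1, i0):
    """Look up one output bit: the four inputs form an address into a 16-bit table."""
    address = (i3 << 3) | (i2 << 2) | (i1 << 1) | i0
    return (init >> address) & 1

def init_for(func):
    """Build the 16-bit table contents for any 4-input Boolean function."""
    init = 0
    for address in range(16):
        bits = [(address >> k) & 1 for k in (3, 2, 1, 0)]
        init |= (func(*bits) & 1) << address
    return init

# Two different functions, same hardware: only the table contents change
AND4 = init_for(lambda i3, i2, i1, i0: i3 & i2 & i1 & i0)
XOR4 = init_for(lambda i3, i2, i1, i0: i3 ^ i2 ^ i1 ^ i0)
print(hex(AND4), hex(XOR4))
```

Because the function lives entirely in the table contents, a 4-input AND and a 4-input XOR cost the same single LUT and have the same delay.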
Making the Most of Controls
Dedicated Flip-Flop controls make designs smaller and faster.

1 level of logic - fast and small
Up to 4 data inputs plus 3 controls
[Figure: one LUT4 (inputs I0 to I3, output O) driving the D input of a flip-flop with SET, CE, and RST; tSU marked at the flip-flop]

2 levels of logic - significantly slower and twice the size (and cost)
[Figure: two LUT4s in series, joined by a net, driving the D input of the flip-flop; tSU marked]


Workshop - How can this be implemented?
This simple code describes a 4-input function followed by a Flip-Flop.
What size and performance is this function?

process (clk, reset)
begin
  if reset = '1' then                      -- reset
    data_out <= '0';
  elsif clk'event and clk = '1' then
    if enable = '1' then                   -- enable
      if force_high = '1' then             -- set
        data_out <= '1';
      else
        data_out <= a and b and c and d;   -- logic
      end if;
    end if;
  end if;
end process;




TWICE the Cost and Half the Speed
Report: TWICE as Big as it should be and Slow!

Cell Usage :
# BELS : 2
# LUT2 : 1
# LUT4 : 1
# FlipFlops/Latches : 1
# FDCE : 1

[Figure: Solution as synthesized. A LUT2 takes c (I0) and d (I1). A LUT4 takes a (I0), the LUT2 output (I1), b (I2), and force_high (I3). The LUT4 output drives D of an FDCE flip-flop with CE = enable and CLR = reset; Q is data_out.]
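The report shows two LUTs because force_high ends up as a fifth logic input rather than on a dedicated flip-flop control. The two-LUT mapping can be checked exhaustively against the VHDL behavior. This is a sketch: the LUT contents below (AND in the LUT2, OR-of-AND in the LUT4) are an assumed reading of the figure, not contents taken from the synthesis report.

```python
from itertools import product

def spec(a, b, c, d, force_high):
    """Value loaded into the flip-flop when enable is high, per the VHDL."""
    return 1 if force_high else (a & b & c & d)

def lut2(i1, i0):
    # Assumed contents: AND of the two inputs (d and c)
    return i1 & i0

def lut4(i3, i2, i1, i0):
    # Assumed contents: force_high OR (b AND lut2_out AND a)
    return i3 | (i2 & i1 & i0)

def mapped(a, b, c, d, force_high):
    """Two levels of logic, wired as in the figure."""
    return lut4(force_high, b, lut2(d, c), a)

# Compare the mapping with the specification for all 32 input combinations
mismatches = [v for v in product((0, 1), repeat=5) if spec(*v) != mapped(*v)]
print("mismatches:", mismatches)
```

An empty mismatch list confirms the two-level mapping is functionally correct; the cost is the extra LUT and the extra level of delay.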


CLB (Configurable Logic Block)
Multiple LUTs and FFs
• 2 Slices in Each CLB
• Each Slice has Two LUTs and Two Flipflops

[Figure: a CLB containing two Slices; each Slice has two LUTs, each followed by Carry logic and a D flip-flop with PRE, CE, and CLR]
How do CLBs connect with each other
• Pairs of CLBs are arranged symmetrically
• Connect via Switch matrix

[Figure: pairs of Slices attached to Switch Matrix blocks, with Clocks and Data routed through the switch matrices]
Fabric Routing
• Connections between CLBs and other resources use the fabric routing
resources
• Routing lines connect to the switch
matrices adjacent to the resources
• Routes connect resources vertically,
horizontally, and diagonally
• Routes have different spans
• Horizontal: Single, Dual, Quad, Long (12)
• Vertical: Single, Dual, Hex, Long (18)
• Diagonal: Single, Dual, Hex
Different Architectures:
6 Input LUTs
• 6-input LUT can be two 5-input LUTs with common inputs
• Minimal speed impact to a 6-input LUT
• One or two outputs
• Any function of six variables or two independent functions of five variables

[Figure: a 6-LUT built from two 5-LUTs sharing inputs A1 to A5, with A6 as the sixth input; outputs O6 and O5]
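The "one 6-LUT or two 5-LUTs with common inputs" idea can be sketched as a 64-bit table split into two 32-bit halves. This is an illustrative model, assuming that A6 selects between the two halves for O6 and that O5 always comes from the lower half; the function names and bit ordering are hypothetical.

```python
def lut6_2(init, a6, a5, a4, a3, a2, a1):
    """Return (O6, O5) for a 64-bit table treated as two 32-bit 5-LUTs."""
    low_addr = (a5 << 4) | (a4 << 3) | (a3 << 2) | (a2 << 1) | a1
    lower = (init >> low_addr) & 1          # 5-LUT held in bits 0..31
    upper = (init >> (32 + low_addr)) & 1   # 5-LUT held in bits 32..63
    o6 = upper if a6 else lower             # A6 selects between the two 5-LUTs
    o5 = lower
    return o6, o5

def table5(func):
    """Build the 32-bit table for a 5-input Boolean function."""
    init = 0
    for addr in range(32):
        bits = [(addr >> k) & 1 for k in (4, 3, 2, 1, 0)]
        init |= (func(*bits) & 1) << addr
    return init

and5 = lambda a5, a4, a3, a2, a1: a5 & a4 & a3 & a2 & a1
xor5 = lambda a5, a4, a3, a2, a1: a5 ^ a4 ^ a3 ^ a2 ^ a1

# Two independent 5-input functions in one LUT: XOR in the upper half, AND in the lower
init = (table5(xor5) << 32) | table5(and5)
print(lut6_2(init, 1, 1, 0, 0, 0, 0))
```

With A6 held high, O6 delivers the 5-input XOR and O5 delivers the 5-input AND from the same five shared inputs. Driving A6 as a real signal instead turns the same table into a single 6-input function on O6.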
Different Architectures:
Slice Structure with 4 LUTs
• Four six-input Look Up Tables (LUT)
• Wide multiplexers
• Carry chain
• Four flip-flop/latches
• Four additional flip-flops
• The implementation tools (MAP) are responsible for packing slice resources into the slice

[Figure: slice containing four LUT/RAM/SRL blocks]
More Detailed Look at Flip Flops
• All flip-flops are D type
• All flip-flops have a single clock input (CLK)
– Clock can be inverted at the slice boundary
• All flip-flops have an active high chip enable (CE)
• All flip-flops have an active high SR input
– Input can be synchronous or asynchronous, as determined by the configuration bit stream
– Sets the flip-flop value to a pre-determined state, as determined by the configuration bit stream

[Figure: two D flip-flops, each with D, CE, CK, and SR inputs and a Q output]
Asynchronous Reset
• To infer asynchronous resets, the reset signal must be in the sensitivity list of the process
• Output takes reset value immediately
– Even if clock is not present
• SRVAL attribute is determined by reset value in RTL code

VHDL:
FF: process (CLK, RST)
begin
  if (RST = '1') then
    Q <= '0';                   -- SRVAL
  elsif rising_edge(CLK) then
    Q <= D;
  end if;
end process FF;

Verilog:
always @(posedge CLK or posedge RST)
begin
  if (RST)
    Q <= 1'b0;                  // SRVAL
  else
    Q <= D;
end
Using Asynchronous Resets
• Deassertion of reset should be synchronous to the clock
• Not synchronizing the deassertion of reset can create problems
– Flip-flops can go metastable
– Not all flip-flops are guaranteed to come out of reset on the same clock
• Use a reset bridge to synchronize reset to each domain

[Figure: reset bridge. rst_pin drives the SR inputs of two flip-flops clocked by clkA, with SR configured as asynchronous and SRVAL=1. The first flip-flop has D tied to 0 and feeds the second, whose Q output is rst_clkA.]
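The reset bridge asserts reset immediately but releases it only on a clock edge. A cycle-level sketch of that behavior follows. It is a simplified model, assuming two flip-flops with asynchronous set (SRVAL=1) and the first D input tied to 0 as in the figure; the class and method names are hypothetical, and metastability itself is not modeled.

```python
class ResetBridge:
    """Two flip-flops with asynchronous set (SRVAL=1); first D input tied to 0."""

    def __init__(self):
        self.ff1 = 1
        self.ff2 = 1  # drives rst_clkA

    def async_reset(self):
        # rst_pin asserted: both flip-flops take SRVAL immediately, no clock needed
        self.ff1 = 1
        self.ff2 = 1

    def clock_edge(self, rst_pin):
        # Rising edge of clkA; the asynchronous set overrides the clocked update
        if rst_pin:
            self.async_reset()
        else:
            self.ff1, self.ff2 = 0, self.ff1
        return self.ff2

bridge = ResetBridge()
bridge.async_reset()
# rst_pin is held for two edges, then released
trace = [bridge.clock_edge(rst_pin) for rst_pin in (1, 1, 0, 0, 0)]
print(trace)
```

In the trace, rst_clkA stays high for one more clock edge after rst_pin is released and then drops on an edge of clkA, so every flip-flop in that domain sees the release in step with its own clock.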
Synchronous Reset
• A synchronous reset will not take effect until the first active clock edge after the assertion of the RST signal
• The RST pin of the flip-flop is a regular timing path endpoint
– The timing path ending at the RST pin will be covered by a PERIOD constraint on the clock

VHDL:
FF: process (CLK)
begin
  if rising_edge(CLK) then
    if (RST = '1') then
      Q <= '0';                 -- SRVAL
    else
      Q <= D;
    end if;
  end if;
end process FF;

Verilog:
always @(posedge CLK)
begin
  if (RST)
    Q <= 1'b0;                  // SRVAL
  else
    Q <= D;
end
Chip Enable
• All flip-flops in the 7 series FPGAs have a chip enable (CE) pin
– Active high, synchronous to CLK
– When asserted, the flip-flop clocks in the D input
– When not asserted, the flip-flop holds the current value
• Inferred naturally from RTL code

VHDL:
FF: process (CLK)
begin
  if rising_edge(CLK) then
    if (CE = '1') then
      Q <= D;
    end if;
  end if;
end process FF;

Verilog:
always @(posedge CLK)
begin
  if (CE)
    Q <= D;
end
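The hold-or-load behavior of the chip enable can be traced edge by edge. This is a minimal behavioral sketch, not a timing model; the function name `dff_ce` and the input sequence are made up for illustration.

```python
def dff_ce(q, d, ce):
    """Next Q at a rising clock edge for a D flip-flop with active-high CE."""
    return d if ce else q

q = 0
history = []
# One (D, CE) pair per rising clock edge
for d, ce in [(1, 0), (1, 1), (0, 0), (0, 1)]:
    q = dff_ce(q, d, ce)
    history.append(q)
print(history)
```

Q changes only on the edges where CE is high; on the other edges the flip-flop keeps its value regardless of D, which is the behavior the RTL infers by omitting the else branch.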
LUTs can also be used as RAM
• Uses the same storage that is used for the look-up table function
• Synchronous write, asynchronous read
– Can be converted to synchronous read using the flip-flops available in the slice
• Various configurations
– Single port: One LUT6 = 64x1 or 32x2 RAM; cascadable up to 256x1 RAM
– Dual port (D): 1 read / write port + 1 read-only port
– Simple dual port (SDP): 1 write-only port + 1 read-only port
– Quad-port (Q): 1 read / write port + 3 read-only ports
• Each port has independent address inputs

Available configurations:
– Single Port: 32x2, 32x4, 32x6, 32x8, 64x1, 64x2, 64x3, 64x4, 128x1, 128x2, 256x1
– Dual Port: 32x2D, 32x4D, 64x1D, 64x2D, 128x1D
– Simple Dual Port: 32x6SDP, 64x3SDP
– Quad Port: 32x2Q, 64x1Q
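The "synchronous write, asynchronous read" behavior of LUT RAM can be sketched as follows. This is a behavioral model of a single-port 64x1 configuration only; the class name `LutRam64x1` and its methods are hypothetical and do not correspond to a vendor primitive.

```python
class LutRam64x1:
    """Single-port 64x1 RAM: write on the clock edge, read combinationally."""

    def __init__(self):
        self.bits = [0] * 64  # the same 64 bits a LUT6 uses for its table

    def read(self, addr):
        # Asynchronous read: output follows the address with no clock
        return self.bits[addr]

    def clock_edge(self, addr, data, we):
        # Synchronous write: storage changes only at a clock edge with WE high
        if we:
            self.bits[addr] = data & 1

ram = LutRam64x1()
before = ram.read(5)
ram.clock_edge(addr=5, data=1, we=1)
after = ram.read(5)
print(before, after)
```

Registering the value returned by `read` in one of the slice flip-flops would turn this into the synchronous-read variant mentioned in the bullets.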
Block RAMs
(In built Memory)
Single-Port Block RAM
• Single read/write port
– Clock: CLKA
– Address: ADDRA
– Write enable: WEA
– Write data: DIA
– Read data: DOA
• 36-kbit configurations
– 32k x 1, 16k x 2, 8k x 4, 4k x 9, 2k x 18, 1k x 36
• 18-kbit configurations
– 16k x 1, 8k x 2, 4k x 4, 2k x 9, 1k x 18, 512 x 36
• Configurable write mode
– WRITE_FIRST: Data written on DIA is available on DOA
– READ_FIRST: Old contents of RAM at ADDRA is presented on DOA
– NO_CHANGE: The DOA holds its previous value (saves power)

[Figure: 36 Kb Memory Array with Port A signals ADDRA, DIA (36 bits), WEA (4 bits), CLKA, and DOA (36 bits)]
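The three write modes differ only in what appears on DOA during a write cycle. The sketch below models that difference for a single port. It is a simplified behavioral model, assuming the port is always enabled and ignoring byte-wide write enables; the class name `SinglePortBram` and helper `write_over_old` are hypothetical.

```python
class SinglePortBram:
    """Single read/write port with a configurable write mode."""

    def __init__(self, depth, write_mode):
        if write_mode not in ("WRITE_FIRST", "READ_FIRST", "NO_CHANGE"):
            raise ValueError(write_mode)
        self.mem = [0] * depth
        self.write_mode = write_mode
        self.doa = 0

    def clock_edge(self, addra, wea, dia=0):
        if wea:
            if self.write_mode == "WRITE_FIRST":
                self.doa = dia               # new data appears on DOA
            elif self.write_mode == "READ_FIRST":
                self.doa = self.mem[addra]   # old contents appear on DOA
            # NO_CHANGE: DOA keeps its previous value
            self.mem[addra] = dia
        else:
            self.doa = self.mem[addra]       # ordinary read
        return self.doa

def write_over_old(mode):
    ram = SinglePortBram(16, mode)
    ram.clock_edge(3, 1, 0xAA)         # store an old value at address 3
    ram.clock_edge(0, 0)               # read address 0, so DOA becomes 0
    return ram.clock_edge(3, 1, 0xBB)  # overwrite address 3 and observe DOA

results = {m: write_over_old(m) for m in ("WRITE_FIRST", "READ_FIRST", "NO_CHANGE")}
print(results)
```

In all three modes the memory ends up holding the new value; only the DOA output during the write cycle differs.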
Summary of Block RAM Configurations
• Single Port
– 18kbit: 16Kx1, 8Kx2, 4Kx4, 2Kx9, 1Kx18
– 36kbit: 32Kx1, 16Kx2, 8Kx4, 4Kx9, 2Kx18, 1Kx36
– 1 read/write port
– Read OR write in 1 cycle
• True Dual Port
– 18kbit: 16Kx1, 8Kx2, 4Kx4, 2Kx9, 1Kx18
– 36kbit: 32Kx1, 16Kx2, 8Kx4, 4Kx9, 2Kx18, 1Kx36
– Two fully independent read/write ports
– Any two operations in 1 cycle
• Simple Dual Port
– 18kbit: 16Kx1, 8Kx2, 4Kx4, 2Kx9, 1Kx18, 512x36
– 36kbit: 32Kx1, 16Kx2, 8Kx4, 4Kx9, 2Kx18, 1Kx36, 512x72
– 1 read port and 1 write port
– Read AND write in 1 cycle
SelectI/O
• SelectI/O Allows Connection Directly to External Signals of Varied Voltages & Thresholds
• Future Standards Can be Supported Without Having to Make Silicon Changes

[Figure: SelectI/O connecting to 5.0V, 1.8V, 3.3V, and 2.5V signals and to PCI, SSTL, HSTL, GTL, GTL+, and AGP interfaces]
SelectI/O
• Allows Connection & Use of a Wide Variety of Devices
• Processors, Memory, Bus Specific Standards, Mixed Signal...
• Provides Industry Standard IEEE/JEDEC I/O Standards
• Maximizes Speed/Noise Tradeoff - Use Only What is Needed
• Can Connect to or Create High Performance Backplanes
• PCI, GTL+, HSTL
• DIY - Virtex Based Backplane Design in Progress
• Define I/O by Simply Placing Desired Input And/Or Output
Buffers Into the Design
• Special IBUF and OBUF Components Provided in Schematic Based and
HDL Based Design Flows
• For Example: SSTL3, Class I Output Buffer - OBUF_SSTL3_I
Simplified IOB Structure
• Fast I/O Drivers
• Separate Registers for Input, Output & Three-State Control
• Asynchronous Set or Reset Available on Each Flip-flop
• Common Clock, Separate Clock Enables
• Programmable Slew Rate, Pullup, Input Delay, Etc
• Selectable I/O Standard Support
• Supported Standards List can be Updated After Testing

[Figure: three DFF/LATCH registers (D, CE, S/R, Q) connected to the PAD]
How It Works
[Figure: Configuration Bits set the I/O standard. SelectI/O Output: OBUF_SSTL3_I, an SSTL3 Class1 Output Driver. SelectI/O Input: IBUF_SSTL3_I, an SSTL3 Class1 Input Receiver.]
Xilinx 7 Series
• Lowest Power and Cost
Compared to Spartan-6:
– 30% more performance
– Lower system cost
– 50% less power
– 30% smaller footprint
• Industry’s Best Price-Performance (“New Class of FPGA”)
Compared to Virtex-6:
– Comparable performance with 50% lower cost for 2x better price-performance
– 50% less power
Compared to Spartan-6:
– 3.3x larger
– Over 2x performance with 4x transceiver speed
– Superior price-performance
• Industry’s Highest System Performance and Capacity
Compared to Virtex-6:
– 2.5x larger (2M LCs)
– 50% higher performance
– 50% lower power
– 2x line rate (28 Gb/s)
– Similar EasyPath™ cost reduction
7 Series FPGA Layout
• Similar Floorplan to Virtex-6 FPGAs
– Provides easy migration to 7 series FPGAs
• CMT columns moved from center of device to adjacent to I/O columns
– No more inner vs. outer column performance difference
– Support for higher performance interfaces
• Only one I/O column per half device
– Uniform skew from center of device
• GT columns replace I/O and CMT in smaller devices
• GT columns not always present

[Figure: device floorplan with legend: I/O Columns, CMT Columns, Clock Routing, CLB, Block RAM, DSP Columns, GT Columns]
7 Series Slice Structure
• Four six-input Look Up Tables (LUT)
• Wide multiplexers
• Carry chain
• Four flip-flop/latches
• Four additional flip-flops
• The implementation tools (MAP) are responsible for packing slice resources into the slice

[Figure: slice containing four LUT/RAM/SRL blocks]
7-Series I/O Block Diagram
[Figure: Logical Resources and Electrical Resources of a differential I/O pair. The Master (P) and Slave (N) sides each have ILOGIC/ISERDES, IDELAY, OLOGIC/OSERDES, and ODELAY blocks between the Interconnect to FPGA Fabric and the pads, with LVDS and Termination between P and N.]
7 Series FPGAs DSP
• 7 series FPGAs DSP slice 100% based on Virtex-6 FPGA DSP48E1
• 25x18 multiplier
• 25-bit pre-adder
• Flexible pipeline
• Cascade in and out
• Carry in and out
• 96-bit MACC
• SIMD support
• 48-bit ALU
• Pattern detect
• 17-bit shifter
• Dynamic operation (cycle by cycle)

Highly Capable, Dedicated DSP Logic in Every 7 Series FPGA
7-Series Gigabit Transceivers
[Figure: Tx and Rx paths, each with PMA and PCS blocks, between a 2-wire serial link and the FPGA Fabric Interface]

• Dedicated parallel-to-serial transmitter and serial-to-parallel receiver
– Unidirectional, differential bit-serial data I/O
– Integrated PLL-based Clock and Data Recovery (CDR)
• Parallel interface to the FPGA internal fabric
– Width varies by family, protocol, and line rate from 8 to 40 bits
• Serial interface to the printed circuit board (differential signaling)
– Differential Current Mode Logic (CML)
– Two traces for the transmitter and two traces for the receiver; removes common-mode noise
