01 - 01 PCI Express Basics & Background
Richard Solomon
PCI-SIG® PWG Member
Synopsys
Acknowledgements
Thanks are due to Ravi Budruk, Mindshare, Inc. for legacy material on PCI Express® Basics
PCI Express Background
Revolutionary AND Evolutionary
• PCI™ (1992/1993)
• Revolutionary
• Plug and Play jumperless configuration (BARs)
• Unprecedented bandwidth
• 32-bit / 33MHz – 133MB/sec
• 64-bit / 66MHz – 533MB/sec
• Designed from day 1 for bus-mastering adapters
• Evolutionary
• System BIOS maps devices then operating systems boot and run without further knowledge of PCI
• PCI-aware O/S could gain improved functionality
• PCI 2.1 (1995) doubled bandwidth with 66MHz mode
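The parallel-bus figures above follow from bus width times clock rate. A minimal sketch, assuming one transfer per clock and the nominal 33.33 MHz and 66.66 MHz PCI clocks (the function name is illustrative, not from any specification):

```python
def peak_bandwidth_mb_per_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak rate of a parallel bus that moves one bus-wide word per clock."""
    return bus_width_bits / 8 * clock_mhz


# Nominal PCI clocks are 33.33 MHz and 66.66 MHz.
pci_32_33 = peak_bandwidth_mb_per_s(32, 100 / 3)  # about 133 MB/sec
pci_64_66 = peak_bandwidth_mb_per_s(64, 200 / 3)  # about 533 MB/sec
print(round(pci_32_33), round(pci_64_66))
```

These are peak numbers; real buses lose some of this to arbitration, address phases, and wait states.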
Revolutionary AND Evolutionary
• PCI-X™ (1999)
• Revolutionary
• Unprecedented bandwidth
• Up to 1066MB/sec with 64-bit / 133MHz
• Registered bus protocol
• Eased electrical timing requirements
• Brought split transactions into PCI “world”
• Evolutionary
• PCI compatible at hardware *AND* software levels
• PCI-X 2.0 (2003) doubled bandwidth
• 2133MB/sec at PCI-X 266 and 4266MB/sec at PCI-X 533
Revolutionary AND Evolutionary
• PCI Express – aka PCIe® (2002)
• Revolutionary
• Unprecedented bandwidth
• x1: up to 4GB/sec in *EACH* direction (PCIe 5.0)
• x16: up to 64GB/sec in *EACH* direction (PCIe 5.0)
• “Relaxed” electricals due to serial bus architecture
• Point-to-point, low voltage, dual simplex with embedded clocking
• Evolutionary
• PCI compatible at software level
• Configuration space, Power Management, etc.
• Of course, PCIe-aware O/S can get more functionality
• Transaction layer familiar to PCI/PCI-X designers
• System topology matches PCI/PCI-X
• Doubling of bandwidth each generation (from 250MB/s/lane):
• PCIe 2.0 (2006) 500MB/s/lane
• PCIe 3.0 (2010) ~1GB/s/lane
• PCIe 4.0 (2017) ~2GB/s/lane
• PCIe 5.0 (2019) ~4GB/s/lane
PCI Concepts
Address Spaces – Memory & I/O
• Memory space mapped cleanly to CPU semantics
• 32-bits of address space initially
• 64-bits introduced via Dual-Address Cycles (DAC)
• Extra clock of address time on PCI/PCI-X
• 4 DWORD header in PCI Express
• Burstable
• I/O space mapped cleanly to CPU semantics
• 32-bits of address space
• Actually much larger than CPUs of the time
• Non-burstable
• Most PCI implementations didn’t support
• PCI-X codified
• Carries forward to PCI Express
Address Spaces – Configuration
• Configuration space???
• Allows control of devices’ address decodes without conflict
• No conceptual mapping to CPU address space
• Memory-based access mechanisms in PCI-X and PCIe
• Bus / Device / Function (aka BDF) form hierarchy-based address (PCIe 3.0 calls this “Routing ID”)
• “Functions” allow multiple, logically independent agents in one physical device
• E.g. combination SCSI + Ethernet device
• 256 bytes or 4K bytes of configuration space per Function
• PCI/PCI-X bridges form hierarchy
• PCIe switches form hierarchy
• Look like PCI-PCI bridges to software
• “Type 0” and “Type 1” configuration cycles
• Type 0: to same bus segment
• Type 1: to another bus segment
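To make the Bus / Device / Function address concrete, here is a minimal sketch of how a Routing ID is packed and how PCIe's memory-based configuration mechanism (ECAM) turns it into an offset. It assumes the conventional 8-bit bus, 5-bit device, 3-bit function split (no ARI); the function names are illustrative, and the ECAM base address is platform-specific and not modeled.

```python
def routing_id(bus: int, device: int, function: int) -> int:
    """Pack Bus/Device/Function into a 16-bit Routing ID (8/5/3 bits)."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("BDF out of range")
    return (bus << 8) | (device << 3) | function


def ecam_offset(bus: int, device: int, function: int, register: int = 0) -> int:
    """Offset from the platform's ECAM base: 4 KB of config space per function."""
    if not 0 <= register < 4096:
        raise ValueError("register offset out of range")
    return (routing_id(bus, device, function) << 12) | register


print(hex(routing_id(2, 3, 1)))         # 0x219
print(hex(ecam_offset(2, 3, 1, 0x10)))  # 0x219010
```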
Configuration Space (cont’d)
[Figure: bus hierarchy below the processors, built from four PCI-to-PCI bridges]
PCI-to-PCI Bridge     Primary  Secondary  Subordinate
Top row, first        0        1          3
Top row, second       4        5          5
Bottom row, first     1        2          2    (PCI Bus 2)
Bottom row, second    1        3          3    (PCI Bus 3)
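The Primary / Secondary / Subordinate numbers are what a bridge uses to decide what to do with a Type 1 configuration cycle arriving on its primary side. A minimal sketch using the bus numbers shown for the bridges above; the `Bridge` type and the returned strings are illustrative names only:

```python
from typing import NamedTuple


class Bridge(NamedTuple):
    primary: int
    secondary: int
    subordinate: int


def route_config(bridge: Bridge, target_bus: int) -> str:
    """Decide how a bridge handles a Type 1 cycle seen on its primary bus."""
    if target_bus == bridge.secondary:
        return "type0"   # target is directly below: convert to Type 0
    if bridge.secondary < target_bus <= bridge.subordinate:
        return "type1"   # target is deeper in the hierarchy: forward as Type 1
    return "ignore"      # target is not behind this bridge


top_first = Bridge(primary=0, secondary=1, subordinate=3)
bottom_second = Bridge(primary=1, secondary=3, subordinate=3)

print(route_config(top_first, 1))      # type0
print(route_config(top_first, 3))      # type1
print(route_config(top_first, 5))      # ignore
print(route_config(bottom_second, 3))  # type0
```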
Configuration Space
• Device Identification
• VendorID: PCI-SIG® assigned
• DeviceID: Vendor self-assigned
• Subsystem VendorID: PCI-SIG
• Subsystem DeviceID: Vendor
• Address Decode controls
• Software reads/writes BARs to determine required size and maps appropriately
• Memory, I/O, and bus-master enables
• Other bus-oriented controls
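The BAR sizing step can be sketched as follows. `FakeBar` is a hypothetical software model of a 32-bit memory BAR (not a real driver API): address bits below the region size are hardwired to zero, and the low four bits are read-only type flags. Real software would also typically disable decoding while probing, which is not modeled here.

```python
class FakeBar:
    """Hypothetical model of a 32-bit memory BAR; size must be a power of two >= 16."""

    def __init__(self, size: int, flags: int = 0x0):
        self.size = size
        self.flags = flags & 0xF          # read-only type/prefetchable bits
        self.value = self.flags

    def write(self, value: int) -> None:
        # Only address bits at or above the region size are writable.
        self.value = (value & ~(self.size - 1) & 0xFFFFFFFF) | self.flags

    def read(self) -> int:
        return self.value


def bar_size(bar: FakeBar) -> int:
    """Write all 1s, read back, restore, and decode the size from the result."""
    original = bar.read()
    bar.write(0xFFFFFFFF)
    probe = bar.read()
    bar.write(original)                   # put the original mapping back
    mask = probe & 0xFFFFFFF0             # drop the low flag bits
    return (~mask + 1) & 0xFFFFFFFF


bar = FakeBar(size=0x100000, flags=0x8)   # 1 MB, prefetchable
bar.write(0xE0000000)                     # address chosen for illustration
print(hex(bar_size(bar)))                 # 0x100000
```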
Configuration Space – Capabilities List
• Linked list
• Follow the list! Cannot assume fixed location of any given feature in any given device
• Features defined in their related specs:
• PCI-X
• PCIe
• PCI Power Management
• Etc.
Capability structure layout:
Bits 31:16          Bits 15:8                    Bits 7:0
Feature-specific    Pointer to Next Capability   Capability ID    (Dword 0)
Feature-specific Configuration Registers                          (Dword 1 to Dword n)
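"Follow the list" can be sketched as below, assuming the standard header layout: the Capabilities List bit is bit 4 of the Status register (offset 06h), the first pointer lives at offset 34h, and each capability starts with an ID byte followed by a next-pointer byte. The sample configuration image is fabricated purely for illustration.

```python
def walk_capabilities(cfg: bytes) -> list:
    """Return (offset, capability ID) pairs found in a 256-byte config image."""
    found = []
    if not cfg[0x06] & 0x10:          # Status bit 4: Capabilities List present
        return found
    ptr = cfg[0x34] & 0xFC            # pointers are DWORD aligned
    seen = set()
    while ptr and ptr not in seen:    # 0 terminates; 'seen' guards against loops
        seen.add(ptr)
        found.append((ptr, cfg[ptr]))
        ptr = cfg[ptr + 1] & 0xFC
    return found


# Illustrative image: Power Management (01h) -> MSI (05h) -> PCI Express (10h).
cfg = bytearray(256)
cfg[0x06] = 0x10
cfg[0x34] = 0x40
cfg[0x40:0x42] = bytes([0x01, 0x50])
cfg[0x50:0x52] = bytes([0x05, 0x70])
cfg[0x70:0x72] = bytes([0x10, 0x00])

caps = walk_capabilities(cfg)
print([(hex(off), hex(cap_id)) for off, cap_id in caps])
```

Note that the offsets 40h, 50h, and 70h are arbitrary here, which is the point: software has to discover them by walking the list.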
Configuration Space – Extended Capabilities List
• Linked list – new with PCI Express
• Follow the list! Cannot assume fixed location of any given feature in any given device
• First entry in list is *always* at 100h
• Features defined in PCI Express and related (e.g. MR-IOV, SR-IOV) specifications
• Consolidated in PCI Code and ID Assignment Spec
Extended capability structure layout:
Bits 31:20                   Bits 19:16   Bits 15:0
Pointer to Next Capability   Version      Capability ID    (Dword 0)
Feature-specific Configuration Registers                   (Dword 1 to Dword n)
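The extended list is walked the same way, but with a 32-bit header and a fixed starting offset of 100h. A minimal sketch, assuming a 4 KB little-endian configuration image; the helper names and the sample contents (AER, ID 0001h, followed by SR-IOV, ID 0010h) are for illustration only.

```python
def put_header(cfg: bytearray, offset: int, cap_id: int, version: int, nxt: int) -> None:
    """Write one extended capability header (helper for building the sample)."""
    header = (nxt << 20) | (version << 16) | cap_id
    cfg[offset:offset + 4] = header.to_bytes(4, "little")


def walk_extended_capabilities(cfg: bytes) -> list:
    """Return (offset, capability ID, version) tuples starting at 100h."""
    found = []
    offset = 0x100
    seen = set()
    while offset and offset not in seen and offset + 4 <= len(cfg):
        seen.add(offset)
        header = int.from_bytes(cfg[offset:offset + 4], "little")
        if header in (0, 0xFFFFFFFF):      # no extended capabilities present
            break
        cap_id = header & 0xFFFF           # bits 15:0
        version = (header >> 16) & 0xF     # bits 19:16
        found.append((offset, cap_id, version))
        offset = (header >> 20) & 0xFFC    # bits 31:20, DWORD aligned
    return found


cfg = bytearray(4096)
put_header(cfg, 0x100, cap_id=0x0001, version=2, nxt=0x148)
put_header(cfg, 0x148, cap_id=0x0010, version=1, nxt=0x000)

ext_caps = walk_extended_capabilities(cfg)
print([(hex(o), hex(i), v) for o, i, v in ext_caps])
```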
Interrupts
• PCI introduced INTA#, INTB#, INTC#, INTD# - collectively referred to as INTx
• Level sensitive
• Decoupled device from CPU interrupt
• System controlled INTx to CPU interrupt mapping
• Configuration registers
• report A/B/C/D
• programmed with CPU interrupt number
• PCI Express mimics this via “virtual wire” messages
• Assert_INTx and Deassert_INTx
What are MSI and MSI-X?
• Memory Write replaces previous interrupt semantics
• PCI and PCI-X devices stop asserting INTA/B/C/D and PCI Express devices stop sending Assert_INTx messages once MSI or MSI-X mode is enabled
• MSI uses one address with a variable data value indicating which “vector” is asserting
• MSI-X uses a table of independent address and data pairs for each “vector”
• NOTE: Boot devices and any device intended for a non-MSI operating system generally must still support the appropriate INTx signaling!
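The difference between the two schemes can be sketched as the memory write each one produces. This is a simplified model: for MSI, the device replaces the low-order bits of the programmed data value with the vector number; for MSI-X, each vector has its own table entry. The function names, addresses, and data values are made up for illustration, and masking and pending bits are not modeled.

```python
def msi_message(address: int, data: int, enabled_vectors: int, vector: int):
    """MSI: one address, vector encoded in the low bits of the 16-bit data value."""
    if not 1 <= enabled_vectors <= 32 or enabled_vectors & (enabled_vectors - 1):
        raise ValueError("MSI vector count must be a power of two up to 32")
    if not 0 <= vector < enabled_vectors:
        raise ValueError("vector not enabled")
    return address, (data & ~(enabled_vectors - 1) & 0xFFFF) | vector


def msix_message(table: list, vector: int):
    """MSI-X: each vector has an independent (address, data) pair."""
    return table[vector]


print(msi_message(0xFEE00000, 0x4040, enabled_vectors=4, vector=3))

msix_table = [(0xFEE00000, 0x41), (0xFEE01000, 0x52)]
print(msix_message(msix_table, 1))
```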
Split Transactions – Background
• PCI commands contained no length
• Bus allowed disconnects and retries
• Difficult data management for target device
• Writes overflow buffers
• Reads require pre-fetch
• How much to pre-fetch? When to discard? Prevent stale data?
Split Transactions
• PCI-X commands added length and Routing ID of initiator
• Writes: allow target device to allocate buffers
• Reads: Pre-fetch now deterministic
• PCI-X retains “retry” & “disconnect”, adds “split”
• Telephone analogy
• Retry: “I’m busy go away”
• Delayed transactions are complicated
• Split: “I’ll call you back”
• Simple
• More efficient
Benefits of Split Transactions
[Figure: two bar charts, “Bandwidth Usage with Conventional PCI Protocols” and “Bandwidth Usage with PCI-X Enhancements”. Vertical axis: bandwidth in MegaBytes/sec and as a percentage. Legend: Idle Time (unused BW), System Overhead (scheduling), Transaction Overhead (addressing and routing).]
PCI Express Basics
PCI Express Features
• Dual Simplex point-to-point serial connection
• Independent transmit and receive sides
• Scalable Link Widths
• x1, x2, x4, x8, x12, x16, x32
• Scalable Link Speeds
• 2.5, 5.0, 8.0, 16.0, 32.0 GT/s
• Packet based transaction protocol
[Figure: PCIe Device A and PCIe Device B connected by a Link (x1, x2, x4, x8, x12, x16 or x32), with a Packet travelling in each direction]
PCI Express Terminology
[Figure: PCI Express Device A, with labels for Signal, Wire, Lane, and Link]
Upstream/Downstream
• Relative to root – up is towards, down is away
• Note “streamness” of devices vs their ports
• The direction a gremlin standing on the device looks…
PCI Express Throughput
Bandwidth (GB/s) by Link Width
Link speed                x1     x2     x4     x8     x16
“2.5 GT/s” (PCIe 1.0+)    0.25   0.5    1      2      4
“5 GT/s” (PCIe 2.0+)      0.5    1      2      4      8
“8 GT/s” (PCIe 3.0+)      ~1     ~2     ~4     ~8     ~16
“16 GT/s” (PCIe 4.0+)     ~2     ~4     ~8     ~16    ~32
“32 GT/s” (PCIe 5.0+)     ~4     ~8     ~16    ~32    ~64
“64 GT/s” (PCIe 6.0)      8      16     32     64     128
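The table values can be reproduced from the transfer rate and the line encoding. A minimal sketch, assuming 8b/10b encoding at 2.5 and 5 GT/s, 128b/130b at 8 through 32 GT/s, and no line-code overhead at 64 GT/s; the 64 GT/s figure is the raw rate and ignores the CRC, FEC, and DLP bytes carried inside each flit. All results are per direction.

```python
# (payload bits, line bits) per encoded symbol group for each link speed.
ENCODING = {
    2.5: (8, 10),
    5.0: (8, 10),
    8.0: (128, 130),
    16.0: (128, 130),
    32.0: (128, 130),
    64.0: (1, 1),
}


def link_bandwidth_gb_per_s(rate_gt_s: float, lanes: int) -> float:
    """Raw bandwidth in GB/s, one direction, before packet overhead."""
    payload_bits, line_bits = ENCODING[rate_gt_s]
    return rate_gt_s * payload_bits / line_bits / 8 * lanes


for rate in ENCODING:
    row = [round(link_bandwidth_gb_per_s(rate, w), 2) for w in (1, 2, 4, 8, 16)]
    print(rate, row)
```

The "~" entries in the table come from the 128/130 factor: an 8 GT/s lane carries about 0.985 GB/s rather than exactly 1.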
Additional Features
• Evolutionary PCI-compatible software model
• PCI configuration and enumeration software can be used to enumerate PCI Express hardware
• PCI Express system will boot “PCI” OS
• PCI Express supports “PCI” device drivers
• New additional configuration address space requires OS and driver update
• Advanced Error Reporting (AER)
• PCI Express Link Controls
PCI Express Topology
[Figure: example topology. Elements shown: CPU, Root Complex with internal Bus 0, Memory, PCIe Switches built from Virtual PCI Bridges, PCIe Endpoints, a Legacy Endpoint, and a PCIe Bridge to PCI/PCI-X. Links and buses carry labels such as PCIe 3, PCIe 4, PCIe 5, Bus 2, and Bus 8. Legend: PCI Express Device Downstream Port, PCI Express Device Upstream Port.]
Transaction Types, Address Spaces
Requests are translated to one of four transaction types by the Transaction Layer:
1. Memory Read or Memory Write. Used to transfer data from or to a memory mapped location.
– The protocol also supports a locked memory read transaction variant
2. I/O Read or I/O Write. Used to transfer data from or to an I/O location.
– These transactions are restricted to supporting legacy endpoint devices
3. Configuration Read or Configuration Write. Used to discover device capabilities, program features, and check status in the 4KB PCI Express configuration space.
4. Messages. Handled like posted writes. Used for event signaling and general purpose messaging.
Three Methods For Packet Routing
• Each request or completion header is tagged as to its type, and each of the packet types is routed based on one of three schemes:
• Address Routing
• ID Routing
• Implicit Routing
• Memory and IO requests use address routing
• Completions and Configuration cycles use ID routing
• Message requests have selectable routing based on a 3-bit code in the message routing sub-field of the header type field
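The routing rules above can be sketched as a small lookup. The packet-kind strings are shorthand chosen for this sketch (for example, "CfgRd" stands in for both configuration read types), and the message table lists the commonly defined 3-bit routing codes; treat it as illustrative rather than a complete decoder.

```python
from enum import Enum
from typing import Optional


class Routing(Enum):
    ADDRESS = "address"
    ID = "id"
    IMPLICIT = "implicit"


# 3-bit message routing code -> (scheme, meaning)
MESSAGE_ROUTING = {
    0b000: (Routing.IMPLICIT, "routed to Root Complex"),
    0b001: (Routing.ADDRESS, "routed by address"),
    0b010: (Routing.ID, "routed by ID"),
    0b011: (Routing.IMPLICIT, "broadcast from Root Complex"),
    0b100: (Routing.IMPLICIT, "local, terminate at receiver"),
    0b101: (Routing.IMPLICIT, "gathered and routed to Root Complex"),
}


def routing_for(kind: str, message_code: Optional[int] = None) -> Routing:
    """Pick the routing scheme for a packet kind (shorthand names)."""
    if kind in ("MRd", "MWr", "IORd", "IOWr"):
        return Routing.ADDRESS
    if kind in ("CfgRd", "CfgWr", "Cpl", "CplD"):
        return Routing.ID
    if kind == "Msg":
        return MESSAGE_ROUTING[message_code][0]
    raise ValueError(f"unknown packet kind: {kind}")


print(routing_for("MRd"), routing_for("CplD"), routing_for("Msg", 0b100))
```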
Programmed I/O Transaction
[Figure: two Processors on the FSB above the Root Complex (with DDR SDRAM); Switch A and Switch C sit below the Root Complex and lead to Endpoints. Arrows show the MRd and the CplD between the Root Complex and an Endpoint.]
Requester:
-Step 1: Root Complex (requester) initiates Memory Read Request (MRd)
-Step 4: Root Complex receives CplD
Completer:
-Step 2: Endpoint (completer) receives MRd
-Step 3: Endpoint returns Completion with data (CplD)
DMA Transaction
[Figure: two Processors on the FSB above the Root Complex (with DDR SDRAM); Switch A and Switch C sit below the Root Complex and lead to Endpoints. Arrows show the MRd and the CplD between an Endpoint and the Root Complex.]
Requester:
-Step 1: Endpoint (requester) initiates Memory Read Request (MRd)
-Step 4: Endpoint receives CplD
Completer:
-Step 2: Root Complex (completer) receives MRd
-Step 3: Root Complex returns Completion with data (CplD)
Peer-to-Peer Transaction
[Figure: two Processors on the FSB above the Root Complex (with DDR SDRAM); Switch A and Switch C sit below the Root Complex. Arrows show the MRd and the CplD passing through Switch A, the Root Complex, and Switch C.]
TLP Origin and Destination
[Figure: PCI Express Device A and PCI Express Device B connected by a Link]
TLP Structure (Non-FLIT Mode)
Information in core section of TLP comes from Software Layer / Device Core
TLP Structure – Flit vs Non-Flit
Non-Flit TLP:
Field   Start   Sequence   Header    Data Payload   ECRC   LCRC   End
Size    1B      2B         3-4 DW    0-1024 DW      1DW    1DW    1B
Flit (drawn as rows of 16 bytes):
TLP bytes 000-235
DLP bytes 0-5
CRC bytes 0-7
FEC bytes 0-5
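The field sizes above are enough to do some simple byte accounting. A minimal sketch, using only the sizes listed on this slide (the 1-byte Start and End framing is taken as given); the function name and the example packet are illustrative.

```python
def non_flit_tlp_bytes(header_dw: int, payload_dw: int, ecrc: bool = False) -> int:
    """Bytes on the link for one non-flit TLP, using the field sizes above."""
    if header_dw not in (3, 4) or not 0 <= payload_dw <= 1024:
        raise ValueError("header must be 3-4 DW, payload 0-1024 DW")
    start, sequence, lcrc, end = 1, 2, 4, 1
    return start + sequence + 4 * header_dw + 4 * payload_dw + (4 if ecrc else 0) + lcrc + end


# Flit layout from the slide: byte counts per region.
FLIT_BYTES = {"TLP": 236, "DLP": 6, "CRC": 8, "FEC": 6}

total = non_flit_tlp_bytes(header_dw=3, payload_dw=16)   # 64-byte payload
print(total, round(64 / total, 3))                       # bytes on link, payload efficiency
print(sum(FLIT_BYTES.values()), round(FLIT_BYTES["TLP"] / sum(FLIT_BYTES.values()), 3))
```

In non-flit mode the overhead is paid per TLP, so small payloads are relatively expensive; in flit mode the DLP, CRC, and FEC overhead is a fixed share of every flit.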
New TLP Header Formats for Flit Mode
• TLP Header is composed of a 3 to 7 DW TLP Header Base, followed by 0 to 7 additional DWs of “Orthogonal Header Content” (OHC)
• Single DW exceptions: NOP TLP and Reserved TLPs
• Fully-decoded 8b Packet Type field
• All 256 Type values are defined or earmarked to permit proper framing and forwarding
• First DW of the Header Base includes all info required to determine the full size of the TLP
• TLP Header Base, OHC, Payload, and Trailer
• Exception: Link-Local TLP Prefixes (rarely used)
• End-End TLP Prefixes are integrated into the Header
• Some are in architected OHC fields (e.g., PASID)
• Remaining ones become OHC-E
• Transaction Digest is replaced by a 0 to 5 DW “Trailer”
• Poison / nullify behavior is defined; two ways to poison
[Figures: First DW of FLIT Mode Header Base; 64-bit Address Routed TLP – FLIT Mode; Example FLIT Mode OHC DWs]
DLLP Origin and Destination
[Figure: PCI Express Device A and PCI Express Device B connected by a Link]
DLLP Structure (Non-FLIT Mode)
Bit transmit direction
Data Link Layer Payload (DLP) in FLIT Mode
• Each FLIT has 6 Bytes for DLP
• First 2 bytes: FLIT reliability information
• Ack/Nak/retry - latency optimized
• 10-bit sequence number – low latency Ack/Nak
• Prior FLIT NOP TLPs only to avoid retry on error
Every FLIT has Ack/Nak and credits => smaller queue sizes and low latency even on retry of FLIT
Non-Flit DLLP (Reference):
Field   Start   DLLP   CRC   End
Size    1B      4B     2B    1B
[Figure: Flit Structure]
Ordered-Set Origin and Destination
[Figure: PCI Express Device A and PCI Express Device B connected by a Link; an Ordered-Set is transmitted by the Physical Layer of one device and received by the Physical Layer of the other]
Ordered-Set Structure (Non-FLIT Mode)
COM Identifier Identifier Identifier
Ordered-Sets in FLIT Mode
• 64GT/s OS works with repetitions & handshake
• More complex rules and more variable formats compared to non-FLIT
• Transmitted between FLITs
• Formats permit receiver to distinguish OS from start of FLIT
PCI Express Flow Control
Credit-based flow control is point-to-point based, not end-to-end
[Figure: the Transmitter sends TLPs into the Receiver’s VC Buffer; the Receiver reports the buffer space available back to the Transmitter]
Receiver sends Flow Control Packets (FCP), which are a type of DLLP (Data Link Layer Packet), to provide the transmitter with credits so that it can transmit packets to the receiver
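The transmitter side of credit-based flow control can be sketched as a simple gate. This is a deliberately simplified model: it tracks one header credit pool and one data credit pool (one data credit covers 16 bytes), whereas real links keep separate pools per virtual channel and per traffic type and use wrapping counters. The class and method names are illustrative.

```python
import math


class CreditGate:
    """Simplified transmitter-side credit check for one credit pool."""

    DATA_CREDIT_BYTES = 16   # one data credit covers 4 DW of payload

    def __init__(self) -> None:
        self.header_limit = 0    # cumulative credits advertised by the receiver
        self.data_limit = 0
        self.header_used = 0     # cumulative credits consumed by sent TLPs
        self.data_used = 0

    def on_flow_control_update(self, header_limit: int, data_limit: int) -> None:
        """Apply the limits carried by a flow control DLLP from the receiver."""
        self.header_limit = header_limit
        self.data_limit = data_limit

    def try_send(self, payload_bytes: int) -> bool:
        """Consume credits and return True only if the receiver has room."""
        data_needed = math.ceil(payload_bytes / self.DATA_CREDIT_BYTES)
        if (self.header_used + 1 > self.header_limit
                or self.data_used + data_needed > self.data_limit):
            return False         # hold the TLP until more credits arrive
        self.header_used += 1
        self.data_used += data_needed
        return True


gate = CreditGate()
gate.on_flow_control_update(header_limit=2, data_limit=8)
first = gate.try_send(64)       # needs 4 data credits: fits
second = gate.try_send(128)     # needs 8 more: blocked, only 4 left
gate.on_flow_control_update(header_limit=4, data_limit=16)
third = gate.try_send(128)      # receiver freed buffer space: now fits
print(first, second, third)
```

Because the check happens on each link rather than end to end, a switch applies the same gate independently on every port.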
ACK/NAK Protocol Overview
[Figure: Device A (transmitter) and Device B (receiver) connected by a Link. In each Data Link Layer, a TLP from the Transaction Layer is given a Sequence number and LCRC, held in the Replay Buffer, and multiplexed with DLLPs onto the Link. On the receive side, traffic is de-multiplexed, TLPs pass an Error Check before going up to the Transaction Layer, and ACK/NAK DLLPs are sent back to the transmitter.]
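The transmitter's replay buffer behaviour can be sketched as follows. It assumes the non-flit 12-bit sequence number, and that both ACK and NAK carry the sequence number of the last TLP received correctly; replay timers and retry limits are left out. The class is illustrative, not an implementation from the specification.

```python
from collections import OrderedDict


class ReplayBuffer:
    """Simplified transmitter-side ACK/NAK handling."""

    SEQ_MODULO = 4096   # 12-bit sequence numbers

    def __init__(self) -> None:
        self.next_seq = 0
        self.pending = OrderedDict()   # seq -> TLP, oldest first

    def send(self, tlp):
        """Assign a sequence number and keep a copy until it is acknowledged."""
        seq = self.next_seq
        self.pending[seq] = tlp
        self.next_seq = (seq + 1) % self.SEQ_MODULO
        return seq, tlp

    def ack(self, seq: int) -> None:
        """Purge every TLP up to and including the acknowledged one."""
        if seq not in self.pending:
            return
        while self.pending:
            oldest, _ = self.pending.popitem(last=False)
            if oldest == seq:
                break

    def nak(self, seq: int) -> list:
        """Purge what was received, then return everything left for replay."""
        self.ack(seq)
        return list(self.pending.items())


rb = ReplayBuffer()
for name in ("a", "b", "c"):
    rb.send(name)
rb.ack(0)                 # TLP 0 arrived intact
replay = rb.nak(1)        # TLP 1 was fine, TLP 2 must be resent
print(replay)             # [(2, 'c')]
```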
ACK/NAK Protocol – FLIT Mode Impacts
• FEC Correction is applied *BEFORE* any error detection is performed
ECRC Overview
[Figure: TLP fields in order: Start, Sequence, Header, Data Payload, ECRC, LCRC, End]