Cisco Switching Hardware Architecture: What Makes a Cisco Switch
BRKRST-3069
www.ciscolivevirtual.com
Agenda
Overview
Concept
System Design
Mechanical / Physical Design
Buffer Design
Forwarding Design
ASIC Engineering
Hardware Engineering
Software Engineering
BRKRST-3069 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Overview
Timeline
Product Requirements Document
ASIC: Requirements, Plan, Micro Architecture, Implementation, Final Netlist, Power On
Hardware: HW Design, Detailed Design, Mechanical Drawing, PCB Layout, BOM, Fab Out, P0, P1, P2, A-0
Hardware disciplines and tests: Mechanical, Electrical, Manufacturing, MDVT, EDVT, RDT
Software: SW Functional Spec, SW Design Spec, Unit Test Plan, Unit Integration Plan
Software Test: Master Test Plans, Functional Test Plans, Automation, Regression, FCS
Nexus 7000 and F2 Modules
Concept
What customer problem will the product solve?
Vision
Technology
Market
Cost
Life Cycle
How big?
Time to Market
Differentiation
Innovation
How many ports?
Fixed vs Modular
Backward Compatibility
Nexus 7000 Vision (circa 2007)
Cisco's End-to-End Data Centre Switching platform, providing solutions for 10G, 40G, and 100G for Access, Aggregation, and Core.
Consolidate IP, Storage, and IPC networks onto a single Ethernet fabric and deliver innovative features and services that provide value to our customers.
DC Evolutionary Innovation
[Figure: roadmap from first customer ship to Phase 3]
Roadmap stages: FCS DC3, 2009 Phase 1, 2011 Phase 2, 2013 Phase 3
Labels on the roadmap: 80G Slot, Terabit Slot, 10G Aggregation, 10GbE Access, 10GbE Aggregation, 10G Access, 40 / 100G Aggregation, Unified Fabric
Cisco Internal Slide CY2007
F2 Series
High Level Goals
48 ports, 1/10G
Line rate at 64 bytes
Low latency
L2MP, TRILL, FEX, FCoE, L3 forwarding
Optimise for Data Centre
IPv4 and IPv6 equal performance
Cost target
System Design
Many Factors to Weigh
Applicable to any Switch / Router Design
Standards requirements
Market requirements
Designability
Silicon technology
Processor technology
Manufacturability
Time to market
Flexibility
Budget
Modular / Fixed
Many Factors to Weigh
Baseline Data Centre Switch Requirements
Data Plane:
  Buffering
  No packet drop
  Throughput
  Port count
  Modular
  No single point of failure
  In-order delivery
  Future protocol compatibility
Control Plane:
  Modular
  Restartable (including active-active state handling)
  Non-disruptive code load & activation
  No single point of failure
  Scaleable
  Unit testable
  Future protocol compatibility
Mechanical / Physical Design
Mechanical Design
[Figure: Nexus 7010 front and rear views with labelled components]
N7K-SUP1: Supervisor
N7K-M148GT-11: 48 x 1G BaseT
N7K-M132XP-12: 32 x 10G SFP+
N7K-C7010-FAB-1: Fabric
N7K-AC-6.0kW: Power Supply
Industrial Design / Usability
Ejectors
Buffer Design
One plus one does not equal Two
Switch A to Switch B over 10 links @ 1Gbps each:
  Bandwidth = 10Gbps
  Flow Bandwidth = 1Gbps
  Serialisation Delay = 20µs
Switch A to Switch B over 1 link @ 10Gbps:
  Bandwidth = 10Gbps
  Flow Bandwidth = 10Gbps
  Serialisation Delay = 2µs
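The arithmetic behind the two cases can be sketched in a few lines of Python. This is only an illustration: the 2,500-byte frame size is an assumption, chosen because it reproduces the 20µs and 2µs figures, the function names are made up for this sketch, and the per-flow cap assumes a flow is pinned to a single member link of the bundle.

```python
def serialisation_delay_us(frame_bytes: int, link_gbps: float) -> float:
    """Time to clock one frame onto a link, in microseconds."""
    bits = frame_bytes * 8
    bits_per_us = link_gbps * 1_000  # 1 Gbps is 1,000 bits per microsecond
    return bits / bits_per_us


def max_flow_gbps(link_gbps: float) -> float:
    """Assumption: a single flow stays on one member link, so it is capped at that link's speed."""
    return link_gbps


FRAME_BYTES = 2_500  # assumed frame size, picked to match the delays on the slide

for name, links, link_gbps in (("10 x 1G", 10, 1.0), ("1 x 10G", 1, 10.0)):
    print(
        name,
        "bandwidth:", links * link_gbps, "Gbps,",
        "flow bandwidth:", max_flow_gbps(link_gbps), "Gbps,",
        "serialisation delay:", serialisation_delay_us(FRAME_BYTES, link_gbps), "us",
    )
```

Both options offer the same aggregate bandwidth, but only the single 10G link gives one flow the full 10Gbps and the shorter serialisation delay.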
Single ASIC
Scalability limited by memory bandwidth/size
Typically optimised for fixed configuration
Cost effective with small port counts
Often used as building block
[Figure: a single ASIC connecting Inputs 1-3 to Outputs 1-3]
Switch Architecture
Clos / Fat Tree
Mesh
Crossbar
Complete System Pull Fabric
[Figure: ingress line card, fabric, and egress line card coordinated by a central arbiter, with numbered steps]
Arbiter exchange: Request, Grant, Credit
Traffic crosses the fabric in superframes
Queueing and scheduling blocks shown on ingress and egress: WRED (Weighted Random Early Detection), DWRR (Deficit Weighted Round Robin), SP (Strict Priority)
Forwarding Design
High Level View of Forwarding
Parse Packet
Table Lookups: L2 Table, L3 Table, Classification
Forwarding Decision
How fast?
Interface / Rate / Encoding
PCI Express v1 / 2.5 GT/s (2.5 Gbps SerDes) / 8b/10b
PCI Express v2 / 5 GT/s (5 Gbps SerDes) / 8b/10b
PCI Express v3 / 7.99 / 128b/130b
10G Ethernet / 10.3125 Gbps SerDes / 64b/66b
10G Ethernet = 14.88Mpps @ 64 Bytes: 67.2ns to receive a packet
100G Ethernet = 148.8Mpps @ 64 Bytes: 6.72ns to receive a packet
DDR3 latency ~10ns
SRAM latency 1 cycle
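The packet-rate figures on this slide can be reproduced with a short Python sketch. It assumes the standard Ethernet per-frame overhead of 8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap, which the slide does not spell out; the function names are illustrative.

```python
PREAMBLE_AND_SFD_BYTES = 8   # assumed: standard Ethernet preamble plus start-of-frame delimiter
INTER_FRAME_GAP_BYTES = 12   # assumed: standard minimum inter-frame gap


def wire_bits(frame_bytes: int = 64) -> int:
    """Bits a frame occupies on the wire, including preamble and inter-frame gap."""
    return (frame_bytes + PREAMBLE_AND_SFD_BYTES + INTER_FRAME_GAP_BYTES) * 8


def packets_per_second(link_gbps: float, frame_bytes: int = 64) -> float:
    """Maximum frame rate at line rate."""
    return link_gbps * 1e9 / wire_bits(frame_bytes)


def packet_time_ns(link_gbps: float, frame_bytes: int = 64) -> float:
    """Time budget per frame; bits divided by Gbit/s gives nanoseconds."""
    return wire_bits(frame_bytes) / link_gbps


def payload_rate_gbps(serdes_gbps: float, data_bits: int, coded_bits: int) -> float:
    """Usable rate after line encoding, e.g. 64b/66b carries 64 data bits in 66 coded bits."""
    return serdes_gbps * data_bits / coded_bits


for gbps in (10, 100):
    print(gbps, "G:", round(packets_per_second(gbps) / 1e6, 2), "Mpps,",
          round(packet_time_ns(gbps), 2), "ns per packet")
print("10G Ethernet payload rate:", payload_rate_gbps(10.3125, 64, 66), "Gbps")
```

With a DDR3 access costing roughly 10ns, a 67.2ns budget leaves room for only a handful of dependent memory accesses per packet at 10G, and less than one at 100G, which is why on-chip SRAM and pipelined lookups matter.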
10G Ethernet Forwarding Rate
Table Lookups: CAMs, Hash Tables and *Tries
[Figure: an input key is looked up in a CAM, a hash table, or a trie, and the lookup returns a result]
Example ternary CAM entries: 1: 01001010, 2: 010010XX, 3: 01001XX0, 4: 01001XXX
CAMs
Content Addressable Memory
  Lookup key: 01001110
  Entries: 1: 01101010, 2: 01101011, 3: 01001110, 4: 01101100
  Exact match on entry 3: Hit! The entry returns the Result
Ternary Content Addressable Memory
  Entries: 1: 01001010, 2: 010010XX, 3: 01001XX0, 4: 01001XXX (X matches either bit value)
  Lkup #1: 01001000 hits entry 2 and returns Result #1
  Lkup #2: 01001101 hits entry 4 and returns Result #2
  Lkup #3: 01001110 hits entry 3 and returns Result #3
Storing 1 bit in TCAM takes 10-12 transistors
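The CAM and TCAM examples above can be modelled in software as value/mask pairs. The sketch below is an illustration only: a real TCAM compares every entry in parallel in one cycle, whereas this loop walks the entries in order and returns the first hit; the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def compile_entry(pattern: str) -> Tuple[int, int]:
    """Turn a pattern such as '01001XX0' into (value, mask); a mask bit of 1 means 'must match'."""
    value = int(pattern.replace("X", "0"), 2)
    mask = int("".join("0" if ch == "X" else "1" for ch in pattern), 2)
    return value, mask


def cam_lookup(entries: List[Tuple[int, int]], key: str) -> Optional[int]:
    """Return the 1-based index of the first matching entry, or None on a miss."""
    k = int(key, 2)
    for index, (value, mask) in enumerate(entries, start=1):
        if (k & mask) == value:
            return index
    return None


# Entries taken from the slide; a plain CAM is the special case with no X bits.
CAM = [compile_entry(p) for p in ("01101010", "01101011", "01001110", "01101100")]
TCAM = [compile_entry(p) for p in ("01001010", "010010XX", "01001XX0", "01001XXX")]

print(cam_lookup(CAM, "01001110"))
print([cam_lookup(TCAM, key) for key in ("01001000", "01001101", "01001110")])
```

Because the first matching entry wins, more specific patterns have to be placed ahead of less specific ones, which is why entry 4 (01001XXX) sits last.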
Hash Tables
Input: MAC address 0000.c000.0001
A mathematical function (the hash) produces a value between 0 and the page size
The table is organised as several pages, each holding Page Size rows
Compare the entry at that row in each page against the input value
1 bit in SRAM takes 6 transistors; 1 bit in DRAM takes 1 transistor
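A paged hash table along these lines can be sketched as follows. This is a minimal model under stated assumptions: CRC-32 from Python's zlib stands in for the unspecified hardware hash function, the class and method names are invented for the example, and entries are never deleted.

```python
import zlib
from typing import List, Optional, Tuple

Entry = Tuple[str, str]  # (MAC address, result such as an egress port)


class PagedHashTable:
    def __init__(self, pages: int, page_size: int) -> None:
        self.page_size = page_size
        self.pages: List[List[Optional[Entry]]] = [[None] * page_size for _ in range(pages)]

    def _row(self, mac: str) -> int:
        # Assumption: CRC-32 replaces the real hash; the result is a row in 0..page_size-1.
        return zlib.crc32(mac.encode("ascii")) % self.page_size

    def insert(self, mac: str, result: str) -> bool:
        row = self._row(mac)
        for page in self.pages:
            if page[row] is None or page[row][0] == mac:
                page[row] = (mac, result)
                return True
        return False  # every page is already occupied at this row

    def lookup(self, mac: str) -> Optional[str]:
        row = self._row(mac)
        for page in self.pages:  # hardware would compare all pages in parallel
            entry = page[row]
            if entry is not None and entry[0] == mac:
                return entry[1]
        return None


table = PagedHashTable(pages=4, page_size=1024)
table.insert("0000.c000.0001", "Eth1/1")
print(table.lookup("0000.c000.0001"))
```

The number of pages bounds how many colliding addresses a row can hold, so an insert can fail even when the table as a whole is far from full.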
Tries
Many different *tries:
  Bitwise Trie
  Balanced Trie
  Patricia Trie
  Fixed or Variable Stride Tries
Store information in each leaf, or a pointer to a table with the information in it
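As an illustration of the simplest variant, the sketch below builds a bitwise trie (one bit per level) and uses it for longest-prefix match. It is a teaching model rather than a hardware design: the class name and the stored strings are hypothetical, and one trie instance is assumed to hold a single address family.

```python
import ipaddress
from typing import Dict, Optional


class BitwiseTrie:
    def __init__(self) -> None:
        self.root: Dict = {}

    def insert(self, prefix: str, info: str) -> None:
        net = ipaddress.ip_network(prefix)
        # Keep only the first prefixlen bits of the network address.
        bits = format(int(net.network_address), f"0{net.max_prefixlen}b")[: net.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["info"] = info  # information stored at the node where the prefix ends

    def longest_match(self, address: str) -> Optional[str]:
        addr = ipaddress.ip_address(address)
        bits = format(int(addr), f"0{addr.max_prefixlen}b")
        node = self.root
        best = node.get("info")  # a /0 default route, if present
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            best = node.get("info", best)  # remember the deepest prefix seen so far
        return best


trie = BitwiseTrie()
trie.insert("10.1.0.0/16", "next hop A")
trie.insert("10.1.2.0/24", "next hop B")
print(trie.longest_match("10.1.2.7"))
```

A bitwise trie needs up to 32 steps for IPv4 and 128 for IPv6; fixed or variable stride tries consume several bits per step to cut the number of memory accesses.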
L3 Table: Design 1
IPv4 Unicast FIB (hash lookup)
VRF / Prefix / Mask / Paths / Offset
1 / 10.1.2.0 / 24 / 4 / 1
1 / 10.1.3.0 / 24 / 1 / 5
3 / 10.1.2.0 / 24 / 2 / 9
3 / 10.1.3.0 / 24 / 2 / 9
Rewrite Information
ADJ 1 - Rewrite SRC A+DST A MAC
ADJ 2 - Rewrite SRC A+DST B MAC
ADJ 3 - Rewrite SRC A+DST C MAC
ADJ 4 - Rewrite SRC A+DST D MAC
ADJ 5 - Rewrite SRC A+DST D MAC
ADJ 6 - Rewrite SRC A+DST F MAC
ADJ 7 - Rewrite SRC A+DST G MAC
ADJ 8 - Rewrite SRC A+DST H MAC
ADJ 9 - Rewrite SRC A+DST I MAC
ADJ 10 - Rewrite SRC A+DST J MAC
Software View
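One way to read the Paths and Offset columns of Design 1 is as a base-plus-index scheme: the offset names the first adjacency for the prefix and a per-flow hash picks one of the paths. The sketch below models that reading. The hash-modulo selection is an assumption, since the slide shows only the counts and offsets, and the function name is hypothetical.

```python
from typing import Dict, Tuple

# (VRF, prefix) -> (number of paths, offset of the first adjacency), copied from the table
FIB: Dict[Tuple[int, str], Tuple[int, int]] = {
    (1, "10.1.2.0/24"): (4, 1),
    (1, "10.1.3.0/24"): (1, 5),
    (3, "10.1.2.0/24"): (2, 9),
    (3, "10.1.3.0/24"): (2, 9),
}


def select_adjacency(vrf: int, prefix: str, flow_hash: int) -> int:
    """Pick the adjacency index for a flow: offset plus (hash modulo path count)."""
    paths, offset = FIB[(vrf, prefix)]
    return offset + (flow_hash % paths)


for flow_hash in range(4):
    print("VRF 1, 10.1.2.0/24, hash", flow_hash, "-> ADJ",
          select_adjacency(1, "10.1.2.0/24", flow_hash))
```

Under this reading, the four-path prefix in VRF 1 spreads flows over ADJ 1 to ADJ 4, the single-path prefix always uses ADJ 5, and both VRF 3 prefixes share ADJ 9 and ADJ 10.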
L3 Table: Design 2
IPv4/v6 Unicast FIB (hash lookup)
VPN / Prefix / Mask / Paths / Offset
1 / 10.1.2.0 / 24 / 4 / 1
1 / 10.1.3.0 / 24 / 1 / 5
3 / 10.1.2.0 / 24 / 2 / 6
3 / 10.1.3.0 / 24 / 2 / 6
Path Table (entries in order): Path 1, Path 2, Path 3, Path 4, Path 1, Path 1, Path 2
Rewrite Information
ADJ 1 - Rewrite SRC A+DST A MAC
ADJ 2 - Rewrite SRC A+DST B MAC
ADJ 3 - Rewrite SRC A+DST C MAC
ADJ 4 - Rewrite SRC A+DST D MAC
ADJ 5 - Rewrite SRC A+DST E MAC
ADJ 6 - Rewrite SRC A+DST F MAC
ADJ 7 - Rewrite SRC A+DST G MAC
ADJ 8 - Rewrite SRC A+DST H MAC
ADJ 9 - Rewrite SRC A+DST I MAC
ADJ 10 - Rewrite SRC A+DST J MAC
Software View
L2 Table / Host Table / FIB
Common optimisation:
Hash tables take less space than TCAMs and tries
Instead of placing /32 or /128 host entries into the FIB, place them into the hash table
It is common for the L2 table and the host table to share the same memory
This allows the FIB table to be smaller, since it does not need to contain single-path /32 and /128 entries
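The optimisation amounts to a two-stage lookup: try an exact match in the host table first, and fall back to longest-prefix match in the FIB only on a miss. A minimal sketch, assuming a Python dict as the hash table and a linear scan as a stand-in for the TCAM or trie; the table contents and names are invented for the example.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Exact-match host entries (/32) live in the hash table, not in the FIB.
HOST_TABLE: Dict[str, str] = {"10.1.2.10": "ADJ 1"}

# The FIB holds only the shorter prefixes.
FIB: List[Tuple[str, str]] = [("10.1.2.0/24", "ADJ 2"), ("10.1.0.0/16", "ADJ 3")]


def lookup(address: str) -> Optional[str]:
    hit = HOST_TABLE.get(address)  # stage 1: exact match
    if hit is not None:
        return hit
    addr = ipaddress.ip_address(address)
    best: Optional[Tuple[int, str]] = None
    for prefix, result in FIB:  # stage 2: longest-prefix match
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, result)
    return best[1] if best else None


print(lookup("10.1.2.10"), lookup("10.1.2.11"), lookup("10.1.9.1"))
```

A host route is the longest possible prefix, so checking the host table first gives the same answer as a single longest-prefix match over all entries.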
Forwarding Design
Design 1
Blocks: Parse Packet, L2 Table, L3 Table, Ingress Security ACLs, Ingress QoS ACL, Adjacency Table, Egress Security ACLs, Egress QoS ACL, Input / Output Policing, Fwd Decision, Update Statistics
Design 2
Blocks: Parse Packet, L2 Table, VPN CAM, L3 Table (x2), Ingress Security ACLs, Ingress QoS ACL, Adjacency Table, Egress Security ACLs, Egress QoS ACL, Input / Output Policing, Fwd Decision, Update Statistics
Forwarding Design
Design 2
Blocks: Parse Packet, L2 Table, VPN Table, L3 Table (x2), Ingress Security ACLs, Ingress QoS ACL, Adjacency Table, Egress Security ACLs, Egress QoS ACL, Input / Output Policing, Fwd Decision, Update Statistics
Design 3
Blocks: Parse Packet, Ingress Security ACLs, L2 Table, VPN Table, Ingress QoS ACL, L3 Table (x2), Input Policing, Adjacency Table, Egress Security ACLs, Output Policing, Egress QoS ACL, Fwd Decision, Update Statistics
References
Network Algorithmics: An Interdisciplinary Approach to Designing Fast Networked Devices, George Varghese
The Art of Computer Programming, Vol 1-4, Donald E. Knuth
Introduction to Algorithms, Third Edition, Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein
ACM SIGCOMM papers
ASIC Engineering
ASICs vs FPGAs
ASIC - Application Specific Integrated Circuit
  A finished IC which is built to the exact specification & functionality of the customer
  Can make optimal use of the underlying silicon circuits
  Low part cost, high upfront investment
  Significant development time
FPGA (EPLD) - Field Programmable Gate Array
  An IC that can be configured with the required functionality after it is installed into a target system
  Flexibility vs. sub-optimal use of underlying silicon circuits
  Higher part cost
  Shorter development time
  Main players: Xilinx, Altera
CMOS
[Figure: CMOS schematic (VDD, VSS, in, out) and cross-section showing the p+ and n+ regions and the n-well]
Feature size: this dimension is what Moore's Law is all about!
1.6x increase in usable gates between process nodes
http://en.wikipedia.org/wiki/Semiconductor_device_fabrication
Gates
// 2-input NAND gate
module nand2(a, b, c);
  input a, b;
  output c;
  // c is low only when both inputs are high
  assign c = ~(a & b);
endmodule
Why is die size important?
[Figure: 300mm silicon wafer divided into dies, with defects marked]
With the same number of defects per wafer, a smaller die size results in a higher yield per wafer
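A rough way to see the effect is to count dies per wafer and assume, as a worst case, that every defect ruins a different die. The sketch below is deliberately simplified: it ignores edge loss and scribe lines, the defect count of 20 is an arbitrary example, and the function names are hypothetical. The 18.0 x 18.3mm die is the size quoted for the F2 ASIC later in this deck.

```python
import math


def gross_dies(wafer_diameter_mm: float, die_w_mm: float, die_h_mm: float) -> int:
    """Approximate dies per wafer as wafer area divided by die area (edge loss ignored)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return math.floor(wafer_area / (die_w_mm * die_h_mm))


def worst_case_yield(wafer_diameter_mm: float, die_w_mm: float, die_h_mm: float,
                     defects: int) -> float:
    """Fraction of good dies if each defect lands on a different die."""
    dies = gross_dies(wafer_diameter_mm, die_w_mm, die_h_mm)
    good = max(dies - defects, 0)
    return good / dies


DEFECTS = 20  # arbitrary example value

for w, h in ((18.0, 18.3), (9.0, 9.15)):
    print(f"{w} x {h} mm die:", gross_dies(300, w, h), "dies per wafer,",
          f"yield {worst_case_yield(300, w, h, DEFECTS):.1%}")
```

Halving both die dimensions roughly quadruples the number of dies, so the same 20 defects cost a much smaller share of the wafer.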
Integrated Circuit Production
RTL: Register Transfer Language (Verilog, VHDL)
Synthesis: turn RTL into gates and logical connections
Netlist: gates and logical interconnections
Floor Plan: overall block and function placement
Placement: specific gate placement
Route: layout of physical interconnection
GDSII: one file per layer (photomask)
Foundry Production: metal layers on wafer
Device Test: test dies on wafer
Packaging: cut wafer into dies, dies into IC packages
Integrated Circuit Production
ASIC Customer (Cisco)
  RTL: Register Transfer Language (Verilog, VHDL)
  Synthesis: turn RTL into gates and logical connections
  Netlist: gates and logical interconnections
ASIC Vendor (Avago, IBM, TI, ST Micro; COT: Cisco)
  Floor Plan: overall block and function placement
  Placement: specific gate placement
  Route: layout of physical interconnection
  GDSII: one file per layer (photomask)
Silicon Foundry (IBM, TSMC, Global Foundries)
  Foundry Production: metal layers on wafer
  Device Test: test dies on wafer
  Packaging: cut wafer into dies, dies into IC packages
ASIC Design Process
Phases: Requirements, Planning, Micro Architecture, Implementation, Final Netlist Handoff, Power On, Release to Production
Requirements: Requirements Complete; select vendor, process, package; Architecture, HW, SW, Marketing sign-off
Planning: ASIC Commit
Micro Architecture: Design Review
Implementation: DV Review, RTL Release, Prelim Netlist, Floorplan Netlist, Final Netlist
Final Netlist Handoff: Mask order (Tapeout)
Durations shown along the timeline, in order: 12-26 Weeks, ~52 Weeks, ~12 Weeks, ~12 Weeks
F2 ASIC - Clipper
Technology: IBM Cu-65
Die Size: 18.0 x 18.3mm
Total SRAM: 33.3Mb
Total eDRAM: 134Mb
Total TCAM: 2.94Mb
Register Array: 1.34Mb
Logic Gates: 45M
Signal Pin: 186
Package IO: 840
Memory and Packet Corruption Protection
No ECC or parity: no way to determine whether a problem is caused by software or hardware
Parity will detect single-bit errors
ECC will detect 2-bit errors and correct single-bit errors
Parity and ECC apply to a word (32 or 64 bits)
CRC: detects whether a set of bytes (normally a packet) has been corrupted
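The difference between parity and CRC can be shown with a short sketch. It covers parity and CRC only and leaves ECC out; the even-parity convention, the sample word, and the sample packet are assumptions for illustration, and `zlib.crc32` is used as a readily available CRC rather than as the polynomial any particular ASIC uses.

```python
import zlib


def parity_bit(word: int) -> int:
    """Even parity: 1 if the word has an odd number of set bits, else 0."""
    return bin(word).count("1") % 2


def parity_ok(word: int, stored_parity: int) -> bool:
    return parity_bit(word) == stored_parity


word = 0x4A3F0001                       # a 32-bit memory word
stored = parity_bit(word)               # parity written alongside the word

single_flip = word ^ (1 << 5)           # one bit corrupted
double_flip = word ^ (1 << 5) ^ (1 << 9)  # two bits corrupted

print("single-bit error detected:", not parity_ok(single_flip, stored))
print("double-bit error detected:", not parity_ok(double_flip, stored))

packet = b"example packet payload"
crc = zlib.crc32(packet)                # CRC computed over the whole packet
corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]
print("packet corruption detected:", zlib.crc32(corrupted) != crc)
```

Parity catches the single-bit flip but misses the double flip, which is the gap ECC closes for memory words; the CRC protects the packet as a whole rather than individual words.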
ASIC Packaging
Electrical parasitics of the chip package are critical
  Impacts electrical properties of high-speed signals
Manufacturing tolerances constrain minimum ball pitch
  Limit to number of available signal I/O pins
[Figure: package cross-section labelled Silicon Die, Underfill, Level-1 Interconnect (Die-to-Package), Package Substrate with Power Planes (FR4/Ceramic), Level-2 Interconnect (Package-to-Board)]
Hardware Engineering
F2 Block Diagram
[Figure: F2 line card block diagram]
Connections to the Central Arbiter and to the Spine Cards
Line card control: LC CPU, SODIMM, PWR FPGA, IO FPGA
Lightning
Sacramento
12 x Clipper
48 x SFP+ ports, connected through EDC
Thermal Modelling
Component Case Temperatures Temperature Contours
Electrical / Mechanical Layout
20 Layers
EDVT
(Electronic Design Validation Test)
All tests are performed using offline diagnostics and again with NXOS
On-board power supplies have voltages margined to +5% and -5%
Temperature testing occurs while:
  Soaking for 12 hours at 55°C and -5°C
  Ramping between extremes at 1°C per minute
Power cycle testing occurs during the 12-hour soak
RDT (Reliability Demonstration Test)
The Reliability Demonstration Test (RDT) is Cisco's approach to verifying the stated reliability of a product prior to production release. The reliability to be demonstrated is the product's MTBF (Mean Time Between Failure). RDT replicates the end user operating environment and application through accelerated test time. It is expected that all hardware features are exercised in RDT.
All new products including systems and boards are subject to RDT.
Power Consumption
Skew Parts
Data Sheet
Typical 340W
Maximum 400W
Generic Online Diagnostics
Generic Online Diagnostics provide a diagnostic framework for detecting hardware faults and verifying the health of hardware components throughout the chassis.
Diagnostics run during system Boot-Up, after OIR, On-Demand using the CLI, or as Health Checks in the background.
Problem Areas:
  Hardware Components (ASICs)
  Interfaces (Ethernet, SFP+, etc.)
  Connectors (loose connectors, bent pins, etc.)
  Memory Failure (failure over time)
  Solder Joints
Software Engineering
NXOS Architecture
Layer-2 Protocols: VLAN mgr, STP, UDLD, CDP, IGMP snp, LACP, 802.1X, CTS
Layer-3 Protocols: OSPF, BGP, EIGRP, GLBP, HSRP, VRRP, PIM, SNMP
Storage Protocols: VSANs, Zoning, FCIP, FSPF, IVR
Other Services: Future Services Possibilities
Management: SNMP, XML, CLI
Protocol Stack (IPv4 / IPv6 / L2)
Interface Management
Chassis Management
Sysmgr, PSS & MTS
Chip/Driver Infrastructure
Kernel
Multi-threaded
  Scalability with SMP and multi-core CPUs
  Faster route re-convergence
  Lower mean-time-to-recovery
Real-Time
  Real-time preemptive scheduling
  System operational when CPU is at 100%
Modularity
  Most of the features are conditional
  Can be enabled/disabled independently
  Maximises efficiency
  Minimises resource utilisation
Separation of Control Plane and Data Plane
  No software forwarding feature
  Fully distributed hardware forwarding
Line Card Offloading
  Offload to line card CPUs
  Scales with # of line cards
  Optimal hardware programming
Software Engineering
SW Functional Spec, SW Design Spec, Unit Test Plan, Unit Integration Plan
[Figure: design spec excerpt showing the MFIB context data structure (code fragment ending "} mfib_hw_oif_t;") and how it maps to hardware tables]
Software structures:
  MFIB Context Data Structure with Table Ptr. and Pltfm Data
  IPv4 (S,G) Database and IPv6 (S,G) Database
  Each (S, G) Prefix entry holds rpfif/df and Pltfm Data: hw_idx[], md_adj[..]
  Each OIF List holds OIF entries and Pltfm Data: MET1 Ptr[..]
  Each OIF Info holds Pltfm Data: adj_ptr[]
Hardware tables:
  FIB DRAM: (S, G)
  ADJ RAM: MD Adj
  MET Table: ccc=7, OIF Adj ptr 1, OIF Adj ptr 2, OIF Adj ptr 3
  RIT RAM: OIF 1 Adj, OIF 2 Adj
  MDT: Idx1, Idx2
Design Review
Development Test
Master Test Plans, Functional Test Plans, Automation, Regression, FCS
Testing of the completed, integrated feature
Test for interactions with other features and functions
Test for interoperability with Cisco and 3rd party devices
Build scripts to automate testing so that it is repeatable on future releases
First Customer Ship
ASIC: Requirements, Plan, Micro Architecture, Implementation, Final Netlist, Power On
Hardware: HW Design, Detailed Design, Mechanical Drawing, PCB Layout, BOM, Fab Out, P0, P1, P2, A-0
Hardware disciplines and tests: Mechanical, Electrical, Manufacturing, MDVT, EDVT, RDT
Software: SW Functional Spec, SW Design Spec, Unit Test Plan, Unit Integration Plan
Software Test: Master Test Plans, Functional Test Plans, Automation, Regression, FCS
Q&A
Complete Your Online Session Evaluation
Complete your session evaluation:
Directly from your mobile device by visiting www.ciscoliveaustralia.com/mobile and logging in with your username and password
Visit one of the Cisco Live internet stations located throughout the venue
Open a browser on your own computer to access the Cisco Live onsite portal
Don't forget to activate your Cisco Live Virtual account for access to all session materials, communities, and on-demand and live activities throughout the year. Activate your account at any internet station or visit www.ciscolivevirtual.com.