EVPN VXLAN Design Guide

This document provides an overview of EVPN and VXLAN protocols, architectures, and use cases. It describes: 1. How VXLAN encapsulates Ethernet frames over IP to extend L2 domains across IP networks using VTEPs, VTIs, VNIs, and MAC-in-IP encapsulation. 2. How EVPN uses BGP to advertise MAC addresses, MAC/IP bindings, and IP prefixes across the overlay for control plane learning, active-active topologies, and route policies. 3. Different deployment models including L2 VPNs, integrating L2 with L3 using IRB, and pure L3 VPNs using only Type-5 routes. The


eos.arista.com/evpn-vxlan-design-guide

Russell Kelly August 6, 2018

A Detailed Overview of the EVPN & VxLAN Protocols, Route Types, Use-Cases and
Architectures

Contents

1. Introduction
2. VXLAN Overview
2.1 VXLAN Bridging
2.2 VXLAN Routing
3. EVPN Overview
3.1 EVPN Operational Benefits
3.2 EVPN Terminology
3.3 EVPN Address Family and Routes
3.4 EVPN Service Models
4. EVPN Core Operations
4.1 MAC Address Learning
4.2 ARP Suppression
4.3 MAC Mobility
4.4 MAC address Damping
4.5 Broadcast and Multicast Traffic
4.6 Integrated Routing and Bridging
4.7 EVPN Type 5 Routes – IP Prefix advertisement
4.8 Summary Comparison of Route Type-2 and Type-5 Prefix Announcements
4.9 Auto RT and Auto RD For VLAN-Based EVIs
5. Deployment Models
5.1 Underlay and Overlay Design Options
5.2 Site Topology Design Options
5.3 Layer 2 VPN deployment model
5.4 EVPN VXLAN Layer 2 With Layer 3 IRB Integration
5.5 Pure Layer 3 VPN Deployment Model (Type-5 only)
7. Configuration Guides & Further Reading
7.1 General Collateral
7.2 Layer 2 EVPN VXLAN Configuration Guides
7.3 IRB EVPN VXLAN Configuration Guides
7.4 Layer 3 EVPN VXLAN Configuration Guides
7.5 Related EVPN VxLAN Services and Functions
7.6 Configuring & Managing EVPN VXLAN Using Cloudvision

1. Introduction
This document describes the operation and configuration of BGP EVPN Services over a
VXLAN (Virtual eXtensible LAN) overlay on Arista platforms.

The focus of this design guide is VXLAN as the data-plane encapsulation protocol for the
overlay tunnels, and the Multiprotocol BGP (MP-BGP) EVPN address family for control-plane
signaling in the overlay. MP-BGP EVPN is not only used for advertising MAC addresses, MAC
and IP bindings and IP prefixes across the overlay; it also makes learning in the overlay
more efficient and enables enhanced active/active topologies. Some examples of these
efficiencies and enhancements include:

Control plane learning of MAC and IP information. This contrasts with existing Layer 2
VPN technologies, such as VPLS, which learn only through the data plane and have no
Layer 3 awareness.

Active-active forwarding into dual-homed environments, which control plane learning
enables through its split horizon and designated forwarder capability definitions.
Again, this contrasts with VPLS, which is exclusively active-standby as it lacks a
capability to detect loops.

BGP route policies to control MAC (MAC+IP) advertisements, much like IP VPNs. Existing
Layer 2 VPN technologies instead rely on data-plane MAC filtering configured locally on
each device.

2. VXLAN Overview
First, let’s review the VXLAN data-plane encapsulation used to provide the overlay
tunnels when running BGP EVPN over an IP-only underlay.

The VXLAN protocol is an RFC (7348) standard co-authored by Arista. The standard defines
a MAC in IP encapsulation protocol allowing the stretching of layer 2 domains across a layer
3 IP infrastructure. The protocol is typically deployed as a data center technology to create
overlay network topologies both within and across data centers for:

1. Providing layer 2 connectivity between racks, or halls of the data center without
requiring an underlying layer 2 infrastructure
2. Linking geographically dispersed data centers as a data center Interconnect (DCI)
technology
3. Replacement for traditional MPLS technologies in MAN and WAN environments, to
provide Layer 2 and Layer 3 VPN services across an IP only infrastructure

To perform the packet encapsulation and forwarding within the overlay network, the standard
introduces a set of new components and functions to the traditional network forwarding and
control plane.

Figure 2.1 VXLAN Encapsulation

Virtual Tunnel End-point (VTEP). The VTEP acts as the entry and exit point into and out of a
VXLAN overlay network. The task of the VTEP is to encapsulate locally received traffic
destined for nodes learnt on a remote VTEP with a VXLAN header. For traffic received from
a remote VTEP, it will decapsulate the traffic and forward it to the relevant locally attached
nodes using standard layer 2 forwarding techniques.

The VTEP component can be embedded in either:

1. Physical switch: For wire-speed packet forwarding performance reasons or to provide
connectivity to bare-metal servers and traditional network services (firewall,
load-balancer, etc.).

2. Software switch: The software virtual switch within the hypervisor of a physical server
to provide VXLAN forwarding for directly attached Virtual Machines (VM).

Virtual Tunnel Interface (VTI): The IP interface of the VTEP. The originating (local) VTEP
uses this address as the source IP address for any traffic it VXLAN encapsulates, and it
is the destination IP address for any VXLAN encapsulated traffic destined to the VTEP.

VXLAN Frame: The outer header added by the VTEP is a standard IP/UDP header, containing
the source IP address of the local VTEP and the destination IP address of the remote
VTEP. To provide a level of entropy for load-balancing the encapsulated traffic across
the network, the source port of the UDP header is a hash of the inner frame.
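
As a rough illustration of the entropy mechanism just described, the sketch below derives
an outer UDP source port from a hash of inner-frame fields. The CRC32 hash, the choice of
fields and the function names are assumptions made for illustration; the 49152-65535
range follows the recommendation in RFC 7348, and real VTEPs use platform-specific hash
functions.

```python
import zlib

VXLAN_UDP_DST_PORT = 4789  # IANA-assigned VXLAN destination port


def flow_key(inner_frame: bytes) -> bytes:
    """Collect fields that stay constant for every packet of a flow."""
    key = inner_frame[0:12]  # destination MAC + source MAC
    ethertype = int.from_bytes(inner_frame[12:14], "big")
    if ethertype == 0x0800 and len(inner_frame) >= 34:  # untagged IPv4
        # IP protocol (offset 23), source IP and destination IP (offsets 26-33)
        key += inner_frame[23:24] + inner_frame[26:34]
    return key


def vxlan_source_port(inner_frame: bytes, low: int = 49152, high: int = 65535) -> int:
    """Map a hash of the flow key onto the dynamic/private port range."""
    return low + zlib.crc32(flow_key(inner_frame)) % (high - low + 1)
```

Because only flow-invariant fields are hashed, every packet of one inner flow keeps the
same outer source port (and so the same underlay path), while different flows tend to
spread across the available equal-cost paths.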

Virtual Network Identifier (VNI): The VNI is a 24-bit field contained within the VXLAN header
of the encapsulated frame and is the logical layer 2 network identifier for the overlay network.
The use of a 24-bit field for the VNI provides the ability to scale the layer 2 domains within
the overlay beyond the 4k limit of traditional 802.1Q VLANs, providing support for potentially
16 million layer 2 domains.
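
The 24-bit VNI field can be made concrete with a minimal sketch that packs and unpacks
the 8-byte VXLAN header laid out in RFC 7348 (8 flag bits, 24 reserved bits, the 24-bit
VNI, 8 reserved bits). The function names are illustrative and not taken from any
particular implementation.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08   # the "I" flag: the VNI field is valid
MAX_VNI = (1 << 24) - 1       # 16,777,215, i.e. 2**24 possible segments


def pack_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for the given VNI."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits); word 2: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the I flag."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8
```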

2.1 VXLAN Bridging

VXLAN bridging is the concept of using the VXLAN protocol to provide layer 2 connectivity
across the layer 3 infrastructure. On an Arista VTEP, this is achieved by taking a
traditional layer 2 domain, defined by an access or trunked interface, and mapping it
into a VXLAN VNI. With a pair of VTEPs deployed, layer 2 connectivity can be achieved
between the VTEPs across a layer 3 infrastructure.

Figure 2.2 VXLAN Bridging

The configuration of VXLAN bridging on an Arista switch and the concepts involved are
covered in the document available at the following link:

https://eos.arista.com/vxlan-with-mlag-configuration-guide/

2.2 VXLAN Routing


VXLAN routing of an encapsulated frame involves routing traffic based not on the
destination IP address of the outer VXLAN header but on the inner header, that is, the
overlay tenant IP address. The concept is illustrated in the diagram below, where VXLAN
routing is used to route traffic between hosts (Serv-1 and Serv-2) in different layer 2
segments.

Figure 2.3 VXLAN Routing

As the default gateway for Serv-1, leaf-1 routes traffic destined to Serv-2 into Serv-2’s
subnet (10.10.20.0/24). The destination MAC for Serv-2 has been learnt behind a remote
VTEP (VTEP-2), so to forward the traffic to Serv-2, leaf-1 VXLAN encapsulates the frame
and VXLAN bridges it to the remote VTEP.

On receiving the frame, VTEP-2 decapsulates the VXLAN frame and, based on its local
configuration, maps VNI 1020 to VLAN 20. A MAC address lookup in VLAN 20 results in
the packet being forwarded to Serv-2. The packet walk of the traffic flow is illustrated
below.

Figure 2.4 VXLAN Routing Packet Header information

The traffic flow from Serv-1 to Serv-2 therefore results in Layer 3 routing of the
original frame at leaf-1 and VXLAN bridging to the remote VTEP, VTEP-2. Traffic flow in
the opposite direction, with the default gateway for Serv-2 being 10.10.20.1, would
result in VXLAN bridging between VTEP-2 and VTEP-1 and routing of the inner frame on
leaf-1 for local forwarding to the final destination, Serv-1.
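
The route-then-bridge sequence on leaf-1 can be sketched as a pair of table lookups. This
is a simplified model, not Arista's forwarding implementation: the host address, the MAC
address, the /24 prefix length and the table names are hypothetical, while VLAN 20,
VNI 1020 and VTEP-2 follow the example above.

```python
import ipaddress

# Hypothetical forwarding state on leaf-1 (illustrative values only).
ROUTES = {ipaddress.ip_network("10.10.20.0/24"): 20}        # connected subnet -> VLAN
ARP = {ipaddress.ip_address("10.10.20.102"): "00:00:5e:00:53:02"}  # Serv-2 binding
VLAN_TO_VNI = {20: 1020}
MAC_TABLE = {(20, "00:00:5e:00:53:02"): "VTEP-2"}           # (VLAN, MAC) -> remote VTEP


def route_and_bridge(dst_ip: str):
    """Route on the inner destination IP, then VXLAN bridge to the remote VTEP."""
    ip = ipaddress.ip_address(dst_ip)
    # Step 1: layer 3 lookup selects the destination VLAN.
    vlan = next((v for net, v in ROUTES.items() if ip in net), None)
    if vlan is None:
        return None
    # Step 2: resolve the host MAC, then find which VTEP it was learnt behind.
    mac = ARP.get(ip)
    vtep = MAC_TABLE.get((vlan, mac))
    if vtep is None:
        return None
    # Step 3: encapsulate with the VNI mapped to the destination VLAN.
    return {"vni": VLAN_TO_VNI[vlan], "inner_dmac": mac, "remote_vtep": vtep}
```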

The configuration of VXLAN Routing on an Arista switch and the concepts involved are
covered in the document available at the following link:

https://eos.arista.com/vxlan-routing-with-mlag/

3. EVPN Overview

EVPN is a standards-based BGP control plane to advertise MAC addresses, MAC and IP
bindings and IP prefixes. The standard was first defined in RFC 7432 for an MPLS data
plane. That work has since been extended in the BESS (BGP Enabled ServiceS) working
group, which has published additional documents defining the operation in the context of
Network Virtualization Overlay (NVO) for VXLAN, NVGRE and MPLS over GRE data planes
(RFC 8365). This design document focuses on EVPN and its operation with a VXLAN data
plane, which has become the de facto standard for building overlay networks in the data
center.

A number of control planes exist today for VXLAN, based on specific use cases, whether it
be a requirement to integrate with an SDN overlay controller, or operate in a standards
based flood and learn control plane model.

Figure 3.1 The different VXLAN Control Planes

Current flood and learn models operate either with a multicast control plane, or with
ingress replication, where the operator manually configures the remote VTEPs in the flood
list. Both are data-plane driven; that is, MACs are learnt via flooding. In the IP
multicast model, MACs are learnt via flooding to an IP multicast group in the underlay,
while ingress replication, also known as head-end replication (HER), floods to the
configured VTEP endpoints and requires no IP multicast in the underlay.

In the controller-based solution with CloudVision eXchange (CVX), locally learned MACs
are published to a centralized controller, and these MACs are then programmed on all
participating VTEPs.

Finally, there is what’s known as “controller-less” BGP EVPN MAC learning, where a
standards-based control plane (MP-BGP) is used to discover remote VTEPs and advertise
MAC addresses and MAC/IP bindings in the VXLAN overlay, thus eliminating the flood and
learn paradigm of the previously mentioned controller-less approaches (multicast or HER).
As a standards-based approach, the discovery and therefore the advertisement of the EVPN
service models can inter-operate amongst multiple vendors.

The initial EVPN standard is RFC 7432, which defines the BGP EVPN control plane and
specifies an MPLS data plane. The control plane was later extended to consider
additional data-plane encapsulation models, including VXLAN, NVGRE and MPLS over GRE.

Figure 3.2 VXLAN Control Plane and Dataplane Definitions

This highlights an important and powerful advantage of BGP EVPN: it is a single control
plane for multiple data-plane encapsulations and defines both Layer 2 and Layer 3
VPN services. As network operators drive toward simplicity and automation, having one
control plane protocol and address family for all data planes and VPN services will prove
extremely powerful.

The BESS working group has defined a number of standards and draft proposals for the
operation of EVPN. The relevant standards discussed in this document in the context of a
VXLAN data plane are summarised below:

RFC 7432: BGP MPLS-Based Ethernet VPNs

https://tools.ietf.org/html/rfc7432

Network Virtualization Overlay solutions using EVPN:

https://tools.ietf.org/html/rfc8365

Integrated Routing and Bridging in EVPN

https://tools.ietf.org/html/draft-ietf-bess-evpn-inter-subnet-forwarding-03

IP prefix advertisement in EVPN

https://tools.ietf.org/html/draft-ietf-bess-evpn-prefix-advertisement-04

3.1 EVPN Operational Benefits


As discussed, EVPN provides a single control plane for the delivery of Layer 2 and 3 VPN
services across multiple data-plane encapsulations. This provides a powerful advantage as
operators drive towards simplicity and automation. While learning lessons from previous
approaches, EVPN also leverages a number of mature and proven models from well-known
standards such as IP-VPN (RFC4364), to simplify the operation and adoption of the protocol.

Standards based BGP control plane for VXLAN to provide support for multi-vendor
interoperability.

Reuse of well known and mature MP-BGP concepts, Route-Targets and Route-
Distinguishers to deliver multi-tenant Layer 2 and layer 3 VPNs

MAC address learning in the control plane using BGP, rather than flood and learn,
making the operation more akin to that of an IP (L3) VPN service.

Optional ARP (MAC to IP) learning / suppression for the reduction of traffic flooding
across layer 2 domains

MAC flapping prevention using address damping techniques

Support for active-active and active-standby multi-homing of end nodes, providing an
important optimization over other L2VPNs, which only provide one active path into and
out of the VPN (to avoid loops).

3.2 EVPN Terminology


The EVPN standard, in the context of an NVO environment, defines the functionality for
delivering multi-tenant Layer 2/3 VPN services using either VXLAN, NVGRE or MPLS over
GRE encapsulation, across a common physical IP infrastructure. The standard introduces
new terminology specific to an NVO environment. The terms are summarised below in
relation to VXLAN encapsulation and are referenced through the remainder of the document.

Network Virtualization Overlay (NVO): The overlay network used to deliver the Layer
2 and Layer 3 VPN services. For VXLAN encapsulation, this would define a VXLAN
domain, which would include one or more VNIs, for the transportation of tenant traffic
over a common IP underlay infrastructure.

Network Virtualization End-Point (NVE): The provider edge node within the NVO
environment responsible for the encapsulation of tenant traffic into the overlay
network. For a VXLAN data plane, this defines the Virtual Tunnel End-Point (VTEP)

Virtual Network Identifier (VNI): The label identifier within the VXLAN encapsulated
frame, defining a layer 2 domain in the overlay network

EVPN instance (EVI): A logical switch within the EVPN domain which spans and
interconnects multiple VTEPs to provide tenant layer 2 and layer 3 connectivity.

MAC-VRF: A Virtual Routing and Forwarding table for storing Media Access Control
(MAC) addresses on a VTEP for a specific tenant.

Figure 3.3 EVPN terminology for a VXLAN data plane

3.3 EVPN Address Family and Routes


The new EVPN Network Layer Reachability Information (NLRI) is carried in BGP using
Multiprotocol BGP Extensions with a newly defined Address Family Identifier (AFI) and
Subsequent Address Family Identifier (SAFI).

To provide multi-tenancy, the standard uses the aforementioned traditional VPN methods to
control the import and export of routes and provide support for overlapping IP addresses
between tenants.

Multi-protocol BGP for EVPN: A new AFI and SAFI have been defined for EVPN.
These are AFI=25 (L2VPN) and SAFI = 70 (EVPN)

EVPN L2/L3 Tenant Segmentation: Similar to standard MPLS VPN configurations, Route
Distinguishers (RDs) and Route Targets (RTs) are defined for the VPN.

Route Target (RT): To control the import and export of routes across VRFs, EVPN
routes are advertised with Route Targets (RTs), which are BGP extended communities. The
RT can be auto-derived to simplify configuration; typically it is based on the AS
number and the VNI of the MAC-VRF.

Route Distinguisher (RD): Unique number prepended to the advertised address
within the VRF, ensuring support for overlapping IP addresses and MACs across
different tenants.
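
To illustrate how RTs and RDs keep tenants apart, the sketch below builds an RT from the
AS number and VNI, as described above, and an RD in the common IP:number form. The helper
names, the AS number, the router ID and the MAC address are hypothetical, and the exact
auto-derivation rules of any given platform may differ from this assumption.

```python
MAX_VNI = (1 << 24) - 1


def auto_route_target(asn: int, vni: int) -> str:
    """Build an 'ASN:VNI' route target (two-octet AS form assumed)."""
    if not 0 < asn <= 0xFFFF or not 0 <= vni <= MAX_VNI:
        raise ValueError("expected a 2-byte AS number and a 24-bit VNI")
    return f"{asn}:{vni}"


def route_distinguisher(router_id: str, local_id: int) -> str:
    """Build an 'IP:number' RD; the number is 16 bits wide in this RD format."""
    if not 0 <= local_id <= 0xFFFF:
        raise ValueError("the assigned number must fit in 16 bits")
    return f"{router_id}:{local_id}"


# Two tenants reuse the same MAC address; the RD keeps the two routes distinct,
# and the RT decides which MAC-VRF imports each of them.
rib = {
    (route_distinguisher("192.0.2.1", 1010), "00:00:5e:00:53:01"): auto_route_target(65001, 1010),
    (route_distinguisher("192.0.2.1", 1011), "00:00:5e:00:53:01"): auto_route_target(65001, 1011),
}
```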

The format of the MP_REACH_NLRI/MP_UNREACH_NLRI attribute holding the new EVPN
NLRI is illustrated below, where the next-hop address within the NLRI is the IP address
of the VTEP advertising the EVPN route.

Figure 3.4 The EVPN NLRI Route Format

As illustrated in Figure 3.4, the original EVPN MPLS RFC (7432) and the subsequent IP
prefix draft (draft-ietf-bess-evpn-prefix-advertisement-04) introduced five unique EVPN
route types.

Type-1 Route: Ethernet A-D route

Ethernet A-D route per ESI: announces the reachability of a multi-homed Ethernet
Segment. The route is used for fast convergence (i.e. “mass withdraw”) functions, as
well as the split horizon filtering used for active-active multi-homing.

Ethernet A-D route per EVI: used to implement the Aliasing and Backup Path
features of EVPN associated with active-active multi-homing.

Type-2 Route: Host advertisement Route

Used to advertise the reachability of a MAC address, or optionally a MAC and IP binding as
learnt by a specific EVI. With the advertisement of the optional IP address of the host, EVPN
provides the ability for VTEPs to perform ARP suppression and ARP proxy to reduce
flooding within the layer 2 VPN.

Type-3 Route: Inclusive Multicast route

The type-3 route is used to advertise the membership of a specific layer 2 domain (VNI
within the VXLAN domain), allowing the dynamic discovery of remote VTEPs in a specific
VNI and the population of a VTEP ingress flood list for the forwarding of Broadcast,
Unknown unicast and Multicast (BUM) traffic.

Type-4 Route: Ethernet Segment Route

The type-4 route is specific to VTEPs supporting the EVPN multihoming model, for active-
active and active-standby forwarding. The route is used to discover VTEPs which are
attached to the same shared Ethernet Segment. Additionally, this route type is used in the
Designated Forwarder (DF) election process.

Type-5 Route: IP-prefix route advertisement

The type-5 route is used to advertise IP prefixes rather than the MAC and IP host
addresses of the type-2 route. This advertisement of prefixes into the EVPN domain
provides the ability to build classic layer 3 VPN topologies.

The function of each of these route types in the operation of EVPN to provide
multi-tenant layer 2 and 3 VPN services is described in detail in Section 4 of this
document.

While this guide focuses on EVPN with VXLAN data-plane encapsulation, it’s important to
note that, in addition to the new route types, a BGP Encapsulation extended community is
included in all advertisements to determine the data-plane encapsulation. The
Encapsulation extended community is defined in RFC 5512. The different IANA registered
tunnel types for an NVO environment are summarized in the table below:

Figure 3.5 Defined Data-Plane Encapsulations

3.4 EVPN Service Models


An EVPN instance (EVI) can contain one or more layer 2 broadcast domains (VLANs). The
association of VLAN-IDs with a specific EVI, and how a VLAN tag can be transported
within the EVI if required, is defined by three EVPN service models: VLAN based,
VLAN bundle, and VLAN aware bundle.

VLAN based service interface

In the VLAN based service there is a one-to-one mapping between the VLAN-ID and the
MAC-VRF of the EVPN instance. With the MAC-VRF mapping directly to the associated
VLAN, there will be a single bridge table within the MAC-VRF. The VLAN tag is not carried in
any route update and the VNI label in the route advertisement is used to uniquely identify the
bridge domain of the MAC-VRF in the VXLAN forwarding plane.

Figure 3.6 Vlan based service interface

With a one-to-one mapping between the VLAN-ID and the MAC-VRF of EVI instance, the
EVI will represent an individual tenant subnet/VLAN in the overlay. The one-to-one mapping
also means the route-target associated with the MAC-VRF, uniquely identifies the tenant’s
subnet/VLAN, providing granular importing of MAC routes on a per VLAN basis on each
VTEP.

In this service, the associated MAC-VRF table is identified by the Route-Target in the
control plane and by the VNI in the data plane, and the MAC-VRF table corresponds to a
single VLAN bridge domain.

VLAN Aware Bundle Service Interface

In the VLAN aware bundle service there is a many-to-one mapping between the VLAN-IDs
and the MAC-VRF of the EVPN instance. However, the MAC-VRF contains a unique layer 2
bridge table for each associated VLAN-ID and a unique VNI label for each bridge domain.

With the MAC-VRF containing multiple layer 2 bridge tables, the VLAN tag is carried in any
EVPN route update to allow mapping to the correct tenant bridge table within the MAC-VRF.
Only the unique VNI label is carried in the VXLAN data plane, to allow forwarding to the
correct VLAN within the MAC-VRF.

Figure 3.7 Vlan Aware Bundle Service

In this service, the MAC-VRF of the EVI represents multiple subnets/VLANs of the
tenant. The layer 2 bridge table of the MAC-VRF is identified by a combination of the
Route-Target and the Ethernet tag in the control plane, and by the unique VNI in the
VXLAN data plane.

This service type is a common DCI/WAN deployment, where a tenant’s VLANs are bundled
into a single EVI, while VLAN “awareness” can be retained in the EVPN service as
the VNI tag is advertised in the MAC-IP route (which now identifies the VLAN within the
EVI).

Bundling into a service like this reduces the number of EVIs that need to be configured,
reducing complexity and the control-plane signaling between PEs.
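
The difference between the VLAN based and VLAN aware bundle services comes down to how a
received route is mapped to a bridge table. The sketch below models that lookup under
simplifying assumptions: the class, its field names and the RT/VNI values are
hypothetical, and the Ethernet tag is treated as a plain integer key.

```python
from dataclasses import dataclass, field


@dataclass
class MacVrf:
    route_target: str
    vlan_aware: bool
    # Ethernet tag -> VNI of the bridge table; a VLAN based MAC-VRF
    # holds a single bridge table, stored here under tag 0.
    bridge_domains: dict = field(default_factory=dict)

    def vni_for_route(self, ethernet_tag: int) -> int:
        """Pick the bridge table (identified by its VNI) for a received route."""
        tag = ethernet_tag if self.vlan_aware else 0
        return self.bridge_domains[tag]


# VLAN based: one VLAN per MAC-VRF, so the route target alone selects the table.
vlan_based = MacVrf("65001:1010", vlan_aware=False, bridge_domains={0: 1010})

# VLAN aware bundle: one route target, with the Ethernet tag selecting the table.
bundle = MacVrf("65001:1", vlan_aware=True, bridge_domains={10: 1010, 11: 1011})
```

In both cases the data plane carries only the VNI, so the egress VTEP selects the bridge
domain from the VNI alone.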

4. EVPN Core Operations


The EVPN standard defines a number of operations and functionality to allow the dynamic
learning of MAC and IP bindings, management of MAC moves (VM/host mobility), ARP
suppression, automated discovery of remote VTEPs and multi-homing to support active-
active topologies.

4.1 MAC Address Learning


As shown in the diagram below, MAC address learning on the local interface of a VTEP is
still flow-based learning; however, once the MACs are learnt locally, they are advertised
to BGP peers within the EVI via an EVPN route update. The next hop of the update is set
to the IP address of the advertising VTEP. In the case of EVPN VXLAN, the label
advertised in the update is the VNI, which identifies the MAC-VRF in the case of a VLAN
based service, or the EVI for a VLAN aware bundle service.

Figure 4.1 EVPN Type 2 Route Announcement

The route advertisements are EVPN type-2 routes, which can advertise just the MAC
address of the host, or optionally the MAC and IP address of the host. The format of the
type-2 route is illustrated in the figure below, along with the mandatory and optional
extended communities attached to the route.

Figure 4.2 EVPN Type 2 MAC and IP route format

From the figure above the salient fields are:

Multiprotocol Reachable NLRI (MP_REACH_NLRI) attribute of the route is used to
carry the next hop for the advertised route. In the context of a VXLAN forwarding
plane, this will be the source address (VTI) of the advertising VTEP.

Route Distinguisher of the advertising node’s MAC-VRF.

Ethernet Segment Identifier (ESI): this field is populated when the VTEP is participating
in a multi-homed topology. This is discussed in the following sections.

Ethernet tag ID that will be 0 for VLAN-based service, and the customer VLAN ID in a
VLAN-aware bundle service.

IP address of the host which is associated with advertised MAC address. The
advertisement of the Host’s IP address is optional

Label in the context of a VXLAN forwarding plane is the VNI associated with the MAC-
VRF/layer 2 domain the advertised MAC address has been learnt on.

Route Target associated with the MAC-VRF, advertised with the route to allow control of
the import and export of routes.

The MAC mobility extended community, as discussed in the following section, is used
during MAC moves to update all VTEPs with the new location of the host.

4.2 ARP Suppression


Because the type-2 route can optionally advertise the MAC and IP binding, ARP
suppression can be supported on the remote VTEPs. The MAC to IP binding can be learnt
locally, via ARP snooping or DHCP traffic on the VTEP. Once the MAC and IP binding has
been learnt, it is advertised to the remote VTEPs as a type-2 route. This allows remote
VTEPs to respond locally to any ARP requests for the host, thus reducing the amount of
ARP traffic across the EVI.

Importantly, the optional MAC and IP route can be advertised separately from the MAC only
type-2 route. This is done so that if the MAC and IP route is cleared, i.e. ARP flushed, or the
ARP timeout is set to less than the MAC timeout, then the MAC only route will still exist.
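
A minimal model of the ARP suppression behaviour described in this section is shown
below. It assumes a per-VNI cache keyed on the host IP address; the class and method
names are hypothetical and the sketch ignores details such as ageing and gratuitous ARP.

```python
class ArpSuppressionCache:
    """MAC/IP bindings learnt from type-2 routes, used to answer ARP locally."""

    def __init__(self):
        self._bindings = {}  # (vni, host IP) -> host MAC

    def learn_type2(self, vni, mac, ip=None):
        # A MAC-only type-2 route carries no IP, so it adds no ARP binding.
        if ip is not None:
            self._bindings[(vni, ip)] = mac

    def withdraw_type2(self, vni, ip):
        self._bindings.pop((vni, ip), None)

    def handle_arp_request(self, vni, target_ip):
        """Reply on behalf of a known remote host, otherwise flood as usual."""
        mac = self._bindings.get((vni, target_ip))
        return ("reply", mac) if mac else ("flood", None)
```

A request for a host whose MAC and IP route has been withdrawn falls back to flooding,
while the separately advertised MAC-only route still allows known-unicast forwarding.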

4.3 MAC Mobility


A common scenario in a data center environment is virtual machines (VMs) moving between
physical servers, for maintenance or performance reasons. This results in the MAC of the
VM being learned and advertised by a new VTEP.

To cater for this situation, a sequence number is attached to the new MAC advertisement,
ensuring an EVI-wide refresh of the MAC table, with VTEPs updating their forwarding
tables to point to the advertising VTEP as the new next hop for the MAC address.

Figure 4.3 EVPN type-2 MAC Mobility Behaviour

When a MAC address is learnt and advertised for the first time, it is advertised without a
sequence number and the receiving VTEP assumes the sequence to be zero. On detection
of a MAC move, i.e. when a MAC is learnt locally while the same MAC route is active via a
type-2 advertisement, the sequence number is incremented by one and the MAC route is
advertised to the remote peers. The original advertising VTEP receives the MAC route with
a now higher sequence number and withdraws its own local MAC route. All other VTEPs flush
the original MAC route and update their tables with the new higher sequence number route.
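
The sequence-number handling can be sketched with a small model of three VTEPs. This is a
simplified illustration with hypothetical class names: BGP distribution is replaced by
direct method calls, and the tie-breaking that RFC 7432 defines for equal sequence
numbers is not modelled.

```python
from dataclasses import dataclass


@dataclass
class MacRoute:
    mac: str
    next_hop: str
    seq: int = 0  # an absent MAC mobility community is treated as sequence 0


class Vtep:
    def __init__(self, name):
        self.name = name
        self.table = {}  # MAC -> best MacRoute

    def learn_local(self, mac) -> MacRoute:
        """Learn a MAC on a local port and return the route to advertise."""
        current = self.table.get(mac)
        if current and current.next_hop == self.name:
            return current  # already local, nothing has moved
        # Known via a remote VTEP means the host moved here: bump the sequence.
        seq = current.seq + 1 if current else 0
        self.table[mac] = MacRoute(mac, self.name, seq)
        return self.table[mac]

    def receive(self, route: MacRoute):
        """Install a received route only if it is new or has a higher sequence."""
        current = self.table.get(route.mac)
        if current is None or route.seq > current.seq:
            # Replaces (withdraws) any local or older remote entry.
            self.table[route.mac] = route
```

For example, if VTEP-1 first advertises a MAC and the host then moves behind VTEP-2,
VTEP-2 advertises the route with sequence 1 and both VTEP-1 and any third VTEP repoint
the MAC at VTEP-2.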

4.4 MAC address Damping


In addition to MAC mobility, EVPN defines a protection mechanism to detect and prevent
MAC routes flapping between VTEPs, which can occur during network instability or when
hosts have been misconfigured with the same (duplicate) MAC address.

On advertising a locally learned MAC, the VTEP will start an M second counter (default is
180s). If the VTEP detects N MAC moves (default is 5) for the route within the M second
window, it will generate a syslog message and stop sending and processing any further
updates for the route.
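
The damping rule can be approximated with a sliding window over recent moves, as in the
sketch below. The defaults of 5 moves and 180 seconds come from the text above; the
class name, the sliding-window simplification and the explicit timestamps are
assumptions made to keep the example deterministic.

```python
from collections import deque


class MacMoveDamper:
    def __init__(self, max_moves: int = 5, window: float = 180.0):
        self.max_moves = max_moves
        self.window = window
        self._moves = {}     # MAC -> timestamps of recent moves
        self.frozen = set()  # MACs for which updates are no longer sent or processed

    def record_move(self, mac: str, now: float) -> bool:
        """Return True if the move may be advertised, False if it is damped."""
        if mac in self.frozen:
            return False
        moves = self._moves.setdefault(mac, deque())
        moves.append(now)
        # Drop moves that have fallen out of the M second window.
        while moves and now - moves[0] > self.window:
            moves.popleft()
        if len(moves) >= self.max_moves:
            self.frozen.add(mac)  # a real VTEP would also log a syslog message
            return False
        return True
```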

4.5 Broadcast and Multicast Traffic

Broadcast, unknown unicast and multicast (BUM) traffic is handled within the EVPN
forwarding model using ingress replication, where the BUM frame is replicated on the
ingress VTEP to each of the remote VTEPs in the associated EVI/VNI. The VTEP replication
list for the EVI is dynamically populated based on Type-3 route advertisements (Inclusive
Multicast Ethernet Tag Route), where VTEPs advertise type-3 routes for each EVI of which
they are members.
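
As a sketch of this mechanism, the class below builds a per-VNI flood list from received
type-3 routes and replicates a BUM frame to each member. The class and method names and
the addresses used in any example are hypothetical, and the frame is treated as opaque
bytes.

```python
from collections import defaultdict


class FloodList:
    def __init__(self, local_vtep: str):
        self.local_vtep = local_vtep
        self._members = defaultdict(set)  # VNI -> remote VTEP IP addresses

    def add_imet(self, vni: int, vtep_ip: str):
        """Process a received type-3 route: the originator joins the VNI flood list."""
        if vtep_ip != self.local_vtep:
            self._members[vni].add(vtep_ip)

    def withdraw_imet(self, vni: int, vtep_ip: str):
        self._members[vni].discard(vtep_ip)

    def replicate(self, vni: int, frame: bytes):
        """Ingress replication: one unicast VXLAN copy per remote VTEP in the VNI."""
        return [(vtep, vni, frame) for vtep in sorted(self._members.get(vni, ()))]
```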

Figure 4.4 EVPN type-3 IMET route behavior for ingress replication

The format of the type-3 route is illustrated in the figure below:

Figure 4.5 EVPN type-3 IMET route format

From the figure above the salient fields of the type-3 route are:

Multiprotocol Reachable NLRI (MP_REACH_NLRI) attribute of the route is used to
carry the next-hop for the advertised route. In the context of a VXLAN forwarding plane,
this will be the source address (VTI) of the advertising VTEP.

Route Distinguisher of the advertising node’s MAC-VRF

Ethernet tag that will be 0 for VLAN-based service, and the MAC-VRF VNI for a VLAN-
aware bundle service.

IP address of the VTEP advertising the type 3 route

Route Target associated with the MAC-VRF or the EVI in a VLAN-aware bundle
service.

PMSI Tunnel Attribute, to advertise the replication model the VTEP is supporting. The
supported options defined within the standard are ingress replication and IP multicast.

4.6 Integrated Routing and Bridging


In the traditional data center design, inter-subnet forwarding is provided by a
centralized router, where traffic traverses the network to a centralized routing node and
back again to its final destination. In a large multi-tenant data center environment this
operational model can lead to inefficient use of bandwidth and sub-optimal forwarding.

To provide a more optimal forwarding model and avoid traffic tromboning, the EVPN inter-
subnet draft (draft-sajassi-l2vpn-evpn-inter-subnet-forwarding) proposes integrating the
routing and bridging (IRB) functionality directly onto the VTEP, thereby allowing the
routing operation to occur as close to the end host as possible. The draft proposes two
forwarding models for the IRB functionality, termed asymmetric IRB and symmetrical IRB.
These two models are described in the following sections.

In the asymmetric IRB model, the inter-subnet routing functionality is performed by the
ingress VTEP, with the packet after the routing action being VXLAN bridged to the
destination VTEP. The egress VTEP only then needs to remove the VXLAN header and
forward the packet onto the local layer 2 domain based on the VNI to VLAN mapping. In the
return path, the routing functionality is reversed with the destination VTEP now performing
the ingress routing and VXLAN bridging operation, hence the term asymmetric IRB.

Figure 4.6 EVPN Asymmetrical IRB

To provide inter-subnet routing on all VTEPs for all subnets, an anycast IP address is
utilized for each subnet and configured on each VTEP. The anycast IP acts as the default
gateway for the hosts, therefore regardless of where the host resides the directly attached
VTEPs can act as the host’s default gateway. The host MAC and MAC to IP bindings are
learned by each VTEP based on a combination of local learning/ARP snooping and type-2
route advertisements from remote VTEPs. In a typical implementation, the optional MAC and
IP, type-2 route is advertised separately from the MAC only type-2 route. This is done so that
if the MAC and IP route is cleared, for example the ARP flushed, or the ARP timeout is set to
less than the MAC timeout, then the MAC only route will still exist.

The format of the two advertised type-2 routes for Server-1 is illustrated below, where
the RD IP-A:1010 and route-target 1010:1010 are used to distinguish the uniqueness of the
route and allow the route to be imported into the correct remote MAC-VRF based on the
route-target import policy of the VTEP.

Figure 4.7 EVPN Comparison of MAC & MAC+IP Type 2 Route in Asymmetrical IRB

The packet flow for the asymmetrical model is illustrated in the figure below, where two
subnets are configured: subnet-10/VNI 1010 (Green) and subnet-11/VNI 1011 (Blue). For the
traffic flow between Server-1 in subnet-10 and Server-4 in subnet-11, the ingress VTEP
(VTEP-1) locally routes the packet into subnet-11/VNI 1011 and then VXLAN bridges the
frame, inserting VNI 1011 into the VXLAN header with an inner DMAC equal to that of the
destination host, Server-4. The receiving VTEP (VTEP-4) then only needs to perform a
local layer 2 lookup, based on the VNI to VLAN mapping, for the DMAC of Server-4.

Figure 4.8 EVPN Asymmetrical IRB VxLAN Data-plane Forwarding Detail

For the asymmetric model to operate, the sending VTEP needs the information for all of the
tenant’s hosts (MAC and MAC to IP binding) to route and bridge the packet. This means the
VTEP needs to be a member of all the tenant’s subnets/VNIs and have an associated SVI with
an anycast IP for every subnet, and this is required on all VTEPs participating in the
routing functionality for the tenant. This introduces scaling issues on multiple fronts.

1. VNI Scaling: The number of VNIs supported on a hardware VTEP is finite, so not
all VNIs can reside on all VTEPs. This is especially true in datacenter deployments,
where the TORs have traditionally been more resource constrained than chassis-based
edge systems.

2. Forwarding memory scaling: The VTEP needs to store all host MACs and ARP entries
for all subnets in the network. On a leaf switch these are held in hardware, which is a
finite resource defined by the specific hardware platform deployed at the leaf.

Symmetric IRB

To address the scale issues of the asymmetric model, in the symmetrical model the VTEP is
only configured with the subnets that are present on the directly attached hosts. Connectivity
to non-local subnets on a remote VTEP is achieved through an intermediate IP-VRF. The
forwarding model for symmetric IRB is illustrated in the figure below, for traffic
between Server-1 on subnet-10 (Green) and Server-4 on the remote subnet-11 (Blue). In this
model, the ingress VTEP routes the traffic between the local subnet (subnet-10) and the
IP-VRF, of which both VTEPs are members. The egress VTEP then routes the frame from the
IP-VRF to the destination subnet. Both VTEPs therefore perform a routing function, hence
the term symmetrical IRB.

Figure 4.9 EVPN Symmetrical IRB

To provide inter-subnet routing when a subnet is stretched across multiple VTEPs, an
anycast IP address is utilised for each subnet, but it is only configured on the VTEPs where
the subnet exists. The host MAC and MAC to IP bindings are learnt by each VTEP based on a
combination of local learning/ARP snooping and type-2 route advertisements.

For the symmetrical IRB model the type-2 (MAC and IP) route is advertised with two labels
and two route-targets, corresponding to the MAC-VRF the MAC address is learnt on and to the
IP-VRF. A remote VTEP receiving the route imports the IP host route into the corresponding
IP-VRF based on the IP-VRF route-target and, if the corresponding MAC-VRF exists on the
VTEP, imports the MAC address into the local MAC-VRF based on the MAC-VRF’s
route-target. The import behavior for the type-2 route is illustrated in the diagrams below for
the host Server-1.

If the MAC-VRF exists locally on the receiving router, the IP host route is installed
in the IP-VRF and the MAC address is installed in the MAC-VRF, as shown in Figure
4.10. With both a MAC route in the MAC-VRF and an IP host route in the IP-VRF, the VNI
used in the data-path depends on whether the traffic is being VXLAN bridged between
hosts in the same VNI (1010) or VXLAN routed (VNI 2000).

Figure 4.10 EVPN Type 2 Route in Symmetrical IRB – MAC-VRF on Both VTEPs

Compare this to Figure 4.11, where the MAC-VRF does not exist on the receiving VTEP
(VTEP-2). In this case the MAC route is ignored and not installed, as there is no
corresponding route-target on the VTEP. Only the IP-VRF host route is
installed on VTEP-2. Traffic from VTEP-2 destined to hosts on subnet-10 is therefore
always VXLAN routed via the IP-VRF, VNI 2000.

Figure 4.11 EVPN Type 2 Route in Symmetrical IRB – MAC-VRF Only Exists on Sending
VTEP.

The symmetrical IRB type-2 route contains a number of additional extended community
attributes over the asymmetrical IRB type-2 route. The salient fields of the route are
summarised below:

Multiprotocol Reachable NLRI (MP_REACH_NLRI) attribute is used to carry the next-hop
for the advertised route. In the context of a VXLAN forwarding plane, this will
be the source address of the advertising VTEP.

Route Distinguisher of the advertising node’s MAC-VRF. For Server-1 in the example
above this would be IP-A:1010.

MAC address field contains the 48-bit MAC address of the host being advertised. For
Server-1 in the example above this would be MAC-1.

IP address and length fields contain the IP address and 32-bit mask for the host being
advertised. For Server-1 in the example above this would be IP-1.

MAC-VRF label. This contains the VNI number (label) corresponding to the local layer 2
domain/MAC-VRF the host MAC was learnt on. For Server-1 in the example above this
would be VNI 1010.

IP-VRF label. This contains the VNI number (label) corresponding to the MAC-VRF’s
associated IP-VRF. For MAC-VRF 10 in the example above this would be VNI 2000, the
VNI of the IP-VRF.

Extended community Route Target for the IP-VRF. This contains the route-target of the
IP-VRF associated with the learnt MAC address.

Extended community Router MAC. This field advertises the system MAC of the
advertising VTEP and is used as the DMAC for any packet sent to the VTEP via the
IP-VRF.

Extended community Route Target for the MAC-VRF. This contains the route-target of
the MAC-VRF associated with the learnt MAC address.
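
As a rough illustration of the fields listed above and of the import behaviour shown in Figures 4.10 and 4.11, the sketch below models a symmetric IRB type-2 route as a Python dataclass. The class, the function and the router MAC value are assumptions made for the example and do not correspond to any EOS data structure. The IP-VRF route-target of 2000:2000 is taken from the type-5 example later in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Type2Route:
    rd: str          # Route Distinguisher of the advertising MAC-VRF
    mac: str         # host MAC
    ip: str          # host IP (advertised as a /32)
    l2_vni: int      # MAC-VRF label
    l3_vni: int      # IP-VRF label
    mac_vrf_rt: str  # route-target of the MAC-VRF
    ip_vrf_rt: str   # route-target of the IP-VRF
    router_mac: str  # Router MAC extended community
    next_hop: str    # advertising VTEP


def import_type2(route, mac_vrf_rts, ip_vrf_rts):
    """Return the table entries a receiving VTEP would install for the route."""
    installed = []
    if route.ip_vrf_rt in ip_vrf_rts:
        installed.append(("ip-vrf", route.ip + "/32", route.l3_vni, route.router_mac))
    if route.mac_vrf_rt in mac_vrf_rts:  # only if the MAC-VRF exists locally
        installed.append(("mac-vrf", route.mac, route.l2_vni, route.next_hop))
    return installed


# Server-1 as advertised by VTEP-1; "RMAC-VTEP-1" is a placeholder value.
server1 = Type2Route("IP-A:1010", "MAC-1", "IP-1", 1010, 2000,
                     "1010:1010", "2000:2000", "RMAC-VTEP-1", "VTEP-1")
```

Calling `import_type2(server1, {"1010:1010"}, {"2000:2000"})` models a VTEP that has both VRFs, while passing an empty set of MAC-VRF route-targets models VTEP-2 in Figure 4.11, where only the IP-VRF host route is installed.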

4.7 EVPN Type 5 Routes – IP Prefix advertisement


EVPN type-2 routes can be used to advertise IP prefixes by making use of the optional
IP address and IP address length fields in the route; however, these are explicitly linked to the
MAC address advertised within the route. The EVPN type-5 route, defined within the draft
https://tools.ietf.org/html/draft-ietf-bess-evpn-prefix-advertisement-04, provides the ability to
decouple the advertisement of an IP prefix from any specific MAC address. This provides
support for floating IP addresses, optimises the mechanism for advertising external IP
prefixes, and reduces the churn when withdrawing IP prefixes.

The format of the new type-5 IP-prefix route is illustrated in the figure below:

Figure 4.12 EVPN Route type-5, for advertisement of IP-prefixes

The IP prefix draft defines a number of specific use cases for the type-5 route, which
consequently affect the format and content of the fields within the route. The different
deployment scenarios and use cases defined within the draft are summarised below:

Advertising of IP prefixes behind an appliance, when the appliance is not running a
routing protocol and only supports static routes. This could be the typical use case for
a virtual firewall with a number of local subnets directly attached, where the firewall
only supports static routes into the associated EVI.

Support for active-standby deployment of appliances using a shared floating IP model.
This is an extension of the previous case where there is now a virtual IP (or VIP) for
clustering the appliances, rather than a dedicated physical IP address on the
appliance.

Support for layer 2 appliances, acting as a “bump in the wire” with no physical IP
addresses configured, where instead of the appliances having an IP next-hop there is
only a MAC next-hop.

IP-VRF to IP-VRF model, which is similar to inter-subnet forwarding for host routes
(detailed in the symmetric/asymmetric section), except only type-5 routes and
IP-prefixes are advertised, allowing announcement of IP-prefixes into a tenant’s EVI
domain for external connectivity outside the domain. The IP-VRF to IP-VRF model is
further divided in the draft into three distinct use cases.

Interface-less

In interface-less mode, the IP-prefixes within the type-5 route, whether they are local or
learned from a connected router, are advertised to remote peers via the shared IP-VRF, as
illustrated in the figure below.

Figure 4.13 EVPN Route type-5, Interface-less Update

As illustrated in the figure, the IP prefix (subnet-A) residing behind the router (Rtr-1) is
learned via an IGP in EVI-1 on VTEP-1. The prefix is announced via a type-5 route and learnt
by the remote VTEPs residing in the same EVI. The type-5 route carries the prefix together
with a route-target (2000:2000) and a VNI label (2000) equal to those of the IP-VRF which
interconnects the VTEPs in the EVI. The router-mac extended community of the route defines
the inner DMAC (equal to the system MAC of VTEP-1) for any VXLAN frame destined to the
advertised IP prefix.

From a forwarding perspective, a host residing on subnet-B communicating with a host on
subnet-A will send traffic to its default gateway, which is the IRB interface on VTEP-2 in
VLAN 11/VNI 1011. VTEP-2 performs a route lookup for the destination subnet (subnet-A),
which has been learnt in the IP-VRF with a next-hop of VTEP-1 and a VNI label of 2000. The
packet is thus VXLAN encapsulated with a VNI label of 2000 and an inner DMAC of A (the
VTEP-1 system/router MAC), and routed to VTEP-1, which is the next-hop for the prefix. On
receiving the frame, VTEP-1 de-encapsulates the packet and, as the inner DMAC is its own
router MAC, performs a local route lookup for the destination subnet (subnet-A), which has
been learnt with a next-hop of Rtr-1. The frame is forwarded directly to Rtr-1, which
subsequently routes the packet to the local host on subnet-A. The format of the type-5 route
in interface-less mode is illustrated in the figure below:

Figure 4.14 EVPN Type-5 route format for interface-less mode

In this model, the VTEPs forming the EVI are interconnected via an IP-VRF, meaning there is
no IRB interface (MAC and IP) created for the interconnection on each of the VTEPs, hence
the term “interface-less”. With no IRB interface, the gateway IP address within the type-5
route is set to zero. Traffic is routed to the prefix based on the next-hop of the route (the
VTEP IP) and the MAC address conveyed within the Router MAC extended community, which
represents the inner destination MAC of the VXLAN encapsulated frame.
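
The interface-less lookup on VTEP-2 can be sketched as a longest-prefix match followed by VXLAN encapsulation using the fields carried in the type-5 route. This is a simplified model only: the class and function names are invented for the example, the prefix 192.0.2.0/24 is a documentation range standing in for subnet-A, and "VTEP-1" stands in for the VTEP IP address.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Type5Route:
    prefix: str                  # advertised IP prefix
    next_hop: str                # advertising VTEP (placeholder for its IP)
    vni: int                     # IP-VRF VNI label
    router_mac: str              # Router MAC extended community
    gateway_ip: str = "0.0.0.0"  # zero in interface-less mode


def lookup(ip_vrf, dst):
    """Longest-prefix match over the type-5 routes imported into the IP-VRF."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in ip_vrf if addr in ipaddress.ip_network(r.prefix)]
    return max(matches,
               key=lambda r: ipaddress.ip_network(r.prefix).prefixlen,
               default=None)


def encapsulate(route):
    """VXLAN parameters the ingress VTEP derives from the matched route."""
    return {"outer_dst": route.next_hop, "vni": route.vni,
            "inner_dmac": route.router_mac}


# subnet-A behind Rtr-1, as learnt on VTEP-2 from VTEP-1's type-5 route.
ip_vrf = [Type5Route("192.0.2.0/24", "VTEP-1", 2000, "MAC-A")]
```

Because the gateway IP is zero, nothing in the sketch resolves a gateway address through a type-2 route: the outer destination and inner DMAC come directly from the type-5 route itself.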

4.8 Summary Comparison of Route Type-2 and Type-5 Prefix Announcements

Although both type-2 and type-5 routes have the ability to announce IP prefixes, each is
used for a specific operation. Type-2 routes announce MAC and IP bindings, and are used
for MAC mobility and ARP resolution. Importantly, the prefix in a type-2 route is always
a host route, and its next-hop is always tied to the MAC address advertised in the same route.

Type-5 routes are used to advertise IP prefixes with an associated next-hop IP address. As
discussed, this prefix announcement does not need to be bound to a MAC address. Although
in interface-less mode the Router MAC extended community is sent, there are floating IP
and interface-ful modes that do not explicitly set the MAC in the type-5 update, and instead
rely on the route type-2 update to provide this resolution.

4.9 Auto RT and Auto RD For VLAN-Based EVIs


Route targets in BGP-signaled overlays are a useful construct for creating arbitrary topologies
through constrained route distribution. While VPNs have traditionally been used in WANs, with
EVPN VXLAN these technologies are now being used within multi-tenant data center
environments. Operational staff there may not have experience with Provider Edge
technologies such as VPNs, so reducing configuration complexity is important to realising the
benefits of these technologies. Auto-generation of route targets reduces the provisioning
overhead and the associated overhead of managing the route-target to tenant mapping.

The auto-RT is derived from the local BGP AS number, the overlay index type, and either the
VNI (in the case of VXLAN) or the normalized VLAN ID (in the case of MPLS). This feature is
only supported with 2-byte AS numbers, and with iBGP overlay peerings. In addition, solutions
such as the CloudVision EVPN Configlet Builder enable customers to fully automate EVPN
deployments.
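
As a hedged illustration of how such a derivation can work, the sketch below assumes the auto-derived route-target layout described in RFC 8365 section 5.1.2.1: a 2-byte AS number as the global administrator, and a 4-byte local administrator made up of a 1-bit auto-derived flag, a 3-bit type (assumed to be 1 for VXLAN), a 4-bit domain ID and a 24-bit service ID (the VNI). This guide does not state the exact encoding EOS uses, so treat the bit layout and the function name as assumptions.

```python
VXLAN_TYPE = 1  # assumed type value for a VXLAN VNI (RFC 8365 section 5.1.2.1)


def auto_rt(as_number: int, vni: int, domain_id: int = 0) -> str:
    """Derive a route-target string 'AS:local-admin' from the AS number and VNI."""
    if not 0 < as_number <= 0xFFFF:
        raise ValueError("auto-RT requires a 2-byte AS number")
    if not 0 <= vni <= 0xFFFFFF:
        raise ValueError("VNI must fit in 24 bits")
    if not 0 <= domain_id <= 0xF:
        raise ValueError("domain ID must fit in 4 bits")
    # A bit (bit 31) = 0 marks the value as auto-derived.
    local_admin = (0 << 31) | (VXLAN_TYPE << 28) | (domain_id << 24) | vni
    return f"{as_number}:{local_admin}"
```

Under these assumptions every VTEP in the same AS derives the same route-target for a given VNI without per-tenant configuration, which is consistent with the 2-byte AS and iBGP overlay restrictions noted above.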

5. Deployment Models

5.1 Underlay and Overlay Design Options


The final topic to discuss is the underlay and overlay design: the underlay provides
reachability between loopbacks, or more correctly between VTEP endpoints, and the MP-BGP
EVPN address-family peering advertises MAC addresses and IP prefixes for the overlay network.

As the transport network for EVPN VXLAN is IP, there is no requirement to run an IGP
to support a label distribution protocol such as LDP, or an IGP with the extensions required for
RSVP-TE. In this case customers often use eBGP in the underlay because it is scalable and
predictable, and offers a high degree of route control via policies.

Given eBGP is used in the underlay, it is a simple extension to use the same eBGP
session to advertise the EVPN routes as well, or alternatively to use a separate multi-hop
eBGP session between loopbacks to advertise the EVPN routes, providing a logical
separation between the advertised underlay and overlay prefixes. The second design option for
the overlay EVPN prefixes is an iBGP topology on the loopbacks, with the spine switches
acting as resilient route-reflectors. The third option outlined below is to use an IGP such as
OSPF or IS-IS as the underlay routing protocol, providing reachability between VTEP and
spine loopbacks, with an MP-iBGP overlay session for EVPN route advertisement. This
design would normally employ the spines as route reflectors to ease the BGP configuration.

eBGP Underlay and Overlay with EVPN Transit Router

With reference to Figure 5.1, eBGP is deployed in the underlay, running on the physical
interfaces of the leaf and spine switches to advertise the VTEP prefixes. A separate eBGP
session is configured on each leaf to peer (via loopback interfaces) with one or all spine
switches to advertise the overlay EVPN prefixes. The spine switches transparently re-advertise
the EVPN prefixes to the other leaf routers without changing the next-hop, retaining any
advertised communities. This role is also known as an EVPN transit router, reflecting the
EVPN routes between leafs. Technically, it is not a Route Server because it is not transparent
in the AS path.

Figure 5.1 eBGP Underlay and eBGP Overlay Using Spine as EVPN Transit Routers

eBGP Underlay with iBGP Overlay

With reference to Figure 5.2, the second option with an eBGP underlay is to use a separate
iBGP session in the overlay to advertise the EVPN routes. The choice to run iBGP in the
overlay is normally made so that a route-reflector can be used in the overlay. Route reflector
functionality is supported in EOS and would typically be deployed on two of the spine
switches, so there is no need for a full iBGP mesh between all the leaf switches in the
topology. The iBGP sessions in the diagram below are configured on the loopback interfaces
of the leaf and spine switches, with iBGP peering also configured between the MLAG peers.
The “local-as” parameter is configured on the iBGP neighbors so that the iBGP EVPN
sessions share the same AS number.

Figure 5.2 eBGP Underlay and iBGP Overlay Using Spine Route Reflectors

IGP Underlay with iBGP Overlay

With reference to Figure 5.3, this design uses an IGP such as OSPF or IS-IS as the underlay,
and iBGP in the overlay to advertise the EVPN routes. The choice to run MP-iBGP in the
overlay is normally made so that a route-reflector can be used in the overlay. Route reflector
functionality is supported in EOS and would typically be deployed on two of the spine
switches, so there is no need for a full iBGP mesh between all the leaf switches in the
topology. The iBGP sessions in the diagram below are configured on the loopback interfaces
of the leaf and spine switches.

Figure 5.3 IGP Underlay and iBGP Overlay Using Spine Route Reflectors

5.2 Site Topology Design Options


Finally, a topic related to EVPN protocol operation is the set of site designs for VTEPs
participating in an EVPN VXLAN overlay topology. One aspect has been alluded to with
respect to injecting EVPN type-5 routes from connected or learned prefixes; the other is
site resiliency design with multi-homing.

Active-active forwarding with MLAG

To provide support for active-active multi-homing while preventing any disruption to the
existing leaf spine topology and cabling, EVPN operates in conjunction with Arista’s standard
MLAG leaf topology. While providing support for multi-homing via MLAG, the solution, as
documented in section 4.11, interoperates with any leaf running the EVPN multihoming
model, with type-1 route advertisements.

An MLAG leaf topology interworking with EVPN is illustrated in the diagram below. The
physical switches are configured with a single shared logical VTEP (the next-hop for any
advertised EVPN routes) while each runs a separate BGP EVPN session with the spine.
Each leaf advertises the same locally learnt type-2 and type-5 routes with the same next-hop,
the logical VTEP IP address.

Figure 5.4 Active-active forwarding with MLAG

With the same next-hop set by both leaf switches in the MLAG, they are able to work in
active-active mode, with traffic load-balanced to both the physical switches via the ECMP
topology of the leaf-spine architecture.

To provide resiliency in the event of a leaf losing connectivity to all four spine switches, an
iBGP session is run across the peer link interconnecting the two MLAG leaf switches, where
both underlay prefixes and overlay EVPN routes are exchanged.

MAC & ARP Timeout Setting for Locally Learnt Hosts

Just as in non-EVPN domains, MAC and ARP aging occurs for locally connected hosts.
The only difference with EVPN is that the remotely learnt BGP EVPN MAC and
ARP entries are programmed as static entries. To avoid locally learnt MACs being flushed
after the default timeout (5 minutes) due to a lack of traffic, it is advised to configure the ARP
aging time (default 4 hours) to a value less than the configured MAC timeout. This
configuration forces an ARP refresh, and consequently a re-learning of the MAC entry,
before the MAC is flushed. The ARP aging timer is configured at the interface level with the
CLI command “arp timeout <60-65535 seconds>”. The MAC timeout value is a global
parameter, configured with the CLI command “mac address-table aging-time <10-1000000
seconds>”. This is particularly important if there are “quiet” hosts in the domain and
one needs to ensure MAC entries are not flushed (and then relearnt) unnecessarily.
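
The timer relationship above can be expressed as a simple check. The defaults and value ranges are those quoted in the paragraph; the function name is hypothetical and the sketch performs no device interaction.

```python
MAC_AGING_DEFAULT = 300         # seconds (5 minutes)
ARP_TIMEOUT_DEFAULT = 4 * 3600  # seconds (4 hours)


def timers_ok(arp_timeout: int = ARP_TIMEOUT_DEFAULT,
              mac_aging: int = MAC_AGING_DEFAULT) -> bool:
    """True when ARP refreshes happen before the MAC entry would age out."""
    if not 60 <= arp_timeout <= 65535:
        raise ValueError("arp timeout must be 60-65535 seconds")
    if not 10 <= mac_aging <= 1000000:
        raise ValueError("mac aging-time must be 10-1000000 seconds")
    return arp_timeout < mac_aging
```

With the defaults the check fails, since a 4 hour ARP timeout is far longer than a 5 minute MAC aging time. Either lowering the ARP timeout below the MAC aging time or raising the MAC aging time above the ARP timeout satisfies the recommendation.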

Site/POD Connectivity Options

Another common variable is the site design, and whether the VTEP acts as a default
gateway for a downstream layer 2 domain or peers at layer 3 with downstream L3 aware
nodes. This becomes more relevant at the edge of the DC, when the VTEP is being
deployed for DCI or WAN connectivity. There, instead of end-nodes being connected
directly to the VTEP, it could be a Layer 2 switching domain or the DC core routers that are
peering with the VTEP.

Layer 2 Site Design

Figure 5.5 Layer 2 POD Design

Firstly, the Layer 2 site: in this topology, the VTEP is the default gateway for the Layer 2
domain, and as such it can snoop the ARPs from the connected hosts and generate the
type-2 MAC+IP route for ARP suppression, providing layer 2 connectivity across sites.
Layer 3 VPN services between sites can also be provided in this design by the advertisement
of type-5 routes for the local prefixes.

Layer 3 Site Design

The second site design option is to have routers peering with the VTEPs as detailed below.

Figure 5.6 Layer 3 POD Design

In this topology, the southbound interfaces on the VTEP are configured as routed ports (no
switchport) and the routers most commonly peer using BGP. Again, any prefixes learnt via
this BGP session are advertised on to remote VTEPs as type-5 prefix routes when
redistribute learned is configured in the VRF. This provides Layer 3 VPN services between
sites.

5.3 Layer 2 VPN deployment model


In a Layer 2 EVPN topology, VLANs are stretched across leaf switches, and EVPN is used to
dynamically discover remote VTEPs and advertise MAC routes between VTEPs in the
shared VNI. VXLAN is the supported data-plane encapsulation, used to provide layer 2
connectivity between the leafs across a Layer 3 fabric. There is no IRB configured for any
VLAN/VNI, therefore there is no inter-VNI forwarding capability on any of the Arista VTEPs in
the 4.18.1 EOS release.

EVPN BGP sessions are configured on all leafs in either a full-mesh multi-hop eBGP
topology between Leaf/VTEP switches, or in a partial-mesh using route server capabilities on
the spine. These BGP EVPN peering sessions advertise the dynamically learnt, and
statically configured, MAC addresses to all remote VTEPs. A sequence number is
included in these MAC address advertisements to suppress MAC flapping and MAC spoofing.

Broadcast, unknown unicast and multicast (BUM) traffic is flooded via head-end replication
(HER) to remote VTEPs using the BGP EVPN type 3 route.
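
Head-end replication can be pictured as building a per-VNI flood list from received type-3 routes and sending one unicast copy of each BUM frame to every remote VTEP in that list. The sketch below is a minimal model under that assumption; the tuple representation of a type-3 route and the function names are invented for the example.

```python
from collections import defaultdict


def build_flood_lists(type3_routes, local_vtep):
    """Map each VNI to the set of remote VTEPs that advertised a type-3 route."""
    flood = defaultdict(set)
    for vni, vtep in type3_routes:
        if vtep != local_vtep:  # never replicate to ourselves
            flood[vni].add(vtep)
    return flood


def replicate(frame, vni, flood):
    """One unicast VXLAN copy per remote VTEP in the VNI's flood list."""
    return [(vtep, vni, frame) for vtep in sorted(flood.get(vni, ()))]


# Hypothetical type-3 advertisements seen by VTEP-1: (VNI, originating VTEP).
routes = [(1010, "VTEP-1"), (1010, "VTEP-2"), (1010, "VTEP-4"), (1011, "VTEP-4")]
flood = build_flood_lists(routes, local_vtep="VTEP-1")
```

A BUM frame received in VNI 1010 is then copied to VTEP-2 and VTEP-4 only, while a VNI with no remote members produces no copies.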

The Arista leaf VTEPs can be configured in an active/active dual-homing configuration using
the standard MLAG configuration, and MAC addresses advertised via BGP EVPN updates to
remote VTEPs, with a next-hop of the shared virtual VTEP.

For further information please refer to the following configuration guide:

https://eos.arista.com/evpn-configuration-layer-2-evpn-design-with-type-2-routes/

Layer 2 EVPN model – Non-EVPN Layer 3 Gateway

Figure 5.7 below illustrates the Layer 2 EVPN model for multiple VNIs, with two of the Arista
VTEPs configured in an active/active M-LAG pair. One spine switch is configured as the
transit route server for the EVPN overlay routes, while a non-EVPN aware gateway router
provides inter-VNI routing capabilities and external access.

Figure 5.7 Layer 2 EVPN model – Non-EVPN Layer 3 Gateway

Layer 2 EVPN model – Third-Party Layer 3 VTEP

Figure 5.8 illustrates the same topology as in Figure 5.7, except that now a 3rd party VTEP is
providing gateway functionality. The 3rd party GW VTEPs can be configured either as an ESI
active/standby or an active/active site.

Figure 5.8 Layer 2 EVPN model – Third-Party Layer 3 VTEP

For further information refer to this guide:
https://eos.arista.com/arista-layer-2-vtep-evpn-vxlan-route-type-1-support/

5.4 EVPN VXLAN Layer 2 With Layer 3 IRB Integration


In this section the deployment scenarios will be examined in further detail, with a focus on
deployment use-cases, topology design options and the pros and cons of each approach.

As detailed in Section 4.6, one of the fundamental concepts to understand in EVPN VXLAN
is inter-subnet routing. Firstly, what it is, and secondly, what are the different modes of inter-
subnet routing?

Inter-subnet routing is known by several terms: inter-subnet routing, inter-VLAN
routing, or inter-VNI routing. All refer to the same thing.

Similarly, VXLAN overlay traffic that stays within the same VLAN, or VNI, is known as intra-
subnet forwarding, or intra-VLAN/intra-VNI forwarding.

Figure 5.8 Logical Intra VNI Topology

Figure 5.9 Intra VNI Data-plane Encapsulation

As shown in the diagrams above, intra-VNI forwarding only needs the destination MAC to
forward between VTEPs. In EVPN VXLAN this information is gleaned from the mandatory
MAC address in the type-2 route. The associated IP address may be included in a separate
MAC/IP route if an SVI is configured; as discussed previously, this MAC/IP route allows
for ARP suppression. Intra-VNI forwarding is simple, and any vendor that supports EVPN
VXLAN will advertise MAC routes.

The complication arises when traffic needs to be routed between VNIs. In this case, there
are two methods for providing this functionality, Asymmetric forwarding and Symmetric
forwarding.

For inter-subnet routing to happen, Integrated Routing and Bridging (IRB) needs to be
enabled. In EOS, this means configuring an SVI for the VLAN.

In Arista parlance, the two modes of inter-subnet routing translate as below.

Asymmetric IRB forwarding – Arista’s “Direct” routing model

Symmetrical IRB forwarding – Arista’s “Indirect” routing model

EVPN Asymmetric IRB Inter-Subnet Forwarding

In this mode, the ingress VTEP must route the packet locally, then bridge over the VTEP so
that the receiving side only needs to do a VXLAN header strip and a direct Layer 2 forward
onto the receiving host.

Figure 5.10 Asymmetric IRB

As detailed in the figure above, the ingress VTEP locally routes the packet into VNI 1010
(Orange) and then bridges the packet over VXLAN, with the VXLAN header carrying the
correct VNI for the destination host (VNI 1010) and the inner packet carrying the DMAC of
the receiving host, which is known from a MAC type-2 route in VNI 1010. This requires the
receiving VTEP, VTEP-4, to perform only a layer-2 lookup for the locally connected hosts.

The major drawback with this approach is that the sending VTEP needs all the information for
every host in every VNI to be able to build this packet. This means all VNIs, and their
associated SVIs with anycast IPs, need to be configured on all participating VTEPs.

This introduces scaling issues on multiple fronts.

1. VNI Scaling: There are only a limited number of VNIs supported on some hardware,
so not all VNIs can reside on all VTEPs. This is especially true in datacenter
deployments, where the TORs have traditionally been more resource constrained than
chassis-based edge systems.

2. Forwarding memory scaling: Certain hardware has limited forwarding resources,
so the number of MACs in the forwarding table can become an issue.

3. IP next-hop scaling: In this mode each host has a MAC+IP binding,
meaning each /32 prefix has its own next-hop (NH). This is inefficient, and in symmetric
mode it is avoided by having the remote IP host routes all point to the router MAC
they are located behind.

EVPN Symmetric Inter-Subnet Forwarding

To address these concerns around VNI scaling and host MAC and MAC+IP state
bloat on VTEPs, symmetric mode was proposed to optimize type-2 MAC/IP inter-subnet
host routing.

Figure 5.11 Symmetric IRB

As shown in the figure above, a shared “routing” VNI is now used to forward inter-VNI
traffic, therefore not all MAC-VRFs and associated SVIs need to be configured on all VTEPs.
The model is now route, VXLAN bridge, then route again.

When a host in the green subnet needs to communicate with a host in the orange subnet, it
sends traffic to the default gateway (the VTEP-1 router MAC in VLAN green). The ingress
VTEP does a lookup in the routing VRF (VNI 2000), and swaps the SMAC to be the local
router MAC and the DMAC to be the destination router MAC (VTEP-4 in this case). This
forwarding is standard layer 3 routing. The next step is to forward the traffic over VNI 2000.
This is VXLAN bridging, so the VNI is 2000 and the source and destination IPs are simply the
VTEP IP endpoints.

The receiving router now needs to remove the VXLAN tunnel header and perform a layer 3
lookup on the received packet, which in this case resolves to subnet Orange. Finally, the
receiving router does a lookup for the MAC of the destination host and forwards the frame
with that DMAC and with the SMAC set to the MAC of VTEP-4.
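
The route, VXLAN bridge, route sequence can be sketched as two small functions, one per VTEP. All table contents, MAC labels and function names here are placeholders chosen for the example; only the routing VNI (2000) and the orange VNI (1010) come from the text above.

```python
ROUTING_VNI = 2000

# Ingress VTEP-1 IP-VRF: host route -> (remote VTEP, remote router MAC),
# as would be learnt from symmetric IRB type-2 MAC+IP routes.
IP_VRF = {"IP-4": ("VTEP-4", "RMAC-4")}

# Egress VTEP-4 local ARP table: host IP -> (host MAC, local layer 2 VNI).
LOCAL_ARP_VTEP4 = {"IP-4": ("MAC-4", 1010)}


def ingress_route(dst_ip, local_router_mac="RMAC-1", local_vtep="VTEP-1"):
    """First routing hop: rewrite MACs and encapsulate in the routing VNI."""
    remote_vtep, remote_rmac = IP_VRF[dst_ip]
    return {"outer_src": local_vtep, "outer_dst": remote_vtep,
            "vni": ROUTING_VNI, "inner_smac": local_router_mac,
            "inner_dmac": remote_rmac, "dst_ip": dst_ip}


def egress_route(pkt, router_mac="RMAC-4"):
    """Second routing hop: decapsulate, route into the local subnet."""
    if pkt["inner_dmac"] != router_mac:
        raise ValueError("frame is not addressed to this VTEP's router MAC")
    host_mac, vni = LOCAL_ARP_VTEP4[pkt["dst_ip"]]
    return {"smac": router_mac, "dmac": host_mac, "vni": vni}
```

Note that the ingress table holds only the remote router MAC per host route, not the host MAC itself; the host MAC is resolved solely on the egress VTEP where the subnet is local.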

This addresses the scaling issues seen in asymmetric mode, because now all IP host
routes have a NH of the remote router MAC, dramatically lowering the number of next-hops
in the system. Additionally, there are fewer MACs in the system overall, as MACs in
VNIs that are not local to a VTEP are not known or installed locally.
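
A back-of-the-envelope comparison makes the next-hop saving concrete. The model below is a deliberate simplification with assumed inputs (an equal number of hosts behind every VTEP, and one next-hop per remote host in asymmetric mode versus one per remote router MAC in symmetric mode); real hardware tables are organised differently.

```python
def next_hop_count(hosts_per_vtep: int, vteps: int, mode: str) -> int:
    """Approximate number of remote next-hops one VTEP must hold."""
    remote_vteps = vteps - 1
    if mode == "asymmetric":
        # every remote /32 resolves to its own host MAC
        return hosts_per_vtep * remote_vteps
    if mode == "symmetric":
        # every remote /32 resolves to the router MAC of its VTEP
        return remote_vteps
    raise ValueError(f"unknown mode: {mode}")
```

For example, with 50 VTEPs and 100 hosts behind each, this model gives 4900 next-hops per VTEP in asymmetric mode against 49 in symmetric mode.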

It must be noted that even if the two MAC VRFs exist on the same VTEP, inter-subnet routing
will go via the shared routing VRF VNI.

5.5 Pure Layer 3 VPN Deployment Model (Type-5 only)


BGP EVPN can be used to advertise prefixes using an L3 IP VPN between interconnected
multi-tenant DC POD’s as shown in Figure 5.12, or in a DCI topology as illustrated in Figure
5.13. The BGP sessions are configured on all BGP EVPN Layer 3 VPN routers, in either a
full-mesh multi-hop eBGP topology between Leaf/VTEP switches, or partial-mesh using
route server capabilities on the spine. These BGP EVPN peering sessions advertise the
locally configured, or dynamically learnt, IP prefixes to all remote VTEPs.

The common use cases for using a Layer 3 EVPN service are detailed below. In Figure 5.12,
multiple Layer 2 POD’s are inter-connected at layer 3 using an EVPN L3VPN. The IP
prefixes of each tenant are advertised in their respective EVPN L3VPN instance.

Figure 5.12 Layer 3 EVPN Inter-POD VRF

Figures 5.13 and 5.14 illustrate the DCI/WAN use case, where each tenant’s prefixes in each
DC are advertised in a separate EVPN L3VPN instance. Within each DC the sites can be
connected at Layer 2, such that the edge BGP EVPN speakers are gateways for the local
DC subnets, and are advertising these subnets to remote DC’s, as shown in Figure 5.13.

Figure 5.13 Layer 3 EVPN DCI – Layer 2 Handoff

Or alternatively the subnets can be learnt from local peering routers, such that the BGP
EVPN speakers are advertising these learned local IP prefixes, as well as connected
prefixes to remote DC’s, as shown in Figure 5.14.

Figure 5.14 Layer 3 EVPN DCI – Layer 3 Handoff

6. Conclusion

As customers move resources to the cloud and/or expand their current cloud-based
resources an architecture that is scalable, secure and standards-based is a necessity.
Furthermore, the data center architectures now demand a high degree of flexibility, and
rapid on-ramping of services anytime and anywhere.

EVPN provides this workload mobility, optimized forwarding and routing capabilities and the
ability to extend these services in DCI and WAN interconnects for both layer 2 and layer 3
VPN services, over multiple data plane encapsulations. This allows customers to
standardize on BGP EVPN as the unified service control plane, thus simplifying and de-
risking their deployments and operations.

EOS supports a full suite of BGP EVPN service types and deployment models to support
both these layer 2 and layer 3 VPN services over VXLAN for both DC and DCI/WAN
topologies.

7. Configuration Guides & Further Reading

7.1 General Collateral


Arista Universal Cloud Network Design Guide:
https://www.arista.com/custom_data/downloads/?f=/support/download/DesignGuides/Arista-
Universal-Cloud-Network-Design.pdf

General Overview of VXLAN Control Planes:
https://eos.arista.com/summary-of-arista-vxlan-control-plane-options/

BGP L3LS Fabric Design (used for general BGP best practices): L3LS Design Guide

7.2 Layer 2 EVPN VXLAN Configuration Guides

Layer 2 configuration guide and general concepts:
https://eos.arista.com/eos-4-18-1f/evpn-vxlan/

eBGP Underlay eBGP Overlay Layer 2 EVPN VXLAN Configuration Guide:
https://eos.arista.com/evpn-configuration-layer-2-evpn-design-with-type-2-routes/

7.3 IRB EVPN VXLAN Configuration Guides


EVPN VXLAN IRB configuration guide (covers symmetric and asymmetric):
https://eos.arista.com/eos-4-20-1f/evpn-irb-with-vxlan-underlay/

eBGP Underlay eBGP Overlay IRB EVPN VXLAN Configuration Guide:
https://eos.arista.com/evpn-configuration-ebgp-design-for-evpn-overlay-network/

Multi-Tenant EVPN VXLAN IRB Configuration & Verification Guide (iBGP Overlay eBGP
Underlay): https://eos.arista.com/multi-tenant-evpn-vxlan-irb-configuration-verification-guide-
ibgp-overlay-ebgp-underlay/

Multi-Tenant EVPN VXLAN IRB Configuration & Verification Guide (eBGP Overlay eBGP
Underlay): https://eos.arista.com/multi-tenant-evpn-vxlan-irb-configuration-verification-guide-
ebgp-overlay-underlay/

7.4 Layer 3 EVPN VXLAN Configuration Guides


L3 EVPN VXLAN Configuration Guide: https://eos.arista.com/l3-evpn-vxlan-configuration-
guide/

7.5 Related EVPN VxLAN Services and Functions


Related VXLAN recirculation requirements: https://eos.arista.com/eos-4-15-2f/vxlan-routing/

Recirculation Channels: https://eos.arista.com/eos-4-15-2f/recirculation-channel

Unconnected Ethernet Interfaces For Recirculation:
https://eos.arista.com/eos-4-15-2f/unconnected-ethernet/

A comparison of virtual ip commands:
https://eos.arista.com/a-comparison-of-virtual-ip-commands/

Coexistent Static and EVPN Flood-lists:
https://eos.arista.com/eos-4-21-3f/vxlan-static-and-evpn-dual-configuration/

EVPN MLAG Shared Router MAC:
https://eos.arista.com/eos-4-21-3f/evpn-mlag-shared-router-mac/

EVPN Virtual IP Failover: https://eos.arista.com/eos-4-21-3f/virtual-ip-failover-support/

EVPN Selective ARP Install (scaling ARP):
https://eos.arista.com/eos-4-21-3f/provide-user-control-of-selective-arp/

Enabling DHCP relay in Multi-tenant IRB EVPN VXLAN Overlays:
https://eos.arista.com/eos-4-20-5f/dhcprelay-anycast/

Inserting host-routes into underlay with Asymmetric IRB VXLAN in default VRF:
https://eos.arista.com/eos-4-20-1f/hostinject/

Mapping Multicast to the underlay with BGP EVPN VXLAN Overlay:
https://eos.arista.com/eos-4-20-5f/multicast-in-vxlan-using-underlay/

Configuration Guide Type-1 Inter-OP:
https://eos.arista.com/arista-layer-2-vtep-evpn-vxlan-route-type-1-support/

7.6 Configuring & Managing EVPN VXLAN Using Cloudvision


Building the fabric underlay with Fabric Builder:
https://eos.arista.com/automating-evpn-fabric-deployment-using-cvp/

Layer 2 EVPN VRF provisioning with CloudVision:
https://eos.arista.com/automating-l2-evpn-instances-deployment-using-cloudvision-portal/

Layer 3 EVPN VRF provisioning with CloudVision:
https://eos.arista.com/automating-l3-evpn-instances-deployment-using-cloudvision-portal/
