In this article I share my study notes on VXLAN technology with the Nexus 9000 product family, in both NX-OS and ACI mode.
What is the purpose of VXLAN?
Any network engineer can tell you that VLANs have a limitation: the 12-bit VLAN ID allows a maximum of 4094 usable VLANs in a single layer 2 domain.
A VLAN also stops at the layer 3 boundary, where routing devices take over to transport packets from one location/datacenter to another. That means that applying Live Migration (or vMotion) to a workload in datacenter X to move it to datacenter Y is not possible with VLANs alone. In this sense, VXLAN facilitates a workload-anywhere strategy, which encompasses workload mobility and reachability.
General definition of VXLAN
VXLAN is a layer 2 tunneling scheme (i.e. a group of virtual tunnels) formed over a layer 3 network that plays the role of (in our case) a transport network. The transport network is usually a regular IP network running static or dynamic routing protocols and is called the “underlay”. In Cisco ACI, the underlay network runs the IS-IS protocol. The VXLAN tunnels form the “overlay”.
We already know the VLAN ID. VXLAN has its own ID as well: the VXLAN Network ID (VNID). As with VLANs, network hosts and nodes that belong to the same VNID can communicate with each other. They are said to be part of the same VXLAN segment, so each VNID identifies a unique VXLAN segment on the layer 2 domain. Each VXLAN segment confines two things:
- the layer 2 flooding traffic (BUM traffic) of its attached end hosts
- the configuration/fault domain.
Why is a VXLAN segment important? One of the reasons is that switches that run VXLAN, named VTEPs (for VXLAN Tunnel End Point), read this value in the VXLAN packet and make a decision. Which decision? We will learn this later.
Each VTEP device has two types of interfaces: an IP interface and a switch interface in the local LAN segment, usually the user or server VLAN.
The IP interface connects to the VXLAN infrastructure VLAN; this is a VLAN dedicated only to communication between VTEPs.
Suppose that a server named server_A has something to communicate to another server, server_B. We then distinguish between the local VTEP and the remote VTEP. From the standpoint of server_A, the local VTEP is the “VXLAN switch” nearest to it, and the remote VTEP is the VXLAN switch that server_B attaches to.
I have not yet seen a server that supports VXLAN. This means that, to take the packets from the server VLAN onto the VXLAN overlay, we need some sort of device that acts as a gateway and performs VLAN-to-VXLAN connectivity: this is the role of a VXLAN gateway.
VXLAN packet format
The VXLAN packet has an unusual format compared to the regular Ethernet IEEE 802.3 frame format. That is because VXLAN uses MAC-in-UDP encapsulation.
We are used to the idea that the “last point of encapsulation” occurs at layer 2, where an Ethernet header and a trailer (FCS) are added to the PDU. With VXLAN this no longer holds true: a normal layer 2 Ethernet frame is encapsulated into a UDP packet, which is itself subject to the normal layer 2 encapsulation and de-encapsulation mechanisms.
At which level do VXLAN encapsulation and de-encapsulation occur? They happen at the VTEP level.
I said earlier that VXLAN tunnels are
That is why we have the concepts of VXLAN encapsulation and VXLAN de-encapsulation.
As we can see in the packet format above:
- the VNID field is a 24-bit field, which gives about 16 million (2^24 = 16,777,216) possible VXLAN segments on the same layer 2 domain!
- Notice the outer IP header and the outer MAC header. Both are added or stripped off along the path on the transport network.
- VXLAN adds an overhead of 50 bytes to the original Ethernet frame (outer MAC header 14 + outer IPv4 header 20 + UDP header 8 + VXLAN header 8, without an outer 802.1Q tag). With the usual 1500-byte MTU on the server side, a network engineer must therefore ensure that the underlying transport network supports an MTU of at least 1550 bytes.
- the outer UDP destination port is simply called the VXLAN port; by default it is the IANA-assigned port number 4789.
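To make the field sizes above concrete, here is a minimal Python sketch that builds and parses the 8-byte VXLAN header and redoes the overhead arithmetic. The function and constant names are my own (hypothetical), and the overhead figure assumes an IPv4 underlay with no outer 802.1Q tag.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned destination port
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid value


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte header: flags (8 bits), reserved (24), VNI (24), reserved (8)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Return the VNI carried in a VXLAN header, checking the I flag first."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return word2 >> 8


# Overhead added in front of the original frame (assumption: IPv4, no outer VLAN tag).
OUTER_MAC, OUTER_IPV4, OUTER_UDP, VXLAN_HEADER = 14, 20, 8, 8
OVERHEAD = OUTER_MAC + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER
REQUIRED_UNDERLAY_MTU = 1500 + OVERHEAD

if __name__ == "__main__":
    header = pack_vxlan_header(5000)
    print(header.hex(), unpack_vni(header), OVERHEAD, REQUIRED_UNDERLAY_MTU)
```

With an outer 802.1Q tag or an IPv6 underlay the overhead grows, so the 1550-byte figure should be read as a lower bound.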
We said that VXLAN builds virtual tunnels between VTEPs over the transport network. These tunnels can be either unicast or multicast.
To each VNID value we associate an IP multicast group address.
VXLAN on Nexus 9k in NX-OS mode
Cisco Nexus 9000 switches process VXLAN packets, in both encapsulation and de-encapsulation, at line rate because the VXLAN function in the Nexus 9k is hardware-based. This means that no CPU cycles are spent processing VXLAN packets.
Packet forwarding behaviour when the MAC address and the remote VTEP are known
A unicast tunnel is used between the initiating VTEP and the destination VTEP when the MAC address that we want to reach is known: the host has it in its ARP table, and the local VTEP has learned which remote VTEP that MAC address sits behind.
Packet forwarding behaviour when the destination MAC address is not known
A multicast tunnel is used between all VTEPs belonging to the same IP multicast group, when a MAC address we want to reach is unknown. In other words, the VXLAN flow reaches all VTEPs that joined the IP multicast group that is assigned to the VNID value.
We know that nodes that joined an IP multicast group all receive the multicast-forwarded packets. However, we note here that not all of them will process the received packet. In fact they process it only when the observed VNID is equal to the VNID of their local VXLAN segment.
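The two forwarding cases can be sketched in a few lines of Python. This is a toy model, not how NX-OS implements it: the `Vtep` class, its fields, and the addresses used in the example are all hypothetical, chosen only to show the decision (known MAC: unicast to the remote VTEP; unknown MAC: the multicast group of the VNID) and the VNID check on the receiving side.

```python
from dataclasses import dataclass, field


@dataclass
class Vtep:
    ip: str
    local_vnis: set                 # VNIDs of the locally attached VXLAN segments
    mcast_group_by_vni: dict        # VNID -> underlay IP multicast group
    mac_table: dict = field(default_factory=dict)  # (VNID, MAC) -> remote VTEP IP

    def outer_destination(self, vni: int, dst_mac: str) -> tuple:
        """Pick the outer destination IP for a frame entering the overlay."""
        remote_vtep = self.mac_table.get((vni, dst_mac))
        if remote_vtep is not None:
            return ("unicast", remote_vtep)
        # Unknown destination MAC: flood to every VTEP that joined the group.
        return ("multicast", self.mcast_group_by_vni[vni])

    def accepts(self, vni: int) -> bool:
        """A receiving VTEP only processes packets for its own segments."""
        return vni in self.local_vnis


if __name__ == "__main__":
    vtep1 = Vtep("10.0.0.1", {5000}, {5000: "239.1.1.1"})
    print(vtep1.outer_destination(5000, "00:00:00:00:00:0b"))  # MAC unknown
    vtep1.mac_table[(5000, "00:00:00:00:00:0b")] = "10.0.0.2"  # MAC learned
    print(vtep1.outer_destination(5000, "00:00:00:00:00:0b"))
```

The lookup key is the pair (VNID, MAC) rather than the MAC alone, because the same MAC address may legitimately appear in two different VXLAN segments.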
VXLAN and ECMP
Since VXLAN leverages the underlay network, it can benefit from Equal Cost Multi Path (ECMP) routing. ECMP also provides link load sharing: when several equal-cost IP paths exist, all of them are used for traffic forwarding. None of this is available with VLANs, where we run the spanning-tree protocol and rely on the added features of its different versions such as RSTP or MST.
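As a rough illustration of how ECMP spreads VXLAN traffic, the sketch below hashes the outer flow fields to choose one of several equal-cost next hops. It also derives the outer UDP source port from the inner MAC addresses, which the VXLAN specification (RFC 7348) recommends so that different inner flows between the same pair of VTEPs can take different paths. Real switches use their own hardware hash; CRC32, the function names, and the addresses here are assumptions made for the example.

```python
import zlib


def vxlan_source_port(inner_src_mac: str, inner_dst_mac: str) -> int:
    """Derive the outer UDP source port from the inner frame (range 49152-65535)."""
    digest = zlib.crc32(f"{inner_src_mac}>{inner_dst_mac}".encode())
    return 49152 + digest % 16384


def pick_next_hop(outer_flow: tuple, next_hops: list) -> str:
    """Choose one equal-cost next hop; the same flow always maps to the same hop."""
    key = "|".join(map(str, outer_flow)).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


if __name__ == "__main__":
    spines = ["spine1", "spine2", "spine3", "spine4"]
    sport = vxlan_source_port("00:00:00:00:00:0a", "00:00:00:00:00:0b")
    # Outer flow: source VTEP, destination VTEP, protocol, source port, VXLAN port.
    flow = ("10.0.0.1", "10.0.0.2", "udp", sport, 4789)
    print(sport, pick_next_hop(flow, spines))
```

Hashing per flow rather than per packet keeps the packets of one conversation on one path, which avoids reordering.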
VXLAN on Nexus 9k in ACI mode
VXLAN in the Cisco ACI fabric is different from the VXLAN implementation with NX-OS software.
- In ACI, VXLAN works in IRB (integrated routing and bridging) mode.
- At the ingress leaf, the endpoint usually generates traffic that is encapsulated in VLANs (some endpoints support VXLAN and NVGRE too!). The ACI fabric leaf:
  - encapsulates the ingress frame in a Cisco VXLAN packet that contains both UDP and VXLAN headers. The VXLAN header includes the right VNI (there is a VNI-to-VLAN mapping on each fabric leaf).
  - encapsulates the VXLAN packet into an IP packet
  - routes the packet to the destination VTEP using the IS-IS underlay network.
- At the egress leaf, the packet is decapsulated from the Cisco VXLAN header and encapsulated into a VLAN frame (or whatever encapsulation the endpoint uses).
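The ingress and egress steps above can be sketched as a small Python simulation. It models only the VLAN-to-VNI mapping and the outer IP addressing, not the real Cisco VXLAN header; the function names, dictionary layout, VLAN and VNI numbers are all hypothetical.

```python
def ingress_leaf(frame: dict, vlan_to_vni: dict, local_vtep: str, remote_vtep: str) -> dict:
    """Map the access VLAN to a VNI and wrap the frame for the fabric."""
    vni = vlan_to_vni[frame["vlan"]]
    inner = {k: v for k, v in frame.items() if k != "vlan"}  # the access encapsulation is removed
    return {"outer_src_ip": local_vtep, "outer_dst_ip": remote_vtep, "vni": vni, "inner": inner}


def egress_leaf(packet: dict, vni_to_vlan: dict) -> dict:
    """Strip the fabric encapsulation and re-tag with the egress leaf's own VLAN."""
    frame = dict(packet["inner"])
    frame["vlan"] = vni_to_vlan[packet["vni"]]
    return frame


if __name__ == "__main__":
    frame = {"vlan": 10, "src_mac": "00:00:00:00:00:0a",
             "dst_mac": "00:00:00:00:00:0b", "payload": "hello"}
    in_fabric = ingress_leaf(frame, {10: 9000}, "10.0.0.1", "10.0.0.2")
    print(in_fabric)
    print(egress_leaf(in_fabric, {9000: 20}))
```

In this example the frame enters on VLAN 10 and leaves on VLAN 20: since each leaf holds its own mapping, only the VNI has to be consistent across the fabric.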