Tuesday, August 2, 2022

Virtual Extensible Local Area Network (VXLAN)

A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks

Virtual Extensible Local Area Network (VXLAN) is a tunneling protocol that carries Ethernet (layer 2) traffic over an IP (layer 3) network. It overlays layer 2 networks on a layer 3 underlay, which can run any IP routing protocol, using MAC-in-UDP encapsulation.

Traditional layer 2 networks run into problems for three main reasons:

  • Spanning tree: Spanning tree blocks redundant links to avoid loops. Blocking links to create a loop-free topology gets the job done, but it also means we pay for links we can’t use. We could switch to a layer 3 network, but some technologies require layer 2 networking.
  • Limited number of VLANs: The VLAN ID is 12 bits long, which gives 4096 values; 0 and 4095 are reserved, so we can create 4094 VLANs. That can be an issue for data centers. For example, imagine a service provider with 500 customers. With 4094 available VLANs, it can offer only 8 VLANs to each customer.
  • Large MAC address tables: Because of server virtualization, the number of addresses in the MAC address tables of our switches has grown sharply. Before server virtualization, a switch only had to learn one MAC address per switchport. With server virtualization, we run many virtual machines (VMs) or containers on a single physical server. Each VM has a virtual NIC and a virtual MAC address, so the switch has to learn many MAC addresses on a single switchport. A Top of Rack (ToR) switch in a data center could connect to 24 or 48 physical servers. A data center could have many racks, so each switch has to store the MAC addresses of all VMs that communicate with each other. We need much larger MAC address tables than in networks without server virtualization.
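
The VLAN arithmetic in the second bullet can be checked with a few lines of Python. This is only a sketch of the numbers quoted above; the variable names are made up for illustration.

```python
VLAN_ID_BITS = 12                  # width of the VLAN ID field

total_ids = 2 ** VLAN_ID_BITS      # 4096 possible values
reserved_ids = {0, 4095}           # the two reserved values named in the text
usable_vlans = total_ids - len(reserved_ids)

customers = 500                    # example figure from the text
vlans_per_customer = usable_vlans // customers

print(usable_vlans)        # 4094
print(vlans_per_customer)  # 8
```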

VXLAN is a Layer 2 overlay scheme over a Layer 3 network. It uses MAC Address-in-User Datagram Protocol (MAC-in-UDP) encapsulation to provide a means to extend Layer 2 segments across the data center network. VXLAN is a solution to support a flexible, large-scale multitenant environment over a shared common physical infrastructure. The transport protocol over the physical data center network is IP plus UDP. VXLAN is defined in RFC 7348.

VXLAN Packet Format

VXLAN defines a MAC-in-UDP encapsulation scheme where the original Layer 2 frame has a VXLAN header added and is then placed in a UDP-IP packet. With this MAC-in-UDP encapsulation, VXLAN tunnels a Layer 2 network over a Layer 3 network.

VXLAN introduces an 8-byte VXLAN header that consists of a 24-bit VXLAN network identifier (written VNID or VNI) and a few reserved bits. The VXLAN header together with the original Ethernet frame goes in the UDP payload. The 24-bit VNID is used to identify Layer 2 segments and to maintain Layer 2 isolation between the segments. With all 24 bits in the VNID, VXLAN can support about 16 million (2^24 = 16,777,216) LAN segments.

The inner MAC frame is encapsulated with the following four headers (starting from the innermost header):

VXLAN Header:  This is an 8-byte field that has:

  • Flags (8 bits): where the I flag MUST be set to 1 for a valid VXLAN Network ID (VNI).  The other 7 bits (designated "R") are reserved fields and MUST be set to zero on transmission and ignored on receipt.
  • VXLAN Segment ID/VXLAN Network Identifier (VNI): this is a 24-bit value used to designate the individual VXLAN overlay network on which the communicating VMs are situated.  VMs in different VXLAN overlay networks cannot communicate with each other.
  • Reserved fields (24 bits and 8 bits): MUST be set to zero on transmission and ignored on receipt.
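
The 8-byte layout described above can be sketched with Python's struct module. The function names are hypothetical and this is a minimal illustration of the field layout, not a full VTEP implementation.

```python
import struct

VXLAN_FLAG_I = 0x08      # the I flag within the 8-bit flags field
MAX_VNI = 2 ** 24 - 1    # largest value that fits in the 24-bit VNI

def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for the given VNI."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits set to zero.
    # Word 2: VNI (24 bits) followed by 8 reserved bits set to zero.
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)

def parse_vxlan_header(header: bytes) -> int:
    """Return the VNI from a VXLAN header, ignoring reserved bits."""
    if len(header) < 8:
        raise ValueError("VXLAN header is 8 bytes long")
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set, so the VNI is not valid")
    return word2 >> 8

header = build_vxlan_header(5000)
print(header.hex())                # 0800000000138800
print(parse_vxlan_header(header))  # 5000
```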

Outer UDP Header:  This is the outer UDP header with a source port provided by the VTEP and the destination port being a well-known UDP port.

  • Destination Port: IANA has assigned the value 4789 for the VXLAN UDP port, and this value SHOULD be used by default as the destination UDP port.  Some early implementations of VXLAN have used other values for the destination port.  To enable interoperability with these implementations, the destination port SHOULD be configurable.
  • Source Port:  It is recommended that the UDP source port number be calculated using a hash of fields from the inner packet. This adds entropy to the outer header, so different inner flows are spread across ECMP paths and LACP member links for better load sharing. When calculating the UDP source port number in this manner, it is RECOMMENDED that the value be in the dynamic/private port range 49152-65535 [RFC6335].
  • UDP Checksum: It SHOULD be transmitted as zero.  When a packet is received with a UDP checksum of zero, it MUST be accepted for decapsulation.  Optionally, if the encapsulating end point includes a non-zero UDP checksum, it MUST be correctly calculated across the entire packet including the IP header, UDP header, VXLAN header, and encapsulated MAC frame.  When a decapsulating end point receives a packet with a non-zero checksum, it MAY choose to verify the checksum value.  If it chooses to perform such verification, and the verification fails, the packet MUST be dropped.  If the decapsulating destination chooses not to perform the verification, or performs it successfully, the packet MUST be accepted for decapsulation.
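
The source port recommendation above can be sketched as follows. The choice of CRC32 and of hashing only the inner Ethernet header are assumptions made for illustration; the text only asks for a hash of fields from the inner packet, and real VTEPs may hash other fields.

```python
import zlib

PORT_MIN, PORT_MAX = 49152, 65535   # dynamic/private port range

def vxlan_source_port(inner_frame: bytes) -> int:
    """Derive a UDP source port from the inner frame (illustrative hash)."""
    # Assumption: hash the inner destination MAC, source MAC, and EtherType,
    # which are the first 14 bytes of the inner Ethernet frame.
    key = inner_frame[:14]
    span = PORT_MAX - PORT_MIN + 1
    return PORT_MIN + zlib.crc32(key) % span

# Hypothetical inner frame: two made-up MAC addresses and the IPv4 EtherType.
frame = bytes.fromhex("0200000000020200000000010800") + b"payload"
port = vxlan_source_port(frame)
print(PORT_MIN <= port <= PORT_MAX)  # True
```

Because the port depends only on the inner header fields, every packet of the same inner flow gets the same source port and stays on one underlay path, while different flows tend to land on different ports.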

Outer IP Header:  This is the outer IP header with the source IP address indicating the IP address of the VTEP over which the communicating VM is running.  The destination IP address can be a unicast or multicast IP address. When it is a unicast IP address, it represents the IP address of the VTEP connecting the communicating VM as represented by the inner destination MAC address.

  • Source IP = Local VTEP Address
  • Destination IP = Remote VTEP Address

Outer Ethernet Header: This header carries the MAC address of the immediate next hop. The outer destination MAC address in this frame may be the address of the target VTEP or of an intermediate Layer 3 router.  The outer VLAN tag is optional.  If present, it may be used for delineating VXLAN traffic on the LAN.
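
Adding up the four headers gives the encapsulation overhead. The sketch below assumes an outer IPv4 header without options; the constant and function names are made up for illustration.

```python
# Header sizes in bytes.
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20        # assumption: IPv4 without options
OUTER_ETHERNET = 14    # destination MAC, source MAC, EtherType
OUTER_VLAN_TAG = 4     # optional outer VLAN tag

def encapsulation_overhead(outer_vlan_tag: bool = False) -> int:
    """Bytes placed in front of the inner Ethernet frame."""
    overhead = VXLAN_HEADER + OUTER_UDP + OUTER_IPV4 + OUTER_ETHERNET
    if outer_vlan_tag:
        overhead += OUTER_VLAN_TAG
    return overhead

print(encapsulation_overhead())      # 50
print(encapsulation_overhead(True))  # 54
```

The underlay has to carry these extra bytes, which is why the underlay MTU is usually set larger than the MTU seen by the hosts in the overlay.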

VXLAN uses an overlay and underlay network

  • An overlay network is a virtual network that runs on top of a physical underlay network.
  • With VXLAN, the overlay is a layer 2 Ethernet network. The underlay network is a layer 3 IP network.
  • The underlay network is simple; its only job is to get packets from A to B. We don’t use any layer 2 here, only layer 3. When we use layer 3, we can use an IGP like OSPF or EIGRP and load balance traffic on redundant links.
  • Another advantage is that the overlay and underlay network are independent. The overlay network is virtual and requires an underlay network, but whatever changes you make in the overlay network won’t affect the underlay network. You can add and remove links in the underlay network, and as long as your routing protocol can still reach the destination, your overlay network will remain unchanged.

VXLAN tunnel endpoint (VTEP)

Any endpoint like a host, switch, or router that supports VXLAN can be referred to as a VTEP (VXLAN Tunnel Endpoint). VTEP is the device that’s responsible for encapsulating and de-encapsulating layer 2 traffic. This device is the connection between the overlay and the underlay network. The VTEP comes in two forms:

  • Software (host-based): When I’m talking about hosts, I mean hypervisors like VMware’s ESXi or Microsoft’s Hyper-V. These hypervisors use virtual switches, and some of them support VXLAN. The VXLAN tunnels are between the virtual switches of the hypervisors. The underlay network is unaware of VXLAN.
  • Hardware (gateway): A hardware VTEP is a router, switch, or firewall which supports VXLAN. We also call a hardware VTEP a VXLAN gateway because it combines a regular VLAN and a VXLAN segment into a single layer 2 domain. Some switches have VXLAN support with ASICs, offering better VXLAN performance than a software VTEP. The VXLAN tunnels are between the physical switches. The devices that connect to the physical switches are unaware of VXLAN.

Each VTEP has two interface types:

  • VTEP IP interface: Connects the VTEP to the underlay network with a unique IP address. This interface encapsulates and de-encapsulates Ethernet frames.
  • VNI interface: A virtual interface, similar to a switched virtual interface (SVI), that keeps network traffic separated on the physical interface.

A VTEP can have multiple VNI interfaces, but they associate with the same VTEP IP interface.

Ways to Implement VXLAN

Depending on the data center use case, there are two ways to implement VXLAN:

  • Bridging: Used when the two communicating hosts are on the same subnet, so no gateway is required on the VTEPs. In this case, packets are simply bridged over the VXLAN tunnel interfaces (VTIs) from the source VTEP to the destination VTEP.
  • Routing: Used when the two communicating hosts are on different subnets, so a gateway is required on the VTEP. The packet is routed from the source VLAN to the destination VLAN on the first-hop VTEP and then bridged to the remote VTEP.
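
The choice between the two cases comes down to whether both hosts sit in the same subnet. A minimal sketch, assuming IPv4 hosts with a known prefix length; the function name and addresses are hypothetical.

```python
import ipaddress

def forwarding_mode(src_ip: str, dst_ip: str, prefix_len: int) -> str:
    """Return 'bridging' if both hosts share a subnet, else 'routing'."""
    src_subnet = ipaddress.ip_interface(f"{src_ip}/{prefix_len}").network
    if ipaddress.ip_address(dst_ip) in src_subnet:
        return "bridging"   # same subnet: no gateway needed on the VTEP
    return "routing"        # different subnets: the first-hop VTEP routes

print(forwarding_mode("10.1.1.10", "10.1.1.20", 24))  # bridging
print(forwarding_mode("10.1.1.10", "10.1.2.20", 24))  # routing
```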

VXLAN Control Plane Options

  • Multicast
  • HER (Head End Replication), also known as Ingress Replication on Cisco platforms
  • BGP EVPN

Sample Configuration of VXLAN Tunnel:
interface vxlan1
   ! Source interface for VXLAN traffic
   vxlan source-interface loopback <lo-int>
   ! Destination UDP port for VXLAN-encapsulated traffic
   vxlan udp-port 4789
   ! One-to-one mapping between a VLAN and a VNI
   vxlan vlan <vlan-id> vni <vni-id>
   ! Flood list: BUM traffic is encapsulated and replicated to this remote VTEP
   vxlan flood vtep <remote-vtep-ip>
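
With head-end replication, the flood list decides where BUM traffic goes: the ingress VTEP sends one unicast copy of the frame to each remote VTEP on the list. A rough sketch of that behavior, with hypothetical names and addresses, not a model of any particular switch:

```python
from typing import List, Tuple

def head_end_replicate(
    frame: bytes, local_vtep: str, flood_list: List[str]
) -> List[Tuple[str, str, bytes]]:
    """Return one (outer source IP, outer destination IP, frame) per remote VTEP."""
    return [
        (local_vtep, remote_vtep, frame)
        for remote_vtep in flood_list
        if remote_vtep != local_vtep   # never flood back to ourselves
    ]

# Hypothetical VTEP addresses for illustration only.
copies = head_end_replicate(b"arp-request", "10.0.0.1", ["10.0.0.2", "10.0.0.3"])
for outer_src, outer_dst, _ in copies:
    print(outer_src, "->", outer_dst)
```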
