Wednesday, December 21, 2022

Spanning Tree Protocol

Switching Loops

A Layer-2 switch belongs to only one broadcast domain and will forward both broadcasts and multicasts out every port but the originating port.
When a switching loop is introduced into the network, a destructive broadcast storm will develop within seconds. A storm occurs when broadcasts are endlessly forwarded through the loop. Eventually, the storm will choke off all other network traffic.
The visible symptoms of a bridging (Layer-2) loop are:

  • Broadcast Storm
  • MAC address table instability
  • Multiple Frame copies

The solution is the Spanning Tree Protocol.

Spanning Tree Protocol (STP) is a Layer-2 protocol that runs on switches and prevents loops by ensuring that only a single active path exists between any two nodes in the network. It creates a loop-free topology by blocking redundant links in Ethernet networks.

History of Spanning Tree Protocol

  • Dr. Radia Perlman, then at Digital Equipment Corporation (DEC), invented STP, which was first specified as IEEE 802.1D.
  • Then the IEEE defined Rapid Spanning Tree Protocol (RSTP) as 802.1w in 2001. RSTP introduces new convergence behaviours and the bridge port roles for faster network change and failure recovery. In addition, RSTP is backward compatible with STP.
  • STP was initially specified as IEEE 802.1D, but the capability of spanning tree (802.1D), rapid spanning tree (802.1w), and multiple Spanning tree (802.1s) has since been integrated into IEEE 802.1Q-2014. MSTP is also backward compatible with STP.

Types of Spanning tree protocol

IEEE versions of STP include:

  • IEEE 802.1D, which is the original STP version. One STP instance for all VLANs.
  • IEEE 802.1w, which is Rapid STP or RSTP. It has faster convergence than STP.
  • IEEE 802.1s, which is Multiple Spanning Tree Protocol or MSTP. 
    • The IEEE response to Cisco’s Per-VLAN STP; multiple VLANs can be mapped to a single STP instance.

Cisco proprietary versions of STP include:

  • Per-VLAN STP+ (PVST+), which runs one STP instance per VLAN.
  • Per-VLAN Rapid STP (Rapid PVST+), which converges faster than PVST+.

Spanning Tree Protocol Working

STP elects a root bridge in the network. The root bridge is the center of the spanning tree, and every other bridge must reach it via the shortest possible path. Spanning tree calculates the cost of each path from every bridge in the network to the root bridge. Only the lowest-cost path is kept and used; all other paths are disabled by placing their ports into a blocking state.
These functions, and many more, are performed by exchanging BPDUs (Bridge Protocol Data Units) between the switches every 2 seconds.

What is BPDU (Bridge Protocol Data Unit)?

  • BPDUs are small control frames that carry STP information between switches (an 802.1D configuration BPDU carries a 35-byte payload).
  • STP employs BPDUs to elect a single root bridge and to discover and propagate topology changes (TCs).
  • BPDUs include the data necessary to assign distinct port responsibilities between switches and detect/avoid loops.
  • Only the root bridge sends BPDUs in a stable STP (802.1D) topology, while other bridges relay the root bridge BPDUs.
  • The most recent BPDU received on each port is saved for up to the timer’s maximum age.
  • An inferior BPDU contains root bridge information that is worse than the BPDU currently stored for the port on which it was received.
  • A superior BPDU contains root bridge information that is superior to the BPDU currently stored for the port it was received on.
  • When a superior BPDU is received on a port, the previous BPDU is overwritten, and the port is promoted to root/designated port.
  • BPDUs are generated per-VLAN with PVST+.
  • PVST+ BPDUs include the VLAN ID in a ‘PVID’ TLV field, the sending port’s MAC address, and a destination multicast MAC of 0100.0ccc.cccd.
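The superior/inferior comparison described above can be sketched as an ordered tuple comparison, where lower values win at each step. This is a simplified illustration; the field names are ours, not the actual BPDU wire format:

```python
# Simplified sketch of the 802.1D BPDU comparison: the lower ordered
# tuple wins. Field names are illustrative, not the wire format.
from dataclasses import dataclass

@dataclass(frozen=True)
class Bpdu:
    root_bid: tuple       # (priority, MAC) of the claimed root bridge
    root_path_cost: int   # cumulative cost to that root
    sender_bid: tuple     # (priority, MAC) of the sending bridge
    sender_port_id: int   # (port priority << 12) | port number

    def key(self):
        return (self.root_bid, self.root_path_cost,
                self.sender_bid, self.sender_port_id)

def is_superior(new, stored):
    """A received BPDU is superior if its ordered key compares lower."""
    return new.key() < stored.key()

stored = Bpdu((32768, "0000.1111.2222"), 19, (32768, "0000.3333.4444"), (128 << 12) | 1)
new    = Bpdu((4096,  "0000.5555.6666"), 19, (32768, "0000.7777.8888"), (128 << 12) | 1)
print(is_superior(new, stored))  # → True (lower root priority wins)
```

A superior BPDU like this one would overwrite the stored BPDU on the port, exactly as described above.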

The root bridge is the authoritative starting point for computing the loop-free spanning-tree structure. As a result, all bridges should only have one active link, known as the root port, to that root bridge.

A VLAN’s root bridge ports will be in the designated forwarding state. The root bridge broadcasts BPDUs with a root path cost of 0.

Building the STP topology is a multistep convergence process:

  • A Root Bridge is elected
  • Root ports are identified
  • Designated ports are identified
  • Ports are placed in a blocking state as required, to eliminate loops

Election of the Root Bridge

The first step of the STP process is to elect the root bridge in the network.
The bridge with the lowest Bridge ID is chosen as the STP root bridge.
The Bridge ID comprises two components in the original 802.1D standard:

  • 16-bit Bridge priority
  • 48-bit MAC address

The default priority is 32,768, and the lowest priority wins. If there is a tie in priority, the lowest MAC address is used as the tie-breaker.

When a switch boots, it assumes it is the root bridge and sets the Root ID in all outgoing BPDUs to the local Bridge ID. If it receives a BPDU with a lower root ID, it considers that switch as a root switch. The local switch then starts sending BPDUs with that root ID.

On a root bridge, the output of “show spanning-tree” will show:
>> ‘this bridge is root.’
>> The same Priority and MAC address for both the Root ID and Bridge ID.
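The election rule above can be sketched as picking the minimum (priority, MAC) pair; the switch names and MAC addresses here are hypothetical:

```python
# Sketch of root bridge election: lowest Bridge ID wins, comparing
# priority first and MAC address as the tie-breaker.
switches = {
    "SwitchA": (32768, "001a.2b3c.4d01"),
    "SwitchB": (32768, "001a.2b3c.4d00"),  # would win a priority tie
    "SwitchC": (4096,  "001a.2b3c.4dff"),  # lowest priority wins outright
}

# Tuple comparison checks priority first, then the MAC string.
root = min(switches, key=lambda name: switches[name])
print(root)  # → SwitchC
```

If SwitchC were absent, the priority tie between A and B would fall to SwitchB's lower MAC address.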

Identifying Root Port

The second step in the STP convergence process is to identify root ports.

The root port of each switch has the lowest root path cost to get to the Root Bridge.

Each switch can only have one root port. The Root Bridge cannot have a root port, as the purpose of a root port is to point to the Root Bridge.

The local switch checks the BPDUs received on ports. If BPDU packets from the root bridge are received on multiple ports, then multiple paths to the root bridge exist in the network.

The best path is then considered to be through the port that received the BPDU with the lowest path cost. As BPDUs are forwarded from bridge to bridge, path costs are calculated by adding each receiving port’s cost to the advertised path cost.

Selection of Root Port

1st- Lowest cumulative cost to the root bridge:

  • It is the sum of all the port cost values towards the root bridge.
  • The default values are inversely based on interface bandwidth, i.e., a higher bandwidth interface will have a lower cost.
  • The port cost can be changed manually using “spanning-tree cost.”

2nd- Lowest upstream BID:

  • Used to choose one bridge over another when two uplinks to different bridges are available.

3rd- Lowest port ID

Port ID is used as the final tiebreaker, and consists of two components:

  • 4-bit port priority
  • 12-bit port number, derived from the physical port number

  • The lowest port priority wins; the default is 128.
  • If port priorities tie, the lowest port number value assigned by the IOS software wins, e.g., Fa0/1 may have a port number of 1.
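The tiebreakers above can be sketched as a single tuple comparison, using the classic 802.1D "short" default port costs; the candidate uplinks are hypothetical:

```python
# Sketch of root-port selection: among candidate uplinks, choose the
# port whose (cumulative cost, upstream BID, upstream port ID) tuple
# compares lowest. Classic 802.1D "short" default costs, keyed by Mbps:
PORT_COST = {10: 100, 100: 19, 1000: 4, 10000: 2}

# Hypothetical candidate uplinks on one switch:
# (cumulative cost to root, upstream BID, upstream port ID, local port)
candidates = [
    (19 + PORT_COST[100],  (32768, "aa"), 0x8001, "Fa0/1"),  # cost 38
    (4  + PORT_COST[1000], (32768, "bb"), 0x8002, "Gi0/1"),  # cost 8
]

cost, bid, pid, root_port = min(candidates)
print(root_port)  # → Gi0/1 (cost 8 beats cost 38)
```

Note how the higher-bandwidth gigabit uplink wins purely on cumulative cost; the BID and port ID fields would only matter on a cost tie.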

Identifying Designated Port

  • The third step in the STP convergence process is to identify designated ports.
  • A single designated port is identified for each network segment.
  • This port is responsible for forwarding BPDUs and frames to that segment.
  • If two ports are eligible to become the designated port, then there is a loop.
  • One of the ports will be placed in a blocking state to eliminate the loop.
  • Like a root port, the designated port is determined by the lowest cumulative path cost leading to the Root Bridge.
  • A designated port will never be placed in a blocking state unless there is a change to the switching topology and a more preferred designated port is elected.

Note: A port can never be both a designated port and a root port.

Blocking non-forwarding ports

If a port is not a root or designated port but has received BPDUs, it is put into a blocking state. Although administratively up, these ports are not permitted to forward traffic (they still receive and process BPDUs). In RSTP, this type of port is referred to as an alternate or backup port.

Spanning Tree Protocol states

As STP converges the switching topology, a switch port will progress through a series of states:

  • Blocking
  • Listening
  • Learning
  • Forwarding

Blocking

  • A port that is blocking is essentially idle. The blocking delay is 20 seconds.
  • The port cannot forward or receive frames or record MAC addresses.
  • The port is responsible for receiving and processing BPDUs only.
  • The port can receive and respond to network management messages if required.

Listening

  • The listening state is like the blocking state, except that BPDUs are sent and received in this state. Again, frame forwarding remains prohibited, and no addresses are learned.
  • The listening delay is 15 seconds.

Learning

  • A port in the learning state does not forward frames, but it does analyze incoming frames, retrieve the source MAC addresses from those frames, and enter them into the MAC address table or CAM table. 
  • The frames are discarded after they have been analyzed.
  • The learning delay is 15 seconds.

Forwarding

  • You can think of the forwarding state as the “normal” state. In this state, a port receives and transmits BPDUs, examines incoming packets for MAC address information, and forwards frames from other switch ports.
  • When a port is in the forwarding state, the device or network connected to it is active and ready to communicate.
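The default timers above imply a 30-second delay before a new port forwards, and up to 50 seconds after an indirect failure, which a quick calculation confirms:

```python
# Sketch of the default 802.1D timers and the delay a port experiences
# before it can forward (values taken from the states above).
MAX_AGE = 20        # seconds a stored BPDU is kept (blocking worst case)
FORWARD_DELAY = 15  # seconds spent in each of listening and learning

# A newly connected port: listening + learning before forwarding.
print(2 * FORWARD_DELAY)            # → 30 seconds

# An indirect failure: max age expiry, then listening + learning.
print(MAX_AGE + 2 * FORWARD_DELAY)  # → 50 seconds
```

These two numbers (30 and 50 seconds) are exactly what the PortFast, UplinkFast, and BackboneFast features below are designed to reduce.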

STP Bonus Features

To Improve STP Convergence

  • In many environments, a 30-second outage for every topology change is unacceptable. 
  • Cisco developed three proprietary features that improve STP convergence time:

PortFast

  • By default, all ports on a switch participate in the STP topology. This includes any port that connects to a host, such as a workstation.
  • The host port will transition through the normal STP states, including waiting two forward delay times. Thus, a host will be without network connectivity for a minimum of 30 seconds when first powered on.
  • Portfast does two things for us:
  • Interfaces with PortFast enabled that come up will go to forwarding mode immediately; the interface skips the listening and learning states.
  • A switch will never generate a topology change notification for an interface that has PortFast enabled. This eliminates unnecessary BPDU traffic and frame flooding.
  • Portfast is disabled by default. To enable PortFast on a switch port:
SwitchD(config)# int gi1/14
SwitchD(config-if)# spanning-tree portfast
  • PortFast can also be globally enabled for all interfaces:
SwitchD(config)# spanning-tree portfast default

UplinkFast

  • Uplinkfast is a spanning-tree feature that was created to improve the convergence time.
  • Normally, if the root port fails on the local switch, STP will need to perform a recalculation to transition the other port out of a blocking state.
  • BPDUs originate from the root bridge, so if we receive BPDUs on an interface, the switch knows it can reach the root bridge on that interface. Without UplinkFast, the new root port still has to go through the listening (15 seconds) and learning (15 seconds) states, so it takes 30 seconds to end up in the forwarding state.
  • UplinkFast is disabled by default, and must be enabled globally for all VLANs on the switch:
Switch(config)# spanning-tree uplinkfast
  • UplinkFast functions by tracking all possible links to the Root Bridge. Thus, UplinkFast is not supported on the Root Bridge. In fact, enabling this feature will automatically increase a switch’s bridge priority to 49,152.
  • UplinkFast is intended for the furthest downstream switches in the STP topology.

BackboneFast

  • UplinkFast provides faster convergence if a directly connected port fails. 
  • In contrast, BackboneFast provides improved convergence if there is an indirect failure in the STP topology.
  • If the link between SwitchB and SwitchA fails, SwitchD will eventually recalculate a path through SwitchE to reach the Root Bridge. However, SwitchD must wait the max age timer before purging SwitchB’s superior BPDU information. By default, this is 20 seconds.
  • BackboneFast allows a switch to bypass the max age timer. The switch will accept SwitchE’s inferior BPDUs immediately. The blocked port on SwitchE must still transition to a forwarding state. Thus, BackboneFast essentially reduces total convergence time from 50 seconds to 30 seconds for an indirect failure.
  • This is accomplished by sending out Root Link Queries (RLQs). The Root Bridge will respond to these queries with a RLQ Reply:
  • If a RLQ Reply is received on a root port, the switch knows that the root path is stable.
  • If a RLQ Reply is received on a non-root port, the switch knows that the root path has failed. The max age timer is immediately expired to allow a new root port to be elected.
  • BackboneFast is a global command, and should be enabled on every switch:
Switch(config)# spanning-tree backbonefast

To Protect STP

STP is vulnerable to attack for two reasons:

  • STP builds the topology by accepting BPDUs from neighboring switches.
  • The Root Bridge is always determined by the lowest Bridge ID.
A switch with a low priority can be maliciously or inadvertently installed on the network, and then elected as the Root Bridge. STP will reconverge, often resulting in instability or a suboptimal topology.
Cisco implemented three mechanisms to protect the STP topology:

Root Guard

  • Root Guard prevents an unauthorized switch from advertising itself as a Root Bridge. If a BPDU superior to the Root Bridge is received on a port with Root Guard enabled, the port is placed in a root-inconsistent state.
  • In this state, the port is essentially in a blocking state, and will not forward frames. The port can still listen for BPDUs.
  • Root Guard is enabled on a per-port basis, and is disabled by default:
Switch(config)# interface gi1/14
Switch(config-if)# spanning-tree guard root
  • To view all ports that have been placed in a root-inconsistent state:
Switch# show spanning-tree inconsistentports
Name Interface Inconsistency
-------------------- -------------------- ------------------
VLAN100 GigabitEthernet1/14 Root Inconsistent
  • Root Guard can automatically recover. As soon as superior BPDUs are no longer received, the port will transition normally through STP states.

BPDU Guard

  • Spanning-tree BPDUguard is one of the features that helps you protect your spanning-tree topology.
  • PortFast should only be enabled on ports connected to a host. If enabled on a port connecting to a switch, any loop may result in a broadcast storm.
  • To prevent such a scenario, BPDU Guard can be used in conjunction with PortFast.
  • Under normal circumstances, a port with PortFast enabled should never receive a BPDU, as it is intended only for hosts.
  • BPDU Guard will place a port in an errdisable state if a BPDU is received, regardless of whether the BPDU is superior or inferior.
  • The STP topology will not be impacted by another switch that is inadvertently connected to that port.
  • BPDU Guard should be enabled on any port with PortFast enabled.
  • It is disabled by default, and can be enabled on a per-interface basis:
Switch(config)# interface gi1/14
Switch(config-if)# spanning-tree bpduguard enable
  • If BPDU Guard is enabled globally, it will only apply to PortFast ports:
Switch(config)# spanning-tree portfast bpduguard default
  • An interface can be manually recovered from an errdisable state by performing a shutdown and then no shutdown:
Switch(config)# interface gi1/14
Switch(config-if)# shutdown
Switch(config-if)# no shutdown
  • BPDUs will still be sent out ports enabled with BPDU Guard.

BPDU Filtering

  • The spanning-tree BPDUfilter works like BPDUGuard as it allows you to block malicious BPDUs. The difference is that BPDUguard will put the interface that it receives the BPDU on in err-disable mode while BPDUfilter just “filters” it.
  • BPDUfilter can be configured globally or on the interface level and there’s a difference:
  • Global: if you enable BPDUfilter globally, any interface with PortFast enabled will not send or receive BPDUs. If a BPDU is received on a PortFast-enabled interface, the interface loses its PortFast status, disables BPDU filtering, and acts as a normal interface.
  • Interface: if you enable BPDUfilter on the interface it will ignore incoming BPDUs and it will not send any BPDUs. This is the equivalent of disabling spanning-tree.
  • Great care must be taken when manually enabling BPDU Filtering on a port. Because the port will ignore a received BPDU, STP is essentially disabled. The port will neither be err-disabled nor progress through the STP process, and thus the port is susceptible to loops.
  • If BPDU Filtering is enabled globally, it will only apply to PortFast ports:

Switch(config)# spanning-tree portfast bpdufilter default

  • To enable BPDU Filtering on a per-interface basis:
Switch(config)# interface gi1/15
Switch(config-if)# spanning-tree bpdufilter enable

Spanning-Tree LoopGuard and UDLD

  • If you have ever used fiber cables, you might have noticed that there are separate connectors for transmitting and receiving traffic.
  • If one of the cables (transmit or receive) fails, we’ll have a unidirectional link failure, and this can cause spanning tree loops. There are two protocols that can take care of this problem:

UDLD (Unidirectional Link Detection)

  • Cisco developed Unidirectional Link Detection (UDLD) to ensure that bidirectional communication is maintained. 
  • UDLD sends out ID frames on a port and waits for the remote switch to respond with its own ID frame. 
  • If the remote switch does not respond, UDLD assumes the port has a unidirectional fault. 
  • By default, UDLD sends out ID frames every 15 seconds on most Cisco platforms; some platforms default to every 7 seconds.
  • UDLD must be enabled on both sides of a link.
  • UDLD reacts one of two ways when a unidirectional link is detected:
  • Normal Mode – the port is not shut down, but is flagged as being in an undetermined state.
  • Aggressive Mode – the port is placed in an errdisable state
  • UDLD can be enabled globally, though it will only apply for fiber ports:
Switch(config)# udld enable message time 20
Switch(config)# udld aggressive message time 20

  • The enable parameter sets UDLD into normal mode, and the aggressive parameter is for aggressive mode. 
  • The message time parameter modifies how often ID frames are sent out, measured in seconds.
  • UDLD can be configured on a per-interface basis:
Switch(config-if)# udld enable
Switch(config-if)# udld aggressive
Switch(config-if)# udld disable

  • To view UDLD status on ports, and reset any ports disabled by UDLD:
Switch# show udld
Switch# udld reset

LoopGuard

  • UDLD addresses only one of the possible causes of this scenario – a unidirectional link. 
  • Other issues may prevent BPDUs from being received or processed, such as the CPU on a switch being at max utilization.
  • Loop Guard provides a more comprehensive solution – if a blocking port stops receiving BPDUs on a VLAN, it is moved into a loop-inconsistent state for that VLAN.
  • A port in a loop-inconsistent state cannot forward traffic for the affected VLANs, and is essentially in a pseudo-errdisable state.
  • However, Loop Guard can automatically recover. As soon as BPDUs are received again, the port will transition normally through STP states.
  • Loop Guard can be enabled globally:

Switch(config)# spanning-tree loopguard default

  • Loop Guard can also be enabled on a per-interface basis:
Switch(config)# interface gi2/23
Switch(config-if)# spanning-tree guard loop

  • Loop Guard should only be enabled on trunk ports, or ports that connect to other switches. 
  • Loop Guard should never be enabled on a port connecting to a host, as an access port should never receive a BPDU.

Thursday, December 1, 2022

OSI and TCP/IP Model

How many layers are there in networking?
Do we have a four-layer TCP/IP model, perhaps a five-layer TCP/IP model, or a seven-layer OSI model?

  • We use a five-layer TCP/IP model, which is a combination of the original RFC 1122 TCP/IP model and the OSI model. So basically, it's a hybrid of multiple models.
  • When we have a model, it means we're taking a complex problem and breaking it up into smaller components or smaller pieces.
  • Models are used in many places as an example if you're building a house, you typically have a blueprint or a model of what the House is going to look like.
  • It makes a lot more sense to create a blueprint of a house and then have specific people work on specific parts of the building and do what they are good at.
  • So as an example, a plumber will work on the plumbing, an electrician will concentrate on the electricity, a bricklayer will concentrate on laying the bricks.
  • But they all work together toward the end result, which is the house that you want built.
  • It's going to be much easier to have a blueprint or a model that everyone works towards to build something rather than them just arriving on site and then saying let's build this house, but they don't actually know what the house looks like.
  • You have an electrician working on the plumbing or a plumber working on bricklaying.
  • That's not going to scale very well.
  • So, similarly we have different layers in the OSI model and different people concentrate on different layers.
  • Now the layers that we as networking people concentrate on are the lower four layers which in the OSI model are called transport, network, data link and physical.
  • In the new version of the CCNA they are using this hybrid model where they've taken parts of the OSI model and added it to the TCPIP model.
  • You need to know both the OSI model and TCPIP model but concentrate on the TCPIP model.
  • The OSI model which consists of the seven layers physical layer, data link layer, network layer, transport layer, session layer, presentation layer, and application layer isn't as important as the TCPIP hybrid model if you like.
  • So, a five-layer TCPIP model which is more real world which has a physical layer, data link layer, network layer, transport layer and a combined application layer.
  • But notice we talk about Layer 7 applications because of the history of the OSI model being used.
  • So, notice we have layer 1, layer 2, layer 3, layer 4, those are the layers that we concentrate on as a networking person. And then we have a combined layer 5 to Layer 7 called the application layer but we still referred to it as application layer.

Let discuss more in detail all seven layer of OSI Model first

The Open Systems Interconnection (OSI) model is a conceptual model created by the International Organization for Standardization in 1984 which enables diverse communication systems to communicate using standard protocols. In plain English, the OSI provides a standard for different computer systems to be able to communicate with each other.
The OSI Model can be seen as a universal language for computer networking. It’s based on the concept of splitting up a communication system into seven abstract layers, each one stacked upon the last.

The OSI model is divided into two groups: the upper layers and the lower layers.

The upper layer of the OSI model mainly deals with the application related issues, and they are implemented only in the software. The application layer is closest to the end user. Both the end user and the application layer interact with the software applications. An upper layer refers to the layer just above another layer.
The lower layer of the OSI model deals with the data transport issues. The data link layer and the physical layer are implemented in hardware and software. The physical layer is the lowest layer of the OSI model and is closest to the physical medium. The physical layer is mainly responsible for placing the information on the physical medium.

  • Application Layer (Layer 7) – Human-computer interaction layer, where applications can access the network services. Here are your applications. E-mail, browsing the web (HTTP), FTP, and many more.
  • Presentation layer (Layer 6) – Ensures that data is in a usable format and is where data encryption occurs. This one will make sure that information is readable for the application layer by formatting and structuring the data. Most computers use the ASCII table for characters. If another computer would use another character like EBCDIC, then the presentation layer needs to “reformat” the data, so both computers agree on the same characters.
  • Session Layer (Layer 5) – Maintains connections and is responsible for controlling ports and sessions. The session layer takes care of establishing, managing, and terminating sessions between two hosts. When you are browsing a website on the internet, you are probably not the only user of the web server hosting that website. This web server needs to keep track of all the different “sessions.”
  • Transport Layer (Layer 4) – Transmits data using transmission protocols including TCP and UDP. When you downloaded this lesson from the Internet, the webpage was sent in segments and transported to your computer.
  • Network Layer (Layer 3) – Decides which physical path the data will take. This layer takes care of connectivity and path selection (routing). This is where IPv4 and IPv6 live. Every network device needs a unique address on the network.
  • Datalink Layer (Layer 2) – Defines the format of data on the network. This layer makes sure data is formatted the correct way, takes care of error detection, and makes sure data is delivered reliably. This might sound a bit vague, but for now, remember that this is where “Ethernet” lives. MAC Addresses and Ethernet frames are on the Data Link layer.
  • Physical Layer (Layer 1) – Transmits raw bit stream over the physical medium. This layer describes stuff like voltage levels, timing, physical data rates, physical connectors, and so on. Everything you can “touch” since it’s physical.

Let’s take a look at a real-life example of data transmission:

  1. You are sitting behind your computer and want to download some files from a local webserver. You start up your web browser and type in the URL of your favorite website. Your computer will send a message to the web server requesting a certain web page. You now use the HTTP protocol, which lives on the application layer.
  2. The presentation layer will structure the information of the application in a certain format.
  3. The session layer will make sure to separate all the different sessions.
  4. Depending on the application, you want a reliable (TCP) or unreliable (UDP) protocol to transfer data to the web server. In this case, it’ll choose TCP since you want to ensure the webpage makes it to your computer. We’ll discuss TCP and UDP later.
  5. Your computer has a unique IP address (for example, 192.168.1.1), and it will build an IP packet. This IP packet will contain all the data of the application, presentation, and session layer. It also specifies which transport protocol it’s using (TCP in this case) and the source IP address (your computer 192.168.1.1), and the destination (the web server’s IP address).
  6. The IP packet will be put into an Ethernet Frame. The Ethernet frame has a source MAC address (your computer) and the destination MAC address (webserver). More about Ethernet and MAC addresses later.
  7. Finally, everything is converted into bits and sent down the cable using electric signals.

Going from the application layer all the way down to the physical layer is what we call encapsulation.
Going from the physical layer and working your way up to the application layer is called de-encapsulation.
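The encapsulation steps above can be sketched as nested headers; the field names are simplified illustrations, and the addresses are arbitrary:

```python
# Sketch of encapsulation: each layer wraps the payload handed down
# from the layer above. Field names are simplified illustrations.
app_data = "GET /index.html HTTP/1.1"                       # application data

segment = {"proto": "TCP", "dst_port": 80,
           "payload": app_data}                             # L4: segment
packet  = {"src_ip": "192.168.1.1", "dst_ip": "192.168.1.10",
           "payload": segment}                              # L3: packet
frame   = {"src_mac": "aa:aa:aa:aa:aa:aa",
           "dst_mac": "bb:bb:bb:bb:bb:bb",
           "payload": packet}                               # L2: frame

# De-encapsulation walks back up the layers, peeling one header at a time:
assert frame["payload"]["payload"]["payload"] == app_data
print("de-encapsulated OK")
```

The nesting also makes the PDU terminology below concrete: the innermost wrap is a segment, then a packet, then a frame.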

Now you know about the OSI model, the different layers, and the function of each layer. During peer-to-peer communication, each layer has “packets of information.” We call these protocol data units (PDU). Now every unit has a different name on the different layers:

  • Transport layer: Segments; For example, we talk about TCP segments.
  • Network layer: Packets; For example, we talk about IP packets here.
  • Data link layer: Frames; For example, we talk about Ethernet frames here.
  • Physical layer: Bits; For example, we talk about the electric current going on/off (1/0)

This is just terminology, so don’t mix up talking about IP frames and Ethernet packets…

OSI is just a reference model, a blueprint. TCP/IP model is an implementation of current internet architecture.

The OSI model was developed by the ISO (International Organization for Standardization), whereas the TCP/IP model was developed for ARPANET (Advanced Research Projects Agency Network).

Monday, October 10, 2022

Introduction to DMVPN

  • DMVPN (Dynamic Multipoint VPN) is a routing technique we can use to build a VPN network with multiple sites without having to statically configure all devices. 
  • It’s a “hub and spoke” network where the spokes will be able to communicate with each other directly without having to go through the hub. 
  • Encryption is supported through IPsec which makes DMVPN a popular choice for connecting different sites using regular Internet connections. 
  • It’s a great backup or alternative to private networks like MPLS VPN. 
  • A popular alternative to DMVPN is FlexVPN.
  • DMVPN is an overlay hub and spoke technology that allows an enterprise to connect its offices across an NBMA network.

A final note that must be reiterated: DMVPN is a routing technique and is NOT a security feature. By default, any traffic sent over DMVPN will be in clear text, since GRE is used as the transport tunnel; however, this traffic can be referenced in an IPsec transform set and encrypted if you want.

There are four pieces to the DMVPN puzzle:

  • Multipoint GRE (mGRE)
  • NHRP (Next Hop Resolution Protocol)
  • Routing (RIP, EIGRP, OSPF, BGP, etc.)
  • IPsec (not required but recommended)

Multipoint GRE

Our “regular” GRE tunnels are point-to-point and don’t scale well. For example, let’s say we have a company network with some sites that we want to connect to each other using regular Internet connections:
We have one router that represents the HQ and there are four branch offices. Let’s say that we have the following requirements:

  • Each branch office must be connected to the HQ.
  • Traffic between Branch 1 and Branch 2 must be tunnelled directly.
  • Traffic between Branch 3 and Branch 4 must be tunnelled directly.
To accomplish this, we will have to configure a bunch of GRE tunnels which will look like this:

Things will get messy quickly…we must create multiple tunnel interfaces, set the source/destination IP addresses, etc. It will work, but it’s not a very scalable solution. Multipoint GRE, as the name implies, allows us to have multiple destinations. When we use it, our picture could look like this:

  • When we use GRE Multipoint, there will be only one tunnel interface on each router. 
  • The HQ for example has one tunnel with each branch office as its destination. 
  • Now you might be wondering, what about the requirement where branch office 1/2 and branch office 3/4 have a direct tunnel?
  • Right now, we have a hub and spoke topology. 
  • The cool thing about DMVPN is that we use multipoint GRE so we can have multiple destinations. When we need to tunnel something between branch office 1/2 or 3/4, we automatically “build” new tunnels, as seen in above figure.

When there is traffic between the branch offices, we can tunnel it directly instead of sending it through the HQ router. This sounds pretty cool, but it introduces some problems…

When we configure point-to-point GRE tunnels, we must configure a source and destination IP address that are used to build the GRE tunnel. When two branch routers want to tunnel some traffic, how do they know what IP addresses to use?

Above we have our HQ and two branch routers, branch1 and branch2. Each router is connected to the Internet and has a public IP address:

  • HQ: 1.1.1.1
  • Branch1: 2.2.2.2
  • Branch2: 3.3.3.3

On the GRE multipoint tunnel interface, we use a single subnet with the following private IP addresses:

  • HQ: 192.168.1.1
  • Branch1: 192.168.1.2
  • Branch2: 192.168.1.3

Let’s say that we want to send a ping from branch1’s tunnel interface to the tunnel interface of branch2. Here’s what the GRE encapsulated IP packet will look like:

The “inner” source and destination IP addresses are known to us; these are the IP addresses of the tunnel interfaces. We encapsulate this IP packet, put a GRE header in front of it, and then must fill in the “outer” source and destination IP addresses so that the packet can be routed on the Internet. The branch1 router knows its own public IP address, but it has no clue what the public IP address of branch2 is…
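Schematically, the GRE-encapsulated ping from branch1 to branch2 would look like this; the outer destination (marked with a question mark) is exactly the piece of information branch1 is missing:

```
| Outer IP: src 2.2.2.2, dst ? | GRE | Inner IP: src 192.168.1.2, dst 192.168.1.3 | ICMP echo |
```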

To fix this problem, we need some help from another protocol…

NHRP (Next Hop Resolution Protocol)

We need something that helps our branch1 router figure out the public IP address of the branch2 router. We do this with a protocol called NHRP (Next Hop Resolution Protocol). Here's how NHRP works:

  • One router will be the NHRP server.
  • All other routers will be NHRP clients.
  • NHRP clients register themselves with the NHRP server and report their public IP address.
  • The NHRP server keeps track of all public IP addresses in its cache.
  • When one router wants to tunnel something to another router, it asks the NHRP server for the public IP address of the other router.

Since NHRP uses this server/client model, it makes sense to use a hub-and-spoke topology for multipoint GRE. Our hub router will be the NHRP server, and all other routers will be the spokes.

Here's an illustration of how NHRP works with multipoint GRE:


Above we have two spoke routers (NHRP clients) which establish a tunnel to the hub router. Later, when we look at the configurations, you will see that the destination IP address of the hub router is statically configured on the spoke routers, while the hub router accepts spoke routers dynamically. The routers use an NHRP registration request message to register their public IP addresses with the hub.

  • The hub, our NHRP server will create a mapping between the public IP addresses and the IP addresses of the tunnel interfaces.
  • A few seconds later, spoke1 decides that it wants to send something to spoke2. It needs to figure out the destination public IP address of spoke2, so it sends an NHRP resolution request, asking the hub router what the public IP address of spoke2 is.
  • The Hub router checks its cache, finds an entry for spoke 2 and sends the NHRP resolution reply to spoke1 with the public IP address of spoke2.
  • Spoke1 now knows the destination public IP address of spoke2 and is able to tunnel something directly. This is great, we only required the hub to figure out what the public IP address is and all traffic can be sent from spoke to spoke directly.
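Conceptually, after both registrations the hub's NHRP cache holds mappings like these (a sketch using the addresses from the example above):

```
Tunnel (overlay) address   Public (NBMA) address   Learned via
192.168.1.2                2.2.2.2                 registration (spoke1)
192.168.1.3                3.3.3.3                 registration (spoke2)
```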

When we talk about DMVPN, we often refer to an underlay and overlay network:

  • The underlay network is the network we use for connectivity between the different routers, for example the Internet.
  • The overlay network is our private network with GRE tunnels.

NHRP is a bit like ARP or Frame Relay Inverse ARP. Instead of mapping between L2 and L3 information, however, we are now mapping a tunnel IP address to an NBMA IP address.

DMVPN Phases

DMVPN has different versions, which we call phases. There are three of them:

  • Phase 1
  • Phase 2
  • Phase 3

Phase 1

  • Phase 1 is the first phase that was defined when Cisco implemented this technology, and it is designed strictly for hub-and-spoke communication.
  • With phase 1 we use NHRP so that spokes can register themselves with the hub. 
  • The hub is the only router that is using a multipoint GRE interface, all spokes will be using regular point-to-point GRE tunnel interfaces. 
  • This means that there will be no direct spoke-to-spoke communication, all traffic has to go through the hub!
  • Since our traffic has to go through the hub, our routing configuration will be quite simple. 
  • Spoke routers only need a summary or default route to the hub to reach other spoke routers.

  • R1 is acting as the DMVPN hub for this network and is therefore the next hop server (NHS) for NHRP registration of the spokes. 
  • In phase 1 the GRE tunnels shown are multipoint GRE on the hub and point-to-point on the spokes. 
  • This forces hub and spoke traffic flows on the spokes.
As a high-level configuration on R1 we can see the basic configurations for DMVPN phase 1.

The first two commands shown create a GRE tunnel interface and set the VPN address; this is nothing new in GRE configurations.

  • no ip redirects - Disables ICMP Redirects on this interface. With DMVPN there can be cases where traffic flows through the hub initially to reach the destination, if the next-hop address isn't set as the hub this would cause an ICMP redirect message to be sent. Disabling this will prevent excessive redirect traffic.
  • no ip split-horizon eigrp 1 - Disables split-horizon on the hub so routes from one spoke can be sent down to another. Since EIGRP is always loop-free due to the feasibility condition this will not cause a loop.
  • ip nhrp authentication - This command requires spokes to authenticate with the hub before registration or resolution requests can be made. This is optional however advised.
  • ip nhrp map multicast dynamic - This command allows the hub to replicate multicast traffic (including routing protocol hellos) to all dynamically learned spokes.
  • ip nhrp map 100.64.0.1 10.1.1.1 - This command, configured on the spokes, performs a static NHRP mapping that specifies that the VPN address 100.64.0.1 maps to the physical address 10.1.1.1 in the underlying topology.
  • ip nhrp network-id - This command specifies the ID of the DMVPN cloud. It's required to allow a router to distinguish between each DMVPN network as more than one can be created on a router.
  • ip nhrp nhs 100.64.0.1 - This command, also configured on the spokes, specifies who the next hop server is on the network. Notice that this is the VPN address, so a static mapping for it is needed; this is the reason for the mapping command earlier.
  • ip summary-address eigrp 1 0.0.0.0 0.0.0.0 - Send out a default summary route to spokes. Since only hub and spoke traffic flows are allowed default routing can be done via the hub to reduce the routing table information on spokes.
  • tunnel source GigabitEthernet1 - Specifies the source of the tunnel interface. The address of this interface will be advertised in the registration message and should be reachable via the spokes.
  • tunnel mode gre multipoint - Specifies the interface as a multipoint GRE interface and that an explicit destination doesn't need to be specified.
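Putting the hub-side commands above together, a phase 1 hub tunnel configuration could look like the following sketch. The tunnel IP address, EIGRP AS number, network-id value, and authentication key are assumptions based on the addressing used in this article:

```
interface Tunnel0
 ip address 100.64.0.1 255.255.255.0
 no ip redirects
 no ip split-horizon eigrp 1
 ip nhrp authentication DMVPN1                 ! key value is an assumption
 ip nhrp map multicast dynamic
 ip nhrp network-id 1                          ! ID value is an assumption
 ip summary-address eigrp 1 0.0.0.0 0.0.0.0
 tunnel source GigabitEthernet1
 tunnel mode gre multipoint
```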

Next, a brief review of the spoke configuration can be seen below:

The configuration of a spoke router is simpler, requiring just the usual IP address configuration, the NHS specification and mapping, and the authentication parameters. The most noticeable difference is the explicit specification of the tunnel destination. By having point-to-point tunnels on the spokes, it forces a hub-and-spoke topology.
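Based on the description above, a phase 1 spoke configuration could be sketched as follows; the spoke's tunnel address and source interface are assumptions, while the hub addresses (100.64.0.1 and 10.1.1.1) come from the mapping commands discussed earlier:

```
interface Tunnel0
 ip address 100.64.0.2 255.255.255.0
 ip nhrp authentication DMVPN1        ! must match the hub's key
 ip nhrp map 100.64.0.1 10.1.1.1      ! static mapping for the NHS
 ip nhrp network-id 1
 ip nhrp nhs 100.64.0.1
 tunnel source GigabitEthernet1
 tunnel destination 10.1.1.1          ! point-to-point: forces hub-and-spoke flows
```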

Phase 2

The disadvantage of phase 1 is that there are no direct spoke-to-spoke tunnels. In phase 2, all spoke routers use multipoint GRE tunnels, so we do have direct spoke-to-spoke tunnelling. When a spoke router wants to reach another spoke, it sends an NHRP resolution request to the hub to find the NBMA IP address of the other spoke. This function, however, relies heavily on your routing design and on ensuring that the next-hop address is preserved when routes are advertised from the hub down to the other spokes.

  • In DMVPN phase 2 when a spoke router wishes to communicate with another spoke router it will look at its routing table to determine the next-hop address.
  • For R2 to reach R3's loopback address it needs to send the traffic to 100.64.0.3 which is located out the tunnel0 interface. 
  • Since this is now a multipoint GRE interface, R2 will check its NHRP cache to determine what the underlying address of R3 is so that underlay routing can occur. This can be seen with show dmvpn.
  • The final part on DMVPN phase 2 is to briefly look at the configuration changes made to enable this phase. 

Starting with the hub tunnel configuration:

The first configuration change is the removal of the summary route, as it would set the next-hop address to the hub and therefore force the data plane to flow through the hub. 

  • In addition, next-hop self was disabled for EIGRP, so the original next hop is preserved as routes are propagated across the DMVPN. 
  • This is what controls spoke-to-spoke traffic flows. 
  • Each routing protocol implements this differently, however, so caution is needed. 
  • For example, in OSPF this is achieved by setting the network type to broadcast; iBGP doesn't change the next hop by default anyway, and eBGP can use a third-party next hop. 
  • Lastly, IS-IS can be configured by setting the network type to broadcast, though care needs to be taken to ensure database synchronization is complete.
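In sketch form, the phase 2 changes on the hub's tunnel interface amount to two removals (assuming EIGRP AS 1, as elsewhere in this article):

```
interface Tunnel0
 no ip summary-address eigrp 1 0.0.0.0 0.0.0.0   ! spokes now need the specific routes
 no ip next-hop-self eigrp 1                     ! preserve the originating spoke as next hop
```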

The configuration on the spokes:

  • On the spokes the most noticeable change is the conversion of the tunnel from a point-to-point GRE tunnel to a multipoint GRE tunnel. 
  • This allows spoke-to-spoke traffic flows as data isn't forced to be sent to the hub. 
  • The other NHRP mapping command tells the spoke to send any multicast traffic to the hub router. 
  • This is to allow EIGRP neighbours to form as multicast traffic is then sent to 10.1.1.1 directly. 
  • Note that the hub address is referenced as the underlay address.
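A phase 2 spoke tunnel could therefore be sketched as below; note the multipoint tunnel mode and the multicast mapping pointing at the hub's underlay address 10.1.1.1 (the spoke's own addresses are assumptions):

```
interface Tunnel0
 ip address 100.64.0.2 255.255.255.0
 ip nhrp map 100.64.0.1 10.1.1.1      ! static mapping for the NHS
 ip nhrp map multicast 10.1.1.1       ! send EIGRP hellos to the hub's underlay address
 ip nhrp network-id 1
 ip nhrp nhs 100.64.0.1
 tunnel source GigabitEthernet1
 tunnel mode gre multipoint           ! replaces the phase 1 "tunnel destination" command
```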

Phase 3

DMVPN Phase 3 is the final and most scalable phase in DMVPN, as it combines the summarisation benefits of phase 1 with the spoke-to-spoke traffic flows achieved in phase 2. This phase works by having the hub advertise a default route (or a summary of all spoke prefixes) and then enabling NHRP redirect messages. On the spokes you only need to enable NHRP shortcuts for DMVPN phase 3, as routing completes the rest. DMVPN will therefore use the same topology as before:


  • On R2 the only entry in its routing table is a received default route from R1 with it being set as the next-hop. 
  • As we know, this would normally cause hub and spoke traffic flows as the next-hop points to the hub for the data plane.
  • Therefore, when R2 wishes to ping the loopback on R3 (Spoke-to-spoke traffic flow) it will initially send the traffic up to the hub as that is what the routing table has suggested. 
  • Upon receipt of the packet, R1 will realize that the destination is another spoke on the DMVPN network (because the destination is reachable out the same DMVPN interface). 
  • Therefore, R1 will then send an NHRP redirect down to both R2 and R3 telling them that their source addresses are reachable by each other. This can be seen in R1's debug output.
  • Upon receipt of a redirect message, R2 and R3 will conduct NHRP resolution requests with each other (R1 acting as the proxy) to resolve their NBMA Address with their VPN address. 
  • Upon completion, R2 will have two extra entries in its routing table recorded by NHRP.
  • One is the destination network that was previously summarised; the second is then used for recursive routing to allow spoke-to-spoke traffic flows on the tunnel interface. 
  • This process then happens for each network and next hop on the DMVPN network, so only the information that is needed is on the spokes.
  • This is as opposed to DMVPN phase 2, where all routing information was propagated.
  • As with any DMVPN verification, traceroute reveals the direct spoke-to-spoke traffic flows.

Phase 3 configurations are relatively simple and normally require only summarisation and the enabling of NHRP redirects. Let's review the hub's configuration first...

Since we are using EIGRP, we can issue a summary-address command to force a default route to be advertised out to the spokes. Equally, we only need to enable the sending of NHRP redirects. Also note that, unlike in phase 2, next-hop self can be enabled again, since NHRP redirects handle the next hop. Moving on to the spoke's configuration...
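As a sketch, the phase 3 additions on the hub's tunnel interface would be (EIGRP AS number assumed as before):

```
interface Tunnel0
 ip summary-address eigrp 1 0.0.0.0 0.0.0.0   ! advertise a default route to the spokes
 ip nhrp redirect                             ! signal spokes when a shorter path exists
```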

On the spokes you just need to enable NHRP shortcut, which allows the spoke to accept any NHRP redirect information. Failure to input this would force hub-and-spoke traffic flows, as the spokes would ignore any redirect messages sent to them.
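The corresponding spoke-side change is a single-command sketch:

```
interface Tunnel0
 ip nhrp shortcut    ! accept redirects and install NHRP shortcut entries
```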