Wednesday, September 28, 2022

Firewall Port Considerations for Cisco SDWAN

  • The secure sessions between the WAN Edge routers and the controllers (and between controllers) are, by default, DTLS, which is User Datagram Protocol (UDP)-based.
  • The default base source port is 12346.
  • The WAN Edge may use port hopping, where the device tries different source ports when establishing connections in case the attempt on the first port fails.
  • The WAN Edge will increment the port by 20 and try ports 12366, 12386, 12406, and 12426 before returning to 12346.
  • Port hopping is configured by default on a WAN Edge router, but you can disable it globally or on a per-tunnel-interface basis.
  • It is recommended to run port-hopping at the branches but disable this feature on SD-WAN routers in the data center, regional hub, or any place where aggregate traffic exists because connections can be disrupted if port hopping occurs.
  • Note that port hopping is disabled on the controllers by default and should be kept disabled.
  • Control connections on vManage and the vSmart controller with multiple cores have a different base port for each core.
  • For WAN Edge routers that sit behind the same NAT device and share a public IP address, you do not want each WAN Edge to attempt to connect to the same controller using the same port number. Although NAT or port hopping may allow both devices to use a unique source port, you can instead configure an offset to the base port number of 12346, so the port attempts will be unique (and more deterministic) among the WAN Edge routers.
  • A port offset of 1 will cause the WAN Edge to use the base port of 12347, and then port-hop with ports 12367, 12387, 12407, and 12427. Port offsets need to be explicitly configured, and by default, the port offset is 0.
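For reference, a minimal vEdge-style CLI sketch of these two knobs is shown below. The values are illustrative, and the exact scope of no port-hop (global under system versus per tunnel interface) should be verified against your software version:

system
 ! shift the base port from 12346 to 12347 (hops: 12367, 12387, 12407, 12427)
 port-offset 1
 ! disable port hopping globally, e.g. on a data-center or hub router
 no port-hop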

Alternatively, you can use TLS to connect to the vManage and vSmart controllers, which is TCP-based instead of UDP-based. vBond controller connections always use DTLS, however. TCP control connections originate on the WAN Edge from a random source port, and control connections to controllers with multiple cores have a different base port for each core, as in the DTLS case.
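As a hedged sketch (assuming Viptela-style CLI on the controller; confirm the exact commands for your release), switching a controller's control-plane transport from the default DTLS to TLS looks roughly like this:

security
 control
  ! use TCP-based TLS instead of the default UDP-based DTLS
  protocol tls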

Examples of DTLS and TLS control connections are shown in the following diagram.

  • Note that every core on vManage and vSmart makes a permanent connection to vBond, while WAN Edge routers make transient connections to vBond, using DTLS only.
  • The WAN Edge routers connect to only one vManage and vSmart core. vManage and WAN Edge routers act as clients when connecting to vSmart controllers, so when using TLS, their source ports are random TCP ports > 1024.
  • The WAN Edge router in the TLS example is configured with an offset of 2, so it uses the offset on the DTLS source port when connecting to vBond.

  • IPsec tunnel encapsulation from a WAN Edge router to another WAN Edge router uses UDP with similar ports as defined by DTLS.
  • Ensure that any firewalls in the network allow communication between WAN Edge routers and controllers and between controllers. Ensure that they are configured to allow return traffic as well.

Additional Ports for the VPN 0 Transport

In VPN 0 on the transport interface, almost all communication occurs over DTLS/TLS or IPsec, but there are a few other ports that need consideration.

Network Configuration Protocol (NETCONF)

The NETCONF protocol defines a mechanism through which network devices are managed and configured. vManage uses NETCONF for communication with SD-WAN devices, primarily over DTLS/TLS, but there are a few situations where NETCONF is used natively before DTLS/TLS connections are formed:

  • When any controller (vManage, vBond, or vSmart) is added to vManage, vManage uses NETCONF to retrieve information from it so it can be added as a device in the GUI. This might happen when initially adding controllers to vManage, or during incremental horizontal-scaling deployments, such as adding vManage instances to a cluster or adding additional vSmart or vBond controllers.
  • If any controller reloads or crashes, then that controller uses NETCONF to communicate back to vManage before encrypted DTLS/TLS sessions are re-formed.
  • NETCONF is also used from vManage when generating Certificate Signing Requests from controllers through the vManage GUI before DTLS/TLS connections are formed.

NETCONF runs over SSH, encrypted with AES-256-GCM, and uses TCP destination port 830.

Secure Shell (SSH)

  • SSH provides a secure, encrypted channel over an unsecured network.
  • It’s typically used to log into a remote machine to execute commands, but it can also be used in file transfer (SFTP) and secure copy (SCP) from and to all SD-WAN devices.
  • vManage uses SCP to install signed certificates onto the controllers if DTLS/TLS connections are not yet formed between them.
  • SSH uses TCP destination port 22.

Network Time Protocol (NTP)

  • NTP is a protocol used for clock synchronization between network devices.
  • If an NTP server is being used and can natively be accessed through the VPN 0 WAN transport, be sure NTP is allowed through the firewall (see the sketch below).
  • NTP uses UDP port 123.
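A minimal vEdge-style sketch of pointing a device at an NTP server reachable through VPN 0; the server address is a placeholder and the exact sub-option syntax may vary by release:

system
 ntp
  server 192.0.2.10
   version 4
   ! reach the NTP server natively through the VPN 0 transport
   vpn 0
  exit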

Domain Name System (DNS)

  • DNS may be needed if you are using a DNS server to resolve hostnames and the server is reachable natively through the VPN 0 transport.
  • You may need DNS to resolve the vBond or NTP server name.
  • DNS uses UDP port 53.

Hypertext Transfer Protocol Secure (HTTPS) (vManage)

  • HTTPS provides an admin user or operator with secure access to vManage, which can be accessed through the VPN 0 interface.
  • vManage can be accessed using TCP port 443 or 8443.

Summary of additional VPN 0 protocols for SD-WAN device communication

Service    Protocol/Port    Direction
NETCONF    TCP 830          Bidirectional
SSH        TCP 22           Bidirectional
NTP        UDP 123          Outgoing
DNS        UDP 53           Outgoing
HTTPS      TCP 443/8443     Bidirectional

Protocols Allowed Through the Tunnel Interface

Note that the VPN 0 transport interface is configured with a tunnel so control and data plane traffic can be encrypted, and native traffic can be restricted. 
Other than DTLS or TLS, the following native protocols are allowed through the tunnel interface by default (a configuration sketch follows the list):

  • DHCP
  • DNS
  • ICMP
  • HTTPS
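A sketch of how these defaults can be adjusted per tunnel interface with allow-service; the service keywords shown are examples, and others such as bgp, ospf, and stun also exist:

vpn 0
 interface ge0/0
  tunnel-interface
   encapsulation ipsec
   ! DHCP, DNS, ICMP, and HTTPS are allowed by default; permit others explicitly
   allow-service ntp
   allow-service sshd
   allow-service netconf

Note that allow-service all opens the tunnel interface to all native services and is generally avoided on untrusted transports.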

Ports for Controller Management

Additional management protocols may be used on the VPN 512 interface of SD-WAN devices. They are summarized in the following table.

Summary of management protocols for SD-WAN devices

Service           Protocol/Port        Direction
NETCONF           TCP 830              Bidirectional
SSH               TCP 22               Incoming
SNMP Query        UDP 161              Incoming
RADIUS            UDP 1812             Outgoing
SNMP Trap         UDP 162              Outgoing
Syslog            UDP 514              Outgoing
TACACS            TCP 49               Outgoing
HTTPS (vManage)   TCP 443, 8443, 80    Incoming

Ports for vManage Clustering and Disaster Recovery

For a vManage cluster, the following ports may be used on the cluster interface of the controllers. Ensure the correct ports are opened within firewalls that reside between cluster members.

Summary of ports needed for vManage clustering

vManage Service                                        Protocol/Port                                  Direction
Application Server                                     TCP 80, 443, 7600, 8080, 8443, 57600           Bidirectional
Configuration Database                                 TCP 6362-6372, 7687, 7474, 5000, 6000, 7000    Bidirectional
Coordination Server                                    TCP 2181, 3888                                 Bidirectional
Message Bus                                            TCP 9092                                       Bidirectional
Statistics Database                                    TCP 9200, 9300                                 Bidirectional
Tracking of device configurations (NCS and NETCONF)    TCP 830                                        Bidirectional

If disaster recovery is configured, ensure that the following ports are opened over the out-of-band interface across the data centers between the primary and standby cluster.

Summary of ports needed for vManage disaster recovery

vManage Service      Protocol/Port                                             Direction
Disaster Recovery    TCP 443, 830, 18600, 18500, 18501, 18301, 18302, 18300    Bidirectional

Tuesday, September 27, 2022

Cisco SDWAN Management Plane

In traditional networking, configurations are typically applied on a device-by-device basis using the CLI. This leads to a lot of boilerplate configuration and management inefficiency. Cisco SD-WAN has been designed to overcome this by implementing a centralized management plane that administers all devices.

The solution uses policies to manipulate the overlay fabric in a centralized fashion and templates to eliminate the boilerplate configurations and reuse code.

Cisco SD-WAN Policies

  • Policies are an essential part of the Cisco SD-WAN solution and are used to influence the packet flow across the overlay fabric. 
  • They are created on vManage through the Policy Wizard GUI and when applied, are pushed via NETCONF transactions either to vSmart controllers (centralized policies) or directly to vEdges (localized policies).
  • A Cisco SD-WAN policy is the sum of at least one list (which identifies the values of interest), one policy definition (which defines actions), and at least one application (which defines where the policy is applied).

  • It is important to understand that policies are configured on vSmart or vEdge.
  • vManage is only a graphical user interface used to create and store policies, but once a policy is activated through the vManage GUI, it is configured with a NETCONF transaction either on vSmart or vEdge. 
  • Therefore, activating a policy via vManage is equivalent to manipulating the configuration of vSmart.
  • vSmart does not store policies, it only loads the currently active policy in its running-configuration. 
  • All policy versions and revisions are stored on vManage. 
  • Therefore, vManage is responsible for rollbacks, version control, and making sure that policy changes are persistent across multiple vSmart controllers.
  • While all policies are defined using the vManage Policy Wizard, different types are enforced on different devices at different locations in the network.

 Cisco SD-WAN Policy Types

  • Centralized policies allow us to manipulate the whole overlay fabric in a centralized fashion and localized ones give us the ability to manipulate only a particular device or location. 
  • Because the control and data plane are separated, centralized policies are also separated into centralized-control-policies that affect the control-plane operations and centralized-data-policies that directly affect the forwarding of packets.

Policy Key Points

A policy is processed in the following order of steps:

  • All match–action clauses are processed in sequential order, starting from the lowest sequence number upwards.
  • When a match occurs, the configured action is performed, and the sequential processing does NOT continue further (all other match-action pairings are skipped).
  • If a match does not occur, the configured entity is subject to the default action configured (by default it is reject).

Centralized policies (the ones configured on vSmart) are always applied to a site-list; a configuration sketch follows the list below.

  • Only one policy of each type can be applied to a site-list. For example, you can configure one control-policy in and one control-policy out, but not two control policies in the outbound direction.
  • Cisco does not recommend including a site in more than one site-list. Doing this may result in unpredictable behavior of the policies applied to these site-lists.
  • A Centralized-Control-policy is unidirectional, applied either inbound or outbound. For example, if we need to manipulate OMP routes that the controller both sends and receives, we must configure two control policies.
  • Centralized-Data-policy is directional and can be applied either to the traffic received from the service side of the vEdge router, traffic received from the tunnel side, or both.
  • VPN membership policy is always applied to traffic outbound from the vSmart controller.
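To make these points concrete, here is a hedged skeleton of a centralized control policy on vSmart; the names, site IDs, and prefixes are purely illustrative:

policy
 lists
  site-list BRANCHES
   site-id 100-199
  prefix-list DC-PREFIXES
   ip-prefix 10.0.0.0/8
 control-policy PREFER-DC
  ! sequences are evaluated from the lowest number upwards
  sequence 10
   match route
    prefix-list DC-PREFIXES
   action accept
  ! anything not matched hits the default action (reject unless changed)
  default-action accept
apply-policy
 site-list BRANCHES
  ! only one control-policy per direction can be applied to a site-list
  control-policy PREFER-DC out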

vEdge Order of Operations

The steps that a WAN Edge router takes when forwarding a packet through it are:

  1. IP Destination Lookup: WAN edge devices are in essence just routers, so the forwarding decision always starts with IP address lookup.
  2. Ingress Interface ACL: Localized policies are typically used to create ACLs and tie them to vEdge interfaces. As in traditional networking, these ACLs can be used for filtering, marking, and traffic policing (see the sketch after this list).
  3. Application-Aware Routing:  If there is an Application-Aware Routing policy applied, it makes a routing decision based on the defined SLA characteristics such as packet loss, latency, jitter, load, cost, and bandwidth of a link.
  4. Centralized Data Policy: The centralized data policy is evaluated after the Application-Aware Routing policy and can override the Application-Aware Routing forwarding decision.
  5. Forwarding: At this point, the destination IP address is compared against the routing table, and the output interface is determined.
  6. Security Policy: If there are security services attached to the WAN edge node, they are processed in the following sequence - Firewall, IPS (Intrusion Prevention), URL-Filtering, and lastly AMP (Advanced Malware Protection). The necessary tunnel encapsulations are performed, and VPN labels are inserted.
  7. Egress Interface ACL: As with ingress ACLs, local policy is able to create ACLs that are applied on egress as well. If traffic is denied or manipulated by the egress ACL, those changes will take effect before the packet is forwarded.
  8. Queueing and Scheduling: Egress traffic queueing services such as Low-Latency (LLQ) and Weighted Round Robin (WRR) queueing are performed before the packet leaves.
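As an illustration of steps 2 and 7, a localized-policy ACL on a vEdge might look like the hedged sketch below; the ACL name and match values are examples:

policy
 access-list BLOCK-TELNET
  sequence 10
   match
    ! TCP (protocol 6) to destination port 23
    protocol 6
    destination-port 23
   action drop
  default-action accept
vpn 0
 interface ge0/0
  ! applied on ingress; the same ACL could be applied with "out" for egress
  access-list BLOCK-TELNET in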

In Cisco SD-WAN, you can apply configurations to network devices with either one of the following two methods:

  • Via the CLI - This is the well-known way of configuring network nodes in traditional networking. You connect to the device via TELNET/SSH or CONSOLE and modify the running configuration. As much as we network engineers love the CLI, it has not been designed to make massive scale configuration changes to multiple devices at the same time.
  • Via the vManage GUI - This is the recommended centralized approach of configuring the devices in the Cisco SD-WAN solution. It is significantly less error-prone, can easily scale, and has support for automation, backups, and recovery.

Configuration Templates

  • The actual process of configuring Cisco SD-WAN nodes via vManage is done by applying device templates to one or multiple devices. 
  • A device template holds the whole operational config of a device. 
  • When vManage provisions the configuration of a node, it acts as a single source of truth and "locks" the device in a configuration mode called "vManage mode". 
  • That means that configuration changes can only be applied via vManage and changes via CLI are not allowed.  

A device template can be either feature-based or CLI-based, as shown in the figure. One very important point about templates: when we create a CLI-based template for a specific device, the whole configuration of the device must be in the CLI template, not only a specific snippet of it. The opposite is true for feature templates.

Creating a feature-based template is comparable to assembling a template of lego blocks where each block is a different technology feature. For example, OSPF is one lego block, BGP is another lego block, AAA is another, and so on. Let's highlight the main benefit of configuring devices using feature-based templates:

  • Feature templates can be reused across multiple devices. This brings greater flexibility and scale.
  • It is more granular than CLI-based templates. You can modify only a specific device feature such as AAA or BGP.
  • You don't need to know the device-specific syntax of different platforms. You just apply the template and vManage handles the actual configuration behind the scenes.

Configuration Variables

Network engineers know very well that network devices have many device-specific parameters that are unique per device. For example, each one has a unique name, IP addresses, interface names, router-id, and so on. To account for that, Cisco SD-WAN gives us the ability to specify three different types of values when creating feature templates:

  • Global - When we specify a value to be Global that means that it will be applied to all devices to which the feature template is attached. For example, this will most probably be the case for the SNMP communities, Syslog servers, or the company's banner message. At a later stage, when we want to change the Banner of all nodes, we would just update the feature template value and it will update every device template that is using this feature template. 
  • Device-specific - When we know that a particular parameter will be unique for every device, we specify a device-specific value. When we do that, the wizard will ask for a variable name. In the example in figure 2, this is [inet_if_name]. Upon applying a device template to a given device, the vManage wizard will ask us to provide the actual unique value for this variable.
  • Default - The default value simply represents the factory default settings. It cannot be changed, that is why the textbox is always greyed-out and inactive. When we want to overwrite the default value, we change the value type to either Global or Device-specific.

Device Templates

It is important to understand that a device template defines a given device's complete operational configuration. The structure of a device template is shown in the figure. It is made up of several feature templates, depending on the specific device, role, and so on.

  • As you know, on the traditional Cisco networking devices, some essential features are mandatory and turned on by default (for example spanning-tree, vtp, etc). 
  • In the same way in Cisco SD-WAN, when creating a device template, some features are mandatory, indicated with an asterisk (*). 
  • That is why there are factory-default templates named Factory_Default_{Feature-Name}_Template that are applied by default unless you overwrite them with a more specific configuration.
  • Upon attaching a configuration template to a Cisco SD-WAN node, vManage requires all device-specific values to be filled in. This can be done through the vManage GUI directly or by using a CSV file.
  • Once a device template is applied to a vEdge or vSmart device, the device is put in "vManage mode" and its configuration can no longer be modified via the CLI.
  • If we attach a device template to a WAN Edge router and it loses control-plane connectivity to the vManage controller for whatever reason, the vEdge immediately starts a 5-minute rollback timer. If control-plane connectivity does not come back within those 5 minutes, the vEdge reverts its configuration to the last known working setup and eventually reconnects to vManage.

A Cisco SD-WAN device can be either in one of these configuration modes at any given time:

  • CLI mode – a template is not attached to the device by vManage and the device's configuration can be modified locally using the cli, for example via console or SSH. This is the default mode for all Cisco SD-WAN devices.
  • vManage mode – a template is attached to the device by vManage and the device's configuration cannot be modified locally using the cli.

vManage Mode

  • By default, all Cisco SD-WAN controllers are in "CLI mode". 
  • That means that they allow configuration changes done using the CLI only. 
  • However, as explained in the Cisco SD-WAN Policies lesson, when we activate a centralized policy through the vManage GUI, what happens behind the scenes is that vManage makes configuration changes on the vSmart controller using NETCONF.
  • But by default, like all other devices, the vSmart controller is in CLI mode (allowing config changes via cli only) and thus it does not accept NETCONF transactions from vManage. 
  • That is why the policy activation fails.
  • To successfully activate a policy, we must change the configuration mode of the device that the policy will be applied to into vManage mode.
  • This is done by applying a template from vManage to that device. 
  • This tells the affected node that from now on it will not be configured manually via CLI but in a centralized fashion using templates and policies from vManage.
  • You can check whether a device is in vManaged mode or not with the following command:
vSmart# show system status
Viptela (tm) vsmart Operating System Software
Copyright (c) 2013-2017 by Viptela, Inc.
Controller Compatibility:
Version: 18.4.4
Build: 82
## lines omitted
Personality:             vsmart
Model name:              vsmart
Services:                None
vManaged:                false
Commit pending:          false
Configuration template: None
Policy template:         None
Policy template version: None

Alternatively, check on vManage under Configuration > Devices > Controllers.

Applying a template to vSmart

Applying a configuration template to vSmart allows vManage to have authoritative control of vSmart’s configuration. 
Any type of template does the job - it does not matter whether it is a CLI or Feature template. 
In typical production deployments, it is very common to use CLI templates for this use case, as they are simple, quick to create, and do not require administration beyond the initial deployment.
Practically speaking, the easiest way to change a controller to be in vManaged mode is to create a CLI template and attach it to vSmart. 

  • Let's create a CLI template by going to Configuration > Templates > Create Template > CLI.
  • Then we select the device model (in our case vSmart) from the dropdown menu and specify the name and description for the template. At this point, we SSH to the controller, get the output of the show run command, and paste it in the CLI configuration section.
  • Then we go to the additional options and select Attach Devices. In the next window, you will see all vSmart controllers known to vManage. Select the one from which you took the show run output.
  • In the next window, you will be prompted to validate the configuration that is going to be applied. Once you confirm it, vManage will push the configuration template to vSmart.
  • At this point, the device is fully managed in a centralized fashion by vManage.

Cisco SDWAN Orchestration Plane

Bringing the WAN Edge into the Overlay

  • In order to join the overlay network, a WAN Edge router needs to establish a secure connection to the vManage so that it can receive a configuration file, and it needs to establish a secure connection with the vSmart controller so that it can participate in the overlay network. 
  • The discovery of the vManage and vSmart happens automatically and is accomplished by first establishing a secure connection to the vBond orchestrator.

The following figure shows the sequence of events that occurs when bringing the WAN Edge router into the overlay.

  1. Through a minimal bootstrap configuration or through the automated provisioning (ZTP or PnP) process, the WAN Edge router first attempts to authenticate with the vBond orchestrator through an encrypted DTLS connection. Once authenticated, the vBond orchestrator sends the WAN Edge router the IP addresses of the vManage network management system (NMS) and the vSmart controllers. The vBond orchestrator also informs the vSmart controllers and vManage of the new WAN Edge router wanting to join the domain.
  2. The WAN Edge router begins establishing secure DTLS or TLS sessions with the vManage and the vSmart controllers and tears down the session with the vBond orchestrator. Once the WAN Edge router authenticates with the vManage NMS, the vManage pushes the configuration to the WAN Edge router if available.
  3. The WAN Edge router attempts to establish DTLS/TLS connections to the vSmart controllers over each transport link. When it authenticates to a vSmart controller, it establishes an OMP session and then learns routes (including prefixes, TLOCs, and service routes), encryption keys, and policies.
  4. The WAN Edge router attempts to establish BFD sessions to remote TLOCs over each transport using IPsec.

Onboarding the WAN Edge Router

There are multiple ways to get a WAN Edge router up and running on the network. 

  • One way is the manual method, where you establish a console connection to the device and configure a few lines, or
  • By using an automated provisioning method, like Zero-Touch Provisioning (ZTP) or Plug-and-Play (PnP), where you plug the WAN Edge router into the network, power it on, and it is provisioned automatically.

Manual

With the manual configuration method, the idea is to configure the minimum network connectivity and the minimum identifying information along with the vBond orchestrator IP address or hostname. 
The WAN Edge router attempts to connect to the vBond orchestrator and discover the other network controllers from there. To bring up the WAN Edge router successfully, a few things need to be configured on it (a minimal example follows the list):

  • Configure an IP address and gateway address on an interface connected to the transport network, or alternatively, configure Dynamic Host Configuration Protocol (DHCP) in order to obtain an IP address and gateway address dynamically. The WAN Edge should be able to reach the vBond through the network.
  • Configure the vBond IP address or hostname. If you configure a hostname, the WAN Edge router needs to be able to resolve it. You do this by configuring a valid DNS server address or static hostname IP address mapping under VPN 0.
  • Configure the organization name, system IP address, and site ID. Optionally, configure the host name.
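A minimal bootstrap configuration along these lines might look like the following vEdge-style sketch; the hostname, addresses, organization name, and color are placeholders:

system
 host-name          BR1-vEdge
 system-ip          10.255.1.10
 site-id            110
 organization-name  "Example-SDWAN-Org"
 ! vBond reachable by name; requires a resolvable DNS server in VPN 0
 vbond vbond.example.com
vpn 0
 dns 8.8.8.8 primary
 interface ge0/0
  ! obtain address and gateway via DHCP (or configure them statically)
  ip dhcp-client
  tunnel-interface
   encapsulation ipsec
   color biz-internet
  no shutdown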

Automated Device Provisioning (ZTP or PnP)

  • Automated device provisioning for vEdge devices is called Zero-Touch Provisioning (ZTP), and for IOS XE SD-WAN devices, it is called Plug-and-Play (PnP).
  • The processes are very similar, but two different services are involved.
  • The automated provisioning procedure starts when the WAN Edge router is powered up for the first time. 
  • The vEdge router attempts to connect to a ZTP server with the hostname ztp.viptela.com, where it gets its vBond orchestrator information. 
  • For IOS XE SD-WAN routers, it attempts to connect to the PnP server using the hostname devicehelper.cisco.com. 
  • Once the vBond orchestrator information is obtained, it can then subsequently make connections to the vManage and vSmart controllers in order to get its full configuration and join the overlay network.

There are a few requirements for automated device provisioning:

  • With the hardware vEdge appliances, only certain ports are pre-configured by default to be a DHCP client interface and can be used for ZTP. The following table outlines the ports that must be plugged into the network for ZTP to work. With IOS XE SD-WAN devices, PnP is supported on all routed Gigabit Ethernet interfaces with the exception of the management interface (GigabitEthernet0).
  • The WAN Edge router should be able to get an IP address through DHCP or use Auto IP (vEdge only) to discover an IP address.
  • The gateway router for the WAN Edge router in the network should have reachability to public DNS servers and be able to reach ztp.viptela.com for vEdge devices and devicehelper.cisco.com for IOS XE SD-WAN devices. A ZTP server can be deployed on-premises, but the PnP server requires Internet access.
  • The SD-WAN device needs to be correctly entered in the PnP portal at https://software.cisco.com and associated with a controller profile defining the vBond hostname or IP address information.
  • In vManage, there must be a device configuration template for the WAN Edge router attached to the WAN Edge device. The system IP and site ID need to be included in this device template in order for the process to work. The ZTP or PnP process cannot succeed without this.

Monday, September 26, 2022

Cisco SDWAN Secure Data Plane

What is a TLOC?

  • Transport Locators, or TLOCs, are the attachment points where a WAN Edge router connects to the WAN transport network. 
  • A TLOC is uniquely identified and represented by a three-tuple, consisting of:
  • System-IP: The System-IP is the unique identifier of the WAN edge device across the SD-WAN fabric. It is like the Router-ID in traditional routing protocols such as BGP. It does not need to be routable or reachable across the fabric.
  • Transport Color: The color is an abstraction used to identify different WAN transports such as MPLS, Internet, LTE, 5G, etc. In scenarios where transport types are duplicated (for example two different Internet providers) and should be treated differently from each other, the colors could be arbitrary, such as Green, Blue, Silver, Gold, etc.
  • Encapsulation Type: This value specifies the type of encapsulation this TLOC uses - IPsec or GRE. To successfully form a data plane tunnel to another TLOC, both sides must use the same encapsulation type.
  • TLOC routes are advertised to vSmarts via OMP, along with a number of attributes, including the private and public IP address and port numbers associated with each TLOC, as well as color and encryption keys. 
  • These TLOC routes with their attributes are distributed to other WAN Edge routers.
  • Now, with the TLOC attributes and encryption key information known, the WAN Edge routers can attempt to form BFD sessions using IPsec with other WAN Edge routers.


What is TLOC Color?

  • TLOC Color is a logical abstraction used to identify specific WAN transport that connects to a WAN Edge device. 
  • The color is a statically defined keyword that distinguishes a particular WAN transport as either public or private and is globally significant across the Cisco SD-WAN fabric.
  • From the perspective of vEdge-1, the only way to distinguish which interface is connected to which cloud is through the concept of colors that would be externally defined by the controller or locally via CLI.

The TLOC color is configured per interface under the transport VPN 0 / interface / tunnel-interface settings, as shown below:

vpn 0
 interface ge0/0
  ip address 10.1.1.43/24
  tunnel-interface
   encapsulation ipsec
   color mpls

As of now, there are 22 pre-defined color keywords, and they are divided into two main categories - public and private colors.

  • The public colors are designed to distinguish connections to public networks such as the Internet where typically the attachment interface has an RFC1918 address that is later translated to a publicly routable address via NAT.  
  • On the other hand, private colors are intended for use on connections to clouds where NAT is not utilized. On WAN Edge routers, each Transport Locator is associated with a private-public IP address pair. 

The TLOC color dictates whether the private or public IP address will be used when attempting to form a data plane tunnel to a remote TLOC.

Communication Between Colors

  • During the authentication process with the vBond orchestrator, WAN edge devices learn whether they sit behind a NAT device and, if so, what their NATed address and port are.
  • This is done using the STUN protocol, and the process is explained in further detail below under TLOCs and NAT.
  • In the end, each TLOC contains a pair of private/public addresses and ports.
  • If there is no NAT, both the private and public addresses are the same; if there is a NAT device along the path, the private address represents the native interface IP and the public address represents the post-NAT address.
  • When two Cisco SD-WAN devices attempt to form an overlay tunnel between each other, they look at the colors at both ends to decide which IP address to use.
  • If the TLOC color at both ends is a Public one, the WAN edge devices attempt to form the data plane tunnel using their public IP addresses.

The following diagram demonstrates the general behavior. These rules apply to:

  • WAN Edge routers using IPsec to other WAN Edge routers
  • DTLS/TLS connections between WAN Edge routers and vManage and vSmart controllers
  • DTLS/TLS connections between vManage and vSmart controllers

TLOC Carrier

  • However, specific scenarios might occur where using the public IP addresses between private colors is the desired behavior. 
  • An example would be having two MPLS clouds that are interconnected using NAT. 
  • For such cases, there is a particular TLOC attribute called carrier that changes this behavior - if the carrier setting is the same in the local and remote TLOCs, the WAN edge device attempts to form a tunnel using the private IP address, and if the carrier setting is different, then the WAN edge device attempts to form a tunnel using the public IP address. 
  • The diagram below visualizes this behavior, and a configuration sketch follows:
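In configuration terms, the carrier attribute is set under the tunnel interface, for example (carrier1 is just one of the available carrier keywords):

vpn 0
 interface ge0/0
  tunnel-interface
   encapsulation ipsec
   color mpls
   ! TLOCs with matching carrier values prefer their private IP addresses
   carrier carrier1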

TLOC Color Restrict

  • By default, WAN edge routers try to form overlay tunnels to every received TLOC from a different site using every available color. 
  • This is usually the desired outcome in scenarios where we have two Internet connections from two different providers. 
  • Although we typically mark them with different colors to treat them separately, we would like to have a full mesh of tunnels because there is IP reachability between the clouds.
  • However, this behavior might not be desirable in scenarios where we have one private transport alongside an Internet cloud, as it could lead to inefficient routing—such as WAN edge routers trying to build tunnels through the MPLS cloud to Internet TLOCs. 
  • Even though the IP reachability between the clouds may exist, the tunnels might be established over paths that are inefficient or unintended. 
  • This behavior can be changed with the restrict keyword or by using tunnel groups.
vpn 0
 interface ge0/0
  ip dhcp-client
  tunnel-interface
   encapsulation ipsec
   color mpls restrict

When a TLOC is marked as restricted, a WAN edge router will attempt to establish a data plane tunnel to a remote TLOC only via WAN connections marked with the same color.

This behavior is demonstrated in the figure below. vEdge-1 will never try to establish an IPsec tunnel from T1 to T4 because TLOC1 and TLOC4 are not marked with the same color.

Another option to achieve the same goal of restricting the data plane connectivity between the same colors is by using tunnel groups. Only tunnels with matching tunnel groups will form a data plane connection (regardless of the color).

vpn 0
 interface ge0/0
  ip dhcp-client
  tunnel-interface
   encapsulation ipsec
   group 199 

Bidirectional Forwarding Detection (BFD)

  • On Cisco WAN Edge routers, BFD is automatically started between peers and cannot be disabled. 
  • It runs between all WAN Edge routers in the topology encapsulated in the IPsec tunnels and across all transports. 
  • BFD operates in echo mode, which means when BFD packets are sent by a WAN Edge router, the receiving WAN Edge router returns them without processing them. 
  • Its purpose is to detect path liveliness and it can also perform quality measurements for application-aware routing, like loss, latency, and jitter. 
  • BFD is used to detect both black-out and brown-out scenarios.

Tunnel Liveliness

  • To detect whether an IPsec tunnel is up, BFD hello packets are sent every 1000 milliseconds (1 second) by default on every tunnel interface.
  • The default BFD multiplier is 7, which means the tunnel is declared down after 7 consecutive hellos are lost. 
  • The BFD hello interval and multiplier are configurable on a per-color basis (see the sketch after this list).
  • BFD packets are marked with DSCP 48, which is equivalent to CS6 or IP Precedence 6. Packets are placed in the low latency, high priority QoS queue (LLQ) before being transmitted on the wire but are not subjected to the LLQ policer. 
  • Though rarely needed, the DSCP value can be modified using an egress ACL on the WAN interface.
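A sketch of tuning these values per color on a vEdge; the values shown are the defaults:

bfd color mpls
 ! hello every 1000 ms; declare the tunnel down after 7 consecutive misses
 hello-interval 1000
 multiplier 7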

Path Quality

  • BFD is used not only to detect blackout conditions but is also used to measure various path characteristics such as loss, latency, and jitter. 
  • These measurements are compared against the configured thresholds defined by the application-aware routing policy, and dynamic path decisions can be made based on the results in order to provide optimal quality for business-critical applications.
  • For measurements, the WAN Edge router collects packet loss, latency, and jitter information for every BFD hello packet. 
  • This information is collected over the poll-interval period, which is 10 minutes by default, and then the average of each statistic is calculated over this poll-interval time. 
  • A multiplier is then used to specify how many poll-interval averages should be reviewed against the SLA criteria. 
  • By default, the multiplier is 6, so 6 x 10-minute poll-interval averages for loss, latency, and jitter are reviewed and compared against the SLA thresholds before an out-of-threshold decision is made. 
  • The calculations are rolling, meaning, on the seventh poll interval, the earliest polling data is discarded to accommodate the latest information, and another comparison is made against the SLA criteria with the newest data.

The following figure shows an example when an out-of-threshold condition is recognized when latency suddenly increases. 
When latency jumps from 20 ms to 200 ms at the beginning of poll-interval 7, it takes 3 poll intervals of calculations before the latency average over 6 poll intervals crosses the configured SLA threshold of 100 ms.

  • You may want to adjust application route poll-interval values, but you need to exercise caution, since settings that are too low can result in false positives with loss, latency, and jitter values, and can result in traffic instability. 
  • It is important that there is a sufficient number of BFD hellos per poll interval for the average calculation, or large loss percentages may be incorrectly tabulated when one BFD hello is lost. 
  • In addition, lowering these timers can affect overall scale and performance of the WAN Edge router. 
  • For 1 second hellos, the lowest application route poll-interval that should be deployed is 120 seconds. 
  • With 6 intervals, this gives a 2-minute best case and 12-minute worst case before an out-of-threshold is declared and traffic is moved from the current path. 
  • Any further timer adjustments should be thoroughly tested and used cautiously; a sketch of the app-route timers follows.
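A hedged sketch of adjusting the application-route timers on a vEdge; 120000 ms corresponds to the 120-second floor recommended above, and 6 is the default multiplier:

bfd app-route
 ! poll-interval is configured in milliseconds
 poll-interval 120000
 multiplier 6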

vBond as a NAT Traversal Facilitator

  • Any controller or SD-WAN router may be unknowingly sitting behind a NAT device. 
  • Knowing what IP address/port to connect to from outside the network is crucial to successfully establishing control and data plane connections in the SD-WAN network. 
  • vBond plays a crucial role and acts as a Session Traversal Utilities for NAT (STUN) server, which allows other controllers and SD-WAN routers to discover their own mapped/translated IP addresses and port numbers. 
  • SD-WAN devices advertise this information along with their TLOCs so other SD-WAN devices have information in order to make successful connections.

NAT Detection

The Cisco SD-WAN solution is designed to run over any kind of WAN transport available to the WAN edge devices, including all the different public networks such as broadband, 4G/5G, LTE, business Internet, and so on. This implies that the overlay fabric should be able to form through all flavors of Network Address Translation that these public networks utilize. In practice, any Cisco SD-WAN device may be unknowingly sitting behind one or more NAT devices. To discover the public IP addresses/ports allocated by NAT, Cisco SD-WAN devices use the Session Traversal Utilities for NAT (STUN) protocol defined in RFC 5389.

  • STUN is a client-server protocol that uses a request/response transaction in which a client sends a request to a server, and the server returns a response. 
  • As the request (called STUN Binding Request) passes through a NAT, the NAT will modify the source IP address/port of the packet. 
  • Therefore, the STUN server will receive the request with the public IP address/port created by the closest NAT device. 
  • The STUN server then copies the public address into an XOR-MAPPED-ADDRESS attribute in the STUN Binding response and sends it back to the client.
  • Going back through the NAT, the public address/port in the IP header will be un-NATted back to the private ones, but the public address copy in the body of the STUN response will remain untouched. In this way, the client can learn its IP address allocated by the outermost NAT with respect to the STUN server.

NAT Types

In a typical production SD-WAN deployment, we would probably have many remote sites connected via many different Internet connections to a centralized data center or a regional hub. In most regions in the world, Internet providers will always use some type of private-public address translation due to a shortage of public IPv4 addresses. Let's look at the NAT classifications according to the STUN protocol and how they can affect whether sites can form connections and communicate directly with each other or not.

Full-Cone NAT

  • A Full-Cone NAT is one where all packets from the same internal IP address are mapped to the same NAT IP address. This type of address translation is also known as One-to-One.
  • Additionally, external hosts can send packets to the internal host, by sending packets to the mapped NAT IP address.

Restricted-Cone NAT

  • A Restricted-Cone network address translation is also known as Address-Restricted-Cone. 
  • It is a network translation technique where all packets from the same internal IP address are mapped to the same NAT IP address. 
  • The difference from a Full-Cone NAT is that an external host can send packets to the internal host only if the internal host had previously sent a packet to the IP address of the external destination.
  • It is important to note that once the NAT mapping state is created, the external destination can communicate back to the internal host on any port.

Port-Restricted-Cone NAT

  • A Port-Restricted-Cone is like the Restricted-Cone address translation, but the restriction also includes port numbers. 
  • The difference is that an external destination could send back packets to the internal host only if the internal host had previously sent a packet to this destination on this exact port number. 
  • In a typical Cisco IOS/IOS-XE or Cisco ASA configuration, this feature is known as Port Address Translation (PAT).

Symmetric

  • Symmetric NAT is also known as Port Address Translation (PAT) and is the most restrictive of all the types.
  • It is a network translation technique where all requests from the same internal IP address and port to a specific destination IP address and port, are mapped to a unique NAT IP address and NAT port. 
  • Furthermore, only the external destination that received a packet can send packets back to the internal host. 
  • In a typical Cisco IOS/IOS-XE or Cisco ASA configuration, this feature is known as Port Address Translation (PAT) with port-randomization.

NAT Recommendations

  • Though several types of NAT are supported with WAN Edge routers, if full mesh traffic is desired, take care to ensure at least one side of the WAN Edge tunnel can always initiate a connection inbound to a second WAN Edge even if there is a firewall in the path. 
  • It is recommended to configure full-cone, or 1-to-1 NAT at the data center or hub site so that, regardless of what NAT type is running at the branch (restricted-cone, port-restricted cone, or symmetric NAT), the branch can send traffic into the hub site using IPsec at a minimum without issue. 
  • Two sites with firewalls running symmetric NAT will have issues forming a tunnel connection, as this NAT translates the source port of each side to a random port number, and traffic cannot be initiated from the outside. 
  • Symmetric NAT configured at one site requires full-cone NAT or a public IP with no NAT on the other site in order to establish a direct IPsec tunnel between them. Sites which cannot connect directly should be set up to reach each other through the data center or other centralized site.

The following table shows different NAT type combinations and the corresponding IPsec tunnel status:

vEdge-1                   vEdge-2                   IPsec tunnel can form   GRE tunnel can form
No-NAT (Public IP)        No-NAT (Public IP)        YES                     YES
No-NAT (Public IP)        Symmetric                 YES                     NO
Full-Cone (One-to-One)    Full-Cone (One-to-One)    YES                     YES
Full-Cone (One-to-One)    Restricted-Cone           YES                     NO
Full-Cone (One-to-One)    Symmetric                 YES                     NO
Restricted-Cone           Restricted-Cone           YES                     NO
Symmetric                 Restricted-Cone           NO                      NO
Symmetric                 Symmetric                 NO                      NO

IMPORTANT: Note that for overlay tunnels configured to use GRE encapsulation instead of IPsec, only public IP addressing or one-to-one address translation is supported. Any type of Network Address Translation with port overloading is not supported, since GRE packets lack an L4 header.

Data Plane Privacy and Encryption

Most overlay solutions these days encrypt and authenticate data plane traffic using IPsec, and Cisco SD-WAN is no different. There is, however, one major difference that allows Cisco SD-WAN to scale better and more efficiently. Most traditional IPsec environments use Internet Key Exchange (IKE) to handle the key exchange between IPsec peers. However, IKE creates scalability issues in full-mesh environments with thousands of spokes, because the fabric requires on the order of n^2 key exchanges, with each spoke managing n-1 different keys.

  • Cisco SD-WAN was designed to overcome these scaling limitations by not utilizing IKE at all but instead implementing the key exchange within the control plane.
  • This is possible because the vEdge's identity is established during the provisioning process with the vBond orchestrator.

  • The main idea is that WAN edge routers can leverage the existing encrypted control connections to the vSmart controller and advertise their keys to the controller via OMP.
  • The controller then redistributes them as OMP updates to all other peers, so the exchange is completely done through the SD-WAN control plane.

vEdge-1 generates an AES-256-bit key for each connected WAN transport. 
In the example below, there is only one transport, so there is only one generated key - encr-key-1.
However, three symmetric keys will be generated if we have three WAN providers. 
Once the encr-key-1 is generated, vEdge-1 advertises it in an OMP update to vSmart, along with the corresponding TLOC T1. 
This route advertisement is then re-advertised to the rest of the overlay fabric. 
vEdge-2 and vEdge-3 will then use this information to build their IPsec tunnels to vEdge-1 and encrypt the data plane traffic with the received AES-256 key.

  • Essentially, this key-exchange model removes the burden of the individual negotiations between WAN edge devices that using IKE would have brought.
  • In addition, each key's lifetime is 24 hours, and each WAN edge router regenerates its keys every 12 hours in order to provide enhanced encryption and authentication.
  • This means that two keys (old and new) are present at any one time.
  • Rekeying does not affect existing traffic: the new key is generated in parallel, and the old key is honored for another 12 hours, so traffic is accepted using either one.
  • If we summarize everything we have said up to this point - the Cisco SD-WAN solution exchanges keys between WAN Edges and vSmart controllers and uses symmetric keys in an asymmetric fashion. 
  • This means the following:
  • The same key is used for encryption and decryption of data plane traffic.
  • WAN edge routers use their remote peer’s key to encrypt the data rather than their own when sending traffic over the tunnel.
Traffic Encryption with Symmetric Keys

Two WAN edge devices are going to communicate over a secure overlay tunnel. Encryption and decryption will occur using the following process:

  • vEdge-1 generates an AES-256 key called encr-key-1 and vEdge-2 generates one called encr-key-2.
  • Both routers advertise these via OMP to the controller and it distributes them across the overlay.
  • When vEdge-1 sends data to vEdge-2, it will encrypt the data using vEdge-2’s key.
  • When vEdge-2 receives the data, it will use its key for the decryption of that data.
  • When vEdge-2 sends data to vEdge-1, it will encrypt the data using vEdge-1’s key.
  • When vEdge-1 receives the data, it will use its key for the decryption of that data.
Additional Security with Pairwise Keys

The process of encrypting and decrypting data when using the IPsec pairwise keys feature is as follows:

  • Each WAN Edge will generate a key for each local-remote TLOC pair. The session key will then be advertised to the vSmart via OMP.
  • The vSmart controller will redistribute the key to the respective peers.
  • When WAN edge A sends data to WAN edge B, the IPsec session key BA will be used. In the reverse scenario, WAN Edge B will use the IPsec session key AB.
  • When vEdge-A sends data to vEdge-C, key CA will be used. In the reverse direction, vEdge-C will send traffic using AC.

Another very important thing to note is that the IPsec pairwise keys feature is backward compatible with devices that don't support pairwise keys. The feature is disabled by default on Cisco SD-WAN devices and can be enabled via templates.
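Assuming Viptela-style CLI (verify the exact command path for your platform and release), enabling pairwise keys looks roughly like this:

security
 ipsec
  ! generate per-peer session keys instead of one key per TLOC
  pairwise-keying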