Thursday, September 22, 2022

Overlay Management Protocol (OMP)

What is OMP?

The Cisco Overlay Management Protocol (OMP) is an all-in-one TCP-based protocol, similar to BGP, that establishes and maintains the SD-WAN control plane. OMP runs between the vEdge routers and the vSmart controllers and between the controllers themselves. The protocol is responsible for:

  • Distribution of Transport Locators (TLOCs) among network sites in the SD-WAN domain.
  • Distribution of service-side reachability information.
  • Distribution of service-chaining information.
  • Distribution of data plane security parameters, VPN labels, and crypto keys.
  • Distribution of data and application-aware routing (AAR) policies.

OMP Peering

  • OMP is enabled by default on all Cisco SD-WAN edge devices.
  • When vEdges go through the Zero-Touch Provisioning process, they learn about the addresses of all available vSmart controllers and automatically initiate secure connections to them.
  • By default, these connections are authenticated and encrypted via the Datagram Transport Layer Security (DTLS) protocol.
  • Depending on the number of available transports, each vEdge router will try to establish a secure control connection via every TLOC.
  • However, the OMP peering uses the System-IPs, and only one peering session is established between one WAN Edge device and one vSmart controller even if there are multiple DTLS connections to the same controller.
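
The difference between control connections and OMP peerings can be sketched in a few lines of Python. This is only an illustration of the counting rule described above, not device code; the function name, the colors, and the System-IPs are hypothetical.

```python
from itertools import product

def control_plane_sessions(edge_tlocs, vsmart_system_ips):
    """Return (DTLS connections, OMP peerings) for one WAN edge router.

    Assumption: every local TLOC builds a control connection to every
    controller, which is the default behaviour described in the text.
    """
    # One secure control connection per (local TLOC, controller) pair.
    dtls_connections = list(product(edge_tlocs, vsmart_system_ips))
    # One OMP peering per controller, keyed by System-IP only.
    omp_peerings = sorted(set(vsmart_system_ips))
    return dtls_connections, omp_peerings

dtls, omp = control_plane_sessions(["mpls", "biz-internet"],
                                   ["1.1.1.30", "1.1.1.31"])
print(len(dtls), len(omp))  # 4 2
```

With two transports and two controllers, the router holds four DTLS connections but only two OMP sessions.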

Another important thing to know is that these DTLS control plane tunnels are used by other protocols as well. For example, besides OMP, NETCONF and SNMP will also be transported via these secure connections. By utilizing these encrypted DTLS tunnels, we no longer need to be concerned about the native security of protocols like SNMP, NTP, etc.

OMP Route Advertisements

  • Cisco vEdge routers collect the routes they learn from directly connected networks, static routes, and dynamic routing protocols that run in the site-local environment.
  • These routes are then advertised to all OMP peers (to controllers) along with the corresponding TLOC next-hops.
  • The routes that represent reachability information are referred to as OMP routes or just vRoutes (to distinguish them from traditional IP routes).
  • However, vEdges also advertise to vSmart all locally attached services that are running in the site-local network.
  • Services include load balancers, firewalls, IDS (Intrusion Detection Systems) and could also be customer-defined ones.
  • The vSmart controllers learn the topology of the overlay fabric and all available network services through these OMP route advertisements coming from vEdges.
  • As visualized in the figure, vEdge routers advertise three types of routes via the Overlay Management Protocol (OMP) to the vSmart controllers: OMP routes (vRoutes), TLOC routes, and service routes.

OMP Routes (vRoutes)

  • OMP Routes, also referred to as vRoutes, are prefixes learned at the local site via connected interfaces, static routes, and dynamic routing protocols (such as OSPF, EIGRP, and BGP) running on the service side of the vEdge. 
  • These prefixes are redistributed into OMP and advertised to the vSmart controller so that they can be carried across the overlay fabric to all other WAN edge nodes. 
  • OMP routes resolve their next-hop to a TLOC. An OMP route is installed in the forwarding table only if the next-hop TLOC is known and there is a BFD session in UP state associated with that TLOC.

In addition to the reachability information, a vRoute carries a number of attributes:

  • VPN: Every OMP route is associated with a VPN, and every Cisco SD-WAN device keeps a separate routing table for each VPN. This allows for the use of overlapping subnet ranges, provided they are in different VPNs.
  • Originator: This is the System-IP of the router from which the route was originally learned.
  • TLOC: This is the next-hop identifier of the OMP route. For example, suppose that vEdge-1 advertises two vRoutes for prefix 1.1.1.0/24, one via TLOC T1 and one via TLOC T2. This tells the vSmart controller and, subsequently, all remote WAN edge routers that to reach subnet 1.1.1.0/24, they must have an active overlay tunnel to either TLOC T1 or T2. Active means that the BFD session associated with that tunnel must be in the UP state.
  • Site ID: The site-id plays a similar role to a BGP AS number. It is primarily used for loop prevention. All sites should have a unique site ID, and all devices at the same location should have the same site-id.
  • Origin-Protocol: This is the original protocol from which the vEdge router has learned the routing information. It may be a connected interface, static route, or any existing dynamic routing protocols such as OSPF, EIGRP, or BGP.
  • Origin-Metric: OMP includes the original metric value alongside the origin protocol. These values are then used in the best-path algorithm when OMP calculates the most optimal routes toward destinations.
  • Preference: This attribute is also referred to as OMP preference or vRoute preference, so it is not confused with the TLOC preference attribute in the TLOC routes. The OMP Preference is used for influencing the OMP best-path selection for a given vroute. Higher is better.
  • Tag: This is like the route tags in traditional routing. Once a value is set, it is a transitive attribute that can be acted upon via policy.
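
To make the attribute list more concrete, here is a minimal Python sketch of a vRoute record and of per-VPN routing tables. The class and field names are hypothetical and simply mirror the attributes listed above; they are not a Cisco data model.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class VRoute:
    vpn: int
    prefix: str
    originator: str          # System-IP of the advertising router
    tloc: tuple              # next hop: (system_ip, color, encapsulation)
    site_id: int
    origin_protocol: str = "connected"
    origin_metric: int = 0
    preference: int = 0      # higher is better
    tag: int = 0

def build_vpn_tables(routes):
    """Group vRoutes into one table per VPN, keyed by prefix."""
    tables = defaultdict(lambda: defaultdict(list))
    for route in routes:
        tables[route.vpn][route.prefix].append(route)
    return tables

routes = [
    VRoute(10, "10.1.1.0/24", "1.1.1.1", ("1.1.1.1", "mpls", "ipsec"), 100),
    VRoute(20, "10.1.1.0/24", "1.1.1.2", ("1.1.1.2", "mpls", "ipsec"), 200),
]
tables = build_vpn_tables(routes)
print(sorted(tables))  # [10, 20]
```

The same prefix appears in VPN 10 and VPN 20 without conflict, because each VPN has its own table.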

Transport Locators (TLOCs)

  • TLOCs are identifiers that live in the transport-side VPN and tie an OMP route (vRoute) to a physical location.
  • A TLOC route represents a WAN link that serves as a tunnel endpoint and is uniquely identified by {System-IP, Color, Encapsulation}. 
  • Note that the System IP address is used instead of the interface IP address as an identifier for a TLOC route. 
  • That’s because the interface IP can change at any given moment. Using the fixed System-IP ensures that the TLOC can be uniquely identified at all times irrespective of any interface IP changes.
  • By analogy with BGP, the TLOC acts as the next hop for OMP routes.
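
The reason for keying a TLOC on the System-IP can be shown with a small Python sketch. The names `TlocKey`, `tloc_table`, and `learn_tloc` are made up for this illustration, and the addresses are documentation-range examples.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TlocKey:
    system_ip: str
    color: str
    encapsulation: str

tloc_table = {}

def learn_tloc(key, interface_ip):
    """Store or refresh the interface IP currently behind a TLOC."""
    tloc_table[key] = interface_ip

key = TlocKey("1.1.1.1", "biz-internet", "ipsec")
learn_tloc(key, "203.0.113.10")
learn_tloc(key, "203.0.113.77")   # the WAN interface gets a new address
print(len(tloc_table), tloc_table[key])  # 1 203.0.113.77
```

The interface address changed, but the table still holds a single TLOC entry because the identifying tuple did not change.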

A TLOC route advertisement contains the following attributes:

  • Private IPv4/IPv6 addresses and ports: These are the IP addresses configured or assigned via DHCP on the vEdge’s WAN interface.
  • Public IPv4/IPv6 address/port: If the vEdge sits behind a NAT device, the outside NATed IP addresses and ports are included in the TLOC route advertisements. If the router does not sit behind NAT, the public and private addresses and ports are the same.
  • Color: The color is a logical abstraction used to identify a specific WAN interface on a WAN edge router. If no color is explicitly configured under an interface, it is marked with the default color - “default”.
  • Encapsulation type: The encapsulation could be either GRE or IPsec. To successfully form a tunnel, the encapsulation type of a TLOC must match with the remote TLOC’s encapsulation. In a typical production deployment, the encap will always be IPsec for security reasons. However, when one is studying or testing features, it is a good practice to use GRE encapsulation so that everything going through the overlay tunnels is cleartext and can be inspected with Wireshark.
  • Preference: This attribute is also referred to as a TLOC Preference, so it does not get confused with OMP Preference. It is used in the OMP best-path algorithm when comparing multiple vroutes for the same destination. Higher is better, and the default is 0.
  • Site ID: This attribute identifies the originating site for this TLOC route. WAN edge routers will never attempt to form an overlay tunnel to a remote TLOC that has the same site-id.
  • Tag: A user-defined value that can be acted upon in a control policy.
  • Weight: This attribute is used to manipulate the outgoing traffic from the perspective of a vEdge. The value is locally significant and tells the vEdge how to distribute traffic when there are multiple outgoing TLOCs for a destination: a TLOC with a higher weight receives proportionally more traffic. The configurable range is 1 to 255, and the default is 1.
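
Two of the rules in this list (matching encapsulation and different site-id) decide whether an overlay tunnel is attempted at all. A minimal Python sketch of that check follows; `TlocRoute` and `can_form_tunnel` are hypothetical names, and only the fields needed for the check are modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TlocRoute:
    system_ip: str
    color: str
    encapsulation: str   # "ipsec" or "gre"
    site_id: int

def can_form_tunnel(local, remote):
    """Apply the two checks named above: different site, same encapsulation."""
    if local.site_id == remote.site_id:
        return False     # never build a tunnel to a TLOC in the same site
    return local.encapsulation == remote.encapsulation

local = TlocRoute("1.1.1.1", "mpls", "ipsec", site_id=100)
same_site = TlocRoute("1.1.1.2", "mpls", "ipsec", site_id=100)
gre_peer = TlocRoute("1.1.1.3", "mpls", "gre", site_id=300)
ipsec_peer = TlocRoute("1.1.1.4", "mpls", "ipsec", site_id=400)

print(can_form_tunnel(local, same_site),
      can_form_tunnel(local, gre_peer),
      can_form_tunnel(local, ipsec_peer))  # False False True
```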

Service Routes

  • Service routes are identifiers that tie an OMP route to a service in the network, specifying the location of the service in the network. Services include firewalls, Intrusion Detection Systems (IDS), and load balancers. Service route information is carried in both service and OMP routes.
  • The network must be able to reroute traffic from any remote location in the overlay through these services and then route the traffic back to its original destination. This is called service chaining and is done using service routes.
  • The key point here and the major difference between the Cisco SD-WAN service chaining and the Traditional WAN is that no configuration is required on any remote WAN edge routers.

  • At a high level, the steps to enable service chaining are as follows:
  • One or multiple WAN edge routers advertise a network service to the vSmart controller using an OMP service route.
  • A policy that redirects the traffic from remote sites through the FW service is then defined on vSmart. Once processed by the FW, the traffic is forwarded to its destination.

A service route contains the following attributes:

  • VPN ID: The VPN that this service applies to (VPN 50 in our example).
  • Service ID: The service-id defines the type of service that is being advertised. There are seven predefined values:
      • FW maps to svc-id 1
      • IDS maps to svc-id 2
      • IDP maps to svc-id 3
      • Custom services: the last four values are used for customer-defined services:
          • netsvc1 maps to svc-id 4
          • netsvc2 maps to svc-id 5
          • netsvc3 maps to svc-id 6
          • netsvc4 maps to svc-id 7
  • Originator ID: The System-IP address of the vEdge that originates the service route.
  • TLOC: The TLOC (Transport Locator) where the service is located.
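
The service-id mapping and the service route attributes above fit naturally into a lookup table. The following Python sketch is illustrative only; `SERVICE_IDS` and `ServiceRoute` are hypothetical names, and the sample values are invented.

```python
from dataclasses import dataclass

# Mapping taken from the list above.
SERVICE_IDS = {
    "FW": 1, "IDS": 2, "IDP": 3,
    "netsvc1": 4, "netsvc2": 5, "netsvc3": 6, "netsvc4": 7,
}

@dataclass(frozen=True)
class ServiceRoute:
    vpn_id: int
    service: str
    originator: str      # System-IP of the advertising vEdge
    tloc: tuple          # (system_ip, color, encapsulation)

    @property
    def service_id(self):
        return SERVICE_IDS[self.service]

    @property
    def is_custom(self):
        # netsvc1 to netsvc4 are the customer-defined services.
        return self.service.startswith("netsvc")

route = ServiceRoute(50, "FW", "1.1.1.1", ("1.1.1.1", "mpls", "ipsec"))
print(route.service_id, route.is_custom)  # 1 False
```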


OMP Best-Path Algorithm

vSmart controllers and vEdge routers perform the Best-Path Selection when they have multiple routes for the same prefix.

  1. Prefer ACTIVE routes over STALE routes. A route is ACTIVE when there is an OMP session in UP state with the peer that sent out the route. A route is STALE when the OMP session with the peer that sent out the route is in GRACEFUL RESTART mode.
  2. Select routes that are valid and ignore invalid routes. A valid route has a next-hop TLOC that is known and reachable.
  3. Prefer routes with a lower administrative distance (AD) (on vEdge only). AD is a locally significant value on each router and depends on the OS. Different platforms may have different AD values for different protocols. For example, OMP has an AD of 250 on vEdges and 251 on cEdges. Additionally, network admins can define floating static routes with various ADs for the same prefix. AD is only compared when the same WAN edge router receives the same site-local prefix from multiple routing protocols. AD is not a parameter in OMP, is not advertised, and does not influence vSmart.
  4. Prefer routes with a higher route preference value. By default, all OMP routes have a preference of 0. This is typically the attribute used most often for traffic engineering.
  5. Prefer routes with a higher TLOC preference value (on vEdge only). TLOC preference is a parameter of TLOC routes, and TLOC routes are not bound to a VPN ID. Therefore, changing the TLOC preference affects a vEdge's path selection for all VPNs.
  6. Compare the origin type, and select the first match in the following order:
      • Connected
      • Static
      • EIGRP summary
      • EBGP
      • OSPF intra-area
      • OSPF inter-area
      • IS-IS level 1
      • EIGRP external
      • OSPF external
      • IS-IS level 2
      • IBGP
      • Unknown
  7. Compare the origin metric. If the origin type of the routes is the same, select the routes that have the lower origin metric.
  8. Tiebreaker: Prefer vEdge-sourced routes over vSmart-sourced routes (on vSmart only).
  9. Tiebreaker: If the routes are still tied, select the routes that have the lowest router-id (System-IP).
  10. Tiebreaker: If the router IDs are the same, prefer the routes with the lowest private TLOC IP address.
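
The selection order above behaves like a lexicographic sort, which the following Python sketch tries to capture from the point of view of a WAN edge router. It is a simplified model under stated assumptions: all candidates are OMP routes, so the administrative distance comparison is skipped, and the vSmart-only "vEdge sourced over vSmart sourced" tiebreaker is left out. The names `Candidate`, `sort_key`, and `rank_paths` are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address

ORIGIN_ORDER = [
    "connected", "static", "eigrp-summary", "ebgp", "ospf-intra-area",
    "ospf-inter-area", "isis-level1", "eigrp-external", "ospf-external",
    "isis-level2", "ibgp", "unknown",
]

@dataclass(frozen=True)
class Candidate:
    originator: str            # System-IP, used as the router-id
    tloc_private_ip: str
    active: bool = True        # False means the route is STALE
    valid: bool = True         # next-hop TLOC known and reachable
    route_preference: int = 0
    tloc_preference: int = 0
    origin: str = "connected"
    origin_metric: int = 0

def sort_key(route):
    """Smaller tuples sort first, so 'higher is better' fields are negated."""
    return (
        not route.active,                        # ACTIVE before STALE
        -route.route_preference,                 # higher preference first
        -route.tloc_preference,                  # higher TLOC preference first
        ORIGIN_ORDER.index(route.origin),        # origin type order
        route.origin_metric,                     # lower metric first
        int(ip_address(route.originator)),       # lowest router-id
        int(ip_address(route.tloc_private_ip)),  # lowest private TLOC IP
    )

def rank_paths(candidates):
    """Drop invalid routes, then sort the rest from best to worst."""
    return sorted((r for r in candidates if r.valid), key=sort_key)

paths = [
    Candidate("1.1.1.1", "10.0.0.1"),
    Candidate("1.1.1.2", "10.0.0.2", route_preference=100),
    Candidate("1.1.1.3", "10.0.0.3", route_preference=100, active=False),
    Candidate("1.1.1.4", "10.0.0.4", valid=False),
]
print([r.originator for r in rank_paths(paths)])
# ['1.1.1.2', '1.1.1.1', '1.1.1.3']
```

The STALE route ranks last despite its high preference, and the invalid route never enters the comparison.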

ECMP - To be considered equal, omp routes must be valid and equal-cost up to step 8. When there are more equal-cost routes than the send-path-limit value, the controller sorts the best ones based on the tiebreakers in descending order and advertises as many as the send-path-limit.

Note that Cisco vEdge routers install a route in their forwarding table (FIB) only if the TLOC to which it points is active. Active TLOCs are ones that have a BFD session in the UP state associated with them. When a BFD session goes down (becomes inactive) for whatever reason, the Cisco vSmart/vEdge devices remove all routes that point to that TLOC from their forwarding table.
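
As a rough Python illustration of this rule, the sketch below filters received routes by the BFD state of their next-hop TLOC. The function name and the T1/T2/T3 labels are hypothetical; a TLOC that is missing from the BFD table is treated as unknown and therefore not installable.

```python
def installable_routes(routes, bfd_state):
    """Keep only routes whose next-hop TLOC has a BFD session in UP state.

    routes:    list of (prefix, tloc) tuples
    bfd_state: dict mapping tloc -> "up" or "down"
    """
    return [(prefix, tloc) for prefix, tloc in routes
            if bfd_state.get(tloc) == "up"]

routes = [("1.1.1.0/24", "T1"), ("1.1.1.0/24", "T2"), ("1.1.1.0/24", "T3")]
bfd_state = {"T1": "up", "T2": "down"}   # T3 is unknown

print(installable_routes(routes, bfd_state))  # [('1.1.1.0/24', 'T1')]
```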

OMP Graceful Restart

  • While studying how Cisco SD-WAN works, have you ever wondered what happens with the overlay fabric when the SD-WAN Control Plane becomes unavailable?
  • Well, there is a feature called OMP Graceful Restart that allows the data plane to continue functioning and forwarding traffic even if the control plane suddenly goes down or becomes unavailable.
  • WAN edge devices do this by using the last known routing information that they received from the vSmart controllers.
  • At the same time, vEdges actively try to re-establish a control-plane connection to the vSmart controllers.
  • When the controllers are back up and reachable, DTLS control connections are re-established, and the vEdge routers then receive updated and refreshed network information from the vSmart controllers.
  • Cisco vEdge and vSmart devices cache the OMP information that they learn from peers.
  • The cached information includes OMP, TLOC, and SERVICE routes, IPsec SA parameters, and the centralized data policies in place.
  • When a WAN edge device loses its OMP peering to the vSmart controller, the device continues forwarding data traffic using the cached OMP information.
  • The Edge device also periodically checks whether the vSmart controller has come up.
  • When it does come back up and the control plane peering is re-established, the device flushes its local cache and refreshes the control plane information from the vSmart controller.
  • This same technique is valid in the opposite scenario when a vSmart controller no longer detects the presence of Cisco vEdge devices.
  • It then uses its local cache until the WAN edge device becomes reachable again.
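
The cache behaviour described above can be modelled with a very small Python class. This is a conceptual sketch only: the class and method names are hypothetical, and timer handling is deliberately left out.

```python
class OmpCache:
    """Toy model of the OMP information a WAN edge keeps from its peer."""

    def __init__(self):
        self.routes = {}
        self.stale = False

    def peer_up(self, fresh_routes):
        # Peering (re-)established: flush the cache and reload it.
        self.routes = dict(fresh_routes)
        self.stale = False

    def peer_down(self):
        # Keep the cached routes so forwarding continues, but mark them STALE.
        self.stale = True

    def lookup(self, prefix):
        return self.routes.get(prefix)

cache = OmpCache()
cache.peer_up({"10.1.1.0/24": "T1"})
cache.peer_down()
print(cache.lookup("10.1.1.0/24"), cache.stale)   # T1 True
cache.peer_up({"10.1.1.0/24": "T2"})
print(cache.lookup("10.1.1.0/24"), cache.stale)   # T2 False
```

While the peer is down, lookups still succeed from the cached data; after the peer returns, the old entries are replaced by the refreshed ones.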

What is OMP Send Path Limit?

  • OMP Send-Path-Limit is a configuration parameter that defines the maximum number of equal-cost vroutes that a vSmart controller or a vEdge router advertises to its OMP peers. 
  • By default, the omp send-path-limit is set to 4, which means that if a vSmart controller has ten equal-cost best paths to a destination, it will only advertise four routes to vEdges.
  • Changing the send-path-limit value is as simple as applying one configuration line under the omp configuration hierarchy, as shown in the output below:
vEdge/vSmart# conf t
Entering configuration mode terminal
vEdge/vSmart(config)# omp send-path-limit <1-16>
vEdge/vSmart(config)# commit and-quit
Commit complete.
vEdge/vSmart#

What is the OMP ecmp-limit?

  • Once we have understood what the send-path limit parameter does, let's go ahead and change it to a non-default value, for example, let's make it 5.
  • As expected, the vSmart controller will advertise the first five best routes. This can easily be verified on any of the other vEdges. 
  • However, notice another important thing. Even though vEdge6 now receives five routes for 10.1.1.0/24 in VPN100, the router only installs the first four in the routing table. 
  • We can verify this by checking the routing table for VPN 100.
  • What happened is illustrated in figure 3 below. Even though vEdge6 now receives five routes for 10.1.1.0/24 in VPN100, the number of routes that get installed in the routing tables is subject to another OMP parameter called ecmp-limit. 
  • By default, this parameter is also set to 4, which means that whatever number of best routes a vEdge may have, it will only install the best four in the routing table.
  • Let's change the ecmp-limit parameter and set it to the non-default value of 5.
vEdge-6# conf t
Entering configuration mode terminal
vEdge-6(config)# omp ecmp-limit 5
vEdge-6(config)# commit and-quit
Commit complete.

Now if we check the routing table, we can see that vEdge6 installs all five routes it receives.
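
The interaction between the two limits can be summarized in a short Python sketch. The function names are hypothetical; the defaults of 4 come from the text above, and the path labels are placeholders.

```python
def advertise(best_paths, send_path_limit=4):
    """Controller side: advertise at most send_path_limit best paths."""
    return best_paths[:send_path_limit]

def install(received_paths, ecmp_limit=4):
    """Edge side: install at most ecmp_limit of the received paths."""
    return received_paths[:ecmp_limit]

paths = [f"path-{n}" for n in range(1, 11)]      # ten equal-cost best paths
received = advertise(paths, send_path_limit=5)

print(len(received),
      len(install(received)),
      len(install(received, ecmp_limit=5)))      # 5 4 5
```

Raising only the send-path-limit delivers five routes to the edge router, but it keeps installing four until its own ecmp-limit is raised as well.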

What is OMP Send-Backup-Paths?

  • By default, when a Cisco vSmart controller receives multiple OMP routes to the same destination, it runs them through the OMP best-path algorithm and selects the best ones to that destination subnet. Then the vSmart controller advertises only the best routes out to the rest of the overlay fabric. This behavior is well-known to network engineers. BGP route reflectors work pretty much the same way when receiving and re-advertising routes to route-reflector clients.
  • The Cisco SD-WAN solution provides a configuration option that tells the vSmart controller to advertise the first set of non-best routes to vEdge routers. The configuration is as simple as adding one configuration line on the controller, as shown in the output below.

vSmart# config
Entering configuration mode terminal
vSmart(config)# omp send-backup-paths
vSmart(config-omp)# commit
Commit complete.
vSmart#

  • vSmart only advertises the best routes to a destination according to the OMP best-path algorithm.
  • The default behavior of vSmart hides reachability information from remote peers, similarly to a BGP route-reflector.
  • In scenarios where there isn't full IP reachability between TLOCs, and there are remote sites with limited overlay, this might create blackholes in failure scenarios.
  • The OMP send-backup-paths option tells the vSmart controller to advertise the first set of non-best routes alongside the best ones.
  • Notice that the overall number of advertised routes is subject to the OMP send-path-limit parameter.
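
A simplified Python model of this behaviour is shown below. It assumes the controller has already ranked the routes into groups of equal quality, with the best set first; the function name, the grouping, and the path labels are assumptions made for the illustration.

```python
def routes_to_advertise(ranked_groups, send_backup_paths=False,
                        send_path_limit=4):
    """Pick the routes a controller would send for one prefix.

    ranked_groups: list of lists, best set first, then the first set of
                   non-best routes, and so on.
    """
    selected = list(ranked_groups[0]) if ranked_groups else []
    if send_backup_paths and len(ranked_groups) > 1:
        selected += ranked_groups[1]         # first set of non-best routes
    return selected[:send_path_limit]        # overall cap still applies

groups = [["via-T1"], ["via-T2", "via-T3"], ["via-T4"]]

print(routes_to_advertise(groups))
# ['via-T1']
print(routes_to_advertise(groups, send_backup_paths=True))
# ['via-T1', 'via-T2', 'via-T3']
```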

Important Points to Remember

  • The Overlay Management Protocol (OMP) governs the routing among vEdges.
  • The OMP best-path algorithm selects the best routes and sorts them in descending order (from best to worst).
  • The vSmart controller inserts and keeps all routes in separate VPN tables with the best routes at the top.
  • vRoutes:
      • Each vRoute is associated with a VPN segment.
      • The next-hop attribute of a vRoute is not an IP address but a TLOC route.
      • If the next-hop TLOC route of a vRoute is not known, the vRoute is marked as Invalid.
      • The site-id is a loop prevention mechanism similar to the AS number in BGP.
  • TLOC routes:
      • TLOC routes are not associated with a VPN.
      • A TLOC route is uniquely identified by {System-IP, Color, Encapsulation}. Notice that the fixed System-IP address is used instead of the interface IP. This ensures that a TLOC route can be identified at any given moment irrespective of any interface changes.
  • Service routes:
      • vEdges advertise attached network services using OMP service routes.
      • The vSmart controllers do not re-advertise a service route.
      • The service route is used in centralized policies for service chaining.
  • The send-path-limit parameter defines the maximum number of best-paths that an SD-WAN device advertises to its OMP peers.
  • The ecmp-limit parameter defines the maximum number of best paths that a vEdge router installs in its routing tables.
  • The controller-send-path-limit defines the maximum number of best-paths that a vSmart controller advertises to another vSmart controller.
