Understanding Border Gateway Protocol Version 4 (BGP-4)
BGP-4 Protocol Specification and Functionality
RFC 1771 defines the current Border Gateway Protocol 4 (BGP-4) implementation. BGP relies on a reliable transport mechanism to establish its connection and for exchanging information between BGP peers. BGP uses TCP port 179 for this purpose and benefits from the TCP protocol to offer reliable communication between BGP speakers. RFC 1771 describes in detail the requirements of BGP neighbor relationships, BGP update format, error notifications, and handling of special cases.
Proper BGP functionality requires proper configuration on the routers and correct implementation of the protocol per RFC 1771.
The sections that follow address these aspects of BGP:
- Neighbor relationships (peering)
- Advertising routes and the concept of synchronization
- Receiving routes
- Best-path calculation
- Policy control through the following:
  - Use of BGP attributes (LOCAL_PREF, AS_PATH, MULTI_EXIT_DISC (MED), ORIGIN, NEXT_HOP)
  - Use of route maps in policy control
  - Use of filter lists in policy control
  - Use of distribute lists in policy control
  - Use of communities in policy control
  - Use of prefix lists in policy control
  - Use of outbound route-filtering (ORF) capability in policy control
- Aggregation in BGP
- Scaling IBGP in large networks
  - Route reflectors
  - Confederations
Neighbor Relationships
BGP requires a neighbor relationship to be established before any information is exchanged between BGP speakers. BGP does not dynamically discover routers interested in running BGP; instead, BGP is configured with a specific neighbor IP address.
Like most other dynamic protocols, BGP uses periodic keepalive messages to ensure availability of BGP neighbors.
The keepalive timer is one third of the holdtime. If three consecutive keepalive messages are missed from a particular BGP neighbor, the holdtime expires and that neighbor is considered dead. In RFC 1771, the suggested value for the holdtime is 90 seconds, and the suggested value for the keepalive timer is 30 seconds. These values are negotiated between BGP neighbors when the neighbors first come up. RFC 1771 also requires that “an implementation of BGP must allow these timers to be configurable.”
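The relationship between the two timers can be sketched in a few lines of Python. This is an illustration only, not any router's implementation: the function names are hypothetical, and the sketch assumes the RFC behavior that peers settle on the smaller of the two proposed hold times and that a hold time must be either 0 or at least 3 seconds.

```python
def negotiate_hold_time(local_hold, peer_hold):
    """Return the hold time both peers use: the smaller of the two proposals."""
    hold = min(local_hold, peer_hold)
    # A hold time of 0 turns keepalives off; otherwise it must be >= 3 seconds.
    if hold != 0 and hold < 3:
        raise ValueError("hold time must be 0 or at least 3 seconds")
    return hold


def keepalive_interval(hold_time):
    """Keepalive timer as one third of the hold time (0 means no keepalives)."""
    return hold_time // 3


# Hypothetical proposals: one peer offers 90 seconds, the other 180 seconds.
hold = negotiate_hold_time(90, 180)
print(hold, keepalive_interval(hold))  # prints: 90 30
```

With a 90-second holdtime, three missed 30-second keepalives in a row are enough for the holdtime to expire.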
When BGP is configured with a neighbor IP address, it goes through a series of stages before it reaches the desired Established state in which BGP has negotiated all the required parameters and is willing to exchange BGP routes. BGP goes through the following stages of neighbor relationship, per RFC 1771:
- Idle— No BGP resources are allocated in Idle state, and no incoming BGP connections are allowed.
- Connect— BGP waits for a TCP connection to be completed. If successful, the BGP state machine moves into OpenSent state after sending the OPEN message to the peer. Failure in this state could result in either going into Active state or Connect state, or reverting back to Idle state, depending on the failure reasons.
- Active— In this state, a TCP connection is initiated to establish a BGP peer relationship. If successful, BGP sends its OPEN message to the peer and moves to OpenSent state. Failure can result in going to the Active or Idle states.
- OpenSent— After sending an OPEN message to the peer, BGP waits in this state for the OPEN reply. If a successful reply comes in, the BGP state moves to OpenConfirm and a keepalive is sent to the peer. Failure can result in sending the BGP state back to Idle or Active.
- OpenConfirm— The BGP state machine is one step away from reaching its final state (Established). BGP waits in this state for keepalives from the peer. If successful, the state moves to Established; otherwise, the state moves back to Idle based on the errors.
- Established— This is the state in which BGP can exchange information between the peers. The information can be updates, keepalives, or notifications.
Figure 14-2 highlights a simple BGP state machine that runs while BGP is in operation. Some details are left out for simplicity. Refer to RFC 1771 for a more detailed examination of the BGP state machine operation.
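The stages listed above can also be written down as a small transition table. The following Python sketch is deliberately simplified: the event names are illustrative labels rather than the event numbers used in the RFC, timers are left out, and any event that is not listed sends the session back to Idle.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


# (current state, event) -> next state; event names are hypothetical labels.
TRANSITIONS = {
    (State.IDLE, "start"): State.CONNECT,
    (State.CONNECT, "tcp_established"): State.OPEN_SENT,
    (State.CONNECT, "tcp_failed"): State.ACTIVE,
    (State.ACTIVE, "tcp_established"): State.OPEN_SENT,
    (State.OPEN_SENT, "open_received"): State.OPEN_CONFIRM,
    (State.OPEN_SENT, "tcp_closed"): State.ACTIVE,
    (State.OPEN_CONFIRM, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "update_received"): State.ESTABLISHED,
}


def step(state, event):
    """Advance the simplified state machine; unlisted events fall back to Idle."""
    return TRANSITIONS.get((state, event), State.IDLE)


def run(events):
    """Feed a sequence of events to a session that starts in Idle."""
    state = State.IDLE
    for event in events:
        state = step(state, event)
    return state


final = run(["start", "tcp_established", "open_received", "keepalive_received"])
print(final.value)  # prints: Established
```

A table keeps the sketch easy to compare against the list of stages; a real implementation also tracks the ConnectRetry, hold, and keepalive timers.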
External BGP Neighbor Relationships
This section explains a sample configuration of EBGP sessions. In Figure 14-3, R1 and R2 belong to different autonomous systems—109 and 110, respectively.
EBGP peering can be configured in two ways:
- Case 1— R1 and R2 are peering with a physical interface.
- Case 2— R1 and R2 are indirectly connected or they are peering with each other’s loopback interfaces.
In Case 1, the neighbor IP address that R1 peers with is in the same subnet as R1’s own physical interface.
The configuration for this case is as follows:
R1#:
router bgp 109
 neighbor 131.108.1.2 remote-as 110
R1 in Figure 14-3 has an interface with an IP address of 131.108.1.1.
Figure 14-4 shows two scenarios of multihop EBGP sessions indicative of the peering relationship in Case 2. In this figure, EBGP peering between R1 and R2 is done with each other’s loopback addresses. This is typically seen when multiple connections exist between the two autonomous systems and both links should carry traffic. Either each AS runs two separate BGP neighbor relationships on two separate physical interfaces, or they can configure one BGP neighbor to the loopback and configure two static routes to reach each other’s loopback. The latter method is preferable because it saves an extra BGP neighbor relationship.
In Figure 14-5, R3 in AS 110 might not be capable of running BGP, so R1 and R2 must peer with each other through R3.
In both cases of Figure 14-4 and Figure 14-5, it is assumed that all routers have reachability to R1 and R2’s loopback addresses.
Loopback addresses are used because they are virtual interfaces that, unlike physical interfaces, do not go down. Even if one physical interface goes down, the BGP session between loopbacks remains up as long as redundant paths exist to each other’s loopbacks.
Example 14-1 shows a sample configuration of R1 for a multihop EBGP session.
Example 14-1 Sample Configuration of R1 to Show Multihop EBGP Session
R1#:
router bgp 109
 neighbor 131.108.10.2 remote-as 110
 neighbor 131.108.10.2 ebgp-multihop 5
 neighbor 131.108.10.2 update-source Loopback0
ebgp-multihop 5 means that neighbor 131.108.10.2 can be at most five hops away from R1, because the Time To Live (TTL) field in the IP header is set to 5.
update-source Loopback0 means that all BGP updates are sourced from the Loopback 0 address of R1. R2 uses 131.108.10.1 as the next-hop address for all routes learned through R1.
Internal BGP Neighbor Relationships
Assume that R1 and R2 belong to the same AS, 109, as shown in Figure 14-6.
If R1 and R2 are IBGP neighbors, meaning that they are BGP neighbors in the same AS, the configuration cases can be any of the following:
Case 1— R1 and R2 are peering with a physical interface of each other, and peering is done with the IP address that belongs to the subnet that they both share. The configuration of R1 is as follows:
router bgp 109
 neighbor 131.108.1.2 remote-as 109
Case 2— R1 and R2 are either indirectly connected or they are peering with each other’s loopback interfaces. The configuration of R1 is as follows:
router bgp 109
 neighbor 131.108.10.2 remote-as 109
 neighbor 131.108.10.2 update-source Loopback0
NOTE
The neighbor 131.108.10.2 ebgp-multihop command is not needed. In cases of IBGP, the TTL in the IP header is set to 255 in Cisco IOS Software because it is assumed that IBGP neighbors might not be physically directly connected. In addition, an IBGP neighbor relationship can also be established between loopback addresses that are considered a multihop away from each other. In case of a physical failure in the network, IBGP can use alternate physical paths, if they exist, to reach the loopback of the BGP neighbor. This way, IBGP remains intact, even in the case of physical failures along the way.
Advertising Routes
A BGP router can send updates to, or receive updates from, its BGP peer only after it has reached the Established state with that neighbor. A router running BGP advertises to its neighbors only a prefix that it is itself going to use in its routing table. Such a prefix is called the best path (defined later in the chapter). A rule similar to split horizon works in BGP as well: a prefix learned from a neighbor is not advertised back to that neighbor if that was the best route.
Cisco IOS Software offers multiple ways to advertise prefixes in BGP. One rule that BGP follows when advertising prefixes to other neighbors is that the prefix being advertised must exist in the routing table of the advertising router.
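The two advertising rules described so far can be expressed as a short filter. The Python sketch below is a simplification with hypothetical names (Path, neighbors_to_advertise); it covers only the routing-table check and the split-horizon-like rule and ignores every other condition, such as the IBGP advertisement rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    prefix: str
    learned_from: str  # neighbor the best path came from, or "local"


def neighbors_to_advertise(best, neighbors, routing_table):
    """Return the neighbors that should receive the best path for a prefix."""
    # The prefix must exist in the routing table of the advertising router.
    if best.prefix not in routing_table:
        return []
    # Split-horizon-like rule: do not send the best path back to its source.
    return [n for n in neighbors if n != best.learned_from]


best = Path(prefix="131.108.1.0/24", learned_from="R2")
print(neighbors_to_advertise(best, ["R2", "R3"], {"131.108.1.0/24"}))  # prints: ['R3']
```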
In Figure 14-7, R1 advertises 131.108.1.0/24 through BGP to its BGP peer, R2.
In Cisco IOS Software BGP, there are three ways to advertise the prefix:
- Using the network statement— As with other routing protocols, this is the first option. The following configuration advertises 131.108.1.0/24 through the network statement in R1:
router bgp 109
 network 131.108.1.0 mask 255.255.255.0
131.108.1.0/24 must exist in the routing table of R1; otherwise, 131.108.1.0/24 will not be advertised in BGP. The mask keyword followed by the actual mask of the prefix is needed when subnetted routes are being advertised.
- Using the redistribute command— If 131.108.1.0/24 is a connected route in R1’s routing table, the following configuration will advertise 131.108.1.0/24 in BGP:
router bgp 109
 redistribute connected
 no auto-summary
With this configuration, all the connected routes, including 131.108.1.0/24, are advertised. To allow only 131.108.1.0/24 to be advertised, BGP must use the filtering mechanism explained later in this chapter. The no auto-summary command is used because BGP by default advertises redistributed routes at their natural classful mask. For example, 131.108.1.0/24, being a Class B prefix, would be advertised as 131.108.0.0/16 without this command.
- Using the aggregate statement— Prefixes are aggregated or summarized to reduce the number of prefix announcements and reduce the size of the routing table. The Cisco IOS Software aggregate feature in BGP announces summarized routes.
If more specific routes of 131.108.1.0/24 are present in the BGP table—for example, 131.108.1.128/26—the following configuration advertises 131.108.1.0/24 in BGP:
R1#:
router bgp 109
 aggregate-address 131.108.1.0 255.255.255.0
You need to understand two important rules for the setup shown in Figure 14-8:
Rule 1— Aggregation or summarization of subnets can happen only if those subnets exist in the BGP table.
Rule 2— For the aggregated (summarized) route, Cisco IOS installs an IP route with the next hop to Null0 in the IP routing table. This is done to ensure that a valid route exists in the routing table so that it can be announced to other BGP peers.
As Figure 14-8 illustrates, per Rule 1, R1 has 131.108.1.128/25 and 131.108.1.192/26 in its BGP table, but it is configured to advertise 131.108.1.0/24 to R2.
For Rule 2, when the aggregate-address command is used, Cisco IOS Software automatically installs 131.108.1.0/24 with a next-hop interface of Null0 in its routing table. The output in Example 14-2 illustrates that R1 is configured to advertise 131.108.1.128/25 and 131.108.1.192/26. R1 is also generating an aggregate of 131.108.1.0/24. The first portion displays the BGP table to show that all three routes, including the aggregate, are in the BGP table. The second portion shows the detailed display of the aggregated route in R1. The third portion indicates that Cisco IOS Software automatically installs a Null0 route for the aggregate statement.
Example 14-2 Configuration and Output for Setup in Figure 14-8
R1#:
router bgp 1
 network 131.108.1.192 mask 255.255.255.192
 network 131.108.1.128 mask 255.255.255.128
 aggregate-address 131.108.1.0 255.255.255.0

R1#show ip bgp
BGP table version is 5, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network            Next Hop            Metric LocPrf Weight Path
*> 131.108.1.0/24     0.0.0.0                            32768 i
*> 131.108.1.128/25   0.0.0.0                  0         32768 i
*> 131.108.1.192/26   0.0.0.0                  0         32768 i
_____________________________________________________________________________________
R1#show ip bgp 131.108.1.0 255.255.255.0
BGP routing table entry for 131.108.1.0/24, version 3
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Local, (aggregated by 1 1.1.1.1)
    0.0.0.0 from 0.0.0.0 (1.1.1.1)
      Origin IGP, localpref 100, weight 32768, valid, aggregated, local, atomic-aggregate, best
_____________________________________________________________________________________
R1#show ip route 131.108.1.0
Routing entry for 131.108.1.0/25
  Known via "static", distance 1, metric 0 (connected)
  Routing Descriptor Blocks:
  * directly connected, via Null0
      Route metric is 0, traffic share count is 1
Null0 is a bit bucket, but this route does not harm data traffic because data traffic is switched based on the more specific /25 and /26 routes, not this /24 Null0 route.
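The reason the Null0 route is harmless is longest-prefix matching. The following Python sketch models a routing table with the three prefixes from Example 14-2. The next-hop labels and the lookup function are illustrative assumptions, not router output.

```python
import ipaddress

# Prefixes from Example 14-2; the next-hop labels are illustrative only.
ROUTES = {
    ipaddress.ip_network("131.108.1.0/24"): "Null0",
    ipaddress.ip_network("131.108.1.128/25"): "next hop of the /25",
    ipaddress.ip_network("131.108.1.192/26"): "next hop of the /26",
}


def lookup(destination):
    """Return the next hop of the longest matching prefix for a destination."""
    address = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if address in net]
    if not matches:
        raise LookupError("no route to " + destination)
    # The most specific (longest) prefix wins.
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


print(lookup("131.108.1.200"))  # prints: next hop of the /26
print(lookup("131.108.1.130"))  # prints: next hop of the /25
print(lookup("131.108.1.10"))   # prints: Null0
```

Only destinations that fall inside the /24 but outside every more specific route, such as 131.108.1.10 here, reach the Null0 entry and are discarded.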
With the aggregate-address command in Cisco IOS Software, BGP advertises the aggregate and the subnetted routes as well. Cisco IOS allows a knob in configuration that will suppress these subnetted routes; only the aggregated prefix will be announced:
R1#
router bgp 1
 aggregate-address 131.108.1.0 255.255.255.0 summary-only
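The effect of Rule 1 and of the summary-only keyword on what is announced can be sketched as follows. This Python model uses a hypothetical function, advertised_prefixes, and treats the BGP table as a plain list of prefixes; it is meant only to illustrate the behavior described above.

```python
import ipaddress


def advertised_prefixes(bgp_table, aggregate, summary_only=False):
    """Return the prefixes announced for one aggregate-address statement."""
    agg = ipaddress.ip_network(aggregate)
    nets = [ipaddress.ip_network(p) for p in bgp_table]
    # Rule 1: the aggregate exists only if a more specific subnet is in the table.
    specifics = [n for n in nets if n != agg and n.subnet_of(agg)]
    if not specifics:
        return [str(n) for n in nets]
    # summary-only suppresses the more specific subnets of the aggregate.
    kept = [n for n in nets if not (summary_only and n in specifics)]
    return [str(agg)] + [str(n) for n in kept]


table = ["131.108.1.128/25", "131.108.1.192/26"]
print(advertised_prefixes(table, "131.108.1.0/24"))
# prints: ['131.108.1.0/24', '131.108.1.128/25', '131.108.1.192/26']
print(advertised_prefixes(table, "131.108.1.0/24", summary_only=True))
# prints: ['131.108.1.0/24']
```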
Synchronization Rule
This rule in RFC 1771 states that the IGP routing table must be synchronized with the IBGP routing table. This can happen only if EBGP-learned routes are redistributed into the IGP (OSPF) at the ASBR. The potential for black-holing traffic exists if the IGP is not synchronized with IBGP-learned routes.
Figure 14-9 shows how synchronization rule can black-hole traffic. R2, R3, and R4 are in AS 110 running IBGP and also OSPF. R1 in AS 109 advertises prefix 131.108.1.0/24 through EBGP to R2, which advertises this prefix to R3 and R4; and R4 advertises to its EBGP neighbor R5.
Assume that all the routers except R3 have processed this BGP update and have installed the route for 131.108.1.0/24 in their routing tables. If sources behind R5 start sending traffic to 131.108.1.0/24, packets arrive at R4, and the R4 routing table might report that the next hop to reach 131.108.1.0/24 is through R3. As a result, data traffic arrives at R3 and is dropped because R3 is still processing the update and does not have the route to reach 131.108.1.0/24. This is called transient black-holing of traffic. Over time, R3 will have the route and will be capable of passing traffic for 131.108.1.0/24. By the RFC 1771 synchronization rule, R4 should have waited until its IGP (OSPF) had also received the route for 131.108.1.0/24; only then should it have advertised the route to external peer R5 to attract traffic.
Announcing all EBGP routes in the IGP requires manual redistribution at the ASBR (R2 in this example). R2 must redistribute 131.108.1.0/24 into OSPF so that all routers in AS 110 receive the prefix. However, with the size of Internet routing tables today, it is neither practical nor scalable to redistribute a full Internet feed into an IGP. Therefore, synchronization is typically turned off on BGP speakers running Cisco IOS Software using the following command:
R2#
router bgp 110
 no synchronization
Transient black-holing can still happen, but with synchronization turned off, the IGP is not required to carry full BGP routes. In cases where some routers are not running BGP and are in the transit path between IBGP neighbors, synchronization cannot be turned off and BGP must be redistributed into the IGP.
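The decision that the synchronization rule imposes on a router such as R4 can be written as a single check. This Python sketch is a simplified model with a hypothetical function name; it covers only the synchronization test and none of the other conditions that a real implementation applies before advertising a route.

```python
def can_advertise_to_ebgp(prefix, learned_via_ibgp, igp_routes, synchronization=True):
    """Decide whether a route may be advertised to an external peer."""
    # The rule concerns only routes learned from IBGP neighbors.
    if not learned_via_ibgp:
        return True
    # With synchronization turned off, the IGP is not consulted.
    if not synchronization:
        return True
    # Otherwise the IGP must already carry the same prefix.
    return prefix in igp_routes


prefix = "131.108.1.0/24"
print(can_advertise_to_ebgp(prefix, True, set()))                          # prints: False
print(can_advertise_to_ebgp(prefix, True, {prefix}))                       # prints: True
print(can_advertise_to_ebgp(prefix, True, set(), synchronization=False))   # prints: True
```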
Receiving Routes
In BGP, if the BGP peer is in the Established state, no additional configuration is needed to receive routing updates. BGP will accept all the updates from the peer, provided that those updates pass the necessary checks for packet format and filters.
Policy Control
Policy control means that BGP provides power to control prefix filtering and manage IP traffic flow into and out of the BGP network. BGP policies can flow downstream and affect policy of those autonomous systems to which routes are being propagated. In a large BGP network that is divided into multiple regions, special requirements must be met in terms of what type and how much traffic can flow in and out of each region. BGP policy control gives network operators a highly scalable way of maintaining traffic flows. BGP policies are defined by BGP attributes that consist of the following:
- LOCAL_PREF
- AS_PATH
- MULTI_EXIT_DISC (MED)
- ORIGIN
- NEXT_HOP
- ATOMIC_AGGREGATE
- AGGREGATOR
Typically, ATOMIC_AGGREGATE and AGGREGATOR are not used in defining and configuring BGP policies in routers and therefore will not be discussed in detail in this chapter. The remaining attributes will be illustrated and explained in detail in this chapter.
The routing table of a router dictates how traffic destined to a certain destination exits that router. If the focus of traffic flow is shifted to a region where many routers are present, the routing policy depicted in the routing tables of each router dictates how traffic exits that region. Similarly, all the regions combined can be viewed as a complete IP network. Routing policy depicted in routing tables of all the devices in the network reflects how traffic exits out the network. Figure 14-10 shows how network traffic flows across multiple regions and through multiple routers based on the BGP policy defined to influence the path that data traffic takes from source to destination.
A single BGP AS is divided into multiple regions. Traffic flows from source to destination, crossing multiple regions based on the BGP policy defined.
In Figure 14-10, it seems more logical that traffic from source to destination travels Region 1, Region 2, and Region 3, and then to the final destination because that seems to be the shortest path from source to destination. Region 2, however, is configured with a BGP policy that shifts traffic from Region 2 to Region 4 instead. Network architects design and decide on such policies when traffic takes a longer path. Factors such as bandwidth availability, congestion, router capacity, and many others play a significant role in the definition of these policies.
How is a BGP policy created? Manipulation of BGP attributes defines BGP policies. Packet forwarding in a router happens from the routing table and BGP policy dictates which route of many is chosen to go in the routing table.
Figure 14-11 illustrates a simple example of policy control in the case of EBGP.
AS 109 network administrators require a BGP policy so that, as shown in Figure 14-11, traffic destined to 131.108.1.0/24 from AS 109 must use the R1–R3 link. The R1–R2 link should be used as a backup. This can happen only if routing tables in all the devices of AS 109 show R1–R3 as the exit point and if the path through R2 is present in the BGP table but not in the routing table of R1.
The method of choosing one path over the other is accomplished by manipulating the right BGP attribute. In Figure 14-11, R1 learns two paths to reach 131.108.1.0/24—one through R2 and the other through R3.
R1 must pick the path through R3 because of the required BGP policy, and BGP attributes must be changed so that the path from R3 becomes more attractive than the path from R2. When that happens, IP traffic (after following the routing table) flows from all devices in AS 109 to exit through R3 to reach 131.108.1.0/24. The following sections explain using the BGP attributes and the methods of changing them to define BGP policies.
Policy Control Using BGP Attributes
BGP picks a best path for a destination IP prefix from multiple paths and then installs it in the routing table. This best path forwards IP traffic. By default, BGP does a decent job of choosing the appropriate path; however, with the huge size of router-based networks, it is essential that BGP path selection be managed by network operators to have a BGP policy that optimally uses the network. BGP attributes are often manipulated to manage BGP networks. Examples of most commonly used BGP attributes are listed here:
- LOCAL_PREF
- AS_PATH
- MULTI_EXIT_DISC (MED)
- ORIGIN
- NEXT_HOP
- WEIGHT (WEIGHT is not a BGP attribute; it is a Cisco proprietary attribute)
The sections that follow describe these BGP attributes in greater detail and describe how to manipulate them for BGP policy control, where applicable.
LOCAL_PREF Attribute
A 32-bit non-negative integer value of LOCAL_PREF in BGP updates defines a preference of one exit point within an AS over others to reach the originator of the route. LOCAL_PREF has no significance outside an AS; therefore, it affects only the outgoing traffic from an AS. LOCAL_PREF is not advertised to EBGP neighbors and is propagated to only IBGP neighbors.
Figure 14-12 explains how LOCAL_PREF is handled in networks running BGP.
R1 in AS 109 is advertising 131.108.1.0/24 to its EBGP neighbors R2 and R3 in AS 110. BGP updates sent to EBGP neighbors do not contain LOCAL_PREF. In Cisco IOS Software, LOCAL_PREF is given a default value of 100 for all the prefixes learned from EBGP neighbors. Cisco IOS Software also allows the user to configure LOCAL_PREF, as shown later in this chapter. As Figure 14-12 shows, because the link between R1 and R2 has more bandwidth than the link from R1 to R3, it is likely desired that traffic from AS 110 to AS 109 use the R1–R2 link rather than the R1–R3 link. Therefore, R2 is configured so that it changes the LOCAL_PREF to 200 for all the prefixes that it learns from R1.
Because LOCAL_PREF is advertised to all IBGP neighbors, R3, R4, and R5 receive 131.108.1.0/24 with a LOCAL_PREF of 200 from R2. R3 has an additional path for 131.108.1.0/24 from R1, and its LOCAL_PREF was unchanged and is defaulted to 100. Now, R3 must decide between two paths for 131.108.1.0/24, one from R1 and the other from R2. As explained later in the discussion of best-path calculation of BGP, the path with the higher LOCAL_PREF wins; therefore, the path from R2 will win and get installed in the routing table for R3. Similarly, R4 and R5 will choose R2 to reach 131.108.1.0/24. In Figure 14-12, R4 and R5 are receiving only one path for 131.108.1.0/24, and that is from R2. If they were receiving multiple paths from multiple IBGP neighbors, they would have decided on the best path based on a higher LOCAL_PREF, just like R3 did.
When R6 in AS 111 sends traffic for 131.108.1.1, the traffic exits AS 110 through R2 because AS 110 prefers R2 as the exit point for 131.108.1.0/24.
Figure 14-12 shows that LOCAL_PREF plays a significant role in managing outgoing traffic from AS 110 to destinations outside AS 110.
Example 14-3 shows the configuration needed to manipulate the LOCAL_PREF attribute in R2, as illustrated in Figure 14-12.
Example 14-3 Configuring the LOCAL_PREF Attribute
R2#
router bgp 110
 neighbor 1.1.1.1 remote-as 109
 neighbor 1.1.1.1 route-map SET_LOCAL_PREF in
route-map SET_LOCAL_PREF permit 10
 match ip address 1
 set local-preference 200
route-map SET_LOCAL_PREF permit 20
 match ip address 2
access-list 1 permit 131.108.1.0
access-list 2 permit any
The IP address of R1 in AS 109 is 1.1.1.1. SET_LOCAL_PREF is the route map that is applied on an EBGP session with R1. The route map usage is defined in detail in Chapter 1, “Understanding IP Routing.” The first sequence of the route map has a match statement against access-list 1 that permits 131.108.1.0/24. The set command configures the LOCAL_PREF to 200 for prefixes that pass access-list 1. The second sequence of the route map is necessary to allow all other prefixes from this neighbor without changing the LOCAL_PREF.
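The way the two sequences of the route map are evaluated can be modeled in a few lines. The Python sketch below is an approximation under stated assumptions: access-list matching is reduced to an exact prefix comparison, the names ROUTE_MAP and apply_route_map are hypothetical, and only the LOCAL_PREF attribute is tracked.

```python
import ipaddress

TARGET = ipaddress.ip_network("131.108.1.0/24")

# Route map SET_LOCAL_PREF modeled as ordered entries:
# (sequence number, match test, attributes to set).
ROUTE_MAP = [
    (10, lambda prefix: prefix == TARGET, {"local_pref": 200}),  # access-list 1
    (20, lambda prefix: True, {}),                               # access-list 2
]


def apply_route_map(prefix, attributes):
    """Return the updated attributes, or None if no sequence permits the prefix."""
    network = ipaddress.ip_network(prefix)
    for _sequence, matches, changes in ROUTE_MAP:
        if matches(network):
            # The first matching sequence decides; later ones are not evaluated.
            return {**attributes, **changes}
    return None  # a route map ends with an implicit deny


print(apply_route_map("131.108.1.0/24", {"local_pref": 100}))  # prints: {'local_pref': 200}
print(apply_route_map("10.0.0.0/8", {"local_pref": 100}))      # prints: {'local_pref': 100}
```

Without the second sequence, every prefix other than 131.108.1.0/24 would fall through to the implicit deny and be filtered.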
After this configuration in R2, the output of the BGP table for prefix 131.108.1.0/24 from R2 and R3 looks like Example 14-4.
Example 14-4 BGP Output of the Prefix 131.108.1.0/24 After LOCAL_PREF Change
R2#show ip bgp 131.108.1.0
BGP routing table entry for 131.108.1.0/24, version 8
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  1.1.8.3
  109
    1.1.7.1 from 1.1.7.1 (10.1.1.1)
      Origin IGP, metric 0, localpref 200, valid, external, best
_____________________________________________________________________________________
R3#show ip bgp 131.108.1.0
BGP routing table entry for 131.108.1.0/24, version 8
Paths: (2 available, best #1)
  Not advertised to any peer
  109
    1.1.7.1 (metric 307200) from 1.1.8.2 (10.0.0.5)
      Origin IGP, metric 0, localpref 200, valid, internal, best
  109
    1.1.2.1 from 1.1.2.1 (10.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, external
R3 has two updates, one from R1 and the other from R2. The path from R2 is selected as the best path because of the higher LOCAL_PREF.
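R3's choice can be reproduced with a one-line comparison. This Python sketch looks at LOCAL_PREF only and assumes the default value of 100 for a path that carries none; the dictionary layout and function names are illustrative.

```python
DEFAULT_LOCAL_PREF = 100  # value assumed for paths that carry no LOCAL_PREF


def effective_local_pref(path):
    """LOCAL_PREF of a path, falling back to the default of 100."""
    return path.get("local_pref", DEFAULT_LOCAL_PREF)


def best_by_local_pref(paths):
    """Pick the path with the highest LOCAL_PREF."""
    return max(paths, key=effective_local_pref)


# R3's two paths for 131.108.1.0/24 from Example 14-4.
paths = [
    {"from": "R2", "type": "internal", "local_pref": 200},
    {"from": "R1", "type": "external"},  # EBGP-learned, default applies
]
print(best_by_local_pref(paths)["from"])  # prints: R2
```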
MULTI_EXIT_DISC (MED) Attribute
A 32-bit non-negative integer value of MED in BGP updates defines a method to choose among multiple exit points in the same neighboring AS. MED is a nontransitive attribute of BGP; therefore, if it is received from an EBGP neighbor, it is sent to an IBGP neighbor, but it does not get propagated to other EBGP neighbors.
Figure 14-13 explains the usage of MED. In AS 109, R1 has two links to AS 110. The link between R1 and R2 has a higher bandwidth than the link between R1 and R3. Therefore, R1 might decide that all traffic destined to 131.108.1.0/24 must exit AS 110 through the R1–R2 link, not the R1–R3 link.
A lower MED value is preferred when comparing the two updates. In Cisco IOS Software, MED is compared only between updates from within the same AS. To compare MED values in updates from different autonomous systems, Cisco IOS Software must be configured with bgp always-compare-med in BGP subcommands.
The AS 109 policy dictates that all traffic destined for 131.108.1.0/24 should come through the R1–R2 link and that the R1–R3 link should be used as a backup in case the R1–R2 link goes down.
To achieve this, R1 advertises 131.108.1.0/24 to both of its neighbors in AS 110, R2 and R3. However, R1 should advertise a lower MED value to R2 than to R3.
Example 14-5 shows a sample configuration of R1 to achieve the goal.
Example 14-5 Configuring the MED Attribute
R1#
router bgp 109
 neighbor 1.1.7.2 remote-as 110
 neighbor 1.1.7.2 route-map SEND_LOWER_MED out
 neighbor 1.1.2.3 remote-as 110
 neighbor 1.1.2.3 route-map SEND_HIGHER_MED out
route-map SEND_LOWER_MED permit 10
 match ip address 1
 set metric 10
!
route-map SEND_LOWER_MED permit 20
 match ip address 2
route-map SEND_HIGHER_MED permit 10
 match ip address 1
 set metric 20
!
route-map SEND_HIGHER_MED permit 20
 match ip address 2
access-list 1 permit 131.108.1.0
access-list 2 permit any
1.1.7.2 and 1.1.2.3 are two neighbors of R1. They are both configured with route maps to advertise a MED of 10 and 20, respectively, for the prefix 131.108.1.0/24. This occurs in sequence 10 of the route map. Sequence 20 permits all other prefixes, if any, advertised from R1 to R2 and R3, and no MED changes were applied to them.
Example 14-6 shows the output of R3 and R2 after this configuration change in R1.
Example 14-6 Output of BGP Table from R3 and R2 After the MED Advertisement from R1
R3#show ip bgp 131.108.1.0
BGP routing table entry for 131.108.1.0/24, version 10
Paths: (2 available, best #2)
  Not advertised to any peer
  109
    1.1.2.1 from 1.1.2.1 (10.1.1.1)
      Origin IGP, metric 20, localpref 100, valid, external
  109
    1.1.7.1 from 1.1.8.2 (10.0.0.5)
      Origin IGP, metric 10, localpref 100, valid, internal, best
_____________________________________________________________________________________
R2#show ip bgp 131.108.1.0
BGP routing table entry for 131.108.1.0/24, version 10
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  1.1.8.3
  109
    1.1.7.1 from 1.1.7.1 (10.1.1.1)
      Origin IGP, metric 10, localpref 100, valid, external, best
With this configuration, the prefix 131.108.1.0/24 MED field looks like the following in R2 and R3:
- In R2, MED = 10 for the path from R1.
- In R3, MED = 10 for the path from R2; MED = 20 for the path from R1.
R2 has only one path for 131.108.1.0/24, whereas R3 has two. This is because R2 is advertising its best route to all its IBGP neighbors (R3, R4, and R5, in this example). R3’s best path for 131.108.1.0/24 is from R2, so R3 will not advertise its best path back to R2 because that path originally came from R2.
Because the lower MED wins in BGP best-path calculation, in R3, the path from R2 wins over the path from R1. Thus, all traffic from autonomous system 110 for 131.108.1.0/24 will exit through R2.
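The MED comparison, including the same-neighboring-AS condition, can be sketched as follows. This Python model assumes that all earlier best-path criteria (such as LOCAL_PREF and AS_PATH length) have already tied; the function name and dictionary layout are hypothetical, and the neighboring AS is taken to be the first entry of the AS_PATH.

```python
def prefer_by_med(a, b, always_compare_med=False):
    """Return the path preferred by MED, or None if MED does not decide."""
    # MEDs are compared only for paths from the same neighboring AS,
    # unless the always-compare-med behavior is enabled.
    same_neighbor_as = a["as_path"][:1] == b["as_path"][:1]
    if not (same_neighbor_as or always_compare_med):
        return None
    if a["med"] == b["med"]:
        return None
    return a if a["med"] < b["med"] else b  # the lower MED wins


# R3's two paths for 131.108.1.0/24 from Example 14-6.
via_r1 = {"via": "R1", "as_path": [109], "med": 20}
via_r2 = {"via": "R2", "as_path": [109], "med": 10}
print(prefer_by_med(via_r1, via_r2)["via"])  # prints: R2
```

If the two paths came from different neighboring autonomous systems, the function would return None and the decision would fall to later tie-breakers, unless always_compare_med is set.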
MED is a nontransitive attribute and will not be advertised to AS 111 by AS 110. The routers in AS 110 that peer with R6 might be configured to advertise the same or a different MED to R6 in AS 111, but the MED value originally set by R1 in AS 109 will not be kept.
Because of the BGP MED policy of R1, traffic from R6 in AS 111 to 131.108.1.1 in AS 109 will exit AS 110 through R2, not through R3.
The MED attribute plays a significant role in influencing incoming traffic when multiple connections exist between the same two autonomous systems. The example in Figure 14-13 is most commonly seen in enterprise BGP networks where routers in AS 109 are dual homed to an ISP in AS 110.
As noted earlier, Cisco IOS Software compares MED only between updates from the same AS unless bgp always-compare-med is configured. In Example 14-5, R3 compared the MEDs because both paths for 131.108.1.0/24 came from the same AS, 109.
Figure 14-14 shows a more complex example, as seen in ISP networks advertising MED to other ISPs.
AS 109 has two regional connections, east and west, with AS 110. AS 109 needs to make sure that traffic destined to each AS 109 region enters through the respective regional link. This can be accomplished by the following:
- AS 109 must advertise the proper MEDs, as shown in Figure 14-14.
- AS 110 must honor AS 109 MED announcements.
The first task is achieved through the configuration shown later in this chapter. The second task is more of a peering agreement between AS 109 and AS 110. Honoring MED means that AS 110 must accept the MED announcements from AS 109 and will not overwrite them with its own policies. Honoring MED is typically a two-way relationship: AS 110 honors AS 109 MED only if AS 109 does the same for AS 110 MED. By honoring the MED, AS 110 carries traffic destined to AS 109 through its backbone and exits at the closest point in AS 109. If AS 110 decides not to honor AS 109 MED, it will have its own policies to carry traffic for AS 109. Instead of an optimal exit point, AS 110 might choose the closest exit point. Figure 14-15 shows how traffic flows in AS 110 when it honors the MED from AS 109.
Traffic sourced behind R2 and destined for 140.1.1.1 traverses the AS 110 backbone routers because they have all received the proper MED announcement from AS 109, as the MED is propagated within the IBGP cloud. This traffic exits AS 110 at R5, the exit point closest to the destination, 140.1.1.1. Similarly, traffic behind R4 destined for 131.108.1.1 exits at R3.
In some situations, ISPs do not honor each other’s MEDs. In this case, AS 110 might dump all traffic destined to AS 109 to its closest exit point and not carry its traffic through the backbone. Such an example can arise when traffic destined to 140.1.1.1 from sources behind R2 carries over to R3 and exits to R1; AS 109 must carry that traffic across its backbone to reach R6 in the east region. Proper usage of the MED attribute can also be called cold potato routing (defined earlier in the chapter).
AS_PATH Attribute
The AS_PATH attribute defines the list of autonomous systems through which a BGP update has traversed. It is a mandatory attribute that BGP updates must carry, and it is changed only when a BGP update is sent to an EBGP neighbor. Figure 14-16 explains how the AS_PATH attribute works.
AS 109 is advertising 131.108.1.0/24 to its EBGP neighbor in AS 110. The BGP speaker must prepend its AS number at the left-most position in the AS_PATH attribute field, while sending an update to its EBGP neighbor. R1 prepends its AS number 109 in that field. R2 advertises this update to its IBGP speakers R3 and R4 but does not change the AS_PATH. R3 and R4 prepend their AS 110 when advertising this prefix to AS 111. When a router receives a BGP update that has an AS_PATH attribute that lists its own AS in it, that update is considered looped and is dropped. This mechanism is used in BGP to detect routing loops.
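The prepend and loop-detection behavior can be traced with two small functions. This Python sketch follows the update for 131.108.1.0/24 along the path described above; the function names are hypothetical, and the AS_PATH is modeled as a plain list with the left-most AS first.

```python
def as_path_on_send(as_path, local_as, to_ebgp):
    """Return the AS_PATH carried in an update sent to a neighbor."""
    # The local AS is prepended at the left-most position only toward EBGP peers.
    return [local_as] + list(as_path) if to_ebgp else list(as_path)


def is_looped(as_path, local_as):
    """An update whose AS_PATH already lists the receiving AS is dropped."""
    return local_as in as_path


path = as_path_on_send([], 109, to_ebgp=True)      # R1 to R2
path = as_path_on_send(path, 110, to_ebgp=False)   # R2 to R3 over IBGP
path = as_path_on_send(path, 110, to_ebgp=True)    # R3 to AS 111
print(path)                                         # prints: [110, 109]
print(is_looped(path, 111), is_looped(path, 109))   # prints: False True
```

AS 111 accepts the update, whereas a router in AS 109 receiving the same update would treat it as looped because 109 is already in the list.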
A shorter AS_PATH is preferred when comparing two BGP updates for the same prefix.
Refer back to the network topology in Figure 14-13. AS 109 wants to define a BGP policy so that all traffic destined to 131.108.1.0/24 from AS 110 must enter through the R2–R1 link, and the R3–R1 link should be used as a backup.
In R2, R3, and R4 (all in AS 110), the AS_PATH field for the prefix 131.108.1.0/24 is 109. R3 has two paths for this prefix, one from R1 and the other from R2. The AS_PATH comparison ties because the AS_PATH length is identical, at 1, so the BGP best-path calculation moves down to the next tie-breaking criterion, per the best-path calculation rules described in this chapter. Example 14-7 demonstrates how R1 can achieve its goal of preferring the R1–R2 link over the R1–R3 link for traffic destined to 131.108.1.0/24.
One approach is to use MEDs so that R1 advertises a lower MED when advertising prefix 131.108.1.0/24 to R2 and advertises a higher MED to R3. Another approach is to make the AS_PATH length longer for the advertisement from R1 to R3 for this prefix. Because the BGP best-path calculation rule looks at AS_PATH length as the tie-breaker rule between multiple paths, the R1–R3 path will lose and the R1–R2 path will win and be installed in the routing table.
Example 14-7 shows the configuration needed in R1 to achieve this.
Example 14-7 Using the AS_PATH Attribute on R1 to Dictate the Best Path
R1#
router bgp 109
 network 131.108.1.0 mask 255.255.255.0
 neighbor 1.1.2.3 remote-as 110
 neighbor 1.1.2.3 route-map AS_PREPEND out
 neighbor 1.1.7.2 remote-as 110
route-map AS_PREPEND permit 10
 match ip address 1
 set as-path prepend 109 109
!
route-map AS_PREPEND permit 20
 match ip address 2
access-list 1 permit 131.108.1.0
access-list 2 permit any
1.1.2.3 is the R3 IP address, and route-map AS_PREPEND is configured in R1 for R3 to increase the length of the AS_PATH attribute by prepending AS 109 twice in the list. route-map AS_PREPEND sequence 10 has a match clause that makes sure that only 131.108.1.0/24 gets this prepended AS_PATH, and sequence 20 ensures that the rest of the prefixes from R1 to R3 remain unchanged. Access lists 1 and 2 are defined to achieve that.
After this configuration, R3 has two updates for prefix 131.108.1.0/24, one from R2 and another from R1 with the prepended AS_PATH. R2 just has a single update from R1. Example 14-8 shows the output for R3 and R2.
Example 14-8 show ip bgp Output from R3 and R2 After AS_PATH Manipulation in R1
R3#show ip bgp 131.108.1.0
BGP routing table entry for 131.108.1.0/24, version 5
Paths: (2 available, best #2)
  Not advertised to any peer
  109 109 109
    1.1.2.1 from 1.1.2.1 (10.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, external
  109
    1.1.7.1 from 1.1.8.2 (10.0.0.5)
      Origin IGP, metric 0, localpref 100, valid, internal, best
_____________________________________________________________________________________
R2#show ip bgp 131.108.1.0
BGP routing table entry for 131.108.1.0/24, version 6
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  1.1.8.3
  109
    1.1.7.1 from 1.1.7.1 (10.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, external, best
For prefix 131.108.1.0/24, the AS_PATH field looks like the following in R2 and R3:
In R2, AS_PATH = 109 for path from R1
In R3, AS_PATH = 109 109 109 for path from R1; AS_PATH = 109 for path from R2
Because in R3 the AS_PATH length of an update from R1 is (3) and from R2 is (1), R3 picks the path from R2 over the path from R1. This way, all traffic from AS 110 destined for 131.108.1.0/24 in AS 109 would exit through R2.
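The effect of the prepend on R3's choice can be reproduced with a short Python sketch. It models only the AS_PATH length comparison and ignores every other best-path step; the helper names advertise_ebgp and shortest_as_path are hypothetical.

```python
from typing import Dict, List, Sequence


def advertise_ebgp(as_path: Sequence[int], local_as: int,
                   prepend: Sequence[int] = ()) -> List[int]:
    """AS_PATH sent to an EBGP neighbor: local AS, then any route-map prepend."""
    return [local_as, *prepend, *as_path]


def shortest_as_path(paths: Dict[str, List[int]]) -> str:
    """Name of the path with the shortest AS_PATH (first entry wins a tie)."""
    return min(paths, key=lambda name: len(paths[name]))


# R1 (AS 109) originates 131.108.1.0/24.
r1_to_r2 = advertise_ebgp([], 109)                      # no route map
r1_to_r3 = advertise_ebgp([], 109, prepend=(109, 109))  # AS_PREPEND applied

# R3 hears the prefix directly from R1 and, unchanged over IBGP, from R2.
paths_at_r3 = {"from R1": r1_to_r3, "from R2": r1_to_r2}

print(paths_at_r3)
print(shortest_as_path(paths_at_r3))
```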
R2 has only one path for 131.108.1.0/24, whereas R3 has two. This is because R2 is advertising its best route to all its IBGP neighbors (R3, R4, and R5, in this example). R3’s best path for 131.108.1.0/24 is from R2, so R3 does not advertise its best path back to R2 because it originally came from R2.
The AS_PATH prepend technique to influence incoming traffic is used in cases when AS 110 does not honor MEDs from AS 109, or when AS 109 is dual homed to different ISPs.
Typically, enterprise BGP networks use the AS_PATH prepend technique with their ISPs because the number of prefixes that they advertise is small and can be easily managed, as shown in Example 14-7. In ISP networks, in which the number of prefixes is on the order of thousands, managing AS_PATH per prefix does not scale well; therefore, ISPs rely on LOCAL_PREF, MED, and WEIGHT to manage their traffic. An ISP might use AS_PATH prepending in its advertisements to solve temporary problems but typically does not deploy it as a standard, widely used policy.
By observing the AS_PATH, a BGP speaker can find out which AS originated the prefix and how many autonomous systems this prefix has traversed. The right-most AS is the originator of the prefix and the left-most is the neighboring AS that has announced the prefix. The middle autonomous systems in the AS_PATH are the intermediate autonomous systems that the prefix has traversed. Such an ordered AS_PATH is called an AS_SEQUENCE: the autonomous systems are listed in the order in which the prefix traversed them.
Another form of the AS_PATH attribute, AS_SET, arises when an AS aggregates routes learned from several autonomous systems and announces an aggregate whose AS_PATH lists all of those autonomous systems without preserving their order. For example, suppose that AS 110 aggregates 131.108.1.0/24 and 131.108.2.0/24 into the covering prefix 131.108.0.0/22 and advertises the aggregate to AS 111. The /24s were learned from AS 109 and AS 108. AS 110 might choose to configure AS_PATH as AS_SET. Such an AS_PATH, as received in AS 111, might look like this:
110 {109,108}
The order of AS 108 and AS 109 is not preserved. AS_SEQUENCE is the default behavior of BGP, whereas AS_SET is a configurable option in Cisco IOS Software.
NEXT_HOP Attribute
The IP address of the border router should be used as a next hop to reach prefixes propagated by that border router. This could be an IP address that belongs in the same AS or it could be an external IP address that shares the same subnet as that on a border router. NEXT_HOP is typically learned through an Interior Gateway Protocol (IGP), such as OSPF or IS-IS, and the cost to reach the NEXT_HOP often plays an important role in calculating the best path.
Referring back to Figure 14-16, in AS 110, the NEXT_HOP for 131.108.1.0/24 is the IP address of the R1 subnet that connects to R2. The NEXT_HOP attribute is not changed throughout the AS. Cisco IOS allows the user to change the NEXT_HOP to be the IP address of an internal border router instead of an external address, such as Loopback of R2. This is done by using the neighbor IBGP-Neighbor-IP-address next-hop-self command in BGP. By changing NEXT_HOP from an external address to the loopback, routers carry one less subnet in the routing table. Because loopback IP addresses are carried in IGP, no additional work is needed to propagate the NEXT_HOP.
ORIGIN Attribute
The originator of the BGP update generates the ORIGIN attribute and defines how the original path was originated. Each prefix has an ORIGIN attribute. Routers, which receive updates with the ORIGIN attribute, should forward the ORIGIN attribute to all BGP neighbors unchanged. Table 14-1 describes the meaning of the different ORIGIN attribute codes and explains how these prefixes were originated.
WEIGHT: Cisco Systems, Inc. Proprietary Attribute
WEIGHT is a 4-byte integer number and is not a BGP attribute because it is not defined in RFC 1771. It is a Cisco Systems, Inc. proprietary attribute that has priority over all other BGP attributes when doing the best-path calculation in Cisco IOS Software. WEIGHT is not shared with any BGP neighbor because it has local significance in the router and because the neighboring router might not understand Cisco’s proprietary attribute.
Because WEIGHT has only local significance in the router, it does not affect the BGP policy of neighboring routers. This differs from LOCAL_PREF and MED, which are shared among other routers in the AS, so the whole AS is affected when those attributes are used.
Figure 14-17 explains the use of WEIGHT. AS 109 has three exit points and connects to three different ISPs. AS 109 has low-bandwidth links in its core, so AS 109 would like to have a BGP policy that makes minimal use of the core. This can happen if each exit router chooses all the prefixes from its corresponding connected ISP as the best route. In Cisco IOS Software, if R1, R2, and R3 assign WEIGHT to all the prefixes learned from ISP1, ISP2, and ISP3, respectively, R1, R2, and R3 choose ISP1, ISP2, and ISP3, respectively, for all Internet routes. Traffic originated from the source connected to R1 always exits through ISP1, as shown in Figure 14-17. This way, the core of AS 109 never carries any Internet traffic unless a BGP session with one of the ISPs fails.
Such BGP policy is also called hot potato routing, as defined in the introductory portion of the chapter.
Referring back to the network topology in Figure 14-13, AS 110 decided on a BGP policy stating that R3 should use R3–R1 link to reach 131.108.1.0/24 advertised by R1. Any policy change in R2 (LOCAL_PREF and so forth) should not affect R3 policy. The easiest way to accomplish this is to assign WEIGHT in R3 on the prefix 131.108.1.0/24 received from R1.
Example 14-9 shows the configuration needed in R3 to assign WEIGHT.
Example 14-9 Sample Configuration of R3 to Assign WEIGHT
R3#
router bgp 110
 no synchronization
 neighbor 1.1.2.1 remote-as 109
 neighbor 1.1.2.1 route-map SET_WEIGHT in
 neighbor 1.1.8.2 remote-as 110
route-map SET_WEIGHT permit 10
 match ip address 1
 set weight 2000
!
route-map SET_WEIGHT permit 20
 match ip address 2
access-list 1 permit 131.108.1.0
access-list 2 permit any
1.1.2.1 is the IP address of R1 in AS 109. SET_WEIGHT is the route map that is applied inbound on the EBGP session with R1. The route map usage is defined in detail in Chapter 1. The first sequence of the route map (sequence 10) has a match statement against access-list 1, which permits 131.108.1.0/24. The set command configures the WEIGHT to 2000 for prefixes that pass access-list 1. The second sequence of the route map is necessary to allow all other prefixes from this neighbor without changing the WEIGHT.
Example 14-10 shows the output from R3 after setting the WEIGHT.
Example 14-10 WEIGHT Assignment Shown in BGP Output
R3#show ip bgp 131.108.1.0
BGP routing table entry for 131.108.1.0/24, version 1
Paths: (2 available, best #1)
  Advertised to non peer-group peers:
  1.1.8.2
  109
    1.1.2.1 from 1.1.2.1 (10.1.1.1)
      Origin IGP, metric 20, localpref 100, weight 2000, valid, external, best
  109
    1.1.7.1 from 1.1.8.2 (10.0.0.5)
      Origin IGP, metric 10, localpref 100, valid, internal
R3 has two paths for 131.108.1.0/24, one from R1 and the other from R2. Even though the path from R2 has a MED of 10, R3 prefers the path from R1 because of the WEIGHT assignment. In best-path calculation, WEIGHT has the highest priority over all other attributes.
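The precedence of WEIGHT over the other attributes can be illustrated with a deliberately simplified comparison in Python. This sketch assumes only four criteria (WEIGHT, LOCAL_PREF, AS_PATH length, MED) and skips the remaining best-path steps; the Path class and the preference_key and best_path helpers are hypothetical names, and the values are taken from Example 14-10.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Path:
    learned_from: str
    as_path: Tuple[int, ...]
    weight: int = 0
    local_pref: int = 100
    med: int = 0


def preference_key(p: Path) -> Tuple[int, int, int, int]:
    """Smaller key wins: higher WEIGHT, higher LOCAL_PREF, shorter AS_PATH, lower MED."""
    return (-p.weight, -p.local_pref, len(p.as_path), p.med)


def best_path(paths: Iterable[Path]) -> Path:
    return min(paths, key=preference_key)


# Paths R3 holds for 131.108.1.0/24 in Example 14-10.
from_r1 = Path("R1", (109,), weight=2000, med=20)
from_r2 = Path("R2", (109,), med=10)
with_weight = best_path([from_r1, from_r2])

# The same two paths if R3 had not assigned any WEIGHT.
from_r1_no_weight = Path("R1", (109,), med=20)
without_weight = best_path([from_r1_no_weight, from_r2])

print(with_weight.learned_from, without_weight.learned_from)
```

In this simplified model, removing the WEIGHT lets the lower MED on the path from R2 decide the outcome, which is the behavior the WEIGHT assignment overrides.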
With WEIGHT, R3 uses the R1–R3 link for traffic destined to 131.108.1.0/24 from R3. The rest of AS 110 follows BGP policy defined in other routers to determine the path to reach 131.108.1.0/24.
Reading BGP Attributes from Cisco IOS Software Output
This section demonstrates how BGP attributes are read from the outputs of show commands in the Cisco IOS Software.
Example 14-11 shows the sample output of a BGP prefix received from an EBGP peer. Example 14-11 is from route-server.cerf.net.
Example 14-11 BGP Update from an EBGP Peer
show ip bgp 3.0.0.0
  1740 701 80
    192.41.177.69 from 192.41.177.69 (134.24.127.131)
      Origin IGP, metric 20, localpref 100, valid, external, best
Table 14-2 lists the BGP attributes shown in Example 14-11.
AS_PATH, [1740 701 80], means that prefix 3.0.0.0 was advertised by AS 80. Then it was advertised to AS 701, and, from 701, it came to 1740 and was given to the AS where this output is displayed. AS_PATH shows the AS this prefix has traversed.
LOCAL_PREF 100 means that no LOCAL_PREFERENCE was sent in the update or LOCAL_PREF is set to 100 on this router. Cisco IOS Software uses a predefined LOCAL_PREF of 100 for a missing LOCAL_PREF.
A MED of 20 means that neighbor 192.41.177.69 configured its BGP policy to send the MED of 20.
Example 14-12 shows sample output of a BGP prefix received from an IBGP peer. Example 14-12 is from MAE-West Looking Glass of InterMedia.
Example 14-12 BGP Update from an IBGP Peer
show ip bgp 198.133.219.0
  1 109
    4.24.7.77 (metric 90200) from 165.117.1.127 (165.117.1.127)
      Origin IGP, metric 40, localpref 100, valid, internal, best
      Community: 1:1000 2548:183 2548:234 2548:666 3706:153
Table 14-3 lists the BGP attributes shown in Example 14-12.
AS_PATH [1 109] means that prefix 198.133.219.0 was advertised by AS 109. Then it was advertised to AS 1 and was given to the AS where this output is displayed. AS_PATH shows the autonomous systems this prefix has traversed.
LOCAL_PREF 100 means that no LOCAL_PREFERENCE was sent in the UPDATE. Cisco IOS uses a predefined LOCAL_PREF of 100 for a missing LOCAL_PREF.
A MED of 40 means that either the IBGP neighbor 165.117.1.127 or the EBGP neighbor 4.24.7.77 of 165.117.1.127 configured its BGP policy to send the MED of 40. Later in this chapter, communities are discussed.
Use of Route Maps in Policy Control
Route maps are used extensively in BGP when it comes to managing policies.
The route map might contain match and set statements. The match statement matches a specific value, such as an IP prefix. The set statement changes BGP attributes. The route map feature is mainly used with one of the following statements: aggregate, neighbor, network, or redistribute. Example 14-13 demonstrates a sample route map.
Example 14-13 Sample Route Map Used for Policy Control
router bgp 2
 neighbor A remote-as 1
 neighbor A route-map test out
route-map test permit 10
 match ip address 1
 set metric 20
!
route-map test permit 20
 match ip address 2
access-list 1 permit 131.108.1.0
access-list 2 permit any
The match ip address 1 statement examines access list 1, which allows only the prefix 131.108.1.0/24 to go to the set metric 20 command. The remaining prefixes go through without any additional modification in the route-map test permit 20 statement, which examines access-list 2, which permits all prefixes and sets no attribute.
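The way a prefix walks through the sequences of a route map can be modeled with a small Python sketch. It assumes that sequences are tried in ascending order, that the first matching sequence decides the result, and that a prefix matching no sequence is denied. The data layout (ROUTE_MAP_TEST, the ANY marker) and the function apply_route_map are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

ANY = None  # stands in for "access-list 2 permit any"

# (sequence number, networks matched by the access list or ANY, attributes to set)
Entry = Tuple[int, Optional[Set[str]], Dict[str, int]]

ROUTE_MAP_TEST: List[Entry] = [
    (10, {"131.108.1.0"}, {"metric": 20}),  # match ip address 1, set metric 20
    (20, ANY, {}),                          # match ip address 2, nothing set
]


def apply_route_map(route_map: List[Entry], network: str,
                    attrs: Dict[str, int]) -> Optional[Dict[str, int]]:
    """Return the updated attributes, or None if no sequence permits the prefix."""
    for _seq, networks, sets in sorted(route_map, key=lambda e: e[0]):
        if networks is ANY or network in networks:
            return {**attrs, **sets}
    return None  # implicit deny at the end of the route map


print(apply_route_map(ROUTE_MAP_TEST, "131.108.1.0", {"metric": 0}))
print(apply_route_map(ROUTE_MAP_TEST, "131.108.2.0", {"metric": 5}))
```

Dropping the second entry from ROUTE_MAP_TEST makes every prefix other than 131.108.1.0 return None, which is why Example 14-13 carries sequence 20.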
The following sections explain and demonstrate the most common match and set statements in route maps when used with BGP.
Using the match ip address Command in a Route Map
View the different options available for the match ip address command by entering the following:
match ip address ?
  1-199        IP access-list number
  1300-2699    IP access-list number (expanded range)
  WORD         IP access-list name
  prefix-list  Match entries of prefix-lists
Example 14-14 demonstrates applying the match ip address statement to a route map.
Example 14-14 Using the match ip address Statement in a Route Map
route-map test permit 10
 match ip address 1
access-list 1 permit 131.108.1.0
Using the match community Command in a Route Map
When prefixes contain communities, you should use a route map with the match community command to examine the prefix that has the communities configured.
View the different options available for the match community command by entering the following:
match community ?
  1-99         Community-list number (standard)
  100-199      Community-list number (extended)
  exact-match  Do exact matching of communities
Example 14-15 demonstrates applying the match community statement to a route map.
Example 14-15 Using the match community Statement in a Route Map
route-map test permit 10
 match community 1
ip community-list 1 permit 1:1
match community 1 means that it will examine community-list filter 1, which permits prefixes that have community 1:1 configured. Later, this chapter describes communities in depth.
Using the match as-path Command in a Route Map
The AS_PATH attribute in the BGP table is viewed as a text string; therefore, UNIX-like regular expressions can examine the beginning, end, or middle content of the string. AS_PATH filters are commonly used on Internet routers running BGP. Instead of listing each prefix in an access list, you can configure Cisco IOS Software to match against all the prefixes that came in from AS 109. Similarly, AS_PATH filter can be configured to pass only prefixes that have an AS_PATH equal to 109.
View the different options available for the match as-path command by entering the following:
match as-path ?
  1-199  AS path access-list
Example 14-16 demonstrates applying the match as-path statement to a route map.
Example 14-16 Using the match as-path Statement in a Route Map
route-map test permit 10
 match as-path 1
ip as-path access-list 1 permit ^109$
Here, as-path access-list 1 permits all prefixes that have the AS_PATH field equal to 109. All other prefixes are denied.
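Because the AS_PATH is treated as a text string, the anchoring behavior of ^109$ can be tried out with Python's re module. This is only an approximation: Cisco IOS regular expressions have their own special characters (such as the underscore), which Python does not share, so the sketch uses only ^, $, and an explicit space. The helper as_path_matches and the second pattern, VIA_AS_109, are hypothetical illustrations.

```python
import re
from typing import Sequence


def as_path_matches(pattern: str, as_path: Sequence[int]) -> bool:
    """Render the AS_PATH as space-separated text and search it with the pattern."""
    text = " ".join(str(asn) for asn in as_path)
    return re.search(pattern, text) is not None


ONLY_AS_109 = r"^109$"     # AS_PATH is exactly 109
VIA_AS_109 = r"^109( |$)"  # assumption: any path whose left-most AS is 109

for path in ([109], [109, 110], [1109], [110, 109]):
    print(path, as_path_matches(ONLY_AS_109, path), as_path_matches(VIA_AS_109, path))
```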
Usage of regular expression is powerful and complex. You are encouraged to read the Cisco IOS Software documentation before using regular expressions in your access lists. Table 14-4 explains some commonly used regular expressions and their usage.
Earlier, set commands were used to manipulate BGP attributes. This section shows a few more examples of the use of the set command. set commands may or may not be used with a match statement in the route map. When set is used with match statements, the set commands are applied only to prefixes that pass the match statement. A set command without a match is applied unconditionally to all prefixes.
Using the set as-path prepend Command in a Route Map
set as-path prepend is used to lengthen the AS_PATH attribute. This command prepends the AS number(s) listed in the set command. Usage of AS_PATH prepending is discussed in detail earlier in this chapter.
View the different options available for the set as-path prepend command by entering the following:
set as-path prepend ?
  1-65535  AS number
Using the set community Command in a Route Map
Communities are assigned to prefixes using the set community command in the route map. A match statement before set community assigns communities only to prefixes that pass the match. Without the match statement, all prefixes will be assigned with communities configured.
View the different options available for the set community command by entering the following:
set community ?
  1-4294967295  community number
  aa:nn         community number in aa:nn format
  additive      Add to the existing community
  local-AS      Do not send outside local AS (well-known community)
  no-advertise  Do not advertise to any peer (well-known community)
  no-export     Do not export to next AS (well-known community)
  none          No community attribute
Using the set local-preference Command in a Route Map
View the different options available for the set local-preference command by entering the following:
set local-preference ?
  0-4294967295  Preference value
Using the set metric Command in a Route Map
View the different options available for the set metric command by entering the following:
set metric ?
  +/-metric     Add or subtract metric
  0-4294967295  Metric value or Bandwidth in Kbits per second
Using the set weight Command in a Route Map
View the different options available for the set weight command by entering the following:
set weight ?
  0-4294967295  Weight value
Policy Control Using filter-list, distribute-list, prefix-list, Communities, and Outbound Route Filtering (ORF)
Cisco IOS Software offers powerful configuration tools for managing advertising and receiving prefixes in BGP. Network operators running BGP must have configuration options to filter what comes in and what goes out in BGP updates from their network. The following sections discuss the capabilities offered by Cisco IOS Software to control BGP prefixes in a scalable manner by using filter lists, distribute lists, prefix lists, communities, and ORF.
Use of Filter Lists in Policy Control
Filter lists permit or deny BGP updates based on the AS_PATH attribute. Filter lists are used per the neighbor statement inbound, outbound, or both. Example 14-17 demonstrates configuring a filter list for policy control.
Example 14-17 Configuring a Filter List
router bgp 110
 neighbor 131.108.10.1 remote-as 109
 neighbor 131.108.10.1 filter-list 1 in
ip as-path access-list 1 permit ^109$
The ip as-path statement uses UNIX-like regular expressions, and it is examined against the AS_PATH attribute in the BGP update.
In this example, all incoming updates from neighbor 131.108.10.1 are examined against as-path 1, which is configured to permit updates with the AS_PATH attribute equal to 109.
AS_PATH filters are scalable because, for example, ^109$ covers every prefix whose AS_PATH is exactly 109 and avoids an otherwise lengthy access list, which would involve listing all those prefixes.
Use of Distribute Lists in Policy Control
Like filter lists, distribute lists are used per neighbor statement inbound, outbound, or both. They operate on IP access lists. In distribute lists, prefixes are permitted or denied based on the networks listed in the access list.
Example 14-18 makes use of standard IP access list 1 used with a distribute list. In standard access lists, a router makes the filtering decision based on the prefix network, not based on its mask. Extended access lists enable not only network filtering, but also filtering based on the mask of the prefix.
Example 14-18 Using Distribute Lists in a Standard IP Access List
R2#
router bgp 110
 neighbor 131.108.10.1 remote-as 109
 neighbor 131.108.10.1 distribute-list 1 in
access-list 1 permit 131.108.1.0 0.0.0.255
Example 14-18 uses standard IP access-list 1 with a distribute list configured for neighbor 131.108.10.1 inbound. All prefixes that this neighbor advertises to R2 are checked against access-list 1, which permits network 131.108.1.0. This network could have a mask of /24, /25, and so on because the standard access list offers no checking for a mask.
Example 14-19 makes use of extended IP access lists.
Example 14-19 Using Distribute Lists in an Extended IP Access List
router bgp 110
 neighbor 131.108.10.1 remote-as 109
 neighbor 131.108.10.1 distribute-list 101 in
access-list 101 permit ip 131.108.1.0 0.0.0.0 255.255.255.0 0.0.0.0
In standard access lists (1 to 99), the wildcard mask can be applied only on the prefix, not on its mask, whereas, in extended access lists, the subnet mask of the BGP update also can be checked against the access list. When used in BGP for filtering as in this example, the syntax of extended access lists has a different meaning. Extended access lists, when used in interface packet filtering, have a source address portion and a destination address portion. When extended access lists are used with BGP distribute lists, the source address portion is the network number and the destination portion is the mask of that network.
Therefore, for access-list 101, when used in BGP, the distribute list can also be read as follows:
permit IP Prefix wild-card-for-prefix Mask_of the_prefix wild-card-for-mask
access-list 101 in Example 14-19 permits 131.108.1.0 only with the mask of 255.255.255.0. Refer to Chapter 1 or the Cisco IOS Software documentation for a more in-depth explanation of standard and extended access lists.
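The difference between the standard and the extended access list in a distribute list can be checked with a short Python sketch that applies the wildcard logic described above. It is an illustrative model only; the helpers wildcard_match, standard_acl_permits, and extended_acl_permits are hypothetical names, and the two entries mirror access-list 1 from Example 14-18 and access-list 101 from Example 14-19.

```python
import ipaddress
from typing import Tuple


def _to_int(addr: str) -> int:
    return int(ipaddress.IPv4Address(addr))


def wildcard_match(value: str, pattern: str, wildcard: str) -> bool:
    """Bits set in the wildcard are ignored; all other bits must be equal."""
    care = ~_to_int(wildcard) & 0xFFFFFFFF
    return (_to_int(value) & care) == (_to_int(pattern) & care)


def standard_acl_permits(prefix: str, pattern: str, wildcard: str) -> bool:
    """Standard ACL: only the network number of the prefix is checked."""
    net = ipaddress.IPv4Network(prefix)
    return wildcard_match(str(net.network_address), pattern, wildcard)


def extended_acl_permits(prefix: str, entry: Tuple[str, str, str, str]) -> bool:
    """Extended ACL in BGP: (prefix, prefix wildcard, mask, mask wildcard)."""
    net = ipaddress.IPv4Network(prefix)
    pfx, pfx_wild, mask, mask_wild = entry
    return (wildcard_match(str(net.network_address), pfx, pfx_wild)
            and wildcard_match(str(net.netmask), mask, mask_wild))


ACL_1 = ("131.108.1.0", "0.0.0.255")
ACL_101 = ("131.108.1.0", "0.0.0.0", "255.255.255.0", "0.0.0.0")

for prefix in ("131.108.1.0/24", "131.108.1.0/25", "131.108.2.0/24"):
    print(prefix, standard_acl_permits(prefix, *ACL_1),
          extended_acl_permits(prefix, ACL_101))
```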
Use of Prefix Lists in Policy Control
Prefix lists replace distribute lists because they offer user-friendly configuration options when filtering based on IP prefixes. Instead of writing difficult prefix wildcard and mask wildcard combinations in an extended access list applied in a distribute list, prefix lists use a configuration that is easy to read and comprehend. Example 14-20 shows a sample configuration substitution for the configuration in Example 14-19, but it uses a prefix list instead of a distribute list.
Example 14-19 had access-list 101 that permitted 131.108.1.0/24 only. Example 14-20 uses prefix-list to achieve that.
Example 14-20 Sample Configuration to Show How Prefix Lists Work
R2#
router bgp 110
 neighbor 131.108.10.1 remote-as 109
 neighbor 131.108.10.1 prefix-list FILTER1 in
ip prefix-list FILTER1 seq 1 permit 131.108.1.0/24
NOTE
Prefix-list also has an implicit deny at the end, like distribute-list and AS_PATH list.
In Example 14-20, FILTER1 is the name of the prefix list that is applied on the neighbor 131.108.10.1 on all the incoming BGP updates. prefix-list FILTER1 operation will be done in sequential ascending order; the smallest sequence number will be examined first. seq 1 permits 131.108.1.0/24. The prefix list certainly offers a simpler, yet powerful, method to achieve what distribute lists once offered.
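The sequential evaluation and the implicit deny can be modeled in Python as follows. This is a sketch under stated assumptions, not the IOS implementation: an entry without ge or le is assumed to match only the exact prefix length, an entry with le is assumed to match lengths from the entry's own length up to le, and the names Entry, entry_matches, and prefix_list_permits are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    seq: int
    permit: bool
    prefix: str
    ge: Optional[int] = None
    le: Optional[int] = None


def entry_matches(entry: Entry, route: str) -> bool:
    entry_net = ipaddress.IPv4Network(entry.prefix)
    route_net = ipaddress.IPv4Network(route)
    if entry.ge is None and entry.le is None:
        low = high = entry_net.prefixlen  # exact length only
    else:
        low = entry.ge if entry.ge is not None else entry_net.prefixlen
        high = entry.le if entry.le is not None else 32
    return low <= route_net.prefixlen <= high and route_net.subnet_of(entry_net)


def prefix_list_permits(entries: List[Entry], route: str) -> bool:
    """Lowest sequence number first; the first matching entry decides."""
    for entry in sorted(entries, key=lambda e: e.seq):
        if entry_matches(entry, route):
            return entry.permit
    return False  # implicit deny at the end of the list


FILTER1 = [Entry(1, True, "131.108.1.0/24")]

for route in ("131.108.1.0/24", "131.108.1.0/25", "131.108.2.0/24"):
    print(route, prefix_list_permits(FILTER1, route))
```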
Use of Communities in Policy Control
RFC 1997 defines the BGP COMMUNITIES attribute, which describes a community as “a group of destinations which share some common property.” A community is a 32-bit number that is assigned to a prefix by configuration and that is propagated to all neighbors in a BGP update. A prefix can be assigned with multiple communities, with a maximum of 255 different communities per prefix. BGP operators can group networks into communities. For example, all networks in the east region of an internetwork are assigned a community, and the networks in the west region of an internetwork are assigned a different community. Thus, community numbers act as a tag for each prefix. By looking at the community in BGP output, it becomes easy to distinguish east region prefixes from west region prefixes.
Communities can be represented in two ways in Cisco IOS Software. The conventional style is a plain 32-bit number; the newer style uses the format AS:nn, where AS is the 2-byte autonomous system number and nn is a 2-byte number. The newer format is enabled with the ip bgp-community new-format global configuration command, as used in Example 14-21.
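The two representations are the same 32-bit value viewed differently: the upper 16 bits hold the AS number and the lower 16 bits hold nn. A minimal Python sketch of the conversion follows; the function names to_new_format and from_new_format are hypothetical.

```python
def to_new_format(value: int) -> str:
    """Render a 32-bit community number as AS:nn."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("community must fit in 32 bits")
    return f"{value >> 16}:{value & 0xFFFF}"


def from_new_format(text: str) -> int:
    """Convert AS:nn back to the plain 32-bit number."""
    asn, nn = (int(part) for part in text.split(":"))
    if not (0 <= asn <= 0xFFFF and 0 <= nn <= 0xFFFF):
        raise ValueError("AS and nn must each fit in 2 bytes")
    return (asn << 16) | nn


print(from_new_format("109:1"))  # 109 * 65536 + 1
print(to_new_format(7143426))
```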
Figure 14-18 shows how sets of prefixes are grouped in certain communities. BGP attribute manipulation is often done on a per-community basis.
In Figure 14-18, AS 109 and 110 are configured with EBGP. In AS 109, 131.108.1.0/24 and 131.108.2.0/24 belong to community 109:1, and 131.108.3.0/24 and 131.108.4.0/24 belong to community 109:2.
Example 14-21 shows a sample configuration in R1, which assigns communities. Assume that R1 can originate 131.108.1.0/24, 131.108.2.0/24, 131.108.3.0/24, and 131.108.4.0/24.
Example 14-21 Sample Configuration to Assign Communities per Prefix
R1#
router bgp 109
 network 131.108.1.0 mask 255.255.255.0
 network 131.108.2.0 mask 255.255.255.0
 network 131.108.3.0 mask 255.255.255.0
 network 131.108.4.0 mask 255.255.255.0
 neighbor 131.108.6.2 remote-as 110
 neighbor 131.108.6.2 send-community
 neighbor 131.108.6.2 route-map SET_COMMUNITY out
ip bgp-community new-format
access-list 1 permit 131.108.2.0
access-list 1 permit 131.108.1.0
access-list 2 permit 131.108.4.0
access-list 2 permit 131.108.3.0
route-map SET_COMMUNITY permit 10
 match ip address 1
 set community 109:1
!
route-map SET_COMMUNITY permit 20
 match ip address 2
 set community 109:2
R1 has configured route-map SET_COMMUNITY and applied it to neighbor 131.108.6.2 for all the outbound BGP updates that R1 advertises to R2. route-map SET_COMMUNITY has a match clause in each sequence that matches against the access list. If the prefix is permitted in the access list, it is assigned a community based on which access list it is permitted by. In Example 14-21, 131.108.2.0/24 is permitted by access-list 1, so it will be assigned community 109:1. Similarly, 131.108.4.0/24 gets community 109:2.
R2 receives these updates with communities and might be configured to assign LOCAL_PREF based on the communities. R2 might assign LOCAL_PREF based on individual prefix, but it would be difficult to manage if the number of prefixes grows to several thousand.
Example 14-22 shows the sample configuration of R2, which assigns a LOCAL_PREF value of 200 for prefixes that belong to community 109:1, and assigns a LOCAL_PREF value of 50 for prefixes that belong to community 109:2.
Example 14-22 Sample Configuration to Show Community Filter Usage in Configuring BGP Policy
R2#
router bgp 110
 neighbor 131.108.6.1 remote-as 109
 neighbor 131.108.6.1 route-map SET_LOCAL_PREF in
 neighbor 131.108.6.1 send-community
ip bgp-community new-format
ip community-list 1 permit 109:1
ip community-list 2 permit 109:2
route-map SET_LOCAL_PREF permit 10
 match community 1
 set local-preference 200
!
route-map SET_LOCAL_PREF permit 20
 match community 2
 set local-preference 50
Route maps are configured to match against community list filters 1 and 2 that look for these communities in BGP updates. If the community is found in the update, the set operation is performed.
Example 14-23 shows the BGP table for R2. All prefixes that belong to community 109:1 are assigned a LOCAL_PREF value of 200. Prefixes with community 109:2 are assigned a LOCAL_PREF value of 50.
Example 14-23 Router R2 BGP Table Reflects LOCAL_PREF Assignment Along with Communities
R2# show ip bgp 131.108.1.0
BGP routing table entry for 131.108.1.0/24, version 4
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Not advertised to any peer
  109
    131.108.6.1 from 131.108.6.1 (131.108.10.1)
      Origin IGP, metric 0, localpref 200, valid, external, best
      Community: 109:1
R2# show ip bgp 131.108.3.0
BGP routing table entry for 131.108.3.0/24, version 2
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Not advertised to any peer
  109
    131.108.6.1 from 131.108.6.1 (131.108.10.1)
      Origin IGP, metric 0, localpref 50, valid, external, best
      Community: 109:2
The LOCAL_PREF assignment and community are listed for each prefix. Community usage gives a scalable way to manage BGP prefixes in a large BGP network.
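The reason community matching scales is that the policy is keyed on the tag rather than on the prefix. The following Python sketch mirrors the policy of Example 14-22 under that reading; the structures COMMUNITY_LISTS and SET_LOCAL_PREF and the function local_pref_for are hypothetical, and a prefix carrying neither community is assumed to be denied by the route map.

```python
from typing import Dict, List, Optional, Set, Tuple

COMMUNITY_LISTS: Dict[int, str] = {1: "109:1", 2: "109:2"}

# (sequence number, community-list number, LOCAL_PREF to set)
SET_LOCAL_PREF: List[Tuple[int, int, int]] = [(10, 1, 200), (20, 2, 50)]


def local_pref_for(communities: Set[str]) -> Optional[int]:
    """LOCAL_PREF chosen by the first matching sequence, or None if denied."""
    for _seq, list_number, pref in sorted(SET_LOCAL_PREF):
        if COMMUNITY_LISTS[list_number] in communities:
            return pref
    return None


# Updates as R2 would receive them from R1: prefix -> attached communities.
received = {
    "131.108.1.0/24": {"109:1"},
    "131.108.2.0/24": {"109:1"},
    "131.108.3.0/24": {"109:2"},
    "131.108.4.0/24": {"109:2"},
}
assigned = {prefix: local_pref_for(tags) for prefix, tags in received.items()}
print(assigned)
```

Adding more prefixes to the received table requires no change to the two policy entries, as long as the prefixes arrive tagged with one of the two communities.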
BGP policies can be configured based on a single community number that might represent thousands of prefixes. For example, Router R1 in the east region of a large network wants to assign LOCAL_PREF of 200 to all prefixes of west region routes. If west region routers assign a certain community number to all their prefixes when advertising to the east region, Router R1 can simply assign LOCAL_PREF in a route map by matching against the community that the west region has set. Router R1 could have made a huge access list to permit each prefix and accomplish the same result, but, using community matching, R1 accomplishes it in a scalable fashion.
Not only are communities used in BGP policy control, but they also aid in troubleshooting BGP network problems. Customer BGP prefixes can be assigned with distinct and different communities from peering ISP prefixes. If a customer prefix is having an issue, just looking at the distinct community can isolate customer prefixes, and troubleshooting can be done faster. Such benefits make community usage common in BGP networks.
Use of Outbound Route Filtering in Policy Control
The document draft-chen-bgp-prefix-orf-00.txt defines the functionality of exchanging prefix list-based outbound route filter (ORF) capability. When configured with ORF, one router pushes its inbound prefix list to ORF-capable BGP neighbors. Upon receipt, the pushed prefix list is automatically configured as the outbound prefix list.
Typically, when routers must deny certain prefixes from other routers, they use filter lists, distribute lists, prefix lists, and so on as inbound filters. The receiver denies the prefix after the sender has sent that prefix. ORF offers a dynamic way in which the receiver advertises its inbound filter to the sender; the sender then installs that filter on its outbound neighbor relationship to the receiver. When a neighbor relationship is formed between two routers, they exchange ORF capability that verifies whether both routers are ORF-capable. Only when both agree can the ORF feature be used.
Figure 14-19 shows how BGP speakers make use of ORF. The bold numbers indicate the sequence of events in ORF, defined in the list following the figure.
- R2 in AS 110 is advertising prefixes 131.108.2.0/24 and 131.108.3.0/24 to R1 in AS 109.
- R1’s goal is to deny 131.108.2.0/24 and accept everything else that comes from R2. This is done through a prefix list named ABC, as shown in the Example 14-24 configuration.
- R1 advertises its inbound prefix list ABC to R2 through the ORF mechanism.
- R2 installs prefix list ABC as an outbound filter for neighbor R1.
In Figure 14-19, R2 is originating 131.108.2.0/24 and 131.108.3.0/24.
R1 wants to deny 131.108.2.0/24 and receive everything else. Without ORF, R1 must configure an inbound prefix list that denies 131.108.2.0/24, which means the update is received first and filtered afterward. This wastes resources: CPU processing in R2 to advertise the prefixes, link bandwidth to carry updates that will be dropped at R1, and CPU processing in R1 to filter those updates. With ORF, R1 sends its inbound prefix list to the neighbor. After receiving this prefix list, the neighbor applies it as an outbound prefix list. All updates to R1 must then pass that prefix list before they are sent, which saves the extra computation at the receiver.
Example 14-24 shows how R1 can send its inbound prefix list ABC to R2.
Example 14-24 Sample Configuration to Show How ORF Can Be Used
R1:
router bgp 109
 bgp log-neighbor-changes
 neighbor 131.108.6.2 remote-as 110
 neighbor 131.108.6.2 ebgp-multihop 2
 neighbor 131.108.6.2 capability orf prefix-list both
 neighbor 131.108.6.2 prefix-list ABC in
!
ip prefix-list ABC seq 5 deny 131.108.2.0/24
ip prefix-list ABC seq 10 permit 0.0.0.0/0 le 32

R1#clear ip bgp 131.108.6.2 in prefix-filter
R1#show ip bgp
BGP table version is 2, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 131.108.3.0/24   131.108.6.2              0             0 110 i
The neighbor 131.108.6.2 capability orf prefix-list both command enables ORF capability in R1 for this BGP neighbor. The both keyword indicates that R1 is willing to both send a prefix list to and accept a prefix list from neighbor R2.
The neighbor 131.108.6.2 prefix-list ABC in command means that R1 has configured an inbound prefix list ABC for R2, which simply denies 131.108.2.0/24 and permits all other prefixes.
The clear ip bgp 131.108.6.2 in prefix-filter command on R1 pushes R1's inbound prefix list filter to R2.
The show ip bgp command shows that R1 is receiving only 131.108.3.0/24 because R2 has accepted prefix list ABC, which denies 131.108.2.0/24.
Example 14-25 shows the necessary configuration of R2 to accept the ORF from R1 configured in Example 14-24.
Example 14-25 Sample Configuration of R2 to Accept ORF from R1
R2:
router bgp 110
 network 131.108.2.0 mask 255.255.255.0
 network 131.108.3.0 mask 255.255.255.0
 neighbor 131.108.6.1 remote-as 109
 neighbor 131.108.6.1 ebgp-multihop 2
 neighbor 131.108.6.1 capability orf prefix-list both

R2#show ip bgp
BGP table version is 3, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 131.108.2.0/24   0.0.0.0                  0         32768
*> 131.108.3.0/24   0.0.0.0                  0         32768 i

R2#show ip bgp neighbors 131.108.6.1 received prefix-filter
Address family: IPv4 Unicast
ip prefix-list 131.108.6.1: 2 entries
   seq 5 deny 131.108.2.0/24
   seq 10 permit 0.0.0.0/0 le 32
The neighbor 131.108.6.1 capability orf prefix-list both command enables ORF capability in R2 for this BGP neighbor. The both keyword indicates that R2 is willing to both send a prefix list to and accept a prefix list from neighbor R1.
The two show commands in R2 indicate that R2 is advertising the two prefixes and R2 has received the prefix list from R1 that denies 131.108.2.0/24 and permits everything else. When R2 accepts the prefix-list ABC, it installs it as an outbound prefix-list, resulting in denial of 131.108.2.0/24 and permitting 131.108.3.0/24.
R2 has the option to override the received prefix filter with its own.
ORF is a powerful mechanism to install inbound prefix lists on the remote end, thus avoiding unnecessary routing updates on the link and saving receiver CPU time to process those updates and deny them.
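As a rough illustration of what R2 does after it installs the received filter, the following Python sketch evaluates prefix list ABC against the prefixes R2 originates. It models only the two entry types used in Example 14-24 (an exact match and a match with le); the data layout and function names are assumptions made for this sketch, not Cisco IOS internals.

```python
import ipaddress

# (action, prefix, le) mirrors the two entries of prefix list ABC.
PREFIX_LIST_ABC = [
    ("deny", "131.108.2.0/24", None),
    ("permit", "0.0.0.0/0", 32),
]


def permitted(prefix, prefix_list):
    """Return True if the first matching entry permits the prefix."""
    net = ipaddress.ip_network(prefix)
    for action, entry, le in prefix_list:
        entry_net = ipaddress.ip_network(entry)
        # Without le, the prefix length must match the entry exactly.
        max_len = le if le is not None else entry_net.prefixlen
        if (net.subnet_of(entry_net)
                and entry_net.prefixlen <= net.prefixlen <= max_len):
            return action == "permit"
    return False  # implicit deny when nothing matches


def advertise(originated, outbound_filter):
    """Prefixes that survive the outbound filter and are actually sent."""
    return [p for p in originated if permitted(p, outbound_filter)]


sent_to_r1 = advertise(["131.108.2.0/24", "131.108.3.0/24"], PREFIX_LIST_ABC)
print(sent_to_r1)
```

Because the filter runs on the sender, the denied prefix never crosses the link, which is the saving that ORF is meant to provide.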
Route Dampening
Route dampening is a feature that reduces the propagation of flapping routes in the Internet. Route flapping occurs when IP routes are repeatedly removed from and put back into a routing table. This can be caused by physical layer failure, routing protocol failure, router node failure, and so on. When these flaps are announced through BGP to the Internet, all Internet routers running BGP are affected because they have to remove and reinstall the flapping routes. In an unstable internal network where IP routes flap continuously, this instability is propagated through BGP throughout the Internet. Route dampening minimizes this instability by assigning a penalty to flapping routes. When the penalty reaches a predefined limit (the suppress limit), the route is removed from the routing table and is not advertised to the Internet. When the route stops flapping, the penalty decays exponentially. When the penalty has decayed to a predefined limit (the reuse limit), the route is installed again and propagated through BGP. Some of the rules and definitions regarding route dampening are as follows:
- Cisco IOS Software application— Route dampening applies to EBGP neighbors only.
- Flap penalty— Each flap receives a penalty of 1000. A penalty is assigned only when routes are withdrawn and not when they are re-advertised.
- Suppress limit— A route is suppressed and removed from the routing table if the penalty exceeds this limit. The default suppress limit is 2000.
- Half-life time— Every 5 seconds, the penalty is reduced exponentially such that after one half-life it has dropped to half of its value. The default half-life time is 15 minutes.
- Reuse limit— As the penalty decays exponentially, it eventually reaches the reuse limit, at which point the route is no longer suppressed and is installed and advertised to other BGP speakers. The default reuse limit is 750. When the penalty falls to half of the reuse limit, the dampening information is purged.
- History state— When a flap (withdrawal) occurs, a route is assigned a penalty of 1000. In this state, BGP does not have the route because it is withdrawn, but BGP maintains information about the route in history state to keep track of dampening.
- Damp state— With repeated flaps, where the penalty exceeds the suppress limit, the route is removed from the routing table and is not advertised to any BGP speaker.
- Maximum duration of dampening— The default is 4 times the half-life time (15 minutes), so with default settings a route can be dampened for at most 1 hour.
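The exponential decay in these definitions can be written down directly. The sketch below assumes a continuous decay, penalty(t) = penalty(0) x 2^(-t / half-life), whereas the text describes recalculation every 5 seconds, so the results are approximations. The function names are chosen for this illustration.

```python
import math


def decayed_penalty(penalty, elapsed_min, half_life_min=15.0):
    """Penalty remaining after elapsed_min minutes without further flaps."""
    return penalty * 2 ** (-elapsed_min / half_life_min)


def minutes_until_reuse(penalty, reuse_limit=750, half_life_min=15.0):
    """Time for the penalty to decay to the reuse limit (0 if already below)."""
    if penalty <= reuse_limit:
        return 0.0
    return half_life_min * math.log2(penalty / reuse_limit)


print(decayed_penalty(1000, 15))   # one half-life later
print(minutes_until_reuse(2802))   # penalty taken from Example 14-28
```

With the defaults, a penalty of 2802 needs roughly 28.5 minutes to decay to 750, which is in line with the "reuse in 00:28:30" value reported in Example 14-28.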
In Cisco IOS Software, dampening is configured as shown in Example 14-26.
Example 14-26 Configuration of Dampening in Cisco IOS Software
R3#
router bgp 110
 bgp dampening
With this configuration, the half-life, reuse limit, suppress limit, and maximum suppress time will get defaults of 15 min, 750, 2000, and 1 hour, respectively. These values can be changed with the configuration shown in Example 14-27.
Example 14-27 Configuration to Change Dampening Parameters
router bgp 110
 bgp dampening 1 400 2000 4
In Example 14-27, the half-life is 1 minute, the reuse limit is 400, the suppress limit is 2000, and the maximum suppress time is 4 minutes (4 times the half-life), so routes will be suppressed for a maximum of 4 minutes.
The example that follows demonstrates how dampening works and what sequence of events the Cisco router goes through when routes are flapped.
In a simple network, R1 and R3 are running EBGP. R1 is advertising 131.108.1.0/24 to R3. Example 14-28 shows the sequence of events when R3 has dampening enabled and R1 is flapping 131.108.1.0/24. R3 is running the following debugs to observe the sequence of events.
NOTE
Debugs should be run carefully as excessive debug output can influence router performance.
Example 14-28 Route Dampening Example
R3#
debug ip bgp updates 1
debug ip bgp dampening 1
access-list 1 permit 131.108.1.0 0.0.0.0
_____________________________________________________________________________________
! First Sequence: R1 has withdrawn 131.108.1.0/24
R3#
Jul 24 17:20:33.151 MDT: BGP: 1.1.2.1 rcv UPDATE about 131.108.1.0/24 withdrawn
.Jul 24 17:20:33.151 MDT: BGP: charge penalty for 131.108.1.0/24 path 109 with halflife-time 15 reuse/suppress 750/2000
.Jul 24 17:20:33.151 MDT: flapped 1 times since 00:00:00. New penalty is 1000
R3#show ip bgp 131.108.1.0
BGP routing table entry for 131.108.1.0/24, version 3
Paths: (1 available, no best path)
Flag: 0x88
  Not advertised to any peer
  109 (history entry)
    1.1.2.1 from 1.1.2.1 (10.1.1.1)
      Origin IGP, metric 20, localpref 100, external
      Dampinfo: penalty 1000, flapped 1 times in 00:00:04
_____________________________________________________________________________________
! Second Sequence: R1 announces 131.108.1.0/24 to R3.
R3#
Jul 24 17:21:01.214 MDT: BGP: 1.1.2.1 rcv UPDATE about 131.108.1.0/24
R3#show ip bgp 131.108.1.0
BGP routing table entry for 131.108.1.0/24, version 4
Paths: (1 available, best #1)
Flag: 0x88
  Not advertised to any peer
  109
    1.1.2.1 from 1.1.2.1 (10.1.1.1)
      Origin IGP, metric 20, localpref 100, valid, external, best
      Dampinfo: penalty 972, flapped 1 times in 00:00:39
_____________________________________________________________________________________
! Third Sequence: R1 has again withdrawn 131.108.1.0/24
R3#
Jul 24 17:21:31.882 MDT: BGP: 1.1.2.1 rcv UPDATE about 131.108.1.0/24 -- withdrawn
.Jul 24 17:21:31.882 MDT: BGP: charge penalty for 131.108.1.0/24 path 109 with halflife-time 15 reuse/suppress 750/2000
.Jul 24 17:21:31.882 MDT: flapped 2 times since 00:00:58. New penalty is 1960
R3#show ip bgp 131.108.1.0
BGP routing table entry for 131.108.1.0/24, version 5
Paths: (1 available, no best path)
Flag: 0x88
  Not advertised to any peer
  109 (history entry)
    1.1.2.1 from 1.1.2.1 (10.1.1.1)
      Origin IGP, metric 20, localpref 100, external
      Dampinfo: penalty 1937, flapped 2 times in 00:01:17
_____________________________________________________________________________________
! Fourth Sequence: R1 announces 131.108.1.0/24 to R3.
.Jul 24 17:22:13.706 MDT: BGP: 1.1.2.1 rcv UPDATE about 131.108.1.0/24
R3#show ip bgp 131.108.1.0
BGP routing table entry for 131.108.1.0/24, version 6
Paths: (1 available, best #1)
Flag: 0x88
  Not advertised to any peer
  109
    1.1.2.1 from 1.1.2.1 (10.1.1.1)
      Origin IGP, metric 20, localpref 100, valid, external, best
      Dampinfo: penalty 1891, flapped 2 times in 00:01:52
R3#show ip route 131.108.1.0
Routing entry for 131.108.1.0/24
  Known via "bgp 110", distance 20, metric 20
  Tag 109, type external
  Last update from 1.1.2.1 00:00:13 ago
  Routing Descriptor Blocks:
  * 1.1.2.1, from 1.1.2.1, 00:00:13 ago
      Route metric is 20, traffic share count is 1
      AS Hops 1, BGP network version 0
_____________________________________________________________________________________
! Fifth Sequence: R1 has again withdrawn 131.108.1.0/24
R3#
Jul 24 17:22:40.781 MDT: BGP: 1.1.2.1 rcv UPDATE about 131.108.1.0/24 withdrawn
Jul 24 17:22:40.781 MDT: BGP: charge penalty for 131.108.1.0/24 path 109 with halflife-time 15 reuse/suppress 750/2000
.Jul 24 17:22:40.781 MDT: flapped 3 times since 00:02:07. New penalty is 2869
R3#show ip bgp 131.108.1.0
BGP routing table entry for 131.108.1.0/24, version 7
Paths: (1 available, no best path)
  Not advertised to any peer
  109, (suppressed due to dampening)
    1.1.2.1 from 1.1.2.1 (10.1.1.1)
      Origin IGP, metric 20, localpref 100, valid, external
      Dampinfo: penalty 2802, flapped 3 times in 00:02:44, reuse in 00:28:30
R3#show ip route 131.108.1.0
% Network not in table
The significant events in the five sequences in Example 14-28 are as follows:
- In the debug output of the first sequence, a penalty of 1000 is applied for the withdrawn route. The BGP table shows that penalty along with other flap statistics.
- Notice in the BGP output of the second sequence that the penalty is gradually decaying (it is down to 972); the penalty is recalculated at 5-second intervals.
- In the debug output of the third sequence, a new penalty of 1000 is assigned for this flap. It is added to the old penalty, which had decayed to 960 by the time of the flap, for a total of 1960. By the time of the BGP output, the total has decayed further to 1937.
- In the fourth sequence, notice from the BGP and routing table output that 131.108.1.0/24 is still installed in the routing table because the penalty of 1891 is less than the suppress limit (2000).
- In the fifth sequence, with the third flap, the total penalty exceeds the suppress limit of 2000, and this route is now suppressed in the BGP table. When routes are suppressed, they are no longer installed in the routing table and are not advertised to other BGP neighbors. From the BGP output, it is evident that the route will remain suppressed for 28 minutes and 30 seconds, provided that no further flaps happen.
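The three withdrawals in Example 14-28 can be replayed with a small Python sketch. It assumes continuous exponential decay between flaps, so the computed penalties land close to, but not exactly on, the 1960 and 2869 reported by the debug output; the remaining gap is presumably due to the router decaying the penalty in discrete steps. The function and constant names are chosen for this illustration.

```python
HALF_LIFE_S = 15 * 60    # default half-life, in seconds
SUPPRESS_LIMIT = 2000    # default suppress limit
FLAP_PENALTY = 1000      # penalty charged per withdrawal


def replay_flaps(flap_times_s):
    """Return (time, penalty right after the flap, suppressed?) per withdrawal."""
    penalty, last, history = 0.0, None, []
    for t in flap_times_s:
        if last is not None:
            # Decay the accumulated penalty over the time since the last flap.
            penalty *= 2 ** (-(t - last) / HALF_LIFE_S)
        penalty += FLAP_PENALTY
        last = t
        history.append((t, round(penalty), penalty > SUPPRESS_LIMIT))
    return history


# Withdrawal timestamps from the debug output, in seconds after the first flap
# (17:20:33.151, 17:21:31.882, and 17:22:40.781).
history = replay_flaps([0.0, 58.731, 127.630])
for entry in history:
    print(entry)
```

Only the third withdrawal pushes the penalty past the suppress limit, which matches the point at which the route disappears from the routing table in the fifth sequence.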
Dampening is commonly used because it offers a dynamic way to penalize flapping and unstable routes.
Scaling IBGP in Large Networks—Route Reflectors and Confederations
It is commonly understood that IBGP neighbors must be fully meshed with each other. This section addresses why this is a requirement and how to avoid fully meshed IBGP.
It is important to understand two rules of prefix advertisement:
- When a prefix is received from an EBGP neighbor, the router must advertise that prefix to all other EBGP and IBGP neighbors.
- When a prefix is received from an IBGP neighbor, it can be advertised ONLY to EBGP neighbors, NOT to any other IBGP neighbors.
This second rule requires a fully meshed IBGP neighbor relationship; otherwise, prefixes are not advertised to all routers in a single AS.
IBGP full mesh can scale in networks where the number of routers running IBGP is small. However, in networks characteristic of a big ISP, in which the number of routers running IBGP might reach several hundred, maintaining n(n–1)/2 IBGP sessions (where n is the total number of IBGP routers in the AS, each configured with n–1 neighbors) and exchanging routes among all of them simply will not work. Figure 14-20 shows a fully meshed IBGP with only 12 routers running IBGP.
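To put numbers on the growth, each pair of routers in a full mesh shares one IBGP session, so n routers need n(n-1)/2 sessions. The short Python sketch below evaluates this for a 12-router mesh and for a 500-router mesh; the function name is chosen for this illustration.

```python
def full_mesh_sessions(n):
    """Number of IBGP sessions needed to fully mesh n routers."""
    return n * (n - 1) // 2


for routers in (12, 500):
    print(routers, "routers:", full_mesh_sessions(routers), "sessions")
```

Twelve routers already need 66 sessions, and 500 routers would need 124,750.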
Imagine the nightmare caused by replacing the 12-router full mesh with a 500-router full mesh of IBGP. This limitation of full-mesh IBGP was the catalyst for the development of two mechanisms that address this problem:
- Route Reflection, as described in RFC 1966
- AS Confederations, as described in RFC 3065
The sections that follow briefly describe both mechanisms. For more detailed coverage of these mechanisms, you are encouraged to read the RFCs.
Route Reflection
Instead of doing full-mesh IBGP between all routers, Route-Reflection design allows router networks to have a hierarchy. Networks are divided into regions, and each region can have a multiple-layer hierarchy of Core, Aggregation, and Access routers. IBGP routing updates are propagated between levels in both directions when running Route-Reflection.
Figure 14-21 replaces the fully meshed IBGP mess illustrated in Figure 14-20 by using Route-Reflection in an IBGP network. Each Access layer router connects to only regional Aggregation routers, and these Aggregation routers connect only to Core routers. The Core routers need to be fully meshed with each other. Multiple connections exist from each router for redundancy. Routers speak only IBGP with their upper-layer routers. For example, R1 peers only with R4 and R5, which peer only with R6 and R7. Core routers peer with each other and to all routers below them in the hierarchy. This way, the Core is connected to all regions.
The top level is a Route-Reflector (RR) for the bottom level that acts as a Route-Reflector-Client (RRC) for the top level. In Figure 14-21, the Core layer routers (R6 and R7) act as RRs for the Aggregation layer routers (R4, R5, R8, and R9); therefore, the Aggregation layer routers (R4, R5, R8, and R9) are RRCs of the Core layer routers (R6 and R7). An RR client can be a Route-Reflector for bottom-layer routers as well. Aggregation layer router R4, which is an RRC for the Core layer routers (R6 and R7), is also acting as an RR for Access layer Routers R1, R2, and R3, which are RRCs for the Aggregation layer routers (R4 and R5).
Figure 14-21 gives an example of hierarchical Route-Reflection. A network that has just two layers (Core and Access) has only one level of Route-Reflection. Route-Reflection is configured only on the RR(s). Route-Reflector-Clients are unaware that they are part of any reflection; therefore, no configuration is needed to make them RRCs.
The way that IBGP routing updates flow in an RR network is defined by the following rules:
- If an update came from an EBGP neighbor, advertise that update to all neighbors (IBGP, EBGP, Route-Reflector-Client(s)).
- If an update came from an IBGP neighbor, advertise that update to EBGP neighbors and Route-Reflector-Clients.
- If an update came from a Route-Reflector-Client, advertise that update to other Route-Reflector-Client(s), IBGP, and EBGP neighbors, but not to the Route-Reflector-Client that sent the update.
In Figure 14-21, suppose that an EBGP neighbor is connected to Core Router R6 and advertises an update for 131.108.1.0/24. R6 passes that update to all neighbors because of rule 1 just mentioned, and the Aggregation layer (R4 and R5) passes it to the Access layer (R1, R2, and R3) because of rule 2. Similarly, the east region also propagates the update. This way, 131.108.1.0/24 is propagated throughout the network without having a full mesh of IBGP.
Now, imagine that Access layer Router R1 receives the prefix 140.1.1.0/24 from its EBGP neighbor. R1 propagates that to the Aggregation layer (R4 and R5) because of rule 1. R4 and R5 reflect the update to the Access layer (R2 and R3) and to the Core layer (R6 and R7) because of rule 3. The Core layer (R6 and R7) reflects that update to the east region Aggregation layer (R8 and R9) because of rule 3.
This way, 140.1.1.0/24 will be propagated from the lower layers to the upper layers in a hierarchical network.
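The three update-flow rules can be condensed into a small lookup, sketched here in Python. The peer-type labels and the function name are assumptions made for this illustration; the function only returns the classes of neighbors that receive the update and does not model the exclusion of the specific client that sent it.

```python
EBGP, IBGP, CLIENT = "ebgp", "ibgp-nonclient", "rr-client"


def advertise_to(source_type):
    """Neighbor classes a route reflector advertises an update to."""
    if source_type == EBGP:
        # Rule 1: updates from EBGP go to everyone.
        return {EBGP, IBGP, CLIENT}
    if source_type == IBGP:
        # Rule 2: updates from a nonclient IBGP neighbor go to EBGP and clients.
        return {EBGP, CLIENT}
    if source_type == CLIENT:
        # Rule 3: updates from a client go to everyone except the sending client.
        return {EBGP, IBGP, CLIENT}
    raise ValueError(f"unknown neighbor type: {source_type}")


for source in (EBGP, IBGP, CLIENT):
    print(source, "->", sorted(advertise_to(source)))
```

The only case in which an update is held back from nonclient IBGP neighbors is rule 2, which is why the Core routers, as nonclients of one another, still need to be fully meshed.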
Hierarchical Route-Reflection networks make more sense when they are viewed as a group of RRs and their clients. Following are the definitions of a few key concepts for understanding hierarchical Route-Reflection.
- Cluster— A set of one or more RRs and their clients.
- Originator_ID attribute— The RID of the router that originated the route, or that first received it from an EBGP neighbor, in the local AS. The RR creates the Originator_ID.
- Cluster-ID— A 4-byte integer representing the cluster. If the cluster-ID is not configured, the RID of the RR is taken as the cluster-ID. Configure the cluster-ID using the following Cisco IOS Software command:
router bgp 109
 bgp cluster-id x.x.x.x
When two RRs are configured with the same cluster-ID, they are part of the same cluster.
- Cluster_list attribute— A list of cluster-IDs representing the series of clusters that an update has traversed. When an RR receives an update from its client, the RR adds its local cluster-ID and sends it to a nonclient (upper-level RR or IBGP neighbor).
When an RR receives an update with its own cluster-ID in the cluster list, the RR drops that update, assuming that the update has looped.
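The loop check based on the cluster list can be sketched as follows in Python. The cluster-IDs 1.1.1.1 and 2.2.2.2 are used only as illustrative values, and the function name is an assumption for this sketch. The RR's own cluster-ID is added at the front of the list, which is how the route reflection specification describes it.

```python
def reflect(cluster_list, local_cluster_id):
    """Return the cluster list as reflected by this RR, or None if looped."""
    if local_cluster_id in cluster_list:
        return None  # our own cluster-ID is already present: drop the update
    return [local_cluster_id] + cluster_list


# A client's update is reflected by an RR in cluster 1.1.1.1,
# then by an upper-level RR in cluster 2.2.2.2.
after_first_rr = reflect([], "1.1.1.1")
after_second_rr = reflect(after_first_rr, "2.2.2.2")
print(after_second_rr)

# If the update comes back to an RR in cluster 1.1.1.1, it is dropped.
print(reflect(after_second_rr, "1.1.1.1"))
```

The check needs no knowledge of the topology: each RR only compares its own cluster-ID against the list carried in the update.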
Figure 14-22 shows cluster definition as configured on all RRs.
Example 14-29 shows how RR and cluster definitions are done in Cisco IOS Software. Specifically, this example looks at the configuration of R4, the RR in the Aggregation layer.
Example 14-29 Sample Configuration of RR and Cluster in R4
router bgp 109
 neighbor 1.1.1.1 remote-as 109
 neighbor 1.1.1.1 route-reflector-client
 neighbor 2.2.2.2 remote-as 109
 neighbor 2.2.2.2 route-reflector-client
 neighbor 3.3.3.3 remote-as 109
 neighbor 3.3.3.3 route-reflector-client
 neighbor 6.6.6.6 remote-as 109
 neighbor 7.7.7.7 remote-as 109
 bgp cluster-id 1.1.1.1
- R4 has three clients—R1 (1.1.1.1), R2 (2.2.2.2), and R3 (3.3.3.3)—and two normal IBGP neighbors—R6 (6.6.6.6) and R7 (7.7.7.7). No configuration in R4 shows that it is an RRC of R6 and R7.
Assume that Access layer Router R1 in the west region advertises a prefix of 140.1.1.0/24. Example 14-30 provides show ip bgp output to display how this update would affect Access layer Router R12 in the east region.
Example 14-30 show ip bgp Output to Show Clusters
R12>show ip bgp 140.1.1.0
BGP routing table entry for 140.1.1.0/24
  1.1.1.1 from 1.1.1.1 (1.1.1.1)
    Origin IGP, metric 0, localpref 100, valid, internal, best
    Originator : 1.1.1.1 Cluster list: 2.2.2.2 1.1.1.1
In Example 14-30, the originator is 1.1.1.1, which is the RID of R1. The cluster list shows the two clusters that this update has traversed.
Route-Reflection solves the full IBGP mesh problem very elegantly and offers great flexibility for BGP networks to grow to much bigger IBGP networks. Almost all large BGP networks make use of Route-Reflection to scale their IBGP.
AS Confederations
In an AS Confederation, an AS is divided into smaller Sub-autonomous systems, which are connected to each other through EBGP. Each Sub-AS acts as an independent BGP AS and runs normal IBGP internally. A single IGP runs across the complete AS, and each Sub-AS has IGP routing information about all other Sub-autonomous systems. Most BGP attributes, such as LOCAL_PREF, MED, and NEXT_HOP, are preserved when updates cross a Sub-AS boundary. The Sub-autonomous systems that an update traverses are added to the AS_PATH attribute. To the outside world, the AS running AS Confederation appears as a single AS.
To better understand AS Confederations, you need to know about how the AS_PATH attribute operates within an AS Confederation network. Just as the AS_PATH attribute carries information about autonomous systems the updates have traversed, AS_PATH in Confederation carries Sub-AS information. Just as with the AS_PATH attribute, when a router running Confederation receives an update whose AS_PATH contains its own Sub-AS, the router drops that update to avoid loops. The two BGP attributes associated with AS Confederations are described as follows:
- AS_CONFED_SEQUENCE— Defines the list of Sub-autonomous systems in the AS_PATH, in sequential order of confederated Sub-AS where the update has traversed. This is analogous to AS_SEQUENCE, as discussed in AS_PATH attribute definition.
- AS_CONFED_SET— Defines the list of Sub-autonomous systems in the AS_PATH as an unordered set of Sub-AS. This can be used in situations where a Confederation Sub-AS is aggregating routes from multiple Sub-autonomous systems. In this case, you can set AS_PATH as AS_CONFED_SET for the aggregated route; it carries the list of all Sub-AS, but their order is not maintained. This is analogous to AS_SET, as discussed in the AS_PATH attribute definition.
Figure 14-23 shows an AS 109 divided into an AS Confederation of three small Sub-AS: 65001, 65002, and 65003. Each Sub-AS runs EBGP with the other Sub-autonomous systems. Notice that the Sub-autonomous systems do not have a full mesh of EBGP. This is similar to the real world of BGP where all EBGP speakers are not fully meshed. Each Sub-AS treats the other Sub-autonomous systems as EBGP neighbors, thus forwarding all updates from one Sub-AS to other Sub-autonomous systems.
Figure 14-23 also shows that R1 in Sub-AS 65003 is running EBGP with autonomous system 110, which is advertising 140.1.1.0/24 to R1. When R1 receives the update from autonomous system 110, the prefix 140.1.1.0/24 has an AS_PATH of 110. Sub-AS 65003 propagates this path to Sub-AS 65002 with the AS_PATH attribute as (65003) 110. In BGP output, the parentheses around 65003 mean that this autonomous system is a Sub-AS of an AS Confederation. When this update leaves Sub-AS 65002, the AS_PATH looks like (65002 65003) 110. When R12 in Sub-AS 65001 advertises 140.1.1.0/24 to the outside world, the Confederation Sub-AS numbers are stripped from the AS_PATH, and the outside world is presented with an AS_PATH of 109 110 for prefix 140.1.1.0/24, as if there were no Confederation in AS 109.
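The AS_PATH handling just described can be traced with a short Python sketch. A path is modeled as a pair of lists, the confederation sequence and the regular AS sequence; this representation and the function names are assumptions made for this illustration, not the wire format of the attribute.

```python
def cross_sub_as(path, sub_as):
    """Prepend a Sub-AS to the confederation sequence when leaving that Sub-AS."""
    confed, ases = path
    return ([sub_as] + confed, ases)


def leave_confederation(path, confed_id):
    """Strip all Sub-AS numbers and prepend the confederation identifier."""
    _, ases = path
    return ([], [confed_id] + ases)


def render(path):
    """Format the path the way BGP output shows it, Sub-AS in parentheses."""
    confed, ases = path
    parts = []
    if confed:
        parts.append("(" + " ".join(str(a) for a in confed) + ")")
    parts.extend(str(a) for a in ases)
    return " ".join(parts)


received = ([], [110])                      # as received by R1 from AS 110
in_65002 = cross_sub_as(received, 65003)    # sent from Sub-AS 65003
in_65001 = cross_sub_as(in_65002, 65002)    # sent from Sub-AS 65002
outside = leave_confederation(in_65001, 109)

for p in (received, in_65002, in_65001, outside):
    print(render(p))
```

Keeping the Sub-AS numbers in a separate segment is what lets the border router drop them cleanly when the update leaves AS 109.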
Example 14-31 shows a sample BGP configuration in Router R4 in Sub-AS 65003.
Example 14-31 Sample Confederation Sub-AS Configuration
R4#
router bgp 65003
 bgp confederation identifier 109
 bgp confederation peers 65002
 neighbor 1.1.1.1 remote-as 65003
 neighbor 2.2.2.2 remote-as 65003
 neighbor 3.3.3.3 remote-as 65003
 neighbor 6.6.6.6 remote-as 65002
 neighbor 7.7.7.7 remote-as 65002
In Example 14-31, the confederation identifier represents the real AS assigned to this network; 109 is the AS that the outside world sees. The bgp confederation peers command lists all the Confederation Sub-autonomous systems with which this router is peering. In this example, R4 is peering with Sub-AS 65002 (R6 and R7). R4 is also peering with three internal IBGP peers, R1, R2, and R3, at the 1.1.1.1, 2.2.2.2, and 3.3.3.3 addresses, respectively, in Sub-AS 65003.
Although an AS Confederation offers a mechanism to avoid fully meshed IBGP in a large AS, a full mesh of IBGP is still a requirement within a Sub-AS. This presents a challenge of scaling IBGP within each Sub-AS. Each Sub-AS could then have a full mesh of IBGP or it could run Route-Reflection within each Sub-AS.
In the quest to eliminate fully meshed IBGP using Route-Reflection or AS Confederations, BGP operators look at various reasons to prefer one to the other. It depends on, among other things, how the physical network is laid out, which method requires less configuration change, and which method offers ease in managing IBGP.
Best-Path Calculation
Material in this section is based on the Cisco document “BGP Best Path Selection Algorithm,” available at www.cisco.com/warp/public/459/25.shtml.
By design, a BGP speaker receiving updates picks only a single best update from a set of multiple updates and installs it in the routing table. BGP best-path calculation goes through a series of comparisons between multiple updates. The comparison is done over the BGP attributes, and a series of tests is performed until one update wins over the other and the best path update is placed in the routing table.
With the best-path algorithm, BGP assigns the first valid path as the current best path. BGP then compares the best path with the next path in the list, until it reaches the end of the list of valid paths.
The following list of rules determines the best path:
1. Prefer the path with the largest WEIGHT. WEIGHT is a Cisco proprietary parameter, local to the router on which it is configured.
2. Prefer the path with the largest local preference (LOCAL_PREF).
3. Prefer the path that was locally originated through a network or aggregate BGP subcommand, or through redistribution from an IGP. Local paths sourced by network/redistribute commands are preferred over local aggregates sourced by the aggregate-address command.
4. Prefer the path with the shortest AS_PATH. The AS_PATH is a listing of the autonomous systems through which this particular update traveled to reach the local autonomous system. The fewer autonomous systems it crossed, the more preferred the route is. Note the following:
   - This step is skipped if you configure bgp bestpath as-path ignore.
   - An AS_SET counts as 1, no matter how many autonomous systems are in the set.
   - The AS_CONFED_SEQUENCE is not included in the AS_PATH length.
5. Prefer the path with the lowest origin type: IGP is lower than EGP, and EGP is lower than INCOMPLETE.
6. Prefer the path with the lowest multi-exit discriminator (MED). Note the following:
   - This comparison is done only if the first (neighboring) AS is the same in the two paths; any Confederation Sub-autonomous systems are ignored. In other words, MEDs are compared only if the first AS in the AS_SEQUENCE is the same for multiple paths. Any preceding AS_CONFED_SEQUENCE is ignored.
   - If bgp always-compare-med is enabled, MEDs are compared for all paths. This option needs to be enabled over the entire AS; otherwise, routing loops can occur.
   - If bgp bestpath med-confed is enabled, MEDs are compared for all paths that consist only of AS_CONFED_SEQUENCE (paths originated within the local confederation).
   - Paths received from a neighbor with a MED of 4,294,967,295 will have the MED changed to 4,294,967,294 before insertion into the BGP table.
   - Paths received with no MED are assigned a MED of 0, unless bgp bestpath missing-as-worst is enabled; in that case, they are assigned a MED of 4,294,967,294.
   - The bgp deterministic-med command also can influence this step.
7. Prefer external (eBGP) over internal (iBGP) paths. Paths containing AS_CONFED_SEQUENCE are local to the confederation and, therefore, are treated as internal paths. There is no distinction between Confederation External and Confederation Internal.
8. Prefer the path with the lowest IGP metric to the BGP next hop.
9. If the maximum-paths n command is enabled and there are multiple external or confederation external paths from the same neighboring AS or Sub-AS, BGP inserts up to n most recently received paths in the IP routing table. This allows eBGP multipath load sharing. The maximum value of n is currently 6. The default value, when this option is disabled, is 1. The oldest received path is marked as the best path in the output of show ip bgp longer-prefixes, and the equivalent of next-hop-self is performed before forwarding this best path to internal peers.
10. If both paths are external, prefer the path that was received first (the oldest one). This step minimizes route flapping because a newer path won't displace an older one, even if it was the preferred route based on the RID. It is better practice to apply the additional decision steps in 11, 12, and 13 to iBGP paths only, to ensure a consistent best-path decision within the network and thereby avoid loops. This step is skipped if any of the following is true:
   - The bgp bestpath compare-routerid command is enabled.
   - The router ID is the same for multiple paths because the routes were received from the same router.
   - No current best path exists. An example of losing the current best path occurs when the neighbor offering the path goes down.
11. Prefer the route coming from the BGP router with the lowest router ID. The router ID is the highest IP address on the router, with preference given to loopback addresses. It can also be set manually using the bgp router-id command. If a path contains RR attributes, the originator ID is substituted for the router ID in the path selection process.
12. If the originator or RID is the same for multiple paths, prefer the path with the minimum cluster list length. A cluster list is present only in a BGP Route-Reflector environment in which clients peer with RRs or clients in other clusters. In this scenario, the client must be aware of the RR-specific BGP attribute.
13. Prefer the path coming from the lowest neighbor address. This is the IP address used in the BGP neighbor configuration, and it corresponds to the remote peer used in the TCP connection with the local router.
Example 14-32 shows how a best path is calculated. Example 14-32 is taken from route-server.cerf.net and is slightly modified to make it more interesting. route-server.cerf.net is a route server available on the Internet.
Example 14-32 Configuration to Demonstrate Calculation of the Best Path
show ip bgp 3.0.0.0
BGP routing table entry for 3.0.0.0/8, version 16396661
Paths: (4 available, best #4)
  Not advertised to any peer
!Path 1
  1740 701 8
    192.157.69.5 from 192.157.69.5 (134.24.127.201)
      Origin IGP, metric 20, localpref 100, valid, external,
!Path 2
  1740 701 80
    198.32.176.25 from 198.32.176.25 (134.24.127.35)
      Origin IGP, metric 20, localpref 110, valid, external,
!Path 3
  1740 701 80
    134.24.88.55 from 134.24.88.55 (134.24.127.27)
      Origin IGP, metric 20, localpref 100, valid, external,
!Path 4
  1740 701 80
    192.41.177.69 from 192.41.177.69 (134.24.127.131)
      Origin IGP, metric 10, localpref 110, valid, external, best,
By going through the BGP best-path decision algorithm step by step, path 4 is the best for these reasons:
- Path 2 is better than path 1 because it has a higher LOCAL_PREF.
- Path 2 is better than path 3 because it has a higher LOCAL_PREF.
- Path 4 is better than path 2 because it has a lower MED (10 versus 20); the two paths tie on LOCAL_PREF, AS_PATH length, and origin.
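The comparison can be reproduced with a simplified Python sketch that covers only a subset of the decision steps: WEIGHT, LOCAL_PREF, AS_PATH length, origin, MED, and router ID. It assumes that all paths come from the same neighboring AS (1740 in Example 14-32), so MEDs are comparable, and it ignores the remaining steps. The Path class and the sort key are illustrative constructs, not the router's implementation.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}


@dataclass
class Path:
    number: int
    as_path: tuple
    local_pref: int
    med: int
    origin: str
    router_id: str
    weight: int = 0


def preference_key(p):
    """Smaller key means more preferred; 'largest wins' fields are negated."""
    rid = tuple(int(octet) for octet in p.router_id.split("."))
    return (-p.weight, -p.local_pref, len(p.as_path),
            ORIGIN_RANK[p.origin], p.med, rid)


# The four paths from Example 14-32.
paths = [
    Path(1, (1740, 701, 8), 100, 20, "IGP", "134.24.127.201"),
    Path(2, (1740, 701, 80), 110, 20, "IGP", "134.24.127.35"),
    Path(3, (1740, 701, 80), 100, 20, "IGP", "134.24.127.27"),
    Path(4, (1740, 701, 80), 110, 10, "IGP", "134.24.127.131"),
]

best = min(paths, key=preference_key)
print("best path:", best.number)
```

Paths 1 and 3 drop out on LOCAL_PREF, and the lower MED then separates path 4 from path 2, which agrees with the best #4 marking in the output.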