Recommended BFD link down detection time when OSPF graceful-restart is configured in an SRX chassis cluster

Bidirectional Forwarding Detection (BFD) is used to provide sub-second convergence times for routing protocols. However, if BFD detects the link down before the peer receives the Graceful restart LSA, graceful restart of the routing protocol may not work as expected.

An SRX chassis cluster is set up as shown in the diagram and configuration below. When the primary node RE (redundancy-group 0, also known as RG0) fails over because of a system reboot triggered by the CLI command “request system reboot”, the new primary node RE sends the Graceful restart LSA to the peer devices 6 to 10 seconds later, or even later depending on the configuration, the total number of routes, and so on. During this time, all pass-through traffic that uses the OSPF routes will be dropped.

NOTE: This issue does not occur when RG0 is failed over with the CLI command “request chassis cluster failover redundancy-group 0 node (new primary node id)”. In that case, the new primary node RE sends the Graceful restart LSA within 1 to 2 seconds. However, if the BFD detection time is too aggressive (for example, 1.2 seconds), OSPF graceful restart may still fail, because BFDD (the BFD daemon) can detect the link down and notify the OSPF client before the Graceful restart LSA is received. For more details, see “Events on the SRX 1, 2 and Router when BFD detect time is 7.5 sec” below.

                         router-id 192.168.1.3
                         +---------------+ reth0.0 (1.1.1.3)
                 +-------| SRX 1 (node0) |-------+
                 |       +---------------+       | 
                 |     control | | fabric        |       ge-0/0/0.0 (1.1.1.100)
+------+    +--------+  link   | |  link     +--------+    +--------+    +------+
| PC 1 |----| Switch |         | |           | Switch |----| Router |----| PC 2 |
+------+    +--------+         | |           +--------+    +--------+    +------+
                 |             | |               |         router-id 192.168.1.100
                 |       +---------------+       |
                 +-------| SRX 2 (node1) |-------+
                         +---------------+ reth0.0

Junos OS versions
-----------------
12.1X44-D35.5 on SRX 1 (SRX3600) and SRX 2 (SRX3600)
13.2R3.7 on Router (MX80)

NOTE: Before the RG0 failover, the primary node RE is on SRX 1 and the secondary node RE is on SRX 2.

With the following configuration, 11 ping packets were lost (from PC 1 to PC 2).

SRX Configuration (BFD detect time is 7.5 sec)

set routing-options graceful-restart
set routing-instances VR1 routing-options graceful-restart
set routing-instances VR1 protocols ospf graceful-restart restart-duration 180
set routing-instances VR1 protocols ospf graceful-restart notify-duration 30
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 hello-interval 10
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 dead-interval 40
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 authentication md5 1 key "Juniper"
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 bfd-liveness-detection minimum-interval 2500
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 bfd-liveness-detection multiplier 3

Router Configuration (BFD detect time is 7.5 sec)

set routing-options graceful-restart
set protocols ospf graceful-restart restart-duration 180
set protocols ospf graceful-restart notify-duration 30
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 hello-interval 10
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 dead-interval 40
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 authentication md5 1 key "Juniper"
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 bfd-liveness-detection minimum-interval 2500
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 bfd-liveness-detection multiplier 3
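
The “7.5 sec” detect time quoted for these configurations is the product of minimum-interval and multiplier. The following is a minimal sketch of that arithmetic, not a Junos tool. The function name is hypothetical, and it assumes both peers are configured with the same minimum-interval, so the negotiated interval equals the configured one.

```python
def bfd_detect_time_s(minimum_interval_ms: int, multiplier: int) -> float:
    """Nominal BFD detection time in seconds: interval x multiplier.

    Assumption: both ends use the same minimum-interval, so the
    negotiated interval equals the configured value.
    """
    return minimum_interval_ms * multiplier / 1000.0


# Values taken from the configurations in this article.
print(bfd_detect_time_s(2500, 3))  # 7.5
print(bfd_detect_time_s(3000, 4))  # 12.0
```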

Events on the SRX 1, 2 and Router when BFD detect time is 7.5 sec

-------------------------------------------------------------------------------------
SRX 1 (node0)    SRX 2 (node1)            Router
-------------------------------------------------------------------------------------
10:48:13 reboot
                 JSRPD state change 
                 (secondary to primary)

                 RPD init/start


# 7 sec after node0 reboot
10:48:20                                  BFDD detects link Down
                                          (Reason: Detect Timer Expiry)

                                          OSPF neighbor state changed from 'Full' to 'Down'
                                          due to InActiveTimer (event reason: BFD session timed out
                                          and neighbor was declared dead)
                                          => routes deleted from the forwarding table


# 11 sec after node0 reboot
10:48:24         Graceful restart LSA to Router
10:48:25         Graceful restart LSA to Router
10:48:26         Graceful restart LSA to Router
10:48:27         Graceful restart LSA to Router
10:48:28         Graceful restart LSA to Router
10:48:29         Graceful restart LSA to Router

                   
10:48:30         OSPF neighbor state changed from 'Init' to '2Way or ExStart'
                 due to 2WayRcvd (event reason: neighbor detected this router)

10:48:34         OSPF neighbor state changed to 'Full'
                 due to LoadDone (event reason: OSPF loading completed)

10:48:41         BFDD detects link Up

10:48:43                                  BFDD detects link Up

NOTE: In the above scenario, even if BFD is not configured, graceful restart may still fail when the OSPF dead-interval is very short (for example, 10 seconds), because the OSPF neighbor keep-alive mechanism detects the neighbor down before the Graceful restart LSA is received.

When BFD detects the link Down on the peer device (Router), it notifies OSPF, which then brings down the OSPF neighbor and deletes its routes from the forwarding table. Whether the routes are actually lost depends on the order of events:

If the OSPF neighbor (Router) receives the Graceful restart LSA before the BFD link Down notification, the OSPF routes remain until the configured grace period expires.
If not, the OSPF neighbor deletes the routes from the forwarding table immediately.
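
The ordering rule above can be sketched as a simple comparison. This is an illustrative model only, not how RPD or BFDD is implemented. The function names are hypothetical, and the model simplifies by treating the configured timers as the moment the neighbor is declared down.

```python
from typing import Optional


def neighbor_down_after_s(dead_interval_s: float,
                          bfd_detect_s: Optional[float] = None) -> float:
    """Whichever failure detector fires first takes the neighbor down.

    Without BFD, only the OSPF dead-interval applies.
    """
    if bfd_detect_s is None:
        return dead_interval_s
    return min(dead_interval_s, bfd_detect_s)


def routes_survive(grace_lsa_at_s: float, neighbor_down_at_s: float) -> bool:
    """True if the Graceful restart LSA arrives before the neighbor is declared down."""
    return grace_lsa_at_s < neighbor_down_at_s


# Observed times in seconds after the node0 reboot, from the event tables
# in this article.
print(routes_survive(grace_lsa_at_s=11, neighbor_down_at_s=7))   # False (7.5 sec case)
print(routes_survive(grace_lsa_at_s=6, neighbor_down_at_s=10))   # True  (12 sec case)
```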

For example, a Graceful restart LSA message captured on the Router:

Router> monitor traffic interface ge-0/0/0.0 no-resolve size 1500 matching "ip proto ospf" detail 

(timestamp)  In IP (tos 0xc0, ttl   1, id 59766, offset 0, flags [none], proto: OSPF (89), length: 108) 1.1.1.3 > 224.0.0.5: OSPFv2, LS-Update, length 72 
	Router-ID 192.168.1.3, Area 0.0.0.0, Authentication Type: MD5 (2)
	Key-ID: 1, Auth-Length: 16, Crypto Sequence Number: 0x53fe92b0, 1 LSA
	  LSA #1
	  Advertising Router 1.1.1.3, seq 0x80000003, age 3s, length 24
	    Link Local Opaque LSA (9), Opaque-Type Graceful restart LSA (3), Opaque-ID 0   
	    Options: [External, Demand Circuit]
	    Grace Period TLV (1), length 4, value: 210s
	    Graceful restart Reason TLV (2), length 1, value: Unknown (0)
	    IPv4 interface address TLV (3), length 4, value: 1.1.1.3

To prevent the BFD link Down notification from deleting the OSPF routes, the BFD detection time must be greater than the time the new primary RE takes to send the Graceful restart LSA: 6 to 11 seconds, or more depending on the configuration, the total number of routes, and so on. For an SRX chassis cluster environment, Juniper Engineering recommends a bfd-liveness-detection minimum-interval of 2500 ms and a multiplier of 4 or above (a detection time of 10 seconds or more), along with the default OSPF hello-interval of 10 seconds and dead-interval of 40 seconds.

NOTE: The above times were measured in the JTAC lab; the time the new primary RE takes to send the Graceful restart LSA may vary on production devices. Therefore, we recommend adding a safety margin, for example (6 to 11) + 3 seconds, which gives a recommended BFD detection time of 9 to 14 seconds.
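
As a worked example of the sizing rule above, the sketch below finds the smallest multiplier whose detection time covers the measured Graceful restart LSA delay plus the 3-second margin. The function name and its parameters are hypothetical; the delay you pass in should come from your own measurements.

```python
import math


def min_multiplier(minimum_interval_ms: int, lsa_delay_s: float,
                   margin_s: float = 3.0) -> int:
    """Smallest multiplier with interval x multiplier >= delay + margin."""
    target_ms = (lsa_delay_s + margin_s) * 1000
    return math.ceil(target_ms / minimum_interval_ms)


# With minimum-interval 2500 ms:
print(min_multiplier(2500, lsa_delay_s=6))   # 4 -> 10 sec detection time
print(min_multiplier(2500, lsa_delay_s=11))  # 6 -> 15 sec detection time
```

The lower bound matches the multiplier of 4 recommended above; the upper bound shows why a larger multiplier may be needed when the Graceful restart LSA takes longer to appear.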

With the following configuration, 1 ping packet was lost (from PC 1 to PC 2).

SRX Configuration (BFD detect time is 12 sec)

set routing-options graceful-restart
set routing-instances VR1 routing-options graceful-restart
set routing-instances VR1 protocols ospf graceful-restart restart-duration 180
set routing-instances VR1 protocols ospf graceful-restart notify-duration 30
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 hello-interval 10
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 dead-interval 40
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 authentication md5 1 key "Juniper"
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 bfd-liveness-detection minimum-interval 3000
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 bfd-liveness-detection multiplier 4

Router Configuration (BFD detect time is 12 sec)

set routing-options graceful-restart
set protocols ospf graceful-restart restart-duration 180
set protocols ospf graceful-restart notify-duration 30
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 hello-interval 10
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 dead-interval 40
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 authentication md5 1 key "Juniper"
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 bfd-liveness-detection minimum-interval 3000
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 bfd-liveness-detection multiplier 4


root@SRX1> show chassis cluster status 
Cluster ID: 10 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   200         primary        no       no  
    node1                   100         secondary      no       no  

Redundancy group: 1 , Failover count: 1
    node0                   200         primary        no       no  
    node1                   100         secondary      no       no 
{primary:node0}
root@SRX1> show ospf neighbor instance all 
Instance: master

Instance: Inet
Address          Interface              State     ID               Pri  Dead
1.1.1.100        reth0.0                Full      192.168.1.100    128    34


root@SRX1> show bfd session 
                                                  Detect   Transmit
Address                  State     Interface      Time     Interval  Multiplier
1.1.1.100                Up        reth0.0        12.000    3.000        4   

1 sessions, 1 clients
Cumulative transmit rate 0.3 pps, cumulative receive rate 0.3 pps

Events on the SRX 1, 2 and Router when BFD detect time is 12 sec

-------------------------------------------------------------------------------------
SRX 1 (node0)    SRX 2 (node1)            Router
-------------------------------------------------------------------------------------
11:21:11 reboot
                 JSRPD state change 
                 (secondary to primary)

                 RPD init/start


# 6 sec after node0 reboot
11:21:17         Graceful restart LSA to Router


# 10 sec after node0 reboot
11:21:21                                  BFDD detects link Down
                                          (Reason: Detect Timer Expiry)
                                          => routes remain in the forwarding table
                                             until the grace period expires

 
11:21:22         OSPF neighbor state changed from 'Init' to '2Way'
                 due to 2WayRcvd (event reason: neighbor detected this router)

11:22:01         OSPF neighbor state changed to 'Full'
                 due to LoadDone (event reason: OSPF loading completed)


11:22:13         BFDD detects link Up

11:22:15                                  BFDD detects link Up

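NOTE: BFDD on the Router detects the link Down 10 seconds after the node0 reboot, not 12 seconds. The detection timer runs from the last BFD control packet received, which was sent before the reboot, and the transmit interval is jittered. For more details, see RFC 5880, section 6.8.7, Transmitting BFD Control Packets: “the average interval between packets will be roughly 12.5% less than that negotiated”.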
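
The jitter rule in RFC 5880, section 6.8.7 can be illustrated with a short calculation. This is a sketch of the arithmetic only; the function name is hypothetical, and it models the per-packet bounds the RFC describes (the interval is reduced by 0 to 25%, and capped at 90% when the detect multiplier is 1).

```python
def jitter_bounds_ms(negotiated_ms: float, detect_mult: int):
    """Per-packet transmit interval bounds after RFC 5880 jitter."""
    low = 0.75 * negotiated_ms
    high = (0.90 if detect_mult == 1 else 1.00) * negotiated_ms
    return low, high


# 3000 ms interval and multiplier 4, as in the 12 sec configuration.
low, high = jitter_bounds_ms(3000, 4)
average = (low + high) / 2
print(low, high, average)  # 2250.0 3000.0 2625.0
```

The average of 2625 ms is 12.5% less than the negotiated 3000 ms, which is the figure the RFC quotes.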

Workaround

You can manually send the Graceful restart LSA with the hidden CLI command clear ospf grace-lsa, or clear ospf grace-lsa instance (name of instance), before rebooting the primary node RE. Alternatively, fail over all the RGs to the other node with the following CLI commands before rebooting the current primary node:

request chassis cluster failover redundancy-group 0 node (new primary node id)
request chassis cluster failover redundancy-group 1 node (new primary node id)
request chassis cluster failover redundancy-group x node (new primary node id) (repeat for each additional data-plane RG, if any)
