Bidirectional Forwarding Detection (BFD) is used to provide sub-second failure detection, and therefore fast convergence, for routing protocols. However, if BFD detects the link as down before the neighbor receives the Graceful restart message, graceful restart of the routing protocol may not work as expected.
An SRX chassis cluster is configured as shown in the diagram and configuration below. When the primary node RE for redundancy group 0 (RG0) fails over because the node is rebooted with the CLI command “request system reboot”, the new primary node RE sends the Graceful restart LSA to the peer devices only 6 to 10 seconds later, or even later depending on the configuration, the total number of routes, and other factors. During this time, all pass-through traffic that uses the OSPF routes is dropped.
NOTE: This issue does not occur when RG0 is failed over with the CLI command “request chassis cluster failover redundancy-group 0 node (new primary node id)”. In that case, the new primary node RE sends the Graceful restart LSA within 1 to 2 seconds. However, if the BFD detection time is too aggressive (e.g., 1.2 seconds), OSPF graceful restart can still fail when BFDD (the BFD daemon) detects the link as down and notifies the OSPF client before the Graceful restart LSA is received. For more details, refer to “Events on the SRX 1, 2 and Router when BFD detect time is 7.5 sec” below.
                        router-id 192.168.1.3
                        reth0.0 (1.1.1.3)
                        +---------------+
              +---------| SRX 1 (node0) |---------+
              |         +---------------+         |
              |        control | | fabric         |
              |           link | | link           |     ge-0/0/0.0 (1.1.1.100)
+------+  +--------+           | |            +--------+  +--------+  +------+
| PC 1 |--| Switch |           | |            | Switch |--| Router |--| PC 2 |
+------+  +--------+           | |            +--------+  +--------+  +------+
              |         +---------------+         |       router-id 192.168.1.100
              +---------| SRX 2 (node1) |---------+
                        +---------------+
                        reth0.0

Junos OS versions
-----------------
12.1X44-D35.5 on SRX 1 (SRX3600) and SRX 2 (SRX3600)
13.2R3.7 on Router (MX80)
NOTE: Before RG0 failover, the primary node RE is in SRX 1 and the secondary node RE is in SRX 2.
With the following configuration, 11 ping packets were lost (from PC 1 to PC 2).
SRX Configuration (BFD detect time is 7.5 sec)
set routing-options graceful-restart
set routing-instances VR1 routing-options graceful-restart
set routing-instances VR1 protocols ospf graceful-restart restart-duration 180
set routing-instances VR1 protocols ospf graceful-restart notify-duration 30
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 hello-interval 10
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 dead-interval 40
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 authentication md5 1 key "Juniper"
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 bfd-liveness-detection minimum-interval 2500
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 bfd-liveness-detection multiplier 3
Router Configuration (BFD detect time is 7.5 sec)
set routing-options graceful-restart
set protocols ospf graceful-restart restart-duration 180
set protocols ospf graceful-restart notify-duration 30
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 hello-interval 10
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 dead-interval 40
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 authentication md5 1 key "Juniper"
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 bfd-liveness-detection minimum-interval 2500
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 bfd-liveness-detection multiplier 3
Events on the SRX 1, 2 and Router when BFD detect time is 7.5 sec
-------------------------------------------------------------------------------------
SRX 1 (node0)          SRX 2 (node1)          Router
-------------------------------------------------------------------------------------
10:48:13 reboot
JSRPD state change (secondary to primary)
RPD init/start
# 7 sec after node0 reboot
10:48:20 BFDD detect link Down (Reason: Detect Timer Expiry)
         OSPF neighbor state changed from 'Full' to 'Down' due to InActiveTimer
         (event reason: BFD session timed out and neighbor was declared dead)
         => routes deleted from the forwarding table !!!
# 11 sec after node0 reboot
10:48:24 Graceful restart LSA to Router
10:48:25 Graceful restart LSA to Router
10:48:26 Graceful restart LSA to Router
10:48:27 Graceful restart LSA to Router
10:48:28 Graceful restart LSA to Router
10:48:29 Graceful restart LSA to Router
10:48:30 OSPF neighbor state changed from 'Init' to '2Way or ExStart' due to 2WayRcvd
         (event reason: neighbor detected this router)
10:48:34 OSPF neighbor state changed to 'Full' due to LoadDone
         (event reason: OSPF loading completed)
10:48:41 BFDD detect link up
10:48:43 BFDD detect link Up
NOTE: In the above scenario, graceful restart may fail even when BFD is not configured if the OSPF dead-interval is very short (e.g., 10 seconds), because the OSPF hello (keepalive) mechanism detects the neighbor as down before the Graceful restart LSA is received.
When BFD detects the link as down on the peer device (Router), it notifies OSPF, which then brings down the OSPF neighbor and deletes the corresponding routes from the forwarding table. In this case:
- If the OSPF neighbor (Router) receives the Graceful restart LSA before the BFD link-down notification, the OSPF routes remain until the configured grace period expires.
- Otherwise, the OSPF neighbor deletes the routes from the forwarding table immediately.
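The rule above can be sketched in Python. This is a simplified model under stated assumptions: it only compares when the Graceful restart LSA arrives with when BFD declares the session down (at the 1-second resolution of the logs), and it ignores helper-mode settings and the grace period itself. The function names are hypothetical; the timestamps are copied from the two event timelines in this article (the 12 sec timeline appears further below).

```python
from datetime import datetime

# Hypothetical model of the rule above, not Junos code: the peer keeps its
# OSPF routes only if the Graceful restart LSA arrives before BFD goes down.


def seconds_after(start: str, event: str) -> int:
    """Whole seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(event, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def peer_keeps_routes(reboot: str, grace_lsa: str, bfd_down: str) -> bool:
    """True when the grace LSA is received strictly before the BFD down event."""
    return seconds_after(reboot, grace_lsa) < seconds_after(reboot, bfd_down)


if __name__ == "__main__":
    # Timestamps from the 7.5 sec and 12 sec event timelines.
    print("7.5 s BFD:", peer_keeps_routes("10:48:13", "10:48:24", "10:48:20"))
    print("12 s BFD: ", peer_keeps_routes("11:21:11", "11:21:17", "11:21:21"))
```

With the 7.5 sec timers the LSA arrives 11 seconds after the reboot, after BFD went down at 7 seconds, so the function returns False; with the 12 sec timers the LSA arrives at 6 seconds, before BFD goes down at 10 seconds, so it returns True.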
For example, a Graceful restart LSA captured on the Router:
Router> monitor traffic interface ge-0/0/0.0 no-resolve size 1500 matching "ip proto ospf" detail
(timestamp) In IP (tos 0xc0, ttl 1, id 59766, offset 0, flags [none], proto: OSPF (89), length: 108)
    1.1.1.3 > 224.0.0.5: OSPFv2, LS-Update, length 72
    Router-ID 192.168.1.3, Area 0.0.0.0, Authentication Type: MD5 (2)
    Key-ID: 1, Auth-Length: 16, Crypto Sequence Number: 0x53fe92b0, 1 LSA
      LSA #1
      Advertising Router 1.1.1.3, seq 0x80000003, age 3s, length 24
        Link Local Opaque LSA (9), Opaque-Type Graceful restart LSA (3), Opaque-ID 0
        Options: [External, Demand Circuit]
        Grace Period TLV (1), length 4, value: 210s
        Graceful restart Reason TLV (2), length 1, value: Unknown (0)
        IPv4 interface address TLV (3), length 4, value: 1.1.1.3
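To show what the Router reads from this LSA, the following Python sketch decodes the TLVs of a grace LSA body using the layout defined in RFC 3623 (2-byte type, 2-byte length, value padded to a 4-byte boundary). The sample bytes are rebuilt by hand from the values in the capture above rather than taken from the wire, and the helper names are hypothetical.

```python
import socket
import struct

# Sketch of a grace-LSA TLV decoder following RFC 3623. The sample body is
# assembled by hand from the capture values; it is not real wire data.

REASONS = {
    0: "Unknown",
    1: "Software restart",
    2: "Software reload/upgrade",
    3: "Switch to redundant control processor",
}


def parse_grace_tlvs(body: bytes) -> dict:
    """Return the grace period, restart reason and interface address TLVs."""
    result = {}
    offset = 0
    while offset + 4 <= len(body):
        tlv_type, length = struct.unpack_from("!HH", body, offset)
        value = body[offset + 4: offset + 4 + length]
        if tlv_type == 1:    # Grace Period TLV, in seconds
            result["grace_period_s"] = struct.unpack("!I", value)[0]
        elif tlv_type == 2:  # Graceful restart Reason TLV, one byte
            result["reason"] = REASONS.get(value[0], f"reserved ({value[0]})")
        elif tlv_type == 3:  # IP interface address TLV
            result["interface_address"] = socket.inet_ntoa(value)
        # Skip the 4-byte header plus the value padded to a 4-byte boundary.
        offset += 4 + (length + 3) // 4 * 4
    return result


def build_sample() -> bytes:
    """Grace-LSA body matching the capture: 210 s, reason 0, 1.1.1.3."""
    return (
        struct.pack("!HHI", 1, 4, 210)
        + struct.pack("!HHB3x", 2, 1, 0)  # 1-byte value plus 3 bytes padding
        + struct.pack("!HH4s", 3, 4, socket.inet_aton("1.1.1.3"))
    )


if __name__ == "__main__":
    print(parse_grace_tlvs(build_sample()))
```

Running it reports the same values that the capture shows: a 210 second grace period, reason Unknown (0), and interface address 1.1.1.3.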
To prevent a BFD link-down notification from deleting the OSPF routes, the BFD detection time should be greater than the 6 to 11 seconds it takes the new primary RE to send the Graceful restart LSA, or even longer depending on the configuration, the total number of routes, and other factors. In an SRX chassis cluster environment, Juniper Engineering recommends a bfd-liveness-detection minimum-interval of 2500 ms and a multiplier of 4 or above, along with the default OSPF hello-interval of 10 seconds and dead-interval of 40 seconds.
NOTE: The above times were measured in the JTAC lab; the time it takes the new primary RE to send the Graceful restart LSA may vary on production devices. Therefore, we recommend adding a margin, for example (6 to 11) + 3 seconds, which gives a recommended BFD detection time of 9 to 14 seconds.
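The sizing rule above is simple arithmetic: detection time = minimum-interval × multiplier. The Python sketch below is a hypothetical helper, not a Juniper tool. It checks a few timer choices against the 9 to 14 second range and computes the smallest multiplier for a given interval. The 6 to 11 second delay and the 3 second margin are the lab figures from this article and are assumptions that should be replaced with your own measurements.

```python
from __future__ import annotations

# Hypothetical sizing helper. Assumes detection time = interval x multiplier
# and uses this article's lab delay range (6-11 s) plus a 3 s margin.

GRACE_LSA_DELAY_MS = (6_000, 11_000)  # lab-measured range quoted above
MARGIN_MS = 3_000                     # extra headroom suggested above


def detection_time_ms(minimum_interval_ms: int, multiplier: int) -> int:
    """BFD detection time: negotiated interval multiplied by the multiplier."""
    return minimum_interval_ms * multiplier


def recommended_range_ms(delay=GRACE_LSA_DELAY_MS, margin=MARGIN_MS) -> tuple[int, int]:
    """Lab delay range shifted up by the safety margin."""
    low, high = delay
    return low + margin, high + margin


def min_multiplier(minimum_interval_ms: int, target_ms: int) -> int:
    """Smallest multiplier whose detection time reaches target_ms (ceiling division)."""
    return -(-target_ms // minimum_interval_ms)


if __name__ == "__main__":
    low, high = recommended_range_ms()
    for interval, mult in [(2500, 3), (2500, 4), (3000, 4)]:
        detect = detection_time_ms(interval, mult)
        verdict = "inside" if low <= detect <= high else "outside"
        print(f"{interval} ms x {mult} = {detect / 1000:.1f} s "
              f"({verdict} {low / 1000:.0f}-{high / 1000:.0f} s)")
    print("min multiplier at 2500 ms:", min_multiplier(2500, low))
```

For the lab configurations, 2500 ms × 3 (7.5 s) falls outside the range, while 2500 ms × 4 (10 s) and 3000 ms × 4 (12 s) fall inside it. At 2500 ms, the smallest multiplier that reaches 9 seconds is 4, which agrees with the recommendation above.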
With the following configuration, 1 ping packet was lost (from PC 1 to PC 2).
SRX Configuration (BFD detect time is 12 sec)
set routing-options graceful-restart
set routing-instances VR1 routing-options graceful-restart
set routing-instances VR1 protocols ospf graceful-restart restart-duration 180
set routing-instances VR1 protocols ospf graceful-restart notify-duration 30
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 hello-interval 10
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 dead-interval 40
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 authentication md5 1 key "Juniper"
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 bfd-liveness-detection minimum-interval 3000
set routing-instances VR1 protocols ospf area 0.0.0.0 interface reth0.0 bfd-liveness-detection multiplier 4
Router Configuration (BFD detect time is 12 sec)
set routing-options graceful-restart
set protocols ospf graceful-restart restart-duration 180
set protocols ospf graceful-restart notify-duration 30
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 hello-interval 10
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 dead-interval 40
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 authentication md5 1 key "Juniper"
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 bfd-liveness-detection minimum-interval 3000
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 bfd-liveness-detection multiplier 4

root@SRX1> show chassis cluster status
Cluster ID: 10
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   200         primary        no       no
    node1                   100         secondary      no       no

Redundancy group: 1 , Failover count: 1
    node0                   200         primary        no       no
    node1                   100         secondary      no       no

{primary:node0}
root@SRX1> show ospf neighbor instance all
Instance: master

Instance: Inet
Address          Interface              State     ID               Pri  Dead
1.1.1.100        reth0.0                Full      192.168.1.100    128    34

root@SRX1> show bfd session
                                                  Detect   Transmit
Address                  State     Interface      Time     Interval  Multiplier
1.1.1.100                Up        reth0.0        12.000     3.000        4

1 sessions, 1 clients
Cumulative transmit rate 0.3 pps, cumulative receive rate 0.3 pps
Events on the SRX 1, 2 and Router when BFD detect time is 12 sec
-------------------------------------------------------------------------------------
SRX 1 (node0)          SRX 2 (node1)          Router
-------------------------------------------------------------------------------------
11:21:11 reboot
JSRPD state change (secondary to primary)
RPD init/start
# 6 sec after node0 reboot
11:21:17 Graceful restart LSA to Router
# 10 sec after node0 reboot
11:21:21 BFDD detect link Down (Reason: Detect Timer Expiry)
         => routes remain in the forwarding table until the grace period expires
11:21:22 OSPF neighbor state changed from 'Init' to '2Way' due to 2WayRcvd
         (event reason: neighbor detected this router)
11:22:01 OSPF neighbor state changed to 'Full' due to LoadDone
         (event reason: OSPF loading completed)
11:22:13 BFDD detect link Up
11:22:15 BFDD detect link Up
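NOTE: BFDD does not detect the link as down exactly 12 seconds after the node0 reboot. The Router's detection timer runs from the last BFD packet it received before the reboot, and BFD transmit intervals are jittered. For more details, see RFC 5880, section 6.8.7, Transmitting BFD Control Packets: “the average interval between packets will be roughly 12.5% less than that negotiated”.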
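The timing described in this NOTE can be approximated with the Python sketch below. It is a rough model under stated assumptions: each BFD transmit gap is the negotiated interval reduced by a random 0 to 25% (RFC 5880, section 6.8.7, for a multiplier greater than 1), the Router's detection timer restarts on every packet it receives, and the last packet before the reboot was sent at most one full interval earlier. The function names are hypothetical.

```python
from __future__ import annotations

# Rough model, not a Juniper or RFC algorithm verbatim. Assumes each transmit
# gap is the negotiated interval reduced by a random 0-25% (RFC 5880 6.8.7)
# and that the detect timer restarts on every received packet.


def gap_bounds(interval_s: float) -> tuple[float, float]:
    """Shortest and longest gap between consecutive BFD control packets."""
    return 0.75 * interval_s, interval_s


def average_gap(interval_s: float) -> float:
    """Mean gap for a uniform 0-25% reduction: 12.5% below the interval."""
    shortest, longest = gap_bounds(interval_s)
    return (shortest + longest) / 2


def down_window(interval_s: float, multiplier: int) -> tuple[float, float]:
    """Earliest and latest BFD-down time, in seconds after the reboot."""
    detect = interval_s * multiplier
    _, longest = gap_bounds(interval_s)
    # The last packet before the reboot left up to one full gap earlier,
    # so the timer can expire up to `longest` seconds before `detect`.
    return detect - longest, detect


if __name__ == "__main__":
    print("average gap at 3.0 s:", average_gap(3.0))
    print("7.5 s config window:", down_window(2.5, 3))
    print("12 s config window: ", down_window(3.0, 4))
```

For the 12 sec configuration the model gives a window of 9 to 12 seconds after the reboot, which contains the observed 10 seconds; for the 7.5 sec configuration it gives 5 to 7.5 seconds, which contains the observed 7 seconds.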
Workaround
You can manually send the Graceful restart LSA before rebooting the primary node RE by using the hidden CLI command clear ospf grace-lsa, or clear ospf grace-lsa instance (name of instance) for a routing instance. Alternatively, fail over all the RGs to the other node before rebooting the current primary node, using the following CLI commands:
request chassis cluster failover redundancy-group 0 node (new primary node id)
request chassis cluster failover redundancy-group 1 node (new primary node id)
request chassis cluster failover redundancy-group x node (new primary node id)   (repeat for each additional data-plane RG, if any)