VRRP mastership switch over is usually completed instantaneously, but sometimes it takes longer. While testing VRRP mastership switch over multiple times by deleting logical interface, a large delay (20-30 seconds) is observed in VRRP mastership switch over.
This slow VRRP convergence can sometimes happen because the current master router cannot send out a VRRP packet with “priority 0” to the current backup peer while the logical interface goes down. According to RFC 3768, the priority value zero (0) has special meaning indicating that the current master has stopped participating in VRRP. This is used to trigger backup routers to quickly transition to master without having to wait for the current master to timeout. If the priority=0 packet is sent out from the master router and it reaches the backup router properly, the mastership switchover completes in a second, but if disabling the logical interface with the commands set interfaces ae12.2001 disable and commit prevents the priority=0 packet from going out, the backup router does not receive it. The backup router has to wait until the hold-timer expires. Due to this race condition between committing a config (disabling the logical interface) and sending out priority=0 packet, sometimes this long VRRP convergence is seen.
Topology
+-------+ +------+ +-------+ | |ae12.2001 | | ae0.2001| | | MX240 |----------------| L2SW |----------------| MX480 | | | | | | | +-------+ +------+ +-------+ vrrp master vrrp Backup *priority 120 *priority 90
Below, messages.0 is of the problem case. The mgd sent notification to dcd, but it seemed stuck, then IFL=ae12.2001 went down. The mgd resumed sending notification to vrrp one second later, but the IFL was down already.
<messages.0> Aug 21 19:08:10 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'Routing protocols process', pid 1344, signal 1, status 0 with notification errors enabled Aug 21 19:08:10 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying dfwd(8) Aug 21 19:08:10 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'Firewall process', pid 2114, signal 1, status 0 with notification errors enabled Aug 21 19:08:10 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying dcd(9) Aug 21 19:08:10 RE0 rpd[1344]: %DAEMON-6: Synchronized commit processing phase 1 complete Aug 21 19:08:10 RE0 rpd[1344]: %DAEMON-6: task_reconfigure reinitializing done Aug 21 19:08:10 RE0 /kernel: %KERN-4: bundle ae12.2001: bundle IFL state changed to DOWN <<<<<<<< IFL down Aug 21 19:08:10 RE0 mib2d[2116]: %DAEMON-4-SNMP_TRAP_LINK_DOWN: ifIndex 681, ifAdminStatus down(2), ifOperStatus down(2), ifName ae12.2001 Aug 21 19:08:10 RE0 mib2d[2116]: %DAEMON-4-SNMP_TRAP_LINK_DOWN: ifIndex 677, ifAdminStatus down(2), ifOperStatus down(2), ifName ge-2/1/2.2001 Aug 21 19:08:10 RE0 mib2d[2116]: %DAEMON-4-SNMP_TRAP_LINK_DOWN: ifIndex 679, ifAdminStatus down(2), ifOperStatus down(2), ifName ge-3/1/2.2001 Aug 21 19:08:10 RE0 rpd[1344]: %DAEMON-6: Synchronized commit processing phase 2 start Aug 21 19:08:10 RE0 rpd[1344]: %DAEMON-6: Synchronized commit processing complete Aug 21 19:08:11 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'Interface control process', pid 2118, signal 1, status 0 with notification errors enabled Aug 21 19:08:11 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying mib2d(11) Aug 21 19:08:11 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'Management Information Base II process', pid 2116, signal 1, status 0 with notification errors enabled Aug 21 19:08:11 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying apsd(12) Aug 21 19:08:11 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'SONET Automatic Protection Switching process', pid 2084, signal 1, status 0 with notification errors enabled Aug 21 19:08:11 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying vrrpd(13) <<<<<<<< mgd notify to vrrpd.
Below messages.1 is of the ideal case. The mgd sent notification to vrrpd, then IFL=ae12.2001 went down. In this scenario, priority=0 packet could be advertised.
<messages.1> Aug 21 18:56:48 RE0 rpd[1344]: %DAEMON-6: task state: <ReConfig Foreground ParseConfig> Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'Routing protocols process', pid 1344, signal 1, status 0 with notification errors enabled Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying dfwd(8) Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'Firewall process', pid 2114, signal 1, status 0 with notification errors enabled Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying dcd(9) Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'Interface control process', pid 2118, signal 1, status 0 with notification errors enabled Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying mib2d(11) Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'Management Information Base II process', pid 2116, signal 1, status 0 with notification errors enabled Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying apsd(12) Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'SONET Automatic Protection Switching process', pid 2084, signal 1, status 0 with notification errors enabled Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying vrrpd(13) <<<<<<<< mgd notify to vrrpd. Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'Virtual Router Redundancy Protocol process', pid 58680, signal 1, status 0 with notification errors enabled Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying pfed(15) Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'Packet Forwarding Engine management process', pid 2115, signal 1, status 0 with notification errors enabled Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying sampled(18) Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'Traffic sampling control process', pid 2086, signal 1, status 0 with notification errors enabled Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying fud(25) Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: cannot signal 'Port forwarding process', signal 1, no pid Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying spd(30) Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: cannot signal 'Adaptive services process', signal 1, no pid Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying rdd(35) Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'Redundancy interface management process', pid 2096, signal 1, status 0 with notification errors enabled Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying cfmd(42) Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'Connectivity fault management process', pid 2102, signal 1, status 0 with notification errors enabled Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying sdxd(74) Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: cannot signal 'Service Deployment System (SDX) process', signal 1, no pid Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying l2ald(76) Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'Layer 2 address flooding and learning process', pid 2083, signal 1, status 0 with notification errors enabled Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying l2cpd(78) Aug 21 18:56:48 RE0 l2cp[2099]: %DAEMON-6-L2CPD_TASK_REINIT: Reinitialized Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'Layer 2 Control Protocol process', pid 2099, signal 1, status 0 with notification errors enabled Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: notifying mcsnoopd(81) Aug 21 18:56:48 RE0 mcsn[2109]: %DAEMON-6-MCSN_TASK_REINIT: Reinitializing Aug 21 18:56:48 RE0 l2cp[2099]: %DAEMON-5: Read acess profile () config Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'Multicast Snooping process', pid 2109, signal 1, status 0 with notification errors enabled Aug 21 18:56:48 RE0 mcsn[2109]: %DAEMON-6: task_reconfigure reinitializing done Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: Rotate backup configs Aug 21 18:56:48 RE0 l2cp[2099]: %DAEMON-6: task_reconfigure reinitializing done Aug 21 18:56:48 RE0 rpd[1344]: %DAEMON-6: task_reconfigure reinitializing done Aug 21 18:56:48 RE0 /kernel: %KERN-4: bundle ae12.2001: bundle IFL state changed to DOWN <<<<<<<< IFL down Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: signaling 'Simple Network Management Protocol process', pid 2117, signal 31, status 0 with notification errors enabled Aug 21 18:56:48 RE0 mib2d[2116]: %DAEMON-4-SNMP_TRAP_LINK_DOWN: ifIndex 681, ifAdminStatus down(2), ifOperStatus down(2), ifName ae12.2001 Aug 21 18:56:48 RE0 mib2d[2116]: %DAEMON-4-SNMP_TRAP_LINK_DOWN: ifIndex 677, ifAdminStatus down(2), ifOperStatus down(2), ifName ge-2/1/2.2001 Aug 21 18:56:48 RE0 mx4815 mib2d[2116]: %DAEMON-4-SNMP_TRAP_LINK_DOWN: ifIndex 679, ifAdminStatus down(2), ifOperStatus down(2), ifName ge-3/1/2.2001 Aug 21 18:56:48 RE0 mgd[87171]: %INTERACT-6-UI_COMMIT_PROGRESS: Commit operation in progress: commit complete
This race condition between committing config (disabling the logical interface) and sending priority=0 packet is due to a software issue. The issue was resolved by PR 559366. This issue is reproducible in Junos OS 10.4, 11.1, 11.2, 11.3 and 11.4R1 releases.
The fix is available in 11.4R2 and 12.1 and later. This issue could be worked around by setting lower priority value on the current master (lower than 90). It can initiate VRRP mastership switch over.The IFL can be disabled after the mastership switch over is completed. In case the physical cable is down on the master, this long down time canot be avoided because the backup is unable to detect the downed link beyond the L2SW, and it has to wait until the hold-timer expires.