This article shows how to keep traffic flowing through a failover in a chassis cluster environment when individual interfaces (xe, ge, or fe) are used instead of standard cluster interfaces (reth).
The problem is shown in the topology below. It is based on the following assumptions:
- The Layer 3 links to Router 1 and Router 3 terminate on node0 only.
- The Layer 3 links to Router 2 and Router 4 terminate on node1 only.
- The routers run dynamic routing protocols.
- The primary path between Host-1 and Host-2 is as follows:
R1 --- NODE-0 --- R3
- If node0 goes down, routing should change to the following path:
R2 --- NODE-1 --- R4
- After node0 fails, node1 takes over ownership of RG-0 (the control plane) and routing reconverges, but the sessions that involve individual interfaces of node0 are no longer available. Any session-sensitive traffic (especially TCP) then fails, and users must restart the application.
The 5-tuple information (source IP address, destination IP address, source port number, destination port number, and protocol) used to create sessions, along with interfaces, is synchronized to node1.
During the Routing Engine failover from node0, however, the session table scan on node1 flushes every session entry that references an individual interface that no longer exists because node0 has failed.
Even after the routing protocols converge, any session-sensitive communication (such as TCP) between the end hosts remains broken and needs to be restarted.
The solution below shows how to recover sessions when a node fails. The process behind the recovery is as follows:
- On losing node0, the dynamic routing protocols reconverge.
- To continue forwarding the traffic, node1 must use the same session that existed before node0 failed.
- To use the same session on node1, ensure the following:
- The session was established using a reth interface.
- The corresponding reth interfaces (on both devices) are in the same security zone.
- The interfaces are configured as follows:
set interfaces ge-0/0/3 gigether-options redundant-parent reth0
set interfaces ge-0/0/4 gigether-options redundant-parent reth2
set interfaces ge-7/0/5 gigether-options redundant-parent reth1
set interfaces ge-7/0/6 gigether-options redundant-parent reth3
Note: The interfaces reth0 and reth2 are active on node0, whereas reth1 and reth3 are active on node1.
- For simplicity, all interfaces are bound to the same zone.
set security zones security-zone trust host-inbound-traffic system-services all
set security zones security-zone trust host-inbound-traffic protocols all
set security zones security-zone trust interfaces reth2.0
set security zones security-zone trust interfaces reth3.0
set security zones security-zone trust interfaces reth1.0
set security zones security-zone trust interfaces reth0.0
- You can create multiple zones, but you must ensure that interfaces facing R1 and R2 are bound to the same zone, while interfaces pointing toward R3 and R4 are part of the same zone.
- Preempt is configured on each RG that carries a reth interface (RG-1 through RG-4 in the sample configuration; RG-0 has no preempt statement). Preemption enables the associated redundancy group (and its reth interface) to fail back to the original node, bring up the interfaces, and reestablish the protocol adjacency after node0 recovers.
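The redundancy-group side of this design can be written in set format as follows. This is a sketch transcribed from the sample configuration in this article, not an additional requirement: each reth interface gets its own redundancy group, the node that holds the member link has the higher priority, and preempt is enabled. Adjust the priorities and group numbers to your own design.

```
# One redundancy group per reth; values copied from the sample configuration.
set chassis cluster reth-count 4

# RG-1 and RG-3 prefer node0 (reth0 and reth2).
set chassis cluster redundancy-group 1 node 0 priority 254
set chassis cluster redundancy-group 1 node 1 priority 1
set chassis cluster redundancy-group 1 preempt
set chassis cluster redundancy-group 3 node 0 priority 254
set chassis cluster redundancy-group 3 node 1 priority 1
set chassis cluster redundancy-group 3 preempt

# RG-2 and RG-4 prefer node1 (reth1 and reth3).
set chassis cluster redundancy-group 2 node 0 priority 1
set chassis cluster redundancy-group 2 node 1 priority 254
set chassis cluster redundancy-group 2 preempt
set chassis cluster redundancy-group 4 node 0 priority 1
set chassis cluster redundancy-group 4 node 1 priority 254
set chassis cluster redundancy-group 4 preempt

# Bind each reth to its redundancy group.
set interfaces reth0 redundant-ether-options redundancy-group 1
set interfaces reth1 redundant-ether-options redundancy-group 2
set interfaces reth2 redundant-ether-options redundancy-group 3
set interfaces reth3 redundant-ether-options redundancy-group 4
```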
Note: This solution does not cover Z-mode traffic flow.
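If you prefer the multiple-zone option mentioned above, the zone binding might look like the sketch below. Several things here are assumptions and not part of the original article: the zone names side-a and side-b and the policy names are hypothetical, and the mapping of reth0/reth1 to the R1/R2 side and reth2/reth3 to the R3/R4 side is inferred from the sample configuration, so verify it against your own cabling before use.

```
# Hypothetical zone names; interface-to-router mapping is an assumption.
# Interfaces facing R1 (node0) and R2 (node1) share one zone.
set security zones security-zone side-a host-inbound-traffic system-services all
set security zones security-zone side-a host-inbound-traffic protocols all
set security zones security-zone side-a interfaces reth0.0
set security zones security-zone side-a interfaces reth1.0

# Interfaces facing R3 (node0) and R4 (node1) share the other zone.
set security zones security-zone side-b host-inbound-traffic system-services all
set security zones security-zone side-b host-inbound-traffic protocols all
set security zones security-zone side-b interfaces reth2.0
set security zones security-zone side-b interfaces reth3.0

# Permit traffic between the two zones in both directions (hypothetical policy names).
set security policies from-zone side-a to-zone side-b policy A-to-B match source-address any
set security policies from-zone side-a to-zone side-b policy A-to-B match destination-address any
set security policies from-zone side-a to-zone side-b policy A-to-B match application any
set security policies from-zone side-a to-zone side-b policy A-to-B then permit
set security policies from-zone side-b to-zone side-a policy B-to-A match source-address any
set security policies from-zone side-b to-zone side-a policy B-to-A match destination-address any
set security policies from-zone side-b to-zone side-a policy B-to-A match application any
set security policies from-zone side-b to-zone side-a policy B-to-A then permit
```

With this layout a session created through reth0.0 and reth2.0 on node0 still matches the same zone pair when traffic arrives through reth1.0 and reth3.0 on node1, which is the condition listed above.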
Configuration
The sample configuration on each node is below. Modify this solution as needed based on your requirements.
Cluster Devices
groups {
    node0 {
        system {
            host-name ff-node-0;
            backup-router 10.10.1.1 destination 172.27.1.0/24;
        }
        interfaces {
            fxp0 {
                unit 0 {
                    family inet {
                        address 10.10.1.220/24;
                    }
                }
            }
        }
    }
    node1 {
        system {
            host-name ff-node-1;
            backup-router 10.10.1.1 destination 172.27.1.0/24;
        }
        interfaces {
            fxp0 {
                unit 0 {
                    family inet {
                        address 10.10.1.221/24;
                    }
                }
            }
        }
    }
}
apply-groups "${node}";
system {
    root-authentication {
        encrypted-password "$1$6XEK5N/m$8fEdPfCeMNEJnzYJo6w420"; ## SECRET-DATA
    }
    login {
        user ketan {
            uid 2000;
            class super-user;
            authentication {
                encrypted-password "$1$mr5WV8P2$3BHL0ghKpqDQ1L/yce2v10"; ## SECRET-DATA
            }
        }
    }
    services {
        ssh;
        telnet;
    }
    syslog {
        user * {
            any emergency;
        }
        file messages {
            any any;
            authorization info;
        }
        file interactive-commands {
            interactive-commands any;
        }
    }
    license {
        autoupdate {
            url https://ae1.juniper.net/junos/key_retrieval;
        }
    }
}
chassis {
    cluster {
        control-link-recovery;
        reth-count 4;
        redundancy-group 0 {
            node 0 priority 254;
            node 1 priority 1;
        }
        redundancy-group 1 {
            node 1 priority 1;
            node 0 priority 254;
            preempt;
        }
        redundancy-group 2 {
            node 1 priority 254;
            node 0 priority 1;
            preempt;
        }
        redundancy-group 3 {
            node 0 priority 254;
            node 1 priority 1;
            preempt;
        }
        redundancy-group 4 {
            node 0 priority 1;
            node 1 priority 254;
            preempt;
        }
    }
}
interfaces {
    ge-0/0/3 {
        gigether-options {
            redundant-parent reth0;
        }
    }
    ge-0/0/4 {
        gigether-options {
            redundant-parent reth2;
        }
    }
    ge-7/0/5 {
        gigether-options {
            redundant-parent reth1;
        }
    }
    ge-7/0/6 {
        gigether-options {
            redundant-parent reth3;
        }
    }
    fab0 {
        fabric-options {
            member-interfaces {
                ge-0/0/2;
            }
        }
    }
    fab1 {
        fabric-options {
            member-interfaces {
                ge-7/0/2;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                address 5.5.5.5/32;
            }
        }
    }
    reth0 {
        redundant-ether-options {
            redundancy-group 1;
        }
        unit 0 {
            family inet {
                address 30.10.1.20/24;
            }
        }
    }
    reth1 {
        redundant-ether-options {
            redundancy-group 2;
        }
        unit 0 {
            family inet {
                address 50.10.1.20/24;
            }
        }
    }
    reth2 {
        redundant-ether-options {
            redundancy-group 3;
        }
        unit 0 {
            family inet {
                address 40.10.1.20/24;
            }
        }
    }
    reth3 {
        redundant-ether-options {
            redundancy-group 4;
        }
        unit 0 {
            family inet {
                address 60.10.1.20/24;
            }
        }
    }
}
routing-options {
    static {
        route 10.10.0.0/24 next-hop 10.10.1.10;
    }
    router-id 5.5.5.5;
    autonomous-system 100;
}
protocols {
    bgp {
        hold-time 15;
        group internal {
            type internal;
            local-address 5.5.5.5;
            neighbor 3.3.3.3;
            neighbor 4.4.4.4;
        }
    }
    ospf {
        area 0.0.0.0 {
            interface reth1.0 {
                passive;
                metric 100;
            }
            interface lo0.0 {
                passive;
            }
            interface reth0.0 {
                passive;
            }
            interface reth2.0 {
            }
            interface reth3.0 {
                metric 100;
            }
        }
    }
}
policy-options {
    policy-statement nhs {
        term 1 {
            from protocol bgp;
            then {
                next-hop self;
            }
        }
    }
}
security {
    screen {
        ids-option untrust-screen {
            icmp {
                ping-death;
            }
            ip {
                source-route-option;
                tear-drop;
            }
            tcp {
                syn-flood {
                    alarm-threshold 1024;
                    attack-threshold 200;
                    source-threshold 1024;
                    destination-threshold 2048;
                    queue-size 2000; ## Warning: 'queue-size' is deprecated
                    timeout 20;
                }
                land;
            }
        }
    }
    policies {
        from-zone trust to-zone trust {
            policy TRUST-to-TRUST {
                match {
                    source-address any;
                    destination-address any;
                    application any;
                }
                then {
                    permit;
                }
            }
        }
        policy-rematch;
    }
    zones {
        security-zone trust {
            host-inbound-traffic {
                system-services {
                    all;
                }
                protocols {
                    all;
                }
            }
            interfaces {
                reth2.0;
                reth3.0;
                reth1.0;
                reth0.0;
            }
        }
    }
}
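After committing the configuration, one way to check the behavior is to start a long-lived TCP session between Host-1 and Host-2, take node0 out of service, and compare the cluster, routing, and session state before and after. The operational commands below are a suggested checklist, not part of the original procedure. No expected output is shown because it depends on your topology and Junos release.

```
# Which node is primary for each redundancy group, and the state of each reth.
show chassis cluster status
show chassis cluster interfaces

# Confirm that the routing protocols have reconverged through node1.
show ospf neighbor
show bgp summary

# Confirm that the original session is still present and references reth interfaces.
show security flow session
```

If the session entry is still listed after the failover and the application on the hosts keeps running without a restart, the session was recovered on node1 as intended.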