What happens when the Routing Engine’s hard drive fails?

When a hard drive fails, the following usually occurs:

  1. A read or write problem is detected at the driver level.
  2. The hard drive is removed from the boot list, unless it is the only device in the boot list (for example, an M7i without compact flash). This exception prevents a situation where the router cannot reboot.
  3. An emergency /var structure is created in MFS (Memory File System), a virtual disk backed by RAM, so that the system can continue to function.
  4. The on-disk-failure actions specified in the configuration are carried out.

Configuration options for the ‘on-disk-failure’ knob:

For a single RE, set the disk-failure-action to ‘reboot’ under the ‘on-disk-failure’ knob so that the RE reboots when a disk failure is detected.
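
A minimal sketch of the single-RE configuration, assuming the standard disk-failure-action statement under [edit chassis routing-engine on-disk-failure]. Verify the exact hierarchy against the documentation for your Junos release before committing:

```
# Assumed form: reboot the RE when a hard disk error is detected
set chassis routing-engine on-disk-failure disk-failure-action reboot
```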

Note: You can also specify ‘halt’ as the disk-failure-action, which halts the RE instead of rebooting it.

For redundant REs, the following configuration results in an RE switchover if the Master RE experiences the HDD failure, and in a reboot of the Backup RE if the Backup RE experiences it. This is the current best practice.
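
As a sketch only: this assumes the best-practice configuration is the same on-disk-failure action used for a single RE, applied on a dual-RE chassis. Pairing it with failover on-loss-of-keepalives is also an assumption, drawn from the note on graceful-switchover and on-loss-of-keepalives, and is not a confirmed listing:

```
# Assumed: same disk-failure action as for a single RE
set chassis routing-engine on-disk-failure disk-failure-action reboot
# Assumed: lets the Backup RE take mastership when the Master RE stops sending keepalives
set chassis redundancy failover on-loss-of-keepalives
```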

There is also the option to trigger a failover when the Master RE detects an HDD failure, but it is considered less robust than the best-practice approach above:
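
A sketch of the failover-based option, assuming the on-disk-failure statement under [edit chassis redundancy failover]. Check it against the documentation for your release:

```
# Assumed form: Backup RE takes mastership when the Master RE reports hard disk errors
set chassis redundancy failover on-disk-failure
```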

Note: In this case, graceful-switchover or failover on-loss-of-keepalives must also be configured.

Note: If the on-disk-failure action is not configured, the system might hang in a partially functional state and remain inaccessible. In that case, the only way to recover it is to remove and reinsert the RE.

For more information about on-disk-failure action, refer to the following link:

http://www.juniper.net/techpubs/en_US/junos11.2/topics/task/configuration/chassis-hard-disk-errors-routing-engine-rebooting.html

About the author

Prasanna
