Reboot reason(s): 0x2: watchdog

This article describes the following syslog message:

Reboot reason(s): 0x2: watchdog

This message indicates a Routing Engine (RE) watchdog timeout. It is reported whenever a watchdog event is triggered.

The 0x2: watchdog message is logged each time the Routing Engine fails to update the watchdog timer in a timely manner.

When a watchdog timer event occurs, a message similar to the following is reported:

savecore: Reboot reason(s): 0x2: watchdog
/kernel: savecore: Reboot reason(s): 0x2: watchdog
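When searching collected logs for this message, it can help to extract the reason code and text programmatically. The following is a minimal sketch, assuming the message always has the form shown above; the function name `parse_reboot_reason` and the regular expression are illustrative and not part of Junos OS.

```python
import re

# Matches lines such as "savecore: Reboot reason(s): 0x2: watchdog".
# The pattern is an assumption based on the sample messages in this article.
REBOOT_REASON = re.compile(r"Reboot reason\(s\):\s*(0x[0-9a-fA-F]+):\s*(\S.*)$")


def parse_reboot_reason(line):
    """Return (code, reason) if the line carries a reboot reason, else None."""
    match = REBOOT_REASON.search(line)
    if match is None:
        return None
    # The code is logged in hexadecimal, e.g. "0x2".
    return int(match.group(1), 16), match.group(2).strip()
```

For example, passing the `/kernel: savecore:` line above yields the code 2 and the reason text `watchdog`, while unrelated lines yield `None`.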

This message is usually accompanied by one or more of the following syslog entries:

savecore: no unsaved dumps found
savecore: no dumps found
init: watchdog (PID 3190) started
/kernel: chip1: mem 0xfebff800-0xfebff80f at device 29.4 on pci0

The watchdog timer timeout can be caused by many different kinds of events.

A timeout occurs when the Routing Engine does not update the watchdog timer in a timely manner. When this happens, the watchdog message is placed in the syslog, and the Routing Engine is reset in an attempt to clear the error condition. Therefore, anything that keeps the Routing Engine busy or places it in a hung state can cause the watchdog timer to time out.
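The general mechanism can be illustrated with a toy model. This is only a conceptual sketch of how a watchdog timer behaves, not the Routing Engine's actual implementation; the class name `WatchdogTimer`, its methods, and the timeout value are all hypothetical.

```python
class WatchdogTimer:
    """Toy model: the owner must call kick() before `timeout` seconds elapse."""

    def __init__(self, timeout):
        self.timeout = timeout  # hypothetical value, in seconds
        self.last_kick = 0.0

    def kick(self, now):
        """Record that the owner updated the timer at time `now`."""
        self.last_kick = now

    def expired(self, now):
        """True if more than `timeout` seconds have passed since the last kick."""
        return now - self.last_kick > self.timeout


wd = WatchdogTimer(timeout=5.0)
wd.kick(now=1.0)                 # a healthy system updates the timer regularly
still_ok = wd.expired(now=4.0)   # False: only 3 seconds since the last update
timed_out = wd.expired(now=7.0)  # True: a busy or hung system missed its update
```

In this model, a system that is too busy or hung never reaches the next `kick()` call, so the timer expires and a reset would follow.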

Some possible root causes of the watchdog timer timeout include the following:

Too much traffic is directed to the Routing Engine, such as a broadcast storm on fxp0.
Hard drive or compact flash access is suddenly lost.
Interaction between hardware and software.
Power failure.
Routing Engine memory error.

Perform these steps to determine the cause and resolve the problem (if any). Continue through each step until the problem is resolved.

1. Collect the show command output on the Routing Engine.

Capture the output to a file in case you need to open a technical support case. To do this, configure your SSH client or terminal emulator to log the session.

show log messages | no-more
show log chassisd | no-more
show system core-dumps

2. Analyze the show command output.

a. What version of the Junos OS is the Routing Engine running? Was the OS created before June 11, 2008?

b. Ensure that the hard drive and the compact flash are accessible or are present in the bootlist. Use the following command to verify the bootlist:

lab@Router-re0> start shell 
% sysctl -a | grep "bootdevs"
machdep.bootdevs: usb,compact-flash,disk,lan << Compact flash and disk should be present in the list.

Yes (both devices are listed) – Continue with Step 2c.
No – Add the affected device back into the bootlist as outlined below, then continue with Step 2c.

Log in to the shell as root.

Issue this command:

sysctl -w machdep.bootdevs=pcmcia-flash,compact-flash,disk,lan

Reboot the Routing Engine, but only during a maintenance window, as rebooting will impact transit traffic.
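If you are reviewing captured output from several Routing Engines, the bootlist check in Step 2b can be scripted. The following is a minimal sketch, assuming the `sysctl` output line has the `machdep.bootdevs: dev1,dev2,...` form shown above; the function name `missing_boot_devices` is hypothetical.

```python
# Devices that this article says should appear in the bootlist.
REQUIRED = ("compact-flash", "disk")


def missing_boot_devices(sysctl_line, required=REQUIRED):
    """Return the required devices that are absent from a machdep.bootdevs line."""
    # Everything after the first colon is the comma-separated device list.
    _, _, value = sysctl_line.partition(":")
    devices = [d.strip() for d in value.split(",") if d.strip()]
    return [d for d in required if d not in devices]
```

An empty result means both devices are listed. A non-empty result names the device to add back to the bootlist.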

c. Verify that fxp0 and lo0 have appropriate firewall filters to limit the amount of traffic that can reach the Routing Engine.

d. Check to see if any Routing Engine memory errors were reported in the syslog prior to the watchdog messages.

e. Check external sources to find out if there was a power failure to the device. Also check the router’s log prior to the watchdog message for any power failure messages.

f. Did the Routing Engine generate a core-dump file with a timestamp corresponding to the moment that the event message appeared?

Yes – Open a case with your technical support representative to investigate the issue further. Attach the information collected above to the case.
No – Continue with Step 3.
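For Steps 2d and 2e, you need the log entries that immediately precede the watchdog message. The following is a minimal sketch that pulls those lines out of a saved copy of the messages log; the function name `lines_before_watchdog` and the default of 20 lines are assumptions for illustration.

```python
def lines_before_watchdog(lines, count=20, marker="0x2: watchdog"):
    """Return up to `count` lines that precede the first line containing `marker`."""
    for index, line in enumerate(lines):
        if marker in line:
            # Clamp at zero so a marker near the top of the log still works.
            return lines[max(0, index - count):index]
    return []  # no watchdog message found in this log
```

Review the returned lines manually for memory error or power failure messages. Because the reboot reason appears to be logged after the Routing Engine restarts, the preceding lines may include boot messages, so a larger `count` may be needed to reach the entries logged before the reset.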

3. If these efforts do not resolve the problem, contact your technical support representative to investigate the issue further.
