The chassis process (chassisd), which controls hardware components on the routing platform, shut down the routing platform because the temperature of one or more components exceeded the indicated threshold temperature for the indicated amount of time. Continued operation at the excessive temperature could damage the routing platform.
The CHASSISD_OVER_TEMP_SHUTDOWN_TIME message is logged when the routing platform is powered down because the temperature of one or more components stayed above a threshold temperature for a designated period of time.
When the routing platform is powered off because of an over-temperature condition, the system creates a syslog entry similar to the following example:
chassisd[1097]: CHASSISD_OVER_TEMP_SHUTDOWN_TIME: Chassis temperature above 90 degrees C for too long (> 240 seconds); powering down all FRUs
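When many log files need to be checked, the threshold and duration can be pulled out of this entry with a short script. The following is a minimal Python sketch, not a Juniper tool. The pattern is modeled only on the sample line above, so it is an assumption that other releases word the message the same way, and the name parse_shutdown_message is made up for this example.

```python
import re

# Pattern modeled on the single sample line in this article (assumption:
# other Junos releases may word the message differently).
SHUTDOWN_RE = re.compile(
    r"CHASSISD_OVER_TEMP_SHUTDOWN_TIME: Chassis temperature above "
    r"(?P<threshold_c>-?\d+) degrees C for too long "
    r"\(> (?P<seconds>\d+) seconds\)"
)


def parse_shutdown_message(line):
    """Return (threshold_c, seconds) for a shutdown entry, or None if no match."""
    match = SHUTDOWN_RE.search(line)
    if match is None:
        return None
    return int(match.group("threshold_c")), int(match.group("seconds"))


sample = (
    "chassisd[1097]: CHASSISD_OVER_TEMP_SHUTDOWN_TIME: Chassis temperature "
    "above 90 degrees C for too long (> 240 seconds); powering down all FRUs"
)
print(parse_shutdown_message(sample))  # (90, 240)
```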
One or more of the following messages can precede this log message:
chassisd[1618]: %DAEMON-3-CHASSISD_TEMP_HOT_NOTICE: FPC 0 temperature of 96 degrees C is above limit (95 degrees)
chassisd[1085]: %DAEMON-5-CHASSISD_BLOWERS_SPEED_FULL: Fans and impellers being set to full speed [system warm]
chassisd[1085]: %DAEMON-4-CHASSISD_HIGH_TEMP_CONDITION: Chassis temperature over 90 degrees C (but no fan/impeller failure detected)
alarmd[1098]: Alarm set: Temp sensor color=RED, class=CHASSIS, reason=Temperature Hot
chassisd[1097]: CHASSISD_OVER_TEMP_CONDITION: Chassis temperature over 90 degrees C (but no fan/impeller failure detected); routing platform will shutdown in 240 seconds if condition persists
craftd[1099]: Major alarm set, Temperature Hot
CHASSISD_SENSOR_RANGE_NOTICE: FPC 7 temperature is -108 degrees C, which is outside operating range
FPC 7 temp sensor not ok, status 0x8 failed 11 times
send: yellow alarm clear, device FPC 7, reason FPC 7 TmpSnsr Fail
I2CS write cmd to Left Fan Tray#0 [0x10], reg 0x50, cmd 0x8f
When the system shuts down, the chassisd process will turn off the power modules so that the system stays off, rather than just rebooting. The chassis will then have to be manually powered on to restore functionality.
The usual cause is that the temperature sensor on one or more components reports an over-temperature condition that persists continuously for longer than a threshold period of time. However, a faulty temperature sensor can also cause this issue.
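The "continuously for longer than a threshold period" rule can be illustrated with a small Python sketch. This is only a model of the behavior described above, not the actual chassisd logic. The default values of 90 degrees C and 240 seconds are taken from the sample messages, and the function name and the sample readings are hypothetical.

```python
def first_shutdown_time(samples, threshold_c=90, hold_seconds=240):
    """Model the shutdown rule on (time_seconds, temp_c) samples sorted by time.

    Returns the time of the first sample at which the temperature has stayed
    above threshold_c for more than hold_seconds without interruption, or
    None if that never happens.
    """
    over_since = None  # time at which the current over-temperature run began
    for t, temp_c in samples:
        if temp_c > threshold_c:
            if over_since is None:
                over_since = t
            elif t - over_since > hold_seconds:
                return t
        else:
            # Any reading at or below the threshold resets the timer.
            over_since = None
    return None


# Hypothetical readings taken once a minute.
sustained = [(0, 85), (60, 92), (120, 93), (180, 94), (240, 95), (300, 96), (360, 96)]
interrupted = [(0, 92), (200, 89), (260, 93), (400, 94)]
print(first_shutdown_time(sustained))
print(first_shutdown_time(interrupted))
```

In the second series the dip to 89 degrees C restarts the timer, which is why a brief temperature spike does not by itself lead to a shutdown under this model.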
Determine the cause of the system shutdown by examining the messages preceding the CHASSISD_OVER_TEMP_SHUTDOWN_TIME message in the output of the following command:
show log messages
There may be messages stating that a temperature sensor has failed or that a specific component is reporting an environmental temperature higher than that recommended.
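If the log is long, the search can be narrowed with a script. The Python sketch below assumes the output of show log messages has been saved and loaded as a list of text lines. It returns the temperature-related entries found shortly before the shutdown message. The marker strings are taken from the sample messages in this article and are not an exhaustive list. The function name, the window size, and the unrelated sample entry are made up for illustration.

```python
# Substrings taken from the sample messages in this article (not exhaustive).
RELATED_MARKERS = (
    "CHASSISD_TEMP_HOT_NOTICE",
    "CHASSISD_BLOWERS_SPEED_FULL",
    "CHASSISD_HIGH_TEMP_CONDITION",
    "CHASSISD_OVER_TEMP_CONDITION",
    "CHASSISD_SENSOR_RANGE_NOTICE",
    "Temperature Hot",
    "temp sensor not ok",
    "TmpSnsr Fail",
)


def messages_before_shutdown(lines, window=50):
    """Return related entries among the `window` lines before the first shutdown entry."""
    for index, line in enumerate(lines):
        if "CHASSISD_OVER_TEMP_SHUTDOWN_TIME" in line:
            start = max(0, index - window)
            return [
                entry for entry in lines[start:index]
                if any(marker in entry for marker in RELATED_MARKERS)
            ]
    return []  # no shutdown entry in this log


log_lines = [
    "chassisd[1618]: %DAEMON-3-CHASSISD_TEMP_HOT_NOTICE: FPC 0 temperature of 96 degrees C is above limit (95 degrees)",
    "example[1]: unrelated entry (made up for this sketch)",
    "craftd[1099]: Major alarm set, Temperature Hot",
    "chassisd[1097]: CHASSISD_OVER_TEMP_SHUTDOWN_TIME: Chassis temperature above 90 degrees C for too long (> 240 seconds); powering down all FRUs",
]
for entry in messages_before_shutdown(log_lines):
    print(entry)
```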
The temperature threshold values for various system components can be viewed by executing this command:
show chassis temperature-thresholds
The current component temperatures can be monitored by entering this command:
show chassis environment
The preceding command can be repeated at intervals to manually check whether the temperature of one or more components is rising.
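Readings collected this way can be compared with a short script. The Python sketch below assumes the temperatures have already been transcribed from repeated runs of the command into a dictionary keyed by component name. It does not parse the command output, because that format varies by platform. The function name, the 3-degree minimum rise, and the sample readings are hypothetical.

```python
def rising_components(readings, min_rise_c=3):
    """Flag components whose temperature keeps climbing across readings.

    readings maps a component name to its temperatures in degrees C, in the
    order they were taken. A component is flagged when no reading is lower
    than the one before it and the last reading exceeds the first by at
    least min_rise_c. Returns a dict of component name -> total rise.
    """
    flagged = {}
    for name, temps in readings.items():
        if len(temps) < 2:
            continue  # a trend needs at least two readings
        never_drops = all(later >= earlier for earlier, later in zip(temps, temps[1:]))
        rise = temps[-1] - temps[0]
        if never_drops and rise >= min_rise_c:
            flagged[name] = rise
    return flagged


# Hypothetical readings from three runs of the command.
readings = {
    "FPC 0": [60, 64, 70],
    "FPC 1": [45, 45, 46],
    "CB 0": [50, 48, 55],
}
print(rising_components(readings))
```

A component that is flagged here is a good first candidate for the airflow, fan, and reseating checks that follow.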
Follow these steps to troubleshoot this issue:
- Check the environment around the chassis to verify that airflow to and from the unit is not restricted. Air must be able to enter and exit only through the air vents on the chassis. Restricted vents lower the effectiveness of the system cooling. Also make sure that the air intake is not next to the exhaust of another system that generates heated air, as this also reduces cooling efficiency.
- Check the air filters on the platform and verify that they are clean.
- Verify that empty FPC slots have blank cover plates installed, to preserve the integrity of the airflow inside the chassis. All system components (FPCs, CBs, PICs, REs, etc.) should fit snugly in their slots and not have gaps that allow air to flow around the component to the outside environment.
- Verify that the fans are operating by looking at and listening to them. If a fan or fan tray does not seem to be running as expected, or there are log messages reporting a fan failure, try reseating the fan in its slot. If there are still messages reporting a fan failure, open a case with JTAC for further investigation.
- If a specific component is reporting as being over temperature, or is reporting a temperature sensor failure, try reseating that component. Some components, such as FPCs, may also be moved to a different slot in the chassis to see if that resolves the issue.
- If there are still messages reporting component temperature rising steadily, or over temperature, or a temperature sensor failure at this point, open a case with your technical support representative for further investigation.