MQCHIP(#) FI Reorder cell timeout

This article describes the following syslog message:

This message reports that a packet was dropped in fabric traffic.

When an FI Reorder cell timeout event occurs, a message similar to the following is reported:

The message is usually caused by a communication issue between an SCB and an MQ chip-based (Trio) FPC card.

The message can be triggered by either software or hardware interactions; log analysis is required to determine which.

Hardware sources might include the FPC involved, or the Control Board that governs traffic between the FPCs. For example, manually taking an FPC offline or pulling a Control Board might cause this message to occur on an MQ chip FPC.

Software can produce this syslog message if an FPC core dump has occurred as a result of some other issue.

The error can also occur if cells arriving over the fabric are delayed beyond a preset arrival window on the FPC. A software-triggered condition, such as an MQ wedge or congestion in the MQ chip’s fabric input block, can cause this delay.
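To make the arrival-window idea concrete, here is a deliberately simplified toy model in Python. It is not the MQ chip's actual logic, which this article does not describe. The function name, the time units, and the rule "all cells must arrive within `window` of the earliest cell" are assumptions made for illustration only.

```python
from typing import Optional, Sequence


def reorder_times_out(arrival_times: Sequence[Optional[float]], window: float) -> bool:
    """Toy model: report a timeout if a cell is missing or arrives too late.

    arrival_times holds one entry per expected cell; None means the cell
    never arrived. `window` is a hypothetical arrival window in the same
    (arbitrary) time units as the arrival times.
    """
    if not arrival_times:
        return False  # nothing expected, nothing to time out
    if any(t is None for t in arrival_times):
        return True  # a lost cell can never complete the packet
    # All cells arrived: compare the spread of arrivals with the window.
    return max(arrival_times) - min(arrival_times) > window
```

In this model, `reorder_times_out([0.0, 1.0, 9.0], 5.0)` reports a timeout because the last cell arrives 9 units after the first, while `reorder_times_out([0.0, 1.0, 2.0], 5.0)` does not.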

Perform these steps to determine the cause and resolve the problem (if any). Continue through each step until the problem is resolved.

1. Collect the show command output on the Routing Engine.

Capture the output to a file (in case you have to open a technical support case). To do this, configure your SSH client or terminal emulator to log the session.

Capture the following set of commands two or three times:

Note: Replace the # character above with the corresponding FPC or PFE chip number, as provided in the syslog messages.

Collect the FPC outputs three times, at 1-minute intervals. Comparing the captures shows whether the error counts are still changing.
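The comparison across the three captures can be sketched offline in Python. This is a minimal sketch that assumes you have already transcribed the counters from each capture into a dictionary by hand; the counter names used in the example (`reorder_timeouts`, `crc_errors`) are hypothetical placeholders, not actual field names from the FPC output.

```python
from typing import Dict, List


def counter_deltas(snapshots: List[Dict[str, int]]) -> Dict[str, List[int]]:
    """Return, per counter, the change between each consecutive pair of snapshots."""
    names = sorted({name for snap in snapshots for name in snap})
    deltas: Dict[str, List[int]] = {}
    for name in names:
        # A counter absent from a snapshot is treated as 0 (an assumption).
        values = [snap.get(name, 0) for snap in snapshots]
        deltas[name] = [later - earlier for earlier, later in zip(values, values[1:])]
    return deltas


def still_incrementing(snapshots: List[Dict[str, int]]) -> List[str]:
    """Return the counters that increased in every interval between snapshots."""
    return [
        name
        for name, steps in counter_deltas(snapshots).items()
        if steps and all(step > 0 for step in steps)
    ]


# Three captures taken one minute apart (values are made up for illustration).
captures = [
    {"reorder_timeouts": 10, "crc_errors": 2},
    {"reorder_timeouts": 14, "crc_errors": 2},
    {"reorder_timeouts": 19, "crc_errors": 2},
]
print(still_incrementing(captures))
```

A counter that rises in every interval points to an ongoing condition, whereas a counter that stays flat probably reflects a past, one-time event.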

2. Analyze the show command output. Look for any related events that occurred at or just before the FI Reorder cell timeout message in the syslog or chassisd outputs.

a. Did the event message appear shortly after a commit command was executed or after cards were inserted, removed, or taken offline?

  • Yes – The message is expected and can be ignored.
  • No – Continue to Step 2b.

b. Did the system generate an FPC core-dump file with a timestamp corresponding to the moment that the event message appeared?

  • Yes – Open a case with your technical support representative to investigate the issue further. Attach the information collected above to the case.
  • No – Continue to Step 3.
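The correlation in Step 2 (finding events at or just before the timeout message) can be sketched in Python as follows. This assumes the log lines have already been parsed into `(timestamp, text)` pairs; the 60-second window and the sample event texts are illustrative assumptions, not real Junos log lines.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[datetime, str]


def related_events(
    events: List[Event],
    marker: str = "FI Reorder cell timeout",
    window: timedelta = timedelta(seconds=60),  # assumed look-back window
) -> List[Event]:
    """Return non-marker events logged at, or up to `window` before, a marker event."""
    hits = [ts for ts, text in events if marker in text]
    related = []
    for ts, text in events:
        if marker in text:
            continue  # skip the timeout messages themselves
        if any(timedelta(0) <= hit - ts <= window for hit in hits):
            related.append((ts, text))
    return related


# Hypothetical, hand-written sample events for illustration only.
sample = [
    (datetime(2024, 1, 1, 10, 0, 0), "fpc 2 taken offline (example text)"),
    (datetime(2024, 1, 1, 10, 0, 20), "MQCHIP(1) FI Reorder cell timeout"),
    (datetime(2024, 1, 1, 11, 30, 0), "unrelated event (example text)"),
]
print(related_events(sample))
```

Here only the 10:00:00 event is returned, because it falls within the window before the timeout message. That is the kind of card removal or offline event Step 2a asks about.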

3. Walk the fabric planes by taking one plane offline at a time and checking whether the message continues to appear. This approach assumes that the messages are occurring fairly regularly. Do this only during a maintenance window, as it will impact transit traffic.
Note that taking a fabric plane offline may itself trigger these errors, because in-flight fabric cells are lost.

4. Reset the MQ chip FPC referenced in the syslog message. Do this only during a maintenance window, as it will impact transit traffic.
Did the event message stop appearing in the syslog?

  • Yes – The problem is fixed.
  • No – Continue to Step 5.

5. Reseat the FPC card and the Control Board, and check for bent pins on the connectors (with a flashlight). Do this only during a maintenance window, as it will impact transit traffic.

Did the event message stop appearing in the syslog?

  • Yes – The problem is fixed.
  • No – Continue to Step 6.

6. Swap the FPC with a spare, or swap two FPC cards in each other’s slots. Do this only during a maintenance window, as it will impact transit traffic.

Did the event message stop appearing in the syslog?

  • Yes – The problem is fixed.
  • No – Continue to Step 7.

7. If these efforts do not resolve the problem, contact your technical support representative to investigate the issue further.

About the author

Prasanna
