PFE:.* LCHIP.*new errors in LSIF*

The PFE:.* LCHIP.*new errors in LSIF* message indicates that the FPC is reporting LCHIP interface errors.

The LCHIP.*new errors in LSIF message is logged each time an LCHIP ASIC detects an error on the L-to-S interface between the LCHIP and Physical Interface Card (PIC) on the egress Flexible PIC Concentrator (FPC). This does not mean that the LCHIP itself has a problem, nor does it mean that the error was generated there.

When a new errors in LSIF event occurs, a message similar to the following is reported:

Other messages are often logged alongside the LCHIP message. It is common to see one or more of the following:

This message can have several causes, such as an interface flap or a hardware state issue in an FPC, PIC, or Switch Interface Board (SIB). The responsible component might not be the one named in the log message, because the message is generated only on traffic egress, not on ingress.

Perform these steps to determine the cause and resolve the problem (if any). Continue through each step until the problem is resolved.

1. Collect the show command output.

Capture the output to a file in case you need to open a technical support case. To do this, configure your SSH client or terminal emulator to log the session.

Run the following commands, and then run them again after about a minute:

Note: Replace the [fpc#] entry above with the FPC that is reporting the log message, and replace [x] with the LCHIP referenced in the log message. Repeat the LCHIP outputs on the FPC after about a minute to detect whether any of the reported counters are increasing over time.
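Comparing the two captures is mechanical, so it can be scripted. The sketch below assumes the counters have already been copied out of the LCHIP output as lines of the hypothetical form `name: value`; the real output layout may differ. `parse_counters` and `increasing_counters` are illustrative names, not Junos tools.

```python
def parse_counters(text: str) -> dict[str, int]:
    """Parse lines of the assumed form 'counter name: 123' into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 1)  # split off the last whitespace-separated token
        if len(parts) == 2 and parts[1].isdigit():
            name = parts[0].strip().rstrip(":").strip()
            counters[name] = int(parts[1])
    return counters


def increasing_counters(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return counters present in both snapshots whose value grew, with the delta."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] > before[name]
    }


first = parse_counters("crc errors: 10\nframing errors: 0")
second = parse_counters("crc errors: 15\nframing errors: 0")
print(increasing_counters(first, second))  # {'crc errors': 5}
```

A counter that appears in the result is still incrementing between the two captures, which is what the note above asks you to look for.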

2. Analyze the show command output. Continue with the following steps.

3. Check the show log messages output to see if there was a link flap, a commit, or an online event of a system component (for example, an FPC or SIB).

  • Yes – The LCHIP message is expected and does not indicate a performance issue in the device. The messages should stop shortly after the event and can be ignored.
  • No – Continue to Step 4.

4. Check the show log messages output to see if an in-service software upgrade (ISSU) was in progress.

  • Yes – Some PICs, such as the 4xCOC12 PIC, must be taken offline before an ISSU is performed; otherwise, this message might be generated.
  • No – Continue to Step 5.
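The log checks in steps 3 and 4 amount to searching the saved show log messages output for a few event types. The sketch below is a minimal illustration: the patterns are example assumptions (SNMP_TRAP_LINK_DOWN/UP and UI_COMMIT are Junos syslog tags, but the FPC/SIB online and ISSU patterns are guesses), so adjust them to what your release actually logs.

```python
import re

# Example patterns only (assumptions); tune them to the messages your release logs.
TRANSIENT_EVENT_PATTERNS = {
    "link flap": re.compile(r"SNMP_TRAP_LINK_(UP|DOWN)"),
    "commit": re.compile(r"UI_COMMIT"),
    "component online": re.compile(r"\b(FPC|SIB)\s*\d*\b.*\bonline\b", re.IGNORECASE),
    "ISSU": re.compile(r"\bISSU\b"),
}


def transient_events(log_lines):
    """Return (label, line) pairs for lines that match a known transient event."""
    found = []
    for line in log_lines:
        for label, pattern in TRANSIENT_EVENT_PATTERNS.items():
            if pattern.search(line):
                found.append((label, line))
                break  # one label per line is enough
    return found
```

If the timestamps of any matches line up with the first LCHIP messages, the Yes branch of step 3 or step 4 applies.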

5. Check whether a link has suddenly been disabled so that traffic can no longer pass through it.

  • Yes – These messages are generated while the link is disabled. The errors will stop incrementing when the link is enabled again.
  • No – Continue to Step 6.

6. Check for associated log messages that reference pm3393, the MAC chip used on the XENPAK 10GE PIC.

a. If “RXXG: packet exceed error” is observed, then a packet received on the line side is larger than the MTU programmed into the pm3393 MAC. Check for an MTU mismatch between the two ends of the link. If there is a mismatch, configure the proper MTU value.

b. If “RXXG: A line interface error is detected” is observed, then malformed packets are being sent to the interface, for example as a result of the interframe gap (IFG) being lowered to an invalid size.

c. If “TXXG: FIFO has errors” messages are seen, check flow control. The IEEE 802.3ae standard supports pause frames, which are used to slow down traffic coming from the line-side device. For example, if the ingress FIFO of the PM3393 MAC is filling up, the MAC can transmit pause frames to the device connected on the line side. That device then stops sending Ethernet packets, which lets the ingress FIFO drain to the link layer device and prevents FIFO overflows. If pause frames are disabled, the FIFO can overflow, generating the FIFO errors and other errors. Therefore, verify that flow control is enabled using the show interfaces extensive <interface-name> command.

d. If RXOAM errors are observed, they are transient and occur when an interface transitions up or down. They are expected and can be ignored.

e. The IEEE 802.3ae standard specifies a bit error rate (BER) objective of 10^-12 for 10GBASE-LR. If the error rate is at or below this value, the error messages can be ignored. If it exceeds this value, inspect the cabling. If that does not resolve the issue, skip to Step 9.
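The two quantitative points in step 6 (the FIFO behavior behind item c and the BER check in item e) can be sketched as follows. This is an illustrative model under stated assumptions, not PM3393 behavior: the BER helper assumes you can count errored *bits* (many counters count frames or blocks, which only approximates BER), and the FIFO is a toy in which the sender simply stops at an assumed `xoff` threshold and resumes at `xon`, a simplified stand-in for pause frames.

```python
BER_TARGET = 1e-12  # IEEE 802.3ae BER objective (10^-12)


def bit_error_rate(errored_bits: int, bit_rate_bps: float, seconds: float) -> float:
    """Errored bits divided by the total number of bits carried in the interval."""
    return errored_bits / (bit_rate_bps * seconds)


def within_ber_target(errored_bits: int, bit_rate_bps: float, seconds: float,
                      target: float = BER_TARGET) -> bool:
    return bit_error_rate(errored_bits, bit_rate_bps, seconds) <= target


def simulate_fifo(arrival: int, drain: int, capacity: int, ticks: int,
                  pause_enabled: bool, xoff: int, xon: int) -> int:
    """Toy ingress FIFO; returns the number of units dropped on overflow.

    With pause enabled, the line-side sender stops once occupancy reaches
    xoff and resumes once it falls to xon.
    """
    occupancy = dropped = 0
    paused = False
    for _ in range(ticks):
        incoming = 0 if paused else arrival
        accepted = min(incoming, capacity - occupancy)
        dropped += incoming - accepted  # overflow: no room left in the FIFO
        occupancy += accepted
        occupancy -= min(drain, occupancy)  # link layer side drains the FIFO
        if pause_enabled:
            if occupancy >= xoff:
                paused = True
            elif occupancy <= xon:
                paused = False
    return dropped


# 10 Gb/s for 1000 s with 5 errored bits: BER = 5e-13, within the objective.
print(within_ber_target(5, 10_000_000_000, 1000))  # True
# Sender faster than the drain: drops without pause, none with pause.
print(simulate_fifo(10, 8, 100, 1000, False, 75, 25) > 0)  # True
print(simulate_fifo(10, 8, 100, 1000, True, 75, 25))  # 0
```

At roughly 10 Gb/s, about 10^12 bits are carried every 100 seconds, so more than about one errored bit per 100 seconds already exceeds the objective.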

7. Check for the PKTR ICELL error. If this error appears, you cannot identify the source of the errors from the logged messages, because the CRC is system-wide and is checked only on the egress FPC. In some cases, crcdrop errors in NLIF messages are also seen.

a. To identify the faulty hardware, first test the SIBs to make sure they are not contributing to the NLIF errors; the FPCs are checked afterward by taking them offline and online one at a time, as described below. The SIBs can be cycled manually without packet loss, assuming a spare SIB is installed. A T320 with 10G interfaces running at high rates might experience some packet loss, because the backup SIB has only a single 10G link to the FPCs, compared with two for the active SIBs. Take an active SIB offline with the following command, and the backup SIB takes its place:

b. Repeat this for each active SIB, checking for errors after each change. Use the following command to check the status of the SIBs between changes:

c. If the log messages do not stop, take the FPCs offline to identify the faulty item (see step e for the command). If every FPC except one is reporting errors, and that FPC has an active interface, skip to Step 9.

d. If more than one FPC is not showing errors, take those FPCs offline one at a time (see step e for the command) until the errors stop, then skip to Step 9.

e. To take the FPCs offline and bring them back online, use the following command:

f. Use the following command to check for NLIF errors:

8. Check whether the LCHIP messages appear together with crcerror errors in NLIF messages but none of the other messages listed above. If so, check whether anything has changed on the FPC or the interface (that is, the PIC). A change in either the software configuration or the hardware would explain the appearance of the LCHIP messages. If reverting the configuration or reseating the hardware resolves the issue, continue monitoring the router. If the CRC errors persist, replace the hardware.
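The isolation logic of step 7 (items c and d) can be sketched as a small search: the FPCs that are not reporting errors are treated as suspects, and the suspects are taken offline one at a time until the errors stop. The same loop fits the SIB check in item a, with SIB names as candidates. `take_offline`, `bring_online`, and `errors_stopped` are hypothetical callbacks standing in for the manual CLI steps and the NLIF error check; they are not a Junos API.

```python
from typing import Callable, Iterable, Optional


def suspect_fpcs(all_fpcs: Iterable[str], reporting_fpcs: Iterable[str]) -> list[str]:
    """FPCs that are NOT reporting errors; CRC is checked only on egress,
    so the quiet FPC(s) are the candidates in steps c and d."""
    reporting = set(reporting_fpcs)
    return [fpc for fpc in all_fpcs if fpc not in reporting]


def isolate_faulty(candidates: Iterable[str],
                   take_offline: Callable[[str], None],
                   bring_online: Callable[[str], None],
                   errors_stopped: Callable[[], bool]) -> Optional[str]:
    """Take candidates offline one at a time; return the first whose removal
    stops the errors (it is left offline), or None if none does."""
    for component in candidates:
        take_offline(component)
        if errors_stopped():
            return component  # likely culprit: leave it offline
        bring_online(component)  # not the culprit: restore it before the next one
    return None
```

If `suspect_fpcs` returns a single FPC with an active interface, step c says to go straight to Step 9 without running the loop.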

About the author

Prasanna
