RPD_KRT_Q_RETRIES - Config Router

This article explains why the event message RPD_KRT_Q_RETRIES appears in the syslog.

The RPD_KRT_Q_RETRIES message is sent to the log file each time the routing protocol daemon (RPD) fails to update the kernel. The RPD continues retrying.

When a major change occurs, the RPD sends updates to the kernel so that the routing tables stay current. Such changes include:

A Routing Engine mastership switchover
A Routing Engine reboot
A restart of the routing daemon, which causes a rebuild of the routing tables
Flapping links to next hops
IGP/BGP convergence

These messages can also be generated when the router hits PR836197: Higher priority rpd tasks may be scheduled too often, causing lower priority tasks to appear stalled.

The updates are processed through the kernel routing table (KRT) queue. During periods of heavy activity, the socket connection might run out of buffer space. The built-in flow control then tries to complete the updates by retrying them repeatedly, and each retry writes an RPD_KRT_Q_RETRIES message to the log file.
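The flow-control behavior described above can be pictured with a small simulation. This is a minimal sketch, not Junos code: SimulatedKernelSocket, drain_krt_queue, and the capacity and drain_per_tick parameters are hypothetical names chosen for illustration, and the real RPD internals are not documented here. It only shows the idea that an update stays at the head of the queue, is retried while the buffer is full, and produces one log line per retry.

```python
import errno
from collections import deque


class SimulatedKernelSocket:
    """Hypothetical stand-in for the socket between the RPD and the kernel.

    It buffers at most `capacity` updates; each call to tick() lets the
    kernel consume up to `drain_per_tick` of them.
    """

    def __init__(self, capacity, drain_per_tick=1):
        if drain_per_tick < 1:
            raise ValueError("drain_per_tick must be at least 1")
        self.capacity = capacity
        self.drain_per_tick = drain_per_tick
        self.buffer = deque()
        self.installed = []  # updates the kernel has accepted, in order

    def send(self, update):
        # A full buffer is reported as OSError with errno ENOBUFS
        # ("No buffer space available"), as a real socket would.
        if len(self.buffer) >= self.capacity:
            raise OSError(errno.ENOBUFS, "No buffer space available")
        self.buffer.append(update)

    def tick(self):
        # The kernel consumes some buffered updates, freeing space.
        for _ in range(min(self.drain_per_tick, len(self.buffer))):
            self.installed.append(self.buffer.popleft())


def drain_krt_queue(krt_queue, sock, log):
    """Send every update in krt_queue, retrying whenever the buffer is full.

    Each retry appends an RPD_KRT_Q_RETRIES-style line to `log`. An update
    leaves the queue only after the socket accepts it.
    """
    while krt_queue:
        update = krt_queue[0]
        try:
            sock.send(update)
        except OSError as exc:
            if exc.errno != errno.ENOBUFS:
                raise  # this sketch only models the flow-control case
            log.append(f"RPD_KRT_Q_RETRIES: Route Update: {exc.strerror}")
            sock.tick()  # give the kernel time to drain before retrying
            continue
        krt_queue.popleft()
    while sock.buffer:  # let the kernel finish what is already buffered
        sock.tick()


if __name__ == "__main__":
    queue = deque(f"route-{i}" for i in range(5))
    sock = SimulatedKernelSocket(capacity=2)
    log = []
    drain_krt_queue(queue, sock, log)
    print(len(log), "retries;", sock.installed)
```

In this model every update is eventually installed, which mirrors why buffer-space retries are described as transient: the retries stop once the kernel catches up.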

There are several variations of the RPD_KRT_Q_RETRIES message. The following are some examples from the log:

rpd[1531]: RPD_KRT_Q_RETRIES: Indirect Next Hop Update; No buffer space available
rpd[1531]: RPD_KRT_Q_RETRIES: Route Update: No buffer space available
rpd[1531]: RPD_KRT_Q_RETRIES: Flood Next Hop Update: No such file or directory
rpd[1531]: RPD_KRT_Q_RETRIES: Route Update: Invalid argument
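When a log contains many of these lines, it can help to tally them by update type and reason. The following is a sketch only: the regular expression is derived from the four sample lines above (including the semicolon in the first one), and other Junos releases may format the message differently. The function names classify_krt_retries and is_flow_control_only are hypothetical.

```python
import re
from collections import Counter

# Pattern based on the sample lines in this article (assumption: other
# releases may use a different layout). Both ":" and ";" are accepted
# between the update type and the reason.
KRT_RETRY_RE = re.compile(
    r"rpd\[(?P<pid>\d+)\]: RPD_KRT_Q_RETRIES: "
    r"(?P<update>[^:;]+)[:;] (?P<reason>.+)$"
)


def classify_krt_retries(lines):
    """Count RPD_KRT_Q_RETRIES lines by (update type, reason)."""
    counts = Counter()
    for line in lines:
        m = KRT_RETRY_RE.search(line.rstrip("\n"))
        if m:
            key = (m.group("update").strip(), m.group("reason").strip())
            counts[key] += 1
    return counts


def is_flow_control_only(counts):
    """True if every retry reason is 'No buffer space available'.

    Heuristic only: such retries point to flow control, while any other
    reason (for example 'Invalid argument') deserves a closer look.
    """
    return bool(counts) and all(
        reason == "No buffer space available" for _, reason in counts
    )
```

Reasons other than buffer space, such as "Invalid argument", are the ones the troubleshooting steps that follow are meant to chase down.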

When the RPD resends an update, the following message is also sent to the log file: “Route Update: No buffer space available.” Retries caused by “Route Update: No buffer space available” are the result of flow control; this is a transient condition and has no effect on performance.

Example:

rpd[1531]: RPD_KRT_Q_RETRIES: Route update: No buffer space available

If the retries are persistent rather than transient, further investigation is needed. Contact your technical support representative to open a case.

Perform the following steps to determine whether the messages are caused by transient queue operations or by an error that is permanently blocking the queue. Continue through the steps until the problem is resolved.

1. Collect the show command output.

Capture the output to a file (in case you have to open a technical support case). To do this, configure the SSH client/terminal emulator to log your session.

show krt queue (wait two minutes, then run the command again)
show krt state
show system connection
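If you prefer to script the collection instead of logging the terminal session, the sequence in step 1 can be sketched as follows. This assumes a hypothetical run_cli callable that sends one operational command to the router (for example over an SSH session you have already opened) and returns its text output; collect_snapshots and save_snapshots are illustrative names, not part of Junos.

```python
import time
from datetime import datetime, timezone

# Commands from step 1; "show krt queue" is run once more after the wait.
COMMANDS = ["show krt queue", "show krt state", "show system connection"]


def collect_snapshots(run_cli, interval_s=120, sleep=time.sleep):
    """Return a list of (UTC timestamp, command, output) tuples.

    run_cli is a hypothetical callable: it takes a command string and
    returns that command's output as text.
    """
    snapshots = []

    def capture(cmd):
        ts = datetime.now(timezone.utc).isoformat(timespec="seconds")
        snapshots.append((ts, cmd, run_cli(cmd)))

    capture("show krt queue")  # first queue snapshot
    sleep(interval_s)          # wait two minutes by default
    for cmd in COMMANDS:       # second queue snapshot, then the others
        capture(cmd)
    return snapshots


def save_snapshots(snapshots, path):
    """Write the snapshots to a text file, e.g. to attach to a case."""
    with open(path, "w", encoding="utf-8") as f:
        for ts, cmd, out in snapshots:
            f.write(f"### {ts} {cmd}\n{out}\n")
```

Timestamping each capture makes it easy to confirm that the two queue snapshots really are about two minutes apart.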

2. Analyze the show command output.

The output of show krt state lists a counter for each type of event. While RPD_KRT_Q_RETRIES messages are being generated, a counter that is nonzero and increasing shows that the kernel is still processing the updates correctly. If the counters are not increasing, an error record is stuck in the KRT queue and is being continuously rejected by the kernel.

test@router> show krt state
General state:
Options: <>
Install job is not running
Number of operations queued: 0
Routing table adds: 0
Interface routes: 0
Indirect Next Hop Adds/Changes: 0 Deletes: 0
MPLS Adds: 0
High pri Adds: 0 Changes: 0 Deletes: 0
Normal pri Indirects: 0
Normal pri Adds: 0 Changes: 0 Deletes: 0
Routing Table deletes: 0
Number of operations deferred: 0
Number of operations canceled: 0
Time until next queue run: 0
Routes learned from kernel: 25
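Comparing two captures of show krt state by eye is error-prone, so here is a sketch that parses the "Label: number" pairs and reports which counters grew. It assumes the layout of the sample output above, where several pairs can share a line (for example "High pri Adds: 0 Changes: 0 Deletes: 0"); later labels on such a line are qualified with the leading words of the first label. The function names are hypothetical, and "no counter increased" is only a hint that the queue may be stuck, not proof.

```python
import re

# Matches "Label: <number>" pairs; several pairs can share one line.
PAIR_RE = re.compile(r"(?P<label>[A-Za-z][\w /-]*?):\s*(?P<value>\d+)")


def parse_krt_state(text):
    """Return {label: value} for the numeric counters in show krt state."""
    counters = {}
    for line in text.splitlines():
        pairs = [(m.group("label").strip(), int(m.group("value")))
                 for m in PAIR_RE.finditer(line)]
        if not pairs:
            continue  # e.g. "Options: <>" or "Install job is not running"
        first_label, first_value = pairs[0]
        counters[first_label] = first_value
        # "High pri Adds" -> prefix "High pri", so "Changes" on the same
        # line becomes "High pri Changes".
        prefix = first_label.rsplit(" ", 1)[0] if " " in first_label else ""
        for label, value in pairs[1:]:
            counters[f"{prefix} {label}".strip()] = value
    return counters


def increasing_counters(before, after):
    """Sorted labels whose value grew between two snapshots."""
    return sorted(k for k, v in after.items() if v > before.get(k, 0))
```

An empty result from increasing_counters while retries keep appearing in the log matches the stuck-record case described above.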

The output of show krt queue helps determine whether the rejection is due to ‘no buffer space’ or to an error. It also shows which update the kernel is rejecting.

test@router-re0> show krt queue 
Routing table add queue: 0 queued
Interface add/delete/change queue: 0 queued
Indirect next hop add/change: 0 queued
MPLS add queue: 0 queued
Indirect next hop delete: 0 queued
High-priority deletion queue: 0 queued
High-priority change queue: 0 queued
High-priority add queue: 0 queued
Normal-priority indirect next hop queue: 0 queued
Normal-priority deletion queue: 0 queued
Normal-priority composite next hop deletion queue: 0 queued
Normal-priority change queue: 0 queued
Normal-priority add queue: 0 queued
Routing table delete queue: 0 queued

Ideally, every queue above shows 0. If operations are queued, the number shows how many operations are waiting in that queue. If the queue is draining rather than stuck, the number decreases and eventually reaches 0.
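The two show krt queue captures from step 1 can be compared per queue with a sketch like the one below. It assumes every queue line ends in "N queued", as in the sample output; real output for a nonempty queue may include extra detail lines, which this parser simply ignores. The names parse_krt_queue and classify_queues are hypothetical, and the classification is a heuristic: a count that stays the same over two minutes suggests, but does not prove, a stuck queue.

```python
import re

# One "<queue name>: <count> queued" line per queue, as in the sample.
QUEUE_RE = re.compile(r"^\s*(?P<name>.+?):\s*(?P<count>\d+)\s+queued\s*$")


def parse_krt_queue(text):
    """Return {queue name: queued count} from show krt queue output."""
    queues = {}
    for line in text.splitlines():
        m = QUEUE_RE.match(line)
        if m:
            queues[m.group("name")] = int(m.group("count"))
    return queues


def classify_queues(before, after):
    """Label each queue 'empty', 'draining', or 'not draining'."""
    status = {}
    for name, now in after.items():
        then = before.get(name, 0)
        if now == 0:
            status[name] = "empty"
        elif now < then:
            status[name] = "draining"
        else:
            status[name] = "not draining"  # same or larger after the wait
    return status
```

A queue reported as "not draining" in both captures is the one to mention when opening a technical support case.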

3. Deactivate GRES and NSR to drain the KRT queues.
Note: You may wish to contact your technical support engineer before proceeding with these actions.

If the router has both graceful Routing Engine switchover (GRES) and nonstop routing (NSR) configured, deactivate both; if only GRES is configured, deactivate GRES.

For example, if NSR is enabled:

# deactivate routing-options nonstop-routing

Then deactivate GRES, which is configured under the chassis hierarchy. For example:

# deactivate chassis redundancy

Commit the changes, and the KRT queues will be drained.

4. Restart routing.
You can clear corrupt updates that may be stuck in the KRT queue by restarting the routing protocol daemon (see http://www.juniper.net/techpubs/en_US/junos/topics/task/operational/junos-process-restarting.html). This disrupts all traffic through the router for as long as it takes the router to rebuild the routing tables.

If the issue persists, contact your technical support representative for assistance.
