This article explains why the event message RPD_KRT_Q_RETRIES appears in the syslog.
An RPD_KRT_Q_RETRIES message is sent to the log file each time the routing protocol daemon (RPD) fails to update the kernel. The routing protocol daemon continues retrying.
When a major change occurs, the routing protocol daemon (RPD) sends updates to the kernel to maintain the current status of the routing tables. These changes can include:
- A Routing Engine mastership switchover
- A Routing Engine reboot
- A restart of the routing daemon, which causes a rebuild of the routing tables
- Links to next hops flapping
- IGP/BGP convergence
These messages can also be generated on a router affected by PR836197: “Higher priority rpd tasks may be scheduled too often causing lower priority tasks appearing to be stalled.”
The updates are processed through the kernel routing table (KRT) queue. During this state of high activity, the socket connection might run out of buffer space. The built-in flow control then tries to complete processing by repeatedly resending the update to the KRT queue. Each time an update is resent, an RPD_KRT_Q_RETRIES message is sent to the log file.
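The retry behavior described above can be pictured with a toy model. This is only a sketch: the `BoundedBuffer` class, the `push_updates` function, and the `drain_per_retry` parameter are hypothetical names invented for illustration and do not reflect how rpd or the kernel are implemented. It shows a sender that keeps retrying the same update while a fixed-size buffer is full, recording one log line per failed attempt.

```python
from collections import deque

# Message text taken from the log examples in this article.
RETRY_MESSAGE = "RPD_KRT_Q_RETRIES: Route Update: No buffer space available"


class BoundedBuffer:
    """Hypothetical stand-in for the socket buffer between sender and receiver."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.items = deque()

    def send(self, update):
        """Accept the update if there is room; return False when the buffer is full."""
        if len(self.items) >= self.capacity:
            return False
        self.items.append(update)
        return True

    def drain(self, count):
        """Simulate the receiver consuming up to `count` buffered updates."""
        for _ in range(min(count, len(self.items))):
            self.items.popleft()


def push_updates(updates, buffer, drain_per_retry=1):
    """Send every update in order, retrying the same update while the buffer is full."""
    if drain_per_retry < 1:
        raise ValueError("drain_per_retry must be at least 1 or the loop cannot finish")
    pending = deque(updates)
    log = []
    while pending:
        if buffer.send(pending[0]):
            pending.popleft()
        else:
            log.append(RETRY_MESSAGE)      # one log line per failed attempt
            buffer.drain(drain_per_retry)  # the receiver catches up before the retry
    return log
```

In this model a burst larger than the buffer produces a handful of retry messages and still completes, which matches the transient case. A receiver that never drains would correspond to the stuck case, which the model deliberately refuses to simulate.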
There are several variations of the RPD_KRT_Q_RETRIES message. Following are some examples of RPD_KRT_Q_RETRIES messages from the log:
rpd[1531]: RPD_KRT_Q_RETRIES: Indirect Next Hop Update; No buffer space available
rpd[1531]: RPD_KRT_Q_RETRIES: Route Update: No buffer space available
rpd[1531]: RPD_KRT_Q_RETRIES: Flood Next Hop Update: No such file or directory
rpd[1531]: RPD_KRT_Q_RETRIES: Route Update: Invalid argument
When the RPD resends an update because the socket ran out of buffer space, the logged message reads “Route Update: No buffer space available.” Retries with this reason are caused by flow control and are a transient condition. They have no effect on performance.
Example:
rpd[1531]: RPD_KRT_Q_RETRIES: Route update: No buffer space available
If the retries are not transient and persist, further investigation is needed. Contact your technical support representative to open a case.
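When a log file contains many of these messages, it can help to tally them by reason. The following is a minimal sketch, not a supported tool: the `classify` function and the `transient`/`investigate` labels are hypothetical, and treating every “No buffer space available” reason as flow control (rather than only the “Route Update” case stated above) is an assumption.

```python
import re
from collections import Counter

# Matches the message shape shown in the examples above, for instance:
#   rpd[1531]: RPD_KRT_Q_RETRIES: Route Update: No buffer space available
# The separator after the operation name is ':' or ';', as in the samples.
PATTERN = re.compile(
    r"RPD_KRT_Q_RETRIES:\s*(?P<operation>[^:;]+)[:;]\s*(?P<reason>.+?)\s*$"
)

# Assumption: this reason indicates flow control for any operation type.
TRANSIENT_REASON = "no buffer space available"


def classify(lines):
    """Count retry messages keyed by (operation, reason, kind)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # not an RPD_KRT_Q_RETRIES message
        operation = match.group("operation").strip()
        reason = match.group("reason").strip()
        kind = "transient" if reason.lower() == TRANSIENT_REASON else "investigate"
        counts[(operation, reason, kind)] += 1
    return counts
```

A typical use would be `classify(open("messages"))`, where `messages` is a local copy of the log file. Entries labeled `investigate`, or `transient` entries that keep accumulating over a long period, are the ones worth checking with the steps below.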
Perform these steps to determine whether the messages are caused by transient queue operations or by an error that is permanently blocking the queue. Continue through each step until the problem is resolved.
1. Collect the show command output.
Capture the output to a file (in case you have to open a technical support case). To do this, configure the SSH client/terminal emulator to log your session.
show krt queue
(wait two minutes and repeat the command)
show krt state
show system connections
2. Analyze the show command output.
The output of show krt state lists a label for each type of event. While RPD_KRT_Q_RETRIES messages are being generated, if the number to the right of a label is not zero and is increasing, then the kernel is continuing to process the updates correctly. If the numbers to the right of the labels are not increasing, an error record is stuck in the KRT queue and is being continuously rejected by the kernel.
test@router> show krt state
General state:
    Options: <>
    Install job is not running
    Number of operations queued: 0
        Routing table adds: 0
        Interface routes: 0
        Indirect Next Hop Adds/Changes: 0 Deletes: 0
        MPLS Adds: 0
        High pri Adds: 0 Changes: 0 Deletes: 0
        Normal pri Indirects: 0
        Normal pri Adds: 0 Changes: 0 Deletes: 0
        Routing Table deletes: 0
    Number of operations deferred: 0
    Number of operations canceled: 0
    Time until next queue run: 0
    Routes learned from kernel: 25
The output of the show krt queue command helps you determine whether the rejection is due to “no buffer space” or to an error. It also shows which update is being rejected by the kernel.
test@router-re0> show krt queue
Routing table add queue: 0 queued
Interface add/delete/change queue: 0 queued
Indirect next hop add/change: 0 queued
MPLS add queue: 0 queued
Indirect next hop delete: 0 queued
High-priority deletion queue: 0 queued
High-priority change queue: 0 queued
High-priority add queue: 0 queued
Normal-priority indirect next hop queue: 0 queued
Normal-priority deletion queue: 0 queued
Normal-priority composite next hop deletion queue: 0 queued
Normal-priority change queue: 0 queued
Normal-priority add queue: 0 queued
Routing table delete queue: 0 queued
Ideally, all of the operation queues above show 0. If operations are waiting in a queue, the number shows how many operations are queued there. If the queue is not stuck and is draining, the number returns to 0 once the queue is empty.
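Comparing the two show krt queue captures taken two minutes apart can be done by eye, or with a short script. The sketch below is only an illustration: `parse_queues` and `not_draining` are hypothetical helper names, the parser assumes each queue is reported as a name followed by “N queued” (on the same line or on the next line), and flagging a queue whose count is nonzero and has not decreased is a heuristic rather than a definitive test.

```python
import re

# A queue entry is "<name>: <N> queued". The count may follow on the same line
# or on the next line, so whitespace after the colon may include a newline.
QUEUE = re.compile(
    r"^\s*(?P<name>[^:\n]+):\s*(?P<count>\d+)\s+queued", re.MULTILINE
)


def parse_queues(text):
    """Return {queue name: queued count} from saved 'show krt queue' output."""
    return {m.group("name").strip(): int(m.group("count")) for m in QUEUE.finditer(text)}


def not_draining(before_text, after_text):
    """Return queues that are nonzero in the later capture and did not shrink.

    The result maps each queue name to (earlier count, later count).
    """
    before = parse_queues(before_text)
    after = parse_queues(after_text)
    return {
        name: (before.get(name, 0), count)
        for name, count in after.items()
        if count > 0 and count >= before.get(name, 0)
    }
```

An empty result means every queue was either empty or shrinking between the two captures. Any queue that is reported is a candidate for the stuck condition and is worth including when you open a support case.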
3. Deactivate GRES and NSR to drain the KRT queues.
Note: You may wish to contact your technical support engineer before proceeding with these actions.
If the router has both graceful Routing Engine switchover (GRES) and nonstop routing (NSR) configured, deactivate both functions. If only GRES is configured, deactivate GRES.
For example, if NSR is enabled:
# deactivate routing-options nonstop-routing
Then deactivate an item under the chassis hierarchy. For example:
# deactivate chassis redundancy
Commit the changes, and the KRT queues will be drained.
4. Restart routing.
You can clear possibly corrupt updates that are stuck in the KRT queue by restarting the routing protocol daemon. For the procedure, see http://www.juniper.net/techpubs/en_US/junos/topics/task/operational/junos-process-restarting.html. Restarting the daemon disrupts all traffic through the router until the routing tables are rebuilt.
If the issue persists, then please contact your technical support representative for assistance.