The routing protocol daemon (RPD) sends updates to the kernel to maintain the current status of the routing tables. This article provides information about the 'RPD_KRT_Q_RETRIES: Route Update: No buffer space available' error message.
- Name: RPD_KRT_Q_RETRIES
- Message: <reason>: <error-message>
- Help: The attempt to update the kernel has failed.
- Description: The routing protocol process (RPD) attempted to update the kernel the indicated number of times and failed. It will continue to retry.
The routing protocol daemon (RPD) sends updates to the kernel to maintain the current status of the routing tables. When a major change occurs, such as
- a Routing Engine mastership switchover,
- rebuilding the routing tables (that might be caused by a restart of the routing daemon),
- links to next hops flapping, or
- IGP/BGP convergence,
large numbers of updates will be sent to the kernel.
These updates are processed via the KRT queue. During this period of high activity, the socket connection may run out of buffer space. The built-in flow control ensures that the updates are eventually processed by repeatedly attempting to send them to the KRT queue. Each time a retry occurs, a message is written to the log file. If the retries are due to 'Route Update: No buffer space available', they are caused by this flow control and are usually a transient condition. If the retries are due to other reasons, a software defect is the most likely cause and further investigation is required.
> Feb 6 16:58:07.416 2013 rpd[1896]: : Route Update: No buffer space available.
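To see at a glance whether the retries in a log are only buffer-space related, the messages can be counted by reason. The following Python sketch is an illustration only; the log file path and the exact message layout are assumptions based on the sample line above:

```python
import re
from collections import Counter

# Assumed log location; adjust to wherever syslog messages are archived.
LOG_FILE = "/var/log/messages"

# Matches both the raw rpd form ("rpd[1896]: : Route Update: No buffer space available.")
# and the structured form ("%DAEMON-3-RPD_KRT_Q_RETRIES: Route Update: No buffer space available").
PATTERN = re.compile(r"(?:RPD_KRT_Q_RETRIES:|rpd\[\d+\]: :)\s*(?P<op>[^:]+):\s*(?P<reason>.+)$")

def classify(path: str) -> Counter:
    """Count retry messages by (operation, reason)."""
    counts: Counter = Counter()
    with open(path, errors="replace") as fh:
        for line in fh:
            m = PATTERN.search(line)
            if m:
                op = m.group("op").strip()
                reason = m.group("reason").strip().rstrip(".")
                counts[(op, reason)] += 1
    return counts

if __name__ == "__main__":
    for (op, reason), n in classify(LOG_FILE).most_common():
        verdict = ("likely transient flow control"
                   if "No buffer space available" in reason
                   else "investigate further (possible software defect)")
        print(f"{n:6d}  {op}: {reason}  -> {verdict}")
```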
This issue can also occur when updates get stuck in the KRT queue. The following examples highlight two cases of RPD_KRT_Q_RETRIES:
In the first case, a slow IFSTATE client (mcsnoopd) caused the KRT queue to become stuck:
Syslog messages:
Nov 28 05:34:20.178 juniper /kernel: %KERN-3: rt_pfe_veto: Too many delayed route/nexthop unrefs. Op 2, rtsm_id 5, msg type 2
Nov 28 05:34:20.178 juniper /kernel: %KERN-3: rt_pfe_veto: Possible slowest client is mcsnoopd. States processed - 18196645. States to be processed - 4111733   <
Nov 28 05:34:20.178 juniper /kernel: %KERN-3: rt_pfe_veto: Possible second slowest client is slaveRE1. States processed - 22308367. States to be processed - 11
Nov 28 05:34:20.422 juniper rpd[1368]: %DAEMON-3-RPD_KRT_Q_RETRIES: Route Update: No buffer space available   <
Nov 28 05:34:25.178 juniper /kernel: %KERN-3: rt_pfe_veto: Too many delayed route/nexthop unrefs. Op 2, rtsm_id 5, msg type 2
Nov 28 05:34:25.178 juniper /kernel: %KERN-3: rt_pfe_veto: Possible slowest client is mcsnoopd. States processed - 18196645. States to be processed - 4111760
Nov 28 05:34:30.177 juniper /kernel: %KERN-3: rt_pfe_veto: Too many delayed route/nexthop unrefs. Op 2, rtsm_id 5, msg type 2
System output:
juniper@juniper> show krt queue
Routing table add queue: 0 queued
Interface add/delete/change queue: 0 queued
High-priority multicast add/change: 0 queued
Indirect next hop add/change: 0 queued
MPLS add queue: 0 queued
Indirect next hop delete: 0 queued
High-priority deletion queue: 0 queued
MPLS change queue: 0 queued
High-priority change queue: 49 queued
        CHANGE FROM gf 1 inst id 5 112.190.95.121/32 nexthop 112.190.95.121, ae4.0 (20165) error 'ENOBUFS -- No buffer space available, or pfe socket full'   < ENOBUFS means a throttle in Kernel
        TO gf 1 inst id 5 112.190.95.121/32 reject (20165) error 'ENOBUFS -- No buffer space available, or pfe socket full'
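If the queue output has been saved to a file, the backlog and ENOBUFS entries can be pulled out with a small script. The following Python sketch is a minimal illustration, assuming the output format shown above; it is not part of the original procedure:

```python
import re
import sys

# Reads captured "show krt queue" text on stdin and reports queues that still
# have entries pending, plus any entries flagged with ENOBUFS.
QUEUE_LINE = re.compile(r"^(?P<name>.+?):\s+(?P<count>\d+)\s+queued$")

def summarize(lines):
    backlog, enobufs = [], 0
    for line in lines:
        stripped = line.strip()
        m = QUEUE_LINE.match(stripped)
        if m and int(m.group("count")) > 0:
            backlog.append((m.group("name"), int(m.group("count"))))
        if "ENOBUFS" in stripped:
            enobufs += 1
    return backlog, enobufs

if __name__ == "__main__":
    backlog, enobufs = summarize(sys.stdin)
    for name, count in backlog:
        print(f"{name}: {count} queued")
    print(f"entries reporting ENOBUFS (kernel-side throttle): {enobufs}")
```

Run against the capture above, it would flag the high-priority change queue and its ENOBUFS entries.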
Kernel side:
root@juniper% ifsmon -g
veto_retry            = 360989
veto_recover          = 0
veto_nh_delayed_unref = 360989   < veto_retry count increment indicates KRT queue throttle
relayg_veto_err       = 0
relayg_veto_slp       = 0

Rtsock msgs generated in 10 secs interval for past 10 mins
----------------------------------------------------------------
         10secs  20secs  30secs  40secs  50secs  60secs
----------------------------------------------------------------
0 mins        4      12      12      17      14       9
1 mins       18      12      12      17      13      11
2 mins       18      12      12      17      13      11
3 mins       18      12      12      17      13      11
4 mins       17      12     116      17      13      11
5 mins       18      12      12      17      12      11
6 mins       18      12      12      17      13      11
7 mins       18      12      11      17      13      11
8 mins       18      12     205      17      13      11
9 mins       17      12     116      17      13      11

root@juniper% ifsmon -c
CLIENT PROGRESS PERCENTAGES:
Client Name     Last state     States processed     Percentage
jdhcpd          0              71861                100.000
jdhcpd          0              71861                100.000
xdpc7           0              71861                100.000
xdpc4           0              71861                100.000
~~              ~~
ppmd            0              71861                100.000
mcsnoopd        30ca0420       -73802               5976668.500   < In this case, MCSNOOPD caused KRT queue stuck issue.
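When many IFSTATE clients are listed, the lagging one can be picked out of the ifsmon -c client-progress table programmatically. The following Python sketch assumes the column layout shown above and is only an illustration:

```python
import re
import sys

# Parses captured "ifsmon -c" client-progress output and flags IFSTATE clients
# whose reported progress looks abnormal (column layout is assumed).
CLIENT_LINE = re.compile(
    r"^(?P<client>\S+)\s+(?P<last_state>\S+)\s+(?P<processed>-?\d+)\s+(?P<pct>\d+\.\d+)"
)

def lagging_clients(lines, threshold=99.0):
    """Return clients with a negative processed count or a percentage far from 100."""
    suspects = []
    for line in lines:
        m = CLIENT_LINE.match(line.strip())
        if not m:
            continue
        processed = int(m.group("processed"))
        pct = float(m.group("pct"))
        # In the case above, mcsnoopd shows a negative processed count and an
        # implausible percentage, which marks it as the slow client.
        if processed < 0 or not (threshold <= pct <= 100.0):
            suspects.append((m.group("client"), processed, pct))
    return suspects

if __name__ == "__main__":
    for client, processed, pct in lagging_clients(sys.stdin):
        print(f"suspect client: {client}  processed={processed}  pct={pct}")
```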
In the second case, the system suddenly became busy because a vmcore was being dumped, which caused the KRT queue to retry and log the syslog message for a while:
Syslog messages:
Feb 1 14:23:25.247 juniper /kernel: %KERN-2: kern_request_live_dump_proc: Triggered dump   <
Feb 1 14:23:25.247 juniper /kernel: %KERN-2: Physical memory: 16354 MB
Feb 1 14:23:25.247 juniper /kernel: %KERN-2: Dumping 892 MB: 877 861 845
Feb 1 14:23:25.247 juniper /kernel: %KERN-3: kern_dump_proc pid 44 tid 100036 ran for 509 ms
Feb 1 14:23:25.636 juniper /kernel: %KERN-2: 829
Feb 1 14:23:25.740 juniper rpd[1406]: %DAEMON-3-RPD_KRT_Q_RETRIES: Route Update: No buffer space available   <
Feb 1 14:23:26.302 juniper rpd[1406]: %DAEMON-3-RPD_KRT_Q_RETRIES: Route Update: No buffer space available
Feb 1 14:23:35.722 juniper rpd[1406]: %DAEMON-3-RPD_KRT_Q_RETRIES: Indirect Next Hop Update: No buffer space available
Feb 1 14:23:45.969 juniper rpd[1406]: %DAEMON-3-RPD_KRT_Q_RETRIES: Indirect Next Hop Update: No buffer space available
Feb 1 14:23:46.073 juniper /kernel: %KERN-2: Dump complete
Feb 1 14:23:56.808 juniper login: %AUTH-5: Login attempt for user bucheon from host 168.126.133.165
Feb 1 14:23:56.950 juniper login[8917]: %AUTH-6-LOGIN_INFORMATION: User bucheon logged in from host 168.126.133.165 on device ttyp0
Feb 1 14:23:58.322 juniper savecore: %DAEMON-1: reboot after panic: Junos Live Dump
Feb 1 14:23:58.324 juniper savecore: %DAEMON-5: writing core to vmcore.0
System output:
juniper@juniper> show krt state   < Nothing special detected.
General state:
        Options: <>
        Install job is not running
Number of operations queued: 0
        Routing table adds: 0
        Interface routes: 0
        High pri multicast Adds/Changes: 0
        Indirect Next Hop Adds/Changes: 0 Deletes: 0
        MPLS Adds: 0 Changes: 0
        High pri Adds: 0 Changes: 0 Deletes: 0
        Normal pri Indirects: 0
        Normal pri Adds: 0 Changes: 0 Deletes: 0
        Routing Table deletes: 0
Number of operations deferred: 0
Number of operations canceled: 0
Time until next queue run: 0
Routes learned from kernel: 25
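Because a transient throttle drains on its own, comparing the "Number of operations queued" counter across two samples of show krt state taken a couple of minutes apart indicates whether the queue is stuck. A minimal Python sketch, assuming the output format shown above:

```python
import re
import sys

def queued_operations(text: str) -> int:
    """Extract the aggregate backlog from captured 'show krt state' output."""
    m = re.search(r"Number of operations queued:\s*(\d+)", text)
    if m is None:
        raise ValueError("'Number of operations queued' not found in input")
    return int(m.group(1))

if __name__ == "__main__":
    # Pipe in one captured sample; compare successive samples to see whether
    # the backlog is draining (transient) or staying flat (stuck).
    print("operations queued:", queued_operations(sys.stdin.read()))
```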
Collect the output of the following commands two or three times, at a regular interval of two minutes (a collection sketch follows the command lists below):
From the Routing Engine side:
- show system processes extensive
- show task memory
- show system buffers
- show system virtual-memory
- show task statistics
- show task accounting detail
- show route summary
- show krt state
- show krt queue
From the kernel side:
% ifsmon -g
% ifsmon -c
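The collection can be automated with a small script run from the Junos shell, where operational commands can be wrapped with cli -c. The following Python sketch is a hypothetical helper, not part of the standard procedure; the output file path and iteration count are assumptions:

```python
import subprocess
import time
from datetime import datetime

# Operational commands listed above, gathered from the Routing Engine.
SHOW_COMMANDS = [
    "show system processes extensive",
    "show task memory",
    "show system buffers",
    "show system virtual-memory",
    "show task statistics",
    "show task accounting detail",
    "show route summary",
    "show krt state",
    "show krt queue",
]
# Kernel-side shell commands listed above.
SHELL_COMMANDS = ["ifsmon -g", "ifsmon -c"]

def collect(iterations: int = 3, interval_sec: int = 120,
            outfile: str = "/var/tmp/krt-collect.txt") -> None:
    """Run all commands a few times at two-minute intervals and append the output."""
    with open(outfile, "a") as out:
        for i in range(iterations):
            out.write(f"\n===== sample {i + 1} at {datetime.now().isoformat()} =====\n")
            for cmd in SHOW_COMMANDS:
                out.write(f"\n--- {cmd} ---\n")
                out.write(subprocess.run(["cli", "-c", cmd],
                                         capture_output=True, text=True).stdout)
            for cmd in SHELL_COMMANDS:
                out.write(f"\n--- {cmd} ---\n")
                out.write(subprocess.run(cmd.split(),
                                         capture_output=True, text=True).stdout)
            if i < iterations - 1:
                time.sleep(interval_sec)

if __name__ == "__main__":
    collect()
```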
Check the log messages and verify whether the routes are shown in the normal or stuck state in the output of show krt state. If the routes are in the stuck state, collect the above outputs and troubleshoot the issue further.