Troubleshooting CPUHOG Problems
When OSPF forms an adjacency, it floods link-state update packets to its neighbors. Depending on the router's resources, this flooding can take a long time. When flooding keeps the router's CPU busy long enough to consume most of its resources, CPUHOG messages appear in the log.
The CPUHOG messages usually appear in two significant stages:
- Neighbor formation process
- LSA refresh process
This section discusses possible solutions for these two cases:
- CPUHOG messages during adjacency formation
- CPUHOG messages during LSA refresh period
Problem: CPUHOG Messages During Adjacency Formation—Cause: Router Is Not Running Packet-Pacing Code
When OSPF forms an adjacency, it floods all its link-state packets to its neighbor, and this flooding can consume a lot of CPU. Cisco IOS Software releases earlier than 12.0T do not support packet pacing, so the router sends data as fast as it can over the link. If the link is slow, or the router on the other side is slow to respond, the LSAs are retransmitted, which eventually leads to CPUHOG messages. Packet pacing adds a pacing interval between LS updates: instead of flooding everything at once, the router sends the packets with a gap of a few milliseconds between them. Figure 9-87 shows the flowchart to follow to solve this problem.
Debugs and Verification
CPUHOG messages appear on the router's console during adjacency formation and can be reviewed later with the show log command. Example 9-242 shows CPUHOG messages in a router's log.
Example 9-242 Log Messages Showing CPUHOG by OSPF Router
R1#show log
%SYS-3-CPUHOG: Task ran for 2424 msec (15/15), process = OSPF Router
%SYS-3-CPUHOG: Task ran for 2340 msec (10/9), process = OSPF Router
%SYS-3-CPUHOG: Task ran for 2264 msec (0/0), process = OSPF Router
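When the log is long, it can help to pull the CPUHOG entries out programmatically. The following is a minimal Python sketch, assuming you have captured the show log output as text (for example, by copying it from the console). The names `parse_cpuhog` and `CpuHog` are hypothetical helpers, not part of Cisco IOS, and the sketch keeps the numbers in parentheses as raw text rather than interpreting them.

```python
import re
from dataclasses import dataclass

# Pattern for the %SYS-3-CPUHOG format shown in Example 9-242. The lookahead
# lets it work whether entries are on separate lines or run together.
CPUHOG_RE = re.compile(
    r"%SYS-3-CPUHOG: Task ran for (?P<msec>\d+) msec "
    r"\((?P<counts>[^)]*)\), process = (?P<process>.+?)(?=\s*%SYS-|$)",
    re.MULTILINE,
)


@dataclass
class CpuHog:
    msec: int     # how long the task ran, in milliseconds
    counts: str   # the raw "(x/y)" field, e.g. "15/15" (not interpreted here)
    process: str  # process that held the CPU, e.g. "OSPF Router"


def parse_cpuhog(log_text: str) -> list[CpuHog]:
    """Extract every CPUHOG entry found in the captured log text."""
    return [
        CpuHog(int(m["msec"]), m["counts"], m["process"].strip())
        for m in CPUHOG_RE.finditer(log_text)
    ]


log = """R1#show log
%SYS-3-CPUHOG: Task ran for 2424 msec (15/15), process = OSPF Router
%SYS-3-CPUHOG: Task ran for 2340 msec (10/9), process = OSPF Router
%SYS-3-CPUHOG: Task ran for 2264 msec (0/0), process = OSPF Router
"""
ospf_hogs = [h for h in parse_cpuhog(log) if h.process == "OSPF Router"]
print(len(ospf_hogs), max(h.msec for h in ospf_hogs))  # 3 2424
```

Filtering on the process name confirms that it is the OSPF Router process, not some other task, that is holding the CPU.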
Solution
Packet pacing introduces a delay of 33 ms between packets and 66 ms between retransmissions. This pacing interval reduces the CPUHOG messages, and the adjacency forms more quickly. The feature is on by default in Cisco IOS Software Release 12.0T and later and is not available in earlier releases. If you are running code earlier than Release 12.0T and see CPUHOG messages during adjacency formation, upgrade to Release 12.0T or later to solve the problem through packet pacing.
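The effect of pacing can be illustrated with a toy queueing model in Python. It assumes a neighbor that needs a fixed time per update and drops anything that arrives while its small input queue is full; the function name, the 10 ms processing time, and the 5-packet queue are hypothetical, and the model ignores how IOS actually builds, acknowledges, and retransmits updates. Only the 33 ms and 66 ms gaps come from the text.

```python
# Pacing gaps stated in the text for Cisco IOS 12.0T and later.
FLOOD_GAP_MS = 33
RETRANSMIT_GAP_MS = 66


def count_drops(num_packets: int, send_gap_ms: float,
                service_ms: float, queue_depth: int) -> int:
    """Toy model: count updates a slow neighbor drops (and that must later be
    retransmitted) when they arrive send_gap_ms apart.

    service_ms (time the neighbor needs per packet) and queue_depth (packets it
    can hold, including the one being processed) are hypothetical parameters.
    """
    if queue_depth < 1:
        raise ValueError("queue_depth must be at least 1")
    in_queue: list[float] = []  # finish times of packets still held by the neighbor
    busy_until = 0.0            # when the neighbor finishes its current backlog
    drops = 0
    for i in range(num_packets):
        arrival = i * send_gap_ms
        # Packets that finished processing by now have left the queue.
        in_queue = [t for t in in_queue if t > arrival]
        if len(in_queue) >= queue_depth:
            drops += 1  # queue full: this update is lost
            continue
        busy_until = max(arrival, busy_until) + service_ms
        in_queue.append(busy_until)
    return drops


# 100 updates to a neighbor needing 10 ms per packet with room for 5 packets.
print(count_drops(100, 0, 10, 5))                 # back-to-back: 95
print(count_drops(100, FLOOD_GAP_MS, 10, 5))      # paced: 0
print(count_drops(95, RETRANSMIT_GAP_MS, 10, 5))  # paced retransmissions: 0
```

In this model, back-to-back sending overruns the neighbor's queue and every dropped update has to be retransmitted, while spacing the same updates 33 ms apart lets the neighbor keep up. If the neighbor needs longer per packet than the pacing gap, drops reappear in the model, so pacing reduces retransmissions rather than ruling them out.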
Problem: CPUHOG Messages During LSA Refresh Period—Cause: Router Is Not Running LSA Group-Pacing Code
This problem occurs on Cisco IOS Software releases earlier than 12.0. Release 12.0 introduced the LSA group-pacing feature to eliminate the CPU problem that can otherwise occur every 30 minutes.
In earlier versions of Cisco IOS Software, all LSAs are refreshed together every 30 minutes, which keeps their ages synchronized. As a result, a significant flood occurs every 30 minutes as every LSA is refreshed at the same time, and this flooding causes CPUHOG messages every 30 minutes. Imagine a couple of thousand LSAs refreshing at the same moment.
Figure 9-88 shows the flowchart to follow to solve this problem.
Debugs and Verification
Example 9-243 shows the CPUHOG messages that appear in the router’s log every 30 minutes.
Example 9-243 Router Is Seeing CPUHOG Messages Every 30 Minutes
R1#show log
%SYS-3-CPUHOG: Task ran for 2424 msec (15/15), process = OSPF Router
%SYS-3-CPUHOG: Task ran for 2340 msec (10/9), process = OSPF Router
%SYS-3-CPUHOG: Task ran for 2264 msec (0/0), process = OSPF Router
Solution
LSA group pacing checks the LSAs at a periodic interval (every 4 minutes by default) and refreshes only those LSAs that have passed their refresh time. This breaks one large flood into several smaller LSA floods. The feature requires no extra configuration, but for large numbers of LSAs (generally 10,000 or more), a smaller interval, such as 2 minutes, is recommended; for a few hundred LSAs, use a larger interval, such as 20 minutes.
Suppose 10,000 LSAs need to be refreshed. With a small pacing interval, the router checks every 2 or 4 minutes for LSAs that have reached the 30-minute refresh time. Because it checks this often, only a fraction of the LSAs are due at each check, so each check refreshes a small group and no large storm of LSA updates occurs. When the number of LSAs is small, it makes little difference whether the check runs every 2 minutes or every 20 minutes, so it is better to increase the timer: the few LSAs that are due can then be refreshed together in one small update.
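The arithmetic above can be modeled in a few lines of Python. This is a minimal sketch, assuming each LSA becomes due 1800 seconds after its current age and that the router refreshes due LSAs only at multiples of the pacing interval; the LSA ages, counts, and function names are hypothetical, and the model does not reproduce IOS internals. The legacy function follows the text's description of pre-12.0 behavior, in which every LSA is refreshed together.

```python
from collections import Counter

REFRESH_S = 1800  # 30-minute LSA refresh time, from the text


def legacy_batches(num_lsas: int, refresh_s: int = REFRESH_S) -> Counter:
    """Pre-12.0 behavior as described in the text: all LSAs refreshed at once."""
    return Counter({refresh_s: num_lsas}) if num_lsas else Counter()


def group_paced_batches(ages_s: list[int], pacing_s: int,
                        refresh_s: int = REFRESH_S) -> Counter:
    """Map each pacing check (seconds from now) to the number of LSAs it refreshes.

    An LSA of age a (0 <= a < refresh_s) becomes due refresh_s - a seconds from
    now and is refreshed at the first pacing check at or after that moment.
    """
    if pacing_s <= 0:
        raise ValueError("pacing_s must be positive")
    batches: Counter = Counter()
    for age in ages_s:
        if not 0 <= age < refresh_s:
            raise ValueError(f"age {age} outside [0, {refresh_s})")
        due = refresh_s - age
        checks = -(-due // pacing_s)  # ceiling division: first check at/after due
        batches[checks * pacing_s] += 1
    return batches


# Hypothetical LSDB: 10,000 LSAs whose ages are spread evenly over 30 minutes.
ages = [i * REFRESH_S // 10_000 for i in range(10_000)]
for label, batches in [
    ("legacy", legacy_batches(len(ages))),
    ("pacing 240 s", group_paced_batches(ages, 240)),
    ("pacing 120 s", group_paced_batches(ages, 120)),
]:
    print(f"{label}: {len(batches)} floods, largest {max(batches.values())} LSAs")
```

In this model, the legacy behavior produces a single 10,000-LSA flood, while cutting the pacing interval from 240 s to 120 s roughly halves the size of each flood at the cost of checking twice as often, which is the trade-off behind the recommendation for large LSA databases.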
Example 9-244 shows how to configure the LSA group-pacing interval, which is set in seconds (10 to 1800).
Example 9-244 Configuring the LSA Refresh Interval
R1(config)#router ospf 1
R1(config-router)#timer lsa-group-pacing ?
  <10-1800>  Interval between group of LSA being refreshed or maxaged
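The sizing guidance in this section can be captured in a small Python helper that also renders the configuration line. It is a sketch under stated assumptions: the 1,000-LSA boundary between "a few hundred" and "large" is hypothetical, the 120 s and 1200 s values are the examples from the text, the 240 s fallback is the stated default, and the 10 to 1800 s range comes from the help output in Example 9-244. The generated line uses the full `timers lsa-group-pacing` keyword that the example abbreviates as `timer`; check the exact syntax against the command reference for your IOS release.

```python
MIN_PACING_S, MAX_PACING_S = 10, 1800  # range from the "?" help in Example 9-244
DEFAULT_PACING_S = 240                 # 4-minute default stated in the text


def suggest_pacing_interval(num_lsas: int) -> int:
    """Rule of thumb from the text; the 1,000-LSA cutoff is a hypothetical choice."""
    if num_lsas < 0:
        raise ValueError("num_lsas must be non-negative")
    if num_lsas >= 10_000:
        return 120               # large LSDB: check every 2 minutes
    if num_lsas < 1_000:
        return 1200              # a few hundred LSAs: check every 20 minutes
    return DEFAULT_PACING_S      # otherwise keep the default


def pacing_command(seconds: int) -> str:
    """Render the router-configuration line, rejecting values IOS would not accept."""
    if not MIN_PACING_S <= seconds <= MAX_PACING_S:
        raise ValueError(f"pacing interval must be {MIN_PACING_S}-{MAX_PACING_S} s")
    return f"timers lsa-group-pacing {seconds}"


print(pacing_command(suggest_pacing_interval(12_000)))  # timers lsa-group-pacing 120
print(pacing_command(suggest_pacing_interval(300)))     # timers lsa-group-pacing 1200
```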