When upgrading NSM from a version prior to 2011.4s2 to a newer version, high CPU utilization might be observed on the SRX device through the awk process. This article explains the event and describes how to change a configuration option to resolve the problem.
Beginning with NSM 2011.4s2, a new feature was added into NSM to detect configuration changes made outside NSM (such as on the SRX command line). This new feature is implemented by the NSM instructing the SRX through an additional RPC command to send to the NSM any events related to commit operations it finds in the default-log-messages. The SRX must parse the default-log-messages with an additional awk process; this is CPU expensive when there are even a relatively low number of logs per second logged.
Prior the upgrade process, while running NSM version prior to 2011.4s2, CPU utilization samples were taken on a SRX210 device sending about 40 logs/sec traffic logs in event mode.
As seen below, the CPU is utilized about ~70%. From the output of top, it can be seen that the process awk is not taking too many CPU cycles and from the output of “show chassis routing-engine” it can be seen that the total utilization (Kernel + Background + User) is ~70%
% top last pid: 56514; load averages: 2.20, 1.07, 0.52 up 7+02:35:42 08:56:01 78 processes: 4 running, 68 sleeping, 2 stopped, 4 zombie CPU states: 61.2% user, 2.9% nice, 19.4% system, 0.9% interrupt, 15.5% idle Mem: 167M Active, 106M Inact, 525M Wired, 105M Cache, 112M Buf, 68M Free Swap: PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 1283 root 5 76 0 498M 48644K select 0 197.0H 108.98% flowd_octeon_hm 958 root 1 81 0 12492K 4572K select 0 81:31 9.77% eventd 56441 root 1 79 0 9608K 3148K select 0 0:10 6.74% sshd 56466 root 1 -8 13 3440K 1980K piperd 0 0:05 2.25% awk 1222 root 1 76 0 38664K 15972K select 0 59:06 0.00% chassisd 1261 root 1 76 0 12292K 6280K select 0 50:21 0.00% license-check 1227 root 1 76 0 25644K 10476K select 0 35:27 0.00% mib2d 1229 root 1 76 0 20072K 8524K select 0 32:04 0.00% l2ald 1247 root 1 76 0 15512K 3544K RUN 0 23:19 0.00% shm-rtsdbd > show chassis routing-engine Routing Engine status: Temperature 60 degrees C / 140 degrees F Total memory 1024 MB Max 604 MB used ( 59 percent) Control plane memory 560 MB Max 386 MB used ( 69 percent) Data plane memory 464 MB Max 223 MB used ( 48 percent) CPU utilization: User 26 percent Background 6 percent Kernel 37 percent Interrupt 1 percent Idle 30 percent Model RE-SRX210H-POE Serial ID AAAF7112 Start time 2013-09-25 06:20:49 UTC Uptime 7 days, 2 hours, 36 minutes, 3 seconds Last reboot reason Router rebooted after a normal shutdown. Load averages: 1 minute 5 minute 15 minute 1.72 1.12 0.57
The process list shows one awk process running on the system:
% ps aux | grep awk root 56466 2.2 0.2 3440 1980 ?? SN 8:54AM 0:15.90 /usr/bin/nawk -f /usr/libexec/ui/filter-syslog.awk -v filter_filename -v filter_event -v filter_process -v root 56522 0.0 0.1 3128 1264 p0 S+ 8:58AM 0:00.03 grep awk
When NSM and SRX connect, the NSM will send to SRX an RPC command instructing the SRX to send the default-log-messages file. This awk process is used to parse this default-log-messages that is to be sent to NSM via the outbound-ssh connection established.
After the upgrade process, while running NSM version of 2011.4s2 or newer, CPU utilization samples were taken on a SRX210 device sending about 40 logs/sec traffic logs in event mode. As seen below, the CPU is 100%. From the output of top it can be seen that the process awk is taking a lot of CPU cycles and from the output of the command show chassis routing-engine, it can be seen that the total utilization (Kernel + Background + User) is 100%
% top last pid: 56737; load averages: 3.08, 1.83, 1.38 up 7+02:57:52 09:18:11 85 processes: 4 running, 75 sleeping, 2 stopped, 4 zombie CPU states: 61.9% user, 16.4% nice, 20.4% system, 1.3% interrupt, 0.0% idle Mem: 171M Active, 106M Inact, 527M Wired, 105M Cache, 112M Buf, 61M Free Swap: PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 1283 root 5 76 0 498M 48644K select 0 197.5H 99.95% flowd_octeon_hm 56699 root 1 139 13 3464K 1980K RUN 0 0:10 30.03% awk 958 root 1 81 0 12488K 4568K select 0 84:43 8.98% eventd 56592 root 1 80 0 9544K 3124K select 0 0:07 6.74% sshd 56629 root 1 -8 13 3440K 1980K piperd 0 0:04 2.25% awk 1222 root 1 76 0 38664K 15972K select 0 59:14 0.00% chassisd 1261 root 1 76 0 12292K 6280K select 0 50:28 0.00% license-check > show chassis routing-engine Routing Engine status: Temperature 60 degrees C / 140 degrees F Total memory 1024 MB Max 625 MB used ( 61 percent) Control plane memory 560 MB Max 403 MB used ( 72 percent) Data plane memory 464 MB Max 223 MB used ( 48 percent) CPU utilization: User 20 percent Background 40 percent Kernel 40 percent Interrupt 1 percent Idle 0 percent Model RE-SRX210H-POE Serial ID AAAF7112 Start time 2013-09-25 06:20:49 UTC Uptime 7 days, 3 hours, 44 seconds Last reboot reason Router rebooted after a normal shutdown. Load averages: 1 minute 5 minute 15 minute 3.06 2.41 1.71
The process list shows two awk processes running on the system:
% ps aux | grep awk root 56699 33.8 0.2 3464 1980 ?? RN 9:17AM 0:48.09 /usr/bin/nawk -f /usr/libexec/ui/filter-syslog.awk -v filter_filename -v filter_event -v filter_process -v root 56629 2.2 0.2 3440 1980 ?? SN 9:16AM 0:10.95 /usr/bin/nawk -f /usr/libexec/ui/filter-syslog.awk -v filter_filename -v filter_event -v filter_process -v root 56745 0.0 0.1 3128 1264 p0 S+ 9:19AM 0:00.03 grep awk
When NSM and SRX connect, the NSM will send to SRX an RPC command instructing the SRX to send the default-log-messages file and another RPC command telling SRX to scan the same default-log-messages file for commit events (UI_COMMIT). One awk process is used to parse the default-log-messages, the other is used to parse for UI_COMMIT events.
As can be seen in the output, the awk process with PID 56699 is driving the CPU utilization high. This is the process scanning for UI_COMMIT events.
The additional RPC command to send to the NSM any events related to commit operations it finds in the default-log-messages is causing high CPU utilization on the SRX. The SRX must parse the default-log-messages with an additional awk process; this is CPU expensive when there are even a relatively low number of logs per second logged.
In particular this will happen when the SRX is configured to send traffic logs in event mode, as then the logs/sec rate is increased.
To disable the new RPC command causing SRX to scan for UI_COMMIT events, the configSync.commitChange configuration option needs to be changed from its default yes value to the value of no.
In /var/netscreen/GuiSvr/guiSvr.cfg change from:
configSync.commitChange yes to configSync.commitChange no
Restart of the NSM processes guiSvr (in order for the configuration to take effect) and devSvr (in order to disconnect and reconnect the SRX connection):
# /etc/init.d/guiSvr restart # /etc/init.d/devSvr restart
After the connection between the SRX and NSM is re-established, user should see only one awk process running on the device (the one which parses all logs from the default-log-messages file).
In addition to this, it is recommended to send traffic logs in stream mode rather than even mode from the SRX.