High RE CPU utilization seen on awk process after NSM upgrade

After NSM is upgraded from a version earlier than 2011.4s2 to 2011.4s2 or later, high Routing Engine (RE) CPU utilization caused by an awk process might be observed on the SRX device. This article explains the cause and describes how to change a configuration option to resolve the problem.

Beginning with NSM 2011.4s2, NSM includes a feature that detects configuration changes made outside NSM (for example, on the SRX command line). To implement it, NSM sends the SRX an additional RPC command instructing it to report any commit-related events that it finds in default-log-messages. The SRX must parse default-log-messages with an additional awk process, which is CPU-intensive even at a relatively low rate of logs per second.

Before the upgrade, while NSM was running a version earlier than 2011.4s2, CPU utilization samples were taken on an SRX210 device sending traffic logs in event mode at about 40 logs/sec.

In these samples, CPU utilization is about 70%. The output of top shows that the awk process is not consuming many CPU cycles, and the output of the show chassis routing-engine command shows that the total utilization (Kernel + Background + User) is about 70%.
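The total quoted here is the sum of three fields. The following is a minimal Python sketch of that arithmetic, assuming the command prints each field as a "Name  NN percent" line. The sample text and its numbers are illustrative placeholders, not the measurements taken for this article.

```python
import re

# Hypothetical excerpt in the style of "show chassis routing-engine".
# The numbers are illustrative, not measurements from the article.
SAMPLE_OUTPUT = """\
    CPU utilization:
      User                      40 percent
      Background                 0 percent
      Kernel                    30 percent
      Interrupt                  0 percent
      Idle                      30 percent
"""


def total_utilization(output: str) -> int:
    """Sum the User, Background, and Kernel percentages."""
    total = 0
    for name in ("User", "Background", "Kernel"):
        # Take the first line that starts with the field name.
        match = re.search(rf"^[ \t]*{name}[ \t]+(\d+)[ \t]+percent", output, re.MULTILINE)
        if match is None:
            raise ValueError(f"{name} line not found")
        total += int(match.group(1))
    return total


print(total_utilization(SAMPLE_OUTPUT))
```

If the device prints several utilization blocks, this sketch reads only the first occurrence of each field.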

The process list shows one awk process running on the system:

When NSM and the SRX connect, NSM sends the SRX an RPC command instructing it to send the default-log-messages file. This awk process parses default-log-messages, whose contents are sent to NSM over the established outbound-ssh connection.

After the upgrade, while NSM was running version 2011.4s2 or later, CPU utilization samples were taken on the same kind of setup: an SRX210 device sending traffic logs in event mode at about 40 logs/sec. In these samples, CPU utilization is 100%. The output of top shows that the awk process is consuming a large share of CPU cycles, and the output of the show chassis routing-engine command shows that the total utilization (Kernel + Background + User) is 100%.

The process list shows two awk processes running on the system:

When NSM and the SRX connect, NSM sends the SRX one RPC command instructing it to send the default-log-messages file and a second RPC command instructing it to scan the same file for commit events (UI_COMMIT). One awk process parses default-log-messages; the other scans it for UI_COMMIT events.
The awk process with PID 56699 is the one driving CPU utilization high. This is the process scanning for UI_COMMIT events.
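To pick out the awk processes and see which one is busiest, a process listing can be filtered along the following lines. This is a rough Python sketch that assumes a "ps aux"-style column layout, which may not match the listing format on the device. Only PID 56699 comes from this article; the other PIDs and all CPU figures are hypothetical.

```python
# Hypothetical "ps aux"-style listing. Only PID 56699 comes from the article;
# every other value is illustrative.
SAMPLE_PS = """\
USER   PID  %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
root 56698   0.5  0.1  1800   900  ??  S     9:00AM   0:01.20 awk
root 56699  85.0  0.1  1800   900  ??  R     9:00AM  12:34.56 awk
root  1234   1.0  0.5  9000  4000  ??  S     8:00AM   0:10.00 /usr/sbin/sshd
"""


def awk_processes(ps_output: str) -> list[tuple[int, float]]:
    """Return (pid, cpu_percent) for each awk process, busiest first."""
    found = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # the 11th field is the full command
        if len(fields) < 11:
            continue
        executable = fields[10].split()[0].rsplit("/", 1)[-1]
        if executable == "awk":
            found.append((int(fields[1]), float(fields[2])))
    return sorted(found, key=lambda item: item[1], reverse=True)


for pid, cpu in awk_processes(SAMPLE_PS):
    print(pid, cpu)
```

Two entries in the result indicate that both the log-parsing process and the UI_COMMIT scan are running.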

The high CPU utilization on the SRX is caused by the additional RPC command, which instructs the SRX to report any commit-related events that it finds in default-log-messages. The SRX must parse default-log-messages with an additional awk process, which is CPU-intensive even at a relatively low rate of logs per second.
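The cost follows from the shape of the task: every logged line has to be inspected to find the few that are commit events, so the work grows with the log rate rather than with the number of commits. The Python sketch below illustrates that pattern only. It is not the awk script that the SRX runs, which this article does not show, and the sample log lines are hypothetical.

```python
from typing import Iterable, Iterator


def commit_events(log_lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that mention a UI_COMMIT event."""
    for line in log_lines:  # every logged line is inspected once
        if "UI_COMMIT" in line:
            yield line


# Hypothetical log lines, for illustration only.
sample = [
    "Jan  1 00:00:01 srx RT_FLOW: RT_FLOW_SESSION_CREATE: session created",
    "Jan  1 00:00:02 srx mgd[1234]: UI_COMMIT: User 'admin' requested 'commit' operation",
    "Jan  1 00:00:03 srx RT_FLOW: RT_FLOW_SESSION_CLOSE: session closed",
]
matches = list(commit_events(sample))
print(len(matches), "of", len(sample), "lines are commit events")
```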

This happens in particular when the SRX is configured to send traffic logs in event mode, because the logs/sec rate is then higher.

To disable the new RPC command that causes the SRX to scan for UI_COMMIT events, change the configSync.commitChange configuration option from its default value of yes to no.

In /var/netscreen/GuiSvr/guiSvr.cfg, change the value of configSync.commitChange from yes to no.
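The file can be edited by hand. For anyone scripting the change, a Python sketch is given below. It assumes that guiSvr.cfg holds one option per line, with the value separated from the option name by whitespace or an equals sign. That syntax is an assumption, so check the actual file before relying on it. The function name is hypothetical, and the sketch works on text only and does not write to the file.

```python
import re

# Assumed line format: "configSync.commitChange yes" or "configSync.commitChange = yes".
_OPTION = re.compile(
    r"^([ \t]*configSync\.commitChange(?:[ \t]+|[ \t]*=[ \t]*))yes\b",
    re.MULTILINE,
)


def disable_commit_change(cfg_text: str) -> str:
    """Return cfg_text with configSync.commitChange switched from yes to no."""
    return _OPTION.sub(r"\g<1>no", cfg_text)


# Hypothetical file content, for illustration only.
before = "some.other.option 1\nconfigSync.commitChange yes\n"
print(disable_commit_change(before))
```

Back up the original file before replacing its contents with the modified text.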

Restart the NSM processes guiSvr (so that the configuration change takes effect) and devSvr (to disconnect and reconnect the SRX connection).

After the connection between the SRX and NSM is re-established, only one awk process should be running on the device (the one that parses all logs from the default-log-messages file).

In addition, it is recommended to send traffic logs from the SRX in stream mode rather than event mode.

About the author

Prasanna
