This article describes how to identify and kill a process when CPU utilization is high and the utilization of a given process cannot be lowered by other means. It also covers zombie processes.
The goal is to identify stuck processes as root and kill them in order to free up memory and CPU resources. A zombie process is a process that runs in the background but serves no real function in the CLI or in operations.
Zombie processes are processes that get stuck while performing a task; they do not release the CPU or memory that they use. They do not show up as a top process because they run at the root level.
1. Check the CPU utilization of the switch for each process. In this case, no offender process appears in the list, because the offender is a zombie. A zombie process is present if the zombie counter in the process summary shows a value other than 0; the counter gives the number of stuck processes:
[email protected]> show system processes extensive
last pid: 16727;  load averages: 22.24, 22.19, 22.15  up 0+17:39:34  01:26:10
265 processes: 28 running, 215 sleeping, 1 zombie, 21 waiting
Mem: 970M Active, 128M Inact, 147M Wired, 230M Cache, 112M Buf, 386M Free
Swap:
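Reading the zombie counter by eye is enough for a single check. For repeated checks, the following is a minimal sketch that extracts the counter from the summary line. The function name zombie_count is hypothetical, and the sketch assumes the summary uses the "N zombie" wording shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Junos): print the zombie count found in a
# top-style process summary read on standard input.
# Prints 0 when the summary has no "N zombie" field.
zombie_count() {
    awk '
        /processes:/ {
            for (i = 2; i <= NF; i++) {
                field = $i
                sub(/,$/, "", field)            # strip the trailing comma
                if (field == "zombie") { n = $(i - 1) }
            }
        }
        END { print n + 0 }
    '
}

# Example from the shell, assuming top supports batch mode (-b) on this system:
#   top -b | zombie_count
```

A result other than 0 means that the zombie lookup in the later steps is needed.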
2. Go to the shell in order to identify the process and kill it from there. Run the top command to confirm the real-time process utilization seen on the CLI. Check the zombie counter, if any, and look for any top offender process that is using more resources than the average baseline:
[email protected]> start shell
% top
last pid: 37244;  load averages: 0.04, 0.03, 0.00  up 16+16:18:49  09:19:44
52 processes: 1 running, 51 sleeping, 1 zombie, 21 waiting
CPU states: 2.5% user, 0.0% nice, 0.6% system, 0.3% interrupt, 96.6% idle
Mem: 429M Active, 69M Inact, 59M Wired, 165M Cache, 110M Buf, 258M Free
Swap:
  PID USERNAME THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
 1293 root       1   8    0 86856K 34628K nanslp  16.7H 27.93% pfem
 1290 root       1   4    0 67900K 11596K kqread 410:01  0.00% chassism
 1291 root       2  44  -52 71096K 17596K select 150:03  0.00% sfid
 1331 root       1  96    0 12372K  7184K select  31:12  0.00% license-chec
 1311 root       1  96    0 28544K 13140K select  27:49  0.00% mib2d
 1317 root       1  96    0 15464K  9348K select  25:13  0.00% ppmd
 1324 root       1  96    0 14328K  3472K select  16:02  0.00% shm-rtsdbd
3. If the PID is identified from the CLI or from the top command, go to step 5.
  PID USERNAME THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
 1293 root       1   8    0 86856K 34628K nanslp  16.7H 27.93% pfem
 1290 root       1   4    0 67900K 11596K kqread 410:01  0.00% chassism
 1291 root       2  44  -52 71096K 17596K select 150:03  0.00% sfid
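When the process list is long, the top offender can be picked out mechanically. The following is a minimal sketch, assuming the column layout shown above, with WCPU as the second-to-last column and COMMAND as the last. The function name top_offender is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read top process rows on standard input and print the
# PID, command, and WCPU of the row with the highest non-zero WCPU.
# Prints nothing when every row is at 0.00%.
top_offender() {
    awk '
        $1 ~ /^[0-9]+$/ && $(NF - 1) ~ /%$/ {
            v = $(NF - 1)
            sub(/%$/, "", v)                    # "27.93%" becomes "27.93"
            if (v + 0 > max) { max = v + 0; pid = $1; cmd = $NF; shown = $(NF - 1) }
        }
        END { if (pid != "") print pid, cmd, shown }
    '
}

# Example from the shell, assuming top supports batch mode (-b) on this system:
#   top -b | top_offender
```

For the rows above, the sketch reports PID 1293 (pfem) at 27.93%, which is the PID to use in step 5.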
4. If the process is a zombie, run “ps aux | grep -w Z” from the shell in order to list the zombie processes.
% ps aux | grep -w Z
Pid=16396
Pid=13256
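The grep filter matches a standalone Z anywhere on a line, so it can also match a Z that appears in a command name or argument. The following is a minimal sketch that tests only the state column. The function name zombie_pids is hypothetical, and the sketch assumes that ps on this system accepts the pid, stat, and comm output keywords.

```shell
#!/bin/sh
# Hypothetical helper: read "PID STAT COMMAND" rows on standard input and
# print the PID of every row whose state starts with Z (zombie).
# The header row is skipped because its first field is not numeric.
zombie_pids() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^Z/ { print $1 }'
}

# Example from the shell (keyword support is an assumption for this platform):
#   ps -axo pid,stat,comm | zombie_pids
```

Each PID printed this way is a candidate for step 5.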
5. Once the PID of the offender process is found, kill it from the shell with the command kill -9 PID.
% kill -9 16396
6. To watch the processes re-initiate, open a parallel session to the switch and run the top command.
last pid: 37244;  load averages: 0.04, 0.03, 0.00  up 16+16:18:49  09:19:44
52 processes: 1 running, 51 sleeping, 21 waiting    >>>>>> NO ZOMBIE PROCESS
CPU states: 2.5% user, 0.0% nice, 0.6% system, 0.3% interrupt, 96.6% idle
7. Once this is done, check CPU utilization and confirm that the offender process or zombie is gone and that utilization is back to normal.
  PID USERNAME THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
 1293 root       1   8    0 86856K 34628K nanslp  16.7H  2.97% pfem
 1290 root       1   4    0 67900K 11596K kqread 410:01  0.00% chassism
 1291 root       2  44  -52 71096K 17596K select 150:03  0.00% sfid
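To compare utilization before and after the kill without scanning the full list, the WCPU value of a single process can be extracted by name. The following is a minimal sketch, assuming the same column layout as above. The function name wcpu_of is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read top process rows on standard input and print the
# WCPU value, without the percent sign, of each row whose COMMAND column
# equals the name given as the first argument.
wcpu_of() {
    awk -v name="$1" '
        $NF == name {
            v = $(NF - 1)
            sub(/%$/, "", v)                    # "2.97%" becomes "2.97"
            print v
        }
    '
}

# Example from the shell, assuming top supports batch mode (-b) on this system:
#   top -b | wcpu_of pfem
```

For the outputs in this article, pfem reads 27.93 before the kill and 2.97 afterward.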