When monitoring a Linux system using the top command, you may come across entries like 1 zombie in the 'Tasks' line.

top - 16:03:19 up  7:09,  1 user,  load average: 2.24, 2.21, 2.29
Tasks: 392 total,  1 running, 390 sleeping,  0 stopped,  1 zombie
...

What does this 'zombie' mean, and what impact does it have on the system? This article explains what zombie processes are, how to identify them, and how to resolve them.


What is a Zombie Process? 🧟



To make it easier to understand, let's walk through the lifecycle of a process.

  1. Birth (Fork): The parent process creates a child process (fork()).

  2. Execution (Exec): The child process performs its task.

  3. Termination (Exit): The child process completes its task and exits (exit()).

  4. Notification (SIGCHLD): Once the child process terminates, the operating system (kernel) keeps that process's PID, termination status, etc., in the process table. It then sends the parent process a SIGCHLD signal (indicating the child has terminated).

  5. Harvest (Wait): The parent process must call the wait() system call, typically in response to this signal, to "harvest" the child's termination status. Once this information is harvested, the kernel finally removes the child's entry from the process table.

A zombie process is one that is trapped between steps 4 and 5. In other words, the child process has completed execution and terminated, but the parent process has not yet called wait() to harvest the termination status.

As the name suggests, this process is dead (not executing). Its memory and other resources have already been released, so it uses no CPU time and essentially no memory; only its process table entry remains.
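To see this state first-hand, a zombie can be created on purpose. The following is a minimal sketch, assuming a POSIX shell and a ps that supports the -o format option; the variable names (parent_pid, zombie_info) and the sleep durations are arbitrary choices for illustration. If everything works as described, the state printed for the child should start with Z.

```shell
# Deliberately create a zombie that lasts a few seconds.
# The subshell starts "sleep 1" in the background and then replaces itself
# with "sleep 3" via exec. That sleep process is now the parent of the
# first one, and it never calls wait().
(sleep 1 & exec sleep 3) &
parent_pid=$!

sleep 2   # by now the child has exited, but nobody has harvested it

# Look up the child of parent_pid and print its PID, state and name.
zombie_info=$(ps -e -o pid= -o ppid= -o stat= -o comm= |
    awk -v p="$parent_pid" '$2 == p {print $1, $3, $4}')
echo "child of $parent_pid: $zombie_info"

wait "$parent_pid"   # the demo parent exits on its own after 3 seconds
```

The zombie here is harmless and short-lived: once the demo parent exits, the leftover entry is handed to another process for cleanup.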

Why Are Zombie Processes a Problem?

A zombie process itself does not use much of the system's resources, but it occupies one slot (PID) in the process table.

If zombie processes accumulate due to bugs in the parent process and are not cleaned up, the system may reach the maximum number of PIDs it can allocate (on Linux, the limit in /proc/sys/kernel/pid_max). In this case, the system can no longer create new processes, which can lead to severe failures. Seeing 1 or 2 zombies in top from time to time is not unusual, but if the number keeps increasing, action is necessary.
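To judge how close a system is to that situation, the zombie count can be compared with the PID limit. This is a rough sketch, assuming a Linux system where /proc/sys/kernel/pid_max is readable; the variable names are illustrative. Other limits (for example per-user process limits) may apply earlier than pid_max, so treat the numbers as an indication only.

```shell
# Read the PID limit and count processes whose state starts with Z.
pid_max=$(cat /proc/sys/kernel/pid_max)
zombie_count=$(ps -e -o stat= | awk '$1 ~ /^Z/ {n++} END {print n + 0}')
echo "zombies: $zombie_count, pid_max: $pid_max"
```

Running this periodically (for example from a monitoring script) makes it easy to notice a zombie count that only ever grows.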


How to Check and Identify Zombie Processes

The top command only shows the _number_ of zombies. To find out which processes are in a zombie state and who their parents are, you should use the ps command.

The simplest way is to look for processes marked with 'Z' in the state column of the ps output (labelled S or STAT, depending on the options used).

# Filter to see all processes in 'Z' state (zombie)
ps -elf | grep ' Z '

# Alternatively, use the 'aux' option (8th column ($8) is the state (STAT)).
# STAT can carry extra flags such as 'Z+' or 'Zs', so match on the leading Z.
ps aux | awk '$8 ~ /^Z/'

Example output (the header row is shown for readability; grep itself filters it out):

# ps -elf | grep ' Z '
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
0 Z  user  5021  5000  0  80   0 -     0 exit   15:30 ?        00:00:00 [defunct]

Important information in this example includes:

  • S (State): Z (indicating a zombie state)

  • PID: 5021 (this is the PID of the zombie process)

  • PPID: 5000 (this is the PID of the parent process that has not harvested the zombie)

  • CMD: [defunct] (ps marks a zombie as defunct, meaning it has terminated but not been cleaned up; real output usually shows the original command name followed by <defunct>)
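Since the PPID is the piece of information you act on, it can help to print each zombie together with the name of its parent. The helper below is a sketch under the assumption that ps supports the POSIX -o format option; list_zombies is a hypothetical name, not an existing tool.

```shell
# Print every zombie together with the command name of its parent.
list_zombies() {
    ps -e -o pid= -o ppid= -o stat= | awk '$3 ~ /^Z/ {print $1, $2}' |
    while read -r pid ppid; do
        # Look up the parent's command name; empty if it has just exited.
        parent_cmd=$(ps -o comm= -p "$ppid")
        echo "zombie PID $pid, parent PID $ppid ($parent_cmd)"
    done
}

list_zombies
```

If several zombies share the same parent PID, that parent is the likely source of the problem.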


How to Resolve Zombie Processes



The most important fact is that zombie processes cannot be killed with the kill command.

kill -9 5021   # the zombie PID from the above example

This command has no effect because the zombie is already "dead": no code is left running that could act on the kill signal.
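This can be checked on a throwaway zombie. The sketch below creates one with the subshell-plus-exec trick (a parent that never calls wait()), sends it SIGKILL, and looks at its state again; the variable names and sleep durations are illustrative assumptions.

```shell
# Show that SIGKILL does not remove a zombie.
(sleep 1 & exec sleep 5) &   # the exec'd sleep never calls wait()
parent_pid=$!
sleep 2                      # the child has exited and is now a zombie

zombie_pid=$(ps -e -o pid= -o ppid= -o stat= |
    awk -v p="$parent_pid" '$2 == p && $3 ~ /^Z/ {print $1}')

kill -9 "$zombie_pid"        # the signal is sent, but nothing is left to kill
sleep 1
state_after_kill=$(ps -o stat= -p "$zombie_pid")
echo "state of $zombie_pid after kill -9: $state_after_kill"

wait "$parent_pid"           # let the demo parent finish
```

The state reported after the kill should still begin with Z, which is why the remaining steps focus on the parent instead.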

The only way to remove a zombie is to have its parent (or the process that adopts it) call wait().

Step 1: Send a Signal to the Parent Process (Recommended)

The first method to try is to manually send a SIGCHLD signal to the parent process (PPID), prompting it to check the status of its children.

# Send SIGCHLD signal to the parent PID (5000) from the example
kill -s SIGCHLD 5000

This effectively tells the parent process, "One of your child processes has terminated, so check it out!" A parent that installs a SIGCHLD handler calling wait() will then harvest the zombie. If the parent has no such handler, the signal is ignored by default and nothing changes.
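Because the signal may simply be ignored, it is worth checking the result instead of assuming it worked. The function below is a small sketch, with nudge_parent as a hypothetical name: it sends the signal, waits one second (an arbitrary delay), and reports whether the zombie is still listed.

```shell
# Send SIGCHLD to the parent, wait briefly, then check the zombie again.
# Arguments: $1 = zombie PID, $2 = parent PID.
nudge_parent() {
    kill -s CHLD "$2" || return 2   # CHLD is the POSIX spelling of SIGCHLD
    sleep 1
    case "$(ps -o stat= -p "$1")" in
        Z*) echo "PID $1 is still a zombie"; return 1 ;;
        *)  echo "PID $1 has been harvested"; return 0 ;;
    esac
}
```

With the PIDs from the example, the call would be nudge_parent 5021 5000. A return status of 1 means the parent did not react and the next step is needed.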

Step 2: Forcefully Terminate the Parent Process (Last Resort)

If step 1 does not work, the parent process (PPID 5000) is either stopped or hung, or has a bug in the logic that should call wait().

In this case, terminating the parent process is the remaining option.

# Ask the parent process (PPID 5000) to terminate (sends SIGTERM)
kill 5000

# If it still does not terminate, forcefully kill it (SIGKILL)
kill -9 5000

Why does terminating the parent help?

In Linux, when a parent process terminates, its remaining children (orphaned processes, including zombies) are automatically adopted by the init process (PID 1, which is systemd on many distributions) or by a designated subreaper process. The init process is designed to call wait() whenever one of its children terminates, so adopted zombies are harvested immediately.

Thus, when the troublesome parent (PPID 5000) dies, the zombie (PID 5021) becomes a new child of init, which promptly cleans it up.
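The adoption can be observed with a disposable parent. In this sketch (variable names and timings are illustrative assumptions) a zombie is created under a long-running sleep, the sleep is terminated, and the zombie's entry is checked again. Depending on how quickly the adopting process calls wait(), the zombie is either already gone or shows a new PPID.

```shell
# Create a zombie whose parent would otherwise live for 30 seconds.
(sleep 1 & exec sleep 30) &
parent_pid=$!
sleep 2

zombie_pid=$(ps -e -o pid= -o ppid= -o stat= |
    awk -v p="$parent_pid" '$2 == p && $3 ~ /^Z/ {print $1}')
echo "before: zombie $zombie_pid has parent $parent_pid"

# Terminate the parent and collect its exit status in this shell.
kill "$parent_pid"
wait "$parent_pid" 2>/dev/null || true
sleep 1

# An empty result means the entry has been removed from the process table.
new_ppid=$(ps -o ppid= -p "$zombie_pid" | tr -d ' ')
if [ -z "$new_ppid" ]; then
    echo "after: zombie $zombie_pid has been harvested"
else
    echo "after: zombie $zombie_pid was adopted by PID $new_ppid"
fi
```

In minimal containers where PID 1 is an application that never calls wait(), the second message can persist, which is one reason such setups often run a small init process.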

⚠️ Warning: Before terminating the parent process, be sure to check with ps -p 5000 (parent PID) that it is not a critical service for the system (e.g., database, web server, etc.). Forcefully terminating important services can cause more significant problems.
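A slightly fuller check than ps -p alone is to print who owns the parent, how long it has been running, and which other children it has, since all of them lose their parent when it is terminated. The helper name inspect_parent is hypothetical, and the sketch assumes a ps with the POSIX -o format option.

```shell
# Show details of a process and list its direct children.
# Argument: $1 = PID of the parent you are considering terminating.
inspect_parent() {
    ps -o pid,ppid,user,etime,args -p "$1" || return 1
    echo "children of $1:"
    ps -e -o pid= -o ppid= -o stat= -o args= | awk -v p="$1" '$2 == p'
}

inspect_parent "$$"   # demo target: the current shell; use the real PPID instead
```

For the example in this article the call would be inspect_parent 5000. A long elapsed time and many live children are hints that the process is a service worth restarting through its normal service manager instead of killing directly.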


Summary

  • A zombie process is the process table entry left behind by a terminated process whose exit status has not yet been harvested by its parent.

  • Zombies consume almost no resources, but each one occupies a PID, and in large numbers they can prevent new processes from being created.

  • You can find a zombie (PID) and its parent (PPID) using the ps -elf | grep ' Z ' command.

  • The fix targets the parent process (PPID), not the zombie itself.

    1. kill -s SIGCHLD <parentPID> (recommended)

    2. kill <parentPID> (last resort)