Too Many kdmflush Processes Observed, Causing ORA-27300, ORA-27301, and ORA-27302 Errors
The Problem
On CentOS/RHEL 6.x, a large number of kdmflush processes owned by the root user may be observed. This was noticed after the database reported the following errors.
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn5
The Solution
These ORA errors can be caused by:
1. The number of processes for a user exceeds the limit specified in /etc/security/limits.conf.
2. A low setting for the OS kernel parameter kernel.pid_max.
As the number of processes on the system grows, the kernel can fail to allocate a new PID because its assignable range of PID numbers is temporarily exhausted; the fork(2) system call then returns EAGAIN (error 11), which Oracle surfaces as the ORA-27300 error above. The checks shown below can help confirm which of the two causes applies.
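As a quick sanity check for both causes, the per-user process limit and the kernel PID range can be inspected directly. The commands below are a generic sketch; the 65536 value and the limits.d path are examples only and should be adjusted for the actual environment and Oracle OS user.

Check the current per-user process limit (run as the Oracle OS user):
$ ulimit -u
Check any nproc entries configured in limits.conf:
# grep nproc /etc/security/limits.conf /etc/security/limits.d/*.conf
Check the kernel PID range and how many PIDs are currently in use:
# cat /proc/sys/kernel/pid_max
# ps -e --no-headers | wc -l
If kernel.pid_max is too low for the workload, it can be raised at runtime (65536 is only an example value):
# sysctl -w kernel.pid_max=65536
To make the change persistent across reboots, add kernel.pid_max = 65536 to /etc/sysctl.conf and run sysctl -p.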
Checking the processes owned by root shows output similar to the following:
$ ps -elf | grep -i root
4 S root 1 0 0 80 0 - 5374 poll_s Nov14 ? 00:42:31 /sbin/init
1 S root 2 0 0 80 0 - 0 kthrea Nov14 ? 00:00:00 [kthreadd]
1 S root 3 2 0 80 0 - 0 run_ks Nov14 ? 00:03:26 [ksoftirqd/0]
1 S root 6 2 99 -40 - - 0 cpu_st Nov14 ? 34-16:37:47 [migration/0]
1 S root 165 2 0 80 0 - 0 worker Nov14 ? 00:00:00 [kworker/23:1]
1 S root 167 2 0 80 0 - 0 worker Nov14 ? 00:03:37 [kworker/25:1]
1 S root 170 2 0 80 0 - 0 worker Nov14 ? 00:01:10 [kworker/28:1]
1 S root 171 2 0 80 0 - 0 worker Nov14 ? 00:03:41 [kworker/29:1]
1 S root 172 2 0 80 0 - 0 worker Nov14 ? 00:04:09 [kworker/30:1]
1 S root 5584 2 0 80 0 - 0 bdi_wr 19:54 ? 00:00:00 [flush-252:188]
1 S root 5586 2 0 80 0 - 0 bdi_wr 19:54 ? 00:00:00 [flush-252:189]
1 S root 5591 2 0 80 0 - 0 bdi_wr 19:54 ? 00:00:00 [flush-252:193]
1 S root 5598 2 0 80 0 - 0 bdi_wr 19:54 ? 00:00:00 [flush-252:198]
1 S root 5600 2 0 80 0 - 0 bdi_wr 19:54 ? 00:00:00 [flush-252:199]
1 S root 5678 2 0 80 0 - 0 worker Dec09 ? 00:01:18 [kworker/30:0]
4 S root 6100 15663 0 80 0 - 28808 unix_s 19:54 ? 00:00:00 sshd: sa537610 [priv]
0 S root 6518 1863 0 80 0 - 26534 wait 19:54 pts/0 00:00:00 /bin/sh /usr/libexec/ipsec/barf
0 D root 6529 6518 39 80 0 - 1049 sleep_ 19:54 pts/0 00:00:00 egrep -q Starting Openswan /var/log/rmlog
0 S sa537610 6542 6293 0 80 0 - 25823 pipe_w 19:54 pts/7 00:00:00 grep -i root
1 S root 6864 2 0 80 0 - 0 worker 16:13 ? 00:00:04 [kworker/20:0]
1 S root 6868 2 0 60 -20 - 0 rescue Nov14 ? 00:00:00 [kmpathd]
1 S root 6869 2 0 60 -20 - 0 rescue Nov14 ? 00:00:00 [kmpath_handlerd]
1 S root 7099 2 0 60 -20 - 0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root 7101 2 0 60 -20 - 0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root 7105 2 0 60 -20 - 0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root 7110 2 0 60 -20 - 0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root 7115 2 0 60 -20 - 0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root 7120 2 0 60 -20 - 0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root 7126 2 0 60 -20 - 0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root 7132 2 0 60 -20 - 0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root 7139 2 0 60 -20 - 0 rescue Nov14 ? 00:00:00 [kdmflush]
1 S root 7147 2 0 60 -20 - 0 rescue Nov14 ? 00:00:00 [kdmflush]
# ls /dev/mapper | wc -l
200
# ps -ef|grep -i kdmflush|wc -l
200
If hundreds of kdmflush processes are present, compare the output of:
# ls /dev/mapper | wc -l
# ps -ef|grep -i kdmflush|wc -l
If the values are roughly equal, this is normal behavior: kdmflush is a kernel thread, and one kdmflush thread exists for each device-mapper device. The large number of kdmflush processes can therefore be ignored when it matches the number of device-mapper devices.
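For convenience, the two counts can be compared in a single step. This is only an illustrative one-liner, not part of any Oracle or OS tooling:

# dm=$(ls /dev/mapper | wc -l); kdm=$(ps -ef | grep -c '\[kdmflush\]'); echo "device-mapper devices: $dm, kdmflush threads: $kdm"

Note that /dev/mapper also contains the control node, so the two counts may differ by one; anything close to a one-to-one match indicates the expected behavior described above.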