Graham Dumpleton
graham.dumpleton at gmail.com
Tue Jan 22 21:04:50 EST 2008
As root, attach to the process by its pid and generate a full stack trace. gdb /usr/bin/httpd 32427 ... $ where FWIW, for these processes to have a PPID of 1, then some code running in the process must have tried to daemonise the process, or some measure of it, for example by calling setsid(). This would disassociate it with proper Apache parent process. Graham On 23/01/2008, Alec Matusis <matusis at yahoo.com> wrote: > Here is another potentially useful cue: > 32427 is this orphan process. > > # strace -p 32427 > Process 32427 attached - interrupt to quit > futex(0x5b7820, FUTEX_WAIT, 0, NULL > > and it's just stuck there. > > > -----Original Message----- > > From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com] > > Sent: Tuesday, January 22, 2008 5:49 PM > > To: Alec Matusis > > Cc: mod_python at modpython.org > > Subject: Re: [mod_python] remnant 'orphan' apache subprocesses > > > > BTW, originally you said: > > > > """I have been investigating a memory leak that occurs on an apache > > server > > since we switched to worker MPM. > > I found that the source of it are apache subprocesses that lose track > > of > > their parent and never exit:""" > > > > If the processes are truly zombie processes, then they shouldn't > > actually consume any resources except for the entry in the process > > table. Ie., they are an process accounting artifact. > > > > So, strictly speaking they shouldn't be causing any memory leaks, or > > more correctly consuming memory which isn't released. > > > > Graham > > > > On 23/01/2008, Alec Matusis <matusis at yahoo.com> wrote: > > > > > [Tue Jan 22 10:44:45 2008] [notice] child pid 8798 exit signal > > > > Segmentation > > > > > fault (11) > > > > > ... > > > > > Fatal Python error: Inconsistent interned string state. > > > > > > > > This is corruption of memory used by Python. > > > > > > > > What version of mod_python are you using? > > > > > > Apache/2.2.6 (Unix) mod_python/3.3.1 Python/2.4.1 > > > > > > But since there is only one such log entry, I'd expect at most 1 > > zombie as a > > > result of this? I have 6. > > > > > > > -----Original Message----- > > > > From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com] > > > > Sent: Tuesday, January 22, 2008 5:15 PM > > > > To: Alec Matusis > > > > Cc: mod_python at modpython.org > > > > Subject: Re: [mod_python] remnant 'orphan' apache subprocesses > > > > > > > > On 23/01/2008, Alec Matusis <matusis at yahoo.com> wrote: > > > > > > What do you get if you use a program like lsof or ofiles to > > work > > > > out > > > > > > what open resources the zombie process may still be holding on > > to? > > > > > > > > > > There are 6 zombie sub processes now; executing lsof -p pid takes > > > > ages, and > > > > > it brings the load average up from 8.0 to 23+ on this machine- so > > I > > > > am > > > > > afraid to wait long enough to get a result. > > > > > #ps -ef | grep httpd > > > > > root 16197 1 0 Jan21 ? 00:00:15 > > > > > nobody 23095 16197 0 08:44 ? 00:00:06 > > > > > nobody 29548 1 0 13:14 ? 00:00:00 > > > > > nobody 3812 1 0 13:57 ? 00:00:00 > > > > > nobody 4161 1 0 13:59 ? 00:00:00 > > > > > nobody 20110 1 0 15:43 ? 00:00:00 > > > > > nobody 25399 1 0 16:17 ? 00:00:00 > > > > > nobody 28722 1 0 16:38 ? 00:00:00 > > > > > nobody 28971 16197 5 16:40 ? 00:00:20 > > > > > nobody 29189 16197 7 16:42 ? 00:00:21 > > > > > nobody 29327 16197 7 16:42 ? 00:00:18 > > > > > nobody 29453 16197 6 16:43 ? 00:00:13 > > > > > nobody 29496 16197 10 16:43 ? 00:00:20 > > > > > nobody 29539 16197 9 16:43 ? 00:00:19 > > > > > nobody 29639 16197 11 16:44 ? 00:00:14 > > > > > nobody 29713 16197 11 16:45 ? 00:00:12 > > > > > nobody 29804 16197 5 16:45 ? 00:00:05 > > > > > nobody 29857 16197 10 16:45 ? 00:00:09 > > > > > nobody 29902 16197 10 16:45 ? 00:00:08 > > > > > nobody 29945 16197 11 16:46 ? 00:00:07 > > > > > nobody 29998 16197 11 16:46 ? 00:00:06 > > > > > nobody 30058 16197 16 16:47 ? 00:00:01 > > > > > > > > > > note that those zombie sub processes seem to have had 00:00:00 > > run > > > > time, > > > > > unlike normal sub processes. > > > > > 3 entries in apache error logs this time: > > > > > > > > > > [Tue Jan 22 10:44:45 2008] [notice] child pid 8798 exit signal > > > > Segmentation > > > > > fault (11) > > > > > ... > > > > > Fatal Python error: Inconsistent interned string state. > > > > > > > > This is corruption of memory used by Python. > > > > > > > > What version of mod_python are you using? > > > > > > > > > [Tue Jan 22 14:03:16 2008] [notice] child pid 4008 exit signal > > > > Aborted (6) > > > > > > > > > > is there a faster way to see what it is holding on to? > > > > > > > > I know you said the applications weren't spawning sub processes, > > but > > > > if your system has 'ptree' try that. In other words, see if those > > > > daemon processes have children which still exist. This was in part > > > > what I was hoping to see, ie., pipes to child processes. The other > > > > thing I was looking for was stuck file accesses to NFS mounted > > > > filesystems or something. Alternative to ptree is just to look at > > > > parent child relationships in ps output. > > > > > > > > Graham > > > > > > > > > > -----Original Message----- > > > > > > From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com] > > > > > > Sent: Monday, January 21, 2008 10:27 PM > > > > > > To: Alec Matusis > > > > > > Cc: mod_python at modpython.org > > > > > > Subject: Re: [mod_python] remnant 'orphan' apache subprocesses > > > > > > > > > > > > What do you get if you use a program like lsof or ofiles to > > work > > > > out > > > > > > what open resources the zombie process may still be holding on > > to? > > > > > > > > > > > > Are you absolutely sure that that zombie process is from the > > > > current > > > > > > Apache instance and not perhaps an earlier instance of Apache? > > > > > > > > > > > > Graham > > > > > > > > > > > > On 22/01/2008, Alec Matusis <matusis at yahoo.com> wrote: > > > > > > > > Are CGI scripts used anywhere at all on your Apache web > > site? > > > > > > > > > > > > > > No, only mod_python and serving static files. > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com] > > > > > > > > Sent: Monday, January 21, 2008 10:07 PM > > > > > > > > To: Alec Matusis > > > > > > > > Cc: mod_python at modpython.org > > > > > > > > Subject: Re: [mod_python] remnant 'orphan' apache > > subprocesses > > > > > > > > > > > > > > > > Are CGI scripts used anywhere at all on your Apache web > > site? > > > > > > > > > > > > > > > > On 22/01/2008, Alec Matusis <matusis at yahoo.com> wrote: > > > > > > > > > > What version of Apache are you using? > > > > > > > > > > > > > > > > > > 2.2.6 > > > > > > > > > > > > > > > > > > > What Python web application are you running on top of > > > > > > mod_python, a > > > > > > > > > > self built one or one that uses one of the larger web > > > > > > frameworks? > > > > > > > > > > > > > > > > > > Only self-built stuff, nothing complicated. > > > > > > > > > > > > > > > > > > > Does your application create sub processes in any way > > to > > > > > > perform > > > > > > > > > > additional work? > > > > > > > > > > > > > > > > > > No sub processes and no threads, except that we use > > MySQLdb > > > > > > module > > > > > > > > (which > > > > > > > > > might create threads?). > > > > > > > > > > > > > > > > > > I noticed a warning in the error log: > > > > > > > > > /live/scripts/_pro.py:100: Warning: Rows matched: 1 > > Changed: > > > > 1 > > > > > > > > Warnings: 1 > > > > > > > > > (this is a mysql warning), but I would not think this is > > > > > > relevant... > > > > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > From: Graham Dumpleton > > [mailto:graham.dumpleton at gmail.com] > > > > > > > > > > Sent: Monday, January 21, 2008 9:22 PM > > > > > > > > > > To: Alec Matusis > > > > > > > > > > Cc: mod_python at modpython.org > > > > > > > > > > Subject: Re: [mod_python] remnant 'orphan' apache > > > > subprocesses > > > > > > > > > > > > > > > > > > > > What version of Apache are you using? > > > > > > > > > > > > > > > > > > > > What Python web application are you running on top of > > > > > > mod_python, a > > > > > > > > > > self built one or one that uses one of the larger web > > > > > > frameworks? > > > > > > > > > > > > > > > > > > > > Does your application create sub processes in any way > > to > > > > > > perform > > > > > > > > > > additional work? > > > > > > > > > > > > > > > > > > > > Graham > > > > > > > > > > > > > > > > > > > > On 22/01/2008, Alec Matusis <matusis at yahoo.com> wrote: > > > > > > > > > > > I have been investigating a memory leak that occurs > > on > > > > an > > > > > > apache > > > > > > > > > > server > > > > > > > > > > > since we switched to worker MPM. > > > > > > > > > > > I found that the source of it are apache subprocesses > > > > that > > > > > > lose > > > > > > > > track > > > > > > > > > > of > > > > > > > > > > > their parent and never exit: > > > > > > > > > > > > > > > > > > > > > > root at web10 ~> ps -ef | grep httpd > > > > > > > > > > > root 16197 1 0 02:00 ? 00:00:09 > > > > > > > > > > > /usr/local/encap/httpd/bin/httpd -f > > > > /p2/web/conf/web10.conf - > > > > > > k > > > > > > > > start > > > > > > > > > > > nobody 17750 1 0 17:53 ? 00:00:00 > > > > > > > > > > > /usr/local/encap/httpd/bin/httpd -f > > > > /p2/web/conf/web10.conf - > > > > > > k > > > > > > > > start > > > > > > > > > > > nobody 5112 16197 4 20:02 ? 00:00:16 > > > > > > > > > > > /usr/local/encap/httpd/bin/httpd -f > > > > /p2/web/conf/web10.conf - > > > > > > k > > > > > > > > start > > > > > > > > > > > nobody 5159 16197 4 20:02 ? 00:00:15 > > > > > > > > > > > /usr/local/encap/httpd/bin/httpd -f > > > > /p2/web/conf/web10.conf - > > > > > > k > > > > > > > > start > > > > > > > > > > > nobody 5300 16197 4 20:03 ? 00:00:14 > > > > > > > > > > > /usr/local/encap/httpd/bin/httpd -f > > > > /p2/web/conf/web10.conf - > > > > > > k > > > > > > > > start > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > in this output, apache child pid 17750 has pid 1 as a > > > > parent, > > > > > > and > > > > > > > > it > > > > > > > > > > is one > > > > > > > > > > > of those 'zombie children'. > > > > > > > > > > > Pids 5112, 5159, 5300 were normal (parent is pid > > 16197), > > > > and > > > > > > > > they > > > > > > > > > > exited > > > > > > > > > > > after MaxRequestsPerChild was reached. > > > > > > > > > > > > > > > > > > > > > > Does anybody have any advice on this? I cannot > > correlate > > > > this > > > > > > to > > > > > > > > > > anything, > > > > > > > > > > > there's nothing interesting in the server error log. > > > > > > > > > > > These 'zombies' appear at a rate of 2-3 per day; this > > > > apache > > > > > > > > serves > > > > > > > > > > about > > > > > > > > > > > 350 requests per second. > > > > > > > > > > > > > > > > > > > > > > This Apache configuration is > > > > > > > > > > > > > > > > > > > > > > ServerLimit 40 > > > > > > > > > > > ThreadLimit 70 > > > > > > > > > > > > > > > > > > > > > > StartServers 10 > > > > > > > > > > > MaxClients 1600 > > > > > > > > > > > MinSpareThreads 75 > > > > > > > > > > > MaxSpareThreads 200 > > > > > > > > > > > ThreadsPerChild 40 > > > > > > > > > > > MaxRequestsPerChild 10000 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > Mod_python mailing list > > > > > > > > > > > Mod_python at modpython.org > > > > > > > > > > > > > http://mailman.modpython.org/mailman/listinfo/mod_python > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
|