Alec Matusis
matusis at yahoo.com
Tue Jan 22 20:57:16 EST 2008
Here is another potentially useful cue: 32427 is this orphan process.

# strace -p 32427
Process 32427 attached - interrupt to quit
futex(0x5b7820, FUTEX_WAIT, 0, NULL

and it's just stuck there.

> -----Original Message-----
> From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com]
> Sent: Tuesday, January 22, 2008 5:49 PM
> To: Alec Matusis
> Cc: mod_python at modpython.org
> Subject: Re: [mod_python] remnant 'orphan' apache subprocesses
>
> BTW, originally you said:
>
> """I have been investigating a memory leak that occurs on an apache server
> since we switched to worker MPM.
> I found that the source of it are apache subprocesses that lose track of
> their parent and never exit:"""
>
> If the processes are truly zombie processes, then they shouldn't
> actually consume any resources except for the entry in the process
> table. I.e., they are a process accounting artifact.
>
> So, strictly speaking they shouldn't be causing any memory leaks, or
> more correctly consuming memory which isn't released.
>
> Graham
>
> On 23/01/2008, Alec Matusis <matusis at yahoo.com> wrote:
> > > > [Tue Jan 22 10:44:45 2008] [notice] child pid 8798 exit signal Segmentation fault (11)
> > > > ...
> > > > Fatal Python error: Inconsistent interned string state.
> > >
> > > This is corruption of memory used by Python.
> > >
> > > What version of mod_python are you using?
> >
> > Apache/2.2.6 (Unix) mod_python/3.3.1 Python/2.4.1
> >
> > But since there is only one such log entry, I'd expect at most 1 zombie as a
> > result of this? I have 6.
> >
> > > -----Original Message-----
> > > From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com]
> > > Sent: Tuesday, January 22, 2008 5:15 PM
> > > To: Alec Matusis
> > > Cc: mod_python at modpython.org
> > > Subject: Re: [mod_python] remnant 'orphan' apache subprocesses
> > >
> > > On 23/01/2008, Alec Matusis <matusis at yahoo.com> wrote:
> > > > > What do you get if you use a program like lsof or ofiles to work out
> > > > > what open resources the zombie process may still be holding on to?
> > > >
> > > > There are 6 zombie sub processes now; executing lsof -p pid takes ages, and
> > > > it brings the load average up from 8.0 to 23+ on this machine- so I am
> > > > afraid to wait long enough to get a result.
> > > > #ps -ef | grep httpd
> > > > root     16197     1  0 Jan21 ?  00:00:15
> > > > nobody   23095 16197  0 08:44 ?  00:00:06
> > > > nobody   29548     1  0 13:14 ?  00:00:00
> > > > nobody    3812     1  0 13:57 ?  00:00:00
> > > > nobody    4161     1  0 13:59 ?  00:00:00
> > > > nobody   20110     1  0 15:43 ?  00:00:00
> > > > nobody   25399     1  0 16:17 ?  00:00:00
> > > > nobody   28722     1  0 16:38 ?  00:00:00
> > > > nobody   28971 16197  5 16:40 ?  00:00:20
> > > > nobody   29189 16197  7 16:42 ?  00:00:21
> > > > nobody   29327 16197  7 16:42 ?  00:00:18
> > > > nobody   29453 16197  6 16:43 ?  00:00:13
> > > > nobody   29496 16197 10 16:43 ?  00:00:20
> > > > nobody   29539 16197  9 16:43 ?  00:00:19
> > > > nobody   29639 16197 11 16:44 ?  00:00:14
> > > > nobody   29713 16197 11 16:45 ?  00:00:12
> > > > nobody   29804 16197  5 16:45 ?  00:00:05
> > > > nobody   29857 16197 10 16:45 ?  00:00:09
> > > > nobody   29902 16197 10 16:45 ?  00:00:08
> > > > nobody   29945 16197 11 16:46 ?  00:00:07
> > > > nobody   29998 16197 11 16:46 ?  00:00:06
> > > > nobody   30058 16197 16 16:47 ?  00:00:01
> > > >
> > > > note that those zombie sub processes seem to have had 00:00:00 run time,
> > > > unlike normal sub processes.
> > > > 3 entries in apache error logs this time:
> > > >
> > > > [Tue Jan 22 10:44:45 2008] [notice] child pid 8798 exit signal Segmentation fault (11)
> > > > ...
> > > > Fatal Python error: Inconsistent interned string state.
> > >
> > > This is corruption of memory used by Python.
> > >
> > > What version of mod_python are you using?
> > >
> > > > [Tue Jan 22 14:03:16 2008] [notice] child pid 4008 exit signal Aborted (6)
> > > >
> > > > is there a faster way to see what it is holding on to?
> > >
> > > I know you said the applications weren't spawning sub processes, but
> > > if your system has 'ptree' try that. In other words, see if those
> > > daemon processes have children which still exist. This was in part
> > > what I was hoping to see, ie., pipes to child processes. The other
> > > thing I was looking for was stuck file accesses to NFS mounted
> > > filesystems or something. Alternative to ptree is just to look at
> > > parent child relationships in ps output.
> > >
> > > Graham
> > >
> > > > > -----Original Message-----
> > > > > From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com]
> > > > > Sent: Monday, January 21, 2008 10:27 PM
> > > > > To: Alec Matusis
> > > > > Cc: mod_python at modpython.org
> > > > > Subject: Re: [mod_python] remnant 'orphan' apache subprocesses
> > > > >
> > > > > What do you get if you use a program like lsof or ofiles to work out
> > > > > what open resources the zombie process may still be holding on to?
> > > > >
> > > > > Are you absolutely sure that that zombie process is from the current
> > > > > Apache instance and not perhaps an earlier instance of Apache?
> > > > >
> > > > > Graham
> > > > >
> > > > > On 22/01/2008, Alec Matusis <matusis at yahoo.com> wrote:
> > > > > > > Are CGI scripts used anywhere at all on your Apache web site?
> > > > > >
> > > > > > No, only mod_python and serving static files.
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com]
> > > > > > > Sent: Monday, January 21, 2008 10:07 PM
> > > > > > > To: Alec Matusis
> > > > > > > Cc: mod_python at modpython.org
> > > > > > > Subject: Re: [mod_python] remnant 'orphan' apache subprocesses
> > > > > > >
> > > > > > > Are CGI scripts used anywhere at all on your Apache web site?
> > > > > > >
> > > > > > > On 22/01/2008, Alec Matusis <matusis at yahoo.com> wrote:
> > > > > > > > > What version of Apache are you using?
> > > > > > > >
> > > > > > > > 2.2.6
> > > > > > > >
> > > > > > > > > What Python web application are you running on top of mod_python, a
> > > > > > > > > self built one or one that uses one of the larger web frameworks?
> > > > > > > >
> > > > > > > > Only self-built stuff, nothing complicated.
> > > > > > > >
> > > > > > > > > Does your application create sub processes in any way to perform
> > > > > > > > > additional work?
> > > > > > > >
> > > > > > > > No sub processes and no threads, except that we use MySQLdb module (which
> > > > > > > > might create threads?).
> > > > > > > >
> > > > > > > > I noticed a warning in the error log:
> > > > > > > > /live/scripts/_pro.py:100: Warning: Rows matched: 1 Changed: 1 Warnings: 1
> > > > > > > > (this is a mysql warning), but I would not think this is relevant...
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com]
> > > > > > > > > Sent: Monday, January 21, 2008 9:22 PM
> > > > > > > > > To: Alec Matusis
> > > > > > > > > Cc: mod_python at modpython.org
> > > > > > > > > Subject: Re: [mod_python] remnant 'orphan' apache subprocesses
> > > > > > > > >
> > > > > > > > > What version of Apache are you using?
> > > > > > > > >
> > > > > > > > > What Python web application are you running on top of mod_python, a
> > > > > > > > > self built one or one that uses one of the larger web frameworks?
> > > > > > > > >
> > > > > > > > > Does your application create sub processes in any way to perform
> > > > > > > > > additional work?
> > > > > > > > >
> > > > > > > > > Graham
> > > > > > > > >
> > > > > > > > > On 22/01/2008, Alec Matusis <matusis at yahoo.com> wrote:
> > > > > > > > > > I have been investigating a memory leak that occurs on an apache server
> > > > > > > > > > since we switched to worker MPM.
> > > > > > > > > > I found that the source of it are apache subprocesses that lose track of
> > > > > > > > > > their parent and never exit:
> > > > > > > > > >
> > > > > > > > > > root at web10 ~> ps -ef | grep httpd
> > > > > > > > > > root     16197     1  0 02:00 ?  00:00:09 /usr/local/encap/httpd/bin/httpd -f /p2/web/conf/web10.conf -k start
> > > > > > > > > > nobody   17750     1  0 17:53 ?  00:00:00 /usr/local/encap/httpd/bin/httpd -f /p2/web/conf/web10.conf -k start
> > > > > > > > > > nobody    5112 16197  4 20:02 ?  00:00:16 /usr/local/encap/httpd/bin/httpd -f /p2/web/conf/web10.conf -k start
> > > > > > > > > > nobody    5159 16197  4 20:02 ?  00:00:15 /usr/local/encap/httpd/bin/httpd -f /p2/web/conf/web10.conf -k start
> > > > > > > > > > nobody    5300 16197  4 20:03 ?  00:00:14 /usr/local/encap/httpd/bin/httpd -f /p2/web/conf/web10.conf -k start
> > > > > > > > > >
> > > > > > > > > > in this output, apache child pid 17750 has pid 1 as a parent, and it is one
> > > > > > > > > > of those 'zombie children'.
> > > > > > > > > > Pids 5112, 5159, 5300 were normal (parent is pid 16197), and they exited
> > > > > > > > > > after MaxRequestsPerChild was reached.
> > > > > > > > > >
> > > > > > > > > > Does anybody have any advice on this? I cannot correlate this to anything,
> > > > > > > > > > there's nothing interesting in the server error log.
> > > > > > > > > > These 'zombies' appear at a rate of 2-3 per day; this apache serves about
> > > > > > > > > > 350 requests per second.
> > > > > > > > > >
> > > > > > > > > > This Apache configuration is
> > > > > > > > > >
> > > > > > > > > > ServerLimit 40
> > > > > > > > > > ThreadLimit 70
> > > > > > > > > >
> > > > > > > > > > StartServers 10
> > > > > > > > > > MaxClients 1600
> > > > > > > > > > MinSpareThreads 75
> > > > > > > > > > MaxSpareThreads 200
> > > > > > > > > > ThreadsPerChild 40
> > > > > > > > > > MaxRequestsPerChild 10000
> > > > > > > > > >
> > > > > > > > > > _______________________________________________
> > > > > > > > > > Mod_python mailing list
> > > > > > > > > > Mod_python at modpython.org
> > > > > > > > > > http://mailman.modpython.org/mailman/listinfo/mod_python
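
The ps listings quoted in this thread pick out the orphans by eye: an httpd process whose parent is pid 1 but which is not the Apache parent itself. A minimal Python sketch of the same check follows, assuming the input is the text of `ps -ef | grep httpd` with the usual UID PID PPID C STIME TTY TIME columns. The function name `find_orphans` and the rule "PPID is 1 and no other listed process names it as parent" are illustrative assumptions, not anything mod_python or Apache provides.

```python
# Sample rows copied from the ps output quoted in this thread.
SAMPLE = """\
root     16197     1  0 Jan21 ?  00:00:15
nobody   23095 16197  0 08:44 ?  00:00:06
nobody   29548     1  0 13:14 ?  00:00:00
nobody    3812     1  0 13:57 ?  00:00:00
nobody   28971 16197  5 16:40 ?  00:00:20
"""


def find_orphans(ps_lines):
    """Return (pid, start_time, cpu_time) for processes reparented to pid 1.

    Assumption: the real Apache parent also has PPID 1, so it is told apart
    by being the parent of at least one other listed process.
    """
    rows = []
    for line in ps_lines:
        fields = line.split()
        # Need UID PID PPID C STIME TTY TIME; skip headers and short lines.
        if len(fields) < 7:
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        rows.append((int(fields[1]), int(fields[2]), fields[4], fields[6]))

    # Every pid that some listed process reports as its parent.
    parent_pids = {ppid for _, ppid, _, _ in rows}

    return [
        (pid, stime, cpu_time)
        for pid, ppid, stime, cpu_time in rows
        if ppid == 1 and pid not in parent_pids
    ]


if __name__ == "__main__":
    for pid, stime, cpu_time in find_orphans(SAMPLE.splitlines()):
        print(pid, stime, cpu_time)
```

Run from cron against live ps output, something like this could log when each orphan first appears, which may make it easier to line them up with the segfault and abort entries in the error log.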