Alec Matusis
matusis at yahoo.com
Tue Jan 22 21:20:47 EST 2008
> gdb /usr/bin/httpd 32427 (gdb) where #0 0x00002aaaab2c94df in sem_wait () from /lib64/tls/libpthread.so.0 #1 0x00002aaaac204dcd in PyThread_acquire_lock (lock=0x5b7820, waitflag=1) at thread_pthread.h:298 #2 0x00002aaaac1f7008 in PyImport_ImportModuleEx (name=0x2aaaad7195c4 "errno", globals=0x2aaaada331c0, locals=<value optimized out>, fromlist=0x2aaaad75e998) at import.c:239 #3 0x00002aaaac230883 in builtin___import__ (self=<value optimized out>, args=<value optimized out>) at bltinmodule.c:45 #4 0x00002aaaac19c91c in PyCFunction_Call (func=0x2aaaad6f9440, arg=0x2aaab0a68c58, kw=<value optimized out>) at methodobject.c:93 #5 0x00002aaaac177470 in PyObject_Call (func=<value optimized out>, arg=<value optimized out>, kw=<value optimized out>) at abstract.c:1751 #6 0x00002aaaac1cb58e in PyEval_CallObjectWithKeywords (func=0x2aaaad6f9440, arg=0x2aaab0a68c58, kw=0x0) at ceval.c:3419 #7 0x00002aaaac1cda14 in PyEval_EvalFrame (f=0x8fe4e0) at ceval.c:2020 #8 0x00002aaaac1d2d7e in PyEval_EvalCodeEx (co=0x2aaaad75fb20, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=2, kws=0xb71220, kwcount=0, defs=0x2aaaad761e68, defcount=1, closure=0x0) at ceval.c:2730 #9 0x00002aaaac1d0714 in PyEval_EvalFrame (f=0xb71070) at ceval.c:3639 #10 0x00002aaaac1d26d7 in PyEval_EvalFrame (f=0xa54890) at ceval.c:3629 #11 0x00002aaaac1d26d7 in PyEval_EvalFrame (f=0x891e10) at ceval.c:3629 #12 0x00002aaaac1d2d7e in PyEval_EvalCodeEx (co=0x2aaab0a78f10, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=4, kws=0x0, kwcount=0, defs=0x2aaab0a5bde8, defcount=2, closure=0x0) at ceval.c:2730 #13 0x00002aaaac22acc7 in function_call (func=0x2aaab0aa3050, arg=0x2aaab0a68520, kw=0x0) at funcobject.c:548 #14 0x00002aaaac177470 in PyObject_Call (func=<value optimized out>, arg=<value optimized out>, kw=<value optimized out>) at abstract.c:1751 #15 0x00002aaaac183871 in instancemethod_call (func=<value optimized out>, arg=0x2aaab0a68520, kw=0x0) at classobject.c:2431 #16 0x00002aaaac177470 in PyObject_Call (func=<value optimized out>, arg=<value optimized out>, kw=<value optimized out>) at abstract.c:1751 #17 0x00002aaaac1cb58e in PyEval_CallObjectWithKeywords (func=0x2aaab00c1d20, arg=0x2aaab0aa1280, kw=0x0) at ceval.c:3419 #18 0x00002aaaac18449e in PyInstance_New (klass=<value optimized out>, arg=0x2aaab0aa1280, kw=0x0) at classobject.c:575 #19 0x00002aaaac177470 in PyObject_Call (func=<value optimized out>, arg=<value optimized out>, kw=<value optimized out>) at abstract.c:1751 #20 0x00002aaaac1cf5b1 in PyEval_EvalFrame (f=0x8fe7c0) at ceval.c:3755 #21 0x00002aaaac1d2d7e in PyEval_EvalCodeEx (co=0x2aaab0a9f810, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=2, kws=0x2aaaada67ec0, kwcount=0, defs=0x2aaab0a5a728, defcount=2, closure=0x0) at ceval.c:2730 #22 0x00002aaaac1d0714 in PyEval_EvalFrame (f=0x2aaaada67cf0) at ceval.c:3639 #23 0x00002aaaac1d2d7e in PyEval_EvalCodeEx (co=0x2aaaad762880, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x987808, kwcount=0, defs=0x2aaaadc040f8, defcount=2, closure=0x0) at ceval.c:2730 #24 0x00002aaaac1d0714 in PyEval_EvalFrame (f=0x987650) at ceval.c:3639 #25 0x00002aaaac1d2d7e in PyEval_EvalCodeEx (co=0x2aaab00fc8f0, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x7c29e8, kwcount=0, defs=0x2aaab0095a88, defcount=2, closure=0x0) at ceval.c:2730 #26 0x00002aaaac1d0714 in PyEval_EvalFrame (f=0x7c27b0) at ceval.c:3639 #27 0x00002aaaac1d26d7 in PyEval_EvalFrame (f=0xb04790) at ceval.c:3629 #28 0x00002aaaac1d2d7e in PyEval_EvalCodeEx (co=0x2aaaaec4c960, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x2aaaadd5cd48, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ceval.c:2730 #29 0x00002aaaac1d0714 in PyEval_EvalFrame (f=0x2aaaadd5cb60) at ceval.c:3639 #30 0x00002aaaac1d26d7 in PyEval_EvalFrame (f=0x2aaaaddc8b90) at ceval.c:3629 #31 0x00002aaaac1d2d7e in PyEval_EvalCodeEx (co=0x2aaaaec5e570, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=0, kws=0x2aaaadd67568, kwcount=7, defs=0x0, defcount=0, closure=0x0) at ceval.c:2730 #32 0x00002aaaac1d0714 in PyEval_EvalFrame (f=0x2aaaadd67340) at ceval.c:3639 #33 0x00002aaaac1d2d7e in PyEval_EvalCodeEx (co=0x2aaaaec5e730, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=2, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ceval.c:2730 #34 0x00002aaaac22acc7 in function_call (func=0x2aaaaf50fed8, arg=0x2aaab0158fc8, kw=0x0) at funcobject.c:548 #35 0x00002aaaac177470 in PyObject_Call (func=<value optimized out>, arg=<value optimized out>, kw=<value optimized out>) at abstract.c:1751 #36 0x00002aaaac183871 in instancemethod_call (func=<value optimized out>, arg=0x2aaab0158fc8, kw=0x0) at classobject.c:2431 #37 0x00002aaaac177470 in PyObject_Call (func=<value optimized out>, arg=<value optimized out>, kw=<value optimized out>) at abstract.c:1751 #38 0x00002aaaac17828d in PyObject_CallMethod (o=<value optimized out>, name=<value optimized out>, format=0x2aaab0a63190 "\001") at abstract.c:1840 #39 0x00002aaaac17603a in python_handler (req=0x853098, phase=0x2aaaac23583f "PythonHandler") at mod_python.c:1652 > -----Original Message----- > From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com] > Sent: Tuesday, January 22, 2008 6:05 PM > To: Alec Matusis > Cc: mod_python at modpython.org > Subject: Re: [mod_python] remnant 'orphan' apache subprocesses > > As root, attach to the process by its pid and generate a full stack > trace. > > gdb /usr/bin/httpd 32427 > > ... > > $ where > > FWIW, for these processes to have a PPID of 1, then some code running > in the process must have tried to daemonise the process, or some > measure of it, for example by calling setsid(). This would > disassociate it with proper Apache parent process. > > Graham > > On 23/01/2008, Alec Matusis <matusis at yahoo.com> wrote: > > Here is another potentially useful cue: > > 32427 is this orphan process. > > > > # strace -p 32427 > > Process 32427 attached - interrupt to quit > > futex(0x5b7820, FUTEX_WAIT, 0, NULL > > > > and it's just stuck there. > > > > > -----Original Message----- > > > From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com] > > > Sent: Tuesday, January 22, 2008 5:49 PM > > > To: Alec Matusis > > > Cc: mod_python at modpython.org > > > Subject: Re: [mod_python] remnant 'orphan' apache subprocesses > > > > > > BTW, originally you said: > > > > > > """I have been investigating a memory leak that occurs on an > apache > > > server > > > since we switched to worker MPM. > > > I found that the source of it are apache subprocesses that lose > track > > > of > > > their parent and never exit:""" > > > > > > If the processes are truly zombie processes, then they shouldn't > > > actually consume any resources except for the entry in the process > > > table. Ie., they are an process accounting artifact. > > > > > > So, strictly speaking they shouldn't be causing any memory leaks, > or > > > more correctly consuming memory which isn't released. > > > > > > Graham > > > > > > On 23/01/2008, Alec Matusis <matusis at yahoo.com> wrote: > > > > > > [Tue Jan 22 10:44:45 2008] [notice] child pid 8798 exit > signal > > > > > Segmentation > > > > > > fault (11) > > > > > > ... > > > > > > Fatal Python error: Inconsistent interned string state. > > > > > > > > > > This is corruption of memory used by Python. > > > > > > > > > > What version of mod_python are you using? > > > > > > > > Apache/2.2.6 (Unix) mod_python/3.3.1 Python/2.4.1 > > > > > > > > But since there is only one such log entry, I'd expect at most 1 > > > zombie as a > > > > result of this? I have 6. > > > > > > > > > -----Original Message----- > > > > > From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com] > > > > > Sent: Tuesday, January 22, 2008 5:15 PM > > > > > To: Alec Matusis > > > > > Cc: mod_python at modpython.org > > > > > Subject: Re: [mod_python] remnant 'orphan' apache subprocesses > > > > > > > > > > On 23/01/2008, Alec Matusis <matusis at yahoo.com> wrote: > > > > > > > What do you get if you use a program like lsof or ofiles to > > > work > > > > > out > > > > > > > what open resources the zombie process may still be holding > on > > > to? > > > > > > > > > > > > There are 6 zombie sub processes now; executing lsof -p pid > takes > > > > > ages, and > > > > > > it brings the load average up from 8.0 to 23+ on this > machine- so > > > I > > > > > am > > > > > > afraid to wait long enough to get a result. > > > > > > #ps -ef | grep httpd > > > > > > root 16197 1 0 Jan21 ? 00:00:15 > > > > > > nobody 23095 16197 0 08:44 ? 00:00:06 > > > > > > nobody 29548 1 0 13:14 ? 00:00:00 > > > > > > nobody 3812 1 0 13:57 ? 00:00:00 > > > > > > nobody 4161 1 0 13:59 ? 00:00:00 > > > > > > nobody 20110 1 0 15:43 ? 00:00:00 > > > > > > nobody 25399 1 0 16:17 ? 00:00:00 > > > > > > nobody 28722 1 0 16:38 ? 00:00:00 > > > > > > nobody 28971 16197 5 16:40 ? 00:00:20 > > > > > > nobody 29189 16197 7 16:42 ? 00:00:21 > > > > > > nobody 29327 16197 7 16:42 ? 00:00:18 > > > > > > nobody 29453 16197 6 16:43 ? 00:00:13 > > > > > > nobody 29496 16197 10 16:43 ? 00:00:20 > > > > > > nobody 29539 16197 9 16:43 ? 00:00:19 > > > > > > nobody 29639 16197 11 16:44 ? 00:00:14 > > > > > > nobody 29713 16197 11 16:45 ? 00:00:12 > > > > > > nobody 29804 16197 5 16:45 ? 00:00:05 > > > > > > nobody 29857 16197 10 16:45 ? 00:00:09 > > > > > > nobody 29902 16197 10 16:45 ? 00:00:08 > > > > > > nobody 29945 16197 11 16:46 ? 00:00:07 > > > > > > nobody 29998 16197 11 16:46 ? 00:00:06 > > > > > > nobody 30058 16197 16 16:47 ? 00:00:01 > > > > > > > > > > > > note that those zombie sub processes seem to have had > 00:00:00 > > > run > > > > > time, > > > > > > unlike normal sub processes. > > > > > > 3 entries in apache error logs this time: > > > > > > > > > > > > [Tue Jan 22 10:44:45 2008] [notice] child pid 8798 exit > signal > > > > > Segmentation > > > > > > fault (11) > > > > > > ... > > > > > > Fatal Python error: Inconsistent interned string state. > > > > > > > > > > This is corruption of memory used by Python. > > > > > > > > > > What version of mod_python are you using? > > > > > > > > > > > [Tue Jan 22 14:03:16 2008] [notice] child pid 4008 exit > signal > > > > > Aborted (6) > > > > > > > > > > > > is there a faster way to see what it is holding on to? > > > > > > > > > > I know you said the applications weren't spawning sub > processes, > > > but > > > > > if your system has 'ptree' try that. In other words, see if > those > > > > > daemon processes have children which still exist. This was in > part > > > > > what I was hoping to see, ie., pipes to child processes. The > other > > > > > thing I was looking for was stuck file accesses to NFS mounted > > > > > filesystems or something. Alternative to ptree is just to look > at > > > > > parent child relationships in ps output. > > > > > > > > > > Graham > > > > > > > > > > > > -----Original Message----- > > > > > > > From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com] > > > > > > > Sent: Monday, January 21, 2008 10:27 PM > > > > > > > To: Alec Matusis > > > > > > > Cc: mod_python at modpython.org > > > > > > > Subject: Re: [mod_python] remnant 'orphan' apache > subprocesses > > > > > > > > > > > > > > What do you get if you use a program like lsof or ofiles to > > > work > > > > > out > > > > > > > what open resources the zombie process may still be holding > on > > > to? > > > > > > > > > > > > > > Are you absolutely sure that that zombie process is from > the > > > > > current > > > > > > > Apache instance and not perhaps an earlier instance of > Apache? > > > > > > > > > > > > > > Graham > > > > > > > > > > > > > > On 22/01/2008, Alec Matusis <matusis at yahoo.com> wrote: > > > > > > > > > Are CGI scripts used anywhere at all on your Apache web > > > site? > > > > > > > > > > > > > > > > No, only mod_python and serving static files. > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > From: Graham Dumpleton > [mailto:graham.dumpleton at gmail.com] > > > > > > > > > Sent: Monday, January 21, 2008 10:07 PM > > > > > > > > > To: Alec Matusis > > > > > > > > > Cc: mod_python at modpython.org > > > > > > > > > Subject: Re: [mod_python] remnant 'orphan' apache > > > subprocesses > > > > > > > > > > > > > > > > > > Are CGI scripts used anywhere at all on your Apache web > > > site? > > > > > > > > > > > > > > > > > > On 22/01/2008, Alec Matusis <matusis at yahoo.com> wrote: > > > > > > > > > > > What version of Apache are you using? > > > > > > > > > > > > > > > > > > > > 2.2.6 > > > > > > > > > > > > > > > > > > > > > What Python web application are you running on top > of > > > > > > > mod_python, a > > > > > > > > > > > self built one or one that uses one of the larger > web > > > > > > > frameworks? > > > > > > > > > > > > > > > > > > > > Only self-built stuff, nothing complicated. > > > > > > > > > > > > > > > > > > > > > Does your application create sub processes in any > way > > > to > > > > > > > perform > > > > > > > > > > > additional work? > > > > > > > > > > > > > > > > > > > > No sub processes and no threads, except that we use > > > MySQLdb > > > > > > > module > > > > > > > > > (which > > > > > > > > > > might create threads?). > > > > > > > > > > > > > > > > > > > > I noticed a warning in the error log: > > > > > > > > > > /live/scripts/_pro.py:100: Warning: Rows matched: 1 > > > Changed: > > > > > 1 > > > > > > > > > Warnings: 1 > > > > > > > > > > (this is a mysql warning), but I would not think this > is > > > > > > > relevant... > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > > From: Graham Dumpleton > > > [mailto:graham.dumpleton at gmail.com] > > > > > > > > > > > Sent: Monday, January 21, 2008 9:22 PM > > > > > > > > > > > To: Alec Matusis > > > > > > > > > > > Cc: mod_python at modpython.org > > > > > > > > > > > Subject: Re: [mod_python] remnant 'orphan' apache > > > > > subprocesses > > > > > > > > > > > > > > > > > > > > > > What version of Apache are you using? > > > > > > > > > > > > > > > > > > > > > > What Python web application are you running on top > of > > > > > > > mod_python, a > > > > > > > > > > > self built one or one that uses one of the larger > web > > > > > > > frameworks? > > > > > > > > > > > > > > > > > > > > > > Does your application create sub processes in any > way > > > to > > > > > > > perform > > > > > > > > > > > additional work? > > > > > > > > > > > > > > > > > > > > > > Graham > > > > > > > > > > > > > > > > > > > > > > On 22/01/2008, Alec Matusis <matusis at yahoo.com> > wrote: > > > > > > > > > > > > I have been investigating a memory leak that > occurs > > > on > > > > > an > > > > > > > apache > > > > > > > > > > > server > > > > > > > > > > > > since we switched to worker MPM. > > > > > > > > > > > > I found that the source of it are apache > subprocesses > > > > > that > > > > > > > lose > > > > > > > > > track > > > > > > > > > > > of > > > > > > > > > > > > their parent and never exit: > > > > > > > > > > > > > > > > > > > > > > > > root at web10 ~> ps -ef | grep httpd > > > > > > > > > > > > root 16197 1 0 02:00 ? 00:00:09 > > > > > > > > > > > > /usr/local/encap/httpd/bin/httpd -f > > > > > /p2/web/conf/web10.conf - > > > > > > > k > > > > > > > > > start > > > > > > > > > > > > nobody 17750 1 0 17:53 ? 00:00:00 > > > > > > > > > > > > /usr/local/encap/httpd/bin/httpd -f > > > > > /p2/web/conf/web10.conf - > > > > > > > k > > > > > > > > > start > > > > > > > > > > > > nobody 5112 16197 4 20:02 ? 00:00:16 > > > > > > > > > > > > /usr/local/encap/httpd/bin/httpd -f > > > > > /p2/web/conf/web10.conf - > > > > > > > k > > > > > > > > > start > > > > > > > > > > > > nobody 5159 16197 4 20:02 ? 00:00:15 > > > > > > > > > > > > /usr/local/encap/httpd/bin/httpd -f > > > > > /p2/web/conf/web10.conf - > > > > > > > k > > > > > > > > > start > > > > > > > > > > > > nobody 5300 16197 4 20:03 ? 00:00:14 > > > > > > > > > > > > /usr/local/encap/httpd/bin/httpd -f > > > > > /p2/web/conf/web10.conf - > > > > > > > k > > > > > > > > > start > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > in this output, apache child pid 17750 has pid 1 > as a > > > > > parent, > > > > > > > and > > > > > > > > > it > > > > > > > > > > > is one > > > > > > > > > > > > of those 'zombie children'. > > > > > > > > > > > > Pids 5112, 5159, 5300 were normal (parent is pid > > > 16197), > > > > > and > > > > > > > > > they > > > > > > > > > > > exited > > > > > > > > > > > > after MaxRequestsPerChild was reached. > > > > > > > > > > > > > > > > > > > > > > > > Does anybody have any advice on this? I cannot > > > correlate > > > > > this > > > > > > > to > > > > > > > > > > > anything, > > > > > > > > > > > > there's nothing interesting in the server error > log. > > > > > > > > > > > > These 'zombies' appear at a rate of 2-3 per day; > this > > > > > apache > > > > > > > > > serves > > > > > > > > > > > about > > > > > > > > > > > > 350 requests per second. > > > > > > > > > > > > > > > > > > > > > > > > This Apache configuration is > > > > > > > > > > > > > > > > > > > > > > > > ServerLimit 40 > > > > > > > > > > > > ThreadLimit 70 > > > > > > > > > > > > > > > > > > > > > > > > StartServers 10 > > > > > > > > > > > > MaxClients 1600 > > > > > > > > > > > > MinSpareThreads 75 > > > > > > > > > > > > MaxSpareThreads 200 > > > > > > > > > > > > ThreadsPerChild 40 > > > > > > > > > > > > MaxRequestsPerChild 10000 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > > Mod_python mailing list > > > > > > > > > > > > Mod_python at modpython.org > > > > > > > > > > > > > > > http://mailman.modpython.org/mailman/listinfo/mod_python > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
|