[mod_python] remnant 'orphan' apache subprocesses

Alec Matusis matusis at yahoo.com
Tue Jan 22 20:57:16 EST 2008


Here is another potentially useful clue.
PID 32427 is one of these orphan processes:

# strace -p 32427
Process 32427 attached - interrupt to quit
futex(0x5b7820, FUTEX_WAIT, 0, NULL

and it's just stuck there.
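
The zombie question raised in the quoted message below can also be checked without attaching strace. What follows is only a rough sketch, assuming a Linux /proc filesystem and a modern Python 3 interpreter (not the Python 2.4 this server runs). The helper names parse_proc_status, classify and inspect_pid are made up for illustration. A true zombie reports state Z in /proc/<pid>/status, while a process blocked in futex() like the one above would normally report S (sleeping) and still hold its memory (VmRSS).

```python
def parse_proc_status(text):
    """Parse the 'Key:<tab>value' lines of a Linux /proc/<pid>/status file."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def classify(fields):
    """Return 'zombie', 'orphan' or 'normal' from parsed status fields.

    'orphan' here only means "reparented to pid 1". The Apache parent
    itself also has PPid 1, so exclude it by pid before trusting this.
    """
    state = fields.get("State", "")[:1]
    if state == "Z":
        return "zombie"
    if fields.get("PPid") == "1":
        return "orphan"
    return "normal"


def inspect_pid(pid):
    """Read /proc/<pid>/status and return (classification, resident memory)."""
    with open("/proc/%d/status" % pid) as f:
        fields = parse_proc_status(f.read())
    # Zombies have no VmRSS line at all, hence the default.
    return classify(fields), fields.get("VmRSS", "n/a")


if __name__ == "__main__":
    import sys
    for arg in sys.argv[1:]:
        print(arg, *inspect_pid(int(arg)))
```

If the orphans come back as "orphan" with a non-trivial VmRSS rather than "zombie", they are live processes that were reparented to init, which would be consistent with them holding on to memory.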

> -----Original Message-----
> From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com]
> Sent: Tuesday, January 22, 2008 5:49 PM
> To: Alec Matusis
> Cc: mod_python at modpython.org
> Subject: Re: [mod_python] remnant 'orphan' apache subprocesses
> 
> BTW, originally you said:
> 
> """I have been investigating a memory leak that occurs on an apache server
> since we switched to worker MPM.
> I found that the source of it are apache subprocesses that lose track of
> their parent and never exit:"""
> 
> If the processes are truly zombie processes, then they shouldn't
> actually consume any resources except for the entry in the process
> table. I.e., they are a process accounting artifact.
> 
> So, strictly speaking they shouldn't be causing any memory leaks, or
> more correctly consuming memory which isn't released.
> 
> Graham
> 
> On 23/01/2008, Alec Matusis <matusis at yahoo.com> wrote:
> > > > [Tue Jan 22 10:44:45 2008] [notice] child pid 8798 exit signal Segmentation fault (11)
> > > > ...
> > > > Fatal Python error: Inconsistent interned string state.
> > >
> > > This is corruption of memory used by Python.
> > >
> > > What version of mod_python are you using?
> >
> > Apache/2.2.6 (Unix) mod_python/3.3.1 Python/2.4.1
> >
> > But since there is only one such log entry, I'd expect at most 1 zombie
> > as a result of this? I have 6.
> >
> > > -----Original Message-----
> > > From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com]
> > > Sent: Tuesday, January 22, 2008 5:15 PM
> > > To: Alec Matusis
> > > Cc: mod_python at modpython.org
> > > Subject: Re: [mod_python] remnant 'orphan' apache subprocesses
> > >
> > > On 23/01/2008, Alec Matusis <matusis at yahoo.com> wrote:
> > > > > What do you get if you use a program like lsof or ofiles to work out
> > > > > what open resources the zombie process may still be holding on to?
> > > >
> > > > There are 6 zombie sub processes now; executing lsof -p pid takes ages,
> > > > and it brings the load average up from 8.0 to 23+ on this machine, so I
> > > > am afraid to wait long enough to get a result.
> > > > #ps -ef | grep httpd
> > > > root     16197     1  0 Jan21 ?        00:00:15
> > > > nobody   23095 16197  0 08:44 ?        00:00:06
> > > > nobody   29548     1  0 13:14 ?        00:00:00
> > > > nobody    3812     1  0 13:57 ?        00:00:00
> > > > nobody    4161     1  0 13:59 ?        00:00:00
> > > > nobody   20110     1  0 15:43 ?        00:00:00
> > > > nobody   25399     1  0 16:17 ?        00:00:00
> > > > nobody   28722     1  0 16:38 ?        00:00:00
> > > > nobody   28971 16197  5 16:40 ?        00:00:20
> > > > nobody   29189 16197  7 16:42 ?        00:00:21
> > > > nobody   29327 16197  7 16:42 ?        00:00:18
> > > > nobody   29453 16197  6 16:43 ?        00:00:13
> > > > nobody   29496 16197 10 16:43 ?        00:00:20
> > > > nobody   29539 16197  9 16:43 ?        00:00:19
> > > > nobody   29639 16197 11 16:44 ?        00:00:14
> > > > nobody   29713 16197 11 16:45 ?        00:00:12
> > > > nobody   29804 16197  5 16:45 ?        00:00:05
> > > > nobody   29857 16197 10 16:45 ?        00:00:09
> > > > nobody   29902 16197 10 16:45 ?        00:00:08
> > > > nobody   29945 16197 11 16:46 ?        00:00:07
> > > > nobody   29998 16197 11 16:46 ?        00:00:06
> > > > nobody   30058 16197 16 16:47 ?        00:00:01
> > > >
> > > > note that those zombie sub processes seem to have had 00:00:00 run time,
> > > > unlike normal sub processes.
> > > > 3 entries in apache error logs this time:
> > > >
> > > > [Tue Jan 22 10:44:45 2008] [notice] child pid 8798 exit signal Segmentation fault (11)
> > > > ...
> > > > Fatal Python error: Inconsistent interned string state.
> > >
> > > This is corruption of memory used by Python.
> > >
> > > What version of mod_python are you using?
> > >
> > > > [Tue Jan 22 14:03:16 2008] [notice] child pid 4008 exit signal Aborted (6)
> > > >
> > > > is there a faster way to see what it is holding on to?
> > >
> > > I know you said the applications weren't spawning sub processes, but
> > > if your system has 'ptree' try that. In other words, see if those
> > > daemon processes have children which still exist. This was in part
> > > what I was hoping to see, i.e., pipes to child processes. The other
> > > thing I was looking for was stuck file accesses to NFS mounted
> > > filesystems or something. An alternative to ptree is just to look at
> > > parent child relationships in ps output.
> > >
> > > Graham
> > >
> > > > > -----Original Message-----
> > > > > From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com]
> > > > > Sent: Monday, January 21, 2008 10:27 PM
> > > > > To: Alec Matusis
> > > > > Cc: mod_python at modpython.org
> > > > > Subject: Re: [mod_python] remnant 'orphan' apache subprocesses
> > > > >
> > > > > What do you get if you use a program like lsof or ofiles to work out
> > > > > what open resources the zombie process may still be holding on to?
> > > > >
> > > > > Are you absolutely sure that that zombie process is from the current
> > > > > Apache instance and not perhaps an earlier instance of Apache?
> > > > >
> > > > > Graham
> > > > >
> > > > > On 22/01/2008, Alec Matusis <matusis at yahoo.com> wrote:
> > > > > > > Are CGI scripts used anywhere at all on your Apache web site?
> > > > > >
> > > > > > No, only mod_python and serving static files.
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com]
> > > > > > > Sent: Monday, January 21, 2008 10:07 PM
> > > > > > > To: Alec Matusis
> > > > > > > Cc: mod_python at modpython.org
> > > > > > > Subject: Re: [mod_python] remnant 'orphan' apache subprocesses
> > > > > > >
> > > > > > > Are CGI scripts used anywhere at all on your Apache web site?
> > > > > > >
> > > > > > > On 22/01/2008, Alec Matusis <matusis at yahoo.com> wrote:
> > > > > > > > > What version of Apache are you using?
> > > > > > > >
> > > > > > > > 2.2.6
> > > > > > > >
> > > > > > > > > What Python web application are you running on top of mod_python, a
> > > > > > > > > self built one or one that uses one of the larger web frameworks?
> > > > > > > >
> > > > > > > > Only self-built stuff, nothing complicated.
> > > > > > > >
> > > > > > > > > Does your application create sub processes in any way to perform
> > > > > > > > > additional work?
> > > > > > > >
> > > > > > > > No sub processes and no threads, except that we use MySQLdb module
> > > > > > > > (which might create threads?).
> > > > > > > >
> > > > > > > > I noticed a warning in the error log:
> > > > > > > > /live/scripts/_pro.py:100: Warning: Rows matched: 1 Changed: 1 Warnings: 1
> > > > > > > > (this is a mysql warning), but I would not think this is relevant...
> > > > > > > >
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com]
> > > > > > > > > Sent: Monday, January 21, 2008 9:22 PM
> > > > > > > > > To: Alec Matusis
> > > > > > > > > Cc: mod_python at modpython.org
> > > > > > > > > Subject: Re: [mod_python] remnant 'orphan' apache subprocesses
> > > > > > > > >
> > > > > > > > > What version of Apache are you using?
> > > > > > > > >
> > > > > > > > > What Python web application are you running on top of mod_python, a
> > > > > > > > > self built one or one that uses one of the larger web frameworks?
> > > > > > > > >
> > > > > > > > > Does your application create sub processes in any way to perform
> > > > > > > > > additional work?
> > > > > > > > >
> > > > > > > > > Graham
> > > > > > > > >
> > > > > > > > > On 22/01/2008, Alec Matusis <matusis at yahoo.com> wrote:
> > > > > > > > > > I have been investigating a memory leak that occurs on an apache server
> > > > > > > > > > since we switched to worker MPM.
> > > > > > > > > > I found that the source of it are apache subprocesses that lose track of
> > > > > > > > > > their parent and never exit:
> > > > > > > > > >
> > > > > > > > > > root@web10 ~> ps -ef | grep httpd
> > > > > > > > > > root     16197     1  0 02:00 ?        00:00:09 /usr/local/encap/httpd/bin/httpd -f /p2/web/conf/web10.conf -k start
> > > > > > > > > > nobody   17750     1  0 17:53 ?        00:00:00 /usr/local/encap/httpd/bin/httpd -f /p2/web/conf/web10.conf -k start
> > > > > > > > > > nobody    5112 16197  4 20:02 ?        00:00:16 /usr/local/encap/httpd/bin/httpd -f /p2/web/conf/web10.conf -k start
> > > > > > > > > > nobody    5159 16197  4 20:02 ?        00:00:15 /usr/local/encap/httpd/bin/httpd -f /p2/web/conf/web10.conf -k start
> > > > > > > > > > nobody    5300 16197  4 20:03 ?        00:00:14 /usr/local/encap/httpd/bin/httpd -f /p2/web/conf/web10.conf -k start
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > in this output, apache child pid 17750 has pid 1 as a parent, and it is
> > > > > > > > > > one of those 'zombie children'.
> > > > > > > > > > Pids 5112, 5159, 5300 were normal (parent is pid 16197), and they exited
> > > > > > > > > > after MaxRequestsPerChild was reached.
> > > > > > > > > >
> > > > > > > > > > Does anybody have any advice on this? I cannot correlate this to anything,
> > > > > > > > > > there's nothing interesting in the server error log.
> > > > > > > > > > These 'zombies' appear at a rate of 2-3 per day; this apache serves about
> > > > > > > > > > 350 requests per second.
> > > > > > > > > >
> > > > > > > > > > This Apache configuration is
> > > > > > > > > >
> > > > > > > > > > ServerLimit 40
> > > > > > > > > > ThreadLimit 70
> > > > > > > > > >
> > > > > > > > > > StartServers 10
> > > > > > > > > > MaxClients 1600
> > > > > > > > > > MinSpareThreads 75
> > > > > > > > > > MaxSpareThreads 200
> > > > > > > > > > ThreadsPerChild 40
> > > > > > > > > > MaxRequestsPerChild 10000
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > _______________________________________________
> > > > > > > > > > Mod_python mailing list
> > > > > > > > > > Mod_python at modpython.org
> > > > > > > > > > http://mailman.modpython.org/mailman/listinfo/mod_python
> > > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > >
> > > >
> >
> >
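
The suggestion quoted above, to look at parent/child relationships in ps output, can be automated along these lines. This is a rough sketch only: it assumes the usual `ps -ef` column order (UID, PID, PPID, ...) and a Python 3.7+ interpreter, and find_orphans is a made-up helper name. It reports every listed process whose parent is pid 1, other than the Apache parent itself.

```python
import subprocess


def find_orphans(ps_lines, master_pid):
    """Return pids from `ps -ef` rows whose PPID is 1, excluding master_pid.

    ps_lines should already be filtered to httpd rows (as with grep httpd).
    """
    orphans = []
    for line in ps_lines:
        fields = line.split()
        # Skip the header row and anything too short to carry UID PID PPID.
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        pid, ppid = int(fields[1]), int(fields[2])
        if ppid == 1 and pid != master_pid:
            orphans.append(pid)
    return orphans


if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python find_orphans.py <apache parent pid>
    if len(sys.argv) == 2:
        master = int(sys.argv[1])
        out = subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout
        httpd_rows = [row for row in out.splitlines() if "httpd" in row]
        print(find_orphans(httpd_rows, master))
```

Run against the second ps listing quoted above with 16197 as the parent pid, this should pick out the six children that report pid 1 as their parent.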


