Martin Blais
blais at furius.ca
Mon Jan 2 16:28:55 EST 2006
Thanks Graham.  Looks like a new directive only available in 2.2...
will update now.

More info: I'm starting to hack into mod_python to find out what's
happening there, and python_finalize() is not called for the offending
processes.


On 1/2/06, Graham Dumpleton <grahamd at dscpl.com.au> wrote:
> Instead of using "apachectl restart", try using "apachectl graceful" to see
> if that makes any difference.
>
> You may also want to adjust GracefulShutdownTimeout for this case to
> only allow a certain time for child processes to shutdown.
>
>   http://httpd.apache.org/docs/2.2/mod/mpm_common.html#gracefulshutdowntimeout
>
> By fiddling with this, you may be able to see if given time that the child
> processes do actually shutdown, or whether they actually hang. If they
> hang, then you may be able to attach using gdb to the child process and
> work out where it is hanging.
>
> It may just be a case that with Python loaded it takes longer to shutdown
> than Apache expects it to and so it abruptly kills it. I haven't been able
> to find any directive for specifying a shutdown timeout for "restart" which
> it waits before it kills the child processes.
>
> Graham
>
> Martin Blais wrote ..
> > On 1/2/06, Martin Blais <blais at furius.ca> wrote:
> > > Before someone asks, I did not get seg faults in my log, so that's not
> > > the problem.
> > >
> > > On closer inspection though, I do have a few SSL error messages like
> > > these:
> > >
> > > [Mon Jan 02 15:07:30 2006] [info] (70014)End of file found: SSL handshake interrupted by system [Hint: Stop button pressed in browser?!]
> > > [Mon Jan 02 15:07:30 2006] [info] Connection to child 0 closed with abortive shutdown(server furius.dyndns.biz:443, client 127.0.0.1)
> > > [Mon Jan 02 15:07:31 2006] [info] Connection to child 9 established (server furius.dyndns.biz:443, client 127.0.0.1)
> > > [Mon Jan 02 15:07:31 2006] [info] Seeding PRNG with 136 bytes of entropy
> > >
> > > ... the number of which does not match the number of zombie children
> > > (does not mean that it's not the problem though).
> > >
> > > Any information/tips welcome.
> >
> > It's not the SSL errors, I just reproduced the zombie child error
> > without getting the SSL errors (it's my automated test code that
> > somehow creates the SSL problems, fiddling a lot with the browser does
> > not cause SSL errors, but will result in children not wanting to die).
> >
> > >
> > > On 1/2/06, Martin Blais <blais at furius.ca> wrote:
> > > > Hi
> > > >
> > > > I'm getting messages like this in my apache2 log when I stop it after
> > > > running requests to a mod_python-based web app:
> > > >
> > > > ==> /var/log/apache2/error_log <==
> > > > [Mon Jan 02 13:04:12 2006] [warn] child process 21806 still did not exit, sending a SIGTERM
> > > > [Mon Jan 02 13:04:12 2006] [warn] child process 21768 still did not exit, sending a SIGTERM
> > > > [Mon Jan 02 13:04:12 2006] [warn] child process 21792 still did not exit, sending a SIGTERM
> > > > [Mon Jan 02 13:04:14 2006] [warn] child process 21806 still did not exit, sending a SIGTERM
> > > > [Mon Jan 02 13:04:14 2006] [warn] child process 21768 still did not exit, sending a SIGTERM
> > > > [Mon Jan 02 13:04:14 2006] [warn] child process 21792 still did not exit, sending a SIGTERM
> > > > [Mon Jan 02 13:04:16 2006] [warn] child process 21806 still did not exit, sending a SIGTERM
> > > > [Mon Jan 02 13:04:16 2006] [warn] child process 21768 still did not exit, sending a SIGTERM
> > > > [Mon Jan 02 13:04:16 2006] [warn] child process 21792 still did not exit, sending a SIGTERM
> > > > [Mon Jan 02 13:04:18 2006] [error] child process 21806 still did not exit, sending a SIGKILL
> > > > [Mon Jan 02 13:04:18 2006] [error] child process 21768 still did not exit, sending a SIGKILL
> > > > [Mon Jan 02 13:04:18 2006] [error] child process 21792 still did not exit, sending a SIGKILL
> > > > [Mon Jan 02 13:04:19 2006] [info] removed PID file /var/run/apache2.pid (pid=21754)
> > > > [Mon Jan 02 13:04:19 2006] [notice] caught SIGTERM, shutting down
> > > >
> > > > I've been tracking the problem, and on exit, the cleanup function (in
> > > > Python) that I registered with the mod_python server does get called,
> > > > and exits.
> > > >
> > > > I saw somewhere that recompiling expat, and then all the rest (in my
> > > > case, python-2.4.2, apache2, mod_python-3.1.4, psycopg2) might fix the
> > > > issue (it would be due to a library incompatibility).  I did this, and
> > > > the problem still occurs.  I also tried using mod_python-3.2.5b and
> > > > the problem persists.  This is all running on Linux (Gentoo).
> > > >
> > > > Any clue on how to debug this issue further?  I'm currently trying to
> > > > run gdb on a non-daemon apache2, or trying to attach to a running
> > > > child.  The problem is a bit tricky to fix, since it does not happen
> > > > always -- but always happens at least after I run a lot of requests
> > > > e.g. my automated test suite for my web app.
> > > >
> > > > Any hints would be appreciated.
> > > >
> > > > cheers,
> > > >
> > >
> > _______________________________________________
> > Mod_python mailing list
> > Mod_python at modpython.org
> > http://mailman.modpython.org/mailman/listinfo/mod_python
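
Since the problem only shows up intermittently, it may help to tally which children ignored the shutdown signals after each test run. What follows is a minimal Python sketch, assuming the error_log line format quoted in the original message; the function name summarize_stuck_children and the SAMPLE excerpt (a subset of the lines quoted above) are illustrative only and are not part of Apache or mod_python.

```python
import re
from collections import defaultdict

# Matches Apache error_log lines such as:
# [Mon Jan 02 13:04:12 2006] [warn] child process 21806 still did not exit, sending a SIGTERM
# \s+ tolerates a line wrap between "did not" and "exit".
PATTERN = re.compile(
    r"child process (\d+) still did not\s+exit, sending a (SIGTERM|SIGKILL)"
)


def summarize_stuck_children(log_text):
    """Return {pid: {"SIGTERM": n, "SIGKILL": n}} for children that did not exit."""
    counts = defaultdict(lambda: {"SIGTERM": 0, "SIGKILL": 0})
    for match in PATTERN.finditer(log_text):
        pid = int(match.group(1))
        signal_name = match.group(2)
        counts[pid][signal_name] += 1
    return dict(counts)


# Excerpt of the log quoted in this thread (hypothetical test input).
SAMPLE = """\
[Mon Jan 02 13:04:12 2006] [warn] child process 21806 still did not exit, sending a SIGTERM
[Mon Jan 02 13:04:12 2006] [warn] child process 21768 still did not exit, sending a SIGTERM
[Mon Jan 02 13:04:14 2006] [warn] child process 21806 still did not exit, sending a SIGTERM
[Mon Jan 02 13:04:18 2006] [error] child process 21806 still did not exit, sending a SIGKILL
[Mon Jan 02 13:04:18 2006] [error] child process 21768 still did not exit, sending a SIGKILL
[Mon Jan 02 13:04:19 2006] [info] removed PID file /var/run/apache2.pid (pid=21754)
"""

if __name__ == "__main__":
    # Print one line per stuck child, ordered by PID.
    for pid, signals in sorted(summarize_stuck_children(SAMPLE).items()):
        print(pid, signals)
```

A child that ends up with a SIGKILL count above zero never exited on its own, so its PID is a candidate for attaching gdb on the next run.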