Graham Dumpleton
grahamd at dscpl.com.au
Wed Jan 4 05:29:36 EST 2006
On 04/01/2006, at 5:17 PM, Martin Blais wrote:

> I think I got my hands on something here, here is what I got from a
> gdb backtrace when I quickly attach before the child is killed:
>
> #0  0xb7bec054 in __pthread_sigsuspend () from /lib/libpthread.so.0
> #1  0xb7bebe98 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
> #2  0xb7becd6b in sem_wait@@GLIBC_2.1 () from /lib/libpthread.so.0
> #3  0xb7a2a54f in PyThread_acquire_lock () from /usr/lib/apache2/modules/mod_python.so
> #4  0xb7a2373e in PyThreadState_Delete () from /usr/lib/apache2/modules/mod_python.so
> #5  0xb7a232f8 in PyInterpreterState_Delete () from /usr/lib/apache2/modules/mod_python.so
> #6  0xb7a24663 in Py_EndInterpreter () from /usr/lib/apache2/modules/mod_python.so
> #7  0xb799ff08 in python_finalize () from /usr/lib/apache2/modules/mod_python.so
> #8  0xb7ccb6ed in apr_pool_cleanup_run () from /usr/lib/libapr-0.so.0
> #9  0xb7ccaf8d in apr_pool_destroy () from /usr/lib/libapr-0.so.0
> #10 0x0806860c in ap_graceful_stop_signalled ()
> #11 0xb7beeef5 in __pthread_sighandler () from /lib/libpthread.so.0
> #12 <signal handler called>
> #13 0xb7b750b8 in poll () from /lib/libc.so.6
> #14 0xb7ccc2af in apr_poll () from /usr/lib/libapr-0.so.0
> #15 0xb7ccca53 in apr_wait_for_io_or_timeout () from /usr/lib/libapr-0.so.0
> #16 0xb7cc1f74 in apr_socket_recv () from /usr/lib/libapr-0.so.0
> #17 0xb7dd9335 in apr_bucket_socket_create () from /usr/lib/libaprutil-0.so.0
> #18 0xb7dd9bce in apr_brigade_split_line () from /usr/lib/libaprutil-0.so.0
> #19 0x0807fdfb in ap_get_request_note ()
> #20 0x080767a6 in ap_get_brigade ()
> #21 0xb7f3db96 in ?? () from /usr/lib/apache2/modules/mod_logio.so
> #22 0x08250d80 in ?? ()
> #23 0x08272020 in ?? ()
> #24 0x00000001 in ?? ()
> #25 0x00000000 in ?? ()
>
> Apache terminates its children by sending them a signal in the first
> place (#12).
> Then, in the Python HEAD_LOCK
> (Python-2.4.2/Python/pystate.c), I inspected the code and I can't find
> a deadlock problem. But the pthread semaphore seem to use a signal
> itself (the name of the call in frame #0), and we're already in a
> signal handler, isn't this prohibited?

The stack trace is a bit bogus from what I can tell. In the various
MPMs I looked at, the ap_graceful_stop_signalled() function simply
sets a variable and returns. It doesn't go calling apr_pool_destroy().

Anyway, seeing the stack trace I can see where the problem lies and
can simulate the situation with a test case.

What it all comes down to is that the signal handler for SIGTERM in
the child process is registered as:

  apr_signal(SIGTERM, just_die);

Thus when the SIGTERM is received it calls just_die(). The just_die()
function calls clean_child_exit(), which, if a memory pool exists for
the child process, calls apr_pool_destroy() on that memory pool.

The problem then is that mod_python registers a cleanup handler
associated with that memory pool, namely python_finalize(). I.e., it
calls:

  apr_pool_cleanup_register(p, NULL, python_finalize,
                            apr_pool_cleanup_null);

This means that when that memory pool is destroyed, the
python_finalize() function is called, which is wrong in that
situation for a couple of reasons.

The first reason is that complex things should not be done from inside
signal handlers unless the code being called is heavily protected
against being invoked from a signal handler while in a critical
section. There is no way that general Python API functions are going
to fall into that category.

The second reason is that at the time the signal occurs, the main
program thread is already deep within Python code and probably has
various locks acquired.
When the signal handler calls into Py_Finalize() it is most
likely reaching a point where it wants to acquire the same lock that
the main program thread holds, and it effectively deadlocks: the
signal handler can't proceed until it gets the lock, but the main
program thread can't give it up while the signal handler is running.
At least this is the case on UNIX systems, where signal handlers
interrupt the execution of the main program thread, unlike Win32 where
signal handlers are a distinct thread in their own right.

My immediate question is why Py_Finalize() even needs to be called
within the context of the child process if it is simply being killed
off anyway. I know that for the Apache main process, when doing a
restart, Py_Finalize() needs to be called as the same process is kept
around, but for a child process I don't see the point, except maybe to
flush out stderr/stdout, which aren't typically used in mod_python
anyway.

Time now to work out why python_finalize() needs to be called. Maybe
it can't simply not do anything when called in the context of the
child process.

Graham
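
For illustration only, here is a minimal sketch of the usual alternative to doing cleanup inside a signal handler: the handler just records that the signal arrived, and the cleanup runs later from normal program context, where acquiring locks is safe. It assumes plain C signal() rather than apr_signal(), and the names on_sigterm, run_cleanup and serve_until_stopped are hypothetical; none of this is Apache or mod_python code. The raise() call stands in for the parent process sending SIGTERM.

```c
#include <signal.h>

/* Hypothetical names throughout; not taken from Apache or mod_python. */

/* The only state the handler touches. sig_atomic_t is the type the C
 * standard allows a handler to write safely. */
volatile sig_atomic_t stop_requested = 0;

/* Counts how often the (stand-in) cleanup ran. */
int cleanup_runs = 0;

/* Handler does the bare minimum: remember that the signal arrived.
 * No locks, no allocation, no calls into complex libraries. */
static void on_sigterm(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Stand-in for expensive cleanup such as finalizing an interpreter.
 * It runs in normal context, so taking locks here cannot deadlock
 * against code that the signal interrupted. */
static void run_cleanup(void)
{
    cleanup_runs++;
}

/* Install the handler, simulate a work loop, and clean up after the
 * loop has noticed the flag. Returns the number of iterations done. */
int serve_until_stopped(int max_iterations)
{
    int i;

    signal(SIGTERM, on_sigterm);

    for (i = 0; i < max_iterations && !stop_requested; i++) {
        /* ... handle one request here ... */
        if (i == 2)
            raise(SIGTERM); /* simulate the parent sending SIGTERM */
    }

    if (stop_requested)
        run_cleanup();

    return i;
}
```

Calling serve_until_stopped(10) leaves the loop once the flag is seen, and run_cleanup() then executes exactly once, outside the handler.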