[mod_python] RE: Bug in mod_python.c causing multiple/redundantinterpreter creation.

Sun Oct 17 10:32:00 EDT 2004

I knew there was an issue with apache.py as well, with there being no 
locking,
but didn't mention it in my email with the fix as I wanted to check 
that that
was the source of the other problem I was seeing and make sure I came 
up with
a proper fix.

I was also thinking to suggest using 
imp.acquire_lock()/imp.release_lock(), but
you need to have a fallback for when version of Python older than 2.3 
as those
functions were only introduced in that version.

I'll come up with proper code later, but I was going to suggest 
something like:

if not globals().has_key("_import_lock"):
   try:
     from threading import Lock
   except:
     class Lock:
       def acquire(self): pass
       def release(self): pass

   _import_lock = Lock()

def import_module(module_name, autoreload=1, log=0, path=None):
     """
     Get the module to handle the request. If
     autoreload is on, then the module will be reloaded
     if it has changed since the last import.
     """

     _import_lock.acquire()
     try:
	# previous code of apache.import_module
     finally:
         _import_lock.release()

This mess is needed to cater for when threading isn't compiled in to 
Python,
use of an older version of Python that didn't have the imp lock 
functions
and also to avoid risk that apache module might get touched and reloaded
and thus lock being overwritten.

With the way that the apache module caching uses sys.modules as its 
cache
and doesn't keep a separate data structure, I am not sure that a two 
level
locking mechanism would be a good idea. The lock for a two level scheme
would at the moment need to be stored in the module itself, as is 
__mtime__
at present.

Frankly, I reckon the whole thing should be redone using execfile() and 
a
purpose built search routine in place of imp.find_module(). All you 
loose
is the ability to use .pyc/.pyo files. If implemented correctly, you 
would
also loose the restriction in mod_python.publisher about having two 
copies
of the index.py file in different places, but that would actually be a 
gain.

You could optionally, have the support for the __clone__ type method I 
was
talking about before. Ie., in my module reloading mechanism, I load 
into a
fresh module, but a rewrite of the apache scheme, would need to preserve
the behaviour of reloading into the existing one otherwise existing code
might break. The presence of a __clone__ method could turn that off and
instead a new module used in a replacement for the apache scheme.

If one is reimplementing it to this extent, then you could introduce a 
two
level locking mechanism.

On 17/10/2004, at 12:46 AM, Nicolas Lehuen wrote:

> Indeed, having a look at apache.py, I didn't see any locking done to 
> prevent
> the same module from being reloaded multiple times concurrently. What 
> we
> need is to put a lock in the import_module method, so that two threads
> cannot check the freshness of the publisher simultaneously. This can 
> be done
> in a rather brutal way by calling imp.acquire_lock() at the beginning 
> of
> import_module, put a try/finally block around the code of the function 
> and
> call imp.release_lock() in the finally block :
>
> def import_module(module_name, autoreload=1, log=0, path=None):
>     """
>     Get the module to handle the request. If
>     autoreload is on, then the module will be reloaded
>     if it has changed since the last import.
>     """
>
>     imp.acquire_lock()
>     try:
> 	# previous code of apache.import_module
>     finally:
>         imp.release_lock()
>
> I tested this, and it works as expected. BTW, maybe it wouldn't work 
> without
> your patch, Graham, since I could suddenly see two interpreters with 
> their
> own copy of the mod_python.apache module. But the two fixes are 
> required for
> it to work properly.
>
> Why did I wrote "in a rather brutal way" above ? Because this locking 
> scheme
> is indeed brutal. It is a global lock, so each and every thread who 
> want to
> get its handler must acquire it for a brief amount of time. The 
> problem is
> that if a handler module is reloaded, it locks each and every other 
> thread
> during its reloading, even if they should not be handled by the same 
> module.
> A smarter way to lock would be to use a two-levels locking scheme, as I
> described in my caching recipe in the Python Cookbook. I can implement 
> it
> there if it's ok for everybody.

--
Graham Dumpleton (grahamd at dscpl.com.au)