[mod_python] RE: Bug in mod_python.c causing multiple/redundantinterpreter creation.

Sun Oct 17 16:45:16 EDT 2004

> -----Message d'origine-----
> De : Graham Dumpleton [mailto:grahamd at dscpl.com.au] 
> Envoyé : dimanche 17 octobre 2004 01:32
> À : Nicolas Lehuen
> Cc : mod_python at modpython.org
> Objet : Re: [mod_python] RE: Bug in mod_python.c causing 
> multiple/redundantinterpreter creation.
> 
> I knew there was an issue with apache.py as well, with there 
> being no locking, but didn't mention it in my email with the 
> fix as I wanted to check that that was the source of the 
> other problem I was seeing and make sure I came up with a proper fix.
> 
> I was also thinking to suggest using
> imp.acquire_lock()/imp.release_lock(), but you need to have a 
> fallback for when version of Python older than 2.3 as those 
> functions were only introduced in that version.

I know this is *bad* but I really think people should move forward and
upgrade to 2.3. Heck, it's not as if 2.3 was brand new ! It always makes me
cringe when people do their best to support versions as old as Python 1.5.
Why is it so difficult to upgrade ?

> I'll come up with proper code later, but I was going to 
> suggest something like:
> 
> if not globals().has_key("_import_lock"):
>    try:
>      from threading import Lock
>    except:
>      class Lock:
>        def acquire(self): pass
>        def release(self): pass
> 
>    _import_lock = Lock()
> 
> def import_module(module_name, autoreload=1, log=0, path=None):
>      """
>      Get the module to handle the request. If
>      autoreload is on, then the module will be reloaded
>      if it has changed since the last import.
>      """
> 
>      _import_lock.acquire()
>      try:
> 	# previous code of apache.import_module
>      finally:
>          _import_lock.release()
> 
> This mess is needed to cater for when threading isn't 
> compiled in to Python, use of an older version of Python that 
> didn't have the imp lock functions and also to avoid risk 
> that apache module might get touched and reloaded and thus 
> lock being overwritten.

You should use a RLock (reentrant), not a Lock, in case you use
import_module to import a module which in turn uses import_module.

> With the way that the apache module caching uses sys.modules 
> as its cache and doesn't keep a separate data structure, I am 
> not sure that a two level locking mechanism would be a good 
> idea. The lock for a two level scheme would at the moment 
> need to be stored in the module itself, as is __mtime__ at present.
>
> Frankly, I reckon the whole thing should be redone using 
> execfile() and a purpose built search routine in place of 
> imp.find_module(). All you loose is the ability to use 
> .pyc/.pyo files. If implemented correctly, you would also 
> loose the restriction in mod_python.publisher about having 
> two copies of the index.py file in different places, but that 
> would actually be a gain.

Well to me .pyc/.pyo files for published modules are a nuisance, since I had
to add an access rule to prevent them from being readable through HTTP (it
would be quite awful to have some of those modules containing passwords on
the net).

As for the restriction about index.py in mod_python.publisher, I really
don't understand why it's still there. It's pretty easy to remove it, it's
just a name mangling issue ! I rewrote my own publisher just because of this
limitations (then I began to add other features). Now that I understand
better how mod_python works, I see how it can be fixed.

I think the core of the problem is that import_module was designed to be
usable for two distincts purposes : loading pages for the publisher and
loading "normal" modules. Loading "normal" modules should be left to Python,
possibly with somes tasks like reloading implemented in import hooks.
Loading pages for the publisher should be pushed into the publisher, and
should have its own module cache, no need to mess with sys.modules (I don't
consider it good practice for a page to "import" another page).

Regards,

Nicolas

> You could optionally, have the support for the __clone__ type 
> method I was talking about before. Ie., in my module 
> reloading mechanism, I load into a fresh module, but a 
> rewrite of the apache scheme, would need to preserve the 
> behaviour of reloading into the existing one otherwise 
> existing code might break. The presence of a __clone__ method 
> could turn that off and instead a new module used in a 
> replacement for the apache scheme.
> 
> If one is reimplementing it to this extent, then you could 
> introduce a two level locking mechanism.
> 
> On 17/10/2004, at 12:46 AM, Nicolas Lehuen wrote:
> 
> > Indeed, having a look at apache.py, I didn't see any 
> locking done to 
> > prevent the same module from being reloaded multiple times 
> > concurrently. What we need is to put a lock in the import_module 
> > method, so that two threads cannot check the freshness of the 
> > publisher simultaneously. This can be done in a rather 
> brutal way by 
> > calling imp.acquire_lock() at the beginning of import_module, put a 
> > try/finally block around the code of the function and call 
> > imp.release_lock() in the finally block :
> >
> > def import_module(module_name, autoreload=1, log=0, path=None):
> >     """
> >     Get the module to handle the request. If
> >     autoreload is on, then the module will be reloaded
> >     if it has changed since the last import.
> >     """
> >
> >     imp.acquire_lock()
> >     try:
> > 	# previous code of apache.import_module
> >     finally:
> >         imp.release_lock()
> >
> > I tested this, and it works as expected. BTW, maybe it 
> wouldn't work 
> > without your patch, Graham, since I could suddenly see two 
> > interpreters with their own copy of the mod_python.apache 
> module. But 
> > the two fixes are required for it to work properly.
> >
> > Why did I wrote "in a rather brutal way" above ? Because 
> this locking 
> > scheme is indeed brutal. It is a global lock, so each and 
> every thread 
> > who want to get its handler must acquire it for a brief amount of 
> > time. The problem is that if a handler module is reloaded, it locks 
> > each and every other thread during its reloading, even if 
> they should 
> > not be handled by the same module.
> > A smarter way to lock would be to use a two-levels locking 
> scheme, as 
> > I described in my caching recipe in the Python Cookbook. I can 
> > implement it there if it's ok for everybody.
> 
> 
> 
> --
> Graham Dumpleton (grahamd at dscpl.com.au)
> 
>