[mod_python] Update python modules without restarting Apache

Sun Oct 10 15:12:50 EDT 2004

Hi Graham,

First, thank you for your thorough analysis of the imp module's behaviour in
a multi-threaded environment. It reinforces my feeling that NOT using this
module is the better choice for my publisher, since I can understand exactly
what the publisher does without depending on any arcane locking mechanism.
With my publisher, I know precisely what the locking policy is. Have a look
at the Cache recipe:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/302997

The trick I used to reduce cache contention is a two-level locking scheme.
The first level is a lock for the whole cache, used to make sure that no two
entries are created for the same key (here the key is the file name of the
module). The second level is an entry-specific lock, which is used to make
sure that a module cannot be initialized by more than one thread at a time.
This two-level locking enables multiple threads to handle different modules
concurrently: the possibly lengthy module initialisation is done within a
module-specific lock, so other threads can initialize other modules in the
meantime.
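
To make the two-level scheme concrete, here is a minimal sketch. It is not
the code from the recipe: the names Entry, ModuleCache and _load are
hypothetical, the freshness check is assumed to be a simple comparison of
file modification times, and it is written for current Python rather than
the Python of the original publisher.

```python
import os
import threading
import types


class Entry:
    """One cache slot: its own lock, the loaded module, the mtime seen at load."""

    def __init__(self):
        self.lock = threading.Lock()
        self.module = None
        self.mtime = None


class ModuleCache:
    def __init__(self):
        self._lock = threading.Lock()  # first level: protects the dict only
        self._entries = {}

    def get(self, filename):
        # First level: held just long enough to find or create the entry,
        # so two entries are never created for the same file name.
        with self._lock:
            entry = self._entries.get(filename)
            if entry is None:
                entry = self._entries[filename] = Entry()

        # Second level: only threads asking for this file wait here.
        with entry.lock:
            mtime = os.path.getmtime(filename)
            if entry.module is None or entry.mtime != mtime:
                entry.module = self._load(filename)
                entry.mtime = mtime
            return entry.module

    @staticmethod
    def _load(filename):
        # Build a fresh module object on every (re)load; threads still
        # running the old one keep their own reference to it.
        name = os.path.splitext(os.path.basename(filename))[0]
        module = types.ModuleType(name)
        module.__file__ = filename
        with open(filename) as source:
            code = compile(source.read(), filename, "exec")
        exec(code, module.__dict__)
        return module
```

The cache-wide lock is released before the entry lock is taken, so a slow
module initialisation only blocks requests for that same file.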

So what happens when a page is accessed through my publisher?

1) the publisher determines which file is targeted by the HTTP request, just
by using req.filename
2) the publisher gets the module keyed by the filename from the module cache
3) the publisher calls a special callable object in the module
(http_handler) or finds one by applying acquisition semantics.
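
The three steps can be sketched roughly as follows. This is only an
illustration: publish and get_module are hypothetical names, get_module
stands in for the module cache lookup, and the acquisition fallback of
step 3 is not shown (a missing http_handler simply raises an error here).

```python
def publish(req, get_module):
    # Step 1: the target file comes straight from the request.
    filename = req.filename

    # Step 2: fetch the module for that file. Any locking happens inside
    # get_module; from here on this thread holds its own reference, so a
    # reload in another thread does not affect the current request.
    module = get_module(filename)

    # Step 3: call the special callable exposed by the module.
    handler = getattr(module, "http_handler", None)
    if not callable(handler):
        raise LookupError("no http_handler in %s" % filename)
    return handler(req)
```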

At this high level, no locking is required. If the module needs to be
reloaded, the threads which have already reached step 3 will keep on using
the stale module. Once they are done, the stale module will be dereferenced
and eventually garbage collected (with all the objects it contains).
Meanwhile, new threads arriving at step 2 will get the new module. All is
well that ends well.

Now if we look at what happens at step 2 when a module is already up to
date, we see that we first acquire a lock on the module cache for a very
brief amount of time (just to get the entry for the filename), then a lock
on the module itself for another brief amount of time (just to check that
the module is indeed up to date). Thereafter, all threads are on their own
to do whatever they need in parallel.

So we have indeed a very small contention point where all requests are
serialized, then another small contention point where all requests targeted
at the same module are serialized, then no more contention until the request
is over. That is to say, all the hard work (database requests, file
processing, template rendering etc.) is indeed done in multiple concurrent
threads.

I don't see how this can be done in a better way. In any case, you certainly
should NOT acquire a lock and do all the request processing within this
lock, as in the code you wrote below... This effectively kills any attempt
at multi-threading.

The best part of having properly managed modules is that you can instantiate
long-living objects in the modules that are shared between multiple threads.
The best example is a thread-safe DB connection pool. In fact, my publisher
automatically checks whether the module targeted by a request has special
callable objects named 'prepare', 'commit', 'rollback' and 'release'. In
those callables, a developer can write code to manage DB connections. The
net result is that in most of my applications, I only have to write
req.cursor to get a safe pooled cursor to the database I use. I don't have
to worry anymore about opening / closing DB connections!
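
As an illustration of how such hooks might be driven, here is a small
sketch. The ordering is an assumption, not something taken from the
publisher itself: 'prepare' before the handler, 'commit' on success,
'rollback' on an exception, and 'release' in every case. The function name
run_with_hooks and the choice of passing req to each hook are hypothetical
as well.

```python
def run_with_hooks(module, req, handler):
    def call(name):
        # Hooks are optional: only call those the module actually defines.
        hook = getattr(module, name, None)
        if callable(hook):
            hook(req)

    call("prepare")              # e.g. take a connection from the pool
    try:
        result = handler(req)
    except Exception:
        call("rollback")         # assumed: undo the work on failure
        raise
    else:
        call("commit")           # assumed: make the work permanent on success
        return result
    finally:
        call("release")          # always give the connection back
```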

In fact, I extended this behaviour: when a page module is loaded, the
publisher looks for a 'global.pyapp' file which contains a Python module
shared by all modules found in the same directory. This 'global.pyapp' file
is an application-wide module which usually defines global objects such as
the aforementioned DB connection pool, plus the same special callables
'prepare', 'commit', 'rollback' and 'release', which are called before the
ones from the targeted module. This is equivalent to the 'global.asa' files
found in Microsoft's ASP.
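
A rough sketch of that lookup and of the calling order could look like
this. The helper names application_file and hook_chain are hypothetical;
only the 'global.pyapp' file name, the same-directory rule and the
"application first, page second" order come from the description above.

```python
import os


def application_file(page_filename):
    """Return the global.pyapp next to the page file, or None if absent."""
    directory = os.path.dirname(os.path.abspath(page_filename))
    candidate = os.path.join(directory, "global.pyapp")
    return candidate if os.path.isfile(candidate) else None


def hook_chain(name, app_module, page_module):
    """Hooks called `name`, application-wide module first, then the page."""
    hooks = []
    for module in (app_module, page_module):
        hook = getattr(module, name, None)
        if callable(hook):
            hooks.append(hook)
    return hooks
```

Calling every hook returned by hook_chain("prepare", app, page) in order
gives the application-wide module a chance to set things up before the
page module runs its own hook.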

I plan on releasing my code on the web, but I lack the time to build a
proper web site for now...

Anyway, thanks for this interesting discussion :)

Regards,

Nicolas

> -----Original Message-----
> From: mod_python-bounces at modpython.org 
> [mailto:mod_python-bounces at modpython.org] On Behalf Of 
> Graham Dumpleton
> Sent: Sunday, October 10, 2004 12:14
> To: mod_python user mailing list
> Subject: Re: [mod_python] Update python modules without 
> restarting Apache
> 
> 
> On 10/10/2004, at 1:55 PM, Graham Dumpleton wrote:
> 
> > The point I am trying to make is that perhaps locking shouldn't 
> > pertain just to the global data and that executable code should in 
> > some way be protected as well.
> > It needs more exploration, but perhaps the module should read 
> > something like the following (untested).
> >
> >   from threading import Lock
> >
> >   if "datalock" not in globals():
> >     # This is a brand new import and we are inside the scope of the
> >     # import lock, so this should all be safe to execute.
> >
> >     datalock = Lock()
> >     codelock = Lock()
> >
> >     myglobal1 = None
> >     myglobal2 = None
> >     myglobal3 = None
> >
> >     ... arbitrary code to initialise globals
> >
> >   # Hold both locks while the module body (re)defines its functions.
> >   codelock.acquire()
> >   datalock.acquire()
> >
> >   ... extra code to fiddle the globals
> >
> >   def modify1(req):
> >     datalock.acquire()
> >     ...
> >     datalock.release()
> >     return value
> >
> >   def modify2(req,value):
> >     datalock.acquire()
> >     ...
> >     datalock.release()
> >
> >   def _handler(req):
> >     value = modify1(req)
> >     modify2(req,value)
> >     ...
> >
> >   def handler(req):
> >     # code for this method should never be changed
> >     codelock.acquire()
> >     try:
> >       result = _handler(req)
> >     finally:
> >       codelock.release()
> >     return result
> >
> >   datalock.release()
> >   codelock.release()
> >
> > What is trying to be done here is using a lock to ensure that any code
> > itself is not replaced while the handler is being executed. The reason
> > for two level handler arrangement, is that the lock actually needs to
> > be acquired outside of the real handler method being called. If it is
> > done inside, it is too late, as Python has already grabbed the
> > existing code for handler and if it gets replaced prior to the lock
> > being acquired, everything may have changed.
> >
> > Am I being too paranoid? I don't understand enough about how code gets
> > replaced when a module reload is occurring.
> 
> Hmmm, maybe gone slightly too far here. The code lock 
> effectively serialises access to the handler, which screws up 
> the whole idea of having threads in the first place.
> 
> I really like this idea of using execfile() as described in 
> other email and copying across state variables of interest 
> into a new module, as avoids the need for the code lock 
> anyway as new code constructed in new module.
> 
> --
> Graham Dumpleton (grahamd at dscpl.com.au)
> 
> _______________________________________________
> Mod_python mailing list
> Mod_python at modpython.org
> http://mailman.modpython.org/mailman/listinfo/mod_python
>