[mod_python] Update python modules without restarting Apache

Nicolas Lehuen nicolas at lehuen.com
Sun Oct 10 15:34:52 EDT 2004


Note that there is a huge caveat to all this, however: currently,
mod_python does not load handler modules in a thread-safe way. This means
that if your web site is under a steady load, with many threads processing
requests all the time, then when you restart your Apache server you'll see
many messages from mod_python in the error log stating that the handler is
being (re)loaded (this can happen with any handler, even
mod_python.publisher).

I don't know precisely yet what the impact of this bug is, but it does at
least mean that at initialisation time, you may have many module caches in
memory when you wanted only one. Thus, at least during initialisation, you
don't benefit from data sharing. The least I can say is that I find this
disheartening.
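
A minimal sketch of the kind of guard that prevents duplicate loads, assuming a plain dictionary cache protected by a single lock. This is not mod_python's actual import code; `LockedLoader`, `factory` and `demo` are hypothetical names chosen for illustration.

```python
import threading


class LockedLoader:
    """Hypothetical loader that runs `factory` at most once per key."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._cache = {}
        self.load_count = 0  # how many times the factory actually ran

    def get(self, key):
        # The check and the load happen under the same lock, so two
        # threads can never both decide that the key is missing.
        with self._lock:
            if key not in self._cache:
                self.load_count += 1
                self._cache[key] = self._factory(key)
            return self._cache[key]


def demo(n_threads=8):
    """Start several threads that all ask for the same handler."""
    loader = LockedLoader(lambda name: {"name": name})
    results = []

    def worker():
        results.append(loader.get("handler"))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return loader.load_count, results
```

Without the lock, several threads starting at the same moment could each see an empty cache and each load their own copy, which would match the repeated (re)load messages in the error log. Holding one lock for the whole load is the simplest fix, at the price of serialising all loads.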

When writing application server code (and I wrote quite a lot of it in a
former life), threading should be your first concern, not an afterthought.
But I guess this is a cultural problem coming from the Linux forking MPM
heritage; maybe people aren't used to using threads on Linux. Well, I guess
I'm not the only Win32 or Mac OS X guy out there who would like better
support for threads in mod_python... I could have a look and try to fix this
bug, but I still have no news about the previous patch I submitted...

Regards,

Nicolas

> -----Original Message-----
> From: mod_python-bounces at modpython.org 
> [mailto:mod_python-bounces at modpython.org] On Behalf Of Nicolas Lehuen
> Sent: Sunday, October 10, 2004 14:13
> To: 'Graham Dumpleton'; 'mod_python user mailing list'
> Subject: RE: [mod_python] Update python modules without 
> restarting Apache
> 
> Hi Graham,
> 
> First, thank you for your thorough analysis of the imp module's 
> behaviour in a multi-threaded environment. This reinforces my 
> feeling that NOT using this module is far better for my 
> publisher, since I can understand precisely what it does 
> without depending on any arcane locking mechanisms. With my 
> publisher, I know precisely what the locking policy is. Have 
> a look at the Cache recipe:
> 
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/302997
> 
> The trick I used to reduce cache contention is a two-level 
> locking scheme. The first level is a lock for the whole 
> cache, used to make sure that no two entries are created for 
> the same key (here the key is the file name of the module). 
> The second level is an entry-specific lock, which is used to 
> make sure that a module cannot be initialized by more than 
> one thread. This two-level locking enables multiple threads 
> to handle different modules concurrently: the possibly 
> lengthy module initialisation process is done within a 
> module-specific lock, so other threads can initialize other 
> modules concurrently.
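
The two-level scheme described above can be sketched roughly as follows. This is an illustration only, not the code of the linked recipe: `TwoLevelCache`, `_Entry`, `loader` and `stamp` are hypothetical names, and using a comparable "stamp" (for instance a file modification time) to decide whether an entry is up to date is an assumption.

```python
import threading


class _Entry:
    """One cache slot, with its own lock (the second level)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.loaded = False
        self.stamp = None
        self.value = None


class TwoLevelCache:
    def __init__(self, loader, stamp):
        self._loader = loader          # builds the value for a key
        self._stamp = stamp            # assumed freshness marker for a key
        self._lock = threading.Lock()  # first level: guards the dict only
        self._entries = {}

    def get(self, key):
        # Level 1: held just long enough to find or create the entry,
        # so no two entries are ever created for the same key.
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                entry = self._entries[key] = _Entry()
        # Level 2: the possibly slow load runs under the entry's own
        # lock, so other keys can be loaded concurrently.
        with entry.lock:
            current = self._stamp(key)
            if not entry.loaded or entry.stamp != current:
                entry.value = self._loader(key)
                entry.stamp = current
                entry.loaded = True
            return entry.value
```

In a real publisher the `stamp` callable could be something like `os.path.getmtime`, and `loader` would build a module object from the file; both are left to the caller here.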
> 
> So what happens when a page is accessed by my publisher?
> 
> 1) the publisher determines which file is targeted by the HTTP 
> request, just by using req.filename
> 2) the publisher gets the module keyed by the filename from 
> the module cache
> 3) the publisher calls a special callable object in the module
> (http_handler) or finds one by applying acquisition semantics.
> 
> At this high level, no locking is required. If the module 
> needs to be reloaded, the threads which have already reached 
> step 3 will keep on using the stale module. Once they are 
> done, the stale module will be dereferenced and eventually 
> garbage collected (with all the objects it contains). Meanwhile, 
> new threads arriving at step 2 will get the new module. All's 
> well that ends well.
> 
> Now if we study what happens at step 2 when a module is 
> already up to date, we see that we first hold a lock 
> on the module cache for a very brief time (just 
> to get the entry for the filename), then a lock on the module 
> itself for another brief time (just to check that 
> the module is indeed up to date). Thereafter, all threads are 
> on their own to do whatever they need in parallel.
> 
> So we have indeed a very small contention point where all 
> requests are serialized, then another small contention point 
> where all requests targeted at the same module are 
> serialized, then no more contention until the request is 
> over. That is to say, all the hard work (database requests, 
> file processing, template rendering etc.) is indeed done in 
> multiple concurrent threads.
> 
> I don't see how this can be done in a better way. In any 
> case, you certainly should NOT acquire a lock and do all the 
> request processing within this lock, as you wrote below... 
> This effectively defeats any attempt at multi-threading.
> 
> The best part of having properly managed modules is that 
> you can instantiate long-lived objects in the modules that 
> are shared between multiple threads. The best example is 
> a thread-safe DB connection pool. In fact, my publisher 
> automatically checks whether the module targeted by a request has 
> special callable objects named 'prepare', 'commit', 
> 'rollback' and 'release'. In those methods, a developer can 
> write some code to manage DB connections. The net result is 
> that in most of my applications, I only have to write 
> req.cursor to have a safe pooled cursor to the database I 
> use. I don't have to worry anymore about opening / closing DB 
> connections!
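
A rough sketch of how such optional hooks might be driven around a request. The hook names and `http_handler` come from the message above; the helper names `call_hook` and `run_request`, and the calling order (commit on success, rollback on an exception, release in every case), are assumptions rather than the publisher's documented behaviour.

```python
import types


def call_hook(module, name, req):
    """Call module.<name>(req) if the module defines such a callable."""
    hook = getattr(module, name, None)
    if callable(hook):
        hook(req)


def run_request(module, req):
    """Run one request through the assumed hook lifecycle."""
    call_hook(module, "prepare", req)
    try:
        try:
            result = module.http_handler(req)
        except Exception:
            # Assumed policy: undo the work when the handler fails.
            call_hook(module, "rollback", req)
            raise
        call_hook(module, "commit", req)
        return result
    finally:
        # Assumed policy: always hand resources back, e.g. to a pool.
        call_hook(module, "release", req)


# A stand-in for a page module that defines only some of the hooks.
page = types.SimpleNamespace(
    prepare=lambda req: None,
    http_handler=lambda req: "ok",
)
```

A `prepare` hook could, for example, take a connection from the shared pool and attach a cursor to the request, with `release` returning the connection afterwards.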
> 
> In fact, I extended this behaviour: when a page module is 
> loaded, the publisher looks for a 'global.pyapp' file which 
> contains a Python module shared by all modules found in the 
> same directory. This 'global.pyapp' file is an 
> application-wide module which usually defines global objects 
> such as the aforementioned DB connection pool, plus the same 
> special methods 'prepare', 'commit', 'rollback', 'release' 
> that are called before the ones from the targeted module. 
> This is equivalent to the 'global.asa' files found in Microsoft's ASP.
> 
> I plan on releasing my code on the web, but I don't have the 
> time to build a proper web site for now...
> 
> Anyway, thanks for this interesting discussion :)
> 
> Regards,
> 
> Nicolas
> 
>  
> 
> > -----Original Message-----
> > From: mod_python-bounces at modpython.org 
> > [mailto:mod_python-bounces at modpython.org] On Behalf Of Graham 
> > Dumpleton
> > Sent: Sunday, October 10, 2004 12:14
> > To: mod_python user mailing list
> > Subject: Re: [mod_python] Update python modules without 
> > restarting Apache
> > 
> > 
> > On 10/10/2004, at 1:55 PM, Graham Dumpleton wrote:
> > 
> > > > The point I am trying to make is that perhaps locking shouldn't 
> > > > pertain just to the global data and that executable code should 
> > > > in some way be protected as well.
> > > > It needs more exploration, but perhaps the module should read 
> > > > something like the following (untested).
> > >
> > > >   from threading import Lock
> > > >
> > > >   if "datalock" not in globals():
> > > >     # This is a brand new import and we are inside the scope of the
> > > >     # import lock, so this should all be safe to execute.
> > > >
> > > >     datalock = Lock()
> > > >     codelock = Lock()
> > > >
> > > >     myglobal1 = None
> > > >     myglobal2 = None
> > > >     myglobal3 = None
> > > >
> > > >     # ... arbitrary code to initialise globals
> > > >
> > > >   # Hold both locks while the module body (re)defines the code.
> > > >   codelock.acquire()
> > > >   datalock.acquire()
> > > >
> > > >   # ... extra code to fiddle the globals
> > > >
> > > >   def modify1(req):
> > > >     datalock.acquire()
> > > >     ...
> > > >     datalock.release()
> > > >     return value
> > > >
> > > >   def modify2(req,value):
> > > >     datalock.acquire()
> > > >     ...
> > > >     datalock.release()
> > > >
> > > >   def _handler(req):
> > > >     value = modify1(req)
> > > >     modify2(req,value)
> > > >     ...
> > > >
> > > >   def handler(req):
> > > >     # code for this method should never be changed
> > > >     codelock.acquire()
> > > >     try:
> > > >       result = _handler(req)
> > > >     finally:
> > > >       codelock.release()
> > > >     return result
> > > >
> > > >   datalock.release()
> > > >   codelock.release()
> > >
> > > > The aim here is to use a lock to ensure that the code itself is 
> > > > not replaced while the handler is being executed. The reason for 
> > > > the two-level handler arrangement is that the lock actually needs 
> > > > to be acquired outside of the real handler method being called. 
> > > > If it is done inside, it is too late, as Python has already 
> > > > grabbed the existing code for handler, and if it gets replaced 
> > > > prior to the lock being acquired, everything may have changed.
> > > >
> > > > Am I being too paranoid? I don't understand enough about how 
> > > > code gets replaced when a module reload is occurring.
> > 
> > Hmmm, maybe I've gone slightly too far here. The code lock 
> > effectively serialises access to the handler, which screws up 
> > the whole idea of having threads in the first place.
> > 
> > I really like the idea of using execfile(), as described in the 
> > other email, and copying across state variables of interest 
> > into a new module, as it avoids the need for the code lock 
> > anyway: the new code is constructed in a new module.
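
A minimal sketch of that idea, assuming it means "execute the source in a brand new module object, then carry selected state over from the old one". The other email is not quoted here, so the details are guesses; `load_fresh` and its `keep` parameter are hypothetical names. `execfile()` is a Python 2 built-in; the sketch uses the Python 3 equivalent, `exec(compile(...))`.

```python
import types


def load_fresh(name, source, old_module=None, keep=()):
    """Run `source` in a new module, then copy the `keep` attributes
    from `old_module` so that existing state survives the reload."""
    new_module = types.ModuleType(name)
    # The new code is built in its own namespace; the old module is
    # never mutated, so threads still running its code are unaffected.
    exec(compile(source, name, "exec"), new_module.__dict__)
    if old_module is not None:
        for attr in keep:
            if hasattr(old_module, attr):
                setattr(new_module, attr, getattr(old_module, attr))
    return new_module


SOURCE_V1 = "counter = 0\ndef handler():\n    return ('v1', counter)\n"
SOURCE_V2 = "counter = 0\ndef handler():\n    return ('v2', counter)\n"

old = load_fresh("page", SOURCE_V1)
old.counter = 5  # state accumulated while serving requests
new = load_fresh("page", SOURCE_V2, old_module=old, keep=("counter",))
```

Here `new.handler()` runs the new code against the carried-over `counter`, while `old.handler()` keeps working for any request that started before the swap.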
> > 
> > --
> > Graham Dumpleton (grahamd at dscpl.com.au)
> > 
> > _______________________________________________
> > Mod_python mailing list
> > Mod_python at modpython.org
> > http://mailman.modpython.org/mailman/listinfo/mod_python
> > 
> 
> 
> 



