Nicolas Lehuen
nicolas at lehuen.com
Sun Oct 10 15:12:50 EDT 2004
Hi Graham,

First, thank you for your thorough analysis of the imp module's behaviour in a multi-threaded environment. It reinforces my feeling that NOT using this module is much better for my publisher, since I can understand precisely what it does without depending on any arcane locking mechanisms. With my publisher, I know precisely what the locking policy is. Have a look at the Cache recipe:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/302997

The trick I used to reduce cache contention is a two-level locking scheme. The first level is a lock for the whole cache, used to make sure that no two entries are created for the same key (here the key is the file name of the module). The second level is an entry-specific lock, used to make sure that a module cannot be initialized by more than one thread. This two-level locking enables multiple threads to handle different modules concurrently: the possibly lengthy module initialisation is done within a module-specific lock, so other threads can initialize other modules concurrently.

So what happens when a page is accessed by my publisher?

1) The publisher determines which file is targeted by the HTTP request, just by using req.filename.
2) The publisher gets the module keyed by the filename from the module cache.
3) The publisher calls a special callable object in the module (http_handler), or finds one by applying acquisition semantics.

At this high level, no locking is required. If the module needs to be reloaded, the threads which have already reached step 3 will keep on using the stale module. Once they are done, the stale module will be dereferenced and eventually garbage collected (with all the objects it contains). Meanwhile, new threads arriving at step 2 will get the new module. All is well that ends well.
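The two-level locking described above can be sketched roughly as follows. This is only a minimal illustration, not the code from the recipe: the names Entry, TwoLevelCache, stamp and build are hypothetical, and the "stamp" is assumed to be something like a file modification time.

```python
import threading


class Entry:
    """One cache slot; its lock serialises (re)builds of this key only."""

    def __init__(self):
        self.lock = threading.Lock()
        self.stamp = None   # version of the source the value was built from
        self.value = None


class TwoLevelCache:
    def __init__(self, stamp, build):
        self._lock = threading.Lock()   # level 1: guards the key -> entry dict
        self._entries = {}
        self._stamp = stamp             # assumption: e.g. os.path.getmtime
        self._build = build             # assumption: e.g. loads a module from a file

    def get(self, key):
        # Level 1: held only long enough to find or create the entry,
        # so two threads never create two entries for the same key.
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                entry = self._entries[key] = Entry()
        # Level 2: a slow build blocks only the threads asking for this key.
        with entry.lock:
            current = self._stamp(key)
            if entry.stamp != current:
                entry.value = self._build(key)
                entry.stamp = current
            return entry.value
```

A thread that obtained a value before a rebuild simply keeps its reference to the old object, which is what lets in-flight requests finish on the stale module.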
Now if we study what happens at step 2 when a module is already up to date, we see that we first acquire a lock on the module cache for a very brief time (just to get the entry for the filename), then a lock on the module itself for another brief time (just to check that the module is indeed up to date). Thereafter, all threads are on their own to do whatever they need in parallel.

So we have a very small contention point where all requests are serialized, then another small contention point where all requests targeted at the same module are serialized, then no more contention until the request is over. That is to say, all the hard work (database requests, file processing, template rendering etc.) is done in multiple concurrent threads. I don't see how this can be done in a better way. In any case, you certainly should NOT acquire a lock and do all the request processing within this lock, like you wrote below... This effectively kills any attempt at multi-threading.

The best part of having properly managed modules is that you can instantiate long-lived objects in the modules that are shared between multiple threads. The best example is a thread-safe DB connection pool. In fact, my publisher automatically checks whether the module targeted by a request has special callable objects named 'prepare', 'commit', 'rollback' and 'release'. In those methods, a developer can write code to manage DB connections. The net result is that in most of my applications, I only have to write req.cursor to get a safe pooled cursor to the database I use. I don't have to worry anymore about opening and closing DB connections!

In fact, I extended this behaviour: when a page module is loaded, the publisher looks for a 'global.pyapp' file, which contains a Python module shared by all modules found in the same directory.
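To illustrate how the 'prepare', 'commit', 'rollback' and 'release' callables could be driven around a request, here is a rough sketch. The function name run_request and the exact calling order are assumptions made for illustration only; the publisher's real logic may well differ.

```python
def run_request(module, req):
    """Call the module's optional hooks around its http_handler."""

    def call(name):
        # A hook is optional: call it only if the module defines it.
        hook = getattr(module, name, None)
        if callable(hook):
            hook(req)

    call("prepare")              # e.g. take a connection from the pool
    try:
        result = module.http_handler(req)
    except Exception:
        call("rollback")         # the handler failed: undo pending work
        raise
    else:
        call("commit")           # the handler succeeded: make work permanent
        return result
    finally:
        call("release")          # in both cases, give the connection back
```

With this ordering, 'release' runs whether the handler succeeds or raises, so a pooled connection is always handed back.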
This 'global.pyapp' file is an application-wide module which usually defines global objects such as the aforementioned DB connection pool, plus the same special methods 'prepare', 'commit', 'rollback' and 'release', which are called before the ones from the targeted module. This is equivalent to the 'global.asa' files found in Microsoft's ASP.

I plan on releasing my code on the web, but I don't have the time to build a proper web site for now...

Anyway, thanks for this interesting discussion :)

Regards,
Nicolas

> -----Original Message-----
> From: mod_python-bounces at modpython.org
> [mailto:mod_python-bounces at modpython.org] On Behalf Of
> Graham Dumpleton
> Sent: Sunday, 10 October 2004 12:14
> To: mod_python user mailing list
> Subject: Re: [mod_python] Udate python modules without
> restarting Apache
>
>
> On 10/10/2004, at 1:55 PM, Graham Dumpleton wrote:
>
> > The point I am trying to make is that perhaps locking shouldn't
> > pertain just to the global data and that executable code should in
> > some way be protected as well.
> > It needs more exploration, but perhaps the module should read
> > something like the following (untested).
> >
> > if "datalock" not in globals():
> >    # This is a brand new import and we are inside the scope of the
> >    # import lock, so this should all be safe to execute.
> >
> >    datalock = Lock()
> >    codelock = Lock()
> >
> >    myglobal1 = None
> >    myglobal2 = None
> >    myglobal3 = None
> >
> >    ... arbitrary code to initialise globals
> >
> > codelock.acquire()
> > datalock.acquire()
> >
> > ... extra code to fiddle the globals
> >
> > def modify1(req):
> >    datalock.acquire()
> >    ...
> >    datalock.release()
> >    return value
> >
> > def modify2(req,value):
> >    datalock.acquire()
> >    ...
> >    datalock.release()
> >
> > def _handler(req):
> >    value = modify1(req)
> >    modify2(req,value)
> >    ...
> >
> > def handler(req):
> >    # code for this method should never be changed
> >    try:
> >      codelock.acquire()
> >      result = _handler(req)
> >    finally:
> >      codelock.release()
> >    return result
> >
> > datalock.release()
> > codelock.release()
> >
> > What is trying to be done here is using a lock to ensure that any
> > code itself is not replaced while the handler is being executed.
> > The reason for the two-level handler arrangement is that the lock
> > actually needs to be acquired outside of the real handler method
> > being called. If it is done inside, it is too late, as Python has
> > already grabbed the existing code for handler and if it gets
> > replaced prior to the lock being acquired, everything may have
> > changed.
> >
> > Am I being too paranoid? I don't understand enough about how code
> > gets replaced when a module reload is occurring.
>
> Hmmm, maybe gone slightly too far here. The code lock
> effectively serialises access to the handler, which screws up
> the whole idea of having threads in the first place.
>
> I really like this idea of using execfile() as described in the
> other email and copying across state variables of interest
> into a new module, as it avoids the need for the code lock
> anyway, since the new code is constructed in a new module.
>
> --
> Graham Dumpleton (grahamd at dscpl.com.au)
>
> _______________________________________________
> Mod_python mailing list
> Mod_python at modpython.org
> http://mailman.modpython.org/mailman/listinfo/mod_python