[mod_python] Update python modules without restarting Apache

Sun Oct 10 19:50:15 EDT 2004

On 09/10/2004, at 6:10 PM, Nicolas Lehuen wrote:

> Hi Graham, what are the pros and cons of using the imp module versus an
> execfile call or an exec call ?

Hmmm, I'll try and have a go at this question now. Note that I haven't ever
used execfile()/exec myself, so what I talk about here is based only on some
quick observations. Also, as far as I can tell, the end result of execfile()
and exec is the same, so I'll compare just imp.load_module() and execfile().

When using the combination of imp.find_module() and imp.load_module(), the
main thing is that the result is a true module. I.e., calling type() on the
result will actually give <type 'module'>.

Taking a simple module which contains:

   variable = 0
   def function(): pass

If this module is imported using imp.load_module(), the result is the same as
with "import", and running dir() on the module yields the following.

   ['__builtins__', '__doc__', '__file__', '__name__', 'function', 'variable']

If one uses execfile(), however, the result is simply the dictionary that was
passed in to receive the names, i.e., type() yields <type 'dict'>. Calling
keys() on that dictionary yields the following.

   ['__builtins__', 'variable', 'function']
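
To make the comparison concrete, here is a rough sketch of both routes. It
assumes Python 3, where importlib stands in for the "imp" functions and
exec(compile(...)) stands in for execfile(); the file name example.py is made
up for the demonstration. On Python 3, dir() on the module will also list a
few extra attributes such as __spec__ and __loader__.

```python
import importlib.util
import os
import tempfile

SOURCE = "variable = 0\ndef function(): pass\n"

# Write the example module to a temporary file so both routes load the same code.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.py")
with open(path, "w") as f:
    f.write(SOURCE)

# Route 1: load as a true module (importlib stands in for the imp functions).
spec = importlib.util.spec_from_file_location("example", path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# Route 2: execute the source into a plain dictionary, as execfile() would.
namespace = {}
with open(path) as f:
    exec(compile(f.read(), path, "exec"), namespace)

print(type(module).__name__)       # module
print(type(namespace).__name__)    # dict
print("__file__" in dir(module))   # True
print("__file__" in namespace)     # False
print(sorted(namespace))           # ['__builtins__', 'function', 'variable']
```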

In terms of executing code within the context of the two, there probably
isn't going to be much difference. The only issue would be if the code being
loaded expected to be executing within the environment of a true module,
i.e., it expected __file__ and __name__ to actually exist.

I know that I occasionally write code which uses __file__ to work out the
directory that the code file is in, because in the same directory I might
place some data file that the module reads in when required as part of some
function the module provides.

This isn't a big deal, as a module importing system that used execfile()
could fake up the entries for __file__ and __name__ by loading them into the
dictionary prior to calling execfile().
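
A minimal sketch of that faking step, assuming Python 3 with
exec(compile(...)) in place of execfile(). The helper name load_into_dict()
and the file handler.py are hypothetical, chosen only for illustration.

```python
import os
import tempfile

def load_into_dict(path, name):
    """Execute a source file into a new dict that looks a bit like a module."""
    # Seed the entries a true module would have before any code runs,
    # so that top-level code in the file can already rely on them.
    namespace = {"__file__": path, "__name__": name}
    with open(path) as f:
        source = f.read()
    # Python 3 equivalent of execfile(path, namespace).
    exec(compile(source, path, "exec"), namespace)
    return namespace

# Demo: the loaded code uses __file__ to locate its own directory.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "handler.py")
with open(path, "w") as f:
    f.write("import os\nhere = os.path.dirname(__file__)\nname = __name__\n")

ns = load_into_dict(path, "handler")
print(ns["here"] == tmpdir)   # True
print(ns["name"])             # handler
```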

The next issue with using execfile() is that the input has to be the original
Python code as text, albeit stored in a file. When imp.find_module() and
imp.load_module() are used together, they can take advantage of a precompiled
Python code file, i.e., a ".pyc" file. I'll ignore C compiled modules, but if
you are mad enough you could use those as well.

Ultimately, this is probably no big deal, in as much as you are caching the
module in memory for use anyway, and it would only be an issue the first time
the code needs to be loaded after a subsequent restart of Apache.

Moving on to deletion of the loaded code in each case, if you really needed
to do that. When using execfile(), the only place it will be saved away is in
your cache. Thus, delete it from the cache, and once any code executing
within the context of it or otherwise holding a reference to it goes away,
the code will also vanish.

When using the "imp" methods, deleting it from your own cache isn't going to
be enough, as it will also be referenced from sys.modules. Whether or not
deleting stuff from sys.modules is a good idea I don't know. I have done it
in emergencies, as I don't have the ability to readily get Apache restarted
in one place where I am doing stuff.

An issue with deleting stuff from sys.modules is what happens in a
multithreaded environment if another thread is importing a module at the same
time and is thus updating sys.modules. It may well be the case that the
safest thing to do is to use the Python 2.3+ methods
imp.acquire_lock()/imp.release_lock(), on the assumption that that lock is
acquired when a module is being imported and thus when sys.modules is being
updated. Because it is only deletion of a single key, it may also not really
be needed.
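
As a rough sketch of guarded deletion: the "imp" module no longer exists in
recent Python 3 releases, so this uses a lock owned by the caching system
instead of imp.acquire_lock(). That is a substitution on my part; it only
serialises our own code and does not coordinate with the interpreter's import
machinery. The names _modules_lock and forget_module() are hypothetical.

```python
import sys
import threading
import types

# Hypothetical lock owned by the caching system; every code path of ours
# that adds to or removes from sys.modules is expected to hold it.
_modules_lock = threading.Lock()

def forget_module(name):
    """Remove a module from sys.modules; return it, or None if absent."""
    with _modules_lock:
        # pop() with a default avoids a KeyError if another thread
        # has already removed the entry.
        return sys.modules.pop(name, None)

# Demo with a throwaway module object registered by hand.
demo = types.ModuleType("demo_cached_module")
sys.modules["demo_cached_module"] = demo
removed = forget_module("demo_cached_module")
print(removed is demo)                          # True
print("demo_cached_module" in sys.modules)      # False
```

Any code still holding a reference to the module object keeps it alive; only
the sys.modules entry goes away.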

Now as far as initial loading of a module goes, the only other thing is that
in both cases, any module caching system that is to work properly in a
multithreaded environment should thread lock its cache data structures while
working out whether a module is present and up to date and while importing
modules.
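
Something along these lines is what I have in mind. It is only a sketch,
assuming Python 3, exec(compile(...)) in place of execfile(), and the file
modification time as the test for "up to date"; the ModuleCache class and
its loads counter are made up for the example.

```python
import os
import tempfile
import threading

class ModuleCache:
    """Cache of loaded namespaces, keyed by path, guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}   # path -> (mtime, namespace)
        self.loads = 0       # how many times code was actually executed

    def get(self, path):
        # The lock covers both the "is it present and current?" check and
        # the load itself, so two threads cannot both decide to load.
        with self._lock:
            mtime = os.path.getmtime(path)
            entry = self._entries.get(path)
            if entry is not None and entry[0] == mtime:
                return entry[1]
            name = os.path.splitext(os.path.basename(path))[0]
            namespace = {"__file__": path, "__name__": name}
            with open(path) as f:
                exec(compile(f.read(), path, "exec"), namespace)
            self._entries[path] = (mtime, namespace)
            self.loads += 1
            return namespace

# Demo: several threads ask for the same module at once.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "page.py")
with open(path, "w") as f:
    f.write("variable = 0\n")

cache = ModuleCache()
results = []

def worker():
    results.append(cache.get(path))

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(cache.loads)   # 1
```

Holding one lock across the load is the simplest choice; it does mean an
unrelated module cannot be loaded while another is still executing.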

If this isn't done, you end up with problems like I described in my prior
email, where for example multiple threads may determine at the same time that
a module has to be freshly loaded. If this happens and the "imp" functions
are being used, the initial import will happen okay, but all the others end
up performing a reload.

If execfile() is being used, and you start out with a clean slate each time,
you end up with every thread getting a separate instance of the loaded
module, but all except the last one to store its copy back in the cache will
be deleted when the HTTP request for each completes.

The actual differences between using the "imp" functions and execfile()
probably come about when it comes to module reloading when the code changes.
As I described in my prior email, the "imp" functions effectively reload the
new code into the same module instance they had already created.

That the code is reloaded into the same module can be both beneficial and
detrimental if you are not careful.

A benefit is that you can store data which you wish to be preserved across
reloads in global variables. To get this to work, the variable can't be
assigned at global scope when the module loads, since a reload would then
reset it; instead, it should only be created from a method later on. The
other alternative, as explained in my prior email, is to have code executing
at global scope check for prior existence of the global variable and not
reinitialise it if it already exists.
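
A small sketch of that second alternative, assuming Python 3 and
exec(compile(...)) standing in for a reload into the same namespace. The
variable name hits and the handler() function are invented for the example.

```python
SOURCE = """
# Only initialise the counter on the first load; later reloads keep it.
if "hits" not in globals():
    hits = 0

def handler():
    global hits
    hits += 1
    return hits
"""

namespace = {"__name__": "counter_demo"}

# First load: "hits" does not exist yet, so it is created as 0.
exec(compile(SOURCE, "<counter_demo>", "exec"), namespace)
namespace["handler"]()
namespace["handler"]()

# Reload into the same dictionary: the guard sees "hits" and leaves it alone.
exec(compile(SOURCE, "<counter_demo>", "exec"), namespace)
hits_after_reload = namespace["hits"]
third = namespace["handler"]()

print(hits_after_reload)   # 2
print(third)               # 3
```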

The not so good thing I see about module reloading, and which I also
explained in my prior email, is what to do if a module reload occurs while
some other thread is executing code in the module at the same time and
accessing any global data. You have to be very careful about how you perform
your thread locking to ensure that data doesn't get modified at the same time
in bad ways.

As far as I know, there is possibly no real alternative to having modules
reload into the same namespace when using the "imp" functions. Deleting the
module from sys.modules first will work, but I'm not sure how wise that is.

As it happens, when using execfile() you actually have a choice as to whether
a reload is done into the same namespace or not. This is controlled by the
dictionary supplied as second argument to execfile().

The choices are to pass in an empty dictionary and start from scratch each
time, or pass in the existing dictionary and have it be overlaid with new
stuff just like with the "imp" functions. Obviously, if starting from scratch
each time, you would delete the old dictionary from the cache first and then
load the module into the new dictionary. If you want to keep what you had,
don't delete it from the cache and just reload over it.

Obviously, if you reload over it, you end up with all the same issues as
described for the "imp" methods.

There is one other thing worth mentioning here in relation to reloading over
the top of the existing module in both cases. If the new code module has
removed a method or some global bit of data, that method or data will still
persist after the reload even though it is no longer in the code file. This
is because nothing in the namespace is deleted.
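
The effect is easy to show. This sketch assumes Python 3, with
exec(compile(...)) standing in for execfile(); the function names index()
and secret() and the helper run() are made up.

```python
OLD_SOURCE = "def index(): return 'index'\ndef secret(): return 'secret'\n"
NEW_SOURCE = "def index(): return 'new index'\n"   # secret() has been removed

def run(source, namespace):
    """Execute source into the given dictionary and return that dictionary."""
    exec(compile(source, "<page>", "exec"), namespace)
    return namespace

# Reload over the existing dictionary: index() is replaced, secret() lingers.
overlaid = run(OLD_SOURCE, {})
run(NEW_SOURCE, overlaid)

# Reload into a fresh dictionary: only what the new code defines is present.
fresh = run(NEW_SOURCE, {})

print(overlaid["index"]())     # new index
print("secret" in overlaid)    # True
print("secret" in fresh)       # False
```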

That this can happen can be problematic. Consider the mod_python.publisher
case, which currently uses apache.import_module(). If a method is deleted
from the original code file because you no longer want it to be visible, even
after the reload, the old version will still be there and still callable.
Your only option to really get rid of it is to restart Apache.

Now I am glad you asked this question, as it has really made me think about
how I use the "imp" functions. At the cost of no longer being able to use
precompiled code, I may actually change from using "imp" to execfile(). I
would do this to enable more flexibility by allowing control over what
happens to existing data on reloads.

First off, to preserve some compatibility, when reloading a module after a
change I would pass the existing dictionary of data to execfile(). Thus, new
stuff is overlaid on top of the old.

This however would be the default behaviour only. One could have special
variables defined within the module which dictate what is actually to occur
on a reload. The options I see which could be provided are:

1. Default behaviour being that of the "imp" methods where new is overlaid on
top.

2. Start as new option where the existing module is totally discarded first
and new module goes into a fresh dictionary.

3. Same as option 2, except that module could define a list of global
variables which should be copied from the old module into the dictionary for
the new before the reload occurs.
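
To give a feel for option 3, here is a very rough sketch. It assumes Python 3
with exec(compile(...)) in place of execfile(), and it invents a special
variable called __preserve__ for the list of names to carry over; neither
that name nor reload_preserving() exists anywhere yet. Note that because the
values are copied in before the new code runs, the module still has to avoid
reassigning them at global scope.

```python
def reload_preserving(source, old_namespace, filename="<module>"):
    """Load source into a fresh dict, carrying over only the listed names."""
    # Hypothetical convention: the old module lists the names to keep.
    keep = old_namespace.get("__preserve__", ())
    new_namespace = {}
    # Carry over the module identity entries plus the listed state.
    for key in ("__file__", "__name__") + tuple(keep):
        if key in old_namespace:
            new_namespace[key] = old_namespace[key]
    exec(compile(source, filename, "exec"), new_namespace)
    return new_namespace

V1 = """
__preserve__ = ["sessions"]
if "sessions" not in globals():
    sessions = {}
def login(user): sessions[user] = True
def debug_dump(): return dict(sessions)
"""

# Second version of the code: debug_dump() has been removed.
V2 = """
__preserve__ = ["sessions"]
if "sessions" not in globals():
    sessions = {}
def login(user): sessions[user] = True
"""

old = {"__name__": "site"}
exec(compile(V1, "<site>", "exec"), old)
old["login"]("alice")

new = reload_preserving(V2, old, "<site>")
print(new["sessions"])          # {'alice': True}
print("debug_dump" in new)      # False
```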

When I first started on my module importing system, what I really wanted was
option 3, but I couldn't work out how to do it, given that "imp" reloaded on
top and didn't really give you a way of doing it any other way.

The reason I like option 3 is that it solves the problem with deleted methods
not really being deleted and thus perhaps causing problems because they may
still be accessible. Also, by requiring that the variables that should be
preserved across the reload be listed, it makes it clearer in the code what
is preserved state information.

Now hopefully you have followed me through to the end on this. After having
gone through this, as suggested above, my preferred approach may actually now
be as follows.

1. Use execfile().

2. On an import, set up __file__ and __name__ so it sort of looks like a real
module in cases where code expects to see those values set.

3. Default behaviour on reloading is to work like the "imp" functions and
reload on top of the existing module by passing the dictionary for it as
second argument to execfile().

4. Have the ability through special variables defined in the imported module
to control what actually happens on a reload. Namely, allow the three options
above: default behaviour, start over as new, and start over as new but
preserve a limited set of state variables, including thread locks.

Okay, looks like I have a TODO list for myself now. Not only do I have to
address the threading issues in Vampire as I knew I had to, but I also have
to change the module importing system to use execfile() and implement some
control mechanisms over the reload behaviour. Ahhh, more coding to do. In the
meantime, my documentation for this project and another project I work on
just gets delayed more and more. :-(

As usual, if someone thinks I speak with forked tongue and have
misinterpreted how something works, please respond and describe what really
happens. Any other ideas are also most welcome. Thanks.

--
Graham Dumpleton (grahamd at dscpl.com.au)