[mod_python] How can I turn off apparent caching of python code?

Sun Aug 29 21:42:15 EDT 2004

On 25/08/2004, at 12:20 PM, Graham Dumpleton wrote:

>
> On 24/08/2004, at 9:54 AM, Gregory (Grisha) Trubetskoy wrote:
>
>>
>> On Mon, 23 Aug 2004, Tobiah wrote:
>>
>>>
>>>>  DEBUG = 1
>>>>  import mymodulea
>>>>  import mymoduleb
>>>>  if DEBUG:
>>>>         mymodulea = reload(mymodulea)
>>>>         mymoduleb = reload(mymoduleb)
>>>> That is O.K., but still, unwieldy.
>>
>> This method is pretty old, now a simpler way to do it is:
>>
>> mymodulea = apache.import_module("mymodulea")
>> mymoduleb = apache.import_module("mymoduleb")
>
> No time right now for an extended description, will send one later, 
> but Vampire which I
> described recently and which is available at 
> "http://www.dscpl.com.au/projects/vampire"
> has a module caching system which has explicit module importing 
> similar to the
> mod_python.apache module, but what it will do is track relationships 
> between module
> imports and is able to do a reload even when child imports are changed 
> and not the
> immediate module you are loading.
>
> What this means is if that A imports B which imports C and similarly D 
> imports C, if C
> is then changed, all the modules which depend on C, ie., A, B and D 
> will be automatically
> reloaded when requested, rather than them being grabbed from the 
> cache. If all your
> application modules are in one place and you always use the module 
> importer in Vampire
> for importing any application module from a page, but also one 
> application module from
> another, any change to an application module should see anything that 
> depends on it
> automatically reloaded.

I said I would follow up later with a better description. Being on
holidays in a foreign land, this has turned out to be the first
opportunity to do so.

The module caching mechanism in Vampire is used as:

   import vampire.cache

   cache = vampire.cache.ModuleCache()
   c = cache.importModule("c",".")

The first argument to importModule() is the name of the module to load
and the second argument is the specific directory to look in.
Specifically, the contents of the sys.path variable are NOT used to find
the module; instead you have to tell it where the module resides.

The reason for doing this is so you know you pick up the exact module 
you want. The
idea is that all the modules which relate to the application component 
of your web
pages would reside in one place. The name of that root directory would 
be the second
argument you use to importModule(). This avoids any problems where you 
can't be
certain about the order of sys.path when multiple applications are 
running within
the same web server and each different application uses the same module 
name.
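
To illustrate the idea of loading from an explicit directory rather than
from sys.path, here is a minimal sketch. It is not Vampire's code: the
function name import_from_directory() and the "config" modules are made up
for the example, and it is written for a current Python 3 using the standard
importlib module rather than the Python 2 of the examples in this message.

```python
import importlib.util
import os
import tempfile


def import_from_directory(name, directory):
    """Load module `name` from `directory` only; sys.path is never searched."""
    path = os.path.join(directory, name + ".py")
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Two hypothetical applications that both ship a module called "config".
app_one = tempfile.mkdtemp()
app_two = tempfile.mkdtemp()
with open(os.path.join(app_one, "config.py"), "w") as f:
    f.write("SITE = 'one'\n")
with open(os.path.join(app_two, "config.py"), "w") as f:
    f.write("SITE = 'two'\n")

# Each caller names the directory it means, so there is no ambiguity.
config_one = import_from_directory("config", app_one)
config_two = import_from_directory("config", app_two)
print(config_one.SITE, config_two.SITE)
```

Because the caller always supplies the directory, the order of sys.path
plays no part in which "config" is picked up.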

Another important feature of the module cache is the encoding of the 
actual path to
the module into the lookup key for the module as it is listed in 
sys.modules. The
module cache in mpservlets does a similar thing, although in Vampire, 
for reasons I
can't quite remember, I encode the name to a greater degree. I vaguely 
remember
having some problems with really long module names in some older 
version of Python.

For example, the module "c" above would appear in sys.modules under the 
key:

   _vampire__2b_yg4_2b_XZhHdEtPHG6AmTz9w_3

The path recorded in the loaded module would be "./c.py" or "./c.pyc" 
as appropriate.

The reason for this is to work around a problem often described on the
mailing list whereby using the same name in multiple locations will
potentially cause reloading of the handler every time it is accessed
from the different locations even though no changes have been made.
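
As a rough sketch of why a path-derived key avoids that clash, the
following derives a label from the full path of the file. The scheme shown
(a truncated SHA-256 hex digest behind a "_sketch_" prefix) is an assumption
chosen for illustration; it is not the encoding Vampire uses to produce keys
like the one above, and module_label() is a hypothetical helper.

```python
import hashlib
import os


def module_label(name, directory, prefix="_sketch_"):
    """Return a key derived from the module's full path, not just its name."""
    path = os.path.abspath(os.path.join(directory, name + ".py"))
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()[:32]
    return prefix + digest


# The same module name in two hypothetical locations gets two distinct keys,
# so neither entry displaces the other in a table keyed by label.
label_one = module_label("index", "/srv/app_one")
label_two = module_label("index", "/srv/app_two")
print(label_one != label_two)
```

A module stored under such a label is only ever replaced when its own file
changes, not when a like-named file elsewhere is loaded.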

To explain the tracking of parent/child relationships between modules, 
now consider
the following Python code files.

   # c.py

   print "c"

   # b.py

   import vampire.cache
   print "b 1"
   cache = vampire.cache.ModuleCache()
   c = cache.importModule("c",".")
   print "b 2"

   # a.py

   import vampire.cache
   print "a 1"
   cache = vampire.cache.ModuleCache()
   b = cache.importModule("b",".")
   print "a 2"

   # test.py

   import vampire.cache
   import time

   cache = vampire.cache.ModuleCache()

   for i in range(100):
     print "test",i
     a = cache.importModule("a",".")
     for n in cache.cachedModules():
       c = cache.moduleInfo(n)
       print c.name,c.label,c.generation,c.file,c.mtime,c.atime,c.direct,c.indirect
     time.sleep(5)

First thing to be noted is that a normal "import" is not used for any 
module which
would be in your application module directory. Any access to such 
modules is mediated
through the module cache.

Now when the test script is run, it will on the first iteration 
generate the output:

   test 0
   a 1
   c
   b 1
   b 2
   a 2
   c _vampire__2b_yg4_2b_XZhHdEtPHG6AmTz9w_3 1 ./c.py 1093773189 1093774844.22 2 0
   a _vampire_LqOp5S04uKeR2wBXKPT61w_3 3 ./a.py 1093773270 1093774844.23 1 0
   b _vampire_XcjTxdeaT7e4Z8DD4M2Axg_3 2 ./b.py 1093773255 1093774844.22 1 0

The code in each of the modules "a.py", "b.py" and "c.py" has been
executed upon loading, as demonstrated by the output of the print
statements.

On the subsequent iteration, the output of the test program is:

   test 1
   c _vampire__2b_yg4_2b_XZhHdEtPHG6AmTz9w_3 1 ./c.py 1093773189 1093774849.24 2 1
   a _vampire_LqOp5S04uKeR2wBXKPT61w_3 3 ./a.py 1093773270 1093774849.24 2 0
   b _vampire_XcjTxdeaT7e4Z8DD4M2Axg_3 2 ./b.py 1093773255 1093774849.24 1 1

That is, no modules were reloaded; instead they were obtained from the
cache. The output here is information maintained by the cache about
modification and access times, number of accesses and the overall
generation snapshot of the cache as a whole at the point the module was
loaded.

Note that the test script only attempts to load module "a". If we touch
the file "a.py", the first iteration of the test script after the file
was modified shows:

   test 19
   a 1
   a 2
   c _vampire__2b_yg4_2b_XZhHdEtPHG6AmTz9w_3 1 ./c.py 1093773189 1093774939.5 3 19
   a _vampire_LqOp5S04uKeR2wBXKPT61w_3 4 ./a.py 1093774935 1093774939.5 1 0
   b _vampire_XcjTxdeaT7e4Z8DD4M2Axg_3 2 ./b.py 1093773255 1093774939.5 2 18

Because only "a.py" was modified, there is no need to reload "b" and 
"c" from disk and
they instead come from the cache.

If instead we modify "b.py" and then "c.py" we will see the output:

   test 22
   a 1
   b 1
   b 2
   a 2
   c _vampire__2b_yg4_2b_XZhHdEtPHG6AmTz9w_3 1 ./c.py 1093773189 1093774954.54 5 21
   a _vampire_LqOp5S04uKeR2wBXKPT61w_3 6 ./a.py 1093774935 1093774954.55 1 0
   b _vampire_XcjTxdeaT7e4Z8DD4M2Axg_3 5 ./b.py 1093774951 1093774954.55 1 0

   test 25
   a 1
   c
   b 1
   b 2
   a 2
   c _vampire__2b_yg4_2b_XZhHdEtPHG6AmTz9w_3 7 ./c.py 1093774966 1093774969.64 2 1
   a _vampire_LqOp5S04uKeR2wBXKPT61w_3 9 ./a.py 1093774935 1093774969.65 1 0
   b _vampire_XcjTxdeaT7e4Z8DD4M2Axg_3 8 ./b.py 1093774951 1093774969.65 1 0

In these cases, although "a.py" was not modified, "b.py" and then "c.py"
were. The module cache knows that "a.py" depends on "b.py" and that
"b.py" depends on "c.py", and thus that it should force a reload of
"a.py" anyway in case the latter files have changed in some important
way.
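
For anyone wanting to see the general technique in isolation, below is a
small self-contained sketch of a dependency-tracking cache. It is an
assumption-laden toy and not Vampire's implementation: the class
DependencyCache, its import_module() method and the trick of injecting the
cache into each loaded module as the name "cache" are all invented for the
example. It is written for Python 3, compares file modification times only,
and does not handle circular imports.

```python
import os
import tempfile
import types


class DependencyCache:
    """Toy cache: a module is reloaded if its own file, or any file it
    imported through the cache, has changed since it was loaded."""

    def __init__(self):
        self._records = {}  # path -> (module, mtime, set of child paths)
        self._loading = []  # stack of child sets for modules being executed
        self.loads = []     # names of modules whose code was (re)executed

    def _stale(self, path):
        record = self._records.get(path)
        if record is None or os.path.getmtime(path) != record[1]:
            return True
        return any(self._stale(child) for child in record[2])

    def import_module(self, name, directory):
        path = os.path.join(directory, name + ".py")
        if self._loading:
            self._loading[-1].add(path)  # remember the parent -> child edge
        if not self._stale(path):
            return self._records[path][0]
        module = types.ModuleType(name)
        module.__file__ = path
        module.cache = self  # sketch shortcut: hand the cache to the module
        children = set()
        self._loading.append(children)
        try:
            mtime = os.path.getmtime(path)
            with open(path) as f:
                source = f.read()
            self.loads.append(name)
            exec(compile(source, path, "exec"), module.__dict__)
        finally:
            self._loading.pop()
        self._records[path] = (module, mtime, children)
        return module


root = tempfile.mkdtemp()
child_import = "import os\n{0} = cache.import_module('{0}', os.path.dirname(__file__))\n"
sources = {"c": "VALUE = 1\n", "b": child_import.format("c"), "a": child_import.format("b")}
for name, body in sources.items():
    with open(os.path.join(root, name + ".py"), "w") as f:
        f.write(body)


def touch(name):
    """Move the file's modification time forward so it counts as changed."""
    path = os.path.join(root, name + ".py")
    stamp = os.path.getmtime(path) + 10
    os.utime(path, (stamp, stamp))


cache = DependencyCache()
history = []
for changed in (None, None, "a", "b", "c"):
    if changed:
        touch(changed)
    cache.loads = []
    cache.import_module("a", root)
    history.append(cache.loads)
print(history)
```

Only "a" is ever requested directly, yet changing "b" or "c" causes "a" to
be executed again, which is the behaviour described above.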

As an example, imagine that "c.py" contained configuration information. 
This
module cache will ensure that modules which use that configuration are 
reloaded
to pick up the changed information. This auto reloading would also be 
good where
a servlet style approach was used and each file contained a servlet 
which was
part of a hierarchy. If the lowest base class were changed and it 
contained
important site layout code, those servlets derived from it would be 
automatically
reloaded when next used.

From what I can see, this module cache will solve the sorts of problems 
people
are talking about with how to force reloading of modules when changed. 
Unless I
am missing something, the other suggestions are okay when there is only 
one level
of module importing, but if there are multiple levels of imports from 
your set
of application modules, they will not detect when the file associated 
with a child
import is changed and thus that an unchanged parent should still be 
reloaded.

I will apologise in advance for not answering any followup questions,
if any, for a few days, as I am about to travel to yet another country
and will not be taking my laptop with me on this part of my trip.

FWIW, Vampire can currently be obtained from:

   http://www.dscpl.com.au/projects/vampire

--
Graham Dumpleton (grahamd at dscpl.com.au)