[mod_python] The new module loader

Sat Apr 22 09:51:00 EDT 2006

Sorry for taking so long to get back to this email. Busy day ...

On 22/04/2006, at 2:08 AM, Jorey Bump wrote:

> Graham Dumpleton wrote:
>> Graham Dumpleton wrote ..
>>> The new module importer completely ignores packages as it is  
>>> practically
>>> impossible to get any form of automatic module reloading to work
>>> correctly with them when they are more than trivial. As such,  
>>> packages
>>> are handed off to standard Python __import__ to deal with. That  
>>> it even
>>> finds the package means that you have it installed in sys.path.  
>>> Even if
>>> it was a file based module, because it is on sys.path and thus  
>>> likely to
>>> be installed in a standard location, the new module importer  
>>> would again
>>> ignore it as it leaves all sys.path modules up to Python __import__
>>> as too dangerous to be mixing importing schemes.
>>>
>>> Anyway, that all only applies if you were expecting  
>>> PyServer.pyserver to
>>> automatically reload upon changes.
>
> Graham, can you enumerate the different ways packages are handled,  
> or is it enough to say that packages are never reloaded? In this  
> thread, you explain that when a package is imported via  
> PythonHandler, mod_python uses the conventional Python __import__,  
> requiring an apache restart to reliably reload the package, as in  
> the past.

That is correct. What it means is that packages will only be found if
located somewhere along sys.path, and they remain held in sys.modules
because it is the builtin Python __import__ that loads them. As such,
they are not regarded as being reloadable by mod_python. Whether you can
reload packages using the Python "reload" function I don't know, as I
have never tried. I would probably not recommend trying, so an Apache
restart is still going to be the only way to reliably reload a package.
Therefore nothing has changed in this respect from the current importer.

I really did try hard to get reloading working with packages, ie., many
nights over a few weeks, but in the end although I could see a glimmer
of hope that it might work, it just became too impractical. Some of the
problems are that sub imports within packages will only work when the
parent module in the package is listed in sys.modules. Thus one had
to fake up horrible unique module reference names to store a reference
to the modules in sys.modules. Because of reloading, this had to be
tagged also with an incarnation version number so that when reloading
you weren't overwriting the currently loaded one. The other big problem
was that you get cyclic dependency loops in packages because of how
you reference back through the root of the package when doing sub
imports. This meant that any change to any module file within the
package had to trigger a complete reload of all files which made up
the package. Ie., you had to treat the package as a complete blob,
otherwise it became impossible to implement and you invariably
somehow got different versions of a module in use in different parts of
the package at the same time. Very messy.
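
To make the "complete blob" idea concrete, here is a minimal sketch. It is
not code from the importer. The function names and the naming scheme for
the sys.modules key are made up for illustration. It only shows the two
ingredients described above: one change stamp for the whole package, and
a name tagged with an incarnation number.

```python
import os


def package_stamp(package_dir):
    """Return the newest modification time of any .py file in the package.

    A change to any single file changes the stamp, so the package is
    treated as one unit for reload decisions.
    """
    newest = 0.0
    for dirpath, _dirnames, filenames in os.walk(package_dir):
        for filename in filenames:
            if filename.endswith(".py"):
                full = os.path.join(dirpath, filename)
                newest = max(newest, os.stat(full).st_mtime)
    return newest


def incarnation_name(package_name, incarnation):
    """Build a hypothetical unique sys.modules key for one load of a package.

    The incarnation number keeps a fresh load from overwriting the copy
    that is still in use.
    """
    return "_reloadable_%s_%d" % (package_name, incarnation)
```

A reload check would compare package_stamp() against the value seen at
the last load and, if it differs, reload every file under a new
incarnation name.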

What I have hoped to achieve by some of the other features in the new
module importer is a way of achieving the same effect that packages
were generally being used for, ie., namespacing and encapsulation,
but still be able to support reloading. It does though mean restructuring
your imports a bit and it only becomes usable within the context of
mod_python, but then if it was some generic package which wasn't
mod_python specific to support a web application, one could question
why one would expect it to be reloadable anyway.

> This also implies that if a published module imports a package, and  
> the published module is touched or modified, then the module will  
> be reloaded, but not the package. Is this correct?

Correct, the file based handler module can be reloaded, but the package
will be referenced out of sys.modules where it already resides by the
Python __import__ builtin importer.
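
The effect is easy to see outside of mod_python. The following is only a
sketch in plain modern Python 3, with a throwaway package name invented
for the demonstration. It imports a package, edits it on disk, and
imports it again. The second import is satisfied from sys.modules, so the
edit is not seen.

```python
import importlib
import os
import sys
import tempfile


def import_twice_with_edit():
    """Import a throwaway package, edit it on disk, then import it again.

    Returns (same_object, version_seen_by_second_import).
    """
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "demo_pkg_cached")  # hypothetical name
    os.mkdir(pkg_dir)
    init_py = os.path.join(pkg_dir, "__init__.py")
    with open(init_py, "w") as f:
        f.write("VERSION = 1\n")

    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        first = importlib.import_module("demo_pkg_cached")

        # Change the file on disk after the package has been loaded.
        with open(init_py, "w") as f:
            f.write("VERSION = 2\n")

        # This import finds the entry in sys.modules and never rereads
        # the file.
        second = importlib.import_module("demo_pkg_cached")
        return first is second, second.VERSION
    finally:
        # Clean up so the demonstration leaves no trace behind.
        sys.path.remove(root)
        sys.modules.pop("demo_pkg_cached", None)
```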

>> BTW, that something outside of the document tree, possibly in  
>> sys.path,
>> is dealt with by Python __import__ doesn't mean you can't have module
>> reloading on stuff outside of the document tree. The idea is that  
>> if it is
>> part of the web application and needs to be reloadable, that it  
>> doesn't
>> really belong in standard Python directories anyway. People only  
>> install
>> it there at present because it is convenient.
>
> There are security benefits to not putting your code in the  
> DocumentRoot. It's also useful to develop generic utilities that  
> are used in multiple apps (not just mod_python), but that you don't  
> want available globally on the system. I prefer extremely minimal  
> frontends in the DocumentRoot, with most of my code stored  
> elsewhere. Will the new importer support reloading modules outside  
> of the DocumentRoot without putting them in sys.path?

If you don't want certain modules available globally on your system,
ie., not in the site-packages directory, you can obviously still set
PythonPath just within the mod_python configuration so they are found
without it affecting command line Python. Obviously these are still
notionally on sys.path and so would not be candidates for reloading.

As I mentioned, setting PythonPath currently has a nasty side effect,
preserved from the current importer, whereby it causes the Directory
directive directory not to be searched. I want to get rid of this
behaviour though, as it doesn't seem to make too much sense with the new
module importer.

   http://issues.apache.org/jira/browse/MODPYTHON-154

One still has to be careful in doing it though, as it may cause existing
applications to now incorrectly pick up a module from the Directory
directive directory when they wouldn't have before. Then again, because
of path ordering issues in the current importer, using common names in
multiple locations always gave unpredictable results.

Now in terms of modules which are candidates for reloading being
able to be found on some search path, the first thing that could be done
(it hasn't been yet) is to allow a path to be specified for handler
directives by:

   PythonHandlerPath '["/some/path1":"/some/path2"]'
   PythonHandler mydispatcher

The idea here is that where the specified handler module is not an
absolute or relative path, ie., is just a module name, the path defined
by PythonHandlerPath directive would be appended to the "path"
argument to apache.import_module() function call internally, the
current value of the "path" argument in this situation being the
Directory directive directory.

The order of search would then be: look in the Directory directive
directory, then search PythonHandlerPath, and then fall back to sys.path.
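
As a rough illustration of that ordering only, here is a sketch. The
function name and its arguments are hypothetical and nothing like this
exists in mod_python. In the real scheme the sys.path step would be
handed to Python __import__ rather than resolved to a file.

```python
import os
import sys


def find_module_file(name, directory, handler_path=(), fallback=None):
    """Return the first NAME.py found, or None.

    Search order: the Directory directive directory, then the entries
    of the proposed PythonHandlerPath, then the fallback path (sys.path
    if none is given).
    """
    if fallback is None:
        fallback = sys.path
    search = [directory] + list(handler_path) + list(fallback)
    for entry in search:
        candidate = os.path.join(entry, name + ".py")
        if os.path.isfile(candidate):
            return candidate
    return None
```

A module with the same name in an earlier location shadows later ones,
which is exactly why the ordering has to be predictable.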

Note that this can be done now by virtue of a shell handler in the
document tree simply containing something like:

   from mod_python import apache

   # Load the real handler module from an explicit search path.
   _inner = apache.import_module("modname",
                                 path=["/some/path1", "/some/path2"])
   handler = _inner.handler

But then, it is probably better to use a full path name in the config
to begin with, and that is more probably what I want to promote as the
preferred mechanism with the new importer. This is the main reason
why I haven't added PythonHandlerPath. That is, I think using an
absolute path name is better in being more precise.

The other reason PythonHandlerPath hasn't been implemented yet is that
the new importer is still optional and hasn't been properly embedded
into mod_python. Until it was accepted as the correct way to go, I didn't
want to be adding new directives or changing other parts of mod_python
which need to be changed so it works correctly in all situations. See:

   http://issues.apache.org/jira/browse/MODPYTHON-155
   http://issues.apache.org/jira/browse/MODPYTHON-156

for a couple of other examples of things which I haven't been able
to do yet and can't really until a decision is made to embed it properly.

So, PythonHandlerPath is one way that some special search path
could be consulted for reloadable modules. This though would only
apply by default to top level handler imports, it would not apply for
explicit calls to apache.import_module().

Overall I am a bit hesitant on introducing a directive which would
provide a search path which apache.import_module() would
automatically search. The reason is that like in the current importer
this can cause problems where different parts of the document tree
decide to set the search path differently.

For example, imagine a common set of modules outside of the
document tree which are used by code running under different
parts of the document tree and which therefore may have different
handler search paths defined. Depending on which part of the
document tree calls into the common code first will dictate how a
search may be done for some other module if the common modules
expect to find it on the search path. If one part of the document tree
doesn't include this other place, the search will fail. In other words
the common modules are relying on a search path that is in part out
of its control.

Hope you follow what I am getting at here. It is in some way the sort
of situation Dan had with the "config" module. His code was relying
on the fact that the directory his config module was in was on sys.path.
But with PythonPath effectively being in random order, based on access
order, when set to different things in different parts of the document
tree, if someone else provided a config module under the same name
it would be found by mistake and he would not get the one he wanted.

My feeling is that those modules should be self contained, or if they do
need to search elsewhere, that they should somehow define the search
path for the other module themselves, ie., using the "path" argument to
the apache.import_module() function. This ensures they get what
they wanted.

So, an equivalent to PythonPath for reloadable modules could be
provided, but I'd only really want to do it when good use cases are
shown and it is also shown that unpredictable behaviour isn't
just going to result again because of how it could be set differently
in different parts of the document tree. One would also have to come up
with a way to extend such a path inherited from a parent context, ie.,
like how one can refer to sys.path in PythonPath now.

>> The better way of dealing with this with the new module importer  
>> is to
>> put your web application modules elsewhere, ie., not on sys.path.  
>> You then
>> specify an absolute path to the actual .py file in the handler  
>> directive.
>>  <Directory />
>>      SetHandler mod_python
>>      PythonHandler /path/to/web/application/PyServer/pserver.py
>>      ...
>
> How arbitrary is this path? Must it be within the DocumentRoot?

It is an absolute path relative to the root of the filesystem as a whole,
so can be anything you want. It can include drive specifiers on Win32.

There currently is a short cut that can be used to refer relative to the
directory the Directory directive refers to. This is:

   PythonHandler ~/mymodules/handler.py

Ie., the "~/" prefix. As I mentioned in a previous email, I want to get
rid of the "~/" prefix as a general mechanism. What I mean here is that
currently this can also be used in explicit calls to
apache.import_module() and will refer to the current value of
req.hlist.directory as the root of the path. This leads to
unpredictability with common modules, as discussed above, and so I am
getting rid of it. Instead, for the handler directive case, I will
allow:

   PythonHandler ./mymodules/handler.py

or:

   PythonHandler ../mymodules/handler.py

Ie., relative to the directory the Directory directive specifies.
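
How such a handler value might be resolved can be sketched as follows.
This is an assumption about the intended behaviour, not mod_python code.
The function name is hypothetical, and posixpath is used so the example
behaves the same on every platform.

```python
import posixpath


def resolve_handler_path(handler, directory):
    """Resolve a PythonHandler value to a file path, or return None.

    Absolute paths are used as given. Values starting with "./" or "../"
    are taken relative to the Directory directive directory. Anything
    else is treated as a plain module name and left to the search path.
    """
    if posixpath.isabs(handler):
        return posixpath.normpath(handler)
    if handler.startswith("./") or handler.startswith("../"):
        return posixpath.normpath(posixpath.join(directory, handler))
    return None
```

With a Directory of /var/www/app, the value ./mymodules/handler.py would
resolve to /var/www/app/mymodules/handler.py and the "../" form to
/var/www/mymodules/handler.py.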

>> Most cases I have seen is that people use packages purely to create a
>> namespace to group the modules. With the new module importer that
>> doesn't really need to be done anymore. That is because you can
>> directly reference an arbitrary module by its path. When you use the
>> "import" statement in files in that directory, one of the places  
>> it will
>> automatically look, without that directory needing to be in sys.path,
>> is the same directory the file is in. This achieves the same  
>> result as
>> what people are using packages for now but you can still have module
>> reloading work.
>
> Does it (the initial loading, not the reloading) also apply to  
> packages in that directory? Or will it only work with standalone  
> single file modules in the root of that directory?

Only works for standalone single file modules. A Python package always
has to be on sys.path and will never be reloadable by mod_python.

Note that if a package is very simple, ie., is a single level and refers
to modules in the same package directly rather than through the root,
using:

   package = apache.import_module("/some/path/package/__init__.py")

can often work though and will give you reloading as well.

> This is all very nifty, because it implies that a mod_python  
> application can now be easily distributed by inflating a tarball  
> and specifying the PythonHandler accordingly.

If the PythonHandler path refers to the extracted tarball by absolute
path, then yes it becomes simpler as there is no need to mess with
PythonPath or install it into site-packages. You just can't implement it
as a traditional package, but then because it is self contained in its
own directory which isn't mentioned in sys.path, you still have the
ability to internally structure it how you want.

> If the new importer works outside of the DocumentRoot,

Which it does, but then I probably don't need to confirm that again. :-)

> and Location is used instead of Directory, no files need to be  
> created in the DocumentRoot at all. Or is this currently  
> impossible, in regards to automatic module reloading? I already do  
> this for some handlers I've written, and really like the  
> flexibility provided by the virtualization.

Technically it is probably possible to have nothing at all in the
document tree. You can do this now with the current importer though, but
it means messing with PythonPath with all the problems that entails, and
other code can pick up your handler modules. By being able to specify an
absolute path to your handler bundle it becomes completely separate and
would only be accessible by other code similarly accessing it by
absolute path.
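
For anyone wanting a feel for what "accessible only by absolute path"
means, here is a minimal sketch in plain modern Python 3. It is not how
the new importer is implemented. The function name, the cache and the
generated module names are all invented for illustration. Modules are
keyed by file path in a private cache, never placed in sys.modules, and
reloaded when the file's modification time changes.

```python
import importlib.util
import os

# Absolute path -> (mtime, module). Deliberately separate from sys.modules.
_cache = {}


def import_by_path(path):
    """Load the .py file at PATH, reloading it if the file has changed."""
    path = os.path.abspath(path)
    mtime = os.stat(path).st_mtime
    cached = _cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]

    # Hypothetical naming scheme: a name derived from the path so that
    # nothing collides with modules found via sys.path.
    name = "_bypath_%x" % (hash(path) & 0xFFFFFFFF)
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    _cache[path] = (mtime, module)
    return module
```

Code holding a reference to the old module object keeps using it until
it asks for the module again, which is one of the reasons reloading is
only straightforward for standalone single file modules.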

I think perhaps you are starting to see where I am in part going with
the new module importer. That is that I am introducing this new way of
being able to refer to stuff by explicit paths, thereby breaking away
from sys.path and all the problems that result from that. It means
restructuring stuff a bit and it will not be backward compatible, but I
think that overall it is a much better way of doing it with better
compartmentalisation and predictability.

Anyway, that was a long ramble. I really need to start getting some of
this documented properly, as there is certainly more to the new module
importer than providing an exact replacement for the old. I think the
possibilities are quite promising, but I need to explain it well so
people don't get the wrong idea and see that there are good reasons for
doing it.

BTW, I forgot to say more about how the "path" argument to
apache.import_module() behaves when the module name is referred to as an
absolute or relative path. This is something I started talking about in
a previous email to Dan. If you didn't read that one, you may want to go
back and look at it. I'll need to revisit that one again, as that is the
one area that probably still needs to be thought out properly and
changes still made to make it more usable.

Definitely getting late now, but then I slept most of the afternoon as I
felt a bit funny. :-)

Graham