[mod_python] Articles on module importing.

Fri Jul 8 09:40:23 EDT 2005

Graham Dumpleton wrote:
> Jorey Bump wrote ..
> 
>>Understood. My point is that alternate import mechanisms such as 
>>import_module() shouldn't be encouraged in *published* modules or 
>>modules that are based on other handlers. I think we both agree that we
>>want the average Python programmer to be able to start using mod_python
>>as seamlessly as possible. 
> 
> What would you think of a scheme whereby you could use the "import"
> statement and underneath it would magically and seamlessly translate
> that into a call to import_module() for you?
> 
> The call to import_module() under the covers would only happen though where
> the module was in the same directory, or in one of a set of special directories
> designated by a path distinct from sys.path. If a module wasn't found in
> those places, it will fall back to use of the standard Python import
> mechanism and search in only sys.path.

Yes, this is precisely what I'm trying to describe.
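
A minimal sketch of that lookup order, written against the modern importlib API rather than mod_python's import_module(), which is not used here. SpecialDirFinder and install are hypothetical names. The sketch only shows the "special directories first, then sys.path" fallback; it does no automatic reloading.

```python
import importlib.abc
import importlib.machinery
import sys


class SpecialDirFinder(importlib.abc.MetaPathFinder):
    """Search a set of designated directories before the normal import path."""

    def __init__(self, directories):
        # These directories are kept apart from sys.path on purpose.
        self.directories = list(directories)

    def find_spec(self, fullname, path=None, target=None):
        if path is not None:
            # A submodule of some package: leave it to the standard machinery.
            return None
        # PathFinder performs the usual file-system lookup, restricted here
        # to the designated directories. It returns None when nothing is
        # found, which makes Python fall through to the remaining finders
        # and therefore to the ordinary sys.path search.
        return importlib.machinery.PathFinder.find_spec(fullname, self.directories)


def install(directories):
    """Put the finder at the front of sys.meta_path and return it."""
    finder = SpecialDirFinder(directories)
    sys.meta_path.insert(0, finder)
    return finder
```

After install(["/path/to/special/dir"]), a plain "import module1" picks up module1.py from that directory even though the directory never appears in sys.path, and anything not found there is imported as usual.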

> Thus, for those modules local to the document tree or in specially
> designated directories, even though the "import" statement was used,
> automatic module reloading would work. Because the "import"
> statement is used, you don't need to have two sets of code, one for
> command line use and one for use under mod_python. Under mod_python,
> if necessary, all you would need to do is designate those special directories
> where modules are stored which you want managed by "import_module()"
> under the covers instead of the standard Python module import mechanism.

Exactly. But there could also be a default (e.g. ".htPython"), to 
provide some commonality among installations, to ease configuration for 
both the developer and administrator.

>>After editing modules 
>>imported by a published module, apache must be restarted.
> 
> 
> Or all the modules that inherit it need to be touched so their modification
> time changes. :-)

Yes, perhaps I overemphasized, but I find such workarounds to be 
tedious as well ("OK, I edited a module, now what other files do I need to 
touch?"). The holy grail here is behaviour similar to PHP, where file 
edits take immediate effect, but without losing the performance gains 
from module caching, whenever possible.
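
A rough sketch of that trade-off: keep a loaded module cached, and reload it only when the file's modification time changes. This is not how mod_python does it; load_fresh and _cache are hypothetical names, and the source is compiled directly so that stale bytecode files cannot interfere.

```python
import os
import types

# absolute path -> (mtime in nanoseconds at load time, module object)
_cache = {}


def load_fresh(path):
    """Return the module at path, re-executing it only if the file changed."""
    path = os.path.abspath(path)
    mtime = os.stat(path).st_mtime_ns
    cached = _cache.get(path)
    if cached is not None and cached[0] == mtime:
        # File untouched since the last load: reuse the cached module.
        return cached[1]
    with open(path, "rb") as f:
        source = f.read()
    name = os.path.splitext(os.path.basename(path))[0]
    module = types.ModuleType(name)
    module.__file__ = path
    # Compile and run the current source in a fresh module namespace.
    exec(compile(source, path, "exec"), module.__dict__)
    _cache[path] = (mtime, module)
    return module
```

Note that the sketch only watches the file it is asked for. A module that imported an edited file is not reloaded, which is exactly the "what other files do I need to touch?" problem; solving it would require tracking dependencies between modules.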

>>No. I think it's poor practice to put any kind of library files in a 
>>directory that is accessible to browsers, even with other languages. The
>>risk is too high that someone with knowledge of your directory structure
>>can call your support modules directly, causing unintended side effects.
> 
> It could be viewed as being not recommended for beginners who don't
> understand all the security implications, but it is possible now and when
> someone who knows how to secure the system properly does it, it can
> be a quick means to an end and can save a lot of pain in other ways.

Absolutely. I wouldn't want to prevent it, but I still think it's a bad 
idea.

> Thus, whether one does this or not is going to be a convention only.

Correct.

> Depending on what type of handler system you are using, putting such
> common code in modules starting with an underscore and using something
> like the following in a .htaccess file:
> 
>   <Files "_*">
>   deny from all
>   </Files>
> 
> can be enough to stop the code being exposed.

Yes, but here we're getting into one of the current weaknesses of 
mod_python: It requires too much administrative capability.

What's needed is a handler that's included in the mod_python 
distribution that allows an admin to install and configure mod_python 
once, and developers to use it immediately for web applications, with 
many of the associated headaches taken care of behind the scenes. 
Publisher2, perhaps. :)

>>I don't want to see mod_python automagically package subdirectories or add
>>them to the path because it gives newbies more rope to hang themselves.
> 
> What if the scheme didn't actually involve the addition of either the top level
> handler directory or the subdirectories into sys.path in the first place for this
> to work?

Yes, a virtualization may be necessary, especially when trying to 
prevent collisions. But when taken out of the mod_python environment, 
the code should still work (although one may have to prefix the special 
directory to PYTHONPATH or move the modules into the search path).

> The problem we get at the moment is that we get questions of why this sort
> of importing doesn't work for subdirectories in the first place. How only the
> top level handler directory is added into sys.path is explained. End result is
> that they go and fiddle with PythonPath explicitly anyway and still hang
> themselves.

I hesitate to mention it, but maybe the directory in which the 
PythonHandler is defined could also have the special quality where the 
modules in .htPython are available to all other modules in the 
interpreter, regardless of location. Actually, this is easily done now, 
without any changes to mod_python. Simply define this in a virtual host:

DocumentRoot /var/www/host1/site

<LocationMatch ".htPython">
   Deny from All
</LocationMatch>

<Directory /var/www/host1/site/app>
   SetHandler python-program
   PythonHandler mod_python.publisher
   PythonDebug On
   PythonPath "['/var/www/host1/site/app/.htPython'] + sys.path"
</Directory>

An admin could do this now, making it much easier to support 
mod_python.publisher on a server for multiple clients. One advantage is 
that a team of developers would have a common area to store custom 
modules for import, and they would at least be able to avoid name 
collisions for these modules (if not the published modules).

A little protection is offered by denying browser access to the 
.htPython directory, but it should be noted that any other system user can 
read those files. This is a problem shared by most web applications, not 
just those written for mod_python, even when the files are stored 
outside of the DocumentRoot.
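
To see which files are exposed to other system users, a quick check could look like the sketch below. It assumes a POSIX system and looks only at the "other" read bit of each file; world_readable is a hypothetical helper name.

```python
import os
import stat


def world_readable(directory):
    """Return sorted paths of regular files under directory readable by 'other'."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for filename in files:
            path = os.path.join(root, filename)
            mode = os.lstat(path).st_mode
            # Regular files only; S_IROTH is the read permission for 'other'.
            if stat.S_ISREG(mode) and mode & stat.S_IROTH:
                found.append(path)
    return sorted(found)
```

Whether another user can actually reach a listed file also depends on the permissions of the directories above it, which this sketch does not examine.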

This also eases application distribution, but it's not as simple as 
tarballing a single directory.

> Wouldn't it be preferable if a scheme could be offered where "import" just
> works for modules in the same directory regardless of whether the importer is
> in the top level handler directory or a subdirectory, especially where the
> manner it was implemented didn't require extension of sys.path and where
> automatic module reloading worked?

Yes, but I fear that it encourages bad practice in the web application 
sphere. And it will cause problems exactly because it's perfectly 
acceptable, even desirable, in other application domains. In some ways, 
it's best to treat a module accessible by a browser as a simple frontend 
to a backend application; an interface, if you will. You don't encounter 
this problem at all if your standalone application includes a web server 
and is proxied through apache, for example.

> I feel it comes down to looking at how all users want and expect this stuff
> to work, rather than drawing conclusions as to whether one way is better
> than another, and make it work, but make it work transparently in what
> would be regarded as the correct way for the mod_python case.

Well, the customer isn't always right -- mod_python is never going to be 
the same as PHP (I hope!). Some of the shock to new users is a result of 
their experience with other languages, so they will need to learn new 
tricks. But even for experienced Python programmers, some annoyances 
remain. I'd like to address these, first.

> Unless you take extra precautions by adding "deny from all" into the
> ".htPython/.htaccess" file, these modules are still going to be accessible.
> That you have to take explicit steps to protect them is not much
> different to using a convention of using a leading underscore.

Yes, you're right. I mistakenly assumed that the default config of most 
apache installations forbids access to any resource that begins with 
".ht", but this actually only applies to files, so some administrative 
step must be taken. Although it would be a good idea for mod_python 
itself to prevent access, it must also be handled by apache, in case 
mod_python isn't running. Since it's in the DocumentRoot, it's important 
not to leak information. In any case, using a special directory is more 
explicit (and flexible) than a file naming convention, IMHO.

> If the "import" statement overlays "import_module()" and the latter supports
> the same named module in different directories, you wouldn't have to
> worry about collisions. If it knows to grab them from this directory using
> this mechanism before looking elsewhere through some special path
> designation, then you could have everything you want and more.
> 
> Am getting into implementation, which I didn't want to, but what if for
> this to work one way was to add the following to the start of your
> appl1.py file:
> 
>   import os
> 
>   directory = os.path.dirname(__file__)
>   modules = os.path.join(directory,".htPython")
>   __path__.insert(0,modules)
> 
>   import module1
>   import module2
> 
> You need to add a few lines, but it gives you control on where "import"
> looks, but through a bit of magic when it finds "module1" existing in
> that directory it will also use "import_module()" to actually load it and
> when it handles same named modules the collision problems go away.

Yes, but I wouldn't want to ask a developer to do this in a published 
module. It's too much of a "gotcha".
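
For illustration, one way a handler could load same-named modules from different directories without collisions is to register each one under a name derived from its full path. The following is a minimal sketch under that assumption, using importlib rather than mod_python's import_module(); import_from_dir and the "_local_" prefix are made-up names.

```python
import hashlib
import importlib.util
import os
import sys


def import_from_dir(name, directory):
    """Load <directory>/<name>.py under a key that is unique per file path."""
    path = os.path.join(os.path.abspath(directory), name + ".py")
    # Two files called module1.py in different directories get different
    # keys, so neither one can shadow the other in sys.modules.
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()[:16]
    unique = "_local_%s_%s" % (digest, name)
    if unique in sys.modules:
        return sys.modules[unique]
    spec = importlib.util.spec_from_file_location(unique, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Register only after the module body ran without raising.
    sys.modules[unique] = module
    return module
```

Because the lookup happens in the caller (here standing in for the handler), the published module itself needs no path manipulation, and the plain name "module1" is never claimed in sys.modules.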

> It isn't that specifying a special location is important, but that there is
> a separation between sys.path modules where standard Python import
> infrastructure is used, and web application directories where under the
> covers of "import" the "import_module()" function is used. Thus with the
> example above, a mechanism is provided to specify where your directory
> is, but you set your own convention as to what it is named and where
> it is located.

Yes, but I think it's important to handle this outside of the published 
module, in the handler, even if the standard import mechanism is only 
being enhanced or simulated. Too much special code in the published 
module makes it proprietary to mod_python, in which case, you might as 
well use a full blown framework.