[mod_python] Articles on module importing.

Fri Jul 8 00:59:06 EDT 2005

Jorey Bump wrote ..
> > The current mod_python.publisher handler uses the import_module()
> > function internally to load modules, and so it is just as much afflicted
> > by many of these problems, more so when the code within a published
> > module uses import_module() explicitly, and even if only the "import"
> > statement is used within that module.
> 
> Understood. My point is that alternate import mechanisms such as 
> import_module() shouldn't be encouraged in *published* modules or 
> modules that are based on other handlers. I think we both agree that we
> want the average Python programmer to be able to start using mod_python
> as seamlessly as possible.

What would you think of a scheme whereby you could use the "import"
statement and underneath it would magically and seamlessly translate
that into a call to import_module() for you?

The call to import_module() under the covers would only happen, though, where
the module was in the same directory or in one of a set of special directories
designated by a path distinct from sys.path. If a module wasn't found in
those places, it would fall back to the standard Python import mechanism
and search only sys.path.

Thus, for those modules local to the document tree or in specially
designated directories, automatic module reloading would work even though
the "import" statement was used. Because the "import" statement is used,
you don't need two sets of code, one for command-line use and one for use
under mod_python. Under mod_python, all you would need to do, if necessary,
is designate the special directories holding the modules you want managed
by "import_module()" under the covers instead of by the standard Python
module import mechanism.

> After editing modules 
> imported by a published module, apache must be restarted.

Or all the modules that import it need to be touched so their modification
time changes. :-)
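A hypothetical helper for that workaround, offered only as a sketch: it scans the `.py` files in one directory with the standard `ast` module, picks out those that directly import a given top-level module by name, and bumps their modification time with `os.utime()`, which is what `touch` does. It does not follow indirect importers or look in other directories; `modules_importing` and `touch_importers` are illustrative names.

```python
import ast
import os


def modules_importing(directory, module_name):
    """Yield .py files in 'directory' whose source directly imports 'module_name'."""
    for entry in sorted(os.listdir(directory)):
        if not entry.endswith(".py"):
            continue
        path = os.path.join(directory, entry)
        with open(path, encoding="utf-8") as f:
            tree = ast.parse(f.read(), filename=path)
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0:
                names = [node.module or ""]
            else:
                continue
            if module_name in names:
                yield path
                break


def touch_importers(directory, module_name):
    """Set the mtime of every direct importer of 'module_name' to now."""
    touched = []
    for path in modules_importing(directory, module_name):
        os.utime(path)  # no times given: use the current time, like `touch`
        touched.append(path)
    return touched
```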

> >>1. We want the PythonPath extended automatically based on the location
> >>of a file.
> > 
> > This is not a goal or requirement; it is indicative of a solution. The
> > desired behaviour doesn't require that PythonPath be extended, as there
> > are actually better ways of achieving the same thing while still
> > maintaining the transparency you require.
> > 
> > The real goal you are probably alluding to is that within a Python code
> > file serving as a handler you want the "import" statement to be able to
> > pick up a module that resides in the same directory. Yes?
> 
> No. I think it's poor practice to put any kind of library files in a 
> directory that is accessible to browsers, even with other languages. The
> risk is too high that someone with knowledge of your directory structure
> can call your support modules directly, causing unintended side effects.

It could be regarded as not recommended for beginners who don't
understand all the security implications, but it is possible now, and when
someone who knows how to secure the system properly does it, it can
be a quick means to an end and can save a lot of pain in other ways.

Thus, whether one does this or not is going to be a convention only.
Depending on what type of handler system you are using, putting such
common code in modules whose names start with an underscore and using
something like the following in a .htaccess file:

  <Files "_*">
  deny from all
  </Files>

can be enough to stop the code being exposed.

> > At the moment, if using mod_python.publisher this only works where the
> > published module is in the same directory as where the PythonHandler
> > directive was specified, it does not work where the published module is
> > in a subdirectory. Thus, should the ability to use the "import" statement
> > in this way also work for a published module in a subdirectory of where
> > the handler directive was originally defined?
> 
> No. For backwards compatibility, I'd leave this behaviour as-is. We use
> the phrase "published module" to refer to a module that we intend a 
> browser to access, but mod_python.publisher makes no such distinction.
> I don't want to see mod_python automagically package subdirectories or add
> them to the path because it gives newbies more rope to hang themselves.

What if the scheme didn't actually involve the addition of either the top level
handler directory or the subdirectories into sys.path in the first place for this
to work?

The problem at the moment is that we get questions about why this sort
of importing doesn't work for subdirectories in the first place. We explain
that only the top level handler directory is added to sys.path, and the end
result is that people go and fiddle with PythonPath explicitly anyway and
still hang themselves.

Wouldn't it be preferable if a scheme could be offered where "import" just
works for modules in the same directory regardless of whether the importer is
in the top level handler directory or a subdirectory, especially if it didn't
require extending sys.path and automatic module reloading still worked?

I feel it comes down to looking at how users want and expect this stuff
to work, rather than drawing conclusions about whether one way is better
than another, and then making it work, transparently and in what would be
regarded as the correct way for mod_python.

> Therefore, it has to be in the same directory as the 
> published module. For example, http://host/app/ might contain this:
> 
>   .htPython/module1.py
>   .htPython/module2.py
>   app1.py
>   app2.py
> 
> #app1.py
> 
> import module1
> import module2
> 
> def index(req):
>     # 'something' stands for whatever input the application works on
>     a = module1.do(something)
>     return module2.show(a)
> 
> app1 imports modules from the special directory, but app2 doesn't. No 
> big deal. What's nice is that app.tgz could be untarred anywhere that 
> the appropriate PythonHandler is defined, and it wouldn't be necessary
> to manually extend the PythonPath with an explicit file specification.

Unless you take extra precautions by adding "deny from all" to a
".htPython/.htaccess" file, these modules are still going to be accessible.
That you have to take explicit steps to protect them is not much
different from relying on a leading-underscore convention.

> The only precaution left for the developer is to name imported 
> modules/packages appropriately to prevent collisions, but this must be
> done for all Python apps, anyway. Ideally, the path would be modified 
> per "published" module, but I don't know if that's even possible.

If the "import" statement overlays "import_module()" and the latter supports
the same named module in different directories, you wouldn't have to
worry about collisions. If it knows to look for modules in this directory
using this mechanism before looking elsewhere through some special path
designation, then you could have everything you want and more.

I'm getting into implementation, which I didn't want to do, but what if
one way to make this work was to add the following to the start of your
app1.py file:

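  import os

  # Proposed scheme only: mod_python would supply a per-module __path__
  # list for "import" to search; an ordinary Python module has no __path__.
  directory = os.path.dirname(__file__)
  modules = os.path.join(directory, ".htPython")
  __path__.insert(0, modules)

  import module1
  import module2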

You need to add a few lines, but it gives you control over where "import"
looks. Through a bit of magic, when it finds "module1" in that directory
it will use "import_module()" to actually load it, and because that handles
same named modules, the collision problems go away.
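For comparison, the nearest equivalent in plain Python today avoids any import magic and loads the module explicitly from a directory next to the importing file. A minimal sketch, assuming the ".htPython" layout from the example above; `load_sibling` is a hypothetical helper, not a mod_python API, and unlike import_module() it does no caching, so each call re-executes the module:

```python
import importlib.util
import os


def load_sibling(module_name, base_file, subdir=".htPython"):
    """Load module_name from <dir of base_file>/<subdir>/ without touching sys.path."""
    directory = os.path.join(os.path.dirname(os.path.abspath(base_file)), subdir)
    path = os.path.join(directory, module_name + ".py")
    if not os.path.isfile(path):
        raise ImportError("No module named %r in %s" % (module_name, directory))
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    # Executes the file in a fresh module object; nothing is added to sys.modules.
    spec.loader.exec_module(module)
    return module


# In app1.py this would be used as, for example:
#   module1 = load_sibling("module1", __file__)
```

The cost is that app1.py no longer uses a plain "import" statement, which is exactly the two-sets-of-code problem the overlay scheme is meant to avoid.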

> Depth is certainly an issue, but one could assume (or at least establish
> the convention) that the custom modules in .htPython mostly import 
> stable modules that are unlikely to change, such as those in the 
> standard library. The mere possibility of specifying a conventional and
> portable location for custom modules goes a long way towards solving the
> import problem.

It isn't that specifying a special location is important, but that there is
a separation between sys.path modules, where the standard Python import
infrastructure is used, and web application directories, where
"import_module()" is used under the covers of "import". Thus, with the
example above, a mechanism is provided to specify where your directory
is, but you set your own convention as to what it is named and where
it is located.

> Sorry, by packaging I meant simply creating a tarball or zipfile for 
> distribution, not a Python package.

Which means in your case, you wouldn't be affected if "import_module()"
didn't support Python packages within the web application area.

> I currently develop most of my Publisher applications as packages that
> are stored in a directory prefixed to PythonPath. It works fine and I 
> couldn't bear to lose that functionality (although I obviously still 
> need to restart apache after editing package code). Unfortunately, I 
> don't know enough about the import mechanism to understand the 
> difference between reloading modules vs. packages, so I don't know if my
> scheme offers any solution here (or even if it's feasible).

It is all feasible, except for Python packages, which may just not be worth
the effort. I know it is feasible because, for the way you are using things,
it is already possible and working in "vampire::publisher". ;-)

Graham