[mod_python] Imports, Caching, Handlers and Page Templates.

Sun Aug 15 19:22:31 EDT 2004

I'll admit that I have only been looking at mod_python for about a month
and have only been on the mailing list for a couple of weeks. Also, I have
been playing exclusively with the older 2.7 version of mod_python, as I
don't have access to a machine which runs Apache 2.0. All the same, I
thought it might be a good time to say a few things about what I have been
up to, since I have just released a new beta version of a package of mine
and this new version incorporates some specific support for mod_python.

Okay, see it as a bit of advertising for this package if you want, but the
mod_python handler I have come up with can be separated from the package
and used by itself if desired. From what I have seen, this mod_python
handler provides something a bit different to what is out there, but then I
could be wrong, as I haven't been looking too long and I haven't looked at
mod_python 3.1 in any real detail yet. Thus I may have missed the obvious
and simply reinvented the wheel. Please educate me if necessary. :-)

First off, the package is called OSE and it is available from:

   http://ose.sourceforge.net

The version just made available is OSE 8.0b4.

I will not go into detail about what OSE is all about as it would take too
long. It is enough to say that it provides a way of implementing
applications distributed across multiple processes or multiple machines.
This is done using a publish/subscribe and request/reply messaging system.
Although the core is written in C++, there are some Python wrappers
provided. As far as mod_python is concerned, the main thing I now provide
which would be of interest to people here is a specialised mod_python
content handler.

Now unless I have missed something, there are two ways to set up
mod_python. You can use "AddHandler" to indicate that a specific handler be
used for all requests where a specific extension is used, or you can use
"SetHandler" to have all requests be processed by a specific content
handler. The two cases are represented by mod_python.psp in the first
instance and mod_python.publisher in the latter.

The mod_python handler I have developed uses "SetHandler" like
mod_python.publisher. But whereas mod_python.publisher then does some
interpretation of the request and translates it into a totally different
calling convention, i.e., Python method invocation, what I have created
effectively maps requests back onto different basic content handlers by
looking at the resource being requested.

In other words, if you had copied the "mptest.py" example file into the
directory and made a request against "mptest.html", it would be processed
by the "handler()" method contained in "mptest.py". At the same time and in
the same directory, if there were also a handler defined in "index.py" and
a request was made for "index.html" then the "handler()" method in
"index.py" would be used to process the request.

Thus, without imposing a particular mechanism for how the resultant HTML is
generated, such as with mod_python.psp, you can have multiple basic content
handlers within the one directory. Which one is used is by default
determined by stripping off the ".html" extension and replacing it with
".py" to arrive at the handler to execute. If for a particular request for
a ".html" file there is no corresponding ".py" file, the handler declines
the request and Apache processes it as per normal. Similarly, if the
request is for any resource the handler doesn't know what to do with, it
again passes it back to Apache.
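
As a rough illustration only, and not the code shipped with OSE, the lookup
described above could be sketched in present-day Python as follows. The
function name "find_handler_file" and its parameters are made up for this
example, and a return value of None stands in for declining the request.

```python
import os


def find_handler_file(directory, resource, extension=".html"):
    """Return the ".py" file that should serve `resource`, or None.

    Hypothetical helper: None means "decline and let Apache handle it".
    """
    stem, ext = os.path.splitext(resource)
    if ext != extension:
        return None  # not a resource type that maps onto a handler
    candidate = os.path.join(directory, stem + ".py")
    # Only dispatch when the matching code file actually exists.
    return candidate if os.path.isfile(candidate) else None
```

A real content handler would then load the returned file and call its
"handler()" function with the request object.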

Now I would have thought this would be quite basic functionality, which is
why I feel I must have missed something and that this is actually possible
without a specialised content handler. So what have I overlooked?

Anyway, to extend on that, as well as dealing with requests for files with
a ".html" extension, it is possible to define that a content handler
produces other types of files as well as, or instead of, ".html" files.
Thus, you might have in a directory a file called "data.csv". If you were
to request this file, the initial handler, not knowing what to do with it,
will decline to service the request and Apache will return it instead.

What can now be done though, is to add a "data.py" containing a handler
method. This isn't going to be used to generate the CSV data, but by
defining that this handler be called for ".txt" and ".html" requests, i.e.,
requests against "data.txt" and "data.html", it can be used to translate
the data in the CSV file into different forms, such as tab separated and
HTML table formatted data.
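
To give a feel for the kind of translation such a "data.py" might do, here
is a minimal sketch using only the Python standard library. The function
names are invented for this example, and how the handler is told which
extensions it serves is left out, since that is specific to the OSE handler.

```python
import csv
import html
import io


def csv_to_tsv(csv_text):
    """Re-emit CSV text as tab separated text (illustrative helper)."""
    rows = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


def csv_to_html_table(csv_text):
    """Render CSV text as a bare HTML table, escaping each cell."""
    lines = ["<table>"]
    for row in csv.reader(io.StringIO(csv_text)):
        cells = "".join("<td>%s</td>" % html.escape(cell) for cell in row)
        lines.append("<tr>%s</tr>" % cells)
    lines.append("</table>")
    return "\n".join(lines)
```

A handler serving "data.txt" would write out the first form with a
text/plain content type, and one serving "data.html" the second.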

In summary, if a directory were to contain the files as described above,
plus an ordinary HTML file, as listed below:

   mptest.py
   index.py
   data.csv
   data.py
   basic.html

The results of requests would then be:

   basic.html - declined by initial handler and content returned by Apache
   data.csv - declined by initial handler and content returned by Apache
   missing.html - declined by initial handler and Apache returns "Not Found"
   index.html - initial handler would execute handler in index.py to generate content
   mptest.html - initial handler would execute handler in mptest.py to generate content
   data.txt - initial handler would execute handler in data.py to generate content
   data.html - initial handler would execute handler in data.py to generate content

One final case: if the extension which a ".py" file is indicated as being
able to process is set to be empty, then that name can be used like a
directory, with additional path info being passed in to the handler. I.e.,
if the code file is called "proxy.py", using the URL "proxy/a/b/c" would
result in the handler in "proxy.py" being executed with the path info set
to "/a/b/c".
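
The split between handler name and path info might look something like the
following sketch. Again this is only an illustration under my own
assumptions: "split_path_info" is a made-up name, and the check that the
handler has an empty extension configured is not shown.

```python
import os


def split_path_info(directory, url_path):
    """Split `url_path` into (handler_file, path_info), or return None.

    Hypothetical helper: the first URL segment names the ".py" file and
    whatever follows it becomes the path info.
    """
    parts = url_path.strip("/").split("/")
    name, rest = parts[0], parts[1:]
    candidate = os.path.join(directory, name + ".py")
    if not name or not os.path.isfile(candidate):
        return None  # nothing to dispatch to, so decline
    path_info = "/" + "/".join(rest) if rest else ""
    return candidate, path_info
```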

That is the first significant bit of functionality provided. The next
relates to the fact that to do this effectively, you need to have a caching
system for Python code which is loaded, to avoid reimporting on every
request. Caching is done by mod_python itself as well as in systems like
mpservlets.

The specialised content handler being supplied actually makes use of a
module caching system which OSE has had in it for some time, although it
has been upgraded a bit recently and moved into a separate file so it can
be split out of OSE if required.

Unlike mod_python and mpservlets, the caching system will reload Python
code if any change occurs in the modification time. These other systems
only do so if the modification time is newer than before, which is no good
if an older version of a file is put back in place of the newer one. Only
reloading a file if the modification time is newer can also sometimes be
problematic when files are shared across the network using something like
NFS and machine clocks aren't properly synchronised.
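
The difference comes down to comparing modification times for inequality
rather than with "greater than". A minimal sketch of that idea follows. It
is not the OSE module cache itself: the class name "SourceCache" is invented
here, and running the file with exec() into a plain dictionary is a
simplification of real module loading.

```python
import os


class SourceCache:
    """Cache executed code keyed by file path (illustrative only)."""

    def __init__(self):
        self._entries = {}  # path -> (mtime_ns, namespace)

    def load(self, path):
        mtime = os.stat(path).st_mtime_ns
        entry = self._entries.get(path)
        # Reuse the cached copy only if the time is exactly unchanged, so
        # an older file put back in place still triggers a reload.
        if entry is not None and entry[0] == mtime:
            return entry[1]
        namespace = {"__file__": path}
        with open(path) as source:
            exec(compile(source.read(), path, "exec"), namespace)
        self._entries[path] = (mtime, namespace)
        return namespace
```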

More importantly, the module cache system can be used explicitly within a
loaded module to load additional modules as children. The cache will track
the relationships between the modules and will reload a parent if any of
the children is changed. This avoids the problem whereby if you use
"import" explicitly to load a common bit of code and you change the common
code, you are in most cases forced to restart Apache to throw out the old
code remembered by Python so as to be able to pick up the new code.

This mechanism of tracking relationships between loaded modules was
originally developed for a servlet based system. Specifically, you might
have a base class servlet which defines a lot of site structure and a
derived class which defines specific page content. As well as the derived
servlet being loaded through the module cache, the derived servlet could
load the base class servlet via the cache. Now if the base class servlet
was modified, even though the derived class servlet hadn't been, and a
request came in which meant the derived class servlet was to be executed,
the system would detect the change in the base class and automatically
reload the derived class, which would in turn reload the base class, thus
picking up the changes in the site structure.
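
One way to picture the parent/child tracking is the sketch below. It is an
assumption-laden toy rather than the OSE implementation: the class name
"DependencyCache" and the injected "load_child" function are hypothetical,
code is run with exec() into a dictionary, and circular dependencies are
not handled.

```python
import os


class DependencyCache:
    """Reload a parent when any recorded child changes (illustrative)."""

    def __init__(self):
        self._entries = {}  # path -> (mtime_ns, namespace, child paths)

    def _stale(self, path):
        entry = self._entries.get(path)
        if entry is None or not os.path.exists(path):
            return True
        mtime, _, children = entry
        if os.stat(path).st_mtime_ns != mtime:
            return True
        # A parent is stale as soon as any of its children is.
        return any(self._stale(child) for child in children)

    def load(self, path):
        if not self._stale(path):
            return self._entries[path][1]
        children = []

        def load_child(child_path):
            # Hypothetical hook: loaded code calls this instead of "import"
            # so the cache can record the relationship.
            children.append(child_path)
            return self.load(child_path)

        namespace = {"__file__": path, "load_child": load_child}
        mtime = os.stat(path).st_mtime_ns
        with open(path) as source:
            exec(compile(source.read(), path, "exec"), namespace)
        self._entries[path] = (mtime, namespace, children)
        return namespace
```

With this, a derived file that calls load_child() on its base file is
re-executed on the next load after the base file changes, without a restart.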

Although originally used on a servlet based system independent of
mod_python and Apache, it is still applicable here. The cache also does a
few other things to avoid unnecessary reimporting as well as cope with
removal of files, but enough said on that one.

The next thing which might be of interest is that a wrapper for
HTMLTemplate is provided along with a template object caching system for
its templates. HTMLTemplate is a bit different to most systems in that
there is quite a clear separation between the HTML and the code used to
populate the HTML with data. That is, there is no Python code embedded
within the HTML, nor is there necessarily any HTML embedded in the Python
code. In some respects it is DOM like, but not really, as to make things
more efficient it only indexes and allows you to manipulate nodes marked up
in a certain way. Since this is a very poor description of HTMLTemplate,
you should simply go to its web site for a description.

As to the wrappers for HTMLTemplate and the template caching system, what
you can end up with is:

   index.html
   index.py

That is, the HTML is all in "index.html", with the necessary node markup
defined in elements as required. When a request comes in for "index.html",
the handler in "index.py" is executed. The handler loads the HTML for
"index.html" via the template cache and is given a copy to work on. It uses
the HTMLTemplate API to fill in the data and writes back the rendered
result for Apache to return to the client.
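
The "parse once, hand out copies" behaviour of such a template cache can be
sketched generically as below. This does not use the HTMLTemplate API at
all: "TemplateCache" is an invented name, and the "parse" argument is a
stand-in for whatever turns HTML text into a template object.

```python
import copy
import os


class TemplateCache:
    """Cache parsed templates and hand out copies (illustrative only)."""

    def __init__(self, parse):
        self._parse = parse  # assumed callable: source text -> template
        self._entries = {}   # path -> (mtime_ns, parsed template)

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        entry = self._entries.get(path)
        if entry is None or entry[0] != mtime:
            # Parse only on first use or when the file has changed.
            with open(path) as source:
                entry = (mtime, self._parse(source.read()))
            self._entries[path] = entry
        # Return a copy so a request cannot alter the cached original.
        return copy.deepcopy(entry[1])
```

Each request then fills in its own copy and renders it, while the parsed
original stays untouched for the next request.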

Okay, done for now. If you are interested, download OSE 8.0b4 and
specifically look at:

   examples/apache
   examples/apache/_mputils
   netsvc/cache/__init__.py
   netsvc/apache/__init__.py

The example has the start of some documentation on how it hangs together,
but I am running out of time before I go on holidays, so I wanted to get
something out now.

Knowing that most people probably will not care much about the larger
functionality of OSE, the "examples/apache/README" file describes how to
copy the module cache and mod_python handler out of OSE proper into the
example so you can run it as is, without installing OSE.

If you have comments or questions, then post them here on the mod_python
mailing list and I'll do my best to address them. If there is enough
interest I can go on to describe how I see all this fitting into the larger
scheme of things with respect to the messaging system provided with OSE and
how one might go about implementing sessions and database interfaces.

Enjoy.

--
Graham Dumpleton (grahamd at dscpl.com.au)