[mod_python] little problem with my handler

Tue Mar 28 19:11:20 EST 2006

A few comments on your handler.

Julien Cigar wrote ..
> Hello,
> 
> I'm writing my first home-made handler (inspired by mod_publisher).

Writing a publisher like handler which allows arbitrary traversal of objects
is non trivial. Get it wrong and you can expose yourself to potential security
holes. Be warned, don't attempt it unless you appreciate what the issues
are. ;-)

I also think you mean't mod_python.publisher. The Apache mod_publisher
module is used to perform substitutions on XML/HTML as well as
implementing an alternative to SSI of mod_include.

> I wanted to do a "ruby on rails"-like dispatcher, so when I call : 
> http://somehost.com/foo/bar it should call the bar() method of the class
> (controller) foo which lives under controllers/
> Please note that it is not at all complete, I'm just playing with it at
> the moment ..

Why cant you just use mod_python.publisher?

> ...
> 
> Here is the code of my handler:
> 
> #!/usr/bin/python
> 
> import sys
> 
> from mod_python import apache, util
> 
> sys.path.append('/home/jcigar/public_html/invasive_species/application')

Modifying sys.path from handlers, even from global scope in module is
bad. It can cause problems in a multithreaded MPM if mod_python tries to
update sys.path at the same time.

Further, if module reloading is enabled, everytime the module is
reloaded the directory will be added to sys.path meaning it will grow
and grow and grow with multiple occurrences of the same directory
appearing.

> def handler(req):
> 
>     req.allow_methods(["GET", "POST"])
>    
>     if req.method not in (["GET", "POST"]):
>         raise apache.SERVER_RETURN, apache.HTTP_METHOD_NOT_ALLOWED
>   
>     (controller, method) = ("default", "index")
>
>     if req.path_info:
>         path_info = req.path_info.strip('/').split('/')
> 
>     if path_info[0].isalpha():
>         controller = path_info[0]
>
>         if len(path_info) > 1 and path_info[1].isalpha():
>             method = path_info[1]

Like mod_python.publisher, you have gone down the path of allowing
defaults and similarly don't deal with trailing slashes in a reasonable
way. This causes various problems due to the fact that different URLs
can be used to address the same resource. The different URLs can
have varying numbers of slashes in them, which makes calculating
relative URLs to another resource fiddly and error prone.

>     config = req.get_config()
>     (autoreload, log) = (int(config.get("PythonAutoReload", 1)), 
> int(config.get("PythonDebug", 0)))
>    
>     try:
>         module = apache.import_module('controllers.%s' % (controller, ),
> autoreload=autoreload, log=log)

Hope you don't expect autoreload to work on Python packages. It can be
unreliable at best in publically released versions of mod_python and new
module importer in future version will not support it. But then there
are better ways of doing what you are doing when new importer is
available as you will be able to load module by full path.

>     except ImportError:
>         req.log_error('Cannot import controller %s' % (controller, ))
>
>     object = getattr(module, method)

You don't reraise the ImportError exception or return any other sort
of explicit error status. Thus, when module can't be imported, it will
still try and access the method from the module, but module variable
will not exist and so it will die.

Luckily you don't allow arbitrary traversal and only allow method
name to start with alphabetic character, so no issue with notional
private stuff prefixed with underscore.

>     if callable(object):
>         req.params = util.FieldStorage(req, keep_blank_values=1)

It is traditional that if storing FieldStorage instance in request object
that it be stored as req.form.

>         result = util.apply_fs_data(object, req.params, req=req)
>     else:
>         req.log_error('%s is not callable' % (object, ))
>         return apache.HTTP_INTERNAL_SERVER_ERROR

Okay, so only allowing access to callable and not acess to data.
More safe than mod_python.publisher in that respect.

>     if result or req.bytes_sent or req.next:

Do you understand what req.next attribute is? It probably isn't
relevant to you and how you use it is probably wrong in this
case anyway. I should probably check that how mod_python.publisher
even uses it is sensible.

>         req.content_type = 'text/html'

If data has already been sent, ie., req.bytes_sent, is non zero,
then setting content length here will not do anything as it has
to be set before first write of data.

>         req.write(result)

Your callables must always ensure they return string objects.
In mod_python.publisher, it will at least convert non string objects
to strings and attempts to treat Unicode strings specially as well.

Problem with this code is if bytes were sent and result is None,
it will try and write result and die because it isn't a string.

>         return apache.OK
>     else:
>         req.log_error('Nothing returned')
>         return apache.HTTP_INTERNAL_SERVER_ERROR
> 
> 
> In advance, thanks for support !

Overall, I don't see why you couldn't use mod_python.publisher.
Yes, publisher provides more features, but if you set your code
up correctly, you would still get same result as what you are
aiming for without having to create your own handler.

Graham