[mod_python] little problem with my handler

Wed Mar 29 02:21:47 EST 2006

Many thanks for all those answers !

Graham Dumpleton wrote:
> A few comments on your handler.
>
> Julien Cigar wrote ..
>   
>> Hello,
>>
>> I'm writing my first home-made handler (inspired by mod_publisher).
>>     
>
> Writing a publisher like handler which allows arbitrary traversal of objects
> is non trivial. Get it wrong and you can expose yourself to potential security
> holes. Be warned, don't attempt it unless you appreciate what the issues
> are. ;-)
>
>   
I forgot to say that I don't want traversal of objects.
> I also think you mean't mod_python.publisher. The Apache mod_publisher
> module is used to perform substitutions on XML/HTML as well as
> implementing an alternative to SSI of mod_include.
>
>   
>> I wanted to do a "ruby on rails"-like dispatcher, so when I call : 
>> http://somehost.com/foo/bar it should call the bar() method of the class
>> (controller) foo which lives under controllers/
>> Please note that it is not at all complete, I'm just playing with it at
>> the moment ..
>>     
>
> Why cant you just use mod_python.publisher?
>
>   
I could (I've used it for a lot of projects here). In fact I wanted to 
understand mod_python more deeply by writing my own handler.
>> ...
>>
>> Here is the code of my handler:
>>
>> #!/usr/bin/python
>>
>> import sys
>>
>> from mod_python import apache, util
>>
>> sys.path.append('/home/jcigar/public_html/invasive_species/application')
>>     
>
> Modifying sys.path from handlers, even from global scope in module is
> bad. It can cause problems in a multithreaded MPM if mod_python tries to
> update sys.path at the same time.
>
> Further, if module reloading is enabled, everytime the module is
> reloaded the directory will be added to sys.path meaning it will grow
> and grow and grow with multiple occurrences of the same directory
> appearing.
>
>   
right... I forgot that
>> def handler(req):
>>
>>     req.allow_methods(["GET", "POST"])
>>    
>>     if req.method not in (["GET", "POST"]):
>>         raise apache.SERVER_RETURN, apache.HTTP_METHOD_NOT_ALLOWED
>>   
>>     (controller, method) = ("default", "index")
>>
>>     if req.path_info:
>>         path_info = req.path_info.strip('/').split('/')
>>
>>     if path_info[0].isalpha():
>>         controller = path_info[0]
>>
>>         if len(path_info) > 1 and path_info[1].isalpha():
>>             method = path_info[1]
>>     
>
> Like mod_python.publisher, you have gone down the path of allowing
> defaults and similarly don't deal with trailing slashes in a reasonable
> way. This causes various problems due to the fact that different URLs
> can be used to address the same resource. The different URLs can
> have varying numbers of slashes in them, which makes calculating
> relative URLs to another resource fiddly and error prone.
>   
Is there a "good" way to handle relative urls ?
>   
>>     config = req.get_config()
>>     (autoreload, log) = (int(config.get("PythonAutoReload", 1)), 
>> int(config.get("PythonDebug", 0)))
>>    
>>     try:
>>         module = apache.import_module('controllers.%s' % (controller, ),
>> autoreload=autoreload, log=log)
>>     
>
> Hope you don't expect autoreload to work on Python packages. It can be
> unreliable at best in publically released versions of mod_python and new
> module importer in future version will not support it. But then there
> are better ways of doing what you are doing when new importer is
> available as you will be able to load module by full path.
>
>   
so apache.import_module should not be used .. ?
>>     except ImportError:
>>         req.log_error('Cannot import controller %s' % (controller, ))
>>
>>     object = getattr(module, method)
>>     
>
> You don't reraise the ImportError exception or return any other sort
> of explicit error status. Thus, when module can't be imported, it will
> still try and access the method from the module, but module variable
> will not exist and so it will die.
>
>   
Yes, this handler is not complete at all (I was just playing a bit with)
> Luckily you don't allow arbitrary traversal and only allow method
> name to start with alphabetic character, so no issue with notional
> private stuff prefixed with underscore.
>
>   
>>     if callable(object):
>>         req.params = util.FieldStorage(req, keep_blank_values=1)
>>     
>
> It is traditional that if storing FieldStorage instance in request object
> that it be stored as req.form.
>
>   
ok :)
>>         result = util.apply_fs_data(object, req.params, req=req)
>>     else:
>>         req.log_error('%s is not callable' % (object, ))
>>         return apache.HTTP_INTERNAL_SERVER_ERROR
>>     
>
> Okay, so only allowing access to callable and not acess to data.
> More safe than mod_python.publisher in that respect.
>
>   
Yes, only to the function of the "controller" in fact. In the futur I'd 
like to do like Ruby on Rails or Django, with some regex patterns (see 
http://www.djangoproject.com/documentation/url_dispatch/)
>>     if result or req.bytes_sent or req.next:
>>     
>
> Do you understand what req.next attribute is? It probably isn't
> relevant to you and how you use it is probably wrong in this
> case anyway. I should probably check that how mod_python.publisher
> even uses it is sensible.
>
>   
>>         req.content_type = 'text/html'
>>     
>
> If data has already been sent, ie., req.bytes_sent, is non zero,
> then setting content length here will not do anything as it has
> to be set before first write of data.
>
>   
>>         req.write(result)
>>     
>
> Your callables must always ensure they return string objects.
> In mod_python.publisher, it will at least convert non string objects
> to strings and attempts to treat Unicode strings specially as well.
>
> Problem with this code is if bytes were sent and result is None,
> it will try and write result and die because it isn't a string.
>
>   
ok, so req.write(str(result))
>>         return apache.OK
>>     else:
>>         req.log_error('Nothing returned')
>>         return apache.HTTP_INTERNAL_SERVER_ERROR
>>
>>
>> In advance, thanks for support !
>>     
>
> Overall, I don't see why you couldn't use mod_python.publisher.
> Yes, publisher provides more features, but if you set your code
> up correctly, you would still get same result as what you are
> aiming for without having to create your own handler.
>
> Graham
>
>   

-- 
Julien Cigar
Belgian Biodiversity Platform
http://www.biodiversity.be
Université Libre de Bruxelles
Campus de la Plaine CP 257
Bâtiment NO, Bureau 4 N4 115C (Niveau 4)
Boulevard du Triomphe, entrée ULB 2
B-1050 Bruxelles
Work: jcigar at ulb.ac.be
Personal: mage at mordor.ath.cx