[mod_python] Re: best practices; was "is mod_python borked?"

Sat Nov 4 00:53:19 EST 2006

On 04/11/2006, at 3:43 PM, Steve Holden wrote:

> Graham Dumpleton wrote:
>> On 04/11/2006, at 1:31 AM, Sells, Fred wrote:
>>> The following is extracted from Graham's reply with questions of my own
>>> added.
>>>
>>> Note that I use Apache 2.x and python 2.3 and mod_python 3.1.3 on linux.
>>>
>>>
>>>> As long as you don't run up against the trailing slash and base   
>>>> url issues.
>>>
>>> ;-)
>>> I guess I haven't hit them, not sure what they are, are they documented
>>> somewhere? Isn't there an Apache module that fixes this?
>> You can read a bit about it at:
>>   http://www.modpython.org/pipermail/mod_python/2006-March/020501.html
> Wasn't this kind of stuff why publisher was designed in the first  
> place? I remember having a dialog with Grisha about similar topics,  
> and his reply essentially being "Most of the web stuff I do uses  
> CGI" :)

Since I didn't really qualify which part of that mail message I saw as the
problem, I am not sure how your comment relates, or which bit you are saying
publisher was designed for in the first place. :-)

In terms of what I had in mind as being an issue, there are a few parts to it.

For the first, when a request arrives against a physical directory and there
is no trailing slash on the URL, Apache (unless this is disabled) will send a
redirect back to the browser, forcing it to send the request again but with a
trailing slash added. By doing this, it is helping the browser to understand
what the effective base URL is, so that if relative URLs are then used to
address other resources within that directory, it will be able to construct
the URL properly.

Thus, if we have a filesystem structure:

   /some/path/directory
   /some/path/directory/index.html
   /some/path/directory/file-1.html

and we pretend that the URLs used to access it are the same, and a request
arrives for the URL:

   /some/path/directory

then Apache sends a redirect back to the browser asking it to re-request the
resource with the URL:

   /some/path/directory/

Where the DirectoryIndex directive is set and it contains 'index.html',
Apache will now internally map the request, with trailing slash, to the
'index.html' file and serve up that page as the response content.

If that page then has a relative link in it for 'file-1.html', the browser,
if told to follow that link, will request:

   /some/path/directory/file-1.html

If this trailing slash redirection for the directory wasn't done, and the
URL of:

   /some/path/directory

without the trailing slash were allowed to map to 'index.html', then the
browser when trying to follow the link would actually request:

   /some/path/file-1.html

as it wouldn't know that 'file-1.html' was actually a contained item
underneath the context of the resource, as opposed to a sibling.
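
The difference is easy to check with the standard urljoin() function in
Python's urllib.parse, which applies the usual relative URL rules. This is
just a small sketch; the host name example.com is made up for illustration.

```python
from urllib.parse import urljoin

# Hypothetical host; only the path part matters for this comparison.
with_slash = "http://example.com/some/path/directory/"
without_slash = "http://example.com/some/path/directory"

# With the trailing slash, the link is resolved inside the directory.
inside = urljoin(with_slash, "file-1.html")

# Without it, the last path segment is replaced, so we get a sibling.
sibling = urljoin(without_slash, "file-1.html")

print(inside)
print(sibling)
```

The first result stays under /some/path/directory/, the second lands one
level up, which is what a browser would do as well.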

Now when we look at how URLs are matched to object hierarchies in publisher,
something which, if I understand the history correctly, follows from the bobo
publisher in what became Zope, it doesn't try to view the object hierarchy in
the same way as a directory structure, and doesn't do any trailing slash
redirection to establish directory contexts when it perhaps should.

Thus if I have:

   class dummy:

     # Called as bound methods on the instance, so 'self' comes first.
     def __call__(self, req): return "<a href='file_1'>file_1</a>"
     def file_1(self, req): return "file_1"

   directory = dummy()

and I access this through publisher such that the trailing part of the URL
looks like:

   /directory

although the __call__() method acts like the index page for that level in the
hierarchy, publisher doesn't look at it that way and simply accepts the URL
without the trailing slash. Thus, if the link in that page is now followed,
it will end up mapping to:

   /file_1

and not to:

   /directory/file_1

Now although publisher doesn't force the browser to re-request with a
trailing slash for the __call__() method, it will still accept a trailing
slash on the URL, and when that is done the link in the page does resolve to
the correct resource.

Thus, everything can be made to work as long as a trailing slash is always
used on the URL whenever the target is a __call__() method. However, if the
user sees the URL and manually removes the trailing slash himself, then the
links in the page break.
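
To make the missing step concrete, here is a rough sketch of the decision a
handler could make before generating such a page. It is not something
publisher does; the function name slash_redirect_target and its arguments are
invented for the example, and how the redirect would actually be sent is left
out.

```python
def slash_redirect_target(path, query=None):
    """Return the URL to redirect to, or None if no redirect is needed.

    Hypothetical helper: 'path' is the request path as the browser sent
    it, 'query' is the raw query string (if any).
    """
    if path.endswith("/"):
        # Already a directory-style URL, relative links will work.
        return None
    target = path + "/"
    if query:
        # Preserve the query string across the redirect.
        target += "?" + query
    return target


print(slash_redirect_target("/directory"))
print(slash_redirect_target("/directory", "a=1"))
print(slash_redirect_target("/directory/"))
```

In a mod_python handler the path and query string would presumably come from
the request object, and a non-None result would be handed to whatever
mechanism is used to send the redirect.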

So whereas one should be able to use fixed relative links in a page generated
from a __call__() method called by publisher, one is instead forced to either
make them absolute path links, or dynamically determine what the relative
link should be by looking at the URL and what is determinable about where the
point sits in the object hierarchy.
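
As a rough illustration of the dynamic approach, the sketch below works out a
relative link that resolves to the same place whether or not the request had
a trailing slash. The helper name relative_link is made up, it only looks at
the request path, and it assumes the path passed in is exactly what the
browser requested.

```python
from urllib.parse import urljoin


def relative_link(request_path, child):
    """Return a relative link to 'child' valid for this request path.

    Hypothetical helper, not part of publisher.
    """
    if request_path.endswith("/"):
        # The browser already treats the node as a directory.
        return child
    # No trailing slash: the browser will drop the last segment when
    # resolving, so put it back in front of the child name.
    last_segment = request_path.rsplit("/", 1)[-1]
    return last_segment + "/" + child


# example.com is a made-up host, used only to show the resolution.
for path in ("/directory", "/directory/"):
    link = relative_link(path, "file_1")
    print(path, link, urljoin("http://example.com" + path, link))
```

Both requests end up pointing at /directory/file_1, but only because the page
has to inspect its own URL, which is the extra work I am complaining about.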

This issue is further complicated by how publisher automatically maps a
request against a physical directory, when SetHandler is used, to the
index.py file, and by how it maps a request against a physical file, whether
SetHandler or AddHandler is used, to the index() method contained within it.

Not only does one have to deal with the fact that publisher will accept a URL
with or without a trailing slash and doesn't force a trailing slash to be
added in appropriate contexts, you end up, as explained in the referenced
email, with multiple URLs that can map to the same resource. If all these
URLs were at the same context level that wouldn't really matter, but they
aren't, and they have differing numbers of slashes within the URL, further
making the process of working with relative URLs a mess.
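
A small sketch of why the differing depth matters. The three URLs below are
assumed for illustration only, as examples of URLs that could all reach the
index() function of an index.py in a directory called /app; the exact set
depends on the configuration. The host name is also made up.

```python
from urllib.parse import urljoin

base = "http://example.com"  # hypothetical host

# Assumed example URLs that could all map to the same index() function.
candidates = ["/app/", "/app/index", "/app/index/index"]

# Resolve the same relative link against each of them.
resolved = {url: urljoin(base + url, "page") for url in candidates}

for url, target in resolved.items():
    print(url, "->", target)
```

The same relative link 'page' ends up under /app/ for the first two and under
/app/index/ for the third, even though the content that was served is
identical.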

In summary, the problems I see are publisher not doing the trailing slash
redirect where appropriate, and how the automatic mapping to the index file
and index function is done, resulting in different URLs working with
different base URL contexts.

Having said all that, what did you mean when saying 'why publisher was
designed in the first place'? I have no issue with the object traversal, just
how it deals with nodes within the hierarchy.

Graham