Graham Dumpleton
grahamd at dscpl.com.au
Sat Nov 4 00:53:19 EST 2006
On 04/11/2006, at 3:43 PM, Steve Holden wrote:

> Graham Dumpleton wrote:
>> On 04/11/2006, at 1:31 AM, Sells, Fred wrote:
>>> The following is extracted from Graham's reply with questions of
>>> my own added.
>>>
>>> Note that I use Apache 2.x and python 2.3 and mod_python 3.1.3
>>> on linux.
>>>
>>>> As long as you don't run up against the trailing slash and base
>>>> url issues.
>>>
>>> ;-)
>>> I guess I haven't hit them, not sure what they are, are they
>>> documented somewhere. Isn't there an Apache module that fixes this?
>> You can read a bit about it at:
>> http://www.modpython.org/pipermail/mod_python/2006-March/020501.html
> Wasn't this kind of stuff why publisher was designed in the first
> place? I remember having a dialog with Grisha about similar topics,
> and his reply essentially being "Most of the web stuff I do uses
> CGI" :)

Since I didn't really qualify which part of that mail message I saw
as the problem, I am not sure how your comment relates, or which bit
you are saying publisher was designed for in the first place. :-)

In terms of what I had in mind as being an issue, there are a few
parts to it.

For the first, when a request arrives against a physical directory
and there is no trailing slash on the URL, Apache (unless this is
disabled) will send a redirect back to the browser, forcing it to
send the request again but with a trailing slash added. By doing
this, it is helping the browser to understand what the effective base
URL is, such that if relative URLs are then used to address other
resources within that directory, it will be able to construct the URL
properly.

Thus, if we have a filesystem structure:

    /some/path/directory
    /some/path/directory/index.html
    /some/path/directory/file-1.html

and we pretend that the URL to access it is the same.
If a URL arrives which is:

    /some/path/directory

then Apache sends a redirect back to the browser asking it to
re-request the resource with the URL:

    /some/path/directory/

Where the DirectoryIndex directive is set and it contains
'index.html', Apache will now internally map the request, with
trailing slash, to the 'index.html' file and serve up that page as
the response content. If that page then has a relative link in it for
'file-1.html', the browser, if told to follow that link, will
request:

    /some/path/directory/file-1.html

If this trailing slash redirection for the directory wasn't done, and
the URL of:

    /some/path/directory

without the trailing slash were allowed to map to 'index.html', then
the browser when trying to follow the link would actually request:

    /some/path/file-1.html

as it wouldn't know that 'file-1.html' was actually a contained item
underneath the context of the resource, as opposed to a sibling.

Now when we look at how URLs are matched to object hierarchies in
publisher, something which, if I understand the history correctly,
follows from the bobo publisher in what became Zope, it doesn't try
to view the object hierarchy in the same way as a directory structure
and doesn't do any trailing slash redirection to establish directory
contexts when it perhaps should.

Thus if I have:

    class dummy:
        def __call__(self, req):
            return "<a href='file_1'>file_1</a>"
        def file_1(self, req):
            return "file_1"

    directory = dummy()

and I access this through publisher such that the trailing part of
the URL looks like:

    /directory

although the __call__ method acts like the index page for that level
in the hierarchy, publisher doesn't look at it that way and simply
accepts the URL without the trailing slash.
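The browser-side resolution in the Apache directory example above can be reproduced with `urljoin` from Python's standard library, which resolves a relative reference against a base URL in the way a browser does. This is only an illustrative sketch: `example.com` is a placeholder host, and nothing here is part of mod_python or Apache.

```python
from urllib.parse import urljoin

# example.com is a placeholder host, not one taken from the discussion.
BASE_WITH_SLASH = "http://example.com/some/path/directory/"
BASE_WITHOUT_SLASH = "http://example.com/some/path/directory"

# With the trailing slash, the link resolves beneath the directory.
with_slash = urljoin(BASE_WITH_SLASH, "file-1.html")

# Without it, the last path segment is dropped and the link resolves
# as a sibling of the directory.
without_slash = urljoin(BASE_WITHOUT_SLASH, "file-1.html")

print(with_slash)     # http://example.com/some/path/directory/file-1.html
print(without_slash)  # http://example.com/some/path/file-1.html
```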
Thus, if the link is now followed in that page, it will end up
mapping to:

    /file_1

and not to:

    /directory/file_1

Now although publisher doesn't force the browser to re-request with a
trailing slash for the __call__() method, it will still accept a
trailing slash on the URL, and when that is done the link in the page
does resolve to the correct resource.

Thus, everything can be made to work as long as a trailing slash is
always used on the URL whenever the target is a __call__() method.
However, if the user sees the URL and manually removes the trailing
slash himself, then the links in the page break.

So whereas one should be able to use fixed relative links in a page
generated from a __call__() method called by publisher, one is
instead forced either to make them absolute path links, or to
dynamically determine what the relative link should be by looking at
the URL and at what is determinable about where the node sits in the
object hierarchy.

This issue is further complicated by how publisher automatically maps
a request against a physical directory, when SetHandler is used, to
the index.py file, and how it maps a request against a physical file,
whether SetHandler or AddHandler is used, to the index() method
contained within it.

Not only does one have to deal with the fact that publisher will
accept a URL with or without a trailing slash and doesn't force a
trailing slash to be added in appropriate contexts, you end up, as
explained in the referenced email, with multiple URLs that can map to
the same resource. If all these URLs were at the same context level
that wouldn't really matter, but they aren't, and they have differing
numbers of slashes within the URL, further making the process of
working with relative URLs a mess.

In summary, the problems I see are publisher not doing the trailing
slash redirect where appropriate, and how the automatic mapping to
the index file and index function is done, resulting in different
URLs working with different base URL contexts.
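As a rough illustration of working out the relative link from the URL, here is a minimal sketch. The helper `child_link` is hypothetical and not part of publisher; it takes the request path as a plain string and assumes that path addresses the node whose page is being generated. `example.com` is again a placeholder host.

```python
from urllib.parse import urljoin

def child_link(request_path, child):
    """Return a relative link to `child` that resolves beneath the node
    addressed by `request_path`, with or without a trailing slash.

    Hypothetical helper for illustration only.
    """
    if request_path.endswith("/"):
        # The base URL is already the node itself.
        return child
    # No trailing slash: the browser resolves links against the parent,
    # so prefix the node's own last segment to step back into it.
    last_segment = request_path.rsplit("/", 1)[-1]
    return last_segment + "/" + child

# Both forms of the URL now lead to the same target.
link_a = urljoin("http://example.com/directory",
                 child_link("/directory", "file_1"))
link_b = urljoin("http://example.com/directory/",
                 child_link("/directory/", "file_1"))
print(link_a)  # http://example.com/directory/file_1
print(link_b)  # http://example.com/directory/file_1
```

This only covers the trailing slash case. It does not address the several index.py and index() URLs that map to the same resource, since the request path alone does not say which of those mappings was used.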
Having said all that, what did you mean when saying 'why publisher
was designed in the first place'? I have no issue with the object
traversal, just how it deals with nodes within the hierarchy.

Graham