Martijn Moeling
martijn at xs4us.nu
Mon Jan 8 10:31:13 EST 2007
Perhaps my explanation was misunderstood, so let me clarify.

If I go to http://www.lokalos.nl without a filename (DirectoryIndex is set to index.py in the Apache config), the returned filename seems to be a randomly generated one (like jkfdasadf.html).

/var/www/html/index.py is a symbolic link to ../Python_modules/generate.py

The Googlebot seems to remember the randomly generated filename, and since there is no /var/www/html/jkfdasadf.html, the request returns a 404. So Google gets a 404 when accessing the index page and stops indexing (as seen in the access_log).

I am puzzled by this. I could set req.filename to "index.py" within my application, but I feel this has something to do with the symbolic link.

Anyway, where is the jkfdasadf.html filename generated: would this be MP or Apache? Is this done before the MP handler is started, or after index.py has done its work?

As you can understand, indexing by search engines is important for a lot of websites.

Martijn

-----Original message-----
From: Graham Dumpleton [mailto:grahamd at dscpl.com.au]
Sent: Sunday, January 07, 2007 1:32 AM
To: Martijn Moeling
CC: mod_python at modpython.org
Subject: Re: [mod_python] Returned filename question (updated)

On 07/01/2007, at 12:32 AM, Martijn Moeling wrote:
> Hi,
>
> I was just checking my Apache access_log to see if and how the spidering
> by Google was going, and I found some strange behavior I cannot explain:
>
> cat /var/log/httpd/access_log |grep Googlebot
>
> All the lines I see are like:
>
> 66.249.65.15 - - [01/Jan/2007:15:43:54 +0100] "GET /zsswsirofodgrdu.html HTTP/1.1" 404 298 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
>
> Please note the: GET /zsswsirofodgrdu.html
>
> I am seeing this since I upgraded to MP 3.3.0b. I have searched the docs,
> but for some reason I cannot find out where Google gets the filename from...
>
> It seems like MP is returning a random filename now.

Not likely to be anything to do with mod_python.

First off, it is Apache which logs to the access_log file and not mod_python. Apache will use whatever is set in req.filename at the end of the request. The value of req.filename isn't changed by mod_python, although a user-level mod_python handler could change it. That the HTTP status code is 404, though, suggests that it isn't even getting to a response handler, and so mod_python isn't likely being invoked.

It simply looks like that is what Google is actually asking for.

Graham

> The requests sent are always http://www2.lokalos.nl or
> http://www2.lokalos.nl/?pr=drenthe
>
> Should I include a filename in my URLs? If I do:
> http://www2.lokalos.nl/index.py?pr=drenthe
> Or is there any way I can force MP/Apache to return the filename index.py
> and force it to overwrite the automatically returned filename (like
> fhgdjhlkgjhfhg.html)?
>
> Martijn
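
To check what Googlebot is actually requesting, the grep on the access_log can be extended with a short script that pulls the requested path and status code out of each line. This is only a minimal sketch, assuming the Apache "combined" log format seen in the sample line from this thread; the regular expression and the function name `parse_googlebot_line` are made up for illustration and are not part of Apache or mod_python.

```python
import re

# Pattern for Apache "combined" log lines (format assumed from the sample
# line quoted in this thread; adjust if your LogFormat differs).
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_googlebot_line(line):
    """Return (path, status) for a Googlebot request, or None otherwise.

    Hypothetical helper: returns None for lines that do not match the
    assumed format or whose user agent does not mention Googlebot.
    """
    m = COMBINED.match(line)
    if m is None or "Googlebot" not in m.group("agent"):
        return None
    return m.group("path"), int(m.group("status"))


# The log line from the original message, rejoined onto one line.
sample = (
    '66.249.65.15 - - [01/Jan/2007:15:43:54 +0100] '
    '"GET /zsswsirofodgrdu.html HTTP/1.1" 404 298 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

print(parse_googlebot_line(sample))  # ('/zsswsirofodgrdu.html', 404)
```

Running this over every line of the log and collecting the paths would show whether the random names appear in the request line itself, which would fit the reading that this is simply what Google is asking for.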