[mod_python] Returned filename question (updated)

Martijn Moeling martijn at xs4us.nu
Mon Jan 8 10:31:13 EST 2007


Perhaps my explanation was misunderstood; let me clarify.

If I go to http://www.lokalos.nl without giving a filename (DirectoryIndex
is set to index.py in the Apache config), the returned filename seems to be
a randomly generated one (like jkfdasadf.html).

/var/www/html/index.py is a symbolic link to
../Python_modules/generate.py

The Googlebot seems to remember the randomly generated filename, and
since there is no /var/www/html/jkfdasadf.html, the request returns a 404.

So Google gets a 404 when accessing the index page and stops indexing
(as seen in the access_log).
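A minimal sketch of how the Googlebot 404s could be pulled out of the access_log, assuming the log uses Apache's "combined" format (as the quoted log line further down appears to). The names COMBINED, googlebot_404s and sample are made up for illustration:

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "method path protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines):
    """Yield (time, path) for each Googlebot request that ended in a 404."""
    for line in lines:
        m = COMBINED.match(line)
        if m and m.group("status") == "404" and "Googlebot" in m.group("agent"):
            yield m.group("time"), m.group("path")


# The log line from this thread, joined back onto a single line.
sample = ('66.249.65.15 - - [01/Jan/2007:15:43:54 +0100] '
          '"GET /zsswsirofodgrdu.html HTTP/1.1" 404 298 "-" '
          '"Mozilla/5.0 (compatible; Googlebot/2.1; '
          '+http://www.google.com/bot.html)"')

for when, path in googlebot_404s([sample]):
    print(when, path)
```

To run it against the real log, pass an open file instead of the sample list, e.g. googlebot_404s(open("/var/log/httpd/access_log")). The path it prints is the one taken from the request line the client sent.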

I am puzzled by this.

I could set req.filename to "index.py" within my application, but I feel
this has something to do with the symbolic link. In any case, where is the
jkfdasadf.html filename generated: in MP or in Apache? Is this
done before the MP handler is started, or after index.py has done
its work?

As you can understand, indexing by search engines is important for a
lot of websites.

Martijn

-----Original Message-----
From: Graham Dumpleton [mailto:grahamd at dscpl.com.au]
Sent: Sunday, January 07, 2007 1:32 AM
To: Martijn Moeling
CC: mod_python at modpython.org
Subject: Re: [mod_python] Returned filename question (updated)


On 07/01/2007, at 12:32 AM, Martijn Moeling wrote:

> Hi,
>
> I was just checking my Apache access_log to see if and how the spidering
> by Google was going, and I found some strange behavior I cannot explain:
>
> cat /var/log/httpd/access_log |grep Googlebot
>
> all the lines I see are like:
>
> 66.249.65.15 - - [01/Jan/2007:15:43:54 +0100] "GET /zsswsirofodgrdu.html
> HTTP/1.1" 404 298 "-" "Mozilla/5.0 (compatible; Googlebot/2.1;
> +http://www.google.com/bot.html)"
>
> Please note the: GET /zsswsirofodgrdu.html
>
> I have been seeing this since I upgraded to MP 3.3.0b. I have searched the
> docs, but for some reason I cannot find out where Google gets the filename
> from.
>
> It seems like MP is returning a random filename now.

Not likely to be anything to do with mod_python.

First off, it is Apache which logs to the access_log file, not mod_python.
Apache will use whatever is set in req.filename at the end of the request.
The value of req.filename isn't changed by mod_python, although a
user-level mod_python handler could change it.
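One way to check this would be a small diagnostic handler that writes both the requested URI and the mapped filename to the Apache error log. This is only a sketch, assuming it is saved as a module (here hypothetically called logfilename.py) and enabled with "PythonFixupHandler logfilename" for the directory in question. The try/except fallback is there only so the file can be imported outside Apache:

```python
try:
    # Only importable when running inside Apache with mod_python loaded.
    from mod_python import apache
    OK = apache.OK
except ImportError:
    # Fallback value so the module can be imported and tried outside Apache.
    OK = 0


def fixuphandler(req):
    """Log what the client asked for and what Apache mapped it to."""
    # req.uri is the path part of the requested URL;
    # req.filename is the file on disk that Apache translated it to.
    req.log_error("uri=%r filename=%r" % (req.uri, req.filename))
    # Returning OK lets the request carry on without changing anything.
    return OK
```

If the odd .html name already shows up in req.uri, it came from the client and was not produced on the server side. The handler only reads the two attributes and does not modify the request.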

That the HTTP status code is 404, though, suggests that it isn't even getting
to a response handler, and so mod_python likely isn't being invoked. It simply
looks like that is what Google is actually asking for.

Graham

> The requests sent are always http://www2.lokalos.nl or
> http://www2.lokalos.nl/?pr=drenthe
>
> Should I include a filename in my URLs? For example:
> http://www2.lokalos.nl/index.py?pr=drenthe
> Or is there any way I can force MP/Apache to return the filename index.py
> and override the automatically returned filename (like
> fhgdjhlkgjhfhg.html)?
>
> Martijn


