Emlyn Jones
emlynj at gmail.com
Wed Jan 11 06:44:22 EST 2006
You're right! I thought I had solved that one by explicitly adding a User-Agent header. It turns out the urllib2 in Python 2.2 adds its own anyway (mod_python used 2.2, my shell gives me 2.4). I've upgraded mod_python to Python 2.4 and it works.

Now I get a seg fault from xml.dom.minidom.parseString, which again only happens via mod_python and not from the shell (although I guess that could just be a fluke), grrr. More googling I guess!

Thanks for your help.
Emlyn.

On 1/10/06, Colin Bean <ccbean at gmail.com> wrote:
> It looks like the 403 is being returned by Google, rather than by your copy
> of Apache. I believe that Google has some rules designed to block web
> crawlers / scrapers under some circumstances. One possibility would be that
> running your script under Apache changes something in the request headers
> that makes Google block the request. That's just a guess, though (and my
> knowledge of Google's behavior is based on a project I did a couple of years
> ago, so take it with a grain of salt). Have you tried this code on some
> other URLs? What kind of results do you get then?
>
> hth,
> -Colin
>
> On 1/10/06, Emlyn Jones <emlynj at gmail.com> wrote:
> > Hello,
> > I'm not convinced that this is a specific mod_python problem, but I'm not
> > sure exactly how to explain it in generic terms, so hopefully someone here
> > can point me in the right direction.
> > I have a Python script which calls urllib2.urlopen to open a URL on a
> > remote server (Google). It works fine from the command line, but when I run
> > it from a mod_python.psp page I get:
> >
> > HTTPError: HTTP Error 403: Forbidden
> >
> > Clearly this is something to do with the permissions of the Apache user
> > vs. my shell user, but what is the safest way to allow this script to run?
> >
> > Regards,
> > Emlyn.
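For anyone hitting the same 403, here is a minimal sketch of the two usual ways to set a User-Agent with urllib2 (urllib.request on Python 3). It is not the script from this thread: the URL, the agent string, and the helper names build_request and build_opener_with_agent are all hypothetical, and nothing is sent over the network unless you uncomment the last line. How an old interpreter such as Python 2.2 merges the opener's default header with one set on the request may differ, so it is worth checking which interpreter mod_python is actually linked against.

```python
try:
    import urllib2 as urlrequest          # Python 2.x
except ImportError:
    import urllib.request as urlrequest   # Python 3.x

# Hypothetical values, chosen only for illustration.
URL = "http://www.google.com/search?q=mod_python"
USER_AGENT = "Mozilla/5.0 (compatible; example-script)"


def build_request(url, user_agent):
    """Attach the User-Agent to a single Request object."""
    req = urlrequest.Request(url)
    req.add_header("User-Agent", user_agent)
    return req


def build_opener_with_agent(user_agent):
    """Replace the opener's default headers (normally Python-urllib/x.y)."""
    opener = urlrequest.build_opener()
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


if __name__ == "__main__":
    req = build_request(URL, USER_AGENT)
    # Header names are stored capitalized, e.g. "User-agent".
    print(req.header_items())

    opener = build_opener_with_agent(USER_AGENT)
    print(opener.addheaders)
    # To actually send the request (not done here):
    # response = opener.open(URL)
```

Replacing opener.addheaders removes the library's default agent at the source, which sidesteps the question of whether a per-request header and the default both end up in the outgoing request.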