Bart
scarfboy at gmail.com
Wed Sep 13 20:01:12 EDT 2006
Hi all,

Apologies if this has been discussed before. (The mod_python archives are a bit hard to search decisively.)

I wonder whether req.write() really has to bork on unicode strings. We have lovely no-worries pythonic unicode handling in python, but in m_p you have to either .encode('utf8') on every single req.write, or, less redundantly, use code perhaps like:

    from mod_python import apache

    def handler(req):
        ret = []
        # code
        ret.append('I am text')
        ret.append(u'I am \u2222 other text')
        # more code
        req.write(''.join(ret).encode('utf8'))
        return apache.OK

(...which is of course sort of annoying if you do occasionally want the write-and-flush ability, e.g. when the process is slow and you want immediate feedback.)

(I believe this currently fails because unicode.__str__ uses the site.py encoding, which you can't always set.)

Wouldn't it be relatively trivial to make the write function always encode (unicode) strings according to a configured encoding? (With a sane default like utf8, settable via code and also via PythonOptions in the apache config, so that there won't be a you-need-to-set-this-in-every-file requirement.)

Of course, this isn't a problem with one true solution(tm), because you still need to set content_type (perhaps now with a default like "text/html; charset=utf-8") or something similar for it to work, but the suggestion still seems like handier default behaviour.

This probably deserves some documentation attention anyway, partly because the more intuitive-from-python "charset=utf8" seems to be wrong (it should be utf-8).

Poker of suggestions,
--Bart
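
P.S. To make the suggestion a bit more concrete, here is a rough sketch of the behaviour I mean, done as a wrapper in user code rather than inside mod_python. EncodingWriter is a made-up name, not part of mod_python; the only thing it assumes is a request object with a write(string, flush) method. It does not touch content_type, so that still has to be set separately.

```python
class EncodingWriter(object):
    """Hypothetical helper: encodes text before passing it to req.write()."""

    def __init__(self, req, encoding="utf-8"):
        self.req = req
        self.encoding = encoding  # assumed default; could come from config

    def write(self, data, flush=1):
        # Byte strings pass through untouched; text strings get encoded.
        if not isinstance(data, bytes):
            data = data.encode(self.encoding)
        self.req.write(data, flush)
```

In a handler you would wrap the request once (out = EncodingWriter(req)) and then call out.write(u'I am \u2222 other text') as often as you like, which keeps the write-and-flush behaviour without an .encode() on every call.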