Bart
scarfboy at gmail.com
Wed Sep 13 20:01:12 EDT 2006
Hi all,
Apologies if this has been discussed before.
(the mod_python archives are a bit hard to search decisively)
I wonder whether req.write() really has to bork on unicode strings.
We have lovely no-worries pythonic unicode handling in python,
but in mod_python you have to either call .encode('utf8') on every single
req.write, or, less redundantly, use code perhaps like:
    from mod_python import apache

    def handler(req):
        ret = []
        # code
        ret.append('I am text')
        ret.append(u'I am \u2222 other text')
        # more code
        # join everything first, then encode once for the whole response
        req.write(''.join(ret).encode('utf8'))
        return apache.OK
(...which is of course sort of annoying if you do occasionally
want the write-and-flush ability, e.g. when the process is slow
and you want immediate feedback)
(I believe this currently fails because the implicit unicode-to-str
conversion, unicode.__str__, uses the default encoding configured in
site.py, which you can't always set)
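The failure is easy to reproduce outside Apache. A minimal sketch, assuming the default encoding is ASCII (the stock setting), so that the implicit conversion behaves roughly like an explicit .encode('ascii'):

```python
# Sketch: mimic the implicit unicode -> byte string conversion,
# assuming the interpreter's default encoding is ASCII.
text = u'I am \u2222 other text'

try:
    text.encode('ascii')
    failed = False
except UnicodeEncodeError:
    # U+2222 has no ASCII representation, so the conversion fails.
    failed = True

# An encoding chosen explicitly by the caller handles the same string.
encoded = text.encode('utf-8')
```

The names text, failed and encoded are only for illustration.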
Wouldn't it be relatively trivial to make the write function always
encode (unicode) strings according to a configured encoding?
(with a sane default like utf8, and settable via code, and
also PythonOptions apache config so that there won't be a
you-need-to-set-this-in-every-file requirement)
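To make the suggestion concrete, here is a rough sketch of what such a write could do, written as a wrapper rather than a change to mod_python itself. The helper write_text, the option name write_encoding and the FakeRequest stand-in are all made up for illustration; in a real handler the options would presumably come from req.get_options(), which holds the PythonOption values.

```python
DEFAULT_ENCODING = 'utf-8'  # assumed "sane default"


def write_text(req, data, options=None, flush=1):
    """Hypothetical wrapper: encode text before handing it to req.write()."""
    # 'write_encoding' is an invented option name, not a mod_python setting.
    encoding = (options or {}).get('write_encoding', DEFAULT_ENCODING)
    if not isinstance(data, bytes):
        # Text (unicode) strings are encoded; byte strings pass through as-is.
        data = data.encode(encoding)
    req.write(data, flush)


class FakeRequest(object):
    """Stand-in for the request object so the sketch runs without Apache."""

    def __init__(self):
        self.chunks = []

    def write(self, data, flush=1):
        self.chunks.append(data)


req = FakeRequest()
write_text(req, 'I am text')
write_text(req, u'I am \u2222 other text')
write_text(req, u'caf\u00e9', options={'write_encoding': 'latin-1'})
```

Because each call encodes its own chunk, the write-and-flush behaviour stays available; nothing has to be collected in a list first.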
Of course, this isn't a problem with one true solution(tm),
because you still need to set content_type (now perhaps
with a default like "text/html; charset=utf-8") or do something
similar to make it work, but the suggestion still seems like
handier default behaviour.
This probably deserves some documentation attention anyway,
partly because the more intuitive-from-python "charset=utf8"
seems to be wrong (it should be "utf-8").
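One way to avoid the spelling trap would be to build the header value from the Python encoding name through a small lookup. This is only a sketch: the table CHARSET_LABELS and the helper content_type_for are invented here, and the table covers just a few spellings.

```python
# Hypothetical alias table: Python codec spellings -> charset labels
# as they should appear in the Content-Type header.
CHARSET_LABELS = {
    'utf8': 'utf-8',
    'utf_8': 'utf-8',
    'latin1': 'iso-8859-1',
}


def content_type_for(encoding, mime='text/html'):
    """Return a Content-Type value with a normalised charset label."""
    key = encoding.lower()
    # Unknown names are passed through unchanged (lower-cased).
    label = CHARSET_LABELS.get(key, key)
    return '%s; charset=%s' % (mime, label)
```

In a handler the result would be assigned to req.content_type before the first write, e.g. req.content_type = content_type_for('utf8').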
Poker of suggestions,
--Bart