[mod_python] Unicode Oddity

Deron Meranda deron.meranda at gmail.com
Wed Jun 21 10:42:19 EDT 2006


On 6/21/06, Robert Synnott <rsynnott at gmail.com> wrote:
> I'm having a bit of trouble with Unicode. On a linux system, python 2.4.2,
> mod_python 3.2.8, inputting and outputting unicode strings to and from a
> mysql database works perfectly, with no particular extra work except
> use_unicode=True in the mysqldb parameters. Also, unicode values from a .py
> file can be outputted happily.
>
> However, when I try the same software on a MacOS X machine, same versions of
> python and mod_python, I get 'UnicodeEncodeError: 'ascii' codec can't encode
> characters in...'
>
> Is there anything special I should be doing to handle unicode? Any idea why
> I'm seeing this error?

When you say "output", what you mean is that you're writing a
sequence of bytes to the HTTP output stream.  You can't actually
write characters to it.  So when you have a Python Unicode string,
it has to be converted to a byte string before it can be output.  (This
is actually one of the type warts in Python, and the python-dev list is
theorizing about how to fix it, maybe by unifying 'str' and 'unicode'
into a single character-string type and introducing a real byte-string
type.  But that's speculation and doesn't help you now.)
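A quick way to see the byte-versus-character distinction, and to reproduce the same error outside mod_python, is to encode one Unicode string with the 'ascii' codec and then with UTF-8.  This is only a sketch: the sample string is illustrative, and it assumes Python 2.6 or later (for "except ... as"), which also lets it run on Python 3.

```python
# Assumes Python 2.6+ or 3.x (the "except ... as" syntax).
text = u'abc\u2012xyz'  # U+2012 FIGURE DASH is outside the ASCII range

# Encoding with the 'ascii' codec fails the same way the implicit
# conversion does when Python's default encoding is 'ascii'.
ascii_error = None
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    ascii_error = str(exc)  # keep the message for inspection
print('ascii: %s' % ascii_error)

# Encoding explicitly with a codec that can represent U+2012 succeeds
# and yields a byte string that can actually be written to the stream.
utf8_bytes = text.encode('utf-8')
print('utf-8: %r' % (utf8_bytes,))
```

The first print shows the same "'ascii' codec can't encode character" message reported above; the second shows the bytes that would really go over the wire.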

Your problem is probably due to the "default" encoding of your
Python installation, which is what Python uses to convert Unicode
strings to byte strings if you don't explicitly do it yourself.  It is
set by the setencoding() function in the site.py file, which calls
sys.setdefaultencoding() when a default other than 'ascii' has been
configured.  On the Mac it's probably defaulting to 'ascii', which is
the Python factory-default encoding.
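To compare the two machines, a small diagnostic along these lines could be run with the same interpreter mod_python uses on each one.  It is a sketch; the function name describe_default_encoding is hypothetical, and only the standard sys.getdefaultencoding() call and a hasattr() check are used.

```python
import sys

def describe_default_encoding():
    """Return (default encoding, whether sys.setdefaultencoding is still present)."""
    # The codec Python uses for implicit unicode -> byte-string conversion.
    encoding = sys.getdefaultencoding()
    # site.py normally deletes sys.setdefaultencoding during startup,
    # so this is usually False by the time application code runs.
    setter_available = hasattr(sys, 'setdefaultencoding')
    return encoding, setter_available

encoding, setter_available = describe_default_encoding()
print('default encoding: %s' % encoding)
print('sys.setdefaultencoding available: %s' % setter_available)
```

If the Linux box prints something other than 'ascii' and the Mac prints 'ascii', that would account for the difference in behaviour.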

The only fixes are to change site.py (you can't just call those
functions afterwards, since site.py deletes sys.setdefaultencoding()
as soon as Python finishes initializing), or to encode your Unicode
strings to byte strings explicitly yourself, such as:

   encoded = u'abc\u2012xyz'.encode('utf-8')   # byte string in UTF-8
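In a mod_python handler the explicit route usually means encoding each Unicode value just before it is written out.  Below is a minimal sketch of such a helper; the name to_output_bytes is hypothetical, and UTF-8 is an assumed choice of output encoding.  It uses type(u'') so it runs unchanged on Python 2.4 and on Python 3.3+.

```python
# type(u'') is 'unicode' on Python 2 and 'str' on Python 3.
TEXT_TYPE = type(u'')

def to_output_bytes(value, encoding='utf-8'):
    """Return value as a byte string suitable for the HTTP output stream.

    Unicode text is encoded explicitly with the given codec, so Python's
    default encoding is never consulted.  Any other value (for example a
    Python 2 str) is returned unchanged.
    """
    if isinstance(value, TEXT_TYPE):
        return value.encode(encoding)
    return value

body = to_output_bytes(u'abc\u2012xyz')
```

Inside a handler this would be used as req.write(to_output_bytes(text)).  Encoding at the output boundary lets the rest of the code keep working with Unicode; whichever encoding is chosen should match the charset sent in the Content-Type header.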

Also search this list's archives; there have been several Unicode
discussions on it.
-- 
Deron Meranda

