Deron Meranda
deron.meranda at gmail.com
Wed Jun 21 10:42:19 EDT 2006
On 6/21/06, Robert Synnott <rsynnott at gmail.com> wrote:
> I'm having a bit of trouble with Unicode. On a linux system, python 2.4.2,
> mod_python 3.2.8, inputting and outputting unicode strings to and from a
> mysql database works perfectly, with no particular extra work except
> use_unicode=True in the mysqldb parameters. Also, unicode values from a .py
> file can be outputted happily.
>
> However, when I try the same software on a MacOS X machine, same versions of
> python and mod_python, I get 'UnicodeEncodeError: 'ascii' codec can't encode
> characters in...'
>
> Is there anything special I should be doing to handle unicode? Any idea why
> I'm seeing this error?

When you say "output", what you mean is that you're writing a
sequence of bytes to the HTTP output stream. You can't actually
write characters to it. So when you have a Python Unicode string,
it must be converted to a byte string before it can be output.

(This is actually one of the type warts in Python, and the python-dev
list is theorizing about how to fix it, maybe by unifying 'str' and
'unicode' into just a single character string type, and introducing
a real byte-string type...but that's speculation and doesn't help you
now.)

Your problem is probably due to the "default" encoding of your
Python installation, which is what is used to convert Unicode
strings to byte strings if you don't explicitly do it yourself.
In particular, it is set by the setencoding() function in the site.py
file, which in turn calls sys.setdefaultencoding(). It's probably
defaulting to 'ascii', which is the Python factory-default encoding.
That would also explain why the same code behaves differently on
your two machines: their site.py settings may differ.

One fix is to change site.py. (You can't just call
sys.setdefaultencoding() yourself afterwards, since site.py removes
it from the sys module right after Python initialization.)

Alternatively, you can convert strings explicitly, naming the codec
yourself, such as:

    bytes = u'abc\u2012xyz'.encode('utf8')

Search this list; there have been several Unicode discussions on it.
--
Deron Meranda
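To illustrate the explicit conversion described above, here is a minimal sketch. It assumes UTF-8 is the encoding you want on the wire, and the to_bytes() helper is a hypothetical name made up for this example, not part of Python or mod_python. It also shows that encoding the same string with the 'ascii' codec, which is what an implicit conversion under the factory default amounts to, raises the error from the original report.

```python
# A Unicode string containing U+2012, which has no ASCII equivalent.
text = u'abc\u2012xyz'

# Explicit conversion: the result does not depend on the interpreter's
# default encoding, so it behaves the same on every machine.
data = text.encode('utf-8')

# Encoding with the 'ascii' codec fails for non-ASCII characters.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True


def to_bytes(value, encoding='utf-8'):
    """Hypothetical helper: return a byte string ready for output.

    Unicode strings are encoded explicitly; anything else is assumed
    to be a byte string already and is returned unchanged.
    """
    if isinstance(value, type(u'')):
        return value.encode(encoding)
    return value
```

Calling such a helper at the point where strings are written out keeps the choice of encoding in one place, rather than relying on whatever default each installation happens to have.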