Deron Meranda
deron.meranda at gmail.com
Thu Feb 16 10:56:50 EST 2006
On 2/14/06, Gustavo Córdova Avila <gustavo.cordova at q-voz.com> wrote:
> ... Python is using
> ASCII encoding because it's the default encoding, you can check what your
> default encoding is with:
>
> >>> import sys
> >>> sys.getdefaultencoding()
> 'iso-8859-1'
>
> You see here I'm using LATIN1, because it gives me accents and "ñ", and
> other nice chars for spanish. But, there's no _sys.setdefaultencoding()_
> function available because it's explicitly deleted in
> /usr/lib/pythonXX/site.py:
>
> (around line 394)
> ...
> if hasattr(sys, "setdefaultencoding"):
>     del sys.setdefaultencoding
>
> I really don't know the rationale for doing that, maybe to stop different
> modules from stomping on the default encoding, whatever.
>
> So, if you can configure your default encoding to whatever you need ("UTF-8"
> is pretty nice), unicode objects can be correctly serialized to strings
> without any problems:

I have already been trying to figure out a way around this. It
appears to me to be a poorly planned feature in Python itself.

The reason for this deletion (from what I can tell) makes sense in
most cases--it's to prevent apps from pulling dangerous tricks, such
as changing the encoding. Imagine a Python module which sets the
encoding internally because it is easier than calling explicit
encoding functions. Since the default encoding is interpreter-wide
(much like a global variable), it could break the rest of the
application. So Python removes the function to discourage sloppy
shortcuts.

However, there's no way to specify the default encoding when the
interpreter is started (either via a python command-line option or
the C API used to start the interpreter). And in some applications,
especially ones like mod_python, retaining the ability to change
encodings actually makes good sense. Obviously it's best if str()
and such in a mod_python-based application use a default encoding
which is the same as the request's content-type charset.
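To illustrate what "the same as the request's content-type charset" would mean in practice, here is a rough sketch that does the encoding explicitly instead of relying on the interpreter-wide default. The helper names (charset_from_content_type, encode_for_response) are made up for this example and are not part of Python or mod_python; the Content-Type value is passed in as a plain string, so how you obtain it from your request object is left out, and the "utf-8" fallback is just an assumption.

```python
# The text type is called unicode on Python 2.x and str on 3.x.
try:
    text_type = unicode
except NameError:
    text_type = str


def charset_from_content_type(content_type, fallback="utf-8"):
    """Return the charset parameter of a Content-Type value, or fallback."""
    # Everything after the first ";" is a list of name=value parameters.
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.strip().lower() == "charset" and value:
            return value.strip().strip('"').lower()
    return fallback


def encode_for_response(text, content_type):
    """Encode text with the charset named in content_type.

    Byte strings are returned unchanged; only unicode text is encoded.
    """
    if isinstance(text, text_type):
        return text.encode(charset_from_content_type(content_type))
    return text
```

Used this way, each response picks its own encoding and the interpreter-wide default is never touched, so one request cannot affect another.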
Since Python's site.py deletes the sys.setdefaultencoding() function,
there's no easy way to alter the encoding on a request-by-request
basis. The next best thing would be to specify an encoding on a
per-interpreter basis, perhaps with a directive such as

    PythonOption DefaultEncoding utf-8

but again, there's no way to do this in a stock Python distribution.
I think this is something that should be discussed within the Python
dev group.

In the meantime, for people that use Unicode, you have a couple of
options:

1. Alter site.py and change your default encoding (to, say, "utf-8"),
   as well as ensuring your outgoing content-type has the same charset
   parameter.

2. Do explicit encoding. You can even create your own helper
   function like,

    def ustr(s):
        # Turn non-string objects into a string first
        if not isinstance(s, basestring):
            s = str(s)
        # Encode unicode objects explicitly instead of using the default
        if isinstance(s, unicode):
            s = s.encode('utf-8')
        return s

   And then use ustr(...) in place of str(...)
--
Deron Meranda