[mod_python] Problem with PSP and unicode

Thu Feb 16 10:56:50 EST 2006

On 2/14/06, Gustavo Córdova Avila <gustavo.cordova at q-voz.com> wrote:
> ...  Python is using
> ASCII encoding because it's the default encoding, you can check what your
> default encoding is with:
>
>     >>> import sys
>     >>> sys.getdefaultencoding()
>     'iso-8859-1'
>
> You see here I'm using LATIN1, because it gives me accents and "ñ", and
> other nice chars for Spanish.  But there's no _sys.setdefaultencoding()_
> function available, because it's explicitly deleted in
> /usr/lib/pythonXX/site.py:
>
>     (around line 394)
>     ...
>     if hasattr(sys, "setdefaultencoding"):
>        del sys.setdefaultencoding
>
> I really don't know the rationale for doing that, maybe to stop different
> modules from stomping on the default encoding, whatever.
>
> So, if you can configure your default encoding to whatever you need ("UTF-8"
> is pretty nice), unicode objects can be correctly serialized to strings
> without any problems:

I have already been trying to figure out a way around this.  It appears
to me to be a poorly planned feature in Python itself.  The reason for this
deletion (from what I can tell) makes sense in most cases: it's to prevent
apps from pulling dangerous tricks, such as changing the encoding.
Imagine a Python module which sets the default encoding internally
because that is easier than calling explicit encoding functions.  Since
the default encoding is interpreter-wide (much like a global variable),
it could break the rest of the application.  So Python removes the function
to discourage sloppy shortcuts.

However, there's no way to specify the default encoding when the
interpreter is started (either via a python command-line option or
the C API for starting an interpreter).  And in some applications,
mod_python especially, retaining the ability to change encodings
actually makes good sense.

Obviously it's best if str() and the like in a mod_python-based
application use a default encoding that matches the request's
content-type charset.
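
To make "matches the request's content-type charset" concrete, here is a
minimal sketch that pulls the charset parameter out of a Content-Type value
and encodes with it explicitly.  The helper names (charset_of, encode_for)
and the utf-8 fallback are assumptions made for illustration; nothing here
is part of mod_python.

```python
def charset_of(content_type, fallback='utf-8'):
    """Return the charset parameter of a Content-Type value.

    Hypothetical helper; falls back to `fallback` when no charset is given.
    """
    # Everything after the first ';' is a list of name=value parameters.
    for param in content_type.split(';')[1:]:
        if '=' not in param:
            continue
        name, value = param.split('=', 1)
        if name.strip().lower() == 'charset':
            return value.strip().strip('"') or fallback
    return fallback


def encode_for(text, content_type):
    """Encode a unicode string using the charset named in content_type."""
    return text.encode(charset_of(content_type))
```

With this, encode_for(u'\xf1', 'text/html; charset=iso-8859-1') produces the
single Latin-1 byte for "ñ", while a utf-8 content type produces two bytes.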

Since Python's site.py deletes the sys.setdefaultencoding() function,
there's no easy way to alter the encoding on a request-by-request
basis.  The next best thing would be to specify an encoding on a
per-interpreter basis, perhaps with a directive such as

   PythonOption DefaultEncoding utf-8

but again, there's no way to do this in a stock Python distribution.
I think this is something that should be discussed within the
Python dev group.

In the meantime, people who use Unicode have a couple of
options:

 1. Alter site.py and change your default encoding (to, say, "utf-8"),
     and ensure your outgoing content-type has the same
     charset parameter.

 2. Do explicit encoding.  You can even create your own helper
     function, such as:

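   def ustr(s):
       # Convert non-string objects to a string first.
       if not isinstance(s, basestring):
           s = str(s)
       # Encode unicode objects explicitly rather than via the default.
       if isinstance(s, unicode):
           s = s.encode('utf-8')
       return s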

     Then use ustr(...) in place of str(...).
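
A variant of that helper which takes the target charset as a parameter, so
it can follow the outgoing content-type instead of being fixed to utf-8.
This is only a sketch: the encoding parameter and the small shim at the top
(which lets it also run on Python versions where unicode and basestring do
not exist) are assumptions added for illustration.

```python
try:
    text_type = unicode            # Python 2 unicode type
    string_types = basestring
except NameError:
    text_type = str                # later Pythons: str is already Unicode
    string_types = (str, bytes)


def ustr(s, encoding='utf-8'):
    """Return s as an encoded byte string, never using the default encoding."""
    # Non-string objects are converted with str() first.
    if not isinstance(s, string_types):
        s = str(s)
    # Unicode text is encoded explicitly with the requested charset.
    if isinstance(s, text_type):
        s = s.encode(encoding)
    return s
```

For example, ustr(u'ni\xf1o') gives the utf-8 bytes for "niño", and
ustr(u'ni\xf1o', 'iso-8859-1') gives the Latin-1 bytes; byte strings pass
through unchanged.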

--
Deron Meranda