[mod_python] Problem with PSP and unicode

Gustavo Córdova Avila gustavo.cordova at q-voz.com
Tue Feb 14 14:51:20 EST 2006


Dan Eloff wrote:

> That's the sticky part, currently this doesn't work in python. Quoting 
> one of the PEPs:
>
>    One notable difficulty arises when code requires a string
>    representation of an object; an operation traditionally
>    accomplished by using the str() built-in function.
>
>    Using the current str() function makes the code not Unicode-safe.
>    Replacing a str() call with a unicode() call makes the code not
>    str-stable.  Changing str() so that it could return unicode
>    instances would solve this problem.
>
> That's all well and good, but until/if that is ever adopted, we'll 
> have to find a way around it. Probably your best bet for backwards 
> compatibility and unicode support is to have two kinds of psp pages, 
> one for plain ascii, one for unicode. The unicode one uses unicode() 
> and the ascii one stays as is. You would then have to decide when you 
> parse the psp page what the encoding is and produce the appropriate 
> psp object.
>
> -Dan
>
>
> On 2/14/06, Gregory (Grisha) Trubetskoy <grisha at modpython.org> wrote:
>
>
>     I'm a bit unicode-ignorant - what should PSP do? The idea was that a
>     variable referred to in a PSP page would be an object that could
>     stringify itself by implementing a __str__(), but obviously this
>     doesn't work with unicode at all. But I'm not sure how
>     self-representation works in the unicode world...
>
>     Grisha
>
>     On Mon, 13 Feb 2006, Dan Eloff wrote:
>
>     > Actually I was just about to post a question about this. The psp
>     > generated code surrounds everything with str() before writing it,
>     > so it doesn't work with unicode at all.
>     >
>     > -Dan
>
>
Actually, it *can* work perfectly with unicode, if you know what's going on.
The error message that started all this discussion said that the "ASCII"
encoding can't represent the unicode object in question.  Python is using
ASCII because it's the default encoding; you can check what your
default encoding is with:

    >>> import sys
    >>> sys.getdefaultencoding()
    'iso-8859-1'

You see here I'm using LATIN1, because it gives me accents and "ñ", and
other nice chars for Spanish.  But you can't just change it at runtime:
there's no _sys.setdefaultencoding()_ function available, because it's
explicitly deleted in /usr/lib/pythonXX/site.py:

    (around line 394)
    ...
    if hasattr(sys, "setdefaultencoding"):
       del sys.setdefaultencoding

I really don't know the rationale for doing that, maybe to stop different
modules from stomping on the default encoding, whatever.
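
One place where the function is still reachable is a sitecustomize.py
module: site.py imports it before the deletion shown above.  What follows
is only a minimal sketch, assuming Python 2 and that you control the
interpreter's site-packages; the WANTED_ENCODING name and the "utf-8"
value are illustrative choices, not something taken from this thread.

```python
# sitecustomize.py: sketch only; must be importable at interpreter startup.
import sys

# "utf-8" is an assumed, illustrative choice of default encoding.
WANTED_ENCODING = "utf-8"

# site.py imports sitecustomize before it deletes sys.setdefaultencoding,
# so on Python 2 the function still exists at this point.  The hasattr()
# guard makes this file a no-op wherever the function is absent.
if hasattr(sys, "setdefaultencoding"):
    sys.setdefaultencoding(WANTED_ENCODING)
```

Keep in mind this changes the default for everything running in that
interpreter, which under mod_python may mean every application on the
server.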

So, if you can configure your default encoding to whatever you need ("UTF-8"
is pretty nice), unicode objects can be correctly serialized to strings
without any problems:

    >>> import sys
    >>> sys.getdefaultencoding()
    'iso-8859-1'
    >>> name = u"Gustavo Córdova Avila"
    >>> str(name)
    'Gustavo C\xf3rdova Avila'

See, no errors.
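
If changing the interpreter-wide default isn't an option, the same result
can be had by encoding explicitly instead of calling str().  The sketch
below is not how PSP does it; to_bytes() is a hypothetical helper made up
for illustration, and it assumes a Python that has the bytes name (2.6 or
later).  It also shows the "ascii" failure that started the thread.

```python
# Sketch: encode explicitly instead of relying on the default encoding.
# to_bytes() is a hypothetical helper, not part of mod_python or PSP.

text_type = type(u"")  # unicode on Python 2, str on Python 3


def to_bytes(value, encoding="utf-8"):
    """Return value as a byte string in the given encoding."""
    if isinstance(value, bytes):
        return value  # already encoded, pass through untouched
    if not isinstance(value, text_type):
        value = text_type(value)  # e.g. numbers, or objects with __str__
    return value.encode(encoding)


name = u"Gustavo C\xf3rdova Avila"

latin1 = to_bytes(name, "iso-8859-1")  # one byte for the accented "o"
utf8 = to_bytes(name, "utf-8")         # two bytes for the accented "o"

try:
    to_bytes(name, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False  # ASCII has no way to represent the accented "o"
```

The point of the explicit encoding argument is that the output no longer
depends on how a particular server's Python happens to be configured.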

So, time for a bit of homework now. :-)

-gus







