[mod_python] Problem with PSP and unicode

Gustavo Córdova Avila gustavo.cordova at q-voz.com
Thu Feb 16 12:40:29 EST 2006


> Dan Eloff wrote: Gustavo, here's an example. Suppose some code
> enforces a maximum length on a string. If it's counting on a default
> encoding of 1 byte per char and does something like len(s) <= 15, then
> for ASCII or ISO-8859-1 this would work. Or the code might use indices
> or slices (and a lot of code does!). If suddenly you have UTF-8 encoded
> Chinese, your string is going to triple in length, and those functions
> will have unpredictable behaviour. You could think of any number of
> scenarios, even in the Python library. I just wouldn't feel
> comfortable about changing the default encoding; you never know where
> it will come back to haunt you. What's so hard about using unicode
> strings in your program and then encoding when you send output somewhere?

Yes, it seems like a valid problem case, but it can be sidestepped very
easily by simply working with unicode objects until you're ready to
output your data.  PSP (and 'str') are going to render your unicode
objects only when it's absolutely necessary, and not before (at least
that's what my template classes do), so if you use:

    <%= myname[:32] %>

the unicode object being sliced (myname) is going to return another
unicode object representing the slice, and only that result is going to
be converted to 'str' using the default encoding.
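
Here's a rough sketch of why the order matters. It's written in Python 3
spelling, which is an assumption on my part: there 'str' plays the role
of the unicode type and 'bytes' the role of the old 'str'. The sample
value of myname and the slice length of 2 are made up for illustration,
and utf-8 stands in for whatever output encoding you actually use.

```python
# Hypothetical sample value: four Chinese characters.
myname = "\u4e2d\u6587\u540d\u5b57"

# Slice while the text is still unicode: the slice counts characters,
# and the result is encoded only at output time.
sliced = myname[:2]
encoded_late = sliced.encode("utf-8")

# Encode first, then slice: the slice counts bytes and can cut a
# multi-byte character in half.
encoded_early = myname.encode("utf-8")
broken = encoded_early[:2]

# 4 characters, but 12 bytes once encoded as utf-8.
print(len(myname), len(encoded_early))

# The late-encoded slice still decodes to the first two characters.
print(encoded_late.decode("utf-8") == myname[:2])

# The byte-level slice is no longer valid utf-8.
try:
    broken.decode("utf-8")
    byte_slice_ok = True
except UnicodeDecodeError:
    byte_slice_ok = False
print("byte slice decodes cleanly:", byte_slice_ok)
```

So the length check and the slice behave the way you expect as long as
they happen before the encode, not after.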

There are a lot of ways to plan your apps to avoid the trouble you
mention, and my motto has always been "Don't think 'why not', think 'how
to'."

Good luck :-)

-gus





