[mod_python] encoding

Joshua Ginsberg listspam at flowtheory.net
Mon Aug 28 15:30:02 EDT 2006


Grr. Sorry -- you hit on a pet peeve of mine.

UTF-8 IS NOT UNICODE!!!!!!!!!! GAH!!!!!!!!!

UTF-8 is a character encoding. UTF-16 is a character encoding. Latin1 is
a character encoding. Big5 is a character encoding.

Unicode is ***NOT*** a character encoding. Think of it as the Rosetta
stone for character encodings.

So when you .decode('utf8') a string encoded in UTF-8, you are taking a
Python string in the UTF-8 encoding and replacing the encoded characters
with the values of their corresponding Unicode code points -- this
changes it to the Python type "unicode". Then you can
.encode(some_other_charenc) and it will render those code points in that
particular character encoding.
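
A minimal sketch of that round trip, written for current Python 3, where
the encoded type is called bytes and the decoded type is called str (in
the Python 2 of this thread those were str and unicode). The sample
value and the variable names are made up for illustration.

```python
# Illustrative only: "raw" stands in for encoded input such as POSTed data.
raw = b"caf\xc3\xa9"               # the word "café" encoded as UTF-8 (5 bytes)

# decode: encoded bytes -> Unicode code points (4 characters)
text = raw.decode("utf-8")

# encode: the same code points rendered in other character encodings
as_latin1 = text.encode("latin-1")     # one byte per character here
as_utf16 = text.encode("utf-16-le")    # two bytes per character here

# Decoding with the wrong assumption fails loudly instead of guessing:
try:
    as_latin1.decode("utf-8")
    wrong_decode_failed = False
except UnicodeDecodeError:
    wrong_decode_failed = True
```

The same four code points come out as three different byte sequences,
which is the whole point: the code points are the common ground, and the
encodings are just ways of writing them down.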

-jag

On Mon, 2006-08-28 at 17:12 +0200, Julien Cigar wrote:
> Hello,
> 
> I have a little question about encodings.
> On the project I'm currently working on, everything is in unicode :
> - locales on the server (LANG=en_US.UTF-8)
> - the PostgreSQL database
> - ...
> 
> I'm using the Psycopg2 module to interact with PostgreSQL, and SimpleTAL 
> for the template engine.
> Those two libraries require type unicode instead of type str, otherwise 
> I get errors (ContextContentException: Found non-unicode string in 
> Context! for SimpleTal, and a "Can't adapt ...." error with psycopg2). 
> It's still a little obscure for me why it doesn't work with type str ...
> 
> The solution I found (which works) was to .decode('utf-8') or 
> unicode(mystr, 'utf-8') the POSTed data, but I wondered whether it is 
> dangerous or incorrect to do it like that? To my knowledge, Apache does 
> not convert encodings, so it should be done at the mod_python 
> level, right?
> 
> Is there a cleaner solution, which works in all cases ?
> 
> In advance thanks, and sorry for my English
> 


