Joshua Ginsberg
listspam at flowtheory.net
Mon Aug 28 15:30:02 EDT 2006
Grr. Sorry -- you hit on a pet peeve of mine.

UTF-8 IS NOT UNICODE!!!!!!!!!! GAH!!!!!!!!!

UTF-8 is a character encoding. UTF-16 is a character encoding. Latin1 is a character encoding. Big5 is a character encoding.

Unicode is ***NOT*** a character encoding. Think of it as the Rosetta stone for character encodings.

So when you .decode('utf8') a string encoded in UTF-8, you are taking a Python string of UTF-8 bytes and replacing those bytes with the Unicode code points they represent -- this changes it to the Python type "unicode". Then you can .encode(some_other_charenc) and it will render those code points in that particular character encoding.

-jag

On Mon, 2006-08-28 at 17:12 +0200, Julien Cigar wrote:
> Hello,
>
> I have a little question about encodings.
> On the project I'm currently working on, everything is in unicode :
> - locales on the server (LANG=en_US.UTF-8)
> - the PostgreSQL database
> - ...
>
> I'm using the Psycopg2 module to interact with PostgreSQL, and SimpleTAL
> for the template engine.
> Those two libraries require type unicode instead of type str, otherwise
> I get errors (ContextContentException: Found non-unicode string in
> Context! for SimpleTAL, and a "Can't adapt ...." error with psycopg2).
> It's still a little obscure for me why it doesn't work with type str ...
>
> The solution I found (which works) was to .decode('utf-8') or
> unicode(mystr, 'utf-8') the POSTed data, but I wondered if it's not
> dangerous or incorrect to do like that ? To my knowledge, Apache does
> not make conversion of encoding, so it should be done at the mod_python
> level, right ?
>
> Is there a cleaner solution, which works in all cases ?
>
> In advance thanks, and sorry for my English
>
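
The decode/encode round trip described in the reply can be sketched roughly as follows. This is only an illustration, written for Python 3, where the byte-string type is `bytes` and the text type is `str` (these play the roles of `str` and `unicode` in the Python 2 of this thread). The sample bytes and the variable names are assumptions made up for the example, not anything from the original project.

```python
# Sample input (assumed): the UTF-8 bytes for "café", as they might
# arrive in a POST body. 'é' takes two bytes in UTF-8: 0xC3 0xA9.
raw = b"caf\xc3\xa9"

# decode: interpret the bytes as UTF-8 and get Unicode code points (text).
text = raw.decode("utf-8")

# encode: render the same code points in other character encodings.
latin1 = text.encode("latin-1")     # 'é' becomes the single byte 0xE9
utf16 = text.encode("utf-16-le")    # two bytes per code point here

# Same text, three different byte representations.
print(repr(text), raw, latin1, utf16)
```

The text itself never changes; only its byte representation does, which is the distinction between Unicode and an encoding such as UTF-8.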