Julien Cigar
jcigar at ulb.ac.be
Mon Aug 28 16:17:18 EDT 2006
Joshua Ginsberg wrote:
> Grr. Sorry -- you hit on a pet peeve of mine.
>
> UTF-8 IS NOT UNICODE!!!!!!!!!! GAH!!!!!!!!!
>

Yep, I know that :-) I understand that a Unicode code point is just a number that fits in 21 bits (a unique number between 0 and 0x10FFFF) ...

> UTF-8 is a character encoding. UTF-16 is a character encoding. Latin1 is
> a character encoding. Big5 is a character encoding.
>
> Unicode is ***NOT*** a character encoding. Think of it as the Rosetta
> stone for character encodings.
>
> So when you .decode('utf8') a string encoded in UTF-8 you are taking a

That was my question: how can I be sure that a string is always encoded in UTF-8 when the user submits the form?

> Python string in the UTF-8 encoding and replacing the characters with
> the values of their corresponding Unicode code points -- this changes it
> to the Python type "unicode". Then you can .encode(some_other_charenc)
> and it will render those code points in that particular character
> encoding.
>
> -jag
>
> On Mon, 2006-08-28 at 17:12 +0200, Julien Cigar wrote:
>
>> Hello,
>>
>> I have a little question about encodings.
>> On the project I'm currently working on, everything is in Unicode:
>> - locales on the server (LANG=en_US.UTF-8)
>> - the PostgreSQL database
>> - ...
>>
>> I'm using the Psycopg2 module to interact with PostgreSQL, and SimpleTAL
>> for the template engine.
>> Those two libraries require type unicode instead of type str, otherwise
>> I get errors (ContextContentException: Found non-unicode string in
>> Context! for SimpleTAL, and a "Can't adapt ...." error with psycopg2).
>> It's still a little unclear to me why it doesn't work with type str ...
>>
>> The solution I found (which works) was to .decode('utf-8') or
>> unicode(mystr, 'utf-8') the POSTed data, but I wondered whether it is
>> dangerous or incorrect to do it like that. To my knowledge, Apache does
>> not convert encodings, so it should be done at the mod_python
>> level, right?
>>
>> Is there a cleaner solution, which works in all cases?
>>
>> Thanks in advance, and sorry for my English
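
The decode / encode round trip discussed above, and the question of what happens when a submitted value is not really UTF-8, can be sketched as follows. This is only an illustration under stated assumptions: it is written for Python 3, where `bytes` plays the role of the old `str` type and `str` plays the role of the old `unicode` type, and the helper name `decode_form_value` and its `declared_charset` parameter are hypothetical, not part of mod_python, psycopg2 or SimpleTAL. Strict decoding cannot prove what encoding the browser used, but it does reject byte sequences that are not valid UTF-8 instead of silently producing garbage.

```python
def decode_form_value(raw, declared_charset="utf-8"):
    """Decode raw form bytes to text (hypothetical helper).

    Raises ValueError if the bytes are not valid in the declared charset.
    """
    try:
        # "strict" is the default error handler; it is spelled out here
        # to make clear that invalid input is rejected, not replaced.
        return raw.decode(declared_charset, "strict")
    except UnicodeDecodeError as exc:
        raise ValueError(
            "form value is not valid %s: %s" % (declared_charset, exc)
        )


# The same text in two different character encodings.
utf8_bytes = "caf\u00e9".encode("utf-8")      # e-acute becomes two bytes
latin1_bytes = "caf\u00e9".encode("latin-1")  # e-acute becomes one byte

# Bytes -> text: the result holds Unicode code points, not bytes.
text = decode_form_value(utf8_bytes)

# Text -> bytes in some other character encoding.
reencoded = text.encode("latin-1")

# Latin-1 bytes containing non-ASCII characters are usually not valid UTF-8.
try:
    decode_form_value(latin1_bytes)
    rejected = False
except ValueError:
    rejected = True
```

In practice, browsers generally submit form data in the encoding of the page that contained the form, so serving the page as UTF-8 (and treating a failed strict decode as a bad request) is one plausible way to keep the input predictable.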