Manfred Stienstra
manfred.stienstra at dwerg.net
Wed Jan 7 07:32:31 EST 2004
On Wed, 2004-01-07 at 02:40, jalil at securia.com wrote:
> - When I read in a parameter value and print the type of the string, it
> is "str" and not "unicode". I know unicode is not "utf-8" and I think
> this is fine

Yes, this should be a utf-8 encoded string. Personally I think no
application should convert this data to a unicode object, because you
always want to know the encoding it came from. For example,
utf-8 -> unicode object -> iso8859-15 can lose every character that
iso8859-15 cannot represent.

> - When I try to decode the value into "utf-8" and turn it into unicode
> in python, I get an exception (decoding error - invalid data). Why is
> that? HTML uses Unicode codepoints and I am sending in utf-8 encoding,
> so why I get invalid data?

Are you using the unicode constructor or a decode method? Also, a bad
browser could send something that is not valid utf-8, so you should
always expect unicode errors here.

Here is a nice introduction to the whole encode/decode confusion:
http://www.vandervossen.net/2003/07/unicode_in_python

> Second, although I set the charset in my returned data to utf-8,
> the browser doesn't select utf-8 as encoding.

If the document isn't actually utf-8 encoded while the headers claim it
is, the browser may try to correct this and select the encoding it
detects instead.

> So, I thought maybe I
> should try to convert into unicode b/f putting the data into my table.
> Is this right? Should I do anything b/f storing and/or sending the data?

Yes, you should check the input. You don't have to do this by converting
the string into a unicode object; you can also do it by checking that
the byte sequences are well-formed utf-8 (see the utf-8 specification
for more information).

Manfred
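As a rough illustration of the two points above (input that is not valid utf-8, and characters lost when going utf-8 -> unicode -> iso8859-15), here is a minimal sketch. It is written for current Python 3, where `bytes` plays the role of the old `str` and `str` the role of the old `unicode`, and it assumes that simply attempting a decode is an acceptable validity check. The helper name `is_valid_utf8` and the sample values are made up for this example and do not come from the original thread.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is well-formed utf-8, False otherwise."""
    try:
        raw.decode("utf-8")  # strict decoding raises on malformed input
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical form values: one really is utf-8, one was sent as iso8859-15.
good_value = "café".encode("utf-8")
bad_value = "café".encode("iso8859-15")  # b'caf\xe9', a truncated utf-8 sequence

print(is_valid_utf8(good_value))
print(is_valid_utf8(bad_value))

# Lossy round trip: utf-8 bytes -> text -> iso8859-15 bytes.
# The euro sign exists in iso8859-15, the Japanese characters do not,
# so with errors="replace" each of them becomes "?".
text = "€ and 日本".encode("utf-8").decode("utf-8")
lossy = text.encode("iso8859-15", errors="replace")
print(lossy)
```

Keeping the original bytes next to any decoded text, as suggested above, means the lossy step can be avoided or at least detected before the data is stored.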