[mod_python] UnicodeDecodeError with util.FieldStorage(req).Field.value

Graham Dumpleton graham.dumpleton at gmail.com
Sat Jun 23 07:38:36 EDT 2007


I'm guessing here, but even though your form asks for the posted data to
be UTF-8, the browser probably isn't sending it that way and is instead
passing it through as ISO-8859-1 or some other European character set. My
further guess is that the character in question maps to the byte that
UTF-8 uses as the lead byte of a multibyte sequence.
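
One way to test that guess is to decode the raw bytes strictly as UTF-8
and see whether that fails. A minimal sketch, written in Python 3 syntax
where bytes and text are separate types (the helper name describe_bytes
and the sample values are made up for illustration and are not part of
mod_python):

```python
def describe_bytes(raw):
    """Report whether raw bytes decode as UTF-8, else fall back to ISO-8859-1."""
    try:
        # Strict decoding: any invalid UTF-8 sequence raises UnicodeDecodeError.
        return "utf-8", raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this never fails,
        # which also means it cannot prove the data really was ISO-8859-1.
        return "iso-8859-1 (fallback)", raw.decode("iso-8859-1")


# Hypothetical sample data: the same name encoded two different ways.
utf8_sample = "J\u00fcrgen".encode("utf-8")          # b'J\xc3\xbcrgen'
latin1_sample = "J\u00fcrgen".encode("iso-8859-1")   # b'J\xfcrgen'

print(describe_bytes(utf8_sample))
print(describe_bytes(latin1_sample))
```

If the value from the form decodes cleanly as UTF-8, the browser most
likely did send UTF-8 and my guess above is wrong.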

Try this in Python, with s set to the byte from your error message:

>>> s = '\xc3'
>>> s.decode("iso-8859-1").encode('utf-8')
'\xc3\x83'

See how it ends up as a two-byte UTF-8 sequence whose lead byte is itself
\xc3. Also:

>>> print s.decode("iso-8859-1").encode('utf-8')
Ã

Is that the character you are expecting to see?
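
For comparison, here are the bytes involved, as a small sketch in Python 3
syntax (the sample characters are chosen only for illustration). The byte
0xc3 is the character Ã when read as ISO-8859-1, but it is also the lead
byte of the two-byte UTF-8 encoding of umlauts such as ä and ü, so seeing
it in an error message does not by itself say which case applies:

```python
# 0xc3 read as a single ISO-8859-1 byte is U+00C3 (capital A with tilde).
as_latin1 = b"\xc3".decode("iso-8859-1")

# The umlauts a-, o- and u-umlaut encoded as UTF-8; each starts with 0xc3.
umlaut_bytes = {ch: ch.encode("utf-8") for ch in "\u00e4\u00f6\u00fc"}

print(repr(as_latin1))
for ch, raw in umlaut_bytes.items():
    print(repr(ch), raw)
```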

I'm not in any way an expert on Unicode though, so I could be quite wrong.
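
As for where 'ascii' comes from: in Python 2, adding a str to a unicode
object makes Python decode the str first using the default encoding, which
is ASCII unless it has been changed. Any byte above 0x7f then fails. Below
is a sketch of that same step done explicitly, in Python 3 syntax, together
with the workaround of decoding the value before appending it. It assumes
the posted bytes really are UTF-8, and field_value is a made-up stand-in
for param.value:

```python
# Stand-in for param.value; assumed to be UTF-8 encoded bytes.
field_value = b"M\xc3\xbcller"

# Roughly what Python 2 does behind the scenes for u'' + 'M\xc3\xbcller':
# it decodes the byte string as ASCII, which fails on the byte 0xc3.
try:
    field_value.decode("ascii")
    implicit_error = None
except UnicodeDecodeError as exc:
    implicit_error = str(exc)

# Decoding explicitly with the matching codec avoids the implicit ASCII step.
msg = ""
msg += "name" + ": " + field_value.decode("utf-8") + "\r\n"

print(implicit_error)
print(repr(msg))
```

In the PSP page that would correspond to something like
param.value.decode('utf-8') in the line that builds msg.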

Graham

On 21/06/07, Anastasios Hatzis <ah at hatzis.de> wrote:
> Hi,
>
> I'm struggling with UnicodeDecodeError when trying to append
> util.FieldStorage(req).Field.value with umlaut into a variable of type
> unicode.
>
> <SNIP>
> <%
> # file: test.psp
>
> from mod_python import Session, util
> req.assbackwards
>
> req.content_type = 'text/html; charset=UTF-8'
> %>
> <!--SOME HTML HEAD STUFF HERE-->
>
> <form action="/test.psp" method="post" enctype="text/plain"
> accept-charset="UTF-8">
> <!-- form calls this very same file -->
> <input name="name" type="text" size="30" maxlength="50" />
> <input type="submit" value="Submit" />
> </form>
>
> <%
> req.write('<p>Write values to HTML</p>') # Works fine!
> store = util.FieldStorage(req)
> for param in store.list:
>     req.write(param.value)
> %>
> <b><%=param.name%></b>:<%=param.value%>:<%=type(param.value)%><br />
> <%
>
> req.write('<p>Append all values to the variable msg</p>')
> msg = u''
> for param in store.list:
>     msg += param.name + ': ' + param.value + '\r\n' # UnicodeDecodeError!
> %>
> <!--SOME HTML FOOT STUFF HERE-->
>
> </SNIP>
>
>
> So, when calling this page, the HTML output for the first section is
> rendered with umlauts (ä, ü, ...). The value is <type 'str'> ... well,
> why not, as long as it is UTF-8...
>
> But as soon as the script tries to append param.value to the unicode
> variable "msg" I'm getting this error:
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 26:
> ordinal not in range(128)
>
>
> I do not understand why I'm getting this error. How does 'ascii' come into
> this? How do I know which encoding is really applied (in param.value and in
> msg)?
>
> I'm blind, hum?
>
> Anastasios