[mod_python] Decoding HTML escape characters in HTTP Requests

Clodoaldo Pinto Neto clodoaldo.pinto.neto at gmail.com
Sat Jan 3 17:17:33 EST 2009


2009/1/3 Behnam Esfahbod ZWNJ <behnam at zwnj.org>:
> Hi list,
>
> When browsers need to send Unicode characters (e.g. U+06F1, EXTENDED
> ARABIC-INDIC DIGIT ONE) in a non-Unicode (e.g. Western ISO-8859-1)
> encoded HTTP request, they escape the Unicode characters as HTML
> numeric character references.  For the example above, the string
> "&#1777;" will be sent to the server.

ISO-8859-1 is a single-byte encoding, so it can represent only 256
characters. If you want every Unicode code point to be representable,
you should use UTF-8. UTF-32 can also represent every code point, but
it consumes more bandwidth, and I don't know whether it is as well
supported as UTF-8, which is universal.
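
To make the difference concrete, here is a minimal sketch using only the
Python standard library (Python 3 syntax, nothing mod_python-specific).
The helper name try_latin1 is made up for illustration. It shows that
the digit has no ISO-8859-1 representation, that escaping it as a
numeric character reference gives the "&#1777;" string from the
question, and how many bytes UTF-8 and UTF-32 need for it.

```python
# Illustration only: plain Python 3, not part of mod_python.
digit = "\u06f1"  # EXTENDED ARABIC-INDIC DIGIT ONE (decimal 1777)


def try_latin1(text):
    """Hypothetical helper: return ISO-8859-1 bytes, or None if impossible."""
    try:
        return text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return None


# ISO-8859-1 has no slot for this character, so strict encoding fails.
latin1_strict = try_latin1(digit)

# Falling back to a numeric character reference, similar to what browsers
# do when a form is submitted in a charset that lacks the character.
latin1_escaped = digit.encode("iso-8859-1", errors="xmlcharrefreplace")

# UTF-8 and UTF-32 can both represent the character directly.
utf8_bytes = digit.encode("utf-8")
utf32_bytes = digit.encode("utf-32-be")

print(latin1_strict, latin1_escaped, len(utf8_bytes), len(utf32_bytes))
```

For this character UTF-8 needs two bytes and UTF-32 needs four, which
is the bandwidth difference mentioned above.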

>
> I'm using mod_python's Publisher handler, and in these cases, I get
> the escaped string, not the original Unicode text.  Is it a bug in
> mod_python, or a non-standard feature of common browsers/app-servers,
> or both?

Try using UTF-8 and see what you get.
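
If it helps to see both cases side by side, below is a rough sketch of
decoding a form field on the server. It is plain Python 3 standard
library code, not the Publisher handler's API; the function name
decode_field, the field name "q", and the sample query strings are all
assumptions made for illustration. With a UTF-8 submission the
character arrives as percent-encoded UTF-8 bytes; with an ISO-8859-1
submission it arrives as a percent-encoded "&#1777;", which
html.unescape can turn back into the character.

```python
import html
from urllib.parse import parse_qs


def decode_field(query_string, name, charset="utf-8"):
    """Hypothetical helper: read one form field and undo numeric escapes.

    The charset argument is an assumption about how the form was
    submitted; it is not detected automatically.
    """
    values = parse_qs(query_string, encoding=charset).get(name, [""])
    # html.unescape turns "&#1777;" back into the character it stands for.
    return html.unescape(values[0])


# Form submitted as UTF-8: the character arrives as percent-encoded bytes.
utf8_value = decode_field("q=%DB%B1", "q")

# Form submitted as ISO-8859-1: the browser sent "&#1777;" percent-encoded.
escaped_value = decode_field("q=%26%231777%3B", "q", charset="iso-8859-1")

print(utf8_value == escaped_value)
```

One caveat with this approach: unescaping also rewrites text that a
user typed literally, such as "&amp;", so it is only a fallback.
Submitting the form as UTF-8 avoids the ambiguity altogether.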

Regards, Clodoaldo

>
> Best,
> -Behnam
>
> Hint: U+06F1, EXTENDED ARABIC-INDIC DIGIT ONE = &#1777;
>
>
> --
>    '     بهنام اسفهبد
>    '     Behnam Esfahbod
>   '
>  *  ..   http://behnam.esfahbod.info
>  *  `  *
>  * o *   http://zwnj.org
>
> _______________________________________________
> Mod_python mailing list
> Mod_python at modpython.org
> http://mailman.modpython.org/mailman/listinfo/mod_python
>


