[mod_python] Problem with html quoted/unquoted

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Wed May 17 17:46:53 EDT 2006

  I have just spent about 4 days implementing UTF-8 support in my webapp using
mod_python, mysql-python-1.2.1, ElementTree and mysql-5.0. It turned out that what
mattered was adding

  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">

and setting

req.content_type = "text/html; charset=UTF-8"

in the top-level index() function when using mod_python.publisher.
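
A minimal sketch of such an index() function follows. It is written for
Python 3 and is not the code from my webapp: PAGE and FakeRequest are
hypothetical names, and FakeRequest exists only so the function can be tried
outside Apache. It assumes only that mod_python.publisher passes the request
object in as `req` and sends the returned string as the response body.

```python
# Hypothetical page body; the META tag repeats the charset for the browser.
PAGE = (
    "<html><head>"
    '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">'
    "</head><body><form method=\"post\">"
    "<input name=\"model\"><input type=\"submit\">"
    "</form></body></html>"
)


def index(req):
    # Announce UTF-8 in the HTTP header so the browser renders the page
    # as UTF-8 and submits form data in the same encoding.
    req.content_type = "text/html; charset=UTF-8"
    return PAGE


class FakeRequest(object):
    """Stand-in for the mod_python request object (illustration only)."""
    content_type = None


req = FakeRequest()
body = index(req)
print(req.content_type)
```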

Because of the above, I always get UTF-8 data back from the user.
Thus, there was no need to convert the data from POSTed HTML forms to unicode
and to push the data in as unicode using
   connect(..., use_unicode=True, charset='utf8')
That didn't work until I updated to mysql-python-1.2.1 anyway. Still, I felt
I could live with strings in utf-8 encoding and forget about conversion to unicode.
But using use_unicode=False exposed a bug in mysql-python-1.2.1, so I filed
a bug report on that and had to patch two lines. I spent the next two days figuring
out why the data in exported XML files was garbled. Yes, I do have to pass
unicode objects to ElementTree.
My tables in mysql have their charset set to utf8, and the connection charset is utf8 too.
Check also the output of 'show full columns from $tablename'. Why don't you want
to store utf8 data in mysql tables? It is supported, you can search on them,
you can "ignore" the accented chars on searches if you wish ...

In summary, except for debugging print calls, which use the 'ascii' encoding to
convert whatever object to a string, I always get nice UTF-8 data back to the
browser and the browser always posts in that encoding. Internally, I do not have to
convert to unicode at all, except for the ElementTree problem.
In the case of those print calls used for debugging, or when exceptions are raised,
I get the \xc2, etc. In all other cases I have strings, sometimes use
isinstance(_var, basestring) checks, and only once use a unicode() call,
actually _str.decode('utf8'), when feeding data into (c)ElementTree.
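
To illustrate that single decode step, here is a small sketch. It is written
for Python 3, where bytes.decode() yields the text type that unicode objects
were in Python 2. The helper name build_model_element and the "model" tag are
made up for the example; only the ElementTree calls are real.

```python
import xml.etree.ElementTree as ET


def build_model_element(raw_utf8):
    # Decode the UTF-8 bytes first, so ElementTree receives text,
    # not an encoded byte string.
    text = raw_utf8.decode("utf-8")
    elem = ET.Element("model")  # hypothetical tag name
    elem.text = text
    return elem


elem = build_model_element(b"Jubil\xc3\xa4umsmodell")

# Serialised as text, the accented character survives unchanged.
as_text = ET.tostring(elem, encoding="unicode")

# With the default us-ascii output encoding, non-ASCII characters are
# written as numeric character references instead.
as_ascii = ET.tostring(elem)

print(as_text)
print(as_ascii)
```

The second call is one way numeric references can end up in exported XML even
when the input was decoded correctly: the output encoding matters as well.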

Regarding the entity problem, I had the same problem when I fed utf8 strings
into ElementTree: I got out '&#174', etc. That got fixed after feeding in unicode
objects directly. I thought about htmltidy, actually tidy which I think
has some pythonic interface, but probably as you said urllib has similar

Hope this helps

Wouter van Marle wrote:
> Hi all,
> I have a problem with POST requests - loosely related to mod_python but
> I don't know a better place to ask.
> I have a website run by python/mod_python/apache2, using a MySQL
> database (accessed through MySQLdb).
> For html compatibility reasons, I store all the search data in the MySQL
> database in html-quoted format. The term in question that I ran into is:
> "Jubiläumsmodell", stored in the database as "Jubil&auml;umsmodell".
> The select in the html source, as presented to the browser, is fine (the
> html line is abbreviated for here; it contains many more options):
> <select name=model>
> 	<option value="Jubil&auml;umsmodell 40 Jahre">Jubil&auml;umsmodell 40
> Jahre</option>
> </select>
> It is rendered correctly of course by the browser.
> Now when the user clicks go, and the POST is generated, the character is
> sent back in what I guess is UTF-8 encoding (I assume it's that, as
> that's my default encoding) (again only a snippet):
> &model=Jubil%C3%A4umsmodell+40+Jahre
> And as you can imagine this messes up the search in the MySQL database,
> where I can find the term only if it's html quoted again.
> So the end of the whole story, the actual question is now: how do I get
> from this utf-8 encoding back to html quoted encoding, for searching in
> the database? And no, I'm not going to put the info in the database in
> utf-8, that must remain html quoted for many other locations where it
> goes wrong otherwise.
> Wouter.
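
If the data really must stay html quoted, the conversion asked about above
could be sketched roughly as follows, using only the Python 3 standard library
(urllib.parse.unquote_plus and html.entities.codepoint2name). The function name
to_html_quoted is made up, and the sketch assumes that only non-ASCII
characters are stored as entities; how '&', '<' and '>' are stored in the
database is not known here, so they are left untouched.

```python
from html.entities import codepoint2name
from urllib.parse import unquote_plus


def to_html_quoted(form_value):
    # Undo the URL encoding of the POSTed value, treating it as UTF-8.
    text = unquote_plus(form_value, encoding="utf-8")
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # Assumption: plain ASCII is stored as-is in the database.
            out.append(ch)
        elif code in codepoint2name:
            # Named entity, e.g. U+00E4 becomes &auml;
            out.append("&%s;" % codepoint2name[code])
        else:
            # No named entity available: use a numeric reference.
            out.append("&#%d;" % code)
    return "".join(out)


print(to_html_quoted("Jubil%C3%A4umsmodell+40+Jahre"))
```

When the value comes from mod_python's form handling it is normally already
URL-decoded, in which case only the loop over the characters is needed.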

Dr. Martin Mokrejs
Faculty of Science, Charles University
Vinicna 5, 128 43 Prague, Czech Republic
