Re: [mod_python] Sanitizing user input... but not totally.

Anthony L. anthony at ataribaby.org
Wed Nov 23 16:33:25 EST 2005


Thanks for the suggestions. I am looking into the xml.sax.saxutils
module Jim mentioned, as well as Python's HTML parsing module.
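
For reference, a minimal sketch of what the saxutils route looks like,
written for a current Python 3 interpreter. The sample string is made
up for illustration; escape() and its optional entities mapping are
the real standard-library API. By default it only handles &, < and >,
so quotes need the extra mapping if the text ends up inside an
attribute.

```python
from xml.sax.saxutils import escape

# Made-up sample input, for illustration only.
comment = 'Tom & "Jerry" <script>'

# Default behaviour: only &, < and > are replaced.
print(escape(comment))                      # Tom &amp; "Jerry" &lt;script&gt;

# Extra replacements can be passed as a dict of entities.
print(escape(comment, {'"': "&quot;"}))     # Tom &amp; &quot;Jerry&quot; &lt;script&gt;
```

This escapes markup rather than removing it, so it suits fields where
the original text has to be displayed as typed.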

I recalled an earlier technique I used for blocking undesirable  
characters using JavaScript, and came up with the following:

	x = 'U@s%er#_N$a^m!e%-<'
	
	set = '0123456789 _-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.'

	print ''.join( [c for c in x if c in set] )

This is very simple and easily extended by customizing the set
variable. It lets me keep all my allowable characters, including
underscores and dashes. If I wanted to use this to sanitize user
input for an email address, I could add the at sign (@) to the set
variable; the period is already there. It feels pretty fast, though I
haven't measured it, and I'm tempted to use it everywhere I don't need
to worry about markup or SQL injection. But is this a good, Pythonic
way of doing things?
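
One possible way to package the same idea, sketched in Python 3
syntax. The names ALLOWED, EMAIL_ALLOWED and keep_allowed are
hypothetical, chosen for illustration; a frozenset is used instead of
a string so the builtin name set is not shadowed and membership tests
do not scan the whole string. The timeit call at the end is one way to
put a number on "pretty fast".

```python
import string
import timeit

# Hypothetical names, not from any library: the whitelist as a frozenset.
ALLOWED = frozenset(string.ascii_letters + string.digits + " _-.")

def keep_allowed(text, allowed=ALLOWED):
    """Return text with every character not in `allowed` dropped."""
    return "".join(c for c in text if c in allowed)

# Extending the whitelist for an email-style field only needs the at sign.
EMAIL_ALLOWED = ALLOWED | {"@"}

print(keep_allowed("U@s%er#_N$a^m!e%-<"))                    # User_Name-
print(keep_allowed("an.thony@example.org!", EMAIL_ALLOWED))  # an.thony@example.org

# Rough timing; the figure depends entirely on the machine.
elapsed = timeit.timeit(lambda: keep_allowed("U@s%er#_N$a^m!e%-<"), number=100000)
print("100000 calls: %.3f s" % elapsed)
```

Note that this strips characters silently. It does not check that the
result is a well-formed email address, only that nothing outside the
whitelist survives.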

Also, Unicode. I'd like to allow input using characters from the
Latin-1 set, but I can't figure out how. I tried the following:

	x = u'Üsernåmë'

	set = u'åëüÜabcdefghijklmnopqrstuvwxyz'
	
	print ''.join( [c for c in x if c in set] )

But that's not enough: it raises a UnicodeEncodeError. Of course,
it's probably not even proper to put the actual non-ASCII characters
directly in the string literal. Can someone point me to a thorough
source on Unicode in Python that has many examples?
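
A guess at what is going on, assuming the output stream defaults to
ASCII: the filtering itself succeeds, and the error comes from print
implicitly encoding the unicode result. A minimal sketch of one way
around that, in Python 3 syntax, is to write the non-ASCII characters
as escapes and encode explicitly before output. The names below are
hypothetical, and the input is assumed to be already-decoded text.

```python
# Hypothetical names; the accented characters are written as escapes
# so the source file itself stays pure ASCII.
LATIN1_EXTRA = "\u00e5\u00eb\u00fc\u00dc"   # a-ring, e-diaeresis, u-diaeresis, U-diaeresis
ALLOWED = frozenset("abcdefghijklmnopqrstuvwxyz" + LATIN1_EXTRA)

def keep_allowed(text, allowed=ALLOWED):
    """Return text with every character not in `allowed` dropped."""
    return "".join(c for c in text if c in allowed)

name = "\u00dcsern\u00e5m\u00eb"            # the sample username from above
clean = keep_allowed(name)

# Encode explicitly at the output boundary instead of relying on
# whatever encoding the stream happens to default to.
as_latin1 = clean.encode("latin-1")
as_utf8 = clean.encode("utf-8")
print(repr(as_latin1))   # b'\xdcsern\xe5m\xeb'
print(repr(as_utf8))     # b'\xc3\x9csern\xc3\xa5m\xc3\xab'
```

Which encoding to pick depends on what the page or terminal on the
other end declares; the point is only that the choice is made
explicitly rather than left to the default.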

Anthony



