Deron Meranda
deron.meranda at gmail.com
Fri Nov 17 11:57:18 EST 2006
On 11/17/06, fizban <fizban at paranoici.org> wrote:
> 1* take req.uri, str() it (just in case?) and split('/') it.
> [stuff = str(req.uri).split('/')

There's no need to str() it. It's already a string, and it will also
have been url decoded. However req.uri is not UTF-8 decoded, in case
you deal with Unicode URLs. If you care about that you should
probably do

    try:
        uri = req.uri.decode('utf8')
    except UnicodeDecodeError:
        raise apache.SERVER_RETURN, apache.HTTP_BAD_REQUEST

Doing a split('/') is fine, assuming of course you are not overriding
the default Apache configuration of the AllowEncodedSlashes directive:

    http://httpd.apache.org/docs/2.2/mod/core.html#allowencodedslashes

> 2* take stuff[1], see if isalpha(), if so see if stuff[1] is in a tuple
> (contains all the valid "sections"). if it is, we assume stuff[1] is
> safe to deal with. if not, we return a custom 404.

Watch out for empty parts. For example, if the url contains /////
then split('/') will give you empty strings between the slashes.

The isalpha() test is fine. If you've unicode decoded it, just be
aware that isalpha() will also allow non-latin letters.

> 3* if stuff[1] is valid, and it is in a tuple containing a list of
> special sections with a matching function, we run that function
> [eval("%s(%s)" % (section, "req"))].

This should be secure, if you are definitely checking the string
against a known acceptable list of them. But it's bad Python form!
eval should only be used as a last resort (unless you don't care
about form/style). It's almost never necessary. This is a bad habit
encouraged by PHP that you will want to unlearn quickly.

The simplest way to do this without eval is to change your list into
a dictionary. Assuming you have something like

    allowed_sections = ['one', 'two', 'three']

change it to

    def one(req):
        ...
    def two(req):
        ...
    def three(req):
        ...
    section_mapping = { 'one': one, 'two': two, 'three': three }

and then rather than eval call them as

    try:
        section_mapping[section]( req )
    except KeyError:
        raise apache.SERVER_RETURN, apache.HTTP_NOT_FOUND

Another common way is to put all your functions inside a class and
use getattr and such. You might want to get the O'Reilly book "Python
Cookbook" to learn more techniques of Python programming.

> some of these functions take other
> arguments, like a (pre validated with similar approach) stuff[2], or
> req.args (same here). otherwise we run some other routine, by parsing
> and req.writing a template.
> [stuff[2] or req.args are this time matched against regular expressions,
> to see if they fit the arguments taken by the section functions]

It is very common to use regular expressions to parse URIs. Many
Python web frameworks do this, and there's no reason you shouldn't as
well. Particularly in Python, the named-group ?P<...> syntax is quite
useful for keeping your regex code readable:

    m = re.match( r'/one/(?P<size>\d+)', url )
    size = m.group('size')

> Do you guys think it's a decent approach in terms of "security"? Would
> you take any other validation steps? As I said I'm really new to python
> and mod_python, so since the website has some huge userbase, I'm really
> worried about security..

Let's just say that what you've shown us shouldn't be insecure, but
we can't say it's secure either. There's so much that's not even
talked about: for example, user authentication, SSL, use of database
queries, and embedded/hardcoded passwords (which are a definite
no-no, especially if you have PythonDebug turned on).

> We are not using (for various reasons) sql db,
> only templates and local xml basically, so sql inj. is not an issue.

Okay, but you still might want to worry about other kinds of
injection, such as pathname injection, javascript injection, etc.
What if the URL contains the characters "<![CDATA[", for example?
That could really mess up some XML processors.
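To illustrate that last point, the standard library already has a helper for neutralizing markup characters before request data is embedded in XML. This is only a minimal sketch (the variable names are invented, and in a real handler the input would come from req.uri or req.args):

```python
from xml.sax.saxutils import escape

# Suppose part of the URI ends up inside an XML template.  Escaping it
# first turns the markup characters into harmless entities, so a value
# like "<![CDATA[" can no longer confuse the XML processor.
user_part = '<![CDATA[ nasty ]]>'
safe = escape(user_part)   # '<' -> '&lt;', '>' -> '&gt;', '&' -> '&amp;'
```

escape() only covers the three core XML metacharacters; quotes inside attribute values need the optional entities argument as well.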
Just be cautious of where the data came from and how you use it, and
you should be fine.

> Since the site re-design will force us to change all the URI, I have
> setup some other function to see if str(req.uri) matches moved or
> deleted pages, if so we return 410 or 301 messages. 404 give the
> impression of a messed up site. Is str(req.uri) safe enough to be passed
> as argument to the notfound() or moved() functions I've made?

Sending a 301 is a very good thing when you move URLs around. For
instance, the googlebot indexer, when seeing a 301, will be more
likely to trust the new URLs, carrying forward all your earned
rankings, etc. Also, 301s are used by some browsers' bookmark
systems, so that bookmarks are automatically updated.

As for 404s, don't worry about any "impression". Use the correct code
for the correct situation. The only case where I might deviate from
this is to send 302 rather than the recommended 307, since many old
browsers (IE) don't understand 307.

Also you may want to use the symbolic names, such as
apache.HTTP_NOT_FOUND, provided by the mod_python.apache module
rather than numbers, as it makes your code more readable.

Good luck.
-- 
Deron Meranda
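[Editor's note: the moved/deleted-page lookup discussed above can be sketched as a plain dictionary check, independent of mod_python. The table contents and paths here are invented; the numeric codes stand in for apache.HTTP_MOVED_PERMANENTLY and apache.HTTP_GONE.]

```python
# Hypothetical relocation table for the site redesign (paths invented).
moved = {
    '/oldnews': '/news',
    '/staff':   '/about',
}
gone = set(['/guestbook'])  # pages deleted outright

def status_for(uri):
    """Return (status, location); 200 means serve the page normally."""
    if uri in moved:
        return (301, moved[uri])   # permanent redirect with new Location
    if uri in gone:
        return (410, None)         # deliberately gone, not a mere 404
    return (200, None)
```

In an actual mod_python handler the 301 branch would also set the Location response header before returning the status.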