Deron Meranda
deron.meranda at gmail.com
Fri Nov 17 11:57:18 EST 2006
On 11/17/06, fizban <fizban at paranoici.org> wrote:
> 1* take req.uri, str() it (just in case?) and split('/') it.
> [stuff = str(req.uri).split('/')

There's no need to str() it. It's already a string, and it will also
have been url decoded. However req.uri is not UTF-8 decoded, in case
you deal with Unicode URLs. If you care about that you should
probably do

    try:
        uri = req.uri.decode('utf8')
    except UnicodeDecodeError:
        raise apache.SERVER_RETURN, apache.HTTP_BAD_REQUEST

Doing a split('/') is fine, assuming of course you are not overriding
the default Apache configuration of the AllowEncodedSlashes directive:

    http://httpd.apache.org/docs/2.2/mod/core.html#allowencodedslashes

> 2* take stuff[1], see if isalpha(), if so see if stuff[1] is in a tuple
> (contains all the valid "sections"). if it is, we assume stuff[1] is
> safe to deal with. if not, we return a custom 404.

Watch out for empty parts. For example, if the url contains /////
then split('/') will give you empty strings between the slashes.

The isalpha() test is fine. If you've unicode decoded it, just be
aware that isalpha() will also allow non-latin letters.

> 3* if stuff[1] is valid, and it is in a tuple containing a list of
> special sections with a matching function, we run that function
> [eval("%s(%s)" % (section, "req"))].

This should be secure, if you are definitely checking the string
against a known acceptable list of them. But it's bad Python form!
eval should only be used as a last resort (unless you don't care
about form/style). It's almost never necessary. This is a bad habit
encouraged by PHP that you will want to unlearn quickly.

The simplest way to do this without eval is to change your list into
a dictionary. Assuming you have something like

    allowed_sections = ['one', 'two', 'three']

change it to

    def one(req):
        ...
    def two(req):
        ...
    def three(req):
        ...
    section_mapping = { 'one': one, 'two': two, 'three': three }

and then rather than eval call them as

    try:
        section_mapping[section]( req )
    except KeyError:
        raise apache.SERVER_RETURN, apache.HTTP_NOT_FOUND

Another common way is to put all your functions inside a class and
use getattr and such. You might want to get the O'Reilly book "Python
Cookbook" to learn more techniques of Python programming.

> some of these functions take other
> arguments, like a (pre validated with similar approach) stuff[2], or
> req.args (same here). otherwise we run some other routine, by parsing
> and req.writing a template.
> [stuff[2] or req.args are this time matched against regular expressions,
> to see if they fit the arguments taken by the section functions]

It is very common to use regular expressions to parse URIs. Many
Python web frameworks do this, and there's no reason you shouldn't as
well. Particularly in Python, the named-group ?P<...> syntax is quite
useful for keeping your regex code readable:

    m = re.match( r'/one/(?P<size>\d+)', url )
    size = m.group('size')

> Do you guys think it's a decent approach in terms of "security"? Would
> you take any other validation steps? As I said I'm really new to python
> and mod_python, so since the website has some huge userbase, I'm really
> worried about security..

Let's just say that what you've shown us shouldn't be insecure, but
we can't say it's secure either. There's so much that's not even
talked about: for example, user authentication, SSL, use of database
queries, and embedded/hardcoded passwords (which are a definite
no-no, especially if you have PythonDebug turned on).

> We are not using (for various reasons) sql db,
> only templates and local xml basically, so sql inj. is not an issue.

Okay, but you still might want to worry about other kinds of
injection, such as pathname injection, javascript injection, etc.
What if the URL contains the characters "<![CDATA[", for example?
That could really mess up some XML processors.
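To illustrate that last point, the standard library already has a helper for neutralizing markup characters before request data is embedded in XML. This is only a minimal sketch (the variable names are invented, and in a real handler the input would come from req.uri or req.args):

```python
from xml.sax.saxutils import escape

# Suppose part of the URI ends up inside an XML template.  Escaping it
# first turns the markup characters into harmless entities, so a value
# like "<![CDATA[" can no longer confuse the XML processor.
user_part = '<![CDATA[ nasty ]]>'
safe = escape(user_part)   # '<' -> '&lt;', '>' -> '&gt;', '&' -> '&amp;'
```

escape() only covers the three core XML metacharacters; quotes inside attribute values need the optional entities argument as well.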
Just be cautious of where the data came from and how you use it, and
you should be fine.

> Since the site re-design will force us to change all the URI, I have
> setup some other function to see if str(req.uri) matches moved or
> deleted pages, if so we return 410 or 301 messages. 404 give the
> impression of a messed up site. Is str(req.uri) safe enough to be passed
> as argument to the notfound() or moved() functions I've made?

Sending a 301 is a very good thing when you move URLs around. For
instance, the googlebot indexer, when seeing a 301, will be more
likely to trust the new URLs, carrying forward all your earned
rankings, etc. Also, 301s are used by some browsers' bookmark
systems, so that bookmarks are automatically updated.

As for 404s, don't worry about any "impression". Use the correct code
for the correct situation. The only case where I might deviate from
this is to send 302 rather than the recommended 307, since many old
browsers (IE) don't understand 307.

Also you may want to use the symbolic names, such as
apache.HTTP_NOT_FOUND, provided by the mod_python.apache module
rather than numbers, as it makes your code more readable.

Good luck.
-- 
Deron Meranda
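[Editor's note: the moved/deleted-page lookup discussed above can be sketched as a plain dictionary check, independent of mod_python. The table contents and paths here are invented; the numeric codes stand in for apache.HTTP_MOVED_PERMANENTLY and apache.HTTP_GONE.]

```python
# Hypothetical relocation table for the site redesign (paths invented).
moved = {
    '/oldnews': '/news',
    '/staff':   '/about',
}
gone = set(['/guestbook'])  # pages deleted outright

def status_for(uri):
    """Return (status, location); 200 means serve the page normally."""
    if uri in moved:
        return (301, moved[uri])   # permanent redirect with new Location
    if uri in gone:
        return (410, None)         # deliberately gone, not a mere 404
    return (200, None)
```

In an actual mod_python handler the 301 branch would also set the Location response header before returning the status.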