[mod_python] Some observations after writing my own modpython

Graham Dumpleton graham.dumpleton at gmail.com
Tue May 29 00:23:43 EDT 2007

On 29/05/07, Roger Binns <rogerb at rogerbinns.com> wrote:
> Hash: SHA1
> So I wrote my own Apache module because I needed some things that
> weren't present in modpython (covered in previous messages in this
> group).  Here are some observations based on the experience.
> I have a new respect for anyone who has written an Apache module.

Maybe you would like to review my code for me. :-)


I would like to hope it does some of the core things like
initialisation and interpreter management better than mod_python does.
I know it fixes a lot of things that mod_python doesn't really get right.

> Ultimately the internal structure and processing is simple.  Everything
> just "plugs in" at the appropriate points.  However the documentation
> about the plugging in, which apis to call and when, etc is pretty much
> non-existent.  (There are a few hello world examples but that is it).
> Doxygen generated references are very good, but the gotcha is that you
> have to know which methods to call.  It is kind of like learning a new
> language by being handed a dictionary but with no idea of grammar.

One can read:


Also, do a search on the net using Google for 'Writing Apache Modules
with Perl and C'. The book is old and some sections are posted on the
mod_perl site. Look a bit harder through the Google search results and
you might stumble across a more complete copy. I will not post the
address as it probably isn't supposed to be on the net in the first place.

> For my module I chose to use the same names as the Apache apis.  For
> example I used "ap_get_remote_host" as the method name.  This means that
> it can be looked up in the Apache documentation for semantics, side
> effects and errors.  It does result in un-pythonic names but makes life
> easier for me anyway.

All my work on SWIGing the Apache APIs could probably have saved you a
lot of effort. :-)

> My main goal of being able to use subrequests in order to apply
> authentication and authorization directives from some uris to code in
> another uri has so far worked perfectly although I haven't explored all
> corner cases yet.  Note that I can actually do it from the request phase
> just fine and don't have to do it in the authentication phase.  (The
> request phase is when I have all the information necessary to know
> exactly what I want to do.)

Is what you have done patches to mod_python, or have you gone and
written your own Apache module from scratch which is independent of
mod_python, or uses the exported mod_python optional functions for
getting access to interpreters? Are you intending to post your code
anywhere so we can have a look?

> These are various issues I found in the modpython code while comparing
> my approach with modpython:

And there are a lot more problems than that. Personally, in terms of
adding more and more access to the Apache API I am far from convinced
that mod_python is a good starting point. :-)

> Modpython code for setting errors could be using PyErr_SetString  or
> PyErr_Format in several places.  In some places it actually leaks, for
> example line 111 in connobject.c:
>   if (rc != APR_SUCCESS) {
>       PyErr_SetObject(PyExc_IOError,
>                       PyString_FromString("Connection read error"));
>       return NULL;
>   }
> The PyString_FromString results in a leaked object since it isn't
> decrefed.  It would be worthwhile checking out all usages of PyErr_*
> which should result in less code and definitely fewer memory leaks.
> While on the subject of errors, apr_strerror is never used so you'll not
> actually know what an error was.  For example see req_sendfile in
> requestobject.c.
>   status=apr_file_open(&fd, fname,
>                          APR_READ, APR_OS_DEFAULT,
>                          self->request_rec->pool);
>   if (status != APR_SUCCESS) {
>         PyErr_SetString(PyExc_IOError, "Could not open file for reading");
>         return NULL;
>     }
> Using apr_strerror will give the actual error string which could be any
> number of things in this example (eg permission denied, file doesn't exist).
> The Apache documentation says you are supposed to do this sequence of
> calls in order to read a request body:
>   ap_setup_client_block()
>   if (ap_should_client_block()):
>          ap_get_client_block()
> modpython does the setup implicitly, saying that chunked uploads should
> be rejected.  The mp doc doesn't mention that it is always safe to call
> read(), since it calls should_client_block behind the scenes, so as a
> user of mp you can use it without wondering about transfer encodings,
> content length headers etc.  In my own module I exposed
> ap_setup_client_block and ap_should_client_block which means that from
> my Python code I have control over chunked (happy to accept it) and I
> can tell if a body was actually supplied or not.  (In HTTP/1.1 any
> request can have a body including GET).
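The apr_strerror point above can be illustrated with a Python sketch of the same principle (this is an illustration, not mod_python's actual code): report the operating system's real reason for the failure instead of a fixed message, just as apr_strerror would supply in the C code.

```python
import os

def open_for_reading(fname):
    # Include the real OS-level reason in the exception -- the
    # Python-level analogue of calling apr_strerror in C, instead
    # of the fixed "Could not open file for reading" string.
    try:
        return open(fname, "rb")
    except OSError as e:
        raise IOError("Could not open %s for reading: %s"
                      % (fname, os.strerror(e.errno)))
```

Calling this on a missing file then yields "Could not open ... : No such file or directory" rather than a message that hides whether the problem was permissions, a missing file, or something else.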



Probably nothing can be done to change mod_python though, as it would
then potentially be incompatible with old code: the semantics of
req.read() would need to change.
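The three-call sequence Roger quotes can be modelled in plain Python. This is only a simulation of the control flow, not the Apache API; the comments name the C calls each step mirrors, and chunked transfer coding is deliberately left out:

```python
def read_body(headers, stream):
    """Simulate ap_setup_client_block / ap_should_client_block /
    ap_get_client_block for a Content-Length delimited body."""
    # setup_client_block: decide how the body is delimited; this
    # sketch, like mod_python's implicit setup, rejects chunked.
    if "Transfer-Encoding" in headers:
        raise ValueError("chunked body not handled in this sketch")
    length = int(headers.get("Content-Length", 0))
    # should_client_block: tells us whether a body was supplied at all.
    if length == 0:
        return b""
    # get_client_block: read the body in fixed-size pieces.
    chunks, remaining = [], length
    while remaining > 0:
        block = stream.read(min(8192, remaining))
        if not block:
            break
        chunks.append(block)
        remaining -= len(block)
    return b"".join(chunks)
```

Exposing the first two steps separately, as Roger did, is what lets the handler decide for itself whether to accept chunked bodies and whether a body was actually present.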

> I found that the mp request write method accepts unicode strings
> provided they can be converted to string objects using the default encoding
> (this conversion happens due to using PyArg_ParseTuple with a format of
> "s").  Personally I think it is a really bad idea to accept unicode
> strings and have "magic" (the default encoding) convert them.  With this
> kind of thing I believe it is far better to require the user to convert
> to bytes explicitly choosing an encoding.
> modpython appears to make no effort to ensure that things don't crash if
> the python object representing a request outlives the request_rec that
> it is wrapping, although I do recall some part of the doc saying "don't
> do that".  However I think it is fair for python users to expect not to
> be able to coredump the process no matter what they do.  In my own
> module I solved this fairly trivially.  When wrapping a request_rec, do
> an extra incref.  Then use apr_pool_cleanup_register.  In that callback,
> set the wrapped pointer to NULL and do a decref.  That will cause the
> python object to live at least as long as the request_rec.  But if the
> python object is still alive after the request_rec has gone away, then
> the wrapped pointer will be NULL.  In all my methods, I check that for
> NULL first and raise an objectdead exception if it is NULL.
> Consequently no matter what happens, it is impossible to crash things.
> The same principle can be applied to other objects since by definition
> they only live as long as their pool.
> mp doesn't do OOR - original object return.  ie each time you get a new
> request/header/etc python level object that is wrapping an underlying
> Apache object, you get a different Python object.

You do get the same request object, but you are correct that for many of
the attributes you don't, and wrappers for server and connection objects
are also created on demand each time. Trying to cache values, especially
when they aren't all read-only and can be updated by other Apache
functions, just makes the job of writing a hand-coded wrapper even more
work, so I don't begrudge the original author for not doing it.
> This shouldn't really
> matter that much, but it would be nice if you got the same object back
> (ie with the same id).  It took wxPython a while to figure out how to do
> it, since you'd need to have some way of mapping from the C level object
> to a Python level object.  I think you can use pools again to do this
> such as by using apr_pool_userdata_set with appropriate keys.
> Finally (and this is "cosmetic") the code for setting members of
> requestobject has a lot of duplication.  For example there are data
> structures defining the names and types and then the code has additional
>  type checking and names.  I used PyGetSetDef's and tp_getset and it
> results in no duplication :-)

Too true, attribute and method lookup of objects is a mess.
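The original-object-return behaviour discussed above can likewise be sketched with a cache keyed on the identity of the underlying object. In C the cache could live in the request pool via apr_pool_userdata_set/get, as Roger suggests, and would be torn down with the pool; the names here are hypothetical:

```python
class HeadersWrapper:
    def __init__(self, raw):
        self.raw = raw

# In C: per-pool storage via apr_pool_userdata_set, cleared when the
# pool is destroyed so entries never outlive the request.
_wrappers = {}

def wrap_headers(raw):
    # Return the same wrapper for the same underlying object (OOR),
    # keyed on the identity of the wrapped structure.
    key = id(raw)
    if key not in _wrappers:
        _wrappers[key] = HeadersWrapper(raw)
    return _wrappers[key]
```

With this in place, wrapping the same underlying table twice hands back one Python object with one id, rather than a fresh wrapper on every access.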

