[mod_python] Some observations after writing my own modpython

Wed May 30 20:22:27 EDT 2007

On 30/05/07, Roger Binns <rogerb at rogerbinns.com> wrote:
> Graham Dumpleton wrote:
> > Maybe you would like to review my code for me. :-)
> >
> >  http://www.modwsgi.org
>
> I've taken a quick look but am not too familiar with what it is doing.

Wow, I suggested that somewhat 'tongue in cheek', if you understand
the expression. In other words, I never actually expected to hear
anything back from you about it. Many thanks.

> (Pretty much all the Python wrapping of other libraries I do is to
> deliberately ignore Python "standards" and follow what the libraries do
> instead :-)

For Apache, I definitely feel that is the best approach. The way
mod_python wraps it differently is a pain, as you can't just point
people at the Apache documentation.

> I am somewhat baffled as to why you have the "daemon" mode.  Surely
> ProxyPass would be sufficient.  And if it wasn't then fastcgi/scgi
> should be.

The complaints one keeps seeing about fastcgi/scgi are that they are a
pain to set up, both because you need to install separate backend
packages and because of the configuration. You also see various
complaints that those processes will either just die or hang around
and not die.

The whole point of having such a feature in mod_wsgi is that you
don't have to install a separate backend framework, that the exact
same script file as for embedded mode can be used, and that the
configuration is as simple as possible. Also, because Apache creates
and manages the daemon processes just like any other Apache child
process, you are guaranteed that you will not get daemon processes
hanging around and causing problems, plus Apache can restart them
automatically if they die.
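As a minimal sketch, assuming mod_wsgi's default of looking for a
module-level callable named `application`, this is the kind of script
file that could be loaded unchanged in either embedded or daemon mode.
Nothing in it depends on which mode is in use.

```python
def application(environ, start_response):
    """Minimal WSGI application with nothing specific to embedded or daemon mode."""
    body = b"Hello from mod_wsgi\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # WSGI applications return an iterable of byte strings.
    return [body]
```

For a quick local check outside Apache, the same callable can be
served with the standard library's `wsgiref.simple_server.make_server`.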

Overall, the intent is to make it as secure, simple and uncomplicated
as possible for web hosting companies so they can see it as a viable
option for hosting Python web applications.

This doesn't mean it will be suitable for all use cases, but it goes a
lot further than any other single solution. The only real case where
it still may not be completely suitable is where a user wants to be
able to run a different version of Python from the one the web
hosting company wants to provide. This is where a ProxyPass or
fastcgi/scgi solution would work, but if using ProxyPass, you could
equally proxy to a second Apache instance, running as the user, which
itself runs mod_wsgi in embedded mode.

> I'm astounded at how much effort you have put into being compatible
> across so many Apache versions and packaging tweaks on the various
> distributions.
>
> You've also got a lot of options and complexity for those options, such
> as optionally disabling protection against stdio usage and signals.  My
> own tastes are towards insisting on standards compliance and not have
> more code (and hence more complexity and possibility for bugs).

The problem is that WSGI as a standard is incomplete. There is nothing
in the standard to say that applications shouldn't be using things
like standard input and output, but if they do, the WSGI application
isn't strictly portable. Thus I enforce certain rules which promote
portability, but you still have to provide a way out for people
using a non-portable application. Because so many WSGI-capable
applications weren't originally designed for WSGI but were made to
work on it later, there are plenty of non-portable WSGI applications,
or applications which do nasty things that interfere with Apache.
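As a sketch of what a portable application looks like in this
respect, the version below reads the request body from
`environ['wsgi.input']` and logs to `environ['wsgi.errors']` (both
defined by the WSGI specification) instead of touching `sys.stdin` or
`sys.stdout` directly. It does not attempt to reproduce the specific
restrictions mod_wsgi enforces.

```python
def application(environ, start_response):
    """Echo the request body using only WSGI-provided streams (no sys.stdin/sys.stdout)."""
    errors = environ["wsgi.errors"]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    # Read the body from wsgi.input, never from sys.stdin.
    body = environ["wsgi.input"].read(length) if length > 0 else b""
    # Log via wsgi.errors rather than print(), which would write to sys.stdout.
    errors.write("echo: received %d bytes\n" % len(body))
    start_response("200 OK", [
        ("Content-Type", "application/octet-stream"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```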

Anyway, again, it is all about providing a system which is going to be
safe to use in shared web hosting environments where users can't
create havoc. At the same time, it is configurable enough that people
running their own web servers can change the behaviour if need be.

> At line 729 after the PyErr_Fetch, I believe you should use
> PyErr_Normalize to set unset value/tracebacks.

I presume you mean PyErr_NormalizeException(). Okay, I didn't know
about that function, so I will need to look into it.

> The Log_output function has lots of scary string arithmetic and memory
> allocations.  Wouldn't the various apr_str functions do the trick?

There are no APR functions in Apache 1.3. It does have some
equivalents as ap_ functions, but not as many. Also, the apr_ functions
mean you are using the Apache memory pools. I didn't want to use the
memory pools, as then the cumulative memory used is held until the end
of the request, whereas using malloc/free means I only hold memory for
as long as I need it.

> The if(*msg) in Log_write is redundant since format "s" to
> PyArg_ParseTuple will always fill in the char*.

True. Probably my paranoid programming style, and never remembering
that 's' will not accept None the way 'z' does. :-)

> You may find it worthwhile using PyObject_CallFunction instead of
> BuildValue and CallObject.  You have one less thing to track refcounts
> for and one less line of code.

I probably just did it that way because that is what the basic Python
documentation examples use, so I have always done it like that.

> Input_read also has the string arithmetic and memory allocations.  Why
> not use the Python PyString or apr routines for all of those?

No apr routines, for the same reasons as with log output. I do use
PyString objects for part of it, but only as the final buffer, which I
copy into using traditional memory routines.

> For the various objects that have a request_rec*, I don't see how they
> deal with outliving the request_rec.

They don't, and since I keep pushing the idea that mod_wsgi is meant
to be as secure and robust as possible to satisfy web hosting
companies, maybe I should make them.

Strictly speaking though, a WSGI application should not be retaining
references to the WSGI environment, the start_response function or
anything else provided as part of the request. If it does, then it is
incorrect.

> A lot of the config values will accept nonsensical values for booleans
> because the config routines effectively do a caseless compare for "off"
> and treat everything else as on, including "0", "1" etc.  I'd suggest
> tightening that up to only accept the values you want rather than
> anything.  (You know someday someone will make a configuration mistake
> and get unexpected behaviour and take ages to track down that "0" didn't
> actually turn something off or "1" didn't turn it on).

This just follows what all the core Apache modules do, i.e., they all
have very loose checking on configuration values for flags.
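For comparison, a Python sketch of the two behaviours under
discussion. The function names are hypothetical and this is not
mod_wsgi's or Apache's actual code: `parse_flag_loose` mirrors the
behaviour Roger describes (only "off" disables), while `parse_flag`
is the stricter check he suggests, accepting only On or Off in any
case.

```python
def parse_flag_loose(value):
    """Loose behaviour as described: only 'off' (any case) disables, anything else enables."""
    return value.strip().lower() != "off"


def parse_flag(directive, value):
    """Strict behaviour: anything other than 'on' or 'off' (any case) is an error."""
    lowered = value.strip().lower()
    if lowered == "on":
        return True
    if lowered == "off":
        return False
    raise ValueError("%s must be 'On' or 'Off', not %r" % (directive, value))
```

With the loose version, "0" silently turns a setting on; with the
strict version the same typo is reported as a configuration error.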

> > Trying to cache values,
> > especially when they aren't all read only and can be updated by other
> > Apache functions, just makes the job of writing a hand coded wrapper
> > even more work so I don't begrudge the original author for not doing
> > it.
>
> For OOR you just have to stash the Python object pointer somewhere in
> the Apache object and so the pool note functions would work.  I don't
> see anywhere that values are cached, unless you mean something like
> getting the same PyString objects out when setting a setting field.
> That certainly is going too far.

Maybe I misunderstand what you are talking about, or to what level you
are applying it. What I thought you meant is that if one accesses an
attribute of request_rec, such as the content type, you create a
Python string object and return it, but you also remember that Python
string instance, so that the next time the content type is accessed in
the same request, the held instance is returned instead of a new one
being created. The problem I saw with doing this is that calling
ap_set_content_type() changes the content type in the request, and
thus you have to know when things like this can change values so you
can discard the cached value. I'll have to go back and reread what
you said about OOR.
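A hypothetical Python sketch of the caching scheme as I understood it,
to show where the bookkeeping burden lies. `FakeRequestRec` stands in
for the C request_rec, and `set_content_type` stands in for a call
such as ap_set_content_type(); neither is a real mod_wsgi or
mod_python API.

```python
_UNSET = object()  # sentinel meaning "nothing cached yet"


class FakeRequestRec:
    """Hypothetical stand-in for the C request_rec structure."""

    def __init__(self, content_type):
        self.content_type = content_type


class RequestWrapper:
    """Hypothetical wrapper that caches the Python object returned for content_type."""

    def __init__(self, rec):
        self._rec = rec
        self._content_type_cache = _UNSET

    @property
    def content_type(self):
        if self._content_type_cache is _UNSET:
            # First access in this request: remember the object we hand out.
            self._content_type_cache = self._rec.content_type
        # Later accesses return the same cached object.
        return self._content_type_cache

    def set_content_type(self, value):
        # Stand-in for ap_set_content_type(): change the field AND discard the cache.
        self._rec.content_type = value
        self._content_type_cache = _UNSET
```

If anything changes `rec.content_type` without going through the
wrapper (as another Apache function could), the wrapper keeps
returning the stale cached value.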

Anyway, thanks greatly for reviewing my code and commenting on it.
Just about everyone grabs the code and uses it, rather than digging
into what it does and giving feedback on how to make it better.

Graham