[mod_python] Some observations after writing my own modpython

Graham Dumpleton graham.dumpleton at gmail.com
Tue Jun 5 18:45:02 EDT 2007


On 06/06/07, Roger Binns <rogerb at rogerbinns.com> wrote:
> > The complaint one keeps seeing about fastcgi/scgi is that it is a pain
> > to set up, both because you need to install separate backend
> > packages and because of the configuration.
>
> Can't that be fixed by better documentation/installers?

Not all authors are dedicated enough to write really good
documentation, and the usual attitude is: why document a project
written by someone else? Most will spend just enough time to get it
working for themselves and leave it at that. Even more seem not to
bother reading what documentation exists and will instead refer to
some arbitrary person's comments on a blog as to how they got it
working.

> > You also see various
> > complaints that those processes will just die, or will hang around
> > and not die.
>
> I assume that is bugs in wsgiref or one of the other servers.

Nothing to do with wsgiref, as they aren't Python-specific solutions.
The problem is more to do with how the supervisor mechanisms are
implemented for both those packages, or not implemented, as the case
may be. In mod_wsgi I use the Apache supervisor mechanism which,
because it is a part of Apache, should provide more of a guarantee
that things get cleaned up properly on shutdown and restart.

> > Overall, the intent is to make it as secure, simple and uncomplicated
> > as possible for web hosting companies so they can see it as a viable
> > option for hosting Python web applications.
>
> I'd strongly recommend putting that paragraph at the top of the modwsgi
> home page.  A problem with a lot of open source projects is that they
> don't say what they are really good at.

True, I acknowledge I have a lot more work to do on documentation and
I do intend to do it. :-)

I also intend to approach some web hosting companies to see what they
think they need in order to better support Python, whether that be
features or simple documentation.

> > Anyway, again, it is all about providing a system which is going to be
> > safe to use in shared web hosting environments where users can't
> > create havoc.
>
> def handler(...):
>    while True:
>       pass

Okay, less havoc. I should really phrase it as being 'where
administrators have more control over the users' applications'. The
problem with mod_python, for example, is that even the entry points
sit outside of the Apache access/auth/authz mechanisms, plus having
full access to request details means one can do password harvesting
and other nasty things. It is also too easy in mod_python to force
some of your code to run in a distinct Python interpreter where code
belonging to someone else is running, and thus interact with that
other person's code.

> > Also, the apr_ functions
> > mean you are using the Apache memory pools. I didn't want to use the
> > memory pools as then the cumulative memory used is held until the end
> > of the request, whereas using malloc/free means only hold memory for
> > just the period I need to.
>
> I can't imagine requests live that long to matter.  Also if you are
> talking premature optimizations, I believe it is far more effective to
> have the one free called that gets rid of the whole pool rather than
> individual frees.  The main thing that seemed weird in the code is that
> it was doing these precise allocations whereas I'd expect Python or apr
> to do allocations larger than you requested so that when you add more
> chunks of data the space is likely already there.

It is not the length of time for the request that matters, it is the
amount of data. If one uses Apache pools, one can't give memory back
to the pool so that it can be reused straight away within the
lifetime of the pool, i.e., one can't free it back to the pool.
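
As a rough illustration of that difference, here is a toy model in
Python. It is only a sketch: ToyPool, ToyHeap and process_chunks are
made-up names, not APR or mod_wsgi code, and the byte counters merely
stand in for real allocations.

```python
class ToyPool:
    """Toy stand-in for a pool: memory is only released by destroy()."""

    def __init__(self):
        self.held = 0  # bytes currently held by the pool

    def alloc(self, size):
        self.held += size
        return bytearray(size)

    def destroy(self):
        self.held = 0


class ToyHeap:
    """Toy stand-in for malloc/free: each block can be released on its own."""

    def __init__(self):
        self.held = 0  # bytes currently allocated
        self.peak = 0  # highest value 'held' ever reached

    def alloc(self, size):
        self.held += size
        self.peak = max(self.peak, self.held)
        return bytearray(size)

    def free(self, block):
        self.held -= len(block)


def process_chunks(chunk_sizes):
    """Handle each chunk once with both allocators and compare memory held."""
    pool, heap = ToyPool(), ToyHeap()
    for size in chunk_sizes:
        pool.alloc(size)          # stays allocated until the pool is destroyed
        block = heap.alloc(size)  # held only while the chunk is in use
        heap.free(block)
    return pool.held, heap.peak


pool_held, heap_peak = process_chunks([64 * 1024] * 100)
print("pool holds", pool_held, "bytes; heap peaked at", heap_peak, "bytes")
```

In this model the pool's total grows with the amount of data handled,
while the malloc/free style never holds more than one chunk at a time.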

Also, the log objects aren't just used for 'wsgi.errors' for the
life of a request; they are also used as replacements for
'sys.stdout' and 'sys.stderr'. One can't use an Apache pool here as
those log objects live for the life of the interpreter (generally the
process).

So, Apache pools are pretty well out unless one wants to add more code
complexity on top to deal with reusing memory etc. Even then there is
still the problem of what to do with extra large chunks of memory if
they have to be allocated.

One could feasibly make use of stuff in Python to replace what I did
and I do use Python string objects at points where it is easy and
makes sense. The only point I used malloc/free was where I needed to
retain buffered data between calls. For most cases where I did this,
the likelihood of the code actually being triggered is low as they
aren't on the most commonly used path. Thus for now it was a case of
getting something working. So yes it could be changed, but at this
point there is no pressing need.
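
To show what retaining buffered data between calls can look like when
done with Python strings, here is a minimal sketch (Python 3) of a
file-like log object that holds back a partial line until the newline
arrives. LineBufferedLog and its emit callback are hypothetical names
for illustration, not the mod_wsgi log object.

```python
class LineBufferedLog:
    """File-like object that passes only complete lines to 'emit'."""

    def __init__(self, emit):
        self._emit = emit    # callable that receives one complete line
        self._pending = ""   # partial line retained between write() calls

    def write(self, data):
        self._pending += data
        # Everything before the last newline is complete; the remainder
        # (possibly an empty string) is kept for the next call.
        *complete, self._pending = self._pending.split("\n")
        for line in complete:
            self._emit(line)

    def flush(self):
        # Push out whatever partial line is still buffered.
        if self._pending:
            self._emit(self._pending)
            self._pending = ""


lines = []
log = LineBufferedLog(lines.append)
log.write("partial ")
log.write("line\nnext")
print(lines)   # only the completed line has been emitted so far
log.flush()
print(lines)   # the remainder is emitted on flush
```

An object like this could be assigned to sys.stdout or sys.stderr, in
which case it has to live as long as the interpreter does.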

> >> For the various objects that have a request_rec*, I don't see how they
> >> deal with outliving the request_rec.
> >
> > They don't and since I keep pushing this idea that mod_wsgi is meant
> > to be as secure and robust as possible to satisfy web hosting
> > companies, maybe I should.
>
> reqs=[]
>
> def handler(req):
>    reqs.append(req)
>    [r.uri for r in reqs]
>
> That code will cause a core dump :-)

I have already updated mod_wsgi to avoid such problems.
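
One way such a guard can work, sketched in pure Python: the wrapper
drops its reference to the underlying record when the request ends, so
later access raises an exception instead of touching freed memory.
This is an assumption about the general technique, not a description
of the actual mod_wsgi change; RequestWrapper, RequestExpired and
serve are hypothetical names, and a dict stands in for the
request_rec.

```python
class RequestExpired(RuntimeError):
    """Raised when a request wrapper is used after its request finished."""


class RequestWrapper:
    def __init__(self, rec):
        self._rec = rec  # stands in for the request_rec pointer

    def invalidate(self):
        # Called once the request is over; the record must not be used again.
        self._rec = None

    @property
    def uri(self):
        if self._rec is None:
            raise RequestExpired("request object used after request finished")
        return self._rec["uri"]


reqs = []


def handler(req):
    # Same shape as the quoted example: keep every request ever seen.
    reqs.append(req)
    return [r.uri for r in reqs]


def serve(uri):
    """Hypothetical server loop body: wrap, dispatch, then invalidate."""
    req = RequestWrapper({"uri": uri})
    try:
        return handler(req)
    finally:
        req.invalidate()


print(serve("/first"))        # the request is still live here
try:
    serve("/second")          # touches the stale first request
except RequestExpired as exc:
    print("caught:", exc)
```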

> > Maybe I misunderstand what you are talking about, or to what level you
> > are applying it. What I thought you meant is that if one accesses an
> > attribute of request_rec, such as content type, you create a
> > Python string object and return it, but you also remember that
> > Python string instance so the next time content type is accessed in
> > the same request, the Python string instance being held is returned
> > instead of having to create a new one.
>
> OOR is not referring to individual attributes and "small" types such as
> string and integer but rather objects such as what wraps the
> request_rec, server, apr_tables etc.

Which is already done for request_rec and, because of caching, also for
the tables such as subprocess_env, notes, headers etc. within a
mod_python request object. It isn't done for server_rec and conn_rec
within a request object, however, as I pointed out. Thus I think there
is perhaps more OOR going on in mod_python than may be obvious.
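
The difference between a cached and an uncached wrapper can be shown
with a small Python sketch. All class names here are made up for
illustration and none of this is mod_python source; it only models a
request object that either remembers the wrapper for its server record
or builds a new one on every access.

```python
class ServerRec:
    """Stands in for the underlying server_rec structure."""

    def __init__(self, hostname):
        self.hostname = hostname


class RequestRec:
    """Stands in for the underlying request_rec structure."""

    def __init__(self, server):
        self.server = server


class ServerWrapper:
    """Python-level wrapper around a ServerRec."""

    def __init__(self, rec):
        self._rec = rec


class CachingRequest:
    """Creates the server wrapper once and hands back the same object."""

    def __init__(self, rec):
        self._rec = rec
        self._server = None

    @property
    def server(self):
        if self._server is None:
            self._server = ServerWrapper(self._rec.server)
        return self._server


class NonCachingRequest:
    """Builds a fresh server wrapper on every attribute access."""

    def __init__(self, rec):
        self._rec = rec

    @property
    def server(self):
        return ServerWrapper(self._rec.server)


rec = RequestRec(ServerRec("example.invalid"))
cached, uncached = CachingRequest(rec), NonCachingRequest(rec)
print(cached.server is cached.server)      # same wrapper each time
print(uncached.server is uncached.server)  # a new wrapper each time
```

With the caching variant, anything user code attaches to the wrapper
is still there on the next access within the same request.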

> > Anyway, thanks greatly for reviewing my code and commenting on it.
> > Just about everyone grabs the code and uses it, rather than digging
> > into what it does and giving any feedback on how to make it better.
>
> You are welcome.  It was interesting for me too.

FWIW, I have gone through and made some changes to mod_wsgi based on
your comments. Some I see as much lower priority and will only revisit
if there appears to be a great need for it.

So, many thanks again for your feedback.

Graham

