[mod_python] mod_python 3.3.1 memory leak?

Graham Dumpleton graham.dumpleton at gmail.com
Tue Jul 3 08:22:39 EDT 2007


On 03/07/07, yubing <trueice at gmail.com> wrote:
> That's great of you :)
> I've tried a PHP scriptlet that pumps a file to the user; it has the same
> memory behavior as the Python pumper.
> The pool-allocated memory is only released when the Apache process exits
> (reaches MaxRequestsPerChild).

No. The pool here is associated with the particular request, not the
whole process, so it is released and available for reuse at the end of
the request. It has nothing to do with MaxRequestsPerChild.
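
To illustrate the pool semantics (a sketch of generic APR usage only,
not actual mod_python code):

    /* Memory taken from a pool has no matching free() call. It is
       handed back only when the pool itself is cleared or destroyed,
       which for r->pool means at the end of the request. */
    char *block = apr_palloc(r->pool, 4096);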

> However, our streaming code's memory usage (VmRSS) grows too fast within a
> single request (maybe 16k per second).
> Is mod_python using such a pooled allocation mechanism during one request?

Not sure what to suggest at the moment, bar serving it from a different
port or host with a custom Python web server for just that one task.

I can only presume that this problem exists because mod_python is
using a high level API for writing response data. It is quite possible
that a custom C handler which used the bucket API directly could
control this better and ensure that the bucket structures are released
straight away after the data is flushed.
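
For example, something along these lines might do it. This is only an
untested sketch; write_block() is a made up name, and the brigade is
created once per request rather than once per write:

    static apr_status_t write_block(request_rec *r, apr_bucket_brigade *bb,
                                    const char *data, apr_size_t length)
    {
        conn_rec *c = r->connection;
        apr_bucket *b;

        /* bb is assumed to have been created once, up front, with:
         *   bb = apr_brigade_create(r->pool, c->bucket_alloc);
         */
        b = apr_bucket_transient_create(data, length, c->bucket_alloc);
        APR_BRIGADE_INSERT_TAIL(bb, b);
        b = apr_bucket_flush_create(c->bucket_alloc);
        APR_BRIGADE_INSERT_TAIL(bb, b);

        if (ap_pass_brigade(r->output_filters, bb) != APR_SUCCESS)
            return APR_EGENERAL;

        /* Empty the brigade so its buckets are returned for reuse
         * straight away, rather than a new brigade being left behind
         * in r->pool on every flush as ap_rflush() does. */
        apr_brigade_cleanup(bb);

        return APR_SUCCESS;
    }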

Anyway, mod_wsgi suffers the same problem, so I might have to dig into
the lower level bucket API for a solution so that I can at least make
mod_wsgi work properly. From that, I might understand how to fix
mod_python.

Sleep time for me now, but maybe I'll come up with some bright idea
during the night. :-)

Graham

>  On 7/3/07, Graham Dumpleton <graham.dumpleton at gmail.com> wrote:
> > All I can say is that this is simply how Apache works. It is not a
> > problem with mod_python. To try and explain it, look at the simpler
> > code of ap_rflush(), which is one of the two Apache functions called
> > when you call req.write().
> >
> > AP_DECLARE(int) ap_rflush(request_rec *r)
> > {
> >     conn_rec *c = r->connection;
> >     apr_bucket_brigade *bb;
> >     apr_bucket *b;
> >
> >     bb = apr_brigade_create(r->pool, c->bucket_alloc);
> >     b = apr_bucket_flush_create(c->bucket_alloc);
> >     APR_BRIGADE_INSERT_TAIL(bb, b);
> >     if (ap_pass_brigade(r->output_filters, bb) != APR_SUCCESS)
> >         return -1;
> >
> >     return 0;
> > }
> >
> > The important thing here is r->pool. This is a memory pool which
> > exists for the life of the request. When you allocate memory from the
> > pool, it will only be reclaimed at the end of the request. I.e., it
> > isn't like malloc/free, where you can give back a block of memory
> > using free and have it be reusable straight away.
> >
> > Now, every time a flush occurs, a bucket brigade holding a special
> > flush bucket has to be created. This is then passed down through the
> > output filters, and causes any pending data to be flushed out.
> >
> > Because the memory for those bucket structures is only reclaimed at
> > the end of the request, a very long running request which outputs
> > data in small blocks will show an incremental use of memory from the
> > need to keep creating the bucket structures. That memory is
> > unavailable for use by anything else until the request ends.
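> >
> > To put rough numbers on it: your test script flushes about ten times
> > a second, and if each flush ties up even a few hundred bytes of pool
> > memory for the brigade and bucket structures, that works out at a
> > couple of kilobytes a second. That is within an order of magnitude of
> > the growth you are seeing, and it also suggests that writing larger
> > blocks less often would at least slow the growth. The per-flush cost
> > is only a guess on my part though.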
> >
> > FWIW, I didn't realise Apache did this either. I can see there are
> > potentially good reasons for it being done this way, but it still
> > came as a surprise.
> >
> > Graham
> >
> > On 03/07/07, yubing <trueice at gmail.com> wrote:
> > > Oh, sorry, that was a typo in my mail; it should have been:
> > > ------------------------------
> > > import time
> > >
> > > def pump_file(req):
> > >     fp = open("/dev/zero", "r")
> > >     while True:
> > >         buf = fp.read(4096)
> > >         try:
> > >             req.write(buf)
> > >             time.sleep(0.1)
> > >         except:
> > >             fp.close()
> > >             break
> > > -------------------------------
> > > You can still observe the memory going up slowly.
> > > BTW, I'm using the prefork MPM of Apache.
> > >
> > > On 7/3/07, Graham Dumpleton <graham.dumpleton at gmail.com> wrote:
> > > > On 03/07/07, yubing <trueice at gmail.com> wrote:
> > > > > Our project has a live HTTP video streamer written in Python,
> > > > > which keeps pumping a stream out to the client.
> > > > > The HTTP serving module is a simple mod_python request handler
> > > > > running on Apache 2.2.4 with mod_python 3.3.1 (Python 2.5.1).
> > > > > The stream is read out of our streaming server via a TCP socket,
> > > > > and the Python script just does some simple processing like
> > > > > header building; each allocated buffer is del-ed after being used.
> > > > >
> > > > > The problem is:
> > > > > We observed that after it has been running for several hours, its
> > > > > memory usage grows to several hundred megabytes and keeps growing
> > > > > in 4k-8k increments every 1-2 seconds.
> > > > >
> > > > > Below is a simple testing scriptlet. The memory leaking issue is
> > > > > not as serious as with our live serving module, but you can still
> > > > > observe 4k of memory growth every several seconds.
> > > > >
> > > > > Could anyone help me figure out the root cause of this issue?
> > > > >
> > > > > --------------------------
> > > > > import time
> > > > >
> > > > > def pump_file(req):
> > > > >     while(True):
> > > > >         fp = open("/dev/zero", "r")
> > > > >         buf = fp.read(4096)
> > > > >         try:
> > > > >             req.write(buf)
> > > > >             del buf
> > > > >             time.sleep(0.1)
> > > > >         except:
> > > > >             fp.close()
> > > > >             break
> > > >
> > > > BTW, you do realise that you open the file in the loop when it should
> > > > be outside.
> > > >
> > > > Graham
> > > >
> > >
> > >
> > >
> > > --
> > > truly yours
> > > ice
> >
>
>
>
> --
> truly yours
> ice

