[mod_python] Output filters for large files

Graham Dumpleton graham.dumpleton at gmail.com
Sat Jun 23 07:24:28 EDT 2007


On 22/06/07, Robin Haswell <me at robhaswell.co.uk> wrote:
> Hi there
>
> I'm having issues writing an output filter that can handle large files.
> My filter's primary purpose is to count the length of the output. Could
> someone assist me please?
>
> The code I have works very well except when a large file is passed
> through it, at which point it eats up memory, probably equal to the size
> of the file.
>
> Any assistance would be much appreciated.
>
> Here is my filter so far:
>
> def outputfilter(filter):
>
>     s = filter.read(4096)
>     bytes = 0
>     while s:
>         filter.write(str(s))
>         bytes += len(s)
>         s = filter.read(4096)
>
>     if s is None:
>         filter.close()

You would not normally pass a size argument to filter.read(). Instead,
let Apache hand you the data in whatever size it was written out by a
handler or read in from a file.

By asking for a fixed size you force Apache to either break the data up
or accumulate it so that it can deliver chunks of the size you want.
The result is an increase in the amount of memory used, which is what
you are seeing.

Also note that your output filter can be called more than once for a
single response, so you have to store your byte count in the request
object to preserve it between calls.

Thus:

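from mod_python import apache

def outputfilter1(filter):

    # The filter may run several times per request, so keep the
    # running total on the request object rather than in a local.
    if not hasattr(filter.req, 'mybytecount'):
        filter.req.mybytecount = 0

    # Read whatever Apache has available, without forcing a size.
    s = filter.read()
    while s:
        filter.req.mybytecount += len(s)
        filter.write(s)
        s = filter.read()

    # None signals end of stream.
    if s is None:
        filter.close()
        # ... do something with byte count

    return apache.OK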
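
As a rough illustration of why the count has to live on the request
object, here is a sketch that drives the same counting logic twice with
stand-in objects outside Apache. FakeRequest and FakeFilter are
hypothetical test doubles, not part of mod_python, and the convention
that read() returns an empty string when the current call has run out
of data is an assumption made for the sketch.

```python
class FakeRequest(object):
    """Hypothetical stand-in for the request object; it only holds state."""
    pass


class FakeFilter(object):
    """Hypothetical stand-in for the filter object passed to an output filter."""

    def __init__(self, req, chunks, eos):
        self.req = req
        self._chunks = list(chunks)
        self._eos = eos        # True only for the final invocation
        self.written = []
        self.closed = False

    def read(self):
        if self._chunks:
            return self._chunks.pop(0)
        # Assumed convention: None at end of stream, '' otherwise.
        if self._eos:
            return None
        return ""

    def write(self, data):
        self.written.append(data)

    def close(self):
        self.closed = True


def counting_filter(filter):
    # Same logic as the filter above, minus the Apache return code.
    if not hasattr(filter.req, 'mybytecount'):
        filter.req.mybytecount = 0

    s = filter.read()
    while s:
        filter.req.mybytecount += len(s)
        filter.write(s)
        s = filter.read()

    if s is None:
        filter.close()


# Two invocations for one request: the total survives because it is
# stored on the shared request object.
req = FakeRequest()
first = FakeFilter(req, ["a" * 8000, "b" * 192], eos=False)
last = FakeFilter(req, ["c" * 1000], eos=True)
counting_filter(first)
counting_filter(last)
print(req.mybytecount)
```

A local variable initialised to 0 at the top of the function would be
reset on the second invocation, so it would only ever report the size
of the last batch of data.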

BTW, why do you want to do this? The request object already has a
bytes_sent attribute, which Apache updates as data is written and which
holds the final content length once the response is complete. What you
want to do with the value will determine which mechanism you use to
register a handler that can access it after everything is done.
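
For example, if all you want is to record the value, one possibility is
a log phase handler, since that runs after the response has been sent.
The following is only a sketch: the function names and the message
format are made up for illustration, and it assumes the handler is
registered with something like "PythonLogHandler mymodule::loghandler",
where mymodule is a placeholder for your own module.

```python
def describe_bytes_sent(req):
    # Only reads two attributes of the request object: uri and bytes_sent.
    return "%s: %d bytes sent" % (req.uri, req.bytes_sent)


def loghandler(req):
    # Imported here so the helper above can be tried outside Apache.
    from mod_python import apache

    # Write the figure to the Apache error log at notice level.
    req.log_error(describe_bytes_sent(req), apache.APLOG_NOTICE)
    return apache.OK
```

Keeping the formatting in a separate helper means it can be checked
with any object that has uri and bytes_sent attributes, without a
running Apache.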

Graham

