[mod_python] output filters: selection criteria and slowness

Graham Dumpleton grahamd at dscpl.com.au
Fri Jul 21 19:28:13 EDT 2006


On 22/07/2006, at 6:53 AM, Rob Miller wrote:

> hi,
>
> i know python pretty well, but i'm a mod_python (mod_*, really)  
> beginner, and i'm having some trouble getting an output filter to  
> work for me.  i'm trying to apply a filter to content that is being  
> served from a CMS and proxied through apache using RewriteRules.    
> i've got it working, sort of, but there are a couple of major snags  
> i've hit:
>
> - the content coming from the other system doesn't necessarily have  
> a file extension.  i can use 'SetOutputFilter" to apply the filter  
> to everything, but i don't WANT to apply it to images or other  
> binary data.  is there a way to use "AddOutputFilter" for files  
> with no extension?  or to use it as an extension blacklist, rather  
> than a whitelist?  or would this have to be done as logic in the  
> filter itself?

It is probably easier to make the determination in the filter itself.  
Rather than:

     try:
         streambuffer = filter.req.streambuffer
     except AttributeError:
         filter.req.streambuffer = StringIO()
         streambuffer = filter.req.streambuffer

have something like:

     try:
         streambuffer = filter.req.streambuffer
     except AttributeError:

         # first time into the filter for this request
         # pass on stuff we don't want to deal with

         if filter.req.notheme:

             # pass on if no theme

             filter.pass_on()
             return

         elif not filter.req.headers_out.has_key("content-type"):

             # pass on if no content type specified

             filter.pass_on()
             return

         elif not filter.req.headers_out["content-type"].startswith("text/html"):

             # pass on if not HTML

             filter.pass_on()
             return

         filter.req.streambuffer = StringIO()
         streambuffer = filter.req.streambuffer

Later on, you can also change:

         if filter.req.notheme:
             filter.write(streambuffer.getvalue())
         else:
             filter.write(appmap.publish(streambuffer.getvalue()))

to just:

          filter.write(appmap.publish(streambuffer.getvalue()))

as you have already passed on control to the next filter in the chain
for stuff you do not want.
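
The content-type test can also be pulled out into a small helper, which
makes the whitelist explicit and copes with values such as
"text/html; charset=utf-8". This is only a sketch: wants_filtering and
its accepted parameter are hypothetical names, not part of mod_python,
and it assumes you have already read the header value from
filter.req.headers_out (depending on how the response was generated,
the value may be in filter.req.content_type instead).

```python
def wants_filtering(content_type, accepted=("text/html",)):
    """Return True if a response with this Content-Type should be themed.

    Hypothetical helper: the name and the 'accepted' whitelist are
    illustrative only.
    """
    if not content_type:
        # No content type at all: leave the response alone.
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case
    # before comparing against the whitelist.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in accepted
```

In the filter, the two content-type branches would then reduce to one
check on wants_filtering(...), followed by filter.pass_on() and return
when it is false.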

> - when i apply the filter to static content coming from a hard  
> drive, it works very well.  when i apply it to content from the  
> CMS, however, it is extremely slow.  a single page can take  
> anywhere from 15 to 45 seconds to return.  (note that if i browse  
> directly to the CMS the page returns are also quite fast.) it seems  
> like a lot of information comes down right away but firefox churns  
> as though it's still waiting.  when i use wget, the page seems to  
> get requested over and over again, with wget never realizing it's  
> done.  my guess is it has something to do w/ the content-length  
> header, but i've deleted it from request before writing to the  
> filter object as shown in the examples i've found.

Add logging to your filter to track how long different things take.
I.e., sprinkle in:

   # needs "import time" at the top of the module
   filter.req.log_error("timestamp %d %f" % (filter.req.connection.id, time.time()))
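
If you log at several points, a label on each line makes the output
easier to read. A minimal sketch, assuming a hypothetical helper named
timestamp_message (not a mod_python function) whose result you pass to
filter.req.log_error():

```python
import time

def timestamp_message(label, connection_id, now=None):
    """Build a log line tagging a point in the filter with the time.

    'label' names the point being timed; 'now' defaults to the
    current time and exists mainly so the function can be tested.
    """
    if now is None:
        now = time.time()
    return "timestamp %s %d %f" % (label, connection_id, now)
```

For example, call
filter.req.log_error(timestamp_message("before publish", filter.req.connection.id))
just before the transformation and again with a different label just
after it, then compare the two times for the same connection id.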

If you want the content length put back, set up Apache to pass the
output through the "CONTENT_LENGTH" filter as well. Because you
accumulate all the data into one block, it will work. You could also
just calculate it yourself and add it back.
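
Calculating it yourself could look roughly like the following. This is
a sketch under assumptions: add_content_length is a hypothetical
helper, headers stands in for filter.req.headers_out, and body is the
final byte string you are about to write. Presumably the header has to
be set before the first filter.write() call.

```python
def add_content_length(headers, body):
    """Record the byte length of 'body' as the Content-Length header.

    'headers' is any dictionary-like object; 'body' must be the exact
    byte string that will be sent, after all transformation is done.
    """
    headers["Content-Length"] = str(len(body))
    return headers
```

In the filter that would be something like
add_content_length(filter.req.headers_out, output) followed by
filter.write(output), where output holds the result of
appmap.publish(...).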

> i'm using mod_python 3.1.4 and apache 2.0.55 from ubuntu dapper.
> the code of my filter is here:
> http://codespeak.net/svn/z3/deliverance/branches/namespaced/mpfilter.py
>
> anyone have suggestions, or pointers to docs, that might help me?

Reading the general Apache filter FAQ may or may not be useful. It
isn't mod_python specific, but it explains how filters work underneath.

   http://www.projectcomputing.com/resources/apacheFilterFAQ/

Graham

