[mod_python] Session class or Memcache to reduce load?

Mike Looijmans nlv11281 at natlab.research.philips.com
Tue Aug 7 05:07:43 EDT 2007


Instead of trying to "fix" Apache and piling yet more workarounds on top of other workarounds, work
together with the client's browser.

The Last-Modified header should be derived from your database data, not from the thumbnail file.
From your description, I understand that the thumbnail files never change, but the database can change
the parameters for a thumbnail. (Even if thumbnails do change, give the updated file a new filename
instead of overwriting the old contents; that guarantees that thumbnail files never change.)

Store a last-modified date/time in the DB for the particular item. Then, when a request comes in 
from the client:
- Get the associated last-modified date from the database.
- Get the "If-Modified-Since" header that the client sent, if any.
- If the two dates match, just send back a "304 Not Modified" response. You're done.
- Otherwise, send the image data and be sure to add the Last-Modified header you got from the DB.
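
A minimal sketch of the decision in the steps above, assuming the handler has
already fetched the item's last-modified time from the database as a
timezone-aware UTC datetime. The function name conditional_response and its
arguments are illustrative and not part of mod_python, and the sketch uses the
Python 3 standard library rather than the Python 2 that mod_python ran on at
the time. It treats any client date at or after the database time as a match,
which is slightly more permissive than an exact comparison:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(if_modified_since, db_last_modified):
    """Return (status, headers) for a thumbnail request.

    if_modified_since: raw If-Modified-Since header value, or None.
    db_last_modified: timezone-aware UTC datetime read from the database
                      (an assumption of this sketch).
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = db_last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    client_time = None
    if if_modified_since:
        try:
            client_time = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            client_time = None  # unparseable header: send the image anyway
    if client_time is not None and client_time.tzinfo is None:
        # A date without a usable zone is treated as UTC here.
        client_time = client_time.replace(tzinfo=timezone.utc)

    if client_time is not None and last_modified <= client_time:
        return 304, headers  # client copy is current, no body needed
    return 200, headers      # send the image data with Last-Modified
```

In a handler, a 304 result means returning without a body; a 200 result means
setting these headers and then sending the file, for example with the
req.sendfile call mentioned in the quoted messages below.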

This will allow the data to be cached on the client's machine, which is much more efficient. It will 
also allow you to change to another image (with the same URL) without the need of a "purge" function.

Don't try to cache the file contents. The OS will probably do a better job at that. (For example,
MySQL's MyISAM engine relies on the OS to cache table data and only caches index data itself.)

Note that if you know in advance that the thumbnail will not change for some time, you can add an 
"Expires" header to the outgoing data. This will cause the client not to request an update until 
that date arrives (and you can still use the last-modified system in that case).
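
A small sketch of adding an Expires header alongside Last-Modified, under the
assumption that both times are timezone-aware UTC datetimes and using only the
Python 3 standard library. The helper name caching_headers and the seven-day
default are placeholders chosen for illustration:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def caching_headers(db_last_modified, now, max_age_days=7):
    """Build Last-Modified and Expires header values for a thumbnail.

    Both datetimes are assumed to be timezone-aware and in UTC.
    """
    expires = now + timedelta(days=max_age_days)
    return {
        # HTTP dates carry whole seconds only, so microseconds are dropped.
        "Last-Modified": format_datetime(
            db_last_modified.replace(microsecond=0), usegmt=True),
        "Expires": format_datetime(
            expires.replace(microsecond=0), usegmt=True),
    }
```

Until the Expires date passes, the browser may not ask at all, so a thumbnail
whose database entry changes in the meantime can stay stale on that client for
up to max_age_days.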


Mike Looijmans
Philips Natlab / Topic Automation


Alec Matusis wrote:
> Hi Colin, thanks for the tip.
> 
> I converted to req.sendfile, and then we run into issues with browser
> caching of these thumbnail images.
> 
> When I use req.sendfile, is there a way to retrieve the last modification
> time of the image file automatically (without using os.stat(image_path)) and
> send it to the client in Last-Modified header?
> 
> It looks like I would have to do this additional os.stat(image_path) per
> thumbnail retrieval to make correct use of If-Modified-Since client header
> to keep the browser cache current.
> 
> Alec.
> 
>> -----Original Message-----
>> From: Colin Bean [mailto:ccbean at gmail.com]
>> Sent: Monday, August 06, 2007 2:45 PM
>> To: Alec Matusis
>> Cc: mod_python at modpython.org
>> Subject: Re: [mod_python] Session class or Memcache to reduce load?
>>
>> Hey Alec,
>>
>> For starters, you should use req.sendfile instead of manually reading
>> / sending the file in Python.
>> I believe you'd still have to set the content type manually, so you'd
>> still have to determine that.  This looks like it would be an easy
>> change to implement, and hopefully will help your performance
>> immediately.
>>
>> Graham described a potentially faster method here (with the warning
>> that it's "slightly theoretical" -- I've not tried it, but reading the
>> thread might be helpful)
>> http://www.modpython.org/pipermail/mod_python/2007-July/024061.html
>>
>> -Colin
>>
>>
>> On 8/6/07, Alec Matusis <matusis at matusis.com> wrote:
>>>
>>>> -----Original Message-----
>>>> From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com]
>>>> Sent: Monday, August 06, 2007 2:25 AM
>>>>> Hello, I am sorry if my question is too basic. I would like to reduce the
>>>>> load on apache 2.0 (on 2.6.9 linux) that is running with prefork MPM.
>>>>> There are two main things that are causing the load:
>>>> How do you know that these two things are causing the load?
>>>>
>>> Because all this apache server (this physical machine) does is serve
>>> thumbnails.
>>>
>>>>> a) Thumbnail images that are requested repeatedly
>>>> Are you serving up the thumbnails from Python code dynamically or
>>>> allowing Apache to serve them up from a static file?
>>> Dynamically.
>>>
>>>>> b) A simple DB query is necessary to locate an image file after the
>>>>> request. The result of the query does not change. DB is located on
>>>>> another machine.
>>>>> My first question, do we need to cache thumbnail images at all, or is
>>>>> the file-caching by the OS sufficient?
>>>> OS file caching probably will not make much difference. Where one can
>>>> waste a lot of cycles though is by using Python code to serve up the
>>>> images. Thus, if Python is involved in serving up the images, how is
>>>> it being done?
>>>>
>>> The URI in the client's request contains an image code. Python queries the
>>> DB (on a separate machine) to convert the URI into the image file location,
>>> and then uses
>>>
>>>         f =  open(file_path)
>>>         im_data = f.read()
>>>         f.close()
>>>
>>> then it determines image type using PIL
>>>
>>>           im = Image.open(file_path)
>>>           im_type = im.format.lower()
>>>
>>> (Image.open() is a lazy operation in PIL, so I think it does not have to do
>>> much to determine the image format)
>>> Then python writes the data to the client
>>>
>>>         self.req.content_type = 'image/' + im_type
>>>         self.req.headers_out['Content-Length'] = str(len(im_data))
>>>         self.dict['bin_data'] = im_data
>>>
>>>
>>>>> Second question, to cache the results of the query, should we use
>>>>> mod_python's Session class (which will use DbmSession since we are using
>>>>> prefork MPM), or memcache?
>>>> Traditionally people use memcached for this.
>>>>
>>> Why? For performance reasons?
>>>
>>>>> For certain reasons in the application logic we cannot use apache's
>>>>> mod_cache.
>>>> What reasons?
>>>>
>>> Because sometimes the images are dynamically banned (flagged) by users, and
>>> in that case we need to render a special image. The DB query that we use
>>> gives None for the image file path when it's banned. So when the image
>>> status changes, we will need to dynamically purge the file path of the image
>>> that is cached.
>>>
>>>> Graham
>>> _______________________________________________
>>> Mod_python mailing list
>>> Mod_python at modpython.org
>>> http://mailman.modpython.org/mailman/listinfo/mod_python
>>>
> 