[mod_python] Protecting Web apps from too many simultaneous clicks/Hacking

Byron Ellacott bje at apnic.net
Mon May 17 10:24:18 EDT 2004


On Fri, 2004-05-14 at 23:17, SAiello at Jentoo.com wrote:
> I have already used some caching. When listing the contents of an email box I 
> cache the folder list, the message's ID in sorted order, and the current 
> window (i.e. 1-20) message's 'from, subject, date'. As I continue, I am not 
> sure where I am going to add more caching. The design of the email solution 

I'd cache the results of most IMAP queries.  Thus, you get a relatively
slow response for the first access to any piece of information, but
relatively snappy for subsequent accesses (think: moving backwards and
forwards through the message list).
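
As a rough sketch of that kind of cache (Python 3, not code from the
application under discussion): QueryCache and fake_imap_fetch are
hypothetical names, and the fetch function only stands in for a real
IMAP round trip.  Results are keyed by the query arguments, so only the
first request for a given window of messages takes the slow path.

```python
class QueryCache:
    """Remember the result of each distinct query (hypothetical helper)."""

    def __init__(self, fetch):
        self._fetch = fetch      # the slow function, e.g. an IMAP query
        self._results = {}       # query arguments -> cached result
        self.misses = 0          # how many times the slow path was taken

    def get(self, *query):
        if query not in self._results:
            self.misses += 1
            self._results[query] = self._fetch(*query)
        return self._results[query]


def fake_imap_fetch(folder, first, last):
    """Stand-in for a real IMAP query; returns made-up header strings."""
    return ["%s message %d" % (folder, n) for n in range(first, last + 1)]


cache = QueryCache(fake_imap_fetch)
page_one = cache.get("INBOX", 1, 20)    # slow: goes to the "backend"
page_two = cache.get("INBOX", 21, 40)   # slow: a new window
back_again = cache.get("INBOX", 1, 20)  # fast: served from the cache
```

A real cache would also need to be invalidated when new mail arrives or
messages are deleted; that is left out here.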

> (webmail and server backends) is designed to have many small nodes running in 
> parallel. It is not like I am going to have one IMAP server, with alot of web 
> frontends hitting it. The design is to have a local 'IMAP frontend' on every 
> web frontend. Thus, the 'IMAP frontends' contact the 'IMAP backends'. The 
> communication between the IMAP frontends and backends would handle individual 
> message caching. Also, it will probably do it alot more efficiently than I 
> can.

The trouble is that communication between the frontends and backends
probably has a lot of expensive overhead.  I'm making a few assumptions
about how your system works here that may be invalid.  First up, I'm
assuming that a frontend is connecting to a backend using a TCP
connection.  Establishing a TCP connection is a non-trivial operation;
even connecting to the local host can be relatively costly.  Doing a DNS
lookup for the IMAP backend's IP address is expensive too, though easy
to avoid.  Once a connection's established, you need to present the
backend with your user's credentials, which involves more round trips of
data, and then finally you can do your IMAP operation, which has further
TCP overhead.  After that, you need to close the TCP connection, which
is probably done after the user sees their result, but before the Apache
process is available to serve another request.

Compare this to, for example, storing some cached information in your
session: mod_python is already loading and saving your session via dbm
or shm, so the only additional overhead is the marshalling and I/O for
the cached data.  That is significantly less than talking to an IMAP
backend.
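
A minimal sketch of caching in the session, assuming only that the
session object behaves like a dictionary and has a save() method, as
mod_python sessions do.  cached_in_session and FakeSession are
hypothetical names; FakeSession exists only so the example runs without
Apache.

```python
def cached_in_session(sess, key, compute):
    """Return sess[key], computing and saving it on first use."""
    if key not in sess:
        sess[key] = compute()   # the expensive call, e.g. an IMAP query
        sess.save()             # persist along with the rest of the session
    return sess[key]


class FakeSession(dict):
    """Stand-in for a mod_python session: dict-like, with save()."""

    def __init__(self):
        super().__init__()
        self.saves = 0

    def save(self):
        self.saves += 1


sess = FakeSession()
folders = cached_in_session(sess, "folders", lambda: ["INBOX", "Sent"])
folders = cached_in_session(sess, "folders", lambda: ["never called"])
```

In a real handler the session would come from mod_python's Session
module rather than from FakeSession.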

If you're using IMAP connection pools, this section is pretty much
irrelevant, since you're avoiding the high costs of connection setup and
teardown for most requests.  However, it's been my experience that
connection pools are difficult to achieve in a forked Apache world.
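
To illustrate why, here is a sketch of the simplest form of connection
reuse: a per-process table of open connections.  get_connection and the
connect callback are hypothetical, and nothing here is mod_python or
imaplib API.  Under a forked server every child process has its own
copy of the table, so a connection is reused only when the same child
happens to serve the same user again.

```python
import os

# Lives in this process only; each forked child gets its own copy.
_connections = {}


def get_connection(user, connect):
    """Return this process's connection for `user`, opening it if needed.

    `connect` is a caller-supplied function that opens and authenticates
    a new connection for the given user.
    """
    # Including the pid avoids reusing a connection object that was
    # inherited from a parent process across a fork.
    key = (os.getpid(), user)
    conn = _connections.get(key)
    if conn is None:
        conn = connect(user)
        _connections[key] = conn
    return conn
```

A usable pool would also have to detect dead connections and close idle
ones, which this sketch does not attempt.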

> When I said 'Next', I meant the next button to display the next window of 
> messages in the email box (i.e. showing messages 1-20 and then 21-40).
> But the point you bring up is very valid, and brings up concerns for other 
> ways I have used session variables. Guess it is time for another rewrite. For 
> caching, do you have any suggestions ? dbm, external database, etc ?

I'd start with mod_python's session data: it has already tied your
cached data to a particular user, much of the overhead of storing and
retrieving data is already paid for by the session handling, and it's
easier to use Grisha's work than to duplicate it. :)

Otherwise, I'd probably use as lightweight a system as possible, which
would most likely mean anydbm.
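
A minimal sketch of a dbm-backed cache.  It is written for Python 3,
where the anydbm module was renamed to dbm; cache_put and cache_get are
hypothetical helper names.  Values are pickled because dbm files store
only byte strings.

```python
import dbm
import os
import pickle
import tempfile


def cache_put(path, key, value):
    """Store a picklable value under `key` in the dbm file at `path`."""
    db = dbm.open(path, "c")   # "c": create the file if it does not exist
    try:
        db[key.encode("utf-8")] = pickle.dumps(value)
    finally:
        db.close()


def cache_get(path, key, default=None):
    """Return the cached value for `key`, or `default` if it is absent."""
    db = dbm.open(path, "c")
    try:
        try:
            return pickle.loads(db[key.encode("utf-8")])
        except KeyError:
            return default
    finally:
        db.close()


cache_path = os.path.join(tempfile.mkdtemp(), "imap-cache")
cache_put(cache_path, "user1:folders", ["INBOX", "Sent"])
folders = cache_get(cache_path, "user1:folders")
```

Several Apache children writing to the same file would need some form
of locking around these calls, which is not shown.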

> 	if sess['REQUESTS']>1:

The trouble I'm having here is that if session locking is working, you
should never encounter a value of sess['REQUESTS'] > 1.  The session
should be locked automatically when you first create it, and remain
locked, as Grisha says, until it's cleaned up when the request
completes.  In fact, copy/pasting your code, appending the
"sess['REQUESTS']-=1" and saving the session before returning apache.OK,
I cannot get redirected.  I inserted a sleep(10) before the decrement
and return, and hit reload a dozen-odd times.  The only effect of this
was to make my browser spend two minutes loading the final page. :)

If I put in "sess.unlock()" right after I create the session, I can get
myself redirected to the error page.
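
The two behaviours can be reproduced in a small simulation that uses
threads in place of concurrent requests.  Nothing in it is mod_python
API: simulate, the plain dict and session_lock are hypothetical
stand-ins for the handler, the session and the per-session lock.  The
function returns the highest REQUESTS value any simulated request saw.

```python
import threading


def simulate(n_requests, hold_session_lock):
    """Run n_requests overlapping 'requests'; return the peak counter."""
    session = {"REQUESTS": 0}
    session_lock = threading.Lock()   # stands in for the session lock
    store_mutex = threading.Lock()    # keeps each counter update atomic
    barrier = threading.Barrier(n_requests)  # used only when unlocked
    peak = [0]

    def handler():
        if hold_session_lock:
            session_lock.acquire()    # held for the whole request
        try:
            with store_mutex:
                session["REQUESTS"] += 1
                peak[0] = max(peak[0], session["REQUESTS"])
            if not hold_session_lock:
                # Wait until every request has incremented the counter,
                # so the overlap is guaranteed rather than left to timing.
                barrier.wait()
            with store_mutex:
                session["REQUESTS"] -= 1
        finally:
            if hold_session_lock:
                session_lock.release()

    threads = [threading.Thread(target=handler) for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak[0]


locked_peak = simulate(5, hold_session_lock=True)
unlocked_peak = simulate(5, hold_session_lock=False)
```

With the lock held for the whole request, the requests run one after
another and the counter never exceeds 1; without it, the requests
overlap and the counter climbs past 1, which is the condition the
quoted check tests for.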

Any idea how you might be winding up with an unlocked session?  What's
your Apache version?  mod_python version?  What's the request serving
model (worker threads, forked, etc)?

-- 
bje


