Byron Ellacott
bje at apnic.net
Fri May 14 13:39:48 EDT 2004
On Fri, 2004-05-14 at 04:16, SAiello at Jentoo.com wrote:
> I was curious for ideas on how to protect a mod_python web application from
> someone submitting/requesting data very quickly repeatedly. An example, I am

If you mean, 'how do I protect against someone maliciously trying to
overload my server' then the throttling/bandwidth limiting suggestions
already given are useful tools.  If you mean, 'how do I improve my
server's performance to handle high load' then read on.

> building an IMAP webmail application. Currently, if I click the view 'next
> set of messages in email box' quickly over and over again, that seems to
> spawn a bunch of apaches trying to service all those requests. One problem is
> that I really don't want one user being able to make my app take up alot of
> CPU load by doing this. Another is that I am storing the current message
> position in a session variable, by spawning a bunch of simultaneous requests
> I seem to be able to keep clicking 'next' above the total number of messages.

Use caching.  If you've only just asked the IMAP server for the contents
of message #404, there's no good reason to ask it again.  You could
cache the messages, the message indexes, or even the entire output of a
given request.

Also, you probably shouldn't be storing the 'current message position'
in a session.  This implies that the user is only viewing one page at
once, which in a lot of cases isn't true.  They might open a message in
a new window or tab, for instance, and have two messages open at once.
Which one is 'current' in that situation?

A possibly better way would be to have the "Next" link, as generated in
response to a request to display a particular message, also include
information about which message should be considered the next message.
For example, I would probably implement this as a method to display a
message by ID, and for each generated display, include a "Next" button
which generates a request to display message #(ID + 1).
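
A rough sketch of that "Next" link idea, in plain Python rather than
mod_python.  The function name, the /mail/view URL and the "id"
parameter are all made up for illustration, and it assumes messages are
numbered from 1 up to a known total:

```python
from urllib.parse import urlencode


def next_link(message_id, total_messages, base_url="/mail/view"):
    """Return the URL of the message after message_id, or None at the end.

    The link is computed from the message being displayed, not from
    anything stored in the session, so two windows showing different
    messages each get their own correct "Next" link.
    """
    next_id = message_id + 1
    if next_id > total_messages:
        # Already on the last message: no "Next" link to offer.
        return None
    return base_url + "?" + urlencode({"id": next_id})
```

Because nothing is written back to the session, repeated or simultaneous
clicks on the same link all ask for the same message and cannot run past
the end of the mailbox.
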
If the functionality of "Next" means "Next Unread" or some such, I'd
probably generate a request to display the next unread message after
message #ID, so once again, the knowledge about the 'current' message is
tied to a particular display.

Another, more serious, problem is that you appear to have a race
condition.  One request might be getting the 'current' message ID,
comparing it to the maximum, then incrementing the session value.
Another request does the same.  Unfortunately, due to the way
multiprocessing works, one of them preempted the other, and did its work
between the "compare" and the "increment" commands.  Thus, the first
increments the value again, making it too high.

This is serious because my understanding of mod_python.Session is that
it automatically does session locking.  In other words, there should
already be only one simultaneous request per session.

> A quick idea of mine to limit one simultaneous request per session, was at the
> start of the request, create a session variable that would store the total
> number of requests for that session. Then I could check the number of
> requests, and if the variable is greater than 1, sleep until it is lower than
> 1.

... so yes, the general idea is sound.  Your implementation is a little
flawed, however.

>  1 sess=Session.Session(req, None, cookieSecret)
>  2 if not sess.has_key('REQUESTS'):
>  3     sess['REQUESTS']=1
>  4     sess.save()
>  5 else:
>  6     sess['REQUESTS']+=1
>  7     sess.save()
>  8 while sess['REQUESTS']>1:
>  9     sleep(1)
> 10 sess['REQUESTS']-=1
> 11 sess.save()

I've added line numbers to help the discussion.

So, you create a session object at line 1.  This is when the locking
should already have occurred.  In lines 2-4, you introduce a race
condition: if a second process preempts your request after line 2, but
before line 4, that process will also get False from
sess.has_key('REQUESTS').  This means two separate processes will reach
line 3 thinking they have exclusive access to the session.
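
To make the check-then-set problem concrete, here is a small sketch of
one way to close that window.  It is not mod_python code: a plain dict
stands in for the session, threads stand in for the Apache processes,
and threading.Lock stands in for whatever lock protects the session.
The names run_requests and count_request are invented for the example.

```python
import threading


def run_requests(n):
    """Simulate n simultaneous requests, each counting itself once."""
    session = {}             # stand-in for the shared session store
    lock = threading.Lock()  # stand-in for the session lock

    def count_request():
        # The test and the write happen under one lock, so no other
        # thread can run between the "check" and the "set".
        with lock:
            if 'REQUESTS' not in session:
                session['REQUESTS'] = 1
            else:
                session['REQUESTS'] += 1

    threads = [threading.Thread(target=count_request) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return session['REQUESTS']
```

With the lock held across both steps, every request is counted exactly
once.  Without it, two requests could both see the key as missing and
both store 1.
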
A similar race condition exists between lines 6 and 7.

More problematic, lines 8 and 9 loop until sess['REQUESTS'] <= 1.
Unfortunately, you didn't refresh the session in that loop, so I would
expect any request entering that loop will never leave it.  You may need
a "sess.load()" in the loop.

Finally, at line 10 you decrement your own, local value, and save that
to the shared session.  This would immediately overwrite any other value
there.  Granted, if you had achieved a lock by this point, what's in
there would be what you expected to be in there.

For what it's worth, a quick test with Apache/2.0.47 (Debian GNU/Linux)
mod_python/3.1.3 Python/2.3.3 shows that
"s = Session.Session(req, None, 'foobar')" does in fact do session
locking.

In the context of the original request, I'd start by reducing the
response time of each request before I started finding ways to deny
excessive requests. :)

--
bje