Alec Matusis
matusis at matusis.com
Mon Oct 1 18:34:43 EDT 2007
Hi Graham,

> 1. Please explain what application you are running on top of
> mod_python. Show the Apache configuration applying to the application
> you are running on top of mod_python.

<Directory "/ourpath/scripts">
    SetHandler mod_python
    PythonPath "sys.path+['/ourpath/scripts', '/ourpath/publisher', '/ourpath']"
    PythonHandler publisher
    PythonOption init _our_init
    PythonOption default _main
    PythonInputFilter flashfilter FLASHFILTER
    SetInputFilter FLASHFILTER
</Directory>

Flashfilter is used very infrequently (for user image uploads from legacy
clients), so it is not a major load factor.

> 2. Explain why you are running prefork and not worker MPM. Are you
> also running some PHP application that will not work with worker MPM?

We are not using any PHP; it's all Python. I do not have any rational
reason for using prefork over worker, except that I once ran a Python
script on this machine that spawned multiple threads, and I believe it
gave me an error (something like "cannot spawn any more threads") at
about 305 threads. That made me apprehensive about using the worker MPM.

> 3. Explain what the relationship is between your mod_python
> application and your memory hogging twisted back end processes, ie.,
> do they communicate with each other and how.

We have two Twisted processes; they take about 110MB and 75MB of RSS
respectively and run for months without restarting. Python web scripts
periodically communicate with one of the Twisted processes like this:

import socket

# Note: this sets the default timeout for every socket created afterwards
# in the whole process (here, the Apache child), not just this one.
socket.setdefaulttimeout(10)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('127.0.0.1', twisted_listening_port))
r = s.send(command)      # number of bytes actually sent
if r > 0:
    data = s.recv(10)    # read up to 10 bytes of the reply
else:
    data = -1
s.close()

> 4. Indicate how big your Apache and twisted processes are growing.

Apache can have up to 275 processes, each taking between 11 and 18MB.
The Twisted processes are 105MB and 75MB (all figures are RSS).
Apache is restarted nightly by a cron script (graceful restart).

> 5. Indicate whether you have used the netstat program to try and
> determine how many socket connections are being held open by Apache
> and twisted processes.

Under normal operation:

# netstat -n | grep :80 | wc -l
11704
# netstat -n | grep :80 | grep ESTABLISHED | wc -l
151

When Apache crashed (was "slow"):

# netstat -n | grep :80 | wc -l
6067

That is about half the normal count. I do not know how many of these were
in TIME_WAIT. The two Twisted processes have about 3500 and 1000
connections respectively, all in the ESTABLISHED state.

> 6. Indicate whether you have tried changing the Apache configuration
> for how many keep alive connections are maintained and how long keep
> alive connections are kept open.

We have always had KeepAlive Off.

> 7. Indicate what range of values you have experimented with for
> MaxRequestsPerChild.

I raised it from 1000 to 10000, but that was when our bandwidth was much
lower, so not recently, since I did not see any memory leaks in the Apache
children.

> At the moment the description of your problem is a bit vague so all
> the above detail would help immensely as far as us being able to given
> any ideas.

I managed to take a screenshot of the /server-status page when Apache was
in the "slow" state again today. It showed 0 idle workers, all 300
MaxClients slots busy, about 90% of them in the R state, 8% in the W state
and the rest in the C state. Despite this, Apache did not log
"[error] server reached MaxClients setting, consider raising the
MaxClients setting" at any time during the crash. It did log this message
immediately after I restarted the server, when it was operational.

The overall load average drops from 16-20 to 3-5 when Apache crashes, and
`procinfo -d` shows the CPUs 60% idle (compared with the normal 15-30%
idle during peak load).

After today's crash I removed "MaxMemFree 2500" altogether. It crashed
again after that, with nothing in the error log. Running apachectl
stop/start recovers it. Our Apache access logs are disabled for most
common requests.
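For comparison, here is a minimal sketch of the same one-shot exchange with the Twisted back end, written for modern Python 3 (the setup described here runs Python 2.4, where `socket.create_connection` and the `with` form of sockets are not available). It assumes the Twisted side reads one command and answers with at most 10 bytes; `query_backend` and its parameters are hypothetical names. The per-socket timeout avoids `socket.setdefaulttimeout()`, which changes the default for every socket the Apache child creates afterwards, and `sendall()` avoids silently dropping the tail of a command when `send()` writes only part of it.

```python
import socket
from typing import Optional


def query_backend(command: bytes, port: int, host: str = "127.0.0.1",
                  timeout: float = 10.0, max_reply: int = 10) -> Optional[bytes]:
    """Send one command to a local TCP back end and return its short reply.

    Returns None for an empty command (nothing to send); otherwise the
    reply bytes, which may be shorter than max_reply, or empty if the
    back end closed the connection without answering.
    """
    if not command:
        return None
    # The timeout applies to this socket only (connect, send and recv),
    # unlike socket.setdefaulttimeout(), which is process-wide.
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(command)        # keeps sending until the whole command is out
        return s.recv(max_reply)  # the socket is closed when the block exits
```

Connection failures and timeouts surface as `OSError` (`socket.timeout` is a subclass), which a caller could catch and map to the old `-1` result. A single `recv()` may also return fewer bytes than the back end sent, so a longer reply would need a read loop.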
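As a rough cross-check of the figures in answer 4 against the 4GB of physical RAM, a minimal sketch that multiplies the Apache child count by the per-child RSS range and adds the two Twisted processes. `worst_case_rss_mb` is a hypothetical helper. The result is only an upper bound: RSS counts pages shared between prefork children (inherited from the parent at fork time) once per child, so the real footprint is lower.

```python
def worst_case_rss_mb(n_children, child_rss_mb, other_rss_mb=()):
    """Upper bound on resident memory (MB) if every child reached child_rss_mb.

    Overcounts memory shared between forked children, because each child's
    RSS includes the shared pages again.
    """
    return n_children * child_rss_mb + sum(other_rss_mb)


twisted = (105, 75)                         # the two Twisted processes, MB RSS
low = worst_case_rss_mb(275, 11, twisted)   # 275 * 11 + 180
high = worst_case_rss_mb(275, 18, twisted)  # 275 * 18 + 180
print(low, high)                            # 3205 5130
```

With the figures above this gives 3205MB to 5130MB. The upper end exceeds the 4GB of RAM while no swapping is observed, which is consistent with a good part of each child's RSS being shared, or with not all children reaching 18MB at the same time.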
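To answer the TIME_WAIT question directly, the same `netstat -n` output can be broken down by TCP state. A minimal sketch, assuming Linux net-tools `netstat -n` columns (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); `count_states` is a hypothetical helper. Unlike `grep :80`, which also matches ports such as :8080 or :8000 and connections whose remote port is 80, it matches the port exactly on one chosen side of the connection; `side="foreign"` covers outgoing connections such as those to the DB server, whose remote end is the DB port.

```python
from collections import Counter


def count_states(netstat_output: str, port: int, side: str = "local") -> Counter:
    """Count TCP connection states for one port in `netstat -n` text.

    side="local" matches the server side (e.g. Apache on port 80);
    side="foreign" matches outgoing connections to a remote port.
    """
    column = 3 if side == "local" else 4   # Local Address / Foreign Address
    suffix = f":{port}"
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP lines and UNIX-domain socket lines.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[column].endswith(suffix):
            counts[fields[5]] += 1         # the State column
    return counts
```

For example, `count_states(subprocess.run(["netstat", "-n"], capture_output=True, text=True).stdout, 80)`, run during the next slowdown, would show how the total splits between ESTABLISHED, TIME_WAIT and the other states.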
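Instead of a screenshot, the scoreboard can be recorded as text from mod_status's machine-readable output (`/server-status?auto`, assuming the status handler is mounted at /server-status as it appears to be here) and tallied, for example from a cron job, so that the mix of R, W and C states can be compared between normal and "slow" periods. A minimal sketch; `scoreboard_counts` and `describe` are hypothetical helpers, and fetching the page is left to the caller.

```python
from collections import Counter

# Scoreboard key as listed on Apache 2.x's /server-status page (mod_status).
SCOREBOARD_KEYS = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def scoreboard_counts(status_text: str) -> Counter:
    """Tally worker states from the 'Scoreboard:' line of server-status?auto."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            return Counter(line.split(":", 1)[1].strip())
    raise ValueError("no 'Scoreboard:' line in status output")


def describe(counts: Counter) -> dict:
    """Map scoreboard characters to readable labels, most frequent first."""
    return {SCOREBOARD_KEYS.get(ch, repr(ch)): n for ch, n in counts.most_common()}
```

The text could be fetched with `urllib.request.urlopen("http://localhost/server-status?auto").read().decode()` and the result of `describe(scoreboard_counts(...))` appended to a log with a timestamp.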
I noticed that when Apache is operating normally, there are a lot of
connections to our DB server machine in the TIME_WAIT state. It turns out
that the DB server is under medium-high load as well, but I did not manage
to look at that machine during the Apache crash. This is what our DB
machine looks like under normal operation:

db0 ~> procinfo -d
Bootup: Tue Jul 24 15:17:47 2007    Load average: 4.70 5.52 4.76 1/157 24936

user  :       0:00:08.51  42.5%  page in :    1188  disk 1:    92r  277w9w
nice  :       0:00:00.00   0.0%  page out:    2396  disk 2:    68r  280ww
system:       0:00:02.05  10.2%  swap in :       0  disk 3:    80r  369ww
idle  :       0:00:09.44  47.2%  swap out:       0  disk 4:   297r  599w7w
uptime:   69d 0:16:17.61         context :  414234

I will send the /server-status page captured during the crash as an HTML
attachment in a separate email, since I am afraid this email will be
delivered to junk if it has an attachment.

Thank you,
Alec Matusis

> -----Original Message-----
> From: Graham Dumpleton [mailto:graham.dumpleton at gmail.com]
> Sent: Monday, October 01, 2007 4:12 AM
> To: Alec Matusis
> Cc: mod_python at modpython.org
> Subject: Re: [mod_python] mod_python or apache scalability?
>
> In order so we can understand things better ...
>
> 1. Please explain what application you are running on top of
> mod_python. Show the Apache configuration applying to the application
> you are running on top of mod_python.
>
> 2. Explain why you are running prefork and not worker MPM. Are you
> also running some PHP application that will not work with worker MPM?
>
> 3. Explain what the relationship is between your mod_python
> application and your memory hogging twisted back end processes, ie.,
> do they communicate with each other and how.
>
> 4. Indicate how big your Apache and twisted processes are growing.
>
> 5. Indicate whether you have used the netstat program to try and
> determine how many socket connections are being held open by Apache
> and twisted processes.
>
> 6. Indicate whether you have tried changing the Apache configuration
> for how many keep alive connections are maintained and how long keep
> alive connections are kept open. For some guidance on these see recent
> blog post at:
>
> http://lucumr.pocoo.org/cogitations/2007/09/30/pushing-apache-performance
>
> 7. Indicate what range of values you have experimented with for
> MaxRequestsPerChild.
>
> At the moment the description of your problem is a bit vague so all
> the above detail would help immensely as far as us being able to given
> any ideas.
>
> Please try not to gloss over details, the more details you give the
> more helpful we might be able to be. For example, you don't even
> mention that your system is shared with some very large twisted
> processes. I only know that this might be the case from a short
> followup you made to HTTPD dev list. When you say twisted, I presume
> though you mean Python Twisted framework.
>
> Graham
>
> On 01/10/2007, Alec Matusis <matusis at matusis.com> wrote:
> > I am sorry in advance if this turns out to be an apache-related issue, but
> > when I posted this on apache list, it has been suggested that it might be an
> > application issue, so I am reposting it here.
> >
> >
> > We are running a busy mod_python/3.1.4 Python/2.4.1 server on 2.6.9 kernel,
> > that suddenly becomes very slow- requests either time out, or it takes
> > 10-20sec to serve a 1K thumbnail.
> > It is somewhat correlated with load spikes, but not perfectly (by looking at
> > the bandwidth graph, it never happens during the low bandwidth periods at
> > night, but it does not coincide with peaks of b/w)
> >
> > When we initially encountered an apache overload, it was always accompanied
> > with
> >
> > [error] server reached MaxClients setting, consider raising the MaxClients
> > setting
> >
> > in the apache error log. We also got
> >
> > kernel: possible SYN flooding on port 80. Sending cookies.
> >
> > in /var/log/messages system log.
> >
> >
> > After that I raised MaxClients from 200 to 300. The problem initially
> > disappeared, but after our bandwidth grew a bit more, we got this behavior
> > again.
> > Now apache crashes (becomes very slow) silently, with no warning in apache
> > error logs at all (although we still get SYN flood message in the system
> > log)
> > When apache is this 'slow' regime, /server-status still shows available
> > slots, i.e. MaxClients is not reached.
> >
> > This is the relevant part of httpd.conf:
> >
> > ServerLimit 300
> > # we are using prefork MPM
> > StartServers 10
> > MinSpareServers 5
> > MaxSpareServers 20
> > MaxClients 300
> > MaxRequestsPerChild 10000
> > MaxMemFree 2500
> >
> > The server has 4GB of physical RAM and 4GB of swap. During these apache
> > "slowdowns", the swap size is still 0 and vmstat shows no swapping at all.
> > I suspect the problem may be in
> >
> > MaxMemFree 2500
> >
> > but then I would expect some kind of"out of memory" errors in the logs?
> >
> > I am posting it on this list since I have not gotten a response in the users
> > list, and I think it's a bug for two reasons:
> >
> > 1) When apache is in this slow "degraded" regime, I would expect a log
> > message in the apache error log, with an explanation why.
> >
> > 3) If this is related to resource exhaustion, I would expect the server to
> > recover from this regime by itself when the load subsides, but this is not
> > the case. Only apachectl start/stop recovers the server.
> >
> >
> >
> > _______________________________________________
> > Mod_python mailing list
> > Mod_python at modpython.org
> > http://mailman.modpython.org/mailman/listinfo/mod_python
> >