Alec Matusis
matusis at yahoo.com
Fri Aug 1 22:47:25 EDT 2008
I was stress-testing MySQL replication on my development setup and ran into a particularly nasty case of a hard-to-reproduce bug that I have encountered before. Under high concurrency, I sometimes get

    build/bdist.linux-i686/egg/MySQLdb/connections.py: OperationalError: (2003, "Can't connect to MySQL server on '10.18.0.2' (4)")

Here (4) stands for:

    #perror 4
    OS error code 4: Interrupted system call

I am using Apache 2.2.6 with the worker MPM, Python 2.4.4 and mod_python 3.3.1. 10.18.0.2 is a remote DB test server. I upgraded the MySQLdb adapter from 1.2.0 to 1.2.2 and got the same error with both versions. The adapters are compiled against mysqlclient_r, which is supposed to be thread-safe.

Then I pointed the MySQL connections to localhost, which runs an identical copy of the same database. To get errors on localhost, I had to increase the request rate by about 10x (no errors at all otherwise), and I got

    OperationalError: (2003, "Can't connect to MySQL server on '127.0.0.1' (4)")

127.0.0.1 is my development machine, and I had never seen these errors on it until I ran this stress test today, bringing the machine to a load average of 8.0. 127.0.0.1 and 10.18.0.2 are two different machines, with different kernels and even different MySQL versions, so it is the client machine that is the problem.

I wrote a single-threaded stand-alone Python loop that simply connects to the DB on the remote server 10.18.0.2 and executes "SELECT 1". I ran several of these loops concurrently, each doing about 10000 iterations, and they never gave a connection error.

The MySQLdb README says:

    MySQLdb is an interface to the popular MySQL_ database server for Python. The design goals are
    - Thread-safety
    - Thread-friendliness (threads will not block each other)

So my guess is that this is a thread-safety problem having to do with mod_python.

I now recall that I have seen the problem earlier: I have several identically configured production web servers connected to a single DB server.
I have seen these errors infrequently on 32-bit web servers, but extremely rarely or almost never on 64-bit web servers. I have since relegated the 32-bit web servers to serving static files only, so I no longer have this data.
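
For anyone who wants to try reproducing this outside Apache, here is a minimal sketch of the connect / "SELECT 1" loop described above, extended to run in several threads of one process instead of several single-threaded processes. It is not the original test script, and it is written for a current Python rather than 2.4. The names `stress` and `describe_os_error` are hypothetical helpers introduced only for this example, and `connect` is assumed to be any zero-argument callable that returns a DB-API connection. It does not recreate the mod_python environment, so a clean run here would not rule out the problem.

```python
import errno
import os
import re
import threading
from collections import Counter


def describe_os_error(message):
    """Decode the trailing "(N)" OS error code of a client error message.

    Returns (code, symbolic name, description), or None if no code is found.
    This mirrors what `perror N` prints on the command line.
    """
    match = re.search(r"\((\d+)\)\s*$", message)
    if match is None:
        return None
    code = int(match.group(1))
    return code, errno.errorcode.get(code, "?"), os.strerror(code)


def stress(connect, threads=8, iterations=1000, errors=(Exception,)):
    """Run connect / SELECT 1 / close cycles concurrently in `threads` threads.

    Returns a Counter keyed by "ok" or by the first exception argument
    (for MySQLdb's OperationalError that is the numeric code, e.g. 2003).
    """
    results = Counter()
    lock = threading.Lock()

    def worker():
        local = Counter()
        for _ in range(iterations):
            try:
                conn = connect()
                try:
                    cur = conn.cursor()
                    cur.execute("SELECT 1")
                    cur.fetchall()
                finally:
                    conn.close()
                local["ok"] += 1
            except errors as exc:
                key = exc.args[0] if exc.args else type(exc).__name__
                local[key] += 1
        # Merge per-thread counts under a lock so totals are not lost.
        with lock:
            results.update(local)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return results


# Possible use with MySQLdb (credentials and database name are placeholders):
#   import MySQLdb
#   print(stress(lambda: MySQLdb.connect(host="10.18.0.2", user="test",
#                                        passwd="test", db="test")))
```

Passing the connection factory in as a callable keeps the loop independent of the adapter, so the same harness can be pointed at 10.18.0.2 or 127.0.0.1, or at a different adapter version, without changes.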