Eric Brunson
brunson at brunson.com
Sun Sep 17 14:02:20 EDT 2006
Jim and Graham,

Thanks for the replies, it's good to have my suspicions confirmed and better explained.

Jim, thanks especially for the tip on adding the database connection as an attribute of the req. The main thing I didn't like about instantiating the connection in the handler was having to pass yet another argument to my function calls. This will clean up my code quite a bit.

Thanks again,
e.

Graham Dumpleton wrote:
> In addition to what Jim has said, having the database connection created at global module scope of a handler code file isn't necessarily a good idea.
>
> The problem here is if automatic module reloading is on and you make changes to your handler file over time. What will happen is that when the handler code file is reloaded, your existing database connection objects will be discarded and reinitialised. If normal object cleanup doesn't result in the database handles being cleaned up properly, you can get a leak in database resource handles and eventually you will run out of connections to the database.
>
> For details on issues like this and others which will affect you with the version of mod_python you are running, check out:
>
> http://www.dscpl.com.au/wiki/ModPython/Articles/TheProcessInterpreterModel
> http://www.dscpl.com.au/wiki/ModPython/Articles/ModuleImportingIsBroken
>
> Graham
>
> On 17/09/2006, at 3:44 AM, Jim Gallacher wrote:
>
>> Eric Brunson wrote:
>>> I've been working on a project using mod_python to access a mysql database using the MySQLdb API. I know mod_python is still considered beta under Apache 2, so I thought I'd bring an issue to the attention of the list.
>>>
>>> I was seeing some very bizarre results when generating dynamic content from the database, which I first attributed to PSP caching, but then investigated further when I switched to using the publisher handler. It may be the way I've set up my modules and database connections, but I haven't convinced myself that I really understand the problem, and until I do, I'm not going to be happy about it. I have found a workaround that I can live with, but I don't really like it that much because it requires constantly connecting to and disconnecting from the database.
>>>
>>> The problem was simply getting old, wrong results interleaved with new, correct results after updating the database. Simply put, I'm displaying data from a table. My index.py imports a module for generating HTML which, in turn, includes a module for doing database access. The odd thing about how I set up the database access, and I think it's pertinent to the problem, is that I automatically connect to the database when I import the database module. Like this:
>>>
>>> ----- database.py -----
>>> import MySQLdb
>>> from mod_python import apache
>>>
>>> def Connect():
>>>     return MySQLdb.connect( 'mysql://me:pass@localhost/mydatabase' )
>>>
>>> dbc = Connect()
>>>
>>> def someExampleAccessFunction():
>>>     global dbc
>>>     curs = dbc.cursor()
>>>     curs.execute( "select count(0) from sometable" )
>>>     result = curs.fetchall()[0][0]
>>>     curs.close()
>>>     apache.log_error( "result was %s" % result )
>>>     return result
>>> ----- end database.py -----
>>>
>>> Let's disregard whether that's good programming practice or not; the fact is, I could query that value several times by doing a page reload, then delete a record from the database and commit the change, and subsequent page loads would interleave old results with new results, seemingly cyclically.
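A minimal sketch of the open-per-call workaround Eric describes further down in this message; the keyword-style MySQLdb.connect() arguments are placeholders, not taken from the thread:

import MySQLdb
from mod_python import apache

def someExampleAccessFunction():
    # No module-level connection: open, query, and close on every call.
    dbc = MySQLdb.connect(host='localhost', user='me',
                          passwd='pass', db='mydatabase')
    try:
        curs = dbc.cursor()
        curs.execute("select count(0) from sometable")
        result = curs.fetchall()[0][0]
        curs.close()
    finally:
        dbc.close()
    apache.log_error("result was %s" % result)
    return result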
>>> The key to my confusion is that the error logs for apache would show a message for every page reload, and the result from the query would agree with whatever data was displayed on the page, correct or incorrect. Meanwhile a direct query to the database from the command line would always return the correct result from the database.
>>>
>>> My best guess so far is that multiple threads in apache get multiple copies of mod_python, each with its own copy of the imported database module in memory, and each module maintaining a different database connection that is somehow caching results. I've never seen this behavior in any application until I tried this under mod_python. I was able to force the correct results to always be returned by getting rid of the module variable, dbc, and moving the MySQLdb.connect() call to inside each function call, opening the database connection, executing my sql, then closing the database connection. This is less than optimal, since I may make several database queries during each page load, so later today I'll try making the database connection in the index.py, then passing the connection handle to each of the calls that need it. Also a little less than optimal, but tolerable.
>>>
>>> Does anyone have any insight into this behavior? It's more of an academic question at this point, but I wonder if it may lead to an improvement if we were able to identify the problem and implement a transparent fix. Also, feel free to let me know if I'm completely off base with my theory of what's going on under the hood.
>>
>> Your assessment with respect to having multiple copies of mod_python, each with its own copy of the imported modules and data, is correct. It is not a bug but rather the way apache works. Depending on which apache-mpm you are using (either prefork or worker on Fedora) and your configuration, you may have anywhere from a few to dozens of independent child processes. Indeed, if you are using the prefork mpm on a heavily loaded server you could have hundreds of processes. In such cases using a persistent db connection will likely cause problems with exhausting the number of available mysql connections. Persistent connections in such a scenario are discouraged in general - not just in mod_python.
>>
>> As to your particular problem, I wonder if you are using transactions but are not committing immediately after an update? The process which apache chooses to handle a request is random. Say for example you update in process A without a commit. The next request may be handled by another process, which will see the old data. A subsequent request may then come in but be handled by process A again, which will see the new data since it is using the same connection as that used for the update. Thus you will get the interleaving effect.
>>
>> This is not unique to db connections. Don't be fooled into thinking that module-level attributes are global across your application, as they are not. In anything other than winnt-mpm (which is a single, multi-threaded process), you will have multiple, independent copies of that variable.
>>
>> Opening and closing your db connection for each request is the usual solution, as you've discovered. It is possible to do connection pooling, but you need to be a little smarter about it to ensure you don't run into problems or exhaust the number of mysql connections configured. Don't forget that leaving a connection open consumes memory resources as well.
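The commit behaviour Jim describes above can be sketched roughly as follows, assuming a transactional table; the column names and connection parameters are made up for illustration:

import MySQLdb

db = MySQLdb.connect(host='localhost', user='me',
                     passwd='pass', db='mydatabase')
cur = db.cursor()
cur.execute("update sometable set price = price + 1 where id = %s", (42,))
db.commit()   # until this runs, connections held by other apache
              # processes continue to see the old row
cur.close()
db.close()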
>> This is not to say that you need to open and close the connection each time you want to access the database within a request. The usual method is to stuff your db connection into the request object itself. For example:
>>
>> def handler(req):
>>     req.db = MySQLdb.connect( 'mysql://me:pass@localhost/mydatabase' )
>>     req.register_cleanup(close_db, req.db)
>>     delete_stuff(req)
>>     add_stuff(req)
>>     req.write('stuff done')
>>     return apache.OK
>>
>> def add_stuff(req):
>>     cur = req.db.cursor()
>>     cur.execute("insert into stuff (foo) values ('bar')")
>>     cur.close()
>>
>> def delete_stuff(req):
>>     cur = req.db.cursor()
>>     cur.execute("DELETE from stuff where foo = 'baz'")
>>     cur.close()
>>
>> def close_db(db):
>>     db.close()
>>
>> Notice the use of register_cleanup() to register a function which will close the connection, rather than explicitly closing it. This ensures that your db connection will always be closed, even if an exception occurs somewhere in your code.
>>
>> Jim
>> _______________________________________________
>> Mod_python mailing list
>> Mod_python at modpython.org
>> http://mailman.modpython.org/mailman/listinfo/mod_python
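Under the publisher handler Eric mentions, the same req-attribute idea might look roughly like this; the function names and keyword-style connection parameters are illustrative, not from the thread:

import MySQLdb

def _close_db(db):
    db.close()

def index(req):
    # Publisher calls index() with the request object; attach the
    # connection once and let register_cleanup() close it afterwards.
    req.db = MySQLdb.connect(host='localhost', user='me',
                             passwd='pass', db='mydatabase')
    req.register_cleanup(_close_db, req.db)
    return "count is %d" % count_rows(req)

def count_rows(req):
    # Helpers reuse req.db instead of taking the connection as an argument.
    cur = req.db.cursor()
    cur.execute("select count(0) from sometable")
    (count,) = cur.fetchone()
    cur.close()
    return count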