Jonathan Gardner
jgardner at jonathangardner.net
Wed Jun 11 11:52:31 EST 2003
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday 11 June 2003 09:45, Paul Robinson wrote:
> Apache has a number of modes of operation when it comes to threading and
> forking, I would like to understand how these things interact with
> Python subinterpreters
> [http://www.modpython.org/live/current/doc-html/pyapi-interps.html] and
> issues such as the Python global interpreter lock (GIL)
> [http://www.python.org/doc/current/api/threads.html].
>

First off, think of each child process as an entirely separate process.
There is *no* *way* for processes to communicate with each other except
through shared memory or pipes.

I am no expert on the inner workings of mod_python, but from reading the
documentation it sounds like each process is entirely independent of the
others. Each process can have a number of "subinterpreters" based on the
configuration, but these subinterpreters are isolated from one another
as well.

As far as the GIL is concerned, you really shouldn't be concerned about
that at all. It is there just to ensure that no thread is caught with
its pants down. Or, in more technical terms, that the state of the
Python interpreter and its associated data is always consistent whenever
the lock is not held.

> For example, on a Windows platform where there is a single
> multi-threaded Apache process (mpm_winnt
> [http://httpd.apache.org/docs-2.0/mod/mpm_winnt.html]) is it correct to
> say that mod_python would not be able to take advantage of a
> multi-processor machine due to the GIL?
>

I don't know the details of how Windows machines handle threads, but I
do know that threads are like "lightweight" processes. They can and will
be run on separate processors on a normal OS.

As for whether threads can communicate with each other, the impression I
get from the documentation is that they cannot. It sounds like each
thread will have its own main interpreter, and a number of
sub-interpreters depending on the configuration.
This means that there is no way to communicate among threads via Python,
as the Python main interpreters are separate.

> In another, given Apache running in the prefork MPM
> [http://httpd.apache.org/docs-2.0/mod/prefork.html]- is it a) possible
> or b) useful to have a global, per-Apache-process persistent data
> structure sharing a pool of (threadsafe) database connections. I would
> say not useful since that process will only ever be running a single
> mod_python request at a time - hence more than one item in the pool
> would be useless. Given the "worker MPM"
> [http://httpd.apache.org/docs-2.0/mod/worker.html] however it may be
> useful but it's not clear to me if it would be possible.
>

I don't think this is possible.

> Taking the specific example of database connections (let me note I have
> read and believe I understand FAQ 3.3) is it ever useful or possible to
> share a pool of database connectors, rather than a single connector in
> the global namespace. I assume that code such as that in FAQ 3.3 would
> require additional locking mechanisms in order to function correctly in
> a multi-threaded Apache environment?
>

Within a single Apache thread and process, yes, you can share database
connections. If your handler decides to spawn threads while processing a
request, those threads can share the same database connections within
that Apache thread. However, I don't think what you really want
(independent processes or threads sharing connections) is possible.

> I bet there must be some code in existing projects that does stuff like
> this. Any pointers?
>

Sorry, I looked into this on my own, both with mod_perl and mod_python,
and there is nothing out there that I could see.

The best solution is to keep the connection alive and reuse it for new
incoming requests. If the database doesn't like having so many open and
inactive connections, you can just hang up at the end of each request
and reconnect at the beginning of the next one.
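
For the multi-threaded case raised in the question (code like FAQ 3.3
needing extra locking), here is a minimal sketch of a pool shared by the
threads of one process. It is not taken from mod_python or from any
existing project. The names ConnectionPool, connection and demo are
hypothetical, a plain object() stands in for a real database connection,
and the code is written for a current Python 3 rather than the
interpreter of this thread's era. It assumes that module-level state
persists between requests within a process, and it does nothing to share
connections across processes.

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool of connections shared by the threads of one process."""

    def __init__(self, connect, size):
        # `connect` is any zero-argument callable returning a new connection
        # (hypothetical here; a real handler would wrap its DB driver).
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # queue.Queue does its own locking, so get/put are safe across
        # threads; get() blocks while every connection is checked out.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller raised.
            self._idle.put(conn)


def demo(workers=8, size=2):
    """Run `workers` threads against a pool of `size` stand-in connections."""
    created = []

    def connect():
        conn = object()  # stand-in for a real database connection
        created.append(conn)
        return conn

    pool = ConnectionPool(connect, size)
    used = []
    used_lock = threading.Lock()

    def handle_request():
        with pool.connection() as conn:
            with used_lock:
                used.append(id(conn))

    threads = [threading.Thread(target=handle_request) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # (connections created, requests served, distinct connections used)
    return len(created), len(used), len(set(used))


if __name__ == "__main__":
    print(demo())
```

Under the prefork MPM a pool of size 1 behaves like the single global
connection, which matches the observation in the question that extra
pool entries would go unused there.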
Some databases have more overhead than others.

Remember I said that the only way to talk between processes is via
shared memory or pipes. Shared memory isn't supported well (if at all)
in Python. Pipes are something you already are familiar with -- TCP
sockets are pipes between two processes that can be located on different
servers.

So another solution that I have thought of, but have had no reason to
implement, is a database connection pool server. In this scenario, you
would get a connection to the database server by connecting to the
connection pool server. After the initial connection, the pool server
just relays your commands word for word to the database. When you
disconnect, it puts the connection back into the pool.

This isn't too different from a session server, or other kinds of
meta-servers. The main gripe I have with these is that servers are a
pain in the butt to write right, and they are always a nightmare to
manage. And you always have to have a plan for scalability, or it will
eventually bite you.

> Maybe I'm confusing myself at the moment - maybe some other people as
> well ;-)
>

I found your message to be extremely precise in its wording, with plenty
of useful references. That was both helpful and refreshing.

- -- 
Jonathan Gardner <jgardner at jonathangardner.net>
(was jgardn at alumni.washington.edu)
Live Free, Use Linux!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+53pvWgwF3QvpWNwRAr5nAKDNvpjSXZ4+0GSWQWh11V2EdbhvjACgyAmP
kvdSO3JZYSfwDGo1XI3JOvY=
=IQH6
-----END PGP SIGNATURE-----