[mod_python] Apache, Threading and Multi-Processing Modules

Wed Jun 11 11:52:31 EST 2003

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday 11 June 2003 09:45, Paul Robinson wrote:
> Apache has a number of modes of operation when it comes to threading and
> forking, I would like to understand how these things interact with
> Python subinterpreters
> [http://www.modpython.org/live/current/doc-html/pyapi-interps.html] and
> issues such as the Python global interpreter lock (GIL)
> [http://www.python.org/doc/current/api/threads.html].
>

First off, think of each child process as an entirely separate process. There 
is *no* *way* for one process to communicate with another except through 
shared memory or pipes. I am no expert on the inner workings of mod_python, 
but from reading the documentation it sounds like each process is entirely 
independent of the others. Each process can have a number of 
"subinterpreters" depending on the configuration, but these subinterpreters 
are isolated from one another as well.
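
To make the isolation concrete, here is a minimal sketch in plain Python 
(nothing to do with mod_python itself). The child program and the names 
CHILD_SOURCE and ask_child are made up for illustration. The parent's 
variable is never touched by the child; the only thing that crosses the 
process boundary is what gets written to the pipes.

```python
import subprocess
import sys

# Hypothetical child program: it keeps its own "counter", separate from ours.
CHILD_SOURCE = """
import sys
counter = int(sys.stdin.readline())
counter += 1
print(counter)
"""

counter = 41  # lives only in the parent process


def ask_child(value):
    """Send a number to a child process over a pipe and read its reply."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=f"{value}\n",      # written to the child's stdin pipe
        capture_output=True,     # the child's stdout comes back over a pipe
        text=True,
        check=True,
    )
    return int(result.stdout)
```

Calling ask_child(counter) returns the child's incremented value, while 
counter in the parent stays exactly what it was.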

As far as the GIL is concerned, you really shouldn't worry about it at 
all. It is there just to ensure that no thread is caught with its pants 
down. Or, in more technical terms, it guarantees that the state of the Python 
interpreter and its associated data is always consistent whenever the lock is 
not held.
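
One caveat, sketched under my own assumptions: the GIL keeps the 
interpreter's internals consistent, but it does not make a read-modify-write 
on your own data atomic. If several threads update the same object, you still 
want your own lock. The names hits, hits_lock, handle_request and run below 
are invented for the example.

```python
import threading

hits = 0
hits_lock = threading.Lock()


def handle_request(times):
    """Pretend request handler that bumps a shared counter."""
    global hits
    for _ in range(times):
        # "hits += 1" is a read, an add and a write; guard all three together.
        with hits_lock:
            hits += 1


def run(workers=4, times=1000):
    """Reset the counter, run the handlers in parallel threads, return the total."""
    global hits
    hits = 0
    threads = [
        threading.Thread(target=handle_request, args=(times,))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return hits
```

With the lock in place, run(4, 1000) always comes out to 4000.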

> For example, on a Windows platform where there is a single
> multi-threaded Apache process (mpm_winnt
> [http://httpd.apache.org/docs-2.0/mod/mpm_winnt.html]) is it correct to
> say that mod_python would not be able to take advantage of a
> multi-processor machine due to the GIL?
>

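I don't know the details of how Windows machines handle threads, but I do know 
that threads are like "lightweight" processes. They can and will be run on 
separate processors on a normal OS. The GIL does mean that only one thread in 
a process executes Python bytecode at any instant, but a thread releases it 
while it is blocked on I/O, such as waiting for a database to answer.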

As for whether threads can communicate with each other, the impression I get 
from the documentation is that they cannot. It sounds like each thread 
will have its own main interpreter, and a number of sub-interpreters 
depending on the configuration. This means that there is no way to 
communicate among threads via Python, as the Python main interpreters are 
separate.

> In another, given Apache running in the prefork MPM
> [http://httpd.apache.org/docs-2.0/mod/prefork.html]- is it a) possible
> or b) useful to have a global, per-Apache-process persistent data
> structure sharing a pool of (threadsafe) database connections. I would
> say not useful since that process will only ever be running a single
> mod_python request at a time - hence more than one item in the pool
> would be useless. Given the "worker MPM"
> [http://httpd.apache.org/docs-2.0/mod/worker.html] however it may be
> useful but it's not clear to me if it would be possible.
>

I don't think this is possible.

> Taking the specific example of database connections (let me note I have
> read and believe I understand FAQ 3.3) is it ever useful or possible to
> share a pool of database connectors, rather than a single connector in
> the global namespace. I assume that code such as that in FAQ 3.3 would
> require additional locking mechanisms in order to function correctly in
> a multi-threaded Apache environment?
>

Within a single Apache thread and process, yes, you can share database 
connections. If your handler decides to start its own threads while processing 
a request, those threads can share the same database connections.

However, I don't think what you really want (independent processes or threads 
sharing connections) is possible.
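
For the case that does work (threads inside one process sharing a few 
connections), a pool can be sketched with a queue, which does the locking for 
you. This is only an illustration: sqlite3 stands in for whatever database 
module you really use, and ConnectionPool, connect, worker and run are names 
I made up.

```python
import queue
import sqlite3
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool; queue.Queue handles the locking between threads."""

    def __init__(self, size, connect):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self):
        conn = self._idle.get()      # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)     # always hand it back


def connect():
    # Stand-in for a real database; the flag lets other threads use the handle.
    return sqlite3.connect(":memory:", check_same_thread=False)


pool = ConnectionPool(2, connect)


def worker(results, n):
    with pool.connection() as conn:
        results.append(conn.execute("SELECT ?", (n,)).fetchone()[0])


def run(workers=5):
    """Run more workers than there are connections and collect their answers."""
    results = []
    threads = [
        threading.Thread(target=worker, args=(results, n))
        for n in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

Five workers share two connections here; each one waits its turn in 
pool.connection() and no connection is ever used by two threads at once.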

> I bet there must be some code in existing projects that does stuff like
> this. Any pointers?
>

Sorry, I looked into this on my own, both with mod_perl and mod_python, and 
there is nothing out there that I could see.

The best solution is to keep the connection alive, and reuse it for new 
incoming requests. If the database doesn't like having so many open but 
inactive connections, you can just connect at the beginning of each request 
and hang up at the end of it. How much that costs depends on the database; 
some have more connection overhead than others.
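
Both variants can be sketched in a few lines. This is a rough illustration, 
not mod_python code: sqlite3 stands in for the real database module, and 
get_connection, close_connection and handler are hypothetical names.

```python
import sqlite3

_connection = None  # one connection per process, created on first use


def get_connection():
    """Return the process-wide connection, opening it if necessary."""
    global _connection
    if _connection is None:
        _connection = sqlite3.connect(":memory:")  # stand-in for a real database
    return _connection


def close_connection():
    """Hang up, so the next request has to reconnect."""
    global _connection
    if _connection is not None:
        _connection.close()
        _connection = None


def handler(keep_alive=True):
    """Pretend request handler: run a query, optionally hang up afterwards."""
    conn = get_connection()
    try:
        return conn.execute("SELECT 1 + 1").fetchone()[0]
    finally:
        if not keep_alive:
            close_connection()
```

With keep_alive=True every request in the process reuses the same connection; 
with keep_alive=False each request pays for its own connect and disconnect.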

Remember I said that the only way to talk between processes is via shared 
memory or pipes. Shared memory isn't supported well (if at all) in Python. 
Pipes are something you are already familiar with -- TCP sockets are pipes 
between two processes that can be located on different servers.

So another solution that I have thought of, but have had no reason to 
implement, is a database connection pool server. In this scenario, you would 
get a connection to the database server by connecting to the connection pool 
server. After the initial connection, the pool server just relays your 
commands word for word to the database. When you disconnect, it puts the 
connection back into the pool.
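
Since I have not built one, take the following as a toy sketch of the idea 
and nothing more. It assumes a made-up line-based protocol (one line in, one 
line out), and FakeDatabase is a stand-in that just answers "ok: " plus 
whatever it was sent. All class and function names are invented.

```python
import queue
import socket
import socketserver
import threading


class Server(socketserver.ThreadingTCPServer):
    daemon_threads = True        # do not keep the process alive on exit
    allow_reuse_address = True


class FakeDatabase(socketserver.StreamRequestHandler):
    """Stand-in for a real database server."""

    def handle(self):
        for line in self.rfile:
            self.wfile.write(b"ok: " + line)


class PoolRelay(socketserver.StreamRequestHandler):
    """Borrow a backend connection, relay lines both ways, then return it."""

    def handle(self):
        backend, replies = self.server.pool.get()   # blocks until one is free
        try:
            for line in self.rfile:                  # ends when client hangs up
                backend.sendall(line)
                self.wfile.write(replies.readline())
        finally:
            self.server.pool.put((backend, replies))


def start(server):
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.server_address


db_addr = start(Server(("127.0.0.1", 0), FakeDatabase))

pool_server = Server(("127.0.0.1", 0), PoolRelay)
pool_server.pool = queue.Queue()
for _ in range(2):                                   # two pooled connections
    sock = socket.create_connection(db_addr)
    pool_server.pool.put((sock, sock.makefile("rb")))
pool_addr = start(pool_server)


def query(text):
    """Client side: connect to the pool server instead of the database."""
    with socket.create_connection(pool_addr) as client:
        client.sendall(text.encode() + b"\n")
        with client.makefile("rb") as reply:
            return reply.readline().decode().strip()
```

The client never talks to the database directly; query("SELECT 1") goes to 
the pool server, which forwards it over one of its two standing connections. 
A real one would also need authentication, error handling and a way to 
replace dead backend connections.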

This isn't too different from a session server, or other kinds of 
meta-servers. The main gripe I have with these is that servers are a pain in 
the butt to write correctly, and they are always a nightmare to manage. And 
you always have to have a plan for scalability, or it will eventually bite 
you.

> Maybe I'm confusing myself at the moment - maybe some other people as
> well ;-)
>

I found your message to be extremely precise in its wording, with plenty of 
useful references. That was both helpful and refreshing.

- -- 
Jonathan Gardner <jgardner at jonathangardner.net>
(was jgardn at alumni.washington.edu)
Live Free, Use Linux!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+53pvWgwF3QvpWNwRAr5nAKDNvpjSXZ4+0GSWQWh11V2EdbhvjACgyAmP
kvdSO3JZYSfwDGo1XI3JOvY=
=IQH6
-----END PGP SIGNATURE-----