Roger Binns
rogerb at rogerbinns.com
Mon May 28 23:42:49 EDT 2007
So I wrote my own Apache module because I needed some things that weren't present in mod_python (covered in previous messages in this group). Here are some observations based on the experience.

I have a new respect for anyone who has written an Apache module. Ultimately the internal structure and processing are simple: everything just "plugs in" at the appropriate points. However, the documentation about the plugging in, which APIs to call and when, etc. is pretty much non-existent. (There are a few hello-world examples, but that is it.) The Doxygen-generated reference is very good, but the gotcha is that you have to already know which functions to call. It is like learning a new language by being handed a dictionary but with no idea of the grammar.

For my module I chose to use the same names as the Apache APIs. For example, I used "ap_get_remote_host" as the method name. This means it can be looked up in the Apache documentation for semantics, side effects and errors. It does result in un-Pythonic names, but it makes life easier, for me anyway.

My main goal, being able to use subrequests in order to apply authentication and authorization directives from some URIs to code at another URI, has so far worked perfectly, although I haven't explored all the corner cases yet. Note that I can actually do it from the request phase just fine and don't have to do it in the authentication phase. (The request phase is when I have all the information necessary to know exactly what I want to do.)

These are various issues I found in the mod_python code while comparing my approach with mod_python's:

The mod_python code for setting errors could be using PyErr_SetString or PyErr_Format in several places. In some places it actually leaks; for example, line 111 in connobject.c:

    if (rc != APR_SUCCESS) {
        PyErr_SetObject(PyExc_IOError,
                        PyString_FromString("Connection read error"));
        return NULL;
    }

The PyString_FromString results in a leaked object since it is never decrefed. It would be worthwhile checking all usages of PyErr_*, which should result in less code and definitely fewer memory leaks. (A non-leaking version is sketched below.)

While on the subject of errors, apr_strerror is never used, so you'll never actually find out what an error was. For example, see req_sendfile in requestobject.c:

    status = apr_file_open(&fd, fname, APR_READ, APR_OS_DEFAULT,
                           self->request_rec->pool);
    if (status != APR_SUCCESS) {
        PyErr_SetString(PyExc_IOError,
                        "Could not open file for reading");
        return NULL;
    }

Using apr_strerror would give the actual error string, which in this example could be any number of things (e.g. permission denied, file doesn't exist). (A sketch follows below.)

The Apache documentation says you are supposed to do this sequence of calls in order to read a request body:

    ap_setup_client_block()
    if (ap_should_client_block()):
        ap_get_client_block()

mod_python does the setup implicitly, saying that chunked uploads should be rejected. The mod_python documentation doesn't mention that it is always safe to call read(), since read() calls ap_should_client_block behind the scenes, so as a user of mod_python you can always call it without wondering about transfer encodings, content-length headers, etc. In my own module I exposed ap_setup_client_block and ap_should_client_block, which means that from my Python code I have control over chunked encoding (I'm happy to accept it) and I can tell whether a body was actually supplied or not. (In HTTP/1.1 any request can have a body, including GET.)
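For reference, a non-leaking version of that connobject.c error path is a single call, since PyErr_SetString takes the message as a plain char* and leaves nothing for the caller to decref:

    if (rc != APR_SUCCESS) {
        PyErr_SetString(PyExc_IOError, "Connection read error");
        return NULL;
    }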
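And here is a sketch of what the req_sendfile error path could look like using apr_strerror (the buffer size is arbitrary; apr_strerror fills the buffer and returns it):

    if (status != APR_SUCCESS) {
        char errbuf[120];
        PyErr_Format(PyExc_IOError, "Could not open %s for reading: %s",
                     fname, apr_strerror(status, errbuf, sizeof(errbuf)));
        return NULL;
    }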
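For completeness, a minimal sketch in C of the documented body-reading sequence, assuming you want to accept chunked bodies (REQUEST_CHUNKED_DECHUNK; rejecting them, as described above, would be REQUEST_CHUNKED_ERROR):

    int rc = ap_setup_client_block(r, REQUEST_CHUNKED_DECHUNK);
    if (rc != OK)
        return rc;
    if (ap_should_client_block(r)) {
        char buf[8192];
        long nread;
        while ((nread = ap_get_client_block(r, buf, sizeof(buf))) > 0) {
            /* consume nread bytes of the request body */
        }
        if (nread < 0)
            return HTTP_INTERNAL_SERVER_ERROR;
    }
    /* ap_should_client_block returning 0 means no body was supplied */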
I found that the mod_python request write method accepts unicode strings provided they can be converted to string objects using the default encoding (this conversion happens because PyArg_ParseTuple is used with a format of "s"). Personally I think it is a really bad idea to accept unicode strings and have "magic" (the default encoding) convert them. With this kind of thing I believe it is far better to require the user to convert to bytes, explicitly choosing an encoding. (A sketch of stricter argument handling is below.)

mod_python appears to make no effort to ensure that things don't crash if the Python object representing a request outlives the request_rec it is wrapping, although I do recall some part of the documentation saying "don't do that". However, I think it is fair for Python users to expect not to be able to coredump the process no matter what they do. In my own module I solved this fairly trivially. When wrapping a request_rec, do an extra incref, then use apr_pool_cleanup_register. In that callback, set the wrapped pointer to NULL and do a decref. That causes the Python object to live at least as long as the request_rec, but if the Python object is still alive after the request_rec has gone away, the wrapped pointer will be NULL. All my methods check for NULL first and raise an ObjectDead exception if it is NULL. Consequently, no matter what happens, it is impossible to crash things. The same principle can be applied to other objects, since by definition they live only as long as their pool. (This is sketched below as well.)

mod_python doesn't do OOR, original object return, i.e. each time you get a Python-level request/header/etc. object wrapping an underlying Apache object, you get a different Python object. This shouldn't really matter that much, but it would be nice to get the same object back (i.e. with the same id). It took wxPython a while to figure out how to do this, since you need some way of mapping from the C-level object back to the Python-level object. I think pools can be used again here, for example via apr_pool_userdata_set with appropriate keys.

Finally (and this is cosmetic), the code for setting members of requestobject has a lot of duplication. For example, there are data structures defining the names and types, and then the code has additional type checking and names. I used PyGetSetDef's and tp_getset, and it results in no duplication :-)
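On the unicode point, here is a sketch of stricter argument handling for a write method. This is my illustration, not mod_python's actual code: parse with "O" instead of "s" and reject anything that isn't already a byte string.

    static PyObject *req_write(PyObject *self, PyObject *args)
    {
        PyObject *data;

        if (!PyArg_ParseTuple(args, "O", &data))
            return NULL;
        if (!PyString_Check(data)) {
            /* refuse unicode rather than applying default-encoding magic */
            PyErr_SetString(PyExc_TypeError,
                            "write() requires a byte string; "
                            "encode unicode explicitly");
            return NULL;
        }
        /* ... write PyString_AS_STRING(data), PyString_GET_SIZE(data) ... */
        Py_RETURN_NONE;
    }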
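Here is a minimal sketch of the lifetime scheme; the type and names (reqwrapobject, reqwraptype) are mine, not from either module:

    typedef struct {
        PyObject_HEAD
        request_rec *r;          /* set to NULL once the request is over */
    } reqwrapobject;

    static apr_status_t request_gone(void *data)
    {
        reqwrapobject *self = (reqwrapobject *)data;
        self->r = NULL;          /* later method calls see a dead object */
        Py_DECREF(self);         /* drop the extra reference taken below */
        return APR_SUCCESS;
    }

    static PyObject *wrap_request(request_rec *r)
    {
        reqwrapobject *self = PyObject_New(reqwrapobject, &reqwraptype);
        if (!self)
            return NULL;
        self->r = r;
        Py_INCREF(self);         /* lives at least as long as request_rec */
        apr_pool_cleanup_register(r->pool, self, request_gone,
                                  apr_pool_cleanup_null);
        return (PyObject *)self;
    }

Every method then begins with a check like this, where ObjectDead is the module's own exception object:

    if (!self->r) {
        PyErr_SetString(ObjectDead, "the request has already completed");
        return NULL;
    }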
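OOR can then be layered on top using pool userdata; a sketch, reusing wrap_request from the previous sketch, with an arbitrary key string:

    static PyObject *get_request_object(request_rec *r)
    {
        void *data = NULL;
        PyObject *obj;

        apr_pool_userdata_get(&data, "python::request", r->pool);
        if (data) {
            obj = (PyObject *)data;   /* same object, same id() as before */
            Py_INCREF(obj);
            return obj;
        }
        obj = wrap_request(r);
        if (obj)
            apr_pool_userdata_set(obj, "python::request",
                                  apr_pool_cleanup_null, r->pool);
        return obj;
    }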
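And to illustrate the tp_getset point, a sketch with a single member: the name appears once in the table, and each getter carries its own conversion, so there is no parallel type-checking code to keep in sync.

    static PyObject *req_get_uri(reqwrapobject *self, void *closure)
    {
        if (!self->r) {
            PyErr_SetString(ObjectDead, "the request has already completed");
            return NULL;
        }
        return PyString_FromString(self->r->uri ? self->r->uri : "");
    }

    static PyGetSetDef reqwrap_getset[] = {
        {"uri", (getter)req_get_uri, NULL, "the path part of the URI", NULL},
        {NULL}  /* sentinel */
    };

    /* and in the type object: reqwraptype.tp_getset = reqwrap_getset; */

Roger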