Graham Dumpleton
grahamd at dscpl.com.au
Sat Apr 22 09:51:00 EDT 2006
Sorry for taking so long to get back to this email. Busy day ...

On 22/04/2006, at 2:08 AM, Jorey Bump wrote:

> Graham Dumpleton wrote:
>> Graham Dumpleton wrote ..
>>> The new module importer completely ignores packages as it is
>>> practically impossible to get any form of automatic module
>>> reloading to work correctly with them when they are more than
>>> trivial. As such, packages are handed off to standard Python
>>> __import__ to deal with. That it even finds the package means
>>> that you have it installed in sys.path. Even if it was a file
>>> based module, because it is on sys.path and thus likely to be
>>> installed in a standard location, the new module importer would
>>> again ignore it as it leaves all sys.path modules up to Python
>>> __import__ as too dangerous to be mixing importing schemes.
>>>
>>> Anyway, that all only applies if you were expecting
>>> PyServer.pyserver to automatically reload upon changes.
>
> Graham, can you enumerate the different ways packages are handled,
> or is it enough to say that packages are never reloaded? In this
> thread, you explain that when a package is imported via
> PythonHandler, mod_python uses the conventional Python __import__,
> requiring an apache restart to reliably reload the package, as in
> the past.

That is correct. What it means is that packages will only be found if
they are located somewhere along sys.path, and they remain held in
sys.modules because it is the builtin Python __import__ that loads
them. As such, they are not regarded as being reloadable by mod_python.

Whether you can reload packages using the Python "reload" function I
don't know, as I have never tried. I would probably not recommend
trying, so an Apache restart is still going to be the only way to
reliably reload a package. Therefore nothing has changed in this
respect from the current importer.
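To make the sys.modules point above concrete, here is a minimal sketch using only the standard library, with no mod_python involved. The package name "demo_pkg" and the temporary directory are purely illustrative assumptions. It shows that once a package has been loaded through the normal import machinery, a later import returns the cached object even if the source on disk has changed.

```python
import importlib
import pathlib
import sys
import tempfile

# Create a throwaway package (hypothetical name) in a temporary directory.
tmp = tempfile.mkdtemp()
pkg_dir = pathlib.Path(tmp) / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("VALUE = 1\n")

# The package is only found because its parent directory is on sys.path.
sys.path.insert(0, tmp)
first = importlib.import_module("demo_pkg")

# Change the source on disk, then import again.
(pkg_dir / "__init__.py").write_text("VALUE = 2\n")
second = importlib.import_module("demo_pkg")

# The second import is served from sys.modules; the edit is not seen.
print(first is second, second.VALUE)
```

In a long-running process such as Apache, that cached copy lives until the process is restarted, which is why a restart remains the reliable way to pick up package changes.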
I really did try hard to get reloading working with packages, i.e.,
many nights over a few weeks, but in the end, although I could see a
glimmer of hope that it might work, it just became too impractical.

Some of the problems are that sub imports within packages will only
work when the parent module in the package is listed in sys.modules.
Thus one had to fake up horrible unique module reference names under
which to store a reference to the modules in sys.modules. Because of
reloading, this also had to be tagged with an incarnation version
number so that when reloading you weren't overwriting the currently
loaded one.

The other big problem was that you get cyclic dependency loops in
packages because of how you reference back through the root of the
package when doing sub imports. This meant that any change to any
module file within the package had to trigger a complete reload of all
files which made up the package. I.e., you had to treat the package as
a complete blob, otherwise it became impossible to implement and you
invariably somehow got different versions of a module in use in
different parts of the package at the same time. Very messy.

What I have hoped to achieve with some of the other features in the
new module importer is a way of getting the same effect that packages
were generally being used for, i.e., namespacing and encapsulation,
while still being able to support reloading. It does though mean
restructuring your imports a bit and it only becomes usable within the
context of mod_python, but then if it was some generic package which
wasn't mod_python specific to support a web application, one could
question why one would expect it to be reloadable anyway.

> This also implies that if a published module imports a package, and
> the published module is touched or modified, then the module will
> be reloaded, but not the package. Is this correct?
Correct, the file based handler module can be reloaded, but the
package will be referenced out of sys.modules, where it already
resides, by the Python __import__ builtin importer.

>> BTW, that something outside of the document tree, possibly in
>> sys.path, is dealt with by Python __import__ doesn't mean you can't
>> have module reloading on stuff outside of the document tree. The
>> idea is that if it is part of the web application and needs to be
>> reloadable, that it doesn't really belong in standard Python
>> directories anyway. People only install it there at present
>> because it is convenient.
>
> There are security benefits to not putting your code in the
> DocumentRoot. It's also useful to develop generic utilities that
> are used in multiple apps (not just mod_python), but that you don't
> want available globally on the system. I prefer extremely minimal
> frontends in the DocumentRoot, with most of my code stored
> elsewhere. Will the new importer support reloading modules outside
> of the DocumentRoot without putting them in sys.path?

If you don't want certain modules available globally on your system,
i.e., not in the site-packages directory, you can obviously still set
PythonPath just within the mod_python configuration so they are found
without it affecting command line Python. Obviously these are still
notionally on sys.path and so would not be candidates for reloading.

As I mentioned, setting PythonPath currently has a nasty side effect,
preserved from the current importer, whereby it causes the Directory
directive directory not to be searched. I want to get rid of this
behaviour though, as it doesn't seem to make too much sense with the
new module importer:

  http://issues.apache.org/jira/browse/MODPYTHON-154

although one still has to be careful in doing it, as it may cause
existing applications to now incorrectly pick up a module from the
Directory directive directory when it wouldn't have before.
Because of path ordering issues in the current importer, using common
names in multiple locations always caused unpredictable behaviour
though.

Now in terms of modules which are candidates for reloading being able
to be found on some search path, the first thing that could be done
(hasn't yet) is to allow a path to be specified for handler
directives by:

  PythonHandlerPath '["/some/path1":"/some/path2"]'
  PythonHandler mydispatcher

The idea here is that where the specified handler module is not an
absolute or relative path, i.e., is just a module name, the path
defined by the PythonHandlerPath directive would be appended to the
"path" argument of the apache.import_module() function call
internally, the current value of the "path" argument in this situation
being the Directory directive directory. The order of search would
then be: look in the Directory directive directory, then search
PythonHandlerPath, and then fall back to sys.path.

Note that this can be done now by virtue of a shell handler in the
document tree simply containing something like:

  from mod_python import apache

  _inner = apache.import_module("modname",
      path=["/some/path1", "/some/path2"])
  handler = _inner.handler

But then, it is probably better to use a full path name in the config
to begin with, and that is more probably what I want to promote as the
preferred mechanism with the new importer. This is the main reason why
I haven't added PythonHandlerPath. That is, I think using an absolute
path name is better in being more precise.

The other reason PythonHandlerPath hasn't been implemented yet is that
the new importer is still optional and hasn't been properly embedded
into mod_python. Until it is accepted as the correct way to go, I
didn't want to be adding new directives or changing other parts of
mod_python which need to be changed so it works correctly in all
situations.
See:

  http://issues.apache.org/jira/browse/MODPYTHON-155
  http://issues.apache.org/jira/browse/MODPYTHON-156

for a couple of other examples of things which I haven't been able to
do yet and can't really until a decision is made to embed it properly.

So, PythonHandlerPath is one way that some special search path could
be consulted for reloadable modules. This though would only apply by
default to top level handler imports; it would not apply to explicit
calls to apache.import_module().

Overall I am a bit hesitant about introducing a directive which would
provide a search path which apache.import_module() would automatically
search. The reason is that, as in the current importer, this can cause
problems where different parts of the document tree decide to set the
search path differently.

For example, imagine a common set of modules outside of the document
tree which are used by code running under different parts of the
document tree, and which therefore may have different handler search
paths defined. Whichever part of the document tree calls into the
common code first will dictate how a search is done for some other
module, if the common modules expect to find it on the search path. If
one part of the document tree doesn't include this other place, the
search will fail. In other words, the common modules are relying on a
search path that is in part out of their control.

Hope you follow what I am getting at here. It is in some way the sort
of situation Dan had with the "config" module. His code was relying on
the fact that the directory his config module was in was on sys.path.
But with PythonPath effectively being in random order, based on access
order, when set to different things in different parts of the document
tree, if someone else provided a config module under the same name it
would be found by mistake and he would not get the one he wanted.
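The ordering problem described above can be sketched in a few lines of plain Python. This is not how mod_python searches; find_module_file() is a hypothetical helper, and the two application directories are made up for illustration. The point is only that a first-match search makes the result depend on who set up the path, and in what order.

```python
import os
import tempfile


def find_module_file(name, search_path):
    """Return the first '<name>.py' found along search_path, or None."""
    for directory in search_path:
        candidate = os.path.join(directory, name + ".py")
        if os.path.isfile(candidate):
            return candidate
    return None


# Two hypothetical application directories, each with its own config.py.
root = tempfile.mkdtemp()
app_a = os.path.join(root, "app_a")
app_b = os.path.join(root, "app_b")
for directory in (app_a, app_b):
    os.mkdir(directory)
    with open(os.path.join(directory, "config.py"), "w") as handle:
        handle.write("OWNER = %r\n" % os.path.basename(directory))

# The same lookup gives a different file depending on the path order.
found_ab = find_module_file("config", [app_a, app_b])
found_ba = find_module_file("config", [app_b, app_a])
print(found_ab)
print(found_ba)
```

Code that asks for "config" by bare name gets whichever copy happens to come first, which is the same kind of surprise as the one with the "config" module mentioned above.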
My feeling is that those modules should be self contained, or if they
do need to search elsewhere, that they should somehow define the
search path for the other module themselves, i.e., using the "path"
argument to the apache.import_module() function. This ensures they get
what they wanted.

So, an equivalent to PythonPath for reloadable modules could be
provided, but I'd only really want to do it when good use cases are
shown, and when it is also shown that unpredictable behaviour isn't
just going to result again because of how it could be set differently
in different parts of the document tree. One would also have to come
up with a way to extend such a path inherited from a parent context,
i.e., like how one can refer to sys.path in PythonPath now.

>> The better way of dealing with this with the new module importer is
>> to put your web application modules elsewhere, ie., not on sys.path.
>> You then specify an absolute path to the actual .py file in the
>> handler directive.
>>
>> <Directory />
>> SetHandler mod_python
>> PythonHandler /path/to/web/application/PyServer/pserver.py
>> ...
>
> How arbitrary is this path? Must it be within the DocumentRoot?

It is an absolute path relative to the root of the filesystem as a
whole, so it can be anything you want. It can include drive specifiers
on Win32.

There currently is a shortcut that can be used to refer to a path
relative to the directory the Directory directive refers to. This is:

  PythonHandler ~/mymodules/handler.py

I.e., the "~/" prefix.

As I mentioned in a previous email, I want to get rid of the "~/"
prefix as a general mechanism. What I mean here is that currently this
can also be used in explicit calls to apache.import_module(), where it
will refer to the current value of req.hlist.directory as the root of
the path. This leads to unpredictability with common modules, as
discussed above, and so I am getting rid of it.
Instead, for the handler directive case, I will allow:

  PythonHandler ./mymodules/handler.py

or:

  PythonHandler ../mymodules/handler.py

I.e., relative to the directory the Directory directive specifies.

>> Most cases I have seen is that people use packages purely to create
>> a namespace to group the modules. With the new module importer that
>> doesn't really need to be done anymore. That is because you can
>> directly reference an arbitrary module by its path. When you use
>> the "import" statement in files in that directory, one of the
>> places it will automatically look, without that directory needing
>> to be in sys.path, is the same directory the file is in. This
>> achieves the same result as what people are using packages for now
>> but you can still have module reloading work.
>
> Does it (the initial loading, not the reloading) also apply to
> packages in that directory? Or will it only work with standalone
> single file modules in the root of that directory?

It only works for standalone single file modules. A Python package
always has to be on sys.path and will never be reloadable by
mod_python.

Note that if a package is very simple, i.e., it is a single level and
refers to modules in the same package directly rather than through the
root, then using:

  package = apache.import_module("/some/path/package/__init__.py")

can often work though, and will give you reloading as well.

> This is all very nifty, because it implies that a mod_python
> application can now be easily distributed by inflating a tarball
> and specifying the PythonHandler accordingly.

If the PythonHandler path refers to the extracted tarball by absolute
path, then yes, it becomes simpler as there is no need to mess with
PythonPath or install it into site-packages. You just can't implement
it as a traditional package, but then because it is self contained in
its own directory which isn't mentioned in sys.path, you still have
the ability to internally structure it how you want.
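For anyone wondering why single file modules referenced by path are so much easier to reload than packages, here is a rough sketch of the general idea in plain Python. It is not the mod_python implementation; load_by_path() and its cache are hypothetical, and it ignores things the real importer has to handle, such as dependencies between modules and thread safety. The module is keyed by its absolute path, kept out of sys.modules, and re-executed when the file's modification time changes.

```python
import os
import types

# Hypothetical cache: absolute path -> (mtime in nanoseconds, module).
_cache = {}


def load_by_path(path):
    """Load a single-file module by path, reloading if the file changed."""
    path = os.path.abspath(path)
    mtime = os.stat(path).st_mtime_ns

    cached = _cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]

    # Build a fresh module object; it is never registered in sys.modules,
    # so it cannot clash with a same-named module somewhere on sys.path.
    name = os.path.splitext(os.path.basename(path))[0]
    module = types.ModuleType(name)
    module.__file__ = path

    with open(path, encoding="utf-8") as handle:
        source = handle.read()
    exec(compile(source, path, "exec"), module.__dict__)

    _cache[path] = (mtime, module)
    return module
```

Calling load_by_path() twice on an unchanged file returns the same module object, while touching or editing the file yields a new one on the next call. Because each file is a self-contained unit here, none of the parent-in-sys.modules or whole-package reload issues arise.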
> If the new importer works outside of the DocumentRoot,

Which it does, but then I probably don't need to confirm that
again. :-)

> and Location is used instead of Directory, no files need to be
> created in the DocumentRoot at all. Or is this currently
> impossible, in regards to automatic module reloading? I already do
> this for some handlers I've written, and really like the
> flexibility provided by the virtualization.

Technically it is probably possible to have nothing at all in the
document tree. You can do this now with the current importer though,
but it means messing with PythonPath, with all the problems that
entails, and other code can pick up your handler modules. By being
able to specify an absolute path to your handler bundle, it becomes
completely separate and would only be accessible by other code
similarly accessing it by absolute path.

I think perhaps you are starting to see where I am in part going with
the new module importer. That is, I am introducing this new way of
being able to refer to stuff by explicit paths, thereby breaking away
from sys.path and all the problems that result from it. It means
restructuring stuff a bit and it will not be backward compatible, but
I think that overall it is a much better way of doing it, with better
compartmentalisation and predictability.

Anyway, that was a long ramble. I really need to start getting some of
this documented properly, as there is certainly more to the new module
importer than providing an exact replacement for the old. I think the
possibilities are quite promising, but I need to explain it well so
people don't get the wrong idea and so they see that there are good
reasons for doing it.

BTW, I forgot to say more about how the "path" argument to
apache.import_module() behaves when the module name is given as an
absolute or relative path. This is something I started talking about
in a previous email to Dan. If you didn't read that one, you may want
to go back and look at it.
I'll need to revisit that one again, as that is the one area that
probably still needs to be thought out properly, with changes still to
be made to make it more usable.

Definitely getting late now, but then I slept most of the afternoon as
I felt a bit funny. :-)

Graham