[mod_python] bug in apache.import_module

Tue May 3 02:14:40 EDT 2005

Huzaifa,

You're right in thinking that the same method will be accessed at the
same time, but wrong in thinking that there is a risk that a thread
messes with another thread's stack.

Multi-threading is perfectly predictable and safe on this level : you
can launch as many threads as you want running the same
function/method, each thread will execute in its own environment, each
one will get its own stack, therefore each one will have its own set
of local variables (including parameters) when the method will be
executed. There is absolutely no risk that changing a local parameter
in one thread messes with a local parameter in another thread.

You only have to worry about synchronization when multiple thread
share some data ; for example, for a bound method (a method which is
applied to a specific object instance), even if the threads stacks are
perfectly isolated, the multiple threads will target the same object
instance, and therefore one must think about synchronisation issues
when accessing the members of the object.

To go back to your fix_path() example, only local variables are
changed in the function. The local "path" variable changes, but not
the list passed as a parameter. This means the code would be perfectly
thread-safe even with the locking code removed.

Still, you will tell me, there is the possibility that someone changes
the shared path while I'm trying to fix it. Then, change your design.
At the minimum, you should place your locking code where it is needed,
not in the lowest common factor point.

The one you chose is pretty bad (sorry to be harsh ;-) : imagine you
want to call fix_path for each request. Bam, for each request, you'll
hold a global lock just for fixing a path variable ! You'll reduce the
server performance just for housekeeping issues...

Putting some code where it is needed is what OOP was made for (well,
it was made for making it easier, at least). What I'm suggesting is to
encapsulate the shared path variable in an object, and fix it each
time it is modified, not each time it is accessed. Then, you can use
whatever locking scheme you need to ensure the consistency of this
shared path variable.

Also, locking is not the only option for this : you can also copy the
path list and use a copy in the thread, instead of the original ; you
won't have to worry about the original list being modified, since each
thread will have its own copy.

There are many, many design choices to be made when build a
multi-threaded architecture. One thing is sure : you can't assume the
worst in each part of the architecture. You have to have places (like
this utility method) where you can safely assume that the parameters
you are given won't be modified during the call.

Multi-threading is not about locking everything, not at all, because
we might as well run in a single thread, the performances would be
better. It's about putting the right locks at the right place,
sometimes choosing to copy data instead of sharing it.

Sorry to be so vague, but to go further, I'd need to go deeper in your
code, and this I have not enough time to do :).

A very good book about multithreading is "Concurrent Programming in
Java" from Doug Lea. It's about Java and also quite old but a lot of
information can be applied to any multi-threading problem.

http://www.amazon.com/exec/obidos/tg/detail/-/0201310090/

Regards,
Nicolas

2005/5/3, Huzaifa Tapal <huzaifa at hostway.com>:
>  Hello Nicolas,
>  
>  What I am really worried about is that in a multi-threaded environment,
> accessing a modular level method, multiple threads might be accessing the
> same method at the same time and overwriting the other threads path list as
> a result.  
>  
>  As per your question, I create a _lock object on top of the module.  Pretty
> much this method was created in a "SharedResources" module as a modular
> level method.  My framework then imports the method and uses it within to
> fix the paths.
>  
>  What do you think?
>  
>  Hozi
> 
>  
>  Nicolas Lehuen wrote: 
>  Hi Huzaifa,
> 
> Well, that's really up to you, especially since I don't understand
> where the _lock variable comes from.
> 
> If it's not possible to change the path list which is passed as a
> parameter without acquiring the lock, then why not, but I'd put the
> locking mechanism in the calling code rather than in this function.
> Note that you are not modifying the path list in-place : you just
> return a copy of the path list with some fixes in it. So there is no
> need to lock anything here, except if you fear that the path list will
> be changed during the iteration, but for that, you have to put a
> locking mechanism somewhere else for any modification of the path
> list. So my first advice is : keep all locking code in the same place
> and you'll save time and mental health :).
> 
> My second advice is : don't be over-protective at the cost of
> performance. Trust the users of your code. Provided that they have a
> proper documentation, it's not necessary to lock anything, because
> your users will know that bad things will happen if they change the
> path list during iteration, so they'll do the right stuff accordingly.
> Likewise, it's not necessary to check whether path is a list, or
> whether some path entries are None or "". If your users want to give
> silly data to your function, let them do it, and let Python throw
> exceptions at them.
> 
> Regards,
> 
> Nicolas
> 
> On 4/29/05, Huzaifa Tapal <huzaifa at hostway.com> wrote:
>  
>  
>  I wanted to ask for a piece of advise. So the fix that I have mentioned, I
> moved that fix (the method I replaced with the map(lambda) solution you had
> with a minor fix) to another "SharedResources" module that I import in my
> framework's handler. From within the handler and my app, I am call that
> method passing in a path argument that it *fixes* and returns back the
> results. Here is what the new method looks like:
>  
>  def fix_paths( path ):
>  """ Convenience method for checking and 
>  adding '/' to the end of paths for compatibility
>  with mod_python's import_module. """
>  
>  _lock.acquire()
>  try:
>  if path and type(path) == type([]):
>  path = map(lambda entry : (entry!='' and entry[-1]!='/' and
> entry+'/') or entry,path)
>  
>  return path
>  finally:
>  _lock.release()
>  
>  Should I have the thread-locking in place or can I get by without having it
> in place? The reason I did add that was because the path value may get
> changed, in the process, I thought of making this method thread-safe.
>  
>  Also Nicolas, if you notice I added another check in your lambda function
> to check if the entry is an empty string. If that wasn't there then if an
> empty string path was passed in the list paths, the function would break. 
> Thought, i'd let you know if you were gonna add this functionality into the
> import_module method for apache.py
>  
>  Thanks
>  
>  Hozi
> 
>  
>  Nicolas Lehuen wrote: 
>  Sorry, reposting to the whole list :
> 
> Hi,
> 
> The change I've made for MODPYTHON-9 affects only the publisher
> module. It solves the problem encountered when you had two pages, one
> at /index.py, and the other at /foobar/index.py.
> 
> Huzaifa, what I don't understand here is whether moduleA has the same
> name as moduleB, and whether you are using mod_python.publisher or
> not.
> 
> If you use the publisher, your problem has been solved in the latest
> development version, and the fix will be in the next release.
> 
> Anyway, if we put aside the fix specific to the publisher, the problem
> you describe is real, and the workaround you provide is correct.
> 
> It is so correct that the only way I see of fixing that is to
> integrate the workaround in the import_module function, that is to say
> to check whether each path component passed in the path list argument
> ends with a '/' (or '\'), and if not, add a '/' (or '\\'). This is
> done easily in the first line of the function :
> 
> if path:
>  path = map(lambda entry : (entry[-1]!='/' and entry+'/') or entry,path)
> 
> However, there is no easy way of telling what directory separator
> should be used ; theoretically, os.sep gives us the correct directory
> separator, but even under Win32 Apache admins are advised to use "/"
> which is compatible (at the OS level). So, we might as well always use
> "/".
> 
> What I fear is that adding all these checks to import_module will make
> it a little bit too slow, so what about adding a line or two in the
> documentation about the fact that all directories mentioned in the
> path argument should end with a '/' ?
> 
> Regards,
> 
> Nicolas
> 
> On 4/29/05, Graham Dumpleton <grahamd at dscpl.com.au> wrote:
>  
>  
>  Nicolas, does the change you have already made for MODPYTHON-9
> also address this slightly different variation on the problem?
> 
> Original report was based on same name module in parent and child
> subdirectories, not sibling directories which share a common prefix
> to the subdirectory name.
> 
> Graham
> 
> Huzaifa Tapal wrote ..
>  
>  
>  For my fix, I patched this issue by passing the import_module function
> the path with "/" in the end. That way it distinguishes from
> /home/user/dir1 and /home/user/dir123
> 
> hozi
> 
> Graham Dumpleton wrote:
> 
>  
>  
>  See:
> 
>  http://issues.apache.org/jira/browse/MODPYTHON-9
>  http://issues.apache.org/jira/browse/MODPYTHON-10
>  http://issues.apache.org/jira/browse/MODPYTHON-11
> 
> There are various known issues with using same name module in different
> directories.
> 
> The first one has been addressed for next version of mod_python.
> 
> Graham
> 
> On 29/04/2005, at 8:22 AM, Huzaifa Tapal wrote:
> 
>  
>  
>  I think there is a bug in apache.import_module. I am using
> apache.import_module passing in the path where I want the module
> imported from. I am seeing a very odd result as to sometimes, the
> module that is loaded is not the right one. Instead it loads a
> module that is of the same name but in a different directory. Here
> is what is happening:
> 
> I have two modules of same name in 2 different directories as follows;
> 
> /home/user/dir1/moduleA
> /home/user/dir123/moduleB
> 
> Lets say I restart apache and load up moduleA. At that point there
> is an entry in sys.modules for moduleA. Lets say then, I load up
> moduleB, now the original entry for moduleA is overwritten by this
> new entry. Now when I go back to load moduleA, I get back moduleB
> results. Here is why:
> 
> In apache.import_module if I pass in the path to load moduleA from
> '/home/user/dir1' and the last module was in '/home/user/dir123' then
> the check in import_module to make sure both modules aren't the same
> fails since it does a "file.startswith()" check on the path. Since
> the path moduleA is in the path to moduleB, it loads the wrong module.
> 
> I see that somebody else had complained about this problem in
> mod_python 1.3.1 but it still has not been patched.
> 
> Any ideas?
> 
> Hozi
> 
> 
> _______________________________________________
> Mod_python mailing list
> Mod_python at modpython.org
> http://mailman.modpython.org/mailman/listinfo/mod_python
>  
>  
>  
>  _______________________________________________
> Mod_python mailing list
> Mod_python at modpython.org
> http://mailman.modpython.org/mailman/listinfo/mod_python
> 
>  
>  _______________________________________________
> Mod_python mailing list
> Mod_python at modpython.org
> http://mailman.modpython.org/mailman/listinfo/mod_python
> 
> 
>  
> 
>  
>  
>  
>