[mod_python] Re: some questions about using mod_python

Sun Mar 20 19:01:27 EST 2005

Graham Dumpleton wrote:

> Another one of my long rambles. Making up for not reading email for
> a few days at a time at the moment. :-)
> 
> On 20/03/2005, at 3:52 PM, vegetax wrote:
> 
>> hi, i just finished reading the docs and i have some obstacles to start
>> implementing a web system using mod_python.
>>
>> First 2 observations in the docs:
>> - please correct the docs in the hello world example, it doesn't work!
>> req.content_type = 'text/html' is needed; i spent an hour trying to
>> find the problem in the hello world.
> 
> Are you talking about the example in:
> 
>    http://www.modpython.org/live/current/doc-html/inst-testing.html
> 
> The example will work without req.content_type being set, at least
> mod_python will work correctly.
> 
> The problem is that if your Apache configuration does not set:
> 
>    DefaultType text/plain
> 
> so that it is returned with the response when the handler doesn't set a
> content type, then your browser will not necessarily know what to do with
> a file with an extension of ".py" and may ask you to save the response to
> a file instead of showing it in the browser itself.
> 
> It is thus always a good idea to set the content type as a matter of best
> practice, but what happens when it is not set is an issue to do with the
> Apache configuration and not mod_python.
>
> BTW, for that particular example, it should be "text/plain" for the
> content type and not "text/html".
> 
>> - Please note in the docs that any code which sends output headers,
>> including cookies and sessions, must run before any req.write()
> 
> This one is fair enough. The only veiled reference on the request object
> members page seems to be:
>
>    set_content_length(len)
>       Sets the value of req.clength and the "Content-Length" header to len.
>       Note that after the headers have been sent out (which happens just
>       before the first byte of the body is written, i.e. first call to
>       req.write()), calling the method is meaningless.
> 
> I would suggest you log a report for an improvement to mod_python at:
> 
>    http://issues.apache.org/jira/browse/MODPYTHON
> 
> You should probably reference this email as it appears in the mailing
> list archive.
>
> This is the only place where such requests will be noticed over time. If
> they are only on the mailing list they will be lost and forgotten.

Done, added a documentation improvement to jira.
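
A minimal sketch of the ordering rule discussed above: set the content type and any other output headers first, then call req.write(). The FakeRequest class is a hypothetical stand-in so the handler can be exercised outside Apache; it is not part of mod_python, and the fallback value of OK is an assumption for that case.

```python
try:
    from mod_python import apache  # only importable when running inside Apache
    OK = apache.OK
except ImportError:
    OK = 0  # assumption: stand-in status code for running outside Apache


def handler(req):
    # Headers first: the content type and any extra output headers.
    req.content_type = "text/plain"
    req.headers_out["X-Example"] = "set-before-write"
    # The first write is the point at which the headers go out.
    req.write("Hello World!")
    return OK


class FakeRequest(object):
    """Hypothetical stand-in for the mod_python request object."""

    def __init__(self):
        self.content_type = None
        self.headers_out = {}
        self.sent_headers = None  # snapshot taken at the first write
        self.body = []

    def write(self, text):
        if self.sent_headers is None:
            # Mimic the server: headers are frozen at the first write.
            self.sent_headers = dict(self.headers_out)
            self.sent_headers["Content-Type"] = self.content_type
        self.body.append(text)


fake = FakeRequest()
status = handler(fake)
```

Anything assigned to req.headers_out after the first req.write() would not appear in the snapshot, which is the behaviour the documentation note is about.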

>> My doubts:
>>
>> - The PythonPath directive doesn't work at all. When i set it at any
>> config level i get a NOT FOUND error from apache when i try to access
>> anything that uses mod_python. The definition is:
>> PythonPath "sys.path + ['/devel/classes']"
> 
> Hmmmm, PythonPath does generally work okay from what I have seen. The only
> issue I have with it is that if high up within the directory hierarchy you
> set it to:
>
>    PythonPath "sys.path"
>
> then there is no going back. That is, even if in a subdirectory you use
> SetHandler/PythonHandler to enable mod_python a second time, PythonPath
> will be inherited from the mod_python scope higher up and no extension of
> the Python path will occur in the newly introduced mod_python scope. :-(
> 
> Anyway, you might like to be a bit more specific and give some working
> examples which demonstrate the problem. Is this somehow tied up with your
> redirections from publisher to PSP? I can see them potentially screwing
> each other up if their requirements for setting the Python path are
> different.
> 
> Are your PSP pages nested at a lower scope than the publisher handlers
> that redirect to them? Maybe you are running up against a similar issue
> to what I was having with nesting of different methods for using
> mod_python.

None of those cases; it is just that no handler works at all when i set
PythonPath to "sys.path" or "sys.path + ['/devel/classes']".
But i will dig into it more to find the issue.
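
One way to sanity-check a PythonPath value outside Apache is to evaluate the same expression by hand. This sketch assumes the directive's value is evaluated as a Python expression with sys available, which is what the "sys.path + [...]" syntax suggests; check_python_path is a hypothetical helper, not a mod_python function.

```python
import os
import sys


def check_python_path(expression):
    """Evaluate a PythonPath-style expression and report missing directories."""
    # Assumption: the value is a Python expression that can refer to `sys`.
    new_path = eval(expression, {"sys": sys})
    # Entries that are not existing directories (may include zip archives).
    missing = [p for p in new_path if p and not os.path.isdir(p)]
    return new_path, missing


new_path, missing = check_python_path("sys.path + ['/devel/classes']")
```

If the expression raises an error or the added directory shows up in the missing list, that at least narrows the problem down before looking at the Apache side.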

>> - Where do i set up a database connection pool to load at server
>> initialization, so that all requests can access it? Is the PythonImport
>> directive the best place? Where do i set a cleanup function for the
>> pool at server finalization?
> 
> Cleanup function registration for stuff that should be done at time of
> child termination can only be done with req.server.register_cleanup().
> There probably should be an apache.register_cleanup() method which would
> be available from a module imported using PythonImport. This would then
> be the best way of doing it.
>
> It seems that the best one could do now is import the module when
> required but don't do anything at the time of import which would require a
> cleanup function to be registered. Then, when the first handler calls in
> to the actual module, require that the "req" object be passed into the
> pool, with those resources which need to be cleaned up later being created
> then with a cleanup function being registered through
> req.server.register_cleanup().
> 
> I have added a bug report suggesting that apache.register_cleanup() be
> added so that it can be used from a module imported using PythonImport.

But it is too problematic to clean up resources at request level. I think in
the meantime i will clean up resources like connections with an external
script which i run after apache exits.
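
A minimal sketch of the lazy approach Graham describes above: the pool is created on the first request that asks for it, and its cleanup is registered once at that moment. ConnectionPool, FakeServer and FakeRequest are hypothetical stand-ins so the sketch runs outside Apache, and the argument order passed to register_cleanup() is an assumption that should be checked against the mod_python documentation for the version in use.

```python
import threading

_pool = None
_pool_lock = threading.Lock()


class ConnectionPool(object):
    """Hypothetical pool; a real one would hold database connections."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def _close_pool(pool):
    # Cleanup callback; receives the data passed at registration time.
    pool.close()


def get_pool(req):
    """Create the pool on first use and register its cleanup exactly once."""
    global _pool
    with _pool_lock:
        if _pool is None:
            _pool = ConnectionPool()
            # Assumed call shape: register_cleanup(request, callable, data).
            req.server.register_cleanup(req, _close_pool, _pool)
        return _pool


class FakeServer(object):
    """Hypothetical stand-in that records cleanups instead of running them."""

    def __init__(self):
        self.cleanups = []

    def register_cleanup(self, req, func, data=None):
        self.cleanups.append((func, data))


class FakeRequest(object):
    """Hypothetical stand-in exposing only the `server` attribute."""

    def __init__(self, server):
        self.server = server


server = FakeServer()
first = get_pool(FakeRequest(server))
second = get_pool(FakeRequest(server))

# Simulate child termination by running the recorded cleanups.
for func, data in server.cleanups:
    func(data)
```

The lock matters under a threaded MPM, where two requests could otherwise both see an empty pool and register two cleanups.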

> FWIW, in Vampire, when Vampire's module importing mechanism is used a
> stripped down request object is available in the set of global variables
> during import as __req__. Thus in Vampire one could actually register
> a cleanup function during import by using:
> 
>    __req__.server.register_cleanup(....)
> 
> This would save each handler having to pass the req object into a pool
> and means one wouldn't have to delay creation of resources which needed
> the cleanup function to be registered.

Looks like a good solution when the cleanup is needed per request. It is
also possible that the pool component was made by someone else and can't
take req as a parameter.

>> - Is it ok to configure apache to use just one process and several
>> threads, like on windows? What other implications does it have, besides
>> losing some of the stability and safety that apache provides? It is just
>> that too many things go wrong in dynamic applications when mixing both
>> processes and threads.
> 
> No reason why you can't use the "worker" MPM with one process and many
> threads just like on Win32. You just have to deal with the same
> multithreading issues as you do on Win32.
>
> First thing to do is to patch mod_python to fix the multithreading
> problems. The patches can be found at:
> 
>    http://www.dscpl.com.au/projects/vampire/PATCHES
> 
> This address will change after easter to:
> 
>    http://www.dscpl.com.au/projects/vampire/patches.txt
> 
> In terms of other multithreading issues, there are a few problems you
> need to be aware of and code for if you want a robust application. One of
> my prior posts on this topic in relation to module importing is:
> 
>    http://www.modpython.org/pipermail/mod_python/2004-October/016605.html
> 
> You should go back and forth within that particular thread for other
> stuff related to threading.

Thanks i will analyze those issues.

>> - I want to use an MVC approach: the publisher's methods are the
>> controllers that do the processing and send internal redirects to psps
>> to show the results. So i need to pass objects from the pub methods to
>> the psps; those objects need to be in the request object of the handler
>> and available to the target psp.
> 
> Why do you need to redirect the request to PSP? Why couldn't you simply
> write a common method of your own which triggered PSP page rendering
> directly within your publisher method with the desired environment?
> 
> At its simplest, you could use:
> 
>      template = psp.PSP(req,filename=path,vars=settings)
>      template.run()
> 
> Where "path" is the name of the PSP file and "settings" is a dictionary
> populated with data that your controller has obtained from the model.
> Using redirection seems to me to be drawing too much of an artificial
> separation between your controller and view.

Yes, i am trying to make a well defined separation.

>> this code doesn't work, it shows an error saying the req object has no
>> attribute 'data':
>>
>> def regHandler(req):
>>     data = [1,2,'a']
>>     req.data = data
>>     req.internal_redirect('/var/www/myapp/psp/showReg.psp')
>>     return apache.OK
>>
>> showReg.psp:
>>
>> the data :
>> <%= req.prev.data %>
>>
>> I also tried to load the data into a session object retrieved or created
>> in regHandler, but with the same results: the session object in the psp
>> always creates a new session and then gives: Key error, session object
>> has no key 'data'
> 
> The documentation does say:
>
>    The httpd server handles internal redirection by creating a new
>    request object and processing all request phases. Within an internal
>    redirect, req.prev will contain a reference to a request object from
>    which it was redirected.
> 
> However, this doesn't explicitly say that "req.prev" will be the same req
> object as was used in the handler from which redirection occurred. All it
> says is that it "will contain a reference to a request object from which
> it was redirected".
>
> I read this as saying that "req.prev" will hold valid data pertaining to
> the original request as passed in by Apache, but I don't see it
> guaranteeing that any data you might cache in the original request object
> will be available.
>
> But then, I back this up by looking at the source code to see that
> internal redirection is handled by a call back into Apache.
> 
>    ap_internal_redirect(new_uri, self->request_rec);
> 
> Here "request_rec" is the original Apache request object and not the
> Python one that wraps it, thus anything that you stash in the Python part
> of the object cannot possibly be available to the handler to which
> redirection occurs as the Apache code which does the redirection doesn't
> have access to it.
>
> All that could be said is that the documentation could explicitly say
> that any user data stashed in the req object is not available. That would
> clear things up.

Perhaps another documentation issue should be filed in jira to cover all
these internal redirect complications.
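
A toy model of the explanation above, in plain Python rather than mod_python: two wrapper objects around the same underlying record share the record's fields, but an attribute stashed on one wrapper is invisible to the other. Record and Wrapper are hypothetical names standing in for Apache's request_rec and the Python request object; this only illustrates the reasoning and is not how mod_python is actually implemented.

```python
class Record(object):
    """Stands in for the underlying Apache request record."""

    def __init__(self, uri):
        self.uri = uri


class Wrapper(object):
    """Stands in for a Python-level object wrapping a Record."""

    def __init__(self, record):
        self._record = record

    @property
    def uri(self):
        # Data held by the record is visible through any wrapper.
        return self._record.uri


record = Record("/myapp/regHandler")

original = Wrapper(record)
original.data = [1, 2, "a"]  # stashed on this wrapper only

# Assumption: the redirect only hands over the record, so the target
# handler sees a freshly built wrapper around it.
prev = Wrapper(record)

same_uri = prev.uri == original.uri
has_data = hasattr(prev, "data")
```

Here same_uri is true while has_data is false, which matches the "req object has no attribute data" error from the psp.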

>> Also, the post at
>> (http://dotnot.org/blog/archives/2004/06/27/nasty-deadlock-in-modpython-when-using-sessions/)
>> describes some strange things happening to session locks when internal
>> redirecting between python handlers, in my case between the psp handler
>> and the python handler or publisher. How can i overcome those issues?
>>
>> What exactly happens between internal redirects that affects mod_python
>> behavior, sessions, etc., when used the way i want to use it? And is it
>> fast to internal redirect a lot, since all request phases are processed
>> every time?
> 
> That one has to unlock sessions explicitly before an internal redirect has
> been covered on the mailing list a number of times. The documentation
> could mention it and there probably should be a FAQ entry for it.
>
> It all comes about because an internal redirect effectively appears like a
> nested function call from the original handler. I.e., after the
> redirection, the original handler continues to execute. Since sessions use
> a non-reentrant lock, a second attempt to lock it from the same thread will
> cause a deadlock.
>
> At this point I don't know enough about the internals of the Apache runtime
> library to know whether it is possible to have reentrant locks, but if it
> is, it might be reasonable to have sessions use a reentrant lock instead
> and this whole problem might be avoided.
> 
> Note however that if in your own handler code you used non-reentrant locks
> you would potentially end up with the same sort of problems and would
> either have to unlock them before the internal redirect, or change to a
> reentrant lock. I.e., threading.RLock instead of threading.Lock.
>
> Thus, one could possibly improve mod_python by using a reentrant lock for
> a session object if no good reason could be found not to. The only danger
> in doing that is that the same code will no longer work on older versions
> of mod_python. It might be sensible to wait until a point where a major
> version of mod_python was brought out which wasn't backwards compatible
> in other ways as well.
> 
> Hope this is all interesting.

Yes, i think this handler chaining using internal redirects may be
needed by more complex handlers, and a reentrant lock is the only solution
that comes to mind. Older code would also be really easy to fix.
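
A small sketch of the difference between threading.Lock and threading.RLock mentioned above, using only the standard library. The nested acquire is non-blocking so that the plain lock reports failure instead of hanging; nested_acquire_succeeds is a hypothetical helper name for this illustration, and it says nothing about how mod_python's own session locks are implemented.

```python
import threading


def nested_acquire_succeeds(lock):
    """Return True if the thread holding `lock` can take it a second time."""
    with lock:
        # Non-blocking, so a plain Lock returns False rather than deadlocking.
        got_it = lock.acquire(False)
        if got_it:
            lock.release()
        return got_it


plain_result = nested_acquire_succeeds(threading.Lock())
reentrant_result = nested_acquire_succeeds(threading.RLock())
```

With a blocking acquire, the plain lock case is exactly the deadlock described for a handler that redirects while still holding its lock.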

> BTW, have you considered other page templating solutions besides PSP? In
> terms of best separation between model, view and controller, or at least
> between the HTML that represents a page and the code that populates it, I
> would recommend using HTMLTemplate.
> 
>    http://freespace.virgin.net/hamish.sanderson/htmltemplate.html
> 
> Why I prefer it over PSP is that in PSP you are effectively still
> embedding Python code in the template itself and to render the template
> you are actually executing it, with there being call outs from the
> template to collect data. In HTMLTemplate, the template object is
> distinct, with your controller code making calls into the template object
> when desired to set fields within it. I.e., DOM like but not having the
> overhead of a DOM because only fillable parts of the template are indexed.
> 
> What this means is that with HTMLTemplate you aren't forced to prepare all
> your data up front before filling in the template; instead you can fill it
> in bit by bit.
>
> I can supply references to examples of using HTMLTemplate from Vampire
> later if you are interested.
>
> Graham

Thanks for the advice Graham, but i don't share the philosophy of those
template engines: first and last, they use their own tag language, and i HATE
that. I like psp because it lets you embed python code, so that i can
generate complex dynamic views, but the code inside the psp is only for
displaying content; 100% of the form processing is done by the controller
calling the domain objects' methods! In extremely exceptional situations the
psp code will have a little processing or data gathering, but those are only
exceptions. The key here is that the team respects the rules.

I work with web designers, and you simply can't let them touch the dynamic
parts, because those parts tend to be complex. They should work on the static
parts and coordinate with the programmer in charge of the dynamic view
generation, and then the dynamic and static parts get plugged together. In
order to coordinate they MUST have some programming knowledge; anyway, i use
javascript a lot and i don't consider a person who doesn't know javascript to
be a web designer. That's my point of view.

What i am trying to accomplish are two things:

First: make a small ordering system that has all the peculiarities of a
data oriented application:
1 common sql queries like insert, delete, findbyName, findbyNameEmail, etc.
2 common ORM (object-relational mapping) issues like tree-style
relationships and inheritance.
3 extended data gathering and processing components that are business
specific, and therefore created on a case by case basis.

Second: make a code generator that takes an sql script with some metadata
required by the generator.
This will generate 1 and 2, and provide a framework to build 3, the domain
specific rules.
The generator should be flexible enough to stay out of your way when
defining the business specific rules, queries and processing; i will use an
approach very similar to the extensibility apache provides through handlers.
In case of a requirement change or system extension the generator should only
regenerate the parts it generated previously and not delete user written code.

As you can see it has very complex requirements. The sql and domain model
generation is more or less ready for testing; now i need to test the
interface design and generation. The interface code should be very clean and
modular so that the generator can do its work, and provide a clean extension
hook for programmer defined interfaces.

So psp seemed like a good choice, because it provides the power of python to
generate complex interfaces, defined by the code generator or by the
user, while it just displays things, with no processing logic.
But i have my doubts about the extension system that psp provides: the
"include" directive doesn't seem enough for my complex requirements, so i was
thinking about using my own interface code generation libraries, which can
use object oriented goodies like polymorphism and inheritance to provide the
required flexibility and extensibility.
But i am also considering using internal redirects to servlets or something
like that. Servlets use the full object oriented power of python to generate
dynamic interfaces, but i also have my doubts about that.