[mod_python] file upload very slow in fieldStorage

Mike Looijmans nlv11281 at natlab.research.philips.com
Tue Oct 25 05:00:13 EDT 2005


I was looking at Barry's fix in util.py, but I had already done some 
work in the same direction, in order to upload huge files (to a TAPE 
streamer, go figure...).

http://www.modpython.org/pipermail/mod_python/2005-March/017773.html

My idea is that the "read_to_boundary" function is unnecessarily complex. 
The following code does basically the same thing, but it skips a lot of 
memcpy calls (based on the 3.1.4 code but should work for 3.2.x too):

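     def read_to_boundary(self, req, boundary, file):
         # delim holds the line ending stripped from the previous chunk. It is
         # written only once another data line follows, so the line ending
         # just before the boundary never reaches the file.
         delim = ""
         line = req.readline(10240)
         while line and not line.startswith(boundary):
             odelim = delim
             if line[-2:] == "\r\n":
                 delim = "\r\n"
                 line = line[:-2]
             elif line[-1:] == "\n":
                 delim = "\n"
                 line = line[:-1]
             else:
                 # Partial line (readline limit reached): nothing to hold back.
                 delim = ""
             file.write(odelim + line)
             line = req.readline(10240)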

Consider:
- If the last char is a #13 (\r) then it's just sent to the file. The 
next readline will return the \n by itself. Since most callback handlers 
will just write to a disk file, they don't care about line ends, and 
they must be prepared to receive partial lines anyway.

- line.startswith(boundary)
Now you may argue that it is only a boundary if it appears on a line by 
itself. Well, I say, the odds that your file contains a line consisting 
of the boundary string followed by a newline are not _significantly_ 
smaller than the odds that it contains a line merely starting with the 
boundary string.

I tested this implementation with various binary (100MB), DOS and UNIX 
text files, without problems. The uploaded files were bitwise equal.
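
For anyone who wants to repeat that kind of check without a server, here 
is a minimal sketch of a round-trip test. It is not the mod_python code: 
it is the same loop adapted to Python 3 bytes, and FakeRequest, 
round_trip and the boundary string are hypothetical names made up for 
the illustration. The readline limit is kept at 10240, which has to be 
at least as long as the boundary for startswith to match.

```python
import io


class FakeRequest:
    """Hypothetical stand-in for the request object: only readline()."""

    def __init__(self, body):
        self._stream = io.BytesIO(body)

    def readline(self, size=-1):
        # Returns at most `size` bytes, stopping after the first b"\n".
        return self._stream.readline(size)


def read_to_boundary(req, boundary, file, limit=10240):
    # Same logic as the posted method, on bytes and without `self`.
    delim = b""
    line = req.readline(limit)
    while line and not line.startswith(boundary):
        odelim = delim
        if line[-2:] == b"\r\n":
            delim = b"\r\n"
            line = line[:-2]
        elif line[-1:] == b"\n":
            delim = b"\n"
            line = line[:-1]
        else:
            delim = b""
        file.write(odelim + line)
        line = req.readline(limit)


def round_trip(payload, boundary=b"--hypothetical-boundary"):
    # Simulate one uploaded part: payload, CRLF, then the closing boundary.
    body = payload + b"\r\n" + boundary + b"--\r\n"
    out = io.BytesIO()
    read_to_boundary(FakeRequest(body), boundary, out)
    return out.getvalue()


samples = {
    "binary": bytes(range(256)) * 400,
    "dos text": b"line one\r\nline two\r\n" * 500,
    "unix text": b"line one\nline two\n" * 500,
    "long line": b"x" * 50000,
    # The \r lands at the end of one read, the \n comes back on its own.
    "split crlf": b"a" * 10239 + b"\r\nrest",
}

for name, payload in samples.items():
    print(name, round_trip(payload) == payload)
```

The "split crlf" sample is the case from the first point above: the \r 
is written with its chunk, the lone \n is held back as the delimiter and 
written before the next line.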

I also implemented the callback by allowing subclasses to override 
make_file. That should go into another thread, I guess.

-- 
Mike Looijmans
Philips Natlab / Topic Automation


