How not to copy a file to standard output in Python

December 10, 2009

Suppose that you are quickly bashing together a CGI that, as part of its job, has to spit out a file. When I needed to do this recently, I reflexively wrote more or less:

fp = open(fname, "r")
for line in fp:
    print line

(Because this was reading from a file, my usual objections to using file iteration don't apply.)

Somewhat later I got a politely phrased report from the person this was for, to the effect that the PDFs this CGI was supposed to hand out to people were corrupt. He even helpfully reported that testing with wget showed the files were some number of bytes larger than they should be, to the tune of one extra byte per line in the original PDF, which pointed me right at my stupid bug.

The problem is, of course, that all of the forms of reading lines at a time from a file keep the terminating newline on the line, and then print adds another one. The easiest solution is to use sys.stdout.write() instead of print.
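As a minimal sketch of that fix (the function name copy_lines and its out parameter are my own illustration, not something from the original CGI), writing each line as-is avoids the doubled newline:

```python
import sys

def copy_lines(fname, out=None):
    """Copy a text file to `out` line by line without adding newlines.

    `copy_lines` and `out` are hypothetical names for this sketch;
    `out` defaults to sys.stdout.
    """
    if out is None:
        out = sys.stdout
    with open(fname, "r") as fp:
        for line in fp:
            # Each line already ends with its own newline (except possibly
            # the last), so write it unchanged instead of using print.
            out.write(line)
```

Calling copy_lines(fname) sends the file to standard output with exactly the bytes it had as text; nothing is added per line.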

(For some people, the easiest solution would be to use 'print line,' but I've never used that syntax feature and I don't particularly like it. I would rather use sys.stdout.write() just so that I'm explicit about it.)

In theory the more efficient way is to use .read() plus .write() to copy the file in large but not memory-busting chunks. I was going to say that this requires handling short writes and buffering, but that's not correct; unlike the underlying stdio fwrite() routine, .write() on file objects always does a full write. Instead, using this sort of buffering just requires slightly more worrying about efficiency than I was doing at the time.
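A rough sketch of the chunked approach might look like the following. The name copy_chunks and the 64 KB chunk size are arbitrary choices of mine for illustration, not anything measured or taken from the original code:

```python
CHUNK_SIZE = 64 * 1024  # assumed size: large, but nowhere near memory-busting

def copy_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy everything from file object `src` to file object `dst`.

    `copy_chunks` is a hypothetical helper for this sketch. It relies on
    dst.write() writing the whole chunk, so there is no short-write loop.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            # An empty read means end of file.
            break
        dst.write(chunk)
```

You would call this with the file opened in binary mode and the standard output file object, for example copy_chunks(open(fname, "rb"), sys.stdout) in the Python of this entry; in Python 3 the binary stream is sys.stdout.buffer instead. The standard library's shutil.copyfileobj() does essentially the same loop, so in practice it may be simpler to use that.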

(In theory this code has a second bug; I should be opening the file in binary mode just in case. In practice, I ignore binary mode; I am not writing Python code that will ever run on Windows machines.)

Last modified: Thu Dec 10 00:28:18 2009