How not to copy a file to standard output in Python
Suppose that you are quickly bashing together a CGI that as part of its job has to spit out a file. When I needed to do this recently, I reflexively wrote more or less:
    fp = open(fname, "r")
    for line in fp:
        print line
(Because this was reading from a file, my usual objections to using file iteration don't apply.)
Somewhat later I got a politely phrased report from the person that this
was for to the effect that the PDFs this CGI was supposed to hand out to
people were corrupt. He even helpfully reported that testing with wget
said that the files were some number of bytes larger than they should
be, to the tune of one byte per line in the original PDF, which pointed
me right at my stupid bug.
The problem is, of course, that all of the forms of reading lines
at a time from a file keep the terminating newline on the line,
and then print adds another one. The easiest solution is to use
sys.stdout.write() instead of print.
(For some people, the easiest solution would be to use 'print line,'
but I've never used that syntax feature and I don't particularly like
it. I would rather use sys.stdout.write() just so that I'm explicit
about it.)
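As a minimal sketch of the sys.stdout.write() version (the copy_lines name and its out parameter are made up for illustration; they are not from the original CGI):

```python
import sys

def copy_lines(fname, out=None):
    """Copy a text file to `out` line by line without adding newlines.

    `copy_lines` and `out` are hypothetical names used for this sketch.
    """
    if out is None:
        out = sys.stdout
    with open(fname, "r") as fp:
        for line in fp:
            # Each line already ends with its own newline (except possibly
            # the last one), so it is written through unchanged.
            out.write(line)
```

Because write() adds nothing of its own, the output should be the same size as the input, which is exactly what the print version got wrong.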
In theory the more efficient way is to use read plus write to read in
large but not memory-busting chunks. I was going to say that this
requires handling short writes and buffering, but that's not correct;
unlike the underlying stdio fwrite() routine, .write() on file
objects always does a full write. Instead, using this sort of buffering
just requires slightly more worrying about efficiency than I was doing
at the time.
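A rough sketch of the chunked approach might look like the following. The copy_chunks name and the 64 KB chunk size are assumptions picked for illustration, and the code assumes that dst is an ordinary buffered file object whose write() writes everything it is given:

```python
CHUNK_SIZE = 64 * 1024  # assumed value: large, but nowhere near memory-busting

def copy_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy everything from file object `src` to `dst` in fixed-size chunks.

    Returns the number of bytes (or characters) copied. `copy_chunks` is a
    hypothetical helper name used for this sketch.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            # read() returns an empty string/bytes object at end of file.
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

The standard library's shutil.copyfileobj() implements the same read-and-write loop, so in practice that is probably the less error-prone choice; the explicit version here is only meant to show how little code is involved.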
(In theory this code has a second bug; I should be opening the file in binary mode just in case. In practice, I ignore binary mode; I am not writing Python code that will ever run on Windows machines.)