== Two problems with Python's file iterators

Modern versions of Python let you process each line in a file in a simple way, with just '_for line in fp: .._', replacing either manual _while_ loops with _.readline()_ or the memory inefficiency of letting _.readlines()_ pull the entire thing into memory. But there are two bugs, both of which can be illustrated by running a 'pycat' program:

> import sys
> for line in sys.stdin:
>     sys.stdout.write(line)

If you run this without standard input redirected, you will immediately notice the problems:

* the program only gets lines from standard input in big blocks, instead of one line at a time.
* you (almost always) have to give *two* ^D's to the program before it sees end of file and exits.

Both problems are caused by the same underlying decision: despite using Unix's traditional stdio functions, which do their own buffering, Python adds its own layer of forced buffering for file iteration. This forced buffering even has the perverse effect that you can't mix file iteration and explicit _.readline()_ et al, even if you break out of the iteration loop. (Since this is a deliberate and long-standing design decision, I suspect that the Python people are not interested in bug reports.)

These bugs might seem relatively minor, except that reading from terminals isn't the only case where you really need to handle input a line at a time, without insisting on buffering up a bunch of it; another is dealing with line-oriented network protocols. As a result of running into these issues I reflexively avoid file iteration in my own code, which makes me grumpy when I write yet another 'read the lines' loop. (By now, I have the necessary _while_ pattern memorized.)

=== Sidebar: coding around the problem

The necessary _while_ pattern for reading from files is:

> while 1:
>     line = fp.readline()
>     if not line:
>         break
>     ... process line ...
> # have reached EOF

Note that even if you want the newline stripped off the end of the line, you do not want to strip it before you do the '_if not line_' check; otherwise you will think that blank lines are the end of the file. (Speaking from personal experience, this is an embarrassing mistake to make, although you usually catch it fast.)

It's also possible to fix things up with an 'iterfile' routine, like this:

> def iterfile(fp):
>     while 1:
>         line = fp.readline()
>         if not line:
>             return
>         yield line

Then instead of '_for line in fp:_', just use '_for line in iterfile(fp):_'. And of course you can mix this with regular reads from the file without anything getting too confused.

You may still have the double EOF problem, depending on how you structure your program; unfortunately, file objects don't remember if they've seen an EOF, so _iterfile()_ itself can't avoid the problem.
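One way around that last limitation is to do the remembering yourself. As a minimal sketch (my own addition, not from the original; the name 'EOFSticky' is made up for illustration), you can wrap the file object in a small class that records when _.readline()_ has returned the empty string, so that every later read returns EOF immediately instead of going back to the (possibly terminal) file:

```python
class EOFSticky:
    """Wrap a file-like object so that once readline() returns ''
    (end of file), all later readline() calls return '' immediately
    instead of reading the underlying file again."""
    def __init__(self, fp):
        self.fp = fp
        self.seen_eof = False

    def readline(self):
        if self.seen_eof:
            return ''
        line = self.fp.readline()
        if not line:
            # Remember EOF so we never touch the real file again.
            self.seen_eof = True
        return line

def iterfile(fp):
    # The sidebar's iterfile(), unchanged; it works on anything
    # with a readline() method, including the wrapper above.
    while 1:
        line = fp.readline()
        if not line:
            return
        yield line
```

With '_iterfile(EOFSticky(fp))_' you can break out of the loop, fall into another read loop later, and still see EOF exactly once, because the wrapper answers the second and subsequent reads itself.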