Using PyPy (or thinking about it) exposed a bug in closing files
Over on the Fediverse, I said:
A fun Python error some code can make and not notice until you run it under PyPy is a function that has 'f.close' at the end instead of 'f.close()' where f is an open()'d file.
(Normal CPython will immediately close the file when the function returns due to refcounted GC. PyPy uses non-refcounted GC so the file remains open until GC happens, and so you can get too many files open at once. Not explicitly closing files is a classic PyPy-only Python bug.)
When a Python file object is garbage collected, Python arranges to close the underlying C level file descriptor if you didn't already call .close(). In CPython, garbage collection is deterministic and generally prompt; for example, when a function returns, all of its otherwise unreferenced local variables will be garbage collected as their reference counts drop to zero. However, PyPy doesn't use reference counting for its garbage collection; instead, like Go, it only collects garbage periodically, and so will only close files as a side effect some time later. This can make it easy to build up a lot of open files that aren't doing anything, and possibly run your program out of available file descriptors, something I've run into in the past.
I recently wanted to run a hacked up version of a NFS monitoring program written in Python under PyPy instead of CPython, so it would run faster and use less CPU on the systems I was interested in. Since I remembered this PyPy issue, I found myself wondering if it properly handled closing the file(s) it had to open, or if it left it to CPython garbage collection. When I looked at the code, what I found can be summarized as 'yes and no':
def parse_stats_file(filename): [...] f = open(filename) [...] f.close return ms_dict
Because I was specifically looking for uses of .close(), the lack of the '()' immediately jumped out at me (and got fixed in my hacked version).
It's easy to see how this typo could linger undetected in CPython.
The line 'f.close
' itself does nothing but isn't an error, and
then 'f
' is implicitly closed in the next line, as part of the
'return
', so even if you looking at this program's file descriptor
usage while it's running you won't see any leaks.
(I'm not entirely a fan of nondeterministic garbage collection, at least in the context of Python, where deterministic GC was a long standing feature of the language in practice.)
|
|