2018-09-17
Python 3 supports not churning memory on IO
I am probably late to this particular party, just as I am late to
many Python 3 things, but today (in the course of research for
another entry) I discovered the pleasant fact that Python 3 now
supports read and write IO to and from appropriate pre-created byte
buffers. This is supported at the low level and also at the high
level with file objects (as covered in the io
module).
In Python 2, one of the drawbacks of Python for relatively high performance IO-related code was that reading data always required allocating a new string to hold it, and changing what you were writing also required new strings (you could write the same byte string over and over again without memory allocation, although not necessarily a Unicode string). Python 3's introduction of mutable bytestring objects (aka 'read-write bytes-like objects') means that we can bypass both issues now. With reading data, you can read data into an existing mutable bytearray (or a suitable memoryview), or a set of them. For writing data, you can write a mutable bytestring and then mutate it in place to write different data a second time. This probably doesn't help much if you're generating entirely new data (unless you can do it piece by piece), but is great if you only need to change a bit of the data to write a new chunk of stuff.
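Here's a minimal sketch of both halves of this, reading with a binary file object's readinto() and rewriting a mutable buffer in place (the file names and the record format are just made-up illustrations):

buf = bytearray(4096)                    # allocated once, reused for every read
with open("input.dat", "rb", buffering=0) as f:
    while True:
        n = f.readinto(buf)              # fills buf in place; no new bytes object
        if not n:
            break
        # only buf[:n] holds valid data on this iteration

rec = bytearray(b"SEQ=0000\n")           # one mutable record, rewritten in place
with open("output.dat", "wb", buffering=0) as out:
    for i in range(10):
        rec[4:8] = b"%04d" % i           # mutate just the part that changes
        out.write(rec)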
One obvious question here is how you limit how much data you read.
Python modules in the standard library appear to have taken two
different approaches to this. The os
module and the io
module use the total size of
the pre-allocated buffer or buffers you've provided as the only
limit. The socket
module defaults to the
size of the buffer you provide, but allows you to further limit the
amount of data read to below that. This initially struck me as odd,
but then I realized that network protocols often have situations
where you know you want only a few more bytes in order to complete
some element of a protocol. Limiting the amount of data read below
the native buffer size means that you can have a single maximum-sized
buffer while still doing short reads if you only want the next N
bytes.
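As a sketch of what this looks like with the socket module (the host, port, and sizes here are made up for illustration):

import socket

buf = bytearray(64 * 1024)               # one maximum-sized buffer, reused
sock = socket.create_connection(("example.com", 80))
sock.sendall(b"HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")

# We only want the next 4 bytes of the protocol right now, so we cap
# this read well below the buffer's full size.
n = sock.recv_into(buf, 4)
piece = buf[:n]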
(If I'm understanding things right, you could do this with a
memoryview of explicitly limited size. But this would still require
a new memoryview object, and they actually take up a not tiny amount
of space; sys.getsizeof()
on a 64-bit Linux machine says they're
192 bytes each. A bytearray's base overhead is actually smaller,
apparently coming in at 56 bytes for an empty one and 58 bytes for
one with a single byte in it.)
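You can check these numbers for yourself with sys.getsizeof(); the figures here are what I see on a 64-bit Linux machine and may vary with the Python version:

>>> import sys
>>> sys.getsizeof(memoryview(bytearray(200)))
192
>>> sys.getsizeof(bytearray())
56
>>> sys.getsizeof(bytearray(b'x'))
58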
Sidebar: Subset memoryviews
Suppose you have a big bytearray object, and you want a memoryview of the first N bytes of it. As far as I can see, you actually need to make two memoryviews:
>>> b = bytearray(200)
>>> b[0:4]
bytearray(b'\x00\x00\x00\x00')
>>> m = memoryview(b)
>>> ms = m[0:30]
>>> ms[0:4] = b'1234'
>>> b[0:4]
bytearray(b'1234')
It is tempting to do 'memoryview(b[0:30])', but that creates
a copy of the bytearray that you then get a memoryview of, so your
change doesn't actually change the original bytearray (and you're
churning memory). Of course if you intend to do this regularly,
you'd create the initial memoryview up front and keep it around for
the lifetime of the bytearray itself.
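To see the copying behavior concretely, here's the tempting version of the example above; the original bytearray is untouched:

>>> b = bytearray(200)
>>> mc = memoryview(b[0:30])
>>> mc[0:4] = b'1234'
>>> b[0:4]
bytearray(b'\x00\x00\x00\x00')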
I'm a little bit surprised that memoryview objects don't have support for creating subset views from the start, although I'm sure there are good reasons for it.
The importance of explicitly and clearly specifying things
I was going to write this entry in an abstract way, but it is easier and more honest to start with the concrete specifics and then move on to the general conclusions I draw from them.
We recently encountered an unusual Linux NFS client behavior, which at the time I called a bug. I have since been informed that this is not actually a bug but is Linux's implementation of what Linux people call "close to open cache consistency", which is written up in the Linux NFS FAQ, section A8. I'm not sure what to call the FAQ's answer; it is partly a description of concepts and partly a description of the nominal kernel implementation. However, this kernel implementation has changed over time, as we found out, with changes in user visible behavior. In addition, the FAQ doesn't make any attempt to describe how this interacts with NFS locking or if indeed NFS locking has any effect on it.
As someone who has to deal with this from the perspective of programs that are running on Linux NFS clients today and will likely run on Linux NFS clients for many years to come, what I need is a description of the official requirements for client programs. This is not a description of what works today or what the kernel does today, because as we've seen that can change; instead, it would be a description of what the NFS developers promise will work now and in the future. As with Unix's file durability problem, this would give me something to write client programs to and mean that if I found that the kernel deviated from this behavior I could report it as a bug.
(It would also give the NFS maintainers something clear to point people to if what they report is not in fact a bug but them not understanding what the kernel requires.)
On the Linux NFS mailing list, I attempted to write a specific
description of this from the FAQ's wording (you can see my attempt
here),
and then asked some questions about what effect using flock()
had
on this (since the FAQ is entirely silent on this). This uncovered
another Linux NFS developer who apparently has a different (and
less strict) view of what the kernel should require from programs
here. It has not yet yielded any clarity on what's guaranteed about
flock()'s interaction with Linux's close-to-open (CTO) cache consistency.
The importance of explicitly and clearly specifying things is that
it deals with all four issues that have been uncovered here. With
a clear and explicit specification (which doesn't have to be a
formal, legalistic thing), it would be obvious what writers of
programs must do to guarantee things working (not just now but also
into the future), all of the developers could be sure that they
were in agreement about how the code should work (and if there's
disagreement, it would be immediately uncovered), any unclear or
unspecified areas would at least become obvious (you could notice
that the specification says nothing about what flock()
does), and
it would be much clearer whether some kernel behavior was a bug or
whether a kernel change introduced a deviation from the agreed-on
specification.
This is a general thing, not something specific to the Linux kernel or kernels in general. For 'kernel' you can substitute 'any system that other people base things on', like compilers, languages, web servers, etc etc. In a sense this applies to anything that you can describe as an API. If you have an API, you want to know how you use the API correctly, what the API actually is (not just the current implementation), if the API is ambiguous or incomplete, and if something is a bug (it violates the API) or just a surprise. All of this is very much helped by having a clear and explicit description of the API (and, I suppose I should add, a complete one).