== What can go wrong with polling for writability on blocking sockets

Yesterday I wrote about how [[our performance problem with _amandad_ ../sysadmin/SlowBackupsCause]] was caused by [[_amandad_ doing IO multiplexing wrong IOMultiplexingDoneWrong]] by only polling for whether it could read from its input file descriptors and assuming it could always write to its network sockets. But let's ask a question: suppose that _amandad_ was also polling for writability on those network sockets. Would it work fine?

The answer is no, not without even more code changes, because *_amandad_'s network sockets aren't set to be non-blocking*. The problem here is what it really means when _poll()_ reports that something is ready for write (or for that matter, for read). Let me put it this way:

> ~~That _poll()_ says a file descriptor is ready for writes doesn't
> mean that you can write an arbitrary amount of data to it without
> blocking.~~

When I put it this way, of course it can't. Can I write a gigabyte to a network socket or a pipe without blocking? Pretty much any kernel is going to say 'hell no'. Network sockets and pipes can never instantly absorb arbitrary amounts of data; there's always a limit somewhere. What _poll()_'s readiness indicator more or less means is that you can now write *some* data without blocking. How much data is uncertain.

The importance of non-blocking sockets is due to an API decision that Unix has made. Given that you can't write an arbitrary amount of data to a socket or a pipe without blocking, Unix has decided that by default when you write 'too much' you get blocked instead of getting a short write return (where you try to write N bytes and get told you wrote less than that). In order to not get blocked if you try too large a write you must explicitly set your file descriptor to non-blocking mode; at this point you will either get a short write or just an error (if you're trying to write and there is no room at all).

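
To make the short write versus error distinction concrete, here is a minimal Python sketch. It has nothing to do with _amandad_'s actual code; the function name _demo_poll_then_write()_ and the 16 MB write size are made up for illustration, and it assumes a Unix system where _select.poll()_ is available and socket buffers are much smaller than 16 MB. It sets one end of a socket pair to non-blocking, confirms that _poll()_ reports it as writable, and then attempts a write that is far too big.

```python
import select
import socket


def demo_poll_then_write(chunk_size=16 * 1024 * 1024):
    """Return (ready, first_sent, hit_would_block) for a non-blocking socket.

    Illustrative helper only; chunk_size is assumed to be much larger
    than the kernel's socket buffers.
    """
    writer, reader = socket.socketpair()
    try:
        # Without this, the big send() below would block forever, since
        # nothing ever reads from the other end.
        writer.setblocking(False)

        # Ask poll() whether the socket is writable right now.
        poller = select.poll()
        poller.register(writer, select.POLLOUT)
        events = poller.poll(0)
        ready = any(ev & select.POLLOUT for _fd, ev in events)

        # 'Writable' only means some data will be accepted, so this is
        # expected to be a short write.
        data = b"x" * chunk_size
        first_sent = writer.send(data)

        # Keep writing until there is no room at all; at that point we
        # get an error (EAGAIN/EWOULDBLOCK) instead of a short write.
        hit_would_block = False
        try:
            while True:
                writer.send(data)
        except BlockingIOError:
            hit_would_block = True
        return ready, first_sent, hit_would_block
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    ready, first_sent, hit_would_block = demo_poll_then_write()
    print("poll() reported writable:", ready)
    print("bytes accepted by first send():", first_sent)
    print("a later send() would have blocked:", hit_would_block)
```

How many bytes the first _send()_ accepts depends on the kernel and its buffer settings, which is the point: the only thing _poll()_ promised was 'some'. If you take out the _setblocking(False)_ call, the first _send()_ simply never returns.
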

(This is a sensible API decision for reasons beyond the scope of this entry. And yes, [[it's not symmetric with reading from sockets and pipes CommonSocketError]].)

So if _amandad_ just polled for writability but changed nothing else in its behavior, it would almost certainly still wind up blocking on writes to network sockets as it tried to stuff too much down them. At most it would wind up blocked somewhat less often because it would at least send some data immediately every time it tried to write to the network.

(The pernicious side of this particular bug is that whether it bites you in any visible way depends on how much network IO you try to do and how fast. If you send to the network (or to pipes) at a sufficiently slow rate, perhaps because your source of data is slow, you won't stall visibly on writes because there's always enough capacity for the data you're sending. Only when your send rates start overwhelming the receiver will you actively block in writes.)

=== Sidebar: The value of serendipity (even if I was wrong)

Yesterday I mentioned that my realization about the core cause of our _amandad_ problem was sparked by remembering an apparently unrelated thing. As it happens, it was my memory of reading Rusty Russell's [[POLLOUT doesn't mean write(2) won't block: Part II http://rusty.ozlabs.org/?p=437]] that started me on this whole chain. A few rusty neurons woke up and said 'wait, _poll()_ and then long _write()_ waits? I was reading about that...' and off I went, [[even if my initial idea turned out to be wrong about the real cause https://twitter.com/thatcks/status/510184511188267008]]. Had I not been reading Rusty Russell's blog I probably would have missed noticing the anomaly and as a result wasted a bunch of time at some point trying to figure out what the core problem was.

The _write()_ issue is clearly in the air because Ewen McNeill also pointed it out in a comment on [[yesterday's entry IOMultiplexingDoneWrong]].
This is a good thing; the odd write behavior deserves to be better known so that it doesn't bite people.