Wandering Thoughts archives

2019-01-06

accept(2)'s problem of trying to return two different sorts of errors

A long time ago, I wrote about the dangers of being overly specific in the errno values you looked for, with the specific case being a daemon that exited because an accept() system call got an ECONNRESET that it didn't expect. Recently, John Wiersba left a comment on that entry asking what else the original programmer should have done, given an unexpected error from accept(). In thinking about the issues, I realized that part of the problem is that accept() is actually returning two different sorts of errors and the Unix API doesn't provide it any good way to let people tell the two different sorts apart.

(These days accept() is standardized to return ECONNABORTED instead of ECONNRESET in these circumstances, although this may not be universal.)

The two sorts of errors that accept() is trying to return are errors in the accept() call, such as a bad file descriptor (EBADF, ENOTSOCK) or a bad parameter (EFAULT), and errors in the new connection that accept() may or may not be returning (EAGAIN, ECONNABORTED, etc). One of the differences between the two is that the first sort of errors are probably permanent unless fixed by the program somehow and generally indicate an internal program error, while the second sort of errors will go away if you correctly loop through your accept() sequence again.

A sensibly behaving network daemon should definitely not exit when it gets the second sort of error; it should instead just continue on with its processing loop. However, it's perfectly sensible and probably broadly correct to exit if you get the first sort of error, especially if it's an unknown error and you have no idea how to correct it in your code. If someone has closed a file descriptor on you or it's become a non-socket somehow, continuing will generally just get you an unending stream of the same error over and over (and burn CPU, and perhaps flood logs). Exiting is a perfectly sensible way out and often really the only thing you can do.

However, you can't reliably distinguish between these two types of errors unless you believe you can know all of the possible errnos for one or the other of them. Given the general habit of Unixes of adding more errno returns for system calls over time, the practical reality is that you can't. This unfortunately leaves authors of Unix network daemons sort of up in the air; they have to pick one way or the other, and either way might give the wrong answer in some circumstances.
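To make the 'pick one way or the other' choice concrete, here is a minimal sketch in Python of one of the two options: enumerate the errnos you believe are errors in the accept() call itself and treat everything else as a transient per-connection error. The set of 'fatal' errnos, the function names, and the cap on consecutive failures are all assumptions made for illustration; the cap is only there as a crude guard against the endless stream of identical errors described above when an unknown errno turns out to be permanent after all.

```python
import errno
import socket
import time

# Assumption: errnos that point at the accept() call itself rather than at
# the incoming connection. This set is illustrative, not exhaustive.
FATAL_ACCEPT_ERRNOS = {errno.EBADF, errno.ENOTSOCK, errno.EFAULT, errno.EINVAL}


def classify_accept_error(err):
    """Return 'fatal' for known call-level errnos, 'retry' for anything else."""
    return "fatal" if err in FATAL_ACCEPT_ERRNOS else "retry"


def accept_loop(listener, handle, max_consecutive_errors=100):
    """Accept connections forever, retrying on errors presumed transient.

    `handle` is a hypothetical callback taking (conn, addr).
    """
    failures = 0
    while True:
        try:
            conn, addr = listener.accept()
        except OSError as e:
            if classify_accept_error(e.errno) == "fatal":
                raise  # probably an internal program error; give up
            failures += 1
            if failures >= max_consecutive_errors:
                raise  # an 'unknown' error that never goes away
            time.sleep(0.01)  # brief pause so a persistent error can't spin the CPU
            continue
        failures = 0
        handle(conn, addr)
```

The opposite choice (enumerate the known transient errnos and exit on everything else) is the same code with the set and the default swapped, and it is that choice that bit the daemon in the original entry.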

(Perhaps accept() should never have returned the second sort of errors, leaving them all to be discovered on a subsequent use of the file descriptor it returned. But that ship sailed a very long time ago; accept() returning these sorts of errors is even in the Single UNIX Specification for accept().)

I suspect that accept() is not the only system call with this sort of split in types of errors (although I can't think of any others off the top of my head). But thankfully I don't think there are too many others, because accept()'s pattern of operation is an unusual one.

PS: The Linux accept() manpage actually has a warning about Linux's behavior here, in the RETURN VALUE section. Linux opts to immediately return a lot of errors detected on the new socket, while other Unixes generally postpone some of them. But note that any Unix can return ECONNABORTED.

unix/AcceptErrnoProblem written at 23:11:46

Linux network-scripts being deprecated is a problem for my home PPPoE link

The other day, I ran ifdown on my home machine for the first time since I upgraded it to Fedora 29 and got an unpleasant surprise:

WARN : [ifdown] You are using 'ifdown' script provided by 'network-scripts', which are now deprecated.
WARN : [ifdown] 'network-scripts' will be removed from distribution in near future.
WARN : [ifdown] It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.

As they say, this is my unhappy face.

On both my work and my home machines, most of my network configuration is done through systemd's networkd. However, at home I also have a PPPoE DSL link. Systemd (still) doesn't handle PPPoE and I have no interest in using NetworkManager on my desktop machines, which means that currently my PPPoE link setup is still done through the good old fashioned Fedora /etc/sysconfig/network-scripts system. Since this now seems to be on a deprecation schedule of some sort (although who knows what 'near future' is here, for Fedora or in general), I'm going to need to find some sort of a replacement for my use of it.

In theory this shouldn't be too hard, because after all ifup and ifdown are just shell scripts, and for a DSL link it appears that most of what they do is delegate things to rp-pppoe's adsl-start script. In practice, these are gnarled and tangled shell scripts, with who knows what side effects and environment variable settings that adsl-start and things downstream of it are counting on, and I'm not looking forward to first reverse engineering all of the setup and then building an equivalent replacement system, just because people want to remove network-scripts.

For even more potential fun for me in the future, ifup and ifdown are provided both by the network-scripts package and by NetworkManager, with this managed by Fedora's alternatives system. I suspect that this means I won't even notice that network-scripts has been removed until my system's ifup and ifdown invocations start quietly running NetworkManager and things explode for reasons that I expect to boil down to 'because NetworkManager'.

(I don't have much optimism about NetworkManager's ability to cooperate with other parties or be modest about what it will do with your network setup; instead my impression is that NetworkManager expects to run all of your networking however it sees fit. So I expect it to try to read random bits of my very historical network-scripts configuration files, interpret them in various ways, and then probably cause my networking to explode. NetworkManager has an ifcfg-rh plugin for this, but I have no idea how well it works and it doesn't seem to support DSL PPPoE at all based on the documentation.)

PS: For what it's worth, removing the network-scripts package is not currently listed in the Fedora 30 accepted changes as far as I can see (see also).

Sidebar: How I currently have my PPPoE networking wired up

I have a system cron.d file that runs 'ifup ppp0' on boot (via a @reboot action), and then re-runs it every fifteen minutes if there's no default route, because sometimes it falls over. In a more proper systemd world, I guess I should write a service unit that runs it after my home machine's Ethernet is up and then perhaps try out a timer unit to handle the 'try again every fifteen minutes' thing.

(I normally strongly prefer crontab entries over systemd timer units, but I would be interacting with other systemd units and with the overall systemd state here so timer units are probably better.)
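Whether it is driven by cron or by a timer unit, the 'only if there's no default route' part is a small check. As a rough sketch (this is not the actual cron job, and the function names are made up), here it is in Python, looking for an IPv4 default route in the text of Linux's /proc/net/route and running 'ifup ppp0' only if none is found. It ignores IPv6 entirely.

```python
import subprocess

RTF_UP = 0x0001  # 'route is usable' flag, as in <linux/route.h>


def has_default_route(route_table_text):
    """Return True if /proc/net/route-style text contains an IPv4 default route."""
    for line in route_table_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 8:
            continue
        # Columns: Iface Destination Gateway Flags RefCnt Use Metric Mask ...
        destination, flags, mask = fields[1], int(fields[3], 16), fields[7]
        if destination == "00000000" and mask == "00000000" and flags & RTF_UP:
            return True
    return False


def bring_up_if_needed(route_table_text, run=subprocess.run):
    """Run 'ifup ppp0' if there is no default route; return True if it was run."""
    if has_default_route(route_table_text):
        return False
    run(["ifup", "ppp0"], check=False)
    return True
```

In real use the text would come from reading /proc/net/route, for example with open("/proc/net/route") and passing the result of read() to bring_up_if_needed().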

linux/NetworkScriptsAndPPPoE written at 01:25:09
