2020-11-11
The problems inherent in building your own copies of software packages
In some quarters, it's accepted system administrator wisdom that if you really care about a particular program, you should build your own copy from the upstream sources instead of relying on whatever your Unix distribution supplies. I'm against this; in the past my objection was that it's a waste and a duplication of effort. These days I think it's worse than that, because building software packages yourself exposes you to potentially significant problems. As it happens, I can illustrate these problems with the current state of Exim.
Exim recently added a security feature to 'taint' information taken directly from the outside world to reduce the potential for future issues like CVE-2019-13917. The change was added in the 4.93 release, and made many previous configurations break. This is a worthwhile change in general, but Exim 4.93 and even the current 4.94 were released with flaws in the support for dealing with this tainting; the current recommendation is to use a patched version of 4.94.
So suppose that Exim is important to you and you're building it from source. You have two problems. First, you want to build not the officially released 4.94 but 4.94 plus a collection of fixes (and you definitely didn't want to use 4.93), so you need to be able to assemble the right collection of fixes. Second, when the next security issue or serious bug is uncovered in Exim, the official policy of the Exim developers is that only the most recent stable release will be fixed. This leaves you either to backport the fixes into your previous version or to upgrade to the current stable version, which may have other changes that break your configuration (as the move from 4.92 to 4.93 did).
(If Exim were to release a '4.94.1' with a security fix, you would be in luck in one sense, but you would still need to integrate your collection of additional fixes on top of the security change. And always upgrading to new Exim releases to stay within the range of support has its own obvious problems.)
I want to be clear that I'm not picking on Exim specifically. Many packages in Unix distributions carry changes on top of the upstream versions (for good reasons), and it's quite common for the upstream to support only the current version (or a narrow range of versions).
The large scale problem here is that in general, building and maintaining software packages takes ongoing specific expertise in each package. Getting and maintaining this expertise takes time and effort; it does not magically happen for free. That means there is only a limited number of software packages that you can do this for (limited further by whatever other work you have to do). Building your own software packages without having this expertise exposes you to various sorts of risks.
The great advantage of using packages from your Unix distribution is that if all goes well, you can outsource having this expertise to the people involved in maintaining the package in the distribution. And since different people maintain different packages, they're in a position to apply much more total time to keeping on top of all of them than you are if you try to do it all yourself.
Logging fatal exceptions in my Python programs is not enough
We have a few Python programs which run automatically, need to produce very rigid output (or lack of output) to standard output and even standard error, and are complex enough (and use enough outside code) that they may reasonably run into unhandled exceptions. One example is our program to report on email attachment type information under Exim; this runs a lot of code on untrusted input, and our Exim configuration expects its output to have a pretty rigid format (cf). Allowing Python to dump out the normal unhandled exception to standard error is not what we wanted. So for years that program has had a chunk of top level code to catch and syslog otherwise unhandled exceptions. I wrote it, deployed it, and considered it all good.
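The top level code involved doesn't have to be complicated. A minimal sketch of the idea looks something like the following; the program name passed to syslog and the choice of syslog facility here are assumptions, not what our actual program uses.

```python
import sys
import syslog
import traceback

def format_exception_lines(exc):
    # Turn an exception into a list of text chunks; syslog entries are
    # one line each, so the caller splits multi-line chunks further.
    return traceback.format_exception(type(exc), exc, exc.__traceback__)

def log_unhandled(exc, ident="attachment-report"):
    # 'ident' and LOG_MAIL are placeholder choices for illustration.
    syslog.openlog(ident, syslog.LOG_PID, syslog.LOG_MAIL)
    for chunk in format_exception_lines(exc):
        for line in chunk.rstrip().splitlines():
            syslog.syslog(syslog.LOG_ERR, line)

def main():
    # ... the program's real work would go here ...
    pass

if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        # Nothing reaches standard error; the traceback goes to syslog
        # and we exit with a clearly abnormal status.
        log_unhandled(e)
        sys.exit(1)
```

The important property is that nothing about an unhandled exception ever reaches standard output or standard error, only syslog and the exit status.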
The other day I discovered that this program had been periodically experiencing, catching, and dutifully syslogging an exception about an internal error (caused by a package we use), going back months. In fact, more than one error about more than one thing. I hadn't known, because I don't normally go look through the logs for these exception traces. Why would I? They aren't supposed to happen and they mostly don't happen, and humans are very bad at consistently looking for things that don't happen.
Django has a very nice feature where it will email error reports to you, which has periodically been handy here. I'm not sure I trust myself to write that much code that absolutely must run, but I certainly could make my exception logging code also run an external script with very minimal arguments and that script could email me to notify me. Since the exception is being logged, I don't need a copy in email; I just need to know that I should go look at the logs.
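Running such a notification script can itself be kept very simple and defensive. A sketch, where the script path and its argument convention are hypothetical:

```python
import subprocess

# Hypothetical path to a small external notification script.
NOTIFY_SCRIPT = "/opt/local/bin/notify-exception"

def notify_exception(progname, excname):
    # Pass only minimal arguments; the full details are already in
    # syslog. Failures in notification are deliberately swallowed so a
    # broken or missing notifier can't break the exception handling.
    try:
        subprocess.run([NOTIFY_SCRIPT, progname, excname],
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL,
                       timeout=30, check=False)
    except Exception:
        pass
```

Discarding the script's output and ignoring its errors matters here, because the whole point is that this code runs in a context where standard output and standard error must stay clean.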
(Django emails the whole exception along with a bunch of additional information, but I believe the email is the only place that information is captured. There are various tradeoffs here, but my starting point is that I'm already logging the exception.)
I could likely benefit from going through PyPI to see how other people have solved this particular problem, and maybe even use their code rather than write my own. I've traditionally avoided outside packages, but we're already using a bunch of them in this program as it is and I should probably get over that hangup in general.
(It helps that I'm slowly acquiring a better understanding of using pip in practice.)