Why we have CentOS machines as well as Ubuntu ones
I'll start with the tweets that I ran across semi-recently (via @bridgetkromhout):
@alicegoldfuss: "If you're running Ubuntu and some guy comes in and says 'we should use Redhat'... fuck that guy." - @mipsytipsy #SREcon16
@mipsytipsy: alright, ppl keep turning this into an OS war; it is not. supporting multiple things is costly so try to avoid it.
This is absolutely true. But, well, sometimes you wind up with exceptions despite how you may feel.
We're an Ubuntu shop; it's the Linux we run and almost all of our machines are Linux machines. Despite this we still have a few CentOS machines lurking around, so today I thought I'd explain why they persist despite their extra support burden.
The easiest machine to explain is the one machine running CentOS 6. It's running CentOS 6 for the simple reason that that's basically the last remaining supported Linux distribution that Sophos PureMessage officially runs on. If we want to keep running PureMessage in our anti-spam setup (and we do), CentOS 6 is it. We'd rather run this machine on Ubuntu and we used to before Sophos's last supported Ubuntu version aged out of support.
Our current generation iSCSI backends run CentOS 7 because of the long support period it gives us. We treat these machines as appliances and freeze them once installed, but we still want at least the possibility of applying security updates if there's a sufficiently big issue (an OpenSSH exposure, for example). Because these machines are so crucial to our environment we want to qualify them once and then never touch them again, and CentOS has a long enough support period to more than cover their expected five year lifespan.
Finally, we have a couple of syslog servers and a console server that run CentOS 7. This is somewhat due to historical reasons, but in general we're happy with this choice; these are machines that are deliberately entirely isolated from our regular management infrastructure and that we want to just sit in a corner and keep working smoothly for as long as possible. Basing them on CentOS 7 gives us a very long support period and means we probably won't touch them again until the hardware is old enough to start worrying us (which will probably take a while).
The common feature here is the really long support period that RHEL and CentOS gives us. If all we want is basic garden variety server functionality (possibly because we're running our own code on top, as with the iSCSI backends), we don't really care about using the latest and greatest software versions and it's an advantage to not have to worry about big things like OS upgrades (which for us is actually 'build completely new instance of the server from scratch'; we don't attempt in-place upgrades of that degree and they probably wouldn't really work anyways for reasons out of the scope of this entry).
Why I think Illumos/OmniOS uses PCI subsystem IDs
As I mentioned yesterday, PCI has both
vendor/device IDs and 'subsystem' vendor/device IDs. Here is what
this looks like (in Linux) for a random device on one of our machines
here (from 'lspci -vnn', more or less):
04:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0086] (rev 05)
Subsystem: Super Micro Computer Inc Device [15d9:0691]
This is the integrated motherboard SAS controller on a SuperMicro motherboard (part of our fileserver hardware). It's using a standard LSI chipset, as reported in the main PCI vendor and device ID, but the subsystem ID says it's from SuperMicro. Similarly, this is an Intel chipset based motherboard so there are a lot of things with standard Intel vendor and device IDs, but SuperMicro specific subsystem vendor and device IDs.
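To make the two ID pairs concrete, here's a small Python sketch that pulls the main and subsystem IDs out of lines like the ones above. The bracketed-ID parsing reflects lspci -vnn's usual output format; treat it as an illustration, not a robust parser.

```python
import re

# lspci -vnn shows numeric IDs in brackets as [vvvv:dddd].
ID_RE = re.compile(r'\[([0-9a-f]{4}):([0-9a-f]{4})\]')

def extract_ids(device_line, subsystem_line):
    # The last [vvvv:dddd] pair on the device line is the main PCI
    # vendor:device ID; the Subsystem: line carries the subsystem pair.
    vendor, device = ID_RE.findall(device_line)[-1]
    sub_vendor, sub_device = ID_RE.findall(subsystem_line)[-1]
    return (vendor, device), (sub_vendor, sub_device)

dev = ("04:00.0 Serial Attached SCSI controller: LSI Logic / Symbios "
       "Logic SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0086] (rev 05)")
sub = "        Subsystem: Super Micro Computer Inc Device [15d9:0691]"

main_ids, sub_ids = extract_ids(dev, sub)
```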
As far as I know, most systems use the PCI vendor and device IDs and mostly ignore the subsystem vendor and device IDs. It's not hard to see why; the main IDs tell you more about what the device actually is, and there are fewer of them to keep track of. Illumos is an exception, where much of the PCI information you see reported uses subsystem IDs. I believe that a significant reason for this is that Illumos is often attempting to basically fingerprint devices.
Illumos tries hard to have some degree of constant device naming
(at least for their definition of it), so that say 'e1000g0' is
always the same thing. This requires being able to identify specific
hardware devices as much as possible, so you can tie them to the
visible system-level names you've established. This is the purpose
of /etc/path_to_inst and the systems associated with it; it
fingerprints devices on first contact, assigns them an identifier
(in the form of a driver plus an instance number), and thereafter
tries to keep them exactly the same.
(From Illumos's perspective the ideal solution would be for each single PCI device to have a UUID or other unique identifier. But such a thing doesn't exist, at least not in general. So Illumos must fake a unique identifier by using some form of fingerprinting.)
If you want a device fingerprint, the PCI subsystem IDs are generally going to be more specific than the main IDs. A whole lot of very different LSI SAS controllers have 1000:0086 as their PCI vendor and device IDs, after all; that's basically the purpose of having the split. Using the SuperMicro subsystem vendor and device IDs ties it to 'the motherboard SAS controller on this specific type of motherboard', which is much closer to being a unique device identifier.
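As a toy illustration of why subsystem IDs make better fingerprints, consider two devices that share the LSI 1000:0086 main IDs but come from different system vendors. The first entry is the SuperMicro controller shown above; the second subsystem ID pair is made up for illustration.

```python
# Two controllers with identical main IDs but different subsystem IDs.
devices = [
    # SuperMicro's onboard SAS2308, as seen above.
    {"vendor": 0x1000, "device": 0x0086, "subvendor": 0x15d9, "subdevice": 0x0691},
    # A hypothetical OEM card built on the same LSI chipset.
    {"vendor": 0x1000, "device": 0x0086, "subvendor": 0x1028, "subdevice": 0x1f38},
]

main_ids = {(d["vendor"], d["device"]) for d in devices}
sub_ids = {(d["subvendor"], d["subdevice"]) for d in devices}

assert len(main_ids) == 1  # indistinguishable by main IDs alone
assert len(sub_ids) == 2   # distinguishable by subsystem IDs
```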
Note that Illumos's approach more or less explicitly errs on the side of declaring devices to be new. If you shuffle which slots your PCI cards are in, Illumos will declare them all to be new devices and force you to reconfigure things. However, this is broadly much more conservative than doing it the other way. Essentially Illumos says 'if I can see that something changed, I'm not going to go ahead and use your existing settings'. Maybe it's a harmless change where you just shuffled card slots, or maybe it's a sign of something more severe. Illumos doesn't know and isn't going to guess; you get to tell it.
(I do wish there were better tools to tell Illumos that certain changes were harmless and expected. It's kind of a pain that eg moving cards between PCI slots can cause such a commotion.)
What Illumos/OmniOS PCI device names seem to mean
When working on an OmniOS system, under normal circumstances you'll
use friendly device names from /dev (for disks, network devices,
and the like). However, Illumos-based systems have an underlying
hardware based naming scheme (exposed in /devices), and under
some circumstances you can wind up dealing with it. When you do,
you'll be confronted with relatively opaque names, and you'll have
very little clue what these names actually mean, at least if you're
not already an Illumos/Solaris expert.
So let's take just one bit here:
pci8086,e04@2. The pci8086,e04
portion is the PCI subsystem vendor and device code, expressed in
hex. You'll probably see '8086' a lot, because it's the vendor code for
Intel. Then the @2 portion is PCI path information expressed relative
to the parent. This can get complicated, because 'path relative to the
parent' doesn't map well to the kinds of PCI names you get on Linux
from eg '
lspci'. When you see a '@...' portion with a comma, that is
what other systems would label
as 'device.function'. If there is no comma in the '@...' portion, the
function is implicitly 0.
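As an illustration of the naming convention just described, here is a hedged Python sketch of a parser for a single path component. The real grammar of Illumos device paths has more cases than this; this only handles the 'pciVVVV,DDDD@D[,F]' form discussed here.

```python
import re

COMP_RE = re.compile(
    r'^pci([0-9a-f]+),([0-9a-f]+)@([0-9a-f]+)(?:,([0-9a-f]+))?$')

def parse_component(comp):
    """Split e.g. 'pci8086,e04@2,1' into subsystem vendor, subsystem
    device, PCI device, and PCI function."""
    m = COMP_RE.match(comp)
    if m is None:
        raise ValueError("unhandled component: " + comp)
    subvendor = int(m.group(1), 16)
    subdevice = int(m.group(2), 16)
    device = int(m.group(3), 16)
    # No comma in the '@...' portion means the function is 0.
    function = int(m.group(4), 16) if m.group(4) else 0
    return subvendor, subdevice, device, function

assert parse_component("pci8086,e04@2") == (0x8086, 0xe04, 0x2, 0)
assert parse_component("pci8086,115e@0,1") == (0x8086, 0x115e, 0x0, 0x1)
```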
(Note that the PCI subsystem vendor and device IDs are different from
the main PCI vendor and device IDs. Linux 'lspci -n' shows only the
main vendor and device codes, because those are what's important for
knowing what sort of thing a device is instead of who exactly made
it; you have to use 'lspci -vn' to see the subsystem stuff. Illumos's
PCI names here are inherently
framed as a PCI tree, whereas Linux
lspci normally does not show
the tree topology, just flat slot numbering. See '
lspci -t' for
the tree view.)
As far as I can tell, in a modern PCI Express setup the physical
slot you put a card into will determine the first two elements of
the PCI path. '/pci@0,0' is just a (synthetic) PCI root instance,
and then '/pci8086,e04@2' is a specific PCI Express Port. However,
I'm not sure if one PCI Express Port can feed multiple slots and
if it can, I'm not sure how you tell them apart. I'm not quite sure
how things work for plain PCI cards, but for onboard PCI devices
you get PCI paths like '/pci@0,0/pci15d9,714@1a' where the '@1a'
corresponds to what Linux
lspci sees as 00:1a.0.
So, suppose that you have a collection of OmniOS servers and you want
to know if they have exactly the same PCI Express cards in exactly
the same slots (or, say, exactly the same Intel 1G based network
cards). If you look at /etc/path_to_inst and see exactly the same
PCI paths, you've got what you want. If instead you see two systems
whose paths differ only in the PCI subsystem device code (',135e'
on one versus ',115e' on the other), what you have is a situation
where the cards are in the same slots (because the first two elements
of the path are the same) but they're slightly different generations
and Intel has changed the PCI subsystem device code on you. If you're
transplanting system disks from s2 to s1, this can cause problems
that you'll need to deal with by editing /etc/path_to_inst.
I don't know what order Illumos uses when choosing how to assign instances (and thus eg network device names) to hardware when you have multiple instances of the same hardware. On a single card with multiple ports it seems consistent that the port with the lower function is assigned first, eg if you have a dual port card where the ports are pci8086,115e@0 and pci8086,115e@0,1, the @0 port will always be a lower instance than the @0,1 port. How multiple cards are handled is not clear to me and I can't reverse engineer it based on our current hardware.
(While we have multiple Intel 1G dual-port cards in our OmniOS fileservers, they are in PCI Express slots that differ both in the PCI subdevice and in the PCI path information; we have pci8086,e04@2 as the PCI Express Port for the first card and pci8086,e0a@3,2 for the second. I suspect that the PCI path information ('@2' versus '@3,2') determines things here, but I don't know for sure.)
PS: Yes, all of this is confusing (at least to me). Maybe I need to read up on general principles of PCI, PCI Express, and how all the topology stuff works (the PCI bus world is clearly not flat any more, if it ever was).
A brief review of the HP three button USB optical mouse
The short background is that I'm strongly attached to real three button mice (mice where the middle mouse button is not just a scroll wheel), for good reason. This is a slowly increasing problem primarily because my current three button mice are all PS/2 mice and PS/2 ports are probably going to be somewhat hard to find on future motherboards (and PS/2 to USB converters are finicky beasts).
One of the very few three button USB mice you can find is a HP mouse (model DY651A); it's come up in helpful comments here several times (and see also Peter da Silva). Online commentary on it has been mixed with some people not very happy with it. Last November I noticed that we could get one for under $20 (Canadian, delivery included), so I had work buy me one; I figured that even if it didn't work for me, having another mouse around for test machines wouldn't be a bad thing. At this point I've used it at work for a few months and I've formed some opinions.
The mouse's good side is straightforward. It's a real three button
USB optical mouse, it works, and it costs under $20 on Amazon.
It's not actually made by HP, of course; it turns out to be a lightly
rebranded Logitech (
xinput reports it as 'Logitech USB Optical
Mouse'), which is good because Logitech made a lot of good three
button mice back in the day. There are reports that it's not durable
over the long term but at under $20 a pop, I suggest not caring if
it only lasts a few years. Buy spares in advance if you want to,
just in case it goes out of production on you.
(And if you're coming from a PS/2 ball mouse, modern optical mouse tracking is plain nicer and smoother.)
On the bad side there are two issues. The minor one is that my copy
seems to have become a little bit hair trigger on the middle mouse
button already, in that every so often I'll click once (eg to do a
single paste in
xterm) and X registers two clicks (so I get things
pasted twice in
xterm). It's possible that this mouse just needs a
lighter touch in general than I'm used to.
The larger issue for me is that the shape of the mouse is just not
as nice as Logitech's old three button PS/2 mice. It's still a
perfectly usable and reasonably pleasant mouse, it just doesn't
feel as nice as my old PS/2 mouse (to the extent that I can put my
finger on anything specific, I think that the front feels a bit too
steep and maybe too short). My overall feeling after using the HP
mouse for several months is that it's just okay instead of rather
nice the way I'm used to my PS/2 mouse feeling. I could certainly
use the HP mouse; it's just that I'd rather use my PS/2 mouse.
(For reasons beyond the scope of this entry I think it's specifically the shape of the HP mouse, not just that it's different from my PS/2 mouse and I haven't acclimatized to the difference.)
The end result is that I've switched back to my PS/2 mouse at work. Reverting from optical tracking to a mouse ball is a bit of a step backwards but having a mouse that feels fully comfortable under my hand is more than worth it. I currently plan to keep on using my PS/2 mouse for as long as I can still connect it to my machine (and since my work machine is unlikely to be upgraded any time soon, that's probably a good long time).
Overall, if you need a three button USB mouse the HP is cheap and perfectly usable, and you may like its feel more than I do. At $20, I think it's worth a try even if it doesn't work out; if nothing else, you'll wind up with an emergency spare three button mouse (or a mouse for secondary machines).
(And unfortunately it's not like we have a lot of choice here. At least the HP gives us three button people an option.)
How to get Unbound to selectively add or override DNS records
Suppose, not entirely hypothetically, that you're using Unbound and you have a situation where you want to shim some local information into the normal DNS data (either adding records that don't exist naturally or overriding some that do). You don't want to totally overwrite a zone, just add some things. The good news is that Unbound can actually do this, and in a relatively straightforward way (unlike, say, Bind, where if this is possible at all it's not obvious).
You basically have two options, depending on what you want to do with the names you're overriding. I'll illustrate both of these:
local-zone: example.org typetransparent
local-data: "server.example.org A 188.8.131.52"
Here we have added or overridden an A record for server.example.org.
Any other DNS records for server.example.org will be returned
as-is, such as MX records.
local-zone: example.com transparent
local-data: "server.example.com A 184.108.40.206"
We've supplied our own A record for
server.example.com, but we've
also effectively deleted all other DNS records for it. If it has
an MX record or a TXT record or what have you, those records will
not be visible. For any names in transparent local-data zones, you
are in complete control of all records returned; either they're in
your local-data stanzas, or they don't exist.
Note that if you just give
local-data for something without a
local-zone directive, Unbound silently makes it into such a
transparent local zone.
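As a concrete illustration of that note (192.0.2.1 here is just a documentation-range stand-in address), this stanza behaves as if you had also written a 'local-zone: ... transparent' line for the name:

```
# No local-zone: line; Unbound implicitly creates a
# transparent local zone covering this name.
local-data: "server.example.com A 192.0.2.1"
```

The implicit zone carries the transparent gotcha described next, which is one reason to spell out your local-zone directives explicitly.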
Transparent local zones have one gotcha, which I will now illustrate:
local-zone: example.net transparent
local-data: "example.net A 220.127.116.11"
Because this is a transparent zone and we haven't listed any NS
records for example.net as part of our local data, people will
not be able to look up any names inside the zone even though we
don't explicitly block or override them. Of course if we did list
some additional names inside example.net as local-data, people would
be able to look up them (and only them). This can be a bit puzzling
until you work out what's going on.
(Since transparent local zones are the default, note that this
happens if you leave out the
local-zone or get the name wrong by
mistake or accident.)
As far as I know, there's no way to use a typetransparent zone but delete certain record types for some names, which you'd use so you can do things like remove all MX entries for some host names. However, Unbound's idea of 'zones' doesn't have to map to actual DNS zones, so you can do this:
local-zone: example.org typetransparent
local-data: "server.example.org A 18.104.22.168"
# but:
local-zone: www.example.org transparent
local-data: "www.example.org A 22.214.171.124"
By making www.example.org a separate transparent local zone, this
allows us to delete all records for it except the A record that we
supply; this would remove, say, MX entries. Since I just tried
this out, note that a transparent local zone with no data naturally
doesn't blank out anything, so if you want to totally delete a
name's records you need to supply some dummy record (eg a TXT
record).
(We've turned out to not need to do this right now, but since I worked out how to do it I want to write it down before I forget.)
Today's odd spammer behavior for sender addresses
It's not news that spammers like to forge your own addresses into
MAIL FROMs of the spam that they're trying to send you; I've
seen this here for some time.
On the machine where I have my sinkhole server running, this clearly comes
and goes. Some of the time almost all the senders will be trying a
MAIL FROM of a local address (often the one they seem to be trying
to mail to), and other times I won't see any in the logs for weeks. But
recently there's been a new and odd behavior.
Right now, a surprising number of sending attempts are using a
MAIL FROM of the target address with its first letter dropped, eg
'oey@domain' instead of 'joey@domain'. They're not just picking on
a single address that is mutilated this way, as I see the pattern
with a number of addresses.
(Some of the time they'll add some letters after the login name too, eg 'joey@domain' will turn into 'oeyn@domain'.)
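If you want to play with the pattern yourself, here's a tiny Python sketch of what the mutation appears to be. This is just my reading of the observed addresses, not anything the spammers have documented.

```python
def mutate(addr, extra=""):
    """Drop the first letter of the local part and optionally append
    extra letters after it, as seen in the observed sender addresses."""
    local, domain = addr.split("@", 1)
    return local[1:] + extra + "@" + domain

assert mutate("joey@example.com") == "oey@example.com"
assert mutate("joey@example.com", "n") == "oeyn@example.com"
```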
So far I have no idea what specific spam campaign this is for because all of the senders have been in the Spamhaus XBL (this currently gets my sinkhole server to reject them as boring spam that I already have enough samples of).
What really puzzles me is what the spammers who programmed this are
thinking. It's quite likely that systems will reject bad
local addresses in MAIL FROMs for incoming email, which means
that starting with addresses you think are good and then mutating
them is a great way to get a lot of your spam sending attempts
rejected immediately. Yet spammers are setting up their systems to
deliberately mutate addresses and then use them as the sender
address, and presumably this both works and is worthwhile for some
of them.
(Perhaps they're trying to bash their way through address obfuscation, even when the address isn't obfuscated.)
(I suspect that this is a single spammer that has latched on to my
now-spamtrap addresses, instead of a general thing. Our general
inbound mail gateway gets too much volume for me to pick through
the 'no such local user'
MAIL FROM rejections with any confidence
that I'd spot such a pattern.)
Why your Apache should have mod_status configured somewhere
Recently, our monitoring system alerted us that our central web server wasn't responding. I poked it and indeed, it wasn't responding, but when I looked at the server everything seemed okay and the logs said it was responding to requests (a lot of them, in fact). Then a little bit later monitoring said it was responding again. Then it wasn't responding. Then my attempt to look at a URL from it worked, but only really slowly.
If you're a long-term Apache wrangler, you can probably already guess the cause. You would be correct; what was going on was that our Apache was being hit with so many requests at once that it was running out of worker processes. If it got through enough work in time, it would eventually pick up your request and satisfy it; if it didn't, you timed out. And if you were lucky, maybe you could get a request in during a lull in all the requests and it would be handled right away.
Once we'd identified the overall cause, we needed to know who or what was doing it. Our central web server handles a wide variety of URLs for a lot of people, some of which can get popular from time to time, so there were a lot of options. And nothing stood out in a quick scan of the logs as receiving a wall of requests or the like. Now, I'm sure that we could have done some more careful log analysis to determine the most active URLs and the most active sources over the last hour or half hour or something, but that would have taken time and effort and we still might have missed something. Instead I took the brute force approach: I added mod_status to the server's configuration, on a non-standard URL with access restrictions, and then I looked at it. A high volume source IP jumped out right away and did indeed turn out to be our problem.
Apache's mod_status has a bad reputation as an information leak and a security issue, and as a result I think that a lot of people don't enable it these days. Our example shows why you might want to reconsider that. Mod_status offers information that's fairly hard to get in any other way and that's very useful (or essential) when you need it, and it's definitely possible to enable it securely. Someday you will want to know who or what is bogging down your server (or at least what it's doing right now), and a live display of current requests is just the thing to tell you.
(This should not be surprising; live status is valuable for pretty much anything. Even when this sort of information can be approximated or reconstructed from logs, it takes extra time and effort.)
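For illustration, a minimal Apache 2.4 configuration along these lines might look like the following. The URL and network here are stand-ins; pick your own non-standard location and access restrictions.

```apache
# Load the module if your distribution hasn't already.
LoadModule status_module modules/mod_status.so
# Show full per-request details, not just the scoreboard.
ExtendedStatus On

<Location "/our-private-status">
    SetHandler server-status
    # Only our management network may look at this.
    Require ip 192.0.2.0/24
</Location>
```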
Why Unix needs a standard way to deal with the file durability problem
One of the reactions to my entry on Unix's file durability problem is the obvious pragmatic one. To wit, that this
isn't really a big problem because you can just look up what you
need to do in practice and do it (possibly with some debate over
whether you still need to
fsync() the containing directory to
make new files truly durable or whether that's just superstition
by now). I don't disagree with this pragmatic answer and it's
certainly what you need to do today, but I think sticking with it
misses why Unix as a whole should have some sort of agreed-on
standard for this.
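For concreteness, the usual belt-and-suspenders recipe for durably replacing a file on today's Unixes looks something like this Python sketch (whether the directory fsync() is still required is exactly the sort of question a standard would settle):

```python
import os

def durable_replace(path, data):
    """Write data to path so it should survive a crash: write a
    temporary file, fsync it, rename it into place, then fsync
    the containing directory."""
    dirpath = os.path.dirname(os.path.abspath(path))
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # force the file's data to disk
    os.rename(tmp, path)       # atomically replace the target
    dfd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(dfd)          # make the rename itself durable
    finally:
        os.close(dfd)
```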
An agreed on standard would help both programmers and kernel
developers. On the side of user level programmers, it tells us not
just what we need to do in order to achieve file durability today
but also what we need to do in order to future-proof our code. A
standard amounts to a promise that no sane future Unix setup will
add an additional requirement for file durability. If our code is
working right today on Solaris UFS or Linux ext2, it will keep
working right tomorrow on Linux ext4 or Solaris ZFS. Without a
standard, we can't be sure about this, and in fact some programs
have been burned by it in the past, when new filesystems added extra
requirements such as fsync()'ing directories under some circumstances.
(This doesn't mean that all future Unix setups will abide by this, of course. It just means that we can say 'your system is clearly broken, this is your problem and not a fault in our code, fix your system setup'. After all, even today people can completely disable file durability through configuration choices.)
On the side of kernel people and filesystem developers, it tells both parties how far a sensible filesystem can go; it becomes a 'this far and no further' marker for filesystem write optimization. Filesystem developers can reject proposed features that break the standard as 'it breaks the standard', and if they don't the overall kernel developers can. Filesystem development can entirely avoid both a race to the bottom and strained attempts to read the POSIX specifications so as to allow ever faster but more dangerous behavior (and also the ensuing arguments over just how one group of FS developers read POSIX).
The whole situation is exacerbated because POSIX and other standards have so relatively little to say on this. The people who create hyper-aggressive C optimizers are at least relying on a detailed and legalistically written C standard (even if almost no programs are fully conformant to it in practice), and so they can point users to chapter and verse on why their code is not standards conforming and so can be broken by the compiler. The filesystem people are not so much on shakier ground as on fuzzy ground, which results in much more confusion, disagreement, and arguing. It also makes it very hard for user level programmers to predict what future filesystems might require here, since they have so little to go from.
Why I think Let's Encrypt won't be a threat to commercial CAs for now
One of the questions these days is what effects Let's Encrypt and their well publicized free TLS certificates will have on existing commercial CAs. While LE has become a very popular CA and there have been some amusing signs of nervousness from existing CAs, my personal belief is that Let's Encrypt is not going to be a threat to the business of commercial CAs for the medium term, say the next few years.
One reason I say this is pragmatic past history. LE is not the first or only free SSL CA, and that existing free SSL CA does not seem to have made any particularly large dent in the business of other CAs. Now, there are certainly some differences between LE and this other CA that make this not quite an apples to apples situation; LE is very well known now and issues short-duration certificates through easy automation, while the other free CA is less well known and issues year long certificates through a more annoying and onerous web-based process that not everyone can take advantage of. However, I think it's still a sign of some things.
(The other free CA also charges for certificate revocation or early replacement, which did not endear them to a lot of people during Heartbleed.)
Beyond that, there are a number of potential reasons that Let's Encrypt certificates are not necessarily all that compelling for many organizations. In no particular order:
- LE doesn't issue wild card certificates.
- Even with their recently raised rate limits, 20 new certificates a week is still potentially a real limit for decent sized organizations (who can have a lot of hosts and subdomains they want TLS certificates for).
- The short duration of LE certificates pretty much requires developing
and deploying new automation to automatically renew and roll over
LE certificates. With commercial CAs, a moderate sized place can
just buy some certificates and be done for the next few years, with
minimal staff time required.
(At least some CAs will even mail you helpful reminder emails when your certificates are approaching their expiry times.)
Finally, as a commodity, basic TLS DV certificates are already available for very little money. It's my impression that a lot of organizations are not that price sensitive about TLS certificates; they simply don't care that they could save $10 or $20 a year per certificate, because the aggregate savings are not worth thinking about at their size. They might as well go with whatever is either simpler for them now or already familiar. Let's Encrypt is compelling for the price sensitive hobbyist, but for a decent sized organization I think it's only interesting for now.
(And for a large organization, the rate limits probably make it infeasible to rely on LE for new certificates.)
However, I do expect this to change over the longer term. What will really threaten existing CAs is the addition of turn-key integration of LE certificates in more and more pieces of software. Once you can set an option in your web server of choice to automatically get and maintain LE certificates for each virtual host you have or add, I think that a lot of people in a lot of organizations will just turn that option on rather than wrestle with all of the little hassles of commercial CAs. Once LE becomes the easiest choice, existing CAs have much less to offer to anyone who fits into LE's rate limits (basically CAs are down to 'we'll sell you wild-carded DV certificates').
Unbound illustrates the Unix manpage mistake with its ratelimits documentation
Our departmental recursive nameservers are based on OpenBSD, which has recently switched from BIND to Unbound and NSD. As a result of this, we've been in the process of setting up a working Unbound configuration. In the process of this we ran into an interesting issue.
A relatively current
unbound.conf manpage has this to say about
(some) ratelimiting options (I'm excerpting here):
- ratelimit: <number or 0>
- Enable ratelimiting of queries sent to the nameserver for performing recursion. If 0, the default, it is disabled. [...] For example, 1000 may be a suitable value to stop the server from being overloaded with random names, and keeps unbound from sending traffic to the nameservers for those zones.
- ratelimit-for-domain: <domain> <number qps>
- Override the global ratelimit for an exact match domain name with the listed number. [...]
So you set up an Unbound configuration that contains the following:
# apparent good practice
ratelimit: 1000
# but let's exempt our own zones from it,
# just in case.
ratelimit-for-domain: utoronto.ca 0
Congratulations, on at least the OpenBSD version of Unbound you
have just blown your own foot off; you'll probably be unable to
resolve anything in utoronto.ca. If you watch the logs sufficiently
carefully, you can eventually spot a little mention that your query
for say the A record of
www.utoronto.ca has been ratelimited.
(If you're writing a moderately complicated Unbound configuration for the first time, it may take you some time to reach this point instead of suspecting that you have screwed something up in other bits of the configuration.)
What has happened is that you have not read the manpage with the
necessary closeness for a true
Unix manpage. You see, the manpage does not come out and actually say
that ratelimit-for-domain treats a ratelimit of 0 as unlimited.
It just looks like it should, because
ratelimit-for-domain is a
more specialized version of plain
ratelimit so you'd certainly
assume that they treat their number argument in the same way. And
of course that would be the sensible thing to do so you can do just
what we're trying to do here.
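A hedged workaround, given this behavior: instead of 0, use a per-domain ratelimit high enough that you'll never plausibly hit it. I haven't verified this against every Unbound version, but it sidesteps the question of what 0 means to ratelimit-for-domain entirely:

```
ratelimit: 1000
# Effectively unlimited for our own zone, without
# using the ambiguously-interpreted '0' value:
ratelimit-for-domain: utoronto.ca 100000
```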
This may or may not be a bug, in either Unbound itself or the
unbound.conf manpage. Unix's minimalistic, legalistic 'close reading'
history of both reading and writing manpages makes it impossible to
tell, because this could be both intended and simply undocumented.