2021-06-17
Apache directory indexes will notice and exclude blocked URLs in them
Today I learned about an interesting and nice little Apache feature.
If you have Apache generating its own automatic index pages for
filesystem directories, and you block access to some things in a
directory with <Location> blocks, Apache's generated index won't
include what you blocked. It's as if the object doesn't exist. This
is what you want (since attempts to access those things will fail),
but it's more than I expected.
That sounds abstract, so let me make it concrete. We have an old
legacy FTP site, which we've recently made available as an HTTPS site
because browsers are removing support for FTP. For historical reasons,
this FTP site has some symbolic links that create recursive structures;
for example, it has a public_html symlink in the root that points
to '.' (the current directory). Unfortunately, web spiders just
love recursive structures and will crawl through them incessantly,
with ever lengthening URLs.
(Web spider operators will probably tell you that they don't like
recursive link situations like this. I have to go by observed
behavior, which is that any number of web spiders don't appear to
notice that /public_html/ is exactly the same content as /
and /public_html/public_html/ and so on.)
We don't want to remove the symbolic links from the actual directory tree that's the FTP site, for various reasons (maybe they're there for some necessary reason, or at least have become embedded in historical FTP URLs). But the HTTPS site is new and we can drop whatever URLs we want from it. So I did the obvious simple thing:
<Location "/public_html">
    Order deny,allow
    Deny from all
</Location>
When I was verifying that this worked, I noticed that the top
level index page for the FTP site no longer showed any public_html
entry. Testing showed that this happened for other entries in the
directory as well, if I temporarily added <Location> blocks for them.
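This sort of check can be scripted. What follows is only a minimal sketch, not what was actually run here: it pulls the linked entry names out of an autoindex-style page so you can confirm a blocked name is absent. The sample HTML, the IndexLinks class, and the index_entries() helper are all hypothetical, and real mod_autoindex output has more markup than this.

```python
from html.parser import HTMLParser


class IndexLinks(HTMLParser):
    """Collect the href of every <a> tag in an index page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def index_entries(html_text):
    """Return the entry names linked from an autoindex-style page."""
    parser = IndexLinks()
    parser.feed(html_text)
    # Skip column-sorting links (which start with '?') and absolute links
    # such as a parent directory link; keep only relative entry names.
    return [h.rstrip("/") for h in parser.hrefs
            if not h.startswith("?") and not h.startswith("/")]


# Hypothetical, cut-down index page. A real one would be fetched from the
# server, for example with urllib.request.urlopen().
SAMPLE = """
<html><body><h1>Index of /</h1>
<a href="?C=N;O=D">Name</a>
<a href="pub/">pub/</a>
<a href="README">README</a>
</body></html>
"""

entries = index_entries(SAMPLE)
blocked_is_hidden = "public_html" not in entries
```

Running the same function over the page before and after adding a <Location> block gives a quick before-and-after comparison.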
The mod_autoindex documentation suggests that this is a standard
feature that does general permission checks, based on the
documentation for the ShowForbidden option to the IndexOptions
directive. However, I haven't tested this with more complex situations,
such as <Directory> instead of <Location> or more complicated
permissions.
In Prometheus queries, on and ignoring don't drop labels from the result
Today I learned that one of the areas of PromQL (the query language for Prometheus) that I'm still a bit weak on is when labels will and won't get dropped from metrics as you manipulate them in a query. So I'll start with the story.
Today I wrote an alert rule to make sure that the network interfaces
on our servers hadn't unexpectedly dropped down to 100 Mbit/second
(instead of 1Gbit/s or for some servers 10Gbit/s). We have a couple
of interfaces on a couple of servers that legitimately are at 100M
(or as legitimately as a 100M connection can be in 2021), and I
needed to exclude them. The speed of network interfaces is reported
by node_exporter in node_network_speed_bytes, so I first wrote an
expression using unless and all of the labels involved:
node_network_speed_bytes == 12500000 unless
  ( node_network_speed_bytes{host="host1",device="eno2",...} or
    node_network_speed_bytes{host="host2",device="eno1",...} )
However, most of the standard labels you get on metrics from the
host agent (such as job, instance, and so on) are irrelevant
and even potentially harmful to include (the full set of labels
might have to change someday). The labels I really care about are
the host and the device. So I rewrote this as:
node_network_speed_bytes == 12500000 unless on(host,device) [....]
When I wrote this expression I wasn't sure if it was going to drop
all other labels besides host and device from the filtered end
result of the PromQL expression. It turns out that it didn't; the
full set of labels for node_network_speed_bytes is passed through,
even though we're only matching on some of them in the unless.
(The host and the device are all that I needed for the alert message so it wouldn't have been fatal if the other labels were dropped. But it's better to retain them just in case.)
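To make the observed behavior concrete, here is a small Python model of 'unless on(host,device)'. It is a sketch of the semantics as described above, not Prometheus's actual implementation, and the hosts, instance values, and helper names are all made up for illustration. Series are plain dictionaries of labels; the metric name and sample values are left out.

```python
def match_key(labels, on):
    """Build the matching signature from only the labels named in on().

    A missing label is treated as an empty string, as PromQL does.
    """
    return tuple(labels.get(name, "") for name in on)


def unless_on(lhs, rhs, on):
    """Sketch of `lhs unless on(...) rhs` for lists of label dicts.

    A left-hand series is dropped if any right-hand series has the same
    values for the on() labels. Surviving series are returned untouched,
    so they keep every label they started with.
    """
    excluded = {match_key(labels, on) for labels in rhs}
    return [labels for labels in lhs if match_key(labels, on) not in excluded]


# Hypothetical interfaces currently reporting 100 Mbit/s.
slow = [
    {"host": "host1", "device": "eno2", "instance": "host1:9100", "job": "node"},
    {"host": "host3", "device": "eno1", "instance": "host3:9100", "job": "node"},
]

# Hypothetical interfaces that are allowed to be slow.
allowed = [
    {"host": "host1", "device": "eno2", "instance": "host1:9100", "job": "node"},
    {"host": "host2", "device": "eno1", "instance": "host2:9100", "job": "node"},
]

alerts = unless_on(slow, allowed, ("host", "device"))
```

Only the host3 series survives, and it still carries instance and job even though neither was named in on().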
Aggregation operators discard all labels unless you use without
or by to keep some of them, as covered by their documentation
(although it's not phrased that way), since aggregating over labels
is their purpose. As I've found out, careless use of aggregation
operators can lose labels that are valuable for alerts (which may be
what left me jumpy about this case). Aggregation over time keeps all
labels, though, because it's aggregating over time instead of over
some or all labels. But as I was reminded today (since I'm sure I've
seen it before), vector matching using on and ignoring doesn't drop
labels, at least not with a set operator like unless; it merely
restricts what labels are used in the matching (and then it's up to
you to make sure you still have a one to one vector match or at least
a match that you expect; I've made mistakes there).
(You can also explicitly pull in additional labels from other metrics.)
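For contrast with matching, here is a rough Python model of how aggregation handles labels with by and without. As before, this is an illustrative sketch with made-up hosts and values rather than how Prometheus implements it, and sum_by() and sum_without() are hypothetical helper names.

```python
from collections import defaultdict


def sum_by(samples, by):
    """Sketch of `sum by (...)`: the result keeps only the by() labels."""
    totals = defaultdict(float)
    for labels, value in samples:
        key = tuple((name, labels.get(name, "")) for name in by)
        totals[key] += value
    return [(dict(key), total) for key, total in totals.items()]


def sum_without(samples, without):
    """Sketch of `sum without (...)`: the result drops only those labels."""
    totals = defaultdict(float)
    for labels, value in samples:
        key = tuple(sorted((n, v) for n, v in labels.items()
                           if n not in without))
        totals[key] += value
    return [(dict(key), total) for key, total in totals.items()]


# Hypothetical interface speeds in bytes per second.
samples = [
    ({"host": "host1", "device": "eno1", "job": "node"}, 125000000.0),
    ({"host": "host1", "device": "eno2", "job": "node"}, 12500000.0),
    ({"host": "host2", "device": "eno1", "job": "node"}, 125000000.0),
]

by_host = sum_by(samples, ("host",))
without_device = sum_without(samples, ("device",))
```

Here by_host has lost job as well as device, which is the sort of accidental label loss that can hurt an alert, while without_device still has both host and job.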
There may be other cases in PromQL where labels are dropped, but if so I can't think of them right now. My overall moral is that I still need to test my assumptions and guesses in order to be sure about this stuff.
Sidebar: Why I used unless (... or ...) in this query
In many cases, the obvious way to exclude some things from an alert rule expression is to use negative label matches. However, each of these matches on the value of a single label, not on the combination of several labels. As far as I know, if you want to exclude only certain label combinations (here 'host1 and eno2' and 'host2 and eno1') where the individual label elements can occur separately (so host1 and host2 both have other network interfaces, and other hosts have eno1 and eno2 interfaces), you're stuck with the more awkward construction I used. This construction is unfortunately somewhat brute force.
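The problem with negative label matches can be seen in a small Python model. This is a sketch under the assumption that all matchers in a selector must hold at once. The series list and both helper functions are hypothetical.

```python
def negative_matchers(series, **not_equal):
    """Sketch of {host!="...", device!="..."}: every matcher must hold."""
    return [labels for labels in series
            if all(labels.get(name, "") != value
                   for name, value in not_equal.items())]


def exclude_pairs(series, pairs):
    """What unless (... or ...) achieves: drop only exact label pairs."""
    return [labels for labels in series
            if (labels["host"], labels["device"]) not in pairs]


# Hypothetical hosts, each with two interfaces.
SERIES = [
    {"host": host, "device": device}
    for host in ("host1", "host2", "host3")
    for device in ("eno1", "eno2")
]

# Trying to exclude just host1's eno2 with two negative matchers also
# throws away host1's eno1 and every other host's eno2.
too_broad = negative_matchers(SERIES, host="host1", device="eno2")

# Excluding exact pairs keeps everything else.
precise = exclude_pairs(SERIES, {("host1", "eno2"), ("host2", "eno1")})
```

With these six series, too_broad is left with only host2/eno1 and host3/eno1, while precise keeps the four series that aren't one of the two excluded pairs.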