Shooting myself in the foot by cargo-culting Apache configuration bits
I spent part of today working to put Prometheus's Blackbox prober
behind a reverse proxy in Apache (to add TLS and some security to it).
Unlike some other pieces of Prometheus, the Blackbox exporter is
not designed for this and so its little web server generates HTML
pages with absolute URLs like '/config', which doesn't
work too well when you've relocated it to be under '/blackbox' in
your reverse proxying rules. Years and years ago I would have just
been out of luck, but these days Apache has mod_proxy_html, which
can rewrite HTML to fix URLs as it flows back through your Apache
reverse proxy.
I've never used mod_proxy_html before so I did my usual thing with Apache modules when I just want to hack something together; I skimmed the official Apache documentation, decided it was confusing me, did some Internet searches for examples and discussions, and used them to put together a configuration. The result behaved weirdly. I had the apparently obvious rewrite rule of:
    <Location /blackbox>
      ProxyHTMLEnable On
      [...]
      ProxyHTMLURLMap / /blackbox/
      [...]
    </Location>
As I understood it, this was supposed to transform a Blackbox HTML
link of '/config' to '/blackbox/config', by mapping the leading '/'
to '/blackbox/'. Instead, what I got out was a link with a doubled
'/blackbox' in it.
I flailed around with various alternatives and found that any of the
following three variants worked:
    # Match only what's supposed to be there
    ProxyHTMLURLMap "^/(metrics|config|probe|logs)(.*)" "/blackbox/$1$2" [R]

    # Terrible hack, convert to relative URLs
    ProxyHTMLURLMap / ./

    # This works but I don't understand why
    # and it has to be in *this order*
    ProxyHTMLURLMap /blackbox /blackbox
    ProxyHTMLURLMap / /blackbox/
Eventually I discovered the magic LogLevel setting
'proxy_html:trace3', which gave me a report of what was theoretically
happening in the HTML rewriting process. What the logs said was
that HTML rewriting appeared to be happening twice, which at least
explained why I had wound up with a doubled
/blackbox in the URL
and why the last variant worked around it (on the second pass,
mod_proxy_html matched the do-nothing rule and stopped).
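For reference, here is a minimal sketch of how that tracing can be turned on in Apache 2.4, which accepts per-module log levels on the LogLevel directive. The global 'warn' level is an assumption for illustration; use whatever your server normally runs at.

```apache
# Keep the server-wide level modest (assumed here to be 'warn') and
# raise only mod_proxy_html to trace3 while debugging the rewriting.
LogLevel warn proxy_html:trace3
```

The trace output lands in the regular error log, so this is probably something to switch back off once the rewriting behaves.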
I read the official documentation again to see if I could figure out why the module was doing two passes, but it didn't have any enlightenment for me. Then, suddenly, I had a terrible suspicion. You see, I left out a little bit of my Apache configuration, a bit that I had just blindly copied from Internet sources (there are lots of mentions of it): a line that set mod_proxy_html as an output filter.
It turns out that in Apache 2.4, you don't want to set an output
filter for mod_proxy_html. Just setting 'ProxyHTMLEnable On' is
enough to get the module rewriting your HTML (presumably
it internally hooks into Apache's filtering system). If you do go ahead
and set mod_proxy_html as an output filter as well, you get the
obvious thing happening; it acts twice, and then like me you will
probably be fairly confused. Removing this setting made everything
work.
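As a rough sketch, the configuration without the extra output filter might look something like the following. The ProxyPass lines and the backend address are assumptions, standing in for what is elided as '[...]' above; port 9115 is the Blackbox exporter's default. The commented-out line is the form of output filter setting that commonly appears in examples online, so treat it as illustrative rather than a quote of the copied line.

```apache
<Location /blackbox>
    # Assumed backend: a Blackbox exporter on its default port.
    ProxyPass        http://localhost:9115
    ProxyPassReverse http://localhost:9115

    # In Apache 2.4 this alone turns on HTML rewriting.
    ProxyHTMLEnable On
    ProxyHTMLURLMap / /blackbox/

    # Do not also add an output filter for mod_proxy_html; with it,
    # the rewriting happens twice and '/blackbox' gets doubled.
    # SetOutputFilter proxy-html
</Location>
```

With only one rewriting pass, the plain '/' to '/blackbox/' mapping should be enough and the do-nothing workaround rule is no longer needed.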
I know that superstition is a dangerous but attractive thing, and I still fell victim to blindly copying things from the Internet rather than slowing down to try to build a configuration from the documentation itself. Next time, perhaps I'll remember to be patient.
Sidebar: The one thing that still didn't work
Fetching a URL that returns a fairly large
text/plain result works
with curl but fails in browsers, with
various errors about corrupted content or an inability to uncompress
things. Various Internet searches suggest that perhaps this is a
problem with the back-end web server returning compressed content
and Apache being unhappy. I followed the suggested approach of
stopping that with:
RequestHeader unset Accept-Encoding
This seems to have worked. For our usage I don't care if all of the content here is served without compression; it's not very big and I don't actually expect us to use the Blackbox probe exporter's web thing very often.
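If you would rather not turn off compression for everything the server handles, one option is to scope the directive to the proxied location. This is only a sketch: the placement inside the '/blackbox' Location block is an assumption, and RequestHeader needs mod_headers to be loaded.

```apache
<IfModule mod_headers.c>
    <Location /blackbox>
        # Strip Accept-Encoding from the request before it is proxied,
        # so the backend should not send compressed responses.
        RequestHeader unset Accept-Encoding
    </Location>
</IfModule>
```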