Wandering Thoughts archives

2018-10-29

Shooting myself in the foot by cargo-culting Apache configuration bits

I spent part of today working to put Prometheus's Blackbox prober exporter behind a reverse proxy in Apache (to add TLS and some security to it). Unlike some other pieces of Prometheus, the Blackbox exporter is not designed for this, so its little web server generates HTML pages with absolute URLs like /metrics and /config. Those don't work too well when you've relocated it to be under /blackbox/ in your reverse proxying rules. Years and years ago I would have just been out of luck, but these days Apache has mod_proxy_html, which can rewrite HTML to fix URLs as it flows back through your Apache reverse proxy.

I've never used mod_proxy_html before, so I did my usual thing with Apache modules when I just want to hack something together: I skimmed the official Apache documentation, decided it was confusing me, did some Internet searches for examples and discussions, and used them to put together a configuration. The result behaved weirdly. I had the apparently obvious rewrite rule of:

<Location /blackbox>
   ProxyHTMLEnable On
   [...]
   ProxyHTMLURLMap / /blackbox/
   [...]
</Location>

As I understood it, this was supposed to transform a Blackbox HTML link of '/config' to '/blackbox/config', by mapping / to /blackbox/. Instead, what I got out was /blackbox/blackbox/config. I flailed around with various alternatives and found that any of the following three variants worked:

# Match only what's supposed to be there
ProxyHTMLURLMap "^/(metrics|config|probe|logs)(.*)" "/blackbox/$1$2" [R]

# Terrible hack, convert to relative URLs
ProxyHTMLURLMap / ./

# This works but I don't understand why
# and it has to be in *this order*
ProxyHTMLURLMap /blackbox /blackbox
ProxyHTMLURLMap / /blackbox/

Eventually I discovered the magic setting 'LogLevel debug proxy_html:trace3', which gave me a report of what was theoretically happening in the HTML rewriting process. What the logs said was that HTML rewriting appeared to be happening twice, which at least explained why I had wound up with a doubled /blackbox in the URL and why the last variant worked around it (on the second pass, mod_proxy_html matched the do-nothing rule and stopped).
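To make the double pass concrete, here is a sketch of how the rules would apply to a link of /config. It assumes that a plain (non-regex) ProxyHTMLURLMap rule matches as a prefix of the URL and that the first matching rule wins, which is consistent with the behaviour described above. The block is illustrative only and is not meant to be pasted into a configuration as is.

```apache
# Original rule, with the rewriting applied twice:
#   ProxyHTMLURLMap / /blackbox/
#   pass 1: /config           -> /blackbox/config
#   pass 2: /blackbox/config  -> /blackbox/blackbox/config

# Workaround, with the do-nothing rule listed first:
#   ProxyHTMLURLMap /blackbox /blackbox
#   ProxyHTMLURLMap / /blackbox/
#   pass 1: /config           -> /blackbox/config   (second rule matches)
#   pass 2: /blackbox/config  -> /blackbox/config   (first rule matches, no change)

# Relative-URL hack:
#   ProxyHTMLURLMap / ./
#   pass 1: /config           -> ./config
#   pass 2: ./config          -> ./config           (no longer starts with /)
```

Under the same assumptions, the regular expression variant survives a second pass because /blackbox/config does not match ^/(metrics|config|probe|logs).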

I read the official documentation again to see if I could figure out why the module was doing two passes, but it didn't have any enlightenment for me. Then, suddenly, I had a terrible suspicion. You see, I left out a little bit of my Apache configuration, a bit that I had just blindly copied from Internet sources (there are lots of mentions of it):

SetOutputFilter proxy-html

It turns out that in Apache 2.4, you don't want to set an output filter for mod_proxy_html. Just setting 'ProxyHTMLEnable On' is enough to get the module rewriting your HTML (presumably it internally hooks into Apache's filtering system). If you do go ahead and set mod_proxy_html as an output filter as well, the obvious thing happens: it acts twice, and then, like me, you will probably be fairly confused. Removing this setting made everything work.
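Put together, a minimal configuration without the extra output filter might look something like the following. This is a sketch and not my actual configuration. The ProxyPass and ProxyPassReverse lines and the back-end address are assumptions (9115 is the Blackbox exporter's default port), and it leaves out the TLS and access control parts entirely. It needs mod_proxy, mod_proxy_http, and mod_proxy_html loaded.

```apache
# Assumed back end: Blackbox exporter listening on localhost:9115.
ProxyPass        /blackbox/ http://localhost:9115/
ProxyPassReverse /blackbox/ http://localhost:9115/

<Location /blackbox>
   # In Apache 2.4 this is enough to turn on HTML rewriting.
   # Do not also add 'SetOutputFilter proxy-html'.
   ProxyHTMLEnable On

   # Rewrite absolute links such as /config to /blackbox/config.
   ProxyHTMLURLMap / /blackbox/
</Location>
```

With only one rewriting pass, the single 'ProxyHTMLURLMap / /blackbox/' rule is sufficient and none of the workaround variants are needed.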

I know that superstition is a dangerous but attractive thing, and I still fell victim to blindly copying things from the Internet rather than slowing down to try to build a configuration from the documentation itself. Next time, perhaps I'll remember to be patient.

(mod_proxy_html is apparently not a complete solution for modern websites that have piles of URLs hidden in CSS, JavaScript, and so on, but fortunately the Blackbox exporter just uses plain HTML.)

Sidebar: The one thing that still didn't work

Fetching a URL that returned a fairly large text/plain result worked in wget and curl but failed in browsers (including lynx) with various errors about corrupted content or an inability to uncompress things. Various Internet searches suggest that perhaps this is a problem with the back-end web server returning compressed content and Apache being unhappy. I followed the suggested approach of stopping that with:

RequestHeader unset Accept-Encoding

This seems to have worked. For our usage I don't care if all of the content here is served without compression; it's not very big and I don't actually expect us to use the Blackbox probe exporter's web thing very often.

web/ApacheHTMLProxyMistake written at 22:10:10

