2020-07-15
A piece of phish spam with some clever URL obfuscation
We were the target of a phish spam run today. In many respects it was a standard modern phish; it was specifically targeted to us, with a message and claimed sender tuned to here, it was in HTML, and the inducement to click was a claim of 'go here to retrieve a voicemail message'. However, it had one interesting trick that I haven't seen before, and that was how it obfuscated its target URL.
The first level of obfuscation was that the target in the <a href="..."> was entirely encoded in HTML hex entities, which probably only stops very basic spam recognizer engines (and serves as a big warning sign for others). However, even when decoded the direct URL came out to be '/blah/?of=<email address>', with no host in evidence. At first I stared at this in puzzlement, and then the penny dropped and I looked at the full HTML. Up at the top was a little thing:
<html> <base href="&#x68;&#x74; &#x74;&#x70; &#x73;&#x3a; &#x5c;&#x2f;[...]
(For a bit of extra obfuscation, that decodes to 'https:\/'. I've removed the hostname, and added strategic spaces between some hex entities so that this entry doesn't get an extra-wide line.)
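To show why the entity encoding only defeats the most basic scanners, here is a minimal Python sketch using the standard library's html.unescape. The path and recipient address are hypothetical stand-ins for the real ones. Encoding every character as a hex entity hides the text from a plain substring search, but anything that actually decodes HTML gets it back immediately.

```python
import html


def encode_as_hex_entities(text: str) -> str:
    """Encode every character as an HTML hex character reference (&#xNN;)."""
    return "".join(f"&#x{ord(ch):x};" for ch in text)


# Hypothetical stand-ins: the phish's real path and recipient are not shown.
encoded_href = encode_as_hex_entities("/blah/?of=user@example.org")
encoded_base_start = encode_as_hex_entities("https:\\/")  # i.e. https:\/

# A plain substring search over the raw HTML finds neither word...
print("blah" in encoded_href, "https" in encoded_base_start)
# ...but any HTML-aware decoder recovers the text trivially.
print(html.unescape(encoded_href))
print(html.unescape(encoded_base_start))
```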
The phish spammers had split their URL in two by using a base URL element. The base URL element had the hostname (and the https://, sort of); the <a href> had the path on the host. Given that the spammers went to this trouble, it seems likely that a decent number of anti-spam engines that parse HTML don't go as far as handling base URL elements (and anything that just does basic text matching is out in the cold).
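To make the split concrete, here is a minimal Python sketch of how a browser-like client turns the two halves back into one URL. The host phish.example and the address are hypothetical. It also illustrates the 'sort of' https://. Browsers follow the WHATWG URL Standard, which treats a backslash like a forward slash in https URLs, but Python's urllib.parse is not that lenient. The sketch therefore crudely swaps backslashes for slashes before resolving. That is an approximation, since it would also rewrite any backslash in a query string.

```python
from urllib.parse import urljoin


def resolve_against_base(base_href: str, href: str) -> str:
    """Resolve an <a href> value against a <base href> value.

    Approximation: replace every backslash with a forward slash first, to
    mimic browser leniency for https URLs; urljoin alone would not treat
    'https:\\/host/' as having a host.
    """
    normalized_base = base_href.replace("\\", "/")
    return urljoin(normalized_base, href)


# Hypothetical values: the real host and path in the phish are not reproduced.
base_href = "https:\\/phish.example/"  # i.e. https:\/phish.example/
href = "/blah/?of=user@example.org"
print(resolve_against_base(base_href, href))
```

Without the base element, all a scanner has is the host-less path in the href.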
(I have a personal little program that extracts URLs from email messages for my own uses. It didn't understand the base URL element, but I'm not sure I should bother fixing that.)
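For anyone with a similar URL-extracting program, here is a hedged sketch of one way to teach it about base URL elements, using Python's standard html.parser. The class and method names are invented for illustration. The sketch takes the first <base href> as applying to every link in the document, and it crudely turns backslashes in the base into forward slashes to approximate how browsers treat 'https:\/'.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class BaseAwareLinkExtractor(HTMLParser):
    """Collect <a href> targets and resolve them against the first <base href>.

    HTMLParser decodes character references (such as &#x2f;) in attribute
    values itself, so entity-encoded hrefs arrive here as plain text.
    """

    def __init__(self) -> None:
        super().__init__()
        self.base_href: Optional[str] = None
        self.raw_hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if href is None:
            return
        if tag == "base":
            if self.base_href is None:  # later <base> elements are ignored
                # Rough stand-in for browser leniency: treat '\' as '/'.
                self.base_href = href.replace("\\", "/")
        elif tag == "a":
            self.raw_hrefs.append(href)

    def links(self) -> List[str]:
        """Return the hrefs, resolved against the base URL if one was seen."""
        if self.base_href is None:
            return list(self.raw_hrefs)
        return [urljoin(self.base_href, href) for href in self.raw_hrefs]


# Hypothetical message body shaped like the phish (host and address invented).
SAMPLE = (
    '<html> <base href="https:&#x5c;&#x2f;phish.example/">'
    '<body><a href="&#x2f;blah&#x2f;?of=user@example.org">Play voicemail</a>'
    "</body></html>"
)

extractor = BaseAwareLinkExtractor()
extractor.feed(SAMPLE)
extractor.close()
print(extractor.links())
```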
I expect that IMAP mail clients correctly reconstruct the full URL as part of properly rendering modern HTML, although I haven't tested that. I don't know if web-based things like GMail do, although it's possible that document base URLs are used often enough in real HTML email that they have to.
(The phish spammer targeting us may have assumed that anyone using GMail or the like was a lost cause anyway, and have aimed at people using desktop or mobile IMAP clients.)