Wandering Thoughts archives

2019-09-25

It's always convenient when malware is clear about its nature (7z edition)

A certain amount of malware these days likes 7z files, or at least things that claim to be 7z files with their file extension. We've been getting a run of malware that claims its file extension is .pdf.7z and that Sophos PureMessage detects as 'CXmail/MalPE-AS', which I suspect means that there's actually a Windows executable in there. We've also got others that are being reported by our attachment logger as simply .7z files (and also, apparently, as MalPE-AS).

All of this caused me to take a look at our logs, where I found some attachments with the listed extension of .exe.7z (which were also detected as MalPE-AS). This is actually quite convenient for us, because we already reject email with .exe attachments. If you're going to helpfully label your attachment as a .exe in some way, well, we'll extend our rejection to cover it too, which we now do.

(We've also decided to reject .pdf.7z attachments. As far as we can tell we don't get any real ones. We're not sure we get any real .7z attachments in general, but rejecting those is currently a little bit more chancy. For various reasons, we will probably be augmenting our attachment logger to try to peer into 7z archives, as it currently does for ZIP and RAR archives.)
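A minimal sketch of this sort of extension check, written as a shell function. The function name and the exact pattern list are assumptions made for illustration; this is not the real mail system configuration, just the rule described above in runnable form.

```shell
# Hypothetical helper: succeeds (exit status 0) if an attachment's
# filename carries one of the extensions we reject outright.
should_reject_attachment() {
    # Lowercase the name so that "INVOICE.PDF.7Z" matches too.
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        *.exe|*.exe.7z|*.pdf.7z) return 0 ;;
        *) return 1 ;;
    esac
}
```

A plain .7z name is deliberately not matched here, since rejecting those is the chancy case.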

As a side note, the reason I said that the .7z attachments were 'apparently' detected as MalPE-AS is that all of the email messages with them actually had two attachments:

application/octet-stream; MIME file ext: .7z
application/msword; MIME file ext: .doc; zip exts: .bin .emf .rels .xml[9] none

PureMessage only gives us a report for the entire message, and it reported that these emails had both CXmail/DocDrp-C and CXmail/MalPE-AS. I suspect that DocDrp is the .doc and MalPE is the .7z, but I don't know for sure.

(This would be another example of malware covering its bases.)

spam/MalwareBeingClear written at 16:26:44

Our workaround for Ubuntu 16.04 and 18.04 failing to reliably reboot some of our servers

A few years ago I wrote about how and why systemd on Ubuntu 16.04 couldn't reliably reboot some of our servers. At the time I finished off the entry by suggesting that we'd live with the intermittent failures that caused some of our systems to hang during reboot attempts, forcing us to go power cycle them. Shortly afterward, we changed our minds and decided to work around the situation by resorting to a bigger hammer. These days we use our bigger hammer on both Ubuntu 16.04 and Ubuntu 18.04; the latter may have improved some aspects of the shutdown situation, but our experience is that it hasn't fixed all of them.

The fundamental problem is that systemd can leave descendant processes running even when it has nominally terminated a systemd service, such as Apache, cron, or Exim. Systemd doesn't kill these lingering processes (or even try to) until very late in shutdown, and in the meantime they can cause a variety of problems with NFS unmounts, turning off swap, or various other portions of system shutdown. To deal with this, we use the big hammer of doing it ourselves; during shutdown, we run a script to kill lingering processes from various service units.

The script has a list of systemd services. For each service, it first looks in the systemd cgroup hierarchy to see if there are still processes associated with the service, by counting how many lines there are in /sys/fs/cgroup/systemd/system.slice/<what>.service/tasks. If there are processes still associated with the service, it kills them with SIGTERM and then SIGKILL (if necessary), using systemd itself to do the work with:

systemctl --kill-who=all --signal=SIG... kill <what>.service

(The actual implementation is slightly more complicated.)
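A minimal sketch of the check-then-kill step, under some assumptions. The function names, the CGROUP_SLICE and GRACE_SECONDS variables, and the five second wait are all invented for illustration; only the cgroup path and the 'systemctl kill' invocation come from the description above, and the real script does more than this.

```shell
# Where systemd's (cgroup v1) hierarchy keeps system services.
# Overridable so the functions can be tried against a scratch directory.
CGROUP_SLICE=${CGROUP_SLICE:-/sys/fs/cgroup/systemd/system.slice}
# Assumed wait between SIGTERM and SIGKILL, in seconds.
GRACE_SECONDS=${GRACE_SECONDS:-5}

# Print how many tasks are still in a service's cgroup (0 if it is gone).
count_tasks() {
    tasks="$CGROUP_SLICE/$1.service/tasks"
    if [ -r "$tasks" ]; then
        wc -l < "$tasks" | tr -d ' '
    else
        echo 0
    fi
}

# SIGTERM whatever is left in the service, then SIGKILL if that wasn't enough.
kill_lingering() {
    svc=$1
    [ "$(count_tasks "$svc")" -gt 0 ] || return 0
    systemctl --kill-who=all --signal=SIGTERM kill "$svc.service"
    sleep "$GRACE_SECONDS"
    [ "$(count_tasks "$svc")" -gt 0 ] || return 0
    systemctl --kill-who=all --signal=SIGKILL kill "$svc.service"
}
```

The script would then call kill_lingering once for each name in its list of services, for example with a loop such as 'for svc in cron apache2 exim4; do kill_lingering "$svc"; done'.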

The script has a bunch of logging to report on whether it had to do anything, what it did, and what the process tree looked like before and after it did various killing (as reported through systemd-cgls, because that will show us what systemd units the stray processes are under).
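The before-and-after logging might look something like the following sketch. The log file path, the function names, and the CGLS_CMD variable are hypothetical; the only thing taken from the description above is that the process tree comes from systemd-cgls.

```shell
# Hypothetical log location; the real script's destination isn't stated.
LOGFILE=${LOGFILE:-/var/log/shutdown-killer.log}
# The command used to dump the process tree, grouped by systemd unit.
CGLS_CMD=${CGLS_CMD:-systemd-cgls --no-pager}

# Append one timestamped line to the log.
log_msg() {
    printf '%s %s\n' "$(date '+%H:%M:%S')" "$*" >> "$LOGFILE"
}

# Record the current process tree under a label such as "before kill".
log_tree() {
    log_msg "process tree $1:"
    # Left unquoted on purpose so the command and its options split apart.
    $CGLS_CMD >> "$LOGFILE" 2>&1
}
```

Calling log_tree once before and once after the killing gives a record of which units the stray processes were under and whether they went away.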

All of this is driven by a systemd .service unit with the following relevant bits:

[Unit]
After=remote-fs.target
Before=cron.service apache2.service exim4.service atd.service slurmd.service

[Service]
Type=oneshot
RemainAfterExit=True
ExecStop=/path/to/script

Systemd stops units in the reverse of the order it starts them, so ordering dependencies work backward at shutdown. We set After so that our stop action is run before NFS unmounting starts, and Before so that the stop action happens after those listed services are shut down. Not all of those services exist and are enabled on all machines, but listing a Before service that isn't enabled is harmless. The Before list is basically 'what has caused us problems'; we add things to it as we run into problem services.

(Slurmd is a recent addition, for example.)

Right now the list of 'before' services is duplicated between the script and the systemd unit. It feels tempting to try to eliminate that, but on the other hand I'm not sure I want to be introspecting systemd too much during shutdown. We could also try to be more general by sniffing around the cgroup hierarchy to find stray processes from any unit we don't whitelist (or at least any unit that's theoretically been shut down). However, that might not be very useful on modern systems, where 'KillMode=control-group' is the default.
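The more general 'sniff around the cgroup hierarchy' idea could be sketched roughly as follows. This is an untested-in-production illustration, not something we run; the function name, the ALLOWED_UNITS variable, and its example contents are assumptions, and it makes no attempt to work out whether a unit has theoretically been shut down.

```shell
# systemd's (cgroup v1) hierarchy for system services; overridable for testing.
CGROUP_SLICE=${CGROUP_SLICE:-/sys/fs/cgroup/systemd/system.slice}
# Hypothetical space-separated list of units that are allowed to linger.
ALLOWED_UNITS=${ALLOWED_UNITS:-"ssh.service systemd-journald.service"}

# Print every service unit that still has tasks and is not on the allowed list.
list_stray_units() {
    for dir in "$CGROUP_SLICE"/*.service; do
        # cgroup files report a size of zero, so look for actual content
        # instead of using a file size test.
        grep -q . "$dir/tasks" 2>/dev/null || continue
        unit=${dir##*/}
        case " $ALLOWED_UNITS " in
            *" $unit "*) continue ;;
        esac
        echo "$unit"
    done
}
```

Each unit it prints could then be logged or handed to 'systemctl kill' in the same way as the explicitly listed services.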

The good news is that the script's logging suggests that it usually doesn't need to do anything during system shutdown on our 18.04 machines. But usually isn't always, which is what prompted the addition of slurmd.service.

Sidebar: A potential alternate approach

Basically this is making these units behave as if they were set to 'KillMode=control-group' during shutdown. You can change systemd unit properties on the fly and only for the current system boot (with 'systemctl --runtime set-property', which we use for our per-user CPU and memory limits), so perhaps it would work to switch to this KillMode on the relevant service units early in the shutdown process.

This option didn't even occur to me until I wrote this entry, and in general it seems more uncertain and chancy than just killing things (even if we're killing things indirectly through systemd). But it'd give you a much smaller and simpler script.

linux/SystemdUbuntuRebootWorkaround written at 00:44:54


