Wandering Thoughts: Recent Entries

Categories: links, linux, programming, python, snark, solaris, spam, sysadmin, tech, unix, web.

2012-02-05

My view on what will kill 'traditional' system administration

Phil Hollenback recently wrote DevOps Is Here Whether You Like It Or Not, in which he writes that traditional system administration is dying. While I sort of agree with him about the death, I don't think it's necessarily for the reasons that Phil points to.

Fundamentally, there has always been a divide between small systems and large systems. Large systems have had to automate and when that automation involved applications, it involved the developers; small systems did not have to automate, and often do not automate because the costs of automation are larger than the costs of doing everything by hand. Moving to virtualization doesn't change that (at least for my sort of system administration, which has always had very little to do with shoving actual physical hardware around); if you have only a few virtualized servers and services, you can perfectly well keep running them by hand and it will probably be easier than learning Chef, Puppet, or CFEngine and then setting up an install.

(If you're future-proofing your career you want to learn Chef or Puppet anyways, so go ahead and use them even in a small environment.)

There are two things that I think will change that, and Phil points to one of them. Heroku is not just a virtualization provider; they are what I'll call a deployment provider, where if you write your application to their API you can simply push it to them without having to configure servers directly. We've seen deployment providers before (eg Google App Engine), but what distinguishes Heroku is how unconstrained and garden variety your API choices are. You don't need to write to special APIs to build a Heroku-ready application; in many cases, if you build an application in a sensible way it's automatically Heroku-ready. This is very enticing to developers because (among other things) it avoids lock-in; if Heroku sucks for you, you can easily take your application elsewhere.

(This has historically not been true of other deployment providers, which makes writing things to, say, the Google App Engine API a very big decision that you have to commit to very early on.)

Deployment providers like Heroku remove traditional system administration entirely. There are no systems or services to configure, and the developers are deeply involved in deployment because a non-developer can't really take a random application and deploy it for the developers. If there is an operations group, it's one that worries about higher level issues such as production environment performance and how to control the movement of code from development to production.

The other thing is general work to reduce the amount of knowledge you need to set up a Chef or Puppet-based environment with certain canned configurations. Right now my impression is that we're still at the stage where someone with experience has to write the initial recipe to configure all N of your servers correctly, and you might as well call that person a sysadmin (ie, they understand Apache config files, package installation on Ubuntu, etc). However it's quite possible that this is going to change over time to the point where we'll see programs shipped with Chef or Puppet recipes to install them in standard setups. At that point you won't need any special knowledge to go from, say, writing a Django-based application to installing it on the virtualization environment of your choice. This really will be the end of developers needing conventional sysadmins in order to get stuff done.

The general issue of the amount of hardware in a small business (and virtualizing the hardware) ties into a larger question of how much hardware the business of the future is going to need or want, but that's a different entry. I will just observe that the number of servers that you need for a given amount of functionality has been steadily shrinking for years.

Sidebar: what virtualization does change now

I think that plain virtualization does mark a sea change today in one way: it moves sysadmins away from a model of upgrading OSes to a model of recreating their customizations on top of a new version of the OS. Possibly it moves away from upgrading software versions in general to 'build new install with new software versions from scratch, then configure'.

This is partly because the common virtualization model is 'provide base OS version X image, you customize from there' and partly because most virtualization makes it easy to build new server instances. It's much easier to start a new Ubuntu 12.04 image than it is to find a spare server to use as your 12.04 version of whatever.

(Note that virtualization may not make it any easier to replace your Ubuntu 10.04 server with a new 12.04 server; there are a host of low level practical issues that you can still run into unless you already have a sophisticated management environment built up.)

I don't think that this is a huge change for system administration, partly because this is pretty much how we've been doing things here for years. We basically never upgrade servers in place; we always build new servers from scratch. Among other things, it's much cleaner and more reproducible that way.

sysadmin/WhatWillKillSysadmin written at 23:56:02

Link: Filenames.WTF

In Filenames.WTF, Daniel Rutter runs down the reasons first why paying attention to file extensions is ridiculous, and then the reasons why it's still the best solution to the problem that we have. Spoiler: it's because people have spent decades creating file formats that suck.

(Via philliph on Twitter.)

links/OnFileExtentsions written at 19:27:58

What five years of PC technology changed for me

This fall I got a new home machine, just a bit over five years after I got my previous home machine. It happens that I saved the invoice for my five year old machine, so I dug it out today in order to do a comparison about what five years of progress in PC technology did and didn't change for me.

First off, the progress of five years got me much better prices. My recent home machine cost me only about 60% of what my old home machine did. By itself, this is pretty impressive. Apart from that, running down the major components:

  • CPU: AMD dual core versus much faster Intel quad core. The Intel CPU was cheaper but not by a substantial amount; I think the AMD was probably closer to the high end at the time. I don't know what the benchmark results are, but I got a substantial performance improvement.

  • RAM: This is perhaps the most striking change on a purely numerical level; in 2006 I got 2 GB of RAM for more than twice as much as what 16 GB of RAM cost me in 2011. Even in 2006, 2 GB was clearly economizing (I remember debating with myself over 2 GB versus the extra money for 4 GB and deciding that 2 GB should be good enough). In 2011, 16 GB is as much as the motherboard will take with current DIMM densities.

    In short, desktop RAM has become stupid cheap.

    (One index of the change is that in 2006, the 2 GB of RAM cost more than the CPU and was the most expensive single component. In 2011, the 16 GB cost only a bit over half of the CPU.)

  • motherboard: the modern era features more SATA, less IDE, more USB, and not even one external serial port. Motherboards are unexciting. Even in 2006 the motherboard had onboard sound and gigabit Ethernet. The 2011 motherboard probably has better onboard sound, but in practice this doesn't matter to me; my sound needs are modest.

    (The 2006 motherboard was a bit cheaper than the 2011 motherboard, but neither was a particularly expensive or advanced one.)

  • Hard drives changed only moderately at one level; in 2006 I got 320 GB drives for somewhat over twice what 2011's 500 GB drives cost me. In 2011, 500 GB drives are nowhere near state of the art; in 2006, 320 GB drives were not that far out of it.

    (This was before the floods in Thailand.)

    On another level, they changed a lot. The 320 GB hard drives of 2006 were my only storage. The 500 GB drives of 2011 are only for the operating system; my data lives on a pair of 1.5 TB drives (that I had upgraded to some time ago). 500 GB is way overkill for the OS, but there's no real point in using drives that are any smaller; it's not like I'd have saved any significant amount of money.

  • Video card: ATI X800 GT versus ATI HD 5450 with double the memory for less than a third of the price. Tom's Hardware theoretically puts these two cards in almost the same performance category, although I'm not sure that's really true. In practice, what happened between 2006 and 2011 is that graphics cards shifted to the point where a basic passively cooled card was clearly more than good enough for what I was doing, even for driving dual displays digitally.

    (I don't yet have dual displays at home, but I do at work and my work machine uses the same card. In fact, my work machine is now a clone of my home machine, just as it was in 2006.)

  • optical drives: in 2006 a DVD burner cost about four times what it did in 2011, and I thought I would listen to CDs enough to justify having a separate CD/DVD reader (rather than put wear and tear on an expensive burner).

    (I was wrong; my CD listening had already dropped off a cliff in early 2006 and never recovered. I still kind of miss that sometimes.)

  • Power supply: in 2006 I didn't trust the power supply that came with the case to really be a good solid one that delivered enough power so I bought a separate one as well. In 2011 I couldn't find any reason to worry about it so I didn't; the power supply you get with a decent quiet case these days is going to be quite good, more than you need (for a PC like the kind I build), and efficient.

In 2006, the most expensive components were the RAM, the CPU, the two hard drives together, and then the video card. In 2011, the most expensive components were the CPU, the motherboard, and the case (more or less tying with the RAM). Another way to put it is that in 2011, the video card, the DVD burner, the hard drives, and pretty much the RAM were all what I considered trivial expenses in the overall machine.

tech/FiveYearsPCChanges written at 02:28:22

2012-02-03

Understanding a subtle Twitter feature

One part of getting on Twitter has been following people, which led me to discover that when you follow someone Twitter doesn't show you all of their public tweets. To summarize what I think is the rule, Twitter excludes any conversations they're having that purely involve other people you don't also follow. Their tweets in the conversation will appear in their public timeline, but not in your view of their tweets.

(This may only apply to relatively new Twitter accounts, or even only to some of them. I've seen Twitter give two different interfaces to two new accounts.)
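To make my guess at the rule concrete, here is a minimal Python sketch of it. This is only my reading of the observed behaviour, not anything Twitter has documented; the `author` and `reply_to` fields and the function name are made up for illustration.

```python
def visible_in_timeline(tweet, following):
    """Guess at the rule: show a followed person's tweet unless it is a reply
    to someone you don't also follow."""
    if tweet["author"] not in following:
        return False
    reply_to = tweet.get("reply_to")   # None for a general (non-conversation) tweet
    return reply_to is None or reply_to in following


following = {"alice", "bob"}
tweets = [
    {"author": "alice", "text": "a general remark"},
    {"author": "alice", "reply_to": "bob", "text": "talking to bob"},
    {"author": "alice", "reply_to": "carol", "text": "talking to carol"},
]
shown = [t["text"] for t in tweets if visible_in_timeline(t, following)]
print(shown)
```

Following carol as well would make the third tweet show up, which is the 'ramp up by following more people' effect.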

On the one hand, when I discovered this I was infuriated. If you really did want to see everything (for example, so you could find other people to follow based on who your initial people had interesting conversations with), this made having a Twitter account worse than just perusing the Twitter pages of interesting people.

On the other hand, once I thought about it more I've come to reluctantly admire Twitter's trick with this feature. What it is, from my perspective, is a clever way to reduce the volume impact of following someone and thus make doing so less risky. Without it, following someone would immediately expose you to both their general remarks and to the full flow of whatever conversations they have. With Twitter's way, you are only initially exposed to people's general remarks; you ramp up your exposure to their conversations by following more people, and ramp it down by the reverse.

My feeling is that exposure to an overwhelming firehose of updates is the general problem of social networking. Social networks usually want you to be active and to follow lots of people. But if those people are themselves active, the more people you follow the more volume descends on you, and it's especially bad when you follow very socially active users, the ones having a lot of conversations. This creates a disincentive to follow people and pushes you to scale back. Twitter has this especially badly because it has no separate 'comment' mechanism (comments are important for reducing volume). Twitter's trick here is thus a clever way to reduce the firehose in a natural way that doesn't require user intervention and tuning; you could see it as a way of recreating something like comments in a system that doesn't naturally have them.

Once I realized this, it's certainly been working the way that Twitter probably intended. When I'm considering whether or not to follow someone I don't really look at the volume of their tweets in general; I mostly look just at the volume of their non-conversation tweets, because those are the only ones that I'm going to see. Often this makes me more willing to follow people (and thereby furthers Twitter's overall goal of getting me more engaged with their service).

tech/TwitterVolumeLimit written at 22:48:37

Understanding Resident Set Size and the RSS problem on modern Unixes

On a modern Unix system with all sorts of memory sharing between processes, Resident Set Size is a hard thing to explain; I resorted to a very technical description in my entry on Linux memory stats. To actually understand RSS, let's back up and imagine a hypothetical old system that has no memory sharing between processes at all; each page of RAM is either free or in use by exactly one process.

(We'll ignore the RAM the operating system itself uses. In old Unixes, this was an acceptable simplification; memory was statically divided between memory used by the OS and memory used by user programs.)

In this system, processes acquire new pages of RAM by trying to access them and then either having them allocated or having them paged (back) in from disk. Meanwhile, the kernel is running around trying to free up memory, generally using some approximation of finding the least recently used page of RAM. How aggressively the operating system tries to reclaim pages depends on how much free memory it has; the less free memory, the faster the OS tries to grab pages back. In this environment, the resident set size of a process is how many pages of RAM it has. If the system is not thrashing, ie if there's enough memory to go around, a process's RSS is how much RAM it actually needs in order to work at its current pace.

(All of this is standard material from an operating system course.)
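As a rough illustration of this old no-sharing model, here is a toy Python sketch. It assumes strict LRU and only reclaims a page when a fault finds memory full, which is much cruder than what real kernels do; the class and method names are invented for the example.

```python
from collections import OrderedDict


class ToyMemory:
    """A toy machine with no sharing: each frame belongs to at most one process."""

    def __init__(self, total_frames):
        self.total_frames = total_frames
        # Keys are (pid, virtual page), ordered from least to most recently used.
        self.resident = OrderedDict()

    def touch(self, pid, vpage):
        """Process `pid` accesses `vpage`, faulting it in if necessary."""
        key = (pid, vpage)
        if key in self.resident:
            self.resident.move_to_end(key)       # now the most recently used page
            return
        if len(self.resident) >= self.total_frames:
            self.resident.popitem(last=False)    # reclaim the least recently used page
        self.resident[key] = None

    def rss(self, pid):
        """Resident set size: how many frames `pid` currently holds."""
        return sum(1 for owner, _ in self.resident if owner == pid)


mem = ToyMemory(total_frames=4)
for page in range(3):
    mem.touch("A", page)      # A touches three pages
mem.touch("B", 0)
mem.touch("B", 1)             # memory is full, so A's oldest page is reclaimed
print(mem.rss("A"), mem.rss("B"))
```

Here every resident page counts toward exactly one process, so the RSS numbers add up to the RAM in use. That is the property that sharing breaks.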

The problem of RSS on modern Unix systems is how to adapt this model to an environment where processes share significant amounts of memory with each other. In the face of a lot of sharing, what does it mean for a process to have a resident set size and how do you find the right pages to free up?

There are at least two approaches the kernel can take to reclaiming pages, which we can call the 'physical' and 'process' approaches. In the physical approach the kernel continues to scan over physical RAM to identify candidate pages to be freed up; when it finds one, it takes it away from all of the processes using it at once (this is the 'global' removal of my earlier entry). In the process approach the kernel scans each process more or less independently, finding candidate pages and removing them only from the process (a 'local' removal); only once a candidate page has been removed from all processes using it is it actually freed up.

(Scanning each 'process' is a simplification. Really the kernel scans each separate set of page tables; there are situations where multiple processes share a single set of page tables.)

The problem with the process approach is that the kernel can spend a great deal of time removing pages from processes when the pages will never actually be reclaimed for real. Imagine two processes with a shared memory area; one process uses it actively and one process only uses it slowly. The kernel can spend all the time it wants removing pages of the shared area from the less active process without ever actually getting any RAM back, because the active process is keeping all of those pages in RAM anyways.
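The two-process example can be sketched in a few lines of Python. This is a deliberately simplified model (a page is just a set of the processes mapping it, and the names are invented), not how any real kernel represents things.

```python
class SharedPage:
    """One frame of physical RAM plus the set of processes currently mapping it."""

    def __init__(self, name):
        self.name = name
        self.mappers = set()


def process_reclaim(page, pid):
    """'Process' approach: unmap the page from one process only (a local removal).

    Returns True only if that was the last mapping, ie the frame is really freed."""
    page.mappers.discard(pid)
    return not page.mappers


def physical_reclaim(page):
    """'Physical' approach: unmap the page from every process at once (a global removal)."""
    page.mappers.clear()
    return True


# A four-page shared area mapped by an active process and a slow one.
pages = [SharedPage(f"shm{i}") for i in range(4)]
for p in pages:
    p.mappers.update({"active", "slow"})

# Scanning only the slow process does work but frees nothing,
# because the active process still maps every page.
freed_by_process_scan = sum(process_reclaim(p, "slow") for p in pages)
freed_by_physical_scan = sum(physical_reclaim(p) for p in pages)
print(freed_by_process_scan, freed_by_physical_scan)
```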

So, why doesn't everyone use the physical approach? My understanding is that the problem with the physical approach is that it is often not a good fit for how the hardware manages virtual memory activity information. Per my earlier entry, every process mapping a shared page of RAM can have a different page table entry for it. To find out if the page of RAM has been accessed recently you may have to find and look at all of those PTEs (with locking), and do so for every page of physical RAM you look at.
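A hedged sketch of that cost, again as a toy Python model: the `PTE` and `Frame` classes are invented stand-ins, there is no locking, and the 'accessed' flag is just a boolean rather than a hardware bit.

```python
from dataclasses import dataclass, field


@dataclass
class PTE:
    """One process's page table entry for a frame (toy version)."""
    pid: str
    accessed: bool = False


@dataclass
class Frame:
    """A physical page plus every PTE that maps it."""
    ptes: list = field(default_factory=list)


def recently_used(frame):
    """Check one frame the 'physical' way.

    Returns (was it accessed by anyone, how many PTEs had to be examined)."""
    used = False
    examined = 0
    for pte in frame.ptes:
        examined += 1
        if pte.accessed:
            used = True
            pte.accessed = False   # clear it so the next scan sees fresh information
    return used, examined


frame = Frame(ptes=[PTE("A"), PTE("B", accessed=True), PTE("C")])
first = recently_used(frame)
second = recently_used(frame)
print(first, second)
```

The point is the second number: the work per physical page grows with the number of processes sharing it.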

My impression is that most current Unixes normally use per-process scanning, perhaps falling back on physical scanning if memory pressure gets sufficiently severe.

(I suspect and hope that virtual memory management in the face of shared pages has been studied academically, just as the older and simpler model of virtual memory has been, but I'm out of contact with OS research.)

unix/UnderstandingRSS written at 02:11:13

2012-02-01

A ZFS pool scrub wish: suspending scrubs

Like sensible people, we scrub our pools periodically in order to turn up latent problems. Because pool scrubs have a visible impact on responsiveness (at least in the lightly patched Solaris 10 update 8 that we're running), we only run scrubs on weekends (and only scrub one pool per fileserver). However, we've recently started running into problems where pool scrubs slow the fileservers down enough that backups have started failing.

The obvious way around this is to switch things to only doing scrubs when backups aren't running. Except there's a problem: we run backups every day, they run for a fairly long time every day, and some of our pools take up to fifteen hours to scrub. If we only scrub when backups aren't running, there just isn't a fifteen hour gap that our biggest pools need.

(It's possible that they would scrub somewhat faster if they never overlapped with backups, but that's only a vague possibility. And as the pools get more data, they'll take longer and longer to scrub.)

Which brings me to my wish: I wish you could suspend ZFS pool scrubs. Not stop them and start them again from the start, but just put one to sleep by telling the pool to remember where the scrub was but do no further scrub IO for now, then later resume the scrub from where it left off. This would allow us to do even big scrubs around the backups, and in fact we could schedule scrubs much more liberally than we do right now. For example, we might have a couple of hours in a weekday early morning after backups have finished that we could use to get some scrubbing in.

(I'd be perfectly happy if this was only an in-memory pause, so that if you rebooted your system or exported the pool you lost it and had to start from scratch. As an in-memory pause it ought to be relatively simple to implement.)
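To show what I mean by an in-memory pause, here is a minimal Python sketch of a scan that remembers its position. It is purely illustrative: this is not ZFS code and not an existing zpool feature, and the `ToyScrub` class and its methods are invented for the example.

```python
class ToyScrub:
    """An in-memory, pausable walk over a pool's blocks (purely illustrative)."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.cursor = 0          # index of the next block to verify
        self.paused = False

    def suspend(self):
        self.paused = True       # remember the cursor, do no further scrub IO

    def resume(self):
        self.paused = False

    def run(self, budget):
        """Verify up to `budget` blocks; return how many were actually checked."""
        done = 0
        while not self.paused and done < budget and self.cursor < len(self.blocks):
            _ = self.blocks[self.cursor]   # stand-in for reading and checksumming
            self.cursor += 1
            done += 1
        return done

    @property
    def finished(self):
        return self.cursor >= len(self.blocks)


scrub = ToyScrub(blocks=list(range(10)))
scrub.run(budget=4)          # scrub during a quiet window
scrub.suspend()              # backups start
idle = scrub.run(budget=4)   # nothing happens while suspended
scrub.resume()               # backups finished: pick up where we left off
scrub.run(budget=100)
print(idle, scrub.cursor, scrub.finished)
```

Since the cursor lives only in memory, a reboot or a pool export would lose it and the scrub would have to start over, which is the tradeoff I said I'd accept.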

PS: I checked and this doesn't seem to be in Illumos, at least based on the current Illumos zpool manpage.

solaris/ZFSScrubWish written at 11:38:13

2012-01-31

The solution to the modern X font handling mystery

I wrote last time about my attempts to work out just why xterm was rendering the same font differently on Ubuntu and Fedora. Thanks to comments from Adam Sampson and some additional digging, I now have an answer and some theories. As it happens, the answer illuminates yet more issues with modern X font handling.

In the modern Xft/FreeType/Fontconfig world, fonts are specified more or less as a font name and a size. With most programs that allow explicit specification of the font name you can augment the name with additional attributes, partly to modify the exact font that gets matched and partly to control how it's rendered. All of this is sort of covered in the fontconfig user documentation.

(An example could be 'DejaVu Sans Mono:style=bold:hintstyle=hintslight'. This shows both a modification of the font selection process and a rendering instruction. A similar sort of syntax can be used if you want to find, eg, all of the monospace fonts on the system.)
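To show how such a name breaks down, here is a toy Python parser for the 'family:name=value:...' shape. It is an illustration only and not fontconfig's real parser; it ignores escaping, multiple comma-separated families, and anything else the real syntax allows.

```python
def parse_pattern(pattern):
    """Split a simple fontconfig-style font name into (family, size, attributes)."""
    family_part, *props = pattern.split(":")
    family, _, size = family_part.partition("-")   # an optional '-size' suffix
    attrs = {}
    for prop in props:
        name, _, value = prop.partition("=")
        attrs[name] = value
    return family, size or None, attrs


family, size, attrs = parse_pattern("DejaVu Sans Mono:style=bold:hintstyle=hintslight")
print(family, size, attrs)
```

In the example, 'style' affects which font gets matched while 'hintstyle' is a rendering instruction, but syntactically they look identical.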

Fontconfig also has system-wide configuration files, found in /etc/fonts/conf.d/. In most packages that I'm familiar with, the global configuration is a default and explicit specifications override it. However, this is not the case for fontconfig; at least for some settings, fontconfig's global settings silently override anything you specify explicitly. The only way to override these settings yourself is to have a $HOME/.fonts.conf file (and you can't unset the settings so that you can pick them on the fly, only set them to whatever personal global value you want).

You can probably guess the rest of the story. As spotted by Adam Sampson, Ubuntu's fontconfig package has a global config file that explicitly forces hinting to be set to hintslight, while Fedora has no config file and is defaulting to hintfull. Because this is set in a global config file you can't override it on the xterm command line, which fooled me into thinking that this setting wasn't the culprit.

(You can include ':hintstyle=hint<whatever>' in a -fa argument all you want, but it is silently ignored.)

Overriding that (with a personal .fonts.conf file that forces hintfull hinting) got Ubuntu rendering to be almost the same as Fedora rendering. The remaining difference turns out to be due to the specific versions and compilation options of my version of FreeType. Interestingly, this is not just a small visual difference; at least under some circumstances the Ubuntu FreeType library renders DejaVu Sans Mono characters a pixel or so taller than my Fedora FreeType library does, meaning that an 80x50 xterm on Ubuntu is visibly taller than a Fedora 80x50 xterm. (They are both the same width.)
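For illustration, here is a small Python sketch that generates the sort of personal .fonts.conf override I mean. The XML follows the usual fontconfig match/edit form, but treat it as an approximation rather than my exact file; the function name is made up, and the script only prints the result instead of writing it to $HOME/.fonts.conf.

```python
import xml.etree.ElementTree as ET


def hintstyle_override(style="hintfull"):
    """Build a minimal fontconfig file that assigns one hintstyle to all fonts."""
    root = ET.Element("fontconfig")
    match = ET.SubElement(root, "match", target="font")
    edit = ET.SubElement(match, "edit", name="hintstyle", mode="assign")
    ET.SubElement(edit, "const").text = style
    header = '<?xml version="1.0"?>\n<!DOCTYPE fontconfig SYSTEM "fonts.dtd">\n'
    return header + ET.tostring(root, encoding="unicode") + "\n"


conf_text = hintstyle_override()
print(conf_text)
```

Note that this sets one personal global value; as mentioned, there is no way to 'unset' the system setting so that a per-program hintstyle wins.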

I don't know for sure why gnome-terminal, Firefox, and TK applications were unaffected by this, but my theory is that all of them use the Gnome preferences system. Gnome has its own preferences settings for how to render fonts and these appear to completely override fontconfig's views on the subject, so Gnome applications were using the 'right' hinting style for my tastes. I would have probably seen the same rendering of DejaVu Sans Mono in any other Gnome application that used it as the monospace font (a good example is probably gedit).

(Why this happened for some fonts and not for others presumably has to do with how the fonts were hinted, or maybe some fonts specify that they can only be hinted at some levels. I don't know if this means that the fonts that weren't affected are less hinted than DejaVu Sans Mono and so on, or just hinted differently.)

linux/ModernXFontDrawbackIII written at 21:52:59

Where is Oracle going with Solaris?

(Disclaimer: rambling ahead.)

Once upon a time, back when Sun was still Sun, it was possible to kind of see what they thought the future market for Solaris was. Solaris wasn't Linux, but they could load it with attractive features (ZFS, DTrace, arguably Zones, etc) to make up for being not-Linux and then sell it for a relatively low price to hook the low end of the market. Arguably Sun skipped the bit where they upsold to more lucrative services later.

(In this view, the free Linux distributions serve as a valuable initial hook for higher end commercial Linuxes like Red Hat Enterprise. A small company is unlikely to buy RHEL right away; instead they can progressively move closer, first with Debian or Ubuntu, then with CentOS, and finally they start paying Red Hat when they get tired of the alternatives. Since very few people were going to jump from a Linux to Solaris, Solaris needed a similar entry-level hook.)

Then Oracle took over Solaris and now I don't understand how they see its future. The initial moves were straightforward: Oracle drastically raised prices and effectively drastically reduced hardware availability. Then of course they killed off other features that made Solaris attractive, like source availability. As far as I can see this took out the bottom end of the Solaris market entirely.

(It's hard to find current pricing for Solaris on non-Oracle hardware. The best I could find on Oracle's own website was $1k per core per year; it's not clear if you can get a better deal through either Dell or HP, which were at one point theoretically reselling Solaris on their own hardware. I couldn't configure a low-end 1U Dell server with Solaris, for what that's worth.)

One possible answer is that Oracle has no real plans for Solaris's future. In this view, they're treating it as a declining asset and milking it to get as much money as possible from those people who have to have Solaris. As the ranks of those people dwindle, Solaris itself will dwindle away with them. Eventually Oracle will politely sunset it and no one will really care. In this view, the relatively high prices for Solaris (and the outrageously high ones for non-Oracle hardware) are somewhat deliberately designed to discourage new customers; the last thing Oracle wants is for Solaris to actually get popular, because then Oracle would have to start spending real money on it.

Another possible answer is that Oracle thinks that Solaris has a viable future on big iron but not on low end hardware. I'm a professional skeptic about big iron in general, so I'm not well placed to evaluate how realistic this is. I think you can make a case that big iron customers are mostly insensitive to both the exact operating system (they care about the apps, which are often layered on top of a database to start with) and the licensing costs, but will value various (theoretical) Solaris virtues like resilience and inspectability with DTrace (especially if Oracle integrates DTrace support into their database products). On the other hand they do care about TCO (and there can be a lot of money involved in that TCO with big iron and Solaris licensing) and I'm not sure Oracle has a good sales pitch for Solaris against the relentless march of cheaper Linuxes.

(I'm not persuaded by the variant of this where Solaris is supposed to be the true home of Oracle's database software, because it requires customers to either like or be neutral to Solaris and its increased costs. If everyone wants to run Oracle on RHEL, it's hard to make Solaris Oracle's true home.)

All of this is mostly but not entirely academic to me, since it seems clear that we have too little money to interest Oracle. Still, I just can't stop wondering; there was a time when Solaris looked like it had a place in the general Unix future.

(You can argue that Solaris still does, in the form of Illumos and distributions using it. Especially as apparently a whole lot of the Sun technical people have left Oracle and settled at various other places that are working on Illumos; this makes Illumos the technical future of Solaris, and the technical future is the interesting one.)

PS: I would probably be better informed about the speculation on this if I actually followed Solaris news. I don't, because it seems very unlikely that any Solaris news is going to affect us; Oracle would have to perform one of the world's most spectacular sudden reverses in order to be relevant to us again.

solaris/OracleSolarisFuture written at 00:17:09

2012-01-30

HTML is not a SGML dialect and never really has been

There is a persistent story that makes the rounds in the web specification world (for example, in this otherwise realistic article on XHTML) that HTML is an SGML dialect but web browsers persistently mishandle and mis-parse certain SGML features such as minimization. Although I have pandered to this belief before, it is false in practice and in reality.

HTML is really a documentation standard; the standard followed behind existing practice rather than preceding it. In the very beginning, people just created browsers and a vague format that the browsers understood. This format was inspired by SGML, but it was never an SGML dialect and as such it never had various obscure SGML features. At some point, when people in the W3C were writing down the HTML standard of the time (or perhaps evolving it), they decided to 'fix' this obvious omission by writing into the new version of the HTML specification that it was an SGML dialect.

(Looking at the historical specifications via wikipedia, this appears to go as far back as HTML 2.0.)

You can guess what happened next. All of the browsers of the time promptly ignored this new bit of the standard, and pretty much every browser written since then has as well; none of them ever parsed HTML as SGML, supporting all of the little odd SGML features that that implies. HTML may be an SGML dialect as far as the W3C standards and their validator are concerned, but it is not in real life and anyone who writes HTML believing otherwise is going to have problems.
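One small way to see the gap is with a practical HTML parser. The sketch below uses Python's standard html.parser as a stand-in for 'HTML as actually parsed'; it is not a browser, and the SGML reading I contrast it with (where, under the SHORTTAG minimization rules, the '/' in '<br/' already ends the tag and the '>' becomes literal text) is my understanding of the implementation notes in the HTML 4.01 specification rather than something I have tested against a real SGML parser.

```python
from html.parser import HTMLParser


class EventLog(HTMLParser):
    """Record the tags and text that the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_startendtag(self, tag, attrs):
        self.events.append(("empty", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


log = EventLog()
log.feed("<br/>text")
log.close()
print(log.events)
```

The parser reports an empty 'br' element followed by 'text'; no stray '>' ever appears in the data, which is the same thing everyone writing HTML has always relied on.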

As you might expect, HTML5 very firmly puts a stake in this particular issue; the current spec draft says explicitly (emphasis mine):

For compatibility with existing content and prior specifications, this specification describes two authoring formats: one based on XML (referred to as the XHTML syntax), and one using a custom format inspired by SGML (referred to as the HTML syntax).

Perhaps someday all of the common HTML validators will be updated to understand HTML as it really is.

web/HTMLAndSGML written at 15:33:28

2012-01-29

Dealing with Fitts' Law on widescreen displays

One of the usual sayings derived from Fitts' Law is that four of the five easiest locations to reach with the mouse are the four corners of the screen, because they require very little precision (the edges trap the mouse and guide it into the corner). Over the years I've made some modifications to my desktop environment to make better use of this principle. The most important one is how I use the top left corner; I have my taskbar equivalent arranged so that when an iconified terminal window gets output, I can just zoom my mouse to that corner and click in order to reveal the terminal window.

Zooming to a corner is a fast operation in most setups; it works fine on a single monitor, even a single widescreen monitor, and on a normal dual-monitor setup such as my work desktop. But recently (for reasons beyond the scope of this blog) my work setup got updated to dual widescreen monitors, which revealed two problems with my application of Fitts' Law in this environment.

The first problem is that the sheer number of side to side pixels in a pair of 1920x1200 LCD panels seems to be a bit too many to easily zoom a mouse across. My mouse pointer generally winds up in the middle of the right hand display; getting it to the top left corner of the left display was no longer anything like a little flick of the wrist. The second problem is that the top left corner was sufficiently physically far off to the side that it was no longer an easy casual action to glance at it to see if there was anything with new output that I needed to deiconify; I was less glancing off a bit and more peering off into the distance.

(I had my old dual displays relatively flat against each other, but I think that I probably need to move the new displays into a much more pronounced V shape.)
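To put rough numbers on the distance problem, here is a small Python sketch using the common Shannon formulation of Fitts' index of difficulty, log2(D/W + 1). The 24-pixel target width and the starting positions are assumptions picked for illustration, and this ignores the way screen edges effectively widen a corner target.

```python
import math


def index_of_difficulty(distance, width):
    """Shannon formulation of Fitts' index of difficulty, in bits."""
    return math.log2(distance / width + 1)


TARGET_WIDTH = 24   # assumed target size in pixels

# From the middle of a single 1920-pixel-wide display to its left edge.
single = index_of_difficulty(960, TARGET_WIDTH)
# From the middle of the right-hand display to the left edge of the left one.
dual = index_of_difficulty(1920 + 960, TARGET_WIDTH)
print(round(single, 2), round(dual, 2))
```

Tripling the distance adds roughly one and a half bits of difficulty in this model, which matches the feeling that the move stopped being a casual flick of the wrist.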

My current solution to this issue exploits Fitts' Law once again. The often-overlooked fifth easy to reach location is 'where the mouse is right now', or failing that 'some large area very near where the mouse is'. So I've created a new mouse button binding for my window manager; if the mouse is over the root window, hitting the left button with Shift+Control now de-iconifies the (alphabetically) first terminal window. My mouse is frequently parked over the root window and when it's not there's generally an exposed patch of the root window close to it.

(Technically the binding toggles the window's iconified state, which means that I can flip the first window back and forth from iconified to not. This is a great way to fidget.)
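(The logic behind the binding is simple enough to sketch in a few lines. This is an illustrative Python model, not my actual window manager configuration; the `Window` class, its fields, and `toggle_first_terminal` are all hypothetical names invented for the example.)

```python
from dataclasses import dataclass


@dataclass
class Window:
    # Hypothetical stand-in for whatever the window manager tracks.
    name: str
    is_terminal: bool
    iconified: bool


def toggle_first_terminal(windows):
    """Flip the iconified state of the alphabetically first terminal window.

    Returns the window that was toggled, or None if there are no terminals.
    """
    terminals = [w for w in windows if w.is_terminal]
    if not terminals:
        return None
    first = min(terminals, key=lambda w: w.name)
    first.iconified = not first.iconified
    return first


windows = [
    Window("xterm-b", is_terminal=True, iconified=True),
    Window("browser", is_terminal=False, iconified=False),
    Window("xterm-a", is_terminal=True, iconified=True),
]
toggle_first_terminal(windows)  # de-iconifies xterm-a
toggle_first_terminal(windows)  # iconifies it again
```

Because the operation is a toggle and not a one-way de-iconify, calling it twice leaves everything the way it started.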

To deal with the 'too far to look' issue and to make things in my terminal windows taskbar easier to reach in general, I've repositioned it so that it's at the top left corner of my second (right) display; this puts it more or less in the center of my overall workspace and makes it easier to both reach and look at. I don't think this move away from a screen corner is a loss for Fitts' Law because everything except the first window already had to be targeted carefully.

Of course, now I just have to train myself out of a years-long habit of reflexively looking at and heading for the top left of the left display. This shouldn't take too long, right?

(What I'd really like to do is duplicate my taskbar equivalent in the top left of both displays. Unfortunately this isn't possible right now with my window manager.)

PS: I experimented briefly with increasing the mouse acceleration (which would make everything effectively closer) but didn't like the effects it had on my ability to target things with the mouse in general; I kept overshooting and missing stuff. Possibly I would have acclimatized with time and I just gave up too soon.

sysadmin/WidescreensAndFittsLaw written at 21:13:37

Thinking about spam rejection and abuse addresses

Somewhat recently we got a spate of spam messages to our abuse address, which set me to thinking about the mostly theoretical issue of how to treat email to it.

(It's a mostly theoretical issue for us because the volume of spam and other email to our abuse address is very low in general, so we're not at all likely to change anything about it.)

On the one hand, visible spam rejection of email to abuse addresses is one of the things that really gets on people's nerves; it's famous for rejecting real spam complaints because, of course, they contain spam. Your spam, that people are trying to complain about.

On the other hand, email to abuse is going to go through our spam scoring system and get tagged if the system thinks it's spam. Pretty much everyone here either discards spam-tagged email outright or filters it to a separate folder. My mail filtering deliberately excludes email to abuse (among a few other things), but I don't know if anyone else either bothered or even thought of it; it's not necessarily something that comes to mind when you're setting up personal email filtering.

And finally, I can't think of any genuine email to our abuse address that we've gotten in the last five years or so (since I moved here). It's all been spam. So as a practical matter, any filtering or rejection that we do on abuse email is unlikely to affect real complaints, because we don't get real complaints (hopefully because our users and machines don't generate spam, as opposed to people just not complaining about it).

(The other aspect of email to our abuse address is that I suspect most people are going to complain to the central university-wide abuse address instead of abuse at our specific subdomain. The central people will then get in touch with us through our internal contact address, not our abuse address.)

This is of course a specific instance of the general spam rejection versus spam filtering dilemma. If you reject email people at least know; if you filter, there's at least a theoretical chance that you'll recover from filtering mistakes. The stakes are higher for the abuse address because it is one of the addresses that has a very high chance of false positives (non-spam classified as spam).

The most pragmatic thing to do in a situation like this is to apply spam-filtering to your abuse address. This blackholes real spam to keep it from bothering people while carefully not saying anything to real senders who had their messages misclassified. But this pragmatism sort of bothers me because it's lying to real senders just to pacify them (their email is being ignored either way but you're deliberately doing it silently so they don't know). It would be more honest to use spam rejection on the abuse address, and it might do some good to reduce the level of spam. If legitimate email to your abuse address really is vanishingly rare, it also shouldn't affect very many people.

So what's the right answer? I have no idea.

(My current approach of exempting the abuse address from my personal filtering would not be viable if it got a lot of spam. At that point I would probably remove the exemption and let spam-tagged email to the abuse address get quietly filtered away, mostly because it's easier than trying to persuade everyone that maybe we should do spam rejection for email to abuse.)
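(To make the three options concrete, here is a minimal sketch of the decision as a Python function. It is not how our mail system is actually configured; the function name, the policy strings, and the idea that everything hinges on a single spam tag are all simplifying assumptions for illustration.)

```python
ABUSE_POLICIES = ("reject", "filter", "exempt")


def disposition(local_part, tagged_as_spam, abuse_policy):
    """Decide what happens to one incoming message.

    abuse_policy only matters for spam-tagged mail to the abuse address:
      "reject" - refuse the message so the sender at least knows
      "filter" - quietly file it away like any other spam-tagged mail
      "exempt" - deliver it to a human despite the spam tag
    """
    if abuse_policy not in ABUSE_POLICIES:
        raise ValueError(f"unknown abuse policy: {abuse_policy!r}")
    if not tagged_as_spam:
        return "deliver"
    if local_part.lower() != "abuse":
        # Ordinary spam-tagged mail is assumed to be filtered here.
        return "filter"
    if abuse_policy == "reject":
        return "reject"
    if abuse_policy == "exempt":
        return "deliver"
    return "filter"
```

With the "filter" policy a misclassified real complaint vanishes without the sender ever finding out, with "reject" the sender gets an error they can see, and with "exempt" someone has to read whatever spam arrives.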

spam/AbuseRejection written at 02:24:38

This is part of CSpace, and is written by ChrisSiebenmann.