Categories: links, linux, programming, python, snark, solaris, spam, sysadmin, tech, unix, web.
2009-11-20 Spam and the attraction of reach

Here is a thesis: the larger or more standardized the environment for sending messages, the more spam you should expect to get in or through it. Accordingly, email is heavily abused because it is hugely standardized.

The spammer's motivation for abusing larger, standardized environments is obvious; the larger the environment, the more people you can reach with a single technique, approach, or system. Larger environments have better return on effort, since generally (but not always) most of the effort in spamming in an environment is figuring out how to do it well.

(This ties in to how spammers are lazy but not stupid, at least not in the aggregate.)

This is depressing because it implies that any well used service that allows push messages is going to have spam no matter what you do. If you build such a service or protocol and it gets popular, you'll get spam.

(In fact, degree of spam is not a bad metric for degree of popularity. And if the spammers abandon you, well, worry.)

It is tempting to say that one important way to discourage spammers is to shift the relative costs so that as much effort as possible is per-message effort; if nothing else, this might make you less attractive than the next target. However, I think that the general history of people's anti-spam efforts in new systems shows that this ultimately doesn't work; if you're attractive enough for regular users, you're easy enough for spammers.

(See also DeterringAbuseProblem on this general issue.)
2009-11-19 The corollary for effective anti-spam heuristics

Last time I mentioned that spammers were perfectly capable of adapting their practices to defeat anti-spam heuristics like requiring a valid [...]

Since spammers will adapt when it is both useful and possible, a good anti-spam heuristic is some characteristic of the message or of how it is transmitted that the spammer cannot easily change. While people have made various stabs at this in the past (and will no doubt continue to do so in the future), the problem for anti-spam efforts is that such characteristics have been hard to find, partly because spammers have proven to be very ingenious about finding ways to change them.

(For a small example, are anti-spam systems matching on the characteristic phrases of your advance fee frauds in email? No problem, just put your pitches in file attachments. I await with resignation the day the spammers start sending PDFs, not just Word .doc files, since a sufficiently ingenious spammer can make a PDF that is very hard to analyse.)

I am not convinced that it's even theoretically possible to come up with good (under this definition) anti-spam heuristics in any sort of general environment, partly for reasons that run up against the fundamental spam problem.

(While current heuristics are effective, my strong impression is that they are a laboriously maintained and ever-evolving collection of more or less ad-hoc rules. This doesn't necessarily scale, and it's expensive.)
2009-11-18 Universities are open environments

One of the things that's led to the university Internet environment changing (per an earlier entry) is that universities are open environments in general and especially in terms of services. In this they are fundamentally different from companies, which can be much more closed and closeted environments. I think that there are three reasons for this.

First, there is a much different relationship between many people at the university and the university. In a company, everyone 'at' or 'in' the company is working for the company, but in a university the majority of the user base is effectively customers, and this creates significantly different expectations.

Second, one way that these expectations manifest is that a company has much more scope to plead security and secrecy in order to keep services inside its walls. In a company you can assert with a straight face that you have privacy concerns in putting company email on some outside provider. In a university, the students will say 'so what? I don't care'. And in general I think that there is more acceptance of secrecy and security as valid concerns at a company than at a university; at a company they are defaults, while at a university there is at least a theory of transparency and operating in the open.

Finally and I think significantly, universities are open in good part because people are flowing through them all the time. Every year N people show up and N people leave, more or less, and at least in theory these people should be significant users of your services. This constant and significant flow works to destroy any insularity and ignorance about the outside world's progress that might build up in general, and when combined with the relation between students and university creates an environment where you are constantly justifying your services to the next generation of arrivals (whether or not you realize it).

(This degree of turnover is also another strike against claims of secrecy and security. As I've said before, at a university you have to assume that there are plenty of evil people already inside your organization.)

Or in short: the university is open because people keep walking through, bringing in knowledge of the outside (and leaving with knowledge of the university).
2009-11-17 Finally understanding the appeal of 'Interfaces'

I spent a long time not really getting the need for coded, explicit implementations of 'Interfaces', by which I mean things like zope.interface. It didn't help that I generally encountered them as part of very large, complex systems like Zope and Twisted, and they tended to come with a lot of extra magic features, which made the whole idea seem like the sort of thing you only needed if you had to deal with such a beast. Then, recently, the penny dropped and I finally saw the light.

Shorn of complexity and extra features, what Interface implementations give you is an explicit and easily used way to assert and ask 'is-a' questions. Need to find out if this object is a compiled regular expression? Just ask if it supports the ICRegexp interface. Want to be accepted as a compiled regular expression? Assert that you support ICRegexp.

(Assuming the best and forging ahead is still the most Pythonic approach, but per my original problem you sometimes do need to know this sort of thing. And per yesterday's entry, requiring inheritance is not the answer, especially if you want to build decoupled systems.)

When I put it this way, it's easy to see why you'd like a basic interface implementation. If you have to test at all, simple 'is-a' tests beat both 'is-a-descendant-of' restrictions and probing for duck typing with its annoyances and ambiguities (cf an earlier entry).

In this view, the important thing is really to have a unique name (really an object) for each interface, so that you avoid the duck typing ambiguity. A basic implementation is almost trivial; treat interfaces as opaque objects, and just register classes as supporting interfaces and then have an [...]

(This demonstrates the old computer science aphorism that there's no problem that can't be solved by an extra level of indirection, since that is basically what this does: it adds a level of indirection to [...])

More complex implementations are of course possible; you could give the interface objects actual behavior and information, add checks for basic duck typing compatibility with the interface, make it so that [...]

(Sooner or later you end up back at zope.interface.)
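
As a rough illustration of the 'almost trivial' basic implementation described above, here is a minimal sketch. Every name in it (`Interface`, `register`, `supports`, `MyRegexp`) is made up for this example and is not taken from zope.interface or any other real library; only `ICRegexp` echoes the name used in the entry. It assumes a simple global registry is acceptable.

```python
import re


class Interface:
    """An opaque, uniquely identified marker object for one interface."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "<Interface %s>" % self.name


# Maps each interface object to the set of classes registered for it.
_registry = {}


def register(interface, cls):
    """Assert that instances of cls support interface."""
    _registry.setdefault(interface, set()).add(cls)


def supports(obj, interface):
    """Ask the 'is-a' question: was obj's class (or an ancestor) registered?"""
    registered = _registry.get(interface, ())
    return any(klass in registered for klass in type(obj).__mro__)


ICRegexp = Interface("ICRegexp")

# Register the standard compiled regexp type without naming it directly.
register(ICRegexp, type(re.compile("")))


class MyRegexp:
    """A stand-in for an alternate regexp engine, unrelated by inheritance."""

    def match(self, string):
        return None


register(ICRegexp, MyRegexp)

print(supports(re.compile("a+"), ICRegexp))
print(supports(MyRegexp(), ICRegexp))
print(supports("a+", ICRegexp))
```

The only thing the interface object contributes here is its identity, which is what sidesteps the duck typing ambiguity: two unrelated classes can both claim `ICRegexp` without sharing a base class.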
python/GettingInterfaces written at 00:19:21
2009-11-16 'Is-a' versus 'is-a-descendant-of'

One of the things that my issue with the Python re module not exposing its types has firmly mashed my nose into is the difference between 'is-a-descendant-of' and 'is-a' in object-oriented languages. It's conventional to think of them as more or less the same thing, even in a loose duck typed language like Python; it just seems to make sense for all compiled regular expressions to descend from a single base class, just as it theoretically makes sense for both plain bytestrings and Unicode strings to descend from an abstract generic string class.

(Technically, some of the things that I am calling classes here are actually types. In Python this is a distinction that can usually be ignored.)

Of course, when I write it out like this it's evident that it doesn't necessarily make sense. For example, the actual implementation of Python's base string class has no code and no behavior; it exists only for the convenience of programmers who want to [...]

On the flip side, it would be nice to be able to write alternate regular expression engines and have their objects accepted as 'compiled regular expressions'. Right now, anything that does duck typing will accept them, but things that look at types won't, purely because they don't descend from the current implementation of the regexp class (and you can't fix that, partly for reasons that I covered yesterday).

What this gets down to is that 'is-a' is effectively a question of interface, not of inheritance. In fact, duck typing in a nutshell is that your object 'is-a' compiled regular expression if it satisfies the expected interface behavior for such objects. Even in Python, we almost always use 'is-a-descendant-of' tests only as a convenient proxy for answering this 'is-a' question, but they are not quite the same thing and the difference can trip you (or other people) up.

(I'm sure I've read about this before, but there is a certain vividness to things this time around because I've had my nose rubbed in this.)
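
To make the gap between the two questions concrete, here is a small sketch. `PrefixRegexp`, `matches_duck`, and `matches_typed` are hypothetical names invented for this illustration, and the 'engine' is deliberately a toy; the compiled regexp type is obtained with `type(re.compile(""))` so that the sketch does not assume the re module exposes it under any particular name.

```python
import re

# The concrete type of standard compiled regular expressions.
CompiledRegexp = type(re.compile(""))


class PrefixRegexp:
    """A toy alternate 'engine' that only matches a literal prefix.

    It offers match() like a compiled regexp does, but it does not
    descend from the standard library's type.
    """

    def __init__(self, prefix):
        self.prefix = prefix

    def match(self, string):
        return self.prefix if string.startswith(self.prefix) else None


def matches_duck(pattern, text):
    # Duck typing: just call match() and use whatever comes back.
    return pattern.match(text) is not None


def matches_typed(pattern, text):
    # 'is-a-descendant-of' used as a proxy for the real 'is-a' question.
    if not isinstance(pattern, CompiledRegexp):
        raise TypeError("expected a compiled regular expression")
    return pattern.match(text) is not None


print(matches_duck(PrefixRegexp("ab"), "abc"))
print(matches_typed(re.compile("ab"), "abc"))
try:
    matches_typed(PrefixRegexp("ab"), "abc")
except TypeError as exc:
    print("rejected:", exc)
```

The toy object behaves acceptably for this caller, yet the type-checking version turns it away purely on ancestry.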
2009-11-15 A limitation of Python types from C extension modules

It's recently struck me that there is an important difference between types (and classes) created in a Python module and types/classes that come from a C-level extension module.

Suppose that duck typing is not enough and so you really want to make a class that inherits from an outside class (one in another module), yet overrides all of its behavior. This lets you create objects that work the way you need them to but will pass [...]

(Yes, you'll need to do your own initialization instead to make your version of the behavior all work out, since once you're not using the parent type's initialization you can't assume that any of the parent's other methods keep working.)

If the outside module is a Python module, you can always (or perhaps almost always) do this. If the outside module is a C extension module, there is no guarantee that you will be able to do this (and sometimes you may not even be able to create your descendant class, much less initialize new instances of it). Fundamentally, the reason for this is the same reason that you can't use [...]

This means that types created in C modules can be effectively sealed against descent and impersonation; they simply can't be substituted for in a way that will fool [...]

(It's possible to make a C-level type inheritable; all of the core Python types are C-level types, after all, and you can do things like inherit from [...])
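
A quick way to see this sealing in practice is to probe whether a given type can be used as a base class at all. The sketch below assumes CPython; `can_subclass` is a hypothetical helper written for this example, and `str` and `bool` are used only as convenient examples of an inheritable and a non-inheritable C-level type.

```python
import re

# The concrete type of standard compiled regular expressions.
CompiledRegexp = type(re.compile(""))


def can_subclass(base):
    """Report whether Python lets us create a class descending from base."""
    try:
        # Equivalent to 'class Probe(base): pass', done dynamically.
        type("Probe", (base,), {})
    except TypeError:
        # CPython raises TypeError for types that refuse to be a base type.
        return False
    return True


for candidate in (str, bool, CompiledRegexp):
    print(candidate.__name__, can_subclass(candidate))
```

Under CPython, `str` should come back as inheritable while `bool` and the compiled regexp type should not, so for those types you cannot even create the descendant class, let alone override its behavior.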
2009-11-14 How to defer things in Exim

Normally, Exim routers will only accept or fail addresses (or be uninterested in them). This is good enough for normal handling of addresses, but if you are using routers to their full power, there are times when you want to force routers to defer addresses instead. There are two general ways to do this.

(Unsuccessful DNS lookups can cause addresses to defer, but this is not normally under your control.)

The straightforward way is to use a separate router to explicitly defer the address using the [...]

    defer_addr:
      driver = redirect
      allow_defer
      data = :defer:stalling
      [... whatever condition needed ...]

Using a separate router is straightforward and makes for clear log messages about what is going on. However, it's not always possible (or desirable) to use a separate router. In that case you can abuse string expansion to cause an expansion failure while expanding some option where this will force the router to defer.

This is moderately tricky for two reasons. First, you cannot just force string expansion to fail explicitly (via an [...]

Second, you need to pick a router option where expansion failure
causes a deferral and, ideally, that you are not already using.
The Exim documentation is the final authority on what router
options will do for this (see generic options for routers
and check what each option does on non-forced expansion failure);
the one that I have found useful in our mailer configuration is

    postbox:
      driver = accept
      transport = local_delivery
      # make sure it's mounted
      address_data = ${readfile{/var/mail/.MOUNTED}}
      [....]

(Our [...]

The drawback of this approach is that Exim will log alarmed-looking and rather cryptic error messages if the condition ever fails and forces messages to be deferred, so it is best reserved for conditions that you don't expect to happen very often.
2009-11-13 (Ab)using Exim routers for their full power

Officially, as reflected in the documentation, Exim routers are expected to take more or less disjoint sets of addresses; for example, you have one router to do DNS lookups and SMTP for external addresses, one router to handle aliases, one router to expand the .forwards of people with them, and one router to deliver to people's mailboxes for people without .forwards. This makes the ordering of the routers relatively unimportant; approached this way, it is used mostly to make writing routers more convenient by having to be less neurotically careful about what addresses a router applies to.

(There is one exception; traditional .forward handling absolutely requires ordering and cannot be done with router conditions.)

If you want to really do powerful things with Exim routers, you need to go beyond this view. Instead, you should think of routers as (conditional) steps, or decision points, in a peculiar programming language. Not all decision points apply (or potentially apply) to all addresses, but it is entirely natural that multiple routers potentially apply (depending on circumstances) to the same set of addresses; each such router is a step on the conditional handling logic for these addresses.

(This mindset sounds simple when I explain it, but I don't think that it's obvious from the current Exim documentation. I've certainly seen a fair number of 'how to do X' questions asked on the Exim mailing list by people who clearly hadn't made this conceptual leap.)

Once you think of routers this way, ordering becomes important; for routers that handle the same set of addresses, the relative ordering of the routers is the ordering of decision steps about those addresses. Often you have something close to a total order of routers because you will want to do some common things with all addresses.
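
The 'routers as ordered decision steps' view can be modelled in a few lines of Python. This is only a toy model of the mindset, not how Exim is implemented or configured; the router functions, the domains, and the user name are all hypothetical, and the model assumes that an address no router wants simply fails.

```python
from enum import Enum


class Verdict(Enum):
    DECLINE = "decline"  # not interested; try the next router
    ACCEPT = "accept"
    FAIL = "fail"
    DEFER = "defer"


def route(address, routers):
    """Run address through routers in order; the first non-DECLINE verdict wins."""
    for name, router in routers:
        verdict = router(address)
        if verdict is not Verdict.DECLINE:
            return name, verdict
    # In this toy model, an address that no router wants is failed.
    return None, Verdict.FAIL


# Hypothetical state, for illustration only.
STALLED_DOMAINS = {"slow.example"}
LOCAL_DOMAIN = "local.example"
LOCAL_USERS = {"alice"}


def stall_slow_domains(address):
    domain = address.rpartition("@")[2]
    return Verdict.DEFER if domain in STALLED_DOMAINS else Verdict.DECLINE


def external_smtp(address):
    domain = address.rpartition("@")[2]
    return Verdict.ACCEPT if domain != LOCAL_DOMAIN else Verdict.DECLINE


def local_mailbox(address):
    local, _, domain = address.rpartition("@")
    if domain == LOCAL_DOMAIN and local in LOCAL_USERS:
        return Verdict.ACCEPT
    return Verdict.DECLINE


# Both of the first two routers can apply to external addresses,
# so their relative order is part of the handling logic.
ROUTERS = [
    ("stall_slow_domains", stall_slow_domains),
    ("external_smtp", external_smtp),
    ("local_mailbox", local_mailbox),
]

for addr in ("bob@slow.example", "bob@fast.example", "alice@local.example"):
    print(addr, route(addr, ROUTERS))
```

Swapping the first two entries of `ROUTERS` changes what happens to `bob@slow.example`, which is the point: once several routers can apply to the same addresses, the list order is the order of the decisions.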
To make all of this less abstract, here is the list of decisions that our central mail system makes about external addresses, each implemented with a separate router: [...]

(Some but not all of these apply to internal addresses too.)

Sidebar: why .forward handling requires ordering

I cheated in my example description of Exim routers. Traditional .forward semantics allow you to put your own email address in your .forward again; this means 'deliver to me, bypassing my .forward', which usually winds up putting a copy of the message in /var/mail. If you want to support these semantics under Exim, the router that delivers messages to /var/mail cannot apply only to people who do not have .forwards, and thus has to be ordered after the router that handles .forwards.

(How Exim makes these semantics work is a little bit complicated.)
2009-11-12 What makes Exim work as a mailer construction kit

In light of Postfix versus Exim, you might wonder what features make Exim into a mailer construction kit. For me, the easiest way to summarize the answer is to say that Exim has the idea of what I will call a user-written mail processing pipeline (actually two of them, sort of).

By a mail processing pipeline I mean a series of steps that messages go through to decide what will happen to them and how they will be delivered. In many MTAs, this processing pipeline is more or less fixed, with you having opportunities to add a table lookup here or mangle addresses there. In Exim, there is no fixed processing pipeline; you write it entirely from scratch yourself, using relatively generic components to do most of the work. The result is that you have a great deal of flexibility in what happens in those pipelines; in other words, how messages get handled and delivered is to a large extent under your direct control.

(The two drawbacks of this are that you have to write the pipeline yourself and that it is much easier to screw things up in various ways, some of them subtle.)

Conceptually, Exim has two major places with this sort of processing flexibility. The first is deciding how to route an address to one or more delivery destinations; you write a series of what Exim calls 'routers', and then they get used in sequence to process each address in various ways, hopefully ultimately delivering them somewhere.

(The Exim documentation describes routers and this routing process in a way that makes it sound less powerful than it is.)

The other major place with such a processing pipeline is deciding what reply code to give for each SMTP command in an SMTP conversation. In Exim you do this by writing a series of ACL rules for each command, again using relatively generic components to do most of the hard work. These rules can do quite powerful and generic things, and the combination can be quite powerful.

(Exim also gets a fair bit of its general power from its crazy string expansion language; this comes up when writing both routers and SMTP ACL rules.)
sysadmin/EximMailerKit written at 01:36:25
2009-11-11 For universities, the Internet world has fundamentally changed

Once upon a time, the Internet was just something that you used to communicate with other universities (and companies). One consequence of this was that the university needed to provide everything for its own people; all of the services they needed had to come from the university.

This is no longer the case for universities. Increasingly, people no longer want you to be their service provider (partly because they already have their own), and on top of that other people can do bits of it better than you can (consider Google Mail versus your typical university webmail interface).

This is a major, wrenching change in how you think about providing services, and part of what makes it wrenching is that expectations have to be changed too. To put it bluntly, you can't be held responsible for the service being available, because there will be times that the service is unavailable or broken for reasons that are completely beyond your control.

This is, I think, not a trivial thing. 'Responsibility' is burned very deeply into organizations; it's in people's attitudes towards their jobs, in mission statements and organizational descriptions, and in expectations by higher administration. Letting go is hard, because it is such a fundamental change; you stop being responsible for user email, for example, and instead become 'responsible' merely for making the best choice of outside provider (or running it yourself, but let's be honest here, Google is better if you can use it).

(This assumes that you do just become responsible for picking the best outside provider. If in practice you will be held responsible if something unforeseeable goes horribly wrong with the outside provider, then the sensible and predictable managerial response is to keep doing as much in house as possible.)

PS: application to the general university tension between locally provided services and centrally provided services is left as an exercise for the reader.
tech/UniversityInternetWorld written at 00:54:52
These are my WanderingThoughts. This is part of CSpace, and is written by ChrisSiebenmann.