Wandering Thoughts archives

2019-05-28

Distribution packaging of software needs to be informed (and useful)

In my recent entry on an RPM name clash between grafana.com's and Fedora's packaging of Grafana, Josef "Jeff" Sipek wrote in a comment:

I think that's a bit backwards. I always consider the distro as the owner of the package namespace. So, it's really up to grafana.com to use a non-colliding name.

The distro provides a self-contained ecosystem, if a 3rd party package wants to work with it, then it should be up to the 3rd party to do all the work. There's no reasonable way for Fedora (or any other distro) to know about every other possible package on the web that may get installed.

[...]

I fundamentally disagree with this view as it plays out in the case of the 'grafana' RPM package name (where grafana.com packaged it long before Fedora did). At the immediate level, when the upstream already distributes a package for the thing (either as a standalone package, the case here, or through an addon repository), it is up to the distribution, as the second person on the scene, to make sure that they work with the existing situation unless it is completely untenable to do so.

(On a purely practical level, saying 'the distribution can take over your existing package name any time it feels like and cause arbitrary problems for current users' is a great way to get upstreams to never make their own packages for a distribution and to only ever release tarballs. I feel strongly that this would be a loss; tarballs are strongly inferior to proper packages for various reasons.)

More broadly, when a distribution creates a package for something, they absolutely should inform themselves about how the thing is currently packaged, distributed, installed, and used on their distribution, if it is, and how it will be used and evolve in the future. Fundamentally, a distribution should be creating useful packages of programs and being informed is part of that. Blindly grabbing a release of something and packaging it as an official distribution package is not necessarily creating a useful thing for anyone, either current users or potential future users who might install the distribution's package. Doing a good, useful package fundamentally requires understanding things like how the upstream distributes things, what their release schedule is like, how they support old releases (if they do), and so on. It cannot be done blindly, even in cases where the upstream is not already providing its own packages.

(For example, if you package and freeze a version of something that the upstream will immediately abandon, and you will not backport fixes, security updates, and so on yourself, you are not creating a useful package; instead, you're creating a dangerous one. In some cases this means that you cannot create a distribution package that both complies with distribution packaging policies and is useful to your users; in that case, you should not package it at all. If users keep asking, set up a web page explaining 'why we cannot provide a package for this'.)

PS: Some of this is moot if the upstream does not distribute their own pre-built binaries, but even then you really want to know the upstream's release schedule, length of version support, degree of version to version change, and so on. If the upstream believes in routine significant change, no support of old versions, and frequent releases, you probably do not want to touch that minefield. In the modern world, it is an unfortunate fact of life that not every upstream project is suitable for being packaged by distributions, even if this leaves your users to deal with the problem themselves. It's better to be honest about the upstream project being incompatible with what your users expect from your packages.

linux/PackagingMustBeInformed written at 23:57:21

An interesting report on newly used domain names and their usage in spam

One of the interesting things from Geoff Huston's DNS-OARC 30: Bad news for DANE (via, which has useful comments, especially from tptacek; see also Against DNSSEC) is some information about the churn in new domain names over the time span of a week, in a section called "The modality of mortality of domain names". I'm just going to quote the end summary, but the whole section is well worth reading. The summary:

The majority of the short-lived names were observed in the gTLD space, and here blacklisting is the primary cause of name death. This was also observed in those ccTLDs that are used as generic TLDs. Overall, some 8% of new names die within seven days.

The observation from this study is that we appear to be spending a huge set of resources to remove names that should never have existed in the first place. If further rounds of new gTLD rounds turn out to be little more than an exercise to offer more choices for spammers, then why are we doing this to ourselves?

(Geoff Huston's article has the wrong link for the presentation materials; the correct link is The Modality of Mortality in Domain Names. Also, 'name death' here does not mean that the DNS records are removed; merely being listed on a domain blacklist is enough. From Paul Vixie's slides, the domain blacklists used are Spamhaus, Swinog URIBL, and SURBL.)
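Domain blocklists such as Spamhaus's DBL and SURBL are generally queried over DNS: you look up the domain prepended to the list's zone (for example `example.com.dbl.spamhaus.org`), where an answer in 127.0.0.0/8 means the domain is listed and a nonexistent name means it is not. The following is a minimal Python sketch of that check under those assumptions. The zone `dbl.example` and the fake resolver are hypothetical stand-ins, and treating any 127.x.x.x answer as a listing is a simplification, since real lists give individual return codes specific meanings (including some that signal errors), so check each list's own documentation before relying on this.

```python
import socket
from typing import Callable, Optional


def dbl_query_name(domain: str, zone: str) -> str:
    """Build the DNS name that is looked up to check `domain` against blocklist `zone`."""
    return f"{domain.rstrip('.').lower()}.{zone.rstrip('.')}"


def default_resolver(name: str) -> Optional[str]:
    """Return the first IPv4 address for `name`, or None if the name does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror (e.g. NXDOMAIN) is a subclass of OSError
        return None


def is_listed(domain: str, zone: str,
              resolve: Callable[[str], Optional[str]] = default_resolver) -> bool:
    """Simplification: treat any answer in 127.0.0.0/8 as 'listed', i.e. the name has 'died'."""
    answer = resolve(dbl_query_name(domain, zone))
    return answer is not None and answer.startswith("127.")


if __name__ == "__main__":
    # Hypothetical zone and canned answers, so the example runs without network access.
    fake_dns = {"spammy.date.dbl.example": "127.0.1.2"}
    print(is_listed("spammy.date", "dbl.example", fake_dns.get))  # True
    print(is_listed("example.org", "dbl.example", fake_dns.get))  # False
```

Passing the resolver in as a function keeps the lookup logic testable offline; in real use you would call `is_listed` with the default resolver and a real zone.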

The cynical observation is that people pay a lot of money to register as operators for new gTLDs, and who is going to turn down that money? The operators may not make much money (but maybe they do, from some spammers), but the people who approve new gTLDs and get money for them sure do.

Another striking thing from the slides is that almost a fifth of new gTLD domains die within a week, usually due to blacklisting. This is a much higher death rate than the overall figure of 8%, which backs up what I suspect is most people's intuition that random gTLD domain names are the most likely to be involved in spam. Some gTLDs have dramatic death rates in the study; for example, the slides suggest that 65% of new domains in .date get blacklisted within a week.

This is for 'newly observed domains', which means that this is the first time the domain names have been seen in use. They may or may not have been registered recently, although the speculation that some fast removals from the DNS result from credit card chargebacks and other payment failures suggests that at least some of them were.

Since blacklisting apparently often happens so fast, there is an obvious approach for an anti-spam system willing to do the work. You could keep track of the domain names that you've seen in email and temporarily defer all messages containing new domain names for six hours or so. This is a clear extension of IP-based or sender-based greylisting, with the same goal of hoping that any bad actors appear on blocklists before your timeout period ends and you accept the email.
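The deferral idea can be sketched in a few lines of Python. This is a hypothetical illustration, not part of any existing mail system: the class name, the in-memory dictionary, and the 'accept'/'defer' verdicts are assumptions, and extracting domains from a message (sender, links, and so on) is left to the caller. A real deployment would map 'defer' to an SMTP temporary failure, persist the first-seen table, and check the domains against blocklists once the delay has passed.

```python
import time
from typing import Dict, Iterable, Optional

DEFER_SECONDS = 6 * 60 * 60  # the "six hours or so" from the text; tune as needed


class NewDomainGreylist:
    """Hypothetical sketch: defer mail mentioning any domain first seen under `delay` seconds ago."""

    def __init__(self, delay: float = DEFER_SECONDS) -> None:
        self.delay = delay
        self.first_seen: Dict[str, float] = {}  # domain -> time it first appeared in email

    def check(self, domains: Iterable[str], now: Optional[float] = None) -> str:
        """Return 'accept' or 'defer' for a message that mentions `domains`."""
        if now is None:
            now = time.time()
        verdict = "accept"
        # Deliberately no early return: every domain gets recorded, so the clock
        # starts for all new domains in this message, not just the first one.
        for domain in domains:
            domain = domain.rstrip(".").lower()
            seen = self.first_seen.setdefault(domain, now)
            if now - seen < self.delay:
                verdict = "defer"
        return verdict


if __name__ == "__main__":
    g = NewDomainGreylist()
    print(g.check(["fresh.date"], now=0.0))         # defer: never seen before
    print(g.check(["fresh.date"], now=7 * 3600.0))  # accept: first seen 7 hours ago
```

Because every domain in a message is recorded before the verdict is returned, a message that mentions one old and one new domain is still deferred, and the new domain's clock starts from that message.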

spam/NewDomainsAndSpam written at 17:03:32


