2013-06-24
How to get your syndication feed fetcher at least temporarily banned here
In the spirit of a previous series, here's how to get me to at least temporarily ban a syndication feed fetcher that appears potentially legitimate. This is not something that I like to do because it can cut off people who actually want to read Wandering Thoughts, but this case is so bad and so questionable that I'm doing it anyway, at least for now.
So here's the procedure:
- Make a lot of requests for the same feed. For example, request the
main feed here once every ten minutes like clockwork
(despite the fact that it doesn't change anywhere near that often).
- Don't use any form of conditional GET, so
you fetch the full feed every time.
- Don't support gzip encoding, so you fetch nearly half a megabyte
every ten minutes.
- Insert bogus Cookie headers into the request. In this case the feed fetcher appears to be leaking cookies set by other sites into requests to here, including some badly formed cookies that cause the standard Python cookie parser to throw errors (which get logged by DWiki, which is why I noticed all of this in the first place).
- Don't have any meaningful reverse DNS and have a User-Agent: header of:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2; Feeder.co) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.43 Safari/537.31
This is not a proper User-Agent for an automated feed fetcher. A proper User-Agent clearly identifies the organization responsible and that this is a robotic agent making the request. This is instead an almost complete imitation of a real web browser's User-Agent, with only an inconspicuous 'Feeder.co' to perhaps identify the actual responsible party (there really is a 'feeder.co' and they appear to do feed fetching).
- Of course the Feeder.co website exposes almost no contact information and especially doesn't have a 'contact us here if our feed fetcher is doing something odd' page.
Under normal circumstances I would continue to allow this feed fetcher to pull my feed and send the people running it email about the problems in the hopes that they'll fix them. But the User-Agent here smells very much like what spammers do, and with everything else going on I have no idea if Feeder.co is even responsible for this or whether someone is abusing their vaguely good name. Certainly I don't feel like trusting them with any of my email addresses; even at best they are running a significantly bad feed fetcher and have made a number of extremely questionable decisions in operating it. It doesn't help that some of their program's bugs are drastically polluting my logs (due to the complaints about the malformed cookies).
(If you do not support conditional GET you have absolutely no business polling feeds at a rate anywhere close to once every ten minutes. Never. Ever.)
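As a rough illustration of what a better-behaved fetcher might do about the points above, here is a minimal Python sketch. It is not anyone's actual implementation; the fetcher name, the contact URL, and the function names are all made up for the example. It sends a User-Agent that identifies itself as a robot with a contact URL, asks for gzip, sends no cookies, and uses conditional GET with the validators saved from the previous fetch.

```python
import gzip
import urllib.error
import urllib.request

# Hypothetical identity for illustration; a real fetcher would use its
# own name and a page where people can report problems.
USER_AGENT = "ExampleFeedFetcher/1.0 (feed reader bot; +https://example.org/fetcher-info)"


def build_request(url, etag=None, last_modified=None):
    """Build a feed request that identifies itself, accepts gzip, and
    sends validators from the previous fetch if we have any."""
    headers = {
        "User-Agent": USER_AGENT,
        "Accept-Encoding": "gzip",
    }
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)


def fetch_feed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if the server
    answered 304 Not Modified."""
    req = build_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = resp.read()
            if resp.headers.get("Content-Encoding") == "gzip":
                body = gzip.decompress(body)
            return body, resp.headers.get("ETag"), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Nothing changed; keep the validators we already have.
            return None, etag, last_modified
        raise
```

The caller is expected to store the returned ETag and Last-Modified values and pass them back in on the next poll, so that an unchanged feed costs a small 304 response instead of the full body.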
(It's not just that the spammers have thoroughly poisoned the well for reaching out to random people on the Internet that you don't have any real knowledge of. It's also that telling people that their software has serious problems is sometimes an excellent way of sparking a great deal of drama (with a capital D). Especially if they are a commercial company.)
PS: I may reluctantly change my opinions here in a few days. I really don't like cutting Wandering Thoughts readers off, even if they are using a service with major problems.
(I've considered redirecting these requests to a very small Atom feed with a single entry that just says 'this feed fetcher is broken and not getting actual content, please switch software or report this to the operators', but that would require creating such a feed somehow. I suppose it wouldn't be too hard. Right now the feed requests are just getting 403 responses (and they are still coming in every ten minutes, which is another failure).)
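For what it's worth, generating such a notice feed could look roughly like the following Python sketch. This is only an illustration under my own assumptions, not something DWiki does; the function name, the feed URL, and the titles are hypothetical. It builds an Atom document with a single entry using the standard library.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_notice_feed(feed_url, message, updated=None):
    """Return a tiny Atom feed (as a string) with one entry carrying `message`."""
    updated = (updated or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = updated.strftime("%Y-%m-%dT%H:%M:%SZ")

    # Serialize Atom as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")

    def add(parent, tag, text=None, **attrs):
        el = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
        el.text = text
        return el

    # Feed-level metadata that Atom requires: title, id, updated, author.
    add(feed, "title", "Feed fetcher notice")
    add(feed, "id", feed_url)
    add(feed, "updated", stamp)
    add(feed, "link", href=feed_url, rel="self")
    author = add(feed, "author")
    add(author, "name", "Site operator")

    # The single entry that tells readers what is going on.
    entry = add(feed, "entry")
    add(entry, "title", "This feed fetcher is broken")
    add(entry, "id", feed_url + "#notice")
    add(entry, "updated", stamp)
    add(entry, "content", message, type="text")

    return ET.tostring(feed, encoding="unicode")
```

The result could be written out once as a static file and served in place of the real feed for the offending fetcher.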
2013-06-08
The Flickr redesign and knowing your site's focus
You may have heard that Flickr recently did a major site redesign that significantly changed the look and the experience of the site (of course many long-term users are up in arms over it). I'm not sure how I feel about it myself, but after interacting with the new Flickr for a while I've realized something: the new Flickr is very strongly focused on looking at photographs.
This might sound obvious, but the old Flickr wasn't this way. The easiest way to explain it is to say that the old Flickr showed you a lot of distractions. For example, when you were looking at a single photograph the page had the photograph in a relatively modest size and then a lot of empty space and other material (there was various photo information, groups, tags, and more on the right and the description and comments below the picture, all routinely visible in a normally sized browser window). In the new Flickr, almost all you see on an individual photo page is the photograph itself; everything else has been removed, pushed 'below the fold' where you have to scroll to see it, or minimized. A similar transformation has happened to the various sorts of index pages (eg sets), where now almost all of their space is filled with photographs instead of anything else (and photographs are now presented uncropped; they used to be usually shown in cropped-to-square thumbnails).
The old Flickr didn't have a clear focus. It was about photographs, sure, but it was also about information about photographs and comments on them and so on and so forth. The new Flickr still has those other things but now its focus is clearly on looking at the photographs, to the point where you have to go out of your way to see much else. I suspect that it's successful at this and a good part of me thinks that it's more interesting and useful now.
In a way I'm lucky in that I don't generally deal with the sort of web sites where we should be worrying a lot about this. But that's probably too narrow a reading of this particular lesson; for example, we have a support site here and I don't think we've ever tried to sit down and figure out what its focus is.
(You can argue that this is in fact a central issue in the wiki approach to building websites. Unless it's carefully curated a wiki is just a pile of information emptied into a pile of pages and thus doesn't really have any sort of focus as such. It's just a database and a jumble.)
(This is of course related to recent ruminations on blog design issues and in fact Matt Gemmell's original article is called Designing blogs for readers. If you take it seriously, there's your focus right there.)
2013-06-05
The case against blog sidebars
The trail of this thought starts with Matt Gemmell's Designing blogs for readers, which advocates strongly against blog sidebars, and continues with Dr Drang's Blogging and readability, which pushes back against the anti-sidebar sentiments. In the process Dr Drang agrees that blog sidebars only work on relatively wide screens; on the small and narrow screens typical of smartphones (for example), a visible sidebar is a waste of precious space that should be going to the content.
(Apparently smartphone browsers may be smart enough to show only a single column even if the HTML nominally disallows this. For the sake of anyone browsing Wandering Thoughts from such a device, I sure hope that actually works.)
This is the core of the case that I can see against blog sidebars. If how your site appears on smartphones and other constrained displays is important to you, you can't assume that the sidebar is visible; in the best case (with CSS magic), it will appear as a footer instead. If it's going to appear as a footer in some important cases you might as well make it a footer all the time. This at least somewhat unifies the visual appearance and functionality of your site between smartphones and larger displays and means that you only have to design one version of the site's visual appearance instead of two.
(Well, probably not really. You may need to tweak your design for smartphones even if you build it with footers.)
All of this makes me feel conflicted about how Wandering Thoughts looks. Reading Matt Gemmell's article caused me to do a little bit of design tweaking (and to deploy something I've been poking at for a while), but WT is still very cluttered. Some of the sidebar clutter can probably be removed (I don't think Atom feeds need to be mentioned any more, for example), but a lot of it has things that I consider important. And some of the clutter exists because the whole blog is actually part of a generic wiki engine and that imposes its own structure on pages.
(I could work out a way to do a total blog-specific design for the blog portions of the wiki, but then I'd have to actually design it. Design is not something I feel I'm good at, which is why the whole of WT generally avoids it.)