Wandering Thoughts archives

2020-10-04

Web page generation systems should support remapping external URLs

Some web pages and web sites are hand authored, but many more are generated (dynamically or statically) through web page generation systems and content management systems of various sorts. Also, often our writing in these systems has links to external pages; to other people's writing, to reference documentation, to Wikipedia, to whatever. This presents us (the people running web sites and writing on them) a long term problem, because in practice some or many of those external URLs will eventually change.

Today, we don't have good support in our page generation systems for this unfortunate reality of web life. If you find out that a an external URL you reference has moved, you generally have to hunt around through all of your content and update it, either completely manually or at best semi-automatically. The unsurprising result of this is that people often don't, even when they know old links have changed; it's simply too much work to go back through everything and fix it all up.

So here's an idea: all of our web page generation systems should support a remapping file (or data source) for external URLs, which would list the old URL and its new replacement. A fancier version could also have site matching, prefix matching or general pattern matching. When you're generating a page and the page has a link pointing to such an old URL, it would automatically get replaced with the new URL. The obvious advantage of this remapping system is that it's less work; the subtle one is that it's automatically universal, with you not having to hunt down every last obscure corner of the site where the URL is mentioned.

(In some systems it would make sense to automatically edit this change into the source data; generally I think those are systems where the source data is already held in a database by the web generation system and is not edited by people by hand.)

One additional advantage of doing this in the web page generation system instead of in external tools is that the web page generator generally has the best idea if what it's really dealing with is a link target, instead of some other text that happens to mention or include the URL. You probably don't want to rewrite mentions of old URLs in plain text, for example, especially not automatically.

PS: This remapping should be applied repeatedly, because replacement URLs can themselves get replaced. Yes, sure, theoretically people could go through and update the original mappings again, but let's make it easy and as foolproof as possible. Since link rot is going to happen, we should make it easy to deal with.

(This idea was sparked by Aristotle Pagaltzis linking to a web.archive.org copy of a diveintomark.org entry in a comment on this entry, causing me to realize that I had entries with direct links to diveintomark that needed to be updated to web.archive.org. This shows both how long it can take me to write some Wandering Thoughts entries and how I still haven't gotten around to finding and editing all of those entries (or implementing a remapping file here).)

web/RemappingExternalLinks written at 23:35:20; Add Comment

Link: Old-School Disk Partitions

Warner Losh's Old-School Disk Partitions (via) is a discussion of how disk partitioning worked and evolved in the early days of Unix, up through 4.3 BSD. This has more information than what I wrote about how major and minor device numbers worked in V7, because I missed that various disk device drivers had their own partitioning tables for minor numbers.

Warner Losh's blog has a lot of interesting writing on historical Unix things, so if that's one of your interests (as it is one of mine) it's well worth a look in general.

links/OldSchoolDiskPartitions written at 14:49:34; Add Comment

Solid state disks in mirrors and other RAID setups, and wear lifetimes

Writing down my plans to move to all solid state disks on my home machine, where I don't have great backups, has made me start thinking about various potential issues that this shift might create. One of them is specific to how I'm going to be using my drives (and how I'm already using SSDs), which is in mirrored pairs and more generally in a RAID environment.

The theory of using mirrored drives is that it creates redundancy and gives you insurance against single disk drive failures. When you mirror hard drives, one of the things you are tacitly counting on is that most hard drive failures seem to be random mechanical or physical media failures (ie, the drive suffers a motor failure or too many bad spots start cropping up on the platters). Because these are random failures, the odds are very good that they won't happen on both drives at the same time.

Solid state drives are definitely subject to random failures from things like (probable) manufacturing defects. We've had some SSDs die very early in their lifetimes, and there are a reasonable number of reports that SSDs are subject to infant mortality (people might find A Study of SSD Reliability in Large Scale Enterprise Storage Deployments [PDF] to be interesting on this topic, among others). However, solid state drives also have a definite maximum lifetime based on total writes. Drives in a mirrored setup (or more generally in any RAID one) are likely to see almost exactly the same amount of writes over time, which means that they will reach their wear lifetimes at almost the same time.

If your solid state drives reach their wear lifetimes at all in your RAID array (and you put them into the array at the same time, which is quite common), it seems very likely that they will reach that lifetime at about the same time. If you have good monitoring and reporting on wear (and if the drives report wear honestly), this means you'll start wanting to replace them at about the same time. If they don't report wear honestly and just die someday, the odds of nearly simultaneous failures are perhaps uncomfortably high.

There are two reasons this may not be a real worry in practice. The first is that it seems unusual (and hard) in practice to reach even the official nominal wear lifetimes of SSDs, much less the real ones (which historically seem to have been much higher than the datasheet numbers when people have tested to destruction). The second is that A Study of SSD Reliability in Large Scale Enterprise Storage Deployments specifically says that you should worry more about infant mortality getting multiple drives at once, since their data says (enterprise) solid state storage has a significantly extended infant mortality period.

(You can also deal with wear concerns by throwing one or some of your RAID drives into a test setup to get written to a lot before you spin up the real RAID array, so that they should reach any wear lifetime a TB or three ahead of your other drives. This might or might not affect infant mortality in any useful way.)

tech/SSDsRAIDWearWorry written at 01:12:28; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.