Wandering Thoughts archives

2013-10-31

Naming disk devices: drive IDs versus drive locations

From my perspective there are two defensible ways of naming disk drives at the operating system level. You can do it by a stable identifier tied to the physical drive somehow, such as a drive serial number or WWN, or by a stable identifier based on its connection topology and thus ultimately the drive's physical location (such as the 'port X on card Y' style of name). I don't want to get into an argument about which one is 'better' because I don't think that argument is meaningful; the real question to ask is which form of naming is more useful under what circumstances.

(Since the boundaries between the two sorts of names may be fuzzy, my rule of thumb is that it is clearly a drive identifier if you have to ask the drive for it. Well, provided that you are actually speaking to the drive instead of a layer in between. The ultimate drive identifiers are metadata that you've written to the drive.)

Before I get started, though, let me put one inconvenient fact front and center: in almost all environments today, you're ultimately going to be dealing with drives in terms of their physical location. For all the popularity of drive identifiers as a source of disk names (among OS developers and storage technologies), there are very few environments right now where you can tell your storage system 'pull the drive with WWN <X> and drop it into my hands' and have that happen. As I tweeted, I really do need to know where a particular disk actually is.
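As an illustration of why you wind up needing both sorts of names, here is a minimal Python sketch of the kind of identifier-to-location mapping sysadmins end up maintaining by hand or by tooling. All of the WWNs and location names here are made up; in real life you would build this table from your actual topology (on Linux, for example, by cross-referencing /dev/disk/by-id with /dev/disk/by-path).

```python
# Hypothetical inventory mapping drive identifiers (WWNs) to
# physical locations. The values are invented for illustration;
# a real table would come from scanning your actual topology.
inventory = {
    "0x5000c5004567ab01": ("enclosure-1", "slot-3"),
    "0x5000c5004567ab02": ("enclosure-1", "slot-4"),
    "0x5000c5004567ab03": ("enclosure-2", "slot-0"),
}

def locate(wwn):
    """Turn 'pull the drive with WWN <X>' into a physical location."""
    enclosure, slot = inventory[wwn]
    return f"{enclosure} {slot}"

print(locate("0x5000c5004567ab03"))  # -> enclosure-2 slot-0
```

The point of the sketch is the lookup itself: no matter which kind of name your OS favours, when a drive dies you need the right-hand side of this table.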

This leads to my bias, which is that using drive identifiers makes the most sense when the connection topology either changes frequently or is completely opaque, or both. If your connection topology rearranges itself on a regular basis then it can't be a source of stable identifiers because it itself isn't stable. However, you can sometimes get around this by finding a stable point in the topology; for example, iSCSI target names (and LUNs) are a stable point whereas the IP addresses or network interfaces involved may not be.

(Topology rearrangement can be physical rearrangement, ranging from changing cabling all the way up to physically transferring disks between enclosures for whatever reason.)

Conversely, physical location makes the most sense when topology is fixed (and drives aren't physically moved around). With stable locations and stable topology to map to locations, all of the important aspects of a drive's physical location can be exposed to you so you can see where it is, what the critical points are for connecting to it, what other drives will be affected if some of those points fail or become heavily loaded, and so on. Theoretically you don't have to put this in the device name if it's visible in some other way, but in practice visible names matter.

My feeling is that stable topology is much more common than variable topology, at least once you identify the useful fixed points in connection topology. Possibly this is an artifact of the environment I work in; on the other hand, I think that relatively small and simple environments like mine are much more common than large and complex ones.

Sidebar: the cynic's view of OS device naming

It's much easier to give disks an identifier based device name than it is to figure out how to decode a particular topology and then represent the important bits of it in a device name, especially if you're working in a limited device naming scheme (such as 'cXtYdZ'). And you can almost always find excuses for why the topology might be unstable in theory (e.g. 'the sysadmin might move PCI cards between slots and oh no').
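As a sketch of the easy half of the problem, here's a hypothetical decoder for the 'cXtYdZ' style of name (the Solaris-style controller/target/disk scheme; the code itself is an illustration, not any OS's actual implementation). Note that this only splits the name into its fields; mapping a controller number back to an actual card and slot is the part that takes real work.

```python
import re

# Decode Solaris-style 'cXtYdZ' disk names into their components.
# This is the trivial part; turning 'controller 1' back into
# 'the HBA in PCI slot 4' is where the real effort goes.
_CTD = re.compile(r"^c(\d+)t(\d+)d(\d+)$")

def parse_ctd(name):
    m = _CTD.match(name)
    if m is None:
        raise ValueError(f"not a cXtYdZ name: {name}")
    controller, target, disk = (int(g) for g in m.groups())
    return {"controller": controller, "target": target, "disk": disk}

print(parse_ctd("c1t4d0"))  # -> {'controller': 1, 'target': 4, 'disk': 0}
```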

DiskNamingIDVsLocation written at 01:14:16

2013-10-28

If you're on the IPv4 Internet, you really are in public now

Once upon a time it was possible to feel that your machines were somewhat private and obscure even if they had public IP(v4) addresses and were on the Internet. It wasn't quite true but it was mostly true because what scanning there was was haphazard and slow and random. You might get poked sooner or later, especially for common things like SSH, but that was just from background noise and people trying to get lucky.

The first clear and public cracks in this came last year with some anonymous researcher's Internet Census 2012, which used a massive botnet to scan the entire IPv4 address range. That showed that a mass scan was feasible but not that it was practical; even if you have one, a massive botnet is a valuable thing, generally too valuable to burn scanning all of IPv4. But the Internet has a long tradition of scaling things up and making them faster, so along came zmap. Given a decent machine with a good Internet connection, zmap will mass scan IPv4 in a feasible amount of time. That was nice (in a sense) but you could tell yourself that it was basically an academic thing.
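To give a rough sense of what 'a feasible amount of time' means, here is a back of the envelope calculation. The packet rate is an assumption on my part, roughly what a gigabit link can carry in minimal probe packets, not a measured zmap figure:

```python
# Back of the envelope: one probe packet to every IPv4 address
# at an assumed ~1.4 million packets/second (roughly gigabit
# line rate for minimal probes; an assumption, not a benchmark).
addresses = 2 ** 32
rate = 1.4e6  # packets/second

seconds = addresses / rate
print(f"{seconds / 60:.0f} minutes")  # -> 51 minutes
```

In other words, a single decently connected machine can touch the entire IPv4 address space in under an hour.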

We were all wrong. Those days are very much over now:

@PaulM: Apparently many of you missed it. I took a screenshot of all unauthenticated VNC servers on IPv4. It took 16 minutes. results.survey.tx.ai

Let me repeat that: as a casual thing someone can now scan the entire IPv4 Internet and connect to every visible instance of something (with a reasonably complicated protocol). In sixteen minutes (well, allegedly).

There is no hiding on the IPv4 Internet any more. There is no more obscurity. If you have something out there and someone is interested in finding all instances of it, not only can they do so, they can do so trivially. They don't have to target you specifically; the IPv4 Internet is now a world of large-scale scanning that simply sweeps up absolutely everything.

Implications for the next security hole in something that advertises itself in a banner or even can be detected in a TCP conversation are left as an exercise for the reader.

(These implications have always been there, but there has generally been a theoretical 'worst case' air to them. This is not theoretical any more; this is all too bluntly practical.)

NoMoreIPv4Hiding written at 23:22:38

2013-10-25

Modern disk write caches and how they get dealt with (a quick overview)

Basically all modern disks (SAS, SATA, etc, it doesn't matter) have write caches. I think that most disks these days default to having them on, even in the 'enterprise' space, since enterprise OSes have generally been dealing with disk write caches for some time and so are safe in the face of them.

(In the old days when write caches were just starting to appear, the stereotype was that consumer drives defaulted to enabling it and enterprise drives to disabling it.)

Generally, if you have a disk with its write cache enabled ('WCE' in the jargon), there are three things you can do to force writes to be committed. The blunt hammer is disabling the write cache ('WCD'), which I believe generally has such a bad effect on performance that you don't want to do it; in fact, any number of OS and filesystem setups will turn the write cache on for you if it's turned off (cf).

I believe that all modern disks with write caches support a way to flush the cache; in SCSI (and SAS) this is the SYNCHRONIZE CACHE SCSI command. Of course this has the usual drawback that it flushes even data that you may not care about and not need to be committed to disk right now. As I discovered yesterday, drives can also support a write option called 'Force Unit Access' (FUA) that bypasses the write cache so that what you're writing is forced to disk. In general FUA is bundled with another feature called 'Disable Page Out' (DPO), which tells the drive that putting the data into cache is not useful.

(DPO and FUA can also be used on reads, apparently, but I haven't looked into that at all.)
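As an application programmer you don't normally issue SYNCHRONIZE CACHE or FUA writes yourself; you ask the operating system, which (if the whole storage stack is doing its job) translates your request into cache flushes or FUA writes down at the drive. A minimal Python sketch of the usual interface:

```python
import os
import tempfile

# fsync() is the usual application-level way of saying 'I need this
# committed to stable storage now'; on a well-behaved stack the OS
# turns this into a drive cache flush (or FUA writes) underneath.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"important data\n")
    os.fsync(fd)  # returns only once the OS considers the data durable
    with open(path, "rb") as f:
        data = f.read()
finally:
    os.close(fd)
    os.unlink(path)

print(data)  # -> b'important data\n'
```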

Historically, FUA support has apparently been common and well done in SCSI (and SAS) drives, and SAS drivers routinely support it. FUA support in ordinary SATA drives was apparently a chancy thing in the beginning, and as a result Linux's SATA driver still defaults to pretending that drives do not support FUA (and, I believe, DPO) regardless of what the actual drive reports (other OSes may behave similarly). How SATA drives behind a SAS controller behave and how your SAS controller driver treats them is probably an interesting question that you may want to check for yourself if you have such a setup.

Of course higher level software may make some or all of this academic. Your filesystem may also only bother with write cache flushes instead of trying to be more fancy with selective FUA usage.

(This entry is due to me getting both curious and nervous after discovering FUA in yesterday's entry and then discovering that drives in our SATA to SAS environment claimed to support it when no directly connected SATA drives did.)

ModernDiskWriteCaches written at 00:57:02

2013-10-23

Paying for services is not necessarily enough

There is a meme running around the Internet that if you don't pay for the services you use, you're a sheep. The problem with this is that we have plenty of demonstrations that even paying for services is not necessarily good enough to ensure you won't be turned into a sheep.

My personal demonstration of this is Flickr. I've paid for a Flickr Pro membership (and felt it was worth it) for years, but then Yahoo changed the account structure this summer. Flickr's new price of $50 a year to not be shown ads is well above the current renewal rates for ad-free Pro accounts, and on top of that they no longer offer new Pro accounts. While they have not come out and said it outright, the new pricing to not be shown ads makes it clear both that Pro accounts were not enough for Flickr and what Flickr's new business model is. And that business model is not sustaining themselves on my money; they certainly can't do that at the discounted rate I'm paying now, and perhaps not even at full price. Flickr apparently wants (and perhaps needs) sheep to be sold to Yahoo's real customers, their advertisers.

(I suppose that this too is part of Flickr knowing their focus.)

Well, you may say, of course paying for things by itself is not good enough; you have to pay people who are building a sustainable business. The problem with this advice is that it's very difficult for me to know if your business is really a sustainable one or if you're just papering over the cracks for now. There are signs and indicators, such as the presence of free accounts, but they're far from definitive.

(Not even past success makes the future certain, although it makes it much more likely.)

Does this matter? Maybe. People who are determined to try to pay for sustainable businesses will keep paying, because for them it is partly a moral issue. People who feel that they are getting current value for money will also pay for service (I don't regret paying for Flickr Pro and I felt I got my money's worth, even now). But I do think that this exploitation of even paying customers makes it harder to argue that you should pay for everything when just as good free alternatives exist. If you're likely to be exploited one way or another, you might as well not pay for it.

PS: of course, sometimes a big business will say 'to hell with you' even if you pay them plenty of money because you're worth even more than that as marketing data and they think you have no choice. Bell Canada, essentially the monopoly landline phone provider and part-monopoly Internet provider (and also a mobile provider and so on), has kind of not announced a new non-privacy policy that allows them to use both your Internet usage and your 'calling patterns' as grist for the marketing mill.

PaidServiceNotEnough written at 00:43:00

2013-10-17

There are two cases for changing SSL/TLS cipher settings

I've recently been reading a discussion about TLS cipher settings for a program. The suggestion was made that there should be some sort of abstract high level setting for this, one that would be all that most people should ever need (partly this was proposed on the entirely sensible grounds that TLS cipher settings are insanely difficult to get right unless you're a real expert). While this is perhaps a good idea in theory, I'm not sure it's going to work in practice. You see, in practice I see two reasons to change TLS cipher settings.

We can see the first reason by asking why you wouldn't just always set the cipher settings to the 'high security' setting and be done with it. Well, I can't give you a concrete reason why not, but in general any reasons why you might not want to do that are the high level settings you'd want to control. Perhaps you need to use less CPU, perhaps you need some general TLS option that impacts security, perhaps you need to be compatible with older or odder clients. But the meta-point here is that the high level options chosen need to be meaningful ones and as a result, determining what they should be will take real work and real domain expertise (not just in TLS in general but in the tradeoffs that are commonly seen in the field). You can't just throw something like 'high', 'medium', and 'low' settings into your program and think you're done; at a minimum those are the wrong names for whatever the real choices being made are.

The second reason to set ciphers is to deal with new attacks on TLS ciphers or your TLS implementation (or both at once, if you're unlucky). While this year has not been a good one for TLS, this sort of thing can happen any time and when it does, some people will want to mitigate it in the short term (or need to). There is no real option for this other than direct control over TLS ciphers and other TLS options; no high level option will do it because no high level option can foresee the future.
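In Python's ssl module, for instance, this kind of direct low-level control looks something like the following. The cipher string here is purely an illustration of the mechanism, not a recommendation:

```python
import ssl

# Direct, low-level control over TLS ciphers: an OpenSSL-style
# cipher string restricting what suites the context will offer.
# The exact string is an illustration, not a recommendation.
ctx = ssl.create_default_context()
ctx.set_ciphers("ECDHE+AESGCM:!aNULL:!MD5")

enabled = [c["name"] for c in ctx.get_ciphers()]
print(enabled)
```

This is exactly the sort of knob that lets you react to a new attack tomorrow without waiting years for a program update, and it's also exactly the sort of knob that no high level 'high/medium/low' setting can replace.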

(At some later point the program can be updated either to always be secure or, if there are tradeoffs involved, to have a high level option for it. However in realistic situations this update will take literally years to make it out into the field; before then, your options are low level TLS control or nothing.)

PS: a corollary of the second reason is when TLS best practices change between when your program was written and when you are using it. Again, high level options can't really help you because the program would have had to anticipate the future.

TLSCipherChangesTwoCases written at 01:17:17

2013-10-15

Why I'm not looking for any alternatives to iSCSI for us

Broad scale distributed storage systems such as Ceph are an in thing these days (at least in some quarters). A while back a commentator on this entry suggested looking at them as an alternative to our use of iSCSI and I've been mulling over my reaction since then. Let me put it simply: my reaction is strongly negative. The short reason why is that I see no compelling benefits and all alternatives appear to involve more complexity and magic.

Let's assume that the setuid issue can be dealt with somehow (this is a basic prerequisite). First off, it's worth noting that ZFS plus iSCSI plus backends involves completely commodity hardware and (with Illumos) completely open source software; moving to something like Ceph gives no benefits there.

Our current ZFS plus iSCSI environment has simple components where we understand and can predict (at some level) basically everything that is going on. The distribution of data over physical backends and physical disks is not completely predictable (ZFS pools smear data across all of their components in somewhat unpredictable ways) but it is relatively so, as is the performance of the resulting bits. This is a feature for us. We very much do not want a big black box where magic happens and people's data is distributed over, well, something, somewhere.

I do not want to say that Ceph or other distributed storage systems are going to be black boxes, because I suspect that they aren't and I certainly don't have the experience to say one way or another. But what I can say is that I don't see any way in which they're going to be simpler than our current environment. No matter how you slice it we need filesystems inside pools of storage (that are fixed size but expandable) where those storage pools are mapped to some mirrored disk space. ZFS pools on disks is about as direct an expression of this as you can get and we know that it works and that we can manage it easily. I just don't see how a distributed storage system can do this even better, not without introducing magic that we don't want.

(Given the risks of switching from a known to work environment, it's not enough for a distributed storage system to be just as good as our current system. It must be better, and not just a little bit better; it should be substantially and visibly better.)

PS: I'm not saying that distributed storage systems have no use. I can certainly see situations where something like our ZFS plus iSCSI environment would become unmanageably complex and inflexible, for example. But we are not operating anywhere near that scale today or in the foreseeable future.

Sidebar: ease of use versus magic

It's possible to imagine a distributed storage system that makes our environment easier to manage at one level. You could have this cloud of storage, a storage pool management layer that ensured that everything in it was mirrored, and a set of storage pools or filesystem groups on top of this (with quota or other size limits). Storage would be automatically managed and migrated and all sorts of good things.

The problem is that this system is much more magical and less predictable than our current environment. For instance, we might generally have no idea which storage pools or filesystems are using any particular chunk of storage because the system handles storage distribution for us. We don't consider this a feature, partly because we definitely want the ability to engineer our system so that certain sources of IO load are fenced off from other sources.

DismissingISCSIAlternatives written at 00:20:12

