Wandering Thoughts: Recent Entries

2013-05-10

Illustrating the tradeoff of security versus usability

One of the sessions of the university's yearly technical conference that I went to today was on two-factor authentication using USB crypto tokens (augmented by software on the client). In the talk, it came up that token-aware software can notice when the USB token is removed and do things like de-authenticate you or break a VPN connection. It struck me that this creates a perfect illustration of the tradeoff between security and usability, which I will frame through a question:

When the screen locker activates, should a token-aware application break its authenticated connection to whatever it's talking to and deauthenticate the user, forcing them to reauthenticate by re-entering their token PIN when they come back to the machine? This is clearly the most secure option; otherwise there's no proof that the person who unlocked the screen and is now using the computer is the person who owns the USB token and passed the two-factor authentication earlier.

Some people are enthusiastically saying 'yes' right now. Now, imagine that you're using this two-factor system to authenticate your SSH connections to your servers. Does your opinion change? In fact, does your opinion change about how the system should behave if the token is removed?

The usability issue is pretty simple: tearing down VPNs and breaking SSH sessions and logging you out of applications is secure but disruptive. In some situations it would be actively dangerous, because you'd be interrupting something halfway through an operation (although in this sort of environment all sysadmins would rapidly start using screen or tmux everywhere in self defense). You probably don't want this disruption every time you step away from your machine to go to the office coffee pot, the washroom, or whatever. At the same time you don't want to leave your machine exposed with its screen unlocked.

(In fact the most secure thing to do would be to both lock your screen and take the USB crypto token with you. This is also likely to be maximally disruptive.)

It's worth noting that the more you use your USB token, the more disruptive this is. This is especially punishing to the power users who run authenticated applications all the time and who often or always have multiple ones active at once, possibly with complex state (such as sysadmins with SSH sessions). Unfortunately these may be exactly the people you want to be most secure.

It's tempting to say that the way to improve this situation is to improve the usability by suspending secured sessions instead of breaking them and deauthenticating the user; then users merely have to re-enter their PIN (hopefully only once) instead of re-opening all their secured applications and re-establishing their VPN and SSH connections and so on. In theory you can make this work. In practice, doing this securely requires that the server side of everything supports the equivalent of screen, letting you disconnect and later reconnect.

(If the suspension is done only by client software bad guys can use various physical attacks to compromise an exposed machine, bypass the client suspension, and directly use the established VPN, SSH session, or whatever. You need the server software to force the client to re-authenticate.)

PS: I suspect that you can predict the result of having screen locker activation cause sessions to be broken and people to be deauthenticated. For that matter, you can likely predict the result of having this happen when the USB token is removed (it involves a surprising number of unattended USB tokens, especially in areas that people feel are physically secure, like lockable single-person offices).

SecurityVsUsabilityToken written at 23:39:17

2013-05-05

The original vision of RISC was that it would be pervasive

In the middle of an excellent comment gently correcting my ignorance, a commentator on yesterday's entry wrote:

[RISCs requiring people to recompile programs] has some truth to it, but I would disagree that it was a big bet of RISC. Rather, it was a function of the market niche that classic RISC was confined to: Customers that paid tens or hundreds of thousands of dollars for a high-performance RISC machine would be willing to recompile their code to eke out the best possible performance.

I have to disagree with this because I don't think it matches up with the actual history of what I've been calling performance RISC. To put it one way, RISC was not originally intended to be just for high performance computing; in the beginning and for a fairly long time, RISC was intended to be pervasive. This is part of why in 1992 Andrew Tanenbaum could seriously say (and have people agree with him) that x86 would die out and RISC would become pervasive (cf). He did not mean 'pervasive in HPC'; he meant pervasive in general, across at least the broad range of machines used to run Unix.

The early vision of (performance) RISC was that RISC would supplant the current CISC architectures just as the current 16 and 32-bit CISC architectures had supplanted earlier 8-bit ones. The RISC pioneers may not have been thinking about 'PC' class machines (although Acorn gave it a serious try) but they were certainly thinking about and targeting garden variety Unix workstations and servers. And even in 1990, everyone knew and understood that most Unix servers were not HPC machines and they spent their time doing much more prosaic things. To really be successful and meaningful, RISC needed to be good for those machines at least as much as it needed to be good for the uncommon HPC server.

(Everyone also understood that these machines cost a lot less than full out everything for speed HPC servers. DEC, MIPS, Sun, and so on sold plenty of lower end servers and workstations, so they were well aware of this. I would guess that by volume, far more RISC machines were lower end machines than were high end ones through at least 1995 or so.)

RISC certainly did wind up fenced into the high performance computing market niche in the relatively long run, but that was because it failed outside that niche (run over by the march of the cheap x86 machines everywhere else). The HPC niche was not the original intention and had it been, RISC would have been much less exciting and interesting for everyone.

(And in this general market it was empirically not the case that most people were running code that was compiled specifically for their CPU's generation of optimizations, scheduling, and so on.)

PervasiveRISC written at 00:37:49

2013-05-04

What I see as RISC's big bets

At the time, performance oriented RISC was presented as the obviously correct next step for systems to take. Even today I can casually say that x86 won against RISC mostly because Intel spent more money and have people nod along with it. But I'm not sure that this is really the case, because I think you can make an argument that the whole idea behind (performance) RISC rested on some big bets.

As I see them, the main big bets were:

  1. CPU speeds would continue to be the constraint on system performance. This one is obvious; the only part of the system that a fast RISC improved was the CPU itself and that only mattered if the CPU was the limiting factor.

  2. Compilers could statically extract a lot of parallelism and scheduling opportunities, because this is what lets classic RISC designs omit the complex circuitry for out of order dynamic instruction scheduling.

    (Itanium is an extreme example of this assumption, if you consider it a RISC.)

  3. People would recompile their programs frequently for new generations of CPU chips, because this is a consequence of static scheduling and newer CPUs have different scheduling opportunities from older ones. If you can't make this assumption, new CPUs either run old code unimpressively or need to do dynamic instruction scheduling for old code. Running old code unimpressively (if people care about old code) does not sell new CPUs.

The third bet proved false in practice for all sorts of pragmatic reasons. My impression is that the second bet also wound up being false, with dynamic out of order instruction scheduling able to extract significantly more parallelism than static compiler analysis could. My memory is that together these two factors pushed later generation RISC CPU designs to include more and more complex instruction scheduling, diluting their advantages over (theoretically) more complex CISC designs.

(I'm honestly not sure how the first bet turned out for fast RISC over its era (up to the early 2000s, say). CPUs weren't anywhere near fast enough in those days but my impression is that as the CPU speed ramped up, memory bandwidth and latency issues increasingly became a constraint as well. This limited the payoff from pure CPU improvements.)

I don't see the RISC emphasis on 64 bit support as being a bet so much as an attempt to create a competitive advantage. (I may be underestimating how much work it took AMD to add 64 bit support to the x86 architecture.)

Update: I'm wrong about some of this. See the first comment for a discussion, especially about the out-of-order issue.

RISCBigBets written at 03:02:46

2013-04-30

The two stories of RISC

I said this implicitly in my entry on ARM versus other RISCs but I've realized that I want to say it explicitly: there are (or were) effectively two stories of RISC development. You could see RISC as a way of designing simple chips, or you could see RISC as a way of designing fast chips.

In the story of simple chips, designing a RISC instruction set architecture with its simple and regular set of operations and registers and so on is a great way of designing a simple chip. You don't need complex instruction decoding, you don't need microcode, you don't need all sorts of irregularities in this and that, and so on. You throw out complex instructions that would take a significant amount of silicon in favour of simple ones (and anyways, the early RISC studies suggested that those instructions didn't get used much). The result would be good enough and more importantly, you could actually design and build it.

In the story of fast chips, all of the ISA and implementation simplicity was there to let you design a small chip that you could make go fast (and then scale up easily to make it go even faster). CPU cycle time was a holy thing and every instruction and feature had to be usable by the compiler in practice to make things go faster, not just locally (in code where it could be used) but globally (looking at total execution time across all programs, however you did that). Various RISC research had shown you could throw out a lot of the CISC complexity and push a lot of things to the compiler without slowing code down in practice, so that's what you did. Designing one of these RISC chips was actually a pretty big amount of work, not so much for the raw design as for building the compilers, the simulation environment, and so on.

(These RISC chips were almost invariably built in conjunction with their compilers. The chip and the compiler were two parts of a single overall system.)

Almost all of the RISC chips that people like me have heard of (and lusted after) were designed under the fast chips story; this is MIPS, the DEC Alpha, the Sun SPARC, IBM's Power (later PowerPC aka PPC), Intel's Itanium, and HP's PA-RISC, probably among others. This is the sort of RISC that I learned about from John Mashey's comp.arch posts in the late 1980s and early 1990s and what I still reflexively think of as 'RISC' today. They were what went into Unix servers and workstations (and then later into Macs) and the loss of their elegant, nice architectures to the brute force and money of x86 made many Unix people sad.

(I'd say that I've gotten over my own sadness, but that's not quite what I did. I still don't particularly like the x86 architecture, I just ignore it because my machines are cheap and run fast.)

As I discovered when I researched my entry on ARM, ARM chips are the other story, the unsexy story, the story of simple chips. Unix people like me didn't (and often still don't) really pay them much attention because they were never really server or workstation chips; they didn't appear in machines that we really cared about. Of course the punchline is that they turned out to be the more important sort of RISC chips.

TwoRISCStories written at 23:00:14

2013-04-29

My view of ARM versus other RISCs

Way back in my second entry on x86 winning against RISC a commentator asked:

And now you need to write who how ARM up-ending this orderly structure of the universe.

Writing about this is hampered by my lack of knowledge of the details of ARM history, but after some reading I have my theory: ARM has been successful where other RISCs weren't because from the first it was targeted differently.

(This is not a matter of architecture, as I initially thought before I started reading. The original ARM ISA was a bit odd but no more so than other RISCs.)

The simple version is that all other RISCs saw themselves as competing for the performance crown (against each other and then x86); they quite carefully and quite consciously engineered for performance and then tried to sell their CPUs on that basis. This was a sensible decision because performance was where the money was (and also because there clearly was a lot of room to improve CPU performance). It just happened that Intel was able to spend enough money to scale x86 up enough to crush everyone else with good enough performance for not too much money (and with the other advantages x86 gave them).

Acorn (where ARM started) doesn't seem to have seen itself as building a high-performance CPU. Instead it wanted to build a CPU that met its needs as far as features went, performed well enough, and was simple enough that a small company could design it (after all, early RISCs were designed by a class of graduate students). This gave ARM different design priorities and, just as importantly, meant that Acorn (and later ARM Ltd) didn't spend huge amounts of money on R&D efforts to crank performance up to compete with other CPUs (a race that they would have been doomed to lose). Free from chasing performance at all costs, ARM was both able and willing to adapt its design for people who were interested in other needs.

(I have no particular insight about why ARM won out over similar higher end low-power CPU efforts of the early 1990s. However two things seem worth noting there. First, all of the really big designs seem to have been RISC, which makes sense; if you're building a new embedded CPU, you want something simple. Second, both the Intel i960 and the AMD 29k were made by companies that were also chasing the CPU performance crown; among other things this probably drained their design teams of top talent.)

It's worth noting that part of this difference is a difference in business priorities. One reason that ARM is so widely used is that it is dirt cheap and one fundamental reason that it's dirt cheap is that ARM Ltd licenses designs instead of selling chips. This means that ARM Ltd has made a fraction of the profit that, say, Intel has made from their CPUs. Licensing widely and cheaply is an excellent way to spread your CPUs around but a terrible way to make a lot of money.

(Of course losing the CPU performance race was an even worse way of making money for all of the other RISC companies. But if the RISC revolution had actually worked out, one or more of those companies could have been an Intel or a mini-Intel.)

ARMvsRISC written at 01:24:50

2013-04-19

How I want storage systems to handle disk block sizes

What I mean by a storage system here is anything that exports what look like disks through some mechanism, whether that's iSCSI, AoE, FibreChannel, a directly attached smart controller of some sort, or something I haven't heard of. As I mentioned last entry, I have some developing opinions on how these things should handle the current minefield of logical and physical block sizes.

First off, modern storage systems have no excuse for not knowing about logical block size versus physical block size. The world is no longer a simple place where all disks can be assumed to have 512 byte physical sectors and you're done. So the basic behavior is to pass through the logical and physical block sizes of the underlying disk that you're exporting. If you're exporting something aggregated together from multiple disks, you should obviously advertise the largest block size used by any part of the underlying storage.

(If the system has complex multi-layered storage it should try hard to propagate all of this information up through the layers.)
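As a minimal sketch of the 'advertise the largest block size' rule for aggregated storage, here is one way it could look in Python. The `BlockSizes` type and the `advertised_sizes` function are names made up for illustration, and clamping the physical size so it is never below the logical size is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockSizes:
    logical: int   # bytes
    physical: int  # bytes


def advertised_sizes(members):
    """Return the block sizes to advertise for storage built from `members`."""
    members = list(members)
    if not members:
        raise ValueError("need at least one underlying disk")
    # Take the largest size seen anywhere in the underlying storage.
    logical = max(m.logical for m in members)
    physical = max(m.physical for m in members)
    # Assumption: never advertise a physical size below the logical size.
    return BlockSizes(logical=logical, physical=max(physical, logical))
```

With one 512/512 disk and one 512/4096 disk in the same aggregate, this advertises 512 byte logical blocks and 4096 byte physical blocks.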

You should also provide the ability to explicitly configure what logical and physical block sizes a particular piece of storage advertises. You should allow physical block sizes to be varied up and down from their true value and for logical block sizes to be varied up (and down if you support making that work). It may not be obvious why people need all of this, so let me mention some scenarios:

  • you may want to bump the physical block size of all your storage to 4kb regardless of the actual disks used so that your filesystems et al will be ready and optimal when you start replacing your current 512 byte disks with 4kb disks. (Possibly) wasting a bit of space now beats copying terabytes of data later.

  • similarly you may be replacing 512 byte disks with 4kb disks (because they're all that you can get) but your systems really don't deal well with this so you want to lie to them about it. There are other related scenarios that I'll leave to your imagination.

  • you may want to set a 4 kb logical sector size to see how your software copes with it in various ways. Sometime in the future setting it will also be a future-proofing step (just as setting a 4 kb physical block size is today).

It would be handy if storage systems had both global and per-whatever settings for these. Global settings are both easier and less error prone for certain things; with a global setting, for example, I can make sure that I never accidentally advertise a disk as having 512 byte physical sectors.
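To make the idea of global and per-whatever settings concrete, here is a small Python sketch of how a storage system might pick the sizes it advertises. Everything here is hypothetical: the function name, the dict-based settings, and the sanity checks (power of two, at least 512 bytes, physical not smaller than logical) are assumptions of this sketch and not any real product's behavior.

```python
def effective_sizes(true_logical, true_physical, device_cfg=None, global_cfg=None):
    """Pick the (logical, physical) block sizes in bytes to advertise.

    Per-device settings win over global settings, which win over the
    true values. Each cfg is a dict that may contain 'logical' and/or
    'physical' keys.
    """
    device_cfg = device_cfg or {}
    global_cfg = global_cfg or {}

    def pick(key, true_value):
        return device_cfg.get(key, global_cfg.get(key, true_value))

    logical = pick("logical", true_logical)
    physical = pick("physical", true_physical)

    # Assumed sanity checks; a real system may have different limits.
    for name, size in (("logical", logical), ("physical", physical)):
        if size < 512 or size & (size - 1):
            raise ValueError(f"{name} block size {size} is not a power of two >= 512")
    if physical < logical:
        raise ValueError("physical block size cannot be smaller than logical")
    return logical, physical
```

A global `{"physical": 4096}` then covers the 'never accidentally advertise 512 byte physical sectors' case, while a per-device `{"physical": 512}` lets you lie to one system that copes badly with 4k disks.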

(Why this now matters very much is the subject for a future entry.)

SANAdvertisingBlocksizes written at 02:19:21

2013-04-18

How SCSI devices tell you their logical and physical block sizes

Since I spent today looking this up and working it all out, I might as well write all of this down.

Old SCSI had no distinction between logical and physical size; it just had the block size. Modern SCSI has redefined those old plain block sizes to be the logical block size and then added an odd way of encoding the physical block size. This information is reported through the SCSI operation READ CAPACITY (16), which unlike its stunted older brother READ CAPACITY (10) is not actually a SCSI command; instead it's a sub-option of a general SERVICE ACTION IN command. This may assist you in finding it in code and/or documentation.

(SERVICE ACTION IN is SCSI opcode 0x9E and READ CAPACITY (16) is sub-action 0x10. Nice code will have some #defines or the like for these; other code, well, may not. See the discussion of finding SCSI opcodes and so on in this entry.)
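For illustration, here is a Python sketch of building the command block for this operation. The opcode and sub-action are the ones given above; the rest of the 16-byte layout (an unused address in bytes 2 through 9, the allocation length in bytes 10 through 13, then a reserved byte and a control byte) is my understanding of the SBC specification and not something from this entry. The constant and function names are made up.

```python
import struct

SERVICE_ACTION_IN_16 = 0x9E  # SCSI opcode
SAI_READ_CAPACITY_16 = 0x10  # service action, low five bits of CDB byte 1


def read_capacity_16_cdb(alloc_len=32):
    """Build a 16-byte CDB asking for READ CAPACITY (16) data."""
    return struct.pack(
        ">BBQIBB",
        SERVICE_ACTION_IN_16,
        SAI_READ_CAPACITY_16 & 0x1F,
        0,          # bytes 2-9: logical block address, unused here
        alloc_len,  # bytes 10-13: how many response bytes we will accept
        0,          # byte 14: reserved
        0,          # byte 15: control
    )
```

This only builds the bytes; actually sending them to a device needs an OS-specific pass-through interface, which is outside this sketch.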

The logical block size is returned as a big endian byte count in response bytes 8 through 11 (counting from 0; 0 through 7 give the device's size as the address of its last logical block, again big endian). The size of physical blocks is reported by giving the 'logical blocks per physical block exponent' in the low order four bits of byte 13. If it is set to some non-zero value N, there are 2^N logical blocks per physical block; for 4k sector disks with 512 byte logical blocks the magic exponent is thus 3.
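The byte layout above can be sketched as a small Python decoder. The function name is hypothetical, and it treats bytes 0 through 7 as the address of the last logical block (so the block count is one more than that), which is my understanding of what the field holds.

```python
import struct


def parse_read_capacity_16(data):
    """Decode the start of a READ CAPACITY (16) response.

    Returns (last_lba, logical_block_size, physical_block_size),
    with both sizes in bytes.
    """
    if len(data) < 14:
        raise ValueError("need at least 14 bytes of response data")
    # Bytes 0-7: last logical block address; bytes 8-11: logical block size.
    last_lba, logical = struct.unpack_from(">QI", data, 0)
    # Low four bits of byte 13: logical blocks per physical block exponent.
    exponent = data[13] & 0x0F
    return last_lba, logical, logical << exponent
```

An exponent of 0 gives a physical size equal to the logical size, which is also what you get from devices or code that never fill the field in.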

There is no guarantee that code that uses READ CAPACITY (16) either sets or reads this exponent. My impression is that RC (16) and its use in code predates at least the need to think about the difference and perhaps the actual definition of the field (as opposed to just marking it 'reserved').

Note that some code may talk about or #define 'READ CAPACITY' when it means READ CAPACITY (10). You should ignore this code because no one wants to use RC (10) any more. If there's code that is carefully handling a device capacity case of '0xffffffff', you're reading the wrong code. Yes, this can be confusing.

(One of the problems with READ CAPACITY (10) is that the (logical block) size of the device is limited to a 32-bit field. With 512 byte blocks this translates to a disk size of about 2 TB. It follows that if some old system can't deal with 2 TB SCSI disks, it very likely also has no idea of physical block size versus logical block size.)
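The arithmetic behind that limit, as a tiny Python sketch (the function name is made up, and this ignores that the top 32-bit value is the special '0xffffffff' marker, which only changes the answer by one block):

```python
def rc10_limit_bytes(logical_block_size=512):
    """Largest capacity in bytes that a 32-bit block count can describe."""
    return (1 << 32) * logical_block_size
```

With 512 byte blocks this is 2^41 bytes, or 2 TiB; with 4096 byte logical blocks the same 32-bit field would stretch to 16 TiB.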

I'm developing opinions on how storage systems should handle all of this, but that's going to have to wait for another entry.

SCSIBlocksizesDiscovery written at 00:25:44

2013-04-15

The basics of 4K sector hard drives (aka 'Advanced Format' drives)

Modern hard drives have two sector sizes, the physical sector size and the logical one. The physical sector size is what the hard drive actually reads and writes in; the logical sector size is what you can ask it to read or write (and I believe what logical block addresses are in). The physical block size is always equal to or larger than the logical one. Writing to only part of a physical sector requires the drive to do a read-modify-write cycle.

In the beginning, basically all drives had a 512 byte sector size (for both physical and logical, which weren't really split back then). Today it's difficult or impossible to find a current SATA drive that is not an 'Advanced Format' drive with 4096 byte physical sectors. To date I believe that all 4k drives have a 512 byte logical sector size (call this 4k/512), but in the future that may change so that we see 4k/4k drives.

(At this point I have no idea if vendors want to move to a 4k logical sector size. If they don't move life gets simpler for a lot of people, us included.)

The main issue for 4k/512 drives is partial writes. If you're waiting for the write to complete, a partial write apparently costs you one rotational latency in extra time. If you're not waiting, eg if you're just writing to the drive's write cache (at a volume where it doesn't fill up), you're probably still going to lose overall IOPS.

(The other problem with partial writes is that if things go wrong they can corrupt the data in the rest of the physical sector, data which you didn't think you were writing.)

There are two ways to get partial writes. The first is that your OS simply writes things smaller than the physical block size (perhaps it uses the logical block size for something or just assumes that sectors are 512 bytes and that it can write single ones). The other is unaligned large writes, where you may be issuing writes that are multiples of the physical block size but the starting position is not lined up with the start of physical blocks. Since most filesystems today normally write in 4k blocks or larger, unaligned writes are the most common problem. The extra bonus for unaligned writes is that they give you two partial writes, one at the start and a second at the end, both of which cost you time, IOPS, or both.

(Aligned large writes that are not multiples of the physical block size will also cause partial writes at the end, but I think that this is relatively uncommon today.)
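The different ways of getting partial writes can be sketched as a little Python function that counts how many physical blocks a given write only partly covers. This is just the arithmetic of the cases above, with a made-up function name; it says nothing about how any particular drive actually schedules the resulting read-modify-write cycles.

```python
def partial_blocks(offset, length, physical=4096):
    """Count physical blocks that a write only partly covers.

    offset and length are in bytes. The result is 0, 1, or 2.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    end = offset + length
    first = offset // physical
    last = (end - 1) // physical
    head = offset % physical != 0  # starts in the middle of a block
    tail = end % physical != 0     # ends in the middle of a block
    if first == last:
        # The whole write sits inside one physical block.
        return 1 if (head or tail) else 0
    return int(head) + int(tail)
```

An aligned 4k write gives 0, a lone 512 byte write gives 1, a 4k write starting 512 bytes into a physical block gives 2, and an aligned 6k write gives 1 (at the end).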

Getting writes to be aligned requires that everything in the chain from basic partitioning (BIOS or GPT, take your pick) up through internal OS partitioning and on-disk filesystem data structures be on 4k (or larger) boundaries. This is often not the case for existing legacy partitioning. Frequently the original (and existing) partitioning tools rounded things up (or down) to essentially arbitrary 'cylinder' boundaries using nominal disk geometries that were entirely imaginary and generally arbitrary.

(There was a day when disk geometries were real and meaningful, but that was more than a decade ago for most machines.)
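Checking whether a partition start is aligned is simple arithmetic, sketched here in Python with hypothetical function names. The sector 63 and sector 2048 starting points mentioned afterward are illustrations I've picked (a common old DOS-style start and a common modern 1 MiB start), not something taken from this entry.

```python
def is_aligned(start_sector, logical=512, physical=4096):
    """True if a partition starting at start_sector begins on a physical block."""
    return (start_sector * logical) % physical == 0


def next_aligned_sector(start_sector, logical=512, physical=4096):
    """Round start_sector up to the next physical block boundary."""
    if physical % logical:
        raise ValueError("physical size must be a multiple of logical size")
    per_physical = physical // logical
    # Ceiling division without floating point.
    return -(-start_sector // per_physical) * per_physical
```

A partition starting at sector 63 is not aligned (63 * 512 is 32256 bytes, which is 3584 bytes past a 4k boundary) and the next aligned start is sector 64; a start at sector 2048 is fine.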

Modern disk drives advertise both their physical and logical block sizes (in disk inquiry data). Unfortunately this information may or may not properly propagate up through a complex storage stack (which may involve hardware or software RAID, SAN controllers, logical volume management, virtualization, and so on). The good news is that most modern software aligns things on 4k or larger boundaries regardless of what block size the underlying storage claims to have, so you have at least some chance of having everything work out. The bad news is that you're probably not using all-modern software.
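One way to see what has actually propagated up to the top of your storage stack is to ask the OS what it thinks. As a Linux-specific sketch (an assumption of this example, since other systems expose this differently), the kernel reports both sizes in sysfs under each block device's queue directory. The function name and the `sysfs_root` parameter are made up; the parameter is only there so the function can be pointed at a test directory.

```python
from pathlib import Path


def linux_block_sizes(device, sysfs_root="/sys/block"):
    """Return (logical, physical) block sizes in bytes as Linux reports them.

    device is a kernel block device name such as 'sda'.
    """
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text().strip())
    physical = int((queue / "physical_block_size").read_text().strip())
    return logical, physical
```

If this reports 512/512 for a device that you know sits on top of 4k drives, something in the stack is not passing the information along.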

(This is the kind of thing that I write to get everything fixed in my head, since we're now seriously looking into how badly 4k sector drives are going to impact our fileserver environment.)

Note that some vendors make drives with the same model number that can have different physical block sizes. I have a pair of Seagate 500 GB SATA drives (with the same model number, ST500DM002), bought at the same time from the same vendor, one of which turns out to have 4k sectors and one of which has 512 byte sectors as I expected. Fortunately the difference is basically harmless for what I'm using them for.

(Seagate documents this possibility in a footnote on their technical PDF for the drive series, if you read the small print.)

AdvancedFormatDrives written at 23:45:35

2013-04-12

My view on software RAID and the RAID write hole

The old issue of Software RAID versus hardware RAID came up recently on Twitter, which got Chris Cowley to write Stop the hate on Software RAID, which prompted a small lobste.rs discussion in which people pointed to the RAID 5 write hole as a reason to prefer hardware RAID over software RAID. I've written several entries about how I favour software RAID but I've never talked about the write hole.

(For now let's ignore some other issues with RAID 5 or pretend that we're talking about RAID 6 instead, which also has this write hole issue.)

I'll start by being honest even if it's painful: hardware RAID has an advantage here. Yes, you can (and should) put your software RAID system on a UPS (or two) and so on, but there are simply more parts that can fail abruptly when you're dealing with a full server than when you're dealing with an on-card battery. This doesn't mean either that hardware RAID is risk free (hardware RAID cards fail too) or that software RAID is particularly risky (abrupt crashes of this sort are extreme outliers in most environments), but it does mean that hardware RAID is less risky in this specific respect.

This is where we get into tradeoffs. Hardware RAID has both drawbacks and risks of its own (relative to software RAID). When building any real system you have to assess the relative importance and real world chances of these risks (and how successfully you feel that you can mitigate them), because real systems are almost always a balance between (potential) problems. My personal view is that in general, abrupt system halts are vanishingly rare in properly designed systems. This makes the RAID write hole essentially a non-issue for software RAID.

(Of course there are all sorts of cautions here. For example, if you're operating enough systems the vanishingly rare can start happening more often than you want.)

Thus my overall feeling is (and remains) that most people and most systems are better off with software RAID than with hardware RAID. In practice I think you are much more likely to get bitten by various issues with hardware RAID than you are to blow things up by hitting the software RAID write hole with a system crash or power loss event.

(By the way, if you're seriously worried about the RAID write hole you'll want to carefully verify that your disks actually write data when they tell you that they have. This is probably much less of a risk if you buy expensive 'enterprise' SAS drives, of course.)

SoftwareRAIDAndRAIDWriteHole written at 00:24:19

2013-03-27

What checksums in your filesystem are usually actually doing

The usual way to talk about the modern trend of filesystems with inherent checksums (such as ZFS and btrfs) is to say that the checksums exist to detect data corruption in your files (and in the filesystem as a whole). In an environment with a certain amount of random bit flips, decaying media, periodic hardware glitches, and other sources of damage, it's no longer good enough to imagine that if you wrote it to disk you're sure to read it back perfectly (or to get a disk error). Filesystems with checksums are sentinels, standing on guard for you and letting you know when this has happened to your data.

But this is not quite what they do in practice (generally). This is because they perform this sentinel duty by denying you access to your data. In doing this they implicitly prioritize integrity over availability; better to not give you data at all than to give you data that at least seems damaged. The same is true but even more so if filesystem metadata seems damaged.

(This is similar to the tradeoff disk encryption makes for you.)

You may not be exactly happy with this tradeoff. Yes, it's nice to know if you're reading corrupt data, but sometimes you really want to see that data anyways just to see if you can reconstruct something. This goes even more so for filesystem metadata, especially core metadata; it's not hard to get into a situation where almost all of your data is intact and probably recoverable but the filesystem won't give it to you.

Old filesystems went the other way, and not just by not having any sort of checksums; they often came with quite elaborate recovery tools that would do almost everything they could to get something back. The results might be scattered in little incoherent bits all over the filesystem, but if you cared enough (ie it was important enough), you had a shot at assembling what you could.

(This is still theoretically possible with modern checksumming filesystems but at least some of them are very strongly of the opinion that the answer here is 'restore from backups (of course you have backups)' and so they don't supply any real sort of tools to help you out.)

My opinion is that filesystems ought to support an interface that allows you to get access to even data that fails checksums (perhaps through a special 'no error on checksum error' flag for open()). This wouldn't fix all of the problems (since it wouldn't help in the face of many metadata issues) but it would at least be something and a gesture to agreeing that integrity is not always the most important thing.
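As a toy illustration of the kind of interface I mean, here is a Python sketch of an in-memory block store that checksums its data. By default a read refuses to return data that fails its checksum (integrity over availability); with a flag, it hands over the data anyway and tells you it's suspect. Everything here is hypothetical, including the class, the `ignore_checksum_errors` flag, and the use of CRC32; no real filesystem's API looks like this.

```python
import zlib


class ChecksumError(Exception):
    pass


class ToyBlockStore:
    """Toy in-memory store that keeps a CRC32 next to each block."""

    def __init__(self):
        self._blocks = {}

    def write(self, key, data):
        self._blocks[key] = (bytes(data), zlib.crc32(data))

    def corrupt(self, key, data):
        """Simulate on-disk damage: change the data but not the checksum."""
        _, crc = self._blocks[key]
        self._blocks[key] = (bytes(data), crc)

    def read(self, key, ignore_checksum_errors=False):
        """Return (data, ok); raise ChecksumError unless told to ignore failures."""
        data, crc = self._blocks[key]
        ok = zlib.crc32(data) == crc
        if not ok and not ignore_checksum_errors:
            raise ChecksumError(f"checksum mismatch for {key!r}")
        return data, ok
```

The second return value matters; the point is to let you see damaged data when you ask for it, not to pretend that it's fine.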

FilesystemChecksumEffects written at 22:54:11

This is part of CSpace, and is written by ChrisSiebenmann.
Twitter: @thatcks