Wandering Thoughts


ARM servers had better just work if vendors want to sell very many

A few years ago I wrote about the cost challenge facing hypothetical future ARM servers here; to attract our interest, they'd have to be cheaper or better than x86 servers in some way that we cared about. At the time I made what turns out to be a big assumption: I assumed that ARM servers would be like x86 servers in that they would all just work with Linux. Courtesy of Pete Zaitcev's Standards for ARM computers and Linaro and the follow-on Standards for ARM computers in 2017, I've now learned that this was a pretty optimistic assumption. The state of play in 2017 is that LWN can write an article called Making distributions Just Work on ARM servers that describes not current reality but an aspirational future that may perhaps arrive some day.

Well, you know, no wonder no one is trying to actually sell real ARM servers. They're useless to a lot of people right now. Certainly they'd be useless to us, because we don't want to buy servers with a bespoke bootloader that probably only works with one specific Linux distribution (the one the vendor has qualified) and may not work in the future. A basic prerequisite for us being interested in ARM servers is that they be as useful for generic Linux as x86 servers are (modulo which distributions have ARM versions at all). If we have to buy one sort of servers to run Ubuntu but another sort to run CentOS, well, no. We'll buy x86 servers instead because they're generic and we can even run OpenBSD on them.

There are undoubtedly people who work at a scale with a server density where things like the power advantages of ARM might be attractive enough to overcome this. These people might even be willing to fund their own bootloader and distribution work. But I think that there are a lot of people who are in our situation; we wouldn't mind extra-cheap servers to run Linux, but we aren't all that interested in buying servers that might as well be emblazoned 'Ubuntu 16.04 only' or 'CentOS only' or the like.

I guess this means I can tune out all talk of ARM servers for Linux for the next few years. If the BIOS-level standards for ARM servers for Linux are only being created now, it'll be at least that long until there's real hardware implementing workable versions of them that isn't on the bleeding edge. I wouldn't be surprised if it takes half a decade before we get ARM servers that are basically plug and play with your choice of a variety of Linux distributions.

(I don't blame ARM or anyone for this situation, even though it sort of boggles me. Sure, it's not a great one, but the mere fact that it exists means that ARM vendors haven't particularly cared about the server market so far (and may still not). It's hard to blame people for not catering to a market that they don't care about, especially when we might not care about it either when the dust settles.)

ArmServersHaveToJustWork written at 23:14:50


Overcommitting virtual memory is perfectly sensible

Every so often people get quite grumpy about the fact that some systems allow you to allocate more virtual memory than the system can possibly provide if you actually try to use it all. Some of these people go so far as to say that this overcommitting is only being allowed because of all of those sloppy, lazy programmers who can't be bothered to properly manage memory, and so ask for (far) more memory than they'll actually use.

I'm afraid that these people are living in a dream world. I don't mean that in a pragmatic sense; I mean that in the sense that they are ignoring the fact that we pervasively trade memory for speed all through computing, from programming practices all the way down to CPUs. Over-use and over-allocation of memory is pervasive, and once you put enough of that together what you get is virtual memory that is allocated but never used. It's impossible to be comprehensive because once you start looking, wasted space is everywhere, but here are some examples.

Essentially every mainstream memory allocator rounds up memory allocation sizes or otherwise wastes space. Slab allocation leaves space unused at the end of pages, for example, and can round the size you ask for up to a commonly used size to avoid having too many slabs. Most user-level allocators request space from the operating system in relatively large chunks in order to amortize the cost of expensive operating system calls, rather than go back to the OS every time they need another page's worth of memory, and very few of them do anything like release a single free page's worth of space back to the OS, again because of the overhead.
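To put rough numbers on the rounding, here is a minimal Python sketch. It assumes a made-up policy of power-of-two size classes with a 16-byte minimum; real allocators use their own (usually finer-grained) size classes, and the function names here are purely illustrative.

```python
def round_up_to_size_class(request: int, min_class: int = 16) -> int:
    """Round a request up to the next power-of-two size class.

    The power-of-two policy and the 16-byte minimum are assumptions
    for illustration, not any particular allocator's behavior.
    """
    if request <= 0:
        raise ValueError("request must be positive")
    size = min_class
    while size < request:
        size *= 2
    return size


def wasted_bytes(requests) -> int:
    """Total bytes handed out beyond what was actually asked for."""
    return sum(round_up_to_size_class(r) - r for r in requests)
```

Under this policy a 17-byte request occupies 32 bytes and a 4097-byte request occupies 8192, so almost half of the latter is allocated but never used.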

(There used to be memory allocators that essentially never returned space to the operating system. These days it's more common to give free space back eventually if enough of it accumulates in one spot, but on Unix that's because how we get memory from the operating system has changed in ways that make it more convenient to free it up again.)

It's extremely common for data structures like hash tables to not be sized minimally and to be resized in significant size jumps. We accept that a hash table normally needs a certain amount of empty space in order to keep performing well for insertions, and when entries are deleted from a large hash we often delay resizing the hash table down, even though we're once again wasting memory. Sometimes this is explicitly coded as part of the application; at other times it is hidden in standard language runtimes or generic support libraries.
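As a rough illustration of the size jumps, here is a toy Python model. It assumes a table that starts at 8 slots and doubles whenever the load factor would go over 2/3; both numbers are assumptions picked for the sketch, not the policy of any specific hash table implementation.

```python
def capacity_for(entries: int, initial: int = 8) -> int:
    """Smallest capacity (reached by doubling) that keeps load <= 2/3.

    The initial size and the 2/3 load limit are illustrative assumptions.
    """
    capacity = initial
    # Integer form of "entries / capacity > 2/3" to avoid float rounding.
    while entries * 3 > capacity * 2:
        capacity *= 2
    return capacity


def empty_slots(entries: int) -> int:
    """Slots that are allocated but hold no entry."""
    return capacity_for(entries) - entries
```

In this model 1000 entries need a 2048-slot table, so slightly more than half of the slots are empty, and a table that never shrinks keeps that capacity even after most entries are deleted.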

Speaking of things in standard language runtimes, delayed garbage collection is of course a great way to have wasted memory and to have more memory allocated from the operating system than you actually need. Yet garbage collection is pervasively seen as a valid programming technique in large part because we've accepted a tradeoff of higher memory 'use' in exchange for faster and more reliable program development, although it can also be faster under some circumstances.

And finally there's object alignment in memory, which generally goes right down to the CPU to some degree. Yes, it's (only) bytes, but again we've shown that we're willing to trade memory for speed, and there are ripple effects as everything grows just a little bit bigger and fits just a little bit less well into memory pages and so on.
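One easy way to see alignment padding is Python's standard `struct` module, which can compute the size of the same fields with native C alignment (`@`) and with no padding at all (`=`). The helper name below is mine; the exact native numbers depend on your platform and compiler ABI.

```python
import struct


def padding_bytes(fmt: str) -> int:
    """Bytes of alignment padding the native C layout adds for these fields.

    '@' uses native alignment, '=' packs the fields with no padding.
    Only use format codes whose native and standard sizes agree
    (such as 'b', 'i' and 'd' on mainstream platforms).
    """
    native = struct.calcsize("@" + fmt)
    packed = struct.calcsize("=" + fmt)
    return native - packed
```

For example, `padding_bytes("bi")` (a one-byte field followed by a four-byte int) is typically 3 on common platforms, because the int is pushed out to a four-byte boundary.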

There are environments where people very carefully watch every little scrap of space and waste as little of it as possible, but they are not mainstream ones. In mainstream computing, trading memory for speed is a common and widely accepted thing; the only question is generally how much memory is worth trading for how much speed.

Sidebar: Memory waste and APIs

On Unix, one of the tradeoffs that has been made from the very start is that there are some simple, general, and flexible (and powerful) APIs that strictly speaking may commit the kernel to supplying massive amounts of memory. I am talking here primarily of fork() and especially the fork()/exec() model for starting new processes. I maintain that fork() is a good API even though it very strongly pushes you towards a non-strict handling of memory overcommit. In this it illustrates that there can be a tradeoff between memory and power even at the API level.
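Here is a small Python sketch of the pattern, assuming a Unix system where `os.fork()` is available. The parent holds a large writable buffer, the child is notionally entitled to its own copy of it, and yet the child exits immediately (standing in for an `exec()`) without ever touching it. The function name and buffer size are just for illustration.

```python
import os


def fork_and_exit_child(buffer_size: int = 16 * 1024 * 1024) -> int:
    """Fork while holding a big buffer; return the child's exit status.

    Strictly speaking the kernel has promised the child its own copy of
    'big', but the child never writes to it, so that memory is never needed.
    """
    big = bytearray(buffer_size)  # writable memory owned by the parent
    pid = os.fork()
    if pid == 0:
        # Child: a stand-in for exec(); exit without touching 'big'.
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    assert len(big) == buffer_size  # parent's buffer is still intact
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

A kernel doing strict accounting has to reserve room for a second copy of the buffer at the moment of `fork()`, even though in the fork()/exec() case it is almost never used.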

OvercommittingMemoryIsSensible written at 01:27:38


An AMD Ryzen is unlikely to be my next desktop's CPU

I'm probably going to wind up building a new home machine this year, to replace my current five year old one. One of the reasons I haven't been doing much on this yet is that both Intel and AMD only recently released their latest desktop CPU lines, Kaby Lake and Ryzen respectively. AMD has not been particularly competitive in CPUs for years now, so there's been a lot of hope attached to Ryzen; plenty of people really want AMD to come through this time around so Intel would face some real competition for once.

I have an unusual reason to be interested in Ryzen, which is that I would like ECC memory if possible. For a while, one of the quiet attractions of AMD is that they've been much more generous about supporting ECC in their CPUs and chipsets than Intel, who carefully locks ECC away from their good desktops. If Ryzen was reasonably competitive in CPU performance, had reasonable thermal performance, and supported ECC, it suddenly might be worth overlooking things like single-threaded performance (despite what I've written about that being a priority, because I'm fickle).

The best current information I've absorbed is via Dan McDonald's Twitter, here (original, with more questions and so on) and a qualification here; unfortunately this is then followed up by more confusion. The short form version appears to be that ECC support is theoretically in the Ryzen CPUs and the AM4 chipset, but it is not qualified by AMD at the moment and it may not be enabled and supported by any particular actual AM4 Ryzen motherboard, and there are some indications that ECC may not actually be supported in this generation after all.

The really short form version: I shouldn't hold my breath for actual, buyable AMD Ryzen based desktop motherboards that support ECC RAM (in ECC mode). They may come out and they may even work reliably when they do, but ECC is clearly not a feature that either AMD or the motherboard vendors consider a priority.

With ECC off the table as a motivation, the rest of Ryzen doesn't look particularly compelling. Although AMD has gotten closer this time around, Ryzen doesn't have the raw CPU performance of Intel's best CPUs and perhaps not their good thermal performance either; for raw single CPU performance at reasonable cost and TDP, Intel's i7-7700K is still close to being the champion. Ryzen attempts to make up the deficit by throwing more cores at it, which I'm not excited by, and by being comparatively cheaper, which doesn't motivate me much when I seem to buy roughly one desktop every five years.

(As an additional drawback, current Ryzen CPUs don't have integrated graphics. I hate the idea of venturing into the graphics card swamps just to drive a 4K monitor.)

Still, Ryzen's a pretty good try this time around. If I ran a lot of virtual machines on my (office) desktop machine, Ryzen might be an interesting alternative to look at there, and I hope that there are a bunch of people working on cramming them into inexpensive servers (especially if they can get those servers to support ECC).

AMDRyzenEarlyViews written at 00:56:48


Another risk of hardware RAID controllers is the manufacturer vanishing

We recently inherited a 16-drive machine with a 3ware hardware RAID controller and now I'm busy trying to put it to work. Our first preference is to ignore the RAID part and just use the raw disks, which may or may not work sufficiently well (the omens are uncertain at the moment). If we have to use the 3ware hardware RAID, we'll need the proprietary tools that 3ware supplies. And that has turned out to be a problem.

Once upon a time, 3ware was an independent company that made well regarded (at the time) hardware RAID controllers that were also a popular way to do JBOD IDE and SATA disks. Then it was bought by AMCC, which was bought by LSI, which was bought by Avago (now Broadcom). The 3ware website currently points to an Avago IP address that doesn't respond, and good luck finding links for anything that still works, or rather links that point to official sources for it (lots of people have made copies of this stuff and put them up on their own websites). At one point it looked like I might have to resort to the Wayback Machine in order to get something, although that probably didn't have the actual files we'd need.

(If you're ever in this situation, it turns out that you can dig things out of the Broadcom website with enough work. The downloads you want are in the 'Legacy Products' category, for example through this search link.)

I've been generally down on hardware RAID over the years for various reasons, including performance, ease of management and diagnostics, and the portability of software RAID across random hardware. But I have to admit that until now I hadn't really considered the risk of the maker of your hardware RAID card simply vanishing and taking with it the associated software that you needed to actually manage and monitor the RAID at anything except a very basic level.

(Monitoring is especially important for hardware RAID, where without special software you may not get notified about a failed disk until the second one dies and takes your entire array with it. Or a third one, for people using RAID-6.)

Of course even if the company doesn't vanish, products do get deprecated and software related to them stops being maintained. I'm reasonably hopeful that the 3ware utilities will still run on a modern Linux system, but I'm not entirely confident of it. And if they don't, we don't really have very many options. People who use less popular operating systems may have even bigger problems here (I think current versions of Illumos may have wound up with no support for 3ware, for example).

HardwareRAIDManufacturerRisk written at 01:16:54


I'm too much of a perfectionist about contributing to open source projects

I find it hard to contribute changes to open source projects, and in fact even the thought of trying to do so usually makes me shy away. There are a tangled bunch of reasons for why, but I've come to realize that a part of it is that I'm nervous about my work being perfect, or at least very good. Take a documentation change, for example.

If I'm going to make any public contribution of this nature, for example to make a correction to a manpage, I'm going to worry a great deal about how my writing reads. Does it flow well? Does it fit in? Have I written some clumsy, clunky sentences that are forever going to sit there as a stinker of a legacy, something I'll wind up wincing about every time I read the manpage myself? I write enough things here that don't quite work or that make me wince at the phrasing in retrospect, so it's very much a possibility and something I'm aware of.

Then there's the question of what I'm writing down. Do I actually understand the situation correctly and completely? If I'm writing about how to do something, have I picked the best way to do it, one that works effectively and is easy to do? What about backwards compatibility concerns, for example if this is different on older Linux kernels? Does the project even care about that, or do they want their documentation to only reflect the current state of affairs?

I'm not saying that I should throw half-baked, half-thought-out things at projects; that's clearly a bad idea itself. But many of these worries I wind up with are probably overblown. Maybe I don't get my writeup entirely complete on the first submission, and the existing people in the project have to point out stuff that I missed. That's probably okay. But there's this gnawing worry that it's not. I don't want to be an annoyance for open source projects and (potentially) incomplete work feels like it makes me into one.

There's also that making changes to a decent sized project feels like a terrifyingly large responsibility. After all, I could screw something up and create bugs or documentation problems or whatever, for a whole lot of people. It's much easier to just submit bug reports and make suggestions about fixes, which leaves the responsibility for the actual changes to someone else.

(Even with bug reports I sometimes feel nervous about submitting them. There's the problem of putting in too much detail, and the social effects of bad ones, among other things.)

(Of course, submitting good changes is hard, too. It really is a surprisingly large amount of work to do it right. But that's another issue from being (overly) nervous about how good my work is, and in theory many projects are welcoming of incomplete or first-pass change submissions. I doubt I'll ever persuade myself to take them up on their offer, though.)

HardOpenSourceContributionsForMe written at 03:22:50


Different ways you can initialize a RAID-[567+] array

I was installing a machine today where we're using Linux software RAID to build a RAID-6 array of SATA HDs, and naturally one of the parts of the installation is creating and thus initializing the RAID-6 array. This is not something that goes very fast, and when I wandered past the server itself I noticed that the drive activity lights were generally blinking, not on solid. This got me thinking about various different ways that you might initialize a newly created RAID-N array.

It's obvious, but the reason newly created RAID-N arrays need to be initialized is to make the parity blocks consistent with the data blocks. The array generally starts with drives where all the blocks are in some random and unknown state, which means that the parity blocks of a RAID stripe are extremely unlikely to match with the data blocks. Initializing a RAID array fixes this in one way or another, so that you know that any parity mismatches are due to data corruption somewhere.

The straightforward way to initialize a RAID-N array is to read the current state of all of the data blocks for each stripe, compute the parity blocks, and write them out. This approach does minimal write IO, but it has the drawback that it sends an interleaved mixture of read and write IO to all drives, which may slow them down and force seeking. This happens because the parity blocks are normally distributed over all of the drives, rotating from drive to drive with each stripe. This rotation means that every drive will have parity blocks written to it and no drive sees pure sequential read or write IOs. This way minimizes write IO to any particular drive.
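To make the straightforward approach concrete, here is a minimal Python sketch. It assumes single XOR parity (RAID-5 style); RAID-6's second parity block needs different arithmetic that I'm leaving out. The in-memory representation (a list of stripes, each a list with one block per drive) and all of the names are mine for illustration, not how Linux software RAID actually does it.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a sequence of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def initialize_stripe(stripe, parity_index):
    """Read the data blocks, compute parity, and 'write' only the parity block."""
    data = [blk for i, blk in enumerate(stripe) if i != parity_index]
    fixed = list(stripe)
    fixed[parity_index] = xor_blocks(data)
    return fixed


def initialize_array(stripes):
    """Initialize every stripe, rotating the parity block from drive to drive."""
    ndrives = len(stripes[0])
    return [initialize_stripe(s, n % ndrives) for n, s in enumerate(stripes)]
```

Only one block per stripe is written, but because the parity position rotates, every drive sees a mix of reads (for its data blocks) and writes (for its parity blocks).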

A clever way to initialize the array is to create it as a degraded array and then add new disks. If you have an M disk array with N-way parity, create the array with M-N disks active. This has no redundancy and thus no need to resynchronize the redundancy to be correct. Now add N more disks, and let your normal RAID resynchronization code go into effect. You'll read whatever random stuff is on those first M-N disks, assume it's completely correct, reconstruct the 'missing' data and parity from it, and write it to the N disks. The result is random garbage, but so what; it was always going to be random garbage. The advantage here is that you should be sequentially reading from the M-N disks and sequentially writing to the N disks, and disks like simple sequential read and write IO. You do however write over all of the N disks, and you still spend the CPU to do the parity computation for every RAID stripe.
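Here is a rough Python sketch of the 'create degraded, then add a disk' trick, again simplified to single XOR parity (so N is 1; real RAID-6 reconstruction needs more than XOR). Each disk is modeled as a list of blocks, and the names are hypothetical. With XOR parity it doesn't matter whether the 'missing' block in a given stripe is data or parity; it is always the XOR of the other blocks.

```python
def xor_blocks(blocks):
    """XOR a sequence of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def rebuild_onto_new_disk(existing_disks):
    """Sequentially read every existing disk and compute the new disk's blocks.

    Whatever is on the existing disks is assumed to be correct, garbage or not.
    """
    return [xor_blocks(stripe_blocks) for stripe_blocks in zip(*existing_disks)]


def is_consistent(disks):
    """True if every stripe's blocks XOR to all zeros (parity matches data)."""
    return all(not any(xor_blocks(stripe)) for stripe in zip(*disks))
```

After `new = rebuild_onto_new_disk(disks)`, the array `disks + [new]` passes `is_consistent()` no matter what random contents the original disks had, and the IO pattern is pure sequential reads on the old disks plus pure sequential writes on the new one.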

The final way I can think of is to explicitly blank all the drives. You can pre-calculate the two parity blocks for a stripe with all zeros in the data blocks, then build appropriate large write IOs for each drive that interleave zero'd data blocks and the rotating parity blocks, and finally blast these out to all of the drives as fast as each one can write. There's no need to do any per-stripe computation or any read IO. The cost of this is that you overwrite all of every disk in the array.

(If you allow people to do regular IO to a RAID array being initialized, each scheme also needs a way to preempt itself and handle writes to a random place in the array.)

In a world with both HDs and SSDs, I don't think it's possible to say that one approach is right and the other approaches are wrong. On SSDs seeks and reads are cheap, writes are sometimes expensive, and holding total writes down will keep their lifetimes up. On HDs, seeks are expensive, reads are moderately cheap but not free, writes may or may not be expensive (depending in part on how big they are), and we usually assume that we can write as much data to them as we want with no lifetime concerns.

PS: There are probably other clever ways to initialize RAID-N arrays; these are just the three I can think of now.

(I'm deliberately excluding schemes where you don't actually initialize the RAID array but instead keep track of which parts have been written to and so have had their parity updated to be correct. I have various reactions to them that do not fit in the margins of this entry.)

PPS: The Linux software RAID people have a discussion of this issue from 2008. Back then, RAID-5 used the 'create as degraded' trick, but RAID-6 didn't; I'm not sure why. There may be some reason it's not a good idea.

RAIDNInitializationOptions written at 00:53:39


Why having CR LF as your line ending is a mistake

In my entry on what we still use ASCII CR for today, I mentioned in passing that it was unfortunate that protocols like HTTP had continued to specify that their line ending was CR LF instead of plain LF, and called it a mistake. Aneurin Price disagreed with this view, citing the history that CR LF was there first as a line ending. This history is absolutely true, but it doesn't change that CR LF is a mistake today and pretty much always was. In fact, we can be more general. The mistake is not specifically CR LF; the mistake is making any multi-byte sequence be your line ending.

The moment you introduce a multi-byte line ending sequence you require every piece of code that wants to recognize line endings to use some sort of state machine, because you have to recognize a sequence. A CR by itself is not a line ending, and a LF by itself is theoretically not a line ending; only a CR LF combined is a line ending, and you must recognize that somehow. This state machine may be as (apparently) simple as using a library call 'find the sequence \r\n' instead of a library call 'find the byte \n' (or \r on old Macs), or it may be more elaborate when you are attempting to read an IO stream character by character and stop the moment you hit end-of-line. But you always need that state machine in some form, and with it you need state.

If you have a single byte line terminator, life is much easier. You read until you find the byte, or you scan until you find the byte, and you are done. No state is needed to recognize your end of line marker.

(There's also no ambiguity about what you should do when you see just one byte of the line terminator, and thus no disagreement and different behavior between implementations. Such differences definitely exist in handling CR LF and they lead to various sorts of problems in practice.)
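To illustrate the difference, here is a minimal Python sketch of reading one line a byte at a time from a stream, first with a single-byte LF terminator and then with CR LF. The function names are mine, and the CR LF version makes one particular (assumed) choice about the ambiguous cases: a lone CR or a lone LF is treated as ordinary data.

```python
import io


def read_line_lf(stream):
    """Read up to and including the first LF; no extra state is needed."""
    line = bytearray()
    while True:
        b = stream.read(1)
        if not b:  # end of stream
            return bytes(line)
        line += b
        if b == b"\n":
            return bytes(line)


def read_line_crlf(stream):
    """Read up to and including the first CR LF pair.

    'saw_cr' is the state machine: we must remember whether the previous
    byte was a CR. Lone CRs and lone LFs are kept as data (one policy
    among several that implementations have chosen).
    """
    line = bytearray()
    saw_cr = False
    while True:
        b = stream.read(1)
        if not b:  # end of stream
            return bytes(line)
        line += b
        if saw_cr and b == b"\n":
            return bytes(line)
        saw_cr = (b == b"\r")
```

For instance, `read_line_crlf(io.BytesIO(b"a\nb\rc\r\nrest"))` returns `b"a\nb\rc\r\n"`, while `read_line_lf()` on the same input stops at the first LF and returns `b"a\n"`. The extra `saw_cr` variable is small, but every piece of code that reads CR LF lines has to carry some version of it and pick a policy for the stray bytes.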

The decision by Unix and Mac OS to have a single character represent logical end of line in their standard text format regardless of how many ASCII characters had to be printed to the terminal to actually achieve a proper newline is the correct one. It simplifies and quietly slightly speeds up a huge amount of code, at the minor cost (on Unix) of requiring some more smarts inside the kernel.

(This is also the right place to put the smarts, since far more text is processed on typical systems than is ever printed out to the terminal. The place to pay the cost is at the low-frequency and central spot of actually displaying text to the user, not the high-frequency and widely spread spot of everything that processes text line by line.)

PS: The relevant Wikipedia page credits the idea of using a single character for logical end of line and converting it on output to Multics, which picked LF for this job for perfectly reasonable reasons. See the Newline history section.

WhyCRLFIsAMistake written at 21:39:07


Conversations, conversational units, and Twitter

A while back, Glyph wrote an article more or less decrying 'tweetstorms', which are more or less multi-tweet sequences of remarks on Twitter, and suggested that they should be blog posts instead (he later added an important followup). On the one hand I sort of agree with Glyph. Twitter doesn't make it very nice to read long sequences of tweets, and when I wind up trying to follow such things I generally wish I could get them in a blog post, even if it was just a copy of all of the tweets. On the other hand, I think that 'no tweetstorms' is going too far, because it conjoins an arbitrary limit (Twitter's 140 characters) with what I think people want, which is basically one remark in a conversation.

(I admit that my ox is mildly gored here.)

Twitter is not exactly a conversation, especially when you are starting a series of tweets from scratch, but it is close enough that there's a definite crossover. In my own Twitter usage, sometimes I can get what I want to say into a single tweet, but sometimes really covering it requires elaboration or what I need to say is simply larger than 140 characters. But what all of these cases have in common is that they are what I consider a single conversational unit, for lack of a better term. Each of them is a single small-scale thought, and so they're all the sort of thing I might say as a remark in a back-and-forth conversation.

I think that people are happy to both read and make genuine remarks on Twitter, even if they go over 140 characters and thus have to be done through several tweets. In a normal, natural conversation, you should be reasonably short but no one expects you to speak in a single short sentence and then stop. A big tweetstorm is different, though; it's not a remark, it's a speech, and people can tell. If you're going to make a speech, that's when you should get a blog.

In practice it's probably more like the distinction between writing a single paragraph or a big multi-paragraph thing. Short runs of tweets are basically a single paragraph spread out across several tweets, while a big tweetstorm would almost always clearly translate to multiple paragraphs if written out in pretty much any other form.

(I find Twitter's 140 character limit interesting and even sort of fascinating because it so clearly affects how people write on it, myself included. And sometimes this includes not writing things when I can't find a good way to fit them into the confines of a tweet or three. I'm sure that all mediums affect what and how we write, but for me it's rare to feel it so clearly. For that matter, having the character count for pending tweets directly visible affects how I write them.)

ConversationalUnitsAndTwitter written at 02:16:22


Some notes on 4K monitors and connecting to them

For reasons beyond the scope of this entry, I'm probably going to build a new home machine this year, finally replacing my current vintage 2011 machine. As part of this (and part of motivating me into doing it), I'm going to persuade myself to finally get a high-resolution display, probably a 27" 4K monitor such as the Dell P2715Q. Now, I would like this hypothetical new machine to drive this hypothetical 4K+ monitor using (Intel) motherboard graphics, which means that I need a motherboard that supports 4K at 60 Hz through, well, whatever connector I should have. Which has sent me off on a quest to understand just how modern monitors connect to modern computers.

(It would be simple if all motherboards supported 4K at 60 Hz on all the various options, but they don't. Just among the modest subset I've already looked at, some motherboards do DisplayPort, some do HDMI, and some have both but not at 4K @ 60 Hz for both.)

As far as I can tell so far, the answer is 'DisplayPort 1.2' or better. If I wanted to go all the way to a 5K display at 60 Hz, I would need DisplayPort 1.3, but 5K displays appear to still be too expensive. Every 4K monitor I've looked at has DisplayPort, generally 1.2 or 1.2a. HDMI 2.0 will also do 4K at 60 Hz and some monitors have that as well.

(That 4K monitors mostly don't go past DisplayPort 1.2 is apparently not a great thing. DisplayPort allows you to daisy-chain displays but you have to stay within the total bandwidth limit, so a 4K monitor that wants to let you daisy-chain to a second 4K monitor needs at least one DP 1.3+ port. Of course you'd also need DisplayPort 1.3+ on your motherboard or graphics card.)
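The bandwidth arithmetic behind these conclusions can be sketched in a few lines of Python. This is a back-of-the-envelope calculation that assumes 24 bits per pixel and ignores blanking intervals and protocol overhead (so real requirements are somewhat higher); the link figures are the usable data rates after 8b/10b encoding, and the names are mine.

```python
# Usable data rates in Gbit/s after 8b/10b encoding overhead.
LINK_GBITS = {
    "HDMI 1.4": 8.16,
    "HDMI 2.0": 14.4,
    "DisplayPort 1.2": 17.28,
    "DisplayPort 1.3": 25.92,
}

UHD_4K = (3840, 2160)
UHD_5K = (5120, 2880)


def pixel_gbits(width, height, hz=60, bits_per_pixel=24):
    """Raw pixel data rate in Gbit/s, ignoring blanking (a simplification)."""
    return width * height * hz * bits_per_pixel / 1e9


def fits(link, *displays):
    """True if all displays at 60 Hz fit within the link's usable rate."""
    return sum(pixel_gbits(*d) for d in displays) <= LINK_GBITS[link]
```

A single 4K display at 60 Hz needs about 11.94 Gbit/s, which fits in DisplayPort 1.2 and HDMI 2.0 but not HDMI 1.4. A 5K display (about 21.23 Gbit/s) or two daisy-chained 4K displays (about 23.89 Gbit/s) are over DisplayPort 1.2's limit but within DisplayPort 1.3's, which matches the version requirements above.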

Adding to the momentum of DisplayPort as the right choice is that there are also converters from DisplayPort 1.2 to HDMI 2.0 (and apparently not really any that go the other way). So a motherboard with DisplayPort 1.2 and support for 4K at 60 Hz over it can be used to drive a HDMI 2.0-only monitor, if such a thing even exists (there are probably HDMI 2.0 only TVs, but I'm not interested in them).

I assume that having HDMI 2.0 on motherboards helps if you want to drive a TV, and that having both DisplayPort 1.2 and HDMI 2.0 (both with 4K at 60 Hz support) might let you drive two 4K displays if one of them has HDMI 2.0. The latter feature is not interesting to me at the moment, as one 27" display is going to take up enough desk space at home all on its own.

(As usual, searching for and comparing PC motherboards seems to be a pain in the rear. You'd think vendors would let you easily search on 'I want the following features ...', but apparently not.)

Driving4KMonitorsNotes written at 03:04:18


People may be accepting that security questions are a bad idea

Maybe, once upon a time, security questions made a kind of sense. If so, it was back in a much more innocent time before a ton of information about people was available to be found and searched through. These days, almost any question that's easy for people to remember the answer to is also too easy for other people to find out. None of this is news to security researchers, but people keep using security questions (and security conscious people keep making up random answers and then having to record them somehow). However I've now seen a hopeful sign that that may be changing.

Yahoo recently forced me to change my Yahoo account's password (which I only have because of my Flickr account). When I went through this process, I discovered something interesting: Yahoo very strongly urged me to disable my security question.

(I left it turned on for now because it's got a random answer.)

Also of interest to me was that Yahoo didn't seem to feel any need to explain why disabling my security question would be a good idea; they just asked me to. I assume that either they think it's obvious to people or they don't think they can write enough documentation to matter.

If a large place like Yahoo is pushing away from security questions (and for all I know, may have been doing so for some time now), I can hope that this is going to spread. Not having to record several random passwords for various sites certainly would make my life easier.

(Of course I'm sure that sites will come up with equally annoying alternatives. Maybe some of them will start absolutely insisting on some form of two-factor authentication.)

SecurityQuestionsAcceptedAsBad written at 01:46:03
