Categories: links, linux, programming, python, snark, solaris, spam, sysadmin, tech, unix, web.
2012-02-09

What supporting a production OS means for me

Via Pete Zaitcev I just read FreeBSD and release engineering (lwn.net), which is about some issues people are having with the FreeBSD release schedules. That article will probably be a Rorschach test for at least non-FreeBSD people, because how you react to the ideas in it will probably depend on how you see 'production support'. Before I get into my reactions to it (in another entry), I've decided to write down how I see production support. To me:
There should be some overlap in full support between release X and release X+1, because people are not necessarily ready to start installing new machines with release X+1 the moment it comes out.

A production release should not change the code of existing features and systems beyond genuine bug fixes and security updates (and for new hardware support if you really, really have to do so). I really do mean 'the code of', not just 'the behavior of', because any code change adds the possibility of bugs and incompatibilities. I don't trust OS vendors to not accidentally introduce either of these in code changes, so I want code changes minimized as much as possible.

I don't mind if a production release (in either full support or legacy support) gets genuinely new features and software provided that those features are optional and are written in such a way that there is no possibility that they will change or destabilize systems that do not use them. For example, if you want to add iSCSI target support as some new programs and a new loadable kernel module, knock yourself out and I don't care. But if your new iSCSI target driver requires changes to general kernel code, no, forget it; that's too dangerous.

(Another way to put this is that new features are analogous to new hardware support, except that unlike new hardware support you don't get to change anything that already exists. If you can't add it without changing existing things, you don't get to add it at all.)

I don't think that you should allow optional replacement of existing software with new software versions (eg, an optional upgrade from Apache 1.x to Apache 2.x) because it creates a combinatorial explosion in significant variations on the initial production OS release. Among other issues, you can easily get into a situation where no one actually really supports running the original Apache 1.x version any more because everyone has made the 'optional' upgrade to Apache 2.x, which effectively makes the upgrade mandatory instead.

(The extreme version of this is how Debian stable and unstable used to be, where practically no one in the Debian development community actually used the stable version any more.)
A general point about SSH personal keys

Recently I've seen a number of articles on suggested good ways to use SSH securely and other SSH tricks (unfortunately I can't find URLs to all of them, so I'm not going to try to put any here). As it happens I have a few modest suggestions on this, but before I started I wanted to make a broad meta-point about the use of personal SSH keys, aka SSH identities.

The big thing to understand about all advice about SSH personal keys is that when you choose to use personal keys for your own logins, you are deciding to balance convenience with security. After all, if security was your primary concern you would not use personal keys at all; you would use one time passwords with two-factor authentication.

(Things are different for cron'd scripts and the like, when there is no human there to interact with the system. I'm purely talking about using SSH identities to avoid typing passwords.)

Now, everyone has different views of the amount of security that they need and the convenience that they want. People fall along a spectrum between the two poles and where you wind up is not necessarily where I do. Thus, people's security advice about personal keys is not necessarily right for you even if it's correct (in some sense). The trick is to understand your particular tradeoffs and circumstances, to figure out what irritates you and what you need, and then to pick what works for you rather than blindly following someone else's suggestions and being either frustrated or dangerously insecure (in your environment) or both.

Yes, some things will make you less secure than others but they can also be more convenient (and vice versa). Sometimes this is the right tradeoff for you and sometimes it is not (even if it's the right tradeoff for me or whoever you're reading).

And yes, there are some SSH tricks that usually increase both security and convenience. These are excellent things to know when you can find them. (Sadly, my suggestions to come are not of this nature.)

PS: as always when you consider security related issues, you want to think about not just security in the abstract but security in the concrete in your environment with your risks.
2012-02-08

Choosing the superblock format for Linux's software RAID

Linux's software RAID implementation stores metadata about the RAID device in each physical device involved in the RAID, in what [...]

(Even if you don't actively make a decision, [...]

In my opinion, at the moment there are three sensible options to choose from: the 0.90 format and then two variants of the 'version-1' metadata format.

(You can see what format your current RAID arrays are using by looking at [...]

Where the superblock goes is potentially important for RAID-1 arrays. A RAID-1 array with the superblock at the end can relatively easily have whatever filesystem it contains mounted read-only without the RAID running, because the filesystem will start at the start of the underlying raw partitions; this can be important sometimes. A RAID-1 array with the superblock at or near the start of the underlying partitions can't have the raw partitions used this way, because you have to look somewhat beyond the start of the raw partition to see the filesystem.

(Some versions of [...]

If you want to use a modern format and are going to directly use the RAID-1 array for a filesystem, I would use 1.0 format (this is what I've done for my new [...]

(LVM physical volumes have their own metadata, which normally goes at the start of the 'raw' partition that LVM is using but which can be replicated to the end as well. See [...]

As far as I know you can't change the superblock format of an array after it has been created, at least not without destroying it and recreating it. You can sort of do this without an extra disk with sufficient work, but really you want to get it right at creation time.

PS: note that in theory you can use [...]
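The point about superblock placement and RAID-1 can be sketched in a few lines of Python. This is only an illustration: the placement table is an assumption taken from the mdadm manual page's description of the metadata formats rather than from this entry, and the helper names are made up for the example.

```python
# Where each metadata format puts its superblock on a member device.
# ASSUMPTION: this table follows the mdadm manual page, not this entry.
SUPERBLOCK_LOCATION = {
    "0.90": "end",
    "1.0": "end",
    "1.1": "start",
    "1.2": "4K from start",
}


def filesystem_starts_at_partition_start(metadata_format):
    """Return True if a RAID-1 member's filesystem begins at the very
    start of the raw partition, which is what makes it possible to
    mount the raw partition read-only without the RAID running."""
    return SUPERBLOCK_LOCATION[metadata_format] == "end"


def raw_mountable_formats():
    """Hypothetical helper: formats whose RAID-1 members can be used raw."""
    return sorted(
        fmt for fmt in SUPERBLOCK_LOCATION
        if filesystem_starts_at_partition_start(fmt)
    )


if __name__ == "__main__":
    for fmt in sorted(SUPERBLOCK_LOCATION):
        usable = filesystem_starts_at_partition_start(fmt)
        print(fmt, SUPERBLOCK_LOCATION[fmt], "raw-mountable" if usable else "needs an offset")
```

The only thing this captures is the rule from the text: a superblock at the end leaves the filesystem at offset zero, and anything at or near the start does not.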
2012-02-06

The advantage of HDMI for dual displays

One of the interesting things that happened during my five years of hardware hibernation is that when I woke up, even low end (aka passively cooled) graphics cards could suddenly drive two digital outputs. Back in 2006 it was common for cards to have one analog and one digital out (eg, the ATI X300 in my work machine had VGA plus DVI), but getting dual digital out required an expensive card with an often noisy fan.

(I actually went through two such cards at work, each time deciding that I couldn't see enough advantage to driving my second display digitally instead of via analog VGA to be worth putting up with the noise. Possibly I wasn't sensitized enough to VGA artifacts and issues.)

What I have to thank for this is HDMI. Now, I'm aware that there's a lot to dislike about HDMI (see eg HDCP), but from my perspective the great thing about it is that it's given even low end cards a second digital output; it seems to be common for cards to have both DVI and HDMI. Some modern displays can be directly driven over HDMI and for the others, a simple cable will go from HDMI to DVI. And so my 2011 low end, passively cooled graphics card will now drive both my displays at work digitally, one directly with DVI and one with an HDMI to DVI cable, which is something that I never managed nicely before now.

(I believe that this has resolution limits. I don't use really big LCDs, so these haven't affected me.)

One of the interesting questions for me is why this happened. Why did graphics card vendors start putting HDMI on everything, where they only rarely did dual DVI? I think that part of the reason is that HDMI uses a physically small connector. DVI uses a relatively big connector and if you look at the back of a graphics card (especially a dual-DVI graphics card), there just isn't all that much physical space there; it's hard to get two DVI connectors and anything else in.
By contrast, HDMI connectors are much smaller (I can't find the exact dimensions, but some sources say a third of the size). This makes it much easier to find the physical room for an HDMI connector on a card edge and on a circuit board. (For example, my current graphics card just fits in VGA, DVI, and HDMI connectors with basically no spare room.)

PS: I don't think it's a coincidence that DisplayPort, the theoretical next generation replacement for DVI, also has a small connector. I suspect that the graphics card layout designers had a few words with people.

(Of course pretty much everything seems to be going to small connectors, with large ones proving awkward. Consider SATA versus IDE, for example. Someone who knew more about electronics than I do could probably write a fascinating article about all of the developments that made narrow-connector interfaces feasible and preferable to the old wide connector ones.)
tech/HDMIDualDisplays written at 23:24:52
2012-02-05

My view on what will kill 'traditional' system administration

Phil Hollenback recently wrote DevOps Is Here Whether You Like It Or Not, in which he writes that traditional system administration is dying. While I sort of agree with him about the death, I don't think it's necessarily for the reasons that Phil points to.

Fundamentally, there has always been a divide between small systems and large systems. Large systems have had to automate and when that automation involved applications, it involved the developers; small systems did not have to automate, and often do not automate because the costs of automation are larger than the costs of doing everything by hand. Moving to virtualization doesn't change that (at least for my sort of system administration, which has always had very little to do with shoving actual physical hardware around); if you have only a few virtualized servers and services, you can perfectly well keep running them by hand and it will probably be easier than learning Chef, Puppet, or CFEngine and then setting up an install.

(If you're future-proofing your career you want to learn Chef or Puppet anyways, so go ahead and use them even in a small environment.)

There are two things that I think will change that, and Phil points to one of them. Heroku is not just a virtualization provider; they are what I'll call a deployment provider, where if you write your application to their API you can simply push it to them without having to configure servers directly. We've seen deployment providers before (eg Google App Engine), but what distinguishes Heroku is how unconstrained and garden variety your API choices are. You don't need to write to special APIs to build a Heroku-ready application; in many cases, if you build an application in a sensible way it's automatically Heroku-ready. This is very enticing to developers because (among other things) it avoids lock-in; if Heroku sucks for you, you can easily take your application elsewhere.
(This has historically not been true of other deployment providers, which makes writing things to, say, the Google AppEngine API a very big decision that you have to commit to very early on.)

Deployment providers like Heroku remove traditional system administration entirely. There are no systems or services to configure, and the developers are deeply involved in deployment because a non-developer can't really take a random application and deploy it for the developers. If there is an operations group, it's one that worries about higher level issues such as production environment performance and how to control the movement of code from development to production.

The other thing is general work to reduce the amount of knowledge you need to set up a Chef or Puppet-based environment with certain canned configurations. Right now my impression is that we're still at the stage where someone with experience has to write the initial recipe to configure all N of your servers correctly, and you might as well call that person a sysadmin (ie, they understand Apache config files, package installation on Ubuntu, etc). However it's quite possible that this is going to change over time to the point where we'll see programs shipped with Chef or Puppet recipes to install them in standard setups. At that point you won't need any special knowledge to go from, say, writing a Django-based application to installing it on the virtualization environment of your choice. This really will be the end of developers needing conventional sysadmins in order to get stuff done.

The general issue of the amount of hardware in a small business (and virtualizing the hardware) ties into a larger question of how much hardware the business of the future is going to need or want, but that's a different entry. I will just observe that the number of servers that you need for a given amount of functionality has been steadily shrinking for years.
Sidebar: what virtualization does change now

I think that plain virtualization does mark a sea change today in one way: it moves sysadmins away from a model of upgrading OSes to a model of recreating their customizations on top of a new version of the OS. Possibly it moves away from upgrading software versions in general to 'build new install with new software versions from scratch, then configure'. This is partly because the common virtualization model is 'provide base OS version X image, you customize from there' and partly because most virtualization makes it easy to build new server instances. It's much easier to start a new Ubuntu 12.04 image than it is to find a spare server to use as your 12.04 version of whatever.

(Note that virtualization may not make it any easier to replace your Ubuntu 10.04 server with a new 12.04 server; there are a host of low level practical issues that you can still run into unless you already have a sophisticated management environment built up.)

I don't think that this is a huge change for system administration, partly because this is pretty much how we've been doing things here for years. We basically never upgrade servers in place; we always build new servers from scratch. Among other things, it's much cleaner and more reproducible that way.
sysadmin/WhatWillKillSysadmin written at 23:56:02
Link: Filenames.WTF

In Filenames.WTF, Daniel Rutter runs down the reasons first why paying attention to file extensions is ridiculous, and then the reasons why it's still the best solution to the problem that we have. Spoiler: it's because people have spent decades creating file formats that suck.
What five years of PC technology changed for me

This fall I got a new home machine, just a bit over five years after I got my previous home machine. It happens that I saved the invoice for my five year old machine, so I dug it out today in order to do a comparison about what five years of progress in PC technology did and didn't change for me.

First off, the progress of five years got me much better prices. My recent home machine cost me only about 60% of what my old home machine did. By itself, this is pretty impressive. Apart from that, running down the major components:
In 2006, the most expensive components were the RAM, the CPU, the two hard drives together, and then the video card. In 2011, the most expensive components were the CPU, the motherboard, and the case (more or less tying with the RAM). Another way to put it is that in 2011, the video card, the DVD burner, the hard drives, and pretty much the RAM were all what I considered trivial expenses in the overall machine.
tech/FiveYearsPCChanges written at 02:28:22
2012-02-03

Understanding a subtle Twitter feature

One part of getting on Twitter has been following people, which led me to discover that when you follow someone Twitter doesn't show you all of their public tweets. To summarize what I think is the rule, Twitter excludes any conversations they're having that purely involve other people you don't also follow. Their tweets in the conversation will appear in their public timeline, but not in your view of their tweets.

(This may only apply to relatively new Twitter accounts, or even only to some of them. I've seen Twitter give two different interfaces to two new accounts.)

On the one hand, when I discovered this I was infuriated. If you really did want to see everything (for example, so you could find other people to follow based on who your initial people had interesting conversations with), this made having a Twitter account worse than just perusing the Twitter pages of interesting people.

On the other hand, once I thought about it more I've come to reluctantly admire Twitter's trick with this feature. What it is, from my perspective, is a clever way to reduce the volume impact of following someone and thus make doing so less risky. Without it, following someone would immediately expose you to both their general remarks and to the full flow of whatever conversations they have. With Twitter's way, you are only initially exposed to people's general remarks; you ramp up your exposure to their conversations by following more people, and ramp it down by the reverse.

My feeling is that exposure to an overwhelming firehose of updates is the general problem of social networking. Social networks usually want you to be active and to follow lots of people. But if those people are themselves active, the more people you follow the more volume descends on you, and it's especially bad when you follow very socially active users, the ones having a lot of conversations.
This creates a disincentive to follow people and pushes you to scale back. Twitter has this especially badly because it has no separate 'comment' mechanism (comments are important for reducing volume). Twitter's trick here is thus a clever way to reduce the firehose in a natural way that doesn't require user intervention and tuning; you could see it as a way of recreating something like comments in a system that doesn't naturally have them.

Once I realized this, it's certainly been working the way that Twitter probably intended. When I'm considering whether or not to follow someone I don't really look at the volume of their tweets in general; I mostly look just at the volume of their non-conversation tweets, because those are the only ones that I'm going to see. Often this makes me more willing to follow people (and thereby furthers Twitter's overall goal of getting me more engaged with their service).
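The rule as I understand it is simple enough to write down as a filter. This is a sketch of my guess at the behaviour, not Twitter's actual code; the `Tweet` class and the function names are invented for the example, and it simplifies a 'conversation' down to a tweet that replies to exactly one account.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    author: str
    text: str
    # Account this tweet replies to, or None for a general remark.
    # ASSUMPTION: one reply target per tweet; real conversations are messier.
    reply_to: Optional[str] = None


def visible_in_my_timeline(tweet, following):
    """Apply the guessed rule: show general remarks from people I follow,
    and show their replies only when I also follow the other party."""
    if tweet.author not in following:
        return False
    if tweet.reply_to is None:
        return True
    return tweet.reply_to in following


def my_timeline(tweets, following):
    """Return the tweets that survive the filter, in their original order."""
    return [t for t in tweets if visible_in_my_timeline(t, following)]


if __name__ == "__main__":
    tweets = [
        Tweet("alice", "a general remark"),
        Tweet("alice", "@bob yes, exactly", reply_to="bob"),
        Tweet("alice", "@carol not really", reply_to="carol"),
    ]
    print(len(my_timeline(tweets, {"alice"})))
    print(len(my_timeline(tweets, {"alice", "bob"})))
```

Following only alice shows just her general remark; adding bob to the set ramps up the volume by exactly the alice and bob conversation, which is the effect described above.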
tech/TwitterVolumeLimit written at 22:48:37
Understanding Resident Set Size and the RSS problem on modern Unixes

On a modern Unix system with all sorts of memory sharing between processes, Resident Set Size is a hard thing to explain; I resorted to a very technical description in my entry on Linux memory stats. To actually understand RSS, let's back up and imagine a hypothetical old system that has no memory sharing between processes at all; each page of RAM is either free or in use by exactly one process.

(We'll ignore the RAM the operating system itself uses. In old Unixes, this was an acceptable simplification; memory was statically divided between memory used by the OS and memory used by user programs.)

In this system, processes acquire new pages of RAM by trying to access them and then either having them allocated or having them paged (back) in from disk. Meanwhile, the kernel is running around trying to free up memory, generally using some approximation of finding the least recently used page of RAM. How aggressively the operating system tries to reclaim pages depends on how much free memory it has; the less free memory, the faster the OS tries to grab pages back. In this environment, the resident set size of a process is how many pages of RAM it has. If the system is not thrashing, ie if there's enough memory to go around, a process's RSS is how much RAM it actually needs in order to work at its current pace.

(All of this is standard material from an operating system course.)

The problem of RSS on modern Unix systems is how to adapt this model to an environment where processes share significant amounts of memory with each other. In the face of a lot of sharing, what does it mean for a process to have a resident set size and how do you find the right pages to free up?

There are at least two approaches the kernel can take to reclaiming pages, which we can call the 'physical' and 'process' approaches.
In the physical approach the kernel continues to scan over physical RAM to identify candidate pages to be freed up; when it finds one, it takes it away from all of the processes using it at once (this is the 'global' removal of my earlier entry). In the process approach the kernel scans each process more or less independently, finding candidate pages and removing them only from the process (a 'local' removal); only once a candidate page has been removed from all processes using it is it actually freed up.

(Scanning each 'process' is a simplification. Really the kernel scans each separate set of page tables; there are situations where multiple processes share a single set of page tables.)

The problem with the process approach is that the kernel can spend a great deal of time removing pages from processes when the pages will never actually be reclaimed for real. Imagine two processes with a shared memory area; one process uses it actively and one process only uses it slowly. The kernel can spend all the time it wants removing pages of the shared area from the less active process without ever actually getting any RAM back, because the active process is keeping all of those pages in RAM anyways.

So, why doesn't everyone use the physical approach? My understanding is that the problem with the physical approach is that it is often not a good fit for how the hardware manages virtual memory activity information. Per my earlier entry, every process mapping a shared page of RAM can have a different page table entry for it. To find out if the page of RAM has been accessed recently you may have to find and look at all of those PTEs (with locking), and do so for every page of physical RAM you look at.

My impression is that most current Unixes normally use per-process scanning, perhaps falling back on physical scanning if memory pressure gets sufficiently severe.
(I suspect and hope that virtual memory management in the face of shared pages has been studied academically, just as the older and simpler model of virtual memory has been, but I'm out of contact with OS research.)
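To make the difference between local and global removal concrete, here is a toy model in Python. It is a sketch under heavy assumptions: a 'page' is just a key mapped to the set of process names using it, there is no notion of recency or page tables, and the `Memory` class and its method names are invented for the illustration rather than taken from any kernel.

```python
class Memory:
    """Toy model: each resident page records which processes map it."""

    def __init__(self):
        self.mappers = {}  # page id -> set of process names mapping it

    def map_page(self, page, proc):
        self.mappers.setdefault(page, set()).add(proc)

    def rss(self, proc):
        """Resident set size of a process: resident pages that it maps."""
        return sum(1 for procs in self.mappers.values() if proc in procs)

    def pages_in_use(self):
        """Actual number of pages of RAM in use."""
        return len(self.mappers)

    def local_remove(self, page, proc):
        """'Process' approach: take the page away from one process only.
        The RAM is freed only if no one else still maps the page.
        Returns True if the page was really freed."""
        procs = self.mappers[page]
        procs.discard(proc)
        if not procs:
            del self.mappers[page]
            return True
        return False

    def global_remove(self, page):
        """'Physical' approach: take the page away from everyone at once."""
        del self.mappers[page]
        return True


def build_example():
    """Two processes sharing four pages, plus one private page each."""
    mem = Memory()
    for page in range(4):
        mem.map_page(page, "active")
        mem.map_page(page, "idle")
    mem.map_page(4, "active")
    mem.map_page(5, "idle")
    return mem


if __name__ == "__main__":
    mem = build_example()
    print(mem.rss("active") + mem.rss("idle"), mem.pages_in_use())
    freed = sum(mem.local_remove(page, "idle") for page in range(4))
    print(freed, mem.pages_in_use(), mem.rss("idle"))
```

In this model the two RSS figures add up to more pages than are actually in use, and locally removing all four shared pages from the idle process shrinks its RSS without freeing a single page, because the active process still maps them. A single `global_remove()` does get a page back, but a real kernel would first have to check every mapping's activity information, which is the cost described above.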
2012-02-01

A ZFS pool scrub wish: suspending scrubs

Like sensible people, we scrub our pools periodically in order to turn up latent problems. Because pool scrubs have a visible impact on responsiveness (at least in the lightly patched Solaris 10 update 8 that we're running), we only run scrubs on weekends (and only scrub one pool per fileserver). However, we've recently started running into problems where pool scrubs slow the fileservers down enough that backups have started failing.

The obvious way around this is to switch things to only doing scrubs when backups aren't running. Except there's a problem: we run backups every day, they run for a fairly long time every day, and some of our pools take up to fifteen hours to scrub. If we only scrub when backups aren't running, there just isn't a fifteen hour gap that our biggest pools need.

(It's possible that they would scrub somewhat faster if they never overlapped with backups, but that's only a vague possibility. And as the pools get more data, they'll take longer and longer to scrub.)

Which brings me to my wish: I wish you could suspend ZFS pool scrubs. Not stop them and start them again from the start, but just put one to sleep by telling the pool to remember where the scrub was but do no further scrub IO for now, then later resume the scrub from where it left off. This would allow us to do even big scrubs around the backups, and in fact we could schedule scrubs much more liberally than we do right now. For example, we might have a couple of hours in a weekday early morning after backups have finished that we could use to get some scrubbing in.

(I'd be perfectly happy if this was only an in-memory pause, so that if you rebooted your system or exported the pool you lost it and had to start from scratch. As an in-memory pause it ought to be relatively simple to implement.)

PS: I checked and this doesn't seem to be in Illumos, at least based on the current Illumos zpool manpage.
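The scheduling argument is easy to check with a little arithmetic. The sketch below is purely illustrative: the backup window of 22:00 to 10:00 is a made-up assumption (the real window isn't given here), the function names are invented, and it assumes a scrub makes steady progress whenever it is allowed to run.

```python
def free_hours(backup_start, backup_end, days):
    """Return one boolean per hour over `days` days, True when no backup
    is running. Backups run daily from backup_start up to (but not
    including) backup_end, and the window may wrap past midnight."""
    hours = []
    for hour in range(days * 24):
        h = hour % 24
        if backup_start <= backup_end:
            busy = backup_start <= h < backup_end
        else:
            busy = h >= backup_start or h < backup_end
        hours.append(not busy)
    return hours


def longest_free_window(free):
    """Longest run of consecutive backup-free hours: the biggest scrub
    that fits if a scrub cannot be suspended."""
    best = run = 0
    for is_free in free:
        run = run + 1 if is_free else 0
        best = max(best, run)
    return best


def hours_until_scrub_done(free, scrub_hours):
    """With a suspendable scrub, only scrub during free hours and resume
    where it left off. Return the wall-clock hours elapsed when the scrub
    finishes, or None if it never does in the period."""
    done = 0
    for elapsed, is_free in enumerate(free, start=1):
        if is_free:
            done += 1
            if done == scrub_hours:
                return elapsed
    return None


if __name__ == "__main__":
    # ASSUMPTION: backups run 22:00-10:00 every day; the scrub needs 15 hours.
    week = free_hours(22, 10, days=7)
    print(longest_free_window(week))
    print(hours_until_scrub_done(week, 15))
```

With these assumed numbers the longest backup-free window is 12 hours, so a 15-hour scrub never fits in one piece, while a suspendable scrub started at midnight on the first day would finish 37 hours later, partway through the second day's free window.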
These are my WanderingThoughts. This is part of CSpace, and is written by ChrisSiebenmann.