Wandering Thoughts archives

2013-09-30

Centralizing syslogs as an easy way to improve your environment

If your environment is on the low end of the organized 'DevOps' spectrum, here's an easy improvement that I urge you to make right away: create a central syslog server that all of your machines echo their syslogs to. Disk space is cheap today, and if you have an ordinary-sized environment you don't need a particularly powerful machine or a particularly fast disk system (at least if you turn off synchronous syslog writes on your log host). Once you've set up the syslog server, all you need to do to ship logs off to it is to add the following to every machine's syslog configuration:

*.*   @CENTRAL.SERVER.IP

(I prefer an IP address because it still works if hostname lookups blow up for some reason.)

Modern Linux syslog daemons support reading configuration fragments from a directory (typically /etc/rsyslog.d), so you can just make a fragment and drop it in there; you don't even need to change the system-supplied syslog configuration files.
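
For example, on a machine running rsyslog the whole client side can be a single small fragment like this (the file name is arbitrary and the IP address is a placeholder; '@host' forwards over UDP, '@@host' over TCP):

# /etc/rsyslog.d/90-central.conf
*.*   @192.0.2.10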

The syslog server itself should start with a simple configuration. First, log all of each facility's messages to a separate file (you might put all localN facilities in one file, depending). Then also log all messages to a single file; we call it allmessages. Your syslog server should have lots of disk space so you should configure syslog log rotation to keep many days or weeks of logs (trust me, this will be useful). Don't try to be clever by dropping debug messages or sorting things by priority and stuff like that. It's not worth the extra hassle and it's not reliable (plenty of programs log things you want to know at debug priority, or at random other priorities).

(You might wonder why separating things by facility works when separating by priority doesn't, but programs are much more consistent about facilities than they are about priorities, and it's useful to have a lower-volume place to look for things like kernel messages.)
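
To make this concrete, here is a minimal sketch of the receiving side using 2013-era rsyslog directives; the port, file names, and facility selection are illustrative, and the leading '-' is the traditional way of turning off synchronous writes:

# accept syslog messages from the network over UDP
$ModLoad imudp
$UDPServerRun 514

# one file per facility (a few shown; repeat for the rest)
auth,authpriv.*    -/var/log/central/auth
daemon.*           -/var/log/central/daemon
kern.*             -/var/log/central/kern
mail.*             -/var/log/central/mail

# and absolutely everything in one place
*.*                -/var/log/central/allmessages

For retention, a logrotate stanza along these lines keeps many weeks of logs without filling the disk (60 days is an arbitrary choice, and you'll also want a postrotate step that makes your syslog daemon reopen its files):

/var/log/central/* {
    daily
    rotate 60
    compress
    missingok
    notifempty
}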

The important thing this does is give you a single place to get a global view of what happened when (at a system level) and to monitor your systems live if you need to. Over and over again it's been extremely helpful to us to see a coherent cross-system view and timeline of things like kernel error messages from NFS clients and NFS servers, authentication logs from multiple systems, and so on. The allmessages log itself gives us an easy way of seeing everything a single machine logged, no matter what random facility it was logged on, as well as all messages related to that machine (both from the machine itself and from others interacting with it). Finally, you don't have to remember (or find, or fix) the syslog configuration on any particular machine and where it puts what messages; since all messages are shipped off to your central machine, you can just look there and be done with it.

Of course this is not perfect. Sending syslog messages to a remote server is not completely reliable in general, things can go wrong with the network while the local disk is okay (although the converse can happen too), and so on and so forth. You can probably do better if you try hard enough. But if you don't have any sort of central logging right now, the perfect is the enemy of the good. Put together a central syslog machine now and improve it later. The benefit of having one right now is worth doing it twice (and worth it being less than perfect).

PS: If you're running enough Unix machines with enough syslog message volume that a central syslog server needs to be a beefy server with ultra-fast networking and many terabytes of disk space and so on, you hopefully already have some sort of central logging solution. If not, all I can suggest is multiple syslog aggregators to at least cluster things together.

PPS: Yes, there are some security concerns about a central syslog server. My personal view is that the benefit of having a high-quality central view outweighs the security worries.

CentralizeSyslog written at 22:59:00

2013-09-28

Why I put configuration management systems over packaging systems

A debate has broken out in the comments on my previous entry over whether you should manage system configurations through a configuration management system or through the system's package management. Although I wrote a number of entries a few years ago in support of the packaging approach, I have since changed my mind; I now think that configuration management software does a better job, for both pragmatic and high-level reasons.

The pragmatics of the situation are clear: no packaging system today is really set up for this, so in practice you can't do it. If packaging systems support features like overlays at all (and this is not common), the actual implementation is well below what you'd want. Nor is the foreseeable future likely to be any better; it seems very unlikely that any of the common packaging systems will add these features (often it would take significant changes to their internal architecture). So this idea is dead in the real world.

But even theoretically I think it's the wrong approach. To start with, the mechanics of managing a system via packages are awkward and indirect. You don't actually manage systems in any direct way; instead you assemble files, bundle them into packages, and queue those packages to propagate to systems where they do stuff. You need a whole infrastructure of 'action at a distance' things to make this work even in an ideal world with full package manager support.

(This is hard to see if you're only thinking about initial system setup because initial system setup looks simple; just install some packages and you're done. But that's not all you need to handle.)

The bigger problem is that managing through packages is working at the wrong level of abstraction. With packages you're forced to work at the very concrete level of files, package dependencies, and commands to run, instead of at a level where you can write relatively abstract descriptions of the desired system state (which is what a configuration management system can let you do). Among other benefits, working at a relatively high level lets you actually express something close to your actual intentions instead of having them buried in a thicket of low-level mechanics. You can also better describe various elements of state that are not easily represented as files to be shoved into places.

(All of my experience with configuration languages has convinced me that the more you hide intentions, the more things go wrong.)
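
For illustration, here is roughly what such an abstract description looks like in one CM system; this is a sketch in Ansible's 2013-era playbook syntax, and the package and service names are stand-ins:

- hosts: all
  tasks:
    # the intention is stated directly: this package is installed
    # and this service is enabled and running
    - name: ntp is installed
      apt: name=ntp state=present
    - name: ntp is enabled and running
      service: name=ntp state=started enabled=yes

Getting the same effect through packages means encoding it in dependencies and maintainer scripts, which is exactly the 'buried in low-level mechanics' problem.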

PS: I'm assuming here that you can build all of the packages you need out of one version-controlled directory hierarchy that has everything in it. This is at least theoretically possible and puts a package-based system on par with modern CM systems that do work this way.

ConfigMgmtSystemOverPackages written at 01:14:52

2013-09-27

The long term future of any particular configuration management system

From my perspective, one of the problems with the current profusion of configuration management systems is exactly that there is a profusion of them. I don't have a problem with choice per se (or having to sort through them all); for me the issue is the uncertainty that this creates about the long-term future of any particular system.

Right now we have a lot of CM systems; off the top of my head I can think of CFEngine, Puppet, Chef, Ansible, and SaltStack, and there are probably others. It seems unlikely that they will all survive as actively developed and used systems over the next five to ten years. Some of them certainly will but probably not all (and I kind of hope that not all of them survive, because if they all do that's a lot of duplicate or near-duplicate effort).

Unless you're confident in your ability to pick winners and losers here (which I'm certainly not), the net effect of this is to add more uncertainty about the long-term costs and payoffs of using any particular current CM system. If today you pick what turns out to be a loser in five years, well, that's all the payoff time you're getting (and then you'll have the costs of switching what's likely to be a well-developed environment to another CM system). At least for me this drives up the short-term payoff I want to see from a CM system before switching to it seems worthwhile, and I keep having problems seeing that payoff.

(I don't have a good answer to this issue. I'm not sure there is one. And certainly my deep-seated skepticism that a CM system would be a big win here is part of the general issue; if I thought a CM system would be a significant improvement, well, I'd have that short-term payoff. Instead I'm left feeling that I'd get at best a small incremental improvement over the short term.)

CMSystemsLongTermFuture written at 00:41:20

2013-09-26

Trying to explain my harshness on configuration management tools

It started with a Tweet that became a conversation:

@thatcks: I'm biased but if a CM system needs more than an apt-get to install I'm not going to use it to set up our Ubuntu machines. Sorry Ansible.

@jgoldschrafe: That's kind of a silly approach - most modern CM tools are evolving so fast that it's folly to stick to outdated versions

@thatcks: For us it's an issue of both (long-term) risk and effort, and expected payoff from using a CM system for system setup.

(By needing just an apt-get to install, I mean that the CM system must be in standard Ubuntu repositories. Adding a PPA is one step too many.)

To start with, I'm restricting this to (initial) system setup, the stuff where you go from bare metal to a working system. As I've mentioned, we already have a mostly scripted system for this, build instructions for all machines, and a central repository of everything we need to rebuild a machine. Our build instructions are almost all quite short and basically amount to 'install some packages and put some configuration files into place'. The best case for using a CM system for initial system setup is that it reduces all of these extra build steps down to 'install CM system; run CM system once' (a script can make that one step). But given that the steps are pretty simple to start with, the payoff for doing this is relatively low, and automation adds its own overhead that further reduces it.

(There are various ways in which the payoff is not zero, and using a CM system for initial setup opens the door to using it for ongoing changes, which has additional potential advantages. The idea was attractive enough that I considered experimenting with Ansible for this until I found out about the installation problem.)
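
To show what I mean by the build steps being simple, here is a hedged sketch of the shape of one of those 'install some packages and put some configuration files into place' steps as a script; the package list and the repository path are invented:

#!/bin/sh
# hypothetical build steps for one machine type
set -e
apt-get -y install openssh-server postfix sysstat
# canned configuration from our central repository (invented path)
cp /cs/build/configs/postfix-main.cf /etc/postfix/main.cf
service postfix restart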

Anything beyond a simple apt-get for the install adds friction to using the CM system. This friction directly reduces the payoff of using the CM system in the first place; at the worst, setting up the CM system takes about as many steps as the by-hand build instructions would. At this point my co-workers would sensibly reject it out of hand. In addition, using any sort of unofficial package or direct local build (I've seen people suggest cloning git repos, for example) leaves us exposed to a collection of risks. With real Ubuntu packages there is in theory some sort of tracking of upstream security issues, some degree of quality control, and some degree of stability over time. None of these three are guaranteed for PPAs or other distribution mechanisms, which means that ultimately they become our responsibility (at a minimum we get to check that our package supplier is doing them and hasn't gotten busy with other things).
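
To make the friction concrete, compare the two install paths on Ubuntu of this era (using 'ansible' as a stand-in package name and a placeholder PPA; add-apt-repository comes from the python-software-properties package):

# in the standard repositories: one step
apt-get install ansible

# via a PPA: several steps, plus a new trust relationship to maintain
apt-get install python-software-properties
add-apt-repository ppa:SOMEBODY/ansible
apt-get update
apt-get install ansible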

(Here I want to mention that we're unlikely to consider it a feature that a core component of our build instructions keeps evolving rapidly on us. We like stability for that sort of stuff.)

For me in our environment, the result is that an unpackaged CM system is not worth the hassle for system setup. The likely payoff is too low and the risks, however moderate, are not appealing. We already have a system that works; we don't really need one that works maybe a bit better but is a lot more complicated.

ConfigMgmtSetupTradeoffs written at 01:36:39

2013-09-18

Reconsidering external disk enclosures versus disk servers

Not even three months ago I confidently wrote about all of the good reasons why we were picking external disk enclosures over disk servers. Today I'm here to tell you why we've flip-flopped on that and are now planning to buy disk servers instead. What it boils down to is money.

(I could claim it was also uncertainties over SATA disks behind SAS expanders, but not really; I only started really reading about those issues after we'd made the decision.)

What started the ball rolling was that we found reasonably affordable motherboards with onboard dual 10G-T Ethernet ports (and SAS). These were pretty much the only affordable way of getting 10G into our next generation of fileserver hardware. However, going with this motherboard meant no more generic inexpensive servers; instead we'd have to spec out a case and all of the other things to go with it. This basically meant that we had a 'one chassis or two' choice; we could buy a case for the motherboard and then a second case as an external disk enclosure, or we could buy just one case and put both the motherboard and the disks into it. Using a single case will save us a significant amount per backend, and it turned out that we could find a suitable case (in fact one with a near-ideal disk configuration).

I still believe in all of the merits of external disk enclosures that I wrote about in my original entry. But until we can get inexpensive generic servers with dual 10G-T (and SAS) all of them are trumped by budget practicalities. We can deal with the moderate downsides.

(There are also some upsides, such as fewer exterior cables to get snagged and accidentally yanked loose. I'm always a bit nervous when I'm behind our current fileserver racks because of all of the eSATA cables.)

(Would I still buy external disk enclosures if we had the budget for it? I'm honestly not sure. Those advantages are real but I'm not convinced that they're worth the cost, especially when compared to other things you could do with the same amount of money. If I had endless money, yes definitely; we'd use SAS disks in external SAS JBODs connected to generic servers with dual 10G-T onboard.)

DiskShelvesVsServersII written at 01:29:02

2013-09-07

Why wiring things up physically instead of virtually is better for us

As I mentioned in my entry on the physical versus the virtual approach to network drop wiring, we use the physical wiring approach and I think it's the right approach for us. Today I want to run down my view of the collection of reasons that make it so.

First off is that physical wiring can be done almost entirely with a collection of inexpensive and relatively small switches, since most switches are only for a single VLAN. Virtual wiring basically demands big switches with a lot of ports so that you can handle as many drops as possible through one switch (both for bandwidth and for ease of management and configuration changes). Unfortunately our environment makes it hard to buy big, expensive things. It would be very hard for us to buy, upgrade, and replace the quite expensive (by our standards) core switch or switch stack that the virtual approach calls for.

A related advantage of the physical approach using lots of smaller switches is that we can mix and match switch types, picking the best (or cheapest) model and company for any particular purpose. We can also upgrade switches piece by piece, which is very much how our migration from 100 Mbit to 1 Gbit Ethernet happened (and yes, it took years).

Next is a cluster of issues related to making wiring changes either physically or virtually:

  • making switch configuration changes is not as easy as it looks, especially if you bought relatively inexpensive switches.
  • for the switches we can afford, it's generally easier to see your actual network configuration when it's embodied in physical wiring than when it's all virtual. Colour-coding important networks helps with this.
  • it's much easier to share access to wiring closets than it is to switch configurations, at least in our environment.
  • it seems easier to train people to do physical wiring changes than it is to get them to do switch changes (especially if switch changes involve a complex dance to also make backups or mirror the changes on a backup switch).
  • I think the chances of errors are lower in practice with physical wiring changes because of the physical nature of things. People can make slips in switch configuration that are much less likely when they are moving network wires (especially if they're only supposed to touch their own switches).

I suspect but don't know for sure that our port-isolated networks would add complexity (and heartburn) in a giant-switch environment. It's possible that modern switches are smart enough that you can set this as a default VLAN parameter or something and have it work.

WhyPhysicalWiringForUs written at 23:50:39

2013-09-06

Making switch configuration changes is not as easy as it looks

In theory one of the advantages of the virtual approach to network drop wiring is that all you have to do to change a drop's networking is change your switch configuration for its port. But I feel that this ease is often something of an illusion, and I've come around to the view that switch configuration changes are not as easy in practice as they look (at least in many smaller environments).

To put it simply, making the change on the switch itself is the easy part. If the switch matters, you need to be able to deal with it dying; you need to be able to deploy a replacement on relatively short notice. This means that it's not enough to just change the live switch; you also need a way to get that change onto a replacement switch, generally either by backing up the configuration or by actually mirroring the change on your hot spare switch. Depending on how much work is involved, this configuration replication can easily take more effort and time than the actual change itself.

Life is hopefully better on 'enterprise' grade switches, but our experience on inexpensive managed switches is that you do not really get features to make this easy (such as the ability to push a configuration on to the switch from another machine). Sometimes you can't even back up and restore the switch configuration (at this point you get to manually redo your change on your hot spare switch).

A closely related issue is dealing with the possibility of errors. If you have to change things directly on the switch, it takes extra care to make sure that you are making only the exact change you intended, and backing out of an accidental change may be quite time-consuming (in the worst case you get to check the entire switch configuration).

SwitchConfigChangeNotEasy written at 01:31:38

2013-09-05

The physical versus the virtual approach to network wiring

I was recently reading Matt Simmons' The Cyclical Nature of Academia, where he talks a bit about the network wiring that they do. This sparked a thought about the two different ways to do network wiring (or at least the two main ways).

If you have a collection of VLANs, a collection of ports, and a set of mappings between the two that stubbornly keeps changing, you can set up your wiring closet in one of two ways, which I'll call physical and virtual. The physical approach is simpler to describe: you have a collection of single-network switches, and as a wall jack changes VLAN membership you physically rewire it from switch to switch by moving its patch cable. In the virtual approach you have a great big switch stack that carries all VLANs, wire each wall jack to a fixed port on the switch stack, and 'rewire' a wall jack by changing the switch's VLAN configuration for that port; the physical patch cable never moves.

From the brief description in Simmons' entry it sounds like his organization has gone for the virtual wiring approach. For a collection of reasons we have gone for the physical one (as I implied in an earlier entry on how our network is implemented).

My feeling is that you inevitably wind up with a network snarl in both cases once enough time has gone by. The difference between the two is where that network snarl is. In the physical case the snarl is physical, created as you run and re-run patch cables between your patch panels and your edge switches. In the virtual case the snarl is in software, in the switch configuration where a mishmash of VLANs goes to a mishmash of ports (you have no physical snarl; since your patch cables never move you can set them up neatly and have them stay that way).

I don't think that there is a universal right answer for whether you should do physical or virtual wiring. In our specific situation physical wiring is clearly the right choice for a complex collection of reasons, even if it creates visible cable messes.

(Trying to explain those reasons is for another entry.)

PhysicalOrVirtualWiring written at 00:15:12

