Our small tools for running commands on multiple machines

January 14, 2018

A while back I wrote about the personal shell scripts I had for running commands on multiple machines. At the time they were only personal scripts that I used myself, but over time they kept informally creeping into worklog entries documenting what we actually did, and even into some of the shell scripts we use to pre-write the commands needed for convoluted operations like migrating ZFS filesystems from server to server. Eventually we decided to adopt them as actual official scripts, kept in our central location for such scripts.

My own versions were sort of slapped together, especially the machines script that prints out the names of machines that fall into various categories, so making them into production-worthy tools meant cleaning that up. The oneach script needed only moderate reforms, and as a result the new version is only slightly improved over my old personal version; in day to day usage, I probably wouldn't notice any difference if I switched back to using my old one.

(The big difference is that the production version has more options for things like extra verbosity and a dryrun mode that just reports the ssh commands that would be run.)
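The core of an oneach-style tool is small. What follows is a minimal Python sketch, not the production script: the function names, the use of `ssh -n`, and running hosts one at a time are all assumptions made for illustration. It shows how a dryrun mode can report the ssh commands without running anything.

```python
import shlex
import subprocess


def ssh_argv(host, command):
    """Build the ssh argument list that runs `command` on `host`."""
    # -n makes ssh read stdin from /dev/null, so a remote command can't
    # swallow input that was meant for later iterations of the loop.
    return ["ssh", "-n", host, command]


def oneach(hosts, command, dryrun=False, verbose=False):
    """Run `command` on each host in turn (a hypothetical sketch).

    Returns a list of (host, shell-quoted ssh command, exit status).
    In dryrun mode nothing is executed: the ssh command is only
    printed and the exit status is None.
    """
    results = []
    for host in hosts:
        argv = ssh_argv(host, command)
        cmdline = shlex.join(argv)  # quoted so it can be pasted into a shell
        if dryrun or verbose:
            print(cmdline)
        status = None if dryrun else subprocess.run(argv).returncode
        results.append((host, cmdline, status))
    return results


if __name__ == "__main__":
    # Hypothetical host names; dryrun only prints what would be run.
    oneach(["apps0", "apps1"], "df -h /", dryrun=True)
```

With dryrun=True this prints `ssh -n apps0 'df -h /'` and the matching line for apps1. Returning the command lines as well as printing them makes the dryrun behaviour easy to check without touching the network.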

The machines command got completely redone from scratch, because I realized that my hack approach just wouldn't work. For a start, I couldn't ask my co-workers to edit a script every time we added a machine; there would have been a revolt. So I wrote a new version in Python that parsed a configuration file. This new production version is a drastic improvement over my shell script hack; because I wrote it in Python, I was able to include significantly more features, in addition to making it more convenient and regular (since it's parsing a configuration file). The most important one is support for 'AND' and 'EXCEPT' operations, so you can express machine categories like 'all machines with some feature that are also Ubuntu 16.04 machines' or 'all Ubuntu 14.04 machines except ...'. This is supported both in the configuration file, where it sees a little bit of use, and on the command line, where I take advantage of it periodically.
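'AND' and 'EXCEPT' map naturally onto set intersection and set difference. Here is a minimal Python sketch, assuming the categories have already been loaded into a dict of category name to set of host names. The `select` function, its parameters, and every host name other than apps0 are hypothetical; this is not the actual syntax that machines uses.

```python
def select(categories, base, also=(), excluding=()):
    """Hosts in category `base` that are also in every category in
    `also` (AND) and in none of the categories in `excluding` (EXCEPT)."""
    hosts = set(categories[base])
    for cat in also:
        hosts &= set(categories[cat])   # AND: set intersection
    for cat in excluding:
        hosts -= set(categories[cat])   # EXCEPT: set difference
    return sorted(hosts)


# Hypothetical example data.
categories = {
    "ubuntu1604": {"apps0", "apps1", "db0"},
    "ubuntu1404": {"old0", "old1"},
    "allnfs": {"apps0", "db0", "old0"},
}

# 'all NFS machines that are also Ubuntu 16.04 machines'
print(select(categories, "allnfs", also=["ubuntu1604"]))       # ['apps0', 'db0']
# 'all Ubuntu 14.04 machines except the NFS ones'
print(select(categories, "ubuntu1404", excluding=["allnfs"]))  # ['old1']
```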

(The configuration file format is nothing special and basically duplicates what I've seen other similar programs use. Although I didn't consciously set out to duplicate their approach, it feels like we wound up in the same spot because there are only so many good solutions for the problem.)

Using a configuration file doesn't just make things more convenient and maintainable; it also makes them more consistent, in several senses. It's now much harder for me to accidentally forget to add machines to categories they should be in (or to remove them from categories that no longer apply). A good part of the reason is that the configuration file is mostly inverted from how my script used to do it. Rather than listing the machines that are in each category, it mostly lists the categories that each machine is in:

apps0    apps  ubuntu1604  allnfs  users
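Turning lines like this back into the category-to-machines mapping a tool needs is a simple inversion. A minimal Python sketch, assuming whitespace-separated fields, that '#' starts a comment, and that lines containing '=' are category definitions handled separately; `parse_host_lines` and the apps1 line are hypothetical.

```python
from collections import defaultdict


def parse_host_lines(lines):
    """Turn 'host cat1 cat2 ...' lines into {category: set of hosts}."""
    categories = defaultdict(set)
    for line in lines:
        line = line.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line or "=" in line:
            continue  # skip blank lines and 'name=...' category definitions
        host, *cats = line.split()
        for cat in cats:
            categories[cat].add(host)
    return dict(categories)


config = """
apps0    apps  ubuntu1604  allnfs  users
apps1    apps  ubuntu1604              # hypothetical second machine
all=ubuntu1604 ubuntu1404
"""
cats = parse_host_lines(config.splitlines())
print(sorted(cats["ubuntu1604"]))  # ['apps0', 'apps1']
```

Adding a machine means adding one line, and every category it belongs to is visible in that one place.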

There are a few categories that are explicitly specified, but even then they tend to be in terms of other categories:

all=ubuntu1604 ubuntu1404

This approach wouldn't have been feasible in my original simple shell script, but it's a natural one once you have a configuration file (especially if you want to make adding new machines easy and obvious; for the most part you can copy an existing line and change the initial host name).
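Definitions such as `all=ubuntu1604 ubuntu1404` can be resolved recursively. The following minimal Python sketch assumes the `name=member member ...` syntax shown above, that host-line memberships are already in a dict of category to set of hosts, and that a member naming no known category is a plain host name. `parse_definitions` and `expand` are hypothetical names, and the example data is invented. Any recursive expansion also has to guard against definition loops, so this one refuses them.

```python
def parse_definitions(lines):
    """Collect 'name=member member ...' lines into {name: [members]}."""
    defs = {}
    for line in lines:
        line = line.strip()
        if "=" in line:
            name, _, rhs = line.partition("=")
            defs[name.strip()] = rhs.split()
    return defs


def expand(name, host_cats, defs, path=frozenset()):
    """Return the set of hosts in category `name`.

    `host_cats` maps categories to the hosts listed on host lines and
    `defs` holds explicit definitions. `path` records the definitions
    currently being expanded, so a loop such as a=b, b=a raises
    ValueError instead of recursing forever.
    """
    if name in path:
        raise ValueError(f"category definition loop through {name!r}")
    hosts = set(host_cats.get(name, ()))
    for member in defs.get(name, ()):
        if member in host_cats or member in defs:
            hosts |= expand(member, host_cats, defs, path | {name})
        else:
            hosts.add(member)  # assumption: unknown members are host names
    return hosts


host_cats = {"ubuntu1604": {"apps0", "apps1"}, "ubuntu1404": {"old0"}}  # hypothetical
defs = parse_definitions(["all=ubuntu1604 ubuntu1404"])
print(sorted(expand("all", host_cats, defs)))  # ['apps0', 'apps1', 'old0']
```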

In theory I could have done all of these improvements in my own personal versions, and writing the Python version of machines didn't take too long (even writing a Go version for my own use only added a modest amount of time). In practice it took the push of knowing that these had to now be generally usable and maintainable by my co-workers to get me to spend the time. Would it have been wrong to spend the time on this when they were just personal scripts? Probably, and even if not I doubt I could have persuaded myself of that. After all, they worked well enough as they were originally.


Comments on this page:

This reminds me a bit of the Ansible inventory format: http://docs.ansible.com/ansible/latest/intro_inventory.html

Ansible's extension to this format, dynamic inventory scripts (http://docs.ansible.com/ansible/latest/intro_dynamic_inventory.html), lets you introspect your environment as needed and add additional attributes to these hosts if appropriate.

By dozzie at 2018-01-14 08:38:02:

In my previous job we used cfengine to manage servers, and cfengine stores the "classes" a machine belongs to (i386/x86_64, rhel4/5/6, which server room, which network, intended purpose, intended services, etc.). We collected these classes and put them into a database, which served as a sort of server inventory.

Then we had a search form with a simple query language, and once the search function was exposed with RPC, it was a nice fit for "run on all machines" tool (we had one, too).

Your environment may not warrant anything as elaborate as automatic inventory collection; we had several hundred servers that were installed, reinstalled, and decommissioned quite often, while you probably have a fairly static list of machines.

By rwoodsmall at 2018-01-14 18:03:43:

ClusterShell's clush and cluset have been lifesavers when working on loads of machines.

http://cea-hpc.github.io/clustershell/

It ships with lots of examples; you can set up different groups with different command sets, etc., or use it as a simple command runner with an interactive REPL.

Written on 14 January 2018.

Last modified: Sun Jan 14 02:15:11 2018