2015-12-31
How I've wound up taking my notes
If you're going to take and keep notes, you need some way to actually do this. I'm not going to claim that my system is at all universal; it's simply what has worked so far for me. I'm a brute force Unix kind of person, so my system is built on simple basic Unix things.
My basic form of note taking is plain ASCII in Unix files. I don't version control them (although maybe I should), but instead I treat them as logs where I only append new information to the end rather than rewriting existing sections (although sometimes I'll add an update note directly in place next to some information I later found out was incorrect). The honest reason why I take this append only approach is that it's easier to write, but I can justify it as creating a useful record of my thinking at the time.
(The exception to this is when I'm writing and testing things like build instructions or migration checklists. If my output is going to be documentation for other people, it obviously has to get rewritten in place so the end result is coherent.)
Some but not all of the time I will date new additions to files (in simple '2015-12-30' form). This helps me keep track of when I did something and also how long it's been since I worked on something. Although I often used to just summarize the commands I was using and the output I was getting, I have tried lately to literally copy and paste both commands and output in. I've found that this is handy for being lazy when repeating things; I can just copy-and-paste commands from the file into a terminal window.
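As a rough illustration of the dated, append-only habit described above, here is a minimal Python sketch. The function name append_note and the exact layout of an entry (a blank line, the date, then the text) are assumptions made for the example, not a description of any actual tooling.

```python
import datetime
from pathlib import Path


def append_note(path, text, today=None):
    """Append a dated entry to the end of a notes file without rewriting it."""
    # Allow the date to be passed in; otherwise use the current date.
    today = today or datetime.date.today()
    # isoformat() gives the simple '2015-12-30' form.
    entry = f"\n{today.isoformat()}\n{text.rstrip()}\n"
    # Mode "a" only ever adds to the end of the file, so existing
    # notes are never modified (the file is created if missing).
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```

Calling append_note("bsdtcp-restores", "...pasted commands and output...") adds one more dated entry at the end and leaves everything earlier in the file alone.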
I use file names that make sense to me, although not necessarily to anyone else. Typical file names are things like bsdtcp-restores and cs8-oldmail-weirdness. Often I'll put a summary of the project or issue at the top of a file, so that if (or when) I look at it much later I can remember what it was about. For lab notebook stuff I tend to put the date of the initial incident in the filename, but I'm kind of inconsistent in this.
I've found it useful to segregate my notes files into more or less three directories. One is for (active) projects, one is for general notes on various things, and one is for lab notebook stuff done during (semi-)crisis situations. In all of those directories I have subdirectories for files that are complete, or over, or now obsolete for various reasons. All of these old files remain valuable so I keep them, but I try to keep the top level directories only having current things (especially for the projects directory). I sometimes rename files when I move them into subdirectories because I realize that my initial file name is not a good one for future reference (often it turns out to be too generic).
I don't currently keep any of these files under version control. Maybe I should, but at the moment it feels like overkill given that I never want to delete things and I don't really have a situation where I want 'revert to (or look at) previous version of a file'. Many things are implicitly versioned just by me having multiple files and starting new ones for new situations, even if I copy things from an older file.
(For example, the test plan for upgrading our mail server from Ubuntu 10.04 to 12.04 is in a different file than the test plan for upgrading it from 12.04 to 14.04, even though I created the latter from the former.)
As to where these files all live: their master location is in my home directory on our fileservers. On the rare occasion that I need to refer to or work on one of them when our fileservers or our Linux servers are going to be down, I rsync the relevant file to my office workstation and work on it there, then rsync it back afterwards. These are fortunately not the sort of notes that I'd be looking at if our entire infrastructure fell down. Putting them in my fileserver home directory means that they're automatically available on all of our Unix servers and they get backed up via our backup system and so on.
As for the editor I use, well, vi is my sysadmin editor. But the choice of editor doesn't really
matter here (and sometimes I use others).
PS: I'm lucky enough that none of my notes files need to be kept so secret that they need to be encrypted. I don't know what I'd do if I needed that for some of my notes, and given that encryption is generally a pain I hope that I never have to find out.
2015-12-29
Take notes when (and as) you do things and keep them
One of the lessons that I have been learning over and over again over time and in different contexts is that I should take notes about what I'm doing and then keep them. As you can tell I've written before about this in various specific contexts, but I keep not entirely learning and writing down the general lesson, which is that this is a good idea basically all the time. Really, there are very few situations where taking good notes and then keeping them is not a good idea.
So that is my big piece of advice:
Take notes as you do things and then keep them after you're done, even if you don't think you're going to want them later.
(Like all general pieces of advice there are all sorts of specific exceptions.)
Fortunately I mostly haven't learned this the hard way. I've tended to write things down as I was doing them just to keep track of where I was (as I get interrupted a lot) and I'm naturally a packrat with files, so I've wound up keeping all of these notes files. Where I have learned the hard way is in how much detail I've often (not) put in many of those notes. Unless I thought I was writing them for my future self (which I knew I was in a few cases), I tended to only put down what I needed at the time to jog my current memory. This is of course far less than what I wound up wanting much later when I was trying to remember what exactly I'd done and precisely what the results were.
(Having stubbed my toes on this, I now try to include the specific command lines, exact output, and so on instead of just writing general notes. My co-workers periodically asking me for specifics has also helped a bunch; there is nothing like other people for showing you your own blind spots.)
There are some things I do that are too small for notes, of course, but certainly anything that takes me more than an hour or two should have notes, regardless of what it is. Whether it was looking into some issue, working out where information was, testing something, making some change, or so on, sooner or later I'm probably going to want to do something like it again or at least look back at what I did and what I saw.
(And even if I don't think I'm ever going to need what I'm doing again, well, writing things down is relatively cheap and not writing them down can be very annoying, as I keep reminding myself when I write some entries here. It's better to err on the side of writing too much down and then having to search through it later.)
2015-12-21
Some opinions on how package systems should allow you to pin versions
Good package management systems allow you to selectively pin or freeze the versions of some packages. Over time I have evolved some opinions on how this should work, usually by getting irritated by the limitations of some tool in doing this (which is what happened today). So here are the minimum things that your package management tool should support around this.
First, you should be able to pin a package at a version that is not the currently installed version. Such a pin means that the package system is allowed to upgrade the package to that version and no other. Ideally such a pinned version would anchor the update of other packages which require synchronized versions.
(Bonus points are awarded if the package system can be made to downgrade a package to that version as well.)
Second, you should be able to pin the version of a package that is not even installed. Such a pin means that only that version of the package is allowed to be installed later. As with the previous case, a 'can only install version X' pin should influence other packages through dependencies and so on.
Where both situations primarily matter is during system installation where you will be applying package updates (which is often the case). If you can pin non-current versions and even future package installs, your install system has a simple workflow; it first installs all of your pins, then does its usual package updates and installations of extra packages without worrying about specific versions. If you cannot pin non-current or non-installed packages, knowledge of pinned packages (and their versions) leaks into the whole update and install process. When you apply updates, you have to limit some packages to be updated only so far; when you install new packages, you have to install specific versions of some packages. And on top of this you have to pin (or hold, or freeze) the packages as well, so that future updates won't undo your work of picking specific package versions.
(This can also matter later on if you decide that you now want to pin some additional packages before applying more updates or installing new packages.)
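To make the 'set up the pins first, then update and install without thinking about versions' workflow concrete, here is a toy Python model of the pin semantics described above. It is a sketch under loud assumptions: the function names, the package names, and the version strings are all invented for illustration, it does not correspond to any real package manager's API, and it ignores dependencies entirely. Because a pin here always selects the pinned version, the model also covers the 'bonus points' downgrade case.

```python
def allowed_version(package, installed, available, pins):
    """Return the version `package` may end up at, or None for 'not installed'.

    installed: dict of name -> currently installed version
    available: dict of name -> list of versions, oldest to newest
    pins:      dict of name -> the only version that is permitted
    """
    candidates = available.get(package, [])
    if package in pins:
        target = pins[package]
        # A pin permits exactly one version, whether or not the package
        # is currently installed. If that version isn't available,
        # nothing changes.
        return target if target in candidates else installed.get(package)
    # Unpinned packages simply go to the newest available version.
    return candidates[-1] if candidates else installed.get(package)


def plan(installed, available, pins, extra=()):
    """Result of 'apply all updates and install the extra packages'."""
    result = {}
    for name in sorted(set(installed) | set(extra)):
        version = allowed_version(name, installed, available, pins)
        if version is not None:
            result[name] = version
    return result
```

The point of the model is that plan() itself never needs to know which packages are special. All of the version knowledge lives in the pins, which can be set up once before any updates or installs happen.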
Sidebar: pinning specific versions versus holding back changes
Sometimes you want a specific version of a package because it's what you've determined will work, or because you want all systems to be the same for some important package, or the like. Other times you don't particularly care about the specific version, but you just don't want a package to change for various reasons (for example, kernel updates might require reboots which have to be scheduled well in advance, or grub updates often wind up causing problems for your update process).
In theory, holding package changes is a subset of pinning a specific
version (it is 'pin the current version of the package'). In practice
package managers that support both often implement the two in
different ways. I believe that Debian apt is an example of this.
2015-12-18
A fun tale of network troubleshooting involving VLANs and MACs
The following is not my story; it comes from my co-workers, who had the real fun of trying to figure this one out and then finding a fix.
To start with, let's set the background. We have an (OpenBSD) routing firewall machine that sits on a network segment whose egress router is not under our control. Actually, we have two of them, one active and one as a warm spare (it's on and being updated, but it is not connected to any of the production networks because otherwise it would fight the live firewall for the public IPs). A while back, as part of trying to fail over from the live machine to the warm spare, we discovered that the egress router for the network caches ARP information for a long time. Like, apparently, hours. This was obviously no good for being able to switch over (such as in the case of hardware failure). Since the egress router is not under our control, the only thing we could really do was explicitly set the warm spare to have the same Ethernet address as the active machine.
(This was tested at the time it was set up and worked, but we believe the test at the time was misleading.)
Recently my co-workers wanted to swap from the active machine to the warm spare, because the active machine had been up for literally years (we don't update OpenBSD all that often). Unfortunately, when they made the swap the (ex-)warm spare was not reachable on its public IP, so they failed back to the active machine and took the warm spare off for testing. Testing established that the warm spare was showing 'incomplete' for other machines in its ARP cache, although other machines picked it up fine for their ARP caches. Further, trying to inspect the traffic with tcpdump made the network suddenly work, but things broke again when they stopped tcpdump. Oh, and the problem was specific to using our preferred Intel Ethernet cards; if the warm spare was switched to use non-Intel network hardware, everything worked.
Now, it happens that this machine has a slightly unusual network configuration. Because it needs to talk to a number of external networks, it actually gets all of its external networks as tagged VLANs over a single physical network port. When we changed the machine to use the MAC of the active machine, we had set the Ethernet address on the VLAN for that particular network, because that was the network that mattered; we didn't change the MAC of anything else.
It turned out that this was the problem. Using Intel cards on our (old) version of OpenBSD, when the MAC of the VLAN differed from the MAC of the underlying physical interface and the interface was not in promiscuous mode, ARP (at least) didn't work because the kernel apparently never received the replies to its ARP queries. If you put the interface into promiscuous mode, such as by running tcpdump, things suddenly worked; the kernel received ARP replies and so on. We think that the whole setup worked when tested because we likely tested it with tcpdump running to watch traffic (and verify what MACs were being used).
(The obvious suspect here is hardware level receive filtering; perhaps the hardware is only being set by the driver to recognize the physical port MAC as its MAC. This is a driver and/or hardware issue, but these things happen.)
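The suspected mechanism can be sketched as a toy Python model. This is only an illustration of the receive-filtering guess above, not a description of how the OpenBSD driver or the Intel hardware actually behaves; the Nic class and the MAC addresses are made up for the example.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class Nic:
    """Toy network card with a hardware unicast receive filter."""

    def __init__(self, physical_mac):
        # Assumption: the driver only programs the physical port's MAC
        # into the filter, not the MACs set on VLANs on top of it.
        self.hw_filter = {physical_mac}
        self.promiscuous = False

    def accepts(self, dst_mac):
        """Would a frame with this destination MAC reach the kernel?"""
        if dst_mac == BROADCAST:
            # ARP requests are broadcast, so they always get through.
            return True
        # Unicast frames (such as ARP replies) must match the filter
        # unless the card is in promiscuous mode.
        return self.promiscuous or dst_mac in self.hw_filter


physical_mac = "00:00:5e:00:53:01"  # hypothetical MAC of the physical port
vlan_mac = "00:00:5e:00:53:02"      # hypothetical (active machine's) MAC on the VLAN

nic = Nic(physical_mac)
reply_seen_normally = nic.accepts(vlan_mac)   # ARP reply sent to the VLAN's MAC
nic.promiscuous = True                        # e.g. tcpdump is running
reply_seen_with_tcpdump = nic.accepts(vlan_mac)
```

In this model, broadcast ARP requests from other machines still arrive (so they can learn about the warm spare), while unicast ARP replies addressed to the VLAN's MAC are dropped unless the card is promiscuous. Making the physical port's MAC the same as the VLAN's MAC puts the right address in the filter, which matches the symptoms described here.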
Once my co-workers figured out what the problem was, the fix was simple: explicitly set the MACs of both the physical port and all the VLANs on it to the active machine's MAC. But getting there took a whole frustrating and puzzling journey. This wasn't exactly a Heisenbug, but until my co-workers noticed the pattern that running tcpdump made it disappear it did look like one.
(Using 'tcpdump -p' is the obvious thing for the future, but I
don't know if it would actually have worked in this situation.
Still, it's something to try to remember for the next time around.
Maybe tcpdump should default to -p these days.)