Hardware Security Modules are just boxes running opaque and probably flawed software
The news of the time interval is that researchers discovered a remote unauthenticated attack giving full, persistent control over an HSM (via). When I read the details, I was not at all surprised to read that one critical issue was that the internal HSM code implementing the PKCS#11 commands had exploitable buffer overflows, because sooner or later everyone seems to have code problems with PKCS#11 (I believe it's been a source of issues for Unix TLS/SSL libraries, especially OpenSSL).
(The flaws have apparently since been fixed in a firmware update by the HSM vendor, which sounds good until you remember that some people deliberately destroy the ability to apply firmware updates to their HSMs to avoid the possibility of being compelled to apply a firmware update that introduces a back door.)
There is perhaps a tendency to think that HSMs and hardware security keys are magic and invariably secure and flawless. As this example and the Infineon RSA key generation issue demonstrate quite vividly, HSMs are just things running opaque proprietary software that is almost certainly not as good or as well probed as open source code. Proprietary software development is not magic, any more than open source development is, but open source code has the advantage that it's much easier to inspect, fuzz, and so on, and if a project is popular, there probably are a number of people doing that. The number of people who will ever apply this level of scrutiny to your average HSM is much lower, just as it is much lower with most proprietary software.
This doesn't mean that HSMs are useless, especially as hardware security tokens for authenticating people (where under most circumstances they serve as proof of something that you have). But I have come to put much less trust in them and look much more critically at their use. For server side situations under many threat models, I increasingly think that you might be better off building a carefully secured and sealed Unix machine of your own, using well checked open source components.
(Real HSMs are hopefully better secured against hardware tampering than any build it yourself option, but how much you care about this depends on your threat model. An entirely encrypted system that is not on the network and must have a boot password supplied when it powers on goes a long way. Talk to it over a serial port using a limited protocol and write all of the software in a memory safe language using popular and reasonably audited cryptography libraries, or audited tools that work at as high a level as you can get away with.)
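As a rough illustration of what a 'limited protocol' might mean here, the following is a minimal sketch of a request parser for such a box, in Python. Everything specific in it is an assumption made up for the example: the two commands (SIGN and PUBKEY), the 64 hex digit digest format, and the size cap. The point is only that every request is length-limited and must match a fixed, fully anchored pattern before any other code looks at it.

```python
import re

# Hypothetical cap on request size in bytes; anything longer is rejected unread.
MAX_LINE = 1024

# Hypothetical command set: these are the only two requests the box understands.
_SIGN = re.compile(rb"SIGN ([0-9a-f]{64})")
_PUBKEY = re.compile(rb"PUBKEY")


def parse_request(line: bytes):
    """Parse one request line received from the serial port.

    Returns ("sign", digest_bytes) or ("pubkey", None), and raises
    ValueError for anything that is not exactly one of those two forms.
    """
    if len(line) > MAX_LINE:
        raise ValueError("request too long")
    line = line.rstrip(b"\r\n")
    m = _SIGN.fullmatch(line)
    if m:
        # 64 lowercase hex digits decode to a 32-byte digest.
        return ("sign", bytes.fromhex(m.group(1).decode("ascii")))
    if _PUBKEY.fullmatch(line):
        return ("pubkey", None)
    raise ValueError("unknown or malformed request")
```

Actually reading the serial port and doing the signing are deliberately left out; in this sketch those would sit behind parse_request() and only ever see already validated input.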
PS: The one flaw in the build your own approach in a commercial setting is that often security is not really what you care most about. Instead, what you may well care most about is that it's not your fault if something goes wrong. If you buy a well regarded HSM and then a year later some researchers go to a lot of work and find a security flaw in it, that is not your fault. If you build your own and it gets hacked, that is your fault. Buying the HSM is much safer from a blame perspective than rolling your own, even if the actual security may be worse.
(This is a potential motivation even in non-commercial settings, although the dynamics are a bit different. Sometimes what you really care most about is being able to clearly demonstrate due diligence.)
Our current approach for updating things like build instructions
At work here, we have a strong culture of documenting everything we do in email, in something that we call 'worklogs'. Worklog messages are sent to everyone in my group, and they are also stored in a private, searchable web archive. We also have systematic build instructions for our systems, and unsurprisingly they are also worklog messages. However, they are unusual worklog messages, because they are not only what was done but also what you should do to recreate the system. This means that when we modify a system covered by build instructions, we have to update these build instructions and re-mail them to the worklog system.
For a long time, that was literally how things worked. If and when you modified such a system, part of your job was to go to the worklog archives, search through them, find the most recent build instructions for the system, make a copy, modify the copy, and then mail it back in. If you were making a modification that you weren't sure was final or that we'd want to keep, you had to make an additional note to do this whole update process when the dust settled. If you were in a rush or had other things to do too or weren't certain, it was pretty tempting to postpone all of this work until some convenient later time. Sometimes that didn't happen, or at least didn't happen before a co-worker also modified the machine (with various sorts of confusion possible in the aftermath). A cautious person who wanted to build a copy of a machine for a new Ubuntu version would invariably wind up trawling through our worklog archive to check for additional changes on top of the latest build instructions, and then perhaps have to sort out if we wanted to keep some of them.
At one point, all of this reached a critical mass and we decided that something had to change; build instructions needed to be more reliably up to date. We decided to make a simple change to make updates easier; we would commit to keeping the current copy of every set of build instructions in a file in a known spot in our central administrative filesystem, as well as mailing them to the worklog system. That way, we could cut a number of steps off the update process to reduce the friction involved; rather than hunting for the latest version, you could just go edit it, commit it to RCS, and then mail it in to the worklog system.
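To make the 'edit it, commit it to RCS, mail it in' sequence concrete, here is a minimal sketch of the last two steps in Python. This is not our actual tooling (we do this by hand); the function names, the subject line format, and the addresses used in any call are all made up for illustration. The only real external pieces assumed are the RCS 'ci' command with its -l and -m options and Python's standard email library.

```python
from email.message import EmailMessage
from pathlib import Path


def rcs_checkin_command(path, log_message):
    """Build the RCS command line that commits the edited instructions.

    'ci -l' checks in a new revision and leaves the working file in place
    and locked, so it is still there for the next edit.
    """
    return ["ci", "-l", "-m" + log_message, str(path)]


def worklog_message(path, system, sender, worklog_addr):
    """Build the email that re-mails the full, current build instructions.

    The body is the text file itself, so the mailed copy and the file in
    the known spot cannot drift apart. The subject format is hypothetical.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = worklog_addr
    msg["Subject"] = f"Build instructions for {system}"
    msg.set_content(Path(path).read_text())
    return msg
```

In use, you would hand the command list to subprocess.run(..., check=True) and the message to something like smtplib.SMTP("localhost").send_message(msg), assuming a local mail submission service; both are left out of the sketch so that it has no side effects.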
(The version in the worklog system remains the canonical reference, in part because my co-workers keep printed out copies of the build instructions for various critical systems. An update doesn't really exist until it's been emailed in.)
This modest change is now a couple of years old and I think it's safe to say that it's been a smashing success. Our build instructions are now almost always up to date and it takes much less work to keep them that way. What was a pain in the rear before is now only a couple of minutes of work, often quite simple work. In the common case, you can copy the commands necessary from your existing email message about 'I made the following change to system X', since we always write those when we make changes. As an additional benefit, we don't have to worry about line-wrapping and other mangling happening when we copy email messages around and cut & paste them from the web archive system and so on; the 'real' build instructions live in a text file and never get mangled by any mail-related thing.
In general and in theory I know the power of small changes that reduce friction, but pretty much every time I run into one in practice it surprises me all over again. This is one of the times; I certainly hoped for a change and an improvement, but I didn't have any real idea how large of one it would be.
PS: We also have various 'how-tos' and 'how-this-works' and so on documentation that we keep in the same directory and update in the same way. Basically, any email in our worklog archive that serves as the canonical instructions or explanations for something is a candidate to be enrolled in this system, not just system build instructions.