Wandering Thoughts archives


Why installing packages is almost always going to be slow (today)

In a comment on my entry on how package installs are what limits our machine install speed, Timmy suggested that there had to be a faster way to do package installs and updates. As it happens, I think our systems can't do much here because of some fundamental limits in how we want package updates to behave, especially ones that are done live.

The basic problem on systems today is that we want package installs and updates to be as close to atomic transactions as possible. If you think about it, there are a lot of things that can go wrong during a package install. For example, you can suddenly run out of disk space halfway through; you can have the system crash halfway through; you can be trying to start or run a program from a package that is part way through being installed or updated. We want as many of these situations as possible to still work, and especially we want as few bad things as possible to happen to our systems if something goes wrong part way through a package update. At a minimum we want to be able to roll back a partially applied package install or update if the package system discovers that there's a problem.

(On some systems there's also the issue that you can't overwrite at least some files that are in use, such as executables that are running.)

This implies that we can't just delete all of the existing files for a package (if any), upend a tarball on the disk, and be done with it. Instead we need a much more complicated multi-step operation that involves writing things to disk, making sure they've been synced to disk, replacing old files with new ones as close to atomically as possible, and then updating the package management system's database. If you're updating multiple packages at once, you also get a tradeoff in how much you aggregate together. If you basically do each package separately you add more disk syncs and disk IO, but if you do all packages at once you may increase both the transient disk space required and the risks if something goes wrong in the middle.
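To make the cost of the careful approach concrete, here is a minimal sketch of replacing a single file this way on a POSIX system. It is only an illustration, not how any particular package manager does it; the function name `replace_file_atomically` and the `.pkgtmp-` prefix are made up for the example, and it assumes the new file contents are already in memory.

```python
import os
import tempfile


def replace_file_atomically(dest_path: str, data: bytes) -> None:
    """Hypothetical helper: put data at dest_path so that anyone opening
    the path sees either the complete old file or the complete new one."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Write the new contents to a temporary file in the same directory,
    # so that the later rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".pkgtmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Wait until the file data has been pushed to disk.
            os.fsync(f.fileno())
        # rename() over an existing file is atomic on POSIX filesystems.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Something failed before the rename completed; clean up.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Sync the directory as well so that the rename itself is durable.
    dir_fd = os.open(dest_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Every file handled this way costs at least two waits for the disk (one for the data, one for the directory), and that is before the package database is touched at all. Multiply that by the number of files in a package and the slowness starts to look unavoidable.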

(Existing package management systems tend to be cautious because people are more willing to excuse them being slow than blowing up their systems once in a while.)

To significantly accelerate this process, we need to do less IO and to wait for less IO. If we also want this process to not be drastically more risky, we have no real choice but to also make it much more transactional so that if there are problems at any point before the final (and single) commit point, we haven't done any damage. Unfortunately I don't think there's any way to do this within conventional systems today (and it's disruptive on even somewhat unconventional ones).

By the way, this is an advantage that installing a system from scratch has. Since there's nothing there to start with and the system is not running, you can do things the fast and sloppy way; if they blow up, the official remedy is 'reformat the filesystems and start from scratch again'. This makes package installation much more like unpacking a tarball than it normally is (and it may be little more than that once the dust settles).
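For contrast, a sketch of the fast and sloppy approach might be little more than the following. The function name `unpack_package_sloppily` is hypothetical, and the sketch assumes a package is just a tarball of files to be dropped under a target root directory.

```python
import os
import tarfile


def unpack_package_sloppily(tarball_path, root_dir):
    """Hypothetical helper: unpack a package tarball under root_dir with
    no temporary files, no syncs, and no way to roll back."""
    os.makedirs(root_dir, exist_ok=True)
    with tarfile.open(tarball_path) as tar:
        names = tar.getnames()
        # Files are written straight into place; nothing waits for the disk.
        tar.extractall(root_dir)
    return names
```

Nothing here waits for the disk and nothing can be undone, which is fine only because a failed from-scratch install is simply thrown away and restarted.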

(I'm ignoring package postinstall scripts here because in theory that's a tractable problem with some engineering work.)

tech/SlowPackageInstalls written at 00:06:10

