Wandering Thoughts archives

2014-08-10

The problem with self-contained 'application bundle' style packaging

In a comment on my entry on FreeBSD vs Linux for me, Matt Campell asked (quoting me in an earlier comment):

I also fundamentally disagree with an approach of distributing applications as giant self contained bundles.

Why? Mac OS X, iOS, and Android all use self-contained app bundles, and so do the smarter third-party developers on Windows. It's a proven approach for packaged applications.

To answer this I need to add an important bit of context that may not have been clear in my initial comment and certainly isn't in this extract: I was talking specifically about PC-BSD, and more generally about the idea of an OS provider distributing its own packages this way.

Let's start with a question. Suppose that you start with a competently done .deb or RPM of Firefox and then convert it into one of these 'application bundles' instead. What's the difference between the contents of the two packagings of Firefox? Clearly it is that some of Firefox's dependencies are going to be included in the application bundle, not just Firefox itself. So what dependencies are included, or to put it another way, how far down the library stack do you go? GTK and FreeType? SQLite? The C++ ABI support libraries? The core C library?

The first problem with including some or all of these dependencies is that they are shared ones; plenty of other packages use them too. If you include separate copies in every package that uses them, you're going to have a lot of duplicate copies floating around your system (both on disk and in memory). I know disk and RAM are both theoretically cheap these days, but yes, this still matters. In addition, packaging copies of things like GTK runs into problems with stuff that was designed to be shared, like themes.

(A sufficiently clever system can get around the duplication issue, but it has to be really clever behind the backs of these apparently self-contained application bundles. Really clever systems are complex and often fragile.)
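To put rough numbers on the duplication cost, here is a minimal sketch. The bundle contents and library sizes are made up for illustration (they are not measurements of real packages), and it assumes every bundle ships an identical copy of each library it uses.

```python
# Hypothetical bundle manifests: library name -> size in MiB.
# The sizes are illustrative assumptions, not real measurements.
BUNDLES = {
    "firefox": {"gtk": 30, "freetype": 2, "sqlite": 1},
    "thunderbird": {"gtk": 30, "freetype": 2, "sqlite": 1},
    "liferea": {"gtk": 30, "sqlite": 1},
}


def duplication_overhead(bundles):
    """Return (bundled_total, shared_total, wasted) in MiB.

    bundled_total: space used when every bundle carries its own copies.
    shared_total:  space used when each library is installed once.
    wasted:        the difference between the two.
    """
    bundled_total = sum(sum(libs.values()) for libs in bundles.values())

    # Keep one copy of each distinct library (assumes identical copies).
    shared = {}
    for libs in bundles.values():
        shared.update(libs)
    shared_total = sum(shared.values())

    return bundled_total, shared_total, bundled_total - shared_total


bundled, shared, wasted = duplication_overhead(BUNDLES)
print(f"bundled: {bundled} MiB, shared: {shared} MiB, wasted: {wasted} MiB")
```

With only three applications in this toy setup, roughly two thirds of the bundled space is duplicate copies, and the ratio gets worse with every additional application that uses the same libraries.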

The bigger problem is that the capabilities enabled by bundling dependencies will in practice essentially never be used for packages supported by the OS vendor. Sure, in theory you could ship a different minor version of GTK or FreeType with Firefox than with Thunderbird, but in practice no sane release engineering team or security team will let things go out the door that way because if they do they're on the hook for supporting and patching both minor versions. In practice every OS-built application bundle will use exactly the same minor version of GTK, FreeType, the C++ ABI support libraries, SQLite, and so on. And if a dependency has to get patched because of one application, expect new revisions of all applications.

(In fact pretty much every source of variation in dependencies is a bad idea at the OS vendor level. Different compile options for different applications? Custom per-application patches? No, no, no, because all of them drive up the support load.)
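The 'new revisions of all applications' consequence can be sketched the same way. The dependency sets below are hypothetical, and the function is only an illustration of the bookkeeping, not how any real OS vendor's build system works.

```python
# Hypothetical map of OS-built bundles to the libraries each one embeds.
BUNDLE_DEPS = {
    "firefox": {"gtk", "freetype", "sqlite"},
    "thunderbird": {"gtk", "freetype", "sqlite"},
    "liferea": {"gtk", "sqlite"},
}


def bundles_to_rerelease(bundle_deps, patched_library):
    """Return the sorted names of every bundle that embeds patched_library.

    With shared system libraries, a fix means one updated library package.
    With bundles, every name returned here needs a new revision.
    """
    return sorted(
        name for name, deps in bundle_deps.items() if patched_library in deps
    )


print(bundles_to_rerelease(BUNDLE_DEPS, "gtk"))
print(bundles_to_rerelease(BUNDLE_DEPS, "freetype"))
```

A security fix in a widely used library such as GTK turns into a rebuild and re-release of every bundle in the list, where a shared library would have needed a single package update.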

So why is this approach so popular in Mac OS X, iOS, Windows, and so on? Because it's not being used by the OS vendor. Creators of individual applications have a completely different perspective, since they're only on the hook to support their own application. If all you support is Firefox, there is no extra cost to you if Thunderbird or Liferea is using a different GTK minor version, because updating it is not your responsibility. In fact, having your own version of GTK is an advantage, because no one else's decision to update GTK can impose support costs on you.

sysadmin/ApplicationBundleProblems written at 23:32:03

What I want out of a Linux SSD disk cache layer

One of the suggestions in response to my SSD dilemma was a number of Linux kernel systems that are designed to add a caching layer on top of regular disks; the leading candidates here seem to be dm-cache and bcache. I looked at both of them and unfortunately I don't like either one because they don't work in the way I want.

Put simply, what I want is the ability to attach a SSD read accelerator to my filesystems or devices without changing how they are currently set up. What I had hoped for was some system where you told things 'start caching traffic from X, Y, and Z' and it would all transparently just happen; your cache would quietly attach itself to the rest of the system somehow and that would be that. Later you could say 'stop caching traffic from X', or 'stop entirely', and everything would go back to how it was before. Roughly speaking this is the traditional approach taken by local disks used to cache and accelerate NFS reads in a few systems that implemented that.

Unfortunately this isn't what dm-cache and bcache do. Both of them function as an additional, explicit layer in the Linux storage stack, and because they are explicit layers you don't mount, say, your filesystem from its real device; you mount it from the dm-cache or bcache version of it. Among other things, this makes moving between using a cached version and a non-cached version of your objects a somewhat hair-raising exercise; for example, bcache explicitly needs to change an existing underlying filesystem. Want to totally back out from using bcache or dm-cache? You're probably going to have a headache.

(This is especially annoying because there are two cache options in Linux today and who knows which one will be better for me.)

Both dm-cache and bcache are probably okay for a large deployment where they are planned from the start. In a large deployment you will evaluate each in your scenario, determine which one you want and what sort of settings you want, and then install machines with the caching layer configured from the start. You expect to never remove your chosen caching layer; generally you'll have specifically configured your hardware fleet around the needs of the caching layer.

None of this describes the common scenario of 'I have an existing machine with a bunch of existing data, and I have enough money for a SSD. I'd like to speed up my stuff'. That is pretty much my scenario (at least to start with). I rather expect it's very much the scenario of any number of people with existing desktops.

(It's also effectively the scenario for new machines for people who do not buy their desktops in bulk. I'm not going to spec out and buy a machine configuration built around the assumption that some Linux caching layer will turn out to work great for me; among other things, it's too risky.)

PS: if I've misunderstood how dm-cache or bcache work, my apologies; I have only skimmed their documentation. Bcache at least has a kind of scary FAQ about using (or not using) it on existing filesystems.

linux/SSDDiskCacheDesire written at 00:47:14

