2018-06-04
The superficial versus deep appeal of ZFS
Red Hat recently announced Stratis (for a suitable value of 'recently'). In their articles introducing it, Red Hat says explicitly that they looked at ZFS, among other similar things, and implies that they did their best to take the appealing bits from ZFS. So what are Stratis's priorities? Let's quote:
Stratis aims to make three things easier: initial configuration of storage; making later changes; and using advanced storage features like snapshots, thin provisioning, and even tiering.
If you know ZFS, it's clear where Stratis draws inspiration from both ZFS features and ZFS limitations. But it's also clear what the Stratis people see as ZFS's appeals (especially in part 2 of their articles); I would summarize these as flexible storage management (where you have a pool of storage that can be flexibly used by filesystems) and good command line tools.
These are good appeals of ZFS, make no mistake. I like them myself, and chafe when I wind up dealing with less flexible and more cumbersome storage management via LVM. But as someone who's used ZFS for years, it's also my opinion that they are superficial appeals of ZFS. They're the obvious things that you notice right away when you start using ZFS, and for good reason; it's very liberating to not have to pre-assign space to filesystems and so on.
(Casually making a snapshot before some potentially breaking change like switching Firefox versions and then being able to retrieve files from the snapshot in order to revert is also a cool trick.)
However, the longer I've used it the more I've come to see the deep appeal of ZFS as its checksums and how these are deeply integrated into its RAID layer to enable things like self-healing (such deep integration is required for this). You generally can't see this appeal right away, when you just set up and use ZFS. Instead you have to use ZFS for a while, through scrubs, disks that develop problems, and perhaps ZFS noticing and repairing damage to your pool without losing any data. This reassurance that your data is intact and repairable is something I've come to really treasure in ZFS and why I don't want to use anything without checksums any more.
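As a rough illustration of why that integration matters, here is a toy Python sketch of a checksummed mirror. It is not how ZFS is actually implemented; the ToyMirror class, its methods, and the in-memory 'disks' are all made up for this example. The one idea it tries to show is that keeping a checksum apart from the data lets a read tell which copy is the good one, and then rewrite the bad one.

```python
import hashlib


class ToyMirror:
    """Hypothetical mirror that keeps each block's checksum apart from the data."""

    def __init__(self, copies=2):
        # Each 'disk' is just a dict of block id -> bytes (an assumption
        # made for illustration; real disks are obviously not dicts).
        self.disks = [dict() for _ in range(copies)]
        # Checksums live outside the disks, standing in for a parent
        # block that records the checksum of its child.
        self.checksums = {}

    def write(self, block_id, data):
        """Record the checksum, then store the data on every disk."""
        self.checksums[block_id] = hashlib.sha256(data).hexdigest()
        for disk in self.disks:
            disk[block_id] = data

    def read(self, block_id):
        """Return (data, repaired), where repaired lists the disks rewritten."""
        expected = self.checksums[block_id]
        bad = []
        for index, disk in enumerate(self.disks):
            data = disk.get(block_id)
            if data is not None and hashlib.sha256(data).hexdigest() == expected:
                # This copy is known good, so heal every copy that failed
                # verification before we got here.
                for bad_index in bad:
                    self.disks[bad_index][block_id] = data
                return data, bad
            bad.append(index)
        raise IOError(f"no intact copy of block {block_id!r}")


mirror = ToyMirror()
mirror.write("block0", b"important data")
mirror.disks[0]["block0"] = b"important dat4"  # simulate silent corruption
data, repaired = mirror.read("block0")
print(data, repaired)
```

A plain mirror without checksums could notice that the two copies differ, but it would have no way of knowing which one to trust. Here the read verifies each copy against the stored checksum, returns the intact one, and rewrites the damaged copy along the way.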
On the whole, Stratis (or at least the articles about it) provides an interesting window on how people see ZFS and how that differs from my view of ZFS. There are probably lessons here for how people view many technologies, and I've certainly experienced this sort of split in other contexts.
What I use Github for and how I feel about it
In light of recent events (or at least rumours of Microsoft buying Github), I've been driven to think about my view of Github and how I'd feel about it changing or disappearing. This really has two sides, that of someone who has repos on Github and that of someone who uses other people's repos on Github, and today I feel like writing about the first (because it's simpler).
Some people probably use Github as the center of their own work, and to a certain extent Github tries hard to make that inevitable if you have repositories that are popular enough to attract activity from other people (because they'll interact with your Github presence unless you work hard to prevent that). In my case, I don't have things set up that way, at least theoretically. Github doesn't host the master copy of any of my repositories; instead I maintain the master copies on my own machines and treat the Github version as a convenient publicly visible version (one that presents, more or less, what I want people to be using). If Github disappeared tomorrow, I could move the public version to another place (such as Gitlab or Bitbucket), or perhaps finally get around to setting up my own Git publishing system.
Well, except for the bit where most or all of my public projects currently list their Github URL in the README and so on, and I have places (such as my Firefox addon's page) that explicitly list the Github URLs. All of those would have to be updated, which starts to point out the problem; those updates would have to propagate through to any users of my software somehow. The reality is that I've been sort of lazy in my README links and so on; they tend to point only to Github, not to Github plus anywhere else. What they should really do is point to Github plus some page that I run (and perhaps additional public Git repo URLs if I establish them on Gitlab or wherever).
There are some additional things that I'd lose, too. To start with, I'd lose any issues that people have filed and any pull requests that people have made (although I think that one can get copies of those, and perhaps I should). I'd also lose knowledge of people's forks of my Github repos and the ability to look at any changes that they may have made to them, changes that either show me popular modifications or point to things that perhaps I should adopt.
All of these things make Github sticky in a soft way. It's not that you can't extract yourself from Github or maintain a presence apart from it; it's that Github has made its embrace inviting and easy to take advantage of. It's very easy to slide into Github tacitly being your open source presence, where people go to find you and your stuff. If I wanted to change this (which I currently don't), I'm honestly not sure how I'd make it clear on my Github presence that people should now look elsewhere.
I don't regret having drifted into using Github this way, because to be honest I probably wouldn't have public repositories and a central point for them without Github or some equivalent. At the same time I'm aware that I drift into bad habits because they're easy and it's possible that Github is one such bad habit. Am I going to go to the effort of changing this? Certainly not right away (especially to my own infrastructure). Probably, like many people who have their code on Github, I'm going to wait and see and above all hope that I don't actually have to do anything.
(I'm also not convinced that there is any truly safe option for having other people host the public side of my repositories. Sourceforge serves as a cautionary example of what can happen to such places, and it's not like Gitlab, Bitbucket, and so on are obviously safer than Github is or was; they're just not (currently) owned by Microsoft. The money to pay for all of the web servers and disk space and so on has to come from somewhere, and I'm probably going to be a freeloader on any of them.)