Wandering Thoughts

2018-10-05

My non-approach to password management tools

In response to my entry on why I don't set master passwords in programs, Bill asked a good question:

Does your skepticism extend to password-management tools in general? If so, then what do you store passwords in? [...]

There are two answers to this. The first one is that I simply assume that if an attacker compromises my machine, they get essentially everything no matter what I do, so I either record a password in an unencrypted form on the machine or I don't have it on my machine at all. Access to my machine more or less gives you access to my email, and with access to my email you can probably reset all of the passwords that I keep on the machine anyway. In general and in theory, none of these are important passwords.

(In practice, these days I would care a fair bit if I lost control of some of the accounts they're for. But I started doing this back in the days when the only web accounts I had were on places like Slashdot and on vendor sites that insisted that you register for things.)

But that's the anodyne, potentially defensible answer. It's true, as far as it goes, in that I make sure that important, dangerous passwords are never recorded on my machine. But it is not really why I don't have a password manager. The deeper truth is that I've never cared enough to go through the effort of investigating the various alternatives, figuring out which one is trustworthy, competent, has good cryptography, and will be there in ten years, and then putting all of my theoretically unimportant passwords into it. This is the same lack of caring and laziness that had me use unencrypted SSH keypairs for many years until I finally motivated myself to switch over.

(Probably I should motivate myself to start using some encrypted password storage scheme, but my current storage scheme for such nominally unimportant passwords has more than just the password; I also note down all sorts of additional details about the website or registration or whatever, including things like login name, the tagged email address I used for it, and so on. Really I'd want to find a decent app that did a great job of handling encrypted notes.)

I have a long history of such laziness until I'm prodded into finding better solutions, sometimes by writing about my current approaches here and facing up to their flaws. I'll have to see if that happens for this case.

PS: The reason to encrypt passwords at rest even on my machine is the same reason to encrypt my SSH keypairs at rest; it's often a lot easier in practice to read files you're not supposed to have access to than to fully compromise the machine. On the other hand, SSH keypairs are usually in a known or directly findable location, and my collection of password information is not; an attacker would need the ability to hunt around my filesystem.
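(For what it's worth, the actual mechanics of encrypting such a notes file at rest are not the hard part. Here is a minimal sketch, assuming a recent version of the third-party Python cryptography package; all of the names are made up for illustration.)

    import base64, os
    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

    def _key_from(passphrase: str, salt: bytes) -> bytes:
        # A random salt plus a high iteration count makes offline guessing
        # of the passphrase expensive; the salt is stored with the data.
        kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32,
                         salt=salt, iterations=600_000)
        return base64.urlsafe_b64encode(kdf.derive(passphrase.encode()))

    def encrypt_notes(passphrase: str, plaintext: bytes) -> bytes:
        salt = os.urandom(16)
        return salt + Fernet(_key_from(passphrase, salt)).encrypt(plaintext)

    def decrypt_notes(passphrase: str, blob: bytes) -> bytes:
        salt, token = blob[:16], blob[16:]
        return Fernet(_key_from(passphrase, salt)).decrypt(token)

The hard part is everything around this, starting with caring enough to type a passphrase every time I want to look up a login name.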

MyBadPasswordHandling written at 01:37:46; Add Comment

2018-09-24

Why I don't set master passwords in programs

There are any number of programs and systems that store passwords for you, most prominently browsers with their remembered website passwords. It's very common for these programs to ask you to set a master password that will secure the passwords they store and be necessary to unlock those passwords. One of my peculiarities is that I refuse to set up such master passwords; this shows up most often in browsers, but I stick to it elsewhere as well. The fundamental reason I don't do this is that I don't trust programs to securely handle any such master password.

You might think that everyone manages this, but in practice securely handling a master password requires a lot more than obvious things like not leaking it or leaving it sitting around in memory or the like. It also includes things like not making it easy to recover the master password through brute force, which is a problem that Firefox has (and Thunderbird too); see Wladimir Palant's writeup (via). It seems likely that other master password systems have similar issues, and at the least it's hard to trust them. Cryptography is a hard and famously tricky field, where small mistakes can turn into big problems and there are few genuine experts.
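(The gap is easy to demonstrate with nothing more than Python's standard library. The numbers below are purely illustrative, not Firefox's actual scheme, but they show why a single fast hash is not good enough.)

    import hashlib, os, time

    salt = os.urandom(8)
    guess = b"some candidate master password"

    # A single fast hash, roughly the shape of the weak scheme Palant
    # describes: an attacker can try huge numbers of guesses per second.
    start = time.perf_counter()
    for _ in range(100_000):
        hashlib.sha1(salt + guess).digest()
    print("fast hash: %.0f guesses/sec" % (100_000 / (time.perf_counter() - start)))

    # A deliberately slow KDF: each guess now costs hundreds of thousands
    # of hash iterations, which is the whole point.
    start = time.perf_counter()
    for _ in range(5):
        hashlib.pbkdf2_hmac("sha256", guess, salt, 600_000)
    print("slow KDF:  %.1f guesses/sec" % (5 / (time.perf_counter() - start)))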

I have a few core passwords that I use routinely and have memorized; these are things like Unix login passwords and the like. But if I can't trust a program to securely handle its master password, it's not safe to use one of those high value memorized passwords of mine as its master password; I'm not willing to risk the leak of, say, my Unix login password. That means that I need to create a new password to be the program's master password, and additional passwords are all sorts of hassle, especially if I don't use them frequently enough to memorize them. Even having a single password that I used for everything that wanted a master password would be an annoyance, and of course it would be somewhat insecure.

So the upshot of all of this is that I just don't use master passwords. Since all of the passwords that I do allow things to store are not strongly protected, I make sure to never allow my browsers, my IMAP clients, and so on to store the password for anything I consider really important. Sometimes this makes life a bit more inconvenient, but I'm willing to live with that.

(The exception that proves the rule is that I do have a fair bit of trust in my iPhone's security, so I'm willing to have it hold passwords that I don't allow other things to get near. But even on the iPhone, I haven't tried to use one of the password store apps like 1Password, partly because I'm not sure if they'd get me anything over Apple's native features for this.)

I don't have any clever solutions to this in general. The proliferation of programs with separate password management and separate master passwords strikes me as a system design problem, but it's one that's very hard to fix in today's cross-platform world (and it's impossible to fix on platforms without a strong force in control). Firefox, Chrome, and all of those other systems have rational reasons to have their own password stores, and once you have separate password stores you have at least some degree of user annoyance.

PS: One obvious solution to my specific issue is to find some highly trustworthy password store system and have it hold the master passwords and so on. I'm willing to believe that this can be done well on a deeply integrated system, but I primarily use Linux and so I doubt there's any way to have a setup that doesn't require various amounts of cutting and pasting. So far the whole area is too much of a hassle and involves too much uncertainty for me to dig into it.

(This is another personal limit on how much I care about security, although in a different form than the first one.)

MasterPasswordsWhyNot written at 21:30:27; Add Comment

2018-09-17

The importance of explicitly and clearly specifying things

I was going to write this entry in an abstract way, but it is easier and more honest to start with the concrete specifics and move from there to the general conclusions I draw and my points.

We recently encountered an unusual Linux NFS client behavior, which at the time I called a bug. I have since been informed that this is not actually a bug but is Linux's implementation of what Linux people call "close to open cache consistency", which is written up in the Linux NFS FAQ, section A8. I'm not sure what to call the FAQ's answer; it is partly a description of concepts and partly a description of the nominal kernel implementation. However, this kernel implementation has changed over time, as we found out, with changes in user visible behavior. In addition, the FAQ doesn't make any attempt to describe how this interacts with NFS locking or if indeed NFS locking has any effect on it.

As someone who has to deal with this from the perspective of programs that are running on Linux NFS clients today and will likely run on Linux NFS clients for many years to come, what I need is a description of the official requirements for client programs. This is not a description of what works today or what the kernel does today, because as we've seen that can change; instead, it would be a description of what the NFS developers promise will work now and in the future. As with Unix's file durability problem, this would give me something to write client programs to and mean that if I found that the kernel deviated from this behavior I could report it as a bug.

(It would also give the NFS maintainers something clear to point people to if what they report is not in fact a bug but them not understanding what the kernel requires.)

On the Linux NFS mailing list, I attempted to write a specific description of this from the FAQ's wording (you can see my attempt here), and then asked some questions about what effect using flock() had on this (since the FAQ is entirely silent on this). This uncovered another Linux NFS developer who apparently has a different (and less strict) view of what the kernel should require from programs here. It has not yet yielded any clarity on what's guaranteed about flock()'s interaction with Linux CTO cache consistency.
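(For concreteness, here is roughly the discipline that I believe the strict reading of close to open consistency requires of client programs. This is my own sketch in Python, not something any NFS developer has signed off on.)

    import os

    def publish(path, data):
        # Writer: all writes happen between open and close; close() is the
        # point where the client must flush the data back to the server.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
        try:
            os.write(fd, data)
        finally:
            os.close(fd)

    def consume(path):
        # Reader: the open() must happen after the writer's close(); that
        # is the point where the client revalidates its cached view.
        fd = os.open(path, os.O_RDONLY)
        try:
            return os.read(fd, 1 << 20)
        finally:
            os.close(fd)

    # What remains unclear is whether taking an flock() on an already-open
    # file gets you anything stronger than this.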

The importance of explicitly and clearly specifying things is that it deals with all four issues that have been uncovered here. With a clear and explicit specification (which doesn't have to be a formal, legalistic thing), it would be obvious what writers of programs must do to guarantee things working (not just now but also into the future), all of the developers could be sure that they were in agreement about how the code should work (and if there's disagreement, it would be immediately uncovered), any unclear or unspecified areas would at least become obvious (you could notice that the specification says nothing about what flock() does), and it would be much clearer if kernel behavior was a bug or if a kernel change introduced a deviation from the agreed specification.

This is a general thing, not something specific to the Linux kernel or kernels in general. For 'kernel' you can substitute 'any system that other people base things on', like compilers, languages, web servers, etc etc. In a sense this applies to anything that you can describe as an API. If you have an API, you want to know how you use the API correctly, what the API actually is (not just the current implementation), if the API is ambiguous or incomplete, and if something is a bug (it violates the API) or just a surprise. All of this is very much helped by having a clear and explicit description of the API (and, I suppose I should add, a complete one).

ExplicitSpecImportance written at 01:06:10; Add Comment

2018-08-20

Explicit manipulation versus indirect manipulation UIs

One of the broad splits in user interfaces in general is the spectrum between what I'll call explicit manipulation and indirect manipulation. Put simply, in an explicit manipulation interface you see what you're working on and you specify it directly, and in an indirect manipulation interface you don't; you specify it indirectly. The archetypal explicit manipulation interface is the classical GUI mouse-based text selection and operations on it; you directly select the text with the mouse cursor and you can directly see your selection.

(This directness starts to slip away once your selection is large enough that you can no longer see it all on the screen at once.)

An example of an indirect manipulation interface is the common interactive Unix shell feature of !-n, for repeating (or getting access to) the Nth previous command line. You aren't directly pointing to the command line and you may not even still have it visible on the screen; instead you're using it indirectly, through knowledge of what relative command number it is.

A common advantage of indirect manipulation is that indirect manipulation is compact and powerful, and often fast; you can do a lot very concisely with indirect manipulation. Typing '!-7 CR' is unquestionably a lot faster than scrolling back up through a bunch of output to select and then copy/paste a command line. Even the intermediate version of hitting cursor up a few times until the desired command appears and then CR is faster than the full scale GUI text selection.

(Unix shell command line editing features span the spectrum of strong indirect manipulation through strong explicit manipulation; there's the !-n notation, cursor up/down, interactive search, and once you have a command line you can edit it in basically an explicit manipulation interface where you move the cursor around in the line to delete or retype or alter various bits.)

Indirect manipulation also scales and automates well; it's generally clear how to logically extend it to some sort of bulk operation that doesn't require any particular interaction. You specify what you want to operate on and what you want to do, and there you go. Abstraction more or less requires the use of indirect manipulation at some level.

The downside of indirect manipulation is that it requires you to maintain context in order to use it, in contrast to explicit manipulation where it's visible right in front of you. You can't type '!-7' without the context that the command you want is that one, not -6 or -8 or some other number. You need to construct and maintain this context in order to really use indirect manipulation effectively, and if you get the context wrong, bad things happen. I have accidentally shut down a system by being confidently wrong about what shell command line a cursor-up would retrieve, for example, and mistakes about context are a frequent source of production accidents like 'oops we just mangled the live database, not the test one' (or 'oops we modified much more of the database than we thought this operation would apply to').

My guess is that in much the same way that custom interfaces can be a benefit for people who use them a lot, indirect manipulation interfaces work best for frequent and ongoing users, because these are the people who will have the most experience at maintaining the necessary context in their head. Conveniently, these are the people who can often gain the most from using the compact, rapid power of indirect manipulation, simply because they spend so much time doing things with the system. By corollary, people who only infrequently use a thing are not necessarily going to remember context or be good at constructing it in their head and keeping track of it as they work (see also).

(The really great trick is to figure out some way to provide the power and compactness of indirect manipulation along with the low need for context of explicit manipulation. This is generally not easy to pull off, but in my view incremental search shows one path toward it.)

PS: I'm using 'user interface' very broadly here, in a sense that goes well beyond graphical UIs. Unix shells have a UI, programs have a UI in their command line arguments, sed and awk have a UI in the form of their little languages, programming languages and APIs have and are UIs, and so on. If people use it, it's in some sense a user interface.

(I'd like to use the term 'direct manipulation' for what I'm calling 'explicit manipulation' here, but the term has an established, narrower definition. GUI direct manipulation interfaces are a subset of what I'm calling explicit manipulation interfaces.)

ExplicitVsIndirectManipulation written at 22:12:16; Add Comment

2018-08-05

Why email is often not as good as modern communication protocols

I was recently reading Git is already federated & decentralized (via). To summarize the article, it's a reaction to proposals to decentralize large git servers (of the Github/Gitlab variety) by having them talk to each other with ActivityPub. Drew DeVault notes that Git can already be used in a distributed way over email, describes how git forges could talk to each other via email, and contrasts it with ActivityPub. In the process of this article, Drew DeVault proposes using email not just between git forges but between git forges and users, and this is where my sysadmin eyebrows went up.

The fundamental problem with email as a protocol, as compared to more modern ones, is that standard basic email is an 'anonymous push' protocol. You do not poll things you're interested in, they push data to you, and you can be pushed to with no form of subscription on your part required; all that's needed is a widely shared identifier in the form of your email address. This naturally and necessarily leads to spam. An anonymous push protocol is great if you want to get contacted by arbitrary strangers, but that necessarily leads to arbitrary strangers being able to contact you whether or not you're actually interested in what they have to say.

This susceptibility to spam is not desired (to put it one way) in a modern protocol, a protocol that is designed to survive and prosper on today's Internet. Unless you absolutely need the ability to be contacted by arbitrary strangers, a modern protocol should be either pull-based or require some form of explicit and verifiable subscription, such that pushes without the subscription information automatically fail.

(One form of explicit verification is to make the push endpoint different for each 'subscription' by incorporating some kind of random secret in eg the web-hook URL that notifications are POSTed to.)
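(A minimal sketch of that scheme in Python, with all of the names made up for the example.)

    import hmac, secrets

    subscriptions = {}   # per-subscription secret -> subscriber details

    def deliver(subscriber, payload):
        print("delivering to", subscriber, ":", payload)

    def create_subscription(subscriber):
        # Only the subscriber and the service ever see this URL.
        token = secrets.token_urlsafe(32)
        subscriptions[token] = subscriber
        return "https://example.org/hooks/" + token

    def accept_push(url_token, payload):
        # Constant-time comparison against known subscriptions; unsolicited
        # pushes fail because the sender can't guess a valid token.
        for token, subscriber in subscriptions.items():
            if hmac.compare_digest(token, url_token):
                deliver(subscriber, payload)
                return True
        return False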

It's possible to use email as the transport to implement a protocol that doesn't allow anonymous push; you can require signed messages, for example, and arrange to automatically reject or drop unsigned or badly signed ones. But this requires an additional layer of software on top of email; it is not simple, basic email by itself, and that means that it can't be used directly by people with just a straightforward mail address and mail client. As a result, I think that Drew DeVault's idea of using email as the transport mechanism between git forges is perfectly fine (although you're going to want to layer message signatures and other precautions on top), but the idea of extending that to directly involving people's email boxes is not really the right way to go, or at least not the right way to go by itself.

(To be blunt, one of the great appeals of Github to me is that I can to a large extent participate in Github without spraying my email address around to all and sundry. It still leaks out and I still get spam due to participating in Github, but it's a lot less than I would if all Github activity took place in mailing lists that were as public as eg Github issues are.)

EmailVsModernProtocols written at 00:03:11; Add Comment

2018-07-14

The challenge of storing file attributes on disk

In pretty much every Unix filesystem and many non-Unix ones, files (and more generally all filesystem objects) have a collection of various basic attributes, things like modification time, permissions, ownership, and so on, as well as additional attributes that the filesystem uses for internal purposes (eg). This means that every filesystem needs to figure out how to store and represent these attributes on disk (and to a lesser extent in memory). This presents two problems, an immediate one and a long term one.

The immediate problem is that different types of filesystem objects have different attributes that make sense for them. A plain file definitely needs a (byte) length that is stored on disk, but that doesn't make any sense to store on disk for things like FIFOs, Unix domain sockets, and even block and character devices, and it's not clear if a (byte) length still makes sense for directories either given that they're often complex data structures today. There are also attributes that some non-file objects need that files don't; a classical example in Unix is st_rdev, the device ID of special files.

(Block and character devices may have byte lengths in stat() results but that's a different thing entirely than storing a byte length for them on disk. You probably don't want to pay any attention to the on-disk 'length' for them, partly so that you don't have to worry about updating it to reflect what you'll return in stat(). Non-linear directories definitely have a space usage, but that's usually reported in blocks; a size in bytes doesn't necessarily make much sense unless it's just 'block count times block size'.)

The usual answer for this is to punt. The filesystem will define an on-disk structure (an 'inode') that contains all of the fields that are considered essential, especially for plain files, and that's pretty much it. Objects that don't use some of the basic attributes still pay the space cost for them, and extra attributes you might want either get smuggled somewhere or usually just aren't present. Would you like attributes for how many live entries and how many empty entry slots are in a directory? You don't get it, because it would be too much overhead to have the attributes there for everything.

The long term problem is dealing with the evolution of your attributes. You may think that they're perfect now (or at least that you can't do better given your constraints), but if your filesystem lives for long enough, that will change. Generally, either you'll want to add new attributes or you'll want to change existing ones (for example, widening a timestamp from 32 bits to 64 bits). More rarely you may discover that existing attributes make no sense any more or aren't as useful as you thought.

If you thought ahead, the usual answer for this is to include unused extra space in your on-disk attribute structure and then slowly start using it for new attributes or extending existing ones. This works, at least for a while, but it has various drawbacks, including that because you only have limited space you'll have long arguments about what attributes are important enough to claim some of it. On the other hand, perhaps you should have long arguments over permanent changes to the data stored in the on-disk format and face strong pressures to do it only when you really have to.
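(As a toy illustration of both problems, here is what such a fixed on-disk structure looks like when written out with Python's struct module. The field list is invented for the example, not any real filesystem's inode; every object pays for every field whether or not the field makes sense for it, and the only concession to the future is a lump of reserved padding.)

    import struct

    # Hypothetical fixed-size on-disk inode; 64 bytes per object, always.
    ONDISK_INODE = struct.Struct(
        "<"     # little-endian, no compiler padding
        "H"     # mode: object type plus permissions
        "H"     # link count
        "I"     # owner uid
        "I"     # owner gid
        "Q"     # byte length: only really meaningful for regular files
        "q"     # mtime, 64-bit from the start so it never needs widening
        "q"     # ctime
        "I"     # rdev: only meaningful for device special files
        "24x"   # reserved space for attributes nobody has thought of yet
    )
    assert ONDISK_INODE.size == 64

    def pack_inode(mode, nlink, uid, gid, size, mtime, ctime, rdev=0):
        return ONDISK_INODE.pack(mode, nlink, uid, gid, size, mtime, ctime, rdev)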

As an obvious note, the reason that people turn to a fixed on-disk 'inode' structure is that it's generally the most space-efficient option and you're going to have a lot of these things sitting around. In most filesystems, most of them will be for regular files (which will outnumber directories and other things), and so there is a natural pressure to prioritize what regular files need at the possible expense of other things. It's not the only option, though; people have tried a lot of things.

I've talked about the on-disk format for filesystems here, but you face similar issues in any archive format (tar, ZIP, etc). Almost all of them have file 'attributes' or metadata beyond the name and the in-archive size, and they have to store and represent this somehow. Often archive formats face both issues; different types of things in the archive want different attributes, and what attributes and other metadata needs to be stored changes over time. There have been some creative solutions over the years.

FileAttributesProblem written at 00:28:48; Add Comment

2018-07-10

Some thoughts on performance shifts in moving from an iSCSI SAN to local SSDs

At one level, we're planning for our new fileserver environment to be very similar to our old one. It will still use ZFS and NFS, our clients will treat it the same, and we're even going to be reusing almost all of our local management tools more or less intact. At another level, though, it's very different because we're dropping our SAN in this iteration. Our current environment is an iSCSI-based SAN using HDs, where every fileserver connects to two iSCSI backends over two independent 1G Ethernet networks; mirrored pairs of disks are split between backends, so we can lose an entire backend without losing any ZFS pools. Our new generation of hardware uses local SSDs, with mirrored pairs of disks split between SATA and SAS. This drastic low level change is going to change a number of performance and failure characteristics of our environment, and today I want to think aloud about how the two environments will differ.

(One reason I care about their differences is that it affects how we want to operate ZFS, by changing what's slow or user-visible and what's not.)

In our current iSCSI environment, we have roughly 200 MBytes/sec of total read bandwidth and write bandwidth across all disks (which we can theoretically get simultaneously) and individual disks can probably do about 100 to 150 MBytes/sec of some combination of reads and writes. With mirrors, we have 2x write amplification from incoming NFS traffic to outgoing iSCSI writes, so 100 Mbytes/sec of incoming NFS writes saturates our disk write bandwidth (and it also seems to squeeze our read bandwidth). Individual disks can do on the order of 100 IOPs/sec, and with mirrors, pure read traffic can be distributed across both disks in a pair for 200 IOPs/sec in total. Disks are shared between multiple pools, which visibly causes problems, possibly because the sharing is invisible to our OmniOS fileservers so they do a bad job of scheduling IO.

Faults have happened at all levels of this SAN setup. We have lost individual disks, we have had one of the two iSCSI networks stop being used for some or all of the disks or backends (usually due to software issues), and we have had entire backends need to be rotated out of service and replaced with another one. When we stop using one of the iSCSI networks for most or all disks of one backend, that backend drops to 100 Mbytes/sec of total read and write bandwidth, and we've had cases where the OmniOS fileserver just stopped using one network so it was reduced to 100 Mbytes/sec to both backends combined.

On our new hardware with local Crucial MX300 and MX500 SSDs, each individual disk has roughly 500 Mbytes/sec of read bandwidth and at least 250 Mbytes/sec of write bandwidth (the reads are probably hitting the 6.0 Gbps SATA link speed limit). The SAS controller seems to have no total bandwidth limit that we can notice with our disks, but the SATA controller appears to top out at about 2000 Mbytes/sec of aggregate read bandwidth. The SSDs can sustain over 10K read IOPs/sec each, even with all sixteen active at once. With a single 10G-T network connection for NFS traffic, a fileserver can do at most about 1 GByte/sec of outgoing reads (which theoretically can be satisfied from a single pair of disks) and 1 GByte/sec of incoming writes (which would likely require at least four disk pairs to get enough total write bandwidth, and probably more because we're writing additional ZFS metadata and periodically forcing the SSDs to flush and so on).
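(Written out as back of the envelope arithmetic, using the rough per-disk numbers from above.)

    # Mirrored vdevs mean every byte of incoming NFS write is written twice.
    WRITE_AMPLIFICATION = 2

    # Old iSCSI environment: roughly 200 MBytes/sec of total write bandwidth.
    old_total_write_mb = 200
    print("old: NFS write rate that saturates the SAN:",
          old_total_write_mb / WRITE_AMPLIFICATION, "MBytes/sec")      # -> 100

    # New local-SSD environment: a 10G-T link delivers about 1 GByte/sec of
    # incoming NFS writes, and each SSD sustains at least ~250 MBytes/sec.
    incoming_mb = 1000
    per_disk_write_mb = 250
    # Each mirrored pair absorbs incoming writes at the rate of one disk,
    # since both disks in the pair write the same data.
    pairs_needed = incoming_mb / per_disk_write_mb
    print("new: mirrored pairs needed to absorb that:", pairs_needed)  # -> 4.0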

As far as failures go, we don't expect to lose either the SAS or the SATA controllers, since both of them are integrated into the motherboard. This means we have no analog of an iSCSI backend failure (or temporary unavailability), where a significant number of physical disks are lost at once. Instead the only likely failures seem to be the loss of individual disks and we certainly hope to not have a bunch fall over at once. I have seen a SATA-connected disk drop from a 6.0 Gbps SATA link speed down to 1.5 Gbps, but that may have been an exceptional case caused by pulling it out and then immediately re-inserting it; this dropped the disk's read speed to 140 MBytes/sec or so. We'll likely want to monitor for this, or in general for any link speed that's not 6.0 Gbps.
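(If we do monitor for this, the negotiated SATA link speed is exposed through sysfs; the sketch below assumes the libata /sys/class/ata_link hierarchy and its sata_spd attribute, which is my best guess at the right place to look.)

    import glob

    # Warn about any SATA link that has not negotiated the full 6.0 Gbps.
    for linkdir in sorted(glob.glob("/sys/class/ata_link/link*")):
        try:
            with open(linkdir + "/sata_spd") as f:
                spd = f.read().strip()
        except OSError:
            continue
        if spd and spd != "6.0 Gbps":
            print("degraded SATA link:", linkdir, "at", spd)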

(We may someday have what is effectively a total server failure, even if the server stays partially up after a fan failure or a partial motherboard explosion or whatever. But if this happens, we've already accepted that the server is 'down' until we can physically do things to fix or replace it.)

In our current iSCSI environment, both ZFS scrubs to check data integrity and ZFS resilvers to replace failed disks can easily have a visible impact on performance during the workday and they don't go really fast even after our tuning; this is probably not surprising given both total read/write bandwidth limits from 1G networking and IOPs/sec limits from using HDs. When coupled with our multi-tenancy, this means that we've generally limited how much scrubbing and resilvering we'll do at once. We may have historically been too cautious about limiting resilvers (they're cheaper than you might think), but we do have a relatively low total write bandwidth limit.

Our old fileservers couldn't have the same ZFS pool use two chunks from the same physical disk without significant performance impact. On our new hardware this doesn't seem to be a problem, which suggests that we may experience much less impact from multi-tenancy (which we're still going to have, due to how we sell storage). This is intuitively what I'd expect, at least for random IO, since SSDs have so many IOPs/sec available; it may also help that the fileserver can now see that all of this IO is going to the same disk and schedule it better.

On our new hardware, test ZFS scrubs and resilvers have run at anywhere from 250 Mbyte/sec on upward (on mirrored pools), depending on the test pool's setup and contents. With high SSD IOPs/sec and read and write bandwidth (both to individual disks and in general), it seems very likely that we can be much more aggressive about scrubs and resilvers without visibly affecting NFS fileserver performance, even during the workday. With an apparent 6000 Mbytes/sec of total read bandwidth and perhaps 4000 Mbytes/sec of total write bandwidth, we're pretty unlikely to starve regular NFS IO with scrub or resilver IO even with aggressive tuning settings.

(One consequence of hoping to mostly see single-disk failures is that under normal circumstances, a given ZFS pool will only ever have a single failed 'disk' from a single vdev. This makes it much less relevant that resilvering multiple disks at once in a ZFS pool is mostly free; the multi-disk case is probably going to be a pretty rare thing, much rarer than it is in our current iSCSI environment.)

ShiftsInSANToLocalSSD written at 23:43:52; Add Comment

2018-07-08

TLS Certificate Authorities and 'trust'

In casual conversation about CAs, it's common for people to talk about whether you trust a CA (or should) and whether a CA is trustworthy. I often bristle at using 'trust' in these contexts, but it's been hard to articulate why. Today, in a conversation on HN prompted by my entry on the first imperative of commercial CAs, I came up with a useful explanation.

Let's imagine that there's a new CA that's successfully set itself up as a copy of how Let's Encrypt operates; it uses the same hardware, runs the same open source software, configures things the same, follows the same procedures, has equally good staff, has been properly audited, and in general has completely duplicated Let's Encrypt's security and operational excellence. However, it has opted for the intellectually pure approach of starting with new root certificates that are not cross-signed by anyone and it is not in any browser root stores yet; as a result, its certificates are not trusted by any browser.

(Let's Encrypt has made this example plausible, because as a non-commercial CA that mostly does things with automation it doesn't have as many reasons to keep how it operates a secret as a commercial CA does.)

In any reasonable and normal sense of the word, this CA is as trustworthy as Let's Encrypt is. It will issue or not issue TLS certificates in the same situations that LE would (ignoring rate limits and pretending that everyone who authorizes LE in CAA records will also authorize this CA and so on), and its infrastructure and procedures are as secure and solid as LE's. If we trust LE, and I think we do, it's hard to say why we wouldn't trust this CA.

If we say that this CA is 'less trustworthy' than Let's Encrypt anyway, what we really mean is 'TLS certificates from this CA currently provoke browser warnings'. This is a perfectly good thing to care about (and it's usually what matters in practice), but it is not really 'trust' and the difference matters because we have a whole tangled set of expectations, beliefs, and intuitions surrounding the idea of trust. When we use the language of trust to talk about technical issues of which CA certificates the browsers accept and when, we create at least some confusion and lose some clarity, and we risk losing sight of what browser-accepted TLS certificates really are, what they tell us, and what we care about with them.

For instance, if we talk about trust and you get a TLS certificate from a CA, it seems to make intuitive sense to say that you need to trust the CA and that it should be trustworthy. But what does that actually mean when we look at the technical details? What should the CA do or not do? How does that affect our security, especially in light of the fundamental practical problem with the CA model?

At the same time, talking about the trustworthiness of a CA is not a pointless thing. If a CA is not trustworthy (in the normal sense of the word), it should not be included in browsers (and eventually will not be). It's just that the trustworthiness of a CA is only loosely correlated with whether TLS certificates from the CA are currently accepted by browsers, which is almost always what we really care about. As we've seen with StartCom, it can take a quite long time to transition from concluding that a CA is no longer trustworthy to having all its TLS certificates no longer accepted by browsers.

There can also be some amount of time when a new CA is trustworthy but is not included in browsers, because inclusion takes a while. This actually happened with Let's Encrypt; it's just that Let's Encrypt worked around this time delay by getting their certificate cross-signed by an existing in-browser CA, so people mostly didn't notice.

(I will concede that using 'trust' casually is very attractive. For example, in the sentence above I initially wrote 'trusted CA' instead of 'in-browser CA', and while that's sort of accurate I decided it was not the right phrasing to use in this entry.)

Sidebar: The one sort of real trust required in the CA model

Browser vendors and other people who maintain sets of root certificates must trust that CAs included in them will not issue certificates improperly and will in general conduct themselves according to the standards and requirements that the browser has. What constitutes improper issuance is one part straightforward and one part very complicated and nuanced; see, for example, the baseline requirements.

CertificateAuthoritiesAndTrust written at 23:02:11; Add Comment

2018-06-25

Twitter probably isn't for you or me any more

I'm currently feeling unhappy about Twitter, because last Friday I confirmed that my Linux Twitter client is going to stop working in less than two months in Twitter's client apocalypse (my iOS client is very likely to be affected too). This development isn't an exception; instead it's business as usual for Twitter, at least as long term and active users of Twitter see it. Twitter has a long history of making product changes that we don't want and ignoring the ones that we do want, like a return of chronological timelines. Worse, even bigger ones are said to be on the way, with further changes from the classical Twitter experience. Why is Twitter ignoring its long term users this way? Is it going to change its mind? My guess is probably not. Instead, I've come to believe that Twitter has made a cold blooded business decision that it's not for you or me any more.

Here is how I see things at the moment, from my grumpy and cynical perspective.

Twitter is a tech company with a highly priced stock, which current investors want and need to be even higher. To support and increase its stock price, Twitter needs to grow its revenue and grow it fast (no one is going to sit around for five or ten years of slow growth). Like many modern Internet companies, Twitter is currently mostly an advertising company and makes money by showing ads to its users. There are three broad ways for an ad company to increase the money coming in; it can:

  1. increase the value of its ads, so companies will pay more for them.
  2. show more ads to current users.
  3. increase the number of (active) users, so it sells more ads in total.

The first seems unlikely to happen, especially given that Internet ad trends seem to be running the other way. The second generally doesn't work too well and can work against the first. Neither of them, separately or together, seem likely to deliver the sort of major growth Twitter needs (even if they work, they both have limits). So that leaves increasing the number of users.

But Twitter has already been trying to grow its user base for years, generally without much success and certainly without the very visible large scale growth that investors need. As part of this, Twitter has spent years refining and tinkering with the core Twitter product in attempts to draw in more users and get them to be more active, and with moderate exceptions it hasn't worked. The modern Twitter is genuinely more pleasant in various modest ways than it was when I started, but it's manifestly not drawing in hordes of new users.

In this situation, Twitter has a choice. It could double down on its past approach, trying yet more tweaks to the current core Twitter experience in a 'this time for sure' bet even though that's repeatedly failed before. Alternately, it could make a cold blooded business decision to shift to a significantly different core experience that (Twitter feels) has a much better chance of pulling in the vast ocean of users in the world who aren't particularly attracted to the current version of Twitter, and may even be turned off by it.

I believe that Twitter's made the second choice. It's decided to change what 'Twitter' is; as a result, 'Twitter' is no longer for you and me, the people who like it as it is, as a chronological timeline and so on. 'Twitter' the experience is now going to be for the new users that Twitter (the company) needs in order to have a chance of growing revenue enough and keeping its share price up. If the new experience displeases or outright alienates you and me, that's just tough luck for us. The Twitter that we find interesting and compelling, the product that's useful to us, well, it's apparently not capable of growing big enough (for Twitter's investors, at least; it might be a profitable company without the baggage of a high stock price).

(Analogies to the rise and then the stall of syndication feed reading are left as an exercise for the reader, including any arguments that there was or wasn't a natural limit to the number of people who'd ever want to use a feed reader.)

I have no idea and no opinions on where this leaves you and me, the people who like Twitter as it is, or what alternatives we really have, especially if the community we've found on Twitter is important to us. The unpleasant answer may be that things will just dissolve; we'll all walk away in our own separate and scattered directions, as people walked away from Usenet communities once upon a time.

PS: Twitter added 6 million 'monthly active users' in Q1 2018 (and not all of them will be bots), but it also attributed a bunch of this to new experiences and features, not the core product suddenly being more attractive. See also, about Twitter's Q4 2017. Twitter is also apparently making more money from video ads, but there's a limit to how much money growth that can drive; after a certain point, they're (almost) all video ads.

TwitterNoLongerForMeOrYou written at 23:40:25; Add Comment

2018-06-08

Networks form through usage and must be maintained through it

I recently read The network's the thing (via, itself probably via). One of the things this article is about is how for many companies, the network created by your users is the important piece, not the product itself; you can change the product but retain the network and win, despite the yelling of aggravated users. My impression is that this is a popular view of companies, especially companies with social networks.

(Purely social networks are not the only sort of networks there are. Consider Github; part of its value is from the network of associations among users and repositories. This is 'social' in the sense that it involves humans talking to each other, but it is not 'social' in the sense that Twitter, Facebook, and many other places are.)

On the one hand, this "networks matter but products don't" view is basically true. On the other hand, I think that this is a view that misses an important detail. You see, users do not form (social) networks on services out of the goodness of their hearts. Instead, those social networks form and remain because they are useful to people. More than that, they don't form in isolation; instead they're created as a side effect of the usefulness of the service itself to people, and this usefulness depends on how the service works (and on how people use it). People create the network by using the product, and the network forms only to the extent and in the manner that the product is useful to them.

As a corollary, if the product changes enough, the network by itself will not necessarily keep people present. What actually matters to people is not the network as an abstract thing, it's the use they get out of the network. If that use dries up because the product is no longer useful to them, well, there you go. For example, if people are drawn to Twitter to have conversations and someday Twitter breaks that by changing how tweets are displayed and threaded together so that you stop being able to see and get into conversations, people will drift away. Twitter's deep social network remains in theory, but it is not useful to you any more so it might as well not exist in practice.

In this sense, your network is not and cannot be separated from your core product (or products). It exists only because people use those products and they do that (in general) because the products help them or meet some need they feel. If people stop finding your product useful, the network will wither. If different people find your product useful in different ways, the shape of your (active) network will shift.

(For example, Twitter's network graph will probably look quite different if it becomes a place where most people passively follow a small number of 'star' accounts and never interact with most other people on Twitter.)

At the same time, changes in the product don't necessarily really change the network because they don't necessarily drastically change the use people get out of the service. To pick on Twitter again, the length of tweets or whether or not their length counts people mentioned in them are not crucial to the use most people get out of Twitter, for all that I know I heard yelling about both.

NetworksMustBeUseful written at 00:44:17; Add Comment
