Wandering Thoughts archives

2021-01-10

Thinking through why you shouldn't use plaintext passwords in authentication, even inside TLS

I recently read an article (this one, via) that advocated for just using plaintext passwords inside TLS for things like IMAP and (authenticated) SMTP. My gut reaction was that this was a terrible idea in general, but I couldn't immediately come up with a solid reason why, or with an explanation of why other alternatives are better for authentication. So here's an attempt.

There are (at least) two problems with passwords in general. The first problem is that people reuse passwords from one place to another, so knowing someone's password on site A often gives an attacker a big lead on breaking into their accounts elsewhere (or into other services on the same site, if it has several). The second problem is that if you can obtain someone's password from a site through read-only access, you can usually leverage that into being able to log in as them, and thus change anything they have access to (or even see things you couldn't before).

The consequence of this is that sending a password in plaintext over an encrypted connection has about the worst risk profile of the various plausible means of authentication. This is because both ends see the password and the server side has to directly know it in some form (encrypted and salted, hopefully). Our history is full of accidents where the client, the server, or both wind up logging passwords (for example, as part of logging the full conversation for debugging) or exposing them temporarily in some way, and generally the authentication information the server has to store can be directly brute forced to recover those passwords, which can turn a small information disclosure into a password breach.
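As a concrete illustration of what 'salted' server-side storage looks like, here's a minimal sketch in Python using the standard library's hashlib.scrypt. The choice of KDF and its parameters here are assumptions made for the example, not something the article or this entry prescribes.

import hashlib
import hmac
import os

def hash_password(password):
    # A per-user random salt defeats precomputed rainbow tables; scrypt is
    # deliberately expensive to slow down brute forcing of a leaked database.
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def check_password(password, salt, digest):
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    # Constant-time comparison avoids leaking how much of the digest matched.
    return hmac.compare_digest(candidate, digest)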

So what are your options? In descending order of how ideal they are, I can think of three:

  • Have someone else do the user authentication for you, and only validate their answers through solidly secure means like public key cryptography. If you can get this right, you outsource all of the hassles of dealing with authentication in the real world to someone else, which is often a major win for everyone.

    (On the other hand, this gives third parties some control over your users, so you may want to have a backup plan.)

  • Use keypairs as SSH does (there's a rough sketch of this in code after the list). This requires the user (or their software) to hold their key and hopefully encrypt it locally, but the great advantage is that the server doesn't hold anything that can be used to recover the 'password', and a reusable challenge never goes across the network, so getting a copy of the authentication conversation does an attacker no good.

    (If an attacker can use a public key to recover the secret key, everyone has some bad problems.)

  • Use some sort of challenge-response system that doesn't expose the password in plaintext or provide a reusable challenge, and that allows the server side to store the password in a form that can't be readily attacked with things like off-the-shelf rainbow tables. You're still vulnerable to a dedicated attacker who reads out the server's stored authentication information and then builds a custom setup to brute force it, but at least you don't fall easily.

    (OPAQUE may be what you'd want for this, assuming you were willing to implement an IETF draft. But I haven't looked at it in detail.)
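To make the keypair option concrete, here's a minimal sketch of a challenge and response flow in Python, using the third party 'cryptography' package's Ed25519 support. The framing here (a raw random nonce and a raw signature) is invented for illustration; real protocols like SSH wrap a lot more structure around the same idea.

import os
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Client: generates a keypair once; only the public key is given to the server.
client_key = Ed25519PrivateKey.generate()
server_stored_pubkey = client_key.public_key()

# Server: issues a fresh random challenge for every login attempt, so a
# captured (challenge, signature) pair is useless for replay.
challenge = os.urandom(32)

# Client: proves possession of the private key by signing the challenge.
signature = client_key.sign(challenge)

# Server: verification needs only the stored public key, which on its own
# doesn't let an attacker recover the private key or log in as the user.
try:
    server_stored_pubkey.verify(signature, challenge)
    print("authenticated")
except InvalidSignature:
    print("rejected")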

As far as the practical answers for IMAP, authenticated SMTP, and so on go, I have no idea. For various reasons I haven't looked at the alternative authentication methods that IMAP supports, and as far as websites go, doing anything other than plaintext passwords that get passed to you over HTTPS means implementing some security-sensitive custom stuff (which has been done by people who had a big enough problem).

tech/PlaintextPasswordDanger written at 23:31:23

What timestamps you get back along with Prometheus query results

When you make a Prometheus query in PromQL, the result has both values and timestamps for those values (as covered in the API documentation). This is the case for both instant queries and range queries. Usually tools ignore the timestamp on instant queries, and use the timestamps to order the values when graphing or otherwise displaying the results of range queries.

(In Grafana, the timestamp is one of the fields you can display in a table. For reasons that we'll cover, the timestamp of typical queries is usually uninteresting and you routinely hide it from being displayed.)

Simplifying somewhat, the result of most PromQL expressions is what we can consider to be an instant vector, which is to say that there are a bunch of metric points, their values, and an associated timestamp. This is true both for instant queries and for range queries; for range queries, the PromQL expression is evaluated at each query step and then all of those individual query results are put together and returned (where Grafana or the Prometheus console will generally pick them apart to generate a graph).
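To make the difference concrete, here's a sketch of asking the Prometheus HTTP API for both kinds of query from Python with the 'requests' package; the server URL and the use of node_load1 are assumptions for the example.

import time
import requests

PROM = "http://localhost:9090"  # assumed Prometheus address

# Instant query: each result element carries a single [timestamp, value] pair.
r = requests.get(f"{PROM}/api/v1/query", params={"query": "node_load1"})
for series in r.json()["data"]["result"]:
    ts, val = series["value"]
    print("instant:", series["metric"].get("instance"), ts, val)

# Range query: the same expression is evaluated at every query step, and each
# result element carries a list of [timestamp, value] pairs, one per step.
now = time.time()
r = requests.get(
    f"{PROM}/api/v1/query_range",
    params={"query": "node_load1", "start": now - 600, "end": now, "step": 60},
)
for series in r.json()["data"]["result"]:
    print("range:", series["metric"].get("instance"), series["values"][:3])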

For normal PromQL queries that result in these instant vectors, the timestamp associated with each value generated by the query is the time at which the query ran. For an instant query, this is 'right now' (or whatever time you set the query to be at), even if you used offset in the expression. For a range query, the time the query ran is the time of that particular query step. As I found out before, this time can be surprising for subqueries because Prometheus rounds off the time. This timestamp is emphatically not the time of the metric (or metrics) that the query is using, and we can see the gap by looking at the results of a query like:

time() - timestamp( node_load1 )

(How big the difference can be obviously depends on how frequently Prometheus pulls the metric.)

However, a PromQL expression that uses a range vector selector on a simple metric to return a range vector as the result is different. As I described in how to extract raw time series data, such a query returns a set of values and timestamps where the timestamp is the underlying timestamp of the metric point in Prometheus's time series database (TSDB), the same value that timestamp() would give you. You get as many elements in the range vector as Prometheus actually pulled from the metrics source over the time range and has in its TSDB. Right now (and probably in the future), such PromQL queries must be instant queries; if you try to make a range query with a PromQL expression that returns a range vector, you will get an error from Prometheus.
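Here's a sketch of what that looks like through the HTTP API, again from Python with 'requests' and an assumed server URL; the timestamps that come back are the TSDB sample timestamps, not the query's evaluation time.

import requests

PROM = "http://localhost:9090"  # assumed Prometheus address

r = requests.get(f"{PROM}/api/v1/query", params={"query": "node_load1[5m]"})
data = r.json()["data"]
print(data["resultType"])  # 'matrix', even though this was an instant query
for series in data["result"]:
    # Each entry is [sample timestamp, value]; you get one per actual scrape
    # that landed in the last five minutes, however many that is.
    for ts, val in series["values"]:
        print(series["metric"].get("instance"), ts, val)

# Handing the same range vector expression to /api/v1/query_range gets you
# an error back from Prometheus instead of a result.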

A PromQL expression with a 'bare' subquery (one not reduced down by aggregation operators and so on) can also return a range vector from an instant query, but the timestamps of the values behave as if you made a range query; they are the evaluation time of each query step (as altered and rounded by Prometheus). Effectively a subquery acts as a range query, and I believe it's more or less implemented as that inside Prometheus.
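One way to see this is to compare the timestamps from a plain range vector selector with those from a bare subquery over the same metric. Here's a sketch of that, where the URL and the 30m / 5m subquery parameters are arbitrary choices for illustration.

import requests

PROM = "http://localhost:9090"  # assumed Prometheus address

def sample_timestamps(expr):
    r = requests.get(f"{PROM}/api/v1/query", params={"query": expr})
    result = r.json()["data"]["result"]
    return [ts for ts, _ in result[0]["values"]] if result else []

# Raw scrape timestamps straight from the TSDB.
print(sample_timestamps("node_load1[30m]"))
# Evaluation timestamps, spaced (and rounded) by the 5m subquery step.
print(sample_timestamps("node_load1[30m:5m]"))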

PS: Technically PromQL expressions can also return a scalar or a string, per the API documentation. I believe that you get scalars from PromQL expressions that don't involve any metrics, for example just 'time()'. I'm not sure what PromQL expression could give you just a string. Both scalar and string results have timestamps, and for scalar results the time is definitely the time the expression was evaluated at.

(This is easy to see by making a query for 'time()'; the result has the same number for the timestamp and the value.)
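In code form, that check is something like this, with an assumed server URL; a scalar result is a single [timestamp, value] pair, and for 'time()' the two numbers should match.

import requests

r = requests.get("http://localhost:9090/api/v1/query", params={"query": "time()"})
data = r.json()["data"]
print(data["resultType"])  # 'scalar'
ts, val = data["result"]   # [evaluation timestamp, value as a string]
print(ts, val)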

sysadmin/PrometheusQueryTimestamps written at 00:35:41

