2021-01-10
Thinking through why you shouldn't use plaintext passwords in authentication, even inside TLS
I recently read an article (this one, via) that advocated for just using plaintext passwords inside TLS for things like IMAP and (authenticated) SMTP. My gut reaction was that this was a terrible idea in general, but I couldn't immediately come up with a solid reason why, or explain why other alternatives are better for authentication. So here's an attempt.
There are (at least) two problems with passwords in general. The first problem is that people reuse passwords from one place to another, so knowing someone's password on site A often gives an attacker a big lead on breaking into their accounts elsewhere (or for other services on the site, if it has multiple ones). The second problem is that if you can obtain someone's password from a site through read-only access, you can usually leverage this to be able to log in as them and thus change anything they have access to (or even see things you couldn't before).
The consequence of this is that sending a password in plaintext over an encrypted connection has about the worst risk profile of the various plausible means of authentication. This is because both ends will see the password, and the server side has to directly know it in some form (encrypted and salted, hopefully). Our history is full of accidents where the client, the server, or both wind up doing things like logging passwords by accident (for example, as part of logging the full conversation for debugging) or exposing them temporarily in some way. On top of that, the authentication information the server has to store can generally be brute forced directly to recover those passwords, which can turn a small information disclosure into a password breach.
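To make the brute forcing point concrete, here is a minimal sketch, assuming the server stores a salted PBKDF2-SHA256 hash (a common choice, not something the article specifies). The salt defeats precomputed tables, but anyone who can read the stored salt and hash can still test guesses offline. The iteration count, the password, and the wordlist are all made up for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Sequence

# Illustrative work factor only; real systems tune this much higher.
ITERATIONS = 10_000


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive what the server stores from a password and a per-user salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


def offline_guess(salt: bytes, stored: bytes, candidates: Sequence[str]) -> Optional[str]:
    """Try candidate passwords against a leaked (salt, hash) pair, entirely offline."""
    for guess in candidates:
        if hmac.compare_digest(hash_password(guess, salt), stored):
            return guess
    return None


if __name__ == "__main__":
    salt = os.urandom(16)
    stored = hash_password("hunter2", salt)  # the record sitting in the server's database
    wordlist = ["password", "123456", "letmein", "hunter2"]  # hypothetical wordlist
    print(offline_guess(salt, stored, wordlist))  # prints: hunter2
```

The salt only forces the attacker to repeat this work for each account; it does nothing to protect a password that appears in a wordlist.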
So what are your options? In descending order of how ideal they are, I can think of three:
- Have someone else do the user authentication for you, and only validate
their answers through solidly secure means like public key cryptography.
If you can get this right, you outsource all of the hassles of dealing
with authentication in the real world to someone else, which is often
a major win for everyone.
(On the other hand, this gives third parties some control over your users, so you may want to have a backup plan.)
- Use keypairs as SSH does. This requires the user (or their software)
to hold their key and hopefully encrypt it locally, but the great
advantage is that the server doesn't hold anything that can be
used to recover the 'password', and nothing that could be replayed ever
goes across the network, so getting a copy of the authentication
conversation does an attacker no good.
(If an attacker can use a public key to recover the secret key, everyone has some bad problems.)
- Use some sort of challenge-response system that doesn't expose the
password in plaintext or send anything that can be replayed, and that
allows the server side to store the password in a form that can't
be readily attacked with things like off-the-shelf rainbow
tables. You're still vulnerable to
a dedicated attacker who reads out the server's stored authentication
information and then builds a custom setup to brute force it, but
at least you don't fall easily.
(OPAQUE may be what you'd want for this, assuming you were willing to implement an IETF draft. But I haven't looked at it in detail.)
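The SSH-style keypair option can be illustrated with a toy challenge-response. SSH public key authentication actually has the client sign data tied to the session with algorithms such as Ed25519 or RSA; the sketch below swaps in Schnorr identification, a simpler relative, using deliberately tiny and completely insecure group parameters chosen only so the arithmetic is easy to follow. The point it shows is that the server holds only the public value y, and an answer to one challenge does not fit a different one.

```python
import secrets

# Toy group: P = 2*Q + 1 with P and Q prime, and G = 4 generates the subgroup
# of order Q. These numbers are FAR too small to be secure; shape only.
P = 2039
Q = 1019
G = 4


def make_keypair() -> tuple[int, int]:
    """Client side: secret x stays with the client; public y goes to the server."""
    x = secrets.randbelow(Q - 1) + 1  # 1 <= x < Q
    return x, pow(G, x, P)


def commit() -> tuple[int, int]:
    """Client picks a fresh random nonce r and sends the commitment t = G^r mod P."""
    r = secrets.randbelow(Q - 1) + 1
    return r, pow(G, r, P)


def respond(x: int, r: int, c: int) -> int:
    """Client answers the server's challenge c with s = r + c*x (mod Q)."""
    return (r + c * x) % Q


def verify(y: int, t: int, c: int, s: int) -> bool:
    """Server checks G^s == t * y^c (mod P), using only the public key y."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


if __name__ == "__main__":
    x, y = make_keypair()            # the server stores only y
    r, t = commit()
    c = secrets.randbelow(Q)         # the server's fresh challenge
    s = respond(x, r, c)
    print(verify(y, t, c, s))              # True
    print(verify(y, t, (c + 1) % Q, s))    # False: old answer, new challenge
```

With real parameters the challenge space is huge, so replaying a recorded (t, s) pair only works if the server happens to pick the same challenge again; here Q is small enough that this is merely illustrative.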
As far as the practical answers for IMAP, authenticated SMTP, and so on go, I have no idea. For various reasons I haven't looked at the alternative authentication methods that IMAP supports. As far as websites go, doing anything other than plaintext passwords passed to you over HTTPS means implementing some security-sensitive custom stuff (which has been done by people who had a big enough problem).
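For the third, challenge-response option, one existing design in this space is SCRAM (RFC 5802, with SHA-256 in RFC 7677), which is a SASL mechanism. Below is a simplified sketch of its core idea using only Python's standard library: the server stores a salt, an iteration count, and a 'StoredKey' derived from the password, sends the salt and count to the client, and checks a proof bound to a message built from fresh nonces. It omits SCRAM's wire format, channel binding, and server signature, and the 'auth message' here is simply two random nonces concatenated, so it is not interoperable with real SCRAM.

```python
import hashlib
import hmac
import os

# Illustrative work factor; PBKDF2 plays the role of SCRAM's Hi() function.
ITERATIONS = 100_000


def _hmac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_server_record(password: str) -> dict:
    """Run once at enrollment. The server keeps only salt, iterations and StoredKey."""
    salt = os.urandom(16)
    salted = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    client_key = _hmac(salted, b"Client Key")
    return {"salt": salt, "iterations": ITERATIONS,
            "stored_key": hashlib.sha256(client_key).digest()}


def client_proof(password: str, salt: bytes, iterations: int, auth_message: bytes) -> bytes:
    """Client side: prove knowledge of the password for this one auth_message."""
    salted = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    client_key = _hmac(salted, b"Client Key")
    stored_key = hashlib.sha256(client_key).digest()
    return _xor(client_key, _hmac(stored_key, auth_message))


def server_verify(record: dict, auth_message: bytes, proof: bytes) -> bool:
    """Server side: recover ClientKey from the proof and check it hashes to StoredKey."""
    signature = _hmac(record["stored_key"], auth_message)
    client_key = _xor(proof, signature)
    return hmac.compare_digest(hashlib.sha256(client_key).digest(), record["stored_key"])


if __name__ == "__main__":
    record = make_server_record("correct horse")
    # Both sides contribute fresh nonces, so every exchange covers a different message.
    auth_message = os.urandom(16) + os.urandom(16)
    proof = client_proof("correct horse", record["salt"], record["iterations"], auth_message)
    print(server_verify(record, auth_message, proof))                     # True
    print(server_verify(record, os.urandom(16) + os.urandom(16), proof))  # False (replay)
```

A leaked record still permits offline guessing through PBKDF2, which matches the caveat about dedicated attackers, but the record alone is not a working login credential.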
What timestamps you get back along with Prometheus query results
When you make a Prometheus query in PromQL, the result has both values and timestamps for those values (as covered in the API documentation). This is the case for both instant queries and range queries. Usually tools ignore the timestamp on instant queries and use it to order the values for graphing or otherwise displaying the results of ranged queries.
(In Grafana, the timestamp is one of the fields you can display in a table. For reasons that we'll cover, the timestamp of typical queries is usually uninteresting and you routinely hide it from being displayed.)
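As an illustration of where those timestamps sit in the HTTP API's JSON, here is a small sketch that pulls (timestamp, value) pairs out of a decoded /api/v1/query or /api/v1/query_range response. A 'vector' result carries one "value": [timestamp, "value"] pair per series and a 'matrix' result carries a "values" list of such pairs; timestamps are Unix seconds and values are strings. The sample response, labels, and numbers are made up.

```python
import json


def extract_samples(response: dict) -> list[tuple[dict, list[tuple[float, float]]]]:
    """Return (labels, [(timestamp, value), ...]) for each series in a query response."""
    data = response["data"]
    result_type = data["resultType"]
    series = []
    for item in data["result"]:
        if result_type == "vector":
            pairs = [item["value"]]   # instant vector: exactly one sample per series
        elif result_type == "matrix":
            pairs = item["values"]    # range result: one sample per point
        else:
            raise ValueError(f"unhandled resultType {result_type!r}")
        # Timestamps are Unix seconds; sample values arrive as strings.
        series.append((item["metric"], [(float(t), float(v)) for t, v in pairs]))
    return series


# A made-up instant-query response for node_load1 evaluated at 1610300000.
SAMPLE = json.loads("""
{"status": "success",
 "data": {"resultType": "vector",
          "result": [{"metric": {"__name__": "node_load1", "instance": "host1:9100"},
                      "value": [1610300000, "0.42"]}]}}
""")

print(extract_samples(SAMPLE))
# [({'__name__': 'node_load1', 'instance': 'host1:9100'}, [(1610300000.0, 0.42)])]
```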
Simplifying somewhat, the result of most PromQL expressions is what we can consider to be an instant vector, which is to say that there are a bunch of metric points, their values, and an associated timestamp. This is true both for instant queries and for range queries; for range queries, the PromQL expression is evaluated at each query step and then all of those individual query results are put together and returned (where Grafana or the Prometheus console will generally pick them apart to generate a graph).
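The query steps of a range query fall at start, start + step, start + 2×step, and so on, up to and including end if it lands exactly on a step; each one is evaluated like an instant query and its results are stamped with that step's time. A small sketch of that arithmetic, with arbitrary example start, end, and step values:

```python
def range_query_steps(start: float, end: float, step: float) -> list[float]:
    """Evaluation timestamps of a range query: start, start+step, ... while <= end."""
    steps = []
    k = 0
    while start + k * step <= end:
        steps.append(start + k * step)
        k += 1
    return steps


# A five-minute range query with a 60-second step has six evaluation times.
print(range_query_steps(1610300000, 1610300300, 60))
# [1610300000, 1610300060, 1610300120, 1610300180, 1610300240, 1610300300]
```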
For normal PromQL queries that result in these instant vectors, the
timestamp associated with each value generated by the query is the
time at which the query ran. For an instant query, this is 'right
now' (or whenever you set the query to be at), even if you used
'offset' in the expression. For a ranged query, the time the query
ran is the time of that particular query step.
As I found out before, this time can be surprising for subqueries
because Prometheus rounds off the time. This timestamp is emphatically
not the time of the metric (or metrics) that the query is using, and
we can see the gap by looking at the results of a query like:
time() - timestamp( node_load1 )
(How big the difference can be obviously depends on how frequently Prometheus pulls the metric.)
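One way to look at that gap outside of a dashboard is to run the query through the HTTP API. This sketch assumes a Prometheus server at the hypothetical address http://localhost:9090 and uses the /api/v1/query endpoint's 'query' and 'time' parameters; the arithmetic drops the metric name but keeps labels such as 'instance', which the helper uses as keys. Only the URL building and response parsing are exercised here; actually fetching needs a reachable server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed server address; adjust for your own Prometheus.
PROMETHEUS = "http://localhost:9090"


def instant_query_url(expr: str, at: float) -> str:
    """Build an /api/v1/query URL that evaluates expr at Unix time `at`."""
    return f"{PROMETHEUS}/api/v1/query?{urlencode({'query': expr, 'time': at})}"


def sample_ages(response: dict) -> dict[str, float]:
    """Map each series' instance label to its value, here the sample age in seconds."""
    return {
        item["metric"].get("instance", "?"): float(item["value"][1])
        for item in response["data"]["result"]
    }


def fetch(url: str) -> dict:
    """Fetch and decode one API response (needs a reachable Prometheus server)."""
    with urlopen(url) as resp:
        return json.load(resp)


# Usage against a live server, for example:
#   import time
#   url = instant_query_url("time() - timestamp( node_load1 )", time.time())
#   print(sample_ages(fetch(url)))
```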
However, a PromQL expression that uses a range vector selector
on a simple metric to return a range vector as the result is
different. As I described in how to extract raw time series data, such a query returns a set of values and
timestamps where the timestamp is the underlying timestamp of the
metric point in Prometheus's time series database (TSDB), the same
value that timestamp()
would give you, and you get as many elements
in the range vector as Prometheus actually pulled from the metrics
source over the time range and has in its TSDB. Right now (and
probably in the future), such PromQL queries must be instant queries;
if you try to make a range query with a PromQL expression that
returns a range vector, you will get an error from Prometheus.
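Because such a query hands back the stored samples themselves, their timestamps are not aligned to anything in the query, and the gaps between them show how often the metric was actually scraped. A sketch over a made-up 'matrix' series, of the sort a query like 'node_load1[1m]' might return with a hypothetical 15-second scrape interval:

```python
def scrape_gaps(values: list[list]) -> list[float]:
    """Seconds between consecutive raw samples in one series of a 'matrix' result."""
    times = [float(t) for t, _ in values]
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


# Made-up raw samples; the timestamps come from when each scrape happened.
raw = [[1610299943.512, "0.40"], [1610299958.512, "0.41"],
       [1610299973.513, "0.39"], [1610299988.512, "0.42"]]
print(scrape_gaps(raw))  # [15.0, 15.001, 14.999]
```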
A PromQL expression with a 'bare' subquery (one not reduced down by aggregation operators and so on) can also return a range vector from an instant query, but the timestamps of the values behave as if you made a range query; they are the evaluation time of each query step (as altered and rounded by Prometheus). Effectively a subquery acts as a range query, and I believe it's more or less implemented as that inside Prometheus.
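As a rough model of that rounding, assume that subquery steps sit on multiples of the subquery step counted from the Unix epoch rather than from the query's own time. Then the evaluation times for something like 'max_over_time(node_load1[5m:1m])' at an arbitrary instant can be sketched as below. Whether a step landing exactly on the left edge of the range is included has differed between Prometheus versions, so this sketch makes it a parameter; treat the whole thing as an approximation, not Prometheus's code.

```python
import math


def subquery_steps(eval_time: float, range_s: float, step_s: float,
                   include_left_edge: bool = False) -> list[float]:
    """Approximate evaluation times of expr[range_s:step_s] evaluated at eval_time.

    Assumes steps are aligned to multiples of step_s since the Unix epoch, which
    is why the last step usually falls before eval_time rather than on it.
    """
    window_start = eval_time - range_s
    first = math.ceil(window_start / step_s) * step_s
    if first == window_start and not include_left_edge:
        first += step_s
    steps = []
    t = first
    while t <= eval_time:
        steps.append(t)
        t += step_s
    return steps


# Evaluated 30 seconds past a minute boundary, the last step is 30 seconds earlier.
print(subquery_steps(1610300130, 300, 60))
# [1610299860, 1610299920, 1610299980, 1610300040, 1610300100]
```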
PS: Technically PromQL expressions can also return a scalar or a
string, per the API documentation. I believe that you get scalars
from PromQL expressions that don't involve any metrics, for example
just 'time()'. I'm not sure what PromQL expression could give you
just a string. Both scalar and string results have timestamps, and
for scalar results the time is definitely the time the expression
was evaluated at.
(This is easy to see by making a query for 'time()'; the result
has the same number for the timestamp and the value.)
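In the API's JSON, scalar and string results are a single [timestamp, "value"] pair under "result" rather than a list of series. A sketch that unpacks one and checks the 'time()' case against a made-up response:

```python
def scalar_result(response: dict) -> tuple[float, str]:
    """Return (timestamp, raw value string) from a 'scalar' or 'string' result."""
    data = response["data"]
    if data["resultType"] not in ("scalar", "string"):
        raise ValueError(f"not a scalar or string result: {data['resultType']!r}")
    ts, value = data["result"]
    return float(ts), value


# Made-up response to an instant query for time() evaluated at 1610300000.5.
resp = {"status": "success",
        "data": {"resultType": "scalar", "result": [1610300000.5, "1610300000.5"]}}
ts, value = scalar_result(resp)
print(ts == float(value))  # True: for time(), the timestamp and the value match
```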