Wandering Thoughts

2019-12-02

You can have Grafana tables with multiple values for a single metric (with Prometheus)

Every so often, the most straightforward way to show some information in a Grafana dashboard is with a table, for example to list how long it is before TLS certificates expire, how frequently people are using your VPN servers, or how much disk space they're using. However, sometimes you want to present the underlying information in more than one way; for example, you might want to list both how many days until a TLS certificate expires and the date on which it will expire. The good news is that Grafana tables can do this, because Grafana will merge query results with identical Prometheus label sets (more or less).

(There's a gotcha with this that we will come to.)

In a normal Grafana table, your column fields are the labels of the metric and a 'Value' field that is whatever computed value your PromQL query returned. When you have several queries, the single 'Value' field turns into, eg, 'Value #A', 'Value #B', and so on, and all of them can be displayed in the table (and given more useful names and perhaps different formatting, so Grafana knows that one is a time in seconds and another is a 0.0 to 1.0 percentage). If the Prometheus queries return the same label sets, every result with the same set of labels will get merged into a single row in the table, with all of the 'Value #<X>' fields having values. If not all sets of labels show up in all queries, the missing results will generally be shown as '-'.

(Note that what matters for merging is not what fields you display, but all of the fields. Grafana will not merge rows just because your displayed fields have the same values.)

The easiest way to get your label sets to be the same is to do the same query, just with different math applied to the query's value. You can do this to present TLS expiry as a duration and an absolute time, or usage over time as both a percentage and an amount of time (as seen in counting usage over time). A more advanced version is to do different queries while making sure that they return the same labels, possibly by restricting what labels are returned with 'by (...)' and similar operators (as sort of covered in this entry).

When you're doing different queries of different metrics, an important gotcha comes up. When you do simple queries, Prometheus and Grafana acting together add a __name__ label field with the name of the metric involved. You're probably not displaying this field, but its mere presence with a different value will block field merging. To get rid of it, you have various options, such as adding '+ 0' to the query or using some operator or function (as seen in the comments of this Grafana pull request and this Grafana issue). Conveniently, if you use 'by (...)' with an operator to get rid of some normal labels, you'll get rid of __name__ as well.
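
To make this concrete, here's a sketch of a pair of table queries for the TLS certificate case. I'm assuming a certificate expiry metric along the lines of the blackbox exporter's probe_ssl_earliest_cert_expiry, which is the expiry time as a Unix timestamp; your metric name and labels may well differ.

# Query A: days until the certificate expires
(probe_ssl_earliest_cert_expiry - time()) / 86400

# Query B: the absolute expiry time; the '+ 0' strips __name__
# so that the label sets match query A and the rows merge
probe_ssl_earliest_cert_expiry + 0

Both queries return identical label sets, so Grafana merges them into a single row per certificate; you can then format 'Value #A' as a plain number of days and 'Value #B' as a date.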

All of this only works if you want to display two values for the same set of labels. If you want to pull in labels from multiple metrics, you need to do the merging in your PromQL query, generally using the usual tricks to pull in labels from other metrics.

(I'm writing this all down because I wound up doing this recently and I want to capture what I learned before I forget how to do it.)

GrafanaMultiValueTables written at 23:16:46

Calculating usage over time in Prometheus (and Grafana)

Suppose, not hypothetically, that you have a metric that says whether something is in use at a particular moment in time, such as a SLURM compute node or a user's VPN connection, and you would like to know how used it is over some time range. Prometheus can do this, but you may need to get a little clever.

The simplest case is when your metric is 1 if the thing is in use and 0 if it isn't, and the metric is always present. Then you can compute the percentage of use over a time range as a 0.0 to 1.0 value by averaging it over the time range, and then get the amount of time (in seconds) it was in use by multiplying that by the duration of the range (in seconds):

avg_over_time( slurm_node_available[$__range] )
avg_over_time( slurm_node_available[$__range] ) * $__range_s

(Here $__range is the Grafana variable for the dashboard's time range in a format Prometheus understands, with values such as '1d', and $__range_s is the Grafana variable for the time range in seconds.)

But suppose that instead of being 0 when the thing isn't in use, the metric is absent. For instance, you have metrics for SLURM node states that look like this:

slurm_node_state{ node="cpunode1", state="idle" }   1
slurm_node_state{ node="cpunode2", state="alloc" }  1
slurm_node_state{ node="cpunode3", state="drain" }  1

We want to calculate what percentage of the time a node is in the 'alloc' state. Because the metric may be missing some of the time, we can't just average it out over time any more; the average of a bunch of 1's and a bunch of missing metrics is 1. The simplest approach is to use a subquery, like this:

sum_over_time( slurm_node_state{ state="alloc" }[$__range:1m] ) /
   ($__range_s / 60)

The reason we're using a subquery instead of simply a time range is so that we can control how many sample points there are over the time range, which gives us our divisor to determine the average. The relationship here is that we explicitly specify the subquery range step (here 1 minute aka 60 seconds) and then we divide the total range duration by that range step. If you change the range step, you also have to change the divisor or you'll get wrong numbers, as I have experienced the hard way when I was absent-minded and didn't think this one through.
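
For example, if you used a 5 minute range step instead, the divisor has to change to match (this is just the same query sketched with different numbers, not a recommendation for any particular step):

sum_over_time( slurm_node_state{ state="alloc" }[$__range:5m] ) /
   ($__range_s / 300)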

If we want to know the total time in seconds that a node was allocated, we would multiply by the range step in seconds instead of dividing:

sum_over_time( slurm_node_state{ state="alloc" }[$__range:1m] ) * 60

Now let's suppose that we have a more complicated metric that isn't always 1 when the thing is active but that's still absent entirely when there's no activity (instead of being 0). As an example, I'll use the count of connections a user has to one of our VPN servers, which has a set of metrics like this:

vpn_user_sessions{ server="vpn1", user="cks" }  1
vpn_user_sessions{ server="vpn2", user="cks" }  2
vpn_user_sessions{ server="vpn1", user="fred" } 1

We want to work out the percentage of time or amount of time that any particular user has at least one connection to at least one VPN server. To do this, we need to start with a PromQL expression that is 1 when this condition is true. We'll use the same basic trick for crushing multiple metric points down to one that I covered in counting the number of distinct labels:

sum(vpn_user_sessions) by (user) > bool 0

The '> bool 0' turns any count of current sessions into 1. If the user has no sessions at the moment to any VPN servers, the metric will still be missing (and we can't get around that), so we still need to use a subquery to put this all together to get the percentage of usage:

sum_over_time(
   (sum(vpn_user_sessions) by (user) > bool 0)[$__range:1m]
) / ($__range_s / 60)

As before, if we want to know the amount of time in seconds that a user has had at least one VPN connection, we would multiply by 60 instead of doing the division. Also as before, the range step and the '60' in the division (or multiplication) are locked together; if you change the range step, you must change the other side of things.
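
Written out in full, the 'amount of time' version of the VPN query (just the query above with the division replaced by multiplying by the 60 second range step) looks like this:

sum_over_time(
   (sum(vpn_user_sessions) by (user) > bool 0)[$__range:1m]
) * 60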

Sidebar: A subquery trick that doesn't work (and why)

On the surface, it seems like we could get away from the need to do our complicated division by using a more complicated subquery to supply a default value. You could imagine something like this:

avg_over_time(
 ( slurm_node_state{ state="alloc" } or vector(0) )[$__range:]
)

However, this doesn't work. If you try it interactively in the Prometheus query dashboard, you will probably see that you get a bunch of the metrics that you expect, which all have the value 1, and then one unusual one:

{} 0

The reason that 'or vector(0)' doesn't work is that we're asking Prometheus to be superintelligent, and it isn't. What we get with 'vector(0)' is a vector with a value of 0 and no labels. What we actually want is a collection of vectors with all of the valid labels that we don't already have as allocated nodes, and Prometheus can't magically generate that for us for all sorts of good reasons.

PrometheusCountUsageOverTime written at 00:09:48

2019-11-30

Counting the number of distinct labels in a Prometheus metric

Suppose, not hypothetically, that you're collecting Prometheus metrics on your several VPN servers, including a per user count of sessions on each server. The resulting metric looks like this:

vpn_user_sessions{ user="cks", server="vpn1", ... }  1
vpn_user_sessions{ user="fred", server="vpn1", ... } 1
vpn_user_sessions{ user="cks", server="vpn2", ... }  1

We would like to know how many different users are currently connected across our entire collection of VPN servers. As we see here, the same user may be connected to multiple VPN servers for whatever reason, including that different devices prefer to use different VPN software (such as L2TP or OpenVPN). In Prometheus terms, we want to count the number of distinct label values in vpn_user_sessions for the 'user' label, which I will shorten to the number of distinct labels.

To do this, our first step is to somehow reduce this down to something with one metric point per user, with no other labels. Throwing away labels is done with the 'by (...)' modifier to PromQL aggregation operators. For our purposes we can use any of the straightforward operators such as sum, min, or max; I'll use sum. Using 'sum(...) by (user)' will produce a series like this:

{ user="cks" }  2
{ user="fred" } 1

Having generated this new vector, we simply count how many elements are in it with count(). The final expression is:

count( sum( vpn_user_sessions ) by (user) )

This will give us the number of different users that are connected right now.

Next, suppose that we want to know how many different users have used our VPNs over some span of time, such as the past day. To do this in the most straightforward way, we'll start by basically aggregating our time span down to something that has an element (with a full set of labels) if the user was connected to a particular VPN server at some point in the time span. Since we don't care about the values, we can use any reasonable <aggregation>_over_time function, such as 'min':

min_over_time( vpn_user_sessions[24h] )

(The choice of aggregation to use is relatively arbitrary; we're using it to sweep up all of the different sets of labels that have appeared in the last 24 hours, not for its output value. Min does this and is simple to compute.)

This gives us an instant vector that we can then process in the same way as we did with vpn_user_sessions when we generated our number of currently connected users; we aggregate it to get rid of all labels other than 'user', and then we count how many distinct elements we have. The resulting query is:

count(
   sum(
        min_over_time( vpn_user_sessions[24h] ) 
   ) by (user)
)

This is not the only way to create a query that does this, but it's the simplest and probably also the best performing.

(I initially wrote a 'how many different users over time' query that didn't produce correct numbers, which I didn't realize until I tested it, and then my next attempt used a subquery and some brute force. It wasn't until I sat down to systematically work out what I wanted and how to get there that I came up with these current versions. This is a valuable learning experience; whenever I'm faced with a complex PromQL query situation, I shouldn't just guess, I should tackle the problem systematically, building up the solution in steps and verifying each one interactively.)

PS: It's possible that this trick is either well known or obvious, but if so I couldn't find it in my initial Internet searches before I started flailing around writing my own queries.

PrometheusCountDistinctLabels written at 00:07:34

2019-11-27

Selecting metrics to gather mostly based on what we can use

Partly due to our situation with our L2TP VPN servers, I've set up a system to capture some basic usage metrics from our VPN servers. They don't directly provide metrics; instead we have to parse the output of things like OpenBSD's npppctl and use the information to generate various metrics from the raw output. As part of doing this, I had to figure out what metrics we wanted to generate.

I could have generated every single metric I can think of a way to get, but this is probably not what we want. We have to write custom code to do this, so the more metrics we gather the more code I have to write and the more complicated the program gets. Large, complicated programs are overhead, and however nifty it would be, we shouldn't pay that cost unless we're getting value from it. Writing a lot of code to generate metrics that we never look at and that just clutter up everything is not a good idea.

At the same time, I felt that I didn't want to go overboard with minimalism and put in only the metrics I could think of an immediate use for and that were easy to code. There's good reason to gather additional information (ie, generate additional metrics) when it's easy and potentially useful; I don't necessarily know what we'll be interested in in the future, or what questions we'll ask once we have the data available.

In the end I started out with some initial ideas, then refined them as I wrote the code, looked at the generated metrics, thought about how I would use them, and so on. After having both thought about it in advance and seen how my initial ideas changed, I've wound up with some broad thoughts for future occasions (and some practical examples of them).

To start with, I should generate pretty much everything that's directly in the data I have, even if I can't think of an immediate use for it. For our VPN servers this is straightforward things like how many connections there are and the number of connected users (not necessarily the same thing, since a user may be connected from multiple devices). Our OpenVPN servers also provide some additional per connection information, which I eventually decided I shouldn't just ignore. We may not use it, but the raw information is there and it's a waste to discard it rather than turn it into metrics.

I also generate some metrics which require additional processing because they seem valuable enough; for these, I should see a clear use, with them letting us answer some questions I'm pretty sure we're interested in. For our VPN servers the largest one (in terms of code) is a broad classification of the IP addresses that people are connecting from: whether it's from our wireless network, from our internal networks, from the outside world, or one of a few other categories. This will give us at least some information about how people are using our VPN. I also generate the maximum number of connections a single IP address or user has. I could have made this a histogram of connections per IP or per user, but that seemed like far too much complexity for the likely value; for our purposes we probably mostly care about the maximum we see, especially since most of the time everyone only has one connection from one remote IP.

(Multiple connections from the same IP address but for different people can happen under various circumstances, as can multiple connections from the same IP for the same person. If the latter sounds odd to you, consider the case of someone with multiple devices at home that are all behind a single NAT gateway.)
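
As an illustration, the connection classification described above could come out as a metric looking something like the following; the metric name and the exact categories here are made up for the example, not our real ones.

vpn_connections_by_source{ server="vpn1", source="wireless" }  3
vpn_connections_by_source{ server="vpn1", source="internal" }  1
vpn_connections_by_source{ server="vpn1", source="external" }  5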

Potentially high value information is worth gathering even if it doesn't quite fit neatly into the mold of metrics or raises things like cardinality concerns in Prometheus. For our VPN metrics, I decided that I should generate a metric for every currently connected user so that we can readily track that information, save it, and potentially correlate it to other metrics. Initially I was going to make the value of this metric be '1', but then I realized that was silly; I could just as well make it the number of current connections the user has (which will always be at least 1).

(In theory this is a potentially high cardinality metric. In practice we don't have that many people who use our VPNs, and I've discovered that sometimes high cardinality metrics are worth it. While the information is already in log files, extracting it from Prometheus is orders of magnitude easier.)

In general I should keep metrics de-aggregated as much as is reasonable. At the same time, some things can be worth providing in pre-aggregated versions. For instance, in theory I don't need to have separate metrics for the number of connections and the number of connected users, because they can both be calculated from the per-user connection count metric. In practice, having to do that computation every time is annoying and puts extra load on Prometheus.
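
For the record, both of those pre-aggregated values can be computed from the per-user connection count metric with simple queries (using the vpn_user_sessions metric from earlier entries as a stand-in for whatever your per-user metric is called):

# the total number of connections on each VPN server
sum( vpn_user_sessions ) by (server)

# the number of distinct connected users across all servers
count( sum( vpn_user_sessions ) by (user) )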

However, I also need to think about whether aggregating things together actually makes sense. For instance, OpenVPN provides per connection information on the amount of data sent and received over that connection. It looks tempting to pre-aggregate this together into a 'total sent/received by VPN server' metric, but that's dangerous because connections come and go; our aggregated metric would bounce around in a completely artificial way. Aggregating by user is potentially dangerous, but we have to do something to get stable and useful flow identifiers and most of the time a user only has one connection.

SelectingUsefulMetrics written at 23:40:16

2019-11-25

In Prometheus, don't be afraid of high cardinality metrics if they're valuable enough

We generate any number of custom local metrics that we feed into our local Prometheus metrics and monitoring setup. Most of them are pretty conventional, but one of them is probably something that will raise a lot of eyebrows among people who are familiar with Prometheus and set them to muttering about cardinality explosions. Among our more conventional metrics about how much disk space is free on our fileserver filesystems, we generate a per-user, per-filesystem disk space usage metric. In the abstract, this looks like:

cslab_user_used_bytes{ user="cks", filesystem="/h/281" }   59533330944
cslab_user_used_bytes{ user="cks", filesystem="/cs/mail" } 512

(Users that are not using any space on a filesystem do not get listed for that filesystem.)

Having a time series per user is generally not recommended, and then having it per filesystem as well makes it worse. This metric generates a lot of distinct time series and I'm sure a lot of people would tell us that maybe we shouldn't have it.

However, it's turned out that we derive a major amount of practical value from having this information and having it in Prometheus (and therefore having not just current data but fine grained historical data going back a long ways). Many of our filesystems and ZFS pools perpetually run relatively full and periodically fill up, and when they do, this information can immediately tell people what happened, not just in the immediate past but over larger time scales too. Obviously we can easily get various sorts of summed up information, such as per-pool usage by person.
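
As a sketch of that sort of summing up, per-pool usage by person can be aggregated from the per-filesystem metric. This assumes that a pool's filesystems can be picked out with a regular expression on the filesystem label; the '/h/.*' pattern here is an invented example, not our real layout:

sum( cslab_user_used_bytes{ filesystem=~"/h/.*" } ) by (user)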

(Another use is finding Unix logins using space in filesystems we didn't expect. When I first set this up, I found root-owned stuff littered in all sorts of places, often by accident or by omission.)

Before I set this metric up and we started using it, I was nervous about the cardinality issue; in fact, cardinality worries kept me from doing this for a while, until various things pushed me over the edge. But now it's clear that the metric is very much worth it, despite all of those different time series it creates.

The large scale Prometheus lesson I took from this is that sometimes high cardinality metrics provide enough value that they're worth having anyway. You don't want to create unnecessary cardinality and you don't want to be too excessive (or overload your Prometheus), but there's value in detail that isn't there in broad overviews. I should be cautious, but not too afraid.

(Now that the most recent versions of Prometheus will actually tell you about your highest cardinality metric names, I've found out that this metric is actually nowhere near our highest cardinality metrics. The highest cardinality one by far is node_systemd_unit_state, which is a standard host agent metric, although not one that is enabled in the default configuration.)
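
If you want to check series counts by hand, a general PromQL idiom for this is to count time series per metric name (this is not specific to our setup, and it can be expensive to evaluate on a large Prometheus server):

topk( 10, count( {__name__=~".+"} ) by (__name__) )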

PrometheusCardinalityUnafraid written at 23:01:07

2019-11-22

Our problem of checking if our L2TP VPN servers are actually working

We operate some VPN servers to let people who are currently outside our wired networks have access to internal networks and internal services (this includes people using our wireless network, for reasons beyond the scope of this entry). For various reasons, we offer both L2TP and OpenVPN (on different servers). Our OpenVPN servers are pretty reliable, but our current L2TP servers have a little problem where sometimes they'll just stop responding to people's attempts to establish L2TP VPNs while otherwise looking perfectly healthy. Fortunately this is infrequent. Unfortunately, it turns out to be surprisingly hard to automatedly monitor an L2TP server to make sure that it's working.

Many protocols make it relatively easy to connect to a server and ask it to do something from a client program, for instance by being text based and running over TLS in straightforward ways. Often people have put together libraries for doing this, such as Python's libraries for IMAP clients and POP3 clients. Sadly, L2TP is a much more complicated protocol, at least as commonly implemented for VPNs. For a start, it is really L2TP/IPsec, L2TP over IPsec, which requires you to set up an IPsec security association through IKE before you can even begin talking the L2TP protocol itself (well, L2TP/PPP). As far as I could find, no one has written a good client library for even the L2TP and L2TP/PPP portion of this in the way of the Python IMAP and POP3 packages.

Another option would be to run an actual L2TP client and have it try to establish an L2TP connection with our VPN servers. There are two practical problems with this: it seems to be rather complicated to set up (and requires running IPsec daemons), and typical L2TP clients are more oriented towards establishing and managing network connections instead of reporting whether or not things worked. We could probably make all of this work if we tried hard enough, but it would almost certainly be moderately fragile, because we're using all of the software involved for something other than its actual purpose.

All of this has taught me a valuable lesson about how useful it is to have servers and protocols that are easy to probe and check. Mind you, I sort of knew this lesson already from thinking about how we could check that our NFS servers are actually serving NFS (we can't really), but it hadn't quite sunk in in the same way as it has here.

L2TPServerStatusCheckProblem written at 01:33:07

2019-11-17

It's good to make sure you have notifications of things

In the course of writing yesterday's entry on the operational differences between notifications and logs, I wound up having a realization that is obvious in retrospect: not having notifications for things is at the root of a lot of sysadmin horror stories. We've all heard the stories of people who lost hardware RAID arrays because disks failed silently, for example; that's a missing notification (either because there was nothing at all or because the failure information only went to logs). Logs are useful to tell you what's happening, but notifications are critical to tell you that there's something you need to look at and probably deal with.

The corollary of this for me is that when I set up a new system (or upgrade to one, as with Certbot), I should check to make sure that any necessary notifications are being generated for it. Sometimes this is an obvious part of setting up a new service, as it was for Prometheus, but sometimes it's easy to let things drop through the cracks, either because I just assume it's going to work without actually checking or because there's no obvious way to do it. Making this an actual checklist item for setting up new things will hopefully reduce the incidence of surprises.

(We may decide that something doesn't need explicit checks and notifications for various reasons, but if so at least we'll have actively considered it.)

I think of alerts as one form of notification, or alternately one way of generating notifications, but not the only form or source. Email from cron about a cron job failing is a notification, but probably not an alert. Nor do notifications necessarily have to directly go to you and bother you. We have a daily cron job on our Ubuntu machines that sends us email about new pending Ubuntu package updates, but we don't actually read that email; we use the presence of that email from one or more of our machines as a sign that we should run our 'update all of our Ubuntu machines' script in the morning.

(It may be easiest or most useful to generate an alert as your notification, or you may want to generate the notification in another way. For Certbot, we could generate an alert but because of how Prometheus and so on work, the alert would have relatively little information. With an email-based notification that comes directly from the machine, we can include what is hopefully the actual error being reported by Certbot, which hopefully shortens the investigation by a step.)

CheckForNotificationsWorking written at 23:36:38

The operational differences between notifications and logs

In a comment on my entry on how systemd timer units hide errors, rlaager raised an interesting issue:

The emphasis on emails feels like status quo bias, though. Imagine the situation was reversed: that everything was using systemd timers and then someone wrote cron and people started switching to that. In that case, there is a similar operational change. You'd switch from having a centralized status (e.g. systemctl list-units --failed) and centralized logging (the journal, which also defaults to forwarding to syslog) to crond sending emails. Is that an improvement, a step backwards, neither or both?

My answer is 'both', because in their normal state, emails from cron are fundamentally different from systemd journal entries. Emails from cron are notifications, while log entries of all sorts are, well, logs. A switch from notifications to logs or vice versa is a deep switch with real operational impacts because you get different things from each of them.

Logs give you a history. You can look back through your logs to see what happened when, and with merged logs (or just multiple logs) you can try to correlate this with other things happening at the time. Notifications let you know that something happened (or is happening, but cron only sends email when the cron job finishes), but they don't provide history unless you capture and save each separate email (in order).

(You can create one from the other with additional work, of course. With notifications, you save the notifications in a log, and with logs you have something watch the logs and send you notifications. But you have to go to that additional work, and if you don't do it you're going to miss something.)

On an operational level, switching from one to the other is potentially dangerous because in each case you lose something that you were probably counting on. If you move from a system that gives you notifications (such as cron jobs sending email on failure) to one that gives you logs (such as systemd timer units logging their failures to the journal), you lose the notifications that you're expecting and that you're using to discover problems. If you move from logs to notifications, you lose history and you may get spammed with notifications that you don't actually care about. And of course the most dangerous switches are the ones where you don't realize that you're actually switching (or that the software you use has quietly switched for you, for example by moving from cron jobs to systemd timer units).

(You may also have built your systems differently in the first place. In a log-based world, it's perfectly sensible to have things emit a lot of messages (and then to drive notifications from a subset of them, if you do). If you move to a world where emitting messages triggers notifications, suddenly you will be getting a lot of notifications that you don't want.)

NotificationsVersusLogs written at 00:43:44

2019-11-10

Putting a footer on automated email that says what generated it

We have various cron jobs and other systems that occasionally send us email, such as notifications about Certbot renewals failing or alerts that Prometheus is down. One of our local system administration habits when we set up such automated email is that we always add a footer to the email message that includes as much information as possible about what generated the message and where.

For example, the email we send on systemd unit failures (currently used for our Certbot renewal failures) has a footer that looks like this:

(This email comes from /opt/cslab/sbin/systemd-email on $HOST. It was probably run from /etc/systemd/system/cslab-status-email@.service, triggered by an 'OnFailure=' property on the systemd unit $SERVICE.)

I acquired this habit from my co-workers (who were doing it before I got here), but in an environment with a lot of things that may send email once in a while from a lot of machines, it's a good habit to get into. In particular, if you receive a mysterious email about something, being able to go find and look at whatever is generating it is very useful for sorting out what it's really telling you, what you should do about it, and whether or not you actually care.

(There is apparently local history behind our habit of doing this, as there often is with these things. Many potentially odd habits of any particular system administration environment are often born from past issues, much as is true for monitoring system alerts. In both cases, sometimes this is bolting the stable door after the horse is long gone and not returning and sometimes it's quite useful.)

In general, this is another way of adding more context to things you may only see once in a while, much like making infrequent error messages verbose and clear. We're probably going to have forgotten about the existence of many of our automated scripts by the time they send us email, so they need to remind us about a lot more context than I might think when writing them.

PS: If you're putting a footer on the automatic email, you don't necessarily have to identify the machine by the GECOS name of its root account, but every little bit helps. And identifying the machine in the GECOS name still helps for things that don't naturally generate a footer, like email from cron about jobs that failed or produced unexpected output.

AutomatedEmailSourceFooter written at 23:30:20

The problems with piping curl to a shell are system management ones

I was recently reading Martin Tournoij's Curl to shell isn't so bad (via), which argues that the commonly suggested approach of using 'curl example.com/install.sh | sh' is not the security hazard that it's often made out to be. Although it may surprise people to hear this, I actually agree with the article's core argument. If you're going to download and use source code (with its autoconfigure script and 'make install' and so on) or even pre-built binaries, you're already extending quite a lot of trust to the software's authors. However, I still don't think you should install things with curl to shell. There are two reasons not to, one a general system management one and one a pragmatic one about what people do in these scripts.

The general system management one is that to manage and maintain your system over time, you need to control what changes are made to it and ensure that everything is handled consistently. You don't want someone's install script making arbitrary and unknown changes to your system, and it gets worse when that install script can change over time. The ideal thing to install is an artifact that you can save locally and that makes limited and inspectable changes to your system (if any). Good install options are, for example, a self-contained tarball that you can extract into a directory hierarchy of your choice (and that doesn't even have to be owned by or extracted by root), or a package for the standard package manager on your system that doesn't contain peculiar custom scripts with undesired side effects. An un-versioned shell script fetched from a remote end that you don't save or inspect and that will make who knows what changes on your system is a terrible idea for being able to manage, maintain, and understand the resulting system state.

The pragmatic reason is that for some reason, the people writing these install shell scripts feel free to have them make all sorts of nominally convenient changes to your system on behalf of their software. These shell scripts could be carefully contained, minimal, and unchanging (for a particular release), doing very little more than what would happen if you installed a good package through your package manager, but very often they aren't and you'll wind up with all sorts of random changes all over your system. This is bad for the obvious reason, and it's also bad because there's no guarantee that your system is set up in the way that the install script expects it to be. Of course generally 'make install' has the same problem, which is why experienced sysadmins also mostly avoid running that as root.

(More generally, you really want to manage the system through only one thing, often the system's package manager. This is the problem with CPAN and other independent package systems (although there are good reasons why people keep creating them). Piping curl to a shell and 'make install' are just magnified versions of it. See also why package systems are important.)

CurlToShellManagementProblem written at 00:20:22
