Wandering Thoughts archives

2017-08-13

Sorting out slice mutability in Go

I've recently been writing some Go code that heavily uses and mutates slices, mostly through append(), often slices passed as arguments to functions that then returned what were theoretically different, mutated slices. This left me uncertain whether both I and my code were doing the right thing, or whether I was creating hidden bear traps for myself.

So let's start out with some completely artificial example code:

func test1(h string, hl []string) []string {
  return append(hl, h)
}

// Add h at position p, moving the current
// element there to the end.
func test2(h string, p int, hl []string) []string {
  t := append(hl, hl[p])
  t[p] = h
  return t
}

Suppose you call both of these functions with a []string slice that you want to use later in its current state, ie you want it to not be changed by either call. Is it the case that this will be true for either or both functions?

The answer turns out to be no. Both functions can mutate the original slice in visible ways. Yes, even test1() can have this effect, and here's a demonstration:

func main() {
  t := []string{"a", "b", "c",}
  t2 := test1("d", t)
  t3 := test1("fred", t2)
  _ = test1("barney", t2)
  fmt.Printf("t3[4] is %s\n", t3[4])
}

Although you would expect that t3[4] is "fred", because that's what we appended to t2 to create t3, it is actually "barney" (on the Go Playground, at least, and also on my 64-bit x86 Linux machine).

In Go, slices are a data structure built on top of arrays, as covered in Go Slices: usage and internals. When you generate a slice from an explicit array, the slice is backed by and uses that array. When you work purely with slices (including using append()), as we are here, the resulting slices are backed by anonymous arrays; these anonymous arrays are where the actual data involved in your slice is stored. These anonymous arrays may be shared between slices, and when you copy a slice (for example by calling a function that takes a slice as its argument), you do not create a new copy of the anonymous array that backs it.

Slices have a current length and a maximum capacity that they can grow to. If you call append() on a slice with no capacity left (where len(slice) is cap(slice)), append() has to create a new backing array for the slice and copy all the current elements over into it. However, if you call append() on a slice that has remaining capacity, append() simply uses a bit of the remaining capacity in the underlying backing array; you get a new slice from append(), but it is still using the same backing array. If this backing array is also used by another slice that you care about, problems can ensue.
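To make that concrete, here's a minimal sketch of append() reusing spare capacity, where two slices produced by separate append() calls end up writing to the same element of a shared backing array:

func main() {
  s := make([]string, 0, 4)
  s = append(s, "a", "b", "c")
  fmt.Println(len(s), cap(s)) // prints "3 4": one element of spare capacity left

  t := append(s, "d") // fits in the existing backing array
  u := append(s, "e") // also writes to index 3 of the same backing array
  fmt.Println(t[3], u[3]) // both print "e"
}

(The main() demonstration earlier gets its spare capacity from append()'s own growth rather than from an explicit make(), but the effect is the same.)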

With test2(), we have a relatively straightforward and obvious case. If append() doesn't have to create a new backing array, we'll mutate the existing one by changing the string at position p. Writing to an existing element of a slice is a clear warning sign here, and it's not too hard to look out for this in your code (and in functions that you call, such as sort.Strings).
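Here's a quick sketch of that case (giving the original slice spare capacity with an explicit make() so that append() doesn't reallocate):

func main() {
  hl := make([]string, 0, 4)
  hl = append(hl, "a", "b", "c")
  _ = test2("x", 1, hl)
  fmt.Printf("hl[1] is %s\n", hl[1]) // prints "x", not "b": the original slice was mutated
}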

With test1() things are more subtle. What is going on here is that when append() doesn't have to increase a slice's capacity, it ends up writing the new element to the original backing array. Our program arranges for t2's anonymous backing array to have spare capacity, so both the second and the third calls to test1() wind up writing to <anonymous-array>[4] and "fred" turns into "barney". This is alarming (at least to me), because I normally think of pure append() calls as being safe; this demonstrates that they are not.

To guard against this, you must always force the creation of a new backing array. The straightforward way to do this is:

func test1(h string, hl []string) []string {
  t := append([]string{}, hl...)
  return append(t, h)
}

(You can reduce this to a one-liner if you want to.)

A version that might be slightly more efficient would explicitly make() a new slice with an extra element's worth of capacity, then copy() the old slice to it, then finally append() or add the new value.
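One way that version might look is:

func test1(h string, hl []string) []string {
  // make a new slice with room for one more element, copy the
  // old contents into it, and only then append the new value.
  t := make([]string, len(hl), len(hl)+1)
  copy(t, hl)
  return append(t, h)
}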

(Whether this version is actually more efficient depends on whether you're going to use plain append() to add even more elements to the new slice later on.)

All of this is a little bit counter-intuitive to me. I could work out what was going on and why it has to be this way, but until I started to really think about it, I thought that test1() was safe. And it sometimes is, which makes things tricky; if t2 had no extra capacity, t3 would have allocated a new backing array and everything would have been fine. When slices backed by anonymous arrays have extra capacity is an implementation detail and depends on both the exact numbers involved and the exact path of slice growth.

(The test1() case is also tricky because the mutation is not visible in the original t2 slice. In test2(), at least the original is clearly altered. In a test1() case, the two append()s to the slice might be quite separated in the code, and the damage is only visible if and when you look at the first new slice.)

PS: This implies that calling append() on the same slice in two different goroutines creates a potential data race, at least if you ever read the newly appended element.
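Here's a sketch of the sort of thing I mean; if both append()s land in the same backing array, the race detector should flag it:

func main() {
  s := make([]string, 0, 4)
  s = append(s, "a")
  done := make(chan []string, 2)
  go func() { done <- append(s, "x") }()
  go func() { done <- append(s, "y") }()
  a, b := <-done, <-done
  fmt.Println(a[1], b[1]) // both goroutines wrote to the same array element
}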

programming/GoSliceMutability written at 01:31:31; Add Comment

2017-08-12

Notes on cgroups and systemd's interaction with them as of Ubuntu 16.04

I wrote recently on putting temporary CPU and memory limits on a user, using cgroups and systemd's features to fiddle around with them on Ubuntu 16.04. In the process I wound up confused about various aspects of how things work today. Since then I've done a bit of digging and I want to write down what I've learned before I forget it again.

The overall cgroup experience is currently a bit confusing on Linux because there are now two versions of cgroups, the original ('v1') and the new version ('v2'). The kernel people consider v1 cgroups to be obsolete and I believe that the systemd people do as well, but in practice Ubuntu 16.04 (and even Fedora 25) use cgroup v1, not v2. You find out which cgroup version your system is using by looking at /proc/mounts to see what sort of cgroup(s) you're mounting. With cgroup v1, you'll see multiple mounts in /sys/fs/cgroup with filesystem type cgroup and various cgroup controllers specified as mount options, eg:

[...]
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,[...],cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,[...],pids 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,[...],net_cls,net_prio 0 0
[...]

According to the current kernel v2 documentation, a v2 cgroup setup would have a single mount with the filesystem type cgroup2. The current systemd.resource-control manpage discusses the systemd differences between v1 and v2 cgroups, and in the process mentions that v2 cgroups are incomplete because the kernel people can't agree on how to implement bits of them.

In my first entry, I wondered in an aside how you could tell if per-user fair share scheduling was on. The answer is that it depends on how processes are organized into cgroup hierarchies. You can see this for a particular process by looking at /proc/<pid>/cgroup:

11:devices:/user.slice
10:memory:/user.slice/user-915.slice
9:pids:/user.slice/user-915.slice
8:hugetlb:/
7:blkio:/user.slice/user-915.slice
6:perf_event:/
5:freezer:/
4:cpu,cpuacct:/user.slice/user-915.slice
3:net_cls,net_prio:/
2:cpuset:/
1:name=systemd:/user.slice/user-915.slice/session-c188763.scope

What this means is documented in the cgroups(7) manpage. The important thing for us is the interaction between the second field (the controller) and the path in the third field. Here we see that for the CPU time controller (cpu,cpuacct), my process is under my user-NNN.slice slice, not just systemd's overall user.slice. That means that I'm subject to per-user fair share scheduling on this system. On another system, the result is:

[...]
5:cpu,cpuacct:/user.slice
[...]

Here I'm not subject to per-user fair share scheduling, because I'm only under user.slice and I'm thus not separated out from processes that other users are running.
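If you want to check this from code instead of by eyeball, a rough sketch along these lines would do it (it matches the 'cpu,cpuacct' controller line as it appears in the output above and simply looks for a per-user slice in its path):

package main

import (
  "fmt"
  "io/ioutil"
  "strings"
)

func main() {
  data, err := ioutil.ReadFile("/proc/self/cgroup")
  if err != nil {
    fmt.Println(err)
    return
  }
  for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
    // each line is 'hierarchy-id:controller-list:cgroup-path'
    fields := strings.SplitN(line, ":", 3)
    if len(fields) < 3 || fields[1] != "cpu,cpuacct" {
      continue
    }
    if strings.Contains(fields[2], "/user-") {
      fmt.Println("per-user fair share scheduling:", fields[2])
    } else {
      fmt.Println("no per-user fair share scheduling:", fields[2])
    }
  }
}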

You can somewhat estimate the overall state of things by looking at what's in the /sys/fs/cgroup/cpu,cpuacct/user.slice directory. If there are a whole bunch of user-NNN.slice directories, processes of those users are at least potentially subject to fair share scheduling. If there aren't, processes from a user definitely aren't. Similar things apply to other controllers, such as memory.

(The presence of a user-915.slice subdirectory doesn't mean that all of my processes are subject to fair share scheduling, but it does mean that some of them are. On the system I'm taking this /proc/self/cgroup output from, there are a number of people's processes that are only in user.slice in the CPU controller; these processes would not be subject to per-user fair share scheduling, even though other processes of the same user would be.)

If you want a full overview of how everything is structured for a particular cgroup controller, you can use systemd-cgls to see this information all accumulated in one spot. You have to ask for a particular controller specifically, for example 'systemd-cgls /sys/fs/cgroup/cpu,cpuacct', and obviously it's only really useful if there actually is a hierarchy (ie, there are some subdirectories under the controller's user.slice directory). Unfortunately, as far as I know there's no way to get systemd-cgls to tell you the user of a particular process if it hasn't already been put under a user-NNN.slice slice; you'll have to grab the PID and then use another tool like ps.

For setting temporary systemd resource limits on slices, it's important to know that systemd completely removes those user-NNN.slice slices when a user logs out from all of their sessions, and as part of this forgets about your temporary resource limit settings (as far as I know). This may make them more temporary than you expected. I'm not sure if trying to set persistent resource limits with 'systemctl set-property user-NNN.slice ...' actually works; my results have been inconsistent, and since this doesn't work on user.slice I suspect it doesn't work here either.

(As far as I can tell, temporary limits created with 'systemctl --runtime set-property' work in part by writing files to /run/systemd/system/user-NNN.slice.d. When a user fully logs out and their user-NNN.slice is removed, systemd appears to delete the corresponding /run directory, thereby tossing out your temporary limits.)

Although you can ask systemd what it thinks the resource limits imposed on a slice are (with 'systemctl show ...'), the ultimate authority is the cgroup control files in /sys/fs/cgroup/<controller>/<path>. If in doubt, I would look there; the systemd.resource-control manpage will tell you what cgroup attribute is used for which systemd resource limit. Of course you need to make sure that the actual runaway process you want to be limited has actually been placed in the right spot in the hierarchy of the relevant cgroup controller, by checking /proc/<pid>/cgroup.

(Yes, this whole thing is a complicated mess. Slogging through it all has at least given me a better idea of what's going on and how to inspect it, though. For example, until I started writing this entry I hadn't spotted that systemd-cgls could show you a specific cgroup controller's hierarchy.)

linux/SystemdCgroupsNotes written at 00:04:29; Add Comment

2017-08-11

Some notes from my brief experience with the Grumpy transpiler for Python

I've been keeping an eye on Google's Grumpy Python to Go transpiler more or less since it was introduced because it's always been my great white hope for speeding up my Python code more or less effortlessly (and I like Go). However, until recently I had never actually tried to do anything much with it because I didn't really have a problem that it looked like a good fit for. What changed is that I finally got hit by the startup overhead of small programs.

As mentioned in that entry, my initial attempts to use Grumpy weren't successful, because how to actually use Grumpy for anything beyond toys is basically not documented today. Because sometimes I'm stubborn, I kept banging my head against the wall until I hacked together a way to build and run my program, which gave me the chance to get some real world results. Basically the process went like this:

  • build Grumpy from source following their 'method 2' process (using the Fedora 25 system version of Go, not my own build, because Grumpy very much didn't work with the latter).
  • have Grumpy translate my Python program into a module, which was possible because I'd kept it importable.
  • hack grumprun to not delete the Go source file it creates on the fly based on your input. grumprun is in Python, which makes this reasonably easy.
  • feed grumprun a Python program that was 'import mymodule; mymodule.main()' and grab the Go source code it generated (now that it wasn't deleting said source code afterward). This gave me a Go program that I could build into a binary that I could keep and then run with command line arguments.

Unfortunately it turns out that this didn't do me any good. First, the compiled binary of my Grumpy-transpiled Python code took about the same 0.05 seconds to start and run as my real Python code does. Second, my code immediately failed because Grumpy has not fully implemented Python set()s; in particular, it doesn't have the .difference() method. This isn't listed on their Missing features wiki page, but Grumpy is underdocumented in general.

(As a general note, Grumpy appears to be in a state of significant churn in how it operates and how you use it, which I suppose is not particularly surprising. You can find older articles on how to use Grumpy that clearly worked at the time but don't work any more.)

This whole experience has unfortunately left me much less interested in Grumpy. As it is today, Grumpy's clearly not ready for outside people to do anything with it, and it may well never be good at the kinds of things I want it for. Building fast-starting and fast-running programs may never be a Grumpy priority. Grumpy is an interesting experiment and I wish Google the best of luck with it, but it clearly can't be my great hope for faster, lighter-weight Python programs.

My meta-view of Grumpy is that right now it feels like an internal Google (or Youtube) tool that Google just happens to be developing in a public repository for us to watch.

(In this particular case my fix was to hand-write a second version of the program in Go, which has been part irritating and part interesting. The Go version runs in essentially no time, as I wanted and hoped, so the slow startup of the Grumpy version is not intrinsic to either Go or the problem. My Go version will not be the canonical version of this program for local reasons, so I'll have to maintain it myself in sync with the official Python version for as long as I care enough to.)

Sidebar: Part of why Grumpy is probably slow (and awkward)

It's an interesting exercise to look at the Go code that grumpc generates. It's not anything like Go code as you'd conventionally write it; instead, it's much closer to CPython bytecode that has been turned into Go code. This faithfully implements the semantics of (C)Python, which is explicitly one of Grumpy's goals, but it means that Grumpy has a significant amount of overhead over a true Go solution in many situations.

(The transpiler may lower some Python types and expressions to more pure Go code under some circumstances, but scanning the generated output for my Python program suggests that this is uncommon to rare in the kind of code I write.)

Grumpy implements various Python types in pure Go code, but as I found with set, some of those implementations are incomplete. In fact, now that I look, I can see that the only Go code in the entire project appears to be in those types, which generally correspond to things that are implemented in C in CPython. Everything else is generated by the transpiling process.

python/GrumpyBriefExperience written at 02:36:07; Add Comment

Link: Linux Load Averages: Solving the Mystery

Brendan Gregg's Linux Load Averages: Solving the Mystery (via, and) is about both the definition and the history of load average calculations in Linux. Specifically:

Load averages are an industry-critical metric – my company spends millions auto-scaling cloud instances based on them and other metrics – but on Linux there's some mystery around them. Linux load averages track not just runnable tasks, but also tasks in the uninterruptible sleep state. Why? I've never seen an explanation. In this post I'll solve this mystery, and summarize load averages as a reference for everyone trying to interpret them.

In the process of doing this, Brendan Gregg goes back to TENEX (including its source code) for the more or less original load average. Then he chases down the kernel patch from October 1993 that changed Linux's load averages from purely based on the size of the run queue to including processes in disk wait. It goes on from there, including some great examples of how to break down a load average to see what's contributing what (using modern Linux tracing tools, which Gregg is an expert on). The whole thing is really impressive and worth reading.

(Gregg's discussion is focused on Linux alone. For a cross-Unix view, I've written entries on when the load average was added to Unix and the many load averages of different Unix strains. In the latter entry I confidently asserted that Linux's load average included 'disk wait' processes from the start, which Gregg's research has revealed to be wrong.)

links/LinuxLoadAveragesMystery written at 00:01:14; Add Comment

2017-08-10

On the Internet, merely blocking eavesdropping is a big practical win

One of the things said against many basic encryption measures, such as SMTP's generally weak TLS when one mail server is delivering email to another one, is that they're unauthenticated and thus completely vulnerable to man in the middle attacks (and sometimes to downgrade attacks). This is (obviously) true, but it is focused on the mathematical side of security. On the practical side, the reality is simple:

Forcing attackers to move from passive listening to active interception is almost always a big win.

There are a lot of attackers that can (and will) engage in passive eavesdropping. It is relatively easy, relatively covert, and quite useful, and as a result can be used pervasively and often is. Far fewer attackers can and will engage in active attacks like MITM interception or forced protocol downgrades; such attacks are not always possible for an attacker (they may have only limited network access) and when the attacks are possible they're more expensive and riskier.

Forcing attackers to move from passive eavesdropping to some form of active interception is thus almost always a big practical win. Most of the time you'll wind up with fewer attackers doing fewer things against less traffic. Sometimes attackers will mostly give up; I don't think there are very many people attempting to MITM SSH connections, for example, although in theory you might be able to get away with it some of the time.

(There certainly were people snooping on Telnet and rlogin connections back in the day.)

If you can prevent eavesdropping, the theoretical security of the environment may not have gotten any better (you have to assume that an attacker can run a MITM attack if they really want to badly enough), but the practical security certainly has. This makes it a worthwhile thing to do by itself if you can. Of course full protection against even active attacks is better, but don't let the perfect be the enemy of the good. SMTP's basic server to server TLS encryption may be easily defeated by an active attacker and frequently derided by security mavens, but it has probably kept a great deal of email out of the hands of passive listeners (see eg Google's report on this).

(I mentioned this yesterday in the context of the web, but I think it's worth covering in its own entry.)

tech/BlockingEavesdroppingBigWin written at 01:39:31; Add Comment

2017-08-09

How encryption without authentication would still be useful on the web

In HTTPS is a legacy protocol, I talked about how we are stuck with encryption being tightly coupled to web site authentication then mentioned in an aside that they could be split apart. In a comment, Alexy asked a good question:

How could encryption be useful at all without authentication? Without authentication, any MITM (i.e. ISP) could easily pretend to be other side and happily encrypt the connection. And we would still get our ISP-induced ads and tracking.

The limitation that Alexy mentions is absolutely true; an encryption-only connection can still be MITMd, at least some of the time. Having encryption without authentication for web traffic is not about absolute security; instead it's about making things harder for the attacker and thus reducing how many attackers there are. Encrypting web traffic would do this in at least three ways.

First, it takes passive eavesdropping completely off the table; it just doesn't work any more. This matters a fair bit because passive eavesdropping is easy to deploy and hard to detect. If you force attackers (including ISPs) to become active attackers instead of passive listeners, you make their work much harder and more chancy in various ways. All by itself I think this makes unauthenticated encryption very useful, since passive eavesdropping is becoming increasingly pervasive (partly as it becomes less expensive).

(One of the less obvious advantages of passive eavesdropping is that you don't have to handle the full traffic volume that's going past. Sure, it would be nice to have a complete picture, but generally if you drop some amount of the traffic because your eavesdropping machine is too overloaded it's not a catastrophe. With active interception, at least some part of your system must be able to handle the full traffic volume or increasingly bad things start to happen. If you drop some traffic, that's generally traffic that's not getting through, and people notice that and get upset.)

Second, using encryption significantly raises the monetary costs of active MITM interception, especially large-scale interception. Terminating and initiating encrypted sessions takes a lot more resources (CPU, memory, etc) than does fiddling some bits and bytes in a cleartext stream as it flows through you. Anyone who wants to do this at an ISP's network speed and scale is going to need much beefier and more expensive hardware than their current HTTP interception boxes, which changes the cost to benefit calculations. It's also probably going to make latency worse and thus to slow down page loads and so on, which people care about.

Finally, in many situations it's probably going to increase the attacker's risks from active MITM interception and reduce how much they get from it. As an example, consider the case of Ted Unangst's site and me. I haven't accepted his new root CA, so in theory my HTTPS connection to his site is completely open to a MITM attack. In practice my browser has a memorized exception for his current certificate and if it saw a new certificate, it would warn me and I'd have to approve the new one. In a hypothetical world with encryption-only HTTP, there are any number of things that browsers, web sites, and the protocol could do to make MITM interception far more obvious and annoying to users (especially if browsers are willing to stick to their current hardline attitudes). This doesn't necessarily stop ISPs, but it does draw attention and creates irritated customers (and raises the ISP's support costs). And of course it's very inconvenient to attackers that want to be covert; as with HTTPS interception today, it would be fairly likely to expose you and to burn whatever means you used to mount the attack.

None of this stops narrowly targeted MITM interception, whether by your ISP or a sufficiently determined and well funded attacker. Instead, unauthenticated encryption's benefit is that it goes a long way towards crippling broad snooping on web traffic (and broad alterations to it), whether by your ISP or by anyone else. Such broad snooping would still be technically possible, but encryption would raise the costs in money, irritated customers, and bad press to a sufficient degree that it would cut a great deal of this activity off in practice.

web/EncryptionWithHTTPBenefit written at 01:36:21; Add Comment

2017-08-08

We care more about long term security updates than full long term support

We like running so-called 'LTS' (Long Term Support) releases of any OS that we use, and more broadly of any software that we care about, because using LTS releases allows us to keep running the same version for a fairly long time. This is generally due to pragmatics on two levels. First, testing and preparing a significant OS upgrade simply takes time and there's only so much time available. Second, upgrades generally represent some amount of increased risk over our existing environment. If our existing environment is working, why would we want to mess with that?

(Note that our general environment is somewhat unusual. There are plenty of places where you simply can't stick with kernels and software that is more than a bit old, for various reasons.)

But the general idea of 'LTS' is a big tent and it can cover many things (as I sort of talked about in an entry on what supporting a production OS means to me). As I've wound up mentioning in passing recently (eg here), the thing that we care about most is security updates. Sure, we'd like to get our bugs fixed too, but we consider this less crucial for at least two reasons.

First and most importantly, we can reasonably hope to not hit any important bugs once we've tested an OS release (or at least had it in production for an initial period), so if things run okay now they'll keep running decently into the future even if we do nothing to them. This is very much not true of security problems, for obvious reasons; to put it one way, attackers hit your security exposures for you and there's not necessarily much you can do to stop them short of updating. Running an OS without current security updates is getting close to being actively dangerous; running without the possibility of bug fixes is usually merely inconvenient at most.

(There can be data loss bugs that will shift the calculations here, but we can hope that they're far less common than security issues.)

Second, I have to admit that we're making a virtue of more or less necessity, because we generally can't actually get general updates and bug fixes in the first place. For one big and quite relevant example, Ubuntu appears to fix only unusually egregious bugs in their LTS releases. If you're affected by mere ordinary bugs and issues, you're stuck. This is one of the tradeoffs you get to make with Ubuntu LTS releases; you trade off a large package set for effectively only getting security updates (and it has been this way for a long time). More broadly, no LTS vendor promises to fix every bug that every user finds, only the sufficiently severe and widely experienced ones. So just because we run into a bug doesn't mean that it's going to get fixed; it may well not be significant enough to be worth the engineering effort and risk of an update on the vendor's part.

(There is also the issue that if we hit a high-impact bug, we can't wait for a fix to be developed upstream and slowly pushed down to us. If we have systems falling over, we need to solve our problems now, in whatever way that takes. Sometimes LTS support can come through with a prompt fix, but more often than not you're going to be waiting too long.)

sysadmin/LongtermSecurityVersusSupport written at 01:27:28; Add Comment

2017-08-07

There will be no LTS release of the OmniOS Community Edition

At the end of my entry on how I was cautiously optimistic about OmniOS CE, I said:

[...] For a start, it's not clear to me if OmniOS CE r151022 will receive long-term security updates or if users will be expected to move to r151024 when it's released (and I suppose I should ask).

Well, I asked, and the answer is a pretty unambiguous 'no'. The OmniOS CE core team has no interest in maintaining an LTS release; any such extended support would have to come from someone else doing the work. The current OmniOS CE support plans are:

What we intend, is to support the current and previous release with an emphasis on the current release going forward from r151022.

OmniOS CE releases are planned to come out roughly every 26 weeks, ie every six months, so supporting the current and previous release means that you get a nominal year of security updates and so on (in practice less than a year).

I can't blame the OmniOS CE core team for this (and I'm not anything that I'd describe as 'disappointed'; getting not just an OmniOS CE but an OmniOS CE LTS was always a long shot). People work on what interests them, and the CE core team just doesn't use LTS releases or plan to. They're doing enough as it is to keep OmniOS alive. And for most people, upgrading from release to release is probably not a big deal.

In the short term, this means that we are not going to bother to try to upgrade from OmniOS r151014 to either the current or the next version of OmniOS CE, because the payoff of relatively temporary security support doesn't seem worth the effort. We've already been treating our fileservers as sealed appliances, so this is not something we consider a big change.

(The long term is beyond the scope of this entry.)

solaris/OmniOSCENoLTSVersion written at 01:09:13; Add Comment

2017-08-06

Our decision to restrict what we use for developing internal tools

A few years ago, we (my small sysadmin group) hit a trigger point where we realized that we were writing internal sysadmin tools (including web pages and apps) in a steadily increasing collection of programming languages, packages, and environments. This was fine individually but created a collective problem, because in theory we want everyone to be able to at least start to support and troubleshoot everything we have running. The more languages and environments we use across all of our tools, the harder this gets. As things escalated and got more exotic, my co-workers objected quite strongly and, well, they're right.

The result of this was that we decided to standardize on using only a few languages and environments for our internally developed tools, web things, and so on. Our specific choices are not necessarily the most ideal choices and they're certainly a product of our environment, both in what people already knew and what we already had things written in. For instance, given that I've written a lot of tools in Python, it would have been relatively odd to exclude it from our list.

Since the whole goal of this is to make sure that co-workers don't need to learn tons of things to work on your code, we're de facto limiting not just the basic choice of language but also what additional packages, libraries, and so on you use with it. If I load my Python code down with extensive use of additional modules, web frameworks, and so on, it's not enough for my co-workers to just know Python; I've also forced them to learn all those packages. Similar things hold true for any language, including (and especially) shell scripts. Of course sometimes you absolutely need additional packages (eg), but if we don't absolutely need it our goal is to stick to doing things with only core stuff even if the result is a bit longer and more tedious.

(It doesn't really matter if these additional modules are locally developed or come from the outside world. If anything, outside modules are likely to be better documented and better supported than ones I write myself. Sadly this means that the Python module I put together myself to do simple HTML stuff is now off the table for future CGI programs.)

I don't regret our overall decision and I think it was the right choice. I had already been asking my co-workers if they were happy with me using various things, eg Go, and I think that the tradeoffs we're making here are both sensible and necessary. To the extent that I regret anything, I mildly regret that I've not yet been able to talk my co-workers into adding Go to the mix.

(Go has become sort of a running joke among us, and I recently got to cheerfully tell one of my co-workers that I had lured him into using and even praising my call program for some network bandwidth testing.)

Note that this is, as mentioned, just for my small group of sysadmins, what we call Core in our support model. The department as a whole has all sorts of software and tools in all sorts of languages and environments, and as far as I know there has been no department-wide effort to standardize on a subset there. My perception is that part of this is that the department as a whole does not have the cross-support issue we do in Core. Certainly we're not called on to support other people's applications; that's not part of our sysadmin environment.

Sidebar: What we've picked

We may have recorded our specific decision somewhere, but if so I can't find it right now. So off the top of my head, we picked more or less:

  • Shell scripts for command line tools, simple 'display some information' CGIs, and the like, provided that they are not too complex.
  • Python for command line tools.
  • Python with standard library modules for moderately complicated CGIs.
  • Python with Django for complicated web apps such as our account request system.

  • Writing something in C is okay for things that can't be in an interpreted language, for instance because they have to be setuid.

We aren't actively rewriting existing things that go outside this, for various reasons. Well, at least if they don't need any maintenance, which they mostly don't.

(We have a few PHP things that I don't think anyone is all that interested in rewriting in Python plus Django.)

sysadmin/LimitingToolDevChoices written at 02:03:10; Add Comment

2017-08-05

I've been hit by the startup overhead of small programs in Python

I've written before about how I care about the resource usage and speed of short running programs. However, that care has basically been theoretical. I knew this was an issue in general and it worried me because we have short running Python programs, but it didn't impact me directly and our systems didn't seem to be suffering as a result of it. Even DWiki running as a CGI was merely kind of embarrassing.

Today, I turned a hacky personal shell script into a better-done, production-ready version rewritten in Python. This worked fine and everything was great right up to the point where I discovered that I had made this script part of the critical path of invoking dmenu on my office workstation, which is something that I do a lot (partly because I have a very convenient key binding for it). The new Python version is not slow as such, but it is slower, and it turns out that I am very sensitive to even moderate startup delays with dmenu (partly because I type ahead, expecting dmenu to appear essentially instantly). With the old shell script version, this part of dmenu startup took around one to two hundredths of a second; with the new Python version, it now takes around a quarter of a second, which is enough lag to be perceptible and for my type-ahead to go awry.

(This assumes that my machine is unloaded, which is not always the case. Active CPU load, such as installing Ubuntu in a test VM, can make this worse. My dmenu setup actually runs this program five times to extract various information, so each individual run is taking about five hundredths of a second.)

Profiling and measuring short running Python programs is a bit challenging, and I've wound up resorting to fairly crude tricks (such as just exiting from the program at strategic points). These tricks strongly suggest that almost all of the extra time is going simply to starting Python, with a significant amount of it spent importing the standard library modules I use (and all of the things that they import in turn). Simply getting to the quite early point where I call argparse's ArgumentParser.parse_args() method consumes almost all of the time on my desktop. My own code contributes relatively little to the slower execution (although not nothing), which unfortunately means that there's basically no point in trying to optimize it.

(On the one hand, this saves me time. On the other hand, optimizing Python code can be interesting.)

My inelegant workaround for now is to cache the information my program is producing, so I only have to run the program (and take the quarter second delay) when its configuration file changes; this seems to work okay and it's at least as fast as the old shell script version. I'm hopeful that I won't run into any other places where I'm using this program in a latency sensitive situation (and anyway, such situations are likely to have less latency since I'm probably only running it once).

In the longer run it would be nice to have some relatively general solution to pre-translate Python programs into some faster to start form. For my purposes with short running programs it's okay if the translated result has somewhat less efficient code, as long as it starts very fast and thus finishes fast for programs that only run briefly. The sort of obvious candidate is Google's grumpy project; unfortunately, I can't figure out how to make it convert and build programs instead of Python modules, although it's clearly possible somehow.

(My impression is that both grumpy and Cython have wound up focused on converting modules, not programs. Like PyPy, they may also be focusing on longer running CPU-intensive code.)

PS: The new version of the program is written in Python instead of shell because a non-hacky version of the job it's doing is more complicated than is sensible to implement in a shell script (it involves reading a configuration file, among other issues). It's written in Python instead of Go for multiple reasons, including that we've decided to standardize on only using a few languages for our tools and Go currently isn't one of them (I've mentioned this in a comment on this entry).

python/StartupOverheadProblem written at 00:59:49; Add Comment

