Wandering Thoughts

2017-03-29

The work of safely raising our local /etc/group line length limit

My department has now been running our Unix computing environment for a very long time (which has some interesting consequences). When you run a Unix environment over the long term, old historical practices slowly build up and get carried forward from generation to generation of the overall system, because you've probably never restarted everything from complete scratch. All of this is an elaborate route to say that as part of our local password propagation infrastructure, we have a program that checks /etc/passwd and /etc/group to make sure they look good, and this program puts a 512 byte limit on the size of lines in /etc/group. If it finds a group line longer than that, it complains and aborts and you get to fix it.

(Don't ask what our workaround is for groups with large memberships. I'll just say that it raises some philosophical and practical questions about what group membership means.)
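
For concreteness, the heart of such a check is only a few lines. This is a hypothetical sketch, not our actual program (which validates much more than line lengths):

#!/usr/bin/env python3
# Hypothetical sketch of the line length check described above; our
# real checker does a lot more than this.
import sys

LIMIT = 512     # the number we'd like to raise or remove entirely

def check_group_lines(path="/etc/group", limit=LIMIT):
    ok = True
    with open(path) as f:
        for n, line in enumerate(f, 1):
            length = len(line.rstrip("\n"))
            if length > limit:
                print("%s line %d is %d bytes, over the %d byte limit"
                      % (path, n, length, limit))
                ok = False
    return ok

if __name__ == "__main__":
    sys.exit(0 if check_group_lines() else 1)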

We would like to remove this limit; it makes our life more complicated in a number of ways, causes problems periodically, and we're pretty sure that it's no longer needed and probably hasn't been needed for years. So we should just take that bit of code out, or at least change the '> 512' to '> 4096', right?

Not so fast, please. We're pretty sure that doing so is harmless, but we're not certain. And we would like to not blow up some part of our local environment by mistake if it turns out that actually there is still something around here that has heartburn on long /etc/group lines. So in order to remove the limit we need to test to make sure everything still works, and one of the things that this has meant is sitting down and trying to think of all of the places in our environment where something could go wrong with a long group line. It's turned out that there were a number of these places:

  • Linux could fail to properly recognize group membership for people in long groups. I rated this as unlikely, since the glibc people are good at avoiding small limits and relatively long group lines are an obvious thing to think about.

  • OmniOS on our fileservers could fail to recognize group membership. Probably unlikely too; the days when people put 512-byte buffers or the like into getgrent() and friends are likely to be long over by now.

    (Hopefully those days were long over by, say, 2000.)

  • Our Samba server might do something special with group handling and so fail to properly deal with a long group, causing it to think that someone wasn't a member or to deny them access to group-protected files.

  • The tools we use to build an Apache format group file from our /etc/group could blow up on long lines. I thought that this was unlikely too; awk and sed and so on generally don't have line length limitations these days.

    (They did in the past, in one form or another, which is probably part of why we had this /etc/group line length check in the first place.)

  • Apache's own group authorization checking could fail on long lines, either completely or just for logins at the end of the line.

  • Even if they handled regular group membership fine, perhaps our OmniOS fileservers would have a problem with NFS permission checks if you were in more than 16 groups and one of your extra groups was a long group, because this case causes the NFS server to do some additional group handling. I thought this was unlikely, since the code should be using standard OmniOS C library routines and I would have verified that those worked already, but given how important NFS permissions are for our users I felt I had to be sure.

(I was already confident that our local tools that dealt with /etc/group would have no problems; for the most part they're written in Python and so don't have any particular line length or field count limitations.)

It's probably worth explicitly testing Linux tools like useradd and groupadd to make sure that they have no problems manipulating group membership in the presence of long /etc/group lines. I can't imagine them failing (just as I didn't expect the C library to have any problems), but that just means it would be really embarrassing if they turned out to have some issue and I hadn't checked.
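
Much of this testing boils down to manufacturing a deliberately long group line and then checking membership through the normal interfaces. Here is a minimal sketch of that idea with made-up group and login names; it's an illustration, not our actual test, and should be run on a test machine rather than a production one:

#!/usr/bin/env python3
# Sketch of a membership test with a deliberately long group line.
# The group name, GID, and member names are made up; use values that
# don't collide with anything real.
import grp

GROUP = "grptest"
GID = 65432
MEMBERS = ["user%03d" % i for i in range(300)] + ["realtestlogin"]

# This is the line you'd append to /etc/group for the test; it comes
# out to well over 512 bytes.
print("%s:*:%d:%s" % (GROUP, GID, ",".join(MEMBERS)))

# Once that line is installed, check that the C library reports the
# full membership, especially the login at the very end of the line.
g = grp.getgrnam(GROUP)
missing = [m for m in MEMBERS if m not in g.gr_mem]
print("missing members:", ", ".join(missing) if missing else "none")

# For a login that actually exists ('realtestlogin' here), 'id <login>'
# and access to a group-protected file are the other obvious checks.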

All of this goes to show that getting rid of bits of the past can be much more work and hassle than you'd like. And it's not particularly interesting work, either; it's all dotting i's and crossing t's just in case, testing things that you fully expect to just work (and that have just worked so far). But we've got to do this sometime, or we'll spend another decade with /etc/group lines limited to 512 bytes or less.

(System administration life is often not particularly exciting.)

sysadmin/GroupSizeIncreaseWorries written at 01:57:29

2017-03-28

What affects automatically removing old kernels on Ubuntu

I have griped before (and recently) about how much of a pain it is to try to keep the number of kernels that Ubuntu installs on your machines under control. Writing your own script to remove obsolete kernels is fraught with challenges, but as it turns out I think we can do what we want with 'apt-get autoremove' and some extra work.

First, as Ewen McNeill said in a comment here back in 2015, it's the case that 'apt-get autoremove' will not remove a held package, kernel or otherwise. This makes a certain amount of sense, even if it's inconvenient. We can't keep kernels unheld in general for reasons covered here and here, but we probably can write a script that unholds them, runs 'apt-get autoremove', and holds the remaining kernels afterwards.

(Note that holding Ubuntu packages doesn't convert them from automatically installed packages to manually installed ones; it just holds them. You can see this with apt-mark, which also makes a handy way to hold and unhold packages on the command line.)

If you run apt-get autoremove with your kernel packages not held, you'll notice that it doesn't remove all of them. This naturally made me curious about what controlled this, and at least in Ubuntu the answer is in /etc/apt/apt.conf.d/01autoremove-kernels:

// DO NOT EDIT! File autogenerated by
// /etc/kernel/postinst.d/apt-auto-removal
APT::NeverAutoRemove
{
   "^linux-image-4\.4\.0-45-generic$";
   "^linux-image-4\.4\.0-53-generic$";
[...]

This contains a list of kernel packages and package regular expressions that should not be autoremoved; generally it's going to contain your two most recent kernels. As the comment says, it's (re)created by a script when kernel packages are installed and removed. This script, /etc/kernel/postinst.d/apt-auto-removal, starts with a comment that does a pretty good job of explaining what it wants to do:

Mark as not-for-autoremoval those kernel packages that are:

  • the currently booted version
  • the kernel version we've been called for
  • the latest kernel version (as determined by debian version number)
  • the second-latest kernel version

In the common case this results in two kernels saved (booted into the second-latest kernel, we install the latest kernel in an upgrade), but can save up to four. Kernel refers here to a distinct release, which can potentially be installed in multiple flavours counting as one kernel.

The second rule here implies that if you install an old kernel by hand for some reason, it will get added to the never-autoremove list. Well, added to the current never-autoremove list, since the list is rebuilt on at least every kernel install.

Now, there is a very important gotcha with this whole setup: this list of kernels to never autoremove is only recreated when kernel packages are installed or otherwise manipulated. When you run 'apt-get autoremove', there is nothing that specifically preserves the kernel you are actually running right then. Normally you're probably booted into one of the preserved kernels. But you might not be; if you have to boot back into an old version for some reason and you then run 'apt-get autoremove', as far as I can see it's entirely possible for this to remove your kernel right out from underneath you. Possibly autoremove has specific safeguards against this, but if so I don't see them mentioned in the manpage and there's also this Ubuntu bug.

(As a result, our wrapper script is likely to specifically hold or keep held the actual running kernel.)
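
A minimal sketch of such a wrapper might look like the following; this is an illustration rather than our eventual production script, it assumes kernels are held with apt-mark, and it skips real error handling:

#!/usr/bin/env python3
# Sketch of an 'unhold, autoremove, re-hold' wrapper for kernels.
import subprocess

def lines(*cmd):
    return subprocess.check_output(cmd).decode().split()

# Never let the kernel we're actually running be autoremoved.
running = "linux-image-" + lines("uname", "-r")[0]

held = [p for p in lines("apt-mark", "showhold") if p.startswith("linux-image-")]
unhold = [p for p in held if p != running]
if unhold:
    subprocess.check_call(["apt-mark", "unhold"] + unhold)

# Let apt and 01autoremove-kernels decide what's obsolete.
subprocess.check_call(["apt-get", "-y", "autoremove"])

# Re-hold whatever kernel images are still installed.
out = subprocess.check_output(
    ["dpkg-query", "-W", "-f=${Package} ${Status}\n", "linux-image-*"]).decode()
still_there = [l.split()[0] for l in out.splitlines() if l.endswith(" installed")]
if still_there:
    subprocess.check_call(["apt-mark", "hold"] + still_there)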

(I got some of this information from this askubuntu question and its answers.)

PS: This suggests that maximum safety comes from writing your own script to explicitly work out what kernels you can remove based on local policy decisions. Using 'apt-get autoremove' will probably work much of the time, but it's the somewhat lazy way. We're lazy, though, so we'll probably use it.

linux/UbuntuKernelAutoremove written at 00:57:14

2017-03-27

Link: The Unix Heritage Society now has the 8th, 9th, and 10th editions of Research Unix

Today in an email message with the subject of [TUHS] Release of 8th, 9th and 10th Editions Unix, Warren Toomey announced that the Unix Heritage Society has now gained permission to make the source code of Research Unix's 8th, 9th, and 10th editions available for the usual non-commercial purposes. This apparently is the result of a significant lobbying campaign from a variety of Unix luminaries. The actual source trees can be found in TUHS' archive area for Research distributions.

Most people are familiar with Research Unix versions through V7 (the 7th Edition), which was the famous one that really got out into the outside world and started the Unix revolution. The 8th through 10th editions were what happened inside Bell Labs after this (with a trip through BSD for the port to Vaxen, among other things; see the history of Unix), and because Unix was starting to be commercialized around the time they were being worked on, they were never released in the way that the 7th Edition was. Despite that, they were the foundation of some significant innovations, such as the original Streams and /proc, and for various reasons they acquired a somewhat legendary status as the last versions of the true original strain of Research Unix. Which you couldn't see or run, which just added to the allure.

You probably still can't actually run these editions, unless you want to engage in extensive hardware archaeology and system (re)construction. But at least the Unix community now has these pieces of history.

links/UnixHeritageSocietyUnixV8V9V10 written at 21:50:32

We're probably going to upgrade our OmniOS servers by reinstalling them

We're currently running OmniOS r151014 on our fileservers, which is the current long term support release (although we're behind on updates, because we avoid them for stability reasons). However, per the OmniOS release cycle, there's a new LTS release coming this summer, and about six to nine months after that our current r151014 version will stop being supported at all. Despite what I wrote not quite a year ago about how we might not upgrade at all, we seem to be broadly in support of the idea of upgrading when the next LTS release is out, in order to retain at least the option of applying updates for security issues and so on.

This raises the question of how we do it, because there are two possible options: we could reinstall (what we did the last time around), or upgrade the existing systems through the normal process with a new boot environment. Having thought about it, I think that I'm likely to argue for upgrading via full reinstalls (on new system disks). There are two reasons for this, one specific to this particular version change and one more general one.

The specific issue is that OmniOS is in the process of transitioning to a new bootloader; they're moving from an old version of Grub to a version of the BSD bootloader (which OmniOS calls the 'BSD Loader'). While it's apparently going to be possible to stick with Grub or switch bootloaders over the transition, the current OmniOS Bloody directions make this sound pretty intricate. Installing a new OmniOS from scratch on new disks seems to be the cleanest and best way to get the new bootloader for the new OmniOS while preserving Grub for the old OmniOS (on the old disks).

The broader issue is that reinstalling from scratch on new disks every time is more certain for rollbacks (since we can keep the old disks) and means that any hypothetical future systems we install wind up the same as the current ones without making us go through extra work. If we did in-place upgrades, then to get identical new installs we would actually have to install r151014 and immediately upgrade it to the new LTS. If we just installed the new LTS directly, various sorts of subtle differences and incompatibilities could sneak in.

(This is of course not specific to OmniOS. It's very hard to make sure that upgraded systems are exactly the same as newly installed systems, especially if you've upgraded the systems over significant version boundaries.)

I like the idea of upgrading between OmniOS versions using boot environments in theory (partly because it's neat if it works), it would probably be faster and less of a hassle, and I may yet change my mind here. But I suspect that we're going to do it the tedious way just because it's easier on us in the long run.

solaris/OmniOSUpgradesViaReinstalls written at 01:45:33

2017-03-26

Your exposure from retaining Let's Encrypt account keys

In a comment on my entry on how I think you have lots of Let's Encrypt accounts, Aristotle Pagaltzis asked a good question:

Taking this logic to its logical conclusion: as long as you can arrange to prove your control of a domain under some ACME challenge at any time, should you not immediately delete an account after obtaining a certificate through it?

(Granted – in practice, there is the small matter that deleting accounts appears unimplemented, as per your other entry…)

Let's take the last bit first: for security purposes, it's sufficient to destroy your account's private key. This leaves dangling registration data on Let's Encrypt's servers, but that's not your problem; with your private key destroyed, no one can use your authorized account to get any further certificates.

(If they can, either you or the entire world of cryptography have much bigger problems.)

For the broader issue: yes, in theory it's somewhat more secure to immediately destroy your private key the moment you have successfully obtained a certificate. However, there is a limit to how much security you get this way, because someone with unrestricted access to your machine can get their own authorization for it with an account of their own. If I have root access to your machine and you normally run a Let's Encrypt authorization process from it, I can just use my own client to do the same and get my own authorized account. I can then take the private key off the machine and later use it to get my own certificates for your machine.

(I can also reuse an account I already have and merely pass the authorization check, but in practice I might as well get a new account to go with it.)

The real exposure for existing authorized accounts is when it's easier to get at the account's private key than it is to get unrestricted access to the machine itself. If you keep the key on the machine and only accessible to root, well, I won't say you have no additional exposure at all, but in practice your exposure is probably fairly low; there are a lot of reasonably sensitive secrets that are protected this way and we don't consider it a problem (machine SSH host keys, for example). So in my opinion your real exposure starts going up when you transport the account key off the machine, for example to reuse the same account on multiple machines or over machine reinstalls.

As a compromise you might want to destroy account keys every so often, say once a year or every six months. This limits your long-term exposure to quietly compromised keys while not filling up Let's Encrypt's database with too many accounts.

As a corollary to this and the available Let's Encrypt challenge methods, someone who has compromised your DNS infrastructure can obtain their own Let's Encrypt authorizations (for any account) for any arbitrary host in your domain. If they issue a certificate for it immediately you can detect this through certificate transparency monitoring, but if they sit on their authorization for a while I don't think you can tell. As far as I know, LE provides no way to report on accounts that are authorized for things in your domain (or any domain), so you can't monitor this in advance of certificates being issued.
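
The certificate transparency side of this can be watched with something like the following rough sketch, which queries crt.sh. The query parameters and the JSON field names here are assumptions from having poked at crt.sh rather than a documented API, so verify them before depending on this:

#!/usr/bin/env python3
# Rough sketch of watching certificate transparency for names under
# your domain via crt.sh; treat the interface details as assumptions.
import json
import urllib.parse
import urllib.request

DOMAIN = "example.org"          # placeholder; use your own domain

qs = urllib.parse.urlencode({"q": "%." + DOMAIN, "output": "json"})
with urllib.request.urlopen("https://crt.sh/?" + qs) as resp:
    entries = json.loads(resp.read().decode())

for e in entries:
    # 'not_before' and 'name_value' appear to be the relevant fields.
    print(e.get("not_before"), e.get("name_value"))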

For some organizations, compromising your DNS infrastructure is about as difficult as getting general root access (this is roughly the case for us). However, for people who use outside DNS providers, such a compromise may only require gaining access to one of your authorized API keys for their services. And if you have some system that allows people to add arbitrary TXT records to your DNS with relatively little access control, congratulations, you now have a pretty big exposure there.

sysadmin/LetsEncryptAccountExposure written at 01:22:08

2017-03-24

An odd and persistent year old phish spammer

We have a number of more or less internal mailing lists for things like mailing all of the technical staff. They have at least somewhat unusual names and don't appear in things like email directories or most users' address books. Back almost a year ago (21st April 2016), one of them got a phish spam:

From codewizard@approject.com [...]
Received: from [177.47.160.250] (helo=approject.com) [...]
From: "Capital One 360" <codewizard@approject.com>
Subject: Your Capital one 360 Account Urgent Login Reminder

LOOK FOR THE ATTACHED FILE AND OPEN

(With an attached PDF.)

Slightly over a month later, the same address got another one:

From codewizard@approject.com [...]
Received: from [95.213.155.178] (helo=approject.com) [...]
From: "USAA SECURITY" <codewizard@approject.com>
Subject: Your Account Log-on Reminder

A week later it got a third one, with the same MAIL FROM (and EHLO), but from a different IP address yet again. Then a fourth two weeks later.

At this point I'd had enough, so I threw the MAIL FROM of codewizard@approject.com into the per-address server side email blocks for this particular address. You can probably guess what has happened periodically ever since then:

2017-03-23 18:11:31 H=(approject.com) [46.39.225.151] F=<codewizard@approject.com> rejected RCPT <redacted>: blocked by personal senders blacklist.

(As I write this, that IP address is on the Spamhaus CSS.)

It's clear that whatever is doing this spamming is widely dispersed, very persistent, and is using a basically unique address list that it has a death grip on (this internal mailing list of ours hasn't started getting other sorts of spam, just this one phish spammer). Maybe this is wandering malware that is now operating more or less autonomously (like some do), or maybe this is someone running a long-term campaign who cannot be bothered to disguise the distinctive signatures here (those being the envelope sender and the EHLO).

(This isn't the first time I've seen spammer persistence illustrated, but I think it's the first time it's clearly a single spammer or spam agent instead of address lists being shared and reshared endlessly.)

PS: Since various aspects of this phish spam have apparently mutated over time, it's probably not autonomous malware in action but instead someone running a long-term campaign. I don't know why they're so fixated on using this very distinctive MAIL FROM, but it's certainly handy so please don't change, whoever you are.

spam/PersistentPhishSpammer written at 22:25:02

2017-03-23

ARM servers had better just work if vendors want to sell very many

A few years ago I wrote about the cost challenge facing hypothetical future ARM servers here; to attract our interest, they'd have to be cheaper or better than x86 servers in some way that we cared about. At the time I made what turns out to be a big assumption: I assumed that ARM servers would be like x86 servers in that they would all just work with Linux. Courtesy of Pete Zaitcev's Standards for ARM computers and Linaro and the follow-on Standards for ARM computers in 2017, I've now learned that this was a pretty optimistic assumption. The state of play in 2017 is that LWN can write an article called Making distributions Just Work on ARM servers that describes not current reality but an aspirational future that may perhaps arrive some day.

Well, you know, no wonder no one is trying to actually sell real ARM servers. They're useless to a lot of people right now. Certainly they'd be useless to us, because we don't want to buy servers with a bespoke bootloader that probably only works with one specific Linux distribution (the one the vendor has qualified) and may not work in the future. A basic prerequisite for us being interested in ARM servers is that they be as useful for generic Linux as x86 servers are (modulo which distributions have ARM versions at all). If we have to buy one sort of server to run Ubuntu but another sort to run CentOS, well, no. We'll buy x86 servers instead because they're generic and we can even run OpenBSD on them.

There are undoubtedly people who work at a scale with a server density where things like the power advantages of ARM might be attractive enough to overcome this. These people might even be willing to fund their own bootloader and distribution work. But I think that there are a lot of people who are in our situation; we wouldn't mind extra-cheap servers to run Linux, but we aren't all that interested in buying servers that might as well be emblazoned 'Ubuntu 16.04 only' or 'CentOS only' or the like.

I guess this means I can tune out all talk of ARM servers for Linux for the next few years. If the BIOS-level standards for ARM servers for Linux are only being created now, it'll be at least that long until there's real hardware implementing workable versions of them that isn't on the bleeding edge. I wouldn't be surprised if it takes half a decade before we get ARM servers that are basically plug and play with your choice of a variety of Linux distributions.

(I don't blame ARM or anyone for this situation, even though it sort of boggles me. Sure, it's not a great one, but the mere fact that it exists means that ARM vendors haven't particularly cared about the server market so far (and may still not). It's hard to blame people for not catering to a market that they don't care about, especially when we might not care about it either when the dust settles.)

tech/ArmServersHaveToJustWork written at 23:14:50

2017-03-22

Setting the root login's 'full name' to identify the machine that sent email

Yesterday I wrote about making sure you can identify what machine sent you a status email, and in the comments Sotiris Tsimbonis shared a brilliant yet simple solution to this problem:

We change the gecos info for this purpose.

chfn -f "$HOSTNAME root" root

Take it from me; this is beautiful genius (so much so that both we and another group here immediately adopted it). It's so simple yet still extremely effective, because almost everything that sends email does so using programs like mail that will fill out the From: header using the login's GECOS full name from /etc/passwd. You get email that looks like:

From: root@<your-domain> (hebikera root)

This does exactly what we want by immediately showing the machine that the email is from. In fact many mail clients these days will show you only the 'real name' from the From: header by default, not the actual email address (I'm old-fashioned, so I see the traditional full From: header).

This likely works with any mail-sending program that doesn't require completely filled out email headers. It definitely works in the Postfix sendmail cover program for 'sendmail -t' (as well as the CentOS 6 and 7 mailx, which supplies the standard mail command).

(As an obvious corollary, you can also use this trick for any other machine-specific accounts that send email; just give them an appropriate GECOS 'full name' as well.)

There are two perhaps obvious cautions here. First, if you ever rename machines you have to remember to re-chfn the root login and any other such logins to have the correct hostname in them. It's probably worth creating an officially documented procedure for renaming machines, since there are other things you'll want to update as well (you might even script it). Second, if you have some sort of password synchronization system, you need it to leave root's GECOS full name alone (although it can update root's password). Fortunately ours already does this.
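
The first caution is easy to turn into an automated check. Here's a sketch of one; the 'hostname first' layout it looks for is simply what the chfn invocation above produces:

#!/usr/bin/env python3
# Sketch of a check for the renaming caution: complain if root's GECOS
# doesn't start with this machine's short hostname.
import pwd
import socket

host = socket.gethostname().split(".")[0]
gecos = pwd.getpwnam("root").pw_gecos

if not gecos.startswith(host):
    print("root's GECOS is %r; expected it to start with %r" % (gecos, host))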

sysadmin/IdentifyMachineEmailByRootName written at 23:49:10

Making sure you can identify what machine sent you a status email

I wrote before about making sure that system email works, so that machines can do important things like tell you that their RAID array has lost redundancy and you should do something about that. In a comment on that entry, -dsr- brought up an important point: you want to be able to easily tell which machine sent you email.

In an ideal world, everything on every machine that sends out email reports would put the machine's hostname in, say, the Subject: header. This would give you reports like:

Subject: SMART error (FailedOpenDevice) detected on host: urd

In the real world you also get helpful emails like this:

Subject: Health

Device: /dev/sdn [SAT], FAILED SMART self-check. BACK UP DATA NOW!

The only way for us to tell which machine this came from was to look at either the Received: headers or the Message-ID, which is annoying.

There are at least two ways to achieve this. The first approach is what -dsr- said in the comment, which is to make every machine send its email to a unique alias on your system. This unfortunately has at least two limitations. The first is that it somewhat clashes with a true 'null client' setup, where your machines dump absolutely all of their email on the server. A straightforward null client does no local rewriting of email at all, so to get this you need a smarter local mailer (and then you may need per-machine setup, hopefully automated). The second limitation is that there's no guarantee that all of the machine's email will be sent to root (and thus be subject to simple rewriting). It's at least likely, but machines have been known to send status email to all sorts of addresses.

(I'm going to assume that you can arrange for the unique destination alias to be visible in the To: header.)

You can somewhat get around this by doing some of the rewriting on your central mail handler machine (assuming that you can tell the machine email apart from regular user email, which you probably want to do anyways). This needs a relatively sophisticated configuration, but it probably can be done in something like Exim (which has quite powerful rewrite rules).

However, if you're going to do this sort of magic in your central mail handler machine, you might as well do somewhat different magic and alter the Subject: header of such email to include the host name. For instance, you might just add a general rule to your mailer so that all email from root that's going to root will have its Subject: altered to add the sending machine's hostname, eg 'Subject: [$HOSTNAME] ....'. Your central mail handler already knows what machine it received the email from (the information went into the Received header, for example). You could be more selective, for instance if you know that certain machines are problem sources (like the CentOS 7 machine that generated my second example) while others use software that already puts the hostname in (such as the Ubuntu machine that generated my first example).
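
To make the transformation itself concrete, here is what the rewrite does to a message, written as a standalone Python illustration rather than actual Exim configuration; in real life this would live in the central mailer's own rewrite rules and the sending host would come from the SMTP connection:

#!/usr/bin/env python3
# Illustration of the Subject: tagging, separate from any particular
# mailer; reads a message on stdin and takes the sending host as argv[1].
import email
import sys

def tag_subject(raw_message, sending_host):
    msg = email.message_from_string(raw_message)
    subject = msg.get("Subject", "")
    tag = "[%s]" % sending_host
    if not subject.startswith(tag):
        del msg["Subject"]
        msg["Subject"] = "%s %s" % (tag, subject)
    return msg.as_string()

if __name__ == "__main__":
    print(tag_subject(sys.stdin.read(), sys.argv[1]), end="")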

I'm actually more attracted to the second approach than the first one. Sure, it's a big hammer and a bit crude, but it creates the easy to see marker of the source machine that I want (and it's a change we only have to make to one central machine). I'd feel differently if we routinely got status emails from various machines that we just filed away (in which case the alias-based approach would give us easy per-machine filing), but in practice our machines only email us occasionally and it's always going to be something that goes to our inboxes and probably needs to be dealt with.

sysadmin/IdentifyingStatusEmailSource written at 01:11:32

2017-03-20

Modern Linux kernel memory allocation rules for higher-order page requests

Back in 2012 I wrote an entry on why our Ubuntu 10.04 server had a page allocation failure, despite apparently having a bunch of memory free. The answer boiled down to the NFS code wanting to allocate a higher-order request of 64 Kb of (physically contiguous) memory and the kernel having some rather complicated and confusing rules for when this was permitted when memory was reasonably fragmented and low(-ish).

That was four and a half years ago, back in the days of kernel 3.5. Four years is a long time for the kernel. Today the kernel people are working on 4.11 and, unsurprisingly, things have changed around a bit in this area of code. The function involved is still called __zone_watermark_ok() in mm/page_alloc.c, but it is much simpler today. As far as I can tell from the code, the new general approach is nicely described by the function's current comment:

Return true if free base pages are above 'mark'. For high-order checks it will return true of the order-0 watermark is reached and there is at least one free page of a suitable size. Checking now avoids taking the zone lock to check in the allocation paths if no pages are free.

The 'order-0' watermark is the overall lowmem watermark (which I believe is low: from my old entry). This bounds all requests for obvious reasons; as the code says in a comment, if a request for a single page is not something that can go ahead, requests for more than one page certainly can't. Requests for order-0 pages merely have to pass this watermark; if they do, they get a page.

Requests for higher-order pages have to pass an obvious additional check, which is that there has to be a chunk of at least the required order that's still free. If you ask for a 64 Kb contiguous chunk, your request can't be satisfied unless there's at least one chunk of size 64 Kb or bigger left, but it's satisfied if there's even a single such chunk. Unlike in the past, as far as I can tell requests for higher-order pages can now consume all of those pages, possibly leaving only fragmented order-0 4 Kb pages free in the zone. There is no longer any attempt to have a (different) low water mark for higher-order allocations.
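
Put as a simplified Python model of the logic described above (not the kernel's actual C code, and leaving out details like lowmem reserves and the different watermark levels), the new check amounts to:

# free_by_order[o] stands in for the number of free blocks of order o
# in the zone, and 'mark' is the zone's order-0 (low) watermark.
def zone_watermark_ok(order, mark, free_pages, free_by_order):
    # The order-0 watermark bounds every request: if a single page
    # can't be allocated, nothing bigger can be either.
    if free_pages <= mark:
        return False
    if order == 0:
        return True
    # For higher orders, all that's needed beyond that is at least one
    # free block of the requested order or larger.
    return any(free_by_order[o] > 0 for o in range(order, len(free_by_order)))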

This change happened in late 2015, in commit 97a16fc82a; as far as I can tell it comes after kernel 4.3 and before kernel 4.4-rc1. I believe it's one commit in a series by Mel Gorman that reworks various aspects of kernel memory management in this area. His commit message has an interesting discussion of the history of high-order watermarks and why they're apparently not necessary any more.

(Certainly I'm happy to have this odd kernel memory allocation failure mode eliminated.)

Sidebar: Which distributions have this change

Ubuntu 16.04 LTS uses kernel '4.4.0' (plus many Ubuntu patches); it has this change, although with some Ubuntu modifications from the stock 4.4.0 code. Ubuntu 14.04 LTS has kernel 3.13.0 so it shouldn't have this change.

CentOS 7 is using a kernel labeled '3.10.0'. Unsurprisingly, it does not have this change and so should have the old behavior, although Red Hat has been known to patch their kernels so much that I can't be completely sure that they haven't done something here.

Debian Stable has kernel 3.16.39, and thus should also be using the old code and the old behavior. Debian Testing ('stretch') has kernel 4.9.13, so it should have this change and so the next Debian stable release will include it.

linux/ModernPageAllocRules written at 21:54:42

My theory on why Go's gofmt has wound up being accepted

In Three Months of Go (from a Haskeller's perspective) (via), Michael Walker makes the following observation in passing:

I do find it a little strange that gofmt has been completely accepted, whereas Python’s significant whitespace (which is there for exactly the same reason: enforcing readable code) has been much more contentious across the programming community.

As it happens, I have a theory about this: I think it's important that gofmt only has social force. By this I mean that you can write Go code in whatever style and indentation you want, and the Go compiler will accept it (in some styles you'll have to use more semicolons than in others). This is not the case in Python, where the language itself flatly insists that you use whitespace in roughly the correct way. In Go, the only thing 'forcing' you to put your code through gofmt is the social expectations of the Go community. This is a powerful force (especially when people learning Go also learn 'run your code through gofmt'), but it is a soft force as compared to the hard force of Python's language specification, and so I think people are more accepting of it. Many of the grumpy reactions to Python's indentation rules seem to be not because the formatting it imposes is bad but because people reflexively object to being forced to do it.

(This also means that Go looks more conventional as a programming language; it has explicit block delimiters, for example. I think that people often react to languages that look weird and unconventional.)

There is an important practical side effect of this that is worth noting, which is that your pre-gofmt code can be completely sloppy. You can just slap some code into the file with terrible indentation or no indentation at all, and gofmt will fix it all up for you. This is not the case in Python; because whitespace is part of the grammar, your Python code must have good indentation from the start and cannot be fixed later. This makes it easier to write Go code (and to write it in a wide variety of editors that don't necessarily have smart indentation support and so on).

The combination of these two gives the soft force of gofmt a great deal of power over the long term. It's quite convenient to be able to scribble sloppily formatted code down and then have gofmt make it all nice for you, but if you do this you must go along with gofmt's style choices even if you disagree with some of them. You can hold out and stick to your own style, but you're doing things the hard way as well as the socially disapproved way, and in my personal experience sooner or later it's not worth fighting Go's city hall any more. The lazy way wins out and gofmt notches up another quiet victory.

(It probably also matters that a number of editors have convenient gofmt integration. I wouldn't use it as a fixup tool as much as I do if I had to drop the file from my editor, run gofmt by hand, and then reload the now-changed file. And if it was less of a fixup tool, there would be less soft pressure of 'this is just the easiest way to fix up code formatting so it looks nice'; I'd be more likely to format my Go code 'correctly' in my editor to start with.)

programming/GoWhyGofmtAccepted written at 01:01:12
