Wandering Thoughts

2021-07-23

Why it matters that map values are unaddressable in Go

A while ago, I wrote Addressable values in Go (and unaddressable ones too) as an attempt to get this tricky Go concept straight, since I hadn't fully understood it. To refresh, the Go specification's core description of this is covered in Address operators:

For an operand x of type T, the address operation &x generates a pointer of type *T to x. The operand must be addressable, that is, either a variable, pointer indirection, or slice indexing operation; or a field selector of an addressable struct operand; or an array indexing operation of an addressable array. As an exception to the addressability requirement, x may also be a (possibly parenthesized) composite literal. [...]

One of the things that are explicitly not addressable is values in a map. As I mentioned in the original entry, the following is an error:

&m["key"]

On the surface this looks relatively unimportant. There aren't many situations where you might naturally explicitly take the address of a map value. But there turns out to be an important consequence of this, brought to my attention recently by this article.

One important thing in Go that addressability affects is Assignments:

Each left-hand side operand must be addressable, a map index expression, or (for = assignments only) the blank identifier. [...]

Suppose that you have map values that are structs with fields. Because map values are not addressable, and a field selector is only addressable if its struct operand is addressable, you cannot directly assign values to the fields of map values. The following is an error:

m["key'].field = 10

This will give you the clear error of 'cannot assign to struct field m["key"].field in map'. To make this work, you must assign the map value to a temporary variable, modify the temporary, and put it back in the map:

t := m["key"]
t.field = 10
m["key"] = t

One reason I can think of for this restriction is that otherwise, Go might be required to silently materialize struct values in maps as a consequence of what looks like a simple field assignment. Consider:

m["nosuchkey"].field = 10

If this were to work, it would have to have the side effect of creating an entire m["nosuchkey"] value and setting it in the map for the key. Instead Go refuses to allow it, at compile time.

In the usual way of addressable values in Go, this will work if the map values are pointers to structs, and the syntax is exactly the same. This implies that in some cases you can convert a map's value type from a pointer to a struct to the struct itself without any code changes or errors, and in some cases you can't.

(However, with pointer map values the m["nosuchkey"].field case would be a runtime panic. When you deal with explicit pointers, Go makes you accept this possibility.)
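
A sketch of the pointer version (again with a made-up 'counter' struct type):

package main

import "fmt"

type counter struct {
    field int
}

func main() {
    m := map[string]*counter{"key": {field: 1}}

    m["key"].field = 10 // works; m["key"] is a pointer and (*p).field is addressable
    fmt.Println(m["key"].field) // prints 10

    // m["nosuchkey"].field = 10
    // compiles, but panics at runtime because m["nosuchkey"] is a nil *counter
}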

This also affects method calls (and method values) in some situations, because of this special case:

[...] If x is addressable and &x's method set contains m, x.m() is shorthand for (&x).m(): [...]

If you have a type T and there is a pointer receiver method *T.Mp(), you can normally call .Mp() even on a non-pointer value:

var v T
v.Mp()
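
For concreteness, this assumes a definition along these lines (the names T and Mp are from the text; the method body is my invention):

type T struct {
    n int
}

// Mp has a pointer receiver, so it's in the method set of *T, not T.
func (t *T) Mp() {
    t.n++
}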

However, this requires that the value be addressable. Since map values are not addressable, the following is an error (when the type of map values is T):

m["key"].Mp()

Currently, you get two errors for this (reported at the same location):

cannot call pointer method on m["key"]
cannot take the address of m["key"]

This is the same error message as we saw for function return values in my original entry, just about a different thing. As before, converting the map value type from T to *T will make this not an error and all of the syntax is exactly the same.

As with the field access case, Go not allowing this means that it doesn't have to consider what to do if you write:

m["nosuchkey"].Mp()

While there are various plausible options for what could happen here if Go accepted it, I think the one that most people would expect is that it would work the same as:

t := m["nosuchkey"]
t.Mp()
m["nosuchkey"] = t

Which is to say, Go would have to materialize a value and then add it to the map. As a subtle issue, the explicit version makes it clear at what point m["nosuchkey"] actually comes to exist. It also makes it explicit that the method call isn't manipulating the value that is in the map.

(My original entry was sparked by a Dave Cheney pop quiz involving the type of a function return, so I was thinking more about function return values than other sorts of values.)

PS: I think this lack of map value addressability means that there's no way today in Go to directly modify a map value or its fields. Instead you must copy the map value into a temporary, manipulate the temporary, and then put it back in the map. This is probably a feature.

programming/GoAddressableValuesII written at 23:33:18

2021-07-22

Apache's mod_wsgi and the Python 2 issue it creates

If you use Apache (as we do) and have relatively casual WSGI-based applications (again, as we do), then Apache's mod_wsgi is often the easiest way to deploy your WSGI application. Speaking as a system administrator, I find it quite appealing to not have to manage a separate configuration and a separate daemon (and I still get process separation and different UIDs). But at the moment there is a little problem, at least for people (like us) who use their Unix distribution's provided version of Apache and mod_wsgi rather than building their own. The problem is that any given build of mod_wsgi only supports one version of (C)Python.
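
(Part of the appeal is how little configuration a basic daemon-mode setup takes. As an illustrative sketch, with made-up names and paths:

WSGIDaemonProcess myapp user=myapp group=myapp processes=2 threads=4
WSGIScriptAlias /myapp /srv/myapp/app.wsgi process-group=myapp
)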

(Mod_wsgi contains an embedded CPython interpreter, although generally it's not literally embedded; instead mod_wsgi is linked to the appropriate libpython shared library.)

In the glorious future there will only be (some version of) Python 3, and this will not be an issue. All of your WSGI programs will be Python 3, mod_wsgi will use some version of Python 3, and everything will be relatively harmonious. In the current world, there is still a mixture of Python 2 and Python 3, and if you want to run a WSGI based program written in a different version of Python than your mod_wsgi supports, you will be sad. As a corollary of this, you just can't run both Python 2 and Python 3 WSGI applications under mod_wsgi in a single Apache.

Some distributions have both Python 2 and Python 3 versions of mod_wsgi available; this is the case for Ubuntu 20.04 (which answers something I wondered about last January). This at least lets you pick whether you're going to run Python 2 or Python 3 WSGI applications on any given system. Hopefully no current Unix restricts itself to only a Python 2 mod_wsgi, since there's an increasing number of WSGI frameworks that only run under Python 3.

(For example, Django last supported Python 2 in 1.11 LTS, which is itself no longer supported; its support ended some time last year.)

PS: Since I just looked it up, CentOS 7 has a Python 3 version of mod_wsgi in EPEL, and Ubuntu 18.04 has a Python 3 version in the standard repositories.

python/Python2ApacheWsgiIssue written at 23:56:04

Improving my web reading with Martin Tournoij's "readable" Firefox bookmarklet

Not that long ago, I set up Martin Tournoij's "fixed" bookmarklet to deal with CSS fixed elements. When I did this, I also decided to install Tournoij's "readable" bookmarklet, because it was right there and it felt potentially useful. With it sitting in my tab bar, I started trying it out on sites that I found not so readable, or even just marginally readable, and to my surprise it's been a major quality of life improvement on many sites. I've become quite glad that I made it conveniently available.

What the "readable" bookmarklet does is it goes through every <p>, <li>, and <div> to force the text colour, size, weight, line spacing, and font family to reasonable values. It doesn't try to set the background colour, but it turns out that a lot of sites use a basically white background, so forcing the text colour is sufficient. All of this sounds very basic, but the result can be really marvelous. It's especially impressive on sites that don't feel as if they have obviously terrible text, just text that's a bit annoying. It turns out that what feels 'a bit annoying' to me is often harder to read than I was consciously aware of.

Why such simple restyling works so well in practice is somewhat sad. It turns out that a lot of sites make text styling choices that are terrible for clear readability. The obvious case is too-small text, but beyond that a lot of sites turn out to set a lower-contrast text colour, such as some shade of grey, or unusually thin text through either weight or font choice, or both at once. Undoubtedly they think that the result looks good and is perfectly readable, but increasingly my eyes disagree with them.

Because I looked it up, here is specifically what the current bookmarklet sets. It runs the following Javascript:

javascript:(function() {
    document.querySelectorAll('p, li, div').forEach(function(n) {
        n.style.color = '#000';
        n.style.font = '500 16px/1.7em sans-serif';
    });
})();

The n.style.color is simple; #000 is black. The n.style.font is a little bit more complex, because it's using the shorthand font property in a specific format. This format sets the font-weight to '500', which is just a little bit bolder than normal ('400' is normal), the font-size to 16 px (which these days is a device-independent thing), the line-height to 1.7 em for a pretty generous spacing between lines, and the font-family to your general sans-serif font. People who prefer serif fonts may want to change that to 'serif', and in general now that I look at it you might want to tinker with the 16px and the line spacing as well, depending on your preferences.

(My standard Firefox font is set to the default Fedora 'serif' font, currently DejaVu Serif according to Firefox, at size '15'. I could probably reasonably change the '16px/1.7em sans-serif' in the bookmarklet to '15px/1.5em serif' or so, but at the moment I don't feel inclined to do so; if I'm irritated enough to poke the bookmarklet, I might as well make the page really readable.)

web/FirefoxReadablePraise written at 00:04:47

2021-07-21

It's nice when programs switch to being launched from systemd user units

I recently upgraded my home machine from Fedora 33 to Fedora 34. One of the changes in Fedora 34 is that the audio system switched from PulseAudio to PipeWire (the Fedora change proposal, an article on the switch). Part of this switch is that you need to run different daemons in your user session. For normal people, this is transparently handled by whichever standard desktop environment they're using. Unfortunately I use a completely custom desktop, so I have to sort this out myself (this is one way Fedora upgrades are complicated for me). Except this time I didn't need to do anything; PipeWire just worked after the switch.

One significant reason for this is that PipeWire arranges to be started in your user session not through old mechanisms like /etc/xdg/autostart but through a systemd user unit (actually two, one for the daemon and one for the socket). Systemd user units are independent of your desktop and get started automatically, which means that they just work even in non-standard desktop environments (well, so far).

(As covered in the Arch Wiki, there are some things you need to do in an X session.)
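
(If you want to check what's running, something like 'systemctl --user status pipewire.service pipewire.socket' should show you both units; those unit names are what I believe the Fedora packaging uses.)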

One of the things that's quietly making my life easier in my custom desktop environment is that more things are switching to being started through systemd user units instead of the various other methods. It's probably a bit more work for some of the programs involved (since they can't assume direct access to your display any more and so on), but it's handy for me, so I'm glad that they're investing in the change.

PS: It turns out that the basic PulseAudio daemon was also being set up through systemd user units on Fedora 33. But PulseAudio did want special setup under X, with an /etc/xdg/autostart file that ran /usr/bin/start-pulseaudio-x11. It's possible that PipeWire is less integrated with the X server than PulseAudio is. See the PulseAudio X11 modules (also).

PPS: Apparently I now need to find a replacement for running 'amixer -q set Master ...' to control my volume from the keyboard. This apparently still works for some people (also), but not for me; for now 'pactl' does, and it may be the more or less official tool for doing this with PipeWire for the moment, even though it's from PulseAudio.

linux/SystemdUserUnitsNice written at 01:01:16

2021-07-19

Making a Go program build with Go modules can be not a small change

In theory, at some point in the future Go will stop supporting the traditional GOPATH mode. When this happens, if you want to still build old Go programs that you have sitting around in checked out version control repositories, you will need to modularize them. Once upon a time, I thought that this would be as simple as going to the root of your copy of the repo, then running 'go mod init ...' and 'go mod tidy'. Unfortunately, life is not that simple; there are at least two complications.

The first complication is modules whose repositories have moved and been renamed, where the moved module's go.mod declares its new name. For example, what is now github.com/hexops/vecty was once github.com/gopherjs/vecty. In a non-modular Go build, you can still import it under the old path and it will work. However, the moment you attempt to modularize the program, 'go mod tidy' will complain and stop:

github.com/gopherjs/vecty: github.com/gopherjs/vecty@v0.6.0: parsing go.mod:
module declares its path as: github.com/hexops/vecty
        but was required as: github.com/gopherjs/vecty

In theory you may be able to get this to work with a go.mod replace directive. In practice my attempts to do this resulted in 'go mod tidy' errors about:

go: github.com/hexops/vecty@v0.6.0 used for two different module paths (github.com/gopherjs/vecty and github.com/hexops/vecty)

(You also need to get the version number or other version identifier of the moved repository.)
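
(For reference, the replace directive I was attempting looked something like this in go.mod:

replace github.com/gopherjs/vecty => github.com/hexops/vecty v0.6.0
)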

The general fix is to edit every import of packages from the module to use the new location. Then you can run 'go mod tidy' without it complaining.
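
In the vecty case, that's a mechanical change in every file that imports it:

// Before:
import "github.com/gopherjs/vecty"

// After:
import "github.com/hexops/vecty"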

The second complication is modules that have moved to versions above v1, possibly very far past v1; for example, github.com/google/go-github is up to v37, and modularized at v18 (it doesn't even have a tagged v1). A GOPATH build of the program you're trying to modularize will use whatever version of the repository you have checked out, which may well be the current one, and the code will import it as a version without a version suffix (as 'github.com/google/go-github'). When you run 'go mod tidy', Go will attempt to find the most recent tag (or version of the repository) that doesn't have a go.mod file, and specify that version in your go.mod with a '+incompatible' tag. Depending on how far Go had to rewind, this may be a version of the package that is far older than the program expects.

(If a go.mod existed for a v1 version, I suspect that 'go mod tidy' will pick that in this case. But I haven't tried to test it, partly for lack of a suitable module to test against. With github.com/google/go-github, I get 'v17.0.0+incompatible', the last tagged version before it was modularized.)

Again the fix is to edit the program's source code to change every import of the package to use the proper versioned package. Instead of importing, say, 'github.com/google/go-github/github', you would import 'github.com/google/go-github/v37/github'.

Although I haven't tested it extensively, it appears that go-imports-rename can be used to make both sorts of changes. I successfully used it to automatically modify my test third party repository.

(There may be other tools to do this package import renaming, but this is the one I could find.)

The unfortunate part of all of this is that it requires you to make changes to files that will be under version control in the repo. If the upstream updates things in the future, this will probably make your life more complicated.

(In some cases, 'go mod tidy' may insist that you clean up imports in code that's in sub-packages in the repository that aren't actually imported and used in the program itself.)

programming/GoModularizationTwoGotchas written at 23:32:20

2021-07-18

On sending all syslog messages to one file

Over on Twitter, I had a view on where syslog messages should go:

Tired sysadmin take: Different sorts of syslog messages going to different places are a mistake. Throw it all into /var/log/allmessages and I'll sort it out myself.

Like many Twitter takes of mine, in retrospect this one is heartfelt but a little bit too extreme as presented. Specifically, I think you should log all syslog messages to one place, but also log some sorts of messages to their own additional places so you can look through them more easily.

In the old days, I used to carefully curate my syslog.conf so that every different syslog facility had its own different file. Often, the net result was that I would end up using grep on every current syslog file in /var/log because I'd forgotten (or never knew) what facility a given program logged under. Trying to predict what facility a program will use is often almost as futile as predicting what priority level its messages will be logged under.

(This is worse if you rely on the Unix vendor stock syslog.conf instead of customizing it. Unix vendors are inevitably different from each other, and some of them have rather strange ideas of what should go where.)

All of this leads to the tired sysadmin take of putting everything into one file (/var/log/allmessages is what I prefer) and then searching it. An allmessages file is the brute force solution to unpredictable programs and Unix vendor variability, and it also makes sure everything gets logged. But sending all syslog messages to only a single place is a little bit of overkill. Despite my tired take, there are often syslog facilities that it's sensible to also log to separate files, so you can look at just them.

The obvious case is kernel messages, and it's so obvious that systemd's journalctl has a dedicated flag to show you only kernel messages. If I was starting a syslog configuration from scratch, I would also have a log file dedicated to "auth" and "authpriv" messages, one dedicated to "mail" messages, and on my own systems, one dedicated to "daemon" messages. Everything would still go to allmessages; these files are in addition to it.

(And on some systems you might opt to have specific programs log to specific facilities, like "user" or "local0", and have specific files so you can monitor and see the activities of just those programs.)
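
In traditional syslog.conf terms, the result is a sketch like this (the extra file names are just my preferences, and traditional syslogds want tabs between the selector and the file name):

# everything goes to allmessages ...
*.*                  /var/log/allmessages
# ... and some facilities also get their own files.
kern.*               /var/log/kernlog
auth.*;authpriv.*    /var/log/authlog
mail.*               /var/log/maillog
daemon.*             /var/log/daemonlog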

Sending all syslog messages to an allmessages file is a blunt hammer, and like all blunt hammers it's possible to overuse it. Being able to scan through a single file that has everything has a lot of positive features, but not everything is best served by searching for it through a giant file. Sometimes you want both options.

sysadmin/SyslogToOnePlace written at 23:13:42

2021-07-17

The minimum for syslog configurations should be to log (nearly) everything

I have some opinions on how the venerable Unix syslog should be set up, but a very strong one of them is that (nearly) every syslog message should be logged somewhere. I consider this a minimum standard for vendor and distribution supplied syslog.conf files. The 'nearly' is that although syslog priorities don't mean much these days, I think a Unix is reasonably justified in not syslog'ing the debug priority for most facilities. However, a stock syslog.conf should definitely log each of the syslog facilities supported by its syslog somewhere.

(POSIX's syslog.h defines seventeen facilities. Actual Unixes define more; Linux syslog(3) and OpenBSD have 20, while FreeBSD has 23.)

This should also be something you preserve in any local versions or modifications to the standard syslog configuration. Unless you're extremely sure that a syslog facility will never ever be used, you should keep logging it somewhere. And if you're sure it will never be used, well, what's the harm in having it sent to a file that will always wind up being empty? This is especially the case if you're running third party software (whether commercial or open source), because programmers can have all sorts of clever ideas about what syslog facilities to use for what.

If you're extremely sure that you don't need to syslog a particular facility and so you leave it out, please put a comment in your syslog configuration file to explain this. A good goal to strive for in syslog configuration files (for you and for vendors) is to create one that convinces any sysadmin reading it (including your future self) that it covers everything that will ever be logged.
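
(For example, a comment along these lines, for a hypothetical machine:

# No separate handling for news.*; nothing here has run a news server
# in decades, and it all goes to /var/log/allmessages anyway.
)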

(My other syslog configuration opinions are for another entry.)

PS: Out of the Unixes we use, Ubuntu has a default configuration that clearly logs everything to either /var/log/syslog or /var/log/auth.log, while the stock OpenBSD configuration only covers a limited number of facilities. It's possible that OpenBSD covers every use of syslog in the base system (you'd certainly hope so), but if so I doubt it covers all uses of syslog in the packages collection.

sysadmin/SyslogLogEverythingSomewhere written at 23:13:58

2021-07-16

The WireGuard VPN challenge of provisioning clients

I mentioned in yesterday's entry that at work I'm building a VPN server that will support WireGuard. I'm quite happy with WireGuard in general and I think it has some important attractive features (such as the lack of 'sessions'), but we won't be offering WireGuard for general use. I would like to, but every time I even consider the idea, I run headlong into the problem of provisioning, specifically of provisioning WireGuard clients in some way that lets ordinary people successfully set them up.

Right now, to set up a WireGuard client you need the server's name and port (which every VPN needs), the server's public key, the IP the server expects you to have inside the WireGuard connection (its AllowedIPs setting for you), and a private key that the server has the public key for. We also need you to set your DNS server(s) to correctly point to us, and for general VPN usage you have to set your AllowedIPs to 0.0.0.0/0. This is a lot more things for you to set up than other VPN servers need, partly because other VPN servers will push your internal IP, the DNS servers to use, and often other information to you. Much of this is also sensitive to typos or, in the case of keys, must be cut and pasted to start with (no one is typing a base64 WireGuard key). If you get your client IP wrong, for example, things just quietly don't work (the server will discard your traffic).

The client keypair is an especially touchy problem. The ideal would be to securely generate it on the client and upload the public key. In practice this is a lot to ask people to do more or less by hand, so in a realistic setup we would probably want to generate your client keypair on the server and then somehow give you access to the private key for you to configure alongside the server's public key. Given this, possibly the most generally usable way of provisioning WireGuard client connections would be to generate the wg.conf that a client would use with the normal WireGuard command line tools, then provide it to people and hope that any WireGuard client will be able to import it.

(The official WireGuard client for iOS and Android will apparently do this, including decoding the configuration from a QR code. I believe the official Windows client does as well. On Unix, you can use the wg.conf directly or import it into NetworkManager.)
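
As a sketch, a generated client wg.conf in the wg-quick(8) format might look like this, with every specific value replaced by a placeholder:

[Interface]
PrivateKey = <client private key>
Address = <client internal IP>/24
DNS = <our DNS server>

[Peer]
PublicKey = <server public key>
Endpoint = <server host>:<port>
AllowedIPs = 0.0.0.0/0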

An additional complication is that you need a separate WireGuard configuration on each device that you want to use WireGuard on at the same time. So we wouldn't just have to provision one WireGuard setup per person; we're looking at one for your laptop, one for your phone, one for your tablet, and so on. This also complicates naming them and keeping track of them (for people and for us), and likely would tempt people into reusing configurations across devices, which leads to fun problems if both devices are in use at the same time.

I don't blame the WireGuard project for this state of affairs. Provisioning is both a hard problem and a high level concern that is sort of out of scope for a project that's deliberately low level and simple. I'm honestly impressed (and happy) that there are official WireGuard clients on as many platforms as there are. I do wish there was some officially supported way to push configuration information to clients, although I understand why there isn't.

(Tailscale is not a solution for us for various reasons, including price. I do admire them for solving the provisioning problem, though.)

sysadmin/WireGuardProvisioningChallenge written at 23:56:07

Setting up a WireGuard client with NetworkManager (using nmcli)

For reasons beyond the scope of this entry, I've been building a VPN server that will support WireGuard (along with OpenVPN and L2TP). A server needs a client, so I spent part of today setting up my work laptop as a WireGuard client in a 'VPN' configuration, under NetworkManager because that's what my laptop uses. I was hoping to do this through the Cinnamon GUIs for NetworkManager, but unfortunately while NetworkManager itself has supported WireGuard for some time, this support hasn't propagated into GUIs such as the GNOME Control Center (cf) or the NetworkManager applet that Cinnamon uses.

I'm already quite familiar with WireGuard in general, so I found that the easiest way to start was to set up a basic WireGuard configuration file for the connection in /etc/wireguard/wg0.conf, including both the main configuration (with the laptop's key and my local port) and a [Peer] section for the server. Since I'm using WireGuard here in a VPN configuration, instead of to reach just some internal IPs, I set AllowedIPs to 0.0.0.0/0. After writing wg0.conf, I then imported it into NetworkManager:

nmcli connection import type wireguard file /etc/wireguard/wg0.conf

(For what can go in the configuration file, start with wg(8) and wg-quick(8). I suspect that NetworkManager doesn't support some of the more advanced keys. I stuck to the basics. The import process definitely ignores the various script settings supported by wg-quick(8). Currently, see nm_vpn_wireguard_import() in nm-vpn-helpers.c.)

Imported connections are apparently set to auto-connect, which isn't what I wanted, plus there were some other things to adjust (following the guide of Thomas Haller's WireGuard in NetworkManager):

nmcli con modify wg0 \
   autoconnect no \
   ipv4.method manual \
   ipv4.address 172.29.50.10/24 \
   ipv4.dns <...>

At this point you might be tempted to set ipv4.gateway, and indeed that's what I did the first time around. It turns out that this is a mistake, because these days NetworkManager will do the right thing based on the 'accept everything' AllowedIPs I set, right down to setting up policy based routing with a fwmark so that encrypted traffic to the WireGuard VPN server doesn't try to go over WireGuard. If you set ipv4.gateway as well, you wind up with two default routes and then your encrypted WireGuard traffic may try to go over your WireGuard connection again, which doesn't work.

(See the description of 'ip4-auto-default-route' in the WireGuard configuration properties. The full index of available NetworkManager settings in various sections is currently here; the ones most useful to me are probably connection.* and ipv4.*.)

Getting DNS to work correctly requires a little extra step, or at least did for me. While the wg0 connection is active, I want all of my DNS queries to go to our internal resolving DNS server and also to have a search path of our university subdomain. This apparently requires explicitly including '~' in the NetworkManager DNS search path:

nmcli con modify wg0 \
  ipv4.dns-search "cs.toronto.edu,~"

This comes from Fedora bug #1895518, which also has some useful resolvectl options.

You (I) can see a lot of settings for the WireGuard setup with 'nmcli connection show wg0', including active ones, but this seems to omit NetworkManager's view of the WireGuard peers. To see that, I needed to look directly at the configuration file that NetworkManager wrote, in /etc/NetworkManager/system-connections/wg0.nmconnection. I'm someday going to need to edit this directly to modify the WireGuard VPN server's endpoint from my test machine to the production machine.

(The NetworkManager RFE for configuring WireGuard peers in nmcli is issue #358.)

With no GUI support for WireGuard connections, I have to bring this WireGuard VPN up and down with 'nmcli con up wg0' and 'nmcli con down wg0'. Once I have the new VPN server in production, I'll be writing little scripts to do this for me. Hopefully this will be improved some day, so that the NetworkManager applet allows you to activate and deactivate WireGuard connections and shows you that one is active.

If I wanted a limited VPN that only sent traffic to our internal networks over my WireGuard link, I would configure the server's AllowedIPs to the list of networks and then I believe that NetworkManager would automatically set up routes for them. However, I don't know how to make this work (in NetworkManager) if the WireGuard VPN server itself was on one of the subnets I wanted to reach over WireGuard. For my laptop, routing all traffic over WireGuard to work is no worse than using our OpenVPN or L2TP VPN servers, which also do the same thing by default.

(On my home desktop, I use hand built fwmark-based policy rules to deal with my WireGuard endpoint being on a subnet I want to normally reach over WireGuard. NetworkManager will build the equivalents for me when I'm routing 0.0.0.0/0 over the WireGuard link, but I believe not in other situations.)

(For information, I primarily relied on Thomas Haller's WireGuard in NetworkManager, supplemented with a Fedora Magazine article and this article.)

linux/NetworkManagerWireGuardClient written at 01:00:49

2021-07-14

Making two Unix permissions mistakes in one

I tweeted:

Today's state of work-brain:
mkdir /tmp/fred
umask 077 /tmp/fred

Immediately after these two commands, I hit cursor-up to change the 'umask' to 'chmod', so that I then ran 'chmod 077 /tmp/fred'. Fortunately I was doing this as a regular user, so my next action exposed my error.

This whole sequence of commands is a set of mistakes jumbled together in a very Unix way. My goal was to create a new /tmp/fred directory that was only accessible to me. My second command is not just wrong because I wanted chmod instead of umask (I should have run umask before the mkdir, not after), but because I had the wrong set of permissions for chmod. It was as if my brain wanted Unix to apply a 'umask 077' to the creation of /tmp/fred after the fact. Since the numeric permissions you give to umask are the inverse of the permissions you give to chmod (you tell umask what you don't want instead of what you do), my change of umask to chmod then left /tmp/fred with completely wrong permissions; instead of being only accessible to me, it was fully accessible to everyone except me.

(Had I been doing this as root, I would then have been able to cd into the directory, put files in it, access files in it, and so on, and might not have noticed that the permissions were reversed from what I actually wanted.)
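
(What I actually wanted could have been done in one step as 'mkdir -m 0700 /tmp/fred', or in two steps with 'chmod 0700 /tmp/fred' after the mkdir.)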

The traditional Unix umask itself is a very Unix command (well, shell built-in), in that it more or less directly calls umask(). This allows a very simple implementation, which was a priority in early Unixes like V7. A more sensible implementation would be that you specify effectively the maximum permissions that you want (for example, that things can be '755') and then umask would invert this to get the value it uses for umask(). But early Unixes took the direct approach, counting on people to remember the inversion and perform it in their heads.

In the process of writing this entry I learned that POSIX umask supports symbolic modes, and that they work this way. You get and set umask modes like 'u=rwx,g=rx,o=rx' (aka '022', the traditional friendly Unix umask), and they're the same permissions as you would use with chmod. I believe that this symbolic mode is supported by any modern Bourne compatible shell (including zsh), but it isn't necessarily supported by non-Bourne shells such as tcsh or rc (which is my shell).
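
For example, in a Bourne-style shell:

$ umask u=rwx,g=rx,o=rx
$ umask
0022
$ umask -S
u=rwx,g=rx,o=rx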

unix/PermissionsTwoMistakes written at 23:53:11

Some ways to get (or not get) information about system memory ranges on Linux

I recently learned about lsmem, which is described as "list[ing] the ranges of available memory [...]". The source I learned it from was curious why lsmem on a modern 64-bit machine didn't list all of the low 4 GB as a single block (they were exploring kernel memory zones, where the low 4 GB of RAM are still a special 'DMA32' zone). To start with, I'll show typical lsmem default output from a machine with 32 GB of RAM:

; lsmem
RANGE                                  SIZE  STATE REMOVABLE  BLOCK
0x0000000000000000-0x00000000dfffffff  3.5G online       yes   0-27
0x0000000100000000-0x000000081fffffff 28.5G online       yes 32-259

Memory block size:       128M
Total online memory:      32G
Total offline memory:      0B

Lsmem is reporting information from /sys/devices/system/memory (see also memory-hotplug.txt). Both the sysfs hierarchy and lsmem itself apparently come originally from the IBM S390x architecture. Today this sysfs hierarchy apparently only exists for memory hotplug, and there are some signs that kernel developers aren't fond of it.

On the machines I've looked at, the hole reported by lsmem is authentic, in that /sys/devices/system/memory also doesn't have any nodes for that range (on the machine above, for blocks 28, 29, 30, and 31). The specific gap varies from machine to machine. However, all of the information from lsmem may well be a simplification of a more complex reality.
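
(You can poke at this directly in sysfs; each present memory block has, among other things, a state file. On the machine above:

; cat /sys/devices/system/memory/memory32/state
online

Blocks 28 through 31 simply have no memoryN directories.)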

The kernel also exposes physical memory range information through /proc in /proc/iomem (on modern kernels you'll probably have to read this as root to get real address ranges). This has a much more complicated view of actual RAM, one with many more holes than what lsmem and /sys/devices/system/memory show. This is especially the case in the low 4G of memory, where for example the system above reports a whole series of chunks of reserved memory, PCI bus address space, ACPI tables and storage, and more. The high memory range is simpler, but still not quite the same:

100000000-81f37ffff : System RAM
81f380000-81fffffff : RAM buffer

The information from /proc/iomem has a lot of information about PCI(e) windows and other things, so you may want to narrow down what you look at. On the system above, /proc/iomem has 107 lines but only nine of them are for 'System RAM', and all but one of them are in the physical memory address range that lsmem lumps into the 'low' 3.5 GB:

00001000-0009d3ff : System RAM
00100000-09e0ffff : System RAM
0a000000-0a1fffff : System RAM
0a20b000-0affffff : System RAM
0b020000-d17bafff : System RAM
d17da000-da66ffff : System RAM
da7e5000-da8eefff : System RAM
dbac7000-ddffffff : System RAM

(I don't have the energy to work out how much actual RAM this represents.)
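
Someone with more energy could let a short program do the arithmetic. A sketch in Go, which assumes it's run as root so that /proc/iomem shows real addresses:

package main

import (
    "bufio"
    "fmt"
    "os"
    "strconv"
    "strings"
)

func main() {
    f, err := os.Open("/proc/iomem")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    defer f.Close()

    var total uint64
    sc := bufio.NewScanner(f)
    for sc.Scan() {
        // Lines look like '00001000-0009d3ff : System RAM', possibly
        // indented when nested inside another resource.
        line := sc.Text()
        if !strings.HasSuffix(line, ": System RAM") {
            continue
        }
        rng := strings.TrimSpace(strings.Split(line, ":")[0])
        parts := strings.Split(rng, "-")
        if len(parts) != 2 {
            continue
        }
        lo, err1 := strconv.ParseUint(parts[0], 16, 64)
        hi, err2 := strconv.ParseUint(parts[1], 16, 64)
        if err1 != nil || err2 != nil {
            continue
        }
        total += hi - lo + 1
    }
    fmt.Printf("System RAM: %d bytes (%.2f GiB)\n", total, float64(total)/(1<<30))
}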

Another view of physical memory range information is the kernel's report of the BIOS 'e820' memory map, printed during boot. On the system above, this says that the top of memory is actually 0x81f37ffff:

BIOS-e820: [mem 0x0000000100000000-0x000000081f37ffff] usable

I don't know if the Linux kernel exposes this information in /sys. You can also find various other things about physical memory ranges in the kernel's boot messages, but I don't know enough to analyze them.

What's clear is that in general, a modern x86 machine's physical memory ranges are quite complicated. There are historical bits and pieces, ACPI and other data that is in RAM but must be preserved, PCI(e) windows, and other things.

(I assume that there is low level chipset magic to direct reads and writes for RAM to the appropriate bits of RAM, including remapping parts of the DIMMs around so that they can be more or less fully used.)

linux/SystemMemoryRangeInfo written at 01:00:13
