Wandering Thoughts

2023-09-22

Changing GNU Emacs Lisp functions through advice-add, not brute force

It's a tradition with me that sooner or later, I hit a GNU Emacs function that doesn't work the way I want it to and has no applicable customization options. My traditional brute force approach to dealing with these functions has been to redefine them; I'd copy their code to my .emacs or some personal .el file, modify or replace it to taste, and then ensure that my definition got used instead of the standard one. If what I really cared about was a keybinding, sometimes I could give my version a new name (and bind the key I cared about to it). Recently, however, Ben Zanin pointed me at advice-add and, after a while, I was able to work out how to do some changes with it in a nicer way than my previous brute force approach. So here are some notes.

The simplest situation for advice-add is when you want to completely replace a function with a different implementation, which is easily done with ':override':

(advice-add 'mh-forwarded-letter-subject :override
            (lambda (_ignored subject)
              (concat "Fwd: " subject)))

This redefines the function that generates the Subject: header for email forwarded in MH-E so that it completely ignores the author of the mail being forwarded (this turns out to be possible through standard customizations because format can use (only) specific arguments, unlike C printf formatting, which I didn't know at the time I hacked this up).

Sometimes I want a function to not do something, for example to never use NMH's anno to modify my existing email messages. MH-E doesn't directly expose this as a customization, but it does have a specific function to execute (N)MH commands and we can just not run that function if we're trying to execute anno:

(advice-add 'mh-exec-cmd :before-until
            (lambda (cmd &rest args)
              (string-equal cmd "anno")))

The 'Ways to compose advice' section of the Emacs Lisp manual is a little bit confusing, but my rule of thumb is that ':before-until' is for situations where I don't want the function to run if something is true, while ':before-while' is for situations where I only want the function to run if something is true.

The brute force version of this is to use ':around', which lets you wrap around the original function in a single function of your own:

(defun no-annotate (oldfun cmd &rest args)
  (if (not (string-equal cmd "anno"))
      (apply oldfun cmd args)))
(advice-add 'mh-exec-cmd :around 'no-annotate)

Another thing you might want to do is modify the argument list of a function, for example to change how MH-E handles email replies so that you always automatically get cc'd on them. This is done with ':filter-args' advice, but this advice is slightly tricky because as far as I can see you don't get passed exactly the normal arguments; instead your function is called with a single argument that is the list of them, and it has to return the (possibly modified) list. So you have to write your modification like this:

(defun repl-cc-me (args)
  (if (string-equal (car args) "repl")
      (append args '("-cc" "me"))
    args))
(advice-add 'mh-exec-cmd :filter-args 'repl-cc-me)

Although it may be an obvious thing to say, if you're going to filter the function's arguments you're going to have to know exactly what arguments it's passed, and there may be surprises. For example, mh-exec-cmd turns out not to always be passed a list of just strings; sometimes there may be numbers or sublists mixed in.

You can advise the same function (such as mh-exec-cmd) multiple times to do different things. Here I've advised it twice, once to ignore attempts to run "anno" and once to modify what "repl" gets handed as an argument. Obviously you need your advice to not clash (as is the case here); otherwise you may need to combine them in a single piece of advice that sorts it out right.

As both of these examples illustrate, you may need to advice-add some other function than your initial target. My initial targets are the behavior of 'mh-forward' and 'mh-reply', but in both cases the behavior is in the middle of the function, so unless I want to re-define them I have to hook into something else. Finding where to hook into requires reading the code and following control flow. If you need to alter one function's use of another function that is generally used (such as 'mh-exec-cmd', which is used all over MH-E), you generally need to hope that it's called with some argument you can detect that's specific to your function.

(In fact 'mh-forward's annotation behavior is a couple of layers of functions down, but once I found the root cause I decided I didn't want any annotation to be happening from anywhere, not just forwarding messages.)

One somewhat tricky option here that I didn't actually get working the one time I tried it is to create a dynamically scoped variable that you use to signal that you are in your function of interest:

(defvar mh-reply-marker nil "Are we in mh-reply?")
(defun mark-mh-reply (oldfun message &optional reply-to includep)
  (let ((mh-reply-marker t))
    (funcall oldfun message reply-to includep)))
(advice-add 'mh-reply :around 'mark-mh-reply)

Then your advising code for other functions can check mh-reply-marker to see if they should do things or just quietly stay out of the way. This is a real usage case for ':around', because we need to wrap the original function in that 'let'.

The other thing that I haven't gotten fully and completely working is advising interactive functions. In theory this is supposed to work, but in practice at various times either invoking the function itself has failed or other functions that do '(call-interactively 'mh-reply)' have failed with mysterious errors. My conclusion is that this is a level of advising that is currently beyond my ken, and if I need to dabble in the waters to this depth, I'm probably back to re-defining functions (it may be brute force but it works).

Sidebar: 'let' and dynamically scoped variables

If you're new to Lisp and dynamic scoping, it may be a little bit surprising that your in-function 'let' of 'mh-reply-marker' persists even into functions that you call, but it does and this feature is actually used all over GNU Emacs in various ways. One example is temporarily forcing the value of a customizable setting to do something. MH-E can be set to prefer plain text over HTML for email with both, and it has a function 'mh-show-preferred-alternative' to override that temporarily; this function works by nulling out the customization you set and re-displaying the message. You can write an inverse of this with the same idea:

(defun mh-show-plaintext ()
  "Show text/plain instead of text/html for a message."
  (interactive)
  (let
      ((mm-discouraged-alternatives '("text/html")))
    (mh-show nil t)))

(I have other personal functions that work this way.)

programming/EmacsChangingLispWithAdviceAdd written at 23:04:10

2023-09-21

HTTP Basic Authentication and your URL hierarchy

We're big fans of Apache's implementation of HTTP Basic Authentication, but we recently discovered that there are some subtle implications of how Basic Authentication can interact with your URL hierarchy within a web application. This is because of when HTTP Basic Authentication is and isn't sent to you, and specifically that browsers don't preemptively send the Authorization header when they are moving up your URL hierarchy (well, for the first time). That sounds abstract, so let's give a more concrete example.

Let's suppose you have a web application that allows authenticated people to interact with it at both a high level, with a URL of '/manage/', and at the level of dealing with a specific item, with a URL of say '/manage/frobify/{item}'. You would like a person to frobify some item, so you (automatically) send them email saying 'please visit <url> to frobify {item}'. They visit that URL while not yet authenticated, which causes the web server to return a HTTP 401 and gets their browser to ask them for their login and password on your site (for a specific 'realm', effectively a text label). Their re-request with an Authorization header succeeds, and the person delightedly frobifies their item. At the end of this process, your web application redirects them to '/manage/'. Because this URL is above the URL the person has been dealing with, their browser will not preemptively send the Authorization header, and your web server will once again respond with a HTTP 401.
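To make the 'moving up the hierarchy' rule concrete, here is a minimal Python sketch of the prefix rule that RFC 7617 describes for Basic Authentication: the authentication scope is the path of the authenticated URL with everything after its last '/' removed, and a client should assume only URLs under that prefix are in the same protection space. This is a simplified model, not a description of any particular browser; the host name and the item name 'item42' are made up, and the sketch deliberately ignores the scheme and host.

```python
from urllib.parse import urlsplit


def auth_scope(authenticated_url: str) -> str:
    """Return the path prefix that RFC 7617 calls the authentication scope.

    This is the path of the authenticated URL with everything after the
    last "/" removed.
    """
    path = urlsplit(authenticated_url).path or "/"
    return path[: path.rfind("/") + 1]


def sent_preemptively(authenticated_url: str, next_url: str) -> bool:
    """Would a client that follows only the prefix rule send credentials
    to next_url without first waiting for a 401?  (Paths only; the scheme
    and host are not compared in this sketch.)"""
    return urlsplit(next_url).path.startswith(auth_scope(authenticated_url))


# Hypothetical URLs: the person authenticated while frobifying "item42".
first = "https://example.org/manage/frobify/item42"
print(auth_scope(first))
print(sent_preemptively(first, "https://example.org/manage/frobify/other"))
print(sent_preemptively(first, "https://example.org/manage/"))
```

Under this model the scope is '/manage/frobify/', so another item under it would get the header preemptively while the redirect up to '/manage/' would not.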

Because this is all part of the same web application, your HTTP Basic Authentication will use the same realm setting for both URLs in your web server and thus your WWW-Authenticate header. In theory the browser can see that it already knows an authentication for this realm and automatically retry with the Authorization header. In practice a browser may not always do this in all circumstances, and may instead stop to ask the person for their login and password again. With this URL design you're at the mercy of the browser to do what you want.

(This can be confusing to the person, especially if (from their perspective) they just pressed a form button that said 'yes, really frobify {item}' and now they're getting challenged again. They may well think that their action failed; after all, successful actions don't usually cause you to get re-challenged for authentication.)

Unfortunately it's hard to see how to get out of this while still having a sensible URL hierarchy, short of never sending people direct links for actions and always having them enter at the top level of your application. One not entirely great option is that when people frobify their items, they are never automatically redirected up; instead they just wind up back on '/manage/frobify/{item}' except that now the page says 'congratulations, you have frobified this item, click here to go to the top level'. This is slightly less convenient (well, if people actually want to go to your '/manage/' page) but won't leave people in doubt about whether or not they really did successfully frobify their item.

When you look at your logs, this behavior may be surprising to you if you've forgotten the complexities of when browsers preemptively send HTTP Basic Authentication information. HTTP Basic Authentication doesn't work like a regular cookie, where you can set it once and then assume it will always come back, which is the model of authentication we're generally most familiar with.

web/BasicAuthAndURLHierarchy written at 23:36:10

2023-09-20

Restarting nfs-server on a Linux NFS (v3) server isn't transparent

A while back I wrote an article on enabling NFS v4 on an Ubuntu 22.04 fileserver (instead of just NFS v3), where one of the final steps was to restart 'nfsd', the NFS server daemon (sort of), with 'systemctl restart nfs-server'. In that article I said that as far as I could tell this entire process was transparent to NFS v3 clients that were talking to the NFS server. Unfortunately I have to take that back. Restarting 'nfs-server' will cause the NFS server to discard locks obtained by NFS v3 clients, without telling the NFS v3 clients anything about this. This results in the NFS v3 clients thinking that they hold locks while the NFS server believes that everything is unlocked and so will allow another client to lock it.

(What happens with NFS v4 clients is more uncertain to me; they may more or less ride through things.)

On Linux, the NFS server is in the kernel and runs as kernel processes, generally visible in process lists as '[nfsd]'. You might wonder how these processes are started and stopped, and the answer is through a little user-level shim, rpc.nfsd. What this program actually does is write to some files in /proc/fs/nfsd that control the portlist, the NFS versions offered, and the number of kernel nfsd threads that are running. To restart (kernel) NFS service, the nfs-server.service unit first stops it with 'rpc.nfsd 0', telling the kernel to run '0' nfsd threads, and then starts it again by writing some appropriate number of threads into place, which starts NFS service. The nfs-server.service systemd unit also does some other things.

(As a side note, you can see what NFS versions your NFS server is currently supporting by looking at /proc/fs/nfsd/versions. Sadly this can't be changed while there are NFS server threads running.)
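As a small illustration of looking at those /proc files, the following Python sketch reads the versions and threads files and reports which NFS versions are enabled. It assumes the versions file holds space-separated tokens such as '-2 +3 +4 +4.1 +4.2', with '+' marking an enabled version; that format is an assumption here, so check it against your own server. Reading these files will probably need root.

```python
from pathlib import Path


def enabled_nfs_versions(text: str) -> list[str]:
    """Return the versions marked with '+' in a versions line such as
    "-2 +3 +4 +4.1 +4.2" (assumed format)."""
    return [tok[1:] for tok in text.split() if tok.startswith("+")]


if __name__ == "__main__":
    base = Path("/proc/fs/nfsd")
    try:
        print("versions:", enabled_nfs_versions((base / "versions").read_text()))
        print("threads:", (base / "threads").read_text().strip())
    except OSError as e:
        # Not an NFS server, nfsd filesystem not mounted, or not root.
        print("cannot read nfsd state:", e)
```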

If you restart the kernel NFS server either with 'systemctl restart nfs-server' or by hand by writing '0' and then some number to /proc/fs/nfsd/threads, the kernel will completely drop knowledge of all locks from NFS v3 clients. Unfortunately running 'sm-notify' doesn't seem to recover them; they're just gone. Locks from NFS v4 clients suffer a somewhat less predictable and certain fate. If the NFS v4 client is actively doing NFS operations to the server, its locks will generally be preserved over a 'systemctl restart nfs-server'. If the client isn't actively doing NFS operations and doesn't do any for a while, I'm not certain that its locks will be preserved, and certainly they aren't immediately there (they seem to only come back when the NFS v4 client re-attaches to the server).

Looked at from the right angle, this makes sense. The kernel has to release locks from NFS clients when it stops being an NFS server, and a sensible signal that it's no longer an NFS server is when it's told to run zero NFS threads. However, it does seem to lead to an unfortunate result for at least NFS v3 clients.

linux/NFSServerRestartLosesNFSv3Locks written at 23:12:58

2023-09-19

How Unix shells used to be used as an access control mechanism

Once upon a time, one of the ways that system administrators controlled who could log in to what server was by assigning special administrative shells to logins, either on a particular system or across your entire server fleet. Today, special shells (mostly) aren't an effective mechanism for this any more, so modern Unix people may not have much exposure to this idea. However, vestiges of this live on in typical Unix configurations, in the form of /sbin/nologin (sometimes in /usr/sbin) and how many system accounts have this set as their shell in /etc/passwd.

The normal thing for /sbin/nologin to do when run is to print something like 'This account is currently not available.' and exit with status 1 (in a surprising bit of cross-Unix agreement, all of Linux, FreeBSD, and OpenBSD nologin appear to print exactly the same message). By making this the shell of some account, anything that executes an account's shell as part of accessing it will fail, so login (locally or over SSH) and a normal su will both fail. Typical versions of su usually have special features to keep you from overriding this by supplying your own shell (often involving /etc/shells, or deliberately running the /etc/passwd shell for the user). Otherwise, there is nothing that prevents processes from running under the login's UID, and in fact it's extremely common for such system accounts to be running various processes.
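If you're curious how many accounts on a system are set up this way, a quick Python sketch like the following lists them. It only parses text in /etc/passwd format, so it won't see logins that come from LDAP or other sources, and the two nologin paths it checks are just the ones mentioned above.

```python
NOLOGIN_SHELLS = {"/sbin/nologin", "/usr/sbin/nologin"}


def nologin_accounts(passwd_text: str) -> list[str]:
    """Return the login names whose shell field is a nologin path."""
    names = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # A passwd entry has seven colon-separated fields; the shell is last.
        if len(fields) == 7 and fields[6] in NOLOGIN_SHELLS:
            names.append(fields[0])
    return names


if __name__ == "__main__":
    try:
        with open("/etc/passwd") as f:
            for name in nologin_accounts(f.read()):
                print(name)
    except OSError as e:
        print("cannot read /etc/passwd:", e)
```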

Unix system administrators have long used this basic idea for their own purposes, creating their own fleet of administrative shells to, for example, tell you that a machine was only accessible by staff. You would then arrange for all non-staff logins on the machine to have that shell as their login shell (there might be such logins if, for example, the machine is a NFS fileserver). Taking the idea one step further, you might suspend accounts before deleting them by changing the account's shell to an administrative shell that printed out 'your account is suspended and will be deleted soon, contact <X> if you think this is a terrible mistake' and then exited. In an era when everyone accessed your services by logging in to your machines through SSH (or earlier, rlogin and telnet), this was an effective way of getting someone's attention and a reasonably effective way of denying them access (although even back then, the details could be complex).

(Our process for disabling accounts gives such accounts a special shell, but it's mostly as a marker for us for reasons covered in that entry.)

You could also use administrative shells to enforce special actions when people logged in. For example, newly created logins might be given a special shell that would make them agree to your usage policies, force them to change their password, and then through magic change their shell to a regular shell. Some of this could be done through existing system features (sometimes there was a way to mark a passwd entry so that it forced an immediate password change), but generally not all of it. Again, this worked well when you could count on people starting using your systems by logging in at the Unix level (which generally is no longer true).

Sensible system administrators didn't try to use administrative shells to restrict what people could do on a machine, because historically such 'restricted shells' had not been very successful at being restrictive. Either you let someone have access or you didn't, and any 'restriction' was generally temporary (such as forcing people to do one time actions on their first login). Used this way, administrative shells worked well enough that many old Unix environments accumulated a bunch of local ones, customized to print various different messages for various purposes.

PS: One trick you could do with some sorts of administrative shells was make them trigger alarms when run. If some people were not really supposed to even try to log in to some machine, you might want to know if someone tried. One reason this is potentially an interesting signal is that anyone who gets as far as running a login shell definitely knows the account's password (or otherwise can pass your local Unix authentication).

(These days I believe this would be considered a form of 'canary token'.)

unix/ShellsAsAccessControl written at 22:28:59

2023-09-18

Making a function that defines functions in GNU Emacs ELisp

Suppose that for some reason you're trying to create a number of functions that follow a fixed template; for example, they should all be called 'mh-visit-<name>' and should each use mh-visit-folder to visit the (N)MH folder '+inbox/<name>'. In my last installment I did this with an Emacs Lisp macro, but it turns out there are reasons to prefer a function over a macro. For example, you apparently can't use a macro in dolist, which means you have to write all of the macro invocations for all of the functions you want to create by hand, instead of having a list of all of the names and dolist'ing over it.

There are two ways to write this function to create functions, a simpler version that doesn't necessarily work and a more complicated version that always does (as far as I know). I'll start with the simple version and describe the problem:

(defun make-visit-func (fname)
  (let ((folder-name (concat "+inbox/" fname))
        (func (concat "mh-visit-" fname)))
    (defalias (intern func)
      (lambda ()
        (interactive)
        (mh-visit-folder folder-name))
      (format "Visit MH folder %s." folder-name))))

If you try this in an Emacs *scratch* buffer, it may well work. If you put this into a .el file (one that has no special adornment) and use it to create a bunch of functions in that file, then try to use one of them, Emacs will tell you 'mh-visit-<name>: Symbol’s value as variable is void: folder-name'. This is because folder-name is dynamically scoped, and so is not captured by the lambda we've created here; the 'folder-name' in the lambda is just a free-floating variable. As far as I know, there is no way to create a lexically bound variable and a closure without making all elisp code in the file use lexical binding instead of dynamic scoping.

(As of Emacs 29.1, .el files that aren't specifically annotated on their first lines still use dynamic scoping, so your personal .el files are probably set up this way. If you innocently create a .el file and start pushing code from your .emacs into it, dynamic scoping is what you get.)

Fortunately we can use a giant hammer, basically imitating the structure of our macro version and directly calling apply. This version looks like this:

(defun make-visit-func (fname)
  (let ((folder-name (concat "+inbox/" fname))
        (func (concat "mh-visit-" fname)))
    (apply `(defalias ,(intern func)
              (lambda ()
                ,(format "Visit MH folder %s." folder-name)
                (interactive)
                (mh-visit-folder ,folder-name))))))

(This time I've attached the docstring to the lambda, not the alias, which is really the right thing but which seems to be hard in the other version.)

As I understand it, we are effectively doing what a macro would do; we are creating the S-expression version of the function we want, with our let created variables being directly spliced in by value, not by their (dynamically bound) names.

PS: I probably should switch my .el files over to lexical binding and fix anything that breaks, especially since I only have one right now. But the whole thing irritates me a bit. And I think the apply-based version is still a tiny bit more efficient and better, since it directly substitutes in the values (and puts the docstring on the lambda).

programming/EmacsFunctionDefiningFunction written at 22:02:24

2023-09-17

Unix shells are generally not viable access control mechanisms any more

Once upon a time, if you had a collection of Unix systems, you could reasonably do a certain amount of access control to your overall environment by forcing logins to have specific administrative shells. As a bonus, these administrative shells could print helpful messages about why the particular login wasn't being allowed to use your system. This is a quite attractive bundle of features, but unfortunately this no longer works in a (modern) Unix environment with logins (such as we have). There are two core problems.

First, you almost certainly operate a variety of services that normally only use Unix logins as a source of (password) authentication and perhaps a UID to operate as, and ignore the login's shell. This is the common pattern of Samba, IMAP servers, Apache HTTP Basic Authentication, and so on. In some cases you may be able to teach these services to look at the login's shell and do special things, but some of them are sealed black boxes and even the ones that can be changed require you to go out of your way. If you forget one, it fails open (allowing access to people with an administrative shell that should lock them out).
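For a service you control, 'looking at the login's shell' might be sketched in Python like this. The idea of treating /etc/shells as the list of acceptable shells is an assumption for illustration (it presumes your administrative shells are not listed in that file, which you would need to check), and service_should_allow is a made-up name. The sketch fails closed: an unknown login or an unreadable file means no access.

```python
import pwd


def parse_shells(text: str) -> set[str]:
    """Parse /etc/shells-style text: one path per line, '#' starts a comment."""
    shells = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            shells.add(line)
    return shells


def service_should_allow(user: str, shells_file: str = "/etc/shells") -> bool:
    """Allow a login only if its passwd shell is listed in shells_file."""
    try:
        shell = pwd.getpwnam(user).pw_shell
        with open(shells_file) as f:
            valid = parse_shells(f.read())
    except (KeyError, OSError):
        # Unknown login or unreadable file: deny rather than fail open.
        return False
    return shell in valid
```

Even with something like this, every service still needs to actually call the check, which is the 'if you forget one' problem.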

(One of these services is SSH itself, since you can generally initiate SSH sessions and ask for port forwarding or other features that don't cause SSH to run the login shell.)

Second, you may operate general authentication services, such as LDAP or a Single Sign On system, and if you do these authentication services are generally blind to what they're being used for and thus to whether or not a login with a special shell should be allowed to pass this particular authentication. The only real solution is to have multiple versions of these authentication systems with different logins in them, and point systems at different ones based on exactly who should be allowed to use them.

A similar issue happens with Apache HTTP Basic Authentication in common configurations, where you have a single authentication realm with a single Apache htpasswd file that covers an assortment of different services. If you need certain logins ('locked' logins or the like) to be excluded from some of these services but not others, either you need multiple htpasswd files (at least) or you need to teach each such service to do additional checks.

(In general you're going to have to try to carefully review who should be able to use which of your services when, and the resulting matrix is often surprisingly complicated and tangled. Life gets more complicated if you're using administrative shells for reasons other than just locking people out with a message, for example to try to force an initial password change.)

Today, the only two measures of login access control that really work in a general environment are either scrambling the login's password (and disabling any SSH authorized keys) or excluding the login entirely from your various authentication data sources (your LDAP servers, your Apache htpasswd files, and so on). It's a pity that changing people's shells is no longer enough (it was both easy and convenient), but that's how the environment has evolved.

sysadmin/UnixShellsNoMoreAccessControl written at 21:44:38

2023-09-16

Apache's HTTP Basic Authentication could do with more logging

Suppose, not entirely hypothetically, that you use Apache and have an area of your website protected with Apache's HTTP Basic Authentication. A user comes to you with a problem report; while interacting with this area of the site, they unexpectedly got re-challenged for authentication. In fact, in your Apache logs you can see that they made an authenticated request that returned a HTTP redirect and literally moments later their browser's GET of the redirection target was met with a HTTP 401 response, indicating that Apache didn't think they were authenticated or maybe authorized. Unfortunately, our options for understanding exactly what happened are limited, because Apache doesn't really do logging about the Basic Authentication process.

There is one useful (or even critical) piece of information that Apache does log in the standard log format, and that is whether the HTTP 401 was because of a failure of authentication or a lack of authorization. Both normally get HTTP 401 responses (although you can change that with AuthzSendForbiddenOnFailure and perhaps should), but they appear differently in the normal access log. If there was a successful authentication but the user was not authorized, you will see their name in the log file:

192.168.1.1 - theuser [...] "GET /restricted HTTP/1.1" 401 ....

If they are not authenticated (for whatever reason), then there will be no user name logged; the leading bit will just be '192.168.1.1 - -'.
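If you want to pull these two cases apart in bulk, a small Python sketch along these lines could classify 401 entries. It takes the observation above at face value (user name present means authenticated but not authorized), and it assumes access log lines that start with the usual 'host ident user [time] "request" status' fields; the sample timestamps and byte counts in the usage lines are made up.

```python
import re

# host ident user [time] "request" status ...
LOG_RE = re.compile(r'^(\S+) (\S+) (\S+) \[[^\]]*\] "(?:[^"\\]|\\.)*" (\d{3})\b')


def classify_401(line: str):
    """Return 'not-authorized', 'not-authenticated', or None for non-401s
    and lines that don't look like access log entries."""
    m = LOG_RE.match(line)
    if not m or m.group(4) != "401":
        return None
    # The third field is the authenticated user name, or '-' if none.
    return "not-authenticated" if m.group(3) == "-" else "not-authorized"


# Made-up sample lines in the shape shown above.
print(classify_401('192.168.1.1 - theuser [16/Sep/2023:22:00:00 -0400] "GET /restricted HTTP/1.1" 401 381'))
print(classify_401('192.168.1.1 - - [16/Sep/2023:22:00:01 -0400] "GET /restricted HTTP/1.1" 401 381'))
```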

However, there are at least five reasons why this request was not authenticated (in Apache's view) and you can't tell them apart. The five reasons are the browser didn't send an Authorization header, the header or a part of it was malformed, the authentication source (such as your htpasswd file) was missing or unreadable, the header contained a user name that Apache didn't find or recognize, or the password in the header was incorrect. It would be nice to know which one of these had happened, because they lead to quite different causes and fixes.

(Apache may log errors if the authentication source is missing or unreadable; I haven't tested. That still leaves the other cases.)

For example, if your logs say that the browser didn't send the header at all, that is probably not a problem on your side. However, the rules for when browsers decide to send this header are a bit complex and potentially surprising, because it doesn't work like cookies, where a browser will always send them once set. Browsers also make their own decisions about how to react to HTTP 401 responses on requests where they didn't send Authorization headers, so they may decide to re-ask the person for a name and password even though they have Basic Authentication credentials they could try.

(Having discovered AuthzSendForbiddenOnFailure, I am probably going to set it on several limited-access areas in our Apache configuration, because it's rather more user friendly. It's not an information disclosure for us because there are authenticated but otherwise unrestricted areas on the web server with the same credentials, so an attacker can already validate guessed passwords.)

web/ApacheBasicAuthLoggingIssues written at 22:49:05

2023-09-15

Ensuring that my URL server and client programs exit after problems

I recently wrote about my new simple system to open URLs on my desktop from remote machines, where a Python client (on the remote machine) listens on a Unix domain socket for URLs that programs (like mail clients) want opened, and reports these URLs to the server on my desktop, which passes them to my browser. The server and client communicate over SSH; the server starts by SSH'ing to the remote machine and running the client. On my desktop, I run the server in a terminal window, because that's the easy approach.

Whenever I have a pair of communicating programs like this, one of my concerns is making sure that each end notices when the other goes away or the communication channel breaks, and cleans itself up. If the SSH connection is broken or the remote client exits for some reason, I don't want the server to hang around looking like it's still alive and functioning; similarly, if the server exits or the SSH connection is broken, I want the remote client to exit immediately, rather than hang around claiming to other parties that it can accept URLs and pass them to my desktop to be opened in a browser.

On the server this is relatively simple. I started with my standard stanza for Python programs that I want to die when there are problems:

signal.signal(signal.SIGINT, signal.SIG_DFL)
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
signal.signal(signal.SIGHUP, signal.SIG_DFL)

If I was being serious I should check to see what SIGINT was initially set to, but this is a casual program, so I'll never run it with SIGINT deliberately masked. Setting SIGHUP isn't necessary today, but I didn't remember that until I checked and Python could change it.
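The more careful version of the SIGINT handling could look something like this sketch. It relies on the fact that Python only installs its own SIGINT handler at startup if SIGINT wasn't already being ignored, so if signal.getsignal() reports SIG_IGN, whoever started the program wanted it that way. The function name is made up, and the check is only meaningful for SIGINT; Python itself ignores SIGPIPE at startup, so the same test can't tell you what SIGPIPE was inherited as.

```python
import signal


def default_unless_ignored(signum: int) -> bool:
    """Reset signum to SIG_DFL unless it is currently ignored.

    Returns True if the disposition was changed.
    """
    if signal.getsignal(signum) == signal.SIG_IGN:
        # Leave it alone; presumably it was deliberately ignored.
        return False
    signal.signal(signum, signal.SIG_DFL)
    return True


# Die on Ctrl-C without a traceback, unless SIGINT was already ignored.
default_unless_ignored(signal.SIGINT)
```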

Since all the server does is read from the SSH connection to the client, I can detect both client exit and SSH connection problems by looking for end of file, which is signalled by an empty read result:

def process(host: str) -> None:
  pd = remoteprocess(host)
  assert pd.stdout
  while True:
    line = pd.stdout.readline()
    # An empty read result means end of file: the client exited
    # or the SSH connection went away.
    if not line:
      break
    [...]

As far as I know, our SSH configurations use TCP keepalives, so if the connection between my machine and the remote machine is broken, both ends will eventually notice.
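The read-until-end-of-file pattern in the server loop can be tried out without SSH. In this sketch the child process is a stand-in for the SSH command (it just prints two made-up URLs and exits), and lines_until_eof is a hypothetical helper, not the real server's code.

```python
import subprocess
import sys


def lines_until_eof(argv: list[str]) -> list[str]:
    """Run argv and collect its output lines until end of file."""
    pd = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    assert pd.stdout
    lines = []
    while True:
        line = pd.stdout.readline()
        if not line:
            # Empty string (not even a newline): the other end is gone.
            break
        lines.append(line.rstrip("\n"))
    pd.wait()
    return lines


# Stand-in for the SSH command: a child that prints two lines and exits.
child = [sys.executable, "-c",
         "print('https://example.org/a'); print('https://example.org/b')"]
print(lines_until_eof(child))
```

A blank line from the child is still "\n", which is not empty, so only a real end of file ends the loop.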

Arranging for the remote client to exit at appropriate points is a bit harder and involves a hack. The client's sign that the server has gone away is that the SSH connection gets closed, and one sign of that is that the client's standard input gets closed. However, the client is normally parked in socket.accept() waiting for new connections over its Unix socket, not trying to read from the SSH connection. Rather than write more complicated Python code to try to listen for both a new socket connection and end of file on standard input (for example using select), I decided to use a second thread and brute force. The second thread tries to read from standard input and forces the entire program to exit if it sees end of file:

def reader() -> None:
  while True:
    try:
      s = sys.stdin.readline()
      if not s:
        os._exit(0)
    except EnvironmentError:
      os._exit(0)

[...]
def main() -> None:
  [the same signal setup as above]

  t = threading.Thread(target=reader, daemon=True)
  t.start()

  [rest of code]

In theory the server is not supposed to send anything to the client, but in practice I decided that I would rather have the client exit only on an explicit end of file indication. The use of os._exit() is a bit brute force, but at this point I want all of the client to exit immediately.

This threading approach is brute force but also quite easy, so I'm glad I opted for it rather than complicating my life a lot with select and similar options. These days maybe the proper easy way to do this sort of thing is asyncio with streams, but I haven't written any asyncio code.
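For comparison, the select-style alternative might look roughly like the following sketch, using Python's selectors module to wait on both the listening socket and standard input in one loop. This is not the client's actual code; serve, handle and demo are made-up names, and the demo uses a pipe as a stand-in for the SSH connection's standard input.

```python
import os
import selectors
import socket
import tempfile


def serve(listener: socket.socket, stdin_fd: int, handle) -> None:
    """Accept connections on listener until stdin_fd reports end of file."""
    sel = selectors.DefaultSelector()
    sel.register(listener, selectors.EVENT_READ, "accept")
    sel.register(stdin_fd, selectors.EVENT_READ, "stdin")
    try:
        while True:
            for key, _ in sel.select():
                if key.data == "stdin":
                    if not os.read(stdin_fd, 4096):
                        return  # end of file: the other side went away
                else:
                    conn, _ = listener.accept()
                    with conn:
                        handle(conn)
    finally:
        sel.close()


def demo() -> list[bytes]:
    """Send one URL over a Unix socket, then simulate the server vanishing."""
    got = []
    r, w = os.pipe()  # stands in for standard input over SSH
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sock")
        with socket.socket(socket.AF_UNIX) as listener:
            listener.bind(path)
            listener.listen()
            with socket.socket(socket.AF_UNIX) as client:
                client.connect(path)
                client.sendall(b"https://example.org/\n")

                def handle(conn):
                    got.append(conn.recv(4096))
                    os.close(w)  # closing the write end produces EOF on r

                serve(listener, r, handle)
    os.close(r)
    return got


print(demo())
```

In a real client, stdin_fd would be sys.stdin.fileno() and serve() returning would be the cue to exit. It's noticeably more code than the reader thread, which is the tradeoff described above.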

(I may take this as a challenge and rewrite the client as a proper asyncio based program, just to see how difficult it is.)

All of this appears to work in casual testing. If I Ctrl-C the server in my terminal window, the remote client dutifully exits. If I manually kill the remote client, my local server exits. I haven't simulated having the network connection stop working and having SSH recognize this, but my network connections don't get broken very often (and if my network isn't working, I won't be logged in to work and trying to open URLs on my home desktop).

python/URLServerInsuringExits written at 22:04:18

2023-09-14

An important difference between intern and make-symbol in GNU Emacs ELisp

Suppose, not hypothetically, that for some reason you're trying to create a GNU Emacs ELisp macro that defines a function, and for your sins you don't want to directly specify the name of your new function. In my case, I want to create a bunch of functions with names of the form 'mh-visit-<something>', which all use mh-visit-folder to visit the (N)MH folder '+inbox/<something>'. Ordinary people using macros to create functions probably give the macro the full name of the new function, but here it's less annoying if I can put it together in the macro.

(I will put the summary here: unless you know what you're doing, use intern instead of make-symbol in situations where you're making up symbols, like on the fly function definitions.)

If we passed in the name of our new function to be, I believe the resulting macro would be (without niceties like docstrings):

(defmacro make-visit-func (fname folder)
  `(defalias ',fname
     (lambda ()
       (interactive)
       (mh-visit-folder ,(format "+inbox/%s" folder)))))

;; usage:
(make-visit-func mh-visit-fred "fred")

(You might think we would use defun, but defun is actually a macro itself and defalias is the direct thing, as I understand it; see Defining functions.)

The ` and , bits are Backquoting, which lets us create a quoted list of the code the macro will generate while splicing in some variables. The peculiar ',fname is (E)Lisp for the value of 'fname' spliced into the list but quoted instead of evaluated; 'fname' itself is not a string but a symbol, which we have here because symbols are how functions get their (global) names.

(We don't have to quote 'mh-visit-fred' when we use the macro because macro arguments are effectively automatically quoted, by virtue of being unevaluated.)

If we're going to use the folder name to create both the full folder name and the name of our new function, we need to turn a string into a symbol. If you quickly scan the documentation on creating symbols, there are two plausible ELisp functions to use, intern and make-symbol. The latter's name certainly sounds like what we want, and if you aren't an ELisp expert, you might rewrite our make-visit-func to use it, like so:

;; This doesn't actually work, don't copy it.
(defmacro make-visit-func (folder)
  `(defalias ',(make-symbol (format "mh-visit-%s" folder))
     (lambda ()
       (interactive)
       (mh-visit-folder ,(format "+inbox/%s" folder)))))

;; usage:
(make-visit-func "fred")

If you do this, you will spend some time being rather confused. Your macro will execute without errors, but the function you expected won't be callable by name afterward, and yet debugging approaches like macroexpand will give you a result that works when you evaluate it. If you change to using intern, everything works, so you actually want:

;; This does actually work, you can copy it.
(defmacro make-visit-func (folder)
  `(defalias ',(intern (format "mh-visit-%s" folder))
     (lambda ()
       (interactive)
       (mh-visit-folder ,(format "+inbox/%s" folder)))))

What I believe is happening is caused by the following innocent seeming bit in the description of make-symbol (the word to notice is 'uninterned'):

This function returns a newly-allocated, uninterned symbol whose name is name (which must be a string).

Normally, the symbols that serve as the name of functions are interned, making them both visible for later use and kept around by ELisp garbage collection. However, what make-symbol gives us is an unreferenced symbol object (with a name assigned to it), so when defalias connects our lambda to it to make it a function, it is still not visible or retained. As a result it disappears into a puff of smoke afterward; even if it hasn't been literally garbage collected, there is no reference to it.
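The difference can be illustrated with a toy model in Python. This is only an analogy under simplifying assumptions, not how Emacs implements symbols: a dictionary plays the role of the obarray (the table of interned symbols), and Symbol, intern(), make_symbol(), defalias() and call_by_name() are stand-ins written for this sketch.

```python
class Symbol:
    """A toy symbol: a name plus a function cell."""

    def __init__(self, name):
        self.name = name
        self.function = None


obarray = {}  # name -> the one interned Symbol with that name


def intern(name):
    """Return the interned symbol called name, creating it if needed."""
    if name not in obarray:
        obarray[name] = Symbol(name)
    return obarray[name]


def make_symbol(name):
    """Return a brand new symbol that is NOT entered in the obarray."""
    return Symbol(name)


def defalias(symbol, function):
    symbol.function = function
    return symbol


def call_by_name(name):
    """Roughly what happens when code mentioning name is read and run."""
    symbol = intern(name)
    if symbol.function is None:
        raise LookupError(f"void-function {name}")
    return symbol.function()


# The make-symbol version: the function lands on a symbol nobody can reach.
lost = defalias(make_symbol("mh-visit-fred"), lambda: "+inbox/fred")
try:
    call_by_name("mh-visit-fred")
    found_by_name = True
except LookupError:
    found_by_name = False

# The intern version: the function lands on the symbol that names resolve to.
defalias(intern("mh-visit-fred"), lambda: "+inbox/fred")
result = call_by_name("mh-visit-fred")
```

In the toy model, 'lost' has the right name and a working function, but looking up the name never finds it, because lookup always goes through the table.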

Evaluating the result of macroexpand worked because what I evaluated was the printed form of the (more or less) list result, and when that text was read back in, it created a normal, interned symbol. Here it is:

(defalias 'mh-visit-fred
  (lambda nil
    (interactive) (mh-visit-folder "+inbox/fred")))

(I've split this on three lines for readability.)

Although 'mh-visit-fred' came from make-symbol, it is printed as plain text (well, its name is), and when that text is read back in to be evaluated, it becomes a normal interned symbol. This is one of the cases where the printed representation is not actually the read syntax for the value. I believe that the actual neurotically correct print syntax for make-symbol's result is a '#:' prefix on the name, as in '#:mh-visit-fred'. Although I believe Emacs accepts this as read syntax, I don't know if current versions ever print this on output unless you try very hard.

(Having gone through this once, I'm writing it down so that if I ever do this sort of thing again I won't have to re-learn this.)

PS: Since I am not an Emacs Lisp expert, it's quite possible that there are better ways to write this macro.

programming/EmacsInternVsMakeSymbol written at 23:05:13

2023-09-13

A user program doing intense IO can manifest as high system CPU time

Recently, our IMAP server had unusually high CPU usage and was increasingly close to saturating its CPU. When I investigated with 'top' it was easy to see the culprit processes, but when I checked what they were doing with the strace command, they were all busy madly doing IO, in fact processing recursive IMAP LIST commands by walking around in the filesystem. Processes that intensely do IO like this normally wind up in "iowait", not in active CPU usage (whether user or system CPU usage). Except that here these processes were in active CPU usage, burning huge amounts of system CPU time.

What was happening is that these IMAP processes trying to do recursive IMAP LISTs of all available 'mail folders' had managed to escape into '/sys'. The processes were working away more or less endlessly because Dovecot (the IMAP server software we use) makes the entirely defensible but less common decision to follow symbolic links when traversing directory trees, and Linux's /sys has a lot of them (and may have ones that form cycles, so a directory traversal that follows symbolic links may never terminate). Since /sys is a virtual filesystem that is handled entirely inside the Linux kernel, traversing it and reading directories from it does no actual IO to actual disks. Instead, it's all handled in kernel code, and all of the work to traverse around it, list directories, and so on shows up as system time.
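To make the cycle problem concrete, here is a small sketch of a traversal that follows symbolic links but remembers which directories it has already visited, so a loop of links can't trap it. This is an illustration only, not what Dovecot does; walk_following_links() is a made-up helper, and it assumes that a (device, inode) pair reliably identifies a directory.

```python
import os
import tempfile


def walk_following_links(top):
    """Yield each distinct directory under top once, following symlinks."""
    seen = set()
    stack = [top]
    while stack:
        path = stack.pop()
        try:
            st = os.stat(path)  # os.stat() follows symbolic links
        except OSError:
            continue
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # reached this directory before, perhaps via a link
        seen.add(key)
        yield path
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=True):
                        stack.append(entry.path)
        except OSError:
            continue


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "a"))
    os.mkdir(os.path.join(root, "a", "b"))
    # A symbolic link that points back at the top, forming a cycle.
    os.symlink(root, os.path.join(root, "a", "b", "back_to_top"))
    visited = list(walk_following_links(root))
    print(len(visited), "directories visited")
```

A traversal that follows links without the 'seen' check would go back around through 'back_to_top' again and again instead of stopping after the three real directories.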

Operating on a virtual filesystem isn't the only way that a program can turn a high IO rate into high system time. You can get the same effect if you're repeatedly re-reading the same data that the kernel has cached in memory. Since the kernel can satisfy your IO requests without going to disk, all of the effort required turns into system CPU time inside the kernel. This is probably easiest to have happen with reading data from files, but you can also have programs that are repeatedly scanning the same directories or calling stat() (or lstat()) on the same filesystem names. All of those can wind up as entirely in-kernel activities because the modern Linux kernel is very good at caching things.
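One rough way to see this effect is to measure a process's own user and system CPU time around a loop of stat() calls on a name the kernel will have cached. The sketch below uses os.times() for that; cpu_split_for_stats() is a hypothetical helper, the path and the count are arbitrary, and the actual numbers will vary from machine to machine.

```python
import os


def cpu_split_for_stats(path, count=200_000):
    """Return (user, system) CPU seconds spent on count stat() calls."""
    before = os.times()
    for _ in range(count):
        os.stat(path)  # answered from the kernel's caches, no disk IO
    after = os.times()
    return after.user - before.user, after.system - before.system


user_s, system_s = cpu_split_for_stats("/")
print(f"user {user_s:.2f}s, system {system_s:.2f}s")
```

Because this is Python, the interpreter's own overhead adds user time as well, so the split will probably be less lopsided than for a C program doing the same thing.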

(Most people's IMAP servers don't have the sort of historical configuration issues we have that create these exciting adventures.)

linux/UserIOCanBeSystemTime written at 22:11:06
