Finding which NFS client owns a lock on an NFS server via Linux kernel delving

May 6, 2023

Suppose that you have some Linux NFS servers, which have some NFS locks, and you'd like to know which NFS client owns which lock. Since the NFS server can drop a specific client's locks (for example, when that client reboots), this information is in the kernel's data structures, but it's not exposed through public interfaces like /proc/locks. As I mentioned yesterday while talking about drgn, I've worked out how to do this, so in case someone's looking for this information, here are the details. This is as of Ubuntu 22.04, but I believe this area of kernel code is relatively stable (although where things are in the header files has changed since 22.04's kernel).

In the rest of this I'll be making lots of references to kernel data structures implemented as C structs in include/linux/fs.h, include/linux/lockd/lockd.h, and include/linux/filelock.h. To start with, I'll introduce our cast of characters, which is to say various sorts of kernel structures.

  • 'struct nlm_host' represents an NFS client (on an NFS server), or more generally an NLM peer. It contains the identifying information we want in various fields, and so our ultimate goal is to associate (NFS) file locks with nlm_hosts. I believe that a given nlm_host can be connected to multiple locks, since an NFS client can have many locks on the server.
  • 'struct nlm_lockowner' seems to represent the 'owner' of a lock. It's only interesting to us because it contains a reference to the nlm_host associated with the lock, in '.host'.

  • 'struct lock_manager_operations' is a set of function pointers for lock manager operations. There is a specific instance of this, 'nlmsvc_lock_operations', which is used for all lockd/NLM locks.

  • 'struct file_lock' represents a generic "file lock", POSIX or otherwise. It contains a '.fl_lmops' field that points to a lock_manager_operations, a '.fl_pid' field with the nominal PID that owns the lock, a '.fl_file' that points to the 'struct file' that this lock is for, and a special '.fl_owner' field that holds a 'void *' pointer to lock manager specific data. For lockd/NLM locks, this is a pointer to the associated 'struct nlm_lockowner' for the lock, from which we can get the nlm_host and the information we want.

    All lockd/NLM locks will have a '.fl_lmops' field that points to 'nlmsvc_lock_operations' and a '.fl_pid' that has lockd's PID.

    (Whether a lock is a POSIX lock, a flock lock, or something else is not recorded in '.fl_type' but is instead encoded as set bits in '.fl_flags'. Conveniently, all NFS client locks are POSIX locks, so we don't have to care about this.)

  • 'struct inode' represents a generic, in-kernel inode. It contains an '.i_sb' pointer to its 'superblock' (really its mount), its '.i_ino' inode number, and '.i_flctx', which is a pointer to 'struct file_lock_context', which holds context for all of the locks associated with this inode; '.i_flctx->flc_posix' is the list of POSIX locks associated with this inode (there's also eg '.flc_flock' for flock locks).
  • 'struct file' represents an open file in the kernel, including files 'opened' by lockd/NLM in order to get locks on them for NFS clients. It contains a '.f_inode' that points to the file's associated 'struct inode', among other fields. If you want filename information about a struct file, you also want to look at '.f_path', which is the file's 'struct path'; see include/linux/path.h and drgn's 'd_path()' helper. (There's a small drgn illustration of this identifying information right after this list.)

  • 'struct nlm_file' is the lockd/NLM representation of a file held open by lockd/NLM in order to get a lock on it, and for obvious reasons has a pointer to the corresponding 'struct file'. For reasons I don't understand, this is actually stored in a two-element array, '.f_file[2]'; which element is used depends on whether the file was 'opened' for reading or writing.
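
As a small illustration of the 'struct file' and 'struct inode' identifying information above, here's a rough sketch in interactive drgn (untested as written). It assumes 'f' is a 'struct file' Object you already have in hand, for example from a lock's '.fl_file'.

from drgn.helpers.linux.fs import d_path

ino = f.f_inode
sdev = ino.i_sb.s_dev.value_()
# The kernel packs the major device number into the top bits of s_dev and
# the minor into the low 20 bits; this is the major:minor split that
# /proc/locks prints.
print("%02x:%02x:%d" % (sdev >> 20, sdev & 0xfffff, ino.i_ino.value_()))
# d_path() turns the file's 'struct path' into a usable name.
print(d_path(f.f_path))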

There are two paths into determining what NFS client holds what (NFS) lock, the simple and the more involved. In the simple path, we can start by traversing all generic kernel locks somehow, which is to say we start with 'struct file_lock'. For each one, we check that '.fl_lmops' is 'nlmsvc_lock_operations' or that '.fl_pid' is lockd's PID, then cast '.fl_owner' to a 'struct nlm_lockowner *', dereference it and use its '.host' to reach the 'struct nlm_host'.
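
In drgn terms, the per-lock step looks roughly like the following sketch (untested as written, and it assumes drgn can see the lockd code's debugging information). Here 'fl' is assumed to be a 'struct file_lock' Object that you got from somewhere.

from drgn import cast

# Only lockd/NLM locks have .fl_lmops pointing at nlmsvc_lock_operations.
if fl.fl_lmops == prog['nlmsvc_lock_operations'].address_of_():
  # For these locks, .fl_owner is really a struct nlm_lockowner *.
  host = cast('struct nlm_lockowner *', fl.fl_owner).host
  print(host.h_addrbuf.string_(), host.h_name.string_())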

One way to do this is to use bpftrace to hook into 'lock_get_status()' in fs/locks.c, which is called repeatedly to print each line of /proc/locks and is passed a 'struct file_lock *' as its second argument (this also conveniently iterates all current file locks for you). We also have the struct file and thus the struct inode, which will give us identifying information about the file (the major and minor device numbers and its inode number, which is the same information as in /proc/locks). The 'struct nlm_host' has several fields of interest, including what seems to be the pre-formatted IP address in .h_addrbuf and the client's name for itself in .h_name.

So here's some bpftrace (not fully tested and you'll need to provide the lockd PID yourself, and also maybe include some header files):

kprobe:lock_get_status
/((struct file_lock *)arg1)->fl_pid == <your lockd PID>/
{
   $fl = (struct file_lock *)arg1;
   // For lockd/NLM locks, ->fl_owner is really a struct nlm_lockowner *.
   $nlo = (struct nlm_lockowner *)$fl->fl_owner;
   $ino = $fl->fl_file->f_inode;
   $dev = $ino->i_sb->s_dev;
   // arg2 is the index that /proc/locks prints; the device is split into
   // major:minor the same way /proc/locks does it.
   printf("%d: %02x:%02x:%ld owned by %s ('%s')\n",
          (int64)arg2,
          $dev >> 20, $dev & 0xfffff, $ino->i_ino,
          str($nlo->host->h_addrbuf),
          str($nlo->host->h_name));
}

(Now that I look at this a second time, you also want to look at the fifth argument, arg4 (an int32), because if it's non-zero I believe this is a pending lock, not a granted one. You may want to either skip them or print them differently.)

This will print the same indexes and (I believe) the same major:minor:inode information as /proc/locks, but add the NFS client information. To trigger it you must read /proc/locks, either directly or by using lslocks.

Another way is to use drgn to go through the global list of file locks, which is a per-cpu kernel hlist under the general name 'file_lock_list'. In interactive drgn, it appears that you traverse these lists as follows:

for i in for_each_present_cpu(prog):
  fll_cpu = per_cpu(prog['file_lock_list'], i)
  for flock in hlist_for_each_entry('struct file_lock', fll_cpu.hlist, 'fl_link'):
    ...                            # do whatever you want with flock

I'm not quite sure if you want present CPUs, online CPUs, or possible CPUs. Probably you don't have locks for CPUs that aren't online.
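
Putting the per-CPU traversal and the per-lock check together, a rough drgn version of the whole simple path might look something like this (untested as written; I've gone with present CPUs, given the uncertainty above):

from drgn import cast
from drgn.helpers.linux.cpumask import for_each_present_cpu
from drgn.helpers.linux.percpu import per_cpu
from drgn.helpers.linux.list import hlist_for_each_entry

nlm_ops = prog['nlmsvc_lock_operations'].address_of_()
for cpu in for_each_present_cpu(prog):
  fll_cpu = per_cpu(prog['file_lock_list'], cpu)
  for fl in hlist_for_each_entry('struct file_lock', fll_cpu.hlist, 'fl_link'):
    if fl.fl_lmops != nlm_ops:
      continue                     # not a lockd/NLM lock
    host = cast('struct nlm_lockowner *', fl.fl_owner).host
    ino = fl.fl_file.f_inode
    sdev = ino.i_sb.s_dev.value_()
    print("%02x:%02x:%d held by %s (%s)" % (
      sdev >> 20, sdev & 0xfffff, ino.i_ino.value_(),
      host.h_addrbuf.string_().decode(), host.h_name.string_().decode()))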

The second path in is that the NFS NLM code maintains a global data structure of all 'struct nlm_file' objects, in 'nlm_files', which is an array of hlists, per fs/lockd/svcsubs.c. Starting with these 'nlm_file' structs, we can reach the generic file structs, then each file's inode, then the inode's lock context, and finally the POSIX locks in that lock context (since we know that all NFS locks are POSIX locks). This gives us a series of 'file_lock' structs, which puts us at the starting point above.

(The lock context '.flc_posix' is a plain list, not a hlist, and they're chained together with the '.fl_list' field in file_lock. Probably most inodes with NFS locks will have only a single POSIX lock on them.)

So we have more or less:

walk nlm_files to get a series of struct nlm_file → get one .f_file
→ .f_inode.i_flctx → walk .flc_posix to get a series of struct file_lock (probably you usually get only one)
→ check that .fl_lmops is nlmsvc_lock_operations to know you have an NFS lock, and then follow .fl_owner casting it as a struct nlm_lockowner *
→ .host → { .h_addrbuf, .h_name, and anything else you want from struct nlm_host }
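
Here's a rough drgn rendition of that traversal (untested as written). One detail I'm not completely sure of is the hlist chaining field in 'struct nlm_file'; I believe it's '.f_list', but check your kernel's lockd.h.

from drgn import cast
from drgn.helpers.linux.list import hlist_for_each_entry, list_for_each_entry

nlm_ops = prog['nlmsvc_lock_operations'].address_of_()
# nlm_files is an array of hlist heads, with struct nlm_file entries
# chained through (I believe) their .f_list member.
for head in prog['nlm_files']:
  for nf in hlist_for_each_entry('struct nlm_file', head, 'f_list'):
    for f in nf.f_file:            # the 'read' and 'write' struct file pointers
      if not f:
        continue
      flctx = f.f_inode.i_flctx
      if not flctx:
        continue
      # All NFS client locks are POSIX locks, chained through .fl_list.
      for fl in list_for_each_entry('struct file_lock', flctx.flc_posix.address_of_(), 'fl_list'):
        if fl.fl_lmops != nlm_ops:
          continue
        host = cast('struct nlm_lockowner *', fl.fl_owner).host
        print(f.f_inode.i_ino.value_(), host.h_addrbuf.string_().decode(), host.h_name.string_().decode())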

If this doesn't make sense, sorry. I don't know a better way to represent data structure traversal in something like plain text.

(Also, having written this I've realized that you might need to make sure you visit each given inode only once. In theory, multiple generic file objects can all point to the same inode, and so we could wind up visiting its list of locks repeatedly. I'm not sure this can happen with NFS locks; the lockd/NLM system may reuse nlm_file entries across multiple clients getting shared locks on the same file.)
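
If it turns out to matter, one simple way to guard against this in a drgn walk like the sketch above is to remember which inodes you've already looked at, for example with a little helper like this (hypothetical, untested):

seen = set()

def first_visit(inode):
  # 'inode' is a 'struct inode *' Object; its value is the inode's kernel
  # address, which makes a fine set member.
  addr = inode.value_()
  if addr in seen:
    return False
  seen.add(addr)
  return True

In the nlm_files walk you'd call 'first_visit(f.f_inode)' and skip that inode's lock list if it returns False.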

Since starting from nlm_files requires several walks of list-like structures that will generate multiple entries and starting from a struct file_lock doesn't, you can see why I called the latter the simpler case. Now that I've found the 'file_lock_list' global and learned how to traverse it in drgn in the course of writing this entry, I don't think I'll use the 'nlm_files' approach in the future; it's strictly a historical curiosity of how I did it the first time around. And starting from the global file lock list guarantees you're reporting on each file lock only once.

(I was hoping to be able to spot a more direct path through the fs/lockd code, but the path I outlined above really seems to be how lockd does it. See, for example, 'nlm_traverse_locks()' in fs/lockd/svcsubs.c, which starts with a 'struct nlm_file *' and does the process I outlined above.)
