Some early praise for using drgn for poking into Linux kernel internals
I've been keeping my eyes on drgn (repository, 2019 LWN article) for some time, because it held promise for being a better way to poke around your Linux kernel than the venerable crash(8) program (which I've actually used in anger, and it was a lot of work). Today, for the first time, I got around to using drgn and the experience was broadly positive.
I used drgn on an Ubuntu 22.04 test NFS server, by creating a Python 3 venv, installing drgn into the venv, and then running it from there (after installing the necessary kernel debugging information from Ubuntu); this worked fine and 'drgn' gave me a nice interactive Python environment where with minimal knowledge of drgn itself I could poke around the kernel. Specifically, I could poke into the various data structures maintained by the kernel NFS NLM system, with the goal of being able to see which NFS client owned each NFS lock on the server (or in this case, a lock, since it was a test server and I established only a single lock to it for simplicity).
Drgn in interactive mode works quite well for this sort of exploration for a number of reasons. To start with it does a remarkably good job of pretty-printing structures (and arrays) with type and content information of all of the fields. Simply being able to see the contents of various things (and type information for pointers) led me to make some useful discoveries. However, sometimes you'll be confronted with things like this:
>>> prog['nlm_files']
(struct hlist_head [128]){
[...]
        {
                .first = (struct hlist_node *)0xffff8974099ae600,
        },
This is a message from drgn to you that you're going to be reading some kernel source code and kernel headers in order to figure out your next step. The good news is that drgn supports all of the kernel's normal ways of traversing these sorts of data structures, in a way that's very similar to the kernel's own code for it, to the point where an outsider like me can translate back and forth. For instance, if you have kernel code that looks like:
hlist_for_each_entry_safe(file, next, &nlm_files[i], f_list) {
Then the drgn equivalent you want is this (hard-coding the index, found by experimentation, because this is exploration):
>>> r = list(hlist_for_each_entry('struct nlm_file', prog['nlm_files'][6].address_of_(), 'f_list'))
>>> r
[Object(prog, 'struct nlm_file *', value=0xffff8974099ae600)]
(We use list() for the usual Python reason that drgn's helper function returns a Python generator, and we want to poke at the actual results in a simple way. Also, technically these helpers are in drgn.helpers.linux, which you may want to import explicitly so you can read their help text. Or see the user guide and the section on helpers.)
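Once the single-bucket version works, the hard-coded index can be dropped by walking every bucket of the table. What follows is only a minimal sketch of that idea: the function name iter_nlm_files is made up for illustration, and it assumes that 'nlm_files' is an array of hlist heads linked through 'f_list', as in the output above.

```python
def iter_nlm_files(prog):
    """Yield every 'struct nlm_file *' found in the global nlm_files table."""
    # Imported lazily so this file can be loaded on a machine without drgn.
    # In drgn's interactive mode the helper is already available.
    from drgn.helpers.linux.list import hlist_for_each_entry

    buckets = prog['nlm_files']  # struct hlist_head [128] in the output above
    for i in range(len(buckets)):
        # The helper wants a 'struct hlist_head *', hence address_of_().
        head = buckets[i].address_of_()
        # Empty buckets simply yield nothing.
        yield from hlist_for_each_entry('struct nlm_file', head, 'f_list')
```

In interactive mode (or in a script run through the drgn command, where 'prog' is already defined), list(iter_nlm_files(prog)) should then give every nlm_file the server currently knows about, without guessing which bucket is populated.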
You'll also need to read kernel source code and kernel headers in order to dig your way through the kernel data structures to what you want. Drgn won't (and can't) tell you how NLM data structures are linked together and how you can go from, for example, the global 'nlm_files' to the 'struct nlm_host' that tells you the NFS client that got a particular lock. The path can be quite convoluted (cf).
The good news is that if the kernel can do it, drgn probably can do it too, although it may take you quite a bit of digging and persistence to get there. The further good news is that if you can do it in drgn's interactive mode, even painfully and with many missteps, you can probably turn your worked-out process into Python code that uses drgn. Although I've turned to other tools for now, being able to explore and test ideas with drgn was essential to getting there. Now that I've used drgn for this, I'll likely be turning to it for similar explorations and information extraction in the future.
In addition to needing to know Python and being able to read kernel code and headers, drgn's other drawback is that you need kernel debugging information, and on most Linuxes these days that's not installed by default. Installing it may be a bit annoying and it's generally rather big; drgn's documentation has a guide. This means that drgn doesn't work out of the box the way tools like bpftrace do.
(It would be great if drgn could use the kernel's BPF Type Format (BTF) information, which bpftrace and other eBPF tools already use, but apparently there are various obstacles. I believe that drgn is tracking this in DWARFless Debugging #176.)